* libguile/jit.c (emit_direct_tail_call): Assert self-tail call has
mcode.
(opcodes_seen, bitvector_ref, bitvector_set, compile1): Make the
opcodes_seen set more compact, and log all instruction emissions at
level 3.
(compute_mcode): Don't overwrite mcode if compilation fails.
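
As an illustration of the more compact opcodes_seen set, here is a minimal
sketch of a packed bit set with ref/set helpers. The example_ names, the
opcode count of 256, and the byte-wise packing are assumptions made for this
sketch, not the definitions in jit.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode count, chosen for the sketch only.  */
#define EXAMPLE_OPCODE_COUNT 256

/* One bit per opcode: 256 opcodes fit in 32 bytes.  */
static uint8_t example_opcodes_seen[EXAMPLE_OPCODE_COUNT / 8];

/* Return the bit at index N (0 or 1).  */
static int
example_bitvector_ref (const uint8_t *bv, size_t n)
{
  return (bv[n / 8] >> (n % 8)) & 1;
}

/* Set the bit at index N.  */
static void
example_bitvector_set (uint8_t *bv, size_t n)
{
  bv[n / 8] |= (uint8_t) (1u << (n % 8));
}

/* Record OPCODE; return 1 the first time it is seen, 0 afterwards.  */
static int
example_note_opcode (unsigned opcode)
{
  assert (opcode < EXAMPLE_OPCODE_COUNT);
  if (example_bitvector_ref (example_opcodes_seen, opcode))
    return 0;
  example_bitvector_set (example_opcodes_seen, opcode);
  return 1;
}
```

A caller could use the return value of example_note_opcode to log an opcode
only on its first emission.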
* libguile/jit.c (compile_alloc_frame, compile_alloc_frame_slow): Move
slow path out of line.
(emit_alloc_frame_for_sp_fast, emit_alloc_frame_for_sp_slow): New
helpers.
(emit_alloc_frame): Refactor to use the new helpers.
(compile_push, compile_push_slow): Use the new helpers.
(compile_assert_nargs_ee_locals, compile_assert_nargs_ee_locals_slow):
Split off a slow path.
* libguile/jit.c (struct pending_reloc): Rename target_vcode_offset
field to target_label_offset.
(inline_label_offset, slow_label_offset): New helpers.
(emit_direct_tail_call): Use inline_label_offset helper.
(add_pending_reloc): Factor out of add_inter_instruction_patch.
(add_inter_instruction_patch): Use inline_label_offset helper.
(add_slow_path_patch): New helper.
(continue_after_slow_path): New helper.
Add slow path compilers for all instructions.
(compile_slow_path): New helper.
(compile): Compile slow paths after main code.
(compute_mcode): Allocate twice as many labels.
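
One way to read "allocate twice as many labels" is that every bytecode
offset gets two labels, one for its inline code and one for its slow path.
The sketch below assumes an interleaved layout (inline at 2*i, slow at
2*i+1); that layout and the example_ names are assumptions, not taken from
jit.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total labels needed for a function of N_INSTRUCTIONS vcode offsets.  */
static size_t
example_label_count (uint32_t n_instructions)
{
  return (size_t) n_instructions * 2;
}

/* Assumed layout: the inline label of offset I lives at index 2*I.  */
static size_t
example_inline_label_offset (uint32_t vcode_offset)
{
  return (size_t) vcode_offset * 2;
}

/* Assumed layout: the slow-path label of offset I lives at index 2*I+1.  */
static size_t
example_slow_label_offset (uint32_t vcode_offset)
{
  return (size_t) vcode_offset * 2 + 1;
}

/* Look up a label slot, checking that the index is within the table.  */
static void **
example_label_slot (void **labels, uint32_t n_instructions, size_t idx)
{
  assert (idx < example_label_count (n_instructions));
  return &labels[idx];
}
```

With this layout a pending relocation only needs to store one label offset,
whether it targets inline code or a slow path.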
* libguile/jit.c (emit_alloc_frame_for_sp):
* libguile/vm-engine.c (ALLOC_FRAME, RESET_FRAME):
* libguile/vm.c (vm_increase_sp, scm_i_vm_prepare_stack):
(return_unused_stack_to_os, vm_expand_stack, alloc_frame):
(scm_call_with_stack_overflow_handler):
* libguile/vm.h (struct scm_vm): Remove sp_min_since_gc handling. It
was a very minor optimization when it was centralized in vm.c, but now
with JIT it's causing too much duplicate code generation.
* libguile/jit.c (scm_jit_compute_mcode): If a caller wants mcode for a
loop but the function already has mcode, instead of punting, just
compile again.
This fixes a bug whereby the compiler would sometimes allocate floats in
marked space.
* libguile/gc-inline.h (scm_inline_gc_malloc_pointerless_words): New
internal helper.
* libguile/intrinsics.h (SCM_FOR_ALL_VM_INTRINSICS):
* libguile/intrinsics.c (allocate_pointerless_words):
(allocate_pointerless_words_with_freelist): New intrinsics.
* libguile/jit.c (compile_allocate_pointerless_words):
(compile_allocate_pointerless_words_immediate): New compilers.
* libguile/vm-engine.c (allocate_pointerless_words)
(allocate_pointerless_words_immediate): New opcodes.
* module/language/cps/compile-bytecode.scm (compile-function):
* module/language/cps/effects-analysis.scm (param):
* module/language/cps/reify-primitives.scm (reify-primitives):
* module/language/cps/specialize-primcalls.scm (specialize-primcalls):
* module/language/cps/types.scm (allocate-words):
(allocate-words/immediate):
* module/system/vm/assembler.scm (system): Add support for the new
opcodes.
* libguile/intrinsics.c (scm_atan1): New intrinsic, wrapping scm_atan.
(scm_bootstrap_intrinsics): Add new intrinsics.
* libguile/intrinsics.h (scm_t_f64_from_f64_f64_intrinsic): New
intrinsic type.
(SCM_FOR_ALL_VM_INTRINSICS): Add intrinsics for floor, ceiling, sin,
cos, tan, asin, acos, atan, and their unboxed counterparts.
* libguile/jit.c (sp_f64_operand): New helper.
(compile_call_f64_from_f64, compile_call_f64_from_f64_f64): Call out
to intrinsics.
* libguile/vm-engine.c (call-f64<-f64-f64): New opcode.
* module/language/cps/effects-analysis.scm: Add new intrinsics.
* module/language/cps/reify-primitives.scm (compute-known-primitives):
Add new intrinsics.
* module/language/cps/slot-allocation.scm (compute-var-representations):
Add 'f64 slot types for the new unboxed intrinsics.
* module/language/cps/specialize-numbers.scm (specialize-operations):
Support unboxing the new intrinsics.
* module/language/cps/types.scm: Define type inferrers for the new
intrinsics.
* module/language/tree-il/cps-primitives.scm: Define CPS translations
for the new intrinsics.
* module/language/tree-il/primitives.scm (*interesting-primitive-names*):
(*effect-free-primitives*, atan): Define primitive resolvers.
* module/system/vm/assembler.scm: Export assemblers for the new
intrinsics.
(define-f64<-f64-f64-intrinsic): New helper.
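
For readers unfamiliar with the f64<-f64-f64 shape, the following sketch
shows what a two-argument unboxed intrinsic type and a call through it could
look like. The example_ names and the sample function are hypothetical; the
actual typedef in intrinsics.h may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed shape: two raw doubles in, one raw double out, no boxing.  */
typedef double (*example_f64_from_f64_f64_intrinsic) (double, double);

/* A stand-in intrinsic with that shape, used only for this sketch.  */
static double
example_sum_of_squares (double x, double y)
{
  return x * x + y * y;
}

/* Roughly what call-f64<-f64-f64 does: fetch two unboxed operands, call
   the intrinsic, and hand back the unboxed result.  */
static double
example_call_f64_from_f64_f64 (example_f64_from_f64_f64_intrinsic fn,
                               double a, double b)
{
  assert (fn != NULL);
  return fn (a, b);
}
```

A two-argument math function such as atan2 from <math.h> has the same
signature and could be passed in the same way.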
Some components of this have been wired up for a while; this commit
finishes the compiler, runtime, and JIT support.
* libguile/intrinsics.h (SCM_FOR_ALL_VM_INTRINSICS):
* libguile/intrinsics.c (scm_bootstrap_intrinsics): Declare the new
intrinsics.
* libguile/jit.c (compile_call_f64_from_f64): Define code generators for
the new intrinsics.
* libguile/vm-engine.c (call-f64<-f64): New instruction.
* module/language/cps/effects-analysis.scm:
* module/language/cps/reify-primitives.scm (compute-known-primitives):
* module/language/cps/slot-allocation.scm (compute-var-representations):
* module/language/cps/specialize-numbers.scm (specialize-operations):
* module/language/tree-il/cps-primitives.scm (abs):
* module/system/vm/assembler.scm (system, define-f64<-f64-intrinsic):
(sqrt, abs, fsqrt, fabs):
* module/language/cps/types.scm (fsqrt, fabs): Add new f64<-f64
primitives.
* libguile/jit.c (OLD_FP_FOR_RETURN_TRAMPOLINE): Initialize static const
var from CPP define instead of T0.
(compile_return_values, emit_return_to_interpreter_trampoline): Adapt
to upper-casing.
This change speeds up the indirect branches at return sites by taking
advantage of the CPU's return address stack.
* libguile/jit.c (emit_push_frame): Don't store the mra; we do that via
a trampoline.
(emit_handle_interrupts_trampoline): Take MRA from link register
instead of T0.
(compile_call, compile_call_label): Compute MRA via the new
jmpi_with_link lightening instruction.
(compile_return_values): Return to caller via ret instead of jmp.
(compile_handle_interrupts): Jump to handle-interrupts trampoline via
jmpi_with_link, to provide the MRA.
(initialize_jit): Bless the trampolines so that they are valid
operands to BX on ARM.
This patch is a bit unfortunate, in the sense that it exposes some of
the JIT guts to the rest of the VM. Code needs to treat "machine return
addresses" as valid if non-NULL (as before) and also not equal to a
tier-down trampoline. This is because tier-down at a return needs the
old frame pointer to load the "virtual return address", and the way this
patch works is that it passes the vra in a well-known register. It's a
custom calling convention for a certain kind of return.
* libguile/jit.h (scm_jit_return_to_interpreter_trampoline): New
internal global.
* libguile/jit.c (scm_jit_clear_mcode_return_addresses): Move here,
from vm.c. Instead of zeroing return addresses, set them to the
return-to-interpreter trampoline.
* libguile/vm-engine.c (return-values): Don't enter mcode if the mra is
scm_jit_return_to_interpreter_trampoline.
* libguile/vm.c (capture_continuation): Treat the tier-down trampoline
as NULL.
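
The rule described above, that a machine return address is usable only if it
is non-NULL and is not the tier-down trampoline, can be sketched as follows.
The example_ names and the stand-in trampoline are assumptions made for
illustration; the real check uses scm_jit_return_to_interpreter_trampoline.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the trampoline's code address (hypothetical).  */
static const uint8_t example_trampoline_code[1] = { 0 };
static const uint8_t *const example_return_to_interpreter_trampoline
  = example_trampoline_code;

/* An MRA points at real mcode only if it is neither NULL nor the
   tier-down trampoline.  */
static int
example_mra_is_mcode (const uint8_t *mra)
{
  return mra != NULL && mra != example_return_to_interpreter_trampoline;
}

/* Treat the tier-down trampoline as NULL, as when capturing a
   continuation.  */
static const uint8_t *
example_normalize_mra (const uint8_t *mra)
{
  return example_mra_is_mcode (mra) ? mra : NULL;
}
```

Keeping the comparison in one helper would limit how much of the VM needs to
know about the trampoline.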
* libguile/jit.c (compile_alloc_frame): Stop initializing locals.
(compile_bind_rest): Use emit_alloc_frame.
* libguile/vm-engine.c (assert_nargs_ee_locals, allocate_frame): Don't
initialize locals.
(bind_rest): Don't initialize locals, and assert that the locals count
has a minimum.
* libguile/jit.c (jit_alloc_fn): On targets that need a dynamically
allocated literal pool, we will need to trace that pool, so pass a
pointerful malloc. Fixes JIT on AArch64.
* libguile/jit.c (compile_ursh_immediate):
(compile_ulsh_immediate): Fix immediate/register variant calling.
Happily, this is a benefit of lightening: its type safety caught this for us.
(DEFINE_CLOBBER_RECORDING_EMITTER_R_R_2): Pass JIT state.