* libguile/jit.c (compile_ulogand_immediate, compile_ulogand_immediate_slow)
* libguile/vm-engine.c (ulogand_immediate): New JIT and interpreter
support for ulogand/immediate.
* module/language/cps/guile-vm/lower-primcalls.scm (string-ref):
(vtable-vtable?):
(vtable-field-boxed?): Emit ulogand/immediate.
* module/language/cps/guile-vm/reify-primitives.scm (reify-primitives):
Remove logand/immediate. Only emit ulogand/immediate if the immediate
is a u8. Refactor mul/immediate.
* module/language/cps/specialize-numbers.scm (specialize-operations):
Produce ulogand/immediate if the result is a u64.
* module/language/cps/effects-analysis.scm:
* module/language/cps/types.scm (logand/immediate): Add effect and type
inference for logand/immediate, ulogand/immediate,
* module/language/cps/utils.scm (primcall-raw-representations):
ulogand/immediate makes a u64.
* module/language/tree-il/compile-cps.scm (convert): Generate
logand/immediate if possible.
* module/language/cps/compile-bytecode.scm (compile-function):
* module/system/vm/assembler.scm (system): Add ulogand/immediate
emitter.
* libguile/loader.h (SCM_OBJCODE_MINOR_VERSION): Bump.
Previously, these instructions were compiled directly. However, the JIT
also wants to fuse comparisons with branches, and an intervening "drop"
breaks that. Instead we take a different approach and rely on the
compiler only emitting push/pop/drop in a peephole sort of way, and we
can just simulate the temp stack space and dispatch directly.
* libguile/jit.c (compile_push, compile_push_slow, compile_pop):
(compile_pop_slow, compile_drop, compile_drop_slow): Crash if these are
called.
(COMPILE_WIDE_OP1, COMPILE_WIDE_OP2, COMPILE_WIDE_OP3, COMPILE_WIDE_OP4)
(COMPILE_WIDE_OP5):
(COMPILE_WIDE_DOP1, COMPILE_WIDE_DOP2, COMPILE_WIDE_DOP3)
(COMPILE_WIDE_DOP4, COMPILE_WIDE_DOP5): New helpers. I didn't manage to
share implementation with COMPILE_*.
(COMPILE_WIDE_*): New variants of compilers that take their operands
from an array instead of parsing the inline operands, relying on
convention for how the compiler emits these instructions.
(parse_wide_operands): New helper to parse out "wide" operands.
(compile1, compile_slow_path): If we see a "push", parse wide operands.
(compile): Move label capture for the slow path into compile_slow_path,
because wide operands may skip some instructions.
* libguile/jit.c (compile_*): Instead of using the minimum sized types
that can represent the instruction's operand, use uint32_t. This will
allow us to handle push/pop/drop without moving the SP.
Recognize `raise-exception` in the same way we recognize `throw`, though
it is a bit less optimized and the boot story is not as complicated.
* doc/ref/vm.texi (Non-Local Control Flow Instructions):
* libguile/jit.c (compile_unreachable):
(compile_unreachable_slow):
* libguile/vm-engine.c (VM_NAME):
* module/language/cps/compile-bytecode.scm (compile-function):
* module/system/vm/assembler.scm (emit-unreachable): Add new
"unreachable" instruction, inserted after a call to non-continuable
`raise-exception`.
* module/language/tree-il/compile-cps.scm (raise-exception):
* module/language/tree-il/primitives.scm
(*interesting-primitive-names*): Recognize raise-exception, and if it is
called with just one argument, prune that branch of the control-flow
graph.
* libguile/intrinsics.h:
* libguile/intrinsics.c (lookup_bound_public, lookup_bound_private): Two
new intrinsics.
(scm_bootstrap_intrinsics): Wire them up.
* libguile/jit.c (compile_call_scm_from_scmn_scmn):
(compile_call_scm_from_scmn_scmn_slow):
(COMPILE_X8_S24__N32__N32__C32): Add JIT support for new instruction
kind.
* libguile/vm-engine.c (call-scm<-scmn-scmn): New instruction, takes
arguments as non-immediate offsets, to avoid needless loads and register
pressure.
* module/language/cps/effects-analysis.scm: Add cases for new
primcalls.
* module/language/cps/compile-bytecode.scm (compile-function): Add new
primcalls.
* module/language/cps/reify-primitives.scm (cached-module-box): If the
variable is bound, call lookup-bound-public / lookup-bound-private as
appropriate instead of separately resolving the module, name, and doing
the bound check.
* module/language/tree-il/compile-bytecode.scm (emit-cached-module-box):
Use new instructions.
* module/system/vm/assembler.scm (define-scm<-scmn-scmn-intrinsic):
(lookup-bound-public, lookup-bound-private): Add assembler support.
* doc/ref/vm.texi (Instruction Set, Constant Instructions): Document new
instruction.
* libguile/instructions.c (FOR_EACH_INSTRUCTION_WORD_TYPE): New first
word kind with zi16 operand.
* libguile/jit.c (compile_make_immediate, compile_make_immediate_slow):
New compilers.
(COMPILE_X8_S8_ZI16): New operand kind.
* libguile/vm-engine.c (make-immediate): New instruction.
* module/language/bytecode.scm:
* module/system/vm/assembler.scm (encode-X8_S8_ZI16<-/shuffle):
(signed-bits, load-constant): Support the new instruction kind.
* module/system/vm/disassembler.scm (disassemblers)
(sign-extended-immediate, code-annotation): Support for zi16
operands.
* libguile/jit.c (UNREACHABLE): New register state.
(unreachable): New predicate.
(ASSERT_HAS_REGISTER_STATE): Succeed when unreachable.
(compile_throw, compile_throw_value, compile_throw_value_and_data):
Set unreachable flag.
(compile_receive_values): Reload FP if needed.
* libguile/jit.c (emit_direct_tail_call): Assert self-tail call has
mcode.
(opcodes_seen, bitvector_ref, bitvector_set, compile1): Make the
opcodes_seen set more compact, and log all instruction emissions at
level 3.
(compute_mcode): Don't overwrite mcode if compilation fails.
* libguile/jit.c (compile_alloc_frame, compile_alloc_frame_slow): Move
slow path out of line.
(emit_alloc_frame_for_sp_fast, emit_alloc_frame_for_sp_slow): New
helpers.
(emit_alloc_frame): Refactor to use the new helpers.
(compile_push, compile_push_slow): Use the new helpers.
(compile_assert_nargs_ee_locals, compile_assert_nargs_ee_locals_slow):
Split off a slow path.
* libguile/jit.c (struct pending_reloc): Rename target_vcode_offset
field to target_label_offset.
(inline_label_offset, slow_label_offset): New helpers.
(emit_direct_tail_call): Use inline_label_offset helper.
(add_pending_reloc): Factor out of add_inter_instruction_patch.
(add_inter_instruction_patch): Use inline_label_offset helper.
(add_slow_path_patch): New helper.
(continue_after_slow_path): New helper.
Add slow path compilers for all instructions.
(compile_slow_path): New helper.
(compile): Compile slow paths after main code.
(compute_mcode): Allocate twice as many labels.
* libguile/jit.c (emit_alloc_frame_for_sp):
* libguile/vm-engine.c (ALLOC_FRAME, RESET_FRAME):
* libguile/vm.c (vm_increase_sp, scm_i_vm_prepare_stack):
(return_unused_stack_to_os, vm_expand_stack, alloc_frame):
(scm_call_with_stack_overflow_handler):
* libguile/vm.h (struct scm_vm): Remove sp_min_since_gc handling. It
was a very minor optimization when it was centralized in vm.c, but now
with JIT it's causing too much duplicate code generation.
* libguile/jit.c (scm_jit_compute_mcode): If a caller wants mcode for a
loop but the function already has mcode, instead of punting, just
compile again.
This fixes a bug whereby the compiler would sometimes allocate floats in
marked space.
* libguile/gc-inline.h (scm_inline_gc_malloc_pointerless_words): New
internal helper.
* libguile/intrinsics.h (SCM_FOR_ALL_VM_INTRINSICS):
* libguile/intrinsics.c (allocate_pointerless_words):
(allocate_pointerless_words_with_freelist): New intrinsics.
* libguile/jit.c (compile_allocate_pointerless_words):
(compile_allocate_pointerless_words_immediate): New compilers.
* libguile/vm-engine.c (allocate_pointerless_words)
(allocate_pointerless_words_immediate): New opcodes.
* module/language/cps/compile-bytecode.scm (compile-function):
* module/language/cps/effects-analysis.scm (param):
* module/language/cps/reify-primitives.scm (reify-primitives):
* module/language/cps/specialize-primcalls.scm (specialize-primcalls):
* module/language/cps/types.scm (allocate-words):
(allocate-words/immediate):
* module/system/vm/assembler.scm (system): Add support for the new
opcodes.
* libguile/intrinsics.c (scm_atan1): New intrinsic, wrapping scm_atan.
(scm_bootstrap_intrinsics): Add new intrinsics.
* libguile/intrinsics.h (scm_t_f64_from_f64_f64_intrinsic): New
intrinsic type.
(SCM_FOR_ALL_VM_INTRINSICS): Add intrinsics for floor, ceiling, sin,
cos, tan, asin, acos, atan, and their unboxed counterparts.
* libguile/jit.c (sp_f64_operand): New helper.
(compile_call_f64_from_f64, compile_call_f64_from_f64_f64): Call out
to intrinsics.
* libguile/vm-engine.c (call-f64<-f64-f64): New opcode.
* module/language/cps/effects-analysis.scm: Add new intrinsics.
* module/language/cps/reify-primitives.scm (compute-known-primitives):
Add new intrinsics.
* module/language/cps/slot-allocation.scm (compute-var-representations):
Add 'f64 slot types for the new unboxed intrinsics.
* module/language/cps/specialize-numbers.scm (specialize-operations):
Support unboxing the new intrinsics.
* module/language/cps/types.scm: Define type inferrers for the new
intrinsics.
* module/language/tree-il/cps-primitives.scm: Define CPS translations
for the new intrinsics.
* module/language/tree-il/primitives.scm (*interesting-primitive-names*):
(*effect-free-primitives*, atan): Define primitive resolvers.
* module/system/vm/assembler.scm: Export assemblers for the new
intrinsics.
(define-f64<-f64-f64-intrinsic): New helper.
Some components of this have been wired up for a while; this commit
finishes the compiler, runtime, and JIT support.
* libguile/intrinsics.h (SCM_FOR_ALL_VM_INTRINSICS):
* libguile/intrinsics.c (scm_bootstrap_intrinsics): Declare the new
intrinsics.
* libguile/jit.c (compile_call_f64_from_f64): Define code generators for
the new intrinsics.
* libguile/vm-engine.c (call-f64<-f64): New instruction.
* module/language/cps/effects-analysis.scm:
* module/language/cps/reify-primitives.scm (compute-known-primitives):
* module/language/cps/slot-allocation.scm (compute-var-representations):
* module/language/cps/specialize-numbers.scm (specialize-operations):
* module/language/tree-il/cps-primitives.scm (abs):
* module/system/vm/assembler.scm (system, define-f64<-f64-intrinsic):
(sqrt, abs, fsqrt, fabs):
* module/language/cps/types.scm (fsqrt, fabs): Add new f64<-f64
primitives.
* libguile/jit.c (OLD_FP_FOR_RETURN_TRAMPOLINE): Initialize static const
var from CPP define instead of T0.
(compile_return_values, emit_return_to_interpreter_trampoline): Adapt
to upper-casing.
This change speeds up the indirect branches at return sites by taking
advantage of the CPU's return address stack.
* libguile/jit.c (emit_push_frame): Don't store the mra; we do that via
a trampoline.
(emit_handle_interrupts_trampoline): Take MRA from link register
instead of T0.
(compile_call, compile_call_label): Compute MRA via the new
jmpi_with_link lightening instruction.
(compile_return_values): Return to caller via ret instead of jmp.
(compile_handle_interrupts): Jump to handle-interrupts trampoline via
jmpi_with_link, to provide the MRA.
(initialize_jit): Bless the trampolines so that they are valid
operands to BX on ARM.