* libguile/gc-inline.h:
* libguile/threads.h (SCM_INLINE_GC_GRANULE_WORDS)
(SCM_INLINE_GC_GRANULE_BYTES, SCM_INLINE_GC_FREELIST_COUNT): Move
definitions here, from gc-inline.h.
(struct scm_thread): Inline freelist vectors.
* libguile/threads.c (thread_mark): Update marker for pointerless
freelists.
(on_thread_exit): Clear individual freelist entries, instead of the
vector as a whole.
(guilify_self_2): No need to alloc freelist vectors.
* libguile/threads.h (struct scm_thread): Move async-related bits up, so
that the VM can access them more easily. Likewise for freelists (which
we will inline soon).
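For orientation, the shape this gives struct scm_thread is roughly the
following (a sketch; the field list is abbreviated and the macro values
shown are illustrative, so check the real threads.h):

    #define SCM_INLINE_GC_GRANULE_WORDS 2
    #define SCM_INLINE_GC_GRANULE_BYTES \
      (sizeof (void *) * SCM_INLINE_GC_GRANULE_WORDS)
    #define SCM_INLINE_GC_FREELIST_COUNT \
      (256 / SCM_INLINE_GC_GRANULE_BYTES)

    struct scm_thread
    {
      /* ... async-related bits first, so the VM can reach them ... */

      /* Inline freelist vectors, no longer separately allocated: */
      void *freelists[SCM_INLINE_GC_FREELIST_COUNT];
      void *pointerless_freelists[SCM_INLINE_GC_FREELIST_COUNT];

      /* ... */
    };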
* libguile/jit.c (OLD_FP_FOR_RETURN_TRAMPOLINE): Initialize static const
var from CPP define instead of T0.
(compile_return_values, emit_return_to_interpreter_trampoline): Adapt
to upper-casing.
This change speeds up the indirect branches at return sites by taking
advantage of the CPU's return address stack.
* libguile/jit.c (emit_push_frame): Don't store the mra; we do that via
a trampoline.
(emit_handle_interrupts_trampoline): Take MRA from link register
instead of T0.
(compile_call, compile_call_label): Compute MRA via the new
jmpi_with_link lightening instruction.
(compile_return_values): Return to caller via ret instead of jmp.
(compile_handle_interrupts): Jump to handle-interrupts trampoline via
jmpi_with_link, to provide the MRA.
(initialize_jit): Bless the trampolines so that they are valid
operands to BX on ARM.
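In lightening terms, the shape of the change is roughly this (a sketch
with invented helper names and assumed operand types; the real emitters
are the jit.c functions named above):

    #include <lightening.h>

    /* Call site: instead of storing the mra into the frame, branch and
       let jmpi_with_link produce it. */
    static void
    sketch_call (jit_state_t *j, jit_pointer_t callee_mcode)
    {
      jit_jmpi_with_link (j, callee_mcode);  /* sets the link register */
    }

    /* Return site: a real ret, which the CPU's return address stack
       can predict, instead of an indirect jmp through a register. */
    static void
    sketch_return (jit_state_t *j)
    {
      jit_ret (j);
    }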
The existing calli / callr interface is for ABI calls. Sometimes,
though, you want to call some of your own code, just to get the current
return address. ARM's branch-and-link instructions are ideal for this,
but they don't exist on x86; there we emulate them with corresponding
pop_link_register / push_link_register instructions that are no-ops on
ARM.
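Concretely, a callee entered through jmpi_with_link brackets its body
like this (a sketch):

    #include <lightening.h>

    static void
    sketch_callee (jit_state_t *j)
    {
      /* On x86 the emulated branch-and-link pushed the return address,
         so pop it into JIT_LR on entry; on ARM it is already in the
         link register and this is a no-op. */
      jit_pop_link_register (j);

      /* ... body; JIT_LR holds the machine return address ... */

      /* Put it back where ret expects it (again a no-op on ARM). */
      jit_push_link_register (j);
      jit_ret (j);
    }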
* lightening.h (FOR_EACH_INSTRUCTION): Add jit_jmpi_with_link,
pop_link_register, push_link_register.
* lightening/arm-cpu.c:
* lightening/x86-cpu.c:
* lightening/aarch64-cpu.c (jmpi_with_link, push_link_register)
(pop_link_register): Add implementations.
* lightening/arm.h:
* lightening/aarch64.h:
* lightening/x86.h (JIT_LR): New definition.
* tests/link-register.c: New test.
This patch is a bit unfortunate, in the sense that it exposes some of
the JIT guts to the rest of the VM. Code needs to treat "machine return
addresses" as valid if non-NULL (as before) and also not equal to a
tier-down trampoline. This is because tier-down at a return needs the
old frame pointer to load the "virtual return address", and the way this
patch works is that it passes the vra in a well-known register. It's a
custom calling convention for a certain kind of return.
* libguile/jit.h (scm_jit_return_to_interpreter_trampoline): New
internal global.
* libguile/jit.c (scm_jit_clear_mcode_return_addresses): Move here,
from vm.c. Instead of zeroing return addresses, set them to the
return-to-interpreter trampoline.
* libguile/vm-engine.c (return-values): Don't enter mcode if the mra is
scm_jit_return_to_interpreter_trampoline.
* libguile/vm.c (capture_continuation): Treat the tier-down trampoline
as NULL.
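The resulting rule for consumers of machine return addresses, sketched
as a predicate (the helper name is invented, and the void * type is
used here for brevity; scm_jit_return_to_interpreter_trampoline is the
new internal global from jit.h):

    #include <stddef.h>

    extern void *scm_jit_return_to_interpreter_trampoline;

    static int
    mra_is_usable (void *mra)
    {
      return mra != NULL
        && mra != scm_jit_return_to_interpreter_trampoline;
    }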
* libguile/bytevectors.c (INTEGER_ACCESSOR_PROLOGUE)
(scm_bytevector_copy_x, bytevector_large_set): Rewrite checks to reliably
detect overflows.
(make_bytevector): Constrain the bytevector length to avoid later
overflows during allocation.
(make_bytevector_from_buffer): Fix indentation.
(scm_bytevector_length): Use 'scm_from_size_t' to convert a 'size_t',
not 'scm_from_uint'.
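The rewritten checks above follow the usual discipline of comparing
against the limit before the arithmetic, rather than testing a result
that may already have wrapped. A minimal, self-contained sketch of the
idea (names invented):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Compute len * elt_size, refusing instead of wrapping around. */
    static bool
    mul_size_checked (size_t len, size_t elt_size, size_t *out)
    {
      if (elt_size != 0 && len > SIZE_MAX / elt_size)
        return false;             /* would overflow */
      *out = len * elt_size;
      return true;
    }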
* libguile/fports.c (fport_seek): Check for overflow before the implicit
conversion of the return value.
* libguile/guardians.c (guardian_print): Use 'scm_from_ulong' to convert
an 'unsigned long', not 'scm_from_uint'.
* libguile/ports.c (scm_unread_string): Change a variable to type 'size_t'.
(scm_seek, scm_truncate_file): Use 'scm_t_off' instead of
'off_t_or_off64_t' to avoid implicit type conversions that could
overflow, because 'ptob->seek' and 'ptob->truncate' use 'scm_t_off'.
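The fport_seek fix applies the same discipline to conversions: check
the wide value against the narrow type's range before the implicit
conversion happens. A sketch, with long standing in for scm_t_off and
the helper name invented:

    #include <stdint.h>
    #include <limits.h>

    /* Return 0 and set *out only if `wide` fits; never silently
       truncate. */
    static int
    narrow_offset (int64_t wide, long *out)
    {
      if (wide < LONG_MIN || wide > LONG_MAX)
        return -1;                /* overflow: signal an error instead */
      *out = (long) wide;
      return 0;
    }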
* libguile/r6rs-ports.c (bytevector_input_port_seek)
(custom_binary_port_seek, bytevector_output_port_seek): Rewrite offset
calculations to reliably detect overflows. Use 'scm_from_off_t' to
convert a 'scm_t_off', not 'scm_from_long' nor 'scm_from_int'.
(scm_get_bytevector_n_x, scm_get_bytevector_all, scm_unget_bytevector)
(bytevector_output_port_write): Rewrite checks to reliably detect
overflows. Use 'size_t' where appropriate.
(bytevector_output_port_buffer_grow): Rewrite size calculations to
reliably detect overflows. Minor change in the calculation of the new
size: now it is max(min_size, 2*current_size), whereas previously it
would multiply current_size by the smallest power of 2 needed to surpass
min_size.
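The new growth rule, with the doubling itself guarded against
wraparound, can be sketched as:

    #include <stddef.h>
    #include <stdint.h>

    /* new_size = max (min_size, 2 * current_size), saturating the
       doubling instead of letting it wrap.  Illustrative only. */
    static size_t
    grown_size (size_t current_size, size_t min_size)
    {
      size_t doubled =
        current_size <= SIZE_MAX / 2 ? 2 * current_size : SIZE_MAX;
      return doubled > min_size ? doubled : min_size;
    }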
* libguile/strings.c (make_stringbuf): Constrain the stringbuf length to
avoid later overflows during allocation.
(scm_string_append): Change overflow check to use INT_ADD_OVERFLOW.
* libguile/strports.c (string_port_write): Rewrite size calculations to
reliably detect overflows.
(string_port_seek): Rewrite offset calculations to reliably detect
overflows. Use 'scm_from_off_t' to convert a 'scm_t_off', not
'scm_from_long'.
(string_port_truncate): Use 'scm_from_off_t' to convert a 'scm_t_off',
not 'scm_from_off_t_or_off64_t'.
* libguile/vectors.c (scm_c_make_vector): Change a variable to type
'size_t'.
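INT_ADD_OVERFLOW is Gnulib's intprops.h macro; the scm_string_append
style of check looks roughly like this (the helper wrapped around it is
invented):

    #include <stddef.h>
    #include <libguile.h>         /* scm_num_overflow */
    #include "intprops.h"         /* Gnulib: INT_ADD_OVERFLOW */

    /* Grow a running length total, erroring out before the sum can
       wrap. */
    static size_t
    add_len (size_t total, size_t len)
    {
      if (INT_ADD_OVERFLOW (total, len))
        scm_num_overflow ("string-append");
      return total + len;
    }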
* module/language/cps/closure-conversion.scm (compute-elidable-closures):
New function.
(convert-one, convert-closures): Add ability to set "self" variable of
$kfun to #f, avoiding passing that argument in some cases.
* module/language/cps/compile-bytecode.scm (compile-function): Pass the
has-closure? bit on through to the assembler.
* module/system/vm/assembler.scm (begin-standard-arity)
(begin-opt-arity, begin-kw-arity): Only reserve space for the closure
as appropriate.
* module/language/cps/slot-allocation.scm (allocate-args)
(compute-defs-and-uses, compute-needs-slot)
(compute-var-representations): Allow for closure slot allocation
differences.
* module/language/cps/cse.scm (compute-defs):
* module/language/cps/dce.scm (compute-live-code):
* module/language/cps/renumber.scm (renumber, compute-renaming)
(allocate-args):
* module/language/cps/specialize-numbers.scm (compute-significant-bits)
(compute-defs):
* module/language/cps/split-rec.scm (compute-free-vars):
* module/language/cps/types.scm (infer-types):
* module/language/cps/utils.scm (compute-max-label-and-var):
* module/language/cps/verify.scm (check-distinct-vars)
(compute-available-definitions): Allow closure to be #f.
* libguile/jit.c (compile_alloc_frame): Stop initializing locals.
(compile_bind_rest): Use emit_alloc_frame.
* libguile/vm-engine.c (assert_nargs_ee_locals, allocate_frame): Don't
initialize locals.
(bind_rest): Don't initialize locals, and assert that the locals count
has a minimum.
* libguile/intrinsics.h (SCM_FOR_ALL_VM_INTRINSICS):
* libguile/intrinsics.c (scm_bootstrap_intrinsics): Remove the atomic
intrinsics, as the JIT no longer needs them.
* libguile/vm-engine.c (VM_NAME): Inline the intrinsics.
These operations emit the same code that GCC does for corresponding
operations under the sequential consistency memory model. It would be
possible to relax to acquire/release or something in the future.
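At the C level, what the JIT now emits inline corresponds to the
C11/GCC sequentially consistent atomics; for example:

    #include <stdatomic.h>

    static _Atomic (void *) box;

    /* seq_cst is the default memory order in C11, matching what the
       JIT emits inline for these operations. */
    void *
    box_swap (void *v)
    {
      return atomic_exchange (&box, v);
    }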
* libguile/atomics-internal.h (scm_atomic_compare_and_swap_scm): Change
to use same API as atomic-box-compare-and-swap!.
* libguile/intrinsics.c (atomic_compare_and_swap_scm): Remove loop, as
we're using the strong variant now.
* libguile/async.c (scm_i_async_pop, scm_i_async_push): Adapt to new API
of scm_atomic_compare_and_swap_scm.
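The new API shape, like atomic-box-compare-and-swap!, returns the value
actually observed at the location; a strong CAS gives that directly,
with no retry loop. A sketch using void * in place of SCM:

    #include <stdatomic.h>

    static void *
    compare_and_swap (_Atomic (void *) *loc, void *expected,
                      void *desired)
    {
      /* On failure the strong variant stores the observed value into
         `expected`; on success `expected` is unchanged.  Either way,
         it is the value that was found at *loc. */
      atomic_compare_exchange_strong (loc, &expected, desired);
      return expected;
    }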
* libguile/i18n.c (SCM_MAX_ALLOCA): New macro.
(SCM_STRING_TO_U32_BUF): Accept an additional variable to remember
whether we used malloc to allocate the buffer. Use malloc if the
allocation size is greater than SCM_MAX_ALLOCA.
(SCM_CLEANUP_U32_BUF): New macro.
(compare_u32_strings, compare_u32_strings_ci, str_to_case): Adapt.
* libguile/strings.c (SCM_MAX_ALLOCA): New macro.
(normalize_str, unistring_escapes_to_r6rs_escapes): Use malloc if the
allocation size is greater than SCM_MAX_ALLOCA.
* test-suite/tests/i18n.test, test-suite/tests/strings.test: Add tests.
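The pattern in both files is the same: stack-allocate small buffers,
heap-allocate large ones, and remember which was done so cleanup can
free conditionally. A self-contained sketch (the 8192-byte threshold
and the function are illustrative):

    #include <stdlib.h>
    #include <string.h>
    #include <alloca.h>           /* glibc; Guile has its own portability glue */

    #define MAX_ALLOCA 8192       /* stands in for SCM_MAX_ALLOCA */

    static size_t
    count_nonzero (const char *src, size_t size)
    {
      int used_malloc = 0;
      char *buf;
      size_t n = 0;

      if (size <= MAX_ALLOCA)
        buf = alloca (size);      /* small: cheap stack allocation */
      else
        {
          buf = malloc (size);    /* large: avoid blowing the stack */
          if (!buf)
            return 0;
          used_malloc = 1;
        }

      memcpy (buf, src, size);
      for (size_t i = 0; i < size; i++)
        if (buf[i])
          n++;

      if (used_malloc)
        free (buf);
      return n;
    }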