* module/language/cps/slot-allocation.scm (lookup-send-parallel-moves):
Rename from `lookup-parallel-moves'.
(lookup-receive-parallel-moves): New function. Now we attach "receive
moves" to call and prompt conts instead of to their continuations.
(compute-shuffles): Refactor to allow a continuation to have both send
and receive shuffles.
(compute-frame-size): Refactor for new shuffles mechanism.
(allocate-slots): Allow calls to proceed directly to kargs.
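
For readers unfamiliar with the term, a "parallel move" or shuffle is a
set of slot-to-slot moves that must all appear to happen at once. The
following is a minimal C sketch of one generic way to turn such a set
into a sequence of ordinary moves, using a scratch slot to break cycles.
It is not the code in slot-allocation.scm; `struct move`,
`sequentialize_moves`, and the `tmp` parameter are hypothetical names,
and the sketch assumes all destinations are distinct.

```c
#include <assert.h>
#include <stddef.h>

/* One move: copy the value in slot `src` to slot `dst`. */
struct move { int src; int dst; };

/* Turn `n` parallel moves (all destinations distinct) into a sequence
   of ordinary moves written to `out`, using slot `tmp` to break cycles.
   `moves` is used as scratch space and is modified.  `out` must have
   room for 2*n entries.  Returns the number of moves written.  */
size_t sequentialize_moves(struct move *moves, size_t n, int tmp,
                           struct move *out) {
  assert(moves != NULL && out != NULL);
  size_t nout = 0, pending = 0;

  /* Drop moves from a slot to itself. */
  for (size_t i = 0; i < n; i++)
    if (moves[i].src != moves[i].dst)
      moves[pending++] = moves[i];

  while (pending > 0) {
    /* A move is ready if no other pending move still reads its dst. */
    size_t ready = pending;
    for (size_t i = 0; i < pending && ready == pending; i++) {
      int blocked = 0;
      for (size_t j = 0; j < pending; j++)
        if (j != i && moves[j].src == moves[i].dst)
          blocked = 1;
      if (!blocked)
        ready = i;
    }

    if (ready < pending) {
      out[nout++] = moves[ready];
      moves[ready] = moves[--pending];
    } else {
      /* Every pending move is blocked, so the rest form cycles.  Save
         one source in tmp and redirect its readers there.  */
      int saved = moves[0].src;
      out[nout++] = (struct move){ saved, tmp };
      for (size_t j = 0; j < pending; j++)
        if (moves[j].src == saved)
          moves[j].src = tmp;
    }
  }
  return nout;
}
```

Rotating three slots with this sketch yields four sequential moves: one
save into the scratch slot followed by three ordinary moves.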
* module/language/cps/graphs.scm (rename-keys, rename-intset)
(rename-graph, compute-reverse-control-flow-order)
(compute-live-variables): Move here from slot-allocation.
* module/language/cps/utils.scm: Remove duplicate compute-idoms
definition.
(compute-defs-and-uses, compute-var-representations): Move here from
slot-allocation.
* module/language/cps/slot-allocation.scm: Move routines out to utils
and graphs.
* module/language/cps.scm:
* module/language/cps/contification.scm:
* module/language/cps/cse.scm:
* module/language/cps/dce.scm:
* module/language/cps/simplify.scm:
* module/language/cps/slot-allocation.scm:
* module/language/cps/types.scm: Allow $kargs to follow $kfun. In that
case, the function must be well-known and callers are responsible for
calling with the appropriate arity.
* module/language/cps/compile-bytecode.scm: Emit "unchecked-arity" for
$kargs following $kfun.
* module/system/vm/assembler.scm: Adapt.
* libguile/intrinsics.c (scm_atan1): New intrinsic, wrapping scm_atan.
(scm_bootstrap_intrinsics): Add new intrinsics.
* libguile/intrinsics.h (scm_t_f64_from_f64_f64_intrinsic): New
intrinsic type.
(SCM_FOR_ALL_VM_INTRINSICS): Add intrinsics for floor, ceiling, sin,
cos, tan, asin, acos, atan, and their unboxed counterparts.
* libguile/jit.c (sp_f64_operand): New helper.
(compile_call_f64_from_f64, compile_call_f64_from_f64_f64): Call out
to intrinsics.
* libguile/vm-engine.c (call-f64<-f64-f64): New opcode.
* module/language/cps/effects-analysis.scm: Add new intrinsics.
* module/language/cps/reify-primitives.scm (compute-known-primitives):
Add new intrinsics.
* module/language/cps/slot-allocation.scm (compute-var-representations):
Add 'f64 slot types for the new unboxed intrinsics.
* module/language/cps/specialize-numbers.scm (specialize-operations):
Support unboxing the new intrinsics.
* module/language/cps/types.scm: Define type inferrers for the new
intrinsics.
* module/language/tree-il/cps-primitives.scm: Define CPS translations
for the new intrinsics.
* module/language/tree-il/primitives.scm (*interesting-primitive-names*):
(*effect-free-primitives*, atan): Define primitive resolvers.
* module/system/vm/assembler.scm: Export assemblers for the new
intrinsics.
(define-f64<-f64-f64-intrinsic): New helper.
Some components of this have been wired up for a while; this commit
finishes the compiler, runtime, and JIT support.
* libguile/intrinsics.h (SCM_FOR_ALL_VM_INTRINSICS):
* libguile/intrinsics.c (scm_bootstrap_intrinsics): Declare the new
intrinsics.
* libguile/jit.c (compile_call_f64_from_f64): Define code generators for
the new intrinsics.
* libguile/vm-engine.c (call-f64<-f64): New instruction.
* module/language/cps/effects-analysis.scm:
* module/language/cps/reify-primitives.scm (compute-known-primitives):
* module/language/cps/slot-allocation.scm (compute-var-representations):
* module/language/cps/specialize-numbers.scm (specialize-operations):
* module/language/tree-il/cps-primitives.scm (abs):
* module/system/vm/assembler.scm (system, define-f64<-f64-intrinsic):
(sqrt, abs, fsqrt, fabs):
* module/language/cps/types.scm (fsqrt, fabs): Add new f64<-f64
primitives.
* module/language/cps/closure-conversion.scm (compute-elidable-closures):
New function.
(convert-one, convert-closures): Add ability to set "self" variable of
$kfun to $f, hopefully avoiding passing that argument in some cases.
* module/language/cps/compile-bytecode.scm (compile-function): Pass the
has-closure? bit on through to the assembler.
* module/system/vm/assembler.scm (begin-standard-arity)
(begin-opt-arity, begin-kw-arity): Only reserve space for the closure
as appropriate.
* module/language/cps/slot-allocation.scm (allocate-args)
(compute-defs-and-uses, compute-needs-slot)
(compute-var-representations): Allow for closure slot allocation
differences.
* module/language/cps/cse.scm (compute-defs):
* module/language/cps/dce.scm (compute-live-code):
* module/language/cps/renumber.scm (renumber, compute-renaming):
(allocate-args):
* module/language/cps/specialize-numbers.scm (compute-significant-bits):
(compute-defs):
* module/language/cps/split-rec.scm (compute-free-vars):
* module/language/cps/types.scm (infer-types):
* module/language/cps/utils.scm (compute-max-label-and-var):
* module/language/cps/verify.scm (check-distinct-vars):
(compute-available-definitions): Allow closure to be #f.
This should reduce frame sizes.
* libguile/vm-engine.c (halt): Adapt to multiple-values change. Also
adapt to not having the boot closure on the stack.
(receive, receive-values, subr-call, foreign-call): Adapt to expect
values one slot down.
(prompt): Capture one less word for the values return.
* libguile/vm.c (vm_dispatch_pop_continuation_hook):
(vm_dispatch_abort_hook): Adapt for where to expect values.
(vm_builtin_values_code): Add a call to shuffle-down before
returning. This is more overhead than what existed before, but the
hope is that the savings elsewhere pay off.
(vm_builtin_values_code): Adapt to different values location.
(reinstate_continuation_x, compose_continuation): Adapt to place
resume args at right position.
(capture_delimited_continuation): Remove unused sp and ip arguments.
(abort_to_prompt): Adapt to capture_delimited_continuation change.
(scm_call_n): Adapt to not reserve space for the boot closure.
* module/language/cps/compile-bytecode.scm (compile-function): When
returning values, adapt reset-frame call for return calling convention
change. Adapt truncating or rest returns to expect values in the
right place.
* module/language/cps/slot-allocation.scm (compute-shuffles):
(allocate-lazy-vars, allocate-slots): Allocate values from the "proc
slot", not proc-slot + 1.
* module/system/vm/assembler.scm (emit-init-constants): Reset the frame
before returning so that the return value is in the right place.
* test-suite/tests/rtl.test: Update for return convention change.
* libguile/foreign.c (get_foreign_stub_code): Update for return calling
convention change.
* libguile/frames.h: Add machine return address to diagram.
(SCM_FRAME_MACHINE_RETURN_ADDRESS):
(SCM_FRAME_SET_MACHINE_RETURN_ADDRESS): New macros.
(SCM_FRAME_PREVIOUS_SP):
(SCM_FRAME_DYNAMIC_LINK):
(SCM_FRAME_SET_DYNAMIC_LINK): Adapt for new frame size.
* libguile/vm-engine.c (halt): Set frame size to 3.
(call, call-label): Set mRA to 0.
* libguile/vm.c (push_interrupt_frame, reinstate_continuation_x):
(scm_call_n): Set frame size to 3. In push_interrupt_frame, init the
mRA of the frame.
(vm_builtin_call_with_values_code, vm_handle_interrupt_code): Allocate
larger frames.
* module/language/cps/slot-allocation.scm (allocate-slots): Frame size
is 3.
* module/system/vm/disassembler.scm (define-clobber-parser): Bump frame
size.
* libguile/frames.c (scm_frame_return_address): Use
SCM_FRAME_VIRTUAL_RETURN_ADDRESS.
(scm_c_frame_previous): Likewise.
* libguile/frames.h: Update diagram for new names.
(union scm_vm_stack_element): Rename "as_ip" to "as_vcode", and
add "as_mcode" for machine code pointers.
(SCM_FRAME_VIRTUAL_RETURN_ADDRESS)
(SCM_FRAME_SET_VIRTUAL_RETURN_ADDRESS): Rename to these, from
SCM_FRAME_RETURN_ADDRESS and SCM_FRAME_SET_RETURN_ADDRESS.
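
As a rough picture of the renaming, the sketch below shows a stack
element union with separate members for virtual (bytecode) and machine
code return addresses. It is an illustration only: the union name, the
`as_uint` member, and the `uint8_t *` element type for `as_mcode` are
assumptions, and the real union in frames.h has other members.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only; not the definition from libguile/frames.h. */
union stack_element_sketch {
  uintptr_t as_uint;    /* hypothetical: raw word view */
  uint32_t *as_vcode;   /* virtual return address: pointer into bytecode */
  uint8_t *as_mcode;    /* machine return address (element type assumed) */
};

/* Each view is expected to occupy exactly one stack word on the
   platforms this sketch targets.  */
static_assert(sizeof(union stack_element_sketch) == sizeof(uintptr_t),
              "stack element should be one word");
```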
* libguile/vm-engine.c (halt, call, call-label, return-values)
(return-from-interrupt): Adapt to renamings. Make "halt" have frame
size as a parameter.
* libguile/vm.c (scm_i_vm_mark_stack): Adapt to renaming.
(push_interrupt_frame): Take mRA as additional argument. In the future
we will set it as the frame mRA.
(capture_continuation): Adapt to renaming.
(scm_call_n): Adapt to renaming and make frame size adjustable.
(push_interrupt_frame, reinstate_continuation_x): Make frame size
adjustable.
* module/language/cps/slot-allocation.scm (allocate-slots): Make frame
size adjustable.
* libguile/intrinsics.h (scm_t_thread_mra_intrinsic): New type; use for
push_interrupt_frame.
(scm_t_thread_u8_scm_sp_vra_intrinsic): Rename from the type of the
same name with "ra" in place of "vra", and change type to uint32_t*.
* module/system/vm/disassembler.scm (define-clobber-parser):
Parameterize clobber set for calls by frame size.
* module/language/cps/types.scm (annotation->type):
* module/language/cps/effects-analysis.scm (annotation->memory-kind):
Add case for string memory kinds. Remove special type and effect
inferrers for string-length.
* module/language/cps/slot-allocation.scm (compute-var-representations):
Remove string-length.
* module/language/tree-il/compile-cps.scm (ensure-string): New helper.
(string-length): Add custom converter.
* module/language/cps/effects-analysis.scm:
* module/language/cps/reify-primitives.scm (reify-primitives):
* module/language/cps/slot-allocation.scm (compute-var-representations):
* module/language/cps/types.scm (assume-u64, assume-s64): Add primitives
that assume the range of a u64 or s64 value is within certain bounds.
This is useful when extracting e.g. a length from a 64-bit word when
you know the length is less than 2**48.
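
To make the use case concrete, here is a small C sketch of a range
assumption on a length extracted from a 64-bit word. The header layout
(16 tag bits below a 48-bit length), `extract_length`,
`assume_u64_range`, and `checked_length` are all hypothetical; in this
sketch the assumption is checked with an assert, whereas the primitives
described above only inform range inference.

```c
#include <assert.h>
#include <stdint.h>

#define LENGTH_BITS 48  /* assumption for illustration */
#define LENGTH_MAX ((UINT64_C(1) << LENGTH_BITS) - 1)

/* Hypothetical layout: low 16 bits are a tag, high 48 bits a length. */
static uint64_t extract_length(uint64_t header) {
  return header >> 16;
}

/* Stand-in for assume-u64: returns the value unchanged and records
   (here, checks) that it lies within [lo, hi].  */
static uint64_t assume_u64_range(uint64_t v, uint64_t lo, uint64_t hi) {
  assert(lo <= v && v <= hi);
  return v;
}

uint64_t checked_length(uint64_t header) {
  return assume_u64_range(extract_length(header), 0, LENGTH_MAX);
}
```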
* module/language/cps/slot-allocation.scm (add-prompt-control-flow-edges):
Allow for terms that don't continue, and add them to the minimal
prompt control flow edges set.
* module/language/cps.scm ($branch): Refactor to be its own CPS term
type, not relying on $continue to specify a continuation (which before
was only for the false case) or a source location. Update all
callers.
* module/language/cps/compile-bytecode.scm (compile-function)
(emit-bytecode):
* module/language/cps/slot-allocation.scm (allocate-slots):
* module/language/cps/optimize.scm (cps-default-optimization-options):
Allow the "lazy vars" optimization, a form of slot precoloring, to be
disabled. It will be disabled at -O0 or -O1, to speed compilation
times.
* module/language/cps/slot-allocation.scm
(compute-reverse-control-flow-order): For graphs without back-edges,
use a simplified computation of reverse control flow order.
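
One way such a simplification could work is sketched below in C. This
is a guess at the idea, not the Scheme implementation. It assumes
labels are numbered 0..n-1 in a topological order whenever the graph is
acyclic, so that any edge whose source is not smaller than its target
is a back-edge, and descending label order is then a valid reverse
control flow order. `struct edge`, `has_back_edge`, and
`simple_reverse_order` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A control flow edge from label `from` to label `to`. */
struct edge { int from; int to; };

/* Under the numbering assumption above, an edge that does not go to a
   strictly larger label is a back-edge.  */
bool has_back_edge(const struct edge *edges, size_t nedges) {
  for (size_t i = 0; i < nedges; i++)
    if (edges[i].from >= edges[i].to)
      return true;
  return false;
}

/* On success, order[label] is the position of `label` in reverse
   control flow order.  Returns false if the graph has a back-edge, in
   which case the caller must fall back to a general traversal.  */
bool simple_reverse_order(const struct edge *edges, size_t nedges,
                          int nlabels, int *order) {
  assert(order != NULL);
  if (has_back_edge(edges, nedges))
    return false;
  for (int label = 0; label < nlabels; label++)
    order[label] = nlabels - 1 - label;
  return true;
}
```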
* module/language/cps/compile-bytecode.scm (compile-function): Remove
helper to look up constants now that primcalls can take parameters.
* module/language/cps/devirtualize-integers.scm (peel-trace): Remove
extra argument to expression-effects.
* module/language/cps/effects-analysis.scm (constant?, indexed-field):
Remove unused definitions.
(expression-effects): Remove "constants" argument; constants come from
primcall params.
(compute-effects): Don't compute a constants table.
* module/language/cps/slot-allocation.scm ($allocation): Remove
"constant-values" field.
(lookup-constant-value, lookup-maybe-constant-value): Remove; unused.
(allocate-slots): Don't create a constants table.
* module/language/cps/specialize-primcalls.scm
(compute-defining-expressions, compute-constant-values): Move these
definitions here from utils.scm.
* module/language/cps/utils.scm: Remove moved definitions.
* module/language/cps/primitives.scm (*macro-instruction-arities*):
Declare new u64->s64, s64->u64, sadd, ssub, smul, sadd/immediate,
ssub/immediate, smul/immediate, slsh, and slsh/immediate primcalls
that don't have corresponding VM instructions.
* module/language/cps/effects-analysis.scm: The new instructions are
effect-free.
* module/language/cps/reify-primitives.scm (wrap-unary, wrap-binary):
(wrap-binary/exp, reify-primitives): Add horrible code that turns
e.g. sadd into a series of s64->u64, uadd, and then u64->s64. This
way we keep our ability to do range inference on unboxed signed
arithmetic, but we still bottom out to the same instructions for both
unboxed signed and unboxed unsigned arithmetic.
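
The lowering is sound because two's-complement addition produces the
same bits whether the operands are read as signed or unsigned. A
minimal C sketch of that equivalence follows; the function names mirror
the primcalls but are hypothetical and are not Guile API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(int64_t) == sizeof(uint64_t),
              "reinterpretation needs equal sizes");

/* Reinterpret the bits; no value conversion happens here. */
static uint64_t s64_to_u64(int64_t x) {
  uint64_t u;
  memcpy(&u, &x, sizeof u);
  return u;
}

static int64_t u64_to_s64(uint64_t x) {
  int64_t s;
  memcpy(&s, &x, sizeof s);
  return s;
}

/* Unsigned addition wraps modulo 2^64. */
static uint64_t uadd(uint64_t a, uint64_t b) {
  return a + b;
}

/* Signed add expressed as s64->u64, uadd, u64->s64. */
int64_t sadd_via_uadd(int64_t a, int64_t b) {
  return u64_to_s64(uadd(s64_to_u64(a), s64_to_u64(b)));
}
```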
* module/language/cps/types.scm: Add type inferrers for new
instructions. Remove type checkers for some effect-free primitives.
* module/language/cps/compile-bytecode.scm (compile-function): Add
pseudo-emitter for u64->s64 and s64->u64 no-ops.
* module/language/cps/slot-allocation.scm (compute-var-representations):
If an optimization pass decided to e.g. use untag-fixnum for one
definition of a variable and e.g. vector-length for the other, assume
that their values are compatible. We don't know at this point whether
the values are meant to be s64 (e.g. because vector-length is a subset
of the s64 range) or u64 (e.g. because although we're calling
untag-fixnum on the value, actually we know that the value is
non-negative, or actually we just want the unsigned bits). Anyway we
default to u64. In the future we can perhaps be more precise.
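
The rule described above can be pictured as a merge function over
variable representations. The C sketch below is an illustration under
assumptions: the `enum repr` values and `merge_repr` are hypothetical,
and the only behaviour taken from the text is that an s64/u64
disagreement resolves to u64.

```c
#include <assert.h>
#include <stddef.h>

enum repr { REPR_SCM, REPR_F64, REPR_U64, REPR_S64 };

/* Merge the representations chosen by two definitions of the same
   variable.  Sets *ok to 0 if they are incompatible, in which case the
   return value is meaningless.  */
enum repr merge_repr(enum repr a, enum repr b, int *ok) {
  assert(ok != NULL);
  *ok = 1;
  if (a == b)
    return a;

  int a_int = (a == REPR_U64 || a == REPR_S64);
  int b_int = (b == REPR_U64 || b == REPR_S64);
  if (a_int && b_int)
    return REPR_U64;  /* s64 vs. u64: assume compatible, default to u64 */

  *ok = 0;
  return a;
}
```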
* libguile/vm-engine.c (srsh, srsh/immediate): New instructions.
* module/language/cps/compile-bytecode.scm (compile-function):
* module/language/cps/effects-analysis.scm:
* module/language/cps/reify-primitives.scm (reify-primitives):
* module/language/cps/slot-allocation.scm (compute-var-representations):
* module/language/cps/specialize-primcalls.scm (specialize-primcalls):
* module/language/cps/types.scm (srsh, srsh/immediate):
* module/system/vm/assembler.scm: Add support for new instructions.
* module/language/cps/types.scm (ulsh, ursh): Remove type checkers, as
these are effect-free. Limit range of ursh count.
* module/language/cps/compile-bytecode.scm (compile-function): Update
add/immediate, etc.
* module/language/cps/slot-allocation.scm (compute-needs-slot):
Simplify.
* module/language/cps/specialize-primcalls.scm (specialize-primcalls):
Rework for add/immediate, etc.
* module/language/cps/types.scm (define-unary-result!)
(define-binary-result!): Take types as params instead of variables, so
we can share this code with /imm variants.
(add/immediate, sub/immediate, uadd/immediate, usub/immediate)
(umul/immediate, ulsh/immediate, ursh/immediate): Update type
inferrers.
* module/language/cps/closure-conversion.scm (convert-one):
* module/language/cps/compile-bytecode.scm (compile-function):
* module/language/cps/effects-analysis.scm (define-primitive-effects*)
(expression-effects, primitive-effects): Only fall back to passing
constant table if the immediate parameter is false. Adapt closure
effects analysis.
* module/language/cps/slot-allocation.scm (compute-needs-slot): Remove
special cases for free-ref/free-set!.
* module/language/cps/compile-bytecode.scm (compile-function): Make
load-f64, load-s64, and load-u64 take an immediate parameter instead
of a CPS value.
* module/language/cps/effects-analysis.scm: Remove CPS argument from
immediate load instructions.
* module/language/cps/slot-allocation.scm (compute-needs-slot): Remove
special case for load-64 etc.
* module/language/cps/specialize-numbers.scm
(specialize-u64-scm-comparison): Adapt.
* module/language/cps/specialize-primcalls.scm (specialize-primcalls):
Adapt.
* module/language/cps/types.scm (define-type-inferrer*): Also take param
argument.
(define-type-inferrer, define-predicate-inferrer): Adapt.
(define-type-inferrer/param): New helper.
(load-f64, load-s64, load-u64): Adapt inferrers to pass on value from
param.
* module/language/cps/utils.scm (compute-constant-values): Adapt.
* module/language/cps.scm ($primcall): Add "param" member, which will be
a constant parameter to the primcall. The idea is that constants used
by primcalls as immediates don't need to participate in optimizations
in any way -- they should not participate in CSE, have the same
lifetime as the primcall so not part of DCE either, and don't need
slot allocation. Indirecting them through a named $const binding is
complication for no benefit. This change should eventually improve
compilation time and memory usage, once we fully take advantage of it,
as the number of labels and variables will go down.
* module/language/cps/closure-conversion.scm:
* module/language/cps/compile-bytecode.scm:
* module/language/cps/constructors.scm:
* module/language/cps/contification.scm:
* module/language/cps/cse.scm:
* module/language/cps/dce.scm:
* module/language/cps/effects-analysis.scm:
* module/language/cps/elide-values.scm:
* module/language/cps/handle-interrupts.scm:
* module/language/cps/licm.scm:
* module/language/cps/peel-loops.scm:
* module/language/cps/prune-bailouts.scm:
* module/language/cps/prune-top-level-scopes.scm:
* module/language/cps/reify-primitives.scm:
* module/language/cps/renumber.scm:
* module/language/cps/rotate-loops.scm:
* module/language/cps/self-references.scm:
* module/language/cps/simplify.scm:
* module/language/cps/slot-allocation.scm:
* module/language/cps/specialize-numbers.scm:
* module/language/cps/specialize-primcalls.scm:
* module/language/cps/split-rec.scm:
* module/language/cps/type-checks.scm:
* module/language/cps/type-fold.scm:
* module/language/cps/types.scm:
* module/language/cps/utils.scm:
* module/language/cps/verify.scm:
* module/language/tree-il/compile-cps.scm: Adapt all users.
* module/language/cps/compile-bytecode.scm (compile-function):
* module/language/cps/slot-allocation.scm ($allocation)
(lookup-nlocals, compute-frame-size, allocate-slots): Adapt to
have one frame size per function, for all clauses.
* module/language/cps/slot-allocation.scm
(add-prompt-control-flow-edges): Fix to add links from prompt bodies
to handlers, even in cases where the handler can reach the body but
the body can't reach the handler.
* test-suite/tests/compiler.test ("prompt body slot allocation"): Add
test case.