* libguile/instructions.c (FOR_EACH_INSTRUCTION_WORD_TYPE): Add word
types for immediate f64 and u64 values.
(TYPE_WIDTH): Bump up by a bit, now that we have 32 word types.
(NOP, parse_instruction): Use 64-bit meta type.
* libguile/vm-engine.c (load-f64, load-u64): New instructions.
* module/language/bytecode.scm (compute-instruction-arity): Add parser
for new instruction word types.
* module/language/cps/compile-bytecode.scm (compile-function): Add
special-cased assemblers for new instructions, and also for scm->u64
and u64->scm which I missed before.
* module/language/cps/effects-analysis.scm (load-f64, load-u64): New
instructions.
* module/language/cps/slot-allocation.scm (compute-needs-slot): load-f64
and load-u64 don't need slots.
(compute-var-representations): Update for new instructions.
* module/language/cps/specialize-primcalls.scm (specialize-primcalls):
Specialize scm->f64 and scm->u64 to load-f64 and load-u64.
* module/language/cps/types.scm (load-f64, load-u64): Wire up to type
inference, though currently type inference only runs before
specialization.
* module/language/cps/utils.scm (compute-defining-expressions): For some
reason I don't understand, it's possible to see two definitions that
are equal but not equal? here. Allow for now.
(compute-constant-values): Punch through type conversions to get
constant u64/f64 values.
* module/system/vm/assembler.scm (assembler): Support for new word
types. Export the new assemblers.
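The new load-f64 and load-u64 instructions carry their 64-bit constant inline, in the instruction stream. The following is a minimal sketch of packing such an immediate into 32-bit instruction words. It assumes 32-bit words with the high half first; split_u64, join_u64, split_f64 and join_f64 are hypothetical helpers, not the libguile or assembler code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split a 64-bit immediate into two 32-bit instruction words,
   high half first (an assumed order), and join it back together. */
static void split_u64(uint64_t value, uint32_t words[2])
{
  words[0] = (uint32_t) (value >> 32);
  words[1] = (uint32_t) (value & 0xffffffffu);
}

static uint64_t join_u64(const uint32_t words[2])
{
  return ((uint64_t) words[0] << 32) | words[1];
}

/* An f64 immediate is carried as its raw IEEE 754 bit pattern; memcpy
   copies the bits without any numeric conversion. */
static void split_f64(double value, uint32_t words[2])
{
  uint64_t bits;
  assert(sizeof bits == sizeof value);
  memcpy(&bits, &value, sizeof bits);
  split_u64(bits, words);
}

static double join_f64(const uint32_t words[2])
{
  uint64_t bits = join_u64(words);
  double value;
  memcpy(&value, &bits, sizeof value);
  return value;
}
```

Copying bits with memcpy, rather than casting the value, keeps the round trip exact for every double, including negative zero and NaN payloads.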
* libguile/vm-engine.c (add/immediate, sub/immediate)
(uadd/immediate, usub/immediate, umul/immediate): New instructions.
* module/language/cps/compile-bytecode.scm (compile-function):
* module/language/cps/slot-allocation.scm (compute-needs-slot):
* module/language/cps/types.scm:
* module/system/vm/assembler.scm (system):
* module/language/cps/effects-analysis.scm: Support
for new instructions.
* module/language/cps/optimize.scm (optimize-first-order-cps): Move
primcall specialization to the last step -- the only benefit of doing
it earlier was easier reasoning about side effects, and we're already
doing that in a more general way with (language cps types).
* module/language/cps/specialize-primcalls.scm (specialize-primcalls):
Specialize add and sub to add/immediate and sub/immediate, and
specialize u64 addition as well. U64 specialization doesn't work yet,
though, because computing constant values doesn't work for u64 values.
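The add/sub specialization above amounts to a check on the constant operand. The following is a minimal sketch, assuming the /immediate variants take an unsigned 8-bit immediate. The opcode names and the specialize_add_sub helper are hypothetical; the real pass lives in specialize-primcalls.scm and works on CPS terms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcodes mirroring the ChangeLog entry. */
enum op { OP_ADD, OP_SUB, OP_ADD_IMM, OP_SUB_IMM };

struct insn {
  enum op op;
  int dst, a;
  int b;        /* second operand: a slot index, or an immediate */
};

/* Assumption: the immediate operand is an unsigned 8-bit field. */
static bool fits_u8(int64_t c) { return c >= 0 && c <= UINT8_MAX; }

/* Specialize (add a c) or (sub a c), where c lives in slot c_slot, to
   the /immediate form when c is a known constant that fits in the
   operand field; otherwise keep the generic instruction. */
static struct insn specialize_add_sub(enum op op, int dst, int a, int c_slot,
                                      bool c_known, int64_t c)
{
  assert(op == OP_ADD || op == OP_SUB);
  if (c_known && fits_u8(c))
    return (struct insn){ op == OP_ADD ? OP_ADD_IMM : OP_SUB_IMM,
                          dst, a, (int) c };
  return (struct insn){ op, dst, a, c_slot };
}
```

Constants that are unknown or out of range fall back to the generic instruction, which reads its second operand from a slot.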
* module/language/tree-il/compile-cps.scm (convert): Box results of
bv-f32-ref and bv-f64-ref. Unbox the argument to bv-f32-set! and
bv-f64-set!.
* libguile/vm-engine.c (bv-f32-ref, bv-f64-ref): Results are raw.
(bv-f32-set!, bv-f64-set!): Take unboxed arguments.
* module/system/vm/assembler.scm (emit-scm->f64, emit-f64->scm):
Export.
* module/language/cps/compile-bytecode.scm (compile-function):
* module/language/cps/effects-analysis.scm: Add support for scm->f64 and
f64->scm.
* module/language/cps/slot-allocation.scm (compute-var-representations):
Add cases for primops returning raw values.
* module/language/cps/types.scm (bv-f32-ref, bv-f32-set!)
(bv-f64-ref, bv-f64-set!): Deal in &f64 values instead of reals.
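With this change, bv-f32-ref yields a raw double and bv-f32-set! consumes one. The following is a minimal sketch of that contract in plain C, assuming host byte order. The bv_f32_ref and bv_f32_set names are hypothetical stand-ins for the VM instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit float from a byte buffer and widen it to a raw
   (unboxed) double. Host byte order is assumed. */
static double bv_f32_ref(const uint8_t *bv, size_t len, size_t idx)
{
  float f;
  assert(idx + sizeof f <= len);
  memcpy(&f, bv + idx, sizeof f);
  return (double) f;   /* float -> double widening is exact */
}

/* Take an unboxed double and narrow it to a float in the buffer. */
static void bv_f32_set(uint8_t *bv, size_t len, size_t idx, double value)
{
  float f = (float) value;
  assert(idx + sizeof f <= len);
  memcpy(bv + idx, &f, sizeof f);
}
```

Because the values stay unboxed, the compiler only needs to box at the boundaries (scm->f64 and f64->scm), not on every access.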
* libguile/loader.c (scm_find_slot_map_unlocked): Rename from
scm_find_dead_slot_map_unlocked.
* libguile/vm.c (struct slot_map_cache_entry, struct slot_map_cache)
(find_slot_map): Rename, changing "dead_slot" to "slot".
(enum slot_desc): New type.
(scm_i_vm_mark_stack): Interpret slot maps as having two bits per
slot, allowing us to indicate that a slot is live but not a pointer.
* module/language/cps/compile-bytecode.scm (compile-function): Adapt to
emit-slot-map name change.
* module/system/vm/assembler.scm (<asm>): Rename dead-slot-maps field to
slot-maps.
(emit-slot-map): Rename from emit-dead-slot-map.
(link-frame-maps): 2 bits per slot.
* module/language/cps/slot-allocation.scm (lookup-slot-map): Rename from
lookup-dead-slot-map.
(compute-var-representations): New function.
(allocate-slots): Adapt to encode two-bit slot representations.
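A two-bit slot map can distinguish "dead", "live but raw" and "live SCM" slots, so the marker traces only the last kind. The following is a minimal sketch of such an encoding. The enum values, the four-slots-per-byte packing with the lowest slot in the low bits, and the helper names are assumptions; the actual enum slot_desc in libguile/vm.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed encoding; the real enum slot_desc may use other values. */
enum slot_desc {
  SLOT_DESC_DEAD = 0,
  SLOT_DESC_LIVE_RAW = 1,   /* live, but holds an unboxed value */
  SLOT_DESC_LIVE_SCM = 2,   /* live, and may hold a pointer */
  SLOT_DESC_UNUSED = 3
};

/* Two bits per slot, four slots per byte, lowest slot in the low bits. */
static enum slot_desc slot_map_ref(const uint8_t *map, size_t slot)
{
  return (enum slot_desc) ((map[slot / 4] >> ((slot % 4) * 2)) & 3);
}

static void slot_map_set(uint8_t *map, size_t slot, enum slot_desc d)
{
  unsigned shift = (unsigned) (slot % 4) * 2;
  assert((unsigned) d <= 3);
  map[slot / 4] = (uint8_t) ((map[slot / 4] & ~(3u << shift))
                             | ((unsigned) d << shift));
}

/* The collector only needs to trace slots described as live SCM. */
static size_t count_traced_slots(const uint8_t *map, size_t nslots)
{
  size_t n = 0;
  for (size_t i = 0; i < nslots; i++)
    if (slot_map_ref(map, i) == SLOT_DESC_LIVE_SCM)
      n++;
  return n;
}
```

A one-bit map can only say dead or live. The second bit is what lets an unboxed f64 or u64 be live without being mistaken for a pointer.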
* libguile/vm-engine.c: S24/S12/S8 operands are now addressed relative
to the SP, not the FP. Cache the SP instead of an FP-relative locals
pointer. Further cleanups to follow.
* libguile/vm.c (vm_builtin_call_with_values_code): Adapt to mov operand
addressing change.
* module/language/cps/compile-bytecode.scm (compile-function): Reify
SP-relative local indexes where appropriate.
* module/system/vm/assembler.scm (emit-fmov*): New helper, exported as
emit-fmov.
(shuffling-assembler, define-shuffling-assembler): Rewrite to shuffle
via push/pop/drop.
(standard-prelude, opt-prelude, kw-prelude): No need to provide for
shuffling args.
* test-suite/tests/rtl.test: Update.
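Reifying SP-relative indexes is an index translation within the frame. The following is a minimal sketch, assuming a downward-growing stack where FP-relative local i lives at fp[-1 - i] and sp == fp - frame_size. The layout and the helper names are assumptions, not taken from vm-engine.c.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: the stack grows down, FP-relative local i sits at
   fp[-1 - i], and the SP points at the lowest local. */
static long *fp_ref(long *fp, size_t i) { return &fp[-1 - (ptrdiff_t) i]; }
static long *sp_ref(long *sp, size_t i) { return &sp[i]; }

/* Translate an FP-relative local index into an SP-relative one. */
static size_t fp_to_sp_index(size_t frame_size, size_t fp_index)
{
  assert(fp_index < frame_size);
  return frame_size - 1 - fp_index;
}
```

Under this layout, both helpers address the same word whenever the index is translated with fp_to_sp_index.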
* module/language/cps/slot-allocation.scm: Don't reserve slots 253-255.
* module/language/cps/utils.scm (solve-flow-equations): Revert to take
separate in and out maps. Take an optional initial worklist.
* module/language/cps/slot-allocation.scm: Adapt to solve-flow-equations
change.
* module/language/cps/slot-allocation.scm (compute-lazy-vars):
(compute-live-variables): Adapt to solve-flow-equations interface
change.
* module/language/cps/utils.scm (solve-flow-equations): Move here. Use
an init value instead of an init map.
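solve-flow-equations computes a fixed point over the control-flow graph. The following is a C sketch of the same idea, applied to backward liveness. It keeps separate in and out maps, uses the empty set as the init value, and seeds the worklist with every label. The CFG representation, 64-bit bitsets and solve_liveness are hypothetical; the Scheme code works on intmaps and intsets instead.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_LABELS 64

/* Hypothetical CFG: up to two successors per label (-1 = none), and
   use/def sets of variables encoded as 64-bit bitsets. */
struct block {
  int succ[2];
  uint64_t use, def;
};

/* Backward liveness: out[l] = union of in[s] over successors s, and
   in[l] = use[l] | (out[l] & ~def[l]), iterated to a fixed point. */
static void solve_liveness(const struct block *blocks, int n,
                           uint64_t *in, uint64_t *out)
{
  int worklist[MAX_LABELS], top = 0;
  unsigned char queued[MAX_LABELS] = {0};
  assert(n <= MAX_LABELS);
  for (int l = 0; l < n; l++) {
    in[l] = out[l] = 0;            /* init value: the empty set */
    worklist[top++] = l;
    queued[l] = 1;
  }
  while (top > 0) {
    int l = worklist[--top];
    queued[l] = 0;
    uint64_t o = 0;
    for (int s = 0; s < 2; s++)
      if (blocks[l].succ[s] >= 0)
        o |= in[blocks[l].succ[s]];
    uint64_t new_in = blocks[l].use | (o & ~blocks[l].def);
    out[l] = o;
    if (new_in != in[l]) {
      in[l] = new_in;
      /* Predecessors' out sets depend on in[l]: re-queue them.
         (A linear scan keeps the sketch short.) */
      for (int p = 0; p < n; p++)
        for (int s = 0; s < 2; s++)
          if (blocks[p].succ[s] == l && !queued[p]) {
            worklist[top++] = p;
            queued[p] = 1;
          }
    }
  }
}
```

The queued flags keep each label on the worklist at most once, so the fixed-size worklist never overflows.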
* module/language/cps/slot-allocation.scm (allocate-slots): Even if an
expression does not define a live value, it might need a place to
put its value. In that case we should stop scanning for hints;
otherwise, e.g., a (current-module) primcall whose value isn't used
could clobber a hinted variable.
* module/language/cps/slot-allocation.scm (allocate-slots): For
continuations of $call, $callk, and $values with multiple
predecessors, recalculate the set of live slots. Fixes miscompilation
of ice-9/futures.scm:process-future!, broken since the previous patch,
now that $kreceive continuations can have multiple predecessors.
* module/language/cps/dfg.scm (compute-live-variables): Convert to use
intsets, and fold in compute-maximum-fixed-point.
(print-dfa): Update.
* module/language/cps/slot-allocation.scm (dead-after-def?)
(dead-after-use?, allocate-slots): Convert to use intsets.
* module/language/cps/slot-allocation.scm (allocate-slots): Avoid
allocating locals in the range [253,255].
* module/system/vm/assembler.scm: List exports explicitly. For
operations with limited-range operands, export wrapper assemblers that
handle shuffling their operands into and out of their range.
(define-assembler): Get rid of enclosing begin.
(shuffling-assembler, define-shuffling-assembler): New helpers to
define shuffling wrapper assemblers.
(emit-mov*, emit-receive*): New functions.
(shuffle-up-args): New helper.
(standard-prelude, opt-prelude, kw-prelude): Call shuffle-up-args
after finishing.
* test-suite/tests/compiler.test ("limits"): Add test cases.
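The shuffling wrappers and the reserved range [253,255] work together. The following is a C sketch of a wrapper for a three-operand instruction with 8-bit slot operands, using the reserved slots as temporaries. The instruction record, the "mov" that accepts wide operands, and emit_add_shuffled are all hypothetical; the real wrappers are Scheme definitions in (system vm assembler).

```c
#include <assert.h>

#define MAX_INSNS 16

/* Assumed limits: add takes 8-bit slot operands, and the allocator
   never hands out slots 253-255, so they are free as temporaries. */
enum { OPERAND8_MAX = 255, TMP0 = 253, TMP1 = 254, TMP2 = 255 };

/* Hypothetical instruction record; c == -1 means "no third operand". */
struct insn { const char *op; int a, b, c; };
struct asm_buf { struct insn insns[MAX_INSNS]; int n; };

static void emit(struct asm_buf *buf, const char *op, int a, int b, int c)
{
  assert(buf->n < MAX_INSNS);
  buf->insns[buf->n++] = (struct insn){ op, a, b, c };
}

static int is_reserved(int slot) { return slot >= TMP0 && slot <= TMP2; }

/* Out-of-range sources are first copied into temporaries with a mov
   (assumed to take wide operands); an out-of-range destination is
   computed into TMP2 and then copied out. */
static void emit_add_shuffled(struct asm_buf *buf, int dst, int a, int b)
{
  assert(!is_reserved(dst) && !is_reserved(a) && !is_reserved(b));
  int a8 = a, b8 = b, dst8 = dst;
  if (a > OPERAND8_MAX) { emit(buf, "mov", TMP0, a, -1); a8 = TMP0; }
  if (b > OPERAND8_MAX) { emit(buf, "mov", TMP1, b, -1); b8 = TMP1; }
  if (dst > OPERAND8_MAX) dst8 = TMP2;
  emit(buf, "add", dst8, a8, b8);
  if (dst8 != dst) emit(buf, "mov", dst, TMP2, -1);
}
```

When all operands already fit, the wrapper emits the bare instruction, so the common case pays nothing.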
* module/language/cps/slot-allocation.scm (dead-after-def?):
(dead-after-use?, allocate-slots): Remove some needless remapping
between label indexes in the CFA, the DFA, and their names.
* module/language/cps.scm ($closure, $program): New CPS types, part of
low-level (first-order) CPS.
(build-cps-exp, build-cps-term, parse-cps, unparse-cps)
(compute-max-label-and-var): Update for new CPS types.
* module/language/cps/closure-conversion.scm: Rewrite to produce a
$program with $closures, and no $funs.
* module/language/cps/reify-primitives.scm:
* module/language/cps/compile-bytecode.scm (compile-fun):
(compile-bytecode): Adapt to new first-order format.
* module/language/cps/dfg.scm (compute-dfg): Add $closure case.
* module/language/cps/renumber.scm (renumber): Allow this pass to work
on either format.
* module/language/cps/slot-allocation.scm (allocate-slots): Add $closure
case.
* module/language/cps/slot-allocation.scm (allocate-slots): Rework to
avoid computing a CFA, and just relying on the incoming term to have
sorted labels.
* module/language/cps.scm ($kclause, $kentry): Instead of having an
entry continuation contain a list of clauses, have each clause contain
the next clause (as in Tree-IL). In some ways it's not as convenient,
but it does reflect the continuation tree correctly.
* module/language/cps/arities.scm:
* module/language/cps/closure-conversion.scm:
* module/language/cps/compile-bytecode.scm:
* module/language/cps/constructors.scm:
* module/language/cps/contification.scm:
* module/language/cps/dce.scm:
* module/language/cps/dfg.scm:
* module/language/cps/elide-values.scm:
* module/language/cps/prune-top-level-scopes.scm:
* module/language/cps/reify-primitives.scm:
* module/language/cps/renumber.scm:
* module/language/cps/simplify.scm:
* module/language/cps/slot-allocation.scm:
* module/language/cps/specialize-primcalls.scm:
* module/language/cps/verify.scm:
* module/language/tree-il/compile-cps.scm: Adapt all users.
* module/language/cps/dfg.scm (lookup-cont): Change to take a DFG
instead of a cont table.
(build-cont-table): Change to return a vector.
* module/language/cps/arities.scm:
* module/language/cps/contification.scm:
* module/language/cps/dce.scm:
* module/language/cps/effects-analysis.scm:
* module/language/cps/elide-values.scm:
* module/language/cps/reify-primitives.scm:
* module/language/cps/simplify.scm:
* module/language/cps/slot-allocation.scm: Adapt to lookup-cont and
build-cont-table changes.
* module/language/cps.scm ($callk): New expression type, for calls to
known labels. Part of "low CPS".
* module/language/cps/arities.scm:
* module/language/cps/closure-conversion.scm:
* module/language/cps/compile-bytecode.scm:
* module/language/cps/dce.scm:
* module/language/cps/dfg.scm:
* module/language/cps/effects-analysis.scm:
* module/language/cps/simplify.scm:
* module/language/cps/slot-allocation.scm:
* module/language/cps/verify.scm: Adapt call sites.
* libguile/vm-engine.c (call-label, tail-call-label): New instructions.
Renumber the rest; this is an ABI change.
* libguile/_scm.h (SCM_OBJCODE_MINOR_VERSION):
* module/system/vm/assembler.scm (*bytecode-minor-version*): Bump.
* doc/ref/compiler.texi (CPS in Guile): Document $callk.
* module/language/cps/slot-allocation.scm (lookup-dead-slot-map)
(allocate-slots): For each non-tail call in a function, compute the
set of slots that are dead after the function has begun the call.
* module/language/cps/compile-bytecode.scm (compile-fun): Emit the
`dead-slot-map' macro instruction for non-tail calls.
* module/system/vm/assembler.scm (<asm>): Add `dead-slot-maps' member.
(dead-slot-map): New macro-instruction.
(link-frame-maps, link-dynamic-section, link-objects): Write dead
slots information into .guile.frame-maps sections of ELF files.
* module/system/vm/elf.scm (DT_GUILE_FRAME_MAPS): New definition.
* libguile/loader.h:
* libguile/loader.c (DT_GUILE_FRAME_MAPS, process_dynamic_segment):
(load_thunk_from_memory, register_elf): Arrange to parse
DT_GUILE_FRAME_MAPS out of the dynamic section.
(find_mapped_elf_image_unlocked, find_mapped_elf_image): New helpers.
(scm_find_mapped_elf_image): Refactor.
(scm_find_dead_slot_map_unlocked): New interface.
* libguile/vm.c (scm_i_vm_mark_stack): Mark the hottest frame
conservatively, as before. Otherwise use the dead slots map, if
available, to avoid marking data that isn't live.
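The marking policy in the scm_i_vm_mark_stack entry can be sketched as follows. This assumes a dead-slot map with one bit per local, where a set bit means the slot is dead once the call has begun. slot_is_dead and mark_frame are hypothetical names, and the sketch collects values to trace instead of calling into the GC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if bit `slot` is set in a one-bit-per-slot dead map. */
static bool slot_is_dead(const uint8_t *dead_map, size_t slot)
{
  return (dead_map[slot / 8] >> (slot % 8)) & 1;
}

/* Copy the values the collector should trace into to_mark and return
   how many there are. The hottest frame, or a frame with no map, is
   scanned conservatively: every local is traced. */
static size_t mark_frame(const uintptr_t *locals, size_t nlocals,
                         const uint8_t *dead_map, bool is_hottest,
                         uintptr_t *to_mark)
{
  size_t n = 0;
  assert(locals != NULL && to_mark != NULL);
  for (size_t i = 0; i < nlocals; i++)
    if (is_hottest || dead_map == NULL || !slot_is_dead(dead_map, i))
      to_mark[n++] = locals[i];
  return n;
}
```

The hottest frame needs the conservative path because it may be stopped at an arbitrary instruction, not at a call for which a map was recorded.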
* module/language/cps/slot-allocation.scm (allocate-slots): For
truncating calls, shuffle the first return value (if any). This avoids
frame size growth caused by sparse locals that stay pegged where
procedure call returns left them. With this patch, eval with $ktrunc
nodes goes from 31 locals to 18 (similar to the size before adding
$ktrunc nodes).
* module/language/cps/slot-allocation.scm (allocate-slots): Fix bug in
allocate!, whereby a previously hinted allocation would not be added
to the live set if a hint was not given later.
* module/language/cps.scm:
* module/language/cps/closure-conversion.scm:
* module/language/cps/compile-bytecode.scm:
* module/language/cps/dfg.scm:
* module/language/cps/slot-allocation.scm:
* module/language/cps/verify.scm:
* module/language/tree-il/compile-cps.scm: Remove "pop" member from
$prompt data type, as it is no longer used.
* module/language/cps/slot-allocation.scm (allocate-slots): Don't
allocate slots to unused results of function calls. This can allow us
to avoid consing a rest list for call-with-values with an ignored rest
parameter, and can improve the parallel move code.
* module/language/cps/compile-bytecode.scm (compile-fun): Adapt to avoid
emitting bind-rest in values context if the rest arg is unused.
* libguile/_scm.h (SCM_OBJCODE_MINOR_VERSION): Bump for frame layout
change.
* libguile/frames.c: Update some static checks.
(scm_frame_num_locals, scm_frame_local_ref, scm_frame_local_set_x):
Update to not skip over uninitialized frames, as those no longer
exist.
* libguile/frames.h: Update to remove the MVRA (multiple-value return
address). Woo!
* libguile/vm-engine.c (ALLOC_FRAME, RETURN_ONE_VALUE):
(rtl_vm_engine): Update for 3 words per frame instead of 4.
* libguile/vm.c (vm_return_to_continuation): Likewise.
* module/language/cps/slot-allocation.scm (allocate-slots): 3 words per
frame, not 4.
* module/system/vm/assembler.scm (*bytecode-minor-version*): Bump. Also
remove a couple of tc7 tags that no longer exist.
* module/language/cps/slot-allocation.scm (allocate-slots): Convert
cont-table to a vector, for ease of access. Run a pass before
allocation that determines the set of variables whose slot allocation
can and should be delayed, so that they can ideally be allocated
directly in an argument slot.
* module/language/cps/slot-allocation.scm ($allocation): Refactor
internal format of allocations. Instead of an allocation being a hash
table of small $allocation objects, it is an $allocation object that
contains packed vectors.
(find-first-trailing-zero): Rework to not need a maximum.
(lookup-maybe-slot): New interface.
(lookup-slot): Raise an error if a var has no slot.
(lookup-call-allocation): New helper.
(lookup-constant-value, lookup-maybe-constant-value):
(lookup-call-proc-slot, lookup-parallel-moves): Adapt to $allocation
change.
(allocate-slots): Rewrite so that instead of being recursive, it
traverses the blocks in CFA order. Also, procedure call frames are
now allocated with respect to the live set after using arguments (and
killing any dead-after-use vars); this should make call frames more
compact, but it does necessitate a parallel move solution. Therefore
parallel moves are recorded for the arguments of all calls and, if the
continuation is a $ktrunc, for the results as well.
This rewrite is in preparation for allocating call args directly in
the appropriate slots, where possible.
* module/language/cps/compile-rtl.scm (compile-fun): Adapt to slot
allocation changes, using lookup-maybe-slot where appropriate,
performing parallel moves when calling functions, and expecting return
moves to be associated with $ktrunc continuations.
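A parallel move assigns every destination at once. The following is a sketch of the textbook way to turn one into sequential moves, breaking cycles through a scratch slot. It is not necessarily what slot-allocation.scm does. struct move, sequentialize and the tmp parameter are hypothetical, and tmp is assumed not to appear in any move.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_MOVES 32

struct move { int src, dst; };

/* Sequentialize parallel moves (all dsts distinct) into `out`, using
   slot `tmp` to break cycles. Returns the number of sequential moves. */
static int sequentialize(const struct move *moves, int n, int tmp,
                         struct move *out)
{
  struct move pending[MAX_MOVES];
  int np = 0, nout = 0;
  assert(n <= MAX_MOVES);
  for (int i = 0; i < n; i++) {
    assert(moves[i].src != tmp && moves[i].dst != tmp);
    if (moves[i].src != moves[i].dst)       /* drop self-moves */
      pending[np++] = moves[i];
  }
  while (np > 0) {
    bool progress = false;
    for (int i = 0; i < np && !progress; i++) {
      /* A move is safe once no other pending move reads its dst. */
      bool blocked = false;
      for (int j = 0; j < np; j++)
        if (j != i && pending[j].src == pending[i].dst)
          blocked = true;
      if (!blocked) {
        out[nout++] = pending[i];
        pending[i] = pending[--np];
        progress = true;
      }
    }
    if (!progress) {
      /* Only cycles remain: save one source in tmp, read it from there. */
      out[nout++] = (struct move){ pending[0].src, tmp };
      pending[0].src = tmp;
    }
  }
  return nout;
}
```

Rotating three slots, for example, takes four sequential moves: one into tmp, two inside the cycle, and one out of tmp.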
* module/language/cps.scm ($continue, $cont): Put source information on
the $continue, not on the $cont. Otherwise it is difficult for CPS
conversion to preserve source information.
($fun): Add a src member to $fun. Otherwise we might miss the source
info for the start of the function.
* .dir-locals.el:
* module/language/cps/arities.scm:
* module/language/cps/closure-conversion.scm:
* module/language/cps/compile-rtl.scm:
* module/language/cps/constructors.scm:
* module/language/cps/contification.scm:
* module/language/cps/dfg.scm:
* module/language/cps/elide-values.scm:
* module/language/cps/reify-primitives.scm:
* module/language/cps/slot-allocation.scm:
* module/language/cps/verify.scm:
* module/language/tree-il/compile-cps.scm: Update the whole CPS world
for this change.
* module/language/cps/compile-rtl.scm (compile-fun): Rewrite to visit
conts in reverse-post-order, which is a topological sort on the basic
blocks.
* module/language/cps/slot-allocation.scm (allocate-slots): Expect a DFG
as an argument.
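Reverse post-order falls out of a depth-first search: record each node after all of its successors, then reverse the list. The following is a minimal sketch over a hypothetical CFG with up to two successors per node. Ignoring back edges, every block comes before its successors. struct node and reverse_post_order are illustrative names, not the compile-rtl.scm code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 64

/* Hypothetical CFG: up to two successors per node, -1 = none. */
struct node { int succ[2]; };

static void dfs(const struct node *g, int l, bool *seen, int *post, int *npost)
{
  seen[l] = true;
  for (int s = 0; s < 2; s++) {
    int k = g[l].succ[s];
    if (k >= 0 && !seen[k])
      dfs(g, k, seen, post, npost);
  }
  post[(*npost)++] = l;   /* recorded after its successors: post-order */
}

/* Writes the nodes reachable from entry in reverse post-order into rpo
   and returns their count. */
static int reverse_post_order(const struct node *g, int n, int entry, int *rpo)
{
  bool seen[MAX_NODES] = { false };
  int post[MAX_NODES], npost = 0;
  assert(n <= MAX_NODES && entry >= 0 && entry < n);
  dfs(g, entry, seen, post, &npost);
  for (int i = 0; i < npost; i++)
    rpo[i] = post[npost - 1 - i];
  return npost;
}
```

For a diamond 0 -> {1, 2} -> 3, this yields 0, 2, 1, 3: the join block 3 comes after both of its predecessors.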