* module/language/cps/compile-bytecode.scm (compile-function)
(emit-bytecode):
* module/language/cps/slot-allocation.scm (allocate-slots):
* module/language/cps/optimize.scm (cps-default-optimization-options):
Allow the "lazy vars" optimization, a form of slot precoloring, to be
disabled. It will be disabled at -O0 or -O1, to speed compilation
times.
* module/language/cps/slot-allocation.scm
(compute-reverse-control-flow-order): For graphs without back-edges,
use a simplified computation of reverse control flow order.
* module/language/cps/compile-bytecode.scm (compile-function):
* module/language/cps/slot-allocation.scm ($allocation)
(lookup-nlocals, compute-frame-size, allocate-slots): Adapt to
have one frame size per function, for all clauses.
* module/language/cps/slot-allocation.scm
(add-prompt-control-flow-edges): Fix to add links from prompt bodies
to handlers, even in cases where the handler can reach the body but
the body can't reach the handler.
* test-suite/tests/compiler.test ("prompt body slot allocation"): Add
test case.
* libguile/vm-engine.c (bv-s8-ref, bv-s16-ref, bv-s32-ref, bv-s64-ref):
Unbox index and return unboxed S32 value.
(bv-s8-set!, bv-s16-set!, bv-s32-set!, bv-s64-set!): Unbox index and
take unboxed S32 value.
(bv-u8-ref, bv-u16-ref, bv-u32-ref, bv-u64-ref)
(bv-s8-set!, bv-s16-set!, bv-s32-set!, bv-s64-set!): Likewise, but
with unsigned values.
(bv-f32-ref, bv-f32-set!, bv-f64-ref, bv-f64-set!): Use memcpy to
access the value so we don't have to think about alignment. GCC will
inline this to a single instruction on architectures that support
unaligned access.
* libguile/vm.c (vm_error_out_of_range_uint64)
(vm_error_out_of_range_int64): New helpers.
* module/language/cps/slot-allocation.scm (compute-var-representations):
All bytevector ref operations produce untagged values.
* module/language/cps/types.scm (define-bytevector-accessors): Update
for bytevector untagged indices and values.
* module/language/cps/utils.scm (compute-constant-values): Fix s64
case.
* module/language/tree-il/compile-cps.scm (convert): Box results of all
bytevector accesses, and unbox incoming indices and values.
* libguile/instructions.c (FOR_EACH_INSTRUCTION_WORD_TYPE): Add word
types for immediate f64 and u64 values.
(TYPE_WIDTH): Bump up by a bit, now that we have 32 word types.
(NOP, parse_instruction): Use 64-bit meta type.
* libguile/vm-engine.c (load-f64, load-u64): New instructions.
* module/language/bytecode.scm (compute-instruction-arity): Add parser
for new instruction word types.
* module/language/cps/compile-bytecode.scm (compile-function): Add
special-cased assemblers for new instructions, and also for scm->u64
and u64->scm which I missed before.
* module/language/cps/effects-analysis.scm (load-f64, load-u64): New
instructions.
* module/language/cps/slot-allocation.scm (compute-needs-slot): load-f64
and load-u64 don't need slots.
(compute-var-representations): Update for new instructions.
* module/language/cps/specialize-primcalls.scm (specialize-primcalls):
Specialize scm->f64 and scm->u64 to make-f64 and make-u64.
* module/language/cps/types.scm (load-f64, load-u64): Wire up to type
inference, though currently type inference only runs before
specialization.
* module/language/cps/utils.scm (compute-defining-expressions): For some
reason I don't understand, it's possible to see two definitions that
are equal but not equal? here. Allow for now.
(compute-constant-values): Punch through type conversions to get
constant u64/f64 values.
* module/system/vm/assembler.scm (assembler): Support for new word
types. Export the new assemblers.
* libguile/vm-engine.c (add/immediate, sub/immediate)
(uadd/immediate, usub/immediate, umul/immediate): New instructions.
* module/language/cps/compile-bytecode.scm (compile-function):
* module/language/cps/slot-allocation.scm (compute-needs-slot):
* module/language/cps/types.scm:
* module/system/vm/assembler.scm (system):
* module/language/cps/effects-analysis.scm: Support
for new instructions.
* module/language/cps/optimize.scm (optimize-first-order-cps): Move
primcall specialization to the last step -- the only benefit of doing
it earlier was easier reasoning about side effects, and we're already
doing that in a more general way with (language cps types).
* module/language/cps/specialize-primcalls.scm (specialize-primcalls):
Specialize add and sub to add/immediate and sub/immediate, and
specialize u64 addition as well. U64 specialization doesn't work now
though because computing constant values doesn't work for U64s; oh
well.
* module/language/tree-il/compile-cps.scm (convert): Box results of
bv-f32-ref and bv-f64-ref. Unbox the argument to bv-f32-set! and
bv-f64-set!.
* libguile/vm-engine.c (bv-f32-ref, bv-f64-ref): Results are raw.
(bv-f32-set!, bv-f64-set!): Take unboxed arguments.
* module/system/vm/assembler.scm (emit-scm->f64, emit-f64->scm):
Export.
* module/language/cps/compile-bytecode.scm (compile-function):
* module/language/cps/effects-analysis.scm: Add support for scm->f64 and
f64->scm.
* module/language/cps/slot-allocation.scm (compute-var-representations):
Add cases for primops returning raw values.
* module/language/cps/types.scm (bv-f32-ref, bv-f32-set!)
(bv-f64-ref, bv-f64-set!): Deal in &f64 values instead of reals.
* libguile/loader.c (scm_find_slot_map_unlocked): Rename from
scm_find_dead_slot_map_unlocked.
* libguile/vm.c (struct slot_map_cache_entry, struct slot_map_cache)
(find_slot_map): Rename, changing "dead_slot" to "slot".
(enum slot_desc): New type.
(scm_i_vm_mark_stack): Interpret slot maps as having two bits per
slot, allowing us to indicate that a slot is live but not a pointer.
* module/language/cps/compile-bytecode.scm (compile-function): Adapt to
emit-slot-map name change.
* module/system/vm/assembler.scm (<asm>): Rename dead-slot-maps field to
slot-maps.
(emit-slot-map): Rename from emit-dead-slot-map.
(link-frame-maps): 2 bits per slot.
* module/language/cps/slot-allocation.scm (lookup-slot-map): Rename from
lookup-dead-slot-map.
(compute-var-representations): New function.
(allocate-slots): Adapt to encode two-bit slot representations.
* libguile/vm-engine.c: S24/S12/S8 operands addressed relative to the
SP, not the FP. Cache the SP instead of a FP-relative locals
pointer. Further cleanups to follow.
* libguile/vm.c (vm_builtin_call_with_values_code): Adapt to mov operand
addresing change.
* module/language/cps/compile-bytecode.scm (compile-function): Reify
SP-relative local indexes where appropriate.
* module/system/vm/assembler.scm (emit-fmov*): New helper, exported as
emit-fmov.
(shuffling-assembler, define-shuffling-assembler): Rewrite to shuffle
via push/pop/drop.
(standard-prelude, opt-prelude, kw-prelude): No need to provide for
shuffling args.
* test-suite/tests/rtl.test: Update.
* module/language/cps/slot-allocation.scm: Don't reserve slots 253-255.
* module/language/cps/utils.scm (solve-flow-equations): Revert to take
separate in and out maps. Take an optional initial worklist.
* module/language/cps/slot-allocation.scm: Adapt to solve-flow-equations
change.
* module/language/cps/slot-allocation.scm (compute-lazy-vars):
(compute-live-variables): Adapt to solve-flow-equations interface
change.
* module/language/cps/utils.scm (solve-flow-equations): Move here. Use
an init value instead of an init map.
* module/language/cps/slot-allocation.scm (allocate-slots): Even if an
expression does not define a live value, it might need a place to
put its value. In that case we should stop scanning for hints,
otherwise e.g. an (current-module) primcall whose value isn't used
could clobber a hinted variable.
* module/language/cps/slot-allocation.scm (allocate-slots): For
continuations of $call, $callk, and $values with multiple
predecessors, recalculate the set of live slots. Fixes miscompilation
of ice-9/futures.scm:process-future!, broken since the previous patch,
now that $kreceive continuations can have multiple predecessors.
* module/language/cps/dfg.scm (compute-live-variables): Convert to use
intsets, and fold in compute-maximum-fixed-point.
(print-dfa): Update.
* module/language/cps/slot-allocation.scm (dead-after-def?)
(dead-after-use?, allocate-slots): Convert to use intsets.
* module/language/cps/slot-allocation.scm (allocate-slots): Avoid
allocating locals in the range [253,255].
* module/system/vm/assembler.scm: List exports explicitly. For
operations with limited-range operands, export wrapper assemblers that
handle shuffling their operands into and out of their range.
(define-assembler): Get rid of enclosing begin.
(shuffling-assembler, define-shuffling-assembler): New helpers to
define shuffling wrapper assemblers.
(emit-mov*, emit-receive*): New functions.
(shuffle-up-args): New helper.
(standard-prelude, opt-prelude, kw-prelude): Call shuffle-up-args
after finishing.
* test-suite/tests/compiler.test ("limits"): Add test cases.
* module/language/cps/slot-allocation.scm (dead-after-def?):
(dead-after-use?, allocate-slots): Remove some needless remapping
between label indexes in the CFA, the DFA, and their names.
* module/language/cps.scm ($closure, $program): New CPS types, part of
low-level (first-order) CPS.
(build-cps-exp, build-cps-term, parse-cps, unparse-cps)
(compute-max-label-and-var): Update for new CPS types.
* module/language/cps/closure-conversion.scm: Rewrite to produce a
$program with $closures, and no $funs.
* module/language/cps/reify-primitives.scm:
* module/language/cps/compile-bytecode.scm (compile-fun):
(compile-bytecode): Adapt to new first-order format.
* module/language/cps/dfg.scm (compute-dfg): Add $closure case.
* module/language/cps/renumber.scm (renumber): Allow this pass to work
on either format.
* module/language/cps/slot-allocation.scm (allocate-slots): Add $closure
case.
* module/language/cps/slot-allocation.scm (allocate-slots): Rework to
avoid computing a CFA, and just relying on the incoming term to have
sorted labels.
* module/language/cps.scm ($kclause, $kentry): Instead of having an
entry continuation contain a list of clauses, have the clauses contain
clauses (as in Tree-IL). In some ways it's not as convenient but it
does reflect the continuation tree correctly.
* module/language/cps/arities.scm:
* module/language/cps/closure-conversion.scm:
* module/language/cps/compile-bytecode.scm:
* module/language/cps/constructors.scm:
* module/language/cps/contification.scm:
* module/language/cps/dce.scm:
* module/language/cps/dfg.scm:
* module/language/cps/elide-values.scm:
* module/language/cps/prune-top-level-scopes.scm:
* module/language/cps/reify-primitives.scm:
* module/language/cps/renumber.scm:
* module/language/cps/simplify.scm:
* module/language/cps/slot-allocation.scm:
* module/language/cps/specialize-primcalls.scm:
* module/language/cps/verify.scm:
* module/language/tree-il/compile-cps.scm: Adapt aaaaaaall users.
* module/language/cps/dfg.scm (lookup-cont): Change to take a DFG
instead of a cont table.
(build-cont-table): Change to return a vector.
* module/language/cps/arities.scm:
* module/language/cps/contification.scm:
* module/language/cps/dce.scm:
* module/language/cps/effects-analysis.scm:
* module/language/cps/elide-values.scm:
* module/language/cps/reify-primitives.scm:
* module/language/cps/simplify.scm:
* module/language/cps/slot-allocation.scm: Adapt to lookup-cont and
build-cont-table changes.
* module/language/cps.scm ($callk): New expression type, for calls to
known labels. Part of "low CPS".
* module/language/cps/arities.scm:
* module/language/cps/closure-conversion.scm:
* module/language/cps/compile-bytecode.scm:
* module/language/cps/dce.scm:
* module/language/cps/dfg.scm:
* module/language/cps/effects-analysis.scm:
* module/language/cps/simplify.scm:
* module/language/cps/slot-allocation.scm:
* module/language/cps/verify.scm: Adapt call sites.
* libguile/vm-engine.c (call-label, tail-call-label): New instructions.
Renumber the rest; this is an ABI change.
* libguile/_scm.h (SCM_OBJCODE_MINOR_VERSION):
* module/system/vm/assembler.scm (*bytecode-minor-version*): Bump.
* doc/ref/compiler.texi (CPS in Guile): Document $callk.
* module/language/cps/slot-allocation.scm (lookup-dead-slot-map)
(allocate-slots): For each non-tail call in a function, compute the
set of slots that are dead after the function has begun the call.
* module/language/cps/compile-bytecode.scm (compile-fun): Emit the
`dead-slot-map' macro instruction for non-tail calls.
* module/system/vm/assembler.scm (<asm>): Add `dead-slot-maps' member.
(dead-slot-map): New macro-instruction.
(link-frame-maps, link-dynamic-section, link-objects): Write dead
slots information into .guile.frame-maps sections of ELF files.
* module/system/vm/elf.scm (DT_GUILE_FRAME_MAPS): New definition.
* libguile/loader.h:
* libguile/loader.c (DT_GUILE_FRAME_MAPS, process_dynamic_segment):
(load_thunk_from_memory, register_elf): Arrange to parse
DT_GUILE_FRAME_MAPS out of the dynamic section.
(find_mapped_elf_image_unlocked, find_mapped_elf_image): New helpers.
(scm_find_mapped_elf_image): Refactor.
(scm_find_dead_slot_map_unlocked): New interface.
* libguile/vm.c (scm_i_vm_mark_stack): Mark the hottest frame
conservatively, as before. Otherwise use the dead slots map, if
available, to avoid marking data that isn't live.
* module/language/cps/slot-allocation.scm (allocate-slots): For
truncating calls, shuffle the first return value (if any). Avoids
frame size growth due to sparse locals, pegged where they were left by
procedure call returns. With this patch, eval with $ktrunc nodes goes
from 31 locals to 18 (similar to the size before adding $ktrunc
nodes).