update docs for recent vm/compiler work

* doc/ref/compiler.texi:
* doc/ref/vm.texi: Update for recent changes.

* module/language/assembly/disassemble.scm (disassemble-load-program):
  Don't print nops, they are distracting.
2025-06-17 01:00:20 +02:00 · 2009-08-12 23:38:05 +02:00 · 2009-08-12 23:38:05 +02:00 · 98850fd727
commit 98850fd727
parent aaae0d5ab3
3 changed files with 313 additions and 165 deletions
--- a/doc/ref/compiler.texi
+++ b/doc/ref/compiler.texi
@ -17,7 +17,7 @@ This section aims to pay attention to the small man behind the
 curtain.

@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
-know how to compile your .scm file.
+know how to compile your @code{.scm} file.

@menu
 * Compiler Tower::                   
@ -67,8 +67,7 @@ for Scheme:
  #:title       "Guile Scheme"
  #:version     "0.5"
  #:reader      read
-  #:compilers   `((tree-il . ,compile-tree-il)
-                  (ghil . ,compile-ghil))
+  #:compilers   `((tree-il . ,compile-tree-il))
  #:decompilers `((tree-il . ,decompile-tree-il))
  #:evaluator   (lambda (x module) (primitive-eval x))
  #:printer     write)
@ -220,13 +219,13 @@ Note however that @code{sc-expand} does not have the same signature as
 around @code{sc-expand}, to make it conform to the general form of
 compiler procedures in Guile's language tower.

-Compiler procedures take two arguments, an expression and an
-environment. They return three values: the compiled expression, the
-corresponding environment for the target language, and a
-``continuation environment''. The compiled expression and environment
-will serve as input to the next language's compiler. The
-``continuation environment'' can be used to compile another expression
-from the same source language within the same module.
+Compiler procedures take three arguments: an expression, an
+environment, and a keyword list of options. They return three values:
+the compiled expression, the corresponding environment for the target
+language, and a ``continuation environment''. The compiled expression
+and environment will serve as input to the next language's compiler.
+The ``continuation environment'' can be used to compile another
+expression from the same source language within the same module.
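+
+As a minimal sketch of that calling convention, here is a hypothetical
+compiler procedure for a language whose target is itself. The name
+@code{compile-identity} is an assumption made for illustration and is
+not part of Guile; it ignores its options and returns its input
+unchanged, reusing the environment as the continuation environment.
+
+@example
+;; Hypothetical: an identity "compiler" obeying the three-argument,
+;; three-value protocol described above.
+(define (compile-identity exp env opts)
+  (values exp    ; the compiled expression
+          env    ; the environment for the target language
+          env))  ; the continuation environment
+
+(call-with-values
+    (lambda () (compile-identity '(+ 32 10) #f '()))
+  list)
+@result{} ((+ 32 10) #f #f)
+@end example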

 For example, you might compile the expression, @code{(define-module
 (foo))}. This will result in a Tree-IL expression and environment. But
@ -292,6 +291,14 @@ tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))

 The @code{src} fields are left out of the external representation.

+One may create Tree-IL objects from their external representations by
+calling @code{parse-tree-il}, the reader for Tree-IL. If any source
+information is attached to the input S-expression, it will be
+propagated to the resulting Tree-IL expressions. This is probably the
+easiest way to compile to Tree-IL: just make the appropriate external
+representations in S-expression format, and let @code{parse-tree-il}
+take care of the rest.
+
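+For instance, assuming that @code{parse-tree-il} is exported by
+@code{(language tree-il)} and that @code{compile} from @code{(system
+base compile)} accepts a @code{#:from} keyword, the external
+representation shown above could be turned into a Tree-IL object and
+then compiled roughly like this on a Guile of this vintage:
+
+@example
+(use-modules (language tree-il) (system base compile))
+
+;; Parse the S-expression form into a Tree-IL object...
+(define tree
+  (parse-tree-il '(apply (primitive +) (const 32) (const 10))))
+
+;; ...and hand it to the rest of the compiler tower.
+(compile tree #:from 'tree-il)
+@end example
+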
@deftp {Scheme Variable} <void> src
@deftpx {External Representation} (void)
 An empty expression. In practice, equivalent to Scheme's @code{(if #f
@ -384,12 +391,29 @@ A version of @code{<let>} that creates recursive bindings, like
 Scheme's @code{letrec}.
@end deftp

-@c FIXME -- need to revive this one
-@c @deftp {Scheme Variable} <ghil-mv-bind> src vars rest producer . body
-@c Like Scheme's @code{receive} -- binds the values returned by
-@c applying @code{producer}, which should be a thunk, to the
-@c @code{lambda}-like bindings described by @var{vars} and @var{rest}.
-@c @end deftp
+There are two Tree-IL constructs that are not normally produced by
+higher-level compilers, but instead are generated during the
+source-to-source optimization and analysis passes that the Tree-IL
+compiler does. Users should not generate these expressions directly,
+unless they feel very clever, as the default analysis pass will
+generate them as necessary.
+
+@deftp {Scheme Variable} <let-values> src names vars exp body
+@deftpx {External Representation} (let-values @var{names} @var{vars} @var{exp} @var{body})
+Like Scheme's @code{receive} -- binds the values returned by
+evaluating @code{exp} to the @code{lambda}-like bindings described by
+@var{vars}. That is to say, @var{vars} may be an improper list.
+
+@code{<let-values>} is an optimization of @code{<application>} of the
+primitive, @code{call-with-values}.
+@end deftp
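+
+To make the shape of the bindings concrete, the following plain Scheme
+(not Tree-IL) sketches the kind of @code{call-with-values} application
+that @code{<let-values>} stands in for. The improper formals list
+@code{(a . rest)} plays the role of @var{vars}.
+
+@example
+(call-with-values
+    (lambda () (values 1 2 3))        ; the producer of the values
+  (lambda (a . rest) (list a rest)))  ; the consumer, with a rest arg
+@result{} (1 (2 3))
+@end example
+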
+@deftp {Scheme Variable} <fix> src names vars vals body
+@deftpx {External Representation} (fix @var{names} @var{vars} @var{vals} @var{body})
+Like @code{<letrec>}, but only for @var{vals} that are unset
+@code{lambda} expressions.
+
+@code{fix} is an optimization of @code{letrec} (and @code{let}).
+@end deftp

 Tree-IL implements a compiler to GLIL that recursively traverses
 Tree-IL expressions, writing out GLIL expressions into a linear list.
@ -399,9 +423,9 @@ future computations. This state allows the compiler not to emit code
 for constant expressions that will not be used (e.g. docstrings), and
 to perform tail calls when in tail position.

-In the future, there will be a pass at the beginning of the
-Tree-IL->GLIL compilation step to perform inlining, copy propagation,
-dead code elimination, and constant folding.
+Most optimization, such as it currently is, is performed on Tree-IL
+expressions as source-to-source transformations. There will be more
+optimizations added in the future.

 Interested readers are encouraged to read the implementation in
@code{(language tree-il compile-glil)} for more details.
@ -411,18 +435,16 @@ Interested readers are encouraged to read the implementation in

 Guile Low Intermediate Language (GLIL) is a structured intermediate
 language whose expressions more closely approximate Guile's VM
-instruction set.
+instruction set. Its expression types are defined in @code{(language
+glil)}.

-Its expression types are defined in @code{(language glil)}, and as
-with GHIL, some of its fields parse as rest arguments.
-
-@deftp {Scheme Variable} <glil-program> nargs nrest nlocs nexts meta . body
+@deftp {Scheme Variable} <glil-program> nargs nrest nlocs meta . body
 A unit of code that at run-time will correspond to a compiled
-procedure. @var{nargs} @var{nrest} @var{nlocs}, and @var{nexts}
-collectively define the program's arity; see @ref{Compiled
-Procedures}, for more information. @var{meta} should be an alist of
-properties, as in Tree IL's @code{<lambda>}. @var{body} is a list of
-GLIL expressions.
+procedure. @var{nargs}, @var{nrest}, and @var{nlocs} collectively define
+the program's arity; see @ref{Compiled Procedures}, for more
+information. @var{meta} should be an alist of properties, as in
+Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
+expressions.
@end deftp
@deftp {Scheme Variable} <glil-bind> . vars
 An advisory expression that notes a liveness extent for a set of
@ -461,23 +483,21 @@ and @code{filename} keys, e.g. as returned by
@code{source-properties}.
@end deftp
@deftp {Scheme Variable} <glil-void>
-Pushes the unspecified value on the stack.
+Pushes ``the unspecified value'' on the stack.
@end deftp
@deftp {Scheme Variable} <glil-const> obj
 Pushes a constant value onto the stack. @var{obj} must be a number,
-string, symbol, keyword, boolean, character, the empty list, or a pair
-or vector of constants.
+string, symbol, keyword, boolean, character, uniform array, the empty
+list, or a pair or vector of constants.
@end deftp
-@deftp {Scheme Variable} <glil-local> op index
-Accesses a lexically bound variable from the stack. If @var{op} is
-@code{ref}, the value is pushed onto the stack; if it is @code{set},
-the variable is set from the top value on the stack, which is popped
-off. @xref{Stack Layout}, for more information.
-@end deftp
-@deftp {Scheme Variable} <glil-external> op depth index
-Accesses a heap-allocated variable, addressed by @var{depth}, the nth
-enclosing environment, and @var{index}, the variable's position within
-the environment. @var{op} is @code{ref} or @code{set}.
+@deftp {Scheme Variable} <glil-lexical> local? boxed? op index
+Accesses a lexically bound variable. If the variable is not
+@var{local?} it is free. All variables may have @code{ref} and
+@code{set} as their @var{op}. Boxed variables may also have the
+@var{op}s @code{box}, @code{empty-box}, and @code{fix}, which
+correspond in semantics to the VM instructions @code{box},
+@code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
+more information.
@end deftp
@deftp {Scheme Variable} <glil-toplevel> op name
 Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
@ -520,7 +540,7 @@ Guile Lowlevel Intermediate Language (GLIL) interpreter 0.3 on Guile 1.9.0
 Copyright (C) 2001-2008 Free Software Foundation, Inc.

 Enter `,help' for help.
-glil@@(guile-user)> (program 0 0 0 0 () (const 3) (call return 0))
+glil@@(guile-user)> (program 0 0 0 () (const 3) (call return 1))
@result{} 3
@end example

@ -542,12 +562,12 @@ differs from GLIL in four main ways:
@itemize
@item Labels have been resolved to byte offsets in the program.
@item Constants inside procedures have either been expressed as inline
-instructions, and possibly cached in object arrays.
+instructions or cached in object arrays.
@item Procedures with metadata (source location information, liveness
 extents, procedure names, generic properties, etc) have had their
 metadata serialized out to thunks.
@item All expressions correspond directly to VM instructions -- i.e.,
-there is no @code{<glil-local>} which can be a ref or a set.
+there is no @code{<glil-lexical>} which can be a ref or a set.
@end itemize

 Assembly is isomorphic to the bytecode that it compiles to. You can
@ -567,10 +587,11 @@ example:

@example
 scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
-(load-program 0 0 0 0
+(load-program 0 0 0
  () ; Labels
-  60 ; Length
+  70 ; Length
  #f ; Metadata
+  (make-false)
  (make-false) ; object table for the returned lambda
  (nop)
  (nop) ; Alignment. Since assembly has already resolved its labels
@ -578,11 +599,12 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
  (nop) ; object code is mmap'd directly to structures, assembly
  (nop) ; has to have the alignment embedded in it.
  (nop) 
-  (load-program 1 0 0 0 
+  (load-program
+    1
+    0
    ()
-    6
-    ; This is the metadata thunk for the returned procedure.
-    (load-program 0 0 0 0 () 21 #f
+    8
+    (load-program 0 0 0 () 21 #f
      (load-symbol "x")  ; Name and liveness extent for @code{x}.
      (make-false)
      (make-int8:0) ; Some instruction+arg combinations
@ -597,7 +619,9 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
    (local-ref 0)
    (local-ref 0)
    (add)
-    (return))
+    (return)
+    (nop)
+    (nop))
  ; Return our new procedure.
  (return))
@end example
@ -618,10 +642,10 @@ the next step down from assembly:

@example
 scheme@@(guile-user)> (compile '(+ 32 10) #:to 'assembly)
-@result{} (load-program 0 0 0 0 () 6 #f
+@result{} (load-program 0 0 0 () 6 #f
       (make-int8 32) (make-int8 10) (add) (return))
 scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
-@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 10 32 10 10 100 48)
+@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 10 32 10 10 120 52)
@end example

 ``Objcode'' is bytecode, but mapped directly to a C structure,
@ -631,8 +655,7 @@ scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
 struct scm_objcode @{
  scm_t_uint8 nargs;
  scm_t_uint8 nrest;
-  scm_t_uint8 nlocs;
-  scm_t_uint8 nexts;
+  scm_t_uint16 nlocs;
  scm_t_uint32 len;
  scm_t_uint32 metalen;
  scm_t_uint8 base[0];
@ -642,7 +665,7 @@ struct scm_objcode @{
 As one might imagine, objcode imposes a minimum length on the
 bytecode. Also, the multibyte fields are in native endianness, which
 makes objcode (and bytecode) system-dependent. Indeed, in the short
-example above, all but the last 5 bytes were the program's header.
+example above, all but the last 6 bytes were the program's header.

 Objcode also has a couple of important efficiency hacks. First,
 objcode may be mapped directly from disk, allowing compiled code to be
@ -672,7 +695,7 @@ Makes a bytecode object from @var{bytecode}, which should be a
 Load object code from a file named @var{file}. The file will be mapped
 into memory via @code{mmap}, so this is a very fast operation.

-On disk, object code has an eight-byte cookie prepended to it, to
+On disk, object code has a sixteen-byte cookie prepended to it, to
 prevent accidental loading of arbitrary garbage.
@end deffn

@ -689,11 +712,11 @@ Copy object code out to a @code{u8vector} for analysis by Scheme.
 The following procedure is actually in @code{(system vm program)}, but
 we'll mention it here:

-@deffn {Scheme Variable} make-program objcode objtable [external='()]
-@deffnx {C Function} scm_make_program (objcode, objtable, external)
+@deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
+@deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
 Load up object code into a Scheme program. The resulting program will
 have @var{objtable} as its object table, which should be a vector or
-@code{#f}, and will capture the closure variables from @var{external}.
+@code{#f}, and will capture the free variables from @var{free-vars}.
@end deffn

 Object code from a file may be disassembled at the REPL via the
@ -707,9 +730,9 @@ respect to the compilation environment. Normally the environment
 propagates through the compiler transparently, but users may specify
 the compilation environment manually as well:

-@deffn {Scheme Procedure} make-objcode-env module externals
+@deffn {Scheme Procedure} make-objcode-env module free-vars
 Make an object code environment. @var{module} should be a Scheme
-module, and @var{externals} should be a list of external variables.
+module, and @var{free-vars} should be a vector of free variables.
@code{#f} is also a valid object code environment.
@end deffn

@ -748,12 +771,14 @@ procedure is called a certain number of times.
 The name of the game is a profiling-based harvest of the low-hanging
 fruit, running programs of interest under a system-level profiler and
 determining which improvements would give the most bang for the buck.
-There are many well-known efficiency hacks in the literature: Dybvig's
-letrec optimization, individual boxing of heap-allocated values (and
-then store the boxes on the stack directly), optimized case-lambda
-expressions, stack underflow and overflow handlers, etc. Highly
-recommended papers: Dybvig's HOCS, Ghuloum's compiler paper.
+It's really getting to the point though that native compilation is the
+next step.

 The compiler also needs help at the top end, enhancing the Scheme that
-it knows to also understand R6RS, and adding new high-level compilers:
-Emacs Lisp, Lua, JavaScript...
+it knows to also understand R6RS, and adding new high-level compilers.
+We have JavaScript and Emacs Lisp mostly complete, but they could use
+some love; Lua would be nice as well, but whatever language it is
+that strikes your fancy would be welcome too.
+
+Compilers are for hacking, not for admiring or for complaining about.
+Get to it!
--- a/doc/ref/vm.texi
+++ b/doc/ref/vm.texi
@ -13,8 +13,8 @@ procedures can call each other as they please.

 The difference is that the compiler creates and interprets bytecode
 for a custom virtual machine, instead of interpreting the
-S-expressions directly. Running compiled code is faster than running
-interpreted code.
+S-expressions directly. Loading and running compiled code is faster
+than loading and running source code.

 The virtual machine that does the bytecode interpretation is a part of
 Guile itself. This section describes the nature of Guile's virtual
@ -134,7 +134,7 @@ compiled to object code, one might never leave the virtual machine.
@subsection Stack Layout

 While not strictly necessary to understand how to work with the VM, it
-is instructive and sometimes entertaining to consider the struture of
+is instructive and sometimes entertaining to consider the structure of
 the VM stack.

 Logically speaking, a VM stack is composed of ``frames''. Each frame
@ -159,12 +159,11 @@ The structure of the fixed part of an application frame is as follows:

@example
             Stack
-   |                  | <- fp + bp->nargs + bp->nlocs + 4
+   |                  | <- fp + bp->nargs + bp->nlocs + 3
   +------------------+    = SCM_FRAME_UPPER_ADDRESS (fp)
   | Return address   |
   | MV return address|
-   | Dynamic link     |
-   | External link    | <- fp + bp->nargs + bp->nlocs
+   | Dynamic link     | <- fp + bp->nargs + bp->nlocs
   | Local variable 1 |    = SCM_FRAME_DATA_ADDRESS (fp)
   | Local variable 0 | <- fp + bp->nargs
   | Argument 1       |
@ -201,25 +200,17 @@ values being returned.
@item Dynamic link
 This is the @code{fp} in effect before this program was applied. In
 effect, this and the return address are the registers that are always
-``saved''.
-
-@item External link
-This field is a reference to the list of heap-allocated variables
-associated with this frame. For a discussion of heap versus stack
-allocation, @xref{Variables and the VM}.
+``saved''. The dynamic link links the current frame to the previous
+frame; computing a stack trace involves traversing these frames.

@item Local variable @var{n}
-Lambda-local variables that are allocated on the stack are all
-allocated as part of the frame. This makes access to non-captured,
-non-mutated variables very cheap.
+Lambda-local variables are all allocated as part of the frame.
+This makes access to variables very cheap.

@item Argument @var{n}
 The calling convention of the VM requires arguments of a function
-application to be pushed on the stack, and here they are. Normally
-references to arguments dispatch to these locations on the stack.
-However if an argument has to be stored on the heap, it will be copied
-from its initial value here onto a location in the heap, and
-thereafter only referenced on the heap.
+application to be pushed on the stack, and here they are. References
+to arguments dispatch to these locations on the stack.

@item Program
 This is the program being applied. For more information on how
@ -236,26 +227,44 @@ Consider the following Scheme code as an example:
    (lambda (b) (list foo a b)))
@end example

-Within the lambda expression, "foo" is a top-level variable, "a" is a
-lexically captured variable, and "b" is a local variable.
+Within the lambda expression, @code{foo} is a top-level variable, @code{a} is a
+lexically captured variable, and @code{b} is a local variable.

-@code{b} may safely be allocated on the stack, as there is no enclosed
-procedure that references it, nor is it ever mutated.
+Another way to refer to @code{a} and @code{b} is to say that @code{a}
+is a ``free'' variable, since it is not defined within the lambda, and
+@code{b} is a ``bound'' variable. These are the terms used in the
+@dfn{lambda calculus}, a mathematical notation for describing
+functions. The lambda calculus is useful because it allows one to
+prove statements about functions. It is especially good at describing
+scope relations, and it is for that reason that we mention it here.

-@code{a}, on the other hand, is referenced by an enclosed procedure,
-that of the lambda. Thus it must be allocated on the heap, as it may
-(and will) outlive the dynamic extent of the invocation of @code{foo}.
+Guile allocates all variables on the stack. When a lexically enclosed
+procedure with free variables---a @dfn{closure}---is created, it
+copies those variables into its free variable vector. References to free
+variables are then redirected through the free variable vector.

-@code{foo} is a top-level variable, because it names the procedure
-@code{foo}, which is here defined at the top-level.
+If a variable is ever @code{set!}, however, it will need to be
+heap-allocated instead of stack-allocated, so that different closures
+that capture the same variable can see the same value. Also, this
+allows continuations to capture a reference to the variable, instead
+of to its value at one point in time. For these reasons, @code{set!}
+variables are allocated in ``boxes''---actually, in variable cells.
+@xref{Variables}, for more information. References to @code{set!}
+variables are indirected through the boxes.

-Note that variables that are mutated (via @code{set!}) must be
-allocated on the heap, even if they are local variables. This is
-because any called subprocedure might capture the continuation, which
-would need to capture locations instead of values. Thus perhaps
-counterintuitively, what would seem ``closer to the metal'', viz
-@code{set!}, actually forces heap allocation instead of stack
-allocation.
+Thus perhaps counterintuitively, what would seem ``closer to the
+metal'', viz @code{set!}, actually forces an extra memory allocation
+and indirection.
+
+Going back to our example, @code{b} may be allocated on the stack, as
+it is never mutated.
+
+@code{a} may also be allocated on the stack, as it too is never
+mutated. Within the enclosed lambda, its value will be copied into
+(and referenced from) the free variables vector.
+
+@code{foo} is a top-level variable, because @code{foo} is not
+lexically bound in this example.
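+
+The need for boxes can be seen from plain Scheme. In the following
+sketch (the names @code{make-counter}, @code{bump!} and @code{peek}
+are made up for illustration), two closures capture the same
+@code{set!} variable @code{n}. Copying the value of @code{n} into each
+closure would let them drift apart, so they have to share a location.
+
+@example
+(define (make-counter)
+  (let ((n 0))
+    (values (lambda () (set! n (+ n 1)) n)  ; mutates n
+            (lambda () n))))                ; only reads n
+
+(call-with-values make-counter
+  (lambda (bump! peek)
+    (bump!)
+    (bump!)
+    (peek)))
+@result{} 2
+@end example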

@node VM Programs
@subsection Compiled Procedures are VM Programs
@ -297,27 +306,26 @@ scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b)))
 scheme@@(guile-user)> ,x foo
 Disassembly of #<program foo (a)>:

-   0    (local-ref 0)                   ;; `a' (arg)
-   2    (external-set 0)                ;; `a' (arg)
-   4    (object-ref 1)                  ;; #<program b70d2910 at <unknown port>:0:16 (b)>
-   6    (make-closure)                  
-   7    (return)                        
+   0    (object-ref 1)                  ;; #<program b7e478b0 at <unknown port>:0:16 (b)>
+   2    (local-ref 0)                   ;; `a' (arg)
+   4    (vector 0 1)                    ;; 1 element
+   7    (make-closure)                  
+   8    (return)                        

 ----------------------------------------
-Disassembly of #<program b70d2910 at <unknown port>:0:16 (b)>:
+Disassembly of #<program b7e478b0 at <unknown port>:0:16 (b)>:

   0    (toplevel-ref 1)                ;; `foo'
-   2    (external-ref 0)                ;; (closure variable)
+   2    (free-ref 0)                    ;; (closure variable)
   4    (local-ref 0)                   ;; `b' (arg)
   6    (list 0 3)                      ;; 3 elements         at (unknown file):0:28
   9    (return)                        
@end smallexample

-At @code{ip} 0 and 2, we do the copy from argument to heap for
-@code{a}. @code{Ip} 4 loads up the compiled lambda, and then at
-@code{ip} 6 we make a closure---binding code (from the compiled
-lambda) with data (the heap-allocated variables). Finally we return
-the closure.
+At @code{ip} 0, we load up the compiled lambda. @code{Ip} 2 and 4
+create the free variables vector, and @code{ip} 7 makes the
+closure---binding code (from the compiled lambda) with data (the
+free-variable vector). Finally we return the closure.

 The second stanza disassembles the compiled lambda. Toplevel variables
 are resolved relative to the module that was current when the
@ -336,7 +344,7 @@ routine.
@node Instruction Set
@subsection Instruction Set

-There are about 100 instructions in Guile's virtual machine. These
+There are about 150 instructions in Guile's virtual machine. These
 instructions represent atomic units of a program's execution. Ideally,
 they perform one task without conditional branches, then dispatch to
 the next instruction in the stream.
@ -376,16 +384,22 @@ instructions. More instructions may be added over time.
 * Miscellaneous Instructions::  
 * Inlined Scheme Instructions::  
 * Inlined Mathematical Instructions::  
+* Inlined Bytevector Instructions::  
@end menu

@node Environment Control Instructions
@subsubsection Environment Control Instructions

 These instructions access and mutate the environment of a compiled
-procedure---the local bindings, the ``external'' bindings, and the
+procedure---the local bindings, the free (captured) bindings, and the
 toplevel bindings.

+Some of these instructions have @code{long-} variants, the difference
+being that they take 16-bit arguments, encoded in big-endian order,
+instead of the normal 8-bit range.
+
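+As a sketch of the decoding, assuming the two argument bytes are named
+@code{hi} and @code{lo} (hypothetical names, in instruction-stream
+order), a 16-bit big-endian argument could be reassembled like so:
+
+@example
+;; Hypothetical helper: combine two bytes, most significant first.
+(define (decode-long-arg hi lo)
+  (logior (ash hi 8) lo))
+
+(decode-long-arg 1 44)
+@result{} 300
+@end example
+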
@deffn Instruction local-ref index
+@deffnx Instruction long-local-ref index
 Push onto the stack the value of the local variable located at
@var{index} within the current stack frame.

@ -395,26 +409,62 @@ arguments.
@end deffn

@deffn Instruction local-set index
+@deffnx Instruction long-local-set index
 Pop the Scheme object located on top of the stack and make it the new
 value of the local variable located at @var{index} within the current
 stack frame.
@end deffn

-@deffn Instruction external-ref index
-Push the value of the closure variable located at position
-@var{index} within the program's list of external variables.
+@deffn Instruction free-ref index
+Push the value of the captured variable located at position
+@var{index} within the program's vector of captured variables.
@end deffn

-@deffn Instruction external-set index
-Pop the Scheme object located on top of the stack and make it the new
-value of the closure variable located at @var{index} within the
-program's list of external variables.
+@deffn Instruction free-boxed-ref index
+@deffnx Instruction free-boxed-set index
+Get or set a boxed free variable. Note that there is no free-set
+instruction, as variables that are @code{set!} must be boxed.
+
+These instructions assume that the value at position @var{index} in
+the free variables vector is a variable.
@end deffn

-The external variable lookup algorithm should probably be made more
-efficient in the future via addressing by frame and index. Currently,
-external variables are all consed onto a list, which results in O(N)
-lookup time.
+@deffn Instruction make-closure
+Pop a vector and a program object off the stack, in that order, and
+push a new program object with the given free variables vector. The
+new program object shares state with the original program.
+
+At the time of this writing, the space overhead of closures is 4 words
+per closure.
+@end deffn
+
+@deffn Instruction fix-closure index
+Pop a vector off the stack, and set it as the @var{index}th local
+variable's free variable vector. The @var{index}th local variable is
+assumed to be a procedure.
+
+This instruction is part of a hack for allocating mutually recursive
+procedures. The hack is to first perform a @code{local-set} for all of
+the recursive procedures, then fix up the procedures' free variable
+bindings in place. This allows most @code{letrec}-bound procedures to
+be allocated unboxed on the stack.
+
+One could of course do a @code{local-ref}, then @code{make-closure},
+then @code{local-set}, but this macroinstruction helps to speed up the
+common case.
+@end deffn
+
+@deffn Instruction box index
+Pop a value off the stack, and set the @var{index}th local variable
+to a box containing that value. A shortcut for @code{make-variable}
+then @code{local-set}, used when binding boxed variables.
+@end deffn
+
+@deffn Instruction empty-box index
+Set the @var{index}th local variable to a box containing a variable
+whose value is unbound. Used when compiling some @code{letrec}
+expressions.
+@end deffn

@deffn Instruction toplevel-ref index
@deffnx Instruction long-toplevel-ref index
@ -442,9 +492,6 @@ in-place mutation of the object table. This mechanism provides for
 lazy variable resolution, and an important cached fast-path once the
 variable has been successfully resolved.

-The ``long'' variant has a 16-bit index instead of an 8-bit index,
-with the most significant byte first.
-
 This instruction pushes the value of the variable onto the stack.
@end deffn

@ -453,8 +500,13 @@ This instruction pushes the value of the variable onto the stack.
 Pop a value off the stack, and set it as the value of the toplevel
 variable stored at @var{index} in the object table. If the variable
 has not yet been looked up, we do the lookup as in
-@code{toplevel-ref}. The ``long'' variant has a 16-bit index instead
-of an 8-bit index.
+@code{toplevel-ref}.
+@end deffn
+
+@deffn Instruction define
+Pop a symbol and a value from the stack, in that order. Look up the
+symbol's binding in the current toplevel environment, creating the
+binding if necessary. Set the variable to the value.
@end deffn

@deffn Instruction link-now
@ -476,6 +528,11 @@ Pop off two objects from the stack, a variable and a value, and set
 the variable to the value.
@end deffn

+@deffn Instruction make-variable
+Replace the top object on the stack with a variable containing it.
+Used in some circumstances when compiling @code{letrec} expressions.
+@end deffn
+
@deffn Instruction object-ref n
@deffnx Instruction long-object-ref n
 Push @var{n}th value from the current program's object vector. The
@ -499,7 +556,10 @@ the one to which the instruction pointer points).
@end itemize

 Note that the offset passed to the instruction is encoded on two 8-bit
-integers which are then combined by the VM as one 16-bit integer.
+integers which are then combined by the VM as one 16-bit integer. Note
+also that jump targets in Guile are aligned on 8-byte boundaries, and
+that the offset refers to the @var{n}th 8-byte boundary, effectively
+giving Guile a 19-bit relative address space.
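+
+The arithmetic behind that figure, as a sketch that leaves aside
+whether the offset is treated as signed:
+
+@example
+;; 2^16 distinct offsets, each naming an 8-byte boundary.
+(* (expt 2 16) 8)
+@result{} 524288
+(= (* (expt 2 16) 8) (expt 2 19))
+@result{} #t
+@end example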

@deffn Instruction br offset
 Jump to @var{offset}.
@ -550,19 +610,21 @@ Load an arbitrary number from the instruction stream. The number is
 embedded in the stream as a string.
@end deffn
@deffn Instruction load-string length
-Load a string from the instruction stream.
+Load a string from the instruction stream. The string is assumed to be
+encoded in the ``latin1'' locale.
+@end deffn
+@deffn Instruction load-wide-string length
+Load a UTF-32 string from the instruction stream. @var{length} is the
+length in bytes, not in codepoints.
@end deffn
@deffn Instruction load-symbol length
-Load a symbol from the instruction stream.
+Load a symbol from the instruction stream. The symbol is assumed to be
+encoded in the ``latin1'' locale. Symbols backed by wide strings may
+be loaded via @code{load-wide-string} then @code{make-symbol}.
@end deffn
-@deffn Instruction load-keyword length
-Load a keyword from the instruction stream.
-@end deffn
-
-@deffn Instruction define length
-Load a symbol from the instruction stream, and look up its binding in
-the current toplevel environment, creating the binding if necessary.
-Push the variable corresponding to the binding.
+@deffn Instruction load-array length
+Load a uniform array from the instruction stream. The shape and type
+of the array are popped off the stack, in that order.
@end deffn

@deffn Instruction load-program
@ -579,23 +641,9 @@ because instead of parsing its data, it directly maps the instruction
 stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode
 and Objcode}, for more information.

-The resulting compiled procedure will not have any ``external''
-variables captured, so it may be loaded only once but used many times
-to create closures.
-@end deffn
-
-Finally, while this instruction is not strictly a ``loading''
-instruction, it's useful to wind up the @code{load-program} discussion
-here:
-
-@deffn Instruction make-closure
-Pop the program object from the stack, capture the current set of
-``external'' variables, and assign those external variables to a copy
-of the program. Push the new program object, which shares state with
-the original program.
-
-At the time of this writing, the space overhead of closures is 4 words
-per closure.
+The resulting compiled procedure will not have any free variables
+captured, so it may be loaded only once but used many times to create
+closures.
@end deffn

@node Procedural Instructions
@ -764,6 +812,19 @@ Push @code{'()} onto the stack.
 Push @var{value}, an 8-bit character, onto the stack.
@end deffn

+@deffn Instruction make-char32 value
+Push @var{value}, a 32-bit character, onto the stack. The value is
+encoded in big-endian order.
+@end deffn
+
+@deffn Instruction make-symbol
+Pops a string off the stack, and pushes a symbol.
+@end deffn
+
+@deffn Instruction make-keyword value
+Pops a symbol off the stack, and pushes a keyword.
+@end deffn
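+
+At the Scheme level, the combined effect of these two instructions can
+be approximated as follows. This is only a sketch: it assumes that the
+@code{make-symbol} instruction behaves like @code{string->symbol} and
+that @code{make-keyword} behaves like @code{symbol->keyword}, which
+the descriptions above suggest but do not state.
+
+@example
+;; Model of: push a string, then make-symbol, then make-keyword.
+(define str "foo")
+(define sym (string->symbol str))   ; assumed effect of make-symbol
+(define kw (symbol->keyword sym))   ; assumed effect of make-keyword
+(eq? sym 'foo)                      ; => #t
+(eq? kw #:foo)                      ; => #t
+@end example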
+
@deffn Instruction list n
 Pops off the top @var{n} values off of the stack, consing them up into
 a list, then pushes that list on the stack. What was the topmost value
@ -807,7 +868,8 @@ pushes its elements on the stack.
@subsubsection Miscellaneous Instructions

@deffn Instruction nop
-Does nothing!
+Does nothing! Used for padding other instructions to certain
+alignments.
@end deffn

@deffn Instruction halt
@ -873,6 +935,8 @@ stream.
@deffnx Instruction cons x y
@deffnx Instruction car x
@deffnx Instruction cdr x
+@deffnx Instruction vector-ref x y
+@deffnx Instruction vector-set x n y
 Inlined implementations of their Scheme equivalents.
@end deffn

@ -893,7 +957,9 @@ As in the previous section, the definitions below show stack
 parameters instead of instruction stream parameters.

@deffn Instruction add x y
+@deffnx Instruction add1 x
@deffnx Instruction sub x y
+@deffnx Instruction sub1 x
@deffnx Instruction mul x y
@deffnx Instruction div x y
@deffnx Instruction quo x y
@ -906,3 +972,58 @@ parameters instead of instruction stream parameters.
@deffnx Instruction ge? x y
 Inlined implementations of the corresponding mathematical operations.
@end deffn
+
+@node Inlined Bytevector Instructions
+@subsubsection Inlined Bytevector Instructions
+
+Bytevector operations correspond closely to what the current hardware
+can do, so it makes sense to inline them to VM instructions, providing
+a clear path for eventual native compilation. Without this, Scheme
+programs would need other primitives for accessing raw bytes -- but
+these primitives are as good as any.
+
+As in the previous section, the definitions below show stack
+parameters instead of instruction stream parameters.
+
+The multibyte formats (@code{u16}, @code{f64}, etc.) take an extra
+endianness argument. Only aligned native accesses are currently
+fast-pathed in Guile's VM.
+
+@deffn Instruction bv-u8-ref bv n
+@deffnx Instruction bv-s8-ref bv n
+@deffnx Instruction bv-u16-native-ref bv n
+@deffnx Instruction bv-s16-native-ref bv n
+@deffnx Instruction bv-u32-native-ref bv n
+@deffnx Instruction bv-s32-native-ref bv n
+@deffnx Instruction bv-u64-native-ref bv n
+@deffnx Instruction bv-s64-native-ref bv n
+@deffnx Instruction bv-f32-native-ref bv n
+@deffnx Instruction bv-f64-native-ref bv n
+@deffnx Instruction bv-u16-ref bv n endianness
+@deffnx Instruction bv-s16-ref bv n endianness
+@deffnx Instruction bv-u32-ref bv n endianness
+@deffnx Instruction bv-s32-ref bv n endianness
+@deffnx Instruction bv-u64-ref bv n endianness
+@deffnx Instruction bv-s64-ref bv n endianness
+@deffnx Instruction bv-f32-ref bv n endianness
+@deffnx Instruction bv-f64-ref bv n endianness
+@deffnx Instruction bv-u8-set bv n val
+@deffnx Instruction bv-s8-set bv n val
+@deffnx Instruction bv-u16-native-set bv n val
+@deffnx Instruction bv-s16-native-set bv n val
+@deffnx Instruction bv-u32-native-set bv n val
+@deffnx Instruction bv-s32-native-set bv n val
+@deffnx Instruction bv-u64-native-set bv n val
+@deffnx Instruction bv-s64-native-set bv n val
+@deffnx Instruction bv-f32-native-set bv n val
+@deffnx Instruction bv-f64-native-set bv n val
+@deffnx Instruction bv-u16-set bv n val endianness
+@deffnx Instruction bv-s16-set bv n val endianness
+@deffnx Instruction bv-u32-set bv n val endianness
+@deffnx Instruction bv-s32-set bv n val endianness
+@deffnx Instruction bv-u64-set bv n val endianness
+@deffnx Instruction bv-s64-set bv n val endianness
+@deffnx Instruction bv-f32-set bv n val endianness
+@deffnx Instruction bv-f64-set bv n val endianness
+Inlined implementations of the corresponding bytevector operations.
+@end deffn
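+
+As a rough illustration at the Scheme level, and assuming that these
+instructions mirror the procedures exported by the
+@code{(rnrs bytevectors)} module, the following sketch shows the kind
+of accesses that the compiler could inline:
+
+@example
+(use-modules (rnrs bytevectors))
+
+(define bv (make-bytevector 8 0))
+;; Single-byte store and load; no endianness argument.
+(bytevector-u8-set! bv 0 1)
+(bytevector-u8-set! bv 1 2)
+(bytevector-u8-ref bv 1)                       ; => 2
+;; Multibyte loads take an explicit endianness.
+(bytevector-u16-ref bv 0 (endianness big))     ; => 258
+(bytevector-u16-ref bv 0 (endianness little))  ; => 513
+;; Native variants use the host byte order, so the result of this
+;; load depends on the machine.
+(bytevector-u16-native-ref bv 0)
+@end example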
--- a/module/language/assembly/disassemble.scm
+++ b/module/language/assembly/disassemble.scm
@ -60,6 +60,8 @@
                  (print-info pos `(load-program ,sym) #f #f)
                  (lp (+ pos (byte-length asm)) (cdr code)
                      (acons sym asm programs))))
+               ((nop)
+                (lp (+ pos (byte-length asm)) (cdr code) programs))
               (else
                (print-info pos asm
                            (code-annotation end asm objs nargs blocs