mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-04-30 03:40:34 +02:00

update docs for recent vm/compiler work

* doc/ref/compiler.texi:
* doc/ref/vm.texi: Update for recent changes.
* module/language/assembly/disassemble.scm (disassemble-load-program):
  Don't print nops, they are distracting.
Andy Wingo 2009-08-12 23:38:05 +02:00
parent aaae0d5ab3
commit 98850fd727
3 changed files with 313 additions and 165 deletions

doc/ref/compiler.texi

@ -17,7 +17,7 @@ This section aims to pay attention to the small man behind the
curtain.
@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
know how to compile your .scm file.
know how to compile your @code{.scm} file.
@menu
* Compiler Tower::
@ -67,8 +67,7 @@ for Scheme:
#:title "Guile Scheme"
#:version "0.5"
#:reader read
#:compilers `((tree-il . ,compile-tree-il)
(ghil . ,compile-ghil))
#:compilers `((tree-il . ,compile-tree-il))
#:decompilers `((tree-il . ,decompile-tree-il))
#:evaluator (lambda (x module) (primitive-eval x))
#:printer write)
@ -220,13 +219,13 @@ Note however that @code{sc-expand} does not have the same signature as
around @code{sc-expand}, to make it conform to the general form of
compiler procedures in Guile's language tower.
Compiler procedures take two arguments, an expression and an
environment. They return three values: the compiled expression, the
corresponding environment for the target language, and a
``continuation environment''. The compiled expression and environment
will serve as input to the next language's compiler. The
``continuation environment'' can be used to compile another expression
from the same source language within the same module.
Compiler procedures take three arguments: an expression, an
environment, and a keyword list of options. They return three values:
the compiled expression, the corresponding environment for the target
language, and a ``continuation environment''. The compiled expression
and environment will serve as input to the next language's compiler.
The ``continuation environment'' can be used to compile another
expression from the same source language within the same module.
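As a rough illustration of this calling convention, the sketch below defines a compiler procedure for a made-up source language whose only transformation is to wrap the expression in a @code{begin}. The name @code{compile-toy} and the use of a symbol as the environment are hypothetical; only the three-argument, three-value shape comes from the text above.

```scheme
;; Hypothetical compiler procedure; not part of Guile.
;; Takes an expression, an environment, and a keyword list of options.
(define (compile-toy exp env opts)
  (values `(begin ,exp)  ; the compiled expression
          env            ; environment for the target language
          env))          ; continuation environment

;; Receive all three return values and collect them in a list.
(define toy-result
  (call-with-values
      (lambda () (compile-toy '(+ 32 10) 'toy-env '()))
    (lambda (compiled target-env cenv)
      (list compiled target-env cenv))))
;; toy-result => ((begin (+ 32 10)) toy-env toy-env)
```

A real compiler in the tower would usually return a different kind of environment for the target language; reusing @code{env} here just keeps the sketch short.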
For example, you might compile the expression, @code{(define-module
(foo))}. This will result in a Tree-IL expression and environment. But
@ -292,6 +291,14 @@ tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
The @code{src} fields are left out of the external representation.
One may create Tree-IL objects from their external representations via
calling @code{parse-tree-il}, the reader for Tree-IL. If any source
information is attached to the input S-expression, it will be
propagated to the resulting Tree-IL expressions. This is probably the
easiest way to compile to Tree-IL: just make the appropriate external
representations in S-expression format, and let @code{parse-tree-il}
take care of the rest.
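A minimal sketch of that workflow, assuming a Guile build in which @code{(language tree-il)} exports @code{parse-tree-il} and @code{(system base compile)} exports @code{compile} with a @code{value} target. The input uses only the @code{const} form, taken from the external representation shown in the REPL transcript above; other form names may differ between Guile versions.

```scheme
(use-modules (language tree-il)
             (system base compile))

;; Build a Tree-IL object from its external representation.
(define tree (parse-tree-il '(const 42)))

;; Hand the Tree-IL object to the rest of the compiler tower.
(define tree-value (compile tree #:from 'tree-il #:to 'value))
```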
@deftp {Scheme Variable} <void> src
@deftpx {External Representation} (void)
An empty expression. In practice, equivalent to Scheme's @code{(if #f
@ -384,12 +391,29 @@ A version of @code{<let>} that creates recursive bindings, like
Scheme's @code{letrec}.
@end deftp
@c FIXME -- need to revive this one
@c @deftp {Scheme Variable} <ghil-mv-bind> src vars rest producer . body
@c Like Scheme's @code{receive} -- binds the values returned by
@c applying @code{producer}, which should be a thunk, to the
@c @code{lambda}-like bindings described by @var{vars} and @var{rest}.
@c @end deftp
There are two Tree-IL constructs that are not normally produced by
higher-level compilers, but instead are generated during the
source-to-source optimization and analysis passes that the Tree-IL
compiler does. Users should not generate these expressions directly,
unless they feel very clever, as the default analysis pass will
generate them as necessary.
@deftp {Scheme Variable} <let-values> src names vars exp body
@deftpx {External Representation} (let-values @var{names} @var{vars} @var{exp} @var{body})
Like Scheme's @code{receive} -- binds the values returned by
evaluating @code{exp} to the @code{lambda}-like bindings described by
@var{vars}. That is to say, @var{vars} may be an improper list.
@code{<let-values>} is an optimization of @code{<application>} of the
primitive, @code{call-with-values}.
@end deftp
@deftp {Scheme Variable} <fix> src names vars vals body
@deftpx {External Representation} (fix @var{names} @var{vars} @var{vals} @var{body})
Like @code{<letrec>}, but only for @var{vals} that are unset
@code{lambda} expressions.
@code{fix} is an optimization of @code{letrec} (and @code{let}).
@end deftp
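For a sense of the source shape this targets, consider a @code{letrec} whose bindings are all @code{lambda} expressions that are never assigned afterwards. The example is ordinary Scheme with illustrative names (@code{my-even?}, @code{my-odd?}, @code{parity}); whether a given compiler pass actually rewrites it to @code{<fix>} depends on the analysis described above.

```scheme
;; Two mutually recursive procedures bound by letrec and never set!.
;; Only meaningful for non-negative integers.
(define (parity n)
  (letrec ((my-even? (lambda (k) (if (= k 0) #t (my-odd? (- k 1)))))
           (my-odd?  (lambda (k) (if (= k 0) #f (my-even? (- k 1))))))
    (if (my-even? n) 'even 'odd)))

(parity 10) ; => even
(parity 7)  ; => odd
```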
Tree-IL implements a compiler to GLIL that recursively traverses
Tree-IL expressions, writing out GLIL expressions into a linear list.
@ -399,9 +423,9 @@ future computations. This state allows the compiler not to emit code
for constant expressions that will not be used (e.g. docstrings), and
to perform tail calls when in tail position.
In the future, there will be a pass at the beginning of the
Tree-IL->GLIL compilation step to perform inlining, copy propagation,
dead code elimination, and constant folding.
Most optimization, such as it currently is, is performed on Tree-IL
expressions as source-to-source transformations. There will be more
optimizations added in the future.
Interested readers are encouraged to read the implementation in
@code{(language tree-il compile-glil)} for more details.
@ -411,18 +435,16 @@ Interested readers are encouraged to read the implementation in
Guile Low Intermediate Language (GLIL) is a structured intermediate
language whose expressions more closely approximate Guile's VM
instruction set.
instruction set. Its expression types are defined in @code{(language
glil)}.
Its expression types are defined in @code{(language glil)}, and as
with GHIL, some of its fields parse as rest arguments.
@deftp {Scheme Variable} <glil-program> nargs nrest nlocs nexts meta . body
@deftp {Scheme Variable} <glil-program> nargs nrest nlocs meta . body
A unit of code that at run-time will correspond to a compiled
procedure. @var{nargs} @var{nrest} @var{nlocs}, and @var{nexts}
collectively define the program's arity; see @ref{Compiled
Procedures}, for more information. @var{meta} should be an alist of
properties, as in Tree IL's @code{<lambda>}. @var{body} is a list of
GLIL expressions.
procedure. @var{nargs}, @var{nrest}, and @var{nlocs} collectively define
the program's arity; see @ref{Compiled Procedures}, for more
information. @var{meta} should be an alist of properties, as in
Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
expressions.
@end deftp
@deftp {Scheme Variable} <glil-bind> . vars
An advisory expression that notes a liveness extent for a set of
@ -461,23 +483,21 @@ and @code{filename} keys, e.g. as returned by
@code{source-properties}.
@end deftp
@deftp {Scheme Variable} <glil-void>
Pushes the unspecified value on the stack.
Pushes ``the unspecified value'' on the stack.
@end deftp
@deftp {Scheme Variable} <glil-const> obj
Pushes a constant value onto the stack. @var{obj} must be a number,
string, symbol, keyword, boolean, character, the empty list, or a pair
or vector of constants.
string, symbol, keyword, boolean, character, uniform array, the empty
list, or a pair or vector of constants.
@end deftp
@deftp {Scheme Variable} <glil-local> op index
Accesses a lexically bound variable from the stack. If @var{op} is
@code{ref}, the value is pushed onto the stack; if it is @code{set},
the variable is set from the top value on the stack, which is popped
off. @xref{Stack Layout}, for more information.
@end deftp
@deftp {Scheme Variable} <glil-external> op depth index
Accesses a heap-allocated variable, addressed by @var{depth}, the nth
enclosing environment, and @var{index}, the variable's position within
the environment. @var{op} is @code{ref} or @code{set}.
@deftp {Scheme Variable} <glil-lexical> local? boxed? op index
Accesses a lexically bound variable. If the variable is not
@var{local?} it is free. All variables may have @code{ref} and
@code{set} as their @var{op}. Boxed variables may also have the
@var{op}s @code{box}, @code{empty-box}, and @code{fix}, which
correspond in semantics to the VM instructions @code{box},
@code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
more information.
@end deftp
@deftp {Scheme Variable} <glil-toplevel> op name
Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
@ -520,7 +540,7 @@ Guile Lowlevel Intermediate Language (GLIL) interpreter 0.3 on Guile 1.9.0
Copyright (C) 2001-2008 Free Software Foundation, Inc.
Enter `,help' for help.
glil@@(guile-user)> (program 0 0 0 0 () (const 3) (call return 0))
glil@@(guile-user)> (program 0 0 0 () (const 3) (call return 1))
@result{} 3
@end example
@ -542,12 +562,12 @@ differs from GLIL in four main ways:
@itemize
@item Labels have been resolved to byte offsets in the program.
@item Constants inside procedures have either been expressed as inline
instructions, and possibly cached in object arrays.
instructions or cached in object arrays.
@item Procedures with metadata (source location information, liveness
extents, procedure names, generic properties, etc) have had their
metadata serialized out to thunks.
@item All expressions correspond directly to VM instructions -- i.e.,
there is no @code{<glil-local>} which can be a ref or a set.
there is no @code{<glil-lexical>} which can be a ref or a set.
@end itemize
Assembly is isomorphic to the bytecode that it compiles to. You can
@ -567,10 +587,11 @@ example:
@example
scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
(load-program 0 0 0 0
(load-program 0 0 0
() ; Labels
60 ; Length
70 ; Length
#f ; Metadata
(make-false)
(make-false) ; object table for the returned lambda
(nop)
(nop) ; Alignment. Since assembly has already resolved its labels
@ -578,11 +599,12 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
(nop) ; object code is mmap'd directly to structures, assembly
(nop) ; has to have the alignment embedded in it.
(nop)
(load-program 1 0 0 0
(load-program
1
0
()
6
; This is the metadata thunk for the returned procedure.
(load-program 0 0 0 0 () 21 #f
8
(load-program 0 0 0 () 21 #f
(load-symbol "x") ; Name and liveness extent for @code{x}.
(make-false)
(make-int8:0) ; Some instruction+arg combinations
@ -597,7 +619,9 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
(local-ref 0)
(local-ref 0)
(add)
(return))
(return)
(nop)
(nop))
; Return our new procedure.
(return))
@end example
@ -618,10 +642,10 @@ the next step down from assembly:
@example
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'assembly)
@result{} (load-program 0 0 0 0 () 6 #f
@result{} (load-program 0 0 0 () 6 #f
(make-int8 32) (make-int8 10) (add) (return))
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 10 32 10 10 100 48)
@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 10 32 10 10 120 52)
@end example
``Objcode'' is bytecode, but mapped directly to a C structure,
@ -631,8 +655,7 @@ scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
struct scm_objcode @{
scm_t_uint8 nargs;
scm_t_uint8 nrest;
scm_t_uint8 nlocs;
scm_t_uint8 nexts;
scm_t_uint16 nlocs;
scm_t_uint32 len;
scm_t_uint32 metalen;
scm_t_uint8 base[0];
@ -642,7 +665,7 @@ struct scm_objcode @{
As one might imagine, objcode imposes a minimum length on the
bytecode. Also, the multibyte fields are in native endianness, which
makes objcode (and bytecode) system-dependent. Indeed, in the short
example above, all but the last 5 bytes were the program's header.
example above, all but the last 6 bytes were the program's header.
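The split between header and instructions can be checked by hand. The sketch below uses plain lists rather than any Guile API, and assumes a little-endian host and that @code{len} is the 32-bit field at byte offset 4, as the struct layout above suggests. The helper names are made up for the illustration.

```scheme
;; The bytecode from the example above, as a list of byte values.
(define bytes
  '(0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 10 32 10 10 120 52))

;; Return the first n elements of lst.
(define (take-bytes lst n)
  (if (= n 0)
      '()
      (cons (car lst) (take-bytes (cdr lst) (- n 1)))))

;; Decode a little-endian unsigned integer from a list of bytes.
(define (le-bytes->integer lst)
  (if (null? lst)
      0
      (+ (car lst) (* 256 (le-bytes->integer (cdr lst))))))

(define header-size (- (length bytes) 6))    ; => 16
(define header (take-bytes bytes header-size))
(define body (list-tail bytes header-size))  ; => (10 32 10 10 120 52)

;; Assumption: len occupies bytes 4 through 7 of the header.
(define len
  (le-bytes->integer (take-bytes (list-tail header 4) 4))) ; => 6
```

Under those assumptions the decoded @code{len} matches the number of instruction bytes that follow the header.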
Objcode also has a couple of important efficiency hacks. First,
objcode may be mapped directly from disk, allowing compiled code to be
@ -672,7 +695,7 @@ Makes a bytecode object from @var{bytecode}, which should be a
Load object code from a file named @var{file}. The file will be mapped
into memory via @code{mmap}, so this is a very fast operation.
On disk, object code has an eight-byte cookie prepended to it, to
On disk, object code has a sixteen-byte cookie prepended to it, to
prevent accidental loading of arbitrary garbage.
@end deffn
@ -689,11 +712,11 @@ Copy object code out to a @code{u8vector} for analysis by Scheme.
The following procedure is actually in @code{(system vm program)}, but
we'll mention it here:
@deffn {Scheme Variable} make-program objcode objtable [external='()]
@deffnx {C Function} scm_make_program (objcode, objtable, external)
@deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
@deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
Load up object code into a Scheme program. The resulting program will
have @var{objtable} as its object table, which should be a vector or
@code{#f}, and will capture the closure variables from @var{external}.
@code{#f}, and will capture the free variables from @var{free-vars}.
@end deffn
Object code from a file may be disassembled at the REPL via the
@ -707,9 +730,9 @@ respect to the compilation environment. Normally the environment
propagates through the compiler transparently, but users may specify
the compilation environment manually as well:
@deffn {Scheme Procedure} make-objcode-env module externals
@deffn {Scheme Procedure} make-objcode-env module free-vars
Make an object code environment. @var{module} should be a Scheme
module, and @var{externals} should be a list of external variables.
module, and @var{free-vars} should be a vector of free variables.
@code{#f} is also a valid object code environment.
@end deffn
@ -748,12 +771,14 @@ procedure is called a certain number of times.
The name of the game is a profiling-based harvest of the low-hanging
fruit, running programs of interest under a system-level profiler and
determining which improvements would give the most bang for the buck.
There are many well-known efficiency hacks in the literature: Dybvig's
letrec optimization, individual boxing of heap-allocated values (and
then store the boxes on the stack directly), optimized case-lambda
expressions, stack underflow and overflow handlers, etc. Highly
recommended papers: Dybvig's HOCS, Ghuloum's compiler paper.
It's really getting to the point though that native compilation is the
next step.
The compiler also needs help at the top end, enhancing the Scheme that
it knows to also understand R6RS, and adding new high-level compilers:
Emacs Lisp, Lua, JavaScript...
it knows to also understand R6RS, and adding new high-level compilers.
We have JavaScript and Emacs Lisp mostly complete, but they could use
some love; Lua would be nice as well, but whatever language it is
that strikes your fancy would be welcome too.
Compilers are for hacking, not for admiring or for complaining about.
Get to it!

doc/ref/vm.texi

@ -13,8 +13,8 @@ procedures can call each other as they please.
The difference is that the compiler creates and interprets bytecode
for a custom virtual machine, instead of interpreting the
S-expressions directly. Running compiled code is faster than running
interpreted code.
S-expressions directly. Loading and running compiled code is faster
than loading and running source code.
The virtual machine that does the bytecode interpretation is a part of
Guile itself. This section describes the nature of Guile's virtual
@ -134,7 +134,7 @@ compiled to object code, one might never leave the virtual machine.
@subsection Stack Layout
While not strictly necessary to understand how to work with the VM, it
is instructive and sometimes entertaining to consider the struture of
is instructive and sometimes entertaining to consider the structure of
the VM stack.
Logically speaking, a VM stack is composed of ``frames''. Each frame
@ -159,12 +159,11 @@ The structure of the fixed part of an application frame is as follows:
@example
Stack
| | <- fp + bp->nargs + bp->nlocs + 4
| | <- fp + bp->nargs + bp->nlocs + 3
+------------------+ = SCM_FRAME_UPPER_ADDRESS (fp)
| Return address |
| MV return address|
| Dynamic link |
| External link | <- fp + bp->nargs + bp->nlocs
| Dynamic link | <- fp + bp->nargs + bp->nlocs
| Local variable 1 | = SCM_FRAME_DATA_ADDRESS (fp)
| Local variable 0 | <- fp + bp->nargs
| Argument 1 |
@ -201,25 +200,17 @@ values being returned.
@item Dynamic link
This is the @code{fp} in effect before this program was applied. In
effect, this and the return address are the registers that are always
``saved''.
@item External link
This field is a reference to the list of heap-allocated variables
associated with this frame. For a discussion of heap versus stack
allocation, @xref{Variables and the VM}.
``saved''. The dynamic link links the current frame to the previous
frame; computing a stack trace involves traversing these frames.
@item Local variable @var{n}
Lambda-local variables that are allocated on the stack are all
allocated as part of the frame. This makes access to non-captured,
non-mutated variables very cheap.
Lambda-local variables are all allocated as part of the frame.
This makes access to variables very cheap.
@item Argument @var{n}
The calling convention of the VM requires arguments of a function
application to be pushed on the stack, and here they are. Normally
references to arguments dispatch to these locations on the stack.
However if an argument has to be stored on the heap, it will be copied
from its initial value here onto a location in the heap, and
thereafter only referenced on the heap.
application to be pushed on the stack, and here they are. References
to arguments dispatch to these locations on the stack.
@item Program
This is the program being applied. For more information on how
@ -236,26 +227,44 @@ Consider the following Scheme code as an example:
(lambda (b) (list foo a b)))
@end example
Within the lambda expression, "foo" is a top-level variable, "a" is a
lexically captured variable, and "b" is a local variable.
Within the lambda expression, @code{foo} is a top-level variable, @code{a} is a
lexically captured variable, and @code{b} is a local variable.
@code{b} may safely be allocated on the stack, as there is no enclosed
procedure that references it, nor is it ever mutated.
Another way to refer to @code{a} and @code{b} is to say that @code{a}
is a ``free'' variable, since it is not defined within the lambda, and
@code{b} is a ``bound'' variable. These are the terms used in the
@dfn{lambda calculus}, a mathematical notation for describing
functions. The lambda calculus is useful because it allows one to
prove statements about functions. It is especially good at describing
scope relations, and it is for that reason that we mention it here.
@code{a}, on the other hand, is referenced by an enclosed procedure,
that of the lambda. Thus it must be allocated on the heap, as it may
(and will) outlive the dynamic extent of the invocation of @code{foo}.
Guile allocates all variables on the stack. When a lexically enclosed
procedure with free variables---a @dfn{closure}---is created, it
copies those variables into its free variable vector. References to free
variables are then redirected through the free variable vector.
@code{foo} is a top-level variable, because it names the procedure
@code{foo}, which is here defined at the top-level.
If a variable is ever @code{set!}, however, it will need to be
heap-allocated instead of stack-allocated, so that different closures
that capture the same variable can see the same value. Also, this
allows continuations to capture a reference to the variable, instead
of to its value at one point in time. For these reasons, @code{set!}
variables are allocated in ``boxes''---actually, in variable cells.
@xref{Variables}, for more information. References to @code{set!}
variables are indirected through the boxes.
Note that variables that are mutated (via @code{set!}) must be
allocated on the heap, even if they are local variables. This is
because any called subprocedure might capture the continuation, which
would need to capture locations instead of values. Thus perhaps
counterintuitively, what would seem ``closer to the metal'', viz
@code{set!}, actually forces heap allocation instead of stack
allocation.
Thus perhaps counterintuitively, what would seem ``closer to the
metal'', viz @code{set!}, actually forces an extra memory allocation
and indirection.
Going back to our example, @code{b} may be allocated on the stack, as
it is never mutated.
@code{a} may also be allocated on the stack, as it too is never
mutated. Within the enclosed lambda, its value will be copied into
(and referenced from) the free variables vector.
@code{foo} is a top-level variable, because @code{foo} is not
lexically bound in this example.
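The allocation strategy can be modelled in ordinary Scheme. The sketch below is a hand-written approximation, not what the compiler emits: a closure is represented as a pair of code and a free-variable vector, and a one-element vector stands in for a box. All names ending in @code{*}, and @code{make-counter}, are hypothetical. The symbol @code{foo} replaces the toplevel reference to keep the output readable.

```scheme
;; Model of (define (foo a) (lambda (b) (list foo a b))).
;; A "closure" here is a pair of code and a free-variable vector.
(define (make-closure* code free-vars) (cons code free-vars))
(define (call-closure* clo . args)
  (apply (car clo) (cdr clo) args))

;; Code for the inner lambda: `a' is read from the free-variable
;; vector, `b' is an ordinary argument.
(define (inner-code free b)
  (list 'foo (vector-ref free 0) b))

(define (foo* a)
  (make-closure* inner-code (vector a)))  ; copy `a' into the vector

(call-closure* (foo* 1) 2) ; => (foo 1 2)

;; A variable that is set! lives in a box, so every closure that
;; shares the box observes the update.
(define (make-counter)
  (let ((box (vector 0)))
    (cons (lambda () (vector-set! box 0 (+ 1 (vector-ref box 0))))
          (lambda () (vector-ref box 0)))))

(define c (make-counter))
((car c))
((car c))
((cdr c)) ; => 2
```

Copying works for @code{a} because it is never mutated; the counter needs the extra indirection because both closures must see the same location.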
@node VM Programs
@subsection Compiled Procedures are VM Programs
@ -297,27 +306,26 @@ scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b)))
scheme@@(guile-user)> ,x foo
Disassembly of #<program foo (a)>:
0 (local-ref 0) ;; `a' (arg)
2 (external-set 0) ;; `a' (arg)
4 (object-ref 1) ;; #<program b70d2910 at <unknown port>:0:16 (b)>
6 (make-closure)
7 (return)
0 (object-ref 1) ;; #<program b7e478b0 at <unknown port>:0:16 (b)>
2 (local-ref 0) ;; `a' (arg)
4 (vector 0 1) ;; 1 element
7 (make-closure)
8 (return)
----------------------------------------
Disassembly of #<program b70d2910 at <unknown port>:0:16 (b)>:
Disassembly of #<program b7e478b0 at <unknown port>:0:16 (b)>:
0 (toplevel-ref 1) ;; `foo'
2 (external-ref 0) ;; (closure variable)
2 (free-ref 0) ;; (closure variable)
4 (local-ref 0) ;; `b' (arg)
6 (list 0 3) ;; 3 elements at (unknown file):0:28
9 (return)
@end smallexample
At @code{ip} 0 and 2, we do the copy from argument to heap for
@code{a}. @code{Ip} 4 loads up the compiled lambda, and then at
@code{ip} 6 we make a closure---binding code (from the compiled
lambda) with data (the heap-allocated variables). Finally we return
the closure.
At @code{ip} 0, we load up the compiled lambda. @code{Ip} 2 and 4
create the free variables vector, and @code{ip} 7 makes the
closure---binding code (from the compiled lambda) with data (the
free-variable vector). Finally we return the closure.
The second stanza disassembles the compiled lambda. Toplevel variables
are resolved relative to the module that was current when the
@ -336,7 +344,7 @@ routine.
@node Instruction Set
@subsection Instruction Set
There are about 100 instructions in Guile's virtual machine. These
There are about 150 instructions in Guile's virtual machine. These
instructions represent atomic units of a program's execution. Ideally,
they perform one task without conditional branches, then dispatch to
the next instruction in the stream.
@ -376,16 +384,22 @@ instructions. More instructions may be added over time.
* Miscellaneous Instructions::
* Inlined Scheme Instructions::
* Inlined Mathematical Instructions::
* Inlined Bytevector Instructions::
@end menu
@node Environment Control Instructions
@subsubsection Environment Control Instructions
These instructions access and mutate the environment of a compiled
procedure---the local bindings, the ``external'' bindings, and the
procedure---the local bindings, the free (captured) bindings, and the
toplevel bindings.
Some of these instructions have @code{long-} variants, the difference
being that they take 16-bit arguments, encoded in big-endian order,
instead of the usual 8-bit arguments.
@deffn Instruction local-ref index
@deffnx Instruction long-local-ref index
Push onto the stack the value of the local variable located at
@var{index} within the current stack frame.
@ -395,26 +409,62 @@ arguments.
@end deffn
@deffn Instruction local-set index
@deffnx Instruction long-local-set index
Pop the Scheme object located on top of the stack and make it the new
value of the local variable located at @var{index} within the current
stack frame.
@end deffn
@deffn Instruction external-ref index
Push the value of the closure variable located at position
@var{index} within the program's list of external variables.
@deffn Instruction free-ref index
Push the value of the captured variable located at position
@var{index} within the program's vector of captured variables.
@end deffn
@deffn Instruction external-set index
Pop the Scheme object located on top of the stack and make it the new
value of the closure variable located at @var{index} within the
program's list of external variables.
@deffn Instruction free-boxed-ref index
@deffnx Instruction free-boxed-set index
Get or set a boxed free variable. Note that there is no free-set
instruction, as variables that are @code{set!} must be boxed.
These instructions assume that the value at position @var{index} in
the free variables vector is a variable.
@end deffn
The external variable lookup algorithm should probably be made more
efficient in the future via addressing by frame and index. Currently,
external variables are all consed onto a list, which results in O(N)
lookup time.
@deffn Instruction make-closure
Pop a vector and a program object off the stack, in that order, and
push a new program object with the given free variables vector. The
new program object shares state with the original program.
At the time of this writing, the space overhead of closures is 4 words
per closure.
@end deffn
@deffn Instruction fix-closure index
Pop a vector off the stack, and set it as the @var{index}th local
variable's free variable vector. The @var{index}th local variable is
assumed to be a procedure.
This instruction is part of a hack for allocating mutually recursive
procedures. The hack is to first perform a @code{local-set} for all of
the recursive procedures, then fix up the procedures' free variable
bindings in place. This allows most @code{letrec}-bound procedures to
be allocated unboxed on the stack.
One could of course do a @code{local-ref}, then @code{make-closure},
then @code{local-set}, but this macroinstruction helps to speed up the
common case.
@end deffn
@deffn Instruction box index
Pop a value off the stack, and set the @var{index}th local variable
to a box containing that value. A shortcut for @code{make-variable}
then @code{local-set}, used when binding boxed variables.
@end deffn
@deffn Instruction empty-box index
Set the @var{index}th local variable to a box containing a variable
whose value is unbound. Used when compiling some @code{letrec}
expressions.
@end deffn
@deffn Instruction toplevel-ref index
@deffnx Instruction long-toplevel-ref index
@ -442,9 +492,6 @@ in-place mutation of the object table. This mechanism provides for
lazy variable resolution, and an important cached fast-path once the
variable has been successfully resolved.
The ``long'' variant has a 16-bit index instead of an 8-bit index,
with the most significant byte first.
This instruction pushes the value of the variable onto the stack.
@end deffn
@ -453,8 +500,13 @@ This instruction pushes the value of the variable onto the stack.
Pop a value off the stack, and set it as the value of the toplevel
variable stored at @var{index} in the object table. If the variable
has not yet been looked up, we do the lookup as in
@code{toplevel-ref}. The ``long'' variant has a 16-bit index instead
of an 8-bit index.
@code{toplevel-ref}.
@end deffn
@deffn Instruction define
Pop a symbol and a value from the stack, in that order. Look up its
binding in the current toplevel environment, creating the binding if
necessary. Set the variable to the value.
@end deffn
@deffn Instruction link-now
@ -476,6 +528,11 @@ Pop off two objects from the stack, a variable and a value, and set
the variable to the value.
@end deffn
@deffn Instruction make-variable
Replace the top object on the stack with a variable containing it.
Used in some circumstances when compiling @code{letrec} expressions.
@end deffn
@deffn Instruction object-ref n
@deffnx Instruction long-object-ref n
Push @var{n}th value from the current program's object vector. The
@ -499,7 +556,10 @@ the one to which the instruction pointer points).
@end itemize
Note that the offset passed to the instruction is encoded on two 8-bit
integers which are then combined by the VM as one 16-bit integer.
integers which are then combined by the VM as one 16-bit integer. Note
also that jump targets in Guile are aligned on 8-byte boundaries, and
that the offset refers to the @var{n}th 8-byte boundary, effectively
giving Guile a 19-bit relative address space.
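The arithmetic behind the 19-bit figure can be sketched as follows. The byte order (most significant byte first) and the unsigned interpretation are assumptions made for the illustration; the VM may well treat the offset as signed. The procedure names are hypothetical.

```scheme
;; Combine the two operand bytes into one 16-bit value.
;; Assumption: most significant byte first, unsigned.
(define (combine-bytes hi lo)
  (+ (* hi 256) lo))

;; Scale by the 8-byte alignment to get a distance in bytes.
(define (offset->bytes hi lo)
  (* 8 (combine-bytes hi lo)))

(offset->bytes 0 3)     ; => 24
(offset->bytes 255 255) ; => 524280, just under 2^19 = 524288
```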
@deffn Instruction br offset
Jump to @var{offset}.
@ -550,19 +610,21 @@ Load an arbitrary number from the instruction stream. The number is
embedded in the stream as a string.
@end deffn
@deffn Instruction load-string length
Load a string from the instruction stream.
Load a string from the instruction stream. The string is assumed to be
encoded in the ``latin1'' locale.
@end deffn
@deffn Instruction load-wide-string length
Load a UTF-32 string from the instruction stream. @var{length} is the
length in bytes, not in codepoints.
@end deffn
@deffn Instruction load-symbol length
Load a symbol from the instruction stream.
Load a symbol from the instruction stream. The symbol is assumed to be
encoded in the ``latin1'' locale. Symbols backed by wide strings may
be loaded via @code{load-wide-string} then @code{make-symbol}.
@end deffn
@deffn Instruction load-keyword length
Load a keyword from the instruction stream.
@end deffn
@deffn Instruction define length
Load a symbol from the instruction stream, and look up its binding in
the current toplevel environment, creating the binding if necessary.
Push the variable corresponding to the binding.
@deffn Instruction load-array length
Load a uniform array from the instruction stream. The shape and type
of the array are popped off the stack, in that order.
@end deffn
@deffn Instruction load-program
@ -579,23 +641,9 @@ because instead of parsing its data, it directly maps the instruction
stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode
and Objcode}, for more information.
The resulting compiled procedure will not have any ``external''
variables captured, so it may be loaded only once but used many times
to create closures.
@end deffn
Finally, while this instruction is not strictly a ``loading''
instruction, it's useful to wind up the @code{load-program} discussion
here:
@deffn Instruction make-closure
Pop the program object from the stack, capture the current set of
``external'' variables, and assign those external variables to a copy
of the program. Push the new program object, which shares state with
the original program.
At the time of this writing, the space overhead of closures is 4 words
per closure.
The resulting compiled procedure will not have any free variables
captured, so it may be loaded only once but used many times to create
closures.
@end deffn
@node Procedural Instructions
@ -764,6 +812,19 @@ Push @code{'()} onto the stack.
Push @var{value}, an 8-bit character, onto the stack.
@end deffn
@deffn Instruction make-char32 value
Push @var{value}, a 32-bit character, onto the stack. The value is
encoded in big-endian order.
@end deffn
@deffn Instruction make-symbol
Pops a string off the stack, and pushes a symbol.
@end deffn
@deffn Instruction make-keyword value
Pops a symbol off the stack, and pushes a keyword.
@end deffn
@deffn Instruction list n
Pops off the top @var{n} values off of the stack, consing them up into
a list, then pushes that list on the stack. What was the topmost value
@ -807,7 +868,8 @@ pushes its elements on the stack.
@subsubsection Miscellaneous Instructions
@deffn Instruction nop
Does nothing!
Does nothing! Used for padding other instructions to certain
alignments.
@end deffn
@deffn Instruction halt
@ -873,6 +935,8 @@ stream.
@deffnx Instruction cons x y
@deffnx Instruction car x
@deffnx Instruction cdr x
@deffnx Instruction vector-ref x y
@deffnx Instruction vector-set x n y
Inlined implementations of their Scheme equivalents.
@end deffn
@ -893,7 +957,9 @@ As in the previous section, the definitions below show stack
parameters instead of instruction stream parameters.
@deffn Instruction add x y
@deffnx Instruction add1 x
@deffnx Instruction sub x y
@deffnx Instruction sub1 x
@deffnx Instruction mul x y
@deffnx Instruction div x y
@deffnx Instruction quo x y
@ -906,3 +972,58 @@ parameters instead of instruction stream parameters.
@deffnx Instruction ge? x y
Inlined implementations of the corresponding mathematical operations.
@end deffn
@node Inlined Bytevector Instructions
@subsubsection Inlined Bytevector Instructions
Bytevector operations correspond closely to what the current hardware
can do, so it makes sense to inline them to VM instructions, providing
a clear path for eventual native compilation. Without this, Scheme
programs would need other primitives for accessing raw bytes -- but
these primitives are as good as any.
As in the previous section, the definitions below show stack
parameters instead of instruction stream parameters.
The multibyte formats (@code{u16}, @code{f64}, etc) take an extra
endianness argument. Only aligned native accesses are currently
fast-pathed in Guile's VM.
@deffn Instruction bv-u8-ref bv n
@deffnx Instruction bv-s8-ref bv n
@deffnx Instruction bv-u16-native-ref bv n
@deffnx Instruction bv-s16-native-ref bv n
@deffnx Instruction bv-u32-native-ref bv n
@deffnx Instruction bv-s32-native-ref bv n
@deffnx Instruction bv-u64-native-ref bv n
@deffnx Instruction bv-s64-native-ref bv n
@deffnx Instruction bv-f32-native-ref bv n
@deffnx Instruction bv-f64-native-ref bv n
@deffnx Instruction bv-u16-ref bv n endianness
@deffnx Instruction bv-s16-ref bv n endianness
@deffnx Instruction bv-u32-ref bv n endianness
@deffnx Instruction bv-s32-ref bv n endianness
@deffnx Instruction bv-u64-ref bv n endianness
@deffnx Instruction bv-s64-ref bv n endianness
@deffnx Instruction bv-f32-ref bv n endianness
@deffnx Instruction bv-f64-ref bv n endianness
@deffnx Instruction bv-u8-set bv n val
@deffnx Instruction bv-s8-set bv n val
@deffnx Instruction bv-u16-native-set bv n val
@deffnx Instruction bv-s16-native-set bv n val
@deffnx Instruction bv-u32-native-set bv n val
@deffnx Instruction bv-s32-native-set bv n val
@deffnx Instruction bv-u64-native-set bv n val
@deffnx Instruction bv-s64-native-set bv n val
@deffnx Instruction bv-f32-native-set bv n val
@deffnx Instruction bv-f64-native-set bv n val
@deffnx Instruction bv-u16-set bv n val endianness
@deffnx Instruction bv-s16-set bv n val endianness
@deffnx Instruction bv-u32-set bv n val endianness
@deffnx Instruction bv-s32-set bv n val endianness
@deffnx Instruction bv-u64-set bv n val endianness
@deffnx Instruction bv-s64-set bv n val endianness
@deffnx Instruction bv-f32-set bv n val endianness
@deffnx Instruction bv-f64-set bv n val endianness
Inlined implementations of the corresponding bytevector operations.
@end deffn
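For reference, the Scheme-level operations look like this. The sketch uses the R6RS bytevector procedures from @code{(rnrs bytevectors)}; the assumption is that each instruction name maps onto the procedure with the matching suffix (for example @code{bv-u16-ref} onto @code{bytevector-u16-ref}), which the text implies but does not spell out.

```scheme
(use-modules (rnrs bytevectors))

(define bv (u8-list->bytevector '(1 2 3 4)))

(bytevector-u8-ref bv 0)                       ; => 1
(bytevector-u16-ref bv 0 (endianness big))     ; => 258
(bytevector-u16-ref bv 0 (endianness little))  ; => 513

;; Writes take the value before the endianness argument.
(bytevector-u16-set! bv 2 #xBEEF (endianness big))
(bytevector->u8-list bv)                       ; => (1 2 190 239)
```

The @code{-native-} variants omit the endianness argument and use the host's byte order, so their results depend on the machine.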

module/language/assembly/disassemble.scm

@ -60,6 +60,8 @@
(print-info pos `(load-program ,sym) #f #f)
(lp (+ pos (byte-length asm)) (cdr code)
(acons sym asm programs))))
((nop)
(lp (+ pos (byte-length asm)) (cdr code) programs))
(else
(print-info pos asm
(code-annotation end asm objs nargs blocs