mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-04-30 11:50:28 +02:00

update docs for recent vm/compiler work

* doc/ref/compiler.texi:
* doc/ref/vm.texi: Update for recent changes.
* module/language/assembly/disassemble.scm (disassemble-load-program):
  Don't print nops, they are distracting.
Andy Wingo 2009-08-12 23:38:05 +02:00
parent aaae0d5ab3
commit 98850fd727
3 changed files with 313 additions and 165 deletions

doc/ref/compiler.texi

@ -17,7 +17,7 @@ This section aims to pay attention to the small man behind the
curtain. curtain.
@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to @xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
know how to compile your .scm file. know how to compile your @code{.scm} file.
@menu @menu
* Compiler Tower:: * Compiler Tower::
@ -67,8 +67,7 @@ for Scheme:
#:title "Guile Scheme" #:title "Guile Scheme"
#:version "0.5" #:version "0.5"
#:reader read #:reader read
#:compilers `((tree-il . ,compile-tree-il) #:compilers `((tree-il . ,compile-tree-il))
(ghil . ,compile-ghil))
#:decompilers `((tree-il . ,decompile-tree-il)) #:decompilers `((tree-il . ,decompile-tree-il))
#:evaluator (lambda (x module) (primitive-eval x)) #:evaluator (lambda (x module) (primitive-eval x))
#:printer write) #:printer write)
@ -220,13 +219,13 @@ Note however that @code{sc-expand} does not have the same signature as
around @code{sc-expand}, to make it conform to the general form of around @code{sc-expand}, to make it conform to the general form of
compiler procedures in Guile's language tower. compiler procedures in Guile's language tower.
Compiler procedures take two arguments, an expression and an Compiler procedures take three arguments: an expression, an
environment. They return three values: the compiled expression, the environment, and a keyword list of options. They return three values:
corresponding environment for the target language, and a the compiled expression, the corresponding environment for the target
``continuation environment''. The compiled expression and environment language, and a ``continuation environment''. The compiled expression
will serve as input to the next language's compiler. The and environment will serve as input to the next language's compiler.
``continuation environment'' can be used to compile another expression The ``continuation environment'' can be used to compile another
from the same source language within the same module. expression from the same source language within the same module.
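
Reviewer's sketch of the calling convention described in the new text, assuming nothing beyond standard Scheme. The name `compile-identity` and the placeholder environment `some-env` are hypothetical and are not part of Guile; a real compiler procedure would translate the expression rather than pass it through.

```scheme
;; Hypothetical pass-through "compiler": it takes an expression, an
;; environment, and a keyword list of options, and returns three values.
(define (compile-identity exp env opts)
  (values exp    ; the "compiled" expression (unchanged in this sketch)
          env    ; the environment for the target language
          env))  ; the continuation environment

;; Receive the three values and collect them into a list for inspection.
(define compile-result
  (call-with-values
    (lambda () (compile-identity '(+ 32 10) 'some-env '()))
    (lambda (compiled target-env cenv)
      (list compiled target-env cenv))))
```

The first two values would feed the next language's compiler in the tower; the third would be reused to compile the next expression from the same source language.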
For example, you might compile the expression, @code{(define-module For example, you might compile the expression, @code{(define-module
(foo))}. This will result in a Tree-IL expression and environment. But (foo))}. This will result in a Tree-IL expression and environment. But
@ -292,6 +291,14 @@ tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
The @code{src} fields are left out of the external representation. The @code{src} fields are left out of the external representation.
One may create Tree-IL objects from their external representations via
calling @code{parse-tree-il}, the reader for Tree-IL. If any source
information is attached to the input S-expression, it will be
propagated to the resulting Tree-IL expressions. This is probably the
easiest way to compile to Tree-IL: just make the appropriate external
representations in S-expression format, and let @code{parse-tree-il}
take care of the rest.
@deftp {Scheme Variable} <void> src @deftp {Scheme Variable} <void> src
@deftpx {External Representation} (void) @deftpx {External Representation} (void)
An empty expression. In practice, equivalent to Scheme's @code{(if #f An empty expression. In practice, equivalent to Scheme's @code{(if #f
@ -384,12 +391,29 @@ A version of @code{<let>} that creates recursive bindings, like
Scheme's @code{letrec}. Scheme's @code{letrec}.
@end deftp @end deftp
@c FIXME -- need to revive this one There are two Tree-IL constructs that are not normally produced by
@c @deftp {Scheme Variable} <ghil-mv-bind> src vars rest producer . body higher-level compilers, but instead are generated during the
@c Like Scheme's @code{receive} -- binds the values returned by source-to-source optimization and analysis passes that the Tree-IL
@c applying @code{producer}, which should be a thunk, to the compiler does. Users should not generate these expressions directly,
@c @code{lambda}-like bindings described by @var{vars} and @var{rest}. unless they feel very clever, as the default analysis pass will
@c @end deftp generate them as necessary.
@deftp {Scheme Variable} <let-values> src names vars exp body
@deftpx {External Representation} (let-values @var{names} @var{vars} @var{exp} @var{body})
Like Scheme's @code{receive} -- binds the values returned by
evaluating @code{exp} to the @code{lambda}-like bindings described by
@var{vars}. That is to say, @var{vars} may be an improper list.
@code{<let-values>} is an optimization of @code{<application>} of the
primitive, @code{call-with-values}.
@end deftp
@deftp {Scheme Variable} <fix> src names vars vals body
@deftpx {External Representation} (fix @var{names} @var{vars} @var{vals} @var{body})
Like @code{<letrec>}, but only for @var{vals} that are unset
@code{lambda} expressions.
@code{fix} is an optimization of @code{letrec} (and @code{let}).
@end deftp
Tree-IL implements a compiler to GLIL that recursively traverses Tree-IL implements a compiler to GLIL that recursively traverses
Tree-IL expressions, writing out GLIL expressions into a linear list. Tree-IL expressions, writing out GLIL expressions into a linear list.
@ -399,9 +423,9 @@ future computations. This state allows the compiler not to emit code
for constant expressions that will not be used (e.g. docstrings), and for constant expressions that will not be used (e.g. docstrings), and
to perform tail calls when in tail position. to perform tail calls when in tail position.
In the future, there will be a pass at the beginning of the Most optimization, such as it currently is, is performed on Tree-IL
Tree-IL->GLIL compilation step to perform inlining, copy propagation, expressions as source-to-source transformations. There will be more
dead code elimination, and constant folding. optimizations added in the future.
Interested readers are encouraged to read the implementation in Interested readers are encouraged to read the implementation in
@code{(language tree-il compile-glil)} for more details. @code{(language tree-il compile-glil)} for more details.
@ -411,18 +435,16 @@ Interested readers are encouraged to read the implementation in
Guile Low Intermediate Language (GLIL) is a structured intermediate Guile Low Intermediate Language (GLIL) is a structured intermediate
language whose expressions more closely approximate Guile's VM language whose expressions more closely approximate Guile's VM
instruction set. instruction set. Its expression types are defined in @code{(language
glil)}.
Its expression types are defined in @code{(language glil)}, and as @deftp {Scheme Variable} <glil-program> nargs nrest nlocs meta . body
with GHIL, some of its fields parse as rest arguments.
@deftp {Scheme Variable} <glil-program> nargs nrest nlocs nexts meta . body
A unit of code that at run-time will correspond to a compiled A unit of code that at run-time will correspond to a compiled
procedure. @var{nargs} @var{nrest} @var{nlocs}, and @var{nexts} procedure. @var{nargs} @var{nrest} and @var{nlocs} collectively define
collectively define the program's arity; see @ref{Compiled the program's arity; see @ref{Compiled Procedures}, for more
Procedures}, for more information. @var{meta} should be an alist of information. @var{meta} should be an alist of properties, as in
properties, as in Tree IL's @code{<lambda>}. @var{body} is a list of Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
GLIL expressions. expressions.
@end deftp @end deftp
@deftp {Scheme Variable} <glil-bind> . vars @deftp {Scheme Variable} <glil-bind> . vars
An advisory expression that notes a liveness extent for a set of An advisory expression that notes a liveness extent for a set of
@ -461,23 +483,21 @@ and @code{filename} keys, e.g. as returned by
@code{source-properties}. @code{source-properties}.
@end deftp @end deftp
@deftp {Scheme Variable} <glil-void> @deftp {Scheme Variable} <glil-void>
Pushes the unspecified value on the stack. Pushes ``the unspecified value'' on the stack.
@end deftp @end deftp
@deftp {Scheme Variable} <glil-const> obj @deftp {Scheme Variable} <glil-const> obj
Pushes a constant value onto the stack. @var{obj} must be a number, Pushes a constant value onto the stack. @var{obj} must be a number,
string, symbol, keyword, boolean, character, the empty list, or a pair string, symbol, keyword, boolean, character, uniform array, the empty
or vector of constants. list, or a pair or vector of constants.
@end deftp @end deftp
@deftp {Scheme Variable} <glil-local> op index @deftp {Scheme Variable} <glil-lexical> local? boxed? op index
Accesses a lexically bound variable from the stack. If @var{op} is Accesses a lexically bound variable. If the variable is not
@code{ref}, the value is pushed onto the stack; if it is @code{set}, @var{local?} it is free. All variables may have @code{ref} and
the variable is set from the top value on the stack, which is popped @code{set} as their @var{op}. Boxed variables may also have the
off. @xref{Stack Layout}, for more information. @var{op}s @code{box}, @code{empty-box}, and @code{fix}, which
@end deftp correspond in semantics to the VM instructions @code{box},
@deftp {Scheme Variable} <glil-external> op depth index @code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
Accesses a heap-allocated variable, addressed by @var{depth}, the nth more information.
enclosing environment, and @var{index}, the variable's position within
the environment. @var{op} is @code{ref} or @code{set}.
@end deftp @end deftp
@deftp {Scheme Variable} <glil-toplevel> op name @deftp {Scheme Variable} <glil-toplevel> op name
Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set}, Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
@ -520,7 +540,7 @@ Guile Lowlevel Intermediate Language (GLIL) interpreter 0.3 on Guile 1.9.0
Copyright (C) 2001-2008 Free Software Foundation, Inc. Copyright (C) 2001-2008 Free Software Foundation, Inc.
Enter `,help' for help. Enter `,help' for help.
glil@@(guile-user)> (program 0 0 0 0 () (const 3) (call return 0)) glil@@(guile-user)> (program 0 0 0 () (const 3) (call return 1))
@result{} 3 @result{} 3
@end example @end example
@ -542,12 +562,12 @@ differs from GLIL in four main ways:
@itemize @itemize
@item Labels have been resolved to byte offsets in the program. @item Labels have been resolved to byte offsets in the program.
@item Constants inside procedures have either been expressed as inline @item Constants inside procedures have either been expressed as inline
instructions, and possibly cached in object arrays. instructions or cached in object arrays.
@item Procedures with metadata (source location information, liveness @item Procedures with metadata (source location information, liveness
extents, procedure names, generic properties, etc) have had their extents, procedure names, generic properties, etc) have had their
metadata serialized out to thunks. metadata serialized out to thunks.
@item All expressions correspond directly to VM instructions -- i.e., @item All expressions correspond directly to VM instructions -- i.e.,
there is no @code{<glil-local>} which can be a ref or a set. there is no @code{<glil-lexical>} which can be a ref or a set.
@end itemize @end itemize
Assembly is isomorphic to the bytecode that it compiles to. You can Assembly is isomorphic to the bytecode that it compiles to. You can
@ -567,10 +587,11 @@ example:
@example @example
scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly) scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
(load-program 0 0 0 0 (load-program 0 0 0
() ; Labels () ; Labels
60 ; Length 70 ; Length
#f ; Metadata #f ; Metadata
(make-false)
(make-false) ; object table for the returned lambda (make-false) ; object table for the returned lambda
(nop) (nop)
(nop) ; Alignment. Since assembly has already resolved its labels (nop) ; Alignment. Since assembly has already resolved its labels
@ -578,11 +599,12 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
(nop) ; object code is mmap'd directly to structures, assembly (nop) ; object code is mmap'd directly to structures, assembly
(nop) ; has to have the alignment embedded in it. (nop) ; has to have the alignment embedded in it.
(nop) (nop)
(load-program 1 0 0 0 (load-program
1
0
() ()
6 8
; This is the metadata thunk for the returned procedure. (load-program 0 0 0 () 21 #f
(load-program 0 0 0 0 () 21 #f
(load-symbol "x") ; Name and liveness extent for @code{x}. (load-symbol "x") ; Name and liveness extent for @code{x}.
(make-false) (make-false)
(make-int8:0) ; Some instruction+arg combinations (make-int8:0) ; Some instruction+arg combinations
@ -597,7 +619,9 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
(local-ref 0) (local-ref 0)
(local-ref 0) (local-ref 0)
(add) (add)
(return)) (return)
(nop)
(nop))
; Return our new procedure. ; Return our new procedure.
(return)) (return))
@end example @end example
@ -618,10 +642,10 @@ the next step down from assembly:
@example @example
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'assembly) scheme@@(guile-user)> (compile '(+ 32 10) #:to 'assembly)
@result{} (load-program 0 0 0 0 () 6 #f @result{} (load-program 0 0 0 () 6 #f
(make-int8 32) (make-int8 10) (add) (return)) (make-int8 32) (make-int8 10) (add) (return))
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode) scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 10 32 10 10 100 48) @result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 10 32 10 10 120 52)
@end example @end example
``Objcode'' is bytecode, but mapped directly to a C structure, ``Objcode'' is bytecode, but mapped directly to a C structure,
@ -631,8 +655,7 @@ scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
struct scm_objcode @{ struct scm_objcode @{
scm_t_uint8 nargs; scm_t_uint8 nargs;
scm_t_uint8 nrest; scm_t_uint8 nrest;
scm_t_uint8 nlocs; scm_t_uint16 nlocs;
scm_t_uint8 nexts;
scm_t_uint32 len; scm_t_uint32 len;
scm_t_uint32 metalen; scm_t_uint32 metalen;
scm_t_uint8 base[0]; scm_t_uint8 base[0];
@ -642,7 +665,7 @@ struct scm_objcode @{
As one might imagine, objcode imposes a minimum length on the As one might imagine, objcode imposes a minimum length on the
bytecode. Also, the multibyte fields are in native endianness, which bytecode. Also, the multibyte fields are in native endianness, which
makes objcode (and bytecode) system-dependent. Indeed, in the short makes objcode (and bytecode) system-dependent. Indeed, in the short
example above, all but the last 5 bytes were the program's header. example above, all but the last 6 bytes were the program's header.
Objcode also has a couple of important efficiency hacks. First, Objcode also has a couple of important efficiency hacks. First,
objcode may be mapped directly from disk, allowing compiled code to be objcode may be mapped directly from disk, allowing compiled code to be
@ -672,7 +695,7 @@ Makes a bytecode object from @var{bytecode}, which should be a
Load object code from a file named @var{file}. The file will be mapped Load object code from a file named @var{file}. The file will be mapped
into memory via @code{mmap}, so this is a very fast operation. into memory via @code{mmap}, so this is a very fast operation.
On disk, object code has an eight-byte cookie prepended to it, to On disk, object code has a sixteen-byte cookie prepended to it, to
prevent accidental loading of arbitrary garbage. prevent accidental loading of arbitrary garbage.
@end deffn @end deffn
@ -689,11 +712,11 @@ Copy object code out to a @code{u8vector} for analysis by Scheme.
The following procedure is actually in @code{(system vm program)}, but The following procedure is actually in @code{(system vm program)}, but
we'll mention it here: we'll mention it here:
@deffn {Scheme Variable} make-program objcode objtable [external='()] @deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
@deffnx {C Function} scm_make_program (objcode, objtable, external) @deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
Load up object code into a Scheme program. The resulting program will Load up object code into a Scheme program. The resulting program will
have @var{objtable} as its object table, which should be a vector or have @var{objtable} as its object table, which should be a vector or
@code{#f}, and will capture the closure variables from @var{external}. @code{#f}, and will capture the free variables from @var{free-vars}.
@end deffn @end deffn
Object code from a file may be disassembled at the REPL via the Object code from a file may be disassembled at the REPL via the
@ -707,9 +730,9 @@ respect to the compilation environment. Normally the environment
propagates through the compiler transparently, but users may specify propagates through the compiler transparently, but users may specify
the compilation environment manually as well: the compilation environment manually as well:
@deffn {Scheme Procedure} make-objcode-env module externals @deffn {Scheme Procedure} make-objcode-env module free-vars
Make an object code environment. @var{module} should be a Scheme Make an object code environment. @var{module} should be a Scheme
module, and @var{externals} should be a list of external variables. module, and @var{free-vars} should be a vector of free variables.
@code{#f} is also a valid object code environment. @code{#f} is also a valid object code environment.
@end deffn @end deffn
@ -748,12 +771,14 @@ procedure is called a certain number of times.
The name of the game is a profiling-based harvest of the low-hanging The name of the game is a profiling-based harvest of the low-hanging
fruit, running programs of interest under a system-level profiler and fruit, running programs of interest under a system-level profiler and
determining which improvements would give the most bang for the buck. determining which improvements would give the most bang for the buck.
There are many well-known efficiency hacks in the literature: Dybvig's It's really getting to the point though that native compilation is the
letrec optimization, individual boxing of heap-allocated values (and next step.
then store the boxes on the stack directly), optimized case-lambda
expressions, stack underflow and overflow handlers, etc. Highly
recommended papers: Dybvig's HOCS, Ghuloum's compiler paper.
The compiler also needs help at the top end, enhancing the Scheme that The compiler also needs help at the top end, enhancing the Scheme that
it knows to also understand R6RS, and adding new high-level compilers: it knows to also understand R6RS, and adding new high-level compilers.
Emacs Lisp, Lua, JavaScript... We have JavaScript and Emacs Lisp mostly complete, but they could use
some love; Lua would be nice as well, but whatever language it is
that strikes your fancy would be welcome too.
Compilers are for hacking, not for admiring or for complaining about.
Get to it!

doc/ref/vm.texi

@ -13,8 +13,8 @@ procedures can call each other as they please.
The difference is that the compiler creates and interprets bytecode The difference is that the compiler creates and interprets bytecode
for a custom virtual machine, instead of interpreting the for a custom virtual machine, instead of interpreting the
S-expressions directly. Running compiled code is faster than running S-expressions directly. Loading and running compiled code is faster
interpreted code. than loading and running source code.
The virtual machine that does the bytecode interpretation is a part of The virtual machine that does the bytecode interpretation is a part of
Guile itself. This section describes the nature of Guile's virtual Guile itself. This section describes the nature of Guile's virtual
@ -134,7 +134,7 @@ compiled to object code, one might never leave the virtual machine.
@subsection Stack Layout @subsection Stack Layout
While not strictly necessary to understand how to work with the VM, it While not strictly necessary to understand how to work with the VM, it
is instructive and sometimes entertaining to consider the struture of is instructive and sometimes entertaining to consider the structure of
the VM stack. the VM stack.
Logically speaking, a VM stack is composed of ``frames''. Each frame Logically speaking, a VM stack is composed of ``frames''. Each frame
@ -159,12 +159,11 @@ The structure of the fixed part of an application frame is as follows:
@example @example
Stack Stack
| | <- fp + bp->nargs + bp->nlocs + 4 | | <- fp + bp->nargs + bp->nlocs + 3
+------------------+ = SCM_FRAME_UPPER_ADDRESS (fp) +------------------+ = SCM_FRAME_UPPER_ADDRESS (fp)
| Return address | | Return address |
| MV return address| | MV return address|
| Dynamic link | | Dynamic link | <- fp + bp->nargs + bp->nlocs
| External link | <- fp + bp->nargs + bp->nlocs
| Local variable 1 | = SCM_FRAME_DATA_ADDRESS (fp) | Local variable 1 | = SCM_FRAME_DATA_ADDRESS (fp)
| Local variable 0 | <- fp + bp->nargs | Local variable 0 | <- fp + bp->nargs
| Argument 1 | | Argument 1 |
@ -201,25 +200,17 @@ values being returned.
@item Dynamic link @item Dynamic link
This is the @code{fp} in effect before this program was applied. In This is the @code{fp} in effect before this program was applied. In
effect, this and the return address are the registers that are always effect, this and the return address are the registers that are always
``saved''. ``saved''. The dynamic link links the current frame to the previous
frame; computing a stack trace involves traversing these frames.
@item External link
This field is a reference to the list of heap-allocated variables
associated with this frame. For a discussion of heap versus stack
allocation, @xref{Variables and the VM}.
@item Local variable @var{n} @item Local variable @var{n}
Lambda-local variables that are allocated on the stack are all Lambda-local variables are all allocated as part of the frame.
allocated as part of the frame. This makes access to non-captured, This makes access to variables very cheap.
non-mutated variables very cheap.
@item Argument @var{n} @item Argument @var{n}
The calling convention of the VM requires arguments of a function The calling convention of the VM requires arguments of a function
application to be pushed on the stack, and here they are. Normally application to be pushed on the stack, and here they are. References
references to arguments dispatch to these locations on the stack. to arguments dispatch to these locations on the stack.
However if an argument has to be stored on the heap, it will be copied
from its initial value here onto a location in the heap, and
thereafter only referenced on the heap.
@item Program @item Program
This is the program being applied. For more information on how This is the program being applied. For more information on how
@ -236,26 +227,44 @@ Consider the following Scheme code as an example:
(lambda (b) (list foo a b))) (lambda (b) (list foo a b)))
@end example @end example
Within the lambda expression, "foo" is a top-level variable, "a" is a Within the lambda expression, @code{foo} is a top-level variable, @code{a} is a
lexically captured variable, and "b" is a local variable. lexically captured variable, and @code{b} is a local variable.
@code{b} may safely be allocated on the stack, as there is no enclosed Another way to refer to @code{a} and @code{b} is to say that @code{a}
procedure that references it, nor is it ever mutated. is a ``free'' variable, since it is not defined within the lambda, and
@code{b} is a ``bound'' variable. These are the terms used in the
@dfn{lambda calculus}, a mathematical notation for describing
functions. The lambda calculus is useful because it allows one to
prove statements about functions. It is especially good at describing
scope relations, and it is for that reason that we mention it here.
@code{a}, on the other hand, is referenced by an enclosed procedure, Guile allocates all variables on the stack. When a lexically enclosed
that of the lambda. Thus it must be allocated on the heap, as it may procedure with free variables---a @dfn{closure}---is created, it
(and will) outlive the dynamic extent of the invocation of @code{foo}. copies those variables into its free variable vector. References to free
variables are then redirected through the free variable vector.
@code{foo} is a top-level variable, because it names the procedure If a variable is ever @code{set!}, however, it will need to be
@code{foo}, which is here defined at the top-level. heap-allocated instead of stack-allocated, so that different closures
that capture the same variable can see the same value. Also, this
allows continuations to capture a reference to the variable, instead
of to its value at one point in time. For these reasons, @code{set!}
variables are allocated in ``boxes''---actually, in variable cells.
@xref{Variables}, for more information. References to @code{set!}
variables are indirected through the boxes.
Note that variables that are mutated (via @code{set!}) must be Thus perhaps counterintuitively, what would seem ``closer to the
allocated on the heap, even if they are local variables. This is metal'', viz @code{set!}, actually forces an extra memory allocation
because any called subprocedure might capture the continuation, which and indirection.
would need to capture locations instead of values. Thus perhaps
counterintuitively, what would seem ``closer to the metal'', viz Going back to our example, @code{b} may be allocated on the stack, as
@code{set!}, actually forces heap allocation instead of stack it is never mutated.
allocation.
@code{a} may also be allocated on the stack, as it too is never
mutated. Within the enclosed lambda, its value will be copied into
(and referenced from) the free variables vector.
@code{foo} is a top-level variable, because @code{foo} is not
lexically bound in this example.
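
Reviewer's sketch of why a `set!` variable needs a box, in portable Scheme. The first definition relies on ordinary closure semantics. The second models the box by hand with a one-element vector, which is only a stand-in for the variable cell the text describes; the names `make-counter` and `make-counter/boxed` are hypothetical.

```scheme
;; Two closures capture the same mutated variable n. If each closure
;; merely copied n's value, the reader would never see the increments.
(define (make-counter)
  (let ((n 0))
    (cons (lambda () (set! n (+ n 1)) n)   ; writer: mutates n
          (lambda () n))))                 ; reader: only references n

;; The same behaviour with the box made explicit. Both closures copy a
;; reference to the box, and every access is indirected through it.
(define (make-counter/boxed)
  (let ((box (vector 0)))
    (cons (lambda ()
            (vector-set! box 0 (+ (vector-ref box 0) 1))
            (vector-ref box 0))
          (lambda () (vector-ref box 0)))))

(define c (make-counter))
((car c))   ; increment once
((car c))   ; increment twice
((cdr c))   ; the reader sees 2
```

A variable that is never mutated needs no such indirection, since copying its value into each closure is indistinguishable from sharing it.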
@node VM Programs @node VM Programs
@subsection Compiled Procedures are VM Programs @subsection Compiled Procedures are VM Programs
@ -297,27 +306,26 @@ scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b)))
scheme@@(guile-user)> ,x foo scheme@@(guile-user)> ,x foo
Disassembly of #<program foo (a)>: Disassembly of #<program foo (a)>:
0 (local-ref 0) ;; `a' (arg) 0 (object-ref 1) ;; #<program b7e478b0 at <unknown port>:0:16 (b)>
2 (external-set 0) ;; `a' (arg) 2 (local-ref 0) ;; `a' (arg)
4 (object-ref 1) ;; #<program b70d2910 at <unknown port>:0:16 (b)> 4 (vector 0 1) ;; 1 element
6 (make-closure) 7 (make-closure)
7 (return) 8 (return)
---------------------------------------- ----------------------------------------
Disassembly of #<program b70d2910 at <unknown port>:0:16 (b)>: Disassembly of #<program b7e478b0 at <unknown port>:0:16 (b)>:
0 (toplevel-ref 1) ;; `foo' 0 (toplevel-ref 1) ;; `foo'
2 (external-ref 0) ;; (closure variable) 2 (free-ref 0) ;; (closure variable)
4 (local-ref 0) ;; `b' (arg) 4 (local-ref 0) ;; `b' (arg)
6 (list 0 3) ;; 3 elements at (unknown file):0:28 6 (list 0 3) ;; 3 elements at (unknown file):0:28
9 (return) 9 (return)
@end smallexample @end smallexample
At @code{ip} 0 and 2, we do the copy from argument to heap for At @code{ip} 0, we load up the compiled lambda. @code{Ip} 2 and 4
@code{a}. @code{Ip} 4 loads up the compiled lambda, and then at create the free variables vector, and @code{ip} 7 makes the
@code{ip} 6 we make a closure---binding code (from the compiled closure---binding code (from the compiled lambda) with data (the
lambda) with data (the heap-allocated variables). Finally we return free-variable vector). Finally we return the closure.
the closure.
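
Reviewer's sketch of "binding code with data" as the new disassembly does it, modelled in plain Scheme rather than in the VM. `make-closure*` and `inner-code` are hypothetical names, and the symbol `foo` stands in for the toplevel reference so that the result is easy to inspect.

```scheme
;; Model a closure as code paired with a free variable vector. The code
;; receives the vector as an extra first argument.
(define (make-closure* code free-vars)
  (lambda args (apply code free-vars args)))

;; Stand-in for the compiled inner lambda: `a' is read from slot 0 of
;; the free variable vector (compare free-ref 0), `b' is an argument.
(define (inner-code free-vars b)
  (list 'foo (vector-ref free-vars 0) b))

;; Stand-in for the outer procedure: build a one-element vector holding
;; `a' (compare vector 0 1), then make the closure.
(define (foo a)
  (make-closure* inner-code (vector a)))

((foo 1) 2)   ; evaluates to (foo 1 2)
```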
The second stanza disassembles the compiled lambda. Toplevel variables The second stanza disassembles the compiled lambda. Toplevel variables
are resolved relative to the module that was current when the are resolved relative to the module that was current when the
@ -336,7 +344,7 @@ routine.
@node Instruction Set @node Instruction Set
@subsection Instruction Set @subsection Instruction Set
There are about 100 instructions in Guile's virtual machine. These There are about 150 instructions in Guile's virtual machine. These
instructions represent atomic units of a program's execution. Ideally, instructions represent atomic units of a program's execution. Ideally,
they perform one task without conditional branches, then dispatch to they perform one task without conditional branches, then dispatch to
the next instruction in the stream. the next instruction in the stream.
@ -376,16 +384,22 @@ instructions. More instructions may be added over time.
* Miscellaneous Instructions:: * Miscellaneous Instructions::
* Inlined Scheme Instructions:: * Inlined Scheme Instructions::
* Inlined Mathematical Instructions:: * Inlined Mathematical Instructions::
* Inlined Bytevector Instructions::
@end menu @end menu
@node Environment Control Instructions @node Environment Control Instructions
@subsubsection Environment Control Instructions @subsubsection Environment Control Instructions
These instructions access and mutate the environment of a compiled These instructions access and mutate the environment of a compiled
procedure---the local bindings, the ``external'' bindings, and the procedure---the local bindings, the free (captured) bindings, and the
toplevel bindings. toplevel bindings.
Some of these instructions have @code{long-} variants, the difference
being that they take 16-bit arguments, encoded in big-endian order,
instead of the normal 8-bit range.
@deffn Instruction local-ref index @deffn Instruction local-ref index
@deffnx Instruction long-local-ref index
Push onto the stack the value of the local variable located at Push onto the stack the value of the local variable located at
@var{index} within the current stack frame. @var{index} within the current stack frame.
@ -395,26 +409,62 @@ arguments.
@end deffn @end deffn
@deffn Instruction local-set index @deffn Instruction local-set index
@deffnx Instruction long-local-set index
Pop the Scheme object located on top of the stack and make it the new Pop the Scheme object located on top of the stack and make it the new
value of the local variable located at @var{index} within the current value of the local variable located at @var{index} within the current
stack frame. stack frame.
@end deffn @end deffn
@deffn Instruction external-ref index @deffn Instruction free-ref index
Push the value of the closure variable located at position Push the value of the captured variable located at position
@var{index} within the program's list of external variables. @var{index} within the program's vector of captured variables.
@end deffn @end deffn
@deffn Instruction external-set index @deffn Instruction free-boxed-ref index
Pop the Scheme object located on top of the stack and make it the new @deffnx Instruction free-boxed-set index
value of the closure variable located at @var{index} within the Get or set a boxed free variable. Note that there is no free-set
program's list of external variables. instruction, as variables that are @code{set!} must be boxed.
These instructions assume that the value at position @var{index} in
the free variables vector is a variable.
@end deffn @end deffn
The external variable lookup algorithm should probably be made more @deffn Instruction make-closure
efficient in the future via addressing by frame and index. Currently, Pop a vector and a program object off the stack, in that order, and
external variables are all consed onto a list, which results in O(N) push a new program object with the given free variables vector. The
lookup time. new program object shares state with the original program.
At the time of this writing, the space overhead of closures is 4 words
per closure.
@end deffn
@deffn Instruction fix-closure index
Pop a vector off the stack, and set it as the @var{index}th local
variable's free variable vector. The @var{index}th local variable is
assumed to be a procedure.
This instruction is part of a hack for allocating mutually recursive
procedures. The hack is to first perform a @code{local-set} for all of
the recursive procedures, then fix up the procedures' free variable
bindings in place. This allows most @code{letrec}-bound procedures to
be allocated unboxed on the stack.
One could of course do a @code{local-ref}, then @code{make-closure},
then @code{local-set}, but this macroinstruction helps to speed up the
common case.
@end deffn
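As an illustration of the kind of source code this is aimed at, here is a pair of mutually recursive @code{letrec}-bound procedures. Whether the compiler actually emits @code{fix-closure} for this exact form is an assumption; the names @code{parity-even?}, @code{ev?} and @code{od?} are hypothetical.

```scheme
;; Sketch only: each procedure refers to the other, so neither closure
;; can be completed before both local slots exist.
(define (parity-even? n)
  (letrec ((ev? (lambda (k) (if (= k 0) #t (od? (- k 1)))))
           (od? (lambda (k) (if (= k 0) #f (ev? (- k 1))))))
    (ev? n)))

(define ten-is-even (parity-even? 10))
(define seven-is-even (parity-even? 7))
```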
@deffn Instruction box index
Pop a value off the stack, and set the @var{index}th local variable
to a box containing that value. A shortcut for @code{make-variable}
then @code{local-set}, used when binding boxed variables.
@end deffn
@deffn Instruction empty-box index
Set the @var{index}th local variable to a box containing a variable
whose value is unbound. Used when compiling some @code{letrec}
expressions.
@end deffn
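A rough Scheme-level picture of @code{box} and @code{empty-box}, assuming the boxes are ordinary Guile variable objects. This is a sketch of the observable behaviour, not of the VM's implementation.

```scheme
;; Analogue of `box': wrap an existing value in a variable object.
(define boxed (make-variable 10))
(variable-set! boxed (+ (variable-ref boxed) 1))
(define boxed-value (variable-ref boxed))

;; Analogue of `empty-box': a variable with no value yet, as a
;; letrec binding might look before its initializer has run.
(define empty (make-undefined-variable))
(define bound-before (variable-bound? empty))
(variable-set! empty 'ready)
(define bound-after (variable-bound? empty))
```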
@deffn Instruction toplevel-ref index @deffn Instruction toplevel-ref index
@deffnx Instruction long-toplevel-ref index @deffnx Instruction long-toplevel-ref index
@ -442,9 +492,6 @@ in-place mutation of the object table. This mechanism provides for
lazy variable resolution, and an important cached fast-path once the lazy variable resolution, and an important cached fast-path once the
variable has been successfully resolved. variable has been successfully resolved.
The ``long'' variant has a 16-bit index instead of an 8-bit index,
with the most significant byte first.
This instruction pushes the value of the variable onto the stack. This instruction pushes the value of the variable onto the stack.
@end deffn @end deffn
@ -453,8 +500,13 @@ This instruction pushes the value of the variable onto the stack.
Pop a value off the stack, and set it as the value of the toplevel Pop a value off the stack, and set it as the value of the toplevel
variable stored at @var{index} in the object table. If the variable variable stored at @var{index} in the object table. If the variable
has not yet been looked up, we do the lookup as in has not yet been looked up, we do the lookup as in
@code{toplevel-ref}. The ``long'' variant has a 16-bit index instead @code{toplevel-ref}.
of an 8-bit index. @end deffn
@deffn Instruction define
Pop a symbol and a value from the stack, in that order. Look up the
symbol's binding in the current toplevel environment, creating the
binding if necessary. Set the bound variable to the value.
@end deffn @end deffn
@deffn Instruction link-now @deffn Instruction link-now
@ -476,6 +528,11 @@ Pop off two objects from the stack, a variable and a value, and set
the variable to the value. the variable to the value.
@end deffn @end deffn
@deffn Instruction make-variable
Replace the top object on the stack with a variable containing it.
Used in some circumstances when compiling @code{letrec} expressions.
@end deffn
@deffn Instruction object-ref n @deffn Instruction object-ref n
@deffnx Instruction long-object-ref n @deffnx Instruction long-object-ref n
Push @var{n}th value from the current program's object vector. The Push @var{n}th value from the current program's object vector. The
@ -499,7 +556,10 @@ the one to which the instruction pointer points).
@end itemize @end itemize
Note that the offset passed to the instruction is encoded on two 8-bit Note that the offset passed to the instruction is encoded on two 8-bit
integers which are then combined by the VM as one 16-bit integer. integers which are then combined by the VM as one 16-bit integer. Note
also that jump targets in Guile are aligned on 8-byte boundaries, and
that the offset refers to the @var{n}th 8-byte boundary, effectively
giving Guile a 19-bit relative address space.
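The arithmetic behind the 19-bit figure can be sketched as follows. The byte order (high byte first) and the helper names @code{combine-offset} and @code{offset->bytes} are assumptions made for illustration, and signedness is deliberately left out.

```scheme
;; Sketch only: combine two 8-bit operands into one 16-bit offset.
;; Assumes the high byte comes first in the instruction stream.
(define (combine-offset hi lo)
  (+ (* hi 256) lo))

;; Each offset unit names an 8-byte boundary, so scale by 8.
(define (offset->bytes n)
  (* n 8))

(define example-distance (offset->bytes (combine-offset #x01 #x02)))

;; 16 bits of offset plus 3 bits of alignment gives 19 bits of range.
(define addressable-bytes (* (expt 2 16) 8))
```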
@deffn Instruction br offset @deffn Instruction br offset
Jump to @var{offset}. Jump to @var{offset}.
@ -550,19 +610,21 @@ Load an arbitrary number from the instruction stream. The number is
embedded in the stream as a string. embedded in the stream as a string.
@end deffn @end deffn
@deffn Instruction load-string length @deffn Instruction load-string length
Load a string from the instruction stream. Load a string from the instruction stream. The string is assumed to be
encoded in the ``latin1'' locale.
@end deffn
@deffn Instruction load-wide-string length
Load a UTF-32 string from the instruction stream. @var{length} is the
length in bytes, not in codepoints.
@end deffn @end deffn
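The difference between a length in bytes and a length in codepoints can be checked from Scheme with the @code{(rnrs bytevectors)} module. This is a minimal sketch; it says nothing about how the assembler computes @var{length}.

```scheme
(use-modules (rnrs bytevectors))

;; Build a two-codepoint string without relying on source encoding:
;; U+03BB (Greek small lambda) followed by "x".
(define wide (string (integer->char #x3bb) #\x))

(define codepoints (string-length wide))                  ; counts characters
(define utf32-bytes (bytevector-length (string->utf32 wide))) ; 4 bytes each
```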
@deffn Instruction load-symbol length @deffn Instruction load-symbol length
Load a symbol from the instruction stream. Load a symbol from the instruction stream. The symbol is assumed to be
encoded in the ``latin1'' locale. Symbols backed by wide strings may
be loaded via @code{load-wide-string} then @code{make-symbol}.
@end deffn @end deffn
@deffn Instruction load-keyword length @deffn Instruction load-array length
Load a keyword from the instruction stream. Load a uniform array from the instruction stream. The shape and type
@end deffn of the array are popped off the stack, in that order.
@deffn Instruction define length
Load a symbol from the instruction stream, and look up its binding in
the current toplevel environment, creating the binding if necessary.
Push the variable corresponding to the binding.
@end deffn @end deffn
@deffn Instruction load-program @deffn Instruction load-program
@ -579,23 +641,9 @@ because instead of parsing its data, it directly maps the instruction
stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode
and Objcode}, for more information. and Objcode}, for more information.
The resulting compiled procedure will not have any ``external'' The resulting compiled procedure will not have any free variables
variables captured, so it may be loaded only once but used many times captured, so it may be loaded only once but used many times to create
to create closures. closures.
@end deffn
Finally, while this instruction is not strictly a ``loading''
instruction, it's useful to wind up the @code{load-program} discussion
here:
@deffn Instruction make-closure
Pop the program object from the stack, capture the current set of
``external'' variables, and assign those external variables to a copy
of the program. Push the new program object, which shares state with
the original program.
At the time of this writing, the space overhead of closures is 4 words
per closure.
@end deffn @end deffn
@node Procedural Instructions @node Procedural Instructions
@ -764,6 +812,19 @@ Push @code{'()} onto the stack.
Push @var{value}, an 8-bit character, onto the stack. Push @var{value}, an 8-bit character, onto the stack.
@end deffn @end deffn
@deffn Instruction make-char32 value
Push @var{value}, a 32-bit character, onto the stack. The value is
encoded in big-endian order.
@end deffn
@deffn Instruction make-symbol
Pops a string off the stack, and pushes a symbol.
@end deffn
@deffn Instruction make-keyword value
Pops a symbol off the stack, and pushes a keyword.
@end deffn
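At the Scheme level, the string to symbol to keyword chain performed by @code{make-symbol} and @code{make-keyword} looks roughly like this. It is a sketch using the standard procedures @code{string->symbol} and @code{symbol->keyword}, not a description of the VM code.

```scheme
;; String -> symbol, as make-symbol does with the popped string.
(define sym (string->symbol "width"))

;; Symbol -> keyword, as make-keyword does with the popped symbol.
(define kw (symbol->keyword sym))

(define kw-is-keyword (keyword? kw))
(define kw-matches-literal (eq? kw #:width))
```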
@deffn Instruction list n @deffn Instruction list n
Pops off the top @var{n} values off of the stack, consing them up into Pops off the top @var{n} values off of the stack, consing them up into
a list, then pushes that list on the stack. What was the topmost value a list, then pushes that list on the stack. What was the topmost value
@ -807,7 +868,8 @@ pushes its elements on the stack.
@subsubsection Miscellaneous Instructions @subsubsection Miscellaneous Instructions
@deffn Instruction nop @deffn Instruction nop
Does nothing! Does nothing! Used for padding other instructions to certain
alignments.
@end deffn @end deffn
@deffn Instruction halt @deffn Instruction halt
@ -873,6 +935,8 @@ stream.
@deffnx Instruction cons x y @deffnx Instruction cons x y
@deffnx Instruction car x @deffnx Instruction car x
@deffnx Instruction cdr x @deffnx Instruction cdr x
@deffnx Instruction vector-ref x y
@deffnx Instruction vector-set x n y
Inlined implementations of their Scheme equivalents. Inlined implementations of their Scheme equivalents.
@end deffn @end deffn
@ -893,7 +957,9 @@ As in the previous section, the definitions below show stack
parameters instead of instruction stream parameters. parameters instead of instruction stream parameters.
@deffn Instruction add x y @deffn Instruction add x y
@deffnx Instruction add1 x
@deffnx Instruction sub x y @deffnx Instruction sub x y
@deffnx Instruction sub1 x
@deffnx Instruction mul x y @deffnx Instruction mul x y
@deffnx Instruction div x y @deffnx Instruction div x y
@deffnx Instruction quo x y @deffnx Instruction quo x y
@ -906,3 +972,58 @@ parameters instead of instruction stream parameters.
@deffnx Instruction ge? x y @deffnx Instruction ge? x y
Inlined implementations of the corresponding mathematical operations. Inlined implementations of the corresponding mathematical operations.
@end deffn @end deffn
@node Inlined Bytevector Instructions
@subsubsection Inlined Bytevector Instructions
Bytevector operations correspond closely to what the current hardware
can do, so it makes sense to inline them to VM instructions, providing
a clear path for eventual native compilation. Without this, Scheme
programs would need other primitives for accessing raw bytes -- but
these primitives are as good as any.
As in the previous section, the definitions below show stack
parameters instead of instruction stream parameters.
The multibyte formats (@code{u16}, @code{f64}, etc.)@: take an extra
endianness argument. Only aligned native accesses are currently
fast-pathed in Guile's VM.
@deffn Instruction bv-u8-ref bv n
@deffnx Instruction bv-s8-ref bv n
@deffnx Instruction bv-u16-native-ref bv n
@deffnx Instruction bv-s16-native-ref bv n
@deffnx Instruction bv-u32-native-ref bv n
@deffnx Instruction bv-s32-native-ref bv n
@deffnx Instruction bv-u64-native-ref bv n
@deffnx Instruction bv-s64-native-ref bv n
@deffnx Instruction bv-f32-native-ref bv n
@deffnx Instruction bv-f64-native-ref bv n
@deffnx Instruction bv-u16-ref bv n endianness
@deffnx Instruction bv-s16-ref bv n endianness
@deffnx Instruction bv-u32-ref bv n endianness
@deffnx Instruction bv-s32-ref bv n endianness
@deffnx Instruction bv-u64-ref bv n endianness
@deffnx Instruction bv-s64-ref bv n endianness
@deffnx Instruction bv-f32-ref bv n endianness
@deffnx Instruction bv-f64-ref bv n endianness
@deffnx Instruction bv-u8-set bv n val
@deffnx Instruction bv-s8-set bv n val
@deffnx Instruction bv-u16-native-set bv n val
@deffnx Instruction bv-s16-native-set bv n val
@deffnx Instruction bv-u32-native-set bv n val
@deffnx Instruction bv-s32-native-set bv n val
@deffnx Instruction bv-u64-native-set bv n val
@deffnx Instruction bv-s64-native-set bv n val
@deffnx Instruction bv-f32-native-set bv n val
@deffnx Instruction bv-f64-native-set bv n val
@deffnx Instruction bv-u16-set bv n val endianness
@deffnx Instruction bv-s16-set bv n val endianness
@deffnx Instruction bv-u32-set bv n val endianness
@deffnx Instruction bv-s32-set bv n val endianness
@deffnx Instruction bv-u64-set bv n val endianness
@deffnx Instruction bv-s64-set bv n val endianness
@deffnx Instruction bv-f32-set bv n val endianness
@deffnx Instruction bv-f64-set bv n val endianness
Inlined implementations of the corresponding bytevector operations.
@end deffn
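For reference, the Scheme procedures that these instructions inline behave as follows. This is a small sketch using @code{(rnrs bytevectors)}; the native accessors use the host's byte order, so only a round trip is shown for them.

```scheme
(use-modules (rnrs bytevectors))

(define bv (make-bytevector 8 0))

;; Explicit-endianness accessors take the extra endianness argument.
(bytevector-u16-set! bv 0 #x1234 (endianness big))
(define first-byte (bytevector-u8-ref bv 0))    ; high byte is stored first
(define second-byte (bytevector-u8-ref bv 1))
(define read-little (bytevector-u16-ref bv 0 (endianness little)))

;; Native accessors omit the endianness argument.
(bytevector-u16-native-set! bv 2 #xBEEF)
(define read-native (bytevector-u16-native-ref bv 2))
```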


@ -60,6 +60,8 @@
(print-info pos `(load-program ,sym) #f #f) (print-info pos `(load-program ,sym) #f #f)
(lp (+ pos (byte-length asm)) (cdr code) (lp (+ pos (byte-length asm)) (cdr code)
(acons sym asm programs)))) (acons sym asm programs))))
((nop)
(lp (+ pos (byte-length asm)) (cdr code) programs))
(else (else
(print-info pos asm (print-info pos asm
(code-annotation end asm objs nargs blocs (code-annotation end asm objs nargs blocs