mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-04-30 03:40:34 +02:00
update docs for recent vm/compiler work
* doc/ref/compiler.texi: * doc/ref/vm.texi: Update for recent changes. * module/language/assembly/disassemble.scm (disassemble-load-program): Don't print nops, they are distracting.
parent
aaae0d5ab3
commit
98850fd727
3 changed files with 313 additions and 165 deletions
|
@ -17,7 +17,7 @@ This section aims to pay attention to the small man behind the
|
|||
curtain.
|
||||
|
||||
@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
|
||||
know how to compile your .scm file.
|
||||
know how to compile your @code{.scm} file.
|
||||
|
||||
@menu
|
||||
* Compiler Tower::
|
||||
|
@ -67,8 +67,7 @@ for Scheme:
|
|||
#:title "Guile Scheme"
|
||||
#:version "0.5"
|
||||
#:reader read
|
||||
#:compilers `((tree-il . ,compile-tree-il)
|
||||
(ghil . ,compile-ghil))
|
||||
#:compilers `((tree-il . ,compile-tree-il))
|
||||
#:decompilers `((tree-il . ,decompile-tree-il))
|
||||
#:evaluator (lambda (x module) (primitive-eval x))
|
||||
#:printer write)
|
||||
|
@ -220,13 +219,13 @@ Note however that @code{sc-expand} does not have the same signature as
|
|||
around @code{sc-expand}, to make it conform to the general form of
|
||||
compiler procedures in Guile's language tower.
|
||||
|
||||
Compiler procedures take two arguments, an expression and an
|
||||
environment. They return three values: the compiled expression, the
|
||||
corresponding environment for the target language, and a
|
||||
``continuation environment''. The compiled expression and environment
|
||||
will serve as input to the next language's compiler. The
|
||||
``continuation environment'' can be used to compile another expression
|
||||
from the same source language within the same module.
|
||||
Compiler procedures take three arguments: an expression, an
|
||||
environment, and a keyword list of options. They return three values:
|
||||
the compiled expression, the corresponding environment for the target
|
||||
language, and a ``continuation environment''. The compiled expression
|
||||
and environment will serve as input to the next language's compiler.
|
||||
The ``continuation environment'' can be used to compile another
|
||||
expression from the same source language within the same module.
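The calling convention described above can be modelled outside Guile. The following Python sketch is illustrative only: `to_upper`, `to_tokens`, and `compile_through` are hypothetical names, not Guile procedures. It assumes that the continuation environment worth keeping is the one returned by the first pass, since that is the pass that reads the source language.

```python
# Toy model of a language tower; every name here is hypothetical.

def to_upper(exp, env, opts):
    """First pass: returns (compiled exp, target env, continuation env)."""
    return exp.upper(), env, env

def to_tokens(exp, env, opts):
    """Second pass: consumes the expression produced by the first."""
    sep = opts.get("sep", " ")
    return exp.split(sep), env, env

def compile_through(passes, exp, env, opts):
    """Thread exp and env through each pass in order.

    The continuation environment of the first pass is kept (an
    assumption of this sketch) so that a later expression in the same
    source language could be compiled with it.
    """
    cenv = None
    for compile_pass in passes:
        exp, env, pass_cenv = compile_pass(exp, env, opts)
        if cenv is None:
            cenv = pass_cenv
    return exp, env, cenv

result, env, cenv = compile_through(
    [to_upper, to_tokens], "a b", {"module": "guile-user"}, {})
```

Each pass sees only the expression and environment produced by the pass before it, which is why the three-value return shape has to be the same at every level of the tower.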
|
||||
|
||||
For example, you might compile the expression, @code{(define-module
|
||||
(foo))}. This will result in a Tree-IL expression and environment. But
|
||||
|
@ -292,6 +291,14 @@ tree-il@@(guile-user)> (apply (primitive +) (const 32) (const 10))
|
|||
|
||||
The @code{src} fields are left out of the external representation.
|
||||
|
||||
One may create Tree-IL objects from their external representations by
|
||||
calling @code{parse-tree-il}, the reader for Tree-IL. If any source
|
||||
information is attached to the input S-expression, it will be
|
||||
propagated to the resulting Tree-IL expressions. This is probably the
|
||||
easiest way to compile to Tree-IL: just make the appropriate external
|
||||
representations in S-expression format, and let @code{parse-tree-il}
|
||||
take care of the rest.
|
||||
|
||||
@deftp {Scheme Variable} <void> src
|
||||
@deftpx {External Representation} (void)
|
||||
An empty expression. In practice, equivalent to Scheme's @code{(if #f
|
||||
|
@ -384,12 +391,29 @@ A version of @code{<let>} that creates recursive bindings, like
|
|||
Scheme's @code{letrec}.
|
||||
@end deftp
|
||||
|
||||
@c FIXME -- need to revive this one
|
||||
@c @deftp {Scheme Variable} <ghil-mv-bind> src vars rest producer . body
|
||||
@c Like Scheme's @code{receive} -- binds the values returned by
|
||||
@c applying @code{producer}, which should be a thunk, to the
|
||||
@c @code{lambda}-like bindings described by @var{vars} and @var{rest}.
|
||||
@c @end deftp
|
||||
There are two Tree-IL constructs that are not normally produced by
|
||||
higher-level compilers, but instead are generated during the
|
||||
source-to-source optimization and analysis passes that the Tree-IL
|
||||
compiler does. Users should not generate these expressions directly,
|
||||
unless they feel very clever, as the default analysis pass will
|
||||
generate them as necessary.
|
||||
|
||||
@deftp {Scheme Variable} <let-values> src names vars exp body
|
||||
@deftpx {External Representation} (let-values @var{names} @var{vars} @var{exp} @var{body})
|
||||
Like Scheme's @code{receive} -- binds the values returned by
|
||||
evaluating @code{exp} to the @code{lambda}-like bindings described by
|
||||
@var{vars}. That is to say, @var{vars} may be an improper list.
|
||||
|
||||
@code{<let-values>} is an optimization of @code{<application>} of the
|
||||
primitive, @code{call-with-values}.
|
||||
@end deftp
|
||||
@deftp {Scheme Variable} <fix> src names vars vals body
|
||||
@deftpx {External Representation} (fix @var{names} @var{vars} @var{vals} @var{body})
|
||||
Like @code{<letrec>}, but only for @var{vals} that are unset
|
||||
@code{lambda} expressions.
|
||||
|
||||
@code{fix} is an optimization of @code{letrec} (and @code{let}).
|
||||
@end deftp
|
||||
|
||||
Tree-IL implements a compiler to GLIL that recursively traverses
|
||||
Tree-IL expressions, writing out GLIL expressions into a linear list.
|
||||
|
@ -399,9 +423,9 @@ future computations. This state allows the compiler not to emit code
|
|||
for constant expressions that will not be used (e.g. docstrings), and
|
||||
to perform tail calls when in tail position.
|
||||
|
||||
In the future, there will be a pass at the beginning of the
|
||||
Tree-IL->GLIL compilation step to perform inlining, copy propagation,
|
||||
dead code elimination, and constant folding.
|
||||
Most optimization, such as it currently is, is performed on Tree-IL
|
||||
expressions as source-to-source transformations. There will be more
|
||||
optimizations added in the future.
|
||||
|
||||
Interested readers are encouraged to read the implementation in
|
||||
@code{(language tree-il compile-glil)} for more details.
|
||||
|
@ -411,18 +435,16 @@ Interested readers are encouraged to read the implementation in
|
|||
|
||||
Guile Low Intermediate Language (GLIL) is a structured intermediate
|
||||
language whose expressions more closely approximate Guile's VM
|
||||
instruction set.
|
||||
instruction set. Its expression types are defined in @code{(language
|
||||
glil)}.
|
||||
|
||||
Its expression types are defined in @code{(language glil)}, and as
|
||||
with GHIL, some of its fields parse as rest arguments.
|
||||
|
||||
@deftp {Scheme Variable} <glil-program> nargs nrest nlocs nexts meta . body
|
||||
@deftp {Scheme Variable} <glil-program> nargs nrest nlocs meta . body
|
||||
A unit of code that at run-time will correspond to a compiled
|
||||
procedure. @var{nargs} @var{nrest} @var{nlocs}, and @var{nexts}
|
||||
collectively define the program's arity; see @ref{Compiled
|
||||
Procedures}, for more information. @var{meta} should be an alist of
|
||||
properties, as in Tree IL's @code{<lambda>}. @var{body} is a list of
|
||||
GLIL expressions.
|
||||
procedure. @var{nargs}, @var{nrest}, and @var{nlocs} collectively define
|
||||
the program's arity; see @ref{Compiled Procedures}, for more
|
||||
information. @var{meta} should be an alist of properties, as in
|
||||
Tree-IL's @code{<lambda>}. @var{body} is an ordered list of GLIL
|
||||
expressions.
|
||||
@end deftp
|
||||
@deftp {Scheme Variable} <glil-bind> . vars
|
||||
An advisory expression that notes a liveness extent for a set of
|
||||
|
@ -461,23 +483,21 @@ and @code{filename} keys, e.g. as returned by
|
|||
@code{source-properties}.
|
||||
@end deftp
|
||||
@deftp {Scheme Variable} <glil-void>
|
||||
Pushes the unspecified value on the stack.
|
||||
Pushes ``the unspecified value'' on the stack.
|
||||
@end deftp
|
||||
@deftp {Scheme Variable} <glil-const> obj
|
||||
Pushes a constant value onto the stack. @var{obj} must be a number,
|
||||
string, symbol, keyword, boolean, character, the empty list, or a pair
|
||||
or vector of constants.
|
||||
string, symbol, keyword, boolean, character, uniform array, the empty
|
||||
list, or a pair or vector of constants.
|
||||
@end deftp
|
||||
@deftp {Scheme Variable} <glil-local> op index
|
||||
Accesses a lexically bound variable from the stack. If @var{op} is
|
||||
@code{ref}, the value is pushed onto the stack; if it is @code{set},
|
||||
the variable is set from the top value on the stack, which is popped
|
||||
off. @xref{Stack Layout}, for more information.
|
||||
@end deftp
|
||||
@deftp {Scheme Variable} <glil-external> op depth index
|
||||
Accesses a heap-allocated variable, addressed by @var{depth}, the nth
|
||||
enclosing environment, and @var{index}, the variable's position within
|
||||
the environment. @var{op} is @code{ref} or @code{set}.
|
||||
@deftp {Scheme Variable} <glil-lexical> local? boxed? op index
|
||||
Accesses a lexically bound variable. If the variable is not
|
||||
@var{local?} it is free. All variables may have @code{ref} and
|
||||
@code{set} as their @var{op}. Boxed variables may also have the
|
||||
@var{op}s @code{box}, @code{empty-box}, and @code{fix}, which
|
||||
correspond in semantics to the VM instructions @code{box},
|
||||
@code{empty-box}, and @code{fix-closure}. @xref{Stack Layout}, for
|
||||
more information.
|
||||
@end deftp
|
||||
@deftp {Scheme Variable} <glil-toplevel> op name
|
||||
Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
|
||||
|
@ -520,7 +540,7 @@ Guile Lowlevel Intermediate Language (GLIL) interpreter 0.3 on Guile 1.9.0
|
|||
Copyright (C) 2001-2008 Free Software Foundation, Inc.
|
||||
|
||||
Enter `,help' for help.
|
||||
glil@@(guile-user)> (program 0 0 0 0 () (const 3) (call return 0))
|
||||
glil@@(guile-user)> (program 0 0 0 () (const 3) (call return 1))
|
||||
@result{} 3
|
||||
@end example
|
||||
|
||||
|
@ -542,12 +562,12 @@ differs from GLIL in four main ways:
|
|||
@itemize
|
||||
@item Labels have been resolved to byte offsets in the program.
|
||||
@item Constants inside procedures have either been expressed as inline
|
||||
instructions, and possibly cached in object arrays.
|
||||
instructions or cached in object arrays.
|
||||
@item Procedures with metadata (source location information, liveness
|
||||
extents, procedure names, generic properties, etc) have had their
|
||||
metadata serialized out to thunks.
|
||||
@item All expressions correspond directly to VM instructions -- i.e.,
|
||||
there is no @code{<glil-local>} which can be a ref or a set.
|
||||
there is no @code{<glil-lexical>} which can be a ref or a set.
|
||||
@end itemize
|
||||
|
||||
Assembly is isomorphic to the bytecode that it compiles to. You can
|
||||
|
@ -567,10 +587,11 @@ example:
|
|||
|
||||
@example
|
||||
scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
|
||||
(load-program 0 0 0 0
|
||||
(load-program 0 0 0
|
||||
() ; Labels
|
||||
60 ; Length
|
||||
70 ; Length
|
||||
#f ; Metadata
|
||||
(make-false)
|
||||
(make-false) ; object table for the returned lambda
|
||||
(nop)
|
||||
(nop) ; Alignment. Since assembly has already resolved its labels
|
||||
|
@ -578,11 +599,12 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
|
|||
(nop) ; object code is mmap'd directly to structures, assembly
|
||||
(nop) ; has to have the alignment embedded in it.
|
||||
(nop)
|
||||
(load-program 1 0 0 0
|
||||
(load-program
|
||||
1
|
||||
0
|
||||
()
|
||||
6
|
||||
; This is the metadata thunk for the returned procedure.
|
||||
(load-program 0 0 0 0 () 21 #f
|
||||
8
|
||||
(load-program 0 0 0 () 21 #f
|
||||
(load-symbol "x") ; Name and liveness extent for @code{x}.
|
||||
(make-false)
|
||||
(make-int8:0) ; Some instruction+arg combinations
|
||||
|
@ -597,7 +619,9 @@ scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
|
|||
(local-ref 0)
|
||||
(local-ref 0)
|
||||
(add)
|
||||
(return))
|
||||
(return)
|
||||
(nop)
|
||||
(nop))
|
||||
; Return our new procedure.
|
||||
(return))
|
||||
@end example
|
||||
|
@ -618,10 +642,10 @@ the next step down from assembly:
|
|||
|
||||
@example
|
||||
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'assembly)
|
||||
@result{} (load-program 0 0 0 0 () 6 #f
|
||||
@result{} (load-program 0 0 0 () 6 #f
|
||||
(make-int8 32) (make-int8 10) (add) (return))
|
||||
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
|
||||
@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 10 32 10 10 100 48)
|
||||
@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 10 32 10 10 120 52)
|
||||
@end example
|
||||
|
||||
``Objcode'' is bytecode, but mapped directly to a C structure,
|
||||
|
@ -631,8 +655,7 @@ scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
|
|||
struct scm_objcode @{
|
||||
scm_t_uint8 nargs;
|
||||
scm_t_uint8 nrest;
|
||||
scm_t_uint8 nlocs;
|
||||
scm_t_uint8 nexts;
|
||||
scm_t_uint16 nlocs;
|
||||
scm_t_uint32 len;
|
||||
scm_t_uint32 metalen;
|
||||
scm_t_uint8 base[0];
|
||||
|
@ -642,7 +665,7 @@ struct scm_objcode @{
|
|||
As one might imagine, objcode imposes a minimum length on the
|
||||
bytecode. Also, the multibyte fields are in native endianness, which
|
||||
makes objcode (and bytecode) system-dependent. Indeed, in the short
|
||||
example above, all but the last 5 bytes were the program's header.
|
||||
example above, all but the last 6 bytes were the program's header.
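To make the header/body split concrete, here is a small Python sketch that takes apart the bytevector from the `#:to 'bytecode` example. It is not Guile code, and it rests on assumptions that are labelled in the comments: the bytes are read as little-endian (the text says the real layout follows native endianness), and the four bytes between the fields shown in `struct scm_objcode` and the instructions are treated as padding, because the struct as printed accounts for only 12 of the 16 header bytes.

```python
import struct

# Bytes copied from the bytecode example in the text.
BYTECODE = bytes([0, 0, 0, 0, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                  10, 32, 10, 10, 120, 52])

# Assumption: 16 header bytes, inferred from "all but the last 6 bytes".
HEADER_SIZE = len(BYTECODE) - 6

def split_objcode(buf, header_size=HEADER_SIZE):
    """Return (header fields, instruction bytes) for one program.

    "<BBHII" reads nargs, nrest (uint8), nlocs (uint16), len and
    metalen (uint32) as little-endian; any remaining header bytes are
    skipped as padding (an assumption of this sketch).
    """
    nargs, nrest, nlocs, length, metalen = struct.unpack_from("<BBHII", buf, 0)
    header = {"nargs": nargs, "nrest": nrest, "nlocs": nlocs,
              "len": length, "metalen": metalen}
    body = buf[header_size:header_size + length]
    return header, body

header, body = split_objcode(BYTECODE)
```

Under these assumptions the `len` field reads as 6, which agrees with the six instruction bytes at the end of the example.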
|
||||
|
||||
Objcode also has a couple of important efficiency hacks. First,
|
||||
objcode may be mapped directly from disk, allowing compiled code to be
|
||||
|
@ -672,7 +695,7 @@ Makes a bytecode object from @var{bytecode}, which should be a
|
|||
Load object code from a file named @var{file}. The file will be mapped
|
||||
into memory via @code{mmap}, so this is a very fast operation.
|
||||
|
||||
On disk, object code has an eight-byte cookie prepended to it, to
|
||||
On disk, object code has a sixteen-byte cookie prepended to it, to
|
||||
prevent accidental loading of arbitrary garbage.
|
||||
@end deffn
|
||||
|
||||
|
@ -689,11 +712,11 @@ Copy object code out to a @code{u8vector} for analysis by Scheme.
|
|||
The following procedure is actually in @code{(system vm program)}, but
|
||||
we'll mention it here:
|
||||
|
||||
@deffn {Scheme Variable} make-program objcode objtable [external='()]
|
||||
@deffnx {C Function} scm_make_program (objcode, objtable, external)
|
||||
@deffn {Scheme Variable} make-program objcode objtable [free-vars=#f]
|
||||
@deffnx {C Function} scm_make_program (objcode, objtable, free_vars)
|
||||
Load up object code into a Scheme program. The resulting program will
|
||||
have @var{objtable} as its object table, which should be a vector or
|
||||
@code{#f}, and will capture the closure variables from @var{external}.
|
||||
@code{#f}, and will capture the free variables from @var{free-vars}.
|
||||
@end deffn
|
||||
|
||||
Object code from a file may be disassembled at the REPL via the
|
||||
|
@ -707,9 +730,9 @@ respect to the compilation environment. Normally the environment
|
|||
propagates through the compiler transparently, but users may specify
|
||||
the compilation environment manually as well:
|
||||
|
||||
@deffn {Scheme Procedure} make-objcode-env module externals
|
||||
@deffn {Scheme Procedure} make-objcode-env module free-vars
|
||||
Make an object code environment. @var{module} should be a Scheme
|
||||
module, and @var{externals} should be a list of external variables.
|
||||
module, and @var{free-vars} should be a vector of free variables.
|
||||
@code{#f} is also a valid object code environment.
|
||||
@end deffn
|
||||
|
||||
|
@ -748,12 +771,14 @@ procedure is called a certain number of times.
|
|||
The name of the game is a profiling-based harvest of the low-hanging
|
||||
fruit, running programs of interest under a system-level profiler and
|
||||
determining which improvements would give the most bang for the buck.
|
||||
There are many well-known efficiency hacks in the literature: Dybvig's
|
||||
letrec optimization, individual boxing of heap-allocated values (and
|
||||
then store the boxes on the stack directly), optimized case-lambda
|
||||
expressions, stack underflow and overflow handlers, etc. Highly
|
||||
recommended papers: Dybvig's HOCS, Ghuloum's compiler paper.
|
||||
It's really getting to the point though that native compilation is the
|
||||
next step.
|
||||
|
||||
The compiler also needs help at the top end, enhancing the Scheme that
|
||||
it knows to also understand R6RS, and adding new high-level compilers:
|
||||
Emacs Lisp, Lua, JavaScript...
|
||||
it knows to also understand R6RS, and adding new high-level compilers.
|
||||
We have JavaScript and Emacs Lisp mostly complete, but they could use
|
||||
some love; Lua would be nice as well, but whatever language it is
|
||||
that strikes your fancy would be welcome too.
|
||||
|
||||
Compilers are for hacking, not for admiring or for complaining about.
|
||||
Get to it!
|
||||
|
|
311
doc/ref/vm.texi
|
@ -13,8 +13,8 @@ procedures can call each other as they please.
|
|||
|
||||
The difference is that the compiler creates and interprets bytecode
|
||||
for a custom virtual machine, instead of interpreting the
|
||||
S-expressions directly. Running compiled code is faster than running
|
||||
interpreted code.
|
||||
S-expressions directly. Loading and running compiled code is faster
|
||||
than loading and running source code.
|
||||
|
||||
The virtual machine that does the bytecode interpretation is a part of
|
||||
Guile itself. This section describes the nature of Guile's virtual
|
||||
|
@ -134,7 +134,7 @@ compiled to object code, one might never leave the virtual machine.
|
|||
@subsection Stack Layout
|
||||
|
||||
While not strictly necessary to understand how to work with the VM, it
|
||||
is instructive and sometimes entertaining to consider the struture of
|
||||
is instructive and sometimes entertaining to consider the structure of
|
||||
the VM stack.
|
||||
|
||||
Logically speaking, a VM stack is composed of ``frames''. Each frame
|
||||
|
@ -159,12 +159,11 @@ The structure of the fixed part of an application frame is as follows:
|
|||
|
||||
@example
|
||||
Stack
|
||||
| | <- fp + bp->nargs + bp->nlocs + 4
|
||||
| | <- fp + bp->nargs + bp->nlocs + 3
|
||||
+------------------+ = SCM_FRAME_UPPER_ADDRESS (fp)
|
||||
| Return address |
|
||||
| MV return address|
|
||||
| Dynamic link |
|
||||
| External link | <- fp + bp->nargs + bp->nlocs
|
||||
| Dynamic link | <- fp + bp->nargs + bp->nlocs
|
||||
| Local variable 1 | = SCM_FRAME_DATA_ADDRESS (fp)
|
||||
| Local variable 0 | <- fp + bp->nargs
|
||||
| Argument 1 |
|
||||
|
@ -201,25 +200,17 @@ values being returned.
|
|||
@item Dynamic link
|
||||
This is the @code{fp} in effect before this program was applied. In
|
||||
effect, this and the return address are the registers that are always
|
||||
``saved''.
|
||||
|
||||
@item External link
|
||||
This field is a reference to the list of heap-allocated variables
|
||||
associated with this frame. For a discussion of heap versus stack
|
||||
allocation, @xref{Variables and the VM}.
|
||||
``saved''. The dynamic link links the current frame to the previous
|
||||
frame; computing a stack trace involves traversing these frames.
|
||||
|
||||
@item Local variable @var{n}
|
||||
Lambda-local variables that are allocated on the stack are all
|
||||
allocated as part of the frame. This makes access to non-captured,
|
||||
non-mutated variables very cheap.
|
||||
Lambda-local variables are all allocated as part of the frame.
|
||||
This makes access to variables very cheap.
|
||||
|
||||
@item Argument @var{n}
|
||||
The calling convention of the VM requires arguments of a function
|
||||
application to be pushed on the stack, and here they are. Normally
|
||||
references to arguments dispatch to these locations on the stack.
|
||||
However if an argument has to be stored on the heap, it will be copied
|
||||
from its initial value here onto a location in the heap, and
|
||||
thereafter only referenced on the heap.
|
||||
application to be pushed on the stack, and here they are. References
|
||||
to arguments dispatch to these locations on the stack.
|
||||
|
||||
@item Program
|
||||
This is the program being applied. For more information on how
|
||||
|
@ -236,26 +227,44 @@ Consider the following Scheme code as an example:
|
|||
(lambda (b) (list foo a b)))
|
||||
@end example
|
||||
|
||||
Within the lambda expression, "foo" is a top-level variable, "a" is a
|
||||
lexically captured variable, and "b" is a local variable.
|
||||
Within the lambda expression, @code{foo} is a top-level variable, @code{a} is a
|
||||
lexically captured variable, and @code{b} is a local variable.
|
||||
|
||||
@code{b} may safely be allocated on the stack, as there is no enclosed
|
||||
procedure that references it, nor is it ever mutated.
|
||||
Another way to refer to @code{a} and @code{b} is to say that @code{a}
|
||||
is a ``free'' variable, since it is not defined within the lambda, and
|
||||
@code{b} is a ``bound'' variable. These are the terms used in the
|
||||
@dfn{lambda calculus}, a mathematical notation for describing
|
||||
functions. The lambda calculus is useful because it allows one to
|
||||
prove statements about functions. It is especially good at describing
|
||||
scope relations, and it is for that reason that we mention it here.
|
||||
|
||||
@code{a}, on the other hand, is referenced by an enclosed procedure,
|
||||
that of the lambda. Thus it must be allocated on the heap, as it may
|
||||
(and will) outlive the dynamic extent of the invocation of @code{foo}.
|
||||
Guile allocates all variables on the stack. When a lexically enclosed
|
||||
procedure with free variables---a @dfn{closure}---is created, it
|
||||
copies those variables into its free variable vector. References to free
|
||||
variables are then redirected through the free variable vector.
|
||||
|
||||
@code{foo} is a top-level variable, because it names the procedure
|
||||
@code{foo}, which is here defined at the top-level.
|
||||
If a variable is ever @code{set!}, however, it will need to be
|
||||
heap-allocated instead of stack-allocated, so that different closures
|
||||
that capture the same variable can see the same value. Also, this
|
||||
allows continuations to capture a reference to the variable, instead
|
||||
of to its value at one point in time. For these reasons, @code{set!}
|
||||
variables are allocated in ``boxes''---actually, in variable cells.
|
||||
@xref{Variables}, for more information. References to @code{set!}
|
||||
variables are indirected through the boxes.
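The difference between copying a value into a free variable vector and sharing a box can be sketched in Python. This is a model of the idea only, not of Guile's implementation: `Box`, `Closure`, `foo`, and `make_counter` are hypothetical names introduced for illustration.

```python
class Box:
    """Stand-in for a variable cell holding a value that may be set!."""
    def __init__(self, value):
        self.value = value

class Closure:
    """Code plus a vector of captured (free) variables."""
    def __init__(self, code, free_vars):
        self.code = code
        self.free_vars = tuple(free_vars)  # copied when the closure is made

    def __call__(self, *args):
        return self.code(self.free_vars, *args)

# Like (lambda (b) (list a b)) inside foo: `a` is never mutated, so its
# value is copied straight into the free variable vector.
def foo(a):
    return Closure(lambda free, b: [free[0], b], [a])

# A mutated variable lives in a box; both closures capture the same
# box, so an update made through one is visible through the other.
def make_counter():
    n = Box(0)

    def incr(free):
        free[0].value += 1
        return free[0].value

    def get(free):
        return free[0].value

    return Closure(incr, [n]), Closure(get, [n])

pair = foo(1)(2)
incr, get = make_counter()
incr()
incr()
count = get()
```

Copying is safe for `a` because nothing can change it after capture; the counter needs the extra indirection because two closures must agree on one mutable location.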
|
||||
|
||||
Note that variables that are mutated (via @code{set!}) must be
|
||||
allocated on the heap, even if they are local variables. This is
|
||||
because any called subprocedure might capture the continuation, which
|
||||
would need to capture locations instead of values. Thus perhaps
|
||||
counterintuitively, what would seem ``closer to the metal'', viz
|
||||
@code{set!}, actually forces heap allocation instead of stack
|
||||
allocation.
|
||||
Thus perhaps counterintuitively, what would seem ``closer to the
|
||||
metal'', viz @code{set!}, actually forces an extra memory allocation
|
||||
and indirection.
|
||||
|
||||
Going back to our example, @code{b} may be allocated on the stack, as
|
||||
it is never mutated.
|
||||
|
||||
@code{a} may also be allocated on the stack, as it too is never
|
||||
mutated. Within the enclosed lambda, its value will be copied into
|
||||
(and referenced from) the free variables vector.
|
||||
|
||||
@code{foo} is a top-level variable, because @code{foo} is not
|
||||
lexically bound in this example.
|
||||
|
||||
@node VM Programs
|
||||
@subsection Compiled Procedures are VM Programs
|
||||
|
@ -297,27 +306,26 @@ scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b)))
|
|||
scheme@@(guile-user)> ,x foo
|
||||
Disassembly of #<program foo (a)>:
|
||||
|
||||
0 (local-ref 0) ;; `a' (arg)
|
||||
2 (external-set 0) ;; `a' (arg)
|
||||
4 (object-ref 1) ;; #<program b70d2910 at <unknown port>:0:16 (b)>
|
||||
6 (make-closure)
|
||||
7 (return)
|
||||
0 (object-ref 1) ;; #<program b7e478b0 at <unknown port>:0:16 (b)>
|
||||
2 (local-ref 0) ;; `a' (arg)
|
||||
4 (vector 0 1) ;; 1 element
|
||||
7 (make-closure)
|
||||
8 (return)
|
||||
|
||||
----------------------------------------
|
||||
Disassembly of #<program b70d2910 at <unknown port>:0:16 (b)>:
|
||||
Disassembly of #<program b7e478b0 at <unknown port>:0:16 (b)>:
|
||||
|
||||
0 (toplevel-ref 1) ;; `foo'
|
||||
2 (external-ref 0) ;; (closure variable)
|
||||
2 (free-ref 0) ;; (closure variable)
|
||||
4 (local-ref 0) ;; `b' (arg)
|
||||
6 (list 0 3) ;; 3 elements at (unknown file):0:28
|
||||
9 (return)
|
||||
@end smallexample
|
||||
|
||||
At @code{ip} 0 and 2, we do the copy from argument to heap for
|
||||
@code{a}. @code{Ip} 4 loads up the compiled lambda, and then at
|
||||
@code{ip} 6 we make a closure---binding code (from the compiled
|
||||
lambda) with data (the heap-allocated variables). Finally we return
|
||||
the closure.
|
||||
At @code{ip} 0, we load up the compiled lambda. @code{Ip} 2 and 4
|
||||
create the free variables vector, and @code{ip} 7 makes the
|
||||
closure---binding code (from the compiled lambda) with data (the
|
||||
free-variable vector). Finally we return the closure.
|
||||
|
||||
The second stanza disassembles the compiled lambda. Toplevel variables
|
||||
are resolved relative to the module that was current when the
|
||||
|
@ -336,7 +344,7 @@ routine.
|
|||
@node Instruction Set
|
||||
@subsection Instruction Set
|
||||
|
||||
There are about 100 instructions in Guile's virtual machine. These
|
||||
There are about 150 instructions in Guile's virtual machine. These
|
||||
instructions represent atomic units of a program's execution. Ideally,
|
||||
they perform one task without conditional branches, then dispatch to
|
||||
the next instruction in the stream.
|
||||
|
@ -376,16 +384,22 @@ instructions. More instructions may be added over time.
|
|||
* Miscellaneous Instructions::
|
||||
* Inlined Scheme Instructions::
|
||||
* Inlined Mathematical Instructions::
|
||||
* Inlined Bytevector Instructions::
|
||||
@end menu
|
||||
|
||||
@node Environment Control Instructions
|
||||
@subsubsection Environment Control Instructions
|
||||
|
||||
These instructions access and mutate the environment of a compiled
|
||||
procedure---the local bindings, the ``external'' bindings, and the
|
||||
procedure---the local bindings, the free (captured) bindings, and the
|
||||
toplevel bindings.
|
||||
|
||||
Some of these instructions have @code{long-} variants, the difference
|
||||
being that they take 16-bit arguments, encoded in big-endian order,
|
||||
instead of the normal 8-bit range.
|
||||
|
||||
@deffn Instruction local-ref index
|
||||
@deffnx Instruction long-local-ref index
|
||||
Push onto the stack the value of the local variable located at
|
||||
@var{index} within the current stack frame.
|
||||
|
||||
|
@ -395,26 +409,62 @@ arguments.
|
|||
@end deffn
|
||||
|
||||
@deffn Instruction local-set index
|
||||
@deffnx Instruction long-local-set index
|
||||
Pop the Scheme object located on top of the stack and make it the new
|
||||
value of the local variable located at @var{index} within the current
|
||||
stack frame.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction external-ref index
|
||||
Push the value of the closure variable located at position
|
||||
@var{index} within the program's list of external variables.
|
||||
@deffn Instruction free-ref index
|
||||
Push the value of the captured variable located at position
|
||||
@var{index} within the program's vector of captured variables.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction external-set index
|
||||
Pop the Scheme object located on top of the stack and make it the new
|
||||
value of the closure variable located at @var{index} within the
|
||||
program's list of external variables.
|
||||
@deffn Instruction free-boxed-ref index
|
||||
@deffnx Instruction free-boxed-set index
|
||||
Get or set a boxed free variable. Note that there is no free-set
|
||||
instruction, as variables that are @code{set!} must be boxed.
|
||||
|
||||
These instructions assume that the value at position @var{index} in
|
||||
the free variables vector is a variable.
|
||||
@end deffn
|
||||
|
||||
The external variable lookup algorithm should probably be made more
|
||||
efficient in the future via addressing by frame and index. Currently,
|
||||
external variables are all consed onto a list, which results in O(N)
|
||||
lookup time.
|
||||
@deffn Instruction make-closure
|
||||
Pop a vector and a program object off the stack, in that order, and
|
||||
push a new program object with the given free variables vector. The
|
||||
new program object shares state with the original program.
|
||||
|
||||
At the time of this writing, the space overhead of closures is 4 words
|
||||
per closure.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction fix-closure index
|
||||
Pop a vector off the stack, and set it as the @var{index}th local
|
||||
variable's free variable vector. The @var{index}th local variable is
|
||||
assumed to be a procedure.
|
||||
|
||||
This instruction is part of a hack for allocating mutually recursive
|
||||
procedures. The hack is to first perform a @code{local-set} for all of
|
||||
the recursive procedures, then fix up the procedures' free variable
|
||||
bindings in place. This allows most @code{letrec}-bound procedures to
|
||||
be allocated unboxed on the stack.
|
||||
|
||||
One could of course do a @code{local-ref}, then @code{make-closure},
|
||||
then @code{local-set}, but this macroinstruction helps to speed up the
|
||||
common case.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction box index
|
||||
Pop a value off the stack, and set the @var{index}th local variable
|
||||
to a box containing that value. A shortcut for @code{make-variable}
|
||||
then @code{local-set}, used when binding boxed variables.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction empty-box index
|
||||
Set the @var{index}th local variable to a box containing a variable
|
||||
whose value is unbound. Used when compiling some @code{letrec}
|
||||
expressions.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction toplevel-ref index
|
||||
@deffnx Instruction long-toplevel-ref index
|
||||
|
@ -442,9 +492,6 @@ in-place mutation of the object table. This mechanism provides for
|
|||
lazy variable resolution, and an important cached fast-path once the
|
||||
variable has been successfully resolved.
|
||||
|
||||
The ``long'' variant has a 16-bit index instead of an 8-bit index,
|
||||
with the most significant byte first.
|
||||
|
||||
This instruction pushes the value of the variable onto the stack.
|
||||
@end deffn
|
||||
|
||||
|
@ -453,8 +500,13 @@ This instruction pushes the value of the variable onto the stack.
|
|||
Pop a value off the stack, and set it as the value of the toplevel
|
||||
variable stored at @var{index} in the object table. If the variable
|
||||
has not yet been looked up, we do the lookup as in
|
||||
@code{toplevel-ref}. The ``long'' variant has a 16-bit index instead
|
||||
of an 8-bit index.
|
||||
@code{toplevel-ref}.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction define
|
||||
Pop a symbol and a value from the stack, in that order. Look up the symbol's
|
||||
binding in the current toplevel environment, creating the binding if
|
||||
necessary. Set the variable to the value.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction link-now
|
||||
|
@ -476,6 +528,11 @@ Pop off two objects from the stack, a variable and a value, and set
|
|||
the variable to the value.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction make-variable
|
||||
Replace the top object on the stack with a variable containing it.
|
||||
Used in some circumstances when compiling @code{letrec} expressions.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction object-ref n
|
||||
@deffnx Instruction long-object-ref n
|
||||
Push @var{n}th value from the current program's object vector. The
|
||||
|
@ -499,7 +556,10 @@ the one to which the instruction pointer points).
|
|||
@end itemize
|
||||
|
||||
Note that the offset passed to the instruction is encoded on two 8-bit
|
||||
integers which are then combined by the VM as one 16-bit integer.
|
||||
integers which are then combined by the VM as one 16-bit integer. Note
|
||||
also that jump targets in Guile are aligned on 8-byte boundaries, and
|
||||
that the offset refers to the @var{n}th 8-byte boundary, effectively
|
||||
giving Guile a 19-bit relative address space.
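The arithmetic can be sketched in Scheme. This is not the VM's own code: the helper names are hypothetical, and the sketch assumes that the high byte comes first and that the 16-bit value is a two's-complement signed number, neither of which is stated here.

```scheme
;; Hypothetical helpers, not part of Guile.

;; Combine the two operand bytes into one 16-bit integer.
;; Assumptions: high byte first, two's-complement sign.
(define (decode-offset hi lo)
  (let ((u16 (+ (* hi 256) lo)))
    (if (>= u16 32768)
        (- u16 65536)
        u16)))

;; The offset counts 8-byte boundaries, so scale it by 8 to get a
;; distance in bytes.  16 bits of offset times 8 gives 19 bits of range.
(define (offset->byte-distance hi lo)
  (* 8 (decode-offset hi lo)))

(offset->byte-distance 0 3)       ; => 24
(offset->byte-distance 255 255)   ; => -8
```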
|
||||
|
||||
@deffn Instruction br offset
|
||||
Jump to @var{offset}.
|
||||
|
@ -550,19 +610,21 @@ Load an arbitrary number from the instruction stream. The number is
|
|||
embedded in the stream as a string.
|
||||
@end deffn
|
||||
@deffn Instruction load-string length
|
||||
Load a string from the instruction stream.
|
||||
Load a string from the instruction stream. The string is assumed to be
|
||||
encoded in the ``latin1'' locale.
|
||||
@end deffn
|
||||
@deffn Instruction load-wide-string length
|
||||
Load a UTF-32 string from the instruction stream. @var{length} is the
|
||||
length in bytes, not in codepoints.
|
||||
@end deffn
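To make the distinction between bytes and codepoints concrete, here is a small sketch using @code{string->utf32} from the @code{(rnrs bytevectors)} module. The helper name @code{wide-string-byte-length} is made up for the example and is not something the compiler provides.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: the byte length that a UTF-32 encoding of S
;; occupies, which is the kind of value `length' denotes here.
(define (wide-string-byte-length s)
  (bytevector-length (string->utf32 s)))

;; Two codepoints: GREEK SMALL LETTER LAMDA (U+03BB) and `x'.
(define sample (string (integer->char 955) #\x))

(string-length sample)             ; => 2 codepoints
(wide-string-byte-length sample)   ; => 8 bytes, four per codepoint
```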
|
||||
@deffn Instruction load-symbol length
|
||||
Load a symbol from the instruction stream.
|
||||
Load a symbol from the instruction stream. The symbol is assumed to be
|
||||
encoded in the ``latin1'' locale. Symbols backed by wide strings may
|
||||
be loaded via @code{load-wide-string} then @code{make-symbol}.
|
||||
@end deffn
|
||||
@deffn Instruction load-keyword length
|
||||
Load a keyword from the instruction stream.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction define length
|
||||
Load a symbol from the instruction stream, and look up its binding in
|
||||
the current toplevel environment, creating the binding if necessary.
|
||||
Push the variable corresponding to the binding.
|
||||
@deffn Instruction load-array length
|
||||
Load a uniform array from the instruction stream. The shape and type
|
||||
of the array are popped off the stack, in that order.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction load-program
|
||||
|
@ -579,23 +641,9 @@ because instead of parsing its data, it directly maps the instruction
|
|||
stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode
|
||||
and Objcode}, for more information.
|
||||
|
||||
The resulting compiled procedure will not have any ``external''
|
||||
variables captured, so it may be loaded only once but used many times
|
||||
to create closures.
|
||||
@end deffn
|
||||
|
||||
Finally, while this instruction is not strictly a ``loading''
|
||||
instruction, it's useful to wind up the @code{load-program} discussion
|
||||
here:
|
||||
|
||||
@deffn Instruction make-closure
|
||||
Pop the program object from the stack, capture the current set of
|
||||
``external'' variables, and assign those external variables to a copy
|
||||
of the program. Push the new program object, which shares state with
|
||||
the original program.
|
||||
|
||||
At the time of this writing, the space overhead of closures is 4 words
|
||||
per closure.
|
||||
The resulting compiled procedure will not have any free variables
|
||||
captured, so it may be loaded only once but used many times to create
|
||||
closures.
|
||||
@end deffn
|
||||
|
||||
@node Procedural Instructions
|
||||
|
@ -764,6 +812,19 @@ Push @code{'()} onto the stack.
|
|||
Push @var{value}, an 8-bit character, onto the stack.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction make-char32 value
|
||||
Push @var{value}, a 32-bit character, onto the stack. The value is
|
||||
encoded in big-endian order.
|
||||
@end deffn
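Big-endian order means the most significant byte comes first in the instruction stream. A minimal sketch of the decoding, with a hypothetical helper name, might look like this:

```scheme
;; Hypothetical helper, not part of Guile: rebuild a character from
;; four bytes given in big-endian order (most significant byte first).
(define (bytes->char32 b0 b1 b2 b3)
  (integer->char (+ (* b0 16777216)   ; b0 << 24
                    (* b1 65536)      ; b1 << 16
                    (* b2 256)        ; b2 << 8
                    b3)))

;; The bytes 00 00 03 BB decode to U+03BB, GREEK SMALL LETTER LAMDA.
(char->integer (bytes->char32 0 0 3 187))   ; => 955
```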
|
||||
|
||||
@deffn Instruction make-symbol
|
||||
Pops a string off the stack, and pushes a symbol.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction make-keyword value
|
||||
Pops a symbol off the stack, and pushes a keyword.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction list n
|
||||
Pops the top @var{n} values off the stack, consing them up into
|
||||
a list, then pushes that list on the stack. What was the topmost value
|
||||
|
@ -807,7 +868,8 @@ pushes its elements on the stack.
|
|||
@subsubsection Miscellaneous Instructions
|
||||
|
||||
@deffn Instruction nop
|
||||
Does nothing!
|
||||
Does nothing! Used for padding other instructions to certain
|
||||
alignments.
|
||||
@end deffn
|
||||
|
||||
@deffn Instruction halt
|
||||
|
@ -873,6 +935,8 @@ stream.
|
|||
@deffnx Instruction cons x y
|
||||
@deffnx Instruction car x
|
||||
@deffnx Instruction cdr x
|
||||
@deffnx Instruction vector-ref x y
|
||||
@deffnx Instruction vector-set x n y
|
||||
Inlined implementations of their Scheme equivalents.
|
||||
@end deffn
|
||||
|
||||
|
@ -893,7 +957,9 @@ As in the previous section, the definitions below show stack
|
|||
parameters instead of instruction stream parameters.
|
||||
|
||||
@deffn Instruction add x y
|
||||
@deffnx Instruction add1 x
|
||||
@deffnx Instruction sub x y
|
||||
@deffnx Instruction sub1 x
|
||||
@deffnx Instruction mul x y
|
||||
@deffnx Instruction div x y
|
||||
@deffnx Instruction quo x y
|
||||
|
@ -906,3 +972,58 @@ parameters instead of instruction stream parameters.
|
|||
@deffnx Instruction ge? x y
|
||||
Inlined implementations of the corresponding mathematical operations.
|
||||
@end deffn
|
||||
|
||||
@node Inlined Bytevector Instructions
|
||||
@subsubsection Inlined Bytevector Instructions
|
||||
|
||||
Bytevector operations correspond closely to what the current hardware
|
||||
can do, so it makes sense to inline them to VM instructions, providing
|
||||
a clear path for eventual native compilation. Without this, Scheme
|
||||
programs would need other primitives for accessing raw bytes -- but
|
||||
these primitives are as good as any.
|
||||
|
||||
As in the previous section, the definitions below show stack
|
||||
parameters instead of instruction stream parameters.
|
||||
|
||||
The multibyte formats (@code{u16}, @code{f64}, etc.) take an extra
|
||||
endianness argument. Only aligned native accesses are currently
|
||||
fast-pathed in Guile's VM.
|
||||
|
||||
@deffn Instruction bv-u8-ref bv n
|
||||
@deffnx Instruction bv-s8-ref bv n
|
||||
@deffnx Instruction bv-u16-native-ref bv n
|
||||
@deffnx Instruction bv-s16-native-ref bv n
|
||||
@deffnx Instruction bv-u32-native-ref bv n
|
||||
@deffnx Instruction bv-s32-native-ref bv n
|
||||
@deffnx Instruction bv-u64-native-ref bv n
|
||||
@deffnx Instruction bv-s64-native-ref bv n
|
||||
@deffnx Instruction bv-f32-native-ref bv n
|
||||
@deffnx Instruction bv-f64-native-ref bv n
|
||||
@deffnx Instruction bv-u16-ref bv n endianness
|
||||
@deffnx Instruction bv-s16-ref bv n endianness
|
||||
@deffnx Instruction bv-u32-ref bv n endianness
|
||||
@deffnx Instruction bv-s32-ref bv n endianness
|
||||
@deffnx Instruction bv-u64-ref bv n endianness
|
||||
@deffnx Instruction bv-s64-ref bv n endianness
|
||||
@deffnx Instruction bv-f32-ref bv n endianness
|
||||
@deffnx Instruction bv-f64-ref bv n endianness
|
||||
@deffnx Instruction bv-u8-set bv n val
|
||||
@deffnx Instruction bv-s8-set bv n val
|
||||
@deffnx Instruction bv-u16-native-set bv n val
|
||||
@deffnx Instruction bv-s16-native-set bv n val
|
||||
@deffnx Instruction bv-u32-native-set bv n val
|
||||
@deffnx Instruction bv-s32-native-set bv n val
|
||||
@deffnx Instruction bv-u64-native-set bv n val
|
||||
@deffnx Instruction bv-s64-native-set bv n val
|
||||
@deffnx Instruction bv-f32-native-set bv n val
|
||||
@deffnx Instruction bv-f64-native-set bv n val
|
||||
@deffnx Instruction bv-u16-set bv n val endianness
|
||||
@deffnx Instruction bv-s16-set bv n val endianness
|
||||
@deffnx Instruction bv-u32-set bv n val endianness
|
||||
@deffnx Instruction bv-s32-set bv n val endianness
|
||||
@deffnx Instruction bv-u64-set bv n val endianness
|
||||
@deffnx Instruction bv-s64-set bv n val endianness
|
||||
@deffnx Instruction bv-f32-set bv n val endianness
|
||||
@deffnx Instruction bv-f64-set bv n val endianness
|
||||
Inlined implementations of the corresponding bytevector operations.
|
||||
@end deffn
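For reference, these are the Scheme-level procedures from @code{(rnrs bytevectors)} that the instructions above mirror. The sketch only shows the procedures' behavior; whether a particular call is actually compiled to one of the inlined instructions depends on the compiler and is not shown here.

```scheme
(use-modules (rnrs bytevectors))

(define bv (make-bytevector 4 0))

;; Explicit-endianness access: the extra argument selects byte order.
(bytevector-u16-set! bv 0 #x1234 (endianness big))
(bytevector-u8-ref bv 0)                         ; => #x12
(bytevector-u8-ref bv 1)                         ; => #x34
(bytevector-u16-ref bv 0 (endianness little))    ; => #x3412

;; Native access: no endianness argument, uses the host's byte order.
(bytevector-u16-native-set! bv 2 7)
(bytevector-u16-native-ref bv 2)                 ; => 7
```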
|
||||
|
|
|
@ -60,6 +60,8 @@
|
|||
(print-info pos `(load-program ,sym) #f #f)
|
||||
(lp (+ pos (byte-length asm)) (cdr code)
|
||||
(acons sym asm programs))))
|
||||
((nop)
|
||||
(lp (+ pos (byte-length asm)) (cdr code) programs))
|
||||
(else
|
||||
(print-info pos asm
|
||||
(code-annotation end asm objs nargs blocs
|
||||
|
|