mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-04-30 20:00:19 +02:00
* INSTALL: Update from newest merge * doc/ref/compiler.texi: * doc/ref/vm.texi: Fold in Ludovic's suggestions
917 lines
34 KiB
Text
917 lines
34 KiB
Text
@c -*-texinfo-*-
|
|
@c This is part of the GNU Guile Reference Manual.
|
|
@c Copyright (C) 2008,2009
|
|
@c Free Software Foundation, Inc.
|
|
@c See the file guile.texi for copying conditions.
|
|
|
|
@node A Virtual Machine for Guile
|
|
@section A Virtual Machine for Guile
|
|
|
|
Guile has both an interpreter and a compiler. To a user, the
|
|
difference is largely transparent---interpreted and compiled
|
|
procedures can call each other as they please.
|
|
|
|
The difference is that the compiler creates and interprets bytecode
|
|
for a custom virtual machine, instead of interpreting the
|
|
S-expressions directly. Running compiled code is faster than running
|
|
interpreted code.
|
|
|
|
The virtual machine that does the bytecode interpretation is a part of
|
|
Guile itself. This section describes the nature of Guile's virtual
|
|
machine.
|
|
|
|
@menu
|
|
* Why a VM?::
|
|
* VM Concepts::
|
|
* Stack Layout::
|
|
* Variables and the VM::
|
|
* VM Programs::
|
|
* Instruction Set::
|
|
@end menu
|
|
|
|
@node Why a VM?
|
|
@subsection Why a VM?
|
|
|
|
@cindex interpreter
|
|
@cindex evaluator
|
|
For a long time, Guile only had an interpreter, called the
|
|
@dfn{evaluator}. Guile's evaluator operates directly on the
|
|
S-expression representation of Scheme source code.
|
|
|
|
But while the evaluator is highly optimized and hand-tuned, and
|
|
contains some extensive speed trickery (@pxref{Memoization}), it still
|
|
performs many needless computations during the course of evaluating an
|
|
expression. For example, application of a function to arguments
|
|
needlessly conses up the arguments in a list. Evaluation of an
|
|
expression always has to figure out what the car of the expression is
|
|
-- a procedure, a memoized form, or something else. All values have to
|
|
be allocated on the heap. Et cetera.
|
|
|
|
The solution to this problem is to compile the higher-level language,
|
|
Scheme, into a lower-level language for which all of the checks and
|
|
dispatching have already been done---the code is instead stripped to
|
|
the bare minimum needed to ``do the job''.
|
|
|
|
The question becomes then, what low-level language to choose? There
|
|
are many options. We could compile to native code directly, but that
|
|
poses portability problems for Guile, as it is a highly cross-platform
|
|
project.
|
|
|
|
So we want the performance gains that compilation provides, but we
|
|
also want to maintain the portability benefits of a single code path.
|
|
The obvious solution is to compile to a virtual machine that is
|
|
present on all Guile installations.
|
|
|
|
The easiest (and most fun) way to depend on a virtual machine is to
|
|
implement the virtual machine within Guile itself. This way the
|
|
virtual machine provides what Scheme needs (tail calls, multiple
|
|
values, @code{call/cc}) and can provide optimized inline instructions
|
|
for Guile (@code{cons}, @code{struct-ref}, etc.).
|
|
|
|
So this is what Guile does. The rest of this section describes that VM
|
|
that Guile implements, and the compiled procedures that run on it.
|
|
|
|
Note that this decision to implement a bytecode compiler does not
|
|
preclude native compilation. We can compile from bytecode to native
|
|
code at runtime, or even do ahead of time compilation. More
|
|
possibilities are discussed in @ref{Extending the Compiler}.
|
|
|
|
@node VM Concepts
|
|
@subsection VM Concepts
|
|
|
|
A virtual machine (VM) is a Scheme object. Users may create virtual
|
|
machines using the standard procedures described later in this manual,
|
|
but that is usually unnecessary, as Guile ensures that there is one
|
|
virtual machine per thread. When a VM-compiled procedure is run, Guile
|
|
looks up the virtual machine for the current thread and executes the
|
|
procedure using that VM.
|
|
|
|
Guile's virtual machine is a stack machine---that is, it has few
|
|
registers, and the instructions defined in the VM operate by pushing
|
|
and popping values from a stack.
|
|
|
|
Stack memory is exclusive to the virtual machine that owns it. In
|
|
addition to their stacks, virtual machines also have access to the
|
|
global memory (modules, global bindings, etc) that is shared among
|
|
other parts of Guile, including other VMs.
|
|
|
|
A VM has generic instructions, such as those to reference local
|
|
variables, and instructions designed to support Guile's languages --
|
|
mathematical instructions that support the entire numerical tower, an
|
|
inlined implementation of @code{cons}, etc.
|
|
|
|
The registers that a VM has are as follows:
|
|
|
|
@itemize
|
|
@item ip - Instruction pointer
|
|
@item sp - Stack pointer
|
|
@item fp - Frame pointer
|
|
@end itemize
|
|
|
|
In other architectures, the instruction pointer is sometimes called
|
|
the ``program counter'' (pc). This set of registers is pretty typical
|
|
for stack machines; their exact meanings in the context of Guile's VM
|
|
is described in the next section.
|
|
|
|
A virtual machine executes by loading a compiled procedure, and
|
|
executing the object code associated with that procedure. Of course,
|
|
that procedure may call other procedures, tail-call others, ad
|
|
infinitum---indeed, within a guile whose modules have all been
|
|
compiled to object code, one might never leave the virtual machine.
|
|
|
|
@c wingo: I wish the following were true, but currently we just use
|
|
@c the one engine. This kind of thing is possible tho.
|
|
|
|
@c A VM may have one of three engines: reckless, regular, or debugging.
|
|
@c Reckless engine is fastest but dangerous. Regular engine is normally
|
|
@c fail-safe and reasonably fast. Debugging engine is safest and
|
|
@c functional but very slow.
|
|
|
|
@node Stack Layout
|
|
@subsection Stack Layout
|
|
|
|
While not strictly necessary to understand how to work with the VM, it
|
|
is instructive and sometimes entertaining to consider the struture of
|
|
the VM stack.
|
|
|
|
Logically speaking, a VM stack is composed of ``frames''. Each frame
|
|
corresponds to the application of one compiled procedure, and contains
|
|
storage space for arguments, local variables, intermediate values, and
|
|
some bookkeeping information (such as what to do after the frame
|
|
computes its value).
|
|
|
|
While the compiler is free to do whatever it wants to, as long as the
|
|
semantics of a computation are preserved, in practice every time you
|
|
call a function, a new frame is created. (The notable exception of
|
|
course is the tail call case, @pxref{Tail Calls}.)
|
|
|
|
Within a frame, you have the data associated with the function
|
|
application itself, which is of a fixed size, and the stack space for
|
|
intermediate values. Sometimes only the former is referred to as the
|
|
``frame'', and the latter is the ``stack'', although all pending
|
|
application frames can have some intermediate computations interleaved
|
|
on the stack.
|
|
|
|
The structure of the fixed part of an application frame is as follows:
|
|
|
|
@example
|
|
Stack
|
|
| | <- fp + bp->nargs + bp->nlocs + 4
|
|
+------------------+ = SCM_FRAME_UPPER_ADDRESS (fp)
|
|
| Return address |
|
|
| MV return address|
|
|
| Dynamic link |
|
|
| External link | <- fp + bp->nargs + bp->nlocs
|
|
| Local variable 1 | = SCM_FRAME_DATA_ADDRESS (fp)
|
|
| Local variable 0 | <- fp + bp->nargs
|
|
| Argument 1 |
|
|
| Argument 0 | <- fp
|
|
| Program | <- fp - 1
|
|
+------------------+ = SCM_FRAME_LOWER_ADDRESS (fp)
|
|
| |
|
|
@end example
|
|
|
|
In the above drawing, the stack grows upward. The intermediate values
|
|
stored in the application of this frame are stored above
|
|
@code{SCM_FRAME_UPPER_ADDRESS (fp)}. @code{bp} refers to the
|
|
@code{struct scm_program*} data associated with the program at
|
|
@code{fp - 1}. @code{nargs} and @code{nlocs} are properties of the
|
|
compiled procedure, which will be discussed later.
|
|
|
|
The individual fields of the frame are as follows:
|
|
|
|
@table @asis
|
|
@item Return address
|
|
The @code{ip} that was in effect before this program was applied. When
|
|
we return from this activation frame, we will jump back to this
|
|
@code{ip}.
|
|
|
|
@item MV return address
|
|
The @code{ip} to return to if this application returns multiple
|
|
values. For continuations that only accept one value, this value will
|
|
be @code{NULL}; for others, it will be an @code{ip} that points to a
|
|
multiple-value return address in the calling code. That code will
|
|
expect the top value on the stack to be an integer---the number of
|
|
values being returned---and that below that integer there are the
|
|
values being returned.
|
|
|
|
@item Dynamic link
|
|
This is the @code{fp} in effect before this program was applied. In
|
|
effect, this and the return address are the registers that are always
|
|
``saved''.
|
|
|
|
@item External link
|
|
This field is a reference to the list of heap-allocated variables
|
|
associated with this frame. For a discussion of heap versus stack
|
|
allocation, @xref{Variables and the VM}.
|
|
|
|
@item Local variable @var{n}
|
|
Lambda-local variables that are allocated on the stack are all
|
|
allocated as part of the frame. This makes access to non-captured,
|
|
non-mutated variables very cheap.
|
|
|
|
@item Argument @var{n}
|
|
The calling convention of the VM requires arguments of a function
|
|
application to be pushed on the stack, and here they are. Normally
|
|
references to arguments dispatch to these locations on the stack.
|
|
However if an argument has to be stored on the heap, it will be copied
|
|
from its initial value here onto a location in the heap, and
|
|
thereafter only referenced on the heap.
|
|
|
|
@item Program
|
|
This is the program being applied. For more information on how
|
|
programs are implemented, @xref{VM Programs}.
|
|
@end table
|
|
|
|
@node Variables and the VM
|
|
@subsection Variables and the VM
|
|
|
|
Let's think about the following Scheme code as an example:
|
|
|
|
@example
|
|
(define (foo a)
|
|
(lambda (b) (list foo a b)))
|
|
@end example
|
|
|
|
Within the lambda expression, "foo" is a top-level variable, "a" is a
|
|
lexically captured variable, and "b" is a local variable.
|
|
|
|
That is to say: @code{b} may safely be allocated on the stack, as
|
|
there is no enclosed procedure that references it, nor is it ever
|
|
mutated.
|
|
|
|
@code{a}, on the other hand, is referenced by an enclosed procedure,
|
|
that of the lambda. Thus it must be allocated on the heap, as it may
|
|
(and will) outlive the dynamic extent of the invocation of @code{foo}.
|
|
|
|
@code{foo} is a toplevel variable, as mandated by Scheme's semantics:
|
|
|
|
@example
|
|
(define proc (foo 'bar)) ; assuming prev. definition of @code{foo}
|
|
(define foo 42) ; redefinition
|
|
(proc 'baz)
|
|
@result{} (42 bar baz)
|
|
@end example
|
|
|
|
Note that variables that are mutated (via @code{set!}) must be
|
|
allocated on the heap, even if they are local variables. This is
|
|
because any called subprocedure might capture the continuation, which
|
|
would need to capture locations instead of values. Thus perhaps
|
|
counterintuitively, what would seem ``closer to the metal'', viz
|
|
@code{set!}, actually forces heap allocation instead of stack
|
|
allocation.
|
|
|
|
@node VM Programs
|
|
@subsection Compiled Procedures are VM Programs
|
|
|
|
By default, when you enter in expressions at Guile's REPL, they are
|
|
first compiled to VM object code, then that VM object code is executed
|
|
to produce a value. If the expression evaluates to a procedure, the
|
|
result of this process is a compiled procedure.
|
|
|
|
A compiled procedure is a compound object, consisting of its bytecode,
|
|
a reference to any captured lexical variables, an object array, and
|
|
some metadata such as the procedure's arity, name, and documentation.
|
|
You can pick apart these pieces with the accessors in @code{(system vm
|
|
program)}. @xref{Compiled Procedures}, for a full API reference.
|
|
|
|
@cindex object table
|
|
The object array of a compiled procedure, also known as the
|
|
@dfn{object table}, holds all Scheme objects whose values are known
|
|
not to change across invocations of the procedure: constant strings,
|
|
symbols, etc. The object table of a program is initialized right
|
|
before a program is loaded with @code{load-program}.
|
|
@xref{Loading Instructions}, for more information.
|
|
|
|
Variable objects are one such type of constant object: when a global
|
|
binding is defined, a variable object is associated to it and that
|
|
object will remain constant over time, even if the value bound to it
|
|
changes. Therefore, toplevel bindings only need to be looked up once.
|
|
Thereafter, references to the corresponding toplevel variables from
|
|
within the program are then performed via the @code{toplevel-ref}
|
|
instruction, which uses the object vector, and are almost as fast as
|
|
local variable references.
|
|
|
|
We can see how these concepts tie together by disassembling the
|
|
@code{foo} function to see what is going on:
|
|
|
|
@smallexample
|
|
scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b)))
|
|
scheme@@(guile-user)> ,x foo
|
|
Disassembly of #<program foo (a)>:
|
|
|
|
Bytecode:
|
|
|
|
0 (local-ref 0) ;; `a' (arg)
|
|
2 (external-set 0) ;; `a' (arg)
|
|
4 (object-ref 0) ;; #<program #(0 28 #f) (b)>
|
|
6 (make-closure) at (unknown file):0:16
|
|
7 (return)
|
|
|
|
----------------------------------------
|
|
Disassembly of #<program #(0 28 #f) (b)>:
|
|
|
|
Bytecode:
|
|
|
|
0 (toplevel-ref 0) ;; `list'
|
|
2 (toplevel-ref 1) ;; `foo'
|
|
4 (external-ref 0) ;; (closure variable)
|
|
6 (local-ref 0) ;; `b' (arg)
|
|
8 (goto/args 3) at (unknown file):0:28
|
|
@end smallexample
|
|
|
|
At @code{ip} 0 and 2, we do the copy from argument to heap for
|
|
@code{a}. @code{Ip} 4 loads up the compiled lambda, and then at
|
|
@code{ip} 6 we make a closure---binding code (from the compiled
|
|
lambda) with data (the heap-allocated variables). Finally we return
|
|
the closure.
|
|
|
|
The second stanza disassembles the compiled lambda. Toplevel variables
|
|
are resolved relative to the module that was current when the
|
|
procedure was created. This lookup occurs lazily, at the first time
|
|
the variable is actually referenced, and the location of the lookup is
|
|
cached so that future references are very cheap. @xref{Environment
|
|
Control Instructions}, for more details.
|
|
|
|
Then we see a reference to an external variable, corresponding to
|
|
@code{a}. The disassembler doesn't have enough information to give a
|
|
name to that variable, so it just marks it as being a ``closure
|
|
variable''. Finally we see the reference to @code{b}, then a tail call
|
|
(@code{goto/args}) with three arguments.
|
|
|
|
@node Instruction Set
|
|
@subsection Instruction Set
|
|
|
|
There are about 100 instructions in Guile's virtual machine. These
|
|
instructions represent atomic units of a program's execution. Ideally,
|
|
they perform one task without conditional branches, then dispatch to
|
|
the next instruction in the stream.
|
|
|
|
Instructions themselves are one byte long. Some instructions take
|
|
parameters, which follow the instruction byte in the instruction
|
|
stream.
|
|
|
|
Sometimes the compiler can figure out that it is compiling a special
|
|
case that can be run more efficiently. So, for example, while Guile
|
|
offers a generic test-and-branch instruction, it also offers specific
|
|
instructions for special cases, so that the following cases all have
|
|
their own test-and-branch instructions:
|
|
|
|
@example
|
|
(if pred then else)
|
|
(if (not pred) then else)
|
|
(if (null? l) then else)
|
|
(if (not (null? l)) then else)
|
|
@end example
|
|
|
|
In addition, some Scheme primitives have their own inline
|
|
implementations, e.g. @code{cons}.
|
|
|
|
So Guile's instruction set is a @emph{complete} instruction set, in
|
|
that it provides the instructions that are suited to the problem, and
|
|
is not concerned with making a minimal, orthogonal set of
|
|
instructions. More instructions may be added over time.
|
|
|
|
@menu
|
|
* Environment Control Instructions::
|
|
* Branch Instructions::
|
|
* Loading Instructions::
|
|
* Procedural Instructions::
|
|
* Data Control Instructions::
|
|
* Miscellaneous Instructions::
|
|
* Inlined Scheme Instructions::
|
|
* Inlined Mathematical Instructions::
|
|
@end menu
|
|
|
|
@node Environment Control Instructions
|
|
@subsubsection Environment Control Instructions
|
|
|
|
These instructions access and mutate the environment of a compiled
|
|
procedure---the local bindings, the ``external'' bindings, and the
|
|
toplevel bindings.
|
|
|
|
@deffn Instruction local-ref index
|
|
Push onto the stack the value of the local variable located at
|
|
@var{index} within the current stack frame.
|
|
|
|
Note that arguments and local variables are all in one block. Thus the
|
|
first argument, if any, is at index 0, and local bindings follow the
|
|
arguments.
|
|
@end deffn
|
|
|
|
@deffn Instruction local-set index
|
|
Pop the Scheme object located on top of the stack and make it the new
|
|
value of the local variable located at @var{index} within the current
|
|
stack frame.
|
|
@end deffn
|
|
|
|
@deffn Instruction external-ref index
|
|
Push the value of the closure variable located at position
|
|
@var{index} within the program's list of external variables.
|
|
@end deffn
|
|
|
|
@deffn Instruction external-set index
|
|
Pop the Scheme object located on top of the stack and make it the new
|
|
value of the closure variable located at @var{index} within the
|
|
program's list of external variables.
|
|
@end deffn
|
|
|
|
The external variable lookup algorithm should probably be made more
|
|
efficient in the future via addressing by frame and index. Currently,
|
|
external variables are all consed onto a list, which results in O(N)
|
|
lookup time.
|
|
|
|
@deffn Instruction externals
|
|
Pushes the current list of external variables onto the stack. This
|
|
instruction is used in the implementation of
|
|
@code{compile-time-environment}. @xref{The Scheme Compiler}.
|
|
@end deffn
|
|
|
|
@deffn Instruction toplevel-ref index
|
|
Push the value of the toplevel binding whose location is stored in at
|
|
position @var{index} in the object table.
|
|
|
|
Initially, a cell in the object table that is used by
|
|
@code{toplevel-ref} is initialized to one of two forms. The normal
|
|
case is that the cell holds a symbol, whose binding will be looked up
|
|
relative to the module that was current when the current program was
|
|
created.
|
|
|
|
Alternately, the lookup may be performed relative to a particular
|
|
module, determined at compile-time (e.g. via @code{@@} or
|
|
@code{@@@@}). In that case, the cell in the object table holds a list:
|
|
@code{(@var{modname} @var{sym} @var{interface?})}. The symbol
|
|
@var{sym} will be looked up in the module named @var{modname} (a list
|
|
of symbols). The lookup will be performed against the module's public
|
|
interface, unless @var{interface?} is @code{#f}, which it is for
|
|
example when compiling @code{@@@@}.
|
|
|
|
In any case, if the symbol is unbound, an error is signalled.
|
|
Otherwise the initial form is replaced with the looked-up variable, an
|
|
in-place mutation of the object table. This mechanism provides for
|
|
lazy variable resolution, and an important cached fast-path once the
|
|
variable has been successfully resolved.
|
|
|
|
This instruction pushes the value of the variable onto the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction toplevel-ref index
|
|
Pop a value off the stack, and set it as the value of the toplevel
|
|
variable stored at @var{index} in the object table. If the variable
|
|
has not yet been looked up, we do the lookup as in
|
|
@code{toplevel-ref}.
|
|
@end deffn
|
|
|
|
@deffn Instruction link-now
|
|
Pop a value, @var{x}, from the stack. Look up the binding for @var{x},
|
|
according to the rules for @code{toplevel-ref}, and push that variable
|
|
on the stack. If the lookup fails, an error will be signalled.
|
|
|
|
This instruction is mostly used when loading programs, because it can
|
|
do toplevel variable lookups without an object vector.
|
|
@end deffn
|
|
|
|
@deffn Instruction variable-ref
|
|
Dereference the variable object which is on top of the stack and
|
|
replace it by the value of the variable it represents.
|
|
@end deffn
|
|
|
|
@deffn Instruction variable-set
|
|
Pop off two objects from the stack, a variable and a value, and set
|
|
the variable to the value.
|
|
@end deffn
|
|
|
|
@deffn Instruction object-ref n
|
|
Push @var{n}th value from the current program's object vector.
|
|
@end deffn
|
|
|
|
@node Branch Instructions
|
|
@subsubsection Branch Instructions
|
|
|
|
All the conditional branch instructions described below work in the
|
|
same way:
|
|
|
|
@itemize
|
|
@item They pop off the Scheme object located on the stack and use it as
|
|
the branch condition;
|
|
@item If the condition is true, then the instruction pointer is
|
|
increased by the offset passed as an argument to the branch
|
|
instruction;
|
|
@item Program execution proceeds with the next instruction (that is,
|
|
the one to which the instruction pointer points).
|
|
@end itemize
|
|
|
|
Note that the offset passed to the instruction is encoded on two 8-bit
|
|
integers which are then combined by the VM as one 16-bit integer.
|
|
|
|
@deffn Instruction br offset
|
|
Jump to @var{offset}.
|
|
@end deffn
|
|
|
|
@deffn Instruction br-if offset
|
|
Jump to @var{offset} if the condition on the stack is not false.
|
|
@end deffn
|
|
|
|
@deffn Instruction br-if-not offset
|
|
Jump to @var{offset} if the condition on the stack is false.
|
|
@end deffn
|
|
|
|
@deffn Instruction br-if-eq offset
|
|
Jump to @var{offset} if the two objects located on the stack are
|
|
equal in the sense of @var{eq?}. Note that, for this instruction, the
|
|
stack pointer is decremented by two Scheme objects instead of only
|
|
one.
|
|
@end deffn
|
|
|
|
@deffn Instruction br-if-not-eq offset
|
|
Same as @var{br-if-eq} for non-@code{eq?} objects.
|
|
@end deffn
|
|
|
|
@deffn Instruction br-if-null offset
|
|
Jump to @var{offset} if the object on the stack is @code{'()}.
|
|
@end deffn
|
|
|
|
@deffn Instruction br-if-not-null offset
|
|
Jump to @var{offset} if the object on the stack is not @code{'()}.
|
|
@end deffn
|
|
|
|
|
|
@node Loading Instructions
|
|
@subsubsection Loading Instructions
|
|
|
|
In addition to VM instructions, an instruction stream may contain
|
|
variable-length data embedded within it. This data is always preceded
|
|
by special loading instructions, which interpret the data and advance
|
|
the instruction pointer to the next VM instruction.
|
|
|
|
All of these loading instructions have a @code{length} parameter,
|
|
indicating the size of the embedded data, in bytes. The length itself
|
|
may be encoded in 1, 2, or 4 bytes.
|
|
|
|
@deffn Instruction load-integer length
|
|
Load a 32-bit integer from the instruction stream.
|
|
@end deffn
|
|
@deffn Instruction load-number length
|
|
Load an arbitrary number from the instruction stream. The number is
|
|
embedded in the stream as a string.
|
|
@end deffn
|
|
@deffn Instruction load-string length
|
|
Load a string from the instruction stream.
|
|
@end deffn
|
|
@deffn Instruction load-symbol length
|
|
Load a symbol from the instruction stream.
|
|
@end deffn
|
|
@deffn Instruction load-keyword length
|
|
Load a keyword from the instruction stream.
|
|
@end deffn
|
|
|
|
@deffn Instruction define length
|
|
Load a symbol from the instruction stream, and look up its binding in
|
|
the current toplevel environment, creating the binding if necessary.
|
|
Push the variable corresponding to the binding.
|
|
@end deffn
|
|
|
|
@deffn Instruction load-program length
|
|
Load bytecode from the instruction stream, and push a compiled
|
|
procedure. This instruction pops the following values from the stack:
|
|
|
|
@itemize
|
|
@item Optionally, a thunk, which when called should return metadata
|
|
associated with this program---for example its name, the names of its
|
|
arguments, its documentation string, debugging information, etc.
|
|
|
|
Normally, this thunk its itself a compiled procedure (with no
|
|
metadata). Metadata is represented this way so that the initial load
|
|
of a procedure is fast: the VM just mmap's the thunk and goes. The
|
|
symbols and pairs associated with the metadata are only created if the
|
|
user asks for them.
|
|
|
|
For information on the format of the thunk's return value,
|
|
@xref{Compiled Procedures}.
|
|
@item Optionally, the program's object table, as a vector.
|
|
|
|
A program that does not reference toplevel bindings and does not use
|
|
@code{object-ref} does not need an object table.
|
|
@item Finally, either one immediate integer or four immediate integers
|
|
representing the arity of the program.
|
|
|
|
In the four-fixnum case, the values are respectively the number of
|
|
arguments taken by the function (@var{nargs}), the number of @dfn{rest
|
|
arguments} (@var{nrest}, 0 or 1), the number of local variables
|
|
(@var{nlocs}) and the number of external variables (@var{nexts})
|
|
(@pxref{Environment Control Instructions}).
|
|
|
|
The common single-fixnum case represents all of these values within a
|
|
16-bit bitmask.
|
|
@end itemize
|
|
|
|
The resulting compiled procedure will not have any ``external''
|
|
variables captured, so it will be loaded only once but may be used
|
|
many times to create closures.
|
|
@end deffn
|
|
|
|
Finally, while this instruction is not strictly a ``loading''
|
|
instruction, it's useful to wind up the @code{load-program} discussion
|
|
here:
|
|
|
|
@deffn Instruction make-closure
|
|
Pop the program object from the stack, capture the current set of
|
|
``external'' variables, and assign those external variables to a copy
|
|
of the program. Push the new program object, which shares state with
|
|
the original program. Also captures the current module.
|
|
@end deffn
|
|
|
|
@node Procedural Instructions
|
|
@subsubsection Procedural Instructions
|
|
|
|
@deffn Instruction return
|
|
Free the program's frame, returning the top value from the stack to
|
|
the current continuation. (The stack should have exactly one value on
|
|
it.)
|
|
|
|
Specifically, the @code{sp} is decremented to one below the current
|
|
@code{fp}, the @code{ip} is reset to the current return address, the
|
|
@code{fp} is reset to the value of the current dynamic link, and then
|
|
the top item on the stack (formerly the procedure being applied) is
|
|
set to the returned value.
|
|
@end deffn
|
|
|
|
@deffn Instruction call nargs
|
|
Call the procedure located at @code{sp[-nargs]} with the @var{nargs}
|
|
arguments located from @code{sp[0]} to @code{sp[-nargs + 1]}.
|
|
|
|
For non-compiled procedures (continuations, primitives, and
|
|
interpreted procedures), @code{call} will pop the procedure and
|
|
arguments off the stack, and push the result of calling
|
|
@code{scm_apply}.
|
|
|
|
For compiled procedures, this instruction sets up a new stack frame,
|
|
as described in @ref{Stack Layout}, and then dispatches to the first
|
|
instruction in the called procedure, relying on the called procedure
|
|
to return one value to the newly-created continuation.
|
|
@end deffn
|
|
|
|
@deffn Instruction goto/args nargs
|
|
Like @code{call}, but reusing the current continuation. This
|
|
instruction implements tail calling as required by RnRS.
|
|
|
|
For compiled procedures, that means that @code{goto/args} reuses the
|
|
current frame instead of building a new one. The @code{goto/*}
|
|
instruction family is named as it is because tail calls are equivalent
|
|
to @code{goto}, along with relabeled variables.
|
|
|
|
For non-VM procedures, the result is the same, but the current VM
|
|
invocation remains on the C stack. True tail calls are not currently
|
|
possible between compiled and non-compiled procedures.
|
|
@end deffn
|
|
|
|
@deffn Instruction apply nargs
|
|
@deffnx Instruction goto/apply nargs
|
|
Like @code{call} and @code{goto/args}, except that the top item on the
|
|
stack must be a list. The elements of that list are then pushed on the
|
|
stack and treated as additional arguments, replacing the list itself,
|
|
then the procedure is invoked as usual.
|
|
@end deffn
|
|
|
|
@deffn Instruction call/nargs
|
|
@deffnx Instruction goto/nargs
|
|
These are like @code{call} and @code{goto/args}, except they take the
|
|
number of arguments from the stack instead of the instruction stream.
|
|
These instructions are used in the implementation of multiple value
|
|
returns, where the actual number of values is pushed on the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction call/cc
|
|
@deffnx Instruction goto/cc
|
|
Capture the current continuation, and then call (or tail-call) the
|
|
procedure on the top of the stack, with the continuation as the
|
|
argument.
|
|
|
|
Both the VM continuation and the C continuation are captured.
|
|
@end deffn
|
|
|
|
@deffn Instruction mv-call nargs offset
|
|
Like @code{call}, except that a multiple-value continuation is created
|
|
in addition to a single-value continuation.
|
|
|
|
The offset (a two-byte value) is an offset within the instruction
|
|
stream; the multiple-value return address in the new frame
|
|
(@pxref{Stack Layout}) will be set to the normal return address plus
|
|
this offset. Instructions at that offset will expect the top value of
|
|
the stack to be the number of values, and below that values
|
|
themselves, pushed separately.
|
|
@end deffn
|
|
|
|
@deffn Instruction return/values nvalues
|
|
Return the top @var{nvalues} to the current continuation.
|
|
|
|
If the current continuation is a multiple-value continuation,
|
|
@code{return/values} pushes the number of values on the stack, then
|
|
returns as in @code{return}, but to the multiple-value return address.
|
|
|
|
Otherwise if the current continuation accepts only one value, i.e. the
|
|
multiple-value return address is @code{NULL}, then we assume the user
|
|
only wants one value, and we give them the first one. If there are no
|
|
values, an error is signaled.
|
|
@end deffn
|
|
|
|
@deffn Instruction return/values* nvalues
|
|
Like a combination of @code{apply} and @code{return/values}, in which
|
|
the top value on the stack is interpreted as a list of additional
|
|
values. This is an optimization for the common @code{(apply values
|
|
...)} case.
|
|
@end deffn
|
|
|
|
@deffn Instruction truncate-values nbinds nrest
|
|
Used in multiple-value continuations, this instruction takes the
|
|
values that are on the stack (including the number-of-value marker)
|
|
and truncates them for a binding construct.
|
|
|
|
For example, a call to @code{(receive (x y . z) (foo) ...)} would,
|
|
logically speaking, pop off the values returned from @code{(foo)} and
|
|
push them as three values, corresponding to @code{x}, @code{y}, and
|
|
@code{z}. In that case, @var{nbinds} would be 3, and @var{nrest} would
|
|
be 1 (to indicate that one of the bindings was a rest arguments).
|
|
|
|
Signals an error if there is an insufficient number of values.
|
|
@end deffn
|
|
|
|
@node Data Control Instructions
|
|
@subsubsection Data Control Instructions
|
|
|
|
These instructions push simple immediate values onto the stack, or
|
|
manipulate lists and vectors on the stack.
|
|
|
|
@deffn Instruction make-int8 value
|
|
Push @var{value}, an 8-bit integer, onto the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction make-int8:0
|
|
Push the immediate value @code{0} onto the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction make-int8:1
|
|
Push the immediate value @code{1} onto the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction make-int16 value
|
|
Push @var{value}, a 16-bit integer, onto the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction make-false
|
|
Push @code{#f} onto the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction make-true
|
|
Push @code{#t} onto the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction make-eol
|
|
Push @code{'()} onto the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction make-char8 value
|
|
Push @var{value}, an 8-bit character, onto the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction list n
|
|
Pops off the top @var{n} values off of the stack, consing them up into
|
|
a list, then pushes that list on the stack. What was the topmost value
|
|
will be the last element in the list.
|
|
@end deffn
|
|
|
|
@deffn Instruction vector n
|
|
Create and fill a vector with the top @var{n} values from the stack,
|
|
popping off those values and pushing on the resulting vector.
|
|
@end deffn
|
|
|
|
@deffn Instruction mark
|
|
Pushes a special value onto the stack that other stack instructions
|
|
like @code{list-mark} can use.
|
|
@end deffn
|
|
|
|
@deffn Instruction list-mark
|
|
Create a list from values from the stack, as in @code{list}, but
|
|
instead of knowing beforehand how many there will be, keep going until
|
|
we see a @code{mark} value.
|
|
@end deffn
|
|
|
|
@deffn Instruction cons-mark
|
|
As the scheme procedure @code{cons*} is to the scheme procedure
|
|
@code{list}, so the instruction @code{cons-mark} is to the instruction
|
|
@code{list-mark}.
|
|
@end deffn
|
|
|
|
@deffn Instruction vector-mark
|
|
Like @code{list-mark}, but makes a vector instead of a list.
|
|
@end deffn
|
|
|
|
@deffn Instruction list-break
|
|
The opposite of @code{list}: pops a value, which should be a list, and
|
|
pushes its elements on the stack.
|
|
@end deffn
|
|
|
|
@node Miscellaneous Instructions
|
|
@subsubsection Miscellaneous Instructions
|
|
|
|
@deffn Instruction nop
|
|
Does nothing!
|
|
@end deffn
|
|
|
|
@deffn Instruction halt
|
|
Exits the VM, returning a SCM value. Normally, this instruction is
|
|
only part of the ``bootstrap program'', a program run when a virtual
|
|
machine is first entered; compiled Scheme procedures will not contain
|
|
this instruction.
|
|
|
|
If multiple values have been returned, the SCM value will be a
|
|
multiple-values object (@pxref{Multiple Values}).
|
|
@end deffn
|
|
|
|
@deffn Instruction break
|
|
Does nothing, but invokes the break hook.
|
|
@end deffn
|
|
|
|
@deffn Instruction drop
|
|
Pops off the top value from the stack, throwing it away.
|
|
@end deffn
|
|
|
|
@deffn Instruction dup
|
|
Re-pushes the top value onto the stack.
|
|
@end deffn
|
|
|
|
@deffn Instruction void
|
|
Pushes ``the unspecified value'' onto the stack.
|
|
@end deffn
|
|
|
|
@node Inlined Scheme Instructions
|
|
@subsubsection Inlined Scheme Instructions
|
|
|
|
The Scheme compiler can recognize the application of standard Scheme
|
|
procedures, or unbound variables that look like they are bound to
|
|
standard Scheme procedures. It tries to inline these small operations
|
|
to avoid the overhead of creating new stack frames.
|
|
|
|
Since most of these operations are historically implemented as C
|
|
primitives, not inlining them would entail constantly calling out from
|
|
the VM to the interpreter, which has some costs---registers must be
|
|
saved, the interpreter has to dispatch, called procedures have to do
|
|
much typechecking, etc. It's much more efficient to inline these
|
|
operations in the virtual machine itself.
|
|
|
|
All of these instructions pop their arguments from the stack and push
|
|
their results, and take no parameters from the instruction stream.
|
|
Thus, unlike in the previous sections, these instruction definitions
|
|
show stack parameters instead of parameters from the instruction
|
|
stream.
|
|
|
|
@deffn Instruction not x
|
|
@deffnx Instruction not-not x
|
|
@deffnx Instruction eq? x y
|
|
@deffnx Instruction not-eq? x y
|
|
@deffnx Instruction null?
|
|
@deffnx Instruction not-null?
|
|
@deffnx Instruction eqv? x y
|
|
@deffnx Instruction equal? x y
|
|
@deffnx Instruction pair? x y
|
|
@deffnx Instruction list? x y
|
|
@deffnx Instruction set-car! pair x
|
|
@deffnx Instruction set-cdr! pair x
|
|
@deffnx Instruction slot-ref struct n
|
|
@deffnx Instruction slot-set struct n x
|
|
@deffnx Instruction cons x
|
|
@deffnx Instruction car x
|
|
@deffnx Instruction cdr x
|
|
Inlined implementations of their Scheme equivalents.
|
|
@end deffn
|
|
|
|
Note that @code{caddr} and friends compile to a series of @code{car}
|
|
and @code{cdr} instructions.
|
|
|
|
@node Inlined Mathematical Instructions
|
|
@subsubsection Inlined Mathematical Instructions
|
|
|
|
Inlining mathematical operations has the obvious advantage of handling
|
|
fixnums without function calls or allocations. The trick, of course,
|
|
is knowing when the result of an operation will be a fixnum, and there
|
|
might be a couple bugs here.
|
|
|
|
More instructions could be added here over time.
|
|
|
|
As in the previous section, the definitions below show stack
|
|
parameters instead of instruction stream parameters.
|
|
|
|
@deffn Instruction add x y
|
|
@deffnx Instruction sub x y
|
|
@deffnx Instruction mul x y
|
|
@deffnx Instruction div x y
|
|
@deffnx Instruction quo x y
|
|
@deffnx Instruction rem x y
|
|
@deffnx Instruction mod x y
|
|
@deffnx Instruction ee? x y
|
|
@deffnx Instruction lt? x y
|
|
@deffnx Instruction gt? x y
|
|
@deffnx Instruction le? x y
|
|
@deffnx Instruction ge? x y
|
|
Inlined implementations of the corresponding mathematical operations.
|
|
@end deffn
|