@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C)  2008,2009,2010,2011,2013
@c   Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node A Virtual Machine for Guile
@section A Virtual Machine for Guile

Guile has both an interpreter and a compiler. To a user, the difference
is transparent---interpreted and compiled procedures can call each other
as they please.

The difference is that the compiler produces bytecode for a custom
virtual machine, and Guile interprets that bytecode instead of
interpreting the S-expressions directly. Loading and running compiled
code is faster than loading and running source code.

The virtual machine that does the bytecode interpretation is a part of
Guile itself. This section describes the nature of Guile's virtual
machine.

@menu
* Why a VM?::
* VM Concepts::
* Stack Layout::
* Variables and the VM::
* VM Programs::
* Instruction Set::
@end menu

@node Why a VM?
@subsection Why a VM?

@cindex interpreter
For a long time, Guile only had an interpreter. Guile's interpreter
operated directly on the S-expression representation of Scheme source
code.

But while the interpreter was highly optimized and hand-tuned, it still
performed many needless computations during the course of evaluating an
expression. For example, application of a function to arguments
needlessly consed up the arguments in a list. Evaluation of an
expression always had to figure out what the car of the expression
was -- a procedure, a memoized form, or something else. All values had
to be allocated on the heap. Et cetera.

The solution to this problem was to compile the higher-level language,
Scheme, into a lower-level language for which all of the checks and
dispatching have already been done---the code is instead stripped to
the bare minimum needed to ``do the job''.

The question then becomes which low-level language to choose. There are
many options.
We could compile to native code directly, but that poses
portability problems for Guile, as it is a highly cross-platform
project.

So we want the performance gains that compilation provides, but we
also want to maintain the portability benefits of a single code path.
The obvious solution is to compile to a virtual machine that is present
on all Guile installations.

The easiest (and most fun) way to depend on a virtual machine is to
implement the virtual machine within Guile itself. This way the virtual
machine provides what Scheme needs (tail calls, multiple values,
@code{call/cc}) and can provide optimized inline instructions for Guile
(@code{cons}, @code{struct-ref}, etc.).

So this is what Guile does. The rest of this section describes the VM
that Guile implements, and the compiled procedures that run on it.

Before moving on, though, we should note that though we spoke of the
interpreter in the past tense, Guile still has an interpreter. The
difference is that before, it was Guile's main evaluator, and so was
implemented in highly optimized C; now, it is actually implemented in
Scheme, and compiled down to VM bytecode, just like any other program.
(There is still a C interpreter around, used to bootstrap the compiler,
but it is not normally used at runtime.)

The upside of implementing the interpreter in Scheme is that we preserve
tail calls and multiple-value handling between interpreted and compiled
code. The downside is that the interpreter in Guile 2.2 is still slower
than the interpreter in 1.8. We hope that the compiler's speed makes up
for the loss. In any case, once we have native compilation for Scheme
code, we expect the new self-hosted interpreter to beat the old
hand-tuned C implementation.

Also note that this decision to implement a bytecode compiler does not
preclude native compilation. We can compile from bytecode to native
code at runtime, or even do ahead-of-time compilation. More
possibilities are discussed in @ref{Extending the Compiler}.
@node VM Concepts
@subsection VM Concepts

Compiled code is run by a virtual machine (VM). Each thread has its own
VM. The virtual machine executes the sequence of instructions in a
procedure.

Each VM instruction starts by indicating which operation it is, and then
follows by encoding its source and destination operands. Each procedure
declares that it has some number of local variables, including the
function arguments. These local variables form the available operands
of the procedure, and are accessed by index.

The local variables for a procedure are stored on a stack. Calling a
procedure typically enlarges the stack, and returning from a procedure
shrinks it. Stack memory is exclusive to the virtual machine that owns
it.

In addition to their stacks, virtual machines also have access to the
global memory (modules, global bindings, etc.) that is shared among
other parts of Guile, including other VMs.

The registers that a VM has are as follows:

@itemize
@item ip - Instruction pointer
@item sp - Stack pointer
@item fp - Frame pointer
@end itemize

In other architectures, the instruction pointer is sometimes called the
``program counter'' (pc). This set of registers is pretty typical for
virtual machines; their exact meanings in the context of Guile's VM are
described in the next section.

@node Stack Layout
@subsection Stack Layout

The stack of Guile's virtual machine is composed of @dfn{frames}. Each
frame corresponds to the application of one compiled procedure, and
contains storage space for arguments, local variables, and some
bookkeeping information (such as what to do after the frame is
finished).

While the compiler is free to do whatever it wants to, as long as the
semantics of a computation are preserved, in practice every time you
call a function, a new frame is created. (The notable exception of
course is the tail call case, @pxref{Tail Calls}.)
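As a rough illustration of the frame discipline described above, the
following Python sketch models the stack as a list with one entry per
frame. It is a toy model under stated assumptions, not Guile's
implementation: the names @code{ToyVM}, @code{call}, @code{tail_call},
@code{ret}, and @code{run_loop} are all hypothetical, and a real frame
holds locals and bookkeeping data rather than just a name.

```python
class ToyVM:
    """Toy stack of frames; each list entry stands for one frame."""

    def __init__(self):
        self.frames = []
        self.max_depth = 0

    def call(self, name):
        # A non-tail call pushes a new frame.
        self.frames.append(name)
        self.max_depth = max(self.max_depth, len(self.frames))

    def tail_call(self, name):
        # A tail call reuses the current frame instead of pushing one.
        self.frames[-1] = name

    def ret(self):
        # Returning pops the current frame.
        self.frames.pop()


def run_loop(vm, iterations, tail):
    """Run a loop that calls itself `iterations` times; return peak depth."""
    vm.call("loop")  # initial call from the top level
    for _ in range(iterations):
        if tail:
            vm.tail_call("loop")
        else:
            vm.call("loop")
    while vm.frames:  # unwind whatever frames are left
        vm.ret()
    return vm.max_depth


print(run_loop(ToyVM(), 5, tail=False))
print(run_loop(ToyVM(), 5, tail=True))
```

Under this model, five iterations written with non-tail calls reach a
peak depth of six frames, while the same loop written with tail calls
stays at a depth of one.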
The structure of the top stack frame is as follows:

@example
   /------------------\ <- top of stack
   | Local N-1        | <- sp
   | ...              |
   | Local 1          |
   | Local 0          | <- fp = SCM_FRAME_LOCALS_ADDRESS (fp)
   +==================+
   | Return address   |
   | Dynamic link     | <- fp - 2 = SCM_FRAME_LOWER_ADDRESS (fp)
   +==================+
   |                  | <- fp - 3 = SCM_FRAME_PREVIOUS_SP (fp)
@end example

In the above drawing, the stack grows upward. Usually the procedure
being applied is in local 0, followed by the arguments from local 1.
After that are enough slots to store the various lexically-bound and
temporary values that are needed in the function's application.

The @dfn{return address} is the @code{ip} that was in effect before this
program was applied. When we return from this activation frame, we will
jump back to this @code{ip}. Likewise, the @dfn{dynamic link} is the
@code{fp} in effect before this program was applied.

To prepare for a non-tail application, Guile's compiler will emit code
that shuffles the function to apply and its arguments into appropriate
stack slots, with two free slots below them. The call then initializes
those free slots with the current @code{ip} and @code{fp}, and updates
@code{ip} to point to the function entry, and @code{fp} to point to the
new call frame.

In this way, the dynamic link links the current frame to the previous
frame. Computing a stack trace involves traversing these frames.

@node Variables and the VM
@subsection Variables and the VM

Consider the following Scheme code as an example:

@example
  (define (foo a)
    (lambda (b) (list foo a b)))
@end example

Within the lambda expression, @code{foo} is a top-level variable,
@code{a} is a lexically captured variable, and @code{b} is a local
variable.

Another way to refer to @code{a} and @code{b} is to say that @code{a} is
a ``free'' variable, since it is not defined within the lambda, and
@code{b} is a ``bound'' variable.
These are the terms used in the
@dfn{lambda calculus}, a mathematical notation for describing functions.
The lambda calculus is useful because it is a language in which to
reason precisely about functions and variables. It is especially good
at describing scope relations, and it is for that reason that we mention
it here.

Guile allocates all variables on the stack. When a lexically enclosed
procedure with free variables---a @dfn{closure}---is created, it copies
those variables into its free variable vector. References to free
variables are then redirected through the free variable vector.

If a variable is ever @code{set!}, however, it will need to be
heap-allocated instead of stack-allocated, so that different closures
that capture the same variable can see the same value. Also, this
allows continuations to capture a reference to the variable, instead of
to its value at one point in time. For these reasons, @code{set!}
variables are allocated in ``boxes''---actually, in variable cells.
@xref{Variables}, for more information. References to @code{set!}
variables are indirected through the boxes.

Thus perhaps counterintuitively, what would seem ``closer to the
metal'', viz @code{set!}, actually forces an extra memory allocation and
indirection.

Going back to our example, @code{b} may be allocated on the stack, as
it is never mutated.

@code{a} may also be allocated on the stack, as it too is never
mutated. Within the enclosed lambda, its value will be copied into
(and referenced from) the free variables vector.

@code{foo} is a top-level variable, because @code{foo} is not
lexically bound in this example.

@node VM Programs
@subsection Compiled Procedures are VM Programs

By default, when you enter in expressions at Guile's REPL, they are
first compiled to bytecode. Then that bytecode is executed to produce a
value. If the expression evaluates to a procedure, the result of this
process is a compiled procedure.
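The reason that @code{set!} variables need boxes, as discussed in the
previous subsection, can be sketched in Python. This is only an analogy
for the free variable vector and for variable cells: the @code{Box} and
@code{Closure} classes and the helper functions below are hypothetical
names chosen for illustration, not part of Guile.

```python
class Box:
    """Stand-in for a variable cell: one heap object shared by reference."""

    def __init__(self, value):
        self.value = value


class Closure:
    """Code plus a private copy of the captured slots."""

    def __init__(self, code, free_vars):
        self.code = code
        self.free_vars = list(free_vars)  # slots are copied at creation

    def __call__(self):
        return self.code(self)


def incr_copied(closure):
    # Mutates only this closure's own copy of the slot.
    closure.free_vars[0] = closure.free_vars[0] + 1


def get_copied(closure):
    return closure.free_vars[0]


def incr_boxed(closure):
    # Mutates the shared cell that the slot points to.
    closure.free_vars[0].value += 1


def get_boxed(closure):
    return closure.free_vars[0].value


def demo_copied():
    n = 0
    inc = Closure(incr_copied, [n])
    get = Closure(get_copied, [n])
    inc()
    return get()  # the getter never sees the increment


def demo_boxed():
    cell = Box(0)
    inc = Closure(incr_boxed, [cell])
    get = Closure(get_boxed, [cell])
    inc()
    return get()  # both closures share the same cell


print(demo_copied(), demo_boxed())
```

Copying is enough as long as a captured variable is never mutated. Once
it is, only the boxed version lets every closure observe the new value.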
A compiled procedure is a compound object consisting of its bytecode and
a reference to any captured lexical variables. In addition, when a
procedure is compiled, it has associated metadata written to side
tables, for instance a line number mapping, or its docstring. You can
pick apart these pieces with the accessors in @code{(system vm
program)}. @xref{Compiled Procedures}, for a full API reference.

A procedure may reference data that was statically allocated when the
procedure was compiled. For example, a pair of immediate objects
(@pxref{Immediate objects}) can be allocated directly in the memory
segment that contains the compiled bytecode, and accessed directly by
the bytecode.

Another use for statically allocated data is to serve as a cache for the
bytecode. Top-level variable lookups are handled in this way. If the
@code{toplevel-box} instruction finds that it does not have a cached
variable for a top-level reference, it accesses other static data to
resolve the reference, and fills in the cache slot. Thereafter all
access to the variable goes through the cache cell. The variable's
value may change in the future, but the variable itself will not.

We can see how these concepts tie together by disassembling the
@code{foo} function we defined earlier to see what is going on:

@smallexample
scheme@@(guile-user)> (define (foo a) (lambda (b) (list foo a b)))
scheme@@(guile-user)> ,x foo
Disassembly of #<procedure foo (a)> at #x203be34:

   0    (assert-nargs-ee/locals 2 1)    ;; 1 arg, 1 local     at (unknown file):1:0
   1    (make-closure 2 6 1)            ;; anonymous procedure at #x203be50 (1 free var)
   4    (free-set! 2 1 0)               ;; free var 0
   6    (return 2)

----------------------------------------
Disassembly of anonymous procedure at #x203be50:

   0    (assert-nargs-ee/locals 2 3)    ;; 1 arg, 3 locals    at (unknown file):1:0
   1    (toplevel-box 2 73 57 71 #t)    ;; `foo'
   6    (box-ref 2 2)
   7    (make-short-immediate 3 772)    ;; ()
   8    (cons 3 1 3)
   9    (free-ref 4 0 0)                ;; free var 0
  11    (cons 3 4 3)
  12    (cons 2 2 3)
  13    (return 2)
@end smallexample

First there's some prelude, where @code{foo} checks that it was called
with only 1 argument. Then at @code{ip} 1, we allocate a new closure
and store it in slot 2. The `6' in the @code{(make-closure 2 6 1)} is a
relative offset from the instruction pointer of the code for the
closure.

A closure is code with data. We already have the code part initialized;
what remains is to set the data. @code{Ip} 4 initializes free variable
0 in the new closure with the value from local variable 1, which
corresponds to the first argument of @code{foo}: `a'. Finally we return
the closure.

The second stanza disassembles the code for the closure. After the
prelude, we load the variable for the toplevel variable @code{foo} into
local variable 2. This lookup occurs lazily, the first time the
variable is actually referenced, and the location of the lookup is
cached so that future references are very cheap. @xref{Top-Level
Environment Instructions}, for more details. The @code{box-ref}
dereferences the variable cell, replacing the contents of local 2.

What follows is a sequence of conses to build up the result list.
@code{Ip} 7 makes the tail of the list. @code{Ip} 8 conses on the value
in local 1, corresponding to the first argument to the closure: `b'.
@code{Ip} 9 loads free variable 0 of local 0 -- the procedure being
called -- into slot 4, then @code{ip} 11 conses it onto the list.
Finally we cons local 2, containing the @code{foo} toplevel, onto the
front of the list, and we return it.
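The relative offset in @code{make-closure} can be checked against the
addresses printed in the listing above. The short Python sketch below
assumes that the numbers in the left column of the disassembly count
32-bit words from the start of the procedure, which is an inference from
the listing rather than something it states. The helper name
@code{relative_target} is hypothetical.

```python
WORD_BYTES = 4  # assumption: the listing's ip column counts 32-bit words


def relative_target(base_address, ip_words, offset_words):
    """Byte address reached from instruction index ip_words by an offset."""
    return base_address + (ip_words + offset_words) * WORD_BYTES


# Values copied from the disassembly of foo:
foo_entry = 0x203be34      # start of foo
make_closure_ip = 1        # position of make-closure
make_closure_offset = 6    # the `6' operand

target = relative_target(foo_entry, make_closure_ip, make_closure_offset)
print(hex(target))

# Cross-check: the four instructions of foo start at 0, 1, 4 and 6, so
# they occupy 1 + 3 + 2 + 1 = 7 words, and the closure code follows them.
foo_words = 1 + 3 + 2 + 1
print(hex(foo_entry + foo_words * WORD_BYTES))
```

Both computations give @code{0x203be50}, the address at which the
listing places the anonymous procedure.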
@node Instruction Set
@subsection Instruction Set

There are currently about 130 instructions in Guile's virtual machine.
These instructions represent atomic units of a program's execution.
Ideally, they perform one task without conditional branches, then
dispatch to the next instruction in the stream.

Instructions themselves are composed of 1 or more 32-bit units. The low
8 bits of the first word indicate the opcode, and the rest of the
instruction describes the operands. There are a number of different ways
operands can be encoded.

@table @code
@item u@var{n}
An unsigned @var{n}-bit integer. Usually indicates the index of a local
variable, but some instructions interpret these operands as immediate
values.
@item l24
An offset from the current @code{ip}, in 32-bit units, as a signed
24-bit value. Indicates a bytecode address, for a relative jump.
@item i16
@itemx i32
An immediate Scheme value (@pxref{Immediate objects}), encoded directly
in 16 or 32 bits.
@item a32
@itemx b32
An immediate Scheme value, encoded as a pair of 32-bit words.
@code{a32} and @code{b32} values always go together on the same opcode,
and indicate the high and low bits, respectively. Normally only used on
64-bit systems.
@item n32
A statically allocated non-immediate. The address of the non-immediate
is encoded as a signed 32-bit integer, and indicates a relative offset
in 32-bit units. Think of it as @code{SCM x = ip + offset}.
@item s32
Indirect Scheme value, like @code{n32} but indirected. Think of it as
@code{SCM *x = ip + offset}.
@item l32
@itemx lo32
An ip-relative address, as a signed 32-bit integer. Could indicate a
bytecode address, as in @code{make-closure}, or a non-immediate address,
as with @code{static-patch!}.

@code{l32} and @code{lo32} are the same from the perspective of the
virtual machine.
The difference is that an assembler might want to
allow an @code{lo32} address to be specified as a label and then some
number of words offset from that label, for example when patching a
field of a statically allocated object.
@item b1
A boolean value: 1 for true, otherwise 0.
@item x@var{n}
An ignored sequence of @var{n} bits.
@end table

An instruction is specified by giving its name, then describing its
operands. The operands are packed by 32-bit words, with earlier
operands occupying the lower bits.

For example, consider the following instruction specification:

@deftypefn Instruction {} free-set! u12:@var{dst} u12:@var{src} x8:@var{_} u24:@var{idx}
Set free variable @var{idx} from the closure @var{dst} to @var{src}.
@end deftypefn

The first word in the instruction will start with the 8-bit value
corresponding to the @code{free-set!} opcode in the low bits, followed by
@var{dst} and @var{src} as 12-bit values. The second word starts with 8
dead bits, followed by the index as a 24-bit immediate value.

Sometimes the compiler can figure out that it is compiling a special
case that can be run more efficiently. So, for example, while Guile
offers a generic test-and-branch instruction, it also offers specific
instructions for special cases, so that the following cases all have
their own test-and-branch instructions:

@example
(if pred then else)
(if (not pred) then else)
(if (null? l) then else)
(if (not (null? l)) then else)
@end example

In addition, some Scheme primitives have their own inline
implementations. For example, in the previous section we saw
@code{cons}.

Guile's instruction set is a @emph{complete} instruction set, in that it
provides the instructions that are suited to the problem, and is not
concerned with making a minimal, orthogonal set of instructions. More
instructions may be added over time.
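To make the packing rule concrete, here is a small Python sketch that
encodes and decodes the two words of @code{free-set!} according to the
specification given earlier in this subsection. The opcode number
@code{0x2A} is a made-up placeholder, since the real opcode value is not
given here, and the function names are hypothetical.

```python
HYPOTHETICAL_FREE_SET_OPCODE = 0x2A  # placeholder, not Guile's real opcode


def pack_free_set(opcode, dst, src, idx):
    """Pack opcode, u12:dst, u12:src into word 0 and x8, u24:idx into word 1."""
    assert 0 <= opcode < (1 << 8)
    assert 0 <= dst < (1 << 12)
    assert 0 <= src < (1 << 12)
    assert 0 <= idx < (1 << 24)
    # Earlier operands occupy the lower bits of each 32-bit word.
    word0 = opcode | (dst << 8) | (src << 20)
    word1 = idx << 8  # the low 8 bits are the ignored x8 field
    return word0, word1


def unpack_free_set(word0, word1):
    """Recover (opcode, dst, src, idx) from the two packed words."""
    opcode = word0 & 0xFF
    dst = (word0 >> 8) & 0xFFF
    src = (word0 >> 20) & 0xFFF
    idx = (word1 >> 8) & 0xFFFFFF
    return opcode, dst, src, idx


# Operands taken from the (free-set! 2 1 0) line in the foo disassembly.
words = pack_free_set(HYPOTHETICAL_FREE_SET_OPCODE, 2, 1, 0)
print([hex(w) for w in words])
print(unpack_free_set(*words))
```

With the placeholder opcode, the first word comes out as
@code{0x10022a} and the second as zero. Decoding the pair returns the
original operands.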
@menu
* Lexical Environment Instructions::
* Top-Level Environment Instructions::
* Procedure Call and Return Instructions::
* Function Prologue Instructions::
* Trampoline Instructions::
* Branch Instructions::
* Data Constructor Instructions::
* Loading Instructions::
* Dynamic Environment Instructions::
* Miscellaneous Instructions::
* Inlined Scheme Instructions::
* Inlined Mathematical Instructions::
* Inlined Bytevector Instructions::
@end menu

@node Lexical Environment Instructions
@subsubsection Lexical Environment Instructions

These instructions access and mutate the lexical environment of a
compiled procedure---its free and bound variables.

Some of these instructions have @code{long-} variants, the difference
being that they take 16-bit arguments, encoded in big-endian order,
instead of the normal 8-bit range.

@xref{Stack Layout}, for more information on the format of stack frames.

@deffn Instruction local-ref index
@deffnx Instruction long-local-ref index
Push onto the stack the value of the local variable located at
@var{index} within the current stack frame.

Note that arguments and local variables are all in one block. Thus the
first argument, if any, is at index 0, and local bindings follow the
arguments.
@end deffn

@deffn Instruction local-set index
@deffnx Instruction long-local-set index
Pop the Scheme object located on top of the stack and make it the new
value of the local variable located at @var{index} within the current
stack frame.
@end deffn

@deffn Instruction box index
Pop a value off the stack, and set the @var{index}th local variable to a
box containing that value. A shortcut for @code{make-variable} then
@code{local-set}, used when binding boxed variables.
@end deffn

@deffn Instruction empty-box index
Set the @var{index}th local variable to a box containing a variable
whose value is unbound. Used when compiling some @code{letrec}
expressions.
@end deffn

@deffn Instruction local-boxed-ref index
@deffnx Instruction local-boxed-set index
Get or set the value of the variable located at @var{index} within the
current stack frame. A shortcut for @code{local-ref} then
@code{variable-ref} or @code{variable-set}, respectively.
@end deffn

@deffn Instruction free-ref index
Push the value of the captured variable located at position @var{index}
within the program's vector of captured variables.
@end deffn

@deffn Instruction free-boxed-ref index
@deffnx Instruction free-boxed-set index
Get or set a boxed free variable. A shortcut for @code{free-ref} then
@code{variable-ref} or @code{variable-set}, respectively.

Note that there is no @code{free-set} instruction, as variables that are
@code{set!} must be boxed.
@end deffn

@deffn Instruction make-closure num-free-vars
Pop @var{num-free-vars} values and a program object off the stack in
that order, and push a new program object closing over the given free
variables. @var{num-free-vars} is encoded as a two-byte big-endian
value.

The free variables are stored in an array, inline to the new program
object, in the order that they were on the stack (not the order they are
popped off). The new closure shares state with the original program.
At the time of this writing, the space overhead of closures is 3 words,
plus one word for each free variable.
@end deffn

@deffn Instruction fix-closure index
Fix up the free variables array of the closure stored in the
@var{index}th local variable. @var{index} is a two-byte big-endian
integer.

This instruction will pop as many values from the stack as are in the
corresponding closure's free variables array. The topmost value on the
stack will be stored as the closure's last free variable, with other
values filling in free variable slots in order.

@code{fix-closure} is part of a hack for allocating mutually recursive
procedures.
The hack is to store the procedures in their corresponding
local variable slots, with space already allocated for free variables.
Then once they are all in place, this instruction fixes up their
procedures' free variable bindings in place. This allows most
@code{letrec}-bound procedures to be allocated unboxed on the stack.
@end deffn

@deffn Instruction local-bound? index
@deffnx Instruction long-local-bound? index
Push @code{#t} on the stack if the @var{index}th local variable has been
assigned, or @code{#f} otherwise. Mostly useful for handling optional
arguments in procedure prologues.
@end deffn

@node Top-Level Environment Instructions
@subsubsection Top-Level Environment Instructions

These instructions access values in the top-level environment: bindings
that were not lexically apparent at the time that the code in question
was compiled.

The location in which a toplevel binding is stored can be looked up once
and cached for later. The binding itself may change over time, but its
location will stay constant.

Currently only toplevel references within procedures are cached, as only
procedures have a place to cache them, in their object tables.

@deffn Instruction toplevel-ref index
@deffnx Instruction long-toplevel-ref index
Push the value of the toplevel binding whose location is stored at
position @var{index} in the current procedure's object table. The
@code{long-} variant encodes the index over two bytes.

Initially, a cell in a procedure's object table that is used by
@code{toplevel-ref} is initialized to one of two forms. The normal case
is that the cell holds a symbol, whose binding will be looked up
relative to the module that was current when the current program was
created.

Alternately, the lookup may be performed relative to a particular
module, determined at compile-time (e.g.@: via @code{@@} or
@code{@@@@}). In that case, the cell in the object table holds a list:
@code{(@var{modname} @var{sym} @var{public?})}.
The symbol @var{sym} will be looked up in the module named
@var{modname} (a list of symbols). The lookup will be performed against
the module's public interface, unless @var{public?} is @code{#f}, which
it is for example when compiling @code{@@@@}.

In any case, if the symbol is unbound, an error is signalled.
Otherwise the initial form is replaced with the looked-up variable, an
in-place mutation of the object table. This mechanism provides for
lazy variable resolution, and an important cached fast-path once the
variable has been successfully resolved.

This instruction pushes the value of the variable onto the stack.
@end deffn

@deffn Instruction toplevel-set index
@deffnx Instruction long-toplevel-set index
Pop a value off the stack, and set it as the value of the toplevel
variable stored at @var{index} in the object table. If the variable
has not yet been looked up, we do the lookup as in
@code{toplevel-ref}.
@end deffn

@deffn Instruction define
Pop a symbol and a value from the stack, in that order. Look up its
binding in the current toplevel environment, creating the binding if
necessary. Set the variable to the value.
@end deffn

@deffn Instruction link-now
Pop a value, @var{x}, from the stack. Look up the binding for @var{x},
according to the rules for @code{toplevel-ref}, and push that variable
on the stack. If the lookup fails, an error will be signalled.

This instruction is mostly used when loading programs, because it can
do toplevel variable lookups without an object table.
@end deffn

@deffn Instruction variable-ref
Dereference the variable object which is on top of the stack and
replace it by the value of the variable it represents.
@end deffn

@deffn Instruction variable-set
Pop off two objects from the stack, a variable and a value, and set the
variable to the value.
@end deffn

@deffn Instruction variable-bound?
Pop off the variable object from top of the stack and push @code{#t} if
it is bound, or @code{#f} otherwise.
Mostly useful in procedure prologues for defining default values for
boxed optional variables.
@end deffn

@deffn Instruction make-variable
Replace the top object on the stack with a variable containing it.
Used in some circumstances when compiling @code{letrec} expressions.
@end deffn

@node Procedure Call and Return Instructions
@subsubsection Procedure Call and Return Instructions

@c something about the calling convention here?

@deffn Instruction new-frame
Push a new frame on the stack, reserving space for the dynamic link,
return address, and the multiple-values return address. The frame
pointer is not yet updated, because the frame is not yet active -- it
has to be patched by a @code{call} instruction to get the return
address.
@end deffn

@deffn Instruction call nargs
Call the procedure located at @code{sp[-nargs]} with the @var{nargs}
arguments located from @code{sp[-nargs + 1]} to @code{sp[0]}.

This instruction requires that a new frame be pushed on the stack before
the procedure, via @code{new-frame}. @xref{Stack Layout}, for more
information. It patches up that frame with the current @code{ip} as the
return address, then dispatches to the first instruction in the called
procedure, relying on the called procedure to return one value to the
newly-created continuation. Because the new frame pointer will point to
@code{sp[-nargs + 1]}, the arguments don't have to be shuffled around --
they are already in place.
@end deffn

@deffn Instruction tail-call nargs
Transfer control to the procedure located at @code{sp[-nargs]} with the
@var{nargs} arguments located from @code{sp[-nargs + 1]} to
@code{sp[0]}.

Unlike @code{call}, which requires a new frame to be pushed onto the
stack, @code{tail-call} simply shuffles down the procedure and arguments
to the current stack frame. This instruction implements tail calls as
required by RnRS.
@end deffn

@deffn Instruction apply nargs
@deffnx Instruction tail-apply nargs
Like @code{call} and @code{tail-call}, except that the top item on the
stack must be a list. The elements of that list are then pushed on the
stack and treated as additional arguments, replacing the list itself,
then the procedure is invoked as usual.
@end deffn

@deffn Instruction call/nargs
@deffnx Instruction tail-call/nargs
These are like @code{call} and @code{tail-call}, except they take the
number of arguments from the stack instead of the instruction stream.
These instructions are used in the implementation of multiple value
returns, where the actual number of values is pushed on the stack.
@end deffn

@deffn Instruction mv-call nargs offset
Like @code{call}, except that a multiple-value continuation is created
in addition to a single-value continuation.

The offset (a three-byte value) is an offset within the instruction
stream; the multiple-value return address in the new frame
(@pxref{Stack Layout}) will be set to the normal return address plus
this offset. Instructions at that offset will expect the top value of
the stack to be the number of values, and below that the values
themselves, pushed separately.
@end deffn

@deffn Instruction return
Free the program's frame, returning the top value from the stack to the
current continuation. (The stack should have exactly one value on it.)

Specifically, the @code{sp} is decremented to one below the current
@code{fp}, the @code{ip} is reset to the current return address, the
@code{fp} is reset to the value of the current dynamic link, and then
the returned value is pushed on the stack.
@end deffn

@deffn Instruction return/values nvalues
@deffnx Instruction return/nvalues
Return the top @var{nvalues} to the current continuation. In the case
of @code{return/nvalues}, @var{nvalues} itself is first popped from the
top of the stack.
If the current continuation is a multiple-value continuation,
@code{return/values} pushes the number of values on the stack, then
returns as in @code{return}, but to the multiple-value return address.

Otherwise if the current continuation accepts only one value, i.e.@:
the multiple-value return address is @code{NULL}, then we assume the
user only wants one value, and we give them the first one. If there
are no values, an error is signalled.
@end deffn

@deffn Instruction return/values* nvalues
Like a combination of @code{apply} and @code{return/values}, in which
the top value on the stack is interpreted as a list of additional
values. This is an optimization for the common @code{(apply values
...)} case.
@end deffn

@deffn Instruction truncate-values nbinds nrest
Used in multiple-value continuations, this instruction takes the values
that are on the stack (including the number-of-values marker) and
truncates them for a binding construct.

For example, a call to @code{(receive (x y . z) (foo) ...)} would,
logically speaking, pop off the values returned from @code{(foo)} and
push them as three values, corresponding to @code{x}, @code{y}, and
@code{z}. In that case, @var{nbinds} would be 3, and @var{nrest} would
be 1 (to indicate that one of the bindings was a rest argument).

Signals an error if there is an insufficient number of values.
@end deffn

@deffn Instruction call/cc
@deffnx Instruction tail-call/cc
Capture the current continuation, and then call (or tail-call) the
procedure on the top of the stack, with the continuation as the
argument.

@code{call/cc} does not require a @code{new-frame} to be pushed on the
stack, as @code{call} does, because it needs to capture the stack
before the frame is pushed.

Both the VM continuation and the C continuation are captured.
@end deffn

@node Function Prologue Instructions
@subsubsection Function Prologue Instructions

A function call in Guile is very cheap: the VM simply hands control to
the procedure.
The procedure itself is responsible for asserting that it
has been passed an appropriate number of arguments. This strategy
allows arbitrarily complex argument parsing idioms to be developed,
without harming the common case.

For example, only calls to keyword-argument procedures ``pay'' for the
cost of parsing keyword arguments. (At the time of this writing, calling
procedures with keyword arguments is typically two to four times as
costly as calling procedures with a fixed set of arguments.)

@deffn Instruction assert-nargs-ee n
@deffnx Instruction assert-nargs-ge n
Assert that the current procedure has been passed exactly @var{n}
arguments, for the @code{-ee} case, or @var{n} or more arguments, for
the @code{-ge} case. @var{n} is encoded over two bytes.

The number of arguments is determined by subtracting the frame pointer
from the stack pointer (@code{sp - (fp - 1)}). @xref{Stack Layout}, for
more details on stack frames.
@end deffn

@deffn Instruction br-if-nargs-ne n offset
@deffnx Instruction br-if-nargs-gt n offset
@deffnx Instruction br-if-nargs-lt n offset
Jump to @var{offset} if the number of arguments is not equal to, greater
than, or less than @var{n}. @var{n} is encoded over two bytes, and
@var{offset} has the normal three-byte encoding.

These instructions are used to implement multiple arities, as in
@code{case-lambda}. @xref{Case-lambda}, for more information.
@end deffn

@deffn Instruction bind-optionals n
If the procedure has been called with fewer than @var{n} arguments, fill
in the remaining arguments with an unbound value
(@code{SCM_UNDEFINED}). @var{n} is encoded over two bytes.

The optionals can be later initialized conditionally via the
@code{local-bound?} instruction.
@end deffn

@deffn Instruction push-rest n
Pop off excess arguments (more than @var{n}), collecting them into a
list, and push that list. Used to bind a rest argument, if the
procedure has no keyword arguments. Procedures with keyword arguments
use @code{bind-rest} instead.
@end deffn @deffn Instruction bind-rest n idx Pop off excess arguments (more than @var{n}), collecting them into a list. The list is then assigned to the @var{idx}th local variable. @end deffn @deffn Instruction bind-optionals/shuffle nreq nreq-and-opt ntotal @deffnx Instruction bind-optionals/shuffle-or-br nreq nreq-and-opt ntotal offset Shuffle keyword arguments to the top of the stack, filling in the holes with @code{SCM_UNDEFINED}. Each argument is encoded over two bytes. These instructions are used by procedures with keyword arguments. @var{nreq} is the number of required arguments to the procedure, and @var{nreq-and-opt} is the total number of positional arguments (required plus optional). @code{bind-optionals/shuffle} will scan the stack from the @var{nreq}th argument up to the @var{nreq-and-opt}th, and start shuffling when it sees the first keyword argument or runs out of positional arguments. @code{bind-optionals/shuffle-or-br} does the same, except that it checks whether there are too many positional arguments before shuffling. If this is the case, it jumps to @var{offset}, encoded using the normal three-byte encoding. Shuffling simply moves the keyword arguments past the total number of arguments, @var{ntotal}, which includes keyword and rest arguments. The free slots created by the shuffle are filled in with @code{SCM_UNDEFINED}, so they may be conditionally initialized later in the function's prologue. @end deffn @deffn Instruction bind-kwargs idx ntotal flags Parse keyword arguments, assigning their values to the corresponding local variables. The keyword arguments should already have been shuffled above the @var{ntotal}th stack slot by @code{bind-optionals/shuffle}. The parsing is driven by a keyword arguments association list, looked up from the @var{idx}th element of the procedure's object array. The alist is a list of pairs of the form @code{(@var{kw} . @var{index})}, mapping keyword arguments to their local variable indices.
There are two bitflags that affect the parser, @code{allow-other-keys?} (@code{0x1}) and @code{rest?} (@code{0x2}). Unless @code{allow-other-keys?} is set, the parser will signal an error if an unknown key is found. If @code{rest?} is set, errors parsing the keyword arguments will be ignored, as a later @code{bind-rest} instruction will collect all of the tail arguments, including the keywords, into a list. Otherwise, if the keyword arguments are invalid, an error is signaled. @var{idx} and @var{ntotal} are encoded over two bytes each, and @var{flags} is encoded over one byte. @end deffn @deffn Instruction reserve-locals n Resets the stack pointer to have space for @var{n} local variables, including the arguments. If this operation increments the stack pointer, as in a push, the new slots are filled with @code{SCM_UNBOUND}. If this operation decrements the stack pointer, any excess values are dropped. @code{reserve-locals} is typically used after argument parsing to reserve space for local variables. @end deffn @deffn Instruction assert-nargs-ee/locals n @deffnx Instruction assert-nargs-ge/locals n A combination of @code{assert-nargs-ee} (or @code{assert-nargs-ge}) and @code{reserve-locals}. The number of arguments is encoded in the lower three bits of @var{n}, a one-byte value. The number of additional local variables is taken from the upper five bits of @var{n}. @end deffn @node Trampoline Instructions @subsubsection Trampoline Instructions Though most applicable objects in Guile are procedures implemented in bytecode, not all are. There are primitives, continuations, and other procedure-like objects that have their own calling convention. Instead of adding special cases to the @code{call} instruction, Guile wraps these other applicable objects in VM trampoline procedures, then provides special support for these objects in bytecode. Trampoline procedures are typically generated by Guile at runtime, for example in response to a call to @code{scm_c_make_gsubr}.
As such, a compiler probably shouldn't emit code with these instructions. However, it's still interesting to know how these things work, so we document these trampoline instructions here. @deffn Instruction subr-call nargs Pop off a foreign pointer (which should have been pushed on by the trampoline), and call it directly, with the @var{nargs} arguments from the stack. Return the resulting value or values to the calling procedure. @end deffn @deffn Instruction foreign-call nargs Pop off an internal foreign object (which should have been pushed on by the trampoline), and call that foreign function with the @var{nargs} arguments from the stack. Return the resulting value to the calling procedure. @end deffn @deffn Instruction continuation-call Pop off an internal continuation object (which should have been pushed on by the trampoline), and reinstate that continuation. All of the procedure's arguments are passed to the continuation. Does not return. @end deffn @deffn Instruction partial-cont-call Pop off two objects from the stack: the dynamic winds associated with the partial continuation, and the VM continuation object. Unroll the continuation onto the stack, rewinding the dynamic environment and overwriting the current frame, and pass all arguments to the continuation. Control flow proceeds where the continuation was captured. @end deffn @node Branch Instructions @subsubsection Branch Instructions All the conditional branch instructions described below work in the same way: @itemize @item They pop off Scheme object(s) located on the stack for use in the branch condition @item If the condition is true, then the instruction pointer is increased by the offset passed as an argument to the branch instruction; @item Program execution proceeds with the next instruction (that is, the one to which the instruction pointer points). 
@end itemize Note that the offset passed to the instruction is encoded as three 8-bit integers, in big-endian order, effectively giving Guile a 24-bit relative address space. @deffn Instruction br offset Jump to @var{offset}. No values are popped. @end deffn @deffn Instruction br-if offset Jump to @var{offset} if the object on the stack is not false. @end deffn @deffn Instruction br-if-not offset Jump to @var{offset} if the object on the stack is false. @end deffn @deffn Instruction br-if-eq offset Jump to @var{offset} if the two objects located on the stack are equal in the sense of @code{eq?}. Note that, for this instruction, the stack pointer is decremented by two Scheme objects instead of only one. @end deffn @deffn Instruction br-if-not-eq offset Same as @code{br-if-eq} for non-@code{eq?} objects. @end deffn @deffn Instruction br-if-null offset Jump to @var{offset} if the object on the stack is @code{'()}. @end deffn @deffn Instruction br-if-not-null offset Jump to @var{offset} if the object on the stack is not @code{'()}. @end deffn @node Data Constructor Instructions @subsubsection Data Constructor Instructions These instructions push simple immediate values onto the stack, or construct compound data structures from values on the stack. @deffn Instruction make-int8 value Push @var{value}, an 8-bit integer, onto the stack. @end deffn @deffn Instruction make-int8:0 Push the immediate value @code{0} onto the stack. @end deffn @deffn Instruction make-int8:1 Push the immediate value @code{1} onto the stack. @end deffn @deffn Instruction make-int16 value Push @var{value}, a 16-bit integer, onto the stack. @end deffn @deffn Instruction make-uint64 value Push @var{value}, an unsigned 64-bit integer, onto the stack. The value is encoded in 8 bytes, most significant byte first (big-endian). @end deffn @deffn Instruction make-int64 value Push @var{value}, a signed 64-bit integer, onto the stack. 
The value is encoded in 8 bytes, most significant byte first (big-endian), in two's-complement arithmetic. @end deffn @deffn Instruction make-false Push @code{#f} onto the stack. @end deffn @deffn Instruction make-true Push @code{#t} onto the stack. @end deffn @deffn Instruction make-nil Push @code{#nil} onto the stack. @end deffn @deffn Instruction make-eol Push @code{'()} onto the stack. @end deffn @deffn Instruction make-char8 value Push @var{value}, an 8-bit character, onto the stack. @end deffn @deffn Instruction make-char32 value Push @var{value}, a 32-bit character, onto the stack. The value is encoded in big-endian order. @end deffn @deffn Instruction make-symbol Pops a string off the stack, and pushes a symbol. @end deffn @deffn Instruction make-keyword value Pops a symbol off the stack, and pushes a keyword. @end deffn @deffn Instruction list n Pops the top @var{n} values off the stack, consing them up into a list, then pushes that list on the stack. What was the topmost value will be the last element in the list. @var{n} is a two-byte value, most significant byte first. @end deffn @deffn Instruction vector n Create and fill a vector with the top @var{n} values from the stack, popping off those values and pushing on the resulting vector. @var{n} is a two-byte value, like in @code{list}. @end deffn @deffn Instruction make-struct n Make a new struct from the top @var{n} values on the stack. The values are popped, and the new struct is pushed. The deepest value is used as the vtable for the struct, and the rest are used in order as the field initializers. Tail arrays are not supported by this instruction. @end deffn @deffn Instruction make-array n Pop an array shape from the stack, then pop the remaining @var{n} values, pushing a new array. @var{n} is encoded over three bytes. The array shape should be appropriate to store @var{n} values. @xref{Array Procedures}, for more information on array shapes.
@end deffn Many of these data structures are constant, never changing over the course of the different invocations of the procedure. In that case it is often advantageous to make them once when the procedure is created, and just reference them from the object table thereafter. @xref{Variables and the VM}, for more information on the object table. @deffn Instruction object-ref n @deffnx Instruction long-object-ref n Push the @var{n}th value from the current program's object vector. The ``long'' variant has a 16-bit index instead of an 8-bit index. @end deffn @node Loading Instructions @subsubsection Loading Instructions In addition to VM instructions, an instruction stream may contain variable-length data embedded within it. This data is always preceded by special loading instructions, which interpret the data and advance the instruction pointer to the next VM instruction. All of these loading instructions have a @code{length} parameter, indicating the size of the embedded data, in bytes. The length itself is encoded in 3 bytes. @deffn Instruction load-number length Load an arbitrary number from the instruction stream. The number is embedded in the stream as a string. @end deffn @deffn Instruction load-string length Load a string from the instruction stream. The string is assumed to be encoded in the ``latin1'' locale. @end deffn @deffn Instruction load-wide-string length Load a UTF-32 string from the instruction stream. @var{length} is the length in bytes, not in codepoints. @end deffn @deffn Instruction load-symbol length Load a symbol from the instruction stream. The symbol is assumed to be encoded in the ``latin1'' locale. Symbols backed by wide strings may be loaded via @code{load-wide-string} then @code{make-symbol}. @end deffn @deffn Instruction load-array length Load a uniform array from the instruction stream. The shape and type of the array are popped off the stack, in that order.
@end deffn @deffn Instruction load-program Load bytecode from the instruction stream, and push a compiled procedure. This instruction pops one value from the stack: the program's object table, as a vector, or @code{#f} in the case that the program has no object table. A program that does not reference toplevel bindings and does not use @code{object-ref} does not need an object table. This instruction is unlike the rest of the loading instructions, because instead of parsing its data, it directly maps the instruction stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode and Objcode}, for more information. The resulting compiled procedure will not have any free variables captured, so it may be loaded only once but used many times to create closures. @end deffn @node Dynamic Environment Instructions @subsubsection Dynamic Environment Instructions Guile's virtual machine has low-level support for @code{dynamic-wind}, dynamic binding, and composable prompts and aborts. @deffn Instruction wind Pop an unwind thunk and a wind thunk from the stack, in that order, and push them onto the ``dynamic stack''. The unwind thunk will be called on nonlocal exits, and the wind thunk on reentries. Used to implement @code{dynamic-wind}. Note that neither thunk is actually called; the compiler should emit calls to wind and unwind for the normal dynamic-wind control flow. @xref{Dynamic Wind}. @end deffn @deffn Instruction unwind Pop off the top entry from the ``dynamic stack'', for example, a wind/unwind thunk pair. @code{unwind} instructions should be properly paired with their winding instructions, like @code{wind}. @end deffn @deffn Instruction push-fluid Pop a value and a fluid from the stack, in that order. Set the fluid to the value by creating a with-fluids object and pushing that object on the dynamic stack. @xref{Fluids and Dynamic States}. 
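
At the Scheme level, this kind of dynamic binding is what @code{with-fluids} provides. The following is a minimal sketch of the observable behavior only; the fluid name @code{depth} is invented for the illustration, and the instructions that the compiler emits for this code are not shown.

@example
;; `depth' is a hypothetical fluid used only for this illustration.
(define depth (make-fluid))
(fluid-set! depth 0)

(fluid-ref depth)              ; the outer value, 0

(with-fluids ((depth 1))
  (fluid-ref depth))           ; the value bound inside the extent, 1

(fluid-ref depth)              ; the outer value is visible again, 0
@end example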
@end deffn @deffn Instruction pop-fluid Pop a with-fluids object from the dynamic stack, and swap the current values of its fluids with the saved values of its fluids. In this way, the dynamic environment is left as it was before the corresponding @code{push-fluid} instruction was processed. @end deffn @deffn Instruction fluid-ref Pop a fluid from the stack, and push its current value. @end deffn @deffn Instruction fluid-set Pop a value and a fluid from the stack, in that order, and set the fluid to the value. @end deffn @deffn Instruction prompt escape-only? offset Establish a dynamic prompt. @xref{Prompts}, for more information on prompts. The prompt will be pushed on the dynamic stack. The normal control flow should ensure that the prompt is popped off at the end, via @code{unwind}. If an abort is made to this prompt, control will jump to @var{offset}, a three-byte relative address. The continuation and all arguments to the abort will be pushed on the stack, along with the total number of arguments (including the continuation). If control returns to the handler, the prompt is already popped off by the abort mechanism. (Guile's @code{prompt} implements Felleisen's @dfn{--F--} operator.) If @var{escape-only?} is nonzero, the prompt will be marked as escape-only, which allows an abort to this prompt to avoid reifying the continuation. @end deffn @deffn Instruction abort n Abort to a dynamic prompt. This instruction pops one tail argument list, @var{n} arguments, and a prompt tag from the stack. The dynamic environment is then searched for a prompt having the given tag. If none is found, an error is signaled. Otherwise, all arguments are passed to the prompt's handler, along with the captured continuation, if necessary. If the prompt's handler can be proven to not reference the captured continuation, no continuation is allocated. This decision happens dynamically, at run-time; the general case is that the continuation may be captured, and thus resumed.
A reinstated continuation will have its arguments pushed on the stack, along with the number of arguments, as in the multiple-value return convention. Therefore an @code{abort} instruction should be followed by code ready to handle the equivalent of a multiply-valued return. @end deffn @node Miscellaneous Instructions @subsubsection Miscellaneous Instructions @deffn Instruction nop Does nothing! Used for padding other instructions to certain alignments. @end deffn @deffn Instruction halt Exits the VM, returning a SCM value. Normally, this instruction is only part of the ``bootstrap program'', a program run when a virtual machine is first entered; compiled Scheme procedures will not contain this instruction. If multiple values have been returned, the SCM value will be a multiple-values object (@pxref{Multiple Values}). @end deffn @deffn Instruction break Does nothing, but invokes the break hook. @end deffn @deffn Instruction drop Pops off the top value from the stack, throwing it away. @end deffn @deffn Instruction dup Re-pushes the top value onto the stack. @end deffn @deffn Instruction void Pushes ``the unspecified value'' onto the stack. @end deffn @node Inlined Scheme Instructions @subsubsection Inlined Scheme Instructions The Scheme compiler can recognize the application of standard Scheme procedures. It tries to inline these small operations to avoid the overhead of creating new stack frames. Since most of these operations are historically implemented as C primitives, not inlining them would entail constantly calling out from the VM to the interpreter, which has some costs---registers must be saved, the interpreter has to dispatch, called procedures have to do much type checking, etc. It's much more efficient to inline these operations in the virtual machine itself. All of these instructions pop their arguments from the stack and push their results, and take no parameters from the instruction stream. 
Thus, unlike in the previous sections, these instruction definitions show stack parameters instead of parameters from the instruction stream. @deffn Instruction not x @deffnx Instruction not-not x @deffnx Instruction eq? x y @deffnx Instruction not-eq? x y @deffnx Instruction null? x @deffnx Instruction not-null? x @deffnx Instruction eqv? x y @deffnx Instruction equal? x y @deffnx Instruction pair? x @deffnx Instruction list? x @deffnx Instruction set-car! pair x @deffnx Instruction set-cdr! pair x @deffnx Instruction cons x y @deffnx Instruction car x @deffnx Instruction cdr x @deffnx Instruction vector-ref x y @deffnx Instruction vector-set x n y @deffnx Instruction struct? x @deffnx Instruction struct-ref x n @deffnx Instruction struct-set x n v @deffnx Instruction struct-vtable x @deffnx Instruction class-of x @deffnx Instruction slot-ref struct n @deffnx Instruction slot-set struct n x Inlined implementations of their Scheme equivalents. @end deffn Note that @code{caddr} and friends compile to a series of @code{car} and @code{cdr} instructions. @node Inlined Mathematical Instructions @subsubsection Inlined Mathematical Instructions Inlining mathematical operations has the obvious advantage of handling fixnums without function calls or allocations. The trick, of course, is knowing when the result of an operation will be a fixnum, and there might be a couple of bugs here. More instructions could be added here over time. As in the previous section, the definitions below show stack parameters instead of instruction stream parameters. @deffn Instruction add x y @deffnx Instruction add1 x @deffnx Instruction sub x y @deffnx Instruction sub1 x @deffnx Instruction mul x y @deffnx Instruction div x y @deffnx Instruction quo x y @deffnx Instruction rem x y @deffnx Instruction mod x y @deffnx Instruction ee? x y @deffnx Instruction lt? x y @deffnx Instruction gt? x y @deffnx Instruction le? x y @deffnx Instruction ge?
x y @deffnx Instruction ash x n @deffnx Instruction logand x y @deffnx Instruction logior x y @deffnx Instruction logxor x y Inlined implementations of the corresponding mathematical operations. @end deffn @node Inlined Bytevector Instructions @subsubsection Inlined Bytevector Instructions Bytevector operations correspond closely to what the current hardware can do, so it makes sense to inline them to VM instructions, providing a clear path for eventual native compilation. Without this, Scheme programs would need other primitives for accessing raw bytes, but these primitives are as good as any. As in the previous section, the definitions below show stack parameters instead of instruction stream parameters. The multibyte formats (@code{u16}, @code{f64}, etc.@:) take an extra endianness argument, except in their @code{-native} variants. Only aligned native accesses are currently fast-pathed in Guile's VM. @deffn Instruction bv-u8-ref bv n @deffnx Instruction bv-s8-ref bv n @deffnx Instruction bv-u16-native-ref bv n @deffnx Instruction bv-s16-native-ref bv n @deffnx Instruction bv-u32-native-ref bv n @deffnx Instruction bv-s32-native-ref bv n @deffnx Instruction bv-u64-native-ref bv n @deffnx Instruction bv-s64-native-ref bv n @deffnx Instruction bv-f32-native-ref bv n @deffnx Instruction bv-f64-native-ref bv n @deffnx Instruction bv-u16-ref bv n endianness @deffnx Instruction bv-s16-ref bv n endianness @deffnx Instruction bv-u32-ref bv n endianness @deffnx Instruction bv-s32-ref bv n endianness @deffnx Instruction bv-u64-ref bv n endianness @deffnx Instruction bv-s64-ref bv n endianness @deffnx Instruction bv-f32-ref bv n endianness @deffnx Instruction bv-f64-ref bv n endianness @deffnx Instruction bv-u8-set bv n val @deffnx Instruction bv-s8-set bv n val @deffnx Instruction bv-u16-native-set bv n val @deffnx Instruction bv-s16-native-set bv n val @deffnx Instruction bv-u32-native-set bv n val @deffnx Instruction bv-s32-native-set bv n val @deffnx Instruction bv-u64-native-set bv n val @deffnx Instruction
bv-s64-native-set bv n val @deffnx Instruction bv-f32-native-set bv n val @deffnx Instruction bv-f64-native-set bv n val @deffnx Instruction bv-u16-set bv n val endianness @deffnx Instruction bv-s16-set bv n val endianness @deffnx Instruction bv-u32-set bv n val endianness @deffnx Instruction bv-s32-set bv n val endianness @deffnx Instruction bv-u64-set bv n val endianness @deffnx Instruction bv-s64-set bv n val endianness @deffnx Instruction bv-f32-set bv n val endianness @deffnx Instruction bv-f64-set bv n val endianness Inlined implementations of the corresponding bytevector operations. @end deffn
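
The instructions above mirror the bytevector procedures available to Scheme code. The following is a hedged sketch of the difference between the explicit-endianness and native variants, written against the @code{(rnrs bytevectors)} module; the variable @code{bv} is made up for the example, and nothing here shows which instructions the compiler selects.

@example
(use-modules (rnrs bytevectors))

;; A four-byte bytevector, initially all zeros.
(define bv (make-bytevector 4 0))

;; Store #x1234 as a big-endian u16 at offset 0.
(bytevector-u16-set! bv 0 #x1234 (endianness big))

(bytevector-u8-ref bv 0)                      ; #x12, the high byte
(bytevector-u8-ref bv 1)                      ; #x34, the low byte
(bytevector-u16-ref bv 0 (endianness big))    ; #x1234
(bytevector-u16-ref bv 0 (endianness little)) ; #x3412

;; The native variant takes no endianness argument; its result
;; depends on the byte order of the host machine.
(bytevector-u16-native-ref bv 0)
@end example

Reading the same two bytes with different endianness arguments gives different integers, which is why the non-native multibyte instructions carry the extra stack parameter.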