mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-06-12 23:00:22 +02:00
update docs -- sections on assembly and objcode
* doc/ref/api-procedures.texi: * doc/ref/compiler.texi: * doc/ref/vm.texi: Update the docs some more.
parent
81fd315299
commit
7364333952
3 changed files with 160 additions and 38 deletions
|
@ -164,8 +164,8 @@ Returns @code{#t} iff @var{obj} is a compiled procedure.
|
||||||
|
|
||||||
@deffn {Scheme Procedure} program-objcode program
|
@deffn {Scheme Procedure} program-objcode program
|
||||||
@deffnx {C Function} scm_program_objcode (program)
|
@deffnx {C Function} scm_program_objcode (program)
|
||||||
Returns the object code associated with this program. @xref{Object
|
Returns the object code associated with this program. @xref{Bytecode
|
||||||
Code}, for more information.
|
and Objcode}, for more information.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@deffn {Scheme Procedure} program-objects program
|
@deffn {Scheme Procedure} program-objects program
|
||||||
|
|
|
@ -25,8 +25,7 @@ know how to compile your .scm file.
|
||||||
* Tree-IL::
|
* Tree-IL::
|
||||||
* GLIL::
|
* GLIL::
|
||||||
* Assembly::
|
* Assembly::
|
||||||
* Bytecode::
|
* Bytecode and Objcode::
|
||||||
* Object Code::
|
|
||||||
* Extending the Compiler::
|
* Extending the Compiler::
|
||||||
@end menu
|
@end menu
|
||||||
|
|
||||||
|
@ -132,13 +131,13 @@ The normal tower of languages when compiling Scheme goes like this:
|
||||||
@item Guile Low Intermediate Language (GLIL)
|
@item Guile Low Intermediate Language (GLIL)
|
||||||
@item Assembly
|
@item Assembly
|
||||||
@item Bytecode
|
@item Bytecode
|
||||||
@item Object code
|
@item Objcode
|
||||||
@end itemize
|
@end itemize
|
||||||
|
|
||||||
Object code may be serialized to disk directly, though it has a cookie
|
Object code may be serialized to disk directly, though it has a cookie
|
||||||
and version prepended to the front. But when compiling Scheme at
|
and version prepended to the front. But when compiling Scheme at run
|
||||||
run time, you want a Scheme value, e.g. a compiled procedure. For this
|
time, you want a Scheme value: for example, a compiled procedure. For
|
||||||
reason, so as not to break the abstraction, Guile defines a fake
|
this reason, so as not to break the abstraction, Guile defines a fake
|
||||||
language at the bottom of the tower:
|
language at the bottom of the tower:
|
||||||
|
|
||||||
@itemize
|
@itemize
|
||||||
|
@ -421,8 +420,8 @@ A unit of code that at run-time will correspond to a compiled
|
||||||
procedure. @var{nargs}, @var{nrest}, @var{nlocs}, and @var{nexts}
|
procedure. @var{nargs}, @var{nrest}, @var{nlocs}, and @var{nexts}
|
||||||
collectively define the program's arity; see @ref{Compiled
|
collectively define the program's arity; see @ref{Compiled
|
||||||
Procedures}, for more information. @var{meta} should be an alist of
|
Procedures}, for more information. @var{meta} should be an alist of
|
||||||
properties, as in @code{<ghil-lambda>}. @var{body} is a list of GLIL
|
properties, as in Tree IL's @code{<lambda>}. @var{body} is a list of
|
||||||
expressions.
|
GLIL expressions.
|
||||||
@end deftp
|
@end deftp
|
||||||
@deftp {Scheme Variable} <glil-bind> . vars
|
@deftp {Scheme Variable} <glil-bind> . vars
|
||||||
An advisory expression that notes a liveness extent for a set of
|
An advisory expression that notes a liveness extent for a set of
|
||||||
|
@ -456,18 +455,20 @@ offset within a VM program.
|
||||||
@end deftp
|
@end deftp
|
||||||
@deftp {Scheme Variable} <glil-source> loc
|
@deftp {Scheme Variable} <glil-source> loc
|
||||||
Records source information for the preceding expression. @var{loc}
|
Records source information for the preceding expression. @var{loc}
|
||||||
should be a vector, @code{#(@var{line} @var{column} @var{filename})}.
|
should be an association list containing @code{line}, @code{column},
|
||||||
|
and @code{filename} keys, e.g. as returned by
|
||||||
|
@code{source-properties}.
|
||||||
@end deftp
|
@end deftp
|
||||||
@deftp {Scheme Variable} <glil-void>
|
@deftp {Scheme Variable} <glil-void>
|
||||||
Pushes the unspecified value on the stack.
|
Pushes the unspecified value on the stack.
|
||||||
@end deftp
|
@end deftp
|
||||||
@deftp {Scheme Variable} <glil-const> obj
|
@deftp {Scheme Variable} <glil-const> obj
|
||||||
Pushes a constant value onto the stack. @var{obj} must be a number,
|
Pushes a constant value onto the stack. @var{obj} must be a number,
|
||||||
string, symbol, keyword, boolean, character, or a pair or vector or
|
string, symbol, keyword, boolean, character, the empty list, or a pair
|
||||||
list thereof, or the empty list.
|
or vector of constants.
|
||||||
@end deftp
|
@end deftp
|
||||||
@deftp {Scheme Variable} <glil-local> op index
|
@deftp {Scheme Variable} <glil-local> op index
|
||||||
Accesses a lexically variable from the stack. If @var{op} is
|
Accesses a lexically bound variable from the stack. If @var{op} is
|
||||||
@code{ref}, the value is pushed onto the stack; if it is @code{set},
|
@code{ref}, the value is pushed onto the stack; if it is @code{set},
|
||||||
the variable is set from the top value on the stack, which is popped
|
the variable is set from the top value on the stack, which is popped
|
||||||
off. @xref{Stack Layout}, for more information.
|
off. @xref{Stack Layout}, for more information.
|
||||||
|
@ -482,8 +483,8 @@ Accesses a toplevel variable. @var{op} may be @code{ref}, @code{set},
|
||||||
or @code{define}.
|
or @code{define}.
|
||||||
@end deftp
|
@end deftp
|
||||||
@deftp {Scheme Variable} <glil-module> op mod name public?
|
@deftp {Scheme Variable} <glil-module> op mod name public?
|
||||||
Accesses a variable within a specific module. See
|
Accesses a variable within a specific module. See Tree-IL's
|
||||||
@code{ghil-var-at-module!}, for more information.
|
@code{<module-ref>}, for more information.
|
||||||
@end deftp
|
@end deftp
|
||||||
@deftp {Scheme Variable} <glil-label> label
|
@deftp {Scheme Variable} <glil-label> label
|
||||||
Creates a new label. @var{label} can be any Scheme value, and should
|
Creates a new label. @var{label} can be any Scheme value, and should
|
||||||
|
@ -529,26 +530,140 @@ the object code.
|
||||||
@node Assembly
|
@node Assembly
|
||||||
@subsection Assembly
|
@subsection Assembly
|
||||||
|
|
||||||
@node Bytecode
|
Assembly is an S-expression-based, human-readable representation of
|
||||||
@subsection Bytecode
|
the actual bytecodes that will be emitted for the VM. As such, it is a
|
||||||
|
useful intermediate language both for compilation and for
|
||||||
|
decompilation.
|
||||||
|
|
||||||
@node Object Code
|
Besides the fact that it is not a record-based language, assembly
|
||||||
@subsection Object Code
|
differs from GLIL in four main ways:
|
||||||
|
|
||||||
Object code is the serialization of the raw instruction stream of a
|
@itemize
|
||||||
program, ready for interpretation by the VM. Procedures related to
|
@item Labels have been resolved to byte offsets in the program.
|
||||||
object code are defined in the @code{(system vm objcode)} module.
|
@item Constants inside procedures have either been expressed as inline
|
||||||
|
instructions or cached in object arrays.
|
||||||
|
@item Procedures with metadata (source location information, liveness
|
||||||
|
extents, procedure names, generic properties, etc.) have had their
|
||||||
|
metadata serialized out to thunks.
|
||||||
|
@item All expressions correspond directly to VM instructions -- i.e.,
|
||||||
|
there is no @code{<glil-local>} which can be a ref or a set.
|
||||||
|
@end itemize
|
||||||
|
|
||||||
|
Assembly is isomorphic to the bytecode that it compiles to. You can
|
||||||
|
compile to bytecode, then decompile back to assembly, and you have the
|
||||||
|
same assembly code.
|
||||||
|
|
||||||
|
The general form of assembly instructions is the following:
|
||||||
|
|
||||||
|
@lisp
|
||||||
|
(@var{inst} @var{arg} ...)
|
||||||
|
@end lisp
|
||||||
|
|
||||||
|
The @var{inst} names a VM instruction, and its @var{arg}s will be
|
||||||
|
embedded in the instruction stream. The easiest way to see assembly is
|
||||||
|
to play around with it at the REPL, as can be seen in this annotated
|
||||||
|
example:
|
||||||
|
|
||||||
|
@example
|
||||||
|
scheme@@(guile-user)> (compile '(lambda (x) (+ x x)) #:to 'assembly)
|
||||||
|
(load-program 0 0 0 0
|
||||||
|
() ; Labels
|
||||||
|
60 ; Length
|
||||||
|
#f ; Metadata
|
||||||
|
(make-false) ; object table for the returned lambda
|
||||||
|
(nop)
|
||||||
|
(nop) ; Alignment. Since assembly has already resolved its labels
|
||||||
|
(nop) ; to offsets, and programs must be 8-byte aligned since their
|
||||||
|
(nop) ; object code is mmap'd directly to structures, assembly
|
||||||
|
(nop) ; has to have the alignment embedded in it.
|
||||||
|
(nop)
|
||||||
|
(load-program 1 0 0 0
|
||||||
|
()
|
||||||
|
6
|
||||||
|
; This is the metadata thunk for the returned procedure.
|
||||||
|
(load-program 0 0 0 0 () 21 #f
|
||||||
|
(load-symbol "x") ; Name and liveness extent for @code{x}.
|
||||||
|
(make-false)
|
||||||
|
(make-int8:0) ; Some instruction+arg combinations
|
||||||
|
(make-int8:0) ; have abbreviations.
|
||||||
|
(make-int8 6)
|
||||||
|
(list 0 5)
|
||||||
|
(list 0 1)
|
||||||
|
(make-eol)
|
||||||
|
(list 0 2)
|
||||||
|
(return))
|
||||||
|
; And here, the actual code.
|
||||||
|
(local-ref 0)
|
||||||
|
(local-ref 0)
|
||||||
|
(add)
|
||||||
|
(return))
|
||||||
|
; Return our new procedure.
|
||||||
|
(return))
|
||||||
|
@end example
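The alignment padding in the listing above can be sketched in C. This is a minimal illustration, not Guile's implementation: the function name is hypothetical, and it assumes that each instruction opcode (including @code{load-program} and @code{nop}) occupies one byte, which is consistent with the length of 60 reported in the listing.

```c
#include <stddef.h>

/* Hypothetical helper: number of one-byte nop instructions to emit at
   byte offset `offset` so that the program header, which is assumed to
   follow a one-byte load-program opcode, starts on an 8-byte boundary. */
static size_t nops_before_load_program(size_t offset)
{
    const size_t alignment = 8;   /* required alignment of the header */
    const size_t opcode_len = 1;  /* assumed size of the load-program opcode */
    size_t header_start = offset + opcode_len;

    return (alignment - header_start % alignment) % alignment;
}
```

In the listing, the outer program has emitted only @code{(make-false)} (one byte) before the padding, so the helper is called with an offset of 1 and yields the six @code{nop}s shown.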
|
||||||
|
|
||||||
|
Of course you can switch the REPL to assembly and enter in assembly
|
||||||
|
S-expressions directly, like with other languages, though it is more
|
||||||
|
difficult, given that the length fields have to be correct.
|
||||||
|
|
||||||
|
@node Bytecode and Objcode
|
||||||
|
@subsection Bytecode and Objcode
|
||||||
|
|
||||||
|
Finally, the raw bytes. There are actually two different ``languages''
|
||||||
|
here, corresponding to two different ways to represent the bytes.
|
||||||
|
|
||||||
|
``Bytecode'' represents code as uniform byte vectors, useful for
|
||||||
|
structuring and destructuring code on the Scheme level. Bytecode is
|
||||||
|
the next step down from assembly:
|
||||||
|
|
||||||
|
@example
|
||||||
|
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'assembly)
|
||||||
|
@result{} (load-program 0 0 0 0 () 6 #f
|
||||||
|
(make-int8 32) (make-int8 10) (add) (return))
|
||||||
|
scheme@@(guile-user)> (compile '(+ 32 10) #:to 'bytecode)
|
||||||
|
@result{} #u8(0 0 0 0 6 0 0 0 0 0 0 0 10 32 10 10 100 48)
|
||||||
|
@end example
|
||||||
|
|
||||||
|
``Objcode'' is bytecode, but mapped directly to a C structure,
|
||||||
|
@code{struct scm_objcode}:
|
||||||
|
|
||||||
|
@example
|
||||||
|
struct scm_objcode @{
|
||||||
|
scm_t_uint8 nargs;
|
||||||
|
scm_t_uint8 nrest;
|
||||||
|
scm_t_uint8 nlocs;
|
||||||
|
scm_t_uint8 nexts;
|
||||||
|
scm_t_uint32 len;
|
||||||
|
scm_t_uint32 metalen;
|
||||||
|
scm_t_uint8 base[0];
|
||||||
|
@};
|
||||||
|
@end example
|
||||||
|
|
||||||
|
As one might imagine, objcode imposes a minimum length on the
|
||||||
|
bytecode. Also, the multibyte fields are in native endianness, which
|
||||||
|
makes objcode (and bytecode) system-dependent. Indeed, in the short
|
||||||
|
example above, all but the last 6 bytes were the program's header.
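As a rough illustration of the header layout, the following C sketch splits a bytecode vector into the fixed fields of @code{struct scm_objcode}. The type and function names are hypothetical and are not part of Guile's API; the multibyte fields are read in native byte order, as described above, so the same bytes decode differently on a machine of the opposite endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative copy of the fixed-size fields of struct scm_objcode. */
struct objcode_header {
    uint8_t nargs;
    uint8_t nrest;
    uint8_t nlocs;
    uint8_t nexts;
    uint32_t len;      /* length of the instruction stream, in bytes */
    uint32_t metalen;  /* length of the metadata that follows it */
};

/* Four one-byte fields plus two 32-bit fields. */
enum { OBJCODE_HEADER_SIZE = 12 };

/* Hypothetical helper: decode the header at the start of `buf`.
   Returns 0 on success, or -1 if the buffer is shorter than a header
   or its size disagrees with the len and metalen fields. */
static int parse_objcode_header(const uint8_t *buf, size_t size,
                                struct objcode_header *out)
{
    assert(buf != NULL && out != NULL);

    if (size < OBJCODE_HEADER_SIZE)
        return -1;

    out->nargs = buf[0];
    out->nrest = buf[1];
    out->nlocs = buf[2];
    out->nexts = buf[3];
    /* memcpy reads the fields in native byte order without
       requiring `buf` to be aligned. */
    memcpy(&out->len, buf + 4, sizeof out->len);
    memcpy(&out->metalen, buf + 8, sizeof out->metalen);

    if ((uint64_t) out->len + out->metalen
        != (uint64_t) (size - OBJCODE_HEADER_SIZE))
        return -1;

    return 0;
}
```

Applied to the 18 bytes of the @code{(+ 32 10)} example on a little-endian machine, this gives a @code{len} of 6 and a @code{metalen} of 0: a 12-byte header followed by the instructions themselves.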
|
||||||
|
|
||||||
|
Objcode also has a couple of important efficiency hacks. First,
|
||||||
|
objcode may be mapped directly from disk, allowing compiled code to be
|
||||||
|
loaded quickly, often from the system's disk cache, and shared among
|
||||||
|
multiple processes. Secondly, objcode may be embedded in other
|
||||||
|
objcode, allowing procedures to have the text of other procedures
|
||||||
|
inlined into their bodies, without the need for separate allocation of
|
||||||
|
the code. Of course, the objcode object itself does need to be
|
||||||
|
allocated.
|
||||||
|
|
||||||
|
Procedures related to objcode are defined in the @code{(system vm
|
||||||
|
objcode)} module.
|
||||||
|
|
||||||
@deffn {Scheme Procedure} objcode? obj
|
@deffn {Scheme Procedure} objcode? obj
|
||||||
@deffnx {C Function} scm_objcode_p (obj)
|
@deffnx {C Function} scm_objcode_p (obj)
|
||||||
Returns @code{#t} iff @var{obj} is object code, @code{#f} otherwise.
|
Returns @code{#t} iff @var{obj} is object code, @code{#f} otherwise.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@deffn {Scheme Procedure} bytecode->objcode bytecode nlocs nexts
|
@deffn {Scheme Procedure} bytecode->objcode bytecode
|
||||||
@deffnx {C Function} scm_bytecode_to_objcode (bytecode, nlocs, nexts)
|
@deffnx {C Function} scm_bytecode_to_objcode (bytecode)
|
||||||
Makes an objcode object from @var{bytecode}, which should be a
|
Makes an objcode object from @var{bytecode}, which should be a
|
||||||
@code{u8vector}. @var{nlocs} and @var{nexts} denote the number of
|
@code{u8vector}.
|
||||||
stack and heap variables to reserve when this objcode is executed.
|
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@deffn {Scheme Variable} load-objcode file
|
@deffn {Scheme Variable} load-objcode file
|
||||||
|
@ -556,21 +671,28 @@ stack and heap variables to reserve when this objcode is executed.
|
||||||
Load object code from a file named @var{file}. The file will be mapped
|
Load object code from a file named @var{file}. The file will be mapped
|
||||||
into memory via @code{mmap}, so this is a very fast operation.
|
into memory via @code{mmap}, so this is a very fast operation.
|
||||||
|
|
||||||
On disk, object code has an eight-byte cookie prepended to it, so that
|
On disk, object code has an eight-byte cookie prepended to it, to
|
||||||
we will not execute arbitrary garbage. In addition, two more bytes are
|
prevent accidental loading of arbitrary garbage.
|
||||||
reserved for @var{nlocs} and @var{nexts}.
|
@end deffn
|
||||||
|
|
||||||
|
@deffn {Scheme Variable} write-objcode objcode file
|
||||||
|
@deffnx {C Function} scm_write_objcode (objcode, file)
|
||||||
|
Write object code out to a file, prepending the eight-byte cookie.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@deffn {Scheme Variable} objcode->u8vector objcode
|
@deffn {Scheme Variable} objcode->u8vector objcode
|
||||||
@deffnx {C Function} scm_objcode_to_u8vector (objcode)
|
@deffnx {C Function} scm_objcode_to_u8vector (objcode)
|
||||||
Copy object code out to a @code{u8vector} for analysis by Scheme. The
|
Copy object code out to a @code{u8vector} for analysis by Scheme.
|
||||||
ten-byte header is included.
|
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@deffn {Scheme Variable} objcode->program objcode [external='()]
|
The following procedure is actually in @code{(system vm program)}, but
|
||||||
@deffnx {C Function} scm_objcode_to_program (objcode, external)
|
we'll mention it here:
|
||||||
|
|
||||||
|
@deffn {Scheme Variable} make-program objcode objtable [external='()]
|
||||||
|
@deffnx {C Function} scm_make_program (objcode, objtable, external)
|
||||||
Load up object code into a Scheme program. The resulting program will
|
Load up object code into a Scheme program. The resulting program will
|
||||||
be a thunk that captures closure variables from @var{external}.
|
have @var{objtable} as its object table, which should be a vector or
|
||||||
|
@code{#f}, and will capture the closure variables from @var{external}.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
Object code from a file may be disassembled at the REPL via the
|
Object code from a file may be disassembled at the REPL via the
|
||||||
|
@ -614,7 +736,7 @@ fruit, running programs of interest under a system-level profiler and
|
||||||
determining which improvements would give the most bang for the buck.
|
determining which improvements would give the most bang for the buck.
|
||||||
There are many well-known efficiency hacks in the literature: Dybvig's
|
There are many well-known efficiency hacks in the literature: Dybvig's
|
||||||
letrec optimization, individual boxing of heap-allocated values (and
|
letrec optimization, individual boxing of heap-allocated values (and
|
||||||
then store the boxes on the stack directory), optimized case-lambda
|
then storing the boxes on the stack directly), optimized case-lambda
|
||||||
expressions, stack underflow and overflow handlers, etc. Highly
|
expressions, stack underflow and overflow handlers, etc. Highly
|
||||||
recommended papers: Dybvig's HOCS, Ghuloum's compiler paper.
|
recommended papers: Dybvig's HOCS, Ghuloum's compiler paper.
|
||||||
|
|
||||||
|
|
|
@ -574,8 +574,8 @@ does not use @code{object-ref} does not need an object table.
|
||||||
|
|
||||||
This instruction is unlike the rest of the loading instructions,
|
This instruction is unlike the rest of the loading instructions,
|
||||||
because instead of parsing its data, it directly maps the instruction
|
because instead of parsing its data, it directly maps the instruction
|
||||||
stream onto a C structure, @code{struct scm_objcode}. @xref{Object
|
stream onto a C structure, @code{struct scm_objcode}. @xref{Bytecode
|
||||||
Code}, for more information.
|
and Objcode}, for more information.
|
||||||
|
|
||||||
The resulting compiled procedure will not have any ``external''
|
The resulting compiled procedure will not have any ``external''
|
||||||
variables captured, so it may be loaded only once but used many times
|
variables captured, so it may be loaded only once but used many times
|
||||||
|
|