mirror of https://git.savannah.gnu.org/git/guile.git
synced 2025-04-30 11:50:28 +02:00
fix some xrefs, flesh out compiler.texi a bit more
* doc/ref/api-debug.texi:
* doc/ref/vm.texi: Fix some cross-references.
* doc/ref/compiler.texi: Hack some more, finishing the section on the compiler tower.
parent 46d666d4aa
commit e3ba263de4
3 changed files with 150 additions and 55 deletions
|
@ -2021,6 +2021,8 @@ this-is-a-matric
|
||||||
guile>
|
guile>
|
||||||
@end lisp
|
@end lisp
|
||||||
|
|
||||||
|
@anchor{Memoization}
|
||||||
|
@cindex Memoization
|
||||||
(For anyone wondering why the first @code{(do-main 4)} call above
|
(For anyone wondering why the first @code{(do-main 4)} call above
|
||||||
generates lots more trace lines than the subsequent calls: these
|
generates lots more trace lines than the subsequent calls: these
|
||||||
examples also demonstrate how the Guile evaluator ``memoizes'' code.
|
examples also demonstrate how the Guile evaluator ``memoizes'' code.
|
||||||
|
|
|
@ -14,10 +14,10 @@ the switch on Frankenstein. However, this magic is perceived by many
|
||||||
to be impenetrable.
|
to be impenetrable.
|
||||||
|
|
||||||
This section aims to pull back the veil from over Guile's compiler
|
This section aims to pull back the veil from over Guile's compiler
|
||||||
implementation, some reference to the wizard of oz FIXME.
|
implementation, and pay attention to the small man behind the curtain.
|
||||||
|
|
||||||
REFFIXME, if you're lost and you just wanted to know how to compile
|
@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
|
||||||
your .scm file.
|
know how to compile your .scm file.
|
||||||
|
|
||||||
@menu
|
@menu
|
||||||
* Compiler Tower::
|
* Compiler Tower::
|
||||||
|
@ -25,15 +25,18 @@ your .scm file.
|
||||||
* GHIL::
|
* GHIL::
|
||||||
* GLIL::
|
* GLIL::
|
||||||
* Object Code::
|
* Object Code::
|
||||||
|
* Extending the Compiler::
|
||||||
@end menu
|
@end menu
|
||||||
|
|
||||||
|
FIXME: document the new repl somewhere?
|
||||||
|
|
||||||
@node Compiler Tower
|
@node Compiler Tower
|
||||||
@subsection Compiler Tower
|
@subsection Compiler Tower
|
||||||
|
|
||||||
Guile's compiler is quite simple, actually -- its @emph{compilers}, to
|
Guile's compiler is quite simple, actually -- its @emph{compilers}, to
|
||||||
put it more accurately. Guile defines a tower of languages, starting
|
put it more accurately. Guile defines a tower of languages, starting
|
||||||
at Scheme and progressively simplifying down to languages that
|
at Scheme and progressively simplifying down to languages that
|
||||||
resemble the VM instruction set (REFFIXME).
|
resemble the VM instruction set (@pxref{Instruction Set}).
|
||||||
|
|
||||||
Each language knows how to compile to the next, so each step is simple
|
Each language knows how to compile to the next, so each step is simple
|
||||||
and understandable. Furthermore, this set of languages is not
|
and understandable. Furthermore, this set of languages is not
|
||||||
|
@ -41,41 +44,116 @@ hardcoded into Guile, so it is possible for the user to add new
|
||||||
high-level languages, new passes, or even different compilation
|
high-level languages, new passes, or even different compilation
|
||||||
targets.
|
targets.
|
||||||
|
|
||||||
lookup-language
|
Languages are registered in the module, @code{(system base language)}:
|
||||||
(lang xxx spec)
|
|
||||||
|
|
||||||
(system-base-language)
|
@example
|
||||||
|
(use-modules (system base language))
|
||||||
|
@end example
|
||||||
|
|
||||||
describe:
|
They are registered with the @code{define-language} form.
|
||||||
|
|
||||||
(define-record <language>
|
@deffn {Scheme Syntax} define-language @
|
||||||
name
|
name title version reader printer @
|
||||||
title
|
[parser=#f] [read-file=#f] [compilers='()] [evaluator=#f]
|
||||||
version
|
Define a language.
|
||||||
reader
|
|
||||||
printer
|
|
||||||
(parser #f)
|
|
||||||
(read-file #f)
|
|
||||||
(compilers '())
|
|
||||||
(evaluator #f))
|
|
||||||
|
|
||||||
(define-macro (define-language name . spec)
|
This syntax defines a @code{#<language>} object, bound to @var{name}
|
||||||
|
in the current environment. In addition, the language will be added to
|
||||||
|
the global language set. For example, this is the language definition
|
||||||
|
for Scheme:
|
||||||
|
|
||||||
(lookup-compilation-order from to)
|
@example
|
||||||
|
(define-language scheme
|
||||||
|
#:title "Guile Scheme"
|
||||||
|
#:version "0.5"
|
||||||
|
#:reader read
|
||||||
|
#:read-file read-file
|
||||||
|
#:compilers `((,ghil . ,translate))
|
||||||
|
#:evaluator (lambda (x module) (primitive-eval x))
|
||||||
|
#:printer write)
|
||||||
|
@end example
|
||||||
|
|
||||||
language definition
|
In this example, from @code{(language scheme spec)}, @code{read-file}
|
||||||
|
reads expressions from a port and wraps them in a @code{begin} block.
|
||||||
|
@end deffn
|
||||||
|
|
||||||
compiling from here to there
|
The interesting thing about having languages defined this way is that
|
||||||
|
they present a uniform interface to the read-eval-print loop. This
|
||||||
|
allows the user to change the current language of the REPL:
|
||||||
|
|
||||||
the normal tower: scheme, ghil, glil, object code
|
@example
|
||||||
maybe from there serialized to disk
|
$ guile
|
||||||
or if at repl, brought back to life by compiling to ``value''
|
Guile Scheme interpreter 0.5 on Guile 1.9.0
|
||||||
|
Copyright (C) 2001-2008 Free Software Foundation, Inc.
|
||||||
|
|
||||||
compile-file defaults to compiling to objcode
|
Enter `,help' for help.
|
||||||
compile defaults to compiling to value
|
scheme@@(guile-user)> ,language ghil
|
||||||
|
Guile High Intermediate Language (GHIL) interpreter 0.3 on Guile 1.9.0
|
||||||
|
Copyright (C) 2001-2008 Free Software Foundation, Inc.
|
||||||
|
|
||||||
|
Enter `,help' for help.
|
||||||
|
ghil@@(guile-user)>
|
||||||
|
@end example
|
||||||
|
|
||||||
|
Languages can be looked up by name, as they were above.
|
||||||
|
|
||||||
|
@deffn {Scheme Procedure} lookup-language name
|
||||||
|
Looks up a language named @var{name}, autoloading it if necessary.
|
||||||
|
|
||||||
|
Languages are autoloaded by looking for a variable named @var{name} in
|
||||||
|
a module named @code{(language @var{name} spec)}.
|
||||||
|
|
||||||
|
The language object will be returned, or @code{#f} if there does not
|
||||||
|
exist a language with that name.
|
||||||
|
@end deffn
|
||||||
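As a minimal sketch of the lookup described above, the following fetches
the Scheme language object by name. It assumes that @code{language-title}
is the accessor exported by @code{(system base language)} for the
@code{title} field, which the text above does not confirm, and it relies
on the documented behavior of returning @code{#f} for an unknown name.

@example
(use-modules (system base language))

;; Look up the language by name; per the description above this
;; autoloads the (language scheme spec) module if necessary.
(define lang (lookup-language 'scheme))

;; language-title is an assumed accessor name, not confirmed above.
(display (if lang
             (language-title lang)
             "no such language"))
(newline)
@end example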
|
|
||||||
|
Defining languages this way allows us to programmatically determine
|
||||||
|
the necessary steps for compiling code from one language to another.
|
||||||
|
|
||||||
|
@deffn {Scheme Procedure} lookup-compilation-order from to
|
||||||
|
Recursively traverses the set of languages to which @var{from} can
|
||||||
|
compile, depth-first, and return the first path that can transform
|
||||||
|
@var{from} to @var{to}. Returns @code{#f} if no path is found.
|
||||||
|
|
||||||
|
This function memoizes its results in a cache that is invalidated by
|
||||||
|
subsequent calls to @code{define-language}, so it should be quite
|
||||||
|
fast.
|
||||||
|
@end deffn
|
||||||
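The depth-first search described above can be sketched independently of
Guile's own implementation. The @code{tower} table and the
@code{find-path} procedure below are hypothetical names made up for
illustration; the real procedure works on language objects and their
@code{compilers} field, and also caches its results, which this sketch
does not attempt.

@example
;; Hypothetical tower: each entry maps a language name to the names
;; of the languages it can compile to directly.
(define tower
  '((scheme ghil)
    (ghil glil)
    (glil objcode)
    (objcode value)
    (value)))

;; Return the list of languages leading from FROM to TO, or #f if
;; there is no such path.  SEEN guards against cycles.
(define (find-path from to seen)
  (cond
   ((eq? from to) (list to))
   ((memq from seen) #f)
   (else
    (let lp ((targets (or (assq-ref tower from) '())))
      (cond
       ((null? targets) #f)
       ((find-path (car targets) to (cons from seen))
        => (lambda (rest) (cons from rest)))
       (else (lp (cdr targets))))))))

(find-path 'scheme 'value '())
;; => (scheme ghil glil objcode value)
@end example

Trying each direct target in turn and returning as soon as one of them
reaches the destination is what makes the search depth-first and
first-match, rather than shortest-path.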
|
|
||||||
|
There is a notion of a ``current language'', which is maintained in
|
||||||
|
the @code{*current-language*} fluid. This language is normally Scheme,
|
||||||
|
and may be rebound by the user. The runtime compilation interfaces
|
||||||
|
(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
|
||||||
|
and target languages.
|
||||||
|
|
||||||
|
The normal tower of languages when compiling Scheme goes like this:
|
||||||
|
|
||||||
|
@itemize
|
||||||
|
@item Scheme, which we know and love
|
||||||
|
@item Guile High Intermediate Language (GHIL)
|
||||||
|
@item Guile Low Intermediate Language (GLIL)
|
||||||
|
@item Object code
|
||||||
|
@end itemize
|
||||||
|
|
||||||
|
Object code may be serialized to disk directly, though it has a cookie
|
||||||
|
and version prepended to the front. But when compiling Scheme at
|
||||||
|
runtime, you want a Scheme value, e.g. a compiled procedure. For this
|
||||||
|
reason, so as not to break the abstraction, Guile defines a fake
|
||||||
|
language, @code{value}. Compiling to @code{value} loads the object
|
||||||
|
code into a procedure, and wakes the sleeping giant.
|
||||||
|
|
||||||
|
Perhaps this strangeness can be explained by example:
|
||||||
|
@code{compile-file} defaults to compiling to object code, because it
|
||||||
|
produces object code that has to live in the barren world outside the
|
||||||
|
Guile runtime; but @code{compile} defaults to compiling to
|
||||||
|
@code{value}, as its product re-enters the Guile world.
|
||||||
|
|
||||||
|
Indeed, the process of compilation can circulate through these
|
||||||
|
different worlds indefinitely, as shown by the following quine:
|
||||||
|
|
||||||
|
@example
|
||||||
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
|
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
|
||||||
quine
|
@end example
|
||||||
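To make the role of the @code{value} target concrete, here is a small
sketch of compiling an expression at runtime and calling the result. It
assumes that @code{compile} is exported from @code{(system base compile)}
and accepts a @code{#:to} keyword naming the target language; check both
against the runtime compilation interfaces of your Guile version.

@example
(use-modules (system base compile))

;; Compile a lambda expression all the way to `value', so that the
;; result is a live procedure rather than serialized object code.
(define add1
  (compile '(lambda (x) (+ x 1)) #:to 'value))

;; The compiled procedure is called like any other.
(display (add1 41))
(newline)
@end example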
|
|
||||||
@node The Scheme Compiler
|
@node The Scheme Compiler
|
||||||
@subsection The Scheme Compiler
|
@subsection The Scheme Compiler
|
||||||
|
@ -118,7 +196,7 @@ and code. Well, there's a bit more, but that's the flavor of GLIL.
|
||||||
|
|
||||||
Compiled code will effectively be a thunk, of no arguments, but
|
Compiled code will effectively be a thunk, of no arguments, but
|
||||||
optionally closing over some number of variables (which should be
|
optionally closing over some number of variables (which should be
|
||||||
captured via `make-closure' REFFIXME.
|
captured via `make-closure', @pxref{Loading Instructions}).
|
||||||
|
|
||||||
@node Object Code
|
@node Object Code
|
||||||
@subsection Object Code
|
@subsection Object Code
|
||||||
|
@ -130,3 +208,15 @@ thunk from objcode->program with a certain current module and with
|
||||||
those externals. so you can recompile a closure at runtime, a trick
|
those externals. so you can recompile a closure at runtime, a trick
|
||||||
that goops uses.
|
that goops uses.
|
||||||
|
|
||||||
|
@node Extending the Compiler
|
||||||
|
@subsection Extending the Compiler
|
||||||
|
|
||||||
|
JIT compilation
|
||||||
|
|
||||||
|
AOT compilation
|
||||||
|
|
||||||
|
link to what dybvig did
|
||||||
|
|
||||||
|
profiling
|
||||||
|
|
||||||
|
startup time
|
||||||
|
|
|
@ -37,13 +37,13 @@ Guile's evaluator operates directly on the S-expression representation
|
||||||
of Scheme source code.
|
of Scheme source code.
|
||||||
|
|
||||||
But while the evaluator is highly optimized and hand-tuned, and
|
But while the evaluator is highly optimized and hand-tuned, and
|
||||||
contains some extensive speed trickery (REFFIXME memoization), it
|
contains some extensive speed trickery (@pxref{Memoization}), it still
|
||||||
still performs many needless computations during the course of
|
performs many needless computations during the course of evaluating an
|
||||||
evaluating an expression. For example, application of a function to
|
expression. For example, application of a function to arguments
|
||||||
arguments needlessly conses up the arguments in a list. Evaluation of
|
needlessly conses up the arguments in a list. Evaluation of an
|
||||||
an expression always has to figure out what the car of the expression
|
expression always has to figure out what the car of the expression is
|
||||||
is -- a procedure, a memoized form, or something else. All values have
|
-- a procedure, a memoized form, or something else. All values have to
|
||||||
to be allocated on the heap. Et cetera.
|
be allocated on the heap. Et cetera.
|
||||||
|
|
||||||
The solution to this problem is to compile the higher-level language,
|
The solution to this problem is to compile the higher-level language,
|
||||||
Scheme, into a lower-level language for which all of the checks and
|
Scheme, into a lower-level language for which all of the checks and
|
||||||
|
@ -72,7 +72,7 @@ that Guile implements, and the compiled procedures that run on it.
|
||||||
Note that this decision to implement a bytecode compiler does not
|
Note that this decision to implement a bytecode compiler does not
|
||||||
preclude native compilation. We can compile from bytecode to native
|
preclude native compilation. We can compile from bytecode to native
|
||||||
code at runtime, or even do ahead of time compilation. More
|
code at runtime, or even do ahead of time compilation. More
|
||||||
possibilities are discussed in REFFIXME.
|
possibilities are discussed in @xref{Extending the Compiler}.
|
||||||
|
|
||||||
@node VM Concepts
|
@node VM Concepts
|
||||||
@subsection VM Concepts
|
@subsection VM Concepts
|
||||||
|
@ -109,7 +109,7 @@ The registers that a VM has are as follows:
|
||||||
In other architectures, the instruction pointer is sometimes called
|
In other architectures, the instruction pointer is sometimes called
|
||||||
the ``program counter'' (pc). This set of registers is pretty typical
|
the ``program counter'' (pc). This set of registers is pretty typical
|
||||||
for stack machines; their exact meanings in the context of Guile's VM
|
for stack machines; their exact meanings in the context of Guile's VM
|
||||||
is described below REFFIXME.
|
is described in the next section.
|
||||||
|
|
||||||
A virtual machine executes by loading a compiled procedure, and
|
A virtual machine executes by loading a compiled procedure, and
|
||||||
executing the object code associated with that procedure. Of course,
|
executing the object code associated with that procedure. Of course,
|
||||||
|
@ -200,8 +200,8 @@ effect, this and the return address are the registers that are always
|
||||||
|
|
||||||
@item External link
|
@item External link
|
||||||
This field is a reference to the list of heap-allocated variables
|
This field is a reference to the list of heap-allocated variables
|
||||||
associated with this frame. A discussion of heap versus stack
|
associated with this frame. For a discussion of heap versus stack
|
||||||
allocation can be found in REFFIXME.
|
allocation, @xref{Variables and the VM}.
|
||||||
|
|
||||||
@item Local variable @var{n}
|
@item Local variable @var{n}
|
||||||
Lambda-local variables that are allocated on the stack are all
|
Lambda-local variables that are allocated on the stack are all
|
||||||
|
@ -217,7 +217,8 @@ from its initial value here onto a location in the heap, and
|
||||||
thereafter only referenced on the heap.
|
thereafter only referenced on the heap.
|
||||||
|
|
||||||
@item Program
|
@item Program
|
||||||
This is the program being applied. Programs are discussed in REFFIXME!
|
This is the program being applied. For more information on how
|
||||||
|
programs are implemented, @xref{VM Programs}.
|
||||||
@end table
|
@end table
|
||||||
|
|
||||||
@node Variables and the VM
|
@node Variables and the VM
|
||||||
|
@ -270,14 +271,15 @@ A compiled procedure is a compound object, consisting of its bytecode,
|
||||||
a reference to any captured lexical variables, an object array, and
|
a reference to any captured lexical variables, an object array, and
|
||||||
some metadata such as the procedure's arity, name, and documentation.
|
some metadata such as the procedure's arity, name, and documentation.
|
||||||
You can pick apart these pieces with the accessors in @code{(system vm
|
You can pick apart these pieces with the accessors in @code{(system vm
|
||||||
program)}. REFFIXME, for a full API reference.
|
program)}. @xref{Compiled Procedures}, for a full API reference.
|
||||||
|
|
||||||
@cindex object table
|
@cindex object table
|
||||||
The object array of a compiled procedure, also known as the
|
The object array of a compiled procedure, also known as the
|
||||||
@dfn{object table}, holds all Scheme objects whose values are known
|
@dfn{object table}, holds all Scheme objects whose values are known
|
||||||
not to change across invocations of the procedure: constant strings,
|
not to change across invocations of the procedure: constant strings,
|
||||||
symbols, etc. The object table of a program is initialized right
|
symbols, etc. The object table of a program is initialized right
|
||||||
before a program is loaded with @code{load-program} REFFIXME.
|
before a program is loaded with @code{load-program}.
|
||||||
|
@xref{Loading Instructions}, for more information.
|
||||||
|
|
||||||
Variable objects are one such type of constant object: when a global
|
Variable objects are one such type of constant object: when a global
|
||||||
binding is defined, a variable object is associated to it and that
|
binding is defined, a variable object is associated to it and that
|
||||||
|
@ -326,8 +328,8 @@ The second stanza disassembles the compiled lambda. Toplevel variables
|
||||||
are resolved relative to the module that was current when the
|
are resolved relative to the module that was current when the
|
||||||
procedure was created. This lookup occurs lazily, at the first time
|
procedure was created. This lookup occurs lazily, at the first time
|
||||||
the variable is actually referenced, and the location of the lookup is
|
the variable is actually referenced, and the location of the lookup is
|
||||||
cached so that future references are very cheap. REFFIXME xref
|
cached so that future references are very cheap. @xref{Environment
|
||||||
toplevel-ref, for more details.
|
Control Instructions}, for more details.
|
||||||
|
|
||||||
Then we see a reference to an external variable, corresponding to
|
Then we see a reference to an external variable, corresponding to
|
||||||
@code{a}. The disassembler doesn't have enough information to give a
|
@code{a}. The disassembler doesn't have enough information to give a
|
||||||
|
@ -584,7 +586,8 @@ of a procedure is fast: the VM just mmap's the thunk and goes. The
|
||||||
symbols and pairs associated with the metadata are only created if the
|
symbols and pairs associated with the metadata are only created if the
|
||||||
user asks for them.
|
user asks for them.
|
||||||
|
|
||||||
The format of the thunk's return value is specified in REFFIXME.
|
For information on the format of the thunk's return value,
|
||||||
|
@xref{Compiled Procedures}.
|
||||||
@item Optionally, the program's object table, as a vector.
|
@item Optionally, the program's object table, as a vector.
|
||||||
|
|
||||||
A program that does not reference toplevel bindings and does not use
|
A program that does not reference toplevel bindings and does not use
|
||||||
|
@ -643,9 +646,9 @@ arguments off the stack, and push the result of calling
|
||||||
@code{scm_apply}.
|
@code{scm_apply}.
|
||||||
|
|
||||||
For compiled procedures, this instruction sets up a new stack frame,
|
For compiled procedures, this instruction sets up a new stack frame,
|
||||||
as described in REFFIXME, and then dispatches to the first instruction
|
as described in @ref{Stack Layout}, and then dispatches to the first
|
||||||
in the called procedure, relying on the called procedure to return one
|
instruction in the called procedure, relying on the called procedure
|
||||||
value to the newly-created continuation.
|
to return one value to the newly-created continuation.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@deffn Instruction goto/args nargs
|
@deffn Instruction goto/args nargs
|
||||||
|
@ -692,11 +695,11 @@ Like @code{call}, except that a multiple-value continuation is created
|
||||||
in addition to a single-value continuation.
|
in addition to a single-value continuation.
|
||||||
|
|
||||||
The offset (a two-byte value) is an offset within the instruction
|
The offset (a two-byte value) is an offset within the instruction
|
||||||
stream; the multiple-value return address in the new frame (see
|
stream; the multiple-value return address in the new frame
|
||||||
frames REFFIXME) will be set to the normal return address plus this
|
(@pxref{Stack Layout}) will be set to the normal return address plus
|
||||||
offset. Instructions at that offset will expect the top value of the
|
this offset. Instructions at that offset will expect the top value of
|
||||||
stack to be the number of values, and below that values themselves,
|
the stack to be the number of values, and below that values
|
||||||
pushed separately.
|
themselves, pushed separately.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@deffn Instruction return/values nvalues
|
@deffn Instruction return/values nvalues
|
||||||
|
@ -822,7 +825,7 @@ machine is first entered; compiled Scheme procedures will not contain
|
||||||
this instruction.
|
this instruction.
|
||||||
|
|
||||||
If multiple values have been returned, the SCM value will be a
|
If multiple values have been returned, the SCM value will be a
|
||||||
multiple-values object (REFFIXME scm_values).
|
multiple-values object (@pxref{Multiple Values}).
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
@deffn Instruction break
|
@deffn Instruction break
|
||||||
|
|