mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-04-30 03:40:34 +02:00

fix some xrefs, flesh out compiler.texi a bit more

* doc/ref/api-debug.texi:
* doc/ref/vm.texi: Fix some cross-references.

* doc/ref/compiler.texi: Hack some more, finishing the section on the
  compiler tower.
Andy Wingo 2009-01-09 15:52:55 +01:00
parent 46d666d4aa
commit e3ba263de4
3 changed files with 150 additions and 55 deletions


@@ -2021,6 +2021,8 @@ this-is-a-matric
 guile>
 @end lisp
+@anchor{Memoization}
+@cindex Memoization
 (For anyone wondering why the first @code{(do-main 4)} call above
 generates lots more trace lines than the subsequent calls: these
 examples also demonstrate how the Guile evaluator ``memoizes'' code.


@@ -14,10 +14,10 @@ the switch on Frankenstein. However, this magic is perceived by many
 to be impenetrable.
 This section aims to pull back the veil from over Guile's compiler
-implementation, some reference to the wizard of oz FIXME.
+implementation, and pay attention to the small man behind the curtain.
-REFFIXME, if you're lost and you just wanted to know how to compile
-your .scm file.
+@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
+know how to compile your .scm file.
 @menu
 * Compiler Tower::
@@ -25,15 +25,18 @@ your .scm file.
 * GHIL::
 * GLIL::
 * Object Code::
+* Extending the Compiler::
 @end menu
+FIXME: document the new repl somewhere?
 @node Compiler Tower
 @subsection Compiler Tower
 Guile's compiler is quite simple, actually -- its @emph{compilers}, to
 put it more accurately. Guile defines a tower of languages, starting
 at Scheme and progressively simplifying down to languages that
-resemble the VM instruction set (REFFIXME).
+resemble the VM instruction set (@pxref{Instruction Set}).
 Each language knows how to compile to the next, so each step is simple
 and understandable. Furthermore, this set of languages is not
@@ -41,41 +44,116 @@ hardcoded into Guile, so it is possible for the user to add new
 high-level languages, new passes, or even different compilation
 targets.
-lookup-language
-(lang xxx spec)
-(system-base-language)
-describe:
-(define-record <language>
-name
-title
-version
-reader
-printer
-(parser #f)
-(read-file #f)
-(compilers '())
-(evaluator #f))
-(define-macro (define-language name . spec)
-(lookup-compilation-order from to)
-language definition
-compiling from here to there
-the normal tower: scheme, ghil, glil, object code
-maybe from there serialized to disk
-or if at repl, brought back to life by compiling to ``value''
-compile-file defaults to compiling to objcode
-compile defaults to compiling to value
+Languages are registered in the module, @code{(system base language)}:
+@example
+(use-modules (system base language))
+@end example
+They are registered with the @code{define-language} form.
+@deffn {Scheme Syntax} define-language @
+name title version reader printer @
+[parser=#f] [read-file=#f] [compilers='()] [evaluator=#f]
+Define a language.
+This syntax defines a @code{#<language>} object, bound to @var{name}
+in the current environment. In addition, the language will be added to
+the global language set. For example, this is the language definition
+for Scheme:
+@example
+(define-language scheme
+#:title "Guile Scheme"
+#:version "0.5"
+#:reader read
+#:read-file read-file
+#:compilers `((,ghil . ,translate))
+#:evaluator (lambda (x module) (primitive-eval x))
+#:printer write)
+@end example
+In this example, from @code{(language scheme spec)}, @code{read-file}
+reads expressions from a port and wraps them in a @code{begin} block.
+@end deffn
+The interesting thing about having languages defined this way is that
+they present a uniform interface to the read-eval-print loop. This
+allows the user to change the current language of the REPL:
+@example
+$ guile
+Guile Scheme interpreter 0.5 on Guile 1.9.0
+Copyright (C) 2001-2008 Free Software Foundation, Inc.
+Enter `,help' for help.
+scheme@@(guile-user)> ,language ghil
+Guile High Intermediate Language (GHIL) interpreter 0.3 on Guile 1.9.0
+Copyright (C) 2001-2008 Free Software Foundation, Inc.
+Enter `,help' for help.
+ghil@@(guile-user)>
+@end example
+Languages can be looked up by name, as they were above.
+@deffn {Scheme Procedure} lookup-language name
+Looks up a language named @var{name}, autoloading it if necessary.
+Languages are autoloaded by looking for a variable named @var{name} in
+a module named @code{(language @var{name} spec)}.
+The language object will be returned, or @code{#f} if there does not
+exist a language with that name.
+@end deffn
+Defining languages this way allows us to programmatically determine
+the necessary steps for compiling code from one language to another.
+@deffn {Scheme Procedure} lookup-compilation-order from to
+Recursively traverses the set of languages to which @var{from} can
+compile, depth-first, and return the first path that can transform
+@var{from} to @var{to}. Returns @code{#f} if no path is found.
+This function memoizes its results in a cache that is invalidated by
+subsequent calls to @code{define-language}, so it should be quite
+fast.
+@end deffn
+There is a notion of a ``current language'', which is maintained in
+the @code{*current-language*} fluid. This language is normally Scheme,
+and may be rebound by the user. The runtime compilation interfaces
+(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
+and target languages.
+The normal tower of languages when compiling Scheme goes like this:
+@itemize
+@item Scheme, which we know and love
+@item Guile High Intermediate Language (GHIL)
+@item Guile Low Intermediate Language (GLIL)
+@item Object code
+@end itemize
+Object code may be serialized to disk directly, though it has a cookie
+and version prepended to the front. But when compiling Scheme at
+runtime, you want a Scheme value, e.g. a compiled procedure. For this
+reason, so as not to break the abstraction, Guile defines a fake
+language, @code{value}. Compiling to @code{value} loads the object
+code into a procedure, and wakes the sleeping giant.
+Perhaps this strangeness can be explained by example:
+@code{compile-file} defaults to compiling to object code, because it
+produces object code that has to live in the barren world outside the
+Guile runtime; but @code{compile} defaults to compiling to
+@code{value}, as its product re-enters the Guile world.
+Indeed, the process of compilation can circulate through these
+different worlds indefinitely, as shown by the following quine:
+@example
 ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
-quine
+@end example
 @node The Scheme Compiler
 @subsection The Scheme Compiler
@@ -118,7 +196,7 @@ and code. Well, there's a bit more, but that's the flavor of GLIL.
 Compiled code will effectively be a thunk, of no arguments, but
 optionally closing over some number of variables (which should be
-captured via `make-closure' REFFIXME.
+captured via `make-closure', @pxref{Loading Instructions}).
 @node Object Code
 @subsection Object Code
@@ -130,3 +208,15 @@ thunk from objcode->program with a certain current module and with
 those externals. so you can recompile a closure at runtime, a trick
 that goops uses.
+@node Extending the Compiler
+@subsection Extending the Compiler
+JIT compilation
+AOT compilation
+link to what dybvig did
+profiling
+startup time


@@ -37,13 +37,13 @@ Guile's evaluator operates directly on the S-expression representation
 of Scheme source code.
 But while the evaluator is highly optimized and hand-tuned, and
-contains some extensive speed trickery (REFFIXME memoization), it
-still performs many needless computations during the course of
-evaluating an expression. For example, application of a function to
-arguments needlessly conses up the arguments in a list. Evaluation of
-an expression always has to figure out what the car of the expression
-is -- a procedure, a memoized form, or something else. All values have
-to be allocated on the heap. Et cetera.
+contains some extensive speed trickery (@pxref{Memoization}), it still
+performs many needless computations during the course of evaluating an
+expression. For example, application of a function to arguments
+needlessly conses up the arguments in a list. Evaluation of an
+expression always has to figure out what the car of the expression is
+-- a procedure, a memoized form, or something else. All values have to
+be allocated on the heap. Et cetera.
 The solution to this problem is to compile the higher-level language,
 Scheme, into a lower-level language for which all of the checks and
@@ -72,7 +72,7 @@ that Guile implements, and the compiled procedures that run on it.
 Note that this decision to implement a bytecode compiler does not
 preclude native compilation. We can compile from bytecode to native
 code at runtime, or even do ahead of time compilation. More
-possibilities are discussed in REFFIXME.
+possibilities are discussed in @xref{Extending the Compiler}.
 @node VM Concepts
 @subsection VM Concepts
@@ -109,7 +109,7 @@ The registers that a VM has are as follows:
 In other architectures, the instruction pointer is sometimes called
 the ``program counter'' (pc). This set of registers is pretty typical
 for stack machines; their exact meanings in the context of Guile's VM
-is described below REFFIXME.
+is described in the next section.
 A virtual machine executes by loading a compiled procedure, and
 executing the object code associated with that procedure. Of course,
@@ -200,8 +200,8 @@ effect, this and the return address are the registers that are always
 @item External link
 This field is a reference to the list of heap-allocated variables
-associated with this frame. A discussion of heap versus stack
-allocation can be found in REFFIXME.
+associated with this frame. For a discussion of heap versus stack
+allocation, @xref{Variables and the VM}.
 @item Local variable @var{n}
 Lambda-local variables that are allocated on the stack are all
@@ -217,7 +217,8 @@ from its initial value here onto a location in the heap, and
 thereafter only referenced on the heap.
 @item Program
-This is the program being applied. Programs are discussed in REFFIXME!
+This is the program being applied. For more information on how
+programs are implemented, @xref{VM Programs}.
 @end table
 @node Variables and the VM
@@ -270,14 +271,15 @@ A compiled procedure is a compound object, consisting of its bytecode,
 a reference to any captured lexical variables, an object array, and
 some metadata such as the procedure's arity, name, and documentation.
 You can pick apart these pieces with the accessors in @code{(system vm
-program)}. REFFIXME, for a full API reference.
+program)}. @xref{Compiled Procedures}, for a full API reference.
 @cindex object table
 The object array of a compiled procedure, also known as the
 @dfn{object table}, holds all Scheme objects whose values are known
 not to change across invocations of the procedure: constant strings,
 symbols, etc. The object table of a program is initialized right
-before a program is loaded with @code{load-program} REFFIXME.
+before a program is loaded with @code{load-program}.
+@xref{Loading Instructions}, for more information.
 Variable objects are one such type of constant object: when a global
 binding is defined, a variable object is associated to it and that
@@ -326,8 +328,8 @@ The second stanza disassembles the compiled lambda. Toplevel variables
 are resolved relative to the module that was current when the
 procedure was created. This lookup occurs lazily, at the first time
 the variable is actually referenced, and the location of the lookup is
-cached so that future references are very cheap. REFFIXME xref
-toplevel-ref, for more details.
+cached so that future references are very cheap. @xref{Environment
+Control Instructions}, for more details.
 Then we see a reference to an external variable, corresponding to
 @code{a}. The disassembler doesn't have enough information to give a
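Editorial illustration, not part of the commit: the context lines of the hunk above say that toplevel variables are looked up lazily on first reference and that the location is then cached. A minimal Python model of that behaviour follows. The `Variable` and `ToplevelRef` classes, the dictionary used as a module, and the `lookups` counter are hypothetical names invented for this sketch; it does not reproduce how the VM's instruction actually stores its cache.

```python
class Variable:
    """A boxed binding, standing in for a variable object."""

    def __init__(self, value):
        self.value = value


class ToplevelRef:
    """One toplevel reference inside a compiled procedure (toy model)."""

    def __init__(self, name, module):
        self.name = name      # symbol to resolve
        self.module = module  # module that was current at creation time
        self.cached = None    # variable object, filled in on first use
        self.lookups = 0      # counts module lookups, for demonstration only

    def ref(self):
        if self.cached is None:
            # First reference: resolve the name once and remember the box.
            self.lookups += 1
            self.cached = self.module[self.name]
        # Later references skip the lookup and read through the cached box.
        return self.cached.value


module = {"a": Variable(1)}
ref = ToplevelRef("a", module)
first = ref.ref()
module["a"].value = 2  # a later assignment mutates the same box
second = ref.ref()
print(first, second, ref.lookups)
```

Because the cache holds the variable's box rather than its value, the second read sees the updated value while the module is consulted only once.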
@@ -584,7 +586,8 @@ of a procedure is fast: the VM just mmap's the thunk and goes. The
 symbols and pairs associated with the metadata are only created if the
 user asks for them.
-The format of the thunk's return value is specified in REFFIXME.
+For information on the format of the thunk's return value,
+@xref{Compiled Procedures}.
 @item Optionally, the program's object table, as a vector.
 A program that does not reference toplevel bindings and does not use
@@ -643,9 +646,9 @@ arguments off the stack, and push the result of calling
 @code{scm_apply}.
 For compiled procedures, this instruction sets up a new stack frame,
-as described in REFFIXME, and then dispatches to the first instruction
-in the called procedure, relying on the called procedure to return one
-value to the newly-created continuation.
+as described in @ref{Stack Layout}, and then dispatches to the first
+instruction in the called procedure, relying on the called procedure
+to return one value to the newly-created continuation.
 @end deffn
 @deffn Instruction goto/args nargs
@@ -692,11 +695,11 @@ Like @code{call}, except that a multiple-value continuation is created
 in addition to a single-value continuation.
 The offset (a two-byte value) is an offset within the instruction
-stream; the multiple-value return address in the new frame (see
-frames REFFIXME) will be set to the normal return address plus this
-offset. Instructions at that offset will expect the top value of the
-stack to be the number of values, and below that values themselves,
-pushed separately.
+stream; the multiple-value return address in the new frame
+(@pxref{Stack Layout}) will be set to the normal return address plus
+this offset. Instructions at that offset will expect the top value of
+the stack to be the number of values, and below that values
+themselves, pushed separately.
 @end deffn
 @deffn Instruction return/values nvalues
@@ -822,7 +825,7 @@ machine is first entered; compiled Scheme procedures will not contain
 this instruction.
 If multiple values have been returned, the SCM value will be a
-multiple-values object (REFFIXME scm_values).
+multiple-values object (@pxref{Multiple Values}).
 @end deffn
 @deffn Instruction break