diff --git a/doc/ref/api-debug.texi b/doc/ref/api-debug.texi
index d99a56724..357c038ca 100644
--- a/doc/ref/api-debug.texi
+++ b/doc/ref/api-debug.texi
@@ -2021,6 +2021,8 @@ this-is-a-matric
 guile>
 @end lisp
 
+@anchor{Memoization}
+@cindex Memoization
 (For anyone wondering why the first @code{(do-main 4)} call above
 generates lots more trace lines than the subsequent calls: these
 examples also demonstrate how the Guile evaluator ``memoizes'' code.
diff --git a/doc/ref/compiler.texi b/doc/ref/compiler.texi
index 125ec92a5..c40a82e5d 100644
--- a/doc/ref/compiler.texi
+++ b/doc/ref/compiler.texi
@@ -14,10 +14,10 @@ the switch on Frankenstein.
 
 However, this magic is perceived by many to be impenetrable. This
 section aims to pull back the veil from over Guile's compiler
-implementation, some reference to the wizard of oz FIXME.
+implementation, and pay attention to the small man behind the curtain.
 
-REFFIXME, if you're lost and you just wanted to know how to compile
-your .scm file.
+@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
+know how to compile your .scm file.
 
 @menu
 * Compiler Tower::
@@ -25,15 +25,18 @@ your .scm file.
 * GHIL::
 * GLIL::
 * Object Code::
+* Extending the Compiler::
 @end menu
 
+FIXME: document the new repl somewhere?
+
 @node Compiler Tower
 @subsection Compiler Tower
 
 Guile's compiler is quite simple, actually -- its @emph{compilers}, to
 put it more accurately. Guile defines a tower of languages, starting
 at Scheme and progressively simplifying down to languages that
-resemble the VM instruction set (REFFIXME).
+resemble the VM instruction set (@pxref{Instruction Set}).
 
 Each language knows how to compile to the next, so each step is simple
 and understandable. Furthermore, this set of languages is not
@@ -41,41 +44,116 @@ hardcoded into Guile, so it is possible for the user to add new
 high-level languages, new passes, or even different compilation
 targets.

-lookup-language
-(lang xxx spec)
+Languages are registered in the module, @code{(system base language)}:
 
-(system-base-language)
+@example
+(use-modules (system base language))
+@end example
 
-describe:
+They are registered with the @code{define-language} form.
 
-(define-record
-name
-title
-version
-reader
-printer
-(parser #f)
-(read-file #f)
-(compilers '())
-(evaluator #f))
+@deffn {Scheme Syntax} define-language @
+name title version reader printer @
+[parser=#f] [read-file=#f] [compilers='()] [evaluator=#f]
+Define a language.
 
-(define-macro (define-language name . spec)
+This syntax defines a @code{#<language>} object, bound to @var{name}
+in the current environment. In addition, the language will be added to
+the global language set. For example, this is the language definition
+for Scheme:
 
-(lookup-compilation-order from to)
+@example
+(define-language scheme
+  #:title "Guile Scheme"
+  #:version "0.5"
+  #:reader read
+  #:read-file read-file
+  #:compilers `((,ghil . ,translate))
+  #:evaluator (lambda (x module) (primitive-eval x))
+  #:printer write)
+@end example
 
-language definition
+In this example, from @code{(language scheme spec)}, @code{read-file}
+reads expressions from a port and wraps them in a @code{begin} block.
+@end deffn
 
-compiling from here to there
+The interesting thing about having languages defined this way is that
+they present a uniform interface to the read-eval-print loop. This
+allows the user to change the current language of the REPL:
 
-the normal tower: scheme, ghil, glil, object code
-maybe from there serialized to disk
-or if at repl, brought back to life by compiling to ``value''
+@example
+$ guile
+Guile Scheme interpreter 0.5 on Guile 1.9.0
+Copyright (C) 2001-2008 Free Software Foundation, Inc.
 
-compile-file defaults to compiling to objcode
-compile defaults to compiling to value
+Enter `,help' for help.
+scheme@@(guile-user)> ,language ghil
+Guile High Intermediate Language (GHIL) interpreter 0.3 on Guile 1.9.0
+Copyright (C) 2001-2008 Free Software Foundation, Inc.
 
+Enter `,help' for help.
+ghil@@(guile-user)>
+@end example
+
+Languages can be looked up by name, as they were above.
+
+@deffn {Scheme Procedure} lookup-language name
+Looks up a language named @var{name}, autoloading it if necessary.
+
+Languages are autoloaded by looking for a variable named @var{name} in
+a module named @code{(language @var{name} spec)}.
+
+The language object will be returned, or @code{#f} if there does not
+exist a language with that name.
+@end deffn
+
+Defining languages this way allows us to programmatically determine
+the necessary steps for compiling code from one language to another.
+
+@deffn {Scheme Procedure} lookup-compilation-order from to
+Recursively traverses the set of languages to which @var{from} can
+compile, depth-first, and return the first path that can transform
+@var{from} to @var{to}. Returns @code{#f} if no path is found.
+
+This function memoizes its results in a cache that is invalidated by
+subsequent calls to @code{define-language}, so it should be quite
+fast.
+@end deffn
+
+There is a notion of a ``current language'', which is maintained in
+the @code{*current-language*} fluid. This language is normally Scheme,
+and may be rebound by the user. The runtime compilation interfaces
+(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
+and target languages.
+
+The normal tower of languages when compiling Scheme goes like this:
+
+@itemize
+@item Scheme, which we know and love
+@item Guile High Intermediate Language (GHIL)
+@item Guile Low Intermediate Language (GLIL)
+@item Object code
+@end itemize
+
+Object code may be serialized to disk directly, though it has a cookie
+and version prepended to the front. But when compiling Scheme at
+runtime, you want a Scheme value, e.g. a compiled procedure. For this
+reason, so as not to break the abstraction, Guile defines a fake
+language, @code{value}. Compiling to @code{value} loads the object
+code into a procedure, and wakes the sleeping giant.
+
+Perhaps this strangeness can be explained by example:
+@code{compile-file} defaults to compiling to object code, because it
+produces object code that has to live in the barren world outside the
+Guile runtime; but @code{compile} defaults to compiling to
+@code{value}, as its product re-enters the Guile world.
+
+Indeed, the process of compilation can circulate through these
+different worlds indefinitely, as shown by the following quine:
+
+@example
 ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
-quine
+@end example
 
 @node The Scheme Compiler
 @subsection The Scheme Compiler
@@ -118,7 +196,7 @@ and code. Well, there's a bit more, but that's the flavor of GLIL.
 
 Compiled code will effectively be a thunk, of no arguments, but
 optionally closing over some number of variables (which should be
-captured via `make-closure' REFFIXME.
+captured via `make-closure', @pxref{Loading Instructions}).
 
 @node Object Code
 @subsection Object Code
@@ -130,3 +208,15 @@ thunk from objcode->program with a certain current module and with
 those externals. so you can recompile a closure at runtime, a trick
 that goops uses.
 
+@node Extending the Compiler
+@subsection Extending the Compiler
+
+JIT compilation
+
+AOT compilation
+
+link to what dybvig did
+
+profiling
+
+startup time
diff --git a/doc/ref/vm.texi b/doc/ref/vm.texi
index 5048edb19..21817b2e3 100644
--- a/doc/ref/vm.texi
+++ b/doc/ref/vm.texi
@@ -37,13 +37,13 @@ Guile's evaluator operates directly on the S-expression representation
 of Scheme source code.
 
 But while the evaluator is highly optimized and hand-tuned, and
-contains some extensive speed trickery (REFFIXME memoization), it
-still performs many needless computations during the course of
-evaluating an expression. For example, application of a function to
-arguments needlessly conses up the arguments in a list. Evaluation of
-an expression always has to figure out what the car of the expression
-is -- a procedure, a memoized form, or something else. All values have
-to be allocated on the heap. Et cetera.
+contains some extensive speed trickery (@pxref{Memoization}), it still
+performs many needless computations during the course of evaluating an
+expression. For example, application of a function to arguments
+needlessly conses up the arguments in a list. Evaluation of an
+expression always has to figure out what the car of the expression is
+-- a procedure, a memoized form, or something else. All values have to
+be allocated on the heap. Et cetera.
 
 The solution to this problem is to compile the higher-level language,
 Scheme, into a lower-level language for which all of the checks and
@@ -72,7 +72,7 @@ that Guile implements, and the compiled procedures that run on it.
 Note that this decision to implement a bytecode compiler does not
 preclude native compilation. We can compile from bytecode to native
 code at runtime, or even do ahead of time compilation. More
-possibilities are discussed in REFFIXME.
+possibilities are discussed in @xref{Extending the Compiler}.
 
 @node VM Concepts
 @subsection VM Concepts
@@ -109,7 +109,7 @@ The registers that a VM has are as follows:
 In other architectures, the instruction pointer is sometimes called
 the ``program counter'' (pc). This set of registers is pretty typical
 for stack machines; their exact meanings in the context of Guile's VM
-is described below REFFIXME.
+is described in the next section.
 
 A virtual machine executes by loading a compiled procedure, and
 executing the object code associated with that procedure. Of course,
@@ -200,8 +200,8 @@ effect, this and the return address are the registers that are always
 
 @item External link
 This field is a reference to the list of heap-allocated variables
-associated with this frame. A discussion of heap versus stack
-allocation can be found in REFFIXME.
+associated with this frame. For a discussion of heap versus stack
+allocation, @xref{Variables and the VM}.
 
 @item Local variable @var{n}
 Lambda-local variables that are allocated on the stack are all
@@ -217,7 +217,8 @@ from its initial value here onto a location in the heap, and thereafter
 only referenced on the heap.
 
 @item Program
-This is the program being applied. Programs are discussed in REFFIXME!
+This is the program being applied. For more information on how
+programs are implemented, @xref{VM Programs}.
 @end table
 
 @node Variables and the VM
@@ -270,14 +271,15 @@ A compiled procedure is a compound object, consisting of its bytecode,
 a reference to any captured lexical variables, an object array, and
 some metadata such as the procedure's arity, name, and documentation.
 You can pick apart these pieces with the accessors in @code{(system vm
-program)}. REFFIXME, for a full API reference.
+program)}. @xref{Compiled Procedures}, for a full API reference.
 
 @cindex object table
 The object array of a compiled procedure, also known as the
 @dfn{object table}, holds all Scheme objects whose values are known
 not to change across invocations of the procedure: constant strings,
 symbols, etc. The object table of a program is initialized right
-before a program is loaded with @code{load-program} REFFIXME.
+before a program is loaded with @code{load-program}.
+@xref{Loading Instructions}, for more information.
 
 Variable objects are one such type of constant object: when a global
 binding is defined, a variable object is associated to it and that
@@ -326,8 +328,8 @@ The second stanza disassembles the compiled lambda. Toplevel variables
 are resolved relative to the module that was current when the
 procedure was created. This lookup occurs lazily, at the first time
 the variable is actually referenced, and the location of the lookup is
-cached so that future references are very cheap. REFFIXME xref
-toplevel-ref, for more details.
+cached so that future references are very cheap. @xref{Environment
+Control Instructions}, for more details.
 
 Then we see a reference to an external variable, corresponding to
 @code{a}. The disassembler doesn't have enough information to give a
@@ -584,7 +586,8 @@ of a procedure is fast: the VM just mmap's the thunk and goes. The
 symbols and pairs associated with the metadata are only created if the
 user asks for them.
 
-The format of the thunk's return value is specified in REFFIXME.
+For information on the format of the thunk's return value,
+@xref{Compiled Procedures}.
 @item
 Optionally, the program's object table, as a vector.
 A program that does not reference toplevel bindings and does not use
@@ -643,9 +646,9 @@ arguments off the stack, and push the result of calling
 @code{scm_apply}.
 
 For compiled procedures, this instruction sets up a new stack frame,
-as described in REFFIXME, and then dispatches to the first instruction
-in the called procedure, relying on the called procedure to return one
-value to the newly-created continuation.
+as described in @ref{Stack Layout}, and then dispatches to the first
+instruction in the called procedure, relying on the called procedure
+to return one value to the newly-created continuation.
 @end deffn
 
 @deffn Instruction goto/args nargs
@@ -692,11 +695,11 @@ Like @code{call}, except that a multiple-value continuation is created
 in addition to a single-value continuation.
 
 The offset (a two-byte value) is an offset within the instruction
-stream; the multiple-value return address in the new frame (see
-frames REFFIXME) will be set to the normal return address plus this
-offset. Instructions at that offset will expect the top value of the
-stack to be the number of values, and below that values themselves,
-pushed separately.
+stream; the multiple-value return address in the new frame
+(@pxref{Stack Layout}) will be set to the normal return address plus
+this offset. Instructions at that offset will expect the top value of
+the stack to be the number of values, and below that values
+themselves, pushed separately.
 @end deffn
 
 @deffn Instruction return/values nvalues
@@ -822,7 +825,7 @@ machine is first entered; compiled Scheme procedures will not contain
 this instruction.
 
 If multiple values have been returned, the SCM value will be a
-multiple-values object (REFFIXME scm_values).
+multiple-values object (@pxref{Multiple Values}).
 @end deffn
 
 @deffn Instruction break