@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C)  2008
@c   Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Compiling to the Virtual Machine
@section Compiling to the Virtual Machine

Compilers have a mystique about them that is attractive and
off-putting at the same time. They are attractive because they are
magical -- they transform inert text into live results, like throwing
the switch on Frankenstein. However, this magic is perceived by many
to be impenetrable.

This section aims to pull back the veil from over Guile's compiler
implementation, and pay attention to the small man behind the curtain.

@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
know how to compile your @file{.scm} file.

@menu
* Compiler Tower::
* The Scheme Compiler::
* GHIL::
* GLIL::
* Object Code::
* Extending the Compiler::
@end menu

FIXME: document the new repl somewhere?

@node Compiler Tower
@subsection Compiler Tower

Guile's compiler is quite simple, actually -- its @emph{compilers}, to
put it more accurately. Guile defines a tower of languages, starting
at Scheme and progressively simplifying down to languages that
resemble the VM instruction set (@pxref{Instruction Set}).

Each language knows how to compile to the next, so each step is simple
and understandable. Furthermore, this set of languages is not
hardcoded into Guile, so it is possible for the user to add new
high-level languages, new passes, or even different compilation
targets.

Languages are registered in the module, @code{(system base language)}:

@example
(use-modules (system base language))
@end example

They are registered with the @code{define-language} form.

@deffn {Scheme Syntax} define-language @
name title version reader printer @
[parser=#f] [read-file=#f] [compilers='()] [evaluator=#f]
Define a language.

This syntax defines a @code{#<language>} object, bound to @var{name}
in the current environment.
In addition, the language will be added to the global language set.
For example, this is the language definition for Scheme:

@example
(define-language scheme
  #:title       "Guile Scheme"
  #:version     "0.5"
  #:reader      read
  #:read-file   read-file
  #:compilers   `((,ghil . ,compile-ghil))
  #:evaluator   (lambda (x module) (primitive-eval x))
  #:printer     write)
@end example

In this example, from @code{(language scheme spec)}, @code{read-file}
reads expressions from a port and wraps them in a @code{begin} block.
@end deffn

The interesting thing about having languages defined this way is that
they present a uniform interface to the read-eval-print loop. This
allows the user to change the current language of the REPL:

@example
$ guile
Guile Scheme interpreter 0.5 on Guile 1.9.0
Copyright (C) 2001-2008 Free Software Foundation, Inc.

Enter `,help' for help.
scheme@@(guile-user)> ,language ghil
Guile High Intermediate Language (GHIL) interpreter 0.3 on Guile 1.9.0
Copyright (C) 2001-2008 Free Software Foundation, Inc.

Enter `,help' for help.
ghil@@(guile-user)>
@end example

Languages can be looked up by name, as they were above.

@deffn {Scheme Procedure} lookup-language name
Looks up a language named @var{name}, autoloading it if necessary.

Languages are autoloaded by looking for a variable named @var{name} in
a module named @code{(language @var{name} spec)}.

The language object will be returned, or @code{#f} if no language with
that name exists.
@end deffn

Defining languages this way allows us to programmatically determine
the necessary steps for compiling code from one language to another.

@deffn {Scheme Procedure} lookup-compilation-order from to
Recursively traverses the set of languages to which @var{from} can
compile, depth-first, and returns the first path that can transform
@var{from} to @var{to}. Returns @code{#f} if no path is found.

This function memoizes its results in a cache that is invalidated by
subsequent calls to @code{define-language}, so it should be quite
fast.
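
The depth-first search described above can be sketched as follows. This
is only an illustration under stated assumptions: the @code{compilers}
table and the @code{find-path} procedure are hypothetical names, not
part of @code{(system base language)}, and the sketch leaves out the
memoization.

@example
;; Hypothetical stand-in for the language registry: each entry maps a
;; language name to the names of the languages it can compile to.
(define compilers
  '((scheme  . (ghil))
    (ghil    . (glil))
    (glil    . (objcode))
    (objcode . (value))
    (value   . ())))

;; Depth-first search for the first path from FROM to TO.
;; Returns a list of language names, or #f if there is no path.
(define (find-path from to)
  (let search ((lang from) (seen '()))
    (cond
     ((eq? lang to) (list lang))
     ((memq lang seen) #f)              ; already visited: avoid cycles
     (else
      (let try ((targets (cond ((assq lang compilers) => cdr)
                               (else '()))))
        (cond
         ((null? targets) #f)           ; every target failed
         ((search (car targets) (cons lang seen))
          => (lambda (path) (cons lang path)))
         (else (try (cdr targets)))))))))

(find-path 'scheme 'objcode)   ; => (scheme ghil glil objcode)
(find-path 'value 'scheme)     ; => #f
@end example

The sketch returns a list of language names; the real procedure may
represent the path differently.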
@end deffn

There is a notion of a ``current language'', which is maintained in
the @code{*current-language*} fluid. This language is normally Scheme,
and may be rebound by the user. The runtime compilation interfaces
(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
and target languages.

The normal tower of languages when compiling Scheme goes like this:

@itemize
@item Scheme, which we know and love
@item Guile High Intermediate Language (GHIL)
@item Guile Low Intermediate Language (GLIL)
@item Object code
@end itemize

Object code may be serialized to disk directly, though it has a cookie
and version prepended to the front. But when compiling Scheme at
runtime, you want a Scheme value, e.g. a compiled procedure. For this
reason, so as not to break the abstraction, Guile defines a fake
language, @code{value}. Compiling to @code{value} loads the object
code into a procedure, and wakes the sleeping giant.

Perhaps this strangeness can be explained by example:
@code{compile-file} defaults to compiling to object code, because it
produces object code that has to live in the barren world outside the
Guile runtime; but @code{compile} defaults to compiling to
@code{value}, as its product re-enters the Guile world.

Indeed, the process of compilation can circulate through these
different worlds indefinitely, as shown by the following quine:

@example
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
@end example

@node The Scheme Compiler
@subsection The Scheme Compiler

The job of the Scheme compiler is to expand all macros and to resolve
all symbols to lexical variables. Its target language, GHIL, is fairly
close to Scheme itself, so this process is not very complicated.

The Scheme compiler is driven by a table of @dfn{translators},
declared with the @code{define-scheme-translator} form, defined in the
module, @code{(language scheme compile-ghil)}.

@deffn {Scheme Syntax} define-scheme-translator head clause1 clause2...
The best documentation of this form is probably an example. Here is
the translator for @code{if}:

@example
(define-scheme-translator if
  ;; (if TEST THEN [ELSE])
  ((,test ,then)
   (make-ghil-if e l (retrans test) (retrans then)
                 (retrans '(begin))))
  ((,test ,then ,else)
   (make-ghil-if e l (retrans test) (retrans then)
                 (retrans else))))
@end example

The match syntax is from the @code{pmatch} macro, defined in
@code{(system base pmatch)}. The result of a clause should be a valid
GHIL value. If no clause matches, a syntax error is signalled.

In the body of the clauses, the following bindings are introduced:

@itemize
@item @code{e}, the current environment
@item @code{l}, the current source location (or @code{#f})
@item @code{retrans}, a procedure that may be called to compile
subexpressions
@end itemize

Note that translators are looked up by @emph{value}, not by name. That
is to say, the translator is keyed under the @emph{value} of
@code{if}, which normally prints as
@code{#<primitive-builtin-macro! if>}.
@end deffn

Users can extend the compiler by defining new translators.
Additionally, some forms can be inlined directly to instructions; see
@ref{Inlined Scheme Instructions}, for a list. The actual inliners are
defined in @code{(language scheme inline)}:

@deffn {Scheme Syntax} define-inline head arity1 result1 arity2 result2...
Defines an inliner for @code{head}. As in
@code{define-scheme-translator}, inliners are keyed by value and not
by name.

Expressions are matched on their arities. For example:

@example
(define-inline eq?
  (x y) (eq? x y))
@end example

This inlines calls to the Scheme procedure, @code{eq?}, to the
instruction @code{eq?}.

A more complicated example would be:

@example
(define-inline +
  () 0
  (x) x
  (x y) (add x y)
  (x y . rest) (add x (+ y . rest)))
@end example
@end deffn

Compilers take two arguments, an expression and an environment, and
return two values as well: an expression in the target language, and
an environment suitable for the target language.
The format of the environment is language-dependent.

For Scheme, an environment may be one of three things:

@itemize
@item @code{#f}, in which case compilation is performed in the context
of the current module;
@item a module, which specifies the context of the compilation; or
@item a @dfn{compile environment}, which specifies lexical variables
as well.
@end itemize

The format of a compile environment for Scheme is
@code{(@var{module} @var{lexicals} . @var{externals})}, though users
are strongly discouraged from constructing these environments
themselves. Instead, if you need this functionality -- as in GOOPS'
dynamic method compiler -- capture an environment with
@code{compile-time-environment}, then pass that environment to
@code{compile}.

@deffn {Scheme Procedure} compile-time-environment
A special function known to the compiler that, when compiled, will
return a representation of the lexical environment in place at compile
time. Useful for supporting some forms of dynamic compilation. Returns
@code{#f} if called from the interpreter.
@end deffn

@node GHIL
@subsection GHIL

GHIL is a structured, typed intermediate language, close to Scheme,
with an s-expression representation. To experiment with it, switch the
REPL language with @code{,language ghil}.

FIXME: document the reified format, as it's more interesting, and
gives you an idea.

All GHIL expressions have environment and location pointers.

@deffn {GHIL Expression} quote exp
A quoted expression.
@end deffn

@deffn {GHIL Expression} quasiquote exp
A quasiquoted expression. The parse format understands the normal
@code{unquote} and @code{unquote-splicing} forms as in normal Scheme.
When constructing @var{exp} programmatically, you will need to call
@code{make-ghil-unquote} and @code{make-ghil-unquote-splicing} as
appropriate.
@end deffn

@deffn {GHIL Expression} lambda syms rest meta . body
A closure. @var{syms} is the argument list, as a list of symbols.
@var{rest} is a boolean, which is @code{#t} if and only if the last
argument is a rest argument. @var{meta} is an association list of
properties.
The actual @var{body} should be a list of GHIL expressions.
@end deffn

@deffn {GHIL Expression} void
The unspecified value.
@end deffn

@deffn {GHIL Expression} begin . body
Like Scheme's @code{begin}.
@end deffn

@deffn {GHIL Expression} bind syms exprs . body
Like a deconstructed @code{let}: each element of @var{syms} will be
bound to the corresponding GHIL expression in @var{exprs}.
@end deffn

@deffn {GHIL Expression} bindrec syms exprs . body
As @code{bind} is to @code{let}, so @code{bindrec} is to
@code{letrec}.
@end deffn

@deffn {GHIL Expression} set! sym val
Like Scheme's @code{set!}.
@end deffn

@deffn {GHIL Expression} define sym val
Like Scheme's @code{define}, but without the lambda sugar of course.
@end deffn

@deffn {GHIL Expression} if test then else
A conditional. Note that @var{else} is not optional.
@end deffn

@deffn {GHIL Expression} and . exps
Like Scheme's @code{and}.
@end deffn

@deffn {GHIL Expression} or . exps
Like Scheme's @code{or}.
@end deffn

@deffn {GHIL Expression} mv-bind syms rest producer . body
Like Scheme's @code{receive} -- binds the values returned by applying
@var{producer}, which should be a thunk, to the @code{lambda}-like
bindings described by @var{syms} and @var{rest}.
@end deffn

@deffn {GHIL Expression} call proc . args
A procedure call.
@end deffn

@deffn {GHIL Expression} mv-call producer consumer
Like Scheme's @code{call-with-values}.
@end deffn

@deffn {GHIL Expression} inline op . args
An inlined VM instruction. @var{op} should be the instruction name as
a symbol, and @var{args} should be its arguments, as GHIL expressions.
@end deffn

@deffn {GHIL Expression} values . values
Like Scheme's @code{values}.
@end deffn

@deffn {GHIL Expression} values* . values
@var{values} are as in the Scheme expression, @code{(apply values .
@var{values})}.
@end deffn

@deffn {GHIL Expression} compile-time-environment
Produces, at runtime, a reification of the environment at compile
time.
@end deffn

FIXME: document GHIL environments: @code{ghil-var-for-ref!},
@code{ghil-var-for-set!}, @code{ghil-var-define!},
@code{ghil-var-at-module!}.

Some pre-optimization is performed on GHIL.

The real name of the game is closure elimination -- fixing
@code{letrec}.

@node GLIL
@subsection GLIL

GLIL is a structured, typed intermediate language, close to object
code.

It passes the environment through.

GLIL has no @code{let}, no @code{lambda}, no closures, just labels and
branches and constants and code. Well, there's a bit more, but that's
the flavor of GLIL.

Compiled code will effectively be a thunk, of no arguments, but
optionally closing over some number of variables (which should be
captured via @code{make-closure}, @pxref{Loading Instructions}).

@node Object Code
@subsection Object Code

FIXME: describe the environment -- module + externals (the actual
values!).

The environment is used when compiling to @code{value} -- effectively
calling the thunk from @code{objcode->program} with a certain current
module and with those externals. This means you can recompile a
closure at runtime, a trick that GOOPS uses.

@node Extending the Compiler
@subsection Extending the Compiler

FIXME: topics to cover here:

@itemize
@item JIT compilation
@item AOT compilation
@item link to what Dybvig did
@item profiling
@item startup time
@end itemize