@c -*-texinfo-*-
|
|
@c This is part of the GNU Guile Reference Manual.
|
|
@c Copyright (C) 2008, 2009, 2010, 2011, 2012, 2013, 2014
|
|
@c Free Software Foundation, Inc.
|
|
@c See the file guile.texi for copying conditions.
|
|
|
|
@node Compiling to the Virtual Machine
|
|
@section Compiling to the Virtual Machine
|
|
|
|
Compilers! The word itself inspires excitement and awe, even among
|
|
experienced practitioners. But a compiler is just a program: an
|
|
eminently hackable thing. This section aims to describe Guile's
|
|
compiler in such a way that interested Scheme hackers can feel
|
|
comfortable reading and extending it.
|
|
|
|
@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
|
|
know how to compile your @code{.scm} file.
|
|
|
|
@menu
|
|
* Compiler Tower::
|
|
* The Scheme Compiler::
|
|
* Tree-IL::
|
|
* Continuation-Passing Style::
|
|
* Bytecode::
|
|
* Writing New High-Level Languages::
|
|
* Extending the Compiler::
|
|
@end menu
|
|
|
|
@node Compiler Tower
|
|
@subsection Compiler Tower
|
|
|
|
Guile's compiler is quite simple -- its @emph{compilers}, to put it more
|
|
accurately. Guile defines a tower of languages, starting at Scheme and
|
|
progressively simplifying down to languages that resemble the VM
|
|
instruction set (@pxref{Instruction Set}).
|
|
|
|
Each language knows how to compile to the next, so each step is simple
|
|
and understandable. Furthermore, this set of languages is not hardcoded
|
|
into Guile, so it is possible for the user to add new high-level
|
|
languages, new passes, or even different compilation targets.
|
|
|
|
Languages are registered in the module, @code{(system base language)}:
|
|
|
|
@example
|
|
(use-modules (system base language))
|
|
@end example
|
|
|
|
They are registered with the @code{define-language} form.
|
|
|
|
@deffn {Scheme Syntax} define-language @
|
|
[#:name] [#:title] [#:reader] [#:printer] @
|
|
[#:parser=#f] [#:compilers='()] @
|
|
[#:decompilers='()] [#:evaluator=#f] @
|
|
[#:joiner=#f] [#:for-humans?=#t] @
|
|
[#:make-default-environment=make-fresh-user-module]
|
|
Define a language.
|
|
|
|
This syntax defines a @code{<language>} object, bound to @var{name} in
|
|
the current environment. In addition, the language will be added to the
|
|
global language set. For example, this is the language definition for
|
|
Scheme:
|
|
|
|
@example
|
|
(define-language scheme
|
|
#:title "Scheme"
|
|
#:reader (lambda (port env) ...)
|
|
#:compilers `((tree-il . ,compile-tree-il))
|
|
#:decompilers `((tree-il . ,decompile-tree-il))
|
|
#:evaluator (lambda (x module) (primitive-eval x))
|
|
#:printer write
|
|
#:make-default-environment (lambda () ...))
|
|
@end example
|
|
@end deffn
|
|
|
|
The interesting thing about having languages defined this way is that
|
|
they present a uniform interface to the read-eval-print loop. This
|
|
allows the user to change the current language of the REPL:
|
|
|
|
@example
|
|
scheme@@(guile-user)> ,language tree-il
|
|
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
|
|
tree-il@@(guile-user)> ,L scheme
|
|
Happy hacking with Scheme! To switch back, type `,L tree-il'.
|
|
scheme@@(guile-user)>
|
|
@end example
|
|
|
|
Languages can be looked up by name, as they were above.
|
|
|
|
@deffn {Scheme Procedure} lookup-language name
|
|
Looks up a language named @var{name}, autoloading it if necessary.
|
|
|
|
Languages are autoloaded by looking for a variable named @var{name} in
|
|
a module named @code{(language @var{name} spec)}.
|
|
|
|
The language object will be returned, or @code{#f} if there does not
|
|
exist a language with that name.
|
|
@end deffn
|
|
|
|
Defining languages this way allows us to programmatically determine
|
|
the necessary steps for compiling code from one language to another.
|
|
|
|
@deffn {Scheme Procedure} lookup-compilation-order from to
|
|
Recursively traverses the set of languages to which @var{from} can
|
|
compile, depth-first, and returns the first path that can transform
|
|
@var{from} to @var{to}. Returns @code{#f} if no path is found.
|
|
|
|
This function memoizes its results in a cache that is invalidated by
|
|
subsequent calls to @code{define-language}, so it should be quite
|
|
fast.
|
|
@end deffn
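As a brief sketch of the two lookup procedures above, the following looks up the Tree-IL language by name and inspects it. It assumes the accessors @code{language-name} and @code{language-title}, which @code{(system base language)} exports alongside @code{lookup-language}. The exact shape of the value returned by @code{lookup-compilation-order} is an implementation detail that may differ between Guile versions, so the example only prints it.

```scheme
(use-modules (system base language))

;; Look up a language by name; this autoloads (language tree-il spec).
(define tree-il-language (lookup-language 'tree-il))

(write (language-name tree-il-language))    ; the symbol the language was defined with
(newline)
(write (language-title tree-il-language))   ; the human-readable title
(newline)

;; Ask for the chain of compilers leading from Scheme to Tree-IL.
;; The printed representation is version-dependent.
(define scheme->tree-il-path
  (lookup-compilation-order 'scheme 'tree-il))
(write scheme->tree-il-path)
(newline)
```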
|
|
|
|
There is a notion of a ``current language'', which is maintained in the
|
|
@code{current-language} parameter, defined in the core @code{(guile)}
|
|
module. This language is normally Scheme, and may be rebound by the
|
|
user. The run-time compilation interfaces
|
|
(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
|
|
and target languages.
|
|
|
|
The normal tower of languages when compiling Scheme goes like this:
|
|
|
|
@itemize
|
|
@item Scheme
|
|
@item Tree Intermediate Language (Tree-IL)
|
|
@item Continuation-Passing Style (CPS)
|
|
@item Bytecode
|
|
@end itemize
|
|
|
|
As discussed before (@pxref{Object File Format}), bytecode is in ELF
|
|
format, ready to be serialized to disk. But when compiling Scheme at
|
|
run time, you want a Scheme value: for example, a compiled procedure.
|
|
For this reason, so as not to break the abstraction, Guile defines a
|
|
fake language at the bottom of the tower:
|
|
|
|
@itemize
|
|
@item Value
|
|
@end itemize
|
|
|
|
Compiling to @code{value} loads the bytecode into a procedure, turning
|
|
cold bytes into warm code.
|
|
|
|
Perhaps this strangeness can be explained by example:
|
|
@code{compile-file} defaults to compiling to bytecode, because it
|
|
produces object code that has to live in the barren world outside the
|
|
Guile runtime; but @code{compile} defaults to compiling to @code{value},
|
|
as its product re-enters the Guile world.
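The difference between the two targets can be seen with a small sketch. It assumes a Guile in which bytecode is the ELF image described above, so that compiling to @code{bytecode} yields a bytevector; the variable names are only illustrative.

```scheme
(use-modules (system base compile)
             (rnrs bytevectors))

;; The default target is `value': the code is compiled, loaded, and run,
;; and the result comes back as an ordinary Scheme value.
(define as-value (compile '(+ 1 2)))

;; Stopping at `bytecode' yields the object code itself instead.
(define as-bytecode (compile '(+ 1 2) #:to 'bytecode))

(write as-value)
(newline)
(write (bytevector? as-bytecode))
(newline)
```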
|
|
|
|
@c FIXME: This doesn't work anymore :( Should we add some kind of
|
|
@c special GC pass, or disclaim this kind of code, or what?
|
|
|
|
Indeed, the process of compilation can circulate through these
|
|
different worlds indefinitely, as shown by the following quine:
|
|
|
|
@example
|
|
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
|
|
@end example
|
|
|
|
@node The Scheme Compiler
|
|
@subsection The Scheme Compiler
|
|
|
|
The job of the Scheme compiler is to expand all macros and all of Scheme
|
|
to its most primitive expressions. The definition of ``primitive
|
|
expression'' is given by the inventory of constructs provided by
|
|
Tree-IL, the target language of the Scheme compiler: procedure calls,
|
|
conditionals, lexical references, and so on. This is described more
|
|
fully in the next section.
|
|
|
|
The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
|
|
that it is completely implemented by the macro expander. Since the
|
|
macro expander has to run over all of the source code already in order
|
|
to expand macros, it might as well do the analysis at the same time,
|
|
producing Tree-IL expressions directly.
|
|
|
|
Because this compiler is actually the macro expander, it is extensible.
|
|
Any macro which the user writes becomes part of the compiler.
|
|
|
|
The Scheme-to-Tree-IL expander may be invoked using the generic
|
|
@code{compile} procedure:
|
|
|
|
@lisp
|
|
(compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
|
|
@result{}
|
|
#<tree-il (call (toplevel +) (const 1) (const 2))>
|
|
@end lisp
|
|
|
|
@code{(compile @var{foo} #:from 'scheme #:to 'tree-il)} is entirely
|
|
equivalent to calling the macro expander as @code{(macroexpand @var{foo}
|
|
'c '(compile load eval))}. @xref{Macro Expansion}.
|
|
@code{compile-tree-il}, the procedure dispatched by @code{compile} to
|
|
@code{'tree-il}, is a small wrapper around @code{macroexpand}, to make
|
|
it conform to the general form of compiler procedures in Guile's
|
|
language tower.
|
|
|
|
Compiler procedures take three arguments: an expression, an
|
|
environment, and a keyword list of options. They return three values:
|
|
the compiled expression, the corresponding environment for the target
|
|
language, and a ``continuation environment''. The compiled expression
|
|
and environment will serve as input to the next language's compiler.
|
|
The ``continuation environment'' can be used to compile another
|
|
expression from the same source language within the same module.
|
|
|
|
For example, you might compile the expression, @code{(define-module
|
|
(foo))}. This will result in a Tree-IL expression and environment. But
|
|
if you compiled a second expression, you would want to take into
|
|
account the compile-time effect of compiling the previous expression,
|
|
which puts the user in the @code{(foo)} module. That is the purpose of the
|
|
``continuation environment''; you would pass it as the environment
|
|
when compiling the subsequent expression.
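The calling convention described above can be sketched with a hypothetical pass-through compiler. The name @code{compile-identity} is made up for illustration and is not part of Guile; it simply returns its input unchanged, together with the same environment as both the target environment and the continuation environment.

```scheme
;; A hypothetical compiler procedure: three arguments in, three values out.
(define (compile-identity exp env opts)
  (values exp     ; the "compiled" expression (unchanged here)
          env     ; the environment for the target language
          env))   ; the continuation environment for the next expression

;; Receive the three values, as a driver for the language tower would.
(call-with-values
    (lambda () (compile-identity '(+ 1 2) (current-module) '()))
  (lambda (compiled target-env continuation-env)
    (write compiled)
    (newline)
    (write (eq? target-env continuation-env))
    (newline)))
```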
|
|
|
|
For Scheme, an environment is a module. By default, the @code{compile}
|
|
and @code{compile-file} procedures compile in a fresh module, such
|
|
that bindings and macros introduced by the expression being compiled
|
|
are isolated:
|
|
|
|
@example
|
|
(eq? (current-module) (compile '(current-module)))
|
|
@result{} #f
|
|
|
|
(compile '(define hello 'world))
|
|
(defined? 'hello)
|
|
@result{} #f
|
|
|
|
(define / *)
|
|
(eq? (compile '/) /)
|
|
@result{} #f
|
|
@end example
|
|
|
|
Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
|
|
@code{current-reader}}) are isolated:
|
|
|
|
@example
|
|
(compile '(fluid-set! current-reader (lambda args 'fail)))
|
|
(fluid-ref current-reader)
|
|
@result{} #f
|
|
@end example
|
|
|
|
Nevertheless, having the compiler and @dfn{compilee} share the same name
|
|
space can be achieved by explicitly passing @code{(current-module)} as
|
|
the compilation environment:
|
|
|
|
@example
|
|
(define hello 'world)
|
|
(compile 'hello #:env (current-module))
|
|
@result{} world
|
|
@end example
|
|
|
|
@node Tree-IL
|
|
@subsection Tree-IL
|
|
|
|
Tree Intermediate Language (Tree-IL) is a structured intermediate
|
|
language that is close in expressive power to Scheme. It is an
|
|
expanded, pre-analyzed Scheme.
|
|
|
|
Tree-IL is ``structured'' in the sense that its representation is
|
|
based on records, not S-expressions. This gives a rigidity to the
|
|
language that ensures that compiling to a lower-level language only
|
|
requires a limited set of transformations. For example, the Tree-IL
|
|
type @code{<const>} is a record type with two fields, @code{src} and
|
|
@code{exp}. Instances of this type are created via @code{make-const}.
|
|
Fields of this type are accessed via the @code{const-src} and
|
|
@code{const-exp} procedures. There is also a predicate, @code{const?}.
|
|
@xref{Records}, for more information on records.
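As a minimal sketch of the record interface just described, the following builds a @code{<const>} and takes it apart again. It additionally uses @code{unparse-tree-il} from @code{(language tree-il)} to obtain the S-expression form of the record.

```scheme
(use-modules (language tree-il))

;; Build a <const> record.  The first field is src; #f means that no
;; source location is attached.
(define three (make-const #f 3))

(write (const? three))           (newline)   ; predicate
(write (const-src three))        (newline)   ; source location field
(write (const-exp three))        (newline)   ; the constant itself
(write (unparse-tree-il three))  (newline)   ; S-expression form
```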
|
|
|
|
@c alpha renaming
|
|
|
|
All Tree-IL types have a @code{src} slot, which holds source location
|
|
information for the expression. This information, if present, will be
|
|
residualized into the compiled object code, allowing backtraces to
|
|
show source information. The format of @code{src} is the same as that
|
|
returned by Guile's @code{source-properties} function. @xref{Source
|
|
Properties}, for more information.
|
|
|
|
Although Tree-IL objects are represented internally using records,
|
|
there is also an equivalent S-expression external representation for
|
|
each kind of Tree-IL. For example, the S-expression representation
|
|
of the @code{#<const src: #f exp: 3>} expression would be:
|
|
|
|
@example
|
|
(const 3)
|
|
@end example
|
|
|
|
Users may program with this format directly at the REPL:
|
|
|
|
@example
|
|
scheme@@(guile-user)> ,language tree-il
|
|
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
|
|
tree-il@@(guile-user)> (call (primitive +) (const 32) (const 10))
|
|
@result{} 42
|
|
@end example
|
|
|
|
The @code{src} fields are left out of the external representation.
|
|
|
|
One may create Tree-IL objects from their external representations by
|
|
calling @code{parse-tree-il}, the reader for Tree-IL. If any source
|
|
information is attached to the input S-expression, it will be
|
|
propagated to the resulting Tree-IL expressions. This is probably the
|
|
easiest way to compile to Tree-IL: just make the appropriate external
|
|
representations in S-expression format, and let @code{parse-tree-il}
|
|
take care of the rest.
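A short sketch of that workflow follows: an S-expression is parsed into Tree-IL records and then handed to @code{compile}. It assumes the @code{call} form of the external representation used in this section, and relies on @code{compile} choosing a fresh default environment in which @code{+} has its usual binding.

```scheme
(use-modules (language tree-il)
             (system base compile))

;; Parse the external representation into Tree-IL records.
(define expr
  (parse-tree-il '(call (toplevel +) (const 32) (const 10))))

(write (call? expr))
(newline)

;; Compile the records the rest of the way down and run the result.
(define answer (compile expr #:from 'tree-il #:to 'value))
(write answer)
(newline)
```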
|
|
|
|
@deftp {Scheme Variable} <void> src
|
|
@deftpx {External Representation} (void)
|
|
An empty expression. In practice, equivalent to Scheme's @code{(if #f
|
|
#f)}.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <const> src exp
|
|
@deftpx {External Representation} (const @var{exp})
|
|
A constant.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <primitive-ref> src name
|
|
@deftpx {External Representation} (primitive @var{name})
|
|
A reference to a ``primitive''. A primitive is a procedure that, when
|
|
compiled, may be open-coded. For example, @code{cons} is usually
|
|
recognized as a primitive, so that it compiles down to a single
|
|
instruction.
|
|
|
|
Compilation of Tree-IL usually begins with a pass that resolves some
|
|
@code{<module-ref>} and @code{<toplevel-ref>} expressions to
|
|
@code{<primitive-ref>} expressions. The actual compilation pass has
|
|
special cases for calls to certain primitives, like @code{apply} or
|
|
@code{cons}.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <lexical-ref> src name gensym
|
|
@deftpx {External Representation} (lexical @var{name} @var{gensym})
|
|
A reference to a lexically-bound variable. The @var{name} is the
|
|
original name of the variable in the source program. @var{gensym} is a
|
|
unique identifier for this variable.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <lexical-set> src name gensym exp
|
|
@deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
|
|
Sets a lexically-bound variable.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <module-ref> src mod name public?
|
|
@deftpx {External Representation} (@@ @var{mod} @var{name})
|
|
@deftpx {External Representation} (@@@@ @var{mod} @var{name})
|
|
A reference to a variable in a specific module. @var{mod} should be
|
|
the name of the module, e.g.@: @code{(guile-user)}.
|
|
|
|
If @var{public?} is true, the variable named @var{name} will be looked
|
|
up in @var{mod}'s public interface, and serialized with @code{@@};
|
|
otherwise it will be looked up among the module's private bindings,
|
|
and is serialized with @code{@@@@}.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <module-set> src mod name public? exp
|
|
@deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
|
|
@deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
|
|
Sets a variable in a specific module.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <toplevel-ref> src name
|
|
@deftpx {External Representation} (toplevel @var{name})
|
|
References a variable from the current procedure's module.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <toplevel-set> src name exp
|
|
@deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
|
|
Sets a variable in the current procedure's module.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <toplevel-define> src name exp
|
|
@deftpx {External Representation} (define (toplevel @var{name}) @var{exp})
|
|
Defines a new top-level variable in the current procedure's module.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <conditional> src test then else
|
|
@deftpx {External Representation} (if @var{test} @var{then} @var{else})
|
|
A conditional. Note that @var{else} is not optional.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <call> src proc args
|
|
@deftpx {External Representation} (call @var{proc} . @var{args})
|
|
A procedure call.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <primcall> src name args
|
|
@deftpx {External Representation} (primcall @var{name} . @var{args})
|
|
A call to a primitive. Equivalent to @code{(call (primitive @var{name})
|
|
. @var{args})}. This construct is often more convenient to generate and
|
|
analyze than @code{<call>}.
|
|
|
|
As part of the compilation process, instances of @code{(call (primitive
|
|
@var{name}) . @var{args})} are transformed into primcalls.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <seq> src head tail
|
|
@deftpx {External Representation} (seq @var{head} @var{tail})
|
|
A sequence. The semantics is that @var{head} is evaluated first, and
|
|
any resulting values are ignored. Then @var{tail} is evaluated, in tail
|
|
position.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <lambda> src meta body
|
|
@deftpx {External Representation} (lambda @var{meta} @var{body})
|
|
A closure. @var{meta} is an association list of properties for the
|
|
procedure. @var{body} is a single Tree-IL expression of type
|
|
@code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
|
|
an alternate clause, this makes Tree-IL's @code{<lambda>} have the
|
|
expressiveness of Scheme's @code{case-lambda}.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <lambda-case> src req opt rest kw inits gensyms body alternate
|
|
@deftpx {External Representation} @
|
|
(lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
|
|
@var{body})@
|
|
[@var{alternate}])
|
|
One clause of a @code{case-lambda}. A @code{lambda} expression in
|
|
Scheme is treated as a @code{case-lambda} with one clause.
|
|
|
|
@var{req} is a list of the procedure's required arguments, as symbols.
|
|
@var{opt} is a list of the optional arguments, or @code{#f} if there
|
|
are no optional arguments. @var{rest} is the name of the rest
|
|
argument, or @code{#f}.
|
|
|
|
@var{kw} is a list of the form, @code{(@var{allow-other-keys?}
|
|
(@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
|
|
keyword corresponding to the argument named @var{name}, and whose
|
|
corresponding gensym is @var{var}. @var{inits} are Tree-IL expressions
|
|
corresponding to all of the optional and keyword arguments, evaluated to
|
|
bind variables whose value is not supplied by the procedure caller.
|
|
Each @var{init} expression is evaluated in the lexical context of
|
|
previously bound variables, from left to right.
|
|
|
|
@var{gensyms} is a list of gensyms corresponding to all arguments:
|
|
first all of the required arguments, then the optional arguments if
|
|
any, then the rest argument if any, then all of the keyword arguments.
|
|
|
|
@var{body} is the body of the clause. If the procedure is called with
|
|
an appropriate number of arguments, @var{body} is evaluated in tail
|
|
position. Otherwise, if there is an @var{alternate}, it should be a
|
|
@code{<lambda-case>} expression, representing the next clause to try.
|
|
If there is no @var{alternate}, a wrong-number-of-arguments error is
|
|
signaled.
|
|
@end deftp
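To make the layout of a @code{lambda-case} clause concrete, here is a hedged sketch that writes the equivalent of @code{((lambda (x) (+ x 1)) 41)} in the external representation and compiles it. The gensym @code{x-1} is an arbitrary symbol picked for this example; any symbol that is unique within the expression would do.

```scheme
(use-modules (language tree-il)
             (system base compile))

;; One required argument x, no optional, rest, or keyword arguments,
;; no inits, and one gensym (x-1) standing for x.
(define add-one-call
  (parse-tree-il
   '(call (lambda ()
            (lambda-case
             (((x) #f #f #f () (x-1))
              (call (toplevel +) (lexical x x-1) (const 1)))))
          (const 41))))

(define result (compile add-one-call #:from 'tree-il #:to 'value))
(write result)
(newline)
```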
|
|
|
|
@deftp {Scheme Variable} <let> src names gensyms vals exp
|
|
@deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
|
|
Lexical binding, like Scheme's @code{let}. @var{names} are the original
|
|
binding names, @var{gensyms} are gensyms corresponding to the
|
|
@var{names}, and @var{vals} are Tree-IL expressions for the values.
|
|
@var{exp} is a single Tree-IL expression.
|
|
@end deftp
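The following sketch combines @code{let} with the @code{if} and @code{void} forms described earlier. As before, the gensym @code{x-1} is an arbitrary unique symbol chosen for the example.

```scheme
(use-modules (language tree-il)
             (system base compile))

;; Roughly: (let ((x 40)) (if (even? x) (+ x 2) (if #f #f)))
(define let-expr
  (parse-tree-il
   '(let (x) (x-1) ((const 40))
      (if (call (toplevel even?) (lexical x x-1))
          (call (toplevel +) (lexical x x-1) (const 2))
          (void)))))

(define let-result (compile let-expr #:from 'tree-il #:to 'value))
(write let-result)
(newline)
```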
|
|
|
|
@deftp {Scheme Variable} <letrec> src in-order? names gensyms vals exp
|
|
@deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
|
|
@deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
|
|
A version of @code{<let>} that creates recursive bindings, like
|
|
Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <prompt> src escape-only? tag body handler
|
|
@deftpx {External Representation} (prompt @var{escape-only?} @var{tag} @var{body} @var{handler})
|
|
A dynamic prompt. Instates a prompt named @var{tag}, an expression,
|
|
during the dynamic extent of the execution of @var{body}, also an
|
|
expression. If an abort occurs to this prompt, control will be passed
|
|
to @var{handler}, also an expression, which should be a procedure. The
|
|
first argument to the handler procedure will be the captured
|
|
continuation, followed by all of the values passed to the abort. If
|
|
@var{escape-only?} is true, the handler should be a @code{<lambda>} with
|
|
a single @code{<lambda-case>} body expression with no optional or
|
|
keyword arguments, and no alternate, and whose first argument is
|
|
unreferenced. @xref{Prompts}, for more information.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <abort> src tag args tail
|
|
@deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
|
|
An abort to the nearest prompt with the name @var{tag}, an expression.
|
|
@var{args} should be a list of expressions to pass to the prompt's
|
|
handler, and @var{tail} should be an expression that will evaluate to
|
|
a list of additional arguments. An abort will save the partial
|
|
continuation, which may later be reinstated, resulting in the
|
|
@code{<abort>} expression evaluating to some number of values.
|
|
@end deftp
|
|
|
|
There are two Tree-IL constructs that are not normally produced by
|
|
higher-level compilers, but instead are generated during the
|
|
source-to-source optimization and analysis passes that the Tree-IL
|
|
compiler does. Users should not generate these expressions directly,
|
|
unless they feel very clever, as the default analysis pass will generate
|
|
them as necessary.
|
|
|
|
@deftp {Scheme Variable} <let-values> src names gensyms exp body
|
|
@deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
|
|
Like Scheme's @code{receive} -- binds the values returned by
|
|
evaluating @var{exp} to the @code{lambda}-like bindings described by
|
|
@var{gensyms}. That is to say, @var{gensyms} may be an improper list.
|
|
|
|
@code{<let-values>} is an optimization of a @code{<call>} to the
|
|
primitive, @code{call-with-values}.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <fix> src names gensyms vals body
|
|
@deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
|
|
Like @code{<letrec>}, but only for @var{vals} that are unset
|
|
@code{lambda} expressions.
|
|
|
|
@code{fix} is an optimization of @code{letrec} (and @code{let}).
|
|
@end deftp
|
|
|
|
Tree-IL is a convenient compilation target from source languages. It
|
|
can be convenient as a medium for optimization, though CPS is usually
|
|
better. The strength of Tree-IL is that it does not fix order of
|
|
evaluation, so it makes some code motion a bit easier.
|
|
|
|
Optimization passes performed on Tree-IL currently include:
|
|
|
|
@itemize
|
|
@item Open-coding (turning toplevel-refs into primitive-refs,
|
|
and calls to primitives to primcalls)
|
|
@item Partial evaluation (comprising inlining, copy propagation, and
|
|
constant folding)
|
|
@item Common subexpression elimination (CSE)
|
|
@end itemize
|
|
|
|
In the future, we will move the CSE pass to operate over the lower-level
|
|
CPS language.
|
|
|
|
@node Continuation-Passing Style
|
|
@subsection Continuation-Passing Style
|
|
|
|
@cindex CPS
|
|
Continuation-passing style (CPS) is Guile's principal intermediate
|
|
language, bridging the gap between languages for people and languages
|
|
for machines. CPS gives a name to every part of a program: every
|
|
control point, and every intermediate value. This makes it an excellent
|
|
medium for reasoning about programs, which is the principal job of a
|
|
compiler.
|
|
|
|
@menu
|
|
* An Introduction to CPS::
|
|
* CPS in Guile::
|
|
* Building CPS::
|
|
@end menu
|
|
|
|
@node An Introduction to CPS
|
|
@subsubsection An Introduction to CPS
|
|
|
|
As an example, consider the following Scheme expression:
|
|
|
|
@lisp
|
|
(begin
|
|
(display "The sum of 32 and 10 is: ")
|
|
(display 42)
|
|
(newline))
|
|
@end lisp
|
|
|
|
Let us identify all of the sub-expressions in this expression. We give
|
|
them unique labels, like @var{k1}, and annotate the original source
|
|
code:
|
|
|
|
@lisp
|
|
(begin
  (display "The sum of 32 and 10 is: ")
  |k1      k2
  k0
  (display 42)
  |k4      k5
  k3
  (newline))
  |k7
  k6
|
|
@end lisp
|
|
|
|
These labels also identify continuations. For example, the continuation
|
|
of @code{k7} is @code{k6}. This is because after evaluating the value
|
|
of @code{newline}, performed by the expression labelled @code{k7}, we
|
|
continue to apply it in @code{k6}.
|
|
|
|
Which label has @code{k0} as its continuation? It is either @code{k1}
|
|
or @code{k2}. Scheme does not have a fixed order of evaluation of
|
|
arguments, although it does guarantee that they are evaluated in some
|
|
order. However, continuation-passing style makes evaluation order
|
|
explicit. In Guile, this choice is made by the higher-level language
|
|
compilers.
|
|
|
|
Let us assume a left-to-right evaluation order. In that case the
|
|
continuation of @code{k1} is @code{k2}, and the continuation of
|
|
@code{k2} is @code{k0}.
|
|
|
|
With this example established, we are ready to give an example of CPS in
|
|
Scheme:
|
|
|
|
@lisp
|
|
(lambda (ktail)
  (let ((k1 (lambda ()
              (let ((k2 (lambda (proc)
                          (let ((k0 (lambda (arg0)
                                      (proc k4 arg0))))
                            (k0 "The sum of 32 and 10 is: ")))))
                (k2 display))))
        (k4 (lambda _
              (let ((k5 (lambda (proc)
                          (let ((k3 (lambda (arg0)
                                      (proc k7 arg0))))
                            (k3 42)))))
                (k5 display))))
        (k7 (lambda _
              (let ((k6 (lambda (proc)
                          (proc ktail))))
                (k6 newline)))))
    (k1)))
|
|
@end lisp
|
|
|
|
Holy code explosion, Batman! What's with all the lambdas? Indeed, CPS
|
|
is by nature much more verbose than ``direct-style'' intermediate
|
|
languages like Tree-IL. At the same time, CPS is simpler than full
|
|
Scheme, in the same way that a Turing machine is simpler than
|
|
Scheme, although they are semantically equivalent.
|
|
|
|
In the original program, the expression labelled @code{k0} is in effect
|
|
context. Any values it returns are ignored. This is reflected in CPS
|
|
by noting that its continuation, @code{k4}, takes any number of values
|
|
and ignores them. Compare this to @code{k2}, which takes a single
|
|
value; in this way we can say that @code{k1} is in a ``value'' context.
|
|
Likewise @code{k6} is in tail context with respect to the expression as
|
|
a whole, because its continuation is the tail continuation,
|
|
@code{ktail}. CPS makes these details manifest, and gives them names.
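The CPS program shown earlier is schematic: it passes continuations to @code{display} and @code{newline}, which ordinary Scheme procedures do not accept. The following is a runnable sketch of the same control flow. It assumes two hypothetical wrappers, @code{display/k} and @code{newline/k}, that take a continuation as their first argument, and it binds the labels with @code{letrec} so that they can refer to one another. Unlike the schematic version, each procedure value is passed along explicitly rather than captured by a nested closure.

```scheme
;; Hypothetical CPS wrappers: perform the effect, then jump to k.
(define (display/k k obj) (display obj) (k))
(define (newline/k k) (newline) (k))

(define (program ktail)
  (letrec ((k1 (lambda () (k2 display/k)))           ; evaluate the operator
           (k2 (lambda (proc)                        ; evaluate the operand
                 (k0 proc "The sum of 32 and 10 is: ")))
           (k0 (lambda (proc arg0) (proc k4 arg0)))  ; apply, continue at k4
           (k4 (lambda _ (k5 display/k)))            ; effect context: ignore values
           (k5 (lambda (proc) (k3 proc 42)))
           (k3 (lambda (proc arg0) (proc k7 arg0)))
           (k7 (lambda _ (k6 newline/k)))
           (k6 (lambda (proc) (proc ktail))))        ; tail context
    (k1)))

;; Run the program with a tail continuation that accepts any values.
(program (lambda _ #t))
```

Every call in @code{program} is in tail position, so each one behaves like a jump to the named label.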
|
|
|
|
@subsubheading Compiling CPS
|
|
|
|
In CPS, there are no nested expressions. Indeed, CPS even removes the
|
|
concept of a stack. All applications in CPS are in tail context. For
|
|
that reason, applications in CPS are jumps, not calls. The @code{(k1)}
|
|
above is nothing more than a @code{goto}. @code{(k3 42)} is a
|
|
@code{goto} with a value. In this way, CPS bridges the gap between the
|
|
lambda calculus and machine instruction sequences.
|
|
|
|
On the side of machine instructions, Guile does still have a stack, and
|
|
the @code{lambda} forms shown above do not actually result in one
|
|
closure being allocated per subexpression at run-time. Lambda
|
|
expressions introduced by a CPS transformation can always be allocated
|
|
as labels or basic blocks within a function. In fact, we make a
|
|
syntactic distinction between closures and continuations in the CPS
|
|
language, and attempt to transform closures to continuations (basic
|
|
blocks) where possible, via the @dfn{contification} optimization pass.
|
|
|
|
Values bound by continuations are allocated to stack slots in a
|
|
function's frame. The compiler from CPS only allocates slots to values
|
|
that are actually live; it's possible to have a value in scope but not
|
|
allocated to a slot.
|
|
|
|
@node CPS in Guile
|
|
@subsubsection CPS in Guile
|
|
|
|
Guile's CPS language is composed of @dfn{terms}, @dfn{expressions},
|
|
and @dfn{continuations}.
|
|
|
|
A term can either evaluate an expression and pass the resulting values
|
|
to some continuation, or it can declare local continuations and contain
|
|
a sub-term in the scope of those continuations.
|
|
|
|
@deftp {CPS Term} $continue k src exp
|
|
Evaluate the expression @var{exp} and pass the resulting values (if any)
|
|
to the continuation labelled @var{k}. The source information associated
|
|
with the expression may be found in @var{src}, which is either an alist
|
|
as in @code{source-properties} or is @code{#f} if there is no associated
|
|
source.
|
|
@end deftp
|
|
|
|
@deftp {CPS Term} $letk conts body
|
|
Bind @var{conts}, a list of continuations (@code{$cont} instances), in
|
|
the scope of the sub-term @var{body}. The continuations are mutually
|
|
recursive.
|
|
@end deftp
|
|
|
|
Additionally, the early stages of CPS allow for a set of mutually
|
|
recursive functions to be declared as a term. This @code{$letrec} type
|
|
is like Tree-IL's @code{<fix>}. The contification pass will attempt to
|
|
transform the functions declared in a @code{$letrec} into local
|
|
continuations. Any remaining functions are later lowered to @code{$fun}
|
|
expressions.
|
|
|
|
@deftp {CPS Term} $letrec names syms funs body
|
|
Declare the mutually recursive set of functions denoted by @var{names},
|
|
@var{syms}, and @var{funs} within the sub-term @var{body}. @var{names}
|
|
and @var{syms} are lists of symbols, and @var{funs} is a list of
|
|
@code{$fun} values. @var{syms} are globally unique.
|
|
@end deftp
|
|
|
|
Here is an inventory of the kinds of expressions in Guile's CPS
|
|
language. Recall that all expressions are wrapped in a @code{$continue}
|
|
term which specifies their continuation.
|
|
|
|
@deftp {CPS Expression} $void
|
|
Continue with the unspecified value.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $const val
|
|
Continue with the constant value @var{val}.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $prim name
|
|
Continue with the procedure that implements the primitive operation
|
|
named by @var{name}.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $fun src meta free body
|
|
Continue with a procedure. @var{src} identifies the source information
|
|
for the procedure declaration, and @var{meta} is the metadata alist as
|
|
described above in Tree-IL's @code{<lambda>}. @var{free} is a list of
|
|
free variables accessed by the procedure. Early CPS uses an empty list
|
|
for @var{free}; only after closure conversion is it correctly populated.
|
|
Finally, @var{body} is the @code{$kentry} @code{$cont} of the procedure
|
|
entry.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $call proc args
|
|
Call @var{proc} with the arguments @var{args}, and pass all values to
|
|
the continuation. @var{proc} and the elements of the @var{args} list
|
|
should all be variable names. The continuation identified by the term's
|
|
@var{k} should be a @code{$kreceive} or a @code{$ktail} instance.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $primcall name args
|
|
Perform the primitive operation identified by @var{name}, a well-known
|
|
symbol, passing it the arguments @var{args}, and pass all resulting
|
|
values to the continuation. The set of available primitives includes
|
|
all primitives known to Tree-IL and then some more; see the source code
|
|
for details.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $values args
|
|
Pass the values named by the list @var{args} to the continuation.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $prompt escape? tag handler
|
|
Push a prompt on the stack identified by the variable name @var{tag},
|
|
which may be escape-only if @var{escape?} is true, and continue with
|
|
zero values. If the body aborts to this prompt, control will proceed at
|
|
the continuation labelled @var{handler}, which should be a
|
|
@code{$kreceive} continuation. Prompts are later popped by
|
|
@code{pop-prompt} primcalls.
|
|
@end deftp

The remaining element of the CPS language in Guile is the continuation.
In CPS, all continuations have unique labels. Since this aspect is
common to all continuation types, all continuations are contained in a
@code{$cont} instance:

@deftp {CPS Continuation Wrapper} $cont k cont
Declare a continuation labelled @var{k}. All references to the
continuation will use this label.
@end deftp

The most common kind of continuation binds some number of values, and
then evaluates a sub-term. @code{$kargs} is this kind of simple
@code{lambda}.

@deftp {CPS Continuation} $kargs names syms body
Bind the incoming values to the variables @var{syms}, with original
names @var{names}, and then evaluate the sub-term @var{body}.
@end deftp

Variable names (the names in the @var{syms} of a @code{$kargs}) should
be globally unique, and also disjoint from continuation labels. To bind
a value to a variable and then evaluate some term, you would continue
with the value to a @code{$kargs} that declares one variable. The bound
value would then be available for use within the body of the
@code{$kargs}.
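
As a sketch of this idiom, the following procedure binds a constant to a
fresh variable and then continues with a caller-supplied term. It
assumes that the builder macros and @code{let-gensyms} described later
in this section (@pxref{Building CPS}) are available from
@code{(language cps)}; @code{bind-constant} and @code{make-body} are
hypothetical names used only for illustration.

@example
(use-modules (language cps))

;; Hypothetical helper: bind VAL to a fresh variable, then continue
;; with the term returned by (MAKE-BODY var).  SRC is the source
;; information to attach, or #f.
(define (bind-constant val src make-body)
  (let-gensyms (k x)
    (build-cps-term
      ;; K is a $kargs continuation declaring the single variable X,
      ;; whose original (source) name is the symbol `x'.
      ($letk ((k ($kargs ('x) (x) ,(make-body x))))
        ;; Continue to K with the constant; inside the body, X names
        ;; the bound value.
        ($continue k src ($const val))))))
@end example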

@deftp {CPS Continuation} $kif kt kf
Receive one value. If it is true for the purposes of Scheme, branch to
the continuation labelled @var{kt}, passing no values; otherwise, branch
to @var{kf}.
@end deftp

For internal reasons, only certain terms may continue to a @code{$kif}.
Compiling @code{$kif} avoids allocating space for the test variable, so
it needs to be preceded by expressions that can test-and-branch without
temporary values. In practice this condition is true for
@code{$primcall}s to @code{null?}, @code{=}, and similar primitives that
have corresponding @code{br-if-@var{foo}} VM operations; see the source
code for full details. When in doubt, bind the test expression to a
variable, and continue to the @code{$kif} with a @code{$values}
expression. The optimizer should elide the @code{$values} if it is not
needed.
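
The ``when in doubt'' recipe might be sketched as follows, assuming the
builder macros and @code{let-gensyms} from @code{(language cps)}
(@pxref{Building CPS}). The procedure name @code{branch-on} is
hypothetical; @var{test} is taken to be a variable that already holds
the value to test, and @var{kt} and @var{kf} are labels of continuations
declared elsewhere.

@example
(use-modules (language cps))

;; Hypothetical helper: branch to KT or KF depending on the value of
;; the already-bound variable TEST.
(define (branch-on test kt kf src)
  (let-gensyms (kif)
    (build-cps-term
      ($letk ((kif ($kif kt kf)))
        ;; Passing the variable through $values is always a valid way
        ;; to reach a $kif.
        ($continue kif src ($values (test)))))))
@end example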

Calls out to other functions need to be wrapped in a @code{$kreceive}
continuation in order to adapt the returned values to their uses in the
calling function, if any.

@deftp {CPS Continuation} $kreceive arity k
Receive values on the stack. Parse them according to @var{arity}, and
then proceed with the parsed values to the @code{$kargs} continuation
labelled @var{k}. As a limitation specific to @code{$kreceive},
@var{arity} may only contain required and rest arguments.
@end deftp

@code{$arity} is a helper data structure used by @code{$kreceive} and
also by @code{$kclause}, described below.

@deftp {CPS Data} $arity req opt rest kw allow-other-keys?
A data type declaring an arity. @var{req} and @var{opt} are lists of
source names of required and optional arguments, respectively.
@var{rest} is either the source name of the rest variable, or @code{#f}
if this arity does not accept additional values. @var{kw} is a list of
the form @code{((@var{keyword} @var{name} @var{var}) ...)}, describing
the keyword arguments. @var{allow-other-keys?} is true if other keyword
arguments are allowed and false otherwise.

Note that all of these names, with the exception of the @var{var}s in
the @var{kw} list, are source names, not unique variable names.
@end deftp

Additionally, there are three specific kinds of continuations that can
only be declared at function entries.

@deftp {CPS Continuation} $kentry self tail clauses
Declare a function entry. @var{self} is a variable bound to the
procedure being called, and which may be used for self-references.
@var{tail} declares the @code{$cont} wrapping the @code{$ktail} for this
function, corresponding to the function's tail continuation.
@var{clauses} is a list of @code{$kclause} @code{$cont} instances.
@end deftp

@deftp {CPS Continuation} $ktail
A tail continuation.
@end deftp

@deftp {CPS Continuation} $kclause arity cont
A clause of a function with a given arity. Applications of a function
with a compatible set of actual arguments will continue to @var{cont}, a
@code{$kargs} @code{$cont} instance representing the clause body.
@end deftp


@node Building CPS
@subsubsection Building CPS

Unlike Tree-IL, the CPS language is designed to be constructed and
deconstructed with abstract macros, rather than with procedural
constructors and accessors or with S-expression matching.

Deconstruction and matching are handled adequately by the @code{match}
form from @code{(ice-9 match)}. @xref{Pattern Matching}. Construction
is handled by a set of mutually recursive builder macros:
@code{build-cps-term}, @code{build-cps-cont}, and @code{build-cps-exp}.
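
To give a feel for the matching half, here is a minimal, self-contained
sketch of record deconstruction with @code{match}. So that it runs
without the compiler modules, it defines two stand-in record types,
@code{$toy-const} and @code{$toy-primcall}; these names and the
@code{describe} procedure are hypothetical and are not part of Guile.
Patterns over the real CPS types take the same
@code{($ @var{type} @var{field} ...)} shape.

@example
(use-modules (ice-9 match)
             (srfi srfi-9))

;; Hypothetical stand-ins for two CPS expression types.
(define-record-type $toy-const
  (make-toy-const val)
  toy-const?
  (val toy-const-val))

(define-record-type $toy-primcall
  (make-toy-primcall name args)
  toy-primcall?
  (name toy-primcall-name)
  (args toy-primcall-args))

;; Dispatch on the record type and destructure its fields in one step.
(define (describe exp)
  (match exp
    (($ $toy-const val)
     (list 'constant val))
    (($ $toy-primcall name (args ...))
     (list 'primcall name (length args)))
    (_
     (list 'unknown))))
@end example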

In the following interface definitions, consider variables containing
@code{cont} to be recursively built by @code{build-cps-cont}, and
likewise for @code{term} and @code{exp}. Consider any other name to be
evaluated as a Scheme expression. Many of these forms recognize
@code{unquote} in some contexts, to splice in a previously-built value;
see the specifications below for full details.

@deffn {Scheme Syntax} build-cps-term ,val
@deffnx {Scheme Syntax} build-cps-term ($letk (cont ...) term)
@deffnx {Scheme Syntax} build-cps-term ($letrec names syms funs term)
@deffnx {Scheme Syntax} build-cps-term ($continue k src exp)
@deffnx {Scheme Syntax} build-cps-exp ,val
@deffnx {Scheme Syntax} build-cps-exp ($void)
@deffnx {Scheme Syntax} build-cps-exp ($const val)
@deffnx {Scheme Syntax} build-cps-exp ($prim name)
@deffnx {Scheme Syntax} build-cps-exp ($fun src meta free body)
@deffnx {Scheme Syntax} build-cps-exp ($call proc (arg ...))
@deffnx {Scheme Syntax} build-cps-exp ($call proc args)
@deffnx {Scheme Syntax} build-cps-exp ($primcall name (arg ...))
@deffnx {Scheme Syntax} build-cps-exp ($primcall name args)
@deffnx {Scheme Syntax} build-cps-exp ($values (arg ...))
@deffnx {Scheme Syntax} build-cps-exp ($values args)
@deffnx {Scheme Syntax} build-cps-exp ($prompt escape? tag handler)
@deffnx {Scheme Syntax} build-cps-cont ,val
@deffnx {Scheme Syntax} build-cps-cont (k ($kargs (name ...) (sym ...) term))
@deffnx {Scheme Syntax} build-cps-cont (k ($kargs names syms term))
@deffnx {Scheme Syntax} build-cps-cont (k ($kif kt kf))
@deffnx {Scheme Syntax} build-cps-cont (k ($kreceive req rest kargs))
@deffnx {Scheme Syntax} build-cps-cont (k ($kentry self tail-cont ,clauses))
@deffnx {Scheme Syntax} build-cps-cont (k ($kentry self tail-cont (cont ...)))
@deffnx {Scheme Syntax} build-cps-cont (k ($kclause ,arity cont))
@deffnx {Scheme Syntax} build-cps-cont (k ($kclause (req opt rest kw aok?) cont))
Construct a CPS term, expression, or continuation.
@end deffn

There are a few more miscellaneous interfaces as well.

@deffn {Scheme Procedure} make-arity req opt rest kw allow-other-keywords?
A procedural constructor for @code{$arity} objects.
@end deffn

@deffn {Scheme Syntax} let-gensyms (sym ...) body ...
Bind @var{sym...} to fresh names, and evaluate @var{body...}.
@end deffn

@deffn {Scheme Syntax} rewrite-cps-term val (pat term) ...
@deffnx {Scheme Syntax} rewrite-cps-exp val (pat exp) ...
@deffnx {Scheme Syntax} rewrite-cps-cont val (pat cont) ...
Match @var{val} against the series of patterns @var{pat...}, using
@code{match}. The body of the matching clause should be a template in
the syntax of @code{build-cps-term}, @code{build-cps-exp}, or
@code{build-cps-cont}, respectively.
@end deffn
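
For instance, a rewrite that turns a @code{values} primcall into a plain
@code{$values} expression, and leaves every other expression untouched,
might be sketched like this. The procedure name is hypothetical, and the
sketch assumes that the macros above and the @code{$primcall} record
type are imported from @code{(language cps)}.

@example
(use-modules (language cps)
             (ice-9 match))

;; Hypothetical helper: rewrite ($primcall 'values ARGS) to
;; ($values ARGS); return any other expression unchanged.
(define (values-primcall->values exp)
  (rewrite-cps-exp exp
    ;; The left side is a `match' pattern; the right side is a
    ;; template in the syntax of `build-cps-exp'.
    (($ $primcall 'values args)
     ($values args))
    ;; `unquote' splices in the already-built value as it is.
    (_ ,exp)))
@end example

Each clause body is a template rather than ordinary Scheme code, which
is why the fallback clause needs @code{unquote} to return @code{exp}
itself.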

@node Bytecode
@subsection Bytecode

@xref{Object File Format}.

TODO: document (system vm loader)

@deffn {Scheme Procedure} load-thunk-from-file file
@deffnx {C Function} scm_load_thunk_from_file (file)
Load object code from a file named @var{file}. The file will be mapped
into memory via @code{mmap}, so this is a very fast operation.

On disk, object code is embedded in ELF, a flexible container format
created for use in UNIX systems. Guile has its own ELF linker and
loader, so it uses the ELF format on all systems.
@end deffn
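
As a rough illustration, the following sketch compiles a small source
file to disk and maps the result back in. It assumes that
@code{load-thunk-from-file} is exported by @code{(system vm loader)} and
that @code{compile-file} comes from @code{(system base compile)}; the
two file names under @file{/tmp} are hypothetical scratch paths.

@example
(use-modules (system base compile)   ; compile-file
             (system vm loader))     ; load-thunk-from-file

;; Hypothetical scratch files, used only for this illustration.
(define source-file "/tmp/guile-loader-example.scm")
(define object-file "/tmp/guile-loader-example.go")

;; Write a tiny Scheme program to compile.
(call-with-output-file source-file
  (lambda (port)
    (write '(define (square x) (* x x)) port)
    (newline port)
    (write '(square 7) port)
    (newline port)))

;; Compile the source to object code on disk ...
(compile-file source-file #:output-file object-file)

;; ... and map it back in.  The result is a thunk; calling it runs
;; the top-level forms of the compiled file.
(define thunk (load-thunk-from-file object-file))
(thunk)
@end example

Loading only yields the thunk; nothing from the file is evaluated until
the thunk is called.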

TODO: document load-thunk-from-memory

Compiling object code to the fake language, @code{value}, is performed
by loading the object code into a thunk and then calling that thunk with
respect to the compilation environment. Normally the environment
propagates through the compiler transparently, but users may specify
the compilation environment manually as well, as a module.
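
A short sketch of both cases, using the @code{compile} procedure from
@code{(system base compile)}. The variable names @code{sum} and
@code{joined} are only for illustration.

@example
(use-modules (system base compile))

;; Let the environment propagate through the compiler implicitly.
(define sum
  (compile '(+ 1 2) #:from 'scheme #:to 'value))

;; Or specify the compilation environment manually, as a module.
(define joined
  (compile '(string-append "foo" "bar")
           #:from 'scheme
           #:to 'value
           #:env (resolve-module '(guile))))
@end example

In both calls the result is the value of the expression, not object
code, because the target language is @code{value}.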


@node Writing New High-Level Languages
@subsection Writing New High-Level Languages

In order to integrate a new language @var{lang} into Guile's compiler
system, one has to create the module @code{(language @var{lang} spec)}
containing the language definition and referencing the parser,
compiler and other routines processing it. The module hierarchy in
@code{(language brainfuck)} defines a very basic Brainfuck
implementation meant to serve as an easy-to-understand example of how to
do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
for more information about the Brainfuck language itself.
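
The skeleton of such a module might look like the following sketch. The
language name @code{toy} and its trivial reader and compiler are
hypothetical: the reader simply reads an S-expression, and the compiler
hands it unchanged to the Scheme compiler. A real language would
reference its own parser and compiler modules here instead.

@example
(define-module (language toy spec)
  #:use-module (system base language)
  #:export (toy))

;; Hypothetical compiler pass from `toy' to Scheme.  A compiler takes
;; an expression, an environment, and a list of options, and returns
;; the compiled expression, the environment for the target language,
;; and the continuation environment.
(define (compile-scheme exp env opts)
  (values exp env env))

(define-language toy
  #:title "Toy"
  #:reader (lambda (port env) (read port))
  #:compilers `((scheme . ,compile-scheme))
  #:printer write)
@end example

Saved as @file{language/toy/spec.scm} somewhere on the load path, the
module name matches the @code{(language @var{lang} spec)} convention
described above.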


@node Extending the Compiler
@subsection Extending the Compiler

At this point we take a detour from the impersonal tone of the rest of
the manual. Admit it: if you've read this far into the compiler
internals manual, you are a junkie. Perhaps a course at your university
left you unsated, or perhaps you've always harbored a desire to hack the
holy of computer science holies: a compiler. Well, you're in good
company, and in a good position. Guile's compiler needs your help.

There are many possible avenues for improving Guile's compiler.
Probably the most important improvement, speed-wise, will be some form
of native compilation, both just-in-time and ahead-of-time. This could
be done in many ways. Probably the easiest strategy would be to extend
the compiled procedure structure to include a pointer to a native code
vector, and compile from bytecode to native code at run-time after a
procedure is called a certain number of times.

The name of the game is a profiling-based harvest of the low-hanging
fruit, running programs of interest under a system-level profiler and
determining which improvements would give the most bang for the buck.
It's really getting to the point though that native compilation is the
next step.

The compiler also needs help at the top end, enhancing the Scheme that
it knows to also understand R6RS, and adding new high-level compilers.
We have JavaScript and Emacs Lisp mostly complete, but they could use
some love; Lua would be nice as well, but whatever language it is
that strikes your fancy would be welcome too.

Compilers are for hacking, not for admiring or for complaining about.
Get to it!