mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-04-30 11:50:28 +02:00
* doc/ref/compiler.texi (CPS Soup) : fix small typo in doc (closing paren out of @code{} tag). Signed-off-by: Ludovic Courtès <ludo@gnu.org>
1408 lines
60 KiB
Text
1408 lines
60 KiB
Text
@c -*-texinfo-*-
|
|
@c This is part of the GNU Guile Reference Manual.
|
|
@c Copyright (C) 2008-2016, 2018, 2020
|
|
@c Free Software Foundation, Inc.
|
|
@c See the file guile.texi for copying conditions.
|
|
|
|
@node Compiling to the Virtual Machine
|
|
@section Compiling to the Virtual Machine
|
|
|
|
Compilers! The word itself inspires excitement and awe, even among
|
|
experienced practitioners. But a compiler is just a program: an
|
|
eminently hackable thing. This section aims to describe Guile's
|
|
compiler in such a way that interested Scheme hackers can feel
|
|
comfortable reading and extending it.
|
|
|
|
@xref{Read/Load/Eval/Compile}, if you're lost and you just wanted to
|
|
know how to compile your @code{.scm} file.
|
|
|
|
@menu
|
|
* Compiler Tower::
|
|
* The Scheme Compiler::
|
|
* Tree-IL::
|
|
* Continuation-Passing Style::
|
|
* Bytecode::
|
|
* Writing New High-Level Languages::
|
|
* Extending the Compiler::
|
|
@end menu
|
|
|
|
@node Compiler Tower
|
|
@subsection Compiler Tower
|
|
|
|
Guile's compiler is quite simple -- its @emph{compilers}, to put it more
|
|
accurately. Guile defines a tower of languages, starting at Scheme and
|
|
progressively simplifying down to languages that resemble the VM
|
|
instruction set (@pxref{Instruction Set}).
|
|
|
|
Each language knows how to compile to the next, so each step is simple
|
|
and understandable. Furthermore, this set of languages is not hardcoded
|
|
into Guile, so it is possible for the user to add new high-level
|
|
languages, new passes, or even different compilation targets.
|
|
|
|
Languages are registered in the module, @code{(system base language)}:
|
|
|
|
@example
|
|
(use-modules (system base language))
|
|
@end example
|
|
|
|
They are registered with the @code{define-language} form.
|
|
|
|
@deffn {Scheme Syntax} define-language @
|
|
[#:name] [#:title] [#:reader] [#:printer] @
|
|
[#:parser=#f] [#:compilers='()] @
|
|
[#:decompilers='()] [#:evaluator=#f] @
|
|
[#:joiner=#f] [#:for-humans?=#t] @
|
|
[#:make-default-environment=make-fresh-user-module] @
|
|
[#:lowerer=#f] [#:analyzer=#f] [#:compiler-chooser=#f]
|
|
Define a language.
|
|
|
|
This syntax defines a @code{<language>} object, bound to @var{name} in
|
|
the current environment. In addition, the language will be added to the
|
|
global language set. For example, this is the language definition for
|
|
Scheme:
|
|
|
|
@example
|
|
(define-language scheme
|
|
#:title "Scheme"
|
|
#:reader (lambda (port env) ...)
|
|
#:compilers `((tree-il . ,compile-tree-il))
|
|
#:decompilers `((tree-il . ,decompile-tree-il))
|
|
#:evaluator (lambda (x module) (primitive-eval x))
|
|
#:printer write
|
|
#:make-default-environment (lambda () ...))
|
|
@end example
|
|
@end deffn
|
|
|
|
The interesting thing about having languages defined this way is that
|
|
they present a uniform interface to the read-eval-print loop. This
|
|
allows the user to change the current language of the REPL:
|
|
|
|
@example
|
|
scheme@@(guile-user)> ,language tree-il
|
|
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
|
|
tree-il@@(guile-user)> ,L scheme
|
|
Happy hacking with Scheme! To switch back, type `,L tree-il'.
|
|
scheme@@(guile-user)>
|
|
@end example
|
|
|
|
Languages can be looked up by name, as they were above.
|
|
|
|
@deffn {Scheme Procedure} lookup-language name
|
|
Looks up a language named @var{name}, autoloading it if necessary.
|
|
|
|
Languages are autoloaded by looking for a variable named @var{name} in
|
|
a module named @code{(language @var{name} spec)}.
|
|
|
|
The language object will be returned, or @code{#f} if there does not
|
|
exist a language with that name.
|
|
@end deffn
|
|
|
|
When Guile goes to compile Scheme to bytecode, it will ask the Scheme
|
|
language to choose a compiler from Scheme to the next language on the
|
|
path from Scheme to bytecode. Performing this computation recursively
|
|
builds transformations from a flexible chain of compilers. The next
|
|
link will be obtained by invoking the language's compiler chooser, or if
|
|
not present, from the language's compilers field.
|
|
|
|
A language can specify an analyzer, which is run before a term of that
|
|
language is lowered and compiled. This is where compiler warnings are
|
|
issued.
|
|
|
|
If a language specifies a lowerer, that procedure is called on
|
|
expressions before compilation. This is where optimizations and
|
|
canonicalizations go.
|
|
|
|
Finally a language's compiler translates a lowered term from one
|
|
language to the next one in the chain.
|
|
|
|
There is a notion of a ``current language'', which is maintained in the
|
|
@code{current-language} parameter, defined in the core @code{(guile)}
|
|
module. This language is normally Scheme, and may be rebound by the
|
|
user. The run-time compilation interfaces
|
|
(@pxref{Read/Load/Eval/Compile}) also allow you to choose other source
|
|
and target languages.
|
|
|
|
The normal tower of languages when compiling Scheme goes like this:
|
|
|
|
@itemize
|
|
@item Scheme
|
|
@item Tree Intermediate Language (Tree-IL)
|
|
@item Continuation-Passing Style (CPS)
|
|
@item Bytecode
|
|
@end itemize
|
|
|
|
As discussed before (@pxref{Object File Format}), bytecode is in ELF
|
|
format, ready to be serialized to disk. But when compiling Scheme at
|
|
run time, you want a Scheme value: for example, a compiled procedure.
|
|
For this reason, so as not to break the abstraction, Guile defines a
|
|
fake language at the bottom of the tower:
|
|
|
|
@itemize
|
|
@item Value
|
|
@end itemize
|
|
|
|
Compiling to @code{value} loads the bytecode into a procedure, turning
|
|
cold bytes into warm code.
|
|
|
|
Perhaps this strangeness can be explained by example:
|
|
@code{compile-file} defaults to compiling to bytecode, because it
|
|
produces object code that has to live in the barren world outside the
|
|
Guile runtime; but @code{compile} defaults to compiling to @code{value},
|
|
as its product re-enters the Guile world.
|
|
|
|
@c FIXME: This doesn't work anymore :( Should we add some kind of
|
|
@c special GC pass, or disclaim this kind of code, or what?
|
|
|
|
Indeed, the process of compilation can circulate through these
|
|
different worlds indefinitely, as shown by the following quine:
|
|
|
|
@example
|
|
((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))
|
|
@end example
|
|
|
|
@node The Scheme Compiler
|
|
@subsection The Scheme Compiler
|
|
|
|
The job of the Scheme compiler is to expand all macros and all of Scheme
|
|
to its most primitive expressions. The definition of ``primitive
|
|
expression'' is given by the inventory of constructs provided by
|
|
Tree-IL, the target language of the Scheme compiler: procedure calls,
|
|
conditionals, lexical references, and so on. This is described more
|
|
fully in the next section.
|
|
|
|
The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
|
|
that it is completely implemented by the macro expander. Since the
|
|
macro expander has to run over all of the source code already in order
|
|
to expand macros, it might as well do the analysis at the same time,
|
|
producing Tree-IL expressions directly.
|
|
|
|
Because this compiler is actually the macro expander, it is extensible.
|
|
Any macro which the user writes becomes part of the compiler.
|
|
|
|
The Scheme-to-Tree-IL expander may be invoked using the generic
|
|
@code{compile} procedure:
|
|
|
|
@lisp
|
|
(compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
|
|
@result{}
|
|
#<tree-il (call (toplevel +) (const 1) (const 2))>
|
|
@end lisp
|
|
|
|
@code{(compile @var{foo} #:from 'scheme #:to 'tree-il)} is entirely
|
|
equivalent to calling the macro expander as @code{(macroexpand @var{foo}
|
|
'c '(compile load eval))}. @xref{Macro Expansion}.
|
|
@code{compile-tree-il}, the procedure dispatched by @code{compile} to
|
|
@code{'tree-il}, is a small wrapper around @code{macroexpand}, to make
|
|
it conform to the general form of compiler procedures in Guile's
|
|
language tower.
|
|
|
|
Compiler procedures take three arguments: an expression, an
|
|
environment, and a keyword list of options. They return three values:
|
|
the compiled expression, the corresponding environment for the target
|
|
language, and a ``continuation environment''. The compiled expression
|
|
and environment will serve as input to the next language's compiler.
|
|
The ``continuation environment'' can be used to compile another
|
|
expression from the same source language within the same module.
|
|
|
|
For example, you might compile the expression, @code{(define-module
|
|
(foo))}. This will result in a Tree-IL expression and environment. But
|
|
if you compiled a second expression, you would want to take into account
|
|
the compile-time effect of compiling the previous expression, which puts
|
|
the user in the @code{(foo)} module. That is the purpose of the
|
|
``continuation environment''; you would pass it as the environment when
|
|
compiling the subsequent expression.
|
|
|
|
For Scheme, an environment is a module. By default, the @code{compile}
|
|
and @code{compile-file} procedures compile in a fresh module, such
|
|
that bindings and macros introduced by the expression being compiled
|
|
are isolated:
|
|
|
|
@example
|
|
(eq? (current-module) (compile '(current-module)))
|
|
@result{} #f
|
|
|
|
(compile '(define hello 'world))
|
|
(defined? 'hello)
|
|
@result{} #f
|
|
|
|
(define / *)
|
|
(eq? (compile '/) /)
|
|
@result{} #f
|
|
@end example
|
|
|
|
Similarly, changes to the @code{current-reader} fluid (@pxref{Loading,
|
|
@code{current-reader}}) are isolated:
|
|
|
|
@example
|
|
(compile '(fluid-set! current-reader (lambda args 'fail)))
|
|
(fluid-ref current-reader)
|
|
@result{} #f
|
|
@end example
|
|
|
|
Nevertheless, having the compiler and @dfn{compilee} share the same name
|
|
space can be achieved by explicitly passing @code{(current-module)} as
|
|
the compilation environment:
|
|
|
|
@example
|
|
(define hello 'world)
|
|
(compile 'hello #:env (current-module))
|
|
@result{} world
|
|
@end example
|
|
|
|
@node Tree-IL
|
|
@subsection Tree-IL
|
|
|
|
Tree Intermediate Language (Tree-IL) is a structured intermediate
|
|
language that is close in expressive power to Scheme. It is an
|
|
expanded, pre-analyzed Scheme.
|
|
|
|
Tree-IL is ``structured'' in the sense that its representation is
|
|
based on records, not S-expressions. This gives a rigidity to the
|
|
language that ensures that compiling to a lower-level language only
|
|
requires a limited set of transformations. For example, the Tree-IL
|
|
type @code{<const>} is a record type with two fields, @code{src} and
|
|
@code{exp}. Instances of this type are created via @code{make-const}.
|
|
Fields of this type are accessed via the @code{const-src} and
|
|
@code{const-exp} procedures. There is also a predicate, @code{const?}.
|
|
@xref{Records}, for more information on records.
|
|
|
|
@c alpha renaming
|
|
|
|
All Tree-IL types have a @code{src} slot, which holds source location
|
|
information for the expression. This information, if present, will be
|
|
residualized into the compiled object code, allowing backtraces to
|
|
show source information. The format of @code{src} is the same as that
|
|
returned by Guile's @code{source-properties} function. @xref{Source
|
|
Properties}, for more information.
|
|
|
|
Although Tree-IL objects are represented internally using records,
|
|
there is also an equivalent S-expression external representation for
|
|
each kind of Tree-IL. For example, the S-expression representation
|
|
of @code{#<const src: #f exp: 3>} expression would be:
|
|
|
|
@example
|
|
(const 3)
|
|
@end example
|
|
|
|
Users may program with this format directly at the REPL:
|
|
|
|
@example
|
|
scheme@@(guile-user)> ,language tree-il
|
|
Happy hacking with Tree Intermediate Language! To switch back, type `,L scheme'.
|
|
tree-il@@(guile-user)> (call (primitive +) (const 32) (const 10))
|
|
@result{} 42
|
|
@end example
|
|
|
|
The @code{src} fields are left out of the external representation.
|
|
|
|
One may create Tree-IL objects from their external representations via
|
|
calling @code{parse-tree-il}, the reader for Tree-IL. If any source
|
|
information is attached to the input S-expression, it will be
|
|
propagated to the resulting Tree-IL expressions. This is probably the
|
|
easiest way to compile to Tree-IL: just make the appropriate external
|
|
representations in S-expression format, and let @code{parse-tree-il}
|
|
take care of the rest.
|
|
|
|
@deftp {Scheme Variable} <void> src
|
|
@deftpx {External Representation} (void)
|
|
An empty expression. In practice, equivalent to Scheme's @code{(if #f
|
|
#f)}.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <const> src exp
|
|
@deftpx {External Representation} (const @var{exp})
|
|
A constant.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <primitive-ref> src name
|
|
@deftpx {External Representation} (primitive @var{name})
|
|
A reference to a ``primitive''. A primitive is a procedure that, when
|
|
compiled, may be open-coded. For example, @code{cons} is usually
|
|
recognized as a primitive, so that it compiles down to a single
|
|
instruction.
|
|
|
|
Compilation of Tree-IL usually begins with a pass that resolves some
|
|
@code{<module-ref>} and @code{<toplevel-ref>} expressions to
|
|
@code{<primitive-ref>} expressions. The actual compilation pass has
|
|
special cases for calls to certain primitives, like @code{apply} or
|
|
@code{cons}.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <lexical-ref> src name gensym
|
|
@deftpx {External Representation} (lexical @var{name} @var{gensym})
|
|
A reference to a lexically-bound variable. The @var{name} is the
|
|
original name of the variable in the source program. @var{gensym} is a
|
|
unique identifier for this variable.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <lexical-set> src name gensym exp
|
|
@deftpx {External Representation} (set! (lexical @var{name} @var{gensym}) @var{exp})
|
|
Sets a lexically-bound variable.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <module-ref> src mod name public?
|
|
@deftpx {External Representation} (@@ @var{mod} @var{name})
|
|
@deftpx {External Representation} (@@@@ @var{mod} @var{name})
|
|
A reference to a variable in a specific module. @var{mod} should be
|
|
the name of the module, e.g.@: @code{(guile-user)}.
|
|
|
|
If @var{public?} is true, the variable named @var{name} will be looked
|
|
up in @var{mod}'s public interface, and serialized with @code{@@};
|
|
otherwise it will be looked up among the module's private bindings,
|
|
and is serialized with @code{@@@@}.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <module-set> src mod name public? exp
|
|
@deftpx {External Representation} (set! (@@ @var{mod} @var{name}) @var{exp})
|
|
@deftpx {External Representation} (set! (@@@@ @var{mod} @var{name}) @var{exp})
|
|
Sets a variable in a specific module.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <toplevel-ref> src name
|
|
@deftpx {External Representation} (toplevel @var{name})
|
|
References a variable from the current procedure's module.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <toplevel-set> src name exp
|
|
@deftpx {External Representation} (set! (toplevel @var{name}) @var{exp})
|
|
Sets a variable in the current procedure's module.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <toplevel-define> src name exp
|
|
@deftpx {External Representation} (define @var{name} @var{exp})
|
|
Defines a new top-level variable in the current procedure's module.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <conditional> src test then else
|
|
@deftpx {External Representation} (if @var{test} @var{then} @var{else})
|
|
A conditional. Note that @var{else} is not optional.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <call> src proc args
|
|
@deftpx {External Representation} (call @var{proc} . @var{args})
|
|
A procedure call.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <primcall> src name args
|
|
@deftpx {External Representation} (primcall @var{name} . @var{args})
|
|
A call to a primitive. Equivalent to @code{(call (primitive @var{name})
|
|
. @var{args})}. This construct is often more convenient to generate and
|
|
analyze than @code{<call>}.
|
|
|
|
As part of the compilation process, instances of @code{(call (primitive
|
|
@var{name}) . @var{args})} are transformed into primcalls.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <seq> src head tail
|
|
@deftpx {External Representation} (seq @var{head} @var{tail})
|
|
A sequence. The semantics is that @var{head} is evaluated first, and
|
|
any resulting values are ignored. Then @var{tail} is evaluated, in tail
|
|
position.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <lambda> src meta body
|
|
@deftpx {External Representation} (lambda @var{meta} @var{body})
|
|
A closure. @var{meta} is an association list of properties for the
|
|
procedure. @var{body} is a single Tree-IL expression of type
|
|
@code{<lambda-case>}. As the @code{<lambda-case>} clause can chain to
|
|
an alternate clause, this makes Tree-IL's @code{<lambda>} have the
|
|
expressiveness of Scheme's @code{case-lambda}.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <lambda-case> req opt rest kw inits gensyms body alternate
|
|
@deftpx {External Representation} @
|
|
(lambda-case ((@var{req} @var{opt} @var{rest} @var{kw} @var{inits} @var{gensyms})@
|
|
@var{body})@
|
|
[@var{alternate}])
|
|
One clause of a @code{case-lambda}. A @code{lambda} expression in
|
|
Scheme is treated as a @code{case-lambda} with one clause.
|
|
|
|
@var{req} is a list of the procedure's required arguments, as symbols.
|
|
@var{opt} is a list of the optional arguments, or @code{#f} if there
|
|
are no optional arguments. @var{rest} is the name of the rest
|
|
argument, or @code{#f}.
|
|
|
|
@var{kw} is a list of the form, @code{(@var{allow-other-keys?}
|
|
(@var{keyword} @var{name} @var{var}) ...)}, where @var{keyword} is the
|
|
keyword corresponding to the argument named @var{name}, and whose
|
|
corresponding gensym is @var{var}, or @code{#f} if there are no keyword
|
|
arguments. @var{inits} are tree-il expressions corresponding to all of
|
|
the optional and keyword arguments, evaluated to bind variables whose
|
|
value is not supplied by the procedure caller. Each @var{init}
|
|
expression is evaluated in the lexical context of previously bound
|
|
variables, from left to right.
|
|
|
|
@var{gensyms} is a list of gensyms corresponding to all arguments:
|
|
first all of the required arguments, then the optional arguments if
|
|
any, then the rest argument if any, then all of the keyword arguments.
|
|
|
|
@var{body} is the body of the clause. If the procedure is called with
|
|
an appropriate number of arguments, @var{body} is evaluated in tail
|
|
position. Otherwise, if there is an @var{alternate}, it should be a
|
|
@code{<lambda-case>} expression, representing the next clause to try.
|
|
If there is no @var{alternate}, a wrong-number-of-arguments error is
|
|
signaled.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <let> src names gensyms vals exp
|
|
@deftpx {External Representation} (let @var{names} @var{gensyms} @var{vals} @var{exp})
|
|
Lexical binding, like Scheme's @code{let}. @var{names} are the original
|
|
binding names, @var{gensyms} are gensyms corresponding to the
|
|
@var{names}, and @var{vals} are Tree-IL expressions for the values.
|
|
@var{exp} is a single Tree-IL expression.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <letrec> in-order? src names gensyms vals exp
|
|
@deftpx {External Representation} (letrec @var{names} @var{gensyms} @var{vals} @var{exp})
|
|
@deftpx {External Representation} (letrec* @var{names} @var{gensyms} @var{vals} @var{exp})
|
|
A version of @code{<let>} that creates recursive bindings, like
|
|
Scheme's @code{letrec}, or @code{letrec*} if @var{in-order?} is true.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <prompt> escape-only? tag body handler
|
|
@deftpx {External Representation} (prompt @var{escape-only?} @var{tag} @var{body} @var{handler})
|
|
A dynamic prompt. Instates a prompt named @var{tag}, an expression,
|
|
during the dynamic extent of the execution of @var{body}, also an
|
|
expression. If an abort occurs to this prompt, control will be passed
|
|
to @var{handler}, also an expression, which should be a procedure. The
|
|
first argument to the handler procedure will be the captured
|
|
continuation, followed by all of the values passed to the abort. If
|
|
@var{escape-only?} is true, the handler should be a @code{<lambda>} with
|
|
a single @code{<lambda-case>} body expression with no optional or
|
|
keyword arguments, and no alternate, and whose first argument is
|
|
unreferenced. @xref{Prompts}, for more information.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <abort> tag args tail
|
|
@deftpx {External Representation} (abort @var{tag} @var{args} @var{tail})
|
|
An abort to the nearest prompt with the name @var{tag}, an expression.
|
|
@var{args} should be a list of expressions to pass to the prompt's
|
|
handler, and @var{tail} should be an expression that will evaluate to
|
|
a list of additional arguments. An abort will save the partial
|
|
continuation, which may later be reinstated, resulting in the
|
|
@code{<abort>} expression evaluating to some number of values.
|
|
@end deftp
|
|
|
|
There are two Tree-IL constructs that are not normally produced by
|
|
higher-level compilers, but instead are generated during the
|
|
source-to-source optimization and analysis passes that the Tree-IL
|
|
compiler does. Users should not generate these expressions directly,
|
|
unless they feel very clever, as the default analysis pass will generate
|
|
them as necessary.
|
|
|
|
@deftp {Scheme Variable} <let-values> src names gensyms exp body
|
|
@deftpx {External Representation} (let-values @var{names} @var{gensyms} @var{exp} @var{body})
|
|
Like Scheme's @code{receive} -- binds the values returned by
|
|
evaluating @code{exp} to the @code{lambda}-like bindings described by
|
|
@var{gensyms}. That is to say, @var{gensyms} may be an improper list.
|
|
|
|
@code{<let-values>} is an optimization of a @code{<call>} to the
|
|
primitive, @code{call-with-values}.
|
|
@end deftp
|
|
|
|
@deftp {Scheme Variable} <fix> src names gensyms vals body
|
|
@deftpx {External Representation} (fix @var{names} @var{gensyms} @var{vals} @var{body})
|
|
Like @code{<letrec>}, but only for @var{vals} that are unset
|
|
@code{lambda} expressions.
|
|
|
|
@code{fix} is an optimization of @code{letrec} (and @code{let}).
|
|
@end deftp
|
|
|
|
Tree-IL is a convenient compilation target from source languages. It
|
|
can be convenient as a medium for optimization, though CPS is usually
|
|
better. The strength of Tree-IL is that it does not fix order of
|
|
evaluation, so it makes some code motion a bit easier.
|
|
|
|
Optimization passes performed on Tree-IL currently include:
|
|
|
|
@itemize
|
|
@item Open-coding (turning toplevel-refs into primitive-refs,
|
|
and calls to primitives to primcalls)
|
|
@item Partial evaluation (comprising inlining, copy propagation, and
|
|
constant folding)
|
|
@end itemize
|
|
|
|
@node Continuation-Passing Style
|
|
@subsection Continuation-Passing Style
|
|
|
|
@cindex CPS
|
|
Continuation-passing style (CPS) is Guile's principal intermediate
|
|
language, bridging the gap between languages for people and languages
|
|
for machines. CPS gives a name to every part of a program: every
|
|
control point, and every intermediate value. This makes it an excellent
|
|
medium for reasoning about programs, which is the principal job of a
|
|
compiler.
|
|
|
|
@menu
|
|
* An Introduction to CPS::
|
|
* CPS in Guile::
|
|
* Building CPS::
|
|
* CPS Soup::
|
|
* Compiling CPS::
|
|
@end menu
|
|
|
|
@node An Introduction to CPS
|
|
@subsubsection An Introduction to CPS
|
|
|
|
Consider the following Scheme expression:
|
|
|
|
@lisp
|
|
(begin
|
|
(display "The sum of 32 and 10 is: ")
|
|
(display 42)
|
|
(newline))
|
|
@end lisp
|
|
|
|
Let us identify all of the sub-expressions in this expression,
|
|
annotating them with unique labels:
|
|
|
|
@lisp
|
|
(begin
|
|
(display "The sum of 32 and 10 is: ")
|
|
|k1 k2
|
|
k0
|
|
(display 42)
|
|
|k4 k5
|
|
k3
|
|
(newline))
|
|
|k7
|
|
k6
|
|
@end lisp
|
|
|
|
Each of these labels identifies a point in a program. One label may be
|
|
the continuation of another label. For example, the continuation of
|
|
@code{k7} is @code{k6}. This is because after evaluating the value of
|
|
@code{newline}, performed by the expression labelled @code{k7}, we
|
|
continue to apply it in @code{k6}.
|
|
|
|
Which expression has @code{k0} as its continuation? It is either the
|
|
expression labelled @code{k1} or the expression labelled @code{k2}.
|
|
Scheme does not have a fixed order of evaluation of arguments, though it
|
|
does guarantee that they are evaluated in some order. Unlike general
|
|
Scheme, continuation-passing style makes evaluation order explicit. In
|
|
Guile, this choice is made by the higher-level language compilers.
|
|
|
|
Let us assume a left-to-right evaluation order. In that case the
|
|
continuation of @code{k1} is @code{k2}, and the continuation of
|
|
@code{k2} is @code{k0}.
|
|
|
|
With this example established, we are ready to give an example of CPS in
|
|
Scheme:
|
|
|
|
@smalllisp
|
|
(lambda (ktail)
|
|
(let ((k1 (lambda ()
|
|
(let ((k2 (lambda (proc)
|
|
(let ((k0 (lambda (arg0)
|
|
(proc k4 arg0))))
|
|
(k0 "The sum of 32 and 10 is: ")))))
|
|
(k2 display))))
|
|
(k4 (lambda _
|
|
(let ((k5 (lambda (proc)
|
|
(let ((k3 (lambda (arg0)
|
|
(proc k7 arg0))))
|
|
(k3 42)))))
|
|
(k5 display))))
|
|
(k7 (lambda _
|
|
(let ((k6 (lambda (proc)
|
|
(proc ktail))))
|
|
(k6 newline)))))
|
|
(k1))
|
|
@end smalllisp
|
|
|
|
Holy code explosion, Batman! What's with all the lambdas? Indeed, CPS
|
|
is by nature much more verbose than ``direct-style'' intermediate
|
|
languages like Tree-IL. At the same time, CPS is simpler than full
|
|
Scheme, because it makes things more explicit.
|
|
|
|
In the original program, the expression labelled @code{k0} is in effect
|
|
context. Any values it returns are ignored. In Scheme, this fact is
|
|
implicit. In CPS, we can see it explicitly by noting that its
|
|
continuation, @code{k4}, takes any number of values and ignores them.
|
|
Compare this to @code{k2}, which takes a single value; in this way we
|
|
can say that @code{k1} is in a ``value'' context. Likewise @code{k6} is
|
|
in tail context with respect to the expression as a whole, because its
|
|
continuation is the tail continuation, @code{ktail}. CPS makes these
|
|
details manifest, and gives them names.
|
|
|
|
@node CPS in Guile
|
|
@subsubsection CPS in Guile
|
|
|
|
@cindex continuation, CPS
|
|
Guile's CPS language is composed of @dfn{continuations}. A continuation
|
|
is a labelled program point. If you are used to traditional compilers,
|
|
think of a continuation as a trivial basic block. A program is a
|
|
``soup'' of continuations, represented as a map from labels to
|
|
continuations.
|
|
|
|
@cindex term, CPS
|
|
@cindex expression, CPS
|
|
Like basic blocks, each continuation belongs to only one function. Some
|
|
continuations are special, like the continuation corresponding to a
|
|
function's entry point, or the continuation that represents the tail of
|
|
a function. Others contain a @dfn{term}. A term contains an
|
|
@dfn{expression}, which evaluates to zero or more values. The term also
|
|
records the continuation to which it will pass its values. Some terms,
|
|
like conditional branches, may continue to one of a number of
|
|
continuations.
|
|
|
|
Continuation labels are small integers. This makes it easy to sort them
|
|
and to group them into sets. Whenever a term refers to a continuation,
|
|
it does so by name, simply recording the label of the continuation.
|
|
Continuation labels are unique among the set of labels in a program.
|
|
|
|
Variables are also named by small integers. Variable names are unique
|
|
among the set of variables in a program.
|
|
|
|
For example, a simple continuation that receives two values and adds
|
|
them together can be matched like this, using the @code{match} form from
|
|
@code{(ice-9 match)}:
|
|
|
|
@smallexample
|
|
(match cont
|
|
(($ $kargs (x-name y-name) (x-var y-var)
|
|
($ $continue k src ($ $primcall '+ #f (x-var y-var))))
|
|
(format #t "Add ~a and ~a and pass the result to label ~a"
|
|
x-var y-var k)))
|
|
@end smallexample
|
|
|
|
Here we see the most common kind of continuation, @code{$kargs}, which
|
|
binds some number of values to variables and then evaluates a term.
|
|
|
|
@deftp {CPS Continuation} $kargs names vars term
|
|
Bind the incoming values to the variables @var{vars}, with original
|
|
names @var{names}, and then evaluate @var{term}.
|
|
@end deftp
|
|
|
|
The @var{names} of a @code{$kargs} are just for debugging, and will end
|
|
up residualized in the object file for use by the debugger.
|
|
|
|
The @var{term} in a @code{$kargs} is always a @code{$continue}, which
|
|
evaluates an expression and continues to a continuation.
|
|
|
|
@deftp {CPS Term} $continue k src exp
|
|
Evaluate the expression @var{exp} and pass the resulting values (if any)
|
|
to the continuation labelled @var{k}. The source information associated
|
|
with the expression may be found in @var{src}, which is either an alist
|
|
as in @code{source-properties} or is @code{#f} if there is no associated
|
|
source.
|
|
@end deftp
|
|
|
|
There are a number of expression kinds. Above you see an example of
|
|
@code{$primcall}.
|
|
|
|
@deftp {CPS Expression} $primcall name param args
|
|
Perform the primitive operation identified by @code{name}, a well-known
|
|
symbol, passing it the arguments @var{args}, and pass all resulting
|
|
values to the continuation.
|
|
|
|
@var{param} is a constant parameter whose interpretation is up to the
|
|
primcall in question. Usually it's @code{#f} but for a primcall that
|
|
might need some compile-time constant information -- such as
|
|
@code{add/immediate}, which adds a constant number to a value -- the
|
|
parameter holds this information.
|
|
|
|
The set of available primitives includes many primitives known to
|
|
Tree-IL and then some more; see the source code for details. Note that
|
|
some Tree-IL primcalls need to be converted to a sequence of lower-level
|
|
CPS primcalls. Again, see @code{(language tree-il compile-cps)} for
|
|
full details.
|
|
@end deftp
|
|
|
|
@cindex dominate, CPS
|
|
The variables that are used by @code{$primcall}, or indeed by any
|
|
expression, must be defined before the expression is evaluated. An
|
|
equivalent way of saying this is that predecessor @code{$kargs}
|
|
continuation(s) that bind the variables(s) used by the expression must
|
|
@dfn{dominate} the continuation that uses the expression: definitions
|
|
dominate uses. This condition is trivially satisfied in our example
|
|
above, but in general to determine the set of variables that are in
|
|
``scope'' for a given term, you need to do a flow analysis to see what
|
|
continuations dominate a term. The variables that are in scope are
|
|
those variables defined by the continuations that dominate a term.
|
|
|
|
Here is an inventory of the kinds of expressions in Guile's CPS
|
|
language, besides @code{$primcall} which has already been described.
|
|
Recall that all expressions are wrapped in a @code{$continue} term which
|
|
specifies their continuation.
|
|
|
|
@deftp {CPS Expression} $const val
|
|
Continue with the constant value @var{val}.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $prim name
|
|
Continue with the procedure that implements the primitive operation
|
|
named by @var{name}.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $call proc args
|
|
Call @var{proc} with the arguments @var{args}, and pass all values to
|
|
the continuation. @var{proc} and the elements of the @var{args} list
|
|
should all be variable names. The continuation identified by the term's
|
|
@var{k} should be a @code{$kreceive} or a @code{$ktail} instance.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $values args
|
|
Pass the values named by the list @var{args} to the continuation.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $prompt escape? tag handler
|
|
@end deftp
|
|
|
|
@cindex higher-order CPS
|
|
@cindex CPS, higher-order
|
|
@cindex first-order CPS
|
|
@cindex CPS, first-order
|
|
There are two sub-languages of CPS, @dfn{higher-order CPS} and
|
|
@dfn{first-order CPS}. The difference is that in higher-order CPS,
|
|
there are @code{$fun} and @code{$rec} expressions that bind functions or
|
|
mutually-recursive functions in the implicit scope of their use sites.
|
|
Guile transforms higher-order CPS into first-order CPS by @dfn{closure
|
|
conversion}, which chooses representations for all closures and which
|
|
arranges to access free variables through the implicit closure parameter
|
|
that is passed to every function call.
|
|
|
|
@deftp {CPS Expression} $fun body
|
|
Continue with a procedure. @var{body} names the entry point of the
|
|
function, which should be a @code{$kfun}. This expression kind is only
|
|
valid in higher-order CPS, which is the CPS language before closure
|
|
conversion.
|
|
@end deftp
|
|
|
|
@deftp {CPS Expression} $rec names vars funs
|
|
Continue with a set of mutually recursive procedures denoted by
|
|
@var{names}, @var{vars}, and @var{funs}. @var{names} is a list of
|
|
symbols, @var{vars} is a list of variable names (unique integers), and
|
|
@var{funs} is a list of @code{$fun} values. Note that the @code{$kargs}
|
|
continuation should also define @var{names}/@var{vars} bindings.
|
|
@end deftp
|
|
|
|
The contification pass will attempt to transform the functions declared
|
|
in a @code{$rec} into local continuations. Any remaining @code{$fun}
|
|
instances are later removed by the closure conversion pass. If the
|
|
function has no free variables, it gets allocated as a constant.
|
|
|
|
@deftp {CPS Expression} $const-fun label
|
|
A constant which is a function whose entry point is @var{label}. As a
|
|
constant, instances of @code{$const-fun} with the same @var{label} will
|
|
not allocate; the space for the function is allocated as part of the
|
|
compilation unit.
|
|
|
|
In practice, @code{$const-fun} expressions are reified by CPS-conversion
|
|
for functions whose call sites are not all visible within the
|
|
compilation unit and which have no free variables. This expression kind
|
|
is part of first-order CPS.
|
|
@end deftp
|
|
|
|
Otherwise, if the closure has free variables, it will be allocated at
|
|
its definition site via an @code{allocate-words} primcall and its free
|
|
variables initialized there. The code pointer in the closure is
|
|
initialized from a @code{$code} expression.
|
|
|
|
@deftp {CPS Expression} $code label
|
|
Continue with the value of @var{label}, which should denote some
|
|
@code{$kfun} continuation in the program. Used when initializing the
|
|
code pointer of closure objects.
|
|
@end deftp
|
|
|
|
However, If the closure can be proven to never escape its scope then
|
|
other lighter-weight representations can be chosen. Additionally, if
|
|
all call sites are known, closure conversion will hard-wire the calls by
|
|
lowering @code{$call} to @code{$callk}.
|
|
|
|
@deftp {CPS Expression} $callk label proc args
|
|
Like @code{$call}, but for the case where the call target is known to be
|
|
in the same compilation unit. @var{label} should denote some
|
|
@code{$kfun} continuation in the program. In this case the @var{proc}
|
|
is simply an additional argument, since it is not used to determine the
|
|
call target at run-time.
|
|
@end deftp
|
|
|
|
To summarize: a @code{$continue} is a CPS term that continues to a
|
|
single label. But there are other kinds of CPS terms that can continue
|
|
to a different number of labels: @code{$branch}, @code{$switch},
|
|
@code{$throw}, and @code{$prompt}.
|
|
|
|
@deftp {CPS Term} $branch kf kt src op param args
|
|
Evaluate the branching primcall @var{op}, with arguments @var{args} and
|
|
constant parameter @var{param}, and continue to @var{kt} with zero
|
|
values if the test is true. Otherwise continue to @var{kf}.
|
|
|
|
The @code{$branch} term is like a @code{$continue} term with a
|
|
@code{$primcall} expression, except that instead of binding a value and
|
|
continuing to a single label, the result of the test is not bound but
|
|
instead used to choose the continuation label.
|
|
|
|
The set of operations (corresponding to @var{op} values) that are valid
|
|
in a @var{$branch} is limited. In the general case, bind the result of
|
|
a test expression to a variable, and then make a @code{$branch} on a
|
|
@code{true?} op referencing that variable. The optimizer should inline
|
|
the branch if possible.
|
|
@end deftp
|
|
|
|
@deftp {CPS Term} $switch kf kt* src arg
|
|
Continue to a label in the list @var{k*} according to the index argument
|
|
@var{arg}, or to the default continuation @var{kf} if @var{arg} is
|
|
greater than or equal to the length @var{k*}. The index variable
|
|
@var{arg} is an unboxed, unsigned 64-bit value.
|
|
|
|
The @code{$switch} term is like C's @code{switch} statement. The
|
|
compiler to CPS can generate a @code{$switch} term directly, if the
|
|
source language has such a concept, or it can rely on the CPS optimizer
|
|
to turn appropriate chains of @code{$branch} statements to
|
|
@code{$switch} instances, which is what the Scheme compiler does.
|
|
@end deftp
|
|
|
|
@deftp {CPS Term} $throw src op param args
|
|
Throw a non-resumable exception. Throw terms do not continue at all.
|
|
The usual value of @var{op} is @code{throw}, with two arguments
|
|
@var{key} and @var{args}. There are also some specific primcalls that
|
|
compile to the VM @code{throw/value} and @code{throw/value+data}
|
|
instructions; see the code for full details.
|
|
|
|
The advantage of having @code{$throw} as a term is that, because it does
|
|
not continue, this allows the optimizer to gather more information from
|
|
type predicates. For example, if the predicate is @code{char?} and the
|
|
@var{kf} continues to a throw, the set of labels dominated by @var{kt}
|
|
is larger than if the throw notationally continued to some label that
|
|
would never be reached by the throw.
|
|
@end deftp
|
|
|
|
@deftp {CPS Term} $prompt k kh src escape? tag
|
|
Push a prompt on the stack identified by the variable name @var{tag},
|
|
which may be escape-only if @var{escape?} is true, and continue to
|
|
@var{kh} with zero values. If the body aborts to this prompt, control
|
|
will proceed at the continuation labelled @var{kh}, which should be a
|
|
@code{$kreceive} continuation. Prompts are later popped by
|
|
@code{pop-prompt} primcalls.
|
|
@end deftp
|
|
|
|
At this point we have described terms, expressions, and the most common
|
|
kind of continuation, @code{$kargs}. @code{$kargs} is used when the
|
|
predecessors of the continuation can be instructed to pass the values
|
|
where the continuation wants them. For example, if a @code{$kargs}
|
|
continuation @var{k} binds a variable @var{v}, and the compiler decides
|
|
to allocate @var{v} to slot 6, all predecessors of @var{k} should put
|
|
the value for @var{v} in slot 6 before jumping to @var{k}. One
|
|
situation in which this isn't possible is receiving values from function
|
|
calls. Guile has a calling convention for functions which currently
|
|
places return values on the stack. A continuation of a call must check
|
|
that the number of values returned from a function matches the expected
|
|
number of values, and then must shuffle or collect those values to named
|
|
variables. @code{$kreceive} denotes this kind of continuation.
|
|
|
|
@deftp {CPS Continuation} $kreceive arity k
|
|
Receive values on the stack. Parse them according to @var{arity}, and
|
|
then proceed with the parsed values to the @code{$kargs} continuation
|
|
labelled @var{k}. As a limitation specific to @code{$kreceive},
|
|
@var{arity} may only contain required and rest arguments.
|
|
@end deftp
|
|
|
|
@code{$arity} is a helper data structure used by @code{$kreceive} and
|
|
also by @code{$kclause}, described below.
|
|
|
|
@deftp {CPS Data} $arity req opt rest kw allow-other-keys?
|
|
A data type declaring an arity. @var{req} and @var{opt} are lists of
|
|
source names of required and optional arguments, respectively.
|
|
@var{rest} is either the source name of the rest variable, or @code{#f}
|
|
if this arity does not accept additional values. @var{kw} is a list of
|
|
the form @code{((@var{keyword} @var{name} @var{var}) ...)}, describing
|
|
the keyword arguments. @var{allow-other-keys?} is true if other keyword
|
|
arguments are allowed and false otherwise.
|
|
|
|
Note that all of these names with the exception of the @var{var}s in the
|
|
@var{kw} list are source names, not unique variable names.
|
|
@end deftp
|
|
|
|
Additionally, there are three specific kinds of continuations that are
|
|
only used in function entries.
|
|
|
|
@deftp {CPS Continuation} $kfun src meta self tail clause
|
|
Declare a function entry. @var{src} is the source information for the
|
|
procedure declaration, and @var{meta} is the metadata alist as described
|
|
above in Tree-IL's @code{<lambda>}. @var{self} is a variable bound to
|
|
the procedure being called, and which may be used for self-references.
|
|
@var{tail} is the label of the @code{$ktail} for this function,
|
|
corresponding to the function's tail continuation. @var{clause} is the
|
|
label of the first @code{$kclause} for the first @code{case-lambda}
|
|
clause in the function, or otherwise @code{#f}.
|
|
@end deftp
|
|
|
|
@deftp {CPS Continuation} $ktail
|
|
A tail continuation.
|
|
@end deftp
|
|
|
|
@deftp {CPS Continuation} $kclause arity cont alternate
|
|
A clause of a function with a given arity. Applications of a function
|
|
with a compatible set of actual arguments will continue to the
|
|
continuation labelled @var{cont}, a @code{$kargs} instance representing
|
|
the clause body. If the arguments are incompatible, control proceeds to
|
|
@var{alternate}, which is a @code{$kclause} for the next clause, or
|
|
@code{#f} if there is no next clause.
|
|
@end deftp
|
|
|
|
@node Building CPS
|
|
@subsubsection Building CPS
|
|
|
|
Unlike Tree-IL, the CPS language is built to be constructed and
|
|
deconstructed with abstract macros instead of via procedural
|
|
constructors or accessors, or instead of S-expression matching.
|
|
|
|
Deconstruction and matching is handled adequately by the @code{match}
|
|
form from @code{(ice-9 match)}. @xref{Pattern Matching}. Construction
|
|
is handled by a set of mutually builder macros:
|
|
@code{build-term}, @code{build-cont}, and @code{build-exp}.
|
|
|
|
In the following interface definitions, consider @code{term} and
|
|
@code{exp} to be built by @code{build-term} or @code{build-exp},
|
|
respectively. Consider any other name to be evaluated as a Scheme
|
|
expression. Many of these forms recognize @code{unquote} in some
|
|
contexts, to splice in a previously-built value; see the specifications
|
|
below for full details.
|
|
|
|
@deffn {Scheme Syntax} build-term ,val
|
|
@deffnx {Scheme Syntax} build-term ($continue k src exp)
|
|
@deffnx {Scheme Syntax} build-exp ,val
|
|
@deffnx {Scheme Syntax} build-exp ($const val)
|
|
@deffnx {Scheme Syntax} build-exp ($prim name)
|
|
@deffnx {Scheme Syntax} build-exp ($fun kentry)
|
|
@deffnx {Scheme Syntax} build-exp ($const-fun kentry)
|
|
@deffnx {Scheme Syntax} build-exp ($code kentry)
|
|
@deffnx {Scheme Syntax} build-exp ($rec names syms funs)
|
|
@deffnx {Scheme Syntax} build-exp ($call proc (arg ...))
|
|
@deffnx {Scheme Syntax} build-exp ($call proc args)
|
|
@deffnx {Scheme Syntax} build-exp ($callk k proc (arg ...))
|
|
@deffnx {Scheme Syntax} build-exp ($callk k proc args)
|
|
@deffnx {Scheme Syntax} build-exp ($primcall name param (arg ...))
|
|
@deffnx {Scheme Syntax} build-exp ($primcall name param args)
|
|
@deffnx {Scheme Syntax} build-exp ($values (arg ...))
|
|
@deffnx {Scheme Syntax} build-exp ($values args)
|
|
@deffnx {Scheme Syntax} build-exp ($prompt escape? tag handler)
|
|
@deffnx {Scheme Syntax} build-term ($branch kf kt src op param (arg ...))
|
|
@deffnx {Scheme Syntax} build-term ($branch kf kt src op param args)
|
|
@deffnx {Scheme Syntax} build-term ($switch kf kt* src arg)
|
|
@deffnx {Scheme Syntax} build-term ($throw src op param (arg ...))
|
|
@deffnx {Scheme Syntax} build-term ($throw src op param args)
|
|
@deffnx {Scheme Syntax} build-term ($prompt k kh src escape? tag)
|
|
@deffnx {Scheme Syntax} build-cont ,val
|
|
@deffnx {Scheme Syntax} build-cont ($kargs (name ...) (sym ...) term)
|
|
@deffnx {Scheme Syntax} build-cont ($kargs names syms term)
|
|
@deffnx {Scheme Syntax} build-cont ($kreceive req rest kargs)
|
|
@deffnx {Scheme Syntax} build-cont ($kfun src meta self ktail kclause)
|
|
@deffnx {Scheme Syntax} build-cont ($kclause ,arity kbody kalt)
|
|
@deffnx {Scheme Syntax} build-cont ($kclause (req opt rest kw aok?) kbody)
|
|
Construct a CPS term, expression, or continuation.
|
|
@end deffn
|
|
|
|
There are a few more miscellaneous interfaces as well.
|
|
|
|
@deffn {Scheme Procedure} make-arity req opt rest kw allow-other-keywords?
|
|
A procedural constructor for @code{$arity} objects.
|
|
@end deffn
|
|
|
|
@deffn {Scheme Syntax} rewrite-term val (pat term) ...
|
|
@deffnx {Scheme Syntax} rewrite-exp val (pat exp) ...
|
|
@deffnx {Scheme Syntax} rewrite-cont val (pat cont) ...
|
|
Match @var{val} against the series of patterns @var{pat...}, using
|
|
@code{match}. The body of the matching clause should be a template in
|
|
the syntax of @code{build-term}, @code{build-exp}, or @code{build-cont},
|
|
respectively.
|
|
@end deffn
|
|
|
|
@node CPS Soup
|
|
@subsubsection CPS Soup
|
|
|
|
We describe programs in Guile's CPS language as being a kind of ``soup''
|
|
because all continuations in the program are mixed into the same
|
|
``pot'', so to speak, without explicit markers as to what function or
|
|
scope a continuation is in. A program in CPS is a map from continuation
|
|
labels to continuation values. As discussed in the introduction, a
|
|
continuation label is an integer. No label may be negative.
|
|
|
|
As a matter of convention, label 0 should map to the @code{$kfun}
|
|
continuation of the entry to the program, which should be a function of
|
|
no arguments. The body of a function consists of the labelled
|
|
continuations that are reachable from the function entry. A program can
|
|
refer to other functions, either via @code{$fun} and @code{$rec} in
|
|
higher-order CPS, or via @code{$const-fun}, @code{$callk}, and allocated
|
|
closures in first-order CPS. The program logically contains all
|
|
continuations of all functions reachable from the entry function. A
|
|
compiler pass may leave unreachable continuations in a program;
|
|
subsequent compiler passes should ensure that their transformations and
|
|
analyses only take reachable continuations into account. It's OK though
|
|
if transformation runs over all continuations if including the
|
|
unreachable continuations has no effect on the transformations on the
|
|
live continuations.
|
|
|
|
@cindex intmap
|
|
The ``soup'' itself is implemented as an @dfn{intmap}, a functional
|
|
array-mapped trie specialized for integer keys. Intmaps associate
|
|
integers with values of any kind. Currently intmaps are a private data
|
|
structure only used by the CPS phase of the compiler. To work with
|
|
intmaps, load the @code{(language cps intmap)} module:
|
|
|
|
@example
|
|
(use-modules (language cps intmap))
|
|
@end example
|
|
|
|
Intmaps are functional data structures, so there is no constructor as
|
|
such: one can simply start with the empty intmap and add entries to it.
|
|
|
|
@example
|
|
(intmap? empty-intmap) @result{} #t
|
|
(define x (intmap-add empty-intmap 42 "hi"))
|
|
(intmap? x) @result{} #t
|
|
(intmap-ref x 42) @result{} "hi"
|
|
(intmap-ref x 43) @result{} @i{error: 43 not present}
|
|
(intmap-ref x 43 (lambda (k) "yo!")) @result{} "yo"
|
|
(intmap-add x 42 "hej") @result{} @i{error: 42 already present}
|
|
@end example
|
|
|
|
@code{intmap-ref} and @code{intmap-add} are the core of the intmap
|
|
interface. There is also @code{intmap-replace}, which replaces the
|
|
value associated with a given key, requiring that the key was present
|
|
already, and @code{intmap-remove}, which removes a key from an intmap.
|
|
|
|
Intmaps have a tree-like structure that is well-suited to set operations
|
|
such as union and intersection, so there are also the binary
|
|
@code{intmap-union} and @code{intmap-intersect} procedures. If the
|
|
result is equivalent to either argument, that argument is returned
|
|
as-is; in that way, one can detect whether the set operation produced a
|
|
new result simply by checking with @code{eq?}. This makes intmaps
|
|
useful when computing fixed points.
|
|
|
|
If a key is present in both intmaps and the associated values are not
|
|
the same in the sense of @code{eq?}, the resulting value is determined
|
|
by a ``meet'' procedure, which is the optional last argument to
|
|
@code{intmap-union}, @code{intmap-intersect}, and also to
|
|
@code{intmap-add}, @code{intmap-replace}, and similar functions. The
|
|
meet procedure will be called with the two values and should return the
|
|
intersected or unioned value in some domain-specific way. If no meet
|
|
procedure is given, the default meet procedure will raise an error.
|
|
|
|
To traverse over the set of values in an intmap, there are the
|
|
@code{intmap-next} and @code{intmap-prev} procedures. For example, if
|
|
intmap @var{x} has one entry mapping 42 to some value, we would have:
|
|
|
|
@example
|
|
(intmap-next x) @result{} 42
|
|
(intmap-next x 0) @result{} 42
|
|
(intmap-next x 42) @result{} 42
|
|
(intmap-next x 43) @result{} #f
|
|
(intmap-prev x) @result{} 42
|
|
(intmap-prev x 42) @result{} 42
|
|
(intmap-prev x 41) @result{} #f
|
|
@end example
|
|
|
|
There is also the @code{intmap-fold} procedure, which folds over keys
|
|
and values in the intmap from lowest to highest value, and
|
|
@code{intmap-fold-right} which does so in the opposite direction. These
|
|
procedures may take up to 3 seed values. The number of values that the
|
|
fold procedure returns is the number of seed values.
|
|
|
|
@example
|
|
(define q (intmap-add (intmap-add empty-intmap 1 2) 3 4))
|
|
(intmap-fold acons q '()) @result{} ((3 . 4) (1 . 2))
|
|
(intmap-fold-right acons q '()) @result{} ((1 . 2) (3 . 4))
|
|
@end example
|
|
|
|
When an entry in an intmap is updated (removed, added, or changed), a
|
|
new intmap is created that shares structure with the original intmap.
|
|
This operation ensures that the result of existing computations is not
|
|
affected by future computations: no mutation is ever visible to user
|
|
code. This is a great property in a compiler data structure, as it lets
|
|
us hold a copy of a program before a transformation and use it while we
|
|
build a post-transformation program. Updating an intmap is O(log
|
|
@var{n}) in the size of the intmap.
|
|
|
|
However, the O(log @var{n}) allocation costs are sometimes too much,
|
|
especially in cases when we know that we can just update the intmap in
|
|
place. As an example, say we have an intmap mapping the integers 1 to
|
|
100 to the integers 42 to 141. Let's say that we want to transform this
|
|
map by adding 1 to each value. There is already an efficient
|
|
@code{intmap-map} procedure in the @code{(language cps utils)} module,
|
|
but if we didn't know about that we might do:
|
|
|
|
@example
|
|
(define (intmap-increment map)
|
|
(let lp ((k 0) (map map))
|
|
(let ((k (intmap-next map k)))
|
|
(if k
|
|
(let ((v (intmap-ref map k)))
|
|
(lp (1+ k) (intmap-replace map k (1+ v))))
|
|
map))))
|
|
@end example
|
|
|
|
@cindex intmap, transient
|
|
@cindex transient intmaps
|
|
Observe that the intermediate values created by @code{intmap-replace}
|
|
are completely invisible to the program -- only the last result of
|
|
@code{intmap-replace} value is needed. The rest might as well share
|
|
state with the last one, and we could update in place. Guile allows
|
|
this kind of interface via @dfn{transient intmaps}, inspired by
|
|
Clojure's transient interface (@uref{http://clojure.org/transients}).
|
|
|
|
The in-place @code{intmap-add!} and @code{intmap-replace!} procedures
|
|
return transient intmaps. If one of these in-place procedures is called
|
|
on a normal persistent intmap, a new transient intmap is created. This
|
|
is an O(1) operation. In all other respects the interface is like their
|
|
persistent counterparts, @code{intmap-add} and @code{intmap-replace}.
|
|
If an in-place procedure is called on a transient intmap, the intmap is
|
|
mutated in-place and the same value is returned.
|
|
|
|
If a persistent operation like @code{intmap-add} is called on a
|
|
transient intmap, the transient's mutable substructure is then marked as
|
|
persistent, and @code{intmap-add} then runs on a new persistent intmap
|
|
sharing structure but not state with the original transient. Mutating a
|
|
transient will cause enough copying to ensure that it can make its
|
|
change, but if part of its substructure is already ``owned'' by it, no
|
|
more copying is needed.
|
|
|
|
We can use transients to make @code{intmap-increment} more efficient.
|
|
The two changed elements have been marked @strong{like this}.
|
|
|
|
@example
|
|
(define (intmap-increment map)
|
|
(let lp ((k 0) (map map))
|
|
(let ((k (intmap-next map k)))
|
|
(if k
|
|
(let ((v (intmap-ref map k)))
|
|
(lp (1+ k) (@strong{intmap-replace!} map k (1+ v))))
|
|
(@strong{persistent-intmap} map)))))
|
|
@end example
|
|
|
|
Be sure to tag the result as persistent using the
|
|
@code{persistent-intmap} procedure to prevent the mutability from
|
|
leaking to other parts of the program. For added paranoia, you could
|
|
call @code{persistent-intmap} on the incoming map, to ensure that if it
|
|
were already transient, that the mutations in the body of
|
|
@code{intmap-increment} wouldn't affect the incoming value.
|
|
|
|
In summary, programs in CPS are intmaps whose values are continuations.
|
|
See the source code of @code{(language cps utils)} for a number of
|
|
useful facilities for working with CPS values.
|
|
|
|
@node Compiling CPS
|
|
@subsubsection Compiling CPS
|
|
|
|
Compiling CPS in Guile has three phases: conversion, optimization, and
|
|
code generation.
|
|
|
|
CPS conversion is the process of taking a higher-level language and
|
|
compiling it to CPS. Source languages can do this directly, or they can
|
|
convert to Tree-IL (which is probably easier) and let Tree-IL convert to
|
|
CPS later. Going through Tree-IL has the advantage of running Tree-IL
|
|
optimization passes, like partial evaluation. Also, the compiler from
|
|
Tree-IL to CPS handles assignment conversion, in which assigned local
|
|
variables (in Tree-IL, locals that are @code{<lexical-set>}) are
|
|
converted to being boxed values on the heap. @xref{Variables and the
|
|
VM}.
|
|
|
|
After CPS conversion, Guile runs some optimization passes over the CPS.
|
|
Most optimization in Guile is done on the CPS language. The one major
|
|
exception is partial evaluation, which for historic reasons is done on
|
|
Tree-IL.
|
|
|
|
The major optimization performed on CPS is contification, in which
|
|
functions that are always called with the same continuation are
|
|
incorporated directly into a function's body. This opens up space for
|
|
more optimizations, and turns procedure calls into @code{goto}. It can
|
|
also make loops out of recursive function nests. Guile also does dead
|
|
code elimination, common subexpression elimination, loop peeling and
|
|
invariant code motion, and range and type inference.
|
|
|
|
The rest of the optimization passes are really cleanups and
|
|
canonicalizations. CPS spans the gap between high-level languages and
|
|
low-level bytecodes, which allows much of the compilation process to be
|
|
expressed as source-to-source transformations. Such is the case for
|
|
closure conversion, in which references to variables that are free in a
|
|
function are converted to closure references, and in which functions are
|
|
converted to closures. There are a few more passes to ensure that the
|
|
only primcalls left in the term are those that have a corresponding
|
|
instruction in the virtual machine, and that their continuations expect
|
|
the right number of values.
|
|
|
|
Finally, the backend of the CPS compiler emits bytecode for each
|
|
function, one by one. To do so, it determines the set of live variables
|
|
at all points in the function. Using this liveness information, it
|
|
allocates stack slots to each variable, such that a variable can live in
|
|
one slot for the duration of its lifetime, without shuffling. (Of
|
|
course, variables with disjoint lifetimes can share a slot.) Finally
|
|
the backend emits code, typically just one VM instruction, for each
|
|
continuation in the function.
|
|
|
|
|
|
@node Bytecode
|
|
@subsection Bytecode
|
|
|
|
As mentioned before, Guile compiles all code to bytecode, and that
|
|
bytecode is contained in ELF images. @xref{Object File Format}, for
|
|
more on Guile's use of ELF.
|
|
|
|
To produce a bytecode image, Guile provides an assembler and a linker.
|
|
|
|
The assembler, defined in the @code{(system vm assembler)} module, has a
|
|
relatively straightforward imperative interface. It provides a
|
|
@code{make-assembler} function to instantiate an assembler and a set of
|
|
@code{emit-@var{inst}} procedures to emit instructions of each kind.
|
|
|
|
The @code{emit-@var{inst}} procedures are actually generated at
|
|
compile-time from a machine-readable description of the VM. With a few
|
|
exceptions for certain operand types, each operand of an emit procedure
|
|
corresponds to an operand of the corresponding instruction.
|
|
|
|
Consider @code{allocate-words}, from @pxref{Memory Access Instructions}.
|
|
It is documented as:
|
|
|
|
@deftypefn Instruction {} allocate-words s12:@var{dst} s12:@var{nwords}
|
|
@end deftypefn
|
|
|
|
Therefore the emit procedure has the form:
|
|
|
|
@deffn {Scheme Procedure} emit-allocate-words asm dst nwords
|
|
@end deffn
|
|
|
|
All emit procedure take the assembler as their first argument, and
|
|
return no useful values.
|
|
|
|
The argument types depend on the operand types. @xref{Instruction Set}.
|
|
Most are integers within a restricted range, though labels are generally
|
|
expressed as opaque symbols. Besides the emitters that correspond to
|
|
instructions, there are a few additional helpers defined in the
|
|
assembler module.
|
|
|
|
@deffn {Scheme Procedure} emit-label asm label
|
|
Define a label at the current program point.
|
|
@end deffn
|
|
|
|
@deffn {Scheme Procedure} emit-source asm source
|
|
Associate @var{source} with the current program point.
|
|
@end deffn
|
|
|
|
@deffn {Scheme Procedure} emit-cache-ref asm dst key
|
|
@deffnx {Scheme Procedure} emit-cache-set! asm key val
|
|
Macro-instructions to implement compilation-unit caches. A single cache
|
|
cell corresponding to @var{key} will be allocated for the compilation
|
|
unit.
|
|
@end deffn
|
|
|
|
@deffn {Scheme Procedure} emit-load-constant asm dst constant
|
|
Load the Scheme datum @var{constant} into @var{dst}.
|
|
@end deffn
|
|
|
|
@deffn {Scheme Procedure} emit-begin-program asm label properties
|
|
@deffnx {Scheme Procedure} emit-end-program asm
|
|
Delimit the bounds of a procedure, with the given @var{label} and the
|
|
metadata @var{properties}.
|
|
@end deffn
|
|
|
|
@deffn {Scheme Procedure} emit-load-static-procedure asm dst label
|
|
Load a procedure with the given @var{label} into local @var{dst}. This
|
|
macro-instruction should only be used with procedures without free
|
|
variables -- procedures that are not closures.
|
|
@end deffn
|
|
|
|
@deffn {Scheme Procedure} emit-begin-standard-arity asm req nlocals alternate
|
|
@deffnx {Scheme Procedure} emit-begin-opt-arity asm req opt rest nlocals alternate
|
|
@deffnx {Scheme Procedure} emit-begin-kw-arity asm req opt rest kw-indices allow-other-keys? nlocals alternate
|
|
@deffnx {Scheme Procedure} emit-end-arity asm
|
|
Delimit a clause of a procedure.
|
|
@end deffn
|
|
|
|
The linker is a complicated beast. Hackers interested in how it works
|
|
would do well do read Ian Lance Taylor's series of articles on linkers.
|
|
Searching the internet should find them easily. From the user's
|
|
perspective, there is only one knob to control: whether the resulting
|
|
image will be written out to a file or not. If the user passes
|
|
@code{#:to-file? #t} as part of the compiler options (@pxref{The Scheme
|
|
Compiler}), the linker will align the resulting segments on page
|
|
boundaries, and otherwise not.
|
|
|
|
@deffn {Scheme Procedure} link-assembly asm #:page-aligned?=#t
|
|
Link an ELF image, and return the bytevector. If @var{page-aligned?} is
|
|
true, Guile will align the segments with different permissions on
|
|
page-sized boundaries, in order to maximize code sharing between
|
|
different processes. Otherwise, padding is minimized, to minimize
|
|
address space consumption.
|
|
@end deffn
|
|
|
|
To write an image to disk, just use @code{put-bytevector} from
|
|
@code{(ice-9 binary-ports)}.
|
|
|
|
Compiling object code to the fake language, @code{value}, is performed
|
|
via loading objcode into a program, then executing that thunk with
|
|
respect to the compilation environment. Normally the environment
|
|
propagates through the compiler transparently, but users may specify the
|
|
compilation environment manually as well, as a module. Procedures to
|
|
load images can be found in the @code{(system vm loader)} module:
|
|
|
|
@lisp
|
|
(use-modules (system vm loader))
|
|
@end lisp
|
|
|
|
@deffn {Scheme Variable} load-thunk-from-file file
|
|
@deffnx {C Function} scm_load_thunk_from_file (file)
|
|
Load object code from a file named @var{file}. The file will be mapped
|
|
into memory via @code{mmap}, so this is a very fast operation.
|
|
@end deffn
|
|
|
|
@deffn {Scheme Variable} load-thunk-from-memory bv
|
|
@deffnx {C Function} scm_load_thunk_from_memory (bv)
|
|
Load object code from a bytevector. The data will be copied out of the
|
|
bytevector in order to ensure proper alignment of embedded Scheme
|
|
values.
|
|
@end deffn
|
|
|
|
Additionally there are procedures to find the ELF image for a given
|
|
pointer, or to list all mapped ELF images:
|
|
|
|
@deffn {Scheme Variable} find-mapped-elf-image ptr
|
|
Given the integer value @var{ptr}, find and return the ELF image that
|
|
contains that pointer, as a bytevector. If no image is found, return
|
|
@code{#f}. This routine is mostly used by debuggers and other
|
|
introspective tools.
|
|
@end deffn
|
|
|
|
@deffn {Scheme Variable} all-mapped-elf-images
|
|
Return all mapped ELF images, as a list of bytevectors.
|
|
@end deffn
|
|
|
|
|
|
@node Writing New High-Level Languages
|
|
@subsection Writing New High-Level Languages
|
|
|
|
In order to integrate a new language @var{lang} into Guile's compiler
|
|
system, one has to create the module @code{(language @var{lang} spec)}
|
|
containing the language definition and referencing the parser,
|
|
compiler and other routines processing it. The module hierarchy in
|
|
@code{(language brainfuck)} defines a very basic Brainfuck
|
|
implementation meant to serve as easy-to-understand example on how to
|
|
do this. See for instance @url{http://en.wikipedia.org/wiki/Brainfuck}
|
|
for more information about the Brainfuck language itself.
|
|
|
|
|
|
@node Extending the Compiler
|
|
@subsection Extending the Compiler
|
|
|
|
At this point we take a detour from the impersonal tone of the rest of
|
|
the manual. Admit it: if you've read this far into the compiler
|
|
internals manual, you are a junkie. Perhaps a course at your university
|
|
left you unsated, or perhaps you've always harbored a desire to hack the
|
|
holy of computer science holies: a compiler. Well you're in good
|
|
company, and in a good position. Guile's compiler needs your help.
|
|
|
|
There are many possible avenues for improving Guile's compiler.
|
|
Probably the most important improvement, speed-wise, will be some form
|
|
of optimized ahead-of-time native compilation with global register
|
|
allocation. A first pass could simply extend the compiler to also emit
|
|
machine code in addition to bytecode, pre-filling the corresponding JIT
|
|
data structures referenced by the @code{instrument-entry} bytecodes.
|
|
@xref{Instrumentation Instructions}.
|
|
|
|
The compiler also needs help at the top end, adding new high-level
|
|
compilers. We have JavaScript and Emacs Lisp mostly complete, but they
|
|
could use some love; Lua would be nice as well, but whatever language it
|
|
is that strikes your fancy would be welcome too.
|
|
|
|
Compilers are for hacking, not for admiring or for complaining about.
|
|
Get to it!
|