* Introduction

Version: $Id: langtools.text,v 1.5 2000-08-13 04:47:26 mdj Exp $

This is a proposal for how Guile could interface with language
translators. It will be posted on the Guile list and revised for a
short time (days rather than weeks) before being implemented.

The document can be found in the CVS repository as
guile-core/devel/translation/langtools.text. All Guile developers are
welcome to modify and extend it according to the ongoing discussion
using CVS.

Ideas and comments are welcome.

For clarity, the proposal is partially written as if describing an
already existing system.

MDJ 000812 <djurfeldt@nada.kth.se>

* Language names

A translator for Guile is a certain kind of Guile module, implemented
in Scheme, C, or a mixture of both.

To make things simple, the name of the language is closely related to
the name of the translator module.

Languages have long and short names. The long form is simply the name
of the translator module: `(lang ctax)', `(lang emacs-lisp)',
`(my-modules foo-lang)', etc.

Languages with the long name `(lang IDENTIFIER)' can be referred to
by the short name IDENTIFIER, for example `emacs-lisp'.
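
The naming rule can be sketched in Scheme as follows. The helper names
`language-long-name' and `language-short-name' are hypothetical, not
part of Guile or of this proposal. A bare symbol is treated as a short
name, and only `(lang IDENTIFIER)' has a short form.

```scheme
;; Hypothetical helpers illustrating the naming rule; not a Guile API.

;; A symbol is a short name and expands to (lang NAME); a list is
;; already a long name (the name of the translator module).
(define (language-long-name name)
  (if (symbol? name)
      (list 'lang name)
      name))

;; Only long names of the form (lang IDENTIFIER) have a short form.
(define (language-short-name long-name)
  (if (and (pair? long-name)
           (eq? (car long-name) 'lang)
           (pair? (cdr long-name))
           (null? (cddr long-name)))
      (cadr long-name)
      #f))
```

With these helpers, `emacs-lisp' and `(lang emacs-lisp)' name the same
translator, while `(my-modules foo-lang)' can only be referred to by its
long name.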

* How to tell Guile to read code in a language other than Scheme

There are four methods of specifying which translator to use when
reading a file:

** Command option

The options to the guile command are parsed linearly from left to
right. You can change the language at zero or more points using the
option

  -t, --language LANGUAGE

Example:

  guile -t emacs-lisp -l foo -l bar -t scheme -l baz

will use the emacs-lisp translator while reading "foo" and "bar", and
the default translator (scheme) for "baz".
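
The left-to-right rule amounts to remembering a "current language"
while scanning the arguments. A minimal sketch, assuming a hypothetical
procedure `files-with-languages' that only understands -t/--language
and -l and ignores every other argument:

```scheme
;; Hypothetical sketch: pair every "-l FILE" with the language that is
;; current at that point of a left-to-right scan of the arguments.
(define (files-with-languages args)
  (let loop ((args args) (language 'scheme) (acc '()))
    (cond ((null? args) (reverse acc))
          ;; -t / --language LANGUAGE changes the current translator
          ((and (member (car args) '("-t" "--language"))
                (pair? (cdr args)))
           (loop (cddr args) (string->symbol (cadr args)) acc))
          ;; -l FILE is read with the current translator
          ((and (string=? (car args) "-l") (pair? (cdr args)))
           (loop (cddr args) language
                 (cons (cons (cadr args) language) acc)))
          ;; anything else is ignored in this sketch
          (else (loop (cdr args) language acc)))))
```

Applied to the argument list of the example above, it yields
(("foo" . emacs-lisp) ("bar" . emacs-lisp) ("baz" . scheme)).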

You can use this technique in a script together with the meta switch:

  #!/usr/local/bin/guile \
  -t emacs-lisp -s
  !#

** Commentary in file

When opening a file for reading, Guile will read the first few lines,
looking for the string "-*- LANGNAME -*-", where LANGNAME can be
either the long or short form of the name.

If found, the corresponding translator is loaded and used to read the
file.
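
A sketch of the scan, assuming a hypothetical `find-language-cookie'
that inspects at most MAX-LINES lines and reads the name with the
Scheme reader, so that both a short name such as `emacs-lisp' and a
long name such as `(lang ctax)' come back as Scheme data. It uses
Guile's regular expression support and `read-line' from (ice-9 rdelim).

```scheme
(use-modules (ice-9 rdelim) (ice-9 regex))

;; "-*-", optional blanks, the name (ending in a non-blank), blanks, "-*-"
(define language-cookie
  (make-regexp "-\\*-[ \t]*(.*[^ \t])[ \t]*-\\*-"))

;; Hypothetical sketch: return LANGNAME read as a Scheme datum, or #f if
;; none of the first MAX-LINES lines of PORT contains the marker.
(define (find-language-cookie port max-lines)
  (let loop ((n 0))
    (and (< n max-lines)
         (let ((line (read-line port)))
           (and (not (eof-object? line))
                (let ((m (regexp-exec language-cookie line)))
                  (if m
                      (read (open-input-string (match:substring m 1)))
                      (loop (+ n 1)))))))))
```

For a file whose second line is ";; -*- emacs-lisp -*-", calling it with
MAX-LINES 3 returns the symbol emacs-lisp.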

** File extension

Guile maintains an alist mapping filename extensions to languages.
Each entry has the form:

  (REGEXP . LANGNAME)

where REGEXP is a string and LANGNAME is a symbol or a list of symbols.

The alist can be accessed using `language-alist', which is exported
by the module `(core config)':

  (language-alist)                  --> current alist
  (language-alist ALIST)            sets the alist to ALIST
  (language-alist ALIST :prepend)   prepends ALIST onto the current alist
  (language-alist ALIST :append)    appends ALIST after the current alist

The `load' command will match filenames against this alist and choose
the translator to use accordingly.
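
A sketch of the accessor and of the lookup `load' could perform. It
assumes that the alist lives in a hypothetical variable
`%language-alist' and that entries are tried in order. It is written
with Guile's #: keyword syntax; after (read-set! keywords 'prefix) the
keywords can be written :prepend and :append as in the text.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical storage for the (REGEXP . LANGNAME) entries.
(define %language-alist
  '(("\\.el$" . emacs-lisp)))

;; With no arguments, return the alist; otherwise replace, prepend or
;; append, then return the new alist.
(define* (language-alist #:optional alist mode)
  (when alist
    (set! %language-alist
          (case mode
            ((#:prepend) (append alist %language-alist))
            ((#:append) (append %language-alist alist))
            (else alist))))
  %language-alist)

;; Return the LANGNAME of the first entry whose REGEXP matches FILENAME,
;; or #f, in which case the default translator (scheme) would be used.
(define (filename->language filename)
  (let loop ((entries %language-alist))
    (cond ((null? entries) #f)
          ((string-match (caar entries) filename) (cdar entries))
          (else (loop (cdr entries))))))
```

Because entries are tried in order, prepended entries take precedence
over the defaults when several regexps match the same filename.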

There will be a default alist for common translators. For translators
not listed, the alist has to be extended in .guile, just as Emacs users
extend auto-mode-alist in .emacs.

** Module header

You specify the language used by a module with the :language option in
the module header. (See below under "Module configuration language".)

* Module system

This section describes how the Guile module system is adapted for use
with other languages.

** Module configuration language

*** The `(config)' module

Guile has a sophisticated module system. We don't require each
translator implementation to implement its own syntax for modules.
That would be too much work for the implementor, and users would have
to learn the module system anew for each syntax.

Instead, the module `(config)' exports the module header form
`(define-module ...)'.

The config module also exports a number of primitives by which you can
customize the Guile library, such as `language-alist' and `load-path'.

*** Default module environment

The bindings of the config module are available in the default
interaction environment when Guile starts up. This is because the
config module is on the module use list for the startup environment.

However, config bindings are *not* available by default in new
modules.

The default module environment provides bindings from the R5RS module
only.

*** Module headers

The module header of the current module system is the form

  (define-module NAME OPTION1 ...)

You can specify a translator using the option

  :language LANGNAME

where LANGNAME is the long or short form of the language name, as
described above.

The translator is fed characters from the module file, starting
immediately after the closing parenthesis of the module header form.
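
Because the module header is itself a Scheme form, this split can be
sketched with the ordinary Scheme reader: `read' stops right after the
header's closing parenthesis, and the rest of the port is what the
translator would receive. The procedure name `split-module-file' is
hypothetical; `get-string-all' comes from Guile's (rnrs io ports).

```scheme
(use-modules ((rnrs io ports) #:select (get-string-all)))

;; Hypothetical sketch: return the header form and the remaining text.
(define (split-module-file port)
  (let* ((header (read port))           ; consumes exactly the header form
         (rest (get-string-all port)))  ; everything after its `)'
    (values header (if (eof-object? rest) "" rest))))
```

Note that the remaining text starts with whatever follows the closing
parenthesis, typically a newline.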

NOTE: There can be only one module header per file.

It is also possible to put the module header in a separate file and
use the option

  :file FILENAME

to point out a file containing the actual code.

Example:

foo.gm:
----------------------------------------------------------------------
(define-module (foo)
  :language emacs-lisp
  :file "foo.el"
  :export (foo bar))
----------------------------------------------------------------------

foo.el:
----------------------------------------------------------------------
(defun foo ()
  ...)

(defun bar ()
  ...)
----------------------------------------------------------------------
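
A loader has to find :language and :file among the header options. A
simplified sketch, assuming a hypothetical `module-header-options' and
assuming that every option takes exactly one argument (options without
an argument are not handled here):

```scheme
;; Hypothetical sketch: turn the options of
;; (define-module NAME OPTION ARG ...) into an alist of (OPTION . ARG).
(define (module-header-options header)
  (let loop ((options (cddr header)) (acc '()))
    (if (or (null? options) (null? (cdr options)))
        (reverse acc)
        (loop (cddr options)
              (acons (car options) (cadr options) acc)))))
```

For the foo.gm header above, the result maps :language to emacs-lisp,
:file to "foo.el" and :export to (foo bar).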

** Repl commands

Up till now, Guile has depended on the bindings available in the
selected module for basic operations such as moving to a different
module, entering the debugger or getting documentation.

This is not acceptable, since we want to be able to control Guile
consistently regardless of which module we are in, and since we don't
want to equip a module with bindings that have nothing to do with the
purpose of the module.

Therefore, the repl provides a special command language on top of
whatever syntax the current module provides. (Scheme48 and RScheme
provide similar repl command languages.)

[Jost Boekemeier has suggested the following alternative solution:
Commands are bindings just like any other binding. It is enough if
some modules carry command bindings (in fact it is enough if *one*
module has them), because from such a module you can use the command
(in MODULE) to walk into a module not carrying command bindings, and
then use CTRL-D to exit.

However, this has several disadvantages: it mixes the "real" bindings
with command bindings (the module might want to use "in" for other
purposes); CTRL-D could cause problems, since for some channels CTRL-D
might close down the connection; and using one type of command ("in")
to go "into" the module and another (CTRL-D) to "exit" is more complex
than simply "going to" a module.]

*** Repl command syntax

Normally, repl commands have the syntax

  ,COMMAND ARG1 ...

Input starting with an arbitrary amount of whitespace followed by a
comma thus works as an escape syntax.

This syntax is probably compatible with all languages. (Note that we
don't need to activate the lexer of the language until we've checked
whether the first non-whitespace char is a comma.)
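
The escape check can be done before any language lexer is involved. A
sketch, assuming a hypothetical `repl-read' that receives the current
module's `native-read' as an argument and returns commands as
(repl-command NAME ARGUMENT-TEXT):

```scheme
(use-modules (ice-9 rdelim))

;; Hypothetical sketch: skip leading whitespace; if the next character is
;; a comma, the rest of the line is a repl command, otherwise the port is
;; handed to NATIVE-READ.
(define (repl-read port native-read)
  (let skip ()
    (let ((c (peek-char port)))
      (cond ((eof-object? c) c)
            ((char-whitespace? c) (read-char port) (skip))
            ((char=? c #\,)
             (read-char port)                    ; drop the comma
             (let* ((line (read-line port))
                    (in (open-input-string
                         (if (eof-object? line) "" line)))
                    (name (read in))             ; command name, e.g. `in'
                    (args (read-line in)))       ; raw argument text
               (list 'repl-command name
                     (if (eof-object? args) "" (string-trim-both args)))))
            (else (native-read port))))))
```

Here "   ,in (guile-user)" yields (repl-command in "(guile-user)"), while
"  (+ 1 2)" is passed to NATIVE-READ with only the leading whitespace
consumed.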

(Hypothetically, if this became a problem, we could provide a means
of disabling this behaviour of the repl and let that particular
language module take sole control of reading at the repl prompt.)

Among the commands available are:

*** ,in MODULE

Select the module named MODULE, that is, any new expressions typed by
the user after this command will be evaluated in the evaluation
environment provided by MODULE.

*** ,in MODULE EXPR

Evaluate expression EXPR in MODULE. EXPR has the syntax supplied by
the language used by MODULE.

*** ,use MODULE

Import all bindings exported by MODULE into the current module.
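
These commands map naturally onto Guile's module procedures
`resolve-module', `set-current-module', `eval', `module-use!' and
`resolve-interface'. A sketch of a hypothetical dispatcher, assuming
the arguments have already been read as Scheme data; in the proposal,
EXPR would instead be read with the `native-read' of MODULE's language.

```scheme
;; Hypothetical dispatcher for ,in and ,use.  ARGS is a list such as
;; ((guile-user)) or ((guile) (+ 1 2)).
(define (repl-command name args)
  (case name
    ((in)
     (let ((module (resolve-module (car args))))
       (if (null? (cdr args))
           (begin (set-current-module module) module) ; ,in MODULE
           (eval (cadr args) module))))                ; ,in MODULE EXPR
    ((use)                                             ; ,use MODULE
     (module-use! (current-module) (resolve-interface (car args)))
     (if #f #f))
    (else (error "unknown repl command:" name))))
```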

* Language modules

Since code written in any language should be able to perform most
tasks, which may include reading, evaluating, writing and generally
computing with expressions and data originating from other languages,
we want the basic reading, evaluation and printing operations to be
independent of the language.

That is, instead of supplying separate `read', `eval' and `write'
procedures for different languages, a language module is required to
use the system procedures in the translated code.

This means that the behaviour of `read', `eval' and `write' is
context dependent. (See further "How Guile system procedures `read',
`eval', `write' use language modules" below.)

** Language data types

Each language module should try to use the fundamental Scheme data
types as far as possible.

Some data types have important differences in semantics between
languages, though, and not all required data types may exist in
Guile.

In such cases, the language module must supply its own, distinct, data
types. So, each language supported by Guile uses a certain set of
data types, with the basic Scheme data types as the intersection
between all sets.

Specifically, syntax trees representing source code expressions should
normally be a distinct data type.

** Foreign language escape syntax

Note that such data can flow freely between modules. In order to
accommodate data with different native syntaxes, each language module
provides a foreign language escape syntax. In Scheme, this syntax
uses the sharp comma extension specified by SRFI-10. The read
constructor is simply the last symbol in the long language name (which
is usually the same as the short language name).
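
Guile implements SRFI-10 in the module (srfi srfi-10), whose
`define-reader-ctor' registers a read constructor. A sketch for ctax,
assuming a hypothetical record type standing in for ctax syntax trees.
The payload is written as a string here because `;' starts a comment
for the Scheme reader, so a form like #,(ctax 1+2;) cannot be read back
as it stands.

```scheme
(use-modules (srfi srfi-9) (srfi srfi-10))

;; Hypothetical stand-in for a distinct ctax data type (a syntax tree).
(define-record-type <ctax-expression>
  (make-ctax-expression source)
  ctax-expression?
  (source ctax-expression-source))

;; The read constructor is the last symbol of the long name (lang ctax).
(define ctax-constructor-name (car (last-pair '(lang ctax))))

;; SRFI-10: when the reader sees #,(ctax ARG), it calls this on ARG.
(define-reader-ctor ctax-constructor-name
  (lambda (source) (make-ctax-expression source)))
```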

** Example 1

Characters have a syntax in both Scheme and ctax. Lists currently
have a syntax in Scheme but lack a ctax syntax. Ctax doesn't have a
data type "enum", but we pretend it has for this example.

The following table shows the syntax used for reading and writing
these expressions in module A, using the language scheme, and module B,
using the language ctax (we assume that the foreign language escape
syntax in ctax is #LANGUAGE EXPR):

           A                  B

  chars    #\X                'X'
  lists    (1 2 3)            #scheme (1 2 3)
  enums    #,(ctax ENUM)      ENUM

** Example 2

A user is typing expressions in a ctax module which imports the
bindings x and y from the module `(foo)':

  ctax> x = read ();
  1+2;
  1+2;
  ctax> x
  1+2;
  ctax> y = 1;
  1
  ctax> y;
  1
  ctax> ,in (guile-user)
  guile> ,use (foo)
  guile> x
  #,(ctax 1+2;)
  guile> y
  1
  guile>

The example shows that ctax uses a distinct representation for ctax
expressions, but Scheme integers for integers.

** Language module interface

A language module is an ordinary Guile module importing bindings from
other modules and exporting bindings through its public interface.

It is required to export the following variable and procedures:

*** language-environment --> ENVIRONMENT

Returns a fresh top-level ENVIRONMENT (a module) where expressions
in this language are evaluated by default.

Modules using this language will by default have this environment
on their use list.

The intention is for this procedure to provide the "run-time
environment" for the language.
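
A sketch in terms of Guile's module procedures `make-module',
`module-use!' and `resolve-interface', assuming that the core module
(guile) stands in for the language's run-time library:

```scheme
;; Hypothetical sketch: every call builds a new, empty module whose use
;; list contains the core (guile) bindings.  A real language module would
;; also add its own run-time library here.
(define (language-environment)
  (let ((env (make-module)))
    (module-use! env (resolve-interface '(guile)))
    env))
```

Each call returns a distinct module, so definitions made while
evaluating in one environment do not leak into the next.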

*** native-read PORT --> OBJECT

Read the next expression in the foreign syntax from PORT and return an
object OBJECT representing it.

It is entirely up to the language module to define what one
expression is, that is, how much to read.

In Lisp-like languages, `native-read' corresponds to `read'. Note
that in such languages, OBJECT need not be source code, but could
be data.

The representation of OBJECT is also chosen by the language
module. It can consist of Scheme data types, data types distinct for
the language, or a mixture.

There is one requirement, however: distinct data types must be
instances of a subclass of `language-specific-class'.

This procedure will be called during interactive use (the user
types expressions at a prompt) and when the system `read'
procedure is called at a time when a module using this language is
selected.

Some languages (for example Python) parse differently depending on
whether the session is interactive or not. Guile provides the
predicate `interactive-port?' to test for this.

*** language-specific-class

This variable contains the superclass of all non-Scheme data types
provided by the language.
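
A sketch of these two requirements for a hypothetical toy language
whose statements end with `;'. GOOPS from (oop goops) supplies the
classes and `read-delimited' from (ice-9 rdelim) does the reading; the
class names are illustrative.

```scheme
(use-modules (oop goops) (ice-9 rdelim))

;; All non-Scheme data types of the toy language derive from this class.
(define-class <toy-object> ())
(define language-specific-class <toy-object>)

;; A toy syntax tree: just the trimmed text of one statement.
(define-class <toy-expression> (<toy-object>)
  (text #:init-keyword #:text #:getter toy-expression-text))

;; Read one statement; `read-delimited' consumes the `;' and returns the
;; text before it, or the EOF object when nothing is left.
(define (native-read port)
  (let ((text (read-delimited ";" port)))
    (if (eof-object? text)
        text
        (let ((trimmed (string-trim-both text)))
          (if (and (string-null? trimmed) (eof-object? (peek-char port)))
              (peek-char port)          ; only trailing whitespace was left
              (make <toy-expression> #:text trimmed))))))
```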

*** native-write OBJECT PORT

This procedure prints OBJECT on PORT using the language's own
syntax.

*** write-foreign-syntax OBJECT LANGUAGE NATIVE-WRITE PORT

Write OBJECT in the foreign language escape syntax of this module.
The object is specific to language LANGUAGE and can be written using
NATIVE-WRITE.

Here's an implementation for Scheme, which uses the SRFI-10 sharp
comma syntax:

  (define (write-foreign-syntax object language native-write port)
    ;; Open the escape, for example "#,(ctax "
    (format port "#,(~A " language)
    ;; Let the foreign language print the object in its own syntax
    (native-write object port)
    ;; Close the escape
    (display #\) port))

*** translate EXPRESSION --> SCHEMECODE

Translate an EXPRESSION into SCHEMECODE.

EXPRESSION can be anything returned by `read'.

SCHEMECODE is Scheme source code represented using ordinary Scheme
data. It will be passed to `eval' in an environment containing
the bindings of the environment returned by `language-environment'.

This procedure will be called during interactive use and when the
system `eval' procedure is called.
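
A sketch for a hypothetical toy language whose only construct is a sum
such as "x+1". To keep it self-contained, EXPRESSION is assumed to be
the statement text as a string rather than a distinct syntax-tree type.

```scheme
;; A number if the text parses as one, otherwise a variable reference.
(define (translate-operand text)
  (let ((s (string-trim-both text)))
    (or (string->number s) (string->symbol s))))

;; Hypothetical sketch: "a+b+c" becomes the Scheme list (+ a b c).
(define (translate expression)
  (let ((operands (map translate-operand
                       (string-split expression #\+))))
    (if (null? (cdr operands))
        (car operands)
        (cons '+ operands))))
```

The result is ordinary Scheme data, so (eval (translate "1+2+3")
(interaction-environment)) evaluates (+ 1 2 3).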

*** translate-all PORT [ALIST] --> THUNK

Translate the entire stream of characters from PORT until #<eof>.
Return a THUNK which can be called repeatedly like this:

  THUNK --> SCHEMECODE

Each call will yield a new piece of Scheme code. The THUNK signals
the end of translation by returning the value *end-of-translation*
(which is tested using the predicate `end-of-translation?').

The optional argument ALIST provides compilation options for the
translator:

  (debug . #t)   means produce code suitable for debugging

This procedure will be called by the system `load' command and by
the module system when loading files.

The intentions are:

  1. To let the language module decide when and in how large chunks
     to do the processing. It may choose to do all processing at
     the time translate-all is called, all processing when THUNK is
     called the first time, small pieces of processing each time
     THUNK is called, or any conceivable combination.

  2. To let the language module decide in how large chunks to output
     the resulting Scheme code, in order not to overload memory.

  3. To enable the language module to use temporary files, and
     whole-module analysis and optimization techniques.
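
A sketch of the THUNK protocol only. It assumes Guile's own `read' as a
stand-in for the language's reader and an identity translation; the
unique list used as *end-of-translation* and the helper
`translate-port', which drains the thunk the way `load' might, are
illustrative choices.

```scheme
;; A freshly allocated list is `eq?' only to itself, so it cannot be
;; confused with any translated code.
(define *end-of-translation* (list 'end-of-translation))

(define (end-of-translation? obj)
  (eq? obj *end-of-translation*))

;; OPTIONS is the ALIST above, e.g. ((debug . #t)); this sketch ignores it.
(define* (translate-all port #:optional (options '()))
  (lambda ()
    (let ((expression (read port)))     ; one piece per call
      (if (eof-object? expression)
          *end-of-translation*
          expression))))

;; Collect every piece of Scheme code, as a caller such as `load' might.
(define (translate-port port)
  (let ((next (translate-all port)))
    (let loop ((acc '()))
      (let ((code (next)))
        (if (end-of-translation? code)
            (reverse acc)
            (loop (cons code acc)))))))
```

Doing the work lazily inside the thunk corresponds to the third choice
in point 1: small pieces of processing each time THUNK is called.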

*** untranslate SCHEMECODE --> EXPRESSION

Attempt to do the inverse of `translate'. An approximation is OK. It
is also OK to return #f. This procedure will be called from the
debugger, when generating error messages, backtraces, etc.

The debugger uses the local evaluation environment to determine which
module an expression comes from. This is how the debugger can know
which `untranslate' procedure to call for a given expression.

(This is currently used to decide which backtrace frames to
display. System modules use the option :no-backtrace to prevent
Guile's internals from being displayed to the user.)

Note that `untranslate' can use source-properties set by `native-read'
to give hints about how to do the reverse translation. Such hints
could, for example, be the filename, line and column numbers of the
source expression, or an actual copy of the source expression.
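
A sketch of such hints using Guile's `set-source-property!' and
`source-property'. The helper `annotate!' and the property key
`original-expression' are hypothetical; `filename', `line' and `column'
are the keys Guile itself uses for source locations.

```scheme
;; Hypothetical sketch: attach location hints and a copy of the source
;; expression to the generated SCHEMECODE.
(define (annotate! schemecode expression filename line column)
  (set-source-property! schemecode 'filename filename)
  (set-source-property! schemecode 'line line)
  (set-source-property! schemecode 'column column)
  (set-source-property! schemecode 'original-expression expression)
  schemecode)

;; `source-property' returns #f when the hint is absent, which the
;; proposal allows as an answer.
(define (untranslate schemecode)
  (source-property schemecode 'original-expression))
```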

** How Guile system procedures `read', `eval', `write' use language modules

*** read

The idea is that the `read' exported from the R5RS library will
continue to work when called from other languages, and will keep its
semantics.

A call to `read' simply means "read an expression from PORT using
the syntax associated with that port".

Each module carries information about its language.

When an input port is created for a module to be read, or during
interaction with a given module, this information is copied to the
port object.

`read' uses this information to call `native-read' in the correct
language module.
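
A sketch of the dispatch, assuming a weak-key hash table as a stand-in
for the proposed port field (so the table does not keep ports alive)
and hypothetical names `set-port-native-read!' and `language-read'.
`language-read' falls back to the Scheme reader when no language has
been recorded for a port.

```scheme
(use-modules (ice-9 rdelim))

;; Stand-in for the proposed port field: port -> native-read procedure.
(define port-readers (make-weak-key-hash-table))

(define (set-port-native-read! port native-read)
  (hashq-set! port-readers port native-read))

;; Plays the role of the system `read': dispatch on the port's language.
(define (language-read port)
  ((hashq-ref port-readers port read) port))
```

In the test below, `read-line' plays the part of a language's
`native-read'.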

*** eval

[To be written.]

*** write

[To be written.]

* Error handling

** Errors during translation

Errors during translation are generated as usual by calling scm-error
(from Scheme) or scm_misc_error etc. (from C). The effect of
throwing errors from within `translate-all' is the same as when they
are generated within a call to the THUNK returned from
`translate-all'.

scm-error takes a fifth argument. This is a property list (alist)
which you can use to pass extra information to the error reporting
machinery.

Currently, the following properties are supported:

  filename   filename of the file being translated
  line       line number of the erring expression
  column     column number
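
Guile's `scm-error' takes five arguments: KEY, SUBR, MESSAGE, ARGS and a
final list of extra data. A sketch of a translator passing the location
alist in that last position; the error key `translation-error' and both
helper names are hypothetical, and the handler here reads the alist
itself, since automatic use by the error reporting machinery is the
proposed change.

```scheme
;; Hypothetical translator error carrying the location alist.
(define (report-translation-error token filename line column)
  (scm-error 'translation-error "translate-all"
             "unexpected token ~S" (list token)
             `((filename . ,filename) (line . ,line) (column . ,column))))

;; A handler receives that alist as the last throw argument.
(define (translation-error-location thunk)
  (catch 'translation-error
    thunk
    (lambda (key subr message args data) data)))
```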

** Run-time errors (errors in SCHEMECODE)

This section pertains to what happens when a run-time error occurs
during evaluation of the translated code.

In order to get "foreign code" in error messages, make sure that
`untranslate' yields good output. Note the possibility of maintaining
a table (preferably using weak references) mapping SCHEMECODE to
EXPRESSION.
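
A sketch of such a table using Guile's `make-weak-key-hash-table', with
the translated code as key so that entries can disappear together with
the code. The names `translation-origins' and `record-translation!' are
hypothetical.

```scheme
;; SCHEMECODE -> EXPRESSION; weak keys do not keep the code alive.
(define translation-origins (make-weak-key-hash-table))

;; Called by the translator for each piece of code it produces.
(define (record-translation! schemecode expression)
  (hashq-set! translation-origins schemecode expression)
  schemecode)

;; An `untranslate' based on the table; #f when the origin is unknown.
(define (untranslate schemecode)
  (hashq-ref translation-origins schemecode #f))
```

Lookups use `eq?', so only the exact code object produced by the
translator is found, not an `equal?' copy of it.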

Note the availability of source-properties for attaching filename,
line and column number, and other information, such as EXPRESSION, to
SCHEMECODE. If the filename, line and column properties are defined,
they will automatically be used by the error reporting machinery.

* Proposed changes to Guile

** Implement the above proposal.

** Add new fields `reader' and `translator' to all module objects

Make sure they are initialized when a language is specified.

** Use `untranslate' during error handling.

** Implement the use of arg 5 to scm-error

(specified in "Errors during translation")

** Implement a generic lexical analyzer with interface similar to read/rp

Mikael is working on this. (It might take a few days, since he is
busy with his studies right now.)

** Remove scm:eval-transformer

This is replaced by new fields in each module object (environment).

`eval' will instead directly use the `transformer' field in the module
passed as second arg.

Internal evaluation will, similarly, use the transformer of the module
representing the top-level of the local environment.

Note that this level of transformation is independent of language
translation. *This* is a hook for adding Scheme macro packages and
belongs to the core language.

We also need to check the new `translator' field, potentially using
it.

** Package local environments as smobs

so that environment list structures can't leak out on the Scheme
level. (This has already been done in SCM.)

** Introduce new fields in input ports

These carry state information such as

*** which keyword syntax to support

*** whether to be case sensitive or not

*** which lexical grammar to use

*** whether the port is used in an interactive session or not

There will be a new Guile primitive `interactive-port?' testing for this.

** Move configuration of keyword syntax and case sensitivity to the read-state

Add new fields to the module objects for these values, so that the
read-state can be initialized from them.

*fixme* When? Why? How?

Probably as soon as the language has been determined during file loading.

Need to figure out how to set these values.


Local Variables:
mode: outline
End: