Mirror of https://git.savannah.gnu.org/git/guile.git (synced 2025-05-04 22:40:25 +02:00)
* translate/langtools.text: New file.
parent b63434358d
commit 5da1a3da3e
1 changed file with 307 additions and 0 deletions

devel/translation/langtools.text  Normal file
@ -0,0 +1,307 @@

* Introduction

This is a proposal for how Guile could interface with language
translators. It will be posted on the Guile list and revised for some
short time (days rather than weeks) before being implemented.

The document can be found in the CVS repository as
guile-core/devel/translation/langtools.text. All Guile developers are
welcome to modify and extend it according to the ongoing discussion
using CVS.

Ideas and comments are welcome.

For clarity, the proposal is partially written as if describing an
already existing system.

MDJ 000812 <djurfeldt@nada.kth.se>

* Language names

A translator for Guile is a certain kind of Guile module, implemented
in Scheme, C, or a mixture of both.

To make things simple, the name of the language is closely related to
the name of the translator module.

Languages have long and short names. The long form is simply the name
of the translator module: `(lang ctax)', `(lang emacs-lisp)',
`(my-modules foo-lang)' etc.

Languages with the long name `(lang IDENTIFIER)' can be referred to
with the short name IDENTIFIER, for example `emacs-lisp'.
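
The naming rule above can be sketched as a pair of conversions. This
is only an illustration: `language-long-name' and
`language-short-name' are hypothetical helpers invented here and are
not part of the proposal or of Guile.

```scheme
;; Hypothetical helpers, not part of Guile or of the proposal.

;; Map a language name in either form to the long form, that is,
;; the name of the translator module.
(define (language-long-name name)
  (cond ((symbol? name) (list 'lang name))   ; short form: emacs-lisp
        ((pair? name) name)                  ; long form:  (lang emacs-lisp)
        (else (error "not a language name:" name))))

;; Return the short form of a long name, or #f when the module is not
;; of the form (lang IDENTIFIER) and therefore has no short name.
(define (language-short-name name)
  (and (pair? name)
       (= (length name) 2)
       (eq? (car name) 'lang)
       (cadr name)))
```

With these definitions, `(language-long-name 'emacs-lisp)' gives
`(lang emacs-lisp)', while `(my-modules foo-lang)' has no short name.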

* How to tell Guile to read code in a language other than Scheme

There are four methods of specifying which translator to use when
reading a file:

** Command option

The options to the guile command are parsed linearly from left to
right. You can change the language at zero or more points using the
option

  -t, --language LANGUAGE

Example:

  guile -t emacs-lisp -l foo -l bar -t scheme -l baz

will use the emacs-lisp translator while reading "foo" and "bar", and
the default translator (scheme) for "baz".

You can use this technique in a script together with the meta switch:

#!/usr/local/bin/guile \
-t emacs-lisp -s
!#

** Commentary in file

When opening a file for reading, Guile will read the first few lines,
looking for the string "-*- LANGNAME -*-", where LANGNAME can be
either the long or short form of the name.

If found, the corresponding translator is loaded and used to read the
file.
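
A minimal sketch of such a scan follows. The proposal does not say
how many lines count as "the first few" or how LANGNAME is delimited,
so the limit `max-lines', the regular expression and the procedure
name `scan-language-cookie' are all assumptions made for this
example.

```scheme
(use-modules (ice-9 rdelim) (ice-9 regex))

;; Assumption: "the first few lines" means at most MAX-LINES lines.
(define max-lines 3)

;; Assumption: LANGNAME is either a bare identifier (short form) or a
;; parenthesized module name (long form).
(define language-cookie-rx
  (make-regexp "-\\*-[ \t]*(\\([^()]*\\)|[^ \t()]+)[ \t]*-\\*-"))

;; Hypothetical procedure: return the language named in the first
;; lines read from PORT, as a symbol or a list of symbols, or #f.
(define (scan-language-cookie port)
  (let loop ((n 0))
    (if (>= n max-lines)
        #f
        (let ((line (read-line port)))
          (cond ((eof-object? line) #f)
                ((regexp-exec language-cookie-rx line)
                 => (lambda (m)
                      ;; Parse the matched name with the Scheme reader.
                      (call-with-input-string (match:substring m 1) read)))
                (else (loop (+ n 1))))))))
```

A real implementation would also have to restore the port position,
or hand the lines it has consumed to the translator; the sketch leaves
that out.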

** File extension

Guile maintains an alist mapping filename extensions to languages.
Each entry has the form:

  (REGEXP . LANGNAME)

where REGEXP is a string and LANGNAME a symbol or a list of symbols.

The alist can be accessed using `language-alist' which is exported
by the module `(core config)':

  (language-alist)                 --> current alist
  (language-alist ALIST)           sets the alist to ALIST
  (language-alist ALIST :prepend)  prepends ALIST onto the current list
  (language-alist ALIST :append)   appends ALIST after current list

The `load' command will match filenames against this alist and choose
the translator to use accordingly.
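
The lookup that `load' would perform can be sketched as follows. The
example entries, the procedure name `language-for-file' and the
first-match-wins rule are assumptions of this sketch; the proposal
does not specify them.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical example alist in the (REGEXP . LANGNAME) form.
(define example-language-alist
  '(("\\.el$"   . emacs-lisp)
    ("\\.ctax$" . (lang ctax))
    ("\\.scm$"  . scheme)))

;; Return the LANGNAME of the first entry whose REGEXP matches
;; FILENAME, or DEFAULT when no entry matches.
(define (language-for-file alist filename default)
  (let loop ((entries alist))
    (cond ((null? entries) default)
          ((string-match (caar entries) filename) (cdar entries))
          (else (loop (cdr entries))))))
```

If the first matching entry wins, as assumed here, entries added with
:prepend would override existing ones, while entries added with
:append would only act as fallbacks.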

** Module header

The module header of the current module system is the form

  (define-module NAME OPTION1 ...)

You can specify a translator using the option

  :language LANGNAME

where LANGNAME is the long or short form of the language name as
described above.

The translator is fed characters from the module file, starting
immediately after the end-parenthesis of the module header form.

NOTE: There can be only one module header per file.

It is also possible to put the module header in a separate file and
use the option

  :file FILENAME

to point out a file containing the actual code.

Example:

foo.gm:
----------------------------------------------------------------------
(define-module (foo)
  :language emacs-lisp
  :file "foo.el"
  :export (foo bar)
  )
----------------------------------------------------------------------

foo.el:
----------------------------------------------------------------------
(defun foo ()
  ...)

(defun bar ()
  ...)
----------------------------------------------------------------------

* Language modules

A language module is an ordinary Guile module importing bindings from
other modules and exporting bindings through its public interface.

It is required to export the following procedures:

language-environment --> ENVIRONMENT

  Returns a fresh top-level ENVIRONMENT (a module) where expressions
  in this language are evaluated by default.

  Modules using this language will by default have this environment
  on their use list.

  The intention is for this procedure to provide the "run-time
  environment" for the language.

read-expression PORT --> EXPRESSION

  Read the next expression in the foreign syntax from PORT and return
  an object EXPRESSION representing it.

  It is entirely up to the language module to define what one
  expression is. The representation of EXPRESSION is also chosen by
  the language module.

  This procedure will be called during interactive use (the user
  types expressions at a prompt) and when the system `read'
  procedure is called while a module using this language is selected.

translate EXPRESSION --> SCHEMECODE

  Translate an EXPRESSION into SCHEMECODE.

  EXPRESSION can be anything returned by `read-expression'.

  SCHEMECODE is Scheme source code represented using ordinary Scheme
  data. It will be passed to `eval' in an environment containing the
  bindings of the environment returned by `language-environment'.

  This procedure will be called during interactive use and when the
  system `eval

translate-all PORT --> THUNK

  Translate the entire stream of characters from PORT until #<eof>.
  Return a THUNK which can be called repeatedly like this:

    THUNK --> SCHEMECODE

  Each call will yield a new piece of Scheme code. #f is returned
  to signal the end of the stream of Scheme expressions.

  This procedure will be called by the system `load' command and by
  the module system when loading files.

  The intentions are:

  1. To let the language module decide when and in how large chunks
     to do the processing. It may choose to do all processing at
     the time translate-all is called, all processing when THUNK is
     called the first time, or small pieces of processing each time
     THUNK is called, or any conceivable combination.

  2. To let the language module decide in how large chunks to output
     the resulting Scheme code in order not to overload memory.

  3. To enable the language module to use temporary files, and
     whole-module analysis and optimization techniques.

untranslate SCHEMECODE --> EXPRESSION

  Attempt to do the inverse of `translate'. An approximation is
  OK. It is also OK to return #f. This procedure will be called
  from the debugger, when generating error messages, backtraces etc.
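
To make the interface concrete, here is a sketch for a toy language
invented for this example, in which every line is a sum such as
"1 + 2 + 3". It shows `read-expression', `translate' and the lazy
variant of `translate-all' (one line is translated per THUNK call).
The consumer `eval-all' is a hypothetical stand-in for what `load'
might do; none of this is existing Guile code.

```scheme
(use-modules (ice-9 rdelim))

;; Toy language, invented for this sketch: one sum per line.

;; read-expression: one line is one expression, kept as a string.
;; Returns the eof object at the end of input.
(define (read-expression port)
  (read-line port))

;; translate: "1 + 2 + 3" --> (+ 1 2 3)
(define (translate expression)
  (cons '+ (map (lambda (s) (string->number (string-trim-both s)))
                (string-split expression #\+))))

;; translate-all: return a THUNK that translates one line per call
;; and returns #f once PORT is exhausted.
(define (translate-all port)
  (lambda ()
    (let ((expression (read-expression port)))
      (if (eof-object? expression)
          #f
          (translate expression)))))

;; Hypothetical consumer: evaluate each piece of Scheme code in ENV
;; and return the value of the last one.
(define (eval-all thunk env)
  (let loop ((last #f))
    (let ((code (thunk)))
      (if code
          (loop (eval code env))
          last))))
```

For example, feeding the two lines "1 + 2" and "3 + 4 + 5" through
`(eval-all (translate-all port) (interaction-environment))' evaluates
`(+ 1 2)' and then `(+ 3 4 5)'. A module doing whole-file analysis
would instead do its work inside `translate-all' before returning the
THUNK.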

* Error handling

** Errors during translation

Errors during translation are generated as usual by calling scm-error
(from Scheme) or scm_misc_error etc. (from C). The effect of
throwing errors from within `translate-all' is the same as when they
are generated within a call to the THUNK returned from
`translate-all'.

scm-error takes a fifth argument. This is a property list (alist)
which you can use to pass extra information to the error reporting
machinery.

Currently, the following properties are supported:

  filename    filename of file being translated
  line        line number of erring expression
  column      column number
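
A translator could signal such an error roughly as sketched below.
The error key `translation-error', the subr name "translate" and the
two procedure names are made up for the example. The sketch relies
only on the fact that a `catch' handler receives the arguments passed
to scm-error; whether the error reporting machinery itself reads the
alist, as proposed above, is not assumed.

```scheme
;; Hypothetical error key and helper names, chosen for illustration.

;; Signal a translation error carrying its location as the fifth
;; argument to scm-error.
(define (signal-translation-error message args filename line column)
  (scm-error 'translation-error "translate" message args
             `((filename . ,filename)
               (line . ,line)
               (column . ,column))))

;; Run THUNK; if it signals a translation error, return the location
;; as a list (FILENAME LINE COLUMN).  The handler receives the key
;; followed by subr, message, args and the fifth argument.
(define (translation-error-location thunk)
  (catch 'translation-error
    thunk
    (lambda (key subr message args props)
      (list (assq-ref props 'filename)
            (assq-ref props 'line)
            (assq-ref props 'column)))))
```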

** Run-time errors (errors in SCHEMECODE)

This section pertains to what happens when a run-time error occurs
during evaluation of the translated code.

In order to get "foreign code" in error messages, make sure that
`untranslate' yields good output. Note the possibility of maintaining
a table (preferably using weak references) mapping SCHEMECODE to
EXPRESSION.
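
Such a table might look like the following sketch, which uses a
weak-key hash table so that an entry can be dropped once the
translated code itself is garbage collected. The wrapper
`translate/remember' is a hypothetical name; a language module could
equally well fill the table inside its own `translate'.

```scheme
;; Weak-key table: an entry can disappear once its SCHEMECODE key is
;; no longer referenced anywhere else.
(define expression-table (make-weak-key-hash-table))

;; Hypothetical wrapper: call TRANSLATE on EXPRESSION and remember
;; where the resulting code came from.  Only pairs are recorded,
;; since the lookup is by identity (eq?).
(define (translate/remember translate expression)
  (let ((code (translate expression)))
    (when (pair? code)
      (hashq-set! expression-table code expression))
    code))

;; untranslate by table lookup; #f when the origin is unknown.
(define (untranslate code)
  (hashq-ref expression-table code #f))
```

Because the lookup is by identity, this only recovers the EXPRESSION
for the exact objects returned by the translator, not for subforms or
copies of them.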

Note the availability of source-properties for attaching filename,
line and column number, and other information such as EXPRESSION, to
SCHEMECODE. If filename, line, and column properties are defined,
they will be automatically used by the error reporting machinery.
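
As a sketch, a translator could annotate each form it produces using
Guile's source property procedures. The helper name `annotate-code'
is an assumption of this example.

```scheme
;; Hypothetical helper: attach the origin of a translated form to the
;; form itself and return the form.  CODE must be a heap object such
;; as a pair, not an immediate like a small integer.
(define (annotate-code code filename line column)
  (set-source-properties! code
                          `((filename . ,filename)
                            (line . ,line)
                            (column . ,column)))
  code)
```

The properties can later be read back with `source-property', for
example `(source-property code 'line)'.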

* Proposed changes to Guile

** Implement the above proposal.

*** Add new fields `reader' and `translator' to all module objects

Make sure they are initialized when a language is specified.

*** Use `untranslate' during error handling.

*** Implement the use of arg 5 to scm-error

(specified in "Errors during translation")

** Implement a generic lexical analyzer with interface similar to read/rp

Mikael is working on this. (It might take a few days, since he is
busy with his studies right now.)

** Remove scm:eval-transformer

This is replaced by new fields in each module object (environment).

`eval' will instead directly use the `transformer' field in the module
passed as second arg.

Internal evaluation will, similarly, use the transformer of the module
representing the top-level of the local environment.

Note that this level of transformation is something independent of
language translation. *This* is a hook for adding Scheme macro
packages and belongs to the core language.

We also need to check the new `translator' field, potentially using
it.

** Package local environments as smobs

so that environment list structures can't leak out on the Scheme
level. (This has already been done in SCM.)

** Introduce "read-states" (symmetrical to "print-states")

These carry state information belonging to a read call chain, such
as which keyword syntax to support, whether to be case sensitive or
not, and which lexical grammar to use.
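
One possible shape for such an object is sketched below as an SRFI-9
record. The record type, its field names and the example values are
all hypothetical; the proposal only lists the kinds of information a
read-state would carry.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical record; field names and values are illustrative only.
(define-record-type <read-state>
  (make-read-state keyword-syntax case-sensitive? grammar)
  read-state?
  (keyword-syntax  read-state-keyword-syntax)   ; e.g. 'prefix or #f
  (case-sensitive? read-state-case-sensitive?)  ; #t or #f
  (grammar         read-state-grammar))         ; e.g. a language name

;; Derive a state for a nested read that switches the grammar only,
;; leaving the other settings of STATE untouched.
(define (read-state-with-grammar state grammar)
  (make-read-state (read-state-keyword-syntax state)
                   (read-state-case-sensitive? state)
                   grammar))
```

Keeping the state immutable and deriving new states, as done here, is
a design choice of the sketch and not something the proposal
prescribes.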

** Move configuration of keyword syntax and case sensitivity to the read-state

Add new fields to the module objects for these values, so that the
read-state can be initialized from them.

*fixme* When? Why? How?

Probably as soon as the language has been determined during file loading.

Need to figure out how to set these values.


Local Variables:
mode: outline
End: