mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-06-03 18:50:19 +02:00
Thien-Thi Nguyen 2002-03-24 00:38:21 +00:00
parent a7dc0db49a
commit 8660251f7d
29 changed files with 0 additions and 1467 deletions


@@ -1,295 +0,0 @@
@c devel/modules/desgin-notes.texi
@c TODO
@c - distill wishlist, index
@c - in Findings, characterize current module system wrt wishlist
@node Module System Design Notes
@chapter Module System Design Notes
This chapter documents module system design history. At the moment
(guile-1.5.4, 2002-02-08), the module system is supposedly undergoing
redesign; the provisional implementation works but has problems, notably
making compilation difficult.
Because module system design is (was?) an area of active research and
development in the Scheme community, many different features are possible.
This section is a historical record of the selection process used by the Guile
hackers (if one can be discerned).
@menu
* Wishlist::
* Findings::
* Selection Criteria::
* Rationale Statements::
* Specification::
@end menu
@node Wishlist
@subsection Wishlist
In the guile-related mailing lists of yore, discussion resulted in the
following desirable traits. Note that some of these contradict each other.
@itemize @bullet
@item
support separate compilation
@item
hierarchical module names
@item
support relative references within the module name space (so that a
module within a package can use a sibling module without knowing the
prefix of the module name)
@item
support re-use of code (the same implementation can be presented to
the world through several interfaces)
@item
support individual and group renaming of bindings when using other
modules
@item
easy to import and re-export entire interfaces (so that a main
interface in a package can function as a "relay" and publish
bindings from many modules in the package)
@item
support both variable and syntactic bindings (these should be
clearly separated, though) and mesh well with hygienic macro
systems
@item
hygiene implies that we should not need to export bindings
which an exported macro introduces at the point of its use
@item
autoloading
@item
unmemoization protocol
@item
cleanliness
A module should be able to be totally clean. There should be no
need to have *any* extra bindings in a module (a la
%module-interface or `define-module').
Therefore, we should have at least one dedicated "command" or
"config" or "repl" module.
It would probably be a good idea to follow other good Scheme
interpreters' lead and introduce the ,<command> syntax for walking
around modules, inspecting things, entering the debugger, etc.
Such commands can be looked up in this repl module.
If we insist on not using ,<command> syntax, we are forced to let
the module use list consist of a "sticky" part and the rest, where
the "sticky" part is only available at the repl prompt and not to
the code within the module, and which follows us when we walk around
in the system.
@item
well integrated with the translator framework
We should be able to say that a module uses a different syntax or
language.
Note here the nice config language of the Scheme48 module system
where it is possible to separate code from module specification: the
module specification can use scheme syntax and rest in one file,
while the module itself can use the syntax of another language.
This config language also supports re-use of code in a neat way.
@item
examine connection with object system: how easy is it to support
Java and other class-centered object systems?
@item
easy to export the same module under several different names
@item
easily supports both compiled and interpreted modules
@item
compiled modules can be dynamically loaded or statically linked in
(libltdl might provide this automatically)
@item
convenient syntax for referencing bindings in modules that are
loaded but not used
(Assuming this distinction exists.) But maybe group renaming is a better
solution to a similar class of problems.
@item
ability to unuse a module (and have it get collected when there are
no more references)
@item
orthogonality between source files, directories and modules, i.e. ability to
have multiple modules in one source file and/or multiple source files in one
module
@item
backward compatibility
@item
whenever possible the module's meta-information should be stored
within the module itself (only possible for scheme files)
@item
the compiler stores the meta-information into a single file and updates it
accordingly
(FIXME: per module, per package, directory?, per project?) This
meta-information should still be human readable (Sun's EJB use XML for their
deployment descriptors).
@item
use the file system as module repository
Since guile is a GNU project we can expect a GNU (or Unix) environment. That
means that we should use the file system as the module repository. (This
would help us avoid modules pointing to files which are pointing to other
files telling the user "hey, that's me (the module) over there".)
@item
every module has exactly @emph{one} owner who is responsible for the
module @emph{and} its interface (this contradicts the "more than one
interface" concept)
@item
support module collections
Provide "packages" with a package scope for people working together on a
project. In some sense a module is a view on the underlying package.
@item
ability to request (i.e. import or access) complete packages
@item
support module "generations" (or "versions")
Whenever a new module fails to handle a request (due to an error) it will be
replaced by the old version.
@item
help the user to handle exceptions (note: exceptions
are not errors, see above)
@item
no special configuration language (like @code{,in} etc.)
You can always press Control-D to terminate the module's repl and return to
the config module.
@item
both C and Scheme level interfaces
@item
programming interface to module system primitives
One should be able to make customized module systems from the low-level
interface, as an alternative to using the default module system. The
default module system should use the same low-level procedures.
@item
Here are some features Keisuke Nishida desires to support his VM/compiler
[snarfed directly from post <m37l33z0cl.wl@kei.cwru.edu> dated 2001-02-06,
and requires formatting]:
* There is no "current module".
* Module variables are globally identified by an identifier
like "system::core::car".
* Bindings are solved syntactically, either by a translator
or a compiler. If you write just "car", it is expanded to
"system::core::car" by a translator or compiler, depending
on what modules you use.
* An interpreter (repl) may memorize the "current working module".
It tells the translator or the compiler what modules should be
used to identify a variable. So, during interactive sessions,
a user may feel as if there *is* the current module.
* But the fact is, all variables are globally identified at
syntax level. Therefore, the compiler can identify all
variables at compile time. This means the following code
is not allowed:
;; I want to access `foo' in the module named some-module-name
(let ((module (lookup-module some-module-name)))
  (set! (current-module) module)
  (foo x))
-> ERROR: Unbound variable: current-module
(let ((module (lookup-module some-module-name)))
  (module::foo x))
-> ERROR: There is no variable named "module::foo"
Instead, you should write as follows if you need a dynamic access:
(let ((module (lookup-module some-module-name)))
  ((module-ref module 'foo) x))
(let ((module (lookup-module some-module-name)))
  ((module 'foo) x)) ;; if module is an applicable smob
@end itemize
@c $Date: 2002-02-08 10:50:36 $
@node Findings
@subsection Findings
This section briefly describes module system truths that are one step more
detailed than "module systems are hairy". These truths are not self-evident;
we rely on research, active experimentation and lots of discussion. The goal
of this section is to save ourselves from rehashing that which was hashed
previously.
@itemize @bullet
@item Kent Dybvig's module system
A paper is available at
@uref{http://www.cs.indiana.edu/~dyb/papers/popl99.ps.gz,
http://www.cs.indiana.edu/~dyb/papers/popl99.ps.gz}.
This was discussed in 2000-11 and 2000-12.
@item Distinction between Top-Level Environment and Module
These two are different beasts! Each of the following needs to be
well-defined with respect to both of these concepts: @code{eval},
@code{define}, @code{define-public}, @code{define-module}, @code{load},
working from REPL, top-level @code{begin}, [add here].
In guile-1.4, the distinction was not clear.
@item Current module system internals
@xref{Top,Module Internals,,module-snippets}, for implementation
details of the module system up to and including guile-1.6.x.
@item [add here]
@end itemize
@node Selection Criteria
@subsection Selection Criteria
@node Rationale Statements
@subsection Rationale Statements
@node Specification
@subsection Specification
@c devel/modules/desgin-notes.texi ends here


@@ -1,288 +0,0 @@
Module Layout Proposal
======================
Martin Grabmueller
<mgrabmue@cs.tu-berlin.de>
Draft: 2001-03-11
Version: $Id: module-layout.text,v 1.1 2001-03-16 08:37:37 mgrabmue Exp $
* Table of contents
** Abstract
** Overview
*** What do we have now?
*** What should we change?
** Policy of module separation
*** Functionality
*** Standards
*** Importance
*** Compatibility
** Module naming
*** Scheme
*** Object oriented programming
*** Systems programming
*** Database programming
*** Text processing
*** Math programming
*** Network programming
*** Graphics
*** GTK+ programming
*** X programming
*** Games
*** Multiple names
*** Application modules
** Future ideas
* Abstract
This is a proposal for a new layout of the module name space. The
goal is to reduce (or even eliminate) the clutter in the current ice-9
module directory, and to provide a clean framework for splitting
libguile into subsystems, grouped by functionality, standards
compliance and maybe other characteristics.
This is not a completed policy document, but rather a collection of
ideas and proposals which still have to be decided. I will mention my
personal preference, where appropriate, but the final decisions are of
course up to the maintainers.
* Overview
Currently, new modules are added in an ad-hoc manner to the ice-9
module name space when the need for them arises. I think that was
mainly because no other directory for installed Scheme modules was
created. With the integration of GOOPS, the new top-level module
directory oop was introduced, and we should follow this practice for
other subsystems which share functionality.
DISCLAIMER: Please note that I am no expert on Guile's module system,
so be patient with me and correct me where I got anything wrong.
** What do we have now?
The module (oop goops) contains all functionality needed for
object-oriented programming with Guile (with a few exceptions in the
evaluator, which is clearly needed for performance).
Except for the procedures in the module (ice-9 rdelim), all Guile
primitives are currently located in the root module (I think it is the
module (guile)), and some procedures defined in `boot-9.scm' are
installed in the module (guile-user).
** What should we change?
In the core, there are a lot of primitive procedures which can cleanly
be grouped into subsystems, and then grouped into modules. That would
make the core library more maintainable, would ease separate testing
of subsystems and clean up dependencies between subsystems.
* Policy of module separation
There are several possibilities to group procedures into modules.
- They could be grouped by functionality.
- They could be grouped by standards compliance.
- They could be grouped by level of importance.
One important group of modules should of course be provided
additionally:
- Compatibility modules.
So the first thing to decide is: Which of these policies should we
adopt? Personally, I think that it is not possible to cleanly use
exactly one of the possibilities; we will probably use a mixture of
them. I propose to group by functionality, and maybe use some
`bridge-modules', which make functionality available when the user
requests the modules for a given standard.
** Functionality
Candidates for the first possibility are groups of procedures, which
already are grouped in source files, such as
- Regular expression procedures.
- Network procedures.
- Systems programming procedures.
- Random number procedures.
- Math/numeric procedures.
- String-processing procedures.
- List-processing procedures.
- Character handling procedures.
- Object-oriented programming support.
** Standards
Guile now complies with R5RS, and I think that the procedures required
by this standard should always be available to the programmer.
People who do not want them can always create :pure modules when
they need to.
On the other hand, the SRFI procedures fit nicely into a `group by
standards' scheme. An example which is already provided, is the
SRFI-8 syntax `receive'. Following that, we could provide two modules
for each SRFI, one named after the SRFI (like `srfi-8') and one named
after the main functionality (`receive').
** Importance
By importance, I mean `how important are procedures for the average
Guile user'. That means that procedures which are only useful to a
small group of users (the Guile developers, for example) should not be
immediately available at the REPL, so that they do not confuse the user
when they appear in the `apropos' output or the tab completion.
A good example would be debugging procedures (which also could be
added with a special command-line option), or low-level system calls.
** Compatibility
This group is for modules providing compatibility procedures. An
example would be a module for old string-processing procedures, which
could someday get overridden by incompatible SRFI procedures of the
same name.
* Module naming
Provided we choose to take the `group by functionality' approach, I
propose the following naming hierarchy (some of them were actually
suggested by Mikael Djurfeldt).
- Scheme language related in (scheme)
- Object oriented programming in (oop)
- Systems programming in (system)
- Database programming in (database)
- Text processing in (text)
- Math/numeric programming in (math)
- Network programming in (network)
- Graphics programming in (graphics)
- GTK+ programming in (gtk)
- X programming in (xlib)
- Games in (games)
The layout of sub-hierarchies is up to the writers of modules. We
should not enforce a strict policy here, because we cannot foresee
what will happen in this area.
** Scheme
(scheme r5rs) Complete R5RS procedures set.
(scheme safe) Safe modules.
(scheme srfi-1) List processing.
(scheme srfi-8) Multiple values via `receive'.
(scheme receive) ditto.
(scheme and-let-star) and-let*
(scheme syncase) syntax-case hygienic macros (maybe included in
(scheme r5rs)?).
(scheme slib) SLIB, for historic reasons in (scheme).
** Object oriented programming
Examples in this section are
(oop goops) For GOOPS.
(oop goops ...) For lower-level GOOPS functionality and utilities.
** Systems programming
(system shell) Shell utilities (glob, system etc).
(system process) Process handling.
(system file-system) Low-level filesystem support.
(system user) getuid, setpgrp, etc.
_or_
(system posix) All posix procedures.
** Database programming
In the database section, there should be sub-module hierarchies for
each supported database which contain the low-level code, and a
common database layer, which should unify access to SQL databases via
a single interface a la Perl's DBI.
(database postgres ...) Low-level database functionality.
(database oracle ...) ...
(database mysql ...) ...
(database msql ...) ...
(database sql) Common SQL accessors.
(database gdbm ...) ...
(database hashed) Common hashed database accessors (like gdbm).
(database util) Leftovers.
** Text processing
(text rdelim) Line oriented in-/output.
(text util) Mangling text files.
** Math programming
(math random) Random numbers.
(math primes) Prime numbers.
(math vector) Vector math.
(math algebra) Algebra.
(math analysis) Analysis.
(math util) Leftovers.
** Network programming
(network inet) Internet procedures.
(network socket) Socket interface.
(network db) Network database accessors.
(network util) ntohl, htons and friends.
** Graphics
(graphics vector) Generalized vector graphics handling.
(graphics vector vrml) VRML parsers etc.
(graphics bitmap) Generalized bitmap handling.
(graphics bitmap ...) Bitmap format handling (TIFF, PNG, etc.).
** GTK+ programming
(gtk gtk) GTK+ procedures.
(gtk gdk) GDK procedures.
(gtk threads) gtkthreads.
** X programming
(xlib xlib) Low-level XLib programming.
** Games
(games robots) GNU robots.
** Multiple names
As already mentioned above, I think that some modules should have
several names, to make it easier for the user to get the functionality
she needs. For example, a user could say: `hey, I need the receive
macro', or she could say: `I want to stick to SRFI syntax, so where
the hell is the module for SRFI-8?!?'.
** Application modules
We should not enforce policy on applications. So I propose that
application writers should be advised to place modules either in
application-specific directories $PREFIX/share/$APP/guile/... and name
that however they like, or to use the application's name as the first
part of the module name, e.g (gnucash import), (scwm background),
(rcalc ui).
* Future ideas
I have not yet come up with a good idea for grouping modules that
deal, for example, with XML processing. They would fit into the (text)
module space, because most XML files contain text data, but they would
also fit into (database), because XML files are essentially databases.
On the other hand, XML processing is such a large field that it
probably is worth its own top-level name space (xml).
Local Variables:
mode: outline
End:


@@ -1,149 +0,0 @@
* Intro / Index (last modified: $Date: 2002-02-28 05:09:19 $)
This working document explains the design of the libguile API,
specifically the interface to the C programming language.
Note that this is NOT an API reference manual.
- Motivation
- History
- Decisions
- gh_ removal
- malloc policy
- [add here]
* Motivation
The file goals.text says:
Guile's primary aim is to provide a good extension language
which is easy to add to an application written in C for the GNU
system. This means that it must export the features of a higher
level language in a way that makes it easy not to break them
from C code.
Although it may no longer be a "primary aim", creating a stable API is
important anyway since without something defined, people will take libguile
and use it in ad-hoc ways that may cause them trouble later.
* History
The initial (in ttn's memory, which only goes back to guile-1.3.4) stab at an
API was known as the "gh_ interface", which provided "high-level" abstraction
to libguile w/ the premise of supporting multiple implementations at some
future date. In practice this approach resulted in many gh_* objects being
very slight wrappers for the underlying scm_* objects, so eventually this
maintenance burden outweighed the (as yet unrealized) hope for alternate
implementations, and the gh_ interface was deprecated starting in guile-1.5.x.
Starting w/ guile-1.7.x, in concurrence w/ an effort to make libguile
available to usloth windows platforms, the naked library was once again
dressed w/ the "SCM_API interface".
Here is a table of versions (! means planned):
guile   libguile  readline  qthreads  srfi-4  -13-14
---------------------------------------------------
1.3.4   6.0.0     0.0.0     0.0.0     -       -
1.4     9.0.0     0.0.0     0.0.0     -       -
1.4.1   10.0.0    TBD       15.0.0    -       -       !
1.6.x   15.0.0    10.0.0    15.0.0    1.0.0   1.0.0   !
Note: These are libtool-style versions: CURRENT:REVISION:AGE
* Decisions
** gh_ removal
At some point, we need to remove gh_ entirely: guile-X.Y.Z.
** malloc policy
Here's a proposal by ela:
I would like to propose the following gh_() equivalents:
gh_scm2newstr() -> scm_string2str() in string.[ch]
gh_symbol2newstr() -> scm_symbol2str() in symbol.[ch]
Both taking the (SCM obj, char *context) and allocating memory via
scm_must_malloc(). Thus the user can safely free the returned strings
with scm_must_free(). The latter feature would be an improvement to the
previous gh_() interface functions, for which the user was told to free()
them directly. This caused problems when libguile.so used libc malloc()
and the calling application used its own standard free(), which might not
be libc free().
It seems to address the general question of: How should client programs use
malloc with respect to libguile? Some specific questions:
* Do you like the names of the functions? Maybe they should be named
scm_c_*() instead of scm_*().
* Do they make sense?
* Should we provide something like scm_c_free() for pointers returned by
these kinds of functions?
The first proposal regarding a malloc policy has been rejected for the
following reasons:
That would mean, users of guile - even on non M$ systems - would have to
keep track where their memory came from?
Assume there are users which have some kind of hash table where they store
strings in. The hash table is responsible for removing entries from the
table. Now, if you want to put strings from guile as well as other
strings into that table you would have to store a pointer to the
corresponding version of 'free' with every string? We should demand such
coding from all guile users?
The proposal itself read: For a clean memory interface of a client program
to libguile we use the following functions from libguile:
* scm_c_malloc -- should be used to allocate memory returned by some
of the SCM to C converter functions in libguile if the
client program does not supply memory
* scm_c_free -- must be used by the client program to free the memory
returned by the SCM to C converter functions in
libguile if the client program did not supply a buffer
* scm_c_realloc -- to be complete, do not know a real purpose yet
Yet another proposal regarding this problem reads as follows: We could make
life easier, if we supplied the following:
[in gc.h]
typedef void * (* scm_t_malloc_func) (size_t);
typedef void (* scm_t_free_func) (void *);
SCM_API scm_t_malloc_func scm_c_malloc;
SCM_API scm_t_free_func scm_c_free;
[in gc.c]
{
  /* At some library initialization point. */
  scm_c_malloc = malloc;
  scm_c_free = free;
}
Then the SCM to C converters allocating memory to store their results use
scm_c_malloc() instead of simply malloc(). This way all libguile/Unix users
can stick to the previous free() policy, saying that you need to free()
pointers delivered by libguile. On the other hand M$-Windows users can pass
their own malloc()-function-pointer to the library and use their own free()
then. Basically this can be achieved in the following order:
{
  char *str;
  scm_boot_guile (...);
  scm_c_malloc = malloc;
  str = scm_c_string2str (obj, NULL, NULL);
  free (str);
}
This policy is still under discussion:
If there is one global variable scm_c_malloc, then setting it within one
thread may interfere with another thread that expects scm_c_malloc to be
set differently. In other words, you would have to introduce some locking
mechanism to guarantee that the sequence of setting scm_c_malloc and
calling scm_string2str can not be interrupted by a different thread that
sets scm_c_malloc to a different value.
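The function-pointer proposal can be sketched in plain C. This is only an illustration under stated assumptions: every name below (demo_malloc, demo_free, demo_set_allocator, demo_string2str, counting_malloc, counting_free) is hypothetical and stands in for the proposed scm_c_* symbols; none of it is libguile code. The sketch installs the allocator pair once, before any other thread runs, which avoids the race described above but does not give each thread its own allocator.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the proposed scm_t_malloc_func and
   scm_t_free_func types; none of these names exist in libguile. */
typedef void *(*demo_malloc_func) (size_t);
typedef void (*demo_free_func) (void *);

/* Global hooks, defaulting to the C library allocator. */
static demo_malloc_func demo_malloc = malloc;
static demo_free_func demo_free = free;

/* Install a matching allocator pair.  Call once, before any other
   thread uses the converters; this sketch does no locking. */
static void
demo_set_allocator (demo_malloc_func m, demo_free_func f)
{
  assert (m != NULL && f != NULL);
  demo_malloc = m;
  demo_free = f;
}

/* Stand-in for an SCM-to-C converter: returns a fresh NUL-terminated
   copy of LEN bytes from SRC, allocated through the hook. */
static char *
demo_string2str (const char *src, size_t len)
{
  char *dst = demo_malloc (len + 1);
  if (dst == NULL)
    return NULL;
  memcpy (dst, src, len);
  dst[len] = '\0';
  return dst;
}

/* A client-side allocator pair that counts live blocks. */
static int live_blocks = 0;

static void *
counting_malloc (size_t n)
{
  void *p = malloc (n);
  if (p != NULL)
    live_blocks++;
  return p;
}

static void
counting_free (void *p)
{
  if (p != NULL)
    live_blocks--;
  free (p);
}
```

A client would call demo_set_allocator (counting_malloc, counting_free) at startup and release every converter result with demo_free, so the allocation and the release always come from the same pair.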


@@ -1,143 +0,0 @@
Implementation of shared substrings with fresh-copy semantics
=============================================================
Version: $Id: sharedstr.text,v 1.1 2000-08-26 20:55:21 mdj Exp $
Background
----------
In Guile, most string operations work on two other data types apart
from strings: shared substrings and read-only strings (which includes
symbols). One of Guile's sub-goals is to be a scripting language in
which string management is important. Read-only strings and shared
substrings were introduced in order to reduce overhead in string
manipulation.
We now want to simplify the Guile API by removing these two data
types, but keeping performance by allowing ordinary strings to share
storage.
The idea is to let operations like `symbol->string' and `substring'
return a pointer into the original string/symbol, thus avoiding the
need to copy the string.
Two of the problems which then arise are:
* If s2 is derived from s1, and therefore shares storage with s1, a
modification to either s1 or s2 will affect the other.
* Guile is supposed to interact closely with many UNIX libraries in
which the NUL character is used to terminate strings. Therefore
Guile strings contain a NUL character at the end, in addition to the
string length (the latter of which is used by Guile's string
operations).
The solutions to these problems are to
* Copy a string with shared storage when it's modified.
* Copy a string with shared storage when it's being used as argument
to a C library call. (Copying implies inserting an ending NUL
character.)
But this leads to memory management problems. When is it OK to free
a character array which was allocated for a symbol or a string?
Abstract description of proposed solution
-----------------------------------------
Definitions
STRING = <TYPETAG, LENGTH, CHARRECORDPTR, CHARPTR>
SYMBOL = <TYPETAG, LENGTH, CHARRECORDPTR, CHARPTR>
CHARRECORD = <PHASE, SHAREDFLAG, CHARS>
PHASE = black | white
SHAREDFLAG = private | shared
CHARS is a character array
CHARPTR points into it
Memory management
A string or symbol is initially allocated with its contents stored in
a character array in a character record. The string/symbol header
contains a pointer to this record. The initial value of the shared
flag in the character record is `private'.
The GC mark phases alternate between black and white---every second
phase is black, the rest are white. This is used to distinguish
whether a character record has been encountered before:
During a black mark phase, when the GC encounters a string or symbol,
it changes the PHASE and SHAREDFLAG marks of the corresponding
character record according to the following table:
<white, private> --> <black, private>   (white => unconditionally
<white, shared>  --> <black, private>    set to <black, private>)
<black, private> --> <black, shared>    (SHAREDFLAG changed)
<black, shared>  --> <black, shared>    (no change)
The behaviour of a white phase is equivalent with the color names
switched.
The GC sweep phase frees any unmarked string or symbol header and
frees its character record either if it is marked with the "wrong"
color (not matching the color of the last mark phase) or if its
SHAREDFLAG is `private'.
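The mark and sweep rules above can be written as two small C functions. This is a sketch, not the proposed implementation: the enum and struct names are hypothetical, and the state is kept in a plain struct rather than packed into one byte.

```c
#include <assert.h>

/* Hypothetical encoding of the PHASE and SHAREDFLAG fields. */
enum phase { WHITE = 0, BLACK = 1 };
enum shared_flag { PRIVATE = 0, SHARED = 1 };

struct char_record_state
{
  enum phase phase;
  enum shared_flag shared;
};

/* Update a character record when the marker reaches a string or
   symbol header that points to it.  CURRENT is the color of the
   mark phase in progress. */
static struct char_record_state
mark_char_record (struct char_record_state s, enum phase current)
{
  assert (current == WHITE || current == BLACK);
  if (s.phase != current)
    {
      /* First visit in this phase: recolor and assume one owner. */
      s.phase = current;
      s.shared = PRIVATE;
    }
  else
    {
      /* Already visited in this phase: a second header shares it. */
      s.shared = SHARED;
    }
  return s;
}

/* Sweep rule as stated in the text: when an unmarked header is freed,
   its record is freed too if the record's color is stale or its flag
   is still private.  LAST_MARK is the color of the last mark phase. */
static int
sweep_may_free_record (struct char_record_state s, enum phase last_mark)
{
  return s.phase != last_mark || s.shared == PRIVATE;
}
```

Because the first visit in each phase resets the flag to private, a record ends the phase marked shared only if at least two live headers reached it.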
Copy-on-write
An attempt at mutating string contents leads to copying if SHAREDFLAG
is `shared'. Copying means making a copy of the character record and
mutating the CHARRECORDPTR and CHARPTR fields of the object header to
point to the copy.
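A minimal sketch of the copy-on-write step, assuming a hypothetical layout (struct char_record, struct str_header, str_set) that keeps the flag in its own field; it is not the libguile representation. The whole character record is copied, as the text describes, and the header's two pointers are redirected to the copy before the store.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a character record with its flag, and a string
   header that points into it. */
struct char_record
{
  int shared;           /* 0 = private, 1 = shared */
  size_t size;          /* number of bytes in chars */
  char chars[];         /* the character array */
};

struct str_header
{
  size_t length;
  struct char_record *record;   /* CHARRECORDPTR */
  char *chars;                  /* CHARPTR, points into record->chars */
};

/* Store C at index I, copying the record first if it is shared.
   Returns 0 on success, -1 on a bad index or allocation failure. */
static int
str_set (struct str_header *s, size_t i, char c)
{
  assert (s != NULL && s->record != NULL);
  if (i >= s->length)
    return -1;
  if (s->record->shared)
    {
      size_t offset = (size_t) (s->chars - s->record->chars);
      struct char_record *copy = malloc (sizeof *copy + s->record->size);
      if (copy == NULL)
        return -1;
      copy->shared = 0;
      copy->size = s->record->size;
      memcpy (copy->chars, s->record->chars, copy->size);
      /* Repoint both header fields at the private copy. */
      s->record = copy;
      s->chars = copy->chars + offset;
    }
  s->chars[i] = c;
  return 0;
}
```

The original record keeps its shared flag after the copy; under the scheme above it would only return to private at a later mark phase, once a single header is left pointing at it.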
Substring operation
When making a substring, a new string header is allocated, with new
contents for the LENGTH and CHARPTR fields.
Implementation details
----------------------
* We store the character record consecutively with the character
array and lump the PHASE and SHAREDFLAG fields together into one
byte containing an integer code for the four possible states of the
PHASE and SHAREDFLAG fields. Another way of viewing it is that
these fields are represented as bits 1 and 0 in the "header" of the
character array. We let CHARRECORDPTR point to the first character
position instead of on this header:
CHARRECORDPTR
|
V
FCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
F = 0, 1, 2, 3
* We represent strings as the sub-types `simple-string' and
`substring'.
* In a simple string, CHARRECORDPTR and CHARPTR are represented by a
single pointer, so a `simple-string' is an ordinary heap cell with
TYPETAG and LENGTH in the CAR and CHARPTR in the CDR.
* substring:s are represented as double cells, with TYPETAG and LENGTH
in word 0, CHARRECORDPTR in word 1 and CHARPTR in word 2
(alternatively, we could store an offset from CHARRECORDPTR).
Problems with this implementation
---------------------------------
* How do we make copy-on-write thread-safe? Is there a different
implementation which is efficient and thread-safe?
* If small substrings are frequently generated from large, temporary
strings and the small substrings are kept in a data structure, the
heap will still have to host the large original strings. Should we
simply accept this?


@@ -1,592 +0,0 @@
* Introduction
Version: $Id: langtools.text,v 1.5 2000-08-13 04:47:26 mdj Exp $
This is a proposal for how Guile could interface with language
translators. It will be posted on the Guile list and revised for some
short time (days rather than weeks) before being implemented.
The document can be found in the CVS repository as
guile-core/devel/translation/langtools.text. All Guile developers are
welcome to modify and extend it according to the ongoing discussion
using CVS.
Ideas and comments are welcome.
For clarity, the proposal is partially written as if describing an
already existing system.
MDJ 000812 <djurfeldt@nada.kth.se>
* Language names
A translator for Guile is a certain kind of Guile module, implemented
in Scheme, C, or a mixture of both.
To make things simple, the name of the language is closely related to
the name of the translator module.
Languages have long and short names. The long form is simply the name
of the translator module: `(lang ctax)', `(lang emacs-lisp)',
`(my-modules foo-lang)' etc.
Languages with the long name `(lang IDENTIFIER)' can be referred to
with the short name IDENTIFIER, for example `emacs-lisp'.
* How to tell Guile to read code in a different language (than Scheme)
There are four methods of specifying which translator to use when
reading a file:
** Command option
The options to the guile command are parsed linearly from left to
right. You can change the language at zero or more points using the
option
-t, --language LANGUAGE
Example:
guile -t emacs-lisp -l foo -l bar -t scheme -l baz
will use the emacs-lisp translator while reading "foo" and "bar", and
the default translator (scheme) for "baz".
You can use this technique in a script together with the meta switch:
#!/usr/local/bin/guile \
-t emacs-lisp -s
!#
** Commentary in file
When opening a file for reading, Guile will read the first few lines,
looking for the string "-*- LANGNAME -*-", where LANGNAME can be
either the long or short form of the name.
If found, the corresponding translator is loaded and used to read the
file.
** File extension
Guile maintains an alist mapping filename extensions to languages.
Each entry has the form:
(REGEXP . LANGNAME)
where REGEXP is a string and LANGNAME a symbol or a list of symbols.
The alist can be accessed using `language-alist' which is exported
by the module `(core config)':
(language-alist) --> current alist
(language-alist ALIST) sets the alist to ALIST
(language-alist ALIST :prepend) prepends ALIST onto the current list
(language-alist ALIST :append) appends ALIST after current list
The `load' command will match filenames against this alist and choose
the translator to use accordingly.
There will be a default alist for common translators. For translators
not listed, the alist has to be extended in .guile just as Emacs users
extend auto-mode-alist in .emacs.
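The lookup that `load' would perform against this alist can be sketched in C with POSIX regular expressions. Everything here is an assumption made for illustration: the proposal does not list the default alist, does not say which regexp dialect applies, and does not say which entry wins when several match. The sketch uses extended regexps and takes the first match; struct lang_entry, demo_alist and language_for_file are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* One (REGEXP . LANGNAME) entry. */
struct lang_entry
{
  const char *regexp;
  const char *langname;
};

/* Invented table contents; the proposal gives no default alist. */
static const struct lang_entry demo_alist[] = {
  { "\\.el$", "emacs-lisp" },
  { "\\.scm$", "scheme" },
};

/* Return the language of the first entry whose regexp matches FILENAME,
   or FALLBACK if none does.  Entries that fail to compile are skipped. */
static const char *
language_for_file (const struct lang_entry *alist, size_t n,
                   const char *filename, const char *fallback)
{
  assert (filename != NULL);
  for (size_t i = 0; i < n; i++)
    {
      regex_t re;
      if (regcomp (&re, alist[i].regexp, REG_EXTENDED | REG_NOSUB) != 0)
        continue;
      int rc = regexec (&re, filename, 0, NULL, 0);
      regfree (&re);
      if (rc == 0)
        return alist[i].langname;
    }
  return fallback;
}
```

With first-match semantics, the :prepend and :append options described above decide whether a user's entries take priority over the defaults or only fill in the gaps.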
** Module header
You specify the language used by a module with the :language option in
the module header. (See below under "Module configuration language".)
* Module system
This section describes how the Guile module system is adapted to use
with other languages.
** Module configuration language
*** The `(config)' module
Guile has a sophisticated module system. We don't require each
translator implementation to implement its own syntax for modules.
That would be too much work for the implementor, and users would have
to learn the module system anew for each syntax.
Instead, the module `(config)' exports the module header form
`(define-module ...)'.
The config module also exports a number of primitives by which you can
customize the Guile library, such as `language-alist' and `load-path'.
*** Default module environment
The bindings of the config module are available in the default
interaction environment when Guile starts up. This is because the
config module is on the module use list for the startup environment.
However, config bindings are *not* available by default in new
modules.
The default module environment provides bindings from the R5RS module
only.
*** Module headers
The module header of the current module system is the form
(define-module NAME OPTION1 ...)
You can specify a translator using the option
:language LANGNAME
where LANGNAME is the long or short form of the language name as
described above.
The translator is fed characters from the module file, starting
immediately after the end-parenthesis of the module header form.
NOTE: There can be only one module header per file.
It is also possible to put the module header in a separate file and
use the option
:file FILENAME
to point to a file containing the actual code.
Example:
foo.gm:
----------------------------------------------------------------------
(define-module (foo)
:language emacs-lisp
:file "foo.el"
:export (foo bar)
)
----------------------------------------------------------------------
foo.el:
----------------------------------------------------------------------
(defun foo ()
...)
(defun bar ()
...)
----------------------------------------------------------------------
** Repl commands
Up till now, Guile has been dependent upon the available bindings in
the selected module in order to do basic operations such as moving to
a different module, entering the debugger or getting documentation.
This is not acceptable since we want to be able to control Guile
consistently regardless of which module we are in, and since we don't
want to equip a module with bindings which don't have anything to do
with the purpose of the module.
Therefore, the repl provides a special command language on top of
whatever syntax the current module provides. (Scheme48 and RScheme
provide similar repl command languages.)
[Jost Boekemeier has suggested the following alternative solution:
Commands are bindings just like any other binding. It is enough if
some modules carry command bindings (it's in fact enough if *one*
module has them), because from such a module you can use the command
(in MODULE) to walk into a module not carrying command bindings, and
then use CTRL-D to exit.
However, this has the disadvantage of mixing the "real" bindings with
command bindings (the module might want to use "in" for other
purposes), that CTRL-D could cause problems since for some channels
CTRL-D might close down the connection, and that using one type of
command ("in") to go "into" the module and another (CTRL-D) to "exit"
is more complex than simply "going to" a module.]
*** Repl command syntax
Normally, repl commands have the syntax
,COMMAND ARG1 ...
Input starting with an arbitrary amount of whitespace followed by a
comma thus works as an escape syntax.
This syntax is probably compatible with all languages. (Note that we
don't need to activate the lexer of the language until we've checked
if the first non-whitespace char is a comma.)
(Hypothetically, if this became a problem, we could provide a means
of disabling this behaviour of the repl and let that particular
language module take sole control of reading at the repl prompt.)
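The escape check described above can be sketched without involving any
language lexer. `parse-repl-command' is a hypothetical name used for
illustration only.

```scheme
;; Hypothetical helper: if LINE starts with optional whitespace and a
;; comma, return the command text that follows the comma; otherwise
;; return #f so the line can be handed to the language's own reader.
(define (parse-repl-command line)
  (let ((trimmed (string-trim line)))   ; drop leading whitespace only
    (and (> (string-length trimmed) 0)
         (char=? (string-ref trimmed 0) #\,)
         (substring trimmed 1))))
```

For example, the input "  ,in (foo)" yields "in (foo)", while a ctax
statement such as "x = 1;" yields #f.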
Among the commands available are
*** ,in MODULE
Select the module named MODULE, that is, any new expressions typed by the
user after this command will be evaluated in the evaluation
environment provided by MODULE.
*** ,in MODULE EXPR
Evaluate expression EXPR in MODULE. EXPR has the syntax supplied by
the language used by MODULE.
*** ,use MODULE
Import all bindings exported by MODULE to the current module.
* Language modules
Since code written in any kind of language should be able to implement
most tasks, which may include reading, evaluating and writing, and
generally computing with, expressions and data originating from other
languages, we want the basic reading, evaluation and printing
operations to be independent of the language.
That is, instead of supplying separate `read', `eval' and `write'
procedures for different languages, a language module is required to
use the system procedures in the translated code.
This means that the behaviour of `read', `eval' and `write' is
context dependent. (See further "How Guile system procedures `read',
`eval', `write' use language modules" below.)
** Language data types
Each language module should try to use the fundamental Scheme data
types as far as this is possible.
Some data types have important differences in semantics between
languages, though, and all required data types may not exist in
Guile.
In such cases, the language module must supply its own, distinct, data
types. So, each language supported by Guile uses a certain set of
data types, with the basic Scheme data types as the intersection
between all sets.
Specifically, syntax trees representing source code expressions should
normally be a distinct data type.
** Foreign language escape syntax
Note that such data can flow freely between modules. In order to
accommodate data with different native syntaxes, each language module
provides a foreign language escape syntax. In Scheme, this syntax
uses the sharp comma extension specified by SRFI-10. The read
constructor is simply the last symbol in the long language name (which
is usually the same as the short language name).
** Example 1
Characters have a syntax both in Scheme and in ctax. Lists currently
have a syntax in Scheme but lack one in ctax. Ctax doesn't have a
datatype "enum", but we pretend it has for this example.
The following table now shows the syntax used for reading and writing
these expressions in module A using the language scheme, and module B
using the language ctax (we assume that the foreign language escape
syntax in ctax is #LANGUAGE EXPR):
                A                 B

chars           #\X               'X'
lists           (1 2 3)           #scheme (1 2 3)
enums           #,(ctax ENUM)     ENUM
** Example 2
A user is typing expressions in a ctax module which imports the
bindings x and y from the module `(foo)':
ctax> x = read ();
1+2;
1+2;
ctax> x
1+2;
ctax> y = 1;
1
ctax> y;
1
ctax> ,in (guile-user)
guile> ,use (foo)
guile> x
#,(ctax 1+2;)
guile> y
1
guile>
The example shows that ctax uses a distinct representation for ctax
expressions, but Scheme integers for integers.
** Language module interface
A language module is an ordinary Guile module importing bindings from
other modules and exporting bindings through its public interface.
It is required to export the following variable and procedures:
*** language-environment --> ENVIRONMENT
Returns a fresh top-level ENVIRONMENT (a module) where expressions
in this language are evaluated by default.
Modules using this language will by default have this environment
on their use list.
The intention is for this procedure to provide the "run-time
environment" for the language.
*** native-read PORT --> OBJECT
Read next expression in the foreign syntax from PORT and return an
object OBJECT representing it.
It is entirely up to the language module to define what one
expression is, that is, how much to read.
In lisp-like languages, `native-read' corresponds to `read'. Note
that in such languages, OBJECT need not be source code, but could
be data.
The representation of OBJECT is also chosen by the language
module. It can consist of Scheme data types, data types distinct for
the language, or a mixture.
There is one requirement, however: Distinct data types must be
instances of a subclass of `language-specific-class'.
This procedure will be called during interactive use (the user
types expressions at a prompt) and when the system `read'
procedure is called at a time when a module using this language is
selected.
Some languages (for example Python) parse differently depending on
whether the session is interactive or non-interactive. Guile provides
the predicate `interactive-port?' to test for this.
*** language-specific-class
This variable contains the superclass of all non-Scheme data-types
provided by the language.
*** native-write OBJECT PORT
This procedure prints the OBJECT on PORT using the specific
language syntax.
*** write-foreign-syntax OBJECT LANGUAGE NATIVE-WRITE PORT
Write OBJECT in the foreign language escape syntax of this module.
The object is specific to language LANGUAGE and can be written using
NATIVE-WRITE.
Here's an implementation for Scheme:
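(define (write-foreign-syntax object language native-write port)
  ;; Open the sharp comma escape form, e.g. "#,(ctax "
  (format port "#,(~A " language)
  ;; Let the foreign language print OBJECT in its own syntax
  (native-write object port)
  ;; Close the escape form
  (display #\) port))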
*** translate EXPRESSION --> SCHEMECODE
Translate an EXPRESSION into SCHEMECODE.
EXPRESSION can be anything returned by `read'.
SCHEMECODE is Scheme source code represented using ordinary Scheme
data. It will be passed to `eval' in an environment containing
bindings in the environment returned by `language-environment'.
This procedure will be called during interactive use and when the
system `eval
*** translate-all PORT [ALIST] --> THUNK
Translate the entire stream of characters from PORT until #<eof>.
Return a THUNK which can be called repeatedly like this:
THUNK --> SCHEMECODE
Each call will yield a new piece of scheme code. The THUNK signals
end of translation by returning the value *end-of-translation* (which
is tested using the predicate `end-of-translation?').
The optional argument ALIST provides compilation options for the
translator:
(debug . #t) means produce code suitable for debugging
This procedure will be called by the system `load' command and by
the module system when loading files.
The intentions are:
1. To let the language module decide when and in how large chunks
to do the processing. It may choose to do all processing at
the time translate-all is called, all processing when THUNK is
called the first time, or small pieces of processing each time
THUNK is called, or any conceivable combination.
2. To let the language module decide in how large chunks to output
the resulting Scheme code in order not to overload memory.
3. To enable the language module to use temporary files, and
whole-module analysis and optimization techniques.
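To make the THUNK protocol concrete, here is a minimal sketch for a
"language" that is simply Scheme itself, so translation is the
identity. The definitions of `*end-of-translation*' and
`end-of-translation?' are stand-ins invented for this sketch, as is the
consumer `collect-translation'; the real system would supply its own.

```scheme
;; Stand-ins for the end marker and its predicate (assumed, not the
;; proposal's actual definitions).
(define *end-of-translation* (list 'end-of-translation))
(define (end-of-translation? obj) (eq? obj *end-of-translation*))

;; Identity translator: each call to the returned thunk reads one
;; datum from PORT and hands it back as SCHEMECODE.  The optional
;; ALIST of compilation options is accepted but ignored here.
(define (translate-all port . alist)
  (lambda ()
    (let ((expr (read port)))
      (if (eof-object? expr)
          *end-of-translation*
          expr))))

;; Hypothetical consumer, standing in for `load': call THUNK until it
;; signals the end, collecting the pieces of SCHEMECODE in order.
(define (collect-translation thunk)
  (let loop ((acc '()))
    (let ((code (thunk)))
      (if (end-of-translation? code)
          (reverse acc)
          (loop (cons code acc))))))
```

This sketch does its work lazily, one expression per call. A
translator that needs whole-module analysis could instead do all
processing inside `translate-all' and let the thunk merely hand out the
results.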
*** untranslate SCHEMECODE --> EXPRESSION
Attempt to do the inverse of `translate'. An approximation is OK. It
is also OK to return #f. This procedure will be called from the
debugger, when generating error messages, backtraces etc.
The debugger uses the local evaluation environment to determine from
which module an expression comes. This is how the debugger can know
which `untranslate' procedure to call for a given expression.
(This is currently used to decide which backtrace frames to
display. System modules use the option :no-backtrace to prevent
displaying of Guile's internals to the user.)
Note that `untranslate' can use source-properties set by `native-read'
to give hints about how to do the reverse translation. Such hints
could for example be the filename, and line and column numbers for the
source expression, or an actual copy of the source expression.
** How Guile system procedures `read', `eval', `write' use language modules
*** read
The idea is that the `read' exported from the R5RS library will
continue to work when called from other languages, and will keep its
semantics.
A call to `read' simply means "read in an expression from PORT using
the syntax associated with that port".
Each module carries information about its language.
When an input port is created for a module to be read or during
interaction with a given module, this information is copied to the
port object.
read uses this information to call `native-read' in the correct
language module.
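The dispatch could be pictured as in the sketch below. The proposal
stores the language information in the port object itself; here a weak
hash table keyed on ports stands in for that, and the names
`port-readers', `set-port-reader!' and `language-read' are all
hypothetical.

```scheme
(use-modules (ice-9 rdelim))   ; read-line, used in the example below

;; Stand-in for the per-port language field: maps a port to the
;; `native-read' procedure of its language module.
(define port-readers (make-weak-key-hash-table))

(define (set-port-reader! port native-read)
  (hashq-set! port-readers port native-read))

;; Stand-in for the system `read': use the reader recorded for PORT,
;; falling back to the ordinary Scheme reader.
(define (language-read port)
  ((hashq-ref port-readers port read) port))

;; Example: a toy line-oriented language whose "expression" is a line.
(define toy-port (open-input-string "1+2;\n"))
(set-port-reader! toy-port read-line)
```

With this setup, `(language-read toy-port)' uses the toy reader, while
a port without a recorded reader is read as Scheme.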
*** eval
[To be written.]
*** write
[To be written.]
* Error handling
** Errors during translation
Errors during translation are generated as usual by calling scm-error
(from Scheme) or scm_misc_error etc (from C). The effect of
throwing errors from within `translate-all' is the same as when they
are generated within a call to the THUNK returned from
`translate-all'.
scm-error takes a fifth argument. This is a property list (alist)
which you can use to pass extra information to the error reporting
machinery.
Currently, the following properties are supported:
filename        filename of file being translated
line            line number of erring expression
column          column number
** Run-time errors (errors in SCHEMECODE)
This section pertains to what happens when a run-time error occurs
during evaluation of the translated code.
In order to get "foreign code" in error messages, make sure that
`untranslate' yields good output. Note the possibility of maintaining
a table (preferably using weak references) mapping SCHEMECODE to
EXPRESSION.
Note the availability of source-properties for attaching filename,
line and column number, and other information, such as EXPRESSION, to
SCHEMECODE. If filename, line, and column properties are defined,
they will be automatically used by the error reporting machinery.
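The table mentioned above could be kept in a weak-key hash table, so
that an entry disappears once its SCHEMECODE is no longer referenced.
In this sketch `source-table' and `remember-source!' are hypothetical
names, and the `untranslate' shown is only one possible implementation.

```scheme
;; Weak-key table mapping SCHEMECODE (compared with eq?) to the
;; foreign EXPRESSION it was translated from.
(define source-table (make-weak-key-hash-table))

;; To be called by the translator for each piece of code it emits.
;; Returns SCHEMECODE so it can be used in tail position.
(define (remember-source! schemecode expression)
  (hashq-set! source-table schemecode expression)
  schemecode)

;; Return the recorded EXPRESSION, or #f when nothing is known, which
;; the interface above explicitly allows.
(define (untranslate schemecode)
  (hashq-ref source-table schemecode #f))
```

Since the lookup is by identity, only the very object produced by the
translator maps back to its source; a structurally equal copy does not.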
* Proposed changes to Guile
** Implement the above proposal.
** Add new fields `reader' and `translator' to all module objects
Make sure they are initialized when a language is specified.
** Use `untranslate' during error handling.
** Implement the use of arg 5 to scm-error
(specified in "Errors during translation")
** Implement a generic lexical analyzer with interface similar to read/rp
Mikael is working on this. (It might take a few days, since he is
busy with his studies right now.)
** Remove scm:eval-transformer
This is replaced by new fields in each module object (environment).
`eval' will instead directly use the `transformer' field in the module
passed as second arg.
Internal evaluation will, similarly, use the transformer of the module
representing the top-level of the local environment.
Note that this level of transformation is something independent of
language translation. *This* is a hook for adding Scheme macro
packages and belongs to the core language.
We also need to check the new `translator' field, potentially using
it.
** Package local environments as smobs
so that environment list structures can't leak out on the Scheme
level. (This has already been done in SCM.)
** Introduce new fields in input ports
These carry state information such as
*** which keyword syntax to support
*** whether to be case sensitive or not
*** which lexical grammar to use
*** whether the port is used in an interactive session or not
There will be a new Guile primitive `interactive-port?' testing for this.
** Move configuration of keyword syntax and case sensitivity to the read-state
Add new fields to the module objects for these values, so that the
read-state can be initialized from them.
*fixme* When? Why? How?
Probably as soon as the language has been determined during file loading.
Need to figure out how to set these values.
Local Variables:
mode: outline
End:
