mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-05-05 06:50:21 +02:00
bye bye
parent
dffe307d60
commit
fbea34b7cc
13 changed files with 0 additions and 1759 deletions
|
@ -1,58 +0,0 @@
2001-06-27 Thien-Thi Nguyen <ttn@revel.glug.org>

	* README: Remove tasks.text.

	* tasks.text: Bye bye (contents folded into ../TODO).

2001-05-08 Martin Grabmueller <mgrabmue@cs.tu-berlin.de>

	* modules/module-snippets.texi: Fixed a lot of typos and clarified
	some points. Thanks to Neil for the typo+questions patch!

2001-05-07 Martin Grabmueller <mgrabmue@cs.tu-berlin.de>

	* modules/module-snippets.texi: New file, documenting the module
	system. Placed in `devel' for review purposes.

2001-03-16 Martin Grabmueller <mgrabmue@cs.tu-berlin.de>

	* modules: New directory.

	* modules/module-layout.text: New file.

2000-08-26 Mikael Djurfeldt <mdj@linnaeus.mit.edu>

	* strings: New directory.

	* strings/sharedstr.text (sharedstr.text): New file.

2000-08-12 Mikael Djurfeldt <mdj@linnaeus.mit.edu>

	* translate: New directory.

	* translate/langtools.text: New file.

2000-05-30 Mikael Djurfeldt <mdj@mdj.nada.kth.se>

	* tasks.text: Use outline-mode. Added section for tasks in need
	of attention.

2000-05-29 Mikael Djurfeldt <mdj@mdj.nada.kth.se>

	* tasks.text: New file.

2000-05-25 Mikael Djurfeldt <mdj@mdj.nada.kth.se>

	* README: New file.

	* build/snarf-macros.text: New file.

2000-05-20 Mikael Djurfeldt <mdj@mdj.nada.kth.se>

	* policy/goals.text, policy/principles.text, policy/plans.text:
	New files.

2000-03-21 Mikael Djurfeldt <mdj@thalamus.nada.kth.se>

	* policy/names.text: New file.
13
devel/README
|
@ -1,13 +0,0 @@
Directories:

policy          Guile policy documents

build           Build/installation process

string          Strings and characters

translation     Language translation

vm              Virtual machines

vm/ior          Mikael's ideas on a new type of Scheme interpreter
|
@ -1,288 +0,0 @@
Module Layout Proposal
======================

Martin Grabmueller
<mgrabmue@cs.tu-berlin.de>
Draft: 2001-03-11

Version: $Id: module-layout.text,v 1.1 2001-03-16 08:37:37 mgrabmue Exp $

* Table of contents

** Abstract
** Overview
*** What do we have now?
*** What should we change?
** Policy of module separation
*** Functionality
*** Standards
*** Importance
*** Compatibility
** Module naming
*** Scheme
*** Object oriented programming
*** Systems programming
*** Database programming
*** Text processing
*** Math programming
*** Network programming
*** Graphics
*** GTK+ programming
*** X programming
*** Games
*** Multiple names
*** Application modules
** Future ideas


* Abstract

This is a proposal for a new layout of the module name space. The
goal is to reduce (or even eliminate) the clutter in the current ice-9
module directory, and to provide a clean framework for splitting
libguile into subsystems, grouped by functionality, standards
compliance and maybe other characteristics.

This is not a completed policy document, but rather a collection of
ideas and proposals which still have to be decided. I will mention my
personal preference, where appropriate, but the final decisions are of
course up to the maintainers.


* Overview

Currently, new modules are added in an ad-hoc manner to the ice-9
module name space when the need for them arises. I think that was
mainly because no other directory for installed Scheme modules was
created. With the integration of GOOPS, the new top-level module
directory oop was introduced, and we should follow this practice for
other subsystems which share functionality.

DISCLAIMER: Please note that I am no expert on Guile's module system,
so be patient with me and correct me where I got anything wrong.

** What do we have now?

The module (oop goops) contains all functionality needed for
object-oriented programming with Guile (with a few exceptions in the
evaluator, which are clearly needed for performance).

Except for the procedures in the module (ice-9 rdelim), all Guile
primitives are currently located in the root module (I think it is the
module (guile)), and some procedures defined in `boot-9.scm' are
installed in the module (guile-user).

** What should we change?

In the core, there are a lot of primitive procedures which can cleanly
be grouped into subsystems, and then grouped into modules. That would
make the core library more maintainable, would ease separate testing
of subsystems and clean up dependencies between subsystems.


* Policy of module separation

There are several possibilities to group procedures into modules.

- They could be grouped by functionality.
- They could be grouped by standards compliance.
- They could be grouped by level of importance.

One important group of modules should of course be provided
additionally:

- Compatibility modules.

So the first thing to decide is: Which of these policies should we
adopt? Personally, I think that it is not possible to cleanly use
exactly one of the possibilities; we will probably use a mixture of
them. I propose to group by functionality, and maybe use some
`bridge-modules', which make functionality available when the user
requests the modules for a given standard.

** Functionality

Candidates for the first possibility are groups of procedures which
are already grouped in source files, such as

- Regular expression procedures.
- Network procedures.
- Systems programming procedures.
- Random number procedures.
- Math/numeric procedures.
- String-processing procedures.
- List-processing procedures.
- Character handling procedures.
- Object-oriented programming support.

** Standards

Guile now complies with R5RS, and I think that the procedures required
by this standard should always be available to the programmer.
People who do not want them could always create :pure modules when
they need them.

On the other hand, the SRFI procedures fit nicely into a `group by
standards' scheme. An example which is already provided is the
SRFI-8 syntax `receive'. Following that, we could provide two modules
for each SRFI, one named after the SRFI (like `srfi-8') and one named
after the main functionality (`receive').

** Importance

By importance, I mean `how important are procedures for the average
Guile user'. That means that procedures which are only useful to a
small group of users (the Guile developers, for example) should not be
immediately available at the REPL, so that they do not confuse the user
when they appear in the `apropos' output or the tab completion.

A good example would be debugging procedures (which also could be
added with a special command-line option), or low-level system calls.

** Compatibility

This group is for modules providing compatibility procedures. An
example would be a module for old string-processing procedures, which
could someday get overridden by incompatible SRFI procedures of the
same name.


* Module naming

Provided we choose to take the `group by functionality' approach, I
propose the following naming hierarchy (some of them were actually
suggested by Mikael Djurfeldt).

- Scheme language related       in (scheme)
- Object oriented programming   in (oop)
- Systems programming           in (system)
- Database programming          in (database)
- Text processing               in (text)
- Math/numeric programming      in (math)
- Network programming           in (network)
- Graphics programming          in (graphics)
- GTK+ programming              in (gtk)
- X programming                 in (xlib)
- Games                         in (games)

The layout of sub-hierarchies is up to the writers of modules; we
should not enforce a strict policy here, because we can not imagine
what will happen in this area.

** Scheme

(scheme r5rs)           Complete R5RS procedures set.
(scheme safe)           Safe modules.
(scheme srfi-1)         List processing.
(scheme srfi-8)         Multiple values via `receive'.
(scheme receive)        ditto.
(scheme and-let-star)   and-let*
(scheme syncase)        syntax-case hygienic macros (maybe included in
                        (scheme r5rs)?).
(scheme slib)           SLIB, for historic reasons in (scheme).

** Object oriented programming

Examples in this section are
(oop goops)             For GOOPS.
(oop goops ...)         For lower-level GOOPS functionality and utilities.

** Systems programming

(system shell)          Shell utilities (glob, system etc).
(system process)        Process handling.
(system file-system)    Low-level filesystem support.
(system user)           getuid, setpgrp, etc.

_or_

(system posix)          All POSIX procedures.

** Database programming

In the database section, there should be sub-module hierarchies for
each supported database which contains the low-level code, and a
common database layer, which should unify access to SQL databases via
a single interface a la Perl's DBI.

(database postgres ...) Low-level database functionality.
(database oracle ...)   ...
(database mysql ...)    ...
(database msql ...)     ...
(database sql)          Common SQL accessors.
(database gdbm ...)     ...
(database hashed)       Common hashed database accessors (like gdbm).
(database util)         Leftovers.

** Text processing

(text rdelim)           Line oriented in-/output.
(text util)             Mangling text files.

** Math programming

(math random)           Random numbers.
(math primes)           Prime numbers.
(math vector)           Vector math.
(math algebra)          Algebra.
(math analysis)         Analysis.
(math util)             Leftovers.

** Network programming

(network inet)          Internet procedures.
(network socket)        Socket interface.
(network db)            Network database accessors.
(network util)          ntohl, htons and friends.

** Graphics

(graphics vector)       Generalized vector graphics handling.
(graphics vector vrml)  VRML parsers etc.
(graphics bitmap)       Generalized bitmap handling.
(graphics bitmap ...)   Bitmap format handling (TIFF, PNG, etc.).

** GTK+ programming

(gtk gtk)               GTK+ procedures.
(gtk gdk)               GDK procedures.
(gtk threads)           gtkthreads.

** X programming

(xlib xlib)             Low-level XLib programming.

** Games

(games robots)          GNU robots.

** Multiple names

As already mentioned above, I think that some modules should have
several names, to make it easier for the user to get the functionality
she needs. For example, a user could say: `hey, I need the receive
macro', or she could say: `I want to stick to SRFI syntax, so where
the hell is the module for SRFI-8?!?'.
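
As a rough illustration of the multiple-names idea, a module can be
published under a second name by re-exporting another module's
bindings. The sketch below assumes a current Guile, where SRFI-8 is
installed as (srfi srfi-8) rather than the (scheme srfi-8) proposed
here. The alias module (scheme receive), the client module (demo) and
the helper `split-time' are hypothetical names used only for this
example.

```scheme
;; Hypothetical alias module: the same binding under a second name.
(define-module (scheme receive)
  #:use-module (srfi srfi-8)
  #:re-export (receive))

;; Hypothetical client that asks for the functionality by name.
(define-module (demo)
  #:use-module (scheme receive))

;; Helper returning two values: whole minutes and leftover seconds.
(define (split-time seconds)
  (values (quotient seconds 60) (remainder seconds 60)))

;; `receive' binds both values in a single form.
(define minutes+seconds
  (receive (m s) (split-time 125)
    (list m s)))
```

A user who thinks in terms of SRFI numbers would import (srfi srfi-8)
instead and get exactly the same `receive' binding.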

** Application modules

We should not enforce policy on applications. So I propose that
application writers should be advised to place modules either in
application-specific directories $PREFIX/share/$APP/guile/... and name
them however they like, or to use the application's name as the first
part of the module name, e.g. (gnucash import), (scwm background),
(rcalc ui).

* Future ideas

I have not yet come up with a good idea for grouping modules which
deal, for example, with XML processing. They would fit into the (text)
module space, because most XML files contain text data, but they would
also fit into (database), because XML files are essentially databases.

On the other hand, XML processing is such a large field that it
probably is worth its own top-level name space (xml).


Local Variables:
mode: outline
End:
|
@ -1,143 +0,0 @@
Implementation of shared substrings with fresh-copy semantics
=============================================================

Version: $Id: sharedstr.text,v 1.1 2000-08-26 20:55:21 mdj Exp $

Background
----------

In Guile, most string operations work on two other data types apart
from strings: shared substrings and read-only strings (which include
symbols). One of Guile's sub-goals is to be a scripting language in
which string management is important. Read-only strings and shared
substrings were introduced in order to reduce overhead in string
manipulation.

We now want to simplify the Guile API by removing these two data
types, but keeping performance by allowing ordinary strings to share
storage.

The idea is to let operations like `symbol->string' and `substring'
return a pointer into the original string/symbol, thus avoiding the
need to copy the string.

Two of the problems which then arise are:

* If s2 is derived from s1, and therefore shares storage with s1, a
  modification to either s1 or s2 will affect the other.

* Guile is supposed to interact closely with many UNIX libraries in
  which the NUL character is used to terminate strings. Therefore
  Guile strings contain a NUL character at the end, in addition to the
  string length (the latter of which is used by Guile's string
  operations).

The solutions to these problems are to

* Copy a string with shared storage when it's modified.

* Copy a string with shared storage when it's being used as argument
  to a C library call. (Copying implies inserting an ending NUL
  character.)

But this leads to memory management problems. When is it OK to free
a character array which was allocated for a symbol or a string?

Abstract description of proposed solution
-----------------------------------------

Definitions

  STRING = <TYPETAG, LENGTH, CHARRECORDPTR, CHARPTR>

  SYMBOL = <TYPETAG, LENGTH, CHARRECORDPTR, CHARPTR>

  CHARRECORD = <PHASE, SHAREDFLAG, CHARS>

  PHASE = black | white

  SHAREDFLAG = private | shared

  CHARS is a character array

  CHARPTR points into it

Memory management

A string or symbol is initially allocated with its contents stored in
a character array in a character record. The string/symbol header
contains a pointer to this record. The initial value of the shared
flag in the character record is `private'.

The GC mark phases alternate between black and white---every second
phase is black, the rest are white. This is used to distinguish
whether a character record has been encountered before:

During a black mark phase, when the GC encounters a string or symbol,
it changes the PHASE and SHAREDFLAG marks of the corresponding
character record according to the following table:

  <white, private> --> <black, private>   (white => unconditionally
  <white, shared>  --> <black, private>    set to <black, private>)
  <black, private> --> <black, shared>    (SHAREDFLAG changed)
  <black, shared>  --> <black, shared>    (no change)

The behaviour of a white phase is equivalent with the color names
switched.
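
The transition table can be written down as a small procedure. This
is only a model for following the argument, not code from libguile:
the pair representation (PHASE . SHAREDFLAG) and the name
`mark-record' are assumptions made for the example.

```scheme
;; A character record's GC state is modelled as a pair
;; (PHASE . SHAREDFLAG), for example (white . private).

(define (mark-record gc-phase state)
  "Return the new state of a character record that is reached during
a mark phase whose colour is GC-PHASE."
  (if (eq? (car state) gc-phase)
      ;; Already visited in this phase, so a second header points
      ;; at the same record: it is shared.
      (cons gc-phase 'shared)
      ;; First visit in this phase: reset to private, whatever the
      ;; flag left over from the previous phase was.
      (cons gc-phase 'private)))

;; Two headers referencing one record during a black phase:
(define after-first  (mark-record 'black '(white . shared)))
(define after-second (mark-record 'black after-first))
```

Passing 'white as GC-PHASE gives the behaviour of a white phase, which
is the same table with the colour names switched.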

The GC sweep phase frees any unmarked string or symbol header and
frees its character record either if it is marked with the "wrong"
color (not matching the color of the last mark phase) or if its
SHAREDFLAG is `private'.

Copy-on-write

An attempt at mutating string contents leads to copying if SHAREDFLAG
is `shared'. Copying means making a copy of the character record and
mutating the CHARRECORDPTR and CHARPTR fields of the object header to
point to the copy.
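
A minimal model of the copy-on-write rule, written with SRFI-9
records, may make the pointer updates easier to follow. All record
and procedure names below are hypothetical. The model also makes one
simplification that differs from the proposal: it flags a record as
shared at the moment a substring is made, whereas the proposal lets
the GC mark phase discover sharing.

```scheme
(use-modules (srfi srfi-9))

;; Character record: a SHAREDFLAG plus the character array.
(define-record-type <char-record>
  (make-char-record shared? chars)
  char-record?
  (shared? char-record-shared? set-char-record-shared!)
  (chars char-record-chars))

;; Object header: LENGTH, CHARRECORDPTR, and CHARPTR kept as an offset.
(define-record-type <header>
  (make-header len record offset)
  header?
  (len header-len)
  (record header-record set-header-record!)
  (offset header-offset))

(define (make-str text)
  (make-header (string-length text)
               (make-char-record #f (string-copy text))
               0))

;; Substring: a new header pointing into the same character record.
(define (str-substring s start end)
  (let ((rec (header-record s)))
    (set-char-record-shared! rec #t)   ; simplification, see above
    (make-header (- end start) rec (+ (header-offset s) start))))

(define (str->string s)
  (let ((off (header-offset s)))
    (string-copy (char-record-chars (header-record s))
                 off (+ off (header-len s)))))

;; Mutation: copy the character record first if it is shared.
(define (str-set! s k ch)
  (let ((rec (header-record s)))
    (when (char-record-shared? rec)
      (set-header-record!
       s (make-char-record #f (string-copy (char-record-chars rec)))))
    (string-set! (char-record-chars (header-record s))
                 (+ (header-offset s) k)
                 ch)))

(define s1 (make-str "hello world"))
(define s2 (str-substring s1 6 11))
(str-set! s2 0 #\W)
```

After the last form, s2 reads "World" while s1 still reads "hello
world". In this model s1's record stays flagged as shared even though
s2 no longer points at it; in the proposal the next mark phase would
reset the flag.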

Substring operation

When making a substring, a new string header is allocated, with new
contents for the LENGTH and CHARPTR fields.

Implementation details
----------------------

* We store the character record consecutively with the character
  array and lump the PHASE and SHAREDFLAG fields together into one
  byte containing an integer code for the four possible states of the
  PHASE and SHAREDFLAG fields. Another way of viewing it is that
  these fields are represented as bits 1 and 0 in the "header" of the
  character array. We let CHARRECORDPTR point to the first character
  position instead of at this header:

     CHARRECORDPTR
     |
     V
    FCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

  F = 0, 1, 2, 3
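
The one-byte code F can be sketched as follows. The text only says
that the two fields occupy bits 1 and 0; which field gets which bit,
and which colour counts as 1, are assumptions made for this example,
as are the procedure names.

```scheme
;; Assumed layout (not fixed by the text):
;;   bit 1 = PHASE       (1 = black,  0 = white)
;;   bit 0 = SHAREDFLAG  (1 = shared, 0 = private)

(define (encode-flags phase shared-flag)
  (logior (if (eq? phase 'black) 2 0)
          (if (eq? shared-flag 'shared) 1 0)))

(define (decode-flags f)
  (cons (if (logbit? 1 f) 'black 'white)
        (if (logbit? 0 f) 'shared 'private)))
```

With this layout the four states map onto F = 0, 1, 2, 3, and
`decode-flags' inverts `encode-flags' for each of them.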

* We represent strings as the sub-types `simple-string' and
  `substring'.

* In a simple string, CHARRECORDPTR and CHARPTR are represented by a
  single pointer, so a `simple-string' is an ordinary heap cell with
  TYPETAG and LENGTH in the CAR and CHARPTR in the CDR.

* Substrings are represented as double cells, with TYPETAG and LENGTH
  in word 0, CHARRECORDPTR in word 1 and CHARPTR in word 2
  (alternatively, we could store an offset from CHARRECORDPTR).

Problems with this implementation
---------------------------------

* How do we make copy-on-write thread-safe? Is there a different
  implementation which is efficient and thread-safe?

* If small substrings are frequently generated from large, temporary
  strings and the small substrings are kept in a data structure, the
  heap will still have to host the large original strings. Should we
  simply accept this?
|
@ -1,592 +0,0 @@
* Introduction

Version: $Id: langtools.text,v 1.5 2000-08-13 04:47:26 mdj Exp $

This is a proposal for how Guile could interface with language
translators. It will be posted on the Guile list and revised for some
short time (days rather than weeks) before being implemented.

The document can be found in the CVS repository as
guile-core/devel/translation/langtools.text. All Guile developers are
welcome to modify and extend it according to the ongoing discussion
using CVS.

Ideas and comments are welcome.

For clarity, the proposal is partially written as if describing an
already existing system.

MDJ 000812 <djurfeldt@nada.kth.se>

* Language names

A translator for Guile is a certain kind of Guile module, implemented
in Scheme, C, or a mixture of both.

To make things simple, the name of the language is closely related to
the name of the translator module.

Languages have long and short names. The long form is simply the name
of the translator module: `(lang ctax)', `(lang emacs-lisp)',
`(my-modules foo-lang)' etc.

Languages with the long name `(lang IDENTIFIER)' can be referred to
with the short name IDENTIFIER, for example `emacs-lisp'.
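
The naming rule amounts to a pair of small conversions. The procedure
names below are hypothetical; the proposal does not define them.

```scheme
;; A short name is a symbol; a long name is a module name (a list).

(define (language-long-name name)
  "Expand a short name IDENTIFIER to (lang IDENTIFIER); long names
are returned unchanged."
  (if (symbol? name)
      (list 'lang name)
      name))

(define (language-short-name long-name)
  "Return IDENTIFIER for (lang IDENTIFIER), or #f if the language has
no short name."
  (and (list? long-name)
       (= (length long-name) 2)
       (eq? (car long-name) 'lang)
       (cadr long-name)))
```

For example, `emacs-lisp' expands to `(lang emacs-lisp)', while
`(my-modules foo-lang)' has a long name only.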

* How to tell Guile to read code in a different language (than Scheme)

There are four methods of specifying which translator to use when
reading a file:

** Command option

The options to the guile command are parsed linearly from left to
right. You can change the language at zero or more points using the
option

  -t, --language LANGUAGE

Example:

  guile -t emacs-lisp -l foo -l bar -t scheme -l baz

will use the emacs-lisp translator while reading "foo" and "bar", and
the default translator (scheme) for "baz".
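
The left-to-right rule can be modelled by walking the argument list
while carrying the current language along. This is a sketch of the
rule only, not of Guile's actual option parser; `plan-loads' is a
hypothetical name and every option other than -t, --language and -l
is simply skipped.

```scheme
(define (plan-loads args)
  "Return a list of (FILE . LANGUAGE) pairs, in load order, for the
command-line arguments ARGS (a list of strings)."
  (let loop ((args args) (lang 'scheme) (plan '()))
    (cond ((null? args)
           (reverse plan))
          ;; -t / --language switches the language for what follows.
          ((and (member (car args) '("-t" "--language"))
                (pair? (cdr args)))
           (loop (cddr args) (string->symbol (cadr args)) plan))
          ;; -l loads a file with whatever language is current.
          ((and (string=? (car args) "-l") (pair? (cdr args)))
           (loop (cddr args) lang
                 (cons (cons (cadr args) lang) plan)))
          (else
           (loop (cdr args) lang plan)))))

(define example-plan
  (plan-loads '("-t" "emacs-lisp" "-l" "foo" "-l" "bar"
                "-t" "scheme" "-l" "baz")))
```

For the example command line, `example-plan' pairs "foo" and "bar"
with emacs-lisp and "baz" with scheme.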

You can use this technique in a script together with the meta switch:

  #!/usr/local/bin/guile \
  -t emacs-lisp -s
  !#

** Commentary in file

When opening a file for reading, Guile will read the first few lines,
looking for the string "-*- LANGNAME -*-", where LANGNAME can be
either the long or short form of the name.

If found, the corresponding translator is loaded and used to read the
file.
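
Looking for the marker in a single line could be done along these
lines. The procedure name `mode-line-language' is hypothetical, and
the sketch handles one line at a time; a caller would apply it to
each of the first few lines of the file.

```scheme
(define (mode-line-language line)
  "Return the LANGNAME found between two -*- markers in LINE, as a
symbol (short form) or a list (long form), or #f if there is none."
  (let* ((marker "-*-")
         (start (string-contains line marker)))
    (and start
         (let* ((from (+ start (string-length marker)))
                (end (string-contains line marker from)))
           (and end
                (let ((text (string-trim-both (substring line from end))))
                  (and (not (string-null? text))
                       ;; Let the Scheme reader turn the text into a
                       ;; symbol or a list of symbols.
                       (call-with-input-string text read))))))))
```

With this definition, ";;; -*- emacs-lisp -*-" yields the symbol
`emacs-lisp', "/* -*- (lang ctax) -*- */" yields the list
`(lang ctax)', and a line without the marker yields #f.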

** File extension

Guile maintains an alist mapping filename extensions to languages.
Each entry has the form:

  (REGEXP . LANGNAME)

where REGEXP is a string and LANGNAME a symbol or a list of symbols.

The alist can be accessed using `language-alist' which is exported
by the module `(core config)':

  (language-alist)                  --> current alist
  (language-alist ALIST)            sets the alist to ALIST
  (language-alist ALIST :prepend)   prepends ALIST onto the current list
  (language-alist ALIST :append)    appends ALIST after current list

The `load' command will match filenames against this alist and choose
the translator to use accordingly.
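
The lookup that `load' would perform can be sketched with the regexp
procedures from (ice-9 regex). The alist contents and the name
`language-for-file' are made up for the example; the proposed
`language-alist' accessor itself is not implemented here.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical entries of the form (REGEXP . LANGNAME).
(define demo-language-alist
  '(("\\.el$"   . emacs-lisp)
    ("\\.ctax$" . (lang ctax))
    ("\\.scm$"  . scheme)))

(define (language-for-file filename alist)
  "Return the LANGNAME of the first entry in ALIST whose REGEXP
matches FILENAME, or #f if no entry matches."
  (let loop ((entries alist))
    (cond ((null? entries) #f)
          ((string-match (caar entries) filename) (cdar entries))
          (else (loop (cdr entries))))))
```

Because the first match wins, prepending an entry overrides the
defaults and appending one only adds a fallback, which is presumably
why both :prepend and :append are offered.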

There will be a default alist for common translators. For translators
not listed, the alist has to be extended in .guile just as Emacs users
extend auto-mode-alist in .emacs.

** Module header

You specify the language used by a module with the :language option in
the module header. (See below under "Module configuration language".)

* Module system

This section describes how the Guile module system is adapted for use
with other languages.

** Module configuration language

*** The `(config)' module

Guile has a sophisticated module system. We don't require each
translator implementation to implement its own syntax for modules.
That would be too much work for the implementor, and users would have
to learn the module system anew for each syntax.

Instead, the module `(config)' exports the module header form
`(define-module ...)'.

The config module also exports a number of primitives by which you can
customize the Guile library, such as `language-alist' and `load-path'.

*** Default module environment

The bindings of the config module are available in the default
interaction environment when Guile starts up. This is because the
config module is on the module use list for the startup environment.

However, config bindings are *not* available by default in new
modules.

The default module environment provides bindings from the R5RS module
only.

*** Module headers

The module header of the current module system is the form

  (define-module NAME OPTION1 ...)

You can specify a translator using the option

  :language LANGNAME

where LANGNAME is the long or short form of language name as described
above.

The translator is fed characters from the module file, starting
immediately after the end-parenthesis of the module header form.

NOTE: There can be only one module header per file.

It is also possible to put the module header in a separate file and
use the option

  :file FILENAME

to point out a file containing the actual code.

Example:

foo.gm:
----------------------------------------------------------------------
(define-module (foo)
  :language emacs-lisp
  :file "foo.el"
  :export (foo bar)
  )
----------------------------------------------------------------------

foo.el:
----------------------------------------------------------------------
(defun foo ()
  ...)

(defun bar ()
  ...)
----------------------------------------------------------------------

** Repl commands

Up till now, Guile has been dependent upon the available bindings in
the selected module in order to do basic operations such as moving to
a different module, entering the debugger or getting documentation.

This is not acceptable since we want to be able to control Guile
consistently regardless of in which module we are, and since we don't
want to equip a module with bindings which don't have anything to do
with the purpose of the module.

Therefore, the repl provides a special command language on top of
whatever syntax the current module provides. (Scheme48 and RScheme
provide similar repl command languages.)

[Jost Boekemeier has suggested the following alternative solution:
 Commands are bindings just like any other binding. It is enough if
 some modules carry command bindings (it's in fact enough if *one*
 module has them), because from such a module you can use the command
 (in MODULE) to walk into a module not carrying command bindings, and
 then use CTRL-D to exit.

 However, this has the disadvantage of mixing the "real" bindings with
 command bindings (the module might want to use "in" for other
 purposes), that CTRL-D could cause problems since for some channels
 CTRL-D might close down the connection, and that using one type of
 command ("in") to go "into" the module and another (CTRL-D) to "exit"
 is more complex than simply "going to" a module.]

*** Repl command syntax

Normally, repl commands have the syntax

  ,COMMAND ARG1 ...

Input starting with an arbitrary amount of whitespace + a comma thus
works as an escape syntax.

This syntax is probably compatible with all languages. (Note that we
don't need to activate the lexer of the language until we've checked
if the first non-whitespace char is a comma.)
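
The check described in the note needs nothing more than a scan for
the first non-whitespace character. In the sketch below the name
`parse-repl-command' and the returned (COMMAND . ARGUMENT-TEXT) pair
are assumptions; the proposal does not say how commands are
represented internally.

```scheme
(define (parse-repl-command line)
  "If LINE is a repl command, return (COMMAND . ARGUMENT-TEXT) as a
pair of strings; otherwise return #f so that the line can be handed
to the reader of the current language."
  (let ((i (string-index line (lambda (c) (not (char-whitespace? c))))))
    (and i
         (char=? (string-ref line i) #\,)
         (let* ((rest (string-trim (substring line (+ i 1))))
                (sp (string-index rest char-whitespace?)))
           (if sp
               (cons (substring rest 0 sp)
                     (string-trim (substring rest sp)))
               (cons rest ""))))))
```

The argument text is returned unparsed on purpose: for `,in MODULE
EXPR' the expression has to be read with the syntax of the language
used by MODULE, so it cannot be tokenized at this point.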
|
|
||||||
|
|
||||||
(Hypothetically, if this would become a problem, we can provide means
|
|
||||||
of disabling this behaviour of the repl and let that particular
|
|
||||||
language module take sole control of reading at the repl prompt.)
|
|
||||||
|
|
||||||
Among the commands available are
|
|
||||||
|
|
||||||
*** ,in MODULE
|
|
||||||
|
|
||||||
Select module named MODULE, that is any new expressions typed by the
|
|
||||||
user after this command will be evaluated in the evaluation
|
|
||||||
environment provided by MODULE.
|
|
||||||
|
|
||||||
*** ,in MODULE EXPR
|
|
||||||
|
|
||||||
Evaluate expression EXPR in MODULE. EXPR has the syntax supplied by
|
|
||||||
the language used by MODULE.
|
|
||||||
|
|
||||||
*** ,use MODULE
|
|
||||||
|
|
||||||
Import all bindings exported by MODULE to the current module.
|
|
||||||
|
|
||||||
* Language modules
|
|
||||||
|
|
||||||
Since code written in any kind of language should be able to implement
|
|
||||||
most tasks, which may include reading, evaluating and writing, and
|
|
||||||
generally computing with, expressions and data originating from other
|
|
||||||
languages, we want the basic reading, evaluation and printing
|
|
||||||
operations to be independent of the language.
|
|
||||||
|
|
||||||
That is, instead of supplying separate `read', `eval' and `write'
|
|
||||||
procedures for different languages, a language module is required to
|
|
||||||
use the system procedures in the translated code.
|
|
||||||
|
|
||||||
This means that the behaviour of `read', `eval' and `write' are
|
|
||||||
context dependent. (See further "How Guile system procedures `read',
|
|
||||||
`eval', `write' use language modules" below.)
|
|
||||||
|
|
||||||
** Language data types
|
|
||||||
|
|
||||||
Each language module should try to use the fundamental Scheme data
|
|
||||||
types as far as this is possible.
|
|
||||||
|
|
||||||
Some data types have important differences in semantics between
|
|
||||||
languages, though, and all required data types may not exist in
|
|
||||||
Guile.
|
|
||||||
|
|
||||||
In such cases, the language module must supply its own, distinct, data
|
|
||||||
types. So, each language supported by Guile uses a certain set of
|
|
||||||
data types, with the basic Scheme data types as the intersection
|
|
||||||
between all sets.
|
|
||||||
|
|
||||||
Specifically, syntax trees representing source code expressions should
|
|
||||||
normally be a distinct data type.
|
|
||||||
|
|
||||||
** Foreign language escape syntax
|
|
||||||
|
|
||||||
Note that such data can flow freely between modules. In order to
|
|
||||||
accommodate data with different native syntaxes, each language module
|
|
||||||
provides a foreign language escape syntax. In Scheme, this syntax
|
|
||||||
uses the sharp comma extension specified by SRFI-10. The read
|
|
||||||
constructor is simply the last symbol in the long language name (which
|
|
||||||
is usually the same as the short language name).
|
|
||||||
|
|
||||||
** Example 1
|
|
||||||
|
|
||||||
Characters have the syntax #\X in Scheme and 'X' in ctax. Lists currently
|
|
||||||
have the syntax (1 2 3) in Scheme but lack ctax syntax. Ctax doesn't have a
|
|
||||||
datatype "enum", but we pretend it has for this example.
|
|
||||||
|
|
||||||
The following table now shows the syntax used for reading and writing
|
|
||||||
these expressions in module A using the language scheme, and module B
|
|
||||||
using the language ctax (we assume that the foreign language escape
|
|
||||||
syntax in ctax is #LANGUAGE EXPR):
|
|
||||||
|
|
||||||
A B
|
|
||||||
|
|
||||||
chars #\X 'X'
|
|
||||||
|
|
||||||
lists (1 2 3) #scheme (1 2 3)
|
|
||||||
|
|
||||||
enums #,(ctax ENUM) ENUM
|
|
||||||
|
|
||||||
** Example 2
|
|
||||||
|
|
||||||
A user is typing expressions in a ctax module which imports the
|
|
||||||
bindings x and y from the module `(foo)':
|
|
||||||
|
|
||||||
ctax> x = read ();
|
|
||||||
1+2;
|
|
||||||
1+2;
|
|
||||||
ctax> x
|
|
||||||
1+2;
|
|
||||||
ctax> y = 1;
|
|
||||||
1
|
|
||||||
ctax> y;
|
|
||||||
1
|
|
||||||
ctax> ,in (guile-user)
|
|
||||||
guile> ,use (foo)
|
|
||||||
guile> x
|
|
||||||
#,(ctax 1+2;)
|
|
||||||
guile> y
|
|
||||||
1
|
|
||||||
guile>
|
|
||||||
|
|
||||||
The example shows that ctax uses a distinct representation for ctax
|
|
||||||
expressions, but Scheme integers for integers.
|
|
||||||
|
|
||||||
** Language module interface
|
|
||||||
|
|
||||||
A language module is an ordinary Guile module importing bindings from
|
|
||||||
other modules and exporting bindings through its public interface.
|
|
||||||
|
|
||||||
It is required to export the following variable and procedures:
|
|
||||||
|
|
||||||
*** language-environment --> ENVIRONMENT
|
|
||||||
|
|
||||||
Returns a fresh top-level ENVIRONMENT (a module) where expressions
|
|
||||||
in this language are evaluated by default.
|
|
||||||
|
|
||||||
Modules using this language will by default have this environment
|
|
||||||
on their use list.
|
|
||||||
|
|
||||||
The intention is for this procedure to provide the "run-time
|
|
||||||
environment" for the language.
|
|
||||||
|
|
||||||
*** native-read PORT --> OBJECT
|
|
||||||
|
|
||||||
Read next expression in the foreign syntax from PORT and return an
|
|
||||||
object OBJECT representing it.
|
|
||||||
|
|
||||||
It is entirely up to the language module to define what one
|
|
||||||
expression is, that is, how much to read.
|
|
||||||
|
|
||||||
In lisp-like languages, `native-read' corresponds to `read'. Note
|
|
||||||
that in such languages, OBJECT need not be source code, but could
|
|
||||||
be data.
|
|
||||||
|
|
||||||
The representation of OBJECT is also chosen by the language
|
|
||||||
module. It can consist of Scheme data types, data types distinct for
|
|
||||||
the language, or a mixture.
|
|
||||||
|
|
||||||
There is one requirement, however: Distinct data types must be
|
|
||||||
instances of a subclass of `language-specific-class'.
|
|
||||||
|
|
||||||
This procedure will be called during interactive use (the user
|
|
||||||
types expressions at a prompt) and when the system `read'
|
|
||||||
procedure is called at a time when a module using this language is
|
|
||||||
selected.
|
|
||||||
|
|
||||||
Some languages (for example Python) parse differently depending on whether
|
|
||||||
it's an interactive or non-interactive session. Guile provides the
|
|
||||||
predicate `interactive-port?' to test for this.
|
|
||||||
|
|
||||||
*** language-specific-class
|
|
||||||
|
|
||||||
This variable contains the superclass of all non-Scheme data-types
|
|
||||||
provided by the language.
|
|
||||||
|
|
||||||
*** native-write OBJECT PORT
|
|
||||||
|
|
||||||
This procedure prints the OBJECT on PORT using the specific
|
|
||||||
language syntax.
|
|
||||||
|
|
||||||
*** write-foreign-syntax OBJECT LANGUAGE NATIVE-WRITE PORT
|
|
||||||
|
|
||||||
Write OBJECT in the foreign language escape syntax of this module.
|
|
||||||
The object is specific to language LANGUAGE and can be written using
|
|
||||||
NATIVE-WRITE.
|
|
||||||
|
|
||||||
Here's an implementation for Scheme:
|
|
||||||
|
|
||||||
(define (write-foreign-syntax object language native-write port)
  (format port "#,(~A " language)
  (native-write object port)
  (display #\) port))
|
|
||||||
|
|
||||||
*** translate EXPRESSION --> SCHEMECODE
|
|
||||||
|
|
||||||
Translate an EXPRESSION into SCHEMECODE.
|
|
||||||
|
|
||||||
EXPRESSION can be anything returned by `read'.
|
|
||||||
|
|
||||||
SCHEMECODE is Scheme source code represented using ordinary Scheme
|
|
||||||
data. It will be passed to `eval' in an environment containing
|
|
||||||
bindings in the environment returned by `language-environment'.
|
|
||||||
|
|
||||||
This procedure will be called during interactive use and when the
|
|
||||||
system `eval' is called.
|
|
||||||
|
|
||||||
*** translate-all PORT [ALIST] --> THUNK
|
|
||||||
|
|
||||||
Translate the entire stream of characters PORT until #<eof>.
|
|
||||||
Return a THUNK which can be called repeatedly like this:
|
|
||||||
|
|
||||||
THUNK --> SCHEMECODE
|
|
||||||
|
|
||||||
Each call will yield a new piece of scheme code. The THUNK signals
|
|
||||||
end of translation by returning the value *end-of-translation* (which
|
|
||||||
is tested using the predicate `end-of-translation?').
|
|
||||||
|
|
||||||
The optional argument ALIST provides compilation options for the
|
|
||||||
translator:
|
|
||||||
|
|
||||||
(debug . #t) means produce code suitable for debugging
|
|
||||||
|
|
||||||
This procedure will be called by the system `load' command and by
|
|
||||||
the module system when loading files.
|
|
||||||
|
|
||||||
The intentions are:
|
|
||||||
|
|
||||||
1. To let the language module decide when and in how large chunks
|
|
||||||
to do the processing. It may choose to do all processing at
|
|
||||||
the time translate-all is called, all processing when THUNK is
|
|
||||||
called the first time, or small pieces of processing each time
|
|
||||||
THUNK is called, or any conceivable combination.
|
|
||||||
|
|
||||||
2. To let the language module decide in how large chunks to output
|
|
||||||
the resulting Scheme code in order not to overload memory.
|
|
||||||
|
|
||||||
3. To enable the language module to use temporary files, and
|
|
||||||
whole-module analysis and optimization techniques.
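The calling protocol for the THUNK returned by `translate-all' can be illustrated with a small C sketch. Everything here is a stand-in chosen for illustration: a "piece of Scheme code" is just a string, the pieces are translated up front (one of the strategies allowed by point 1), and the names `translation_thunk`, `thunk_call`, `end_of_translation_p` and `load_all` are hypothetical.

```c
#include <stddef.h>

/* Stand-in for *end-of-translation*: a unique object compared by identity. */
static const char end_marker[] = "";
#define END_OF_TRANSLATION (end_marker)

typedef struct translation_thunk {
  const char *const *pieces;  /* already translated pieces (toy input) */
  size_t n_pieces;
  size_t next;                /* index of the next piece to hand out */
} translation_thunk;

/* Corresponds to calling THUNK: each call yields the next piece, and
   the sentinel once everything has been handed out. */
static const char *
thunk_call (translation_thunk *thunk)
{
  if (thunk->next >= thunk->n_pieces)
    return END_OF_TRANSLATION;
  return thunk->pieces[thunk->next++];
}

/* Corresponds to `end-of-translation?'. */
static int
end_of_translation_p (const char *piece)
{
  return piece == END_OF_TRANSLATION;
}

/* A loader in the style of `load': pull pieces until the sentinel
   appears, and count how many pieces would have been evaluated. */
static size_t
load_all (translation_thunk *thunk)
{
  size_t n = 0;
  while (!end_of_translation_p (thunk_call (thunk)))
    n++;
  return n;
}
```

The caller never learns how the work was divided between the call to `translate-all' and the calls to THUNK, which is what leaves the language module free to choose.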
|
|
||||||
|
|
||||||
*** untranslate SCHEMECODE --> EXPRESSION
|
|
||||||
|
|
||||||
Attempt to do the inverse of `translate'. An approximation is OK. It
|
|
||||||
is also OK to return #f. This procedure will be called from the
|
|
||||||
debugger, when generating error messages, backtraces etc.
|
|
||||||
|
|
||||||
The debugger uses the local evaluation environment to determine from
|
|
||||||
which module an expression comes. This is how the debugger can know
|
|
||||||
which `untranslate' procedure to call for a given expression.
|
|
||||||
|
|
||||||
(This is currently used to decide which backtrace frames to
|
|
||||||
display. System modules use the option :no-backtrace to prevent
|
|
||||||
displaying of Guile's internals to the user.)
|
|
||||||
|
|
||||||
Note that `untranslate' can use source-properties set by `native-read'
|
|
||||||
to give hints about how to do the reverse translation. Such hints
|
|
||||||
could for example be the filename, and line and column numbers for the
|
|
||||||
source expression, or an actual copy of the source expression.
|
|
||||||
|
|
||||||
** How Guile system procedures `read', `eval', `write' use language modules
|
|
||||||
|
|
||||||
*** read
|
|
||||||
|
|
||||||
The idea is that the `read' exported from the R5RS library will
|
|
||||||
continue to work when called from other languages, and will keep its
|
|
||||||
semantics.
|
|
||||||
|
|
||||||
A call to `read' simply means "read in an expression from PORT using
|
|
||||||
the syntax associated with that port".
|
|
||||||
|
|
||||||
Each module carries information about its language.
|
|
||||||
|
|
||||||
When an input port is created for a module to be read or during
|
|
||||||
interaction with a given module, this information is copied to the
|
|
||||||
port object.
|
|
||||||
|
|
||||||
`read' uses this information to call `native-read' in the correct
|
|
||||||
language module.
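The dispatch described above could look roughly like this in C. This is a sketch under stated assumptions, not Guile's actual port implementation: the types `port` and `language`, the function `system_read` and the two toy readers are all hypothetical, and the "objects" returned are plain strings.

```c
#include <stddef.h>

typedef struct port port;

/* Hypothetical description of a language module. */
typedef struct language {
  const char *name;
  /* The module's `native-read': returns an opaque object. */
  const void *(*native_read) (port *p);
} language;

struct port {
  const char *input;      /* toy input source */
  const language *lang;   /* copied from the module when the port is set up */
};

/* The language-independent `read': it only looks at the port. */
static const void *
system_read (port *p)
{
  return p->lang->native_read (p);
}

/* Two toy readers so that the dispatch can be observed. */
static const void *
scheme_native_read (port *p)
{
  (void) p;
  return "scheme-object";
}

static const void *
ctax_native_read (port *p)
{
  (void) p;
  return "ctax-object";
}

static const language scheme_language = { "scheme", scheme_native_read };
static const language ctax_language = { "ctax", ctax_native_read };
```

A port set up for a ctax module thus ends up in `ctax_native_read` even if `system_read` is called from Scheme code.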
|
|
||||||
|
|
||||||
*** eval
|
|
||||||
|
|
||||||
[To be written.]
|
|
||||||
|
|
||||||
*** write
|
|
||||||
|
|
||||||
[To be written.]
|
|
||||||
|
|
||||||
* Error handling
|
|
||||||
|
|
||||||
** Errors during translation
|
|
||||||
|
|
||||||
Errors during translation are generated as usual by calling scm-error
|
|
||||||
(from Scheme) or scm_misc_error etc (from C). The effect of
|
|
||||||
throwing errors from within `translate-all' is the same as when they
|
|
||||||
are generated within a call to the THUNK returned from
|
|
||||||
`translate-all'.
|
|
||||||
|
|
||||||
scm-error takes a fifth argument. This is a property list (alist)
|
|
||||||
which you can use to pass extra information to the error reporting
|
|
||||||
machinery.
|
|
||||||
|
|
||||||
Currently, the following properties are supported:
|
|
||||||
|
|
||||||
filename filename of file being translated
|
|
||||||
line line number of erring expression
|
|
||||||
column column number
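As an illustration of how the error reporting machinery might consume these three properties, here is a small C sketch. The struct, the function name and the "FILE:LINE:COLUMN: MESSAGE" output format are assumptions made for this example; the notes above do not specify any of them.

```c
#include <stdio.h>

/* Hypothetical mirror of the three supported properties. */
typedef struct error_location {
  const char *filename;   /* may be NULL if unknown */
  int line;
  int column;
} error_location;

/* Format "FILE:LINE:COLUMN: MESSAGE" into BUF.  Returns the value of
   snprintf, that is, the length the full message would need. */
static int
format_translation_error (char *buf, size_t size,
                          const error_location *loc, const char *message)
{
  return snprintf (buf, size, "%s:%d:%d: %s",
                   loc->filename ? loc->filename : "<unknown>",
                   loc->line, loc->column, message);
}
```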
|
|
||||||
|
|
||||||
** Run-time errors (errors in SCHEMECODE)
|
|
||||||
|
|
||||||
This section pertains to what happens when a run-time error occurs
|
|
||||||
during evaluation of the translated code.
|
|
||||||
|
|
||||||
In order to get "foreign code" in error messages, make sure that
|
|
||||||
`untranslate' yields good output. Note the possibility of maintaining
|
|
||||||
a table (preferably using weak references) mapping SCHEMECODE to
|
|
||||||
EXPRESSION.
|
|
||||||
|
|
||||||
Note the availability of source-properties for attaching filename,
|
|
||||||
line and column number, and other, information, such as EXPRESSION, to
|
|
||||||
SCHEMECODE. If filename, line, and column properties are defined,
|
|
||||||
they will be automatically used by the error reporting machinery.
|
|
||||||
|
|
||||||
* Proposed changes to Guile
|
|
||||||
|
|
||||||
** Implement the above proposal.
|
|
||||||
|
|
||||||
** Add new fields `reader' and `translator' to all module objects
|
|
||||||
|
|
||||||
Make sure they are initialized when a language is specified.
|
|
||||||
|
|
||||||
** Use `untranslate' during error handling.
|
|
||||||
|
|
||||||
** Implement the use of arg 5 to scm-error
|
|
||||||
|
|
||||||
(specified in "Errors during translation")
|
|
||||||
|
|
||||||
** Implement a generic lexical analyzer with interface similar to read/rp
|
|
||||||
|
|
||||||
Mikael is working on this. (It might take a few days, since he is
|
|
||||||
busy with his studies right now.)
|
|
||||||
|
|
||||||
** Remove scm:eval-transformer
|
|
||||||
|
|
||||||
This is replaced by new fields in each module object (environment).
|
|
||||||
|
|
||||||
`eval' will instead directly use the `transformer' field in the module
|
|
||||||
passed as second arg.
|
|
||||||
|
|
||||||
Internal evaluation will, similarly, use the transformer of the module
|
|
||||||
representing the top-level of the local environment.
|
|
||||||
|
|
||||||
Note that this level of transformation is something independent of
|
|
||||||
language translation. *This* is a hook for adding Scheme macro
|
|
||||||
packages and belongs to the core language.
|
|
||||||
|
|
||||||
We also need to check the new `translator' field, potentially using
|
|
||||||
it.
|
|
||||||
|
|
||||||
** Package local environments as smobs
|
|
||||||
|
|
||||||
so that environment list structures can't leak out on the Scheme
|
|
||||||
level. (This has already been done in SCM.)
|
|
||||||
|
|
||||||
** Introduce new fields in input ports
|
|
||||||
|
|
||||||
These carry state information such as
|
|
||||||
|
|
||||||
*** which keyword syntax to support
|
|
||||||
|
|
||||||
*** whether to be case sensitive or not
|
|
||||||
|
|
||||||
*** which lexical grammar to use
|
|
||||||
|
|
||||||
*** whether the port is used in an interactive session or not
|
|
||||||
|
|
||||||
There will be a new Guile primitive `interactive-port?' testing for this.
|
|
||||||
|
|
||||||
** Move configuration of keyword syntax and case sensitivity to the read-state
|
|
||||||
|
|
||||||
Add new fields to the module objects for these values, so that the
|
|
||||||
read-state can be initialized from them.
|
|
||||||
|
|
||||||
*fixme* When? Why? How?
|
|
||||||
|
|
||||||
Probably as soon as the language has been determined during file loading.
|
|
||||||
|
|
||||||
Need to figure out how to set these values.
|
|
||||||
|
|
||||||
|
|
||||||
Local Variables:
|
|
||||||
mode: outline
|
|
||||||
End:
|
|
|
@ -1,665 +0,0 @@
|
||||||
***
|
|
||||||
*** These notes about the design of a new type of Scheme interpreter
|
|
||||||
*** "Ior" are cut out from various emails from early spring 2000.
|
|
||||||
***
|
|
||||||
*** MDJ 000817 <djurfeldt@nada.kth.se>
|
|
||||||
***
|
|
||||||
|
|
||||||
Generally, we should try to make a design which is clean and
|
|
||||||
minimalistic in as many respects as possible. For example, even if we
|
|
||||||
need more primitives than those in R5RS internally, I don't think
|
|
||||||
these should be made available to the user in the core, but rather be
|
|
||||||
made available *through* libraries (implementation in core,
|
|
||||||
publication via library).
|
|
||||||
|
|
||||||
The suggested working name for this project is "Ior" (Swedish name for
|
|
||||||
the donkey in "Winnie the Pooh" :). If, against the odds, we really
|
|
||||||
would succeed in producing an Ior, and we find it suitable, we could
|
|
||||||
turn it into a Guile 2.0 (or whatever). (The architecture still
|
|
||||||
allows for support of the gh interface and uses conservative GC (Hans
|
|
||||||
Böhm's, in fact).)
|
|
||||||
|
|
||||||
Beware now that I'm just sending over my original letter, which is
|
|
||||||
just a sketch of the more detailed, but cryptic, design notes I made
|
|
||||||
originally, which are, in turn, not as detailed as the design has
|
|
||||||
become now. :)
|
|
||||||
|
|
||||||
Please also excuse the lack of structure. I shouldn't work on this at
|
|
||||||
all right now. Choose for yourselves if you want to read this
|
|
||||||
unstructured information or if you want to wait until I've structured
|
|
||||||
it after end of January.
|
|
||||||
|
|
||||||
But then I actually have to blurt out the basic idea of my
|
|
||||||
architecture already now. (I had hoped to present you with a proper
|
|
||||||
and fairly detailed spec, but I won't be able to complete such a spec
|
|
||||||
quickly.)
|
|
||||||
|
|
||||||
|
|
||||||
The basic idea is this:
|
|
||||||
|
|
||||||
* Don't waste time on non-computation!
|
|
||||||
|
|
||||||
Why waste a lot of time on type-checks, unboxing and boxing of data?
|
|
||||||
None of these actions does any computation!
|
|
||||||
|
|
||||||
I'd like both interpreter and compiled code to work directly with data
|
|
||||||
in raw, native form (integers represented as 32bit longs, inexact
|
|
||||||
numbers as doubles, short strings as bytes in a word, longer strings
|
|
||||||
as a normal pointer to malloced memory, bignums are just pointers to a
|
|
||||||
gmp (GNU MultiPrecision library) object, etc.)
|
|
||||||
|
|
||||||
* Don't we need to dispatch on type to know what to do?
|
|
||||||
|
|
||||||
But don't we need to dispatch on the type in order to know how to
|
|
||||||
compute with the data? E.g., `display' does entirely different
|
|
||||||
computations on a <fixnum> and a <string>. (<fixnum> is an integer
|
|
||||||
between -2^31 and 2^31-1.)
|
|
||||||
|
|
||||||
The answer is *no*, not in 95% of all cases. The main reason is that
|
|
||||||
the interpreter does type analysis while converting closures to
|
|
||||||
bytecode, and knows already when _calling_ `display' what types its
arguments have. This means that the bytecode compiler can choose a
|
|
||||||
suitable _version_ of `display' which handles that particular type.
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
This type analysis is greatly simplified by a symmetry: just as the
analysis _results_ in the type of the argument in the call to
`display', which lets us select the correct _version_ of `display',
the closure byte-code being generated is itself only one _version_ of
the closure, with the types of its arguments fixed at the start of the
analysis.
|
|
||||||
|
|
||||||
As you already have understood by now, the basic architecture is that
|
|
||||||
all procedures are generic functions, and the "versions" I'm speaking
|
|
||||||
about are a kind of method. Let's call them "branches" for now.
|
|
||||||
|
|
||||||
For example:
|
|
||||||
|
|
||||||
(define foo
|
|
||||||
(lambda (x)
|
|
||||||
...
|
|
||||||
(display x)
|
|
||||||
...))
|
|
||||||
|
|
||||||
may result in the following two branches:
|
|
||||||
|
|
||||||
1. [<fixnum>-foo] =
|
|
||||||
(branch ((x <fixnum>))
|
|
||||||
...
|
|
||||||
([<fixnum>-display] x)
|
|
||||||
...)
|
|
||||||
|
|
||||||
2. [<string>-foo] =
|
|
||||||
(branch ((x <string>))
|
|
||||||
...
|
|
||||||
([<string>-display] x)
|
|
||||||
...)
|
|
||||||
|
|
||||||
and a new closure
|
|
||||||
|
|
||||||
(define bar
|
|
||||||
(lambda (x y)
|
|
||||||
...
|
|
||||||
(foo x)
|
|
||||||
...))
|
|
||||||
|
|
||||||
results in
|
|
||||||
|
|
||||||
[<fixnum>-<fixnum>-bar] =
|
|
||||||
(branch ((x <fixnum>) (y <fixnum>))
|
|
||||||
...
|
|
||||||
([<fixnum>-foo] x)
|
|
||||||
...)
|
|
||||||
|
|
||||||
Note how all type dispatch is eliminated in these examples.
|
|
||||||
|
|
||||||
As a further reinforcement to the type analysis, branches will not
|
|
||||||
only have typed parameters but also have return types. This means
|
|
||||||
that the type of a branch will look like
|
|
||||||
|
|
||||||
<type 1> x ... x <type n> --> <type r>
|
|
||||||
|
|
||||||
In essence, the entire system will be very ML-like internally, and we
|
|
||||||
can benefit from the research done on ML-compilation.
|
|
||||||
|
|
||||||
However, we now get three major problems to confront:
|
|
||||||
|
|
||||||
1. In the Scheme language not all situations can be completely type
|
|
||||||
analyzed.
|
|
||||||
|
|
||||||
2. In particular, for some operations, even if the types of the
|
|
||||||
parameters are well defined, we can't determine the return type
|
|
||||||
generically. For example, [<fixnum>-<fixnum>-+] may have return
|
|
||||||
type <fixnum> _or_ <bignum>.
|
|
||||||
|
|
||||||
3. Even if we can do a complete analysis, some closures will generate
|
|
||||||
a combinatoric explosion of branches.
|
|
||||||
|
|
||||||
|
|
||||||
Problem 1: Incomplete analysis
|
|
||||||
|
|
||||||
We introduce a new type <boxed>. This data type has type <boxed> and
|
|
||||||
contents
|
|
||||||
|
|
||||||
struct ior_boxed_t {
|
|
||||||
ior_type *type; /* pointer to class struct */
|
|
||||||
void *data; /* generic field, may also contain immediate objects */
|
|
||||||
};
|
|
||||||
|
|
||||||
For example, a boxed fixnum 4711 has type <boxed> and contents
|
|
||||||
{ <fixnum>, 4711 }. The boxed type essentially corresponds to Guile's
|
|
||||||
SCM type. It's just that the 1 or 3 or 7 or 16-bit type tag has been
|
|
||||||
replaced with a 32-bit type tag (the pointer to the class structure
|
|
||||||
describing the type of the object).
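Boxing and unboxing of a fixnum can be sketched as below. This is a sketch only: the minimal `ior_type` with just a name, the `const` qualifier on the type pointer, and the helpers `ior_box_fixnum` and `ior_unbox_fixnum` are assumptions, and storing the immediate in the `data` field relies on the usual (implementation-defined) round trip between `intptr_t` and `void *`.

```c
#include <stdint.h>

/* Hypothetical minimal class struct. */
typedef struct ior_type {
  const char *name;
} ior_type;

typedef struct ior_boxed_t {
  const ior_type *type;   /* pointer to class struct */
  void *data;             /* generic field, may hold an immediate object */
} ior_boxed_t;

static const ior_type ior_fixnum_class = { "<fixnum>" };

/* Box a raw fixnum: the value itself is stored in the data field. */
static ior_boxed_t
ior_box_fixnum (int32_t n)
{
  ior_boxed_t b = { &ior_fixnum_class, (void *) (intptr_t) n };
  return b;
}

/* Unbox again.  The caller must already know that b.type is <fixnum>. */
static int32_t
ior_unbox_fixnum (ior_boxed_t b)
{
  return (int32_t) (intptr_t) b.data;
}
```

With these definitions, `ior_box_fixnum (4711)` yields the pair { <fixnum>, 4711 } used as the example above.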
|
|
||||||
|
|
||||||
This is more inefficient than the SCM type system, but it's no problem
|
|
||||||
since it won't be used in 95% of all cases. The big advantage
|
|
||||||
compared to SCM's type system is that it is so simple and uniform.
|
|
||||||
|
|
||||||
I should note here that while SCM and Guile are centered around the
|
|
||||||
cell representation and all objects either _are_ cells or have a cell
|
|
||||||
handle, objects in Ior will look more like malloced blocks. This is the
|
|
||||||
reason why I planned to start with Böhm's GC which has C pointers as
|
|
||||||
object handles. But it is of course still possible to use a heap, or,
|
|
||||||
preferably several heaps for different kinds of objects. (Böhm's GC
|
|
||||||
has multiple heaps for different sizes of objects.) If we write a
|
|
||||||
custom GC, we can increase speed further.
|
|
||||||
|
|
||||||
|
|
||||||
Problem 3 (yes, I skipped :) Combinatoric explosion
|
|
||||||
|
|
||||||
We simply don't generate all possible branches. In the interpreter we
|
|
||||||
generate branches "just-too-late" (well, it's normally called "lazy
|
|
||||||
compilation" or "just-in-time", but if it was "in-time", the procedure
|
|
||||||
would already be compiled when it was needed, right? :) as when Guile
|
|
||||||
memoizes or when a Java machine turns byte-codes into machine code, or
|
|
||||||
as when GOOPS turns methods into cmethods for that matter.
|
|
||||||
|
|
||||||
Have you noticed that branches (although still without return type
|
|
||||||
information) already exist in GOOPS? They are currently called
|
|
||||||
"cmethods" and are generated on demand from the method code and put
|
|
||||||
into the GF cache during evaluation of GOOPS code. :-) (I have not
|
|
||||||
utilized this fully yet. I plan to soon use this method compilation
|
|
||||||
(into branches) to eliminate almost all type dispatch in calls to
|
|
||||||
accessors.)
|
|
||||||
|
|
||||||
For the compiler, we use profiling information, just as the modern GCC
|
|
||||||
scheduler, or else rely on some type analysis (if a procedure says
|
|
||||||
(+ x y), x is not normally a <string> but rather some subclass of
|
|
||||||
<number>) and some common sense (it's usually more important to
|
|
||||||
generate <fixnum> branches than <foobar> branches).
|
|
||||||
|
|
||||||
The rest of the cases can be handled by <boxed>-branches. We can, for
|
|
||||||
example, have a:
|
|
||||||
|
|
||||||
[<boxed>-<boxed>-bar] =
|
|
||||||
(branch ((x <boxed>) (y <boxed>))
|
|
||||||
...
|
|
||||||
([<boxed>-foo] x)
|
|
||||||
...)
|
|
||||||
|
|
||||||
[<boxed>-foo] will use an efficient type dispatch mechanism (for
|
|
||||||
example akin to the GOOPS one) to select the right branch of
|
|
||||||
`display'.
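One simple way to realize [<boxed>-foo] is sketched below in C. Note that this uses a per-class function pointer as the dispatch mechanism, which is a swapped-in simplification and not the GOOPS-style dispatch mentioned above; all names (`foo_boxed`, `display_fixnum_branch`, and so on) are hypothetical, and output goes into a buffer so that the result can be inspected.

```c
#include <stdio.h>
#include <stdint.h>

typedef struct ior_type ior_type;

typedef struct ior_boxed_t {
  const ior_type *type;
  void *data;
} ior_boxed_t;

struct ior_type {
  const char *name;
  /* The branch of `display' specialized for this type. */
  int (*display) (char *out, size_t size, void *data);
};

/* [<fixnum>-display] */
static int
display_fixnum_branch (char *out, size_t size, void *data)
{
  return snprintf (out, size, "%ld", (long) (intptr_t) data);
}

/* [<string>-display] */
static int
display_string_branch (char *out, size_t size, void *data)
{
  return snprintf (out, size, "%s", (const char *) data);
}

static const ior_type fixnum_class = { "<fixnum>", display_fixnum_branch };
static const ior_type string_class = { "<string>", display_string_branch };

/* [<boxed>-foo]: the only place where the type tag is inspected. */
static int
foo_boxed (char *out, size_t size, ior_boxed_t x)
{
  return x.type->display (out, size, x.data);
}
```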
|
|
||||||
|
|
||||||
|
|
||||||
Problem 2: Ambiguous return type
|
|
||||||
|
|
||||||
If the return type of a branch is ambiguous, we simply define the
|
|
||||||
return type as <boxed>, and box data at the point in the branch where
|
|
||||||
it can be decided which type of data we will return. This is how
|
|
||||||
things can be handled in the general case. However, we might be able
|
|
||||||
to handle things more neatly, at least in some cases:
|
|
||||||
|
|
||||||
During compilation to byte code, we'll probably use an intermediate
|
|
||||||
representation in continuation passing style. We might even use a
|
|
||||||
subtype of branches represented as continuations (not a heavy
|
|
||||||
representation, as in Guile and SCM, but probably not much more than a
|
|
||||||
function pointer). This is, for example, one way of handling tail
|
|
||||||
recursion, especially mutual tail recursion.
|
|
||||||
|
|
||||||
One case where we would like to try really hard not to box data is
|
|
||||||
when fixnums "overflow into" bignums.
|
|
||||||
|
|
||||||
Let's say that the branch [<fixnum>-<fixnum>-bar] contains a form
|
|
||||||
|
|
||||||
(+ x y)
|
|
||||||
|
|
||||||
where the type analyzer knows that x and y are fixnums. We then split
|
|
||||||
the branch right after the form and let it fork into two possible
|
|
||||||
continuation branches bar1 and bar2:
|
|
||||||
|
|
||||||
[The following is only pseudo code. It can be made efficient on the C
|
|
||||||
level. We can also use the asm compiler directive in conditional
|
|
||||||
compilation for GCC on i386. We could even let autoconf/automake
|
|
||||||
substitute an architecture specific solution for multiple
|
|
||||||
architectures, but still support a C level default case.]
|
|
||||||
|
|
||||||
(if (sum-over/underflow? x y)
|
|
||||||
(bar1 (fixnum->bignum x) (fixnum->bignum y) ...)
|
|
||||||
(bar2 x y ...))
|
|
||||||
|
|
||||||
bar1 begins with the evaluation of the form
|
|
||||||
|
|
||||||
([<bignum>-<bignum>-+] x y)
|
|
||||||
|
|
||||||
while bar2 begins with
|
|
||||||
|
|
||||||
([<fixnum>-<fixnum>-+] x y)
|
|
||||||
|
|
||||||
Note that the return type of each of these forms is unambiguous.
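A portable C-level default for the fork above might look like this. It is a sketch under clear assumptions: fixnums are taken to be 32-bit, `int64_t` stands in for a real bignum (a GMP integer in the actual design), and the names `sum_overflows`, `sum_result` and `add_fixnums` are made up for the example.

```c
#include <stdint.h>

/* Corresponds to sum-over/underflow?: true if x + y does not fit in a
   32-bit fixnum.  Written so that no signed overflow ever occurs. */
static int
sum_overflows (int32_t x, int32_t y)
{
  return (y > 0 && x > INT32_MAX - y) || (y < 0 && x < INT32_MIN - y);
}

typedef struct sum_result {
  int is_bignum;     /* which continuation was taken: 1 = bar1, 0 = bar2 */
  int32_t fixnum;    /* valid when !is_bignum */
  int64_t bignum;    /* stand-in for a bignum, valid when is_bignum */
} sum_result;

static sum_result
add_fixnums (int32_t x, int32_t y)
{
  sum_result r = { 0, 0, 0 };
  if (sum_overflows (x, y))
    {
      /* bar1: [<bignum>-<bignum>-+] on the widened operands */
      r.is_bignum = 1;
      r.bignum = (int64_t) x + (int64_t) y;
    }
  else
    {
      /* bar2: [<fixnum>-<fixnum>-+] */
      r.fixnum = x + y;
    }
  return r;
}
```

Each arm has an unambiguous result type, so nothing needs to be boxed; an architecture-specific version could test the CPU's overflow flag instead of doing the comparisons.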
|
|
||||||
|
|
||||||
|
|
||||||
Now some random points from the design:
|
|
||||||
|
|
||||||
* The basic concept in Ior is the class. A type is a concrete class.
|
|
||||||
Classes which are subclasses of <object> are concrete, otherwise they
|
|
||||||
are abstract.
|
|
||||||
|
|
||||||
* A procedure is a collection of methods. Each method can have
|
|
||||||
an arbitrary number of parameters of arbitrary class (not type).
|
|
||||||
|
|
||||||
* The type of a method is the tuple of its argument classes.
|
|
||||||
|
|
||||||
* The type of a procedure is the set of its method types.
|
|
||||||
|
|
||||||
But the most important new concept is the branch.
|
|
||||||
Consider the procedure:
|
|
||||||
|
|
||||||
(define (half x)
|
|
||||||
(quotient x 2))
|
|
||||||
|
|
||||||
The procedure half will have the single method
|
|
||||||
|
|
||||||
(method ((x <top>))
|
|
||||||
(quotient x 2))
|
|
||||||
|
|
||||||
When `(half 128)' is called the Ior evaluator will create a new branch
|
|
||||||
during the actual evaluation. I'm now going to extend the branch
|
|
||||||
syntax by adding a second list of formals: the continuations of the
|
|
||||||
branch.
|
|
||||||
|
|
||||||
* The type of a branch is namely the tuple of the tuple of its
|
|
||||||
argument types (not classes!) and the tuple of its continuation
|
|
||||||
argument types. The branch generated above will be:
|
|
||||||
|
|
||||||
(branch ((x <fixnum>)) ((c <fixnum>))
|
|
||||||
(c (quotient x 2)))
|
|
||||||
|
|
||||||
If the method
|
|
||||||
|
|
||||||
(method ((x <top>) (y <top>))
|
|
||||||
(quotient (+ x 1) y))
|
|
||||||
|
|
||||||
is called with arguments 1 and 2 it results in the branch
|
|
||||||
|
|
||||||
(branch ((x <fixnum>) (y <fixnum>)) ((c1 <fixnum>) (c2 <bignum>))
|
|
||||||
(quotient (+ x 1 c3) 2))
|
|
||||||
|
|
||||||
where c3 is:
|
|
||||||
|
|
||||||
(branch ((x <fixnum>) (y <fixnum>)) ((c <bignum>))
|
|
||||||
(quotient (+ (fixnum->bignum x) 1) 2))
|
|
||||||
|
|
||||||
The generated branches are stored in a cache in the procedure object.
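Such a cache could be sketched in C as a small table keyed by the tuple of argument types. This is only an illustration: the fixed sizes, the linear search, the integer `id` standing in for compiled code, and the names `branch`, `procedure` and `procedure_branch` are all assumptions, and continuation types are left out of the key for brevity.

```c
#include <stddef.h>

typedef struct ior_type {
  const char *name;
} ior_type;

#define MAX_ARGS 4
#define CACHE_SIZE 8

typedef struct branch {
  size_t n_args;
  const ior_type *arg_types[MAX_ARGS];   /* the argument type tuple */
  int id;                                /* stand-in for the compiled code */
} branch;

typedef struct procedure {
  branch cache[CACHE_SIZE];
  size_t n_cached;
  int n_generated;       /* how many branches have been created so far */
} procedure;

/* Return the cached branch for this argument type tuple, generating a
   new one on a miss.  Returns NULL if the tuple is too long or the toy
   cache is full. */
static const branch *
procedure_branch (procedure *proc, const ior_type *const *types, size_t n)
{
  size_t i, j;
  branch *fresh;

  if (n > MAX_ARGS)
    return NULL;
  for (i = 0; i < proc->n_cached; i++)
    {
      const branch *b = &proc->cache[i];
      if (b->n_args != n)
        continue;
      for (j = 0; j < n && b->arg_types[j] == types[j]; j++)
        ;
      if (j == n)
        return b;                         /* cache hit */
    }
  if (proc->n_cached == CACHE_SIZE)
    return NULL;
  fresh = &proc->cache[proc->n_cached++];
  fresh->n_args = n;
  for (j = 0; j < n; j++)
    fresh->arg_types[j] = types[j];
  fresh->id = proc->n_generated++;        /* "compile" a new branch */
  return fresh;
}
```

Calling `half' twice with a fixnum would generate one branch and then reuse it; a first call with some other type generates a second branch.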
|
|
||||||
|
|
||||||
|
|
||||||
But wait a minute! What about variables and data structures?
|
|
||||||
|
|
||||||
In essence, what we do is that we fork up all data paths so that they
|
|
||||||
can be typed: We put the type tags on the _data paths_ instead of on
|
|
||||||
the data itself. You can look upon the "branches" as tubes of
|
|
||||||
information where the type tag is attached to the tube instead of on
|
|
||||||
what passes through it.
|
|
||||||
|
|
||||||
Variables and data structures are part of the "tubes", so they need to
|
|
||||||
be typed. For example, the generic pair looks like:
|
|
||||||
|
|
||||||
(define-class <pair> ()
|
|
||||||
car-type
|
|
||||||
car
|
|
||||||
cdr-type
|
|
||||||
cdr)
|
|
||||||
|
|
||||||
But note that since car and cdr are generic procedures, we can let
|
|
||||||
more efficient pairs exist in parallel, like
|
|
||||||
|
|
||||||
(define-class <immutable-fixnum-list> ()
|
|
||||||
(car (class <fixnum>))
|
|
||||||
(cdr (class <immutable-fixnum-list>)))
|
|
||||||
|
|
||||||
Note that instances of this last type take only two words of memory!
|
|
||||||
They are easy to use too. We can't use `cons' or `list' to create
|
|
||||||
them, since these procedures can't assume immutability, but we don't
|
|
||||||
need to specify the type <fixnum> in our program. Something like
|
|
||||||
|
|
||||||
(const-cons 1 x)
|
|
||||||
|
|
||||||
where x is in the data flow path tagged as <immutable-fixnum-list>, or
|
|
||||||
|
|
||||||
(const-list 1 2 3)
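At the C level, an instance of <immutable-fixnum-list> could be laid out as in the following sketch. The struct and function names are hypothetical, NULL is assumed to represent the empty list, and the two-word size holds on platforms where `intptr_t` and a pointer have the same size.

```c
#include <stdint.h>
#include <stddef.h>

/* Two words per node: no per-element type tags are needed, since the
   types of car and cdr are known from the class. */
typedef struct fixnum_list {
  intptr_t car;                     /* raw fixnum */
  const struct fixnum_list *cdr;    /* NULL marks the empty list */
} fixnum_list;

/* Walk the list without a single type check. */
static long
fixnum_list_sum (const fixnum_list *l)
{
  long sum = 0;
  for (; l != NULL; l = l->cdr)
    sum += (long) l->car;
  return sum;
}
```

The generic <pair> above needs four fields per instance, so the specialized class halves the memory use for this common case.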
|
|
||||||
|
|
||||||
|
|
||||||
Some further notes:
|
|
||||||
|
|
||||||
* The concepts module and instance are the same thing. Using other
|
|
||||||
modules means 1. creating a new module class which inherits the
|
|
||||||
classes of the used modules and 2. instantiating it.
|
|
||||||
|
|
||||||
* Module definitions and class definitions are equivalent but
|
|
||||||
different syntactic sugar adapted for each kind of use.
|
|
||||||
|
|
||||||
* (define x 1) means: create an instance variable which is itself a
|
|
||||||
subclass of <boxed> with initial value 1 (which is an instance of
|
|
||||||
<fixnum>).
|
|
||||||
|
|
||||||
|
|
||||||
The interpreter is a mixture of a stack machine and a register
|
|
||||||
machine. The evaluator looks like this... :)
|
|
||||||
|
|
||||||
/* the interpreter! */
|
|
||||||
if (!setjmp (ior_context->exit_buf))
|
|
||||||
#ifndef i386_GCC
|
|
||||||
while (1)
|
|
||||||
#endif
|
|
||||||
(*ior_continue) (IOR_MICRO_OP_ARGS);
|
|
||||||
|
|
||||||
The branches are represented as an array of pointers to micro
|
|
||||||
operations. In essence, the evaluator doesn't exist in itself, but is
|
|
||||||
folded out over the entire implementation. This allows for an extreme
|
|
||||||
form of modularity!
|
|
||||||
|
|
||||||
The i386_GCC is a machine specific optimization which avoids all
|
|
||||||
unnecessary popping and pushing of the CPU stack (which is different
|
|
||||||
from the Ior data stack).
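To make the "folded out" evaluator concrete, here is a self-contained toy version in C. It is a sketch and not the baby Ior itself: the `vm` struct, the four micro operations and `vm_run` are invented for the example, the continue register is kept in the struct instead of in a global `ior_continue`, and only the portable `while (1)` variant of the loop is shown.

```c
#include <setjmp.h>

typedef struct vm vm;
typedef void (*micro_op) (vm *);

struct vm {
  const micro_op *pc;   /* the "continue register": next micro operation */
  long stack[16];       /* data stack */
  int sp;               /* number of values on the data stack */
  jmp_buf exit_buf;
};

static void op_push_40 (vm *m) { m->stack[m->sp++] = 40; }
static void op_push_2 (vm *m) { m->stack[m->sp++] = 2; }

static void
op_add (vm *m)
{
  long y = m->stack[--m->sp];
  long x = m->stack[--m->sp];
  m->stack[m->sp++] = x + y;
}

/* Leave the interpreter loop. */
static void op_end (vm *m) { longjmp (m->exit_buf, 1); }

/* Run CODE until an `end' micro operation jumps out of the loop, then
   return the value left on top of the data stack. */
static long
vm_run (vm *m, const micro_op *code)
{
  m->pc = code;
  m->sp = 0;
  if (!setjmp (m->exit_buf))
    while (1)
      (*m->pc++) (m);
  return m->stack[m->sp - 1];
}
```

A branch is then just an array such as `{ op_push_40, op_push_2, op_add, op_end }`, and adding a new micro operation means adding one more function, without touching the loop.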
|
|
||||||
|
|
||||||
The execution environment consists of
|
|
||||||
|
|
||||||
* a continue register similar to the program counter in the CPU
|
|
||||||
* a data stack (where micro operation arguments and results are stored)
|
|
||||||
* a linked chain of environment frames (but look at exception below!)
|
|
||||||
* a dynamic context
|
|
||||||
|
|
||||||
I've written a small baby Ior which uses Guile's infrastructure.
|
|
||||||
Here's the context from that baby Ior:
|
|
||||||
|
|
||||||
typedef struct ior_context_t {
|
|
||||||
ior_data_t *env; /* rest of environment frames */
|
|
||||||
ior_cont_t save_continue; /* saves or represents continuation */
|
|
||||||
ior_data_t *save_env; /* saves or represents environment */
|
|
||||||
ior_data_t *fluids; /* array of fluids (use GC_malloc!) */
|
|
||||||
int n_fluids;
|
|
||||||
int fluids_size;
|
|
||||||
/* dynwind chain is stored directly in the environment, not in context */
|
|
||||||
jmp_buf exit_buf;
|
|
||||||
IOR_SCM guile_protected; /* temporary */
|
|
||||||
} ior_context_t;
|
|
||||||
|
|
||||||
There's an important exception regarding the lowest environment
|
|
||||||
frame. That frame isn't stored in a separate block on the heap, but
|
|
||||||
on Ior's data stack. Frames are copied out onto the heap when
|
|
||||||
necessary (for example when closures "escape").
|
|
||||||
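A rough sketch of copying a stack-resident frame out to the heap, under several assumptions: `value_t`, `frame_t` and `release_frame` are hypothetical names, and plain `malloc` stands in for the garbage-collected allocator that Ior would actually use. Since a frame is just a contiguous block of values, the copy is a single `memcpy`:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long value_t;          /* stand-in for a Scheme value */

typedef struct frame {
    struct frame *next;        /* rest of the environment chain */
    size_t n_slots;            /* number of values in this frame */
    value_t slots[];           /* C99 flexible array member */
} frame_t;

/* Copy n_slots values starting at stack_top into a fresh heap frame
   that is linked in front of `rest`. Returns NULL if allocation fails.
   malloc is an assumption here; a collected heap would use its own
   allocator instead. */
frame_t *release_frame(const value_t *stack_top, size_t n_slots,
                       frame_t *rest)
{
    frame_t *f;

    assert(stack_top != NULL);
    f = malloc(sizeof *f + n_slots * sizeof f->slots[0]);
    if (f == NULL)
        return NULL;
    f->next = rest;
    f->n_slots = n_slots;
    memcpy(f->slots, stack_top, n_slots * sizeof f->slots[0]);
    return f;
}
```

After the copy, the heap frame no longer depends on the data stack, so the stack slots can be reused while an escaped closure keeps the heap copy alive.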
|
|
||||||
|
|
||||||
Now a concrete example:
|
|
||||||
|
|
||||||
Look at:
|
|
||||||
|
|
||||||
(define sum
|
|
||||||
(lambda (from to res)
|
|
||||||
(if (= from to)
|
|
||||||
res
|
|
||||||
(sum (+ 1 from) to (+ from res)))))
|
|
||||||
|
|
||||||
This can be rewritten into CPS (which captures a lot of what happens
|
|
||||||
during flow analysis):
|
|
||||||
|
|
||||||
(define sum
|
|
||||||
(lambda (from to res c1)
|
|
||||||
(let ((c2 (lambda (limit?)
|
|
||||||
(let ((c3 (lambda ()
|
|
||||||
(c1 res)))
|
|
||||||
(c4 (lambda ()
|
|
||||||
(let ((c5 (lambda (from+1)
|
|
||||||
(let ((c6 (lambda (from+res)
|
|
||||||
(sum from+1 to from+res c1))))
|
|
||||||
(_+ from res c6)))))
|
|
||||||
(_+ 1 from c5)))))
|
|
||||||
(_if limit? c3 c4)))))
|
|
||||||
(_= from to c2))))
|
|
||||||
|
|
||||||
Finally, after branch expansion, some optimization, code generation,
|
|
||||||
and some optimization again, we end up with the byte code for the two
|
|
||||||
branches (here marked by labels `sum' and `sumbig'):
|
|
||||||
|
|
||||||
c5
|
|
||||||
(ref -3)
|
|
||||||
(shift -1)
|
|
||||||
(+ <fixnum> <fixnum> c4big)
|
|
||||||
;; c4
|
|
||||||
(shift -2)
|
|
||||||
(+ <fixnum> 1 sumbig)
|
|
||||||
;; c6
|
|
||||||
sum
|
|
||||||
(shift 3)
|
|
||||||
(ref2 -3)
|
|
||||||
;; c2
|
|
||||||
(if!= <fixnum> <fixnum> c5)
|
|
||||||
;; c3
|
|
||||||
(ref -1)
|
|
||||||
;; c1
|
|
||||||
(end)
|
|
||||||
|
|
||||||
c5big
|
|
||||||
(ref -3)
|
|
||||||
(shift -1)
|
|
||||||
(+ <bignum> <bignum>)
|
|
||||||
c4big
|
|
||||||
(shift -2)
|
|
||||||
(+ <bignum> 1)
|
|
||||||
;; c6
|
|
||||||
sumbig
|
|
||||||
(shift 3)
|
|
||||||
(ref2 -3)
|
|
||||||
;; c2
|
|
||||||
(= <bignum> <bignum>)
|
|
||||||
(if! c5big)
|
|
||||||
;; c3
|
|
||||||
(ref -1)
|
|
||||||
;; c1
|
|
||||||
(end)
|
|
||||||
|
|
||||||
Let's take a closer look at the (+ <fixnum> 1 sumbig) micro
|
|
||||||
operation. The generated assembler from the Ior C source + machine
|
|
||||||
specific optimizations for i386_GCC looks like this (with some rubbish
|
|
||||||
deleted):
|
|
||||||
|
|
||||||
ior_int_int_sum_intbig:
|
|
||||||
movl 4(%ebx),%eax ; fetch arg 2
|
|
||||||
addl (%ebx),%eax ; fetch arg 1 and do the work!
|
|
||||||
jo ior_big_sum_int_int ; dispatch to other branch on overflow
|
|
||||||
movl %eax,(%ebx) ; store result in first environment frame
|
|
||||||
addl $8,%esi ; increment program counter
|
|
||||||
jmp (%esi) ; execute next opcode
|
|
||||||
|
|
||||||
ior_big_sum_int_int:
|
|
||||||
|
|
||||||
To clarify: This is output from the C compiler. I added the comments
|
|
||||||
afterwards.
|
|
||||||
|
|
||||||
The source currently looks like this:
|
|
||||||
|
|
||||||
IOR_MICRO_BRANCH_2_2 ("+", int, big, sum, int, int, 1, 0)
|
|
||||||
{
|
|
||||||
int res = IOR_ARG (int, 0) + IOR_ARG (int, 1);
|
|
||||||
IOR_JUMP_OVERFLOW (res, ior_big_sum_int_int);
|
|
||||||
IOR_NEXT2 (z);
|
|
||||||
}
|
|
||||||
|
|
||||||
where the macros allow for different definitions depending on if we
|
|
||||||
want to play pure ANSI or optimize for a certain machine/compiler.
|
|
||||||
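For the "pure ANSI" case, here is a sketch of what the overflow dispatch could look like without access to the CPU overflow flag. The names `branch_t` and `add_or_switch` are hypothetical, and this is not how the Ior macros are actually defined. Because signed overflow is undefined behaviour in C, the test has to happen before the addition rather than after it:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Which branch execution should continue in (hypothetical names). */
typedef enum { BRANCH_FIXNUM, BRANCH_BIG } branch_t;

/* Add a and b if the sum fits in an int and store it in *res.
   On overflow, leave *res untouched and report that the caller
   should switch to the bignum branch. */
branch_t add_or_switch(int a, int b, int *res)
{
    assert(res != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return BRANCH_BIG;     /* corresponds to the `jo' in the assembler */
    *res = a + b;
    return BRANCH_FIXNUM;
}
```

The i386_GCC variant gets the same effect from a single `jo` after the `addl`; the portable version pays for two comparisons instead.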
|
|
||||||
The plan is actually to write all source in the Ior language and write
|
|
||||||
Ior code to translate the core code into bootstrapping C code.
|
|
||||||
|
|
||||||
Please note that if i386_GCC isn't defined, we run plain portable ANSI C.
|
|
||||||
|
|
||||||
|
|
||||||
Just one further note:
|
|
||||||
|
|
||||||
In Ior, there are three modes of evaluation
|
|
||||||
|
|
||||||
1. evaluating and type analyzing (these go in parallel)
|
|
||||||
2. code generation
|
|
||||||
3. executing byte codes
|
|
||||||
|
|
||||||
It is mode 3 which is really fast in Ior.
|
|
||||||
|
|
||||||
You can look upon your program as a web of branch segments where one
|
|
||||||
branch segment can be generated from fragments of many closures. Mode
|
|
||||||
switches don't occur at the procedure borders, but at "growth
|
|
||||||
points". I don't have time to define them here, but they are based
|
|
||||||
upon the idea that the continuation together with the type signature
|
|
||||||
of the data flow path is unique.
|
|
||||||
|
|
||||||
We normally run in mode 3. When we come to a source growth point
|
|
||||||
(essentially an apply instruction) for uncompiled code we "dive out"
|
|
||||||
of mode 3 into mode 1 which starts to eval/analyze code until we come
|
|
||||||
to a "sink". When we reach the "sink", we have enough information
|
|
||||||
about the data path to do code generation, so we backtrack to the
|
|
||||||
source growth point and grow the branch between source and sink.
|
|
||||||
Finally, we "dive into" mode 3!
|
|
||||||
|
|
||||||
So, code generation doesn't respect procedure borders. We instead get
|
|
||||||
a very neat kind of inlining, which, e.g., means that it is OK to use
|
|
||||||
closures instead of macros in many cases.
|
|
||||||
----------------------------------------------------------------------
|
|
||||||
Ior and module system
|
|
||||||
=====================
|
|
||||||
|
|
||||||
What, exactly, should the module system of Ior look like?
|
|
||||||
|
|
||||||
There is this general issue of whether to have a single-dispatch or
|
|
||||||
multi-dispatch system. Personally, I see that Scheme already uses
|
|
||||||
multi-dispatch. Compare (+ 1.0 2) and (+ 1 2.0).
|
|
||||||
|
|
||||||
As you've seen if you've read the notes about Ior design, efficiency
|
|
||||||
is not an issue here, since almost all dispatch will be eliminated
|
|
||||||
anyway.
|
|
||||||
|
|
||||||
Also, note an interesting thing: GOOPS actually has a special,
|
|
||||||
implicit, argument to all of its methods: the lexical environment.
|
|
||||||
It would be very ugly to add a second, special, argument to this.
|
|
||||||
|
|
||||||
Of course, the theoreticians have already recognised this, and in many
|
|
||||||
systems, the implicit argument (the object) and the environment for
|
|
||||||
the method is the same thing.
|
|
||||||
|
|
||||||
I think we should especially take inspiration from Matthias Blume's
|
|
||||||
module/object system.
|
|
||||||
|
|
||||||
The idea, now, for Ior (remember that everything about Ior is
|
|
||||||
negotiable between us) is that a module is a type, as well as an
|
|
||||||
instance of that type. The idea is that we basically keep the GOOPS
|
|
||||||
style of methods, with the implicit argument being the module object
|
|
||||||
(or some other lexical environment, in a chain with the module as
|
|
||||||
root).
|
|
||||||
|
|
||||||
Let's say now that module C uses modules A and B. Modules A and B
|
|
||||||
both export the procedure `foo'. But A:foo and B:foo have different
|
|
||||||
sets of methods.
|
|
||||||
|
|
||||||
What does this mean? Well, it obviously means that the procedure
|
|
||||||
`foo' in module C is a subtype of A:foo and B:foo. Note how this is
|
|
||||||
similar in structure to slot inheritance: When class C is created with
|
|
||||||
superclasses A and B, the properties of a slot in C are created
|
|
||||||
through slot inheritance. One way of interpreting variable foo in
|
|
||||||
module A is as a slot with init value foo. Through the MOP, we can
|
|
||||||
specify that procedure slot inheritance in a module class implies
|
|
||||||
creation of new init values through inheritance.
|
|
||||||
|
|
||||||
This may look like a kludge, and perhaps it is, and, sure, we are not
|
|
||||||
going to accept any kludges in Ior. But, it might actually not be a
|
|
||||||
kludge...
|
|
||||||
|
|
||||||
I think it is commonly accepted by computer scientists that a module,
|
|
||||||
and/or at least a module interface is a type. Again, this type can be
|
|
||||||
seen as the set of types of the functions in the interface. The types
|
|
||||||
of our procedures are the set of branch types they provide. It is then
|
|
||||||
natural that a module using two other modules creates new procedure
|
|
||||||
types by folding.
|
|
||||||
|
|
||||||
This thing would become less cloudy (yes, this is a cloudy part of my
|
|
||||||
reasoning; I meant previously that the interpreter itself is now
|
|
||||||
clear) if module interfaces were required to be explicitly typed.
|
|
||||||
|
|
||||||
Actually, this would fit much better together with the rest of Ior's
|
|
||||||
design. On one hand, we might be free to introduce such a restriction
|
|
||||||
(compiler writers would applaud it), since R5RS hasn't specified any
|
|
||||||
module system. On the other hand, it might be strange to require
|
|
||||||
explicit typing when Scheme is fundamentally implicitly typed...
|
|
||||||
|
|
||||||
We also have to consider that a module has an "inward" face, which is
|
|
||||||
one type, and possibly many "outward" faces, which are different
|
|
||||||
types. (Compare the idea of "interfaces" in Scheme48.)
|
|
||||||
|
|
||||||
It thus seems that, while a module can truly be an Ior class, the
|
|
||||||
reverse should probably not hold in the general case...
|
|
||||||
|
|
||||||
Unless
|
|
||||||
|
|
||||||
instance <-> module proper
|
|
||||||
class of the instance <-> "inward interface"
|
|
||||||
superclasses <-> "outward interfaces + inward uses"
|
|
||||||
|
|
||||||
...hmm, is this possible to reconcile with Rees' object system?
|
|
||||||
|
|
||||||
Please think about these issues. We should try to end up with a
|
|
||||||
beautiful and consistent object/module system.
|
|
||||||
|
|
||||||
----------------------------------------------------------------------
|
|
||||||
|
|
||||||
Here's a difficult problem in Ior's design:
|
|
||||||
|
|
||||||
Let's say that we have a mutable data structure, like an ordinary
|
|
||||||
list. Since, in Ior, the type tag (which is really a pointer to a
|
|
||||||
class structure) is stored separately from the data, it is thinkable
|
|
||||||
that another thread modifies the location in the list between when our
|
|
||||||
thread reads the type tag and when it reads the data.
|
|
||||||
|
|
||||||
The reading of type and data must be made atomic in some way.
|
|
||||||
Probably, some kind of locking of the heap is required. It's just
|
|
||||||
that it may cause a lot of overhead to lock the heap at every *read*
|
|
||||||
from a mutable data structure.
|
|
||||||
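To make the problem concrete, here is a small C11 sketch in which the type pointer and the data of one location are always read and written together under a lock. Everything in it is an assumption for illustration: the names (`cell_t`, `cell_read`, `cell_set`), the use of a per-cell spinlock instead of a heap-wide lock, and the choice of `<stdatomic.h>`. It shows the overhead in question: even a plain read has to take the lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* One mutable location whose type tag is stored next to its data. */
typedef struct {
    atomic_flag lock;      /* per-cell spinlock (an assumption) */
    const void *type;      /* pointer to a class structure */
    long data;
} cell_t;

/* A consistent (type, data) pair as seen by one reader. */
typedef struct {
    const void *type;
    long data;
} snapshot_t;

static void cell_lock(cell_t *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ;                  /* spin until the flag was previously clear */
}

static void cell_unlock(cell_t *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

void cell_init(cell_t *c, const void *type, long data)
{
    assert(c != NULL);
    atomic_flag_clear(&c->lock);
    c->type = type;
    c->data = data;
}

/* Writer: type and data change together, never one without the other. */
void cell_set(cell_t *c, const void *type, long data)
{
    cell_lock(c);
    c->type = type;
    c->data = data;
    cell_unlock(c);
}

/* Reader: no writer can slip in between reading type and reading data. */
snapshot_t cell_read(cell_t *c)
{
    snapshot_t s;
    cell_lock(c);
    s.type = c->type;
    s.data = c->data;
    cell_unlock(c);
    return s;
}
```

Without the lock in `cell_read`, a concurrent `cell_set` could leave the reader holding the old type pointer together with the new data, which is exactly the hazard described above.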
|
|
||||||
Look how much trouble those set!-operations cause! Not only does it
|
|
||||||
force us to store type tags for each car and cdr in the list, but it
|
|
||||||
also forces a lot of explicit dispatch to be done, and causes trouble
|
|
||||||
in a threaded system...
|
|
||||||
|
|
||||||
----------------------------------------------------------------------
|
|
||||||
|
|
||||||
Jim Blandy <jimb@red-bean.com> writes:
|
|
||||||
|
|
||||||
> We also should try to make less work for the GC, by avoiding consing
|
|
||||||
> up local environments until they're closed over.
|
|
||||||
|
|
||||||
Did the texts which I sent to you talk about Ior's solution?
|
|
||||||
|
|
||||||
It basically is: Use *two* environment "arguments" to the evaluator
|
|
||||||
(in Ior, they aren't arguments but registers):
|
|
||||||
|
|
||||||
* One argument is a pointer to the "top" of an environment stack.
|
|
||||||
This is used in the "inner loop" for very efficient access to
|
|
||||||
in-between results. The "top" segment of the environment stack is
|
|
||||||
also regarded as the first environment frame in the lexical
|
|
||||||
environment. ("top" is bottom on a stack which grows downwards)
|
|
||||||
|
|
||||||
* The other argument points to a structure holding the evaluation
|
|
||||||
context. In this context, there is a pointer to the chain of the
|
|
||||||
rest of the environment frames. Note that since frames are just
|
|
||||||
blocks of SCM values, you can very efficiently "release" a frame
|
|
||||||
into the heap by block copying it (remember that Ior uses Boehm's GC;
|
|
||||||
this is how we allocate the block).
|
|