mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-05-04 22:40:25 +02:00
bye bye

parent dffe307d60
commit fbea34b7cc

13 changed files with 0 additions and 1759 deletions
@ -1,58 +0,0 @@
2001-06-27 Thien-Thi Nguyen <ttn@revel.glug.org>

	* README: Remove tasks.text.

	* tasks.text: Bye bye (contents folded into ../TODO).

2001-05-08 Martin Grabmueller <mgrabmue@cs.tu-berlin.de>

	* modules/module-snippets.texi: Fixed a lot of typos and clarified
	some points. Thanks to Neil for the typo+questions patch!

2001-05-07 Martin Grabmueller <mgrabmue@cs.tu-berlin.de>

	* modules/module-snippets.texi: New file, documenting the module
	system. Placed in `devel' for review purposes.

2001-03-16 Martin Grabmueller <mgrabmue@cs.tu-berlin.de>

	* modules: New directory.

	* modules/module-layout.text: New file.

2000-08-26 Mikael Djurfeldt <mdj@linnaeus.mit.edu>

	* strings: New directory.

	* strings/sharedstr.text (sharedstr.text): New file.

2000-08-12 Mikael Djurfeldt <mdj@linnaeus.mit.edu>

	* translate: New directory.

	* translate/langtools.text: New file.

2000-05-30 Mikael Djurfeldt <mdj@mdj.nada.kth.se>

	* tasks.text: Use outline-mode. Added section for tasks in need
	of attention.

2000-05-29 Mikael Djurfeldt <mdj@mdj.nada.kth.se>

	* tasks.text: New file.

2000-05-25 Mikael Djurfeldt <mdj@mdj.nada.kth.se>

	* README: New file.

	* build/snarf-macros.text: New file.

2000-05-20 Mikael Djurfeldt <mdj@mdj.nada.kth.se>

	* policy/goals.text, policy/principles.text, policy/plans.text:
	New files.

2000-03-21 Mikael Djurfeldt <mdj@thalamus.nada.kth.se>

	* policy/names.text: New file.

devel/README
@ -1,13 +0,0 @@
Directories:

policy        Guile policy documents

build         Build/installation process

string        Strings and characters

translation   Language translation

vm            Virtual machines

vm/ior        Mikael's ideas on a new type of Scheme interpreter
@ -1,288 +0,0 @@
Module Layout Proposal
======================

Martin Grabmueller
<mgrabmue@cs.tu-berlin.de>
Draft: 2001-03-11

Version: $Id: module-layout.text,v 1.1 2001-03-16 08:37:37 mgrabmue Exp $

* Table of contents

** Abstract
** Overview
*** What do we have now?
*** What should we change?
** Policy of module separation
*** Functionality
*** Standards
*** Importance
*** Compatibility
** Module naming
*** Scheme
*** Object oriented programming
*** Systems programming
*** Database programming
*** Text processing
*** Math programming
*** Network programming
*** Graphics
*** GTK+ programming
*** X programming
*** Games
*** Multiple names
*** Application modules
** Future ideas


* Abstract

This is a proposal for a new layout of the module name space. The
goal is to reduce (or even eliminate) the clutter in the current ice-9
module directory, and to provide a clean framework for splitting
libguile into subsystems, grouped by functionality, standards
compliance and maybe other characteristics.

This is not a completed policy document, but rather a collection of
ideas and proposals which still have to be decided. I will mention my
personal preference where appropriate, but the final decisions are of
course up to the maintainers.


* Overview

Currently, new modules are added in an ad-hoc manner to the ice-9
module name space when the need for them arises. I think that was
mainly because no other directory for installed Scheme modules was
created. With the integration of GOOPS, the new top-level module
directory oop was introduced, and we should follow this practice for
other subsystems which share functionality.

DISCLAIMER: Please note that I am no expert on Guile's module system,
so be patient with me and correct me where I got anything wrong.

** What do we have now?

The module (oop goops) contains all functionality needed for
object-oriented programming with Guile (with a few exceptions in the
evaluator, which are clearly needed for performance).

Except for the procedures in the module (ice-9 rdelim), all Guile
primitives are currently located in the root module (I think it is the
module (guile)), and some procedures defined in `boot-9.scm' are
installed in the module (guile-user).

** What should we change?

In the core, there are a lot of primitive procedures which can cleanly
be grouped into subsystems, and then grouped into modules. That would
make the core library more maintainable, would ease separate testing
of subsystems and would clean up dependencies between subsystems.


* Policy of module separation

There are several possibilities to group procedures into modules.

- They could be grouped by functionality.
- They could be grouped by standards compliance.
- They could be grouped by level of importance.

One important group of modules should of course be provided
additionally:

- Compatibility modules.

So the first thing to decide is: Which of these policies should we
adopt? Personally, I think that it is not possible to cleanly use
exactly one of the possibilities; we will probably use a mixture of
them. I propose to group by functionality, and maybe use some
`bridge-modules', which make functionality available when the user
requests the modules for a given standard.

** Functionality

Candidates for the first possibility are groups of procedures which
are already grouped in source files, such as

- Regular expression procedures.
- Network procedures.
- Systems programming procedures.
- Random number procedures.
- Math/numeric procedures.
- String-processing procedures.
- List-processing procedures.
- Character handling procedures.
- Object-oriented programming support.

** Standards

Guile now complies with R5RS, and I think that the procedures required
by this standard should always be available to the programmer.
People who do not want them can always create :pure modules when
they need to.

On the other hand, the SRFI procedures fit nicely into a `group by
standards' scheme. An example which is already provided is the
SRFI-8 syntax `receive'. Following that, we could provide two modules
for each SRFI, one named after the SRFI (like `srfi-8') and one named
after the main functionality (`receive').

** Importance

By importance, I mean `how important are procedures for the average
Guile user'. That means that procedures which are only useful to a
small group of users (the Guile developers, for example) should not be
immediately available at the REPL, so that they do not confuse the user
when they appear in the `apropos' output or the tab completion.

A good example would be debugging procedures (which also could be
added with a special command-line option), or low-level system calls.

** Compatibility

This group is for modules providing compatibility procedures. An
example would be a module for old string-processing procedures, which
could someday get overridden by incompatible SRFI procedures of the
same name.


* Module naming

Provided we choose to take the `group by functionality' approach, I
propose the following naming hierarchy (some of them were actually
suggested by Mikael Djurfeldt).

- Scheme language related in (scheme)
- Object oriented programming in (oop)
- Systems programming in (system)
- Database programming in (database)
- Text processing in (text)
- Math/numeric programming in (math)
- Network programming in (network)
- Graphics programming in (graphics)
- GTK+ programming in (gtk)
- X programming in (xlib)
- Games in (games)

The layout of sub-hierarchies is up to the writers of modules; we
should not enforce a strict policy here, because we cannot imagine
what will happen in this area.

** Scheme

(scheme r5rs)           Complete R5RS procedures set.
(scheme safe)           Safe modules.
(scheme srfi-1)         List processing.
(scheme srfi-8)         Multiple values via `receive'.
(scheme receive)        Ditto.
(scheme and-let-star)   and-let*
(scheme syncase)        syntax-case hygienic macros (maybe included in
                        (scheme r5rs)?).
(scheme slib)           SLIB, for historic reasons in (scheme).

** Object oriented programming

Examples in this section are

(oop goops)             For GOOPS.
(oop goops ...)         For lower-level GOOPS functionality and utilities.

** Systems programming

(system shell)          Shell utilities (glob, system etc).
(system process)        Process handling.
(system file-system)    Low-level filesystem support.
(system user)           getuid, setpgrp, etc.

_or_

(system posix)          All posix procedures.

** Database programming

In the database section, there should be a sub-module hierarchy for
each supported database, containing the low-level code, and a common
database layer, which should unify access to SQL databases via a
single interface a la Perl's DBI.

(database postgres ...) Low-level database functionality.
(database oracle ...)   ...
(database mysql ...)    ...
(database msql ...)     ...
(database sql)          Common SQL accessors.
(database gdbm ...)     ...
(database hashed)       Common hashed database accessors (like gdbm).
(database util)         Leftovers.

** Text processing

(text rdelim)           Line oriented in-/output.
(text util)             Mangling text files.

** Math programming

(math random)           Random numbers.
(math primes)           Prime numbers.
(math vector)           Vector math.
(math algebra)          Algebra.
(math analysis)         Analysis.
(math util)             Leftovers.

** Network programming

(network inet)          Internet procedures.
(network socket)        Socket interface.
(network db)            Network database accessors.
(network util)          ntohl, htons and friends.

** Graphics

(graphics vector)       Generalized vector graphics handling.
(graphics vector vrml)  VRML parsers etc.
(graphics bitmap)       Generalized bitmap handling.
(graphics bitmap ...)   Bitmap format handling (TIFF, PNG, etc.).

** GTK+ programming

(gtk gtk)               GTK+ procedures.
(gtk gdk)               GDK procedures.
(gtk threads)           gtkthreads.

** X programming

(xlib xlib)             Low-level Xlib programming.

** Games

(games robots)          GNU robots.

** Multiple names

As already mentioned above, I think that some modules should have
several names, to make it easier for the user to get the functionality
she needs. For example, a user could say: `hey, I need the receive
macro', or she could say: `I want to stick to SRFI syntax, so where
the hell is the module for SRFI-8?!?'.
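
As an illustration of the two-name idea, here is a minimal sketch of
an alias module in Guile's own module syntax. It assumes
`define-module' with the `#:use-module' and `#:re-export' options; the
name (scheme receive) is the hypothetical name proposed in this
document, while (srfi srfi-8) is the module Guile actually provides for
SRFI-8. The helper `div-mod' is hypothetical and only exercises the
re-exported binding.

```scheme
;; Hypothetical alias module: (scheme receive) is the name proposed in
;; this document, not an existing Guile module.  It imports the real
;; SRFI-8 module and re-exports `receive' under the new name.
(define-module (scheme receive)
  #:use-module (srfi srfi-8)
  #:re-export (receive))

;; Hypothetical helper: inside this module `receive' works as usual,
;; binding the two values returned by the producer expression.
(define (div-mod n d)
  (receive (q r) (values (quotient n d) (remainder n d))
    (list q r)))
```

With this in place, `(div-mod 7 2)' returns (3 1). The alias module
adds no code of its own, so both names always refer to the same
`receive'.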

** Application modules

We should not enforce policy on applications. So I propose that
application writers should be advised to place modules either in
application-specific directories $PREFIX/share/$APP/guile/... and name
them however they like, or to use the application's name as the first
part of the module name, e.g. (gnucash import), (scwm background),
(rcalc ui).

* Future ideas

I have not yet come up with a good idea for grouping modules which
deal, for example, with XML processing. They would fit into the (text)
module space, because most XML files contain text data, but they would
also fit into (database), because XML files are essentially databases.

On the other hand, XML processing is such a large field that it
probably is worth its own top-level name space (xml).


Local Variables:
mode: outline
End:
@ -1,143 +0,0 @@
Implementation of shared substrings with fresh-copy semantics
=============================================================

Version: $Id: sharedstr.text,v 1.1 2000-08-26 20:55:21 mdj Exp $

Background
----------

In Guile, most string operations work on two other data types apart
from strings: shared substrings and read-only strings (which include
symbols). One of Guile's sub-goals is to be a scripting language in
which string management is important. Read-only strings and shared
substrings were introduced in order to reduce overhead in string
manipulation.

We now want to simplify the Guile API by removing these two data
types, but keeping performance by allowing ordinary strings to share
storage.

The idea is to let operations like `symbol->string' and `substring'
return a pointer into the original string/symbol, thus avoiding the
need to copy the string.

Two of the problems which then arise are:

* If s2 is derived from s1, and therefore shares storage with s1, a
  modification to either s1 or s2 will affect the other.

* Guile is supposed to interact closely with many UNIX libraries in
  which the NUL character is used to terminate strings. Therefore
  Guile strings contain a NUL character at the end, in addition to the
  string length (the latter of which is used by Guile's string
  operations). A substring pointing into the middle of another string
  has no NUL character at its own end.

The solutions to these problems are to

* Copy a string with shared storage when it's modified.

* Copy a string with shared storage when it's being used as an
  argument to a C library call. (Copying implies inserting an ending
  NUL character.)

But this leads to memory management problems. When is it OK to free
a character array which was allocated for a symbol or a string?

Abstract description of proposed solution
-----------------------------------------

Definitions

  STRING = <TYPETAG, LENGTH, CHARRECORDPTR, CHARPTR>

  SYMBOL = <TYPETAG, LENGTH, CHARRECORDPTR, CHARPTR>

  CHARRECORD = <PHASE, SHAREDFLAG, CHARS>

  PHASE = black | white

  SHAREDFLAG = private | shared

  CHARS is a character array

  CHARPTR points into it

Memory management

  A string or symbol is initially allocated with its contents stored in
  a character array in a character record. The string/symbol header
  contains a pointer to this record. The initial value of the shared
  flag in the character record is `private'.

  The GC mark phases alternate between black and white: every second
  phase is black, the rest are white. This is used to distinguish
  whether a character record has been encountered before:

  During a black mark phase, when the GC encounters a string or symbol,
  it changes the PHASE and SHAREDFLAG marks of the corresponding
  character record according to the following table:

    <white, private> --> <black, private>   (white => unconditionally
    <white, shared>  --> <black, private>    set to <black, private>)
    <black, private> --> <black, shared>    (SHAREDFLAG changed)
    <black, shared>  --> <black, shared>    (no change)

  The behaviour of a white phase is equivalent, with the color names
  switched.

  The GC sweep phase frees any unmarked string or symbol header and
  frees its character record either if it is marked with the "wrong"
  color (not matching the color of the last mark phase) or if its
  SHAREDFLAG is `private'.
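
The marking table above can be modelled in a few lines of Scheme. This
is a toy sketch, not libguile code: a character record's mark is
assumed to be a pair (PHASE . SHAREDFLAG), live headers are given as
the indices of the records they point to, and the names `mark-record'
and `mark-phase!' are hypothetical.

```scheme
;; Toy model of the two-colour marking scheme described above.
;; A mark is a pair (PHASE . SHAREDFLAG): PHASE is 'black or 'white,
;; SHAREDFLAG is 'private or 'shared.

;; New mark for a character record that the GC reaches through a
;; string or symbol header during a mark phase of colour CURRENT.
(define (mark-record mark current)
  (if (eq? (car mark) current)
      ;; Already reached in this phase: a second reference => shared.
      (cons current 'shared)
      ;; Still the other colour: first encounter in this phase.
      (cons current 'private)))

;; RECORDS is a vector of marks; LIVE-HEADERS is a list of live string
;; or symbol headers, each given as the index of its character record.
(define (mark-phase! records live-headers current)
  (for-each (lambda (i)
              (vector-set! records i
                           (mark-record (vector-ref records i) current)))
            live-headers)
  records)
```

For example, marking the records ((white . private) (white . shared)
(white . private)) during a black phase with live headers (0 1 0)
leaves record 0 as (black . shared), record 1 as (black . private) and
record 2 untouched as (white . private). Record 2 still carries the
old colour, which is how the sweep phase can tell that no live header
refers to it.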

Copy-on-write

  An attempt at mutating string contents leads to copying if SHAREDFLAG
  is `shared'. Copying means making a copy of the character record and
  mutating the CHARRECORDPTR and CHARPTR fields of the object header to
  point to the copy.

Substring operation

  When making a substring, a new string header is allocated, with new
  contents for the LENGTH and CHARPTR fields, while CHARRECORDPTR still
  points to the original character record.
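
The copy-on-write and substring rules can be sketched with two record
types. This is a hypothetical model, not libguile code: CHARPTR is
represented as an offset into the record's characters, all procedure
names are illustrative, and the SHAREDFLAG is assumed to have been set
already by a mark phase (the model does not set it itself).

```scheme
(use-modules (srfi srfi-9))

;; Character record: the characters plus SHAREDFLAG ('private/'shared).
(define-record-type <char-record>
  (make-char-record chars shared-flag)
  char-record?
  (chars char-record-chars)
  (shared-flag char-record-shared-flag))

;; String header: LENGTH, CHARRECORDPTR (record) and CHARPTR (offset).
(define-record-type <str-header>
  (make-str-header length record offset)
  str-header?
  (length str-header-length)
  (record str-header-record set-str-header-record!)
  (offset str-header-offset set-str-header-offset!))

;; A fresh Scheme string with the characters this header denotes.
(define (header->string h)
  (let ((start (str-header-offset h)))
    (string-copy (char-record-chars (str-header-record h))
                 start (+ start (str-header-length h)))))

;; Substring: a new header with new LENGTH and CHARPTR that keeps
;; pointing at the same character record (nothing is copied).
(define (shared-substring h start end)
  (make-str-header (- end start)
                   (str-header-record h)
                   (+ (str-header-offset h) start)))

;; Mutation: if the record is flagged shared, first give this header a
;; private copy and redirect CHARRECORDPTR/CHARPTR to it.
(define (header-set! h k ch)
  (when (eq? (char-record-shared-flag (str-header-record h)) 'shared)
    (set-str-header-record! h (make-char-record (header->string h)
                                                'private))
    (set-str-header-offset! h 0))
  (string-set! (char-record-chars (str-header-record h))
               (+ (str-header-offset h) k)
               ch))
```

After `header-set!' copies, the original record is still flagged
`shared' even though only one header refers to it; in the scheme above
this is corrected at the next mark phase.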

Implementation details
----------------------

* We store the character record consecutively with the character
  array and lump the PHASE and SHAREDFLAG fields together into one
  byte containing an integer code for the four possible states of the
  PHASE and SHAREDFLAG fields. Another way of viewing it is that
  these fields are represented as bits 1 and 0 in the "header" of the
  character array. We let CHARRECORDPTR point to the first character
  position instead of at this header:

      CHARRECORDPTR
       |
       V
      FCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

      F = 0, 1, 2, 3

* We represent strings as the sub-types `simple-string' and
  `substring'.

* In a simple string, CHARRECORDPTR and CHARPTR are represented by a
  single pointer, so a `simple-string' is an ordinary heap cell with
  TYPETAG and LENGTH in the CAR and CHARPTR in the CDR.

* Substrings are represented as double cells, with TYPETAG and LENGTH
  in word 0, CHARRECORDPTR in word 1 and CHARPTR in word 2
  (alternatively, we could store an offset from CHARRECORDPTR).
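
One possible reading of the F byte, as a sketch: the document only
says that PHASE and SHAREDFLAG occupy bits 1 and 0, so the assignment
below (bit 1 for PHASE with black = 1, bit 0 for SHAREDFLAG with
shared = 1) is an assumption, and the procedure names are
hypothetical.

```scheme
;; Assumed encoding of the F byte:
;;   bit 1 = PHASE      (1 = black,  0 = white)
;;   bit 0 = SHAREDFLAG (1 = shared, 0 = private)

(define (encode-flag phase shared-flag)
  (logior (if (eq? phase 'black) 2 0)
          (if (eq? shared-flag 'shared) 1 0)))

;; `logbit?' takes the bit index first, then the integer.
(define (flag-phase f)
  (if (logbit? 1 f) 'black 'white))

(define (flag-shared f)
  (if (logbit? 0 f) 'shared 'private))
```

Under this assumption the four values 0, 1, 2 and 3 stand for
<white, private>, <white, shared>, <black, private> and
<black, shared> respectively.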

Problems with this implementation
---------------------------------

* How do we make copy-on-write thread-safe? Is there a different
  implementation which is efficient and thread-safe?

* If small substrings are frequently generated from large, temporary
  strings and the small substrings are kept in a data structure, the
  heap will still have to host the large original strings. Should we
  simply accept this?
@ -1,592 +0,0 @@
* Introduction

Version: $Id: langtools.text,v 1.5 2000-08-13 04:47:26 mdj Exp $

This is a proposal for how Guile could interface with language
translators. It will be posted on the Guile list and revised for some
short time (days rather than weeks) before being implemented.

The document can be found in the CVS repository as
guile-core/devel/translation/langtools.text. All Guile developers are
welcome to modify and extend it according to the ongoing discussion
using CVS.

Ideas and comments are welcome.

For clarity, the proposal is partially written as if describing an
already existing system.

MDJ 000812 <djurfeldt@nada.kth.se>

* Language names

A translator for Guile is a certain kind of Guile module, implemented
in Scheme, C, or a mixture of both.

To make things simple, the name of the language is closely related to
the name of the translator module.

Languages have long and short names. The long form is simply the name
of the translator module: `(lang ctax)', `(lang emacs-lisp)',
`(my-modules foo-lang)' etc.

Languages with the long name `(lang IDENTIFIER)' can be referred to
with the short name IDENTIFIER, for example `emacs-lisp'.
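
The correspondence between long and short names is mechanical. A
sketch, with hypothetical helper names, treating a long name as a
module name (a list of symbols) and a short name as a symbol:

```scheme
;; Hypothetical helpers relating long and short language names.

;; IDENTIFIER -> (lang IDENTIFIER)
(define (short-name->long-name short)
  (list 'lang short))

;; (lang IDENTIFIER) -> IDENTIFIER; other long names have no short
;; form, so return #f for them.
(define (long-name->short-name long)
  (if (and (pair? long)
           (eq? (car long) 'lang)
           (pair? (cdr long))
           (null? (cddr long)))
      (cadr long)
      #f))

;; Accept either form and return the long name (the module name).
(define (normalize-language-name name)
  (if (symbol? name)
      (short-name->long-name name)
      name))
```

So `emacs-lisp' normalizes to (lang emacs-lisp), while a name such as
(my-modules foo-lang) can only be given in its long form.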

* How to tell Guile to read code in a different language (than Scheme)

There are four methods of specifying which translator to use when
reading a file:

** Command option

The options to the guile command are parsed linearly from left to
right. You can change the language at zero or more points using the
option

  -t, --language LANGUAGE

Example:

  guile -t emacs-lisp -l foo -l bar -t scheme -l baz

will use the emacs-lisp translator while reading "foo" and "bar", and
the default translator (scheme) for "baz".
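
The left-to-right rule can be sketched as a small loop over the
argument list. This is a hypothetical helper, not Guile's actual
option parser: it only recognises -t, --language and -l, starts with
`scheme' as the default, and records the language in effect when each
file is loaded.

```scheme
;; Return an alist (FILE . LANGUAGE) in the order the files would be
;; loaded.  Unrecognised arguments are skipped.
(define (assign-languages args)
  (let loop ((args args) (lang 'scheme) (acc '()))
    (cond ((null? args) (reverse acc))
          ;; -t LANGUAGE / --language LANGUAGE: switch language.
          ((and (member (car args) '("-t" "--language"))
                (pair? (cdr args)))
           (loop (cddr args) (string->symbol (cadr args)) acc))
          ;; -l FILE: load FILE with the current language.
          ((and (string=? (car args) "-l") (pair? (cdr args)))
           (loop (cddr args) lang (cons (cons (cadr args) lang) acc)))
          (else (loop (cdr args) lang acc)))))
```

Applied to the example arguments above, it returns
(("foo" . emacs-lisp) ("bar" . emacs-lisp) ("baz" . scheme)).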

You can use this technique in a script together with the meta switch:

#!/usr/local/bin/guile \
-t emacs-lisp -s
!#

** Commentary in file

When opening a file for reading, Guile will read the first few lines,
looking for the string "-*- LANGNAME -*-", where LANGNAME can be
either the long or short form of the name.

If found, the corresponding translator is loaded and used to read the
file.
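
A sketch of the scan for "-*- LANGNAME -*-", using Guile's (ice-9
regex) and (ice-9 rdelim) modules. The procedure name and the line
limit are assumptions; the captured text is passed to `read', so a
short name comes back as a symbol and a long name such as (lang ctax)
as a list.

```scheme
(use-modules (ice-9 regex) (ice-9 rdelim))

;; "-*-", optional blanks, LANGNAME (ending in a non-blank), optional
;; blanks, "-*-".  LANGNAME may contain spaces, as in "(lang ctax)".
(define mode-line-rx
  (make-regexp "-\\*-[ \t]*(.*[^ \t])[ \t]*-\\*-"))

;; Look at no more than MAX-LINES lines of PORT.  Return the language
;; name as a Scheme datum, or #f if no mode line was found.
(define (find-language-comment port max-lines)
  (let loop ((n 0))
    (if (>= n max-lines)
        #f
        (let ((line (read-line port)))
          (if (eof-object? line)
              #f
              (let ((m (regexp-exec mode-line-rx line)))
                (if m
                    (call-with-input-string (match:substring m 1) read)
                    (loop (+ n 1)))))))))
```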

** File extension

Guile maintains an alist mapping filename extensions to languages.
Each entry has the form:

  (REGEXP . LANGNAME)

where REGEXP is a string and LANGNAME a symbol or a list of symbols.

The alist can be accessed using `language-alist' which is exported
by the module `(core config)':

  (language-alist)                 --> current alist
  (language-alist ALIST)           sets the alist to ALIST
  (language-alist ALIST :prepend)  prepends ALIST onto the current list
  (language-alist ALIST :append)   appends ALIST after current list

The `load' command will match filenames against this alist and choose
the translator to use accordingly.

There will be a default alist for common translators. For translators
not listed, the alist has to be extended in .guile just as Emacs users
extend auto-mode-alist in .emacs.
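
A minimal sketch of `language-alist' and of the lookup `load' would
perform. Everything here is hypothetical: the variable
`%language-alist', the sample entries and `filename->language' are
illustrative names, and because Guile's default reader treats
`:prepend' as an ordinary symbol, the sketch takes the keywords
#:prepend and #:append instead.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical default entries of the form (REGEXP . LANGNAME).
(define %language-alist
  '(("\\.el$" . emacs-lisp)
    ("\\.scm$" . scheme)))

;; No argument: return the alist.  ALIST alone: replace it.
;; ALIST #:prepend / ALIST #:append: extend it at either end.
(define* (language-alist #:optional alist where)
  (when alist
    (set! %language-alist
          (cond ((eq? where #:prepend) (append alist %language-alist))
                ((eq? where #:append) (append %language-alist alist))
                (else alist))))
  %language-alist)

;; What `load' would do: LANGNAME of the first entry whose REGEXP
;; matches FILENAME, or #f if none does.
(define (filename->language filename)
  (let loop ((entries %language-alist))
    (cond ((null? entries) #f)
          ((string-match (caar entries) filename) (cdar entries))
          (else (loop (cdr entries))))))
```

A .guile file could then add a translator with
(language-alist '(("\\.ctax$" . ctax)) #:prepend).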

** Module header

You specify the language used by a module with the :language option in
the module header. (See below under "Module configuration language".)

* Module system

This section describes how the Guile module system is adapted for use
with other languages.

** Module configuration language

*** The `(config)' module

Guile has a sophisticated module system. We don't require each
translator implementation to implement its own syntax for modules.
That would be too much work for the implementor, and users would have
to learn the module system anew for each syntax.

Instead, the module `(config)' exports the module header form
`(define-module ...)'.

The config module also exports a number of primitives by which you can
customize the Guile library, such as `language-alist' and `load-path'.

*** Default module environment

The bindings of the config module are available in the default
interaction environment when Guile starts up. This is because the
config module is on the module use list for the startup environment.

However, config bindings are *not* available by default in new
modules.

The default module environment provides bindings from the R5RS module
only.

*** Module headers

The module header of the current module system is the form

  (define-module NAME OPTION1 ...)

You can specify a translator using the option

  :language LANGNAME

where LANGNAME is the long or short form of the language name as
described above.

The translator is fed characters from the module file, starting
immediately after the closing parenthesis of the module header form.
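
Since the header is an ordinary S-expression, the standard `read' can
consume it, leaving the port just after its closing parenthesis for
the translator. A sketch with hypothetical helper names; it accepts
both `:language' (read as a plain symbol by Guile's default reader)
and the keyword `#:language'.

```scheme
;; Value following option NAME (a string such as "language") in a
;; (define-module MODULE-NAME OPTION ...) form, or #f if absent.
(define (option-value header name)
  (let ((sym (string->symbol (string-append ":" name)))
        (kw (symbol->keyword (string->symbol name))))
    (let loop ((opts (cddr header)))     ; skip define-module and NAME
      (cond ((or (null? opts) (null? (cdr opts))) #f)
            ((or (eq? (car opts) sym) (eq? (car opts) kw))
             (cadr opts))
            (else (loop (cdr opts)))))))

;; Read the header from PORT; return its module name, language and
;; file as three values.  PORT is left right after the header.
(define (read-module-header port)
  (let ((header (read port)))
    (if (and (pair? header) (eq? (car header) 'define-module))
        (values (cadr header)
                (option-value header "language")
                (option-value header "file"))
        (error "no module header found" header))))
```

The translator would then be handed PORT as it is, positioned at the
first character after the header.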

NOTE: There can be only one module header per file.

It is also possible to put the module header in a separate file and
use the option

  :file FILENAME

to point out a file containing the actual code.

Example:

foo.gm:
----------------------------------------------------------------------
(define-module (foo)
  :language emacs-lisp
  :file "foo.el"
  :export (foo bar)
  )
----------------------------------------------------------------------

foo.el:
----------------------------------------------------------------------
(defun foo ()
  ...)

(defun bar ()
  ...)
----------------------------------------------------------------------

** Repl commands

Up till now, Guile has been dependent upon the available bindings in
the selected module in order to do basic operations such as moving to
a different module, entering the debugger or getting documentation.

This is not acceptable since we want to be able to control Guile
consistently regardless of which module we are in, and since we don't
want to equip a module with bindings which don't have anything to do
with the purpose of the module.

Therefore, the repl provides a special command language on top of
whatever syntax the current module provides. (Scheme48 and RScheme
provide similar repl command languages.)

[Jost Boekemeier has suggested the following alternative solution:
Commands are bindings just like any other binding. It is enough if
some modules carry command bindings (it's in fact enough if *one*
module has them), because from such a module you can use the command
(in MODULE) to walk into a module not carrying command bindings, and
then use CTRL-D to exit.

However, this has several disadvantages: it mixes the "real" bindings
with command bindings (the module might want to use "in" for other
purposes), CTRL-D could cause problems since for some channels CTRL-D
might close down the connection, and using one type of command ("in")
to go "into" the module and another (CTRL-D) to "exit" is more complex
than simply "going to" a module.]

*** Repl command syntax

Normally, repl commands have the syntax

  ,COMMAND ARG1 ...

Input starting with an arbitrary amount of whitespace followed by a
comma thus works as an escape syntax.

This syntax is probably compatible with all languages. (Note that we
don't need to activate the lexer of the language until we've checked
if the first non-whitespace char is a comma.)
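
A sketch of the escape check, with a hypothetical procedure name. It
uses Guile's SRFI-13 string procedures (`string-trim', `string-index',
`string-trim-both') and leaves argument parsing to the individual
command, since `,in MODULE EXPR' needs EXPR in the module's own
syntax.

```scheme
;; If LINE is a repl command, return (COMMAND . REST) with COMMAND a
;; symbol and REST the unparsed argument text; otherwise return #f,
;; meaning the line goes to the current language's reader.
(define (parse-repl-command line)
  (let ((s (string-trim line)))          ; drop leading whitespace
    (if (and (> (string-length s) 0)
             (char=? (string-ref s 0) #\,))
        (let* ((body (substring s 1))    ; text after the comma
               (end (or (string-index body char-whitespace?)
                        (string-length body))))
          (cons (string->symbol (substring body 0 end))
                (string-trim-both (substring body end))))
        #f)))
```

For example, "  ,in (guile-user)" yields (in . "(guile-user)"), while
a ctax line such as "x = read ();" yields #f without the ctax lexer
ever being involved.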

(Hypothetically, if this should become a problem, we can provide means
of disabling this behaviour of the repl and let that particular
language module take sole control of reading at the repl prompt.)

Among the commands available are

*** ,in MODULE

Select the module named MODULE; that is, any new expressions typed by
the user after this command will be evaluated in the evaluation
environment provided by MODULE.

*** ,in MODULE EXPR

Evaluate expression EXPR in MODULE. EXPR has the syntax supplied by
the language used by MODULE.

*** ,use MODULE

Import all bindings exported by MODULE to the current module.

* Language modules

Since code written in any kind of language should be able to implement
most tasks, which may include reading, evaluating and writing, and
generally computing with, expressions and data originating from other
languages, we want the basic reading, evaluation and printing
operations to be independent of the language.

That is, instead of supplying separate `read', `eval' and `write'
procedures for different languages, a language module is required to
use the system procedures in the translated code.

This means that the behaviour of `read', `eval' and `write' is
context dependent. (See further "How Guile system procedures `read',
`eval', `write' use language modules" below.)

** Language data types

Each language module should try to use the fundamental Scheme data
types as far as this is possible.

Some data types have important differences in semantics between
languages, though, and not all required data types may exist in
Guile.

In such cases, the language module must supply its own, distinct, data
types. So, each language supported by Guile uses a certain set of
data types, with the basic Scheme data types as the intersection
between all sets.

Specifically, syntax trees representing source code expressions should
normally be a distinct data type.

** Foreign language escape syntax

Note that such data can flow freely between modules. In order to
accommodate data with different native syntaxes, each language module
provides a foreign language escape syntax. In Scheme, this syntax
uses the sharp comma extension specified by SRFI-10. The read
constructor is simply the last symbol in the long language name (which
is usually the same as the short language name).
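
Guile's SRFI-10 support makes this concrete: (srfi srfi-10) exports
`define-reader-ctor', which registers a procedure for a
`#,(NAME DATUM ...)' form. In this sketch a hypothetical ctax module
registers the constructor `ctax'; the tagged list it returns is only a
stand-in for a real ctax data type.

```scheme
(use-modules (srfi srfi-10))

;; When the reader sees #,(ctax DATUM ...), it calls this procedure
;; with the DATUMs and uses the result as the object read.
(define-reader-ctor 'ctax
  (lambda args
    (cons 'ctax-object args)))
```

Reading the text #,(ctax ENUM) from a port then yields
(ctax-object ENUM).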

** Example 1

Characters have syntax in both Scheme and ctax. Lists currently have
syntax in Scheme but lack ctax syntax. Ctax doesn't have a datatype
"enum", but we pretend it has for this example.

The following table shows the syntax used for reading and writing
these expressions in module A using the language scheme, and module B
using the language ctax (we assume that the foreign language escape
syntax in ctax is #LANGUAGE EXPR):

           A                  B

  chars    #\X                'X'

  lists    (1 2 3)            #scheme (1 2 3)

  enums    #,(ctax ENUM)      ENUM

** Example 2

A user is typing expressions in a ctax module which imports the
bindings x and y from the module `(foo)':

  ctax> x = read ();
  1+2;
  1+2;
  ctax> x
  1+2;
  ctax> y = 1;
  1
  ctax> y;
  1
  ctax> ,in (guile-user)
  guile> ,use (foo)
  guile> x
  #,(ctax 1+2;)
  guile> y
  1
  guile>

The example shows that ctax uses a distinct representation for ctax
expressions, but Scheme integers for integers.

** Language module interface

A language module is an ordinary Guile module importing bindings from
other modules and exporting bindings through its public interface.

It is required to export the following variable and procedures:

*** language-environment --> ENVIRONMENT

Returns a fresh top-level ENVIRONMENT (a module) where expressions
in this language are evaluated by default.

Modules using this language will by default have this environment
on their use list.

The intention is for this procedure to provide the "run-time
environment" for the language.

*** native-read PORT --> OBJECT

Read the next expression in the foreign syntax from PORT and return an
object OBJECT representing it.

It is entirely up to the language module to define what one
expression is, that is, how much to read.

In lisp-like languages, `native-read' corresponds to `read'. Note
that in such languages, OBJECT need not be source code, but could
be data.

The representation of OBJECT is also chosen by the language
module. It can consist of Scheme data types, data types distinct for
the language, or a mixture.

There is one requirement, however: Distinct data types must be
instances of a subclass of `language-specific-class'.

This procedure will be called during interactive use (the user
types expressions at a prompt) and when the system `read'
procedure is called at a time when a module using this language is
selected.

Some languages (for example Python) parse differently depending on
whether it's an interactive or non-interactive session. Guile provides
the predicate `interactive-port?' to test for this.

*** language-specific-class

This variable contains the superclass of all non-Scheme data-types
provided by the language.

*** native-write OBJECT PORT

This procedure prints the OBJECT on PORT using the specific
language syntax.
|
||||
|
||||
*** write-foreign-syntax OBJECT LANGUAGE NATIVE-WRITE PORT
|
||||
|
||||
Write OBJECT in the foreign language escape syntax of this module.
|
||||
The object is specific to language LANGUAGE and can be written using
|
||||
NATIVE-WRITE.
|
||||
|
||||
Here's an implementation for Scheme:
|
||||
|
||||
(define (write-foreign-syntax object language native-write port)
|
||||
(format port "#(~A " language))
|
||||
(native-write object port)
|
||||
(display #\) port)
|
||||
|
||||
*** translate EXPRESSION --> SCHEMECODE

Translate an EXPRESSION into SCHEMECODE.

EXPRESSION can be anything returned by `read'.

SCHEMECODE is Scheme source code represented using ordinary Scheme
data. It will be passed to `eval' in an environment containing
bindings in the environment returned by `language-environment'.

This procedure will be called during interactive use and when the
system `eval

*** translate-all PORT [ALIST] --> THUNK

Translate the entire stream of characters from PORT until #<eof>.
Return a THUNK which can be called repeatedly like this:

  THUNK --> SCHEMECODE

Each call will yield a new piece of Scheme code. The THUNK signals
the end of translation by returning the value *end-of-translation*
(which is tested using the predicate `end-of-translation?').

The optional argument ALIST provides compilation options for the
translator:

  (debug . #t)    means produce code suitable for debugging

This procedure will be called by the system `load' command and by
the module system when loading files.

The intentions are:

1. To let the language module decide when and in how large chunks
   to do the processing. It may choose to do all processing at
   the time translate-all is called, all processing when THUNK is
   called the first time, or small pieces of processing each time
   THUNK is called, or any conceivable combination.

2. To let the language module decide in how large chunks to output
   the resulting Scheme code in order not to overload memory.

3. To enable the language module to use temporary files, and
   whole-module analysis and optimization techniques.

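To make the THUNK protocol concrete, here is a minimal sketch in C, the language of Guile's core. Everything in it is hypothetical and not part of Guile's API: `translator', `translate_all', `translator_next' and `END_OF_TRANSLATION' merely stand in for THUNK, `translate-all' and *end-of-translation*, the "Scheme code" is a plain string, and the toy foreign syntax is one `NAME = INTEGER;' statement per line. It follows the lazy strategy of intention 1: nothing is translated up front, and each call translates one line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sentinel playing the role of *end-of-translation*. */
static const char END_OF_TRANSLATION[] = "<end-of-translation>";

/* State closed over by THUNK: the input text and how far we have come. */
typedef struct {
    const char *input;   /* whole source text, one statement per line */
    size_t pos;          /* offset of the next untranslated line */
    char chunk[256];     /* storage for the most recent piece of output */
} translator;

/* Analogue of (translate-all PORT): capture the input, do no work yet. */
static void translate_all(translator *t, const char *input)
{
    t->input = input;
    t->pos = 0;
}

/* Analogue of end-of-translation? */
static int end_of_translation_p(const char *code)
{
    return code == END_OF_TRANSLATION;
}

/* Analogue of calling THUNK: translate exactly one line per call. */
static const char *translator_next(translator *t)
{
    const char *start = t->input + t->pos;
    char line[128], name[64];
    int value;

    if (*start == '\0')
        return END_OF_TRANSLATION;

    size_t len = strcspn(start, "\n");
    assert(len < sizeof line);
    memcpy(line, start, len);
    line[len] = '\0';
    t->pos += len + (start[len] == '\n');   /* also consume the newline */

    /* Toy grammar: "NAME = INTEGER;" becomes (define NAME INTEGER). */
    if (sscanf(line, " %63[a-z] = %d ;", name, &value) == 2)
        snprintf(t->chunk, sizeof t->chunk, "(define %s %d)", name, value);
    else
        snprintf(t->chunk, sizeof t->chunk, "(error \"cannot translate\")");
    return t->chunk;
}
```

A caller such as `load' would call translator_next until end_of_translation_p is true, evaluating each chunk. A translator that wants whole-module analysis (intention 3) would instead do all its work in translate_all and only hand out stored chunks.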
*** untranslate SCHEMECODE --> EXPRESSION

Attempt to do the inverse of `translate'. An approximation is OK. It
is also OK to return #f. This procedure will be called from the
debugger, when generating error messages, backtraces etc.

The debugger uses the local evaluation environment to determine from
which module an expression comes. This is how the debugger can know
which `untranslate' procedure to call for a given expression.

(This is currently used to decide which backtrace frames to
display. System modules use the option :no-backtrace to prevent
displaying of Guile's internals to the user.)

Note that `untranslate' can use source-properties set by `native-read'
to give hints about how to do the reverse translation. Such hints
could for example be the filename, and line and column numbers for the
source expression, or an actual copy of the source expression.

** How Guile system procedures `read', `eval', `write' use language modules

*** read

The idea is that the `read' exported from the R5RS library will
continue to work when called from other languages, and will keep its
semantics.

A call to `read' simply means "read in an expression from PORT using
the syntax associated with that port".

Each module carries information about its language.

When an input port is created for a module to be read or during
interaction with a given module, this information is copied to the
port object.

`read' uses this information to call `native-read' in the correct
language module.

*** eval

[To be written.]

*** write

[To be written.]

* Error handling

** Errors during translation

Errors during translation are generated as usual by calling scm-error
(from Scheme) or scm_misc_error etc. (from C). The effect of
throwing errors from within `translate-all' is the same as when they
are generated within a call to the THUNK returned from
`translate-all'.

scm-error takes a fifth argument. This is a property list (alist)
which you can use to pass extra information to the error reporting
machinery.

Currently, the following properties are supported:

  filename    filename of file being translated
  line        line number of erring expression
  column      column number

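As an illustration of how the error reporting machinery might use these properties, here is a small C sketch. The `error_property' array is a hypothetical C-side view of the alist passed as the fifth argument to scm-error (line and column are kept as strings for brevity), and `format_translation_error' is an assumed helper, not an existing Guile function. With all three properties present it produces the conventional `FILE:LINE:COLUMN: MESSAGE' form; otherwise it falls back to the bare message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical C view of one (key . value) entry of the property alist. */
typedef struct {
    const char *key;
    const char *value;
} error_property;

/* Like assq on the alist: return the value stored under KEY, or NULL. */
static const char *lookup_property(const error_property *props, size_t n,
                                   const char *key)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(props[i].key, key) == 0)
            return props[i].value;
    return NULL;
}

/* Write "filename:line:column: message" into BUF if all three
   properties are present, otherwise just the message. */
static void format_translation_error(char *buf, size_t size,
                                     const error_property *props, size_t n,
                                     const char *message)
{
    const char *file = lookup_property(props, n, "filename");
    const char *line = lookup_property(props, n, "line");
    const char *column = lookup_property(props, n, "column");

    assert(buf != NULL && size > 0);
    if (file && line && column)
        snprintf(buf, size, "%s:%s:%s: %s", file, line, column, message);
    else
        snprintf(buf, size, "%s", message);
}
```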
** Run-time errors (errors in SCHEMECODE)

This section pertains to what happens when a run-time error occurs
during evaluation of the translated code.

In order to get "foreign code" in error messages, make sure that
`untranslate' yields good output. Note the possibility of maintaining
a table (preferably using weak references) mapping SCHEMECODE to
EXPRESSION.

Note the availability of source-properties for attaching the filename,
line and column numbers, and other information, such as EXPRESSION, to
SCHEMECODE. If the filename, line, and column properties are defined,
they will be automatically used by the error reporting machinery.

* Proposed changes to Guile

** Implement the above proposal.

** Add new fields `reader' and `translator' to all module objects

Make sure they are initialized when a language is specified.

** Use `untranslate' during error handling.

** Implement the use of arg 5 to scm-error

(specified in "Errors during translation")

** Implement a generic lexical analyzer with interface similar to read/rp

Mikael is working on this. (It might take a few days, since he is
busy with his studies right now.)

** Remove scm:eval-transformer

This is replaced by new fields in each module object (environment).

`eval' will instead directly use the `transformer' field in the module
passed as second arg.

Internal evaluation will, similarly, use the transformer of the module
representing the top-level of the local environment.

Note that this level of transformation is something independent of
language translation. *This* is a hook for adding Scheme macro
packages and belongs to the core language.

We also need to check the new `translator' field, potentially using
it.

** Package local environments as smobs

so that environment list structures can't leak out on the Scheme
level. (This has already been done in SCM.)

** Introduce new fields in input ports

These carry state information such as

*** which keyword syntax to support

*** whether to be case sensitive or not

*** which lexical grammar to use

*** whether the port is used in an interactive session or not

There will be a new Guile primitive `interactive-port?' testing for this.

** Move configuration of keyword syntax and case sensitivity to the read-state

Add new fields to the module objects for these values, so that the
read-state can be initialized from them.

*fixme* When? Why? How?

Probably as soon as the language has been determined during file loading.

Need to figure out how to set these values.


Local Variables:
mode: outline
End:
@ -1,665 +0,0 @@
***
*** These notes about the design of a new type of Scheme interpreter
*** "Ior" are cut out from various emails from early spring 2000.
***
*** MDJ 000817 <djurfeldt@nada.kth.se>
***

Generally, we should try to make a design which is clean and
minimalistic in as many respects as possible. For example, even if we
need more primitives than those in R5RS internally, I don't think
these should be made available to the user in the core, but rather be
made available *through* libraries (implementation in core,
publication via library).

The suggested working name for this project is "Ior" (Swedish name for
the donkey in "Winnie the Pooh" :). If, against the odds, we really
would succeed in producing an Ior, and we find it suitable, we could
turn it into a Guile 2.0 (or whatever). (The architecture still
allows for support of the gh interface and uses conservative GC (Hans
Böhm's, in fact).)

Beware now that I'm just sending over my original letter, which is
just a sketch of the more detailed, but cryptic, design notes I made
originally, which are, in turn, not as detailed as the design has
become now. :)

Please also excuse the lack of structure. I shouldn't work on this at
all right now. Choose for yourselves if you want to read this
unstructured information or if you want to wait until I've structured
it after the end of January.

But then I actually have to blurt out the basic idea of my
architecture already now. (I had hoped to present you with a proper
and fairly detailed spec, but I won't be able to complete such a spec
quickly.)


The basic idea is this:

* Don't waste time on non-computation!

Why waste a lot of time on type-checks, unboxing and boxing of data?
None of these actions does any computation!

I'd like both interpreter and compiled code to work directly with data
in raw, native form (integers represented as 32-bit longs, inexact
numbers as doubles, short strings as bytes in a word, longer strings
as a normal pointer to malloced memory, bignums are just pointers to a
gmp (GNU MultiPrecision library) object, etc.)

* Don't we need to dispatch on type to know what to do?

But don't we need to dispatch on the type in order to know how to
compute with the data? E.g., `display' does entirely different
computations on a <fixnum> and a <string>. (<fixnum> is an integer
between -2^31 and 2^31-1.)

The answer is *no*, not in 95% of all cases. The main reason is that
the interpreter does type analysis while converting closures to
bytecode, and already knows when _calling_ `display' what types its
arguments have. This means that the bytecode compiler can choose a
suitable _version_ of `display' which handles that particular type.



This type analysis is greatly simplified by a symmetry: just as the
type analysis _results_ in the type of the argument in the call to
`display' (so that we can select the correct _version_ of
`display'), the closure byte-code itself is only one _version_ of
the closure, with the types of its arguments fixed at the start of the
analysis.

As you have probably understood by now, the basic architecture is that
all procedures are generic functions, and the "versions" I'm speaking
about are a kind of method. Let's call them "branches" for now.

For example:

  (define foo
    (lambda (x)
      ...
      (display x)
      ...))

may result in the following two branches:

  1. [<fixnum>-foo] =
       (branch ((x <fixnum>))
         ...
         ([<fixnum>-display] x)
         ...)

  2. [<string>-foo] =
       (branch ((x <string>))
         ...
         ([<string>-display] x)
         ...)

and a new closure

  (define bar
    (lambda (x y)
      ...
      (foo x)
      ...))

results in

  [<fixnum>-<fixnum>-bar] =
    (branch ((x <fixnum>) (y <fixnum>))
      ...
      ([<fixnum>-foo] x)
      ...)

Note how all type dispatch is eliminated in these examples.

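The difference between dispatching on a tag and selecting a version in advance can be sketched in C. This is only an illustration under assumptions, not Ior code: the `tagged' struct, the function names and the choice to "display" into a character buffer (so the result is easy to inspect) are all hypothetical. `display_generic' must look at a tag on every call, while `foo_fixnum' and `foo_string' play the roles of [<fixnum>-foo] and [<string>-foo] and call an already selected version of display with untagged data.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Tagged representation: every value carries its type at run time. */
enum tag { TAG_FIXNUM, TAG_STRING };

typedef struct {
    enum tag tag;
    union {
        int32_t fixnum;
        const char *string;
    } u;
} tagged;

/* Generic `display': inspects the tag on every single call. */
static int display_generic(char *buf, size_t size, tagged v)
{
    switch (v.tag) {
    case TAG_FIXNUM:
        return snprintf(buf, size, "%ld", (long) v.u.fixnum);
    case TAG_STRING:
        return snprintf(buf, size, "%s", v.u.string);
    default:
        assert(!"unknown tag");
        return -1;
    }
}

/* [<fixnum>-display] and [<string>-display]: raw, untagged arguments
   and no dispatch at all. */
static int display_fixnum(char *buf, size_t size, int32_t x)
{
    return snprintf(buf, size, "%ld", (long) x);
}

static int display_string(char *buf, size_t size, const char *s)
{
    return snprintf(buf, size, "%s", s);
}

/* [<fixnum>-foo] and [<string>-foo]: the argument type was fixed when
   the branch was built, so each calls the matching display directly. */
static int foo_fixnum(char *buf, size_t size, int32_t x)
{
    return display_fixnum(buf, size, x);
}

static int foo_string(char *buf, size_t size, const char *s)
{
    return display_string(buf, size, s);
}
```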
As a further reinforcement to the type analysis, branches will not
only have typed parameters but also have return types. This means
that the type of a branch will look like

  <type 1> x ... x <type n> --> <type r>

In essence, the entire system will be very ML-like internally, and we
can benefit from the research done on ML-compilation.

However, we now get three major problems to confront:

1. In the Scheme language not all situations can be completely type
   analyzed.

2. In particular, for some operations, even if the types of the
   parameters are well defined, we can't determine the return type
   generically. For example, [<fixnum>-<fixnum>-+] may have return
   type <fixnum> _or_ <bignum>.

3. Even if we can do a complete analysis, some closures will generate
   a combinatoric explosion of branches.


Problem 1: Incomplete analysis

We introduce a new type <boxed>. This data type has type <boxed> and
contents

  struct ior_boxed_t {
    ior_type *type;   /* pointer to class struct */
    void *data;       /* generic field, may also contain immediate objects */
  };

For example, a boxed fixnum 4711 has type <boxed> and contents
{ <fixnum>, 4711 }. The boxed type essentially corresponds to Guile's
SCM type. It's just that the 1, 3, 7 or 16-bit type tag has been
replaced with a 32-bit type tag (the pointer to the class structure
describing the type of the object).

This is less efficient than the SCM type system, but that's no problem
since it won't be used in 95% of all cases. The big advantage
compared to SCM's type system is that it is so simple and uniform.

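A sketch of boxing in C, under assumptions: the `ior_word' union, the single `to_long' operation in the class structure and the helper names are made up for illustration, and the notes' `void *data' field is replaced by a union so that storing an immediate is well defined in C. The boxed fixnum 4711 from the text becomes { &fixnum_class, 4711 }, and code that receives a <boxed> value has to go through the class pointer, which is exactly the dispatch that unboxed branches avoid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The data word: heap pointer or immediate value. */
typedef union {
    void *ptr;        /* heap object, e.g. a GMP bignum */
    intptr_t imm;     /* immediate object, e.g. a fixnum */
} ior_word;

/* Hypothetical class structure with a single per-class operation. */
typedef struct ior_type {
    const char *name;
    long (*to_long)(ior_word data);
} ior_type;

typedef struct {
    const ior_type *type;   /* full-word type tag: pointer to the class */
    ior_word data;
} ior_boxed_t;

static long fixnum_to_long(ior_word data)
{
    return (long) data.imm;
}

static const ior_type fixnum_class = { "<fixnum>", fixnum_to_long };

/* Boxing happens only where type analysis could not fix the type. */
static ior_boxed_t box_fixnum(int32_t x)
{
    ior_boxed_t b;
    b.type = &fixnum_class;
    b.data.imm = x;
    return b;
}

/* Code on a <boxed> data path dispatches through the class pointer. */
static long boxed_to_long(ior_boxed_t b)
{
    assert(b.type != NULL && b.type->to_long != NULL);
    return b.type->to_long(b.data);
}
```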
I should note here that while SCM and Guile are centered around the
cell representation and all objects either _are_ cells or have a cell
handle, objects in Ior will look more like malloced blocks. This is the
reason why I planned to start with Böhm's GC which has C pointers as
object handles. But it is of course still possible to use a heap, or,
preferably, several heaps for different kinds of objects. (Böhm's GC
has multiple heaps for different sizes of objects.) If we write a
custom GC, we can increase speed further.


Problem 3 (yes, I skipped :) Combinatoric explosion

We simply don't generate all possible branches. In the interpreter we
generate branches "just-too-late" (well, it's normally called "lazy
compilation" or "just-in-time", but if it was "in-time", the procedure
would already be compiled when it was needed, right? :) as when Guile
memoizes or when a Java machine turns byte-codes into machine code, or
as when GOOPS turns methods into cmethods for that matter.

Have you noticed that branches (although still without return type
information) already exist in GOOPS? They are currently called
"cmethods" and are generated on demand from the method code and put
into the GF cache during evaluation of GOOPS code. :-) (I have not
utilized this fully yet. I plan to soon use this method compilation
(into branches) to eliminate almost all type dispatch in calls to
accessors.)

For the compiler, we use profiling information, just as the modern GCC
scheduler does, or else rely on some type analysis (if a procedure says
(+ x y), x is not normally a <string> but rather some subclass of
<number>) and some common sense (it's usually more important to
generate <fixnum> branches than <foobar> branches).

The rest of the cases can be handled by <boxed>-branches. We can, for
example, have a:

  [<boxed>-<boxed>-bar] =
    (branch ((x <boxed>) (y <boxed>))
      ...
      ([<boxed>-foo] x)
      ...)

[<boxed>-foo] will use an efficient type dispatch mechanism (for
example akin to the GOOPS one) to select the right branch of
`display'.


Problem 2: Ambiguous return type

If the return type of a branch is ambiguous, we simply define the
return type as <boxed>, and box data at the point in the branch where
it can be decided which type of data we will return. This is how
things can be handled in the general case. However, we might be able
to handle things in a neater way, at least in some cases:

During compilation to byte code, we'll probably use an intermediate
representation in continuation passing style. We might even use a
subtype of branches represented as continuations (not a heavy
representation, as in Guile and SCM, but probably not much more than a
function pointer). This is, for example, one way of handling tail
recursion, especially mutual tail recursion.

One case where we would like to try really hard not to box data is
when fixnums "overflow into" bignums.

Let's say that the branch [<fixnum>-<fixnum>-bar] contains a form

  (+ x y)

where the type analyzer knows that x and y are fixnums. We then split
the branch right after the form and let it fork into two possible
continuation branches bar1 and bar2:

[The following is only pseudo code. It can be made efficient on the C
level. We can also use the asm compiler directive in conditional
compilation for GCC on i386. We could even let autoconf/automake
substitute an architecture specific solution for multiple
architectures, but still support a C level default case.]

  (if (sum-over/underflow? x y)
      (bar1 (fixnum->bignum x) (fixnum->bignum y) ...)
      (bar2 x y ...))

bar1 begins with the evaluation of the form

  ([<bignum>-<bignum>-+] x y)

while bar2 begins with

  ([<fixnum>-<fixnum>-+] x y)

Note that the return type of each of these forms is unambiguous.


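The fork can be sketched in portable C, in the spirit of the "C level default case". This is an illustration under assumptions: int64_t stands in for <bignum> (a real implementation would use GMP), `sum_over_underflow_p', `bar1', `bar2' and the `bar_result' record are hypothetical names, and the final tagged result exists only so the sketch can be observed; in the design, each continuation simply carries on with its own unambiguous type. The overflow test is made before adding, because signed overflow is undefined behaviour in ANSI C.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for <bignum>; a real implementation would use GMP. */
typedef int64_t bignum;

/* sum-over/underflow?: would x + y leave the <fixnum> range?
   Tested before adding, since signed overflow is undefined in C. */
static int sum_over_underflow_p(int32_t x, int32_t y)
{
    return (y > 0 && x > INT32_MAX - y) || (y < 0 && x < INT32_MIN - y);
}

/* bar1 starts with [<bignum>-<bignum>-+], bar2 with [<fixnum>-<fixnum>-+];
   each has an unambiguous result type.  (The rest of bar is elided.) */
static bignum bar1(bignum x, bignum y) { return x + y; }
static int32_t bar2(int32_t x, int32_t y) { return x + y; }

/* Only for observing the sketch: which continuation ran, and its value. */
typedef struct {
    int is_big;
    union {
        int32_t fixnum;
        bignum big;
    } value;
} bar_result;

static bar_result bar(int32_t x, int32_t y)
{
    bar_result r;
    if (sum_over_underflow_p(x, y)) {
        r.is_big = 1;
        r.value.big = bar1((bignum) x, (bignum) y);   /* fixnum->bignum */
    } else {
        r.is_big = 0;
        r.value.fixnum = bar2(x, y);
    }
    return r;
}
```

For example, bar(1, 2) stays on the fixnum path, while bar(INT32_MAX, 1) continues in bar1.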
Now some random points from the design:

* The basic concept in Ior is the class. A type is a concrete class.
  Classes which are subclasses of <object> are concrete, otherwise they
  are abstract.

* A procedure is a collection of methods. Each method can have an
  arbitrary number of parameters of arbitrary class (not type).

* The type of a method is the tuple of its argument classes.

* The type of a procedure is the set of its method types.

But the most important new concept is the branch.
Regard the procedure:

  (define (half x)
    (quotient x 2))

The procedure half will have the single method

  (method ((x <top>))
    (quotient x 2))

When `(half 128)' is called the Ior evaluator will create a new branch
during the actual evaluation. I'm now going to extend the branch
syntax by adding a second list of formals: the continuations of the
branch.

* The type of a branch is the pair of the tuple of its argument
  types (not classes!) and the tuple of its continuation argument
  types. The branch generated above will be:

  (branch ((x <fixnum>)) ((c <fixnum>))
    (c (quotient x 2)))

If the method

  (method ((x <top>) (y <top>))
    (quotient (+ x 1) y))

is called with arguments 1 and 2 it results in the branch

  (branch ((x <fixnum>) (y <fixnum>)) ((c1 <fixnum>) (c2 <bignum>))
    (quotient (+ x 1 c3) y))

where c3 is:

  (branch ((x <fixnum>) (y <fixnum>)) ((c <bignum>))
    (quotient (+ (fixnum->bignum x) 1) y))

The generated branches are stored in a cache in the procedure object.


But wait a minute! What about variables and data structures?

In essence, what we do is that we fork up all data paths so that they
can be typed: We put the type tags on the _data paths_ instead of on
the data itself. You can look upon the "branches" as tubes of
information where the type tag is attached to the tube instead of on
what passes through it.

Variables and data structures are part of the "tubes", so they need to
be typed. For example, the generic pair looks like:

  (define-class <pair> ()
    car-type
    car
    cdr-type
    cdr)

But note that since car and cdr are generic procedures, we can let
more efficient pairs exist in parallel, like

  (define-class <immutable-fixnum-list> ()
    (car (class <fixnum>))
    (cdr (class <immutable-fixnum-list>)))

Note that instances of this last type only take two words of memory!
They are easy to use too. We can't use `cons' or `list' to create
them, since these procedures can't assume immutability, but we don't
need to specify the type <fixnum> in our program. Something like

  (const-cons 1 x)

where x is in the data flow path tagged as <immutable-fixnum-list>, or

  (const-list 1 2 3)


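In C terms, an <immutable-fixnum-list> cell could be as small as the following sketch. The names, the static pool standing in for a garbage-collected heap, and the fixed three-element `const_list3' are assumptions made for illustration. No type words are stored because the type is known from the data path, so a cell is just an untagged fixnum plus a pointer, which is two machine words on common 32- and 64-bit ABIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cell of a hypothetical <immutable-fixnum-list>: an untagged
   fixnum and a pointer to the rest of the list. */
typedef struct fixnum_list {
    int32_t car;
    const struct fixnum_list *cdr;
} fixnum_list;

/* A static pool stands in for the garbage-collected heap. */
#define POOL_SIZE 64
static fixnum_list pool[POOL_SIZE];
static size_t pool_used;

/* (const-cons car cdr): the cell is handed out as const, so it is
   never mutated after construction. */
static const fixnum_list *const_cons(int32_t car, const fixnum_list *cdr)
{
    assert(pool_used < POOL_SIZE);
    fixnum_list *cell = &pool[pool_used++];
    cell->car = car;
    cell->cdr = cdr;
    return cell;
}

/* (const-list a b c), built from the back. */
static const fixnum_list *const_list3(int32_t a, int32_t b, int32_t c)
{
    return const_cons(a, const_cons(b, const_cons(c, NULL)));
}

/* Walking the list needs no type checks: every car is a fixnum. */
static int64_t fixnum_list_sum(const fixnum_list *l)
{
    int64_t sum = 0;
    for (; l != NULL; l = l->cdr)
        sum += l->car;
    return sum;
}
```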
Some further notes:

* The concepts module and instance are the same thing. Using other
  modules means 1. creating a new module class which inherits the
  classes of the used modules and 2. instantiating it.

* Module definitions and class definitions are equivalent; they are
  just different syntactic sugar adapted for each kind of use.

* (define x 1) means: create an instance variable which is itself a
  subclass of <boxed> with initial value 1 (which is an instance of
  <fixnum>).


The interpreter is a mixture between a stack machine and a register
machine. The evaluator looks like this... :)

  /* the interpreter! */
  if (!setjmp (ior_context->exit_buf))
  #ifndef i386_GCC
    while (1)
  #endif
      (*ior_continue) (IOR_MICRO_OP_ARGS);

The branches are represented as an array of pointers to micro
operations. In essence, the evaluator doesn't exist in itself, but is
folded out over the entire implementation. This allows for an extreme
form of modularity!

The i386_GCC case is a machine-specific optimization which avoids all
unnecessary popping and pushing of the CPU stack (which is different
from the Ior data stack).

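The portable (non-i386_GCC) form of this scheme can be sketched as a toy in C. The assumptions are: a branch is an array of cells holding either a pointer to a micro operation or an inline operand, `cont' plays the role of ior_continue, each micro operation advances it itself, and the `end' operation leaves the loop by longjmp to exit_buf. All names and signatures are hypothetical, not the Ior source, and the `add' operation ignores overflow, which Ior would handle by jumping to a bignum branch.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdint.h>

typedef struct machine machine;
typedef void (*micro_op)(machine *m);

/* A branch is an array of cells: micro operation pointers, with any
   inline operands stored in the cells that follow them. */
typedef union {
    micro_op op;
    int32_t operand;
} cell;

/* Hypothetical execution state. */
struct machine {
    const cell *cont;     /* continue register, like a program counter */
    int32_t stack[16];    /* the Ior data stack (not the CPU stack) */
    int sp;
    jmp_buf exit_buf;
};

static void op_push(machine *m)   /* push the inline operand */
{
    assert(m->sp < 16);
    m->stack[m->sp++] = m->cont[1].operand;
    m->cont += 2;
}

static void op_add(machine *m)    /* fixnum +; overflow not handled here */
{
    assert(m->sp >= 2);
    m->sp--;
    m->stack[m->sp - 1] += m->stack[m->sp];
    m->cont += 1;
}

static void op_end(machine *m)    /* leave the evaluator loop */
{
    longjmp(m->exit_buf, 1);
}

/* The whole evaluator: keep calling whatever the continue register
   points at.  All real work happens inside the micro operations. */
static int32_t run(machine *m, const cell *code)
{
    m->cont = code;
    m->sp = 0;
    if (!setjmp(m->exit_buf))
        while (1)
            m->cont->op(m);
    return m->stack[m->sp - 1];
}
```

Running the cell sequence {push, 2, push, 40, add, end} returns 42. Note that, as in the notes, the evaluator loop itself knows nothing about any particular operation.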
The execution environment consists of

* a continue register similar to the program counter in the CPU
* a data stack (where micro operation arguments and results are stored)
* a linked chain of environment frames (but see the exception below!)
* a dynamic context

I've written a small baby Ior which uses Guile's infrastructure.
Here's the context from that baby Ior:

  typedef struct ior_context_t {
    ior_data_t *env;            /* rest of environment frames */
    ior_cont_t save_continue;   /* saves or represents continuation */
    ior_data_t *save_env;       /* saves or represents environment */
    ior_data_t *fluids;         /* array of fluids (use GC_malloc!) */
    int n_fluids;
    int fluids_size;
    /* dynwind chain is stored directly in the environment, not in context */
    jmp_buf exit_buf;
    IOR_SCM guile_protected;    /* temporary */
  } ior_context_t;

There's an important exception regarding the lowest environment
frame. That frame isn't stored in a separate block on the heap, but
on Ior's data stack. Frames are copied out onto the heap when
necessary (for example when closures "escape").


Now a concrete example:

Look at:

  (define sum
    (lambda (from to res)
      (if (= from to)
          res
          (sum (+ 1 from) to (+ from res)))))

This can be rewritten into CPS (which captures a lot of what happens
during flow analysis):

  (define sum
    (lambda (from to res c1)
      (let ((c2 (lambda (limit?)
                  (let ((c3 (lambda ()
                              (c1 res)))
                        (c4 (lambda ()
                              (let ((c5 (lambda (from+1)
                                          (let ((c6 (lambda (from+res)
                                                      (sum from+1 to from+res c1))))
                                            (_+ from res c6)))))
                                (_+ 1 from c5)))))
                    (_if limit? c3 c4)))))
        (_= from to c2))))

Finally, after branch expansion, some optimization, code generation,
and some optimization again, we end up with the byte code for the two
branches (here marked by labels `sum' and `sumbig'):

  c5
          (ref -3)
          (shift -1)
          (+ <fixnum> <fixnum> c4big)
          ;; c4
          (shift -2)
          (+ <fixnum> 1 sumbig)
          ;; c6
  sum
          (shift 3)
          (ref2 -3)
          ;; c2
          (if!= <fixnum> <fixnum> c5)
          ;; c3
          (ref -1)
          ;; c1
          (end)

  c5big
          (ref -3)
          (shift -1)
          (+ <bignum> <bignum>)
  c4big
          (shift -2)
          (+ <bignum> 1)
          ;; c6
  sumbig
          (shift 3)
          (ref2 -3)
          ;; c2
          (= <bignum> <bignum>)
          (if! c5big)
          ;; c3
          (ref -1)
          ;; c1
          (end)

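The effect of the two branches can be approximated in plain C. In this sketch, int64_t stands in for <bignum>, `sum_fixnum' and `sum_big' are hypothetical names for the `sum' and `sumbig' branches, and FROM <= TO is assumed (the Scheme version never terminates otherwise). The fixnum loop checks each addition and, on overflow, hands its current, not yet updated, state to the bignum loop, mirroring how (+ <fixnum> 1 sumbig) continues in sumbig. The `went_big' flag is only there so the switch can be observed.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t bignum;   /* stand-in for a GMP bignum */

/* The sumbig branch: the same loop on wide integers. */
static bignum sum_big(bignum from, bignum to, bignum res)
{
    assert(from <= to);
    while (from != to) {
        res += from;      /* (+ from res) */
        from += 1;        /* (+ 1 from) */
    }
    return res;
}

/* The sum branch: untagged 32-bit fixnums.  *WENT_BIG records whether
   control had to continue in the sumbig branch. */
static bignum sum_fixnum(int32_t from, int32_t to, int32_t res, int *went_big)
{
    assert(from <= to);
    *went_big = 0;
    while (from != to) {
        int64_t next_res = (int64_t) from + res;   /* cannot overflow */
        if (next_res > INT32_MAX || next_res < INT32_MIN
            || from == INT32_MAX) {                 /* (+ 1 from) would overflow */
            *went_big = 1;
            return sum_big(from, to, res);         /* continue in sumbig */
        }
        res = (int32_t) next_res;
        from += 1;
    }
    return res;
}
```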
Let's take a closer look upon the (+ <fixnum> 1 sumbig) micro
|
||||
operation. The generated assembler from the Ior C source + machine
|
||||
specific optimizations for i386_GCC looks like this (with some rubbish
|
||||
deleted):
|
||||
|
||||
ior_int_int_sum_intbig:
|
||||
movl 4(%ebx),%eax ; fetch arg 2
|
||||
addl (%ebx),%eax ; fetch arg 1 and do the work!
|
||||
jo ior_big_sum_int_int ; dispatch to other branch on overflow
|
||||
movl %eax,(%ebx) ; store result in first environment frame
|
||||
addl $8,%esi ; increment program counter
|
||||
jmp (%esi) ; execute next opcode
|
||||
|
||||
ior_big_sum_int_int:
|
||||
|
||||
To clearify: This is output from the C compiler. I added the comments
|
||||
afterwards.
|
||||
|
||||
The source currently looks like this:
|
||||
|
||||
IOR_MICRO_BRANCH_2_2 ("+", int, big, sum, int, int, 1, 0)
|
||||
{
|
||||
int res = IOR_ARG (int, 0) + IOR_ARG (int, 1);
|
||||
IOR_JUMP_OVERFLOW (res, ior_big_sum_int_int);
|
||||
IOR_NEXT2 (z);
|
||||
}
|
||||
|
||||
where the macros allow for different definitions depending on if we
|
||||
want to play pure ANSI or optimize for a certain machine/compiler.
|
||||
|
||||
The plan is actually to write all source in the Ior language and write
|
||||
Ior code to translate the core code into bootstrapping C code.
|
||||
|
||||
Please note that if i386_GCC isn't defined, we run plain portable ANSI C.
|
||||
|
||||
|
||||
Just one further note:
|
||||
|
||||
In Ior, there are three modes of evaluation
|
||||
|
||||
1. evaluating and type analyzing (these go in parallel)
|
||||
2. code generation
|
||||
3. executing byte codes
|
||||
|
||||
It is mode 3 which is really fast in Ior.
|
||||
|
||||
You can look upon your program as a web of branch segments where one
|
||||
branch segment can be generated from fragments of many closures. Mode
|
||||
switches doesn't occur at the procedure borders, but at "growth
|
||||
points". I don't have time to define them here, but they are based
|
||||
upon the idea that the continuation together with the type signature
|
||||
of the data flow path is unique.
|
||||
|
||||
We normally run in mode 3. When we come to a source growth point
|
||||
(essentially an apply instruction) for uncompiled code we "dive out"
|
||||
of mode 3 into mode 1 which starts to eval/analyze code until we come
|
||||
to a "sink". When we reach the "sink", we have enough information
|
||||
about the data path to do code generation, so we backtrack to the
|
||||
source growth point and grow the branch between source and sink.
|
||||
Finally, we "dive into" mode 3!
|
||||
|
||||
So, code generation doesn't respect procedure borders. We instead get
|
||||
a very neat kind of inlining, which, e.g., means that it is OK to use
|
||||
closures instead of macros in many cases.
|
||||
----------------------------------------------------------------------
|
||||
Ior and module system
|
||||
=====================
|
||||
|
||||
How, exactly, should the module system of Ior look like?
|
||||
|
||||
There is this general issue of whether to have a single-dispatch or
|
||||
multi-dispatch system. Personally, I see that Scheme already use
|
||||
multi-dispatch. Compare (+ 1.0 2) and (+ 1 2.0).
|
||||
|
||||
As you've seen if you've read the notes about Ior design, efficiency
|
||||
is not an issue here, since almost all dispatch will be eliminated
|
||||
anyway.
|
||||
|
||||
Also, note an interesting thing: GOOPS actually has a special,
|
||||
implicit, argument to all of it's methods: the lexical environment.
|
||||
It would be very ugly to add a second, special, argument to this.
|
||||
|
||||
Of course, the theoreticians have already recognised this, and in many
|
||||
systems, the implicit argument (the object) and the environment for
|
||||
the method is the same thing.
|
||||
|
||||
I think we should especially take impressions from Matthias Blume's
|
||||
module/object system.
|
||||
|
||||
The idea, now, for Ior (remember that everything about Ior is
negotiable between us) is that a module is a type, as well as an
instance of that type. We would basically keep the GOOPS style of
methods, with the implicit argument being the module object (or some
other lexical environment, in a chain with the module as root).

Let's say now that module C uses modules A and B. Modules A and B
both export the procedure `foo'. But A:foo and B:foo are different
sets of methods.

What does this mean? Well, it obviously means that the procedure
`foo' in module C is a subtype of both A:foo and B:foo. Note how this
is similar in structure to slot inheritance: when class C is created
with superclasses A and B, the properties of a slot in C are created
through slot inheritance. One way of interpreting the variable foo in
module A is as a slot whose init value is foo. Through the MOP, we can
specify that procedure slot inheritance in a module class implies the
creation of new init values through inheritance.

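One way to picture the folding of A:foo and B:foo is as a union of
method sets. The C sketch below assumes a hypothetical representation
in which a generic procedure is a table of methods keyed by the type
they specialize on (generic_t, method_t and fold_generics are
illustrative names), and it assumes a conflict policy the text leaves
open: when A and B both supply a method for the same type, the one
from A, the earlier module in C's use list, is kept.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_METHODS 16

typedef int type_id;   /* hypothetical: types identified by small integers */

/* A method: the type it specializes on plus its body.  The body is only a
   label here; in a real system it would be code. */
typedef struct {
    type_id specializer;
    const char *impl;
} method_t;

/* A generic procedure is its name plus a set of methods. */
typedef struct {
    const char *name;
    size_t n_methods;
    method_t methods[MAX_METHODS];
} generic_t;

static const method_t *find_method(const generic_t *g, type_id t) {
    for (size_t i = 0; i < g->n_methods; i++)
        if (g->methods[i].specializer == t)
            return &g->methods[i];
    return NULL;
}

/* Build C's `foo' from A:foo and B:foo.  Assumed policy: on a conflict,
   the method from the module listed first (A) wins. */
static generic_t fold_generics(const generic_t *a, const generic_t *b) {
    generic_t c = { a->name, 0, { { 0, NULL } } };
    const generic_t *sources[2] = { a, b };
    for (size_t s = 0; s < 2; s++) {
        for (size_t i = 0; i < sources[s]->n_methods; i++) {
            const method_t *m = &sources[s]->methods[i];
            if (find_method(&c, m->specializer) == NULL) {
                assert(c.n_methods < MAX_METHODS);
                c.methods[c.n_methods++] = *m;
            }
        }
    }
    return c;
}
```

A real design would more likely order or combine conflicting methods
through the MOP, as the slot-inheritance analogy suggests, rather than
silently dropping one of them.
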
This may look like a kludge, and perhaps it is, and, sure, we are not
going to accept any kludges in Ior. But, it might actually not be a
kludge...

I think it is commonly accepted by computer scientists that a module,
or at least a module interface, is a type. Again, this type can be
seen as the set of types of the functions in the interface. The types
of our procedures are the sets of branch types they provide. It is
then natural that a module using two other modules creates new
procedure types by folding.

This would become less cloudy (yes, this is a cloudy part of my
reasoning; I said previously that the interpreter itself is now
clear) if module interfaces were required to be explicit types.

Actually, this would fit much better with the rest of Ior's design.
On one hand, we might be free to introduce such a restriction
(compiler writers would applaud it), since R5RS doesn't specify any
module system. On the other hand, it might be strange to require
explicit typing when Scheme is fundamentally implicitly typed...

We also have to consider that a module has an "inward" face, which is
one type, and possibly many "outward" faces, which are different
types. (Compare the idea of "interfaces" in Scheme48.)

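As an illustration of one inward face with several outward faces, here
is a C sketch in the spirit of Scheme48 interfaces. It assumes a
hypothetical representation (module_t, interface_t, module_lookup and
interface_lookup are illustrative names): the module's own code sees
every binding, while each interface is a list of exported names that
filters what a client can see.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: a binding from a name to a value. */
typedef struct { const char *name; int value; } binding_t;

/* Inward face: every binding defined in the module. */
typedef struct {
    const binding_t *bindings;
    size_t n_bindings;
} module_t;

/* An outward face: the subset of names a particular client may see. */
typedef struct {
    const module_t *module;
    const char *const *exports;
    size_t n_exports;
} interface_t;

/* Code inside the module sees all of its bindings. */
static const binding_t *module_lookup(const module_t *m, const char *name) {
    assert(m != NULL && name != NULL);
    for (size_t i = 0; i < m->n_bindings; i++)
        if (strcmp(m->bindings[i].name, name) == 0)
            return &m->bindings[i];
    return NULL;
}

/* A client sees a binding only if this interface exports it. */
static const binding_t *interface_lookup(const interface_t *iface, const char *name) {
    for (size_t i = 0; i < iface->n_exports; i++)
        if (strcmp(iface->exports[i], name) == 0)
            return module_lookup(iface->module, name);
    return NULL;
}
```

Two interfaces over the same module can thus expose different subsets,
i.e. different outward types, of one inward set of bindings.
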
It thus seems that, while a module can truly be an Ior class, the
reverse should probably not hold in the general case...

Unless

  instance               <-> module proper
  class of the instance  <-> "inward interface"
  superclasses           <-> "outward interfaces + inward uses"

...hmm, is it possible to reconcile this with Rees' object system?

Please think about these issues. We should try to end up with a
beautiful and consistent object/module system.

----------------------------------------------------------------------

Here's a difficult problem in Ior's design:

Let's say that we have a mutable data structure, like an ordinary
list. Since, in Ior, the type tag (which is really a pointer to a
class structure) is stored separately from the data, it is conceivable
that another thread modifies the location in the list between the
moment our thread reads the type tag and the moment it reads the data.

The reading of type and data must be made atomic in some way.
Probably, some kind of locking of the heap is required. It's just
that it may cause a lot of overhead to lock the heap at every *read*
from a mutable data structure.

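A minimal C sketch of the "lock the heap" option, assuming POSIX
threads and a hypothetical cell layout in which the class pointer and
the data word are separate fields (cell_t, class_t, cell_set and
cell_ref are illustrative names, not Ior's). Writers and readers take
the same heap-wide lock, so a reader always gets a tag and a data word
from the same write; the cost is a lock/unlock pair on every read of a
mutable location, which is exactly the overhead worried about above.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical layout mirroring the text: the type tag (a pointer to a
   class structure) lives apart from the data word. */
typedef struct { const char *name; } class_t;

typedef struct {
    const class_t *tag;
    long data;
} cell_t;

/* A consistent snapshot of one cell. */
typedef struct {
    const class_t *tag;
    long data;
} typed_value_t;

/* One lock for the whole heap.  Every read of a mutable location pays a
   lock/unlock pair; that is the overhead in question. */
static pthread_mutex_t heap_lock = PTHREAD_MUTEX_INITIALIZER;

static void cell_set(cell_t *c, const class_t *tag, long data) {
    assert(tag != NULL);               /* every value carries a class */
    pthread_mutex_lock(&heap_lock);
    c->tag = tag;
    c->data = data;
    pthread_mutex_unlock(&heap_lock);
}

/* Tag and data are read under the same lock, so they always come from the
   same cell_set; without the lock a reader could pair the old tag with
   the new data. */
static typed_value_t cell_ref(const cell_t *c) {
    typed_value_t v;
    pthread_mutex_lock(&heap_lock);
    v.tag = c->tag;
    v.data = c->data;
    pthread_mutex_unlock(&heap_lock);
    return v;
}
```
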
Look how much trouble those set!-operations cause! Not only do they
force us to store type tags for each car and cdr in the list, but they
also force a lot of explicit dispatch to be done, and cause trouble
in a threaded system...

----------------------------------------------------------------------

Jim Blandy <jimb@red-bean.com> writes:

> We also should try to make less work for the GC, by avoiding consing
> up local environments until they're closed over.

Did the texts which I sent to you talk about Ior's solution?

Basically, it is this: use *two* environment "arguments" to the
evaluator (in Ior, they aren't arguments but registers):

* One argument is a pointer to the "top" of an environment stack.
  This is used in the "inner loop" for very efficient access to
  intermediate results. The "top" segment of the environment stack is
  also regarded as the first environment frame in the lexical
  environment. ("top" is the bottom on a stack which grows downwards.)

* The other argument points to a structure holding the evaluation
  context. In this context, there is a pointer to the chain of the
  rest of the environment frames. Note that since frames are just
  blocks of SCM values, you can very efficiently "release" a frame
  into the heap by block copying it (remember that Ior uses Boehm's GC;
  this is how we allocate the block).

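The two-register scheme can be sketched in C as follows. Everything
here is an assumption for illustration: SCM is stood in for by long,
the structure and function names are hypothetical, the stack is
assumed to grow downwards as in the text, and plain malloc replaces
the Boehm GC allocator (e.g. GC_MALLOC) that Ior uses. The sketch also
assumes that releasing a frame moves it: after the block copy, the
frame is popped from the stack and becomes the head of the context's
frame chain.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef long SCM;   /* stand-in for Guile's SCM word */

/* A heap frame: a block of SCM values linked to the enclosing frame. */
typedef struct heap_frame {
    struct heap_frame *next;
    size_t size;
    SCM slots[];                /* C99 flexible array member */
} heap_frame_t;

/* Register 1: the "top" of a downward-growing environment stack.  The
   segment [sp, sp + top_size) doubles as the first lexical frame. */
typedef struct {
    SCM *sp;
    size_t top_size;
} env_stack_t;

/* Register 2: the evaluation context, holding the chain of the rest of
   the environment frames. */
typedef struct {
    heap_frame_t *frames;
} eval_context_t;

/* "Release" the top frame into the heap (e.g. when a closure captures it):
   one allocation and one block copy.  Assumption: the frame is then popped,
   so it becomes the head of the heap chain.  malloc stands in for the
   Boehm GC allocator. */
static heap_frame_t *release_top_frame(env_stack_t *stk, eval_context_t *ctx) {
    size_t n = stk->top_size;
    heap_frame_t *f = malloc(sizeof *f + n * sizeof(SCM));
    assert(f != NULL);
    f->next = ctx->frames;
    f->size = n;
    memcpy(f->slots, stk->sp, n * sizeof(SCM));
    ctx->frames = f;
    stk->sp += n;               /* pop: the stack grows downwards */
    stk->top_size = 0;
    return f;
}
```

Until a frame is closed over, it costs nothing on the heap, which is
the point of Jim's remark; the release itself is one allocation and
one memcpy, because a frame is just a block of SCM values.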