diff --git a/devel/ChangeLog b/devel/ChangeLog deleted file mode 100644 index c258badce..000000000 --- a/devel/ChangeLog +++ /dev/null @@ -1,58 +0,0 @@ -2001-06-27 Thien-Thi Nguyen - - * README: Remove tasks.text. - - * tasks.text: Bye bye (contents folded into ../TODO). - -2001-05-08 Martin Grabmueller - - * modules/module-snippets.texi: Fixed a lot of typos and clarified - some points. Thanks to Neil for the typo+questions patch! - -2001-05-07 Martin Grabmueller - - * modules/module-snippets.texi: New file, documenting the module - system. Placed in `devel' for review purposes. - -2001-03-16 Martin Grabmueller - - * modules: New directory. - - * modules/module-layout.text: New file. - -2000-08-26 Mikael Djurfeldt - - * strings: New directory. - - * strings/sharedstr.text (sharedstr.text): New file. - -2000-08-12 Mikael Djurfeldt - - * translate: New directory. - - * translate/langtools.text: New file. - -2000-05-30 Mikael Djurfeldt - - * tasks.text: Use outline-mode. Added section for tasks in need - of attention. - -2000-05-29 Mikael Djurfeldt - - * tasks.text: New file. - -2000-05-25 Mikael Djurfeldt - - * README: New file. - - * build/snarf-macros.text: New file. - -2000-05-20 Mikael Djurfeldt - - * policy/goals.text, policy/principles.text, policy/plans.text: - New files. - -2000-03-21 Mikael Djurfeldt - - * policy/names.text: New file. 
- diff --git a/devel/README b/devel/README deleted file mode 100644 index 6fb2b4ca5..000000000 --- a/devel/README +++ /dev/null @@ -1,13 +0,0 @@ -Directories: - -policy Guile policy documents - -build Build/installation process - -string Strings and characters - -translation Language translation - -vm Virtual machines - -vm/ior Mikael's ideas on a new type of Scheme interpreter diff --git a/devel/build/snarf-macros.text b/devel/build/snarf-macros.text deleted file mode 100644 index e69de29bb..000000000 diff --git a/devel/modules/module-layout.text b/devel/modules/module-layout.text deleted file mode 100644 index c088ad822..000000000 --- a/devel/modules/module-layout.text +++ /dev/null @@ -1,288 +0,0 @@ -Module Layout Proposal -====================== - -Martin Grabmueller - -Draft: 2001-03-11 - -Version: $Id: module-layout.text,v 1.1 2001-03-16 08:37:37 mgrabmue Exp $ - -* Table of contents - -** Abstract -** Overview -*** What do we have now? -*** What should we change? -** Policy of module separation -*** Functionality -*** Standards -*** Importance -*** Compatibility -** Module naming -*** Scheme -*** Object oriented programming -*** Systems programming -*** Database programming -*** Text processing -*** Math programming -*** Network programming -*** Graphics -*** GTK+ programming -*** X programming -*** Games -*** Multiple names -*** Application modules -** Future ideas - - -* Abstract - -This is a proposal for a new layout of the module name space. The -goal is to reduce (or even eliminate) the clutter in the current ice-9 -module directory, and to provide a clean framework for splitting -libguile into subsystems, grouped by functionality, standards -compliance and maybe other characteristics. - -This is not a completed policy document, but rather a collection of -ideas and proposals which still have to be decided. I will mention my -personal preference, where appropriate, but the final decisions are of -course up to the maintainers. 
- - -* Overview - -Currently, new modules are added in an ad-hoc manner to the ice-9 -module name space when the need for them arises. I think that was -mainly because no other directory for installed Scheme modules was -created. With the integration of GOOPS, the new top-level module -directory oop was introduced, and we should follow this practice for -other subsystems which share functionality. - -DISCLAIMER: Please note that I am no expert on Guile's module system, -so be patient with me and correct me where I got anything wrong. - -** What do we have now? - -The module (oop goops) contains all functionality needed for -object-oriented programming with Guile (with a few exceptions in the -evaluator, which are clearly needed for performance). - -Except for the procedures in the module (ice-9 rdelim), all Guile -primitives are currently located in the root module (I think it is the -module (guile)), and some procedures defined in `boot-9.scm' are -installed in the module (guile-user). - -** What should we change? - -In the core, there are a lot of primitive procedures which can cleanly -be grouped into subsystems, and then grouped into modules. That would -make the core library more maintainable, would ease separate testing -of subsystems and clean up dependencies between subsystems. - - -* Policy of module separation - -There are several possibilities to group procedures into modules. - -- They could be grouped by functionality. -- They could be grouped by standards compliance. -- They could be grouped by level of importance. - -One important group of modules should of course be provided -additionally: - -- Compatibility modules. - -So the first thing to decide is: Which of these policies should we -adopt? Personally, I think that it is not possible to cleanly use -exactly one of the possibilities; we will probably use a mixture of -them. 
I propose to group by functionality, and maybe use some -`bridge-modules', which make functionality available when the user -requests the modules for a given standard. - -** Functionality - -Candidates for the first possibility are groups of procedures, which -already are grouped in source files, such as - -- Regular expression procedures. -- Network procedures. -- Systems programming procedures. -- Random number procedures. -- Math/numeric procedures. -- String-processing procedures. -- List-processing procedures. -- Character handling procedures. -- Object-oriented programming support. - -** Standards - -Guile now complies with R5RS, and I think that the procedures required -by this standard should always be available to the programmer. -People who do not want them could always create :pure modules when -they need them. - -On the other hand, the SRFI procedures fit nicely into a `group by -standards' scheme. An example which is already provided is the -SRFI-8 syntax `receive'. Following that, we could provide two modules -for each SRFI, one named after the SRFI (like `srfi-8') and one named -after the main functionality (`receive'). - -** Importance - -By importance, I mean `how important are procedures for the average -Guile user'. That means that procedures which are only useful to a -small group of users (the Guile developers, for example) should not be -immediately available at the REPL, so that they do not confuse the user -when they appear in the `apropos' output or the tab completion. - -A good example would be debugging procedures (which also could be -added with a special command-line option), or low-level system calls. - -** Compatibility - -This group is for modules providing compatibility procedures. An -example would be a module for old string-processing procedures, which -could someday get overridden by incompatible SRFI procedures of the -same name. 
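A minimal sketch of the `bridge-module' idea from the Standards section, assuming the module name (scheme srfi-8) from this proposal (hypothetical, not an installed module). The SRFI-named module contains no code of its own; it only re-exports `receive' from (ice-9 receive), where Guile already keeps it.

```scheme
;; Bridge module sketch.  The name (scheme srfi-8) comes from this
;; proposal and is hypothetical; (ice-9 receive) is the existing
;; module that implements `receive'.
(define-module (scheme srfi-8)
  #:use-module (ice-9 receive)
  #:re-export (receive))
```

A user who thinks in SRFI numbers could then say (use-modules (scheme srfi-8)) and get the same binding as a user who asks for the `receive' module by name.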
- - -* Module naming - -Provided we choose to take the `group by functionality' approach, I -propose the following naming hierarchy (some of them were actually -suggested by Mikael Djurfeldt). - -- Scheme language related in (scheme) -- Object oriented programming in (oop) -- Systems programming in (system) -- Database programming in (database) -- Text processing in (text) -- Math/numeric programming in (math) -- Network programming in (network) -- Graphics programming in (graphics) -- GTK+ programming in (gtk) -- X programming in (xlib) -- Games in (games) - -The layout of sub-hierarchies is up to the writers of modules, we -should not enforce a strict policy here, because we can not imagine -what will happen in this area. - -** Scheme - -(scheme r5rs) Complete R5RS procedures set. -(scheme safe) Safe modules. -(scheme srfi-1) List processing. -(scheme srfi-8) Multiple values via `receive'. -(scheme receive) ditto. -(scheme and-let-star) and-let* -(scheme syncase) syntax-case hygienic macros (maybe included in - (scheme r5rs)?). -(scheme slib) SLIB, for historic reasons in (scheme). - -** Object oriented programming - -Examples in this section are -(oop goops) For GOOPS. -(oop goops ...) For lower-level GOOPS functionality and utilities. - -** Systems programming - -(system shell) Shell utilities (glob, system etc). -(system process) Process handling. -(system file-system) Low-level filesystem support. -(system user) getuid, setpgrp, etc. - -_or_ - -(system posix) All posix procedures. - -** Database programming - -In the database section, there should be sub-module hierarchies for -each supported database which contains the low-level code, and a -common database layer, which should unify access to SQL databases via a single interface a la Perl's DBI. - -(database postgres ...) Low-level database functionality. -(database oracle ...) ... -(database mysql ...) ... -(database msql ...) ... -(database sql) Common SQL accessors. -(database gdbm ...) ... 
-(database hashed) Common hashed database accessors (like gdbm). -(database util) Leftovers. - -** Text processing - -(text rdelim) Line oriented in-/output. -(text util) Mangling text files. - -** Math programming - -(math random) Random numbers. -(math primes) Prime numbers. -(math vector) Vector math. -(math algebra) Algebra. -(math analysis) Analysis. -(math util) Leftovers. - -** Network programming - -(network inet) Internet procedures. -(network socket) Socket interface. -(network db) Network database accessors. -(network util) ntohl, htons and friends. - -** Graphics - -(graphics vector) Generalized vector graphics handling. -(graphics vector vrml) VRML parsers etc. -(graphics bitmap) Generalized bitmap handling. -(graphics bitmap ...) Bitmap format handling (TIFF, PNG, etc.). - -** GTK+ programming - -(gtk gtk) GTK+ procedures. -(gtk gdk) GDK procedures. -(gtk threads) gtkthreads. - -** X programming - -(xlib xlib) Low-level XLib programming. - -** Games - -(games robots) GNU robots. - -** Multiple names - -As already mentioned above, I think that some modules should have -several names, to make it easier for the user to get the functionality -she needs. For example, a user could say: `hey, I need the receive -macro', or she could say: `I want to stick to SRFI syntax, so where -the hell is the module for SRFI-8?!?'. - -** Application modules - -We should not enforce policy on applications. So I propose that -application writers should be advised to place modules either in -application-specific directories $PREFIX/share/$APP/guile/... and name -that however they like, or to use the application's name as the first -part of the module name, e.g. (gnucash import), (scwm background), -(rcalc ui). - -* Future ideas - -I have not yet come up with a good idea for grouping modules, which -deal for example with XML processing. 
They would fit into the (text) -module space, because most XML files contain text data, but they would -also fit into (database), because XML files are essentially databases. - -On the other hand, XML processing is such a large field that it -probably is worth its own top-level name space (xml). - - -Local Variables: -mode: outline -End: diff --git a/devel/modules/module-snippets.texi b/devel/modules/module-snippets.texi deleted file mode 100644 index e69de29bb..000000000 diff --git a/devel/policy/goals.text b/devel/policy/goals.text deleted file mode 100644 index e69de29bb..000000000 diff --git a/devel/policy/names.text b/devel/policy/names.text deleted file mode 100644 index e69de29bb..000000000 diff --git a/devel/policy/plans.text b/devel/policy/plans.text deleted file mode 100644 index e69de29bb..000000000 diff --git a/devel/policy/principles.text b/devel/policy/principles.text deleted file mode 100644 index e69de29bb..000000000 diff --git a/devel/strings/sharedstr.text b/devel/strings/sharedstr.text deleted file mode 100644 index f568c3ebf..000000000 --- a/devel/strings/sharedstr.text +++ /dev/null @@ -1,143 +0,0 @@ -Implementation of shared substrings with fresh-copy semantics -============================================================= - -Version: $Id: sharedstr.text,v 1.1 2000-08-26 20:55:21 mdj Exp $ - -Background ----------- - -In Guile, most string operations work on two other data types apart -from strings: shared substrings and read-only strings (which includes -symbols). One of Guile's sub-goals is to be a scripting language in -which string management is important. Read-only strings and shared -substrings were introduced in order to reduce overhead in string -manipulation. - -We now want to simplify the Guile API by removing these two data -types, but keeping performance by allowing ordinary strings to share -storage. 
- -The idea is to let operations like `symbol->string' and `substring' -return a pointer into the original string/symbol, thus avoiding the -need to copy the string. - -Two of the problems which then arise are: - -* If s2 is derived from s1, and therefore shares storage with s1, a - modification to either s1 or s2 will affect the other. - -* Guile is supposed to interact closely with many UNIX libraries in - which the NUL character is used to terminate strings. Therefore - Guile strings contain a NUL character at the end, in addition to the - string length (the latter of which is used by Guile's string - operations). - -The solutions to these problems are to - -* Copy a string with shared storage when it's modified. - -* Copy a string with shared storage when it's being used as argument - to a C library call. (Copying implies inserting an ending NUL - character.) - -But this leads to memory management problems. When is it OK to free -a character array which was allocated for a symbol or a string? - -Abstract description of proposed solution ------------------------------------------ - -Definitions - - STRING = - - SYMBOL = - - CHARRECORD = - - PHASE = black | white - - SHAREDFLAG = private | shared - - CHARS is a character array - - CHARPTR points into it - -Memory management - -A string or symbol is initially allocated with its contents stored in -a character array in a character record. The string/symbol header -contains a pointer to this record. The initial value of the shared -flag in the character record is `private'. - -The GC mark phases alternate between black and white---every second -phase is black, the rest are white. 
This is used to distinguish -whether a character record has been encountered before: - -During a black mark phase, when the GC encounters a string or symbol, -it changes the PHASE and SHAREDFLAG marks of the corresponding -character record according to the following table: - - --> (white => unconditionally - --> set to ) - --> (SHAREDFLAG changed) - --> (no change) - -The behaviour of a white phase is equivalent, with the color names -switched. - -The GC sweep phase frees any unmarked string or symbol header and -frees its character record either if it is marked with the "wrong" -color (not matching the color of the last mark phase) or if its -SHAREDFLAG is `private'. - -Copy-on-write - -An attempt at mutating string contents leads to copying if SHAREDFLAG -is `shared'. Copying means making a copy of the character record and -mutating the CHARRECORDPTR and CHARPTR fields of the object header to -point to the copy. - -Substring operation - -When making a substring, a new string header is allocated, with new -contents for the LENGTH and CHARPTR fields. - -Implementation details ----------------------- - -* We store the character record consecutively with the character - array and lump the PHASE and SHAREDFLAG fields together into one - byte containing an integer code for the four possible states of the - PHASE and SHAREDFLAG fields. Another way of viewing it is that - these fields are represented as bits 1 and 0 in the "header" of the - character array. We let CHARRECORDPTR point to the first character - position instead of on this header: - - CHARRECORDPTR - | - V - FCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC - - F = 0, 1, 2, 3 - -* We represent strings as the sub-types `simple-string' and - `substring'. - -* In a simple string, CHARRECORDPTR and CHARPTR are represented by a - single pointer, so a `simple-string' is an ordinary heap cell with - TYPETAG and LENGTH in the CAR and CHARPTR in the CDR. 
- -* substring:s are represented as double cells, with TYPETAG and LENGTH - in word 0, CHARRECORDPTR in word 1 and CHARPTR in word 2 - (alternatively, we could store an offset from CHARRECORDPTR). - -Problems with this implementation ---------------------------------- - -* How do we make copy-on-write thread-safe? Is there a different - implementation which is efficient and thread-safe? - -* If small substrings are frequently generated from large, temporary - strings and the small substrings are kept in a data structure, the - heap will still have to host the large original strings. Should we - simply accept this? diff --git a/devel/translation/langtools.text b/devel/translation/langtools.text deleted file mode 100644 index ac07ce034..000000000 --- a/devel/translation/langtools.text +++ /dev/null @@ -1,592 +0,0 @@ -* Introduction - -Version: $Id: langtools.text,v 1.5 2000-08-13 04:47:26 mdj Exp $ - -This is a proposal for how Guile could interface with language -translators. It will be posted on the Guile list and revised for some -short time (days rather than weeks) before being implemented. - -The document can be found in the CVS repository as -guile-core/devel/translation/langtools.text. All Guile developers are -welcome to modify and extend it according to the ongoing discussion -using CVS. - -Ideas and comments are welcome. - -For clarity, the proposal is partially written as if describing an -already existing system. - -MDJ 000812 - -* Language names - -A translator for Guile is a certain kind of Guile module, implemented -in Scheme, C, or a mixture of both. - -To make things simple, the name of the language is closely related to -the name of the translator module. - -Languages have long and short names. The long form is simply the name -of the translator module: `(lang ctax)', `(lang emacs-lisp)', -`(my-modules foo-lang)' etc. - -Languages with the long name `(lang IDENTIFIER)' can be referred to -with the short name IDENTIFIER, for example `emacs-lisp'. 
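The relation between long and short language names can be sketched in a few lines of Scheme. The two helper names below are hypothetical and not part of Guile; they only restate the rule that a short name IDENTIFIER stands for the long name (lang IDENTIFIER), while any other long name is used as given.

```scheme
;; Hypothetical helpers, not part of Guile.

;; Expand a short name (a symbol) to its long form; a long name
;; (a list of symbols) is returned unchanged.
(define (language-long-name name)
  (if (symbol? name)
      (list 'lang name)
      name))

;; Return the short name for a long name of the form (lang IDENTIFIER),
;; or #f when the long name has no short form.
(define (language-short-name long-name)
  (and (pair? long-name)
       (eq? (car long-name) 'lang)
       (pair? (cdr long-name))
       (null? (cddr long-name))
       (cadr long-name)))

(language-long-name 'emacs-lisp)              ; => (lang emacs-lisp)
(language-long-name '(my-modules foo-lang))   ; => (my-modules foo-lang)
(language-short-name '(lang ctax))            ; => ctax
(language-short-name '(my-modules foo-lang))  ; => #f
```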
- -* How to tell Guile to read code in a different language (than Scheme) - -There are four methods of specifying which translator to use when -reading a file: - -** Command option - -The options to the guile command are parsed linearly from left to -right. You can change the language at zero or more points using the -option - - -t, --language LANGUAGE - -Example: - - guile -t emacs-lisp -l foo -l bar -t scheme -l baz - -will use the emacs-lisp translator while reading "foo" and "bar", and -the default translator (scheme) for "baz". - -You can use this technique in a script together with the meta switch: - -#!/usr/local/bin/guile \ --t emacs-lisp -s -!# - -** Commentary in file - -When opening a file for reading, Guile will read the first few lines, -looking for the string "-*- LANGNAME -*-", where LANGNAME can be -either the long or short form of the name. - -If found, the corresponding translator is loaded and used to read the -file. - -** File extension - -Guile maintains an alist mapping filename extensions to languages. -Each entry has the form: - - (REGEXP . LANGNAME) - -where REGEXP is a string and LANGNAME a symbol or a list of symbols. - -The alist can be accessed using `language-alist' which is exported -by the module `(core config)': - - (language-alist) --> current alist - (language-alist ALIST) sets the alist to ALIST - (language-alist ALIST :prepend) prepends ALIST onto the current list - (language-alist ALIST :append) appends ALIST after current list - -The `load' command will match filenames against this alist and choose -the translator to use accordingly. - -There will be a default alist for common translators. For translators -not listed, the alist has to be extended in .guile just as Emacs users -extend auto-mode-alist in .emacs. - -** Module header - -You specify the language used by a module with the :language option in -the module header. (See below under "Module configuration language".) 
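The lookup described under "File extension" could look roughly like the sketch below. The table contents and the name `language-for-file' are hypothetical; only `string-match' from (ice-9 regex) is an existing Guile procedure. The sketch assumes that the first matching entry wins and that #f means "no entry matched", neither of which is settled by this proposal.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical alist in the (REGEXP . LANGNAME) form described above.
;; LANGNAME is a symbol (short name) or a list of symbols (long name).
(define example-language-alist
  '(("\\.el$"  . emacs-lisp)
    ("\\.scm$" . scheme)
    ("\\.foo$" . (my-modules foo-lang))))

;; Return the LANGNAME of the first entry whose REGEXP matches
;; FILENAME, or #f if no entry matches (assumed behaviour).
(define (language-for-file filename alist)
  (let loop ((entries alist))
    (cond ((null? entries) #f)
          ((string-match (caar entries) filename) (cdar entries))
          (else (loop (cdr entries))))))

(language-for-file "foo.el" example-language-alist)  ; => emacs-lisp
(language-for-file "README" example-language-alist)  ; => #f
```

Under the first-match assumption, prepending entries (the :prepend option) would let a user override the defaults, much as with auto-mode-alist in Emacs.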
- -* Module system - -This section describes how the Guile module system is adapted for use -with other languages. - -** Module configuration language - -*** The `(config)' module - -Guile has a sophisticated module system. We don't require each -translator implementation to implement its own syntax for modules. -That would be too much work for the implementor, and users would have -to learn the module system anew for each syntax. - -Instead, the module `(config)' exports the module header form -`(define-module ...)'. - -The config module also exports a number of primitives by which you can -customize the Guile library, such as `language-alist' and `load-path'. - -*** Default module environment - -The bindings of the config module are available in the default -interaction environment when Guile starts up. This is because the -config module is on the module use list for the startup environment. - -However, config bindings are *not* available by default in new -modules. - -The default module environment provides bindings from the R5RS module -only. - -*** Module headers - -The module header of the current module system is the form - - (define-module NAME OPTION1 ...) - -You can specify a translator using the option - - :language LANGNAME - -where LANGNAME is the long or short form of language name as described -above. - -The translator is being fed characters from the module file, starting -immediately after the end-parenthesis of the module header form. - -NOTE: There can be only one module header per file. - -It is also possible to put the module header in a separate file and -use the option - - :file FILENAME - -to point out a file containing the actual code. 
- -Example: - -foo.gm: ----------------------------------------------------------------------- -(define-module (foo) - :language emacs-lisp - :file "foo.el" - :export (foo bar) - ) ----------------------------------------------------------------------- - -foo.el: ----------------------------------------------------------------------- -(defun foo () - ...) - -(defun bar () - ...) ----------------------------------------------------------------------- - -** Repl commands - -Up till now, Guile has been dependent upon the available bindings in -the selected module in order to do basic operations such as moving to -a different module, entering the debugger or getting documentation. - -This is not acceptable since we want to be able to control Guile -consistently regardless of in which module we are, and since we don't -want to equip a module with bindings which don't have anything to do -with the purpose of the module. - -Therefore, the repl provides a special command language on top of -whatever syntax the current module provides. (Scheme48 and RScheme -provide similar repl command languages.) - -[Jost Boekemeier has suggested the following alternative solution: - Commands are bindings just like any other binding. It is enough if - some modules carry command bindings (it's in fact enough if *one* - module has them), because from such a module you can use the command - (in MODULE) to walk into a module not carrying command bindings, and - then use CTRL-D to exit. - - However, this has the disadvantage of mixing the "real" bindings with - command bindings (the module might want to use "in" for other - purposes), that CTRL-D could cause problems since for some channels - CTRL-D might close down the connection, and that using one type of - command ("in") to go "into" the module and another (CTRL-D) to "exit" - is more complex than simply "going to" a module.] - -*** Repl command syntax - -Normally, repl commands have the syntax - - ,COMMAND ARG1 ... 
- -Input starting with arbitrary amount of whitespace + a comma thus -works as an escape syntax. - -This syntax is probably compatible with all languages. (Note that we -don't need to activate the lexer of the language until we've checked -if the first non-whitespace char is a comma.) - -(Hypothetically, if this would become a problem, we can provide means -of disabling this behaviour of the repl and let that particular -language module take sole control of reading at the repl prompt.) - -Among the commands available are - -*** ,in MODULE - -Select module named MODULE, that is any new expressions typed by the -user after this command will be evaluated in the evaluation -environment provided by MODULE. - -*** ,in MODULE EXPR - -Evaluate expression EXPR in MODULE. EXPR has the syntax supplied by -the language used by MODULE. - -*** ,use MODULE - -Import all bindings exported by MODULE to the current module. - -* Language modules - -Since code written in any kind of language should be able to implement -most tasks, which may include reading, evaluating and writing, and -generally computing with, expressions and data originating from other -languages, we want the basic reading, evaluation and printing -operations to be independent of the language. - -That is, instead of supplying separate `read', `eval' and `write' -procedures for different languages, a language module is required to -use the system procedures in the translated code. - -This means that the behaviour of `read', `eval' and `write' are -context dependent. (See further "How Guile system procedures `read', -`eval', `write' use language modules" below.) - -** Language data types - -Each language module should try to use the fundamental Scheme data -types as far as this is possible. - -Some data types have important differences in semantics between -languages, though, and all required data types may not exist in -Guile. - -In such cases, the language module must supply its own, distinct, data -types. 
So, each language supported by Guile uses a certain set of -data types, with the basic Scheme data types as the intersection -between all sets. - -Specifically, syntax trees representing source code expressions should -normally be a distinct data type. - -** Foreign language escape syntax - -Note that such data can flow freely between modules. In order to -accommodate data with different native syntaxes, each language module -provides a foreign language escape syntax. In Scheme, this syntax -uses the sharp comma extension specified by SRFI-10. The read -constructor is simply the last symbol in the long language name (which -is usually the same as the short language name). - -** Example 1 - -Characters have the syntax in Scheme and in ctax. Lists currently -have syntax in Scheme but lack ctax syntax. Ctax doesn't have a -datatype "enum", but we pretend it has for this example. - -The following table now shows the syntax used for reading and writing -these expressions in module A using the language scheme, and module B -using the language ctax (we assume that the foreign language escape -syntax in ctax is #LANGUAGE EXPR): - - A B - -chars #\X 'X' - -lists (1 2 3) #scheme (1 2 3) - -enums #,(ctax ENUM) ENUM - -** Example 2 - - A user is typing expressions in a ctax module which imports the - bindings x and y from the module `(foo)': - - ctax> x = read (); - 1+2; - 1+2; - ctax> x - 1+2; - ctax> y = 1; - 1 - ctax> y; - 1 - ctax> ,in (guile-user) - guile> ,use (foo) - guile> x - #,(ctax 1+2;) - guile> y - 1 - guile> - -The example shows that ctax uses a distinct representation for ctax -expressions, but Scheme integers for integers. - -** Language module interface - -A language module is an ordinary Guile module importing bindings from -other modules and exporting bindings through its public interface. 
- -It is required to export the following variable and procedures: - -*** language-environment --> ENVIRONMENT - -Returns a fresh top-level ENVIRONMENT (a module) where expressions -in this language are evaluated by default. - -Modules using this language will by default have this environment -on their use list. - -The intention is for this procedure to provide the "run-time -environment" for the language. - -*** native-read PORT --> OBJECT - -Read next expression in the foreign syntax from PORT and return an -object OBJECT representing it. - -It is entirely up to the language module to define what one -expression is, that is, how much to read. - -In lisp-like languages, `native-read' corresponds to `read'. Note -that in such languages, OBJECT need not be source code, but could -be data. - -The representation of OBJECT is also chosen by the language -module. It can consist of Scheme data types, data types distinct for -the language, or a mixture. - -There is one requirement, however: Distinct data types must be -instances of a subclass of `language-specific-class'. - -This procedure will be called during interactive use (the user -types expressions at a prompt) and when the system `read' -procedure is called at a time when a module using this language is -selected. - -Some languages (for example Python) parse differently depending on -whether it's an interactive or non-interactive session. Guile provides the -predicate `interactive-port?' to test for this. - -*** language-specific-class - -This variable contains the superclass of all non-Scheme data-types -provided by the language. - -*** native-write OBJECT PORT - -This procedure prints the OBJECT on PORT using the specific -language syntax. - -*** write-foreign-syntax OBJECT LANGUAGE NATIVE-WRITE PORT - -Write OBJECT in the foreign language escape syntax of this module. -The object is specific to language LANGUAGE and can be written using -NATIVE-WRITE. 
- -Here's an implementation for Scheme: - -(define (write-foreign-syntax object language native-write port) - (format port "#,(~A " language) - (native-write object port) - (display #\) port)) - -*** translate EXPRESSION --> SCHEMECODE - -Translate an EXPRESSION into SCHEMECODE. - -EXPRESSION can be anything returned by `read'. - -SCHEMECODE is Scheme source code represented using ordinary Scheme -data. It will be passed to `eval' in an environment containing -bindings in the environment returned by `language-environment'. - -This procedure will be called during interactive use and when the -system `eval - -*** translate-all PORT [ALIST] --> THUNK - -Translate the entire stream of characters PORT until #. -Return a THUNK which can be called repeatedly like this: - - THUNK --> SCHEMECODE - -Each call will yield a new piece of scheme code. The THUNK signals -end of translation by returning the value *end-of-translation* (which -is tested using the predicate `end-of-translation?'). - -The optional argument ALIST provides compilation options for the -translator: - - (debug . #t) means produce code suitable for debugging - -This procedure will be called by the system `load' command and by -the module system when loading files. - -The intentions are: - -1. To let the language module decide when and in how large chunks - to do the processing. It may choose to do all processing at - the time translate-all is called, all processing when THUNK is - called the first time, or small pieces of processing each time - THUNK is called, or any conceivable combination. - -2. To let the language module decide in how large chunks to output - the resulting Scheme code in order not to overload memory. - -3. To enable the language module to use temporary files, and - whole-module analysis and optimization techniques. - -*** untranslate SCHEMECODE --> EXPRESSION - -Attempt to do the inverse of `translate'. An approximation is OK. It -is also OK to return #f. 
This procedure will be called from the -debugger, when generating error messages, backtraces etc. - -The debugger uses the local evaluation environment to determine from -which module an expression comes. This is how the debugger can know -which `untranslate' procedure to call for a given expression. - -(This is used currently to decide which backtrace frames to -display. System modules use the option :no-backtrace to prevent -displaying of Guile's internals to the user.) - -Note that `untranslate' can use source-properties set by `native-read' -to give hints about how to do the reverse translation. Such hints -could for example be the filename, and line and column numbers for the -source expression, or an actual copy of the source expression. - -** How Guile system procedures `read', `eval', `write' use language modules - -*** read - -The idea is that the `read' exported from the R5RS library will -continue to work when called from other languages, and will keep its -semantics. - -A call to `read' simply means "read in an expression from PORT using -the syntax associated with that port". - -Each module carries information about its language. - -When an input port is created for a module to be read or during -interaction with a given module, this information is copied to the -port object. - -read uses this information to call `native-read' in the correct -language module. - -*** eval - -[To be written.] - -*** write - -[To be written.] - -* Error handling - -** Errors during translation - -Errors during translation are generated as usual by calling scm-error -(from Scheme) or scm_misc_error etc (from C). The effect of -throwing errors from within `translate-all' is the same as when they -are generated within a call to the THUNK returned from -`translate-all'. - -scm-error takes a fifth argument. This is a property list (alist) -which you can use to pass extra information to the error reporting -machinery. 
- -Currently, the following properties are supported: - - filename filename of file being translated - line line number of erring expression - column column number - -** Run-time errors (errors in SCHEMECODE) - -This section pertains to what happens when a run-time error occurs -during evaluation of the translated code. - -In order to get "foreign code" in error messages, make sure that -`untranslate' yields good output. Note the possibility of maintaining -a table (preferably using weak references) mapping SCHEMECODE to -EXPRESSION. - -Note the availability of source-properties for attaching filename, -line and column number, and other information, such as EXPRESSION, to -SCHEMECODE. If filename, line, and column properties are defined, -they will be automatically used by the error reporting machinery. - -* Proposed changes to Guile - -** Implement the above proposal. - -** Add new fields `reader' and `translator' to all module objects - -Make sure they are initialized when a language is specified. - -** Use `untranslate' during error handling. - -** Implement the use of arg 5 to scm-error - -(specified in "Errors during translation") - -** Implement a generic lexical analyzer with interface similar to read/rp - -Mikael is working on this. (It might take a few days, since he is -busy with his studies right now.) - -** Remove scm:eval-transformer - -This is replaced by new fields in each module object (environment). - -`eval' will instead directly use the `transformer' field in the module -passed as second arg. - -Internal evaluation will, similarly, use the transformer of the module -representing the top-level of the local environment. - -Note that this level of transformation is something independent of -language translation. *This* is a hook for adding Scheme macro -packages and belongs to the core language. - -We also need to check the new `translator' field, potentially using -it.
- -** Package local environments as smobs - -so that environment list structures can't leak out on the Scheme -level. (This has already been done in SCM.) - -** Introduce new fields in input ports - -These carry state information such as - -*** which keyword syntax to support - -*** whether to be case sensitive or not - -*** which lexical grammar to use - -*** whether the port is used in an interactive session or not - -There will be a new Guile primitive `interactive-port?' testing for this. - -** Move configuration of keyword syntax and case sensitivity to the read-state - -Add new fields to the module objects for these values, so that the -read-state can be initialized from them. - - *fixme* When? Why? How? - -Probably as soon as the language has been determined during file loading. - -Need to figure out how to set these values. - - -Local Variables: -mode: outline -End: diff --git a/devel/vm/ior/ior-intro.text b/devel/vm/ior/ior-intro.text deleted file mode 100644 index e69de29bb..000000000 diff --git a/devel/vm/ior/ior.text b/devel/vm/ior/ior.text deleted file mode 100644 index 9730de55d..000000000 --- a/devel/vm/ior/ior.text +++ /dev/null @@ -1,665 +0,0 @@ -*** -*** These notes about the design of a new type of Scheme interpreter -*** "Ior" are cut out from various emails from early spring 2000. -*** -*** MDJ 000817 -*** - -Generally, we should try to make a design which is clean and -minimalistic in as many respects as possible. For example, even if we -need more primitives than those in R5RS internally, I don't think -these should be made available to the user in the core, but rather be -made available *through* libraries (implementation in core, -publication via library). - -The suggested working name for this project is "Ior" (Swedish name for -the donkey in "Winnie the Pooh" :). If, against the odds, we really -would succeed in producing an Ior, and we find it suitable, we could -turn it into a Guile 2.0 (or whatever).
(The architecture still -allows for support of the gh interface and uses conservative GC (Hans -Böhm's, in fact).) - - Beware now that I'm just sending over my original letter, which is - just a sketch of the more detailed, but cryptic, design notes I made - originally, which are, in turn, not as detailed as the design has - become now. :) - - Please also excuse the lack of structure. I shouldn't work on this at - all right now. Choose for yourselves if you want to read this - unstructured information or if you want to wait until I've structured - it after end of January. - -But then I actually have to blurt out the basic idea of my -architecture already now. (I had hoped to present you with a proper -and fairly detailed spec, but I won't be able to complete such a spec -quickly.) - - -The basic idea is this: - -* Don't waste time on non-computation! - -Why waste a lot of time on type-checks, unboxing and boxing of data? -Neither of these actions does any computation! - -I'd like both interpreter and compiled code to work directly with data -in raw, native form (integers represented as 32bit longs, inexact -numbers as doubles, short strings as bytes in a word, longer strings -as a normal pointer to malloced memory, bignums are just pointers to a -gmp (GNU MultiPrecision library) object, etc.) - -* Don't we need to dispatch on type to know what to do? - -But don't we need to dispatch on the type in order to know how to -compute with the data? E.g., `display' does entirely different -computations on a and a . ( is an integer -between -2^31 and 2^31-1.) - -The answer is *no*, not in 95% of all cases. The main reason is that -the interpreter does type analysis while converting closures to -bytecode, and knows already when _calling_ `display' what type its -arguments have. This means that the bytecode compiler can choose a -suitable _version_ of `display' which handles that particular type.
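As a rough illustration of picking a _version_ of `display' ahead of time, here is a minimal C sketch. Every name in it (static_type, raw_word, display_fixnum, display_string, select_display) is hypothetical and not part of Ior. It assumes the bytecode compiler already knows the argument type, so the selection is done once and the call itself carries no type check.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical: types the analysis may have inferred for an argument. */
typedef enum { T_FIXNUM, T_STRING } static_type;

/* Raw, untagged data: the type is known from the data path, not the value. */
typedef union {
    long fixnum;
    const char *string;
} raw_word;

typedef int (*display_version)(char *buf, size_t n, raw_word w);

/* One version of "display" per argument type; neither inspects a tag. */
static int display_fixnum(char *buf, size_t n, raw_word w)
{
    return snprintf(buf, n, "%ld", w.fixnum);
}

static int display_string(char *buf, size_t n, raw_word w)
{
    return snprintf(buf, n, "%s", w.string);
}

/* Done once, while converting a closure to bytecode. */
display_version select_display(static_type t)
{
    switch (t) {
    case T_FIXNUM: return display_fixnum;
    case T_STRING: return display_string;
    }
    return NULL;
}
```

The function pointer returned by select_display would be stored in the generated code, so that running the code calls the chosen version directly.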
- - - -This type analysis is greatly simplified by the fact that just as the -type analysis _results_ in the type of the argument in the call to -`display', and, thus, we can select the correct _version_ of -`display', the closure byte-code itself will only be one _version_ of -the closure with the types of its arguments fixed at the start of the -analysis. - -As you already have understood by now, the basic architecture is that -all procedures are generic functions, and the "versions" I'm speaking -about are a kind of method. Let's call them "branches" for now. - -For example: - -(define foo - (lambda (x) - ... - (display x) - ...)) - -may result in the following two branches: - -1. [-foo] = - (branch ((x )) - ... - ([-display] x) - ...) - -2. [-foo] = - (branch ((x )) - ... - ([-display] x) - ...) - -and a new closure - -(define bar - (lambda (x y) - ... - (foo x) - ...)) - -results in - -[--bar] = - (branch ((x ) (y )) - ... - ([-foo] x) - ...) - -Note how all type dispatch is eliminated in these examples. - -As a further reinforcement to the type analysis, branches will not -only have typed parameters but also have return types. This means -that the type of a branch will look like - - x ... x --> - -In essence, the entire system will be very ML-like internally, and we -can benefit from the research done on ML-compilation. - -However, we now get three major problems to confront: - -1. In the Scheme language not all situations can be completely type - analyzed. - -2. In particular, for some operations, even if the types of the - parameters are well defined, we can't determine the return type - generically. For example, [--+] may have return - type _or_ . - -3. Even if we can do a complete analysis, some closures will generate - a combinatoric explosion of branches. - - -Problem 1: Incomplete analysis - -We introduce a new type .
This data type has type and -contents - -struct ior_boxed_t { - ior_type *type; /* pointer to class struct */ - void *data; /* generic field, may also contain immediate objects - */ -}; - -For example, a boxed fixnum 4711 has type and contents -{ , 4711 }. The boxed type essentially corresponds to Guile's -SCM type. It's just that the 1 or 3 or 7 or 16-bit type tag has been -replaced with a 32-bit type tag (the pointer to the class structure -describing the type of the object). - -This is less efficient than the SCM type system, but it's no problem -since it won't be used in 95% of all cases. The big advantage -compared to SCM's type system is that it is so simple and uniform. - -I should note here that while SCM and Guile are centered around the -cell representation and all objects either _are_ cells or have a cell -handle, objects in Ior will look more like malloced blocks. This is the -reason why I planned to start with Böhm's GC which has C pointers as -object handles. But it is of course still possible to use a heap, or, -preferably several heaps for different kinds of objects. (Böhm's GC -has multiple heaps for different sizes of objects.) If we write a -custom GC, we can increase speed further. - - -Problem 3 (yes, I skipped :) Combinatoric explosion - -We simply don't generate all possible branches. In the interpreter we -generate branches "just-too-late" (well, it's normally called "lazy -compilation" or "just-in-time", but if it was "in-time", the procedure -would already be compiled when it was needed, right? :) as when Guile -memoizes or when a Java machine turns byte-codes into machine code, or -as when GOOPS turns methods into cmethods for that matter. - -Have you noticed that branches (although still without return type -information) already exist in GOOPS? They are currently called -"cmethods" and are generated on demand from the method code and put -into the GF cache during evaluation of GOOPS code. :-) (I have not -utilized this fully yet.
I plan to soon use this method compilation -(into branches) to eliminate almost all type dispatch in calls to -accessors.) - -For the compiler, we use profiling information, just as the modern GCC -scheduler, or else rely on some type analysis (if a procedure says -(+ x y), x is not normally a but rather some subclass of -) and some common sense (it's usually more important to -generate branches than branches). - -The rest of the cases can be handled by -branches. We can, for -example, have a: - -[--bar] = - (branch ((x ) (y )) - ... - ([-foo] x) - ...) - -[-foo] will use an efficient type dispatch mechanism (for -example akin to the GOOPS one) to select the right branch of -`display'. - - -Problem 2: Ambiguous return type - -If the return type of a branch is ambiguous, we simply define the -return type as , and box data at the point in the branch where -it can be decided which type of data we will return. This is how -things can be handled in the general case. However, we might be able -to handle things in a more neat way, at least in some cases: - -During compilation to byte code, we'll probably use an intermediate -representation in continuation passing style. We might even use a -subtype of branches represented as continuations (not a heavy -representation, as in Guile and SCM, but probably not much more than a -function pointer). This is, for example, one way of handling tail -recursion, especially mutual tail recursion. - -One case where we would like to try really hard not to box data is -when fixnums "overflow into" bignums. - -Let's say that the branch [--bar] contains a form - - (+ x y) - -where the type analyzer knows that x and y are fixnums. We then split -the branch right after the form and let it fork into two possible -continuation branches bar1 and bar2: - -[The following is only pseudo code. It can be made efficient on the C - level. We can also use the asm compiler directive in conditional - compilation for GCC on i386.
We could even let autoconf/automake - substitute an architecture specific solution for multiple - architectures, but still support a C level default case.] - - (if (sum-over/underflow? x y) - (bar1 (fixnum->bignum x) (fixnum->bignum y) ...) - (bar2 x y ...)) - -bar1 begins with the evaluation of the form - - ([--+] x y) - -while bar2 begins with - - ([--+] x y) - -Note that the return type of each of these forms is unambiguous. - - -Now some random points from the design: - -* The basic concept in Ior is the class. A type is a concrete class. - Classes which are subclasses of are concrete, otherwise they - are abstract. - -* A procedure is a collection of methods. Each method can have - an arbitrary number of parameters of arbitrary class (not type). - -* The type of a method is the tuple of its argument classes. - -* The type of a procedure is the set of its method types. - -But the most important new concept is the branch. -Regard the procedure: - -(define (half x) - (quotient x 2)) - -The procedure half will have the single method - - (method ((x )) - (quotient x 2)) - -When `(half 128)' is called the Ior evaluator will create a new branch -during the actual evaluation. I'm now going to extend the branch -syntax by adding a second list of formals: the continuations of the -branch. - -* The type of a branch is namely the tuple of the tuple of its - argument types (not classes!) and the tuple of its continuation - argument types. The branch generated above will be: - - (branch ((x )) ((c )) - (c (quotient x 2))) - - If the method - - (method ((x ) (y )) - (quotient (+ x 1) y)) - - is called with arguments 1 and 2 it results in the branch - - (branch ((x ) (y )) ((c1 ) (c2 )) - (quotient (+ x 1 c3) 2)) - - where c3 is: - - (branch ((x ) (y )) ((c )) - (quotient (+ (fixnum->bignum x) 1) 2)) - -The generated branches are stored in a cache in the procedure object. - - -But wait a minute! What about variables and data structures?
- -In essence, what we do is that we fork up all data paths so that they -can be typed: We put the type tags on the _data paths_ instead of on -the data itself. You can look upon the "branches" as tubes of -information where the type tag is attached to the tube instead of on -what passes through it. - -Variables and data structures are part of the "tubes", so they need to -be typed. For example, the generic pair looks like: - -(define-class () - car-type - car - cdr-type - cdr) - -But note that since car and cdr are generic procedures, we can let -more efficient pairs exist in parallel, like - -(define-class () - (car (class )) - (cdr (class ))) - -Note that instances of this last type take only two words of memory! -They are easy to use too. We can't use `cons' or `list' to create -them, since these procedures can't assume immutability, but we don't -need to specify the type in our program. Something like - - (const-cons 1 x) - -where x is in the data flow path tagged as , or - - (const-list 1 2 3) - - -Some further notes: - -* The concepts module and instance are the same thing. Using other - modules means 1. creating a new module class which inherits the - classes of the used modules and 2. instantiating it. - -* Module definitions and class definitions are equivalent but - different syntactic sugar adapted for each kind of use. - -* (define x 1) means: create an instance variable which is itself a - subclass of with initial value 1 (which is an instance of - ). - - -The interpreter is a mixture between a stack machine and a register -machine. The evaluator looks like this... :) - - /* the interpreter! */ - if (!setjmp (ior_context->exit_buf)) -#ifndef i386_GCC - while (1) -#endif - (*ior_continue) (IOR_MICRO_OP_ARGS); - -The branches are represented as an array of pointers to micro -operations. In essence, the evaluator doesn't exist in itself, but is -folded out over the entire implementation. This allows for an extreme -form of modularity!
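To make the loop above concrete, here is a minimal, portable C sketch of a branch as an array of pointers to micro operations. It is not the Ior source: vm, micro_op, op_push_one, op_add, op_end and run are hypothetical names, and the context is reduced to a continue register, a small data stack and an exit buffer. Each micro operation does its work and advances the continue register itself, and the last one leaves the loop with longjmp.

```c
#include <setjmp.h>

typedef struct vm vm;
typedef void (*micro_op)(vm *);

struct vm {
    const micro_op *next;  /* "continue" register: the next micro operation */
    long stack[16];        /* data stack for arguments and results */
    long *sp;              /* first free slot on the data stack */
    jmp_buf exit_buf;      /* how the final micro operation leaves the loop */
};

/* Push the constant 1 and step to the next micro operation. */
static void op_push_one(vm *m)
{
    *m->sp++ = 1;
    m->next++;
}

/* Pop two raw integers, push their sum, step on. */
static void op_add(vm *m)
{
    long b = *--m->sp;
    long a = *--m->sp;
    *m->sp++ = a + b;
    m->next++;
}

/* End of branch: jump back out of the dispatch loop. */
static void op_end(vm *m)
{
    longjmp(m->exit_buf, 1);
}

/* The whole "evaluator": keep calling whatever the continue register names. */
long run(vm *m, const micro_op *branch)
{
    m->next = branch;
    m->sp = m->stack;
    if (!setjmp(m->exit_buf))
        while (1)
            (*m->next)(m);
    return m->sp[-1];
}
```

A branch such as { op_push_one, op_push_one, op_add, op_end } leaves 2 on top of the data stack. The context is passed in by pointer, as in the original loop, so no local variable of run is modified between setjmp and longjmp.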
- -The i386_GCC is a machine specific optimization which avoids all -unnecessary popping and pushing of the CPU stack (which is different -from the Ior data stack). - -The execution environment consists of - -* a continue register similar to the program counter in the CPU -* a data stack (where micro operation arguments and results are stored) -* a linked chain of environment frames (but look at exception below!) -* a dynamic context - -I've written a small baby Ior which uses Guile's infrastructure. -Here's the context from that baby Ior: - -typedef struct ior_context_t { - ior_data_t *env; /* rest of environment frames */ - ior_cont_t save_continue; /* saves or represents continuation */ - ior_data_t *save_env; /* saves or represents environment */ - ior_data_t *fluids; /* array of fluids (use GC_malloc!) */ - int n_fluids; - int fluids_size; - /* dynwind chain is stored directly in the environment, not in context */ - jmp_buf exit_buf; - IOR_SCM guile_protected; /* temporary */ -} ior_context_t; - -There's an important exception regarding the lowest environment -frame. That frame isn't stored in a separate block on the heap, but -on Ior's data stack. Frames are copied out onto the heap when -necessary (for example when closures "escape"). - - -Now a concrete example: - -Look at: - -(define sum - (lambda (from to res) - (if (= from to) - res - (sum (+ 1 from) to (+ from res))))) - -This can be rewritten into CPS (which captures a lot of what happens -during flow analysis): - -(define sum - (lambda (from to res c1) - (let ((c2 (lambda (limit?) - (let ((c3 (lambda () - (c1 res))) - (c4 (lambda () - (let ((c5 (lambda (from+1) - (let ((c6 (lambda (from+res) - (sum from+1 to from+res c1)))) - (_+ from res c6))))) - (_+ 1 from c5))))) - (_if limit? 
c3 c4))))) - (_= from to c2)))) - -Finally, after branch expansion, some optimization, code generation, -and some optimization again, we end up with the byte code for the two -branches (here marked by labels `sum' and `sumbig'): - - c5 - (ref -3) - (shift -1) - (+ c4big) - ;; c4 - (shift -2) - (+ 1 sumbig) - ;; c6 - sum - (shift 3) - (ref2 -3) - ;; c2 - (if!= c5) - ;; c3 - (ref -1) - ;; c1 - (end) - - c5big - (ref -3) - (shift -1) - (+ ) - c4big - (shift -2) - (+ 1) - ;; c6 - sumbig - (shift 3) - (ref2 -3) - ;; c2 - (= ) - (if! c5big) - ;; c3 - (ref -1) - ;; c1 - (end) - -Let's take a closer look at the (+ 1 sumbig) micro -operation. The generated assembler from the Ior C source + machine -specific optimizations for i386_GCC looks like this (with some rubbish -deleted): - -ior_int_int_sum_intbig: - movl 4(%ebx),%eax ; fetch arg 2 - addl (%ebx),%eax ; fetch arg 1 and do the work! - jo ior_big_sum_int_int ; dispatch to other branch on overflow - movl %eax,(%ebx) ; store result in first environment frame - addl $8,%esi ; increment program counter - jmp (%esi) ; execute next opcode - -ior_big_sum_int_int: - -To clarify: This is output from the C compiler. I added the comments -afterwards. - -The source currently looks like this: - -IOR_MICRO_BRANCH_2_2 ("+", int, big, sum, int, int, 1, 0) -{ - int res = IOR_ARG (int, 0) + IOR_ARG (int, 1); - IOR_JUMP_OVERFLOW (res, ior_big_sum_int_int); - IOR_NEXT2 (z); -} - -where the macros allow for different definitions depending on if we -want to play pure ANSI or optimize for a certain machine/compiler. - -The plan is actually to write all source in the Ior language and write -Ior code to translate the core code into bootstrapping C code. - -Please note that if i386_GCC isn't defined, we run plain portable ANSI C. - - -Just one further note: - -In Ior, there are three modes of evaluation - -1. evaluating and type analyzing (these go in parallel) -2. code generation -3.
executing byte codes - -It is mode 3 which is really fast in Ior. - -You can look upon your program as a web of branch segments where one -branch segment can be generated from fragments of many closures. Mode -switches don't occur at the procedure borders, but at "growth -points". I don't have time to define them here, but they are based -upon the idea that the continuation together with the type signature -of the data flow path is unique. - -We normally run in mode 3. When we come to a source growth point -(essentially an apply instruction) for uncompiled code we "dive out" -of mode 3 into mode 1 which starts to eval/analyze code until we come -to a "sink". When we reach the "sink", we have enough information -about the data path to do code generation, so we backtrack to the -source growth point and grow the branch between source and sink. -Finally, we "dive into" mode 3! - -So, code generation doesn't respect procedure borders. We instead get -a very neat kind of inlining, which, e.g., means that it is OK to use -closures instead of macros in many cases. ----------------------------------------------------------------------- -Ior and module system -===================== - -What, exactly, should the module system of Ior look like? - -There is this general issue of whether to have a single-dispatch or -multi-dispatch system. Personally, I see that Scheme already uses -multi-dispatch. Compare (+ 1.0 2) and (+ 1 2.0). - -As you've seen if you've read the notes about Ior design, efficiency -is not an issue here, since almost all dispatch will be eliminated -anyway. - -Also, note an interesting thing: GOOPS actually has a special, -implicit, argument to all of its methods: the lexical environment. -It would be very ugly to add a second, special, argument to this. - -Of course, the theoreticians have already recognised this, and in many -systems, the implicit argument (the object) and the environment for -the method are the same thing.
- -I think we should especially take impressions from Matthias Blume's -module/object system. - -The idea, now, for Ior (remember that everything about Ior is -negotiable between us) is that a module is a type, as well as an -instance of that type. The idea is that we basically keep the GOOPS -style of methods, with the implicit argument being the module object -(or some other lexical environment, in a chain with the module as -root). - -Let's say now that module C uses modules A and B. Modules A and B -both export the procedure `foo'. But A:foo and B:foo have different -sets of methods. - -What does this mean? Well, it obviously means that the procedure -`foo' in module C is a subtype of A:foo and B:foo. Note how this is -similar in structure to slot inheritance: When class C is created with -superclasses A and B, the properties of a slot in C are created -through slot inheritance. One way of interpreting variable foo in -module A is as a slot with init value foo. Through the MOP, we can -specify that procedure slot inheritance in a module class implies -creation of new init values through inheritance. - -This may look like a kludge, and perhaps it is, and, sure, we are not -going to accept any kludges in Ior. But, it might actually not be a -kludge... - -I think it is commonly accepted by computer scientists that a module, -and/or at least a module interface is a type. Again, this type can be -seen as the set of types of the functions in the interface. The types -of our procedures are the set of branch types they provide. It is then -natural that a module using two other modules creates new procedure -types by folding. - -This thing would become less cloudy (yes, this is a cloudy part of my -reasoning; I meant previously that the interpreter itself is now -clear) if module interfaces were required to be explicitly typed. - -Actually, this would fit much better together with the rest of Ior's -design.
On one hand, we might be free to introduce such a restriction -(compiler writers would applaud it), since R5RS hasn't specified any -module system. On the other hand, it might be strange to require -explicit typing when Scheme is fundamentally implicitly typed... - -We also have to consider that a module has an "inward" face, which is -one type, and possibly many "outward" faces, which are different -types. (Compare the idea of "interfaces" in Scheme48.) - -It thus seems that, while a module can truly be an Ior class, the -reverse should probably not hold in the general case... - -Unless - - instance <-> module proper - class of the instance <-> "inward interface" - superclasses <-> "outward interfaces + inward uses" - -...hmm, is this possible to reconcile with Rees' object system? - -Please think about these issues. We should try to end up with a -beautiful and consistent object/module system. - ----------------------------------------------------------------------- - -Here's a difficult problem in Ior's design: - -Let's say that we have a mutable data structure, like an ordinary -list. Since, in Ior, the type tag (which is really a pointer to a -class structure) is stored separately from the data, it is thinkable -that another thread modifies the location in the list between when our -thread reads the type tag and when it reads the data. - -The reading of type and data must be made atomic in some way. -Probably, some kind of locking of the heap is required. It's just -that it may cause a lot of overhead to lock the heap at every *read* -from a mutable data structure. - -Look how much trouble those set!-operations cause! Not only does it -force us to store type tags for each car and cdr in the list, but it -also forces a lot of explicit dispatch to be done, and causes troubles -in a threaded system...
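One way to picture the atomicity requirement is the following C sketch. It is only an assumption about how such a read could be protected, not a design decision from these notes: it swaps the heap lock for a small per-slot spinlock built on C11 atomic_flag, and the names slot, slot_set, slot_ref, FIXNUM_CLASS and BIGNUM_CLASS are hypothetical stand-ins (the two arrays stand in for class structures).

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for class structures. */
static const char FIXNUM_CLASS[] = "fixnum";
static const char BIGNUM_CLASS[] = "bignum";

/* A mutable location whose type tag is stored next to, not inside, the data. */
typedef struct {
    atomic_flag lock;
    const char *type;
    long data;
} slot;

static void slot_lock(slot *s)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire)) {
    }
}

static void slot_unlock(slot *s)
{
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/* Writer: tag and data change together while the lock is held. */
void slot_set(slot *s, const char *type, long data)
{
    slot_lock(s);
    s->type = type;
    s->data = data;
    slot_unlock(s);
}

/* Reader: tag and data are copied out as one consistent pair. */
void slot_ref(slot *s, const char **type, long *data)
{
    slot_lock(s);
    *type = s->type;
    *data = s->data;
    slot_unlock(s);
}
```

A slot would be initialized as `slot s = { ATOMIC_FLAG_INIT, FIXNUM_CLASS, 1 };`. The sketch also shows the cost the text worries about: every read from a mutable location now pays for an atomic operation.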
- ----------------------------------------------------------------------- - -Jim Blandy writes: - -> We also should try to make less work for the GC, by avoiding consing -> up local environments until they're closed over. - -Did the texts which I sent to you talk about Ior's solution? - -It basically is: Use *two* environment "arguments" to the evaluator -(in Ior, they aren't arguments but registers): - -* One argument is a pointer to the "top" of an environment stack. - This is used in the "inner loop" for very efficient access to - in-between results. The "top" segment of the environment stack is - also regarded as the first environment frame in the lexical - environment. ("top" is bottom on a stack which grows downwards) - -* The other argument points to a structure holding the evaluation - context. In this context, there is a pointer to the chain of the - rest of the environment frames. Note that since frames are just - blocks of SCM values, you can very efficiently "release" a frame - into the heap by block copying it (remember that Ior uses Boehm's GC; - this is how we allocate the block).
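The "release" step can be sketched in C as a plain block copy. This is a minimal illustration under stated assumptions: value and release_frame are hypothetical names, a long stands in for an SCM value, and malloc stands in for the collector's allocator (with the Boehm collector the block would come from GC_malloc instead, and would never be freed by hand).

```c
#include <stdlib.h>
#include <string.h>

/* Stand-in for an SCM value. */
typedef long value;

/* Copy the n_slots values of the top stack frame into a fresh heap block.
   Returns NULL if the allocation fails. */
value *release_frame(const value *top, size_t n_slots)
{
    value *frame = malloc(n_slots * sizeof *frame);
    if (frame != NULL)
        memcpy(frame, top, n_slots * sizeof *frame);
    return frame;
}
```

After the copy, the heap block can be linked into the chain of environment frames held by the context, and the stack segment can be reused.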