From f888a1b586b63a593e7da16fd7b8f0e82b72b2a3 Mon Sep 17 00:00:00 2001
From: Mikael Djurfeldt <djurfeldt@nada.kth.se>
Date: Sun, 13 Aug 2000 02:31:46 +0000
Subject: [PATCH] *** empty log message ***

---
 devel/translation/langtools.text | 374 ++++++++++++++++++++++++++-----
 1 file changed, 323 insertions(+), 51 deletions(-)

diff --git a/devel/translation/langtools.text b/devel/translation/langtools.text
index d0d47e277..1c776e383 100644
--- a/devel/translation/langtools.text
+++ b/devel/translation/langtools.text
@@ -1,5 +1,7 @@
 * Introduction
 
+Version: $Id: langtools.text,v 1.2 2000-08-13 02:31:46 mdj Exp $
+
 This is a proposal for how Guile could interface with language
 translators.  It will be posted on the Guile list and revised for some
 short time (days rather than weeks) before being implemented.
@@ -86,8 +88,49 @@ by the module `(core config)':
 The `load' command will match filenames against this alist and choose
 the translator to use accordingly.
 
+There will be a default alist for common translators.  For translators
+not listed, the alist has to be extended in .guile just as Emacs users
+extend auto-mode-alist in .emacs.
+
 ** Module header
 
+You specify the language used by a module with the :language option in
+the module header.  (See below under "Module configuration language".)
+
+* Module system
+
+This section describes how the Guile module system is adapted for use
+with other languages.
+
+** Module configuration language
+
+*** The `(config)' module
+
+Guile has a sophisticated module system.  We don't require each
+translator implementation to implement its own syntax for modules.
+That would be too much work for the implementor, and users would have
+to learn the module system anew for each syntax.
+
+Instead, the module `(config)' exports the module header form
+`(define-module ...)'.
+
+The config module also exports a number of primitives by which you can
+customize the Guile library, such as `language-alist' and `load-path'.
+
+*** Default module environment
+
+The bindings of the config module are available in the default
+interaction environment when Guile starts up.  This is because the
+config module is on the module use list for the startup environment.
+
+However, config bindings are *not* available by default in new
+modules.
+
+The default module environment provides bindings from the R5RS module
+only.
+
+*** Module headers
+
 The module header of the current module system is the form
 
   (define-module NAME OPTION1 ...)
@@ -131,82 +174,303 @@ foo.el:
   ...)
 ----------------------------------------------------------------------
 
+** Repl commands
+
+Up till now, Guile has been dependent upon the available bindings in
+the selected module in order to do basic operations such as moving to
+a different module, entering the debugger or getting documentation.
+
+This is not acceptable since we want to be able to control Guile
+consistently regardless of which module we are in, and since we don't
+want to equip a module with bindings which don't have anything to do
+with the purpose of the module.
+
+Therefore, the repl provides a special command language on top of
+whatever syntax the current module provides.  (Scheme48 and RScheme
+provide similar repl command languages.)
+
+*** Repl command syntax
+
+Normally, repl commands have the syntax
+
+  ,COMMAND ARG1 ...
+
+Input starting with an arbitrary amount of whitespace followed by a
+comma thus works as an escape syntax.
+
+This syntax is probably compatible with all languages.  (Note that we
+don't need to activate the lexer of the language until we've checked
+if the first non-whitespace char is a comma.)
+
+(Hypothetically, if this becomes a problem, we can provide means
+of disabling this behaviour of the repl and let that particular
+language module take sole control of reading at the repl prompt.)
+
+Among the commands available are
+
+*** ,in MODULE
+
+Select the module named MODULE; that is, any new expressions typed by
+the user after this command will be evaluated in the evaluation
+environment provided by MODULE.
+
+*** ,in MODULE EXPR
+
+Evaluate expression EXPR in MODULE.  EXPR has the syntax supplied by
+the language used by MODULE.
+
+*** ,use MODULE
+
+Import all bindings exported by MODULE to the current module.
+
 * Language modules
 
+Since code written in any kind of language should be able to implement
+most tasks, which may include reading, evaluating and writing, and
+generally computing with, expressions and data originating from other
+languages, we want the basic reading, evaluation and printing
+operations to be independent of the language.
+
+That is, instead of supplying separate `read', `eval' and `write'
+procedures for different languages, a language module is required to
+use the system procedures in the translated code.
+
+This means that the behaviour of `read', `eval' and `write' is
+context dependent.  (See further "How Guile system procedures `read',
+`eval', `write' use language modules" below.)
+
+** Language data types
+
+Each language module should try to use the fundamental Scheme data
+types as far as this is possible.
+
+Some data types have important differences in semantics between
+languages, though, and all required data types may not exist in
+Guile.
+
+In such cases, the language module must supply its own, distinct, data
+types.  So, each language supported by Guile uses a certain set of
+data types, with the basic Scheme data types as the intersection
+between all sets.
+
+Specifically, syntax trees representing source code expressions should
+normally be a distinct data type.
+
+** Foreign language escape syntax
+
+Note that such data can flow freely between modules.  In order to
+accommodate data with different native syntaxes, each language module
+provides a foreign language escape syntax.  In Scheme, this syntax
+uses the sharp comma extension specified by SRFI-10.  The read
+constructor is simply the last symbol in the long language name (which
+is usually the same as the short language name).
+
+** Example1
+
+Characters have syntax in both Scheme and ctax.  Lists currently
+have syntax in Scheme but lack ctax syntax.  Enums have syntax in ctax
+but lack Scheme syntax.
+
+The following table now shows the syntax used for reading and writing
+these expressions in module A using the language scheme, and module B
+using the language ctax (we assume that the foreign language escape
+syntax in ctax is #LANGUAGE EXPR):
+
+	  A		   B
+
+chars	  #\X		   'X'
+
+lists	  (1 2 3)	   #scheme (1 2 3)
+
+enums	  #,(ctax ENUM)	   ENUM
+
+** Example2
+
+  A user is typing expressions in a ctax module which imports the
+  bindings x and y from the module `(foo)':
+
+  ctax> x = read ();
+  1+2;
+  1+2;
+  ctax> x
+  1+2;
+  ctax> y = 1;
+  1
+  ctax> y;
+  1
+  ctax> ,in (guile-user)
+  guile> ,use (foo)
+  guile> x
+  #,(ctax 1+2;)
+  guile> y
+  1
+  guile>
+
+The example shows that ctax uses a distinct representation for ctax
+expressions, but Scheme integers for integers.
+
+** Language module interface
+
 A language module is an ordinary Guile module importing bindings from
 other modules and exporting bindings through its public interface.
 
-It is required to export the following procedures:
+It is required to export the following variable and procedures:
 
-  language-environment --> ENVIRONMENT
+*** language-environment --> ENVIRONMENT
 
-    Returns a fresh top-level ENVIRONMENT (a module) where expressions
-    in this language are evaluated by default.
+Returns a fresh top-level ENVIRONMENT (a module) where expressions
+in this language are evaluated by default.
 
-    Modules using this language will by default have this environment
-    on their use list.
+Modules using this language will by default have this environment
+on their use list.
 
-    The intention is for this procedure to provide the "run-time
-    environment" for the language.
+The intention is for this procedure to provide the "run-time
+environment" for the language.
 
-  read-expression PORT --> EXPRESSION
+*** native-read PORT --> OBJECT
 
-    Read next expression in the foreign syntax from PORT and return an
-    object EXPRESSION representing it.
+Read next expression in the foreign syntax from PORT and return an
+object OBJECT representing it.
 
-    It is entirely up to the language module to define what one
-    expression is.  The representation of EXPRESSION is also chosen by
-    the language module.
+It is entirely up to the language module to define what one
+expression is, that is, how much to read.
 
-    This procedure will be called during interactive use (the user
-    types expressions at a prompt) and when the system `read'
-    procedure is called when a module using this language is selected.
+In lisp-like languages, `native-read' corresponds to `read'.  Note
+that in such languages, OBJECT need not be source code, but could
+be data.
 
-  translate EXPRESSION --> SCHEMECODE
+The representation of OBJECT is also chosen by the language
+module.  It can consist of Scheme data types, data types distinct for
+the language, or a mixture.
 
-    Translate an EXPRESSION into SCHEMECODE.
+There is one requirement, however: Distinct data types must be
+instances of a subclass of `language-specific-class'.
 
-    EXPRESSION can be anything returned by `read-expression'.
+This procedure will be called during interactive use (the user
+types expressions at a prompt) and when the system `read'
+procedure is called at a time when a module using this language is
+selected.
 
-    SCHEMECODE is Scheme source code represented using ordinary Scheme
-    data.  It will be passed to `eval' in an environment containing
-    bindings in the environment returned by `language-environment'.
+Some languages (for example Python) parse differently depending on
+whether the session is interactive or non-interactive.  Guile provides
+the predicate `interactive-port?' to test for this.
 
-    This procedure will be called duing interactive use and when the
-    system `eval
+*** language-specific-class
 
-  translate-all PORT --> THUNK
+This variable contains the superclass of all non-Scheme data-types
+provided by the language.
 
-    Translate the entire stream of characters PORT until #<eof>.
-    Return a THUNK which can be called repeatedly like this:
+*** native-write OBJECT PORT
 
-      THUNK --> SCHEMECODE
+This procedure prints the OBJECT on PORT using the specific
+language syntax.
 
-    Each call will yield a new piece of scheme code.  #f is returned
-    to signal the end of the stream of scheme expressions.
+*** write-foreign-syntax OBJECT LANGUAGE NATIVE-WRITE PORT
 
-    This procedure will be called by the system `load' command and by
-    the module system when loading files.
+Write OBJECT in the foreign language escape syntax of this module.
+The object is specific to language LANGUAGE and can be written using
+NATIVE-WRITE.
 
-    The intensions are:
+Here's an implementation for Scheme:
 
-    1. To let the language module decide when and in how large chunks
-       to do the processing.  It may choose to do all processing at
-       the time translate-all is called, all processing when THUNK is
-       called the first time, or small pieces of processing each time
-       THUNK is called, or any conceivable combination.
+(define (write-foreign-syntax object language native-write port)
+  (format port "#,(~A " language)
+  (native-write object port)
+  (display #\) port))
 
-    2. To let the language module decide in how large chunks to output
-       the resulting Scheme code in order not to overload memory.
+*** translate EXPRESSION --> SCHEMECODE
 
-    3. To enable the language module to use temporary files, and
-       whole-module analysis and optimization techniques.
+Translate an EXPRESSION into SCHEMECODE.
 
-  untranslate SCHEMECODE --> EXPRESSION
+EXPRESSION can be anything returned by `read'.
 
-    Attempt to do the inverse of `translate'.  An approximation is
-    OK.  It is also OK to return #f.  This procedure will be called
-    from the debugger, when generating error messages, backtraces etc.
+SCHEMECODE is Scheme source code represented using ordinary Scheme
+data.  It will be passed to `eval' in an environment containing
+bindings in the environment returned by `language-environment'.
+
+This procedure will be called during interactive use and when the
+system `eval
+
+*** translate-all PORT [ALIST] --> THUNK
+
+Translate the entire stream of characters PORT until #<eof>.
+Return a THUNK which can be called repeatedly like this:
+
+  THUNK --> SCHEMECODE
+
+Each call will yield a new piece of scheme code.  #f is returned
+to signal the end of the stream of scheme expressions.  (Note that
+it isn't meaningful for THUNK to return immediates.  In fact, it's
+only meaningful to return expressions with side-effects.)
+
+The optional argument ALIST provides compilation options for the
+translator:
+
+  (debug . #t) means produce code suitable for debugging
+
+This procedure will be called by the system `load' command and by
+the module system when loading files.
+
+The intentions are:
+
+1. To let the language module decide when and in how large chunks
+   to do the processing.  It may choose to do all processing at
+   the time translate-all is called, all processing when THUNK is
+   called the first time, or small pieces of processing each time
+   THUNK is called, or any conceivable combination.
+
+2. To let the language module decide in how large chunks to output
+   the resulting Scheme code in order not to overload memory.
+
+3. To enable the language module to use temporary files, and
+   whole-module analysis and optimization techniques.
+
+*** untranslate SCHEMECODE --> EXPRESSION
+
+Attempt to do the inverse of `translate'.  An approximation is OK.  It
+is also OK to return #f.  This procedure will be called from the
+debugger, when generating error messages, backtraces etc.
+
+The debugger uses the local evaluation environment to determine from
+which module an expression comes.  This is how the debugger can know
+which `untranslate' procedure to call for a given expression.
+
+(This is currently used to decide which backtrace frames to
+display.  System modules use the option :no-backtrace to prevent
+displaying of Guile's internals to the user.)
+
+Note that `untranslate' can use source-properties set by `native-read'
+to give hints about how to do the reverse translation.  Such hints
+could for example be the filename, and line and column numbers for the
+source expression, or an actual copy of the source expression.
+
+** How Guile system procedures `read', `eval', `write' use language modules
+
+*** read
+
+The idea is that the `read' exported from the R5RS library will
+continue to work when called from other languages, and will keep its
+semantics.
+
+A call to `read' simply means "read in an expression from PORT using
+the syntax associated with that port".
+
+Each module carries information about its language.
+
+When an input port is created for a module to be read or during
+interaction with a given module, this information is copied to the
+port object.
+
+`read' uses this information to call `native-read' in the correct
+language module.
+
+*** eval
+
+[To be written.]
+
+*** write
+
+[To be written.]
 
 * Error handling
 
@@ -284,11 +548,19 @@ it.
 so that environment list structures can't leak out on the Scheme
 level.  (This has already been done in SCM.)
 
-** Introduce "read-states" (symmetrical to "print-states")
+** Introduce new fields in input ports
 
-These carries state information belonging to a read call chain, such
-as which keyword syntax to support, whether to be case sensitive or
-not, and, which lexical grammar to use.
+These carry state information such as
+
+*** which keyword syntax to support
+
+*** whether to be case sensitive or not
+
+*** which lexical grammar to use
+
+*** whether the port is used in an interactive session or not
+
+There will be a new Guile primitive `interactive-port?' testing for this.
 
 ** Move configuration of keyword syntax and case sensitivity to the read-state