Mikael's ideas on a new type of Scheme interpreter

2025-07-02 23:50:47 +02:00 · 2000-08-17 04:10:42 +00:00 · 2000-08-17 04:10:42 +00:00 · 2fb8bdabd2
commit 2fb8bdabd2
parent 53bb550828
2 changed files with 665 additions and 0 deletions
--- a/devel/vm/ior/ior-intro.text
+++ b/devel/vm/ior/ior-intro.text
--- a/devel/vm/ior/ior.text
+++ b/devel/vm/ior/ior.text
@ -0,0 +1,665 @@
+***
+*** These notes about the design of a new type of Scheme interpreter
+*** "Ior" are cut out from various emails from early spring 2000.
+***
+*** MDJ 000817 <djurfeldt@nada.kth.se>
+***
+
+Generally, we should try to make a design which is clean and
+minimalistic in as many respects as possible.  For example, even if we
+need more primitives than those in R5RS internally, I don't think
+these should be made available to the user in the core, but rather be
+made available *through* libraries (implementation in core,
+publication via library).
+
+The suggested working name for this project is "Ior" (Swedish name for
+the donkey in "Winnie the Pooh" :).  If, against the odds, we really
+would succeed in producing an Ior, and we find it suitable, we could
+turn it into a Guile 2.0 (or whatever).  (The architecture still
+allows for support of the gh interface and uses conservative GC (Hans
+Böhm's, in fact).)
+
+ Beware now that I'm just sending over my original letter, which is
+ just a sketch of the more detailed, but cryptic, design notes I made
+ originally, which are, in turn, not as detailed as the design has
+ become now. :)
+
+ Please also excuse the lack of structure.  I shouldn't work on this at
+ all right now.  Choose for yourselves if you want to read this
+ unstructured information or if you want to wait until I've structured
+ it after end of January.
+
+But then I actually have to blurt out the basic idea of my
+architecture already now.  (I had hoped to present you with a proper
+and fairly detailed spec, but I won't be able to complete such a spec
+quickly.)
+
+
+The basic idea is this:
+
+* Don't waste time on non-computation!
+
+Why waste a lot of time on type-checks, unboxing and boxing of data?
+Neither of these actions do any computations!
+
+I'd like both interpreter and compiled code to work directly with data
+in raw, native form (integers represented as 32bit longs, inexact
+numbers as doubles, short strings as bytes in a word, longer strings
+as a normal pointer to malloced memory, bignums are just pointers to a
+gmp (GNU MultiPrecision library) object, etc.)
+
+* Don't we need to dispatch on type to know what to do?
+
+But don't we need to dispatch on the type in order to know how to
+compute with the data?  E.g., `display' does entirely different
+computations on a <fixnum> and a <string>.  (<fixnum> is an integer
+between -2^31 and 2^31-1.)
+
+The answer is *no*, not in 95% of all cases.  The main reason is that
+the interpreter does type analysis while converting closures to
+bytecode, and knows already when _calling_ `display' what type it's
+arguments has.  This means that the bytecode compiler can choose a
+suitable _version_ of `display' which handles that particular type.
+
+
+
+This type analysis is greatly simplified by the fact that just as the
+type analysis _results_ in the type of the argument in the call to
+`display', and, thus, we can select the correct _version_ of
+`display', the closure byte-code itself will only be one _version_ of
+the closure with the types of its arguments fixed at the start of the
+analysis.
+
+As you already have understood by now, the basic architecture is that
+all procedures are generic functions, and the "versions" I'm speaking
+about is a kind of methods.  Let's call them "branches" by now.
+
+For example:
+
+(define foo
+  (lambda (x)
+    ...
+    (display x)
+    ...)
+
+may result in the following two branches:
+
+1. [<fixnum>-foo] =
+     (branch ((x <fixnum>))
+       ...
+       ([<fixnum>-display] x)
+       ...)
+
+2. [<string>-foo] =
+     (branch ((x <string>))
+       ...
+       ([<string>-display] x)
+       ...)
+
+and a new closure
+
+(define bar
+  (lambda (x y)
+    ...
+    (foo x)
+    ...))
+
+results in
+
+[<fixnum>-<fixnum>-bar] =
+  (branch ((x <fixnum>) (y <fixnum>))
+    ...
+    ([<fixnum>-foo] x)
+    ...)
+
+Note how all type dispatch is eliminated in these examples.
+
+As a further reinforcement to the type analysis, branches will not
+only have typed parameters but also have return types.  This means
+that the type of a branch will look like
+
+  <type 1> x ... x <type n> --> <type r>
+
+In essence, the entire system will be very ML-like internally, and we
+can benefit from the research done on ML-compilation.
+
+However, we now get three major problems to confront:
+
+1. In the Scheme language not all situations can be completely type
+   analyzed.
+
+2. In particular, for some operations, even if the types of the
+   parameters are well defined, we can't determine the return type
+   generically.  For example, [<fixnum>-<fixnum>-+] may have return
+   type <fixnum> _or_ <bignum>.
+
+3. Even if we can do a complete analysis, some closures will generate
+   a combinatoric explosion of branches.
+
+
+Problem 1: Incomplete analysis
+
+We introduce a new type <boxed>.  This data type has type <boxed> and
+contents
+
+struct ior_boxed_t {
+  ior_type *type; /* pointer to class struct */
+  void *data;     /* generic field, may also contain immediate objects
+	           */
+}
+
+For example, a boxed fixnum 4711 has type <boxed> and contents
+{ <fixnum>, 4711 }.  The boxed type essentially corresponds to Guile's
+SCM type.  It's just that the 1 or 3 or 7 or 16-bit type tag has been
+replaced with a 32-bit type tag (the pointer to the class structure
+describing the type of the object).
+
+This is more inefficient than the SCM type system, but it's no problem
+since it won't be used in 95% of all cases.  The big advantage
+compared to SCM's type system is that it is so simple and uniform.
+
+I should note here that while SCM and Guile are centered around the
+cell representation and all objects either _are_ cells or have a cell
+handle, objects in ior will more look like mallocs.  This is the
+reason why I planned to start with B<><42>öhm's GC which has C pointers as
+object handles.  But it is of course still possible to use a heap, or,
+preferably several heaps for different kinds of objects.  (B<><42>öhm's GC
+has multiple heaps for different sizes of objects.)  If we write a
+custom GC, we can increase speed further.
+
+
+Problem 3 (yes, I skipped :) Combinatoric explosion
+
+We simply don't generate all possible branches.  In the interpreter we
+generate branches "just-too-late" (well, it's normally called "lazy
+compilation" or "just-in-time", but if it was "in-time", the procedure
+would already be compiled when it was needed, right? :) as when Guile
+memoizes or when a Java machine turns byte-codes into machine code, or
+as when GOOPS turns methods into cmethods for that matter.
+
+Have noticed that branches (although still without return type
+information) already exist in GOOPS?  They are currently called
+"cmethods" and are generated on demand from the method code and put
+into the GF cache during evaluation of GOOPS code.  :-)  (I have not
+utilized this fully yet.  I plan to soon use this method compilation
+(into branches) to eliminate almost all type dispatch in calls to
+accessors.)
+
+For the compiler, we use profiling information, just as the modern GCC
+scheduler, or else relies on some type analysis (if a procedure says
+(+ x y), x is not normally a <string> but rather some subclass of
+<number>) and some common sense (it's usually more important to
+generate <fixnum> branches than <foobar> branches).
+
+The rest of the cases can be handled by <boxed>-branches.  We can, for
+example, have a:
+
+[<boxed>-<boxed>-bar] =
+  (branch ((x <boxed>) (y <boxed>))
+    ...
+    ([<boxed>-foo] x)
+    ...)
+
+[<boxed>-foo] will use an efficient type dispatch mechanism (for
+example akin to the GOOPS one) to select the right branch of
+`display'.
+
+
+Problem 2: Ambiguous return type
+
+If the return type of a branch is ambiguous, we simply define the
+return type as <boxed>, and box data at the point in the branch where
+it can be decided which type of data we will return.  This is how
+things can be handled in the general case.  However, we might be able
+to handle things in a more neat way, at least in some cases:
+
+During compilation to byte code, we'll probably use an intermediate
+representation in continuation passing style.  We might even use a
+subtype of branches reprented as continuations (not a heavy
+representation, as in Guile and SCM, but probably not much more than a
+function pointer).  This is, for example, one way of handling tail
+recursion, especially mutual tail recursion.
+
+One case where we would like to try really hard not to box data is
+when fixnums "overflow into" bignums.
+
+Let's say that the branch [<fixnum>-<fixnum>-bar] contains a form
+
+  (+ x y)
+
+where the type analyzer knows that x and y are fixnums.  We then split
+the branch right after the form and let it fork into two possible
+continuation branches bar1 and bar2:
+
+[The following is only pseudo code.  It can be made efficient on the C
+ level.  We can also use the asm compiler directive in conditional
+ compilation for GCC on i386.  We could even let autoconf/automake
+ substitute an architecture specific solution for multiple
+ architectures, but still support a C level default case.]
+
+    (if (sum-over/underflow? x y)
+	(bar1 (fixnum->bignum x) (fixnum->bignum y) ...)
+        (bar2 x y ...))
+
+bar1 begins with the evaluation of the form
+
+  ([<bignum>-<bignum>-+] x y)
+
+while bar 2 begins with
+
+  ([<fixnum>-<fixnum>-+] x y)
+
+Note that the return type of each of these forms is unambiguous.
+
+
+Now some random points from the design:
+
+* The basic concept in Ior is the class.  A type is a concrete class.
+  Classes which are subclasses of <object> are concrete, otherwise they
+  are abstract.
+
+* A procedure is a collection of methods. Each method can have
+  arbitrary number of parameters of arbitrary class (not type).
+
+* The type of a method is the tuple of it's argument classes.
+
+* The type of a procedure is the set of it's method types.
+
+But the most important new concept is the branch.
+Regard the procedure:
+
+(define (half x)
+  (quotient x 2))
+
+The procedure half will have the single method
+
+  (method ((x <top>))
+    (quotient x 2))
+
+When `(half 128)' is called the Ior evaluator will create a new branch
+during the actual evaluation.  I'm now going to extend the branch
+syntax by adding a second list of formals: the continuations of the
+branch.
+
+* The type of a branch is namely the tuple of the tuple of it's
+  argument types (not classes!) and the tuple of it's continuation
+  argument types.  The branch generated above will be:
+
+  (branch ((x <fixnum>) ((c <fixnum>))
+    (c (quotient x 2)))
+
+  If the method
+
+  (method ((x <top>) (y <top>))
+    (quotient (+ x 1) y))
+
+  is called with arguments 1 and 2 it results in the branch
+
+  (branch ((x <fixnum>) (y <fixnum>)) ((c1 <fixnum>) (c2 <bignum>))
+    (quotient (+ x 1 c3) 2))
+
+  where c3 is:
+
+  (branch ((x <fixnum>) (y <fixnum>)) ((c <bignum>))
+    (quotient (+ (fixnum->bignum x) 1) 2)
+
+The generated branches are stored in a cache in the procedure object.
+
+
+But wait a minute!  What about variables and data structures?
+
+In essence, what we do is that we fork up all data paths so that they
+can be typed: We put the type tags on the _data paths_ instead of on
+the data itself.  You can look upon the "branches" as tubes of
+information where the type tag is attached to the tube instead of on
+what passes through it.
+
+Variables and data structures are part of the "tubes", so they need to
+be typed.  For example, the generic pair looks like:
+
+(define-class <pair> ()
+  car-type
+  car
+  cdr-type
+  cdr)
+
+But note that since car and cdr are generic procedures, we can let
+more efficient pairs exist in parallel, like
+
+(define-class <immutable-fixnum-list> ()
+  (car (class <fixnum>))
+  (cdr (class <immutable-fixnum-list>))) 
+
+Note that instances of this last type only takes two words of memory!
+They are easy to use too.  We can't use `cons' or `list' to create
+them, since these procedures can't assume immutability, but we don't
+need to specify the type <fixnum> in our program.  Something like
+
+  (const-cons 1 x)
+
+where x is in the data flow path tagged as <immutable-fixnum-list>, or
+
+  (const-list 1 2 3)
+
+
+Some further notes:
+
+* The concepts module and instance are the same thing.  Using other
+  modules means 1. creating a new module class which inherits the
+  classes of the used modules and 2. instantiating it.
+
+* Module definitions and class definitions are equivalent but
+  different syntactic sugar adapted for each kind of use.
+
+* (define x 1) means: create an instance variable which is itself a
+  subclass of <boxed> with initial value 1 (which is an instance of
+  <fixnum>).
+
+
+The interpreter is a mixture between a stack machine and a register
+machine.  The evaluator looks like this...  :)
+
+  /* the interpreter! */
+  if (!setjmp (ior_context->exit_buf))
+#ifndef i386_GCC
+    while (1)
+#endif
+      (*ior_continue) (IOR_MICRO_OP_ARGS);
+
+The branches are represented as an array of pointers to micro
+operations.  In essence, the evaluator doesn't exist in itself, but is
+folded out over the entire implementation.  This allows for an extreme
+form of modularity!
+
+The i386_GCC is a machine specific optimization which avoids all
+unnecessary popping and pushing of the CPU stack (which is different
+from the Ior data stack).
+
+The execution environment consists of
+
+* a continue register similar to the program counter in the CPU
+* a data stack (where micro operation arguments and results are stored)
+* a linked chain of environment frames (but look at exception below!)
+* a dynamic context
+
+I've written a small baby Ior which uses Guile's infrastructure.
+Here's the context from that baby Ior:
+
+typedef struct ior_context_t {
+  ior_data_t *env;		/* rest of environment frames */
+  ior_cont_t save_continue;	/* saves or represents continuation */
+  ior_data_t *save_env;		/* saves or represents environment */
+  ior_data_t *fluids;		/* array of fluids (use GC_malloc!) */
+  int n_fluids;
+  int fluids_size;
+  /* dynwind chain is stored directly in the environment, not in context */
+  jmp_buf exit_buf;
+  IOR_SCM guile_protected;	/* temporary */
+} ior_context_t;
+
+There's an important exception regarding the lowest environment
+frame.  That frame isn't stored in a separate block on the heap, but
+on Ior's data stack.  Frames are copied out onto the heap when
+necessary (for example when closures "escape").
+
+
+Now a concrete example:
+
+Look at:
+
+(define sum
+  (lambda (from to res)
+    (if (= from to)
+	res
+	(sum (+ 1 from) to (+ from res)))))
+
+This can be rewritten into CPS (which captures a lot of what happens
+during flow analysis):
+
+(define sum
+  (lambda (from to res c1)
+    (let ((c2 (lambda (limit?)
+		(let ((c3 (lambda ()
+			    (c1 res)))
+		      (c4 (lambda ()
+			    (let ((c5 (lambda (from+1)
+					(let ((c6 (lambda (from+res)
+						    (sum from+1 to from+res c1))))
+					  (_+ from res c6)))))
+			      (_+ 1 from c5)))))
+		  (_if limit? c3 c4)))))
+      (_= from to c2))))
+
+Finally, after branch expansion, some optimization, code generation,
+and some optimization again, we end up with the byte code for the two
+branches (here marked by labels `sum' and `sumbig'):
+
+ c5
+ (ref -3)
+ (shift -1)
+ (+ <fixnum> <fixnum> c4big)
+ ;; c4
+ (shift -2)
+ (+ <fixnum> 1 sumbig)
+ ;; c6
+ sum
+ (shift 3)
+ (ref2 -3)
+ ;; c2
+ (if!= <fixnum> <fixnum> c5)
+ ;; c3
+ (ref -1)
+ ;; c1
+ (end)
+
+ c5big
+ (ref -3)
+ (shift -1)
+ (+ <bignum> <bignum>)
+ c4big
+ (shift -2)
+ (+ <bignum> 1)
+ ;; c6
+ sumbig
+ (shift 3)
+ (ref2 -3)
+ ;; c2
+ (= <bignum> <bignum>)
+ (if! c5big)
+ ;; c3
+ (ref -1)
+ ;; c1
+ (end)
+
+Let's take a closer look upon the (+ <fixnum> 1 sumbig) micro
+operation.  The generated assembler from the Ior C source + machine
+specific optimizations for i386_GCC looks like this (with some rubbish
+deleted):
+
+ior_int_int_sum_intbig:
+	movl 4(%ebx),%eax	; fetch arg 2
+	addl (%ebx),%eax        ; fetch arg 1 and do the work!
+	jo ior_big_sum_int_int  ; dispatch to other branch on overflow
+	movl %eax,(%ebx)	; store result in first environment frame
+	addl $8,%esi		; increment program counter
+	jmp (%esi)		; execute next opcode
+
+ior_big_sum_int_int:
+
+To clearify: This is output from the C compiler.  I added the comments
+afterwards.
+
+The source currently looks like this:
+
+IOR_MICRO_BRANCH_2_2 ("+", int, big, sum, int, int, 1, 0)
+{
+  int res = IOR_ARG (int, 0) + IOR_ARG (int, 1);
+  IOR_JUMP_OVERFLOW (res, ior_big_sum_int_int);
+  IOR_NEXT2 (z);
+}
+
+where the macros allow for different definitions depending on if we
+want to play pure ANSI or optimize for a certain machine/compiler.
+
+The plan is actually to write all source in the Ior language and write
+Ior code to translate the core code into bootstrapping C code.
+
+Please note that if i386_GCC isn't defined, we run plain portable ANSI C.
+
+
+Just one further note:
+
+In Ior, there are three modes of evaluation
+
+1. evaluating and type analyzing (these go in parallel)
+2. code generation
+3. executing byte codes
+
+It is mode 3 which is really fast in Ior.
+
+You can look upon your program as a web of branch segments where one
+branch segment can be generated from fragments of many closures.  Mode
+switches doesn't occur at the procedure borders, but at "growth
+points".  I don't have time to define them here, but they are based
+upon the idea that the continuation together with the type signature
+of the data flow path is unique.
+
+We normally run in mode 3.  When we come to a source growth point
+(essentially an apply instruction) for uncompiled code we "dive out"
+of mode 3 into mode 1 which starts to eval/analyze code until we come
+to a "sink".  When we reach the "sink", we have enough information
+about the data path to do code generation, so we backtrack to the
+source growth point and grow the branch between source and sink.
+Finally, we "dive into" mode 3!
+
+So, code generation doesn't respect procedure borders.  We instead get
+a very neat kind of inlining, which, e.g., means that it is OK to use
+closures instead of macros in many cases.
+----------------------------------------------------------------------
+Ior and module system
+=====================
+
+How, exactly, should the module system of Ior look like?
+
+There is this general issue of whether to have a single-dispatch or
+multi-dispatch system.  Personally, I see that Scheme already use
+multi-dispatch.  Compare (+ 1.0 2) and (+ 1 2.0).
+
+As you've seen if you've read the notes about Ior design, efficiency
+is not an issue here, since almost all dispatch will be eliminated
+anyway.
+
+Also, note an interesting thing: GOOPS actually has a special,
+implicit, argument to all of it's methods: the lexical environment.
+It would be very ugly to add a second, special, argument to this.
+
+Of course, the theoreticians have already recognised this, and in many
+systems, the implicit argument (the object) and the environment for
+the method is the same thing.
+
+I think we should especially take impressions from Matthias Blume's
+module/object system.
+
+The idea, now, for Ior (remember that everything about Ior is
+negotiable between us) is that a module is a type, as well as an
+instance of that type.  The idea is that we basically keep the GOOPS
+style of methods, with the implicit argument being the module object
+(or some other lexical environment, in a chain with the module as
+root).
+
+Let's say now that module C uses modules A and B.  Modules A and B
+both exports the procedure `foo'.  But A:foo and B:foo as different
+sets of methods.
+
+What does this mean?  Well, it obviously means that the procedure
+`foo' in module C is a subtype of A:foo and B:foo.  Note how this is
+similar in structure to slot inheritance: When class C is created with
+superclasses A and B, the properties of a slot in C are created
+through slot inheritance.  One way of interpreting variable foo in
+module A is as a slot with init value foo.  Through the MOP, we can
+specify that procedure slot inheritance in a module class implies
+creation of new init values through inheritance.
+
+This may look like a kludge, and perhaps it is, and, sure, we are not
+going to accept any kludges in Ior.  But, it might actually not be a
+kludge...
+
+I think it is commonly accepted by computer scientists that a module,
+and/or at least a module interface is a type.  Again, this type can be
+seen as the set of types of the functions in the interface.  The types
+of our procedures are the set of branch types the provide.  It is then
+natural that a module using two other modules create new procedure
+types by folding.
+
+This thing would become less cloudy (yes, this is a cloudy part of my
+reasoning; I meant previously that the interpreter itself is now
+clear) if module interfaces were required to be explicitly types.
+
+Actually, this would fit much better together with the rest of Ior's
+design.  On one hand, we might be free to introduce such a restriction
+(compiler writers would applaud it), since R5RS hasn't specified any
+module system.  On the other hand, it might be strange to require
+explicit typing when Scheme is fundamentally implicitly types...
+
+We also have to consider that a module has an "inward" face, which is
+one type, and possibly many "outward" faces, which are different
+types.  (Compare the idea of "interfaces" in Scheme48.)
+
+It thus, seems that, while a module can truly be an Ior class, the
+reverse should probably not hold in the general case...
+
+Unless
+
+  instance		<-> module proper
+  class of the instance <-> "inward interface"
+  superclasses		<-> "outward interfaces + inward uses"
+
+...hmm, is this possible to reconcile with Rees' object system?
+
+Please think about these issues.  We should try to end up with a
+beautiful and consistent object/module system.
+
+----------------------------------------------------------------------
+
+Here's a difficult problem in Ior's design:
+
+Let's say that we have a mutable data structure, like an ordinary
+list.  Since, in Ior, the type tag (which is really a pointer to a
+class structure) is stored separately from the data, it is thinkable
+that another thread modifies the location in the list between when our
+thread reads the type tag and when it reads the data.
+
+The reading of type and data must be made atomic in some way.
+Probably, some kind of locking of the heap is required.  It's just
+that it may cause a lot of overhead to look the heap at every *read*
+from a mutable data structure.
+
+Look how much trouble those set!-operations cause!  Not only does it
+force us to store type tags for each car and cdr in the list, but it
+also forces a lot of explicit dispatch to be done, and causes troubles
+in a threaded system...
+
+----------------------------------------------------------------------
+
+Jim Blandy <jimb@red-bean.com> writes:
+
+> We also should try to make less work for the GC, by avoiding consing
+> up local environments until they're closed over.
+
+Did the texts which I sent to you talk about Ior's solution?
+
+It basically is: Use *two* environment "arguments" to the evaluator
+(in Ior, they aren't arguments but registers):
+
+* One argument is a pointer to the "top" of an environment stack.
+  This is used in the "inner loop" for very efficient access to
+  in-between results.  The "top" segment of the environment stack is
+  also regarded as the first environment frame in the lexical
+  environment.  ("top" is bottom on a stack which grows downwards)
+
+* The other argument points to a structure holding the evaluation
+  context.  In this context, there is a pointer to the chain of the
+  rest of the environment frames.  Note that since frames are just
+  blocks of SCM values, you can very efficiently "release" a frame
+  into the heap by block copying it (remember that Ior uses Boehms GC;
+  this is how we allocate the block).