From 07347b492ebc4656c546aa90cafe791883cb7532 Mon Sep 17 00:00:00 2001 From: Neil Jerram Date: Thu, 15 Feb 2001 22:15:25 +0000 Subject: [PATCH] * Retire this copy of data-rep.texi. --- doc/ChangeLog | 6 + doc/data-rep.texi | 1794 --------------------------------------------- 2 files changed, 6 insertions(+), 1794 deletions(-) diff --git a/doc/ChangeLog b/doc/ChangeLog index 8f158822c..180b948ef 100644 --- a/doc/ChangeLog +++ b/doc/ChangeLog @@ -1,3 +1,9 @@ +2001-02-15 Neil Jerram + + * data-rep.texi: Replace this copy of data-rep.texi with a notice + indicating that it has been retired. The master copy of + data-rep.texi is at guile-doc/ref/data-rep.texi. + 2001-02-04 Marius Vollmer * data-rep.texi: Use SCM_SMOB_DATA instead of SCM_CDR. Also diff --git a/doc/data-rep.texi b/doc/data-rep.texi index acf774416..e69de29bb 100644 --- a/doc/data-rep.texi +++ b/doc/data-rep.texi @@ -1,1794 +0,0 @@ -\input texinfo -@c -*-texinfo-*- -@c %**start of header -@setfilename data-rep.info -@settitle Data Representation in Guile -@c %**end of header - -@include version.texi - -@dircategory The Algorithmic Language Scheme -@direntry -* data-rep: (data-rep). Data Representation in Guile --- how to use - Guile objects in your C code. -@end direntry - -@setchapternewpage off - -@ifinfo -Data Representation in Guile - -Copyright (C) 1998, 1999, 2000 Free Software Foundation - -Permission is granted to make and distribute verbatim copies of -this manual provided the copyright notice and this permission notice -are preserved on all copies. - -@ignore -Permission is granted to process this file through TeX and print the -results, provided the printed document carries copying permission -notice identical to this one except for the removal of this paragraph -(this paragraph not being relevant to the printed manual). 
-@end ignore - -Permission is granted to copy and distribute modified versions of this -manual under the conditions for verbatim copying, provided that the entire -resulting derived work is distributed under the terms of a permission -notice identical to this one. - -Permission is granted to copy and distribute translations of this manual -into another language, under the above conditions for modified versions, -except that this permission notice may be stated in a translation approved -by the Free Software Foundation. -@end ifinfo - -@titlepage -@sp 10 -@comment The title is printed in a large font. -@title Data Representation in Guile -@subtitle $Id: data-rep.texi,v 1.14 2001-02-04 17:29:06 mvo Exp $ -@subtitle For use with Guile @value{VERSION} -@author Jim Blandy -@author Free Software Foundation -@author @email{jimb@@red-bean.com} -@c The following two commands start the copyright page. -@page -@vskip 0pt plus 1filll -@vskip 0pt plus 1filll -Copyright @copyright{} 1998 Free Software Foundation - -Permission is granted to make and distribute verbatim copies of -this manual provided the copyright notice and this permission notice -are preserved on all copies. - -Permission is granted to copy and distribute modified versions of this -manual under the conditions for verbatim copying, provided that the entire -resulting derived work is distributed under the terms of a permission -notice identical to this one. - -Permission is granted to copy and distribute translations of this manual -into another language, under the above conditions for modified versions, -except that this permission notice may be stated in a translation approved -by Free Software Foundation. 
-@end titlepage - -@c @smallbook -@c @finalout -@headings double - - -@node Top, Data Representation in Scheme, (dir), (dir) -@top Data Representation in Guile - -@ifinfo -This essay is meant to provide the background necessary to read and -write C code that manipulates Scheme values in a way that conforms to -libguile's interface. If you would like to write or maintain a -Guile-based application in C or C++, this is the first information you -need. - -In order to make sense of Guile's @code{SCM_} functions, or read -libguile's source code, it's essential to have a good grasp of how Guile -actually represents Scheme values. Otherwise, a lot of the code, and -the conventions it follows, won't make very much sense. - -We assume you know both C and Scheme, but we do not assume you are -familiar with Guile's C interface. -@end ifinfo - -@menu -* Data Representation in Scheme:: Why things aren't just totally - straightforward, in general terms. -* How Guile does it:: How to write C code that manipulates - Guile values, with an explanation - of Guile's garbage collector. -* Defining New Types (Smobs):: How to extend Guile with your own - application-specific datatypes. -@end menu - -@node Data Representation in Scheme, How Guile does it, Top, Top -@section Data Representation in Scheme - -Scheme is a latently-typed language; this means that the system cannot, -in general, determine the type of a given expression at compile time. -Types only become apparent at run time. Variables do not have fixed -types; a variable may hold a pair at one point, an integer at the next, -and a thousand-element vector later. Instead, values, not variables, -have fixed types. - -In order to implement standard Scheme functions like @code{pair?} and -@code{string?} and provide garbage collection, the representation of -every value must contain enough information to accurately determine its -type at run time. 
Often, Scheme systems also use this information to -determine whether a program has attempted to apply an operation to an -inappropriately typed value (such as taking the @code{car} of a string). - -Because variables, pairs, and vectors may hold values of any type, -Scheme implementations use a uniform representation for values --- a -single type large enough to hold either a complete value or a pointer -to a complete value, along with the necessary typing information. - -The following sections will present a simple typing system, and then -make some refinements to correct its major weaknesses. However, this is -not a description of the system Guile actually uses. It is only an -illustration of the issues Guile's system must address. We provide all -the information one needs to work with Guile's data in @ref{How Guile -does it}. - - -@menu -* A Simple Representation:: -* Faster Integers:: -* Cheaper Pairs:: -* Guile Is Hairier:: -@end menu - -@node A Simple Representation, Faster Integers, Data Representation in Scheme, Data Representation in Scheme -@subsection A Simple Representation - -The simplest way to meet the above requirements in C would be to -represent each value as a pointer to a structure containing a type -indicator, followed by a union carrying the real value. Assuming that -@code{SCM} is the name of our universal type, we can write: - -@example -enum type @{ integer, pair, string, vector, ... @}; - -typedef struct value *SCM; - -struct value @{ - enum type type; - union @{ - int integer; - struct @{ SCM car, cdr; @} pair; - struct @{ int length; char *elts; @} string; - struct @{ int length; SCM *elts; @} vector; - ... - @} value; -@}; -@end example -with the ellipses replaced with code for the remaining Scheme types. - -This representation is sufficient to implement all of Scheme's -semantics. If @var{x} is an @code{SCM} value: -@itemize @bullet -@item - To test if @var{x} is an integer, we can write @code{@var{x}->type == integer}. 
-@item - To find its value, we can write @code{@var{x}->value.integer}. -@item - To test if @var{x} is a vector, we can write @code{@var{x}->type == vector}. -@item - If we know @var{x} is a vector, we can write - @code{@var{x}->value.vector.elts[0]} to refer to its first element. -@item - If we know @var{x} is a pair, we can write - @code{@var{x}->value.pair.car} to extract its car. -@end itemize - - -@node Faster Integers, Cheaper Pairs, A Simple Representation, Data Representation in Scheme -@subsection Faster Integers - -Unfortunately, the above representation has a serious disadvantage. In -order to return an integer, an expression must allocate a @code{struct -value}, initialize it to represent that integer, and return a pointer to -it. Furthermore, fetching an integer's value requires a memory -reference, which is much slower than a register reference on most -processors. Since integers are extremely common, this representation is -too costly, in both time and space. Integers should be very cheap to -create and manipulate. - -One possible solution comes from the observation that, on many -architectures, structures must be aligned on a four-byte boundary. -(Whether or not the machine actually requires it, we can write our own -allocator for @code{struct value} objects that assures this is true.) -In this case, the lower two bits of the structure's address are known to -be zero. - -This gives us the room we need to provide an improved representation -for integers. We make the following rules: -@itemize @bullet -@item -If the lower two bits of an @code{SCM} value are zero, then the SCM -value is a pointer to a @code{struct value}, and everything proceeds as -before. -@item -Otherwise, the @code{SCM} value represents an integer, whose value -appears in its upper bits. -@end itemize - -Here is C code implementing this convention: -@example -enum type @{ pair, string, vector, ... 
@}; - -typedef struct value *SCM; - -struct value @{ - enum type type; - union @{ - struct @{ SCM car, cdr; @} pair; - struct @{ int length; char *elts; @} string; - struct @{ int length; SCM *elts; @} vector; - ... - @} value; -@}; - -#define POINTER_P(x) (((int) (x) & 3) == 0) -#define INTEGER_P(x) (! POINTER_P (x)) - -#define GET_INTEGER(x) ((int) (x) >> 2) -#define MAKE_INTEGER(x) ((SCM) (((x) << 2) | 1)) -@end example - -Notice that @code{integer} no longer appears as an element of @code{enum -type}, and the union has lost its @code{integer} member. Instead, we -use the @code{POINTER_P} and @code{INTEGER_P} macros to make a coarse -classification of values into integers and non-integers, and do further -type testing as before. - -Here's how we would answer the questions posed above (again, assume -@var{x} is an @code{SCM} value): -@itemize @bullet -@item - To test if @var{x} is an integer, we can write @code{INTEGER_P (@var{x})}. -@item - To find its value, we can write @code{GET_INTEGER (@var{x})}. -@item - To test if @var{x} is a vector, we can write: -@example - @code{POINTER_P (@var{x}) && @var{x}->type == vector} -@end example - Given the new representation, we must make sure @var{x} is truly a - pointer before we dereference it to determine its complete type. -@item - If we know @var{x} is a vector, we can write - @code{@var{x}->value.vector.elts[0]} to refer to its first element, as - before. -@item - If we know @var{x} is a pair, we can write - @code{@var{x}->value.pair.car} to extract its car, just as before. -@end itemize - -This representation allows us to operate more efficiently on integers -than the first. For example, if @var{x} and @var{y} are known to be -integers, we can compute their sum as follows: -@example -MAKE_INTEGER (GET_INTEGER (@var{x}) + GET_INTEGER (@var{y})) -@end example -Now, integer math requires no allocation or memory references. 
Most -real Scheme systems actually use an even more efficient representation, -but this essay isn't about bit-twiddling. (Hint: what if pointers had -@code{01} in their least significant bits, and integers had @code{00}?) - - -@node Cheaper Pairs, Guile Is Hairier, Faster Integers, Data Representation in Scheme -@subsection Cheaper Pairs - -However, there is yet another issue to confront. Most Scheme heaps -contain more pairs than any other type of object; Jonathan Rees says -that pairs occupy 45% of the heap in his Scheme implementation, Scheme -48. However, our representation above spends three @code{SCM}-sized -words per pair --- one for the type, and two for the @sc{car} and -@sc{cdr}. Is there any way to represent pairs using only two words? - -Let us refine the convention we established earlier. Let us assert -that: -@itemize @bullet -@item - If the bottom two bits of an @code{SCM} value are @code{#b00}, then - it is a pointer, as before. -@item - If the bottom two bits are @code{#b01}, then the upper bits are an - integer. This is a bit more restrictive than before. -@item - If the bottom two bits are @code{#b10}, then the value, with the bottom - two bits masked out, is the address of a pair. -@end itemize - -Here is the new C code: -@example -enum type @{ string, vector, ... @}; - -typedef struct value *SCM; - -struct value @{ - enum type type; - union @{ - struct @{ int length; char *elts; @} string; - struct @{ int length; SCM *elts; @} vector; - ... 
- @} value; -@}; - -struct pair @{ - SCM car, cdr; -@}; - -#define POINTER_P(x) (((int) (x) & 3) == 0) - -#define INTEGER_P(x) (((int) (x) & 3) == 1) -#define GET_INTEGER(x) ((int) (x) >> 2) -#define MAKE_INTEGER(x) ((SCM) (((x) << 2) | 1)) - -#define PAIR_P(x) (((int) (x) & 3) == 2) -#define GET_PAIR(x) ((struct pair *) ((int) (x) & ~3)) -@end example - -Notice that @code{enum type} and @code{struct value} now only contain -provisions for vectors and strings; both integers and pairs have become -special cases. The code above also assumes that an @code{int} is large -enough to hold a pointer, which isn't generally true. - - -Our list of examples is now as follows: -@itemize @bullet -@item - To test if @var{x} is an integer, we can write @code{INTEGER_P - (@var{x})}; this is as before. -@item - To find its value, we can write @code{GET_INTEGER (@var{x})}, as - before. -@item - To test if @var{x} is a vector, we can write: -@example - @code{POINTER_P (@var{x}) && @var{x}->type == vector} -@end example - We must still make sure that @var{x} is a pointer to a @code{struct - value} before dereferencing it to find its type. -@item - If we know @var{x} is a vector, we can write - @code{@var{x}->value.vector.elts[0]} to refer to its first element, as - before. -@item - We can write @code{PAIR_P (@var{x})} to determine if @var{x} is a - pair, and then write @code{GET_PAIR (@var{x})->car} to refer to its - car. -@end itemize - -This change in representation reduces our heap size by 15%. It also -makes it cheaper to decide if a value is a pair, because no memory -references are necessary; it suffices to check the bottom two bits of -the @code{SCM} value. This may be significant when traversing lists, a -common activity in a Scheme system. 
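The three-way tagging convention described above can be written out as compilable C. This is a minimal sketch of the essay's toy scheme, not Guile's code. It assumes that @code{malloc} returns memory aligned to at least four bytes. It also swaps @code{int} for @code{uintptr_t} and @code{intptr_t}, so the sketch does not depend on an @code{int} being wide enough to hold a pointer. The @code{TAG} macro and the @code{make_pair} helper are hypothetical additions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch of the toy three-way tagging scheme (not Guile's real layout).
   uintptr_t/intptr_t replace int so a pointer always fits in the word.
   TAG and make_pair are hypothetical helpers added for illustration. */

typedef struct value *SCM;              /* the universal type */

struct pair {
  SCM car, cdr;
};

#define TAG(x)          ((uintptr_t) (x) & 3)
#define POINTER_P(x)    (TAG (x) == 0)  /* #b00: pointer to a struct value */
#define INTEGER_P(x)    (TAG (x) == 1)  /* #b01: integer in the upper bits */
#define PAIR_P(x)       (TAG (x) == 2)  /* #b10: tagged address of a pair */

/* Shift as unsigned, so negative integers do not cause undefined behaviour. */
#define MAKE_INTEGER(i) ((SCM) (((uintptr_t) (intptr_t) (i) << 2) | 1))
/* Assumes >> on a negative intptr_t is an arithmetic shift; C leaves this
   implementation-defined, but mainstream compilers behave this way. */
#define GET_INTEGER(x)  ((intptr_t) (uintptr_t) (x) >> 2)

/* Mask off the two tag bits to recover the real address of the pair. */
#define GET_PAIR(x)     ((struct pair *) ((uintptr_t) (x) & ~(uintptr_t) 3))

/* Allocate a two-word pair and return its address tagged with #b10. */
static SCM
make_pair (SCM car, SCM cdr)
{
  struct pair *p = malloc (sizeof *p);
  if (p == NULL)
    abort ();
  assert (((uintptr_t) p & 3) == 0);    /* the tag bits must be free */
  p->car = car;
  p->cdr = cdr;
  return (SCM) ((uintptr_t) p | 2);
}
```

With these definitions, @code{MAKE_INTEGER (GET_INTEGER (x) + GET_INTEGER (y))} adds two integers without allocating. @code{PAIR_P (p)} classifies @code{p} by its low bits alone, which is where the saving during list traversal comes from.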
- -Again, most real Scheme systems use a slightly different implementation; -for example, if @code{GET_PAIR} subtracts off the low bits of @code{x}, instead -of masking them off, the optimizer will often be able to combine that -subtraction with the addition of the offset of the structure member we -are referencing, making a modified pointer as fast to use as an -unmodified pointer. - - -@node Guile Is Hairier, , Cheaper Pairs, Data Representation in Scheme -@subsection Guile Is Hairier - -We originally started with a very simple typing system --- each object -has a field that indicates its type. Then, for the sake of efficiency -in both time and space, we moved some of the typing information directly -into the @code{SCM} value, and left the rest in the @code{struct value}. -Guile itself employs a more complex hierarchy, storing finer and finer -gradations of type information in different places, depending on the -object's coarser type. - -In the author's opinion, Guile could be simplified greatly without -significant loss of efficiency, but the simplified system would still be -more complex than what we've presented above. - - -@node How Guile does it, Defining New Types (Smobs), Data Representation in Scheme, Top -@section How Guile does it - -Here we present the specifics of how Guile represents its data. We -don't go into complete detail; an exhaustive description of Guile's -system would be boring, and we do not wish to encourage people to write -code which depends on its details anyway. We do, however, present -everything one needs to know to use Guile's data. - - -@menu -* General Rules:: -* Garbage Collection:: -* Immediates vs. Non-immediates:: -* Immediate Datatypes:: -* Non-immediate Datatypes:: -* Signalling Type Errors:: -@end menu - -@node General Rules, Garbage Collection, How Guile does it, How Guile does it -@subsection General Rules - -Any code which operates on Guile datatypes must @code{#include} the -header file @code{}.
This file contains a definition for -the @code{SCM} typedef (Guile's universal type, as in the examples -above), and definitions and declarations for a host of macros and -functions that operate on @code{SCM} values. - -All identifiers declared by @code{} begin with @code{scm_} -or @code{SCM_}. - -@c [[I wish this were true, but I don't think it is at the moment. -JimB]] -@c Macros do not evaluate their arguments more than once, unless documented -@c to do so. - -The functions described here generally check the types of their -@code{SCM} arguments, and signal an error if their arguments are of an -inappropriate type. Macros generally do not, unless that is their -specified purpose. You must verify their argument types beforehand, as -necessary. - -Macros and functions that return a boolean value have names ending in -@code{P} or @code{_p} (for ``predicate''). Those that return a negated -boolean value have names starting with @code{SCM_N}. For example, -@code{SCM_IMP (@var{x})} is a predicate which returns non-zero iff -@var{x} is an immediate value (an @code{IM}). @code{SCM_NCONSP -(@var{x})} is a predicate which returns non-zero iff @var{x} is -@emph{not} a pair object (a @code{CONS}). - - -@node Garbage Collection, Immediates vs. Non-immediates, General Rules, How Guile does it -@subsection Garbage Collection - -Aside from the latent typing, the major source of constraints on a -Scheme implementation's data representation is the garbage collector. -The collector must be able to traverse every live object in the heap, to -determine which objects are not live. - -There are many ways to implement this, but Guile uses an algorithm -called @dfn{mark and sweep}. The collector scans the system's global -variables and the local variables on the stack to determine which -objects are immediately accessible by the C code. It then scans those -objects to find the objects they point to, @i{et cetera}. 
The collector -sets a @dfn{mark bit} on each object it finds, so each object is -traversed only once. This process is called @dfn{tracing}. - -When the collector can find no unmarked objects pointed to by marked -objects, it assumes that any objects that are still unmarked will never -be used by the program (since there is no path of dereferences from any -global or local variable that reaches them) and deallocates them. - -In the above paragraphs, we did not specify how the garbage collector -finds the global and local variables; as usual, there are many different -approaches. Frequently, the programmer must maintain a list of pointers -to all global variables that refer to the heap, and another list -(adjusted upon entry to and exit from each function) of local variables, -for the collector's benefit. - -The list of global variables is usually not too difficult to maintain, -since global variables are relatively rare. However, an explicitly -maintained list of local variables (in the author's personal experience) -is a nightmare to maintain. Thus, Guile uses a technique called -@dfn{conservative garbage collection}, to make the local variable list -unnecessary. - -The trick to conservative collection is to treat the stack as an -ordinary range of memory, and assume that @emph{every} word on the stack -is a pointer into the heap. Thus, the collector marks all objects whose -addresses appear anywhere in the stack, without knowing for sure how -that word is meant to be interpreted. - -Obviously, such a system will occasionally retain objects that are -actually garbage, and should be freed. In practice, this is not a -problem. The alternative, an explicitly maintained list of local -variable addresses, is effectively much less reliable, due to programmer -error. - -To accommodate this technique, data must be represented so that the -collector can accurately determine whether a given stack word is a -pointer or not. 
Guile does this as follows: -@itemize @bullet - -@item -Every heap object has a two-word header, called a @dfn{cell}. Some -objects, like pairs, fit entirely in a cell's two words; others may -store pointers to additional memory in either of the words. For -example, strings and vectors store their length in the first word, and a -pointer to their elements in the second. - -@item -Guile allocates whole arrays of cells at a time, called @dfn{heap -segments}. These segments are always allocated so that the cells they -contain fall on eight-byte boundaries, or whatever is appropriate for -the machine's word size. Guile keeps all cells in a heap segment -initialized, whether or not they are currently in use. - -@item -Guile maintains a sorted table of heap segments. - -@end itemize - -Thus, given any random word @var{w} fetched from the stack, Guile's -garbage collector can consult the table to see if @var{w} falls within a -known heap segment, and check @var{w}'s alignment. If both tests pass, -the collector knows that @var{w} is a valid pointer to a cell, -intentional or not, and proceeds to trace the cell. - -Note that heap segments do not contain all the data Guile uses; cells -for objects like vectors and strings contain pointers to other memory -areas. However, since those pointers are internal, and not shared among -many pieces of code, it is enough for the collector to find the cell, -and then use the cell's type to find more pointers to trace. - - -@node Immediates vs. Non-immediates, Immediate Datatypes, Garbage Collection, How Guile does it -@subsection Immediates vs. Non-immediates - -Guile classifies Scheme objects into two kinds: those that fit entirely -within an @code{SCM}, and those that require heap storage. - -The former class are called @dfn{immediates}. The class of immediates -includes small integers, characters, boolean values, the empty list, the -mysterious end-of-file object, and some others. 
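The following hypothetical encoding shows how such values can fit entirely within one word. Every immediate is recognized from its low bits alone. The bit layout, the @code{imm_kind} enumeration and all macro names are assumptions made for this sketch; they are not Guile's actual tag values. None of the predicates dereferences its argument.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding for illustration only; NOT Guile's bit layout.
   SCM is treated as a plain machine word:
     ...00  pointer to a heap cell (non-immediate)
     ...01  small integer stored in the upper bits
     ...10  other immediate: bits 2-3 give its kind, the rest its payload */
typedef uintptr_t SCM;

#define IMP(x)        (((x) & 3) != 0)   /* any nonzero tag: immediate */
#define NIMP(x)       (!IMP (x))

#define MAKE_INUM(i)  (((SCM) (intptr_t) (i) << 2) | 1)
#define INUMP(x)      (((x) & 3) == 1)

enum imm_kind { IMM_CHAR = 0, IMM_BOOL = 1, IMM_UNIQUE = 2 };

/* Pack a kind and a payload into one word ending in #b10. */
#define MAKE_IMM(kind, payload) \
  (((SCM) (payload) << 4) | ((SCM) (kind) << 2) | 2)
/* Compare the low four bits: tag #b10 plus the two kind bits. */
#define IMM_KIND_P(x, kind)  (((x) & 15) == (((SCM) (kind) << 2) | 2))

#define MAKE_CHAR(c)  MAKE_IMM (IMM_CHAR, (unsigned char) (c))
#define CHARP(x)      IMM_KIND_P (x, IMM_CHAR)
#define CHAR(x)       ((unsigned char) ((x) >> 4))

#define BOOL_F        MAKE_IMM (IMM_BOOL, 0)
#define BOOL_T        MAKE_IMM (IMM_BOOL, 1)
#define EOL           MAKE_IMM (IMM_UNIQUE, 0)   /* the empty list */
#define EOF_VAL       MAKE_IMM (IMM_UNIQUE, 1)   /* the end-of-file object */
```

A pointer always ends in @code{#b00} and an integer in @code{#b01}, so neither can match @code{CHARP}'s four-bit pattern. That is why a predicate of this kind is safe to apply to any value.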
- -The remaining types are called, not surprisingly, @dfn{non-immediates}. -They include pairs, procedures, strings, vectors, and all other data -types in Guile. - -@deftypefn Macro int SCM_IMP (SCM @var{x}) -Return non-zero iff @var{x} is an immediate object. -@end deftypefn - -@deftypefn Macro int SCM_NIMP (SCM @var{x}) -Return non-zero iff @var{x} is a non-immediate object. This is the -exact complement of @code{SCM_IMP}, above. - -You must use this macro before calling a finer-grained predicate to -determine @var{x}'s type. For example, to see if @var{x} is a pair, you -must write: -@example -SCM_NIMP (@var{x}) && SCM_CONSP (@var{x}) -@end example -This is because Guile stores typing information for non-immediate values -in their cells, rather than in the @code{SCM} value itself; thus, you -must determine whether @var{x} refers to a cell before looking inside -it. - -This is somewhat of a pity, because it means that the programmer needs -to know which types Guile implements as immediates vs. non-immediates. -There are (possibly better) representations in which @code{SCM_CONSP} -can be self-sufficient. The immediate type predicates do not suffer -from this weakness. -@end deftypefn - - -@node Immediate Datatypes, Non-immediate Datatypes, Immediates vs. Non-immediates, How Guile does it -@subsection Immediate Datatypes - -The following datatypes are immediate values; that is, they fit entirely -within an @code{SCM} value. The @code{SCM_IMP} and @code{SCM_NIMP} -macros will distinguish these from non-immediates; see @ref{Immediates -vs. Non-immediates} for an explanation of the distinction. - -Note that the type predicates for immediate values work correctly on any -@code{SCM} value; you do not need to call @code{SCM_IMP} first, to -establish that a value is immediate. This differs from the -non-immediate type predicates, which work correctly only on -non-immediate values; you must be sure the value is @code{SCM_NIMP} -before applying them.
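The difference between the two kinds of predicate shows up clearly in a small sketch. The representation here is again hypothetical and is not Guile's real tagging:

@itemize @bullet
@item
Words ending in @code{#b00} address a two-word cell.
@item
A non-pair cell keeps a type tag ending in @code{#b11} in its first word.
@item
The names @code{TAG_VECTOR}, @code{VECTORP}, @code{is_vector} and @code{make_tagged_cell} are invented for the example.
@end itemize

The immediate predicate inspects bits only. The non-immediate predicate reads the cell, so it must be guarded by @code{NIMP}.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical representation (not Guile's real tags): a word ending in
   #b00 points to a two-word cell; any other word is an immediate.  A
   non-pair cell holds a type tag in its first word; tags end in #b11,
   which this sketch assumes no ordinary value uses. */
typedef uintptr_t SCM;

struct cell {
  SCM car, cdr;
};

#define IMP(x)        (((x) & 3) != 0)
#define NIMP(x)       (!IMP (x))
#define CELL_CAR(x)   (((struct cell *) (x))->car)

#define TAG_VECTOR    ((SCM) 0x7)   /* hypothetical tag value */

/* Immediate predicate: looks only at the bits of x, so it is safe on
   any value, immediate or not. */
#define INUMP(x)      (((x) & 3) == 1)

/* Non-immediate predicate: reads the cell, so x MUST be non-immediate.
   On an immediate it would dereference a bogus address. */
#define VECTORP(x)    (CELL_CAR (x) == TAG_VECTOR)

/* The guarded idiom: && short-circuits, so VECTORP never sees an immediate. */
static int
is_vector (SCM x)
{
  return NIMP (x) && VECTORP (x);
}

/* Allocate a cell whose first word is TAG (helper for the example). */
static SCM
make_tagged_cell (SCM tag, SCM second)
{
  struct cell *c = malloc (sizeof *c);
  if (c == NULL)
    abort ();
  assert (((uintptr_t) c & 3) == 0);   /* low bits must be #b00 */
  c->car = tag;
  c->cdr = second;
  return (SCM) c;
}
```

Calling @code{VECTORP} directly on an immediate such as @code{(SCM) 5} would read from address 5. The @code{NIMP} test in @code{is_vector} prevents this, in the same way as the @code{SCM_NIMP (@var{x}) && SCM_CONSP (@var{x})} idiom.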
- - -@menu -* Integers:: -* Characters:: -* Booleans:: -* Unique Values:: -@end menu - -@node Integers, Characters, Immediate Datatypes, Immediate Datatypes -@subsubsection Integers - -Here are macros for operating on small integers that fit within an -@code{SCM}. Such integers are called @dfn{immediate numbers}, or -@dfn{INUMs}. In general, INUMs occupy all but two bits of an -@code{SCM}. - -Bignums and floating-point numbers are non-immediate objects, and have -their own, separate accessors. The macros here will not work on -them. This is not as much of a problem as you might think, however, -because the system never constructs bignums that could fit in an INUM, -and never uses floating point values for exact integers. - -@deftypefn Macro int SCM_INUMP (SCM @var{x}) -Return non-zero iff @var{x} is a small integer value. -@end deftypefn - -@deftypefn Macro int SCM_NINUMP (SCM @var{x}) -The complement of SCM_INUMP. -@end deftypefn - -@deftypefn Macro int SCM_INUM (SCM @var{x}) -Return the value of @var{x} as an ordinary C integer. If @var{x} -is not an INUM, the result is undefined. -@end deftypefn - -@deftypefn Macro SCM SCM_MAKINUM (int @var{i}) -Given a C integer @var{i}, return its representation as an @code{SCM}. -This macro does not check for overflow. -@end deftypefn - - -@node Characters, Booleans, Integers, Immediate Datatypes -@subsubsection Characters - -Here are macros for operating on characters. - -@deftypefn Macro int SCM_CHARP (SCM @var{x}) -Return non-zero iff @var{x} is a character value. -@end deftypefn - -@deftypefn Macro {unsigned int} SCM_CHAR (SCM @var{x}) -Return the value of @var{x} as a C character. If @var{x} is not a -Scheme character, the result is undefined. -@end deftypefn - -@deftypefn Macro SCM SCM_MAKE_CHAR (int @var{c}) -Given a C character @var{c}, return its representation as a Scheme -character value.
-@end deftypefn - - -@node Booleans, Unique Values, Characters, Immediate Datatypes -@subsubsection Booleans - -Here are macros for operating on booleans. - -@deftypefn Macro SCM SCM_BOOL_T -@deftypefnx Macro SCM SCM_BOOL_F -The Scheme true and false values. -@end deftypefn - -@deftypefn Macro int SCM_NFALSEP (SCM @var{x}) -Convert the Scheme value @var{x} to a C boolean. Since every object in -Scheme except @code{#f} is true, this amounts to comparing @var{x} to -@code{#f}; hence the name. -@c Noel feels a chill here. -@end deftypefn - -@deftypefn Macro SCM SCM_BOOL_NOT (SCM @var{x}) -Return the boolean inverse of @var{x}. If @var{x} is not a -Scheme boolean, the result is undefined. -@end deftypefn - - -@node Unique Values, , Booleans, Immediate Datatypes -@subsubsection Unique Values - -The immediate values that are neither small integers, characters, nor -booleans are all unique values --- that is, datatypes with only one -instance. - -@deftypefn Macro SCM SCM_EOL -The Scheme empty list object, or ``End Of List'' object, usually written -in Scheme as @code{'()}. -@end deftypefn - -@deftypefn Macro SCM SCM_EOF_VAL -The Scheme end-of-file value. It has no standard written -representation, for obvious reasons. -@end deftypefn - -@deftypefn Macro SCM SCM_UNSPECIFIED -The value returned by expressions which the Scheme standard says return -an ``unspecified'' value. - -This is sort of a weirdly literal way to take things, but the standard -read-eval-print loop prints nothing when the expression returns this -value, so it's not a bad idea to return this when you can't think of -anything else helpful. -@end deftypefn - -@deftypefn Macro SCM SCM_UNDEFINED -The ``undefined'' value. Its most important property is that it is not -equal to any valid Scheme value. This is put to various internal uses -by C code interacting with Guile.
- -For example, when you write a C function that is callable from Scheme -and which takes optional arguments, the interpreter passes -@code{SCM_UNDEFINED} for any arguments you did not receive. - -We also use this to mark unbound variables. -@end deftypefn - -@deftypefn Macro int SCM_UNBNDP (SCM @var{x}) -Return true if @var{x} is @code{SCM_UNDEFINED}. Apply this to a -symbol's value to see if it has a binding as a global variable. -@end deftypefn - - -@node Non-immediate Datatypes, Signalling Type Errors, Immediate Datatypes, How Guile does it -@subsection Non-immediate Datatypes - -A non-immediate datatype is one which lives in the heap, either because -it cannot fit entirely within a @code{SCM} word, or because it denotes a -specific storage location (in the nomenclature of the Revised^4 Report -on Scheme). - -The @code{SCM_IMP} and @code{SCM_NIMP} macros will distinguish these -from immediates; see @ref{Immediates vs. Non-immediates}. - -Given a cell, Guile distinguishes between pairs and other non-immediate -types by storing special @dfn{tag} values in a non-pair cell's car, that -cannot appear in normal pairs. A cell with a non-tag value in its car -is an ordinary pair. The type of a cell with a tag in its car depends -on the tag; the non-immediate type predicates test this value. If a tag -value appears elsewhere (in a vector, for example), the heap may become -corrupted. - - -@menu -* Non-immediate Type Predicates:: Special rules for using the type - predicates described here. -* Pairs:: -* Vectors:: -* Procedures:: -* Closures:: -* Subrs:: -* Ports:: -@end menu - -@node Non-immediate Type Predicates, Pairs, Non-immediate Datatypes, Non-immediate Datatypes -@subsubsection Non-immediate Type Predicates - -As mentioned in @ref{Garbage Collection}, all non-immediate objects -start with a @dfn{cell}, or a pair of words. Furthermore, all type -information that distinguishes one kind of non-immediate from another is -stored in the cell. 
The type information in the @code{SCM} value -indicates only that the object is a non-immediate; all finer -distinctions require one to examine the cell itself, usually with the -appropriate type predicate macro. - -The type predicates for non-immediate objects generally assume that -their argument is a non-immediate value, so you must make sure that a -value is @code{SCM_NIMP} before passing it to a non-immediate type -predicate. For example, the idiom for testing whether a value is a pair -is: -@example -SCM_NIMP (@var{x}) && SCM_CONSP (@var{x}) -@end example - - -@node Pairs, Vectors, Non-immediate Type Predicates, Non-immediate Datatypes -@subsubsection Pairs - -Pairs are the essential building block of list structure in Scheme. A -pair object has two fields, called the @dfn{car} and the @dfn{cdr}. - -It is conventional for a pair's @sc{car} to contain an element of a -list, and the @sc{cdr} to point to the next pair in the list, or to -contain @code{SCM_EOL}, indicating the end of the list. Thus, a set of -pairs chained through their @sc{cdr}s constitutes a singly-linked list. -Scheme and libguile define many functions which operate on lists -constructed in this fashion, so although lists chained through the -@sc{car}s of pairs will work fine too, they may be less convenient to -manipulate, and receive less support from the community. - -Guile implements pairs by mapping the @sc{car} and @sc{cdr} of a pair -directly into the two words of the cell. - - -@deftypefn Macro int SCM_CONSP (SCM @var{x}) -Return non-zero iff @var{x} is a Scheme pair object. -The results are undefined if @var{x} is an immediate value. -@end deftypefn - -@deftypefn Macro int SCM_NCONSP (SCM @var{x}) -The complement of SCM_CONSP. -@end deftypefn - -@deftypefn Macro void SCM_NEWCELL (SCM @var{into}) -Allocate a new cell, and set @var{into} to point to it. This macro -expands to a statement, not an expression, and @var{into} must be an -lvalue of type @code{SCM}.
- -This is the most primitive way to allocate a cell; it is quite fast. - -The @sc{car} of the cell initially tags it as a ``free cell''. If the -caller intends to use it as an ordinary cons, she must store ordinary -SCM values in its @sc{car} and @sc{cdr}. - -If the caller intends to use it as a header for some other type, she -must store an appropriate magic value in the cell's @sc{car}, to mark -it as a member of that type, and store whatever value in the @sc{cdr} -that type expects. You should generally not do this, unless you are -implementing a new datatype, and thoroughly understand the code in -@code{}. -@end deftypefn - -@deftypefun SCM scm_cons (SCM @var{car}, SCM @var{cdr}) -Allocate (``CONStruct'') a new pair, with @var{car} and @var{cdr} as its -contents. -@end deftypefun - - -The macros below perform no typechecking. The results are undefined if -@var{cell} is an immediate. However, since all non-immediate Guile -objects are constructed from cells, and these macros simply return the -first element of a cell, they actually can be useful on datatypes other -than pairs. (Of course, it is not very modular to use them outside of -the code which implements that datatype.) - -@deftypefn Macro SCM SCM_CAR (SCM @var{cell}) -Return the @sc{car}, or first field, of @var{cell}. -@end deftypefn - -@deftypefn Macro SCM SCM_CDR (SCM @var{cell}) -Return the @sc{cdr}, or second field, of @var{cell}. -@end deftypefn - -@deftypefn Macro void SCM_SETCAR (SCM @var{cell}, SCM @var{x}) -Set the @sc{car} of @var{cell} to @var{x}. -@end deftypefn - -@deftypefn Macro void SCM_SETCDR (SCM @var{cell}, SCM @var{x}) -Set the @sc{cdr} of @var{cell} to @var{x}. 
-@end deftypefn - -@deftypefn Macro SCM SCM_CAAR (SCM @var{cell}) -@deftypefnx Macro SCM SCM_CADR (SCM @var{cell}) -@deftypefnx Macro SCM SCM_CDAR (SCM @var{cell}) @dots{} -@deftypefnx Macro SCM SCM_CDDDDR (SCM @var{cell}) -Return the @sc{car} of the @sc{car} of @var{cell}, the @sc{car} of the -@sc{cdr} of @var{cell}, @i{et cetera}. -@end deftypefn - - -@node Vectors, Procedures, Pairs, Non-immediate Datatypes -@subsubsection Vectors, Strings, and Symbols - -Vectors, strings, and symbols have some properties in common. They all -have a length, and they all have an array of elements. In the case of a -vector, the elements are @code{SCM} values; in the case of a string or -symbol, the elements are characters. - -All these types store their length (along with some tagging bits) in the -@sc{car} of their header cell, and store a pointer to the elements in -their @sc{cdr}. Thus, the @code{SCM_CAR} and @code{SCM_CDR} macros -are (somewhat) meaningful when applied to these datatypes. - -@deftypefn Macro int SCM_VECTORP (SCM @var{x}) -Return non-zero iff @var{x} is a vector. -The results are undefined if @var{x} is an immediate value. -@end deftypefn - -@deftypefn Macro int SCM_STRINGP (SCM @var{x}) -Return non-zero iff @var{x} is a string. -The results are undefined if @var{x} is an immediate value. -@end deftypefn - -@deftypefn Macro int SCM_SYMBOLP (SCM @var{x}) -Return non-zero iff @var{x} is a symbol. -The results are undefined if @var{x} is an immediate value. -@end deftypefn - -@deftypefn Macro int SCM_LENGTH (SCM @var{x}) -Return the length of the object @var{x}. -The results are undefined if @var{x} is not a vector, string, or symbol. -@end deftypefn - -@deftypefn Macro {SCM *} SCM_VELTS (SCM @var{x}) -Return a pointer to the array of elements of the vector @var{x}. -The results are undefined if @var{x} is not a vector. -@end deftypefn - -@deftypefn Macro {char *} SCM_CHARS (SCM @var{x}) -Return a pointer to the characters of @var{x}. 
-The results are undefined if @var{x} is not a symbol or a string. -@end deftypefn - -There are also a few magic values stuffed into memory before a symbol's -characters, but you don't want to know about those. What cruft! - - -@node Procedures, Closures, Vectors, Non-immediate Datatypes -@subsubsection Procedures - -Guile provides two kinds of procedures: @dfn{closures}, which are the -result of evaluating a @code{lambda} expression, and @dfn{subrs}, which -are C functions packaged up as Scheme objects, to make them available to -Scheme programmers. - -(There are actually other sorts of procedures: compiled closures, and -continuations; see the source code for details about them.) - -@deftypefun SCM scm_procedure_p (SCM @var{x}) -Return @code{SCM_BOOL_T} iff @var{x} is a Scheme procedure object, of -any sort. Otherwise, return @code{SCM_BOOL_F}. -@end deftypefun - - -@node Closures, Subrs, Procedures, Non-immediate Datatypes -@subsubsection Closures - -[FIXME: this needs to be further subbed, but texinfo has no subsubsub] - -A closure is a procedure object, generated as the value of a -@code{lambda} expression in Scheme. The representation of a closure is -straightforward --- it contains a pointer to the code of the lambda -expression from which it was created, and a pointer to the environment -it closes over. - -In Guile, each closure also has a property list, allowing the system to -store information about the closure. I'm not sure what this is used for -at the moment --- the debugger, maybe? - -@deftypefn Macro int SCM_CLOSUREP (SCM @var{x}) -Return non-zero iff @var{x} is a closure. The results are -undefined if @var{x} is an immediate value. -@end deftypefn - -@deftypefn Macro SCM SCM_PROCPROPS (SCM @var{x}) -Return the property list of the closure @var{x}. The results are -undefined if @var{x} is not a closure. -@end deftypefn - -@deftypefn Macro void SCM_SETPROCPROPS (SCM @var{x}, SCM @var{p}) -Set the property list of the closure @var{x} to @var{p}. 
The results -are undefined if @var{x} is not a closure. -@end deftypefn - -@deftypefn Macro SCM SCM_CODE (SCM @var{x}) -Return the code of the closure @var{x}. The results are undefined if -@var{x} is not a closure. - -This macro should probably only be used internally by the -interpreter, since the representation of the code is intimately -connected with the interpreter's implementation. -@end deftypefn - -@deftypefn Macro SCM SCM_ENV (SCM @var{x}) -Return the environment enclosed by @var{x}. -The results are undefined if @var{x} is not a closure. - -This macro should probably only be used internally by the -interpreter, since the representation of the environment is intimately -connected with the interpreter's implementation. -@end deftypefn - - -@node Subrs, Ports, Closures, Non-immediate Datatypes -@subsubsection Subrs - -[FIXME: this needs to be further subbed, but texinfo has no subsubsub] - -A subr is a pointer to a C function, packaged up as a Scheme object to -make it callable by Scheme code. In addition to the function pointer, -the subr also contains a pointer to the name of the function, and -information about the number of arguments accepted by the C function, for -the sake of error checking. - -There is no single type predicate macro that recognizes subrs, as -distinct from other kinds of procedures. The closest thing is -@code{scm_procedure_p}; see @ref{Procedures}. - -@deftypefn Macro {char *} SCM_SNAME (SCM @var{x}) -Return the name of the subr @var{x}. The results are undefined if -@var{x} is not a subr. -@end deftypefn - -@deftypefun SCM scm_make_gsubr (char *@var{name}, int @var{req}, int @var{opt}, int @var{rest}, SCM (*@var{function})()) -Create a new subr object named @var{name}, based on the C function -@var{function}, make it visible to Scheme as the value of a global -variable named @var{name}, and return the subr object.
- -The subr object accepts @var{req} required arguments, @var{opt} optional -arguments, and a @var{rest} argument iff @var{rest} is non-zero. The C -function @var{function} should accept @code{@var{req} + @var{opt}} -arguments, or @code{@var{req} + @var{opt} + 1} arguments if @var{rest} -is non-zero. - -When a subr object is applied, it must be applied to at least @var{req} -arguments, or else Guile signals an error. @var{function} receives the -subr's first @var{req} arguments as its first @var{req} arguments. If -there are fewer than @var{opt} arguments remaining, then @var{function} -receives the value @code{SCM_UNDEFINED} for any missing optional -arguments. If @var{rest} is non-zero, then any arguments after the first -@code{@var{req} + @var{opt}} are packaged up as a list and passed as -@var{function}'s last argument. - -Note that subrs can actually only accept a predefined set of -combinations of required, optional, and rest arguments. For example, a -subr can take one required argument, or one required and one optional -argument, but a subr can't take one required and two optional arguments. -It's bizarre, but that's the way the interpreter was written. If the -arguments to @code{scm_make_gsubr} do not fit one of the predefined -patterns, then @code{scm_make_gsubr} will return a compiled closure -object instead of a subr object. -@end deftypefun - - -@node Ports, , Subrs, Non-immediate Datatypes -@subsubsection Ports - -Haven't written this yet, 'cos I don't understand ports yet. - - -@node Signalling Type Errors, , Non-immediate Datatypes, How Guile does it -@subsection Signalling Type Errors - -Every function visible at the Scheme level should aggressively check the -types of its arguments, to avoid misinterpreting a value, and perhaps -causing a segmentation fault. Guile provides some macros to make this -easier.
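The following is a small plain-C model of two ideas from this part of the manual: the @code{SCM_NIMP (@var{x}) && SCM_CONSP (@var{x})} idiom of ruling out immediates before reading a cell's type, and an argument check that names the offending argument position. It is a sketch under stated assumptions, not Guile code: the names @code{toy_value}, @code{toy_obj}, @code{toy_pairp} and @code{toy_check_arg}, and the tagging scheme (low bit set means a small integer, otherwise a pointer to a tagged object), are all hypothetical and do not reflect Guile's real representation or API.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical toy representation, NOT Guile's real tagging scheme:
   a word with its low bit set holds a small integer (an "immediate");
   any other word is a pointer to a heap object carrying a type tag. */
typedef uintptr_t toy_value;

enum toy_tag { TOY_TAG_PAIR, TOY_TAG_STRING };

struct toy_obj {
  enum toy_tag tag;      /* plays the role of the type bits in a cell */
  toy_value car, cdr;
};

#define TOY_IMMEDIATE_P(x) (((x) & 1u) != 0)
#define TOY_MAKE_INT(n)    ((toy_value) (((uintptr_t) (n) << 1) | 1u))
#define TOY_OBJ(x)         ((struct toy_obj *) (x))

/* Mirrors the SCM_NIMP (x) && SCM_CONSP (x) idiom: rule out immediates
   first, so the tag is only read when X really points to an object. */
static int
toy_pairp (toy_value x)
{
  return !TOY_IMMEDIATE_P (x) && TOY_OBJ (x)->tag == TOY_TAG_PAIR;
}

/* Hypothetical stand-in for an SCM_ASSERT-style check: returns 0 if OK
   is non-zero; otherwise writes a "wrong type argument" message naming
   SUBR and the 1-based argument POSITION into MSG and returns -1. */
static int
toy_check_arg (int ok, int position, const char *subr,
               char *msg, size_t msglen)
{
  if (ok)
    return 0;
  snprintf (msg, msglen, "%s: Wrong type argument in position %d",
            subr, position);
  return -1;
}
```

A caller would test @code{toy_check_arg (toy_pairp (x), 1, "my-subr", buf, sizeof buf)} and report @code{buf} on failure. The real @code{SCM_ASSERT}, described below, signals a Scheme error instead of returning a status code.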
- -@deftypefn Macro void SCM_ASSERT (int @var{test}, SCM @var{obj}, int @var{position}, char *@var{subr}) -If @var{test} is zero, signal an error, attributed to the subroutine -named @var{subr}, operating on the value @var{obj}. The @var{position} -value determines exactly what sort of error to signal. - -If @var{position} is a string, @code{SCM_ASSERT} raises a -``miscellaneous'' error whose message is that string. - -Otherwise, @var{position} should be one of the values defined below. -@end deftypefn - -@deftypefn Macro int SCM_ARG1 -@deftypefnx Macro int SCM_ARG2 -@deftypefnx Macro int SCM_ARG3 -@deftypefnx Macro int SCM_ARG4 -@deftypefnx Macro int SCM_ARG5 -Signal a ``wrong type argument'' error. When used as the @var{position} -argument of @code{SCM_ASSERT}, @code{SCM_ARG@var{n}} claims that -@var{obj} has the wrong type for the @var{n}'th argument of @var{subr}. - -The only way to complain about the type of an argument after the fifth -is to use @code{SCM_ARGn}, defined below, which doesn't specify which -argument is wrong. You could pass your own error message to -@code{SCM_ASSERT} as the @var{position}, but then the error signalled is -a ``miscellaneous'' error, not a ``wrong type argument'' error. This -seems kludgy to me. -@comment Any function with more than two arguments is wrong --- Perlis -@comment Despite Perlis, I agree. Why not have two Macros, one with -@comment a string error message, and the other with an integer position -@comment that only claims a type error in an argument? -@comment --- Keith Wright -@end deftypefn - -@deftypefn Macro int SCM_ARGn -As above, but does not specify which argument's type is incorrect. -@end deftypefn - -@deftypefn Macro int SCM_WNA -Signal an error complaining that the function received the wrong number -of arguments. - -Interestingly, the message is attributed to the function named by -@var{obj}, not @var{subr}, so @var{obj} must be a Scheme string object -naming the function. 
Usually, Guile catches these errors before ever -invoking the subr, so we don't run into these problems. -@end deftypefn - - -@node Defining New Types (Smobs), , How Guile does it, Top -@section Defining New Types (Smobs) - -@dfn{Smobs} are Guile's mechanism for adding new non-immediate types to -the system.@footnote{The term ``smob'' was coined by Aubrey Jaffer, who -says it comes from ``small object'', referring to the fact that only the -@sc{cdr} and part of the @sc{car} of a smob's cell are available for -use.} To define a new smob type, the programmer provides Guile with -some essential information about the type --- how to print it, how to -garbage collect it, and so on --- and Guile returns a fresh type tag for -use in the @sc{car} of new cells. The programmer can then use -@code{scm_make_gsubr} to make a set of C functions that create and -operate on these objects visible to Scheme code. - -(You can find a complete version of the example code used in this -section in the Guile distribution, in @file{doc/example-smob}. That -directory includes a makefile and a suitable @code{main} function, so -you can build a complete interactive Guile shell, extended with the -datatypes described here.) - -@menu -* Describing a New Type:: -* Creating Instances:: -* Typechecking:: -* Garbage Collecting Smobs:: -* A Common Mistake In Allocating Smobs:: -* Garbage Collecting Simple Smobs:: -* A Complete Example:: -@end menu - -@node Describing a New Type, Creating Instances, Defining New Types (Smobs), Defining New Types (Smobs) -@subsection Describing a New Type - -To define a new type, the programmer must write four functions to -manage instances of the type: - -@table @code -@item mark -Guile will apply this function to each instance of the new type it -encounters during garbage collection. This function is responsible for -telling the collector about any other non-immediate objects the object -refers to. The default smob mark function is to not mark any data. 
-@xref{Garbage Collecting Smobs}, for more details. - -@item free -Guile will apply this function to each instance of the new type it could -not find any live pointers to. The function should release all -resources held by the object and return the number of bytes released. -This is analogous to Java's finalization method: it is invoked at -an unspecified time (when garbage collection occurs) after the object -is dead. -The default free function frees the smob data (if the size of the struct -passed to @code{scm_make_smob_type} or @code{scm_make_smob_type_mfpe} is -non-zero) using @code{scm_must_free} and returns the size of that -struct. @xref{Garbage Collecting Smobs}, for more details. - -@item print -@c GJB:FIXME:: @var{exp} and @var{port} need to refer to a prototype of -@c the print function.... where is that, or where should it go? -Guile will apply this function to each instance of the new type to print -the value, as for @code{display} or @code{write}. The function should -write a printed representation of @var{exp} on @var{port}, in accordance -with the parameters in @var{pstate}. (For more information on print -states, see @ref{Ports}.) The default print function prints @code{#<NAME ADDRESS>} -where @code{NAME} is the first argument passed to @code{scm_make_smob_type} or -@code{scm_make_smob_type_mfpe}. - -@item equalp -If Scheme code asks the @code{equal?} function to compare two instances -of the same smob type, Guile calls this function. It should return -@code{SCM_BOOL_T} if @var{a} and @var{b} should be considered -@code{equal?}, or @code{SCM_BOOL_F} otherwise. If @code{equalp} is -@code{NULL}, @code{equal?} will assume that two instances of this type are -never @code{equal?} unless they are @code{eq?}.
- -@end table - -To actually register the new smob type, call @code{scm_make_smob_type}: - -@deftypefun long scm_make_smob_type (const char *name, scm_sizet size) -This function implements the standard way of adding a new smob type, -named @var{name}, with instance size @var{size}, to the system. The -return value is a tag that is used in creating instances of the type. -If @var{size} is 0, then no memory will be allocated when instances of -the smob are created, and nothing will be freed by the default free -function. Default values are provided for mark, free, print, and -equalp, as described above. If you want to customize any of these -functions, the call to @code{scm_make_smob_type} should be immediately -followed by calls to one or several of @code{scm_set_smob_mark}, -@code{scm_set_smob_free}, @code{scm_set_smob_print}, and/or -@code{scm_set_smob_equalp}. -@end deftypefun - -Each of the following @code{scm_set_smob_XXX} functions registers a smob -special function for a given type. Each function is intended to be called -at most once per type, and the call should be placed -immediately following the call to @code{scm_make_smob_type}. - -@deftypefun void scm_set_smob_mark (long tc, SCM (*mark) (SCM)) -This function sets the smob marking procedure for the smob type specified by -the tag @var{tc}. @var{tc} is the tag returned by @code{scm_make_smob_type}. -@end deftypefun - -@deftypefun void scm_set_smob_free (long tc, scm_sizet (*free) (SCM)) -This function sets the smob freeing procedure for the smob type specified by -the tag @var{tc}. @var{tc} is the tag returned by @code{scm_make_smob_type}. -@end deftypefun - -@deftypefun void scm_set_smob_print (long tc, int (*print) (SCM,SCM,scm_print_state*)) -This function sets the smob printing procedure for the smob type specified by -the tag @var{tc}. @var{tc} is the tag returned by @code{scm_make_smob_type}.
-@end deftypefun - -@deftypefun void scm_set_smob_equalp (long tc, SCM (*equalp) (SCM,SCM)) -This function sets the smob equality-testing predicate for the smob type specified by -the tag @var{tc}. @var{tc} is the tag returned by @code{scm_make_smob_type}. -@end deftypefun - -Instead of using @code{scm_make_smob_type} and calling each of the -individual @code{scm_set_smob_XXX} functions to register each special -function independently, you can use @code{scm_make_smob_type_mfpe} to -register all of the special functions at once as you create the smob -type@footnote{Warning: There is an ongoing discussion among the developers which -may result in deprecating @code{scm_make_smob_type_mfpe} in the next release -of Guile.}: - -@deftypefun long scm_make_smob_type_mfpe(const char *name, scm_sizet size, SCM (*mark) (SCM), scm_sizet (*free) (SCM), int (*print) (SCM, SCM, scm_print_state*), SCM (*equalp) (SCM, SCM)) -This function invokes @code{scm_make_smob_type} on its first two arguments -to add a new smob type named @var{name}, with instance size @var{size}, to the system. -It also registers the @var{mark}, @var{free}, @var{print}, and @var{equalp} smob -special functions for that new type. Any of these parameters can be @code{NULL} -to have that special function use Guile's default behaviour. -The return value is a tag that is used in creating instances of the type. If @var{size} -is 0, then no memory will be allocated when instances of the smob are created, and -nothing will be freed by the default free function.
-@end deftypefun - -For example, here is how one might declare and register a new type -representing eight-bit grayscale images: -@example -#include <libguile.h> - -/* struct image and the mark_image, free_image, and print_image - functions are defined later in this section. */ -long image_tag; - -void -init_image_type () -@{ - image_tag = scm_make_smob_type_mfpe ("image", sizeof (struct image), - mark_image, free_image, print_image, NULL); -@} -@end example - - -@node Creating Instances, Typechecking, Describing a New Type, Defining New Types (Smobs) -@subsection Creating Instances - -Like other non-immediate types, smobs start with a cell whose @sc{car} -contains typing information, and whose @sc{cdr} is free for any use. For smobs, -the @sc{cdr} stores a pointer to the internal C structure holding the -smob-specific data. -To create an instance of a smob type following these standards, you should -use @code{SCM_NEWSMOB}: - -@deftypefn Macro void SCM_NEWSMOB(SCM value,long tag,void *data) -Make @var{value} contain a smob instance of the type with tag @var{tag} -and smob data @var{data}. @var{value} must be previously declared -as C type @code{SCM}. -@end deftypefn - -Since it is often the case (e.g., in smob constructors) that you will -create a smob instance and return it, there is also a slightly specialized -macro for this situation: - -@deftypefn Macro fn_returns SCM_RETURN_NEWSMOB(long tag, void *data) -This macro expands to a block of code that creates a smob instance of -the type with tag @var{tag} and smob data @var{data}, and returns -that @code{SCM} value. It should be the last piece of code in -a block. -@end deftypefn - -Guile provides the following functions for managing memory, which are -often helpful when implementing smobs: - -@deftypefun {char *} scm_must_malloc (long @var{len}, char *@var{what}) -Allocate @var{len} bytes of memory, using @code{malloc}, and return a -pointer to them. - -If there is not enough memory available, invoke the garbage collector, -and try once more.
If there is still not enough, signal an error, -reporting that we could not allocate @var{what}. - -This function also helps maintain statistics about the size of the heap. -@end deftypefun - -@deftypefun {char *} scm_must_realloc (char *@var{addr}, long @var{olen}, long @var{len}, char *@var{what}) -Resize (and possibly relocate) the block of memory at @var{addr}, to -have a size of @var{len} bytes, by calling @code{realloc}. Return a -pointer to the new block. - -If there is not enough memory available, invoke the garbage collector, -and try once more. If there is still not enough, signal an error, -reporting that we could not allocate @var{what}. - -The value @var{olen} should be the old size of the block of memory at -@var{addr}; it is only used for keeping statistics on the size of the -heap. -@end deftypefun - -@deftypefun void scm_must_free (char *@var{addr}) -Free the block of memory at @var{addr}, using @code{free}. If -@var{addr} is zero, signal an error, complaining of an attempt to free -something that is already free. - -This does no record-keeping; instead, the smob's @code{free} function -must take care of that. - -This function isn't usually sufficiently different from the usual -@code{free} function to be worth using. 
-@end deftypefun - - -Continuing the above example, if the global variable @code{image_tag} -contains a tag returned by @code{scm_make_smob_type_mfpe}, here is how we could -construct a smob whose @sc{cdr} contains a pointer to a freshly -allocated @code{struct image}: - -@example -struct image @{ - int width, height; - char *pixels; - - /* The name of this image */ - SCM name; - - /* A function to call when this image is - modified, e.g., to update the screen, - or SCM_BOOL_F if no action is necessary */ - SCM update_func; -@}; - -SCM -make_image (SCM name, SCM s_width, SCM s_height) -@{ - struct image *image; - int width, height; - - SCM_ASSERT (SCM_NIMP (name) && SCM_STRINGP (name), name, - SCM_ARG1, "make-image"); - SCM_ASSERT (SCM_INUMP (s_width), s_width, SCM_ARG2, "make-image"); - SCM_ASSERT (SCM_INUMP (s_height), s_height, SCM_ARG3, "make-image"); - - width = SCM_INUM (s_width); - height = SCM_INUM (s_height); - - image = (struct image *) scm_must_malloc (sizeof (struct image), "image"); - image->width = width; - image->height = height; - image->pixels = scm_must_malloc (width * height, "image pixels"); - image->name = name; - image->update_func = SCM_BOOL_F; - - SCM_RETURN_NEWSMOB (image_tag, image); -@} -@end example - - -@node Typechecking, Garbage Collecting Smobs, Creating Instances, Defining New Types (Smobs) -@subsection Typechecking - -Functions that operate on smobs should aggressively check the types of -their arguments, to avoid misinterpreting some other datatype as a smob, -and perhaps causing a segmentation fault. Fortunately, this is pretty -simple to do. The function need only verify that its argument is a -non-immediate whose @sc{car} is the type tag returned by -@code{scm_make_smob_type}. - -For example, here is a simple function that operates on an image smob, -and checks the type of its argument.
We also present an expanded -version of the @code{init_image_type} function, to make -@code{clear_image} and the image constructor function @code{make_image} -visible to Scheme code. -@example -SCM -clear_image (SCM image_smob) -@{ - int area; - struct image *image; - - SCM_ASSERT (SCM_SMOB_PREDICATE (image_tag, image_smob), - image_smob, SCM_ARG1, "clear-image"); - - image = (struct image *) SCM_SMOB_DATA (image_smob); - area = image->width * image->height; - memset (image->pixels, 0, area); - - /* Invoke the image's update function. */ - if (image->update_func != SCM_BOOL_F) - scm_apply (image->update_func, SCM_EOL, SCM_EOL); - - return SCM_UNSPECIFIED; -@} - - -void -init_image_type () -@{ - image_tag = scm_make_smob_type_mfpe ("image", sizeof (struct image), - mark_image, free_image, print_image, NULL); - - scm_make_gsubr ("make-image", 3, 0, 0, make_image); - scm_make_gsubr ("clear-image", 1, 0, 0, clear_image); -@} -@end example - -Note that checking types is a little more complicated during garbage -collection; see the description of @code{SCM_GCTYP16} in @ref{Garbage -Collecting Smobs}. - -@c GJB:FIXME:: should talk about guile-snarf somewhere! - -@node Garbage Collecting Smobs, A Common Mistake In Allocating Smobs, Typechecking, Defining New Types (Smobs) -@subsection Garbage Collecting Smobs - -Once a smob has been released to the tender mercies of the Scheme -system, it must be prepared to survive garbage collection. Guile calls -the @code{mark} and @code{free} functions registered for the smob's -type to manage this. - -As described before (@pxref{Garbage Collection}), every object in the -Scheme system has a @dfn{mark bit}, which the garbage collector uses to -tell live objects from dead ones. When collection starts, every -object's mark bit is clear. The collector traces pointers through the -heap, starting from objects known to be live, and sets the mark bit on -each object it encounters.
When it can find no more unmarked objects reachable from the live ones, -the collector walks all objects, live and dead, frees those whose mark -bits are still clear, and clears the mark bit on the others. - -The two main portions of the collection are called the @dfn{mark phase}, -during which the collector marks live objects, and the @dfn{sweep -phase}, during which the collector frees all unmarked objects. - -The mark bit of a smob lives in its @sc{car}, along with the smob's type -tag. When the collector encounters a smob, it sets the smob's mark bit, -and uses the smob's type tag to find the appropriate @code{mark} -function for that smob: the one registered for the smob's -type. It then calls the @code{mark} function, -passing it the smob as its only argument. - -The @code{mark} function is responsible for marking any other Scheme -objects the smob refers to. If it does not do so, the objects' mark -bits will still be clear when the collector begins to sweep, and the -collector will free them. If this occurs, it will probably break, or at -least confuse, any code operating on the smob; the smob's @code{SCM} -values will have become dangling references. - -To mark an arbitrary Scheme object, the @code{mark} function may call -this function: - -@deftypefun void scm_gc_mark (SCM @var{x}) -Mark the object @var{x}, and recurse on any objects @var{x} refers to. -If @var{x}'s mark bit is already set, return immediately. -@end deftypefun - -Thus, here is how we might write the @code{mark} function for the image -smob type discussed above: -@example -@group -SCM -mark_image (SCM image_smob) -@{ - /* Mark the image's name and update function.
*/ - struct image *image = (struct image *) SCM_SMOB_DATA (image_smob); - - scm_gc_mark (image->name); - scm_gc_mark (image->update_func); - - return SCM_BOOL_F; -@} -@end group -@end example - -Note that, even though the image's @code{update_func} could be an -arbitrarily complex structure (representing a procedure and any values -enclosed in its environment), @code{scm_gc_mark} will recurse as -necessary to mark all its components. Because @code{scm_gc_mark} sets -an object's mark bit before it recurses, it is not confused by -circular structures. - -As an optimization, the collector will mark whatever value is returned -by the @code{mark} function; this helps limit depth of recursion during -the mark phase. Thus, the code above could also be written as: -@example -@group -SCM -mark_image (SCM image_smob) -@{ - /* Mark the image's name and update function. */ - struct image *image = (struct image *) SCM_SMOB_DATA (image_smob); - - scm_gc_mark (image->name); - return image->update_func; -@} -@end group -@end example - - -Finally, when the collector encounters an unmarked smob during the sweep -phase, it uses the smob's tag to find the appropriate @code{free} -function for the smob. It then calls the function, passing it the smob -as its only argument. - -The @code{free} function must release any resources used by the smob. -However, it need not free objects managed by the collector; the -collector will take care of them. The return type of the @code{free} -function should be @code{scm_sizet}, an unsigned integral type; the -@code{free} function should return the number of bytes released, to help -the collector maintain statistics on the size of the heap. 
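The mark and sweep phases described above can be modelled by a tiny, self-contained C program. This is a hypothetical toy (@code{toy_gc_obj}, @code{toy_gc_alloc}, @code{toy_gc_mark}, @code{toy_gc_free}, @code{toy_gc_collect} and the fixed-size heap are all invented for illustration), not Guile's collector. It shows marking from a set of roots with the mark bit set before recursing, so cycles terminate, and a sweep that calls a per-object free function, sums the byte counts it reports, and clears the mark bits of the survivors.

```c
#include <stdlib.h>

/* Hypothetical toy heap, NOT Guile's collector: each object has a mark
   bit, up to two references to other objects, and a malloc'd payload
   whose size its free function reports, like a smob free function. */
#define TOY_GC_HEAP_SIZE 16

struct toy_gc_obj {
  int in_use;
  int mark;
  struct toy_gc_obj *ref[2];
  char *payload;
  size_t payload_size;
};

static struct toy_gc_obj toy_gc_heap[TOY_GC_HEAP_SIZE];

/* Take the first unused slot and give it a fresh payload. */
static struct toy_gc_obj *
toy_gc_alloc (size_t payload_size)
{
  int i;
  for (i = 0; i < TOY_GC_HEAP_SIZE; i++)
    if (!toy_gc_heap[i].in_use)
      {
        struct toy_gc_obj *o = &toy_gc_heap[i];
        o->in_use = 1;
        o->mark = 0;
        o->ref[0] = o->ref[1] = NULL;
        o->payload = malloc (payload_size);
        o->payload_size = payload_size;
        return o;
      }
  return NULL;                  /* heap full */
}

/* Mark phase: set the mark bit before recursing, so circular
   structures terminate, as described for scm_gc_mark. */
static void
toy_gc_mark (struct toy_gc_obj *o)
{
  if (o == NULL || o->mark)
    return;
  o->mark = 1;
  toy_gc_mark (o->ref[0]);
  toy_gc_mark (o->ref[1]);
}

/* Per-object free function: release resources, report bytes freed. */
static size_t
toy_gc_free (struct toy_gc_obj *o)
{
  size_t released = o->payload_size;
  free (o->payload);
  o->payload = NULL;
  o->in_use = 0;
  return released;
}

/* Full collection: mark everything reachable from ROOTS, then sweep,
   freeing unmarked objects and clearing the marks of live ones.
   Returns the total number of bytes released. */
static size_t
toy_gc_collect (struct toy_gc_obj **roots, int nroots)
{
  size_t released = 0;
  int i;

  for (i = 0; i < nroots; i++)
    toy_gc_mark (roots[i]);

  for (i = 0; i < TOY_GC_HEAP_SIZE; i++)
    {
      struct toy_gc_obj *o = &toy_gc_heap[i];
      if (!o->in_use)
        continue;
      if (o->mark)
        o->mark = 0;
      else
        released += toy_gc_free (o);
    }
  return released;
}
```

In this model an object that is reachable only through some C structure that is not passed as a root is swept exactly like garbage, which is the situation described in @ref{A Common Mistake In Allocating Smobs}.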
- -Here is how we might write the @code{free} function for the image smob -type: -@example -scm_sizet -free_image (SCM image_smob) -@{ - struct image *image = (struct image *) SCM_SMOB_DATA (image_smob); - scm_sizet size = image->width * image->height + sizeof (*image); - - free (image->pixels); - free (image); - - return size; -@} -@end example - -During the sweep phase, the garbage collector will clear the mark bits -on all live objects. The code which implements a smob need not do this -itself. - -There is no way for smob code to be notified when collection is -complete. - -Note that, since a smob's mark bit lives in its @sc{car}, along with the -smob's type tag, the technique for checking the type of a smob described -in @ref{Typechecking} will not necessarily work during GC. If you need -to find out whether a given object is a particular smob type during GC, -use the following macro: - -@deftypefn Macro void SCM_GCTYP16 (SCM @var{x}) -Return the type bits of the smob @var{x}, with the mark bit clear. - -Use this macro instead of @code{SCM_CAR} to check the type of a smob -during GC. Usually, only code called by the smob's @code{mark} function -need worry about this. -@end deftypefn - -It is usually a good idea to minimize the amount of processing done -during garbage collection; keep @code{mark} and @code{free} functions -very simple. Since collections occur at unpredictable times, it is easy -for any unusual activity to interfere with normal code. - - -@node A Common Mistake In Allocating Smobs, Garbage Collecting Simple Smobs, Garbage Collecting Smobs, Defining New Types (Smobs) -@subsection A Common Mistake In Allocating Smobs - -When constructing new objects, you must be careful that the garbage -collector can always find any new objects you allocate. 
For example, -suppose we wrote the @code{make_image} function this way: - -@example -SCM -make_image (SCM name, SCM s_width, SCM s_height) -@{ - struct image *image; - SCM image_smob; - int width, height; - - SCM_ASSERT (SCM_NIMP (name) && SCM_STRINGP (name), name, - SCM_ARG1, "make-image"); - SCM_ASSERT (SCM_INUMP (s_width), s_width, SCM_ARG2, "make-image"); - SCM_ASSERT (SCM_INUMP (s_height), s_height, SCM_ARG3, "make-image"); - - width = SCM_INUM (s_width); - height = SCM_INUM (s_height); - - image = (struct image *) scm_must_malloc (sizeof (struct image), "image"); - image->width = width; - image->height = height; - image->pixels = scm_must_malloc (width * height, "image pixels"); - - /* THESE TWO LINES HAVE CHANGED: */ - image->name = scm_string_copy (name); - image->update_func = scm_make_gsubr (@dots{}); - - SCM_NEWCELL (image_smob); - SCM_SETCDR (image_smob, image); - SCM_SETCAR (image_smob, image_tag); - - return image_smob; -@} -@end example - -This code is incorrect. The calls to @code{scm_string_copy} and -@code{scm_make_gsubr} allocate fresh objects. Allocating any new object -may cause the garbage collector to run. If @code{scm_make_gsubr} -invokes a collection, the garbage collector has no way to discover that -@code{image->name} points to the new string object; the @code{image} -structure is not yet part of any Scheme object, so the garbage collector -will not traverse it. Since the garbage collector cannot find any -references to the new string object, it will free it, leaving -@code{image} pointing to a dead object. 
- -A correct implementation might say, instead: -@example - image->name = SCM_BOOL_F; - image->update_func = SCM_BOOL_F; - - SCM_NEWCELL (image_smob); - SCM_SETCDR (image_smob, image); - SCM_SETCAR (image_smob, image_tag); - - image->name = scm_string_copy (name); - image->update_func = scm_make_gsubr (@dots{}); - - return image_smob; -@end example - -Now, by the time we allocate the new string and function objects, -@code{image_smob} points to @code{image}. If the garbage collector -scans the stack, it will find a reference to @code{image_smob} and -traverse @code{image}, so any objects @code{image} points to will be -preserved. - - -@node Garbage Collecting Simple Smobs, A Complete Example, A Common Mistake In Allocating Smobs, Defining New Types (Smobs) -@subsection Garbage Collecting Simple Smobs - -It is often useful to define very simple smob types --- smobs which have -no data to mark, other than the cell itself, or smobs whose @sc{cdr} is -simply an ordinary Scheme object, to be marked recursively. Guile -provides some functions to handle these common cases; you can use these -functions as your smob type's @code{mark} function, if your smob's -structure is simple enough. - -If the smob refers to no other Scheme objects, then no action is -necessary; the garbage collector has already marked the smob cell -itself. In that case, you can use zero as your mark function. - -@deftypefun SCM scm_markcdr (SCM @var{x}) -Mark the references in the smob @var{x}, assuming that @var{x}'s -@sc{cdr} contains an ordinary Scheme object, and @var{x} refers to no -other objects. This function simply returns @var{x}'s @sc{cdr}. -@end deftypefun - -@deftypefun scm_sizet scm_free0 (SCM @var{x}) -Do nothing; return zero. This function is appropriate for smobs that -use either zero or @code{scm_markcdr} as their marking functions, and -refer to no heap storage, including memory managed by @code{malloc}, -other than the smob's header cell. 
-@end deftypefun
-
-
-@node A Complete Example, , Garbage Collecting Simple Smobs, Defining New Types (Smobs)
-@subsection A Complete Example
-
-Here is the complete text of the implementation of the image datatype,
-as presented in the sections above. We also provide a definition for
-the smob's @code{print} function, and make some objects and functions
-static, to clarify exactly what the surrounding code is using.
-
-As mentioned above, you can find this code in the Guile distribution, in
-@file{doc/example-smob}. That directory includes a makefile and a
-suitable @code{main} function, so you can build a complete interactive
-Guile shell, extended with the datatypes described here.
-
-@example
-/* file "image-type.c" */
-
-#include <stdlib.h>
-#include <libguile.h>
-
-static long image_tag;
-
-struct image @{
-  int width, height;
-  char *pixels;
-
-  /* The name of this image */
-  SCM name;
-
-  /* A function to call when this image is
-     modified, e.g., to update the screen,
-     or SCM_BOOL_F if no action necessary */
-  SCM update_func;
-@};
-
-static SCM
-make_image (SCM name, SCM s_width, SCM s_height)
-@{
-  struct image *image;
-  SCM image_smob;
-  int width, height;
-
-  SCM_ASSERT (SCM_NIMP (name) && SCM_STRINGP (name), name,
-              SCM_ARG1, "make-image");
-  SCM_ASSERT (SCM_INUMP (s_width), s_width, SCM_ARG2, "make-image");
-  SCM_ASSERT (SCM_INUMP (s_height), s_height, SCM_ARG3, "make-image");
-
-  width = SCM_INUM (s_width);
-  height = SCM_INUM (s_height);
-
-  image = (struct image *) scm_must_malloc (sizeof (struct image), "image");
-  image->width = width;
-  image->height = height;
-  image->pixels = scm_must_malloc (width * height, "image pixels");
-  image->name = name;
-  image->update_func = SCM_BOOL_F;
-
-  SCM_NEWSMOB (image_smob, image_tag, image);
-
-  return image_smob;
-@}
-
-static SCM
-clear_image (SCM image_smob)
-@{
-  int area;
-  struct image *image;
-
-  SCM_ASSERT (SCM_SMOB_PREDICATE (image_tag, image_smob),
-              image_smob, SCM_ARG1, "clear-image");
-
-  image = (struct image *)
-    SCM_SMOB_DATA (image_smob);
-  area = image->width * image->height;
-  memset (image->pixels, 0, area);
-
-  /* Invoke the image's update function.  */
-  if (image->update_func != SCM_BOOL_F)
-    scm_apply (image->update_func, SCM_EOL, SCM_EOL);
-
-  return SCM_UNSPECIFIED;
-@}
-
-static SCM
-mark_image (SCM image_smob)
-@{
-  struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
-
-  scm_gc_mark (image->name);
-  return image->update_func;
-@}
-
-static scm_sizet
-free_image (SCM image_smob)
-@{
-  struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
-  scm_sizet size = image->width * image->height + sizeof (struct image);
-
-  free (image->pixels);
-  free (image);
-
-  return size;
-@}
-
-static int
-print_image (SCM image_smob, SCM port, scm_print_state *pstate)
-@{
-  struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);
-
-  scm_puts ("#<image ", port);
-  scm_display (image->name, port);
-  scm_puts (">", port);
-
-  /* non-zero means success */
-  return 1;
-@}
-
-static scm_smobfuns image_funs = @{
-  mark_image, free_image, print_image, 0
-@};
-
-void
-init_image_type ()
-@{
-  image_tag = scm_newsmob (&image_funs);
-
-  scm_make_gsubr ("clear-image", 1, 0, 0, clear_image);
-  scm_make_gsubr ("make-image", 3, 0, 0, make_image);
-@}
-@end example
-
-Here is a sample build and interaction with the code from the
-@file{example-smob} directory, on the author's machine:
-
-@example
-zwingli:example-smob$ make CC=gcc
-gcc `guile-config compile` -c image-type.c -o image-type.o
-gcc `guile-config compile` -c myguile.c -o myguile.o
-gcc image-type.o myguile.o `guile-config link` -o myguile
-zwingli:example-smob$ ./myguile
-guile> make-image
-#<primitive-procedure make-image>
-guile> (define i (make-image "Whistler's Mother" 100 100))
-guile> i
-#<image Whistler's Mother>
-guile> (clear-image i)
-guile> (clear-image 4)
-ERROR: In procedure clear-image in expression (clear-image 4):
-ERROR: Wrong type argument in position 1: 4
-ABORT: (wrong-type-arg)
-
-Type "(backtrace)" to get more information.
-guile> 
-@end example
-
-@bye