@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C)  1996, 1997, 2000, 2001, 2002, 2003, 2004, 2005, 2010
@c   Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@node Defining New Types (Smobs)
@section Defining New Types (Smobs)

@dfn{Smobs} are Guile's mechanism for adding new primitive types to
the system.  The term ``smob'' was coined by Aubrey Jaffer, who says
it comes from ``small object'', referring to the fact that they are
quite limited in size: they can hold just one pointer to a larger
memory block plus 16 extra bits.

To define a new smob type, the programmer provides Guile with some
essential information about the type (how to print it, how to garbage
collect it, and so on), and Guile allocates a fresh type tag for it.
The programmer can then use @code{scm_c_define_gsubr} to make a set of
C functions visible to Scheme code that create and operate on these
objects.

(You can find a complete version of the example code used in this
section in the Guile distribution, in @file{doc/example-smob}.  That
directory includes a makefile and a suitable @code{main} function, so
you can build a complete interactive Guile shell, extended with the
datatypes described here.)

@menu
* Describing a New Type::
* Creating Smob Instances::
* Type checking::
* Garbage Collecting Smobs::
* Garbage Collecting Simple Smobs::
* Remembering During Operations::
* Double Smobs::
* The Complete Example::
@end menu

@node Describing a New Type
@subsection Describing a New Type

To define a new type, the programmer can write up to four functions to
manage instances of the type; Guile falls back to a default for each
one that is not registered:

@table @code
@item mark
Guile will apply this function to each instance of the new type it
encounters during garbage collection.  This function is responsible for
telling the collector about any other @code{SCM} values that the object
has stored.  The default smob mark function does nothing.
@xref{Garbage Collecting Smobs}, for more details.
@item free
Guile will apply this function to each instance of the new type that is
to be deallocated.  The function should release all resources held by
the object.  This is analogous to a Java finalization method: it is
invoked at an unspecified time (when garbage collection occurs) after
the object is dead.  The default free function frees the smob data (if
the size of the struct passed to @code{scm_make_smob_type} is non-zero)
using @code{scm_gc_free}.  @xref{Garbage Collecting Smobs}, for more
details.

This function operates while the heap is in an inconsistent state and
must therefore be careful.  @xref{Smobs}, for details about what this
function is allowed to do.

@item print
Guile will apply this function to each instance of the new type to print
the value, as for @code{display} or @code{write}.  The default print
function prints @code{#<NAME ADDRESS>} where @code{NAME} is the first
argument passed to @code{scm_make_smob_type}.

@item equalp
If Scheme code asks the @code{equal?} function to compare two instances
of the same smob type, Guile calls this function with the two
instances, @var{a} and @var{b}.  It should return @code{SCM_BOOL_T} if
@var{a} and @var{b} should be considered @code{equal?}, or
@code{SCM_BOOL_F} otherwise.  If @code{equalp} is @code{NULL},
@code{equal?} will assume that two instances of this type are never
@code{equal?} unless they are @code{eq?}.

@end table

To actually register the new smob type, call @code{scm_make_smob_type}.
It returns a value of type @code{scm_t_bits} which identifies the new
smob type.

The four special functions described above are registered by calling
one of @code{scm_set_smob_mark}, @code{scm_set_smob_free},
@code{scm_set_smob_print}, or @code{scm_set_smob_equalp}, as
appropriate.  Each function is intended to be used at most once per
type, and the call should be placed immediately following the call to
@code{scm_make_smob_type}.

There can be at most 256 different smob types in the system.
Instead of registering a huge number of smob types (for example, one
for each relevant C struct in your application), it is sometimes
better to register just one and implement a second layer of type
dispatching on top of it.  This second layer might use the 16 extra
bits to encode its own sub-type, for example.

Here is how one might declare and register a new type representing
eight-bit gray-scale images:

@example
#include <libguile.h>

struct image @{
  int width, height;
  char *pixels;

  /* The name of this image */
  SCM name;

  /* A function to call when this image is
     modified, e.g., to update the screen,
     or SCM_BOOL_F if no action necessary */
  SCM update_func;
@};

static scm_t_bits image_tag;

void
init_image_type (void)
@{
  /* mark_image, free_image, and print_image are
     defined in the following sections.  */
  image_tag = scm_make_smob_type ("image", sizeof (struct image));
  scm_set_smob_mark (image_tag, mark_image);
  scm_set_smob_free (image_tag, free_image);
  scm_set_smob_print (image_tag, print_image);
@}
@end example


@node Creating Smob Instances
@subsection Creating Smob Instances

Normally, smobs can have one @emph{immediate} word of data.  This word
stores either a pointer to an additional memory block that holds the
real data, or it might hold the data itself when it fits.  The word is
large enough for a @code{SCM} value, a pointer to @code{void}, or an
integer that fits into a @code{size_t} or @code{ssize_t}.

You can also create smobs that have two or three immediate words, and
when these words suffice to store all data, it is more efficient to use
these super-sized smobs instead of using a normal smob plus a memory
block.  @xref{Double Smobs}, for their discussion.

Guile provides functions for managing memory which are often helpful
when implementing smobs.  @xref{Memory Blocks}.

To retrieve the immediate word of a smob, you use the macro
@code{SCM_SMOB_DATA}.  It can be set with @code{SCM_SET_SMOB_DATA}.
The 16 extra bits can be accessed with @code{SCM_SMOB_FLAGS} and
@code{SCM_SET_SMOB_FLAGS}.
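The 16 extra bits are also what the second layer of type dispatching
mentioned in @ref{Describing a New Type} could use.  The following is
only a minimal model in plain C, written so that it compiles without
libguile: @code{struct toy_smob}, @code{enum shape_kind}, and
@code{shape_area} are hypothetical names invented for this sketch.  In
real smob code the sub-type code would be stored with
@code{SCM_SET_SMOB_FLAGS} and read back with @code{SCM_SMOB_FLAGS}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a smob of one registered type.  With
   libguile, `flags' corresponds to the 16 extra bits and `data' to
   the immediate word.  */
struct toy_smob {
  uint16_t flags;
  void *data;
};

/* Sub-type codes for the second dispatch layer (assumed names).  */
enum shape_kind { SHAPE_SQUARE, SHAPE_RECT, SHAPE_KIND_COUNT };

struct square { long side; };
struct rect { long width, height; };

static long
square_area (const void *p)
{
  const struct square *s = p;
  return s->side * s->side;
}

static long
rect_area (const void *p)
{
  const struct rect *r = p;
  return r->width * r->height;
}

/* One entry per sub-type, indexed by the code kept in the flags.  */
static long (*const area_fn[SHAPE_KIND_COUNT]) (const void *) = {
  square_area,
  rect_area
};

/* Dispatch on the sub-type stored in the 16 extra bits.  */
static long
shape_area (const struct toy_smob *smob)
{
  assert (smob != NULL && smob->flags < SHAPE_KIND_COUNT);
  return area_fn[smob->flags] (smob->data);
}
```

With this layout a single smob type tag serves every shape, and adding
a shape costs one enum value and one table entry rather than one of
the 256 available smob types.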
The two macros @code{SCM_SMOB_DATA} and @code{SCM_SET_SMOB_DATA} treat
the immediate word as if it were of type @code{scm_t_bits}, which is
an unsigned integer type large enough to hold a pointer to
@code{void}.  Thus you can use these macros to store arbitrary
pointers in the smob word.

When you want to store a @code{SCM} value directly in the immediate
word of a smob, you should use the macros @code{SCM_SMOB_OBJECT} and
@code{SCM_SET_SMOB_OBJECT} to access it.

Creating a smob instance can be tricky when it consists of multiple
steps that allocate resources and might fail.  It is recommended that
you go about creating a smob in the following way:

@enumerate
@item
Allocate the memory block for holding the data with
@code{scm_gc_malloc}.
@item
Initialize it to a valid state without calling any functions that might
cause a non-local exit.  For example, initialize pointers to
@code{NULL}.  Also, do not store @code{SCM} values in it that must be
protected.  Initialize these fields with @code{SCM_BOOL_F}.

A valid state is one that can be safely acted upon by the @emph{mark}
and @emph{free} functions of your smob type.
@item
Create the smob using @code{SCM_NEWSMOB}, passing it the initialized
memory block.  (This step will always succeed.)
@item
Complete the initialization of the memory block by, for example,
allocating additional resources and making it point to them.
@end enumerate

This procedure ensures that the smob is in a valid state as soon as it
exists, that all resources that are allocated for the smob are
properly associated with it so that they can be properly freed, and
that no @code{SCM} values that need to be protected are stored in it
while the smob does not yet completely exist and thus can not protect
them.
Continuing the example from above, if the global variable
@code{image_tag} contains a tag returned by @code{scm_make_smob_type},
here is how we could construct a smob whose immediate word contains a
pointer to a freshly allocated @code{struct image}:

@example
SCM
make_image (SCM name, SCM s_width, SCM s_height)
@{
  SCM smob;
  struct image *image;
  int width = scm_to_int (s_width);
  int height = scm_to_int (s_height);

  /* Step 1: Allocate the memory block.
   */
  image = (struct image *)
     scm_gc_malloc (sizeof (struct image), "image");

  /* Step 2: Initialize it with straight code.
   */
  image->width = width;
  image->height = height;
  image->pixels = NULL;
  image->name = SCM_BOOL_F;
  image->update_func = SCM_BOOL_F;

  /* Step 3: Create the smob.
   */
  SCM_NEWSMOB (smob, image_tag, image);

  /* Step 4: Finish the initialization.
   */
  image->name = name;
  image->pixels =
     scm_gc_malloc (width * height, "image pixels");

  return smob;
@}
@end example

Let us look at what might happen when @code{make_image} is called.

The conversions of @var{s_width} and @var{s_height} to @code{int}s might
fail and signal an error, thus causing a non-local exit.  This is not a
problem since no resources have been allocated yet that would have to be
freed.

The allocation of @var{image} in step 1 might fail, but this is likewise
no problem.

Step 2 can not exit non-locally.  At the end of it, the @var{image}
struct is in a valid state for the @code{mark_image} and
@code{free_image} functions (see below).

Step 3 can not exit non-locally either.  This is guaranteed by Guile.
After it, @var{smob} contains a valid smob that is properly initialized
and protected, and in turn can properly protect the Scheme values in its
@var{image} struct.

But before the smob is completely created, @code{SCM_NEWSMOB} might
cause the garbage collector to run.  During this garbage collection, the
@code{SCM} values in the @var{image} struct would be invisible to Guile.
It only gets to know about them via the @code{mark_image} function, but
that function can not yet do its job since the smob has not been created
yet.  Thus, it is important not to store @code{SCM} values in the
@var{image} struct until after the smob has been created.

Step 4, finally, might fail and cause a non-local exit.  In that case,
the complete creation of the smob has not been successful, but it does
nevertheless exist in a valid state.  It will eventually be freed by
the garbage collector, and all the resources that have been allocated
for it will be correctly freed by @code{free_image}.

@node Type checking
@subsection Type checking

Functions that operate on smobs should check that the passed
@code{SCM} value indeed is a suitable smob before accessing its data.
They can do this with @code{scm_assert_smob_type}.

For example, here is a simple function that operates on an image smob,
and checks the type of its argument:

@example
SCM
clear_image (SCM image_smob)
@{
  int area;
  struct image *image;

  scm_assert_smob_type (image_tag, image_smob);

  image = (struct image *) SCM_SMOB_DATA (image_smob);
  area = image->width * image->height;
  memset (image->pixels, 0, area);

  /* Invoke the image's update function.
   */
  if (scm_is_true (image->update_func))
    scm_call_0 (image->update_func);

  scm_remember_upto_here_1 (image_smob);

  return SCM_UNSPECIFIED;
@}
@end example

See @ref{Remembering During Operations} for an explanation of the call
to @code{scm_remember_upto_here_1}.

@node Garbage Collecting Smobs
@subsection Garbage Collecting Smobs

Once a smob has been released to the tender mercies of the Scheme
system, it must be prepared to survive garbage collection.  Guile calls
the @emph{mark} and @emph{free} functions of the smob to manage this.

As described in more detail elsewhere (@pxref{Conservative GC}), every
object in the Scheme system has a @dfn{mark bit}, which the garbage
collector uses to tell live objects from dead ones.
When collection starts, every object's mark bit is clear.  The collector
traces pointers through the heap, starting from objects known to be
live, and sets the mark bit on each object it encounters.  When it can
find no more unmarked objects, the collector walks all objects, live and
dead, frees those whose mark bits are still clear, and clears the mark
bit on the others.

The two main portions of the collection are called the @dfn{mark phase},
during which the collector marks live objects, and the @dfn{sweep
phase}, during which the collector frees all unmarked objects.

The mark bit of a smob lives in a special memory region.  When the
collector encounters a smob, it sets the smob's mark bit, and uses the
smob's type tag to find the appropriate @emph{mark} function for that
smob.  It then calls this @emph{mark} function, passing it the smob as
its only argument.

The @emph{mark} function is responsible for marking any other Scheme
objects the smob refers to.  If it does not do so, the objects' mark
bits will still be clear when the collector begins to sweep, and the
collector will free them.  If this occurs, it will probably break, or at
least confuse, any code operating on the smob; the smob's @code{SCM}
values will have become dangling references.

To mark an arbitrary Scheme object, the @emph{mark} function calls
@code{scm_gc_mark}.

Thus, here is how we might write @code{mark_image}:

@example
@group
SCM
mark_image (SCM image_smob)
@{
  /* Mark the image's name and update function.  */
  struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);

  scm_gc_mark (image->name);
  scm_gc_mark (image->update_func);

  return SCM_BOOL_F;
@}
@end group
@end example

Note that, even though the image's @code{update_func} could be an
arbitrarily complex structure (representing a procedure and any values
enclosed in its environment), @code{scm_gc_mark} will recurse as
necessary to mark all its components.
Because @code{scm_gc_mark} sets
an object's mark bit before it recurses, it is not confused by
circular structures.

As an optimization, the collector will mark whatever value is returned
by the @emph{mark} function; this helps limit depth of recursion during
the mark phase.  Thus, the code above should really be written as:

@example
@group
SCM
mark_image (SCM image_smob)
@{
  /* Mark the image's name here; return the update function
     so that the collector marks it for us.  */
  struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);

  scm_gc_mark (image->name);

  return image->update_func;
@}
@end group
@end example


Finally, when the collector encounters an unmarked smob during the sweep
phase, it uses the smob's tag to find the appropriate @emph{free}
function for the smob.  It then calls that function, passing it the smob
as its only argument.

The @emph{free} function must release any resources used by the smob.
However, it must not free objects managed by the collector; the
collector will take care of them.  For historical reasons, the return
type of the @emph{free} function should be @code{size_t}, an unsigned
integral type; the @emph{free} function should always return zero.

Here is how we might write the @code{free_image} function for the image
smob type:

@example
size_t
free_image (SCM image_smob)
@{
  struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);

  scm_gc_free (image->pixels,
               image->width * image->height,
               "image pixels");
  scm_gc_free (image, sizeof (struct image), "image");

  return 0;
@}
@end example

During the sweep phase, the garbage collector will clear the mark bits
on all live objects.  The code which implements a smob need not do this
itself.

There is no way for smob code to be notified when collection is
complete.

It is usually a good idea to minimize the amount of processing done
during garbage collection; keep the @emph{mark} and @emph{free}
functions very simple.  Since collections occur at unpredictable times,
it is easy for any unusual activity to interfere with normal code.
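The mark and sweep phases described in this section can be modeled in
a few lines of plain C.  The sketch below is a toy, not Guile's
collector: @code{struct toy_obj}, @code{toy_mark}, @code{toy_sweep},
and @code{toy_demo} are hypothetical names, the heap is a fixed array,
and ``freeing'' an object merely sets a flag.  It is meant only to show
why setting the mark bit before recursing makes cycles harmless, and
how the sweep both frees unmarked objects and clears the marks on
survivors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy heap object: a mark bit plus up to two outgoing references.  */
struct toy_obj {
  bool marked;
  bool freed;               /* stands in for releasing the memory */
  struct toy_obj *ref[2];
};

/* Mark phase: set the mark bit BEFORE following references, so an
   object on a cycle is visited once and the recursion terminates.  */
static void
toy_mark (struct toy_obj *obj)
{
  if (obj == NULL || obj->marked)
    return;
  obj->marked = true;
  toy_mark (obj->ref[0]);
  toy_mark (obj->ref[1]);
}

/* Sweep phase: walk every object, free the unmarked ones, and clear
   the mark bit on the survivors so the next collection starts clean.
   Returns the number of objects freed.  */
static size_t
toy_sweep (struct toy_obj *heap, size_t n)
{
  size_t freed = 0;
  size_t i;

  assert (heap != NULL);
  for (i = 0; i < n; i++)
    {
      if (heap[i].freed)
        continue;
      if (heap[i].marked)
        heap[i].marked = false;
      else
        {
          heap[i].freed = true;
          freed++;
        }
    }
  return freed;
}

/* Objects 0 and 1 refer to each other and 0 is the only root;
   objects 2 and 3 are unreachable garbage.  */
static size_t
toy_demo (void)
{
  struct toy_obj heap[4] = { 0 };

  heap[0].ref[0] = &heap[1];
  heap[1].ref[0] = &heap[0];    /* cycle back to the root */
  heap[2].ref[0] = &heap[3];    /* nothing live points here */

  toy_mark (&heap[0]);
  return toy_sweep (heap, 4);
}
```

In this model a smob's @emph{mark} function plays the part of the
recursive calls inside @code{toy_mark}: if it fails to report a
reference, the object it refers to is treated like objects 2 and 3
above and is swept away.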
@node Garbage Collecting Simple Smobs
@subsection Garbage Collecting Simple Smobs

It is often useful to define very simple smob types: smobs which have
no data to mark, other than the cell itself, or smobs whose immediate
data word is simply an ordinary Scheme object, to be marked recursively.
Guile makes both cases easy: the first needs no @emph{mark} function at
all, and for the second Guile provides a ready-made one.

If the smob refers to no other Scheme objects, then no action is
necessary; the garbage collector has already marked the smob cell
itself.  In that case, you can use zero (a null pointer) as your mark
function.

If the smob refers to exactly one other Scheme object via its first
immediate word, you can use @code{scm_markcdr} as its mark function.
Its definition is simply:

@smallexample
SCM
scm_markcdr (SCM obj)
@{
  return SCM_SMOB_OBJECT (obj);
@}
@end smallexample

@node Remembering During Operations
@subsection Remembering During Operations
@cindex remembering

It's important that a smob is visible to the garbage collector
whenever its contents are being accessed.  Otherwise it could be freed
while code is still using it.

For example, consider a procedure to convert image data to a list of
pixel values:

@example
SCM
image_to_list (SCM image_smob)
@{
  struct image *image;
  SCM lst;
  int i;

  scm_assert_smob_type (image_tag, image_smob);

  image = (struct image *) SCM_SMOB_DATA (image_smob);
  lst = SCM_EOL;
  for (i = image->width * image->height - 1; i >= 0; i--)
    lst = scm_cons (scm_from_char (image->pixels[i]), lst);

  scm_remember_upto_here_1 (image_smob);
  return lst;
@}
@end example

In the loop, only the @code{image} pointer is used and the C compiler
has no reason to keep the @code{image_smob} value anywhere.  If
@code{scm_cons} results in a garbage collection, @code{image_smob} might
not be on the stack or anywhere else and could be freed, leaving the
loop accessing freed data.
The use of @code{scm_remember_upto_here_1}
prevents this, by creating a reference to @code{image_smob} after all
data accesses.

There's no need to do the same for @code{lst}, since that's the return
value and the compiler will certainly keep it in a register or
somewhere throughout the routine.

The @code{clear_image} example previously shown (@pxref{Type checking})
also used @code{scm_remember_upto_here_1} for this reason.

It's only in quite rare circumstances that a missing
@code{scm_remember_upto_here_1} will bite, but when it happens the
consequences are serious.  Fortunately the rule is simple: whenever
code calls a Guile library function, or does something that might call
one, ensure that the @code{SCM} of a smob is referenced past all
accesses to its insides.  Do this by adding an
@code{scm_remember_upto_here_1} if there are no other references.

In a multi-threaded program, the rule is the same.  As far as a given
thread is concerned, a garbage collection still only occurs within a
Guile library function, not at an arbitrary time.  (Guile waits for all
threads to reach one of its library functions, and holds them there
while the collector runs.)

@node Double Smobs
@subsection Double Smobs

Smobs are called smobs because they are small: they normally have only
room for one @code{void*} or @code{SCM} value plus 16 bits.  The
reason for this is that smobs are directly implemented by using the
low-level, two-word cells of Guile that are also used to implement
pairs, for example.  (@xref{Data Representation}, for the details.)
One word of the two-word cells is used for @code{SCM_SMOB_DATA} (or
@code{SCM_SMOB_OBJECT}), the other contains the 16-bit type tag and
the 16 extra bits.

In addition to the fundamental two-word cells, Guile also has
four-word cells, which are appropriately called @dfn{double cells}.
You can use them for @dfn{double smobs} and get two more immediate
words of type @code{scm_t_bits}.
A double smob is created with @code{SCM_NEWSMOB2} or
@code{SCM_NEWSMOB3} instead of @code{SCM_NEWSMOB}.  Its immediate
words can be retrieved as @code{scm_t_bits} with
@code{SCM_SMOB_DATA_2} and @code{SCM_SMOB_DATA_3} in addition to
@code{SCM_SMOB_DATA}.  Unsurprisingly, the words can be set to
@code{scm_t_bits} values with @code{SCM_SET_SMOB_DATA_2} and
@code{SCM_SET_SMOB_DATA_3}.

Of course there are also @code{SCM_SMOB_OBJECT_2},
@code{SCM_SMOB_OBJECT_3}, @code{SCM_SET_SMOB_OBJECT_2}, and
@code{SCM_SET_SMOB_OBJECT_3}.

@node The Complete Example
@subsection The Complete Example

Here is the complete text of the implementation of the image datatype,
as presented in the sections above.  We also provide a definition for
the smob's @emph{print} function, and make some objects and functions
static, to clarify exactly what the surrounding code is using.

As mentioned above, you can find this code in the Guile distribution, in
@file{doc/example-smob}.  That directory includes a makefile and a
suitable @code{main} function, so you can build a complete interactive
Guile shell, extended with the datatypes described here.

@example
/* file "image-type.c" */

#include <stdlib.h>
#include <string.h>   /* memset */
#include <libguile.h>

static scm_t_bits image_tag;

struct image @{
  int width, height;
  char *pixels;

  /* The name of this image */
  SCM name;

  /* A function to call when this image is
     modified, e.g., to update the screen,
     or SCM_BOOL_F if no action necessary */
  SCM update_func;
@};

static SCM
make_image (SCM name, SCM s_width, SCM s_height)
@{
  SCM smob;
  struct image *image;
  int width = scm_to_int (s_width);
  int height = scm_to_int (s_height);

  /* Step 1: Allocate the memory block.
   */
  image = (struct image *)
     scm_gc_malloc (sizeof (struct image), "image");

  /* Step 2: Initialize it with straight code.
   */
  image->width = width;
  image->height = height;
  image->pixels = NULL;
  image->name = SCM_BOOL_F;
  image->update_func = SCM_BOOL_F;

  /* Step 3: Create the smob.
   */
  SCM_NEWSMOB (smob, image_tag, image);

  /* Step 4: Finish the initialization.
   */
  image->name = name;
  image->pixels =
     scm_gc_malloc (width * height, "image pixels");

  return smob;
@}

static SCM
clear_image (SCM image_smob)
@{
  int area;
  struct image *image;

  scm_assert_smob_type (image_tag, image_smob);

  image = (struct image *) SCM_SMOB_DATA (image_smob);
  area = image->width * image->height;
  memset (image->pixels, 0, area);

  /* Invoke the image's update function.
   */
  if (scm_is_true (image->update_func))
    scm_call_0 (image->update_func);

  scm_remember_upto_here_1 (image_smob);

  return SCM_UNSPECIFIED;
@}

static SCM
mark_image (SCM image_smob)
@{
  /* Mark the image's name here; return the update function
     so that the collector marks it for us.  */
  struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);

  scm_gc_mark (image->name);
  return image->update_func;
@}

static size_t
free_image (SCM image_smob)
@{
  struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);

  scm_gc_free (image->pixels,
               image->width * image->height,
               "image pixels");
  scm_gc_free (image, sizeof (struct image), "image");

  return 0;
@}

static int
print_image (SCM image_smob, SCM port, scm_print_state *pstate)
@{
  struct image *image = (struct image *) SCM_SMOB_DATA (image_smob);

  scm_puts ("#<image ", port);
  scm_display (image->name, port);
  scm_puts (">", port);

  /* non-zero means success */
  return 1;
@}

void
init_image_type (void)
@{
  image_tag = scm_make_smob_type ("image", sizeof (struct image));
  scm_set_smob_mark (image_tag, mark_image);
  scm_set_smob_free (image_tag, free_image);
  scm_set_smob_print (image_tag, print_image);

  scm_c_define_gsubr ("clear-image", 1, 0, 0, clear_image);
  scm_c_define_gsubr ("make-image", 3, 0, 0, make_image);
@}
@end example

Here is a sample build and interaction with the code from the
@file{example-smob} directory, on the author's machine:

@example
zwingli:example-smob$ make CC=gcc
gcc `guile-config compile` -c image-type.c -o image-type.o
gcc `guile-config compile` -c myguile.c -o myguile.o
gcc image-type.o myguile.o `guile-config link` -o myguile
zwingli:example-smob$ ./myguile
guile> make-image
#<primitive-procedure make-image>
guile> (define i (make-image "Whistler's Mother" 100 100))
guile> i
#<image Whistler's Mother>
guile> (clear-image i)
guile> (clear-image 4)
ERROR: In procedure clear-image in expression (clear-image 4):
ERROR: Wrong type (expecting image): 4
ABORT: (wrong-type-arg)

Type "(backtrace)" to get more information.
guile>
@end example