Add USER-GUIDE.md

2025-07-09 02:50:20 +02:00 · 2023-08-03 11:21:00 +02:00 · 2023-08-03 11:21:00 +02:00 · 5fbd21a7c3
commit 5fbd21a7c3
parent e270290242
1 changed files with 544 additions and 0 deletions
--- a/USER-GUIDE.md
+++ b/USER-GUIDE.md
@ -0,0 +1,544 @@
+# Whippet user's guide
+
+Whippet is an embed-only library: it should be copied into the source
+tree of the program that uses it.  The program's build system needs to
+be wired up to compile Whippet, then link it into the program that uses
+it.
+
+## Subtree merges
+
+One way is get Whippet is just to manually copy the files present in a
+Whippet checkout into your project.  However probably the best way is to
+perform a [subtree
+merge](https://docs.github.com/en/get-started/using-git/about-git-subtree-merges)
+of Whippet into your project's Git repository, so that you can easily
+update your copy of Whippet in the future.
+
+Performing the first subtree merge is annoying and full of arcane
+incantations.  Follow the [subtree merge
+page](https://docs.github.com/en/get-started/using-git/about-git-subtree-merges)
+for full details, but for a cheat sheet, you might do something like
+this to copy Whippet into the `whippet/` directory of your project root:
+
+```
+git remote add whippet https://github.com/wingo/whippet-gc
+git fetch whippet
+git merge -s ours --no-commit --allow-unrelated-histories whippet/main
+git read-tree --prefix=whippet/ -u whippet/main
+git commit -m 'Added initial Whippet merge'
+```
+
+Then to later update your copy of whippet, assuming you still have the
+`whippet` remote, just do:
+
+```
+git pull -s subtree whippet main
+```
+
+## `gc-embedder-api.h`
+
+To determine the live set of objects, a tracing garbage collector starts
+with a set of root objects, and then transitively visits all reachable
+object edges.  Exactly how it goes about doing this depends on the
+program that is using the garbage collector; different programs will
+have different object representations, different strategies for
+recording roots, and so on.
+
+To traverse the heap in a program-specific way but without imposing an
+abstraction overhead, Whippet requires that a number of data types and
+inline functions be implemented by the program, for use by Whippet
+itself.  This is the *embedder API*, and this document describes what
+Whippet requires from a program.
+
+A program should provide a header file implementing the API in
+[`gc-embedder-api.h`](./gc-embedder-api.h).  This header should only be
+included when compiling Whippet itself; it is not part of the API that
+Whippet exposes to the program.
+
+### Identifying roots
+
+The collector uses two opaque struct types, `struct gc_mutator_roots`
+and `struct gc_heap_roots`, that are used by the program to record
+object roots.  Probably you should put the definition of these data
+types in a separate header that is included both by Whippet, via the
+embedder API, and via users of Whippet, so that programs can populate
+the root set.  In any case the embedder-API use of these structs is via
+`gc_trace_mutator_roots` and `gc_trace_heap_roots`, two functions that
+are passed a trace visitor function `trace_edge`, and which should call
+that function on all edges from a given mutator or heap.  (Usually
+mutator roots are per-thread roots, such as from the stack, and heap
+roots are global roots.)
+
+### Tracing objects
+
+The `gc_trace_object` is responsible for calling the `trace_edge`
+visitor function on all outgoing edges in an object.  It also includes a
+`size` out-parameter, for when the collector wants to measure the size
+of an object.  `trace_edge` and `size` may be `NULL`, in which case no
+tracing or size computation should be performed.
+
+### Tracing ephemerons
+
+Most kinds of GC-managed object are defined by the program, but the GC
+itself has support for a specific object kind: ephemerons.  If the
+program allocates ephemerons, it should trace them in the
+`gc_trace_object` function by calling `gc_trace_ephemeron` from
+[`gc-ephemerons.h`](./gc-ephemerons.h).
+
+### Remembered-set bits
+
+When built to support generational garbage collection, Whippet requires
+that all "large" or potentially large objects have a flag bit reserved
+for use of the garbage collector.  A large object is one whose size
+exceeds the `gc_allocator_large_threshold()` (see
+[`gc-attrs.h`](./gc-attrs.h)), which is a collector-specific value.
+Currently the only generational collector is the in-place Whippet
+collector, whose large object threshold is 4096 bytes.  The
+`gc_object_set_remembered`, `gc_object_is_remembered_nonatomic`, and
+`gc_object_clear_remembered_nonatomic` embedder functions manage the
+remembered bit.  Setting the remembered bit should be idempotent;
+multiple threads can race to call `gc_object_set_remembered` and do not
+synchronize.  The query and clear functions are called without
+concurrent accessors and so don't have to be atomic.
+
+### Forwarding objects
+
+When built with a collector that moves objects, the embedder must also
+allow for forwarding pointers to be installed in an object.  There are
+two forwarding APIs: one that is atomic and one that isn't.
+
+The nonatomic API is relatively simple; there is a
+`gc_object_forwarded_nonatomic` function that returns an embedded
+forwarding address, or 0 if the object is not yet forwarded, and
+`gc_object_forward_nonatomic`, which installs a forwarding pointer.
+
+The atomic API is gnarly.  It is used by parallel collectors, in which
+multiple collector threads can race to evacuate an object.
+
+There is a state machine associated with the `gc_atomic_forward`
+structure from [`gc-forwarding.h`](./gc-forwarding.h); the embedder API
+implements the state changes.  The collector calls
+`gc_atomic_forward_begin` on an object to begin a forwarding attempt,
+and the resulting `gc_atomic_forward` can be in the `NOT_FORWARDED`,
+`FORWARDED`, or `BUSY` state.
+
+If the `gc_atomic_forward`'s state is `BUSY`, the collector will call
+`gc_atomic_forward_retry_busy`; a return value of 0 means the object is
+still busy, because another thread is attempting to forward it.
+Otherwise the forwarding state becomes either `FORWARDED`, if the other
+thread succeeded in forwarding it, or `ABORTED`, indicating that the
+other thread failed to forward it.
+
+If the forwarding state is `FORWARDED`, the collector will call
+`gc_atomic_forward_address` to get the new address.
+
+If the forwarding state is `NOT_FORWARDED`, the collector may begin a
+forwarding attempt by calling `gc_atomic_forward_acquire`.  The
+resulting state is `ACQUIRED` on success, or `BUSY` if another thread
+acquired the object in the meantime, or `FORWARDED` if another thread
+acquired and completed the forwarding attempt.
+
+An `ACQUIRED` object can then be forwarded via
+`gc_atomic_forward_commit`, or the forwarding attempt can be aborted via
+`gc_atomic_forward_abort`.
+
+All of these `gc_atomic_forward` functions are to be implemented by the
+embedder.  Some programs may allocate a dedicated forwarding word in all
+objects; some will manage to store the forwarding word in an initial
+"tag" word, via a specific pattern for the low 3 bits of the tag that no
+non-forwarded object will have.  The low-bits approach takes advantage
+of the collector's minimum object alignment, in which objects are
+aligned at least to an 8-byte boundary, so all objects have 0 for the
+low 3 bits of their address.
+
+### Conservative references
+
+Finally, when configured in a mode in which root edges or intra-object
+edges are *conservative*, the embedder can filter out which bit patterns
+might be an object reference by implementing
+`gc_is_valid_conservative_ref_displacement`.  Here, the collector masks
+off the low bits of a conservative reference, and asks the embedder if a
+value with those low bits might point to an object.  Usually the
+embedder should return 1 only if the displacement is 0, but if the
+program allows low-bit tagged pointers, then it should also return 1 for
+those pointer tags.
+
+## Configuration, compilation, and linking
+
+To the user, Whippet presents an abstract API that does not encode the
+specificities of any given collector.  Whippet currently includes three
+implementations of that API: `semi`, a simple semi-space collector;
+`bdw`, an implementation via the third-party
+[Boehm-Demers-Weiser](https://github.com/ivmai/bdwgc) conservative
+collector; and `whippet`, an Immix-like collector.
+
+There is a bit of name overloading between the Whippet abstract API, the
+collection of GC implementations, and the specific Whippet collector;
+our apologies.  It's just like that, and we hope to make the usage
+obvious from context.
+
+The program that embeds Whippet selects the collector implementation at
+build-time.  In the case of the specific Whippet collector, the program
+also configures a specific collector mode, again at build-time:
+generational or not, parallel or not, stack-conservative or not, and
+heap-conservative or not.  It may be nice in the future to be able to
+configure these at run-time, but for the time being they are
+compile-time options so that adding new features doesn't change the
+footprint of a more minimal collector.
+
+Different collectors have different allocation strategies: for example,
+the BDW collector allocates from thread-local freelists, whereas the
+semi-space collector has a bump-pointer allocator.  A collector may also
+expose a write barrier, for example to enable generational collection.
+For performance reasons, many of these details can't be hidden behind an
+opaque functional API: they must be inlined into call sites.  Whippet's
+approach is to expose fast paths as part of its inline API, but which
+are *parameterized* on attributes of the selected garbage collector.
+The goal is to keep the user's code generic and avoid any code
+dependency on the choice of garbage collector.  Because of inlining,
+however, the choice of garbage collector does need to be specified when
+compiling user code.
+
+### Compiling the collector
+
+Building the collector is not as easy as it should be.  As an embed-only
+library, we don't get to choose the One True Build System and then just
+build the software in that way; instead Whippet needs to be buildable
+with any build system.  At some point we will have snippets that
+embedders can include in their various build systems, but for now we
+document the low-level structure, so that people can craft the
+appropriate incantations for their program's build system.
+
+Whippet consists of some collector-implementation-agnostic independent
+modules, and then the collector implementation itself.  Though Whippet
+tries to put performance-sensitive interfaces in header files, users
+should also compile with link-time optimization (LTO) to remove any
+overhead imposed by the division of code into separate compilation
+units.
+
+Usually you want to build with maximum optimization and no debugging
+assertions.  Sometimes you want minimal optimization and all assertions.
+Here's what we do, as a `Makefile` snippet:
+
+```
+DEFAULT_BUILD=opt
+BUILD_CFLAGS_opt=-O2 -g -DNDEBUG
+BUILD_CFLAGS_optdebug=-Og -g -DGC_DEBUG=1
+BUILD_CFLAGS_debug=-O0 -g -DGC_DEBUG=1
+BUILD_CFLAGS=$(BUILD_CFLAGS_$(or $(BUILD),$(DEFAULT_BUILD)))
+```
+
+So if you do just plain `make`, it will do an `opt` build.  You can
+specify the build mode by setting `BUILD` on the command line, as in
+`make BUILD=debug`.
+
+Then for the actual compilation flags, we do:
+
+```
+CC=gcc
+CFLAGS=-Wall -flto -fno-strict-aliasing -fvisibility=hidden -Wno-unused $(BUILD_CFLAGS)
+INCLUDES=-I.
+LDFLAGS=-lpthread -flto
+COMPILE=$(CC) $(CFLAGS) $(INCLUDES)
+```
+
+The actual include directory (the dot in `-I.`) should be adjusted as
+appropriate.
+
+#### Collector-implementation-agnostic independent modules
+
+There are currently four generic modules that don't depend on the choice
+of collector.  The first is `gc-stack.o`, which has supporting code to
+associate mutators (threads) with slices of the native stack, in order
+to support conservative root-finding.
+
+```
+$(COMPILE) -o gc-stack.o -c gc-stack.c
+```
+
+The next is a generic options interface, to allow the user to
+parameterize the collector at run-time, for example to implement a
+specific heap sizing strategy.
+
+```
+$(COMPILE) -o gc-options.o -c gc-options.c
+```
+
+Next, where Whippet needs to get data from the operating system, for
+example the number of processors available, it does so behind an
+abstract interface that is selected at compile-time.  The only
+implementation currently is for GNU/Linux, but it's a pretty thin layer,
+so adding more systems should not be difficult.
+
+```
+PLATFORM=gnu-linux
+$(COMPILE) -o gc-platform.o -c gc-platform-$(PLATFORM).c
+```
+
+Finally, something a little more complicated: ephemerons.  Ephemerons
+are objects that make a weak association between a key and a value.  As
+first-class objects, they need to be classifiable by the user system,
+and notably via the `gc_trace_object` procedure, and therefore need to
+have a header whose shape is understandable by the embedding program.
+We do this by including the `gc-embedder-api.h` implementation, via
+`-include`, in this case providing `foo-embedder.h`:
+
+```
+$(COMPILE) -include foo-embedder.h -o gc-ephemeron.o -c gc-ephemeron.c
+```
+
+#### Compile-time options
+
+There are a number of pre-processor definitions that can parameterize
+the collector at build-time:
+
+ * `GC_DEBUG`: If nonzero, then enable debugging assertions.
+ * `NDEBUG`: This one is a bit weird; if not defined, then enable
+   debugging assertions and some debugging printouts.  Probably
+   Whippet's use of `NDEBUG` should be folded in to `GC_DEBUG`.
+ * `GC_PARALLEL`: If nonzero, then enable parallelism in the collector.
+   Defaults to 0.
+ * `GC_GENERATIONAL`: If nonzero, then enable generational collection.
+   Defaults to zero.
+ * `GC_PRECISE_ROOTS`: If nonzero, then collect precise roots via
+   `gc_heap_roots` and `gc_mutator_roots`.  Defaults to zero.
+ * `GC_CONSERVATIVE_ROOTS`: If nonzero, then scan the stack and static
+   data sections for conservative roots.  Defaults to zero.  Not
+   mutually exclusive with `GC_PRECISE_ROOTS`.
+ * `GC_CONSERVATIVE_TRACE`: If nonzero, heap edges are scanned
+   conservatively.  Defaults to zero.
+
+Some collectors require specific compile-time options.  For example, the
+semi-space collector has to be able to move all objects; this is not
+compatible with conservative roots or heap edges.
+
+#### Building `semi`
+
+Finally, let's build a collector.  The simplest collector is the
+semi-space collector.  The entirety of the implementation can be had by
+compiling `semi.c`, providing the program's embedder API implementation
+via `-include`:
+
+```
+$(COMPILE) -DGC_PRECISE_ROOTS=1 -include foo-embedder.h -o gc.o -c semi.c
+```
+
+#### Building `bdw`
+
+The next simplest collector uses
+[BDW-GC](https://github.com/ivmai/bdwgc).  This collector must scan the
+roots and heap conservatively.  The collector is parallel if BDW-GC
+itself was compiled with parallelism enabled.
+
+```
+$(COMPILE) -DGC_CONSERVATIVE_ROOTS=1 -DGC_CONSERVATIVE_TRACE=1 \
+  `pkg-config --cflags bdw-gc` \
+  -include foo-embedder.h -o gc.o -c bdw.c
+```
+
+#### Building `whippet`
+
+Finally, there is the whippet collector.  It can collect roots precisely
+or conservatively, trace precisely or conservatively, be parallel or
+not, and be generational or not.
+
+```
+$(COMPILE) -DGC_PARALLEL=1 -DGC_GENERATIONAL=1 -DGC_PRECISE_ROOTS=1 \
+  -include foo-embedder.h -o gc.o -c whippet.c
+```
+
+### Compiling your program
+
+Any compilation unit that uses the GC API should have the same set of
+compile-time options defined as when compiling the collector.
+Additionally those compilation units should include the "attributes"
+header for the collector in question, namely `semi-attrs.h`,
+`bdw-attrs.h`, or `whippet-attrs.h`.  For example, for parallel
+generational whippet, you might have:
+
+```
+$(COMPILE) -DGC_PARALLEL=1 -DGC_GENERATIONAL=1 -DGC_PRECISE_ROOTS=1 \
+  -include whippet-attrs.h -o my-program.o -c my-program.c
+```
+
+### Linking the collector into your program
+
+Finally to link, pass all objects to the linker.  You will want to
+ensure that the linker enables `-flto`, for link-time optimization.  We
+do it like this:
+
+```
+$(CC) $(LDFLAGS) -o my-program \
+  my-program.o gc-stack.o gc-platform.o gc-options.o gc-ephemeron.o
+```
+
+## Using the collector
+
+Whew!  So you finally built the thing!  Did you also link it into your
+program?  No, because your program isn't written yet?  Well this section
+is for you: we describe the user-facing API of Whippet, where "user" in
+this case denotes the embedding program.
+
+What is the API, you ask?  It is in [`gc-api.h`](./gc-api.h).
+
+### Heaps and mutators
+
+To start with, you create a *heap*.  Usually an application will create
+just one heap.  A heap has one or more associated *mutators*.  A mutator
+is a thread-specific handle on the heap.  Allocating objects requires a
+mutator.
+
+The initial heap and mutator are created via `gc_init`, which takes two
+input parameters: the *options*, and a stack base address.  The options
+specify the initial heap size and so on.  `gc_init` returns the new heap
+as an out parameter, and also returns a mutator for the current thread.
+
+To make a new mutator for a new thread, use `gc_init_for_thread`.  When
+a thread is finished with its mutator, call `gc_finish_for_thread`.
+Each thread that allocates or accesses GC-managed objects should have
+its own mutator.
+
+The stack base address allows the collector to scan the mutator's stack,
+if conservative root-finding is enabled.  It may be omitted in the call
+to `gc_init` and `gc_init_for_thread`; passing `NULL` tells Whippet to
+ask the platform for the stack bounds of the current thread.  Generally
+speaking, this works on all platforms for the main thread, but not
+necessarily on other threads.  The most reliable solution is to
+explicitly obtain a base address by trampolining through
+`gc_call_with_stack_addr`.
+
+### Options
+
+There are some run-time parameters that programs and users might want to
+set explicitly; these are encapsulated in the *options*.  Make an
+options object with `gc_allocate_options()`; this object will be
+consumed by its `gc_init`.  Then, the most convenient thing is to set
+those options from `gc_options_parse_and_set_many` from a string passed
+on the command line or an environment variable, but to get there we have
+to explain the low-level first.  There are a few options that are
+defined for all collectors:
+
+ * `GC_OPTION_HEAP_SIZE_POLICY`: How should we size the heap?  Either
+   it's `GC_HEAP_SIZE_FIXED` (which is 0), in which the heap size is
+   fixed at startup; or `GC_HEAP_SIZE_GROWABLE` (1), in which the heap
+   may grow but will never shrink; or `GC_HEAP_SIZE_ADAPTIVE` (2), in
+   which we take an
+   [adaptive](https://wingolog.org/archives/2023/01/27/three-approaches-to-heap-sizing)
+   approach, depending on the rate of allocation and the cost of
+   collection.  Really you want the adaptive strategy, but if you are
+   benchmarking you definitely want the fixed policy.
+ * `GC_OPTION_HEAP_SIZE`: The initial heap size.  For a
+   `GC_HEAP_SIZE_FIXED` policy, this is also the final heap size.  In
+   bytes.
+ * `GC_OPTION_MAXIMUM_HEAP_SIZE`: For growable and adaptive heaps, the
+   maximum heap size, in bytes.
+ * `GC_OPTION_HEAP_SIZE_MULTIPLIER`: For growable heaps, the target heap
+   multiplier.  A heap multiplier of 2.5 means that for 100 MB of live
+   data, the heap should be 250 MB.
+ * `GC_OPTION_HEAP_FRUGALITY`: Something that will be used in adaptive
+   heaps, apparently!  Not yet implemented.
+ * `GC_OPTION_PARALLELISM`: How many threads to devote to collection
+   tasks during GC pauses.  By default, the current number of
+   processors, with a maximum of 8.
+
+You can set these options via `gc_option_set_int` and so on; see
+[`gc-options.h`](./gc-options.h).  Or, you can parse options from
+strings: `heap-size-policy`, `heap-size`, `maximum-heap-size`, and so
+on.  Use `gc_option_from_string` to determine if a string is really an
+option.  Use `gc_option_parse_and_set` to parse a value for an option.
+Use `gc_options_parse_and_set_many` to parse a number of comma-delimited
+*key=value* settings from a string.
+
+### Allocation
+
+So you have a heap and a mutator; great!  Let's allocate!  Call
+`gc_allocate`, passing the mutator and the number of bytes to allocate.
+
+There is also `gc_allocate_fast`, which is an inlined fast-path.  If
+that returns NULL, you need to call `gc_allocate_slow`.  The advantage
+of this API is that you can punt some root-saving overhead to the slow
+path.
+
+Allocation always succeeds.  If it doesn't, it kills your program.  The
+bytes in the resulting allocation will be initialized to 0.
+
+The allocation fast path is parameterized by collector-specific
+attributes.  JIT compilers can also read those attributes to emit
+appropriate inline code that replicates the logic of `gc_allocate_fast`.
+
+### Write barriers
+
+For some collectors, mutators have to tell the collector whenever they
+mutate an object.  They tell the collector by calling a *write barrier*;
+in Whippet this is currently the case only for generational collectors.
+
+The write barrier is `gc_write_barrier`; see `gc-api.h` for its
+parameters.
+
+As with allocation, the fast path for the write barrier is parameterized
+by collector-specific attributes, to allow JIT compilers to inline write
+barriers.
+
+### Safepoints
+
+Sometimes Whippet will need to synchronize all threads, for example as
+part of the "stop" phase of a stop-and-copy semi-space collector.
+Whippet stops at *safepoints*.  At a safepoint, all mutators must be
+able to enumerate all of their edges to live objects.
+
+Whippet has cooperative safepoints: mutators have to periodically call
+into the collector to potentially synchronize with other mutators.
+`gc_allocate_slow` is a safepoint, so if you a bunch of threads that are
+all allocating, usually safepoints are reached in a more-or-less prompt
+fashion.  But if a mutator isn't allocating, it either needs to
+temporarily mark itself as inactive by trampolining through
+`gc_call_without_gc`, or it should arrange to periodically call
+`gc_safepoint`.  Marking a mutator as inactive is the right strategy
+for, for example, system calls that might block.  Periodic safepoints is
+better for code that is active but not allocating.
+
+Thing is, though, `gc_safepoint` is not yet implemented :)  It will be,
+though!
+
+Also, the BDW collector actually uses pre-emptive safepoints: it stops
+threads via POSIX signals.  `gc_safepoint` is (or will be) a no-op with
+BDW.
+
+### Statistics
+
+Sometimes a program would like some information from the GC: how many
+bytes and objects have been allocated?  How much time has been spent in
+the GC?  How many times has GC run, and how many of those were minor
+collections?  What's the maximum pause time?  Stuff like that.  Whippet
+doesn't collect very much info right now, and this should probably
+change.  For the moment, all you get is `gc_print_stats`.
+
+### Ephemerons
+
+Whippet supports ephemerons, first-class objects that weakly associate
+keys with values.  If the an ephemeron's key ever becomes unreachable,
+the ephemeron becomes dead and loses its value.
+
+The user-facing API is in [`gc-ephemeron.h`](./gc-ephemeron.h).  To
+allocate an ephemeron, call `gc_allocate_ephemeron`, then initialize its
+key and value via `gc_ephemeron_init`.  Get the key and value via
+`gc_ephemeron_key` and `gc_ephemeron_value`, respectively.
+
+In Whippet, ephemerons can be linked together in a chain.  During GC, if
+an ephemeron's chain points to a dead ephemeron, that link will be
+elided, allowing the dead ephemeron itself to be collected.  In that
+way, ephemerons can be used to build weak data structures such as weak
+maps.
+
+Weak data structures are often shared across multiple threads, so all
+routines to access and modify chain links are atomic.  Use
+`gc_ephemeron_chain_head` to access the head of a storage location that
+points to an ephemeron; push a new ephemeron on a location with
+`gc_ephemeron_chain_push`; and traverse a chain with
+`gc_ephemeron_chain_next`.
+
+An ephemeron association can be removed via `gc_ephemeron_mark_dead`.
+
+### Finalizers
+
+Not yet implemented!