guile/doc/manual.md

# Whippet user's guide

Whippet is an embed-only library: it should be copied into the source
tree of the program that uses it.  The program's build system needs to
be wired up to compile Whippet, then link it into the program that uses
it.

## Subtree merges

One way is get Whippet is just to manually copy the files present in a
Whippet checkout into your project.  However probably the best way is to
perform a [subtree
merge](https://docs.github.com/en/get-started/using-git/about-git-subtree-merges)
of Whippet into your project's Git repository, so that you can easily
update your copy of Whippet in the future.

Performing the first subtree merge is annoying and full of arcane
incantations.  Follow the [subtree merge
page](https://docs.github.com/en/get-started/using-git/about-git-subtree-merges)
for full details, but for a cheat sheet, you might do something like
this to copy Whippet into the `whippet/` directory of your project root:

```
git remote add whippet https://github.com/wingo/whippet
git fetch whippet
git merge -s ours --no-commit --allow-unrelated-histories whippet/main
git read-tree --prefix=whippet/ -u whippet/main
git commit -m 'Added initial Whippet merge'
```

Then to later update your copy of whippet, assuming you still have the
`whippet` remote, just do:

```
git pull -s subtree whippet main
```

## `gc-embedder-api.h`

To determine the live set of objects, a tracing garbage collector starts
with a set of root objects, and then transitively visits all reachable
object edges.  Exactly how it goes about doing this depends on the
program that is using the garbage collector; different programs will
have different object representations, different strategies for
recording roots, and so on.

To traverse the heap in a program-specific way but without imposing an
abstraction overhead, Whippet requires that a number of data types and
inline functions be implemented by the program, for use by Whippet
itself.  This is the *embedder API*, and this document describes what
Whippet requires from a program.

A program should provide a header file implementing the API in
[`gc-embedder-api.h`](../api/gc-embedder-api.h).  This header should only be
included when compiling Whippet itself; it is not part of the API that
Whippet exposes to the program.

### Identifying roots

The collector uses two opaque struct types, `struct gc_mutator_roots`
and `struct gc_heap_roots`, that are used by the program to record
object roots.  Probably you should put the definition of these data
types in a separate header that is included both by Whippet, via the
embedder API, and via users of Whippet, so that programs can populate
the root set.  In any case the embedder-API use of these structs is via
`gc_trace_mutator_roots` and `gc_trace_heap_roots`, two functions that
are passed a trace visitor function `trace_edge`, and which should call
that function on all edges from a given mutator or heap.  (Usually
mutator roots are per-thread roots, such as from the stack, and heap
roots are global roots.)

### Tracing objects

The `gc_trace_object` is responsible for calling the `trace_edge`
visitor function on all outgoing edges in an object.  It also includes a
`size` out-parameter, for when the collector wants to measure the size
of an object.  `trace_edge` and `size` may be `NULL`, in which case no
tracing or size computation should be performed.

### Tracing ephemerons and finalizers

Most kinds of GC-managed object are defined by the program, but the GC
itself has support for two specific object kind: ephemerons and
finalizers.  If the program allocates ephemerons, it should trace them
in the `gc_trace_object` function by calling `gc_trace_ephemeron` from
[`gc-ephemerons.h`](../api/gc-ephemerons.h).  Likewise if the program
allocates finalizers, it should trace them by calling
`gc_trace_finalizer` from [`gc-finalizer.h`](../api/gc-finalizer.h).

### Remembered-set bits

When built to support generational garbage collection, Whippet requires
that all "large" or potentially large objects have a flag bit reserved
for use of the garbage collector.  A large object is one whose size
exceeds the `gc_allocator_large_threshold()` (see
[`gc-attrs.h`](../api/gc-attrs.h)), which is a collector-specific value.
Currently the only generational collector is the in-place `mmc`
collector, whose large object threshold is 4096 bytes.  The
`gc_object_set_remembered`, `gc_object_is_remembered_nonatomic`, and
`gc_object_clear_remembered_nonatomic` embedder functions manage the
remembered bit.  Setting the remembered bit should be idempotent;
multiple threads can race to call `gc_object_set_remembered` and do not
synchronize.  The query and clear functions are called without
concurrent accessors and so don't have to be atomic.

### Forwarding objects

When built with a collector that moves objects, the embedder must also
allow for forwarding pointers to be installed in an object.  There are
two forwarding APIs: one that is atomic and one that isn't.

The nonatomic API is relatively simple; there is a
`gc_object_forwarded_nonatomic` function that returns an embedded
forwarding address, or 0 if the object is not yet forwarded, and
`gc_object_forward_nonatomic`, which installs a forwarding pointer.

The atomic API is gnarly.  It is used by parallel collectors, in which
multiple collector threads can race to evacuate an object.

There is a state machine associated with the `gc_atomic_forward`
structure from [`gc-forwarding.h`](../api/gc-forwarding.h); the embedder API
implements the state changes.  The collector calls
`gc_atomic_forward_begin` on an object to begin a forwarding attempt,
and the resulting `gc_atomic_forward` can be in the `NOT_FORWARDED`,
`FORWARDED`, or `BUSY` state.

If the `gc_atomic_forward`'s state is `BUSY`, the collector will call
`gc_atomic_forward_retry_busy`; a return value of 0 means the object is
still busy, because another thread is attempting to forward it.
Otherwise the forwarding state becomes either `FORWARDED`, if the other
thread succeeded in forwarding it, or `ABORTED`, indicating that the
other thread failed to forward it.

If the forwarding state is `FORWARDED`, the collector will call
`gc_atomic_forward_address` to get the new address.

If the forwarding state is `NOT_FORWARDED`, the collector may begin a
forwarding attempt by calling `gc_atomic_forward_acquire`.  The
resulting state is `ACQUIRED` on success, or `BUSY` if another thread
acquired the object in the meantime, or `FORWARDED` if another thread
acquired and completed the forwarding attempt.

An `ACQUIRED` object can then be forwarded via
`gc_atomic_forward_commit`, or the forwarding attempt can be aborted via
`gc_atomic_forward_abort`.  Also, when an object is acquired, the
collector may call `gc_atomic_forward_object_size` to compute how many
bytes to copy.  (The collector may choose instead to record object sizes
in a different way.)

All of these `gc_atomic_forward` functions are to be implemented by the
embedder.  Some programs may allocate a dedicated forwarding word in all
objects; some will manage to store the forwarding word in an initial
"tag" word, via a specific pattern for the low 3 bits of the tag that no
non-forwarded object will have.  The low-bits approach takes advantage
of the collector's minimum object alignment, in which objects are
aligned at least to an 8-byte boundary, so all objects have 0 for the
low 3 bits of their address.

### Conservative references

Finally, when configured in a mode in which root edges or intra-object
edges are *conservative*, the embedder can filter out which bit patterns
might be an object reference by implementing
`gc_is_valid_conservative_ref_displacement`.  Here, the collector masks
off the low bits of a conservative reference, and asks the embedder if a
value with those low bits might point to an object.  Usually the
embedder should return 1 only if the displacement is 0, but if the
program allows low-bit tagged pointers, then it should also return 1 for
those pointer tags.

### External objects

Sometimes a system will allocate objects outside the GC, for example on
the stack or in static data sections.  To support this use case, Whippet
allows the embedder to provide a `struct gc_extern_space`
implementation.  Whippet will call `gc_extern_space_start_gc` at the
start of each collection, and `gc_extern_space_finish_gc` at the end.
External objects will be visited by `gc_extern_space_mark`, which should
return nonzero if the object hasn't been seen before and needs to be
traced via `gc_trace_object` (coloring the object grey).  Note,
`gc_extern_space_mark` may be called concurrently from many threads; be
prepared!

## Configuration, compilation, and linking

To the user, Whippet presents an abstract API that does not encode the
specificities of any given collector.  Whippet currently includes four
implementations of that API: `semi`, a simple semi-space collector;
`pcc`, a parallel copying collector (like semi but multithreaded);
`bdw`, an implementation via the third-party
[Boehm-Demers-Weiser](https://github.com/ivmai/bdwgc) conservative
collector; and `mmc`, a mostly-marking collector inspired by Immix.

The program that embeds Whippet selects the collector implementation at
build-time.  In the case of the `mmc` collector, the program
also configures a specific collector mode, again at build-time:
generational or not, parallel or not, stack-conservative or not, and
heap-conservative or not.  It may be nice in the future to be able to
configure these at run-time, but for the time being they are
compile-time options so that adding new features doesn't change the
footprint of a more minimal collector.

Different collectors have different allocation strategies: for example,
the BDW collector allocates from thread-local freelists, whereas the
semi-space collector has a bump-pointer allocator.  A collector may also
expose a write barrier, for example to enable generational collection.
For performance reasons, many of these details can't be hidden behind an
opaque functional API: they must be inlined into call sites.  Whippet's
approach is to expose fast paths as part of its inline API, but which
are *parameterized* on attributes of the selected garbage collector.
The goal is to keep the user's code generic and avoid any code
dependency on the choice of garbage collector.  Because of inlining,
however, the choice of garbage collector does need to be specified when
compiling user code.

### Compiling the collector

Building the collector is not as easy as it should be.  As an embed-only
library, we don't get to choose the One True Build System and then just
build the software in that way; instead Whippet needs to be buildable
with any build system.  At some point we will have snippets that
embedders can include in their various build systems, but for now we
document the low-level structure, so that people can craft the
appropriate incantations for their program's build system.

Whippet consists of some collector-implementation-agnostic independent
modules, and then the collector implementation itself.  Though Whippet
tries to put performance-sensitive interfaces in header files, users
should also compile with link-time optimization (LTO) to remove any
overhead imposed by the division of code into separate compilation
units.

Usually you want to build with maximum optimization and no debugging
assertions.  Sometimes you want minimal optimization and all assertions.
Here's what we do, as a `Makefile` snippet:

```
DEFAULT_BUILD=opt
BUILD_CFLAGS_opt=-O2 -g -DNDEBUG
BUILD_CFLAGS_optdebug=-Og -g -DGC_DEBUG=1
BUILD_CFLAGS_debug=-O0 -g -DGC_DEBUG=1
BUILD_CFLAGS=$(BUILD_CFLAGS_$(or $(BUILD),$(DEFAULT_BUILD)))
```

So if you do just plain `make`, it will do an `opt` build.  You can
specify the build mode by setting `BUILD` on the command line, as in
`make BUILD=debug`.

Then for the actual compilation flags, we do:

```
CC=gcc
CFLAGS=-Wall -flto -fno-strict-aliasing -fvisibility=hidden -Wno-unused $(BUILD_CFLAGS)
INCLUDES=-I.
LDFLAGS=-lpthread -flto
COMPILE=$(CC) $(CFLAGS) $(INCLUDES)
```

The actual include directory (the dot in `-I.`) should be adjusted as
appropriate.

#### Collector-implementation-agnostic independent modules

There are currently four generic modules that don't depend on the choice
of collector.  The first is `gc-stack.o`, which has supporting code to
associate mutators (threads) with slices of the native stack, in order
to support conservative root-finding.

```
$(COMPILE) -o gc-stack.o -c gc-stack.c
```

The next is a generic options interface, to allow the user to
parameterize the collector at run-time, for example to implement a
specific heap sizing strategy.

```
$(COMPILE) -o gc-options.o -c gc-options.c
```

Next, where Whippet needs to get data from the operating system, for
example the number of processors available, it does so behind an
abstract interface that is selected at compile-time.  The only
implementation currently is for GNU/Linux, but it's a pretty thin layer,
so adding more systems should not be difficult.

```
PLATFORM=gnu-linux
$(COMPILE) -o gc-platform.o -c gc-platform-$(PLATFORM).c
```

Finally, something a little more complicated: ephemerons.  Ephemerons
are objects that make a weak association between a key and a value.  As
first-class objects, they need to be classifiable by the user system,
and notably via the `gc_trace_object` procedure, and therefore need to
have a header whose shape is understandable by the embedding program.
We do this by including the `gc-embedder-api.h` implementation, via
`-include`, in this case providing `foo-embedder.h`:

```
$(COMPILE) -include foo-embedder.h -o gc-ephemeron.o -c gc-ephemeron.c
```

As for ephemerons, finalizers also have their own compilation unit.

```
$(COMPILE) -include foo-embedder.h -o gc-finalizer.o -c gc-finalizer.c
```

#### Compile-time options

There are a number of pre-processor definitions that can parameterize
the collector at build-time:

 * `GC_DEBUG`: If nonzero, then enable debugging assertions.
 * `NDEBUG`: This one is a bit weird; if not defined, then enable
   debugging assertions and some debugging printouts.  Probably
   Whippet's use of `NDEBUG` should be folded in to `GC_DEBUG`.
 * `GC_PARALLEL`: If nonzero, then enable parallelism in the collector.
   Defaults to 0.
 * `GC_GENERATIONAL`: If nonzero, then enable generational collection.
   Defaults to zero.
 * `GC_PRECISE_ROOTS`: If nonzero, then collect precise roots via
   `gc_heap_roots` and `gc_mutator_roots`.  Defaults to zero.
 * `GC_CONSERVATIVE_ROOTS`: If nonzero, then scan the stack and static
   data sections for conservative roots.  Defaults to zero.  Not
   mutually exclusive with `GC_PRECISE_ROOTS`.
 * `GC_CONSERVATIVE_TRACE`: If nonzero, heap edges are scanned
   conservatively.  Defaults to zero.

Some collectors require specific compile-time options.  For example, the
semi-space collector has to be able to move all objects; this is not
compatible with conservative roots or heap edges.

#### Building `semi`

Finally, let's build a collector.  The simplest collector is the
semi-space collector.  The entirety of the implementation can be had by
compiling `semi.c`, providing the program's embedder API implementation
via `-include`:

```
$(COMPILE) -DGC_PRECISE_ROOTS=1 -include foo-embedder.h -o gc.o -c semi.c
```

#### Building `bdw`

The next simplest collector uses
[BDW-GC](https://github.com/ivmai/bdwgc).  This collector must scan the
roots and heap conservatively.  The collector is parallel if BDW-GC
itself was compiled with parallelism enabled.

```
$(COMPILE) -DGC_CONSERVATIVE_ROOTS=1 -DGC_CONSERVATIVE_TRACE=1 \
  `pkg-config --cflags bdw-gc` \
  -include foo-embedder.h -o gc.o -c bdw.c
```

#### Building `pcc`

The parallel copying collector is like `semi` but better in every way:
it supports multiple mutator threads, and evacuates in parallel if
multiple threads are available.

```
$(COMPILE) -DGC_PARALLEL=1 -DGC_PRECISE_ROOTS=1 \
  -include foo-embedder.h -o gc.o -c pcc.c
```

#### Building `mmc`

Finally, there is the mostly-marking collector.  It can collect roots
precisely or conservatively, trace precisely or conservatively, be
parallel or not, and be generational or not.

```
$(COMPILE) -DGC_PARALLEL=1 -DGC_GENERATIONAL=1 -DGC_PRECISE_ROOTS=1 \
  -include foo-embedder.h -o gc.o -c mvv.c
```

### Compiling your program

Any compilation unit that uses the GC API should have the same set of
compile-time options defined as when compiling the collector.
Additionally those compilation units should include the "attributes"
header for the collector in question, namely `semi-attrs.h`,
`bdw-attrs.h`, `pcc-attrs.h`, or `mmc-attrs.h`.  For example, for
parallel generational mmc, you might have:

```
$(COMPILE) -DGC_PARALLEL=1 -DGC_GENERATIONAL=1 -DGC_PRECISE_ROOTS=1 \
  -include mmc-attrs.h -o my-program.o -c my-program.c
```

### Linking the collector into your program

Finally to link, pass all objects to the linker.  You will want to
ensure that the linker enables `-flto`, for link-time optimization.  We
do it like this:

```
$(CC) $(LDFLAGS) -o my-program \
  my-program.o gc-stack.o gc-platform.o gc-options.o gc-ephemeron.o
```

## Using the collector

Whew!  So you finally built the thing!  Did you also link it into your
program?  No, because your program isn't written yet?  Well this section
is for you: we describe the user-facing API of Whippet, where "user" in
this case denotes the embedding program.

What is the API, you ask?  It is in [`gc-api.h`](../api/gc-api.h).

### Heaps and mutators

To start with, you create a *heap*.  Usually an application will create
just one heap.  A heap has one or more associated *mutators*.  A mutator
is a thread-specific handle on the heap.  Allocating objects requires a
mutator.

The initial heap and mutator are created via `gc_init`, which takes
three logical input parameters: the *options*, a stack base address, and
an *event listener*.  The options specify the initial heap size and so
on.  The event listener is mostly for gathering statistics; see below
for more.  `gc_init` returns the new heap as an out parameter, and also
returns a mutator for the current thread.

To make a new mutator for a new thread, use `gc_init_for_thread`.  When
a thread is finished with its mutator, call `gc_finish_for_thread`.
Each thread that allocates or accesses GC-managed objects should have
its own mutator.

The stack base address allows the collector to scan the mutator's stack,
if conservative root-finding is enabled.  It may be omitted in the call
to `gc_init` and `gc_init_for_thread`; passing `NULL` tells Whippet to
ask the platform for the stack bounds of the current thread.  Generally
speaking, this works on all platforms for the main thread, but not
necessarily on other threads.  The most reliable solution is to
explicitly obtain a base address by trampolining through
`gc_call_with_stack_addr`.

### Options

There are some run-time parameters that programs and users might want to
set explicitly; these are encapsulated in the *options*.  Make an
options object with `gc_allocate_options()`; this object will be
consumed by its `gc_init`.  Then, the most convenient thing is to set
those options from `gc_options_parse_and_set_many` from a string passed
on the command line or an environment variable, but to get there we have
to explain the low-level first.  There are a few options that are
defined for all collectors:

 * `GC_OPTION_HEAP_SIZE_POLICY`: How should we size the heap?  Either
   it's `GC_HEAP_SIZE_FIXED` (which is 0), in which the heap size is
   fixed at startup; or `GC_HEAP_SIZE_GROWABLE` (1), in which the heap
   may grow but will never shrink; or `GC_HEAP_SIZE_ADAPTIVE` (2), in
   which we take an
   [adaptive](https://wingolog.org/archives/2023/01/27/three-approaches-to-heap-sizing)
   approach, depending on the rate of allocation and the cost of
   collection.  Really you want the adaptive strategy, but if you are
   benchmarking you definitely want the fixed policy.
 * `GC_OPTION_HEAP_SIZE`: The initial heap size.  For a
   `GC_HEAP_SIZE_FIXED` policy, this is also the final heap size.  In
   bytes.
 * `GC_OPTION_MAXIMUM_HEAP_SIZE`: For growable and adaptive heaps, the
   maximum heap size, in bytes.
 * `GC_OPTION_HEAP_SIZE_MULTIPLIER`: For growable heaps, the target heap
   multiplier.  A heap multiplier of 2.5 means that for 100 MB of live
   data, the heap should be 250 MB.
 * `GC_OPTION_HEAP_EXPANSIVENESS`: For adaptive heap sizing, an
   indication of how much free space will be given to heaps, as a
   proportion of the square root of the live data size.
 * `GC_OPTION_PARALLELISM`: How many threads to devote to collection
   tasks during GC pauses.  By default, the current number of
   processors, with a maximum of 8.

You can set these options via `gc_option_set_int` and so on; see
[`gc-options.h`](../api/gc-options.h).  Or, you can parse options from
strings: `heap-size-policy`, `heap-size`, `maximum-heap-size`, and so
on.  Use `gc_option_from_string` to determine if a string is really an
option.  Use `gc_option_parse_and_set` to parse a value for an option.
Use `gc_options_parse_and_set_many` to parse a number of comma-delimited
*key=value* settings from a string.

### Allocation

So you have a heap and a mutator; great!  Let's allocate!  Call
`gc_allocate`, passing the mutator and the number of bytes to allocate.

There is also `gc_allocate_fast`, which is an inlined fast-path.  If
that returns NULL, you need to call `gc_allocate_slow`.  The advantage
of this API is that you can punt some root-saving overhead to the slow
path.

Allocation always succeeds.  If it doesn't, it kills your program.  The
bytes in the resulting allocation will be initialized to 0.

The allocation fast path is parameterized by collector-specific
attributes.  JIT compilers can also read those attributes to emit
appropriate inline code that replicates the logic of `gc_allocate_fast`.

### Write barriers

For some collectors, mutators have to tell the collector whenever they
mutate an object.  They tell the collector by calling a *write barrier*;
in Whippet this is currently the case only for generational collectors.

The write barrier is `gc_write_barrier`; see `gc-api.h` for its
parameters.

As with allocation, the fast path for the write barrier is parameterized
by collector-specific attributes, to allow JIT compilers to inline write
barriers.

### Safepoints

Sometimes Whippet will need to synchronize all threads, for example as
part of the "stop" phase of a stop-and-copy semi-space collector.
Whippet stops at *safepoints*.  At a safepoint, all mutators must be
able to enumerate all of their edges to live objects.

Whippet has cooperative safepoints: mutators have to periodically call
into the collector to potentially synchronize with other mutators.
`gc_allocate_slow` is a safepoint, so if you a bunch of threads that are
all allocating, usually safepoints are reached in a more-or-less prompt
fashion.  But if a mutator isn't allocating, it either needs to
temporarily mark itself as inactive by trampolining through
`gc_call_without_gc`, or it should arrange to periodically call
`gc_safepoint`.  Marking a mutator as inactive is the right strategy
for, for example, system calls that might block.  Periodic safepoints is
better for code that is active but not allocating.

Thing is, though, `gc_safepoint` is not yet implemented :)  It will be,
though!

Also, the BDW collector actually uses pre-emptive safepoints: it stops
threads via POSIX signals.  `gc_safepoint` is (or will be) a no-op with
BDW.

### Statistics

Sometimes a program would like some information from the GC: how many
bytes and objects have been allocated?  How much time has been spent in
the GC?  How many times has GC run, and how many of those were minor
collections?  What's the maximum pause time?  Stuff like that.

Instead of collecting a fixed set of information, Whippet emits
callbacks when the collector reaches specific states.  The embedder
provides a *listener* for these events when initializing the collector.

The listener interface is defined in
[`gc-event-listener.h`](../api/gc-event-listener.h).  Whippet ships with
two listener implementations,
[`GC_NULL_EVENT_LISTENER`](../api/gc-null-event-listener.h), and
[`GC_BASIC_STATS`](../api/gc-basic-stats.h).  Most embedders will want
their own listener, but starting with the basic stats listener is not a
bad option:

```
#include "gc-api.h"
#include "gc-basic-stats.h"
#include <stdio.h>

int main() {
  struct gc_options *options = NULL;
  struct gc_heap *heap;
  struct gc_mutator *mut;
  struct gc_basic_stats stats;
  gc_init(options, NULL, &heap, &mut, GC_BASIC_STATS, &stats);
  // ...
  gc_basic_stats_finish(&stats);
  gc_basic_stats_print(&stats, stdout);
}
```

As you can see, `GC_BASIC_STATS` expands to a `struct gc_event_listener`
definition.  We pass an associated pointer to a `struct gc_basic_stats`
instance which will be passed to the listener at every event.

The output of this program might be something like:

```
Completed 19 major collections (0 minor).
654.597 ms total time (385.235 stopped).
Heap size is 167.772 MB (max 167.772 MB); peak live data 55.925 MB.
```

There are currently three different sorts of events: heap events to
track heap growth, collector events to time different parts of
collection, and mutator events to indicate when specific mutators are
stopped.

There are three heap events:

 * `init(void* data, size_t heap_size)`: Called during `gc_init`, to
   allow the listener to initialize its associated state.
 * `heap_resized(void* data, size_t new_size)`: Called if the heap grows
   or shrinks.
 * `live_data_size(void* data, size_t size)`: Called periodically when
   the collector learns about live data size.

The collection events form a kind of state machine, and are called in
this order:

 * `requesting_stop(void* data)`: Called when the collector asks
   mutators to stop.
 * `waiting_for_stop(void* data)`: Called when the collector has done
   all the pre-stop work that it is able to and is just waiting on
   mutators to stop.
 * `mutators_stopped(void* data)`: Called when all mutators have
   stopped; the trace phase follows.
 * `prepare_gc(void* data, enum gc_collection_kind gc_kind)`: Called
   to indicate which kind of collection is happening.
 * `roots_traced(void* data)`: Called when roots have been visited.
 * `heap_traced(void* data)`: Called when the whole heap has been
   traced.
 * `ephemerons_traced(void* data)`: Called when the [ephemeron
   fixpoint](https://wingolog.org/archives/2023/01/24/parallel-ephemeron-tracing)
   has been reached.
 * `restarting_mutators(void* data)`: Called right before the collector
   restarts mutators.

The collectors in Whippet will call all of these event handlers, but it
may be that they are called conservatively: for example, the
single-mutator, single-collector semi-space collector will never have to
wait for mutators to stop.  It will still call the functions, though!

Finally, there are the mutator events:
 * `mutator_added(void* data) -> void*`: The only event handler that
   returns a value, called when a new mutator is added.  The parameter
   is the overall event listener data, and the result is
   mutator-specific data.  The rest of the mutator events pass this
   mutator-specific data instead.
 * `mutator_cause_gc(void* mutator_data)`: Called when a mutator causes
   GC, either via allocation or an explicit `gc_collect` call.
 * `mutator_stopping(void* mutator_data)`: Called when a mutator has
   received the signal to stop.  It may perform some marking work before
   it stops.
 * `mutator_stopped(void* mutator_data)`: Called when a mutator parks
   itself.
 * `mutator_restarted(void* mutator_data)`: Called when a mutator
   restarts.
 * `mutator_removed(void* mutator_data)`: Called when a mutator goes
   away.

Note that these events handlers shouldn't really do much.  In
particular, they shouldn't call into the Whippet API, and they shouldn't
even access GC-managed objects.  Event listeners are really about
statistics and profiling and aren't a place to mutate the object graph.

### Ephemerons

Whippet supports ephemerons, first-class objects that weakly associate
keys with values.  If the an ephemeron's key ever becomes unreachable,
the ephemeron becomes dead and loses its value.

The user-facing API is in [`gc-ephemeron.h`](../api/gc-ephemeron.h).  To
allocate an ephemeron, call `gc_allocate_ephemeron`, then initialize its
key and value via `gc_ephemeron_init`.  Get the key and value via
`gc_ephemeron_key` and `gc_ephemeron_value`, respectively.

In Whippet, ephemerons can be linked together in a chain.  During GC, if
an ephemeron's chain points to a dead ephemeron, that link will be
elided, allowing the dead ephemeron itself to be collected.  In that
way, ephemerons can be used to build weak data structures such as weak
maps.

Weak data structures are often shared across multiple threads, so all
routines to access and modify chain links are atomic.  Use
`gc_ephemeron_chain_head` to access the head of a storage location that
points to an ephemeron; push a new ephemeron on a location with
`gc_ephemeron_chain_push`; and traverse a chain with
`gc_ephemeron_chain_next`.

An ephemeron association can be removed via `gc_ephemeron_mark_dead`.

### Finalizers

A finalizer allows the embedder to be notified when an object becomes
unreachable.

A finalizer has a priority.  When the heap is created, the embedder
should declare how many priorities there are.  Lower-numbered priorities
take precedence; if an object has a priority-0 finalizer outstanding,
that will prevent any finalizer at level 1 (or 2, ...)  from firing
until no priority-0 finalizer remains.

Call `gc_attach_finalizer`, from `gc-finalizer.h`, to attach a finalizer
to an object.

A finalizer also references an associated GC-managed closure object.
A finalizer's reference to the closure object is strong:  if a
finalizer's closure closure references its finalizable object,
directly or indirectly, the finalizer will never fire.

When an object with a finalizer becomes unreachable, it is added to a
queue.  The embedder can call `gc_pop_finalizable` to get the next
finalizable object and its associated closure.  At that point the
embedder can do anything with the object, including keeping it alive.
Ephemeron associations will still be present while the finalizable
object is live.  Note however that any objects referenced by the
finalizable object may themselves be already finalized; finalizers are
enqueued for objects when they become unreachable, which can concern
whole subgraphs of objects at once.

The usual way for an embedder to know when the queue of finalizable
object is non-empty is to call `gc_set_finalizer_callback` to
provide a function that will be invoked when there are pending
finalizers.

Arranging to call `gc_pop_finalizable` and doing something with the
finalizable object and closure is the responsibility of the embedder.
The embedder's finalization action can end up invoking arbitrary code,
so unless the embedder imposes some kind of restriction on what
finalizers can do, generally speaking finalizers should be run in a
dedicated thread instead of recursively from within whatever mutator
thread caused GC.  Setting up such a thread is the responsibility of the
mutator.  `gc_pop_finalizable` is thread-safe, allowing multiple
finalization threads if that is appropriate.

`gc_allocate_finalizer` returns a finalizer, which is a fresh GC-managed
heap object.  The mutator should then directly attach it to an object
using `gc_finalizer_attach`.  When the finalizer is fired, it becomes
available to the mutator via `gc_pop_finalizable`.