Update README

2025-07-12 20:20:29 +02:00 · 2022-05-15 22:06:41 +02:00 · 2022-05-15 22:06:41 +02:00 · 061d92d125
commit 061d92d125
parent 69d7ff83dd
1 changed file with 88 additions and 85 deletions
--- a/README.md
+++ b/README.md
@@ -6,26 +6,26 @@ Scheme](https://gnu.org/s/guile).
 ## Design
-Whippet is a mark-region collector, like
+Whippet is mainly a mark-region collector, like
 [Immix](http://users.cecs.anu.edu.au/~steveb/pubs/papers/immix-pldi-2008.pdf).
 See also the lovely detailed [Rust
 implementation](http://users.cecs.anu.edu.au/~steveb/pubs/papers/rust-ismm-2016.pdf).
-To a first approximation, Whippet is a whole-heap Immix collector.  See
+To a first approximation, Whippet is a whole-heap Immix collector with a
-the Immix paper for full details, but basically Immix divides the heap
+large object space on the side.  See the Immix paper for full details,
-into 32kB blocks, and then divides those blocks into 128B lines.  An
+but basically Immix divides the heap into 32kB blocks, and then divides
-Immix allocation never spans blocks; allocations larger than 8kB go into
+those blocks into 128B lines.  An Immix allocation never spans blocks;
-a separate large object space.  Mutators request blocks from the global
+allocations larger than 8kB go into a separate large object space.
-store and allocate into those blocks using bump-pointer allocation.
+Mutators request blocks from the global store and allocate into those
-When all blocks are consumed, Immix stops the world and traces the
+blocks using bump-pointer allocation.  When all blocks are consumed,
-object graph, marking objects but also the lines that objects are on.
+Immix stops the world and traces the object graph, marking objects but
-After marking, blocks contain some lines with live objects and others
+also the lines that objects are on.  After marking, blocks contain some
-that are completely free.  Spans of free lines are called holes.  When a
+lines with live objects and others that are completely free.  Spans of
-mutator gets a recycled block from the global block store, it allocates
+free lines are called holes.  When a mutator gets a recycled block from
-into those holes.  Also, sometimes Immix can choose to evacuate rather
+the global block store, it allocates into those holes.  Also, sometimes
-than mark.  Bump-pointer-into-holes allocation is quite compatible with
+Immix can choose to evacuate rather than mark.  Bump-pointer-into-holes
-conservative roots, so it's an interesting option for Guile, which has a
+allocation is quite compatible with conservative roots, so it's an
-lot of legacy C API users.
+interesting option for Guile, which has a lot of legacy C API users.
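The block/line geometry and bump-pointer-into-holes allocation described above can be illustrated with a minimal C sketch. Only the 32kB block, 128B line and 8kB large-object numbers come from the text; the struct layouts, the `next_hole` and `allocate` names and the 8-byte size rounding are hypothetical. The sketch also ignores details such as Immix's conservative treatment of objects that straddle line boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Geometry from the text: 32 kB blocks, 128 B lines, and objects
   larger than 8 kB go to a separate large object space. */
#define BLOCK_SIZE (32 * 1024)
#define LINE_SIZE 128
#define LINES_PER_BLOCK (BLOCK_SIZE / LINE_SIZE) /* 256 */
#define LARGE_OBJECT_THRESHOLD (8 * 1024)

/* Hypothetical block layout: a side table of line mark bytes plus the
   payload.  A nonzero mark means the line held live data at the last GC. */
struct block {
  uint8_t line_marks[LINES_PER_BLOCK];
  uint8_t data[BLOCK_SIZE];
};

/* Hypothetical per-mutator state.  A fresh or recycled block starts with
   next_line = 0 and hp = limit = 0. */
struct allocator {
  struct block *block;
  size_t next_line; /* where to resume searching for holes */
  uintptr_t hp;     /* bump pointer into the current hole */
  uintptr_t limit;  /* end of the current hole */
};

/* Find the next hole (run of unmarked lines) at or after next_line.
   Returns 0 when the block has no holes left. */
static int next_hole(struct allocator *a) {
  size_t i = a->next_line;
  while (i < LINES_PER_BLOCK && a->block->line_marks[i])
    i++;
  if (i == LINES_PER_BLOCK)
    return 0;
  size_t end = i;
  while (end < LINES_PER_BLOCK && !a->block->line_marks[end])
    end++;
  a->hp = (uintptr_t)&a->block->data[i * LINE_SIZE];
  a->limit = (uintptr_t)&a->block->data[end * LINE_SIZE];
  a->next_line = end;
  return 1;
}

/* Bump-pointer allocation into holes.  NULL means the mutator should
   request another block from the global store. */
static void *allocate(struct allocator *a, size_t size) {
  assert(size <= LARGE_OBJECT_THRESHOLD); /* larger objects go elsewhere */
  size = (size + 7) & ~(size_t)7;         /* round up to 8 bytes */
  for (;;) {
    if (a->hp && a->limit - a->hp >= size) {
      void *ret = (void *)a->hp;
      a->hp += size;
      return ret;
    }
    if (!next_hole(a))
      return NULL; /* block exhausted; never span into another block */
  }
}
```

An allocation that does not fit in the current hole skips to the next hole, so a small hole can be left partly unused.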
 The essential difference of Whippet from Immix stems from a simple
 observation: Immix needs a side table of line mark bytes and also a mark
@@ -50,7 +50,7 @@ is now a metadata byte) to record the object end, so that finding holes
 in a block can just read the mark table and can avoid looking at object
 memory.
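The hole-finding idea can be sketched in C, assuming one metadata byte per granule in which one bit marks the first granule of a live object and another bit records the last granule of every object. The bit assignments and the `find_hole` function are assumptions for illustration; Whippet's real metadata encoding may differ. The point the sketch shows is that only the side table is read, never object memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments within each granule's metadata byte. */
#define META_MARK 0x1 /* first granule of an object marked live */
#define META_END 0x2  /* last granule of an object, live or dead */

/* Starting at granule `start`, skip over live objects and report the
   first run of free granules as [*lo, *hi).  Stale END bits left by dead
   objects are ignored, since the free-run scan only looks at MARK.
   Returns 0 if no free granule remains. */
static int find_hole(const uint8_t *meta, size_t n, size_t start,
                     size_t *lo, size_t *hi) {
  assert(start <= n);
  size_t i = start;
  while (i < n && (meta[i] & META_MARK)) {
    /* Live object: advance to the granule holding its END bit... */
    while (i < n && !(meta[i] & META_END))
      i++;
    i++; /* ...and step past it. */
  }
  if (i >= n)
    return 0;
  *lo = i;
  while (i < n && !(meta[i] & META_MARK))
    i++;
  *hi = i;
  return 1;
}
```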
-Other ideas in whippet:
+Other ideas in Whippet:
 * Minimize stop-the-world phase via parallel marking and punting all
   sweeping to mutators
@@ -77,85 +77,88 @@ Other ideas in whippet:
 ## What's there
-There are currently three collectors:
+This repository is a workspace for implementing Whippet.  As such, it
 has files implementing Whippet itself.  It also has some benchmarks to
 use in optimizing Whippet:
 - [`mt-gcbench.c`](./mt-gcbench.c): The multi-threaded [GCBench
   benchmark](https://hboehm.info/gc/gc_bench.html).  An old but
   standard benchmark that allocates different sizes of binary trees.
   As parameters it takes a heap multiplier and a number of mutator
   threads.  We analytically compute the peak amount of live data and
   then size the GC heap as a multiplier of that size.  It has a peak
   heap consumption of 10 MB or so per mutator thread: not very large.
   At a 2x heap multiplier, it causes about 30 collections for the
   Whippet collector, and runs somewhere around 200-400 milliseconds in
   single-threaded mode, on the machines I have in 2022.  For low thread
   counts, the GCBench benchmark is small; but then again many Guile
   processes also are quite short-lived, so perhaps it is useful to
   ensure that small heaps remain lightweight.
 - [`quads.c`](./quads.c): A synthetic benchmark that allocates quad
   trees.  The mutator begins by allocating one long-lived tree of depth
   N, and then allocates 13% of the heap in depth-3 trees, 20 times,
   simulating a fixed working set and otherwise an allocation-heavy
   workload.  By observing the times to allocate 13% of the heap in
   garbage we can infer mutator overheads, and also note the variance
   for the cycles in which GC hits.
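The `mt-gcbench.c` sizing rule (peak live data times a heap multiplier) is simple arithmetic. The sketch below assumes, for illustration only, that the live data is a single complete binary tree of 32-byte nodes (the node size quoted for the collectors with a header word); `tree_nodes`, `heap_size` and the depth-16 example are hypothetical, and the real computation in `mt-gcbench.c` accounts for more than one tree.

```c
#include <assert.h>
#include <stddef.h>

#define NODE_SIZE 32 /* bytes per Node when each object has a header word */

/* Number of nodes in a complete binary tree; a lone root has depth 0. */
static size_t tree_nodes(unsigned depth) {
  return ((size_t)1 << (depth + 1)) - 1;
}

/* Heap size = peak live bytes scaled by the heap multiplier. */
static size_t heap_size(size_t live_bytes, double multiplier) {
  assert(multiplier >= 1.0); /* the heap must at least hold the live data */
  return (size_t)((double)live_bytes * multiplier);
}
```

For example, `heap_size(tree_nodes(16) * NODE_SIZE, 2.0)` gives 8388544 bytes, about 8 MB.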
 The repository has two other collector implementations, to appropriately
 situate Whippet's performance in context:
 - `bdw.h`: The external BDW-GC conservative parallel stop-the-world
   mark-sweep segregated-fits collector with lazy sweeping.
 - `semi.h`: Semispace copying collector.
- - `mark-sweep.h`: The whippet collector.  Two different marking algorithms:
+ - `mark-sweep.h`: The Whippet collector.  Two different marking
-   single-threaded and parallel.
+   implementations: single-threaded and parallel.
-The two latter collectors reserve one word per object on the header,
+## Guile
 which might make them collect more frequently than `bdw` because the
 `Node` data type takes 32 bytes instead of 24 bytes.
-These collectors are sketches and exercises for improving Guile's
+If the Whippet collector works out, it could replace Guile's garbage
-garbage collector.  Guile currently uses BDW-GC.  In Guile if we have an
+collector.  Guile currently uses BDW-GC.  Guile has a widely used C API
-object reference we generally have to be able to know what kind of
+and implements part of its run-time in C.  For this reason it may be
-object it is, because there are few global invariants enforced by
+infeasible to require precise enumeration of GC roots -- we may need to
-typing.  Therefore it is reasonable to consider allowing the GC and the
+allow GC roots to be conservatively identified from data sections and
-application to share the first word of an object, for example to maybe
+from stacks.  Such conservative roots would be pinned, but other objects
-store a mark bit (though an on-the-side mark byte seems to allow much
+can be moved by the collector if it chooses to do so.  We assume that
-more efficient sweeping, for mark-sweep), to allow the application to
+object references within a heap object can be precisely identified.
-know what kind an object is, to allow the GC to find references within
+(However, Guile currently uses BDW-GC in its default configuration,
-the object, to allow the GC to compute the object's size, and so on.
+which scans for references conservatively even on the heap.)
-There's just the (modified) GCBench, which is an old but standard
+The existing C API allows direct access to mutable object fields,
-benchmark that allocates different sizes of binary trees.  As parameters
+without the mediation of read or write barriers.  Therefore it may be
-it takes a heap multiplier and a number of mutator threads.  We
+impossible to switch to collector strategies that need barriers, such as
-analytically compute the peak amount of live data and then size the GC
+generational or concurrent collectors.  However, we shouldn't write off
-heap as a multiplier of that size.  It has a peak heap consumption of 10
+this possibility entirely; an ideal replacement for Guile's GC will
-MB or so per mutator thread: not very large.  At a 2x heap multiplier,
+offer the possibility of migration to other GC designs without imposing
-it causes about 30 collections for the whippet collector, and runs
+new requirements on C API users in the initial phase.
 somewhere around 200-400 milliseconds in single-threaded mode, on the
 machines I have in 2022.
-The GCBench benchmark is small but then again many Guile processes also
+In this regard, the Whippet experiment also has the goal of identifying
-are quite short-lived, so perhaps it is useful to ensure that small
+a smallish GC abstraction in Guile, so that we might consider evolving
-heaps remain lightweight.
+GC implementation in the future without too much pain.  If we switch
-
+away from BDW-GC, we should be able to evaluate that it's a win for a
-Guile has a widely used C API and implements part of its run-time in C.
+large majority of use cases.
 For this reason it may be infeasible to require precise enumeration of
 GC roots -- we may need to allow GC roots to be conservatively
 identified from data sections and from stacks.  Such conservative roots
 would be pinned, but other objects can be moved by the collector if it
 chooses to do so.  We assume that object references within a heap object
 can be precisely identified.  (The current BDW-GC scans for references
 conservatively even on the heap.)
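Conservative root identification can be sketched as a word-by-word scan of a stack or data section, treating any value that falls inside the heap as a possible reference. This is a minimal illustration under stated assumptions: `struct heap_bounds` and `find_candidate_roots` are hypothetical, and a real collector would additionally check that a candidate points into an allocated object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address range covered by the GC heap. */
struct heap_bounds { uintptr_t lo, hi; };

/* Scan the words in [start, end) and collect every value that looks
   like a heap address.  Each candidate must be pinned: it might be an
   integer that merely resembles a pointer, so the collector cannot
   rewrite it to follow a moved object.  Returns the candidate count. */
static size_t find_candidate_roots(const struct heap_bounds *h,
                                   const uintptr_t *start,
                                   const uintptr_t *end,
                                   uintptr_t *out, size_t max_out) {
  assert(start <= end);
  size_t n = 0;
  for (const uintptr_t *p = start; p < end && n < max_out; p++) {
    uintptr_t word = *p;
    if (word >= h->lo && word < h->hi)
      out[n++] = word;
  }
  return n;
}
```

Objects found this way stay put, while objects reachable only through precisely identified heap fields remain free to move.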
 A generationa
 A likely good solution for Guile would be an [Immix
 collector](https://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf)
 with conservative roots, and a parallel stop-the-world mark/evacuate
 phase.  We would probably follow the [Rust
 implementation](http://users.cecs.anu.edu.au/~steveb/pubs/papers/rust-ismm-2016.pdf),
 more or less, with support for per-line pinning.  In an ideal world we
 would work out some kind of generational solution as well, either via a
 semispace nursery or via sticky mark bits, but this requires Guile to
 use a write barrier -- something that's possible to do within Guile
 itself but it's unclear if we can extend this obligation to users of
 Guile's C API.
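A write barrier is code that must run on every store of a reference into a heap object. As one illustration, here is a minimal card-marking barrier in C. Card marking is a common technique swapped in here for illustration; the text does not fix a barrier design, and `write_field`, `CARD_SIZE` and the toy heap are hypothetical. C code that assigns a field directly, bypassing `write_field`, would leave its card clean, which is the compatibility problem with existing C API users.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_WORDS 8192 /* hypothetical toy heap of pointer-sized slots */
#define CARD_SIZE 256   /* hypothetical: heap bytes covered by one card */
#define HEAP_BYTES (HEAP_WORDS * sizeof(void *))

static void *heap[HEAP_WORDS];
static uint8_t cards[(HEAP_BYTES + CARD_SIZE - 1) / CARD_SIZE];

/* Store `value` into the heap slot `field`, then dirty the card holding
   that slot so a minor collection knows to rescan it for pointers from
   old objects to young ones. */
static void write_field(void **field, void *value) {
  assert(field >= heap && field < heap + HEAP_WORDS);
  *field = value;
  size_t offset = (size_t)(field - heap) * sizeof(void *);
  cards[offset / CARD_SIZE] = 1;
}

/* A minor GC would scan only the dirty cards, then clear them. */
static size_t count_dirty_cards(void) {
  size_t n = 0;
  for (size_t i = 0; i < sizeof cards; i++)
    n += cards[i];
  return n;
}
```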
 In any case, these experiments also have the goal of identifying a
 smallish GC abstraction in Guile, so that we might consider evolving GC
 implementation in the future without too much pain.  If we switch away
 from BDW-GC, we should be able to evaluate that it's a win for a large
 majority of use cases.
 ## To do
- - [X] Implement a parallel marker for the mark-sweep collector.
+### Missing features before Guile can use Whippet
- - [X] Adapt all GC implementations to allow multiple mutator threads.
+
-   Update gcbench.c.
+ - [ ] Pinning
- - [ ] Implement precise non-moving Immix whole-heap collector.
+ - [ ] Conservative stacks
- - [ ] Add evacuation to Immix whole-heap collector.
+ - [ ] Conservative data segments
- - [ ] Add parallelism to Immix stop-the-world phase.
+ - [ ] Heap growth/shrinking
- - [ ] Implement conservative root-finding for the mark-sweep collector.
+ - [ ] Debugging/tracing
- - [ ] Implement conservative root-finding and pinning for Immix.
+ - [ ] Finalizers
- - [ ] Implement generational GC with semispace nursery and mark-sweep
+ - [ ] Weak references / weak maps
-   old generation.
+
- - [ ] Implement generational GC with semispace nursery and Immix
+### Features that would improve Whippet performance
-   old generation.
+
 - [ ] Immix-style opportunistic evacuation
 - [ ] Overflow allocation
 - [ ] Lazy identification of empty blocks
 - [ ] Generational GC via sticky mark bits
 - [ ] Generational GC with semi-space nursery
 - [ ] Concurrent marking with SATB barrier
 ## About the name