mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-05-13 17:20:21 +02:00

Some README updates

Andy Wingo 2022-05-11 22:25:09 +02:00
parent 7ac0b5bb4b
commit c39e26159d

106
README.md

@@ -1,27 +1,90 @@
-# GC workbench
+# Whippet Garbage Collector
-This repository is a workbench for implementing different GCs. It's a
-scratch space.
+This repository is for development of Whippet, a new garbage collector
+implementation, eventually for use in [Guile
+Scheme](https://gnu.org/s/guile).
+## Design
+Whippet is a mark-region collector, like
+[Immix](http://users.cecs.anu.edu.au/~steveb/pubs/papers/immix-pldi-2008.pdf).
+See also the lovely detailed [Rust
+implementation](http://users.cecs.anu.edu.au/~steveb/pubs/papers/rust-ismm-2016.pdf).
+To a first approximation, Whippet is a whole-heap Immix collector. See
+the Immix paper for full details, but basically Immix divides the heap
+into 32kB blocks, and then divides those blocks into 128B lines. An
+Immix allocation never spans blocks; allocations larger than 8kB go into
+a separate large object space. Mutators request blocks from the global
+store and allocate into those blocks using bump-pointer allocation.
+When all blocks are consumed, Immix stops the world and traces the
+object graph, marking objects but also the lines that objects are on.
+After marking, blocks contain some lines with live objects and others
+that are completely free. Spans of free lines are called holes. When a
+mutator gets a recycled block from the global block store, it allocates
+into those holes. Also, sometimes Immix can choose to evacuate rather
+than mark. Bump-pointer-into-holes allocation is quite compatible with
+conservative roots, so it's an interesting option for Guile, which has a
+lot of legacy C API users.
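The Immix geometry described above can be sketched in a few lines of C. This is only an illustration: the sizes come from the text, but every identifier (`BLOCK_SIZE`, `is_large_object`, `mark_lines`, and so on) is hypothetical and not taken from the Whippet sources, and the line-marking loop is a simplified version of what a real Immix collector does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sizes quoted from the Immix description; names are hypothetical. */
enum {
  BLOCK_SIZE = 32 * 1024,             /* 32 kB blocks */
  LINE_SIZE = 128,                    /* 128 B lines */
  LARGE_OBJECT_THRESHOLD = 8 * 1024,  /* larger objects go elsewhere */
  LINES_PER_BLOCK = BLOCK_SIZE / LINE_SIZE
};

/* Nonzero if an allocation of `bytes` belongs in the large object space. */
static int is_large_object(size_t bytes) {
  return bytes > LARGE_OBJECT_THRESHOLD;
}

/* Index of the line containing `addr`, relative to its block. Assumes
   blocks are aligned to BLOCK_SIZE. */
static size_t line_index(uintptr_t addr) {
  return (addr & (BLOCK_SIZE - 1)) / LINE_SIZE;
}

/* Mark every line that the object [addr, addr + bytes) touches. Assumes
   the object does not span blocks. */
static void mark_lines(uint8_t line_marks[LINES_PER_BLOCK],
                       uintptr_t addr, size_t bytes) {
  assert(bytes > 0);
  size_t first = line_index(addr);
  size_t last = line_index(addr + bytes - 1);
  assert(first <= last);
  for (size_t i = first; i <= last; i++)
    line_marks[i] = 1;
}
```

With these sizes a block holds 256 lines, and a 16-byte object that starts 120 bytes into a block marks both line 0 and line 1.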
+The essential difference of Whippet from Immix stems from a simple
+observation: Immix needs a side table of line mark bytes and also a mark
+bit or bits in each object (or in a side table). But if instead you
+choose to store mark bytes instead of bits (for concurrency reasons) in
+a side table, with one mark byte per granule (unit of allocation,
+perhaps 16 bytes), then you effectively have a line mark table where the
+granule size is the line size. You can bump-pointer allocate into holes
+in the mark byte table.
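As a rough sketch of bump-pointer allocation into holes in a mark byte table, assuming 16-byte granules and that a zero mark byte means "free": the `struct hole` type and the `next_hole` and `bump_alloc` functions are hypothetical names for illustration, not Whippet's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed granule size; the text says "perhaps 16 bytes". */
#define GRANULE_SIZE 16

struct hole {
  size_t start;  /* index of the first free granule */
  size_t end;    /* one past the last free granule */
};

/* Find the next run of zero mark bytes at or after `from`. Returns 1 and
   fills `out` if a hole exists, 0 otherwise. */
static int next_hole(const uint8_t *marks, size_t count, size_t from,
                     struct hole *out) {
  size_t i = from;
  while (i < count && marks[i] != 0) i++;
  if (i >= count) return 0;
  size_t j = i;
  while (j < count && marks[j] == 0) j++;
  out->start = i;
  out->end = j;
  return 1;
}

/* Bump-allocate `granules` granules from the hole. Returns the byte offset
   of the new object within the block, or (size_t)-1 if it does not fit. */
static size_t bump_alloc(struct hole *h, size_t granules) {
  assert(granules > 0);
  if (h->end - h->start < granules) return (size_t)-1;
  size_t offset = h->start * GRANULE_SIZE;
  h->start += granules;
  return offset;
}
```

A mutator would call `next_hole` again once `bump_alloc` fails, passing the exhausted hole's `end` as `from`. Note that the scan only reads the mark table, never object memory.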
+You might think this is a bad tradeoff, and perhaps it is: I don't know
+yet. If your granule size is two pointers, then one mark byte per
+granule is 6.25% overhead on 64-bit, or 12.5% on 32-bit. Especially on
+32-bit, it's a lot! On the other hand, you don't need GC bits in the
+object itself, and you get a number of other benefits from the mark byte
+table -- you can also stuff other per-object data there, such as pinning
+bits, nursery and remset bits, multiple mark colors for concurrent
+marking, and you can also use the mark byte (which is now a metadata
+byte) to record the object end, so that finding holes in a block can
+just read the mark table and can avoid looking at object memory.
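The overhead figures follow from dividing one metadata byte by the granule size. A small check, where the function name `mark_byte_overhead_percent` is made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Overhead, in percent, of one mark byte per granule, where a granule is
   two pointers wide. `pointer_size` is in bytes. */
static double mark_byte_overhead_percent(size_t pointer_size) {
  assert(pointer_size > 0);
  size_t granule_size = 2 * pointer_size;
  return 100.0 / (double)granule_size;
}
```

With 8-byte pointers the granule is 16 bytes, so the overhead is 1/16, or 6.25%; with 4-byte pointers the granule is 8 bytes, so it is 1/8, or 12.5%.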
+Other ideas in whippet:
+* Minimize stop-the-world phase via parallel marking and punting all
+sweeping to mutators
+* Enable mutator parallelism via lock-free block acquisition and lazy
+statistics collation
+* Allocate block space using aligned 4 MB slabs, with embedded metadata
+to allow metadata bytes, slab headers, and block metadata to be
+located via address arithmetic
+* Facilitate conservative collection via mark byte array, oracle for
+"does this address start an object"
+* Enable in-place generational collection via nursery bit in metadata
+byte for new allocations, remset bit for objects that should be
+traced for nursery roots, and a card table with one entry per 256B or
+so; but write barrier and generational trace not yet implemented
+* Enable concurrent marking by having three mark bit states (dead,
+survivor, marked) that rotate at each collection, and sweeping a
+block clears metadata for dead objects; but concurrent marking and
+associated SATB barrier not yet implemented
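Two of the ideas in the list above lend themselves to a short sketch: locating metadata by address arithmetic within an aligned slab, and rotating the three mark states after a collection. Both parts rest on assumptions that the README does not spell out. The layout (metadata bytes at the very start of the slab, one per 16-byte granule) and the direction of the rotation are guesses for illustration, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SLAB_SIZE ((uintptr_t)4 * 1024 * 1024)  /* 4 MB, from the text */
#define GRANULE_SIZE ((uintptr_t)16)            /* assumed granule size */

/* Base address of the slab containing `addr`. Relies on slabs being
   aligned to their own size. */
static uintptr_t slab_base(uintptr_t addr) {
  return addr & ~(SLAB_SIZE - 1);
}

/* Address of the metadata byte for the granule containing `addr`.
   Assumed layout: one metadata byte per granule, stored at the start of
   the slab, so the first granules hold metadata rather than objects. */
static uintptr_t metadata_byte_address(uintptr_t addr) {
  uintptr_t base = slab_base(addr);
  return base + (addr - base) / GRANULE_SIZE;
}

/* Three mark values whose meanings rotate at each collection. */
struct mark_states {
  uint8_t dead, survivor, marked;
};

/* One plausible rotation: objects marked in this cycle become survivors,
   the old survivor value becomes the dead value, and the old dead value
   is reused as the mark value for the next cycle. */
static struct mark_states rotate(struct mark_states s) {
  struct mark_states next = { s.survivor, s.marked, s.dead };
  return next;
}
```

Because the slab base is recovered by masking, no lookup table is needed to get from an object address to its metadata byte. Rotating the states three times returns to the starting assignment.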
 ## What's there
-There's just the (modified) GCBench, which is an old but standard
-benchmark that allocates different sizes of binary trees. It takes a
-heap of 25 MB or so, not very large, and causes somewhere between 20 and
-50 collections, running in 100 to 500 milliseconds on 2022 machines.
-Then there are currently three collectors:
+There are currently three collectors:
 - `bdw.h`: The external BDW-GC conservative parallel stop-the-world
 mark-sweep segregated-fits collector with lazy sweeping.
 - `semi.h`: Semispace copying collector.
-- `mark-sweep.h`: Stop-the-world mark-sweep segregated-fits collector
-with lazy sweeping. Two different marking algorithms:
+- `mark-sweep.h`: The whippet collector. Two different marking algorithms:
 single-threaded and parallel.
 The two latter collectors reserve one word per object on the header,
-which makes them collect more frequently than `bdw` because the `Node`
-data type takes 32 bytes instead of 24 bytes.
+which might make them collect more frequently than `bdw` because the
+`Node` data type takes 32 bytes instead of 24 bytes.
 These collectors are sketches and exercises for improving Guile's
 garbage collector. Guile currently uses BDW-GC. In Guile if we have an
@@ -34,6 +97,16 @@ more efficient sweeping, for mark-sweep), to allow the application to
 know what kind an object is, to allow the GC to find references within
 the object, to allow the GC to compute the object's size, and so on.
+There's just the (modified) GCBench, which is an old but standard
+benchmark that allocates different sizes of binary trees. As parameters
+it takes a heap multiplier and a number of mutator threads. We
+analytically compute the peak amount of live data and then size the GC
+heap as a multiplier of that size. It has a peak heap consumption of 10
+MB or so per mutator thread: not very large. At a 2x heap multiplier,
+it causes about 30 collections for the whippet collector, and runs
+somewhere around 200-400 milliseconds in single-threaded mode, on the
+machines I have in 2022.
 The GCBench benchmark is small but then again many Guile processes also
 are quite short-lived, so perhaps it is useful to ensure that small
 heaps remain lightweight.
@@ -47,6 +120,7 @@ chooses to do so. We assume that object references within a heap object
 can be precisely identified. (The current BDW-GC scans for references
 conservatively even on the heap.)
+A generationa
 A likely good solution for Guile would be an [Immix
 collector](https://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf)
 with conservative roots, and a parallel stop-the-world mark/evacuate
@@ -80,6 +154,12 @@ majority of use cases.
 - [ ] Implement generational GC with semispace nursery and Immix
 old generation.
+## About the name
+It sounds better than WIP (work-in-progress) garbage collector, doesn't
+it? Also apparently a whippet is a kind of dog that is fast for its
+size. It would be nice if whippet-gc turns out to have this property.
 ## License
 gcbench.c, MT_GCBench.c, and MT_GCBench2.c are from