Mirror of https://git.savannah.gnu.org/git/guile.git (synced 2025-05-12 16:50:22 +02:00)

Commit 061d92d125 (parent 69d7ff83dd): Update README

1 changed file with 88 additions and 85 deletions: README.md
@@ -6,26 +6,26 @@ Scheme](https://gnu.org/s/guile).
 
 ## Design
 
-Whippet is a mark-region collector, like
+Whippet is mainly a mark-region collector, like
 [Immix](http://users.cecs.anu.edu.au/~steveb/pubs/papers/immix-pldi-2008.pdf).
 See also the lovely detailed [Rust
 implementation](http://users.cecs.anu.edu.au/~steveb/pubs/papers/rust-ismm-2016.pdf).
 
-To a first approximation, Whippet is a whole-heap Immix collector. See
-the Immix paper for full details, but basically Immix divides the heap
-into 32kB blocks, and then divides those blocks into 128B lines. An
-Immix allocation never spans blocks; allocations larger than 8kB go into
-a separate large object space. Mutators request blocks from the global
-store and allocate into those blocks using bump-pointer allocation.
-When all blocks are consumed, Immix stops the world and traces the
-object graph, marking objects but also the lines that objects are on.
-After marking, blocks contain some lines with live objects and others
-that are completely free. Spans of free lines are called holes. When a
-mutator gets a recycled block from the global block store, it allocates
-into those holes. Also, sometimes Immix can choose to evacuate rather
-than mark. Bump-pointer-into-holes allocation is quite compatible with
-conservative roots, so it's an interesting option for Guile, which has a
-lot of legacy C API users.
+To a first approximation, Whippet is a whole-heap Immix collector with a
+large object space on the side. See the Immix paper for full details,
+but basically Immix divides the heap into 32kB blocks, and then divides
+those blocks into 128B lines. An Immix allocation never spans blocks;
+allocations larger than 8kB go into a separate large object space.
+Mutators request blocks from the global store and allocate into those
+blocks using bump-pointer allocation. When all blocks are consumed,
+Immix stops the world and traces the object graph, marking objects but
+also the lines that objects are on. After marking, blocks contain some
+lines with live objects and others that are completely free. Spans of
+free lines are called holes. When a mutator gets a recycled block from
+the global block store, it allocates into those holes. Also, sometimes
+Immix can choose to evacuate rather than mark. Bump-pointer-into-holes
+allocation is quite compatible with conservative roots, so it's an
+interesting option for Guile, which has a lot of legacy C API users.
 
 The essential difference of Whippet from Immix stems from a simple
 observation: Immix needs a side table of line mark bytes and also a mark
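The Design text in this hunk describes bump-pointer allocation into holes: 32kB blocks are divided into 128B lines, and a mutator allocates into runs of unmarked lines. A minimal C sketch of that idea follows. It is not Whippet's code: `struct block`, `struct allocator`, `allocator_init`, `next_hole` and `allocate` are hypothetical names, the layout of the line-mark table is an assumption, and the sketch leaves out Immix details such as evacuation and how marking fills in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
  BLOCK_SIZE = 32 * 1024,                   /* 32kB blocks, per the README */
  LINE_SIZE = 128,                          /* 128B lines */
  LINES_PER_BLOCK = BLOCK_SIZE / LINE_SIZE, /* 256 lines per block */
  LARGE_OBJECT_THRESHOLD = 8 * 1024         /* larger objects go elsewhere */
};

/* Hypothetical block layout: one mark byte per line, then the payload. */
struct block {
  uint8_t line_marks[LINES_PER_BLOCK]; /* nonzero: line holds live data */
  uint8_t payload[BLOCK_SIZE];
};

/* Hypothetical per-mutator allocation state for one recycled block. */
struct allocator {
  struct block *block;
  size_t next_line; /* line index at which the next hole search resumes */
  uint8_t *cursor;  /* bump pointer within the current hole */
  uint8_t *limit;   /* end of the current hole */
};

static void allocator_init(struct allocator *a, struct block *b) {
  a->block = b;
  a->next_line = 0;
  a->cursor = a->limit = b->payload; /* empty hole: forces a search */
}

/* Advance to the next run of unmarked lines; return 0 if none is left. */
static int next_hole(struct allocator *a) {
  struct block *b = a->block;
  size_t start = a->next_line;
  while (start < LINES_PER_BLOCK && b->line_marks[start])
    start++;
  if (start == LINES_PER_BLOCK) {
    a->next_line = start;
    return 0;
  }
  size_t end = start;
  while (end < LINES_PER_BLOCK && !b->line_marks[end])
    end++;
  a->cursor = b->payload + start * LINE_SIZE;
  a->limit = b->payload + end * LINE_SIZE;
  a->next_line = end;
  return 1;
}

/* Bump-allocate `size` bytes, rounded up to 8. Returns NULL when the
   request belongs in the large object space or the block is exhausted. */
static void *allocate(struct allocator *a, size_t size) {
  size = (size + 7) & ~(size_t)7;
  if (size == 0 || size > LARGE_OBJECT_THRESHOLD)
    return NULL;
  assert(a->cursor <= a->limit);
  while ((size_t)(a->limit - a->cursor) < size)
    if (!next_hole(a))
      return NULL; /* caller would request another block */
  void *result = a->cursor;
  a->cursor += size;
  return result;
}
```

In this sketch a hole that is too small for the current request is simply skipped, which wastes its space until the next collection; a real allocator would have to decide what to do with such remainders.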
@@ -50,7 +50,7 @@ is now a metadata byte) to record the object end, so that finding holes
 in a block can just read the mark table and can avoid looking at object
 memory.
 
-Other ideas in whippet:
+Other ideas in Whippet:
 
 * Minimize stop-the-world phase via parallel marking and punting all
 sweeping to mutators
@@ -77,85 +77,88 @@ Other ideas in whippet:
 
 ## What's there
 
-There are currently three collectors:
+This repository is a workspace for Whippet implementation. As such, it
+has files implementing Whippet itself. It also has some benchmarks to
+use in optimizing Whippet:
+
+- [`mt-gcbench.c`](./mt-gcbench.c): The multi-threaded [GCBench
+benchmark](https://hboehm.info/gc/gc_bench.html). An old but
+standard benchmark that allocates different sizes of binary trees.
+As parameters it takes a heap multiplier and a number of mutator
+threads. We analytically compute the peak amount of live data and
+then size the GC heap as a multiplier of that size. It has a peak
+heap consumption of 10 MB or so per mutator thread: not very large.
+At a 2x heap multiplier, it causes about 30 collections for the
+whippet collector, and runs somewhere around 200-400 milliseconds in
+single-threaded mode, on the machines I have in 2022. For low thread
+counts, the GCBench benchmark is small; but then again many Guile
+processes also are quite short-lived, so perhaps it is useful to
+ensure that small heaps remain lightweight.
+
+- [`quads.c`](./quads.c): A synthetic benchmark that allocates quad
+trees. The mutator begins by allocating one long-lived tree of depth
+N, and then allocates 13% of the heap in depth-3 trees, 20 times,
+simulating a fixed working set and otherwise an allocation-heavy
+workload. By observing the times to allocate 13% of the heap in
+garbage we can infer mutator overheads, and also note the variance
+for the cycles in which GC hits.
+
+The repository has two other collector implementations, to appropriately
+situate Whippet's performance in context:
 
 - `bdw.h`: The external BDW-GC conservative parallel stop-the-world
 mark-sweep segregated-fits collector with lazy sweeping.
 - `semi.h`: Semispace copying collector.
-- `mark-sweep.h`: The whippet collector. Two different marking algorithms:
-single-threaded and parallel.
+- `mark-sweep.h`: The whippet collector. Two different marking
+implementations: single-threaded and parallel.
 
-The two latter collectors reserve one word per object on the header,
-which might make them collect more frequently than `bdw` because the
-`Node` data type takes 32 bytes instead of 24 bytes.
+## Guile
 
-These collectors are sketches and exercises for improving Guile's
-garbage collector. Guile currently uses BDW-GC. In Guile if we have an
-object reference we generally have to be able to know what kind of
-object it is, because there are few global invariants enforced by
-typing. Therefore it is reasonable to consider allowing the GC and the
-application to share the first word of an object, for example to maybe
-store a mark bit (though an on-the-side mark byte seems to allow much
-more efficient sweeping, for mark-sweep), to allow the application to
-know what kind an object is, to allow the GC to find references within
-the object, to allow the GC to compute the object's size, and so on.
+If the Whippet collector works out, it could replace Guile's garbage
+collector. Guile currently uses BDW-GC. Guile has a widely used C API
+and implements part of its run-time in C. For this reason it may be
+infeasible to require precise enumeration of GC roots -- we may need to
+allow GC roots to be conservatively identified from data sections and
+from stacks. Such conservative roots would be pinned, but other objects
+can be moved by the collector if it chooses to do so. We assume that
+object references within a heap object can be precisely identified.
+(However, Guile currently uses BDW-GC in its default configuration,
+which scans for references conservatively even on the heap.)
 
-There's just the (modified) GCBench, which is an old but standard
-benchmark that allocates different sizes of binary trees. As parameters
-it takes a heap multiplier and a number of mutator threads. We
-analytically compute the peak amount of live data and then size the GC
-heap as a multiplier of that size. It has a peak heap consumption of 10
-MB or so per mutator thread: not very large. At a 2x heap multiplier,
-it causes about 30 collections for the whippet collector, and runs
-somewhere around 200-400 milliseconds in single-threaded mode, on the
-machines I have in 2022.
+The existing C API allows direct access to mutable object fields,
+without the mediation of read or write barriers. Therefore it may be
+impossible to switch to collector strategies that need barriers, such as
+generational or concurrent collectors. However, we shouldn't write off
+this possibility entirely; an ideal replacement for Guile's GC will
+offer the possibility of migration to other GC designs without imposing
+new requirements on C API users in the initial phase.
 
-The GCBench benchmark is small but then again many Guile processes also
-are quite short-lived, so perhaps it is useful to ensure that small
-heaps remain lightweight.
-
-Guile has a widely used C API and implements part of its run-time in C.
-For this reason it may be infeasible to require precise enumeration of
-GC roots -- we may need to allow GC roots to be conservatively
-identified from data sections and from stacks. Such conservative roots
-would be pinned, but other objects can be moved by the collector if it
-chooses to do so. We assume that object references within a heap object
-can be precisely identified. (The current BDW-GC scans for references
-conservatively even on the heap.)
-
-A generationa
-A likely good solution for Guile would be an [Immix
-collector](https://www.cs.utexas.edu/users/speedway/DaCapo/papers/immix-pldi-2008.pdf)
-with conservative roots, and a parallel stop-the-world mark/evacuate
-phase. We would probably follow the [Rust
-implementation](http://users.cecs.anu.edu.au/~steveb/pubs/papers/rust-ismm-2016.pdf),
-more or less, with support for per-line pinning. In an ideal world we
-would work out some kind of generational solution as well, either via a
-semispace nursery or via sticky mark bits, but this requires Guile to
-use a write barrier -- something that's possible to do within Guile
-itself but it's unclear if we can extend this obligation to users of
-Guile's C API.
-
-In any case, these experiments also have the goal of identifying a
-smallish GC abstraction in Guile, so that we might consider evolving GC
-implementation in the future without too much pain. If we switch away
-from BDW-GC, we should be able to evaluate that it's a win for a large
-majority of use cases.
+In this regard, the Whippet experiment also has the goal of identifying
+a smallish GC abstraction in Guile, so that we might consider evolving
+GC implementation in the future without too much pain. If we switch
+away from BDW-GC, we should be able to evaluate that it's a win for a
+large majority of use cases.
 
 ## To do
 
-- [X] Implement a parallel marker for the mark-sweep collector.
-- [X] Adapt all GC implementations to allow multiple mutator threads.
-Update gcbench.c.
-- [ ] Implement precise non-moving Immix whole-heap collector.
-- [ ] Add evacuation to Immix whole-heap collector.
-- [ ] Add parallelism to Immix stop-the-world phase.
-- [ ] Implement conservative root-finding for the mark-sweep collector.
-- [ ] Implement conservative root-finding and pinning for Immix.
-- [ ] Implement generational GC with semispace nursery and mark-sweep
-old generation.
-- [ ] Implement generational GC with semispace nursery and Immix
-old generation.
+### Missing features before Guile can use Whippet
+
+- [ ] Pinning
+- [ ] Conservative stacks
+- [ ] Conservative data segments
+- [ ] Heap growth/shrinking
+- [ ] Debugging/tracing
+- [ ] Finalizers
+- [ ] Weak references / weak maps
+
+### Features that would improve Whippet performance
+
+- [ ] Immix-style opportunistic evacuation
+- [ ] Overflow allocation
+- [ ] Lazy identification of empty blocks
+- [ ] Generational GC via sticky mark bits
+- [ ] Generational GC with semi-space nursery
+- [ ] Concurrent marking with SATB barrier
 
 ## About the name
 
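The benchmark descriptions added in this hunk say that the peak amount of live data is computed analytically and that the heap is then sized as a multiple of it. The following C sketch shows the arithmetic for complete trees (binary for GCBench, quad for `quads.c`). It is an illustration only: `tree_nodes`, `live_bytes` and `heap_size_for` are hypothetical helpers, the convention that a depth-0 tree is a single node is an assumption, and the node size is a parameter rather than a value taken from the benchmarks.

```c
#include <assert.h>
#include <stddef.h>

/* Number of nodes in a complete tree in which every interior node has
   `arity` children. Assumed convention: depth 0 is a single node. */
static size_t tree_nodes(size_t arity, size_t depth) {
  size_t total = 0;
  size_t level_width = 1; /* nodes on the current level */
  for (size_t d = 0; d <= depth; d++) {
    total += level_width;
    level_width *= arity;
  }
  return total;
}

/* Live data held by one such tree, for an assumed per-node size. */
static size_t live_bytes(size_t arity, size_t depth, size_t node_size) {
  return tree_nodes(arity, depth) * node_size;
}

/* Heap size as a multiple of the peak live data, e.g. 2.0 for the
   "2x heap multiplier" mentioned in the README. */
static size_t heap_size_for(size_t peak_live_bytes, double multiplier) {
  assert(multiplier >= 1.0);
  return (size_t)((double)peak_live_bytes * multiplier);
}
```

For example, under these assumptions a depth-3 quad tree has 1 + 4 + 16 + 64 = 85 nodes, so with hypothetical 32-byte nodes it holds 2720 bytes, and a 2x multiplier gives a 5440-byte heap for that tree alone.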