mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-05-02 04:40:29 +02:00
Rework docs
This commit is contained in:
parent
3d4d4d047c
commit
ab5071f97a
8 changed files with 336 additions and 75 deletions
12
README.md
12
README.md
|
@ -8,19 +8,9 @@ Whippet is an embed-only C library, designed to be copied into a
|
||||||
program's source tree. It exposes an abstract C API for managed memory
|
program's source tree. It exposes an abstract C API for managed memory
|
||||||
allocation, and provides a number of implementations of that API.
|
allocation, and provides a number of implementations of that API.
|
||||||
|
|
||||||
One of the implementations is also called "whippet", and is the
|
|
||||||
motivation for creating this library. For a detailed introduction, see
|
|
||||||
[Whippet: Towards a new local
|
|
||||||
maximum](https://wingolog.org/archives/2023/02/07/whippet-towards-a-new-local-maximum),
|
|
||||||
a talk given at FOSDEM 2023.
|
|
||||||
|
|
||||||
## Documentation
|
## Documentation
|
||||||
|
|
||||||
* [Design](./doc/design.md): What's the general idea?
|
See the [documentation](./doc/README.md).
|
||||||
* [Manual](./doc/manual.md): How do you get your program to use
|
|
||||||
Whippet? What is the API?
|
|
||||||
* [Guile](./doc/guile.md): Some notes on a potential rebase of Guile on
|
|
||||||
top of Whippet.
|
|
||||||
|
|
||||||
## Source repository structure
|
## Source repository structure
|
||||||
|
|
||||||
|
|
24
doc/README.md
Normal file
24
doc/README.md
Normal file
|
@ -0,0 +1,24 @@
|
||||||
|
# Whippet documentation
|
||||||
|
|
||||||
|
* [Manual](./manual.md): How do you get your program to use
|
||||||
|
Whippet? What is the API?
|
||||||
|
|
||||||
|
* [Collector implementations](./collectors.md): There are a number of
|
||||||
|
implementations of the Whippet API with differing performance
|
||||||
|
characteristics and which impose different requirements on the
|
||||||
|
embedder.
|
||||||
|
- [Semi-space collector (semi)](./collector-semi.md): For
|
||||||
|
single-threaded embedders who are not too tight on memory.
|
||||||
|
- [Parallel copying collector (pcc)](./collector-pcc.md): Like semi,
|
||||||
|
but with support for multiple mutator threads. Faster than semi if
|
||||||
|
multiple cores are available at collection-time.
|
||||||
|
- [Whippet collector (whippet)](./collector-whippet.md):
|
||||||
|
Immix-inspired collector. Optionally parallel, conservative (stack
|
||||||
|
and/or heap), and/or generational.
|
||||||
|
- [Boehm-Demers-Weiser collector (bdw)](./collector-bdw.md):
|
||||||
|
Conservative mark-sweep collector, implemented by
|
||||||
|
Boehm-Demers-Weiser library.
|
||||||
|
|
||||||
|
* [Guile](./doc/guile.md): Some notes on a potential rebase of Guile on
|
||||||
|
top of Whippet.
|
||||||
|
|
20
doc/collector-bdw.md
Normal file
20
doc/collector-bdw.md
Normal file
|
@ -0,0 +1,20 @@
|
||||||
|
# Boehm-Demers-Weiser collector
|
||||||
|
|
||||||
|
Whippet's `bdw` collector is backed by a third-party garbage collector,
|
||||||
|
the [Boehm-Demers-Weiser collector](https://github.com/ivmai/bdwgc).
|
||||||
|
|
||||||
|
BDW-GC is a mark-sweep collector with conservative root-finding,
|
||||||
|
conservative heap tracing, and parallel tracing.
|
||||||
|
|
||||||
|
Whereas the other Whippet collectors which rely on mutators to
|
||||||
|
[periodically check if they need to
|
||||||
|
stop](https://github.com/wingo/whippet/blob/main/doc/manual.md#safepoints),
|
||||||
|
`bdw` will stop mutators with a POSIX signal. Also, it doesn't really
|
||||||
|
support ephemerons (the Whippet `bdw` collector simulates them using
|
||||||
|
finalizers), and both ephemerons and finalizers only approximate the
|
||||||
|
Whippet behavior, because they are implemented in terms of what BDW-GC
|
||||||
|
provides.
|
||||||
|
|
||||||
|
It's a bit of an oddball from a Whippet perspective, but useful as a
|
||||||
|
migration path if you have an embedder that is already using BDW-GC.
|
||||||
|
And, it is a useful performance comparison.
|
69
doc/collector-pcc.md
Normal file
69
doc/collector-pcc.md
Normal file
|
@ -0,0 +1,69 @@
|
||||||
|
# Parallel copying collector
|
||||||
|
|
||||||
|
Whippet's `pcc` collector is a copying collector, like the more simple
|
||||||
|
[`semi`](./collector-semi.md), but supporting multiple mutator threads
|
||||||
|
and multiple tracing threads.
|
||||||
|
|
||||||
|
Like `semi`, `pcc` traces by evacuation: it moves all live objects on
|
||||||
|
every collection. (Exception: objects larger than 8192 bytes are
|
||||||
|
placed into a partitioned space traces by marking in place instead of
|
||||||
|
copying.) Evacuation requires precise roots, so if your embedder does
|
||||||
|
not support precise roots, `pcc` is not for you.
|
||||||
|
|
||||||
|
Again like `semi`, `pcc` generally requires a heap size at least twice
|
||||||
|
as large as the maximum live heap size, and performs best with ample
|
||||||
|
heap sizes; between 3× and 5× is best.
|
||||||
|
|
||||||
|
## Implementation notes
|
||||||
|
|
||||||
|
Unlike `semi` which has a single global bump-pointer allocation region,
|
||||||
|
`pcc` structures the heap into 64-kB blocks. In this way it supports
|
||||||
|
multiple mutator threads: mutators do local bump-pointer allocation into
|
||||||
|
their own block, and when their block is full, they fetch another from
|
||||||
|
the global store.
|
||||||
|
|
||||||
|
The block size is 64 kB, but really it's 128 kB, because each block has
|
||||||
|
two halves: the active region and the copy reserve. Dividing each block
|
||||||
|
in two allows the collector to easily grow and shrink the heap while
|
||||||
|
ensuring there is always enough reserve space.
|
||||||
|
|
||||||
|
Blocks are allocated in 64-MB aligned slabs, so there are 512 blocks in
|
||||||
|
a slab. The first block in a slab is used by the collector itself, to
|
||||||
|
keep metadata for the rest of the blocks, for example a chain pointer
|
||||||
|
allowing blocks to be collected in lists, a saved allocation pointer for
|
||||||
|
partially-filled blocks, whether the block is paged in or out, and so
|
||||||
|
on.
|
||||||
|
|
||||||
|
`pcc` supports tracing in parallel. This mechanism works somewhat like
|
||||||
|
allocation, in which multiple trace workers compete to evacuate objects
|
||||||
|
into their local allocation buffers; when an allocation buffer is full,
|
||||||
|
the trace worker grabs another, just like mutators do.
|
||||||
|
|
||||||
|
However, unlike the simple semi-space collector which uses a Cheney grey
|
||||||
|
worklist, `pcc` uses the [fine-grained work-stealing parallel
|
||||||
|
tracer](../src/parallel-tracer.h) originally developed for [Whippet's
|
||||||
|
Immix-like collector](./collector-whippet.md). Each trace worker
|
||||||
|
maintains a [local queue of objects that need
|
||||||
|
tracing](../src/local-worklist.h), which currently has 1024 entries. If
|
||||||
|
the local queue becomes full, the worker will publish 3/4 of those
|
||||||
|
entries to the worker's [shared worklist](../src/shared-worklist.h).
|
||||||
|
When a worker runs out of local work, it will first try to remove work
|
||||||
|
from its own shared worklist, then will try to steal from other workers.
|
||||||
|
|
||||||
|
Because threads compete to evacuate objects, `pcc` uses [atomic
|
||||||
|
compare-and-swap instead of simple forwarding pointer
|
||||||
|
updates](./manual.md#forwarding-objects), which imposes around a ~30%
|
||||||
|
performance penalty. `pcc` generally starts to outperform `semi` when
|
||||||
|
it can trace with 2 threads, and gets better with each additional
|
||||||
|
thread.
|
||||||
|
|
||||||
|
I sometimes wonder whether the worklist should contain grey edges or
|
||||||
|
grey objects. [MMTk](https://www.mmtk.io/) seems to do the former, and bundles edges into work
|
||||||
|
packets, which are the unit of work sharing. I don't know yet what is
|
||||||
|
best and look forward to experimenting once I have better benchmarks.
|
||||||
|
|
||||||
|
The memory used for the external worklist is dynamically allocated from
|
||||||
|
the OS and is not currently counted as contributing to the heap size.
|
||||||
|
If you are targetting a microcontroller or something, probably you need
|
||||||
|
to choose a different kind of collector that never dynamically
|
||||||
|
allocates, such as `semi`.
|
23
doc/collector-semi.md
Normal file
23
doc/collector-semi.md
Normal file
|
@ -0,0 +1,23 @@
|
||||||
|
# Semi-space collector
|
||||||
|
|
||||||
|
The `semi` collector is simple. It is mostly useful as a first
|
||||||
|
collector to try out, to make sure that a mutator correctly records all
|
||||||
|
roots: because `semi` moves every live object on every collection, it is
|
||||||
|
very effective at shaking out mutator bugs.
|
||||||
|
|
||||||
|
If your embedder chooses to not precisely record roots, for example
|
||||||
|
instead choosing to conservatively scan the stack, then the semi-space
|
||||||
|
collector is not for you: `semi` requires precise roots.
|
||||||
|
|
||||||
|
For more on semi-space collectors, see
|
||||||
|
https://wingolog.org/archives/2022/12/10/a-simple-semi-space-collector.
|
||||||
|
|
||||||
|
Whippet's `semi` collector incorporates a large-object space, which
|
||||||
|
marks objects in place instead of moving. Otherwise, `semi` generally
|
||||||
|
requires a heap size at least twice as large as the maximum live heap
|
||||||
|
size, and performs best with ample heap sizes; between 3× and 5× is
|
||||||
|
best.
|
||||||
|
|
||||||
|
The semi-space collector doesn't support multiple mutator threads. If
|
||||||
|
you want a whole-heap copying collector for a multi-threaded mutator,
|
||||||
|
look at [pcc](./collector-pcc.md).
|
158
doc/collector-whippet.md
Normal file
158
doc/collector-whippet.md
Normal file
|
@ -0,0 +1,158 @@
|
||||||
|
# Whippet collector
|
||||||
|
|
||||||
|
One collector implementation in the Whippet garbage collection library
|
||||||
|
is also called Whippet. Naming-wise this is a somewhat confusing
|
||||||
|
situation; perhaps it will change.
|
||||||
|
|
||||||
|
Anyway, the `whippet` collector is mainly a mark-region collector,
|
||||||
|
inspired by
|
||||||
|
[Immix](http://users.cecs.anu.edu.au/~steveb/pubs/papers/immix-pldi-2008.pdf).
|
||||||
|
To a first approximation, Whippet is a whole-heap Immix collector with a
|
||||||
|
large object space on the side.
|
||||||
|
|
||||||
|
When tracing, `whippet` mostly marks objects in place. If the heap is
|
||||||
|
too fragmented, it can compact the heap by choosing to evacuate
|
||||||
|
sparsely-populated heap blocks instead of marking in place. However
|
||||||
|
evacuation is strictly optional, which means that `whippet` is also
|
||||||
|
compatible with conservative root-finding, making it a good replacement
|
||||||
|
for embedders that currently use the [Boehm-Demers-Weiser
|
||||||
|
collector](./collector-bdw.md).
|
||||||
|
|
||||||
|
## Differences from Immix
|
||||||
|
|
||||||
|
The original Immix divides the heap into 32kB blocks, and then divides
|
||||||
|
those blocks into 128B lines. An Immix allocation can span lines but
|
||||||
|
not blocks; allocations larger than 8kB go into a separate large object
|
||||||
|
space. Mutators request blocks from the global store and allocate into
|
||||||
|
those blocks using bump-pointer allocation. When all blocks are
|
||||||
|
consumed, Immix stops the world and traces the object graph, marking
|
||||||
|
objects but also the lines that objects are on. After marking, blocks
|
||||||
|
contain some lines with live objects and others that are completely
|
||||||
|
free. Spans of free lines are called holes. When a mutator gets a
|
||||||
|
recycled block from the global block store, it allocates into those
|
||||||
|
holes. For an exposition of Immix, see the lovely detailed [Rust
|
||||||
|
implementation](http://users.cecs.anu.edu.au/~steveb/pubs/papers/rust-ismm-2016.pdf).
|
||||||
|
|
||||||
|
The essential difference of `whippet` from Immix stems from a simple
|
||||||
|
observation: Immix needs a side table of line mark bytes and also a mark
|
||||||
|
bit or bits in each object (or in a side table). But if instead you
|
||||||
|
choose to store mark bytes instead of bits (for concurrency reasons) in
|
||||||
|
a side table, with one mark byte per granule (unit of allocation,
|
||||||
|
perhaps 16 bytes), then you effectively have a line mark table where the
|
||||||
|
granule size is the line size. You can bump-pointer allocate into holes
|
||||||
|
in the mark byte table.
|
||||||
|
|
||||||
|
You might think this is a bad tradeoff, and perhaps it is: I don't know
|
||||||
|
yet. If your granule size is two pointers, then one mark byte per
|
||||||
|
granule is 6.25% overhead on 64-bit, or 12.5% on 32-bit. Especially on
|
||||||
|
32-bit, it's a lot! On the other hand, instead of the worst case of one
|
||||||
|
survivor object wasting a line (or two, in the case of conservative line
|
||||||
|
marking), granule-size-is-line-size instead wastes nothing. Also, you
|
||||||
|
don't need GC bits in the object itself, and you can use the mark byte
|
||||||
|
array to record the object end, so that finding holes in a block can
|
||||||
|
just read the mark table and can avoid looking at object memory.
|
||||||
|
|
||||||
|
## Optional features
|
||||||
|
|
||||||
|
The `whippet` collector has a few feature flags that can be turned on or
|
||||||
|
off. If you use the [standard embedder makefile include](../embed.mk),
|
||||||
|
then there is a name for each combination of features: `whippet` has no
|
||||||
|
additional features, `parallel-whippet` enables parallel marking,
|
||||||
|
`parallel-generational-whippet` enables parallelism,
|
||||||
|
`stack-conservative-parallel-generational-whippet` uses conservative
|
||||||
|
root-finding, and `heap-conservative-parallel-generational-whippet`
|
||||||
|
additionally traces the heap conservatively. You can leave off
|
||||||
|
components of the name to get a collector without those features.
|
||||||
|
Underneath this corresponds to some pre-processor definitions passed to
|
||||||
|
the compiler on the command line.
|
||||||
|
|
||||||
|
### Generations
|
||||||
|
|
||||||
|
Whippet supports generational tracing via the [sticky mark-bit
|
||||||
|
algorithm](https://wingolog.org/archives/2022/10/22/the-sticky-mark-bit-algorithm).
|
||||||
|
This requires that the embedder emit [write
|
||||||
|
barriers](https://github.com/wingo/whippet/blob/main/doc/manual.md#write-barriers);
|
||||||
|
if your embedder cannot ensure write barriers are always invoked, then
|
||||||
|
generational collection is not for you. (We could perhaps relax this a
|
||||||
|
bit, following what [Ruby developers
|
||||||
|
did](http://rvm.jp/~ko1/activities/rgengc_ismm.pdf).)
|
||||||
|
|
||||||
|
The write barrier is currently a card-marking barrier emitted on stores,
|
||||||
|
with one card byte per 256 object bytes, where the card location can be
|
||||||
|
computed from the object address because blocks are allocated in
|
||||||
|
two-megabyte aligned slabs.
|
||||||
|
|
||||||
|
### Parallel tracing
|
||||||
|
|
||||||
|
You almost certainly want this on! `parallel-whippet` uses a the
|
||||||
|
[fine-grained work-stealing parallel tracer](../src/parallel-tracer.h).
|
||||||
|
Each trace worker maintains a [local queue of objects that need
|
||||||
|
tracing](../src/local-worklist.h), which currently has a capacity of
|
||||||
|
1024 entries. If the local queue becomes full, the worker will publish
|
||||||
|
3/4 of those entries to the worker's [shared
|
||||||
|
worklist](../src/shared-worklist.h). When a worker runs out of local
|
||||||
|
work, it will first try to remove work from its own shared worklist,
|
||||||
|
then will try to steal from other workers.
|
||||||
|
|
||||||
|
The memory used for the external worklist is dynamically allocated from
|
||||||
|
the OS and is not currently counted as contributing to the heap size.
|
||||||
|
If you absolutely need to avoid dynamic allocation during GC, `whippet`
|
||||||
|
(even serial whippet) would need some work for your use case, to
|
||||||
|
allocate a fixed-size space for a marking queue and to gracefully handle
|
||||||
|
mark queue overflow.
|
||||||
|
|
||||||
|
### Conservative stack scanning
|
||||||
|
|
||||||
|
With `semi` and `pcc`, embedders must precisely enumerate the set of
|
||||||
|
*roots*: the edges into the heap from outside. Commonly, roots include
|
||||||
|
global variables, as well as working variables from each mutator's
|
||||||
|
stack. Whippet can optionally mark mutator stacks *conservatively*:
|
||||||
|
treating each word on the stack as if it may be an object reference, and
|
||||||
|
marking any object at that address.
|
||||||
|
|
||||||
|
After all these years, *whether* to mark stacks conservatively or not is
|
||||||
|
still an open research question. Conservative stack scanning retain too
|
||||||
|
much data if an integer is confused for an object reference and removes
|
||||||
|
a layer of correctness-by-construction from a system. Sometimes it is
|
||||||
|
required, for example if your embedder cannot enumerate roots precisely.
|
||||||
|
But there are reasons to consider it even if you can do precise roots:
|
||||||
|
it removes the need for the compiler to produce a stack map to store the
|
||||||
|
precise root enumeration at every safepoint; it removes the need to look
|
||||||
|
up a stack map when tracing; and it allows C or C++ support code to
|
||||||
|
avoid having to place roots in traceable locations published to the
|
||||||
|
garbage collector. And the [performance question is still
|
||||||
|
open](https://dl.acm.org/doi/10.1145/2660193.2660198).
|
||||||
|
|
||||||
|
Anyway. Whippet can scan roots conservatively. Those roots are pinned
|
||||||
|
for the collection; even if the collection will compact via evacuation,
|
||||||
|
referents of conservative roots won't be moved. Objects not directly
|
||||||
|
referenced by roots can be evacuated, however.
|
||||||
|
|
||||||
|
### Conservative heap scanning
|
||||||
|
|
||||||
|
In addition to stack and global references, the Boehm-Demers-Weiser
|
||||||
|
collector scans heap objects conservatively as well, treating each word
|
||||||
|
of each heap object as if it were a reference. Whippet can do that, if
|
||||||
|
the embedder is unable to provide a `gc_trace_object` implementation.
|
||||||
|
However this is generally a performance lose, and it prevents
|
||||||
|
evacuation.
|
||||||
|
|
||||||
|
## Other implementation tidbits
|
||||||
|
|
||||||
|
`whippet` does lazy sweeping: as a mutator grabs a fresh block, it
|
||||||
|
reclaims memory that was unmarked in the previous collection before
|
||||||
|
making the memory available for allocation. This makes sweeping
|
||||||
|
naturally cache-friendly and parallel.
|
||||||
|
|
||||||
|
The mark byte array facilitates conservative collection by being an
|
||||||
|
oracle for "does this address start an object".
|
||||||
|
|
||||||
|
There is some support for concurrent marking by having three mark bit
|
||||||
|
states (dead, survivor, marked) that rotate at each collection; some
|
||||||
|
collector configurations can have mutators mark before waiting for other
|
||||||
|
mutators to stop. True concurrent marking and associated barriers
|
||||||
|
are not yet implemented.
|
||||||
|
|
||||||
|
For a detailed introduction, see [Whippet: Towards a new local
|
||||||
|
maximum](https://wingolog.org/archives/2023/02/07/whippet-towards-a-new-local-maximum),
|
||||||
|
a talk given at FOSDEM 2023.
|
41
doc/collectors.md
Normal file
41
doc/collectors.md
Normal file
|
@ -0,0 +1,41 @@
|
||||||
|
# Whippet collectors
|
||||||
|
|
||||||
|
Whippet has four collectors currently:
|
||||||
|
- [Semi-space collector (semi)](./collector-semi.md): For
|
||||||
|
single-threaded embedders who are not too tight on memory.
|
||||||
|
- [Parallel copying collector (pcc)](./collector-pcc.md): Like semi,
|
||||||
|
but with support for multiple mutator threads. Faster than semi if
|
||||||
|
multiple cores are available at collection-time.
|
||||||
|
- [Whippet collector (whippet)](./collector-whippet.md):
|
||||||
|
Immix-inspired collector. Optionally parallel, conservative (stack
|
||||||
|
and/or heap), and/or generational.
|
||||||
|
- [Boehm-Demers-Weiser collector (bdw)](./collector-bdw.md):
|
||||||
|
Conservative mark-sweep collector, implemented by
|
||||||
|
Boehm-Demers-Weiser library.
|
||||||
|
|
||||||
|
## How to choose?
|
||||||
|
|
||||||
|
If you are migrating an embedder off BDW-GC, then it could be reasonable
|
||||||
|
to first go to `bdw`, then `stack-conservative-parallel-whippet`.
|
||||||
|
|
||||||
|
If you have an embedder with precise roots, use `semi` if
|
||||||
|
single-threaded, or `pcc` if multi-threaded. That will shake out
|
||||||
|
mutator/embedder bugs. Then if memory is tight, switch to
|
||||||
|
`parallel-whippet`, possibly `parallel-generational-whippet`.
|
||||||
|
|
||||||
|
If you are writing a new project, you have a choice as to whether to pay
|
||||||
|
the development cost of precise roots or not. If choose to not have
|
||||||
|
precise roots, then go for `stack-conservative-parallel-whippet`
|
||||||
|
directly.
|
||||||
|
|
||||||
|
## More collectors
|
||||||
|
|
||||||
|
It would be nice to have a classic generational GC, perhaps using
|
||||||
|
parallel-whippet for the old generation but a pcc-style copying nursery.
|
||||||
|
|
||||||
|
Support for concurrent marking in `whippet` would be good as well,
|
||||||
|
perhaps with a SATB barrier. (Or, if you are the sort of person to bet
|
||||||
|
on conservative stack scanning, perhaps a retreating-wavefront barrier
|
||||||
|
would be more appropriate.)
|
||||||
|
|
||||||
|
Contributions are welcome, provided they have no more dependencies!
|
|
@ -1,64 +0,0 @@
|
||||||
# Design
|
|
||||||
|
|
||||||
Whippet is mainly a mark-region collector, like
|
|
||||||
[Immix](http://users.cecs.anu.edu.au/~steveb/pubs/papers/immix-pldi-2008.pdf).
|
|
||||||
See also the lovely detailed [Rust
|
|
||||||
implementation](http://users.cecs.anu.edu.au/~steveb/pubs/papers/rust-ismm-2016.pdf).
|
|
||||||
|
|
||||||
To a first approximation, Whippet is a whole-heap Immix collector with a
|
|
||||||
large object space on the side. See the Immix paper for full details,
|
|
||||||
but basically Immix divides the heap into 32kB blocks, and then divides
|
|
||||||
those blocks into 128B lines. An Immix allocation never spans blocks;
|
|
||||||
allocations larger than 8kB go into a separate large object space.
|
|
||||||
Mutators request blocks from the global store and allocate into those
|
|
||||||
blocks using bump-pointer allocation. When all blocks are consumed,
|
|
||||||
Immix stops the world and traces the object graph, marking objects but
|
|
||||||
also the lines that objects are on. After marking, blocks contain some
|
|
||||||
lines with live objects and others that are completely free. Spans of
|
|
||||||
free lines are called holes. When a mutator gets a recycled block from
|
|
||||||
the global block store, it allocates into those holes. Also, sometimes
|
|
||||||
Immix can choose to evacuate rather than mark. Bump-pointer-into-holes
|
|
||||||
allocation is quite compatible with conservative roots, so it's an
|
|
||||||
interesting option for Guile, which has a lot of legacy C API users.
|
|
||||||
|
|
||||||
The essential difference of Whippet from Immix stems from a simple
|
|
||||||
observation: Immix needs a side table of line mark bytes and also a mark
|
|
||||||
bit or bits in each object (or in a side table). But if instead you
|
|
||||||
choose to store mark bytes instead of bits (for concurrency reasons) in
|
|
||||||
a side table, with one mark byte per granule (unit of allocation,
|
|
||||||
perhaps 16 bytes), then you effectively have a line mark table where the
|
|
||||||
granule size is the line size. You can bump-pointer allocate into holes
|
|
||||||
in the mark byte table.
|
|
||||||
|
|
||||||
You might think this is a bad tradeoff, and perhaps it is: I don't know
|
|
||||||
yet. If your granule size is two pointers, then one mark byte per
|
|
||||||
granule is 6.25% overhead on 64-bit, or 12.5% on 32-bit. Especially on
|
|
||||||
32-bit, it's a lot! On the other hand, instead of the worst case of one
|
|
||||||
survivor object wasting a line (or two, in the case of conservative line
|
|
||||||
marking), granule-size-is-line-size instead wastes nothing. Also, you
|
|
||||||
don't need GC bits in the object itself, and you can use the mark byte
|
|
||||||
array to record the object end, so that finding holes in a block can
|
|
||||||
just read the mark table and can avoid looking at object memory.
|
|
||||||
|
|
||||||
Other ideas in Whippet:
|
|
||||||
|
|
||||||
* Minimize stop-the-world phase via parallel marking and punting all
|
|
||||||
sweeping to mutators
|
|
||||||
|
|
||||||
* Enable mutator parallelism via lock-free block acquisition and lazy
|
|
||||||
statistics collation
|
|
||||||
|
|
||||||
* Allocate block space using aligned 4 MB slabs, with embedded metadata
|
|
||||||
to allow metadata bytes, slab headers, and block metadata to be
|
|
||||||
located via address arithmetic
|
|
||||||
|
|
||||||
* Facilitate conservative collection via mark byte array, oracle for
|
|
||||||
"does this address start an object"
|
|
||||||
|
|
||||||
* Enable in-place generational collection via card table with one entry
|
|
||||||
per 256B or so
|
|
||||||
|
|
||||||
* Enable concurrent marking by having three mark bit states (dead,
|
|
||||||
survivor, marked) that rotate at each collection, and sweeping a
|
|
||||||
block clears metadata for dead objects; but concurrent marking and
|
|
||||||
associated SATB barrier not yet implemented
|
|
Loading…
Add table
Add a link
Reference in a new issue