Firstly, we add a priority evacuation reserve, so that a few empty
blocks are always held back for evacuation. Otherwise, if we hand all
empty blocks to large allocations first and the heap is fragmented, we
won't be able to evacuate that fragmented heap to free up more blocks
for the large allocations.
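A minimal sketch of the idea, with invented names (struct mark_space,
acquire_block_for_large_allocation) and a simple linked list of empty
blocks; the real block and space structures differ:

    #include <stddef.h>

    /* Illustrative types; not the collector's actual block layout. */
    struct block { struct block *next; };

    struct mark_space {
      struct block *empty_blocks;        /* empty blocks found while sweeping */
      struct block *evacuation_reserve;  /* blocks held back for evacuation */
      size_t reserve_count;
      size_t reserve_target;             /* e.g. a small fraction of the heap */
    };

    static struct block* pop_block(struct block **list) {
      struct block *b = *list;
      if (b) *list = b->next;
      return b;
    }

    static void push_block(struct block **list, struct block *b) {
      b->next = *list;
      *list = b;
    }

    /* Top up the evacuation reserve before handing any empty block to a
       large allocation; otherwise a fragmented heap could be left with
       nothing to evacuate into. */
    static struct block* acquire_block_for_large_allocation(struct mark_space *space) {
      while (space->reserve_count < space->reserve_target) {
        struct block *b = pop_block(&space->empty_blocks);
        if (!b) return NULL;  /* no empties left: caller should trigger GC */
        push_block(&space->evacuation_reserve, b);
        space->reserve_count++;
      }
      return pop_block(&space->empty_blocks);
    }
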
Secondly, we remove `enum gc_reason`. The issue is that with multiple
mutator threads, knowing precisely which thread triggered GC does not
provide much information. Instead we should decide how to collect based
on the state of the heap.
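As a hedged illustration of what deciding from the state of the heap
could mean (the statistics and threshold here are made up, not the
collector's actual policy):

    #include <stddef.h>

    /* Illustrative heap statistics; the real collector tracks more. */
    struct heap_stats {
      size_t heap_bytes;        /* total heap size */
      size_t fragmented_bytes;  /* space lost to holes inside blocks */
    };

    enum gc_kind { GC_KIND_MARK_IN_PLACE, GC_KIND_EVACUATE };

    /* Decide how to collect from the heap's state, not from which
       mutator happened to trigger the collection. */
    static enum gc_kind choose_gc_kind(const struct heap_stats *stats) {
      double fragmentation =
        (double)stats->fragmented_bytes / (double)stats->heap_bytes;
      /* Threshold invented for the sketch. */
      return fragmentation > 0.15 ? GC_KIND_EVACUATE : GC_KIND_MARK_IN_PLACE;
    }
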
Finally, we move out-of-memory detection into the collector, instead
of the allocator.
Together, these changes let mt-gcbench (with fragmentation) operate in
smaller heaps.
Marking conservative roots in place effectively prohibits them from
being moved, and we need to trace the roots anyway in order to discover
the conservative ones. There is therefore no need for a pin bit.
If the mutator finds completely empty blocks, it sets them aside. The
large object space acquires these empty blocks, sweeping them if
needed, and causes them to be unmapped, possibly triggering a GC.
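One possible shape of that hand-off, sketched with invented names and
with madvise(MADV_DONTNEED) standing in for whatever unmapping the
collector actually does:

    #include <stddef.h>
    #include <sys/mman.h>

    #define BLOCK_SIZE (64 * 1024)

    /* Illustrative list of empty blocks set aside by the mutator. */
    struct empty_block_list { void *addrs[1024]; size_t count; };

    /* The large object space takes the empty blocks and returns their
       pages to the OS; the address range stays reserved so the mark
       space can reuse it later. */
    static size_t large_object_space_acquire_empty_blocks(struct empty_block_list *empties) {
      size_t acquired = empties->count;
      for (size_t i = 0; i < acquired; i++)
        madvise(empties->addrs[i], BLOCK_SIZE, MADV_DONTNEED);
      empties->count = 0;
      return acquired;
    }
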
Use uint64 instead of uintptr when bulk-reading metadata bytes. Assume
that live objects come in plugs rather than each object being separated
by a hole. Always bulk-load metadata bytes when measuring holes, and be
less branchy. Lazily clear hole bytes as we allocate. Add a place to
record lost space due to fragmentation.
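A sketch of the bulk load, with an invented helper
(measure_hole_in_granules) over a one-byte-per-granule metadata array:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Count the granules in a hole, i.e. the run of zero metadata bytes
       starting at `metadata`, reading 8 bytes at a time.  Assumes
       `limit` is a multiple of 8, as block metadata sizes are. */
    static size_t measure_hole_in_granules(const uint8_t *metadata, size_t limit) {
      size_t granule = 0;
      while (granule < limit) {
        uint64_t word;
        memcpy(&word, metadata + granule, sizeof word);  /* bulk load */
        if (word == 0) {       /* all 8 granules empty: still in the hole */
          granule += 8;
          continue;
        }
        /* A nonzero byte means a live plug starts in this word. */
        while (metadata[granule] == 0)
          granule++;
        break;
      }
      return granule;
    }
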
Read a word at a time from the mark byte array. If the mark word
doesn't correspond to live data, there will be no contention and we can
clear it with one write.
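A sketch under an assumed metadata layout (a single mark bit per byte;
the real bit assignments differ):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define MARK_BIT  0x01
    #define MARK_MASK 0x0101010101010101ULL  /* MARK_BIT in every byte */

    /* Sweep mark bytes a word at a time.  If no byte in the word is
       marked, nothing live is there, no other thread will write those
       bytes, and one plain 8-byte store clears them all; otherwise only
       the dead bytes are cleared, leaving live ones alone. */
    static void sweep_mark_words(uint8_t *mark_bytes, size_t count) {
      for (size_t i = 0; i + 8 <= count; i += 8) {
        uint64_t word;
        memcpy(&word, mark_bytes + i, sizeof word);
        if (word == 0)
          continue;                      /* already clear */
        if (!(word & MARK_MASK)) {
          uint64_t zero = 0;
          memcpy(mark_bytes + i, &zero, sizeof zero);  /* one write */
        } else {
          for (size_t j = 0; j < 8; j++)
            if (!(mark_bytes[i + j] & MARK_BIT))
              mark_bytes[i + j] = 0;     /* clear dead bytes only */
        }
      }
    }
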
Don't require that mark bytes be cleared; instead we have rotating
colors. Beginnings of support for concurrent marking, pinning,
conservative roots, and generational collection.
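The rotating colors can be sketched like this (bit values and names
assumed for illustration only):

    #include <stdint.h>

    /* Three mark colors; the "live" one rotates each collection, so a
       byte holding a stale color is simply dead and never needs to be
       cleared in bulk. */
    enum { MARK_0 = 0x01, MARK_1 = 0x02, MARK_2 = 0x04 };
    #define MARK_BITS (MARK_0 | MARK_1 | MARK_2)

    struct mark_state { uint8_t live_color; };

    static void rotate_mark_color(struct mark_state *s) {
      switch (s->live_color) {
        case MARK_0: s->live_color = MARK_1; break;
        case MARK_1: s->live_color = MARK_2; break;
        default:     s->live_color = MARK_0; break;
      }
    }

    static int metadata_byte_is_live(const struct mark_state *s, uint8_t byte) {
      return (byte & s->live_color) != 0;
    }

    static uint8_t mark_metadata_byte(const struct mark_state *s, uint8_t byte) {
      /* Replace any stale color with the current one, preserving other
         bits (pinning, remembered-set flags, and so on). */
      return (uint8_t)((byte & ~MARK_BITS) | s->live_color);
    }
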
This lets mutators run in parallel. There is currently a bug, however:
a race between stopping mutators marking their roots and other mutators
that are still sweeping. Will fix in a followup.
There are 4 MB aligned slabs, divided into 64 kB pages. (On 32-bit
this will be 2 MB slabs and 32 kB pages.) Then you can get the mark
byte for a granule from the slab base plus the granule offset. The
unused slack that would correspond to mark bytes for the blocks used
*by* the mark bytes is used for other purposes: remembered sets (not
yet used), block summaries (not used), and a slab header (likewise).
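The lookup then reduces to mask-and-shift arithmetic; a sketch assuming
16-byte granules on 64-bit and the metadata bytes sitting at the base
of each slab (both assumptions for the illustration):

    #include <stdint.h>

    #define SLAB_SIZE    ((uintptr_t)4 * 1024 * 1024)
    #define GRANULE_SIZE ((uintptr_t)16)

    /* One metadata byte per granule lives at the base of the 4 MB
       aligned slab, so the byte for any address is the slab base plus
       the granule offset within the slab. */
    static uint8_t* metadata_byte_for_addr(uintptr_t addr) {
      uintptr_t slab_base = addr & ~(SLAB_SIZE - 1);
      uintptr_t granule = (addr & (SLAB_SIZE - 1)) / GRANULE_SIZE;
      return (uint8_t *)(slab_base + granule);
    }
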
Probably the collector should use 8-byte granules on 32-bit, but for
now we're working with 64-bit sizes. Since we don't (and never did)
pack pages with same-sized small objects, there is no need to make sure
that small object sizes fit evenly into the medium object threshold; we
just keep packed freelists. This is a simplification that lets us
reclaim the tail of a region in constant time rather than looping
through the size classes.
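A sketch of what packed freelists and the constant-time tail reclaim
could look like (sizes and names invented for the illustration):

    #include <stddef.h>

    #define GRANULE_SIZE 16
    #define MAX_FREELIST_GRANULES 256  /* assumed medium object threshold */

    struct freelist_node { struct freelist_node *next; };

    /* One freelist per granule count up to the threshold. */
    struct small_object_freelists {
      struct freelist_node *by_granules[MAX_FREELIST_GRANULES + 1];
    };

    /* Reclaiming the tail of a region is one push onto the list for its
       exact size in granules, with no loop over size classes. */
    static void reclaim_tail(struct small_object_freelists *lists,
                             void *tail, size_t tail_bytes) {
      size_t granules = tail_bytes / GRANULE_SIZE;
      if (granules == 0 || granules > MAX_FREELIST_GRANULES)
        return;  /* nothing usable, or big enough for the medium path */
      struct freelist_node *node = tail;
      node->next = lists->by_granules[granules];
      lists->by_granules[granules] = node;
    }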