Store allocations in a splay tree so that we can efficiently map from an
edge originating in the lospace to its object. Defer returning memory
to the OS to a periodic background thread, using a strategy similar to
the one used for nofl and copy-space pages. For pages that haven't yet
been returned to the OS, use a size-segregated freelist instead of
requiring a full best-fit search.
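
As a rough illustration of the size-segregated freelist idea, here is a
minimal sketch in C. All names (lo_span, lo_freelist, LO_BUCKETS) and
the bucket layout are assumptions made for this example, not taken from
the actual source: bucket i holds free spans of exactly i+1 pages, and
the last bucket holds everything larger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: buckets 0..LO_BUCKETS-2 hold spans of exactly
   1..LO_BUCKETS-1 pages; the last bucket holds all larger spans. */
#define LO_BUCKETS 32

struct lo_span {
  struct lo_span *next;
  size_t npages;
};

struct lo_freelist {
  struct lo_span *buckets[LO_BUCKETS];
};

static size_t lo_bucket_for(size_t npages) {
  assert(npages > 0);
  return npages < LO_BUCKETS ? npages - 1 : LO_BUCKETS - 1;
}

static void lo_freelist_push(struct lo_freelist *fl, struct lo_span *span) {
  size_t i = lo_bucket_for(span->npages);
  span->next = fl->buckets[i];
  fl->buckets[i] = span;
}

/* Return a free span of at least npages pages, or NULL if none exists. */
static struct lo_span *lo_freelist_pop(struct lo_freelist *fl, size_t npages) {
  for (size_t i = lo_bucket_for(npages); i < LO_BUCKETS; i++) {
    /* In an exact-size bucket the first entry always fits; only the
       overflow bucket needs a real search. */
    for (struct lo_span **loc = &fl->buckets[i]; *loc; loc = &(*loc)->next) {
      if ((*loc)->npages >= npages) {
        struct lo_span *span = *loc;
        *loc = span->next;
        span->next = NULL;
        return span;
      }
    }
  }
  return NULL;
}
```

With this layout a request only walks a list in the overflow bucket;
for smaller sizes it takes the head of the first non-empty bucket at or
above the requested size.
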
Add a new kind of write barrier that keeps one bit per field. The
mutator that sets a field's bit must add the field's location (the
edge) to a remembered set. Only the fast path is implemented here.
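
A field-logging barrier of this kind might look roughly like the sketch
below. The side-table layout, the toy heap, and every name here
(logged_bits, write_barrier, barrier_slow, remembered) are assumptions
for illustration only. The slow path is included just to make the
example self-contained, and the remembered set is a plain array that is
not safe for concurrent appends.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_WORDS 1024

static uintptr_t heap[HEAP_WORDS];                  /* toy heap of word-sized fields */
static _Atomic uint8_t logged_bits[HEAP_WORDS / 8]; /* one bit per field */
static uintptr_t *remembered[HEAP_WORDS];           /* toy remembered set */
static size_t remembered_count;

/* Slow path: try to set the field's bit. Only the mutator that flips
   the bit from 0 to 1 records the edge. */
static void barrier_slow(uintptr_t *edge, size_t idx) {
  uint8_t mask = (uint8_t)(1u << (idx % 8));
  uint8_t old = atomic_fetch_or_explicit(&logged_bits[idx / 8], mask,
                                         memory_order_acq_rel);
  if (!(old & mask))
    remembered[remembered_count++] = edge; /* not thread-safe; sketch only */
}

/* Fast path: a single load and test. If the bit is already set, the
   edge has been logged and there is nothing to do. */
static inline void write_barrier(uintptr_t *edge) {
  assert(edge >= heap && edge < heap + HEAP_WORDS);
  size_t idx = (size_t)(edge - heap);
  uint8_t byte = atomic_load_explicit(&logged_bits[idx / 8],
                                      memory_order_relaxed);
  if (byte & (1u << (idx % 8)))
    return;
  barrier_slow(edge, idx);
}

static inline void write_field(uintptr_t *edge, uintptr_t value) {
  write_barrier(edge);
  *edge = value;
}
```

Repeated writes to the same field hit the fast path after the first
one, so each edge is added to the remembered set at most once until its
bit is cleared again.
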
We use Treiber stacks to represent sets of blocks: blocks to sweep, full
blocks, and so on. This is fine as long as we are only adding to or
only removing from those sets, but as soon as we have concurrent add and
remove, we need to avoid the ABA problem.
Concurrent add and remove occurs for partly-full blocks, which are both
acquired and released by mutators; empty blocks, which can be added to
by heap growth at the same time as the mutator acquires them; and the
paged-out queue, which is also concurrent with heap growth/shrinkage.
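
The text above does not say which technique is used to avoid ABA. One
common option, sketched here under that caveat, is to pair the stack
head with a version counter so that a compare-and-swap fails if the
head was popped and re-pushed in between. To keep the head within a
single 64-bit atomic, this sketch links blocks by index rather than by
pointer; the names (block_stack, blocks, NIL) are hypothetical, and the
32-bit version can in principle wrap around.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NBLOCKS 64
#define NIL UINT32_MAX

struct block { _Atomic uint32_t next; };
static struct block blocks[NBLOCKS];

/* The head packs (version << 32) | index of the top block. */
struct block_stack { _Atomic uint64_t head; };

static uint64_t pack(uint32_t version, uint32_t idx) {
  return ((uint64_t)version << 32) | idx;
}

static void block_stack_init(struct block_stack *s) {
  atomic_init(&s->head, pack(0, NIL));
}

static void block_stack_push(struct block_stack *s, uint32_t idx) {
  assert(idx < NBLOCKS);
  uint64_t old = atomic_load_explicit(&s->head, memory_order_relaxed);
  uint64_t desired;
  do {
    atomic_store_explicit(&blocks[idx].next, (uint32_t)old,
                          memory_order_relaxed);
    desired = pack((uint32_t)(old >> 32) + 1, idx);
  } while (!atomic_compare_exchange_weak_explicit(
      &s->head, &old, desired, memory_order_release, memory_order_relaxed));
}

/* Pop the top block index, or return NIL if the stack is empty. */
static uint32_t block_stack_pop(struct block_stack *s) {
  uint64_t old = atomic_load_explicit(&s->head, memory_order_acquire);
  for (;;) {
    uint32_t idx = (uint32_t)old;
    if (idx == NIL)
      return NIL;
    uint32_t next = atomic_load_explicit(&blocks[idx].next,
                                         memory_order_relaxed);
    /* Even if idx was popped and pushed back meanwhile, the version in
       `old` is stale, so the CAS fails and we retry with fresh state. */
    uint64_t desired = pack((uint32_t)(old >> 32) + 1, next);
    if (atomic_compare_exchange_weak_explicit(
            &s->head, &old, desired, memory_order_acquire,
            memory_order_acquire))
      return idx;
  }
}
```

A simpler alternative for the few lists with concurrent add and remove
is to protect them with a lock and keep the plain Treiber stack for the
add-only and remove-only sets.
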
In GC, request mutators to stop before doing anything else; this changes
the order of events seen through the event listener interface. Also,
refactor mmc to look more like pcc.