ZigTea GC Allocator
What’s cool we can make in a language with manual memory management? Of course a GC, never let ’em know your next move! As a big fan of Go, I really like the new GC called “Green Tea”. Inspired by ideas from that GC and some C allocators I’ve seen before, I made a simple allocator with a GC for Zig. It’s no code blocks in the post, link for the complete project is at the end.
Before we begin the technical explanation, here’s a few things you have to know:
- DON’T USE THIS IN PRODUCTION!
- It’s an educational project.
- It contains many known issues.
The Green Tea background
Since the project was inspired by Go’s Green Tea, it uses very similar principles. Instead of single malloc-style chunks, it deals with 8 KiB spans sliced into fixed-size slots. The slot size is picked from a predetermined range: 16, 32, 64, 128, 256, 512, 1024 bytes. With this segregated-free-list design, same-size objects sit together and a slot’s size never needs to be stored per-object, so the freelist mechanism stays cheap.
Every span has 3 fixed-size bitmaps, sized for the minimum class regardless of which class the span actually uses: max_slots = span_size / 16 = 512 slots, which gives bit_words = 512 / 64 = 8 u64 words per bitmap. That’s fine for the 16-byte class, where all 512 slots are real, but it’s wasteful for the bigger classes. Take the 1024-byte class: it only has 8192 / 1024 = 8 slots, so a single u64 word would cover all of them, but the struct still carries the full 8 words. That’s 7 wasted words per bitmap, times 3 bitmaps, which is 168 bytes burned on keeping that’s never used. Not too much but if you have OCD works the same way as mine, you have to know it… :D
The three bitmaps aren’t redundant copies of each other, they each track something different:
alloc_bits: this slot currently holds a live allocation.seen_bits: this object was reached during mark.scanned_bits: this object’s outgoing pointers were already traced.
The set difference between seen and scanned gives you exactly the objects that are known-reachable but was not traced yet. Once scanned catches up to seen for every object, the mark is done.
The span-centric work queue
Here’s the part that’s actually closer to Green Tea’s real idea than just “segregated size classes”, and it’s kind of the heart of the design: the mark phase is work queue holds span indices, not individual object pointers.
When an address looks like a pointer into some allocated slot, the GC sets that slot’s seen bit. If that’s is a first live object found in this span during this one mark phase, the whole span gets pushed onto the queue exactly once, and that slot is remembered as the span’s “representative”. If another one, different slot in the same span later will be marked too, the span is flagged as no having a unique representative anymore.
Why this way? Because when the mark phase later drains the queue and checked this span, it checks: is there only one active object in this span at this moment, and is the representative still unique? If yes, it scans only that one slot instead of go thru the entire span’s bitmap looking for which slots are set. For a span with hundreds of slots where only one object is alive, that’s a real save, you skip a linear scan over the whole span just to rediscover something you already knew at mark time.
If a second live object arrives, the optimization just steps back and falls through to a normal full bitmap check over all the slots, so correctness never depends on the shortcut holding. It’s a cool speedup for the common “mostly likely empty span” case, not a separate path that can go out of sync.
The allocation path
So what actually happens when you call allocator.alloc? First, pickClassSize looks at the requested length and alignment and rounds up to the smallest class that fits both. If the request is bigger than 1024 bytes (the largest class), it just returns null, which Zig’s allocator interface treats the same as out-of-memory, what’s ok for our aducational allocator.
When the class is picked, findSpanForClass go thru every existing span looking for one that’s the right class and still has a free slot. If it finds one, nice, it hands back a slot from that span’s free list. If it doesn’t, it asks the backing allocator for a brand new 8 KiB span and put it into slots of that class. Long story short, once a slot is found, the byte count gets added to a running total (bytes_since_gc), and if that total has crossed the current threshold, a collection runs right there, directly before alloc even returns. So allocation and collection aren’t separate phases you trigger by hand, a normal append call on some container can transparently trigger a full mark and sweep if you’re unlucky with timing.
resize and remap (the calls behind things like growing an ArrayList in place) don’t do anything clever, they ony check whether the new requested length still fits inside the slot’s existing class size. If it does, you get to keep using the same memory, no copy needed. If it doesn’t, resize just says no and remap returns null, which makes the caller fall back to “alloc-new -> copy -> free-old”. There’s no actual growth across class boundaries, the slot size is fixed for the life of that allocation.
Mark, trace, sweep
Root scanning is conservative, same as the Zig limitations section explains below: the collector walks every machine word on the active segment, and any word whose bit pattern lands inside an allocated slot gets processed as a pointer and marked.
Tracing an object works the same way as scanning the stack, just over the object’s own bytes: every word inside a live object is checked for whether it looks like a pointer into another slot. Once an object’s bytes have been scanned this way, its scanned bit will be set so it’s never rescanned again, what is also what keeps cyclic object graphs from looping forever.
Sweep is the simple part: any slot whose seen bit didn’t get set this cycle is freed, and every span’s free list gets rebuilt from scratch afterward. That rebuild over every slot in every span on every collection, no matter how little actually changed, simple and correct, just not the cheapest possible.
After each collection it adds up how many bytes are still live, and sets the next collection trigger to twice that (with a 64 KiB floor, well it doesn’t collect constantly on a near-empty heap).
Finding which span an address belongs to
There’s a piece of plumbing that quietly underlies almost everything above: given a raw address, which span does it belong to, and which slot inside that span? spanIndexForAddress answers this by go thru every span the GC currently owns and checking if the address falls inside that span’s memory range. markAddress calls this on every word it thinks might be a pointer, free calls it on every freed pointer, resize and remap call it too. findSpanForClass, the thing the allocation path leans on to find a span with a free slot, does basically the same kind of full walk, just filtered by class size instead of by address range.
None of this is wrong, it’s just O(number of spans) per call, and it adds up: a mark phase that’s chasing thousands of pointer-looking words is doing thousands of linear walks over the span list, not constant-time lookups. For a toy with a handful of spans this is invisible. For anything with a real heap, it’s the first thing that’d need to become a sorted structure or a radix/hash lookup keyed by span base address.
Zig limitations
Since Zig does not expose stack maps to user code, we have to use conservative stack/heap scanning, the same way Boehm does it in C/C++. It means the GC will never be precise, and it’ll never be able to move or compact objects either, without changing the mechanics so it actually knows what is a pointer and what just looks like one.
Known issues
Short list, since I already warned you not to run this anywhere serious:
- Not thread-safe at all.
- Hard 1024-byte allocation ceiling.
- Linear span lookups everywhere.
- Freelist is rebuilt from scratch every sweep.
Where/how to try
The source sode with an example was published at https://github.com/sibexico/zigtea
If you wanna have some exorcise, feel free to fix ane known problem and open PR. :) Maybe this projext will leads some production-ready solution, who knows?