Published by Aubrie Edwards. Modified over 9 years ago.
1
Mostly Parallel Garbage Collection
Hans-J. Boehm, Alan J. Demers, Scott Shenker
Presented by Kit Cischke
2
Outline
Introduction
  Basics of garbage collection, revisited
  How do you make a GC for non-GC languages?
  Oh, and making it parallel would be nice. Or at least mostly parallel.
The Basic Idea
  Virtual dirty bits to find the reachable set
Sweeping doesn’t matter.
  Try telling that to your mother.
  For performance, the sweep step is practically ignorable.
Formalisms
  Let’s introduce some notation and concepts for how this should work.
3
Outline, Part Deux
Implementation Choices
  Based on our formalisms, which is the best combination to actually use?
Brief Results
  Really, the test hardware was a SPARCstation configured with as little as 10MB of RAM!
Mostly Parallel Copying Collectors
  This is a mark-sweep paper, mostly. Could you build a copy collector?
Onwards!
4
GC Taxonomy and Our Choices Garbage collectors may be reference-counting or tracing based. The authors focus on tracing out from the root set. The basic style of many early collectors was “stop-the-world” collection. Generational and parallel collectors attempt to mitigate the potentially long delays while the world is stopped. Generational collectors just collect a small part of the heap. Parallel collectors might be generational, but they mainly try to collect the whole heap, in parallel with the mutator(s).
5
So why “mostly” parallel? Think back to the VM migration papers. Migration started while the VM was running. Or in parallel. But at some point, the VM had to be stopped to complete the transfer. Hopefully, by that time, there was very little to transfer. Same idea here: as much collection is done as possible while the mutator is running. At some point, we need to stop the world to finish the collection. For mostly the same kind of reason: the mutator will do things after the collector has made decisions on certain pointers which render the pointer un/reachable. This is meaningful because we don’t want the collector running all the time.
6
Authors’ Two Stated Goals “Present a method for transforming a stop-the-world tracing collector into a mostly parallel collector.” And to make the solution general to copying/non-copying or generational/non-generational collectors. Furthermore, no OS changes are needed. “Describe a particular implementation of a garbage collector that illustrates this idea.” What’s really cool is that it will provide GC to languages like C with relatively short pause times.
7
Basic Idea Every program has a root set. The root set forms the foundation for the immune set, the set of objects that will not be collected. Tracing the path of pointers from the root set finds the live, reachable objects, which are marked. Unmarked (and therefore unreachable) objects can be collected.
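To make the marking step concrete, here is a minimal sketch in Python. The heap is modeled as a plain dictionary from an object name to the names it points to; `mark_from_roots` and the sample heap are illustrative names, not anything from the paper.

```python
def mark_from_roots(heap, roots):
    """Return the set of objects reachable from the roots.

    `heap` maps an object name to the list of names it points to
    (a toy stand-in for real pointers).
    """
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])
    return marked


# Toy heap: a -> b -> c is reachable from the root a; d and e are not.
heap = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"], "e": []}
marked = mark_from_roots(heap, roots=["a"])
garbage = set(heap) - marked  # unmarked objects can be collected
```

Note that d points to a, but nothing reachable points to d, so d is still garbage.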
8
More Basic Idea Key idea: Whenever a virtual memory page is written to, set a virtual dirty bit for that page. At the beginning of a collection, clear all the dirty bits. Start tracing. The tracing finds all currently reachable objects while the mutator keeps doing its thing. Writes introduce dirty pages. When the original trace is done, stop the world and trace from the marked objects on dirty pages. Now everything reachable is marked. But is it safe to say everything unreachable is not marked?
9
A Compromise No, the collector is neither purely parallel nor precise. The duration of the stop-the-world pause is directly dependent on the number of dirtied pages. In theory, things can be no worse than a whole-heap stop-the-world collection. The authors claim this doesn’t happen. Not all unreachable objects are collected, as they may have been marked before the mutator dismissed them. The collector is complete, in that eventually that memory will be reclaimed. (Just not right now!)
10
Sweeping Doesn’t Matter Phase 2 of a mark-sweep collector is to free the unused memory in whatever form that takes – called sweeping. Sweeping doesn’t need to occur during the world stoppage. Once we know what’s garbage, we can sweep interleaved with object allocation.
11
Sweeping Implemented Here The heap is split into blocks. Each block contains objects of a certain size. For small objects, the block size is the same as a physical page of memory. After marking, pages are queued for sweeping in one of multiple queues (one per object size). Each object size also has a free list. When it is empty, the allocator sweeps the front of the queue for that object size and restores that memory to the free list. Blocks for larger objects are swept in large increments immediately following a collection. This limits CPU time consumed by the collection. The net effect is that GC times are dominated by the marking.
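The lazy sweeping scheme for small objects can be sketched as follows. This is a toy model under stated assumptions: a block is a list of `(slot, is_marked)` pairs, and `SizeClass`, `queue_block` and `allocate` are hypothetical names chosen for illustration.

```python
from collections import deque


class SizeClass:
    """One object size: a free list plus a queue of unswept blocks."""

    def __init__(self):
        self.free_list = []         # slots ready to hand out
        self.sweep_queue = deque()  # blocks marked but not yet swept

    def queue_block(self, block):
        # Called after marking; `block` is a list of (slot, is_marked) pairs.
        self.sweep_queue.append(block)

    def allocate(self):
        # Sweep only when the free list runs dry, one block at a time.
        while not self.free_list and self.sweep_queue:
            block = self.sweep_queue.popleft()
            self.free_list.extend(slot for slot, is_marked in block
                                  if not is_marked)
        if not self.free_list:
            return None  # a real allocator would collect or grow the heap
        return self.free_list.pop()


sc = SizeClass()
sc.queue_block([("s0", True), ("s1", False)])
sc.queue_block([("s2", False)])
first = sc.allocate()            # sweeps only the first block
pending_after_first = len(sc.sweep_queue)
second = sc.allocate()           # free list empty again, sweeps the second
third = sc.allocate()            # nothing left to sweep
```

The sweeping cost is paid in small pieces by the allocator, which is why it stays out of the stop-the-world pause.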
12
Let’s Get Formal Definition: A partial collection only reclaims some subset of the unreachable objects. Let the set T contain all threatened objects (that is, objects that might be collected). Let the set I contain all immune objects (that is, objects that will not be collected). T and I are disjoint. All objects fall into either T or I. For a full collection, I contains only the roots. In a partial collection, I contains additional objects. A collection is correct iff no reachable objects are collected.
13
Guaranteeing Correctness Reclaim only unmarked objects when the following condition is true: C : Every object in I is marked and every object pointed to by a marked object is also marked.
14
Stop-the-World Collection
Formalizing stop-the-world collection:
Step 1: Stop the world.
Step 2: Clear all mark bits.
Step 3: Perform the tracing operation TR.
Step 4: Restart the world.
The operation TR: Mark all objects in I and trace from them.
At the end of this 4-step operation, condition C holds, and all unmarked objects can be collected.
15
Parallel Collection
Formally, mostly parallel collection requires:
Step 1: Clear all mark bits.
Step 2: Clear all virtual dirty bits.
Step 3: Perform the tracing operation TR.
Step 4: Stop the world.
Step 5: Perform a finishing operation, F.
Step 6: Restart the world.
The finishing operation F: Trace from all marked objects on dirty pages.
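The six steps can be walked through in a single-threaded simulation. This is only a sketch: real concurrency is replaced by replaying the mutator's writes after TR, as if they had landed on objects the collector had already scanned. The `Heap` class, the page assignment and the object names are all assumptions made for illustration.

```python
class Heap:
    def __init__(self, pointers, page_of):
        self.pointers = {obj: list(ptrs) for obj, ptrs in pointers.items()}
        self.page_of = dict(page_of)
        self.marked = set()
        self.dirty = set()  # virtual dirty bits, one per page

    def write(self, obj, targets):
        """Mutator store: update pointers and dirty the object's page."""
        self.pointers[obj] = list(targets)
        self.dirty.add(self.page_of[obj])

    def trace(self, starts):
        """Mark `starts` and everything reachable from them."""
        stack = list(starts)
        self.marked.update(stack)
        while stack:
            obj = stack.pop()
            for child in self.pointers[obj]:
                if child not in self.marked:
                    self.marked.add(child)
                    stack.append(child)


def mostly_parallel_collect(heap, immune, mutator_writes):
    heap.marked.clear()                  # Step 1: clear mark bits
    heap.dirty.clear()                   # Step 2: clear virtual dirty bits
    heap.trace(immune)                   # Step 3: TR (concurrent in reality)
    for obj, targets in mutator_writes:  # simulated mutator activity
        heap.write(obj, targets)
    # Step 4: stop the world (nothing to do in this one-thread model)
    on_dirty = [o for o in heap.marked if heap.page_of[o] in heap.dirty]
    heap.trace(on_dirty)                 # Step 5: finishing operation F
    # Step 6: restart the world
    return set(heap.pointers) - heap.marked


heap = Heap(
    pointers={"r": ["a"], "a": ["b"], "b": [], "n": [], "g": []},
    page_of={"r": 0, "a": 1, "b": 2, "n": 3, "g": 4},
)
# While TR runs, the mutator makes a drop b and point at n instead.
garbage = mostly_parallel_collect(heap, immune=["r"],
                                  mutator_writes=[("a", ["n"])])
```

Without F, n would be reclaimed even though a now points to it. Note also that b stays marked although it became unreachable: it is only reclaimed by a later collection.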
16
Notes on that Collection TR is performed totally in parallel with the mutator, which is dirtying pages that will need to be traced. The closure condition C does not hold after step 4 (stop-the-world), which is what requires the finishing step F. We will define a weaker closure C’ : C’ : Every object in I is marked and every object pointed to by a marked object on a clean page is also marked. Applying F to any state satisfying C’ will produce C.
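The two closure conditions can be written as small predicates. This is a sketch over the same kind of toy heap model (dictionaries and sets); the function names `holds_c` and `holds_c_prime` are made up here.

```python
def holds_c(pointers, marked, immune):
    """C: every object in I is marked, and marked objects point only to marked objects."""
    return set(immune) <= marked and all(
        child in marked for obj in marked for child in pointers[obj]
    )


def holds_c_prime(pointers, marked, immune, page_of, dirty):
    """C': like C, but only marked objects on clean pages are checked."""
    return set(immune) <= marked and all(
        child in marked
        for obj in marked
        if page_of[obj] not in dirty
        for child in pointers[obj]
    )


# State just after step 4: a was rewritten to point at the unmarked object n,
# so a's page (page 1) is dirty.
pointers = {"r": ["a"], "a": ["n"], "b": [], "n": []}
page_of = {"r": 0, "a": 1, "b": 2, "n": 3}
marked = {"r", "a", "b"}
dirty = {1}

c_before = holds_c(pointers, marked, immune={"r"})
c_prime_before = holds_c_prime(pointers, marked, {"r"}, page_of, dirty)

marked.add("n")  # what F does here: trace from a, the marked object on a dirty page
c_after = holds_c(pointers, marked, immune={"r"})
```

Here C fails before F because the marked object a points to the unmarked n, while C’ holds because a sits on a dirty page and is exempt.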
17
Considerations
Thus we have a correct, mostly-parallel collection. But, if we have a busy mutator, we might have lots of dirty pages, which in turn implies long pauses during the world stoppage. To shorten this delay, we can clean the pages in parallel. Let P be a set of pages. Then the process M is:
1.) Atomically retrieve and clear the virtual dirty bits from P.
2.) Trace from the marked objects on the dirty pages of P.
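A sketch of M over a toy heap follows. The names (`process_m`, the dictionaries) are illustrative, and the "atomic" retrieve-and-clear is trivially atomic here because the example is single-threaded.

```python
def process_m(pointers, page_of, marked, dirty, pages):
    """One run of M over the page set `pages`.

    Mutates `marked` and `dirty` in place and returns the pages cleaned.
    """
    # 1.) Retrieve and clear the virtual dirty bits of P.
    snapshot = dirty & pages
    dirty.difference_update(snapshot)
    # 2.) Trace from the marked objects on those pages.
    stack = [o for o in marked if page_of[o] in snapshot]
    while stack:
        obj = stack.pop()
        for child in pointers[obj]:
            if child not in marked:
                marked.add(child)
                stack.append(child)
    return snapshot


pointers = {"r": ["a"], "a": ["n"], "n": ["m"], "m": []}
page_of = {"r": 0, "a": 1, "n": 2, "m": 3}
marked = {"r", "a"}
dirty = {1}  # a was written after the collector scanned it
cleaned = process_m(pointers, page_of, marked, dirty, pages={0, 1, 2, 3})
```

In a real collector the mutator keeps running during M and may dirty pages again, so F is still needed afterwards; M just leaves it less to do.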
18
Generational Partial Collection All of that formally describes a general partial collection. Now let’s consider a generational collector that uses the mark bits for object age. Consider a partial collector where I is chosen to be the set of currently marked objects. Therefore, C’ holds. We could be done by simply performing F, but to reduce the delay, we perform M on the entire heap just before the world stoppage.
19
Formal Parallel Generational Collection
1. Perform M on the heap.
2. Stop the world.
3. Perform F.
4. Restart the world.
Because an object that has been marked will never be collected by the generational collector, we occasionally need to run a full collection.
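The four steps might look like this in a toy model where the mark bits survive from the previous collection. As before, this is a single-threaded sketch: the mutator's stores during M are replayed between M and F, and every name below is an assumption for illustration.

```python
def trace_from_dirty(pointers, page_of, marked, dirty):
    """Retrieve and clear all dirty bits, then trace from marked objects on those pages."""
    snapshot = set(dirty)
    dirty.clear()
    stack = [o for o in marked if page_of[o] in snapshot]
    while stack:
        obj = stack.pop()
        for child in pointers[obj]:
            if child not in marked:
                marked.add(child)
                stack.append(child)


def generational_collect(pointers, page_of, marked, dirty, writes_during_m):
    trace_from_dirty(pointers, page_of, marked, dirty)  # 1. M on the heap
    for obj, targets in writes_during_m:                # mutator runs during M
        pointers[obj] = list(targets)
        dirty.add(page_of[obj])
    # 2. Stop the world (implicit in this one-thread model).
    trace_from_dirty(pointers, page_of, marked, dirty)  # 3. F
    # 4. Restart the world.
    return set(pointers) - marked


# r and old survived the previous collection, so they are already marked.
# Since then r was rewritten to point at the new object y (page 0 is dirty),
# and nothing points at old any more.
pointers = {"r": ["y"], "old": [], "y": [], "z": [], "w": []}
page_of = {"r": 0, "old": 1, "y": 2, "z": 3, "w": 4}
marked = {"r", "old"}
dirty = {0}
garbage = generational_collect(pointers, page_of, marked, dirty,
                               writes_during_m=[("y", ["w"])])
```

Only the young garbage z is reclaimed. The unreachable but still marked old survives, which is exactly why an occasional full collection is needed.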
20
An Alternate Version of M
M’ could be:
1.) Atomically retrieve and clear the dirty bits from the pages P, and
2.) for all unmarked objects pointed to by marked objects on dirty pages of P, mark them and dirty the pages on which they reside.
Iteratively performing M’ can substitute for M, though M is generally preferable.
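For comparison, here is a sketch of M’ on the same kind of toy heap. Each run marks only the direct children and re-dirties their pages, so it has to be iterated until no dirty pages remain. The names are again illustrative.

```python
def process_m_prime(pointers, page_of, marked, dirty, pages):
    """One run of M' over the page set `pages` (no recursive tracing)."""
    # 1.) Retrieve and clear the dirty bits of P.
    snapshot = dirty & pages
    dirty.difference_update(snapshot)
    # 2.) Mark direct children of marked objects on those pages,
    #     and dirty the pages the children live on.
    for obj in [o for o in marked if page_of[o] in snapshot]:
        for child in pointers[obj]:
            if child not in marked:
                marked.add(child)
                dirty.add(page_of[child])
    return snapshot


pointers = {"r": ["a"], "a": ["n"], "n": ["m"], "m": []}
page_of = {"r": 0, "a": 1, "n": 2, "m": 3}
marked = {"r", "a"}
dirty = {1}

rounds = 0
while dirty:  # iterate M' until it reaches a fixed point
    process_m_prime(pointers, page_of, marked, dirty, pages={0, 1, 2, 3})
    rounds += 1
```

On this chain a single run of M would have marked n and m in one pass, while M’ needs one round per level of the chain plus a final round that finds nothing new.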
21
Implementation Choices When and how to use M and M’. No M’. For allocation-intensive mutators, run M more than once (twice seems to be the sweet spot). What is a “full collection” going to be, and when to run it? Initially triggered on heap exhaustion. However, the allocating thread would be stalled, even with the parallel collector. Settled on a daemon thread that kicks off the collector if the amount of used memory exceeds some threshold above what was being used at the end of the last collection. Then we run up to two iterations of M, then a concurrent execution of TR. If we run out of memory, we try to expand the heap.
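The daemon thread's trigger reduces to a simple comparison. A minimal sketch, assuming byte counts and a fixed threshold; the function name and the numbers are hypothetical, not values from the paper.

```python
def should_start_collection(used_now, used_after_last_gc, threshold):
    """True when heap usage has grown by more than `threshold` since the last collection."""
    return used_now - used_after_last_gc > threshold


# Hypothetical numbers: 8 MB in use after the last collection, 2 MB threshold.
idle = should_start_collection(9_000_000, 8_000_000, 2_000_000)
busy = should_start_collection(12_000_000, 8_000_000, 2_000_000)
```

Measuring growth relative to the end of the last collection, rather than waiting for heap exhaustion, lets the collector start early enough that the allocating thread is not stalled.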
22
Brief Results This collector was used at Xerox PARC for quite a while, heavily optimized. They didn’t modify the SunOS running on their machines, but just write-protected the heap. Mainly interested in measuring interactive response. Subjectively better. (But they are aware this is pretty fuzzy.) Ran 5 iterations of a “Boyer benchmark” and an allocator loop at various memory configurations, trying to even the playing field for full, generational and parallel generational collectors.
23
Results
24
Mostly Parallel Copying Collectors We can do all the same things and make a copying collector, if we want. It just requires space to maintain explicit forwarding links. A forward pointer is associated with each object, used only by the GC. Reachable objects are copied from from-space to to-space, writing the new address into the forward pointer in from-space. The mutator only sees the from-space pointers.
25
More on Copy Collectors Concurrent collection forces the following to be true: If an object residing on a clean page has been copied, then everything it points to has also been copied. If an object resides on a clean page, its current contents are up-to-date. With the world stopped, we can execute the finishing operation shown on the next slide such that all reachable objects are found with correct contents in to-space.
26
Copying Finishing Op
F_c: For every object a whose from-space copy resides on a dirty page:
1. Copy everything it points to that hasn’t already been copied.
2. Update pointers to point to to-space.
3. Recopy a to reflect changes to both pointer and non-pointer fields that occurred since the collection started.
Could create a concurrent version of F_c, but the authors found a copy collector to be impractical for their environment and didn’t bother implementing one. Just like with the mark-sweep collector, the world-stoppage time is proportional to the number of dirtied pages.
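Since the authors did not implement a copying collector, the following is only a toy sketch of what the finishing operation could look like. Everything here is an assumption: objects are dictionaries with `ptrs` and `data`, to-space addresses are strings like `"to:a"`, and objects first copied during the finishing step are scanned as well so that their contents land in to-space.

```python
def finish_copy(from_space, to_space, forward, page_of, dirty):
    """Toy version of F_c, run with the world stopped.

    `forward` maps a from-space name to its to-space address.
    """
    # Start from already-copied objects whose from-space copy is on a dirty page.
    worklist = [name for name in forward if page_of[name] in dirty]
    while worklist:
        name = worklist.pop()
        obj = from_space[name]
        # 1. Copy everything it points to that has not been copied yet.
        for child in obj["ptrs"]:
            if child not in forward:
                forward[child] = "to:" + child
                worklist.append(child)  # assumption: scan new copies too
        # 2. and 3. Recopy the object with pointers translated to to-space.
        to_space[forward[name]] = {
            "ptrs": [forward[child] for child in obj["ptrs"]],
            "data": obj["data"],
        }


# a was copied early, then the mutator changed both its pointer and its data.
from_space = {
    "r": {"ptrs": ["a"], "data": 1},
    "a": {"ptrs": ["n"], "data": 99},
    "n": {"ptrs": [], "data": 3},
}
to_space = {
    "to:r": {"ptrs": ["to:a"], "data": 1},
    "to:a": {"ptrs": [], "data": 2},  # stale copy
}
forward = {"r": "to:r", "a": "to:a"}
page_of = {"r": 0, "a": 1, "n": 2}
finish_copy(from_space, to_space, forward, page_of, dirty={1})
```

After the call, the stale to-space copy of a has been refreshed and n has been copied, so to-space holds every reachable object with current contents.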
27
Questions? I really liked the tone of this paper. It had less of that stuffy, self-important academic tone.