1 Efficient Concurrent Mark-Sweep Cycle Collection Daniel Frampton, Stephen Blackburn, Luke Quinane and John Zigman (Pending submission) Presented by Jose Joao CS395T - Mar 23, 2009

2 Outline
Motivation
– Backup tracing
– Trial deletion
Mark-Sweep Cycle Detection (MSCD)
Results
– What worked and what didn’t
Discussion

3 Motivation
Reference counting can directly (i.e. locally) identify garbage
– Low pause times
– Reasonable throughput (deferred, coalescing, ulterior)
– But it cannot reclaim circular garbage
Existing general solutions are expensive:
– Trace the whole heap (backup tracing)
– Temporarily delete an object and see if the cycle collapses (trial deletion)

4 Trial deletion
A partial mark-sweep (no roots required): find objects that are alive only because they are reachable from themselves
Three phases:
– Assume the candidate object is dead, then mark and decrement its children recursively
– Trace again from the candidate object, marking and re-incrementing wherever some RC is not zero, i.e. wherever an object is externally reachable
– Sweep objects with a zero count
Bacon and Rajan: process candidates en masse, avoid acyclic objects, concurrent algorithm
Usually less efficient than concurrent tracing
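The three phases can be sketched as follows. This is a minimal, single-candidate version in the style of Bacon and Rajan's synchronous algorithm, not the concurrent one mentioned on the slide; the `Obj` class, `link` helper and colour names are assumptions made for illustration.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.rc = 0            # reference count
        self.children = []     # outgoing pointers
        self.color = "black"   # black = presumed live


def link(src, dst):
    """Create a pointer src -> dst and count it."""
    src.children.append(dst)
    dst.rc += 1


def mark_grey(o):
    # Phase 1: assume o is dead and subtract the counts due to internal pointers.
    if o.color != "grey":
        o.color = "grey"
        for c in o.children:
            c.rc -= 1
            mark_grey(c)


def scan(o):
    # Phase 2: a non-zero count means an external reference exists, so restore.
    if o.color == "grey":
        if o.rc > 0:
            scan_black(o)
        else:
            o.color = "white"
            for c in o.children:
                scan(c)


def scan_black(o):
    # Undo the trial decrements for everything reachable from a live object.
    o.color = "black"
    for c in o.children:
        c.rc += 1
        if c.color != "black":
            scan_black(c)


def collect_white(o, freed):
    # Phase 3: white objects were only reachable from themselves.
    if o.color == "white":
        o.color = "black"
        for c in o.children:
            collect_white(c, freed)
        freed.append(o.name)


def trial_delete(candidate):
    freed = []
    mark_grey(candidate)
    scan(candidate)
    collect_white(candidate, freed)
    return freed
```

A two-object cycle with no outside reference is reclaimed, while the same cycle with one external pointer into it has its counts restored and nothing is freed.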

5 Backup tracing
Trace all live objects and sweep the entire heap
Shortcomings:
– Increases pause times
– Concurrency for low pause times requires synchronization, e.g. write barrier
– Visits all objects, although some cannot be part of a cycle

6 MSCD: base algorithm
1. Add roots to mark queue
2. Mark until the mark queue is empty
   2.1. Pop from queue and process (mark, scan and add children to queue)
   2.2. Enqueue objects subject to races (fixup set)
3. Sweep
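The base algorithm can be sketched as a small loop. The heap representation (a dict from object name to child names) and the `take_fixup` callback are assumptions for illustration; in a stop-the-world run the fixup set is simply empty.

```python
from collections import deque


def mark_sweep(heap, roots, take_fixup=lambda: []):
    """heap: dict mapping object name -> list of child names.

    take_fixup is a hypothetical hook returning objects that may have been
    missed because of races with the mutator. Returns the set of swept names.
    """
    marked = set()
    queue = deque(roots)                      # step 1: add roots
    while True:
        while queue:                          # step 2.1: pop and process
            name = queue.popleft()
            if name in marked:
                continue
            marked.add(name)                  # mark
            queue.extend(heap[name])          # scan, enqueue children
        fixup = [n for n in take_fixup() if n not in marked]
        if not fixup:                         # nothing left to repair
            break
        queue.extend(fixup)                   # step 2.2: enqueue fixup set
    garbage = set(heap) - marked              # step 3: sweep
    for name in garbage:
        del heap[name]
    return garbage
```

Objects in the fixup set are treated conservatively as live, so a non-empty fixup set can only retain more objects, never free a reachable one.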

7 MSCD: concurrency
Builds on top of coalescing RC with a snapshot-at-the-beginning write barrier:
Atomic state update to process each object only once
1) Record all pre-mutation pointers for deferred RC decrements
2) Record object as mutated
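A minimal sketch of such a barrier, assuming a per-object `logged` flag and two plain lists as buffers (all names here are hypothetical, and the flag update is not actually atomic in this single-threaded model):

```python
class Obj:
    def __init__(self, name, **fields):
        self.name = name
        self.fields = dict(fields)   # pointer fields: field name -> Obj or None
        self.logged = False          # hypothetical "already recorded" bit


class CoalescingBarrier:
    def __init__(self):
        self.dec_buffer = []   # pre-mutation referents, decremented later
        self.mod_buffer = []   # mutated objects, rescanned for increments later

    def write(self, obj, field, new_target):
        if not obj.logged:
            # Slow path, taken once per object per collection epoch. A real VM
            # would make this state change atomic so that only one thread
            # records the object.
            obj.logged = True
            self.dec_buffer.extend(
                t for t in obj.fields.values() if t is not None
            )
            self.mod_buffer.append(obj)
        obj.fields[field] = new_target
```

Writing the same field twice records only the original referent; the intermediate value never reaches the buffers, which is the coalescing effect. The recorded old referents also give the collector the snapshot it needs.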

8 MSCD: concurrency
Black: marked and scanned
Grey: marked, not yet scanned
White: not yet visited
[Figure: three example scenarios; in each one, C is never visited and is incorrectly collected]
Necessary conditions for a race:
– Create a pointer from a black object to a white object C
– Destroy the last path from a grey object to that white object C
RC(C): 1 → 2 → 1
RC(E): 2 → 1

9 MSCD: concurrency
Key insight: how to reduce the size of the fixup set?
Use the set of objects whose RC was decremented to a non-zero value
– These decrements are a necessary condition for cyclic garbage
– These decrements are uncommon
– Easy to identify while processing the decrement buffer (after increments)
– Robust to coalescing of reference counts
– These are the purple objects, or candidates for trial deletion (Bacon & Rajan)
– It’s enough to compute this set at tracing time
– Trade-offs?
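Identifying these objects while draining the decrement buffer might look like the sketch below. It assumes increments have already been applied and, for brevity, does not recursively decrement the children of freed objects as a real reference counter would; the function name and data layout are illustrative.

```python
def process_decrements(rc, dec_buffer):
    """rc: dict name -> reference count; dec_buffer: names to decrement.

    Returns (freed, purple): objects whose count reached zero, and objects
    decremented to a non-zero value (the fixup / purple candidates).
    """
    freed, purple = [], set()
    for name in dec_buffer:
        rc[name] -= 1
        if rc[name] == 0:
            freed.append(name)
            purple.discard(name)   # reclaimed directly, no longer a candidate
        else:
            purple.add(name)       # still referenced: possible cycle member
    return freed, purple
```

An object that is first decremented to a non-zero count and later reaches zero drops out of the candidate set, since plain reference counting reclaims it.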

10 MSCD: marking
Statically determine acyclic classes:
– No pointer fields, or
– Can point only to acyclic classes
Set green bit in header of acyclic objects at allocation time
Ignore green objects for the fixup set (step 2.2 of base algorithm?)
– Why only step 2.2? How about step 2.1?
– The sweep phase also has to consider green objects as marked
How about green objects pointed to only by non-green objects in a cycle?
Trade-offs?
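The static test for acyclic classes is a least fixed point over the class graph. The sketch below assumes a table mapping each class to the set of classes its pointer fields may reference (subtypes included); the table format and class names are made up for illustration.

```python
def acyclic_classes(field_types):
    """field_types: class name -> set of class names its pointer fields may hold.

    A class is green if it has no pointer fields, or if every class it can
    point to is already known to be green.
    """
    green = set()
    changed = True
    while changed:
        changed = False
        for cls, targets in field_types.items():
            if cls not in green and targets <= green:
                green.add(cls)
                changed = True
    return green


classes = {
    "Int": set(),               # no pointer fields
    "Point": {"Int"},           # points only to acyclic classes
    "Pair": {"Point", "Int"},
    "Node": {"Node", "Int"},    # self-referential: may form cycles
    "Tree": {"Node"},           # reaches a possibly cyclic class
}
```

Here `acyclic_classes(classes)` classifies `Int`, `Point` and `Pair` as green; `Node` can reference itself and `Tree` can reach `Node`, so neither qualifies.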

11 MSCD: sweeping
Sweep only potentially cyclic objects and their children
Start with all purple objects
Trade-offs?
– Much cheaper than scanning the heap
– Requires keeping the set of all purple objects identified since the last cycle detection, not only those found during tracing
  – Space overhead
  – Time overhead of filtering RC-collected objects out of the purple set
  – Overhead increases with time between cycle detections!
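A sketch of this restricted sweep, assuming the mark phase has finished and `marked` holds every reachable object. The dict-based heap and the function name are illustrative assumptions.

```python
def sweep_from_purple(heap, marked, purple):
    """heap: dict name -> list of child names; marked: names proven live;
    purple: candidates accumulated since the last cycle detection.

    Frees unmarked objects reachable from the purple set and returns them.
    """
    freed = set()
    # Filter out candidates that reference counting has already reclaimed.
    stack = [p for p in purple if p in heap]
    while stack:
        name = stack.pop()
        if name in marked or name in freed:
            continue
        freed.add(name)
        stack.extend(heap[name])   # children of garbage may be garbage too
    for name in freed:
        del heap[name]
    return freed
```

Only the neighbourhood of the purple set is visited, so the cost scales with the number of candidates rather than with heap size, at the price of maintaining and filtering that set between detections.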

12 MSCD: implementation
Interaction with the reference counter
– Establish roots atomically
– Add complete fixup set to mark queue
– RC must not free objects pointed to by MSCD (mark queue and fixup queue): free buffer
Invocation heuristics
– When RC is unable to free enough memory (?)
– Heap fullness threshold
– Size of the purple set
– Can do trial deletion or backup tracing instead of MSCD

13 MSCD: possible timing
[Timing diagram. Labels: Mutator, RC, Roots; MSCD: marking, Fixup, New (grey), marking, Fixup, Final marking, Sweeping]

14 Methodology and Results
Jikes RVM 2.3.4+CVS, MMTk
DaCapo beta050224, SPECjvm98 and pseudojbb
Stop-the-world (i.e. limit) throughput:
– Trial deletion is about 70% worse than Backup MS, while MSCD is about 20% better than Backup MS.
– MSCD visits only 12% fewer nodes: green objects on the fringe still have to be visited, and green objects are short lived (many allocated, fewer on the heap at a given time)
– MSCD has about 7% cheaper cost per visited node: green objects not scanned, sweep optimization

15 More Results
Concurrent throughput:
– Bug in base and MSCD running on SMT (why not CMP?)
– Time-slicing (i.e. single-context uniprocessor): no benefit from the concurrency optimization → fixup set is too small
Overall performance (stop-the-world CD triggered by insufficient reclamation by RC):
– MSCD with mark opt. is better than MSCD with both mark and sweep opt., due to the overhead of maintaining the purple set
– Overhead of grey bit and green bit
– Heuristics to trigger CD matter, especially on tight heaps
– Generations (e.g. ulterior RC) could reduce cycle detection load

16 Discussion
Main ideas: reduce the cost of backup MS by:
– stopping the mark at the green-object frontier,
– starting the sweep from purple objects,
– reusing the concurrency mechanism from coalescing RC
Figure 6 shows about 50% of the total time is GC+CD (!)
Baseline is non-generational deferred/coalescing RC.
Why not test concurrency on CMP in addition to/instead of SMT?
Synchronization is still required in the write barrier, although they claim the guard can be removed (?)

17 Open questions
Invocation heuristics (trade-offs?)
– When running out of heap
– At some heap occupancy threshold
– Some form of estimating that there is enough cyclic garbage to trigger CD?
– Hints from programmer/compiler?
Can we do better with CMPs?

18 Questions for the authors
Old version of Jikes RVM. Why? Does it matter?
For xalan and compress, green% + cycle% > 100%
Table 2 and Figure 5 don’t agree

