Efficient Concurrent Mark-Sweep Cycle Collection Daniel Frampton, Stephen Blackburn, Luke Quinane and John Zigman (Pending submission) Presented by Jose.

Presentation transcript:

Efficient Concurrent Mark-Sweep Cycle Collection Daniel Frampton, Stephen Blackburn, Luke Quinane and John Zigman (Pending submission) Presented by Jose Joao CS395T - Mar 23, 2009

Outline
Motivation
  – Backup tracing
  – Trial deletion
Mark-Sweep Cycle Detection (MSCD)
Results
  – What worked and what didn’t
Discussion

Motivation
Reference counting can directly (i.e. locally) identify garbage
  – Low pause times
  – Reasonable throughput (deferred, coalescing, ulterior)
  – But it cannot reclaim circular garbage
Existing general solutions are expensive:
  – Trace the whole heap (backup tracing)
  – Temporarily delete an object and see if the cycle collapses (trial deletion)

Trial deletion
A partial mark-sweep (no roots required): find objects that are alive only because they are reachable from themselves
Three phases:
  – Assume the candidate object is dead, then mark and decrement its children recursively.
  – Trace again from the candidate object, marking and incrementing wherever some RC is not zero, i.e. wherever the object is externally reachable.
  – Sweep objects with a zero count.
Bacon and Rajan: process candidates en masse, avoid acyclic objects, concurrent algorithm
Usually less efficient than concurrent tracing
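The three phases above can be sketched in a few lines of Python. This is a minimal, single-threaded illustration in the spirit of the Bacon and Rajan scheme the slide mentions, not their actual implementation; the `Obj` class, the `link` helper and the colour names used as strings are assumptions made for the example.

```python
class Obj:
    """Hypothetical heap object with a reference count and outgoing pointers."""
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.children = []
        self.color = "black"

def link(src, dst):
    """Create a pointer src -> dst and count it."""
    src.children.append(dst)
    dst.rc += 1

def mark_grey(o):
    # Phase 1: assume o is dead and remove the counts due to internal pointers.
    if o.color != "grey":
        o.color = "grey"
        for c in o.children:
            c.rc -= 1
            mark_grey(c)

def scan_black(o):
    # Restore the counts of everything reachable from an externally held object.
    o.color = "black"
    for c in o.children:
        c.rc += 1
        if c.color != "black":
            scan_black(c)

def scan(o):
    # Phase 2: a non-zero count means some pointer from outside still exists.
    if o.color == "grey":
        if o.rc > 0:
            scan_black(o)
        else:
            o.color = "white"
            for c in o.children:
                scan(c)

def collect_white(o, freed):
    # Phase 3: sweep the objects whose count stayed at zero.
    if o.color == "white":
        o.color = "black"  # prevents revisiting
        for c in o.children:
            collect_white(c, freed)
        freed.append(o.name)

def trial_delete(candidate):
    """Return the names of the objects found to be cyclic garbage."""
    mark_grey(candidate)
    scan(candidate)
    freed = []
    collect_white(candidate, freed)
    return freed
```

If the candidate turns out to be externally reachable, phase 2 restores every count that phase 1 removed, so a failed trial leaves the heap unchanged. The recursion is kept for readability; a real collector would use an explicit work list.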

Backup tracing
Trace all live objects and sweep the entire heap
Shortcomings:
  – Increases pause times
  – Concurrency for low pause times requires synchronization, e.g. write barrier
  – Visits all objects, although some cannot be part of a cycle

MSCD: base algorithm
1. Add roots to mark queue
2. Mark until the mark queue is empty
   2.1. Pop from queue and process (mark, scan and add children to queue)
   2.2. Enqueue objects subject to races (fixup set)
3. Sweep
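A minimal Python sketch of these steps follows. It assumes the heap is a dictionary mapping each object name to the names it points to, and that the fixup set is handed in as a plain list; both representations, and the function name `mark_sweep`, are assumptions for illustration rather than the authors' data structures.

```python
from collections import deque

def mark_sweep(heap, roots, fixup=None):
    """Return the set of unmarked (garbage) objects.

    heap  -- dict: object name -> list of child names (illustrative layout)
    roots -- iterable of root object names
    fixup -- objects that may have been missed because of races
    """
    fixup = list(fixup or [])
    marked = set()
    queue = deque(roots)                 # step 1: add roots to the mark queue
    while queue or fixup:                # step 2: mark until nothing is pending
        while queue:
            obj = queue.popleft()        # step 2.1: pop and process
            if obj in marked:
                continue
            marked.add(obj)
            queue.extend(heap[obj])      # scan: enqueue the children
        while fixup:                     # step 2.2: enqueue the fixup set
            queue.append(fixup.pop())
    return set(heap) - marked            # step 3: sweep whatever is unmarked
```

With an empty fixup set this reduces to an ordinary stop-the-world mark-sweep; the fixup list only matters when the mutator runs during marking.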

MSCD: concurrency
Builds on top of coalescing RC with a snapshot-at-the-beginning write barrier
Atomic state update to process each object only once:
  1) Record all pre-mutation pointers for deferred RC decrements
  2) Record object as mutated
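The barrier described above can be sketched as follows. This is a single-threaded Python illustration of a coalescing, snapshot-style barrier; the class names, the `logged` flag and the `mod_buffer` list are hypothetical, and the plain `if` stands in for the atomic state update that a concurrent implementation would need.

```python
class Obj:
    """Hypothetical object with pointer fields and a 'logged' state bit."""
    def __init__(self, name, n_fields):
        self.name = name
        self.fields = [None] * n_fields
        self.logged = False

class CoalescingBarrier:
    def __init__(self):
        self.mod_buffer = []  # (object, snapshot of its pre-mutation fields)

    def write(self, obj, index, new_target):
        # Only the first write since the last collection takes the slow path.
        # A real collector would make this test-and-set atomic.
        if not obj.logged:
            self.mod_buffer.append((obj, list(obj.fields)))  # 1) old pointers
            obj.logged = True                                # 2) mark mutated
        obj.fields[index] = new_target

    def process(self, rc):
        """Apply the deferred updates: increments first, then decrements."""
        for obj, old_fields in self.mod_buffer:
            for target in obj.fields:
                if target is not None:
                    rc[target.name] += 1
            for target in old_fields:
                if target is not None:
                    rc[target.name] -= 1
            obj.logged = False
        self.mod_buffer.clear()
```

Intermediate values written between two collections never touch any count, which is the coalescing effect: only the snapshot and the final field values matter.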

MSCD: concurrency
Black: marked and scanned
Grey: marked, not yet scanned
White: not yet visited
In each example, C is never visited and is incorrectly collected
Necessary conditions for a race:
  – Create a pointer from a black object to a white object C
  – Destroy the last path from a grey object to that white object C
RC(C): 1 → 2 → 1
RC(E): 2 → 1
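The race can be reproduced in a small Python simulation. The three-object heap, the object names A, B and C, and the helper `finish_marking` are assumptions chosen to mirror the two necessary conditions; this is a sketch of the problem, not the authors' code.

```python
def finish_marking(heap, black, grey, fixup=()):
    """Complete a partially done mark phase and return the marked set.

    heap  -- dict: object name -> list of child names, as seen after mutation
    black -- objects already marked and scanned
    grey  -- objects marked but not yet scanned
    fixup -- objects re-enqueued because a race may have hidden them
    """
    marked = set(black) | set(grey)
    work = list(grey)
    for obj in fixup:
        if obj not in marked:
            marked.add(obj)
            work.append(obj)
    while work:
        obj = work.pop()
        for child in heap[obj]:
            if child not in marked:
                marked.add(child)
                work.append(child)
    return marked

# Heap after the mutator acted: black A now points to white C,
# and grey B has lost its pointer to C (RC(C): 1 -> 2 -> 1).
heap_after = {"A": ["C"], "B": [], "C": []}

missed = finish_marking(heap_after, black={"A"}, grey={"B"})
rescued = finish_marking(heap_after, black={"A"}, grey={"B"}, fixup=["C"])
```

Without the fixup entry, `missed` lacks C even though A still references it; with C in the fixup set, `rescued` contains all three objects.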

MSCD: concurrency
Key insight: how to reduce the size of the fixup set?
Use the set of objects whose RC was decremented to a non-zero value
  – These decrements are a necessary condition for cyclic garbage
  – These decrements are uncommon
  – Easy to identify while processing the decrement buffer (after increments)
  – Robust to coalescing of reference counts
  – These are the purple objects, or candidates for trial deletion (Bacon & Rajan)
  – It’s enough to compute this set at tracing time
  – Trade-offs?
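Identifying these objects while draining the decrement buffer could look like the Python sketch below. The dictionaries `rc` and `children` and the function name `apply_decrements` are hypothetical; increments are assumed to have been applied already, as the slide requires.

```python
def apply_decrements(rc, children, decrements):
    """Process a decrement buffer and collect the purple candidates.

    rc         -- dict: object name -> reference count (updated in place)
    children   -- dict: object name -> list of child names
    decrements -- object names whose count must drop by one
    Returns (freed, purple).
    """
    work = list(decrements)
    freed, purple = [], set()
    while work:
        obj = work.pop()
        rc[obj] -= 1
        if rc[obj] == 0:
            # Ordinary reference counting reclaims the object directly.
            freed.append(obj)
            purple.discard(obj)
            work.extend(children.get(obj, []))  # release what it pointed to
        else:
            # Decremented to a non-zero value: a possible cycle member.
            purple.add(obj)
    return freed, purple
```

An object that is first decremented to a non-zero value and later reaches zero is dropped from the purple set again, since plain reference counting has already reclaimed it.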

MSCD: marking
Statically determine acyclic classes:
  – No pointer fields, or
  – Can point only to acyclic classes
Set green bit in header of acyclic objects at allocation time
Ignore green objects for the fixup set (step 2.2 of base algorithm?)
  – Why only step 2.2? How about step 2.1?
  – The sweep phase also has to consider green objects as marked
How about green objects pointed to only by non-green objects in a cycle?
Trade-offs?
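The static test for acyclic classes can be sketched as a depth-first walk over the declared field types. In the Python sketch below, the `field_types` table and the class names in the usage note are made up for illustration, and field types are assumed to be exact (subclassing and arrays are ignored), which a real virtual machine could not assume.

```python
def acyclic_classes(field_types):
    """Return the classes whose instances can never be part of a cycle.

    field_types -- dict: class name -> list of the class names of its
                   pointer fields (an empty list means no pointer fields)
    """
    state = {}  # class name -> True, False, or "visiting"

    def visit(cls):
        if cls in state:
            # "visiting" means we came back to a class on the current path,
            # so the type graph has a cycle through it: not acyclic.
            return state[cls] is True
        state[cls] = "visiting"
        ok = all(visit(t) for t in field_types.get(cls, []))
        state[cls] = ok
        return ok

    return {cls for cls in field_types if visit(cls)}
```

For a table such as `{"Integer": [], "Point": ["Integer", "Integer"], "Node": ["Node", "Integer"], "List": ["Node"]}`, only `Integer` and `Point` qualify; `List` is excluded because it can reach the self-referential `Node`.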

MSCD: sweeping
Sweep only potentially cyclic objects and their children
Start with all purple objects
Trade-offs?
  – Much cheaper than scanning the heap
  – Requires keeping the set of all purple objects identified since the last cycle detection, not only during tracing
      Space overhead
      Time overhead of filtering RC-collected objects out of the purple set
      Overhead increases with time between cycle detections!
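One possible reading of this sweep is sketched below in Python: start from the purple objects, reclaim those that the mark phase did not reach, and follow their children. The dictionary heap layout and the function name `sweep_from_purple` are assumptions, and the slide does not spell out the exact traversal, so this is an interpretation rather than the authors' procedure.

```python
def sweep_from_purple(heap, marked, purple):
    """Return the set of objects reclaimed by a purple-rooted sweep.

    heap   -- dict: object name -> list of child names
    marked -- objects reached by the mark phase (live)
    purple -- candidates recorded since the last cycle detection
    """
    freed = set()
    # Purple objects already reclaimed by reference counting are filtered out.
    work = [obj for obj in purple if obj in heap]
    while work:
        obj = work.pop()
        if obj in marked or obj in freed:
            continue
        freed.add(obj)           # unmarked after a full mark phase: garbage
        work.extend(heap[obj])   # its children may be garbage as well
    return freed
```

The sweep never touches objects that are neither purple nor reachable from an unmarked purple object, which is where the saving over a full heap scan would come from.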

MSCD: implementation
Interaction with the reference counter
  – Establish roots atomically
  – Add complete fixup set to mark queue
  – RC must not free objects pointed to by MSCD (mark queue and fixup queue): free buffer
Invocation heuristics
  – When RC is unable to free enough memory (?)
  – Heap fullness threshold
  – Size of the purple set
  – Can do trial deletion or backup tracing instead of MSCD
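The invocation heuristics listed above could be combined into a single trigger, as in the Python sketch below. The function name, its parameters and both default thresholds are invented for illustration; the slides give no concrete values.

```python
def should_run_cycle_detection(freed_bytes, requested_bytes,
                               heap_used, heap_size, purple_count,
                               fullness_threshold=0.9, purple_limit=10_000):
    """Decide whether to start a cycle detection pass.

    All thresholds are hypothetical defaults, not values from the talk.
    """
    if freed_bytes < requested_bytes:
        return True   # RC alone could not free enough memory
    if heap_used / heap_size >= fullness_threshold:
        return True   # heap fullness threshold reached
    return purple_count >= purple_limit  # too many cycle candidates pending
```

Any one condition is enough to trigger a pass here; weighting or combining them differently is a design choice that would need measurement on real workloads.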

MSCD: possible timing
[Timing diagram not reproduced. Labels: Mutator, RC, Roots, MSCD marking, Fixup, New (grey), Final marking, Sweeping]

Methodology and Results
Jikes RVM CVS, MMTk
Dacapo beta050224, SPECjvm98 and pseudojbb
Stop-the-world (i.e. limit) throughput:
  – Trial deletion is about 70% worse than Backup MS, while MSCD is about 20% better than Backup MS.
  – MSCD visits only 12% fewer nodes: green objects on the fringe still have to be visited, and green objects are short-lived (many allocated, fewer on the heap at a given time)
  – MSCD has about 7% cheaper cost per visited node: green objects not scanned, sweep optimization

More Results
Concurrent throughput:
  – Bug in base and MSCD running on SMT (why not CMP?)
  – Time-slicing (i.e. single-context uniprocessor): no benefit from concurrency optimization → fixup is too small
Overall performance (stop-the-world CD triggered by insufficient reclamation by RC):
  – MSCD with mark opt. is better than MSCD with both mark and sweep opt. due to overhead of maintaining the purple set
  – Overhead of grey bit and green bit
  – Heuristics to trigger CD matter, especially on tight heaps
  – Generations (e.g. ulterior RC) could reduce cycle detection load

Discussion
Main ideas: reduce the cost of backup MS by:
  – stopping mark at the green-object frontier,
  – starting sweep from purple objects,
  – reusing the concurrency mechanism from coalescing RC
Figure 6 shows about 50% of the total time is GC+CD (!)
Baseline is non-generational deferred/coalescing RC.
Why not test concurrency on CMP in addition to/instead of SMT?
Synchronization is still required in the write barrier, although they claim the guard can be removed (?)

Open questions
Invocation heuristics (trade-offs?)
  – When running out of heap
  – At some heap occupancy threshold
  – Some form of estimating that there is enough cyclic garbage to trigger CD?
  – Hints from programmer/compiler?
Can we do better with CMPs?

Questions for the authors
Old version of Jikes RVM. Why? Does it matter?
For xalan and compress, green% + cycle% > 100%
Table 2 and Figure 5 don’t agree