Concurrent Mark-Sweep Presented by Eyal Dushkin GC Seminar, Tel-Aviv University 30.12.14.


Outline of talk  Introduction  Tricolour Invariants (Reminder)  Mostly Concurrent Mark & Sweep  Concurrent Marking and Sweeping  On-the-fly marking  Abstract Concurrent Collection  Summary

Introduction
- During the course we have seen several kinds of garbage collection algorithms, in particular stop-the-world (STW), incremental and parallel collectors.
- Each has its own trade-offs with respect to:
  - Throughput
  - Pause time
  - Space overhead
  - Scalability and portability
  - And more …
- This lecture presents the concurrent mark-sweep paradigm.

Incremental Collection (Roman Kecher, Lec. 09)
- First, on a uniprocessor. [Figure: timeline, axis "Time"]
- On a multiprocessor. [Figure]
- Can also be parallelized. [Figure]

Mostly Concurrent Collection (Roman Kecher, Lec. 09)
- [Figure]
- Can also be incremental. [Figure]

Concurrent Collection (Roman Kecher, Lec. 09)
- [Figure]
- Can also be incremental. [Figure]

Our Focus

Tricolour Abstraction – Reminder (Pavel, Lec. 02)
- A convenient way to describe object states:
  - Initially, every node is white.
  - When a node is first encountered during tracing, it is coloured grey.
  - When it has been scanned and its children identified, it is coloured black.
- When tracing finishes there are no grey objects and no references from black to white objects.
- All remaining white objects are therefore unreachable, i.e. garbage.
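
The colouring rules above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the heap is a dictionary from object names to lists of children, and the names `Colour`, `trace`, `heap` and `garbage` are hypothetical.

```python
from enum import Enum


class Colour(Enum):
    WHITE = 0   # not yet encountered
    GREY = 1    # encountered, children not yet scanned
    BLACK = 2   # scanned, children identified


def trace(heap, roots):
    """Stop-the-world tricolour trace over heap: name -> list of child names."""
    colour = {obj: Colour.WHITE for obj in heap}
    worklist = []                      # holds exactly the grey objects
    for root in roots:
        if colour[root] is Colour.WHITE:
            colour[root] = Colour.GREY
            worklist.append(root)
    while worklist:
        obj = worklist.pop()
        for child in heap[obj]:
            if colour[child] is Colour.WHITE:
                colour[child] = Colour.GREY
                worklist.append(child)
        colour[obj] = Colour.BLACK     # all children identified
    return colour


# D refers to C but is itself unreachable from the root A.
heap = {"A": ["B"], "B": ["C"], "C": [], "D": ["C"]}
colours = trace(heap, ["A"])
garbage = sorted(obj for obj, c in colours.items() if c is Colour.WHITE)
print(garbage)  # ['D']
```

When the loop ends the worklist is empty, so no grey objects remain and every object still white is garbage.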

Grey Wavefront (Roman Kecher, Lec. 09) [Figure sequence over nine slides: the grey wavefront in an object graph traced from Roots]

The Lost Object Problem  Direct and transitive.

Lost Object – Direct (Roman Kecher, Lec. 09)
[Figure sequence: Roots, objects X, Y, Z; field a of Y refers to Z; field b belongs to X]
- D1 (insertion): Write(X, b, Read(Y, a))
- D2 (deletion): Write(Y, a, null)
- D3: scan(Y)
- Result: X has already been scanned, and Y no longer refers to Z by the time it is scanned, so Z is never marked although it is reachable through X.b.
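
The direct scenario (D1 to D3) can be replayed in a small Python sketch. It assumes that X is already black and Y is still grey when D1 happens. The optional barrier shown is a Dijkstra-style insertion barrier, brought in here only to show one way the loss can be prevented; it is not taken from these slides, and the names `shade` and `z_survives` are hypothetical.

```python
def shade(ref, marked, worklist):
    """Mark ref and queue it for scanning if it has not been seen yet."""
    if ref is not None and ref not in marked:
        marked.add(ref)
        worklist.append(ref)


def z_survives(insertion_barrier):
    # Assumed starting state: X is black (marked, scanned), Y is grey
    # (marked, still on the worklist), Z is white and reachable only via Y.a.
    heap = {"X": {"b": None}, "Y": {"a": "Z"}, "Z": {}}
    marked = {"X", "Y"}
    worklist = ["Y"]

    def write(src, fld, ref):
        if insertion_barrier:          # assumption: shade the inserted target
            shade(ref, marked, worklist)
        heap[src][fld] = ref

    write("X", "b", heap["Y"]["a"])    # D1: insertion, X.b now refers to Z
    write("Y", "a", None)              # D2: deletion, Y.a is cleared
    while worklist:                    # D3: the collector scans what is grey
        obj = worklist.pop()
        for ref in heap[obj].values():
            shade(ref, marked, worklist)
    return "Z" in marked


print(z_survives(insertion_barrier=False))  # False: Z is lost
print(z_survives(insertion_barrier=True))   # True: Z was shaded at D1
```

Without the barrier the collector would free Z even though the mutator can still reach it through X.b.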

Lost Object – Transitive (Roman Kecher, Lec. 09)
[Figure sequence: Roots, objects P, Q, R, S; field c of Q refers to R; field d of R refers to S; field e belongs to P]
- T1 (insertion): Write(P, e, Read(R, d))
- T2 (deletion): Write(Q, c, null)
- T3: scan(Q)
- Result: P has already been scanned and the only path from a grey object to S (Q → R → S) has been cut, so S is never marked although it is reachable through P.e.

Tricolour Invariants – Reminder
- The strong tricolour invariant: there are no pointers from black objects to white objects.
- The weak tricolour invariant: all white objects pointed to by a black object are reachable from some grey object through a chain of white objects.
[Figure: root, objects X, Y, Z]
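
Both invariants can be written as small predicates over a coloured object graph. The sketch below is illustrative only: the graph encoding (a dictionary of child lists plus a colour dictionary) and the function names are assumptions, and the example colours X black, Y grey and Z white to mirror the lost-object scenario.

```python
def strong_violations(heap, colour):
    """All (black, white) edges; the strong invariant holds iff this is empty."""
    return [(src, ref) for src, refs in heap.items() for ref in refs
            if colour[src] == "black" and colour[ref] == "white"]


def grey_protected(heap, colour):
    """White objects reachable from some grey object through white objects only."""
    frontier = [ref for src, refs in heap.items() if colour[src] == "grey"
                for ref in refs if colour[ref] == "white"]
    protected = set()
    while frontier:
        obj = frontier.pop()
        if obj not in protected:
            protected.add(obj)
            frontier.extend(r for r in heap[obj] if colour[r] == "white")
    return protected


def weak_holds(heap, colour):
    """Every white target of a black object must be grey-protected."""
    protected = grey_protected(heap, colour)
    return all(ref in protected for _, ref in strong_violations(heap, colour))


colour = {"X": "black", "Y": "grey", "Z": "white"}
before = {"X": ["Z"], "Y": ["Z"], "Z": []}   # black X -> white Z, grey Y -> Z
after = {"X": ["Z"], "Y": [], "Z": []}       # the edge Y -> Z has been deleted

print(strong_violations(before, colour))  # [('X', 'Z')]
print(weak_holds(before, colour))         # True
print(weak_holds(after, colour))          # False
```

The first graph breaks the strong invariant but still satisfies the weak one; deleting the grey object's edge is what makes Z unrecoverable.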

Barriers for Grey Mutator Weak or Strong?

Barriers for Black Mutator Weak or Strong?

Barriers – to be continued …

Introduction Tricolour Invariants (Reminder)  Mostly Concurrent Mark & Sweep  Concurrent Marking and Sweeping  On-the-fly marking  Abstract Concurrent Collection  Summary

Mostly Concurrent Mark & Sweep Initialisation:  Triggering is a critical decision.  Once a collection cycle has started, we would like to:  Minimize its impact on mutator throughput (efficiency).  Complete the cycle before mutator exhausts memory (pause time).

Mostly Concurrent Mark & Sweep
Mutator allocation:

    New():
        collectEnough()
        ref ← allocate()
        if ref = null
            error "out of memory"
        return ref

    atomic collectEnough():
        while behind()
            if not markSome()
                return

Mostly Concurrent Mark & Sweep
Let's dive deep into the markSome method:

    markSome():
        if isEmpty(worklist)
            scan(Roots)
            if isEmpty(worklist)
                sweep()
                return false
        ref ← remove(worklist)
        scan(ref)
        return true

Mostly Concurrent Mark & Sweep
Additional methods:

    shade(ref):
        if not isMarked(ref)
            setMarked(ref)
            add(worklist, ref)

    scan(ref):
        for each fld in Pointers(ref)
            child ← *fld
            if child ≠ null
                shade(child)

Mostly Concurrent Mark & Sweep
Additional methods:

    revert(ref):
        add(worklist, ref)

    isWhite(ref):
        return not isMarked(ref)

    isGrey(ref):
        return ref in worklist

    isBlack(ref):
        return isMarked(ref) && not isGrey(ref)
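
The pseudocode on these slides can be turned into a small runnable Python sketch. Several things here are assumptions made for illustration rather than part of the talk: the heap is a dictionary of object ids, `behind()` uses a deliberately naive policy (the collector is "behind" once the heap is full), and `sweep()` is filled in as a plain mark-bit sweep.

```python
class MostlyConcurrentHeap:
    """Toy heap: object ids map to lists of referenced ids."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.fields = {}        # object id -> list of child ids
        self.roots = []         # mutator roots
        self.marked = set()     # mark bits
        self.worklist = []      # grey objects
        self._next_id = 0

    def behind(self):
        # Hypothetical trigger policy: collect only when the heap is full.
        return len(self.fields) >= self.capacity

    def allocate(self):
        if len(self.fields) >= self.capacity:
            return None
        ref = self._next_id
        self._next_id += 1
        self.fields[ref] = []
        return ref

    def new(self):
        self.collect_enough()
        ref = self.allocate()
        if ref is None:
            raise MemoryError("out of memory")
        return ref

    def collect_enough(self):
        while self.behind():
            if not self.mark_some():
                return

    def mark_some(self):
        if not self.worklist:
            for ref in self.roots:     # scan(Roots), also the final rescan
                self.shade(ref)
            if not self.worklist:      # nothing grey left: marking is done
                self.sweep()
                return False
        ref = self.worklist.pop()
        self.scan(ref)
        return True

    def shade(self, ref):
        if ref not in self.marked:
            self.marked.add(ref)
            self.worklist.append(ref)

    def scan(self, ref):
        for child in self.fields[ref]:
            if child is not None:
                self.shade(child)

    def sweep(self):
        for ref in list(self.fields):
            if ref not in self.marked:
                del self.fields[ref]
        self.marked.clear()            # every survivor is white again


h = MostlyConcurrentHeap(capacity=3)
a, b, c = h.new(), h.new(), h.new()
h.roots = [a]
h.fields[a].append(b)    # a -> b, while c stays unreachable
d = h.new()              # the heap is full, so this call drives a full cycle
print(sorted(h.fields))  # [0, 1, 3]
```

With this trigger policy each cycle runs to completion inside one allocation; a more realistic `behind()` would spread the `markSome()` steps over many allocations.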

Mostly Concurrent Mark & Sweep
Recall: black and grey mutators.

    markSome():
        if isEmpty(worklist)
            scan(Roots)
            if isEmpty(worklist)
                sweep()
                return false
        ref ← remove(worklist)
        scan(ref)
        return true

Mostly Concurrent Mark & Sweep Termination:  Depends on the mutator’s colour:  If black, no need to rescan.  If grey, need to rescan and sync with mutator’s thread.

Mostly Concurrent Mark & Sweep
Allocation:
- The colour of a new object depends on the mutator's colour.
- If the mutator is black, new objects cannot be allocated white under the strong invariant.
  - What about the weak invariant? Should we allocate grey?
- If the mutator is grey, the choice is non-trivial:
  - White – might affect pause time and throughput.
  - Black – increases the amount of floating garbage.
  - Yellow – a hybrid method (Vechev et al. [2006]).

Introduction Tricolour Invariants (Reminder) Mostly Concurrent Mark & Sweep  Concurrent Marking and Sweeping  On-the-fly marking  Abstract Concurrent Collection  Summary

Concurrent marking and sweeping
- So far, marking and sweeping have proceeded in series.
- This, of course, is not always the case.
- Lazy sweeping?

Recap: Lazy Sweeping (Pavel, Lec. 02)

    atomic collect():
        markFromRoots()
        for each block in Blocks
            if not isMarked(block)
                add(blockAllocator, block)
            else
                add(reclaimList, block)

Recap: Lazy Sweeping (Pavel, Lec. 02)

    lazySweep(sz):
        repeat
            block ← nextBlock(reclaimList, sz)
            if block ≠ null
                sweep(start(block), end(block))
                if spaceFound(block)
                    return
        until block = null
        allocSlow(sz)

    allocSlow(sz):
        block ← allocateBlock()
        if block ≠ null
            initialise(block, sz)
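
A rough Python sketch of the lazy sweeping recap follows. It simplifies in ways that are assumptions, not part of the slides: there is a single size class (so the `sz` parameter is dropped), `collect()` receives the set of live cells instead of calling `markFromRoots()`, and the names `Block` and `LazySweepHeap` are hypothetical.

```python
class Block:
    def __init__(self, ncells):
        self.cells = [None] * ncells    # payloads, None means free
        self.marks = [False] * ncells   # one mark bit per cell


class LazySweepHeap:
    def __init__(self, nblocks, ncells):
        self.block_allocator = [Block(ncells) for _ in range(nblocks)]
        self.in_use = []          # blocks handed out by the block allocator
        self.reclaim_list = []    # marked blocks waiting to be swept lazily
        self.free_cells = []      # (block, index) pairs ready for allocation

    def collect(self, live):
        """live stands in for markFromRoots(): the reachable (block, index) cells."""
        self.free_cells.clear()
        self.reclaim_list.clear()
        still_in_use = []
        for block in self.in_use:
            block.marks = [(block, i) in live for i in range(len(block.cells))]
            if not any(block.marks):          # nothing marked in this block
                block.cells = [None] * len(block.cells)
                self.block_allocator.append(block)
            else:                             # queue the block for lazy sweeping
                self.reclaim_list.append(block)
                still_in_use.append(block)
        self.in_use = still_in_use

    def allocate(self, payload):
        if not self.free_cells:
            self.lazy_sweep()
        if not self.free_cells:
            return None
        block, i = self.free_cells.pop()
        block.cells[i] = payload
        return block, i

    def lazy_sweep(self):
        while self.reclaim_list:
            block = self.reclaim_list.pop()
            if self.sweep(block):             # space found: stop sweeping
                return
        self.alloc_slow()

    def sweep(self, block):
        found = False
        for i, marked in enumerate(block.marks):
            if not marked:
                block.cells[i] = None
                self.free_cells.append((block, i))
                found = True
            block.marks[i] = False
        return found

    def alloc_slow(self):
        if self.block_allocator:
            block = self.block_allocator.pop()
            self.in_use.append(block)
            self.free_cells.extend((block, i) for i in range(len(block.cells)))


heap = LazySweepHeap(nblocks=2, ncells=2)
a, b, c, d = (heap.allocate(x) for x in "abcd")
full = heap.allocate("e")        # both blocks are full and nothing is swept yet
heap.collect(live={a})           # only the cell holding "a" is reachable
e = heap.allocate("e")           # sweeps a's block on demand, reuses b's cell
```

The sweep work is paid for by the allocator only when it actually needs space, and a block with no marked cells goes back to the block allocator without being swept at all.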

Concurrent marking and sweeping
- We may also sweep concurrently, reclaiming the garbage identified by the previous marking phase.
- This can happen even while the next GC cycle is already running …
- How can the marker and the sweeper distinguish between the different kinds of white nodes?

Concurrent marking and sweeping
- Example: [Figure sequence: Roots, block]
- The GC cycle has ended … now what?

Concurrent marking and sweeping
How can the marker and the sweeper distinguish between the different kinds of white nodes?
- We could use a new colour: purple (Lamport [1976]).
- But this means we need to:
  1. Wait until all markers and sweepers have finished.
  2. Change colours: white to purple, black back to white.
  3. Shade all roots.
  4. Start the markers and sweepers again.

Concurrent marking and sweeping
- A caveat: changing the colours of all objects (white to purple, black back to white) takes time.
- Elegant solution (Lamport): don't change the colours, change the perspective, i.e. reinterpret what each stored colour value means.

Lamport Solution [Figure sequence over nine slides: roots, objects X, Y, Z, W]
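
One way to picture "change perspective" is to store colours as small integers and interpret them relative to a global base, so that bumping the base recolours the whole heap at once. The sketch below rests on that assumption; the class name `ColourWheel`, the modulo-3 encoding, and keeping grey outside the header (for example on a worklist) are illustrative choices, not details taken from the slides.

```python
class ColourWheel:
    """Interprets stored colour values 0, 1, 2 relative to a rotating base."""

    def __init__(self):
        self.base = 0

    def white(self):
        return self.base

    def black(self):
        return (self.base + 1) % 3

    def purple(self):
        return (self.base + 2) % 3

    def name(self, value):
        names = {self.white(): "white", self.black(): "black",
                 self.purple(): "purple"}
        return names[value]

    def flip(self):
        # Old black now reads as white and old white now reads as purple.
        # No object header is touched.
        self.base = (self.base + 1) % 3


wheel = ColourWheel()
# State at the end of a marking phase: X and Y are live, Z and W are garbage.
header = {"X": wheel.black(), "Y": wheel.black(),
          "Z": wheel.white(), "W": wheel.white()}
before = {obj: wheel.name(value) for obj, value in header.items()}

wheel.flip()
after = {obj: wheel.name(value) for obj, value in header.items()}
print(after)  # {'X': 'white', 'Y': 'white', 'Z': 'purple', 'W': 'purple'}
```

In this encoding the flip is only safe once the sweepers have reclaimed every purple object, because a leftover purple value would read as black afterwards. That matches step 1 of the procedure (wait until all markers and sweepers have finished).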

Introduction Tricolour Invariants (Reminder) Mostly Concurrent Mark & Sweep Concurrent Marking and Sweeping  On-the-fly marking  Abstract Concurrent Collection  Summary

On-the-fly marking (Heuristic)
- So far we have assumed that the mutator threads are all stopped at once.
- On-the-fly collection never stops all mutator threads at once.
- Instead, the roots of each mutator thread are sampled separately, while the other mutator threads keep running.
- But it's not as simple as it seems (if it seems so) …

On-the-fly marking (Heuristic)
- Mostly-concurrent collectors usually use a black mutator with a deletion barrier.
- Without a STW phase, some mutators' roots are neither black nor grey: their stacks have not been scanned yet.
- As a result, we have black and white thread stacks at the same time.
[Figure: Thread Stack 1 with x, Thread Stack 2 with Y]

Barriers for Black Mutator (again) Weak or Strong? Merely Read Barrier?

On-the-fly marking (Heuristic)
- Barriers are not applied to stack operations; in particular, the deletion barrier is not applied there.
- One can use barriers on stack operations (see Lecture no. 6 by Oleg Dobkin).
- Example: Write(X, b, Y)
[Figure sequence: Thread Stack 1 with x, Thread Stack 2 with Y, field b]

Doligez-Leroy-Gonthier [1993]
- Separate out the data that is allocated by a single thread and not shared with other threads.
- In addition, there is a global heap that allows objects to be shared among threads.
- Private objects may be moved to the global heap if needed.

On-the-fly marking (Heuristic)
This solves the first problem: [Figure sequence: Thread Stack 1 with x, Thread Stack 2 with Y, field b]

Introduction Tricolour Invariants (Reminder) Mostly Concurrent Mark & Sweep Concurrent Marking and Sweeping On-the-fly marking  Abstract Concurrent Collection  Summary

Abstract Concurrent Collection
- We present an abstract framework for concurrent garbage collection [Vechev et al., 2005, 2006, 2007].
- The correctness of a concurrent collector relies on cooperation between the collector and the mutator.
- It is therefore reasonable to introduce a shared event log to which both append their events and through which they synchronize.

Abstract Concurrent Collection - Preliminaries [Figure sequence: src, fld, old, new]

Abstract Concurrent Collection - CMS

Abstract Tracing Algorithm – Reminder (Yarden, Lec. 04)

    atomic collectTracing():
        rootsTracing(W)
        scanTracing(W)
        sweepTracing()

    rootsTracing(R):
        for each fld in Roots
            ref ← *fld
            if ref ≠ null
                R ← R + [ref]

    scanTracing(W):
        while not isEmpty(W)
            src ← remove(W)
            ρ(src) ← ρ(src) + 1
            if ρ(src) = 1
                for each fld in Pointers(src)
                    ref ← *fld
                    if ref ≠ null
                        W ← W + [ref]
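
The abstract tracing algorithm can be sketched in Python as follows. The heap encoding and the function name `collect_tracing` are assumptions. The slide does not show `sweepTracing()`, so it is filled in here as "free every object whose ρ is still 0". The sketch returns the counts rather than resetting them, so that they can be inspected.

```python
def collect_tracing(heap, roots):
    """heap: object -> list of referenced objects (None for a null field)."""
    rho = {obj: 0 for obj in heap}              # abstract reference counts
    work = [ref for ref in roots if ref is not None]   # rootsTracing(W)
    while work:                                 # scanTracing(W)
        src = work.pop()
        rho[src] += 1
        if rho[src] == 1:                       # first visit: scan its fields
            work.extend(ref for ref in heap[src] if ref is not None)
    freed = [obj for obj in heap if rho[obj] == 0]     # sweepTracing()
    for obj in freed:
        del heap[obj]
    return rho, freed


# D refers to A, but D itself is unreachable from the root.
heap = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["A"]}
rho, freed = collect_tracing(heap, roots=["A"])
print(rho)    # {'A': 1, 'B': 1, 'C': 2, 'D': 0}
print(freed)  # ['D']
```

Each ρ value counts the references found from roots and from reachable objects only, which is why the edge from the dead object D does not contribute to ρ(A).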

The Concurrent Version

    shared log ← ()

    collectTracingInc():
        atomic                      /* initialization */
            rootsTracing(W)
            log ← ()
        repeat
            scanTracingInc(W)
            addOrigins()
        until ?
        atomic                      /* termination */
            addOrigins()
            scanTracingInc(W)
            sweepTracing()

The Concurrent Version

    scanTracingInc(W):
        while not isEmpty(W)
            src ← remove(W)
            if ρ(src) = 0
                for each fld in Pointers(src)
                    atomic
                        ref ← *fld
                        log ← log · T        /* append a T (trace) event */
                    if ref ≠ null
                        W ← W + [ref]
            ρ(src) ← ρ(src) + 1

The Concurrent Version

    addOrigins():
        atomic
            origins ← expose(log)
        for each src in origins
            W ← W + [src]

    New():
        ref ← allocate()
        atomic
            ρ(ref) ← 0
            log ← log · N                /* append an N (new) event */
        return ref

    atomic Write(src, i, new):
        if src ≠ roots
            old ← src[i]
            log ← log · W                /* append a W (write) event */
        src[i] ← new
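
To see the log at work, here is a single-threaded Python sketch in which collector steps and mutator steps are interleaved by hand. Much of it is assumed for illustration: the event tuples, the class name `LoggedCollector`, the scan budget used to pause the collector, and above all the `expose()` policy. The slides leave `expose()` abstract; this sketch conservatively exposes newly allocated objects plus both the old and the new target of every logged write, and clears the log afterwards.

```python
class LoggedCollector:
    def __init__(self, heap, roots):
        self.heap = heap                        # object -> list of fields
        self.roots = roots
        self.rho = {obj: 0 for obj in heap}     # abstract reference counts
        self.log = []                           # shared event log
        self.work = []                          # the work list W

    # Mutator side
    def new(self, name, nfields):
        self.heap[name] = [None] * nfields
        self.rho[name] = 0
        self.log.append(("N", name))
        return name

    def write(self, src, i, new):
        old = self.heap[src][i]
        self.log.append(("W", src, i, old, new))
        self.heap[src][i] = new

    # Collector side
    def start(self):                            # initialization
        self.work = [ref for ref in self.roots if ref is not None]
        self.log = []

    def scan_tracing_inc(self, budget=None):
        while self.work and (budget is None or budget > 0):
            src = self.work.pop()
            if self.rho[src] == 0:
                for i, ref in enumerate(self.heap[src]):
                    self.log.append(("T", src, i, ref))
                    if ref is not None:
                        self.work.append(ref)
            self.rho[src] += 1
            if budget is not None:
                budget -= 1

    def expose(self):
        # Assumed policy: new objects plus old and new targets of each write.
        origins = []
        for event in self.log:
            if event[0] == "N":
                origins.append(event[1])
            elif event[0] == "W":
                origins.extend(ref for ref in event[3:] if ref is not None)
        self.log = []
        return origins

    def add_origins(self):
        self.work.extend(self.expose())

    def finish(self):                           # termination
        self.add_origins()
        self.scan_tracing_inc()
        freed = [obj for obj in self.heap if self.rho[obj] == 0]
        for obj in freed:
            del self.heap[obj]
            del self.rho[obj]
        for obj in self.rho:
            self.rho[obj] = 0
        return freed


heap = {"X": [None], "Y": ["Z"], "Z": [], "G": []}
gc = LoggedCollector(heap, roots=["Y", "X"])
gc.start()
gc.scan_tracing_inc(budget=1)        # the collector scans X only, then pauses
gc.write("X", 0, heap["Y"][0])       # mutator: X.b now refers to Z
gc.write("Y", 0, None)               # mutator: Y.a is cleared
freed = gc.finish()
print(freed)  # ['G']
```

This replays the direct lost-object scenario: Z survives only because the logged writes are exposed before the final scan, while the truly unreachable G is freed. Note that ρ is used here as a mark rather than an exact count, since exposed origins can be counted more than once.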

Introduction Tricolour Invariants (Reminder) Mostly Concurrent Mark & Sweep Concurrent Marking and Sweeping On-the-fly marking Abstract Concurrent Collection  Summary

Discussion
- What are the advantages of CMS?
  - Minimizes pause time (low latency)
  - Shares resources between mutator and collector
- What are its downsides?
  - Fragmentation
  - Promotion failures
  - CPU intensive
- When would you use CMS?
  - When short GC pauses are needed
  - With large sets of live objects

CMS in Practice
- CMS is available in the JVM (not enabled by default):
  -XX:+UseConcMarkSweepGC
- It has several flags that we can now understand, e.g.:
  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
- In practice, the vast majority of CMS implementations are mostly-concurrent algorithms with short STW phases.
- It does minimize pause time.

CMS in Practice

 Additional nice metrics and benchmarks: other-java-7-garbage-collectors/

Summary
- We've reviewed the CMS family of algorithms:
  - Mostly concurrent
  - Concurrent marking and sweeping
  - On-the-fly
- No free lunch: CMS minimizes pause time, but at the expense of throughput and fragmentation.

Thank You! Questions and Feedback?