Portable, Unobtrusive Garbage Collection for Multiprocessor Systems. Damien Doligez and Georges Gonthier, POPL 1994. Presented by Eran Yahav.



“A concurrent, generational garbage collector for a multithreaded implementation of ML”, Doligez and Leroy (POPL 1993). “On-the-fly garbage collection: an exercise in cooperation”, Dijkstra et al. (1978).

Overview: motivation; concurrent collection strategies; concurrent collection constraints; the basic algorithm (Dijkstra); the Doligez-Leroy model; the Doligez-Leroy concurrent collector.

Concurrent GC. Known to be a hard problem. Published algorithms rely on simplifying assumptions that either impose unbearable overhead on mutators or require a high degree of hardware/OS support. Other algorithms are buggy.

“Stop the world”. All threads stop synchronously and perform GC. This introduces synchronization between otherwise independent threads. [Figure: timelines of threads T1 to T4, with labels: sync, GC.]


“Stop the world”: Mostly Parallel GC (Boehm et al.). Uses virtual memory page protections. Reduces the duration of the “stop the world” period, but does not prevent synchronization between threads at “stop the world” points. [Figure: timelines of threads T1 to T4, with labels: sync, GC, marking.]

“Stop the world”: Scalable mark-sweep GC. Uses a parallelization of Boehm's mostly parallel collector. Reduces the duration of “stop the world” periods, but does not prevent synchronization between threads at “stop the world” points. [Figure: timelines of threads T1 to T4, with labels: sync, GC, marking.]

“Stop the world”: Real-Time GC (Nettles & O’Toole). An incremental copying collector. Reduces the duration of “stop the world” periods, but does not prevent synchronization between threads at the swap point. [Figure: timelines of threads T1 to T4, with labels: sync, GC.]

Concurrent collector. Run the collector concurrently with user threads. Use as little synchronization as possible between user threads and the GC thread. [Figure: timelines of threads T1 to T4 and a GC thread.]

Concurrent collection strategies: reference counting, copying (relocation), mark & sweep.

Concurrent GC: reference counting. Requires locks on reference counters. [Figure: heap object with RC = 2; mutators M1, M2, M3; a +1 increment.]
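
The cost behind “locks on reference counters” can be sketched as follows. This is a minimal Python illustration and not code from the presentation; the class `Counted`, its methods, and the `mutator` function are hypothetical names. The point is that every pointer copy or drop, by any mutator, has to take the object's lock.

```python
import threading

class Counted:
    """A heap object whose reference count is shared by all mutators."""

    def __init__(self):
        self.rc = 0
        self._lock = threading.Lock()

    def incref(self):
        # Without the lock, two mutators could read the same rc value
        # and one of the increments would be lost.
        with self._lock:
            self.rc += 1

    def decref(self):
        # Returns True when the last reference is gone.
        with self._lock:
            self.rc -= 1
            return self.rc == 0

def mutator(obj, n):
    # Each mutator takes n new references to the shared object.
    for _ in range(n):
        obj.incref()

obj = Counted()
threads = [threading.Thread(target=mutator, args=(obj, 1000)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Three mutators each take 1000 references here, so the lock is acquired 3000 times for bookkeeping alone.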

Concurrent GC: relocation. Objects are relocated while mutators are running. [Figure: heap with “from” and “to” regions, the GC, and mutators M1 and M2, each with a question mark.]

Concurrent GC: relocation. Objects are relocated while mutators are running, so the collector must ensure that mutators are aware of relocation, using one of: a test on every heap pointer dereference; an extra indirection word for each object; virtual memory page protections. Each carries a significant run-time penalty.

Concurrent GC: mark/sweep. Mark all threads' roots. No inherent locks. Mutators may change the trace graph during any collection phase. [Figure: threads 1 to 3 and global variables pointing into the heap.]

Multiprocessors: facts of life. Registers are local: it is impossible to track down the machine registers of a running process. Synchronization is expensive: semaphores and synchronization are only available through expensive system calls.

Unobtrusive? No overhead on frequent actions: moving data between registers and memory, dereferencing a heap pointer, filling a field in a new heap object. Synchronization overhead is imposed only on reserve actions (for which it is unavoidable). Mutator cooperation with the collector happens only at the mutator's convenience.

Portable? No special use of OS synchronization primitives. No hardware support.

Where all else fails. Relocating GC algorithms break locality or impose large overhead. Proposed incremental algorithms require global synchronization. Mark & sweep, with the collector working while mutators change the trace graph, is complicated but possible.

The basic algorithm. Dijkstra et al., “On-the-fly garbage collection”, published in 1978. Breaks locality. Assumes a fixed set of roots. [Figure: threads 1 to 3, global variables, and a GC thread over the heap.]

Dijkstra’s collector.
Mark:
    for each x in Globals do
        MarkGray(x)
Scan:
    repeat
        dirty ← false
        for each x in heap do
            if color[x] = gray then
                dirty ← true
                MarkGray(x.Sons)
                color[x] ← black
    until not dirty
Sweep:
    for each x in heap do
        if color[x] = white then
            append x to free list
        else if color[x] = black then
            color[x] ← white
[Figure: color transition diagram over black, gray, and white, with edges labeled mark, update, sweep, and allocate.]
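
The mark, scan, and sweep loops can be run on a small example. The following is a single-threaded Python sketch of the pseudocode only; it does not model the concurrency that makes the real algorithm hard. The dictionary-based heap and the names `collect`, `mark_gray`, `sons`, and `global_roots` are assumptions made for illustration.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"

def mark_gray(color, x):
    """Shade x: a white object becomes gray; gray and black are left alone."""
    if color[x] == WHITE:
        color[x] = GRAY

def collect(sons, color, global_roots):
    """Run one mark/scan/sweep cycle and return the list of freed objects.

    sons maps each heap object to the list of objects it points to.
    """
    # Mark: shade everything referenced from the globals.
    for x in global_roots:
        mark_gray(color, x)
    # Scan: keep passing over the heap until a pass finds no gray object.
    dirty = True
    while dirty:
        dirty = False
        for x in sons:
            if color[x] == GRAY:
                dirty = True
                for s in sons[x]:
                    mark_gray(color, s)
                color[x] = BLACK
    # Sweep: white objects are garbage, black objects are reset to white.
    free_list = []
    for x in sons:
        if color[x] == WHITE:
            free_list.append(x)
        elif color[x] == BLACK:
            color[x] = WHITE
    return free_list

# a, b, c are reachable from the root a; d and e are not.
sons = {"a": ["b"], "b": ["a", "c"], "c": [], "d": ["c"], "e": ["d"]}
color = {x: WHITE for x in sons}
freed = collect(sons, color, ["a"])
```

In this run `d` and `e` end up on the free list, even though `d` points to the live object `c`: pointing to a live object does not make an object live.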

Doligez-Leroy model. Damien Doligez and Xavier Leroy, 1993. A concurrent, generational GC for a multithreaded implementation of ML. Relies on ML properties: a compile-time distinction between mutable and immutable objects; duplicating immutable objects is semantically transparent. Does not stop program threads.

Doligez-Leroy model. Do anything to avoid synchronization. Trade collection “quality” for less synchronization: allow large amounts of floating garbage. Trade collection “simplicity” for less synchronization: a complicated algorithm (not to mention the correctness proof).

Doligez-Leroy model. [Figure: threads 1 to 3, each with its own stack and minor heap; global variables; a shared major heap.]

Collection generations. Each thread treats the two heaps (private and shared) as two generations: private = young generation, shared = old generation. Immutable objects are allocated in private heaps, which does not require synchronization. Mutable objects are handled differently (later).

Minor collection. When its private heap is full, a thread stops and performs a minor collection: it copies live objects from the private heap to the shared heap (old generation). After a minor collection, the whole private heap is free. A minor collection can be performed at any time. Synchronization is only required for allocating the copied objects in the shared heap.
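
A minor collection can be sketched as a copy of everything reachable from the thread's roots into the shared heap. This is a minimal Python sketch under stated assumptions, not the collector's actual code: heaps are dictionaries from address to field list, shared addresses are named `s0`, `s1`, and so on, and the function name `minor_collect` is hypothetical. The single synchronized step in the real system, allocation in the shared heap, is marked by a comment.

```python
def minor_collect(roots, private, shared):
    """Copy objects reachable from roots out of private into shared.

    Returns the updated roots; the private heap is empty afterwards.
    """
    forward = {}  # private address -> address of its copy in the shared heap

    def copy(addr):
        if addr not in private:        # already lives in the shared heap
            return addr
        if addr in forward:            # already copied during this collection
            return forward[addr]
        new = "s" + str(len(shared))   # assumption: shared keys are s0..s(n-1)
        forward[addr] = new
        shared[new] = []               # shared-heap allocation (needs sync)
        shared[new] = [copy(f) for f in private[addr]]
        return new

    new_roots = [copy(r) for r in roots]
    private.clear()                    # the whole private heap is free again
    return new_roots

private = {"p0": ["p1", "s0"], "p1": ["p0"], "p2": []}
shared = {"s0": []}
roots = minor_collect(["p0"], private, shared)
```

Here `p2` is unreachable and simply disappears when the private heap is cleared; no sweep over dead objects is needed.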

Major collection. A dedicated GC thread uses a variation of Dijkstra’s algorithm (mark & sweep). It does not move objects, so no synchronization is required when accessing or modifying objects in the shared heap. Described later.

Major and minor collection. [Figure: threads 1 to 3, each with its own stack and minor heap; global variables; a shared major heap; a GC thread.]

Copy on update. We assumed there are no pointers from the shared heap to a private heap. [Figure: major heap; an object not reachable from the thread’s roots.]

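Copy on update. When an update would store a pointer to a private-heap object into a shared-heap object, copy the referenced object (and its descendants) to the shared heap. This is similar to a minor collection with a single root: it simply does some of the minor collection right away.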

Copy on update. Until the next minor collection, the copying thread can access both the original and the copied objects. For immutable objects the two are semantically equivalent. What about mutable objects?

Allocation of mutable objects. If a mutable object were copied, the original and the copy could be updated separately, so the two would no longer be equivalent. Solution: always allocate mutable objects in the shared heap. This requires synchronization (free list), but ML programs usually use few mutable objects, and mutable objects have a longer life span than average.

The concurrent collector. An adapted version of Dijkstra’s algorithm. Naming conventions: mutator = thread + minor collection thread; collector = major collector. The major collector only requires marking of mutator roots; it does not demand minor collections.

Four-color marking. White: not yet marked (or unreachable). Gray: marked, but sons not marked. Black: marked and sons marked. Blue: free-list blocks.

Collection phases: root enumeration, end of marking, sweeping.

Root enumeration. Raise a flag to signal the beginning of marking. Shade globals. Ask mutators to shade their roots. Wait until all mutators have answered. Meanwhile, start scanning and marking.

Root enumeration.
Collector:
    Mark:
        for each x in Globals do
            MarkGray(x)
        call mutators to mark roots
        wait until all mutators have answered
        ...
Mutators:
    Cooperate:
        if a call to mark roots is pending then
            call MarkGray on all roots
            answer the call
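
The handshake between collector and mutators can be sketched with one pending flag per mutator. This is a single-threaded Python simulation under assumptions, not the presented implementation: the `Mutator` class, the `call_pending` flag, and the helper names are hypothetical, and a real implementation has to consider memory-ordering issues that this sketch ignores.

```python
class Mutator:
    def __init__(self, roots):
        self.roots = roots
        self.call_pending = False  # set by the collector, cleared by the mutator

    def cooperate(self, shade):
        """Called by the mutator itself, at a point of its own choosing."""
        if self.call_pending:
            for r in self.roots:
                shade(r)
            self.call_pending = False  # answer the call

def start_root_enumeration(global_roots, mutators, shade):
    """Collector side: shade the globals, then post a call to every mutator."""
    for x in global_roots:
        shade(x)
    for m in mutators:
        m.call_pending = True

def all_answered(mutators):
    """Collector side: polled until every mutator has shaded its roots."""
    return not any(m.call_pending for m in mutators)

gray = set()
m1, m2 = Mutator(["x"]), Mutator(["y", "z"])
start_root_enumeration(["g"], [m1, m2], gray.add)
answered_at_start = all_answered([m1, m2])
m1.cooperate(gray.add)
answered_after_m1 = all_answered([m1, m2])
m2.cooperate(gray.add)
answered_after_m2 = all_answered([m1, m2])
```

The collector never interrupts a mutator: it only waits, which is what makes the cooperation happen at the mutator's convenience.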

End of marking. Repeatedly mark gray objects until no gray objects remain.
Scan:
    repeat
        dirty ← false
        for each x in heap do
            if color[x] = gray then
                dirty ← true
                MarkGray(x.Sons)
                color[x] ← black
    until not dirty

Sweeping. Scan the heap. All white objects are free: set them to blue and add them to the free list. All black objects are reset to white. Some objects might have been set to gray since the end of the marking phase: set them to white.
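
The sweeping rules can be sketched as a single pass over the heap. This is a minimal Python sketch; the dictionary-based `color` table, the integer addresses, and the function name `sweep` are assumptions for illustration.

```python
def sweep(color, free_list):
    """Apply the four-color sweeping rules to every heap block in address order."""
    for addr in sorted(color):
        if color[addr] == "white":
            color[addr] = "blue"       # unreachable: reclaim the block
            free_list.append(addr)
        elif color[addr] in ("black", "gray"):
            color[addr] = "white"      # survivor: reset for the next cycle
        # blue blocks are already on the free list and are left alone

color = {0: "white", 1: "black", 2: "gray", 3: "blue"}
free_list = [3]
sweep(color, free_list)
```

Block 2 was shaded gray after marking ended, so it survives this sweep and is reconsidered by the next cycle.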

Invariants (1/2). All objects that are reachable from a mutator's roots at the time the mutator shaded its roots, or that become reachable after that time, are black at the end of the marking phase. Objects can become reachable through allocation and modification, which are performed concurrently with the collection.

Invariants (2/2). Gray objects that are unreachable at the beginning of the mark phase become black during mark, then white during sweep, and are reclaimed by the next cycle (floating garbage). All white objects that are unreachable at the start of the marking phase remain white. No unreachable object ever becomes reachable again. There are no blue objects outside the free list.

Concurrent allocation and modification. Mutators must consider the collector's status when modifying or allocating heap objects. First, let's consider modification of heap objects.

Concurrent modification. Updating a black object could result in a reachable object that remains white at the end of marking. Even worse, the set of roots is not fixed during collection. An update must therefore shade both the new value and the old value.
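
The rule of shading both values can be sketched as a write barrier. This is a minimal Python sketch under assumptions: the `marking` flag, the list-of-fields heap, and the names `shade` and `update` are hypothetical, and the sketch ignores the race discussed later under corrections.

```python
def shade(color, x):
    """Make a white object gray; leave other colors and null pointers alone."""
    if x is not None and color[x] == "white":
        color[x] = "gray"

def update(heap, color, obj, field, new, marking):
    """Store new into heap[obj][field], shading old and new values if marking."""
    if marking:
        shade(color, heap[obj][field])  # old value: may still be held elsewhere
        shade(color, new)               # new value: now reachable from obj
    heap[obj][field] = new

# A is already black; B is white and about to be stored into A.
heap = {"A": [None], "B": []}
color = {"A": "black", "B": "white"}
update(heap, color, "A", 0, "B", marking=True)
```

Without the barrier, `B` would stay white while being reachable only through the black object `A`, which the collector never scans again.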

What happens if we don’t shade the new value. [Figure, built up over three slides: threads T1 and T2 over the major heap with objects A and B. Timeline labels: mark T1 root, T2 updates A, T2 pops, mark T2 root; phases: root enumeration, end of mark, sweep.]

What happens if we don’t shade the old value. [Figure, built up over three slides: thread T over the major heap with objects A and B. Timeline labels: mark T root, T pushes B, T updates A; phases: root enumeration, end of mark, sweep.]

Concurrent allocation. Assign the right color to new objects. During marking, allocated objects are black: allocated objects are reachable, and their sons are reachable and will eventually be set to black. During sweeping, an allocated object is white if its address has already been swept and gray otherwise; it is set to gray to avoid immediate deallocation.

Synchronization. It is always safe to set an object to gray. Setting many objects to gray is inefficient, since they will only be reclaimed on the next cycle. This allows us to avoid synchronization wherever a race condition can only end up making an object gray. It is used to test the collector status without locking.

Synchronization. Coloring of a newly allocated block:
    if phase = marking then
        set object to black
        if phase = sweeping then
            set object to gray
    else
        if address(object) < sweep-pointer then
            set object to white
        else
            set object to gray
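
The lock-free coloring of a new block can be sketched as follows. This is a minimal Python sketch of one reading of the slide's pseudocode, in which the second phase test is nested inside the first and catches the collector moving from marking to sweeping between the two reads. The function `allocation_color` and the `read_phase` callback are hypothetical names introduced to simulate that race.

```python
def allocation_color(read_phase, address, sweep_pointer):
    """Pick the color of a newly allocated block without taking a lock.

    read_phase is called each time the collector's phase is read, so a
    test can make the phase change between two reads.
    """
    if read_phase() == "marking":
        color = "black"
        if read_phase() == "sweeping":
            # The phase changed under us; gray is always a safe choice.
            color = "gray"
    else:
        # Behind the sweep pointer the block has already been swept.
        color = "white" if address < sweep_pointer else "gray"
    return color

stable = allocation_color(lambda: "marking", 10, 0)
phases = iter(["marking", "sweeping"])
raced = allocation_color(lambda: next(phases), 10, 0)
swept = allocation_color(lambda: "sweeping", 5, 10)
unswept = allocation_color(lambda: "sweeping", 20, 10)
```

In the raced case the block is gray rather than black, which at worst delays its reclamation by one cycle.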

Color transitions summary. [Figure: transition diagram over the colors black, gray, blue, and white, with edges labeled mark, allocate, update, and sweep.]

Experimental results

Corrections. When shading the old value, which old value do we shade? After we shade it, another thread might “replace” the old value with a non-shaded value. This is corrected by adding another handshake: all updates must end before marking starts.

Summary. The Doligez, Leroy, and Gonthier concurrent GC does not stop program threads. Four-color mark & sweep: white, gray, black, and blue. It relies on ML language properties, but can be extended to other languages.

The End