Mostly Parallel Garbage Collection. Hans-J. Boehm, Alan J. Demers, Scott Shenker. Presented by Kit Cischke.


Outline
- Introduction
  - Basics of Garbage Collection revisited
  - How do you make a GC for non-GC languages?
  - Oh, and making it parallel would be nice. Or at least mostly parallel.
- The Basic Idea
  - Virtual dirty bits to find the reachable set
  - Sweeping doesn't matter. (Try telling that to your mother.) For performance, the sweep step is practically ignorable.
- Formalisms
  - Let's introduce some notation and concepts for how this should work.

Outline, Part Deux
- Implementation Choices
  - Based on our formalisms, which is the best combination to actually use?
- Brief Results
  - Really, the test hardware was a SPARCstation configured with as little as 10 MB of RAM!
- Mostly Parallel Copying Collectors
  - This is a mark-sweep paper, mostly. Could you build a copy collector?
- Onwards!

GC Taxonomy and Our Choices
- Garbage collectors may be reference-counting or tracing based.
  - The authors focus on tracing out from the root set.
- The basic style of many early collectors was "stop-the-world" collection.
- Generational and parallel collectors attempt to mitigate the potentially long delays while the world is stopped.
  - Generational collectors collect only a small part of the heap.
  - Parallel collectors might be generational, but they mainly try to collect the whole heap in parallel with the mutator(s).

So why "mostly" parallel?
- Think back to the VM migration papers. Migration started while the VM was running (or in parallel), but at some point the VM had to be stopped to complete the transfer. Hopefully, by that time, there was very little left to transfer.
- Same idea here: as much collection as possible is done while the mutator is running. At some point, we need to stop the world to finish the collection.
- And for mostly the same reason: after the collector has made decisions about certain pointers, the mutator will do things that change whether those objects are reachable.
- This matters because we don't want the collector running all the time.

Authors' Two Stated Goals
- "Present a method for transforming a stop-the-world tracing collector into a mostly parallel collector."
  - And to make the solution general to copying/non-copying and generational/non-generational collectors.
  - Furthermore, no OS changes are needed.
- "Describe a particular implementation of a garbage collector that illustrates this idea."
  - What's really cool is that it provides GC to languages like C with relatively short pause times.

Basic Idea
- Every program has a root set. The root set forms the foundation for the immune set, the set of objects that are reachable, or live.
- Tracing the paths of pointers out from the root set finds the live, reachable objects, which are marked.
- Unmarked (and therefore unreachable) objects can be collected.
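To make the marking step concrete, here is a minimal sketch in C over a toy heap. The Obj layout, the two-slot ref array, and the recursive traversal are illustrative assumptions, not the paper's implementation (a real collector for C has to find pointers conservatively and avoid unbounded recursion).

```c
#include <stddef.h>
#include <stdio.h>

/* Toy object: a mark bit plus up to two outgoing pointers. */
typedef struct Obj {
    int mark;
    struct Obj *ref[2];
} Obj;

/* Depth-first trace: mark everything reachable from obj. */
static void mark(Obj *obj) {
    if (obj == NULL || obj->mark)
        return;
    obj->mark = 1;
    for (int i = 0; i < 2; i++)
        mark(obj->ref[i]);
}

int main(void) {
    Obj a = {0, {NULL, NULL}};
    Obj b = {0, {&a, NULL}};
    Obj c = {0, {NULL, NULL}};   /* not reachable from any root */
    Obj *roots[] = {&b};

    for (size_t i = 0; i < sizeof roots / sizeof roots[0]; i++)
        mark(roots[i]);

    /* Unmarked objects (here: c) are garbage and may be reclaimed. */
    printf("a=%d b=%d c=%d\n", a.mark, b.mark, c.mark);
    return 0;
}
```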

More Basic Idea
- Key idea: whenever a virtual memory page is written to, set a virtual dirty bit for that page.
- At the beginning of a collection, clear all the dirty bits and start tracing.
- The trace finds all currently reachable objects while the mutator keeps doing its thing.
  - The mutator's writes introduce dirty pages.
- When the original trace is done, stop the world and trace from the marked objects on dirty pages.
- Now everything reachable is marked. But is it safe to say everything unreachable is not marked?
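One way to get those virtual dirty bits without any OS changes, in the spirit of the write-protection trick mentioned later in the results, is to protect the heap pages and record which ones fault on write. A minimal POSIX/Linux sketch, with error handling mostly omitted and all names illustrative; this is not the authors' code:

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NPAGES 4
static char *heap;                              /* write-protected "heap" */
static long  page_size;
static volatile sig_atomic_t dirty[NPAGES];     /* one virtual dirty bit per page */

/* First write to a protected page faults; record the page as dirty and
 * unprotect it so the mutator's write can proceed.  (mprotect in a signal
 * handler is not strictly async-signal-safe, but this is the classic way
 * to get dirty bits without modifying the OS.) */
static void on_fault(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)heap;
    if (addr < base || addr >= base + (uintptr_t)NPAGES * (uintptr_t)page_size)
        abort();                                /* a genuine crash, not our barrier */
    size_t page = (addr - base) / (size_t)page_size;
    dirty[page] = 1;
    mprotect(heap + page * (size_t)page_size, (size_t)page_size,
             PROT_READ | PROT_WRITE);
}

int main(void) {
    page_size = sysconf(_SC_PAGESIZE);
    heap = mmap(NULL, (size_t)NPAGES * (size_t)page_size,
                PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (heap == MAP_FAILED)
        return 1;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    /* "Clear all dirty bits" at the start of a collection: re-protect the heap. */
    mprotect(heap, (size_t)NPAGES * (size_t)page_size, PROT_READ);

    heap[1] = 'x';                              /* mutator write dirties page 0 */
    heap[2 * page_size + 1] = 'y';              /* ...and page 2 */

    for (int i = 0; i < NPAGES; i++)
        printf("page %d dirty = %d\n", i, (int)dirty[i]);
    return 0;
}
```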

A Compromise
- No; the collector is neither purely parallel nor precise.
- The duration of the stop-the-world pause is directly dependent on the number of dirtied pages.
  - In theory, things can be no worse than a whole-heap stop-the-world collection. The authors claim this worst case doesn't happen in practice.
- Not all unreachable objects are collected, as they may have been marked before the mutator dropped its references to them.
  - The collector is complete, in that the memory will eventually be reclaimed. (Just not right now!)

Sweeping Doesn't Matter
- Phase 2 of a mark-sweep collector is to free the unused memory in whatever form that takes, called sweeping.
- Sweeping doesn't need to occur during the world stoppage. Once we know what's garbage, we can sweep interleaved with object allocation.

Sweeping Implemented Here
- The heap is split into blocks. Each block contains objects of a single size.
  - For small objects, the block size is the same as a physical page of memory.
- After marking, pages are queued for sweeping in one of multiple queues (one per object size).
- Each object size also has a free list. When the free list is empty, the allocator sweeps the page at the front of the queue for that object size and restores that memory to the free list.
- Blocks for larger objects are swept in large increments immediately following a collection.
  - This limits the CPU time consumed by the collection.
- The net effect is that GC times are dominated by marking.
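A toy model of that lazy sweeping for a single object size, assuming an invented block layout with per-slot mark bits and a free list; allocation sweeps the block only when the free list runs dry:

```c
#include <stdio.h>
#include <string.h>

/* Toy model of lazy sweeping for one size class: a block holds SLOTS
 * fixed-size objects, and allocation sweeps the block only when the free
 * list for that size class runs dry.  Layout and names are illustrative. */
#define SLOTS 8

typedef struct Block {
    unsigned char mark[SLOTS];    /* mark bits left behind by the collector */
    unsigned char in_use[SLOTS];  /* slot currently handed out to the mutator */
    int free_list[SLOTS];         /* indices of free slots */
    int free_count;
} Block;

/* Sweep one block: reclaim every unmarked slot and clear marks for the
 * next collection cycle. */
static void sweep(Block *b) {
    b->free_count = 0;
    for (int i = 0; i < SLOTS; i++) {
        if (!b->mark[i]) {
            b->in_use[i] = 0;
            b->free_list[b->free_count++] = i;
        }
        b->mark[i] = 0;
    }
}

/* Allocate one slot, sweeping lazily (interleaved with allocation) when
 * the free list is empty. */
static int alloc_slot(Block *b) {
    if (b->free_count == 0)
        sweep(b);
    if (b->free_count == 0)
        return -1;                /* every slot in this block is still live */
    int slot = b->free_list[--b->free_count];
    b->in_use[slot] = 1;
    return slot;
}

int main(void) {
    Block b;
    memset(&b, 0, sizeof b);
    b.in_use[0] = b.in_use[3] = 1;    /* two live objects... */
    b.mark[0]   = b.mark[3]   = 1;    /* ...marked by the last collection */

    for (int i = 0; i < 4; i++)
        printf("allocated slot %d\n", alloc_slot(&b));
    return 0;
}
```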

Let's Get Formal
- Definition: a partial collection only reclaims some subset of the unreachable objects.
- Let the set T contain all threatened objects (that is, objects that might be collected).
- Let the set I contain all immune objects (that is, objects that will not be collected).
- T and I are disjoint, and every object falls into either T or I.
- For a full collection, I contains only the roots. In a partial collection, I contains additional objects.
- A collection is correct iff no reachable object is collected.

Guaranteeing Correctness
- Reclaim only unmarked objects, and only when the following condition is true:
  - C: Every object in I is marked, and every object pointed to by a marked object is also marked.
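Condition C can be read as a predicate over a heap snapshot. Here is a small illustrative checker over the same toy object layout used earlier (an assumption for illustration, not code from the paper):

```c
#include <stddef.h>
#include <stdio.h>

typedef struct Obj {
    int mark;
    struct Obj *ref[2];
} Obj;

/* Condition C: every object in the immune set I is marked, and every
 * object pointed to by a marked object is also marked.  When C holds,
 * reclaiming all unmarked objects is safe. */
static int condition_C(Obj *immune[], size_t n_immune,
                       Obj *heap[], size_t n_heap) {
    for (size_t i = 0; i < n_immune; i++)
        if (!immune[i]->mark)
            return 0;
    for (size_t i = 0; i < n_heap; i++) {
        if (!heap[i]->mark)
            continue;
        for (int j = 0; j < 2; j++)
            if (heap[i]->ref[j] && !heap[i]->ref[j]->mark)
                return 0;
    }
    return 1;
}

int main(void) {
    Obj a = {1, {NULL, NULL}};
    Obj b = {1, {&a, NULL}};     /* marked, and points only to marked objects */
    Obj c = {0, {NULL, NULL}};   /* unmarked and unreferenced: collectible */
    Obj *heap[] = {&a, &b, &c};
    Obj *immune[] = {&b};
    printf("C holds: %d\n", condition_C(immune, 1, heap, 3));
    return 0;
}
```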

Stop-the-World Collection
- Formalizing stop-the-world collection:
  - Step 1: Stop the world.
  - Step 2: Clear all mark bits.
  - Step 3: Perform the tracing operation TR.
  - Step 4: Restart the world.
- The operation TR: mark all objects in I and trace from them.
- At the end of this 4-step operation, condition C holds, and all unmarked objects can be collected.

Parallel Collection
- Formally, mostly parallel collection requires:
  - Step 1: Clear all mark bits.
  - Step 2: Clear all virtual dirty bits.
  - Step 3: Perform the tracing operation TR.
  - Step 4: Stop the world.
  - Step 5: Perform a finishing operation, F.
  - Step 6: Restart the world.
- The finishing operation F: trace from all marked objects on dirty pages.
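A toy, single-threaded simulation of steps 1 through 6: a simulated mutator store during the trace dirties a page, and the finishing operation F picks up the object the concurrent trace missed. The object/page model is an illustrative assumption, and the sequential interleaving merely stands in for a genuinely concurrent mutator:

```c
#include <stddef.h>
#include <stdio.h>

#define NPAGES 2

typedef struct Obj {
    int mark;
    int page;                    /* which (virtual) page the object lives on */
    struct Obj *ref[2];
} Obj;

static int dirty[NPAGES];        /* virtual dirty bits, one per page */

/* The mutator's pointer store, which dirties the page it writes to. */
static void set_ref(Obj *o, int slot, Obj *target) {
    o->ref[slot] = target;
    dirty[o->page] = 1;
}

/* Mark everything reachable from o. */
static void mark_from(Obj *o) {
    if (o == NULL || o->mark) return;
    o->mark = 1;
    for (int i = 0; i < 2; i++) mark_from(o->ref[i]);
}

int main(void) {
    Obj a = {0, 0, {NULL, NULL}};
    Obj b = {0, 1, {NULL, NULL}};
    Obj c = {0, 0, {NULL, NULL}};
    Obj *heap[] = {&a, &b, &c};
    a.ref[0] = &b;               /* initial graph: a -> b; c is unreachable */

    /* Steps 1 and 2: clear all mark bits and all virtual dirty bits. */
    for (size_t i = 0; i < 3; i++) heap[i]->mark = 0;
    for (int p = 0; p < NPAGES; p++) dirty[p] = 0;

    /* Step 3: TR runs "concurrently": mark from the roots (here, just a). */
    mark_from(&a);

    /* Meanwhile the mutator stores a new pointer into the already-marked
     * object b, dirtying b's page. */
    set_ref(&b, 0, &c);

    /* Steps 4 to 6: stop the world, perform F (trace from marked objects
     * on dirty pages), restart the world. */
    for (size_t i = 0; i < 3; i++)
        if (heap[i]->mark && dirty[heap[i]->page])
            for (int j = 0; j < 2; j++)
                mark_from(heap[i]->ref[j]);

    printf("a=%d b=%d c=%d\n", a.mark, b.mark, c.mark);  /* all 1: F caught c */
    return 0;
}
```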

Notes on that Collection
- TR is performed totally in parallel with the mutator, which is dirtying pages that will need to be traced.
- The closure condition C does not hold after step 4 (stop the world), which is what requires the finishing step F.
- We therefore define a weaker closure condition C':
  - C': Every object in I is marked, and every object pointed to by a marked object on a clean page is also marked.
- Applying F to any state satisfying C' will produce a state satisfying C.

Considerations
- Thus we have a correct, mostly-parallel collection.
- But if we have a busy mutator, we might have lots of dirty pages, which in turn implies long pauses during the world stoppage.
- To shorten this delay, we can clean the pages in parallel.
- Let P be a set of pages. Then the cleaning process M is:
  - M: 1.) Atomically retrieve and clear the virtual dirty bits from P.
       2.) Trace from the marked objects on the dirty pages of P.

Generational Partial Collection
- All of that formally describes a general partial collection.
- Now let's consider a generational collector that uses the mark bits to record object age.
- Consider a partial collection where I is chosen to be the set of currently marked objects.
  - Therefore, C' holds.
- We could be done by simply performing F, but to reduce the delay we perform M on the entire heap just before the world stoppage.

Formal Parallel Generational Collection
1. Perform M on the heap.
2. Stop the world.
3. Perform F.
4. Restart the world.
- Because an object that has been marked will never be collected by the generational collector, we occasionally need to run a full collection.

An Alternate Version of M
- M' could be:
  - M': 1.) Atomically retrieve and clear the dirty bits from the pages of P, and
        2.) for every unmarked object pointed to by a marked object on a dirty page of P, mark it and dirty the page on which it resides.
- Iteratively performing M' can substitute for M, though M is generally preferable.

Implementation Choices
- When and how should M and M' be used?
  - No M'.
  - For allocation-intensive mutators, run M more than once (twice seems to be the sweet spot).
- What is a "full collection" going to be, and when should it run?
  - Initially it was triggered on heap exhaustion. However, that stalls the allocating thread, even with the parallel collector.
  - The authors settled on a daemon thread that kicks off the collector when the amount of used memory exceeds some threshold above what was in use at the end of the last collection.
  - Then up to two iterations of M are run, followed by a concurrent execution of TR.
  - If we run out of memory, we try to expand the heap.
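The daemon-thread trigger can be sketched as a simple growth-threshold test. The 50% growth factor and the struct and field names here are illustrative guesses, not the paper's tuned values:

```c
#include <stddef.h>
#include <stdio.h>

/* Toy trigger policy: a background daemon kicks off a collection once the
 * heap has grown by some fraction beyond what was live at the end of the
 * previous collection.  The 50% threshold is an illustrative choice, not
 * the paper's tuned value. */
typedef struct HeapStats {
    size_t live_after_last_gc;   /* bytes in use when the last collection ended */
    size_t in_use;               /* bytes currently in use */
} HeapStats;

static int should_start_collection(const HeapStats *h) {
    return h->in_use > h->live_after_last_gc + h->live_after_last_gc / 2;
}

int main(void) {
    HeapStats h = {1000, 1400};
    printf("collect? %d\n", should_start_collection(&h));   /* 0: not yet */
    h.in_use = 1600;
    printf("collect? %d\n", should_start_collection(&h));   /* 1: threshold crossed */
    return 0;
}
```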

Brief Results
- This collector was used at Xerox PARC for quite a while and was heavily optimized.
- The authors didn't modify the SunOS running on their machines; they just write-protected the heap.
- They were mainly interested in measuring interactive response.
  - Subjectively better. (But they are aware this is pretty fuzzy.)
- They ran 5 iterations of a "Boyer benchmark" and an allocator loop at various memory configurations, trying to level the playing field for the full, generational, and parallel generational collectors.

Results

Mostly Parallel Copying Collectors
- We can do all the same things and make a copying collector, if we want.
- It just requires space to maintain explicit forwarding links.
  - A forward pointer is associated with each object, used only by the GC.
- Reachable objects are copied from from-space to to-space, and the new address is written into the forward pointer of the from-space copy.
- The mutator only ever sees the from-space pointers.
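A minimal sketch of the forwarding-pointer bookkeeping for such a copying collector, with an invented object layout and a fixed-size to-space; it shows only the copy-and-forward step, not the scanning or pointer updating described on the next slides:

```c
#include <stdio.h>
#include <string.h>

/* Toy copying step with an explicit forwarding pointer per object, as the
 * slide describes: a reachable object is copied from from-space to
 * to-space and the new address is written into the from-space copy's
 * forward field.  Layout, sizes, and names are illustrative. */
typedef struct Obj {
    struct Obj *forward;         /* forwarding link, used only by the GC */
    int payload;
} Obj;

static Obj to_space[8];
static int  to_top;

/* Copy obj into to-space (if not already copied) and return its new address. */
static Obj *forward_obj(Obj *obj) {
    if (obj->forward == NULL) {
        Obj *copy = &to_space[to_top++];
        memcpy(copy, obj, sizeof *obj);
        copy->forward = NULL;
        obj->forward = copy;     /* leave the forwarding link behind */
    }
    return obj->forward;
}

int main(void) {
    Obj from_space[2] = {{NULL, 7}, {NULL, 9}};
    Obj *new_a = forward_obj(&from_space[0]);
    Obj *again = forward_obj(&from_space[0]);   /* second call reuses the copy */
    printf("payload=%d, same copy=%d\n", new_a->payload, new_a == again);
    (void)forward_obj(&from_space[1]);
    return 0;
}
```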

More on Copy Collectors
- Concurrent collection forces the following to be true:
  - If an object residing on a clean page has been copied, then everything it points to has also been copied.
  - If an object resides on a clean page, its current contents are up-to-date.
- With the world stopped, we can execute the finishing operation shown on the next slide, such that all reachable objects are found, with correct contents, in to-space.

Copying Finishing Op
- F_c: For every object a whose from-space copy resides on a dirty page:
  - 1. Copy everything it points to that hasn't already been copied.
  - 2. Update its pointers to point into to-space.
  - 3. Recopy a to reflect both pointer and non-pointer updates that occurred since the collection started.
- We could create a concurrent version of F_c, but the authors found a copy collector to be impractical for their environment and didn't bother implementing one.
- Just as with the mark-sweep collector, the world-stoppage time is proportional to the number of dirtied pages.

Questions?
- I really liked the tone of this paper. It had less of that stuffy, self-important academic feel.