The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank, Technion – Israel Institute of Technology.

Presentation transcript:

1 The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology

2 The Compressor The first compactor with one heap pass. Fully compacts all the objects in the heap. Preserves the order of the objects. Low space overhead. A parallel version and a concurrent version.

3 Garbage collection Automatic memory management. The user allocates objects. The memory manager reclaims objects that “are not needed anymore”. In practice: objects that are unreachable from the roots.

4 Modern Platforms and Requirements High performance and low pauses. SMP and multicore platforms: use parallel collectors for highest efficiency; use concurrent collectors for short pauses. [Figure: timelines contrasting parallel (STW) collection, which gives high throughput, with concurrent and parallel collection, which gives short pauses]

5 Main Streams in GC Mark and Sweep: trace the objects, then go over the heap and reclaim the unmarked objects. Reference Counting: keep the number of pointers to each object; when an object's counter reaches zero, reclaim the object. Copying: divide the heap into two spaces and copy all the live objects from one space to the other.

6 Compaction - Motivation M&S and RC face the problem of fragmentation. Fragmentation – unused space between live objects due to repeated allocation and reclaiming. Allocation efficiency decreases. May fail to allocate large objects. Cache behavior may be harmed. Compaction – move all the live objects to one place in the heap. Best practice: keep order of objects for best locality.

7 Traditional Compaction Go over the heap and write the new location of every object in its header (install a forwarding pointer). Update all the pointers in the roots and the heap. Move the objects. Three heap passes.
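The three passes listed on this slide can be sketched roughly as follows. This is only an illustrative model, not the code of any particular collector: the Obj class, its field names, and the by_addr lookup table are assumptions made for the example (a real heap would follow pointers directly to object headers).

```python
from dataclasses import dataclass, field

@dataclass
class Obj:                      # hypothetical heap object for illustration
    addr: int                   # current address (in words)
    size: int                   # size in words
    marked: bool                # set by a previous mark phase
    refs: list = field(default_factory=list)  # addresses of referenced objects
    forward: int = -1           # forwarding pointer, installed in pass 1

def compact_three_pass(heap, roots):
    """heap: list of Obj sorted by address; roots: list of object addresses."""
    # Pass 1: compute each live object's new location, store it in its header.
    free = 0
    for o in heap:
        if o.marked:
            o.forward = free
            free += o.size
    by_addr = {o.addr: o for o in heap}   # stand-in for dereferencing a pointer
    # Pass 2: update all pointers in the roots and in the live objects.
    new_roots = [by_addr[r].forward for r in roots]
    for o in heap:
        if o.marked:
            o.refs = [by_addr[r].forward for r in o.refs]
    # Pass 3: move the objects (sliding, so address order is preserved).
    live = [o for o in heap if o.marked]
    for o in live:
        o.addr, o.forward = o.forward, -1
    return live, new_roots

heap = [Obj(0, 2, True, [10]), Obj(2, 4, False), Obj(6, 4, False, [0]),
        Obj(10, 3, True, [0])]
live, new_roots = compact_three_pass(heap, [10])
print([(o.addr, o.refs) for o in live], new_roots)
```

Each of the three loops walks the whole heap, which is the cost the Compressor aims to avoid.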

8 Agenda Introduction: garbage collection, servers, compaction. The Compressor: Basic technique Obtain compaction with a single heap pass. The parallel version. The concurrent version. Measurements Related Work. Conclusion

9 Compressor - Overview Compute the new locations of objects. Fix the root pointers. Move the objects and fix their pointers. One heap pass, plus one pass over the (small) mark-bits table.

10 Compute new locations Computing new locations and saving this information succinctly: the heap is partitioned into blocks (typically 512 bytes). Start by computing and saving, for each block, the total size of the live objects preceding that block (the offset vector). [Figure: the heap with its addresses and the offset vector]

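11 Computing A New Address Assume a markbit vector that reflects the heap: the first and the last bits of each object are set. The new location of an object is computed from the markbit and the offset vectors, by adding the size of the live objects that precede it within its block (read from the markbit vector) to the block's entry in the offset vector. For object 5, in the 4th block, the new location is 1175. [Figure: heap addresses, offset vector, markbit vector]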
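A minimal sketch of this address computation, under several assumptions that are not in the slides: sizes are counted in words, one mark bit covers one word, every object spans at least two words (so its first and last bits are distinct), blocks hold 8 words, and the offsets list has already been computed. The function and parameter names are hypothetical.

```python
def mark(heap_words, objects):
    """Build a markbit vector: set the first and last bit of each live object."""
    bits = [0] * heap_words
    for start, size in objects:
        bits[start] = 1
        bits[start + size - 1] = 1
    return bits

def new_address(markbits, offsets, obj_start, block_words=8, to_base=0):
    """New location = offset entry of the block + live words before the object
    inside its block. Scanning backwards from the object means the scan always
    begins outside any object, even if another object spans the block boundary."""
    block = obj_start // block_words
    block_start = block * block_words
    live_before = 0
    inside = False
    for i in range(obj_start - 1, block_start - 1, -1):
        if markbits[i]:            # an end bit (entering) or a start bit (leaving)
            live_before += 1
            inside = not inside
        elif inside:               # interior word of a live object
            live_before += 1
    return to_base + offsets[block] + live_before

# Live objects as (start, size): the second one straddles blocks 0 and 1.
objects = [(1, 3), (6, 4), (12, 2), (18, 5)]
markbits = mark(24, objects)
offsets = [0, 5, 9]   # live words preceding each 8-word block (assumed given)
print([new_address(markbits, offsets, start) for start, _ in objects])
```

No object header is read or written here, which is why no forwarding pointer is needed.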

12 Computing Offset Vector Computed from the markbit vector. Does not require a heap pass. [Figure: heap addresses, offset vector, markbit vector]
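One way to picture this step is a single scan over the markbit vector that records, at every block boundary, how many live words have been seen so far. The sketch below assumes one mark bit per word, objects of at least two words, and an 8-word block; these choices and all names are illustrative, not taken from the talk.

```python
def mark(heap_words, objects):
    """Build a markbit vector: set the first and last bit of each live object."""
    bits = [0] * heap_words
    for start, size in objects:
        bits[start] = 1
        bits[start + size - 1] = 1
    return bits

def compute_offset_vector(markbits, block_words=8):
    """offsets[b] = number of live words located before block b."""
    offsets = []
    live = 0
    inside = False                 # are we between a start bit and an end bit?
    for i, bit in enumerate(markbits):
        if i % block_words == 0:   # block boundary: record the running total
            offsets.append(live)
        if bit:                    # start or end bit: the word itself is live
            live += 1
            inside = not inside
        elif inside:               # interior word of a live object
            live += 1
    return offsets

# Live objects as (start, size); the second one straddles blocks 0 and 1.
markbits = mark(24, [(1, 3), (6, 4), (12, 2), (18, 5)])
offsets = compute_offset_vector(markbits)
print(offsets)
```

Only the markbit vector is touched, so the cost is proportional to the size of that table rather than to the heap.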

13 Properties Single heap pass. Plus one pass over the markbit vector. Small space overhead. Does not need a forwarding pointer. Single threaded. Stop-the-world. Next: A parallel stop-the-world (STW) version. A concurrent version.

14 Parallelization – First Try Suppose we divided the heap into two spaces. The application uses only one space. The Compressor compacts the objects from one space (from-space) to the other (to-space). Advantage: objects can be moved independently. Problem: space overhead.

15 Eliminating Space Overhead Initially, to-space is not mapped to physical pages; it is only a virtual address space. For every (virtual) page in to-space (a parallel loop): map the virtual page to physical memory; move the corresponding from-space objects and fix their pointers; unmap the relevant pages in from-space.
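The independence of to-space pages can be illustrated with a small simulation. It is a sketch only: Python lists stand in for mapped pages, the moves list of (old address, new address, size) triples is assumed to come from the address computation, words are opaque payloads (pointer fix-up and the unmapping of from-space are omitted), and the 4-word page size is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_WORDS = 4   # hypothetical page size in words

def fill_page(page, from_space, moves):
    """Build one to-space page independently of all other pages."""
    lo, hi = page * PAGE_WORDS, (page + 1) * PAGE_WORDS
    words = [None] * PAGE_WORDS          # stands in for mapping a physical page
    for old, new, size in moves:
        for k in range(size):
            if lo <= new + k < hi:       # this word lands in our page
                words[new + k - lo] = from_space[old + k]
    return words

def compact_parallel(from_space, moves, workers=4):
    live = sum(size for _, _, size in moves)
    pages = (live + PAGE_WORDS - 1) // PAGE_WORDS
    with ThreadPoolExecutor(max_workers=workers) as pool:
        filled = list(pool.map(lambda p: fill_page(p, from_space, moves),
                               range(pages)))
    # Flatten the pages and drop the unused tail of the last page.
    return [w for page in filled for w in page][:live]

from_space = ['-', 'a0', 'a1', 'a2', '-', '-', 'b0', 'b1', 'b2', 'b3',
              '-', '-', 'c0', 'c1']
moves = [(1, 0, 3), (6, 3, 4), (12, 7, 2)]
to_space = compact_parallel(from_space, moves)
print(to_space)
```

Because each worker writes only its own page and only reads from-space, the workers need no synchronization with one another in this model.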

16 Properties All virtues of basic Compressor: Single heap pass, small space overhead. Easy parallelization: each to-space page can be handled independently. Stop-The-World.

17 What about Concurrency? Problem: two copies of an object exist while objects are being moved during the application run, which causes synchronization problems between compaction and the application. Solution (Baker style): the application can only access moved objects (in to-space).

18 Concurrent Version Stop the application. Fix the roots to point to the new locations in to-space. Read-protect to-space and let the application resume. When the application touches a to-space page, a trap is sprung. The trap handler moves the relevant objects into the page and unprotects the page.
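The trap-driven behaviour can be mimicked in a toy model where a missing dictionary entry plays the role of a read-protected page and a method call plays the role of the trap handler. Everything here is an assumption for illustration (the ToSpace class, the moves triples of old address, new address and size, the 4-word page); a real implementation would rely on the operating system's page protection instead.

```python
PAGE_WORDS = 4   # hypothetical page size in words

class ToSpace:
    """Toy to-space: a page absent from self.pages counts as read-protected."""

    def __init__(self, from_space, moves):
        self.from_space = from_space
        self.moves = moves            # (old address, new address, size) triples
        self.pages = {}               # page number -> list of words
        self.traps = 0

    def _trap(self, page):
        """Trap handler: move the relevant objects into the page, unprotect it."""
        lo = page * PAGE_WORDS
        words = [None] * PAGE_WORDS
        for old, new, size in self.moves:
            for k in range(size):
                if lo <= new + k < lo + PAGE_WORDS:
                    words[new + k - lo] = self.from_space[old + k]
        self.pages[page] = words      # the page is now accessible
        self.traps += 1

    def read(self, addr):
        """Application read: the first touch of a page springs the trap."""
        page = addr // PAGE_WORDS
        if page not in self.pages:
            self._trap(page)
        return self.pages[page][addr % PAGE_WORDS]

from_space = ['-', 'a0', 'a1', 'a2', '-', '-', 'b0', 'b1', 'b2', 'b3',
              '-', '-', 'c0', 'c1']
space = ToSpace(from_space, [(1, 0, 3), (6, 3, 4), (12, 7, 2)])
first = space.read(4)     # touches page 1 only
second = space.read(5)    # same page, no new trap
print(first, second, sorted(space.pages), space.traps)
```

The application only ever sees words in to-space, so it never observes two copies of the same object; pages it never touches can be filled later by the collector itself.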

19 Implementation & Measurements Implementation on the Jikes RVM. The Compressor was added to a simple modification of the Jikes mark-sweep collector (main modification: allocation via local caches). The Compressor is invoked once every 10 collections. Benchmarks: SPECjbb, Dacapo, SPECjvm98. In the talk we concentrate on SPECjbb. Compared collectors: there are no other compaction algorithms on the Jikes RVM, so we show some comparison to mark-sweep (MS) and an Appel generational collector (GenMS).

20 SPECjbb Throughput CON = Concurrent Compressor, STW = Parallel Compressor

21 SPECjbb pause time (ms) [Figure: pause times of generational mark-and-sweep (full collections), mark-and-sweep, and parallel compaction (stop-the-world) on jbb with 2, 4, 6 and 8 warehouses]

22 SPECjbb - Allocations per time

23 Dacapo - Allocations per time

24 Previous Work on Compaction Early works: Two-finger, Lisp2, and the threaded algorithm [Jonkers and Morris] are single-threaded and therefore create a large pause time. [Flood et al. 2001] is the first parallel compaction algorithm, but it has 3 heap passes and creates several dense areas. [Abuaiadh et al. 2004] is parallel with two heap passes, but not concurrent. [Ossia et al. 2004] execute the pointer fix-up part concurrently.

25 Related Work Numerous concurrent and parallel garbage collectors. Copying collectors [Cheney 70] compact objects during the collection, but require a large space overhead and do not retain the order of objects. Savings in space overhead for copying collectors [Sachindran and Moss 2004]. [Bacon et al. 2003, Click et al. 2005] propose incremental compaction, but it uses a read barrier and does not keep the order of objects.

26 Complexity Comparisons Jonkers-Morris: 2 heap passes, 0 mark-bits passes (two traces of the heap are interleaved with the two passes). SUN 2001: 3 heap passes, 0 mark-bits passes. IBM: 2 heap passes, 1 mark-bits pass (the mark-bits table is small, at most 1/32 of the heap size). The Compressor: 1 heap pass, 1 mark-bits pass.

27 Conclusion The Compressor: The first compactor that passes over the heap only once. Plus one pass over the mark-bits vector. Fully compacts all the objects in the heap. Preserves the order of the objects. Low space overhead. Uses memory services to obtain parallelism. Uses traps to obtain concurrency.

28 Questions