A Real-Time Garbage Collector with Low Overhead and Consistent Utilization
David F. Bacon, Perry Cheng, and V.T. Rajan
IBM T.J. Watson Research Center
Presented by Jason VanFickell; thanks to Srilakshmi Swati Pendyala for the 2009 slides

Motivation
- Real-time systems are growing in importance.
- Higher-level programming languages are increasingly desirable for building them.
Constraints for real-time systems:
- Hard constraints on continuous performance (low pause times): maximum pause time < required response time.
- CPU utilization sufficient to accomplish the task, measured with Minimum Mutator Utilization (MMU).
- Memory requirement < resource limit, an important constraint for embedded systems with little memory.
Hence the need for a real-time garbage collector with low memory usage.

Problems with Previous Work
Fragmentation:
- Early work (Baker's Treadmill) handles only a single object size.
- Fragmentation is not a major problem for a family of C and C++ benchmarks (Johnstone's measurements), but relying on that is unsustainable for long-running programs.
- Using a single (large) block size increases memory requirements and internal fragmentation.
High space overhead:
- Copying algorithms avoid fragmentation but increase space overhead.
Uneven mutator utilization (the fraction of the processor devoted to mutator execution):
- Copying algorithms suffer from uneven mutator utilization, with long low-utilization periods.
Inability to handle large data structures.

Components and Concepts in the Metronome
- Segregated free-list allocator: a geometric size progression limits internal fragmentation.
- Mostly non-copying: objects are usually not moved.
- Defragmentation: moves objects to a new page when a page is left fragmented after GC.
- Read barrier: to-space invariant [Brooks], with new techniques bringing the overhead down to only 4%.
- Incremental mark-sweep collector: the mark phase fixes up stale pointers.
- Arraylets: bound fragmentation and make large-object operations incremental.
- Time-based scheduling.

Segregated Free-List Allocator
- The heap is divided into fixed-size pages.
- Each page is divided into blocks of one fixed size.
- Objects are allocated in the smallest block size that fits.
(Figure: pages carved into blocks of size 12, 16, and 24.)
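For concreteness, here is a minimal sketch of such an allocator. The class name, the 4 KB page size, and the page-source stub are assumptions made for illustration; this is not the Jikes RVM implementation.

```java
// Sketch of a segregated free-list allocator: one free list of blocks per size class.
import java.util.ArrayDeque;
import java.util.Deque;

class SegregatedAllocator {
    static final int PAGE_SIZE = 4096;        // assumed fixed page size for the sketch
    final int[] sizeClasses;                  // block sizes, smallest first
    final Deque<Long>[] freeLists;            // one free list of block addresses per size class
    private long nextPage = 0;                // placeholder page source

    @SuppressWarnings("unchecked")
    SegregatedAllocator(int[] sizeClasses) {
        this.sizeClasses = sizeClasses;
        this.freeLists = new Deque[sizeClasses.length];
        for (int i = 0; i < sizeClasses.length; i++) freeLists[i] = new ArrayDeque<>();
    }

    /** Allocate an object of `bytes` bytes in the smallest block size that fits. */
    long allocate(int bytes) {
        int k = sizeClassFor(bytes);
        if (freeLists[k].isEmpty()) refill(k);            // carve a fresh page into blocks
        return freeLists[k].pop();
    }

    int sizeClassFor(int bytes) {
        for (int i = 0; i < sizeClasses.length; i++)
            if (sizeClasses[i] >= bytes) return i;
        throw new IllegalArgumentException("large object: handled separately (see arraylets)");
    }

    private void refill(int k) {
        long page = nextPage;                             // stand-in for a real page allocator
        nextPage += PAGE_SIZE;
        int block = sizeClasses[k];
        for (long off = 0; off + block <= PAGE_SIZE; off += block)
            freeLists[k].push(page + off);                // divide the page into fixed-size blocks
    }
}
```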

Limiting Internal Fragmentation
- Choose a page size P and block sizes s_k such that s_k = s_{k-1} (1 + ρ).
- How do we choose s_0 and ρ?
  - s_0 ≈ the minimum block size.
  - ρ must be small enough to avoid internal fragmentation (too large a ρ wastes space inside blocks), yet not so small that there are too many size classes and hence too many partly filled pages; that page-level waste matters less for long-running processes.
- Memory for a page should be allocated only when at least one object lives in that page.
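As a concrete illustration, the following small sketch builds such a geometric size-class table. The values s_0 = 8 bytes, ρ = 1/8, and the 2 KB bound are assumptions chosen for the example, not the parameters used in the paper.

```java
// Builds the size-class progression s_k = s_{k-1} * (1 + rho), rounded up to integers.
import java.util.ArrayList;
import java.util.List;

class SizeClasses {
    static List<Integer> build(int s0, double rho, int pageSize) {
        List<Integer> sizes = new ArrayList<>();
        int s = s0;
        while (s <= pageSize) {
            sizes.add(s);
            // grow by at least a factor of (1 + rho); an object just larger than s_{k-1}
            // then wastes at most roughly a fraction rho of its block
            s = Math.max(s + 1, (int) Math.ceil(s * (1 + rho)));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(build(8, 0.125, 2048));   // e.g. 8, 9, 11, 13, 15, 17, ...
    }
}
```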

Defragmentation
When do we move objects?
- At the end of the sweep phase, when there are not enough free pages for the mutator to keep executing, i.e., when fragmentation has set in.
- Usually a program exhibits locality of size, so the blocks of dead objects are re-used quickly.
- Defragment when either:
  - dead objects are not re-used within one GC cycle, or
  - free pages fall below the limit needed to perform a GC (a small sketch of this decision follows).
- In practice only 2-3% of the data traced is moved: a major improvement over a copying collector.
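A minimal sketch of that trigger, with hypothetical names and a hypothetical "pages needed for the next cycle" threshold rather than the paper's exact test:

```java
// Decide at the end of the sweep phase whether to run the defragmenter.
class DefragPolicy {
    static boolean shouldDefragment(long freePages,
                                    long pagesNeededForNextGcCycle,
                                    boolean deadBlocksReusedThisCycle) {
        // Defragment when size locality has broken down (freed blocks sat unused for a whole
        // GC cycle) or when the reserve of free pages is too small to reach the next collection.
        return !deadBlocksReusedThisCycle || freePages < pagesNeededForNextGcCycle;
    }
}
```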

Read Barrier: To-Space Invariant
- Problem: the collector moves objects (defragmentation) while the mutator is finely interleaved with it.
- Solution: a read barrier ensures consistency.
  - Each object contains a forwarding pointer [Brooks].
  - The read barrier unconditionally forwards all pointers, so the mutator never sees an old version of an object.
- Question: does the read barrier itself affect mutator utilization?
(Figure: before the move, X, Y, and Z reach A in from-space; afterwards accesses are forwarded to the to-space copy A′.)
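A minimal sketch of a Brooks-style forwarding pointer and an unconditional read barrier, assuming an illustrative ObjectHeader layout; in the real system the barrier is emitted inline by the compiler rather than written as a method.

```java
// Every object carries a forwarding pointer that initially points to itself.
class ObjectHeader {
    ObjectHeader forward = this;       // forwarding pointer: self until the object is moved
    Object[] fields = new Object[4];   // illustrative payload
    boolean marked;                    // mark bit, used by the later sketches
}

class ReadBarrier {
    // Every reference-field load goes through the forwarding pointer, so once the
    // defragmenter has copied an object the mutator only ever sees the to-space version.
    static Object readField(ObjectHeader obj, int i) {
        return obj.forward.fields[i];
    }
}
```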

Read Barrier Optimization
- Previous studies report 20-40% read-barrier overhead [Zorn, Nielsen].
- Several compiler optimizations applied to the read barrier reduce the overhead to under 10%.
- "Eager" read barriers (forward when a reference is loaded) are preferred over "lazy" ones (forward at every use).
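To make the placement difference concrete, the sketch below reuses the ObjectHeader from the previous sketch; the method bodies are invented solely to show where the forwarding load sits in each scheme.

```java
class BarrierPlacement {
    // Lazy: forward at every use of the reference (two forwarding loads here).
    static Object firstNonNullLazy(ObjectHeader cell) {
        Object a = cell.forward.fields[0];
        Object b = cell.forward.fields[1];
        return a != null ? a : b;
    }

    // Eager: forward once when the reference is loaded; later uses are barrier-free,
    // and the compiler can keep the forwarded pointer in a register or hoist it out of loops.
    static Object firstNonNullEager(ObjectHeader cell) {
        ObjectHeader fwd = cell.forward;      // single forwarding load
        Object a = fwd.fields[0];
        Object b = fwd.fields[1];
        return a != null ? a : b;
    }
}
```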

Incremental Mark-Sweep
- Mark/sweep work is finely interleaved with the mutator.
- Write barrier: snapshot-at-the-beginning [Yuasa].
  - Ensures no objects are lost: overwritten pointers go into a write buffer whose entries are treated as roots.
- The read barrier ensures consistency: the marker always traces the correct (to-space) object, which simplifies the interleaving.
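Here is a sketch of what a Yuasa-style snapshot-at-the-beginning write barrier could look like. The class and buffer names are assumptions; a real VM would emit this inline and use per-thread write buffers.

```java
// While a collection is in progress, the old value of any overwritten pointer field is logged,
// so every object reachable at the start of the collection still gets marked.
import java.util.ArrayDeque;
import java.util.Deque;

class SnapshotWriteBarrier {
    static volatile boolean collectionInProgress = false;
    static final Deque<Object> writeBuffer = new ArrayDeque<>();   // entries are treated as roots

    static void writeField(Object[] holderFields, int index, Object newValue) {
        if (collectionInProgress) {
            Object old = holderFields[index];
            if (old != null) writeBuffer.push(old);   // remember the snapshot-time referent
        }
        holderFields[index] = newValue;               // then perform the actual store
    }
}
```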

Pointer Fix-Up During Mark
- When can a moved object's old copy be freed? When there are no more pointers to it.
- The mark phase updates pointers, redirecting forwarded references as it marks them.
- An object moved in collection n can therefore be freed at the end of the mark phase of collection n+1.
(Figure: references X, Y, and Z to A in from-space being redirected to A′ in to-space during marking.)
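A sketch of the fix-up folded into the mark loop, reusing the ObjectHeader sketch above; the explicit mark stack and mark bit are illustrative choices, not the collector's actual data structures.

```java
// While scanning an object, replace any reference to a moved object with its to-space copy,
// so after one full mark phase no stale pointers remain and the old from-space copy can be freed.
import java.util.Deque;

class MarkFixup {
    static void scanAndFix(ObjectHeader obj, Deque<ObjectHeader> markStack) {
        for (int i = 0; i < obj.fields.length; i++) {
            Object ref = obj.fields[i];
            if (!(ref instanceof ObjectHeader)) continue;
            ObjectHeader child = ((ObjectHeader) ref).forward;   // to-space copy if it was moved
            obj.fields[i] = child;                                // redirect the stale pointer
            if (!child.marked) {
                child.marked = true;
                markStack.push(child);                            // trace it later
            }
        }
    }
}
```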

Arraylets
- Large arrays create problems: they fragment the memory space and cannot be moved in a short, bounded time.
- Solution: break large arrays into arraylets.
  - Elements are accessed via one level of indirection; the collector can move one arraylet at a time.
(Figure: a spine pointing to arraylets A1, A2, A3.)
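A sketch of arraylet-style indexing through a spine of fixed-size chunks; the 1024-element chunk size and the class layout are assumptions for illustration.

```java
class ArrayletArray {
    static final int ARRAYLET_LENGTH = 1024;     // elements per arraylet chunk (assumed)
    private final Object[][] spine;              // spine: one pointer per arraylet
    final int length;

    ArrayletArray(int length) {
        this.length = length;
        int chunks = (length + ARRAYLET_LENGTH - 1) / ARRAYLET_LENGTH;
        spine = new Object[chunks][];
        for (int i = 0; i < chunks; i++) spine[i] = new Object[ARRAYLET_LENGTH];
    }

    // One extra indirection per element access; the collector can relocate a single
    // arraylet in short, bounded time by copying it and patching one spine slot.
    Object get(int i)           { return spine[i / ARRAYLET_LENGTH][i % ARRAYLET_LENGTH]; }
    void   set(int i, Object v) { spine[i / ARRAYLET_LENGTH][i % ARRAYLET_LENGTH] = v; }
}
```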

Heap lifecycle (figure sequence; the labels name the block states shown in each frame):
1. Program start: stack plus an empty heap (one block size only).
2. Program is allocating: free and allocated blocks.
3. GC starts: free and unmarked blocks.
4. Program allocating and GC marking: free, unmarked, and marked-or-allocated blocks.
5. Sweeping away unmarked blocks.
6. GC moving objects and installing redirection: free, evacuated, and allocated blocks.
7. Second GC starts tracing and redirection fix-up: free, evacuated, unmarked, and marked-or-allocated blocks.
8. Second GC complete: free and allocated blocks only.

Scheduling the Collector
- Scheduling issue: the program and collector are only loosely coupled, which risks poor CPU utilization and space usage.
- Competing options:
  - Time-based: run the collector for C_T seconds whenever the mutator has run for Q_T seconds.
  - Work-based: perform C_W units of collection work whenever the mutator allocates Q_W bytes.
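The sketch below contrasts the two trigger policies; the concrete quanta (Q_T, C_T, Q_W, C_W) are illustrative placeholders, not the values used in the paper.

```java
class CollectorScheduler {
    // Time-based: after every Q_T seconds of mutator execution, run the collector for C_T seconds.
    static final double Q_T_SECONDS = 0.010;      // e.g. 10 ms of mutator time (assumed)
    static final double C_T_SECONDS = 0.005;      // followed by 5 ms of collector time (assumed)

    // Work-based: after every Q_W bytes allocated, perform C_W units of collector work.
    static final long Q_W_BYTES = 64 * 1024;      // assumed allocation quantum
    static final long C_W_WORK  = 128 * 1024;     // e.g. bytes traced or swept per increment

    private double mutatorTimeSinceLastSlice = 0;
    private long   bytesSinceLastIncrement   = 0;

    /** Time-based trigger: true when the collector should run for C_T seconds. */
    boolean timeBasedTick(double mutatorSecondsElapsed) {
        mutatorTimeSinceLastSlice += mutatorSecondsElapsed;
        if (mutatorTimeSinceLastSlice < Q_T_SECONDS) return false;
        mutatorTimeSinceLastSlice -= Q_T_SECONDS;
        return true;
    }

    /** Work-based trigger: true when the collector should perform C_W units of work. */
    boolean workBasedAllocation(long bytesAllocated) {
        bytesSinceLastIncrement += bytesAllocated;
        if (bytesSinceLastIncrement < Q_W_BYTES) return false;
        bytesSinceLastIncrement -= Q_W_BYTES;
        return true;
    }
}
```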

Scheduling: Time-Based vs. Work-Based
Time-based:
- Very predictable mutator utilization.
- Memory allocation rates need to be monitored to make sure real-time performance is obtained.
Work-based:
- Uneven mutator utilization due to bursty allocation.
- Memory allocation does not need to be monitored.

Experimental Results
- IBM RS/6000 Enterprise Server F80 running AIX 5.1.
- 500 MHz PowerPC RS64 III, 4 MB of L2 cache, 4 GB RAM.
- Jikes Research Virtual Machine (RVM) 2.1.1, with adaptive compilation disabled.

Pause Time Distribution for javac (Time-Based vs. Work-Based)

Utilization vs. Time for javac (Time-Based vs. Work-Based)

Minimum Mutator Utilization for javac (Time-Based vs. Work-Based)

Space Usage for javac (Time-Based vs. Work-Based)

Conclusions
- The Metronome provides true real-time GC: the first collector to do so without major sacrifice.
  - Short pauses (12.4 ms).
  - Copying limited, with only 4% overhead.
  - High MMU during collection (50%).
  - Low memory consumption (2.5 x max live).
- Critical features:
  - Time-based scheduling.
  - Hybrid, mostly non-copying approach.
  - Integration with the compiler.

Discussion
- What are the downsides of incremental real-time collection?
- What is preserved that Baker's algorithm does not preserve?
- Was the architecture used for the experiments appropriate?
- Were the performance characteristics adequately explored?