A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization David Bacon Perry Cheng (presenting) V.T. Rajan IBM T.J. Watson Research.

Presentation transcript:

A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization
David Bacon, Perry Cheng (presenting), V.T. Rajan
IBM T.J. Watson Research

Roadmap
- What is Real-time Garbage Collection?
- Pause Time, CPU Utilization (MMU), and Space Usage
- Heap Architecture
- Types of Fragmentation
- Incremental Compaction
- Read Barriers
- Barrier Performance
- Scheduling: Time-Based vs. Work-Based
- Empirical Results
- Pause Time Distribution
- Minimum Mutator Utilization (MMU)
- Pause Times
- Summary and Conclusion

Problem Domain
- Real-time embedded systems
- Memory usage important
- Uniprocessor

3 Styles of Uniprocessor Garbage Collection: Stop-the-World (STW) vs. Incremental (Inc) vs. Real-Time (RT)
[Figure: execution timelines for STW, Inc, and RT; horizontal axis: time]

Pause Times (Average and Maximum)
[Figure: pauses marked on the STW, Inc, and RT timelines, with labels ranging from 0.18 s to 1.7 s]

Coarse-Grained Utilization vs. Time (2.0 s window)
[Figure: utilization over time for STW, Inc, and RT]

Fine-Grained Utilization vs. Time (0.4 s window)
[Figure: utilization over time for STW, Inc, and RT]

Minimum Mutator Utilization (MMU)
[Figure: MMU curves for STW, Inc, and RT]
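As a rough illustration of the metric on this slide: for a given window size, MMU is the smallest fraction of any window of that length during which the mutator (the program) runs. The sketch below computes it from a list of collector pause intervals. The function names and the example trace are hypothetical, and pauses are assumed not to overlap.

```python
def mutator_time(pauses, start, end):
    """Time inside [start, end] during which no collector pause is active."""
    paused = sum(max(0.0, min(e, end) - max(s, start)) for s, e in pauses)
    return (end - start) - paused


def mmu(pauses, total_time, window):
    """Minimum mutator utilization for one window size.

    pauses: non-overlapping (start, end) collector intervals, in seconds.
    """
    if not 0 < window <= total_time:
        raise ValueError("window must be positive and fit inside the run")
    # Paused time in a sliding window is piecewise linear in the window
    # position, so the worst case occurs where a window edge meets a pause
    # edge (or at either end of the run).
    starts = {0.0, total_time - window}
    for s, e in pauses:
        starts.update((s, e, s - window, e - window))
    worst = 1.0
    for t in starts:
        t = min(max(t, 0.0), total_time - window)  # keep the window in the run
        worst = min(worst, mutator_time(pauses, t, t + window) / window)
    return worst


# Hypothetical trace: a single 0.5 s pause in a 4 s run.
trace = [(1.0, 1.5)]
for w in (0.5, 1.0, 2.0):
    print(w, mmu(trace, 4.0, w))
```

Evaluating `mmu` over a range of window sizes gives one curve of the kind plotted per collector; a window no longer than the longest pause always yields an MMU of zero.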

Space Usage over Time
[Figure: space usage over time, with levels marked at max live, the collection trigger, and 2 × max live]

Problems with Existing RT Collectors
[Figure: space usage of a non-moving collector and of a replicating collector, each on a scale from max live to 4 × max live]
Not fully incremental, tight coupling, work-based scheduling

Our Collector
- Goals and results
  - Real-time: ~10 ms
  - Low space overhead: ~2×
  - Good utilization during GC: ~40%
- Solution
  - Incremental mark-sweep collector
  - Write barrier: snapshot-at-the-beginning [Yuasa]
  - Segregated free list heap architecture
  - Read barrier: to support defragmentation [Brooks]
  - Incremental defragmentation
  - Segmented arrays: to bound fragmentation

Fragmentation and Compaction
- Intuitively: available but unusable memory
  → avoidance and coalescing: no guarantees
  → compaction
[Figure: heap diagram; legend: used, needed, free]

Heap Architecture
- Segregated free lists
  - heap divided into pages
  - each page has equally sized blocks (1 object per block)
  - large arrays are segmented
[Figure: pages of size-24 and size-32 blocks, marked used or free, illustrating external, internal, and page-internal fragmentation]
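The heap layout on this slide can be sketched as a toy allocator. This is only an illustration under stated assumptions, not the collector's implementation: the class name, the `(page, offset)` block representation, and the tiny page size are all hypothetical, and large (segmented) arrays are left out.

```python
class SegregatedHeap:
    """Toy segregated-free-list heap: each page holds blocks of one size."""

    def __init__(self, num_pages, page_size, size_classes):
        self.page_size = page_size
        self.size_classes = sorted(size_classes)
        self.free_pages = list(range(num_pages))
        # One free list per size class; a block is a (page, offset) pair.
        self.free_blocks = {s: [] for s in self.size_classes}

    def size_class_for(self, nbytes):
        """Smallest block size that fits the request."""
        for s in self.size_classes:
            if nbytes <= s:
                return s
        raise ValueError("request exceeds the largest block size")

    def allocate(self, nbytes):
        s = self.size_class_for(nbytes)
        if not self.free_blocks[s]:
            if not self.free_pages:
                raise MemoryError("no free page for size class %d" % s)
            # Carve a fresh page into equally sized blocks; the remainder
            # at the end of the page is page-internal fragmentation.
            page = self.free_pages.pop()
            count = self.page_size // s
            self.free_blocks[s].extend((page, i * s) for i in range(count))
        return self.free_blocks[s].pop()


# Two 64-byte pages and the two block sizes shown in the figure.
heap = SegregatedHeap(num_pages=2, page_size=64, size_classes=[24, 32])
first = heap.allocate(20)   # rounded up to a 24-byte block (internal waste)
second = heap.allocate(30)  # takes the other page for 32-byte blocks
```

In this toy setup a third and fourth 24-byte request show external fragmentation: once the 24-byte page is full, the request fails even though a 32-byte block is still free on the other page.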

Controlling Internal and Page-Internal Fragmentation
- Choose the page size ($page$) and the block sizes ($s_k$)
- If $s_k = s_{k-1}(1 + q)$, internal fragmentation $\le q$
- Page-internal fragmentation $\le s_{max} / page$
- E.g., if $page$ = 16K, $q$ = 1/8, and $s_{max}$ = 2K, maximum non-external fragmentation is limited to 12.5%.
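The two bounds on this slide can be checked numerically. The sketch below generates block sizes with a growth factor of at most 1 + q and measures the worst leftover space per page. The helper names and the smallest block size of 16 bytes are assumptions made for illustration; the page size, q, and largest block size are the slide's example values.

```python
import math


def size_classes(s_min, s_max, q):
    """Block sizes from s_min up to s_max, each at most (1 + q) times the last.

    The ratio bound assumes s_min * q >= 1, so rounding down never stalls.
    """
    sizes = [s_min]
    while sizes[-1] < s_max:
        grown = math.floor(sizes[-1] * (1 + q))
        nxt = max(sizes[-1] + 1, grown)  # always make progress
        sizes.append(min(nxt, s_max))
    return sizes


def worst_page_internal(page, sizes):
    """Largest fraction of a page left over after packing whole blocks."""
    return max((page % s) / page for s in sizes)


# Slide example: page = 16K, q = 1/8, s_max = 2K (S_MIN is an assumption).
PAGE, S_MIN, S_MAX, Q = 16 * 1024, 16, 2 * 1024, 1 / 8
classes = size_classes(S_MIN, S_MAX, Q)
print(len(classes), "size classes")
print("worst page-internal fraction:", worst_page_internal(PAGE, classes))
print("bound s_max / page:", S_MAX / PAGE)
```

Because consecutive sizes differ by a factor of at most 1 + q, an object placed in the smallest block that fits wastes less than a fraction q of its own size, and the leftover at the end of a page is always smaller than one block, hence below s_max / page.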

Fragmentation - small heap (q = 1/8 vs. q = 1/2)
[Figure: two panels, q = 1/8 and q = 1/2]

Incremental Compaction
- Compact only a part of the heap
  → Requires knowing what to compact ahead of time
- Key problems
  → Popular objects
  → Determining references to moved objects
[Figure: heap diagram; legend: used]

Incremental Compaction: Redirection
- Access all objects via per-object redirection pointers
- Redirection is initially self-referential
- Move an object by updating ONE redirection pointer
[Figure: an original object and its replica]
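The three points above can be sketched in a few lines. This is an illustrative model only: the `Obj` class, the `move` and `read` helpers, and the use of Python lists for fields are all hypothetical stand-ins for the real object layout.

```python
class Obj:
    """Heap object with a per-object redirection pointer."""

    def __init__(self, fields):
        self.fields = list(fields)
        self.redirect = self  # initially self-referential


def move(obj):
    """Copy obj and publish the copy by updating ONE redirection pointer."""
    replica = Obj(obj.fields)
    obj.redirect = replica
    return replica


def read(obj, offset):
    """Field access that always goes through the redirection pointer."""
    return obj.redirect.fields[offset]


original = Obj([1, 2])
stale_reference = original        # some other part of the program holds this
replica = move(original)
replica.fields[0] = 99
print(read(stale_reference, 0))   # the stale reference still sees the replica
```

Existing references to the original do not need to be found and rewritten at move time; they reach the replica through the redirection pointer until a later collection fixes them up.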

Consistency via Read Barrier [Brooks]
- Correctness requires always using the replica
- E.g., field selection must be modified:
    normal access:        x[offset]
    read barrier access:  x[redirect][offset]

Some Important Details
- Our read barrier is decoupled from collection
- Complication: in Java, any reference might be null
  - The actual read barrier for GetField(x, offset) must be augmented:
      tmp = x[offset];
      return (tmp == null) ? null : tmp[redirect]
  → CSE, code motion (LICM and sinking), null-check combining
- Barrier variants: when to redirect
  → lazy: easier for collector
  → eager: better for optimization
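A minimal sketch of the null-aware GetField barrier shown above, modeled in Python with `None` standing in for Java's `null`. The `Obj` class and field names are hypothetical. The sketch forwards a reference as soon as it is loaded, which it takes to correspond to the eager variant; that reading is an assumption.

```python
class Obj:
    """Heap object whose fields hold references to other objects (or None)."""

    def __init__(self, **fields):
        self.fields = dict(fields)
        self.redirect = self  # self-referential until the object is moved


def get_field(x, name):
    """Load a reference field and forward it unless it is null."""
    tmp = x.fields[name]
    # Without the null test, tmp.redirect would fail on a null reference.
    return None if tmp is None else tmp.redirect


leaf = Obj(next=None)
root = Obj(next=leaf)

print(get_field(leaf, "next") is None)   # null passes through unchanged
print(get_field(root, "next") is leaf)   # unmoved object forwards to itself

# Simulate the collector moving `leaf` (replica creation is simplified).
replica = Obj(next=None)
leaf.redirect = replica
print(get_field(root, "next") is replica)
```

The extra null test is the cost that the listed compiler optimizations aim to reduce, for example by combining it with a null check the program already performs.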

Barrier Overhead to Mutator
- Conventional wisdom says read barriers are too expensive
  → Studies found overhead of 20-40% (Zorn, Nielsen)
  → Our barrier has 4-6% overhead with optimizations

Program start
[Figure: heap (one size only) and stack]

Program is allocating
[Figure: heap and stack; legend: free, allocated]

GC starts
[Figure: heap and stack; legend: free, unmarked]

Program allocating and GC marking
[Figure: heap and stack; legend: free, unmarked, marked or allocated]

Sweeping away blocks
[Figure: heap and stack; legend: free, unmarked, marked or allocated]

GC moving objects and installing redirection
[Figure: heap and stack; legend: free, allocated, evacuated]

2nd GC starts tracing and redirection fixup
[Figure: heap and stack; legend: free, unmarked, evacuated, marked or allocated]

2nd GC complete
[Figure: heap and stack; legend: free, allocated]

Scheduling the Collector
- Scheduling issues
  - bad CPU utilization and space usage
  - loose program and collector coupling
- Time-based
  - Trigger the collector to run for $C_T$ seconds whenever the program runs for $Q_T$ seconds
- Work-based
  - Trigger the collector to perform $C_W$ units of work whenever the program allocates $Q_W$ bytes

Time-Based Scheduling
- Trigger the collector to run for $C_T$ seconds whenever the program runs for $Q_T$ seconds
[Figure: two plots: space (Mb) vs. time (s), and MMU (CPU utilization) vs. window size (s)]

Work-Based Scheduling
- Trigger the collector to collect $C_W$ bytes whenever the program allocates $Q_W$ bytes
[Figure: two plots: space (Mb) vs. time (s), and MMU (CPU utilization) vs. window size (s)]
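The difference between the two policies can be illustrated with a steady-state calculation. This is a simplified model, not the collector's scheduler: `alloc_rate` and `collector_rate` are hypothetical parameters introduced here to convert bytes into seconds, and the numeric values are made up.

```python
def time_based_utilization(q_t, c_t):
    """Mutator share when the program runs q_t s, then the collector c_t s."""
    return q_t / (q_t + c_t)


def work_based_utilization(q_w, c_w, alloc_rate, collector_rate):
    """Mutator share when c_w bytes are collected per q_w bytes allocated.

    alloc_rate: bytes/s allocated by the program (assumed parameter).
    collector_rate: bytes/s processed by the collector (assumed parameter).
    """
    mutator_quantum = q_w / alloc_rate        # time to allocate q_w bytes
    collector_quantum = c_w / collector_rate  # time to do c_w bytes of work
    return mutator_quantum / (mutator_quantum + collector_quantum)


MB = 1024 * 1024

# Time-based: the share is fixed by the two quanta alone.
print("time-based:", time_based_utilization(q_t=0.010, c_t=0.010))

# Work-based: the share falls as the program allocates faster.
for rate in (1 * MB, 10 * MB, 100 * MB):
    u = work_based_utilization(q_w=100 * 1024, c_w=200 * 1024,
                               alloc_rate=rate, collector_rate=20 * MB)
    print("work-based at", rate // MB, "MB/s:", round(u, 3))
```

Under this model a burst of allocation shortens the mutator quantum under work-based scheduling, so utilization dips exactly when the program is busiest, while the time-based share does not depend on the allocation rate.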

Pause Time Distribution for javac (Time-Based vs. Work-Based)
[Figure: pause time distributions for the two schedulers; 12 ms marked]

Utilization vs. Time for javac (Time-Based vs. Work-Based)
[Figure: two plots of utilization (%) vs. time (s); 0.45 marked]

Minimum Mutator Utilization for javac (Time-Based vs. Work-Based)

Space Usage for javac (Time-Based vs. Work-Based)

Intrinsic Tradeoff
- 3 inter-related factors:
  - Space bound (tradeoff)
  - Utilization (tradeoff)
  - Allocation rate (lower is better)
- Other factors
  - Collection rate (higher is better)
  - Pointer density (lower is better)

Summary: Mostly Non-moving RT GC
- Read barriers
  - Permit incremental defragmentation
  - Overhead is 4-6% with compiler optimizations
- Low space overhead
  - Space usage is only about 2 × max live data
  - Fragmentation still bounded
- Consistent utilization
  - Always at least 45% at 12 ms resolution

Conclusions
- Real-time GC is real
- There are tradeoffs, just like in traditional GC
- Scheduling should be primarily time-based
  - Fall back to work-based scheduling when the user's parameter estimates are incorrect
- Incremental defragmentation is possible
- Compiler support is important!

Future Work
- Lowering the real-time resolution
  - Sub-millisecond worst-case pause
  - Main issue: breaking up stack scan
- Segmented array optimizations
  - Reduce segmented array cost below ~2%
  - Opportunistic contiguous layout
  - Type-based specialization with invalidation
  - Strip-mining