The Metronome: A Hard Real-time Garbage Collector. David F. Bacon, Perry Cheng, V.T. Rajan, IBM T.J. Watson Research Center.

The Problem
Real-time systems growing in importance
  – Many CPUs in a BMW: “80% of innovation in SW”
Programmers left behind
  – Still using assembler and C
  – Lack the productivity advantages of Java
Result
  – Complexity
  – Low reliability or very high validation cost

Problem Domain
Hard real-time garbage collection
Uniprocessor
  – Multiprocessors very rare in real-time systems
  – Complication: collector must be finely interleaved
  – Simplification: memory model easier to program
      No truly concurrent operations
      Sequentially consistent

Metronome Project Goals
Make GC feasible for hard real-time systems
Provide simple application interface
Develop technology that is efficient:
  – Throughput, space comparable to stop-the-world
BREAK THE MILLISECOND BARRIER
  – While providing even CPU utilization

Outline
What is “Real-Time” Garbage Collection?
Problems with Previous Work
Overview of the Metronome
Scheduling
Empirical Results
Conclusions

What is “Real-Time” Garbage Collection?

3 Uniprocessor GC Types [Figure: timelines of GC #1 and GC #2 under stop-the-world (STW), incremental (Inc), and real-time (RT) collection]

Utilization vs Time: 2 s Window [Figure: STW, Inc, RT]

Utilization vs Time: 0.4 s Window [Figure: STW, Inc, RT]

Minimum Mutator Utilization [Figure: STW, Inc, RT]

What is Real-time? It Is …
Maximum pause time < required response
CPU utilization sufficient to accomplish task
  – Measured with MMU
Memory requirement < resource limit
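
The slides measure CPU utilization with minimum mutator utilization (MMU): for a window size w, the smallest fraction of any window of length w during which the mutator, rather than the collector, is running. The following is a minimal Python sketch of that definition, not code from the Metronome; it assumes collector pauses are given as non-overlapping (start, end) intervals in milliseconds, and the function names are hypothetical.

```python
def gc_time_in_window(pauses, lo, hi):
    """Total collector time overlapping the window [lo, hi]."""
    return sum(max(0.0, min(end, hi) - max(start, lo)) for start, end in pauses)


def mmu(pauses, total, w):
    """Minimum mutator utilization for window size w over a run [0, total].

    Assumes `pauses` are non-overlapping (start, end) collector intervals.
    Collector time inside a sliding window is piecewise linear in the
    window's start, so its maximum occurs where a window edge touches a
    pause boundary; only those candidate starts need to be checked.
    """
    if not 0 < w <= total:
        raise ValueError("window size must be in (0, total]")
    candidates = {0.0, total - w}
    for start, end in pauses:
        candidates.update((start, end, start - w, end - w))
    worst_gc = 0.0
    for t in candidates:
        t = min(max(t, 0.0), total - w)  # keep the window inside the run
        worst_gc = max(worst_gc, gc_time_in_window(pauses, t, t + w))
    return (w - worst_gc) / w


# Two 200 ms pauses, 300 ms apart, in a 10 s run (times in ms).
pauses = [(1_000, 1_200), (1_500, 1_700)]
print(mmu(pauses, 10_000, 1_000))  # 0.6: some 1 s window contains both pauses
print(mmu(pauses, 10_000, 200))    # 0.0: a 200 ms window can sit inside a pause
```

The second call shows why a single utilization number is not enough: the same schedule can look fine over long windows and still starve the mutator completely over short ones.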

Problems with Previous Work

No Compaction
Hypothesis: fragmentation not a problem
  – Can use avoidance and coalescing [Johnstone]
  – Non-moving incremental collection is simpler
Problem: long-running applications
  – Reboot your car every 500 miles?
[Figure: heap usage over time, with reference lines at 1X, 2X, 3X, and 4X max live]

Copying Collection
Idea: copy into to-space concurrently
Compaction is part of basic operation
Problem: space usage
  – 2 semi-spaces plus space to mutate during GC
  – Requires 4-8X max live data in practice
[Figure: heap usage over time, with reference lines at 1X, 2X, 3X, and 4X max live]

Work-Based Scheduling
The Baker fallacy:
  – “A real-time list processing system is one in which the time required by the elementary list processing operations…is bounded by a small constant” [Baker’78]
Implicitly assumes GC work done in mutator
  – What does “small constant” mean?
  – Typically, constant is not so small
      And there is variability (fault rate analogy)

Overview of the Metronome

Is it real-time? Yes
Maximum pause time < 4 ms
MMU > 50% ± 2%
Memory requirement < 2X max live

Components of the Metronome
Segregated free list allocator
  – Geometric size progression limits internal fragmentation
Write barrier: snapshot-at-the-beginning [Yuasa]
Read barrier: to-space invariant [Brooks]
  – New techniques with only 4% overhead
Incremental mark-sweep collector
  – Mark phase fixes stale pointers
Selective incremental defragmentation
  – Moves < 2% of traced objects
Arraylets: bound fragmentation, large object ops
Time-based scheduling

Segregated Free List Allocator
Heap divided into fixed-size pages
Each page divided into fixed-size blocks
Objects allocated in smallest block that fits
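
A toy Python model of the allocator described above, assuming 16 KB pages (the example page size used for the fragmentation bound) and an arbitrary set of size classes. The class and method names are hypothetical and addresses are just (page, offset) pairs, so this illustrates only the bookkeeping, not the Metronome's actual allocator.

```python
PAGE_SIZE = 16 * 1024  # bytes; assumed page size P for this sketch


class SegregatedFreeListAllocator:
    """Toy segregated free-list allocator: pages are carved into equal-size blocks.

    Addresses are (page, offset) pairs; there is no real memory behind them.
    """

    def __init__(self, size_classes, num_pages):
        self.size_classes = sorted(size_classes)
        self.free_pages = list(range(num_pages))
        # One free list of block addresses per size class.
        self.free_lists = {s: [] for s in self.size_classes}

    def size_class_for(self, nbytes):
        """Smallest block size that fits the object."""
        for s in self.size_classes:
            if nbytes <= s:
                return s
        raise ValueError(f"{nbytes} bytes exceeds the largest size class")

    def allocate(self, nbytes):
        s = self.size_class_for(nbytes)
        if not self.free_lists[s]:
            self._carve_page(s)
        return self.free_lists[s].pop()

    def free(self, addr, nbytes):
        self.free_lists[self.size_class_for(nbytes)].append(addr)

    def _carve_page(self, s):
        if not self.free_pages:
            raise MemoryError("no free pages")
        page = self.free_pages.pop()
        # PAGE_SIZE // s blocks fit; any remainder is page-internal fragmentation.
        offsets = range(0, (PAGE_SIZE // s) * s, s)
        # Push in reverse so the lowest offset is handed out first.
        self.free_lists[s].extend((page, off) for off in reversed(offsets))


heap = SegregatedFreeListAllocator([16, 32, 64, 128], num_pages=4)
print(heap.allocate(20))  # (3, 0): 20 bytes goes into a 32-byte block
print(heap.allocate(30))  # (3, 32): next block on the same page
print(heap.allocate(10))  # (2, 0): a 16-byte block from a freshly carved page
```

Because a page only ever holds one block size, freeing a block just returns it to that size's list; the cost is that free blocks of one size cannot serve requests of another (external fragmentation).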

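Fragmentation on a Page
Internal: wasted space at the end of a block, after the object
Page-internal: wasted space at the end of a page
External: free blocks of one size that are needed for another size
[Figure: page diagram labeling external, internal, and page-internal fragmentation]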

Limiting Internal Fragmentation
Choose page size $P$ and block sizes $s_k$ such that
  – $s_k = s_{k-1}(1+\rho)$
  – $s_{\max} = P\rho$
Then
  – Internal and page-internal fragmentation $< \rho$
Example:
  – $P = 16\,\text{KB}$, $\rho = 1/8$, $s_{\max} = 2\,\text{KB}$
  – Internal and page-internal fragmentation $< 1/8$
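
The bounds can be checked numerically. This Python sketch assumes block sizes are rounded up to whole bytes ($s_k = \lceil s_{k-1}(1+\rho) \rceil$), that the smallest class $s_{\min}$ is also the smallest object size, and that object sizes are integers. Rounding up adds less than one byte, so the wasted fraction of a block stays below $\rho$, and the leftover at the end of a page is smaller than $s_{\max} = P\rho$. The function names are hypothetical.

```python
import math


def size_classes(page_size, rho, s_min):
    """Geometric block sizes s_k = ceil(s_{k-1} * (1 + rho)), capped at s_max = page_size * rho."""
    s_max = int(page_size * rho)
    sizes = [s_min]
    while sizes[-1] < s_max:
        sizes.append(min(math.ceil(sizes[-1] * (1 + rho)), s_max))
    return sizes


def worst_internal(sizes):
    """Largest wasted fraction of a block: an object one byte bigger than the previous class.

    The first class is skipped because objects are assumed to be at least s_min bytes.
    """
    return max((s - (prev + 1)) / s for prev, s in zip(sizes, sizes[1:]))


def worst_page_internal(page_size, sizes):
    """Largest fraction of a page left over after carving it into equal blocks."""
    return max((page_size % s) / page_size for s in sizes)


sizes = size_classes(16 * 1024, 1 / 8, s_min=16)
print(sizes[-1])                                      # 2048 = s_max
print(worst_internal(sizes) < 1 / 8)                  # True
print(worst_page_internal(16 * 1024, sizes) < 1 / 8)  # True
```

Raising $\rho$ (for example to 1/2) gives far fewer size classes at the price of a looser bound, which is the trade-off compared in the next slide.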

Fragmentation: ρ=1/8 vs ρ=1/2

Write Barrier: Snapshot-at-start
Problem: mutator changes object graph
Solution: write barrier prevents lost objects
Logically, collector takes atomic snapshot
  – Objects live at snapshot will not be collected
  – Write barrier saves overwritten pointers [Yuasa]
  – Write buffer must be drained periodically
[Figure: objects A, B, and C with the write buffer (WB)]
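
A minimal Python sketch of a Yuasa-style snapshot-at-the-beginning barrier, assuming a single mutator thread and a `marking` flag that is set while the collector is active; `SnapshotBarrier`, `write_field`, and `drain` are hypothetical names. The barrier logs the pointer being overwritten, so an object that was reachable when the snapshot was taken cannot be lost even if the mutator deletes the last pointer to it.

```python
class Obj:
    """Heap object with a fixed number of pointer fields."""

    def __init__(self, name, nfields=1):
        self.name = name
        self.fields = [None] * nfields


class SnapshotBarrier:
    """Yuasa-style snapshot-at-the-beginning (deletion) write barrier."""

    def __init__(self):
        self.marking = False
        self.write_buffer = []  # overwritten pointers saved while marking

    def write_field(self, obj, i, new_value):
        if self.marking:
            old = obj.fields[i]
            if old is not None:
                # Save the pointer being overwritten so the collector still
                # sees every object that was reachable at the snapshot.
                self.write_buffer.append(old)
        obj.fields[i] = new_value

    def drain(self):
        """Hand the buffered pointers to the collector (to be treated as roots)."""
        saved, self.write_buffer = self.write_buffer, []
        return saved


barrier = SnapshotBarrier()
a, b = Obj("A"), Obj("B")
a.fields[0] = b
barrier.marking = True
barrier.write_field(a, 0, None)            # mutator deletes the only pointer to B
print([o.name for o in barrier.drain()])  # ['B']: B survives this cycle
```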

Read Barrier: To-space Invariant
Problem: collector moves objects (defragmentation)
  – and mutator is finely interleaved
Solution: read barrier ensures consistency
  – Each object contains a forwarding pointer [Brooks]
  – Read barrier unconditionally forwards all pointers
  – Mutator never sees old versions of objects
[Figure: from-space and to-space before and after object A (fields X, Y, Z) is moved to A′]
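
A Python sketch of a Brooks-style forwarding pointer with an unconditional read barrier. It models only the indirection: every access goes through `forward`, which points to the object itself until the collector moves it. The class and function names are hypothetical, and `move` assumes the object has not already been moved.

```python
class BrooksObject:
    """Object carrying a Brooks-style forwarding pointer, initially to itself."""

    def __init__(self, **fields):
        self.forward = self
        self.fields = dict(fields)


def read_barrier(ref):
    # Unconditional: always take one extra load through the forwarding pointer,
    # whether or not the object has been moved.
    return ref.forward


def get_field(ref, name):
    return read_barrier(ref).fields[name]


def set_field(ref, name, value):
    read_barrier(ref).fields[name] = value


def move(obj):
    """Collector copies obj to to-space and points the old copy's forwarder at it."""
    copy = BrooksObject(**obj.fields)
    obj.forward = copy
    return copy


old = BrooksObject(x=1)
new = move(old)
set_field(old, "x", 2)      # a stale reference still writes to the current copy
print(get_field(old, "x"))  # 2
print(new.fields["x"])      # 2
```

Making the barrier unconditional trades one extra load on every access for the absence of a test-and-branch, which is what keeps the mutator from ever observing the old copy.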

Read Barrier Optimizations
Barrier variants: when to redirect
  – Lazy: easier for collector (no fixup)
  – Eager: better for performance (loop over a[i])
Standard optimizations: CSE, code motion
Problem: pointers can be null
  – Augment read barrier for GetField(x, offset):
      tmp = x[offset]; return tmp == null ? null : tmp[redirect]
  – Optimize by null-check combining, sinking
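
The null-checked `GetField` on this slide can be written out as follows. This Python sketch assumes the eager variant, where every pointer is forwarded as it is loaded so that locals always hold to-space pointers; `HeapObject`, `slots`, and `redirect` are hypothetical stand-ins for the object layout.

```python
class HeapObject:
    """Object with pointer slots and a Brooks-style `redirect` field."""

    def __init__(self, nslots=0):
        self.redirect = self
        self.slots = [None] * nslots


def get_field(x, offset):
    """Eager read barrier: forward the pointer as it is loaded.

    tmp = x[offset]; return tmp == null ? null : tmp[redirect]
    """
    tmp = x.slots[offset]
    return None if tmp is None else tmp.redirect


# An "array" with a null slot and a slot whose target has been moved.
moved_from, moved_to = HeapObject(), HeapObject()
moved_from.redirect = moved_to
arr = HeapObject(nslots=3)
arr.slots[0] = moved_from
arr.slots[2] = HeapObject()

# Loop over a[i]: each loaded pointer is already a to-space pointer,
# so later uses inside the loop body need no further barrier.
loaded = [get_field(arr, i) for i in range(3)]
print(loaded[0] is moved_to, loaded[1] is None)  # True True
```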

Read Barrier Results
Conventional wisdom: read barriers too slow
  – Previous studies: 20-40% overhead [Zorn, Nielsen]

Incremental Mark-Sweep
Mark/sweep finely interleaved with mutator
Write barrier ensures no lost objects
  – Must treat objects in write buffer as roots
Read barrier ensures consistency
  – Marker always traces correct object
With barriers, interleaving is simple
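
A Python sketch of how incremental marking and the write buffer fit together, assuming a uniprocessor where each collector `step` runs without interruption and scans at most `budget` objects; all names are hypothetical. Draining the write buffer into the gray set at the start of each step is what "treat objects in write buffer as roots" amounts to here.

```python
from collections import deque


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.marked = False


class IncrementalMarker:
    """Mark phase that runs in small, bounded steps between mutator quanta."""

    def __init__(self, roots):
        self.gray = deque()      # marked but not yet scanned
        self.write_buffer = []   # filled by the snapshot write barrier
        for r in roots:
            self._shade(r)

    def _shade(self, obj):
        if obj is not None and not obj.marked:
            obj.marked = True
            self.gray.append(obj)

    def write(self, obj, i, new):
        """Mutator pointer store with a Yuasa-style barrier (marking assumed active)."""
        if obj.children[i] is not None:
            self.write_buffer.append(obj.children[i])
        obj.children[i] = new

    def step(self, budget):
        """Scan at most `budget` objects; return True once marking is complete."""
        # Objects saved by the write barrier are treated as roots.
        while self.write_buffer:
            self._shade(self.write_buffer.pop())
        while budget > 0 and self.gray:
            obj = self.gray.popleft()
            for child in obj.children:
                self._shade(child)
            budget -= 1
        return not self.gray


r, a, b = Node("r"), Node("a"), Node("b")
r.children = [a]
a.children = [b]
marker = IncrementalMarker([r])
marker.step(1)                    # collector quantum: scans r only
marker.write(r, 0, b)             # mutator: r now points to b ...
marker.write(a, 0, None)          # ... and a's pointer to b is deleted
print(marker.step(10), b.marked)  # True True: b was not lost
```

Without the barrier, b would be reachable only from r, which has already been scanned, and the marker would never find it.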

Pointer Fixup During Mark
When can a moved object be freed?
  – When there are no more pointers to it
Mark phase updates pointers
  – Redirects forwarded pointers as it marks them
Object moved in collection n can be freed:
  – At the end of mark phase of collection n+1
[Figure: from-space object A (fields X, Y, Z) forwarded to its to-space copy A′]
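
A Python sketch of marking with pointer fixup, assuming Brooks-style forwarding pointers (each object's `forward` field points to itself unless the object was moved). Every pointer the marker traverses is overwritten with its forwarded target, so once marking ends nothing refers to the old copy and it can be freed. The names are hypothetical.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.forward = self  # Brooks-style forwarding pointer
        self.fields = []
        self.marked = False


def mark_with_fixup(roots):
    """Mark from `roots`, redirecting every pointer to a moved object as it is traced."""
    stack = []
    for i, r in enumerate(roots):
        roots[i] = r.forward  # fix up the root itself
        if not roots[i].marked:
            roots[i].marked = True
            stack.append(roots[i])
    while stack:
        obj = stack.pop()
        for j, child in enumerate(obj.fields):
            if child is None:
                continue
            child = child.forward
            obj.fields[j] = child  # stale from-space pointer replaced
            if not child.marked:
                child.marked = True
                stack.append(child)


# A was moved to A' in an earlier collection; X and Y still point at the old copy.
a, a_new = Obj("A"), Obj("A'")
a.forward = a_new
x, y = Obj("X"), Obj("Y")
x.fields = [a]
y.fields = [a]
roots = [x, y]
mark_with_fixup(roots)
print(x.fields[0].name, y.fields[0].name, a.marked)  # A' A' False
```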

Selective Defragmentation
When do we move objects?
  – When there is fragmentation
Usually, program exhibits locality of size
  – Dead objects are re-used quickly
Defragment either when
  – Dead objects are not re-used for a GC cycle
  – Free pages fall below limit for performing a GC
In practice: we move 2-3% of data traced
  – Major improvement over copying collector
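
The slides do not say how objects are chosen for moving. One plausible policy, shown here as a hypothetical Python sketch for a single size class, is to compute how many pages the live blocks need and evacuate the least-occupied pages into the most-occupied ones; the input format and function name are assumptions.

```python
import math


def plan_defrag(live_counts, blocks_per_page):
    """Plan moves that consolidate one size class into as few pages as possible.

    live_counts[p] is the number of live blocks on page p (hypothetical input).
    Returns (moves, evacuated_pages); each move is a (source, target) page pair
    for one object. Least-occupied pages are emptied into the most-occupied ones.
    """
    pages_needed = math.ceil(sum(live_counts) / blocks_per_page)
    order = sorted(range(len(live_counts)), key=lambda p: live_counts[p])
    n_evacuate = len(order) - pages_needed
    evacuate, targets = order[:n_evacuate], order[n_evacuate:]
    live = list(live_counts)
    moves = []
    t = len(targets) - 1  # start filling the fullest page
    for src in evacuate:
        while live[src] > 0:
            while live[targets[t]] == blocks_per_page:
                t -= 1  # target full: try the next fullest
            live[src] -= 1
            live[targets[t]] += 1
            moves.append((src, targets[t]))
    return moves, evacuate


print(plan_defrag([1, 4, 2, 0], blocks_per_page=4))  # ([(0, 2)], [3, 0])
print(plan_defrag([3, 3, 3], blocks_per_page=4))     # ([], []): no page can be freed
```

Evacuating only the pages that can actually be emptied keeps the number of moved objects small, in the spirit of moving only a few percent of traced data.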

Arraylets
Large arrays create problems
  – Fragment memory space
  – Cannot be moved in a short, bounded time
Solution: break large arrays into arraylets
  – Access via indirection; move one arraylet at a time
Optimizations
  – Type-dependent code optimized for contiguous case
  – Opportunistic contiguous allocation
[Figure: array A split into arraylets A1, A2, A3]
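
A Python sketch of arraylet indexing, assuming a fixed arraylet length and a spine (an array of references to the arraylets); the class and method names are hypothetical. Element `i` lives at offset `i % arraylet_len` of arraylet `i // arraylet_len`, and moving one arraylet only requires copying that chunk and updating one spine entry.

```python
class ArrayletArray:
    """Large array stored as a spine of fixed-size arraylets (chunks)."""

    def __init__(self, length, arraylet_len):
        self.length = length
        self.arraylet_len = arraylet_len
        count = -(-length // arraylet_len)  # ceiling division
        self.spine = [
            [0] * min(arraylet_len, length - k * arraylet_len) for k in range(count)
        ]

    def _locate(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        return divmod(i, self.arraylet_len)  # (arraylet index, offset)

    def __getitem__(self, i):
        k, off = self._locate(i)
        return self.spine[k][off]  # one extra indirection via the spine

    def __setitem__(self, i, value):
        k, off = self._locate(i)
        self.spine[k][off] = value

    def move_arraylet(self, k):
        """Relocate one arraylet: copy one chunk and update one spine slot."""
        self.spine[k] = list(self.spine[k])


a = ArrayletArray(10, arraylet_len=4)
print([len(chunk) for chunk in a.spine])  # [4, 4, 2]
a[9] = 7
a.move_arraylet(2)
print(a[9])  # 7
```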

Scheduling

Work-based Scheduling
Trigger the collector to collect $C_W$ bytes
  – Whenever the mutator allocates $Q_W$ bytes
[Figures: MMU (CPU utilization) vs window size (s, log scale); space (MB) vs time (s)]
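
Work-based scheduling ties collector work to allocation. The Python sketch below, with the hypothetical function `work_based_increments`, counts how many increments of $C_W$ bytes of collection each allocation triggers when one increment is owed per $Q_W$ bytes allocated. It shows how a burst of allocation turns into several increments run back to back, which lengthens the effective pause.

```python
def work_based_increments(allocations, q_w):
    """For each allocation, how many collector increments work-based scheduling triggers.

    One increment (collecting C_W bytes) is owed for every q_w bytes allocated.
    """
    owed = 0
    increments = []
    for nbytes in allocations:
        owed += nbytes
        n, owed = divmod(owed, q_w)
        increments.append(n)
    return increments


# Steady allocation spreads the work; one burst triggers five increments back to back.
print(work_based_increments([100, 100, 5000, 100], q_w=1000))  # [0, 0, 5, 0]
```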

Time-based Scheduling
Trigger collector to run for $C_T$ seconds
  – Whenever mutator runs for $Q_T$ seconds
[Figures: space (MB) vs time (s); MMU (CPU utilization) vs window size (s, log scale)]
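
Under time-based scheduling the collector's share of any window is fixed by the schedule rather than by the mutator's allocation. As a worked sketch (not the Metronome's tuner), assume a strict alternation of $Q_T$ mutator time and $C_T$ collector time in integer time units. A window $\Delta t$ of $k$ full periods plus a remainder $r$ is worst when the remainder starts with a collector quantum, so the MMU is $(kQ_T + \max(0, r - C_T))/\Delta t$; the code checks this against a brute-force sliding window. The function names are hypothetical.

```python
def time_based_mmu(q_t, c_t, window):
    """MMU of a strict schedule: mutator runs q_t, then collector runs c_t, repeating.

    A window of k full periods plus a remainder r is worst when the remainder
    starts with a collector quantum, leaving the mutator max(0, r - c_t) of it.
    """
    k, r = divmod(window, q_t + c_t)
    return (k * q_t + max(0, r - c_t)) / window


def brute_force_mmu(q_t, c_t, window, horizon):
    """Check the formula by sliding a window over an explicit 1/0 timeline."""
    timeline = [1 if t % (q_t + c_t) < q_t else 0 for t in range(horizon)]
    return min(sum(timeline[s:s + window]) for s in range(horizon - window + 1)) / window


# Equal mutator and collector quanta of 1 time unit each.
print(time_based_mmu(1, 1, 20))  # 0.5 for a long window
print(time_based_mmu(1, 1, 3))   # about 0.33: a short window can start and end in the collector
```

With equal quanta, utilization approaches 50% as the window grows, and it never depends on how fast the mutator allocates; that dependence moves into the space requirement instead.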

Parameterization
[Diagram: Mutator, Collector, and Tuner]
  – Mutator: allocation rate $a^*(\Delta GC)$, maximum live memory $m$
  – Collector: collection rate $R$
  – Tuner: real-time interval $\Delta t$, maximum used memory $s$, CPU utilization $u$

Empirical Results

Pause time distribution: javac [Figure panels: time-based scheduling and work-based scheduling; annotation: 12 ms]

Utilization vs Time: javac [Figure: utilization (%) vs time (s) for time-based and work-based scheduling; annotation: 0.45]

MMU: javac

Space Usage: javac

Conclusions

The Metronome provides true real-time GC
  – First collector to do so without major sacrifice
      Short pauses (4 ms)
      High MMU during collection (50%)
      Low memory consumption (2X max live)
Critical features
  – Time-based scheduling
  – Hybrid, mostly non-copying approach
  – Integration with compiler

Future Work
Two main goals:
  – Reduce pause time, memory requirement
  – Increase predictability
Pause time:
  – Expect sub-millisecond using current techniques
  – For tens of microseconds, need an interrupt-based approach
Predictability
  – Studying parameterization of collector
  – Good research area