Ulterior Reference Counting Fast GC Without The Wait

Presentation transcript:

Ulterior Reference Counting: Fast GC Without The Wait
Steve Blackburn, Kathryn McKinley
Presented by: Dimitris Prountzos
Slides adapted from presentation by Steve Blackburn

Outline
- Throughput-Responsiveness problem
- Reference counting & optimizations
- Ulterior in detail
- BG-RC in action
- Experimental evaluation
- Conclusion

Throughput/Responsiveness Trade-off
- GC and mutator share CPU
- Throughput: net GC/mutator ratio
- Responsiveness: length of GC pauses
[Figure: CPU utilization over time, split between GC and mutator; the maximum pause marks poor responsiveness]

The Ulterior approach
- Match mechanisms to object demographics
- Copying nursery (young space)
  - Highly mutated, high mortality young objects
  - Ignores most mutations
  - GC time proportional to survivors, space efficient
- RC mature space
  - Low mutation, low mortality old objects
  - GC time proportional to mutations, space efficient
- Generalize deferred RC to heap objects
  - Defer fields of highly mutated objects & enumerate them quickly
  - Reference count only infrequently mutated fields

Pure Reference Counting
- Tracks mutations: RCM(p)
- RCM(p) generates a decrement and an increment for the before and after values of p:
  RCM(p) → RC(p_before)--, RC(p_after)++
- If RC==0, free
- RCM(p) for every mutation is very expensive
[Figure: four-step animation of objects a, b and c in the RC space with their reference counts]
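The RCM(p) rule on this slide can be illustrated with a small model. This is only a sketch under stated assumptions: the class `PureRcSketch`, its `Obj` type and the `write`/`decrement` helpers are hypothetical names invented for illustration and are not taken from Jikes RVM or the paper.

```java
// Hypothetical model of pure reference counting; names are illustrative only.
final class PureRcSketch {
    static final class Obj {
        final String name;
        final Obj[] fields;   // pointer fields of this object
        int rc;               // number of references to this object
        boolean freed;

        Obj(String name, int fieldCount) {
            this.name = name;
            this.fields = new Obj[fieldCount];
        }
    }

    // Pointer store with RCM(p): RC(p_after)++ and RC(p_before)--.
    static void write(Obj src, int slot, Obj after) {
        Obj before = src.fields[slot];
        if (after != null) {
            after.rc++;       // increment first so re-storing the same referent cannot free it
        }
        src.fields[slot] = after;
        if (before != null) {
            decrement(before);
        }
    }

    // RC(o)--; when the count reaches zero, free o and decrement its referents.
    static void decrement(Obj o) {
        o.rc--;
        if (o.rc == 0) {
            o.freed = true;
            for (Obj child : o.fields) {
                if (child != null) {
                    decrement(child);
                }
            }
        }
    }
}
```

Every call to `write` touches two counts, which is the per-mutation cost the slide calls very expensive.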

RC Optimizations
- Buffering: apply RC(p)--, RC(p)++ later
- Coalescing: apply RCM(p) only for the initial and final values of p (coalesce intermediate values):
  {RCM(p), RCM(p_1), ... RCM(p_n)} → RC(p_initial)--, RC(p_final)++
- Deferral of RCM events
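A minimal sketch of coalescing, assuming a toy heap in which slots and objects are identified by strings. `CoalescingSketch`, `initSlot`, `write` and `collect` are hypothetical names chosen for this illustration; the sketch only counts how many RC updates are applied.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of coalescing; not the MMTk implementation.
final class CoalescingSketch {
    private final Map<String, String> current = new HashMap<>();  // slot -> current referent
    private final Map<String, String> initial = new HashMap<>();  // slot -> referent before first mutation
    final Map<String, Integer> rc = new HashMap<>();              // object -> reference count
    int rcOperations = 0;                                         // RC updates actually applied

    // Set up a slot that already points at target (counted once).
    void initSlot(String slot, String target) {
        current.put(slot, target);
        rc.merge(target, 1, Integer::sum);
    }

    // Mutation: remember only the value held before the first write since the last collection.
    void write(String slot, String target) {
        if (!initial.containsKey(slot)) {
            initial.put(slot, current.get(slot));
        }
        current.put(slot, target);
    }

    // At collection time apply RC(p_final)++ and RC(p_initial)-- per mutated slot.
    void collect() {
        for (Map.Entry<String, String> e : initial.entrySet()) {
            adjust(current.get(e.getKey()), +1);
            adjust(e.getValue(), -1);
        }
        initial.clear();
    }

    private void adjust(String obj, int delta) {
        if (obj == null) {
            return;
        }
        rc.merge(obj, delta, Integer::sum);
        rcOperations++;
    }
}
```

With a slot written three times between collections, this model applies two RC updates instead of six; the intermediate referents are never counted.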

Deferred Reference Counting
- Goal: ignore RCM(p) for stacks & registers
- Deferral of p: a mutation of p does not generate an RCM(p)
- Correctness: for all deferred p, RCR(p) at each GC
- Retain event RCR(p): p → o temporarily retains o regardless of RC(o)
  - Deutsch/Bobrow use a Zero Count Table
  - Bacon et al. use a temporary increment

Classic Deferral
- In deferral phase: ignore RCM(p) for stacks & registers
- Breaks the RC==0 invariant: an object with a zero count may still be referenced from a stack or register
[Figure: stacks & registers pointing into the RC space, which holds objects a, b and c]

Classic Deferral (Bacon et al.)
- Divide execution into epochs
- Store information in buffers
  - Root buffer (RB): stores 1st level objects
  - Increment buffer (IB): stores increments to 1st level objects
  - Decrement buffer (DB): stores decrements to 1st level objects
- At GC time:
  - Look at RB and apply temporary increments to all objects there
  - Process IB of this epoch
  - Look at RB of previous epoch and apply decrements to all objects there
  - Process DB of previous epoch
  - During DB processing, recycle o if RC(o)=0
- Avoid race conditions by
  - Processing IB before DB
  - Processing DB of one epoch behind

Classic Deferral (Bacon et al.)
- At GC time, RCR(p) for root pointers applies temporary increments
- At next GC, apply decrements
- Key: efficient enumeration of deferred pointers
- Better, but not good enough!
[Figure: four-step animation of stacks & registers, objects a, b and c in the RC space, and the dec buf and root buf holding a and b]
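The temporary-increment scheme can be sketched as follows. This is a simplified single-threaded model, not Bacon et al.'s collector: heap pointer mutations are omitted, new objects are assumed to start with a count of 1 plus a buffered decrement, and `DeferredRcSketch`, `allocate` and `collect` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of deferral with temporary root increments.
final class DeferredRcSketch {
    final Map<String, Integer> rc = new HashMap<>();         // object -> reference count
    final List<String> freed = new ArrayList<>();            // objects recycled so far
    private List<String> decBuffer = new ArrayList<>();      // decrements owed at the next collection

    // Assumption: a new object starts at 1 with a matching buffered decrement.
    void allocate(String obj) {
        rc.put(obj, 1);
        decBuffer.add(obj);
    }

    // roots: objects referenced from stacks and registers at this collection.
    void collect(List<String> roots) {
        // Increments first: RCR(p) temporarily retains every root referent.
        for (String o : roots) {
            rc.merge(o, 1, Integer::sum);
        }
        // The matching decrements are deferred to the next collection.
        List<String> decs = decBuffer;
        decBuffer = new ArrayList<>(roots);
        // Apply the decrements buffered before this collection; recycle at zero.
        for (String o : decs) {
            int n = rc.merge(o, -1, Integer::sum);
            if (n == 0) {
                rc.remove(o);
                freed.add(o);
            }
        }
    }
}
```

Stack and register writes never touch a count in this model; the cost moves to enumerating the roots at each collection, which is why the slide stresses efficient enumeration of deferred pointers.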

Ulterior Reference Counting
- Idea: extend deferral to select heap pointers
  - e.g. all pointers within nursery objects
- Deferral is not a fixed property of p
  - e.g. a nursery object gets promoted
- Integrate event I(p)
  - Changes p from deferred to not deferred

BG-RC: Bounded Nursery Generational - RC
- Heap organization
  - Bounded copying nursery: ignore mutations to nursery pointer fields
  - RC old space: object remembering, coalescing, buffering
- Collection
  - Process roots
  - Nursery phase promotes live objects to old space and performs I(p)
  - RC phase processes object buffer, dec buffer
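One way to picture the integrate event during the nursery phase is the sketch below. It assumes a toy object model in which promotion copies an object and then counts each of its pointer fields; `PromotionSketch`, `Obj` and `promote` are hypothetical names, and the temporary retention of root referents is not modelled.

```java
// Hypothetical model of nursery promotion with I(p); not the BG-RC source code.
final class PromotionSketch {
    static final class Obj {
        final String name;
        final boolean inNursery;
        final Obj[] fields;
        int rc;        // meaningful only for objects in the RC (old) space
        Obj forward;   // forwarding pointer, set once a nursery object has been copied

        Obj(String name, boolean inNursery, int fieldCount) {
            this.name = name;
            this.inNursery = inNursery;
            this.fields = new Obj[fieldCount];
        }
    }

    // Returns the old-space version of o, copying it out of the nursery if needed.
    static Obj promote(Obj o) {
        if (!o.inNursery) {
            return o;                  // already in the RC space
        }
        if (o.forward != null) {
            return o.forward;          // already copied during this collection
        }
        Obj copy = new Obj(o.name, false, o.fields.length);
        o.forward = copy;              // set before recursing so cycles terminate
        for (int i = 0; i < o.fields.length; i++) {
            Obj target = o.fields[i];
            if (target != null) {
                Obj promoted = promote(target);
                copy.fields[i] = promoted;
                promoted.rc++;         // I(p): the field is no longer deferred, so count it
            }
        }
        return copy;
    }
}
```

Because the increments happen while the survivor is being copied, the integrate event piggybacks on work the copying nursery collection already does.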

View of heap in Ulterior RC
[Figure: stacks and registers pointing into the heap; RC space holds a, b, d, e with reference counts; non-RC space holds r, s, t; pointers are labelled defer or remember]
How can we efficiently
- Enumerate all deferred pointer fields?
- Remember old to young pointers?

Bringing it Together
- Deferral:
  - Defer nursery & roots
  - Perform I(p) on nursery promotion
  - Piggyback on copying nursery collection
- Coalescing:
  - Remember mutated RC objects
  - Upon first mutation, dec each referent
  - At GC time, inc each referent
  - Piggyback remset onto this mechanism

BG-RC Write Barrier

    private void writeBarrier(VM_Address srcObj,
                              VM_Address srcSlot,
                              VM_Address tgtObj)
        throws VM_PragmaInline {
      if (getLogState(srcObj) != LOGGED)        // unsync check for uniqueness
        writeBarrierSlow(srcObj);
      VM_Magic.setMemoryAddress(srcSlot, tgtObj);
    }

    private void writeBarrierSlow(VM_Address srcObj)
        throws VM_PragmaNoInline {
      if (attemptToLog(srcObj)) {
        modifiedBuffer.push(srcObj);
        enumeratePointersToDecBuffer(srcObj);   // trade-off for sparsely
        setLogState(srcObj, LOGGED);            // modified objects
      }
    }
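The barrier above depends on Jikes RVM internals, so it cannot run on its own. The following is a self-contained, single-threaded approximation of the same object-remembering idea; `LoggingBarrierSketch`, `Obj` and `processBuffers` are hypothetical names, and the synchronized `attemptToLog` step is replaced by a plain flag.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded model of an object-remembering write barrier.
final class LoggingBarrierSketch {
    static final class Obj {
        final String name;
        final Obj[] fields;
        int rc;
        boolean logged;   // true once the object is in the modified buffer

        Obj(String name, int fieldCount) {
            this.name = name;
            this.fields = new Obj[fieldCount];
        }
    }

    final Deque<Obj> modifiedBuffer = new ArrayDeque<>();
    final Deque<Obj> decBuffer = new ArrayDeque<>();

    // Fast path: one check per pointer store.
    void writeBarrier(Obj src, int slot, Obj target) {
        if (!src.logged) {
            writeBarrierSlow(src);
        }
        src.fields[slot] = target;
    }

    // Slow path, taken on the first mutation only: buffer a decrement for each current referent.
    private void writeBarrierSlow(Obj src) {
        src.logged = true;
        modifiedBuffer.push(src);
        for (Obj old : src.fields) {
            if (old != null) {
                decBuffer.push(old);
            }
        }
    }

    // At collection time: increment the final referents, then apply the buffered decrements.
    void processBuffers() {
        while (!modifiedBuffer.isEmpty()) {
            Obj o = modifiedBuffer.pop();
            for (Obj cur : o.fields) {
                if (cur != null) {
                    cur.rc++;
                }
            }
            o.logged = false;
        }
        while (!decBuffer.isEmpty()) {
            decBuffer.pop().rc--;
        }
    }
}
```

Logging a whole object costs a decrement and an increment for fields that never changed, which appears to be the trade-off for sparsely modified objects noted in the barrier's comment.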

BG-RC Mutation Phase
[Figure: seven-step animation of the mutation phase: objects a, b, d, e in the RC space, objects r, s, t appearing in the non-RC space, and the contents of the obj buf, dec buf and root buf]

BG-RC Collection (animation sequence)
- Nursery Collection: Scan Roots
- Nursery Collection: Process Object Buffer
- Nursery Collection: Reclaim Nursery
- RC Collection: Process Decrement Buffer
- RC Collection: Recursive Decrement
- RC Collection: Process Decrement Buffer
- Collection Complete!
[Figure: step-by-step animation of stacks and registers, the RC space and non-RC space with reference counts, and the obj buf, dec buf and root buf as each phase runs]

Controlling Pause Times
- Modest bounded nursery size
- Meta data
  - Decrement and modified object buffers
  - Trigger a collection if too big
- RC time cap
  - Limits time spent recursively decrementing RC objects & in cycle detection
- Cycles: pure RC is incomplete
  - Use Bacon/Rajan trial deletion algorithm
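The RC time cap can be sketched as a bounded drain of pending work. This is an illustrative assumption about how such a cap might be structured, not the BG-RC implementation: `TimeCapSketch` and `drain` are hypothetical names, and the clock is passed in so the behaviour is deterministic.

```java
import java.util.Deque;
import java.util.function.LongSupplier;

// Hypothetical model of a time-capped RC phase.
final class TimeCapSketch {
    // Runs pending decrement work until the buffer is empty or the cap is reached.
    // Returns the number of work items processed; the rest stay queued for the next collection.
    static int drain(Deque<Runnable> work, long capNanos, LongSupplier clock) {
        long start = clock.getAsLong();
        int done = 0;
        while (!work.isEmpty() && clock.getAsLong() - start < capNanos) {
            work.poll().run();
            done++;
        }
        return done;
    }
}
```

In real use the clock argument could be `System::nanoTime`. Work left in the buffer is not lost; it simply delays reclamation of the affected objects until a later collection.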

Experimental evaluation
- Jikes RVM with MMTk
- Compare MS, BG-MS, BG-RC, RC
- Examine various heap sizes
- Collection triggers
  - Each 4 MB of allocation for BG-RC (1 MB for RC)
  - Time cap of 60 ms
  - Cycle detection at 512 KB

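Throughput/Pause time, Moderate Heap Size

Columns: max pause and norm time for each of RC, BG-RC, BG-MS, MS

            RC            BG-RC         BG-MS         MS
            pause  time   pause  time   pause  time   pause  time
mean        175    1.53   53     0.98   210    1.00   214    1.23

Per-benchmark rows (same column order; some cells are missing from this transcript, so values are listed in their extracted order):
mpeg        121 1.14 43 0.96 178 185 1.05
db          1.11 59 1.01 244 238
pjbb        297 1.33 281 264
cmpress     72 0.93 68 0.88 160 .98
mtrt        130 1.75 49 1.04 180 241 1.29
raytrace    133 1.71 1.03 184 203 1.31
jack        1.66 44 0.94 1.52
javac       580 1.78 285 268
jess        131 2.36 0.99 181 182 1.91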

Throughput & Responsiveness

Conclusion
- Ulterior design is based on a careful study of object demographics and on making the collector aware of them
- Extends deferred RC to heap objects
- Shows in practice that high throughput & low pause times are compatible