1 Write Barrier Elision for Concurrent Garbage Collectors Martin T. Vechev Cambridge University David F. Bacon IBM T.J.Watson Research Center.

Slides:



Advertisements
Similar presentations
A Block-structured Heap Simplifies Parallel GC Simon Marlow (Microsoft Research) Roshan James (U. Indiana) Tim Harris (Microsoft Research) Simon Peyton.
Advertisements

Steve Blackburn Department of Computer Science Australian National University Perry Cheng TJ Watson Research Center IBM Research Kathryn McKinley Department.
Automatic Memory Management Noam Rinetzky Schreiber 123A /seminar/seminar1415a.html.
On-the-Fly Garbage Collection Using Sliding Views Erez Petrank Technion – Israel Institute of Technology Joint work with Yossi Levanoni, Hezi Azatchi,
CSC 213 – Large Scale Programming. Today’s Goals  Consider what new does & how Java works  What are traditional means of managing memory?  Why did.
Reducing Pause Time of Conservative Collectors Toshio Endo (National Institute of Informatics) Kenjiro Taura (Univ. of Tokyo)
MC 2 : High Performance GC for Memory-Constrained Environments - Narendran Sachindran, J. Eliot B. Moss, Emery D. Berger Sowmiya Chocka Narayanan.
An On-the-Fly Mark and Sweep Garbage Collector Based on Sliding Views Hezi Azatchi - IBM Yossi Levanoni - Microsoft Harel Paz – Technion Erez Petrank –
On-the-Fly Garbage Collection: An Exercise in Cooperation Edsget W. Dijkstra, Leslie Lamport, A.J. Martin and E.F.M. Steffens Communications of the ACM,
By Jacob SeligmannSteffen Grarup Presented By Leon Gendler Incremental Mature Garbage Collection Using the Train Algorithm.
Asynchronous Assertions Eddie Aftandilian and Sam Guyer Tufts University Martin Vechev ETH Zurich and IBM Research Eran Yahav Technion.
Efficient Concurrent Mark-Sweep Cycle Collection Daniel Frampton, Stephen Blackburn, Luke Quinane and John Zigman (Pending submission) Presented by Jose.
MC 2 : High Performance GC for Memory-Constrained Environments N. Sachindran, E. Moss, E. Berger Ivan JibajaCS 395T *Some of the graphs are from presentation.
Heap Shape Scalability Scalable Garbage Collection on Highly Parallel Platforms Kathy Barabash, Erez Petrank Computer Science Department Technion, Israel.
Using Prefetching to Improve Reference-Counting Garbage Collectors Harel Paz IBM Haifa Research Lab Erez Petrank Microsoft Research and Technion.
OOPSLA 2003 Mostly Concurrent Garbage Collection Revisited Katherine Barabash - IBM Haifa Research Lab. Israel Yoav Ossia - IBM Haifa Research Lab. Israel.
1 The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology.
An On-the-Fly Reference Counting Garbage Collector for Java Erez Petrank Technion – Israel Institute of Technology Joint work with Yossi Levanoni – Microsoft.
Correctness-Preserving Derivation of Concurrent Garbage Collection Algorithms Martin T. Vechev Eran Yahav David F. Bacon University of Cambridge IBM T.J.
Connectivity-Based Garbage Collection Presenter Feng Xian Author Martin Hirzel, et.al Published in OOPSLA’2003.
Incremental Garbage Collection
Age-Oriented Concurrent Garbage Collection Harel Paz, Erez Petrank – Technion, Israel Steve Blackburn – ANU, Australia April 05 Compiler Construction Scotland.
U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science Garbage Collection Without Paging Matthew Hertz, Yi Feng, Emery Berger University.
1 An Efficient On-the-Fly Cycle Collection Harel Paz, Erez Petrank - Technion, Israel David F. Bacon, V. T. Rajan - IBM T.J. Watson Research Center Elliot.
1 Reducing Generational Copy Reserve Overhead with Fallback Compaction Phil McGachey and Antony L. Hosking June 2006.
Uniprocessor Garbage Collection Techniques Paul R. Wilson.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
Garbage Collection Memory Management Garbage Collection –Language requirement –VM service –Performance issue in time and space.
A Parallel, Real-Time Garbage Collector Author: Perry Cheng, Guy E. Blelloch Presenter: Jun Tao.
David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center The Metronome: A Hard Real-time Garbage Collector.
Taking Off The Gloves With Reference Counting Immix
P ARALLEL P ROCESSING I NSTITUTE · F UDAN U NIVERSITY 1.
Introduction Overview Static analysis Memory analysis Kernel integrity checking Implementation and evaluation Limitations and future work Conclusions.
Ulterior Reference Counting: Fast Garbage Collection without a Long Wait Author: Stephen M Blackburn Kathryn S McKinley Presenter: Jun Tao.
Fast Conservative Garbage Collection Rifat Shahriyar Stephen M. Blackburn Australian National University Kathryn S. M cKinley Microsoft Research.
1 Fast and Efficient Partial Code Reordering Xianglong Huang (UT Austin, Adverplex) Stephen M. Blackburn (Intel) David Grove (IBM) Kathryn McKinley (UT.
A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization David Bacon Perry Cheng (presenting) V.T. Rajan IBM T.J. Watson Research.
Dynamic Object Sampling for Pretenuring Maria Jump Department of Computer Sciences The University of Texas at Austin Stephen M. Blackburn.
CSC 213 – Large Scale Programming. Today’s Goals  Consider what new does & how Java works  What are traditional means of managing memory?  Why did.
Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language Konstantinos Sagonas Jesper Wilhelmsson Uppsala.
Incremental Garbage Collection Uwe Kern 23. Januar 2002
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science 1 Automatic Heap Sizing: Taking Real Memory into Account Ting Yang, Emery Berger,
Finding Your Cronies: Static Analysis for Dynamic Object Colocation Samuel Z. Guyer Kathryn S. McKinley T H E U N I V E R S I T Y O F T E X A S A T A U.
Computer Science Department Daniel Frampton, David F. Bacon, Perry Cheng, and David Grove Australian National University Canberra ACT, Australia
September 11, 2003 Beltway: Getting Around GC Gridlock Steve Blackburn, Kathryn McKinley Richard Jones, Eliot Moss Modified by: Weiming Zhao Oct
Fast Garbage Collection without a Long Wait Steve Blackburn – Kathryn McKinley Presented by: Na Meng Ulterior Reference Counting:
Concurrent Garbage Collection Presented by Roman Kecher GC Seminar, Tel-Aviv University 23-Dec-141.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
1 Garbage Collection Advantage: Improving Program Locality Xianglong Huang (UT) Stephen M Blackburn (ANU), Kathryn S McKinley (UT) J Eliot B Moss (UMass),
David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center ControllingFragmentation and Space Consumption in the Metronome.
A REAL-TIME GARBAGE COLLECTOR WITH LOW OVERHEAD AND CONSISTENT UTILIZATION David F. Bacon, Perry Cheng, and V.T. Rajan IBM T.J. Watson Research Center.
CSC 213 – Large Scale Programming. Explicit Memory Management  Traditional form of memory management  Used a lot, but fallen out of favor  malloc /
November 27, 2007 Verification of a Concurrent Priority Queue Bart Verzijlenberg.
1 GC Advantage: Improving Program Locality Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley, J Eliot B Moss, Perry Cheng.
® July 21, 2004GC Summer School1 Cycles to Recycle: Copy GC Without Stopping the World The Sapphire Collector Richard L. Hudson J. Eliot B. Moss Originally.
The Metronome Washington University in St. Louis Tobias Mann October 2003.
Reference Counting. Reference Counting vs. Tracing Advantages ✔ Immediate ✔ Object-local ✔ Overhead distributed ✔ Very simple Trivial implementation for.
GC Assertions: Using the Garbage Collector To Check Heap Properties Samuel Z. Guyer Tufts University Edward Aftandilian Tufts University.
1 The Garbage Collection Advantage: Improving Program Locality Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT) J Eliot B Moss.
Dynamic Compilation Vijay Janapa Reddi
Seminar in automatic tools for analyzing programs with dynamic memory
Automatic Memory Management/GC
Cycle Tracing Chapter 4, pages , From: "Garbage Collection and the Case for High-level Low-level Programming," Daniel Frampton, Doctoral Dissertation,
Ulterior Reference Counting Fast GC Without The Wait
David F. Bacon, Perry Cheng, and V.T. Rajan
Beltway: Getting Around Garbage Collection Gridlock
José A. Joao* Onur Mutlu‡ Yale N. Patt*
Garbage Collection Advantage: Improving Program Locality
Reference Counting.
Reference Counting vs. Tracing
Presentation transcript:

1 Write Barrier Elision for Concurrent Garbage Collectors Martin T. Vechev Cambridge University David F. Bacon IBM T.J.Watson Research Center

2 Write Barriers for Concurrent GC Mutator and Collector interleaved Mutator changes object graph Potential race conditions Write barriers a form of synchronization –Between mutators and the collector.

3 Can Synchronization Be Reduced? Minimum information from a mutator to a collector? –without compromising correctness? Can static analysis help? Can we discover dynamic opportunities? Concurrent Collector ? Mutators Heap Connectivity Graph write read write barrier

4 Outline The Synchronization Problem Elimination conditions Elimination opportunities (limit study) Elimination correlation Exploiting Order Conclusions and Future Work

5 A C B P1 The Synchronization Problem Time Collector Working - Marks B as live

6 A C B P1 The Synchronization Problem A C B P1 P2 Time Collector Working Thread Working - Installs P2- Marks B as live

7 A C B P1 The Synchronization Problem A C B P1 P2 A C B Time Collector Working Thread Working - Installs P2- Deletes P1- Marks B as live

8 A C B P1 ? The Synchronization Problem A C B P1 P2 A C B A B Time Collector Working Thread Working Collector Working - Installs P2- Deletes P1- Marks B as live - C was not seen - Reclaims C: live!

9 Time Memory Location A B P1 P A and B contain pointers to Object C Point of Error The Synchronization Problem

10 Types of Write Barriers A C B P1 A C B P2 A B A B TIME Collector Working Thread Working Collector Working C C Steele Marks B (Source) for rescanning Dijkstra Mark C (new target) as live Yuasa Mark C (old target) as live

11 Costs of Concurrent Write Barriers Mutator Overhead –Direct Run-time Overhead (5-20%) –I-cache pollution by write barrier code Collector Overhead –Process Write Barrier Information Space costs –Sequential Store Buffer –Increased Code Size Termination Issues –Yuasa vs. Dijkstra/Steele

12 Contributions Four Elimination Conditions a static analysis can utilize. –Main idea based on pointer lifetimes. 1.Two Covering conditions: Apply to all barriers 2.Two Allocation conditions: Apply to incremental barriers. Limit study shows potential for barrier elimination –For Yuasa, on average 83% can be eliminated –For Dijkstra and Steele, on average 54% can be eliminated Correlation study between barrier elimination and –Object Size –Object Lifetime –Program Locality

13 Single Covering Condition Time Memory Location 123 A B P1 P2 Key Observation: P1’s lifetime covers P2’s Barriers on P2 are redundant A C B P1 A C B P2 A C B A B TIME Collector Working Thread Working Collector Working P1 C

14 Single Covering Condition (Incremental) Time Memory Location 123 A B L H Rescan Roots 4 Time Memory Location 123 A B H H 4

15 Single Covering Condition (Snapshot) Time Memory Location 123 A B L H 4 Time Memory Location 123 A B H H 4

16 Time Memory Location A B P1 P Multiple Covering Condition P2 567 C V Key Observation : P1 together with a barrier on P2 form a virtual pointer PV Apply Single Covering to eliminate barriers on P3

17 Multiple Covering Condition (Incremental) Time Memory Location 123 A B L H 4 L C Time Memory Location 123 A B L H 4 H C

18 Multiple Covering Condition (Incremental) Time Memory Location 123 A B H H 4 L C Time Memory Location 123 A B H H 4 H C

19 Multiple Covering Condition (Snapshot) Time Memory Location 123 A B L H 4 L C Time Memory Location 123 A B L H 4 H C

20 Multiple Covering Condition (Snapshot) Time Memory Location 123 A B H H 4 L C Time Memory Location 123 A B H H 4 H C

21 Single Allocation Condition Allocation conditions exploit Initial Root Set Marking Main idea: At the time of a barrier, the object is already marked Only if allocating non-white (also trade off), for Dijkstra/Steele only Time Memory Location 123 A B N P2 Key Observation : H starts Inside ‘New’ pointer N

22 Multiple Allocation Condition Key Observation : V = N join L Apply SAC between V and H Time Memory Location 123 A B C N L H V

23 Eliminating Null Pointer Stores If old pointer is NULL, eliminate barrier Yuasa vs Steele/Dijskstra Yuasa good because of initialization Null stores are easier to eliminate –But less payoff Yuasa(destination, old_pointer) if ( Collector_Marking && old_pointer != NULL) store (old_pointer); Dijkstra(destination, new_pointer) if (Collector_Marking && new_pointer != NULL) store (new_pointer);

24 Evaluation Methodology Limit study based on elimination condition Shows potential for elimination Traces – Jikes RVM – Trace Events : Allocation, Pointer Stores – Application Objects Only Which benchmarks: –SpecJVM98 (-s100) –Jolden –Ipsixql –Xalan –Deltablue

25 Barrier Elimination Potential – Dijkstra / Steele

26 Barrier Elimination Potential - Yuasa

27 Time (MB) vs. Eliminated Yuasa Barriers Elimination is Periodic => WB elimination occurs in bursts JAVAC

28 Time (MB) vs. Eliminated Yuasa Barriers Elimination is Periodic => WB elimination occurs in bursts db DB

29 Object Size (words) vs. Eliminated Yuasa Barriers Elimination is on Small objects JAVAC

30 Object Age vs. Eliminated Yuasa Barriers Elimination is mostly on Young objects (Log Y scale) JAVAC

31 Object Age vs. Eliminated Yuasa Barriers An Exception DB

32 Further Write Barrier Elimination So far assumed collector sees pointers in any order Often, Collector order exists –take advantage of it Main idea: writes on the collector wave are safe. –Cannot apply previous conditions –Can eliminate more barriers Order extends to Lists, Queues, Trees (Needs connectivity approximation)

33 Single Object Scenarios (Order matters) C P1 ? Time Collector Working Thread Working Collector Working -Installs P2- Deletes P1-B is partially scanned (gray) - Reclaims live object C incorrectly C P1 P2 C Broken Sequence B BBB

34 Single Object Scenarios (Order matters) C P1 C P2 C C Correct Sequence Collector Working Thread Working Collector Working - Installs P2- Deletes P1- B is partially scanned (gray) - C is NOT collected BBBB P2

35 Conclusions and Future Work Static Analysis Elimination Conditions An Upper Bound Hot Barrier Methods Yuasa Barriers seem superior to Dijkstra/Steele –(study higher elimination vs. reduced floating garbage) –Yuasa are harder to statically eliminate More conditions possible –Requires formal reasoning Other elimination combinations possible: –Coverage (ownership) and Order can be exploited further.