1 GC Advantage: Improving Program Locality Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley, J Eliot B Moss, Perry Cheng.

Presentation transcript:

1 GC Advantage: Improving Program Locality Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley, J Eliot B Moss, Perry Cheng

2 Motivation Memory gap How are Java programs affected?

3 Marksweep vs. Copying pseudojbb

4 Motivation Javac with perfect L1 and L2 cache. 16K L1 256K L2 Appel, GCTk. Breadth first

5 Motivation
- A copying collector can reorder objects
- Goal: take advantage of copying collectors to reorder objects and improve locality

6 Exploring The Space
- Different policies for traversing roots
- Class-oblivious traversal orders
  - Which traversal order is the best?
- Class-based traversal orders
  - How to find the “important” data structure?

7 Different Root Traversal Policies
- Two different types of roots:
  - Stack, global variables
  - Remembered sets (for generational collectors)
- Different traversal orders
  - Copy all roots before traversing any children
  - Copy each root and its children (root-by-root)
  - Split roots
    - Stack first, then its children
    - Remset first, then its children
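
The three root policies on this slide can be sketched as follows. This is a minimal illustration, not the JMTk implementation: the heap is modelled as a plain dictionary from object name to child names, and the function names (`copy_from`, `all_roots_first`, `root_by_root`, `split_roots`) and the example heap are hypothetical. The returned list is the order in which objects would be laid out in to-space.

```python
from collections import deque


def copy_from(roots, heap, copied, order):
    """Copy `roots`, then everything reachable from them, breadth first."""
    queue = deque()
    for root in roots:
        if root not in copied:
            copied.add(root)
            order.append(root)
            queue.append(root)
    while queue:
        obj = queue.popleft()
        for child in heap.get(obj, []):
            if child not in copied:
                copied.add(child)
                order.append(child)
                queue.append(child)


def all_roots_first(stack_roots, remset_roots, heap):
    """Copy every root before traversing any children."""
    copied, order = set(), []
    copy_from(stack_roots + remset_roots, heap, copied, order)
    return order


def root_by_root(stack_roots, remset_roots, heap):
    """Copy each root together with its children before the next root."""
    copied, order = set(), []
    for root in stack_roots + remset_roots:
        copy_from([root], heap, copied, order)
    return order


def split_roots(first, second, heap):
    """Process one root set and its children, then the other root set."""
    copied, order = set(), []
    copy_from(first, heap, copied, order)
    copy_from(second, heap, copied, order)
    return order


# Hypothetical heap: two stack roots and one remembered-set root.
EXAMPLE_HEAP = {"s1": ["a", "b"], "s2": ["c"], "r1": ["d"]}
print(root_by_root(["s1", "s2"], ["r1"], EXAMPLE_HEAP))
```

Calling `split_roots(stack, remset, heap)` gives the "stack first" variant and `split_roots(remset, stack, heap)` the "remset first" variant.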

8 Experiment Setup
- JikesRVM, JMTk
- Generational copying collector with a bounded nursery size of 4 MB
- PseudoAdaptive, 2nd iteration

9 Different Root Traversal Policies
- RxR (root-by-root) has the best mutator locality

10 Different Root Traversal Policies Total execution time

11 Exploring The Space
- Different policies for traversing roots
- Class-oblivious traversal orders
  - Which traversal order is the best?
- Class-based traversal orders
  - How to find the “important” data structure?

12 Different Traversal Orders
- Breadth first: 1,2,3,4,5,6,7
- Pure depth first: 1,2,6,3,4,7,5
- Pure depth first, LIFO: 1,5,4,7,3,2,6

13 Different Traversal Orders
- Breadth first: 1,2,3,4,5,6,7
- Pure depth first: 1,2,6,3,4,7,5
- Pure depth first, LIFO: 1,5,4,7,3,2,6
- Partial depth first, 2 children: 1,2,6,3,4,5,
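
The orders listed on this slide can be reproduced with a short sketch. Two things here are assumptions rather than content from the slides: the tree shape (`TREE`, inferred so that the breadth-first and depth-first orders match the numbers shown) and the exact meaning of "partial depth first, 2 children" (taken here as: follow the first two children of each scanned object depth first, then copy the remaining children and scan them later). The sketch assumes a tree, so it does not track already-copied objects.

```python
from collections import deque

# Assumed object graph: 1 -> 2,3,4,5; 2 -> 6; 4 -> 7.
TREE = {1: [2, 3, 4, 5], 2: [6], 3: [], 4: [7], 5: [], 6: [], 7: []}


def breadth_first(tree, root):
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree[node])
    return order


def depth_first(tree, root):
    order = []

    def visit(node):
        order.append(node)
        for child in tree[node]:
            visit(child)

    visit(root)
    return order


def depth_first_lifo(tree, root):
    # Children are pushed in field order, so the last child is popped first.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(tree[node])
    return order


def partial_depth_first(tree, root, k=2):
    # Assumed semantics: the first k children are followed depth first;
    # the remaining children are copied and queued for later scanning.
    order, queue = [root], deque([root])

    def scan(node):
        children = tree[node]
        for child in children[:k]:
            order.append(child)
            scan(child)
        for child in children[k:]:
            order.append(child)
            queue.append(child)

    while queue:
        scan(queue.popleft())
    return order


for traversal in (breadth_first, depth_first, depth_first_lifo, partial_depth_first):
    print(traversal.__name__, traversal(TREE, 1))
```

Under these assumptions the first three functions return exactly the sequences on the slide, and the partial depth-first order begins with the six entries the slide shows.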

14 Class Oblivious Type Different traversal policies Partial DF is the best

15 Exploring The Space
- Different policies for traversing roots
- Class-oblivious traversal orders
  - Which traversal order is the best?
- Class-based traversal orders
  - How to find the “important” data structure?

16 Class-based Traversal
- Class-oblivious traversal orders are inflexible
- Class-based object traversal
  - Static profiling
  - Dynamic sampling

17 Static Profiling
- Profile object accesses
- Find hot pairs with strong correlation
- Example
  - (1,4), (4,7) and (2,6) have strong correlation
  - Order: 1,4,7,2,6,3,
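
One way to turn hot pairs into a copy order is sketched below. The slides do not describe the actual algorithm, so this is an assumed, simplified scheme: chain the hot pairs together, place each chain contiguously, then append the remaining (cold) objects in their original order. The function name `order_from_hot_pairs` is hypothetical, and the sketch assumes each object starts at most one hot pair.

```python
def order_from_hot_pairs(objects, hot_pairs):
    """Place chains of strongly correlated objects next to each other."""
    successor = dict(hot_pairs)          # first object of a pair -> second
    targets = set(successor.values())    # objects that follow another object
    order, placed = [], set()

    # Start a chain at every object that begins a pair but follows no other.
    for head in objects:
        if head in successor and head not in targets:
            node = head
            while node is not None and node not in placed:
                placed.add(node)
                order.append(node)
                node = successor.get(node)

    # Everything not on a chain keeps its original relative order.
    for obj in objects:
        if obj not in placed:
            placed.add(obj)
            order.append(obj)
    return order


print(order_from_hot_pairs([1, 2, 3, 4, 5, 6, 7], [(1, 4), (4, 7), (2, 6)]))
```

With the pairs from the slide, the chains 1,4,7 and 2,6 come first, which matches the start of the order shown.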

18 Online Profiling
- Use the adaptive compiler's sampling
  - Hot method
  - Hot basic block
- Use field accesses to indicate hot fields
- Example (in a hot method): { Class A a; a.b=…; … }
- [Diagram: object A refers to object B through field b]
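
The idea of using sampled field accesses as copy advice can be sketched as follows. Everything in this block is an assumption made for illustration: the sample format (class name, field name), the threshold, and the function names `hot_field_advice` and `child_copy_order` are hypothetical and are not taken from JikesRVM's adaptive optimization system.

```python
from collections import Counter


def hot_field_advice(samples, threshold):
    """Map each class to the fields accessed at least `threshold` times.

    `samples` is an iterable of (class_name, field_name) pairs, standing in
    for field accesses observed in hot methods or hot basic blocks.
    """
    counts = Counter(samples)
    advice = {}
    for (cls, field), hits in counts.items():
        if hits >= threshold:
            advice.setdefault(cls, []).append(field)
    return advice


def child_copy_order(cls, fields, advice):
    """Order an object's pointer fields so advised (hot) fields come first."""
    hot_fields = advice.get(cls, [])
    hot = [f for f in fields if f in hot_fields]
    cold = [f for f in fields if f not in hot_fields]
    return hot + cold


# Field b of class A is sampled often, field c only once.
samples = [("A", "b"), ("A", "b"), ("A", "b"), ("A", "c")]
advice = hot_field_advice(samples, threshold=2)
print(child_copy_order("A", ["c", "b", "d"], advice))
```

During copying, a collector following this advice would copy the referent of `a.b` right after `a`, before the referents of the cold fields.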

19 Online Profiling Micro benchmark results

20 Online Profiling Geometric mean

21 Reasons
- No advice for most of the objects copied
  - For jess, db and raytrace, we only pick <<1% of the objects as hot objects
  - 5% for javac
- The hot fields are within the first 2 pointers
  - 90% of the advised objects for javac

22 Online Profiling
- PseudoJBB mutator results
  - Generate advice for 23% of the copied objects
  - 75% of the objects have advised hot fields other than the first 2

23 Questions
- Have we found all the hot objects?
  - Not all hot objects are connected?
- Is class-based good enough?
  - For pseudojbb, do we need instance-based?
- Locality for the nursery objects?

24 Future Work
- Sampling technique
  - Catch more hot object accesses
    - Lower the threshold
    - Hot objects that are not connected
  - Dynamically change the advice for phase changes
- Nursery locality
- Different traversal orders for cold objects
- Instance-based

25 Conclusion
- Reordering objects during copying collection can improve locality
- Among class-oblivious traversal orders, partial depth first is the best
- Online profiling with class-based traversal is
  - more flexible, up to 50% better
  - very low overhead, ~0%
- Still mysteries

26 Questions?

27 Answers?
- Lower the threshold of the sampling, not only the hot methods
- For objects with only 1 or 2 pointers, it may be easier to just use depth first
- Maybe the nursery locality is more important
- Instance-based advice

28 Online Profiling Execution overhead

29 Online Profiling Micro benchmark results for mutator time

30 Different Root Traversal Policies _227_mtrt

31 Static Profiling Results

32 Answers?
- Most objects have only one pointer
- Percentage of objects copied by advice (whether it is really hot?)
  - For pseudojbb ~50%, for jess <<1%, for our micro benchmark ~16%
  - Change!
- Half of the pairs do not form chains longer than 2
- Maybe the nursery locality is more important

33 Class Oblivious Orderings Different traversal policies Partial DF is better pseudoJBB

34 Motivation MarkSweep vs. Copying Collector Mutator time of _213_javac

35 Motivation Mutator L2 misses _213_javac