Steve Blackburn Department of Computer Science Australian National University Perry Cheng TJ Watson Research Center IBM Research Kathryn McKinley Department.

Presentation transcript:

Myths & Realities: The Performance Impact of Garbage Collection
Steve Blackburn, Department of Computer Science, Australian National University
Perry Cheng, TJ Watson Research Center, IBM Research
Kathryn McKinley, Department of Computer Sciences, University of Texas at Austin

Monday, April 13, 2015
Myths & Realities: The performance impact of garbage collection

Background
No prior apples-to-apples comparisons
MMTk
Canonical policies implemented (SS, MS, RC, genX, etc.)
  – Shared mechanisms
  – Good performance (match/beat old Watson GCs)
  – Ideal platform for apples-to-apples comparisons

Some Questions
Architecture
  – How well do modern OO languages play to modern architectures?
Collection
  – Is generational GC “a waste of time”?
  – Are write barriers expensive?
Allocation
  – Free list or bump pointer?
“Locality is everything”
  – Really???
  – Is it different for young & old? Why?
Locality and architecture
  – What is the impact, what is the trend?

Methodology
Jikes RVM & MMTk
Platforms
  – 1.6GHz G5 (PowerPC 970)
  – 1.9GHz AMD Athlon
  – 2.6GHz Intel P4
Linux with perfctr patch & libraries
  – Separate accounting of GC & mutator perf counts
SPECjvm98 & pseudojbb

Architecture

Relative Performance
[Figure: relative performance on the Athlon, 2.6GHz P4 and 1.6GHz G5 for compress, jess, raytrace, db, javac, mtrt, jack and pseudojbb; plot not captured in the transcript]

Architecture - Q & A
How big is the mismatch between modern arch & modern languages???

Allocation

Allocation Choices
Bump pointer
  – ~70 bytes of IA32 instructions, 726MB/s
Free list
  – ~140 bytes of IA32 instructions, 654MB/s
Bump pointer 11% faster in tight loop
  – < 1% in practical setting
  – No significant difference (?)
Second order effects?
  – Locality??
  – Collection mechanism??
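The two allocation choices on this slide can be sketched in a few lines. This is a minimal illustration only, not MMTk's implementation: the class names, the size classes, and the way the region is carved up are all assumptions made for the example. The last function simply reproduces the tight-loop figure from the two throughputs quoted on the slide.

```python
class BumpAllocator:
    """Contiguous allocation: advance a cursor through one region."""

    def __init__(self, start, size):
        self.cursor = start
        self.limit = start + size

    def alloc(self, nbytes, align=8):
        addr = (self.cursor + align - 1) & ~(align - 1)  # round up to alignment
        if addr + nbytes > self.limit:
            return None  # region exhausted; a real collector would trigger a GC
        self.cursor = addr + nbytes
        return addr


class SegregatedFreeList:
    """Free-list allocation: one list of fixed-size cells per size class."""

    def __init__(self, start, size, size_classes=(16, 32, 64, 128)):
        # Size classes are hypothetical; the region is split evenly among them.
        self.size_classes = size_classes
        self.free_cells = {}
        block = size // len(size_classes)
        for i, sc in enumerate(size_classes):
            base = start + i * block
            cells = [base + j * sc for j in range(block // sc)]
            self.free_cells[sc] = cells[::-1]  # pop() returns lowest address first

    def _size_class(self, nbytes):
        for sc in self.size_classes:
            if nbytes <= sc:
                return sc
        return None  # larger than the biggest size class

    def alloc(self, nbytes):
        sc = self._size_class(nbytes)
        if sc is None or not self.free_cells[sc]:
            return None
        return self.free_cells[sc].pop()

    def free(self, addr, nbytes):
        # The freed cell goes back on its list and is reused by the next alloc.
        self.free_cells[self._size_class(nbytes)].append(addr)


def tight_loop_speedup_percent(bump_mb_s, free_list_mb_s):
    """Relative throughput advantage of the first allocator, in percent."""
    return (bump_mb_s / free_list_mb_s - 1.0) * 100.0
```

With the slide's numbers, `tight_loop_speedup_percent(726, 654)` comes out at roughly 11%, matching the tight-loop claim. The sketch also shows why placement differs: the bump allocator places consecutive objects next to each other, while the free list places them by size class and by whichever cell was freed most recently.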

Implications for Locality
Compare SS & MS mutator
  – Mutator time = total time - GC time
  – Mutator memory performance: L1, L2 & TLB
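The subtraction on this slide extends naturally from time to the other counters. The sketch below assumes counters are sampled for the whole run and around each GC phase; the `Counters` type, its field names, and the numbers in the usage note are hypothetical and do not reflect the perfctr interface.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Counters:
    """One sample of the counters of interest (field names are illustrative)."""
    cycles: int = 0
    l1_misses: int = 0
    l2_misses: int = 0
    tlb_misses: int = 0

    def __add__(self, other):
        return Counters(*(a + b for a, b in zip(astuple(self), astuple(other))))

    def __sub__(self, other):
        return Counters(*(a - b for a, b in zip(astuple(self), astuple(other))))


def mutator_counters(total, gc_phases):
    """Mutator share = whole-run counts minus the counts of every GC phase."""
    gc_total = sum(gc_phases, Counters())
    return total - gc_total
```

For example, with made-up values, a run totalling `Counters(1000, 50, 20, 5)` and two collections measuring `Counters(100, 5, 2, 1)` and `Counters(150, 10, 3, 0)` leaves `Counters(750, 35, 15, 4)` attributed to the mutator.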

jess
[Figures, four slides: plots not captured in the transcript]

javac
[Figure: plot not captured in the transcript]

pseudojbb
[Figure: plot not captured in the transcript]

db
[Figure: plot not captured in the transcript]

Locality

Bump Pointer & Free List
Is the locality differential age-dependent?
Re-run experiment with GenCopy & GenMS
  – Generational variants of SemiSpace & MarkSweep
  – Young objects treated identically
  – Mature objects either SemiSpace or MarkSweep
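The experimental setup on this slide (identical nursery, different mature space) can be sketched as a toy model. Everything here is an assumption made for illustration: `GenHeap`, `MATURE_BASE`, the fixed cell size, and passing liveness in as a set are not taken from Jikes RVM or MMTk. The copying collection is also simplified: it compacts survivors in address order into the same address range, where a real semispace collector copies in trace order into the other half.

```python
MATURE_BASE = 1 << 20  # hypothetical address where the mature space starts


class GenHeap:
    """Toy generational heap: bump-allocated nursery plus a pluggable mature space."""

    def __init__(self, mature_policy, nursery_size=256, cell=32):
        if mature_policy not in ("SemiSpace", "MarkSweep"):
            raise ValueError(mature_policy)
        self.policy = mature_policy
        self.nursery_size = nursery_size
        self.cell = cell            # single MarkSweep cell size, for simplicity
        self.nursery = {}           # object id -> (addr, size), in allocation order
        self.nursery_top = 0
        self.mature = {}            # object id -> (addr, size)
        self.mature_top = MATURE_BASE
        self.free_cells = []        # recycled cells (MarkSweep only)

    def alloc(self, oid, size):
        """Young objects are treated identically: always bump-allocated."""
        if size > self.cell:
            raise ValueError("object larger than the toy cell size")
        if self.nursery_top + size > self.nursery_size:
            raise MemoryError("nursery full; call minor_gc first")
        addr = self.nursery_top
        self.nursery[oid] = (addr, size)
        self.nursery_top += size
        return addr

    def _mature_alloc(self, size):
        if self.policy == "MarkSweep":
            if self.free_cells:
                return self.free_cells.pop()   # reuse a hole left by a dead object
            addr, self.mature_top = self.mature_top, self.mature_top + self.cell
            return addr
        addr, self.mature_top = self.mature_top, self.mature_top + size
        return addr                            # SemiSpace: bump allocation

    def minor_gc(self, live):
        """Promote live nursery objects, then reset the nursery."""
        for oid, (_, size) in self.nursery.items():
            if oid in live:
                self.mature[oid] = (self._mature_alloc(size), size)
        self.nursery.clear()
        self.nursery_top = 0

    def mature_gc(self, live):
        survivors = {o: v for o, v in self.mature.items() if o in live}
        if self.policy == "MarkSweep":
            # Survivors stay put; dead cells go back on the free list.
            dead = [addr for o, (addr, _) in self.mature.items() if o not in live]
            self.free_cells.extend(dead)
            self.mature = survivors
        else:
            # Survivors are copied into a fresh contiguous run.
            self.mature, self.mature_top = {}, MATURE_BASE
            for oid, (_, size) in sorted(survivors.items(), key=lambda kv: kv[1][0]):
                self.mature[oid] = (self._mature_alloc(size), size)
```

In this model the two configurations hand out the same nursery addresses, so any difference in mutator locality has to come from where promoted objects end up: packed together under SemiSpace, or left in place with holes refilled under MarkSweep.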

Bump Pointer & Free List
[Figure: plot not captured in the transcript]

Bump Pointer & Free List
Why?
  – Mature space locality?
  – Nursery absorbs most allocs (lower fragmentation)
  – Relatively frequent copying in SS
  – Contiguous allocation in nursery?

Bump Pointer & Free List
Why?
  – Mature space locality
  – Nursery absorbs most allocs (lower fragmentation)
  – Relatively frequent copying in SS
  – Contiguous allocation in nursery

Bump Pointer & Free List
Run SS & MS in “infinite” heap
[Figures, two slides: plots not captured in the transcript]
Infinite heap does not degrade locality (!?)
  – Exceptions: jess (degrades), db (improves). Why?
  – Is spatial locality unimportant in mature space???

BP & FL Locality Implications
Is spatial locality unimportant in mature space??
  – No [Huang et al OOPSLA 2004]
  – But perhaps temporal locality is more significant
Seems clear contiguous allocation is good
  – Vast majority of objects < cache line
  – h/w prefetcher may be significant
Hard to improve over alloc order, easy to mess up?
  – Unlikely to be true: MarkSweep < Compacting < SemiSpace
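One way to see why contiguous allocation matters when most objects are smaller than a cache line is to count the lines touched by a group of objects allocated together. The sketch below is illustrative only: the 64-byte line size, the helper names, and the scattered free-cell layout are assumptions, and a freshly initialised free list would not be this fragmented.

```python
CACHE_LINE = 64  # bytes; an assumed line size, not taken from the slides


def cache_lines_touched(placements, line=CACHE_LINE):
    """Count distinct cache lines covered by (address, size) pairs."""
    lines = set()
    for addr, size in placements:
        first, last = addr // line, (addr + size - 1) // line
        lines.update(range(first, last + 1))
    return len(lines)


def bump_placements(sizes, start=0):
    """Objects allocated back to back, as a bump pointer would place them."""
    placements, cursor = [], start
    for size in sizes:
        placements.append((cursor, size))
        cursor += size
    return placements


def free_list_placements(sizes, free_cells):
    """Each object takes the next recycled cell, wherever that cell happens to be."""
    cells = iter(free_cells)
    return [(next(cells), size) for size in sizes]
```

For eight 16-byte objects, bump allocation covers two 64-byte lines. If the free list hands out eight cells that are 256 bytes apart (a made-up fragmentation pattern), the same objects cover eight lines, so walking them in allocation order touches four times as many lines.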

Locality & Architecture

MS/SS Crossover: 1.6GHz PPC
[Figure: plot not captured in the transcript]

MS/SS Crossover: 1.9GHz AMD
[Figure: plot not captured in the transcript]

MS/SS Crossover: 2.6GHz P4
[Figure: plot not captured in the transcript]

MS/SS Crossover: 3.2GHz P4
[Figure: plot not captured in the transcript]

MS/SS Crossover
[Figure: crossover points for the 1.6GHz, 1.9GHz, 2.6GHz and 3.2GHz machines; labels: locality, space; plot not captured in the transcript]

Conclusions
Need for (re)evaluation of GC performance
  – Key GC insights > 20yrs old
  – Technology has changed
  – Absence of apples-to-apples comparisons
  – Highly architecturally sensitive
MMTk + perf counters
  – High performance infrastructure
  – Multiple GCs, shared mechanisms
Some myths exposed & new realities