380C Where are we & where we are going – Managed languages Dynamic compilation Inlining Garbage collection What else can you do when you examine the heap.

380C: Where are we & where we are going
– Managed languages
   Dynamic compilation
   Inlining
   Garbage collection
   What else can you do when you examine the heap a lot?
– Why you need to care about workloads
– Alias analysis
– Dependence analysis
– Loop transformations
– EDGE architectures

380C lecture 18: Garbage Collection
– Why use garbage collection?
– What is garbage? Reachable vs. live, stack maps, etc.
– Allocators and their collection mechanisms
   Semispace
   Marksweep
   Performance comparisons
   Mark Region
– Incremental age-based collection
   Write barriers: friend or foe?
   Generational
   Beltway
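The write barriers listed under incremental age-based collection can be illustrated with a minimal sketch. It assumes a hypothetical two-generation heap in which every object carries a generation tag. The barrier runs on every pointer store and records old-to-young slots in a remembered set, so that a nursery collection can treat those slots as extra roots instead of scanning the whole old generation. The names `Obj`, `remembered_set`, `write_ref`, and `nursery_roots` are illustrative only; they are not MMTk or Jikes RVM APIs.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # eq=False keeps identity hashing, so objects can sit in a set
class Obj:
    """A toy heap object with named reference fields and a generation tag."""
    name: str
    old: bool = False          # True if the object lives in the old generation
    fields: dict = field(default_factory=dict)


# Hypothetical remembered set: (old object, field name) slots that may point
# into the nursery.
remembered_set: set = set()


def write_ref(src: Obj, field_name: str, target: Optional[Obj]) -> None:
    """Store target into src.fields[field_name], running a generational barrier."""
    if src.old and target is not None and not target.old:
        # Remember the slot rather than the target: a nursery collection
        # re-reads the slot and updates it if the young object is moved.
        remembered_set.add((src, field_name))
    src.fields[field_name] = target


def nursery_roots(stack_roots: list) -> list:
    """Roots for a nursery-only trace: program roots plus remembered slots."""
    extra = []
    for src, name in remembered_set:
        target = src.fields.get(name)
        if target is not None and not target.old:  # slot may have been overwritten
            extra.append(target)
    return list(stack_roots) + extra
```

The extra test on every pointer store is the cost that the "friend or foe?" question weighs against the benefit of not tracing the old generation at each nursery collection.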

Mark Region and Other Advances in Garbage Collection
Kathryn S. McKinley (University of Texas at Austin), Stephen M. Blackburn (Australian National University)
PLDI'08: Immix: A Mark-Region Collector With Space Efficiency, Fast Collection, and Mutator Performance

Isn’t GC a bit retro?
"Languages without automated garbage collection are getting out of fashion. The chance of running into all kinds of memory problems is gradually outweighing the performance penalty you have to pay for garbage collection."
Paul Jansen, managing director of TIOBE Software, in Dr. Dobb's, April 2008
[Timeline: Mark-Sweep (McCarthy, 1960); Mark-Compact (Styger, 1967); Semi-Space (Cheney, 1970)]

GC Fundamentals: The Time–Space Tradeoff
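The slide's chart is not recoverable. The sketch below uses a common back-of-the-envelope model of the tradeoff, which is an assumption and not the slide's data. A full-heap tracing collector does work roughly proportional to the live data L at each collection. With a heap of size H, it must collect once for every H − L bytes allocated. Allocating A bytes therefore costs about (A / (H − L)) · L units of tracing work, and that cost falls steeply as the heap grows. The function name `gc_work` is hypothetical.

```python
def gc_work(allocated: float, heap: float, live: float) -> float:
    """Approximate total tracing work (bytes traced) for a full-heap collector.

    Simplifying assumptions: every collection traces all live data, and a
    collection is triggered each time the free space (heap - live) fills up.
    """
    if heap <= live:
        raise ValueError("heap must be larger than the live data")
    collections = allocated / (heap - live)
    return collections * live
```

In this model, with 100 MB live and 10,000 MB allocated, a 200 MB heap costs 10,000 MB of tracing. A 300 MB heap halves that to 5,000 MB.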

Our Goal

GC Fundamentals: Algorithmic Components
– Allocation: bump allocation, free list
– Identification: tracing (implicit), reference counting (explicit)
– Reclamation: sweep-to-free, compact, evacuate
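The two allocation components can be contrasted with a minimal sketch over hypothetical integer addresses. A bump allocator advances a cursor through one contiguous region, so consecutive allocations are adjacent in memory. A free-list allocator hands out whatever free cell fits, so neighbours in time can be far apart in memory. First-fit is used here only because it is short. Many production free-list allocators use segregated size classes instead.

```python
from typing import Optional


class BumpAllocator:
    """Allocate by advancing a cursor through one contiguous free region."""

    def __init__(self, start: int, limit: int):
        self.cursor = start
        self.limit = limit

    def alloc(self, size: int) -> Optional[int]:
        """Return the new object's address, or None when the region is full."""
        if self.cursor + size > self.limit:
            return None  # slow path: a real VM would fetch a new region or collect
        addr = self.cursor
        self.cursor += size
        return addr


class FreeListAllocator:
    """First-fit allocation from a list of (address, size) free cells."""

    def __init__(self, free_cells: list):
        self.free = sorted(free_cells)

    def alloc(self, size: int) -> Optional[int]:
        for i, (addr, cell_size) in enumerate(self.free):
            if cell_size >= size:
                if cell_size == size:
                    del self.free[i]  # exact fit: the cell is used up
                else:
                    # Split the cell; the remainder stays on the free list.
                    self.free[i] = (addr + size, cell_size - size)
                return addr
        return None
```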

GC Fundamentals: Canonical Garbage Collectors
– Mark-Sweep [McCarthy 1960]: free-list + trace + sweep-to-free
– Mark-Compact [Styger 1967]: bump allocation + trace + compact
– Semi-Space [Cheney 1970]: bump allocation + trace + evacuate

Mark-Sweep: Free-List Allocation + Trace + Sweep-to-Free
✓ Space efficient
✓ Simple, very fast collection
✗ Poor locality
(Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo)
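Mark-sweep can be sketched over a toy heap. This is a hypothetical model: a dict maps each object's address to the addresses it references, and object sizes are ignored. The mark phase traces from the roots. The sweep phase frees every unmarked object and returns the freed addresses, which would then feed a free-list allocator. Survivors never move, so new objects end up interleaved with old ones. This is the poor locality noted above. Unreachable cycles are reclaimed as well, because identification is by reachability and not by reference counts.

```python
def mark(heap: dict, roots: list) -> set:
    """Trace from the roots; heap maps address -> list of referenced addresses."""
    marked = set()
    worklist = list(roots)
    while worklist:
        addr = worklist.pop()
        if addr in marked:
            continue
        marked.add(addr)
        worklist.extend(heap[addr])
    return marked


def sweep(heap: dict, marked: set) -> list:
    """Sweep-to-free: delete unmarked objects and return their addresses,
    in address order, as free cells for the free-list allocator."""
    free = [addr for addr in sorted(heap) if addr not in marked]
    for addr in free:
        del heap[addr]
    return free
```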

Mark-Compact: Bump Allocation + Trace + Compact
✓ Space efficient
✓ Good locality
✗ Expensive multi-pass collection
(Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo)
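The "expensive multi-pass collection" becomes clearer with a sketch of one common mark-compact design, Lisp-2-style sliding compaction. The slides do not say which compaction algorithm the measured MarkCompact uses. In this toy model (the `Obj` type and `mark_compact` are hypothetical), compaction needs three passes over the heap after marking: compute forwarding addresses, update every reference, then move the objects. Live objects keep their allocation order, and afterwards the free space is one contiguous run, so bump allocation can resume.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    """A toy heap object: address, size in bytes, and outgoing references."""
    addr: int
    size: int
    refs: list = field(default_factory=list)  # addresses of referenced objects
    marked: bool = False
    forward: int = -1                          # new address, set in pass 1


def mark_compact(objects: list, roots: list) -> tuple:
    """Mark, then slide live objects toward address 0.

    objects must be sorted by address. Returns (live objects in address
    order, updated roots, new bump-allocation cursor).
    """
    by_addr = {o.addr: o for o in objects}
    # Mark: trace from the roots.
    stack = list(roots)
    while stack:
        o = by_addr[stack.pop()]
        if not o.marked:
            o.marked = True
            stack.extend(o.refs)
    # Pass 1: compute forwarding addresses, preserving allocation order.
    free = 0
    for o in objects:
        if o.marked:
            o.forward = free
            free += o.size
    # Pass 2: rewrite references in live objects, and the roots.
    for o in objects:
        if o.marked:
            o.refs = [by_addr[r].forward for r in o.refs]
    new_roots = [by_addr[r].forward for r in roots]
    # Pass 3: move each live object to its forwarding address.
    live = [o for o in objects if o.marked]
    for o in live:
        o.addr, o.marked = o.forward, False
    return live, new_roots, free
```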

Semi-Space: Bump Allocation + Trace + Evacuation
✓ Good locality
✗ Space inefficient
(Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo)
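Evacuation in a semi-space collector can be sketched with Cheney's breadth-first copying scheme. This is a minimal model, assuming hypothetical object ids in from-space and list indices as to-space "addresses". Copying into to-space is bump allocation, and the to-space region between the scan and free pointers acts as the work queue, so no separate mark stack is needed. The space inefficiency follows directly: to-space must be able to hold every survivor, so half the heap is held in reserve.

```python
def cheney_collect(from_space: dict, roots: list) -> tuple:
    """Copy everything reachable from the roots out of from_space.

    from_space maps an object id -> list of referenced object ids.
    Returns (to_space, new_roots), where to_space[i] is the reference list of
    the object copied to index i, rewritten to use to-space indices.
    """
    to_space = []    # copied objects, in copy (bump) order
    forwarding = {}  # old id -> new index: the forwarding pointers

    def evacuate(old):
        if old not in forwarding:                   # first visit: copy it
            forwarding[old] = len(to_space)
            to_space.append(list(from_space[old]))  # fields still hold old ids
        return forwarding[old]

    new_roots = [evacuate(r) for r in roots]
    scan = 0
    # Objects between scan and len(to_space) are copied but not yet scanned.
    while scan < len(to_space):
        to_space[scan] = [evacuate(child) for child in to_space[scan]]
        scan += 1
    return to_space, new_roots
```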

Mark-Region with Sweep-to-Region
Reclamation: sweep-to-free, compact, evacuate, sweep-to-region
– Mark-Sweep: free-list + trace + sweep-to-free
– Mark-Compact: bump allocation + trace + compact
– Semi-Space: bump allocation + trace + evacuate
– Mark-Region: bump allocation + trace + sweep-to-region

Mark-Region: Bump Allocation + Trace + Sweep-to-Region
✓ Simple, very fast collection
✓ Space efficient
✓ Good locality
✓ Excellent performance
(Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo)

Naïve Mark-Region
Contiguous allocation into regions
– Excellent locality
– For simplicity, objects cannot span regions
Simple mark phase (like mark-sweep)
– Mark objects and their containing region
Unmarked regions can be freed
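Naïve mark-region can be sketched in a few lines. The model is hypothetical: objects are addresses in a dict of references, and the region size is an illustrative constant, not a value from the slides. Objects cannot span regions, so an object's region is just its address divided by the region size. The trace marks each reachable object together with its region, and every unmarked region becomes wholly free for bump allocation. The example also shows the cost of this simplicity. A dead object that shares a region with a live one is retained, which is the "false marking" fragmentation discussed under Lines and Blocks.

```python
REGION_SIZE = 256  # bytes; illustrative value only


def mark_regions(objects: dict, roots: list) -> set:
    """Trace from the roots, marking each reachable object's region.

    objects maps address -> list of referenced addresses.
    """
    marked_objs, marked_regions = set(), set()
    stack = list(roots)
    while stack:
        addr = stack.pop()
        if addr in marked_objs:
            continue
        marked_objs.add(addr)
        marked_regions.add(addr // REGION_SIZE)  # objects never span regions
        stack.extend(objects[addr])
    return marked_regions


def free_regions(num_regions: int, marked_regions: set) -> list:
    """Sweep-to-region: every unmarked region is free for bump allocation."""
    return [r for r in range(num_regions) if r not in marked_regions]
```

Here the dead object at address 16 stays in memory because region 0 also holds the live object at address 0.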

Immix: Efficient Mark-Region Garbage Collection

Lines and Blocks
Small regions
✗ Fragmentation (can't fill blocks)
✗ Increased metadata overhead
✗ Constrained object sizes
Large regions
✓ More contiguous allocation
✗ Fragmentation (false marking)
Lines & blocks: blocks of N pages (TLB locality), lines of approx. 1 cache line (cache locality)
✓ Less fragmentation: objects span lines
✓ Fast common case: lines marked with objects
– Block > 4 × max object size
[Figure: a block's lines, showing free and recyclable lines]
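Allocation into a recyclable block can be sketched as follows, assuming a hypothetical per-block list of line marks and an illustrative 128-byte line. The allocator finds holes, which are maximal runs of unmarked lines, and bump-allocates through them in address order. An object may span several free lines but never crosses a marked line. The sketch deliberately ignores the conservative treatment of the line after a marked line, which belongs to implicit marking.

```python
LINE_SIZE = 128  # bytes, roughly one cache line (illustrative value)


def find_holes(line_marks: list) -> list:
    """Return (first_line, end_line) runs of unmarked lines; end is exclusive."""
    holes, start = [], None
    for i, marked in enumerate(line_marks + [True]):  # sentinel closes a final run
        if not marked and start is None:
            start = i
        elif marked and start is not None:
            holes.append((start, i))
            start = None
    return holes


def alloc_in_block(line_marks: list, sizes: list) -> list:
    """Bump-allocate objects (sizes in bytes) into a recyclable block's holes.

    Objects may span free lines but never cross a marked line. Returns each
    object's byte offset in the block; once the allocator has moved past the
    last hole, every remaining object gets None.
    """
    holes = find_holes(line_marks)
    offsets = []
    h, cursor, limit = 0, 0, 0
    if holes:
        cursor, limit = holes[0][0] * LINE_SIZE, holes[0][1] * LINE_SIZE
    for size in sizes:
        # Like any bump allocator, only move forward: skip holes that are too small.
        while h < len(holes) and cursor + size > limit:
            h += 1
            if h < len(holes):
                cursor, limit = holes[h][0] * LINE_SIZE, holes[h][1] * LINE_SIZE
        if h >= len(holes):
            offsets.append(None)
        else:
            offsets.append(cursor)
            cursor += size
    return offsets
```

In the test's block, the 200-byte object skips a one-line hole and lands in the two-line hole at offset 768. This is the kind of skipped hole that overflow allocation targets.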

Allocation Policy (Recycling)
– Recycle partially marked blocks first
   Minimizes fragmentation
   Maximizes sharing of freed blocks
– Recycle in address order
   We explored other options
– Allocate into free blocks last
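The recycling policy reduces to an ordering over blocks. This sketch assumes a hypothetical map from block address to the number of lines marked in the last collection. Partially marked (recyclable) blocks come first in address order, completely free blocks come last, and full blocks are skipped. The slide does not specify an order among free blocks; they are sorted by address here only to make the result deterministic.

```python
def allocation_order(marked_lines: dict, lines_per_block: int) -> list:
    """Return block addresses in the order the allocator should use them.

    marked_lines maps block address -> number of marked (live) lines.
    """
    recyclable = sorted(b for b, n in marked_lines.items()
                        if 0 < n < lines_per_block)
    free = sorted(b for b, n in marked_lines.items() if n == 0)
    return recyclable + free  # full blocks (n == lines_per_block) are skipped
```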

Opportunistic Defragmentation
– Identify source and target blocks (see paper for heuristics)
– Evacuate objects in source blocks
   Allocate into target blocks
– Opportunistic
   Leave in place if no space, or object pinned
– Opportunistically evacuate fragmented blocks
   Lightweight, uses same allocation mechanism
   No cost in common case (specialized GC)
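The "leave in place if no space, or object pinned" rule can be sketched under stated assumptions. Source and target blocks are taken as given, because the selection heuristics are in the paper and are not reproduced here. A simple first-fit over the targets' free bytes stands in for the real allocator. `Obj` and `defragment` are hypothetical names. Each live object in a source block is evacuated if it is unpinned and some target has room. Otherwise it is simply marked in place, so evacuation never needs to succeed for the collection to complete.

```python
from dataclasses import dataclass


@dataclass
class Obj:
    name: str
    size: int      # bytes
    block: int     # id of the block the object currently lives in
    pinned: bool = False


def defragment(live: list, sources: set, target_space: dict) -> dict:
    """Opportunistically evacuate live objects out of source blocks.

    target_space maps target block id -> free bytes; it is updated in place.
    Returns a map from object name to the block it ends up in.
    """
    placement = {}
    for obj in live:
        placement[obj.name] = obj.block  # default: stay put (mark in place)
        if obj.block not in sources or obj.pinned:
            continue
        for target, free in target_space.items():
            if free >= obj.size:
                target_space[target] = free - obj.size
                placement[obj.name] = target  # evacuated to a target block
                break
        # No target with room: the object simply stays where it is.
    return placement
```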

Other Optimizations
Implicit marking
✓ Most objects small
– Small objects implicitly mark next line
✓ Very fast common case
– Large objects mark lines exactly
[Figure: implicit line marks vs. exact line marks]
Overflow allocation
– Multi-line objects may skip many small holes
– Overflow allocation (used on failure)
✓ Large objects uncommon
✓ Very effective solution
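Both optimizations can be sketched in a simplified form, assuming a hypothetical list of line marks and an illustrative 128-byte line. An object of at most one line marks only the line it starts on. It may still spill into the next line, so the allocator conservatively treats the line right after any marked line as unusable. Larger objects mark every line they touch. For overflow allocation, a multi-line object that does not fit in the current hole goes to a separate overflow block, so it does not skip holes that small objects could still fill. The function names and string results are illustrative only.

```python
LINE_SIZE = 128  # bytes (illustrative value)


def mark_object_lines(line_marks: list, offset: int, size: int) -> None:
    """Record a live object's lines in its block's line-mark table."""
    first = offset // LINE_SIZE
    if size <= LINE_SIZE:
        line_marks[first] = True  # small: mark the start line only (implicit mark)
    else:
        last = (offset + size - 1) // LINE_SIZE
        for i in range(first, last + 1):  # larger: mark every line touched
            line_marks[i] = True


def usable_lines(line_marks: list) -> list:
    """Lines the allocator may reuse: unmarked, and not directly after a
    marked line (a small object there may have spilled into it)."""
    return [i for i, m in enumerate(line_marks)
            if not m and not (i > 0 and line_marks[i - 1])]


def choose_destination(size: int, hole_remaining: int) -> str:
    """Simplified overflow-allocation decision for the next object."""
    if size <= hole_remaining:
        return "current hole"
    if size > LINE_SIZE:
        return "overflow block"  # multi-line object: don't skip the hole for it
    return "next hole"           # small object: skipping wastes little space
```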

Results
Complete data available at:

Evaluation
Benchmarks: DaCapo, SPECjvm98, SPEC jbb2000
Methodology: MMTk; Jikes RVM (perf ≈ HotSpot 1.5); replay compiler; discard outliers; report 95th percentile
Collectors
– Full heap: Immix, MarkSweep, MarkCompact, SemiSpace
– Generational: GenIX, GenMS, GenCopy
– Sticky: StickyIX, StickyMS
Hardware
– Core 2 Duo 2.4GHz, 32KB L1, 4MB L2, 2GB RAM
– AMD Athlon GHz, 64KB L1, 512KB L2, 2GB RAM
– PowerPC GHz, 32KB L1, 512KB L2, 2GB RAM
Please see the paper for details.

Mutator Time (geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo)

Minimum Heap

GC Time (geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo)

Total Performance (geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo)

Generational Performance (geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo)

Sticky Performance (geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo)

PseudoJBB On 2.4GHz Core 2 Duo

Prior Work
IBM product collector
– Mark-Region not characterized
– Collector not evaluated
– Product and basis for other research [Domani et al. 2000][Kermany & Petrank 2006]

Mark-Region Collection
Reclamation: sweep-to-free, compact, evacuate, sweep-to-region
– Mark-Sweep: free-list + trace + sweep-to-free
– Mark-Compact: bump allocation + trace + compact
– Semi-Space: bump allocation + trace + evacuate
– Mark-Region: bump allocation + trace + sweep-to-region

Immix: Efficient Mark-Region Collection
✓ Simple, very fast collection
✓ Space efficient
✓ Good locality
✓ Excellent performance
(Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo)

Open Source
Code available in JikesRVM onward.
Complete data available at:

Research History
– PLDI 1998: Clinger & Hansen postulated the radioactive decay model for object lifetimes
– Genesis of Older-First [Stefanovic, McKinley, Moss OOPSLA'99]

Garbage Collection Hypotheses
– Generational hypothesis: younger objects die quickly, so collect them first
– Older-first hypothesis: the collector can collect less the longer it waits
[Figure: survival function s(v) for the object lifetime distribution across an age-ordered heap, from younger (0) through 1/2 V to older (V)]
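The radioactive decay model can be made concrete with a small worked sketch. Every object is assumed to have the same constant chance of dying per unit time, so the survival function is s(v) = 2^(−v/h) for a half-life h. The function names below are hypothetical. In this model a young object is no more likely to die than an old one, but the older a group of objects is, the smaller the fraction still alive. This is one way to read the older-first intuition: a copying collector that waits longer before collecting a window of objects has fewer survivors to copy.

```python
def survival(age: float, half_life: float) -> float:
    """Radioactive-decay model: fraction of objects still alive at a given age."""
    return 0.5 ** (age / half_life)


def survivors_to_copy(window_bytes: float, age: float, half_life: float) -> float:
    """Expected live bytes a copying collector must move when it collects a
    window of objects that are all `age` time units old (decay model)."""
    return window_bytes * survival(age, half_life)
```

With a half-life of 10 time units, a 1,000-byte window collected at age 30 has about 125 bytes left to copy, compared with 500 bytes at age 10.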

Older-first Algorithm

Next Steps
– Beltway [BJMM PLDI'02]
   Increments
   Belts
   Combines generational and older-first
– Ulterior Reference Counting [BM OOPSLA'03]
   Reference counting on a per-object basis
   Responsiveness and throughput
– MMTk [BCM SIGMETRICS'04, ICSE'04]
   Toolkit for building & understanding GC
   Motivated today's work

Garbage Collection is the Answer to All Your Problems
– Improves data and code locality [Huang et al. OOPSLA'02, ISMM'04, VEE'04]
– Cooperative GC optimizations
   Colocation [Guyer OOPSLA'05]
   Free-Me [Guyer et al. PLDI'06]
– Finds leaks [Bond ASPLOS'06, Jump POPL'07]
– Tolerates leaks [Bond OOPSLA'08]
– Helps with dynamic software updating! [Subramaniam, Hicks ??'08]
– DaCapo Benchmarks [Blackburn et al. OOPSLA'06, CACM'08]

380C: Where are we & where we are going
– Why you need to care about workloads
– Managed languages
   Dynamic compilation
   Inlining
   Garbage collection
– Opportunity to improve data locality on-the-fly
   Read: X. Huang, S. M. Blackburn, K. S. McKinley, J. E. B. Moss, Z. Wang, and P. Cheng, The Garbage Collection Advantage: Improving Program Locality, ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp , Vancouver, Canada, October
– Alias analysis
– Dependence analysis
– Loop transformations
– EDGE architectures