Dynamic Selection of Application-Specific Garbage Collectors Sunil V. Soman Chandra Krintz University of California, Santa Barbara David F. Bacon IBM T.J.

Slides:

Advertisements

Similar presentations

An Implementation of Mostly- Copying GC on Ruby VM Tomoharu Ugawa The University of Electro-Communications, Japan.

Advertisements

1 Wake Up and Smell the Coffee: Performance Analysis Methodologies for the 21st Century Kathryn S McKinley Department of Computer Sciences University of.

Paging: Design Issues. Readings r Silbershatz et al: ,

U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science 1 MC 2 –Copying GC for Memory Constrained Environments Narendran Sachindran J. Eliot.

Steve Blackburn Department of Computer Science Australian National University Perry Cheng TJ Watson Research Center IBM Research Kathryn McKinley Department.

1 Write Barrier Elision for Concurrent Garbage Collectors Martin T. Vechev Cambridge University David F. Bacon IBM T.J.Watson Research Center.

1 Overview Assignment 5: hints  Garbage collection Assignment 4: solution.

Object Field Analysis for Heap Space Optimization ISMM 2004 G. Chen, M. Kandemir, N. Vijaykrishnanan and M. J. Irwin The Pennsylvania State University.

MC 2 : High Performance GC for Memory-Constrained Environments - Narendran Sachindran, J. Eliot B. Moss, Emery D. Berger Sowmiya Chocka Narayanan.

An On-the-Fly Mark and Sweep Garbage Collector Based on Sliding Views Hezi Azatchi - IBM Yossi Levanoni - Microsoft Harel Paz – Technion Erez Petrank –

MC 2 : High Performance GC for Memory-Constrained Environments N. Sachindran, E. Moss, E. Berger Ivan JibajaCS 395T *Some of the graphs are from presentation.

ParMarkSplit: A Parallel Mark- Split Garbage Collector Based on a Lock-Free Skip-List Nhan Nguyen Philippas Tsigas Håkan Sundell Distributed Computing.

Free-Me: A Static Analysis for Individual Object Reclamation Samuel Z. Guyer Tufts University Kathryn S. McKinley University of Texas at Austin Daniel.

OOPSLA 2003 Mostly Concurrent Garbage Collection Revisited Katherine Barabash - IBM Haifa Research Lab. Israel Yoav Ossia - IBM Haifa Research Lab. Israel.

Online Performance Auditing Using Hot Optimizations Without Getting Burned Jeremy Lau (UCSD, IBM) Matthew Arnold (IBM) Michael Hind (IBM) Brad Calder (UCSD)

1 The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology.

Task-aware Garbage Collection in a Multi-Tasking Virtual Machine Sunil Soman Laurent Daynès Chandra Krintz RACE Lab, UC Santa Barbara Sun Microsystems.

U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science CRAMM: Virtual Memory Support for Garbage-Collected Applications Ting Yang, Emery.

Generational Stack Collection And Profile driven Pretenuring Perry Cheng Robert Harper Peter Lee Presented By Moti Alperovitch

Connectivity-Based Garbage Collection Presenter Feng Xian Author Martin Hirzel, et.al Published in OOPSLA’2003.

U NIVERSITY OF M ASSACHUSETTS Department of Computer Science Automatic Heap Sizing Ting Yang, Matthew Hertz Emery Berger, Eliot Moss University of Massachusetts.

Dynamic Tainting for Deployed Java Programs Du Li Advisor: Witawas Srisa-an University of Nebraska-Lincoln 1.

JVM-1 Introduction to Java Virtual Machine. JVM-2 Outline Java Language, Java Virtual Machine and Java Platform Organization of Java Virtual Machine Garbage.

An Adaptive, Region-based Allocator for Java Feng Qian & Laurie Hendren 2002.

U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science Garbage Collection Without Paging Matthew Hertz, Yi Feng, Emery Berger University.

1 Reducing Generational Copy Reserve Overhead with Fallback Compaction Phil McGachey and Antony L. Hosking June 2006.

Comparison of JVM Phases on Data Cache Performance Shiwen Hu and Lizy K. John Laboratory for Computer Architecture The University of Texas at Austin.

Mark and Split Kostis Sagonas Uppsala Univ., Sweden NTUA, Greece Jesper Wilhelmsson Uppsala Univ., Sweden.

Garbage Collection Memory Management Garbage Collection –Language requirement –VM service –Performance issue in time and space.

The College of William and Mary 1 Influence of Program Inputs on the Selection of Garbage Collectors Feng Mao, Eddy Zheng Zhang and Xipeng Shen.

Exploiting Prolific Types for Memory Management and Optimizations By Yefim Shuf et al.

Adaptive Optimization in the Jalapeño JVM M. Arnold, S. Fink, D. Grove, M. Hind, P. Sweeney Presented by Andrew Cove Spring 2006.

ISMM 2004 Mostly Concurrent Compaction for Mark-Sweep GC Yoav Ossia, Ori Ben-Yitzhak, Marc Segal IBM Haifa Research Lab. Israel.

Exploring Multi-Threaded Java Application Performance on Multicore Hardware Ghent University, Belgium OOPSLA 2012 presentation – October 24 th 2012 Jennifer.

An Adaptive, Region-based Allocator for Java Feng Qian, Laurie Hendren {fqian, Sable Research Group School of Computer Science McGill.

Ulterior Reference Counting: Fast Garbage Collection without a Long Wait Author: Stephen M Blackburn Kathryn S McKinley Presenter: Jun Tao.

GC Algorithm inside.NET Luo Bingqiao 5/22/2009. Agenda 1. 经典基本垃圾回收算法 2.CLR 中垃圾回收算法介绍 3.SSCLI 中 Garbage Collection 源码分析.

Profile Driven Component Placement for Cluster-based Online Services Christopher Stewart (University of Rochester) Kai Shen (University of Rochester) Sandhya.

Adaptive Optimization with On-Stack Replacement Stephen J. Fink IBM T.J. Watson Research Center Feng Qian (presenter) Sable Research Group, McGill University.

Fast Conservative Garbage Collection Rifat Shahriyar Stephen M. Blackburn Australian National University Kathryn S. M cKinley Microsoft Research.

1 Fast and Efficient Partial Code Reordering Xianglong Huang (UT Austin, Adverplex) Stephen M. Blackburn (Intel) David Grove (IBM) Kathryn McKinley (UT.

A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization David Bacon Perry Cheng (presenting) V.T. Rajan IBM T.J. Watson Research.

Dynamic Object Sampling for Pretenuring Maria Jump Department of Computer Sciences The University of Texas at Austin Stephen M. Blackburn.

1 Tuning Garbage Collection in an Embedded Java Environment G. Chen, R. Shetty, M. Kandemir, N. Vijaykrishnan, M. J. Irwin Microsystems Design Lab The.

CS380 C lecture 20 Last time –Linear scan register allocation –Classic compilation techniques –On to a modern context Today –Jenn Sartor –Experimental.

Copyright (c) 2004 Borys Bradel Myths and Realities: The Performance Impact of Garbage Collection Paper: Stephen M. Blackburn, Perry Cheng, and Kathryn.

Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language Konstantinos Sagonas Jesper Wilhelmsson Uppsala.

Free-Me: A Static Analysis for Automatic Individual Object Reclamation Samuel Z. Guyer, Kathryn McKinley, Daniel Frampton Presented by: Dimitris Prountzos.

Chameleon Automatic Selection of Collections Ohad Shacham Martin VechevEran Yahav Tel Aviv University IBM T.J. Watson Research Center Presented by: Yingyi.

U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science 1 Automatic Heap Sizing: Taking Real Memory into Account Ting Yang, Emery Berger,

Finding Your Cronies: Static Analysis for Dynamic Object Colocation Samuel Z. Guyer Kathryn S. McKinley T H E U N I V E R S I T Y O F T E X A S A T A U.

September 11, 2003 Beltway: Getting Around GC Gridlock Steve Blackburn, Kathryn McKinley Richard Jones, Eliot Moss Modified by: Weiming Zhao Oct

Investigating the Effects of Using Different Nursery Sizing Policies on Performance Tony Guan, Witty Srisa-an, and Neo Jia Department of Computer Science.

02/09/2010 Industrial Project Course (234313) Virtualization-aware database engine Final Presentation Industrial Project Course (234313) Virtualization-aware.

380C lecture 19 Where are we & where we are going –Managed languages Dynamic compilation Inlining Garbage collection –Opportunity to improve data locality.

1 Garbage Collection Advantage: Improving Program Locality Xianglong Huang (UT) Stephen M Blackburn (ANU), Kathryn S McKinley (UT) J Eliot B Moss (UMass),

David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center ControllingFragmentation and Space Consumption in the Metronome.

A REAL-TIME GARBAGE COLLECTOR WITH LOW OVERHEAD AND CONSISTENT UTILIZATION David F. Bacon, Perry Cheng, and V.T. Rajan IBM T.J. Watson Research Center.

1 GC Advantage: Improving Program Locality Xianglong Huang, Zhenlin Wang, Stephen M Blackburn, Kathryn S McKinley, J Eliot B Moss, Perry Cheng.

1 The Garbage Collection Advantage: Improving Program Locality Xianglong Huang (UT), Stephen M Blackburn (ANU), Kathryn S McKinley (UT) J Eliot B Moss.

Memory Management What if pgm mem > main mem ?. Memory Management What if pgm mem > main mem ? Overlays – program controlled.

Dynamic Compilation Vijay Janapa Reddi

Chapter 9 – Real Memory Organization and Management

David F. Bacon, Perry Cheng, and V.T. Rajan

Page Replacement.

Adaptive Code Unloading for Resource-Constrained JVMs

Adaptive Optimization in the Jalapeño JVM

Garbage Collection Advantage: Improving Program Locality

Program-level Adaptive Memory Management

Presentation transcript:

Dynamic Selection of Application-Specific Garbage Collectors Sunil V. Soman Chandra Krintz University of California, Santa Barbara David F. Bacon IBM T.J. Watson Research Center

ISMM '042 Background VMs/managed runtimes Next generation internet computing Automatic memory management for memory safety High performance, multi-application servers GC performance impacts overall performance So GC must perform well

ISMM '043 GC Research Focus on general-purpose GC One GC for all apps No single GC provides best performance in all cases Across applications Even for the same application given different memory constraints

ISMM '044 Motivation SS MS GMS GSS CMS switch point Heap Size Relative to Min Execution Time (sec) _209_db SS MS GMS GSS CMS

ISMM '045 Motivation Other researchers have reported similar results [Attanasio ’01, Fitzgerald ’00] Spread between best & worst at least 15% Generational not always better Employ a VM instance built with the optimal GC Goal of our work: employ multiple GCs in the same system and switch between them

ISMM '046 Framework Implemented in the JikesRVM Uses the Java Memory management Toolkit (JMTk) Extended to enable multiple GC support Can switch between diverse collectors Easily extensible GC System = Allocator + Collector

ISMM '047 Framework Implementation StopTheWorld Plan SS_Plan MS_PlanGenerationalCopyMS_Plan GenMS_PlanGenCopy_Plan StopTheWorld VM_Processor Plan {SS_Plan,MS_Plan,...} (Selected at build time) VM_Processor (Plan currentPlan;) Most code is shared across GCs 44MB vs 42MB (average across GCs)

ISMM '048 Framework Implementation Example MS -> GSS MS -> GMS Nursery Mark-Sweep High Semispace Low Semispace Large Object Space GC Data Structures Immortal HIGHER LOWER Nursery Mark-Sweep High Semispace Low Semispace Large Object Space GC Data Structures Immortal HIGHER LOWER Nursery Mark-Sweep High Semispace Low Semispace Large Object Space GC Data Structures Immortal HIGHER LOWER Unmap Nursery Mark-Sweep High Semispace Low Semispace Large Object Space GC Data Structures Immortal HIGHER LOWER Alloc Nursery Mark-Sweep High Semispace Low Semispace Large Object Space GC Data Structures Immortal HIGHER LOWER

ISMM '049 Framework Implementation Example MS -> GSS MS -> GMS Nursery Mark-Sweep High Semispace Low Semispace Large Object Space GC Data Structures Immortal HIGHER LOWER Nursery Mark-Sweep High Semispace Low Semispace Large Object Space GC Data Structures Immortal HIGHER LOWER No GC Do not unmap Alloc

ISMM '0410 Framework Implementation Nursery Mark-Sweep High Semispace Low Semispace Large Object Space GC Data Structures Immortal HIGHER LOWER Mem mapped on demand Unused memory unmapped Need MS & copying states Use object header Bit stealing Worst-case cost = 1 GC No need to perform GC in many cases

ISMM '0411 Making Use of GC Switching Dynamic switching guided by Cross-input offline behavior Annotation Methodology 2.4 GHz/1G single proc x86 (hyperthreading) JikesRVM v2.2.0, linux kernel SpecJVM98, SpecJBB, Jolden, Javagrande Pseudo-adaptive system

Annotation-guided GC (AnnotGC) Annotation Hints about optimization opportunities Widely used to guide compiler optimization Different benchmarks & inputs examined offline Cross-input results combined Select “optimal” GC and annotate program with it Best GC for a range of heap sizes JVM at program load time Switch to annotated GC Ignore annotation if GC-Switching not avail.

ISMM '0413 AnnotGC Implementation Observations Only 0 or 1 switch points per benchmark Switch point relative to min heap does not vary much across inputs Annotate min heap size & switch point 4 byte annotation in Java class file Encoded in a compact form 40 MB min assumed if not specified

ISMM '0414 AnnotGC Results SS MS GMS GSS CMS SS MS GMS GSS CMS

ISMM '0415 AnnotGC Results Degradation over best: 4% Improvement over worst: 26% Improvement over GenMS: 7% Heap Size Relative to Min Execution Time (sec) _209_db SS MS GMS GSS CMS GC Annot Heap Size Relative to Min ^6/Throughput SPECjbb2000 SS MS GMS GC CMS GC Annot

AnnotGC Results Average Difference Between Best & Worst GC Systems BenchmarksDegradation over BestImprovement over Worst compress6.28% (443 ms)3.53% (279 ms) jess2.82% (85 ms)56.17% (5767 ms) db2.88% (532 ms)12.47% (3028 ms) javac5.64% (392 ms)24.12% (2944 ms) mpegaudio3.54% (214 ms)3.21% (209ms) mtrt4.51% (270 ms)42.29% (5170 ms) jack3.22% (147 ms)32.70% (2787 ms) JavaGrande3.97% (2511 ms)17.71% (15500 ms) SPECjbb2.22% (3.17*10 6 /tput)27.95% (82.6*10 6 /tput) MST4.07% (30 ms)48.42% (1001 ms) Voronoi9.20% (144 ms)31.78% (1063 ms) Average4.38%26.22% Average (without MS)3.36%24.13%

ISMM '0417 Extending GC Switching Switch automatically No offline profiling Employ execution characteristics Performance hit: lost optimization opportunity Inlining of allocation sites Write barriers Solution: Aggressive specialization guarded by Method invalidation On-stack replacement

ISMM '0418 Investigating Auto Switching Exploit general performance characteristics Best performing GCs across heap sizes & programs Deciding when to switch Default GC: MS Heap size & heap residency threshold If residency > 60%, switch to GSS if heap size > 90MB else use GMS Different collectors for startup & steady- state

ISMM '0419 Preliminary Implementation Method invalidation - for future invocations On-stack replacement - for currently executing methods Deferred compilation & guarded inlining [Fink’04] Unconditional OsrPoints Our extension OsrInfoPoints for state info Without inserting checks in application code

ISMM '0420 AutoSwitch Results Improvement over worst: 17% (Annot: 26%) Degradation over best: 15%(Annot: 4%) Overhead OSR - negligible Recompilation - depends on where switch occurs Lost optimization opportunities all variables live at every GC point OsrInfoPoints are “pinned” down DCE, load/store elim, code motion, reg allocation

ISMM '0421 Related Work Application-specific GC studies Profile-direction GC selection [Fitzgerald’00] Comparison of GCs [Attanasio’01] Comparing gen mark-sweep & copying [Zorn’90] Switching/swapping Coupling compaction with copying [Sansom’92] Hot-swapping mark-sweep/mark-compact [Printezis’01] BEA Weblogic JRockit [BEA Workshop’03]

ISMM '0422 Conclusion Choice of best GC is app-dependent Novel framework Switch between diverse collectors Enables annotation driven & automatic switching Significantly reduce impact of choosing the "wrong" collector AnnotGC: 26%, degradation: 4%, GMS: 7% Enabled by aggressive specialization

ISMM '0423 Future Work Improving AutoSwitch Improved OSR Cuts AutoSwitch overhead by half Paper available upon request Better heuristics for automatic switching Decide when to switch & which GC to switch to High-performance application-specific VMs Guided by available resources Application characteristics Self-modifying

App-specific Garbage Collectors Front LoadersRear Loaders RecyclersSide Loaders All products are trademarks of Heil Environmental Industries, Ltd

ISMM '0425 Extra slides

ISMM '0426 Preliminary Implementation Aggressive specialization Inline allocation sites Insert write barriers only when necessary Method Invalidation Invalidated method replaced by stub On-stack Replacement (OSR) Potentially at every GC point

ISMM '0427 On-stack replacement Insert OsrInfoPoints at each GC point Keep track of state Call sites, allocation sites, prologs, backedges, explicit yieldpoints, exceptions All local/stack variables kept alive bar foo osr_helper {... } old return address trigger OSR b. OSR triggered lazily by external event Lazy OSR

ISMM '0428 GC Performance Evaluation (2) Comparison of parallel garbage collectors [5] Mark-sweep, copying, Gen MS, Gen copy, hybrid (copying + MS) Results Hybrid has lowest heap residency Gen copy handles fragmentation better Minor collections for Gen copy faster Mark-sweep has fewer GCs No one collector best across applications

ISMM '0429 AnnotGC Results Degradation over best: 4% Improvement over worst: 26% Improvement over GenMS: 7% Heap Size Relative to Min ^6/Throughput SPECjbb2000 SS MS GMS GSS CMS GC Annot Heap Size Relative to Min Execution Time (sec) _209_db SS MS GMS GSS CMS GC Annot

ISMM '0430 GC Performance Evaluation Comparing generational mark-sweep & copying [66] Mark-sweep Slower free-list allocator Requires less heap space: 20% less on average Copying collection Fastest allocator Copying overhead Copy reserve space required Generational copying not clearly superior

ISMM '0431 GC Performance Evaluation (2) Comparison of parallel garbage collectors [5] Mark-sweep, copying, Gen MS, Gen copy, hybrid (copying + MS) Results Hybrid has lowest heap residency Gen copy handles fragmentation better Minor collections for Gen copy faster Mark-sweep has fewer GCs No one collector best across applications

ISMM '0432 GC Performance Evaluation (3) Fitzgerald et al [32] compared GCs across 20 benchmarks Null collector, non-gen copying, Gen copy with various WB implementations Every collector best at least once Spread between best & worst at least 15% Generational not always better Non-generational may outperform by 15-20% Recommend “profile-directed” selection The “best” GC chosen from different pre-compiled binaries based on profiles It is difficult or impossible to replace GCs at runtime

ISMM '0433