Investigating the Effects of Using Different Nursery Sizing Policies on Performance Tony Guan, Witty Srisa-an, and Neo Jia Department of Computer Science.


Investigating the Effects of Using Different Nursery Sizing Policies on Performance
Tony Guan, Witty Srisa-an, and Neo Jia
Department of Computer Science & Engineering
University of Nebraska-Lincoln

Setting the Stage …
Throughput performance of SPECjAppServer2004
JBoss application server
HotSpot VM released as part of OpenJDK 1.7
– Parallel generational collector with 8 minor and 8 major collection threads
– 256 MB heap for this example (very little paging)
– Nursery and mature spaces are set to 1:2 and periodically resized to maximize throughput

Setting the Stage
The generational collector's performance contributes to this throughput behavior
– Higher workload often results in higher GC overhead
– More GC work means less useful work done by the application

Not all generational collectors are created equal!

Motivation
One factor that differs among generational collector implementations is the nursery sizing policy
– There are multiple ways to size the nursery and mature space
– Performance ramifications of using each policy have not been widely investigated

Research Goals
Can performance be affected by using different sizing policies?
– Modify a VM to support multiple sizing policies
If so, how can it be affected?
– Perform analysis by observing various metrics including execution time, throughput, GC behavior, and minimum mutator utilization

Agenda
– Overview of investigated sizing policies
– Experimental methodology
– Results of our evaluation
– Introducing a hybrid policy
– Conclusions

Sizing Policy: Fixed Ratio
[Figure: heap split into Nursery (33%) and Mature (66%); the same split is kept after the heap size is enlarged by 20%]
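The fixed-ratio idea can be illustrated with a little arithmetic. The following is a minimal sketch, assuming the nursery is always one part in three of the total heap (the 1:2 nursery-to-mature split quoted in the slides) and that resizing changes only the total heap size. The class and method names are hypothetical and are not taken from the modified HotSpot source.

```java
/** Hypothetical sketch of a fixed-ratio (FR) sizing rule; not the authors' HotSpot code. */
final class FixedRatioSizing {
    // Assumed from the slides: nursery : mature = 1 : 2.
    static final long NURSERY_PARTS = 1;
    static final long MATURE_PARTS = 2;

    /** Nursery size in bytes for a given total heap size. */
    static long nurserySize(long heapBytes) {
        return heapBytes * NURSERY_PARTS / (NURSERY_PARTS + MATURE_PARTS);
    }

    /** Mature space gets whatever the nursery does not take. */
    static long matureSize(long heapBytes) {
        return heapBytes - nurserySize(heapBytes);
    }

    public static void main(String[] args) {
        long heap = 300L << 20;        // 300 MB, an arbitrary example size
        long grown = heap + heap / 5;  // heap enlarged by 20%
        System.out.println("before: nursery=" + nurserySize(heap) + " mature=" + matureSize(heap));
        System.out.println("after:  nursery=" + nurserySize(grown) + " mature=" + matureSize(grown));
    }
}
```

Both spaces grow by the same factor, so the ratio between them does not change when the heap is resized.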

Sizing Policy: Heap Availability
[Figure: heap layout initially and after Minor_GC1, Minor_GC2, ..., Minor_GCn; regions labeled Nursery and Used Mature]
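One way to read the heap-availability idea is that the nursery is recomputed after every minor collection from whatever the mature space has not yet used. The sketch below is an assumption-laden illustration, not the authors' implementation: it assumes an Appel-style rule in which the free heap is split evenly between the nursery and an equally sized copy reserve (the slides state elsewhere that the copy reserve is 100% of the nursery). All names are hypothetical.

```java
/** Hypothetical sketch of a heap-availability (HA) sizing rule; not the authors' HotSpot code. */
final class HeapAvailabilitySizing {
    /**
     * Nursery size after a minor collection.
     * Assumption: free heap is shared equally by the nursery and its copy reserve.
     */
    static long nurserySize(long heapBytes, long usedMatureBytes) {
        long free = Math.max(0, heapBytes - usedMatureBytes);
        return free / 2;
    }

    public static void main(String[] args) {
        long heap = 256L << 20; // 256 MB, matching the jAppServer2004 heap in the slides
        long[] usedMature = {0, 64L << 20, 128L << 20, 192L << 20}; // made-up occupancy levels
        for (long used : usedMature) {
            System.out.println("used mature=" + used + " -> nursery=" + nurserySize(heap, used));
        }
    }
}
```

Under this reading the nursery shrinks as the used mature space grows, which is the progression the figure labels suggest.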

Sizing Policy: GC Ergonomics
[Figure: three heap configurations: Nursery (33%) / Mature (66%), Nursery (48%) / Mature (52%), Nursery (40%) / Mature (60%)]
Size and ratio are adjusted to meet performance goal(s)

Terminology: Copy-Reserve*
Copy-Reserve = 100% of nursery
*Sizing policy = FR
[Figure: heap layout initially, after a minor GC, and after a few minor GCs; regions labeled Nursery, Copy-Reserve, Used Mature, and Mature]
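The copy reserve can be thought of as a guarantee that a minor collection has somewhere to copy survivors even if every nursery object is live. The following sketch assumes that reading: a minor collection is treated as safe only while the free mature space is at least as large as the reserve. The class, the method names, and the idea of using this check as a trigger are illustrative assumptions, not details confirmed by the slides.

```java
/** Hypothetical sketch of a copy-reserve check; not the authors' HotSpot code. */
final class CopyReserveCheck {
    /** Bytes held back in the mature space: 100% of the nursery, as stated on the slide. */
    static long copyReserve(long nurseryBytes) {
        return nurseryBytes;
    }

    /** True when free mature space can absorb a worst-case minor collection (all objects survive). */
    static boolean minorGcIsSafe(long matureBytes, long usedMatureBytes, long nurseryBytes) {
        long freeMature = matureBytes - usedMatureBytes;
        return freeMature >= copyReserve(nurseryBytes);
    }

    public static void main(String[] args) {
        long mature = 200, nursery = 100; // arbitrary units for illustration
        for (long used = 0; used <= 150; used += 50) {
            System.out.println("used mature=" + used + " safe=" + minorGcIsSafe(mature, used, nursery));
        }
    }
}
```

With a fixed nursery size, the reserve is also fixed, so the usable part of the mature space is its capacity minus the nursery size.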

Experimental Setting
Three nursery sizing policies
– GC Ergonomics Policy (Default)
– Fixed Ratio Policy (FR)
– Heap Availability Policy (HA)
Multithreaded benchmarks
– SPECjvm2008 (17)
– Multithreaded benchmarks from DaCapo (eclipse, hsqldb, lusearch, xalan)
– SPECjbb2005 & SPECjAppServer

Experimental Setting
JVM (HotSpot) settings:
– Memory: Old:Young = 2:1; 2 times minimum heap for each application; 256 MB for jAppServer2004 and 1 GB for jbb2005
– GC: 8 threads for both minor & full GC; policy modification based on parallel collector
Platform: Intel Xeon 8 cores, 16 GB, running Linux
Methodology: 5 runs, report best, worst, and average
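The reporting step of this methodology (best, worst, and average over five runs) is simple to sketch. The example below uses the standard `DoubleSummaryStatistics` class; the measurement values are made up for illustration, and whether the minimum or the maximum counts as "best" depends on the metric (lower is better for execution time, higher for throughput).

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

/** Hypothetical helper for summarizing repeated benchmark runs. */
final class RunSummary {
    /** Returns min, max, and average of the given measurements. */
    static DoubleSummaryStatistics summarize(double[] measurements) {
        if (measurements.length == 0) {
            throw new IllegalArgumentException("need at least one run");
        }
        return Arrays.stream(measurements).summaryStatistics();
    }

    public static void main(String[] args) {
        double[] seconds = {12.4, 11.9, 12.8, 12.1, 12.3}; // made-up execution times for 5 runs
        DoubleSummaryStatistics s = summarize(seconds);
        // For execution time, the minimum is the best run and the maximum the worst.
        System.out.println("best=" + s.getMin() + " worst=" + s.getMax() + " average=" + s.getAverage());
    }
}
```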

Result: jvm2008 & DaCapo
The remaining thirteen benchmarks show little sensitivity to different policies

Analysis: jvm2008 & DaCapo
[Table: Minor and Full columns under each of Default, HA, and FR; rows Com.sunflow, Derby, Crypto.aes]
Using different sizing policies can affect garbage collection performance

Result: jAppServer

Analysis: jAppServer
[Table: Minor and Full columns under each of Default, HA, and FR; one row per configuration (25 Tx, … Tx)]

Analysis: jAppServer2004
Copy-Reserve = 100% of nursery
Sizing policy = FR or Default
[Figure: current heap usage across successive full collections; regions labeled Nursery, Copy-Reserve, and Used Mature; mature heap usage after each full collection is marked]

Analysis: jAppServer2004
Copy-Reserve = 100% of nursery
Sizing policy = HA
[Figure: heap layout initially, after a minor GC, and after a few minor GCs, with a minor collection between each stage; regions labeled Nursery, Copy-Reserve, and Used Mature]

MMU: jAppServer2004 at 25tx

MMU: jAppServer2004 at 40tx

MMU: jAppServer2004 at 50tx
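For readers unfamiliar with the metric on these charts: minimum mutator utilization (MMU) for a window length w is the smallest fraction of any w-long interval of the run during which the application, rather than the collector, was executing. The sketch below computes it from a list of stop-the-world pauses. It assumes the pauses are non-overlapping `{start, end}` pairs inside `[0, total]`; the class name and input format are hypothetical, and this is not the tooling used to produce the slides' charts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of an MMU calculation from a pause log. */
final class MmuSketch {
    /** GC time that overlaps the window [start, start + w]. */
    static double gcTimeInWindow(double[][] pauses, double start, double w) {
        double end = start + w;
        double gc = 0;
        for (double[] p : pauses) {
            double overlap = Math.min(p[1], end) - Math.max(p[0], start);
            if (overlap > 0) gc += overlap;
        }
        return gc;
    }

    /** Minimum mutator utilization for window length w over a run of length total. */
    static double mmu(double[][] pauses, double total, double w) {
        if (w <= 0 || w > total) throw new IllegalArgumentException("need 0 < w <= total");
        // GC time in a sliding window is piecewise linear in the window start, so the
        // worst case occurs where a window edge meets a pause edge (or at the run's ends).
        List<Double> starts = new ArrayList<>();
        starts.add(0.0);
        starts.add(total - w);
        for (double[] p : pauses) {
            starts.add(p[0]);
            starts.add(p[1]);
            starts.add(p[0] - w);
            starts.add(p[1] - w);
        }
        double worstGc = 0;
        for (double s : starts) {
            double clamped = Math.max(0, Math.min(s, total - w));
            worstGc = Math.max(worstGc, gcTimeInWindow(pauses, clamped, w));
        }
        return 1.0 - worstGc / w;
    }

    public static void main(String[] args) {
        double[][] pauses = {{10, 20}, {30, 40}}; // made-up pauses in a run of length 100
        for (double w : new double[] {10, 20, 30, 100}) {
            System.out.println("w=" + w + " MMU=" + mmu(pauses, 100, w));
        }
    }
}
```

An MMU curve plots this value against w; a curve that stays at zero for larger windows indicates longer stretches in which the application makes no progress.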

Result: jbb

MMU: jbb2005 at 8whs

MMU: jbb2005 at 23whs

Summary
– HA does not do as well as the other two policies when the workload is light
– HA allows the server to keep responding to requests for significantly longer under heavy workload

A Hybrid Policy
Use the default policy for peak performance during light workload
Use HA as soon as the GC behavior has reached a critical point
– number of consecutive FullGC >= consecFailure (e.g., 2)
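The switching rule above can be sketched as a small state machine. This is a minimal illustration, assuming the collector reports after each collection whether it was a full GC; only the `consecFailure` threshold comes from the slide, and the class, enum, and method names are hypothetical. The sketch models only the switch from the default policy to HA, since the slide gives no condition for switching back.

```java
/** Hypothetical sketch of the hybrid policy's switch condition; not the authors' HotSpot code. */
final class HybridPolicySwitch {
    enum Policy { DEFAULT, HA }

    private final int consecFailure;     // threshold from the slide, e.g. 2
    private int consecutiveFullGcs = 0;
    private Policy current = Policy.DEFAULT;

    HybridPolicySwitch(int consecFailure) {
        if (consecFailure < 1) throw new IllegalArgumentException("consecFailure must be >= 1");
        this.consecFailure = consecFailure;
    }

    /** Call after each collection; returns the policy to use from now on. */
    Policy onCollection(boolean wasFullGc) {
        // A minor collection breaks the run of consecutive full collections.
        consecutiveFullGcs = wasFullGc ? consecutiveFullGcs + 1 : 0;
        if (consecutiveFullGcs >= consecFailure) {
            current = Policy.HA;
        }
        return current;
    }

    public static void main(String[] args) {
        HybridPolicySwitch sw = new HybridPolicySwitch(2);
        boolean[] collections = {false, true, false, true, true, false}; // made-up GC sequence
        for (boolean full : collections) {
            System.out.println((full ? "full " : "minor") + " -> " + sw.onCollection(full));
        }
    }
}
```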

Preliminary Evaluation: jbb

Preliminary Evaluation: jAppServer

Preliminary Evaluation: Switching Between

Conclusions
Three nursery sizing policies are investigated using 23 benchmarks
– Sizing policy does matter! It can impact performance and serviceability of large servers
– Up to 36% performance differences have been observed in some benchmarks
The hybrid policy can be useful in large servers to better handle heavy workload

Investigating the Effects of Using Different Nursery Sizing Policies on Performance
For source of modified HotSpot & modified jbb2005 see:
Thanks!