Java 9: The Quest for Very Large Heaps


Bernard Traversat, VP Software Development, Java SE, Oracle (bernard.traversat@oracle.com)
Antoine Chambille, Head of Research & Development, Quartet FS (antoine.chambille@quartetfs.com)

Oracle and ActivePivot Partnership History
2014: Oracle and Quartet FS meet through a common customer (an investment bank) with the challenge of running ActivePivot calculations in a 10 TB JVM.
2015: Partnership to operate a credit risk application on a SPARC server with 384 cores and 16 TB of memory; presentation at JavaOne 2015.
2016: ActivePivot engineers and JVM engineers work together on Java 9 to improve garbage collection for large heaps; presentation of achievements at JavaOne 2016.
2017: Release of Java 9 with very large heap support.

The Rise of In-Memory Computing

Use Cases for In-Memory Computing
Finance, pricing, supply chain.

Java for In-Memory Computing
Performance: JIT compilation and state-of-the-art parallel computing (java.util.concurrent).
Popularity: runs everywhere; versatile, safe and forgiving; 20 million developers.
But garbage collection scalability is not limitless.

ActivePivot Design for Large-Memory Calculations
Off-heap memory management: most database structures (columns of numbers, vectors of simulations, hash tables, indexes) live off-heap, which relieves garbage collection. This is implemented with mmap and a custom malloc implementation.
Parallel programming: a Fork/Join pool (a reentrant thread pool with work stealing) and lock-free data structures (dictionaries, indexes, queues).
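ActivePivot's own mmap-plus-custom-malloc layer is not shown in these slides. As a stand-in, here is a minimal sketch of one off-heap column of doubles kept in a direct ByteBuffer, which lives in native memory outside the Java heap, so the GC tracks one small buffer object instead of the column contents. The class name is hypothetical; a MappedByteBuffer obtained from FileChannel.map would give an mmap-backed buffer with the same get/put API, and direct buffers are capped by -XX:MaxDirectMemorySize.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical off-heap column of doubles backed by native (direct) memory. */
final class OffHeapDoubleColumn {
    private final ByteBuffer buffer; // allocated outside the Java heap
    private final int capacity;

    OffHeapDoubleColumn(int capacity) {
        this.capacity = capacity;
        // Native byte order avoids byte swapping on every read and write.
        this.buffer = ByteBuffer.allocateDirect(capacity * Double.BYTES)
                                .order(ByteOrder.nativeOrder());
    }

    void set(int row, double value) {
        buffer.putDouble(row * Double.BYTES, value); // absolute put: no shared position state
    }

    double get(int row) {
        return buffer.getDouble(row * Double.BYTES); // absolute get
    }

    int size() {
        return capacity;
    }

    /** Sums rows [from, to) by reading straight from native memory. */
    double sum(int from, int to) {
        double total = 0;
        for (int row = from; row < to; row++) {
            total += get(row);
        }
        return total;
    }
}
```

Because reads and writes use absolute offsets, several threads can read disjoint row ranges of the same column without coordinating on the buffer's position.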

Unleashing Many-Core Parallelism
(Diagram, before and after: the data is split into partitions so that each CPU works on its own partition.)
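A minimal sketch of the partition-per-core idea, assuming the data is pre-split into roughly equal partitions and each partition is aggregated by its own task in a ForkJoinPool (the Fork/Join pool from the previous slide). Class and method names are hypothetical, and a simple sum stands in for a real calculation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

final class PartitionedAggregation {

    /** Aggregates one partition; the task touches only its own slice of data. */
    static final class PartitionSum extends RecursiveTask<Long> {
        private final long[] partition;

        PartitionSum(long[] partition) {
            this.partition = partition;
        }

        @Override
        protected Long compute() {
            long total = 0;
            for (long v : partition) total += v;
            return total;
        }
    }

    /** Splits data into at most `partitions` contiguous slices (hypothetical helper). */
    static List<long[]> partition(long[] data, int partitions) {
        List<long[]> result = new ArrayList<>();
        int chunk = (data.length + partitions - 1) / partitions; // ceiling division
        for (int start = 0; start < data.length; start += chunk) {
            int end = Math.min(start + chunk, data.length);
            long[] slice = new long[end - start];
            System.arraycopy(data, start, slice, 0, slice.length);
            result.add(slice);
        }
        return result;
    }

    /** Runs one task per partition and combines the partial results. */
    static long parallelSum(List<long[]> partitions, ForkJoinPool pool) {
        return pool.submit(() -> {
            List<PartitionSum> tasks = new ArrayList<>();
            for (long[] p : partitions) tasks.add(new PartitionSum(p));
            // invokeAll forks every task; idle workers steal queued ones.
            ForkJoinTask.invokeAll(tasks);
            long total = 0;
            for (PartitionSum t : tasks) total += t.join();
            return total;
        }).join();
    }
}
```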

NUMA Architecture
(Diagram: thread A works on allocated memory A and thread B on allocated memory B, each using memory local to its own processor.)

Garbage Collection (GC)
Java garbage collection behavior has the greatest impact on overall application performance and throughput. As the JVM heap size and the object allocation rate grow, so does the time an application must pause to let the JVM perform GC.
Long and unpredictable GC pauses significantly affect applications by:
- delaying or deadlocking high-frequency transactions
- degrading application throughput
- causing user-session timeouts
- requiring costly over-capacity planning
GC pauses have traditionally been proportional to the size of the heap, so a larger heap means a longer pause. GCs optimized for predictability or low pause times have been proposed, but they tend to have limited applicability (small heaps, throughput degradation, and the full-GC cliff).
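To see how much time a workload spends in GC, the standard java.lang.management API exposes cumulative counters per collector. A small sketch, assuming accumulated collection time is a good-enough proxy for pauses (it is not a per-pause histogram; for that, Java 9's unified logging via -Xlog:gc is more precise). The class name is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Reads cumulative GC statistics through the standard management API. */
final class GcPauseProbe {

    /** Total collection time in ms across all collectors (-1 means "not reported"). */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    /** Total number of collections across all collectors. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long c = gc.getCollectionCount();
            if (c > 0) total += c;
        }
        return total;
    }

    /** GC time (ms) accumulated while the workload ran; GCs caused by other threads count too. */
    static long gcTimeDuring(Runnable workload) {
        long before = totalGcTimeMillis();
        workload.run();
        return totalGcTimeMillis() - before;
    }

    /** One line per collector, e.g. "G1 Young Generation" when running G1. */
    static void printCollectors() {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```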

GC Background: Current GC Strategies Are Becoming Obsolete
- Memory and heap sizes are getting too large for current GC algorithms to scale.
- Core counts are increasing, providing more spare cycles for concurrent GCs.
- Large heaps (100+ GB) increase the risk of heap fragmentation.
- The ratio of reads to writes is changing drastically (immutable big-data objects).
- Scalability and predictability matter as much as low latency:
  - reducing GC-storm jitter in large cloud deployments
  - avoiding unexpected delays in microservices, which can cause chained delays in other dependent microservices.

G1 (Garbage First) Architecture
- One large contiguous heap space divided into many fixed-size regions; the region size can be 1 MB to 32 MB.
- Scales to multi-TB heaps.
- Each region can be assigned its own eviction/compaction policy (Eden, Survivor, Humongous, or Old region).
- Per-region, scalable collection process.
- Segmenting the heap enables future capabilities.
(Diagram: a grid of heap regions labelled E = Eden, S = Survivor, O = Old generation, H = Humongous, plus available/unused regions.)
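The region arithmetic can be made concrete. This sketch assumes the default heuristic HotSpot's G1 uses in JDK 9: aim for about 2048 regions, round down to a power of two, and clamp to the 1 MB to 32 MB range from the slide. The real code averages the initial and maximum heap sizes, which this sketch simplifies to one heap size, and -XX:G1HeapRegionSize overrides the choice. The humongous check follows G1's rule that an object of at least half a region is humongous. Names are hypothetical.

```java
/** Region-size arithmetic for a G1-style heap (simplified sketch). */
final class G1RegionMath {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;   // smallest region size on the slide
    static final long MAX_REGION = 32 * MB;  // largest region size on the slide
    static final long TARGET_REGIONS = 2048; // assumption: HotSpot's default target count

    /** Power-of-two region size aiming for about TARGET_REGIONS regions. */
    static long regionSizeFor(long heapBytes) {
        long size = Math.max(heapBytes / TARGET_REGIONS, MIN_REGION);
        size = Long.highestOneBit(size); // round down to a power of two
        return Math.min(Math.max(size, MIN_REGION), MAX_REGION);
    }

    /** Number of regions needed to cover the heap (ceiling division). */
    static long regionCount(long heapBytes, long regionSize) {
        return (heapBytes + regionSize - 1) / regionSize;
    }

    /** Objects of at least half a region go to dedicated humongous regions. */
    static boolean isHumongous(long objectBytes, long regionSize) {
        return objectBytes >= regionSize / 2;
    }
}
```

For a 16 TB heap the size is capped at 32 MB, which gives 524,288 regions; this is why per-region bookkeeping has to scale for multi-TB heaps.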

G1 (Garbage First) Architecture (continued)
- Segmented regions provide more predictable GC pause times.
- Eden, Survivor and Old generation spaces are non-contiguous.
- Live objects from Eden and Survivor regions are evacuated to a set of unused regions, which become the new Survivor regions.
- The number of Eden and Survivor regions evolves dynamically according to GC requirements.
- After a concurrent cycle, old regions are collected based on their object liveness.
- A Remembered Set tracks the references into a given region from other regions.
- The Collection Set is the set of regions to be collected in a given GC.
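Remembered sets are what let G1 collect only some regions: references from outside the collection set are found through the remembered sets instead of by scanning the whole heap. The toy model below works at region granularity (real G1 records cards within regions, which is not modelled), and the method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy remembered sets at region granularity (real G1 tracks cards). */
final class ToyRememberedSets {
    // target region id -> ids of other regions holding references into it
    private final Map<Integer, Set<Integer>> rsets = new HashMap<>();

    /** Called on every reference store "from.field = to", like a write barrier. */
    void onReferenceStore(int fromRegion, int toRegion) {
        if (fromRegion == toRegion) return; // intra-region references need no tracking
        rsets.computeIfAbsent(toRegion, k -> new HashSet<>()).add(fromRegion);
    }

    /** Regions outside the collection set that must be scanned as extra roots. */
    Set<Integer> externalRootRegions(Set<Integer> collectionSet) {
        Set<Integer> roots = new HashSet<>();
        for (int region : collectionSet) {
            for (int src : rsets.getOrDefault(region, Set.of())) {
                if (!collectionSet.contains(src)) roots.add(src); // in-set sources get evacuated anyway
            }
        }
        return roots;
    }
}
```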

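G1 Garbage Collector – Overview
The cycle runs: young collections (n times) → initial mark, which starts concurrent marking → young collections (n times) while concurrent marking proceeds → remark → clean-up → young collection → mixed collections (n times). Every step is a stop-the-world (STW) pause for the application threads except concurrent marking, which is done by marking threads running alongside the application.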

G1 Garbage Collector – Running application
0. Initial state
1. Application threads allocate objects

G1 Garbage Collector – Young collection
1. Evacuate live objects (into a Survivor region)
2. Reclaim memory
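The two steps above can be shown with a toy model: live objects are copied out of the Eden regions into a fresh survivor region, and then the Eden regions are reclaimed wholesale rather than object by object. Liveness is given as a flag here, whereas a real collector determines it by tracing from roots; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy young collection: evacuate live objects, then reclaim whole regions. */
final class ToyYoungCollection {

    static final class Obj {
        final String name;
        final boolean live; // a real GC derives liveness by tracing from roots

        Obj(String name, boolean live) {
            this.name = name;
            this.live = live;
        }
    }

    static final class Region {
        final List<Obj> objects = new ArrayList<>();
    }

    /** Step 1: copy live objects to a survivor region. Step 2: empty the Eden regions. */
    static Region collect(List<Region> eden) {
        Region survivor = new Region();
        for (Region r : eden) {
            for (Obj o : r.objects) {
                if (o.live) survivor.objects.add(o); // "copy" the live object
            }
        }
        for (Region r : eden) {
            r.objects.clear(); // dead objects vanish with their region; no per-object freeing
        }
        return survivor;
    }
}
```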

G1 Garbage Collector – Young collection
Young collections repeat until the occupancy threshold (IHOP, Initiating Heap Occupancy Percent) is reached.
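A sketch of the fixed-percentage form of the IHOP trigger, as set by -XX:InitiatingHeapOccupancyPercent. JDK 9 also adapts this threshold by default (-XX:+G1UseAdaptiveIHOP), which the sketch ignores. The simulation assumes each young collection promotes a constant amount into old regions; method names are hypothetical.

```java
/** Fixed-threshold IHOP check plus a toy count of young GCs before marking starts. */
final class IhopTrigger {

    /** True once occupancy reaches ihopPercent of the heap (integer math, no rounding). */
    static boolean shouldStartConcurrentCycle(long occupiedBytes, long heapBytes, int ihopPercent) {
        return occupiedBytes * 100 >= heapBytes * ihopPercent;
    }

    /** Young collections needed before the threshold is crossed, assuming constant promotion. */
    static int youngCollectionsUntilMarking(long heapBytes, long oldBytes,
                                            long promotedPerGc, int ihopPercent) {
        if (promotedPerGc <= 0) throw new IllegalArgumentException("occupancy must grow");
        int collections = 0;
        while (!shouldStartConcurrentCycle(oldBytes, heapBytes, ihopPercent)) {
            oldBytes += promotedPerGc; // survivors promoted into old regions by each young GC
            collections++;
        }
        return collections;
    }
}
```

For example, with a 100-unit heap, 10 units promoted per young GC and a 45% threshold, marking starts after the fifth young collection (occupancy 50).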

G1 Garbage Collector – Initial mark
Start of the concurrent marking…

G1 Garbage Collector – Concurrent marking
Young collections keep going while marking proceeds.

G1 Garbage Collector – Remark
Help marking complete.

G1 Garbage Collector – Remark
Collect statistics about old regions based on their percentage of live objects (for example 75%, 30%, 15%).

G1 Garbage Collector – Clean up
Clean up and select the regions to collect.
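Region selection after marking is where "garbage first" gets its name: regions with the least live data reclaim the most space per byte copied. The sketch below orders candidate old regions by live fraction only, which is a simplification (G1 also weighs predicted copying cost against the pause-time goal). The maxLiveFraction cutoff is a hypothetical parameter, similar in spirit to HotSpot's G1MixedGCLiveThresholdPercent.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Chooses old regions for mixed collections, emptiest first (simplified). */
final class CollectionSetChooser {

    static final class OldRegion {
        final int id;
        final double liveFraction; // share of the region still live after marking

        OldRegion(int id, double liveFraction) {
            this.id = id;
            this.liveFraction = liveFraction;
        }
    }

    static List<Integer> choose(List<OldRegion> regions, double maxLiveFraction, int maxRegions) {
        List<OldRegion> candidates = new ArrayList<>();
        for (OldRegion r : regions) {
            if (r.liveFraction <= maxLiveFraction) candidates.add(r); // skip mostly-live regions
        }
        // Garbage first: least live data means most space reclaimed per byte copied.
        candidates.sort(Comparator.comparingDouble((OldRegion r) -> r.liveFraction));
        List<Integer> chosen = new ArrayList<>();
        for (int i = 0; i < Math.min(maxRegions, candidates.size()); i++) {
            chosen.add(candidates.get(i).id);
        }
        return chosen;
    }
}
```

With the slide's example liveness values (75%, 30%, 15%), the 15% region is chosen first, then the 30% one.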

G1 Garbage Collector – Young collection
(Diagram steps 1 and 2: a young collection evacuates live objects, then reclaims memory.)

G1 Garbage Collector – Mixed collection
(Diagram steps 1 and 2: a mixed collection evacuates the young regions together with the old regions selected at clean-up, then reclaims them.)

Read-Only Benchmark 2015
(Charts: GC time distribution for short queries, calculation queries, and full scans.)

Improving Young Collections
(Diagram: GC threads 1 and 2 (T1, T2) copying objects, each with a private buffer and a public buffer.)

(Diagram, before and after: threads 1 and 2 work from public buffers with work stealing and refill; the "after" version adds a private buffer per thread.)
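The private/public buffer idea can be sketched as follows. This is a toy of our own, not HotSpot's implementation: each worker processes tasks from an unsynchronized private buffer, moves surplus work into a public buffer that other workers may steal from, and refills from its own public buffer when the private one runs dry. Tasks are nodes of a binary tree (a stand-in for an object graph), so a run to depth d visits 2^(d+1) - 1 nodes. EXPOSE_THRESHOLD and all names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

final class PrivatePublicStealing {
    static final int EXPOSE_THRESHOLD = 8; // hypothetical: max tasks kept private

    /** Traverses a binary tree of the given depth; returns the number of nodes visited. */
    static long traverse(int workerCount, int maxDepth) {
        List<ConcurrentLinkedDeque<Integer>> publicBufs = new ArrayList<>();
        for (int i = 0; i < workerCount; i++) publicBufs.add(new ConcurrentLinkedDeque<>());
        AtomicLong pending = new AtomicLong(1); // tasks created but not yet processed
        AtomicLong processed = new AtomicLong();
        publicBufs.get(0).add(0);               // root node at depth 0

        Thread[] threads = new Thread[workerCount];
        for (int w = 0; w < workerCount; w++) {
            final int self = w;
            threads[w] = new Thread(() -> {
                ArrayDeque<Integer> privateBuf = new ArrayDeque<>(); // owner-only, no sync
                ConcurrentLinkedDeque<Integer> mine = publicBufs.get(self);
                while (pending.get() > 0) {
                    Integer depth = privateBuf.pollLast();                  // cheap local pop
                    if (depth == null) depth = mine.pollLast();             // refill from own public buffer
                    if (depth == null) depth = steal(publicBufs, self);     // steal from others
                    if (depth == null) { Thread.onSpinWait(); continue; }
                    if (depth < maxDepth) {
                        pending.addAndGet(2); // count children before this task completes
                        privateBuf.addLast(depth + 1);
                        privateBuf.addLast(depth + 1);
                    }
                    processed.incrementAndGet();
                    pending.decrementAndGet();
                    // Expose surplus work so idle threads can steal it.
                    while (privateBuf.size() > EXPOSE_THRESHOLD) mine.addFirst(privateBuf.pollFirst());
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return processed.get();
    }

    private static Integer steal(List<ConcurrentLinkedDeque<Integer>> bufs, int self) {
        int n = bufs.size();
        int start = ThreadLocalRandom.current().nextInt(n);
        for (int i = 0; i < n; i++) {
            int victim = (start + i) % n;
            if (victim == self) continue;
            Integer task = bufs.get(victim).pollFirst(); // take the oldest exposed task
            if (task != null) return task;
        }
        return null;
    }
}
```

The pending counter gives simple termination detection: it reaches zero only when no task is buffered anywhere and none is being processed.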

Read-Only Benchmark 2015 vs 2016
(Charts: GC time distribution for short queries, calculation queries, and full scans, comparing 2015 and 2016.)

Java Heap Live Data Bitmap
(Diagram: objects B to F in the Java heap and their corresponding entries in a 512-bit live data bitmap.)
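A live-data (mark) bitmap records which heap addresses hold marked objects. This single-threaded sketch assumes one bit per 8-byte heap word, so the bitmap is 1/64 of the heap size; addresses are plain longs and the class name is hypothetical. A real concurrent marker sets bits with atomic operations, which java.util.BitSet does not provide.

```java
import java.util.BitSet;

/** Toy mark bitmap: one bit per 8-byte heap word (single-threaded). */
final class MarkBitmap {
    private static final int WORD_SHIFT = 3; // assumption: 8-byte heap words
    private final long heapBase;
    private final BitSet bits;

    MarkBitmap(long heapBase, long heapBytes) {
        this.heapBase = heapBase;
        this.bits = new BitSet((int) (heapBytes >>> WORD_SHIFT));
    }

    private int bitIndex(long address) {
        return (int) ((address - heapBase) >>> WORD_SHIFT);
    }

    /** Marks the object starting at address; returns false if it was already marked. */
    boolean mark(long address) {
        int i = bitIndex(address);
        if (bits.get(i)) return false;
        bits.set(i);
        return true;
    }

    boolean isMarked(long address) {
        return bits.get(bitIndex(address));
    }

    /** Address of the next marked object at or after address, or -1 if none. */
    long nextMarked(long address) {
        int i = bits.nextSetBit(bitIndex(address));
        return i < 0 ? -1 : heapBase + ((long) i << WORD_SHIFT);
    }
}
```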

(Diagram, before and after: marking threads 1 and 2 access the mark stack one at a time; after the change they share a lock-free stack.)
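The slides do not show the exact structure of the lock-free shared stack, so here is a Treiber stack instead: the standard lock-free stack, in which threads push and pop with compare-and-set on the top pointer and simply retry if another thread won the race. Because Java allocates a fresh node per push and the GC keeps nodes alive while referenced, this version does not suffer from the ABA problem. The class name is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Minimal lock-free (Treiber) stack, one possible shape for a shared mark stack. */
final class LockFreeMarkStack<T> {

    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T value) {
        while (true) {
            Node<T> current = top.get();
            Node<T> node = new Node<>(value, current);
            if (top.compareAndSet(current, node)) return; // retry if another thread won
        }
    }

    /** Returns the popped value, or null when the stack is empty. */
    T pop() {
        while (true) {
            Node<T> current = top.get();
            if (current == null) return null;
            if (top.compareAndSet(current, current.next)) return current.value;
        }
    }
}
```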

Read-Write Benchmark 2016
(Charts: GC time distribution for short and full-scan queries, and the impact of updates on query time for short and full-scan queries, read-only vs read & write.)
Updates recycle about one terabyte every 10 minutes.

Do you want to be part of it? Please make the connection.
Antoine Chambille, antoine.chambille@quartetfs.com
Bernard Traversat, bernard.traversat@oracle.com
www.quartetfs.com