ParMarkSplit: A Parallel Mark- Split Garbage Collector Based on a Lock-Free Skip-List Nhan Nguyen Philippas Tsigas Håkan Sundell Distributed Computing.

Slides:



Advertisements
Similar presentations
Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems Håkan Sundell Philippas Tsigas.
Advertisements

Garbage collection David Walker CS 320. Where are we? Last time: A survey of common garbage collection techniques –Manual memory management –Reference.
A Study of Garbage Collector Scalability on Multicores LokeshGidra, Gaël Thomas, JulienSopena and Marc Shapiro INRIA/University of Paris 6.
Steve Blackburn Department of Computer Science Australian National University Perry Cheng TJ Watson Research Center IBM Research Kathryn McKinley Department.
1 Write Barrier Elision for Concurrent Garbage Collectors Martin T. Vechev Cambridge University David F. Bacon IBM T.J.Watson Research Center.
1 Overview Assignment 5: hints  Garbage collection Assignment 4: solution.
Scalable and Lock-Free Concurrent Dictionaries
Wait-Free Reference Counting and Memory Management Håkan Sundell, Ph.D.
MC 2 : High Performance GC for Memory-Constrained Environments - Narendran Sachindran, J. Eliot B. Moss, Emery D. Berger Sowmiya Chocka Narayanan.
Garbage Collection  records not reachable  reclaim to allow reuse  performed by runtime system (support programs linked with the compiled code) (support.
5. Memory Management From: Chapter 5, Modern Compiler Design, by Dick Grunt et al.
Garbage Collection CSCI 2720 Spring Static vs. Dynamic Allocation Early versions of Fortran –All memory was static C –Mix of static and dynamic.
An On-the-Fly Mark and Sweep Garbage Collector Based on Sliding Views Hezi Azatchi - IBM Yossi Levanoni - Microsoft Harel Paz – Technion Erez Petrank –
By Jacob SeligmannSteffen Grarup Presented By Leon Gendler Incremental Mature Garbage Collection Using the Train Algorithm.
MC 2 : High Performance GC for Memory-Constrained Environments N. Sachindran, E. Moss, E. Berger Ivan JibajaCS 395T *Some of the graphs are from presentation.
Heap Shape Scalability Scalable Garbage Collection on Highly Parallel Platforms Kathy Barabash, Erez Petrank Computer Science Department Technion, Israel.
380C Where are we & where we are going – Managed languages Dynamic compilation Inlining Garbage collection What else can you do when you examine the heap.
OOPSLA 2003 Mostly Concurrent Garbage Collection Revisited Katherine Barabash - IBM Haifa Research Lab. Israel Yoav Ossia - IBM Haifa Research Lab. Israel.
1 The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology.
Lock-free Cuckoo Hashing Nhan Nguyen & Philippas Tsigas ICDCS 2014 Distributed Computing and Systems Chalmers University of Technology Gothenburg, Sweden.
MOSTLY PARALLEL GARBAGE COLLECTION Authors : Hans J. Boehm Alan J. Demers Scott Shenker XEROX PARC Presented by:REVITAL SHABTAI.
UPortal Performance Optimization Faizan Ahmed Architect and Engineering Group Enterprise Systems & Services RUTGERS
Age-Oriented Concurrent Garbage Collection Harel Paz, Erez Petrank – Technion, Israel Steve Blackburn – ANU, Australia April 05 Compiler Construction Scotland.
1 An Efficient On-the-Fly Cycle Collection Harel Paz, Erez Petrank - Technion, Israel David F. Bacon, V. T. Rajan - IBM T.J. Watson Research Center Elliot.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
Mark and Split Kostis Sagonas Uppsala Univ., Sweden NTUA, Greece Jesper Wilhelmsson Uppsala Univ., Sweden.
Garbage Collection Memory Management Garbage Collection –Language requirement –VM service –Performance issue in time and space.
1 Overview Assignment 6: hints  Living with a garbage collector Assignment 5: solution  Garbage collection.
SUPPORTING LOCK-FREE COMPOSITION OF CONCURRENT DATA OBJECTS Daniel Cederman and Philippas Tsigas.
SEG Advanced Software Design and Reengineering TOPIC L Garbage Collection Algorithms.
The College of William and Mary 1 Influence of Program Inputs on the Selection of Garbage Collectors Feng Mao, Eddy Zheng Zhang and Xipeng Shen.
Taking Off The Gloves With Reference Counting Immix
ISMM 2004 Mostly Concurrent Compaction for Mark-Sweep GC Yoav Ossia, Ori Ben-Yitzhak, Marc Segal IBM Haifa Research Lab. Israel.
380C Lecture 17 Where are we & where we are going –Managed languages Dynamic compilation Inlining Garbage collection –Why you need to care about workloads.
Exploring Multi-Threaded Java Application Performance on Multicore Hardware Ghent University, Belgium OOPSLA 2012 presentation – October 24 th 2012 Jennifer.
Behavior of Synchronization Methods in Commonly Used Languages and Systems Yiannis Nikolakopoulos Joint work with: D. Cederman, B.
Ulterior Reference Counting: Fast Garbage Collection without a Long Wait Author: Stephen M Blackburn Kathryn S McKinley Presenter: Jun Tao.
Fast Conservative Garbage Collection Rifat Shahriyar Stephen M. Blackburn Australian National University Kathryn S. M cKinley Microsoft Research.
Håkan Sundell, Chalmers University of Technology 1 NOBLE: A Non-Blocking Inter-Process Communication Library Håkan Sundell Philippas.
November 15, 2007 A Java Implementation of a Lock- Free Concurrent Priority Queue Bart Verzijlenberg.
Copyright (c) 2004 Borys Bradel Myths and Realities: The Performance Impact of Garbage Collection Paper: Stephen M. Blackburn, Perry Cheng, and Kathryn.
Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language Konstantinos Sagonas Jesper Wilhelmsson Uppsala.
Challenges in Non-Blocking Synchronization Håkan Sundell, Ph.D. Guest seminar at Department of Computer Science, University of Tromsö, Norway, 8 Dec 2005.
How’s the Parallel Computing Revolution Going? 1How’s the Parallel Revolution Going?McKinley Kathryn S. McKinley The University of Texas at Austin.
Comparison of Compacting Algorithms for Garbage Collection Mrinal Deo CS395T – Spring
Compilation (Semester A, 2013/14) Lecture 13b: Memory Management Noam Rinetzky Slides credit: Eran Yahav 1.
Investigating the Effects of Using Different Nursery Sizing Policies on Performance Tony Guan, Witty Srisa-an, and Neo Jia Department of Computer Science.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
Immix: A Mark-Region Garbage Collector Curtis Dunham CS 395T Presentation Feb 2, 2011 Thanks to Steve Blackburn and Jennifer Sartor for their 2008 and.
GARBAGE COLLECTION IN AN UNCOOPERATIVE ENVIRONMENT Hans-Juergen Boehm Computer Science Dept. Rice University, Houston Mark Wieser Xerox Corporation, Palo.
Memory Management -Memory allocation -Garbage collection.
Runtime The optimized program is ready to run … What sorts of facilities are available at runtime.
Introduction to Garbage Collection. Garbage Collection It automatically reclaims memory occupied by objects that are no longer in use It frees the programmer.
2/4/20161 GC16/3011 Functional Programming Lecture 20 Garbage Collection Techniques.
® July 21, 2004GC Summer School1 Cycles to Recycle: Copy GC Without Stopping the World The Sapphire Collector Richard L. Hudson J. Eliot B. Moss Originally.
CS412/413 Introduction to Compilers and Translators April 21, 1999 Lecture 30: Garbage collection.
Reference Counting. Reference Counting vs. Tracing Advantages ✔ Immediate ✔ Object-local ✔ Overhead distributed ✔ Very simple Trivial implementation for.
GC Assertions: Using the Garbage Collector To Check Heap Properties Samuel Z. Guyer Tufts University Edward Aftandilian Tufts University.
GARBAGE COLLECTION Student: Jack Chang. Introduction Manual memory management Memory bugs Automatic memory management We know... A program can only use.
Eliminating External Fragmentation in a Non-Moving Garbage Collector for Java Author: Fridtjof Siebert, CASES 2000 Michael Sallas Object-Oriented Languages.
Introduction to Garbage Collection. GC Fundamentals Algorithmic Components AllocationReclamation 2 Identification Bump Allocation Free List ` Tracing.
Immix: A Mark-Region Garbage Collector Jennifer Sartor CS395T Presentation Mar 2, 2009 Thanks to Steve for his Immix presentation from
Java 9: The Quest for Very Large Heaps
Håkan Sundell Philippas Tsigas
Strategies for automatic memory management
Adaptive Code Unloading for Resource-Constrained JVMs
Memory Management Kathryn McKinley.
List Allocation and Garbage Collection
No Bit Left Behind: The Limits of Heap Data Compression
Reference Counting.
Presentation transcript:

ParMarkSplit: A Parallel Mark- Split Garbage Collector Based on a Lock-Free Skip-List Nhan Nguyen Philippas Tsigas Håkan Sundell Distributed Computing and Systems Chalmers University of Technology

Garbage Collection (GC) Reclaim memory no longer used by programs Algorithms: Mark-sweep (McCarthy 1959), copying (Fenichel & Yochelsom 1969, Cheney 1970), reference counting (Collins 1960), mark-compact (Cohen & Nicolau 1983), mark-split (Sagonas & Wilhelmsson 2006), mark-region (Blackburn & McKinley 2008), etc. Multicore era Garbage collectors need to be parallelized! 2

How do we parallelize GCs? The answer is not limited to one algorithm! Several algorithms have been parallelized: mark- sweep, mark-compact, copying, mark-region, etc Mark-Split [ Sagonas et al. ISMM2006 ] Can Mark-Split be parallelized? Is it a challenging problem to solve? 3

Contributions ParMarkSplit: A Parallel and Concurrent Mark-split Garbage Collector Lock-free skip-list with extended functionality Lazy splitting mechanism Implementation in OpenJDK HotSpot A collector for the old generation Evaluation using the DaCapo benchmarks (Blackburn et al. 2006) 4

Contributions Introduction Background Mark-split algorithm Contribution Skip-list with extended functionality Parallelization of mark-split Evaluation Conclusion 5

Mark-split algorithm (Sagonas & Wilhelmsson, ISMM 2006) Mark-Sweep: MARK reachable objects as LIVE Scan the heap to SWEEP over unmarked objects  Create free-list: list of free spaces. Mark-Split: mark without sweep MARK live objects Create the free-list during the mark phase, using an operation called SPLIT!  In some cases, this replacement of the sweep phase pays off! 6

Mark-Split (1) Heap: Root set: Free-list: 7

Mark-Split (2) 8 Heap: Root set: Free-list:

Mark-Split (2) Heap to be collected 9 Sequential Mark-Split outperforms Mark-Sweep in certain cases (ISMM 2006) Free-list: Root set:

Mark-Split Algorithm for Multi-core Architectures 10 Most frequent operation in Mark-Split “search and split” Free-list implementation in a sequential context: Fast search is enough! Balanced search tree, skip-list, hash table Free-list implementation in a multi-threaded context: atomic “search and split” is needed split = find_interval + delete [+ insert_intervals]  Suitable search data structure to adapt? Skip-list !

Contributions Introduction Background Mark-split algorithm Contribution Skip-list with extended functionality Parallelization of mark-split Evaluation Conclusion 11

Skip-list HeadTail 50% 25% … Layers of ordered lists with different densities, achieve tree-like behavior Search: O(log 2 N) – probabilistic! Lock-free skip-list [Sundell & Tsigas 2003] Extension needed to perform split 12

Extended functionality 13 Search data structure for parallelization of mark-split: To store and efficiently search for free memory chunks. Can perform a special, composite split operation. split = 1 delete + 2 inserts Atomic, CAS

Lazy Splitting Reduce the number of splitting interval operations Lowers the contention at the shared skip-list Observation: several adjacent objects are marked consecutively 10-60% total objects Degree of adjacency Mark-split  Mark-mark…-mark-split 14

Our parallelization of mark-split Our Parallel Mark-Split (PMS)- naturally achieved by: Using the extended skip-list to store free memory intervals for splitting. Integrating concurrent splitting into parallel marking. Implementation: New GC available for HotSpot – an industry-level Java Virtual Machine. Evaluation using DaCapo benchmarks on Intel and AMD systems. Compare with Lock-based PMS and Concurrent Mark- Sweep (CMS). 15

A comparison to Concurrent Mark-Sweep Concurrent Mark-Sweep Initial Mark Concurrent Mark Final Mark Sweep Parallel Mark-Split Initial Mark & Split Concurrent Mark & Split Final Mark & Split Translate the skip-list to free-list for allocation 16

Contributions Introduction Background Mark-split algorithm Contribution Skip-list with extended functionality Parallelization of mark-split Evaluation Conclusion 17

Evaluation setup DaCapo benchmarks Avrora, tomcat, sunflow, lusearch, xalan 2 Intel Nahalem x 6 HT cores 4 AMD Bulldozer x 12 cores Linux Ubuntu OpenJDK 7

Lazy Splitting Number of split_interval operations when using lazy splitting mechanism

Evaluation – Garbage collection time 20 GC Time (s) Stop-the-world scenario Good case Bad case

Evaluation – Benchmark time 21 Benchmark time (s) Concurrent GC Good case Bad case Characterizes applications that benefit from ParMarkSplit: High garbage to live ratio Live objects reside adjacent  being marked consecutively

Conclusion Parallel and Concurrent Mark-Split GC An extended lock-free skip-list to handle free-intervals Lazy splitting mechanism Implemented as a garbage collector in industrial-standard OpenJDK7 HotSpot Evaluation ParMarkSplit performs well with applications with certain characteristics