MC 2 : High Performance GC for Memory-Constrained Environments N. Sachindran, E. Moss, E. Berger Ivan JibajaCS 395T *Some of the graphs are from presentation.

Slides:



Advertisements
Similar presentations
An Implementation of Mostly- Copying GC on Ruby VM Tomoharu Ugawa The University of Electro-Communications, Japan.
Advertisements

U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science 1 MC 2 –Copying GC for Memory Constrained Environments Narendran Sachindran J. Eliot.
Steve Blackburn Department of Computer Science Australian National University Perry Cheng TJ Watson Research Center IBM Research Kathryn McKinley Department.
1 Write Barrier Elision for Concurrent Garbage Collectors Martin T. Vechev Cambridge University David F. Bacon IBM T.J.Watson Research Center.
1 Overview Assignment 5: hints  Garbage collection Assignment 4: solution.
Reducing Pause Time of Conservative Collectors Toshio Endo (National Institute of Informatics) Kenjiro Taura (Univ. of Tokyo)
MC 2 : High Performance GC for Memory-Constrained Environments - Narendran Sachindran, J. Eliot B. Moss, Emery D. Berger Sowmiya Chocka Narayanan.
Garbage Collection  records not reachable  reclaim to allow reuse  performed by runtime system (support programs linked with the compiled code) (support.
An On-the-Fly Mark and Sweep Garbage Collector Based on Sliding Views Hezi Azatchi - IBM Yossi Levanoni - Microsoft Harel Paz – Technion Erez Petrank –
1 Overview Assignment 7: hints  Distributed objects Assignment 6: solution  Living with a garbage collector.
By Jacob SeligmannSteffen Grarup Presented By Leon Gendler Incremental Mature Garbage Collection Using the Train Algorithm.
Efficient Concurrent Mark-Sweep Cycle Collection Daniel Frampton, Stephen Blackburn, Luke Quinane and John Zigman (Pending submission) Presented by Jose.
ParMarkSplit: A Parallel Mark- Split Garbage Collector Based on a Lock-Free Skip-List Nhan Nguyen Philippas Tsigas Håkan Sundell Distributed Computing.
OOPSLA 2003 Mostly Concurrent Garbage Collection Revisited Katherine Barabash - IBM Haifa Research Lab. Israel Yoav Ossia - IBM Haifa Research Lab. Israel.
1 The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology.
U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science CRAMM: Virtual Memory Support for Garbage-Collected Applications Ting Yang, Emery.
MOSTLY PARALLEL GARBAGE COLLECTION Authors : Hans J. Boehm Alan J. Demers Scott Shenker XEROX PARC Presented by:REVITAL SHABTAI.
21 September 2005Rotor Capstone Workshop Parallel, Real-Time Garbage Collection Daniel Spoonhower Guy Blelloch, Robert Harper, David Swasey Carnegie Mellon.
Runtime The optimized program is ready to run … What sorts of facilities are available at runtime.
U NIVERSITY OF M ASSACHUSETTS Department of Computer Science Automatic Heap Sizing Ting Yang, Matthew Hertz Emery Berger, Eliot Moss University of Massachusetts.
Age-Oriented Concurrent Garbage Collection Harel Paz, Erez Petrank – Technion, Israel Steve Blackburn – ANU, Australia April 05 Compiler Construction Scotland.
U NIVERSITY OF M ASSACHUSETTS A MHERST Department of Computer Science Garbage Collection Without Paging Matthew Hertz, Yi Feng, Emery Berger University.
1 Reducing Generational Copy Reserve Overhead with Fallback Compaction Phil McGachey and Antony L. Hosking June 2006.
Uniprocessor Garbage Collection Techniques Paul R. Wilson.
G1 TUNING Shubham Modi( ) Ujjwal Kumar Singh(10772) Vaibhav(10780)
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
Garbage Collection Memory Management Garbage Collection –Language requirement –VM service –Performance issue in time and space.
A Parallel, Real-Time Garbage Collector Author: Perry Cheng, Guy E. Blelloch Presenter: Jun Tao.
1 Overview Assignment 6: hints  Living with a garbage collector Assignment 5: solution  Garbage collection.
SEG Advanced Software Design and Reengineering TOPIC L Garbage Collection Algorithms.
File System. NET+OS 6 File System Architecture Design Goals File System Layer Design Storage Services Layer Design RAM Services Layer Design Flash Services.
Flexible Reference-Counting-Based Hardware Acceleration for Garbage Collection José A. Joao * Onur Mutlu ‡ Yale N. Patt * * HPS Research Group University.
David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center The Metronome: A Hard Real-time Garbage Collector.
Lecture 21 Last lecture Today’s lecture Cache Memory Virtual memory
380C Lecture 17 Where are we & where we are going –Managed languages Dynamic compilation Inlining Garbage collection –Why you need to care about workloads.
Ulterior Reference Counting: Fast Garbage Collection without a Long Wait Author: Stephen M Blackburn Kathryn S McKinley Presenter: Jun Tao.
Fast Conservative Garbage Collection Rifat Shahriyar Stephen M. Blackburn Australian National University Kathryn S. M cKinley Microsoft Research.
A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization David Bacon Perry Cheng (presenting) V.T. Rajan IBM T.J. Watson Research.
Message Analysis-Guided Allocation and Low-Pause Incremental Garbage Collection in a Concurrent Language Konstantinos Sagonas Jesper Wilhelmsson Uppsala.
U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science 1 Automatic Heap Sizing: Taking Real Memory into Account Ting Yang, Emery Berger,
Computer Science Department Daniel Frampton, David F. Bacon, Perry Cheng, and David Grove Australian National University Canberra ACT, Australia
September 11, 2003 Beltway: Getting Around GC Gridlock Steve Blackburn, Kathryn McKinley Richard Jones, Eliot Moss Modified by: Weiming Zhao Oct
380C lecture 19 Where are we & where we are going –Managed languages Dynamic compilation Inlining Garbage collection –Opportunity to improve data locality.
Fast Garbage Collection without a Long Wait Steve Blackburn – Kathryn McKinley Presented by: Na Meng Ulterior Reference Counting:
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center ControllingFragmentation and Space Consumption in the Metronome.
A REAL-TIME GARBAGE COLLECTOR WITH LOW OVERHEAD AND CONSISTENT UTILIZATION David F. Bacon, Perry Cheng, and V.T. Rajan IBM T.J. Watson Research Center.
Lecture 10 Page 1 CS 111 Summer 2013 File Systems Control Structures A file is a named collection of information Primary roles of file system: – To store.
Runtime The optimized program is ready to run … What sorts of facilities are available at runtime.
2/4/20161 GC16/3011 Functional Programming Lecture 20 Garbage Collection Techniques.
® July 21, 2004GC Summer School1 Cycles to Recycle: Copy GC Without Stopping the World The Sapphire Collector Richard L. Hudson J. Eliot B. Moss Originally.
The Metronome Washington University in St. Louis Tobias Mann October 2003.
Simple Generational GC Andrew W. Appel (Practice and Experience, February 1989) Rudy Kaplan Depena CS 395T: Memory Management February 9, 2009.
GC Assertions: Using the Garbage Collector To Check Heap Properties Samuel Z. Guyer Tufts University Edward Aftandilian Tufts University.
An Efficient, Incremental, Automatic Garbage Collector P. Deutsch and D. Bobrow Ivan JibajaCS 395T.
Silberschatz, Galvin and Gagne ©2011 Operating System Concepts Essentials – 8 th Edition Chapter 9: Virtual Memory.
Eliminating External Fragmentation in a Non-Moving Garbage Collector for Java Author: Fridtjof Siebert, CASES 2000 Michael Sallas Object-Oriented Languages.
Immix: A Mark-Region Garbage Collector Jennifer Sartor CS395T Presentation Mar 2, 2009 Thanks to Steve for his Immix presentation from
Dynamic Compilation Vijay Janapa Reddi
Rifat Shahriyar Stephen M. Blackburn Australian National University
Chapter 11: File System Implementation
Ulterior Reference Counting Fast GC Without The Wait
David F. Bacon, Perry Cheng, and V.T. Rajan
Computer Architecture
Strategies for automatic memory management
Memory Management Kathryn McKinley.
José A. Joao* Onur Mutlu‡ Yale N. Patt*
Chapter 14: File-System Implementation
Reference Counting vs. Tracing
Presentation transcript:

MC 2 : High Performance GC for Memory-Constrained Environments N. Sachindran, E. Moss, E. Berger Ivan JibajaCS 395T *Some of the graphs are from presentation by S. Narayanan

Motivation Handheld Devices Cellular phones, PDAs widely used Diverse applications Media players, Video Games, Digital cameras, GPS Ideally: High throughput Short response time Better memory utilization (Constrained memory)

Review: Generational Collection Divide heap into regions called generations Generations segregate objects by age Focus GC effort on younger objects Nursery Old Generation

Generational Copying Collection Old Generation A B C Root D ED E Nursery Remset

A B C Root D E A D B E Old GenerationNursery Generational Copying Collection

Mark Copy (MC) Extends Generational Copying Collector Old generation: A number of equal size subregions called windows Start MC when we have 1 full window left of space

MC – Mark Phase

MC-Copy Phase

Why not Mark-Copy? Maps and unmaps pages in windows during copying phase (virtual memory) Always copies all live data in the old gen. Long pauses are possible from marking and copying

MC 2 Free list of windows Does not use object addresses for relative location. Instead, indirect addressing Decouples address from logical window number No need to map and unmap evacuated windows No need to copy data out of every window. A new logical number can be assigned to it Two phases: Mark and Copy

MC 2 – Mark Phase Logically order old gen. windows Three mark phase tasks: Mark reachable objects Calculate live data volume in each window Build per-window remembered sets If occupancy exceeds threshold (80%): start Interleave marking with nursery allocation Reduces mark phase pauses

MC 2 Example – Mark Phase Old Generation Nursery L1L1 L2L2 L3L3 L4L4 Root L max Root Nursery Unreachable Reachable Marked Copied MC 2 keeps a work queue: objects marked but not yet scanned

Old Generation Nursery L1L1 L2L2 L3L3 L4L4 Root L max Root Nursery Unreachable Reachable Marked Copied MC 2 Example – Mark Phase

Old Generation Nursery L1L1 L2L2 L3L3 L4L4 Root L max Root Nursery Unreachable Reachable Marked Copied MC 2 Example – Mark Phase Root

Old Generation Nursery L1L1 L2L2 L3L3 L4L4 Root L max Root Nursery Unreachable Reachable Marked Copied MC 2 Example – Mark Phase Root

Incremental Marking Error

Handling marking error Track mutations using write barrier Record modified old generation objects Scan these modified objects at GC time Snapshot at the beginning Incremental Update

MC 2 – Copy Phase Copy and compact reachable data Performed in small increments One windowful of live data copied per increment One increment per nursery collection High-occupancy windows copied logically

MC 2 Example – Copy Phase Old Generation Nursery L max L2L2 L3L3 L4L4 Root L max Root Copy a window worth of data during nursery collection Nursery Unreachable Reachable Marked Copied

Old Generation Nursery L max L2L2 L3L3 L4L4 Root L max Root Nursery Unreachable Reachable Marked Copied Copy a window worth of data during nursery collection MC 2 Example – Copy Phase

Old Generation Nursery L2L2 L3L3 L4L4 Root L max Root Nursery Unreachable Reachable Marked Copied Copy a window worth of data during nursery collection MC 2 Example – Copy Phase

Old Generation Nursery L2L2 L3L3 L4L4 Root L max Root Nursery Unreachable Reachable Marked Copied Copy a window worth of data during nursery collection MC 2 Example – Copy Phase

Old Generation Nursery L max Root L max Root Nursery Unreachable Reachable Marked Copied Copy a window worth of data during nursery collection MC 2 Example – Copy Phase

MC 2 Details Large remembered sets Bounding space overhead Popular objects Preventing long pauses

Handling large remembered sets Normal remembered set for W1 stores 5 pointers (20 bytes on a 32 bit machine) W1W2

W1 W2 Card 1 Card 2 Card table requires only 2 bytes (one per card) 11 W0 W1 W2 Handling large remembered sets

Set a limit on the total remembered set size (e.g., 5% of total heap space). Replace large remembered sets with card table when total size approaches limit Trade offs Marking bit is more expensive than inserting into sequential buffer Scanning card table when marking also more expensive

Handling popular objects When converting SBB -> Card Table Keep count of pointers to each object If > 100, move them to especial window Normal Window Group Popular Window Normal Window Group

Handling popular objects Isolate popular object at high end of heap Do not need to maintain references to the objects Copying a popular object can cause a long pause, but does not recur

Experimental Results Implemented in Jikes RVM (2.2.3)/MMTk Pentium GHz, 512MB memory, RedHat Linux Benchmarks: SPECjvm98, pseudojbb Collectors evaluated Generational Mark-Sweep (MS) Generational Mark-(Sweep)-Compact (MSC) MC2

Results Summary MC2: suitable for handheld devices with software real time requirements Low space overhead (50-80%) Good throughput (3-4% slower than non- incremental compacting collector) Short pause times (17-41ms, factor of 6 lower than non-incremental compacting collector) Well distributed pauses

Discussion Compare to IMMIX: windows vs blocks and lines CMP: Incremental Marking Incremental Copying We lost objects ordering: Is there a way we could have the benefits of MC 2 without having to look at the roots and remsets for each window? What about the effects of paging? Can we hint the VMM to NOT evict certain windows or what/when to bring back certain windows

Execution Time relative to MC 2 (Heap Size = 1.8x max. live size)

Max. Pause Time relative to MC 2 (Heap Size = 1.8x max. live size) MC 2 max. pause range: 17-41ms