Immix: A Mark-Region Garbage Collector Curtis Dunham CS 395T Presentation Feb 2, 2011 Thanks to Steve Blackburn and Jennifer Sartor for their 2008 and.

Slides:



Advertisements
Similar presentations
Garbage collection David Walker CS 320. Where are we? Last time: A survey of common garbage collection techniques –Manual memory management –Reference.
Advertisements

Steve Blackburn Department of Computer Science Australian National University Perry Cheng TJ Watson Research Center IBM Research Kathryn McKinley Department.
Dynamic Memory Management
PZ10B Programming Language design and Implementation -4th Edition Copyright©Prentice Hall, PZ10B - Garbage collection Programming Language Design.
Lecture 10: Heap Management CS 540 GMU Spring 2009.
MC 2 : High Performance GC for Memory-Constrained Environments N. Sachindran, E. Moss, E. Berger Ivan JibajaCS 395T *Some of the graphs are from presentation.
Copyright, 1996 © Dale Carnegie & Associates, Inc. Mark-Sweep A tracing garbage collection technique Hagen Böhm November 21st, 2001
380C Where are we & where we are going – Managed languages Dynamic compilation Inlining Garbage collection What else can you do when you examine the heap.
CS 536 Spring Automatic Memory Management Lecture 24.
An Efficient Machine-Independent Procedure for Garbage Collection in Various List Structures, Schorr and Waite CACM August 1967, pp Curtis Dunham.
ParMarkSplit: A Parallel Mark- Split Garbage Collector Based on a Lock-Free Skip-List Nhan Nguyen Philippas Tsigas Håkan Sundell Distributed Computing.
Memory Management. History Run-time management of dynamic memory is a necessary activity for modern programming languages Lisp of the 1960’s was one of.
1 The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology.
CS 1114: Data Structures – memory allocation Prof. Graeme Bailey (notes modified from Noah Snavely, Spring 2009)
This presentation: Sasha GoldshteinCTO, Sela Group Garbage Collection Performance Tips.
© Richard Jones, Eric Jul, mmnet GC & MM Summer School, July A Rapid Introduction to Garbage Collection Richard Jones Computing Laboratory.
21 September 2005Rotor Capstone Workshop Parallel, Real-Time Garbage Collection Daniel Spoonhower Guy Blelloch, Robert Harper, David Swasey Carnegie Mellon.
UPortal Performance Optimization Faizan Ahmed Architect and Engineering Group Enterprise Systems & Services RUTGERS
File System Implementation CSCI 444/544 Operating Systems Fall 2008.
Connectivity-Based Garbage Collection Presenter Feng Xian Author Martin Hirzel, et.al Published in OOPSLA’2003.
Runtime The optimized program is ready to run … What sorts of facilities are available at runtime.
Garbage collection David Walker CS 320. Where are we? Last time: A survey of common garbage collection techniques –Manual memory management –Reference.
Memory Allocation and Garbage Collection. Why Dynamic Memory? We cannot know memory requirements in advance when the program is written. We cannot know.
1 An Efficient On-the-Fly Cycle Collection Harel Paz, Erez Petrank - Technion, Israel David F. Bacon, V. T. Rajan - IBM T.J. Watson Research Center Elliot.
Linked lists and memory allocation Prof. Noah Snavely CS1114
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
Garbage Collection Memory Management Garbage Collection –Language requirement –VM service –Performance issue in time and space.
Memory Management Last Update: July 31, 2014 Memory Management1.
Flexible Reference-Counting-Based Hardware Acceleration for Garbage Collection José A. Joao * Onur Mutlu ‡ Yale N. Patt * * HPS Research Group University.
David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center The Metronome: A Hard Real-time Garbage Collector.
Taking Off The Gloves With Reference Counting Immix
380C Lecture 17 Where are we & where we are going –Managed languages Dynamic compilation Inlining Garbage collection –Why you need to care about workloads.
Dynamic Memory Allocation Questions answered in this lecture: When is a stack appropriate? When is a heap? What are best-fit, first-fit, worst-fit, and.
Ulterior Reference Counting: Fast Garbage Collection without a Long Wait Author: Stephen M Blackburn Kathryn S McKinley Presenter: Jun Tao.
GC Algorithm inside.NET Luo Bingqiao 5/22/2009. Agenda 1. 经典基本垃圾回收算法 2.CLR 中垃圾回收算法介绍 3.SSCLI 中 Garbage Collection 源码分析.
Fast Conservative Garbage Collection Rifat Shahriyar Stephen M. Blackburn Australian National University Kathryn S. M cKinley Microsoft Research.
A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization David Bacon Perry Cheng (presenting) V.T. Rajan IBM T.J. Watson Research.
Copyright (c) 2004 Borys Bradel Myths and Realities: The Performance Impact of Garbage Collection Paper: Stephen M. Blackburn, Perry Cheng, and Kathryn.
How’s the Parallel Computing Revolution Going? 1How’s the Parallel Revolution Going?McKinley Kathryn S. McKinley The University of Texas at Austin.
Memory Management II: Dynamic Storage Allocation Mar 7, 2000 Topics Segregated free lists –Buddy system Garbage collection –Mark and Sweep –Copying –Reference.
Comparison of Compacting Algorithms for Garbage Collection Mrinal Deo CS395T – Spring
11/26/2015IT 3271 Memory Management (Ch 14) n Dynamic memory allocation Language systems provide an important hidden player: Runtime memory manager – Activation.
Heap storage & Garbage collection Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center ControllingFragmentation and Space Consumption in the Metronome.
A REAL-TIME GARBAGE COLLECTOR WITH LOW OVERHEAD AND CONSISTENT UTILIZATION David F. Bacon, Perry Cheng, and V.T. Rajan IBM T.J. Watson Research Center.
Memory Management -Memory allocation -Garbage collection.
Department of Computer Sciences Z-Rays: Divide Arrays and Conquer Speed and Flexibility Jennifer B. Sartor Stephen M. Blackburn,
Consider Starting with 160 k of memory do: Starting with 160 k of memory do: Allocate p1 (50 k) Allocate p1 (50 k) Allocate p2 (30 k) Allocate p2 (30 k)
Runtime The optimized program is ready to run … What sorts of facilities are available at runtime.
Introduction to Garbage Collection. Garbage Collection It automatically reclaims memory occupied by objects that are no longer in use It frees the programmer.
2/4/20161 GC16/3011 Functional Programming Lecture 20 Garbage Collection Techniques.
CS 241 Discussion Section (12/1/2011). Tradeoffs When do you: – Expand Increase total memory usage – Split Make smaller chunks (avoid internal fragmentation)
GC Assertions: Using the Garbage Collector To Check Heap Properties Samuel Z. Guyer Tufts University Edward Aftandilian Tufts University.
Memory Management CSCI 2720 Spring What is memory management? “the prudent utilization of this scarce resource (memory), whether by conservation,
Introduction to Garbage Collection. GC Fundamentals Algorithmic Components AllocationReclamation 2 Identification Bump Allocation Free List ` Tracing.
Immix: A Mark-Region Garbage Collector Jennifer Sartor CS395T Presentation Mar 2, 2009 Thanks to Steve for his Immix presentation from
Dynamic Compilation Vijay Janapa Reddi
Memory Management 6/20/ :27 PM
Rifat Shahriyar Stephen M. Blackburn Australian National University
David F. Bacon, Perry Cheng, and V.T. Rajan
Strategies for automatic memory management
Memory Management Kathryn McKinley.
List Processing in Real Time on a Serial Computer
Beltway: Getting Around Garbage Collection Gridlock
José A. Joao* Onur Mutlu‡ Yale N. Patt*
CS703 - Advanced Operating Systems
Garbage collection Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz Prentice Hall, 2001 Section
Reference Counting vs. Tracing
Presentation transcript:

Immix: A Mark-Region Garbage Collector Curtis Dunham CS 395T Presentation Feb 2, 2011 Thanks to Steve Blackburn and Jennifer Sartor for their 2008 and 2009 Immix presentations, respectively.1 I believe this presentation to be ~95% Steve’s PLDI talk and ~4% Jennifer Sartor’s 395T presentation and < 1% mine

Comparison to Prior Work; Contributions 2 Space efficiency Fast Collection Mutator Performance Copying Mark-Sweep Mark-Compact Status Quo Before This Work Status Quo Before This Work Post-Immix: The New World of GC Post-Immix: The New World of GC Immix: Mark-Region w/ Opportunistic Defragmentation Immix: Mark-Region w/ Opportunistic Defragmentation Bump Pointer Non-Semispace Does both (every object either marked or copied) In One Pass Does both (every object either marked or copied) In One Pass Locality

GC Fundamentals Algorithmic Components AllocationReclamation 3 Identification Bump Allocation Free List ` Tracing (implicit) Reference Counting (explicit) Sweep-to-Free Compact Evacuate 31

Mark-Compact [Styger 1967] Bump allocation + trace + compact GC Fundamentals Canonical Garbage Collectors Thanks to Steve for his Immix presentation from ` Sweep-to-Free Compact Evacuate Mark-Sweep [McCarthy 1960] Free-list + trace + sweep-to-free Semi-Space [Cheney 1970] Bump allocation + trace + evacuate

Sweep-To-Region and Mark-Region 5 ` Sweep-to-Free Compact Evacuate Reclamation Sweep-to-Region Mark-Sweep Free-list + trace + sweep-to-free Mark-Compact Bump allocation + trace + compact Semi-Space Bump allocation + trace + evacuate Mark-Region Bump alloc + trace + sweep-to-region

Naïve Mark-Region 6 Contiguous allocation into regions Excellent locality – For simplicity, objects cannot span regions Simple mark phase (like mark-sweep) – Mark objects and their containing region Unmarked regions can be freed

Heap Organization Blocks – analogous to Regions – Recyclable – Immix block = 32KB Lines – Objects can span lines – Immix line = 128B Opportunistic defragmentation – Candidate and target blocks – Single pass to mark and copy 7 Reusable for (more) allocation 256 per Block Move from mostly-empty to mostly-full Move from mostly-empty to mostly-full

Immix: Lines and Blocks 8 Small Regions Large Regions ✗ Fragmentation (can’t fill blocks) ✓ More contiguous allocation ✗ Fragmentation (false marking) Lines & Blocks N pagesapprox 1 cache line ✓ Less fragmentation  Objects span lines ✓ Fast common case  Lines marked with objects ✗ Increased metadata o/h ✗ Constrained object sizes  TLB locality, cache locality  Block > 4 X max object size Free Recyclable “In a mark-region collector, region size embodies the collector’s space-time tradeoff.” Recyclable

Allocation Policy (Recycling) 9 Recycle partially marked blocks first Minimize fragmentation Maximize sharing of freed blocks Recycle in address order – We explored other options Allocate into free blocks last Effect on locality and fragmentation?

Opportunistic Defragmentation 10 Identify source and target blocks – (see paper for heuristics) Evacuate objects in source blocks – Allocate into target blocks Opportunistic – Leave in place if no space, or object pinned Opportunistically evacuate fragmented blocks – Lightweight, uses same allocation mechanism – No cost in common case (specialized GC) Source = most holes Other heuristics?

Details Parallelizable – Coarse sweeping – Defragmentation Demand-driven overflow allocations – Medium objects Metadata space overheads – For parallel synch: mark bytes (not bits) – Line and block mark, not just object mark – Defragmentation headroom – Overflow allocation block – Conservative line marking 11

Other Optimizations 12 Implicit Marking ✓ Most objects small  Small objects implicitly mark next line ✓ V. Fast common case  Large objects mark lines exactly Implicit line mark Line mark Overflow Allocation  Multi-line objects may skip many small holes  Overflow allocation (used on failure) ✓ Large objects uncommon ✓ V. effective solution ✓ ✓

Mark-Region: Immix (Bump Allocation + Trace + Sweep-to-Region) 13 ✓ ✓ Simple, very fast collection ✓ ✓ Space efficient ✓ ✓ Good locality Actual data, taken from geomean of DaCapo, jvm98, and jbb2000 on 2.4GHz Core 2 Duo ✓ ✓ Excellent performance Excellent performance

Total Performance 14 Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo

Discussion Necessity of two-level hierarchy? Caching/Paging? – Efficacy of tuned line/block sizes: e.g. actual TLB miss reduction? Implicit Marking – advantages overcome possible fragmentation? Methodology and Results 15

Minimum Heap 16

Sticky Performance 17 Geomean of DaCapo, jvm98 and jbb2000 on 2.4GHz Core 2 Duo Benefits of Sticky?