0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.

Slides:



Advertisements
Similar presentations
Time-based Transactional Memory with Scalable Time Bases Torvald Riegel, Christof Fetzer, Pascal Felber Presented By: Michael Gendelman.
Advertisements

CAFÉ: Scalable Task Pool with Adjustable Fairness and Contention Dmitry Basin, Rui Fan, Idit Keidar, Ofer Kiselov, Dmitri Perelman Technion, Israel Institute.
Tutorial 8 March 9, 2012 TA: Europa Shang
Garbage collection David Walker CS 320. Where are we? Last time: A survey of common garbage collection techniques –Manual memory management –Reference.
CS140 Review Session Project 4 – File Systems Samir Selman 02/27/09cs140 Review Session Due Thursday March,11.
A Block-structured Heap Simplifies Parallel GC Simon Marlow (Microsoft Research) Roshan James (U. Indiana) Tim Harris (Microsoft Research) Simon Peyton.
1 Write Barrier Elision for Concurrent Garbage Collectors Martin T. Vechev Cambridge University David F. Bacon IBM T.J.Watson Research Center.
Lecture 12 Reduce Miss Penalty and Hit Time
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Garbage Collection What is garbage and how can we deal with it?
Portable, mostly-concurrent, mostly-copying GC for multi-processors Tony Hosking Secure Software Systems Lab Purdue University.
Reducing Pause Time of Conservative Collectors Toshio Endo (National Institute of Informatics) Kenjiro Taura (Univ. of Tokyo)
MC 2 : High Performance GC for Memory-Constrained Environments - Narendran Sachindran, J. Eliot B. Moss, Emery D. Berger Sowmiya Chocka Narayanan.
An On-the-Fly Mark and Sweep Garbage Collector Based on Sliding Views Hezi Azatchi - IBM Yossi Levanoni - Microsoft Harel Paz – Technion Erez Petrank –
Locality-Conscious Lock-Free Linked Lists Anastasia Braginsky & Erez Petrank 1.
CSE506: Operating Systems Block Cache. CSE506: Operating Systems Address Space Abstraction Given a file, which physical pages store its data? Each file.
By Jacob SeligmannSteffen Grarup Presented By Leon Gendler Incremental Mature Garbage Collection Using the Train Algorithm.
MC 2 : High Performance GC for Memory-Constrained Environments N. Sachindran, E. Moss, E. Berger Ivan JibajaCS 395T *Some of the graphs are from presentation.
CPSC 388 – Compiler Design and Construction
0 Parallel and Concurrent Real-time Garbage Collection Part II: Memory Allocation and Sweeping David F. Bacon T.J. Watson Research Center.
CS 536 Spring Automatic Memory Management Lecture 24.
1 The Compressor: Concurrent, Incremental and Parallel Compaction. Haim Kermany and Erez Petrank Technion – Israel Institute of Technology.
MOSTLY PARALLEL GARBAGE COLLECTION Authors : Hans J. Boehm Alan J. Demers Scott Shenker XEROX PARC Presented by:REVITAL SHABTAI.
0 Parallel and Concurrent Real-time Garbage Collection Part I: Overview and Memory Allocation Subsystem David F. Bacon T.J. Watson Research Center.
Correctness-Preserving Derivation of Concurrent Garbage Collection Algorithms Martin T. Vechev Eran Yahav David F. Bacon University of Cambridge IBM T.J.
Incremental Garbage Collection
1 An Efficient On-the-Fly Cycle Collection Harel Paz, Erez Petrank - Technion, Israel David F. Bacon, V. T. Rajan - IBM T.J. Watson Research Center Elliot.
1 COMP 206: Computer Architecture and Implementation Montek Singh Wed, Nov 9, 2005 Topic: Caches (contd.)
Damien Doligez Georges Gonthier POPL 1994 Presented by Eran Yahav Portable, Unobtrusive Garbage Collection for Multiprocessor Systems.
Uniprocessor Garbage Collection Techniques Paul R. Wilson.
Garbage Collection and High-Level Languages Programming Languages Fall 2003.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
A Parallel, Real-Time Garbage Collector Author: Perry Cheng, Guy E. Blelloch Presenter: Jun Tao.
1 Overview Assignment 6: hints  Living with a garbage collector Assignment 5: solution  Garbage collection.
SEG Advanced Software Design and Reengineering TOPIC L Garbage Collection Algorithms.
CLR: Garbage Collection Inside Out
David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center The Metronome: A Hard Real-time Garbage Collector.
Fast Conservative Garbage Collection Rifat Shahriyar Stephen M. Blackburn Australian National University Kathryn S. M cKinley Microsoft Research.
Chapter 4. INTERNAL REPRESENTATION OF FILES
A Mostly Non-Copying Real-Time Collector with Low Overhead and Consistent Utilization David Bacon Perry Cheng (presenting) V.T. Rajan IBM T.J. Watson Research.
Storage Management - Chap 10 MANAGING A STORAGE HIERARCHY on-chip --> main memory --> 750ps - 8ns ns. 128kb - 16mb 2gb -1 tb. RATIO 1 10 hard disk.
Incremental Garbage Collection Uwe Kern 23. Januar 2002
Chapter 4 Memory Management Virtual Memory.
DOUBLE INSTANCE LOCKING A concurrency pattern with Lock-Free read operations Pedro Ramalhete Andreia Correia November 2013.
Java Thread and Memory Model
Garbage Collection and Memory Management CS 480/680 – Comparative Languages.
Concurrent Garbage Collection Presented by Roman Kecher GC Seminar, Tel-Aviv University 23-Dec-141.
Wait-Free Multi-Word Compare- And-Swap using Greedy Helping and Grabbing Håkan Sundell PDPTA 2009.
UniProcessor Garbage Collection Techniques Paul R. Wilson University of Texas Presented By Naomi Sapir Tel-Aviv University.
David F. Bacon Perry Cheng V.T. Rajan IBM T.J. Watson Research Center ControllingFragmentation and Space Consumption in the Metronome.
A REAL-TIME GARBAGE COLLECTOR WITH LOW OVERHEAD AND CONSISTENT UTILIZATION David F. Bacon, Perry Cheng, and V.T. Rajan IBM T.J. Watson Research Center.
© Janice Regan, CMPT 300, May CMPT 300 Introduction to Operating Systems Operating Systems Processes and Threads.
Concurrent Mark-Sweep Presented by Eyal Dushkin GC Seminar, Tel-Aviv University
® July 21, 2004GC Summer School1 Cycles to Recycle: Copy GC Without Stopping the World The Sapphire Collector Richard L. Hudson J. Eliot B. Moss Originally.
The Metronome Washington University in St. Louis Tobias Mann October 2003.
Scalable lock-free Stack Algorithm Wael Yehia York University February 8, 2010.
GC Assertions: Using the Garbage Collector To Check Heap Properties Samuel Z. Guyer Tufts University Edward Aftandilian Tufts University.
Memory Hierarchy— Five Ways to Reduce Miss Penalty.
.NET Memory Primer Martin Kulov. "Out of CPU, memory and disk, memory is typically the most important for overall system performance." Mark Russinovich.
GARBAGE COLLECTION Student: Jack Chang. Introduction Manual memory management Memory bugs Automatic memory management We know... A program can only use.
Garbage Collection What is garbage and how can we deal with it?
Copyright ©: Nahrstedt, Angrave, Abdelzaher
Hazard Pointers C++ Memory Ordering Issues
David F. Bacon, Perry Cheng, and V.T. Rajan
Strategies for automatic memory management
Chapter 12 Memory Management
Internal Representation of Files
Continuously and Compacting By Lyndon Meadow
Garbage Collection What is garbage and how can we deal with it?
Presentation transcript:

0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center

1 Part 2: Trace (aka Mark) Initiation –Setup turn double barrier on Root Scan –Active Finalizer scan –Class scan –Thread scan** switch to single barrier, color to black –Debugger, JNI, Class Loader scan Trace –Trace* –Trace Terminate*** Re-materialization 1 –Weak/Soft/Phantom Reference List Transfer –Weak Reference clearing** (snapshot) Re-Trace 1 –Trace Master –(Trace*) –(Trace Terminate***) Re-materialization 2 –Finalizable Processing Clearing –Monitor Table clearing –JNI Weak Global clearing –Debugger Reference clearing –JVMTI Table clearing –Phantom Reference clearing Re-Trace 2 –Trace Master –(Trace*) –(Trace Terminate***) –Class Unloading Completion –Finalizer Wakeup –Class Unloading Flush –Clearable Compaction** –Book-keeping * Parallel ** Callback *** Single actor symmetric Flip –Move Available Lists to Full List* (contention) turn write barrier off –Flush Per-thread Allocation Pages** switch allocation color to white switch to temp full list Sweeping –Sweep* –Switch to regular Full List** –Move Temp Full List to regular Full List* (contention)

2 Let’s Assume a Stack Snapshot Stack r r p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b

3 Yuasa Algorithm Review: 2(a): Copy Over-written Pointers Stack p p r r T T a b X X a b U U a b Z Z a b W W a b Y Y a b

4 Yuasa Algorithm Review: 2(b): Trace Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b W W a b Z Z a b Y Y a b X X a b * Color is per-object mark bit

5 Yuasa Algorithm Review: 2(c): Allocate “Black” Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b s s V V a b W W a b Z Z a b Y Y a b X X a b

6 Non-monotonicity in Tracing 1 GB

7 Which Design Pattern is This? pool of work Shared Monotonic Work Pool Per-Thread State Update 

8 Trace is Non-Monotonic… and requires thread-local data GC Worker Threads GC Master Thread Application Threads pool of non-monotonic work

9 Basic Solution Check if there are more work packets –If some found, trace is not done yet –If none found, “probably done” Pause all threads Re-scan for non-empty buffers Resume all threads If none, done Otherwise, try again later

10 Ragged Barriers: How to Stop without Stopping Epoch 1 A BC - Local Epoch - Global Min - Global Max Epoch 4 Epoch 3 Epoch 2

11 “Trace” Phase

12 The Thread’s Full Monty Thread 1 inVM  color Thread 2 epoch 37 inVM  writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list dblb  epoch 43 BarrierOn true BarrierOn true color PhaseInfo phase workers PhaseInfo phase workers trace 3 color sweep  Epoch agreed newest Epoch agreed newest 4335

13 Work Packet Data Structures wbuf fill wbuf fill wbuf trace wbuf trace totrace free wbuf.epoch < Epoch.agreed WBufEpoch 39 WBufCount

14 trace() { thread->epoch = Epoch.newest; bool canTerminate = true; if (WBufCount > 0) getWriteBuffers(); canTerminate = false; while (b = wbuf-trace.pop()) if (! moreTime()) return; int traceCount = traceBufferContents(b); canTerminate &= (traceCount == 0); while (b = totrace.pop()) if (! moreTime()) return; int TraceCount = traceBufferContents(b); canTerminate &= (traceCount == 0); if (canTerminate) traceTerminate(); }

15 Getting Write Buffer Roots getWriteBuffers() { thread->epoch = fetchAndAdd(Epoch.newest, 1); WBufEpoch = thread->epoch; // mutators will dump wbufs LOCK(wbuf-fill); LOCK(wbuf-trace); for each (wbuf in wbuf-fill) if (wbuf.epoch < Epoch.agreed) remove wbuf from wbuf-fill; add wbuf to wbuf-trace; UNLOCK(wbuf-trace); UNLOCK(wbuf-fill); }

16 Write Barrier writeBarrier(Object object, Field field, Object new) { if (BarrierOn) Object old = object[field]; if (old != null && ! old.marked) outOfLineBarrier(old); if (thread->dblb) // double barrier outOfLineBarrier(new); }

17 Write Barrier Slow Path outOfLineBarrier(Object obj) { if (obj == null || obj.marked) return; obj.marked = true; bool epochOK = thread->wbuf->epoch == WBufEpoch; bool haveRoom = thread->wbuf->data wbuf->end; if (! (epochOK && enoughSpace)) thread->wbuf = flushWBufAndAllocNew(thread->wbuf); // Updates WBufEpoch, Epoch.newest *thread->wbuf->data++ = obj; }

18 “Trace Terminate” Phase

19 Trace Termination wbuf fill wbuf fill wbuf trace wbuf trace totrace free WBufEpoch 39 WBufCount 0

20 Asynchronous Agreement BarrierOn true BarrierOn true PhaseInfo phase workers PhaseInfo phase workers trace 1 Epoch agreed newest Epoch agreed newest 4237 PhaseInfo phase workers PhaseInfo phase workers terminate 1 desiredEpoch = Epooch.newest; … WAIT FOR Epoch.agreed == desiredEpoch if (WBufCount == 0) DONE else RESUME TRACING

21 Ragged Barrier bool raggedBarrier(desiredEpoch, urgent) { if (Epoch.agreed >= desiredEpoch) return true; LOCK(threadlist); int latest = MAXINT; for each (Thread thread in threadlist) latest = min(latest, thread.epoch); Epoch.agreed = latest; UNLOCK(threadlist); if (epoch.agreed >= desiredEpoch) return true; else doCallbacks(RAGGED_BARRIER, true, urgent); return false; } * Non-locking implementation?

22 Part 1: Scan Roots Initiation –Setup turn double barrier on Root Scan –Active Finalizer scan –Class scan –Thread scan** switch to single barrier, color to black –Debugger, JNI, Class Loader scan Trace –Trace* –Trace Terminate*** Re-materialization 1 –Weak/Soft/Phantom Reference List Transfer –Weak Reference clearing** (snapshot) Re-Trace 1 –Trace Master –(Trace*) –(Trace Terminate***) Re-materialization 2 –Finalizable Processing Clearing –Monitor Table clearing –JNI Weak Global clearing –Debugger Reference clearing –JVMTI Table clearing –Phantom Reference clearing Re-Trace 2 –Trace Master –(Trace*) –(Trace Terminate***) –Class Unloading Completion –Finalizer Wakeup –Class Unloading Flush –Clearable Compaction** –Book-keeping * Parallel ** Callback *** Single actor symmetric Flip –Move Available Lists to Full List* (contention) turn write barrier off –Flush Per-thread Allocation Pages** switch allocation color to white switch to temp full list Sweeping –Sweep* –Switch to regular Full List** –Move Temp Full List to regular Full List* (contention)

23 Fuzzy Snapshot Finally, we assume no magic Initiate Collection

24 “Initiate Collection” Phase

25 Initiate: Color Black, Double Barrier Thread 1 inVM  Thread 2 epoch 37 inVM  writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list epoch 43 BarrierOn true BarrierOn true PhaseInfo phase workers PhaseInfo phase workers initiate 3 color sweep  Epoch agreed newest Epoch agreed newest 4237 color dblb  dblb  dblb  dblb 

26 What is a Double Barrier? Store both Old and New Pointers Stack p p r r T T a b X X a b U U a b Z Z a b W W a b Y Y a b

27 Why Double Barrier? T1 Stack p p X X a b Z Z a b W W a b T2 Stack n n “Snapshot” = { V, W, X, Z } V V a b T3 Stack k k m m j j T2: m.b = n (writes X.b = W) T3: j.b = k (writes X.b = V) T1: q = p.b (reads X.b: V, W, or Z??) q q

28 Yuasa (Single) Barrier with 2 Writers T1 Stack p p X X a b Z Z a b W W a b T2 Stack n n V V a b T3 Stack k k m m j j T2: m.b = n (X.b = W) q q T2T3 T3: j.b = k (X.b = V)

29 Yuasa Barrier Lost Update T1 Stack p p X X a b Z Z a b W W a b T2 Stack n n V V a b T3 Stack k k m m j j T2: m.b = n (X.b = W) q q T2T3 T3: j.b = k (X.b = V)

30 Hosed! T1 Stack p p X X a b Z Z a b W W a b T2 Stack n n V V a b T3 Stack k k m m j j q q T2T3T1 T2: m.b = n (X.b = W) T3: j.b = k (X.b = V) T1: q = p.b (q <- W) T2: n = null T1: Scan Stack T2: Scan Stack

31 “Thread Stack Scan” Phase

32 Scan Stacks (double barrier off) Thread 1 inVM  Thread 2 epoch 37 inVM  writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list epoch 43 BarrierOn true BarrierOn true PhaseInfo phase workers PhaseInfo phase workers initiate 3 color sweep  Epoch agreed newest Epoch agreed newest 4237 color dblb  dblb  dblb  dblb  Scan Stack 1 Scan Stack 2

33 All Done!

34 Boosting: Ensuring Progress GC Worker Threads GC Master Thread Application Threads (may do GC work) Boost to master priority Revert(?)

35 Part 4: Defragmentation Initiation –Setup turn double barrier on Root Scan –Active Finalizer scan –Class scan –Thread scan** switch to single barrier, color to black –Debugger, JNI, Class Loader scan Trace –Trace* –Trace Terminate*** Re-materialization 1 –Weak/Soft/Phantom Reference List Transfer –Weak Reference clearing** (snapshot) Re-Trace 1 –Trace Master –(Trace*) –(Trace Terminate***) Re-materialization 2 –Finalizable Processing Clearing –Monitor Table clearing –JNI Weak Global clearing –Debugger Reference clearing –JVMTI Table clearing –Phantom Reference clearing Re-Trace 2 –Trace Master –(Trace*) –(Trace Terminate***) –Class Unloading Completion –Finalizer Wakeup –Class Unloading Flush –Clearable Compaction** –Book-keeping * Parallel ** Callback *** Single actor symmetric Flip –Move Available Lists to Full List* (contention) turn write barrier off –Flush Per-thread Allocation Pages** switch allocation color to white switch to temp full list Sweeping –Sweep* –Switch to regular Full List** –Move Temp Full List to regular Full List* (contention) Defragmentation

36 free ? 16 free 16 free 64 free 64 free 256 free 256 alloc’d sweep

37 Two-way Communication GC Worker Threads GC Master Thread Application Threads (may do GC work) objects have movedpointers have changed

38 Defragmentation T T a b U U a b Z Z a b X X a b T’T’ T’T’ a b C C a b B B a b D D a b E E a b

39 Staccato Algorithm FREE T T a b NORMAL T T a b COPYING T T a b MOVED T T a b  T’T’ T’T’ a b T’T’ T’T’ a b allocate nop defragment cas access (abort) cas commit cas access load access load reap store create store

40 Scheduling

41 Guaranteeing Real Time Guaranteeing usability without realtime: –Must know maximum live memory If fragmentation & metadata overhead bounded We also require: –Maximum allocation rate (MB/s) How does the user figure this out??? –Very simple programming style –Empirical measurement –(Research) Static analysis

42 Conclusions Systems are made of concurrent components Basic building blocks: –Locks –Try-locks –Compare-and-Swap –Non-locking stacks, lists, … –Monotonic phases –Logical clocks and asynchronous agreement Encapsulate so others won’t suffer!

43

44 Legend GC Phases APP TRACE SNAPSHOT FLIP SWEEP APP TERMINATE APP … … … … … … … …