0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center
1 Part 2: Trace (aka Mark) Initiation –Setup turn double barrier on Root Scan –Active Finalizer scan –Class scan –Thread scan** switch to single barrier, color to black –Debugger, JNI, Class Loader scan Trace –Trace* –Trace Terminate*** Re-materialization 1 –Weak/Soft/Phantom Reference List Transfer –Weak Reference clearing** (snapshot) Re-Trace 1 –Trace Master –(Trace*) –(Trace Terminate***) Re-materialization 2 –Finalizable Processing Clearing –Monitor Table clearing –JNI Weak Global clearing –Debugger Reference clearing –JVMTI Table clearing –Phantom Reference clearing Re-Trace 2 –Trace Master –(Trace*) –(Trace Terminate***) –Class Unloading Completion –Finalizer Wakeup –Class Unloading Flush –Clearable Compaction** –Book-keeping * Parallel ** Callback *** Single actor symmetric Flip –Move Available Lists to Full List* (contention) turn write barrier off –Flush Per-thread Allocation Pages** switch allocation color to white switch to temp full list Sweeping –Sweep* –Switch to regular Full List** –Move Temp Full List to regular Full List* (contention)
2 Let’s Assume a Stack Snapshot Stack r r p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b
3 Yuasa Algorithm Review: 2(a): Copy Over-written Pointers Stack p p r r T T a b X X a b U U a b Z Z a b W W a b Y Y a b
4 Yuasa Algorithm Review: 2(b): Trace Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b W W a b Z Z a b Y Y a b X X a b * Color is per-object mark bit
5 Yuasa Algorithm Review: 2(c): Allocate “Black” Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b s s V V a b W W a b Z Z a b Y Y a b X X a b
6 Non-monotonicity in Tracing 1 GB
7 Which Design Pattern is This? pool of work Shared Monotonic Work Pool Per-Thread State Update
8 Trace is Non-Monotonic… and requires thread-local data GC Worker Threads GC Master Thread Application Threads pool of non-monotonic work
9 Basic Solution Check if there are more work packets –If some found, trace is not done yet –If none found, “probably done” Pause all threads Re-scan for non-empty buffers Resume all threads If none, done Otherwise, try again later
10 Ragged Barriers: How to Stop without Stopping Epoch 1 A BC - Local Epoch - Global Min - Global Max Epoch 4 Epoch 3 Epoch 2
11 “Trace” Phase
12 The Thread’s Full Monty Thread 1 inVM color Thread 2 epoch 37 inVM writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list dblb epoch 43 BarrierOn true BarrierOn true color PhaseInfo phase workers PhaseInfo phase workers trace 3 color sweep Epoch agreed newest Epoch agreed newest 4335
13 Work Packet Data Structures wbuf fill wbuf fill wbuf trace wbuf trace totrace free wbuf.epoch < Epoch.agreed WBufEpoch 39 WBufCount
14 trace() { thread->epoch = Epoch.newest; bool canTerminate = true; if (WBufCount > 0) getWriteBuffers(); canTerminate = false; while (b = wbuf-trace.pop()) if (! moreTime()) return; int traceCount = traceBufferContents(b); canTerminate &= (traceCount == 0); while (b = totrace.pop()) if (! moreTime()) return; int TraceCount = traceBufferContents(b); canTerminate &= (traceCount == 0); if (canTerminate) traceTerminate(); }
15 Getting Write Buffer Roots getWriteBuffers() { thread->epoch = fetchAndAdd(Epoch.newest, 1); WBufEpoch = thread->epoch; // mutators will dump wbufs LOCK(wbuf-fill); LOCK(wbuf-trace); for each (wbuf in wbuf-fill) if (wbuf.epoch < Epoch.agreed) remove wbuf from wbuf-fill; add wbuf to wbuf-trace; UNLOCK(wbuf-trace); UNLOCK(wbuf-fill); }
16 Write Barrier writeBarrier(Object object, Field field, Object new) { if (BarrierOn) Object old = object[field]; if (old != null && ! old.marked) outOfLineBarrier(old); if (thread->dblb) // double barrier outOfLineBarrier(new); }
17 Write Barrier Slow Path outOfLineBarrier(Object obj) { if (obj == null || obj.marked) return; obj.marked = true; bool epochOK = thread->wbuf->epoch == WBufEpoch; bool haveRoom = thread->wbuf->data wbuf->end; if (! (epochOK && enoughSpace)) thread->wbuf = flushWBufAndAllocNew(thread->wbuf); // Updates WBufEpoch, Epoch.newest *thread->wbuf->data++ = obj; }
18 “Trace Terminate” Phase
19 Trace Termination wbuf fill wbuf fill wbuf trace wbuf trace totrace free WBufEpoch 39 WBufCount 0
20 Asynchronous Agreement BarrierOn true BarrierOn true PhaseInfo phase workers PhaseInfo phase workers trace 1 Epoch agreed newest Epoch agreed newest 4237 PhaseInfo phase workers PhaseInfo phase workers terminate 1 desiredEpoch = Epooch.newest; … WAIT FOR Epoch.agreed == desiredEpoch if (WBufCount == 0) DONE else RESUME TRACING
21 Ragged Barrier bool raggedBarrier(desiredEpoch, urgent) { if (Epoch.agreed >= desiredEpoch) return true; LOCK(threadlist); int latest = MAXINT; for each (Thread thread in threadlist) latest = min(latest, thread.epoch); Epoch.agreed = latest; UNLOCK(threadlist); if (epoch.agreed >= desiredEpoch) return true; else doCallbacks(RAGGED_BARRIER, true, urgent); return false; } * Non-locking implementation?
22 Part 1: Scan Roots Initiation –Setup turn double barrier on Root Scan –Active Finalizer scan –Class scan –Thread scan** switch to single barrier, color to black –Debugger, JNI, Class Loader scan Trace –Trace* –Trace Terminate*** Re-materialization 1 –Weak/Soft/Phantom Reference List Transfer –Weak Reference clearing** (snapshot) Re-Trace 1 –Trace Master –(Trace*) –(Trace Terminate***) Re-materialization 2 –Finalizable Processing Clearing –Monitor Table clearing –JNI Weak Global clearing –Debugger Reference clearing –JVMTI Table clearing –Phantom Reference clearing Re-Trace 2 –Trace Master –(Trace*) –(Trace Terminate***) –Class Unloading Completion –Finalizer Wakeup –Class Unloading Flush –Clearable Compaction** –Book-keeping * Parallel ** Callback *** Single actor symmetric Flip –Move Available Lists to Full List* (contention) turn write barrier off –Flush Per-thread Allocation Pages** switch allocation color to white switch to temp full list Sweeping –Sweep* –Switch to regular Full List** –Move Temp Full List to regular Full List* (contention)
23 Fuzzy Snapshot Finally, we assume no magic Initiate Collection
24 “Initiate Collection” Phase
25 Initiate: Color Black, Double Barrier Thread 1 inVM Thread 2 epoch 37 inVM writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list epoch 43 BarrierOn true BarrierOn true PhaseInfo phase workers PhaseInfo phase workers initiate 3 color sweep Epoch agreed newest Epoch agreed newest 4237 color dblb dblb dblb dblb
26 What is a Double Barrier? Store both Old and New Pointers Stack p p r r T T a b X X a b U U a b Z Z a b W W a b Y Y a b
27 Why Double Barrier? T1 Stack p p X X a b Z Z a b W W a b T2 Stack n n “Snapshot” = { V, W, X, Z } V V a b T3 Stack k k m m j j T2: m.b = n (writes X.b = W) T3: j.b = k (writes X.b = V) T1: q = p.b (reads X.b: V, W, or Z??) q q
28 Yuasa (Single) Barrier with 2 Writers T1 Stack p p X X a b Z Z a b W W a b T2 Stack n n V V a b T3 Stack k k m m j j T2: m.b = n (X.b = W) q q T2T3 T3: j.b = k (X.b = V)
29 Yuasa Barrier Lost Update T1 Stack p p X X a b Z Z a b W W a b T2 Stack n n V V a b T3 Stack k k m m j j T2: m.b = n (X.b = W) q q T2T3 T3: j.b = k (X.b = V)
30 Hosed! T1 Stack p p X X a b Z Z a b W W a b T2 Stack n n V V a b T3 Stack k k m m j j q q T2T3T1 T2: m.b = n (X.b = W) T3: j.b = k (X.b = V) T1: q = p.b (q <- W) T2: n = null T1: Scan Stack T2: Scan Stack
31 “Thread Stack Scan” Phase
32 Scan Stacks (double barrier off) Thread 1 inVM Thread 2 epoch 37 inVM writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list epoch 43 BarrierOn true BarrierOn true PhaseInfo phase workers PhaseInfo phase workers initiate 3 color sweep Epoch agreed newest Epoch agreed newest 4237 color dblb dblb dblb dblb Scan Stack 1 Scan Stack 2
33 All Done!
34 Boosting: Ensuring Progress GC Worker Threads GC Master Thread Application Threads (may do GC work) Boost to master priority Revert(?)
35 Part 4: Defragmentation Initiation –Setup turn double barrier on Root Scan –Active Finalizer scan –Class scan –Thread scan** switch to single barrier, color to black –Debugger, JNI, Class Loader scan Trace –Trace* –Trace Terminate*** Re-materialization 1 –Weak/Soft/Phantom Reference List Transfer –Weak Reference clearing** (snapshot) Re-Trace 1 –Trace Master –(Trace*) –(Trace Terminate***) Re-materialization 2 –Finalizable Processing Clearing –Monitor Table clearing –JNI Weak Global clearing –Debugger Reference clearing –JVMTI Table clearing –Phantom Reference clearing Re-Trace 2 –Trace Master –(Trace*) –(Trace Terminate***) –Class Unloading Completion –Finalizer Wakeup –Class Unloading Flush –Clearable Compaction** –Book-keeping * Parallel ** Callback *** Single actor symmetric Flip –Move Available Lists to Full List* (contention) turn write barrier off –Flush Per-thread Allocation Pages** switch allocation color to white switch to temp full list Sweeping –Sweep* –Switch to regular Full List** –Move Temp Full List to regular Full List* (contention) Defragmentation
36 free ? 16 free 16 free 64 free 64 free 256 free 256 alloc’d sweep
37 Two-way Communication GC Worker Threads GC Master Thread Application Threads (may do GC work) objects have movedpointers have changed
38 Defragmentation T T a b U U a b Z Z a b X X a b T’T’ T’T’ a b C C a b B B a b D D a b E E a b
39 Staccato Algorithm FREE T T a b NORMAL T T a b COPYING T T a b MOVED T T a b T’T’ T’T’ a b T’T’ T’T’ a b allocate nop defragment cas access (abort) cas commit cas access load access load reap store create store
40 Scheduling
41 Guaranteeing Real Time Guaranteeing usability without realtime: –Must know maximum live memory If fragmentation & metadata overhead bounded We also require: –Maximum allocation rate (MB/s) How does the user figure this out??? –Very simple programming style –Empirical measurement –(Research) Static analysis
42 Conclusions Systems are made of concurrent components Basic building blocks: –Locks –Try-locks –Compare-and-Swap –Non-locking stacks, lists, … –Monotonic phases –Logical clocks and asynchronous agreement Encapsulate so others won’t suffer!
43
44 Legend GC Phases APP TRACE SNAPSHOT FLIP SWEEP APP TERMINATE APP … … … … … … … …