0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center.


1 0 Parallel and Concurrent Real-time Garbage Collection Part III: Tracing, Snapshot, and Defragmentation David F. Bacon T.J. Watson Research Center

2 1 Part 2: Trace (aka Mark)
Initiation
  –Setup
  [turn double barrier on]
Root Scan
  –Active Finalizer scan
  –Class scan
  –Thread scan**
  [switch to single barrier, color to black]
  –Debugger, JNI, Class Loader scan
Trace
  –Trace*
  –Trace Terminate***
Re-materialization 1
  –Weak/Soft/Phantom Reference List Transfer
  –Weak Reference clearing** (snapshot)
Re-Trace 1
  –Trace Master
  –(Trace*)
  –(Trace Terminate***)
Re-materialization 2
  –Finalizable Processing
Clearing
  –Monitor Table clearing
  –JNI Weak Global clearing
  –Debugger Reference clearing
  –JVMTI Table clearing
  –Phantom Reference clearing
Re-Trace 2
  –Trace Master
  –(Trace*)
  –(Trace Terminate***)
  –Class Unloading
Completion
  –Finalizer Wakeup
  –Class Unloading Flush
  –Clearable Compaction**
  –Book-keeping
Flip
  –Move Available Lists to Full List* (contention)
  [turn write barrier off]
  –Flush Per-thread Allocation Pages**
  [switch allocation color to white]
  [switch to temp full list]
Sweeping
  –Sweep*
  –Switch to regular Full List**
  –Move Temp Full List to regular Full List* (contention)
Legend: * Parallel, ** Callback, *** Single actor symmetric

3 2 Let’s Assume a Stack Snapshot
[Diagram: stack slots r and p; heap objects T, X, U, Z, W, Y, each with pointer fields a and b]

4 3 Yuasa Algorithm Review: 2(a): Copy Over-written Pointers
[Diagram: stack slots p and r; heap objects T, X, U, Z, W, Y, each with pointer fields a and b]

5 4 Yuasa Algorithm Review: 2(b): Trace
[Diagram: stack slot p; heap objects T, X, U, Z, W, Y, each with pointer fields a and b; W, Z, Y, X shown a second time]
* Color is per-object mark bit

6 5 Yuasa Algorithm Review: 2(c): Allocate “Black”
[Diagram: stack slots p and s; heap objects T, X, U, Z, W, Y plus newly allocated V, each with pointer fields a and b; W, Z, Y, X shown a second time]
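
The three review steps can be tried out in a small, single-threaded Python sketch. This is only an illustration under stated assumptions: the class names `Obj` and `SnapshotCollector`, the object graph in `demo`, and the choice of which pointer the mutator overwrites are all hypothetical and are not taken from the slides or from the Metronome implementation.

```python
class Obj:
    """Heap object with pointer fields a and b plus a per-object mark bit."""
    def __init__(self, name, marked=False):
        self.name = name
        self.a = None
        self.b = None
        self.marked = marked


class SnapshotCollector:
    """Illustrative snapshot-at-the-beginning collector (hypothetical names)."""
    def __init__(self, roots):
        self.worklist = []
        for root in roots:          # take the stack snapshot
            self.shade(root)

    def shade(self, obj):
        """Mark obj and queue it for tracing if it is not marked yet."""
        if obj is not None and not obj.marked:
            obj.marked = True
            self.worklist.append(obj)

    def write(self, obj, field, new):
        """2(a): save the over-written pointer, then perform the store."""
        self.shade(getattr(obj, field))
        setattr(obj, field, new)

    def allocate(self, name):
        """2(c): objects created during collection start out marked."""
        return Obj(name, marked=True)

    def trace(self):
        """2(b): trace until the work list is empty."""
        while self.worklist:
            obj = self.worklist.pop()
            self.shade(obj.a)
            self.shade(obj.b)


def demo():
    # Assumed graph: T -> U, T -> X, X -> W, X -> Z; Y is garbage.
    T, U, W, X, Y, Z = (Obj(n) for n in "TUWXYZ")
    T.a, T.b = U, X
    X.a, X.b = W, Z
    heap = [T, U, W, X, Y, Z]
    gc = SnapshotCollector(roots=[T])
    gc.write(X, "b", None)          # mutator cuts X.b -> Z before X is traced
    heap.append(gc.allocate("V"))   # mutator allocates during collection
    gc.trace()
    return sorted(o.name for o in heap if o.marked)
```

In this sketch Z stays marked even though the mutator removed the only pointer to it, because it was reachable when the snapshot was taken; Y, which was never reachable, stays unmarked.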

7 6 Non-monotonicity in Tracing 1 GB

8 7 Which Design Pattern is This?
[Diagram: pool of work]
Shared Monotonic Work Pool
Per-Thread State Update

9 8 Trace is Non-Monotonic… and requires thread-local data
[Diagram: GC Worker Threads, GC Master Thread, and Application Threads around a pool of non-monotonic work]

10 9 Basic Solution
Check if there are more work packets
  –If some found, trace is not done yet
  –If none found, “probably done”
Pause all threads
Re-scan for non-empty buffers
Resume all threads
If none, done
Otherwise, try again later
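
The check above can be sketched as a single function. This is a minimal simulation, assuming a hypothetical `Mutator` class with a thread-local `buffer` and a `paused` flag; moving any non-empty buffer into the shared pool so that the next attempt can make progress is an assumption of this sketch, not something the slide specifies.

```python
from collections import deque


class Mutator:
    """Hypothetical application thread with a thread-local buffer."""
    def __init__(self):
        self.buffer = []
        self.paused = False


def try_terminate(pool, threads):
    """Return True if tracing is finished, False if it must continue."""
    if pool:
        return False                # work packets remain: not done yet
    # "Probably done": confirm while every thread is paused.
    for t in threads:
        t.paused = True
    try:
        found = False
        for t in threads:
            if t.buffer:            # re-scan for non-empty buffers
                pool.append(t.buffer)
                t.buffer = []
                found = True
    finally:
        for t in threads:           # always resume the threads
            t.paused = False
    return not found
```

A caller would invoke `try_terminate` again after draining the pool; the pause is what makes this basic scheme unattractive for real-time use.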

11 10 Ragged Barriers: How to Stop without Stopping
[Diagram: threads A, B, C advancing through Epoch 1, Epoch 2, Epoch 3, Epoch 4; legend: Local Epoch, Global Min, Global Max]
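
The relationship between local epochs, the global minimum, and the global maximum can be illustrated with a small sketch. The class `Epochs` and its method names are hypothetical; the sketch is single-threaded and omits the locking or atomic operations a real implementation would need.

```python
class Epochs:
    """Toy epoch bookkeeping for a ragged barrier (hypothetical names)."""
    def __init__(self, n_threads):
        self.local = [0] * n_threads    # each thread's local epoch
        self.newest = 0                 # global max

    def request(self):
        """Open a new epoch and return the value every thread must reach."""
        self.newest += 1
        return self.newest

    def checkpoint(self, tid):
        """Called by thread tid at a point of its own choosing."""
        self.local[tid] = self.newest

    def agreed(self):
        """Global min: the epoch that all threads have reached."""
        return min(self.local)
```

No thread is ever paused here: the requester simply waits until `agreed()` reaches the requested epoch, which means every thread has passed a checkpoint since the request was made.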

12 11 “Trace” Phase

13 12 The Thread’s Full Monty
[Diagram: thread list (next) with Thread 1 and Thread 2; each thread has entries for sizes 16, 64, 256 and fields inVM, color, epoch, writebuf, pthread (pthread_t), dblb; thread epochs 37 and 43 and write buffers 41 and 43 are shown. Global state: BarrierOn = true; PhaseInfo (phase = trace, workers = 3); color; sweep; Epoch (agreed, newest)]

14 13 Work Packet Data Structures
[Diagram: lists wbuf-fill, wbuf-trace, totrace, and free; buffers move from wbuf-fill to wbuf-trace when wbuf.epoch < Epoch.agreed; WBufEpoch = 39, WBufCount = 3]

15 14
trace() {
    thread->epoch = Epoch.newest;
    bool canTerminate = true;
    if (WBufCount > 0) {
        getWriteBuffers();
        canTerminate = false;
    }
    while (b = wbuf-trace.pop()) {
        if (! moreTime()) return;
        int traceCount = traceBufferContents(b);
        canTerminate &= (traceCount == 0);
    }
    while (b = totrace.pop()) {
        if (! moreTime()) return;
        int traceCount = traceBufferContents(b);
        canTerminate &= (traceCount == 0);
    }
    if (canTerminate) traceTerminate();
}

16 15 Getting Write Buffer Roots
getWriteBuffers() {
    thread->epoch = fetchAndAdd(Epoch.newest, 1);
    WBufEpoch = thread->epoch;  // mutators will dump wbufs
    LOCK(wbuf-fill);
    LOCK(wbuf-trace);
    for each (wbuf in wbuf-fill) {
        if (wbuf.epoch < Epoch.agreed) {
            remove wbuf from wbuf-fill;
            add wbuf to wbuf-trace;
        }
    }
    UNLOCK(wbuf-trace);
    UNLOCK(wbuf-fill);
}

17 16 Write Barrier
writeBarrier(Object object, Field field, Object new) {
    if (BarrierOn) {
        Object old = object[field];
        if (old != null && ! old.marked)
            outOfLineBarrier(old);
        if (thread->dblb)  // double barrier
            outOfLineBarrier(new);
    }
}

18 17 Write Barrier Slow Path
outOfLineBarrier(Object obj) {
    if (obj == null || obj.marked) return;
    obj.marked = true;
    bool epochOK = thread->wbuf->epoch == WBufEpoch;
    bool haveRoom = thread->wbuf->data < thread->wbuf->end;
    if (! (epochOK && haveRoom))
        thread->wbuf = flushWBufAndAllocNew(thread->wbuf);  // Updates WBufEpoch, Epoch.newest
    *thread->wbuf->data++ = obj;
}

19 18 “Trace Terminate” Phase

20 19 Trace Termination
[Diagram: lists wbuf-fill, wbuf-trace, totrace, and free; WBufEpoch = 39, WBufCount = 0]

21 20 Asynchronous Agreement
[Diagram: global state with BarrierOn = true; PhaseInfo changing from (phase = trace, workers = 1) to (phase = terminate, workers = 1); Epoch (agreed, newest), with the values 42 and 37 shown]
desiredEpoch = Epoch.newest;
…
WAIT FOR Epoch.agreed == desiredEpoch
if (WBufCount == 0) DONE
else RESUME TRACING
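
The decision made in the terminate phase can be written as one polling step. This is a sketch under assumptions: the function name `terminate_step` and its string results are hypothetical, the agreed epoch is computed as the minimum of the thread epochs, and the comparison uses `>=` where the slide writes `==`.

```python
def terminate_step(desired_epoch, thread_epochs, wbuf_count):
    """One poll of the terminate phase.

    Returns "WAIT" until every thread has reached desired_epoch, then
    "DONE" if no write buffers are pending, else "RESUME_TRACING".
    """
    agreed = min(thread_epochs)
    if agreed < desired_epoch:
        return "WAIT"
    if wbuf_count == 0:
        return "DONE"
    return "RESUME_TRACING"
```

Because the step only reads shared state and never blocks other threads, the terminating thread can poll it repeatedly while the application keeps running.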

22 21 Ragged Barrier
bool raggedBarrier(desiredEpoch, urgent) {
    if (Epoch.agreed >= desiredEpoch) return true;
    LOCK(threadlist);
    int latest = MAXINT;
    for each (Thread thread in threadlist)
        latest = min(latest, thread.epoch);
    Epoch.agreed = latest;
    UNLOCK(threadlist);
    if (Epoch.agreed >= desiredEpoch) {
        return true;
    } else {
        doCallbacks(RAGGED_BARRIER, true, urgent);
        return false;
    }
}
* Non-locking implementation?

23 22 Part 1: Scan Roots
[Phase outline repeated from the Part 2 slide: Initiation, Root Scan, Trace, Re-materialization 1, Re-Trace 1, Re-materialization 2, Clearing, Re-Trace 2, Completion, Flip, Sweeping]

24 23 Fuzzy Snapshot
Finally, we assume no magic
Initiate Collection

25 24 “Initiate Collection” Phase

26 25 Initiate: Color Black, Double Barrier
[Diagram: thread list (next) with Thread 1 and Thread 2; each thread has entries for sizes 16, 64, 256 and fields inVM, color, epoch, writebuf, pthread (pthread_t), dblb; thread epochs 37 and 43 and write buffers 41 and 43 are shown. Global state: BarrierOn = true; PhaseInfo (phase = initiate, workers = 3); color; sweep; Epoch (agreed, newest), with the values 42 and 37 shown]

27 26 What is a Double Barrier?
Store both Old and New Pointers
[Diagram: stack slots p and r; heap objects T, X, U, Z, W, Y, each with pointer fields a and b]

28 27 Why Double Barrier?
[Diagram: T1 stack with p and q; T2 stack with n and m; T3 stack with k and j; heap objects X, Z, W, V, each with pointer fields a and b]
“Snapshot” = { V, W, X, Z }
T2: m.b = n (writes X.b = W)
T3: j.b = k (writes X.b = V)
T1: q = p.b (reads X.b: V, W, or Z??)

29 28 Yuasa (Single) Barrier with 2 Writers
[Diagram: T1, T2, and T3 stacks and heap objects X, Z, W, V, with a timeline for T2 and T3]
T2: m.b = n (X.b = W)
T3: j.b = k (X.b = V)

30 29 Yuasa Barrier Lost Update
[Diagram: T1, T2, and T3 stacks and heap objects X, Z, W, V, with a timeline for T2 and T3]
T2: m.b = n (X.b = W)
T3: j.b = k (X.b = V)

31 30 Hosed!
[Diagram: T1, T2, and T3 stacks and heap objects X, Z, W, V, with a timeline for T2, T3, and T1]
T2: m.b = n (X.b = W)
T3: j.b = k (X.b = V)
T1: q = p.b (q <- W)
T2: n = null
T1: Scan Stack
T2: Scan Stack
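
The lost update can be replayed deterministically in Python. The interleaving below is one ordering that produces the failure and is an assumption of this sketch, since the timeline on the slides did not survive extraction: T1's stack is scanned first, both writers read the old value of X.b before either stores, and T1 loads W between the two stores. The helper names (`Obj`, `run`, `shade`, `scan`) are hypothetical.

```python
class Obj:
    """Heap object with pointer fields a and b plus a mark bit."""
    def __init__(self, name):
        self.name = name
        self.a = None
        self.b = None
        self.marked = False


def run(double_barrier):
    """Replay the assumed interleaving; return (name, marked) for T1's q."""
    X, Z, W, V = Obj("X"), Obj("Z"), Obj("W"), Obj("V")
    X.b = Z
    log = []                        # objects recorded for tracing

    def shade(obj):
        if obj is not None and not obj.marked:
            obj.marked = True
            log.append(obj)

    def scan(stack):
        for obj in stack.values():
            shade(obj)

    t1 = {"p": X, "q": None}
    t2 = {"m": X, "n": W}
    t3 = {"j": X, "k": V}

    scan(t1)                        # T1: Scan Stack (q is still empty)
    old2 = X.b                      # T2 barrier reads the old value: Z
    old3 = X.b                      # T3 barrier also reads Z, never W
    shade(old2)
    shade(old3)
    if double_barrier:
        shade(t2["n"])              # double barrier also records the new value
    X.b = t2["n"]                   # T2: m.b = n  (X.b = W)
    t1["q"] = X.b                   # T1: q = p.b  (q <- W)
    if double_barrier:
        shade(t3["k"])
    X.b = t3["k"]                   # T3: j.b = k  (X.b = V), W is overwritten
    t2["n"] = None                  # T2: n = null
    scan(t2)                        # T2: Scan Stack
    scan(t3)
    while log:                      # trace to completion
        obj = log.pop()
        shade(obj.a)
        shade(obj.b)
    live = t1["q"]
    return live.name, live.marked
```

With the single barrier, W ends up reachable from T1's stack but unmarked, so a sweep would free a live object; recording the new pointer as well closes that window in this interleaving.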

32 31 “Thread Stack Scan” Phase

33 32 Scan Stacks (double barrier off)
[Diagram: thread list (next) with Thread 1 and Thread 2; each thread has entries for sizes 16, 64, 256 and fields inVM, color, epoch, writebuf, pthread (pthread_t), dblb; thread epochs 37 and 43 and write buffers 41 and 43 are shown. Global state: BarrierOn = true; PhaseInfo (phase = initiate, workers = 3); color; sweep; Epoch (agreed, newest), with the values 42 and 37 shown. Steps: Scan Stack 1, Scan Stack 2]

34 33 All Done!

35 34 Boosting: Ensuring Progress
[Diagram: GC Worker Threads, GC Master Thread, Application Threads (may do GC work)]
Boost to master priority
Revert(?)

36 35 Part 4: Defragmentation
[Phase outline repeated from the Part 2 slide, with Defragmentation added as a final phase: Initiation, Root Scan, Trace, Re-materialization 1, Re-Trace 1, Re-materialization 2, Clearing, Re-Trace 2, Completion, Flip, Sweeping, Defragmentation]

37 36
[Diagram: page lists for sizes 16, 64, and 256, labeled free, alloc’d, and sweep, plus one free list of undetermined size (“?”)]

38 37 Two-way Communication
[Diagram: GC Worker Threads, GC Master Thread, Application Threads (may do GC work)]
objects have moved
pointers have changed

39 38 Defragmentation
[Diagram: heap objects T, U, Z, X, C, B, D, E, each with pointer fields a and b, and T’, the relocated copy of T]

40 39 Staccato Algorithm
[Diagram: object T in states FREE, NORMAL, COPYING, and MOVED, with its copy T’; transition and access labels, in the order they appear: allocate, nop, defragment, cas, access (abort), cas, commit, cas, access, load, access, load, reap, store, create, store]
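
The four states can be modeled as a small compare-and-swap state machine. The state names come from the slide; which label belongs to which transition is an assumption of this sketch, because the diagram's arrows were lost. The `Header` class and the function names are hypothetical, a lock stands in for a hardware CAS, and the sketch is meant to be stepped through single-threaded.

```python
import threading
from enum import Enum


class State(Enum):
    FREE = 0
    NORMAL = 1
    COPYING = 2
    MOVED = 3


class Header:
    """Object header: (state, forwarding target) updated only through cas()."""
    def __init__(self):
        self.word = (State.FREE, None)
        self._lock = threading.Lock()   # stands in for a hardware CAS

    def cas(self, expected, new, forward=None):
        with self._lock:
            if self.word[0] is expected:
                self.word = (new, forward)
                return True
            return False


def allocate(h):
    return h.cas(State.FREE, State.NORMAL)


def begin_copy(h):
    """Collector: start defragmenting this object."""
    return h.cas(State.NORMAL, State.COPYING)


def commit_copy(h, replica):
    """Collector: publish the copy; fails if a mutator aborted it."""
    return h.cas(State.COPYING, State.MOVED, forward=replica)


def mutator_access(h):
    """Mutator: abort an in-progress copy, or follow a committed move."""
    if h.word[0] is State.COPYING:
        h.cas(State.COPYING, State.NORMAL)   # access (abort)
    state, forward = h.word
    return forward if state is State.MOVED else h


def reap(h):
    """Collector: reclaim the original once it has been moved."""
    return h.cas(State.MOVED, State.FREE)
```

The point of the design, as far as this sketch captures it, is that neither side blocks: a mutator that touches an object during copying wins the race by aborting the copy, and the collector simply retries later.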

41 40 Scheduling

42 41 Guaranteeing Real Time
Guaranteeing usability without realtime:
  –Must know maximum live memory
     If fragmentation & metadata overhead bounded
We also require:
  –Maximum allocation rate (MB/s)
How does the user figure this out???
  –Very simple programming style
  –Empirical measurement
  –(Research) Static analysis

43 42 Conclusions
Systems are made of concurrent components
Basic building blocks:
  –Locks
  –Try-locks
  –Compare-and-Swap
  –Non-locking stacks, lists, …
  –Monotonic phases
  –Logical clocks and asynchronous agreement
Encapsulate so others won’t suffer!

44 43
http://www.research.ibm.com/metronome
https://sourceforge.net/projects/tuningforkvp

45 44 Legend
GC Phases: APP, TRACE, SNAPSHOT, FLIP, SWEEP, TERMINATE

