0 Parallel and Concurrent Real-time Garbage Collection Part II: Memory Allocation and Sweeping David F. Bacon T.J. Watson Research Center.


1 0 Parallel and Concurrent Real-time Garbage Collection Part II: Memory Allocation and Sweeping David F. Bacon T.J. Watson Research Center

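2 1 Page Data Synchronization, Take 1 [Diagram: shared page lists labeled 16, 64, 256, and free]

3 2 Page Data, Take 2 [Diagram: shared page lists labeled 16, 64, 256, and free; Thread 1 and Thread 2 each hold their own 16, 64, and 256 lists]

4 3 [Diagram: page lists labeled free, 16 free, 64 free, 256 free, 16, 64, 256, and temp full]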

5 4 Compare-and-Swap*
    boolean CAS(int* addr, int old, int new) {
        atomic {
            if (*addr == old) {
                *addr = new;
                return true;
            } else {
                return false;
            }
        }
    }
* IBM 370 since 1970, Intel 80486 since 1989. Load-Reserved: ARM 6 since 1991, IBM POWER2 since 1993
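For readers who want to try the primitive on a JVM, the sketch below uses `AtomicInteger.compareAndSet` from `java.util.concurrent.atomic`, which has the same succeed-or-fail contract as the CAS on this slide. The class name `CasDemo` and the helper `incrementWithCas` are illustrative names, not something from the talk.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: CasDemo and incrementWithCas are hypothetical names.
class CasDemo {
    // Classic retry loop: re-read the value until our CAS wins.
    static int incrementWithCas(AtomicInteger cell) {
        while (true) {
            int old = cell.get();
            if (cell.compareAndSet(old, old + 1)) {
                return old + 1;
            }
            // Another thread changed the value in between; try again.
        }
    }

    public static void main(String[] args) {
        AtomicInteger cell = new AtomicInteger(5);
        System.out.println(cell.compareAndSet(5, 7)); // expected value matches, so the store happens
        System.out.println(cell.compareAndSet(5, 9)); // expected value is stale, so nothing is stored
        System.out.println(incrementWithCas(cell));
    }
}
```

The retry loop is the pattern the stack slides build on: read, compute the new value, and publish it only if nothing changed in between.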

6 5 Non-locking* Stack
    push(Page page) {
        Page t;
        do {
            t = top;
            page->next = t;
        } while (! CAS(&top, t, page));
    }
    Page pop() {
        Page t, n;
        do {
            t = top;
            if (t == null) return null;
            n = t->next;
        } while (! CAS(&top, t, n));
        return t;
    }

7 6 [Diagram: four numbered stack states over pages A, B, and C. (1) Thread 1 starts x=pop(), holding t and n. (2) Thread 2 runs a=pop() and b=pop(). (3) Thread 2 runs push(A). (4) Thread 1's pop() finishes.]
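The interleaving in this diagram can be replayed deterministically in a single thread. The sketch below is an assumption-laden illustration in Java: `AbaDemo`, `Page`, and `replay` are hypothetical names, the stack uses `AtomicReference.compareAndSet` in place of the slide's CAS, and the two "threads" are simulated by running their steps by hand in the order shown.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; replays the four steps from the diagram by hand.
class AbaDemo {
    static final class Page {
        final String name;
        Page next;
        Page(String name) { this.name = name; }
    }

    static final AtomicReference<Page> top = new AtomicReference<>();

    static void push(Page page) {
        Page t;
        do {
            t = top.get();
            page.next = t;
        } while (!top.compareAndSet(t, page));
    }

    static Page pop() {
        Page t, n;
        do {
            t = top.get();
            if (t == null) return null;
            n = t.next;
        } while (!top.compareAndSet(t, n));
        return t;
    }

    // Returns "<did the stale CAS succeed>:<name of the page now on top>".
    static String replay() {
        top.set(null);
        Page a = new Page("A");
        push(new Page("C"));
        push(new Page("B"));
        push(a);                       // stack: A -> B -> C

        // Step 1: "Thread 1" begins pop(), reads t and n, then stalls before its CAS.
        Page t = top.get();            // A
        Page n = t.next;               // B

        // Steps 2 and 3: "Thread 2" pops A and B, then pushes A back.
        Page poppedA = pop();
        pop();                         // B now belongs to thread 2
        push(poppedA);                 // stack: A -> C

        // Step 4: thread 1 resumes. top is A again, so the CAS cannot tell anything changed.
        boolean swapped = top.compareAndSet(t, n);
        return swapped + ":" + top.get().name;
    }
}
```

In this replay the stale CAS succeeds and installs B as the top of the stack even though B was already handed out, which is the corruption the diagram illustrates. Note that in Java the problem only shows up when nodes are recycled, as pages are here.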

8 7 Correct Non-locking Stack
    push(PageId page) {
        do {
            = top;
            page->next = t;
        } while (! CAS(&top,, ));
    }
    PageId pop() {
        PageId t = 0;
        do {
            = top;
            if (t==0) return 0;
        } while (! CAS(&top,, ));
        return t;
    }
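One common way to make such a stack safe against ABA is to pair the top-of-stack value with a version counter that every successful update bumps, so a stale CAS fails even when the same page is back on top. The switch from Page to PageId on this slide is consistent with packing an id and a version into one word, but the slide text does not confirm it, so treat the following as an assumption. The sketch uses Java's `AtomicStampedReference`; `StampedStack` and `staleCasSucceeds` are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

// Hypothetical sketch: a version-stamped stack, not the talk's implementation.
class StampedStack {
    static final class Page {
        final String name;
        Page next;
        Page(String name) { this.name = name; }
    }

    // Reference plus integer stamp, compared and updated together.
    private final AtomicStampedReference<Page> top = new AtomicStampedReference<>(null, 0);

    void push(Page page) {
        int[] ver = new int[1];
        Page t;
        do {
            t = top.get(ver);          // reads the reference and its stamp together
            page.next = t;
        } while (!top.compareAndSet(t, page, ver[0], ver[0] + 1));
    }

    Page pop() {
        int[] ver = new int[1];
        Page t, n;
        do {
            t = top.get(ver);
            if (t == null) return null;
            n = t.next;
        } while (!top.compareAndSet(t, n, ver[0], ver[0] + 1));
        return t;
    }

    // Replays the ABA interleaving by hand and reports whether the stale CAS went through.
    static boolean staleCasSucceeds() {
        StampedStack s = new StampedStack();
        Page a = new Page("A");
        s.push(new Page("C"));
        s.push(new Page("B"));
        s.push(a);
        int[] ver = new int[1];
        Page t = s.top.get(ver);       // "Thread 1" reads top = A and its version
        Page n = t.next;               // B
        Page popped = s.pop();         // "Thread 2": a = pop()
        s.pop();                       //             b = pop()
        s.push(popped);                //             push(A); top is A again, version differs
        return s.top.compareAndSet(t, n, ver[0], ver[0] + 1);
    }
}
```

`AtomicStampedReference` keeps the pair in a small heap object rather than a single machine word, so it demonstrates the idea rather than the memory layout a collector would use.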

9 8 Really Correct Non-locking Stack
    push(Page page) {
        Page t;
        do {
            t = LOAD_AND_RESERVE(&top);
            page->next = t;
        } while (! STORE_CONDITIONAL(&top, page));
    }
    Page pop() {
        Page t = null;
        do {
            t = LOAD_AND_RESERVE(&top);
            if (t==null) return null;
        } while (! STORE_CONDITIONAL(&top, t->next));
        return t;
    }

10 9 Coalescing
Ideal:
–getMultiplePages(n) returns best fit
–Does not cause mutator activities to block
–Does not cause collection activities to block
–Does not suffer pathology under contention
Reality: tradeoffs between
–Quality of fit
–Probability of blocking
–Overhead (number of CAS’s)

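11 10 Cartesian Tree + Try-lock
Invariants for every node x:
x.left.size <= x.size >= x.right.size
x.left.address < x.address < x.right.address
[Diagram: tree whose nodes are labeled 44, 45, 46, 47, 33, 34, 35, 62, 51, 50, 11, 25, 24, 22, 21, 63, 59; one node is marked ?]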
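A Cartesian tree keeps binary-search-tree order on one key and heap order on another. The sketch below assumes the address is the search key and the block size is the heap key, with the largest block at the root, and it guards the whole tree with a single try-lock so a caller never blocks. The names (`CartesianTreeSketch`, `tryInsert`, `valid`) are hypothetical, and the treap-style rotations are one standard way to maintain both orders, not necessarily what the talk's allocator does.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a Cartesian tree of free blocks behind a try-lock.
class CartesianTreeSketch {
    static final class Node {
        final int address, size;
        Node left, right;
        Node(int address, int size) { this.address = address; this.size = size; }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private Node root;

    // Returns false immediately if another thread holds the tree, instead of blocking.
    boolean tryInsert(int address, int size) {
        if (!lock.tryLock()) return false;
        try {
            root = insert(root, new Node(address, size));
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Insert by address, then rotate the new node up while it is larger than its parent.
    private static Node insert(Node x, Node n) {
        if (x == null) return n;
        if (n.address < x.address) {
            x.left = insert(x.left, n);
            if (x.left.size > x.size) x = rotateRight(x);
        } else {
            x.right = insert(x.right, n);
            if (x.right.size > x.size) x = rotateLeft(x);
        }
        return x;
    }

    private static Node rotateRight(Node x) {
        Node l = x.left;
        x.left = l.right;
        l.right = x;
        return l;
    }

    private static Node rotateLeft(Node x) {
        Node r = x.right;
        x.right = r.left;
        r.left = x;
        return r;
    }

    // Checks address order within (lo, hi) and that no child is larger than its parent.
    static boolean valid(Node x, int lo, int hi) {
        if (x == null) return true;
        if (x.address <= lo || x.address >= hi) return false;
        if (x.left != null && x.left.size > x.size) return false;
        if (x.right != null && x.right.size > x.size) return false;
        return valid(x.left, lo, x.address) && valid(x.right, x.address, hi);
    }

    Node root() { return root; }
}
```

A caller that gets false back from `tryInsert` can retry later or fall back to another path; that is the non-blocking property the try-lock buys, at the cost of sometimes giving up a better fit.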

12 11 Non-locking Rebalancing Tree
Homework… but I won’t use it. Solution would
–Be too complex to be reliable
–Require too many CAS operations to be fast

13 12 The Fix: Transactional Memory
    AtomicCartesianTree extends CartesianTree {
        atomic Block getBlock() { return super.getBlock(); }
        atomic void releaseBlock(Block b) { super.releaseBlock(b); }
        atomic Block[] getMultipleBlocks(int n) { return super.getMultipleBlocks(n); }
    }

14 13 Hierarchical Non-locking Buddy? [Diagram: tree whose nodes are labeled 44, 45, 46, 47, 33, 34, 35, 62, 51, 50, 11, 25, 24, 22, 21, 63, 59]

15 14 So much for allocation… Garbage Collection

16 15 Handling the Concurrency [Diagram: roots r and p and heap objects T, U, W, X, Y, Z, each with fields a and b. GC Threads: Trace, Collect, Format. Application (“mutator”) Threads: Load pointer, Store pointer, Allocate.]

17 16 Yuasa Snapshot Algorithm (1990)
Logically
–Mark-and-Sweep Style Algorithm
–Take a “copy-on-write” heap snapshot
–Collect the garbage in that snapshot
Physically
–Stop all threads
–Copy their stacks (“roots”)
–Force them to save over-written pointers
–Trace roots and over-written pointers
* Dijkstra ’75, Steele ’76
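The "save over-written pointers" step is usually implemented as a write barrier that records the old value of a field before the store replaces it. The sketch below is a single-threaded illustration under stated assumptions: `SnapshotSketch`, `storeA`, `trace`, and `yIsMarked` are hypothetical names, only field a gets a barrier to keep it short, and the interleaving of mutator and collector is simulated by calling them in sequence.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of a snapshot-at-the-beginning write barrier.
class SnapshotSketch {
    static final class Obj {
        final String name;
        Obj a, b;
        boolean marked;
        Obj(String name) { this.name = name; }
    }

    static boolean barrierOn = false;
    static final List<Obj> writeBuffer = new ArrayList<>();

    // Mutator store to field a: while the barrier is on, save the pointer being overwritten.
    static void storeA(Obj holder, Obj value) {
        if (barrierOn && holder.a != null) writeBuffer.add(holder.a);
        holder.a = value;
    }

    // Marks every object reachable from the given starting objects.
    static void trace(List<Obj> starts) {
        Deque<Obj> work = new ArrayDeque<>(starts);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (o.marked) continue;
            o.marked = true;
            if (o.a != null) work.push(o.a);
            if (o.b != null) work.push(o.b);
        }
    }

    // Heap: root -> T -> X -> Y. The mutator cuts X -> Y after the snapshot is taken.
    static boolean yIsMarked(boolean useBarrier) {
        writeBuffer.clear();
        Obj t = new Obj("T"), x = new Obj("X"), y = new Obj("Y");
        t.a = x;
        x.a = y;
        List<Obj> rootSnapshot = new ArrayList<>();
        rootSnapshot.add(t);           // copy of the stack roots at snapshot time
        barrierOn = useBarrier;
        storeA(x, null);               // overwrites the only heap pointer to Y
        trace(rootSnapshot);           // trace the copied roots...
        trace(writeBuffer);            // ...and the saved over-written pointers
        barrierOn = false;
        return y.marked;
    }
}
```

With the barrier on, Y is marked because it was reachable when the snapshot was taken; with it off, the tracer never sees Y. Retaining everything live at snapshot time is what makes the logical "copy-on-write snapshot" description accurate.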

18 17 1: Take Logical Snapshot [Diagram: stack with roots r and p; heap objects T, U, W, X, Y, Z, each with fields a and b]

19 18 2(a): Copy Over-written Pointers [Diagram: stack with roots p and r; heap objects T, U, W, X, Y, Z, each with fields a and b]

20 19 2(b): Trace [Diagram: stack with root p; heap objects T, U, W, X, Y, Z, each with fields a and b; W, X, Y, and Z are drawn twice] * Color is per-object mark bit

21 20 2(c): Allocate “Black” [Diagram: stack with roots p and s; heap objects T, U, W, X, Y, Z plus new object V; W, X, Y, and Z are drawn twice]

22 21 3(a): Sweep Garbage [Diagram: stack with roots p and s; heap objects T, U, W, X, Y, Z, and V; one region labeled free]

23 22 3(b): Allocate “White” [Diagram: stack with roots p, s, and t; heap objects T, U, W, X, Y, Z, V, and new object V2; one region labeled free]

24 23 4: Clear Marks [Diagram: stack with roots p, s, and t; heap objects T, U, W, X, Y, Z, V, and V2; one region labeled free]

25 24 Yuasa Algorithm Phases: Snapshot stack and global roots, Trace, Flip, Sweep, Flip, Clear Marks (* Synchronous)

26 25 Metronome-2 Concurrency [Diagram: GC Worker Threads, GC Master Thread, and Application Threads (may do GC work)]

27 26 Metronome-2 Phases
Legend: * Parallel, ** Callback, *** Single actor symmetric
Initiation
–Setup [turn double barrier on]
Root Scan
–Active Finalizer scan
–Class scan
–Thread scan** [switch to single barrier, color to black]
–Debugger, JNI, Class Loader scan
Trace
–Trace*
–Trace Terminate***
Re-materialization 1
–Weak/Soft/Phantom Reference List Transfer
–Weak Reference clearing** (snapshot)
Re-Trace 1
–Trace Master
–(Trace*)
–(Trace Terminate***)
Re-materialization 2
–Finalizable Processing
Clearing
–Monitor Table clearing
–JNI Weak Global clearing
–Debugger Reference clearing
–JVMTI Table clearing
–Phantom Reference clearing
Re-Trace 2
–Trace Master
–(Trace*)
–(Trace Terminate***)
–Class Unloading
Completion
–Finalizer Wakeup
–Class Unloading Flush
–Clearable Compaction**
–Book-keeping
Flip
–Move Available Lists to Full List* (contention) [turn write barrier off]
–Flush Per-thread Allocation Pages** [switch allocation color to white; switch to temp full list]
Sweeping
–Sweep*
–Switch to regular Full List**
–Move Temp Full List to regular Full List* (contention)

28 27 Part 3: Sweep [Repeats the Metronome-2 phase list from the previous slide]

29 28 Assume Trace Has Finished [Diagram: stack with roots p and s; heap objects T, U, W, X, Y, Z, and V, each with fields a and b; W, X, Y, and Z are drawn twice]

30 29 JVM Application Thread Structure [Diagram: a thread list linking Thread 1 and Thread 2 through next pointers. Each thread record shows per-thread 16, 64, and 256 page lists and the fields inVM, color, epoch, writebuf, pthread (pthread_t), dblb, sweep, and BarrierOn. Values shown include epoch 37 and 43, writebuf 43 and 41, and BarrierOn true and false.]

31 30 “Move Available” Phase

32 31 [Diagram: page lists labeled free, 16 free, 64 free, 256 free, temp full, and available, with 16, 64, and 256 pages]

33 32 [Diagram: the same page lists as the previous slide, annotated “contention?”]

34 33 Generic Structure of Move Phase: Shared Monotonic Work Pool [Diagram: GC Worker Threads, GC Master Thread, and Application Threads (may do GC work), with a shared pool of work]

35 34 Performing a GC Work Quantum
    every (500us) tryToDoSomeWorkForGC();
    tryToDoSomeWorkForGC() {
        short phase = enterPhase();
        if (phase > 0) doQuantum(phase);
        leavePhase();
    }

36 35 Phase Entry
    short enterPhase() {
        thread->epoch = Epoch.newest;
        while (true) {
            <phase, workers> = PhaseInfo;
            if (workers == phaseTable[phase].maxWorkers) return -1;
            if (CAS(PhaseInfo, <phase, workers>, <phase, workers+1>)) return phase;
        }
    }
[Diagram: PhaseInfo word holding phase = moveavail and workers = 3. Annotation: ABA?]

37 36 Phase Exit
    short leavePhase() {
        while (true) {
            <phase, workers> = PhaseInfo;
            if (workers > 1 || moreWorkForPhase(phase)) newPhase = ;
            else if (phase == TRACE_PHASE) newPhase = ;
            else newPhase = ;
            if (CAS(PhaseInfo, <phase, workers>, newPhase)) return newPhase;
        }
    }
[Diagram: PhaseInfo word holding phase = moveavail and workers = 3]
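Phase entry and exit both hinge on updating the phase number and the worker count with one CAS. The sketch below shows one way to do that by packing both into a single int. The values assigned to newPhase are not recoverable from the slide text, so the exit policy here (decrement the count, and let the last worker advance to the next phase when no work remains) is an assumption, and the TRACE_PHASE special case is omitted. `PhaseInfoSketch`, `pack`, and `MAX_WORKERS` are hypothetical names, and the `moreWork` parameter stands in for the slide's moreWorkForPhase(phase).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: phase and worker count packed into one CAS-able word.
class PhaseInfoSketch {
    // Assumed per-phase worker limits, indexed by phase number; phase 0 is unused.
    static final int[] MAX_WORKERS = {0, 2, 2, 2};

    // Upper 16 bits hold the phase, lower 16 bits hold the worker count.
    static final AtomicInteger phaseInfo = new AtomicInteger(pack(1, 0));

    static int pack(int phase, int workers) { return (phase << 16) | workers; }
    static int phaseOf(int info) { return info >>> 16; }
    static int workersOf(int info) { return info & 0xFFFF; }

    // Joins the current phase, or returns -1 if it already has its maximum number of workers.
    static int enterPhase() {
        while (true) {
            int info = phaseInfo.get();
            int phase = phaseOf(info);
            int workers = workersOf(info);
            if (workers == MAX_WORKERS[phase]) return -1;
            if (phaseInfo.compareAndSet(info, pack(phase, workers + 1))) return phase;
        }
    }

    // Must only be called after a successful enterPhase(). Returns the phase now in effect.
    static int leavePhase(boolean moreWork) {
        while (true) {
            int info = phaseInfo.get();
            int phase = phaseOf(info);
            int workers = workersOf(info);
            int next = (workers > 1 || moreWork)
                    ? pack(phase, workers - 1)   // others remain, or work remains: just leave
                    : pack(phase + 1, 0);        // last worker out, nothing left: advance
            if (phaseInfo.compareAndSet(info, next)) return phaseOf(next);
        }
    }
}
```

Because the phase number only moves forward in this sketch, a stale CAS cannot match a recycled phase value, but the worker count can return to an earlier value within a phase, which is one reading of the "ABA?" annotation on the entry slide.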

38 37 “Per-Thread Flush” Phase

39 38 Flush Per-Thread Pages [Diagram: thread list linking Thread 1 and Thread 2, each with 16, 64, and 256 page lists and the fields inVM, color, epoch, writebuf, pthread (pthread_t), dblb, sweep, and BarrierOn; PhaseInfo holds phase = flush and workers = 1]

40 39 Put Full White Pages on temp-full List [Diagram: page lists labeled free, 16 free, 64 free, 256 free, temp full, and available, with 16, 64, and 256 pages]

41 40 How is this Phase Different? [Diagram: GC Worker Threads, GC Master Thread, and Application Threads (may do GC work); labels: pool of work, Per-Thread State Update]

42 41 Callback Mechanism
    bool doCallbacks(int callbackId) {
        bool alldone = true;
        LOCK(threadlist);
        for each (Thread thread in threadlist) {
            if (hasCallbackWork(thread, callbackId)) {
                if (! thread.inVM && TRY_LOCK(thread)) {
                    doCallbackWorkForThread(thread, callbackId);
                    UNLOCK(thread);
                } else {
                    postCallback(thread, callbackId);
                    alldone = false;
                }
            }
        }
        UNLOCK(threadlist);
        return alldone;
    }
* Non-locking list with cursor?
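The same control flow can be written against Java's `ReentrantLock`, whose `tryLock()` returns immediately instead of blocking. This is a minimal sketch under assumptions: `CallbackSketch`, `VmThread`, and its `hasWork` and `posted` flags are hypothetical stand-ins for the slide's hasCallbackWork, doCallbackWorkForThread, and postCallback, and the "work" is reduced to clearing a flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the callback loop; not the JVM's actual data structures.
class CallbackSketch {
    static final class VmThread {
        final ReentrantLock lock = new ReentrantLock();
        volatile boolean inVM;     // true while the thread is running VM code
        boolean hasWork = true;    // stands in for hasCallbackWork(thread, callbackId)
        boolean posted;            // stands in for postCallback(thread, callbackId)
    }

    static final ReentrantLock threadListLock = new ReentrantLock();
    static final List<VmThread> threadList = new ArrayList<>();

    // Returns true once no thread has outstanding callback work.
    static boolean doCallbacks() {
        boolean allDone = true;
        threadListLock.lock();
        try {
            for (VmThread thread : threadList) {
                if (!thread.hasWork) continue;
                if (!thread.inVM && thread.lock.tryLock()) {
                    try {
                        // Thread is outside the VM: do the work on its behalf.
                        thread.hasWork = false;
                    } finally {
                        thread.lock.unlock();
                    }
                } else {
                    // Thread is busy in the VM: ask it to do the work itself.
                    thread.posted = true;
                    allDone = false;
                }
            }
        } finally {
            threadListLock.unlock();
        }
        return allDone;
    }
}
```

A caller would invoke `doCallbacks()` repeatedly until it returns true, with each posted thread clearing its own `hasWork` flag when it next reaches a safe point.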

43 42 “Sweep” Phase

44 43 Sweep Phase [Diagram: page lists labeled free, 16 free, 64 free, 256 free, temp full, and available, with 16, 64, and 256 pages]

45 44 “Switch Full List” Phase

46 45 Switch Back to Normal Full List [Diagram: thread list linking Thread 1 and Thread 2, each with 16, 64, and 256 page lists and the fields inVM, color, epoch, writebuf, pthread (pthread_t), dblb, sweep, and BarrierOn; PhaseInfo holds phase = switchfull and workers = 1]

47 46 “Drain Temp Full” Phase

48 47 Drain Temp Full [Diagram: page lists labeled free, 16 free, 64 free, 256 free, temp full, and available, with 16, 64, and 256 pages]

49 48 Done with Collection!!
But…
–That Was the Easy Part
And there are still some problems
–What if threads go out to lunch?

50 49 http://www.research.ibm.com/metronome https://sourceforge.net/projects/tuningforkvp

