Download presentation
Presentation is loading. Please wait.
Published byJonathan Russell Modified over 9 years ago
1
0 Parallel and Concurrent Real-time Garbage Collection Part II: Memory Allocation and Sweeping David F. Bacon T.J. Watson Research Center
2
1 Page Data Synchronization, Take 1 16 64 256 free
3
2 Page Data, Take 2 16 64 256 free 16 64 256 Thread 1 16 64 256 Thread 2
4
3 free ? 16 free 16 free 64 free 64 free 256 free 256 16 64 temp full temp full
5
4 Compare-and-Swap* boolean CAS(int* addr, int old, int new) { atomic { if (* addr == old) { *addr = new; return true; } else { return false; } IBM 370 since 1970, Intel 80486 since 1989 Load-Reserved: ARM 6 since 1991, IBM POWER2 since 1993
6
5 Non-locking* Stack push(Page page) { do { Page t = top; page->next = t; } while (! CAS(&top, t, page); } Page pop() { Page t = null; do { t = top; if (t == null) return; Page n = t->next; } while (! CAS(&top, t, n); return t; } top
7
6 A A B B C C A A B B C C C C A A B B Thread 1 x=pop() start Thread 2 a=pop() b=pop() top Thread 2 push(A) C C B B top Thread 1 pop() finish 1 2 34 n t n t n t A A b a b
8
7 Correct Non-locking Stack push(PageId page) { do { = top; page->next = t; } while (! CAS(&top,, ); } PageId pop() { PageId t = 0; do { = top; if (t==0) return 0; } while (! CAS(&top,, ); return t; }
9
8 Really Correct Non-locking Stack push(Page page) { do { Page t = LOAD_AND_RESERVE(&top); page->next = t; } while (! STORE_CONDITIONAL(&top, page)); } Page pop() { Page t = null; do { Page t = LOAD_AND_RESERVE(&top); if (t==null) return null; } while (! STORE_CONDITIONAL(&top, t->next)); return t; }
10
9 Coalescing Ideal: –getMultiplePages(n) returns best fit –Does not cause mutator activities to block –Does not cause collection activities to block –Does not suffer pathology under contention Reality: tradeoffs between –Quality of fit –Probability of blocking –Overhead (number of CAS’s) free ?
11
10 Cartesian Tree + Try-lock 44 ? x.left.size = x.right.size x.left.address < x.address < x.right.address 45 46 47 33 34 35 62 51 50 11 25 24 22 21 63 59
12
11 Non-locking Rebalancing Tree Homework… …but I won’t use it. Solution would be –Too complex to be reliable –Require too many CAS operations to be fast
13
12 The Fix: Transactional Memory AtomicCartesianTree extends CartesianTree { atomic Block getBlock() { return super.getBlock(); } atomic void releaseBlock(Block b) { super.releaseBlock(b); } atomic Block[] getMultipleBlocks(int n) { return super.getMultipleBlocks(n); }
14
13 Hierarchical Non-locking Buddy? 44 45 46 47 33 34 35 62 51 50 11 25 24 22 21 63 59
15
14 So much for allocation… Garbage Collection
16
15 Handling the Concurrency ? r r p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b Y Y a b X X a b GC Threads Application (“mutator”) Threads Trace Collect Format Load pointer Store pointer Allocate
17
16 Yuasa Snapshot Algorithm (1990) Logically –Mark-and-Sweep Style Algorithm –Take a “copy-on-write” heap snapshot –Collect the garbage in that snapshot Physically –Stop all threads –Copy their stacks (“roots”) –Force them to save over-written pointers –Trace roots and over-written pointers * Dijktstra’75, Steele’76
18
17 1: Take Logical Snapshot Stack r r p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b
19
18 2(a): Copy Over-written Pointers Stack p p r r T T a b X X a b U U a b Z Z a b W W a b Y Y a b
20
19 2(b): Trace Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b W W a b Z Z a b Y Y a b X X a b * Color is per-object mark bit
21
20 2(c): Allocate “Black” Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b s s V V a b W W a b Z Z a b Y Y a b X X a b
22
21 3(a): Sweep Garbage Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b W W a b Z Z a b Y Y a b X X a b s s V V a b free
23
22 3(b): Allocate “White” Stack p p T T a b X X a b U U a b Z Z a b Y Y a b W W a b Z Z a b Y Y a b X X a b s s V V a b free V2 a b t t
24
23 4: Clear Marks Stack p p T T a b X X a b U U a b Z Z a b Y Y a b Z Z a b Y Y a b X X a b s s V V a b free V2 a b t t V V a b W W a b W W a b
25
24 Yuasa Algorithm Phases Snapshot stack and global roots Trace Flip Sweep Flip Clear Marks * Synchronous
26
25 Metronome-2 Concurrency GC Worker Threads GC Master Thread Application Threads (may do GC work)
27
26 Metronome-2 Phases Initiation –Setup turn double barrier on Root Scan –Active Finalizer scan –Class scan –Thread scan** switch to single barrier, color to black –Debugger, JNI, Class Loader scan Trace –Trace* –Trace Terminate*** Re-materialization 1 –Weak/Soft/Phantom Reference List Transfer –Weak Reference clearing** (snapshot) Re-Trace 1 –Trace Master –(Trace*) –(Trace Terminate***) Re-materialization 2 –Finalizable Processing Clearing –Monitor Table clearing –JNI Weak Global clearing –Debugger Reference clearing –JVMTI Table clearing –Phantom Reference clearing Re-Trace 2 –Trace Master –(Trace*) –(Trace Terminate***) –Class Unloading Completion –Finalizer Wakeup –Class Unloading Flush –Clearable Compaction** –Book-keeping * Parallel ** Callback *** Single actor symmetric Flip –Move Available Lists to Full List* (contention) turn write barrier off –Flush Per-thread Allocation Pages** switch allocation color to white switch to temp full list Sweeping –Sweep* –Switch to regular Full List** –Move Temp Full List to regular Full List* (contention)
28
27 Part 3: Sweep Initiation –Setup turn double barrier on Root Scan –Active Finalizer scan –Class scan –Thread scan** switch to single barrier, color to black –Debugger, JNI, Class Loader scan Trace –Trace* –Trace Terminate*** Re-materialization 1 –Weak/Soft/Phantom Reference List Transfer –Weak Reference clearing** (snapshot) Re-Trace 1 –Trace Master –(Trace*) –(Trace Terminate***) Re-materialization 2 –Finalizable Processing Clearing –Monitor Table clearing –JNI Weak Global clearing –Debugger Reference clearing –JVMTI Table clearing –Phantom Reference clearing Re-Trace 2 –Trace Master –(Trace*) –(Trace Terminate***) –Class Unloading Completion –Finalizer Wakeup –Class Unloading Flush –Clearable Compaction** –Book-keeping * Parallel ** Callback *** Single actor symmetric Flip –Move Available Lists to Full List* (contention) turn write barrier off –Flush Per-thread Allocation Pages** switch allocation color to white switch to temp full list Sweeping –Sweep* –Switch to regular Full List** –Move Temp Full List to regular Full List* (contention)
29
28 Assume Trace Has Finished Stack p p T T a b X X a b U U a b Z Z a b W W a b Y Y a b s s V V a b W W a b Z Z a b Y Y a b X X a b
30
29 JVM Application Thread Structure 16 64 256 Thread 1 inVM color 16 64 256 Thread 2 epoch 37 inVM color writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list dblb sweep epoch 43 sweep BarrierOn true BarrierOn false
31
30 “Move Available” Phase
32
31 free ? 16 free 16 free 64 free 64 free 256 free 256 temp full temp full 256 16 64 available
33
32 free ? 16 free 16 free 64 free 64 free 256 free 256 temp full temp full 256 16 64 available contention?
34
33 Generic Structure of Move Phase: Shared Monotonic Work Pool GC Worker Threads GC Master Thread Application Threads (may do GC work) pool of work
35
34 Performing a GC Work Quantum every (500us) tryToDoSomeWorkForGC(); tryToDoSomeWorkForGC() { short phase = enterPhase(); if (phase > 0) doQuantum(phase); leavePhase(); }
36
35 Phase Entry short enterPhase() { thread->epoch = Epoch.newest; while (true) = PhaseInfo; if (workers == phaseTable[phase].maxWorkers) return -1; if (CAS(PhaseInfo,, )) return phase; } PhaseInfo phase workers PhaseInfo phase workers moveavail 3 ABA?
37
36 Phase Exit short leavePhase() { while (true) = PhaseInfo; if (workers > 1 || moreWorkForPhase(phase)) newPhase = ; else if (phase == TRACE_PHASE) newPhase = ; else newPhase = ; if (CAS(PhaseInfo,, newPhase)) return newPhase; } PhaseInfo phase workers PhaseInfo phase workers moveavail 3
38
37 “Per-Thread Flush” Phase
39
38 Flush Per-Thread Pages 16 64 256 Thread 1 inVM color 16 64 256 Thread 2 epoch 37 inVM writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list dblb epoch 43 BarrierOn true BarrierOn false color PhaseInfo phase workers PhaseInfo phase workers flush 1 color sweep sweep
40
39 free ? 16 free 16 free 64 free 64 free 256 free 256 temp full temp full 256 16 64 available Put Full White Pages on temp-full List
41
40 How is this Phase Different? GC Worker Threads GC Master Thread Application Threads (may do GC work) pool of work Per-Thread State Update
42
41 Callback Mechanism bool doCallbacks(int callbackId) { bool alldone = true; LOCK(threadlist); for each (Thread thread in threadlist) if (hasCallbackWork(thread, callbackId); if (! thread.inVM && TRY_LOCK(thread)) doCallbackWorkForThread(thread, callbackId); UNLOCK(thread); else postCallback(thread, callbackId); alldone = false; UNLOCK(threadlist); return alldone; } * Non-locking list with cursor?
43
42 “Sweep” Phase
44
43 free ? 16 free 16 free 64 free 64 free 256 free 256 temp full temp full 64 available Sweep Phase 256 16
45
44 “Switch Full List” Phase
46
45 Switch Back to Normal Full List 16 64 256 Thread 1 inVM color 16 64 256 Thread 2 epoch 37 inVM writebuf 43 pthread pthread_t writebuf 41 pthread pthread_t next thread list dblb epoch 43 BarrierOn true BarrierOn false color PhaseInfo phase workers PhaseInfo phase workers switchfull 1 color sweep sweep
47
46 “Drain Temp Full” Phase
48
47 free ? 16 free 16 free 64 free 64 free 256 free 256 temp full temp full 64 available Drain Temp Full 256 16
49
48 Done with Collection!! But… –That Was the Easy Part And there are still some problems –What if threads go out to lunch?
50
49 http://www.research.ibm.com/metronome https://sourceforge.net/projects/tuningforkvp
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.