Ulterior Reference Counting: Fast GC Without the Wait
Steve Blackburn, Kathryn McKinley
Presented by: Dimitris Prountzos
Slides adapted from a presentation by Steve Blackburn
Outline
- Throughput-responsiveness problem
- Reference counting & optimizations
- Ulterior in detail
- BG-RC in action
- Experimental evaluation
- Conclusion
Throughput/Responsiveness Trade-off
- GC and mutator share the CPU
- Throughput: net GC/mutator ratio
- Responsiveness: length of GC pauses
[Figure: CPU utilization over time, split between GC and mutator; the maximum pause marks poor responsiveness]
The Ulterior Approach
- Match mechanisms to object demographics
- Copying nursery (young space)
  - Highly mutated, high-mortality young objects
  - Ignores most mutations
  - GC time proportional to survivors, space efficient
- RC mature space
  - Low-mutation, low-mortality old objects
  - GC time proportional to mutations, space efficient
- Generalize deferred RC to heap objects
  - Defer fields of highly mutated objects & enumerate them quickly
  - Reference count only infrequently mutated fields
Pure Reference Counting
- Tracks mutations: RCM(p)
- RCM(p) generates a decrement and an increment for the before and after values of p:
  RCM(p) → RC(p_before)--, RC(p_after)++
- If RC == 0, free
- RCM(p) for every mutation is very expensive
[Figure (animation): objects a and b in the RC space with their counts; c appears and b is freed]
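The RCM(p) rule above can be sketched as a toy model. This is only an illustration under stated assumptions: `RcObject`, `PureRc`, and `writeField` are hypothetical names, not part of Jikes RVM or MMTk, and "freeing" is modelled by setting a flag.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy heap object: a reference count plus named pointer fields (hypothetical model). */
final class RcObject {
    final String name;
    int rc = 0;
    boolean freed = false;
    final Map<String, RcObject> fields = new HashMap<>();

    RcObject(String name) { this.name = name; }
}

final class PureRc {
    /** RCM(p): store newTarget into src.field and adjust both referents immediately. */
    static void writeField(RcObject src, String field, RcObject newTarget) {
        RcObject old = src.fields.put(field, newTarget);
        // Increment first so that overwriting a slot with its own value never frees the target.
        if (newTarget != null) newTarget.rc++;   // RC(p_after)++
        if (old != null) decrement(old);         // RC(p_before)--
    }

    /** Decrements o; when the count reaches zero, frees o and decrements its referents. */
    static void decrement(RcObject o) {
        if (--o.rc == 0) {
            o.freed = true;
            for (RcObject child : o.fields.values()) {
                if (child != null) decrement(child);
            }
            o.fields.clear();
        }
    }
}
```

Every pointer store pays for two count updates here, which is the cost the slide calls very expensive.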
RC Optimizations
- Buffering: apply RC(p)--, RC(p)++ later
- Coalescing: apply RCM(p) only for the initial and final values of p (coalesce intermediate values):
  {RCM(p), RCM(p1), ... RCM(pn)} → RC(p_initial)--, RC(p_final)++
- Deferral of RCM events
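A minimal sketch of buffering plus coalescing, assuming a toy heap where objects and pointer slots are identified by strings. `CoalescingRc` and its members are hypothetical names used only for illustration. The first write to a slot in an epoch records the slot's initial value; at collection time only the initial and final values are counted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CoalescingRc {
    /** Reference counts keyed by object name (toy model). */
    final Map<String, Integer> rc = new HashMap<>();
    /** Current value of each pointer slot; a missing or null entry means an empty slot. */
    final Map<String, String> slots = new HashMap<>();
    /** For each slot mutated this epoch, the value it held before the first mutation. */
    final Map<String, String> initialValue = new HashMap<>();
    /** Number of count updates actually applied, to show the saving. */
    int rcUpdatesApplied = 0;

    /** Mutation: only the first write to a slot in an epoch records anything. */
    void write(String slot, String target) {
        if (!initialValue.containsKey(slot)) {
            initialValue.put(slot, slots.get(slot));
        }
        slots.put(slot, target);
    }

    /** Applies RC(p_final)++ then RC(p_initial)-- per mutated slot; returns freed objects. */
    List<String> collect() {
        List<String> freed = new ArrayList<>();
        for (String slot : initialValue.keySet()) {
            String fin = slots.get(slot);
            if (fin != null) {
                rc.merge(fin, 1, Integer::sum);
                rcUpdatesApplied++;
            }
        }
        for (String init : initialValue.values()) {
            if (init != null) {
                int count = rc.merge(init, -1, Integer::sum);
                rcUpdatesApplied++;
                if (count == 0) freed.add(init);
            }
        }
        initialValue.clear();
        return freed;
    }
}
```

Three writes to the same slot cost two count updates in this sketch instead of six, because the intermediate values are never counted.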
Deferred Reference Counting
- Goal: ignore RCM(p) for stacks & registers
- Deferral of p: a mutation of p does not generate an RCM(p)
- Correctness: for all deferred p, RCR(p) at each GC
- Retain event RCR(p): p → o temporarily retains o regardless of RC(o)
  - Deutsch/Bobrow use a Zero Count Table
  - Bacon et al. use a temporary increment
Classic Deferral
- In deferral phase: ignore RCM(p) for stacks & registers
- Breaks the RC == 0 invariant
[Figure (animation): stacks & registers pointing into the RC space, with objects a, b, c and their counts]
Classic Deferral (Bacon et al.)
- Divide execution into epochs
- Store information in buffers
  - Root buffer (RB): stores 1st-level objects
  - Increment buffer (IB): stores increments to 1st-level objects
  - Decrement buffer (DB): stores decrements to 1st-level objects
- At GC time:
  - Look at the RB and apply temporary increments to all objects there
  - Process the IB of this epoch
  - Look at the RB of the previous epoch and apply decrements to all objects there
  - Process the DB of the previous epoch
  - During DB processing, recycle o if RC(o) = 0
- Avoid race conditions by
  - Processing the IB before the DB
  - Processing the DB of one epoch behind
Classic Deferral (Bacon et al.)
- At GC time, RCR(p) for root pointers applies temporary increments
- At next GC, apply decrements
- Key: efficient enumeration of deferred pointers
- Better, but not good enough!
[Figure (animation): stacks & registers, objects a, b, c with counts in the RC space, dec buf and root buf]
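The epoch scheme on these slides can be sketched roughly as follows. This is a single-threaded toy under stated assumptions, not Bacon et al.'s implementation: `DeferredRcSketch` and its fields are hypothetical names, objects are plain strings with no pointer fields, and so freeing does not recurse into referents.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DeferredRcSketch {
    /** Heap reference counts; pointers from stacks and registers are never counted here. */
    final Map<String, Integer> rc = new HashMap<>();
    /** Increments and decrements logged by heap mutations during the current epoch. */
    final List<String> incBuf = new ArrayList<>();
    final List<String> decBuf = new ArrayList<>();
    /** Root buffer and decrement buffer carried over from the previous epoch. */
    private List<String> prevRoots = new ArrayList<>();
    private List<String> prevDecBuf = new ArrayList<>();

    /** One GC: roots are the objects currently referenced from stacks and registers. */
    List<String> collect(List<String> roots) {
        List<String> freed = new ArrayList<>();
        for (String r : roots) rc.merge(r, 1, Integer::sum);      // RCR(p): temporary increments
        for (String o : incBuf) rc.merge(o, 1, Integer::sum);     // IB of this epoch, before any DB
        for (String r : prevRoots) decrement(r, freed);           // undo last epoch's temporary increments
        for (String o : prevDecBuf) decrement(o, freed);          // DB of one epoch behind
        prevRoots = new ArrayList<>(roots);
        prevDecBuf = new ArrayList<>(decBuf);
        incBuf.clear();
        decBuf.clear();
        return freed;
    }

    private void decrement(String o, List<String> freed) {
        if (rc.merge(o, -1, Integer::sum) == 0) {
            rc.remove(o);
            freed.add(o);
        }
    }
}
```

An object referenced only from the stack survives a collection through its temporary increment and is reclaimed at the following collection once the root is gone.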
Ulterior Reference Counting
- Idea: extend deferral to select heap pointers
  - e.g. all pointers within nursery objects
- Deferral is not a fixed property of p
  - e.g. a nursery object gets promoted
- Integrate event I(p): changes p from deferred to not deferred
BG-RC: Bounded Nursery Generational - RC
- Heap organization
  - Bounded copying nursery: ignore mutations to nursery pointer fields
  - RC old space: object remembering, coalescing, buffering
- Collection
  - Process roots
  - Nursery phase promotes live p to old space and applies I(p)
  - RC phase processes object buffer, dec buffer
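A rough sketch of the nursery phase described above, under simplifying assumptions: copying is modelled by flipping an `inNursery` flag, roots are retained with a temporary increment recorded in a root buffer, and all names (`HeapObj`, `NurseryPromotionSketch`, `collectNursery`) are hypothetical rather than taken from MMTk. Nursery objects that are never reached stay flagged as young and would be reclaimed with the nursery.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class HeapObj {
    final String name;
    boolean inNursery;
    int rc = 0;
    final List<HeapObj> fields = new ArrayList<>();

    HeapObj(String name, boolean inNursery) {
        this.name = name;
        this.inNursery = inNursery;
    }
}

final class NurseryPromotionSketch {
    /** Promotes every reachable nursery object; returns them in promotion order. */
    static List<HeapObj> collectNursery(List<HeapObj> roots,
                                        List<HeapObj> modifiedRcObjects,
                                        List<HeapObj> rootBuffer) {
        List<HeapObj> promoted = new ArrayList<>();
        Deque<HeapObj> toScan = new ArrayDeque<>();
        // Roots stay deferred: retain each target with a temporary increment
        // and remember it so the increment can be undone at a later GC.
        for (HeapObj target : roots) {
            target.rc++;
            rootBuffer.add(target);
            promoteIfYoung(target, promoted, toScan);
        }
        // Remembered (modified) RC objects: GC-time increment for each current referent.
        for (HeapObj src : modifiedRcObjects) {
            for (HeapObj target : src.fields) {
                target.rc++;
                promoteIfYoung(target, promoted, toScan);
            }
        }
        // Integrate event I(p): fields of promoted objects are counted from now on.
        while (!toScan.isEmpty()) {
            HeapObj obj = toScan.poll();
            for (HeapObj target : obj.fields) {
                target.rc++;
                promoteIfYoung(target, promoted, toScan);
            }
        }
        return promoted;
    }

    private static void promoteIfYoung(HeapObj o, List<HeapObj> promoted, Deque<HeapObj> toScan) {
        if (o.inNursery) {
            o.inNursery = false;   // stands in for copying the object into the RC space
            promoted.add(o);
            toScan.add(o);
        }
    }
}
```

Work in this phase is proportional to the survivors: unreachable nursery objects are never visited.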
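View of heap in Ulterior RC
[Figure: stacks and registers (deferred), an RC space with objects a, b, d, e and their counts, and a non-RC space with objects r, s, t; pointers are labelled "defer" or "remember"]
How can we efficiently
- Enumerate all deferred pointer fields?
- Remember old-to-young pointers?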
Bringing it Together
- Deferral:
  - Defer nursery & roots
  - Perform I(p) on nursery promotion
  - Piggyback on copying nursery collection
- Coalescing:
  - Remember mutated RC objects
  - Upon first mutation, dec each referent
  - At GC time, inc each referent
  - Piggyback remset onto this mechanism
BG-RC Write Barrier

    // unsync check for uniqueness
    private void writeBarrier(VM_Address srcObj,
                              VM_Address srcSlot,
                              VM_Address tgtObj)
        throws VM_PragmaInline {
      // Fast path: only an object not yet logged takes the slow path.
      if (getLogState(srcObj) != LOGGED)
        writeBarrierSlow(srcObj);
      VM_Magic.setMemoryAddress(srcSlot, tgtObj);
    }

    // unsync check for uniqueness
    private void writeBarrierSlow(VM_Address srcObj)
        throws VM_PragmaNoInline {
      if (attemptToLog(srcObj)) {
        modifiedBuffer.push(srcObj);
        enumeratePointersToDecBuffer(srcObj); // trade-off for sparsely
        setLogState(srcObj, LOGGED);          // modified objects
      }
    }
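The barrier above depends on Jikes RVM internals, so here is a self-contained toy version of the same object-remembering idea. It is a sketch under assumptions: `BarrierNode` and `ObjectBarrierSketch` are hypothetical names, the log state is a plain boolean with no synchronization, and the nursery is left out so that every object is reference counted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BarrierNode {
    final String name;
    int rc = 0;
    boolean logged = false;
    final BarrierNode[] fields;

    BarrierNode(String name, int slots) {
        this.name = name;
        this.fields = new BarrierNode[slots];
    }
}

final class ObjectBarrierSketch {
    final List<BarrierNode> modifiedBuffer = new ArrayList<>();
    final List<BarrierNode> decBuffer = new ArrayList<>();

    /** Fast path: only the first write to an object per epoch takes the slow path. */
    void writeBarrier(BarrierNode src, int slot, BarrierNode target) {
        if (!src.logged) writeBarrierSlow(src);
        src.fields[slot] = target;
    }

    /** Slow path: remember the object and buffer a decrement for each current referent. */
    private void writeBarrierSlow(BarrierNode src) {
        modifiedBuffer.add(src);
        for (BarrierNode ref : src.fields) {
            if (ref != null) decBuffer.add(ref);
        }
        src.logged = true;
    }

    /** GC time: increment the final referents, then apply buffered decrements recursively. */
    List<BarrierNode> collect() {
        for (BarrierNode src : modifiedBuffer) {
            for (BarrierNode ref : src.fields) {
                if (ref != null) ref.rc++;
            }
            src.logged = false;
        }
        modifiedBuffer.clear();
        List<BarrierNode> freed = new ArrayList<>();
        Deque<BarrierNode> pending = new ArrayDeque<>(decBuffer);
        decBuffer.clear();
        while (!pending.isEmpty()) {
            BarrierNode o = pending.poll();
            if (--o.rc == 0) {
                freed.add(o);
                for (BarrierNode child : o.fields) {
                    if (child != null) pending.add(child);
                }
            }
        }
        return freed;
    }
}
```

Logging the whole object costs one decrement per field even if only one field changes, which appears to be the trade-off for sparsely modified objects noted in the barrier's comment.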
BG-RC in action (animation)
1. Mutation Phase
2. Nursery Collection: Scan Roots
3. Nursery Collection: Process Object Buffer
4. Nursery Collection: Reclaim Nursery
5. RC Collection: Process Decrement Buffer
6. RC Collection: Recursive Decrement
7. RC Collection: Process Decrement Buffer
8. BG-RC Collection Complete!
[Figure, repeated on each frame: stacks and registers, RC space, non-RC space, and the obj buf, dec buf, and root buf]
Controlling Pause Times
- Modest bounded nursery size
- Meta data
  - Decrement and modified object buffers
  - Trigger a collection if too big
- RC time cap
  - Limits time spent recursively decrementing RC objects & in cycle detection
- Cycles: pure RC is incomplete
  - Use Bacon/Rajan trial deletion algorithm
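The RC time cap can be illustrated with a bounded work loop. This sketch makes a deliberate substitution: it caps the number of decrements rather than elapsed time, so that it stays deterministic. `CappedDecrementSketch` and `processDecrements` are hypothetical names, and objects are plain strings without referents.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

final class CappedDecrementSketch {
    final Map<String, Integer> rc = new HashMap<>();
    final Deque<String> decBuffer = new ArrayDeque<>();

    /**
     * Applies at most maxWork buffered decrements and returns how many objects were freed.
     * Unprocessed decrements stay in the buffer for the next collection.
     * Assumption: a work count stands in for the wall-clock cap a real collector would check.
     */
    int processDecrements(int maxWork) {
        int freed = 0;
        for (int work = 0; work < maxWork && !decBuffer.isEmpty(); work++) {
            String o = decBuffer.poll();
            if (rc.merge(o, -1, Integer::sum) == 0) {
                rc.remove(o);
                freed++;
            }
        }
        return freed;
    }
}
```

Leaving work in the buffer bounds the pause but delays reclamation, which is why the slide pairs the cap with a trigger on buffer size.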
Experimental Evaluation
- Jikes RVM with MMTk
- Compare MS, BG-MS, BG-RC, RC
- Examine various heap sizes
- Collection triggers
  - Every 4 MB of allocation for BG-RC (1 MB for RC)
  - Time cap of 60 ms
  - Cycle detection at 512 KB
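Throughput/Pause time, Moderate Heap Size
Collectors: MS, BG-MS, BG-RC, RC; each reports norm time and max pause.
mean (norm time / max pause): MS 1.23 / 214, BG-MS 1.00 / 210, BG-RC 0.98 / 53, RC 1.53 / 175
Per-benchmark values, listed from RC to MS as max pause then norm time (not every cell is present):
jess: 131 2.36 0.99 181 182 1.91
javac: 580 1.78 285 268
jack: 1.66 44 0.94 1.52
raytrace: 133 1.71 1.03 184 203 1.31
mtrt: 130 1.75 49 1.04 180 241 1.29
cmpress: 72 0.93 68 0.88 160 .98
pjbb: 297 1.33 281 264
db: 1.11 59 1.01 244 238
mpeg: 121 1.14 43 0.96 178 185 1.05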
Throughput & Responsiveness
Conclusion
- The Ulterior design is based on a careful study of object demographics and makes the collector aware of them
- Extends deferred RC to heap objects
- Shows in practice that high throughput & low pause times are compatible