
1 © Imperial College London Exploring the Barrier to Entry: Incremental Generational Garbage Collection for Haskell. Andy Cheadle & Tony Field, Imperial College London; Simon Marlow & Simon Peyton Jones, Microsoft Research, Cambridge, UK; Lyndon While, The University of Western Australia, Perth

2 © Imperial College London Page 2 Introduction
We focus on Haskell with the intent of building an: Efficient, Barrierless, Hybrid, Incremental, Generational … garbage collector for GHC.
Investigate pause-time bounds and mutator utilisation.
Explore application to other dynamic-dispatch systems.

3 © Imperial College London Page 3 Highlights
Improving Non-Stop Haskell
– Incremental GC read-barrier optimisation without the per-object space overhead
Bridging the Generation Gap
– Generational GC write-barrier optimisation
Consistent Mutator Utilisation
– Time-based versus work-based scheduling

4 © Imperial College London Page 4 Barriers: Friend or Foe – Summary
Blackburn & Hosking – ISMM 2004
Conditional read-barrier
– AMD: 21.24%, P4: 15.91%, PPC: 6.49%
– Incremental GC: standard Baker read-barrier
Unconditional read-barrier
– AMD: 8.05%, P4: 5.04%, PPC: 0.85%
– Brooks indirection read-barrier
– Metronome ‘eager’ barrier ~ 4%
– BUT: space overhead -> increased GC count
Must consider GC cost!
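
The two barrier styles being compared may be easier to see in code. The sketch below is illustrative C, not code from any of the systems Blackburn & Hosking measured; `in_from_space` and `evacuate` are invented stand-ins for the collector's real machinery.

```c
#include <stdbool.h>

typedef struct Closure {
    struct Closure *forwarding;   /* Brooks: points to the object itself, or to its to-space copy */
    /* ... payload ... */
} Closure;

/* Assumed helpers -- stand-ins for the collector's real machinery. */
bool     in_from_space(Closure *c);
Closure *evacuate(Closure *c);

/* Conditional (Baker-style) read barrier: test on every load and copy the
 * object to to-space the first time the mutator touches it. */
static inline Closure *read_barrier_conditional(Closure *c)
{
    if (in_from_space(c))          /* this test is the 6-21% mutator overhead */
        c = evacuate(c);
    return c;
}

/* Unconditional (Brooks-style) read barrier: always load through an
 * indirection word; no branch, but one extra word per object. */
static inline Closure *read_barrier_unconditional(Closure *c)
{
    return c->forwarding;          /* cheap, but the extra word shrinks the usable heap */
}
```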

5 © Imperial College London Page 5 Non-Stop Haskell
Implementing Baker’s incremental collector typically introduces high overheads
– the software read-barrier
We have shown that this can be done efficiently in systems with dynamic dispatching.
Caveat: dynamic dispatching already “costs” something; we show that incremental garbage collection comes at virtually no extra cost.

6 © Imperial College London Page 6 Dynamic Dispatch and the STG Machine
The STG machine is a model for the compilation of lazy functional languages.
All objects are represented on the heap as closures.
To compute function ‘f’ applied to arguments ‘a b c d’, jump to f’s entry code.
[Diagram: closure for f – an info pointer to a static info table (layout “2, 2”, other fields, entry code), followed by heap pointers to the arguments a b c d]
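
A rough C rendering of the closure layout on this slide may help: every heap object carries an info pointer, and “entering” the closure is an indirect jump through it. The struct and field names below are illustrative, not GHC’s actual definitions.

```c
typedef struct Closure Closure;

/* Static info table, one per closure type, emitted at compile time. */
typedef struct {
    unsigned ptrs;                   /* number of pointer fields (the "2, 2" layout) */
    unsigned nonptrs;                /* number of non-pointer (imm) fields           */
    void   (*entry)(Closure *self);  /* the closure's entry code                     */
} InfoTable;

/* Every heap object begins with an info pointer; the payload holds the
 * pointer fields first (e.g. the arguments a b c d), then the imm fields. */
struct Closure {
    const InfoTable *info;
    Closure         *payload[];
};

/* To compute `f a b c d`, the mutator simply enters f: an indirect jump
 * through the info pointer.  (In the real STG machine this is a tail
 * call/jump, not a C call.) */
static inline void enter(Closure *f)
{
    f->info->entry(f);
}
```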

7 © Imperial College London Page 7 The Read-Barrier Invariant
[Diagram: a stack whose frames point at closure 1 (scavenged) and closures 2 and 3 (unscavenged), spread across from-space and to-space, illustrating Problem 1 (scavenging closures) and Problem 2 (stack scavenging)]

8 © Imperial College London Page 8 Invariant Problem 1: Scavenging Closures
When the garbage collector is on, make info pointers point to code that scavenges evacuated closures before entering them.
At all other times the system operates with no read barrier!
[Diagram: evacuated closure whose info pointer now targets self-scavenging code, with the same “2, 2” layout, other fields and heap pointers as before]
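
A minimal sketch of this info-pointer trick, assuming invented helper names (`scavenge_fields`, `original_info`) rather than GHC’s real runtime functions: while the collector is on, an evacuated closure’s info pointer targets a stub that scavenges the closure, restores the real info pointer, and only then enters it.

```c
typedef struct Closure Closure;

typedef struct {
    void (*entry)(Closure *self);     /* the closure's ordinary entry code */
} InfoTable;

struct Closure {
    const InfoTable *info;
    Closure         *fields[2];       /* illustrative payload */
};

/* Assumed collector helpers. */
void             scavenge_fields(Closure *c);   /* evacuate the closure's children */
const InfoTable *original_info(Closure *c);     /* recover the real info table     */

/* Installed as the "entry code" of every evacuated-but-unscavenged closure
 * while the collector is on.  The barrier therefore runs at most once per
 * object, and costs nothing at all once the collector is off. */
void self_scavenge_entry(Closure *c)
{
    scavenge_fields(c);               /* re-establish the read-barrier invariant */
    c->info = original_info(c);       /* flip the info pointer back              */
    c->info->entry(c);                /* continue as if nothing had happened     */
}
```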

9 © Imperial College London Page 9
Q: How do we restore the original info pointer?
A: We remember it when the closure is evacuated.
Non-Stop Haskell: use an extra word in to-space.
Note: the space overhead applies only to objects copied from from-space, but effectively reduces to-space by 30%. Freshly allocated objects carry no space overhead.
[Diagram: the to-space copy carries an extra word at offset -1 remembering the original entry-code info pointer, while its visible info pointer now targets the self-scavenging code]
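
The “extra word in to-space” can be pictured as below: evacuation copies the closure, stashes its real info pointer in a hidden slot just before the object, and installs the self-scavenging stub in its place. This is a reconstruction under assumed names (`to_space_alloc`, `self_scav_info`), not GHC’s evacuation code.

```c
#include <stddef.h>
#include <string.h>

typedef struct Closure Closure;
typedef struct {
    void  (*entry)(Closure *self);
    size_t  size;                     /* closure size in bytes, from the info table */
} InfoTable;
struct Closure { const InfoTable *info; /* payload follows */ };

extern const InfoTable self_scav_info;   /* table whose entry is the self-scavenging stub */
void *to_space_alloc(size_t bytes);      /* assumed bump allocator in to-space            */

/* Copy a closure into to-space, remembering its real info pointer in one
 * extra hidden word (the slide's slot at offset -1).  Only objects copied
 * during this collection pay the word; fresh allocations carry no overhead. */
Closure *evacuate(Closure *from)
{
    size_t sz = from->info->size;
    const InfoTable **hidden = to_space_alloc(sizeof *hidden + sz);

    hidden[0] = from->info;                    /* remember the original info pointer */
    Closure *to = (Closure *)(hidden + 1);
    memcpy(to, from, sz);                      /* copy the object itself             */
    to->info = &self_scav_info;                /* trap the next entry                */
    return to;
}
```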

10 © Imperial College London Page 10
Q: How do we restore the original info pointer?
A: We remember it when the closure is evacuated.
In production: specialise every closure type at compile time.
The runtime space overhead is replaced by a static one of ~25%.
[Diagram: each closure type gets a second, compile-time-specialised info table whose self-scavenging code ends with a JMP to the ordinary entry code]
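
How the production scheme might remove even that word: the compiler could emit, for each closure type, a self-scavenging twin of its info table whose code ends by jumping to the ordinary entry code, so the “original” table is known statically. The pairing below is our guess at what such specialisation looks like (and why it costs ~25% static space), not GHC’s actual code layout.

```c
typedef struct Closure Closure;
typedef struct InfoTable {
    void (*entry)(Closure *self);
    const struct InfoTable *plain;   /* a self-scav table's ordinary twin */
} InfoTable;
struct Closure { const InfoTable *info; };

void scavenge_fields(Closure *c);    /* assumed collector helper */

/* Ordinary, compiler-generated entry code for some closure type T. */
void T_entry(Closure *c) { /* ... evaluate T ... */ (void)c; }

/* Its compile-time-specialised twin: scavenge, swing the info pointer to
 * the statically known plain table, then fall through to the ordinary
 * entry code -- the slide's "Self-scav code; JMP Entry code". */
void T_entry_self_scav(Closure *c)
{
    scavenge_fields(c);
    c->info = c->info->plain;
    T_entry(c);
}

const InfoTable T_info           = { T_entry,           0 };
const InfoTable T_info_self_scav = { T_entry_self_scav, &T_info };
```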

11 © Imperial College London Page 11 Invariant Problem 2: Stack Scavenging
STG machine stack frames look just like closures.
Before returning to the caller frame we ‘hijack’ the caller’s return address, replacing it with a pointer to self-scavenging code for that frame.
[Diagram: a stack of frames scavenged one at a time – each hijacked return address runs “scav; mod next return address; update; return” before control re-enters an unscavenged frame]
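
Return-address hijacking can be sketched the same way, again with invented names (`scavenge_frame`, `hijack`) and a deliberately simplified single saved slot; the real runtime threads the saved return address through the stack itself.

```c
typedef struct Frame {
    void        (*return_addr)(struct Frame *caller);  /* frames begin with a code pointer */
    struct Frame *caller;                               /* next frame down (illustrative)   */
} Frame;

/* Assumed helpers and state. */
void scavenge_frame(Frame *f);                    /* evacuate the frame's pointer fields */
void hijack(Frame *f);                            /* install the stub in frame f         */
extern void (*saved_return_addr)(Frame *caller);  /* the return address we displaced     */

/* Stub installed in place of a return address.  Before control can re-enter
 * an unscavenged caller frame, the stub scavenges it, moves the hijack one
 * frame further down, restores the real return address and resumes -- the
 * slide's "scav; mod; update; return". */
void self_scavenge_return(Frame *caller)
{
    scavenge_frame(caller);                         /* scav                         */
    if (caller->caller)
        hijack(caller->caller);                     /* mod: push the trap downwards */
    caller->return_addr = saved_return_addr;        /* update                       */
    caller->return_addr(caller->caller);            /* return to the real code      */
}
```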

12 © Imperial College London Page 12 Background Scavenging
GHC’s heap is block allocated. So, scavenge at:
– Every Allocation (EA)
– Every Block allocation (EB)
Reduce forced completions via block chaining.
Incremental scavenger pauses are allocation-dependent.
Exploit GHC’s lightweight scheduler to implement a time-scheduled scavenger (Jikes RVM Metronome):
– Consistent mutator utilisation
– Increase in forced completions due to allocation bursts
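
Work-based scheduling amounts to hooking the allocator so that each new block (EB) or each allocation (EA) pays for a bounded slice of scavenging. A minimal sketch, assuming invented names (`scavenge_one_object`, `gc_in_progress`) and an arbitrary work quantum:

```c
#include <stdbool.h>
#include <stddef.h>

#define BLOCK_SIZE     4096u   /* illustrative; GHC's heap is block allocated */
#define WORK_PER_BLOCK   64    /* objects scavenged per fresh block (tunable) */

/* Assumed runtime state and helpers. */
extern bool gc_in_progress;
bool  scavenge_one_object(void);      /* returns false once scavenging is complete */
void *bump_alloc(size_t bytes);       /* the ordinary bump-pointer allocator       */
static size_t bytes_since_block;

/* Every-Block (EB) scheduling: do a slice of collector work whenever the
 * mutator crosses a block boundary.  Every-Allocation (EA) would run the
 * same loop on each call instead. */
void *alloc(size_t bytes)
{
    bytes_since_block += bytes;
    if (gc_in_progress && bytes_since_block >= BLOCK_SIZE) {
        bytes_since_block = 0;
        for (int i = 0; i < WORK_PER_BLOCK; i++)
            if (!scavenge_one_object())
                break;                 /* this incremental cycle has finished */
    }
    /* A time-based scheduler would instead run the loop from a timer tick in
     * GHC's lightweight scheduler: steadier mutator utilisation, but more
     * forced completions when allocation comes in bursts. */
    return bump_alloc(bytes);
}
```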

13 © Imperial College London Page 13 Results – Binary Sizes

14 © Imperial College London Page 14 Results – Runtimes

15 – 20 © Imperial College London Pages 15 – 20 [Figure-only slides; no transcript text]

21 © Imperial College London Page 21 The Generational Write-Barrier
[Diagram: generation N and generation N – 1, with the root set, an inter-generational pointer, and the extra root set it induces for generation N – 1]
Depending on the number of updates, the write-barrier can impose an overhead of 8 – 24% (SML/NJ and Clean).
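
For comparison, the conventional write barrier whose cost is quoted here looks roughly like this sketch: every pointer store checks whether it creates an old-to-young pointer and, if so, records the mutated object in the younger generation’s extra root set. The names (`remember`, `generation`) are illustrative.

```c
#include <stddef.h>

typedef struct Obj {
    int         generation;     /* which generation this object lives in          */
    struct Obj *field;          /* the pointer field being mutated (illustrative) */
} Obj;

/* Assumed collector helper: add obj to the remembered set, i.e. the extra
 * root set for the younger generation shown on this slide. */
void remember(Obj *obj);

/* Classic generational write barrier, executed on every pointer update;
 * the test itself is the 8-24% overhead quoted above. */
static inline void write_barrier(Obj *obj, Obj *new_value)
{
    obj->field = new_value;
    if (new_value != NULL && obj->generation > new_value->generation)
        remember(obj);          /* an old-to-young pointer was created: record it */
}
```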

22 © Imperial College London Page 22 Bridging the Generation Gap
We implement in GHC a mechanism that again exploits dynamic dispatch to eliminate unnecessary write-barriers:
[Diagram: generation 0 with its root set, containing THUNK_SELECT, THUNK_1 and THUNK_2; promote to generation 1]

23 © Imperial College London Page 23 Bridging the Generation Gap
[Diagram: generations 0 and 1 with THUNK_SELECT, THUNK_1, THUNK_2 and an IND_PRE_UPD closure, plus the generation-0 root set; caption: force THUNK selectee evaluation]

24 © Imperial College London Page 24 Bridging the Generation Gap
[Diagram: generations 0 and 1 with THUNK_SELECT, THUNK_1, THUNK_2, IND_UPD and IND_PRE_UPD closures, plus the generation-0 root set]

25 © Imperial College London Page 25 Bridging the Generation Gap
[Diagram: generations 0 and 1 with THUNK_SELECT, THUNK_1, IND_OLDGEN, IND_UPD and IND_PRE_UPD closures, a CONSTR_2 closure, the generation-0 root set, and the resulting inter-generational pointer]
Preliminary benchmarks suggested a reduction of 5 – 9%; in production it is actually around 2 – 3%.
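
Our reading of slides 22 – 25 is that the barrier is removed by dispatch rather than by a test: a thunk that has been promoted gets update code that records the resulting indirection unconditionally (the IND_OLDGEN case), while young-generation thunks keep plain update code with no check at all. The C below is a speculative reconstruction of that idea with invented names; it is not GHC’s update code.

```c
typedef struct Closure Closure;

typedef struct {
    void (*update)(Closure *thunk, Closure *value);  /* update code chosen per closure type */
} InfoTable;

struct Closure {
    const InfoTable *info;
    Closure         *indirectee;   /* where an updated thunk's result lives */
};

/* Assumed remembered-set insertion and indirection info tables. */
void remember(Closure *c);
extern const InfoTable ind_info;          /* plain indirection (young generation)    */
extern const InfoTable ind_oldgen_info;   /* indirection living in an old generation */

/* Update code for a thunk still in generation 0: no barrier at all. */
void update_young(Closure *thunk, Closure *value)
{
    thunk->info = &ind_info;
    thunk->indirectee = value;
}

/* Update code for a promoted thunk: the update may create an old-to-young
 * pointer, so it is recorded unconditionally.  The generation test has been
 * folded into the choice of info table made at promotion time. */
void update_oldgen(Closure *thunk, Closure *value)
{
    thunk->info = &ind_oldgen_info;
    thunk->indirectee = value;
    remember(thunk);
}

/* The mutator just dispatches; there is no explicit write-barrier test. */
static inline void update(Closure *thunk, Closure *value)
{
    thunk->info->update(thunk, value);
}
```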

26 © Imperial College London Page 26 Ongoing Work
Application of the read-barrier optimisation to Java.
Unfortunately Java programs are not “pure” in their use of dynamic dispatch:
– Field access via get() / set() methods
– Inlining must be disallowed
Investigating within Jikes RVM:
– Inter- and intra-class inlining
– Code bloat arising from get() / set() methods, restricted inlining and an additional per-class VMT
– Cost of the VMT / TIB pointer flip

27 © Imperial College London Page 27 Conclusion
Removal of collector-specific barriers and tests:
– Yields cheaper ‘vanilla’ collectors
– Allows the efficient hybridisation of multiple collector algorithms
Time-based scheduling is massively attractive, but complete decoupling from the allocator is problematic.
A hybrid approach looks promising:
– Parameterised by mutator utilisation
– Sensitive to allocation rate
Elimination of per-object overhead: mandatory for our production collector.

