Download presentation
Presentation is loading. Please wait.
Published byDestin Saker Modified over 9 years ago
1
Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan
2
Big Picture 2 Lightweight user-level threads Scheduler 1 t1 t2 tn Lots of concurrency! Core 1 Core n Core 2 Heap
3
Big Picture 3 Expendable resource? Big Picture 3 Scheduler 1 t1 t2 tn Lots of concurrency! Heap
4
Big Picture 4 Expendable resource? Big Picture 4 Scheduler 1 t1 t2 tn Lots of concurrency! Heap Exploit program concurrency to eliminate read barriers from thread-local collectors GC Operation Alleviate MM cost?
5
MultiMLton Goals – Safety, Scalability, ready for future manycore processors Parallel extension of MLton SML compiler and runtime Parallel extension of Concurrent ML – Lots of Concurrency! – Interact by sending messages over first-class channels 5 C send (c, v) v recv (c)
6
MultiMLton GC: Considerations Standard ML – functional PL with side-effects – Most objects are small and ephemeral Independent generational GC – # Mutations << # Reads Keep cost of reads to be low Minimize NUMA effects Run on non-cache coherent HW 6
7
MultiMLton GC: Design 7 Core Local Heap Core Local Heap Core Local Heap Core Local Heap Shared Heap Thread-local GC NUMA Awareness Circumvent cache-coherence issues
8
Invariant Preservation Read and write barriers for preserving invariants 8 Shared Heap r r Local Heap x x Target Source Exporting writes r := x Shared Heap r r Local Heap x x FWD Transitive closure of x Mutator needs read barriers!
9
Challenge Object reads are pervasive – RB overhead ∝ cost (RB) * frequency (RB) Read barrier optimization – Stacks and Registers never point to forwarded objects 9 20.1 % 15.3 % 21.3 % Mean Overhead ---------------------- Read barrier overhead (%)
10
Mutator and Forwarded Objects 10 # RB invocations # Encountered forwarded objects <0.00001 Eliminate read barriers altogether
11
RB Elimination Visibility Invariant – Mutator does not encounter forwarded objects Observation – No forwarded objects created ⇒ visibility invariant ⇒ No read barriers Exploit concurrency Procrastination! 11
12
Procrastination Shared Heap r1 Local Heap x1 T1T2 r2 x2 r1 := x1 r2 := x2 T T is running T T is suspended T T is blocked 12
13
Procrastination Shared Heap r1 Local Heap x1 r1 := x1 T1T2 r2 := x2 r2 x2 Delayed write list Control switches to T2 T T is running T T is suspended T T is blocked 13
14
Procrastination Shared Heap r1 Local Heap x1 T1T2 r2 x2 Delayed write list r1 := x1 r2 := x2 T T is running T T is suspended T T is blocked 14
15
Procrastination Shared Heap r1 Local Heap T1T2 r2 x2 Delayed write list r1 := x1 r2 := x2 x1 T T is running T T is suspended T T is blocked FWD 15 FWD
16
Procrastination Shared Heap r1 Local Heap T1T2 r2 x2 Delayed write list x1 Force local GC T T is running T T is suspended T T is blocked 16 r1 := x1 r2 := x2
17
Correctness Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock! 17 T1 T2 T T is running T T is suspended T T is blocked
18
Correctness 18 T1 Is Procrastination safe? – Yes. Forcing a local GC unblocks the threads. – No deadlocks or livelocks! T2T T is running T T is suspended T T is blocked Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock!
19
Correctness 19 T1 T2 Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock! T T is running T T is suspended T T is blocked Is Procrastination safe? – Yes. Forcing a local GC unblocks the threads. – No deadlocks or livelocks!
20
Efficacy (Procrastination) ∝ # Available runnable threads Is Procrastination alone enough? 20 M W1W1 W1W1 W1W1 F J Serial (low thread availability) Concurrent (high thread availability) With Procrastination, half of local major GCs were forced Eager exporting writes while preserving visibility invariant
21
Cleanliness A clean object closure can be lifted to the shared heap without breaking the visibility invariant 21 r := x inSharedHeap (r) inLocalHeap (x) && isClean (x) inLocalHeap (x) && isClean (x) Eager write (no Procrastination)
22
Cleanliness: Intuition 22 Shared Heap Local Heap x x lift (x) to shared heap
23
Shared Heap Local Heap Cleanliness: Intuition 23 x x FWD find all references to FWD
24
Shared Heap Local Heap Cleanliness: Intuition 24 x x Need to scan the entire local heap
25
Local Heap h h Shared Heap Cleanliness: Simpler question 25 x x FWD Do all references originate from heap region h? sizeof (h) << sizeof (local heap)
26
Local Heap h h Shared Heap Cleanliness: Simpler question 26 x x Only scan the heap region h. Heap session! sizeof (h) << sizeof (local heap)
27
Current session closed & new session opened – After an exporting write, a user-level context switch, a local GC Heap Sessions Source of an exporting write is often – Young – rarely referenced from outside the closure 27 Previous Session Current Session Free Local Heap SessionStart Frontier Young Objects Old Objects Old Objects Start
28
Current session closed & new session opened – After an exporting write, a user-level context switch, a local GC – SessionStart is moved to Frontier Heap Sessions Source of an exporting write is often – Young – rarely referenced from outside the closure 28 Average session size < 4KB Previous SessionFree Local Heap Frontier & SessionStart Start
29
Cleanliness: Eager exporting writes 29 A clean object closure – is fully contained within the current session – has no references from previous session Previous Session Current Session Free Local Heap X X Y Y Z Z r := x r r Shared Heap
30
Cleanliness: Eager exporting writes 30 A clean object closure – is fully contained within the current session – has no references from previous session Previous Session Current Session Free Local Heap X X Y Y Z Z r := x r r Shared Heap Walk and fix FWD
31
Avoid tracing current session? Many SML objects are tree-structured (List, Tree, etc,.) – Specialize for no pointers from outside the closure ∀ x’ ∊ transitive closure (x), ref_count (x) = 0 && ref_count (x’) = 1 31 Local Heap x(0) y(1) z(1) Eager exporting write – No current session tracing needed! No refs from outside – ref_count does not consider pointers from stack or registers
32
Cleanliness: Reference Count 32 Current Session X(0) Current Session X(1) Current Session X(LM) Current Session X(G) Prev Sess Zero One LocalMany Global Purpose – Track pointers from previous session to current session – Identify tree-structured object Does not track pointers from stack and registers – Reference count only triggered during object initialization and mutation
33
Bringing it all together ∀ x’ ∊ transitive closure (x), if max (ref_count (x’)) – One & ref_count (x) = 0 ⇒ Clean tree-structured ⇒ Session tracing not needed – LocalMany ⇒ Clean ⇒ Trace current session – Global ⇒ 1+ pointer from previous session ⇒ Procrastinate 33
34
Cleanliness: Tree-structured Closure 34 Previous Session Current Session x(0) y(1) z(1) T1 Local Heap Shared heap r := x r r current stack
35
Shared heap Cleanliness: Tree-structured Closure 35 Previous Session Current Session x x y y z z T1 current stack Local Heap r := x FWD r r Walk current stack
36
Shared heap Cleanliness: Tree-structured Closure 36 Previous Session Current Session x x y y z z T1 current stack Local Heap r := x r r No need to walk current session!
37
Shared heap Cleanliness: Tree-structured Closure 37 Previous Session Current Session x x y y z z T1 Local Heap r := x r r T2 Next stack FWD current stack
38
Shared heap Cleanliness: Tree-structured Closure 38 Previous Session Current Session x x y y z z T1 previous stack Local Heap r := x r r T2 current stack Context Switch Walk target stack
39
Cleanliness: Object graph 39 Previous Session Current Session x(0) y ( LM ) z(1) current stack Local Heap Shared heap r := x r r a a
40
Shared heap Cleanliness: Object graph 40 Previous Session Current Session x x y y z z current stack Local Heap r := x r r a a FWD Walk current stack Walk current session
41
Shared heap Cleanliness: Object graph 41 Previous Session Current Session x x y y z z current stack Local Heap r := x r r a a Walk current stack Walk current session
42
Cleanliness: Global Reference 42 Previous Session Current Session x(0) y(1) z(G) T1 current stack Local Heap Shared heap r := x r r a a
43
Cleanliness: Global Reference 43 Previous Session Current Session x(0) y(1) z(G) T1 current stack Local Heap Shared heap r := x r r a a Procrastinate
44
Immutable Objects Specialize exporting writes If immutable object in previous session – Copy to shared heap Immutable objects in SML do not have identity – Original object unmodified Avoid space leaks – Treat large immutable objects as mutable 44
45
Cleanliness: Summary Cleanliness allows eager exporting writes while preserving visibility invariant With Procrastination + Cleanliness, <1% of local GCs were forced Additional niceties – Completely dynamic Portable – Does not impose any restriction on the GC strategy 45
46
Evaluation 46 Variants – RB- : TLC with Procrastination and Cleanliness – RB+ : TLC with read barriers Sansom’s dual-mode GC – Cheney’s 2-space copying collection Jonker’s sliding mark-compacting – Generational, 2 generations, No aging Target Architectures: – 16-core AMD Opteron server (NUMA) – 48-core Intel SCC (non-cache coherent) – 864-core Azul Vega3
47
Results Speedup: At 3X min heap size, RB- faster than RB+ – AMD 32% (2X faster than STW collector) – SCC 20% – AZUL 30% Concurrency – During exporting write, 8 runnable user-level threads/core! 47
48
Cleanliness Impact RB- MU- : RB- GC ignoring mutability for Cleanliness RB- CL- :RB- GC ignoring Cleanliness (Only Procrastination) 48 Avg. slowdown -------------------- 11.4% 28.2% 31.7%
49
Conclusion Eliminate the need for read barriers by preserving the visibility invariant – Procrastination: Exploit concurrency for delaying exporting writes – Cleanliness: Exploit generational property for eagerly perform exporting writes 49
50
Questions? http://multimlton.cs.purdue.edu 50
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.