Presentation is loading. Please wait.

Presentation is loading. Please wait.

Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan.

Similar presentations


Presentation on theme: "Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan."— Presentation transcript:

1 Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan

2 Big Picture 2 Lightweight user-level threads Scheduler 1 t1 t2 tn Lots of concurrency! Core 1 Core n Core 2 Heap

3 Big Picture 3 Expendable resource? Big Picture 3 Scheduler 1 t1 t2 tn Lots of concurrency! Heap

4 Big Picture 4 Expendable resource? Big Picture 4 Scheduler 1 t1 t2 tn Lots of concurrency! Heap Exploit program concurrency to eliminate read barriers from thread-local collectors GC Operation Alleviate MM cost?

5 MultiMLton Goals – Safety, Scalability, ready for future manycore processors Parallel extension of MLton SML compiler and runtime Parallel extension of Concurrent ML – Lots of Concurrency! – Interact by sending messages over first-class channels 5 C send (c, v) v  recv (c)

6 MultiMLton GC: Considerations Standard ML – functional PL with side-effects – Most objects are small and ephemeral Independent generational GC – # Mutations << # Reads Keep cost of reads to be low Minimize NUMA effects Run on non-cache coherent HW 6

7 MultiMLton GC: Design 7 Core Local Heap Core Local Heap Core Local Heap Core Local Heap Shared Heap Thread-local GC NUMA Awareness Circumvent cache-coherence issues

8 Invariant Preservation Read and write barriers for preserving invariants 8 Shared Heap r r Local Heap x x Target Source Exporting writes r := x Shared Heap r r Local Heap x x FWD Transitive closure of x Mutator needs read barriers!

9 Challenge Object reads are pervasive – RB overhead ∝ cost (RB) * frequency (RB) Read barrier optimization – Stacks and Registers never point to forwarded objects 9 20.1 % 15.3 % 21.3 % Mean Overhead ---------------------- Read barrier overhead (%)

10 Mutator and Forwarded Objects 10 # RB invocations # Encountered forwarded objects <0.00001 Eliminate read barriers altogether

11 RB Elimination Visibility Invariant – Mutator does not encounter forwarded objects Observation – No forwarded objects created ⇒ visibility invariant ⇒ No read barriers Exploit concurrency  Procrastination! 11

12 Procrastination Shared Heap r1 Local Heap x1 T1T2 r2 x2  r1 := x1 r2 := x2 T  T is running T  T is suspended T  T is blocked 12

13 Procrastination Shared Heap r1 Local Heap x1 r1 := x1 T1T2  r2 := x2 r2 x2 Delayed write list  Control switches to T2 T  T is running T  T is suspended T  T is blocked 13

14 Procrastination Shared Heap r1 Local Heap x1 T1T2 r2 x2 Delayed write list  r1 := x1 r2 := x2 T  T is running T  T is suspended T  T is blocked 14

15 Procrastination Shared Heap r1 Local Heap T1T2 r2 x2 Delayed write list  r1 := x1 r2 := x2 x1 T  T is running T  T is suspended T  T is blocked FWD 15 FWD

16 Procrastination Shared Heap r1 Local Heap T1T2 r2 x2 Delayed write list  x1 Force local GC T  T is running T  T is suspended T  T is blocked 16  r1 := x1 r2 := x2

17 Correctness Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock! 17 T1 T2 T  T is running T  T is suspended T  T is blocked

18 Correctness 18 T1 Is Procrastination safe? – Yes. Forcing a local GC unblocks the threads. – No deadlocks or livelocks! T2T  T is running T  T is suspended T  T is blocked Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock!

19 Correctness 19 T1 T2 Does Procrastination introduce deadlocks? – Threads can be procrastinated while holding a lock! T  T is running T  T is suspended T  T is blocked Is Procrastination safe? – Yes. Forcing a local GC unblocks the threads. – No deadlocks or livelocks!

20 Efficacy (Procrastination) ∝ # Available runnable threads Is Procrastination alone enough? 20 M W1W1 W1W1 W1W1 F J Serial (low thread availability) Concurrent (high thread availability) With Procrastination, half of local major GCs were forced Eager exporting writes while preserving visibility invariant

21 Cleanliness A clean object closure can be lifted to the shared heap without breaking the visibility invariant 21 r := x inSharedHeap (r) inLocalHeap (x) && isClean (x) inLocalHeap (x) && isClean (x) Eager write (no Procrastination)

22 Cleanliness: Intuition 22 Shared Heap Local Heap x x lift (x) to shared heap

23 Shared Heap Local Heap Cleanliness: Intuition 23 x x FWD find all references to FWD

24 Shared Heap Local Heap Cleanliness: Intuition 24 x x Need to scan the entire local heap

25 Local Heap h h Shared Heap Cleanliness: Simpler question 25 x x FWD Do all references originate from heap region h? sizeof (h) << sizeof (local heap)

26 Local Heap h h Shared Heap Cleanliness: Simpler question 26 x x Only scan the heap region h. Heap session! sizeof (h) << sizeof (local heap)

27 Current session closed & new session opened – After an exporting write, a user-level context switch, a local GC Heap Sessions Source of an exporting write is often – Young – rarely referenced from outside the closure 27 Previous Session Current Session Free Local Heap SessionStart Frontier Young Objects Old Objects Old Objects Start

28 Current session closed & new session opened – After an exporting write, a user-level context switch, a local GC – SessionStart is moved to Frontier Heap Sessions Source of an exporting write is often – Young – rarely referenced from outside the closure 28 Average session size < 4KB Previous SessionFree Local Heap Frontier & SessionStart Start

29 Cleanliness: Eager exporting writes 29 A clean object closure – is fully contained within the current session – has no references from previous session Previous Session Current Session Free Local Heap X X Y Y Z Z r := x r r Shared Heap

30 Cleanliness: Eager exporting writes 30 A clean object closure – is fully contained within the current session – has no references from previous session Previous Session Current Session Free Local Heap X X Y Y Z Z r := x r r Shared Heap Walk and fix FWD

31 Avoid tracing current session? Many SML objects are tree-structured (List, Tree, etc,.) – Specialize for no pointers from outside the closure ∀ x’ ∊ transitive closure (x), ref_count (x) = 0 && ref_count (x’) = 1 31 Local Heap x(0) y(1) z(1) Eager exporting write – No current session tracing needed! No refs from outside – ref_count does not consider pointers from stack or registers

32 Cleanliness: Reference Count 32 Current Session X(0) Current Session X(1) Current Session X(LM) Current Session X(G) Prev Sess Zero One LocalMany Global Purpose – Track pointers from previous session to current session – Identify tree-structured object Does not track pointers from stack and registers – Reference count only triggered during object initialization and mutation

33 Bringing it all together ∀ x’ ∊ transitive closure (x), if max (ref_count (x’)) – One & ref_count (x) = 0 ⇒ Clean tree-structured ⇒ Session tracing not needed – LocalMany ⇒ Clean ⇒ Trace current session – Global ⇒ 1+ pointer from previous session ⇒ Procrastinate 33

34 Cleanliness: Tree-structured Closure 34 Previous Session Current Session x(0) y(1) z(1) T1 Local Heap Shared heap r := x r r current stack

35 Shared heap Cleanliness: Tree-structured Closure 35 Previous Session Current Session x x y y z z T1 current stack Local Heap r := x FWD r r Walk current stack

36 Shared heap Cleanliness: Tree-structured Closure 36 Previous Session Current Session x x y y z z T1 current stack Local Heap r := x r r No need to walk current session!

37 Shared heap Cleanliness: Tree-structured Closure 37 Previous Session Current Session x x y y z z T1 Local Heap r := x r r T2 Next stack FWD current stack

38 Shared heap Cleanliness: Tree-structured Closure 38 Previous Session Current Session x x y y z z T1 previous stack Local Heap r := x r r T2 current stack Context Switch Walk target stack

39 Cleanliness: Object graph 39 Previous Session Current Session x(0) y ( LM ) z(1) current stack Local Heap Shared heap r := x r r a a

40 Shared heap Cleanliness: Object graph 40 Previous Session Current Session x x y y z z current stack Local Heap r := x r r a a FWD Walk current stack Walk current session

41 Shared heap Cleanliness: Object graph 41 Previous Session Current Session x x y y z z current stack Local Heap r := x r r a a Walk current stack Walk current session

42 Cleanliness: Global Reference 42 Previous Session Current Session x(0) y(1) z(G) T1 current stack Local Heap Shared heap r := x r r a a

43 Cleanliness: Global Reference 43 Previous Session Current Session x(0) y(1) z(G) T1 current stack Local Heap Shared heap r := x r r a a Procrastinate

44 Immutable Objects Specialize exporting writes If immutable object in previous session – Copy to shared heap Immutable objects in SML do not have identity – Original object unmodified Avoid space leaks – Treat large immutable objects as mutable 44

45 Cleanliness: Summary Cleanliness allows eager exporting writes while preserving visibility invariant With Procrastination + Cleanliness, <1% of local GCs were forced Additional niceties – Completely dynamic  Portable – Does not impose any restriction on the GC strategy 45

46 Evaluation 46 Variants – RB- : TLC with Procrastination and Cleanliness – RB+ : TLC with read barriers Sansom’s dual-mode GC – Cheney’s 2-space copying collection  Jonker’s sliding mark-compacting – Generational, 2 generations, No aging Target Architectures: – 16-core AMD Opteron server (NUMA) – 48-core Intel SCC (non-cache coherent) – 864-core Azul Vega3

47 Results Speedup: At 3X min heap size, RB- faster than RB+ – AMD 32% (2X faster than STW collector) – SCC 20% – AZUL 30% Concurrency – During exporting write, 8 runnable user-level threads/core! 47

48 Cleanliness Impact RB- MU- : RB- GC ignoring mutability for Cleanliness RB- CL- :RB- GC ignoring Cleanliness (Only Procrastination) 48 Avg. slowdown -------------------- 11.4% 28.2% 31.7%

49 Conclusion Eliminate the need for read barriers by preserving the visibility invariant – Procrastination: Exploit concurrency for delaying exporting writes – Cleanliness: Exploit generational property for eagerly perform exporting writes 49

50 Questions? http://multimlton.cs.purdue.edu 50


Download ppt "Eliminating Read Barriers through Procrastination and Cleanliness KC Sivaramakrishnan Lukasz Ziarek Suresh Jagannathan."

Similar presentations


Ads by Google