Fence Scoping Changhui Lin †, Vijay Nagarajan*, Rajiv Gupta † † University of California, Riverside * University of Edinburgh.


1 Fence Scoping Changhui Lin †, Vijay Nagarajan*, Rajiv Gupta † † University of California, Riverside * University of Edinburgh

2 Reordering in Uniprocessors
Memory operations are reordered to improve performance:
  Hardware (e.g., store buffer, reorder buffer)
  Compiler (e.g., code motion, caching a value in a register)
No harm is done as long as dependences are respected.
(Figure: a1: St x; a2: Ld y.)

3 Reordering in Multiprocessors
Initially x=y=0
P1: a1: x = 1; a2: y = 1;
P2: b1: Ry = y; b2: Rx = x;
Interleavings that respect program order:
  a1; a2; b1; b2 gives (Rx=1, Ry=1)
  a1; b1; b2; a2 gives (Rx=1, Ry=0)
  b1; b2; a1; a2 gives (Rx=0, Ry=0)
Intuitively, y=1 ⇒ x=1, so Ry=1 ⇒ Rx=1.
(Rx=0, Ry=1): counter-intuitive program behavior
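To make the "intuitive" outcomes concrete, the following sketch enumerates every interleaving of the two processors that preserves program order (that is, every sequentially consistent execution) and collects the resulting (Rx, Ry) pairs. The function name `sc_outcomes` and the schedule encoding are illustrative assumptions, not part of the talk.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Outcome of one execution: the values read into (Rx, Ry).
using Outcome = std::pair<int, int>;

// Enumerate all interleavings of P1 = {a1, a2} and P2 = {b1, b2} that keep
// each processor's program order, and record what P2 reads in each.
std::set<Outcome> sc_outcomes() {
    std::set<Outcome> seen;
    // Each schedule says which processor (1 or 2) takes each of the 4 steps.
    const std::vector<std::string> schedules = {
        "1122", "1212", "1221", "2112", "2121", "2211"};
    for (const std::string& s : schedules) {
        int x = 0, y = 0, rx = -1, ry = -1;
        int pc1 = 0, pc2 = 0;  // next instruction index on each processor
        for (char who : s) {
            if (who == '1') {
                if (pc1++ == 0) x = 1;   // a1: x = 1
                else            y = 1;   // a2: y = 1
            } else {
                if (pc2++ == 0) ry = y;  // b1: Ry = y
                else            rx = x;  // b2: Rx = x
            }
        }
        seen.insert({rx, ry});
    }
    return seen;
}
```

Under this model only three outcomes are reachable, and (Rx=0, Ry=1) is not among them; observing it on real hardware therefore implies that some pair of operations was reordered.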

4 Reordering in Multiprocessors
Initially p=NULL, flag = false
P1: p = new A(…); flag = true;
P2: if (flag) a = p->var;
flag is supposed to be set after p is allocated. If the two stores on P1 are reordered, P2 can dereference p while it is still NULL: counter-intuitive program behavior.

5 Fence Instructions
Memory Consistency Models
  Specify what reordering is allowed
  e.g., SC, TSO (x86, SPARC), RMO (ARM, PowerPC)
Fence Instructions (Fences/Memory barriers)
  Selectively override the default relaxed memory order
  Order memory operations before and after the fence
P1: p = new A(…); FENCE; flag = true;

6 Fence Instructions
Memory Consistency Models
  Specify what reordering is allowed
  e.g., SC, TSO (x86, SPARC), RMO (ARM, PowerPC)
Fence Instructions (Fences/Memory barriers)
  Selectively override the default relaxed memory order
  Order memory operations before and after the fence
  Inevitable: needed when building concurrent implementations (e.g., mutual exclusion, queues) [Attiya et al., POPL’11]
  Expensive: Cilk-5’s THE protocol spends 50% of its time executing a memory fence [Frigo et al., PLDI’98]
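As a rough illustration of the p/flag example in portable code, the sketch below uses C++ `std::atomic_thread_fence` in place of the slide's FENCE. The struct `A`, the value 42, and the function names `producer` and `consumer` are assumptions made for the example. The C++ memory model also requires an acquire fence on the reading side, which the slide does not show.

```cpp
#include <atomic>
#include <cassert>

struct A { int var; };

A* p = nullptr;                  // plain data, as on the slide
std::atomic<bool> flag{false};

// P1: allocate the object, then publish it by setting the flag.
void producer() {
    p = new A{42};               // never freed in this sketch
    // Stands in for FENCE: the store to p is ordered before the store to flag.
    std::atomic_thread_fence(std::memory_order_release);
    flag.store(true, std::memory_order_relaxed);
}

// P2: returns p->var if the flag has been observed, otherwise -1.
int consumer() {
    if (flag.load(std::memory_order_relaxed)) {
        // Pairs with the release fence in producer().
        std::atomic_thread_fence(std::memory_order_acquire);
        return p->var;
    }
    return -1;
}
```

In real use `producer` and `consumer` would run on different threads; without the two fences the consumer could observe `flag == true` and still read a stale `p`.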

7 Motivation
Not all memory orderings enforced by fences are necessary.
Fences are usually used to order some specific memory operations.
Programmers know better how a fence is used, and this knowledge can be conveyed to the hardware.
(Figure labels: Process Data Control Data Access Concurrent algorithm)

8 Scoped Fence (S-Fence)
An S-Fence only orders memory operations in its scope.
Scope definition: class scope, set scope
Bridges the gap between programmers’ intention and hardware execution:
  Programmers specify the scope
  Scope information is conveyed to hardware, imposing fewer ordering constraints
Lightweight hardware and compiler support

9 Scoped Fence (S-Fence)
Programming support:
  S-FENCE                          global scope
  S-FENCE[class]                   class scope
  S-FENCE[set, {var1, var2, …}]    set scope

10 Work-Stealing Queue Algorithm
Chase-Lev lock-free concurrent work-stealing queue
   1 void put (TASK task){
   2   tail = TAIL;
   3   wsq[tail] = task;
   4   FENCE // store-store
   5   TAIL = tail+1;
   6 }
   7 TASK take ( ){
   8   tail = TAIL - 1;
   9   TAIL = tail;
  10   FENCE // store-load
  11   head = HEAD;
  12   if (tail<head){
  13     TAIL = head;
  14     return EMPTY;
  15   }
     … …
  24   return task;
  25 }
  26 TASK steal ( ){
  27   head = HEAD;
  28   tail = TAIL;
     … …
  35   return task;
  36 }
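The listing above elides parts of `take` and `steal`. The sketch below is not the slide's code: it is a simplified, fixed-capacity rendering of the commonly published Chase-Lev scheme, with the elided parts filled in from that general formulation as an assumption. The sentinels `EMPTY` and `ABORT`, the capacity `CAP`, the `int` task type, and the use of sequentially consistent C++ fences as stand-ins for FENCE are all choices made for illustration.

```cpp
#include <atomic>
#include <cassert>

constexpr int EMPTY = -1;   // hypothetical sentinel: queue is empty
constexpr int ABORT = -2;   // hypothetical sentinel: steal lost a race
constexpr long CAP = 64;    // fixed capacity; no resizing or overflow check here

struct WorkStealingQueue {
    std::atomic<long> HEAD{0};
    std::atomic<long> TAIL{0};
    std::atomic<int> wsq[CAP];

    // Owner thread only.
    void put(int task) {
        long tail = TAIL.load(std::memory_order_relaxed);
        wsq[tail % CAP].store(task, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // slide line 4
        TAIL.store(tail + 1, std::memory_order_relaxed);
    }

    // Owner thread only.
    int take() {
        long tail = TAIL.load(std::memory_order_relaxed) - 1;
        TAIL.store(tail, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);  // slide line 10
        long head = HEAD.load(std::memory_order_relaxed);
        if (tail < head) {               // queue was empty: undo the decrement
            TAIL.store(head, std::memory_order_relaxed);
            return EMPTY;
        }
        int task = wsq[tail % CAP].load(std::memory_order_relaxed);
        if (tail > head) return task;    // other elements remain: no conflict
        // Last element: compete with thieves for it.
        long expected = head;
        if (!HEAD.compare_exchange_strong(expected, head + 1)) task = EMPTY;
        TAIL.store(head + 1, std::memory_order_relaxed);
        return task;
    }

    // Any thread.
    int steal() {
        long head = HEAD.load(std::memory_order_acquire);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        long tail = TAIL.load(std::memory_order_acquire);
        if (head >= tail) return EMPTY;
        int task = wsq[head % CAP].load(std::memory_order_relaxed);
        long expected = head;
        if (!HEAD.compare_exchange_strong(expected, head + 1)) return ABORT;
        return task;
    }
};
```

The owner works at the TAIL end and thieves at the HEAD end, so the two fences only need to order accesses to `HEAD`, `TAIL`, and `wsq`; this is the property that a class-scoped fence is meant to exploit.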

11 Parallel Spanning Tree
(a)
  1 task = wsq.take();
  2 for (each neighbor task’ of task)
  3   if (task’ is not processed){
  4     process(task’);
  5     wsq.put(task’) ;
  6   }
(b) Resulting sequence of memory operations (figure markers ①, ②, ③):
  8 tail = TAIL - 1;  9 TAIL = tail;  10 FENCE  11 head = HEAD; ……
  color[task’] = label; parent[task’] = task;
  2 tail = TAIL;  3 wsq[tail] = task’;  4 FENCE  5 TAIL = tail + 1;
Fence used: FENCE

12 Class Scope
S-FENCE[class]    class scope
Uses classes in object-oriented languages to illustrate the concept.
Constrains a fence to the class in which it is used (encapsulation).
Intuition: member functions operate on the data members of the class.

13 Class Scope
S-FENCE[class]    class scope
  class A {
    B b;
    int m1, m2;
    void funcA() {
      m1 = val1;
      b.funcB();
      S-FENCE1[class]
      m2 = val2;
    }
  }
  class B {
    int n1, n2;
    void funcB() {
      n1 = val3;
      S-FENCE2[class]
      n2 = val4;
    }
  }
Scope of S-FENCE1: m1, m2, n1, n2
Scope of S-FENCE2: n1, n2

14 Class Scope Semantics More details in paper

15 Parallel Spanning Tree
(a)
  1 task = wsq.take();
  2 for (each neighbor task’ of task)
  3   if (task’ is not processed){
  4     process(task’);
  5     wsq.put(task’) ;
  6   }
(b) Resulting sequence of memory operations (figure markers ①, ②, ③):
  8 tail = TAIL - 1;  9 TAIL = tail;  10 FENCE  11 head = HEAD; ……
  color[task’] = label; parent[task’] = task;
  2 tail = TAIL;  3 wsq[tail] = task’;  4 FENCE  5 TAIL = tail + 1;
Fence used: S-FENCE[class]

16 Compiler Support
ISA extension:
  class-fence
  fs_start: start of a fence scope
  fs_end: end of a fence scope
The compiler uses fs_start and fs_end to bracket functions containing fences, informing the hardware so that it can mark memory operations properly.

17 Hardware Support
Fence Scope Bits (FSB)
  Each entry of the ROB and the store buffer is associated with FSB
  FSB flag whether a memory operation is in the scope of some fence
Decoding: memory operations in the scope are marked via FSB
Fence issue: check the entries for the current scope
(Figure: reorder buffer and store buffer, each entry extended with Fence Scope Bits.)


19 Hardware Support
Setting Fence Bits
FSS: stack to record scope
(Figure: instructions I0 to I7 with FSB columns 0 to 3; fs_start a, fs_start b, fs_end b, fs_end a delimit an outer scope a and an inner scope b.)


21 Hardware Support
Setting Fence Bits
FSS: stack to record scope
Issue a fence by checking FSB on the current scope
(Figure: instructions I0 to I7 with FSB columns 0 to 3; fs_start a, fs_start b, fs_end b, fs_end a delimit an outer scope a and an inner scope b.)

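The interplay of the fence scope stack (FSS) and the fence scope bits (FSB) can be approximated in software. The toy model below is a sketch under stated assumptions, not the hardware design from the talk: it assumes one FSB bit per nesting level, marks each decoded memory operation with the bits of all scopes currently open, and never retires operations. The names `ScopeTracker`, `decode_mem_op`, and `class_fence_waits_for` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ScopeTracker {
    std::vector<int> fss;            // FSS: bit index of each open scope
    std::vector<std::uint8_t> fsb;   // FSB of every in-flight memory operation

    // fs_start: open a scope; its bit index is its nesting depth (assumption).
    void fs_start() { fss.push_back(static_cast<int>(fss.size())); }
    // fs_end: close the innermost scope.
    void fs_end() { fss.pop_back(); }

    // Decode: mark the operation with the bit of every scope on the stack.
    void decode_mem_op() {
        std::uint8_t bits = 0;
        for (int b : fss) bits |= static_cast<std::uint8_t>(1u << b);
        fsb.push_back(bits);
    }

    // Fence issue: count the in-flight operations the fence must wait for.
    // With no open scope it behaves like a traditional (global) fence.
    std::size_t class_fence_waits_for() const {
        if (fss.empty()) return fsb.size();
        std::size_t n = 0;
        for (std::uint8_t bits : fsb)
            if (bits & (1u << fss.back())) ++n;
        return n;
    }
};
```

Replaying the class A / class B example from the Class Scope slide: after one store outside any scope, `fs_start` (funcA), the store to m1, `fs_start` (funcB), and the store to n1, a class fence in funcB waits for one operation. After the store to n2 and `fs_end`, the fence in funcA waits for three (m1, n1, n2), while the out-of-scope store is ignored in both cases.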

23 Why Does S-Fence Perform Better?
Instruction sequence: St A; St X; Ld Y; FENCE; St B (St A is a cache miss)
Traditional fence: stalls until the store buffer is drained and the fence is issued.
(Figure: timelines of the store buffer (SB) and reorder buffer (ROB) for the traditional fence and the scoped fence, with stall periods marked.)

24 Set Scope
Dekker algorithm
Initially flag1 = flag2 = 0
P1: flag1 = 1; FENCE; if (flag2 == 0) critical section
P2: flag2 = 1; FENCE; if (flag1 == 0) critical section
Other memory accesses shown in the figure: m1 = …, m2 = …

25 Set Scope
Dekker algorithm
Initially flag1 = flag2 = 0
P1: flag1 = 1; S-FENCE[set, {flag1, flag2}]; if (flag2 == 0) critical section
P2: flag2 = 1; S-FENCE …; if (flag1 == 0) critical section
Other memory accesses shown in the figure: m1 = …, m2 = …
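Standard C++ has no set-scoped fence, so the sketch below writes the Dekker-style entry check with a full `std::atomic_thread_fence` and marks in comments which accesses a set scope {flag1, flag2} would cover. The function names `p1_try_enter` and `p2_try_enter`, the stored values, and the position of the `m1`/`m2` stores before the flag stores are assumptions for illustration.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> flag1{0}, flag2{0};
std::atomic<int> m1{0}, m2{0};   // unrelated data, outside the set {flag1, flag2}

// P1: returns true if P1 may enter the critical section.
bool p1_try_enter() {
    m1.store(7, std::memory_order_relaxed);     // a set-scoped fence could ignore this
    flag1.store(1, std::memory_order_relaxed);  // in the set
    // Full fence: orders ALL earlier accesses, including the store to m1.
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return flag2.load(std::memory_order_relaxed) == 0;  // in the set
}

// P2: the mirror image of P1.
bool p2_try_enter() {
    m2.store(9, std::memory_order_relaxed);
    flag2.store(1, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return flag1.load(std::memory_order_relaxed) == 0;
}
```

With the fences in place the two functions cannot both return true when run concurrently. The cost the talk targets is that the full fence also waits for the stores to `m1` and `m2`, which mutual exclusion does not require.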

26 Set Scope
S-FENCE[set, {var1, var2, …}]    set scope
Only orders memory accesses to {var1, var2, …}.
Compiler and hardware support:
  Flag memory accesses to the specified variables
  Set fence scope bits in hardware for flagged memory accesses
For simplicity, we do not differentiate memory accesses to different sets.

27 Experimental Evaluation
Cycle-accurate simulation (SESC)
  Integrated scoped fence logic
  RMO memory model
Benchmarks
  pst: parallel spanning tree (work-stealing queue, class scope)
  ptc: parallel transitive closure (work-stealing queue, class scope)
  barnes: from SPLASH2 (fences inserted for SC, set scope)
  radiosity: from SPLASH2 (fences inserted for SC, set scope)

28 Experimental Evaluation
Traditional fence (T) vs. scoped fence (S)
(Figure: fence stall reduction; annotations ~40-50%, ~13%, ~50%; benchmarks grouped by class scope and set scope.)

29 Conclusion
Introduced the concept of fence scope
Proposed class scope and set scope
OpenCL 2.0 (sub-group, work-group, device, system)
Lightweight compiler and hardware support
No change in inter-processor communication
Fence scope should be implemented in some form!


