1
Barrier Synchronization
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
2
Simple Video Game
- Prepare frame for display by graphics coprocessor
- “Soft real-time” application
  - Need at least 35 frames/second
  - OK to mess up rarely
3
Simple Video Game
  while (true) {
    frame.prepare();
    frame.display();
  }
What about overlapping work?
- 1st thread displays frame
- 2nd thread prepares next frame
5
Two-Phase Rendering
Display thread:
  while (true) {
    if (phase) {
      frame[0].display();
    } else {
      frame[1].display();
    }
    phase = !phase;
  }
Preparation thread:
  while (true) {
    if (phase) {
      frame[1].prepare();
    } else {
      frame[0].prepare();
    }
    phase = !phase;
  }
The threads alternate between even phases and odd phases: in each phase one frame is displayed while the other is prepared, and the two frames swap roles from one phase to the next.
8
Synchronization Problems
- How do threads stay in phase?
- Too early? “We render no frame before its time”
- Too late? Recycle memory before frame is displayed
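One way the two rendering threads could stay in phase is to meet at a barrier at the end of every phase. The sketch below uses `java.util.concurrent.CyclicBarrier` from the Java standard library rather than a barrier from these slides. The class name `TwoPhaseRendering`, the integer "frames", and the helper methods are assumptions made for illustration only.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical demo: a frame is just an int, and "displaying" appends it to a log.
class TwoPhaseRendering {
    // Runs the given number of phases and returns the frames in display order.
    static String run(int phases) {
        int[] frame = new int[2];                    // two frame buffers
        StringBuilder shown = new StringBuilder();   // touched only by the display thread
        CyclicBarrier barrier = new CyclicBarrier(2);

        Thread preparer = new Thread(() -> {
            for (int p = 0; p < phases; p++) {
                frame[p % 2] = p;                    // prepare frame p in buffer p % 2
                await(barrier);                      // end of phase p
            }
        });
        Thread displayer = new Thread(() -> {
            for (int p = 0; p < phases; p++) {
                if (p > 0) {                         // nothing to show in the first phase
                    shown.append(frame[(p - 1) % 2]).append(' ');
                }
                await(barrier);                      // end of phase p
            }
        });
        preparer.start();
        displayer.start();
        join(preparer);
        join(displayer);
        return shown.toString();
    }

    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(5));   // frames 0 through 3, in order
    }
}
```

In each phase the two threads touch different buffers, and the barrier keeps the preparer from recycling a buffer before the displayer has finished reading it.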
9
Ideal Parallel Computation
[Figure: all threads work on phase 1, then all move on to phase 2 together.]
11
Real-Life Parallel Computation
[Figure: one thread stalls (“zzz…”) in phase 1 while another has already started phase 2. Uh-oh.]
13
Barrier Synchronization
[Figure: a barrier separates one phase from the next.]
No thread enters the next phase until every thread has left the previous one.
16
Why Do We Care?
- Mostly of interest to scientific & numeric computation
- Elsewhere: garbage collection
- Less common in systems programming
- Still an important topic
17
Duality
- Dual to mutual exclusion: include others, not exclude them
- Same implementation issues: interaction with caches …
  - Invalidation?
  - Local spinning?
18
Example: Parallel Prefix
before: a, b, c, d
after: a, a+b, a+b+c, a+b+c+d
19
Parallel Prefix
One thread per entry: a, b, c, d
20
Parallel Prefix: Phase 1
a, b, c, d becomes a, a+b, b+c, c+d
21
Parallel Prefix: Phase 2
a, a+b, b+c, c+d becomes a, a+b, a+b+c, a+b+c+d
22
Parallel Prefix
- N threads can compute the parallel prefix of N entries in log2 N rounds
- What if the system is asynchronous?
- This is why we need barriers
23
Prefix
  class Prefix extends Thread {
    private int[] a;     // array of input values
    private int i;       // thread index
    private Barrier b;   // shared barrier
    public Prefix(int[] a, Barrier b, int i) {
      // initialize fields (this. is required, since the parameters shadow the fields)
      this.a = a;
      this.b = b;
      this.i = i;
    }
28
Where Do the Barriers Go?
    public void run() {
      int d = 1, sum = 0;
      while (d < a.length) {   // a.length is N, the number of entries
        if (i >= d)
          sum = a[i-d];
        b.await();             // make sure everyone reads before anyone writes
        if (i >= d)            // without this guard a stale sum would be added again
          a[i] += sum;
        b.await();             // make sure everyone writes before anyone reads
        d = d * 2;
      }
    }
  }
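A runnable sketch of the parallel prefix computation, with one thread per entry and two barrier calls per round. It uses `java.util.concurrent.CyclicBarrier` from the standard library as the barrier, which is a substitution for the `Barrier` class on these slides. The names `PrefixDemo` and `prefixSums` are hypothetical.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical demo class: one thread per array entry, log2(N) rounds.
class PrefixDemo {
    static int[] prefixSums(int[] input) {
        int n = input.length;
        int[] a = input.clone();
        if (n == 0) return a;                        // CyclicBarrier needs at least one party
        CyclicBarrier b = new CyclicBarrier(n);
        Thread[] workers = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int i = t;                         // thread index
            workers[t] = new Thread(() -> {
                for (int d = 1; d < n; d *= 2) {
                    int sum = (i >= d) ? a[i - d] : 0;
                    await(b);                        // everyone reads before anyone writes
                    a[i] += sum;
                    await(b);                        // everyone writes before anyone reads
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return a;
    }

    private static void await(CyclicBarrier b) {
        try {
            b.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(prefixSums(new int[] {1, 2, 3, 4})));
    }
}
```

Every thread executes the same number of rounds and calls the barrier in every round, even when it has nothing to add, so that all N parties arrive each time.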
33
Barrier Implementations
- Cache coherence
  - Spin on locally-cached locations?
  - Spin on statically-defined locations?
- Latency
  - How many steps?
- Symmetry
  - Do all threads do the same thing?
34
Barriers
  public class Barrier {
    AtomicInteger count;   // number of threads not yet arrived
    int size;              // number of threads participating
    public Barrier(int n) {              // initialization
      count = new AtomicInteger(n);
      size = n;
    }
    public void await() {                // principal method
      if (count.getAndDecrement() == 1) {
        count.set(size);                 // if I'm last, reset fields for next time
      } else {
        while (count.get() != 0);        // otherwise, wait for everyone else
      }
    }
  }
What's wrong with this protocol?
42
Reuse
  Barrier b = new Barrier(n);
  while (mumble()) {
    work();      // do work
    b.await();   // synchronize
  }              // repeat
43
Barriers
What happens when the simple counter barrier is reused:
- A waiting thread spins in while (count.get() != 0), waiting for phase 1 to finish
- The last thread arrives: phase 1 is so over
- The last thread prepares for phase 2 with count.set(size), while the waiting thread is still asleep (ZZZZZ…)
- Uh-oh: the last thread is now waiting for phase 2 to finish, while the other thread is still waiting for phase 1 to finish
48
Basic Problem
- One thread “wraps around” to start phase 2
- While another thread is still waiting for phase 1
- One solution: always use two barriers
49
Sense-Reversing Barriers
  public class Barrier {
    AtomicInteger count;
    int size;
    volatile boolean sense = false;            // completed odd or even-numbered phase?
    threadSense = new ThreadLocal<Boolean>…    // store sense for next phase
    public void await() {
      boolean mySense = threadSense.get();     // get new sense determined by last phase
      if (count.getAndDecrement() == 1) {
        count.set(size);
        sense = mySense;                       // if I'm last, reverse sense for next time
      } else {
        while (sense != mySense) {}            // otherwise, wait for sense to flip
      }
      threadSense.set(!mySense);               // prepare sense for next phase
    }
  }
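A self-contained sketch of a sense-reversing barrier, together with a small reuse check. It assumes each thread's local sense starts out as the opposite of the shared sense, and it calls `Thread.yield()` inside the spin loop so the demo stays responsive when there are more threads than cores. The class name `SenseBarrier` and the `demo` method are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

class SenseBarrier {
    private final AtomicInteger count;
    private final int size;
    private volatile boolean sense = false;
    // Assumption: each thread's sense starts as the opposite of the shared sense.
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    SenseBarrier(int n) {
        count = new AtomicInteger(n);
        size = n;
    }

    void await() {
        boolean mySense = threadSense.get();
        if (count.getAndDecrement() == 1) {
            count.set(size);              // reset the counter before releasing anyone
            sense = mySense;              // last thread flips the shared sense
        } else {
            while (sense != mySense) {
                Thread.yield();           // spin until the sense flips
            }
        }
        threadSense.set(!mySense);        // prepare sense for next phase
    }

    // Reuses one barrier for several rounds; returns true if no thread ever
    // left round r before every thread had arrived at round r.
    static boolean demo(int n, int rounds) {
        SenseBarrier barrier = new SenseBarrier(n);
        AtomicIntegerArray arrived = new AtomicIntegerArray(n);
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int me = t;
            threads[t] = new Thread(() -> {
                for (int r = 1; r <= rounds; r++) {
                    arrived.set(me, r);
                    barrier.await();
                    for (int j = 0; j < n; j++) {
                        if (arrived.get(j) < r) ok.set(false);
                    }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return ok.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 100));
    }
}
```

Because waiters watch the sense field and not the counter, the last thread can reset the counter immediately, and a fast thread that wraps around into the next phase cannot confuse a slow one.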
56
Combining Tree Barriers
58
Combining Tree Barrier
  public class Node {
    AtomicInteger count;
    int size;
    Node parent;                            // parent barrier in tree
    volatile boolean sense;
    public void await() {…
      if (count.getAndDecrement() == 1) {   // am I last?
        if (parent != null) {
          parent.await();                   // proceed to parent barrier
        }
        count.set(size);                    // prepare for next phase
        sense = mySense;                    // notify others at this node
      } else {
        while (sense != mySense) {}         // I'm not last, so wait for notification
      }
    …}
  }
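A sketch of how the nodes might be wired into a complete combining tree barrier. Several details are assumptions, not taken from the slides: the thread count `n` is a power of `radix` with `n >= radix`, each thread passes its own index `me`, and the thread-local sense is flipped once per barrier call in the outer `await` so that a thread climbing several levels does not flip it more than once.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical class; n must be a power of radix and at least radix.
class TreeBarrier {
    private final int radix;
    private final Node[] leaf;
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    TreeBarrier(int n, int radix) {
        this.radix = radix;
        this.leaf = new Node[n / radix];
        int depth = 0;
        for (int m = n; m > 1; m /= radix) depth++;        // depth = log_radix(n)
        build(new Node(null), depth - 1, new int[] {0});
    }

    // Recursively creates the tree; nodes reached with depth 0 become leaves.
    private void build(Node node, int depth, int[] nextLeaf) {
        if (depth == 0) {
            leaf[nextLeaf[0]++] = node;
            return;
        }
        for (int c = 0; c < radix; c++) build(new Node(node), depth - 1, nextLeaf);
    }

    // me is the caller's thread index in 0..n-1 (hypothetical parameter).
    void await(int me) {
        boolean mySense = threadSense.get();
        leaf[me / radix].await(mySense);
        threadSense.set(!mySense);                         // flipped once per barrier call
    }

    private final class Node {
        final AtomicInteger count = new AtomicInteger(radix);
        final Node parent;
        volatile boolean sense = false;

        Node(Node parent) { this.parent = parent; }

        void await(boolean mySense) {
            if (count.getAndDecrement() == 1) {            // am I last at this node?
                if (parent != null) parent.await(mySense); // proceed to parent barrier
                count.set(radix);                          // prepare for next phase
                sense = mySense;                           // notify others at this node
            } else {
                while (sense != mySense) Thread.yield();   // wait for notification
            }
        }
    }

    // Returns true if no thread left round r before all threads arrived at round r.
    static boolean demo(int n, int radix, int rounds) {
        TreeBarrier barrier = new TreeBarrier(n, radix);
        AtomicIntegerArray arrived = new AtomicIntegerArray(n);
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int me = t;
            threads[t] = new Thread(() -> {
                for (int r = 1; r <= rounds; r++) {
                    arrived.set(me, r);
                    barrier.await(me);
                    for (int j = 0; j < n; j++) {
                        if (arrived.get(j) < r) ok.set(false);
                    }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return ok.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(8, 2, 50));
    }
}
```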
65
Combining Tree Barrier
- No sequential bottleneck
  - Parallel getAndDecrement() calls
- Low memory contention
  - Same reason
- Cache behavior
  - Local spinning on bus-based architecture
  - Not so good for NUMA
66
Remarks
- Everyone spins on sense field
  - Local spinning on bus-based architecture (good)
  - Network hot-spot on distributed architecture (bad)
- Not really scalable
67
Tournament Tree Barrier
- If tree nodes have fan-in 2
  - Don't need to call getAndDecrement()
  - Winner chosen statically
- At level i
  - If i-th bit of id is 0, move up
  - Otherwise keep back
68
Tournament Tree Barriers
[Figure: tournament tree with a root and a winner/loser pair at each node.]
- All flags blue
- Loser thread sets winner's flag
- Loser spins on own flag
- Winner spins on own flag
- Winner sees own flag, moves up, spins
- Bingo!
- Sense-reversing: next time use blue flags
76
Tournament Barrier
  class TBarrier {
    boolean flag;       // notifications delivered here
    TBarrier partner;   // other thread at same level
    TBarrier parent;    // parent (winner) or null (loser)
    boolean top;        // am I the root?
    …
  }
81
Tournament Barrier
  void await(boolean mySense) {       // sense is a parameter to save space…
    if (top) {                        // le root, c'est moi
      return;
    } else if (parent != null) {      // I am already a winner
      while (flag != mySense) {};     // wait for partner
      parent.await(mySense);          // synchronize upstairs
      partner.flag = mySense;         // inform partner
    } else {                          // natural-born loser
      partner.flag = mySense;         // tell partner I'm here
      while (flag != mySense) {};     // wait for notification from partner
    }
  }
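A sketch of one way to wire up the tournament tree so the `await` logic can actually run. The construction is an assumption made for illustration: `n` is a power of two, thread `id` owns one node per level, and it plays at level k only while the low k bits of its id are zero, so bit k decides winner versus loser. The class name `TournamentBarrier` and the `demo` method are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical class; n must be a power of two.
class TournamentBarrier {
    private static final class Node {
        volatile boolean flag;   // notifications delivered here
        Node partner;            // other thread's node at the same level
        Node parent;             // next level up (winner) or null (loser)
        boolean top;             // reached only by the overall winner, thread 0

        void await(boolean mySense) {
            if (top) {
                return;
            } else if (parent != null) {                   // winner at this level
                while (flag != mySense) Thread.yield();    // wait for partner
                parent.await(mySense);                     // synchronize upstairs
                partner.flag = mySense;                    // release partner
            } else {                                       // loser at this level
                partner.flag = mySense;                    // tell partner I'm here
                while (flag != mySense) Thread.yield();    // wait to be released
            }
        }
    }

    private final Node[] leaf;
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    TournamentBarrier(int n) {
        int levels = Integer.numberOfTrailingZeros(n);     // log2(n) for a power of two
        Node[][] node = new Node[n][levels + 1];           // node[id][level]; some stay unused
        for (Node[] row : node) {
            for (int k = 0; k <= levels; k++) row[k] = new Node();
        }
        for (int id = 0; id < n; id++) {
            // Thread id plays at level k only while its low k bits are all zero.
            for (int k = 0; k < levels && (id & ((1 << k) - 1)) == 0; k++) {
                if (((id >> k) & 1) == 0) {                // bit k is 0: winner, moves up
                    node[id][k].partner = node[id + (1 << k)][k];
                    node[id][k].parent = node[id][k + 1];
                } else {                                   // bit k is 1: loser, keeps back
                    node[id][k].partner = node[id - (1 << k)][k];
                }
            }
        }
        node[0][levels].top = true;
        leaf = new Node[n];
        for (int id = 0; id < n; id++) leaf[id] = node[id][0];
    }

    // me is the caller's thread index in 0..n-1 (hypothetical parameter).
    void await(int me) {
        boolean mySense = threadSense.get();
        leaf[me].await(mySense);
        threadSense.set(!mySense);
    }

    // Returns true if no thread left round r before all threads arrived at round r.
    static boolean demo(int n, int rounds) {
        TournamentBarrier barrier = new TournamentBarrier(n);
        AtomicIntegerArray arrived = new AtomicIntegerArray(n);
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int me = t;
            threads[t] = new Thread(() -> {
                for (int r = 1; r <= rounds; r++) {
                    arrived.set(me, r);
                    barrier.await(me);
                    for (int j = 0; j < n; j++) {
                        if (arrived.get(j) < r) ok.set(false);
                    }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return ok.get();
    }
}
```

Each flag has exactly one writer and one reader, which is why plain volatile writes suffice and no `getAndDecrement()` call is needed.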
90
Remarks
- No need for read-modify-write calls
- Each thread spins on fixed location
  - Good for bus-based architectures
  - Good for NUMA architectures
91
Dissemination Barrier
- At round i, thread A notifies thread A + 2^i (mod n)
- Requires log n rounds
92
Dissemination Barrier
[Figure: in successive rounds, each thread notifies the thread +1, +2, and +4 positions away.]
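A sketch of the dissemination pattern. To keep it short it uses one monotonically increasing counter per thread and round in place of sense-reversing flags, which is a simplification chosen for this example and not necessarily how the book implements it. The class name `DisseminationBarrier` and the index parameter `me` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical class; works for any n >= 1, using ceil(log2 n) rounds.
class DisseminationBarrier {
    private final int n;
    private final int rounds;
    // signals[i * rounds + r] counts notifications sent to thread i in round r.
    private final AtomicIntegerArray signals;
    private final ThreadLocal<Integer> phase = ThreadLocal.withInitial(() -> 0);

    DisseminationBarrier(int n) {
        this.n = n;
        int r = 0;
        while ((1 << r) < n) r++;                       // smallest r with 2^r >= n
        this.rounds = r;
        this.signals = new AtomicIntegerArray(n * rounds);
    }

    // me is the caller's thread index in 0..n-1.
    void await(int me) {
        int p = phase.get() + 1;                        // number of the phase being completed
        phase.set(p);
        for (int r = 0; r < rounds; r++) {
            int partner = (me + (1 << r)) % n;          // thread A notifies A + 2^r (mod n)
            signals.incrementAndGet(partner * rounds + r);
            while (signals.get(me * rounds + r) < p) {  // wait for my own round-r notification
                Thread.yield();
            }
        }
    }

    // Returns true if no thread left round r before all threads arrived at round r.
    static boolean demo(int n, int rounds) {
        DisseminationBarrier barrier = new DisseminationBarrier(n);
        AtomicIntegerArray arrived = new AtomicIntegerArray(n);
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int me = t;
            threads[t] = new Thread(() -> {
                for (int r = 1; r <= rounds; r++) {
                    arrived.set(me, r);
                    barrier.await(me);
                    for (int j = 0; j < n; j++) {
                        if (arrived.get(j) < r) ok.set(false);
                    }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return ok.get();
    }
}
```

The footprint is visible in the constructor: n threads times about log n rounds of notification slots.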
93
Remarks
- Great for lovers of combinatorics
- Not so much fun for those who track memory footprints
94
Ideas So Far
- Sense-reversing: reuse without reinitializing
- Combining tree: like counters, locks …
- Tournament tree: optimized combining tree
- Dissemination barrier: intellectually pleasing (matter of taste)
95
Which is best for Multicore?
On a cache-coherent multicore chip: none of the above… Use a static tree barrier!
96
Static Tree Barrier
[Figure: tree of nodes, each holding a counter.]
- Counter in parent: number of children that have not arrived
- All spin on cached copy of Done flag
- All detect change in Done flag
We describe the static tree using counters in the nodes, but one could instead use a simple array of locations, each written by a child and all spun on (repeatedly read) by the parent.
98
Remarks
- Very little cache traffic
- Minimal space overhead
- Can be laid out optimally on NUMA
  - Instead of Done bit, notification must be performed by passing sense down the static tree
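A sketch of a static tree barrier for a cache-coherent machine. Each thread owns one node holding a counter of children that have not yet arrived, and a single shared `sense` field plays the role of the Done flag. The binary heap-style layout (node i has parent (i-1)/2) and the names `StaticTreeBarrier` and `demo` are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical class; works for any n >= 1.
class StaticTreeBarrier {
    private volatile boolean sense = false;     // the shared "Done" flag
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);
    private final int[] children;               // number of children of each node
    private final AtomicInteger[] childCount;   // children that have not arrived yet

    StaticTreeBarrier(int n) {
        children = new int[n];
        childCount = new AtomicInteger[n];
        for (int i = 0; i < n; i++) {
            int c = 0;
            if (2 * i + 1 < n) c++;             // left child exists
            if (2 * i + 2 < n) c++;             // right child exists
            children[i] = c;
            childCount[i] = new AtomicInteger(c);
        }
    }

    // me is the caller's thread index in 0..n-1; thread 0 is the root.
    void await(int me) {
        boolean mySense = threadSense.get();
        while (childCount[me].get() > 0) Thread.yield();   // wait for my children
        childCount[me].set(children[me]);                  // reset for next phase
        if (me != 0) {
            childCount[(me - 1) / 2].getAndDecrement();    // tell my parent I arrived
            while (sense != mySense) Thread.yield();       // spin on the shared flag
        } else {
            sense = mySense;                               // root: everyone has arrived
        }
        threadSense.set(!mySense);
    }

    // Returns true if no thread left round r before all threads arrived at round r.
    static boolean demo(int n, int rounds) {
        StaticTreeBarrier barrier = new StaticTreeBarrier(n);
        AtomicIntegerArray arrived = new AtomicIntegerArray(n);
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] threads = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int me = t;
            threads[t] = new Thread(() -> {
                for (int r = 1; r <= rounds; r++) {
                    arrived.set(me, r);
                    barrier.await(me);
                    for (int j = 0; j < n; j++) {
                        if (arrived.get(j) < r) ok.set(false);
                    }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            try { t.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return ok.get();
    }
}
```

A node resets its own counter before notifying its parent, so a child that races ahead into the next phase can never decrement a counter that has not yet been reset.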
99
So…How will we make use of multicores?
Back to Amdahl’s Law:
Speedup = 1 / (ParallelPart/N + SequentialPart)
- Pay for N = 8 cores
- SequentialPart = 25%
- Speedup = only 2.9 times!
Must parallelize applications on a very fine grain!
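The arithmetic behind the 2.9 figure, as a small sketch: with SequentialPart = 0.25 and N = 8, the speedup is 1 / (0.75/8 + 0.25) = 1 / 0.34375, roughly 2.9. The class and method names below are hypothetical.

```java
// Hypothetical helper that evaluates Amdahl's Law as stated above.
class Amdahl {
    // sequentialPart is the fraction of the work that cannot be parallelized (0 to 1).
    static double speedup(double sequentialPart, int n) {
        double parallelPart = 1.0 - sequentialPart;
        return 1.0 / (parallelPart / n + sequentialPart);
    }

    public static void main(String[] args) {
        System.out.println(speedup(0.25, 8));          // roughly 2.9
        System.out.println(speedup(0.25, 1_000_000));  // approaches 1 / 0.25 = 4
    }
}
```

Even with an unlimited number of cores, a 25% sequential part caps the speedup at 4, which is why shrinking the sequential part matters more than adding cores.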
100
Need Fine-Grained Locking
The reason we get only 2.9 speedup:
[Figure: coarse-grained vs. fine-grained locking; in both, 25% of the work is shared and 75% is unshared.]
101
Traditional Scaling Process
[Figure: speedups of 1.8x, 3.6x, and 7x over time for unchanged user code on a traditional uniprocessor, driven by Moore’s law.]
Recall the traditional scaling process for software: write it once, trust Intel to make the CPU faster to improve performance.
102
Multicore Scaling Process
[Figure: speedups of 1.8x, 3.6x, and 7x for user code on multicore.]
With multicores, we will have to parallelize the code to make software faster, and we cannot do this automatically (except in a limited way on the level of individual instructions). As noted, not so simple…
103
Real-World Scaling Process
[Figure: speedups of only 1.8x, 2x, and 2.9x for user code on multicore.]
This is because splitting the application up to utilize the cores is not simple, and coordination among the various code parts requires care. Parallelization and synchronization will require great care…
104
Future: Smoother Scaling …
[Figure: speedups of 1.8x, 3.6x, and 7x for user code on multicore, reached through abstractions (like TM).]
105
Multicore Programming
We are at the dawn of a new era…
- Still a lot of research and development to be done
- And a body of multicore programming experience to be accumulated by us all
© 2006 Herlihy & Shavit
106
Thanks for Attending! Good luck with your programming…
If you have questions or run into interesting problems, just ask us:
© 2006 Herlihy & Shavit
107
License
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
You are free:
- to Share: to copy, distribute and transmit the work
- to Remix: to adapt the work
Under the following conditions:
- Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
- Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.