Barrier Synchronization Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.


1 Barrier Synchronization Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

2 Simple Video Game
Prepare frame for display
– By graphics coprocessor
“Soft real-time” application
– Need at least 35 frames/second
– OK to mess up rarely

3-4 Simple Video Game
    while (true) {
      frame.prepare();
      frame.display();
    }
What about overlapping work?
– 1st thread displays frame
– 2nd thread prepares next frame

5-7 Two-Phase Rendering
Display thread:
    while (true) {
      if (phase) {
        frame[0].display();
      } else {
        frame[1].display();
      }
      phase = !phase;
    }
Prepare thread:
    while (true) {
      if (phase) {
        frame[1].prepare();
      } else {
        frame[0].prepare();
      }
      phase = !phase;
    }
Slide 6 highlights the even phases; slide 7 highlights the odd phases.

8 Synchronization Problems
How do threads stay in phase?
Too early?
– “We render no frame before its time”
Too late?
– Recycle memory before frame is displayed

9-10 Ideal Parallel Computation
[Figure: all three threads move together through phases 0, 1, and 2]

11-12 Real-Life Parallel Computation
[Figure: one thread sleeps (“zzz…”) and falls behind while another reaches phase 2. Uh, oh.]

13-14 Barrier Synchronization
[Figure: all threads finish phase 0 at the barrier, then all start phase 1]

15 Barrier Synchronization
No thread enters the phase after the barrier until every thread has left the phase before it.
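As a minimal sketch of how a barrier keeps the two rendering threads from the earlier slides in phase, the example below uses `java.util.concurrent.CyclicBarrier` from the standard library rather than the barrier classes developed later in these slides. The class name `TwoPhaseRendering`, the helper methods, and the use of plain integers as "frames" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical demo: a displayer and a preparer share a double buffer and
// meet at a barrier after every phase.
final class TwoPhaseRendering {
    // Runs `phases` phases and returns the frame numbers in display order.
    static List<Integer> run(int phases) {
        int[] frame = new int[2];            // double buffer; a "frame" is just its number
        frame[0] = 0;                        // frame 0 is ready before the loops start
        List<Integer> displayed = new ArrayList<>();
        CyclicBarrier barrier = new CyclicBarrier(2);

        Thread displayer = new Thread(() -> {
            boolean phase = true;            // each thread keeps its own phase flag
            for (int p = 0; p < phases; p++) {
                displayed.add(phase ? frame[0] : frame[1]);   // "display"
                phase = !phase;
                await(barrier);              // do not start the next phase early
            }
        });
        Thread preparer = new Thread(() -> {
            boolean phase = true;
            for (int p = 0; p < phases; p++) {
                if (phase) frame[1] = p + 1; else frame[0] = p + 1;   // "prepare" next frame
                phase = !phase;
                await(barrier);
            }
        });
        displayer.start();
        preparer.start();
        join(displayer);
        join(preparer);
        return displayed;
    }

    static void await(CyclicBarrier b) {
        try {
            b.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(5));          // prints [0, 1, 2, 3, 4]
    }
}
```

Without the two `await` calls the preparer could overwrite a buffer that is still being displayed, which is the "too late" problem from the Synchronization Problems slide.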

16 Why Do We Care?
Mostly of interest to
– Scientific & numeric computation
Elsewhere
– Garbage collection
– Less common in systems programming
– Still important topic

17 Duality
Dual to mutual exclusion
– Include others, not exclude them
Same implementation issues
– Interaction with caches … Invalidation? Local spinning?

18 Example: Parallel Prefix
before: a, b, c, d
after: a, a+b, a+b+c, a+b+c+d

19 Parallel Prefix
a, b, c, d: one thread per entry

20 Parallel Prefix: Phase 1
a, b, c, d becomes a, a+b, b+c, c+d

21 Parallel Prefix: Phase 2
a, b, c, d becomes a, a+b, a+b+c, a+b+c+d after phase 2

22 Parallel Prefix
N threads can compute
– Parallel prefix
– Of N entries
– In log₂ N rounds
What if system is asynchronous?
– Why we need barriers

23-27 Prefix
    class Prefix extends Thread {
      private int[] a;      // array of input values
      private int i;        // thread index
      private Barrier b;    // shared barrier
      public Prefix(int[] a, Barrier b, int i) {
        // initialize fields
        this.a = a;
        this.b = b;
        this.i = i;
      }

28-32 Where Do the Barriers Go?
      public void run() {
        int d = 1, sum = 0;
        while (d < N) {
          if (i >= d)
            sum = a[i-d];
          b.await();   // make sure everyone reads before anyone writes
          if (i >= d)
            a[i] += sum;
          b.await();   // make sure everyone writes before anyone reads
          d = d * 2;
        }
      }
    }
Slide 28 shows the loop without barriers; slides 29-30 add the first b.await(), and slides 31-32 add the second.
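The Prefix thread can be tried out with a short, self-contained sketch. It assumes the standard `java.util.concurrent.CyclicBarrier` in place of the slides' `Barrier` class; the class name `ParallelPrefix` and the method `prefixSums` are hypothetical names chosen for this example.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical demo of the Prefix thread, one thread per array entry.
final class ParallelPrefix {
    // Returns the inclusive prefix sums of `input`; the input array is not modified.
    static int[] prefixSums(int[] input) {
        int n = input.length;
        int[] a = input.clone();
        if (n == 0) return a;                  // CyclicBarrier needs at least one party
        CyclicBarrier b = new CyclicBarrier(n);
        Thread[] workers = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int i = t;                   // thread index
            workers[t] = new Thread(() -> {
                int sum = 0;
                for (int d = 1; d < n; d *= 2) {
                    if (i >= d) sum = a[i - d];
                    await(b);                  // everyone reads before anyone writes
                    if (i >= d) a[i] += sum;
                    await(b);                  // everyone writes before anyone reads
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return a;
    }

    static void await(CyclicBarrier b) {
        try {
            b.await();
        } catch (InterruptedException | BrokenBarrierException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(prefixSums(new int[] {1, 2, 3, 4})));
        // prints [1, 3, 6, 10]
    }
}
```

Every thread calls `await` twice per round, even when `i < d` and it has nothing to add, because the barrier only opens once all N threads have arrived.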

33 Barrier Implementations
Cache coherence
– Spin on locally-cached locations?
– Spin on statically-defined locations?
Latency
– How many steps?
Symmetry
– Do all threads do the same thing?

34-41 Barriers
    public class Barrier {
      AtomicInteger count;   // number of threads not yet arrived
      int size;              // number of threads participating
      public Barrier(int n) {   // initialization
        count = new AtomicInteger(n);
        size = n;
      }
      public void await() {     // principal method
        if (count.getAndDecrement() == 1) {
          count.set(size);      // if I’m last, reset fields for next time
        } else {
          while (count.get() != 0);   // otherwise, wait for everyone else
        }
      }
    }
What’s wrong with this protocol?

42 Reuse
    Barrier b = new Barrier(n);
    while (mumble()) {
      work();      // do work
      b.await();   // synchronize
    }              // repeat

43-47 Barriers
Slides 43-47 replay the Barrier code above on a reused barrier:
– A thread spins in await(), waiting for phase 1 to finish
– The last thread arrives: “Phase 1 is so over”
– The last thread resets count to prepare for phase 2 while the waiting thread is asleep (“ZZZZZ…”)
– Uh-oh: one thread is still waiting for phase 1 to finish while another is waiting for phase 2 to finish

48 Basic Problem
One thread “wraps around” to start phase 2
While another thread is still waiting for phase 1
One solution:
– Always use two barriers

49-55 Sense-Reversing Barriers
    public class Barrier {
      AtomicInteger count;
      int size;
      volatile boolean sense = false;   // completed odd or even-numbered phase?
      threadSense = new ThreadLocal …   // store sense for next phase
      public void await() {
        boolean mySense = threadSense.get();   // get new sense determined by last phase
        if (count.getAndDecrement() == 1) {
          count.set(size);
          sense = mySense;                     // if I’m last, reverse sense for next time
        } else {
          while (sense != mySense) {}          // otherwise, wait for sense to flip
        }
        threadSense.set(!mySense);             // prepare sense for next phase
      }
    }
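A runnable sketch of the sense-reversing barrier follows. Two details are assumptions that the slide elides: the thread-local sense starts as the opposite of the shared `sense` field, and the spin loop calls `Thread.yield()` so the demo still finishes on a machine with fewer cores than threads. The class name `SenseBarrier` and the `violations` test harness are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a reusable sense-reversing barrier.
final class SenseBarrier {
    private final AtomicInteger count;          // threads not yet arrived
    private final int size;                     // threads participating
    private volatile boolean sense = false;     // flips once per completed phase
    // Assumption: each thread's sense starts as the opposite of `sense`.
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    SenseBarrier(int n) {
        count = new AtomicInteger(n);
        size = n;
    }

    void await() {
        boolean mySense = threadSense.get();
        if (count.getAndDecrement() == 1) {     // last to arrive
            count.set(size);                    // reset before releasing anyone
            sense = mySense;                    // reverse the shared sense
        } else {
            while (sense != mySense) Thread.yield();   // wait for the flip
        }
        threadSense.set(!mySense);              // prepare for the next phase
    }

    // Demo harness: counts threads that left round r before all had arrived in it.
    static int violations(int threads, int rounds) {
        SenseBarrier b = new SenseBarrier(threads);
        AtomicInteger[] arrived = new AtomicInteger[rounds];
        for (int r = 0; r < rounds; r++) arrived[r] = new AtomicInteger();
        AtomicInteger bad = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            ts[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    arrived[r].incrementAndGet();
                    b.await();
                    if (arrived[r].get() != threads) bad.incrementAndGet();
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try {
                th.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException(e);
            }
        }
        return bad.get();
    }

    public static void main(String[] args) {
        System.out.println("violations: " + violations(4, 1000));
    }
}
```

Because waiters spin on `sense` rather than on `count`, the last thread can reset `count` immediately without stranding a thread that is still waiting for the previous phase.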

56-57 Combining Tree Barriers
[Figure: tree of barriers, each node labeled “2-barrier”]

58-64 Combining Tree Barrier
    public class Node {
      AtomicInteger count;
      int size;
      Node parent;              // parent barrier in tree
      volatile boolean sense;
      public void await() {
        …
        if (count.getAndDecrement() == 1) {   // am I last?
          if (parent != null) {
            parent.await();                   // proceed to parent barrier
          }
          count.set(size);                    // prepare for next phase
          sense = mySense;                    // notify others at this node
        } else {
          while (sense != mySense) {}         // I’m not last, so wait for notification
        }
        …
      }
    }

65 Combining Tree Barrier
No sequential bottleneck
– Parallel getAndDecrement() calls
Low memory contention
– Same reason
Cache behavior
– Local spinning on bus-based architecture
– Not so good for NUMA

66 Remarks
Everyone spins on sense field
– Local spinning on bus-based (good)
– Network hot-spot on distributed architecture (bad)
Not really scalable
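The Node code above can be completed into a runnable sketch under a few assumptions: fan-in is fixed at 2, the number of threads is a power of two, thread ids 0..n-1 are passed in explicitly, and the sense is handed down as a parameter instead of being read from a thread-local at every level. The names `CombiningTreeBarrier`, `build`, and `violations` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a combining tree barrier with fan-in 2.
final class CombiningTreeBarrier {
    private static final class Node {
        final AtomicInteger count = new AtomicInteger(2);   // arrivals still expected here
        final Node parent;                                  // parent barrier, null at the root
        volatile boolean sense = false;
        Node(Node parent) { this.parent = parent; }

        void await(boolean mySense) {
            if (count.getAndDecrement() == 1) {             // am I last at this node?
                if (parent != null) parent.await(mySense);  // proceed to parent barrier
                count.set(2);                               // prepare for next phase
                sense = mySense;                            // notify the other thread here
            } else {
                while (sense != mySense) Thread.yield();    // not last: wait for notification
            }
        }
    }

    private final Node[] leaf;
    private int leaves = 0;
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    CombiningTreeBarrier(int n) {
        if (n < 2 || Integer.bitCount(n) != 1)
            throw new IllegalArgumentException("n must be a power of two >= 2");
        leaf = new Node[n / 2];                 // two threads share each leaf
        build(new Node(null), n / 2);
    }

    private void build(Node node, int leavesBelow) {
        if (leavesBelow == 1) { leaf[leaves++] = node; return; }
        build(new Node(node), leavesBelow / 2);
        build(new Node(node), leavesBelow / 2);
    }

    void await(int me) {                        // me: thread id in 0..n-1
        boolean mySense = threadSense.get();
        leaf[me / 2].await(mySense);
        threadSense.set(!mySense);
    }

    // Demo harness: counts threads that left round r before all had arrived in it.
    static int violations(int threads, int rounds) {
        CombiningTreeBarrier b = new CombiningTreeBarrier(threads);
        AtomicInteger[] arrived = new AtomicInteger[rounds];
        for (int r = 0; r < rounds; r++) arrived[r] = new AtomicInteger();
        AtomicInteger bad = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int me = t;
            ts[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    arrived[r].incrementAndGet();
                    b.await(me);
                    if (arrived[r].get() != threads) bad.incrementAndGet();
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try { th.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return bad.get();
    }
}
```

Only two threads ever contend on any one `count`, which is the "no sequential bottleneck" point; the waiters at each node spin on that node's `sense` field.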

67 Tournament Tree Barrier
If tree nodes have fan-in 2
– Don’t need to call getAndDecrement()
– Winner chosen statically
At level i
– If i-th bit of id is 0, move up
– Otherwise keep back

68-75 Tournament Tree Barriers
[Figure: binary tree below a root; each pair of nodes is labeled winner and loser]
– All flags blue
– Loser thread sets winner’s flag
– Loser spins on own flag
– Winner spins on own flag
– Winner sees own flag, moves up, spins
– Bingo!
– Sense-reversing: next time use blue flags

76-80 Tournament Barrier
    class TBarrier {
      boolean flag;       // notifications delivered here
      TBarrier partner;   // other thread at same level
      TBarrier parent;    // parent (winner) or null (loser)
      boolean top;        // am I the root?
      …
    }

81-89 Tournament Barrier
    void await(boolean mySense) {   // sense is a parameter to save space…
      if (top) {                    // le root, c’est moi
        return;
      } else if (parent != null) {  // I am already a winner
        while (flag != mySense) {}; // wait for partner
        parent.await(mySense);      // synchronize upstairs
        partner.flag = mySense;     // inform partner
      } else {                      // natural-born loser
        partner.flag = mySense;     // tell partner I’m here
        while (flag != mySense) {}; // wait for notification from partner
      }
    }

90 Remarks
No need for read-modify-write calls
Each thread spins on fixed location
– Good for bus-based architectures
– Good for NUMA architectures
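The tournament `await` above can be wrapped into a runnable sketch. The tree construction is not shown on the slides, so the `build` method here is an assumption: every node's `parent` is the node the winner moves up to, losers get `parent == null`, and the number of threads must be a power of two. The names `TournamentBarrier`, `build`, and `violations` are hypothetical, and `flag` is declared `volatile` so spinning threads see updates.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a tournament tree barrier; no read-modify-write in await.
final class TournamentBarrier {
    private static final class Node {
        volatile boolean flag = false;   // notifications delivered here
        Node partner;                    // other thread's node at the same level
        Node parent;                     // where the winner moves up, or null for a loser
        boolean top = false;             // am I the root?

        void await(boolean mySense) {
            if (top) return;
            if (parent != null) {                          // winner
                while (flag != mySense) Thread.yield();    // wait for partner
                parent.await(mySense);                     // synchronize upstairs
                partner.flag = mySense;                    // inform partner
            } else {                                       // loser
                partner.flag = mySense;                    // tell partner I'm here
                while (flag != mySense) Thread.yield();    // wait for notification
            }
        }
    }

    private final Node[] leaf;
    private int leaves = 0;
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    TournamentBarrier(int n) {
        if (n < 1 || Integer.bitCount(n) != 1)
            throw new IllegalArgumentException("n must be a power of two");
        leaf = new Node[n];
        Node root = new Node();
        root.top = true;
        build(root, n);
    }

    // Assumed construction: split `size` threads into a winner half and a loser half.
    private void build(Node up, int size) {
        if (size == 1) { leaf[leaves++] = up; return; }
        Node winner = new Node();
        Node loser = new Node();
        winner.parent = up;
        winner.partner = loser;
        loser.partner = winner;
        build(winner, size / 2);
        build(loser, size / 2);
    }

    void await(int me) {                 // me: thread id in 0..n-1
        boolean mySense = threadSense.get();
        leaf[me].await(mySense);
        threadSense.set(!mySense);       // sense-reversing: flip for the next phase
    }

    // Demo harness: counts threads that left round r before all had arrived in it.
    static int violations(int threads, int rounds) {
        TournamentBarrier b = new TournamentBarrier(threads);
        AtomicInteger[] arrived = new AtomicInteger[rounds];
        for (int r = 0; r < rounds; r++) arrived[r] = new AtomicInteger();
        AtomicInteger bad = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int me = t;
            ts[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    arrived[r].incrementAndGet();
                    b.await(me);
                    if (arrived[r].get() != threads) bad.incrementAndGet();
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try { th.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return bad.get();
    }
}
```

The `AtomicInteger` objects appear only in the test harness; the barrier itself uses plain volatile reads and writes, and each thread spins only on flags in its own chain of nodes.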

91 Dissemination Barrier
At round i
– Thread A notifies thread A + 2^i (mod n)
Requires log n rounds

92 Dissemination Barrier
[Figure: successive rounds notify the threads at distance +1, +2, and +4]

93 Remarks
Great for lovers of combinatorics
Not so much fun for those who track memory footprints
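The slides give no code for the dissemination barrier, so the following is only a sketch of the notification pattern. To keep it short and reusable it swaps the usual per-round flags for per-round counters: thread A increments a counter owned by thread A + 2^i (mod n) and waits until its own counter for round i reaches the current phase number. That substitution, the class name `DisseminationBarrier`, and the `violations` harness are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch of a dissemination barrier using counters instead of flags.
final class DisseminationBarrier {
    private final int n;                         // number of threads
    private final int rounds;                    // ceil(log2 n)
    private final AtomicIntegerArray[] inbox;    // inbox[i].get(t): notifications t got in round i
    private final ThreadLocal<Integer> phase = ThreadLocal.withInitial(() -> 0);

    DisseminationBarrier(int n) {
        this.n = n;
        int r = 0;
        while ((1 << r) < n) r++;
        rounds = r;
        inbox = new AtomicIntegerArray[rounds];
        for (int i = 0; i < rounds; i++) inbox[i] = new AtomicIntegerArray(n);
    }

    void await(int me) {                         // me: thread id in 0..n-1
        int p = phase.get() + 1;                 // phases are numbered 1, 2, 3, ...
        for (int i = 0; i < rounds; i++) {
            inbox[i].incrementAndGet((me + (1 << i)) % n);   // notify thread me + 2^i (mod n)
            while (inbox[i].get(me) < p) Thread.yield();     // wait for my own notification
        }
        phase.set(p);
    }

    // Demo harness: counts threads that left round r before all had arrived in it.
    static int violations(int threads, int rounds) {
        DisseminationBarrier b = new DisseminationBarrier(threads);
        AtomicInteger[] arrived = new AtomicInteger[rounds];
        for (int r = 0; r < rounds; r++) arrived[r] = new AtomicInteger();
        AtomicInteger bad = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int me = t;
            ts[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    arrived[r].incrementAndGet();
                    b.await(me);
                    if (arrived[r].get() != threads) bad.incrementAndGet();
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try { th.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return bad.get();
    }
}
```

Each thread has exactly one sender per round, so a counter value of at least p means that sender has reached round i of phase p. The n × log n counters are the memory footprint the Remarks slide jokes about.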

94 Ideas So Far
Sense-reversing
– Reuse without reinitializing
Combining tree
– Like counters, locks …
Tournament tree
– Optimized combining tree
Dissemination barrier
– Intellectually pleasing (matter of taste)

95 Which is best for Multicore?
On a cache coherent multicore chip: none of the above…
Use a static tree barrier!

96-97 Static Tree Barrier
[Figure: tree of nodes, each with a counter]
– Counter in parent: number of children that have not arrived
– All spin on cached copy of Done flag
– All detect change in Done flag

98 Remarks
Very little cache traffic
Minimal space overhead
Can be laid out optimally on NUMA
– Instead of Done bit, notification must be performed by passing sense down the static tree
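A static tree barrier along these lines can be sketched as follows. The slides show only the picture, so the layout is an assumption: one node per thread, arranged as a binary heap (thread i's parent is thread (i - 1) / 2), with a single shared `done` flag that the root flips. The names `StaticTreeBarrier` and `violations` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a static tree barrier with a shared Done flag.
final class StaticTreeBarrier {
    private final class Node {
        final Node parent;
        final int children;
        final AtomicInteger childCount;          // children that have not arrived yet
        Node(Node parent, int children) {
            this.parent = parent;
            this.children = children;
            this.childCount = new AtomicInteger(children);
        }
        void await(boolean mySense) {
            while (childCount.get() > 0) Thread.yield();   // wait for my whole subtree
            childCount.set(children);                      // reset for the next phase
            if (parent != null) {
                parent.childCount.getAndDecrement();       // tell my parent I have arrived
                while (done != mySense) Thread.yield();    // spin on the shared Done flag
            } else {
                done = mySense;                            // root: everyone has arrived
            }
        }
    }

    private volatile boolean done = false;
    private final Node[] node;
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    StaticTreeBarrier(int n) {
        node = new Node[n];
        for (int i = 0; i < n; i++) {
            Node parent = (i == 0) ? null : node[(i - 1) / 2];
            int kids = 0;                        // heap layout: children are 2i+1 and 2i+2
            if (2 * i + 1 < n) kids++;
            if (2 * i + 2 < n) kids++;
            node[i] = new Node(parent, kids);
        }
    }

    void await(int me) {                         // me: thread id in 0..n-1
        boolean mySense = threadSense.get();
        node[me].await(mySense);
        threadSense.set(!mySense);
    }

    // Demo harness: counts threads that left round r before all had arrived in it.
    static int violations(int threads, int rounds) {
        StaticTreeBarrier b = new StaticTreeBarrier(threads);
        AtomicInteger[] arrived = new AtomicInteger[rounds];
        for (int r = 0; r < rounds; r++) arrived[r] = new AtomicInteger();
        AtomicInteger bad = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int me = t;
            ts[t] = new Thread(() -> {
                for (int r = 0; r < rounds; r++) {
                    arrived[r].incrementAndGet();
                    b.await(me);
                    if (arrived[r].get() != threads) bad.incrementAndGet();
                }
            });
            ts[t].start();
        }
        for (Thread th : ts) {
            try { th.join(); } catch (InterruptedException e) { throw new IllegalStateException(e); }
        }
        return bad.get();
    }
}
```

Each node resets its counter before notifying its parent, so the counter is ready again before the root can flip `done` and release the next phase.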

99 Back to Amdahl’s Law
Speedup = 1/(ParallelPart/N + SequentialPart)
Pay for N = 8 cores
SequentialPart = 25%
Speedup = only 2.9 times!
Must parallelize applications on a very fine grain!
So… How will we make use of multicores?
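The arithmetic on this slide can be checked with a few lines of code. The class and method names below are hypothetical; the formula is the one on the slide, with ParallelPart taken as 1 - SequentialPart.

```java
// Hypothetical helper that evaluates the slide's formula.
final class Amdahl {
    // Speedup = 1 / (ParallelPart / N + SequentialPart)
    static double speedup(double sequentialPart, int n) {
        double parallelPart = 1.0 - sequentialPart;
        return 1.0 / (parallelPart / n + sequentialPart);
    }

    public static void main(String[] args) {
        // Slide's example: N = 8 cores, 25% sequential.
        // 1 / (0.75 / 8 + 0.25) = 1 / 0.34375, roughly 2.9
        System.out.println(speedup(0.25, 8));
    }
}
```

With a 25% sequential part the speedup can never reach 4, no matter how many cores are added, since the denominator stays above 0.25.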

100 Need Fine-Grained Locking
[Figure: coarse-grained vs. fine-grained locking; in both, the work is 75% unshared and 25% shared]
The reason we get only 2.9 speedup

101 Traditional Scaling Process
[Figure: user code on a traditional uniprocessor; speedup 1.8x, 3.6x, 7x over time (Moore’s law)]

102 Multicore Scaling Process
[Figure: user code on multicore; speedup 1.8x, 3.6x, 7x]
As noted, not so simple…

103 Real-World Scaling Process
[Figure: user code on multicore; speedup 1.8x, 2x, 2.9x]
Parallelization and synchronization will require great care…

104 Future: Smoother Scaling
[Figure: user code on multicore with abstractions (like TM); speedup 1.8x, 3.6x, 7x]

105 Multicore Programming
We are at the dawn of a new era…
Still a lot of research and development to be done
And a body of multicore programming experience to be accumulated by us all
© 2006 Herlihy & Shavit

106 Thanks for Attending!
Good luck with your programming…
If you have questions or run into interesting problems, just email us:
– mph@cs.brown.edu
– shanir@cs.tau.ac.il

107 This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
You are free:
– to Share: to copy, distribute and transmit the work
– to Remix: to adapt the work
Under the following conditions:
– Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
– http://creativecommons.org/licenses/by-sa/3.0/
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.

