Barrier Synchronization


1 Barrier Synchronization
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

2 Art of Multiprocessor Programming
Simple Video Game Prepare frame for display By graphics coprocessor “soft real-time” application Need at least 35 frames/second OK to mess up rarely Art of Multiprocessor Programming

3 Art of Multiprocessor Programming
Simple Video Game while (true) { frame.prepare(); frame.display(); } Art of Multiprocessor Programming

4 Art of Multiprocessor Programming
Simple Video Game while (true) { frame.prepare(); frame.display(); } What about overlapping work? 1st thread displays frame 2nd prepares next frame Art of Multiprocessor Programming

5 Art of Multiprocessor Programming
Two-Phase Rendering while (true) { if (phase) { frame[0].display(); } else { frame[1].display(); } phase = !phase; while (true) { if (phase) { frame[1].prepare(); } else { frame[0].prepare(); } phase = !phase; Art of Multiprocessor Programming

6 Art of Multiprocessor Programming
Two-Phase Rendering while (true) { if (phase) { frame[0].display(); } else { frame[1].display(); } phase = !phase; while (true) { if (phase) { frame[1].prepare(); } else { frame[0].prepare(); } phase = !phase; even phases Art of Multiprocessor Programming

7 Art of Multiprocessor Programming
Two-Phase Rendering while (true) { if (phase) { frame[0].display(); } else { frame[1].display(); } phase = !phase; while (true) { if (phase) { frame[1].prepare(); } else { frame[0].prepare(); } phase = !phase; odd phases Art of Multiprocessor Programming

8 Synchronization Problems
How do threads stay in phase? Too early? “we render no frame before its time” Too late? Recycle memory before frame is displayed Art of Multiprocessor Programming

9 Ideal Parallel Computation
1 1 1 Art of Multiprocessor Programming

10 Ideal Parallel Computation
2 2 2 1 1 1 Art of Multiprocessor Programming

11 Real-Life Parallel Computation
zzz… 1 1 Art of Multiprocessor Programming

12 Real-Life Parallel Computation
2 zzz… 1 1 Uh, oh Art of Multiprocessor Programming

13 Barrier Synchronization
Art of Multiprocessor Programming

14 Barrier Synchronization
1 1 1 Art of Multiprocessor Programming

15 Barrier Synchronization
Until every thread has left here No thread enters here Art of Multiprocessor Programming

16 Art of Multiprocessor Programming
Why Do We Care? Mostly of interest to Scientific & numeric computation Elsewhere Garbage collection Less common in systems programming Still important topic Art of Multiprocessor Programming

17 Art of Multiprocessor Programming
Duality Dual to mutual exclusion Include others, not exclude them Same implementation issues Interaction with caches … Invalidation? Local spinning? Art of Multiprocessor Programming

18 Example: Parallel Prefix
b c d before a a+b a+b+c a+b+c+d after Art of Multiprocessor Programming

19 Art of Multiprocessor Programming
Parallel Prefix One thread Per entry a b c d Art of Multiprocessor Programming

20 Parallel Prefix: Phase 1
b c d a a+b b+c c+d Art of Multiprocessor Programming

21 Parallel Prefix: Phase 2
b c d a a+b a+b+c a+b+c+d Art of Multiprocessor Programming

22 Art of Multiprocessor Programming
Parallel Prefix N threads can compute Parallel prefix Of N entries In log2 N rounds What if system is asynchronous? Why we need barriers Art of Multiprocessor Programming

23 Art of Multiprocessor Programming
Prefix class Prefix extends Thread { int[] a; int i; Barrier b; Prefix(int[] a, Barrier b, int i) { this.a = a; this.b = b; this.i = i; } Art of Multiprocessor Programming

24 Art of Multiprocessor Programming
Prefix class Prefix extends Thread { int[] a; int i; Barrier b; Prefix(int[] a, Barrier b, int i) { this.a = a; this.b = b; this.i = i; } Array of input values Art of Multiprocessor Programming

25 Art of Multiprocessor Programming
Prefix class Prefix extends Thread { int[] a; int i; Barrier b; Prefix(int[] a, Barrier b, int i) { this.a = a; this.b = b; this.i = i; } Thread index Art of Multiprocessor Programming

26 Art of Multiprocessor Programming
Prefix class Prefix extends Thread { int[] a; int i; Barrier b; Prefix(int[] a, Barrier b, int i) { this.a = a; this.b = b; this.i = i; } Shared barrier Art of Multiprocessor Programming

27 Art of Multiprocessor Programming
Prefix class Prefix extends Thread { int[] a; int i; Barrier b; Prefix(int[] a, Barrier b, int i) { this.a = a; this.b = b; this.i = i; } Initialize fields Art of Multiprocessor Programming

28 Where Do the Barriers Go?
public void run() { int d = 1, sum = 0; while (d < N) { if (i >= d) sum = a[i-d]; a[i] += sum; d = d * 2; }}} Art of Multiprocessor Programming

29 Where Do the Barriers Go?
public void run() { int d = 1, sum = 0; while (d < N) { if (i >= d) sum = a[i-d]; b.await(); a[i] += sum; d = d * 2; }}} Make sure everyone reads before anyone writes Art of Multiprocessor Programming

30 Where Do the Barriers Go?
public void run() { int d = 1, sum = 0; while (d < N) { if (i >= d) sum = a[i-d]; b.await(); a[i] += sum; d = d * 2; }}} Make sure everyone reads before anyone writes Make sure everyone writes before anyone reads Art of Multiprocessor Programming
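Putting the two annotations together, the loop needs one barrier after the reads and another after the writes. A minimal sketch of the completed run(), using the fields declared above (N is the number of entries, as in the slide's code; the extra guard on the write is an assumption added here so a thread with i < d does not re-apply a stale sum):

  public void run() {
    int d = 1, sum = 0;
    while (d < N) {
      if (i >= d)
        sum = a[i-d];   // read phase
      b.await();        // make sure everyone reads before anyone writes
      if (i >= d)
        a[i] += sum;    // write phase
      b.await();        // make sure everyone writes before anyone reads again
      d = d * 2;
    }
  }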

31 Barrier Implementations
Cache coherence Spin on locally-cached locations? Spin on statically-defined locations? Latency How many steps? Symmetry Do all threads do the same thing? Art of Multiprocessor Programming

32 Art of Multiprocessor Programming
Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = new AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming

33 Number of threads not yet arrived
Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Number of threads not yet arrived Art of Multiprocessor Programming

34 Number of threads participating
Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Number of threads participating Art of Multiprocessor Programming

35 Art of Multiprocessor Programming
Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Initialization Art of Multiprocessor Programming

36 Art of Multiprocessor Programming
Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Principal method Art of Multiprocessor Programming

37 If I’m last, reset fields for next time
Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} If I’m last, reset fields for next time Art of Multiprocessor Programming

38 Art of Multiprocessor Programming
Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Otherwise, wait for everyone else Art of Multiprocessor Programming

39 What’s wrong with this protocol?
Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} What’s wrong with this protocol? Art of Multiprocessor Programming

40 Art of Multiprocessor Programming
Reuse Barrier b = new Barrier(n); while ( mumble() ) { work(); b.await() } do work repeat synchronize Art of Multiprocessor Programming

41 Art of Multiprocessor Programming
Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming

42 Waiting for Phase 1 to finish
Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Waiting for Phase 1 to finish Art of Multiprocessor Programming

43 Waiting for Phase 1 to finish
Barriers Phase 1 is so over public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Waiting for Phase 1 to finish Art of Multiprocessor Programming

44 Art of Multiprocessor Programming
Barriers Prepare for phase 2 public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} ZZZZZ…. Art of Multiprocessor Programming

45 Waiting for Phase 2 to finish Waiting for Phase 1 to finish
Uh-Oh public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Waiting for Phase 2 to finish Waiting for Phase 1 to finish Art of Multiprocessor Programming

46 Art of Multiprocessor Programming
Basic Problem One thread “wraps around” to start phase 2 While another thread is still waiting for phase 1 One solution: Always use two barriers Art of Multiprocessor Programming
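As a practical aside (not on the slides): rather than hand-rolling the reset logic, the JDK's java.util.concurrent.CyclicBarrier is a safely reusable barrier that tracks barrier "generations" internally. A sketch of the Reuse loop from a few slides back using it, with n, work() and mumble() as on that slide; the sense-reversing barrier on the next slides is the deck's own solution:

  import java.util.concurrent.BrokenBarrierException;
  import java.util.concurrent.CyclicBarrier;

  CyclicBarrier b = new CyclicBarrier(n);   // n participating threads
  while (mumble()) {
    work();                                 // do work
    try {
      b.await();                            // synchronize; safe to reuse across phases
    } catch (InterruptedException | BrokenBarrierException e) {
      throw new RuntimeException(e);        // simplest possible handling for a sketch
    }
  }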

47 Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; volatile boolean sense = false; ThreadLocal<Boolean> threadSense = new ThreadLocal<Boolean>… public void await() { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense; } else { while (sense != mySense) {} } threadSense.set(!mySense); }}} Art of Multiprocessor Programming

48 Sense-Reversing Barriers
Completed odd or even-numbered phase? public class Barrier { AtomicInteger count; int size; volatile boolean sense = false; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Sense must be volatile because of Java memory model so that JIT does not remove loop spinning on sense Art of Multiprocessor Programming

49 Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; volatile boolean sense = false; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Store sense for next phase Art of Multiprocessor Programming

50 Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; volatile boolean sense = false; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Get new sense determined by last phase Art of Multiprocessor Programming

51 Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; volatile boolean sense = false; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} If I’m last, reverse sense for next time Art of Multiprocessor Programming

52 Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; volatile boolean sense = false; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Otherwise, wait for sense to flip Art of Multiprocessor Programming

53 Sense-Reversing Barriers
public class Barrier { AtomicInteger count; int size; volatile boolean sense = false; threadSense = new ThreadLocal<boolean>… public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Prepare sense for next phase Art of Multiprocessor Programming
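With sense reversing, the Reuse loop works as written: each thread compares the shared sense against its own thread-local copy, so a thread still spinning on the old sense can never be confused by the count being reset for the next phase. A minimal usage sketch (n, work() and mumble() as on the Reuse slide):

  Barrier b = new Barrier(n);   // the sense-reversing barrier above
  while (mumble()) {
    work();                     // phase k work
    b.await();                  // nobody starts phase k+1 until everyone finishes phase k
  }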

54 Combining Tree Barriers
Art of Multiprocessor Programming

55 Combining Tree Barriers
Art of Multiprocessor Programming

56 Combining Tree Barrier
public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await(); count.set(size); sense = mySense; } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming

57 Combining Tree Barrier
Parent barrier in tree public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming

58 Combining Tree Barrier
public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Am I last? Art of Multiprocessor Programming

59 Combining Tree Barrier
Proceed to parent barrier public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await(); count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming

60 Combining Tree Barrier
public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Prepare for next phase Art of Multiprocessor Programming

61 Combining Tree Barrier
Notify others at this node public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming

62 Combining Tree Barrier
public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await(); count.set(size); sense = mySense; } else { while (sense != mySense) {} }…}}} I’m not last, so wait for notification Art of Multiprocessor Programming

63 Combining Tree Barrier
No sequential bottleneck Parallel getAndDecrement() calls Low memory contention Same reason Cache behavior Local spinning on bus-based architecture Not so good for NUMA Art of Multiprocessor Programming
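The slides show only the per-node logic; the tree itself and the thread-local sense plumbing (the elided "…") are left implicit. A minimal sketch of one way to wire a fan-in-2 combining tree, assuming n is a power of two (at least 2), that Node gets a constructor Node(parent, fanIn) initializing the fields shown above, and that its await takes the caller's sense:

  class TreeBarrier {
    final Node[] leaf;       // leaf[k] is shared by threads 2k and 2k+1
    final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    TreeBarrier(int n) {     // n: number of threads, a power of two >= 2 (assumption)
      Node[] level = { new Node(null, 2) };        // the root
      while (level.length < n / 2) {               // grow until there are n/2 leaves
        Node[] next = new Node[level.length * 2];
        for (int j = 0; j < next.length; j++)
          next[j] = new Node(level[j / 2], 2);     // two children per node
        level = next;
      }
      leaf = level;
    }

    public void await(int id) {        // id in [0, n): the caller's thread index
      boolean mySense = threadSense.get();
      leaf[id / 2].await(mySense);     // combine upward from my leaf
      threadSense.set(!mySense);       // reverse my sense for the next phase
    }
  }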

64 Art of Multiprocessor Programming
Remarks Everyone spins on sense field Local spinning on bus-based (good) Network hot-spot on distributed architecture (bad) Not really scalable Art of Multiprocessor Programming

65 Tournament Tree Barrier
If tree nodes have fan-in 2 Don’t need to call getAndDecrement() Winner chosen statically At level i If i-th bit of id is 0, move up Otherwise keep back Art of Multiprocessor Programming

66 Tournament Tree Barriers
root winner loser winner loser winner loser Art of Multiprocessor Programming

67 Tournament Tree Barriers
All flags blue Art of Multiprocessor Programming

68 Tournament Tree Barriers
Loser thread sets winner’s flag Art of Multiprocessor Programming

69 Tournament Tree Barriers
Loser spins on own flag Art of Multiprocessor Programming

70 Tournament Tree Barriers
Winner spins on own flag Art of Multiprocessor Programming

71 Tournament Tree Barriers
Winner sees own flag, moves up, spins Art of Multiprocessor Programming

72 Tournament Tree Barriers
Bingo! Art of Multiprocessor Programming

73 Tournament Tree Barriers
Sense-reversing: next time use blue flags Art of Multiprocessor Programming

74 Art of Multiprocessor Programming
Tournament Barrier class TBarrier { volatile boolean flag; TBarrier partner; TBarrier parent; boolean top; } Art of Multiprocessor Programming

75 Notifications delivered here
Tournament Barrier class TBarrier { volatile boolean flag; TBarrier partner; TBarrier parent; boolean top; } Notifications delivered here Art of Multiprocessor Programming

76 Other thread at same level
Tournament Barrier class TBarrier { volatile boolean flag; TBarrier partner; TBarrier parent; boolean top; } Other thread at same level Art of Multiprocessor Programming

77 Parent (winner) or null (loser)
Tournament Barrier class TBarrier { volatile boolean flag; TBarrier partner; TBarrier parent; boolean top; } Parent (winner) or null (loser) Art of Multiprocessor Programming

78 Art of Multiprocessor Programming
Tournament Barrier class TBarrier { volatile boolean flag; TBarrier partner; TBarrier parent; boolean top; } Am I the root? Art of Multiprocessor Programming

79 Art of Multiprocessor Programming
Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Art of Multiprocessor Programming

80 Art of Multiprocessor Programming
Tournament Barrier Current sense void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { }}} Le root, c’est moi Art of Multiprocessor Programming

81 Art of Multiprocessor Programming
Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { }}} I am already a winner Art of Multiprocessor Programming

82 Art of Multiprocessor Programming
Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { }}} Wait for partner Art of Multiprocessor Programming

83 Art of Multiprocessor Programming
Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { }}} Synchronize upstairs Art of Multiprocessor Programming

84 Art of Multiprocessor Programming
Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { }}} Inform partner Art of Multiprocessor Programming

85 Order is important (why?)
Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { }}} Inform partner Order is important (why?) Art of Multiprocessor Programming 85 85

86 Art of Multiprocessor Programming
Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { }}} Natural-born loser Art of Multiprocessor Programming

87 Art of Multiprocessor Programming
Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { }}} Tell partner I’m here Art of Multiprocessor Programming

88 Wait for notification from partner
Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { }}} Wait for notification from partner Art of Multiprocessor Programming

89 Art of Multiprocessor Programming
Remarks No need for read-modify-write calls Each thread spins on fixed location Good for bus-based architectures Good for NUMA architectures Art of Multiprocessor Programming
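The slides also leave the wiring of TBarrier nodes implicit. A minimal sketch of one way to build the tournament for n threads, assuming n is a power of two and that the TBarrier fields above are accessible for direct assignment (node[i][k] is thread i's node at level k; a thread only ever visits nodes up to its first losing level, so the remaining nodes are wired but unused):

  class TournamentBarrier {
    final TBarrier[][] node;           // node[i][k]: thread i's node at level k
    final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);
    final int levels;

    TournamentBarrier(int n) {         // n: number of threads, a power of two (assumption)
      levels = Integer.numberOfTrailingZeros(n);       // log2 n
      node = new TBarrier[n][levels + 1];
      for (int i = 0; i < n; i++)
        for (int k = 0; k <= levels; k++)
          node[i][k] = new TBarrier();
      for (int i = 0; i < n; i++) {
        for (int k = 0; k < levels; k++) {
          TBarrier me = node[i][k];
          me.partner = node[i ^ (1 << k)][k];          // the other thread at this level
          boolean winner = (i & (1 << k)) == 0;        // k-th bit 0: winner, moves up
          me.parent = winner ? node[i][k + 1] : null;  // losers stay back (parent == null)
        }
        node[i][levels].top = (i == 0);                // only thread 0 ever reaches the root
      }
    }

    public void await(int i) {         // i: the caller's thread index
      boolean mySense = threadSense.get();
      node[i][0].await(mySense);       // start the tournament at the leaf level
      threadSense.set(!mySense);
    }
  }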

90 Dissemination Barrier
At round i Thread A notifies thread A+2^i (mod n) Requires log n rounds Art of Multiprocessor Programming
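A minimal sketch of the dissemination pattern in Java, for a single barrier episode (the flag layout and names are assumptions, not from the slides; making it reusable needs per-round sense reversal in the spirit of the earlier slides, which is part of what makes it a good homework problem):

  import java.util.concurrent.atomic.AtomicBoolean;

  class DisseminationBarrier {
    final int n, rounds;
    final AtomicBoolean[][] flag;      // flag[i][r]: thread i's "heard from" flag for round r

    DisseminationBarrier(int n) {
      this.n = n;
      rounds = (n == 1) ? 0 : 32 - Integer.numberOfLeadingZeros(n - 1);   // ceil(log2 n)
      flag = new AtomicBoolean[n][rounds];
      for (int i = 0; i < n; i++)
        for (int r = 0; r < rounds; r++)
          flag[i][r] = new AtomicBoolean(false);
    }

    // Single-shot await for thread i.
    public void await(int i) {
      for (int r = 0; r < rounds; r++) {
        int partner = (i + (1 << r)) % n;   // notify thread i + 2^r (mod n)
        flag[partner][r].set(true);
        while (!flag[i][r].get()) {}        // wait to hear from thread i - 2^r (mod n)
      }
    }
  }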

91 Dissemination Barrier
+1 +2 +4 Art of Multiprocessor Programming

92 Art of Multiprocessor Programming
Remarks Elegant Good source of homework problems Not cache-friendly Art of Multiprocessor Programming

93 Art of Multiprocessor Programming
Ideas So Far Sense-reversing Reuse without reinitializing Combining tree Like counters, locks … Tournament tree Optimized combining tree Dissemination barrier Intellectually Pleasing (matter of taste) Art of Multiprocessor Programming

94 Which is best for Multicore?
On a cache coherent multicore chip: perhaps none of the above… Here is another (arguably) better algorithm … Art of Multiprocessor Programming

95 One node per thread, statically assigned
Static Tree Barrier We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent One node per thread, statically assigned Art of Multiprocessor Programming

96 Art of Multiprocessor Programming
Static Tree Barrier Sense-reversing flag We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 96 96

97 Node has count of missing children
Static Tree Barrier 2 2 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Node has count of missing children Art of Multiprocessor Programming 97 97

98 Art of Multiprocessor Programming
Static Tree Barrier 2 Spin until zero … 2 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 98 98

99 My counter is zero, decrement parent
Static Tree Barrier 2 2 1 My counter is zero, decrement parent We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 99 99

100 Art of Multiprocessor Programming
Static Tree Barrier 2 2 1 Spin on done flag We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 100 100

101 Art of Multiprocessor Programming
Static Tree Barrier 1 2 2 1 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 101 101

102 Art of Multiprocessor Programming
Static Tree Barrier 1 2 2 1 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 102 102

103 Art of Multiprocessor Programming
Static Tree Barrier 1 2 2 1 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 103 103

104 Art of Multiprocessor Programming
Static Tree Barrier 1 2 1 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 104 104

105 Art of Multiprocessor Programming
Static Tree Barrier 1 2 1 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 105 105

106 Art of Multiprocessor Programming
Static Tree Barrier yowzah! 1 1 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 106 106

107 Art of Multiprocessor Programming
Static Tree Barrier yowzah! 1 1 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 107 107

108 Art of Multiprocessor Programming
Static Tree Barrier yowzah! 1 1 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 108 108

109 Art of Multiprocessor Programming
Static Tree Barrier yes! yes! 1 yes! 1 yes! yes! We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 109 109

110 Art of Multiprocessor Programming
Static Tree Barrier 2 1 1 2 We describe the static tree using counters in the nodes but could use simple array of locations, each written by a child, all spun on (repeatedly read) by the parent Art of Multiprocessor Programming 110 110

111 Art of Multiprocessor Programming
Remarks Very little cache traffic Minimal space overhead On message-passing architecture Send notification & sense down tree Art of Multiprocessor Programming
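The slides describe the static tree barrier only in pictures. A minimal Java sketch under these assumptions: a binary tree laid out heap-style (thread i's node has children 2i+1 and 2i+2), one node per thread, a per-node count of children that have not yet arrived, and a single global sense flag flipped by the root:

  import java.util.concurrent.atomic.AtomicInteger;

  class StaticTreeBarrier {
    final Node[] node;                        // node[i] belongs to thread i
    volatile boolean sense = false;           // global "this phase is done" flag
    final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    class Node {
      final Node parent;
      final int children;                     // number of children in the tree
      final AtomicInteger childCount;         // children that have not yet arrived

      Node(Node parent, int children) {
        this.parent = parent;
        this.children = children;
        childCount = new AtomicInteger(children);
      }

      void await(boolean mySense) {
        while (childCount.get() > 0) {}       // spin until my whole subtree has arrived
        childCount.set(children);             // reset my counter for the next phase
        if (parent != null) {
          parent.childCount.getAndDecrement();  // report my subtree's arrival upward
          while (sense != mySense) {}         // spin on the global done flag
        } else {
          sense = mySense;                    // I am the root: everyone has arrived
        }
      }
    }

    StaticTreeBarrier(int n) {                // heap-style binary tree over n threads
      node = new Node[n];
      build(null, 0, n);
    }

    private void build(Node parent, int i, int n) {
      if (i >= n) return;
      int kids = (2 * i + 1 < n ? 1 : 0) + (2 * i + 2 < n ? 1 : 0);
      node[i] = new Node(parent, kids);
      build(node[i], 2 * i + 1, n);
      build(node[i], 2 * i + 2, n);
    }

    public void await(int i) {                // i: the caller's thread index
      boolean mySense = threadSense.get();
      node[i].await(mySense);
      threadSense.set(!mySense);
    }
  }

Note that each counter is reset before the parent is notified; no child can decrement it again until the global sense flips, so the reset cannot lose an arrival.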

112 The Nature of Progress*
Some of the material in this lecture appears in chapter 3 of the textbook. The rest appears in the article: On the nature of progress, Herlihy and Shavit, 2008 which can be found at http: // Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

113 Concurrent Programming
Many real-world data structures have blocking (lock-based) implementations & non-blocking (no locks) implementations For example: linked lists, queues, stacks, hash maps,… Does this make sense?

114 Concurrent Programming
Many data structures combine blocking & non-blocking methods Java™ concurrency package: skiplists, hash tables, exchangers, on 10 million desktops. Can seemingly contradictory conditions co-exist in the same algorithm?…

115 Progress Conditions Deadlock-free: Starvation-free: Lock-free:
Some thread eventually acquires lock. Starvation-free: Every thread eventually acquires lock. Lock-free: Some method call returns. Wait-free: Every method call returns. Obstruction-free: Every method call returns if it executes in isolation We will show an example shortly

116 List-Based Sets Unordered collection of elements No duplicates Methods
Add() a new element Remove() an element Contains() if element is present

117 Coarse Grained Locking
b d c Lock is starvation-free: every attempt to acquire the lock eventually succeeds.

118 Fine Grained (Lock Coupling)
b d Overlapping locks detect overlapping operations Deadlock-free: some thread eventually acquires lock.

119 Optimistic Fine Grained
b c e add(), remove(), contains() lock destination nodes in order Deadlock-free: some thread trying to acquire the locks eventually succeeds.

120 Obstruction-free contains()
d Snapshot: if all nodes traversed twice are the same Obstruction-free: the method returns if it executes in isolation for long enough.

121 The Simple Snapshot is Obstruction-Free
Put increasing labels on each entry Collect twice If both agree, we’re done Otherwise, try again Collect1: 1 22 7 13 18 12 = Collect2: 1 22 7 13 18 12 Recall: if none of the labels (timestamps) changed, then there was a point, after the end of the first collect, and before the start of the next collect, in which none of the registers were written to. The values collected correspond to the values that were all together in memory at that point in time.

122 Obstruction-freedom In the simple snapshot alg:
The update method is wait-free But scan is obstruction-free Completes if it executes in isolation (no concurrent updates).
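A minimal sketch of the labeled double-collect snapshot the slides refer to, assuming the usual single-writer model (only thread i calls update(i, …)); the class and field names are illustrative:

  import java.util.concurrent.atomic.AtomicReferenceArray;

  class SimpleSnapshot {
    // Each entry pairs a value with a label bumped on every write, so two
    // collects that see identical labels saw an unchanged memory state.
    private static class Entry {
      final long label, value;
      Entry(long label, long value) { this.label = label; this.value = value; }
    }
    private final AtomicReferenceArray<Entry> reg;

    SimpleSnapshot(int n) {
      reg = new AtomicReferenceArray<>(n);
      for (int i = 0; i < n; i++) reg.set(i, new Entry(0, 0));
    }

    // Wait-free: a bounded number of steps, no retry loop.
    // Assumes only thread i calls update(i, ...), so the label increment is single-writer.
    public void update(int i, long v) {
      reg.set(i, new Entry(reg.get(i).label + 1, v));
    }

    // Obstruction-free: terminates whenever it runs alone long enough for two
    // consecutive collects to agree; concurrent updates can starve it.
    public long[] scan() {
      while (true) {
        Entry[] a = collect(), b = collect();
        boolean same = true;
        for (int i = 0; i < a.length; i++)
          if (a[i].label != b[i].label) { same = false; break; }
        if (same) {
          long[] result = new long[a.length];
          for (int i = 0; i < a.length; i++) result[i] = a[i].value;
          return result;
        }
      }
    }

    private Entry[] collect() {
      Entry[] copy = new Entry[reg.length()];
      for (int i = 0; i < reg.length(); i++) copy[i] = reg.get(i);
      return copy;
    }
  }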

123 Wait-free contains() a a b 1 d c e Use mark bit + list ordering
b 1 d c e Use mark bit + list ordering Not marked → in the set Marked or missing → not in the set
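A minimal sketch of what such a wait-free contains() looks like, assuming the lazy-list layout with keys kept sorted, a volatile marked bit per node, and sentinel nodes with -infinity/+infinity keys so the traversal always terminates (node layout and field names are illustrative):

  // Assumed node layout: class Node { int key; volatile boolean marked; volatile Node next; }
  public boolean contains(int key) {
    Node curr = head;              // head is the left sentinel
    while (curr.key < key)         // keys are sorted, so the tail sentinel stops us
      curr = curr.next;
    // unmarked and present => in the set; marked or missing => not in the set
    return curr.key == key && !curr.marked;
  }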

124 Lazy List-based Set Alg
b 1 d c e Combine blocking and non-blocking: deadlock-free add() and remove() and wait-free contains()

125 Lock-free List-Based Set
Logical Removal = Set Mark Bit a a b 1 c c e d mark and reference CASed together CAS will fail We say that "an alg is lock-free/starvation-free/.../wait-free" if all its methods together provide the given property". For individual methods, the rule should apply to calls of the given type of method only (as is currently defined). Thus, our version of Michael's lock-free linked list has a wait-free contains(), obstruction-free add() and remove(), and the algorithm as a whole is lock-free. What it means is that if you want to guarantee successful add() calls you need to add backoff on the remove() calls... Another example is the Obs-free snapshot consisting of two collects, snapshot only fails because updates succeed and yet we say its obs-free. So the snapshot algorithm as a whole is lock-free, updates are wait-free, and the snapshot method as a whole is obstruction-free. Lock-free add() and remove() and wait-free contains()
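In Java, "mark and reference CASed together" is what java.util.concurrent.atomic.AtomicMarkableReference provides. A minimal sketch of the removal step built on it (the node layout and method names are illustrative; the full retry loops for add(), remove() and find() are in the textbook):

  import java.util.concurrent.atomic.AtomicMarkableReference;

  class Node {
    final int key;
    final AtomicMarkableReference<Node> next;   // successor reference + mark bit in one atomic unit
    Node(int key, Node next) {
      this.key = key;
      this.next = new AtomicMarkableReference<>(next, false);
    }
  }

  // Logical removal: atomically set the mark on curr's next field. Any concurrent
  // CAS that expected the mark to be clear (e.g. an add() splicing in after curr)
  // now fails and must retry.
  boolean remove(Node pred, Node curr) {
    Node succ = curr.next.getReference();
    if (!curr.next.attemptMark(succ, true))     // fails if curr.next changed concurrently
      return false;                             // caller retries
    // Physical removal (best effort): unlink curr only if pred is still unmarked
    // and still points at curr.
    pred.next.compareAndSet(curr, succ, false, false);
    return true;
  }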

126 So how can this make sense?
Why have methods with different progress conditions? Let us try to understand this… Art of Multiprocessor Programming© Copyright Herlihy-Shavit 2007

127 Progress Conditions Deadlock-free: Starvation-free: Lock-free:
Some thread eventually acquires lock. Starvation-free: Every thread eventually acquires lock. Lock-free: Some method call returns. Wait-free: Every method call returns. Obstruction-free: Every method call returns if it executes in isolation

128 A “Periodic Table” of Progress Conditions
Non-Blocking Blocking All make progress Wait-free Obstruction-free Starvation-free Some make progress Lock-free Deadlock-free

129 More Formally Standard notion of abstract object
Progress conditions relate to method calls of an object A thread is active if it takes an infinite number of concrete (machine level) steps And is suspended if not.

130 Maximal vs. Minimal
Maximal vs. Minimal Minimal progress some call eventually completes System matters, not individuals Maximal progress every call eventually completes. Individuals matter In some sense, the weakest interesting notion of progress requires that the system as a whole continues to advance. Consider a fixed history H. A collection of methods of a given object provides minimal progress in H if, in every suffix of H, some pending active invocation of one of the methods in the collection has a matching response. In other words, there is no point in the history where all threads that called abstract methods in the collection take an infinite number of concrete steps without returning. This condition might, for example, be useful for a thread pool, where we care about advancing the overall computation, but do not care whether individual threads are underutilized. The strongest notion of progress, and arguably the one most programmers actually want, requires that each individual thread continues to advance. A collection of methods of a given object provides maximal progress in a history H if in every suffix of H, every pending active invocation of a method in the collection has a matching response. In other words, there is no point in the history where a thread that calls the abstract method in the collection takes an infinite number of concrete steps without returning. This condition might be useful for a web server, where each thread represents a customer request, and we care about advancing each individual computation. The condition is the difference between the requirements of a thread pool versus those of a web server. Flags courtesy of www.theodora.com/flags, used with permission

131 The “Periodic Table” of Progress Conditions
Non-Blocking Blocking Maximal progress Wait-free Obstruction-free Starvation-free Minimal progress Although these progress conditions may have seemed quite different, each provides either minimal or maximal progress with respect to some set of histories. The result is a simple and regular structure illustrated in the "periodic table" shown in the figure (and its more complete counterpart in a later figure). These observations may appear so simple as to be obvious in retrospect, but we have never seen them described in this way. There are three dividing lines, two vertical and one horizontal, that split the five conditions. The leftmost vertical line separates dependent conditions from the rest. The lock-free and wait-free properties apply to any histories, while obstruction-freedom, starvation-freedom, and deadlock-freedom require some kind of external scheduler support to guarantee progress. The rightmost vertical line separates the blocking and non-blocking conditions. The lock-free, wait-free, and obstruction-free conditions are non-blocking: if a suspended thread stops at an arbitrary point in a method call, at least some active threads can make progress. The deadlock-free and starvation-free conditions do not have this property. Finally, the horizontal line separates the minimal and maximal progress conditions. The minimal conditions guarantee the system as a whole makes progress while the maximal conditions guarantee that each thread makes progress. For brevity, minimal progress properties encompass the lock-free and deadlock-free properties, while maximal properties encompass the wait-free, starvation-free, and obstruction-free properties. Later we will see several ways to cross this line. One way is "helping" (for lack of space not included in this extended abstract), an algorithmic technique that has threads help others so each and every thread makes progress. However, in many cases, algorithms that employ helping are costly. An alternative and less costly approach is to make additional assumptions on scheduling. Lock-free Deadlock-free

132 The Scheduler’s Role Multiprocessor progress properties:
Are not about the guarantees a method's implementation provides. Are about scheduling needed to provide minimal or maximal progress. Thus, the various progress conditions are not about the progress guarantees their implementations must provide. All the properties in the table imply the same thing, maximal progress, yet they differ in the combination of scheduling assumptions necessary for an implementation to provide it. Put differently, programmers design lock-free, obstruction-free, or deadlock-free algorithms, but what they are implicitly assuming is that because of how schedulers on modern multiprocessors work, all method calls eventually complete as if they were wait-free.

133 Fair Scheduling A history is fair if each thread takes an infinite number of steps A method implementation is deadlock-free if it guarantees minimal progress in every fair history. The restriction to fair histories captures the informal requirement that each thread eventually leaves its critical section. The definition does not mention locks or critical sections because progress should be defined in terms of completed method calls, not low-level mechanisms. Moreover, as noted, not all deadlock-free object implementations will have easily recognizable locks and critical sections. The requirement that the implementation provide maximal progress in some fair history is intended to rule out certain pathological cases. For example, the first thread to access an object might lock it and never release the lock. Such an implementation guarantees minimal progress (for the thread holding the lock) in every fair execution, but does not provide maximal progress in any execution. Clearly, such an implementation would not be considered acceptable in practice and is of no interest to us.

134 Starvation Freedom A method implementation is starvation-free if it guarantees maximal progress in every fair history. Progress extends to an object by considering all its methods together.

135 Dependent Progress Dependent progress conditions Independent ones do.
Dependent progress conditions do not guarantee minimal progress in every history; independent ones do. The blocking progress conditions (deadlock-freedom, starvation-freedom) are dependent. We say that a progress condition is dependent if progress requires scheduler support

136 Non-blocking Independent Conditions
A lock-free method guarantees minimal progress in every history. A wait-free method guarantees maximal progress in every history. The restriction to fair histories captures the informal requirement that each thread eventually leaves its critical section. The definition does not mention locks or critical sections because progress should be defined in terms of completed method calls, not low-level mechanisms. Moreover, as noted, not all deadlock-free object implementations will have easily recognizable locks and critical sections. The requirement that the implementation provide maximal progress in some fair history is intended to rule out certain pathological cases. For example, the first thread to access an object might lock it and never release the lock. Such an implementation guarantees minimal progress (for the thread holding the lock) in every fair execution, but does not provide maximal progress in any execution. Clearly, such an implementation would not be considered acceptable in practice and is of no interest to us.

137 The “Periodic Table” of Progress Conditions
Non-Blocking Blocking Maximal progress Wait-free Obstruction-free Starvation-free Minimal progress On multiprocessors progress properties are not about the guarantees a method's implementation provides. Rather, they are about the assumptions one needs to make on the scheduler so that a method's implementation provides minimal or maximal progress. Lock-free Deadlock-free Dependent Independent

138 Uniformly Isolating Schedules
A history is uniformly isolating if any thread eventually runs by itself for “long enough” Modern systems do this with backoff, yield, etc. More on this in the next chapter, on spin locks

139 A Non-blocking Dependent Condition
A method implementation is obstruction-free if it guarantees maximal progress in every uniformly isolating history. The restriction to fair histories captures the informal requirement that each thread eventually leaves its critical section. The definition does not mention locks or critical sections because progress should be defined in terms of completed method calls, not low-level mechanisms. Moreover, as noted, not all deadlock-free object implementations will have easily recognizable locks and critical sections. The requirement that the implementation provide maximal progress in some fair history is intended to rule out certain pathological cases. For example, the first thread to access an object might lock it and never release the lock. Such an implementation guarantees minimal progress (for the thread holding the lock) in every fair execution, but does not provide maximal progress in any execution. Clearly, such an implementation would not be considered acceptable in practice and is of no interest to us.

140 The “Periodic Table” of Progress Conditions
Non-Blocking Blocking Uniform iso scheduler Maximal progress Wait-free Fair scheduler Obstruction-free Starvation-free In other words, there is no difference if we use blocking or non-blocking, they guarantee the same thing under the right scheduling assumptions. If I write a starvation-free algorithm I am assuming that I will get maximal progress, but that it will have to run on a machine with fair scheduling. Fair scheduler Minimal progress Lock-free Deadlock-free Independent Dependent

141 The “Periodic Table” of Progress Conditions
Non-Blocking Blocking Maximal progress Wait-free Obstruction-free Starvation-free In other words, there is no difference if we use blocking or non-blocking, they guarantee the same thing under the right scheduling assumptions. If I write a starvation-free algorithm I am assuming that I will get maximal progress, but that it will have to run on a machine with fair scheduling. Minimal progress Lock-free Clash-free ? Deadlock-free Independent Dependent

142 Clash-Freedom: the “Einsteinium” of Progress
A method implementation is clash-free if it guarantees minimal progress in every uniformly isolating history. Thm: clash-freedom is strictly weaker than obstruction-freedom Like Einsteinium, symbol Es, atomic number 99, it does not occur naturally in any measurable quantities and has no commercial value. In the full paper we will show that being clash-free is strictly weaker than being obstruction-free, a result omitted from this extended abstract for lack of space. Clash-freedom thus answers the open question raised by Herlihy, Luchangco, and Moir [6], whether obstruction-freedom is the weakest natural non-blocking progress condition. Unlike Einsteinium it is not radioactive, but like it, it is of no commercial importance…

143 Getting from Minimal to Maximal
Non-Blocking Blocking Maximal progress Wait-free Obstruction-free Starvation-free ? But helping is expensive In other words, there is no difference if we use blocking or non-blocking, they guarantee the same thing under the right scheduling assumptions. If I write a starvation-free algorithm I am assuming that I will get maximal progress, but that it will have to run on a machine with fair scheduling. Minimal progress Lock-free Clash-free ? Deadlock-free Helping Independent Dependent

144 Universal Constructions
Lock-free universal construction provides minimal progress A scheduler is benevolent if it guarantees maximal progress anyway Real-world OS schedulers are mostly benevolent They do not persecute any individual thread

145 Getting from Minimal to Maximal
Universal Lock-free Construction Getting from Minimal to Maximal Universal Wait-free Construction Non-Blocking Blocking Maximal progress Wait-free Obstruction-free Starvation-free ? For a one-time object like consensus, where each thread executes a method once, wait-free and lock-free are the same… In other words, there is no difference if we use blocking or non-blocking, they guarantee the same thing under the right scheduling assumptions. If I write a starvation-free algorithm I am assuming that I will get maximal progress, but that it will have to run on a machine with fair scheduling. Minimal progress Lock-free Clash-free ? Deadlock-free Helping Use Wait-free/Lock-free Consensus Objects Independent Dependent

146 Getting from Minimal to Maximal
Universal Wait-free Construction Non-Blocking Blocking Universal Lock-free Construction Maximal progress Wait-free Obstruction-free Starvation-free In other words, there is no difference if we use blocking or non-blocking, they guarantee the same thing under the right scheduling assumptions. If I write a starvation-free algorithm I am assuming that I will get maximal progress, but that it will have to run on a machine with fair scheduling. Minimal progress Lock-free Clash-free ? Deadlock-free If we use Starvation-free/Deadlock-free Consensus Objects the result is respectively Starvation-free/Deadlock-free Independent Dependent

147 Maximal Progress Postulate
Programmers want maximal progress. Methods’ progress conditions define What we expect from the scheduler For example Don’t halt in critical section Let me run in isolation long enough …

148 Art of Multiprocessor Programming
Why Lock-Free is OK We all want maximal progress Wait-free Yet we often write lock-free or deadlock-free lock-based algorithms OK if we expect the scheduler to be benevolent Often true (not always!) Art of Multiprocessor Programming

149 Shared-Memory Computability
10011 What is (and is not) concurrently computable Wait-free Atomic Registers Lock-free/Wait-free Hierarchy and Universal Constructions In the same way, we make little attempt to make our initial constructions efficient. We are interested in understanding whether such constructions exist, and how they work, but they are not intended to be a practical model for computation, so we prefer easy-to-understand but inefficient constructions over complicated but efficient ones.

150 Troubling Intellectual Question…
I think I think, therefore I think I am (Ambrose Bierce) Why use non-blocking lock-free and wait-free conditions when most code uses locks?

151 The Answer Not about being non-blocking… About being independent!
Do not rely on the good behavior of the scheduler.

152 Reads and Writes Infinite tape Finite State Controller
By Analogy to Church-Turing Reads and Writes Infinite tape Finite State Controller Using a dependent condition is like relying on an oracle to recognize languages… The dependency masks the true power of the concurrent object… The classical theory of sequential computing proceeds in stages. It starts with finite-state automata, moves on to push-down automata, and culminates in Turing Machines. A Turing machine is an idealized model of computation, consisting of a finite-state controller and an infinite tape. It is safe to say that anything you can’t do on a Turing Machine is something you can’t do period. If you can devise a Turing machine program to do something, it doesn’t mean you can do it in a practical sense, but it gives you a place to start working on the problem.

153 Shared-Memory Computability
10011 Independent progress: use Lock-free and Wait-free Memory Hierarchy and Universal Constructions In the same way, we make little attempt to make our initial constructions efficient. We are interested in understanding whether such constructions exist, and how they work, but they are not intended to be a practical model for computation, so we prefer easy-to-understand but inefficient constructions over complicated but efficient ones.

154 Programmers Expect the Best
Programmers expect maximal progress. Progress conditions define scheduler requirements necessary to achieve it.

155 This Concludes Our Course Material
Principles: Mathematical foundations of multicore programming Practice: How multicore software and the architectures it runs on embody these principles Art of Multiprocessor Programming

156 Art of Multiprocessor Programming
          This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License. You are free: to Share — to copy, distribute and transmit the work to Remix — to adapt the work Under the following conditions: Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work). Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights. Art of Multiprocessor Programming

