1
Barrier Synchronization
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
2
Simple Video Game
Prepare frame for display
  By graphics coprocessor
“Soft real-time” application
  Need at least 35 frames/second
  OK to mess up rarely
3
Simple Video Game
while (true) {
  frame.prepare();
  frame.display();
}
What about overlapping work?
  1st thread displays frame
  2nd prepares next frame
5
Two-Phase Rendering
Display thread:
while (true) {
  if (phase) {
    frame[0].display();
  } else {
    frame[1].display();
  }
  phase = !phase;
}
Preparation thread:
while (true) {
  if (phase) {
    frame[1].prepare();
  } else {
    frame[0].prepare();
  }
  phase = !phase;
}
The threads alternate between even phases and odd phases; in each phase one frame is displayed while the other is prepared.
8
Synchronization Problems
How do threads stay in phase?
Too early?
  “We render no frame before its time”
Too late?
  Recycle memory before frame is displayed
9
Ideal Parallel Computation
[Figure: all threads work through phase 1 and then phase 2 in lockstep.]
11
Real-Life Parallel Computation
[Figure: one thread stalls (zzz…) in phase 1 while another has already moved on to phase 2. Uh, oh.]
13
Barrier Synchronization
[Figure: a barrier separating phase 1 from the next phase.]
No thread enters the next phase until every thread has left the previous one.
16
Why Do We Care?
Mostly of interest to
  Scientific & numeric computation
Elsewhere
  Garbage collection
  Less common in systems programming
Still important topic
17
Duality
Dual to mutual exclusion
  Include others, not exclude them
Same implementation issues
  Interaction with caches …
    Invalidation?
    Local spinning?
18
Example: Parallel Prefix
before: a    b      c        d
after:  a    a+b    a+b+c    a+b+c+d
19
Parallel Prefix
One thread per entry: a, b, c, d
20
Parallel Prefix: Phase 1
Entries after phase 1: a, a+b, b+c, c+d
21
Parallel Prefix: Phase 2
Entries after phase 2: a, a+b, a+b+c, a+b+c+d
22
Parallel Prefix
N threads can compute
  Parallel prefix
  Of N entries
  In log₂ N rounds
What if system is asynchronous?
  Why we need barriers
23
Prefix
class Prefix extends Thread {
  int[] a;    // array of input values
  int i;      // thread index
  Barrier b;  // shared barrier
  // initialize fields
  Prefix(int[] a, Barrier b, int i) {
    this.a = a;
    this.b = b;
    this.i = i;
  }
28
Where Do the Barriers Go?
  public void run() {
    int d = 1, sum = 0;
    while (d < N) {
      if (i >= d)
        sum = a[i-d];
      b.await();  // make sure everyone reads before anyone writes
      if (i >= d)
        a[i] += sum;
      b.await();  // make sure everyone writes before anyone reads
      d = d * 2;
    }
  }
}
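A minimal runnable sketch of the prefix computation above. It assumes java.util.concurrent.CyclicBarrier as a stand-in for the slides' Barrier class; PrefixDemo and parallelPrefix are hypothetical names, and the input array is assumed to be non-empty.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Hypothetical demo class; CyclicBarrier stands in for the slides' Barrier.
class PrefixDemo {
    // Computes all prefix sums with one thread per entry (input must be non-empty).
    static int[] parallelPrefix(int[] input) {
        final int n = input.length;
        final int[] a = input.clone();
        final CyclicBarrier barrier = new CyclicBarrier(n);
        Thread[] workers = new Thread[n];
        for (int t = 0; t < n; t++) {
            final int i = t;
            workers[t] = new Thread(() -> {
                try {
                    for (int d = 1; d < n; d *= 2) {
                        int sum = (i >= d) ? a[i - d] : 0;
                        barrier.await(); // everyone reads before anyone writes
                        a[i] += sum;
                        barrier.await(); // everyone writes before anyone reads
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parallelPrefix(new int[] {1, 2, 3, 4})));
    }
}
```

Removing either await() call lets a fast thread read or overwrite an entry that a slower neighbor has not finished with, which is the asynchrony problem the slides describe.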
31
Barrier Implementations
Cache coherence
  Spin on locally-cached locations?
  Spin on statically-defined locations?
Latency
  How many steps?
Symmetry
  Do all threads do the same thing?
32
Barriers
public class Barrier {
  AtomicInteger count;  // number of threads not yet arrived
  int size;             // number of threads participating
  // initialization
  public Barrier(int n) {
    count = new AtomicInteger(n);
    size = n;
  }
  // principal method
  public void await() {
    if (count.getAndDecrement() == 1) {
      count.set(size);           // if I’m last, reset fields for next time
    } else {
      while (count.get() != 0);  // otherwise, wait for everyone else
    }
  }
}
What’s wrong with this protocol?
40
Reuse
Barrier b = new Barrier(n);
while (mumble()) {
  work();     // do work
  b.await();  // synchronize
}             // repeat
41
Barriers
Reusing the barrier above for a second phase goes wrong:
  A thread spins in await(), waiting for Phase 1 to finish
  The last thread arrives: Phase 1 is so over
  The last thread resets count to prepare for Phase 2, while the waiting thread is still asleep (ZZZZZ….)
  Uh-oh: one thread is now waiting for Phase 2 to finish while another is still waiting for Phase 1 to finish
46
Basic Problem
One thread “wraps around” to start phase 2
While another thread is still waiting for phase 1
One solution:
  Always use two barriers
47
Sense-Reversing Barriers
public class Barrier {
  AtomicInteger count;
  int size;
  volatile boolean sense = false;          // completed odd or even-numbered phase?
  threadSense = new ThreadLocal<Boolean>…  // store sense for next phase
  public void await() {
    boolean mySense = threadSense.get();   // get new sense determined by last phase
    if (count.getAndDecrement() == 1) {
      count.set(size);
      sense = mySense;                     // if I’m last, reverse sense for next time
    } else {
      while (sense != mySense) {}          // otherwise, wait for sense to flip
    }
    threadSense.set(!mySense);             // prepare sense for next phase
  }
}
The sense field must be volatile because of the Java memory model, so that the JIT does not remove the loop spinning on sense.
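A runnable sketch of the sense-reversing barrier, with the elided ThreadLocal filled in under an assumption: each thread's sense starts as true, the opposite of the shared sense. The class names SenseBarrier and SenseBarrierDemo and the Thread.yield() in the spin loop are illustrative choices, not part of the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical name; follows the slide's Barrier with the elided parts assumed.
class SenseBarrier {
    private final AtomicInteger count;
    private final int size;
    private volatile boolean sense = false;
    // Assumption: every thread's sense starts as the opposite of the shared sense.
    private final ThreadLocal<Boolean> threadSense = ThreadLocal.withInitial(() -> true);

    SenseBarrier(int n) {
        count = new AtomicInteger(n);
        size = n;
    }

    void await() {
        boolean mySense = threadSense.get();
        if (count.getAndDecrement() == 1) { // last thread to arrive
            count.set(size);                // reset the counter before releasing anyone
            sense = mySense;                // flip the shared sense: releases the waiters
        } else {
            while (sense != mySense) {
                Thread.yield();             // spin until the last thread flips sense
            }
        }
        threadSense.set(!mySense);          // prepare for the next phase
    }
}

class SenseBarrierDemo {
    // Returns true if, after every await(), each thread saw all others in its phase or later.
    static boolean run(int threads, int phases) {
        SenseBarrier barrier = new SenseBarrier(threads);
        AtomicIntegerArray arrived = new AtomicIntegerArray(threads);
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int me = t;
            workers[t] = new Thread(() -> {
                for (int p = 1; p <= phases; p++) {
                    arrived.set(me, p);     // "work" for phase p
                    barrier.await();
                    for (int j = 0; j < threads; j++) {
                        if (arrived.get(j) < p) ok.set(false);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return ok.get();
    }
}
```

Because the counter is reset before the shared sense flips, a fast thread that re-enters await() for the next phase finds a full counter and spins on the opposite sense, so it cannot be confused with a thread still leaving the previous phase.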
54
Combining Tree Barriers
56
Combining Tree Barrier
public class Node {
  AtomicInteger count;
  int size;
  Node parent;             // parent barrier in tree
  volatile boolean sense;
  public void await() {…
    if (count.getAndDecrement() == 1) {  // am I last?
      if (parent != null)
        parent.await();                  // proceed to parent barrier
      count.set(size);                   // prepare for next phase
      sense = mySense;                   // notify others at this node
    } else {
      while (sense != mySense) {}        // I’m not last, so wait for notification
    }
  …}
}
63
Combining Tree Barrier
No sequential bottleneck
  Parallel getAndDecrement() calls
Low memory contention
  Same reason
Cache behavior
  Local spinning on bus-based architecture
  Not so good for NUMA
64
Remarks
Everyone spins on sense field
  Local spinning on bus-based (good)
  Network hot-spot on distributed architecture (bad)
Not really scalable
65
Tournament Tree Barrier
If tree nodes have fan-in 2
  Don’t need to call getAndDecrement()
  Winner chosen statically
At level i
  If i-th bit of id is 0, move up
  Otherwise keep back
66
Tournament Tree Barriers
[Figure: a binary tree with a root; at each node one thread is the winner and the other the loser.]
1. All flags blue
2. Loser thread sets winner’s flag
3. Loser spins on own flag
4. Winner spins on own flag
5. Winner sees own flag, moves up, spins
6. Bingo!
7. Sense-reversing: next time use blue flags
74
Tournament Barrier
class TBarrier {
  volatile boolean flag;  // notifications delivered here
  TBarrier partner;       // other thread at same level
  TBarrier parent;        // parent (winner) or null (loser)
  boolean top;            // am I the root?
  …
}
79
Tournament Barrier
void await(boolean mySense) {     // current sense
  if (top) {                      // le root, c’est moi
    return;
  } else if (parent != null) {    // I am already a winner
    while (flag != mySense) {};   // wait for partner
    parent.await(mySense);        // synchronize upstairs
    partner.flag = mySense;       // inform partner; order is important (why?)
  } else {                        // natural-born loser
    partner.flag = mySense;       // tell partner I’m here
    while (flag != mySense) {};   // wait for notification from partner
  }
}
89
Remarks
No need for read-modify-write calls
Each thread spins on fixed location
  Good for bus-based architectures
  Good for NUMA architectures
90
Dissemination Barrier
At round i
  Thread A notifies thread A + 2^i (mod n)
Requires log n rounds
91
Dissemination Barrier
[Figure: in successive rounds, each thread notifies the thread at offset +1, then +2, then +4.]
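A small sketch of the notification pattern only, not a full barrier. The class Dissemination and its methods are hypothetical names, and allInformed models each round as if it happened in lockstep in order to check that ceil(log₂ n) rounds are enough for every thread to hear, directly or indirectly, from every other thread.

```java
// Hypothetical helper class: models who notifies whom in a dissemination barrier.
class Dissemination {
    // Number of rounds: the smallest r with 2^r >= n.
    static int rounds(int n) {
        int r = 0;
        while ((1 << r) < n) r++;
        return r;
    }

    // The thread that thread a notifies in round i: a + 2^i (mod n).
    static int partner(int a, int i, int n) {
        return (a + (1 << i)) % n;
    }

    // Simulates the rounds and checks that every thread ends up knowing
    // that every other thread has arrived.
    static boolean allInformed(int n) {
        boolean[][] knows = new boolean[n][n];
        for (int a = 0; a < n; a++) knows[a][a] = true;
        for (int i = 0; i < rounds(n); i++) {
            // Copy first, so that a round only forwards what was known at its start.
            boolean[][] next = new boolean[n][];
            for (int a = 0; a < n; a++) next[a] = knows[a].clone();
            for (int a = 0; a < n; a++) {
                int target = partner(a, i, n);
                for (int k = 0; k < n; k++) {
                    if (knows[a][k]) next[target][k] = true;
                }
            }
            knows = next;
        }
        for (boolean[] row : knows) {
            for (boolean b : row) {
                if (!b) return false;
            }
        }
        return true;
    }
}
```

After round i each thread knows about 2^(i+1) consecutive threads ending at itself (counting backwards mod n), which is why the number of rounds grows only logarithmically with n.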
92
Remarks
Elegant
Good source of homework problems
Not cache-friendly
93
Ideas So Far
Sense-reversing
  Reuse without reinitializing
Combining tree
  Like counters, locks …
Tournament tree
  Optimized combining tree
Dissemination barrier
  Intellectually pleasing (matter of taste)
94
Which is best for Multicore?
On a cache-coherent multicore chip: perhaps none of the above…
Here is another (arguably) better algorithm …
95
Static Tree Barrier
One node per thread, statically assigned
Sense-reversing flag
Node has count of missing children
Spin until zero …
My counter is zero, decrement parent
Spin on done flag
[Figure sequence: the counters in the nodes drop as children arrive; once the root's counter reaches zero (yowzah!), the waiting threads are released (yes!) and the counters return to their initial values.]
Note: We describe the static tree using counters in the nodes, but one could use a simple array of locations, each written by a child and all spun on (repeatedly read) by the parent.
111
Remarks
Very little cache traffic
Minimal space overhead
On message-passing architecture
  Send notification & sense down tree
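A sketch of a static tree barrier along the lines described above, under several assumptions that are not in the slides: the tree is binary and laid out like a heap (the parent of thread id is (id - 1) / 2), each thread passes its own id to await, and the per-thread sense is kept in an array slot touched only by its owner. StaticTreeBarrier and StaticTreeDemo are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical sketch: binary static tree, one node per thread id in 0..n-1.
class StaticTreeBarrier {
    private final AtomicInteger[] childCount; // children that have not yet arrived
    private final int[] children;             // fixed number of children per node
    private final boolean[] mySense;          // per-thread sense, used only by its owner
    private volatile boolean sense = false;   // shared sense-reversing flag

    StaticTreeBarrier(int n) {
        childCount = new AtomicInteger[n];
        children = new int[n];
        mySense = new boolean[n];
        for (int i = 0; i < n; i++) {
            int c = 0;
            if (2 * i + 1 < n) c++;
            if (2 * i + 2 < n) c++;
            children[i] = c;
            childCount[i] = new AtomicInteger(c);
            mySense[i] = true;
        }
    }

    void await(int id) {
        boolean s = mySense[id];
        while (childCount[id].get() > 0) Thread.yield(); // wait for my whole subtree
        childCount[id].set(children[id]);                 // reset for the next phase
        if (id != 0) {
            childCount[(id - 1) / 2].getAndDecrement();  // tell my parent I am done
            while (sense != s) Thread.yield();            // wait for the root's release
        } else {
            sense = s;                                    // root: release everyone
        }
        mySense[id] = !s;                                 // prepare for the next phase
    }
}

class StaticTreeDemo {
    // Returns true if, after every await(), each thread saw all others in its phase or later.
    static boolean run(int threads, int phases) {
        StaticTreeBarrier barrier = new StaticTreeBarrier(threads);
        AtomicIntegerArray arrived = new AtomicIntegerArray(threads);
        AtomicBoolean ok = new AtomicBoolean(true);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int me = t;
            workers[t] = new Thread(() -> {
                for (int p = 1; p <= phases; p++) {
                    arrived.set(me, p);
                    barrier.await(me);
                    for (int j = 0; j < threads; j++) {
                        if (arrived.get(j) < p) ok.set(false);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return ok.get();
    }
}
```

Each node resets its own counter before notifying its parent, so every counter is back at its initial value by the time the root flips the shared flag and threads begin the next phase.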
112
The Nature of Progress*
Some of the material in this lecture appears in Chapter 3 of the textbook. The rest appears in the article “On the nature of progress” (Herlihy and Shavit, 2008), which can be found at http: //
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
113
Concurrent Programming
Many real-world data structures have
  blocking (lock-based) implementations &
  non-blocking (no locks) implementations
For example: linked lists, queues, stacks, hash maps,…
Does this make sense?
114
Concurrent Programming
Many data structures combine blocking & non-blocking methods
Java™ concurrency package
  Skiplists, hash tables, exchangers
  On 10 million desktops
Can seemingly contradictory conditions co-exist in the same algorithm?…
115
Progress Conditions
Deadlock-free: Some thread eventually acquires lock.
Starvation-free: Every thread eventually acquires lock.
Lock-free: Some method call returns.
Wait-free: Every method call returns.
Obstruction-free: Every method call returns if it executes in isolation.
We will show an example shortly.
116
List-Based Sets
Unordered collection of elements
No duplicates
Methods
  add() a new element
  remove() an element
  contains() if element is present
117
Coarse Grained Locking
[Figure: linked list.]
Lock is starvation-free: every attempt to acquire the lock eventually succeeds.
118
Fine Grained (Lock Coupling)
[Figure: linked list.]
Overlapping locks detect overlapping operations
Deadlock-free: some thread eventually acquires lock.
119
Optimistic Fine Grained
[Figure: linked list.]
add(), remove(), contains() lock destination nodes in order
Deadlock-free: some thread trying to acquire the locks eventually succeeds.
120
Obstruction-free contains()
[Figure: linked list.]
Snapshot: if all nodes traversed twice are the same
Obstruction-free: the method returns if it executes in isolation for long enough.
121
The Simple Snapshot is Obstruction-Free
Put increasing labels on each entry
Collect twice
If both agree,
  We’re done
Otherwise,
  Try again
[Figure: Collect1 and Collect2 both read the labels 1, 22, 7, 13, 18, 12, so they agree.]
Recall: if none of the labels (timestamps) changed, then there was a point, after the end of the first collect and before the start of the next collect, at which none of the registers were written to. The values collected correspond to the values that were all together in memory at that point in time.
122
Obstruction-freedom
In the simple snapshot algorithm:
  The update method is wait-free
  But scan is obstruction-free
    Completes if it executes in isolation (no concurrent updates).
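A minimal sketch of the collect-twice snapshot, assuming one writer thread per register so that update can read the old label and write the next one without a lock. The names SimpleSnapshot and Stamped are hypothetical, and the registers hold plain int values for brevity.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: labeled registers with a double-collect scan.
class SimpleSnapshot {
    // An immutable value paired with an increasing label (timestamp).
    private static final class Stamped {
        final long label;
        final int value;
        Stamped(long label, int value) {
            this.label = label;
            this.value = value;
        }
    }

    private final AtomicReferenceArray<Stamped> regs;

    SimpleSnapshot(int n) {
        regs = new AtomicReferenceArray<>(n);
        for (int i = 0; i < n; i++) regs.set(i, new Stamped(0, 0));
    }

    // Wait-free: one read and one write. Assumes register i has a single writer.
    void update(int i, int value) {
        Stamped old = regs.get(i);
        regs.set(i, new Stamped(old.label + 1, value));
    }

    // Reads every register once.
    private Stamped[] collect() {
        Stamped[] copy = new Stamped[regs.length()];
        for (int j = 0; j < copy.length; j++) copy[j] = regs.get(j);
        return copy;
    }

    // Obstruction-free: returns once two consecutive collects carry the same labels,
    // which is guaranteed only if updates stop for long enough.
    int[] scan() {
        Stamped[] old = collect();
        while (true) {
            Stamped[] cur = collect();
            boolean same = true;
            for (int j = 0; j < cur.length; j++) {
                if (old[j].label != cur[j].label) {
                    same = false;
                    break;
                }
            }
            if (same) {
                int[] values = new int[cur.length];
                for (int j = 0; j < cur.length; j++) values[j] = cur[j].value;
                return values;
            }
            old = cur; // some register changed: try again
        }
    }
}
```

The scan loop has no bound on its number of retries: a steady stream of updates can keep the two collects from ever agreeing, which is why scan is only obstruction-free while update is wait-free.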
123
Wait-free contains()
[Figure: linked list with a marked node.]
Use mark bit + list ordering
  Not marked: in the set
  Marked or missing: not in the set
124
Lazy List-based Set Algorithm
[Figure: linked list with a marked node.]
Combine blocking and non-blocking: deadlock-free add() and remove() and wait-free contains()
125
Lock-free List-Based Set
Logical removal = set mark bit
[Figure: linked list with a marked node; mark and reference are CASed together, so the CAS will fail.]
Lock-free add() and remove() and wait-free contains()
Note: We say that an algorithm is lock-free/starvation-free/.../wait-free if all its methods together provide the given property. For individual methods, the rule should apply to calls of the given type of method only (as is currently defined). Thus, our version of Michael's lock-free linked list has a wait-free contains() and obstruction-free add() and remove(), and the algorithm as a whole is lock-free. What this means is that if you want to guarantee successful add() calls, you need to add backoff on the remove() calls. Another example is the obstruction-free snapshot consisting of two collects: the snapshot only fails because updates succeed, and yet we say it is obstruction-free. So the snapshot algorithm as a whole is lock-free, updates are wait-free, and the snapshot method is obstruction-free.
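A short sketch of why the CAS fails once a node is logically removed, using java.util.concurrent.atomic.AtomicMarkableReference to update the mark and the reference together. MarkDemo, Node, logicallyRemove and insertAfter are hypothetical names; this is not the full list algorithm.

```java
import java.util.concurrent.atomic.AtomicMarkableReference;

// Hypothetical sketch: a list node whose next field carries a mark bit.
class MarkDemo {
    static final class Node {
        final int key;
        final AtomicMarkableReference<Node> next;
        Node(int key, Node succ) {
            this.key = key;
            this.next = new AtomicMarkableReference<>(succ, false);
        }
    }

    // Logical removal: set the mark bit on the victim's next field,
    // leaving the reference itself unchanged.
    static boolean logicallyRemove(Node victim) {
        Node succ = victim.next.getReference();
        return victim.next.compareAndSet(succ, succ, false, true);
    }

    // Tries to link a new node between pred and expectedSucc. The CAS expects
    // an unmarked pred, so it fails if pred was logically removed meanwhile.
    static boolean insertAfter(Node pred, Node expectedSucc, int key) {
        Node node = new Node(key, expectedSucc);
        return pred.next.compareAndSet(expectedSucc, node, false, false);
    }
}
```

For a list a → b → c, marking b and then calling insertAfter(b, c, key) returns false even though b still points to c, because the expected mark no longer matches.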
126
So how can this make sense?
Why have methods with different progress conditions?
Let us try to understand this…
© Copyright Herlihy-Shavit 2007
127
Progress Conditions
Deadlock-free: Some thread eventually acquires lock.
Starvation-free: Every thread eventually acquires lock.
Lock-free: Some method call returns.
Wait-free: Every method call returns.
Obstruction-free: Every method call returns if it executes in isolation.
128
A “Periodic Table” of Progress Conditions
                     Non-Blocking                     Blocking
All make progress    Wait-free    Obstruction-free    Starvation-free
Some make progress   Lock-free                        Deadlock-free
129
More Formally
Standard notion of abstract object
Progress conditions relate to method calls of an object
A thread is active if it takes an infinite number of concrete (machine-level) steps
And is suspended if not.
130
Maximal vs. Minimal
Minimal progress
  Some call eventually completes
  System matters, not individuals
Maximal progress
  Every call eventually completes
  Individuals matter
Flags courtesy of www.theodora.com/flags, used with permission
Notes: In some sense, the weakest interesting notion of progress requires that the system as a whole continues to advance. Consider a fixed history H. A collection of methods of a given object provides minimal progress in H if, in every suffix of H, some pending active invocation of one of the methods in the collection has a matching response. In other words, there is no point in the history where all threads that called abstract methods in the collection take an infinite number of concrete steps without returning. This condition might, for example, be useful for a thread pool, where we care about advancing the overall computation but do not care whether individual threads are underutilized.
The strongest notion of progress, and arguably the one most programmers actually want, requires that each individual thread continues to advance. A collection of methods of a given object provides maximal progress in a history H if, in every suffix of H, every pending active invocation of a method in the collection has a matching response. In other words, there is no point in the history where a thread that calls an abstract method in the collection takes an infinite number of concrete steps without returning. This condition might be useful for a web server, where each thread represents a customer request and we care about advancing each individual computation. The difference between the two conditions is the difference between the requirements of a thread pool and those of a web server.
131
The “Periodic Table” of Progress Conditions
Maximal progress: wait-free and obstruction-free (non-blocking); starvation-free (blocking).
Minimal progress: lock-free (non-blocking); deadlock-free (blocking).
Although these progress conditions may have seemed quite different, each provides either minimal or maximal progress with respect to some set of histories. The result is a simple and regular structure illustrated in the ``periodic table'' shown in Figure~\ref{figure:progress} (and its more complete counterpart in Figure~\ref{figure:clash}). These observations may appear so simple as to be obvious in retrospect, but we have never seen them described in this way.
There are three dividing lines, two vertical and one horizontal, that split the five conditions. The leftmost vertical line separates dependent conditions from the rest. The lock-free and wait-free properties apply to any histories, while obstruction-freedom, starvation-freedom, and deadlock-freedom require some kind of external scheduler support to guarantee progress. The rightmost vertical line separates the blocking and non-blocking conditions. The lock-free, wait-free, and obstruction-free conditions are non-blocking: if a suspended thread stops at an arbitrary point in a method call, at least some active threads can make progress. The deadlock-free and starvation-free conditions do not have this property.
Finally, the horizontal line separates the minimal and maximal progress conditions. The minimal conditions guarantee that the system as a whole makes progress, while the maximal conditions guarantee that each thread makes progress. For brevity, \emph{minimal} progress properties encompass the lock-free and deadlock-free properties, while \emph{maximal} properties encompass the wait-free, starvation-free, and obstruction-free properties. Later we will see several ways to cross this line. One way is ``helping'' (for lack of space not included in this extended abstract), an algorithmic technique that has threads help others so each and every thread makes progress. However, in many cases, algorithms that employ helping are costly. An alternative and less costly approach is to make additional assumptions on scheduling.
132
The Scheduler’s Role
Multiprocessor progress properties are not about the guarantees a method's implementation provides. They are about the scheduling needed to provide minimal or maximal progress.
Thus, the various progress conditions are not about the progress guarantees their implementations must provide. All the properties in the table imply the same thing, maximal progress, yet they differ in the combination of scheduling assumptions necessary for an implementation to provide it. Put differently, programmers design lock-free, obstruction-free, or deadlock-free algorithms, but what they are implicitly assuming is that, because of how schedulers on modern multiprocessors work, all method calls eventually complete as if they were wait-free.
133
Fair Scheduling
A history is fair if each thread takes an infinite number of steps. A method implementation is deadlock-free if it guarantees minimal progress in every fair history.
The restriction to fair histories captures the informal requirement that each thread eventually leaves its critical section. The definition does not mention locks or critical sections because progress should be defined in terms of completed method calls, not low-level mechanisms. Moreover, as noted, not all deadlock-free object implementations will have easily recognizable locks and critical sections.
The additional requirement, part of the full definition, that the implementation provide maximal progress in some fair history is intended to rule out certain pathological cases. For example, the first thread to access an object might lock it and never release the lock. Such an implementation guarantees minimal progress (for the thread holding the lock) in every fair execution, but does not provide maximal progress in any execution. Clearly, such an implementation would not be considered acceptable in practice and is of no interest to us.
134
Starvation Freedom
A method implementation is starvation-free if it guarantees maximal progress in every fair history. Progress extends to an object by considering all its methods together.
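As a rough illustration of the two blocking conditions, the sketch below contrasts a test-and-set spin lock with a ticket lock. Under fair scheduling the first is deadlock-free (some spinning thread always gets the lock) but a particular thread can lose every race, while the second serves threads in arrival order and so is starvation-free. The names TasLock, TicketLock, and LockDemo are illustrative assumptions and do not come from the slides.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Deadlock-free under fair scheduling: whenever the lock is free, some
// spinning thread acquires it, but a given thread may lose every race.
class TasLock {
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        while (held.getAndSet(true)) {
            Thread.yield(); // let other threads run while we spin
        }
    }

    void unlock() {
        held.set(false);
    }
}

// Starvation-free under fair scheduling: tickets are served in FIFO order,
// so every thread that takes a ticket eventually gets its turn.
class TicketLock {
    private final AtomicInteger nextTicket = new AtomicInteger(0);
    private final AtomicInteger nowServing = new AtomicInteger(0);

    void lock() {
        int myTicket = nextTicket.getAndIncrement();
        while (nowServing.get() != myTicket) {
            Thread.yield();
        }
    }

    void unlock() {
        nowServing.incrementAndGet();
    }
}

class LockDemo {
    // Runs `threads` workers that each bump a shared counter `iters` times
    // while holding the lock, and returns the final count.
    static int count(Runnable lock, Runnable unlock, int threads, int iters) {
        int[] shared = {0};
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    lock.run();
                    try {
                        shared[0]++;
                    } finally {
                        unlock.run();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                throw new RuntimeException(e);
            }
        }
        return shared[0];
    }

    public static void main(String[] args) {
        TasLock tas = new TasLock();
        TicketLock ticket = new TicketLock();
        System.out.println(count(tas::lock, tas::unlock, 4, 10_000));
        System.out.println(count(ticket::lock, ticket::unlock, 4, 10_000));
    }
}
```

Both locks are dependent in the sense of these slides: if the scheduler halts the lock holder, no other thread completes its call.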
135
Dependent Progress
Dependent progress conditions do not guarantee minimal progress in every history. Independent ones do. The blocking progress conditions, deadlock-freedom and starvation-freedom, are dependent. We say that a progress condition is dependent if progress requires scheduler support.
136
Non-blocking Independent Conditions
A lock-free method guarantees minimal progress in every history. A wait-free method guarantees maximal progress in every history.
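A minimal sketch of the distinction, assuming a shared counter as the object: CasCounter retries a compare-and-set until it succeeds, which is lock-free because a failed attempt means some other thread's attempt succeeded. SlotCounter gives each thread its own slot, so every call finishes in a bounded number of the caller's own steps, which is wait-free. Both class names and the per-thread index parameter are assumptions made for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Lock-free: minimal progress in every history. A failed compareAndSet
// implies another thread's increment completed, so the system advances,
// but this particular caller could in principle retry forever.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    int increment() {
        while (true) {
            int seen = value.get();
            if (value.compareAndSet(seen, seen + 1)) {
                return seen + 1;
            }
        }
    }

    int get() {
        return value.get();
    }
}

// Wait-free: maximal progress in every history. Each thread writes only
// its own slot, so no call ever waits for or retries against another thread.
class SlotCounter {
    private final AtomicIntegerArray slots;

    SlotCounter(int threads) {
        slots = new AtomicIntegerArray(threads);
    }

    // `me` is the caller's thread index in [0, threads); single writer per slot.
    void increment(int me) {
        slots.set(me, slots.get(me) + 1);
    }

    // Sums all slots in a number of steps bounded by the slot count.
    int get() {
        int sum = 0;
        for (int i = 0; i < slots.length(); i++) {
            sum += slots.get(i);
        }
        return sum;
    }
}

class CounterDemo {
    public static void main(String[] args) {
        CasCounter cas = new CasCounter();
        cas.increment();
        SlotCounter slot = new SlotCounter(2);
        slot.increment(0);
        slot.increment(1);
        System.out.println(cas.get() + " " + slot.get());
    }
}
```

Neither counter depends on the scheduler: suspending any thread at any point leaves the other threads able to complete their calls.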
137
The “Periodic Table” of Progress Conditions
Maximal progress: wait-free (non-blocking, independent); obstruction-free (non-blocking, dependent); starvation-free (blocking, dependent).
Minimal progress: lock-free (non-blocking, independent); deadlock-free (blocking, dependent).
On multiprocessors, progress properties are not about the guarantees a method's implementation provides. Rather, they are about the assumptions one needs to make on the scheduler so that a method's implementation provides minimal or maximal progress.
138
Uniformly Isolating Schedules
A history is uniformly isolating if any thread eventually runs by itself for “long enough”. Modern systems do this with backoff, yield, etc. This is covered later, in the next chapter on spin locks.
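The backoff idea mentioned above can be sketched as follows: after a failed attempt, a thread pauses for a random time whose upper bound doubles on each failure, which makes it more likely that some contender gets to run alone for long enough. This is an illustrative sketch only; the Backoff and BackoffCounter names, the nanosecond bounds, and the use of LockSupport.parkNanos for the pause are assumptions, not something the slides prescribe.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Randomized exponential backoff; one instance per thread (not thread-safe).
class Backoff {
    private final int maxNanos;
    private int limitNanos;

    Backoff(int minNanos, int maxNanos) {
        if (minNanos < 1 || maxNanos < minNanos) {
            throw new IllegalArgumentException("need 1 <= minNanos <= maxNanos");
        }
        this.limitNanos = minNanos;
        this.maxNanos = maxNanos;
    }

    // Pause for a random delay in [1, limit], then double the limit up to the cap.
    void pause() {
        int delay = ThreadLocalRandom.current().nextInt(limitNanos) + 1;
        limitNanos = (int) Math.min((long) maxNanos, 2L * limitNanos);
        LockSupport.parkNanos(delay);
    }

    int currentLimit() {
        return limitNanos;
    }
}

class BackoffCounter {
    // CAS retry loop that steps aside after each failure, so that
    // contending threads are less likely to keep colliding.
    static int increment(AtomicInteger value) {
        Backoff backoff = new Backoff(1_000, 1_000_000);
        while (true) {
            int seen = value.get();
            if (value.compareAndSet(seen, seen + 1)) {
                return seen + 1;
            }
            backoff.pause();
        }
    }

    public static void main(String[] args) {
        AtomicInteger v = new AtomicInteger(0);
        System.out.println(increment(v));
    }
}
```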
139
A Non-blocking Dependent Condition
A method implementation is obstruction-free if it guarantees maximal progress in every uniformly isolating history.
140
The “Periodic Table” of Progress Conditions
Maximal progress: wait-free (non-blocking, independent); obstruction-free (non-blocking, dependent, needs a uniformly isolating scheduler); starvation-free (blocking, dependent, needs a fair scheduler).
Minimal progress: lock-free (non-blocking, independent); deadlock-free (blocking, dependent, needs a fair scheduler).
In other words, it makes no difference whether we use blocking or non-blocking conditions: they guarantee the same thing under the right scheduling assumptions. If I write a starvation-free algorithm, I am assuming that I will get maximal progress, but that it will have to run on a machine with fair scheduling.
141
The “Periodic Table” of Progress Conditions
Maximal progress: wait-free (non-blocking, independent); obstruction-free (non-blocking, dependent); starvation-free (blocking, dependent).
Minimal progress: lock-free (non-blocking, independent); clash-free ? (non-blocking, dependent); deadlock-free (blocking, dependent).
142
Clash-Freedom: the “Einsteinium” of Progress
A method implementation is clash-free if it guarantees minimal progress in every uniformly isolating history. Theorem: clash-freedom is strictly weaker than obstruction-freedom.
Like einsteinium (symbol Es, atomic number 99), it does not occur naturally in any measurable quantities and has no commercial value. In the full paper we will show that being clash-free is strictly weaker than being obstruction-free, a result omitted from this extended abstract for lack of space. Clash-freedom thus answers the open question raised by Herlihy, Luchangco, and Moir [6], whether obstruction-freedom is the weakest natural non-blocking progress condition. Unlike einsteinium, it is not radioactive, but like it, it has no commercial importance…
143
Getting from Minimal to Maximal
Maximal progress: wait-free (non-blocking, independent); obstruction-free (non-blocking, dependent); starvation-free (blocking, dependent).
Minimal progress: lock-free (non-blocking, independent); clash-free ? (non-blocking, dependent); deadlock-free (blocking, dependent).
Lock-free to wait-free: helping. But helping is expensive.
Deadlock-free to starvation-free: ?
144
Universal Constructions
The lock-free universal construction provides minimal progress. A scheduler is benevolent if it guarantees maximal progress anyway. Real-world OS schedulers are mostly benevolent: they do not persecute any individual thread.
145
Getting from Minimal to Maximal
Maximal progress: wait-free (non-blocking, independent), reached by the universal wait-free construction; obstruction-free (non-blocking, dependent); starvation-free (blocking, dependent).
Minimal progress: lock-free (non-blocking, independent), reached by the universal lock-free construction; clash-free ? (non-blocking, dependent); deadlock-free (blocking, dependent).
Lock-free to wait-free: helping. Use wait-free/lock-free consensus objects.
Deadlock-free to starvation-free: ?
For a one-time object like consensus, where each thread executes a method once, wait-free and lock-free are the same…
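To make the remark about one-time objects concrete, here is a minimal sketch of a consensus object built on compare-and-set, assuming each thread calls decide at most once with a non-null proposal. The class name CasConsensus and its method signature are assumptions for this example. Since decide contains no loop, every call returns after a fixed number of steps, so for this object there is no gap between lock-free and wait-free.

```java
import java.util.concurrent.atomic.AtomicReference;

// One-shot consensus: the first successful compareAndSet fixes the decision,
// and every caller returns that same value.
class CasConsensus<T> {
    private final AtomicReference<T> decision = new AtomicReference<>(null);

    // Assumes proposal is non-null, since null marks "not yet decided".
    T decide(T proposal) {
        decision.compareAndSet(null, proposal); // only the first caller succeeds
        return decision.get();                  // everyone reads the winner's value
    }
}

class ConsensusDemo {
    public static void main(String[] args) {
        CasConsensus<String> c = new CasConsensus<>();
        System.out.println(c.decide("first"));
        System.out.println(c.decide("second"));
    }
}
```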
146
Getting from Minimal to Maximal
Maximal progress: wait-free (non-blocking, independent), reached by the universal wait-free construction; obstruction-free (non-blocking, dependent); starvation-free (blocking, dependent).
Minimal progress: lock-free (non-blocking, independent), reached by the universal lock-free construction; clash-free ? (non-blocking, dependent); deadlock-free (blocking, dependent).
If we use starvation-free/deadlock-free consensus objects, the result is respectively starvation-free/deadlock-free.
147
Maximal Progress Postulate
Programmers want maximal progress. Methods’ progress conditions define what we expect from the scheduler. For example: don’t halt in a critical section; let me run in isolation long enough; …
148
Why Lock-Free is OK
We all want maximal progress (wait-free). Yet we often write lock-free algorithms, or deadlock-free lock-based ones. That is OK if we expect the scheduler to be benevolent. Often true (not always!).
149
Shared-Memory Computability
What is (and is not) concurrently computable: wait-free atomic registers; the lock-free/wait-free hierarchy and universal constructions.
In the same way, we make little attempt to make our initial constructions efficient. We are interested in understanding whether such constructions exist, and how they work, but they are not intended to be a practical model for computation, so we prefer easy-to-understand but inefficient constructions over complicated but efficient ones.
150
Troubling Intellectual Question…
“I think I think, therefore I think I am.” (Ambrose Bierce)
Why use the non-blocking lock-free and wait-free conditions when most code uses locks?
151
The Answer
Not about being non-blocking… About being independent!
Do not rely on the good behavior of the scheduler.
152
By Analogy to Church-Turing
[Figure: a Turing machine, with a finite-state controller that reads and writes an infinite tape.]
Using a dependent condition is like relying on an oracle to recognize languages… The dependency masks the true power of the concurrent object…
The classical theory of sequential computing proceeds in stages. It starts with finite-state automata, moves on to push-down automata, and culminates in Turing machines. A Turing machine is an idealized model of computation, consisting of a finite-state controller and an infinite tape. It is safe to say that anything you can’t do on a Turing machine is something you can’t do, period. If you can devise a Turing machine program to do something, it doesn’t mean you can do it in a practical sense, but it gives you a place to start working on the problem.
153
Shared-Memory Computability
Independent progress: use Lock-free and Wait-free Memory Hierarchy and Universal Constructions
154
Programmers Expect the Best
Programmers expect maximal progress. Progress conditions define scheduler requirements necessary to achieve it.
155
This Concludes Our Course Material
Principles: mathematical foundations of multicore programming.
Practice: how multicore software and the architectures it runs on embody these principles.
156
Art of Multiprocessor Programming
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
You are free:
to Share: to copy, distribute and transmit the work
to Remix: to adapt the work
Under the following conditions:
Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
Any of the above conditions can be waived if you get permission from the copyright holder.
Nothing in this license impairs or restricts the author's moral rights.