Published by Nicholas Newton. Modified over 6 years ago.
1
Counting, Sorting, and Distributed Coordination
The Art of Multiprocessor Programming, Chapter 12, by Maurice Herlihy & Nir Shavit. Some slides were taken from the authors' companion slides.
2
Art of Multiprocessor Programming
Shared Counting
In general, many multiprocessor problems can be described as counting problems, such as the need to assign successive addresses in memory.
Almost always these problems are solved with a single counter, which creates a bottleneck and memory contention.
3
[Figure: two concurrent put calls] Any problems?
4
Coarse Grain Lock
[Figure: two concurrent put calls] Any problems?
5
Fine Grain Lock
[Figure: concurrent put, remove, and put calls] Any problems?
6
Now we have 2 bottlenecks instead of 1. Can we even call it parallelism? Can we do better?
We face two challenges:
Can we avoid memory contention, where too many threads try to access the same memory location, stressing the underlying communication network and cache coherence protocols?
Can we achieve real parallelism? Is incrementing a counter an inherently sequential operation, or is it possible for $n$ threads to increment a counter in less time than it takes one thread to increment a counter $n$ times?
7
Shared Counters
Is it possible for n threads to increment a counter faster than one thread can increment it n times?
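As a point of reference for the question above, here is a minimal sketch of the single-lock counter that the later slides try to improve on. The class name `LockedCounter` is an assumption made for illustration and does not appear in the slides.

```java
// Hypothetical baseline: one counter guarded by one lock.
// Every thread contends for the same monitor and the same memory location.
class LockedCounter {
    private int value = 0;

    // Returns the value held before the increment.
    synchronized int getAndIncrement() {
        return value++;
    }

    public static void main(String[] args) throws InterruptedException {
        LockedCounter counter = new LockedCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int k = 0; k < 1000; k++) counter.getAndIncrement();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        // All 4000 increments were applied, one at a time, before this call.
        System.out.println(counter.getAndIncrement());
    }
}
```

The result is correct, but the increments are fully serialized: n threads take at least as long as one thread doing n increments.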
8
Naïve Solution – Prefix Sum
9
Combining Tree
Here, the counter is zero, and the blue and yellow processors are about to add to the counter.
10
Combining Tree
+3: Blue registers its request to add three at its leaf.
11
Combining Tree
+3, +2: At about the same time, yellow registers its request to add two to the counter.
12
Two threads meet, combine sums
Combining Tree, +3 and +2: These two requests meet at the shared node and are combined ...
13
Two threads meet, combine sums
Combining Tree, +5 (from +3 and +2): ... the blue three and the yellow two are combined into a green five.
14
Combined sum added to root
Combining Tree, root = 5: These combined requests are added to the root ...
15
Result returned to children
Combining Tree, root = 5, pending requests +3 and +2.
16
Results returned to threads
Combining Tree, root = 5: Each node remembers enough bookkeeping information to distribute correct results among the callers. (The figure shows the value 3 being returned.)
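The arithmetic of this example can be sketched in a few lines. This is only an illustration of one combining step, assuming blue is treated as the first request and yellow as the second; the class and method names are hypothetical.

```java
// Hypothetical worked example of one combining step; names are illustrative.
class CombineExample {
    // Returns {value seen by the first caller, value seen by the second caller, new root value}.
    static int[] combineAndDistribute(int root, int first, int second) {
        int combined = first + second;     // the two requests meet and are combined
        int prior = root;                  // counter value before the update
        int newRoot = root + combined;     // combined sum is added to the root
        int firstResult = prior;           // first caller appears to go first
        int secondResult = prior + first;  // second caller appears to follow the first
        return new int[] { firstResult, secondResult, newRoot };
    }

    public static void main(String[] args) {
        int[] r = combineAndDistribute(0, 3, 2);
        System.out.println(r[0] + " " + r[1] + " " + r[2]); // prints "0 3 5"
    }
}
```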
17
Combining Tree
Binary tree of nodes
Each thread is assigned to a leaf
At most 2 threads share a leaf
The counter resides in the root
18
getAndIncrement
A thread starts at its leaf and works its way up to the root
If 2 threads reach a node at the same time, only one of them propagates their combined increments
The other thread waits for the active thread to signal completion
19
Node
Manages each visit to a node
state: one of {IDLE, FIRST, SECOND, RESULT, ROOT}
firstValue
secondValue
result
lock
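A minimal sketch of the bookkeeping listed above, with the methods left out. The field names follow the code on the later slides (`cStatus`, `locked`); the `parent` field and the two constructors are assumptions added so the sketch is self-contained.

```java
// Sketch of the per-node state; the synchronized methods are omitted here.
class Node {
    enum CStatus { IDLE, FIRST, SECOND, RESULT, ROOT }

    boolean locked;      // "lock" on the slide: long-term lock flag
    CStatus cStatus;     // "state" on the slide
    int firstValue;      // value brought by the active (first) thread
    int secondValue;     // value deposited by the passive (second) thread
    int result;          // counter value at the root, or the result handed to the second thread
    final Node parent;   // null at the root (assumed; not listed on the slide)

    Node() {             // root node
        cStatus = CStatus.ROOT;
        parent = null;
    }

    Node(Node parent) {  // any non-root node starts out idle
        cStatus = CStatus.IDLE;
        this.parent = parent;
    }
}
```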
20
Node – methods (Java synchronized)
Precombine
Combine
Op
Distribution
21
Node – methods (Java synchronized)
Precombine: Am I the first thread to arrive? The second? The third? Maybe I'm at the root node? Should I be the active or the passive thread?
22
Node – methods (Java synchronized)
Combine: Revisit the node. Was I the only thread to visit that node? If not, has the other thread added its value?
23
Node – methods (Java synchronized)
Op: The actual operation. At this point the thread hands over its value for further handling. When at the root, add to the counter.
24
Node – methods (Java synchronized)
Distribution:
Active thread: go down the tree. Were there any passive threads waiting for me on the way?
Passive thread: not much to distribute.
25
[Figure: four nodes, root first, each with its fields (state, locked, result or firstValue/secondValue). States: ROOT, IDLE, IDLE, IDLE]
26
[Figure states: ROOT, IDLE, FIRST, IDLE]
Thread: "I'm the first one to visit this node, so I'm the active thread."
27
[Figure states: ROOT, FIRST, FIRST, IDLE]
Thread: "First one again!"
28
[Figure states: ROOT, FIRST, FIRST, IDLE]
Thread: "OK, I'm at the root."
29
[Figure states: ROOT, FIRST, FIRST, IDLE]
30
[Figure states: ROOT, FIRST, SECOND, IDLE]
Thread: "I'm the second visitor. I'll lock this node until I add my value to it, so the other thread won't go up without it."
31
[Figure states: ROOT, FIRST, SECOND, IDLE]
Thread: "No point in going up. The first thread will carry my value from this point."
32
[Figure states: ROOT, FIRST, SECOND, IDLE]
Thread: "But I'd better check the lower nodes. I might have to do some work."
33
[Figure states: ROOT, FIRST, SECOND, IDLE]
Thread: "Nope, I'm covered."
34
[Figure states: ROOT, FIRST, SECOND, IDLE]
Thread: "Still need to add my value, though."
35
[Figure states: ROOT, FIRST, SECOND, IDLE]
Thread: "Let's see if other threads have joined me."
36
[Figure states: ROOT, FIRST, SECOND, IDLE]
37
[Figure states: ROOT, FIRST, SECOND, IDLE]
Thread: "There's another thread! This node is locked, so I'd better wait for the other thread."
38
[Figure states: ROOT, FIRST, SECOND, IDLE]
39
[Figure states: ROOT, FIRST, SECOND (secondValue 1), IDLE]
Thread: "OK, I added my value. Better notify the other thread."
40
[Figure states: ROOT, FIRST, SECOND (firstValue 1), IDLE]
Thread: "Can now go up with the combined values."
41
[Figure states: ROOT, FIRST (firstValue 2), SECOND (firstValue 1), IDLE]
Thread: "I was the only one here. Going up."
42
[Figure states: ROOT (result 2), FIRST (firstValue 2), SECOND (firstValue 1), IDLE]
43
[Figure states: ROOT (result 2), FIRST (firstValue 2), SECOND (firstValue 1), IDLE]
New thread: "Hi!"
44
[Figure states: ROOT (result 2), FIRST (firstValue 2), SECOND (firstValue 1), FIRST]
Thread: "I'm the first one to visit this node. Cool!"
45
[Figure states: ROOT (result 2), FIRST (firstValue 2), SECOND (firstValue 1), FIRST]
Thread: "Node locked. I've missed my chance."
46
[Figure states: ROOT (result 2), FIRST (firstValue 2), SECOND (firstValue 1), FIRST]
Thread: "Go down the tree and distribute values to waiting threads."
47
[Figure states: ROOT (result 2), IDLE, SECOND (firstValue 1), FIRST]
Thread: "No one's waiting here, so I can set the node free."
48
[Figure states: ROOT (result 2), IDLE, SECOND (firstValue 1), FIRST]
Thread: "I incremented from 0 to 1. Make the second thread think it incremented from 1 to 2, and notify it that I have finished."
49
[Figure states: ROOT (result 2), IDLE, RESULT (result 1), FIRST]
Thread: "Got 0"
50
[Figure states: ROOT (result 2), IDLE, IDLE, FIRST]
Threads: "Got 0", "Got 1"
51
[Figure states: ROOT (result 2), IDLE, IDLE, FIRST]
Thread: "Can go on now."
52
public int getAndIncrement() {
  Stack<Node> stack = new Stack<Node>();
  Node myLeaf = leaves[ThreadID.get() / 2];
  Node node = myLeaf;
  // precombining phase
  while (node.precombine()) {
    node = node.parent;
  }
  Node stop = node;
53
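synchronized boolean precombine() {
  while (locked) wait();
  switch (cStatus) {
    case IDLE:     // first thread to arrive: keep going up
      cStatus = CStatus.FIRST;
      return true;
    case FIRST:    // second thread to arrive: lock the node and stop here
      locked = true;
      cStatus = CStatus.SECOND;
      return false;
    case ROOT:     // reached the root: stop here
      return false;
    default:       // added so that every path returns or throws
      throw new IllegalStateException("unexpected node state " + cStatus);
  }
}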
56
  Node stop = node;
  // combining phase: revisit the nodes
  node = myLeaf;
  int combined = 1;
  // go through all the nodes where this thread was the first one to arrive
  while (node != stop) {
    combined = node.combine(combined);
    stack.push(node);
    node = node.parent;
  }
57
synchronized int combine(int combined) {
  // the first thread can't leave without the second thread
  while (locked) wait();
  locked = true;   // set long-term lock
  firstValue = combined;
  switch (cStatus) {
    case FIRST:
      return firstValue;
    case SECOND:
      return firstValue + secondValue;
    default:       // added so that every path returns or throws
      throw new IllegalStateException("unexpected node state " + cStatus);
  }
}
61
  // operation phase
  int prior = stop.op(combined);
  // distribution phase
  while (!stack.empty()) {
    node = stack.pop();
    node.distribute(prior);
  }
  return prior;
}
62
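synchronized int op(int combined) {
  switch (cStatus) {
    case ROOT:
      int prior = result;
      result += combined;
      return prior;   // return the old value
    case SECOND:
      secondValue = combined;
      locked = false;
      notifyAll();    // let the first thread proceed with combine()
      while (cStatus != CStatus.RESULT) wait();
      locked = false; // the first thread re-locked the node in combine(); release it
      notifyAll();
      cStatus = CStatus.IDLE;
      return result;
    default:          // added so that every path returns or throws
      throw new IllegalStateException("unexpected node state " + cStatus);
  }
}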
65
synchronized void distribute(int prior) {
  switch (cStatus) {
    case FIRST:    // no other thread was waiting on this node
      cStatus = CStatus.IDLE;
      locked = false;
      break;
    case SECOND:   // hand the second thread its result
      result = prior + firstValue;
      cStatus = CStatus.RESULT;
      break;
    default:
      throw new IllegalStateException("unexpected node state " + cStatus);
  }
  notifyAll();
}
66
Notes:
In op, the second thread adds its value to the node, releases the lock, and notifies the first thread.
Question to ask: why is there no FIRST case in op? Is it possible to enter that method on a node whose state is FIRST?
Stop node: a thread stops on its way up either when it reaches the root or when it is the second thread to visit a node.
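Putting the four phases together, the following is a self-contained sketch of the combining tree, not the authors' exact code. Assumptions: the thread id is passed as a parameter instead of coming from `ThreadID.get()`, `ArrayDeque` stands in for `Stack`, methods that call `wait()` declare `InterruptedException`, the tree width is a power of two, and the tree layout (width - 1 nodes in heap order, leaves at the end of the array) is one possible choice.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class CombiningTree {
    enum CStatus { IDLE, FIRST, SECOND, RESULT, ROOT }

    static final class Node {
        boolean locked;
        CStatus cStatus;
        int firstValue, secondValue, result;
        final Node parent;

        Node() { cStatus = CStatus.ROOT; parent = null; }
        Node(Node parent) { cStatus = CStatus.IDLE; this.parent = parent; }

        synchronized boolean precombine() throws InterruptedException {
            while (locked) wait();
            switch (cStatus) {
                case IDLE:  cStatus = CStatus.FIRST; return true;
                case FIRST: locked = true; cStatus = CStatus.SECOND; return false;
                case ROOT:  return false;
                default:    throw new IllegalStateException("precombine in " + cStatus);
            }
        }

        synchronized int combine(int combined) throws InterruptedException {
            while (locked) wait();
            locked = true;                       // long-term lock, released in distribute/op
            firstValue = combined;
            switch (cStatus) {
                case FIRST:  return firstValue;
                case SECOND: return firstValue + secondValue;
                default:     throw new IllegalStateException("combine in " + cStatus);
            }
        }

        synchronized int op(int combined) throws InterruptedException {
            switch (cStatus) {
                case ROOT: {
                    int prior = result;
                    result += combined;
                    return prior;
                }
                case SECOND:
                    secondValue = combined;
                    locked = false;
                    notifyAll();                 // let the first thread combine
                    while (cStatus != CStatus.RESULT) wait();
                    locked = false;              // release the first thread's long-term lock
                    notifyAll();
                    cStatus = CStatus.IDLE;
                    return result;
                default:
                    throw new IllegalStateException("op in " + cStatus);
            }
        }

        synchronized void distribute(int prior) {
            switch (cStatus) {
                case FIRST:  cStatus = CStatus.IDLE; locked = false; break;
                case SECOND: result = prior + firstValue; cStatus = CStatus.RESULT; break;
                default:     throw new IllegalStateException("distribute in " + cStatus);
            }
            notifyAll();
        }
    }

    private final Node[] leaves;

    CombiningTree(int width) {                   // width = max threads, a power of two >= 2
        Node[] nodes = new Node[width - 1];
        nodes[0] = new Node();
        for (int i = 1; i < nodes.length; i++) nodes[i] = new Node(nodes[(i - 1) / 2]);
        leaves = new Node[(width + 1) / 2];
        for (int i = 0; i < leaves.length; i++) leaves[i] = nodes[nodes.length - i - 1];
    }

    int getAndIncrement(int threadId) throws InterruptedException {
        Deque<Node> stack = new ArrayDeque<>();
        Node myLeaf = leaves[threadId / 2];
        Node node = myLeaf;
        while (node.precombine()) node = node.parent;       // precombining phase
        Node stop = node;
        node = myLeaf;
        int combined = 1;
        while (node != stop) {                               // combining phase
            combined = node.combine(combined);
            stack.push(node);
            node = node.parent;
        }
        int prior = stop.op(combined);                       // operation phase
        while (!stack.isEmpty()) stack.pop().distribute(prior); // distribution phase
        return prior;
    }
}
```

Each thread must use a distinct id in the range 0 to width - 1, so that at most two threads share a leaf.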
67
Performance
Latency? Throughput?
Latency is O(log(n)), compared with O(1) for a single shared counter
Throughput is O(n / log(n)) at most, and can be much worse
68
Performance
A thread can arrive late at a locked node, missing the chance to combine, and is then forced to wait for the earlier request to ascend and descend the tree
In practice, the higher the contention, the greater the observed rate of combining
A better combining rate is achieved when an arriving request waits a reasonable time for another request to arrive
When contention is higher, wait longer; it might pay off
A dynamic waiting time increases robustness
69
Memory model?
Linearizable
70
Counting Networks
71
Networks That Count
Higher throughput
But at the cost of a weaker memory model: quiescently consistent
This is acceptable as long as we can guarantee no duplications and no omissions
We saw in combining trees that if requests do not arrive together, the algorithm does not work efficiently. Counting networks offer higher throughput at the price of a weaker memory model.
72
A Balancer
[Figure: a balancer with its input wires and output wires]
73
Tokens Traverse Balancers
Token i enters on any wire and leaves on wire i mod (fan-out).
78
Tokens Traverse Balancers
Arbitrary input distribution, balanced output distribution.
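A minimal sketch of a balancer with fan-out 2, showing that the output distribution depends only on how many tokens have passed, not on which input wire they used. The class name `ToggleBalancer` and the choice to start with the toggle pointing at the top wire are assumptions for illustration.

```java
// Minimal sketch of a two-output balancer; names are illustrative.
class ToggleBalancer {
    private boolean up = true;   // the first token leaves on the top wire

    // Returns the output wire for the next token: 0 (top) or 1 (bottom).
    synchronized int traverse() {
        boolean wasUp = up;
        up = !up;
        return wasUp ? 0 : 1;
    }

    public static void main(String[] args) {
        ToggleBalancer b = new ToggleBalancer();
        int[] out = new int[2];
        // Seven tokens arrive on whichever input wires; only the arrival order matters.
        for (int token = 0; token < 7; token++) out[b.traverse()]++;
        System.out.println(out[0] + " " + out[1]); // prints "4 3"
    }
}
```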
79
Balancing Network
81
The Step Property
The output distribution is balanced across all output wires
The top output wires are filled first
Formally, in any quiescent state the token counts y_i on the output wires satisfy 0 <= y_i - y_j <= 1 for any i < j
Any balancing network that satisfies the step property is called a counting network
Bitonic[k] is one such counting network
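The step property is easy to check mechanically. The sketch below tests the condition 0 <= y_i - y_j <= 1 for every pair i < j, where y_i is the number of tokens that have left on output wire i; the class and method names are hypothetical.

```java
// Hypothetical checker for the step property of an output distribution.
class StepProperty {
    // y[i] is the number of tokens that have left on output wire i (wire 0 is the top wire).
    static boolean holds(int[] y) {
        for (int i = 0; i < y.length; i++) {
            for (int j = i + 1; j < y.length; j++) {
                int diff = y[i] - y[j];
                if (diff < 0 || diff > 1) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holds(new int[] {2, 2, 1, 1})); // prints "true"
        System.out.println(holds(new int[] {2, 1, 2, 1})); // prints "false"
    }
}
```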
82
Counting Network!
83
Bitonic[k] is not Linearizable
[Figure: animation frames of tokens traversing Bitonic[k]; one token leaves with the value 2]
87
Bitonic[k] is not Linearizable
The problem: Red finished before Yellow started, yet Red took 2 and Yellow took 0.
88
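class Balancer {
  boolean toggle = true;   // start "up" so the first token leaves on next[0], the top wire
  Balancer[] next;         // output wires; null at a leaf (assumed convention)
  // Flips the toggle and returns its previous value.
  synchronized boolean flip() {
    boolean oldValue = this.toggle;
    this.toggle = !this.toggle;
    return oldValue;
  }
  // Assumed helper: traverse() calls isLeaf(), which the slides do not show.
  boolean isLeaf() {
    return next == null;
  }
}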
89
Balancer traverse(Balancer b) {
  while (!b.isLeaf()) {
    boolean toggle = b.flip();
    if (toggle)
      b = b.next[0];
    else
      b = b.next[1];
  }
  return b;
}
90
Scaling
It's all fun and games, but what happens when we have more than 4 input wires?
Can we scale without breaking the step property?
Yes we can!
91
Bitonic[2k] Schematic
[Figure: two Bitonic[k] networks whose outputs feed a Merger[2k]]
92
Merger[2k] Schematic
[Figure: two Merger[k] networks. One takes the even outputs of the first Bitonic[k] and the odd outputs of the second; the other takes the odd outputs of the first and the even outputs of the second.]
93
If a sequence has the step property ...
94
So does its even subsequence
95
And its odd subsequence
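The recursive construction can be sketched as a software network. This is a sketch under assumptions, not the code from the slides: it uses a simplified balancer (`Toggle`) that returns a wire index rather than the `next[]` pointers shown earlier, it handles width 2 as a single balancer, and it assumes the width is a power of two. In a sequential run, token t should leave on wire t mod width regardless of the input wire.

```java
// Simplified balancer: returns 0 (top) then 1 (bottom), alternating.
class Toggle {
    private boolean up = true;
    synchronized int traverse() {
        boolean wasUp = up;
        up = !up;
        return wasUp ? 0 : 1;
    }
}

// Merger[width]: merges two step sequences of width/2 wires each.
class Merger {
    private final Merger[] half;   // two Merger[width/2] networks, or null when width == 2
    private final Toggle[] layer;  // final layer of width/2 balancers
    private final int width;

    Merger(int width) {
        this.width = width;
        layer = new Toggle[width / 2];
        for (int i = 0; i < layer.length; i++) layer[i] = new Toggle();
        half = (width > 2) ? new Merger[] { new Merger(width / 2), new Merger(width / 2) } : null;
    }

    int traverse(int input) {
        if (width == 2) return layer[0].traverse();
        // Even wires of the first half and odd wires of the second half go to half[0];
        // the remaining wires go to half[1].
        int output = (input < width / 2)
                ? half[input % 2].traverse(input / 2)
                : half[1 - (input % 2)].traverse(input / 2);
        return 2 * output + layer[output].traverse();
    }
}

// Bitonic[width]: two Bitonic[width/2] networks followed by a Merger[width].
class Bitonic {
    private final Bitonic[] half;
    private final Merger merger;
    private final int width;

    Bitonic(int width) {
        this.width = width;
        merger = new Merger(width);
        half = (width > 2) ? new Bitonic[] { new Bitonic(width / 2), new Bitonic(width / 2) } : null;
    }

    // Takes the input wire of a token and returns the output wire it leaves on.
    int traverse(int input) {
        int wire = input;
        if (width > 2) {
            int subnet = input / (width / 2);
            int offset = subnet * (width / 2);
            wire = half[subnet].traverse(input - offset) + offset;
        }
        return merger.traverse(wire);
    }

    public static void main(String[] args) {
        Bitonic net = new Bitonic(4);
        StringBuilder sb = new StringBuilder();
        int[] inputs = {0, 0, 0, 3, 1};
        for (int in : inputs) sb.append(net.traverse(in)).append(' ');
        System.out.println(sb.toString().trim()); // prints "0 1 2 3 0"
    }
}
```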
96
Bitonic[K] depth?
d(Bitonic[2]) = 1
d(Merger[2]) = 1
d(Bitonic[K]) = O(log^2(K))
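The bound follows from the recursive construction. A short derivation, assuming $K$ is a power of two and that Merger[2k] is two Merger[k] networks in parallel followed by one layer of balancers:

```latex
\begin{align*}
d(\mathrm{Merger}[2k]) &= d(\mathrm{Merger}[k]) + 1, \quad d(\mathrm{Merger}[2]) = 1
  \;\Longrightarrow\; d(\mathrm{Merger}[K]) = \log_2 K \\
d(\mathrm{Bitonic}[2k]) &= d(\mathrm{Bitonic}[k]) + d(\mathrm{Merger}[2k]), \quad d(\mathrm{Bitonic}[2]) = 1 \\
d(\mathrm{Bitonic}[K]) &= \sum_{i=1}^{\log_2 K} i
  = \frac{\log_2 K \,(\log_2 K + 1)}{2}
  = O(\log^2 K)
\end{align*}
```

For example, $K = 4$ gives $d = 1 + 2 = 3$ layers.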
97
Performance
              Combining Trees         Counting Networks
Latency       O(log(n))               O(network depth) = O(log^2(n))
Throughput    O(n / log(n)) at most   O(network width) = O(n)
98
A Comparator
[Figure: a comparator with its input wires and output wires]
Like a balancer, except that now the lower value goes up and the higher value goes down
100
Sorting Networks
101
Networks That Sort
We can recycle counting network layouts
Counting networks and sorting networks are isomorphic
If a balancing network counts, then its isomorphic comparison network sorts
102
Performance
How long does it take n threads to sort n elements?
O(log^2(n))
103
In conclusion
We can parallelize counting
We can sort in O(log^2(n)) time
When we have the resources...
Requires a different mindset
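To illustrate recycling a counting network layout, the sketch below replaces each balancer of a width-4 bitonic layout with a comparator that sends the lower value to the top wire. The wire pairs in `COMPARATORS` are an assumption about that layout (first layer, merger halves, final layer), and the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical 4-input comparison network using a width-4 bitonic layout.
class SortingNetwork4 {
    // Each pair is {top wire, bottom wire}; layers are applied in order.
    private static final int[][] COMPARATORS = {
        {0, 1}, {2, 3},   // first layer: two Bitonic[2] networks
        {0, 3}, {1, 2},   // the two Merger[2] halves of Merger[4]
        {0, 1}, {2, 3}    // final layer of Merger[4]
    };

    // Returns a sorted copy of a 4-element input.
    static int[] sort(int[] input) {
        if (input.length != 4) throw new IllegalArgumentException("expects exactly 4 values");
        int[] a = Arrays.copyOf(input, 4);
        for (int[] c : COMPARATORS) {
            int top = c[0], bottom = c[1];
            if (a[top] > a[bottom]) {   // lower value goes up, higher value goes down
                int tmp = a[top];
                a[top] = a[bottom];
                a[bottom] = tmp;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new int[] {3, 1, 4, 2}))); // prints "[1, 2, 3, 4]"
    }
}
```

The comparators within one layer touch disjoint wires, so with enough threads each layer can run in parallel and the running time is proportional to the network depth.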