Counting, Sorting, and Distributed Coordination


1 Counting, Sorting, and Distributed Coordination
The Art of Multiprocessor Programming, chapter 12, by Maurice Herlihy & Nir Shavit. Some slides were taken from the authors’ companion slides.

2 Art of Multiprocessor Programming
Shared Counting: in general, many multiprocessor problems can be described as counting problems, such as assigning successive addresses in memory. Almost always these problems are solved with a single shared counter, which creates a bottleneck and memory contention.

3
put, put: any problems?

4
Coarse-Grain Lock: put, put: any problems?

5
Fine-Grain Lock: put, remove, put: any problems?

6
Now we have 2 bottlenecks instead of 1. Can we even call it parallelism? Can we do better? We face two challenges. Can we avoid memory contention, where too many threads try to access the same memory location, stressing the underlying communication network and cache-coherence protocols? Can we achieve real parallelism? Is incrementing a counter an inherently sequential operation, or is it possible for n threads to increment a counter in less time than it takes one thread to increment a counter n times?

7
Shared Counters: is it possible for n threads to increment a counter faster than it takes one thread to increment a counter n times?
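The single-counter baseline questioned on slides 2-7 can be sketched in Java as follows. This is a minimal sketch: `SingleCounter` is a hypothetical name, and only `AtomicInteger.getAndIncrement` comes from the standard library. Every call performs one atomic read-modify-write on the same memory location, which is exactly the contention point the slides describe.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Baseline sketch: one shared counter. Always correct, but every call
// contends for the same memory word, so the increments are serialized.
class SingleCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    int getAndIncrement() {
        return value.getAndIncrement();   // one atomic read-modify-write on one location
    }
}
```

It always hands out distinct values, but n calls still amount to n serialized updates of one word, which is the cost the combining tree tries to reduce.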

8 Naïve Solution – Prefix Sum

9
Combining Tree: here, the counter is zero, and the blue and yellow processors are about to add to the counter.

10
Combining Tree [+3]: Blue registers its request to add three at its leaf.

11
Combining Tree [+3, +2]: at about the same time, Yellow adds two to the counter.

12 Two threads meet, combine sums
Combining Tree [+3, +2]: these two requests meet at the shared node and are combined...

13 Two threads meet, combine sums
Combining Tree [+5; +3, +2]: the blue three and the yellow two are combined into a green five.

14 Combined sum added to root
Combining Tree [root 5; +5; +3, +2]: the combined requests are added to the root...

15 Result returned to children
Combining Tree [root 5; +3, +2]

16 Results returned to threads
Combining Tree [root 5; 3 returned]: each node remembers enough bookkeeping information to distribute the correct results among the callers.
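The bookkeeping in slides 9-16 can be modeled sequentially (no threads, no locks) to see which value each request gets back. In this sketch, assuming the requests sit at the leaves of a complete binary tree in array order, `combine` sums a subtree's requests, and `distribute` hands the left ("first") subtree the prior root value and the right ("second") subtree that prior value plus the left subtree's sum. `CombiningModel`, `combine`, `distribute`, and `round` are hypothetical names, not code from the slides.

```java
// Sequential model of combining-tree bookkeeping: no threads, no locks.
// Assumption: requests sit at the leaves of a complete binary tree, in array order.
class CombiningModel {
    private int counter = 0;                 // the value stored at the root

    // Combine: total increment requested by the leaves in [lo, hi).
    static int combine(int[] inc, int lo, int hi) {
        if (hi - lo == 1) return inc[lo];
        int mid = (lo + hi) / 2;
        return combine(inc, lo, mid) + combine(inc, mid, hi);
    }

    // Distribute: the left ("first") subtree sees prior; the right ("second")
    // subtree sees prior plus the left subtree's combined value.
    static void distribute(int[] inc, int lo, int hi, int prior, int[] out) {
        if (hi - lo == 1) { out[lo] = prior; return; }
        int mid = (lo + hi) / 2;
        distribute(inc, lo, mid, prior, out);
        distribute(inc, mid, hi, prior + combine(inc, lo, mid), out);
    }

    // One round: combine everything, apply it once at the root, hand results back down.
    int[] round(int[] inc) {
        int prior = counter;
        counter += combine(inc, 0, inc.length);
        int[] out = new int[inc.length];
        distribute(inc, 0, inc.length, prior, out);
        return out;
    }
}
```

With Blue's +3 on the left and Yellow's +2 on the right, `round(new int[] {3, 2})` returns `{0, 3}` and leaves 5 at the root, as on the slides. For brevity the model recomputes subtree sums while distributing; a real node keeps them in its fields instead.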

17
Combining Tree: a binary tree of nodes. Each thread is assigned to a leaf, and at most 2 threads share a leaf. The counter resides in the root.

18
getAndIncrement: a thread starts at its leaf and works its way up toward the root. If 2 threads reach a node at the same time, only one of them (the active thread) propagates their combined increments; the other (passive) thread waits for the active thread to signal completion.

19
Node: manages each visit to a node. Fields: state (cStatus in the code) ∈ {IDLE, FIRST, SECOND, RESULT, ROOT}, firstValue, secondValue, result, and a lock (locked).

20 Node – methods (Java synchronized)
Precombine, Combine, Op, Distribution

21 Node – methods (Java synchronized)
Precombine: Am I the first thread to arrive? The second? The third? Maybe I’m at the root node? Should I be the active or the passive thread?

22 Node – methods (Java synchronized)
Combine: revisit the node. Was I the only thread to visit that node? If not, has the other thread added its value?

23 Node – methods (Java synchronized)
Op: the actual operation. At this point the thread hands its value over for further handling; at the root, the value is added to the counter.

24 Node – methods (Java synchronized)
Distribution: the active thread goes back down the tree. Were there any passive threads on the way waiting for me? For a passive thread, there is not much to distribute.

25
[Nodes: ROOT | IDLE | IDLE | IDLE]

26
[Nodes: ROOT | IDLE | FIRST | IDLE] I’m the first one to visit this node, so I’m the active thread.

27
[Nodes: ROOT | FIRST | FIRST | IDLE] First one again!

28
[Nodes: ROOT | FIRST | FIRST | IDLE] OK, I’m at the root.

29
[Nodes: ROOT | FIRST | FIRST | IDLE]

30
[Nodes: ROOT | FIRST | SECOND | IDLE] I’m the second visitor. I’ll lock this node until I’ve added my value, so the other thread won’t go up without it.

31
[Nodes: ROOT | FIRST | SECOND | IDLE] No point in going up; the first thread will carry my value from here.

32
[Nodes: ROOT | FIRST | SECOND | IDLE] But I’d better check the lower nodes; I might have to do some work.

33
[Nodes: ROOT | FIRST | SECOND | IDLE] Nope, I’m covered.

34
[Nodes: ROOT | FIRST | SECOND | IDLE] Still need to add my value, though.

35
[Nodes: ROOT | FIRST | SECOND | IDLE] Let’s see if other threads have joined me.

36
[Nodes: ROOT | FIRST | SECOND | IDLE]

37
[Nodes: ROOT | FIRST | SECOND | IDLE] There’s another thread! This node is locked, so I’d better wait for the other thread.

38
[Nodes: ROOT | FIRST | SECOND | IDLE]

39
[Nodes: ROOT | FIRST | SECOND (secondValue 1) | IDLE] OK, I added my value; better notify the other thread.

40
[Nodes: ROOT | FIRST | SECOND (firstValue 1) | IDLE] Now I can go up with the combined values.

41
[Nodes: ROOT | FIRST (firstValue 2) | SECOND (firstValue 1) | IDLE] I was the only one here. Going up.

42
[Nodes: ROOT (result 2) | FIRST (firstValue 2) | SECOND (firstValue 1) | IDLE]

43
[Nodes: ROOT (result 2) | FIRST (firstValue 2) | SECOND (firstValue 1) | IDLE] Hi!

44
[Nodes: ROOT (result 2) | FIRST (firstValue 2) | SECOND (firstValue 1) | FIRST] I’m the first one to visit this node. Cool!

45
[Nodes: ROOT (result 2) | FIRST (firstValue 2) | SECOND (firstValue 1) | FIRST] Node locked; I’ve missed my chance.

46
[Nodes: ROOT (result 2) | FIRST (firstValue 2) | SECOND (firstValue 1) | FIRST] Go down the tree and distribute values to waiting threads.

47
[Nodes: ROOT (result 2) | IDLE | SECOND (firstValue 1) | FIRST] No one’s waiting here, so I can set the node free.

48
[Nodes: ROOT (result 2) | IDLE | SECOND (firstValue 1) | FIRST] I incremented from 0 to 1; make the second thread think it incremented from 1 to 2, and notify it that I finished.

49
[Nodes: ROOT (result 2) | IDLE | RESULT (result 1) | FIRST] Got 0.

50
[Nodes: ROOT (result 2) | IDLE | IDLE | FIRST] Got 0. Got 1.

51
[Nodes: ROOT (result 2) | IDLE | IDLE | FIRST] Can go on now.

52
public int getAndIncrement() throws InterruptedException {
  Stack<Node> stack = new Stack<Node>();
  Node myLeaf = leaves[ThreadID.get() / 2];
  Node node = myLeaf;
  // precombining phase: climb while this thread is the first to arrive
  while (node.precombine()) {
    node = node.parent;
  }
  Node stop = node;

53-55
synchronized boolean precombine() throws InterruptedException {
  while (locked) wait();          // wait out a long-term lock
  switch (cStatus) {
    case IDLE:                    // first to arrive: keep climbing as the active thread
      cStatus = CStatus.FIRST;
      return true;
    case FIRST:                   // second to arrive: lock the node and stop here
      locked = true;
      cStatus = CStatus.SECOND;
      return false;
    case ROOT:
      return false;
    default:
      throw new IllegalStateException("unexpected Node state " + cStatus);
  }
}

56
  Node stop = node;
  // combining phase: revisit the nodes
  node = myLeaf;
  int combined = 1;
  // go through all the nodes at which this thread arrived first
  while (node != stop) {
    combined = node.combine(combined);
    stack.push(node);
    node = node.parent;
  }

57-60
synchronized int combine(int combined) throws InterruptedException {
  // the first thread can't leave without the second thread's value
  while (locked) wait();
  locked = true;                  // set long-term lock
  firstValue = combined;
  switch (cStatus) {
    case FIRST:
      return firstValue;
    case SECOND:
      return firstValue + secondValue;
    default:
      throw new IllegalStateException("unexpected Node state " + cStatus);
  }
}

61
  // operation phase
  int prior = stop.op(combined);
  // distribution phase
  while (!stack.empty()) {
    node = stack.pop();
    node.distribute(prior);
  }
  return prior;
}

62-64
synchronized int op(int combined) throws InterruptedException {
  switch (cStatus) {
    case ROOT:
      int prior = result;
      result += combined;
      return prior;               // return the old value
    case SECOND:
      secondValue = combined;
      locked = false;             // let the first thread combine our value
      notifyAll();
      while (cStatus != CStatus.RESULT) wait();
      locked = false;             // release the long-term lock the first thread set in combine()
      notifyAll();
      cStatus = CStatus.IDLE;
      return result;
    default:
      throw new IllegalStateException("unexpected Node state " + cStatus);
  }
}

65-66
synchronized void distribute(int prior) {
  switch (cStatus) {
    case FIRST:                   // no other thread was waiting on that node
      cStatus = CStatus.IDLE;
      locked = false;
      break;
    case SECOND:                  // hand the passive thread its result
      result = prior + firstValue;
      cStatus = CStatus.RESULT;
      break;
    default:
      throw new IllegalStateException("unexpected Node state " + cStatus);
  }
  notifyAll();
}
Notes (on op()): in the SECOND case, the second thread deposits its value in the node, releases the lock, and notifies the first thread. Question to ask: why is there no FIRST case in op()? Is it possible to enter that method for a node whose state is FIRST? Recall that the stop node is where a thread stops on its way up, either because it reached the root or because it was the second thread to visit that node.
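For experimentation, the fragments on slides 52-66 can be assembled into one runnable class. This is a sketch under assumptions that are not in the slides: the caller passes its own thread index in place of `ThreadID.get()`, the width is a power of two no smaller than 2, the nodes are stored as an array heap whose last width/2 entries are the leaves (our layout choice), and `IllegalStateException` stands in for a panic on unexpected states.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Runnable assembly of the slide fragments (assumptions: caller passes its
// thread index, width is a power of two >= 2, array-heap node layout).
class CombiningTree {
    enum CStatus { IDLE, FIRST, SECOND, RESULT, ROOT }

    static class Node {
        boolean locked;
        CStatus cStatus;
        int firstValue, secondValue, result;
        final Node parent;

        Node() { cStatus = CStatus.ROOT; parent = null; }                 // root node
        Node(Node parent) { cStatus = CStatus.IDLE; this.parent = parent; }

        synchronized boolean precombine() throws InterruptedException {
            while (locked) wait();
            switch (cStatus) {
                case IDLE: cStatus = CStatus.FIRST; return true;
                case FIRST: locked = true; cStatus = CStatus.SECOND; return false;
                case ROOT: return false;
                default: throw new IllegalStateException("precombine: " + cStatus);
            }
        }

        synchronized int combine(int combined) throws InterruptedException {
            while (locked) wait();
            locked = true;                                  // long-term lock
            firstValue = combined;
            switch (cStatus) {
                case FIRST: return firstValue;
                case SECOND: return firstValue + secondValue;
                default: throw new IllegalStateException("combine: " + cStatus);
            }
        }

        synchronized int op(int combined) throws InterruptedException {
            switch (cStatus) {
                case ROOT:
                    int prior = result;
                    result += combined;
                    return prior;
                case SECOND:
                    secondValue = combined;
                    locked = false;
                    notifyAll();
                    while (cStatus != CStatus.RESULT) wait();
                    locked = false;                         // release the active thread's lock
                    notifyAll();
                    cStatus = CStatus.IDLE;
                    return result;
                default: throw new IllegalStateException("op: " + cStatus);
            }
        }

        synchronized void distribute(int prior) {
            switch (cStatus) {
                case FIRST: cStatus = CStatus.IDLE; locked = false; break;
                case SECOND: result = prior + firstValue; cStatus = CStatus.RESULT; break;
                default: throw new IllegalStateException("distribute: " + cStatus);
            }
            notifyAll();
        }
    }

    private final Node[] leaves;

    CombiningTree(int width) {
        Node[] nodes = new Node[width - 1];      // heap layout: parent of i is (i - 1) / 2
        nodes[0] = new Node();
        for (int i = 1; i < nodes.length; i++) nodes[i] = new Node(nodes[(i - 1) / 2]);
        leaves = new Node[width / 2];            // last width/2 nodes; 2 threads per leaf
        for (int i = 0; i < leaves.length; i++) leaves[i] = nodes[nodes.length - i - 1];
    }

    int getAndIncrement(int threadId) throws InterruptedException {
        Deque<Node> stack = new ArrayDeque<>();
        Node myLeaf = leaves[threadId / 2];
        Node node = myLeaf;
        while (node.precombine()) node = node.parent;            // precombining phase
        Node stop = node;
        node = myLeaf;
        int combined = 1;
        while (node != stop) {                                   // combining phase
            combined = node.combine(combined);
            stack.push(node);
            node = node.parent;
        }
        int prior = stop.op(combined);                           // operation phase
        while (!stack.isEmpty()) stack.pop().distribute(prior);  // distribution phase
        return prior;
    }
}
```

Each caller must use its own index in 0..width-1, since the algorithm assumes at most two threads per leaf. Because the method returns the counter's prior value, n calls in total should yield 0..n-1 in some order.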

67
Performance: latency? Throughput? Latency is O(log(n)), compared to O(1) for a single counter. Throughput is at most O(n / log(n)), and can be much worse.

68
Performance: a thread can arrive late at a locked node, missing the chance to combine, and is then forced to wait for the earlier request to ascend and descend the tree. In practice, the higher the contention, the greater the observed rate of combining. A better rate is achieved when an arriving request waits a reasonable time for another request to arrive: when contention is higher, waiting longer may pay off. A dynamic waiting time increases robustness.

69
Memory model? Linearizable.

70
Counting Networks

71
Networks That Count: higher throughput, but at the cost of a weaker memory model, quiescent consistency. That trade-off is acceptable as long as we can guarantee no duplicated and no omitted values. We saw with combining trees that if requests do not arrive together, the algorithm does not work efficiently; counting networks offer higher throughput at the price of a weaker memory model.

72
A Balancer: input wires, output wires.

73 Tokens Traverse Balancers
The i-th token to arrive enters on any input wire and leaves on output wire i mod (fan-out).
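The i mod (fan-out) rule can be modeled directly with a shared count of arrivals. This sketch (`ModBalancer` is a hypothetical name) captures the rule, not an efficient implementation: it funnels every token through one `AtomicInteger`, which is the very contention balancers exist to avoid. The toggle-based balancer on the later slides is the usual fan-out-2 form.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Models the rule only: the i-th token to arrive, on any input wire,
// leaves on output wire i mod fanOut.
class ModBalancer {
    private final int fanOut;
    private final AtomicInteger arrivals = new AtomicInteger(0);  // tokens seen so far

    ModBalancer(int fanOut) { this.fanOut = fanOut; }

    int traverse(int inputWire) {            // inputWire is deliberately ignored
        return Math.floorMod(arrivals.getAndIncrement(), fanOut);
    }
}
```

With a fan-out of 3, seven tokens leave on wires 0, 1, 2, 0, 1, 2, 0 regardless of which input wires they arrived on.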

74-77 Tokens Traverse Balancers

78 Tokens Traverse Balancers
Arbitrary input distribution in, balanced output distribution out.

79-80
Balancing Network

81
The Step Property: the output distribution is balanced across all output wires, and the top output wires are filled first. Formally, in any quiescent state, 0 ≤ y_i − y_j ≤ 1 for any i < j, where y_i is the number of tokens that have left on output wire i. Any balancing network that satisfies the step property is called a counting network; Bitonic[K] is one such network.
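The step property can be checked mechanically on per-wire output counts. A small sketch, with `StepProperty.holds` as a hypothetical helper: for example, with 10 tokens on 4 output wires, the only state that satisfies the property is 3, 3, 2, 2.

```java
// Checks the step property on per-wire output counts y[0..w-1]:
// for every i < j, 0 <= y[i] - y[j] <= 1.
class StepProperty {
    static boolean holds(int[] y) {
        for (int i = 0; i < y.length; i++)
            for (int j = i + 1; j < y.length; j++) {
                int d = y[i] - y[j];
                if (d < 0 || d > 1) return false;   // a lower wire got ahead, or a gap > 1
            }
        return true;
    }
}
```

The pairwise loop mirrors the definition; an equivalent linear check is that y is non-increasing and y[0] − y[w−1] ≤ 1.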

82
Counting Network!

83-86 Bitonic[k] is not Linearizable

87 Bitonic[k] is not Linearizable
The problem: Red finished before Yellow started, yet Red took 2 and Yellow took 0.

88
class Balancer {
  boolean toggle = true;   // true: the next token leaves on the top wire (next[0])
  Balancer[] next;         // output wires
  synchronized boolean flip() {
    boolean oldValue = this.toggle;
    this.toggle = !this.toggle;
    return oldValue;
  }
  boolean isLeaf() { return next == null; }   // assumed: no outgoing wires means an exit
}

89
Balancer traverse(Balancer b) {
  while (!b.isLeaf()) {
    boolean toggle = b.flip();
    if (toggle) b = b.next[0];    // toggle was true: take the top output wire
    else b = b.next[1];           // toggle was false: take the bottom output wire
  }
  return b;
}

90
Scaling: it’s all fun and games, but what happens when we have more than 4 input wires? Can we scale without breaking the step property? Yes, we can!

91
Bitonic[2k] Schematic: two Bitonic[k] networks side by side, whose outputs feed a single Merger[2k].

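92
Merger[2k] Schematic: the even outputs of the top Bitonic[k] and the odd outputs of the bottom Bitonic[k] feed one Merger[k]; the odd outputs of the top Bitonic[k] and the even outputs of the bottom Bitonic[k] feed the other Merger[k]. A final layer of balancers combines the outputs of the two Merger[k] networks.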
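The two schematics can be turned into a recursive construction. The sketch below is our own rendering; the class names `ToggleBalancer`, `Merger`, and `Bitonic` and the `traverse(int)` signatures are assumptions. Merger[2] and Bitonic[2] are a single balancer; Merger[2k] routes the even outputs of the top half and the odd outputs of the bottom half to its first Merger[k], the rest to its second, and joins output i of both halves with balancer i of a final layer.

```java
// Sketch of the recursive Bitonic[2k] / Merger[2k] layout.
// Assumptions: width is a power of two >= 2; output wire 0 is the top wire.
class ToggleBalancer {
    private boolean toggle = true;           // true: the next token leaves on the top wire

    synchronized int traverse() {
        int out = toggle ? 0 : 1;
        toggle = !toggle;
        return out;
    }
}

class Merger {
    private final int width;
    private final Merger[] half;             // two Merger[width/2]; null when width == 2
    private final ToggleBalancer[] layer;    // final layer: width/2 balancers

    Merger(int width) {
        this.width = width;
        layer = new ToggleBalancer[width / 2];
        for (int i = 0; i < layer.length; i++) layer[i] = new ToggleBalancer();
        half = width > 2 ? new Merger[] { new Merger(width / 2), new Merger(width / 2) } : null;
    }

    // Inputs 0..k-1 come from the top Bitonic[k], inputs k..2k-1 from the bottom one.
    int traverse(int input) {
        if (width == 2) return layer[0].traverse();    // Merger[2] is one balancer
        int k = width / 2;
        int sub, subInput;
        if (input < k) {                  // top half: even wires -> half[0], odd -> half[1]
            sub = input % 2;
            subInput = input / 2;
        } else {                          // bottom half: odd wires -> half[0], even -> half[1]
            int j = input - k;
            sub = 1 - j % 2;
            subInput = k / 2 + j / 2;
        }
        int o = half[sub].traverse(subInput);
        return 2 * o + layer[o].traverse();            // balancer o joins output o of both halves
    }
}

class Bitonic {
    private final int width;
    private final Bitonic[] half;            // two Bitonic[width/2]; null when width == 2
    private final Merger merger;

    Bitonic(int width) {
        this.width = width;
        merger = new Merger(width);
        half = width > 2 ? new Bitonic[] { new Bitonic(width / 2), new Bitonic(width / 2) } : null;
    }

    // Returns the output wire on which a token entering on `input` leaves.
    int traverse(int input) {
        if (width == 2) return merger.traverse(input); // Bitonic[2] is Merger[2]
        int k = width / 2;
        int top = input < k ? 0 : 1;
        int o = half[top].traverse(input % k);
        return merger.traverse(top * k + o);
    }
}
```

Each balancer is synchronized, so concurrent threads may share the network, but the step property is only promised in quiescent states. Four tokens entering a fresh Bitonic[4] on wire 0 leave on wires 0, 1, 2, 3.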

93
If a sequence has the step property…

94
so does its even subsequence…

95
and its odd subsequence.
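A worked example of slides 93-95, with the sums that the Merger[2k] construction relies on (a sequence of even length is assumed):

```latex
\[
y = (3,3,3,2,2,2), \qquad X = \textstyle\sum_i y_i = 15
\]
\[
\text{even: } (y_0, y_2, y_4) = (3,3,2),\ \text{sum } 8 = \lceil X/2 \rceil
\qquad
\text{odd: } (y_1, y_3, y_5) = (3,2,2),\ \text{sum } 7 = \lfloor X/2 \rfloor
\]
```

Both subsequences again have the step property. Consequently, if X and X' are the totals of the top and bottom halves, one Merger[k] receives ⌈X/2⌉ + ⌊X'/2⌋ tokens and the other ⌊X/2⌋ + ⌈X'/2⌉, and these two counts differ by at most one.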

96
Bitonic[K] depth? d(Bitonic[2]) = 1, d(Merger[2]) = 1, d(Bitonic[K]) = O(log^2(K)).
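Where the O(log^2(K)) bound comes from, assuming K = 2^m and counting layers of balancers: the two Bitonic[k] halves run side by side, so together they contribute one Bitonic[k] depth, and each Merger[2k] adds one final layer on top of a Merger[k].

```latex
\begin{aligned}
d(\mathrm{Merger}[2k])  &= d(\mathrm{Merger}[k]) + 1
  &&\Rightarrow\; d(\mathrm{Merger}[2^j]) = j \\
d(\mathrm{Bitonic}[2k]) &= d(\mathrm{Bitonic}[k]) + d(\mathrm{Merger}[2k]) \\
d(\mathrm{Bitonic}[2^m]) &= \sum_{j=1}^{m} j = \frac{m(m+1)}{2} = O(\log^2 K)
\end{aligned}
```

For K = 4 this gives 3 layers, and for K = 8 it gives 6.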

97 Performance
              Combining Trees          Counting Networks
Latency       O(log(n))                O(network depth) = O(log^2(n))
Throughput    O(n / log(n)) at most    O(network width) = O(n)

98
A Comparator: input wires, output wires. It has the shape of a balancer, only now the lower value goes up (top output) and the higher value goes down.

99

100
Sorting Networks

101
Networks That Sort: we can recycle counting network layouts. Replacing each balancer with a comparator turns a balancing network into an isomorphic comparison network, and if the balancing network counts, its isomorphic comparison network sorts. The converse does not hold in general.
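A minimal sketch of the recycling idea: a Bitonic[4] layout with every balancer replaced by a comparator. The wiring (0,1)(2,3), then (0,3)(1,2), then (0,1)(2,3) is one layout consistent with the Merger[2k] construction; it is our assumption, and `BitonicSort4` is a hypothetical name.

```java
// Sketch: a Bitonic[4] balancer layout with each balancer replaced by a
// comparator (smaller value to the upper wire). Wiring is an assumption.
class BitonicSort4 {
    private static final int[][] LAYOUT = {
        {0, 1}, {2, 3},   // layer 1: the two Bitonic[2]
        {0, 3}, {1, 2},   // layer 2: the two Merger[2] inside Merger[4]
        {0, 1}, {2, 3}    // layer 3: final layer of Merger[4]
    };

    // Comparator: the lower value goes to the upper wire a, the higher to wire b.
    private static void compareExchange(int[] x, int a, int b) {
        if (x[a] > x[b]) { int t = x[a]; x[a] = x[b]; x[b] = t; }
    }

    static int[] sort(int[] input) {
        int[] x = input.clone();                 // leave the caller's array untouched
        for (int[] c : LAYOUT) compareExchange(x, c[0], c[1]);
        return x;
    }
}
```

The comparators within a layer touch disjoint wires, so each layer could run in parallel; the network takes 3 comparator steps for 4 values, matching the depth of Bitonic[4]. Exhaustively trying every input over a small alphabet such as {0, 1, 2, 3} is a quick way to convince yourself it sorts.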

102
Performance: how long does it take n threads to sort n elements? O(log^2(n)) steps, the depth of a Bitonic[n] network.

103
In conclusion: we can parallelize counting, and we can sort in O(log^2(n)) time, when we have the resources. It requires a different mindset.

