Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.


1 Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

2-5 Shared Counter: threads draw the values 0, 1, 2, 3, … from a shared counter. Guarantees: no duplication (no two threads receive the same value) and no omission (no value is skipped). The counter is not necessarily linearizable.

6 Naïve Counter Implementation: every process applies fetch-and-increment (FAI) to a single shared FAI object. The last processes to succeed incur Θ(n) time complexity. Can we do much better?
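The single shared FAI object can be sketched in Java with `java.util.concurrent.atomic.AtomicInteger`, whose `getAndIncrement()` is a fetch-and-increment. The class name `NaiveCounterDemo` and the helper `takeAll` are illustrative assumptions, not part of the slides. The sketch only shows that every value is handed out exactly once; it does not measure contention.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; not from the slides.
class NaiveCounterDemo {
    // Runs `threads` threads, each taking `perThread` values from one shared
    // counter, and returns the set of all values that were handed out.
    static Set<Integer> takeAll(int threads, int perThread) {
        // Every thread hits this one object, so it is the contention hot spot.
        AtomicInteger counter = new AtomicInteger(0);
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    seen.add(counter.getAndIncrement()); // fetch-and-increment
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }
}
```

If `takeAll(4, 1000)` returns a set of size 4000 that contains every value from 0 to 3999, there was no duplication and no omission.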

7 Shared Counters: Can we build a shared counter with low memory contention and real parallelism?

8 Software Combining Tree. Contention: all spinning is local. Parallelism: potential n/log n speedup.

9-16 Combining Trees (animation; the root counter starts at 0)
–One thread arrives with +3, another with +2
–Two threads meet, combine sums: +3 and +2 become +5
–Combined sum added to root: the root becomes 5
–Result returned to children: the root's prior value, 0
–Results returned to threads: the +3 thread receives 0, the +2 thread receives 3
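The arithmetic of this animation can be replayed sequentially. The sketch below is only a worked example of the numbers on the slides (no threads, no synchronization); the class and method names are hypothetical.

```java
// Hypothetical worked example; it replays the slide arithmetic sequentially.
class CombineArithmetic {
    // Returns {new root value, result for 1st thread, result for 2nd thread}.
    static int[] combineAtRoot(int root, int first, int second) {
        int combined = first + second;    // two threads meet, combine sums
        int prior = root;                 // root value before the update
        int newRoot = root + combined;    // combined sum added to root
        int firstResult = prior;          // 1st thread sees the prior root value
        int secondResult = prior + first; // 2nd thread is ordered after the 1st thread's increments
        return new int[] { newRoot, firstResult, secondResult };
    }
}
```

With the slide values, `combineAtRoot(0, 3, 2)` gives a root of 5 and the results 0 and 3.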

17 What if threads don’t arrive together? Should I stay or should I go? How long should a thread wait? Waiting times add up … Idea: use a multi-phase algorithm in which threads wait in parallel …

18-23 Combining Status

    enum CStatus {
        IDLE,    // nothing going on
        FIRST,   // 1st thread is a partner for combining, will return to check for 2nd thread
        SECOND,  // 2nd thread has arrived with value for combining
        RESULT,  // 1st thread has deposited result for 2nd thread
        ROOT     // special case: root node
    }

24 Node Synchronization. Short-term: synchronized methods give consistency during a method call. Long-term: a Boolean locked field gives consistency across calls.
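A minimal sketch of the state a node would need for these two kinds of synchronization, assuming the field names that appear in the code on later slides (`locked`, `cStatus`, `firstValue`, `secondValue`, `result`, `parent`). The two constructors and the `isLocked()` accessor are assumptions added for illustration.

```java
enum CStatus { IDLE, FIRST, SECOND, RESULT, ROOT }

// Sketch of the node state only; the phase methods are omitted here.
class Node {
    boolean locked;      // long-term lock: stays set across method calls
    CStatus cStatus;     // combining state
    int firstValue;      // 1st thread's contribution
    int secondValue;     // 2nd thread's contribution
    int result;          // counter value at the root, or result left for the 2nd thread
    final Node parent;   // null at the root

    // Assumed constructor for the root node.
    Node() {
        parent = null;
        cStatus = CStatus.ROOT;
        locked = false;
    }

    // Assumed constructor for every other node.
    Node(Node parent) {
        this.parent = parent;
        cStatus = CStatus.IDLE;
        locked = false;
    }

    // Short-term synchronization: `synchronized` makes this call atomic
    // with respect to other synchronized methods on the same node.
    synchronized boolean isLocked() {
        return locked;
    }
}
```

The monitor behind `synchronized` is released when the method returns, which is why a separate `locked` flag is needed to keep a node reserved between one phase and the next.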

25-28 Phases
–Precombining: set up combining rendezvous
–Combining: collect and combine operations
–Operation: hand off to higher thread
–Distribution: distribute results to waiting threads

29 High-level operation structure

30-34 Precombining Phase
–Examine the node's status
–If IDLE, set it to FIRST: promise to return to look for partner
–At ROOT, turn back
–If FIRST, set it to SECOND: I’m willing to combine, but lock for now

35 Code. Tree class: in charge of navigation. Node class: combining state, synchronization state, bookkeeping.
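The slides do not show how the Tree class lays out its nodes. One plausible layout, given here as an assumption, is a heap-style array in which node `i` has parent `(i - 1) / 2`, the leaves occupy the tail of the array, and threads `2k` and `2k + 1` share a leaf. The helper below uses index arithmetic only; `TreeLayout`, `parents`, `leafFor` and `pathLength` are hypothetical names, and `width` (the number of threads) is assumed to be a power of two, at least 2.

```java
// Hypothetical helper; index arithmetic only, no synchronization.
class TreeLayout {
    // parentOf[i] is the array index of node i's parent; the root (index 0) gets -1.
    static int[] parents(int width) {
        int[] parentOf = new int[width - 1];
        parentOf[0] = -1;
        for (int i = 1; i < parentOf.length; i++) {
            parentOf[i] = (i - 1) / 2;
        }
        return parentOf;
    }

    // Threads 2k and 2k+1 share one leaf; leaves occupy the tail of the array.
    static int leafFor(int threadId, int width) {
        int nodeCount = width - 1;
        return nodeCount - 1 - threadId / 2;
    }

    // Number of nodes a thread visits walking from its leaf to the root, inclusive.
    static int pathLength(int threadId, int width) {
        int[] parentOf = parents(width);
        int length = 0;
        for (int node = leafFor(threadId, width); node != -1; node = parentOf[node]) {
            length++;
        }
        return length;
    }
}
```

For `width = 8`, threads 0 and 1 share the leaf at index 6, and their path to the root is 6, 2, 0.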

36-39 Precombining Navigation

    Node node = myLeaf;            // start at leaf
    while (node.precombine()) {    // move up while instructed to do so
        node = node.parent;
    }
    Node stop = node;              // remember where we stopped

40-50 Precombining Node

    // Short-term synchronization: the method is synchronized.
    synchronized boolean precombine() throws InterruptedException {
        // Wait while node is locked (in use by earlier combining phase).
        while (locked) wait();
        switch (cStatus) {                 // check combining status
            case IDLE:
                // Node was IDLE: I will return to look for 2nd thread's input value.
                cStatus = CStatus.FIRST;
                return true;               // continue up the tree
            case FIRST:
                // I'm the 2nd thread: the 1st thread has promised to return,
                // so lock the node so it won't leave without me.
                locked = true;
                cStatus = CStatus.SECOND;  // prepare to deposit 2nd thread's input value
                return false;              // end of precombining phase, don't continue up tree
            case ROOT:
                // If root, precombining phase ends, don't continue up tree.
                return false;
            default:
                // Always check for unexpected values!
                throw new PanicException();
        }
    }
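The status transitions of `precombine()` can be traced on a single thread. The sketch below copies the switch from the slide into a stripped-down class; `PrecombineNode` is a hypothetical name, `IllegalStateException` stands in for the slide's `PanicException`, and the checked `InterruptedException` is wrapped so the trace needs no `throws` clause.

```java
enum CStatus { IDLE, FIRST, SECOND, RESULT, ROOT }

// Hypothetical stripped-down node holding only what precombine() touches.
class PrecombineNode {
    boolean locked = false;
    CStatus cStatus;

    PrecombineNode(CStatus initial) {
        cStatus = initial;
    }

    synchronized boolean precombine() {
        while (locked) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        switch (cStatus) {
            case IDLE:
                cStatus = CStatus.FIRST;   // promise to return
                return true;               // continue up the tree
            case FIRST:
                locked = true;             // keep the 1st thread from leaving
                cStatus = CStatus.SECOND;
                return false;              // stop here
            case ROOT:
                return false;              // stop at the root
            default:
                throw new IllegalStateException("unexpected Node state " + cStatus);
        }
    }
}
```

On a fresh IDLE node the first call returns true and leaves the node in FIRST; a second call returns false and leaves it in SECOND with `locked` set. A third call on that node would block in `wait()`, which is the lock-out shown in the animation.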

51-53 Combining Phase (node status SECOND)
–1st thread (carrying +3) locked out until 2nd provides value
–2nd thread deposits value to be combined (2), unlocks node, & waits …
–1st thread moves up the tree with combined value (+3 and +2 give +5) …

54-57 Combining (reloaded) (node status FIRST)
–2nd thread has not yet deposited value …
–1st thread (carrying +3) is alone, locks out late partner
–Stop at root
–2nd thread’s late precombining phase visit locked out

58-64 Combining Navigation

    node = myLeaf;                           // start at leaf
    int combined = 1;                        // add 1
    while (node != stop) {                   // revisit nodes visited in precombining
        combined = node.combine(combined);   // accumulate combined values, if any
        stack.push(node);                    // we will retraverse path in reverse order …
        node = node.parent;                  // move up the tree
    }

65-72 Combining Phase Node

    synchronized int combine(int combined) throws InterruptedException {
        // Wait until node is unlocked. It is locked by the 2nd thread until it deposits its value.
        while (locked) wait();
        // Why is it that no thread acquires the lock between the two lines?
        // Lock out late attempts to combine (by threads still in precombining).
        locked = true;
        firstValue = combined;               // remember my (1st thread) contribution
        switch (cStatus) {                   // check status
            case FIRST:                      // I (1st thread) am alone
                return firstValue;
            case SECOND:                     // not alone: combine with 2nd thread
                return firstValue + secondValue;
            default: …
        }
    }

73-75 Operation Phase
–At the root: add combined value (+5, from +3 and +2) to root, which becomes 5, and start back down while the 2nd thread sleeps
–Operation Phase (reloaded), at a SECOND node: leave value to be combined (2) …
–Unlock, and wait …

76-77 Operation Phase Navigation

    // stop is the node where we stopped.
    // Provide collected sum and wait for combining result.
    prior = stop.op(combined);

78-84 Operation on Stopped Node

    synchronized int op(int combined) throws InterruptedException {
        switch (cStatus) {                   // only ROOT and SECOND possible. Why?
            case ROOT:                       // at root: add sum to root, return prior value
                int prior = result;
                result += combined;
                return prior;
            case SECOND:                     // intermediate node
                secondValue = combined;      // deposit value for later combining …
                locked = false;              // unlock node (which I locked in precombining)
                notifyAll();                 // then notify 1st thread
                while (cStatus != CStatus.RESULT) wait();   // wait for 1st thread to deliver results
                locked = false;              // unlock node (locked by 1st thread in combining phase)
                notifyAll();
                cStatus = CStatus.IDLE;
                return result;               // & return
            default: …
        }
    }

85-88 Distribution Phase (root is 5; 2nd thread asleep at a SECOND node)
–Move down with result (0)
–Leave result for 2nd thread (3) & lock node
–Push result down tree
–2nd thread awakens, unlocks, takes value; node returns to IDLE

89-92 Distribution Phase Navigation

    while (!stack.empty()) {         // traverse path in reverse order
        node = stack.pop();
        node.distribute(prior);      // distribute results to waiting 2nd threads
    }
    return prior;                    // return result to caller

93-95 Distribution Phase

    synchronized void distribute(int prior) {
        switch (cStatus) {
            case FIRST:              // no 2nd thread to combine with me, unlock node & reset
                cStatus = CStatus.IDLE;
                locked = false;
                notifyAll();
                return;
            case SECOND:             // notify 2nd thread that result is available
                result = prior + firstValue;
                cStatus = CStatus.RESULT;
                notifyAll();         // (2nd thread will release lock)
                return;
            default: …
        }
    }
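Putting the four phases together, a complete combining tree fits in one short class. The sketch below follows the code on the slides for `precombine`, `combine`, `op` and `distribute`. Everything else is an assumption made for illustration: the heap-style tree layout in the constructor, passing the thread id as a parameter, replacing the elided `default:` branches and `PanicException` with `IllegalStateException`, and wrapping `wait()` so no checked exception escapes. It assumes `width` is a power of two, at least 2, and that each thread id in `0 .. width-1` is used by only one thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of a combining tree; the layout and exception handling are assumptions.
class CombiningTree {
    enum CStatus { IDLE, FIRST, SECOND, RESULT, ROOT }

    static final class Node {
        boolean locked = false;
        CStatus cStatus;
        int firstValue, secondValue, result;
        final Node parent;

        Node() { parent = null; cStatus = CStatus.ROOT; }
        Node(Node parent) { this.parent = parent; cStatus = CStatus.IDLE; }

        // Wraps wait(); only called from synchronized methods of this node.
        private void await() {
            try { wait(); }
            catch (InterruptedException e) { throw new IllegalStateException(e); }
        }

        synchronized boolean precombine() {
            while (locked) await();
            switch (cStatus) {
                case IDLE: cStatus = CStatus.FIRST; return true;
                case FIRST: locked = true; cStatus = CStatus.SECOND; return false;
                case ROOT: return false;
                default: throw new IllegalStateException("unexpected state " + cStatus);
            }
        }

        synchronized int combine(int combined) {
            while (locked) await();
            locked = true;
            firstValue = combined;
            switch (cStatus) {
                case FIRST: return firstValue;
                case SECOND: return firstValue + secondValue;
                default: throw new IllegalStateException("unexpected state " + cStatus);
            }
        }

        synchronized int op(int combined) {
            switch (cStatus) {
                case ROOT: {
                    int prior = result;
                    result += combined;
                    return prior;
                }
                case SECOND:
                    secondValue = combined;
                    locked = false;
                    notifyAll();
                    while (cStatus != CStatus.RESULT) await();
                    locked = false;
                    notifyAll();
                    cStatus = CStatus.IDLE;
                    return result;
                default: throw new IllegalStateException("unexpected state " + cStatus);
            }
        }

        synchronized void distribute(int prior) {
            switch (cStatus) {
                case FIRST: cStatus = CStatus.IDLE; locked = false; notifyAll(); return;
                case SECOND: result = prior + firstValue; cStatus = CStatus.RESULT; notifyAll(); return;
                default: throw new IllegalStateException("unexpected state " + cStatus);
            }
        }
    }

    private final Node[] leaf;

    CombiningTree(int width) {
        Node[] nodes = new Node[width - 1];
        nodes[0] = new Node();                              // root
        for (int i = 1; i < nodes.length; i++) nodes[i] = new Node(nodes[(i - 1) / 2]);
        leaf = new Node[(width + 1) / 2];                   // two threads per leaf
        for (int i = 0; i < leaf.length; i++) leaf[i] = nodes[nodes.length - i - 1];
    }

    int getAndIncrement(int threadId) {
        Deque<Node> stack = new ArrayDeque<>();
        Node myLeaf = leaf[threadId / 2];
        Node node = myLeaf;
        while (node.precombine()) node = node.parent;       // precombining phase
        Node stop = node;
        node = myLeaf;
        int combined = 1;
        while (node != stop) {                              // combining phase
            combined = node.combine(combined);
            stack.push(node);
            node = node.parent;
        }
        int prior = stop.op(combined);                      // operation phase
        while (!stack.isEmpty()) stack.pop().distribute(prior); // distribution phase
        return prior;
    }
}
```

A single thread calling `getAndIncrement(0)` repeatedly on `new CombiningTree(4)` sees 0, 1, 2, and so on. With four threads using ids 0 to 3, the returned values are all distinct.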

96 Bad News: High Latency. An increment's latency is log n, even when +3 and +2 combine into +5 on the way up.

97 Good News: Real Parallelism. Where 2 threads (+3 and +2) combine, 1 thread continues up the tree with the combined +5.

98 Throughput Puzzles. Ideal circumstances: all n threads move together and combine, giving n increments in O(log n) time. Worst circumstances: all n threads are slightly skewed and locked out, giving n increments in O(n · log n) time.
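To make the two bounds concrete, they can be turned into a cost per increment and compared with a sequential counter that spends one time unit per increment. This is a back-of-the-envelope sketch that ignores constant factors; the choice of n = 64 is only an example.

```latex
\begin{align*}
\text{ideal:}\quad & \frac{\log n}{n} \text{ time per increment}
  \;\Rightarrow\; \text{speedup} \approx \frac{n}{\log n},
  \qquad n = 64:\ \frac{64}{\log_2 64} = \frac{64}{6} \approx 10.7 \\
\text{worst:}\quad & \frac{n \log n}{n} = \log n \text{ time per increment}
  \;\Rightarrow\; \text{a factor of } \log n \text{ slower than sequential},
  \qquad n = 64:\ \log_2 64 = 6
\end{align*}
```

The ideal case matches the "potential n/log n speedup" claimed for the software combining tree, while the worst case is slower than not combining at all.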

99-103 Index Distribution Benchmark

    // iters: how many iterations
    // work: expected time between incrementing counter
    void indexBench(int iters, int work) throws InterruptedException {
        int i = 0;
        while (i < iters) {
            i = r.getAndIncrement();          // take a number
            Thread.sleep(random() % work);    // pretend to work (more work, less concurrency)
        }
    }
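A runnable harness around this benchmark might look like the sketch below. It is an assumption-laden stand-in, not the benchmark used for the slides: `AtomicInteger` plays the role of the counter `r`, `ThreadLocalRandom` replaces the undefined `random()`, the loop takes its first number before testing it, and the `taken` field and `run` method are hypothetical additions for checking and timing. `work` must be positive because `nextInt(bound)` requires a positive bound.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness; AtomicInteger stands in for the counter under test.
class IndexBench {
    final AtomicInteger r = new AtomicInteger(0);     // shared counter
    final AtomicInteger taken = new AtomicInteger(0); // indices below iters handed out

    void indexBench(int iters, int work) {
        int i = r.getAndIncrement();                  // take a number
        while (i < iters) {
            taken.incrementAndGet();
            try {
                // Pretend to work for 0 .. work-1 milliseconds.
                Thread.sleep(ThreadLocalRandom.current().nextInt(work));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            i = r.getAndIncrement();                  // take the next number
        }
    }

    // Runs indexBench on `threads` threads and returns the elapsed nanoseconds.
    long run(int threads, int iters, int work) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> indexBench(iters, work));
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }
}
```

After `run(4, 200, 2)`, each index from 0 to 199 has been handed out exactly once, so `taken` is 200. Each of the four threads also draws one number at or above 200 before it stops, which leaves the counter at 204.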

104-107 Performance. Here are some fake graphs, distilled from real ones. Throughput: average incs in 1 million cycles. Latency: average cycles per inc.

108 [Figure: latency vs. number of processors, combining tree vs. spin lock]

109 [Figure: throughput vs. number of processors, combining tree vs. spin lock]

110 [Figure: combining rate vs. work]

