Shared Counters and Parallelism


Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Shared Counter [figure: threads draw the values 1, 2, 3 from a shared counter]

Shared Counter No duplication: first, we want to be sure that two threads never take the same value.

Shared Counter No duplication, no omission: second, we want to make sure that the threads do not collectively skip a value.

Naïve Counter Implementation [figure: every thread applies fetch-and-increment (FAI) to a single FAI object] The last processes to succeed incur Θ(n) time complexity! Can we do much better?
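As a concrete baseline, here is a minimal sketch of the naive counter in Java, assuming AtomicInteger.getAndIncrement stands in for the hardware FAI object (the class name NaiveCounter is ours, not from the slides):

import java.util.concurrent.atomic.AtomicInteger;

// Naive shared counter: every thread applies fetch-and-increment to the
// same memory location. The hardware serializes the operations, so under
// contention the last of n concurrent threads waits Theta(n) time.
public class NaiveCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Take the next value: no duplication, no omission, but no parallelism.
    public int getAndIncrement() {
        return value.getAndIncrement();
    }
}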

Shared Counters Can we build a shared counter with low memory contention and real parallelism? We face two challenges. Can we avoid memory contention, where too many threads try to access the same memory location, stressing the underlying communication network and cache coherence protocols? Can we achieve real parallelism? Is incrementing a counter an inherently sequential operation, or is it possible for n threads to increment a counter in less time than it takes one thread to increment a counter n times?

Software Combining Tree Contention: all spinning is local. Parallelism: potential n/log n speedup. A combining tree is a tree of nodes, where each node contains bookkeeping information. The counter's value is stored at the root. Each thread is assigned a leaf, and at most two threads share a leaf. To increment the counter, a thread starts at its leaf and works its way up the tree to the root. If two threads reach a node at approximately the same time, they combine their increments by adding them together. One thread, the first, propagates the combined increment up the tree, while the other, the second, waits for the first to complete their combined work.
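One plausible layout for such a tree, as a hedged sketch: it assumes width (the number of threads) is a power of two, uses the Node class sketched after the Node Synchronization slide below, and the constructor details are our reconstruction rather than something shown on these slides.

// Sketch: build a binary tree of width-1 nodes for `width` threads.
// nodes[0] is the root; node i's parent is nodes[(i-1)/2].
// Thread t is assigned leaf[t/2], so at most two threads share a leaf.
public class CombiningTree {
    Node[] leaf;

    public CombiningTree(int width) {
        Node[] nodes = new Node[width - 1];
        nodes[0] = new Node();                       // root
        for (int i = 1; i < nodes.length; i++) {
            nodes[i] = new Node(nodes[(i - 1) / 2]); // link child to parent
        }
        leaf = new Node[(width + 1) / 2];
        for (int i = 0; i < leaf.length; i++) {
            leaf[i] = nodes[nodes.length - i - 1];   // deepest nodes serve as leaves
        }
    }
}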

Combining Trees Here, the counter is zero, and the blue and yellow processors are about to add to the counter.

Combining Trees +3 Blue registers its request to add three at its leaf.

Combining Trees +3 +2 At about the same time, yellow adds two to the counter at its own leaf.

Combining Trees Two threads meet, combine sums: the +3 and +2 requests meet at the shared node and are combined.

Combining Trees Two threads meet, combine sums: the blue three and the yellow two are combined into a green five (+5).

Combining Trees Combined sum added to root: the combined request (+5) is added to the root, which now reads 5.

Combining Trees Result returned to children.

Combining Trees Results returned to threads. Each node remembers enough bookkeeping information to distribute the correct results among the callers.

What if? What if threads don’t arrive together? Should I stay or should I go? How long to wait? Waiting times add up … Idea: use a multi-phase algorithm, where threads wait in parallel.

Combining Status enum CStatus { IDLE, FIRST, SECOND, RESULT, ROOT };

Combining Status IDLE: nothing going on.

Combining Status FIRST: 1st thread is a partner for combining, and will return to check for a 2nd thread.

Combining Status SECOND: 2nd thread has arrived with a value for combining.

Combining Status RESULT: 1st thread has deposited the result for the 2nd thread.

Combining Status ROOT: special case, the root node.

Node Synchronization Short-term: synchronized methods, for consistency during a method call. Long-term: a boolean locked field, for consistency across calls.
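Putting the two kinds of synchronization together, here is a sketch of the per-node state, using the field names that appear in the code on the following slides (the constructors are our assumption):

// Per-node state for the combining tree (field names follow the slides).
public class Node {
    enum CStatus { IDLE, FIRST, SECOND, RESULT, ROOT }

    boolean locked;        // long-term lock, held across method calls
    CStatus cStatus;       // where this node is in the combining protocol
    int firstValue;        // increment deposited by the 1st (combining) thread
    int secondValue;       // increment deposited by the 2nd (waiting) thread
    int result;            // at the root: the counter; elsewhere: 2nd thread's result
    Node parent;

    Node() {               // root node
        cStatus = CStatus.ROOT;
        locked = false;
    }

    Node(Node myParent) {  // interior or leaf node
        parent = myParent;
        cStatus = CStatus.IDLE;
        locked = false;
    }
}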

Phases Precombining: set up the combining rendez-vous.

Phases Precombining: set up the combining rendez-vous. Combining: collect and combine operations.

Phases Precombining: set up the combining rendez-vous. Combining: collect and combine operations. Operation: hand off to the higher thread.

Phases Precombining: set up the combining rendez-vous. Combining: collect and combine operations. Operation: hand off to the higher thread. Distribution: distribute results to waiting threads.

High-level operation structure
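The navigation snippets on the slides that follow assemble into a single getAndIncrement method of the CombiningTree. Here is a hedged sketch of that overall structure; ThreadID.get() is an assumed helper giving each thread a unique index, and exception handling for wait()/sleep() is elided, as it is on the slides.

// Sketch: the four phases of one increment, inside CombiningTree,
// assembled from the navigation snippets on the following slides.
// Uses java.util.Stack, as the slides do.
public int getAndIncrement() {
    Stack<Node> stack = new Stack<>();
    Node myLeaf = leaf[ThreadID.get() / 2];   // at most two threads per leaf
    Node node = myLeaf;
    // Precombining: climb while instructed to, remember where we stopped.
    while (node.precombine()) node = node.parent;
    Node stop = node;
    // Combining: revisit the path, accumulating combined increments.
    int combined = 1;
    for (node = myLeaf; node != stop; node = node.parent) {
        combined = node.combine(combined);
        stack.push(node);
    }
    // Operation: deposit the combined sum at the stop node.
    int prior = stop.op(combined);
    // Distribution: walk back down, handing results to waiting threads.
    while (!stack.empty()) {
        node = stack.pop();
        node.distribute(prior);
    }
    return prior;
}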

Precombining Phase Examine the node’s status: IDLE.

Precombining Phase If IDLE, promise to return to look for a partner: set status to FIRST.

Precombining Phase At the ROOT, turn back (FIRST).

Precombining Phase FIRST.

Precombining Phase If FIRST, I’m willing to combine, but lock for now: set status to SECOND.

Code Tree class: in charge of navigation. Node class: combining state, synchronization state, bookkeeping.

Precombining Navigation Node node = myLeaf; while (node.precombine()) { node = node.parent; } Node stop = node;

Precombining Navigation Node node = myLeaf; … Start at the leaf.

Precombining Navigation while (node.precombine()) { node = node.parent; } … Move up while instructed to do so.

Precombining Navigation Node stop = node; … Remember where we stopped.

Precombining Node synchronized boolean precombine() { while (locked) wait(); switch (cStatus) { case IDLE: cStatus = CStatus.FIRST; return true; case FIRST: locked = true; cStatus = CStatus.SECOND; return false; case ROOT: return false; default: throw new PanicException(); } }

Precombining Node The method is synchronized: short-term synchronization.

Synchronization while (locked) wait(); … Wait while the node is locked (in use by an earlier combining phase).

Precombining Node switch (cStatus) { … } … Check the combining status.

Node was IDLE case IDLE: cStatus = CStatus.FIRST; return true; … I will return to look for the 2nd thread’s input value.

Precombining Node return true; … Continue up the tree.

I’m the 2nd Thread case FIRST: locked = true; … If the 1st thread has promised to return, lock the node so it won’t leave without me.

Precombining Node cStatus = CStatus.SECOND; … Prepare to deposit the 2nd thread’s input value.

Precombining Node return false; … End of the precombining phase; don’t continue up the tree.

Node is the Root case ROOT: return false; … If at the root, the precombining phase ends; don’t continue up the tree.

Combining Phase 1st thread locked out until 2nd provides its value (SECOND, +3).

Combining Phase 2nd thread deposits the value to be combined (2), unlocks the node, & waits …

Combining Phase 1st thread moves up the tree with the combined value (+5) …

Combining (reloaded) 2nd thread has not yet deposited its value (FIRST) …

Combining (reloaded) 1st thread is alone, locks out its late partner (FIRST, +3).

Combining (reloaded) Stop at the root (+3).

Combining (reloaded) 2nd thread’s late precombining-phase visit is locked out.

Combining Navigation node = myLeaf; int combined = 1; while (node != stop) { combined = node.combine(combined); stack.push(node); node = node.parent; }

Combining Navigation node = myLeaf; … Start at the leaf.

Combining Navigation int combined = 1; … Add 1.

Combining Navigation while (node != stop) { … } … Revisit the nodes visited in precombining.

Combining Navigation combined = node.combine(combined); … Accumulate combined values, if any.

Combining Navigation stack.push(node); … We will retraverse the path in reverse order.

Combining Navigation node = node.parent; … Move up the tree.

Combining Phase Node synchronized int combine(int combined) { while (locked) wait(); locked = true; firstValue = combined; switch (cStatus) { case FIRST: return firstValue; case SECOND: return firstValue + secondValue; default: … } }

Combining Phase Node while (locked) wait(); … Wait until the node is unlocked. It is locked by the 2nd thread until it deposits its value.

Combining Phase Node Why is it that no thread acquires the lock between these two lines? If the node is locked, the only possible reason is that the 2nd thread has yet to climb up and post its value, so no other thread can climb from the other side; and no thread can climb from this thread’s side either, since this thread made sure all lower nodes along its path are locked. If the node is not locked, then I am the first thread in this combining phase, and I manage to lock it because the method is synchronized.

Combining Phase Node locked = true; … Lock out late attempts to combine (by threads still in precombining).

Combining Phase Node firstValue = combined; … Remember my (1st thread’s) contribution.

Combining Phase Node switch (cStatus) { … } … Check the status.

Combining Phase Node case FIRST: return firstValue; … I (the 1st thread) am alone.

Combining Node case SECOND: return firstValue + secondValue; … Not alone: combine with the 2nd thread.

Operation Phase Add the combined value (+5) to the root, which now reads 5, and start back down.

Operation Phase (reloaded) Leave a value (2) to be combined (SECOND) …

Operation Phase (reloaded) Unlock, and wait …

Operation Phase Navigation prior = stop.op(combined);

Operation Phase Navigation stop is the node where we stopped: provide the collected sum, and wait for the combining result.

Operation on Stopped Node synchronized int op(int combined) { switch (cStatus) { case ROOT: int prior = result; result += combined; return prior; case SECOND: secondValue = combined; locked = false; notifyAll(); while (cStatus != CStatus.RESULT) wait(); locked = false; notifyAll(); cStatus = CStatus.IDLE; return result; default: … } } Notice that if the node is not the root, then this thread was the second to arrive in the precombining phase: another thread will do the actual combining up the tree, so this thread’s role is to place the combined value it has accumulated so far into the secondValue field.

Op States of the Stop Node Only ROOT and SECOND are possible. Why? Because a FIRST thread continued up the tree and did not stop at this node.

At the Root case ROOT: … Add the sum to the root, and return the prior value.

Intermediate Node case SECOND: secondValue = combined; … Deposit the value for later combining.

Intermediate Node locked = false; notifyAll(); … Unlock the node (which I locked in precombining), then notify the 1st thread. Note that we can set locked = false without worrying about letting a late precombining thread slip by: I am the second thread at this node, so there is no late precombining thread left to lock out.

Intermediate Node while (cStatus != CStatus.RESULT) wait(); … Wait for the 1st thread to deliver the result.

Intermediate Node locked = false; notifyAll(); cStatus = CStatus.IDLE; return result; … Unlock the node (locked by the 1st thread in the combining phase) & return.

Distribution Phase Move down with the result (5).

Distribution Phase Leave the result (3) for the 2nd thread & lock the node.

Distribution Phase Push the result down the tree.

Distribution Phase The 2nd thread awakens, unlocks the node, and takes its value (3); the node becomes IDLE.

Distribution Phase Navigation while (!stack.empty()) { node = stack.pop(); node.distribute(prior); } return prior;

Distribution Phase Navigation while (!stack.empty()) { node = stack.pop(); … } … Traverse the path in reverse order.

Distribution Phase Navigation node.distribute(prior); … Distribute results to waiting 2nd threads.

Distribution Phase Navigation return prior; … Return the result to the caller.

Distribution Phase synchronized void distribute(int prior) { switch (cStatus) { case FIRST: cStatus = CStatus.IDLE; locked = false; notifyAll(); return; case SECOND: result = prior + firstValue; cStatus = CStatus.RESULT; notifyAll(); return; default: … } }

Distribution Phase case FIRST: … No 2nd thread combined with me: unlock the node & reset.

Distribution Phase case SECOND: … Notify the 2nd thread that the result is available (the 2nd thread will release the lock).
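With all four node methods in place, the no-duplication and no-omission properties can be sanity-checked. Below is a hedged test-harness sketch; it assumes the CombiningTree and Node classes sketched above (with exception handling added) and a ThreadID helper that assigns each worker a unique index 0..n-1.

import java.util.concurrent.ConcurrentSkipListSet;

// Sketch: n threads each take `perThread` values; afterwards the taken
// values must be exactly 0 .. n*perThread-1 (no duplication, no omission).
public class CounterTest {
    public static void main(String[] args) throws InterruptedException {
        final int n = 8, perThread = 1000;
        final CombiningTree counter = new CombiningTree(n);
        final ConcurrentSkipListSet<Integer> taken = new ConcurrentSkipListSet<>();
        Thread[] workers = new Thread[n];
        for (int t = 0; t < n; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++)
                    taken.add(counter.getAndIncrement());
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        boolean ok = taken.size() == n * perThread          // no duplication
                && taken.last() == n * perThread - 1;       // no omission
        System.out.println(ok ? "OK" : "BUG");
    }
}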

Bad News: High Latency Each increment must climb a path of length log n through the tree.

Good News: Real Parallelism One thread does the log n work of carrying the combined value (+5) up the tree, on behalf of two threads (+3 and +2).

Throughput Puzzles Ideal circumstances: all n threads move together and combine, so n increments complete in O(log n) time. Worst circumstances: all n threads arrive slightly skewed and get locked out, so n increments take O(n · log n) time. For example, with n = 1024, the ideal case finishes all 1024 increments in about 10 time units, while the worst case takes about 10,240.

Index Distribution Benchmark void indexBench(int iters, int work) { int i = 0; while (i < iters) { i = r.getAndIncrement(); Thread.sleep(random() % work); } }

Index Distribution Benchmark int iters … How many iterations.

Index Distribution Benchmark int work … Expected time between incrementing the counter.

Index Distribution Benchmark i = r.getAndIncrement(); … Take a number.

Index Distribution Benchmark Thread.sleep(random() % work); … Pretend to work (more work, less concurrency).
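A hedged sketch of a driver for this benchmark follows; the Counter interface, the Random-based work simulation, and the timing/reporting code are our assumptions, with r standing for the shared counter under test, as on the slides.

import java.util.Random;

// Sketch: run the index-distribution benchmark on nThreads threads
// against a shared counter and report the elapsed time.
public class IndexBenchDriver {
    interface Counter { int getAndIncrement(); }   // spin lock or combining tree

    static void run(Counter r, int nThreads, int iters, int work)
            throws InterruptedException {
        Thread[] threads = new Thread[nThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                Random random = new Random();
                int i = 0;
                while (i < iters) {
                    i = r.getAndIncrement();        // take a number
                    try {
                        Thread.sleep(random.nextInt(work + 1)); // pretend to work
                    } catch (InterruptedException e) { return; }
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(nThreads + " threads: about " + iters
                + " increments in " + elapsedMs + " ms");
    }
}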

Performance Here are some fake graphs, distilled from real ones. We measure two quantities: throughput, the average number of increments in one million cycles, and latency, the average number of cycles per increment.

Latency [graph: latency versus number of processors; the spin lock fares badly, the combining tree fares well]

Throughput [graph: throughput versus number of processors; the curves cross — where the combining tree fares badly the spin lock is ahead, and where the combining tree fares well the spin lock falls behind]

Combining Rate vs Work