Barrier Synchronization Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Slides:



Advertisements
Similar presentations
Multiprocessor Architecture Basics
Advertisements

Multicore Programming Skip list Tutorial 10 CS Spring 2010.
1 Synchronization A parallel computer is a collection of processing elements that cooperate and communicate to solve large problems fast. Types of Synchronization.
Synchronization without Contention
Synchronization. How to synchronize processes? – Need to protect access to shared data to avoid problems like race conditions – Typical example: Updating.
Mutual Exclusion By Shiran Mizrahi. Critical Section class Counter { private int value = 1; //counter starts at one public Counter(int c) { //constructor.
Process Synchronization. Module 6: Process Synchronization Background The Critical-Section Problem Peterson’s Solution Synchronization Hardware Semaphores.
12. Common Errors, a few Puzzles. © O. Nierstrasz P2 — Common Errors, a few Puzzles 12.2 Common Errors, a few Puzzles Sources  Cay Horstmann, Computing.
Introduction Companion slides for
Synchronization without Contention John M. Mellor-Crummey and Michael L. Scott+ ECE 259 / CPS 221 Advanced Computer Architecture II Presenter : Tae Jun.
Spin Locks and Contention Based on slides by by Maurice Herlihy & Nir Shavit Tomer Gurevich.
Concurrent Queues Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Barrier Synchronization Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Barrier Synchronization Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
ESE Einführung in Software Engineering X. CHAPTER Prof. O. Nierstrasz Wintersemester 2005 / 2006.
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.
Synchronization Todd C. Mowry CS 740 November 1, 2000 Topics Locks Barriers Hardware primitives.
1 Lecture 20: Protocols and Synchronization Topics: distributed shared-memory multiprocessors, synchronization (Sections )
Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Two-Process Systems TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A AA Companion slides for Distributed Computing.
Synchronization (Barriers) Parallel Processing (CS453)
Multicore Programming
Manifold Protocols TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A AA Companion slides for Distributed Computing.
Multiprocessor Architecture Basics Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Ahmed Khademzadeh Azad University.
1 Chapter 5 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld.
The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors THOMAS E. ANDERSON Presented by Daesung Park.
MULTIVIE W Slide 1 (of 23) The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors Paper: Thomas E. Anderson Presentation: Emerson.
Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 12: May 3, 2003 Shared Memory.
Jeremy Denham April 7,  Motivation  Background / Previous work  Experimentation  Results  Questions.
CALTECH cs184c Spring DeHon CS184c: Computer Architecture [Parallel and Multithreaded] Day 10: May 8, 2001 Synchronization.
State of the Ward in 2007 Version 1.0 A Fifth Sunday Lesson Given in the Sterling Park Ward, Ashburn, VA Stake by D. Calvin Andrus, Bishop
Monitors and Blocking Synchronization Dalia Cohn Alperovich Based on “The Art of Multiprocessor Programming” by Herlihy & Shavit, chapter 8.
Concurrent Queues Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Caltech CS184 Spring DeHon 1 CS184b: Computer Architecture (Abstractions and Optimizations) Day 22: May 20, 2005 Synchronization.
Programming Language Basics Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Multiprocessor Architecture Basics Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
1 Based on: The art of multiprocessor programming Maurice Herlihy and Nir Shavit, 2008 Appendix A – Software Basics Appendix B – Hardware Basics Introduction.
Concurrent Stacks Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Futures, Scheduling, and Work Distribution Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit TexPoint fonts used.
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.
Counting and Distributed Coordination BASED ON CHAPTER 12 IN «THE ART OF MULTIPROCESSOR PROGRAMMING» LECTURE BY: SAMUEL AMAR.
Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Spin Locks and Contention Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Queue Locks and Local Spinning Some Slides based on: The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
The Relative Power of Synchronization Operations Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.
Barrier Synchronization
The University of Adelaide, School of Computer Science
The University of Adelaide, School of Computer Science
Concurrent Objects Companion slides for
Shared Counters and Parallelism
Ivy Eva Wu.
Distributed Algorithms (22903)
Distributed Algorithms (22903)
Byzantine-Resilient Colorless Computaton
Counting, Sorting, and Distributed Coordination
FOTW Worksheet Slides Christopher Penn, Financial Aid Podcast Student Loan Network.
Simulations and Reductions
Barrier Synchronization
Combinatorial Topology and Distributed Computing
Barrier Synchronization
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Shared Counters and Parallelism
Shared Counters and Parallelism
Lecture 17 Multiprocessors and Thread-Level Parallelism
The University of Adelaide, School of Computer Science
Lecture 17 Multiprocessors and Thread-Level Parallelism
Presentation transcript:

Barrier Synchronization Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

Art of Multiprocessor Programming 2 Simple Video Game Prepare frame for display –By graphics coprocessor “soft real-time” application –Need at least 35 frames/second –OK to mess up rarely

Art of Multiprocessor Programming 3 Simple Video Game while (true) { frame.prepare(); frame.display(); }

Art of Multiprocessor Programming 4 Simple Video Game while (true) { frame.prepare(); frame.display(); } What about overlapping work? –1 st thread displays frame –2 nd prepares next frame

Art of Multiprocessor Programming 5 Two-Phase Rendering while (true) { if (phase) { frame[0].display(); } else { frame[1].display(); } phase = !phase; } while (true) { if (phase) { frame[1].prepare(); } else { frame[0].prepare(); } phase = !phase; }

Art of Multiprocessor Programming 6 Two-Phase Rendering while (true) { if (phase) { frame[0].display(); } else { frame[1].display(); } phase = !phase; } while (true) { if (phase) { frame[1].prepare(); } else { frame[0].prepare(); } phase = !phase; } even phases

Art of Multiprocessor Programming 7 Two-Phase Rendering while (true) { if (phase) { frame[0].display(); } else { frame[1].display(); } phase = !phase; } while (true) { if (phase) { frame[1].prepare(); } else { frame[0].prepare(); } phase = !phase; } odd phases

Art of Multiprocessor Programming 8 Synchronization Problems How do threads stay in phase? Too early? –“we render no frame before its time” Too late? –Recycle memory before frame is displayed

Art of Multiprocessor Programming 9 Ideal Parallel Computation

Art of Multiprocessor Programming 10 Ideal Parallel Computation

Art of Multiprocessor Programming 11 Real-Life Parallel Computation zzz…

Art of Multiprocessor Programming 12 Real-Life Parallel Computation zzz… Uh, oh 0

Art of Multiprocessor Programming 13 Barrier Synchronization barrier

Art of Multiprocessor Programming 14 Barrier Synchronization barrier

Art of Multiprocessor Programming 15 Barrier Synchronization No thread enters here Until every thread has left here barrier

Art of Multiprocessor Programming 16 Why Do We Care? Mostly of interest to –Scientific & numeric computation Elsewhere –Garbage collection –Less common in systems programming –Still important topic

Art of Multiprocessor Programming 17 Duality Dual to mutual exclusion –Include others, not exclude them Same implementation issues –Interaction with caches … Invalidation? Local spinning?

Art of Multiprocessor Programming 18 Example: Parallel Prefix abcd aa+ba+b+c +d before after

Art of Multiprocessor Programming 19 Parallel Prefix abcd One thread Per entry

Art of Multiprocessor Programming 20 Parallel Prefix: Phase 1 abcd aa+bb+cc+d

Art of Multiprocessor Programming 21 Parallel Prefix: Phase 2 abcd aa+ba+b+c +d

Art of Multiprocessor Programming 22 Parallel Prefix N threads can compute –Parallel prefix –Of N entries –In log 2 N rounds What if system is asynchronous? –Why we need barriers

Art of Multiprocessor Programming 23 Prefix class Prefix extends Thread { int[] a; int i; Barrier b; void Prefix(int[] a, Barrier b, int i) { a = a; b = b; i = i; }

class Prefix extends Thread { int[] a; int i; Barrier b; void Prefix(int[] a, Barrier b, int i) { a = a; b = b; i = i; } Art of Multiprocessor Programming 24 Prefix Array of input values

class Prefix extends Thread { int[] a; int i; Barrier b; void Prefix(int[] a, Barrier b, int i) { a = a; b = b; i = i; } Art of Multiprocessor Programming 25 Prefix Thread index

class Prefix extends Thread { int[] a; int i; Barrier b; void Prefix(int[] a, Barrier b, int i) { a = a; b = b; i = i; } Art of Multiprocessor Programming 26 Prefix Shared barrier

class Prefix extends Thread { int[] a; int i; Barrier b; void Prefix(int[] a, Barrier b, int i) { a = a; b = b; i = i; } Art of Multiprocessor Programming 27 Prefix Initialize fields

Art of Multiprocessor Programming 28 Where Do the Barriers Go? public void run() { int d = 1, sum = 0; while (d < N) { if (i >= d) sum = a[i-d]; if (i >= d) a[i] += sum; d = d * 2; }}}

Art of Multiprocessor Programming 29 Where Do the Barriers Go? public void run() { int d = 1, sum = 0; while (d < N) { if (i >= d) sum = a[i-d]; b.await(); if (i >= d) a[i] += sum; d = d * 2; }}} Make sure everyone reads before anyone writes

Art of Multiprocessor Programming 30 Where Do the Barriers Go? public void run() { int d = 1, sum = 0; while (d < N) { if (i >= d) sum = a[i-d]; b.await(); if (i >= d) a[i] += sum; b.await(); d = d * 2; }}} Make sure everyone writes before anyone reads Make sure everyone reads before anyone writes

Art of Multiprocessor Programming 31 Barrier Implementations Cache coherence –Spin on locally-cached locations? –Spin on statically-defined locations? Latency –How many steps? Symmetry –Do all threads do the same thing?

Art of Multiprocessor Programming 32 Barriers public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}}

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 33 Barriers Number of threads not yet arrived

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 34 Barriers Number of threads participating

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 35 Barriers Initialization

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 36 Barriers Principal method

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 37 Barriers If I’m last, reset fields for next time

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 38 Barriers Otherwise, wait for everyone else

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 39 Barriers What’s wrong with this protocol?

Art of Multiprocessor Programming 40 Reuse Barrier b = new Barrier(n); while ( mumble() ) { work(); b.await() } do work synchronize repeat

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 41 Barriers

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 42 Barriers Waiting for Phase 1 to finish

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 43 Barriers Waiting for Phase 1 to finish Phase 1 is so over

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming 44 Barriers ZZZZZ…. Prepare for phase 2

public class Barrier { AtomicInteger count; int size; public Barrier(int n){ count = AtomicInteger(n); size = n; } public void await() { if (count.getAndDecrement()==1) { count.set(size); } else { while (count.get() != 0); }}}} Art of Multiprocessor Programming Uh-Oh Waiting for Phase 1 to finish Waiting for Phase 2 to finish

Art of Multiprocessor Programming 46 Basic Problem One thread “wraps around” to start phase 2 While another thread is still waiting for phase 1 One solution: –Always use two barriers

Art of Multiprocessor Programming 47 Sense-Reversing Barriers public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal … public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}}

public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal … public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 48 Sense-Reversing Barriers Completed odd or even-numbered phase?

public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal … public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 49 Sense-Reversing Barriers Store sense for next phase

public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal … public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 50 Sense-Reversing Barriers Get new sense determined by last phase

public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal … public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 51 Sense-Reversing Barriers If I’m last, reverse sense for next time

public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal … public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 52 Sense-Reversing Barriers Otherwise, wait for sense to flip

public class Barrier { AtomicInteger count; int size; boolean sense = false; threadSense = new ThreadLocal … public void await { boolean mySense = threadSense.get(); if (count.getAndDecrement()==1) { count.set(size); sense = mySense } else { while (sense != mySense) {} } threadSense.set(!mySense)}}} Art of Multiprocessor Programming 53 Sense-Reversing Barriers Prepare sense for next phase

Art of Multiprocessor Programming 54 2-barrier Combining Tree Barriers

Art of Multiprocessor Programming 55 2-barrier Combining Tree Barriers

Art of Multiprocessor Programming 56 Combining Tree Barrier public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}}

public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 57 Combining Tree Barrier Parent barrier in tree

public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 58 Combining Tree Barrier Am I last?

public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await(); count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 59 Combining Tree Barrier Proceed to parent barrier

public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 60 Combining Tree Barrier Prepare for next phase

public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 61 Combining Tree Barrier Notify others at this node

public class Node{ AtomicInteger count; int size; Node parent; volatile boolean sense; public void await() {… if (count.getAndDecrement()==1) { if (parent != null) { parent.await() count.set(size); sense = mySense } else { while (sense != mySense) {} }…}}} Art of Multiprocessor Programming 62 Combining Tree Barrier I’m not last, so wait for notification

Art of Multiprocessor Programming 63 Combining Tree Barrier No sequential bottleneck –Parallel getAndDecrement() calls Low memory contention –Same reason Cache behavior –Local spinning on bus-based architecture –Not so good for NUMA

Art of Multiprocessor Programming 64 Remarks Everyone spins on sense field –Local spinning on bus-based (good) –Network hot-spot on distributed architecture (bad) Not really scalable

Art of Multiprocessor Programming 65 Tournament Tree Barrier If tree nodes have fan-in 2 –Don’t need to call getAndDecrement() –Winner chosen statically At level i –If i-th bit of id is 0, move up –Otherwise keep back

Art of Multiprocessor Programming 66 Tournament Tree Barriers winnerloser winnerloser winnerloser root

Art of Multiprocessor Programming 67 Tournament Tree Barriers All flags blue

Art of Multiprocessor Programming 68 Tournament Tree Barriers Loser thread sets winner’s flag

Art of Multiprocessor Programming 69 Tournament Tree Barriers Loser spins on own flag

Art of Multiprocessor Programming 70 Tournament Tree Barriers Winner spins on own flag

Art of Multiprocessor Programming 71 Tournament Tree Barriers Winner sees own flag, moves up, spins

Art of Multiprocessor Programming 72 Tournament Tree Barriers Bingo!

Art of Multiprocessor Programming 73 Tournament Tree Barriers Sense-reversing: next time use blue flags

Art of Multiprocessor Programming 74 Tournament Barrier class TBarrier { boolean flag; TBarrier partner; TBarrier parent; boolean top; … }

Art of Multiprocessor Programming 75 Tournament Barrier class TBarrier { boolean flag; TBarrier partner; TBarrier parent; boolean top; … } Notifications delivered here

Art of Multiprocessor Programming 76 Tournament Barrier class TBarrier { boolean flag; TBarrier partner; TBarrier parent; boolean top; … } Other thead at same level

Art of Multiprocessor Programming 77 Tournament Barrier class TBarrier { boolean flag; TBarrier partner; TBarrier parent; boolean top; … } Parent (winner) or null (loser)

Art of Multiprocessor Programming 78 Tournament Barrier class TBarrier { boolean flag; TBarrier partner; TBarrier parent; boolean top; … } Am I the root?

Art of Multiprocessor Programming 79 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}}

Art of Multiprocessor Programming 80 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Le root, c’est moi Current sense

Art of Multiprocessor Programming 81 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} I am already a winner

Art of Multiprocessor Programming 82 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Wait for partner

Art of Multiprocessor Programming 83 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Synchronize upstairs

Art of Multiprocessor Programming 84 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Inform partner

Art of Multiprocessor Programming 85 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Inform partner Order is important (why?)

Art of Multiprocessor Programming 86 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Natural-born loser

Art of Multiprocessor Programming 87 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Tell partner I’m here

Art of Multiprocessor Programming 88 Tournament Barrier void await(boolean mySense) { if (top) { return; } else if (parent != null) { while (flag != mySense) {}; parent.await(mySense); partner.flag = mySense; } else { partner.flag = mySense; while (flag != mySense) {}; }}} Wait for notification from partner

Art of Multiprocessor Programming 89 Remarks No need for read-modify-write calls Each thread spins on fixed location –Good for bus-based architectures –Good for NUMA architectures

Art of Multiprocessor Programming 90 Dissemination Barrier At round i –Thread A notifies thread A+2 i (mod n) Requires log n rounds

Art of Multiprocessor Programming 91 Dissemination Barrier

Art of Multiprocessor Programming 92 Remarks Elegant Good source of homework problems Not cache-friendly

Art of Multiprocessor Programming 93 Ideas So Far Sense-reversing –Reuse without reinitializing Combining tree –Like counters, locks … Tournament tree –Optimized combining tree Dissemination barrier –Intellectually Pleasing (matter of taste)

Art of Multiprocessor Programming 94 Which is best for Multicore? On a cache coherent multicore chip: perhaps none of the above… Here is another (arguably) better algorithm …

Art of Multiprocessor Programming 95 Static Tree Barrier One node per thread, statically assigned

Art of Multiprocessor Programming 96 Static Tree Barrier Sense-reversing flag

Art of Multiprocessor Programming Static Tree Barrier Node has count of missing children

Art of Multiprocessor Programming Static Tree Barrier Spin until zero …

21 Art of Multiprocessor Programming Static Tree Barrier 0 0 My counter is zero, decrement parent

21 Art of Multiprocessor Programming Static Tree Barrier 0 0 Spin on done flag

Art of Multiprocessor Programming Static Tree Barrier 0 0

Art of Multiprocessor Programming Static Tree Barrier 0 0

Art of Multiprocessor Programming Static Tree Barrier 0 0

Art of Multiprocessor Programming Static Tree Barrier 0 0

Art of Multiprocessor Programming Static Tree Barrier 0 0

Art of Multiprocessor Programming Static Tree Barrier 0 0 yowzah!

Art of Multiprocessor Programming Static Tree Barrier 0 0 yowzah!

Art of Multiprocessor Programming Static Tree Barrier 0 0 yowzah!

Art of Multiprocessor Programming Static Tree Barrier 0 0 yes!

Art of Multiprocessor Programming Static Tree Barrier 0 0

Art of Multiprocessor Programming 111 Remarks Very little cache traffic Minimal space overhead On message-passing architecture –Send notification & sense down tree

Art of Multiprocessor Programming 112 This work is licensed under a Creative Commons Attribution- ShareAlike 2.5 License.Creative Commons Attribution- ShareAlike 2.5 License You are free: –to Share — to copy, distribute and transmit the work –to Remix — to adapt the work Under the following conditions: –Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work). –Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to – Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.