Shared Counters and Parallelism Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Presentation transcript:

Shared Counters and Parallelism
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

A Shared Pool
Unordered set of objects
Put
– Inserts object
– Blocks if full
Remove
– Removes & returns an object
– Blocks if empty
  public interface Pool {
    public void put(Object x);
    public Object remove();
  }

Simple Locking Implementation
– Problem: hot-spot contention
– Solution: queue lock
– Problem: counter is a sequential bottleneck
– Solution: ???
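A minimal sketch of what such a simple locking pool might look like, assuming a fixed capacity and Java's built-in monitor lock. The names LockedPool and capacity are illustrative and do not come from the slides. Every put and remove passes through the same lock and the same size field, which is where the hot spot and the sequential bottleneck show up.

```java
// Sketch only: LockedPool and its capacity parameter are assumed names.
interface Pool {
    void put(Object x);
    Object remove();
}

class LockedPool implements Pool {
    private final Object[] items;
    private int size = 0;              // shared by every put and remove

    LockedPool(int capacity) {
        items = new Object[capacity];
    }

    // All callers serialize on this object's monitor.
    public synchronized void put(Object x) {
        try {
            while (size == items.length) wait();   // blocks if full
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to put", e);
        }
        items[size++] = x;
        notifyAll();
    }

    public synchronized Object remove() {
        try {
            while (size == 0) wait();              // blocks if empty
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to remove", e);
        }
        Object x = items[--size];
        items[size] = null;
        notifyAll();
        return x;
    }
}
```

A queue lock would change how threads wait for the monitor, but the updates to size would still happen one at a time.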

Counting Implementation
(Figure: put and remove operations each pass through a counter.)
– Only the counters are sequential
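One way to read the counting implementation is as an array of one-element slots, with one counter choosing the slot for each put and another choosing the slot for each remove. The sketch below makes that assumption. CountingPool and Slot are illustrative names, and AtomicInteger only stands in for whatever shared counter is eventually used.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: CountingPool and Slot are assumed names, not from the slides.
class CountingPool {
    // One-element buffer; threads block here only if they meet at the same slot.
    private static final class Slot {
        private Object item;

        synchronized void put(Object x) {
            try {
                while (item != null) wait();       // slot still full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted in put", e);
            }
            item = x;
            notifyAll();
        }

        synchronized Object remove() {
            try {
                while (item == null) wait();       // slot still empty
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted in remove", e);
            }
            Object x = item;
            item = null;
            notifyAll();
            return x;
        }
    }

    private final Slot[] slots;
    private final AtomicInteger putCounter = new AtomicInteger();
    private final AtomicInteger removeCounter = new AtomicInteger();

    CountingPool(int capacity) {
        slots = new Slot[capacity];
        for (int i = 0; i < capacity; i++) slots[i] = new Slot();
    }

    // Items must be non-null, because null marks an empty slot.
    public void put(Object x) {
        if (x == null) throw new NullPointerException("null items are not supported");
        int ticket = putCounter.getAndIncrement();
        // floorMod keeps the index non-negative even after the counter overflows.
        slots[Math.floorMod(ticket, slots.length)].put(x);
    }

    public Object remove() {
        int ticket = removeCounter.getAndIncrement();
        return slots[Math.floorMod(ticket, slots.length)].remove();
    }
}
```

Once a thread holds its ticket it works on its own slot, so the two counters are the only places where all threads meet.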

Shared Counter
– No duplication
– No omission
– Not necessarily linearizable

Shared Counters
Can we build a shared counter with
– Low memory contention, and
– Real parallelism?
Locking
– Can use queue locks to reduce contention
– No help with parallelism issue …
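For reference, a lock-based counter might be sketched as follows, together with a small check of the "no duplication, no omission" property. LockedCounter and CounterCheck are illustrative names, and ReentrantLock is used only as a stand-in for whichever lock is chosen. The counter is correct, but the increments still run strictly one after another.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: LockedCounter and CounterCheck are assumed names.
class LockedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int value = 0;

    // Every caller passes through the same lock, one at a time.
    int getAndIncrement() {
        lock.lock();
        try {
            return value++;
        } finally {
            lock.unlock();
        }
    }
}

class CounterCheck {
    // Runs `threads` workers that each take `perThread` numbers, then reports
    // whether the values 0 .. threads*perThread - 1 were each handed out once.
    static boolean noDuplicationNoOmission(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        int total = threads * perThread;
        AtomicIntegerArray seen = new AtomicIntegerArray(total);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    seen.incrementAndGet(counter.getAndIncrement());
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        for (int i = 0; i < total; i++) {
            if (seen.get(i) != 1) return false;    // duplicated or omitted value
        }
        return true;
    }
}
```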

Software Combining Tree
– Contention: all spinning local
– Parallelism: potential n/log n speedup

Combining Trees
(Figure: a tree whose root holds the counter value, initially 0; threads arrive at the leaves with increments such as +3.)
– Two threads meet, combine sums (+5 in the figure)
– Combined sum added to root
– Result returned to children
– Results returned to threads

Devil in the Details
What if
– threads don’t arrive at the same time?
Wait for a partner to show up?
– How long to wait?
– Waiting times add up …
Instead
– Use multi-phase algorithm
– Try to wait in parallel …

Combining Status
  enum CStatus { IDLE, FIRST, SECOND, DONE, ROOT };
– IDLE: nothing going on
– FIRST: 1st thread in search of partner for combining, will return soon to check for 2nd thread
– SECOND: 2nd thread arrived with value for combining
– DONE: 1st thread has completed operation & deposited result for 2nd thread
– ROOT: special case, root node

Node Synchronization
Short-term
– Synchronized methods
– Consistency during method call
Long-term
– Boolean locked field
– Consistency across calls

Phases
Precombining
– Set up combining rendez-vous
Combining
– Collect and combine operations
Operation
– Hand off to higher thread
Distribution
– Distribute results to waiting threads

Precombining Phase
– Examine status
– If IDLE, promise to return to look for partner (status becomes FIRST)
– At ROOT, turn back
– If FIRST, I’m willing to combine, but lock for now (status becomes SECOND)
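The status transitions of the precombining phase can be sketched on a cut-down node that keeps only the fields this phase touches. PrecombineNode is an illustrative name. The boolean return value, where true means "keep climbing to the parent", is an assumption about how the tree code would drive the node.

```java
// Sketch only: PrecombineNode is an assumed, cut-down node class.
class PrecombineNode {
    enum CStatus { IDLE, FIRST, SECOND, DONE, ROOT }

    CStatus status;
    boolean locked = false;            // long-term lock, held across calls
    final PrecombineNode parent;

    PrecombineNode() {                 // root node
        status = CStatus.ROOT;
        parent = null;
    }

    PrecombineNode(PrecombineNode parent) {
        status = CStatus.IDLE;
        this.parent = parent;
    }

    // Returns true if the caller should continue up to the parent.
    synchronized boolean precombine() {
        try {
            while (locked) wait();     // an earlier combining pass is still in progress
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted in precombine", e);
        }
        switch (status) {
            case IDLE:                 // promise to return and look for a partner
                status = CStatus.FIRST;
                return true;
            case FIRST:                // willing to combine, but lock for now
                locked = true;
                status = CStatus.SECOND;
                return false;
            case ROOT:                 // at the root, turn back
                return false;
            default:
                throw new IllegalStateException("unexpected status " + status);
        }
    }
}
```

A thread would call precombine() on its leaf and then on each parent for as long as the call returns true. The node where it returns false is where that thread later stops.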

Code
Tree class
– In charge of navigation
Node class
– Combining state
– Synchronization state
– Bookkeeping

Combining Phase
(Figure: node in status SECOND; the 1st thread carries +3 and the 2nd thread deposits 2.)
– 1st thread locked out until 2nd provides value
– 2nd thread deposits value to be combined, unlocks node, & waits …
– 1st thread moves up the tree with combined value …

Combining (reloaded)
(Figure: node in status FIRST; the 1st thread carries +3.)
– 2nd thread has not yet deposited value …
– 1st thread is alone, locks out late partner
– Stop at root
– 2nd thread’s phase 1 visit locked out

Operation Phase
– Add combined value to root, start back down (phase 4)

Operation Phase (reloaded)
(Figure: node in status SECOND; the 2nd thread's value is 2.)
– Leave value to be combined …
– Unlock, and wait …

Distribution Phase
– Move down with result
– Leave result for 2nd thread & lock node
– Move result back down tree
– 2nd thread awakens, unlocks, takes value (3 in the figure); node returns to IDLE
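Putting the four phases together, a combining-tree counter might be sketched as below. This is an illustration under stated assumptions, not the deck's own listing. The class names Node and CombiningTree, the field names, and the explicit threadId argument are all assumed. The tree width is assumed to be a power of two, and threads 2k and 2k+1 are assumed to share leaf k, so that at most two threads ever use one leaf.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch only: names are assumed; width must be a power of two.
class Node {
    enum CStatus { IDLE, FIRST, SECOND, DONE, ROOT }

    private boolean locked = false;          // long-term lock, held across calls
    private CStatus status;
    private int firstValue, secondValue, result;
    final Node parent;

    Node() { status = CStatus.ROOT; parent = null; }
    Node(Node parent) { status = CStatus.IDLE; this.parent = parent; }

    private void await() {                   // wait() without a checked exception
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    synchronized boolean precombine() {      // phase 1: true means keep climbing
        while (locked) await();
        switch (status) {
            case IDLE:  status = CStatus.FIRST; return true;
            case FIRST: locked = true; status = CStatus.SECOND; return false;
            case ROOT:  return false;
            default:    throw new IllegalStateException("precombine: " + status);
        }
    }

    synchronized int combine(int combined) { // phase 2: 1st thread collects values
        while (locked) await();
        locked = true;
        firstValue = combined;
        switch (status) {
            case FIRST:  return firstValue;
            case SECOND: return firstValue + secondValue;
            default:     throw new IllegalStateException("combine: " + status);
        }
    }

    synchronized int op(int combined) {      // phase 3: apply at root or hand off
        switch (status) {
            case ROOT: {
                int prior = result;
                result += combined;
                return prior;
            }
            case SECOND: {
                secondValue = combined;
                locked = false;
                notifyAll();                 // let the 1st thread combine
                while (status != CStatus.DONE) await();
                locked = false;
                notifyAll();
                status = CStatus.IDLE;
                return result;
            }
            default:
                throw new IllegalStateException("op: " + status);
        }
    }

    synchronized void distribute(int prior) { // phase 4: pass results back down
        switch (status) {
            case FIRST:  status = CStatus.IDLE; locked = false; break;
            case SECOND: result = prior + firstValue; status = CStatus.DONE; break;
            default:     throw new IllegalStateException("distribute: " + status);
        }
        notifyAll();
    }
}

class CombiningTree {
    private final Node[] leaf;

    CombiningTree(int width) {
        Node[] nodes = new Node[width - 1];
        nodes[0] = new Node();
        for (int i = 1; i < nodes.length; i++) nodes[i] = new Node(nodes[(i - 1) / 2]);
        leaf = new Node[(width + 1) / 2];
        for (int i = 0; i < leaf.length; i++) leaf[i] = nodes[nodes.length - i - 1];
    }

    int getAndIncrement(int threadId) {
        Deque<Node> path = new ArrayDeque<>();
        Node myLeaf = leaf[threadId / 2];
        Node node = myLeaf;
        while (node.precombine()) node = node.parent;
        Node stop = node;
        int combined = 1;
        for (node = myLeaf; node != stop; node = node.parent) {
            combined = node.combine(combined);
            path.push(node);
        }
        int prior = stop.op(combined);
        while (!path.isEmpty()) path.pop().distribute(prior);
        return prior;
    }
}
```

With width 8, threads with ids 0 to 7 can call getAndIncrement(id) concurrently. A thread that ends up as the second arrival at a node waits there, while the first arrival carries both increments further up the tree.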

Bad News: High Latency
– log n

Good News: Real Parallelism
(Figure labels: threads, 1 thread.)

Throughput Puzzles
Ideal circumstances
– All n threads move together, combine
– n increments in O(log n) time
Worst circumstances
– All n threads slightly skewed, locked out
– n increments in O(n · log n) time
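The two bounds can be compared against a purely sequential counter. The sketch below assumes that a sequential counter spends one time unit per increment, so n increments take Θ(n) time. The symbols T_seq, T_ideal and T_worst are introduced here only for illustration.

```latex
\[
T_{\text{seq}}(n) = \Theta(n), \qquad
T_{\text{ideal}}(n) = O(\log n)
\;\Longrightarrow\;
\frac{T_{\text{seq}}(n)}{T_{\text{ideal}}(n)} = \Omega\!\left(\frac{n}{\log n}\right)
\]
\[
T_{\text{worst}}(n) = O(n \log n)
\;\Longrightarrow\;
\frac{T_{\text{worst}}(n)}{T_{\text{seq}}(n)} = O(\log n)
\]
```

For n = 64, for example, log₂ n = 6. Ignoring constant factors, the ideal case is roughly 64/6 ≈ 10.7 times faster than the sequential counter, while the worst case can be up to about 6 times slower.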

Index Distribution Benchmark
  void indexBench(int iters, int work) throws InterruptedException {
    int i = 0;
    while (i < iters) {
      i = r.getAndIncrement();
      if (work > 0) Thread.sleep(random() % work);
    }
  }
– iters: how many iterations
– work: expected time between incrementing counter
– i = r.getAndIncrement(): take a number
– Thread.sleep(random() % work): pretend to work (more work, less concurrency)
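A runnable variant of the benchmark loop is sketched below so that it can be tried on an ordinary JVM. It rests on several assumptions: AtomicInteger stands in for the counter r under test, work is interpreted as an upper bound in milliseconds on a random sleep, and the class IndexBench, the method run and the per-index tally are illustrative additions rather than part of the original benchmark.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Sketch only: IndexBench and run are assumed names.
class IndexBench {
    // Hands out indices 0 .. iters-1 across `threads` workers and returns how
    // many times each index was taken (every entry should be 1).
    static int[] run(int threads, int iters, int work) {
        AtomicInteger r = new AtomicInteger();           // stand-in for the counter under test
        AtomicIntegerArray taken = new AtomicIntegerArray(iters);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                int i = r.getAndIncrement();             // take a number
                while (i < iters) {
                    taken.incrementAndGet(i);
                    pretendToWork(work);                 // more work, less concurrency
                    i = r.getAndIncrement();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) join(w);
        int[] counts = new int[iters];
        for (int i = 0; i < iters; i++) counts[i] = taken.get(i);
        return counts;
    }

    private static void pretendToWork(int work) {
        if (work <= 0) return;                           // work = 0 means no delay
        try {
            Thread.sleep(ThreadLocalRandom.current().nextInt(work));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static void join(Thread w) {
        try {
            w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling IndexBench.run(4, 200, 2), for instance, distributes 200 indices over four threads. Swapping another counter in for r is what would turn this into a comparison between counter implementations.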

Performance Benchmarks
Alewife
– NUMA architecture
– Simulated
Throughput:
– average number of inc operations in a 1 million cycle period.
Latency:
– average number of simulator cycles per inc operation.

Performance (work = 0)
(Figure: two plots comparing Splock and Ctree[n]. Latency: cycles per operation vs. number of processors. Throughput: operations per million cycles vs. number of processors.)

The Combining Paradigm
Implements any RMW operation
When tree is loaded
– Takes 2 log n steps
– for n requests
Very sensitive to load fluctuations:
– if the arrival rates drop
– the combining rates drop
– overall performance deteriorates!

Combining Load Sensitivity
(Figure: throughput vs. processors. Notice load fluctuations.)

Combining Rate vs Work
(Figure only.)

Better to Wait Longer
(Figure: throughput vs. processors for short wait, medium wait, and indefinite wait.)

Conclusions
Combining Trees
– Work well under high contention
– Sensitive to load fluctuations
– Can be used for getAndMumble() ops
Next
– Counting networks
– A different approach …

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
You are free:
– to Share: to copy, distribute and transmit the work
– to Remix: to adapt the work
Under the following conditions:
– Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to
– Any of the above conditions can be waived if you get permission from the copyright holder.
– Nothing in this license impairs or restricts the author's moral rights.