Futures, Scheduling, and Work Distribution
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit

How to Write Parallel Apps?
– Split a program into parallel parts, in an effective way
– Manage the threads

Matrix Multiplication
[Figure: computing each entry of C = A × B as the dot product of a row of A and a column of B.]

Matrix Multiplication

    class Worker extends Thread {
        int row, col;    // which matrix entry to compute
        Worker(int row, int col) {
            this.row = row;
            this.col = col;
        }
        public void run() {    // the actual computation
            double dotProduct = 0.0;
            for (int i = 0; i < n; i++)
                dotProduct += a[row][i] * b[i][col];
            c[row][col] = dotProduct;
        }
    }

Each Worker is a thread that computes a single entry of the product matrix.

Matrix Multiplication

    void multiply() {
        Worker[][] worker = new Worker[n][n];
        for (int row = 0; row < n; row++)        // create n x n threads
            for (int col = 0; col < n; col++)
                worker[row][col] = new Worker(row, col);
        for (int row = 0; row < n; row++)        // start them
            for (int col = 0; col < n; col++)
                worker[row][col].start();
        for (int row = 0; row < n; row++)        // wait for them to finish
            for (int col = 0; col < n; col++)
                worker[row][col].join();
    }

What's wrong with this picture?

Thread Overhead
– Threads require resources: memory for stacks, setup and teardown, scheduler overhead
– For short-lived threads, the ratio of useful work to overhead is bad

Thread Pools
More sensible to keep a pool of long-lived threads. Threads are assigned short-lived tasks: run the task, rejoin the pool, wait for the next assignment.

Thread Pool = Abstraction
Insulates the programmer from the platform: big machine, big pool; small machine, small pool. The code is portable, it works across platforms, so you worry about the algorithm, not the platform.

ExecutorService Interface
In java.util.concurrent:
– Task = Runnable object: if no result value is expected, the pool calls its run() method
– Task = Callable<T> object: if a result value of type T is expected, the pool calls its T call() method
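
A minimal runnable sketch of both kinds of submission (the pool size and the task bodies here are illustrative assumptions, not from the slides):

    import java.util.concurrent.*;

    public class PoolDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService exec = Executors.newFixedThreadPool(4);
            // Runnable task: no result value expected
            Future<?> done = exec.submit(() -> System.out.println("working..."));
            // Callable<Integer> task: a result value is expected
            Future<Integer> sum = exec.submit(() -> 1 + 2 + 3);
            done.get();                        // blocks until the Runnable finishes
            System.out.println(sum.get());     // blocks until the value is available
            exec.shutdown();
        }
    }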

Future

    Callable<T> task = …;
    …
    Future<T> future = executor.submit(task);   // submitting a Callable<T> task returns a Future<T> object
    …
    T value = future.get();                     // the Future's get() method blocks until the value is available

Future

    Runnable task = …;
    …
    Future<?> future = executor.submit(task);   // submitting a Runnable task returns a Future<?> object
    …
    future.get();                               // the Future's get() method blocks until the computation is complete

Note
ExecutorService submissions, like New England traffic signs, are purely advisory in nature. The executor, like the Boston driver, is free to ignore any such advice, and could execute tasks sequentially…

Fibonacci
F(n) = 1 if n = 0 or 1; otherwise F(n) = F(n-1) + F(n-2)
Note the potential parallelism, and the dependencies.

Disclaimer
This Fibonacci implementation is egregiously inefficient, so don't try this at home or on the job! But it illustrates our point: how to deal with dependencies.

Multithreaded Fibonacci

    class FibTask implements Callable<Integer> {
        static ExecutorService exec = Executors.newCachedThreadPool();
        int arg;
        public FibTask(int n) { arg = n; }
        public Integer call() throws Exception {
            if (arg > 1) {
                // parallel calls
                Future<Integer> left  = exec.submit(new FibTask(arg - 1));
                Future<Integer> right = exec.submit(new FibTask(arg - 2));
                // pick up & combine results
                return left.get() + right.get();
            } else {
                return 1;
            }
        }
    }
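
A small usage sketch (the argument is arbitrary; with the base case above, F(0) = F(1) = 1, so FibTask(10) computes 89):

    Future<Integer> result = FibTask.exec.submit(new FibTask(10));
    System.out.println(result.get());   // prints 89
    FibTask.exec.shutdown();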

Dynamic Behavior
A multithreaded program is a directed acyclic graph (DAG) that unfolds dynamically. Each node is a single unit of work.

Fibonacci DAG
[Figure, built up over several slides: the DAG for fib(4) unfolds dynamically; fib(4) calls fib(3) and fib(2), fib(3) calls fib(2) and fib(1), with "call" edges leading into each subtask and "get" edges collecting its result.]

Matrix Addition
[Figure: each matrix split into four half-size blocks, giving 4 parallel additions.]

Matrix Addition Task

    class Add implements Runnable {
        Matrix a, b;   // add this!
        public void run() {
            if (a.dim == 1) {
                c[0][0] = a[0][0] + b[0][0];   // base case: add directly (constant-time operation)
            } else {
                // partition a, b into half-size matrices a_ij and b_ij
                Future<?> f00 = exec.submit(new Add(a00, b00));   // submit 4 tasks
                Future<?> f01 = exec.submit(new Add(a01, b01));
                Future<?> f10 = exec.submit(new Add(a10, b10));
                Future<?> f11 = exec.submit(new Add(a11, b11));
                f00.get(); …; f11.get();   // let them finish
                …
            }
        }
    }
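
The slide elides the Matrix plumbing. A minimal sketch of what it seems to assume; the class name matches the slide, but the fields and the quadrant() helper are hypothetical:

    // Hypothetical: a square-matrix view that exposes half-size quadrants without copying.
    class Matrix {
        double[][] data;
        int rowOff, colOff, dim;   // this view covers a dim x dim block of data
        Matrix(double[][] data, int rowOff, int colOff, int dim) {
            this.data = data; this.rowOff = rowOff;
            this.colOff = colOff; this.dim = dim;
        }
        double get(int r, int c)           { return data[rowOff + r][colOff + c]; }
        void   set(int r, int c, double v) { data[rowOff + r][colOff + c] = v; }
        Matrix quadrant(int i, int j) {    // i, j in {0, 1}: the four half-size blocks
            int half = dim / 2;
            return new Matrix(data, rowOff + i * half, colOff + j * half, half);
        }
    }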

Dependencies
The matrix example is not typical: its tasks are independent, so you don't need the results of one task to complete another. Often tasks are not independent.

How Parallel is That?
Define work: the total time on one processor. Define critical-path length: the longest dependency path. You can't beat that!

Unfolded DAG
[Figure: an unfolded computation DAG of 18 nodes.]
Work: T_1 is the time needed on one processor. Just count the nodes: T_1 = 18.
Critical path: T_∞ is the time needed on as many processors as you like. Take the longest path: T_∞ = 9.

Notation Watch
– T_P = time on P processors
– T_1 = work (time on 1 processor)
– T_∞ = critical-path length (time on ∞ processors)

Simple Laws
– Work Law: T_P ≥ T_1/P (in one step, you can't do more than P units of work)
– Critical Path Law: T_P ≥ T_∞ (you can't beat infinite resources)

Performance Measures
– Speedup on P processors: the ratio T_1/T_P (how much faster with P processors)
– Linear speedup: T_1/T_P = Θ(P)
– Max speedup (average parallelism): T_1/T_∞
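
Plugging in the unfolded DAG above gives a quick sanity check of these definitions:

    T_1 = 18, \quad T_\infty = 9, \quad
    \text{max speedup} = \frac{T_1}{T_\infty} = \frac{18}{9} = 2, \quad
    T_P \ge \max\!\left(\frac{18}{P},\; 9\right).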

Sequential Composition (A then B)
– Work: T_1(A) + T_1(B)
– Critical path: T_∞(A) + T_∞(B)

Parallel Composition (A alongside B)
– Work: T_1(A) + T_1(B)
– Critical path: max{T_∞(A), T_∞(B)}
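
For example, a hypothetical schedule that runs A and B in parallel and then C afterwards combines both rules:

    T_1 = T_1(A) + T_1(B) + T_1(C), \qquad
    T_\infty = \max\{T_\infty(A),\, T_\infty(B)\} + T_\infty(C).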

Matrix Addition
[Figure, repeated: each matrix split into four half-size blocks, giving 4 parallel additions.]

Addition
Let A_P(n) be the running time for an n x n matrix on P processors. For example, A_1(n) is the work and A_∞(n) is the critical-path length.

Addition
Work: A_1(n) = 4 A_1(n/2) + Θ(1), that is, 4 spawned additions plus partitioning, synchronization, etc. This solves to A_1(n) = Θ(n²), the same as double-loop summation.

Addition
Critical path: A_∞(n) = A_∞(n/2) + Θ(1), since the spawned additions run in parallel, plus partitioning, synchronization, etc. This solves to A_∞(n) = Θ(log n).
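
Unrolling both recurrences (a standard master-theorem calculation) makes the bounds explicit:

    A_1(n) = 4\,A_1(n/2) + \Theta(1) = \Theta\bigl(4^{\log_2 n}\bigr) = \Theta(n^2),
    \qquad
    A_\infty(n) = A_\infty(n/2) + \Theta(1) = \Theta(\log_2 n) = \Theta(\log n).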

Matrix Multiplication Redux
[Figure: the product matrix split into half-size blocks.]

First Phase: 8 half-size multiplications.

Second Phase: 4 additions of the half-size products.

Multiplication
Work: M_1(n) = 8 M_1(n/2) + 4 A_1(n/2), that is, 8 spawned half-size multiplications plus the final additions. Since 4 A_1(n/2) = Θ(n²), this gives M_1(n) = 8 M_1(n/2) + Θ(n²) = Θ(n³), the same as the serial triple-nested loop.

Multiplication
Critical path: M_∞(n) = M_∞(n/2) + A_∞(n/2), that is, the half-size multiplications in parallel followed by the final addition. Substituting A_∞(n/2) = Θ(log n) gives M_∞(n) = M_∞(n/2) + Θ(log n) = Θ(log² n).
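
Unrolling the critical-path recurrence shows where the log² comes from:

    M_\infty(n) = \sum_{i=0}^{\log_2 n} \Theta\bigl(\log(n/2^i)\bigr)
                = \Theta\Bigl(\sum_{j=1}^{\log_2 n} j\Bigr)
                = \Theta(\log^2 n).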

Parallelism
M_1(n)/M_∞(n) = Θ(n³/log² n). To multiply two 1000 x 1000 matrices, that is roughly 10⁹/10² = 10⁷: much more than the number of processors on any real machine.

Work Distribution
[Cartoon sequence: how should busy threads share work with idle ones? ("zzz…")]
– Work Dealing: a busy thread offers surplus work to another ("Yes!")
– The Problem with Work Dealing: the dealer may hand work to a thread that is already overloaded ("D'oh!")
– Work Stealing: an idle thread takes work from a busy one ("No work… Yes!")

Lock-Free Work Stealing
– Each thread has a pool of ready work
– Remove work without synchronizing
– If you run out of work, steal someone else's
– Choose victim at random
(A sketch of the stealing loop follows.)
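
A minimal sketch of the stealing loop in the spirit of the chapter, assuming one deque per thread with the popBottom()/popTop() API introduced below; the field and constructor layout are assumptions of this sketch:

    import java.util.Random;

    public class WorkStealingThread extends Thread {
        BDEQueue[] queue;   // one deque per thread (assumed shared array)
        int me;             // this thread's index into queue
        Random random = new Random();
        WorkStealingThread(BDEQueue[] queue, int me) {
            this.queue = queue;
            this.me = me;
        }
        public void run() {
            Runnable task = queue[me].popBottom();   // local work first
            while (true) {
                while (task != null) {               // drain the local pool
                    task.run();
                    task = queue[me].popBottom();
                }
                // out of work: pick a random victim and try to steal from the top
                // (spins while everyone is idle; a real scheduler would back off)
                int victim = random.nextInt(queue.length);
                if (victim != me)
                    task = queue[victim].popTop();
            }
        }
    }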

Local Work Pools
Each work pool is a double-ended queue.

Work DEQueue (double-ended queue)
[Figure: a deque of work items; the owner pushes and pops at the bottom via pushBottom and popBottom.]

Obtain Work
Call popBottom to obtain work, then run the task until it blocks or terminates.

New Work
When a task unblocks a node or spawns a node, pushBottom the new task onto the local deque.

Whatcha Gonna Do When the Well Runs Empty?

Steal Work from Others
Pick a random thread's DEQueue.

Steal this Task!
Call popTop on the victim's deque.

Task DEQueue Methods
– pushBottom and popBottom: called only by the owner, so they never happen concurrently with each other; they are the most common operations, so make them fast (minimize use of CAS)
– popTop: called by thieves, and may run concurrently with the owner's calls
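
A simplified bounded deque along the lines of the chapter's BDEQueue: pushBottom and the common path of popBottom avoid CAS, and only popTop and the contended last-task case use compareAndSet. As an implementation choice of this sketch, the top index and its ABA-avoiding stamp are packed into a single AtomicLong; overflow handling and other details are omitted:

    import java.util.concurrent.atomic.AtomicLong;

    public class BDEQueue {
        Runnable[] tasks;                 // bounded array of pending tasks
        volatile int bottom = 0;          // next free slot; touched only by the owner
        // top index (low 32 bits) packed with an ABA-avoiding stamp (high 32 bits)
        AtomicLong top = new AtomicLong(0L);

        public BDEQueue(int capacity) { tasks = new Runnable[capacity]; }

        static long pack(int stamp, int index) {
            return ((long) stamp << 32) | (index & 0xFFFFFFFFL);
        }

        public void pushBottom(Runnable r) {      // owner only: no CAS needed
            tasks[bottom] = r;
            bottom++;
        }

        public Runnable popTop() {                // thieves: always CAS
            long oldTop = top.get();
            int index = (int) oldTop;
            int stamp = (int) (oldTop >>> 32);
            if (bottom <= index)                  // deque looks empty
                return null;
            Runnable r = tasks[index];
            if (top.compareAndSet(oldTop, pack(stamp + 1, index + 1)))
                return r;                         // won the race for this task
            return null;                          // lost to another thief or the owner
        }

        public Runnable popBottom() {             // owner: CAS only on the last task
            if (bottom == 0)
                return null;
            bottom--;
            Runnable r = tasks[bottom];
            long oldTop = top.get();
            int index = (int) oldTop;
            int stamp = (int) (oldTop >>> 32);
            if (bottom > index)                   // no conflict with thieves possible
                return r;
            long reset = pack(stamp + 1, 0);      // either way, top gets reset
            if (bottom == index) {                // exactly one task left: race the thieves
                bottom = 0;
                if (top.compareAndSet(oldTop, reset))
                    return r;
            }
            top.set(reset);                       // a thief got it; deque is now empty
            bottom = 0;
            return null;
        }
    }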

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
You are free:
– to Share: to copy, distribute and transmit the work
– to Remix: to adapt the work
Under the following conditions:
– Attribution. You must attribute the work to "The Art of Multiprocessor Programming" (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to the license. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.