Futures, Scheduling, and Work Distribution
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit
How to Write Parallel Apps?
– Split a program into parallel parts, in an effective way
– Thread management
Matrix Multiplication
[figure: c[i][j] is the dot product of row i of a and column j of b]
Matrix Multiplication

    class Worker extends Thread {              // a thread
        int row, col;                          // which matrix entry to compute
        Worker(int row, int col) {
            this.row = row;
            this.col = col;
        }
        public void run() {                    // actual computation
            double dotProduct = 0.0;
            for (int i = 0; i < n; i++)
                dotProduct += a[row][i] * b[i][col];
            c[row][col] = dotProduct;
        }
    }
Matrix Multiplication

    void multiply() {
        Worker[][] worker = new Worker[n][n];
        for (int row …)                        // create n x n threads
            for (int col …)
                worker[row][col] = new Worker(row, col);
        for (int row …)                        // start them
            for (int col …)
                worker[row][col].start();
        for (int row …)                        // wait for them to finish
            for (int col …)
                worker[row][col].join();
    }

What’s wrong with this picture?
Thread Overhead
Threads require resources:
– Memory for stacks
– Setup, teardown
– Scheduler overhead
For short-lived threads, the ratio of work to overhead is bad.
Thread Pools
More sensible to keep a pool of long-lived threads.
Threads are assigned short-lived tasks:
– Run the task
– Rejoin the pool
– Wait for the next assignment
Thread Pool = Abstraction
Insulates the programmer from the platform:
– Big machine, big pool
– Small machine, small pool
Portable code:
– Works across platforms
– Worry about the algorithm, not the platform
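A minimal sketch of that idea, assuming nothing beyond java.util.concurrent (the class name PoolSketch and the task count of 10 are illustrative): the pool is sized to the machine, and the same long-lived threads run all the short tasks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSketch {
    static int runDemo() throws InterruptedException {
        // Big machine, big pool; small machine, small pool:
        // size the pool to the hardware, not to the algorithm.
        int nproc = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nproc);

        // Long-lived pool threads run short-lived tasks and rejoin the pool.
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < 10; i++)
            pool.execute(done::incrementAndGet);

        pool.shutdown();                              // accept no new tasks
        pool.awaitTermination(10, TimeUnit.SECONDS);  // wait for submitted ones
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo() + " tasks completed"); // 10 tasks completed
    }
}
```

The same code runs unchanged whether availableProcessors() reports 2 or 64; only the pool size differs.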
ExecutorService Interface
In java.util.concurrent:
– Task = Runnable object, if no result value is expected; calls the run() method
– Task = Callable<T> object, if a result value of type T is expected; calls the T call() method
Future

    Callable<T> task = …;
    …
    // submitting a Callable<T> task returns a Future<T> object
    Future<T> future = executor.submit(task);
    …
    // the Future’s get() method blocks until the value is available
    T value = future.get();

For a Runnable task:

    Runnable task = …;
    …
    // submitting a Runnable task returns a Future<?> object
    Future<?> future = executor.submit(task);
    …
    // the Future’s get() method blocks until the computation is complete
    future.get();
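The fragments above, assembled into a self-contained runnable sketch (the class name FutureDemo and the sample tasks are illustrative, not from the slides):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureDemo {
    static int demo() throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();

        // Callable<Integer>: submit() returns a Future<Integer>.
        Callable<Integer> task = () -> 6 * 7;
        Future<Integer> future = executor.submit(task);
        int value = future.get();        // blocks until the value is available

        // Runnable: submit() returns a Future<?>; get() yields null on completion.
        Runnable sideEffect = () -> { /* no result value */ };
        Future<?> done = executor.submit(sideEffect);
        done.get();                      // blocks until the computation is complete

        executor.shutdown();
        return value;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // 42
    }
}
```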
Note
ExecutorService submissions, like New England traffic signs, are purely advisory in nature.
The executor, like the Boston driver, is free to ignore any such advice, and could execute tasks sequentially …
Fibonacci

    F(n) = 1                   if n = 0 or 1
    F(n) = F(n-1) + F(n-2)     otherwise

Note:
– Potential parallelism
– Dependencies
Disclaimer
This Fibonacci implementation is egregiously inefficient, so don't try this at home or on the job! But it illustrates our point: how to deal with dependencies.
Multithreaded Fibonacci

    class FibTask implements Callable<Integer> {
        static ExecutorService exec = Executors.newCachedThreadPool();
        int arg;
        public FibTask(int n) { arg = n; }
        public Integer call() throws Exception {
            if (arg > 1) {
                // parallel calls
                Future<Integer> left  = exec.submit(new FibTask(arg - 1));
                Future<Integer> right = exec.submit(new FibTask(arg - 2));
                // pick up & combine results
                return left.get() + right.get();
            } else {
                return 1;
            }
        }
    }
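Wrapped in a small driver, the task runs as-is (the class name FibDemo and the choice of argument 10 are illustrative). Note that newCachedThreadPool is what makes the nested, blocking get() calls safe here: the pool adds threads on demand, so a task waiting for its children cannot starve them of a thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FibDemo {
    static ExecutorService exec = Executors.newCachedThreadPool();

    static class FibTask implements Callable<Integer> {
        final int arg;
        FibTask(int n) { arg = n; }
        public Integer call() throws Exception {
            if (arg > 1) {
                // spawn both subproblems, then wait for both children
                Future<Integer> left  = exec.submit(new FibTask(arg - 1));
                Future<Integer> right = exec.submit(new FibTask(arg - 2));
                return left.get() + right.get();
            } else {
                return 1;   // F(0) = F(1) = 1 in this formulation
            }
        }
    }

    static int fib(int n) throws Exception {
        return exec.submit(new FibTask(n)).get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fib(10)); // 89
        exec.shutdown();
    }
}
```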
Dynamic Behavior
A multithreaded program is a directed acyclic graph (DAG) that unfolds dynamically.
Each node is a single unit of work.
Fibonacci DAG
[figure: the DAG for fib(4), unfolding node by node. fib(4) spawns fib(3) and fib(2); fib(3) spawns fib(2) and fib(1); edges are labeled "call" and "get"]
Matrix Addition
[figure: the sum of two matrices computed as 4 parallel half-size additions]
Matrix Addition Task

    class Add implements Runnable {
        Matrix a, b;    // add this!
        public void run() {
            if (a.dim == 1) {
                c[0][0] = a[0][0] + b[0][0];    // base case: add directly
            } else {
                // partition a, b into half-size matrices a_ij and b_ij
                // (a constant-time operation)
                // submit 4 tasks
                Future<?> f00 = exec.submit(new Add(a00, b00));
                Future<?> f01 = exec.submit(new Add(a01, b01));
                Future<?> f10 = exec.submit(new Add(a10, b10));
                Future<?> f11 = exec.submit(new Add(a11, b11));
                // let them finish
                f00.get(); …; f11.get();
            }
        }
    }
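A runnable variant of the sketch above, under stated assumptions: instead of a Matrix class, each task carries the top-left corner and size of its quadrant of plain double[][] arrays, and n is a power of 2 (the class name AddDemo, the quadrant-offset scheme, and the sample inputs are all illustrative). As in the Fibonacci example, newCachedThreadPool keeps the nested blocking get() calls from deadlocking.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AddDemo {
    static ExecutorService exec = Executors.newCachedThreadPool();
    static double[][] a, b, c;

    static class Add implements Runnable {
        final int row, col, dim;   // top-left corner and size of this quadrant
        Add(int row, int col, int dim) { this.row = row; this.col = col; this.dim = dim; }
        public void run() {
            if (dim == 1) {
                c[row][col] = a[row][col] + b[row][col];   // base case: add directly
            } else {
                try {
                    int h = dim / 2;                        // partition into 4 quadrants
                    Future<?> f00 = exec.submit(new Add(row,     col,     h));
                    Future<?> f01 = exec.submit(new Add(row,     col + h, h));
                    Future<?> f10 = exec.submit(new Add(row + h, col,     h));
                    Future<?> f11 = exec.submit(new Add(row + h, col + h, h));
                    f00.get(); f01.get(); f10.get(); f11.get();  // let them finish
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
        }
    }

    static double demo() throws Exception {
        int n = 4;   // must be a power of 2 in this sketch
        a = new double[n][n]; b = new double[n][n]; c = new double[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) { a[i][j] = i; b[i][j] = j; }
        exec.submit(new Add(0, 0, n)).get();
        return c[2][3];   // a[2][3] + b[2][3] = 2 + 3
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // 5.0
        exec.shutdown();
    }
}
```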
Dependencies
The matrix example is not typical: its tasks are independent, so you don't need the results of one task to complete another. Often tasks are not independent.
How Parallel Is That?
Define work: the total time on one processor.
Define critical-path length: the longest dependency path. Can't beat that!
Unfolded DAG
[figure: the computation DAG, fully unfolded]

Work?
T_1: the time needed on one processor. Just count the nodes: T_1 = 18.
[figure: the 18 DAG nodes, numbered in execution order]
Critical Path?
T_∞: the time needed on as many processors as you like. The longest path: T_∞ = 9.
[figure: the 9 nodes on the longest path, numbered]
Notation Watch
– T_P = time on P processors
– T_1 = work (time on 1 processor)
– T_∞ = critical-path length (time on ∞ processors)
Simple Laws
– Work Law: T_P ≥ T_1 / P (in one step, you can't do more than P units of work)
– Critical Path Law: T_P ≥ T_∞ (you can't beat infinite resources)
Performance Measures
– Speedup on P processors: the ratio T_1 / T_P (how much faster with P processors)
– Linear speedup: T_1 / T_P = Θ(P)
– Max speedup (average parallelism): T_1 / T_∞
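Plugging in the example DAG counted earlier (T_1 = 18 nodes, T_∞ = 9), the two laws and the speedup measure give concrete numbers; this sketch just evaluates them (the class name and the choice of P = 4 are illustrative):

```java
public class SpeedupBounds {
    // Work Law: T_P >= T_1 / P.  Critical Path Law: T_P >= T_inf.
    // The larger of the two lower bounds limits the best possible time.
    static double bestPossibleTime(double t1, double tInf, int p) {
        return Math.max(t1 / p, tInf);
    }

    // Max speedup (average parallelism): T_1 / T_inf.
    static double maxSpeedup(double t1, double tInf) {
        return t1 / tInf;
    }

    public static void main(String[] args) {
        double t1 = 18.0, tInf = 9.0;   // the example DAG: 18 nodes, longest path 9

        // With P = 4: work law gives 18/4 = 4.5, critical path law gives 9.0,
        // so the critical path dominates.
        System.out.println(bestPossibleTime(t1, tInf, 4)); // 9.0
        System.out.println(maxSpeedup(t1, tInf));          // 2.0
    }
}
```

For this DAG no number of processors can do better than a 2x speedup, no matter how large P grows.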
Sequential Composition (run A, then B)
– Work: T_1(A) + T_1(B)
– Critical path: T_∞(A) + T_∞(B)
Parallel Composition (run A and B in parallel)
– Work: T_1(A) + T_1(B)
– Critical path: max{T_∞(A), T_∞(B)}
Matrix Addition
[figure, repeated: the sum computed as 4 parallel half-size additions]
Addition
Let A_P(n) be the running time for an n x n matrix on P processors. For example, A_1(n) is the work and A_∞(n) is the critical-path length.
Addition
Work is A_1(n) = 4 A_1(n/2) + Θ(1)
(4 spawned additions, plus Θ(1) for partitioning, synchronization, etc.)
           = Θ(n²)
Same as the double-loop summation.
Addition
Critical-path length is A_∞(n) = A_∞(n/2) + Θ(1)
(the spawned additions run in parallel, plus Θ(1) for partitioning, synchronization, etc.)
           = Θ(log n)
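Unrolling the recurrence makes the bound explicit (a sketch, with the Θ(1) constants absorbed; there are log₂ n halving steps from n down to 1):

```latex
A_\infty(n) = A_\infty(n/2) + \Theta(1)
            = A_\infty(n/4) + 2\,\Theta(1)
            = \cdots
            = A_\infty(1) + \Theta(1)\cdot\log_2 n
            = \Theta(\log n)
```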
Matrix Multiplication Redux
First Phase … 8 multiplications
Second Phase … 4 additions
Multiplication
Work is M_1(n) = 8 M_1(n/2) + 4 A_1(n/2)
(8 parallel multiplications, plus the final additions)
           = 8 M_1(n/2) + Θ(n²)
           = Θ(n³)
Same as the serial triple-nested loop.
Multiplication
Critical-path length is M_∞(n) = M_∞(n/2) + A_∞(n/2)
(half-size parallel multiplications, then the final addition)
           = M_∞(n/2) + Θ(log n)
           = Θ(log² n)
Parallelism
M_1(n) / M_∞(n) = Θ(n³ / log² n)
To multiply two 1000 x 1000 matrices: 1000³ / 10² ≈ 10⁷.
Much more than the number of processors on any real machine.
Work Distribution
– Work Dealing: a thread with extra work deals some of it out.
– The Problem with Work Dealing: the deal can land on a thread that is already busy.
– Work Stealing: instead, a thread that runs out of work steals from one that has some.
Lock-Free Work Stealing
– Each thread has a pool of ready work
– Remove work without synchronizing
– If you run out of work, steal someone else's
– Choose the victim at random
Local Work Pools
Each work pool is a double-ended queue.
Work DEQueue (double-ended queue)
[figure: tasks sit in the queue; the owner pushes and pops at the bottom via pushBottom and popBottom]
Obtain Work
Call popBottom to obtain work, then run the task until it blocks or terminates.
New Work
When the task unblocks a node or spawns a node, pushBottom the new work.
Whatcha Gonna Do When the Well Runs Dry?
[the local DEQueue is empty]
Steal Work from Others
Pick a random thread's DEQueue and steal a task from the top by calling popTop.
Task DEQueue
Methods: pushBottom, popBottom, popTop.
– pushBottom and popBottom are called only by the owner, so they never happen concurrently with each other.
– They are also the most common calls, so make them fast (minimize the use of CAS).
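A bounded sketch in the spirit of the chapter's DEQueue (the class name, the fixed capacity, and the use of AtomicStampedReference as an ABA guard are illustrative simplifications; a production version would resize and recycle slots): pushBottom and the fast path of popBottom use no CAS at all, while thieves CAS on top.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class BoundedDeque {
    private final Runnable[] tasks;
    private volatile int bottom = 0;                     // owner's end
    private final AtomicStampedReference<Integer> top =  // thieves' end, stamped against ABA
            new AtomicStampedReference<>(0, 0);

    public BoundedDeque(int capacity) { tasks = new Runnable[capacity]; }

    // Owner only: the common case, no synchronization needed.
    public void pushBottom(Runnable r) {
        tasks[bottom] = r;
        bottom++;
    }

    // Thieves: steal the oldest task with a single CAS on top.
    public Runnable popTop() {
        int[] stamp = new int[1];
        Integer oldTop = top.get(stamp);
        if (bottom <= oldTop) return null;               // nothing to steal
        Runnable r = tasks[oldTop];
        if (top.compareAndSet(oldTop, oldTop + 1, stamp[0], stamp[0] + 1))
            return r;
        return null;                                     // lost the race; try another victim
    }

    // Owner: fast path returns without CAS; CAS only when racing
    // a thief for the last task.
    public Runnable popBottom() {
        if (bottom == 0) return null;                    // empty
        bottom--;
        Runnable r = tasks[bottom];
        int[] stamp = new int[1];
        Integer oldTop = top.get(stamp);
        if (bottom > oldTop) return r;                   // no thief can reach this task
        if (bottom == oldTop) {                          // last task: maybe racing a thief
            bottom = 0;
            if (top.compareAndSet(oldTop, 0, stamp[0], stamp[0] + 1))
                return r;
        }
        top.set(0, stamp[0] + 1);                        // deque became empty: reset
        bottom = 0;
        return null;
    }

    public static void main(String[] args) {
        BoundedDeque d = new BoundedDeque(8);
        Runnable a = () -> {}, b = () -> {}, c = () -> {};
        d.pushBottom(a); d.pushBottom(b); d.pushBottom(c);
        System.out.println(d.popTop() == a);     // true: thieves take the oldest task
        System.out.println(d.popBottom() == c);  // true: the owner works LIFO
    }
}
```

Because only the owner touches bottom, the stamp on top is what keeps a slow thief from completing a stale CAS after the owner has emptied and refilled the deque.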
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
You are free:
– to Share: to copy, distribute and transmit the work
– to Remix: to adapt the work
Under the following conditions:
– Attribution. You must attribute the work to “The Art of Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license. For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/.
Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.