
Futures, Scheduling, and Work Distribution
Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit


1 Futures, Scheduling, and Work Distribution. Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

2 How to Write Parallel Apps?
–Split a program into parallel parts in an effective way
–Manage the resulting threads

3-4 Matrix Multiplication (figure: each entry c[row][col] of the product is the dot product of row `row` of a with column `col` of b)

5-8 Matrix Multiplication
class Worker extends Thread {
  int row, col;                       // which matrix entry to compute
  Worker(int row, int col) {
    this.row = row; this.col = col;
  }
  public void run() {                 // the actual computation
    double dotProduct = 0.0;
    for (int i = 0; i < n; i++)
      dotProduct += a[row][i] * b[i][col];
    c[row][col] = dotProduct;
  }
}
Each Worker is a thread; its fields record which matrix entry to compute, and its run() method performs the actual computation.

9-13 Matrix Multiplication
void multiply() {
  Worker[][] worker = new Worker[n][n];
  for (int row …)
    for (int col …)
      worker[row][col] = new Worker(row, col);  // create n x n threads
  for (int row …)
    for (int col …)
      worker[row][col].start();                 // start them
  for (int row …)
    for (int col …)
      worker[row][col].join();                  // wait for them to finish
}
What's wrong with this picture?

14 Thread Overhead
Threads require resources:
–Memory for stacks
–Setup and teardown costs
–Scheduler overhead
For short-lived threads, the ratio of useful work to overhead is bad.

15 Thread Pools
More sensible to keep a pool of long-lived threads. Each thread is assigned short-lived tasks:
–Run the task
–Rejoin the pool
–Wait for the next assignment

16 Thread Pool = Abstraction
Insulates the programmer from the platform:
–Big machine, big pool
–Small machine, small pool
Portable code works across platforms: worry about the algorithm, not the platform.
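The sizing idea above can be sketched in a few lines. This is a minimal illustration, not code from the slides: the class name `PoolSizing` and the task count are made up, but `Executors.newFixedThreadPool` and `Runtime.availableProcessors` are the standard `java.util.concurrent` way to match the pool to the machine.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSizing {
    // Run nTasks trivial tasks on a pool sized to the machine:
    // big machine, big pool; small machine, small pool.
    static int runTasks(int nTasks) throws InterruptedException {
        int nThreads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nTasks; i++) {
            // Many short-lived tasks; the same long-lived threads run them all.
            pool.execute(done::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runTasks(100)); // prints 100
    }
}
```

The same code runs unchanged on a 4-core laptop or a 64-core server; only the pool size differs.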

17 ExecutorService Interface
In java.util.concurrent:
–Task = Runnable object, if no result value is expected; the executor calls its run() method.
–Task = Callable<T> object, if a result value of type T is expected; the executor calls its T call() method.

18-20 Future
Callable<T> task = …;
…
Future<T> future = executor.submit(task);
…
T value = future.get();
Submitting a Callable<T> task returns a Future<T> object; the Future's get() method blocks until the value is available.

21-23 Future
Runnable task = …;
…
Future<?> future = executor.submit(task);
…
future.get();
Submitting a Runnable task also returns a Future object; its get() method blocks until the computation is complete, but carries no result value.
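The two submission styles above can be combined into one self-contained sketch. The class name `FutureDemo` and the toy computation are made up for illustration; the `submit`/`get` calls are the standard `ExecutorService` API.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureDemo {
    static int compute() throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();

        // Callable<T>: submitting it returns a Future<T> that carries the result.
        Callable<Integer> task = () -> 6 * 7;
        Future<Integer> future = executor.submit(task);
        int value = future.get();       // blocks until the value is available

        // Runnable: the Future carries no value; get() just waits for completion.
        Future<?> done = executor.submit(() -> { /* side effects only */ });
        done.get();                     // blocks until the computation is complete

        executor.shutdown();
        return value;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compute());  // prints 42
    }
}
```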

24 Note
Executor service submissions, like New England traffic signs, are purely advisory in nature. The executor, like the Boston driver, is free to ignore any such advice, and could execute tasks sequentially …

25 Fibonacci
F(n) = 1 if n = 0 or 1
F(n) = F(n-1) + F(n-2) otherwise
Note:
–Potential parallelism
–Dependencies

26 Disclaimer
This Fibonacci implementation is egregiously inefficient, so don't try this at home or at work! But it illustrates our point: how to deal with dependencies.

27-29 Multithreaded Fibonacci
class FibTask implements Callable<Integer> {
  static ExecutorService exec = Executors.newCachedThreadPool();
  int arg;
  public FibTask(int n) { arg = n; }
  public Integer call() throws Exception {
    if (arg > 1) {
      // parallel calls
      Future<Integer> left = exec.submit(new FibTask(arg-1));
      Future<Integer> right = exec.submit(new FibTask(arg-2));
      // pick up & combine results
      return left.get() + right.get();
    } else {
      return 1;
    }
  }
}
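To make the slide's FibTask actually runnable, it needs generics, an exception declaration on call() (because get() throws checked exceptions), and a small driver. The `fib` helper and main method below are additions for illustration; note the cached thread pool is essential here, since a bounded pool could deadlock when every worker blocks in get().

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FibTask implements Callable<Integer> {
    static ExecutorService exec = Executors.newCachedThreadPool();
    int arg;
    public FibTask(int n) { arg = n; }

    public Integer call() throws Exception {
        if (arg > 1) {
            // Submit the two subproblems in parallel …
            Future<Integer> left  = exec.submit(new FibTask(arg - 1));
            Future<Integer> right = exec.submit(new FibTask(arg - 2));
            // … then pick up and combine their results.
            return left.get() + right.get();
        } else {
            return 1;   // base case: F(0) = F(1) = 1
        }
    }

    // Helper (not in the slides): run a FibTask and wait for its value.
    static int fib(int n) throws Exception {
        return exec.submit(new FibTask(n)).get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fib(4));  // prints 5 (sequence: 1, 1, 2, 3, 5, …)
        exec.shutdown();
    }
}
```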

30 Dynamic Behavior
A multithreaded program is a directed acyclic graph (DAG) that unfolds dynamically. Each node is a single unit of work.

31-35 Fibonacci DAG (figure: the DAG for fib(4) unfolds step by step: fib(4) spawns fib(3) and fib(2); fib(3) spawns fib(2) and fib(1); call edges fan out and get edges join the results back up)

36-37 Matrix Addition (figure: the sum is partitioned into 4 parallel additions of half-size blocks)

38-42 Matrix Addition Task
class Add implements Runnable {
  Matrix a, b; // add this!
  public void run() {
    if (a.dim == 1) {
      c[0][0] = a[0][0] + b[0][0];  // base case: add directly
    } else {
      (partition a, b into half-size matrices aij and bij)  // constant-time operation
      // submit 4 tasks
      Future<?> f00 = exec.submit(new Add(a00, b00));
      Future<?> f01 = exec.submit(new Add(a01, b01));
      Future<?> f10 = exec.submit(new Add(a10, b10));
      Future<?> f11 = exec.submit(new Add(a11, b11));
      // let them finish
      f00.get(); …; f11.get();
    }
  }
}
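The pseudocode above can be made concrete. This sketch departs from the slides in two labeled ways: it represents matrices as plain `double[][]` blocks (the slides' `Matrix` class with constant-time partitioning is not shown), and it uses Java's `ForkJoinPool` with `RecursiveAction` rather than a raw `ExecutorService`, which avoids the deadlock risk of recursive submit-and-wait on a bounded pool. It assumes n is a power of 2, as the halving recursion requires.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class MatrixAdd {
    // Adds the dim x dim blocks of a and b at origin (ro, co) into c.
    static class Add extends RecursiveAction {
        final double[][] a, b, c;
        final int ro, co, dim;   // block origin and size (dim a power of 2)
        Add(double[][] a, double[][] b, double[][] c, int ro, int co, int dim) {
            this.a = a; this.b = b; this.c = c;
            this.ro = ro; this.co = co; this.dim = dim;
        }
        protected void compute() {
            if (dim == 1) {                        // base case: add directly
                c[ro][co] = a[ro][co] + b[ro][co];
            } else {                               // partition into 4 half-size blocks
                int h = dim / 2;
                invokeAll(new Add(a, b, c, ro,     co,     h),   // submit 4 tasks …
                          new Add(a, b, c, ro,     co + h, h),
                          new Add(a, b, c, ro + h, co,     h),
                          new Add(a, b, c, ro + h, co + h, h));  // … and let them finish
            }
        }
    }

    static double[][] add(double[][] a, double[][] b) {
        int n = a.length;
        double[][] c = new double[n][n];
        ForkJoinPool.commonPool().invoke(new Add(a, b, c, 0, 0, n));
        return c;
    }
}
```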

43 Dependencies
The matrix example is not typical: its tasks are independent; you don't need the results of one task to complete another. Often, tasks are not independent.

44 How Parallel is That?
Define work: total time on one processor.
Define critical-path length: the longest dependency path. Can't beat that!

45-46 Unfolded DAG: Work? (figure: the example DAG, unfolded)
T1: time needed on one processor. Just count the nodes: T1 = 18.

47-48 Critical Path? (figure: the same DAG with its longest path highlighted)
T∞: time needed on as many processors as you like. It is the longest path through the DAG: T∞ = 9.

49 Notation Watch
TP = time on P processors
T1 = work (time on 1 processor)
T∞ = critical-path length (time on ∞ processors)

50 Simple Laws
Work Law: TP ≥ T1/P. In one step, you can't do more than P units of work.
Critical Path Law: TP ≥ T∞. You can't beat infinite resources.

51 Performance Measures
Speedup on P processors: the ratio T1/TP, how much faster we run with P processors.
Linear speedup: T1/TP = Θ(P).
Max speedup (average parallelism): T1/T∞.
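The two laws and the speedup measure can be checked on the example DAG from the preceding slides (T1 = 18, T∞ = 9). The class `Bounds` and its methods are illustrative helpers, not part of the slides.

```java
public class Bounds {
    // Lower bound on P-processor time, combining the Work Law (TP >= T1/P)
    // and the Critical Path Law (TP >= Tinf).
    static double lowerBound(double t1, double tinf, int p) {
        return Math.max(t1 / p, tinf);
    }

    // Maximum possible speedup (average parallelism): T1 / Tinf.
    static double maxSpeedup(double t1, double tinf) {
        return t1 / tinf;
    }

    public static void main(String[] args) {
        // The unfolded DAG from the slides: 18 nodes of work, critical path 9.
        System.out.println(lowerBound(18, 9, 4)); // prints 9.0: path dominates 18/4
        System.out.println(maxSpeedup(18, 9));    // prints 2.0: at most 2x faster
    }
}
```

Even with unlimited processors, this DAG can never run more than 2x faster than the sequential version; the critical path is the hard limit.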

52 Sequential Composition (A followed by B)
Work: T1(A) + T1(B)
Critical path: T∞(A) + T∞(B)

53 Parallel Composition (A alongside B)
Work: T1(A) + T1(B)
Critical path: max{T∞(A), T∞(B)}

54-55 Matrix Addition, revisited (figure: 4 parallel additions of half-size blocks)

56 Addition
Let AP(n) be the running time for an n x n matrix on P processors. For example, A1(n) is the work and A∞(n) is the critical-path length.

57-58 Addition
Work: A1(n) = 4 A1(n/2) + Θ(1), where 4 A1(n/2) is the 4 spawned additions and Θ(1) covers partitioning, synchronization, etc. This solves to A1(n) = Θ(n²), the same as double-loop summation.

59-60 Addition
Critical-path length: A∞(n) = A∞(n/2) + Θ(1), since the spawned additions run in parallel and partitioning, synchronization, etc. cost Θ(1). This solves to A∞(n) = Θ(log n).
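The step from the recurrence to Θ(log n) follows by unrolling: each halving of n adds one constant term, and n can be halved log₂ n times. Writing the Θ(1) term as a constant c:

```latex
A_\infty(n) = A_\infty(n/2) + c
            = A_\infty(n/4) + 2c
            = \cdots
            = A_\infty(1) + c \log_2 n
            = \Theta(\log n)
```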

61-62 Matrix Multiplication Redux (figure: the product decomposed into half-size block multiplications)

63 First Phase: 8 multiplications
64 Second Phase: 4 additions

65-66 Multiplication
Work: M1(n) = 8 M1(n/2) + 4 A1(n/2), where 8 M1(n/2) is the 8 parallel multiplications and 4 A1(n/2) is the final addition. Since 4 A1(n/2) = Θ(n²), this gives M1(n) = 8 M1(n/2) + Θ(n²) = Θ(n³), the same as the serial triple-nested loop.

67-68 Multiplication
Critical-path length: M∞(n) = M∞(n/2) + A∞(n/2), the half-size parallel multiplications plus the final addition. So M∞(n) = M∞(n/2) + Θ(log n) = Θ(log² n).
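Unrolling this recurrence explains the Θ(log² n) bound: each of the log₂ n levels contributes a logarithmic addition cost, and the sum of 1 + 2 + … + log₂ n is quadratic in log n:

```latex
M_\infty(n) = M_\infty(n/2) + \Theta(\log n)
            = \sum_{i=0}^{\log_2 n - 1} \Theta\!\left(\log\frac{n}{2^i}\right)
            = \Theta\!\left(\sum_{j=1}^{\log_2 n} j\right)
            = \Theta(\log^2 n)
```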

69 Parallelism
M1(n)/M∞(n) = Θ(n³/log² n). To multiply two 1000 x 1000 matrices: 1000³/10² = 10⁷. Much more parallelism than the number of processors on any real machine.

70 Work Distribution (cartoon) zzz…
71 Work Dealing (cartoon) Yes!
72 The Problem with Work Dealing (cartoon) D'oh!
73 Work Stealing (cartoon) No work… Yes!

74 Lock-Free Work Stealing
Each thread has a pool of ready work. It removes work without synchronizing; if it runs out of work, it steals someone else's, choosing the victim at random.

75 Local Work Pools: each work pool is a double-ended queue.
76 Work DEQueue (figure: the owner operates on the bottom of the queue with pushBottom and popBottom)

77 Obtain Work: popBottom obtains work; run the task until it blocks or terminates.
78 New Work: when a node is unblocked or spawned, pushBottom it.

79 Whatcha Gonna Do When the Well Runs Dry? (the deque is empty) @&%$!!

80 Steal Work from Others: pick a random thread's DEQueue.

81 Steal This Task! The thief removes work with popTop.

82-83 Task DEQueue
Methods: pushBottom, popBottom, popTop. The owner's methods, pushBottom and popBottom, never happen concurrently with each other; they are also the most common operations, so make them fast (minimize use of CAS).
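The three-method interface above can be sketched as a class. This is deliberately NOT the lock-free algorithm the slides describe: it uses a coarse lock on a standard `ArrayDeque` so the owner/thief roles are clear; the book's version makes the common pushBottom/popBottom path synchronization-free and reserves CAS for the rare steal. The class name `WorkDeque` is made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified work DEQueue: the owner pushes and pops at the bottom,
// thieves pop at the top.
public class WorkDeque<T> {
    private final Deque<T> tasks = new ArrayDeque<>();

    // Owner: add newly spawned or unblocked work.
    public synchronized void pushBottom(T task) {
        tasks.addLast(task);
    }

    // Owner: take the most recently pushed task (good cache locality).
    // Returns null when the deque is empty.
    public synchronized T popBottom() {
        return tasks.pollLast();
    }

    // Thief: steal the oldest task, likely the largest remaining subproblem.
    // Returns null when there is nothing to steal.
    public synchronized T popTop() {
        return tasks.pollFirst();
    }
}
```

Stealing from the top and working from the bottom means the owner and thieves contend only when the deque is nearly empty, which is what makes the low-synchronization lock-free version possible.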

84 This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License. You are free:
–to Share: to copy, distribute and transmit the work
–to Remix: to adapt the work
Under the following conditions:
–Attribution. You must attribute the work to "The Art of Multiprocessor Programming" (but not in any way that suggests that the authors endorse you or your use of the work).
–Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to http://creativecommons.org/licenses/by-sa/3.0/. Any of the above conditions can be waived if you get permission from the copyright holder. Nothing in this license impairs or restricts the author's moral rights.

