1
Design Issues
2
How to parallelize
- Task decomposition
- Data decomposition
- Dataflow decomposition
3
Task Decomposition
4
Task decomposition
- Identify tasks: decompose the serial code into parts that can be parallelized. These parts need to be completely independent.
- Dependency: interaction between tasks.
- Sequential consistency property: the parallel code gives the same result, for the same input, as the serial code does.
- Test for a parallelizable loop: if running the loop in reverse order gives the same result as the original loop, it is possibly parallelizable (see the sketch below).
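A small illustration of the reverse-order test; the arrays and loops below are illustrative, not from the slides:

  #include <stdio.h>
  #define N 8

  int main(void) {
      int a[N], b[N];
      for (int i = 0; i < N; i++) b[i] = i;

      /* Passes the reverse-order test: each iteration touches only a[i] and b[i],
         so running i from N-1 down to 0 gives the same result. Likely parallelizable. */
      for (int i = 0; i < N; i++)
          a[i] = 2 * b[i];

      /* Fails the reverse-order test: iteration i reads a[i-1], which was written
         by the previous iteration, so reversing the order changes the result. */
      for (int i = 1; i < N; i++)
          a[i] = a[i-1] + b[i];

      printf("%d\n", a[N-1]);
      return 0;
  }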
5
Design Considerations
- What are the tasks, and how are they defined?
- What are the dependencies between tasks, and how can they be satisfied?
- How should tasks be assigned to threads/processors?
6
Task Definition
- Tasks are mostly related to activities; this style is common in GUI applications.
- Example: a multimedia web application (sketched below)
  - Play background music.
  - Display animation.
  - Read input from the user.
- Which part of the code should be parallelized? The hotspot: the part that gets executed most often.
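A minimal sketch of activity-based tasks, assuming POSIX threads (the slides do not prescribe a library; the activity functions are placeholders):

  #include <pthread.h>
  #include <stdio.h>

  /* Hypothetical activity functions; names are illustrative, not from the slides. */
  void *play_music(void *arg)     { puts("playing background music"); return NULL; }
  void *show_animation(void *arg) { puts("displaying animation");     return NULL; }
  void *read_input(void *arg)     { puts("reading user input");       return NULL; }

  int main(void) {
      pthread_t t[3];
      /* One thread per activity: the activities are independent tasks. */
      pthread_create(&t[0], NULL, play_music, NULL);
      pthread_create(&t[1], NULL, show_animation, NULL);
      pthread_create(&t[2], NULL, read_input, NULL);
      for (int i = 0; i < 3; i++)
          pthread_join(t[i], NULL);
      return 0;
  }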
7
Criteria for decomposition
- More tasks than threads (or cores). Why?
- Granularity (fine- or coarse-grained decomposition)
  - The amount of computation in a task, or the time between synchronizations.
  - Tasks should be big enough compared to the overhead of handling tasks and threads.
  - Overhead includes thread management, synchronization, etc.
8
(Diagram: fine-grained vs. coarse-grained decomposition.)
9
Dependency between tasks
- Order dependency: one task must execute before another. Can be enforced by:
  - putting the dependent tasks in the same thread, or
  - adding synchronization.
- Data dependency: variables shared between tasks. Can be handled with:
  - shared and private variables;
  - locks and critical regions.

(Diagram: two task graphs over tasks A, B, C, D with different dependency orderings.)

Single shared accumulator; both loops update sum, so they depend on each other:

  sum = 0;
  for (i = 0; i < m; i++)
    sum = sum + a[i];
  for (i = 0; i < n; i++)
    sum = sum + b[i];

Rewritten with independent partial sums; the two loops no longer share a variable and can run concurrently:

  sum = 0;
  suma = 0;
  for (i = 0; i < m; i++)
    suma = suma + a[i];
  sumb = 0;
  for (j = 0; j < n; j++)
    sumb = sumb + b[j];
  sum = suma + sumb;
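A sketch of running the two independent partial-sum loops concurrently, assuming OpenMP (the slides do not name a threading API) and double-precision arrays:

  /* The two loops below share no variables, so they can run as two parallel
     sections; suma and sumb are combined only after both sections finish. */
  double parallel_sum(const double *a, int m, const double *b, int n)
  {
      double suma = 0.0, sumb = 0.0;
      #pragma omp parallel sections
      {
          #pragma omp section
          for (int i = 0; i < m; i++) suma = suma + a[i];

          #pragma omp section
          for (int j = 0; j < n; j++) sumb = sumb + b[j];
      }
      return suma + sumb;
  }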
10
Task scheduling
- Static scheduling
  - Simple.
  - Works well if the amount of work can be estimated before execution.
- Dynamic scheduling
  - Divide the work into more tasks than processing elements.
  - Assign a task to a processing element whenever it becomes free.
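A sketch of the two policies, assuming OpenMP (not named in the slides); the schedule clause chooses how loop iterations are assigned to threads:

  /* Static: iterations are divided among threads before the loop runs.
     Appropriate when every iteration does about the same amount of work. */
  void process_static(double *x, int n)
  {
      #pragma omp parallel for schedule(static)
      for (int i = 0; i < n; i++) x[i] = x[i] * 2.0;
  }

  /* Dynamic: threads grab chunks of 16 iterations as they become free.
     Appropriate when the work per iteration is irregular or unknown. */
  void process_dynamic(double *x, int n)
  {
      #pragma omp parallel for schedule(dynamic, 16)
      for (int i = 0; i < n; i++) x[i] = x[i] * 2.0;
  }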
11
Data Decomposition
12
Data decomposition
- Divide the data into chunks; each task works on a chunk.
- Considerations
  - How to divide the data.
  - Make sure each task has access to the data it requires.
  - Where does each chunk go?
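A small sketch of dividing n data items into roughly equal chunks, one per worker; the function and variable names are illustrative, not from the slides:

  /* Chunk bounds for worker w out of nworkers: the first (n % nworkers) chunks
     get one extra element, so chunk sizes differ by at most one. */
  void chunk_bounds(int n, int nworkers, int w, int *lo, int *hi)
  {
      int base = n / nworkers, extra = n % nworkers;
      *lo = w * base + (w < extra ? w : extra);
      *hi = *lo + base + (w < extra ? 1 : 0);   /* half-open range [lo, hi) */
  }

For example, 10 items over 3 workers gives the ranges [0,4), [4,7), and [7,10).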
13
How to divide data
- Roughly equally, except when the computation is not the same for all data.
- Shape of the chunks
  - The number of neighboring chunks determines the amount of data exchanged between tasks.
14
Data access for each task
- Make a local copy of the data for each task (data duplication).
  - Wastes memory.
  - Requires synchronization for data consistency, though no synchronization is needed if the data are read-only.
  - Not worthwhile if the data are used only a few times.
- No duplication is needed in the shared-memory model.
15
Assigning chunks to threads/cores
- Static scheduling
  - In the distributed-memory model, shared data must be considered in order to reduce synchronization.
- Dynamic scheduling
  - Used when the amount of work is not known ahead of time.
16
Example: one generation of Conway's Game of Life over an N x M grid.

  void computeNextGen(Grid curr, Grid next, int N, int M)
  {
    int count;
    for (int i = 1; i <= N; i++) {
      for (int j = 1; j <= M; j++) {
        count = 0;
        if (curr[i-1][j-1] == ALIVE) count++;
        if (curr[i-1][j]   == ALIVE) count++;
        /* … checks for the remaining six neighbors … */
        if (curr[i+1][j+1] == ALIVE) count++;
        if (count == 4) next[i][j] = DEAD;
        else if (curr[i][j] == ALIVE && (count == 2 || count == 3)) next[i][j] = ALIVE;
        else if (curr[i][j] == DEAD && count == 3) next[i][j] = ALIVE;
        else next[i][j] = DEAD;
      }
    }
    return;
  }
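A sketch of the grid update divided by rows among threads, assuming OpenMP (not named in the slides); updateCell is a hypothetical helper holding the per-cell rules from computeNextGen above:

  /* Each iteration of the outer loop writes only row i of next and reads curr,
     which is never modified during the update, so the rows are independent
     chunks and can be assigned to threads. */
  void computeNextGenParallel(Grid curr, Grid next, int N, int M)
  {
      #pragma omp parallel for schedule(static)
      for (int i = 1; i <= N; i++)
          for (int j = 1; j <= M; j++)
              updateCell(curr, next, i, j);
  }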
17
Dataflow decomposition
- Break up the problem based on how data flows between tasks.
- Example: producer/consumer.
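A minimal producer/consumer sketch with a one-slot buffer, assuming POSIX threads (the slides do not prescribe an API; all names are illustrative):

  #include <pthread.h>
  #include <stdio.h>

  static int buffer, full = 0;
  static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
  static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

  void *producer(void *arg) {
      for (int i = 0; i < 5; i++) {
          pthread_mutex_lock(&lock);
          while (full) pthread_cond_wait(&cond, &lock);   /* wait for an empty slot */
          buffer = i; full = 1;
          pthread_cond_signal(&cond);
          pthread_mutex_unlock(&lock);
      }
      return NULL;
  }

  void *consumer(void *arg) {
      for (int i = 0; i < 5; i++) {
          pthread_mutex_lock(&lock);
          while (!full) pthread_cond_wait(&cond, &lock);  /* wait for produced data */
          printf("consumed %d\n", buffer); full = 0;
          pthread_cond_signal(&cond);
          pthread_mutex_unlock(&lock);
      }
      return NULL;
  }

  int main(void) {
      pthread_t p, c;
      pthread_create(&p, NULL, producer, NULL);
      pthread_create(&c, NULL, consumer, NULL);
      pthread_join(p, NULL);
      pthread_join(c, NULL);
      return 0;
  }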
18
What not to parallelize
- Algorithms with state. Example: finite state machine simulation.
- Recurrence relations. Examples: convergence loops, computing Fibonacci numbers.
- Induction variables: variables incremented exactly once in each iteration of a loop.
- Reduction: combining a collection of data into a single value, e.g. a sum.
- Loop-carried dependence: results of a previous iteration are used in the current iteration.
19
Algorithms with state: possible remedies
- Add some form of synchronization to serialize the concurrent executions.
- Write the code to be reentrant (i.e., it can be re-entered without detrimental side effects while it is already running). This may not be possible if updating global variables is part of the code.
- Use thread-local storage if the variable(s) holding the state do not have to be shared between threads (see the sketch below).
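A minimal sketch of thread-local state, assuming C11 _Thread_local (the slides do not name a mechanism; the names are illustrative):

  /* Each thread gets its own copy of call_count, so this stateful code needs no
     locking as long as the state never has to be shared between threads. */
  static _Thread_local unsigned long call_count = 0;

  unsigned long stateful_step(void)
  {
      return ++call_count;   /* updates only the calling thread's private copy */
  }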
20
Recurrence Relations
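The slide's own example is not reproduced here; the following loop is purely illustrative of the kind of recurrence meant, where iteration i needs the result of iteration i-1:

  /* Iteration i reads a[i-1], which was produced by the previous iteration, so
     the iterations cannot simply be run in parallel. */
  void prefix_recurrence(double *a, const double *b, int n)
  {
      for (int i = 1; i < n; i++)
          a[i] = a[i-1] + b[i];
  }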
21
Induction Variables

Original loop, with induction variables i1 and i2 updated in every iteration:

  i1 = 4; i2 = 0;
  for (k = 1; k < N; k++) {
    B[i1++] = function1(k,q,r);
    i2 += k;
    A[i2] = function2(k,r,q);
  }

Rewritten with the induction variables expressed as functions of k, so each iteration is independent:

  for (k = 1; k < N; k++) {
    B[k+3] = function1(k,q,r);          /* i1 starts at 4, so iteration k writes B[k+3] */
    i2 = (k*k + k)/2;                   /* closed form of i2 = 1 + 2 + ... + k */
    A[i2] = function2(k,r,q);
  }
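With the induction variables expressed as functions of k, the iterations no longer depend on one another. A sketch of the loop divided among threads, assuming OpenMP (not named in the slides); N, A, B, q, r, function1, and function2 are the names from the slide:

  /* Every iteration writes distinct elements of A and B, so the iterations can
     be split among threads; i2 is declared inside the loop so each iteration
     has its own private copy. */
  #pragma omp parallel for
  for (int k = 1; k < N; k++) {
      B[k+3] = function1(k, q, r);
      int i2 = (k*k + k) / 2;
      A[i2] = function2(k, r, q);
  }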
22
Reduction
- Combining a collection of data and reducing it to a single scalar value.
- To remove the dependency, the combining operation must be associative and commutative.

  sum = 0;
  big = c[0];
  for (i = 0; i < N; i++) {
    sum += c[i];
    big = (c[i] > big ? c[i] : big);
  }
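A sketch of the same reduction using an OpenMP reduction clause (OpenMP is not named in the slides; c is assumed here to hold doubles, and the max reduction requires OpenMP 3.1 or later):

  /* Each thread accumulates into private copies of sum and big; the private
     copies are combined with + and max when the parallel loop finishes. */
  double sum = 0.0, big = c[0];
  #pragma omp parallel for reduction(+:sum) reduction(max:big)
  for (int i = 0; i < N; i++) {
      sum += c[i];
      big = (c[i] > big ? c[i] : big);
  }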
23
Loop-carried Dependence
- References to the same array appear on both the LHS and RHS of assignments, with a backward reference in some RHS use of the array.
- The general case of recurrence relations.
- Cannot be removed easily.

  for (k = 5; k < N; k++) {
    b[k] = DoSomething(k);
    a[k] = b[k-5] + MoreStuff(k);   /* reads a value written five iterations earlier */
  }
24
Example: Loop-carried dependence

With the dependence: the wrap value computed in iteration i is consumed in iteration i+1, so the iterations cannot run concurrently:

  wrap = a[0] * b[0];
  for (i = 1; i < N; i++) {
    c[i] = wrap;
    wrap = a[i] * b[i];
    d[i] = 2 * wrap;
  }

Dependence removed: each iteration recomputes the value it needs from a[i-1] and b[i-1], so the iterations become independent (wrap must then be private to each iteration):

  for (i = 1; i < N; i++) {
    wrap = a[i-1] * b[i-1];
    c[i] = wrap;
    wrap = a[i] * b[i];
    d[i] = 2 * wrap;
  }
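A sketch of the rewritten loop run in parallel, assuming OpenMP (not named in the slides); declaring wrap inside the loop body keeps it private to each iteration:

  #pragma omp parallel for
  for (int i = 1; i < N; i++) {
      double wrap = a[i-1] * b[i-1];   /* recomputed locally, no carried value */
      c[i] = wrap;
      wrap = a[i] * b[i];
      d[i] = 2 * wrap;
  }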