1 Design Issues (Jaruloj Chongstitvatana, Parallel Programming: Parallelization)

2 How to parallelize
- Task decomposition
- Data decomposition
- Dataflow decomposition

3 Task Decomposition

4 Task decomposition
- Identify tasks
  - Decompose the serial code into parts that can run in parallel.
  - These parts need to be completely independent.
  - Dependency: an interaction between tasks.
- Sequential consistency property
  - The parallel code gives the same result, for the same input, as the serial code does.
- Test for a parallelizable loop (see the sketch below)
  - If running the loop in reverse order gives the same result as the original loop, it is possibly parallelizable.
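
A minimal C sketch (my addition, not from the slides) of how the reverse-order test plays out. The first loop passes the test because each iteration touches only its own element; the second fails because iteration i reads the value written in iteration i-1.

    #include <stdio.h>
    #define N 8

    int main(void) {
        double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
        double b[N];

        /* Independent iterations: reversing the order changes nothing,
           so this loop is a good candidate for parallelization. */
        for (int i = 0; i < N; i++)
            b[i] = 2.0 * a[i];

        /* Loop-carried dependence: iteration i reads a[i-1], so reversing
           the order gives a different result; not parallelizable as is. */
        for (int i = 1; i < N; i++)
            a[i] = a[i-1] + 1.0;

        printf("%f %f\n", b[N-1], a[N-1]);
        return 0;
    }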

5 Design considerations
- What are the tasks and how are they defined?
- What are the dependencies between tasks, and how can they be satisfied?
- How should tasks be assigned to threads or processors?

6 Task definition
- Tasks are mostly related to activities; this style is common in GUI applications.
- Example: a multimedia web application that simultaneously plays background music, displays an animation, and reads input from the user (see the sketch below).
- Which part of the code should be parallelized? The hotspot: the part that gets executed most often.
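
A minimal pthreads sketch of the multimedia example. The activity functions play_music, display_animation, and read_input are hypothetical placeholders; the point is only that each activity becomes its own task on its own thread.

    #include <pthread.h>
    #include <stdio.h>

    /* Hypothetical activities; each one is a task. */
    static void *play_music(void *arg)        { puts("playing music");        return NULL; }
    static void *display_animation(void *arg) { puts("displaying animation"); return NULL; }
    static void *read_input(void *arg)        { puts("reading user input");   return NULL; }

    int main(void) {
        pthread_t t[3];
        /* One thread per activity: task decomposition by activity. */
        pthread_create(&t[0], NULL, play_music, NULL);
        pthread_create(&t[1], NULL, display_animation, NULL);
        pthread_create(&t[2], NULL, read_input, NULL);
        for (int i = 0; i < 3; i++)
            pthread_join(t[i], NULL);
        return 0;
    }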

7 Criteria for decomposition
- Create more tasks than threads (or cores). Why? Extra tasks let idle threads pick up work, which helps balance the load.
- Granularity (fine-grained vs. coarse-grained decomposition): the amount of computation per task, or the time between synchronizations (see the sketch below).
- Tasks should be big enough compared to the overhead of handling tasks and threads; that overhead includes thread management, synchronization, etc.
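
A small OpenMP sketch (an illustration, not the slide's code) of how the scheduling chunk size sets the granularity of a loop decomposition:

    #include <omp.h>
    #include <stdio.h>
    #define N 1000000

    int main(void) {
        static double a[N];

        /* Fine-grained: many small chunks of 16 iterations each.
           Better load balance, more scheduling overhead. */
        #pragma omp parallel for schedule(dynamic, 16)
        for (int i = 0; i < N; i++)
            a[i] = i * 0.5;

        /* Coarse-grained: one large contiguous block per thread.
           Minimal overhead, but uneven work is not rebalanced. */
        #pragma omp parallel for schedule(static)
        for (int i = 0; i < N; i++)
            a[i] += 1.0;

        printf("%f\n", a[N-1]);
        return 0;
    }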

8 [Figure: comparison of fine-grained and coarse-grained decomposition]

9 Dependency between tasks
- Order dependency: tasks must execute in a particular order. It can be enforced by putting dependent tasks in the same thread, or by adding synchronization.
- Data dependency: variables are shared between tasks. It can be handled with shared and private variables, or with locks and critical regions.
[Figure: task dependence graphs over tasks A, B, C, D]

Example: removing the data dependency on sum by giving each loop its own partial sum.

    /* Original: both loops update the shared variable sum, so they depend
       on each other through it. */
    sum = 0;
    for (i = 0; i < m; i++) sum = sum + a[i];
    for (i = 0; i < n; i++) sum = sum + b[i];

    /* Rewritten: each loop accumulates into its own variable, so the two
       loops can run in parallel and the results are combined at the end. */
    suma = 0;
    for (i = 0; i < m; i++) suma = suma + a[i];
    sumb = 0;
    for (j = 0; j < n; j++) sumb = sumb + b[j];
    sum = suma + sumb;

10 Task scheduling
- Static scheduling: simple; works well if the amount of work can be estimated before execution.
- Dynamic scheduling: divide the work into more tasks than processing elements, and assign a task to a processing element whenever it becomes free (see the sketch below).
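
A sketch (my addition) of dynamic scheduling with a shared task counter: workers grab the next unclaimed task whenever they become free. do_task is a hypothetical placeholder for the per-task work.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define NTASKS   64
    #define NTHREADS 4

    static atomic_int next_task = 0;

    /* Hypothetical per-task work. */
    static void do_task(int t) { printf("task %d\n", t); }

    /* Each worker repeatedly claims the next task until none remain. */
    static void *worker(void *arg) {
        for (;;) {
            int t = atomic_fetch_add(&next_task, 1);
            if (t >= NTASKS) break;
            do_task(t);
        }
        return NULL;
    }

    int main(void) {
        pthread_t th[NTHREADS];
        for (int i = 0; i < NTHREADS; i++) pthread_create(&th[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++) pthread_join(th[i], NULL);
        return 0;
    }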

11 Data Decomposition

12 Data decomposition
- Divide the data into chunks; each task works on one chunk.
- Considerations:
  - How to divide the data.
  - How to make sure each task has access to the data it requires.
  - Where each chunk goes.

13 How to divide data
- Divide the data roughly equally, except when the computation is not the same for all data.
- Shape of the chunks: the number of neighboring chunks determines the amount of data exchanged.

14 Data access for each task
- Make a local copy of the data for each task (data duplication; see the sketch below).
  - Wastes memory.
  - Avoids the synchronization otherwise needed for data consistency; no synchronization is needed anyway if the data are read-only.
  - Not worthwhile if the data are used only a few times.
- No duplication is needed in the shared-memory model.
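
A small OpenMP sketch (illustrative only) of per-thread copies: firstprivate gives every thread its own initialized copy of a small lookup table, so threads never share the table object and need no locking while reading it. In a shared-memory program with read-only data the copy is optional, as the slide notes.

    #include <omp.h>
    #include <stdio.h>
    #define N 1024

    int main(void) {
        double table[4] = {1.0, 2.0, 4.0, 8.0};   /* data each task needs */
        double out[N];

        /* Each thread works on its own copy of table. */
        #pragma omp parallel for firstprivate(table)
        for (int i = 0; i < N; i++)
            out[i] = table[i % 4] * i;

        printf("%f\n", out[N-1]);
        return 0;
    }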

15 Assigning chunks to threads/cores
- Static scheduling: in the distributed-memory model, shared data must be taken into account to reduce synchronization.
- Dynamic scheduling: used when the amount of work per chunk is not known ahead of time.

16 Example: one generation of Conway's Game of Life

    void computeNextGen(Grid curr, Grid next, int N, int M) {
        int count;
        for (int i = 1; i <= N; i++) {
            for (int j = 1; j <= M; j++) {
                /* Count the live neighbors of cell (i, j). */
                count = 0;
                if (curr[i-1][j-1] == ALIVE) count++;
                if (curr[i-1][j] == ALIVE) count++;
                …
                if (curr[i+1][j+1] == ALIVE) count++;
                if (count >= 4)                     /* 4 or more neighbors: dies of overcrowding */
                    next[i][j] = DEAD;
                else if (curr[i][j] == ALIVE && (count == 2 || count == 3))
                    next[i][j] = ALIVE;
                else if (curr[i][j] == DEAD && count == 3)
                    next[i][j] = ALIVE;
                else
                    next[i][j] = DEAD;
            }
        }
        return;
    }
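
A sketch (my addition, not the slide's code) of data decomposition for this example: the grid rows are the chunks, and OpenMP splits the outer loop across threads. It assumes the same Grid type, ALIVE/DEAD values, and loop body as computeNextGen above; count must be declared inside the loop so each thread has its own copy. Threads only read curr and write disjoint rows of next, so no locking is needed.

    #pragma omp parallel for schedule(static)
    for (int i = 1; i <= N; i++) {
        for (int j = 1; j <= M; j++) {
            int count = 0;
            /* ... same neighbor counting and update of next[i][j] as above ... */
        }
    }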

17 Dataflow decomposition
- Break up the problem based on how data flows between tasks.
- Example: the producer/consumer pattern (see the sketch below).
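
A minimal single-producer/single-consumer sketch (my addition): the producer fills a one-slot buffer, the consumer drains it, and a mutex plus a condition variable enforce the flow of data between the two tasks.

    #include <pthread.h>
    #include <stdio.h>

    #define ITEMS 5

    static int buffer;
    static int full = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

    static void *producer(void *arg) {
        for (int i = 0; i < ITEMS; i++) {
            pthread_mutex_lock(&lock);
            while (full) pthread_cond_wait(&cond, &lock);   /* wait for an empty slot */
            buffer = i;
            full = 1;
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    static void *consumer(void *arg) {
        for (int i = 0; i < ITEMS; i++) {
            pthread_mutex_lock(&lock);
            while (!full) pthread_cond_wait(&cond, &lock);  /* wait for data */
            printf("consumed %d\n", buffer);
            full = 0;
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }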

18 What not to parallelize (or what needs special care)
- Algorithms with state. Example: finite state machine simulation.
- Recurrence relations. Examples: convergence loops, computing Fibonacci numbers.
- Induction variables: variables incremented once in each iteration of a loop.
- Reduction: computing a single value from a collection of data, e.g. a sum.
- Loop-carried dependence: results of a previous iteration are used in the current iteration.

19 Algorithms with state
- Add some form of synchronization; in the worst case this serializes all concurrent executions.
- Write the code to be reentrant (i.e., it can be reentered, without detrimental side effects, while it is already running). This may not be possible if updating global variables is part of the code.
- Use thread-local storage if the variables holding the state do not have to be shared between threads (see the sketch below).
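
A short C sketch (my addition) of the thread-local option, using a hypothetical stateful counter. The C11 keyword _Thread_local gives every thread its own copy of the state, so the routine needs no locks as long as the state does not have to be shared.

    #include <pthread.h>
    #include <stdio.h>

    /* The counter is the state; each thread gets its own copy. */
    static _Thread_local int counter = 0;

    static int next_id(void) { return ++counter; }

    static void *worker(void *arg) {
        int tid = *(int *)arg;
        for (int i = 0; i < 3; i++)
            printf("thread %d got id %d\n", tid, next_id());
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        int id1 = 1, id2 = 2;
        pthread_create(&t1, NULL, worker, &id1);
        pthread_create(&t2, NULL, worker, &id2);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }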

20 Recurrence Relations
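
The transcript carries only this slide's heading. As an illustration (not the slide's own content), a typical recurrence relation computes each value from the previous one, so iteration i cannot start until iteration i-1 has finished and the loop cannot be parallelized directly:

    /* Illustrative only: x[i] depends on x[i-1]. */
    x[0] = start;
    for (i = 1; i < N; i++)
        x[i] = a * x[i-1] + b[i];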

21 Induction variables

    /* Original: i1 and i2 are induction variables, updated once per
       iteration, so every iteration depends on the previous one. */
    i1 = 4;
    i2 = 0;
    for (k = 1; k < N; k++) {
        B[i1++] = function1(k, q, r);
        i2 += k;
        A[i2] = function2(k, r, q);
    }

    /* Rewritten: each index is computed directly from k (a closed form),
       which removes the dependence between iterations. */
    i1 = 4;
    i2 = 0;
    for (k = 1; k < N; k++) {
        B[k+4] = function1(k, q, r);
        i2 = (k*k + k)/2;
        A[i2] = function2(k, r, q);
    }
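
With the closed forms in place each iteration is independent. A sketch (my addition) of parallelizing the rewritten loop with OpenMP, assuming function1 and function2 are thread-safe; i2 is made private so threads do not share it:

    #pragma omp parallel for private(i2)
    for (k = 1; k < N; k++) {
        B[k+4] = function1(k, q, r);
        i2 = (k*k + k)/2;
        A[i2] = function2(k, r, q);
    }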

22 Reduction
- Combining a collection of data into a single scalar value.
- To remove the dependency, the combining operation must be associative and commutative.

    /* Serial reduction: sum and big carry a value from one iteration to
       the next. */
    sum = 0;
    big = c[0];
    for (i = 0; i < N; i++) {
        sum += c[i];
        big = (c[i] > big ? c[i] : big);
    }
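
A sketch (not from the slide) of the same loop using OpenMP's reduction clause, which keeps per-thread partial results and combines them at the end; the max reduction requires OpenMP 3.1 or later.

    sum = 0;
    big = c[0];
    #pragma omp parallel for reduction(+:sum) reduction(max:big)
    for (i = 0; i < N; i++) {
        sum += c[i];
        big = (c[i] > big ? c[i] : big);
    }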

23 Loop-carried dependence
- The same array appears on both the left-hand side and the right-hand side of assignments, and some right-hand-side use of the array is a backward reference.
- This is the general case of recurrence relations; it cannot be removed easily.

    /* a[k] reads b[k-5], which was written five iterations earlier, so the
       iterations are not independent. */
    for (k = 5; k < N; k++) {
        b[k] = DoSomething(k);
        a[k] = b[k-5] + MoreStuff(k);
    }

24 Example: loop-carried dependence

    /* Original: wrap carries a value computed in one iteration into the
       next iteration. */
    wrap = a[0] * b[0];
    for (i = 1; i < N; i++) {
        c[i] = wrap;
        wrap = a[i] * b[i];
        d[i] = 2 * wrap;
    }

    /* Rewritten: each iteration recomputes wrap from a[i-1] and b[i-1],
       so nothing is carried between iterations. */
    for (i = 1; i < N; i++) {
        wrap = a[i-1] * b[i-1];
        c[i] = wrap;
        wrap = a[i] * b[i];
        d[i] = 2 * wrap;
    }
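
Once the dependence is removed, the loop can be parallelized. A sketch (my addition) where wrap is made private so each thread keeps its own copy:

    #pragma omp parallel for private(wrap)
    for (i = 1; i < N; i++) {
        wrap = a[i-1] * b[i-1];
        c[i] = wrap;
        wrap = a[i] * b[i];
        d[i] = 2 * wrap;
    }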

