1 Parallel Computing, Chapter 3 - Patterns
R. HALVERSON, MIDWESTERN STATE UNIVERSITY

2 Parallel Patterns
- Serial patterns
- Structured programming - universal
- Algorithmic skeletons, techniques, strategies (including OOP)
- Features: well-structured, maintainable, efficient, deterministic, composable

3 Nesting Pattern
- The ability to hierarchically compose patterns: patterns within patterns
- As in structured programming
- Static: sequence, selection, iteration
- Dynamic: recursion
- Any pattern can contain any other pattern

4 Data Parallelism vs. Functional Decomposition
- Static patterns: functional decomposition
- Dynamic pattern (recursion): data parallelism
- Nesting + recursion: parallel slack
- What about "excessive" recursion?

5 3.2 Serial Control Flow Patterns
- Sequence
- Selection (decision)
- Iteration (loop, repetition)
- Loop-carried dependency
- Map, scan, recurrence, scatter, gather, pack
- Recursion
- What is an alias?

6 Can this loop be parallelized? Problems?

void engine(int n, double x[], int a[], int b[], int c[], int d[]) {
    for (int k = 0; k < n; ++k)
        x[a[k]] = x[b[k]] * x[c[k]] + x[d[k]];
}
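A minimal sketch (with made-up index values, not from the slide) of why this loop may not be safe to parallelize: when a[k] of an early iteration matches a b[], c[], or d[] index used by a later iteration, the later iteration reads a value the earlier one wrote, which is a loop-carried dependency.

#include <cstdio>

int main() {
    double x[3] = {1.0, 2.0, 4.0};
    int a[2] = {2, 1};   // iteration 0 writes x[2]
    int b[2] = {0, 2};   // iteration 1 reads  x[2] -> depends on iteration 0
    int c[2] = {1, 0};
    int d[2] = {0, 1};

    for (int k = 0; k < 2; ++k)
        x[a[k]] = x[b[k]] * x[c[k]] + x[d[k]];

    // Sequential order gives x = {1, 5, 3}. If the two iterations ran in the
    // opposite order (or concurrently), x[1] would be computed from the old
    // x[2] and the result would be {1, 6, 7} instead.
    printf("x = {%g, %g, %g}\n", x[0], x[1], x[2]);
    return 0;
}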

7 Can this loop be parallelized? Problems?

void engine(int n, double x[], double y[], int a[], int b[], int c[], int d[]) {
    for (int k = 0; k < n; ++k)
        y[a[k]] = x[b[k]] * x[c[k]] + x[d[k]];
}
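A minimal sketch, assuming y does not alias x and the indices in a[] are all distinct, of how this second version can run in parallel; OpenMP is used here only as one convenient notation, not something the slides prescribe.

void engine_parallel(int n, const double x[], double y[],
                     const int a[], const int b[],
                     const int c[], const int d[]) {
    // Each iteration writes a different y[a[k]] and only reads x,
    // so there is no loop-carried dependency.
    #pragma omp parallel for
    for (int k = 0; k < n; ++k)
        y[a[k]] = x[b[k]] * x[c[k]] + x[d[k]];
}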

8 3.3 Parallel Control Patterns
- Fork-join
- Map
- Stencil
- Reduction
- Scan
- Recurrence
(Image: NVIDIA GeForce 480)

9 3.3.1 Fork-Join
- Fork – instruction that creates a new control flow
- Join – instruction that synchronizes control flows created via fork; after the join, only one control flow continues
- Variation: spawn – executes a function; the caller does not wait for it to return
- Barrier – synchronizes multiple control flows, but all may continue after the barrier
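A minimal fork-join sketch using C++ std::thread (one possible realization; the function and variable names are illustrative): the parent forks a new control flow for half of the work, does the other half itself, then joins so a single control flow continues.

#include <thread>
#include <vector>

void square_range(std::vector<double>& v, std::size_t lo, std::size_t hi) {
    for (std::size_t i = lo; i < hi; ++i) v[i] *= v[i];
}

void square_all(std::vector<double>& v) {
    std::size_t mid = v.size() / 2;
    std::thread child(square_range, std::ref(v), std::size_t{0}, mid);  // fork
    square_range(v, mid, v.size());   // parent works on the other half
    child.join();                     // join: only one control flow continues
}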

10 3.3.2 Map (Fig. 3.6)
- Map – replicates an elemental function over each element of an index set
- The elemental function is applied to the elements of one or more collections
- An iteration (loop) replacement where every iteration is independent
- The computation may use the count, the index, and the data item
- Known number of iterations
- Pure elemental function: no side effects
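A minimal sketch of the map pattern (the names and the use of OpenMP are illustrative assumptions): a pure elemental function with no side effects is applied independently to every element of the index set 0..n-1.

#include <cmath>

static inline double elemental(double v) { return std::sqrt(v) + 1.0; }  // pure: no side effects

void map_example(int n, const double in[], double out[]) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        out[i] = elemental(in[i]);   // independent: out[i] depends only on in[i]
}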

11 3.3.3 Stencil (Fig. 3.7)
- Stencil – an extension of map that gives the elemental function access to a set of "neighbors"
- The pattern of access eliminates memory/data conflicts
- Special cases: out-of-bounds neighbors
- Utilizes tiling (see Section 7.3)
- Applications: image filtering, simulation (fluid flow), linear algebra
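A minimal sketch of a 1-D, 3-point stencil (the averaging filter and the boundary treatment are illustrative choices): each output element is computed from its input neighbors, and the out-of-bounds special case at the two ends is handled here by copying those elements unchanged.

void stencil_average(int n, const double in[], double out[]) {
    if (n > 0) out[0] = in[0];            // boundary: no left neighbor
    if (n > 1) out[n - 1] = in[n - 1];    // boundary: no right neighbor
    #pragma omp parallel for
    for (int i = 1; i < n - 1; ++i)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;  // reads neighbors, writes only out[i]
}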

12 3.3.4 Reduction (Fig. 3.9)
- Reduction – combines the elements of a collection into a single element using an associative combiner function
- O(log n) parallel steps
- Consider summation of an array
- Calculate the total number of additions
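A minimal sketch of a summation reduction. A serial sum of n elements performs n - 1 additions; because + is associative, the same additions can be arranged as a tree of depth O(log n). The OpenMP reduction clause below is just one way to express this.

double reduce_sum(int n, const double x[]) {
    double total = 0.0;
    // Each thread accumulates a partial sum; the partials are then combined.
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i)
        total += x[i];
    return total;
}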

13 3.3.5 Scan (Fig. 3.10)
- Scan – computes partial reductions of a collection
- For each output position, the reduction up to that point is computed
- AKA prefix sums (example)
- Total number of additions: serial? parallel?
- How many processors? Implications?
- O(log n) parallel steps
- Applications: checkbook balance, integration, random number generation
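A minimal serial sketch of an inclusive scan (prefix sums), matching the checkbook example: out[i] is the running total through element i. This serial form performs n - 1 additions; parallel scans typically perform more additions in total but finish in O(log n) steps.

void inclusive_scan(int n, const double in[], double out[]) {
    if (n == 0) return;
    out[0] = in[0];
    for (int i = 1; i < n; ++i)
        out[i] = out[i - 1] + in[i];   // out[i] = reduction of in[0..i]
}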

14 3.3.6 Recurrence
- Omit???

15 3.4 Serial Data Management Patterns
- How stored data is allocated, shared, read, written, and copied
- Random read/write
- Stack allocation
- Heap allocation
- Closures
- Objects

16 3.4.1 Random Read & Write
- Memory access via addresses
- Pointers
- Aliases – if "forbidden", avoiding them becomes the programmer's responsibility
- Arrays – safer due to contiguous storage, but can still be aliased
- Normal for serial code. Implications for parallel? Locality?
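A minimal sketch (the function and parameter names are illustrative) of why aliasing matters for parallel execution: unless the compiler or programmer knows that dst and src do not overlap, a write through dst might change what src later reads, so the iterations cannot safely be reordered or run concurrently.

void scale_copy(int n, double* dst, const double* src, double s) {
    // Safe to vectorize or parallelize only if dst and src do not overlap.
    // C's "restrict" qualifier (or the __restrict extension in C++) is the
    // programmer's promise to the compiler that no such aliasing occurs.
    for (int i = 0; i < n; ++i)
        dst[i] = s * src[i];
}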

17 3.4.2 Stack Allocation
- Dynamic allocation
- Nested, as in function calls
- Where is the stack used by systems?
- LIFO
- Parallel: each thread has its own stack
- Preserves locality

18 3.4.3 Heap Allocation
- Definition?
- Where is it used by the system?
- Features: dynamic, complex, slow
- No locality guarantee, loss of coherence
- Fragmented memory
- Limited scalability

19 3.4.4 & 3.4.5 Closures & Objects
- Omit

20 3.5 Parallel Data Management Patterns
- Shared vs. not-shared data
- Modification patterns of data
- Help improve performance

21 3.5.1 Pack / Unpack
- Pack eliminates unused space in a collection (e.g., an array)
- How? (see the sketch below)
- Assign 0 or 1 to each location
- Use a scan (parallel prefix) to compute each kept element's new address
- Write the kept elements to a new array
- Example: Figure 3.12 (p. 98)
- Unpack – return to the original array layout
- Applications?
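A minimal sketch of pack following the three steps above (the predicate "keep non-zero values" and the function name are illustrative): flag each kept element with 1, take an exclusive scan of the flags to get each survivor's new address, then write the survivors into a compact output array. Unpack would use the same flags and addresses to scatter the elements back.

#include <vector>

std::vector<double> pack_nonzero(const std::vector<double>& in) {
    int n = static_cast<int>(in.size());
    std::vector<int> keep(n), addr(n);

    for (int i = 0; i < n; ++i)             // step 1: 0/1 flags
        keep[i] = (in[i] != 0.0) ? 1 : 0;

    int running = 0;
    for (int i = 0; i < n; ++i) {           // step 2: exclusive scan of the flags
        addr[i] = running;
        running += keep[i];
    }

    std::vector<double> out(running);
    for (int i = 0; i < n; ++i)             // step 3: write survivors to new addresses
        if (keep[i]) out[addr[i]] = in[i];
    return out;
}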

22 3.5.2 Pipeline
- A sequence (series) of processing stages in which the output of one stage is the input of the next
- Functional decomposition – limited parallelism, since the number of stages is generally fixed
- Useful for serially dependent tasks and when nested with other patterns
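A minimal sketch of a three-stage pipeline (the stages are placeholders): each item flows through stage1, stage2, and stage3 in order. In a parallel implementation the stages would run concurrently on different items, so the available parallelism is bounded by the number of stages, three here.

#include <vector>

static double stage1(double v) { return v * 2.0; }   // e.g., decode
static double stage2(double v) { return v + 1.0; }   // e.g., transform
static double stage3(double v) { return v * v; }     // e.g., encode

void run_pipeline(std::vector<double>& items) {
    // Serial form: each item passes through the stages in order.
    for (double& v : items)
        v = stage3(stage2(stage1(v)));
}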

23 3.5.3 Geometric Decomposition & 3.5.4 Gather
- Omit

