1
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Parallel Programming in C with MPI and OpenMP
Michael J. Quinn
2
Chapter 3
Parallel Algorithm Design
3
Outline
- Task/channel model
- Algorithm design methodology
- Case studies
4
Task/Channel Model
- Parallel computation = set of tasks
- Task
  - Program
  - Local memory
  - Collection of I/O ports
- Tasks interact by sending messages through channels
5
Task/Channel Model
[Figure labels: Task, Channel]
6
Foster’s Design Methodology
- Partitioning
- Communication
- Agglomeration
- Mapping
7
Foster’s Methodology
8
Partitioning
- Dividing computation and data into pieces
- Domain decomposition
  - Divide data into pieces
  - Determine how to associate computations with the data
- Functional decomposition
  - Divide computation into pieces
  - Determine how to associate data with the computations
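As a rough illustration of domain decomposition, the sketch below divides n data items into contiguous blocks and lets each task compute only on the block it owns. The helper name `block_range` and the squaring workload are assumptions made for this example, not something taken from the slides.

```python
def block_range(task_id, num_tasks, n):
    """Return the half-open index range [low, high) of the items owned by task_id."""
    low = task_id * n // num_tasks
    high = (task_id + 1) * n // num_tasks
    return low, high

# Hypothetical workload: square every element of a 10-item array using 4 tasks.
n, num_tasks = 10, 4
data = list(range(n))
ranges = [block_range(t, num_tasks, n) for t in range(num_tasks)]

# Each task is associated with the computation on its own slice of the data.
results = [[x * x for x in data[low:high]] for low, high in ranges]
print(ranges)
```

With this formula the block sizes differ by at most one item, which matches the checklist goal of keeping primitive tasks roughly the same size.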
9
Example Domain Decompositions
10
Example Functional Decomposition
11
Partitioning Checklist
- At least 10x more primitive tasks than processors in target computer
- Minimize redundant computations and redundant data storage
- Primitive tasks roughly the same size
- Number of tasks an increasing function of problem size
12
Communication
- Determine values passed among tasks
- Local communication
  - Task needs values from a small number of other tasks
  - Create channels illustrating data flow
- Global communication
  - Significant number of tasks contribute data to perform a computation
  - Don’t create channels for them early in design
13
Communication Checklist
- Communication operations balanced among tasks
- Each task communicates with only a small group of neighbors
- Tasks can perform communications concurrently
- Tasks can perform computations concurrently
14
Agglomeration
- Grouping tasks into larger tasks
- Goals
  - Improve performance
  - Maintain scalability of program
  - Simplify programming
- In MPI programming, the goal is often to create one agglomerated task per processor
15
Agglomeration Can Improve Performance
- Eliminate communication between primitive tasks agglomerated into consolidated task
- Combine groups of sending and receiving tasks
16
Agglomeration Checklist
- Locality of parallel algorithm has increased
- Replicated computations take less time than communications they replace
- Data replication doesn’t affect scalability
- Agglomerated tasks have similar computational and communications costs
- Number of tasks increases with problem size
- Number of tasks suitable for likely target systems
- Tradeoff between agglomeration and code modification costs is reasonable
17
Mapping
- Process of assigning tasks to processors
- Centralized multiprocessor: mapping done by operating system
- Distributed memory system: mapping done by user
- Conflicting goals of mapping
  - Maximize processor utilization
  - Minimize interprocessor communication
18
Mapping Example
19
Optimal Mapping
- Finding optimal mapping is NP-hard
- Must rely on heuristics
20
Mapping Decision Tree
- Static number of tasks
  - Structured communication
    - Constant computation time per task
      - Agglomerate tasks to minimize communication
      - Create one task per processor
    - Variable computation time per task
      - Cyclically map tasks to processors
  - Unstructured communication
    - Use a static load balancing algorithm
- Dynamic number of tasks
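To make the "cyclically map tasks to processors" branch concrete, here is a small sketch comparing a block mapping with a cyclic mapping when computation time varies per task. The function names and the linearly growing task costs are illustrative assumptions, not data from the slides.

```python
def block_owner(task_id, num_tasks, num_procs):
    """Processor that owns task_id when tasks are dealt out in contiguous blocks."""
    return (num_procs * (task_id + 1) - 1) // num_tasks

def cyclic_owner(task_id, num_procs):
    """Processor that owns task_id when tasks are dealt out round-robin."""
    return task_id % num_procs

# Hypothetical workload: 8 tasks whose cost grows with the task index.
num_tasks, num_procs = 8, 4
costs = [i + 1 for i in range(num_tasks)]

block_loads = [0] * num_procs
cyclic_loads = [0] * num_procs
for task_id, cost in enumerate(costs):
    block_loads[block_owner(task_id, num_tasks, num_procs)] += cost
    cyclic_loads[cyclic_owner(task_id, num_procs)] += cost

print(block_loads, cyclic_loads)
```

In this toy case the busiest processor carries less work under the cyclic mapping than under the block mapping, which is the motivation for interleaving tasks when their costs are uneven.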
21
Mapping Strategy
- Static number of tasks
- Dynamic number of tasks
  - Frequent communications between tasks
    - Use a dynamic load balancing algorithm
  - Many short-lived tasks
    - Use a run-time task-scheduling algorithm
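The two mapping slides together describe one decision tree, which can be written down as a small function. This is only a sketch: the function name `suggest_mapping` and its boolean parameters are hypothetical, and it assumes that "many short-lived tasks" is the alternative to "frequent communications" in the dynamic branch.

```python
def suggest_mapping(static_tasks, structured_comm=True,
                    constant_time=True, frequent_comm=True):
    """Walk the mapping decision tree and return the suggested strategy."""
    if static_tasks:
        if structured_comm:
            if constant_time:
                return ("agglomerate tasks to minimize communication; "
                        "create one task per processor")
            return "cyclically map tasks to processors"
        return "use a static load balancing algorithm"
    # Dynamic number of tasks.
    if frequent_comm:
        return "use a dynamic load balancing algorithm"
    return "use a run-time task-scheduling algorithm"

print(suggest_mapping(static_tasks=True, constant_time=False))
```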
22
Mapping Checklist
- Considered designs based on one task per processor and multiple tasks per processor
- Evaluated static and dynamic task allocation
- If dynamic task allocation chosen, task allocator is not a bottleneck to performance
- If static task allocation chosen, ratio of tasks to processors is at least 10:1
23
Case Studies
- Boundary value problem
- Finding the maximum
- The n-body problem
- Adding data input
24
Boundary Value Problem
[Figure labels: Ice water, Rod, Insulation]
25
Rod Cools as Time Progresses
26
Finite Difference Approximation
27
Partitioning
- One data item per grid point
- Associate one primitive task with each grid point
- Two-dimensional domain decomposition
28
Communication
- Identify communication pattern between primitive tasks
- Each interior primitive task has three incoming and three outgoing channels
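The three incoming channels correspond to the three values an interior grid point needs from the previous time step. The sketch below assumes a standard explicit finite-difference update for a one-dimensional rod; the update coefficients, the sine-shaped starting temperature, the value r = 0.25, and the name `step` are all assumptions for illustration, since the slide's own formula did not survive extraction.

```python
import math

def step(u, r):
    """Advance the rod temperatures by one time step.

    Each interior point reads three values from the previous step:
    its left neighbour, itself, and its right neighbour.
    """
    new = u[:]  # the two end points stay fixed (held at the ice-water temperature)
    for i in range(1, len(u) - 1):
        new[i] = r * u[i - 1] + (1 - 2 * r) * u[i] + r * u[i + 1]
    return new

# Hypothetical setup: 11 grid points, both ends held at 0.
n = 11
u = [100 * math.sin(math.pi * i / (n - 1)) for i in range(n)]
u[0] = u[-1] = 0.0
r = 0.25  # assumed stable choice (r <= 0.5 for this scheme)

for _ in range(50):
    u = step(u, r)
print(max(u))
```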
29
Agglomeration and Mapping
[Figure label: Agglomeration]
30
Sequential Execution Time
- $\chi$: time to update one element
- $n$: number of elements
- $m$: number of iterations
- Sequential execution time: $m(n-1)\chi$
31
Parallel Execution Time
- $p$: number of processors
- $\lambda$: message latency
- Parallel execution time: $m\left(\chi\lceil (n-1)/p \rceil + 2\lambda\right)$
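The two cost expressions can be compared numerically. The sketch below assumes the sequential time is m(n-1)χ and the parallel time is m(χ⌈(n-1)/p⌉ + 2λ), where χ is the time to update one element and λ is the message latency; the symbols are missing from the extracted slide text, so this reading is an assumption, and the sample numbers are made up.

```python
def sequential_time(m, n, chi):
    """m iterations, each updating n - 1 elements at cost chi."""
    return m * (n - 1) * chi

def parallel_time(m, n, p, chi, lam):
    """m iterations; each processor updates ceil((n-1)/p) elements and
    pays for two message latencies (one per neighbour) per iteration."""
    elements_per_proc = (n - 1 + p - 1) // p  # integer ceiling of (n-1)/p
    return m * (chi * elements_per_proc + 2 * lam)

# Hypothetical parameters: 100 iterations, 1001 elements, 10 processors.
t_seq = sequential_time(100, 1001, 1.0)
t_par = parallel_time(100, 1001, 10, 1.0, 5.0)
print(t_seq, t_par, t_seq / t_par)
```

Under these made-up numbers the latency term keeps the speedup below the processor count, which is the trade-off the model is meant to expose.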
32
Finding the Maximum Error
[Figure: 6.25%]
33
Reduction
- Given an associative operator $\oplus$
- $a_0 \oplus a_1 \oplus a_2 \oplus \cdots \oplus a_{n-1}$
- Examples
  - Add
  - Multiply
  - And, Or
  - Maximum, Minimum
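A reduction combines a sequence of values with one associative operator. As a minimal sequential sketch, Python's `functools.reduce` can apply each of the operators listed above; the sample values are arbitrary.

```python
from functools import reduce
import operator

values = [3, 1, 4, 1, 5]

total = reduce(operator.add, values)      # add
product = reduce(operator.mul, values)    # multiply
largest = reduce(max, values)             # maximum
smallest = reduce(min, values)            # minimum
all_set = reduce(operator.and_, [True, True, False])  # and
any_set = reduce(operator.or_, [True, True, False])   # or

# Associativity means the grouping does not matter, so partial results
# computed over separate pieces can be combined afterwards.
left = reduce(operator.add, values[:2])
right = reduce(operator.add, values[2:])
regrouped = left + right
print(total, product, largest, smallest, all_set, any_set)
```

The regrouping at the end is what allows a parallel reduction to combine partial results computed by different tasks.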
34
Parallel Reduction Evolution
35
Parallel Reduction Evolution
36
Parallel Reduction Evolution
37
Binomial Trees
- Subgraph of hypercube
38
Finding Global Sum
[Figure values: 4 2 0 7 / -3 5 -6 -3 / 8 1 2 3 / -4 4 6]
39
Finding Global Sum
[Figure values: 1 7 -6 4 / 4 5 8 2]
40
Finding Global Sum
[Figure values: 8 -2 / 9 10]
41
Finding Global Sum
[Figure values: 17 8]
42
Finding Global Sum
[Figure value: 25; label: Binomial Tree]
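The global-sum sequence can be imitated with a short simulation in which half of the remaining tasks send their value to a partner at every step. This is a sketch under assumptions: the function name `tree_sum` is hypothetical, the task count must be a power of two, the eight input values are read from the slide residue, and the pairing order used here may differ from the one drawn in the figures.

```python
def tree_sum(values):
    """Sum a power-of-two-length list with a binomial-tree pattern.

    Returns (total, number_of_communication_steps).
    """
    vals = list(values)
    p = len(vals)
    assert p > 0 and p & (p - 1) == 0, "sketch assumes a power-of-two task count"
    steps = 0
    stride = 1
    while stride < p:
        # Task i + stride "sends" its partial sum to task i.
        for i in range(0, p, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
        steps += 1
    return vals[0], steps

total, steps = tree_sum([1, 7, -6, 4, 4, 5, 8, 2])
print(total, steps)
```

Eight tasks need three steps, that is, log2 of the task count, instead of the seven sequential additions a single task would perform.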
43
Agglomeration
44
Agglomeration
[Figure label: sum]
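After agglomeration each processor holds a block of values, sums it locally with no communication, and only the per-processor partial sums are combined. The sketch below assumes a block distribution; the name `agglomerated_sum` and the input data are hypothetical, and the final `sum(partial)` merely stands in for a parallel reduction across processors.

```python
def agglomerated_sum(values, p):
    """Return (partial sums per processor, global sum) for a block distribution."""
    n = len(values)
    partial = []
    for rank in range(p):
        low, high = rank * n // p, (rank + 1) * n // p
        partial.append(sum(values[low:high]))  # local work, no communication
    return partial, sum(partial)  # combining step: only p values are exchanged

partial, total = agglomerated_sum(list(range(1, 17)), 4)
print(partial, total)
```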
45
The n-body Problem
46
The n-body Problem
47
Partitioning
- Domain partitioning
- Assume one task per particle
- Task has particle’s position, velocity vector
- Iteration
  - Get positions of all other particles
  - Compute new position, velocity
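One iteration of this scheme might look like the sketch below, where the loop body for particle i plays the role of that particle's task. Several details are assumptions not stated on the slide: two-dimensional Newtonian gravity, a gravitational constant of 1, a simple Euler-style update, distinct particle positions, and the function name `nbody_step`.

```python
def nbody_step(pos, vel, mass, dt, g=1.0):
    """Advance every particle by one time step; returns (new_pos, new_vel)."""
    n = len(pos)
    new_pos, new_vel = [], []
    for i in range(n):            # one "task" per particle
        ax = ay = 0.0
        for j in range(n):        # needs the positions of all other particles
            if i == j:
                continue
            dx = pos[j][0] - pos[i][0]
            dy = pos[j][1] - pos[i][1]
            dist = (dx * dx + dy * dy) ** 0.5
            ax += g * mass[j] * dx / dist ** 3
            ay += g * mass[j] * dy / dist ** 3
        vx = vel[i][0] + ax * dt  # new velocity
        vy = vel[i][1] + ay * dt
        new_vel.append((vx, vy))
        new_pos.append((pos[i][0] + vx * dt, pos[i][1] + vy * dt))  # new position
    return new_pos, new_vel

# Hypothetical input: two unit masses at rest, one unit apart.
new_pos, new_vel = nbody_step([(0.0, 0.0), (1.0, 0.0)],
                              [(0.0, 0.0), (0.0, 0.0)],
                              [1.0, 1.0], dt=0.1)
print(new_pos, new_vel)
```

The inner loop over j is the reason every task needs the positions of all the others before it can update its own particle.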
48
Gather
49
All-gather
50
Complete Graph for All-gather
51
Hypercube for All-gather
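A hypercube all-gather can be simulated by letting every task exchange everything it currently holds with the partner whose rank differs in one bit, one bit per step. The sketch below is illustrative only: the name `hypercube_allgather` is hypothetical, it assumes a power-of-two number of tasks, and it models messages as dictionary copies rather than real communication.

```python
def hypercube_allgather(items):
    """Simulate all-gather on a hypercube; returns (per-task results, steps)."""
    p = len(items)
    assert p > 0 and p & (p - 1) == 0, "sketch assumes a power-of-two task count"
    held = [{rank: items[rank]} for rank in range(p)]  # each task starts with its own item
    steps = 0
    bit = 1
    while bit < p:
        snapshot = [dict(h) for h in held]   # what each task holds before this step
        for rank in range(p):
            partner = rank ^ bit             # neighbour along one hypercube dimension
            held[rank].update(snapshot[partner])
        bit <<= 1
        steps += 1
    return [[h[r] for r in range(p)] for h in held], steps

result, steps = hypercube_allgather(list("abcdefgh"))
print(result[0], steps)
```

The amount of data each task holds doubles at every step, so eight tasks finish in three exchanges, whereas a complete-graph exchange would have each task send to seven others.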
52
Communication Time
- Hypercube
- Complete graph
53
Adding Data Input
54
Scatter
55
Scatter in log p Steps
[Figure: items 1 to 8 start on one task, are split into 1234 and 5678, then into 12, 34, 56 and 78]
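The halving pattern suggested by the figure can be simulated as follows: at each step every task that holds data keeps the first half and passes the second half to a task half the remaining span away. This is a sketch with assumptions: the name `scatter_log_steps` is hypothetical, rank 0 is taken as the root, and both the task count being a power of two and the data length being divisible by it are required.

```python
def scatter_log_steps(data, p):
    """Simulate a scatter from rank 0 in log2(p) steps; returns (pieces, steps)."""
    assert p > 0 and p & (p - 1) == 0, "sketch assumes a power-of-two task count"
    assert len(data) % p == 0, "sketch assumes the data divides evenly"
    held = {0: list(data)}   # rank -> data currently held; the root starts with everything
    steps = 0
    span = p                 # number of ranks each current holder is responsible for
    while span > 1:
        half = span // 2
        for rank in list(held):              # copy keys: held grows inside the loop
            block = held[rank]
            mid = len(block) // 2
            held[rank] = block[:mid]         # keep the first half
            held[rank + half] = block[mid:]  # send the second half onward
        span = half
        steps += 1
    return [held[r] for r in range(p)], steps

pieces, steps = scatter_log_steps([1, 2, 3, 4, 5, 6, 7, 8], 4)
print(pieces, steps)
```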
56
Summary: Task/channel Model
- Parallel computation
  - Set of tasks
  - Interactions through channels
- Good designs
  - Maximize local computations
  - Minimize communications
  - Scale up
57
Summary: Design Steps
- Partition computation
- Agglomerate tasks
- Map tasks to processors
- Goals
  - Maximize processor utilization
  - Minimize inter-processor communication
58
Summary: Fundamental Algorithms
- Reduction
- Gather and scatter
- All-gather