Published by Logan Barton. Modified over 9 years ago.
1
INTRODUCTION TO PARALLEL ALGORITHMS
2
Objective
Introduction to Parallel Algorithms
Tasks and Decomposition
Processes and Mapping
Processes Versus Processors
Characteristics of Tasks and Interactions
Parallel Algorithm Design Models
3
What is a Parallel Algorithm?
Imagine you need to find a lost child in the woods. Even in a small area, searching by yourself would be very time-consuming. If you gathered some friends and family to help, you could cover the woods much faster.
4
Sherwood Forest
5
Definition
In computer science, a parallel algorithm or concurrent algorithm, as opposed to a traditional sequential (or serial) algorithm, is an algorithm that can be executed a piece at a time on many different processing devices, with the pieces put back together at the end to get the correct result.
6
Elements of a Parallel Algorithm
Pieces of work that can be done concurrently (tasks)
Mapping of the tasks onto multiple processors (processes vs. processors)
Distribution of input, output, and intermediate data across the different processors
Management of access to shared data, whether input or intermediate
Synchronization of the processors at various points of the parallel execution
7
Decomposition, Tasks, and Dependency Graphs
The first step in developing a parallel algorithm is to decompose the problem into tasks that can be executed concurrently.
A given problem may be decomposed into tasks in many different ways.
Tasks may be of the same, different, or even indeterminate sizes.
A decomposition can be illustrated as a directed graph whose nodes correspond to tasks and whose edges indicate that the result of one task is required for processing the next. Such a graph is called a task dependency graph.
8
Granularity of Task Decompositions
The number of tasks into which a problem is decomposed determines its granularity.
Decomposition into a large number of tasks results in a fine-grained decomposition; decomposition into a small number of tasks results in a coarse-grained decomposition.
(Figure: a coarse-grained counterpart to the dense matrix-vector product example. Each task in this example corresponds to the computation of three elements of the result vector.)
9
Example: Multiplying a Dense Matrix with a Vector Computation of each element of output vector y is independent of other elements. Based on this, a dense matrix-vector product can be decomposed into n tasks. The figure highlights the portion of the matrix and vector accessed by Task 1.
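The row-wise decomposition described on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `row_block_task`, `parallel_matvec`, and `rows_per_task` are hypothetical, and a thread pool stands in for the processing elements. Setting `rows_per_task=1` gives the fine-grained decomposition into n tasks; a larger value gives a coarse-grained one. In standard CPython, threads are unlikely to speed up pure-Python arithmetic, so the sketch shows the structure of the decomposition rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor


def row_block_task(A, x, start, stop):
    """One task: compute y[start:stop] from rows start..stop-1 of A and all of x."""
    return [sum(a * b for a, b in zip(A[i], x)) for i in range(start, stop)]


def parallel_matvec(A, x, rows_per_task=1, workers=4):
    """Split y = A*x into independent tasks of rows_per_task rows each (hypothetical helper)."""
    n = len(A)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each output element depends only on one row of A and on x,
        # so the tasks can be submitted without any ordering constraints.
        futures = [
            pool.submit(row_block_task, A, x, s, min(s + rows_per_task, n))
            for s in range(0, n, rows_per_task)
        ]
        y = []
        for f in futures:  # collect the blocks in row order
            y.extend(f.result())
    return y


A = [[1, 2, 0],
     [0, 1, 3],
     [4, 0, 1]]
x = [1, 2, 3]

fine = parallel_matvec(A, x, rows_per_task=1)    # n tasks, one per element of y
coarse = parallel_matvec(A, x, rows_per_task=3)  # a single task computing three elements
```

Both calls produce the same vector; only the number of tasks, and therefore the granularity, differs.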
10
Example: Database Query Processing
Consider the execution of the query:
MODEL = "CIVIC" AND YEAR = 2001 AND (COLOR = "GREEN" OR COLOR = "WHITE")
on the following database:
ID#    Model     Year   Color   Dealer   Price
4523   Civic     2002   Blue    MN       $18,000
3476   Corolla   1999   White   IL       $15,000
7623   Camry     2001   Green   NY       $21,000
9834   Prius     2001   Green   CA       $18,000
6734   Civic     2001   White   OR       $17,000
5342   Altima    2001   Green   FL       $19,000
3845   Maxima    2001   Blue    NY       $22,000
8354   Accord    2000   Green   VT       $18,000
4395   Civic     2001   Red     CA       $17,000
7352   Civic     2002   Red     WA       $18,000
11
Example: Database Query Processing
The execution of the query can be divided into subtasks in various ways. Each task can be thought of as generating an intermediate table of entries that satisfy a particular clause.
(Figure: decomposing the given query into a number of tasks. Edges in this graph denote that the output of one task is needed to accomplish the next.)
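As a rough sketch of how such a decomposition might look in code, the Python below treats each clause as a task that produces an intermediate set of record IDs, then combines those sets in two different orders. The names `CARS`, `select`, `result_a`, and `result_b` are assumptions made for this illustration, the Dealer and Price columns are omitted because the query does not use them, and the tasks run sequentially here; the point is only which intermediate results each task needs.

```python
# (ID, Model, Year, Color) taken from the database table on the previous slide.
CARS = [
    (4523, "Civic", 2002, "Blue"),
    (3476, "Corolla", 1999, "White"),
    (7623, "Camry", 2001, "Green"),
    (9834, "Prius", 2001, "Green"),
    (6734, "Civic", 2001, "White"),
    (5342, "Altima", 2001, "Green"),
    (3845, "Maxima", 2001, "Blue"),
    (8354, "Accord", 2000, "Green"),
    (4395, "Civic", 2001, "Red"),
    (7352, "Civic", 2002, "Red"),
]


def select(pred):
    """Leaf task: the set of IDs whose record satisfies one clause."""
    return {row[0] for row in CARS if pred(row)}


# Four leaf tasks with no dependencies on each other.
civic = select(lambda r: r[1] == "Civic")
y2001 = select(lambda r: r[2] == 2001)
green = select(lambda r: r[3] == "Green")
white = select(lambda r: r[3] == "White")

# One decomposition: two independent intermediate tasks, then a final join.
civic_2001 = civic & y2001
white_or_green = white | green
result_a = civic_2001 & white_or_green

# Another decomposition: a chain in which each task waits on the previous one.
y2001_and_color = y2001 & white_or_green
result_b = civic & y2001_and_color
```

Both orders select the same single record (ID 6734), but the first leaves two intermediate tasks free to run concurrently while the second serializes them.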
12
Task-Dependency Graph
Key Concepts Derived from the Task-Dependency Graph
Degree of Concurrency: the number of tasks that can be executed concurrently
Critical Path: the longest vertex-weighted path in the graph
The weights represent task size
Task granularity affects both of the above characteristics
13
Critical Path Length A directed path in the task dependency graph represents a sequence of tasks that must be processed one after the other. The longest such path determines the shortest time in which the program can be executed in parallel. The length of the longest path in a task dependency graph is called the critical path length.
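To make the definition concrete, here is a small Python sketch that computes the critical path length of a task dependency graph as the longest vertex-weighted path. The graph `TASKS` and its weights are made up for illustration (they are not taken from the slides' figures), and `critical_path_length` is a hypothetical helper. The sketch also reports total work divided by critical path length, a ratio often used as an average degree of concurrency; that ratio is an addition here, not something the slide states.

```python
from functools import lru_cache

# Hypothetical task graph: name -> (weight, list of tasks it depends on).
TASKS = {
    "t1": (10, []),
    "t2": (10, []),
    "t3": (10, []),
    "t4": (10, []),
    "t5": (6, ["t1", "t2"]),
    "t6": (9, ["t3", "t4"]),
    "t7": (8, ["t5", "t6"]),
}


def critical_path_length(tasks):
    """Length of the longest vertex-weighted path in the dependency graph."""

    @lru_cache(maxsize=None)
    def finish(name):
        # Earliest finish time of a task: its own weight plus the latest
        # finish time among the tasks it depends on.
        weight, deps = tasks[name]
        return weight + max((finish(d) for d in deps), default=0)

    return max(finish(name) for name in tasks)


cpl = critical_path_length(TASKS)              # longest chain: t3 (or t4) -> t6 -> t7
total_work = sum(w for w, _ in TASKS.values())
avg_concurrency = total_work / cpl
```

With these assumed weights the critical path is 10 + 9 + 8 = 27 units, so no parallel execution of this graph can finish in less than 27 units however many processors are available.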
14
Critical Path Length Consider the task dependency graphs of the two database query decompositions:
15
Task-Interaction Graph
Captures the pattern of interaction between tasks.
This graph usually contains the task-dependency graph as a subgraph; that is, there may be interactions between tasks even if there are no dependencies.
These interactions usually occur because of accesses to shared data.
16
Attributes of parallel algorithms
concurrency
scalability
locality
modularity
17
Contd…
Concurrency refers to the ability to perform many actions simultaneously; this is essential if a program is to execute on many processors.
Scalability indicates resilience to increasing processor counts and is equally important, as processor counts appear likely to grow in most environments.
Locality means a high ratio of local memory accesses to remote memory accesses (communication); this is the key to high performance on multicomputer architectures.
Modularity, the decomposition of complex entities into simpler components, is an essential aspect of software engineering, in parallel computing as well as sequential computing.
18
Design Process of Parallel Algorithms
Partitioning. The computation that is to be performed and the data operated on by this computation are decomposed into small tasks. Practical issues such as the number of processors in the target computer are ignored, and attention is focused on recognizing opportunities for parallel execution.
Communication. The communication required to coordinate task execution is determined, and appropriate communication structures and algorithms are defined.
Agglomeration. The task and communication structures defined in the first two stages of a design are evaluated with respect to performance requirements and implementation costs. If necessary, tasks are combined into larger tasks to improve performance or to reduce development costs.
Mapping. Each task is assigned to a processor in a manner that attempts to satisfy the competing goals of maximizing processor utilization and minimizing communication costs. Mapping can be specified statically or determined at runtime by load-balancing algorithms.
19
Contd…
20
Communication
Static: processors are hard-wired to one another through fixed links
Completely Connected
Star Connected
Bounded Degree
Dynamic: processors are connected to a series of switches
21
Agglomeration
22
Processes and Mapping
In general, the number of tasks in a decomposition exceeds the number of processing elements available. For this reason, a parallel algorithm must also provide a mapping of tasks to processes.
Note: We refer to the mapping as being from tasks to processes, as opposed to processors. This is because typical programming APIs do not allow easy binding of tasks to physical processors. Rather, we aggregate tasks into processes and rely on the system to map these processes to physical processors. We use the term process not in the UNIX sense, but simply to mean a collection of tasks and associated data.
23
Processes and Mapping (Cont..) Appropriate mapping of tasks to processes is critical to the parallel performance of an algorithm. Mappings are determined by both the task dependency and task interaction graphs. Task dependency graphs can be used to ensure that work is equally spread across all processes at any point (minimum idling and optimal load balance). Task interaction graphs can be used to make sure that processes need minimum interaction with other processes (minimum communication).
24
Processes and Mapping (Cont..)
An appropriate mapping must minimize parallel execution time by:
Mapping independent tasks to different processes.
Assigning tasks on the critical path to processes as soon as they become available.
Minimizing interaction between processes by mapping tasks with dense interactions to the same process.
25
Processes and Mapping: Example Mapping tasks in the database query decomposition to processes. These mappings were arrived at by viewing the dependency graph in terms of levels (no two nodes in a level have dependencies). Tasks within a single level are then assigned to different processes.
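The level-based approach described above can be sketched roughly as follows. The dependency graph `DEPS` is a hypothetical seven-task graph, not the exact graph from the slides' figure, and `levels` and `map_by_level` are illustrative names. Tasks in the same level have no dependencies on each other, so they are spread across processes in round-robin order.

```python
# Hypothetical dependency graph: task -> list of tasks it depends on.
DEPS = {
    "t1": [], "t2": [], "t3": [], "t4": [],
    "t5": ["t1", "t2"],
    "t6": ["t3", "t4"],
    "t7": ["t5", "t6"],
}


def levels(deps):
    """Level 0 for tasks without dependencies, else 1 + the deepest dependency."""
    level = {}

    def visit(task):
        if task not in level:
            level[task] = 1 + max((visit(d) for d in deps[task]), default=-1)
        return level[task]

    for task in deps:
        visit(task)
    return level


def map_by_level(deps, num_processes):
    """Assign the tasks of each level to different processes, round-robin."""
    level = levels(deps)
    by_level = {}
    for task in deps:  # dicts keep insertion order, so the result is deterministic
        by_level.setdefault(level[task], []).append(task)
    mapping = {}
    for lvl in sorted(by_level):
        for i, task in enumerate(by_level[lvl]):
            mapping[task] = i % num_processes
    return mapping


mapping = map_by_level(DEPS, num_processes=4)
```

This sketch balances only the work within a level. It ignores the task-interaction graph, so a more careful mapping would also try to place a task on a process that already holds one of its inputs.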
26
Parallel Algorithm Models
Master-Slave Model: One or more processes generate work and allocate it to worker processes. This allocation may be static or dynamic.
Pipeline / Producer-Consumer Model: A stream of data is passed through a succession of processes, each of which performs some task on it.
Hybrid Models: A hybrid model may be composed either of multiple models applied hierarchically or of multiple models applied sequentially to different phases of a parallel algorithm.
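As one possible illustration of the pipeline / producer-consumer model, the Python sketch below connects stages with queues, with each stage running in its own thread and acting as the consumer of the previous stage and the producer for the next. The names `stage`, `run_pipeline`, and `SENTINEL` are assumptions made for this example, and threads stand in for the processes of the model.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the data stream


def stage(func, inbox, outbox):
    """One pipeline stage: consume from inbox, apply func, produce to outbox."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            outbox.put(SENTINEL)  # pass the end marker downstream and stop
            break
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Stream items through one thread per function, connected by queues."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(SENTINEL)

    results = []
    while True:
        out = queues[-1].get()
        if out is SENTINEL:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


results = run_pipeline([1, 2, 3], [lambda v: v + 1, lambda v: v * 10])
```

Because each stage is a single thread reading from a FIFO queue, items leave the pipeline in the order they entered. While a later stage works on one item, an earlier stage can already start on the next one.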
27
References
Principles of Parallel Algorithm Design by Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar
http://www.cs.cmu.edu/~scandal/nesl/algorithms.html
http://www-users.cs.umn.edu/~karypis/parbook/
28
Summary
Parallel Algorithm: an algorithm that can be executed a piece at a time on many different processing devices, with the pieces put back together at the end to get the correct result.
Decomposition: decompose the problem into tasks that can be executed concurrently. The number of tasks into which a problem is decomposed determines its granularity.
Task-Dependency Graph: based on this graph, tasks are mapped to processes.
Task-Interaction Graph: captures the pattern of interaction between tasks.
Critical Path Length: a directed path in the task dependency graph represents a sequence of tasks that must be processed one after the other. The length of the longest path in a task dependency graph is called the critical path length.
29
Summary (Cont..)
Attributes of parallel algorithms: concurrency, scalability, locality, and modularity
Design Process of Parallel Algorithms: Partitioning, Communication, Agglomeration, Mapping
30
THANK YOU