Presentation transcript:

Auburn University http://www.eng.auburn.edu/~xqin
COMP7330/7336 Advanced Parallel and Distributed Computing
Speculative Decomposition; Characteristics of Tasks
Dr. Xiao Qin, Auburn University, xqin@auburn.edu
(50 minutes, slides 1-17. Slides are adapted from Drs. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar.)

Recap: Speculative Decomposition
In some applications, dependencies between tasks are not known a priori. Two approaches:
- Conservative approaches identify independent tasks only when they are guaranteed not to have dependencies.
- Optimistic approaches schedule tasks even when they may potentially be erroneous.
Problems: conservative approaches may yield little concurrency; optimistic approaches may require a roll-back mechanism in the case of an error.
Example: discrete event simulation. The central data structure is a time-ordered event list. Events are extracted precisely in time order, processed, and, if required, resulting events are inserted back into the event list. Consider your day as a discrete event system: you get up, get ready, drive to work, work, eat lunch, work some more, drive back, eat dinner, and sleep. Each of these events may be processed independently; however, while driving to work you might meet with an unfortunate accident and not get to work at all, so an optimistic scheduling of the later events will have to be rolled back. (Story: disk simulator.)
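
A minimal sketch, not taken from the slides, of the time-ordered event list that drives a sequential discrete event simulation; the event names and the handler signature are illustrative assumptions.

```python
import heapq

def simulate(initial_events, handlers):
    """Process events in strict time order; a handler may schedule new events."""
    event_list = list(initial_events)      # the central time-ordered event list
    heapq.heapify(event_list)
    while event_list:
        time, event = heapq.heappop(event_list)                # extract the earliest event
        for new_time, new_event in handlers.get(event, lambda t: [])(time):
            heapq.heappush(event_list, (new_time, new_event))  # insert resulting events

# Illustrative "day as a discrete event system": each event schedules the next one.
handlers = {
    "get up":        lambda t: [(t + 1, "drive to work")],
    "drive to work": lambda t: [(t + 1, "work")],
    "work":          lambda t: [(t + 8, "drive back")],
}
simulate([(0, "get up")], handlers)
```

A speculative (optimistic) simulator would process several of these events concurrently and roll the later ones back if an earlier event, such as the accident on the way to work, invalidates them; a conservative simulator would only process an event once it is guaranteed to be unaffected by anything earlier.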

Speculative Decomposition: Example
The simulation of a network of nodes (e.g., an assembly line or a computer network through which packets pass). The task is to simulate the behavior of this network for various inputs and node delay parameters (note that networks may become unstable for certain values of service rates, queue sizes, etc.).
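
A minimal sequential sketch, with assumed names and delays, of the kind of node network being simulated: packets pass through a chain of stages, each with its own service delay. This is the baseline that a speculative parallel simulator would try to decompose by letting downstream nodes run ahead on predicted inputs.

```python
def simulate_pipeline(arrivals, delays):
    """Push packets through a chain of nodes; each node has a fixed service delay."""
    stage_free = [0] * len(delays)         # time at which each node becomes idle again
    finish_times = []
    for arrival in arrivals:
        t = arrival
        for i, d in enumerate(delays):
            t = max(t, stage_free[i]) + d  # wait for the node, then get serviced
            stage_free[i] = t
        finish_times.append(t)
    return finish_times

# Three nodes with service delays 2, 5, and 1; packets arrive at t = 0, 1, 2.
print(simulate_pipeline([0, 1, 2], [2, 5, 1]))   # -> [8, 13, 18]
```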

Problems of Recursive Decomposition
Q1: In quicksort, recursive decomposition results in how many tasks? Answer: O(n).
Q2: What limits concurrency? Dependencies and uneven task sizes.
Often, a mix of decomposition techniques is necessary for decomposing a problem. Consider the following examples:
- In quicksort, recursive decomposition alone limits concurrency (why?); a mix of data and recursive decomposition is more desirable.
- In discrete event simulation, there might be concurrency in task processing; a mix of speculative decomposition and data decomposition may work well.
- Even for simple problems like finding the minimum of a list of numbers, a mix of data and recursive decomposition works well.
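
A minimal sketch (helper names are illustrative) of why recursive decomposition of quicksort produces on the order of n tasks and why dependencies limit concurrency: every call is a task, but neither child task can start until its parent has finished the partition step.

```python
def quicksort_tasks(a, lo, hi, tasks):
    """Recursive decomposition: each call is one task; children depend on the parent."""
    if lo >= hi:
        return
    tasks.append((lo, hi))                  # record this task and its extent
    pivot, i = a[hi], lo
    for j in range(lo, hi):                 # the partition step is sequential work
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    quicksort_tasks(a, lo, i - 1, tasks)    # child task 1: only ready after partition
    quicksort_tasks(a, i + 1, hi, tasks)    # child task 2: independent of child task 1

data = [5, 12, 11, 1, 10, 6, 8, 3, 7, 4, 9, 2]
tasks = []
quicksort_tasks(data, 0, len(data) - 1, tasks)
print(sorted(data) == data, len(tasks), "tasks for", len(data), "elements")
```

Because the root task must partition the entire array before any other task exists, concurrency ramps up slowly and task sizes are uneven, which is why mixing in a data decomposition (for example, parallelizing the partition step itself) is more desirable.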

Hybrid Decompositions
A mix of data and recursive decomposition is often more desirable than either alone.
Q3: How do we use a hybrid decomposition?
- In discrete event simulation, there might be concurrency in task processing; a mix of speculative decomposition and data decomposition may work well.
- Even for simple problems like finding the minimum of a list of numbers, a mix of data and recursive decomposition works well (see the sketch below).
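
A minimal sketch (chunk count and names are assumptions) of the hybrid mentioned above for finding the minimum of a list: data decomposition yields independent partial minima, and a recursive, tree-shaped reduction combines them.

```python
def hybrid_min(xs, chunks=4):
    """Data decomposition into chunks, then a recursive pairwise reduction."""
    size = (len(xs) + chunks - 1) // chunks
    # Data decomposition: each chunk's minimum is an independent task.
    partial = [min(xs[i:i + size]) for i in range(0, len(xs), size)]
    # Recursive decomposition: combine partial minima pairwise (a reduction tree).
    while len(partial) > 1:
        combined = [min(a, b) for a, b in zip(partial[0::2], partial[1::2])]
        if len(partial) % 2:               # carry the odd element up a level
            combined.append(partial[-1])
        partial = combined
    return partial[0]

print(hybrid_min([7, 3, 9, 1, 8, 2, 6, 5, 4], chunks=3))   # -> 1
```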

Characteristics of Tasks
Step 1: Decomposition. Step 2: Map tasks to processes.
Task characteristics critically impact the choice and performance of parallel algorithms.
Q4: What are the relevant task characteristics?
- Task generation
- Task sizes
- Size of data associated with tasks

Static Task Generation
Concurrent tasks are identified a priori.
Classic examples: matrix operations, graph algorithms, image-processing applications, and other regularly structured problems.
Q5: Which decomposition techniques can we apply: data, speculative, recursive, or exploratory decomposition?
Answer: these problems can typically be decomposed using data or recursive decomposition techniques.
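
A minimal sketch (function and parameter names are assumptions) of static task generation through data decomposition: in a dense matrix-vector product y = A x, one task per block of rows can be identified before the computation starts, and all tasks are uniform in size.

```python
def matvec_row_tasks(A, x, n_tasks):
    """Data decomposition of y = A x: each task owns a contiguous block of rows."""
    n = len(A)
    block = (n + n_tasks - 1) // n_tasks
    # All tasks are identified a priori and have (nearly) uniform size.
    tasks = [range(start, min(start + block, n)) for start in range(0, n, block)]
    y = [0] * n
    for rows in tasks:                       # each iteration could run concurrently
        for i in rows:
            y[i] = sum(A[i][j] * x[j] for j in range(len(x)))
    return y

A = [[1, 2], [3, 4], [5, 6], [7, 8]]
print(matvec_row_tasks(A, [1, 1], n_tasks=2))   # -> [3, 7, 11, 15]
```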

Dynamic Task Generation
Tasks are generated as we perform the computation.
Classic example: game playing, where each 15-puzzle board is generated from the previous one.
Q6: Which decomposition techniques can we apply: data, speculative, recursive, or exploratory decomposition?
Answer: these applications are typically decomposed using exploratory or speculative decomposition.
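
A minimal sketch (the board encoding is an assumption) of dynamic task generation in the 15 puzzle: successor boards, i.e. new tasks, only come into existence as earlier boards are expanded.

```python
def successors(board, width=4):
    """Generate the boards reachable by one move; 0 marks the blank tile."""
    blank = board.index(0)
    r, c = divmod(blank, width)
    new_tasks = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):     # up, down, left, right
        nr, nc = r + dr, c + dc
        if 0 <= nr < width and 0 <= nc < width:
            nxt = list(board)
            j = nr * width + nc
            nxt[blank], nxt[j] = nxt[j], nxt[blank]        # slide the neighboring tile
            new_tasks.append(tuple(nxt))                   # each successor is a new task
    return new_tasks

# Tasks (boards) are created on the fly: expanding one board yields 2-4 new ones.
start = tuple(list(range(1, 16)) + [0])        # solved board, blank in the last cell
print(len(successors(start)))                  # -> 2
```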

Task Sizes
Task sizes may be uniform (i.e., all tasks are the same size) or non-uniform.
Q7: When multiplying a dense matrix with a vector, are the tasks uniform or non-uniform, and can the task sizes be determined a priori? Answer: uniform, and they can be determined a priori.
Non-uniform task sizes may or may not be determinable (or estimable) a priori. Examples in the latter class include discrete optimization problems, in which it is difficult to estimate the effective size of the state space.

Size of Data Associated with Tasks
Q8: Why is the data size important? Because of data movement overhead.
The size of data associated with a task may be small or large when viewed in the context of the size of the task. A small context implies that an algorithm can easily communicate the task to other processes dynamically (e.g., the 15 puzzle). A large context ties the task to a process; alternatively, an algorithm may attempt to reconstruct the context at another process rather than communicating it (e.g., 0/1 integer programming).
Input and output data may each be small or large: small input and large output, large input and small output, and so on.

Size of Input and Output Data
Input data may be small or large; so may output data.
Q9: The 15-puzzle problem: is its input and output data large or small? Answer: small input and small output.

Size of Input and Output Data
Q10: Computing the minimum of a sequence: is its input and output data large or small? Answer: large input and small output.

Size of Input and Output Data
Q11: The quicksort problem: is its input and output data large or small? Answer: large input and large output.

Characteristics of Task Interactions
Tasks may communicate with each other in various ways. The associated dichotomy is:
- Static interactions: the tasks and their interactions are known a priori.
- Dynamic interactions: the timing or the set of interacting tasks cannot be determined a priori.
Q12: Which is harder to program? Static interactions are relatively simple to code into programs; dynamic interactions are harder to code, especially, as we shall see, using message-passing APIs.

Characteristics of Task Interactions
Static interactions vs. dynamic interactions.
Q13: Which type of task interaction occurs in these applications?
- Dense matrix-vector multiplication: static.
- The 15-puzzle problem solved by breadth-first search: dynamic.
Static interactions are relatively simple to code into programs; dynamic interactions are harder to code, especially, as we shall see, using message-passing APIs.

Characteristics of Task Interactions
- Regular interactions: there is a definite pattern (in the graph sense) to the interactions; these patterns can be exploited for efficient implementation.
- Irregular interactions: interactions lack well-defined topologies (e.g., sparse matrix-vector multiplication).

Characteristics of Task Interactions
Regular or irregular interactions? A simple example of a regular static interaction pattern is image dithering: the underlying communication pattern is a structured (2-D mesh) one. Answer: regular.
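
A minimal sketch of why dithering induces a regular 2-D mesh interaction pattern; the Floyd-Steinberg-style error-diffusion weights used here are an illustrative assumption, not taken from the slides. Each pixel task exchanges data only with the same fixed set of neighboring pixels.

```python
def dither(img):
    """Error-diffusion dithering: every pixel interacts with the same fixed neighbors."""
    h, w = len(img), len(img[0])
    img = [row[:] for row in img]          # work on a copy of the grayscale image
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            out[i][j] = 255 if img[i][j] >= 128 else 0
            err = img[i][j] - out[i][j]
            # Regular interaction pattern: the same 2-D mesh neighbors for every pixel.
            for di, dj, wgt in ((0, 1, 7 / 16), (1, -1, 3 / 16), (1, 0, 5 / 16), (1, 1, 1 / 16)):
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w:
                    img[ni][nj] += err * wgt
    return out

print(dither([[100, 200], [150, 50]]))     # -> [[0, 255], [255, 0]]
```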

Characteristics of Task Interactions
Regular or irregular interactions? The multiplication of a sparse matrix with a vector is a good example of a static irregular interaction pattern: which elements of the vector a task needs depends on the sparsity structure of the matrix, so the interactions lack a well-defined topology. Answer: irregular.
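
A minimal sketch (the CSR arrays are illustrative) showing where the irregular pattern comes from: in y = A x with a sparse A, the entries of x that the task for row i must read are exactly the column indices of that row's nonzeros, which follow no fixed topology.

```python
def csr_matvec(indptr, indices, data, x):
    """Sparse matrix-vector product in compressed sparse row (CSR) form."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):                               # one task per row
        # Irregular interaction: row i needs x[j] only for its nonzero columns j.
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# A 3x3 sparse matrix: row 0 touches columns {0, 2}, row 1 touches {1}, row 2 touches {0, 1, 2}.
indptr  = [0, 2, 3, 6]
indices = [0, 2, 1, 0, 1, 2]
data    = [4.0, 1.0, 3.0, 2.0, 5.0, 6.0]
print(csr_matvec(indptr, indices, data, [1.0, 1.0, 1.0]))   # -> [5.0, 3.0, 13.0]
```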

Characteristics of Task Interactions: Read-Only vs. Read-Write
Interactions may be read-only or read-write.
- Read-only interactions: tasks only read data items associated with other tasks.
- Read-write interactions: tasks read as well as modify data items associated with other tasks.
Read-write interactions are harder to code. Q14: Why? Because they require additional synchronization primitives.
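
A minimal sketch (the thread count and the shared structure are assumptions) of why read-write interactions need additional synchronization: two tasks updating a shared best-so-far value must serialize their read-modify-write, here with a mutex.

```python
import threading

best = {"cost": float("inf")}              # data item shared by all tasks (read-write)
lock = threading.Lock()

def explore(candidate_costs):
    for c in candidate_costs:
        # Read-write interaction: the check-and-update must be atomic, hence the lock.
        with lock:
            if c < best["cost"]:
                best["cost"] = c

threads = [threading.Thread(target=explore, args=([9, 4, 7],)),
           threading.Thread(target=explore, args=([6, 3, 8],))]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(best["cost"])                        # -> 3
```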

Database Query Processing: Read-Only or Read-Write?
The given query is decomposed into a number of tasks; edges in the task-dependency graph denote that the output of one task is needed to accomplish the next.

Characteristics of Task Interactions: One-Way vs. Two-Way
Interactions may be one-way or two-way.
- A one-way interaction can be initiated and completed by only one of the two interacting tasks.
- A two-way interaction requires participation from both tasks involved in the interaction.
One-way interactions are somewhat harder to code in message-passing APIs.

Characteristics of Task Interactions: One-Way or Two-Way?
Consider again the multiplication of a sparse matrix with a vector, the earlier example of a static irregular interaction pattern. Answer: two-way.

Summary
- Speculative decomposition
- Hybrid decompositions
- Characteristics of tasks: task generation, task sizes, and the size of data associated with tasks.