CISC 372, 5 October

Goals for today:
Foster’s parallel algorithm design
–Partitioning
–Task dependency graph
Granularity
Concurrency
Collective communication



Task/Channel Model
(diagram: tasks, shown as nodes, connected by channels carrying data between them)

Parallel Algorithm Design
Large problem
–May be complex
–May have a large data set
–May be “embarrassingly parallel”
How to “solve” the problem? (Foster’s methodology)
–Partition the problem & data
–Determine communication requirements
–Agglomerate tasks into a more efficient size
–Map tasks to processors

Foster’s Methodology

Partitioning
Dividing computation and data into pieces
Domain decomposition
–Divide the data into pieces
–Determine how to associate computations with the data
Functional decomposition
–Divide the computation into pieces
–Determine how to associate data with the computations
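
To make domain decomposition concrete, here is a small C sketch (illustrative, not from the lecture; the helper names are made up, in the style of the block-distribution formulas in Quinn's textbook): task id of p owns a contiguous slice of an n-element array, with slice sizes differing by at most one element.

```c
#include <stdio.h>

/* Block decomposition of n elements over p tasks.
   Task id owns elements in the half-open range [low, high). */
static long block_low(int id, int p, long n)  { return id * n / p; }
static long block_high(int id, int p, long n) { return (long)(id + 1) * n / p; }

int main(void) {
    long n = 1000;   /* problem size (illustrative) */
    int  p = 4;      /* number of tasks (illustrative) */
    for (int id = 0; id < p; id++)
        printf("task %d owns [%ld, %ld)\n",
               id, block_low(id, p, n), block_high(id, p, n));
    return 0;
}
```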

Partitioning Checklist
At least 10x more primitive tasks than processors in the target computer
Minimize redundant computations and redundant data storage
Primitive tasks roughly the same size
Number of tasks an increasing function of problem size

Foster’s Methodology

Communication
Determine the values passed among tasks
Local communication
–Task needs values from a small number of other tasks
–Create channels illustrating the data flow
Global communication
–A significant number of tasks contribute data to perform a computation
–Don’t create channels for them early in the design
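
The distinction matters in MPI code. A minimal sketch (the setup is assumed, not from the slides): exchanging a boundary value with one neighbor is local communication, while a sum over all tasks is global communication and is best written as a single collective rather than as many point-to-point channels.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    MPI_Init(&argc, &argv);
    int id, p;
    MPI_Comm_rank(MPI_COMM_WORLD, &id);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    double mine = (double)id;   /* this task's local value (illustrative) */
    double from_left = 0.0;
    int left  = (id == 0)     ? MPI_PROC_NULL : id - 1;
    int right = (id == p - 1) ? MPI_PROC_NULL : id + 1;

    /* Local communication: exchange boundary values with one neighbor. */
    MPI_Sendrecv(&mine, 1, MPI_DOUBLE, right, 0,
                 &from_left, 1, MPI_DOUBLE, left, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Global communication: every task contributes to one result;
       use a collective instead of creating p*p channels. */
    double total;
    MPI_Allreduce(&mine, &total, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    printf("task %d: left value %g, global sum %g\n", id, from_left, total);
    MPI_Finalize();
    return 0;
}
```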

Communication Checklist
Communication operations balanced among tasks
Each task communicates with only a small group of neighbors
Tasks can perform communications concurrently
Tasks can perform computations concurrently

Foster’s Methodology

Agglomeration
Grouping tasks into larger tasks
Goals
–Improve performance
–Maintain scalability of the program
–Simplify programming
In MPI programming, the goal is often to create one agglomerated task per processor
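
A small sketch of what agglomeration buys (plain C; the function and names are illustrative): once the primitive "update one element" tasks are grouped into one task per process, the channels between elements inside a block turn into ordinary memory reads, and only the two block edges still need messages from neighboring processes.

```c
/* Primitive-task view: one task per array element, each reading its
   neighbors' values over channels.
   Agglomerated view: one task per process updates the whole block
   [lo, hi), so neighbor "communication" inside the block is a plain
   array access. Assumes 1 <= lo and hi <= n-1 so that i-1 and i+1
   stay in bounds. */
void update_block(const double *prev, double *next, long lo, long hi) {
    for (long i = lo; i < hi; i++)
        next[i] = 0.5 * (prev[i - 1] + prev[i + 1]);
}
```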

Agglomeration to Improve Performance
Eliminate communication between primitive tasks agglomerated into a consolidated task
Combine groups of sending and receiving tasks

Agglomeration Checklist
Locality of the parallel algorithm has increased
Replicated computations take less time than the communications they replace
Data replication doesn’t affect scalability
Agglomerated tasks have similar computational and communication costs
Number of tasks increases with problem size
Number of tasks suitable for likely target systems
Tradeoff between agglomeration and code modification costs is reasonable

Foster’s Methodology

Mapping
The process of assigning tasks to processors
Centralized multiprocessor: mapping done by the operating system
Distributed memory system: mapping done by the user
Conflicting goals of mapping
–Maximize processor utilization
–Minimize interprocessor communication
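
Two standard static mappings illustrate the conflicting goals (a sketch; the function names are made up): a block mapping keeps neighboring tasks on the same processor and so minimizes communication, while a cyclic mapping spreads uneven work across processors and so improves utilization.

```c
/* Which processor owns task i, for n tasks mapped onto p processors? */

/* Block mapping: contiguous runs of tasks per processor. */
int block_owner(long i, int p, long n) { return (int)((p * (i + 1) - 1) / n); }

/* Cyclic mapping: tasks dealt out round-robin. */
int cyclic_owner(long i, int p) { return (int)(i % p); }
```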

Mapping Example

Optimal Mapping
Finding an optimal mapping is NP-hard
Must rely on heuristics

Mapping Checklist
Considered designs based on one task per processor and on multiple tasks per processor
Evaluated static and dynamic task allocation
If dynamic task allocation is chosen, the task allocator is not a bottleneck to performance
If static task allocation is chosen, the ratio of tasks to processors is at least 10:1

Complex Mesh Decomposition
(figures: a mesh cube with a hollow sphere inside, and a cross-section of the cube)
Finite number of tetrahedra
Each tetrahedron varies in size

Tetra-he-who?
Tetrahedron: a solid having four triangular faces
(figures: “Maybe not this.” / “This is a tetrahedron.”)

Static Task Number, Unstructured Communication
Mesh cube partitioned with METIS

Edge Detection
Finite number of pixels
All pixels the same size
All pixel values constrained (0-255)
Stencil computation
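
As a sketch of the stencil computation (a simple gradient-magnitude stencil; the lecture does not specify which operator is used): each output pixel depends only on a fixed neighborhood of input pixels, which is exactly the access pattern that favors domain decomposition with halo exchange at block boundaries.

```c
#include <stdlib.h>

/* One stencil sweep: each output pixel depends only on its 4-neighborhood.
   img and out are w*h bytes, row-major; borders are left untouched for
   brevity. */
void edge_detect(const unsigned char *img, unsigned char *out, int w, int h) {
    for (int y = 1; y < h - 1; y++) {
        for (int x = 1; x < w - 1; x++) {
            int gx = img[y * w + (x + 1)] - img[y * w + (x - 1)];
            int gy = img[(y + 1) * w + x] - img[(y - 1) * w + x];
            int g  = abs(gx) + abs(gy);                 /* gradient magnitude */
            out[y * w + x] = (unsigned char)(g > 255 ? 255 : g);  /* clamp to 0-255 */
        }
    }
}
```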

Electromagnetic Fields
Finite number of points
Laplace’s equation (Jacobi method: iterate until the solution converges)
Convergence time varies at each point
Rough-surfaced sphere
Radiation source at the center
Measure field strength on the surface
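
A serial sketch of one Jacobi sweep for Laplace's equation on a 2-D grid (the grid size and data layout are assumptions): each interior point is replaced by the average of its four neighbors, and the iteration stops once the largest per-point change falls below a tolerance.

```c
#include <math.h>

#define N 64   /* grid is N x N; the boundary values are held fixed */

/* One Jacobi sweep from u into v; returns the largest change at any point. */
double jacobi_sweep(double u[N][N], double v[N][N]) {
    double maxdiff = 0.0;
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++) {
            v[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);
            double d = fabs(v[i][j] - u[i][j]);
            if (d > maxdiff) maxdiff = d;
        }
    return maxdiff;
}
/* Driver idea: swap the roles of u and v after each sweep and repeat
   until jacobi_sweep(...) drops below the chosen tolerance. */
```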

Parallel Tree Search
Unbalanced tree
Nodes may change from active to inactive at any time
Searching for a specific object in the set of all active nodes

Online Sort
Receive the list in separate pieces at different times (constant updates)
Keep the list sorted at all times
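
One way to meet the "always sorted" requirement (a sketch assuming single-element updates into a plain array; a balanced tree or skip list would give better asymptotics for large lists): binary-search for the insertion point, then shift the tail.

```c
#include <string.h>

/* Insert x into the sorted array a[0..n), keeping it sorted.
   Caller guarantees capacity for n+1 elements; returns the new length. */
long sorted_insert(int *a, long n, int x) {
    long lo = 0, hi = n;
    while (lo < hi) {               /* binary search for the insertion point */
        long mid = (lo + hi) / 2;
        if (a[mid] < x) lo = mid + 1;
        else            hi = mid;
    }
    memmove(&a[lo + 1], &a[lo], (n - lo) * sizeof a[0]);  /* shift tail right */
    a[lo] = x;
    return n + 1;
}
```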