Designing and Evaluating Parallel Programs
Anda Iamnitchi, Federated Distributed Systems, Fall 2006
Textbook (online): Designing and Building Parallel Programs by Ian Foster

Parallel Machines

Flynn's Taxonomy
First proposed by Michael J. Flynn in 1966:
- SISD: single instruction, single data
- MISD: multiple instruction, single data
- SIMD: single instruction, multiple data
- MIMD: multiple instruction, multiple data

A Parallel Programming Model: Tasks and Channels
Task operations: send message, receive message, create task, terminate.
In practice:
1. Message passing (MPI)
2. Data parallelism
3. Shared memory
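To make the task operations concrete, here is a minimal sketch in C using MPI point-to-point calls; the payload and tag values are arbitrary illustrative choices, not from the slides.

```c
/* Minimal sketch: two tasks exchanging a message with MPI.
   Payload and tag are arbitrary illustrative choices. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);               /* create the task environment */
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        int msg = 42;                     /* arbitrary payload */
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);    /* send msg */
    } else if (rank == 1) {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                         /* receive msg */
        printf("task 1 received %d from task 0\n", msg);
    }
    MPI_Finalize();                       /* terminate */
    return 0;
}
```

Run with at least two processes, e.g. `mpirun -np 2 ./a.out`.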

Parallel Algorithms Examples: Finite Difference
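As a hedged illustration of the finite difference pattern, the sketch below performs one sweep of a 1-D three-point stencil; the particular update rule, x_new[i] = (x[i-1] + 2*x[i] + x[i+1]) / 4, is assumed here, matching the example in Foster's text. Each interior point can be viewed as one fine-grained task.

```c
/* One Jacobi-style sweep of a 1-D three-point finite-difference stencil.
   Stencil assumed from Foster's example:
   x_new[i] = (x[i-1] + 2*x[i] + x[i+1]) / 4.  Boundaries held fixed. */
#include <stddef.h>

void fd_sweep(const double *x, double *x_new, size_t n) {
    if (n < 2) return;
    x_new[0] = x[0];                     /* boundary values held fixed */
    x_new[n - 1] = x[n - 1];
    for (size_t i = 1; i + 1 < n; i++)
        x_new[i] = (x[i - 1] + 2.0 * x[i] + x[i + 1]) / 4.0;
}
```

In a domain-decomposed version, each task would own a block of `x` and exchange one boundary value with each neighbor per sweep.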

Parallel Algorithms Examples: Pairwise Interactions
Molecular dynamics: the total force f_i acting on atom X_i is the sum of the pairwise contributions from all other atoms.
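A minimal sketch of the pairwise-interaction pattern follows; the pair force used here is a toy stand-in for illustration, not a real molecular dynamics potential.

```c
/* Pairwise interactions: the total force f_i on atom X_i is the sum of
   contributions from every other atom, an O(N^2) double loop. */
#include <stddef.h>

/* Toy 1-D pair force, illustrative only (not a physical potential). */
static double pair_force(double xi, double xj) {
    double d = xj - xi;
    return (d > 0 ? 1.0 : -1.0) / (d * d + 1e-9);
}

void total_forces(const double *x, double *f, size_t n) {
    for (size_t i = 0; i < n; i++) {
        f[i] = 0.0;
        for (size_t j = 0; j < n; j++)
            if (j != i)
                f[i] += pair_force(x[i], x[j]);  /* f_i = sum over j != i */
    }
}
```

A parallel version assigns each task a block of atoms; exploiting the symmetry F(X_i, X_j) = -F(X_j, X_i) halves the work but complicates communication.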

Parallel Algorithms Examples: Search

Parallel Algorithms Examples: Parameter Study

Parallel Program Design

Partitioning
Domain decomposition: decompose the data associated with the problem first, then associate computation with each piece of data.
Functional decomposition: decompose the computation to be performed first, then deal with the data it requires.
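As a small concrete example of domain decomposition, the sketch below computes the block of a 1-D array owned by each task; the remainder-spreading rule is one common convention, assumed here for illustration.

```c
/* Block domain decomposition: task t of p owns a contiguous slice of an
   n-element array, with the remainder spread over the first n % p tasks. */
#include <stdio.h>

void block_range(int n, int p, int task, int *lo, int *hi) {
    int base = n / p, rem = n % p;
    *lo = task * base + (task < rem ? task : rem);
    *hi = *lo + base + (task < rem ? 1 : 0);   /* exclusive upper bound */
}

int main(void) {
    int lo, hi;
    for (int t = 0; t < 4; t++) {              /* n = 10 elements, p = 4 tasks */
        block_range(10, 4, t, &lo, &hi);
        printf("task %d owns [%d, %d)\n", t, lo, hi);
    }
    return 0;
}
```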

Partitioning Design Checklist
- Does your partition define at least an order of magnitude more tasks than there are processors in your target computer?
- Does your partition avoid redundant computation and storage requirements?
- Are tasks of comparable size?
- Does the number of tasks scale with problem size? Ideally, an increase in problem size should increase the number of tasks rather than the size of individual tasks.
- Have you identified several alternative partitions? You can maximize flexibility in subsequent design stages by considering alternatives now. Remember to investigate both domain and functional decompositions.

Communication
- Local / Global
- Unstructured and Dynamic
- Asynchronous

Communication Design Checklist
- Do all tasks perform about the same number of communication operations?
- Does each task communicate only with a small number of neighbors?
- Are communication operations able to proceed concurrently?
- Is the computation associated with different tasks able to proceed concurrently?

Agglomeration

Increasing Granularity

Replicating Computation
Example: sum N values and leave the result on every node.
- Array (accumulate, then broadcast): 2(N-1) steps
- Tree (reduce, then broadcast): 2 log N steps
- Ring instead of array: N-1 steps, by replicating the additions on every node (see the sketch below)
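The ring variant works because every task replicates the same additions as the partial values circulate past it. A sketch in C with MPI follows; the use of MPI_Sendrecv and the per-task payload are illustrative assumptions, not from the slides.

```c
/* Ring summation with replicated computation: each of N tasks performs
   all the additions itself, so every task holds the total after N-1 steps. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = rank + 1.0;     /* this task's contribution (placeholder) */
    double sum = local;            /* running total, replicated everywhere */
    double outgoing = local, incoming;
    int right = (rank + 1) % size, left = (rank - 1 + size) % size;

    for (int step = 0; step < size - 1; step++) {
        /* Pass along the value received last step; pick up a new one. */
        MPI_Sendrecv(&outgoing, 1, MPI_DOUBLE, right, 0,
                     &incoming, 1, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        sum += incoming;           /* the replicated addition */
        outgoing = incoming;
    }
    printf("task %d: sum = %g\n", rank, sum);
    MPI_Finalize();
    return 0;
}
```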


Agglomeration Design Checklist
- Has agglomeration reduced communication costs by increasing locality?
- If agglomeration has replicated computation, have you verified that the benefits of this replication outweigh its costs, for a range of problem sizes and processor counts?
- If agglomeration replicates data, have you verified that this does not compromise the scalability of your algorithm by restricting the range of problem sizes or processor counts that it can address?
- Has agglomeration yielded tasks with similar computation and communication costs?
- Does the number of tasks still scale with problem size?
- If agglomeration eliminated opportunities for concurrent execution, have you verified that there is sufficient concurrency for current and future target computers?
- Can the number of tasks be reduced still further, without introducing load imbalances, increasing software engineering costs, or reducing scalability?
- If you are parallelizing an existing sequential program, have you considered the cost of the modifications required to the sequential code?

Mapping

Recursive Bisection
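The slide's figure is not part of the transcript; as a hedged stand-in, the sketch below shows recursive coordinate bisection in one dimension: sort the points along a coordinate, then recursively halve both the point set and the processor set, so each processor receives a spatially contiguous, equal-sized share.

```c
/* Recursive (coordinate) bisection sketch in 1-D: recursively split the
   sorted point set and the processor range in half. Illustrative only. */
#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

/* Assign points[lo..hi) to processors [p0, p0 + np). */
static void bisect(int lo, int hi, int p0, int np, int *owner) {
    if (np == 1) {
        for (int i = lo; i < hi; i++) owner[i] = p0;
        return;
    }
    int mid = lo + (hi - lo) / 2;              /* median split of the points */
    bisect(lo, mid, p0, np / 2, owner);
    bisect(mid, hi, p0 + np / 2, np - np / 2, owner);
}

int main(void) {
    double pts[8] = {0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4};
    int owner[8];
    qsort(pts, 8, sizeof pts[0], cmp_double);  /* order along the coordinate */
    bisect(0, 8, 0, 4, owner);                 /* 8 points onto 4 processors */
    for (int i = 0; i < 8; i++)
        printf("point %.1f -> processor %d\n", pts[i], owner[i]);
    return 0;
}
```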

Task Scheduling

Mapping Design Checklist
- If considering an SPMD design for a complex problem, have you also considered an algorithm based on dynamic task creation and deletion?
- If considering a design based on dynamic task creation and deletion, have you also considered an SPMD algorithm?
- If using a centralized load-balancing scheme, have you verified that the manager will not become a bottleneck?
- If using a dynamic load-balancing scheme, have you evaluated the relative costs of different strategies?
- If using probabilistic or cyclic methods, do you have a large enough number of tasks to ensure reasonable load balance? Typically, at least ten times as many tasks as processors are required.

Case Study: Atmosphere Model

Approaches to Performance Evaluation
Amdahl's Law
Developing models:
- Execution time: computation time, communication time, idle time
- Efficiency and speedup
Scalability analysis:
- With fixed problem size
- With scaled problem size
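For reference, the standard definitions behind these terms (textbook formulas, not taken from the slides) are:

```latex
% Speedup, efficiency, and the execution-time decomposition.
\[
  S(N) = \frac{T_1}{T_N}, \qquad
  E(N) = \frac{S(N)}{N}, \qquad
  T = T_{\mathrm{comp}} + T_{\mathrm{comm}} + T_{\mathrm{idle}}
\]
% Amdahl's law: if a fraction f of the work is inherently serial,
\[
  S(N) \le \frac{1}{f + (1 - f)/N} \xrightarrow{\,N \to \infty\,} \frac{1}{f}
\]
```

Amdahl's law says the serial fraction f caps the achievable speedup at 1/f, which is why scalability analysis also considers scaled problem sizes, where the serial fraction typically shrinks as the problem grows.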