Parallel Programming in C with MPI and OpenMP

Slides:

Advertisements

Similar presentations

Distributed Systems Major Design Issues Presented by: Christopher Hector CS8320 – Advanced Operating Systems Spring 2007 – Section 2.6 Presentation Dr.

Advertisements

Practical techniques & Examples

1 Chapter 1 Why Parallel Computing? An Introduction to Parallel Programming Peter Pacheco.

2 Less fish … More fish! Parallelism means doing multiple things at the same time: you can get more work done in the same time.

Parallel Programming in C with MPI and OpenMP Michael J. Quinn.

CISC October Goals for today: Foster’s parallel algorithm design –Partitioning –Task dependency graph Granularity Concurrency Collective communication.

Reference: Message Passing Fundamentals.

CS 584. A Parallel Programming Model We need abstractions to make it simple. The programming model needs to fit our parallel machine model. Abstractions.

1 Friday, September 29, 2006 If all you have is a hammer, then everything looks like a nail. -Anonymous.

Parallel MIMD Algorithm Design

An Introduction to Parallel Computing Dr. David Cronk Innovative Computing Lab University of Tennessee Distribution A: Approved for public release; distribution.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.

Multi-Tasking Models and Algorithms

Parallel Programming Models and Paradigms

Operating Systems CS208. What is Operating System? It is a program. It is the first piece of software to run after the system boots. It coordinates the.

Strategies for Implementing Dynamic Load Sharing.

Today Objectives Chapter 6 of Quinn Creating 2-D arrays Thinking about “grain size” Introducing point-to-point communications Reading and printing 2-D.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.

1 Parallel Computing 5 Parallel Application Design Ondřej Jakl Institute of Geonics, Academy of Sci. of the CR.

Reference: / Parallel Programming Paradigm Yeni Herdiyeni Dept of Computer Science, IPB.

LIGO-G Z 8 June 2001L.S.Finn/LDAS Camp1 How to think about parallel programming.

1 Lecture 20: Parallel and Distributed Systems n Classification of parallel/distributed architectures n SMPs n Distributed systems n Clusters.

1 Distributed Operating Systems and Process Scheduling Brett O’Neill CSE 8343 – Group A6.

Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.

Designing and Evaluating Parallel Programs Anda Iamnitchi Federated Distributed Systems Fall 2006 Textbook (on line): Designing and Building Parallel Programs.

Multiple Processor Systems. Multiprocessor Systems Continuous need for faster and powerful computers –shared memory model ( access nsec) –message passing.

Silberschatz, Galvin and Gagne  2002 Modified for CSCI 399, Royden, Operating System Concepts Operating Systems Lecture 1 Introduction Read:

Chapter 3 Parallel Algorithm Design. Outline Task/channel model Task/channel model Algorithm design methodology Algorithm design methodology Case studies.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Parallel Programming in C with MPI and OpenMP Michael J. Quinn.

Sieve of Eratosthenes by Fola Olagbemi. Outline What is the sieve of Eratosthenes? Algorithm used Parallelizing the algorithm Data decomposition options.

Parallel Simulation of Continuous Systems: A Brief Introduction

Lecture 4 TTH 03:30AM-04:45PM Dr. Jianjun Hu CSCE569 Parallel Computing University of South Carolina Department of.

CS 484 Designing Parallel Algorithms Designing a parallel algorithm is not easy. There is no recipe or magical ingredient Except creativity We can benefit.

Lecture 3 : Performance of Parallel Programs Courtesy : MIT Prof. Amarasinghe and Dr. Rabbah’s course note.

MPI and OpenMP.

Scheduling MPI Workflow Applications on Computing Grids Juemin Zhang, Waleed Meleis, and David Kaeli Electrical and Computer Engineering Department, Northeastern.

Day 2. Agenda Parallelism basics Parallel machines Parallelism again High Throughput Computing –Finding the right grain size.

CS 420 Design of Algorithms Parallel Algorithm Design.

Lecture 3: Designing Parallel Programs. Methodological Design Designing and Building Parallel Programs by Ian Foster www-unix.mcs.anl.gov/dbpp.

Parallel Computing Presented by Justin Reschke

Concurrency and Performance Based on slides by Henri Casanova.

COMP7330/7336 Advanced Parallel and Distributed Computing Task Partitioning Dr. Xiao Qin Auburn University

COMP7330/7336 Advanced Parallel and Distributed Computing Task Partitioning Dynamic Mapping Dr. Xiao Qin Auburn University

Department of Computer Science, Johns Hopkins University Lecture 7 Finding Concurrency EN /420 Instructor: Randal Burns 26 February 2014.

Parallel programs Inf-2202 Concurrent and Data-intensive Programming Fall 2016 Lars Ailo Bongo

Auburn University

Flynn’s Taxonomy Many attempts have been made to come up with a way to categorize computer architectures. Flynn’s Taxonomy has been the most enduring of.

Introduction to Parallel Computing: MPI, OpenMP and Hybrid Programming

Overview Parallel Processing Pipelining

CS5102 High Performance Computer Systems Thread-Level Parallelism

Parallel Real-Time Systems

Operating Systems (CS 340 D)

Parallel Programming By J. H. Wang May 2, 2017.

CS 147 – Parallel Processing

The University of Adelaide, School of Computer Science

CS 584 Lecture 3 How is the assignment going?.

Parallel Algorithm Design

Operating Systems (CS 340 D)

Parallel Programming with MPI and OpenMP

חוברת שקפים להרצאות של ד"ר יאיר ויסמן מבוססת על אתר האינטרנט:

Threads, SMP, and Microkernels

Parallel Programming in C with MPI and OpenMP

CSCE569 Parallel Computing

Threads Chapter 4.

COMP60621 Fundamentals of Parallel and Distributed Systems

Database System Architectures

Chapter 01: Introduction

Parallel Programming in C with MPI and OpenMP

COMP60611 Fundamentals of Parallel and Distributed Systems

Presentation transcript:

Parallel Programming in C with MPI and OpenMP Michael J. Quinn

Parallel Algorithm Design Chapter 3 Parallel Algorithm Design

Outline Task/channel model Algorithm design methodology Case studies

Task/Channel Model Parallel computation = set of tasks Task Program Local memory Collection of I/O ports Tasks interact by sending messages through channels

Task/Channel Model Task Channel

Multiprocessors Definition unique to Quinn (see Pg 43) Multiple asynchronous CPU’s with a common shared memory. Usually called a shared memory multiprocessors or shared memory MIMDs An example is the symmetric multiprocessor (SMP) Also called a centralized multiprocessor Quinn feels his terminology is more logical.

Multicomputer Definition unique to Quinn (See pg 49) Multiple CPUs with local memory that are connected together. Connection can be by interconnection network, bus, ether net, etc. Usually called a Distributed memory multiprocessor or Distributed memory MIMD Quinn feels his terminology is more logical

Foster’s Design Methodology Partitioning Communication Agglomeration Mapping

Foster’s Methodology

Partitioning Dividing computation and data into pieces Domain decomposition Divide data into pieces Determine how to associate computations with the data Functional decomposition Divide computation into pieces Determine how to associate data with the computations

Example Domain Decompositions

Example Functional Decomposition

Partitioning Checklist At least 10x more primitive tasks than processors in target computer Minimize redundant computations and redundant data storage Primitive tasks roughly the same size Number of tasks an increasing function of problem size

Communication Determine values passed among tasks Local communication Task needs values from a small number of other tasks Create channels illustrating data flow Global communication Significant number of tasks contribute data to perform a computation Don’t create channels for them early in design

Communication Checklist Communication operations balanced among tasks Each task communicates with only small group of neighbors Tasks can perform communications concurrently Task can perform computations concurrently

Agglomeration Grouping tasks into larger tasks Goals Improve performance Maintain scalability of program Simplify programming In MPI programming, goal often to create one agglomerated task per processor

Agglomeration Can Improve Performance Eliminate communication between primitive tasks agglomerated into consolidated task Combine groups of sending and receiving tasks

Agglomeration Checklist Locality of parallel algorithm has increased Replicated computations take less time than communications they replace Data replication doesn’t affect scalability Agglomerated tasks have similar computational and communications costs Number of tasks increases with problem size Number of tasks suitable for likely target systems Tradeoff between agglomeration and code modifications costs is reasonable

Mapping Process of assigning tasks to processors Centralized multiprocessor: mapping done by operating system Distributed memory system: mapping done by user Conflicting goals of mapping Maximize processor utilization Minimize interprocessor communication

Mapping Example

Optimal Mapping Finding optimal mapping is NP-hard Must rely on heuristics

Mapping Decision Tree Static number of tasks Structured communication Constant computation time per task Agglomerate tasks to minimize communications Create one task per processor Variable computation time per task Cyclically map tasks to processors Unstructured communication Use a static load balancing algorithm Dynamic number of tasks

Mapping Decision Tree (cont.) Static number of tasks Dynamic number of tasks Frequent communications between tasks Use a dynamic load balancing algorithm Many short-lived tasks Use a run-time task-scheduling algorithm

Mapping Checklist Considered designs based on one task per processor and multiple tasks per processor Evaluated static and dynamic task allocation If dynamic task allocation chosen, the task allocator (i.e., manager) is not a bottleneck to performance If static task allocation chosen, ratio of tasks to processors is at least 10:1

Case Studies Boundary value problem Finding the maximum The n-body problem Adding data input