Parallel Algorithms & Implementations: Data-Parallelism, Asynchronous Communication and Master/Worker Paradigm
FDI 2007 Track Q, Day 2 – Morning Session

Example: Jacobi Iteration

For all 1 ≤ i,j ≤ n, do until converged:

  u_new(i,j) ← 0.25 * ( u(i-1,j) + u(i+1,j) + u(i,j-1) + u(i,j+1) )

(Figure: 1D decomposition of the grid.)
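
A minimal serial sketch of one such sweep in C, assuming u and unew are (n+2) x (n+2) arrays whose outermost layer holds fixed boundary values (the function and array names are illustrative, not from the original slides):

/* One Jacobi sweep over the interior of an (n+2) x (n+2) grid.
   u holds the current iterate, unew receives the update. */
void jacobi_sweep(int n, double u[n+2][n+2], double unew[n+2][n+2])
{
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j]
                               + u[i][j-1] + u[i][j+1]);
}

In practice the sweep is repeated, swapping the roles of u and unew, until the change between iterates falls below a tolerance.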

Jacobi: 1D Decomposition

Assign responsibility for n/p rows of the grid to each process. Each process holds copies (“ghost points”) of one row of old data from each neighboring process.

Potential for deadlock?
– Yes, if the order of sends and recvs is wrong (see the sketch below).
– Maybe, with periodic boundary conditions and insufficient buffering, i.e., if a recv has to be posted before a send can return.
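
One common way to make the blocking exchange safe regardless of buffering is to break the symmetry in the ordering, e.g., even ranks send first while odd ranks receive first. A minimal C sketch of the "pass data north" step under that assumption (function and buffer names are illustrative):

#include <mpi.h>

/* Pass one row "north": send the top interior row to rank-1 and receive the
   ghost row from rank+1.  Even ranks send first and odd ranks receive first,
   so the receive matching each send is posted independently of that send,
   and no cycle of blocked sends can form, regardless of system buffering. */
void exchange_north(double *top_row, double *bottom_ghost, int ncols,
                    int rank, int size)
{
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    if (rank % 2 == 0) {
        MPI_Send(top_row, ncols, MPI_DOUBLE, up, 0, MPI_COMM_WORLD);
        MPI_Recv(bottom_ghost, ncols, MPI_DOUBLE, down, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(bottom_ghost, ncols, MPI_DOUBLE, down, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(top_row, ncols, MPI_DOUBLE, up, 0, MPI_COMM_WORLD);
    }
}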

Jacobi: 1D Decomposition

Under the second scenario above there is also potential for serialized communication, even with Dirichlet boundary conditions:
– When passing data north, only process 0 can finish its send immediately; then process 1 can go, then process 2, etc.

The MPI_Sendrecv function exists to handle this “exchange of data” dance without all the potential buffering problems.
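
A minimal sketch of the same step using MPI_Sendrecv (names illustrative); the combined call pairs the send to the north neighbor with the receive from the south neighbor, so no manual ordering and no reliance on buffering is needed:

#include <mpi.h>

/* "Pass data north" with MPI_Sendrecv: send the top interior row to rank-1
   while receiving the ghost row from rank+1 in a single call. */
void exchange_north_sendrecv(double *top_row, double *bottom_ghost, int ncols,
                             int rank, int size)
{
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    MPI_Sendrecv(top_row,      ncols, MPI_DOUBLE, up,   0,
                 bottom_ghost, ncols, MPI_DOUBLE, down, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}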

Jacobi: 1D vs. 2D Decomposition

2D decomposition: each process holds an n/√p x n/√p subgrid.

Per-process memory requirements:
– 1D case: each process holds an n x n/p subgrid.
– 2D case: each process holds an n/√p x n/√p subgrid.
– If n²/p is a constant, then in the 1D case the number of rows per process shrinks as n and p grow (see the example below).
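
For instance (numbers chosen purely for illustration): hold the per-process memory fixed at n²/p = 10⁶ grid points. With p = 4, n = 2,000 and each 1D slab is n/p = 500 rows deep; with p = 100, n = 10,000 and each slab is 100 rows deep; with p = 10,000, n = 100,000 and each slab is only 10 rows deep. The slabs degenerate into ever thinner strips, while the 2D subgrids stay 1,000 x 1,000 in every case.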

Jacobi: 1D vs. 2D Decomposition

The ratio of computation to communication is key to scalable performance.

1D decomposition:
  Computation / Communication = (n²/p) / n = (1/√p) * (n/√p)

2D decomposition:
  Computation / Communication = (n²/p) / (n/√p) = n/√p
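
For concreteness (illustrative numbers, with constant factors ignored as in the formulas above): with n = 1024 and p = 16, the 1D ratio is n/p = 1024/16 = 64, while the 2D ratio is n/√p = 1024/4 = 256, i.e., better by a factor of √p = 4.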

MPI Non-Blocking Message Passing

– MPI_Isend initiates a send and returns immediately with a request handle.
– MPI_Irecv posts a receive and returns immediately with a request handle.
– MPI_Wait blocks until the operation identified by a request handle is complete.
– MPI_Test checks a request handle for completion without blocking.

MPI Non-Blocking Send

C binding:
  int MPI_Isend(void *buf, int count, MPI_Datatype datatype, int dest,
                int tag, MPI_Comm comm, MPI_Request *request)

Fortran binding:
  MPI_ISEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, REQUEST, IERR)
  <type> BUF(*)
  INTEGER COUNT, DATATYPE, DEST, TAG, COMM, REQUEST, IERR

MPI_ISEND(buf, count, datatype, dest, tag, comm, request)
  IN  buf       initial address of send buffer (choice)
  IN  count     number of entries to send (integer)
  IN  datatype  datatype of each entry (handle)
  IN  dest      rank of destination (integer)
  IN  tag       message tag (integer)
  IN  comm      communicator (handle)
  OUT request   request handle (handle)

MPI Non-Blocking Recv

C binding:
  int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source,
                int tag, MPI_Comm comm, MPI_Request *request)

Fortran binding:
  MPI_IRECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, REQUEST, IERR)
  <type> BUF(*)
  INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM, REQUEST, IERR

MPI_IRECV(buf, count, datatype, source, tag, comm, request)
  OUT buf       initial address of receive buffer (choice)
  IN  count     max number of entries to receive (integer)
  IN  datatype  datatype of each entry (handle)
  IN  source    rank of source (integer)
  IN  tag       message tag (integer)
  IN  comm      communicator (handle)
  OUT request   request handle (handle)

Function MPI_Wait

C binding:
  int MPI_Wait(MPI_Request *request, MPI_Status *status)

Fortran binding:
  MPI_WAIT(REQUEST, STATUS, IERR)
  INTEGER REQUEST, STATUS(MPI_STATUS_SIZE), IERR

MPI_WAIT(request, status)
  INOUT request  request handle (handle)
  OUT   status   status object (Status)
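
Putting the three calls together, here is a minimal C sketch of a non-blocking north/south ghost-row exchange (function and variable names are illustrative):

#include <mpi.h>

/* Start the north/south ghost-row exchange with non-blocking calls, then
   wait on all four requests before the received ghost rows are used. */
void exchange_ghost_rows(double *top_row, double *bottom_row,
                         double *top_ghost, double *bottom_ghost,
                         int ncols, int rank, int size)
{
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
    MPI_Request req[4];

    MPI_Irecv(top_ghost,    ncols, MPI_DOUBLE, up,   1, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(bottom_ghost, ncols, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(top_row,      ncols, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(bottom_row,   ncols, MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &req[3]);

    for (int i = 0; i < 4; i++)
        MPI_Wait(&req[i], MPI_STATUS_IGNORE);
}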

Jacobi with Asynchronous Communication

With non-blocking sends/recvs, we can avoid any deadlocks or slowdowns due to buffer management. With some code modification, we can also improve performance by overlapping communication and computation:

Old algorithm:
  exchange data
  do updates

New algorithm:
  initiate exchange
  update strictly interior grid points
  complete exchange
  update boundary points
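
A minimal C sketch of the new algorithm's structure under a 1D row decomposition (the layout assumptions and names are illustrative, not from the original code):

#include <mpi.h>

/* One Jacobi iteration with communication/computation overlap.  Each process
   owns rows 1..nlocal of u/unew; row 0 and row nlocal+1 are ghost rows and
   columns 0 and ncols-1 are fixed boundary columns.  Assumes nlocal >= 2 and
   that ghost rows of processes with no neighbor hold fixed boundary values. */
void jacobi_overlap_step(double **u, double **unew, int nlocal, int ncols,
                         int rank, int size)
{
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;
    MPI_Request req[4];

    /* 1. Initiate the ghost-row exchange. */
    MPI_Irecv(u[0],          ncols, MPI_DOUBLE, up,   1, MPI_COMM_WORLD, &req[0]);
    MPI_Irecv(u[nlocal + 1], ncols, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &req[1]);
    MPI_Isend(u[1],          ncols, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &req[2]);
    MPI_Isend(u[nlocal],     ncols, MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &req[3]);

    /* 2. Update strictly interior rows; these never touch the ghost rows. */
    for (int i = 2; i <= nlocal - 1; i++)
        for (int j = 1; j < ncols - 1; j++)
            unew[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] + u[i][j-1] + u[i][j+1]);

    /* 3. Complete the exchange. */
    for (int k = 0; k < 4; k++)
        MPI_Wait(&req[k], MPI_STATUS_IGNORE);

    /* 4. Update the two boundary rows, which need the ghost rows. */
    for (int j = 1; j < ncols - 1; j++) {
        unew[1][j]      = 0.25 * (u[0][j] + u[2][j] + u[1][j-1] + u[1][j+1]);
        unew[nlocal][j] = 0.25 * (u[nlocal-1][j] + u[nlocal+1][j]
                                  + u[nlocal][j-1] + u[nlocal][j+1]);
    }
}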

Master/Worker Paradigm

– A common pattern for non-uniform, heterogeneous sets of tasks.
– Dynamic load balancing comes for free (at least that is the goal).
– The master is a potential bottleneck.

See the example below.
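
A minimal C sketch of the paradigm, assuming integer task ids and a single double per result (the tags, payloads, and helper names are illustrative, not from the original example):

#include <mpi.h>

#define TAG_WORK 1
#define TAG_STOP 2

/* Master (rank 0): hands out task ids 0..ntasks-1 one at a time; whichever
   worker returns a result first gets the next task. */
static void run_master(int nworkers, int ntasks)
{
    int next = 0, dummy = 0;

    /* Prime every worker with one task (or tell it to stop if none is left). */
    for (int w = 1; w <= nworkers; w++) {
        if (next < ntasks) {
            MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
            next++;
        } else {
            MPI_Send(&dummy, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
        }
    }

    /* One result comes back per task; give the sender more work if any remains. */
    for (int received = 0; received < ntasks; received++) {
        double result;
        MPI_Status status;
        MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        if (next < ntasks) {
            MPI_Send(&next, 1, MPI_INT, status.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
            next++;
        } else {
            MPI_Send(&dummy, 1, MPI_INT, status.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
        }
    }
}

/* Worker: receive task ids until told to stop; send back one result per task. */
static void run_worker(void)
{
    for (;;) {
        int task;
        MPI_Status status;
        MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
        if (status.MPI_TAG == TAG_STOP)
            break;
        double result = 2.0 * task;   /* stand-in for the real (uneven) work */
        MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
    }
}

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (rank == 0)
        run_master(size - 1, 100);    /* 100 tasks; needs at least 2 processes */
    else
        run_worker();
    MPI_Finalize();
    return 0;
}

Because the master always hands the next task to whichever worker reports back first, faster workers (or workers that drew cheap tasks) automatically pick up more of the work; the single master, however, serializes all task hand-outs and result collection, which is the bottleneck noted above.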