An Introduction to Parallel Programming with MPI. February 17, 19, 24, 26, 2004. David Adams

Presentation transcript:

An Introduction to Parallel Programming with MPI. February 17, 19, 24, 26, 2004. David Adams

Creating Accounts

Outline
- Disclaimers
- Overview of basic parallel programming on a cluster with the goals of MPI
- Batch system interaction
- Startup procedures
- Quick review
- Blocking message passing
- Non-blocking message passing
- Lab day
- Collective communications

Review
Functions we have covered in detail: MPI_INIT, MPI_FINALIZE, MPI_COMM_SIZE, MPI_COMM_RANK, MPI_SEND, MPI_RECV.
Useful constants: MPI_COMM_WORLD, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_SUCCESS.
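As a refresher, here is a minimal C sketch that uses only the calls and constants reviewed above (the tag, the payload value, and the two-rank pattern are illustrative assumptions): rank 0 sends one integer to rank 1, which receives it from any source and any tag.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, value = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;                       /* arbitrary payload */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                 MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d from rank %d\n", value, status.MPI_SOURCE);
    }

    MPI_Finalize();
    return 0;
}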

Motivating Example for Deadlock
(Figure: ten processes, P1 through P10, arranged in a ring, each labeled with the SEND and RECV it is about to execute; the following slides animate the exchange over timesteps 1 through 10.) Every process posts a blocking SEND to one neighbor before posting its RECV from the other, so no process reaches its RECV until some send is able to complete.
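A C sketch of the communication pattern behind this example (ranks, tags, and the single-integer payload are illustrative): every process does a blocking send to its clockwise neighbor before posting its receive, so if MPI_Send cannot buffer the message, every rank blocks in the send and the ring deadlocks.

/* Deadlock-prone ring exchange: every rank sends first, then receives. */
#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, out, in = -1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* clockwise neighbor */
    int left  = (rank - 1 + size) % size;   /* counter-clockwise neighbor */
    out = rank;

    /* If MPI_Send cannot buffer the message, every rank blocks here,
       waiting for a receive that no one has posted yet. */
    MPI_Send(&out, 1, MPI_INT, right, 0, MPI_COMM_WORLD);
    MPI_Recv(&in, 1, MPI_INT, left, 0, MPI_COMM_WORLD, &status);

    MPI_Finalize();
    return 0;
}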

Solution
MPI_SENDRECV(sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount, recvtype, source, recvtag, comm, status, ierror)
The semantics of a send-receive operation is what would be obtained if the caller forked two concurrent threads, one to execute the send and one to execute the receive, followed by a join of these two threads.
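A sketch of the same ring exchange rewritten with MPI_Sendrecv (C binding; the neighbor arithmetic and tags match the previous sketch). Because the library pairs the send and the receive internally, the programmer no longer has to order them, and the deadlock disappears.

#include <mpi.h>

int main(int argc, char **argv) {
    int rank, size, out, in = -1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    out = rank;

    /* Send to the right neighbor and receive from the left neighbor
       in one combined, deadlock-free call. */
    MPI_Sendrecv(&out, 1, MPI_INT, right, 0,
                 &in,  1, MPI_INT, left,  0,
                 MPI_COMM_WORLD, &status);

    MPI_Finalize();
    return 0;
}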

Nonblocking Message Passing
Allows for the overlap of communication and computation. Completion of a message is broken into four steps instead of two:
- post-send
- complete-send
- post-receive
- complete-receive

Posting Operations
MPI_ISEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, REQUEST, IERROR)
  IN  BUF(*)
  IN  INTEGER COUNT, DATATYPE, DEST, TAG, COMM
  OUT INTEGER REQUEST, IERROR
MPI_IRECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, REQUEST, IERROR)
  OUT BUF(*)
  IN  INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
  OUT INTEGER REQUEST, IERROR
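For reference, the corresponding C bindings look roughly like this (shown in MPI-1-era form, as in MPICH of that time; later MPI versions add const to the send buffer):

#include <mpi.h>

/* C bindings corresponding to the Fortran signatures above. */
int MPI_Isend(void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm, MPI_Request *request);
int MPI_Irecv(void *buf, int count, MPI_Datatype datatype,
              int source, int tag, MPI_Comm comm, MPI_Request *request);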

Request Objects
All nonblocking communications use request objects to identify communication operations and to link the posting operation with the completion operation. Conceptually, a request can be thought of as a pointer to a specific message instance floating around in MPI space. Just as with pointers, request handles must be treated with care: losing one creates a request handle leak (like a memory leak), and you completely lose access to the status of that message.

Request Objects The value MPI_REQUEST_NULL is used to indicate an invalid request handle. Operations that deallocate request objects set the request handle to this value. Posting operations allocate memory for request objects and completion operations deallocate that memory and clean up the space.

Completion Operations
MPI_WAIT(REQUEST, STATUS, IERROR)
  INOUT INTEGER REQUEST
  OUT   STATUS(MPI_STATUS_SIZE), IERROR
A call to MPI_WAIT returns when the operation identified by REQUEST is complete. MPI_WAIT is the blocking completion operation: the program has determined it cannot do any more useful work without completing the current message, so it chooses to block until the corresponding send or receive completes. In iterative parallel code, an MPI_WAIT is often placed directly before the next post operation that intends to reuse the same request variable. Successful completion of MPI_WAIT sets REQUEST = MPI_REQUEST_NULL.
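A minimal C sketch of the post/complete pattern (the ring exchange, tags, and the comment standing in for local computation are illustrative): both operations are posted early, independent work happens while the message is in flight, and MPI_Wait is called just before the data is needed.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, out, in = -1;
    MPI_Request reqs[2];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    out = rank;

    /* Post both operations; neither call blocks. */
    MPI_Irecv(&in,  1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&out, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... useful computation that touches neither 'in' nor 'out' goes here ... */

    /* Complete both operations; each request is set to MPI_REQUEST_NULL. */
    MPI_Wait(&reqs[0], &status);
    MPI_Wait(&reqs[1], &status);
    printf("Rank %d received %d\n", rank, in);

    MPI_Finalize();
    return 0;
}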

Completion Operations
MPI_TEST(REQUEST, FLAG, STATUS, IERROR)
  INOUT INTEGER REQUEST
  OUT   LOGICAL FLAG
  OUT   STATUS(MPI_STATUS_SIZE), IERROR
A call to MPI_TEST returns flag = true if the operation identified by REQUEST is complete. MPI_TEST is the nonblocking completion operation. If flag = true, MPI_TEST cleans up the space associated with REQUEST, deallocating the memory and setting REQUEST = MPI_REQUEST_NULL. MPI_TEST lets the user write code that communicates as eagerly as possible but keeps doing useful work while messages are not yet ready.
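A C sketch of the polling pattern with MPI_Test (the ring exchange and the counter standing in for a unit of local work are illustrative): the receive is posted, and the process keeps doing small pieces of work until the test reports that the message has arrived.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size, out, in = -1, flag = 0, work_units = 0;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    out = rank;

    /* Receive is posted before the blocking send, so the ring cannot deadlock. */
    MPI_Irecv(&in, 1, MPI_INT, left, 0, MPI_COMM_WORLD, &req);
    MPI_Send(&out, 1, MPI_INT, right, 0, MPI_COMM_WORLD);

    /* Poll: MPI_Test returns immediately; keep working until the message arrives. */
    while (!flag) {
        MPI_Test(&req, &flag, &status);
        if (!flag)
            work_units++;      /* stand-in for one unit of useful local work */
    }
    printf("Rank %d received %d after %d work units\n", rank, in, work_units);

    MPI_Finalize();
    return 0;
}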

Maximizing Overlap
To achieve maximum overlap between computation and communication, communications should be started as soon as possible and completed as late as possible:
- Sends should be posted as soon as the data to be sent is available.
- Receives should be posted as soon as the receive buffer can be used.
- Sends should be completed just before the send buffer is to be reused.
- Receives should be completed just before the data in the buffer is to be used.
Overlap can often be increased by reordering the computation.
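A C sketch of reordering computation to increase overlap, using a one-dimensional halo exchange as an illustrative stand-in (the array size, tags, and the scaling update are all assumptions, and MPI_Waitall is used simply to complete the four requests at once): the boundary elements are updated first so their sends can be posted early, the interior is updated while the messages are in flight, and completion is delayed until the halo values are needed.

#include <mpi.h>

#define N 1024   /* local array size; illustrative */

void step(double *u, int left, int right, MPI_Comm comm) {
    double halo_left, halo_right;
    MPI_Request reqs[4];
    MPI_Status stats[4];
    int i;

    /* Boundary work first: these values are needed by the neighbors. */
    u[0]     = 0.5 * u[0];       /* stand-in for the real boundary update */
    u[N - 1] = 0.5 * u[N - 1];

    /* Post communication as early as possible. */
    MPI_Irecv(&halo_left,  1, MPI_DOUBLE, left,  0, comm, &reqs[0]);
    MPI_Irecv(&halo_right, 1, MPI_DOUBLE, right, 1, comm, &reqs[1]);
    MPI_Isend(&u[0],       1, MPI_DOUBLE, left,  1, comm, &reqs[2]);
    MPI_Isend(&u[N - 1],   1, MPI_DOUBLE, right, 0, comm, &reqs[3]);

    /* Interior work overlaps with the messages in flight. */
    for (i = 1; i < N - 1; i++)
        u[i] = 0.5 * u[i];       /* stand-in for the real interior update */

    /* Complete as late as possible, just before the halo values are used
       and before u[0] and u[N-1] would be overwritten. */
    MPI_Waitall(4, reqs, stats);
    /* ... use halo_left and halo_right here ... */
}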

Setting up your account for MPI
See exercise.html for the setup instructions.
List of all MCB 124 machines.

More Stuff
Put /home/grads/raghavgn/mpich-1.2.5/bin in your path:
- Open the file .bash_profile and append :/home/grads/raghavgn/mpich-1.2.5/bin to the PATH line, so that
  PATH=$PATH:$HOME/bin
  becomes
  PATH=$PATH:$HOME/bin:/home/grads/raghavgn/mpich-1.2.5/bin
Make a subdirectory, mkdir MPI, and cd to it.
Copy the examples into it: cp -r /home/grads/daadams3/MPI .

Compilation and Execution
Two folders, one for C and one for Fortran 77, each with a hello world example.
For C:
- Compile and link: mpicc -o hello hello.c
- Run on 4 processors: mpirun -np 4 -machinefile ../mymachines hello
For Fortran:
- Compile and link: mpif77 -o hello hello.f
- Run on 4 processors: mpirun -np 4 -machinefile ../mymachines hello
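The hello world source itself is not reproduced on the slide; a minimal C version along the usual lines (the exact file in the copied MPI directory may differ) looks like this:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    printf("Hello, world! I am process %d of %d.\n", rank, size);

    MPI_Finalize();
    return 0;
}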