5- Message-Passing Programming

Introduction to MPI MPI-1 uses a static process model: the number of processes is fixed when the MPI program is started. Typically, all MPI processes execute the same program in SPMD style. For coordinated I/O behavior, it is essential that only one specific process (usually rank 0) performs the I/O operations.
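
A minimal sketch of this structure, assuming the common convention that rank 0 performs the I/O:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int my_rank, num_procs;

    MPI_Init(&argc, &argv);                      /* all processes run this same program (SPMD) */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);     /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);   /* number of processes, fixed at start-up */

    if (my_rank == 0)                            /* only one process performs the output */
        printf("running with %d processes\n", num_procs);

    MPI_Finalize();
    return 0;
}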

Semantic Definitions
blocking operation: the return of control to the calling process indicates that all resources specified in the call, such as buffers, can be reused. All state transitions initiated by a blocking operation are completed before control returns.
non-blocking operation: the corresponding call may return before all effects of the operation are completed and before the resources specified in the call can be reused.
synchronous communication: the communication operation does not complete before both processes have started their communication operations.
asynchronous communication: the sender can execute its communication operation without any coordination with the receiving process.

MPI point-to-point communication Send operation (blocking, asynchronous), provided by MPI_Send():
smessage specifies a send buffer containing the data elements to be sent;
count is the number of elements to be sent from the send buffer;
datatype is the data type of the entries of the send buffer;
dest specifies the rank of the target process which should receive the data; each process of a communicator has a unique rank, and the ranks are numbered from 0 to the number of processes minus one;
tag is used by the receiver to distinguish different messages from the same sender;
comm specifies the communicator (the communication domain).
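
For reference, the C prototype of the blocking send, using the parameter names from the description above:

int MPI_Send(void *smessage, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm);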

MPI point-to-point communication Receive operation (blocking, asynchronous), provided by MPI_Recv():
rmessage specifies the receive buffer in which the message should be stored;
count is the maximum number of elements that can be received;
datatype is the data type of the elements to be received;
source specifies the rank of the process which sends the message;
tag is the message tag that the message to be received must have;
comm is the communicator used for the communication;
status specifies a data structure which contains information about the message after completion of the receive operation, accessible as status.MPI_SOURCE, status.MPI_TAG, and status.MPI_ERROR.
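
The C prototype of the blocking receive, again with the parameter names used above, and a short usage sketch of the status fields (the buffer name and size are hypothetical):

int MPI_Recv(void *rmessage, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status);

MPI_Status status;
int buffer[100];
MPI_Recv(buffer, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("message from rank %d with tag %d\n", status.MPI_SOURCE, status.MPI_TAG);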

Predefined data types for MPI
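
For reference, the most commonly used predefined MPI datatypes and the C types they correspond to:

MPI_CHAR           char
MPI_SHORT          signed short int
MPI_INT            signed int
MPI_LONG           signed long int
MPI_UNSIGNED       unsigned int
MPI_FLOAT          float
MPI_DOUBLE         double
MPI_LONG_DOUBLE    long double
MPI_BYTE           (8 bits, no conversion)
MPI_PACKED         (packed data, used with MPI_Pack/MPI_Unpack)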

System Buffer If the message is first copied into an internal system buffer of the runtime system, the sender can continue its operations as soon as the copy operation into the system buffer is completed. Thus, the send can complete even though the corresponding MPI_Recv() operation has not yet been started. This has the advantage that the sender is not blocked for a long period of time. The drawback is that the system buffer needs additional memory space and that the copying into the system buffer requires additional execution time.

MPI Example 1

Deadlocks with Point-to-point Communications
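
The slide's code is not reproduced in the transcript; a minimal sketch of the classic deadlock pattern, in which both of two processes post a blocking receive before their send (buffer names and the message length N are hypothetical):

int other = (my_rank == 0) ? 1 : 0;
MPI_Status status;

/* both processes block in MPI_Recv and never reach MPI_Send: deadlock */
MPI_Recv(recv_buf, N, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
MPI_Send(send_buf, N, MPI_INT, other, 0, MPI_COMM_WORLD);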

Deadlock-free send & receive: MPI_Sendrecv() combines a send and a receive using different send and receive buffers; MPI_Sendrecv_replace() uses one identical buffer for both the outgoing and the incoming message.
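
Their C prototypes:

int MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 int dest, int sendtag,
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 int source, int recvtag,
                 MPI_Comm comm, MPI_Status *status);

int MPI_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
                         int dest, int sendtag, int source, int recvtag,
                         MPI_Comm comm, MPI_Status *status);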

Nonblocking Communication Modes MPI_Request: denotes an opaque object that is used to identify a nonblocking communication operation and to query its state, e.g., whether it has completed.
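
The nonblocking send and receive operations return such a request object; their C prototypes:

int MPI_Isend(void *buffer, int count, MPI_Datatype type, int dest,
              int tag, MPI_Comm comm, MPI_Request *request);

int MPI_Irecv(void *buffer, int count, MPI_Datatype type, int source,
              int tag, MPI_Comm comm, MPI_Request *request);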

Test & Wait MPI_Test() returns flag = 1 (true) if the communication operation identified by request has been completed; otherwise, flag = 0 (false) is returned and status is undefined. MPI_Wait() blocks until the operation identified by request has been completed.
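
The C prototypes and a short usage sketch (buffer, source, and tag are hypothetical):

int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);
int MPI_Wait(MPI_Request *request, MPI_Status *status);

MPI_Request request;
MPI_Status status;
MPI_Irecv(buffer, 100, MPI_INT, source, tag, MPI_COMM_WORLD, &request);
/* ... local computation, overlapping communication and computation ... */
MPI_Wait(&request, &status);   /* block until the receive has completed */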

MPI Example 2 (ring data gathering)

MPI Example 2 (using blocking communications)

MPI Example 2 (nonblocking communications)
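
The example code itself is not part of the transcript; a minimal sketch of one possible ring data gathering step with nonblocking operations (buffer names, block size N, and the pointer swap are hypothetical details):

int left  = (my_rank - 1 + num_procs) % num_procs;   /* left neighbor in the ring  */
int right = (my_rank + 1) % num_procs;               /* right neighbor in the ring */
MPI_Request reqs[2];
MPI_Status  stats[2];

for (int step = 0; step < num_procs - 1; step++) {
    MPI_Isend(send_block, N, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recv_block, N, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[1]);
    /* local computation could be overlapped here */
    MPI_Waitall(2, reqs, stats);
    int *tmp = send_block;     /* the received block is forwarded in the next step */
    send_block = recv_block;
    recv_block = tmp;
}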

Communication Modes
Standard mode: the MPI runtime system decides whether outgoing messages are buffered in a local system buffer or not.
Synchronous mode: a send operation will not complete before the corresponding receive operation has been started and the receiving process has started to receive the data. A send operation in synchronous mode is provided by the function MPI_Ssend().
Buffered mode: control is returned to the sending process even if the corresponding receive operation has not yet been started; the send buffer can be reused immediately after control returns. A send operation in buffered mode is performed by calling MPI_Bsend(). In buffered mode, the buffer space to be used by the runtime system must be provided by the programmer.
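
For buffered mode, the user-provided buffer is registered with MPI_Buffer_attach(); a minimal sketch (data, N, dest, and tag are hypothetical):

int bufsize = N * sizeof(int) + MPI_BSEND_OVERHEAD;   /* space for one buffered message */
char *buffer = malloc(bufsize);

MPI_Buffer_attach(buffer, bufsize);                   /* hand the buffer to the runtime system */
MPI_Bsend(data, N, MPI_INT, dest, tag, MPI_COMM_WORLD);
MPI_Buffer_detach(&buffer, &bufsize);                 /* waits until buffered messages are delivered */
free(buffer);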

Collective Communication Operations 1. Broadcast operation: data blocks sent by MPI_Bcast() cannot be received by an MPI_Recv(). The MPI runtime system guarantees that broadcast messages are received in the same order in which they were sent by the root.
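
The C prototype:

int MPI_Bcast(void *message, int count, MPI_Datatype type,
              int root, MPI_Comm comm);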

Collective Communication Operations 2. Reduction operation: the parameter recvbuf specifies the receive buffer, which is provided only by the root process. All participating processes must specify the same values for the parameters count, type, op, and root. MPI also supports user-defined reduction operations: the parameter function must have the following four parameters: void *in, void *inout, int *len, MPI_Datatype *type. The user-defined function must be associative; the parameter commute specifies whether it is also commutative (commute=1) or not (commute=0).
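
The C prototypes of the reduction and of the creation of a user-defined reduction operation:

int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype type,
               MPI_Op op, int root, MPI_Comm comm);

typedef void MPI_User_function(void *in, void *inout, int *len,
                               MPI_Datatype *type);

int MPI_Op_create(MPI_User_function *function, int commute, MPI_Op *op);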

Predefined Reduction Operations MINLOC & MAXLOC operations work on pairs of values, consisting of a value and an index. Therefore, the provided data type must represent such a pair of values; MPI predefines pair types such as MPI_DOUBLE_INT (double and int) and MPI_2INT (pair of ints) for this purpose.

MPI- Example (MAXLOC reduction)
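
The example code is not in the transcript; a minimal sketch of a MAXLOC reduction that determines the global maximum of one local value per process together with the rank that holds it (my_local_value is hypothetical):

struct { double value; int rank; } local, global;

local.value = my_local_value;
local.rank  = my_rank;

MPI_Reduce(&local, &global, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);

if (my_rank == 0)
    printf("maximum %f found on rank %d\n", global.value, global.rank);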

MPI- Example (Scalar Product) MPI_Bcast(&m, 1, MPI_INT, 0, MPI_COMM_WORLD);
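
Only this broadcast of the block length m survives in the transcript; a minimal sketch of a parallel scalar product along these lines (the local block arrays x and y and the constant MAX_LOCAL are hypothetical):

int m;                                         /* number of local elements per process */
double x[MAX_LOCAL], y[MAX_LOCAL];
double local_sum = 0.0, result;

MPI_Bcast(&m, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* root announces the local block length */
for (int i = 0; i < m; i++)
    local_sum += x[i] * y[i];                  /* partial scalar product on the local block */
MPI_Reduce(&local_sum, &result, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);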

Reduction- Example (Complex Product)

typedef struct {
    double real, imag;
} Complex;

/* the user-defined reduction function */
void myProd(Complex *in, Complex *inout, int *len, MPI_Datatype *dptr)
{
    int i;
    Complex c;
    for (i = 0; i < *len; ++i) {
        c.real = inout->real * in->real - inout->imag * in->imag;
        c.imag = inout->real * in->imag + inout->imag * in->real;
        *inout = c;
        in++; inout++;
    }
}

/* and, to call it... */
...
/* each process has an array of 100 Complexes */
Complex a[100], answer[100];
MPI_Op myOp;
MPI_Datatype ctype;

/* explain to MPI how type Complex is defined */
MPI_Type_contiguous(2, MPI_DOUBLE, &ctype);
MPI_Type_commit(&ctype);
/* create the complex-product user operation (1 = commutative) */
MPI_Op_create((MPI_User_function *)myProd, 1, &myOp);

MPI_Reduce(a, answer, 100, ctype, myOp, root, comm);

Collective Communication Operations 3. Gather operation: if each process can provide a different number of elements to be collected, the variant MPI_Gatherv() is used. It takes an integer array recvcounts of length p, where recvcounts[i] denotes the number of elements provided by process i, and an additional parameter displs after recvcounts, also an integer array of length p, where displs[i] specifies at which position of the receive buffer of the root process the data block of process i is stored.
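
The C prototypes of both variants:

int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
               void *recvbuf, int recvcount, MPI_Datatype recvtype,
               int root, MPI_Comm comm);

int MPI_Gatherv(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int *recvcounts, int *displs,
                MPI_Datatype recvtype, int root, MPI_Comm comm);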

MPI- Example (Gather)

Collective Communication Operations 4. Scatter operation: the generalized version MPI_Scatterv() allows the root process to provide data blocks of different sizes. An integer array sendcounts is used, where sendcounts[i] denotes the number of elements sent to process i; the parameter displs is also an integer array with p entries, where displs[i] specifies from which position in the send buffer of the root process the data block for process i is taken.
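
The C prototypes of both variants:

int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm comm);

int MPI_Scatterv(void *sendbuf, int *sendcounts, int *displs,
                 MPI_Datatype sendtype, void *recvbuf, int recvcount,
                 MPI_Datatype recvtype, int root, MPI_Comm comm);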

MPI- Example (Scatter)

Collective Communication Operations 5. Multi-broadcast:
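
In MPI, a multi-broadcast, in which every process receives the data blocks contributed by all processes, is realized by MPI_Allgather(); its C prototype:

int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  MPI_Comm comm);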

Collective Communication Operations 6. Multi-accumulation: MPI program piece to compute a matrix-vector multiplication with a column-blockwise distribution of the matrix.
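
MPI provides only a restricted form of multi-accumulation through MPI_Reduce_scatter(), which combines the contributions of all processes element-wise and scatters blocks of the result back to the processes. The program piece from the slide is not in the transcript; a sketch of the column-blockwise matrix-vector product along these lines (local_A, local_b, local_res, y_block, and recvcounts are hypothetical names; local_res has the full length n):

for (int i = 0; i < n; i++) {
    local_res[i] = 0.0;
    for (int j = 0; j < local_m; j++)              /* local_m columns owned by this process */
        local_res[i] += local_A[i][j] * local_b[j];
}
/* sum the full-length partial results and distribute row blocks of the result vector */
MPI_Reduce_scatter(local_res, y_block, recvcounts,
                   MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);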

Collective Communication Operations 7. Total exchange:
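
In MPI, a total exchange, in which each process provides a separate data block for every other process, is performed by MPI_Alltoall(); its C prototype:

int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 MPI_Comm comm);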

Process Groups and Communicators MPI does not provide mechanisms to specify the initial allocation of processes to an MPI computation and their binding to physical processors. A process group (or group for short) is an ordered set of processes of an application program. Each process of a group gets a uniquely defined process number, also called its rank. Collective communication operations can be restricted to process groups by using the corresponding communicators. In MPI, the predefined communicator MPI_COMM_WORLD and its corresponding group are used as the starting point. The group of a given communicator can be obtained by calling MPI_Comm_group().
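
The C prototypes of the calls used below for deriving groups and communicators:

int MPI_Comm_group(MPI_Comm comm, MPI_Group *group);
int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm);
int MPI_Comm_free(MPI_Comm *comm);
int MPI_Group_free(MPI_Group *group);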

Operations on Process Groups
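
Likely candidates for the operations covered here, with their standard C prototypes:

int MPI_Group_union(MPI_Group group1, MPI_Group group2, MPI_Group *newgroup);
int MPI_Group_intersection(MPI_Group group1, MPI_Group group2, MPI_Group *newgroup);
int MPI_Group_difference(MPI_Group group1, MPI_Group group2, MPI_Group *newgroup);
int MPI_Group_incl(MPI_Group group, int p, int *ranks, MPI_Group *newgroup);
int MPI_Group_excl(MPI_Group group, int p, int *ranks, MPI_Group *newgroup);
int MPI_Group_size(MPI_Group group, int *size);
int MPI_Group_rank(MPI_Group group, int *rank);
int MPI_Group_compare(MPI_Group group1, MPI_Group group2, int *res);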

Operations on Process Groups MPI_Group_compare(group1, group2, &res) returns: res = MPI_IDENT if both groups contain the same processes in the same order; res = MPI_SIMILAR if both groups contain the same processes, but in a different order; res = MPI_UNEQUAL if the two groups contain different processes.

Operations on Communicators MPI_Comm_compare(comm1, comm2, &res) returns: res = MPI_IDENT if both handles denote the same communicator; res = MPI_CONGRUENT if the associated groups contain the same processes in the same order but the communicators differ; res = MPI_SIMILAR if the associated groups contain the same processes, but with different ranks; res = MPI_UNEQUAL if the two groups contain different processes.

MPI_example (group creation)

int me, count, count2;
void *send_buf, *recv_buf, *send_buf2, *recv_buf2;
MPI_Group MPI_GROUP_WORLD, grprem;
MPI_Comm commslave;
static int ranks[] = {0};

MPI_Comm_group(MPI_COMM_WORLD, &MPI_GROUP_WORLD);
MPI_Comm_rank(MPI_COMM_WORLD, &me);                    /* local */
MPI_Group_excl(MPI_GROUP_WORLD, 1, ranks, &grprem);    /* local: group without rank 0 */
MPI_Comm_create(MPI_COMM_WORLD, grprem, &commslave);   /* communicator for the slaves */

if (me != 0) {
    /* compute on slaves */
    MPI_Reduce(send_buf, recv_buf, count, MPI_INT, MPI_SUM, 1, commslave);
}
/* zero falls through immediately to this reduce, others do later... */
MPI_Reduce(send_buf2, recv_buf2, count2, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

MPI_Comm_free(&commslave);
MPI_Group_free(&MPI_GROUP_WORLD);
MPI_Group_free(&grprem);