5- Message-Passing Programming
Introduction to MPI
MPI-1 uses a static process model: the number of processes is fixed when the MPI program is started. Typically, all MPI processes execute the same program in SPMD style. For coordinated I/O behavior, it is essential that only one specific process performs the I/O operations.
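A minimal sketch of this SPMD structure (assuming only the standard MPI C interface; the printed message is illustrative):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);                /* the process set is fixed from here on */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */
    if (rank == 0)                         /* only one specific process does I/O */
        printf("running with %d processes\n", size);
    MPI_Finalize();
    return 0;
}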
Semantic Definitions
- Blocking operation: the return of control to the calling process indicates that all resources specified in the call, such as buffers, can be reused. All state transitions initiated by a blocking operation are completed before control returns.
- Non-blocking operation: the call may return before all effects of the operation are completed and before the resources specified in the call can be reused.
- Synchronous communication: the communication operation does not complete before both processes have started their communication operations.
- Asynchronous communication: the sender can execute its communication operation without any coordination with the receiving process.
MPI point-to-point communication
Send operation (blocking, asynchronous). Parameters of MPI_Send():
- smessage: the send buffer containing the data elements to be sent;
- count: the number of elements to be sent from the send buffer;
- datatype: the data type of the entries of the send buffer;
- dest: the rank of the target process that should receive the data. Each process of a communicator has a unique rank; ranks are numbered from 0 to the number of processes minus one;
- tag: used by the receiver to distinguish different messages from the same sender;
- comm: the communicator (the communication domain).
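For reference, the C prototype of this blocking send operation:

int MPI_Send(void *smessage, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm);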
MPI point-to-point communication
Receive operation (blocking, asynchronous). Parameters of MPI_Recv():
- rmessage: the receive buffer in which the message is to be stored;
- count: the maximum number of elements to be received;
- datatype: the data type of the elements to be received;
- source: the rank of the sending process;
- tag: the message tag that the message to be received must have;
- comm: the communicator used for the communication;
- status: a data structure containing information about the message after completion of the receive operation: status.MPI_SOURCE, status.MPI_TAG, status.MPI_ERROR.
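For reference, the C prototype of the blocking receive; the number of elements actually received can afterwards be queried from status with MPI_Get_count():

int MPI_Recv(void *rmessage, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status);

/* number of elements actually received */
int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count);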
Predefined data types for MPI
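The table on this slide did not survive extraction; the most commonly used predefined MPI datatypes and their C counterparts are:

MPI_CHAR (char), MPI_SHORT (short), MPI_INT (int), MPI_LONG (long),
MPI_UNSIGNED (unsigned int), MPI_FLOAT (float), MPI_DOUBLE (double),
MPI_LONG_DOUBLE (long double), MPI_BYTE and MPI_PACKED (no C equivalent).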
System Buffer
If the message is first copied into an internal system buffer of the runtime system, the sender can continue its operations as soon as the copy into the system buffer is completed; the matching MPI_Recv() does not need to have been started by then. This has the advantage that the sender is not blocked for a long period of time. The drawback is that the system buffer needs additional memory space and that copying into it requires additional execution time.
MPI Example 1
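The example code on this slide was lost in extraction; a minimal sketch of a typical first point-to-point program (buffer contents and message size are assumptions):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, buf[4] = {1, 2, 3, 4};
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0)          /* process 0 sends four integers to process 1 */
        MPI_Send(buf, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1) {   /* process 1 receives them */
        MPI_Recv(buf, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("received %d %d %d %d\n", buf[0], buf[1], buf[2], buf[3]);
    }
    MPI_Finalize();
    return 0;
}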
Deadlocks with Point-to-point Communications
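The illustration on this slide was lost in extraction; the classic deadlock pattern such slides show is both processes posting a blocking receive first (a sketch, with assumed buffer names):

int rank, sbuf[4] = {0}, rbuf[4];
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0) {           /* both processes block in MPI_Recv()... */
    MPI_Recv(rbuf, 4, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
    MPI_Send(sbuf, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
} else if (rank == 1) {    /* ...so neither send is ever reached: deadlock */
    MPI_Recv(rbuf, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    MPI_Send(sbuf, 4, MPI_INT, 0, 0, MPI_COMM_WORLD);
}
/* sending first on both sides is only safe if the runtime buffers the
   messages; with synchronous sends it deadlocks as well */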
Deadlock-free send & receive
MPI provides a combined send-receive operation that avoids such deadlocks: MPI_Sendrecv() uses two different (disjoint) buffers for sending and receiving, while MPI_Sendrecv_replace() uses one identical buffer for both.
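For reference, the C prototypes of the two combined operations:

int MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 int dest, int sendtag,
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 int source, int recvtag,
                 MPI_Comm comm, MPI_Status *status);

int MPI_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
                         int dest, int sendtag, int source, int recvtag,
                         MPI_Comm comm, MPI_Status *status);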
Nonblocking Communication Modes
MPI_Request: an opaque object that identifies a pending nonblocking communication operation and is used later to query or wait for its completion.
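The nonblocking send and receive operations each return such a request object:

int MPI_Isend(void *buf, int count, MPI_Datatype datatype,
              int dest, int tag, MPI_Comm comm, MPI_Request *request);

int MPI_Irecv(void *buf, int count, MPI_Datatype datatype,
              int source, int tag, MPI_Comm comm, MPI_Request *request);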
Test & Wait
MPI_Test() returns flag = 1 (true) if the communication operation identified by request has been completed; otherwise flag = 0 (false) is returned. status is undefined if the specified receive operation has not yet been completed. MPI_Wait() blocks until the operation identified by request has completed.
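For reference, the C prototypes:

int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status);
int MPI_Wait(MPI_Request *request, MPI_Status *status);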
MPI Example 2 (ring data gathering)
MPI Example 2 (using blocking communications)
MPI Example 2 (nonblocking communications)
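The code on these three slides was lost in extraction; a sketch of the ring data-gathering pattern in its nonblocking variant (the array bound MAX_P and the local datum are assumptions): each process contributes one value, and after p-1 shift steps around the ring every process holds all p values.

#define MAX_P 64                        /* assumed upper bound on #processes */

int p, me, left, right, step;
int vals[MAX_P];
MPI_Request sreq, rreq;
MPI_Status st;

MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &me);
left  = (me - 1 + p) % p;               /* neighbor we receive from */
right = (me + 1) % p;                   /* neighbor we send to */
vals[me] = me;                          /* stand-in for this process's datum */

for (step = 0; step < p - 1; step++) {
    int sidx = (me - step + p) % p;     /* block forwarded in this step */
    int ridx = (me - step - 1 + p) % p; /* block arriving in this step */
    MPI_Isend(&vals[sidx], 1, MPI_INT, right, 0, MPI_COMM_WORLD, &sreq);
    MPI_Irecv(&vals[ridx], 1, MPI_INT, left, 0, MPI_COMM_WORLD, &rreq);
    MPI_Wait(&sreq, &st);               /* independent work could overlap here */
    MPI_Wait(&rreq, &st);
}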
Communication Modes
- Standard mode: the MPI runtime system decides whether outgoing messages are buffered in a local system buffer or not (this is the mode of MPI_Send()).
- Synchronous mode: a send operation does not complete before the corresponding receive operation has been started and the receiving process has started to receive the data. A send in synchronous mode is provided by MPI_Ssend().
- Buffered mode: control returns to the sending process even if the corresponding receive operation has not yet been started; the send buffer can be reused immediately after control returns. A send in buffered mode is performed by MPI_Bsend(). In buffered mode, the buffer space used by the runtime system must be provided by the programmer.
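Buffered mode requires attaching a user-provided buffer before MPI_Bsend() can be used; a minimal sketch (buffer size, destination, and tag are assumptions):

#include <stdlib.h>

double data[512];
int bufsize = 512 * sizeof(double) + MPI_BSEND_OVERHEAD;
void *buf = malloc(bufsize);
MPI_Buffer_attach(buf, bufsize);    /* hand the buffer to the runtime */
MPI_Bsend(data, 512, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
MPI_Buffer_detach(&buf, &bufsize);  /* blocks until buffered sends are done */
free(buf);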
Collective Communication Operations
Broadcast operation: data blocks sent by MPI_Bcast() cannot be received by an MPI_Recv(). The MPI runtime system guarantees that broadcast messages are received in the same order in which they were sent by the root.
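For reference, the C prototype; every process of the communicator, including the root, must call MPI_Bcast() with the same root parameter:

int MPI_Bcast(void *message, int count, MPI_Datatype datatype,
              int root, MPI_Comm comm);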
Collective Communication Operations
2. Reduction operation:
- The parameter recvbuf specifies the receive buffer, which is provided by the root process root.
- All participating processes must specify the same values for the parameters count, type, op, and root.
- MPI supports the definition of user-defined reduction operations. The supplied function must have the following four parameters: void *in, void *inout, int *len, MPI_Datatype *type; it combines the elements of in into inout.
- The user-defined function must be associative. The parameter commute of MPI_Op_create() specifies whether it is also commutative (commute = 1) or not (commute = 0).
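For reference, the C prototypes of the reduction operation and of the routine that registers a user-defined operation:

int MPI_Reduce(void *sendbuf, void *recvbuf, int count,
               MPI_Datatype type, MPI_Op op, int root, MPI_Comm comm);

int MPI_Op_create(MPI_User_function *function, int commute, MPI_Op *op);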
Predefined Reduction Operations
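The table on this slide did not survive extraction; the predefined reduction operations are:

MPI_MAX (maximum), MPI_MIN (minimum), MPI_SUM (sum), MPI_PROD (product),
MPI_LAND / MPI_LOR / MPI_LXOR (logical and/or/xor),
MPI_BAND / MPI_BOR / MPI_BXOR (bitwise and/or/xor),
MPI_MAXLOC / MPI_MINLOC (maximum/minimum value and its location).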
The MPI_MINLOC and MPI_MAXLOC operations work on pairs of values, each consisting of a value and an index. The data type provided must therefore represent such a pair (e.g., MPI_DOUBLE_INT for a double value paired with an int index).
MPI Example (MAXLOC reduction)
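The example code was lost in extraction; a sketch of a typical MAXLOC reduction that finds the global maximum and the rank holding it (local_max is an assumed local variable):

struct { double val; int rank; } in, out;
int me;
MPI_Comm_rank(MPI_COMM_WORLD, &me);
in.val  = local_max;   /* local_max: assumed to hold this process's maximum */
in.rank = me;
/* MPI_DOUBLE_INT matches the (double, int) pair layout required by MAXLOC */
MPI_Reduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);
/* on process 0: out.val is the global maximum, out.rank the rank holding it */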
MPI Example (Scalar Product)
MPI_Bcast(&m, 1, MPI_INT, 0, MPI_COMM_WORLD);
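Only this single broadcast of the vector length m survived extraction; a sketch of how the surrounding scalar-product computation typically continues (the array bound MAX_N and the block partitioning are assumptions; p is assumed to divide m):

#define MAX_N 1024                     /* assumed maximum vector length */

double x[MAX_N], y[MAX_N];             /* assumed to hold the input vectors */
double local_sum = 0.0, sum;
int p, me, i;
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &me);
int lo = me * (m / p), hi = lo + (m / p);
for (i = lo; i < hi; i++)              /* partial scalar product of my block */
    local_sum += x[i] * y[i];
MPI_Reduce(&local_sum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);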
Reduction Example (Complex Product)
typedef struct {
    double real, imag;
} Complex;

/* the user-defined reduction function: combines in into inout, elementwise */
void myProd(Complex *in, Complex *inout, int *len, MPI_Datatype *dptr)
{
    int i;
    Complex c;
    for (i = 0; i < *len; ++i) {
        c.real = inout->real * in->real - inout->imag * in->imag;
        c.imag = inout->real * in->imag + inout->imag * in->real;
        *inout = c;
        in++; inout++;
    }
}

/* and, to call it... */
...
/* each process has an array of 100 Complexes */
Complex a[100], answer[100];
MPI_Op myOp;
MPI_Datatype ctype;

/* explain to MPI how type Complex is defined */
MPI_Type_contiguous(2, MPI_DOUBLE, &ctype);
MPI_Type_commit(&ctype);

/* create the complex-product user-op; 1 marks it as commutative */
MPI_Op_create((MPI_User_function *)myProd, 1, &myOp);

MPI_Reduce(a, answer, 100, ctype, myOp, root, comm);
Collective Communication Operations
3. Gather operation:
- If each process may provide a different number of elements to be collected, the variant MPI_Gatherv() is used.
- It takes an integer array recvcounts of length p, where recvcounts[i] denotes the number of elements provided by process i.
- An additional parameter displs, following recvcounts, is also an integer array of length p; displs[i] specifies the position in the receive buffer of the root process at which the data block of process i is stored.
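For reference, the C prototypes of both variants:

int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
               void *recvbuf, int recvcount, MPI_Datatype recvtype,
               int root, MPI_Comm comm);

int MPI_Gatherv(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int *recvcounts, int *displs,
                MPI_Datatype recvtype, int root, MPI_Comm comm);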
MPI Example (Gather)
Collective Communication Operations
4. Scatter operation:
- There is a generalized version, MPI_Scatterv(), with which the root process can provide data blocks of different sizes.
- An integer array sendcounts is used, where sendcounts[i] denotes the number of elements sent to process i.
- The parameter displs is also an integer array with p entries; displs[i] specifies the position in the send buffer of the root process from which the data block for process i is taken.
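For reference, the C prototypes of both variants:

int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm comm);

int MPI_Scatterv(void *sendbuf, int *sendcounts, int *displs,
                 MPI_Datatype sendtype, void *recvbuf, int recvcount,
                 MPI_Datatype recvtype, int root, MPI_Comm comm);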
MPI Example (Scatter)
Collective Communication Operations
5. Multi-broadcast: each process sends its data block to every other process, so that afterwards every process holds the data blocks of all processes; in MPI this is realized by MPI_Allgather().
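For reference, the C prototype; every process acts as both sender and receiver, so no root parameter is needed:

int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  MPI_Comm comm);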
Collective Communication Operations
6. Multi-accumulation: a reduction whose result is made available to all participating processes; in MPI this is provided by MPI_Allreduce(). The slide shows an MPI program piece that computes a matrix-vector multiplication with a column-blockwise distribution of the matrix.
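The program piece itself was lost in extraction; a sketch of the pattern (with a column-blockwise distribution each process owns columns lo..hi-1, computes a full-length partial result vector, and the partial vectors are summed on all processes; the array bound MAX_N and the variables n, lo, hi are assumed to be set elsewhere):

double A[MAX_N][MAX_N], x[MAX_N], partial[MAX_N], y[MAX_N];
int i, j;
for (i = 0; i < n; i++) {              /* partial result over my columns */
    partial[i] = 0.0;
    for (j = lo; j < hi; j++)
        partial[i] += A[i][j] * x[j];
}
/* multi-accumulation: sum the partial vectors; result on all processes */
MPI_Allreduce(partial, y, n, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);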
Collective Communication Operations
7. Total exchange: each process provides a separate data block for every other process; in MPI this is realized by MPI_Alltoall().
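For reference, the C prototype:

int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 MPI_Comm comm);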
Process Groups and Communicators
- MPI does not provide mechanisms to specify the initial allocation of processes to an MPI computation or their binding to physical processors.
- A process group (or group for short) is an ordered set of processes of an application program. Each process of a group gets a uniquely defined process number, also called its rank.
- Collective communication operations can be restricted to process groups by using the corresponding communicators.
- In MPI, the predefined communicator MPI_COMM_WORLD and its corresponding group are used as the starting point.
- The process group of a given communicator can be obtained by calling MPI_Comm_group(), shown below.
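For reference, the C prototype:

int MPI_Comm_group(MPI_Comm comm, MPI_Group *group);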
Operations on Process Groups
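The routines shown on this slide were lost in extraction; the standard group manipulation operations they most likely covered are:

int MPI_Group_union(MPI_Group group1, MPI_Group group2, MPI_Group *newgroup);
int MPI_Group_intersection(MPI_Group group1, MPI_Group group2, MPI_Group *newgroup);
int MPI_Group_difference(MPI_Group group1, MPI_Group group2, MPI_Group *newgroup);
int MPI_Group_incl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup);
int MPI_Group_excl(MPI_Group group, int n, int *ranks, MPI_Group *newgroup);
int MPI_Group_size(MPI_Group group, int *size);
int MPI_Group_rank(MPI_Group group, int *rank);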
Two groups can be compared with MPI_Group_compare(group1, group2, &res):
- res = MPI_IDENT: both groups contain the same processes in the same order.
- res = MPI_SIMILAR: both groups contain the same processes, but in different order.
- res = MPI_UNEQUAL: the two groups contain different processes.
Operations on Communicators
The analogous comparison for communicators is MPI_Comm_compare(comm1, comm2, &res):
- res = MPI_IDENT: both communicators are the same.
- res = MPI_CONGRUENT: the associated groups contain the same processes with the same ranks, but the communicators themselves differ.
- res = MPI_SIMILAR: the associated groups contain the same processes, but with different ranks.
- res = MPI_UNEQUAL: the two groups contain different processes.
MPI Example (group creation)
int me, count, count2;
void *send_buf, *recv_buf, *send_buf2, *recv_buf2;
MPI_Group MPI_GROUP_WORLD, grprem;
MPI_Comm commslave;
static int ranks[] = {0};

MPI_Comm_group(MPI_COMM_WORLD, &MPI_GROUP_WORLD);
MPI_Comm_rank(MPI_COMM_WORLD, &me);                  /* local */
MPI_Group_excl(MPI_GROUP_WORLD, 1, ranks, &grprem);  /* local: group without process 0 */
MPI_Comm_create(MPI_COMM_WORLD, grprem, &commslave); /* MPI_COMM_NULL on process 0 */
if (me != 0) {
    /* compute on the slaves */
    MPI_Reduce(send_buf, recv_buf, count, MPI_INT, MPI_SUM, 1, commslave);
}
/* zero falls through immediately to this reduce, others do later... */
MPI_Reduce(send_buf2, recv_buf2, count2, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (commslave != MPI_COMM_NULL)     /* only members of commslave may free it */
    MPI_Comm_free(&commslave);
MPI_Group_free(&MPI_GROUP_WORLD);
MPI_Group_free(&grprem);