Message-Passing Computing Message Passing Interface (MPI)

Message-Passing Computing Message Passing Interface (MPI) CSE 661 – Parallel and Vector Architectures Slide 1

Message-Passing Programming using User-level Message-Passing Libraries Two primary mechanisms needed: 1. A method of creating separate processes for execution on different computers 2. A method of sending and receiving messages CSE 661 – Parallel and Vector Architectures Slide 2

Single Program Multiple Data (SPMD) Model
Different processes merged into one program. Control statements select different parts for each processor to execute. All executables started together - static process creation. This is the basic MPI way.
[Figure: a single source file is compiled to suit each processor, producing executables for Processor 0 through Processor p - 1.]
CSE 661 – Parallel and Vector Architectures Slide 3

Multiple Program Multiple Data (MPMD) Model
Separate programs for each processor. One processor executes the master process. Other processes are started from within the master process - dynamic process creation.
[Figure: Process 1 starts execution of Process 2 with spawn(); time runs downward.]
CSE 661 – Parallel and Vector Architectures Slide 4

Basic “point-to-point” Send and Receive Routines
Passing a message between processes using send() and recv() library calls:
[Figure: process 1 calls send(&x, 2); process 2 calls recv(&y, 1); data moves from x in process 1 to y in process 2.]
Generic syntax (actual formats later).
CSE 661 – Parallel and Vector Architectures Slide 5

Synchronous Message Passing Routines that actually return when message transfer completed Synchronous send routine Waits until complete message can be accepted by the receiving process before sending the message Synchronous receive routine Waits until the message it is expecting arrives Synchronous routines intrinsically perform two actions: They transfer data and they synchronize processes CSE 661 – Parallel and Vector Architectures Slide 6

Synchronous send() and recv() using 3-way protocol
[Figure (a), when send() occurs before recv(): process 1 issues a request to send and suspends; when process 2 reaches recv(), an acknowledgment is returned, the message is transferred, and both processes continue.]
[Figure (b), when recv() occurs before send(): process 2 suspends in recv(); when process 1 reaches send(), the request to send is acknowledged, the message is transferred, and both processes continue.]
CSE 661 – Parallel and Vector Architectures Slide 7

Asynchronous Message Passing Routines that do not wait for actions to complete before returning. Usually require local storage for messages. More than one version depending upon the actual semantics for returning. In general, they do not synchronize processes but allow processes to move forward sooner. Must be used with care. CSE 661 – Parallel and Vector Architectures Slide 8

MPI Definitions of Blocking and Non-Blocking Blocking - return after their local actions complete, though the message transfer may not have been completed. Non-blocking - return immediately. Assumes that data storage used for transfer not modified by subsequent statements prior to being used for transfer, and it is left to the programmer to ensure this. These terms may have different interpretations in other systems. CSE 661 – Parallel and Vector Architectures Slide 9

How message-passing routines return before message transfer completed
Message buffer needed between source and destination to hold message:
[Figure: process 1 calls send(), the message is copied into a message buffer, and process 1 continues; later, process 2 calls recv() and reads the message from the buffer.]
CSE 661 – Parallel and Vector Architectures Slide 10

Asynchronous routines changing to synchronous routines Once local actions have completed and the message is safely on its way, the sending process can continue with subsequent work. Buffers are only of finite length, however, and a point can be reached where the send routine is held up because all available buffer space is exhausted. The send routine then waits until storage becomes available again - i.e., it then behaves as a synchronous routine. CSE 661 – Parallel and Vector Architectures Slide 11

Message Tag Used to differentiate between different types of messages being sent. Message tag is carried within message. If special type matching is not required, a wild card message tag is used with recv(), so that it will match with any send(). CSE 661 – Parallel and Vector Architectures Slide 12

Message Tag Example
To send a message, x, with message tag 5 from a source process, 1, to a destination process, 2, and assign to y:
[Figure: process 1 calls send(&x, 2, 5); process 2 calls recv(&y, 1, 5), which waits for a message from process 1 with a tag of 5; data moves from x to y.]
CSE 661 – Parallel and Vector Architectures Slide 13

“Group” message passing routines Have routines that send message(s) to a group of processes or receive message(s) from a group of processes Higher efficiency than separate point-to-point routines, although not absolutely necessary depending on implementation CSE 661 – Parallel and Vector Architectures Slide 14

Broadcast
Sending same message to all processes including the root (the sender) process. Multicast - sending same message to a defined group of processes.
[Figure: each of process 0 through process p - 1 calls bcast(); the contents of buf in the root are copied into the data buffer of every process.]
CSE 661 – Parallel and Vector Architectures Slide 15

Scatter
Sending each element of an array in root process to a separate process. Contents of ith location of array sent to ith process.
[Figure: each of process 0 through process p - 1 calls scatter(); element i of the root's buf arrives in the data buffer of process i.]
CSE 661 – Parallel and Vector Architectures Slide 16

Gather
Having one process collect individual values from a set of processes.
[Figure: each of process 0 through process p - 1 calls gather(); the value from process i is placed in location i of the root's buf.]
CSE 661 – Parallel and Vector Architectures Slide 17

Reduce
Gather operation combined with specified arithmetic/logical operation. Alternative action: the values could be gathered and then added together by the root.
[Figure: each of process 0 through process p - 1 calls reduce(); the contributed values are combined (here with +) into the root's buf.]
CSE 661 – Parallel and Vector Architectures Slide 18

MPI (Message Passing Interface)
Message passing library standard developed by a group of academics and industrial partners to foster more widespread use and portability.
Defines routines, not implementation.
MPI 1 defined 120+ functions (MPI 2 added more); only a few (about 20) are needed to write programs.
All MPI routines start with the prefix MPI_ (C version).
Several free implementations exist: MPICH (Argonne National Laboratory), LAM/MPI (Ohio Supercomputing Center).
CSE 661 – Parallel and Vector Architectures Slide 19

MPI Process Creation and Execution Purposely not defined - Will depend upon implementation. Only static process creation supported in MPI version 1. All processes must be defined prior to execution and started together. Originally SPMD model of computation. MPMD also possible with static creation - each program to be started together specified. CSE 661 – Parallel and Vector Architectures Slide 20

Unsafe message passing - Example
[Figure (a), intended behavior: process 0's user-level send(…,1,…) is meant for process 1's user-level recv(…,0,…), while the lib() calls on each process exchange their own messages.]
[Figure (b), possible runtime behavior: because sends and receives are matched only by source and destination, a library message may be picked up by the user's recv() and vice versa, mixing user and library messages.]
CSE 661 – Parallel and Vector Architectures Slide 21

MPI Solution: Communicators Defines a communication domain - a set of processes that are allowed to communicate between themselves. Communication domains of libraries can be separated from that of a user program. Used in all point-to-point and collective MPI message-passing communications. CSE 661 – Parallel and Vector Architectures Slide 22

Communicators Defines scope of a communication operation. Processes have ranks associated with communicator. Initially, all processes enrolled in a “universe” called MPI_COMM_WORLD, and each process is given a unique rank number from 0 to p - 1, with p processes. Other communicators can be established for groups of processes. CSE 661 – Parallel and Vector Architectures Slide 23

Default Communicator MPI_COMM_WORLD Exists as first communicator for all processes existing in the application. A set of MPI routines exists for forming communicators. Processes have a “rank” in a communicator. CSE 661 – Parallel and Vector Architectures Slide 24
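
To illustrate the point about forming communicators, here is a minimal sketch (added for illustration; not part of the original slides) using MPI_Comm_split(), one of the standard routines for creating new communicators from MPI_COMM_WORLD. The color and key values are arbitrary choices for the example:

int world_rank;
MPI_Comm row_comm;                                   /* handle for the new communicator */
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
/* Processes with the same "color" (here world_rank / 4) join the same new
   communicator; the "key" (here world_rank) orders the ranks inside it. */
MPI_Comm_split(MPI_COMM_WORLD, world_rank / 4, world_rank, &row_comm);
/* ... point-to-point or collective communication within row_comm ... */
MPI_Comm_free(&row_comm);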

Using SPMD Computational Model

int main(int argc, char *argv[])
{
    int myrank;
    MPI_Init(&argc, &argv);
    .
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find process rank */
    if (myrank == 0)
        master();
    else
        slave();
    MPI_Finalize();
    return 0;
}

where master() and slave() are to be executed by the master process and slave processes, respectively.
CSE 661 – Parallel and Vector Architectures Slide 25

Initializing and Ending MPI
Initialize the MPI environment: MPI_Init(&argc, &argv) - must be called before any other MPI function; argc is the argument count and argv is the argument vector of main().
Terminate the MPI execution environment: MPI_Finalize()
CSE 661 – Parallel and Vector Architectures Slide 26
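
Putting the two calls together, a minimal complete program (an illustrative sketch, not one of the original slides):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int myrank, numprocs;
    MPI_Init(&argc, &argv);                        /* initialize MPI environment */
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);        /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);      /* total number of processes */
    printf("Process %d of %d\n", myrank, numprocs);
    MPI_Finalize();                                /* terminate MPI environment */
    return 0;
}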

MPI Point-to-Point Communication Uses send and receive routines with message tags as well as communicator Wild card message tags available CSE 661 – Parallel and Vector Architectures Slide 27

MPI Blocking Routines Return when “locally complete” - when location used to hold message can be used again or altered without affecting message being sent. Blocking send will send message and return - does not mean that message has been received, just that process free to move on without adversely affecting message. CSE 661 – Parallel and Vector Architectures Slide 28

Parameters of Blocking Send
MPI_Send(buf, count, datatype, dest, tag, comm)
  void *buf              Address of send buffer
  int count              Number of items to send
  MPI_Datatype datatype  Datatype of each item
  int dest               Rank of destination process
  int tag                Message tag
  MPI_Comm comm          Communicator
CSE 661 – Parallel and Vector Architectures Slide 29

Parameters of Blocking Receive
MPI_Recv(buf, count, datatype, src, tag, comm, status)
  void *buf              Address of receive buffer (loaded)
  int count              Max number of items to receive
  MPI_Datatype datatype  Datatype of each item
  int src                Rank of source process
  int tag                Message tag
  MPI_Comm comm          Communicator
  MPI_Status *status     Status after operation (returned)
CSE 661 – Parallel and Vector Architectures Slide 30

Wildcards and Status in MPI_Recv
  MPI_ANY_SOURCE        matches any source
  MPI_ANY_TAG           matches any tag
Status is a return value:
  status->MPI_SOURCE    rank of the source
  status->MPI_TAG       tag of the message
  status->MPI_ERROR     potential errors
CSE 661 – Parallel and Vector Architectures Slide 31
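
For illustration (not from the original slides), a short sketch that receives from any sender with any tag and then inspects the returned status; it assumes MPI has already been initialized:

int value;
MPI_Status status;
/* Accept a message from any source with any tag, then see who actually sent it. */
MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("Received %d from process %d with tag %d\n", value, status.MPI_SOURCE, status.MPI_TAG);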

Example
To send an integer x from process 0 to process 1:

int myrank;
int tag = 0;
int x;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
    /* input some value for x */
    MPI_Send(&x, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
} else if (myrank == 1) {
    MPI_Recv(&x, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
    . . .
}
CSE 661 – Parallel and Vector Architectures Slide 32

Some Predefined MPI Datatypes
  MPI datatype          C datatype
  MPI_CHAR              signed char
  MPI_SHORT             signed short int
  MPI_INT               signed int
  MPI_UNSIGNED_CHAR     unsigned char
  MPI_UNSIGNED_SHORT    unsigned short int
  MPI_UNSIGNED          unsigned int
  MPI_FLOAT             float
  MPI_DOUBLE            double
CSE 661 – Parallel and Vector Architectures Slide 33
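
A small usage sketch (added here, not on the original slide): the count and datatype arguments together describe the buffer, so an array of doubles is sent as multiple MPI_DOUBLE items. It assumes myrank has already been obtained with MPI_Comm_rank():

double samples[100];
if (myrank == 0)
    MPI_Send(samples, 100, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);    /* 100 doubles to process 1 */
else if (myrank == 1)
    MPI_Recv(samples, 100, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);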

Process Rank and Group Size
MPI_Comm_rank(comm, rank)   returns the rank of the calling process
MPI_Comm_size(comm, size)   returns the size of the group within comm
  MPI_Comm comm   communicator
  int *rank       process rank (returned)
  int *size       size of group (returned)
CSE 661 – Parallel and Vector Architectures Slide 34

MPI Non-Blocking Routines Non-blocking send - MPI_Isend() - will return “immediately” even before source location is safe to be altered. Non-blocking receive - MPI_Irecv() - will return even if no message to accept. The ‘I’ in ‘Isend’ and ‘Irecv’ means Immediate CSE 661 – Parallel and Vector Architectures Slide 35

Non-Blocking Routine Formats
MPI_Isend(buf, count, datatype, dest, tag, comm, request)
MPI_Irecv(buf, count, datatype, src, tag, comm, request)
  void *buf              Address of buffer
  int count              Number of items to send/receive
  MPI_Datatype datatype  Datatype of each item
  int dest/src           Rank of destination/source process
  int tag                Message tag
  MPI_Comm comm          Communicator
  MPI_Request *request   Request handle (returned)
CSE 661 – Parallel and Vector Architectures Slide 36

Completion Detection
Completion detected by MPI_Wait() and MPI_Test().
MPI_Wait(request, status)        Wait until operation completes and then return
MPI_Test(request, flag, status)  Test for completion of a non-blocking operation
  MPI_Request *request   request handle
  MPI_Status *status     same as return status of MPI_Recv()
  int *flag              true if operation completed (returned)
CSE 661 – Parallel and Vector Architectures Slide 37
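
As an illustration (not from the original slides), a sketch that posts a non-blocking receive and polls MPI_Test() so other work can proceed until the message arrives; do_other_work() is a hypothetical placeholder:

int x, flag = 0;
MPI_Request req;
MPI_Status status;
MPI_Irecv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);   /* returns immediately */
do {
    do_other_work();                                      /* hypothetical useful local work */
    MPI_Test(&req, &flag, &status);                       /* flag set true once message received */
} while (!flag);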

Example
Send an integer x from process 0 to process 1 and allow process 0 to continue:

int myrank;
int tag = 0;
int x;
MPI_Request req;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);   /* find rank */
if (myrank == 0) {
    MPI_Isend(&x, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &req);
    compute();
    MPI_Wait(&req, &status);
} else if (myrank == 1) {
    MPI_Recv(&x, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
    . . .
}
CSE 661 – Parallel and Vector Architectures Slide 38

Send Communication Modes
Standard Mode Send - Not assumed that corresponding receive routine has started. Amount of buffering not defined by MPI. If buffering provided, send could complete before receive reached.
Buffered Mode - Send may start and return before a matching receive. Necessary to specify buffer space via routine MPI_Buffer_attach().
Synchronous Mode - Send and receive can start before each other but can only complete together.
Ready Mode - Send can only start if matching receive already reached, otherwise error. Use with care.
CSE 661 – Parallel and Vector Architectures Slide 39

Send Communication Modes – cont’d
Each of the four modes can be applied to both blocking and non-blocking send routines.
  MPI_Send()    MPI_Isend()    Standard send
  MPI_Bsend()   MPI_Ibsend()   Buffered send
  MPI_Ssend()   MPI_Issend()   Synchronous send
  MPI_Rsend()   MPI_Irsend()   Ready send
Only the standard mode is available for the blocking and non-blocking receive routines. Any type of send routine can be used with any type of receive routine.
CSE 661 – Parallel and Vector Architectures Slide 40
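
An illustrative sketch of buffered mode (not part of the original slides): the program must supply buffer space with MPI_Buffer_attach() before calling MPI_Bsend(); MPI_BSEND_OVERHEAD accounts for MPI's per-message bookkeeping. The snippet assumes MPI is initialized and <stdlib.h> is included:

int data[1000];
int bufsize = 1000 * sizeof(int) + MPI_BSEND_OVERHEAD;
char *bsend_buf = (char *) malloc(bufsize);
MPI_Buffer_attach(bsend_buf, bufsize);                    /* give MPI the buffer space */
MPI_Bsend(data, 1000, MPI_INT, 1, 0, MPI_COMM_WORLD);     /* returns once data copied into buffer */
MPI_Buffer_detach(&bsend_buf, &bufsize);                  /* completes outstanding buffered sends */
free(bsend_buf);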

Collective Communication
Involves a set of processes, defined by a communicator. Message tags not present. Principal collective operations:
MPI_Bcast(buf, count, datatype, root, comm)
  Broadcast message from root to all processes in comm, including itself
MPI_Scatter(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
  Scatter a buffer from root in parts to a group of processes
MPI_Gather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
  Gather values for a group of processes, including root
CSE 661 – Parallel and Vector Architectures Slide 41
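
A short combined sketch (not on the original slide), assuming MPI is initialized and the number of processes p divides 64 evenly: a parameter is broadcast to every process and the data array is scattered in blocks of 64/p elements:

int full[64];                         /* whole array, significant only at the root */
int part[64];                         /* receives this process's block of 64/p elements */
int scale, p, myrank, i;
MPI_Comm_size(MPI_COMM_WORLD, &p);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
if (myrank == 0) {
    scale = 10;                                           /* a value every process needs */
    for (i = 0; i < 64; i++) full[i] = i;                 /* root fills the data */
}
MPI_Bcast(&scale, 1, MPI_INT, 0, MPI_COMM_WORLD);         /* same value delivered to all */
MPI_Scatter(full, 64 / p, MPI_INT, part, 64 / p, MPI_INT, 0, MPI_COMM_WORLD);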

Example on MPI_Gather
To gather items from a group of processes into process 0, using dynamically allocated memory in the root process:

int myrank, grp_size;
int *buf;                     /* buffer space is allocated dynamically */
int data[10];                 /* data sent from all processes */
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);            /* find rank */
if (myrank == 0) {
    MPI_Comm_size(MPI_COMM_WORLD, &grp_size);      /* find group size */
    buf = (int *) malloc(grp_size*10*sizeof(int)); /* allocate memory */
}
/* recvcount is the number of items received from EACH process, hence 10, not grp_size*10 */
MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);
CSE 661 – Parallel and Vector Architectures Slide 42

Collective Communication – Cont’d
MPI_Alltoall(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)
  Send data from all processes to all processes
MPI_Reduce(sendbuf, recvbuf, count, datatype, operation, root, comm)
  Combines values on all processes to a single value at root
MPI_Scan(sendbuf, recvbuf, count, datatype, operation, comm)
  Computes the partial reductions of data on a collection of processes
MPI_Barrier(comm)
  Blocks each process until all processes have called it
CSE 661 – Parallel and Vector Architectures Slide 43
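
For illustration (not in the original slides), a small sketch using MPI_Scan() and MPI_Barrier(); rank i ends up with the sum of the contributions from ranks 0 through i (an inclusive prefix sum). It assumes myrank has already been obtained:

int myvalue = myrank + 1;
int prefix_sum;
MPI_Scan(&myvalue, &prefix_sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);  /* inclusive prefix sum */
MPI_Barrier(MPI_COMM_WORLD);              /* no process continues until all have reached here */
printf("Rank %d: prefix sum = %d\n", myrank, prefix_sum);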

Sample MPI program

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 100000

int main(int argc, char *argv[])
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    if (myid == 0) {   /* Open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++)
            fscanf(fp, "%d", &data[i]);
    }
CSE 661 – Parallel and Vector Architectures Slide 44

Sample MPI program – cont’d
Can you guess what this program is doing?

    /* broadcast data */
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);
    /* Add my portion of data */
    x = MAXSIZE / numprocs;
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);
    /* Compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0)
        printf("The sum is %d.\n", result);
    MPI_Finalize();
    return 0;
}
CSE 661 – Parallel and Vector Architectures Slide 45

Visualization Tools
Programs can be watched as they are executed in a space-time diagram (or process-time diagram):
[Figure: space-time diagram for processes 1-3 with time on the horizontal axis; bars distinguish computing, waiting, and time spent in message-passing system routines, with arrows showing messages between processes.]
CSE 661 – Parallel and Vector Architectures Slide 46

Evaluating Programs Empirically
Measuring Execution Time
To measure the execution time between two points in the code, we might have a construction such as:

double t1, t2, elapsed_time;
t1 = MPI_Wtime();               /* start timer */
.
t2 = MPI_Wtime();               /* stop timer */
elapsed_time = t2 - t1;         /* in seconds */
printf("Elapsed time = %5.2f seconds\n", elapsed_time);

MPI provides the routine MPI_Wtime(), which returns the time in seconds since an arbitrary time in the past.
CSE 661 – Parallel and Vector Architectures Slide 47
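
A slightly fuller timing sketch (illustrative, not from the slides): barriers before each reading keep the processes roughly aligned so the interval reflects the parallel section; do_parallel_work() and myrank are assumed from the surrounding program:

double t1, t2;
MPI_Barrier(MPI_COMM_WORLD);              /* line up the processes before starting the clock */
t1 = MPI_Wtime();
do_parallel_work();                       /* hypothetical section being measured */
MPI_Barrier(MPI_COMM_WORLD);
t2 = MPI_Wtime();
if (myrank == 0)
    printf("Elapsed time = %5.2f seconds\n", t2 - t1);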

Step by Step Instructions for Compiling and Executing Programs Using MPICH (Class Demo) CSE 661 – Parallel and Vector Architectures Slide 48
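
The slide defers the details to the class demo; for reference, a typical MPICH session looks roughly like the following (exact commands and options vary with the installation):

mpicc -o prog prog.c          # compile with the MPI wrapper compiler
mpiexec -n 8 ./prog           # run with 8 processes (older setups: mpirun -np 8 ./prog)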