MPI Message Passing Interface


1 MPI Message Passing Interface
Yvon Kermarrec

2 More readings
“Parallel Programming with MPI”, Peter Pacheco, Morgan Kaufmann Publishers
LAM/MPI User Guide
The MPI standard itself, available from the MPI Forum

3 Agenda
Part 0 – the context (slides extracted from a lecture by Hanjun Kin, Princeton U.)
Part 1 – Introduction: basics of parallel computing, six-function MPI, point-to-point communications
Part 2 – Advanced features of MPI: collective communication
Part 3 – Examples and how to program an MPI application

4 Serial Computing
A 1,000-piece puzzle, one person: takes 10 hours

5 Parallelism on Shared Memory
Orange and brown share the puzzle on the same table
Takes 6 hours (not 5, due to communication & contention)

6 The more, the better??
Lack of seats (resource limit)
More contention among people

7 Parallelism on Distributed Systems
Scalable seats (scalable resources)
Less contention, thanks to private memory spaces

8 How to share the puzzle?
DSM (Distributed Shared Memory)
Message Passing

9 DSM (Distributed Shared Memory)
Provides shared memory, physically or virtually
Pros – easy to use
Cons – limited scalability, high coherence overhead

10 Message Passing
Pros – scalable, flexible
Cons – generally considered harder to program than DSM

11 Agenda
Part 1 – Introduction: basics of parallel computing, six-function MPI, point-to-point communications
Part 2 – Advanced features of MPI: collective communication
Part 3 – Examples and how to program an MPI application

12 Agenda
Part 0 – the context (slides extracted from a lecture by Hanjun Kin, Princeton U.)
Part 1 – Introduction: basics of parallel computing, six-function MPI, point-to-point communications
Part 2 – Advanced features of MPI: collective communication
Part 3 – Examples and how to program an MPI application

13 We need more computational power
The weather forecast example by P. Pacheco: suppose we wish to predict the weather over the United States and Canada for the next 48 hours
Also suppose that we want to model the atmosphere from sea level to an altitude of 20 km
We use a cubical grid, with each cube measuring 0.1 km on a side: 2.0 x 10^7 km^2 x 20 km x 10^3 cubes per km^3 = 4 x 10^11 grid points
Suppose we need to compute 100 instructions for each point at each of the 48 hourly steps: we need 4 x 10^13 x 48 operations
If our computer executes 10^9 operations/sec, we need about 23 days
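Filling in the intermediate steps of that estimate:

    grid points:  2.0 x 10^7 km^2 x 20 km x 10^3 cubes/km^3 = 4 x 10^11
    per step:     4 x 10^11 points x 100 instructions = 4 x 10^13 operations
    total:        4 x 10^13 x 48 steps ≈ 1.9 x 10^15 operations
    runtime:      1.9 x 10^15 / 10^9 ops/sec ≈ 1.9 x 10^6 s, about 23 days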

14 The need for parallel programming
We face numerous challenges in science (biology, simulation, earthquakes, …), and we cannot build fast enough computers…
Data can be big (big data…), and memory is rather limited
Processors can do a lot… but to address figures like those above, programming smarter is not enough

15 The need for parallel machines
We can build a parallel machine, but there is still a huge amount of work to be done:
Decide on and implement an interconnection network for the processors and memory modules
Design and implement system software for the hardware
Design algorithms and data structures to solve our problem
Divide the algorithms and data structures into subproblems
Identify the communications and data exchanges
Assign subproblems to processors

16 The need for parallel machines
Flynn’s taxonomy (or how to work more!):
SISD: Single Instruction – Single Data: the common, classical machine
SIMD: Single Instruction – Multiple Data: the same instruction is carried out simultaneously on multiple data items
MIMD: Multiple Instructions – Multiple Data
SPMD: Single Program – Multiple Data: the same program is replicated and run on different data

17 The need for parallel machines
We could build one big parallel computer… but that would be very expensive, time- and energy-consuming, and hard to maintain
We may instead want to integrate what is available in the labs – to aggregate the available computing resources and reuse ordinary machines: the US Department of Energy and the PVM (Parallel Virtual Machine) project, from 1989

18 MPI: Message Passing Interface?
MPI is an interface: a message-passing library specification
An extended message-passing model
Not a language or compiler specification
Not a specific implementation or product
For parallel computers, clusters, and heterogeneous networks
A rich set of features
Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers

19 MPI? An international product
Early vendor systems (Intel’s NX, IBM’s EUI, TMC’s CMMD) were not portable
Early portable systems (PVM, p4, TCGMSG, Chameleon) were mainly research efforts: rather limited, lacking vendor support, and not implemented at the most efficient level
The MPI Forum organized in 1992 with broad participation by:
vendors: IBM, Intel, TMC, SGI, Convex, …
users: application scientists and library writers

20 How big is the MPI library?
Huge (125 functions)… yet basic (6 functions)
Only a small subset is needed to program a distributed application

21 Environments for parallel programming
Upshot, Jumpshot, and the MPE tools
Pallas VAMPIR
ParaGraph

22 A Minimal MPI Program in C
#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    MPI_Init( &argc, &argv );
    printf( "Hello, world!\n" );
    MPI_Finalize();
    return 0;
}

23 Finding Out About the Environment
Two important questions arise early in a parallel program:
How many processes are participating in this computation?
Which one am I?
MPI provides functions to answer these questions:
MPI_Comm_size reports the number of processes
MPI_Comm_rank reports the rank, a number between 0 and size-1, identifying the calling process

24 Better Hello (C)
#include "mpi.h"
#include <stdio.h>

int main( int argc, char *argv[] )
{
    int rank, size;
    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    printf( "I am %d of %d\n", rank, size );
    MPI_Finalize();
    return 0;
}

25 Some Basic Concepts
Processes can be collected into groups
Each message is sent in a context, and must be received in the same context
A group and context together form a communicator
A process is identified by its rank in the group associated with a communicator
There is a default communicator whose group contains all initial processes, called MPI_COMM_WORLD

26 MPI Datatypes
The data in a message to be sent or received is described by a triple (address, count, datatype), where an MPI datatype is recursively defined as:
predefined, corresponding to a data type from the language (e.g., MPI_INT, MPI_DOUBLE_PRECISION)
a contiguous array of MPI datatypes
an indexed array of blocks of datatypes
an arbitrary structure of datatypes
There are MPI functions to construct custom datatypes, such as an array of (int, float) pairs, or a row of a matrix stored columnwise.
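As a concrete illustration of the last point, a minimal sketch (not from the original slides; matrix size and names are illustrative) that builds a derived datatype for one column of a row-major C matrix with MPI_Type_vector:

    #define ROWS 4
    #define COLS 5

    double a[ROWS][COLS];
    MPI_Datatype col;
    /* ROWS blocks of 1 double each, consecutive blocks COLS doubles apart */
    MPI_Type_vector(ROWS, 1, COLS, MPI_DOUBLE, &col);
    MPI_Type_commit(&col);               /* required before use */
    MPI_Send(&a[0][2], 1, col, 1, 0, MPI_COMM_WORLD);  /* send column 2 to rank 1 */
    MPI_Type_free(&col);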

27 Basic MPI types
MPI datatype         C datatype
MPI_CHAR             signed char
MPI_SIGNED_CHAR      signed char
MPI_UNSIGNED_CHAR    unsigned char
MPI_SHORT            signed short
MPI_UNSIGNED_SHORT   unsigned short
MPI_INT              signed int
MPI_UNSIGNED         unsigned int
MPI_LONG             signed long
MPI_UNSIGNED_LONG    unsigned long
MPI_FLOAT            float
MPI_DOUBLE           double
MPI_LONG_DOUBLE      long double

28 MPI Tags
Messages are sent with an accompanying user-defined integer tag, to assist the receiving process in identifying the message
Messages can be screened at the receiving end by specifying a specific tag, or not screened by specifying MPI_ANY_TAG as the tag in a receive
Some non-MPI message-passing systems have called tags “message types”; MPI calls them tags to avoid confusion with datatypes
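To make the screening concrete, a small hypothetical fragment (tags 1 and 2 are arbitrary; assumes two processes with the usual rank setup, and that the implementation buffers these small sends, as eager protocols typically do):

    if (rank == 1) {                    /* sender: two messages, two tags */
        int a = 10, b = 20;
        MPI_Send(&a, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
        MPI_Send(&b, 1, MPI_INT, 0, 2, MPI_COMM_WORLD);
    } else if (rank == 0) {             /* receiver: take the tag-2 message first */
        int x, y;
        MPI_Status st;
        MPI_Recv(&x, 1, MPI_INT, 1, 2, MPI_COMM_WORLD, &st);            /* x == 20 */
        MPI_Recv(&y, 1, MPI_INT, 1, MPI_ANY_TAG, MPI_COMM_WORLD, &st);  /* y == 10 */
    }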

29 MPI blocking send
MPI_Send(void *start, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
The message buffer is described by (start, count, datatype)
dest is the rank of the target process in the given communicator
tag is the message identification number

30 MPI Basic (Blocking) Receive
MPI_RECV(start, count, datatype, source, tag, comm, status)
Waits until a matching (on source and tag) message is received from the system, and the buffer can be used
source is a rank in the communicator specified by comm, or MPI_ANY_SOURCE
status contains further information
Receiving fewer than count occurrences of datatype is OK, but receiving more is an error

31 Retrieving Further Information
Status is a data structure allocated in the user’s program. In C:

int recvd_tag, recvd_from, recvd_count;
MPI_Status status;
MPI_Recv(..., MPI_ANY_SOURCE, MPI_ANY_TAG, ..., &status);
recvd_tag  = status.MPI_TAG;
recvd_from = status.MPI_SOURCE;
MPI_Get_count( &status, datatype, &recvd_count );

32 More info
A receive operation may accept messages from an arbitrary sender, but a send operation must specify a unique receiver
Source equals destination is allowed; that is, a process can send a message to itself

33 Why is MPI simple?
Many parallel programs can be written using just these six functions, only two of which are non-trivial:
MPI_INIT
MPI_FINALIZE
MPI_COMM_SIZE
MPI_COMM_RANK
MPI_SEND
MPI_RECV

34 Simple full example
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    const int tag = 42;              /* Message tag */
    int id, ntasks, source_id, dest_id, err, i;
    MPI_Status status;
    int msg[2];                      /* Message array */

    err = MPI_Init(&argc, &argv);    /* Initialize MPI */
    if (err != MPI_SUCCESS) {
        printf("MPI initialization failed!\n");
        exit(1);
    }
    err = MPI_Comm_size(MPI_COMM_WORLD, &ntasks); /* Get nr of tasks */
    err = MPI_Comm_rank(MPI_COMM_WORLD, &id);     /* Get id of this process */
    if (ntasks < 2) {
        printf("You have to use at least 2 processors to run this program\n");
        MPI_Finalize();              /* Quit if there is only one processor */
        exit(0);
    }

35 Simple full example (Cont.)
    if (id == 0) {                   /* Process 0 (the receiver) does this */
        for (i = 1; i < ntasks; i++) {
            err = MPI_Recv(msg, 2, MPI_INT, MPI_ANY_SOURCE, tag,
                           MPI_COMM_WORLD, &status);  /* Receive a message */
            source_id = status.MPI_SOURCE;            /* Get id of sender */
            printf("Received message %d %d from process %d\n",
                   msg[0], msg[1], source_id);
        }
    } else {                         /* Processes 1 to N-1 (the senders) do this */
        msg[0] = id;                 /* Put own identifier in the message */
        msg[1] = ntasks;             /* and total number of processes */
        dest_id = 0;                 /* Destination address */
        err = MPI_Send(msg, 2, MPI_INT, dest_id, tag, MPI_COMM_WORLD);
    }

    err = MPI_Finalize();            /* Terminate MPI */
    if (id == 0) printf("Ready\n");
    return 0;
}

36 Agenda
Part 0 – the context (slides extracted from a lecture by Hanjun Kin, Princeton U.)
Part 1 – Introduction: basics of parallel computing, six-function MPI, point-to-point communications
Part 2 – Advanced features of MPI: collective communication
Part 3 – Examples and how to program an MPI application

37 Collective communications
A single call handles the communication between all the processes in a communicator
There are 3 types of collective communications:
Data movement (e.g. MPI_Bcast)
Reduction (e.g. MPI_Reduce)
Synchronization (e.g. MPI_Barrier)

38 Broadcast
int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm);
One process (root) sends data to all the other processes in the same communicator
Must be called by all the processes with the same arguments
(Diagram: before, only root P1 holds A B C D; after MPI_Bcast, P1–P4 all hold A B C D)

39 Gather
int MPI_Gather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, int root, MPI_Comm comm)
One process (root) collects data from all the other processes in the same communicator
Must be called by all the processes with the same arguments
(Diagram: P1–P4 hold A, B, C, D respectively; after MPI_Gather, root P1 holds A B C D)
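A hypothetical usage fragment (buffer names are illustrative; assumes rank and size were obtained as on slide 24, and <stdlib.h> for malloc): each process contributes one int, and root 0 receives size ints in rank order:

    int my_val = 100 + rank;              /* this process's contribution */
    int *all = NULL;
    if (rank == 0)
        all = malloc(size * sizeof(int)); /* recvbuf is needed only at root */
    MPI_Gather(&my_val, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);
    /* on rank 0 only: all[i] == 100 + i for every rank i */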

40 Gather to All
int MPI_Allgather(void *sendbuf, int sendcnt, MPI_Datatype sendtype, void *recvbuf, int recvcnt, MPI_Datatype recvtype, MPI_Comm comm)
All processes collect the data from all the other processes in the same communicator
Must be called by all the processes with the same arguments
(Diagram: P1–P4 hold A, B, C, D respectively; after MPI_Allgather, every process holds A B C D)

41 Reduction
int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)
One process (root) collects data from all the processes in the same communicator and combines it with an operation:
MPI_SUM, MPI_MIN, MPI_MAX, MPI_PROD, logical AND, OR, XOR, and a few more
MPI_Op_create(): user-defined operators
(Diagram: P1–P4 hold A, B, C, D respectively; after MPI_Reduce with MPI_SUM, root P1 holds A+B+C+D)
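Putting two collectives together, a minimal sketch (values and names are illustrative): rank 0 broadcasts a problem size, every rank sums a strided slice, and MPI_Reduce combines the partial sums on rank 0:

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, n = 0;
        long long local = 0, total = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0)
            n = 1000000;                     /* parameter known only to root */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* now every rank has n */
        for (int i = rank; i < n; i += size) /* each rank sums a strided slice */
            local += i;
        MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum of 0..%d-1 = %lld\n", n, total);
        MPI_Finalize();
        return 0;
    }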

42 Synchronization
int MPI_Barrier(MPI_Comm comm)

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Barrier(MPI_COMM_WORLD);
    printf("Hello, world. I am %d of %d\n", rank, nprocs);
    MPI_Finalize();
    return 0;
}

43 Examples… Master and slaves
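As a hint of the pattern, a minimal hypothetical master/slaves sketch (not the original author's example): the master sends one work item to each slave and collects the results in whatever order they arrive:

    #include "mpi.h"
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (rank == 0) {                           /* master */
            for (int i = 1; i < size; i++) {
                int work = 10 * i;                 /* illustrative work item */
                MPI_Send(&work, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
            }
            for (int i = 1; i < size; i++) {
                int result;
                MPI_Status st;
                MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, 1,
                         MPI_COMM_WORLD, &st);
                printf("result %d from slave %d\n", result, st.MPI_SOURCE);
            }
        } else {                                   /* slave */
            int work, result;
            MPI_Recv(&work, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            result = work * work;                  /* stand-in for real work */
            MPI_Send(&result, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }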

44 For more functions…
MPICH (http://www-unix.mcs.anl.gov/mpi/mpich/)
Open MPI

