Distributed-Memory (Message-Passing) Paradigm
FDI 2004 Track M
Day 2 – Morning Session #1
C. J. Ribbens
Characteristics of Distributed-Memory Machines
– Scalable interconnection network; some memory is physically local, some remote.
– Message-passing programming model.
– Major issues: network latency and bandwidth, message-passing overhead, data decomposition.
– Incremental parallelization can be difficult.
What is MPI?
– De facto standard API for explicit message-passing SPMD programming.
– Many implementations over many networks.
– Developed in the mid-1990s by a consortium, reflecting lessons learned from machine-specific libraries and PVM.
– Focused on: homogeneous MPPs, high performance, library writing, portability.
– For more information: MPI links.
Six-Function MPI
MPI_Init       – Initialize MPI
MPI_Finalize   – Close it down
MPI_Comm_rank  – Get my process #
MPI_Comm_size  – How many total?
MPI_Send       – Send message
MPI_Recv       – Receive message
MPI “Hello World” in Fortran77

      implicit none
      include 'mpif.h'
      integer myid, numprocs, ierr

      call mpi_init( ierr )
      call mpi_comm_rank( mpi_comm_world, myid, ierr )
      call mpi_comm_size( mpi_comm_world, numprocs, ierr )
      print *, "hello from ", myid, " of ", numprocs
      call mpi_finalize( ierr )
      stop
      end
MPI “Hello World” in C

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myid, numprocs, namelen;
    char processor_name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Get_processor_name(processor_name, &namelen);
    printf("hello from %s: process %d of %d\n",
           processor_name, myid, numprocs);
    MPI_Finalize();
    return 0;
}
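With a typical MPI installation (exact command names vary by implementation), either program can be built with the MPI wrapper compilers and launched on several processes, for example:

    mpicc -o hello hello.c        (or: mpif77 -o hello hello.f)
    mpirun -np 4 ./hello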
Function MPI_Send

int MPI_Send(void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)

MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERR)
<type> BUF(*)
INTEGER COUNT, DATATYPE, DEST, TAG, COMM, IERR

MPI_SEND(buf, count, datatype, dest, tag, comm)
IN  buf       initial address of send buffer (choice)
IN  count     number of entries to send (integer)
IN  datatype  datatype of each entry (handle)
IN  dest      rank of destination (integer)
IN  tag       message tag (integer)
IN  comm      communicator (handle)
Function MPI_Recv

int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status)

MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERR)
<type> BUF(*)
INTEGER COUNT, DATATYPE, SOURCE, TAG, COMM
INTEGER STATUS(MPI_STATUS_SIZE), IERR

MPI_RECV(buf, count, datatype, source, tag, comm, status)
OUT buf       initial address of receive buffer (choice)
IN  count     max number of entries to receive (integer)
IN  datatype  datatype of each entry (handle)
IN  source    rank of source (integer)
IN  tag       message tag (integer)
IN  comm      communicator (handle)
OUT status    return status (Status)
MPI Send and Recv Semantics
– These are “standard mode”, blocking calls: Send returns when buf may be re-used; Recv returns when the data in buf is available.
– count consecutive items of type datatype, beginning at buf, are sent to the process with rank dest.
– tag can be used to distinguish among messages from the same source.
– Messages are non-overtaking.
– Buffering is up to the implementation.
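As a minimal sketch of these semantics (not from the slides), the following C program passes one integer from rank 0 to rank 1; it assumes it is run with at least two processes:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, n = 42, tag = 0;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {
        /* standard-mode blocking send: returns once n may be re-used */
        MPI_Send(&n, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
    } else if (myid == 1) {
        /* blocking receive: returns once the data has arrived in n */
        MPI_Recv(&n, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
        printf("process 1 received %d from process 0\n", n);
    }

    MPI_Finalize();
    return 0;
}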
MPI Topics
FDI 2004 Track M
Day 2 – Morning Session #2
C. J. Ribbens
MPI Communicators
– A communicator can be thought of as a set of processes; every communication event takes place in the context of a particular communicator.
– MPI_COMM_WORLD is the initial set of processes (note: MPI-1 has a static process model).
– Why do communicators exist?
  – Collective operations over subsets of processes (see the sketch below)
  – Can define special topologies for sets of processes
  – Separate communication contexts for libraries
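One standard MPI-1 routine for building such subsets is MPI_Comm_split. A minimal sketch (not from the slides), splitting MPI_COMM_WORLD into even-rank and odd-rank communicators:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, color, subid;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* processes passing the same color end up in the same new
       communicator; key (here myid) orders ranks within it */
    color = myid % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, myid, &subcomm);
    MPI_Comm_rank(subcomm, &subid);

    printf("world rank %d has rank %d in subcommunicator %d\n",
           myid, subid, color);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}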
MPI Collective Operations
– An operation over a communicator.
– Must be called by every member of the communicator.
– Three classes of collective operations:
  – Synchronization (MPI_Barrier)
  – Data movement
  – Collective computation
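A minimal sketch (not from the slides) showing one routine from each class, all over MPI_COMM_WORLD:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, numprocs, n, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    /* synchronization: no process proceeds until all have arrived */
    MPI_Barrier(MPI_COMM_WORLD);

    /* data movement: the root (rank 0) broadcasts n to everyone */
    n = (myid == 0) ? 100 : 0;
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* collective computation: sum every process's rank onto rank 0 */
    MPI_Reduce(&myid, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0)
        printf("n = %d, sum of ranks = %d\n", n, sum);

    MPI_Finalize();
    return 0;
}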
Collective Patterns (Gropp)
Collective Computation Patterns (Gropp)
Collective Routines (Gropp)
– Many routines: Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast, Gather, Gatherv, Reduce, ReduceScatter, Scan, Scatter, Scatterv.
– “All” versions deliver results to all participating processes.
– “V” versions allow the chunks to have different sizes (see the sketch below).
– Allreduce, Reduce, ReduceScatter, and Scan take both built-in and user-defined combination functions.
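As an illustration of the “V” idea, this sketch (not from the slides) uses MPI_Scatterv so that rank i receives a chunk of i+1 entries:

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int myid, numprocs, i, mycount, total = 0;
    int *sendbuf = NULL, *sendcounts = NULL, *displs = NULL, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    mycount = myid + 1;                  /* rank i gets i+1 entries */
    recvbuf = malloc(mycount * sizeof(int));

    if (myid == 0) {
        /* the root lays out chunks of sizes 1, 2, ..., numprocs,
           recording each chunk's size and its offset in sendbuf */
        sendcounts = malloc(numprocs * sizeof(int));
        displs     = malloc(numprocs * sizeof(int));
        for (i = 0; i < numprocs; i++) {
            sendcounts[i] = i + 1;
            displs[i]     = total;
            total        += i + 1;
        }
        sendbuf = malloc(total * sizeof(int));
        for (i = 0; i < total; i++)
            sendbuf[i] = i;
    }

    /* sendcounts/displs are significant only at the root */
    MPI_Scatterv(sendbuf, sendcounts, displs, MPI_INT,
                 recvbuf, mycount, MPI_INT, 0, MPI_COMM_WORLD);

    printf("rank %d received %d entries starting with %d\n",
           myid, mycount, recvbuf[0]);

    MPI_Finalize();
    return 0;
}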
MPI topics not covered …
– Topologies: Cartesian and graph
– User-defined types
– Message-passing modes
– Intercommunicators
– MPI-2 topics: MPI I/O, remote memory access, dynamic process management