Message Passing Interface (MPI) and Parallel Algorithm Design
What is MPI?
A message-passing library specification:
–a message-passing model
–not a compiler specification
–not a specific product
For parallel computers, clusters, and heterogeneous networks.
Full-featured.
Why use MPI? (1)
Message passing is now mature as a programming paradigm:
–well understood
–efficient match to hardware
–many applications
Who Designed MPI?
Vendors
–IBM, Intel, Sun, SGI, Meiko, Cray, Convex, nCUBE, …
Research labs
–PVM, p4, Zipcode, TCGMSG, Chameleon, Express, Linda, PM (Japan RWCP), AM (Berkeley), FM (HPVM at Illinois)
Vendor-Supported MPI
–HP-MPI: HP; Convex SPP
–MPI-F: IBM SP1/SP2
–Hitachi/MPI: Hitachi
–SGI/MPI: SGI PowerChallenge series
–MPI/DE: NEC
–INTEL/MPI: Intel Paragon (iCC lib)
–T.MPI: Telmat Multinode
–Fujitsu/MPI: Fujitsu AP1000
–EPCC/MPI: Cray & EPCC, T3D/T3E
Research MPI
–MPICH: Argonne National Lab. & Mississippi State U.
–LAM: Ohio Supercomputer Center
–MPICH/NT: Mississippi State U.
–MPI-FM: Illinois (Myrinet)
–MPI-AM: UC Berkeley (Myrinet)
–MPI-PM: RWCP, Japan (Myrinet)
–MPI-CCL: Calif. Tech.
Research MPI (cont.)
–CRI/EPCC MPI: Cray Research and Edinburgh Parallel Computing Centre (Cray T3D/E)
–MPI-AP: Australian National U. (AP1000), CAP Research Program
–W32MPI: Illinois, Concurrent Systems
–RACE-MPI: Hughes Aircraft Co.
–MPI-BIP: INRIA, France (Myrinet)
Language Binding
MPI 1: C, Fortran (for MPICH-based implementations)
MPI 2: C, C++, Fortran
Java:
–through the Java Native Interface (JNI): mpiJava, JavaMPI
–the MPI package implemented in pure Java: MPIJ (DOGMA project)
–JMPI (by MPI Software Technology)
Main Features of MPI
“Communicator”
Identifies the process group and context with respect to which an operation is to be performed.
In a parallel environment, processes need to know each other (“naming”: machine name, IP address, process ID).
Communicator (2)
(Figure: four communicators, including one communicator nested within another.)
–Processes in different communicators cannot communicate.
–The same process can exist in different communicators.
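Below is a minimal C sketch (not from the slides) of one process belonging to several communicators at once; it uses MPI_Comm_split with an even/odd-rank split purely as an illustrative assumption.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int world_rank, world_size;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split MPI_COMM_WORLD by parity of the rank: every process now
       belongs to two communicators, MPI_COMM_WORLD and row_comm. */
    MPI_Comm row_comm;
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &row_comm);

    int row_rank, row_size;
    MPI_Comm_rank(row_comm, &row_rank);
    MPI_Comm_size(row_comm, &row_size);

    printf("World rank %d of %d is rank %d of %d in its sub-communicator\n",
           world_rank, world_size, row_rank, row_size);

    MPI_Comm_free(&row_comm);
    MPI_Finalize();
    return 0;
}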
Point-to-Point Communication
The basic point-to-point communication operations are send and receive.
Communication modes:
–standard mode (blocking and non-blocking)
–synchronous mode
–ready mode (to allow access to fast protocols)
–buffered mode
–…
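The following two-process C sketch (not from the slides) contrasts a blocking standard-mode send with its non-blocking counterpart; the tag values and payload are arbitrary assumptions.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int data = 42, tag = 0;
    if (rank == 0) {
        /* Blocking standard-mode send: returns once the buffer is reusable. */
        MPI_Send(&data, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);

        /* Non-blocking send: returns immediately, completed by MPI_Wait. */
        MPI_Request req;
        MPI_Isend(&data, 1, MPI_INT, 1, tag + 1, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
        int recv1, recv2;
        MPI_Recv(&recv1, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(&recv2, 1, MPI_INT, 0, tag + 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d and %d\n", recv1, recv2);
    }

    MPI_Finalize();
    return 0;
}

MPI_Ssend, MPI_Rsend, and MPI_Bsend take the same argument list as MPI_Send and select the synchronous, ready, and buffered modes respectively.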
Collective Communication
Communication that involves a group of processes, e.g., broadcast, barrier, reduce, scatter, gather, all-to-all, …
MPI Programming
Writing MPI Programs
MPI comprises 125 functions.
Many parallel programs can be written with just 6 basic functions.
Six Basic Functions
1. MPI_INIT: initiate an MPI computation
2. MPI_FINALIZE: terminate a computation
3. MPI_COMM_SIZE: determine the number of processes in a communicator
4. MPI_COMM_RANK: determine the identifier (rank) of a process in a specific communicator
5. MPI_SEND: send a message from one process to another
6. MPI_RECV: receive a message sent by another process
A Simple Program

Program main
begin
  MPI_INIT()                              – initiate computation
  MPI_COMM_SIZE(MPI_COMM_WORLD, count)    – find the number of processes
  MPI_COMM_RANK(MPI_COMM_WORLD, myid)     – find the rank of the current process
  print("I am ", myid, " of ", count)     – each process prints its own output
  MPI_FINALIZE()                          – shut down
end
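For reference, a compilable C rendering of this pseudocode (a sketch; it keeps the slide's variable names count and myid):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int count, myid;

    MPI_Init(&argc, &argv);                 /* initiate computation */
    MPI_Comm_size(MPI_COMM_WORLD, &count);  /* number of processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);   /* rank of this process */

    printf("I am %d of %d\n", myid, count); /* each process prints its output */

    MPI_Finalize();                         /* shut down */
    return 0;
}

It would typically be built and launched with an implementation's wrappers, e.g. mpicc and mpirun -np 4, though the exact commands vary by MPI implementation.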
Result (4 processes; the output order is nondeterministic)
Process 0:  I am 0 of 4
Process 2:  I am 2 of 4
Process 1:  I am 1 of 4
Process 3:  I am 3 of 4
Another Program (2 nodes)

…
MPI_COMM_RANK(MPI_COMM_WORLD, myid)
if myid = 0                               – I am process 0!
  MPI_SEND("Zero", …, …, 1, …, …)
  MPI_RECV(words, …, …, 1, …, …, …)
else                                      – I am process 1!
  MPI_RECV(words, …, …, 0, …, …, …)
  MPI_SEND("One", …, …, 0, …, …)
end if
print("Received from %s", words)
…
Result
Process 0:  Received from One
Process 1:  Received from Zero
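A compilable C sketch of the exchange above (not the slide's exact code; the buffer size and message tag are assumptions):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    char words[16] = "";
    int myid, tag = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* Intended to be run with exactly 2 processes. */
    if (myid == 0) {
        MPI_Send("Zero", 5, MPI_CHAR, 1, tag, MPI_COMM_WORLD);
        MPI_Recv(words, 16, MPI_CHAR, 1, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (myid == 1) {
        MPI_Recv(words, 16, MPI_CHAR, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send("One", 4, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
    }
    printf("Process %d: Received from %s\n", myid, words);

    MPI_Finalize();
    return 0;
}

Note the ordering: process 0 sends before receiving while process 1 receives before sending, so the exchange cannot deadlock even if the standard-mode sends block.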
Collective Communication
Three types of collective operations:
–Barrier: for process synchronization (MPI_BARRIER)
–Data movement: moving data among processes, no computation (MPI_BCAST, MPI_GATHER, MPI_SCATTER)
–Reduction operations: involve computation (MPI_REDUCE, MPI_SCAN)
Barrier (MPI_BARRIER)
Used to synchronize the execution of a group of processes.
All members must reach the same point before any can proceed: each process computes, performs the barrier, then waits (blocking time) until the last process arrives, and only then continues execution.
(Figure: processes 1, 2, …, p computing, performing the barrier, blocking, and continuing.)
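A small C sketch (not from the slides) of a common barrier use, lining processes up before and after a timed phase; the timed computation is a placeholder:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);     /* no process passes until all arrive */
    double t0 = MPI_Wtime();

    /* ... computation to be timed would go here ... */

    MPI_Barrier(MPI_COMM_WORLD);     /* wait for the slowest process */
    if (rank == 0)
        printf("Phase took %f seconds\n", MPI_Wtime() - t0);

    MPI_Finalize();
    return 0;
}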
Data Movement
–Broadcast: one member sends the same message to all members
–Scatter: one member sends a different message to each member
–Gather: every member sends a message to a single member
–All-to-all broadcast: every member performs a broadcast
–All-to-all scatter-gather (total exchange): every member performs a scatter (and gather)
MPI Collective Communications
–Broadcast: MPI_Bcast
–Scatter: MPI_Scatter
–Gather: MPI_Gather
–Collect: MPI_Allgather
–Reduce (combine-to-one): MPI_Reduce
–Combine-to-all: MPI_Allreduce
–Scan: MPI_Scan
–All-to-all: MPI_Alltoall
Data Movement (1): MPI_BCAST
A single process sends the same data to all other processes, itself included.
(Figure: the root holds "FACE"; after the BCAST, processes 0–3 all hold "FACE".)
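A minimal C sketch of MPI_BCAST; rank 0 is assumed as the root, and the "FACE" payload mirrors the figure:

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[])
{
    char buf[5] = "";
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        strcpy(buf, "FACE");          /* only the root fills the buffer */

    /* After the call, every process's buf holds "FACE". */
    MPI_Bcast(buf, 5, MPI_CHAR, 0, MPI_COMM_WORLD);

    printf("Process %d has \"%s\"\n", rank, buf);

    MPI_Finalize();
    return 0;
}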
Data Movement (2): MPI_GATHER
All processes (including the root) send their data to one process, which stores the received items in rank order.
(Figure: processes 0–3 hold F, A, C, E; after the GATHER, the root holds "FACE".)
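A matching C sketch of MPI_GATHER; each rank's letter and the root rank are illustrative assumptions:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process contributes one character (placeholder letters). */
    char mine = "FACE"[rank % 4];
    char all[64];                    /* assumes at most 63 processes */

    /* Root (rank 0) receives one char from every process, in rank order. */
    MPI_Gather(&mine, 1, MPI_CHAR, all, 1, MPI_CHAR, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        all[size] = '\0';
        printf("Root gathered \"%s\"\n", all);
    }

    MPI_Finalize();
    return 0;
}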
Data Movement (3): MPI_SCATTER
A process sends out a message that is split into several equal parts; the i-th portion is sent to the i-th process.
(Figure: the root holds "FACE"; after the SCATTER, processes 0–3 hold F, A, C, E.)
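A matching C sketch of MPI_SCATTER, intended to be run with 4 processes (the root rank is an assumption):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank;
    char all[4] = {'F', 'A', 'C', 'E'};  /* meaningful only on the root */
    char mine;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Root splits "FACE" into equal 1-char pieces; piece i goes to rank i.
       Run with 4 processes so every element finds a destination. */
    MPI_Scatter(all, 1, MPI_CHAR, &mine, 1, MPI_CHAR, 0, MPI_COMM_WORLD);

    printf("Process %d got '%c'\n", rank, mine);

    MPI_Finalize();
    return 0;
}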
Data Movement (4): MPI_REDUCE (e.g., find the maximum value)
Combines a value from every process, using a specified operation, and returns the combined result to one process.
(Figure: each process contributes a value; the root receives the maximum after the REDUCE.)
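A minimal C sketch of MPI_REDUCE with MPI_MAX; the per-process values are arbitrary:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value, maximum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    value = (rank * 37) % 11;   /* arbitrary per-process value */

    /* Combine every process's value with MPI_MAX; rank 0 gets the result. */
    MPI_Reduce(&value, &maximum, 1, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Maximum value is %d\n", maximum);

    MPI_Finalize();
    return 0;
}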
MPI_SCAN
Scan (parallel prefix): a “partial” reduction based on relative process number.
(Figure: with Op = +, process i's result is the sum of the inputs from processes 0 through i.)
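A minimal C sketch of MPI_SCAN computing an inclusive prefix sum; the inputs are arbitrary:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, input, prefix;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    input = rank + 1;   /* process i contributes i + 1 */

    /* Inclusive prefix sum: process i receives input_0 + ... + input_i. */
    MPI_Scan(&input, &prefix, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    printf("Process %d: prefix sum = %d\n", rank, prefix);

    MPI_Finalize();
    return 0;
}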
Example Program (1)
Calculating the value of π by numerical integration: the area under a curve is divided into small intervals, and the interval areas are summed in parallel (the classic formulation integrates 4/(1 + x²) from 0 to 1, which equals π).
Example Program (2)

…
MPI_BCAST(n, …, …, 0, …)                   – process 0 broadcasts n, the number of intervals
for (i = myid + 1; i <= n; i += numprocs)
    compute the area for this interval and accumulate it in the process's local sum
MPI_REDUCE(&sum, …, …, …, MPI_SUM, 0, …)   – combine the partial sums at process 0
if (myid == 0)
    output the result
…
(Figure: the intervals under the curve are assigned to processes 0–3 in round-robin fashion; each process computes its share, and the partial results are combined into the final value of π.)
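For completeness, a compilable C version in the spirit of the classic MPI π program; the integrand 4/(1 + x²) and the midpoint rule are the standard formulation and are assumed here rather than taken verbatim from the slides:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int n = 100000;              /* number of intervals (chosen by process 0) */
    int myid, numprocs, i;
    double h, x, sum, mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* Process 0 decides n and broadcasts it to all processes. */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h = 1.0 / (double)n;         /* width of each interval */
    sum = 0.0;
    /* Each process handles intervals myid+1, myid+1+numprocs, ... (round-robin). */
    for (i = myid + 1; i <= n; i += numprocs) {
        x = h * ((double)i - 0.5);          /* midpoint of the interval */
        sum += 4.0 / (1.0 + x * x);         /* integrand at the midpoint */
    }
    mypi = h * sum;                         /* this process's share of the area */

    /* Combine the partial sums; process 0 receives the final value. */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (myid == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}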