CS 484

Message Passing
Based on a multiprocessor model: a set of independent processors connected by some communication network. All communication between processes is done via a message sent from one process to another.

MPI: Message Passing Interface
A computation is made up of one or more processes that communicate by calling library routines. The programming model is MIMD; in practice, SPMD (the same program running on every process) is most common.

MPI
Processes use point-to-point communication operations; collective communication operations are also available. Communication can be modularized through communicators, which identify subsets of processes. MPI_COMM_WORLD is the base communicator containing all processes.

MPI
MPI is large, but most problems can be solved using just six basic functions: MPI_Init, MPI_Finalize, MPI_Comm_size, MPI_Comm_rank, MPI_Send, and MPI_Recv.

MPI Basics
Almost all calls require a communicator handle as an argument (e.g., MPI_COMM_WORLD). MPI_Init and MPI_Finalize do not take a communicator handle; they begin and end an MPI program and MUST be called.

MPI Basics
MPI_Comm_size returns the number of processes in the communicator's group. MPI_Comm_rank returns the integer identifier (rank) assigned to the calling process; ranks are zero-based.

MPI Basics

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int iproc, nproc;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &iproc);
    printf("I am processor %d of %d\n", iproc, nproc);
    MPI_Finalize();
    return 0;
}

MPI Communication
MPI_Send sends an array of a given type; it requires a destination rank, a count, and a datatype (plus a tag and a communicator). MPI_Recv receives an array of a given type, with the same requirements as MPI_Send plus one extra parameter: an MPI_Status variable.
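A minimal sketch of this pattern (not from the original slides; the tag 0 and the 64-element buffer are arbitrary choices), run with at least two processes:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, nproc, i, data[64];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        for (i = 0; i < 64; i++) data[i] = i;
        /* send 64 ints to process 1 with tag 0 */
        MPI_Send(data, 64, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* receive 64 ints from process 0 with tag 0 */
        MPI_Recv(data, 64, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("Process 1 received %d ... %d\n", data[0], data[63]);
    }

    MPI_Finalize();
    return 0;
}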

MPI Basics
MPI is defined for both Fortran and C. Conventions for C: all calls carry the MPI_ prefix with the first letter of the function name capitalized; calls return MPI_SUCCESS or an error code; status information comes back in an MPI_Status structure; there is an MPI datatype for each C type; and OUT parameters are passed by address using the & operator.
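For example (a fragment, not from the slides; buf, dest, and tag are placeholders and MPI is assumed to be initialized):

/* Every MPI call returns MPI_SUCCESS or an error code. */
int rc = MPI_Send(buf, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
if (rc != MPI_SUCCESS)
    fprintf(stderr, "MPI_Send failed with error code %d\n", rc);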

Using MPI
Startup is based on rsh or ssh, so it requires a .rhosts file (hostname and login entries) or ssh key setup. Path to the compiler (CS open labs): MPI_HOME is /users/faculty/snell/mpich and MPI_CC is MPI_HOME/bin/mpicc. On Marylou5, use mpicc directly: mpicc hello.c -o hello

Using MPI
Write the program, then compile it with mpicc.
On the Linux cluster, write a process file with one line per host: host nprocs full_path_to_prog (use 0 for nprocs on the first line and 1 on all others). Run with either
  prog -p4pg process_file args
or
  mpirun -np #procs -machinefile machines prog
On Marylou5 (scheduled with PBS), run with
  mpirun -np #procs -machinefile $PBS_NODEFILE prog
or simply mpiexec prog

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {  /* Open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++)
            fscanf(fp, "%d", &data[i]);
    }

    /* broadcast data */
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

    /* Add my portion of the data */
    x = MAXSIZE / numprocs;
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I got %d from %d\n", myresult, myid);

    /* Compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0)
        printf("The sum is %d.\n", result);

    MPI_Finalize();
    return 0;
}

MPI
Message-passing programs are non-deterministic because of concurrency: consider two processes sending messages to a third. MPI only guarantees that two messages sent from a single process to the same destination will arrive in order. It is the programmer's responsibility to ensure the computation is deterministic.

MPI & Determinism MPI A Process may specify the source of the message A Process may specify the type of message Non-Determinism MPI_ANY_SOURCE or MPI_ANY_TAG

Example

for (n = 0; n < nproc/2; n++) {
    MPI_Send(buff, BSIZE, MPI_FLOAT, rnbor, 1, MPI_COMM_WORLD);
    MPI_Recv(buff, BSIZE, MPI_FLOAT, MPI_ANY_SOURCE, 1,
             MPI_COMM_WORLD, &status);
    /* Process the data */
}

Global Operations
Coordinated communication involving multiple processes. These can be implemented by the programmer using sends and receives, but for convenience MPI provides a suite of collective communication functions. All participating processes must call the same function.

Collective Communication
Barrier: synchronize all processes.
Broadcast: send data from one process to all processes.
Gather: gather data from all processes to one process.
Scatter: distribute pieces of data from one process to all processes.
Reduction: global sums, products, etc.
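A minimal sketch (not from the original slides) that exercises several of the collectives listed above; it assumes the 16-element array divides evenly among the processes:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, nproc, i, n;
    int full[16], chunk[16], gathered[16];

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    n = 16 / nproc;                          /* elements per process */
    if (rank == 0)
        for (i = 0; i < 16; i++) full[i] = i;

    /* Scatter: root hands one chunk to each process */
    MPI_Scatter(full, n, MPI_INT, chunk, n, MPI_INT, 0, MPI_COMM_WORLD);

    for (i = 0; i < n; i++) chunk[i] *= 2;   /* local work */

    /* Gather: root collects the modified chunks */
    MPI_Gather(chunk, n, MPI_INT, gathered, n, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);             /* Barrier: synchronize everyone */
    if (rank == 0)
        printf("gathered[15] = %d\n", gathered[15]);

    MPI_Finalize();
    return 0;
}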

Collective Communication

Typical places collectives appear in a program:
Distribute problem size
Distribute input data
Exchange boundary values
Find max error
Collect results

MPI_Reduce MPI_Reduce(inbuf, outbuf, count, type, op, root, comm)

MPI_Reduce

MPI_Allreduce MPI_Allreduce(inbuf, outbuf, count, type, op, comm)
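A quick contrast between the two (a fragment, not from the slides; assumes MPI is initialized and rank holds the caller's rank):

int local = rank, sum_on_root, sum_everywhere;

/* MPI_Reduce leaves the sum only on the root (rank 0)... */
MPI_Reduce(&local, &sum_on_root, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

/* ...MPI_Allreduce leaves the sum on every process. */
MPI_Allreduce(&local, &sum_everywhere, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);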

MPI Collective Routines
Several routines: MPI_ALLGATHER, MPI_ALLGATHERV, MPI_ALLREDUCE, MPI_ALLTOALL, MPI_ALLTOALLV, MPI_BCAST, MPI_GATHER, MPI_GATHERV, MPI_REDUCE, MPI_REDUCE_SCATTER, MPI_SCAN, MPI_SCATTER, MPI_SCATTERV.
The "ALL" versions (e.g., MPI_ALLGATHER, MPI_ALLREDUCE) deliver results to all participating processes.
The "V" versions allow the chunks to have different sizes.
MPI_ALLREDUCE, MPI_REDUCE, MPI_REDUCE_SCATTER, and MPI_SCAN take both built-in and user-defined combination functions.
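A sketch of a "V" routine (a fragment, not from the slides; assumes MPI is initialized, rank and nproc are set, and nproc is at most 32 so the fixed-size buffers suffice):

/* The root sends a different-sized chunk to each process:
   process i gets i+1 elements out of sendbuf. */
int sendcounts[32], displs[32], sendbuf[1024], recvbuf[32];
int i, offset = 0;

for (i = 0; i < nproc; i++) {
    sendcounts[i] = i + 1;      /* chunk sizes differ per process */
    displs[i] = offset;         /* where each chunk starts in sendbuf */
    offset += sendcounts[i];
}

MPI_Scatterv(sendbuf, sendcounts, displs, MPI_INT,
             recvbuf, rank + 1, MPI_INT, 0, MPI_COMM_WORLD);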

Built-In Collective Computation Operations
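For example (a fragment, not from the slides; assumes MPI is initialized and myvalue holds a per-process value):

/* MPI predefines combination operations such as MPI_SUM, MPI_PROD,
   MPI_MAX, MPI_MIN, MPI_LAND, MPI_LOR, MPI_BAND, MPI_BOR,
   MPI_MAXLOC, and MPI_MINLOC. */
double myvalue, global_max, global_sum;

MPI_Allreduce(&myvalue, &global_max, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);
MPI_Allreduce(&myvalue, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);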

Example: PI in C - 1

#include "mpi.h"
#include <stdio.h>
#include <math.h>

int main(int argc, char *argv[])
{
    int done = 0, n, myid, numprocs, i, rc;
    double PI25DT = 3.141592653589793238462643;
    double mypi, pi, h, sum, x, a;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    while (!done) {
        if (myid == 0) {
            printf("Enter the number of intervals: (0 quits) ");
            scanf("%d", &n);
        }
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (n == 0) break;

Example: PI in C - 2

        h = 1.0 / (double) n;
        sum = 0.0;
        for (i = myid + 1; i <= n; i += numprocs) {
            x = h * ((double)i - 0.5);
            sum += 4.0 / (1.0 + x*x);
        }
        mypi = h * sum;
        MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        if (myid == 0)
            printf("pi is approximately %.16f, Error is %.16f\n",
                   pi, fabs(pi - PI25DT));
    }
    MPI_Finalize();
    return 0;
}

SOME OTHER THINGS

MPI Datatypes
The data in a message is described by an address, a count, and a datatype. MPI predefines many datatypes (MPI_INT, MPI_FLOAT, MPI_DOUBLE, etc.); there is an analog for each primitive C type. You can also construct custom datatypes for structured data, as sketched below.
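A possible sketch of a user-defined datatype for a simple struct, using MPI_Type_create_struct; the particle_t record and make_particle_type helper are hypothetical names for illustration:

#include <stddef.h>
#include "mpi.h"

typedef struct { int id; double pos[3]; } particle_t;

MPI_Datatype make_particle_type(void)
{
    MPI_Datatype ptype;
    int          blocklens[2] = { 1, 3 };
    MPI_Aint     displs[2]    = { offsetof(particle_t, id),
                                  offsetof(particle_t, pos) };
    MPI_Datatype types[2]     = { MPI_INT, MPI_DOUBLE };

    MPI_Type_create_struct(2, blocklens, displs, types, &ptype);
    MPI_Type_commit(&ptype);
    return ptype;   /* use in MPI_Send/MPI_Recv with count = #particles */
}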

MPI_Recv
Blocks until a message is received. The message is matched based on source and tag. The MPI_Status argument is filled with information about the message (source and tag). Receiving fewer elements than specified is OK; receiving more is an error. Use MPI_Get_count to get the number of elements actually received.

MPI_Recv

int recvd_tag, recvd_from, recvd_count;
MPI_Status status;

MPI_Recv(..., MPI_ANY_SOURCE, MPI_ANY_TAG, ..., &status);
recvd_tag  = status.MPI_TAG;
recvd_from = status.MPI_SOURCE;
MPI_Get_count(&status, datatype, &recvd_count);

Non-blocking communication
MPI_Send and MPI_Recv are blocking: MPI_Send does not complete until the send buffer can safely be modified, and MPI_Recv does not complete until the receive buffer has been filled. Blocking communication can lead to deadlocks, for example when every process executes:

for (int p = 0; p < nproc; p++) {
    MPI_Send(... p ...);
    MPI_Recv(... p ...);
}

Non-blocking communication
MPI_Isend and MPI_Irecv return immediately (non-blocking):

MPI_Request request;
MPI_Status status;
MPI_Isend(start, count, datatype, dest, tag, comm, &request);
MPI_Irecv(start, count, datatype, src, tag, comm, &request);
MPI_Wait(&request, &status);

They are used to overlap communication with computation. Anywhere you use MPI_Send or MPI_Recv, you can use the pair MPI_Isend/MPI_Wait or MPI_Irecv/MPI_Wait; you can also use MPI_Waitall, MPI_Waitany, or MPI_Waitsome. You can also check whether any messages have arrived without actually receiving them using MPI_Probe and MPI_Iprobe: MPI_Probe blocks until there is a message, while MPI_Iprobe just sets a flag.
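A sketch of the overlap idea as a ring exchange (a fragment, not from the slides; assumes MPI is initialized, rank and nproc are set, and the 128-element buffers are arbitrary):

int left  = (rank - 1 + nproc) % nproc;
int right = (rank + 1) % nproc;
double sendbuf[128], recvbuf[128];
MPI_Request reqs[2];
MPI_Status  stats[2];

MPI_Irecv(recvbuf, 128, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
MPI_Isend(sendbuf, 128, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

/* ... do computation that does not touch sendbuf or recvbuf ... */

MPI_Waitall(2, reqs, stats);   /* both buffers are now safe to reuse */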

Communicators
All MPI communication is based on a communicator, which contains a context and a group. Contexts define a safe communication space for message passing; they can be viewed as system-managed tags and allow different libraries to co-exist. The group is just a set of processes, and processes are always referred to by their unique rank within the group.

Uses of MPI_COMM_WORLD
MPI_COMM_WORLD contains all processes available at the time the program was started and provides the initial safe communication space. Simple programs do all their communication with MPI_COMM_WORLD, and even complex programs use it for most communication. Complex programs duplicate and subdivide copies of MPI_COMM_WORLD; it provides the global communicator from which smaller groups or subsets of processes are formed for specific tasks.
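For example, a library might duplicate MPI_COMM_WORLD so its internal messages cannot collide with the application's (a minimal fragment, not from the slides):

MPI_Comm lib_comm;

MPI_Comm_dup(MPI_COMM_WORLD, &lib_comm);   /* private copy for the library */
/* ... the library communicates only on lib_comm ... */
MPI_Comm_free(&lib_comm);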

Subdividing a Communicator with MPI_COMM_SPLIT

C:       int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm)
Fortran: MPI_COMM_SPLIT(COMM, COLOR, KEY, NEWCOMM, IERR)
         INTEGER COMM, COLOR, KEY, NEWCOMM, IERR

MPI_COMM_SPLIT partitions the group associated with the given communicator into disjoint subgroups. Each subgroup contains all processes having the same value for the argument color. Within each subgroup, processes are ranked in the order defined by the value of the argument key, with ties broken according to their rank in the old communicator.

Subdividing a Communicator
To divide a communicator into two non-overlapping groups:

color = (rank < size/2) ? 0 : 1;
MPI_Comm_split(comm, color, 0, &newcomm);

Subdividing a Communicator
To divide a communicator such that all processes with even ranks are in one group, all processes with odd ranks are in the other group, and each group maintains the reverse order by rank:

color = (rank % 2 == 0) ? 0 : 1;
key = size - rank;
MPI_Comm_split(comm, color, key, &newcomm);

      program main
      include 'mpif.h'
      integer ierr, row_comm, col_comm
      integer myrank, size, P, Q, myrow, mycol

      P = 4
      Q = 3
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, myrank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)

C     Determine row and column position
      myrow = myrank/Q
      mycol = mod(myrank,Q)

C     Split comm into row and column comms
      call MPI_Comm_split(MPI_COMM_WORLD, myrow, mycol, row_comm, ierr)
      call MPI_Comm_split(MPI_COMM_WORLD, mycol, myrow, col_comm, ierr)

      print*, "My coordinates are [", myrank, "] ", myrow, mycol

      call MPI_Finalize(ierr)
      stop
      end

MPI_COMM_WORLD viewed as a 4 x 3 grid of ranks, shown as rank (row, col); each row forms a row_comm and each column a col_comm:

 0 (0,0)   1 (0,1)   2 (0,2)
 3 (1,0)   4 (1,1)   5 (1,2)
 6 (2,0)   7 (2,1)   8 (2,2)
 9 (3,0)  10 (3,1)  11 (3,2)

DEBUGGING

An ounce of prevention…
Program defensively: check function return codes and verify send and receive sizes. Program incrementally and modularly; test each module and keep the test code in place. Identify all shared data and think carefully about how it is accessed. Correctness first, then speed.

Debugging
Characterize the bug: run the code serially, then in parallel on one core (2-4 processes), then in parallel with 2-4 processes on 2-4 cores. Play around with inputs and data sizes, and find the smallest data size that exposes the bug. Remove as much non-determinism as you can. Use print statements on stderr (unbuffered), before and after communication or shared-variable access; print all relevant information (source, sizes, data, tag, etc.) and put the process number first in each print (it helps sorting). Leave the prints in your code behind #ifdef.

Debugging
Learn about the C constructs __FILE__, __LINE__, and __FUNCTION__. Make one logical change at a time and then test. Learn how to attach debuggers; you will probably need some sort of stall code, e.g., wait for input on the master and then do a barrier, while all other processes just do the barrier. A sketch follows.
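A possible sketch (the DEBUG_PRINT macro and stall_for_debugger helper are illustrative names, not part of MPI):

#include <stdio.h>
#include "mpi.h"

/* Debug print left in the code behind #ifdef: rank first, then location. */
#ifdef DEBUG
#define DEBUG_PRINT(rank, msg) \
    fprintf(stderr, "[%d] %s:%d %s(): %s\n", \
            (rank), __FILE__, __LINE__, __FUNCTION__, (msg))
#else
#define DEBUG_PRINT(rank, msg)
#endif

/* Stall so debuggers can be attached: the master waits for input,
   everyone else waits at the barrier. */
static void stall_for_debugger(int rank)
{
    if (rank == 0) {
        fprintf(stderr, "Attach debuggers, then press Enter...\n");
        (void)getchar();
    }
    MPI_Barrier(MPI_COMM_WORLD);
}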

Common problems
Not all processes reach a collective call: be very careful about putting collective calls inside conditionals, and be sure the communicator is correct. Deadlock (everybody blocked on a receive): use non-blocking calls or MPI_Sendrecv (sketched below). A process waiting for data that is never sent: use collective calls where you can and keep communication patterns simple.
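A sketch of the MPI_Sendrecv approach for a ring shift (a fragment, not from the slides; assumes MPI is initialized, rank and nproc are set, and out holds the data to pass along):

/* MPI_Sendrecv performs the send and the matching receive in one call,
   so a ring shift cannot deadlock the way paired blocking
   MPI_Send/MPI_Recv calls can. */
int left  = (rank - 1 + nproc) % nproc;
int right = (rank + 1) % nproc;
double out[64], in[64];
MPI_Status status;

MPI_Sendrecv(out, 64, MPI_DOUBLE, right, 0,
             in,  64, MPI_DOUBLE, left,  0,
             MPI_COMM_WORLD, &status);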

Best Advice
Program incrementally and modularly. Characterize the bug and leave yourself time to walk away from it and think about it. Never underestimate the value of a second set of eyes; sometimes just explaining your code to someone else helps you help yourself.