PPC SPRING 2014 - MPI 1. CSCI-4320/6340: Parallel Programming and Computing. West Hall, Tues/Friday 12-1:20 p.m. Message Passing Interface (MPI 1). Prof. Chris Carothers.


CSCI-4320/6340: Parallel Programming and Computing, West Hall, Tues/Friday 12-1:20 p.m. Message Passing Interface (MPI 1). Prof. Chris Carothers, Computer Science Department. Class website: Detailed Reference & Tutorial: List of MPI routines:

Topic Overview
Principles of Message-Passing Programming
The Building Blocks: Send and Receive Operations
MPI: the Message Passing Interface
–Send/Receive
–MPI Data Types
–Dealing with Heterogeneous Systems

Mental Model (figure)

Principles of Message-Passing Programming
The logical view of a machine supporting the message-passing paradigm consists of p processes, each with its own exclusive address space. Each data element must belong to one of the partitions of the space; hence, data must be explicitly partitioned and placed. All interactions (read-only or read/write) require cooperation of two processes - the process that has the data and the process that wants to access the data. These two constraints, while onerous, make underlying costs very explicit to the programmer.

Principles of Message-Passing Programming
Message-passing programs are often written using the asynchronous or loosely synchronous paradigms. In the asynchronous paradigm, all concurrent tasks execute asynchronously. In the loosely synchronous model, tasks or subsets of tasks synchronize to perform interactions. Between these interactions, tasks execute completely asynchronously. Most message-passing programs are written using the single program multiple data (SPMD) model.

The Building Blocks: Send and Receive Operations
The prototypes of these operations are as follows:

    send(void *sendbuf, int nelems, int dest)
    receive(void *recvbuf, int nelems, int source)

Consider the following code segments:

    P0:                  P1:
    a = 100;             receive(&a, 1, 0);
    send(&a, 1, 1);      printf("%d\n", a);
    a = 0;

The semantics of the send operation require that the value received by process P1 must be 100 as opposed to 0. This motivates the design of the send and receive protocols.

Non-Buffered Blocking Message Passing Operations
A simple method for forcing send/receive semantics is for the send operation to return only when it is safe to do so. In the non-buffered blocking send, the operation does not return until the matching receive has been encountered at the receiving process. Idling and deadlocks are major issues with non-buffered blocking sends. In buffered blocking sends, the sender simply copies the data into the designated buffer and returns after the copy operation has been completed. The data is copied into a buffer at the receiving end as well. Buffering alleviates idling at the expense of copying overheads.

Non-Buffered Blocking Message Passing Operations
Handshake for a blocking non-buffered send/receive operation (figure). It is easy to see that in cases where the sender and receiver do not reach their communication points at similar times, there can be considerable idling overhead.

Buffered Blocking Message Passing Operations
A simple solution to the idling and deadlocking problem outlined above is to rely on buffers at the sending and receiving ends. The sender simply copies the data into the designated buffer and returns after the copy operation has been completed. The data must be buffered at the receiving end as well. Buffering trades off idling overhead for buffer-copying overhead.
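MPI exposes exactly this buffered mode through MPI_Bsend together with a user-attached buffer (routines not covered in these slides). A minimal sketch, assuming two processes and omitting error checking:

/* Hedged sketch: MPI's buffered-mode send, which matches the
 * "copy into a designated buffer and return" behavior described above. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, value = 42;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Attach a user buffer large enough for one int plus MPI's bookkeeping. */
    int bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
    char *sendbuf = malloc(bufsize);
    MPI_Buffer_attach(sendbuf, bufsize);

    if (rank == 0) {
        MPI_Bsend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);  /* returns once the copy is done */
    } else if (rank == 1) {
        MPI_Status status;
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d\n", value);
    }

    /* Detach waits until any buffered messages have been delivered. */
    MPI_Buffer_detach(&sendbuf, &bufsize);
    free(sendbuf);
    MPI_Finalize();
    return 0;
}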

Buffered Blocking Message Passing Operations
Blocking buffered transfer protocols (figure): (a) in the presence of communication hardware with buffers at send and receive ends; and (b) in the absence of communication hardware, sender interrupts receiver and deposits data in buffer at receiver end.

Buffered Blocking Message Passing Operations
Bounded buffer sizes can have significant impact on performance.

    P0:                                P1:
    for (i = 0; i < 1000; i++) {       for (i = 0; i < 1000; i++) {
        produce_data(&a);                  receive(&a, 1, 0);
        send(&a, 1, 1);                    consume_data(&a);
    }                                  }

What if the consumer were much slower than the producer?

Buffered Blocking Message Passing Operations
Deadlocks are still possible with buffering since receive operations block.

    P0:                    P1:
    receive(&a, 1, 1);     receive(&a, 1, 0);
    send(&b, 1, 1);        send(&b, 1, 0);

Non-Blocking Message Passing Operations
The programmer must ensure semantics of the send and receive. This class of non-blocking protocols returns from the send or receive operation before it is semantically safe to do so. Non-blocking operations are generally accompanied by a check-status operation. When used correctly, these primitives are capable of overlapping communication overheads with useful computations. Message passing libraries typically provide both blocking and non-blocking primitives.
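In MPI these ideas appear as the non-blocking routines MPI_Isend and MPI_Irecv, with MPI_Wait/MPI_Test serving as the check-status operations; these routines are not covered further in these slides. A minimal sketch, assuming two processes:

/* Hedged sketch: overlapping communication with computation using MPI's
 * non-blocking primitives; error checking omitted. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, sendval, recvval = -1;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    sendval = rank;

    if (rank == 0) {
        MPI_Isend(&sendval, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ... do useful computation here while the message is in flight ... */
        MPI_Wait(&req, &status);           /* only now is it safe to reuse sendval */
    } else if (rank == 1) {
        MPI_Irecv(&recvval, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        /* ... do useful computation that does not touch recvval ... */
        MPI_Wait(&req, &status);           /* recvval is valid only after this */
        printf("rank 1 received %d\n", recvval);
    }

    MPI_Finalize();
    return 0;
}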

Non-Blocking Message Passing Operations
Non-blocking non-buffered send and receive operations (figure): (a) in absence of communication hardware; (b) in presence of communication hardware.

Send and Receive Protocols
Space of possible protocols for send and receive operations (figure).

MPI: the Message Passing Interface
MPI defines a standard library for message-passing that can be used to develop portable message-passing programs using either C or Fortran. The MPI standard defines both the syntax as well as the semantics of a core set of library routines. Vendor implementations of MPI are available on almost all commercial parallel computers. It is possible to write fully functional message-passing programs by using only the six routines listed on the next slide.
–But it may not be all that efficient!!

MPI: the Message Passing Interface
The minimal set of MPI routines:

    MPI_Init         Initializes MPI.
    MPI_Finalize     Terminates MPI.
    MPI_Comm_size    Determines the number of processes.
    MPI_Comm_rank    Determines the label of the calling process.
    MPI_Send         Sends an “unbuffered/blocking” message.
    MPI_Recv         Receives an “unbuffered/blocking” message.

Starting and Terminating the MPI Library
MPI_Init is called prior to any calls to other MPI routines. Its purpose is to initialize the MPI environment. MPI_Finalize is called at the end of the computation, and it performs various clean-up tasks to terminate the MPI environment. The prototypes of these two functions are:

    int MPI_Init(int *argc, char ***argv)
    int MPI_Finalize()

MPI_Init also strips off any MPI-related command-line arguments. All MPI routines, data types, and constants are prefixed by “MPI_”. The return code for successful completion is MPI_SUCCESS.

Communicators
A communicator defines a communication domain - a set of processes that are allowed to communicate with each other. Information about communication domains is stored in variables of type MPI_Comm. Communicators are used as arguments to all message transfer MPI routines. A process can belong to many different (possibly overlapping) communication domains. MPI defines a default communicator called MPI_COMM_WORLD which includes all the processes.
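As an illustration of overlapping communication domains, the sketch below uses MPI_Comm_split (a routine not covered in these slides), so that every process belongs both to MPI_COMM_WORLD and to a smaller communicator containing only the ranks of the same parity:

/* Hedged sketch: each process is a member of MPI_COMM_WORLD and of a
 * second, smaller communicator built from ranks of the same parity. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int world_rank, world_size, sub_rank, sub_size;
    MPI_Comm parity_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* color = 0 for even ranks, 1 for odd ranks; key orders ranks within it */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &parity_comm);
    MPI_Comm_rank(parity_comm, &sub_rank);
    MPI_Comm_size(parity_comm, &sub_size);

    printf("world rank %d/%d is rank %d/%d in its parity communicator\n",
           world_rank, world_size, sub_rank, sub_size);

    MPI_Comm_free(&parity_comm);
    MPI_Finalize();
    return 0;
}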

Querying Information
The MPI_Comm_size and MPI_Comm_rank functions are used to determine the number of processes and the label of the calling process, respectively. The calling sequences of these routines are as follows:

    int MPI_Comm_size(MPI_Comm comm, int *size)
    int MPI_Comm_rank(MPI_Comm comm, int *rank)

The rank of a process is an integer that ranges from zero up to the size of the communicator minus one.

Our First MPI Program

// export PATH=/usr/local/mpich-3.0.1/bin:$PATH
// compile with: mpicc mpi-hello.c -o mpi-hello
// run: mpirun -np #pes mpi-hello
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int npes, myrank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    printf("From process %d out of %d, Hello World!\n", myrank, npes);
    MPI_Finalize();
    return 0;
}

Let’s see how it works…

Sending and Receiving Messages
The basic functions for sending and receiving messages in MPI are MPI_Send and MPI_Recv, respectively. The calling sequences of these routines are as follows:

    int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest,
                 int tag, MPI_Comm comm)
    int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source,
                 int tag, MPI_Comm comm, MPI_Status *status)

MPI provides equivalent datatypes for all C datatypes. This is done for portability reasons. The datatype MPI_BYTE corresponds to a byte (8 bits) and MPI_PACKED corresponds to a collection of data items that has been created by packing non-contiguous data. The message tag can take values ranging from zero up to the MPI-defined constant MPI_TAG_UB.
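A minimal sketch of these calling sequences (not from the slides), assuming at least two processes: rank 0 sends ten integers to rank 1 with tag 7, and rank 1 receives them.

/* Hedged sketch of the MPI_Send/MPI_Recv calling sequences shown above. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank, a[10], i;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    if (myrank == 0) {
        for (i = 0; i < 10; i++) a[i] = i * i;
        MPI_Send(a, 10, MPI_INT, 1, 7, MPI_COMM_WORLD);          /* dest = 1, tag = 7 */
    } else if (myrank == 1) {
        MPI_Recv(a, 10, MPI_INT, 0, 7, MPI_COMM_WORLD, &status); /* source = 0, tag = 7 */
        printf("rank 1 got a[9] = %d\n", a[9]);
    }

    MPI_Finalize();
    return 0;
}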

MPI Datatypes

    MPI Datatype           C Datatype
    MPI_CHAR               signed char
    MPI_SHORT              signed short int
    MPI_INT                signed int
    MPI_LONG               signed long int
    MPI_UNSIGNED_CHAR      unsigned char
    MPI_UNSIGNED_SHORT     unsigned short int
    MPI_UNSIGNED           unsigned int
    MPI_UNSIGNED_LONG      unsigned long int
    MPI_FLOAT              float
    MPI_DOUBLE             double
    MPI_LONG_DOUBLE        long double
    MPI_BYTE
    MPI_PACKED

Sending and Receiving Messages
MPI allows specification of wildcard arguments for both source and tag. If source is set to MPI_ANY_SOURCE, then any process of the communication domain can be the source of the message. If tag is set to MPI_ANY_TAG, then messages with any tag are accepted. On the receive side, the message must be of length equal to or less than the length field specified.

Sending and Receiving Messages
On the receiving end, the status variable can be used to get information about the MPI_Recv operation. The corresponding data structure contains:

    typedef struct MPI_Status {
        int MPI_SOURCE;
        int MPI_TAG;
        int MPI_ERROR;
    };

The MPI_Get_count function returns the precise count of data items received:

    int MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count)
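A minimal sketch (not from the slides) that combines the wildcards with the status fields and MPI_Get_count; it assumes at least two processes, with every non-zero rank sending a few integers to rank 0:

/* Hedged sketch: rank 0 accepts messages from any source and any tag,
 * then uses the status object and MPI_Get_count to learn who sent the
 * message, its tag, and how many items arrived. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int myrank, npes, buf[100];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);

    if (myrank != 0) {
        int i, nitems = 1 + (myrank % 4);   /* send between 1 and 4 ints */
        for (i = 0; i < nitems; i++) buf[i] = myrank;
        MPI_Send(buf, nitems, MPI_INT, 0, myrank, MPI_COMM_WORLD);
    } else {
        int i, count;
        for (i = 1; i < npes; i++) {
            MPI_Recv(buf, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            MPI_Get_count(&status, MPI_INT, &count);
            printf("got %d ints from rank %d with tag %d\n",
                   count, status.MPI_SOURCE, status.MPI_TAG);
        }
    }

    MPI_Finalize();
    return 0;
}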

Avoiding Deadlocks
Consider:

    int a[10], b[10], myrank;
    MPI_Status status;
    ...
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank == 0) {
        MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
        MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
    }
    else if (myrank == 1) {
        MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
        MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
    }
    ...

If MPI_Send is blocking, there is a deadlock!!
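One possible fix (a sketch, not shown on the slide) is to post the receives on process 1 in the same order as the sends on process 0, so the first blocking send always finds its matching receive. The variables are the ones declared in the fragment above:

    if (myrank == 0) {
        MPI_Send(a, 10, MPI_INT, 1, 1, MPI_COMM_WORLD);
        MPI_Send(b, 10, MPI_INT, 1, 2, MPI_COMM_WORLD);
    }
    else if (myrank == 1) {
        /* receive tag 1 first, matching the send order on process 0 */
        MPI_Recv(a, 10, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        MPI_Recv(b, 10, MPI_INT, 0, 2, MPI_COMM_WORLD, &status);
    }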

Avoiding Deadlocks
Consider the following piece of code, in which process i sends a message to process i + 1 (modulo the number of processes) and receives a message from process i - 1 (modulo the number of processes).

    int a[10], b[10], npes, myrank;
    MPI_Status status;
    ...
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1, MPI_COMM_WORLD);
    MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1, MPI_COMM_WORLD, &status);
    ...

Once again, we have a deadlock if MPI_Send is blocking.

Avoiding Deadlocks
We can break the circular wait to avoid deadlocks as follows:

    int a[10], b[10], npes, myrank;
    MPI_Status status;
    ...
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    if (myrank%2 == 1) {
        MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1, MPI_COMM_WORLD);
        MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1, MPI_COMM_WORLD, &status);
    }
    else {
        MPI_Recv(b, 10, MPI_INT, (myrank-1+npes)%npes, 1, MPI_COMM_WORLD, &status);
        MPI_Send(a, 10, MPI_INT, (myrank+1)%npes, 1, MPI_COMM_WORLD);
    }
    ...

Sending and Receiving Messages Simultaneously
To exchange messages, MPI provides the following function:

    int MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype senddatatype,
                     int dest, int sendtag,
                     void *recvbuf, int recvcount, MPI_Datatype recvdatatype,
                     int source, int recvtag,
                     MPI_Comm comm, MPI_Status *status)

The arguments include arguments to the send and receive functions. If we wish to use the same buffer for both send and receive, we can use:

    int MPI_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
                             int dest, int sendtag, int source, int recvtag,
                             MPI_Comm comm, MPI_Status *status)
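As a sketch (not on the slide), the circular exchange from the previous slides can be written with a single MPI_Sendrecv call per process, which removes the need to order sends and receives by rank parity:

/* Hedged sketch: the ring exchange expressed with MPI_Sendrecv, so no
 * manual ordering is needed to avoid deadlock. */
#include <mpi.h>

int main(int argc, char *argv[])
{
    int a[10] = {0}, b[10], npes, myrank;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &npes);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

    /* send a[] to the right neighbor, receive b[] from the left neighbor */
    MPI_Sendrecv(a, 10, MPI_INT, (myrank + 1) % npes, 1,
                 b, 10, MPI_INT, (myrank - 1 + npes) % npes, 1,
                 MPI_COMM_WORLD, &status);

    MPI_Finalize();
    return 0;
}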

Send/Recv Complex Data Types Across Heterogeneous Systems?
You might have wondered: “Why do MPI send/recv routines require the MPI data type as well as the size?”
–Seems like redundant information…
–Size is only one piece of the data puzzle… Byte order is the 2nd piece!! Big Endian vs. Little Endian!
–Different machine architectures have different byte orders.

Byte Ordering
How should bytes within a multi-byte word be ordered in memory?
Conventions:
–Older Macs are “Big Endian” machines: the least significant byte has the highest address.
–Intel PCs are “Little Endian” machines: the least significant byte has the lowest address.
Currently:
–Bi-Endian hardware: ARM, PowerPC, DEC Alpha, SPARC V9, MIPS, PA-RISC, and IA64.
Note: the term comes from Gulliver’s Travels, where it denotes which end of the soft-boiled egg you should crack… big or little “endian”.

Byte Ordering Example
Big Endian
–Least significant byte has highest address.
Little Endian
–Least significant byte has lowest address.
Example
–Variable x has 4-byte representation 0x…
–Address given by &x is 0x100.
(Figure: the four bytes of x laid out at addresses 0x100, 0x101, 0x102, 0x103, once in Big Endian order and once in Little Endian order.)
Ouch!.. what a pain for message passing programs!!
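A quick way to see the byte order on a particular machine (an illustrative sketch, not part of the slides) is to view a known 4-byte value through a char pointer:

/* Hedged sketch: print the bytes of a 4-byte value in memory order.
 * On a little-endian machine the lowest address holds the least
 * significant byte, so the output starts with 67; on a big-endian
 * machine it starts with 01. */
#include <stdio.h>

int main(void)
{
    unsigned int x = 0x01234567;
    unsigned char *p = (unsigned char *)&x;
    int i;

    for (i = 0; i < (int)sizeof(x); i++)
        printf("byte at &x+%d = %02x\n", i, p[i]);

    return 0;
}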

Pack/Unpack MPI Semantics
Solves “Endian” problems.

#include <mpi.h>
#include <stdlib.h>

{
    unsigned int membersize, maxsize;
    int position;
    int dest, tag;
    char *buffer;

    /*
     * Do this once.
     */
    MPI_Pack_size(1,               /* one element */
                  MPI_INT,         /* datatype integer */
                  MPI_COMM_WORLD,  /* consistent comm. */
                  &membersize);    /* max packing space req'd */
    maxsize = membersize;
    MPI_Pack_size(1024, MPI_CHAR, MPI_COMM_WORLD, &membersize);
    maxsize += membersize;
    buffer = malloc(maxsize);

    /*
     * Do this for every new message.
     */
    position = 0;
    MPI_Pack(&scanline.lineno,     /* pack this element */
             1,                    /* one element */
             MPI_INT,              /* datatype int */
             buffer,               /* packing buffer */
             maxsize,              /* buffer size */
             &position,            /* next free byte offset */
             MPI_COMM_WORLD);      /* consistent comm. */
    MPI_Pack(scanline.pixels, 1024, MPI_CHAR, buffer, maxsize,
             &position, MPI_COMM_WORLD);
    MPI_Send(buffer, position, MPI_PACKED, dest, tag, MPI_COMM_WORLD);
}

Recv side MPI Unpack…

{
    int src;
    int msgsize;
    MPI_Status status;

    MPI_Recv(buffer, maxsize, MPI_PACKED, src, tag, MPI_COMM_WORLD, &status);

    position = 0;
    MPI_Get_count(&status, MPI_PACKED, &msgsize);

    MPI_Unpack(buffer,             /* packing buffer */
               msgsize,            /* buffer size */
               &position,          /* next element byte offset */
               &scanline.lineno,   /* unpack this element */
               1,                  /* one element */
               MPI_INT,            /* datatype int */
               MPI_COMM_WORLD);    /* consistent comm. */
    MPI_Unpack(buffer, msgsize, &position, scanline.pixels,
               1024, MPI_CHAR, MPI_COMM_WORLD);
}

Summary
Parallel Programming Mental Model
–Think of a network of workstations (NOWs).
–Applications explicitly send and receive messages between processors/workstations.
MPI provides a clean API for sending and receiving messages between computer systems that provides the illusion of a larger, integrated parallel computer.
–Blocking send/recv
–Buffered send/recv
–Simultaneous send/recv
–Define new types
–Marshal whole complex data structures across heterogeneous systems.