MPI: the last episode By: Camilo A. Silva

Topics
–Modularity
–Data types
–Buffer issues + performance issues
–Compilation using MPICH2
–Other topics: MPI objects, tools for evaluating programs, and multiple program connection

Modularity What is a modular design? -The basic idea underlying modular design is to organize a complex system (such as a large program, an electronic circuit, or a mechanical device) as a set of distinct components that can be developed independently and then plugged together.

Why is it important?
–Programs may need to incorporate multiple parallel algorithms
–Large programs are easier to control when built from modules
–Modular design increases reliability and reduces costs

Modular design principles
–Provide simple interfaces
–Ensure that modules hide information
–Use appropriate tools

Modular design checklist The following design checklist can be used to evaluate the success of a modular design. As usual, each question should be answered in the affirmative.
1. Does the design identify clearly defined modules?
2. Does each module have a clearly defined purpose? (Can you summarize it in one sentence?)
3. Is each module's interface sufficiently abstract that you do not need to think about its implementation in order to understand it? Does it hide its implementation details from other modules?
4. Have you subdivided modules as far as usefully possible?
5. Have you verified that different modules do not replicate functionality?
6. Have you isolated those aspects of the design that are most hardware specific, complex, or otherwise likely to change?

Applying modularity in parallel programs Three (3) general forms of modular composition exist in parallel programs: sequential, parallel, and concurrent

Applying modularity using MPI MPI supports modular programming:
–Provides information hiding
–Encapsulates internal communication
Every MPI communication operation specifies a communicator, which:
–Identifies the process group
–Identifies the context

Implementing flexibility in communicators In the previous discussions, all communication operations have used the default communicator MPI_COMM_WORLD, which incorporates all processes involved in an MPI computation and defines a default context. Other functions add flexibility to the communicator and its context:
–MPI_COMM_DUP
–MPI_COMM_SPLIT
–MPI_INTERCOMM_CREATE
–MPI_COMM_FREE

Details of functions

Creating communicators A call of the form MPI_COMM_DUP(comm, newcomm) creates a new communicator newcomm comprising the same processes as comm but with a new context.

integer comm, newcomm, ierr              ! Handles are integers
...
call MPI_COMM_DUP(comm, newcomm, ierr)   ! Create new context
call transpose(newcomm, A)               ! Pass to library
call MPI_COMM_FREE(newcomm, ierr)        ! Free new context

Partitioning processes The term parallel composition is used to denote the parallel execution of two or more program components on disjoint sets of processors.

Program 1 (split into three groups by rank modulo 3):
MPI_Comm comm, newcomm;
int myid, color;
MPI_Comm_rank(comm, &myid);
color = myid % 3;
MPI_Comm_split(comm, color, myid, &newcomm);

Program 2 (select the first 8 processes):
MPI_Comm comm, newcomm;
int myid, color;
MPI_Comm_rank(comm, &myid);
if (myid < 8)              /* Select first 8 processes */
    color = 1;
else                       /* Others are not in group */
    color = MPI_UNDEFINED;
MPI_Comm_split(comm, color, myid, &newcomm);

Communicating between groups
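The figure that accompanied this slide is not reproduced in the transcript. As a minimal sketch of how two process groups can communicate (the even/odd split, leader ranks, and tag value are illustrative assumptions, not from the original slides), MPI_Comm_split can form the groups and MPI_Intercomm_create can join them:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm localcomm, intercomm;
    int rank, color;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Split the processes into two groups (assumes at least 2 processes) */
    color = rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &localcomm);

    /* Join the groups with an intercommunicator: the local leader is rank 0
       of each group; the remote leader is named by its rank in the peer
       communicator MPI_COMM_WORLD; tag 99 is arbitrary. */
    MPI_Intercomm_create(localcomm, 0, MPI_COMM_WORLD,
                         (color == 0) ? 1 : 0, 99, &intercomm);

    /* ... MPI_Send/MPI_Recv on intercomm reach ranks in the other group ... */

    MPI_Comm_free(&intercomm);
    MPI_Comm_free(&localcomm);
    MPI_Finalize();
    return 0;
}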

Datatypes

CODE 1 (Fortran):
call MPI_TYPE_CONTIGUOUS(10, MPI_REAL, tenrealtype, ierr)
call MPI_TYPE_COMMIT(tenrealtype, ierr)
call MPI_SEND(data, 1, tenrealtype, dest, tag,
     $        MPI_COMM_WORLD, ierr)
call MPI_TYPE_FREE(tenrealtype, ierr)

CODE 2 (C):
float data[1024];
MPI_Datatype floattype;
MPI_Type_vector(10, 1, 32, MPI_FLOAT, &floattype);
MPI_Type_commit(&floattype);
MPI_Send(data, 1, floattype, dest, tag, MPI_COMM_WORLD);
MPI_Type_free(&floattype);

Heterogeneity MPI datatypes have two main purposes:
–Heterogeneity: parallel programs that run across processors with different data representations
–Noncontiguous data: structures, vectors with non-unit stride, etc.
Basic datatypes, corresponding to the underlying language types, are predefined. The user can construct new datatypes at run time; these are called derived datatypes.

Datatypes
–Elementary: language-defined types (e.g., MPI_INT or MPI_DOUBLE_PRECISION)
–Vector: elements separated by a constant stride
–Contiguous: a vector with a stride of one
–Hvector: a vector, with the stride given in bytes
–Indexed: an array of indices (for scatter/gather)
–Hindexed: indexed, with the indices in bytes
–Struct: general mixed types (for C structs, etc.)

Vectors To specify this row (in C order), we can use
MPI_Type_vector(count, blocklen, stride, oldtype, &newtype);
MPI_Type_commit(&newtype);
The exact code for this is
MPI_Type_vector(5, 1, 7, MPI_DOUBLE, &newtype);
MPI_Type_commit(&newtype);
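The matrix figure that defined "this row" is not reproduced in the transcript. As an illustrative sketch (the 5x7 matrix, the helper function, and its parameters are assumptions), the same arguments, count 5, block length 1, stride 7, describe one double taken from each row of a 5x7 C matrix, i.e. a single column:

#include <mpi.h>

/* Send one column of a 5x7 matrix of doubles to process dest. */
void send_column(double a[5][7], int col, int dest, int tag)
{
    MPI_Datatype coltype;

    /* 5 blocks of 1 double, each 7 doubles apart in C row-major storage */
    MPI_Type_vector(5, 1, 7, MPI_DOUBLE, &coltype);
    MPI_Type_commit(&coltype);

    MPI_Send(&a[0][col], 1, coltype, dest, tag, MPI_COMM_WORLD);

    MPI_Type_free(&coltype);
}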

Structures Structures are described by three arrays:
–number of elements in each block (array_of_len)
–displacement or location of each block (array_of_displs)
–datatype of each block (array_of_types)
MPI_Type_struct(count, array_of_len, array_of_displs, array_of_types, &newtype);
(MPI-2 also provides the equivalent MPI_Type_create_struct.)

Structure example
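The worked example on this slide is not reproduced in the transcript. A minimal sketch, assuming a hypothetical particle struct and using the MPI-2 routine MPI_Type_create_struct:

#include <stddef.h>   /* offsetof */
#include <mpi.h>

/* Hypothetical record type used only for illustration */
struct particle {
    double x, y, z;
    int    charge;
};

/* Build and commit an MPI datatype matching struct particle */
void build_particle_type(MPI_Datatype *newtype)
{
    int          blocklens[2] = { 3, 1 };
    MPI_Aint     displs[2]    = { offsetof(struct particle, x),
                                  offsetof(struct particle, charge) };
    MPI_Datatype types[2]     = { MPI_DOUBLE, MPI_INT };

    MPI_Type_create_struct(2, blocklens, displs, types, newtype);
    MPI_Type_commit(newtype);
}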

Buffering Issues Where does data go when you send it? One possibility is:

Better buffering This is not very efficient: there are three copies in addition to the exchange of data between processes. We would prefer to move the data directly between the user buffers. But this requires either that MPI_Send not return until the data has been delivered, or that we allow a send operation to return before completing the transfer. In the latter case, we need to test for completion later.

Blocking + Non-blocking communication So far we have used blocking communication:
–MPI_Send does not complete until the buffer is empty (available for reuse).
–MPI_Recv does not complete until the buffer is full (available for use).
Simple, but can be "unsafe":
–Completion depends in general on the size of the message and the amount of system buffering.

Solutions to the "unsafe" problem
–Order the operations more carefully
–Supply the receive buffer at the same time as the send, with MPI_Sendrecv (see the sketch below)
–Use non-blocking operations
–Use MPI_Bsend
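To illustrate the MPI_Sendrecv option, here is a minimal sketch of a ring exchange (the neighbor layout and tag are assumptions, not from the original slides); sending to the right and receiving from the left in one call removes the need to order sends and receives by hand:

#include <mpi.h>

/* Exchange one double around a ring of processes. */
void ring_exchange(double sendval, double *recvval, MPI_Comm comm)
{
    int rank, size, left, right;
    MPI_Status status;

    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    right = (rank + 1) % size;
    left  = (rank - 1 + size) % size;

    /* Combined send and receive: MPI handles the pairing safely. */
    MPI_Sendrecv(&sendval, 1, MPI_DOUBLE, right, 0,
                 recvval,  1, MPI_DOUBLE, left,  0,
                 comm, &status);
}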

Non-blocking operations Non-blocking operations return (immediately) "request handles" that can be waited on and queried:
MPI_Isend(start, count, datatype, dest, tag, comm, request)
MPI_Irecv(start, count, datatype, source, tag, comm, request)
MPI_Wait(request, status)
One can also test without waiting:
MPI_Test(request, flag, status)
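A minimal sketch of the pattern (the partner rank, tag 0, and MPI_DOUBLE buffers are assumptions): post the receive, start the send, then wait on both requests before touching the buffers.

#include <mpi.h>

/* Exchange 'count' doubles with a partner process without blocking. */
void exchange_nonblocking(double *sendbuf, double *recvbuf, int count,
                          int partner, MPI_Comm comm)
{
    MPI_Request send_req, recv_req;
    MPI_Status  status;

    /* Both calls return immediately with request handles. */
    MPI_Irecv(recvbuf, count, MPI_DOUBLE, partner, 0, comm, &recv_req);
    MPI_Isend(sendbuf, count, MPI_DOUBLE, partner, 0, comm, &send_req);

    /* The buffers must not be reused until the requests complete. */
    MPI_Wait(&recv_req, &status);
    MPI_Wait(&send_req, &status);
}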

Multiple completions It is often desirable to wait on multiple requests. An example is a master/slave program, where the master waits for one or more slaves to send it a message (see the sketch below).
MPI_Waitall(count, array_of_requests, array_of_statuses)
MPI_Waitany(count, array_of_requests, index, status)
MPI_Waitsome(incount, array_of_requests, outcount, array_of_indices, array_of_statuses)
There are corresponding versions of test for each of these.
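A sketch of the master side of such a program, assuming each worker (ranks 1..nworkers) returns a single double with tag 0; these details are illustrative, not from the original slides:

#include <stdlib.h>
#include <mpi.h>

/* Post one receive per worker, then handle results as they arrive. */
void master_loop(int nworkers, MPI_Comm comm)
{
    MPI_Request *requests = malloc(nworkers * sizeof(MPI_Request));
    double      *results  = malloc(nworkers * sizeof(double));
    MPI_Status   status;
    int          i, done;

    for (i = 0; i < nworkers; i++)
        MPI_Irecv(&results[i], 1, MPI_DOUBLE, i + 1, 0, comm, &requests[i]);

    for (i = 0; i < nworkers; i++) {
        /* Wait for whichever request finishes next; 'done' is its index. */
        MPI_Waitany(nworkers, requests, &done, &status);
        /* results[done] is now valid and can be processed here. */
    }

    free(requests);
    free(results);
}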

Fairness

A parallel algorithm is fair if no process is effectively ignored. In the preceding program, processes with low rank (like process zero) may be the only ones whose messages are received. MPI makes no guarantees about fairness. However, MPI makes it possible to write efficient, fair programs.

Communication Modes MPI provides multiple modes for sending messages:
–Synchronous mode (MPI_Ssend): the send does not complete until a matching receive has begun. (Unsafe programs become incorrect and usually deadlock within an MPI_Ssend.)
–Buffered mode (MPI_Bsend): the user supplies a buffer to the system for its use. (The user supplies enough memory to make an unsafe program safe.)
–Ready mode (MPI_Rsend): the user guarantees that the matching receive has already been posted. Allows access to fast protocols; behavior is undefined if the matching receive is not posted.
Non-blocking versions: MPI_Issend, MPI_Irsend, MPI_Ibsend.
Note that an MPI_Recv may receive messages sent with any send mode.

Buffered Send MPI provides a send routine that may be used when MPI_Isend is awkward to use (e.g., lots of small messages). MPI_Bsend makes use of a user-provided buffer to save any messages that cannot be immediately sent.

int bufsize;
char *buf = malloc(bufsize);
MPI_Buffer_attach(buf, bufsize);
...
MPI_Bsend( ... same as MPI_Send ... );
...
MPI_Buffer_detach(&buf, &bufsize);

The MPI_Buffer_detach call does not complete until all messages are sent.

Performance Issues

MPICH2 MPICH2 is an all-new implementation of the MPI Standard, designed to implement all of the MPI-2 additions to MPI (dynamic process management, one-sided operations, parallel I/O, and other extensions) and to apply the lessons learned in implementing MPICH1 to make MPICH2 more robust, efficient, and convenient to use.

MPICH2: basic compilation and execution info
1. mpiexec -n 32 a.out
2. mpiexec -n 1 -host loginnode master : -n 32 -host smp slave
3. mpdtrace
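For context, the executable launched by these commands is typically built first with the MPICH2 compiler wrapper; the source and output file names below are placeholders:
mpicc -o a.out program.c
mpiexec -n 32 a.out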

Other topics: MPI Objects MPI has a variety of objects (communicators, groups, datatypes, etc.) that can be created and destroyed

MPI Objects
–MPI_Request: handle for nonblocking communication, normally freed by MPI in a test or wait
–MPI_Datatype: MPI datatype; free with MPI_Type_free
–MPI_Op: user-defined operation; free with MPI_Op_free
–MPI_Comm: communicator; free with MPI_Comm_free
–MPI_Group: group of processes; free with MPI_Group_free
–MPI_Errhandler: MPI error handler; free with MPI_Errhandler_free

Freeing objects
MPI_Type_vector(ly, 1, nx, MPI_DOUBLE, &newx1);
MPI_Type_hvector(lz, 1, nx*ny*sizeof(double), newx1, &newx);
MPI_Type_free(&newx1);
MPI_Type_commit(&newx);

Other topics: tools for evaluating programs MPI provides some tools for evaluating the performance of parallel programs. These are: –Timer –Profiling interface

MPI Timer The elapsed (wall-clock) time between two points in an MPI program can be computed using MPI_Wtime:

double t1, t2;
t1 = MPI_Wtime();
...
t2 = MPI_Wtime();
printf("Elapsed time is %f\n", t2 - t1);

The value returned by a single call to MPI_Wtime has little value.

MPI Profiling Mechanisms All routines have two entry points: MPI_... and PMPI_.... This makes it easy to provide a single level of low-overhead routines to intercept MPI calls without any source code modifications. Used to provide "automatic" generation of trace files.

static int nsend = 0;
int MPI_Send(void *start, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    nsend++;
    return PMPI_Send(start, count, datatype, dest, tag, comm);
}

Profiling routines

Log Files

Creating Log Files This is very easy with the MPICH implementation of MPI. Simply replace -lmpi with -llmpi -lpmpi -lm in the link line for your program, and relink your program. You do not need to recompile. On some systems, you can get a real-time animation by using the libraries -lampi -lmpe -lm -lX11 -lpmpi. Alternatively, you can use the -mpilog or -mpianim options to the mpicc or mpif77 commands.

Other topics: connecting several programs together MPI provides support for connecting separate message-passing programs through the use of intercommunicators.

Exchanging data between programs
–Form an intercommunicator (MPI_INTERCOMM_CREATE)
–Send data:
MPI_Send(..., 0, intercomm);
MPI_Recv(buf, ..., 0, intercomm);
MPI_Bcast(buf, ..., localcomm);
–More complex point-to-point operations can also be used

Collective operations Use MPI_INTERCOMM_MERGE to merge an intercommunicator into an intracommunicator, which can then be used for collective operations.

Conclusion So we learned:
–Point-to-point, collective, and asynchronous communication
–Modular programming techniques
–Data types
–MPICH2 basic compilation info
–Important and handy tools

References
http://www-unix.mcs.anl.gov/dbpp/text/node1.html
http://www-unix.mcs.anl.gov/mpi/tutorial/gropp/talk.html#Node0