Introduction to Collective Operations in MPI

Presentation transcript:

1 Introduction to Collective Operations in MPI
- Collective operations are called by all processes in a communicator.
- MPI_BCAST distributes data from one process (the root) to all others in a communicator.
- MPI_REDUCE combines data from all processes in a communicator and returns the result to one process.
- In many numerical algorithms, SEND/RECEIVE can be replaced by BCAST/REDUCE, improving both simplicity and efficiency.
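
A minimal sketch of this Bcast/Reduce pattern, not taken from the slides; the program name, the problem size n, and the toy partial sum are illustrative assumptions:

    program bcast_reduce_demo
      use mpi
      implicit none
      integer :: rank, n, ierr
      double precision :: local_sum, global_sum

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)

      ! The root sets the problem size; one Bcast replaces a loop of sends.
      if (rank == 0) n = 1000
      call MPI_Bcast(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

      ! Each process contributes a partial result; one Reduce replaces a loop of receives.
      local_sum = dble(rank) * dble(n)
      call MPI_Reduce(local_sum, global_sum, 1, MPI_DOUBLE_PRECISION, MPI_SUM, &
                      0, MPI_COMM_WORLD, ierr)

      if (rank == 0) print *, 'global sum =', global_sum
      call MPI_Finalize(ierr)
    end program bcast_reduce_demo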

2 MPI Collective Communication
- Communication and computation are coordinated among a group of processes in a communicator.
- Groups and communicators can be constructed "by hand" or using topology routines.
- Tags are not used; different communicators deliver similar functionality.
- No non-blocking collective operations (at the time of these slides; MPI-3 later added nonblocking collectives such as MPI_Ibcast).
- Three classes of operations: synchronization, data movement, collective computation.

3 Synchronization
- MPI_Barrier( comm ) blocks until all processes in the group of the communicator comm have called it.
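
One common illustrative use (not from the slide) is bracketing a timed region so all processes start together; this Fortran fragment assumes rank, ierr, and double precision t0, t1 are already declared:

    call MPI_Barrier(MPI_COMM_WORLD, ierr)   ! everyone starts together
    t0 = MPI_Wtime()
    ! ... work being timed ...
    call MPI_Barrier(MPI_COMM_WORLD, ierr)   ! make the end time meaningful
    t1 = MPI_Wtime()
    if (rank == 0) print *, 'elapsed:', t1 - t0, 'seconds'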

4 Collective Data Movement
[Diagram, processes P0-P3: Broadcast copies A from P0 to every process; Scatter splits A, B, C, D held by P0 so that each process receives one piece; Gather is the inverse, collecting the pieces back onto P0.]
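
A Fortran fragment sketching the Scatter/Gather half of the diagram; the four-element buffers and root rank 0 are assumptions, not part of the slides:

    integer, parameter :: p = 4
    integer :: sendbuf(p), piece, recvbuf(p), ierr
    ! sendbuf is significant only on the root (rank 0).
    call MPI_Scatter(sendbuf, 1, MPI_INTEGER, piece, 1, MPI_INTEGER, &
                     0, MPI_COMM_WORLD, ierr)
    ! Gather is the inverse: each process contributes its piece back to the root.
    call MPI_Gather(piece, 1, MPI_INTEGER, recvbuf, 1, MPI_INTEGER, &
                    0, MPI_COMM_WORLD, ierr)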

5 More Collective Data Movement
[Diagram, processes P0-P3: Allgather - each Pi contributes one item and every process ends up with the full set A, B, C, D. Alltoall - Pi starts with A_i, B_i, C_i, D_i and the data is transposed, so P0 ends up with A0, A1, A2, A3, P1 with B0, B1, B2, B3, and so on.]
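
A fragment sketching Allgather, assuming rank was obtained from MPI_Comm_rank and the communicator has 4 processes:

    integer :: mine, everyone(4), ierr
    mine = rank
    ! No root: every process ends up with all four contributions.
    call MPI_Allgather(mine, 1, MPI_INTEGER, everyone, 1, MPI_INTEGER, &
                       MPI_COMM_WORLD, ierr)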

6 Collective Computation
[Diagram, processes P0-P3 with inputs A, B, C, D: Reduce leaves the combined result (A op B op C op D) on the root; Scan leaves the prefix results A, AB, ABC, ABCD on P0, P1, P2, P3 respectively.]
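
A fragment sketching Scan as a prefix sum over ranks (assumed context: rank from MPI_Comm_rank):

    integer :: prefix, ierr
    ! After the call, process i holds 0 + 1 + ... + i.
    call MPI_Scan(rank, prefix, 1, MPI_INTEGER, MPI_SUM, MPI_COMM_WORLD, ierr)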

7 MPI Collective Routines
- Many routines: Allgather, Allgatherv, Allreduce, Alltoall, Alltoallv, Bcast, Gather, Gatherv, Reduce, Reduce_scatter, Scan, Scatter, Scatterv.
- The "All" versions deliver results to all participating processes.
- The "v" versions allow the chunks to have different sizes.
- Allreduce, Reduce, Reduce_scatter, and Scan take both built-in and user-defined combiner functions.
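
A sketch of a "v" routine on four ranks, where each rank contributes a different number of elements; the counts, buffer sizes, and use of rank are illustrative assumptions:

    integer :: counts(4), displs(4), gathered(10), i, ierr
    integer, allocatable :: mydata(:)
    counts = (/ 1, 2, 3, 4 /)            ! rank i sends i+1 integers
    displs(1) = 0
    do i = 2, 4
       displs(i) = displs(i-1) + counts(i-1)
    end do
    allocate(mydata(rank+1))
    mydata = rank
    call MPI_Gatherv(mydata, rank+1, MPI_INTEGER, &
                     gathered, counts, displs, MPI_INTEGER, &
                     0, MPI_COMM_WORLD, ierr)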

8 MPI Built-in Collective Computation Operations
- MPI_MAX      Maximum
- MPI_MIN      Minimum
- MPI_PROD     Product
- MPI_SUM      Sum
- MPI_LAND     Logical and
- MPI_LOR      Logical or
- MPI_LXOR     Logical exclusive or
- MPI_BAND     Bitwise and
- MPI_BOR      Bitwise or
- MPI_BXOR     Bitwise exclusive or
- MPI_MAXLOC   Maximum and location
- MPI_MINLOC   Minimum and location

9 Defining your own Collective Operations
- Create your own collective computations with:
  » MPI_Op_create( user_fcn, commutes, &op );
  » MPI_Op_free( &op );
  » user_fcn( invec, inoutvec, len, datatype );
- The user function should perform
    inoutvec[i] = invec[i] op inoutvec[i];
  for i from 0 to len-1.
- The user function can be non-commutative.
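
A Fortran sketch of the recipe above; the function name my_absmax and the buffers local, global, and length n are assumptions for illustration:

    ! User function: elementwise maximum of absolute values.
    ! It must compute inoutvec(i) = invec(i) op inoutvec(i).
    subroutine my_absmax(invec, inoutvec, len, datatype)
      integer :: len, datatype, i
      double precision :: invec(len), inoutvec(len)
      do i = 1, len
         inoutvec(i) = max(abs(invec(i)), abs(inoutvec(i)))
      end do
    end subroutine my_absmax

    ! Registering and using it (commute = .true. because the op is commutative):
    external my_absmax
    integer :: myop, ierr
    call MPI_Op_create(my_absmax, .true., myop, ierr)
    call MPI_Reduce(local, global, n, MPI_DOUBLE_PRECISION, myop, &
                    0, MPI_COMM_WORLD, ierr)
    call MPI_Op_free(myop, ierr)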

10 When not to use Collective Operations
- Sequences of collective communication can be pipelined for better efficiency.
- Example: Processor 0 reads data from a file and broadcasts it to all other processes.
  » Do i = 1, m
       if (rank .eq. 0) read *, a
       call mpi_bcast( a, n, MPI_INTEGER, 0, comm, ierr )
    EndDo
  » Takes m * n * log(p) time.
- It can be done in (m + p) * n time!

11 Pipeline the Messages
- Processor 0 reads data from a file and sends it to the next process; the others forward the data.
  » Do i = 1, m
       if (rank .eq. 0) then
          read *, a
          call mpi_send( a, n, type, 1, 0, comm, ierr )
       else
          call mpi_recv( a, n, type, rank-1, 0, comm, status, ierr )
          call mpi_send( a, n, type, next, 0, comm, ierr )
       endif
    EndDo
    (next is rank+1; the last process has no successor and would skip its forwarding send)

12 Concurrency between Steps
[Timeline diagram comparing the broadcast version with the pipelined version.]
- Another example of deferring synchronization.
- Each individual broadcast takes less time than the corresponding pipeline step, but the total time is longer.

13 Notes on Pipelining Example
- Use MPI_File_read_all
  » Even more optimizations possible:
    – multiple disk reads
    – pipeline the individual reads
    – block transfers
- Sometimes called a "digital orrery"
  » circulate particles in an n-body problem
  » even better performance if the pipeline never stops
- The "elegance" of collective routines can lead to fine-grain synchronization
  » performance penalty
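
A sketch of the collective read suggested above, using MPI-2 I/O; the file name 'data.in' and the integer buffer a of length n are assumptions:

    integer :: fh, ierr
    integer :: status(MPI_STATUS_SIZE)
    call MPI_File_open(MPI_COMM_WORLD, 'data.in', MPI_MODE_RDONLY, &
                       MPI_INFO_NULL, fh, ierr)
    ! A collective read: every process participates, and the library is
    ! free to coalesce and schedule the underlying disk accesses.
    call MPI_File_read_all(fh, a, n, MPI_INTEGER, status, ierr)
    call MPI_File_close(fh, ierr)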

14 Implementation Variations
- Implementations vary in goals and quality
  » short messages (minimize separate communication steps)
  » long messages (pipelining, network topology)
- MPI's general datatype rules make some algorithms more difficult to implement
  » datatypes can be different on different processes; only the type signature must match

15 Using Datatypes in Collective Operations
- Datatypes allow noncontiguous data to be moved (or computed with).
- As for all MPI communications, only the type signature (the basic, language-defined types) must match.
  » The layout in memory can differ on each process.

16 Example of Datatypes in Collective Operations
- Distribute a matrix from one processor to four:
  » Processor 0 gets A(0:n/2, 0:n/2), Processor 1 gets A(n/2+1:n, 0:n/2),
    Processor 2 gets A(0:n/2, n/2+1:n), Processor 3 gets A(n/2+1:n, n/2+1:n)
- Scatter (one to all, different data to each)
  » The data at the source is not contiguous (runs of n/2 numbers, separated by n/2 numbers)
  » Use a vector type to represent each submatrix

17 Matrix Datatype
- MPI_Type_vector( count = n/2 blocks, blocklength = n/2 elements per block,
                   stride from the start of one block to the next = n,
                   MPI_DOUBLE_PRECISION, &subarray_type )
- Can use this to send each submatrix, starting from its first element:
  » Do j = 0, 1
       Do i = 0, 1
          call MPI_Send( a(1+i*n/2, 1+j*n/2), 1, subarray_type, … )
       EndDo
    EndDo
- Note that ONE element of this type contains multiple basic elements.

18 Scatter with Datatypes
- Scatter behaves like:
  » Do i = 0, p-1
       call mpi_send( a(1 + i*extent(datatype)), … )
    EndDo
    – the "1+" comes from 1-origin indexing in Fortran
  » The extent is the distance from the beginning of the first to the end of the last data element
  » For subarray_type it is ((n/2-1)*n + n/2) * extent(double)

19 Layout of Matrix in Memory
[Diagram: memory layout of the matrix for the n = 8 example, showing which locations belong to Process 0, 1, 2, and 3.]

20 Using MPI_UB
- Set the extent of each datatype to n/2
  » the size of the contiguous block that all submatrices are built from
- Use Scatterv (its displacements are independent multiples of the extent)
- Location (beginning offset) of each block, shown for the n = 8 example as multiples of the new extent of 4 elements:
  » Processor 0: 0 * 4
  » Processor 1: 1 * 4
  » Processor 2: 8 * 4
  » Processor 3: 9 * 4
- MPI-2: use MPI_Type_create_resized instead

21 Changing Extent
- MPI_Type_struct:
  » types(1)   = subarray_type
    types(2)   = MPI_UB
    displac(1) = 0
    displac(2) = (n/2) * 8        ! Bytes!
    blklens(1) = 1
    blklens(2) = 1
    call MPI_Type_struct( 2, blklens, displac, types, newtype, ierr )
- newtype contains all of the data of subarray_type.
  » The only change is the "extent," which is used only when computing where in a buffer to get or put data relative to other data.
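
A sketch (assumed variable names) of the MPI-2 alternative mentioned on slide 20, MPI_Type_create_resized, which sets the lower bound to 0 and the extent to n/2 doubles without using MPI_UB:

    integer :: subarray_type, newtype, ierr, dbl_size
    integer(kind=MPI_ADDRESS_KIND) :: lb, extent
    call MPI_Type_size(MPI_DOUBLE_PRECISION, dbl_size, ierr)
    lb = 0
    extent = (n/2) * dbl_size
    call MPI_Type_create_resized(subarray_type, lb, extent, newtype, ierr)
    call MPI_Type_commit(newtype, ierr)

MPI_UB and MPI_LB were later deprecated, which is why the resized-datatype form is the one to prefer in new code.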

22 Scattering A Matrix
- sdispls(1) = 0
  sdispls(2) = 1
  sdispls(3) = n
  sdispls(4) = n + 1
  scounts(1:4) = 1
  call MPI_Scatterv( a, scounts, sdispls, newtype, &
                     alocal, n*n/4, MPI_DOUBLE_PRECISION, &
                     0, comm, ierr )
  » Note that process 0 sends 1 item of newtype to each process, but every process receives n*n/4 double precision elements.
- Exercise: work this out and convince yourself that it is correct.