MPI_Alltoall By: Jason Michalske

What is MPI_Alltoall? Each process sends a distinct block of data to every process, including itself. The jth block of process i's send buffer is received by process j and placed in the ith block of process j's receive buffer. (The slide's visual is not reproduced here.) Because of this block-exchange pattern, MPI_Alltoall can be used to perform a transpose over multiple processors; the slide gives an example of this, and an illustration follows below.
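The "Simple Visual" from the slide is not reproduced in this transcript. As an illustrative stand-in (the values are invented for this example), consider 3 processes with sendcount = 1: if process 0's send buffer is {a0, a1, a2}, process 1's is {b0, b1, b2}, and process 2's is {c0, c1, c2}, then after the call process 0's receive buffer is {a0, b0, c0}, process 1's is {a1, b1, c1}, and process 2's is {a2, b2, c2}. Viewing the blocks as a matrix with one row per process, the operation is exactly a transpose.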

What is MPI_Alltoall? int MPI_Alltoall(void *sbuf, int scount, MPI_Datatype stype, void *rbuf, int rcount, MPI_Datatype rtype, MPI_Comm comm). Each process sends scount elements of type stype from its sbuf to every process, and receives rcount elements of type rtype from every process into its rbuf. In this version of Alltoall, stype must match rtype and scount must match rcount.

Using MPI_Alltoall MPI_Alltoall(localA, sendcount, MPI_FLOAT, localB, recvcount, MPI_FLOAT, MPI_COMM_WORLD); Here, localA and localB are each arrays of (number of processors * sendcount) floating-point numbers; sendcount and recvcount are the same. Example: my code (not reproduced in this transcript; a sketch of a comparable test program follows below).
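The slide's own test code ("my code") is not included in the transcript. The following is a minimal sketch of a test program of the kind described, assuming float data, a hard-coded sendcount, and an initialization pattern that identifies the sender; these details are assumptions, not the original code.

/* Minimal sketch of an MPI_Alltoall test (not the original code from the
   slides; the initialization and output are illustrative assumptions). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    int rank, nprocs;
    const int sendcount = 1;                 /* elements sent to each process */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each buffer holds nprocs * sendcount floats, as the slide describes. */
    float *localA = malloc(nprocs * sendcount * sizeof(float));
    float *localB = malloc(nprocs * sendcount * sizeof(float));

    /* Block j of localA is the data destined for process j; encode the
       sender's rank in the values so the result is easy to check. */
    for (int i = 0; i < nprocs * sendcount; i++)
        localA[i] = rank * 100.0f + i;

    MPI_Alltoall(localA, sendcount, MPI_FLOAT,
                 localB, sendcount, MPI_FLOAT, MPI_COMM_WORLD);

    /* After the call, block i of localB holds the data sent by process i. */
    printf("rank %d received:", rank);
    for (int i = 0; i < nprocs * sendcount; i++)
        printf(" %.0f", localB[i]);
    printf("\n");

    free(localA);
    free(localB);
    MPI_Finalize();
    return 0;
}

Run with, for example, mpirun -np 6 ./alltoall_test to mirror the 6-process, sendcount = 1 case shown in the later slides.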

How does MPI_Alltoall work? Each process executes a send to each process (including itself) with a call like MPI_Send(sendbuf + i*sendcount, sendcount, sendtype, i, …) and a receive from every process with a call like MPI_Recv(recvbuf + i*recvcount, recvcount, recvtype, i, …). Example: the simplest solution uses conditional statements (a hypothetical sketch follows below).
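The conditional-statement code itself is not included in the transcript. The fragment below is a hypothetical sketch of that approach for exactly two processes (sendbuf and recvbuf are assumed to be float arrays of 2 * sendcount elements allocated elsewhere); it is shown only to make the problems discussed on the next slide concrete.

/* Hypothetical sketch of the "conditional statements" approach (not the
   slide's actual code). The self-copy of each rank's own block is omitted. */
int rank, tag = 0;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

if (rank == 0) {
    /* send block 1 to rank 1, then receive rank 1's block into slot 1 */
    MPI_Send(sendbuf + 1 * sendcount, sendcount, MPI_FLOAT, 1, tag, MPI_COMM_WORLD);
    MPI_Recv(recvbuf + 1 * sendcount, sendcount, MPI_FLOAT, 1, tag, MPI_COMM_WORLD, &status);
} else if (rank == 1) {
    /* rank 1 also sends first: if MPI_Send cannot buffer the message, both
       ranks sit in MPI_Send waiting for a matching receive, and the program
       never finishes (the deadlock discussed next) */
    MPI_Send(sendbuf + 0 * sendcount, sendcount, MPI_FLOAT, 0, tag, MPI_COMM_WORLD);
    MPI_Recv(recvbuf + 0 * sendcount, sendcount, MPI_FLOAT, 0, tag, MPI_COMM_WORLD, &status);
}
/* Extending this to 6 or 10 processes means copying and editing these branches
   by hand for every rank, which is the repetitiveness problem noted next. */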

Problems with MPI_Send and Conditional Statements The code is very repetitive for any number of processors. What about different data types? Handling each one with its own conditional branch would make the code very complex. And what if two processes each block, waiting for the other to finish? That is a deadlock, and the execution never completes.

Fixing the Problems Repetitive code is generally best solved by looping. Different data types between calls to MPI_Alltoall also become much easier to handle inside a looping construct. The remaining problem is that MPI_Send and MPI_Recv are blocking functions: no code beyond the call can execute until the call is locally complete.

Fixing the Problems To solve the blocking problem, you can use the non-blocking forms of communication, MPI_Isend and MPI_Irecv. These calls return immediately, but the send buffer may not be modified until the request has been completed by MPI_Wait, MPI_Test, or one of the other MPI wait or test functions.

Better Solution Instead of a blocking MPI_Send and MPI_Recv for each rank, loop over every rank, posting a non-blocking send and receive for that rank's block and waiting for both to complete:

/* sendbuf, recvbuf, sendcount, recvcount, sendtype and recvtype are the
   Alltoall arguments; numtasks is the communicator size */
float *fsendbuf, *frecvbuf;
MPI_Request reqs[2];
MPI_Status stats[2];
int source, dest, tag = 1;

if (sendtype == MPI_FLOAT) {
    fsendbuf = (float *)sendbuf;        /* view the buffers as float arrays */
    frecvbuf = (float *)recvbuf;
    for (source = 0; source < numtasks; source++) {
        dest = source;
        /* post the send of block 'dest' and the receive of block 'source' */
        MPI_Isend(fsendbuf + (dest * sendcount), sendcount, sendtype,
                  dest, tag, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(frecvbuf + (source * recvcount), recvcount, recvtype,
                  source, tag, MPI_COMM_WORLD, &reqs[1]);
        MPI_Waitall(2, reqs, stats);    /* both must finish before the next rank */
    }
    recvbuf = frecvbuf;
    sendbuf = fsendbuf;
}
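The float-specific cast above works because the count and datatype arguments are simply forwarded to MPI_Isend and MPI_Irecv. As an addition that is not part of the original presentation, here is a sketch of how the same loop can be written generically for any datatype by computing block offsets in bytes with MPI_Type_size (the function name my_alltoall is made up for this sketch):

/* Hypothetical generic variant (not from the slides). Assumes, as the slides
   do, that sendtype and recvtype describe elements of the same size. */
#include <mpi.h>
#include <stddef.h>

int my_alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                MPI_Comm comm)
{
    int numtasks, typesize, tag = 1, p;
    MPI_Comm_size(comm, &numtasks);
    MPI_Type_size(sendtype, &typesize);     /* bytes per element */

    char *sbytes = (char *)sendbuf;         /* byte pointers for offset math */
    char *rbytes = (char *)recvbuf;
    MPI_Request reqs[2];

    for (p = 0; p < numtasks; p++) {
        /* post the send of block p and the receive of block p, then wait */
        MPI_Isend(sbytes + (size_t)p * sendcount * typesize, sendcount,
                  sendtype, p, tag, comm, &reqs[0]);
        MPI_Irecv(rbytes + (size_t)p * recvcount * typesize, recvcount,
                  recvtype, p, tag, comm, &reqs[1]);
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    }
    return MPI_SUCCESS;
}

Waiting inside the loop keeps at most one send and one receive outstanding, matching the slide's code; posting all 2 * numtasks requests first and finishing with a single MPI_Waitall is a common refinement.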

Does the Code Work? With 6 processes and a sendcount of 1, the resulting receive buffers on each process were as shown in the next visual (these are the same parameters as the graphic used earlier to explain what Alltoall should do). With 10 processes and a sendcount of 2, the resulting receive buffers were likewise as shown in the next visual.

Comparisons Until the data size approached 60 million elements, both versions of MPI_Alltoall took very nearly the same time. Beyond 60 million, for runs in which neither program exceeded 600 seconds, my version appeared to be much faster. A sample buffer location was printed for verification; my version's results were consistent with the real Alltoall.

Conclusions The timing data showed that both versions performed very similarly on smaller data sets. As the send and receive buffers grew, so did the time. – More data means bigger messages between the processors, which makes the function take longer. For send and receive buffers of the same length and an increasing number of processors, the time also increased. – More processors means more communications for the same data set, which leads to more function calls and thus longer times.