COMP7330/7336 Advanced Parallel and Distributed Computing MPI Programming: 1. Collective Operations 2. Overlapping Communication with Computation Dr. Xiao Qin Auburn University

MPI_Bcast: Bad Example

MPI_Bcast: Good Example

MPI_Bcast: Another Example
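
The bodies of the three broadcast example slides are not captured in this transcript. As a generic, hedged illustration of correct MPI_Bcast usage (not the slides' actual examples): MPI_Bcast is a collective call, so every process in the communicator must call it with a matching root; guarding the call with if (rank == 0) is the classic mistake.

    /* Hedged, generic illustration -- not the slides' examples. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) value = 42;   /* only the root has the data initially */

        /* Wrong: calling MPI_Bcast only on the root (e.g. inside
         * "if (rank == 0)") leaves the other processes waiting forever. */

        /* Correct: every rank calls MPI_Bcast with the same root argument. */
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

        printf("rank %d received %d\n", rank, value);
        MPI_Finalize();
        return 0;
    }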

Gather and Scatter
The gather operation: a single node collects a unique message from each node. The scatter operation: a single node sends a unique message of size m to every other node (also called a one-to-all personalized communication). Although scatter is semantically different from broadcast, its algorithmic structure is similar; the difference lies in the message sizes (messages get smaller along a scatter, but stay the same size in a broadcast).

Gather
The gather operation is performed in MPI using:
    int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype senddatatype, void *recvbuf, int recvcount, MPI_Datatype recvdatatype, int target, MPI_Comm comm)
For comparison:
    int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int target, MPI_Comm comm)
Q1: What are the differences between the Gather and Reduce operations?
Each process sends the data stored in sendbuf to the target process. The data are received in the recvbuf of the target process in rank order: data from process i are stored in recvbuf starting at location i*sendcount.
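
As a hedged usage sketch (not from the slide), the following gathers one integer from every rank at rank 0; the per-process value and the printing are illustrative assumptions. MPI_Reduce, by contrast, would combine the contributions into a single value (e.g. with MPI_SUM) instead of keeping them separate.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int myval = rank * 10;                  /* arbitrary per-process data */
        int *all = NULL;
        if (rank == 0)
            all = malloc(nprocs * sizeof(int)); /* recvbuf is only used at the target */

        /* Gather one int from every rank; all[i] on rank 0 holds rank i's value. */
        MPI_Gather(&myval, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            for (int i = 0; i < nprocs; i++)
                printf("value from rank %d: %d\n", i, all[i]);
            free(all);
        }
        MPI_Finalize();
        return 0;
    }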

Allgather
    int MPI_Gather(void *sendbuf, int sendcount, MPI_Datatype senddatatype, void *recvbuf, int recvcount, MPI_Datatype recvdatatype, int target, MPI_Comm comm)
MPI also provides the MPI_Allgather function, in which the data are gathered at all the processes:
    int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype senddatatype, void *recvbuf, int recvcount, MPI_Datatype recvdatatype, MPI_Comm comm)
Q2: What are the differences between the Gather and Allgather operations?
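
A hedged sketch (not from the slide): every rank contributes its rank number and every rank receives the complete array, so no root/target argument is needed. MAXP is an assumed upper bound on the number of processes.

    #include <mpi.h>
    #include <stdio.h>

    #define MAXP 64                      /* assumed upper bound on process count */

    int main(int argc, char *argv[]) {
        int rank, nprocs, all[MAXP];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Every rank contributes one int and every rank receives the full array. */
        MPI_Allgather(&rank, 1, MPI_INT, all, 1, MPI_INT, MPI_COMM_WORLD);

        printf("rank %d sees:", rank);
        for (int i = 0; i < nprocs; i++) printf(" %d", all[i]);
        printf("\n");

        MPI_Finalize();
        return 0;
    }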

Scatter
The scatter operation is performed in MPI using:
    int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype senddatatype, void *recvbuf, int recvcount, MPI_Datatype recvdatatype, int source, MPI_Comm comm)
For comparison:
    int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int source, MPI_Comm comm)
Q3: What are the differences between the Scatter and Broadcast operations?
Process i receives sendcount contiguous elements of type senddatatype, starting from location i*sendcount of the sendbuf of the source process.
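
A hedged sketch (not from the slide): rank 0 scatters one distinct integer to each process; the values are illustrative. A broadcast would instead deliver the same buffer to every rank.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, nprocs, myval;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int *sendbuf = NULL;
        if (rank == 0) {                         /* sendbuf matters only at the source */
            sendbuf = malloc(nprocs * sizeof(int));
            for (int i = 0; i < nprocs; i++) sendbuf[i] = 100 + i;
        }

        /* Each rank i receives element i of sendbuf. */
        MPI_Scatter(sendbuf, 1, MPI_INT, &myval, 1, MPI_INT, 0, MPI_COMM_WORLD);
        printf("rank %d got %d\n", rank, myval);

        if (rank == 0) free(sendbuf);
        MPI_Finalize();
        return 0;
    }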

Overlapping Communication with Computation
MPI provides non-blocking send and receive operations:
    int MPI_Isend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm, MPI_Request *request)
    int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Request *request)
These calls return immediately, before the communication they start has completed. Compare them with the blocking versions:
    int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
    int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
Q4: What are the differences between these blocking and non-blocking operations?
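
A hedged sketch (not from the slides) of the overlap pattern: post MPI_Isend/MPI_Irecv, compute on independent data while the messages are in flight, then complete both requests with MPI_Waitall (a variant of MPI_Wait for several requests) before reusing the buffers. The neighbor choice and the dummy computation are illustrative assumptions.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, nprocs, sendval, recvval;
        MPI_Request reqs[2];
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        int right = (rank + 1) % nprocs;
        int left  = (rank + nprocs - 1) % nprocs;
        sendval = rank;

        /* Post the communication first ... */
        MPI_Isend(&sendval, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Irecv(&recvval, 1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[1]);

        /* ... do independent computation while the messages are in flight ... */
        double local = 0.0;
        for (int i = 0; i < 1000000; i++) local += i * 0.5;

        /* ... then wait for both operations before touching the buffers. */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        printf("rank %d received %d (local work = %f)\n", rank, recvval, local);

        MPI_Finalize();
        return 0;
    }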

What is the “MPI_Request” parameter?
The MPI_Request handle identifies a pending non-blocking operation so that it can be queried or completed later.
MPI_Test checks whether the non-blocking send or receive operation identified by its request has finished:
    int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)
MPI_Wait blocks until the operation has completed:
    int MPI_Wait(MPI_Request *request, MPI_Status *status)
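
A hedged sketch (not from the slides) showing MPI_Test used to poll a pending receive while doing other work; it assumes the program is run with at least two processes, and the "work" counter stands in for useful computation.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value, flag = 0, work = 0;
        MPI_Request req;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 99;
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
            /* Poll the request; keep doing local work until the message arrives. */
            while (!flag) {
                MPI_Test(&req, &flag, &status);
                if (!flag) work++;             /* stand-in for useful computation */
            }
            printf("rank 1 got %d after %d polling iterations\n", value, work);
            /* MPI_Wait(&req, &status) would instead block here until completion. */
        }

        MPI_Finalize();
        return 0;
    }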

Exercise: Ring (Non-blocking Communication)
Goal: pass each process's rank around a ring; the sum of all ranks is accumulated and printed by every process.
Each process stores its rank in MPI_COMM_WORLD as an integer and sends this value to the process on its right. It then receives an integer from its left neighbor, keeping track of the sum of all integers received. The processes keep passing on the values they receive until they get their own rank back. Each process finishes by printing the sum of the values.

Ring (Non-blocking Communication)
Use the synchronous non-blocking send MPI_Issend(). Be careful not to overwrite a value that is still being sent. Synchronous mode is used because the standard send (MPI_Send) may be either buffered or synchronous, depending on the implementation.

Ring - Non-blocking Communication
    int main(int argc, char *argv[])
    {
        int myrank, nprocs, leftid, rightid;
        int val, sum, tmp;
        MPI_Status recv_status, send_status;
        MPI_Request send_request;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        ......

How to find neighbors?
    if ((leftid = (myrank - 1)) < 0)        leftid  = nprocs - 1;
    if ((rightid = (myrank + 1)) == nprocs) rightid = 0;

How to implement the while loop?
    val = myrank;
    sum = 0;
    while (val != myrank) {
        val = tmp;
        sum += val;
    }
Note that this loop never executes: val equals myrank before the first test, so the condition is false immediately (and tmp has not yet been received). This is why the next slide uses a do-while loop, which tests the condition after the body.

How to use MPI_Issend() in the loop body?
    val = myrank;
    sum = 0;
    do {
        MPI_Issend(&val, 1, MPI_INT, rightid, 99, MPI_COMM_WORLD, &send_request);
        MPI_Recv(&tmp, 1, MPI_INT, leftid, 99, MPI_COMM_WORLD, &recv_status);
        MPI_Wait(&send_request, &send_status);
        val = tmp;
        sum += val;
    } while (val != myrank);
Why do we need two variables (i.e., val and tmp) here?
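
Putting the fragments from the preceding slides together, a complete program might look like the sketch below. The final printf mirrors the exercise requirement that each process print the accumulated sum; the exact output format and the return statement are assumptions.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int myrank, nprocs, leftid, rightid;
        int val, sum, tmp;
        MPI_Status recv_status, send_status;
        MPI_Request send_request;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        /* Neighbors on the ring, with wrap-around at both ends. */
        if ((leftid = (myrank - 1)) < 0)        leftid  = nprocs - 1;
        if ((rightid = (myrank + 1)) == nprocs) rightid = 0;

        val = myrank;
        sum = 0;
        do {
            /* Post the synchronous non-blocking send, receive from the left,
             * then wait so that val is safe to overwrite. */
            MPI_Issend(&val, 1, MPI_INT, rightid, 99, MPI_COMM_WORLD, &send_request);
            MPI_Recv(&tmp, 1, MPI_INT, leftid, 99, MPI_COMM_WORLD, &recv_status);
            MPI_Wait(&send_request, &send_status);
            val = tmp;
            sum += val;
        } while (val != myrank);

        printf("rank %d: sum of ranks = %d\n", myrank, sum);  /* assumed output format */
        MPI_Finalize();
        return 0;
    }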

Ring (Blocking Communication)
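
The body of this slide is not in the transcript. As a hedged sketch (not necessarily the slide's solution), one standard blocking approach replaces the Issend/Recv/Wait sequence with MPI_Sendrecv, which performs the send and the receive in a single call and therefore avoids the deadlock that naive blocking synchronous sends around a ring could cause. It assumes the same myrank/nprocs/leftid/rightid setup as the non-blocking version above.

    /* Hedged blocking variant of the ring loop (setup as in the previous sketch). */
    val = myrank;
    sum = 0;
    do {
        MPI_Sendrecv(&val, 1, MPI_INT, rightid, 99,   /* send current value to the right */
                     &tmp, 1, MPI_INT, leftid,  99,   /* receive from the left           */
                     MPI_COMM_WORLD, &recv_status);
        val = tmp;
        sum += val;
    } while (val != myrank);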