
September 4, 1997
Parallel Processing (CS 667)
Lecture 9: Advanced Point to Point Communication
Jeremy R. Johnson
*Parts of this lecture were derived from chapter 13 of Pacheco.

Introduction
Objective: To further examine message passing communication patterns.
Topics:
  Implementing Allgather
    Ring
    Hypercube
  Non-blocking send/recv
    MPI_Isend
    MPI_Wait
    MPI_Test

Broadcast/Reduce Ring
[Figure: broadcast and reduce on a four-process ring (P0, P1, P2, P3), shown stage by stage.]

Bi-directional Broadcast Ring
[Figure: broadcast on a four-process ring with messages forwarded in both directions around the ring.]

Allgather Ring
[Figure: allgather on a four-process ring; after the first stage each process holds its own block plus the block received from its predecessor (e.g. P0 holds x0,x3 and P1 holds x0,x1).]

Allgather
int MPI_Allgather(
    void*        send_data    /* in  */,
    int          send_count   /* in  */,
    MPI_Datatype send_type    /* in  */,
    void*        recv_data    /* out */,
    int          recv_count   /* in  */,
    MPI_Datatype recv_type    /* in  */,
    MPI_Comm     communicator /* in  */)

Before the call, process 0 holds x0, process 1 holds x1, process 2 holds x2, and process 3 holds x3; afterwards every process holds x0, x1, x2, x3.
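A minimal usage sketch (not from the original slides), where each process contributes a single float; the fixed-size result array is an assumption for brevity:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    int p, my_rank;
    float x;
    float y[64];                      /* assumes at most 64 processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    x = (float) my_rank;              /* this process's contribution */

    /* gather one float from every process into y on every process */
    MPI_Allgather(&x, 1, MPI_FLOAT, y, 1, MPI_FLOAT, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("process 0 now holds %d values, y[p-1] = %f\n", p, y[p-1]);

    MPI_Finalize();
    return 0;
}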

Allgather_ring
void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, p, my_rank;
    int successor, predecessor;
    int send_offset, recv_offset;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);

    /* copy the local block into its slot of the result array */
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];

    successor   = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;

Allgather_ring (continued)
    /* at each stage, pass along the block received in the previous stage */
    for (i = 0; i < p-1; i++) {
        send_offset = ((my_rank - i + p) % p)*blocksize;
        recv_offset = ((my_rank - i - 1 + p) % p)*blocksize;
        MPI_Send(y + send_offset, blocksize, MPI_FLOAT, successor, 0, comm);
        MPI_Recv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0, comm, &status);
    }
}
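A small driver (an assumption, not part of the slides) that could exercise Allgather_ring with blocksize 1; the array bound is hypothetical:

#include <mpi.h>
#include <stdio.h>

void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm);

int main(int argc, char* argv[]) {
    int p, my_rank;
    float x[1];
    float y[64];                       /* assumes at most 64 processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    x[0] = (float) my_rank;
    Allgather_ring(x, 1, y, MPI_COMM_WORLD);

    printf("rank %d: y[0] = %f, y[%d] = %f\n", my_rank, y[0], p-1, y[p-1]);

    MPI_Finalize();
    return 0;
}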

Hypercube
Graph (recursively defined): an n-dimensional cube has 2^n nodes, each connected to n neighbors. Binary labels of adjacent nodes differ in exactly one bit.
[Figure: 1-, 2-, and 3-dimensional hypercubes with node labels 0/1, 00..11, and 000..111.]
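As a small illustration (a sketch, not from the slides), the neighbors of a node can be enumerated by flipping one bit of its label at a time; the chosen dimension and node are arbitrary:

#include <stdio.h>

int main(void) {
    int d = 3;                /* dimension: 2^3 = 8 nodes */
    int node = 5;             /* binary 101               */
    int i;

    /* adjacent nodes differ from `node` in exactly one bit */
    for (i = 0; i < d; i++)
        printf("neighbor across dimension %d: %d\n", i, node ^ (1 << i));

    return 0;
}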

Broadcast/Reduce
[Figure: broadcast and reduce on a 3-dimensional hypercube (nodes 000 through 111), proceeding one dimension at a time.]

Allgather
[Figure: allgather on a 3-dimensional hypercube (nodes 000 through 111); at each stage, partner nodes exchange their accumulated blocks across one dimension.]

Allgather
[Figure: the data blocks held at each node after successive stages of the hypercube allgather.]

Allgather_cube
void Allgather_cube(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, d, p, my_rank;
    unsigned eor_bit, and_bits;
    int stage, partner;
    MPI_Datatype hole_type;
    int send_offset, recv_offset;
    MPI_Status status;
    int log_base2(int p);

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);

    /* copy the local block into its slot of the result array */
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];

    d = log_base2(p);
    eor_bit = 1 << (d-1);
    and_bits = (1 << d) - 1;

Allgather_cube (continued)
    for (stage = 0; stage < d; stage++) {
        partner = my_rank ^ eor_bit;
        send_offset = (my_rank & and_bits)*blocksize;
        recv_offset = (partner & and_bits)*blocksize;
        /* strided type: 2^stage blocks of blocksize floats, spaced 2^(d-stage) blocks apart */
        MPI_Type_vector(1 << stage, blocksize, (1 << (d-stage))*blocksize,
                        MPI_FLOAT, &hole_type);
        MPI_Type_commit(&hole_type);
        MPI_Send(y + send_offset, 1, hole_type, partner, 0, comm);
        MPI_Recv(y + recv_offset, 1, hole_type, partner, 0, comm, &status);
        MPI_Type_free(&hole_type);
        eor_bit = eor_bit >> 1;
        and_bits = and_bits >> 1;
    }
}
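The helper log_base2 is declared above but not shown on the slides; one possible definition (an assumption) is:

/* returns floor(log2(p)); for the hypercube code, p is assumed to be a power of 2 */
int log_base2(int p) {
    int d = 0;
    while (p > 1) {
        p = p >> 1;
        d++;
    }
    return d;
}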

Buffering Assumption
The previous code is not safe: it assumes enough system buffer space is available for the sends to complete, and it can deadlock if that space is not there. MPI_Sendrecv can be used to guarantee that deadlock does not occur.

SendRecv
int MPI_Sendrecv(
    void*        send_buf     /* in  */,
    int          send_count   /* in  */,
    MPI_Datatype send_type    /* in  */,
    int          dest         /* in  */,
    int          send_tag     /* in  */,
    void*        recv_buf     /* out */,
    int          recv_count   /* in  */,
    MPI_Datatype recv_type    /* in  */,
    int          source       /* in  */,
    int          recv_tag     /* in  */,
    MPI_Comm     communicator /* in  */,
    MPI_Status*  status       /* out */)
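As an illustration, a sketch (not from the slides) of the ring allgather with the Send/Recv pair replaced by a single MPI_Sendrecv; the name Allgather_ring_safe is hypothetical:

#include <mpi.h>

void Allgather_ring_safe(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, p, my_rank, successor, predecessor;
    int send_offset, recv_offset;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];
    successor   = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;

    for (i = 0; i < p-1; i++) {
        send_offset = ((my_rank - i + p) % p)*blocksize;
        recv_offset = ((my_rank - i - 1 + p) % p)*blocksize;
        /* the send and receive are paired inside one call, so correctness
           no longer depends on the system buffering the outgoing message */
        MPI_Sendrecv(y + send_offset, blocksize, MPI_FLOAT, successor, 0,
                     y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
                     comm, &status);
    }
}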

SendRecvReplace
int MPI_Sendrecv_replace(
    void*        buffer       /* in/out */,
    int          count        /* in  */,
    MPI_Datatype datatype     /* in  */,
    int          dest         /* in  */,
    int          send_tag     /* in  */,
    int          source       /* in  */,
    int          recv_tag     /* in  */,
    MPI_Comm     communicator /* in  */,
    MPI_Status*  status       /* out */)
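A minimal sketch (not from the slides) that shifts one integer around a ring using a single buffer:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    int p, my_rank, value, successor, predecessor;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    value = my_rank;
    successor   = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;

    /* send value to the successor and overwrite it with the value
       received from the predecessor, reusing the same buffer */
    MPI_Sendrecv_replace(&value, 1, MPI_INT, successor, 0,
                         predecessor, 0, MPI_COMM_WORLD, &status);

    printf("rank %d now holds %d\n", my_rank, value);

    MPI_Finalize();
    return 0;
}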

Nonblocking Send/Recv
Allows overlap of communication and computation. The call does not wait for the buffer to be copied or for the receive to occur; the communication is posted and can be tested later for completion.
int MPI_Isend(               /* I = Immediate */
    void*        buffer   /* in  */,
    int          count    /* in  */,
    MPI_Datatype datatype /* in  */,
    int          dest     /* in  */,
    int          tag      /* in  */,
    MPI_Comm     comm     /* in  */,
    MPI_Request* request  /* out */)

Nonblocking Send/Recv (continued)
int MPI_Irecv(
    void*        buffer   /* out */,
    int          count    /* in  */,
    MPI_Datatype datatype /* in  */,
    int          source   /* in  */,
    int          tag      /* in  */,
    MPI_Comm     comm     /* in  */,
    MPI_Request* request  /* out */)

int MPI_Wait(
    MPI_Request* request  /* in/out */,
    MPI_Status*  status   /* out */)

int MPI_Test(
    MPI_Request* request  /* in/out */,
    int*         flag     /* out */,
    MPI_Status*  status   /* out */)
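A minimal sketch (not from the slides) of polling a posted receive with MPI_Test while other work continues; it assumes the program runs with at least two processes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    int my_rank, flag = 0, value = 0;
    MPI_Request request;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (my_rank == 1) {
        /* post the receive, then overlap computation with polling */
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &request);
        while (!flag) {
            /* ... useful computation would go here ... */
            MPI_Test(&request, &flag, &status);
        }
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}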

Allgather_ring (Overlapped)
    send_offset = my_rank*blocksize;   /* assumed initialization: start by sending the local block */
    recv_offset = ((my_rank - 1 + p) % p)*blocksize;
    for (i = 0; i < p-1; i++) {
        /* post the transfers, then compute the next offsets while they proceed */
        MPI_Isend(y + send_offset, blocksize, MPI_FLOAT, successor, 0, comm, &send_request);
        MPI_Irecv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0, comm, &recv_request);
        send_offset = ((my_rank - i - 1 + p) % p)*blocksize;
        recv_offset = ((my_rank - i - 2 + p) % p)*blocksize;
        MPI_Wait(&send_request, &status);
        MPI_Wait(&recv_request, &status);
    }


Alltoall
int MPI_Alltoall(
    void*        send_buffer  /* in  */,
    int          send_count   /* in  */,
    MPI_Datatype send_type    /* in  */,
    void*        recv_buffer  /* out */,
    int          recv_count   /* in  */,
    MPI_Datatype recv_type    /* in  */,
    MPI_Comm     communicator /* in  */)

Before -> after (block ij is the j-th block of process i):
    Process 0: 00 01 02 03  ->  00 10 20 30
    Process 1: 10 11 12 13  ->  01 11 21 31
    Process 2: 20 21 22 23  ->  02 12 22 32
    Process 3: 30 31 32 33  ->  03 13 23 33
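A minimal usage sketch (not from the slides), sending one int to every other process; the fixed buffer size is an assumption:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char* argv[]) {
    int p, my_rank, i;
    int send_buffer[64], recv_buffer[64];   /* assumes at most 64 processes */

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* block j of the send buffer is destined for process j */
    for (i = 0; i < p; i++)
        send_buffer[i] = 10*my_rank + i;

    MPI_Alltoall(send_buffer, 1, MPI_INT, recv_buffer, 1, MPI_INT, MPI_COMM_WORLD);

    /* recv_buffer[i] now holds block my_rank of process i, i.e. 10*i + my_rank */
    printf("rank %d received %d from process %d\n", my_rank, recv_buffer[p-1], p-1);

    MPI_Finalize();
    return 0;
}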

AlltoAll
Sequence of permutations implemented with send_recv.
[Figure: the stages of the permutation-based all-to-all exchange.]

AlltoAll (2 way)
Sequence of permutations implemented with send_recv.
[Figure: the stages of the two-way all-to-all exchange.]
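To make the permutation idea concrete, a hypothetical sketch (the name alltoall_shift and the one-int block size are assumptions) that builds an all-to-all from p-1 cyclic-shift permutations, each realized with MPI_Sendrecv:

#include <mpi.h>

/* all-to-all via p-1 cyclic-shift permutations, one int per block */
void alltoall_shift(int send[], int recv[], MPI_Comm comm) {
    int i, p, my_rank;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);

    recv[my_rank] = send[my_rank];            /* the local block stays put */
    for (i = 1; i < p; i++) {
        int dest   = (my_rank + i) % p;       /* stage i: shift by i */
        int source = (my_rank - i + p) % p;   /* the process whose shift-by-i lands here */
        MPI_Sendrecv(&send[dest],   1, MPI_INT, dest,   0,
                     &recv[source], 1, MPI_INT, source, 0,
                     comm, &status);
    }
}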

Communication Modes
Synchronous (MPI_Ssend: waits until the matching receive has started)
Ready (MPI_Rsend: requires that the matching receive has already been posted)
Buffered (MPI_Bsend: the user provides the buffer space)
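A minimal sketch (not from the slides) contrasting synchronous and buffered sends; it assumes at least two processes:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char* argv[]) {
    int my_rank, value, bufsize;
    void* buf;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {
        value = 7;
        /* synchronous mode: completes only after the receive has started */
        MPI_Ssend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);

        /* buffered mode: the user attaches the buffer space explicitly */
        bufsize = sizeof(int) + MPI_BSEND_OVERHEAD;
        buf = malloc(bufsize);
        MPI_Buffer_attach(buf, bufsize);
        MPI_Bsend(&value, 1, MPI_INT, 1, 1, MPI_COMM_WORLD);
        MPI_Buffer_detach(&buf, &bufsize);
        free(buf);
    } else if (my_rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Recv(&value, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, &status);
        printf("rank 1 received %d twice\n", value);
    }

    MPI_Finalize();
    return 0;
}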