
1 Parallel Processing (CS 730)
Lecture 9: Advanced Point-to-Point Communication
Jeremy R. Johnson
Oct. 30, 2002
*Parts of this lecture were derived from Chapter 13 of Pacheco.

2 Introduction
Objective: To further examine message-passing communication patterns.
Topics:
  Implementing Allgather
    Ring
    Hypercube
  Non-blocking send/recv
    MPI_Isend
    MPI_Wait
    MPI_Test

3 Broadcast/Reduce Ring
[Diagram: the stages of a broadcast/reduce on a four-process ring (P0, P1, P2, P3)]
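
The figure survives only as node labels, so here is a minimal sketch of a store-and-forward ring broadcast consistent with the slide title, assuming root 0; the function name Ring_bcast is my own, not from the slides.

  #include <mpi.h>

  /* Sketch: broadcast count floats from rank 0 around a ring.  Every
     process except the root receives from its predecessor; every process
     except the last forwards to its successor. */
  void Ring_bcast(float x[], int count, MPI_Comm comm) {
      int p, my_rank, successor, predecessor;
      MPI_Status status;

      MPI_Comm_size(comm, &p);
      MPI_Comm_rank(comm, &my_rank);
      successor = (my_rank + 1) % p;
      predecessor = (my_rank - 1 + p) % p;

      if (my_rank != 0)
          MPI_Recv(x, count, MPI_FLOAT, predecessor, 0, comm, &status);
      if (my_rank != p - 1)
          MPI_Send(x, count, MPI_FLOAT, successor, 0, comm);
  }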

4 Bi-directional Broadcast Ring
[Diagram: the stages of a bidirectional broadcast on a four-process ring (P0, P1, P2, P3)]

5 Allgather Ring
[Diagram: allgather on a four-process ring; each process starts with its own block xi and accumulates x0, x1, x2, x3 as the blocks circulate around the ring]

6 AllGather
int MPI_Allgather(
    void*        send_data    /* in  */,
    int          send_count   /* in  */,
    MPI_Datatype send_type    /* in  */,
    void*        recv_data    /* out */,
    int          recv_count   /* in  */,
    MPI_Datatype recv_type    /* in  */,
    MPI_Comm     communicator /* in  */)
[Diagram: process i contributes block xi; after the call every process holds x0, x1, x2, x3]
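
A minimal usage sketch (my own example, not from the slides); the block size and the process-count bound are illustrative.

  #include <mpi.h>
  #define BLOCKSIZE 4                     /* illustrative block size */

  int main(int argc, char* argv[]) {
      int p, my_rank, i;
      float x[BLOCKSIZE];
      float y[64 * BLOCKSIZE];            /* assumes at most 64 processes */

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &p);
      MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

      for (i = 0; i < BLOCKSIZE; i++)     /* fill this process's block */
          x[i] = (float) my_rank;

      /* Afterwards y holds block 0, block 1, ..., block p-1 on every process. */
      MPI_Allgather(x, BLOCKSIZE, MPI_FLOAT, y, BLOCKSIZE, MPI_FLOAT, MPI_COMM_WORLD);

      MPI_Finalize();
      return 0;
  }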

7 Allgather_ring
void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, p, my_rank;
    int successor, predecessor;
    int send_offset, recv_offset;
    MPI_Status status;

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];
    successor = (my_rank + 1) % p;
    predecessor = (my_rank - 1 + p) % p;

8 Allgather_ring
    for (i = 0; i < p-1; i++) {
        send_offset = ((my_rank - i + p) % p)*blocksize;
        recv_offset = ((my_rank - i - 1 + p) % p)*blocksize;
        MPI_Send(y + send_offset, blocksize, MPI_FLOAT, successor, 0, comm);
        MPI_Recv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0, comm, &status);
    }
}
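
A usage sketch of my own (not from the slides) that compares the hand-written ring version against the library collective on the same block; the 1024-element bound is an assumption for illustration.

  #include <stdio.h>
  #include <mpi.h>

  void Allgather_ring(float x[], int blocksize, float y[], MPI_Comm comm);

  /* Run the ring allgather and MPI_Allgather on the same data and report
     any element where the two results differ. */
  void check_allgather_ring(float x[], int blocksize, MPI_Comm comm) {
      int i, p, my_rank;
      float y_ring[1024], y_lib[1024];    /* assumes p * blocksize <= 1024 */

      MPI_Comm_size(comm, &p);
      MPI_Comm_rank(comm, &my_rank);

      Allgather_ring(x, blocksize, y_ring, comm);
      MPI_Allgather(x, blocksize, MPI_FLOAT, y_lib, blocksize, MPI_FLOAT, comm);

      for (i = 0; i < p * blocksize; i++)
          if (y_ring[i] != y_lib[i])
              printf("rank %d: mismatch at index %d\n", my_rank, i);
  }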

9 Hypercube Graph (recursively defined)
An n-dimensional cube has 2^n nodes, and each node is connected to n other nodes.
Binary labels of adjacent nodes differ in exactly one bit.
[Diagram: hypercubes of dimension 1, 2, and 3, with node labels 0/1, 00-11, and 000-111]
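
Because adjacent labels differ in one bit, a node's n neighbors can be generated by XOR-ing its rank with each single-bit mask; a small illustration of my own (not from the slides):

  #include <stdio.h>

  /* Print the d neighbors of `rank` in a d-dimensional hypercube: flipping
     bit j with XOR gives the neighbor across dimension j. */
  void print_hypercube_neighbors(int rank, int d) {
      int j;
      for (j = 0; j < d; j++)
          printf("dimension %d neighbor of %d: %d\n", j, rank, rank ^ (1 << j));
  }

  int main(void) {
      print_hypercube_neighbors(5 /* 101 */, 3);  /* prints 4 (100), 7 (111), 1 (001) */
      return 0;
  }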

10 Broadcast/Reduce
[Diagram: broadcast/reduce along the dimensions of a 3-dimensional hypercube, nodes 000-111]

11 Allgather
[Diagram: allgather exchanges along the dimensions of a 3-dimensional hypercube, nodes 000-111]

12 Allgather
[Diagram: the allgather data movement across eight processes, step by step]

13 Allgather_cube
void Allgather_cube(float x[], int blocksize, float y[], MPI_Comm comm) {
    int i, d, p, my_rank;
    unsigned eor_bit, and_bits;
    int stage, partner;
    MPI_Datatype hole_type;
    int send_offset, recv_offset;
    MPI_Status status;
    int log_base2(int p);

    MPI_Comm_size(comm, &p);
    MPI_Comm_rank(comm, &my_rank);
    for (i = 0; i < blocksize; i++)
        y[i + my_rank*blocksize] = x[i];
    d = log_base2(p);
    eor_bit = 1 << (d-1);
    and_bits = (1 << d) - 1;

14 Allgather_cube
    for (stage = 0; stage < d; stage++) {
        partner = my_rank ^ eor_bit;
        send_offset = (my_rank & and_bits)*blocksize;
        recv_offset = (partner & and_bits)*blocksize;
        MPI_Type_vector(1 << stage, blocksize, (1 << (d-stage))*blocksize,
                        MPI_FLOAT, &hole_type);
        MPI_Type_commit(&hole_type);
        MPI_Send(y + send_offset, 1, hole_type, partner, 0, comm);
        MPI_Recv(y + recv_offset, 1, hole_type, partner, 0, comm, &status);
        MPI_Type_free(&hole_type);
        eor_bit = eor_bit >> 1;
        and_bits = and_bits >> 1;
    }
}
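
The code declares log_base2 but its body does not appear on the slides; a minimal sketch consistent with the declaration, assuming p is a power of two:

  /* Return log2(p) for a positive power of two p. */
  int log_base2(int p) {
      int d = 0;
      while (p > 1) {
          p = p >> 1;
          d++;
      }
      return d;
  }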

15 Buffering Assumption
The previous code is not safe: it relies on the system providing enough buffer space that the sends can complete before the matching receives are posted, and it can deadlock if that buffering is unavailable.
MPI_Sendrecv can be used to guarantee that deadlock does not occur.

16 SendRecv
int MPI_Sendrecv(
    void*        send_buf     /* in  */,
    int          send_count   /* in  */,
    MPI_Datatype send_type    /* in  */,
    int          dest         /* in  */,
    int          send_tag     /* in  */,
    void*        recv_buf     /* out */,
    int          recv_count   /* in  */,
    MPI_Datatype recv_type    /* in  */,
    int          source       /* in  */,
    int          recv_tag     /* in  */,
    MPI_Comm     communicator /* in  */,
    MPI_Status*  status       /* out */)
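
For example, the Send/Recv pair inside Allgather_ring can be replaced by a single combined call so the exchange no longer depends on system buffering; a sketch of just that loop (my own rewrite, reusing the variables defined in Allgather_ring above):

  /* Deadlock-free ring exchange: each iteration sends one block to the
     successor and receives one block from the predecessor in one call. */
  for (i = 0; i < p - 1; i++) {
      send_offset = ((my_rank - i + p) % p)*blocksize;
      recv_offset = ((my_rank - i - 1 + p) % p)*blocksize;
      MPI_Sendrecv(y + send_offset, blocksize, MPI_FLOAT, successor,   0,
                   y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0,
                   comm, &status);
  }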

17 SendRecvReplace
int MPI_Sendrecv_replace(
    void*        buffer       /* in/out */,
    int          count        /* in  */,
    MPI_Datatype datatype     /* in  */,
    int          dest         /* in  */,
    int          send_tag     /* in  */,
    int          source       /* in  */,
    int          recv_tag     /* in  */,
    MPI_Comm     communicator /* in  */,
    MPI_Status*  status       /* out */)
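
A usage sketch of my own (not from the slides): circularly shift one value around the ring in place, so the send buffer is overwritten by the received value.

  /* my_rank, successor, predecessor, comm, and status as defined earlier. */
  float value = (float) my_rank;
  MPI_Sendrecv_replace(&value, 1, MPI_FLOAT, successor, 0,
                       predecessor, 0, comm, &status);
  /* value now holds the predecessor's rank. */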

18 Nonblocking Send/Recv
Nonblocking operations allow overlap of communication and computation: the call does not wait for the buffer to be copied or for the receive to occur. The communication is posted and can be tested later for completion.
int MPI_Isend(               /* Immediate */
    void*        buffer   /* in  */,
    int          count    /* in  */,
    MPI_Datatype datatype /* in  */,
    int          dest     /* in  */,
    int          tag      /* in  */,
    MPI_Comm     comm     /* in  */,
    MPI_Request* request  /* out */)

19 Nonblocking Send/Recv
int MPI_Irecv(
    void*        buffer   /* out */,
    int          count    /* in  */,
    MPI_Datatype datatype /* in  */,
    int          source   /* in  */,
    int          tag      /* in  */,
    MPI_Comm     comm     /* in  */,
    MPI_Request* request  /* out */)

int MPI_Wait(
    MPI_Request* request  /* in/out */,
    MPI_Status*  status   /* out */)

int MPI_Test(
    MPI_Request* request  /* in/out */,
    int*         flag     /* out */,
    MPI_Status*  status   /* out */)
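
A sketch of how MPI_Test lets computation proceed while a receive is pending (my own illustration; do_some_work is a hypothetical placeholder, and BLOCKSIZE, predecessor, and comm are assumed from the earlier examples):

  MPI_Request request;
  MPI_Status  status;
  int         flag = 0;
  float       buffer[BLOCKSIZE];

  /* Post the receive, then keep computing and polling until it completes. */
  MPI_Irecv(buffer, BLOCKSIZE, MPI_FLOAT, predecessor, 0, comm, &request);
  while (!flag) {
      do_some_work();                     /* hypothetical useful computation */
      MPI_Test(&request, &flag, &status); /* flag becomes nonzero on completion */
  }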

20 Allgather_ring (Overlapped)
    send_offset = my_rank*blocksize;      /* first block sent is this process's own */
    recv_offset = ((my_rank - 1 + p) % p)*blocksize;
    for (i = 0; i < p-1; i++) {
        MPI_Isend(y + send_offset, blocksize, MPI_FLOAT, successor, 0, comm, &send_request);
        MPI_Irecv(y + recv_offset, blocksize, MPI_FLOAT, predecessor, 0, comm, &recv_request);
        send_offset = ((my_rank - i - 1 + p) % p)*blocksize;
        recv_offset = ((my_rank - i - 2 + p) % p)*blocksize;
        MPI_Wait(&send_request, &status);
        MPI_Wait(&recv_request, &status);
    }

21 AlltoAll
An all-to-all can be implemented as a sequence of permutations, each carried out with send_recv.
[Diagram: the sequence of exchange steps among eight processes]
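
A minimal sketch of that idea, written by me rather than taken from the slide's diagram: in step i every process sends one block to (my_rank + i) % p and receives one from (my_rank - i + p) % p, so each step is a cyclic permutation.

  #include <mpi.h>

  /* x holds p outgoing blocks (blocksize floats for each destination);
     y receives p incoming blocks, one from each source. */
  void Alltoall_perm(float x[], int blocksize, float y[], MPI_Comm comm) {
      int i, j, p, my_rank, send_to, recv_from;
      MPI_Status status;

      MPI_Comm_size(comm, &p);
      MPI_Comm_rank(comm, &my_rank);

      for (j = 0; j < blocksize; j++)               /* own block: local copy */
          y[my_rank*blocksize + j] = x[my_rank*blocksize + j];

      for (i = 1; i < p; i++) {
          send_to   = (my_rank + i) % p;
          recv_from = (my_rank - i + p) % p;
          MPI_Sendrecv(x + send_to*blocksize,   blocksize, MPI_FLOAT, send_to,   0,
                       y + recv_from*blocksize, blocksize, MPI_FLOAT, recv_from, 0,
                       comm, &status);
      }
  }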

22 AlltoAll (2 way)
A sequence of permutations implemented with send_recv.
[Diagram: the two-way exchange steps among eight processes]

23 Communication Modes
  Synchronous (wait for receive)
  Ready (make sure receive has been posted)
  Buffered (user provides buffer space)
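
These modes correspond to MPI_Ssend, MPI_Rsend, and MPI_Bsend, respectively. A minimal sketch of the buffered mode, which requires the user to attach buffer space first (my own example; sizes are illustrative):

  #include <stdlib.h>
  #include <mpi.h>

  /* Buffered send: MPI_Bsend copies the message into user-attached buffer
     space and returns without waiting for a matching receive. */
  void buffered_send_example(float x[], int count, int dest, MPI_Comm comm) {
      int   bufsize = count * sizeof(float) + MPI_BSEND_OVERHEAD;
      char* buf = (char*) malloc(bufsize);

      MPI_Buffer_attach(buf, bufsize);
      MPI_Bsend(x, count, MPI_FLOAT, dest, 0, comm);
      MPI_Buffer_detach(&buf, &bufsize);  /* blocks until the buffer is reusable */
      free(buf);
  }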

