1 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI Part III NPACI Parallel Computing Institute.

1 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI Part III NPACI Parallel Computing Institute August 19 - 23, 2002 San Diego Supercomputer Center

2 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Point to Point Communications in MPI Basic operations of Point to Point (PtoP) communication and issues of deadlock Several steps are involved in the PtoP communication Sending process –data is copied to the user buffer by the user –User calls one of the MPI send routines –System copies the data from the user buffer to the system buffer –System sends the data from the system buffer to the destination processor

3 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Point to Point Communications in MPI Receiving process –User calls one of the MPI receive subroutines –System receives the data from the source process, and copies it to the system buffer –System copies the data from the system buffer to the user buffer –User uses the data in the user buffer

4 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE sendbuf Call send routine Now sendbuf can be reused Process 0 : User mode Kernel mode Copying data from sendbuf to systembuf Send data from sysbuf to dest data Process 1 : User modeKernel mode Call receive routine receive data from src to systembuf Copying data from sysbuf to recvbuf sysbuf recvbuf Now recvbuf contains valid data

5 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Unidirectional communication Blocking send and blocking receive if (myrank == 0) then call MPI_Send(…) elseif (myrank == 1) then call MPI_Recv(….) endif Non-blocking send and blocking receive if (myrank == 0) then call MPI_ISend(…) call MPI_Wait(…) else if (myrank == 1) then call MPI_Recv(….) endif

6 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Blocking send and non-blocking recv if (myrank == 0 ) then call MPI_Send(…..) elseif (myrank == 1) then call MPI_Irecv (…) call MPI_Wait(…) endif Non-blocking send and non-blocking recv if (myrank == 0 ) then call MPI_Isend (…) call MPI_Wait (…) elseif (myrank == 1) then call MPI_Irecv (….) call MPI_Wait(..) endif Unidirectional communication

7 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Bidirectional communication Need to be careful about deadlock when two processes exchange data with each other Deadlock can occur due to incorrect order of send and recv or due to limited size of the system buffer sendbuf recvbuf Rank 0 Rank 1 recvbuf sendbuf

8 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Bidirectional communication Case 1 : both processes call send first, then recv if (myrank == 0 ) then call MPI_Send(….) call MPI_Recv (…) elseif (myrank == 1) then call MPI_Send(….) call MPI_Recv(….) endif No deadlock as long as system buffer is larger than send buffer Deadlock if system buffer is smaller than send buf If you replace MPI_Send with MPI_Isend and MPI_Wait, it is still the same Moral : there may be error in coding that only shows up for larger problem size

9 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Bidirectional communication Following is free from deadlock if (myrank == 0 ) then call MPI_Isend(….) call MPI_Recv (…) call MPI_Wait(…) elseif (myrank == 1) then call MPI_Isend(….) call MPI_Recv(….) call MPI_Wait(….) endif

10 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Bidirectional communication Case 2 : both processes call recv first, then send if (myrank == 0 ) then call MPI_Recv(….) call MPI_Send (…) elseif (myrank == 1) then call MPI_Recv(….) call MPI_Send(….) endif The above will always lead to deadlock (even if you replace MPI_Send with MPI_Isend and MPI_Wait)

11 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Bidirectional communication The following code can be safely executed if (myrank == 0 ) then call MPI_Irecv(….) call MPI_Send (…) call MPI_Wait(…) elseif (myrank == 1) then call MPI_Irecv(….) call MPI_Send(….) call MPI_Wait(….) endif

12 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Bidirectional communication Case 3 : one process call send and recv in this order, and the other calls in the opposite order if (myrank == 0 ) then call MPI_Send(….) call MPI_Recv(…) elseif (myrank == 1) then call MPI_Recv(….) call MPI_Send(….) endif The above is always safe You can replace both send and recv on both processor with Isend and Irecv

13 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Scatter and Gather A Ap0 p1 p2 p3 p0 p1 p2 p3 A A A broadcast scatter ABCDA B C D gather ABCDABCD A B C D all gather p0 p1 p2 p3 p0 p1 p2 p3 p0 p1 p2 p3 p0 p1 p2 p3

14 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Scatter Operation using MPI_Scatter Similar to Broadcast but sends a section of an array to each processors A(0) A(1) A(2).. ………. A(N-1) P 0 P 1 P 2... P n-1 Goes to processors: Data in an array on root node:

15 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_Scatter C –int MPI_Scatter(&sendbuf, sendcnts, sendtype, &recvbuf, recvcnts, recvtype, root, comm ); Fortran –MPI_Scatter(sendbuf,sendcnts,sendtype, recvbuf,recvcnts,recvtype,root,comm,ierror) Parameters –sendbuf is an array of size (number processors*sendcnts) –sendcnts number of elements sent to each processor –recvcnts number of element(s) obtained from the root processor –recvbuf contains element(s) obtained from the root processor, may be an array

16 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Scatter Operation using MPI_Scatter Scatter with Sendcnts = 2 A(0)A(2)A(4)... A(2N-2) A(1) A(3) A(5)... A(2N-1) P 0 P 1 P 2... P n-1 B(0)B(0)B(0) B(0) B(1)B(1)B(1) B(1) Goes to processors: Data in an array on root node:

17 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Gather Operation using MPI_Gather Used to collect data from all processors to the root, inverse of scatter Data is collected into an array on root processor A(0) A(1) A(2)... A(N-1) P 0 P 1 P 2... P n-1 A 0 A 1 A 2... A n-1 Data from various Processors: Goes to an array on root node:

18 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_Gather C –int MPI_Gather(&sendbuf,sendcnts, sendtype, &recvbuf, recvcnts,recvtype,root, comm ); Fortran –MPI_Gather(sendbuf,sendcnts,sendtype, recvbuf,recvcnts,recvtype,root,comm,ierror) Parameters –sendcnts number of elements sent from each processor –sendbuf is an array of size sendcnts –recvcnts number of elements obtained from each processor –recvbuf of size recvcnts*number of processors

19 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Code for Scatter and Gather A parallel program to scatter data using MPI_Scatter Each processor sums the data Use MPI_Gather to get the data back to the root processor Root processor prints the global data See attached Fortran and C code

20 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE module mpi !DEC$ NOFREEFORM include "mpif.h“ !DEC$ FREEFORM end module ! This program shows how to use MPI_Scatter and MPI_Gather ! Each processor gets different data from the root processor ! by way of mpi_scatter. The data is summed and then sent back ! to the root processor using MPI_Gather. The root processor ! then prints the global sum. module global integer numnodes,myid,mpi_err integer, parameter :: mpi_root=0 end module subroutine init use mpi use global implicit none ! do the mpi init stuff call MPI_INIT( mpi_err ) call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err ) call MPI_Comm_rank(MPI_COMM_WORLD, myid, mpi_err)

21 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE end subroutine init program test1 use mpi use global implicit none integer, allocatable :: myray(:),send_ray(:),back_ray(:) integer count integer size,mysize,i,k,j,total call init ! each processor will get count elements from the root count=4 allocate(myray(count)) ! create the data to be sent on the root if(myid == mpi_root)then size=count*numnodes allocate(send_ray(0:size-1)) allocate(back_ray(0:numnodes-1)) do i=0,size-1 send_ray(i)= i enddo endif

22 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE call MPI_Scatter( send_ray, count, MPI_INTEGER, & myray, count, MPI_INTEGER, & mpi_root, MPI_COMM_WORLD,mpi_err) ! each processor does a local sum total=sum(myray) write(*,*)"myid= ",myid," total= ",total ! send the local sums back to the root call MPI_Gather( total, 1, MPI_INTEGER, & back_ray, 1, MPI_INTEGER, & mpi_root, MPI_COMM_WORLD,mpi_err) ! the root prints the global sum if(myid == mpi_root)then write(*,*)"results from all processors= ",sum(back_ray) endif call mpi_finalize(mpi_err) end program

23 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE #include #include #include /*! This program shows how to use MPI_Scatter and MPI_Gather ! Each processor gets different data from the root processor ! by way of mpi_scatter. The data is summed and then sent back ! to the root processor using MPI_Gather. The root processor ! then prints the global sum. */ /* globals */ int numnodes,myid,mpi_err; #define mpi_root 0 /* end globals */ void init_it(int *argc, char ***argv); void init_it(int *argc, char ***argv) { mpi_err = MPI_Init(argc,argv); mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes ); mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid); }

24 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE int main(int argc,char *argv[]){ int *myray,*send_ray,*back_ray; int count; int size,mysize,i,k,j,total; init_it(&argc,&argv); /* each processor will get count elements from the root */ count=4; myray=(int*)malloc(count*sizeof(int)); /* create the data to be sent on the root */ if(myid == mpi_root){ size=count*numnodes; send_ray=(int*)malloc(size*sizeof(int)); back_ray=(int*)malloc(numnodes*sizeof(int)); for(i=0;i<size;i++) send_ray[i]=i; } /* send different data to each processor */

25 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE mpi_err = MPI_Scatter( send_ray, count, MPI_INT, myray, count, MPI_INT, mpi_root, MPI_COMM_WORLD); /* each processor does a local sum */ total=0; for(i=0;i<count;i++) total=total+myray[i]; printf("myid= %d total= %d\n ",myid,total); /* send the local sums back to the root */ mpi_err = MPI_Gather(&total, 1, MPI_INT, back_ray, 1, MPI_INT, mpi_root, MPI_COMM_WORLD); /* the root prints the global sum */ if(myid == mpi_root){ total=0; for(i=0;i<numnodes;i++) total=total+back_ray[i]; printf("results from all processors= %d \n ",total); } mpi_err = MPI_Finalize();}

26 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Output of previous fortran code on 4 procs ultra:/work/majumdar/examples/mpi % bsub -q hpc -m ultra -I -n 4./a.out Job is submitted to queue. > myid= 1 total= 22 myid= 2 total= 38 myid= 3 total= 54 myid= 0 total= 6 results from all processors= 120 ( 0 through 15 added up = (15) (15 + 1) /2 = 120)

27 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Global Sum with MPI_Reduce 2d array spread across processors

28 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE All Gather and All Reduce Gather and Reduce come in an "ALL" variation Results are returned to all processors The root parameter is missing from the call Similar to a gather or reduce followed by a broadcast

29 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Global Sum with MPI_AllReduce 2d array spread across processors

30 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE All to All communication with MPI_Alltoall Each processor sends and receives data to/from all others C –int MPI_Alltoall(&sendbuf,sendcnts, sendtype, &recvbuf, recvcnts, recvtype, MPI_Comm); Fortran –call MPI_Alltoall(sendbuf,sendcnts,sendtype, recvbuf,recvcnts,recvtype,comm,ierror)

31 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE b0 b1 b2 b3 c0 c1 c2 c3 d0 d1 d2 d3 a0 a1 a2 a3 a1 b1 c1 d1 a2 b2 c2 d2 a3 b3 c3 d3 a0 b0 c0 d0 All to All

32 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE All to All with MPI_Alltoall Parameters –sendcnts # of elements sent to each processor –sendbuf is an array of size sendcnts –recvcnts # of elements obtained from each processor –recvbuf of size recvcnts Note that both send buffer and receive buffer must be an array of size of the number of processors See attached Fortran and C codes

33 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE module mpi !DEC$ NOFREEFORM include "mpif.h“ !DEC$ FREEFORM end module ! This program shows how to use MPI_Alltoall. Each processor ! send/rec a different random number to/from other processors. module global integer numnodes,myid,mpi_err integer, parameter :: mpi_root=0 end module subroutine init use mpi use global implicit none ! do the mpi init stuff call MPI_INIT( mpi_err ) call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err ) call MPI_Comm_rank(MPI_COMM_WORLD, myid, mpi_err) end subroutine init

34 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE program test1 use mpi use global implicit none integer, allocatable :: scounts(:),rcounts(:) integer ssize,rsize,i,k,j real z call init ! counts and displacement arrays allocate(scounts(0:numnodes-1)) allocate(rcounts(0:numnodes-1)) call seed_random ! find data to send do i=0,numnodes-1 call random_number(z) scounts(i)=nint(10.0*z)+1 Enddo write(*,*)"myid= ",myid," scounts= ",scounts

35 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE ! send the data call MPI_alltoall( scounts,1,MPI_INTEGER, & rcounts,1,MPI_INTEGER, MPI_COMM_WORLD,mpi_err) write(*,*)"myid= ",myid," rcounts= ",rcounts call mpi_finalize(mpi_err) end program subroutine seed_random use global implicit none integer the_size,j integer, allocatable :: seed(:) real z call random_seed(size=the_size) ! how big is the intrisic seed? allocate(seed(the_size)) ! allocate space for seed do j=1,the_size ! create the seed seed(j)=abs(myid*10)+(j*myid*myid)+100 ! abs is generic enddo call random_seed(put=seed) ! assign the seed deallocate(seed) end subroutine

36 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE #include #include | #include /*! This program shows how to use MPI_Alltoall. Each processor ! send/rec a different random number to/from other processors. */ /* globals */ int numnodes,myid,mpi_err; #define mpi_root 0 /* end module */ void init_it(int *argc, char ***argv); void seed_random(int id); void random_number(float *z); void init_it(int *argc, char ***argv) { mpi_err = MPI_Init(argc,argv); mpi_err = MPI_Comm_size( MPI_COMM_WORLD, &numnodes ); mpi_err = MPI_Comm_rank(MPI_COMM_WORLD, &myid); }

37 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE int main(int argc,char *argv[]){ int *sray,*rray; int *scounts,*rcounts; int ssize,rsize,i,k,j; float z; init_it(&argc,&argv); scounts=(int*)malloc(sizeof(int)*numnodes); rcounts=(int*)malloc(sizeof(int)*numnodes); /*! seed the random number generator with a ! different number on each processor*/ seed_random(myid); /* find data to send */ for(i=0;i<numnodes;i++){ random_number(&z); scounts[i]=(int)(10.0*z)+1; } printf("myid= %d scounts=",myid); for(i=0;i<numnodes;i++) printf("%d ",scounts[i]); printf("\n");

38 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE /* send the data */ mpi_err = MPI_Alltoall( scounts,1,MPI_INT, rcounts,1,MPI_INT, MPI_COMM_WORLD); printf("myid= %d rcounts=",myid); for(i=0;i<numnodes;i++) printf("%d ",rcounts[i]); printf("\n"); mpi_err = MPI_Finalize();} void seed_random(int id){ srand((unsigned int)id);} void random_number(float *z){ int i; i=rand(); *z=(float)i/32767; }

39 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Output of previous fortran code on 4 procs ultra:/work/majumdar/examples/mpi % bsub -q hpc -m ultra -I -n 4 a.out Job is submitted to queue. > myid= 1 scounts= 5 2 4 4 myid= 1 rcounts= 1 2 2 3 myid= 2 scounts= 9 2 2 6 myid= 2 rcounts= 7 4 2 11 myid= 3 scounts= 3 3 11 8 myid= 3 rcounts= 2 4 6 8 myid= 0 scounts= 11 1 7 2 myid= 0 rcounts= 11 5 9 3 -------------------------------------------- 11 1 7 2 11 5 9 3 5 2 4 4 1 2 2 3 9 2 2 6 7 4 2 11 3 3 11 8 2 4 6 8

40 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE The variable or “V” operators A collection of very powerful but difficult to setup global communication routines MPI_Gatherv: Gather different amounts of data from each processor to the root processor MPI_Alltoallv: Send and receive different amounts of data form all processors MPI_Allgatherv: Gather different amounts of data from each processor and send all data to each MPI_Scatterv: Send different amounts of data to each processor from the root processor We discuss MPI_Gatherv and MPI_Alltoallv

41 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_Gatherv C –int MPI_Gatherv (&sendbuf, sendcnts, sendtype, &recvbuf, &recvcnts, &rdispls,recvtype, comm); Fortran –MPI_Gatherv (sendbuf, sendcnts, sendtype, recvbuf, recvcnts, rdispls, recvtype, comm, ierror) Parameters: –Recvcnts is now an array –Rdispls is a displacement ·See attached codes

42 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI GatherV rank 0 = root rank 1rank 2 123 sendbuf23 sendbuf3 sendbuf Recvcounts(0)10 = displs(0) Recvcounts(1)21 = displs(1)2 Recvcounts(2)33 = displs(2) 34 35 Recvbuf

43 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI Gatherv code Sample program: include ‘mpif.h’ integer isend(3), irecv(3) integer ircnt(0:2), idisp(0:2) data icrnt/1,2,3/ idisp/0,1,3/ call mpi_init(ierr) call mpi_comm_size(MPI_COMM_WORLD, nprocs,ierr) call mpi_comm_rank(MPI_COMM_WORLD,myrank,ierr) do I = 1,myrank+1 isend(I) = myrank+1 enddo iscnt = myrank + 1 call MPI_GATHERV(isend,iscnt,MPI_INTEGER,irecv,ircnt,idisp,MPI_INTEGER &0,MPI_COMM_WORLD, ierr) if (myrank.eq. 0) then print *, ‘irecv =‘, irecv endif call MPI_FINALIZE(ierr) end Sample execution: % bsub –q hpc –m ultra –I –n 3./a.out % 0: irecv = 1 2 2 3 3 3

44 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_Alltoallv Send and receive different amounts of data form all processors C –int MPI_Alltoallv (&sendbuf, &sendcnts, &sdispls, sendtype, &recvbuf, &recvcnts, &rdispls, recvtype, comm ); Fortran –Call MPI_Alltoallv(sendbuf, sendcnts, sdispls, sendtype, recvbuf, recvcnts, rdispls,recvtype, comm,ierror); See attached code

45 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI ALLTOALLV rank0rank1rank2 Sendcounts(0)1470=sdispls(0) Sendcounts(1)2581=sdispls(1) 2582 Sendcounts(3)3693=sdispls(2) 3694 3695sendbuf Recvcounts(0)1230=rdispls(0) Recvcounts(1)4231 Recvcounts(3)7532 recvbuf563=rdispls(1) 864 865 recvbuf96=rdispls(2) 97 98

46 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI ALLTOALLV proc ircnt0 1 2 01 23 1 1 2 3 2 1 2 3 proc irdsp 0 1 2 0 0 0 0 1 1 2 3 2 2 4 6

47 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI ALLTOALLV Program alltoallv include ‘mpif.h’ integer isend(6), irecv(9) integer iscnt(0:2), isdsp(0:2), ircnt(0), irdsp(0:2) data isend/1,2,2,3,3,3/ data iscnt/1,2,3/ isdsp/0,1,3/ call MPI_INIT(ierr) call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs, ierr) call MPI_COMM_RANK(MIP_COMM_WORLD,myrank, ierr) do i = 1,6 isend(i) = isend(i) + nprocs*myrank enddo do i = 0, nprocs – 1 ircnt(i) = myrank + 1 irdsp(i) = i* (myrank + 1) enddo print*, ‘isend=‘, isend call MP_FLUSH(1) call MPI_ALLTOALLV(isend,iscnt,isdsp,MPI_INTEGER,irecv, ircnt, irdsp,MPI_INTEGER, MPI_COMM_WORLD, ierr) print*, ‘irecv=‘,irecv call MPI_FINALIZE(ierr) end

48 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI ALLTOALLV Sample execution of mpialltoallv program: % bsub –q hpc –m ultra –I –n 3 % 0: isend = 1 2 2 3 3 3 1: isend = 4 5 5 6 6 6 2: isend = 7 8 8 9 9 9 0: irecv = 1 4 7 0 0 0 0 0 0 1: irecv = 2 2 5 5 8 8 0 0 0 2: irecv = 3 3 3 6 6 6 9 9 9

49 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Derived types C and Fortran 90 have the ability to define arbitrary data types that encapsulate reals, integers, and characters. MPI allows you to define message data types corresponding to your data types Can use these data types just as default types

50 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Derived types, Three main classifications: Contiguous Vectors: enable you to send contiguous blocks of the same type of data lumped together Noncontiguous Vectors: enable you to send noncontiguous blocks of the same type of data lumped together Abstract types: enable you to (carefully) send C or Fortran 90 structures, don't send pointers

51 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Derived types, how to use them Three step process –Define the type using MPI_TYPE_CONTIGUOUS for contiguous vectors MPI_TYPE_VECTOR for noncontiguous vectors MPI_TYPE_STRUCT for structures –Commit the type using MPI_TYPE_COMMIT –Use in normal communication calls MPI_Send(buffer, count, MY_TYPE, destination,tag, MPI_COMM_WORLD, ierr)

52 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_TYPE_CONTIGUOUS Defines a new data type of length count elements from your old data type C –MPI_TYPE_CONTIGUOUS(int count, old_type, &new_type) Fortran –Call MPI_TYPE_CONTIGUOUS(count, old_type, new_type, ierror) Parameters –Old_type: your base type –New_type: a type count elements of Old_type See attached codes

53 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_TYPE_CONTIGUOUS Sample program: program type_contiguous include ‘mpif.h’ integer ibuf(20) call MPI_INIT(ierr) call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs,ierr) call MPI_COMM_RANK(MPI_COMM_WORLD,myrank,ierr) if (myrank.eq. 0) then do i = 1,20 ibuf(i) = I enddo endif call MPI_TYPE_CONTIGUOUS(3,MPI_INTEGER,inewtype, ierr) call MPI_TYPE_COMMIT(inewtype, ierr) call MPI_BCAST(ibuf,3,inewtype,0,MPI_COMM_WORLD, ierr) print*, ‘ibuf=‘,ibuf call MPI_FINALIZE(ierr) end Sample execution: % bsub –q hpc –m ultra –I in 2 a.out % 0 : ibuf =1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1: ibuf = 1 2 3 4 5 6 7 8 9 0 0 0 0 0 0 0 0 0 0 0

54 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_TYPE_VECTOR Defines a datatype which consists of count blocks each of length blocklength and stride displacement between blocks C –MPI_TYPE_VECTOR(count, blocklength, stride, old_type, *new_type) Fortran –Call MPI_TYPE_VECTOR(count, blocklength, stride, old_type, new_type, ierror) See attached codes

55 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE module mpi !DEC$ NOFREEFORM include "mpif.h" !DEC$ FREEFORM end module !Shows how to use MPI_Type_vector to send noncontiguous blocks of data !and MPI_Get_count and MPI_Get_elements to see the number of elements sent program do_vect use mpi ! include "mpif.h" integer, parameter :: size=24 integer myid, ierr,numprocs real*8 svect(0:size),rvect(0:size) integer i,bonk1,bonk2,numx,stride,extent integer MY_TYPE integer status(MPI_STATUS_SIZE) call MPI_INIT( ierr ) call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr ) call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr ) stride=5 numx=(size+1)/stride

56 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE extent = 1 if(myid == 1)write(*,*)"numx=",numx," extent=",extent," stride=",stride call MPI_Type_vector(numx,extent,stride,MPI_DOUBLE_PRECISION,MY_TYPE,ier r) call MPI_Type_commit(MY_TYPE, ierr ) if(myid == 0)then do i=0,size svect(i)=i enddo call MPI_Send(svect,1,MY_TYPE,1,100,MPI_COMM_WORLD,ierr) endif if(myid == 1)then do i=0,size rvect(i)=-1 enddo call MPI_Recv(rvect,1,MY_TYPE,0,100,MPI_COMM_WORLD,status,ierr) endif if(myid == 1)then call MPI_Get_count(status,MY_TYPE,bonk1, ierr ) call MPI_Get_elements(status,MPI_DOUBLE_PRECISION,bonk2,ierr)

57 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE write(*,*)"got ", bonk1," elements of type MY_TYPE" write(*,*)"which contained ", bonk2," elements of type MPI_DOUBLE_PREC ISION" do i=0,size if(rvect(i) /= -1)write(*,'(i2,f4.0)')i,rvect(i) enddo endif call MPI_Finalize(ierr ) end program ! output ! numx= 5 extent= 1 stride= 5 ! got 1 elements of type MY_TYPE ! which contained 5 elements of type MPI_DOUBLE_PRECISION ! 0 0. ! 5 5. ! 10 10. ! 15 15. ! 20 20.

58 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_TYPE_STRUCT Defines a MPI datatype which maps to a user defined derived datatype C –int MPI_TYPE_STRUCT(count, &array_of_blocklengths, &array_of_displacement, &array_of_types, &newtype); Fortran –Call MPI_TYPE_STRUCT(count, array_of_blocklengths, array_of_displacement, array_of_types, newtype,ierror)

59 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_TYPE_STRUCT Parameters: –[IN count] # of old types in the new type (integer) –[IN array_of_blocklengths] how many of each type in new structure (integer) –[IN array_of_types] types in new structure (integer) –[IN array_of_displacement] offset in bytes for the beginning of each group of types (integer) –[OUT newtype] new datatype (handle) –Call MPI_TYPE_STRUCT(count, array_of_blocklengths, array_of_displacement,array_of_types, newtype,ierror)

60 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Derived Data type Example Consider the data type or structure consisting of 3 MPI_DOUBLE_PRECISION 10 MPI_INTEGER 2 MPI_LOGICAL Creating the MPI data structure matching this C/Fortran structure is a three step process Fill the descriptor arrays: B - blocklengths T - types D - displacements Call MPI_TYPE_STRUCT to create the MPI data structure Commit the new data type using MPI_TYPE_COMMIT

61 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Derived Data type Example Consider the data type or structure consisting of –3 MPI_DOUBLE_PRECISION –10 MPI_INTEGER –2 MPI_LOGICAL To create the MPI data structure matching this C/Fortran structure –Fill the descriptor arrays: B - blocklengths T - types D - displacements –Call MPI_TYPE_STRUCT

62 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Derived Data type Example (continued) ! t contains the types that ! make up the structure t(1)=MPI_DOUBLE_PRECISION t(2)=MPI_INTEGER t(3)=MPI_LOGICAL ! b contains the number of each type b(1)=3;b(2)=10;b(3)=2 ! d contains the byte offset of ! the start of each type d(1)=0;d(2)=24;d(3)=64 call MPI_TYPE_STRUCT(3,b,d,t, MPI_CHARLES,mpi_err) MPI_CHARLES is our new data type

63 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_Type_commit Before we use the new data type we call MPI_Type_commit C –MPI_Type_commit(MPI_CHARLES) Fortran –Call MPI_Type_commit(MPI_CHARLES,ierr)

64 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Communicators A communicator is a parameter in all MPI message passing routines A communicator is a collection of processors that can engage in communication MPI_COMM_WORLD is the default communicator that consists of all processors MPI allows you to create subsets of communicators

65 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE Why Communicators? Isolate communication to a small number of processors Useful for creating libraries Different processors can work on different parts of the problem Useful for communicating with "nearest neighbors"

66 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_Comm_split Provides a short cut method to create a collection of communicators All processors with the "same color" will be in the same communicator Index gives rank in new communicator Fortran –call MPI_COMM_SPLIT(OLD_COMM, color, index, NEW_COMM, mpi_err) C –MPI_Comm_split(OLD_COMM, color, index, &NEW_COMM)

67 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_Comm_split Split odd and even processors into 2 communicators Program comm_split include "mpif.h" Integer color,zero_one call MPI_INIT( mpi_err ) call MPI_COMM_SIZE( MPI_COMM_WORLD, numnodes, mpi_err ) call MPI_COMM_RANK( MPI_COMM_WORLD, myid, mpi_err ) color=mod(myid,2) !color is either 1 or 0 call MPI_COMM_SPLIT(MPI_COMM_WORLD,color,myid,NEW_COMM,mpi_err) call MPI_COMM_RANK( NEW_COMM, new_id, mpi_err ) call MPI_COMM_SIZE( NEW_COMM, new_nodes, mpi_err ) Zero_one = -1 If(new_id==0)Zero_one = color Call MPI_Bcast(Zero_one,1,MPI_INTEGER,0, NEW_COMM,mpi_err) If(zero_one==0)write(*,*)"part of even processor communicator" If(zero_one==1)write(*,*)"part of odd processor communicator" Write(*,*)"old_id=", myid, "new_id=", new_id Call MPI_FINALIZE(mpi_error) End program

68 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI_Comm_split Split odd and even processors into 2 communicators 0: part of even processor communicator 0: old_id= 0 new_id= 0 2: part of even processor communicator 2: old_id= 2 new_id= 1 1: part of odd processor communicator 1: old_id= 1 new_id= 0 3: part of odd processor communicator 3: old_id= 3 new_id= 1

69 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE #include "mpi.h" #include int main(argc,argv) int argc; char *argv[]; { int myid, numprocs; int color,Zero_one,new_id,new_nodes; MPI_Comm NEW_COMM; MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD,&numprocs); MPI_Comm_rank(MPI_COMM_WORLD,&myid); color=myid % 2; MPI_Comm_split(MPI_COMM_WORLD,color,myid,&NEW_COMM); MPI_Comm_rank( NEW_COMM, &new_id); MPI_Comm_size( NEW_COMM, &new_nodes); Zero_one = -1; if(new_id==0)Zero_one = color; MPI_Bcast(&Zero_one,1,MPI_INT,0, NEW_COMM); if(Zero_one==0)printf("part of even processor communicator \n"); if(Zero_one==1)printf("part of odd processor communicator \n"); printf("old_id= %d new_id= %d\n", myid, new_id); MPI_Finalize(); }

1 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI Part III NPACI Parallel Computing Institute.

Similar presentations

Presentation on theme: "1 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI Part III NPACI Parallel Computing Institute."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI Part III NPACI Parallel Computing Institute.

Similar presentations

Presentation on theme: "1 S an D IEGO S UPERCOMPUTER C ENTER N ATIONAL P ARTNERSHIP FOR A DVANCED C OMPUTATIONAL I NFRASTRUCTURE MPI Part III NPACI Parallel Computing Institute."— Presentation transcript:

Similar presentations

About project

Feedback