Parallel Systems Dr. Guy Tel-Zur
Agenda
– Barnes-Hut (final remarks)
– Continue slide set 5 from the previous lecture
– MPI Virtual Topologies
– ScaLAPACK
– Mixing programming languages
– Impressions from the SC12 conference
– Home Assignment #2
– Please think hard and send me your proposals for the final presentation!!!
A simple Jacobi iteration
Analyze the program from: http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiexmpl/src/jacobi/C/main.html
In this example, you will put together some of the previous examples to implement a simple Jacobi iteration for approximating the solution to a linear system of equations. In this example, we solve the Laplace equation in two dimensions with finite differences. This may sound involved, but it really amounts only to a simple computation, combined with the previous example of a parallel mesh data structure. Any numerical analysis text will show that iterating

while (not converged) {
    for (i,j)
        xnew[i][j] = (x[i+1][j] + x[i-1][j] + x[i][j+1] + x[i][j-1]) / 4;
    for (i,j)
        x[i][j] = xnew[i][j];
}
will compute an approximation for the solution of Laplace's equation. There is one last detail; this replacement of xnew with the average of the values around it is applied only in the interior; the boundary values are left fixed. In practice, this means that if the mesh is n by n, then the values

x[0][j]   x[n-1][j]   x[i][0]   x[i][n-1]

are left unchanged. Of course, these refer to the complete mesh; you'll have to figure out what to do for the decomposed data structures (xlocal). Because the values are replaced by averaging around them, these techniques are called relaxation methods. We wish to compute this approximation in parallel. Write a program to apply this approximation. For convergence testing, compute

diffnorm = 0;
for (i,j)
    diffnorm += (xnew[i][j] - x[i][j]) * (xnew[i][j] - x[i][j]);
diffnorm = sqrt(diffnorm);
You'll need to use MPI_Allreduce for this. (Why not use MPI_Reduce?) Have process zero write out the value of diffnorm and the iteration count at each iteration. When diffnorm is less than 1.0e-2, consider the iteration converged. Also, if you reach 100 iterations, exit the loop.
For simplicity, consider a 12 x 12 mesh on 4 processors. The example solution uses the boundary values from the previous exercise; they are -1 on the top and bottom, and the rank of the process on the side. The initial data (the values of x that are being relaxed) are also the same; the interior points have the same value as the rank of the process. This is shown below (condensed; each line stands for the corresponding rows of the 12-column mesh):

-1 -1 -1 ... -1    (top boundary row)
 3  3  3 ...  3    (rows owned by rank 3)
 2  2  2 ...  2    (rows owned by rank 2)
 1  1  1 ...  1    (rows owned by rank 1)
 0  0  0 ...  0    (rows owned by rank 0)
-1 -1 -1 ... -1    (bottom boundary row)
#include "mpi.h" /* This example handles a 12 x 12 mesh, on 4 processors only. */ #define maxn 12 int main( argc, argv ) int argc; char **argv; { int rank, value, size, errcnt, toterr, i, j, itcnt; int i_first, i_last; MPI_Status status; double diffnorm, gdiffnorm; double xlocal[(12/4)+2][12]; double xnew[(12/3)+2][12]; MPI_Init( &argc, &argv ); MPI_Comm_rank( MPI_COMM_WORLD, &rank ); MPI_Comm_size( MPI_COMM_WORLD, &size ); if (size != 4) MPI_Abort( MPI_COMM_WORLD, 1 );
    /* xlocal[0][] holds the lower ghost row, xlocal[maxn/size+1][] the upper */
    /* Note that top and bottom processes have one less row of interior points */
    i_first = 1;
    i_last  = maxn/size;
    if (rank == 0)        i_first++;
    if (rank == size - 1) i_last--;

    /* Fill the data as specified */
    for (i=1; i<=maxn/size; i++)
        for (j=0; j<maxn; j++)
            xlocal[i][j] = rank;
    for (j=0; j<maxn; j++) {
        xlocal[i_first-1][j] = -1;
        xlocal[i_last+1][j]  = -1;
    }
    itcnt = 0;
    do {
        /* Send up unless I'm at the top, then receive from below */
        /* Note the use of xlocal[i] for &xlocal[i][0] */
        if (rank < size - 1)
            MPI_Send( xlocal[maxn/size], maxn, MPI_DOUBLE, rank + 1, 0,
                      MPI_COMM_WORLD );
        if (rank > 0)
            MPI_Recv( xlocal[0], maxn, MPI_DOUBLE, rank - 1, 0,
                      MPI_COMM_WORLD, &status );
        /* Send down unless I'm at the bottom */
        if (rank > 0)
            MPI_Send( xlocal[1], maxn, MPI_DOUBLE, rank - 1, 1,
                      MPI_COMM_WORLD );
        if (rank < size - 1)
            MPI_Recv( xlocal[maxn/size+1], maxn, MPI_DOUBLE, rank + 1, 1,
                      MPI_COMM_WORLD, &status );
        /* Compute new values (but not on boundary) */
        itcnt++;
        diffnorm = 0.0;
        for (i=i_first; i<=i_last; i++)
            for (j=1; j<maxn-1; j++) {
                xnew[i][j] = (xlocal[i][j+1] + xlocal[i][j-1] +
                              xlocal[i+1][j] + xlocal[i-1][j]) / 4.0;
                diffnorm += (xnew[i][j] - xlocal[i][j]) *
                            (xnew[i][j] - xlocal[i][j]);
            }
        /* Only transfer the interior points */
        for (i=i_first; i<=i_last; i++)
            for (j=1; j<maxn-1; j++)
                xlocal[i][j] = xnew[i][j];

        MPI_Allreduce( &diffnorm, &gdiffnorm, 1, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD );
        gdiffnorm = sqrt( gdiffnorm );
        if (rank == 0)
            printf( "At iteration %d, diff is %e\n", itcnt, gdiffnorm );
    } while (gdiffnorm > 1.0e-2 && itcnt < 100);

    MPI_Finalize( );
    return 0;
}
The Makefile

# Generated automatically from Makefile.in by configure.
ALL: jacobi
SHELL = /bin/sh
DIRS =

jacobi: jacobi.c
	mpicc -o jacobi jacobi.c -lm

profile.alog: jacobi.c
	mpicc -o jacobi.log -mpilog jacobi.c -lm
	mpirun -np 4 jacobi.log
	/bin/mv jacobi.log_profile.log profile.alog

clean:
	/bin/rm -f jacobi jacobi.o jacobi.log
	#for dir in $(DIRS) ; do \
	#    ( cd $$dir ; make clean ) ; done
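For reference, a typical session with this Makefile (assuming an MPICH-style mpicc and mpirun are on the PATH) would be:

make jacobi
mpirun -np 4 ./jacobi
make clean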
MPI Virtual Topologies
http://www.pdc.kth.se/education/historical/previous-years-summer-schools/2009/handouts/lect5.pdf
Cartesian Constructor

int MPI_Cart_create( MPI_Comm comm_old, int ndims, int *dims, int *periods,
                     int reorder, MPI_Comm *comm_cart )

Parameter – Meaning of Parameter
comm_old  – input communicator (handle)
ndims     – number of dimensions of cartesian grid (integer)
dims      – integer array of size ndims specifying the number of processes in each dimension
periods   – logical array of size ndims specifying whether the grid is periodic (true) or not (false) in each dimension
reorder   – ranking may be reordered (true) or not (false) (logical)
comm_cart – communicator with new cartesian topology (handle)
Cartesian Convenience - MPI_Dims_create

int MPI_Dims_create( int nnodes, int ndims, int *dims )

Parameter – Meaning of Parameter
nnodes – number of nodes in a grid (integer)
ndims  – number of cartesian dimensions (integer)
dims   – integer array of size ndims specifying the number of nodes in each dimension

MPI_Dims_create creates a division of processors in a cartesian grid.
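The two routines are typically used together: MPI_Dims_create picks a balanced grid shape for whatever number of processes the job was started with, and MPI_Cart_create then builds the Cartesian communicator. The following is a minimal, self-contained sketch of that pattern (not from the lecture):

#include <stdio.h>
#include "mpi.h"

int main( int argc, char **argv )
{
    int      size, rank, ndims = 2;
    int      dims[2]    = {0, 0};   /* zeros let MPI_Dims_create choose both factors */
    int      periods[2] = {0, 0};   /* non-periodic in both dimensions */
    int      coords[2];
    MPI_Comm cartcomm;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );

    /* let MPI pick a balanced 2-D decomposition of 'size' processes */
    MPI_Dims_create( size, ndims, dims );

    /* build the Cartesian communicator and find this process's coordinates */
    MPI_Cart_create( MPI_COMM_WORLD, ndims, dims, periods, 1, &cartcomm );
    MPI_Comm_rank( cartcomm, &rank );
    MPI_Cart_coords( cartcomm, rank, ndims, coords );

    printf( "rank %d of %d: grid %d x %d, coords (%d,%d)\n",
            rank, size, dims[0], dims[1], coords[0], coords[1] );

    MPI_Comm_free( &cartcomm );
    MPI_Finalize( );
    return 0;
}

This can be run with any number of processes; MPI_Dims_create factors the process count into as balanced a grid as possible.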
See more: http://www.rc.usf.edu/tutorials/classes/tutorial/mpi/chapter10.html
#include "mpi.h" #include #define SIZE 16 #define UP 0 #define DOWN 1 #define LEFT 2 #define RIGHT 3 int main(argc,argv) int argc; char *argv[]; { int numtasks, rank, source, dest, outbuf, i, tag=1, inbuf[4]={MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,MPI_PROC_NULL,}, nbrs[4], dims[2]={4,4}, periods[2]={0,0}, reorder=0, coords[2]; MPI_Request reqs[8]; MPI_Status stats[8]; MPI_Comm cartcomm; MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD, &numtasks); An Example
    if (numtasks == SIZE) {
        MPI_Cart_create( MPI_COMM_WORLD, 2, dims, periods, reorder, &cartcomm );
        MPI_Comm_rank( cartcomm, &rank );
        MPI_Cart_coords( cartcomm, rank, 2, coords );
        MPI_Cart_shift( cartcomm, 0, 1, &nbrs[UP],   &nbrs[DOWN] );
        MPI_Cart_shift( cartcomm, 1, 1, &nbrs[LEFT], &nbrs[RIGHT] );

        printf( "rank= %d coords= %d %d neighbors(u,d,l,r)= %d %d %d %d\n",
                rank, coords[0], coords[1],
                nbrs[UP], nbrs[DOWN], nbrs[LEFT], nbrs[RIGHT] );

        outbuf = rank;
        for (i = 0; i < 4; i++) {
            dest   = nbrs[i];
            source = nbrs[i];
            MPI_Isend( &outbuf,   1, MPI_INT, dest,   tag, MPI_COMM_WORLD, &reqs[i] );
            MPI_Irecv( &inbuf[i], 1, MPI_INT, source, tag, MPI_COMM_WORLD, &reqs[i+4] );
        }
        MPI_Waitall( 8, reqs, stats );

        printf( "rank= %d inbuf(u,d,l,r)= %d %d %d %d\n",
                rank, inbuf[UP], inbuf[DOWN], inbuf[LEFT], inbuf[RIGHT] );
    }
    else
        printf( "Must specify %d processors. Terminating.\n", SIZE );

    MPI_Finalize( );
    return 0;
}
Jacobi Iteration using MPI virtual topologies
Reference: Intermediate MPI, from the “Using MPI” book
http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples/intermediate/main.htm
Similar: an MPI parallelization of the solution of the 2D Poisson's equation
http://www.ee.bgu.ac.il/~tel-zur/2003B/lecture10.pdf
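To make the connection concrete, here is a minimal, self-contained sketch (not taken from the book or the lecture notes) of how the ghost-row exchange of the earlier Jacobi code looks on a 1-D Cartesian topology. MPI_Cart_shift returns the up/down neighbours, and at the non-periodic ends it returns MPI_PROC_NULL, so the corresponding sends and receives become no-ops and no explicit edge tests are needed. As in the example above, it assumes exactly 4 processes and a 12-column mesh.

#include <stdio.h>
#include "mpi.h"
#define maxn 12

int main( int argc, char **argv )
{
    int        size, rank, up, down, i, j;
    int        dims[1], periods[1] = {0};
    double     xlocal[(maxn/4)+2][maxn];
    MPI_Comm   comm1d;
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_size( MPI_COMM_WORLD, &size );
    if (size != 4) MPI_Abort( MPI_COMM_WORLD, 1 );

    /* 1-D decomposition: one Cartesian dimension holding all processes */
    dims[0] = size;
    MPI_Cart_create( MPI_COMM_WORLD, 1, dims, periods, 1, &comm1d );
    MPI_Comm_rank( comm1d, &rank );
    MPI_Cart_shift( comm1d, 0, 1, &down, &up );   /* down = rank-1, up = rank+1 */

    /* fill local rows (including ghost rows) with this process's rank */
    for (i = 0; i < maxn/size + 2; i++)
        for (j = 0; j < maxn; j++)
            xlocal[i][j] = rank;

    /* ghost-row exchange: MPI_PROC_NULL at the ends makes the edge cases vanish */
    MPI_Sendrecv( xlocal[maxn/size],   maxn, MPI_DOUBLE, up,   0,
                  xlocal[0],           maxn, MPI_DOUBLE, down, 0,
                  comm1d, &status );
    MPI_Sendrecv( xlocal[1],           maxn, MPI_DOUBLE, down, 1,
                  xlocal[maxn/size+1], maxn, MPI_DOUBLE, up,   1,
                  comm1d, &status );

    printf( "rank %d exchanged ghost rows with %d (down) and %d (up)\n",
            rank, down, up );

    MPI_Comm_free( &comm1d );
    MPI_Finalize( );
    return 0;
}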
ScaLAPACK
Hands-On Exercises for ScaLAPACK
http://acts.nersc.gov/scalapack/hands-on/
PBLAS Quick Reference Card
http://www.netlib.org/scalapack/pblas_qref.html
http://www.netlib.org/scalapack/
C interface to BLACS
http://www.netlib.org/blacs/cblacsqref.ps
See next slide
BLACS
http://netlib.org/scalapack/slug/node184.html
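As a concrete illustration of the C interface, below is a minimal, self-contained sketch (not from the lecture) of the BLACS initialization that precedes essentially every ScaLAPACK call. It assumes the job is started on exactly 4 MPI processes so that a 2 x 2 grid can be formed, and declares the usual Cblacs_* prototypes by hand since a header is not always provided.

#include <stdio.h>
#include "mpi.h"

/* assumed prototypes for the C-interface BLACS routines */
extern void Cblacs_pinfo( int *mypnum, int *nprocs );
extern void Cblacs_get( int icontxt, int what, int *val );
extern void Cblacs_gridinit( int *icontxt, char *order, int nprow, int npcol );
extern void Cblacs_gridinfo( int icontxt, int *nprow, int *npcol,
                             int *myrow, int *mycol );
extern void Cblacs_gridexit( int icontxt );

int main( int argc, char **argv )
{
    int iam, nprocs, ictxt, nprow = 2, npcol = 2, myrow, mycol;

    MPI_Init( &argc, &argv );
    Cblacs_pinfo( &iam, &nprocs );                    /* my BLACS id and process count */
    if (nprocs != nprow * npcol) MPI_Abort( MPI_COMM_WORLD, 1 );

    Cblacs_get( -1, 0, &ictxt );                      /* default system context */
    Cblacs_gridinit( &ictxt, "Row", nprow, npcol );   /* 2 x 2, row-major grid */
    Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );

    printf( "process %d of %d sits at grid position (%d,%d)\n",
            iam, nprocs, myrow, mycol );

    /* ...descriptor setup and ScaLAPACK/PBLAS calls would go here... */

    Cblacs_gridexit( ictxt );
    MPI_Finalize( );
    return 0;
}

The context ictxt created here is what array descriptors such as descA in the PDGEMV example below are tied to.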
Mixing C and FORTRAN
Demo on my other laptop
– ~/tests/cprog.c and ffunction.f
– http://www.cae.tntech.edu/help/programming/mixed_languages
– Important! Column-major order vs. Row-major order: C => Row, Fortran => Column
– Ref: http://en.wikipedia.org/wiki/Row-major_order
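As a small, self-contained illustration of that last point (this is not the cprog.c/ffunction.f demo, just a sketch): a C caller that hands a matrix to a Fortran or BLAS/ScaLAPACK routine normally keeps it in a flat array indexed column-first, because Fortran stores A(i,j) column by column while a C array a[i][j] is stored row by row.

#include <stdio.h>

#define N 3
/* column-major indexing of an N-row matrix, as a Fortran routine would see it */
#define IDX(i, j) ((i) + (j) * N)

int main( void )
{
    double a[N * N];
    int i, j;

    /* fill A so that A(i,j) = 10*i + j, stored column by column */
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            a[IDX(i, j)] = 10.0 * i + j;

    /* the linear memory order walks down each column in turn */
    printf( "memory order:" );
    for (i = 0; i < N * N; i++)
        printf( " %g", a[i] );
    printf( "\n" );

    /* printed as a matrix: fix the row index i and vary the column index j */
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++)
            printf( "%6g", a[IDX(i, j)] );
        printf( "\n" );
    }
    return 0;
}

The same convention is why the local pieces of a ScaLAPACK matrix are described by a column-major leading dimension (the LLD entry of the descriptor) in the PDGEMV example that follows.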
Example: PDGEMV()
The sample uses PDGEMV(), which computes a distributed matrix-vector product y = alpha*A*x + beta*y.
A = [1 4 7 10 13 ; 3 6 9 12 15 ; 5 8 11 14 17 ; 7 10 13 16 19 ; 9 12 15 18 21] and x = [1 ; 1 ; 0 ; 0 ; 1]T (x is a column vector; T = transpose).
Call the PDGEMV() routine to compute y = A*x. The correct result is y = [18 ; 24 ; 30 ; 36 ; 42]T.
Code sample in C: pdgemv.c. The function call in C looks like:

double alpha = 1.0;
double beta  = 0.0;
pdgemv_("N", &M, &M, &alpha, A, &ONE, &ONE, descA,
        x, &ONE, &ONE, descx, &ONE,
        &beta, y, &ONE, &ONE, descy, &ONE);
PDGEMV()
Check pdgemv_.c here:
http://www.netlib.org/scalapack/explore-html/da/dca/pdgemv___8c_source.html
http://www.cs.utexas.edu/~plapack/Code/ScaLAPACK/PBLAS/pdgemv_.c
Purpose
=======
PDGEMV performs one of the matrix-vector operations

    sub( Y ) := alpha*sub( A )*sub( X ) + beta*sub( Y ),   or
    sub( Y ) := alpha*sub( A )'*sub( X ) + beta*sub( Y ),

where sub( A ) denotes A(IA:IA+M-1, JA:JA+N-1).

When TRANS = 'N',
    sub( X ) denotes X(IX:IX, JX:JX+N-1)   if INCX = M_X,
                     X(IX:IX+N-1, JX:JX)   if INCX = 1 and INCX <> M_X,
and
    sub( Y ) denotes Y(IY:IY, JY:JY+M-1)   if INCY = M_Y,
                     Y(IY:IY+M-1, JY:JY)   if INCY = 1 and INCY <> M_Y;

otherwise
    sub( X ) denotes X(IX:IX, JX:JX+M-1)   if INCX = M_X,
                     X(IX:IX+M-1, JX:JX)   if INCX = 1 and INCX <> M_X,
and
    sub( Y ) denotes Y(IY:IY, JY:JY+N-1)   if INCY = M_Y,
                     Y(IY:IY+N-1, JY:JY)   if INCY = 1 and INCY <> M_Y.

Alpha and beta are scalars, and sub( X ) and sub( Y ) are subvectors and sub( A ) is an m by n submatrix.
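To show how the pieces above fit together, here is a hedged, self-contained sketch (an illustration only, not the lecture's pdgemv.c) of the typical call sequence for the 5 x 5 example: BLACS grid setup as in the earlier sketch, array descriptors built with numroc_ and descinit_, the distributed matrix and vector filled element by element with the ScaLAPACK TOOLS routines pdelset_/pdelget_, and the pdgemv_ call itself. It assumes exactly 4 processes (a 2 x 2 grid), block size NB = 2, and Fortran-style entry points with trailing underscores; the hand-written prototypes below are assumptions about the installed library, not code from the slides.

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

/* assumed C prototypes for the BLACS/ScaLAPACK routines used below */
extern void Cblacs_pinfo( int *mypnum, int *nprocs );
extern void Cblacs_get( int icontxt, int what, int *val );
extern void Cblacs_gridinit( int *icontxt, char *order, int nprow, int npcol );
extern void Cblacs_gridinfo( int icontxt, int *nprow, int *npcol, int *myrow, int *mycol );
extern void Cblacs_gridexit( int icontxt );
extern int  numroc_( int *n, int *nb, int *iproc, int *isrcproc, int *nprocs );
extern void descinit_( int *desc, int *m, int *n, int *mb, int *nb,
                       int *irsrc, int *icsrc, int *ictxt, int *lld, int *info );
extern void pdelset_( double *A, int *ia, int *ja, int *desc, double *alpha );
extern void pdelget_( char *scope, char *top, double *alpha,
                      double *A, int *ia, int *ja, int *desc );
extern void pdgemv_( char *trans, int *m, int *n, double *alpha,
                     double *A, int *ia, int *ja, int *descA,
                     double *X, int *ix, int *jx, int *descX, int *incx,
                     double *beta, double *Y, int *iy, int *jy, int *descY, int *incy );

int main( int argc, char **argv )
{
    int    iam, nprocs, ictxt, nprow = 2, npcol = 2, myrow, mycol;
    int    M = 5, NB = 2, IZERO = 0, IONE = 1, info, i, j;
    int    descA[9], descx[9], descy[9];
    double alpha = 1.0, beta = 0.0, v;
    double xval[5] = {1.0, 1.0, 0.0, 0.0, 1.0};

    MPI_Init( &argc, &argv );
    Cblacs_pinfo( &iam, &nprocs );
    if (nprocs != nprow * npcol) MPI_Abort( MPI_COMM_WORLD, 1 );
    Cblacs_get( -1, 0, &ictxt );
    Cblacs_gridinit( &ictxt, "Row", nprow, npcol );       /* 2 x 2 grid, as before */
    Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );

    /* local block-cyclic pieces: rows/columns owned by this process */
    int locAr = numroc_( &M, &NB, &myrow, &IZERO, &nprow );
    int locAc = numroc_( &M, &NB, &mycol, &IZERO, &npcol );
    int lld   = (locAr > 1) ? locAr : 1;

    double *A = calloc( (size_t)lld * locAc, sizeof(double) );
    double *x = calloc( (size_t)lld,          sizeof(double) );
    double *y = calloc( (size_t)lld,          sizeof(double) );

    descinit_( descA, &M, &M,    &NB, &NB,   &IZERO, &IZERO, &ictxt, &lld, &info );
    descinit_( descx, &M, &IONE, &NB, &IONE, &IZERO, &IZERO, &ictxt, &lld, &info );
    descinit_( descy, &M, &IONE, &NB, &IONE, &IZERO, &IZERO, &ictxt, &lld, &info );

    /* fill the global A(i,j) = 1 + 2(i-1) + 3(j-1) (the matrix above) and x;
       pdelset_ stores each entry only on the process that owns it */
    for (j = 1; j <= M; j++)
        for (i = 1; i <= M; i++) {
            v = 1.0 + 2.0 * (i - 1) + 3.0 * (j - 1);
            pdelset_( A, &i, &j, descA, &v );
        }
    for (i = 1; i <= M; i++)
        pdelset_( x, &i, &IONE, descx, &xval[i-1] );

    /* y := 1.0 * A * x + 0.0 * y */
    pdgemv_( "N", &M, &M, &alpha, A, &IONE, &IONE, descA,
             x, &IONE, &IONE, descx, &IONE,
             &beta, y, &IONE, &IONE, descy, &IONE );

    /* read the distributed result back; scope "A" broadcasts to all processes */
    for (i = 1; i <= M; i++) {
        pdelget_( "A", " ", &v, y, &i, &IONE, descy );
        if (iam == 0) printf( "y(%d) = %g\n", i, v );   /* expect 18 24 30 36 42 */
    }

    free( A ); free( x ); free( y );
    Cblacs_gridexit( ictxt );
    MPI_Finalize( );
    return 0;
}

How the program is linked depends on the installation; with a typical ScaLAPACK build it is something like mpicc plus -lscalapack and the local BLAS/LAPACK libraries.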