MPI Workshop - III: Cartesian Topologies in MPI and Passing Structures in MPI (Research Staff, Week 3 of 3)

Schedule
- Course Map
- Fix environments to run MPI codes
- Cartesian Topology
  - MPI_Cart_create
  - MPI_Cart_rank, MPI_Cart_coords
  - MPI_Cart_shift
  - MPI_Gatherv, MPI_Reduce, MPI_Sendrecv
- Passing Data Structures in MPI
  - MPI_Type_struct, MPI_Type_vector, MPI_Type_hvector

Course Map

Poisson Equation on a 2D Grid with periodic boundary conditions

Serial Poisson Solver
- F90 code
- N x N matrices for φ and ρ
- Initialize ρ
- Discretize the equation
- Iterate until convergence
- Output results
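For reference, a hedged reconstruction of the Jacobi update that the code on the "Serial Poisson Solver (cont)" slide implements, assuming the ρ stored here already absorbs the grid-spacing factor h²/4:

\phi^{\text{new}}_{i,j} = \rho_{i,j} + \tfrac{1}{4}\left(\phi^{\text{old}}_{i+1,j} + \phi^{\text{old}}_{i-1,j} + \phi^{\text{old}}_{i,j+1} + \phi^{\text{old}}_{i,j-1}\right)

with all indices taken modulo N for the periodic boundary conditions.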

Serial Poisson Solver: Solution

Serial Poisson Solver (cont)

do j = 1, M
  do i = 1, N                        !Fortran: access down columns first
    phi(i,j) = rho(i,j) + .25 * ( phi_old( modulo(i,N)+1,   j )   &
                                + phi_old( modulo(i-2,N)+1, j )   &
                                + phi_old( i, modulo(j,N)+1 )     &
                                + phi_old( i, modulo(j-2,N)+1 ) )
  enddo
enddo

Parallel Poisson Solver in MPI (figure: domain decomposition onto a 3 x 5 processor grid; processes are labeled by Cartesian coordinates (row, column), rows 0-2 and columns 0-4)

Parallel Poisson Solver in MPI (figure: the N x M global grid, N = M = 64, distributed over the 3 x 5 processor grid; each process owns a block of rows and columns)

Parallel Poisson Solver in MPI (figure: boundary data movement between neighboring processes of the 3 x 5 grid, performed each iteration)

Ghost Cells: Local Indices (figure: the local N_local x M_local block of process P(1,2), surrounded by ghost cells)

Data Movement (figure: e.g. Shift Right (East), the boundary data of P(i,j) fills the ghost cells of P(i+1,j))

Communicators and Topologies
- A communicator is a set of processors which can talk to each other
- The basic communicator is MPI_COMM_WORLD
- One can create new groups or subgroups of processors from MPI_COMM_WORLD or other communicators
- MPI allows one to associate a Cartesian or Graph topology with a communicator

MPI Cartesian Topology Functions
- MPI_CART_CREATE( old_comm, nmbr_of_dims, dim_sizes(), wrap_around(), reorder, cart_comm, ierr )
  - old_comm = MPI_COMM_WORLD
  - nmbr_of_dims = 2
  - dim_sizes() = (np_rows, np_cols) = (3, 5)
  - wrap_around() = (.true., .true.)
  - reorder = .false. (generally set to .true.)
    - allows the system to reorder the processor ranks for better performance
  - cart_comm = grid_comm (name for the new communicator)
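A minimal C sketch of the same call for the 3 x 5 periodic grid (the slide shows the Fortran binding; in C there is no ierr argument and the error code is the return value). The program is illustrative only.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int dim_sizes[2]   = {3, 5};   /* (np_rows, np_cols)               */
    int wrap_around[2] = {1, 1};   /* periodic in both directions      */
    int reorder        = 1;        /* let MPI renumber ranks if useful */
    MPI_Comm grid_comm;

    /* attach a 2-D Cartesian topology to MPI_COMM_WORLD */
    MPI_Cart_create(MPI_COMM_WORLD, 2, dim_sizes, wrap_around,
                    reorder, &grid_comm);

    MPI_Comm_free(&grid_comm);
    MPI_Finalize();
    return 0;
}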

MPI Cartesian Topology Functions
- MPI_CART_RANK( comm, coords(), rank, ierr )
  - comm = grid_comm
  - coords() = ( coords(1), coords(2) ), e.g. (0,2) for P(0,2)
  - rank = processor rank inside grid_comm
  - returns the rank of the processor with coordinates coords()
- MPI_CART_COORDS( comm, rank, nmbr_of_dims, coords(), ierr )
  - nmbr_of_dims = 2
  - returns the coordinates of the processor in grid_comm given its rank in grid_comm
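The C counterparts, as a hedged sketch; grid_comm is assumed to be the Cartesian communicator created above.

#include <mpi.h>

/* Convert between Cartesian coordinates and ranks inside grid_comm. */
void rank_coords_example(MPI_Comm grid_comm)
{
    int coords[2] = {0, 2};                      /* e.g. process P(0,2) */
    int rank;

    MPI_Cart_rank(grid_comm, coords, &rank);     /* coords -> rank      */
    MPI_Cart_coords(grid_comm, rank, 2, coords); /* rank   -> coords    */
}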

MPI Cartesian Topology Functions
- MPI_CART_SUB( grid_comm, free_coords(), sub_comm, ierr )
  - grid_comm = communicator with a topology
  - free_coords() = (.false., .true.) -> (i fixed, j varies), i.e. a row communicator
  - free_coords() = (.true., .false.) -> (i varies, j fixed), i.e. a column communicator
  - sub_comm = the new sub-communicator (say row_comm or col_comm)
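A hedged C sketch of the same split into row and column communicators:

#include <mpi.h>

void make_sub_comms(MPI_Comm grid_comm, MPI_Comm *row_comm, MPI_Comm *col_comm)
{
    int free_coords[2];

    free_coords[0] = 0;  free_coords[1] = 1;   /* i fixed, j varies: a row    */
    MPI_Cart_sub(grid_comm, free_coords, row_comm);

    free_coords[0] = 1;  free_coords[1] = 0;   /* i varies, j fixed: a column */
    MPI_Cart_sub(grid_comm, free_coords, col_comm);
}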

MPI Cartesian Topology Functions
- MPI_CART_SHIFT( grid_comm, direction, disp, rank_recv_from, rank_send_to, ierr )
  - grid_comm = communicator with a topology
  - direction = 0 -> i varies -> column shift; N or S
  - direction = 1 -> j varies -> row shift; E or W
  - disp = how many processors to shift over (+ or -)
  - e.g. N shift: direction = 0, disp = -1
         S shift: direction = 0, disp = +1
         E shift: direction = 1, disp = +1
         W shift: direction = 1, disp = -1

MPI Cartesian Topology Functions
- MPI_CART_SHIFT( grid_comm, direction, disp, rank_recv_from, rank_send_to, ierr )
  - MPI_CART_SHIFT does not actually perform any data transfer; it returns two ranks.
  - rank_recv_from: the rank of the processor from which the calling processor will receive the new data
  - rank_send_to: the rank of the processor to which data will be sent from the calling processor
- Note: MPI_CART_SHIFT does the modulo arithmetic if the corresponding dimension has wrap_around(*) = .true.
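A hedged C sketch of one such shift: MPI_Cart_shift only computes the two neighbor ranks, and MPI_Sendrecv then moves the boundary row (buf_send, buf_recv and count stand in for the real boundary buffers). The Fortran on the next slide does the same thing.

#include <mpi.h>

void north_shift(MPI_Comm grid_comm, double *buf_send, double *buf_recv, int count)
{
    int rank_recv_from, rank_send_to;
    MPI_Status status;

    /* direction 0 (i varies), displacement -1: data moves toward smaller i */
    MPI_Cart_shift(grid_comm, 0, -1, &rank_recv_from, &rank_send_to);

    MPI_Sendrecv(buf_send, count, MPI_DOUBLE, rank_send_to,   0,
                 buf_recv, count, MPI_DOUBLE, rank_recv_from, 0,
                 grid_comm, &status);
}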

Parallel Poisson Solver: N = upward shift in columns

!N or upward shift
!P(i+1,j) --(recv from)--> P(i,j) --(send to)--> P(i-1,j)
direction = 0                        !i varies
disp      = -1                       !i --> i-1
top_bottom_buffer = phi_old_local(1,:)

call MPI_CART_SHIFT( grid_comm, direction, disp, rank_recv_from, rank_send_to, ierr )

call MPI_SENDRECV( top_bottom_buffer,  M_local+1, MPI_DOUBLE_PRECISION, rank_send_to,   tag, &
                   bottom_ghost_cells, M_local+1, MPI_DOUBLE_PRECISION, rank_recv_from, tag, &
                   grid_comm, status, ierr )

phi_old_local(N_local+1,:) = bottom_ghost_cells

Parallel Poisson Solver: Main computation

do j = 1, M_local
  do i = 1, N_local
    phi(i,j) = rho(i,j) + .25 * ( phi_old( i+1, j ) + phi_old( i-1, j )   &
                                + phi_old( i, j+1 ) + phi_old( i, j-1 ) )
  enddo
enddo

Note: all indices are now within range because of the ghost cells.

Parallel Poisson Solver: Global vs. Local Indices

i_offset = 0
do i = 1, coord(1)
  i_offset = i_offset + nmbr_local_rows(i)
enddo

j_offset = 0
do j = 1, coord(2)
  j_offset = j_offset + nmbr_local_cols(j)
enddo

do j = j_offset+1, j_offset + M_local         !global indices
  y = (real(j)-.5)/M*Ly - Ly/2
  do i = i_offset+1, i_offset + N_local       !global indices
    x = (real(i)-.5)/N*Lx
    makerho_local(i-i_offset, j-j_offset) = f(x,y)   !store with local indices
  enddo
enddo
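The same offsets can also be computed without storing every block size; a hedged C sketch using MPI_Exscan over the column and row sub-communicators (col_comm, row_comm, N_local and M_local are assumed from the earlier sketches; this is an alternative to the slide's loop, not the workshop code itself).

#include <mpi.h>

void block_offsets(MPI_Comm col_comm, MPI_Comm row_comm,
                   int N_local, int M_local, int *i_offset, int *j_offset)
{
    int col_rank, row_rank;
    MPI_Comm_rank(col_comm, &col_rank);
    MPI_Comm_rank(row_comm, &row_rank);

    /* prefix sum of the block sizes owned by the processes before this one */
    MPI_Exscan(&N_local, i_offset, 1, MPI_INT, MPI_SUM, col_comm);
    if (col_rank == 0) *i_offset = 0;   /* Exscan leaves rank 0's result undefined */

    MPI_Exscan(&M_local, j_offset, 1, MPI_INT, MPI_SUM, row_comm);
    if (row_rank == 0) *j_offset = 0;
}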

Parallel Poisson Solver in MPI (figure, repeated: the N x M grid, N = M = 64, distributed over the 3 x 5 processor grid)

MPI Reduction & Communication Functions
- Point-to-point communications in the N, S, E, W shifts
  - MPI_SENDRECV( sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount, recvtype, source, recvtag, comm, status, ierr )
- Reduction operations in the computation
  - MPI_ALLREDUCE( sendbuf, recvbuf, count, datatype, operation, comm, ierr )
  - operation = MPI_SUM, MPI_MAX, MPI_MINLOC, ...
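A hedged C sketch of how MPI_Allreduce is typically used here: every process finds its largest local change between iterations, and the reduction gives all processes the same global maximum, so they agree on when to stop. The function and variable names are illustrative.

#include <math.h>
#include <mpi.h>

int converged(const double *phi, const double *phi_old, int n_local,
              double tol, MPI_Comm grid_comm)
{
    double local_max = 0.0, global_max;

    for (int k = 0; k < n_local; k++) {           /* largest local change */
        double d = fabs(phi[k] - phi_old[k]);
        if (d > local_max) local_max = d;
    }

    MPI_Allreduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX, grid_comm);
    return global_max < tol;
}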

I/O of final results, Step 1: in row_comm, Gatherv the columns into matrices of size (# rows) x M (figure: the 3 x 5 processor grid with each row's blocks gathered)
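A hedged C sketch of the Gatherv step: each process in a row_comm contributes a block of a different size, so the root needs per-process counts and displacements. The names and element counts are illustrative, not the workshop code.

#include <stdlib.h>
#include <mpi.h>

void gather_row_blocks(double *sendbuf, int my_count,
                       int *counts, int nprocs_row,
                       double *recvbuf, MPI_Comm row_comm)
{
    int *displs = malloc(nprocs_row * sizeof(int));
    displs[0] = 0;
    for (int p = 1; p < nprocs_row; p++)
        displs[p] = displs[p-1] + counts[p-1];   /* running offsets into recvbuf */

    /* counts[] may differ per process, which is why Gatherv (not Gather) is needed */
    MPI_Gatherv(sendbuf, my_count, MPI_DOUBLE,
                recvbuf, counts, displs, MPI_DOUBLE,
                0, row_comm);

    free(displs);
}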

I/O of final results, Step 2: transpose the matrix; in row_comm, Gatherv the columns into an M x N matrix. The result is the transpose of the matrix for φ.

MPI Workshop - III: Passing Structures in MPI (Research Staff, Week 3 of 3)

Topic
- Passing Data Structures in MPI: C version
  - MPI_Type_struct
  - also see the more general features MPI_PACK and MPI_UNPACK
- See the code MPI_Type_struct_annotated.c in /nfs/workshops/mpi/advanced_mpi_examples

A structure in C

#define IDIM  2
#define FDIM  3
#define CHDIM 10

struct buf2 {
    int   Int[IDIM];
    float Float[FDIM];
    char  Char[CHDIM];
    float x, y, z;
    int   i, j, k, l, m, n;
} sendbuf, recvbuf;

Create a structure datatype in MPI

struct buf2 { int Int[IDIM]; float Float[FDIM]; char Char[CHDIM];
              float x, y, z;  int i, j, k, l, m, n; } sendbuf, recvbuf;

MPI_Datatype type[NBLOCKS] =               /* #define NBLOCKS 5 */
    { MPI_INT, MPI_FLOAT, MPI_CHAR, MPI_FLOAT, MPI_INT };
int blocklen[NBLOCKS] = { IDIM, FDIM, CHDIM, 3, 6 };

Create a structure datatype in MPI (cont)

struct buf2 { int Int[IDIM]; float Float[FDIM]; char Char[CHDIM];
              float x, y, z;  int i, j, k, l, m, n; } sendbuf, recvbuf;

MPI_Aint disp[NBLOCKS];                    /* starting address of each block */

MPI_Address(&sendbuf.Int[0],   disp);
MPI_Address(&sendbuf.Float[0], disp+1);
MPI_Address(&sendbuf.Char[0],  disp+2);
MPI_Address(&sendbuf.x,        disp+3);
MPI_Address(&sendbuf.i,        disp+4);

Create a structure datatype in MPI (cont)

MPI_Aint base;                     /* disp[] entries are MPI_Aint, so base is too */
MPI_Datatype NewType;              /* new MPI type we're creating */
MPI_Status status;

base = disp[0];                    /* make displacements relative to the first block */
for (i = 0; i < NBLOCKS; i++)
    disp[i] -= base;

/* MPI calls to create the new structure datatype.  Since this is the 90's we need
   more than to just want this new datatype, we must commit to it. */
MPI_Type_struct(NBLOCKS, blocklen, disp, type, &NewType);
MPI_Type_commit(&NewType);
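Note that MPI_Address and MPI_Type_struct were deprecated in MPI-2 and removed in MPI-3. A hedged sketch of the same steps with the current names, assuming the same disp, blocklen and type arrays:

/* Modern spelling of the calls above (MPI-2 and later). */
MPI_Get_address(&sendbuf.Int[0], &disp[0]);   /* one call per block, as before */
/* ... repeat for the remaining blocks ... */
MPI_Type_create_struct(NBLOCKS, blocklen, disp, type, &NewType);
MPI_Type_commit(&NewType);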

Use of the new structure datatype in MPI

/* Right shift: rank-1 mod(NP) --> rank --> rank+1 mod(NP) */
MPI_Comm_size(MPI_COMM_WORLD, &NP);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

sendto   = (rank+1) % NP;
recvfrom = (rank-1+NP) % NP;

MPI_Sendrecv(&sendbuf, 1, NewType, sendto,   tag,
             &recvbuf, 1, NewType, recvfrom, tag,
             MPI_COMM_WORLD, &status);
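Once all communication with the derived type is finished it can be released; a minimal sketch:

/* Free the derived datatype when it is no longer needed. */
MPI_Type_free(&NewType);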

References: Some useful books
- MPI: The Complete Reference
  - Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker and Jack Dongarra, MIT Press
  - /nfs/workshops/mpi/docs/mpi_complete_reference.ps.Z
- Parallel Programming with MPI
  - Peter S. Pacheco, Morgan Kaufmann Publishers, Inc.
- Using MPI: Portable Parallel Programming with the Message-Passing Interface
  - William Gropp, E. Lusk and A. Skjellum, MIT Press