MPI Workshop - III: Passing Structures in MPI (Research Staff, Week 3 of 3)

Schedule
Course Map
Cartesian Topology: MPI_CART_CREATE, MPI_CART_RANK, MPI_CART_COORDS, MPI_CART_SHIFT, MPI_GATHERV, MPI_REDUCE, MPI_SENDRECV
Passing Data Structures in MPI: MPI_TYPE_VECTOR, MPI_TYPE_HVECTOR, MPI_TYPE_STRUCT

Course Map

Memory Considerations: While data structures in programming languages are generally thought of as arrays, MPI views data as one contiguous 1-D array.
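As a small illustration (a hedged sketch, assuming a Fortran double-precision array phi(N,M); dest, tag, and row_type are hypothetical names): column-major storage means a column is one contiguous block of memory and can be sent directly, while a row is strided and needs a derived type such as MPI_TYPE_VECTOR.

! Column j is contiguous in memory: send its N elements directly.
call MPI_SEND(phi(1,j), N, MPI_DOUBLE_PRECISION, dest, tag, MPI_COMM_WORLD, ierr)

! Row i is strided (stride N between elements): describe it with a derived type.
call MPI_TYPE_VECTOR(M, 1, N, MPI_DOUBLE_PRECISION, row_type, ierr)
call MPI_TYPE_COMMIT(row_type, ierr)
call MPI_SEND(phi(i,1), 1, row_type, dest, tag, MPI_COMM_WORLD, ierr)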

Examples
MPI_SCATTER(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
MPI_SCATTERV(sendbuf, sendcounts, displs, sendtype, recvbuf, recvcount, recvtype, root, comm)
MPI_GATHER(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
MPI_GATHERV(sendbuf, sendcount, sendtype, recvbuf, recvcounts, displs, recvtype, root, comm)
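For illustration, here is a hedged sketch (not from the workshop; my_count, local_data, global_data, nprocs, and my_rank are hypothetical names) of the usual MPI_GATHERV pattern: gather the per-rank counts first, build the displacements on the root, then gather the variable-length data.

! Each rank contributes my_count elements of local_data; rank 0 packs them
! back-to-back into global_data.
integer, allocatable :: counts(:), displs(:)
integer :: p, ierr

allocate( counts(0:nprocs-1), displs(0:nprocs-1) )

! Root learns how much each rank will send.
call MPI_GATHER(my_count, 1, MPI_INTEGER, counts, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

! Displacement of each contribution in the receive buffer (only root needs these).
if (my_rank == 0) then
   displs(0) = 0
   do p = 1, nprocs-1
      displs(p) = displs(p-1) + counts(p-1)
   enddo
endif

call MPI_GATHERV(local_data, my_count, MPI_DOUBLE_PRECISION,        &
                 global_data, counts, displs, MPI_DOUBLE_PRECISION, &
                 0, MPI_COMM_WORLD, ierr)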

Derived Data Types: Passing Data Structures in MPI
MPI_TYPE_STRUCT
Also see the more general features MPI_PACK and MPI_UNPACK.
See the code MPI_Type_struct_annotated.c in /nfs/workshops/mpi/advanced_mpi_examples
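The annotated C example is not reproduced here; the following is a minimal, hedged Fortran sketch of the same idea, using MPI_TYPE_CREATE_STRUCT (the MPI-2 successor of MPI_TYPE_STRUCT) to send a structure as a single unit. The particle type and its fields are hypothetical.

program pass_struct
  use mpi
  implicit none
  type particle                      ! hypothetical structure to be passed
     sequence
     integer          :: id
     double precision :: pos(3)
  end type particle
  type(particle) :: p
  integer :: ierr, rank, ptype
  integer :: blocklens(2), types(2)
  integer(kind=MPI_ADDRESS_KIND) :: displs(2), base

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  ! Describe the layout: one integer followed by three doubles.
  blocklens = (/ 1, 3 /)
  types     = (/ MPI_INTEGER, MPI_DOUBLE_PRECISION /)
  call MPI_GET_ADDRESS(p%id,  displs(1), ierr)
  call MPI_GET_ADDRESS(p%pos, displs(2), ierr)
  base   = displs(1)
  displs = displs - base             ! displacements relative to the start of p

  call MPI_TYPE_CREATE_STRUCT(2, blocklens, displs, types, ptype, ierr)
  call MPI_TYPE_COMMIT(ptype, ierr)

  if (rank == 0) then                ! root fills the structure ...
     p%id  = 1
     p%pos = (/ 0.1d0, 0.2d0, 0.3d0 /)
  end if
  call MPI_BCAST(p, 1, ptype, 0, MPI_COMM_WORLD, ierr)   ! ... everyone receives it as one unit

  call MPI_TYPE_FREE(ptype, ierr)
  call MPI_FINALIZE(ierr)
end program pass_struct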

Poisson Equation on a 2D Grid with periodic boundary conditions
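For reference, the equation and its Jacobi update can be written as follows (a reconstruction from the update formula on the later slides; the sign convention and the factor of h^2/4 from the grid spacing are assumed to be absorbed into rho):

\[
\nabla^2 \phi(x,y) = -\rho(x,y),
\qquad
\phi^{\mathrm{new}}_{i,j} = \rho_{i,j}
  + \tfrac{1}{4}\bigl(\phi^{\mathrm{old}}_{i+1,j} + \phi^{\mathrm{old}}_{i-1,j}
  + \phi^{\mathrm{old}}_{i,j+1} + \phi^{\mathrm{old}}_{i,j-1}\bigr),
\]

with the indices taken modulo N for the periodic boundary conditions.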

Serial Poisson Solver: F90 Code
N x N matrices for rho and phi
Initialize rho.
Discretize the equation.
Iterate until convergence.
Output results.

Serial Poisson Solver: Solution

Serial Poisson Solver (cont)

do j = 1, M
   do i = 1, N      ! Fortran: access down columns first
      phi(i,j) = rho(i,j) + .25 * ( phi_old( modulo(i,N)+1,   j )   &
                                  + phi_old( modulo(i-2,N)+1, j )   &
                                  + phi_old( i, modulo(j,N)+1 )     &
                                  + phi_old( i, modulo(j-2,N)+1 ) )
   enddo
enddo

Parallel Poisson Solver in MPI: domain decomposition onto a 3 x 5 processor grid. [Figure: the grid of processes P(row, column), rows 0-2 and columns 0-4.]

Parallel Poisson Solver in MPI: processor grid, e.g. 3 x 5 with N = M = 64. [Figure: the N x M global array divided among the processes P(row, column); the numbers mark the global row and column indices at the block boundaries.]

Parallel Poisson Solver in MPI: boundary data movement each iteration. [Figure: the 3 x 5 process grid with arrows indicating the exchange of boundary data between neighboring processes.]

Ghost Cells: Local Indices. [Figure: the local array on process P(1,2), with N_local = 21 rows and M_local = 13 columns; ghost cells lie just outside the local index range, e.g. row N_local+1 = 22 and column M_local+1 = 14.]

Data Movement: e.g. shift right (East). [Figure: boundary data from one process moves into the ghost cells of its neighbor, P(i,j) --> P(i+1,j).]

Communicators and Topologies
A communicator is a set of processors which can talk to each other.
The basic communicator is MPI_COMM_WORLD.
One can create new groups or subgroups of processors from MPI_COMM_WORLD or from other communicators.
MPI allows one to associate a Cartesian or graph topology with a communicator.

MPI Cartesian Topology Functions
MPI_CART_CREATE( old_comm, nmbr_of_dims, dim_sizes(), wrap_around(), reorder, cart_comm, ierr )
  old_comm     = MPI_COMM_WORLD
  nmbr_of_dims = 2
  dim_sizes()  = (np_rows, np_cols) = (3, 5)
  wrap_around  = ( .true., .true. )
  reorder      = .false.  (generally set to .true.; allows the system to reorder the process ranks for better performance)
  cart_comm    = grid_comm  (name for the new communicator)
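A minimal sketch of that call for the 3 x 5 grid, using the variable names above (not the workshop's full source):

integer :: grid_comm, ierr
integer :: dim_sizes(2)
logical :: wrap_around(2)

dim_sizes   = (/ 3, 5 /)              ! 3 rows of processes, 5 columns
wrap_around = (/ .true., .true. /)    ! periodic in both directions

call MPI_CART_CREATE(MPI_COMM_WORLD, 2, dim_sizes, wrap_around, &
                     .false., grid_comm, ierr)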

MPI Cartesian Topology Functions
MPI_CART_RANK( comm, coords(), rank, ierr )
  comm     = grid_comm
  coords() = ( coords(1), coords(2) ), e.g. (0,2) for P(0,2)
  rank     = processor rank inside grid_comm
  Returns the rank of the processor with coordinates coords().
MPI_CART_COORDS( comm, rank, nmbr_of_dims, coords(), ierr )
  nmbr_of_dims = 2
  Returns the coordinates of the processor in grid_comm, given its rank in grid_comm.
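As a sketch (variable names assumed, not from the workshop code): after the grid is created, each process can look up its own coordinates, and any process can translate coordinates back into a rank.

integer :: my_rank, my_coords(2), rank_02, coords_02(2), ierr

! Which (row, column) am I in grid_comm?
call MPI_COMM_RANK(grid_comm, my_rank, ierr)
call MPI_CART_COORDS(grid_comm, my_rank, 2, my_coords, ierr)

! Which rank sits at coordinates (0,2)?
coords_02 = (/ 0, 2 /)
call MPI_CART_RANK(grid_comm, coords_02, rank_02, ierr)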

MPI Cartesian Topology Functions
MPI_CART_SUB( grid_comm, free_coords(), sub_comm, ierr )
  grid_comm = communicator with a topology
  free_coords(): ( .false., .true. ) -> (i fixed, j varies), i.e. a row communicator
                 ( .true., .false. ) -> (i varies, j fixed), i.e. a column communicator
  sub_comm = the new sub-communicator (say row_comm or col_comm)
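A sketch of creating both sub-communicators, with the names used on the slide:

integer :: row_comm, col_comm, ierr
logical :: free_coords(2)

free_coords = (/ .false., .true. /)   ! keep the row index fixed, let the column vary
call MPI_CART_SUB(grid_comm, free_coords, row_comm, ierr)

free_coords = (/ .true., .false. /)   ! keep the column index fixed, let the row vary
call MPI_CART_SUB(grid_comm, free_coords, col_comm, ierr)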

MPI Cartesian Topology Functions
MPI_CART_SHIFT( grid_comm, direction, disp, rank_recv_from, rank_send_to, ierr )
  grid_comm = communicator with a topology
  direction = 0 -> i varies -> column shift; N or S
            = 1 -> j varies -> row shift;    E or W
  disp = how many processors to shift over (+ or -)
  e.g. N shift: direction = 0, disp = -1;  S shift: direction = 0, disp = +1
       E shift: direction = 1, disp = +1;  W shift: direction = 1, disp = -1

MPI Cartesian Topology Functions
MPI_CART_SHIFT( grid_comm, direction, disp, rank_recv_from, rank_send_to, ierr )
MPI_CART_SHIFT does not actually perform any data transfer. It returns two ranks:
  rank_recv_from: the rank of the processor from which the calling processor will receive the new data
  rank_send_to:   the rank of the processor to which data will be sent from the calling processor
Note: MPI_CART_SHIFT does the modulo arithmetic if the corresponding dimension has wrap_around(*) = .true.
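A hedged sketch of obtaining all four neighbor ranks on the periodic grid (the workshop code on the next slide combines one such shift with MPI_SENDRECV):

integer :: north, south, east, west, ierr

! direction 0: i (row) varies -> N/S neighbors; direction 1: j (column) varies -> E/W neighbors
call MPI_CART_SHIFT(grid_comm, 0, 1, north, south, ierr)   ! receive from north, send to south
call MPI_CART_SHIFT(grid_comm, 1, 1, west,  east,  ierr)   ! receive from west,  send to east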

Parallel Poisson Solver: N = upward shift in columns

! N or upward shift
! P(i+1,j) ->(recv from)-> P(i,j) ->(send to)-> P(i-1,j)
direction = 0        ! i varies
disp      = -1       ! i -> i-1
top_bottom_buffer = phi_old_local(1,:)

call MPI_CART_SHIFT( grid_comm, direction, disp, rank_recv_from, rank_send_to, ierr )

call MPI_SENDRECV( top_bottom_buffer, M_local+1, MPI_DOUBLE_PRECISION, rank_send_to,   tag, &
                   bottom_ghost_cells, M_local,  MPI_DOUBLE_PRECISION, rank_recv_from, tag, &
                   grid_comm, status, ierr )

phi_old_local(N_local+1,:) = bottom_ghost_cells

Parallel Poisson Solver: Main computation

do j = 1, M_local
   do i = 1, N_local
      phi(i,j) = rho(i,j) + .25 * ( phi_old( i+1, j ) + phi_old( i-1, j )   &
                                  + phi_old( i, j+1 ) + phi_old( i, j-1 ) )
   enddo
enddo

Note: the indices are all within range now, thanks to the ghost cells.

Parallel Poisson Solver: Global vs. Local Indices

i_offset = 0
do i = 1, coord(1)
   i_offset = i_offset + nmbr_local_rows(i)
enddo
j_offset = 0
do j = 1, coord(2)
   j_offset = j_offset + nmbr_local_cols(j)
enddo

do j = j_offset+1, j_offset + M_local            ! global indices
   y = (real(j)-.5)/M*Ly - Ly/2
   do i = i_offset+1, i_offset + N_local         ! global indices
      x = (real(i)-.5)/N*Lx
      makerho_local(i-i_offset, j-j_offset) = f(x,y)   ! store with local indices
   enddo
enddo

Parallel Poisson Solver in MPI: processor grid, e.g. 3 x 5 with N = M = 64. [Figure: the same decomposition as before, now annotated with both the global indices at the block boundaries and the local indices 1..N_local (= 21) and 1..M_local (= 13) within one block.]

MPI Reduction & Communication Functions
Point-to-point communication in the N, S, E, W shifts:
MPI_SENDRECV( sendbuf, sendcount, sendtype, dest, sendtag, recvbuf, recvcount, recvtype, source, recvtag, comm, status, ierr )
Reduction operations in the computation:
MPI_ALLREDUCE( sendbuf, recvbuf, count, datatype, operation, comm, ierr )
  operation = MPI_SUM, MPI_MAX, MPI_MINLOC, ...
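For instance, a hedged sketch of how such a reduction might be used for the convergence test (local_diff, global_diff, tolerance, and converged are hypothetical names): each process computes the largest local change in phi, and MPI_ALLREDUCE gives every process the global maximum.

double precision :: local_diff, global_diff

! Largest change on this process's block during the last iteration
local_diff = maxval( abs( phi(1:N_local,1:M_local) - phi_old(1:N_local,1:M_local) ) )

! Global maximum over all processes; every rank gets the result
call MPI_ALLREDUCE(local_diff, global_diff, 1, MPI_DOUBLE_PRECISION, &
                   MPI_MAX, grid_comm, ierr)

if (global_diff < tolerance) converged = .true.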

I/O of final results. Step 1: in row_comm, Gatherv the columns into matrices of size (# rows) x M. [Figure: the 3 x 5 process grid, with each process row gathering its blocks.]

I/O of final results. Step 2: transpose the matrix; in row_comm, Gatherv the columns into an M x N matrix. The result is the transpose of the matrix for phi. [Figure: the gathered M x N array.]

References: Some useful books
MPI: The Complete Reference
  Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra, MIT Press
  /nfs/workshops/mpi/docs/mpi_complete_reference.ps.Z
Parallel Programming with MPI
  Peter S. Pacheco, Morgan Kaufmann Publishers, Inc.
Using MPI: Portable Parallel Programming with the Message-Passing Interface
  William Gropp, E. Lusk, and A. Skjellum, MIT Press