MPI Workshop - II Research Staff Week 2 of 3

Today's Topics
Course Map
Basic Collective Communications: MPI_Barrier; MPI_Scatterv, MPI_Gatherv, MPI_Reduce
MPI Routines/Exercises: Pi, Matrix-Matrix mult., Vector-Matrix mult.
Other Collective Calls
References

Course Map

Example 1 - Pi Calculation Uses the following MPI calls: MPI_BARRIER, MPI_BCAST, MPI_REDUCE

Integration Domain: Serial (figure: the integration interval divided at grid points x0, x1, x2, x3, ..., xN)

Serial Pseudocode
f(x) = 4/(1+x^2)
h = 1/N, sum = 0.0
do i = 1, N
  x = h*(i - 0.5)
  sum = sum + f(x)
enddo
pi = h * sum
Example: N = 10, h = 0.1, x = {.05, .15, .25, .35, .45, .55, .65, .75, .85, .95}

Integration Domain: Parallel (figure: grid points x0, x1, x2, x3, x4, ..., xN with the intervals shared among the processes)

Parallel Pseudocode
P(0) reads in N and broadcasts N to each processor
f(x) = 4/(1+x^2)
h = 1/N, sum = 0.0
do i = rank+1, N, nprocrs
  x = h*(i - 0.5)
  sum = sum + f(x)
enddo
mypi = h * sum
Collect (Reduce) mypi from each processor into a collective value of pi on the output processor
Example: N = 10, h = 0.1, Procrs: {P(0), P(1), P(2)}
P(0) -> {.05, .35, .65, .95}
P(1) -> {.15, .45, .75}
P(2) -> {.25, .55, .85}
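A minimal C sketch of this example, assuming the usual midpoint-rule formulation; the I/O and variable names are illustrative, not the workshop's distributed source:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nprocrs, i, N = 0;
    double h, x, sum, mypi, pi;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocrs);

    if (rank == 0)                       /* P(0) reads in N ...      */
        scanf("%d", &N);
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* ... and broadcasts it */

    h = 1.0/(double)N;
    sum = 0.0;
    for (i = rank + 1; i <= N; i += nprocrs) {      /* cyclic distribution of intervals */
        x = h*((double)i - 0.5);
        sum += 4.0/(1.0 + x*x);
    }
    mypi = h*sum;

    /* collect (reduce) mypi from each processor onto the output processor */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}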

Collective Communications - Synchronization Collective calls can (but are not required to) return as soon as their participation in a collective call is complete. Return from a call does NOT indicate that other processes have completed their part in the communication. Occasionally, it is necessary to force the synchronization of processes. MPI_BARRIER
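In code, the call takes only a communicator; a short sketch, assuming one wants a common starting point before timing (the timing line is illustrative):

MPI_Barrier(MPI_COMM_WORLD);   /* no rank proceeds until all ranks in the communicator have entered */
double t0 = MPI_Wtime();       /* timers now start from (roughly) the same point on every rank */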

Collective Communications - Broadcast MPI_BCAST
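Calling syntax, with a one-line example; the buffer N is an illustrative assumption:

int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
              int root, MPI_Comm comm)

/* rank 0 (the root) sends its copy of N; every other rank receives it into N */
MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);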

Collective Communications - Reduction
MPI_REDUCE with predefined operations such as MPI_SUM, MPI_PROD, MPI_MAX, MPI_MIN, MPI_LAND, MPI_BAND, ...
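Calling syntax in the same style, plus a short illustrative example (localmax and globalmax are assumed names):

int MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype,
               MPI_Op op, int root, MPI_Comm comm)

/* element-wise maximum over all ranks, delivered to the root (rank 0) */
double localmax, globalmax;
MPI_Reduce(&localmax, &globalmax, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);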

Example 2: Matrix Multiplication (Easy) in C
Two versions, depending on whether or not the number of rows of C and A is evenly divisible by the number of processes.
Uses the following MPI calls: MPI_BCAST, MPI_BARRIER, MPI_SCATTERV, MPI_GATHERV

Serial Code in C/C++
for (i = 0; i < nrow_c; i++)
  for (j = 0; j < ncol_c; j++) {
    c[i][j] = 0.0e0;
    for (k = 0; k < ncol_a; k++)
      c[i][j] += a[i][k]*b[k][j];
  }
Note that all the arrays are accessed in row-major order. Hence, it makes sense to distribute the arrays by rows.

Matrix Multiplication in C Parallel Example

Collective Communications - Scatter/Gather MPI_GATHER, MPI_SCATTER, MPI_GATHERV, MPI_SCATTERV

Flavors of Scatter/Gather
Equal-sized pieces of data distributed to each processor: MPI_SCATTER, MPI_GATHER
Unequal-sized pieces of data distributed: MPI_SCATTERV, MPI_GATHERV. You must specify arrays of the sizes of the data pieces and of their displacements from the start of the data to be distributed or collected; both of these arrays have length equal to the size of the communications group.

Scatter/Scatterv Calling Syntax
int MPI_Scatter(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                void *recvbuf, int recvcount, MPI_Datatype recvtype,
                int root, MPI_Comm comm)
int MPI_Scatterv(void *sendbuf, int *sendcounts, int *offsets,
                 MPI_Datatype sendtype, void *recvbuf, int recvcount,
                 MPI_Datatype recvtype, int root, MPI_Comm comm)
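For reference, a fully spelled-out MPI_Scatter call matching the abbreviated equal-size code below; the names a, apart, nrow_a, ncol_a and size follow the slides, while root 0 and contiguous storage of the 2-D arrays are assumptions:

/* send (nrow_a/size) contiguous rows of a to each rank's apart;
   sendcount is the amount sent to EACH rank, not the total       */
MPI_Scatter(&a[0][0], (nrow_a/size)*ncol_a, MPI_DOUBLE,
            &apart[0][0], (nrow_a/size)*ncol_a, MPI_DOUBLE,
            0, MPI_COMM_WORLD);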

Abbreviated Parallel Code (Equal size)
ierr = MPI_Scatter(*a, nrow_a*ncol_a/size, ...);
ierr = MPI_Bcast(*b, nrow_b*ncol_b, ...);
for (i = 0; i < nrow_c/size; i++)
  for (j = 0; j < ncol_c; j++) {
    cpart[i][j] = 0.0e0;
    for (k = 0; k < ncol_a; k++)
      cpart[i][j] += apart[i][k]*b[k][j];
  }
ierr = MPI_Gather(*cpart, (nrow_c/size)*ncol_c, ...);

Abbreviated Parallel Code (Unequal)
ierr = MPI_Scatterv(*a, a_chunk_sizes, a_offsets, ...);
ierr = MPI_Bcast(*b, nrow_b*ncol_b, ...);
for (i = 0; i < c_chunk_sizes[rank]/ncol_c; i++)
  for (j = 0; j < ncol_c; j++) {
    cpart[i][j] = 0.0e0;
    for (k = 0; k < ncol_a; k++)
      cpart[i][j] += apart[i][k]*b[k][j];
  }
ierr = MPI_Gatherv(*cpart, c_chunk_sizes[rank], MPI_DOUBLE, ...);
Look at the C code to see how sizes and offsets are done.
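The slides defer to the distributed source for how the size and offset arrays are built; one common scheme, sketched here with the slides' names (giving the leftover rows to the first few ranks is an assumption):

int p, rows = nrow_a/size, rem = nrow_a % size, offset = 0;
for (p = 0; p < size; p++) {
    /* counts and offsets are in elements (rows * ncol_a), as MPI_Scatterv expects */
    a_chunk_sizes[p] = (rows + (p < rem ? 1 : 0)) * ncol_a;
    a_offsets[p]     = offset;
    offset          += a_chunk_sizes[p];
}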

Fortran version
F77 - no dynamic memory allocation.
F90 - allocatable arrays, allocated in contiguous memory.
Multi-dimensional arrays are stored in memory in column-major order.
Questions for the student: How should we distribute the data in this case? What about loop ordering? We never distributed the B matrix - what if B is large?

Example 3: Vector Matrix Product in C Illustrates MPI_Scatterv, MPI_Reduce, MPI_Bcast

Main part of parallel code
ierr = MPI_Scatterv(a, a_chunk_sizes, a_offsets, MPI_DOUBLE,
                    apart, a_chunk_sizes[rank], MPI_DOUBLE,
                    root, MPI_COMM_WORLD);
ierr = MPI_Scatterv(btmp, b_chunk_sizes, b_offsets, MPI_DOUBLE,
                    bparttmp, b_chunk_sizes[rank], MPI_DOUBLE, ...);
/* ... initialize cpart to zero ... */
for (k = 0; k < a_chunk_sizes[rank]; k++)
  for (j = 0; j < ncol_c; j++)
    cpart[j] += apart[k]*bpart[k][j];
ierr = MPI_Reduce(cpart, c, ncol_c, MPI_DOUBLE, MPI_SUM,
                  root, MPI_COMM_WORLD);

Collective Communications - Allgather MPI_ALLGATHER
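Calling syntax, with a one-line illustrative example (myval and all are assumed names); every rank ends up with the concatenation of all the send buffers, i.e. a gather whose result is available everywhere:

int MPI_Allgather(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                  void *recvbuf, int recvcount, MPI_Datatype recvtype,
                  MPI_Comm comm)

/* each rank contributes one double; afterwards all[0..size-1] is identical on every rank */
MPI_Allgather(&myval, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, MPI_COMM_WORLD);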

Collective Communications - Alltoall MPI_ALLTOALL
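Calling syntax, with a one-line illustrative example (sendbuf and recvbuf are assumed arrays of length size); block j of rank i's send buffer lands in block i of rank j's receive buffer, effectively a transpose of the data layout:

int MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype,
                 void *recvbuf, int recvcount, MPI_Datatype recvtype,
                 MPI_Comm comm)

/* every rank exchanges one int with every other rank */
MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);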

References - MPI Tutorial
CS471 Class Web Site - Andy Pineda: http://www.arc.unm.edu/~acpineda/CS471/HTML/CS471.html
MHPCC: http://www.mhpcc.edu/training/workshop/html/mpi/MPIIntro.html
Edinburgh Parallel Computing Center: http://www.epcc.ed.ac.uk/epic/mpi/notes/mpi-course-epic.book_1.html
Cornell Theory Center: http://www.tc.cornell.edu/Edu/Talks/topic.html#mess

References - IBM Parallel Environment
POE - Parallel Operating Environment:
http://www.mhpcc.edu/training/workshop/html/poe/poe.html
http://ibm.tc.cornell.edu/ibm/pps/doc/primer/
LoadLeveler:
http://www.mhpcc.edu/training/workshop/html/loadleveler/LoadLeveler.html
http://ibm.tc.cornell.edu/ibm/pps/doc/LlPrimer.html
http://www.qpsf.edu.au/software/ll-hints.html

Exercise: Vector Matrix Product in C Rewrite Example 3 to perform the vector matrix product as shown.