12d.1 Two Example Parallel Programs using MPI UNC-Wilmington, C. Ferner, 2007, Mar 20, 2007

12d.2 Matrix Multiplication Matrices are multiplied together by taking the dot product of each row of the first matrix with each column of the second matrix (figure: A * B = C)

12d.3 Matrix Multiplication For each value at row i and column j, the result is the dot product of the i-th row of A and the j-th column of B: C[i][j] = A[i][0]*B[0][j] + A[i][1]*B[1][j] + ... + A[i][N-1]*B[N-1][j]

12d.4 Matrix Multiplication For each row i from [0..N-1] and each column j from [0..N-1], the value at position [i][j] of the resulting matrix is computed:

for (i = 0; i < N; i++)
    for (j = 0; j < N; j++) {
        C[i][j] = 0;
        for (k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];
    }
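As a point of reference, a complete sequential version might look like the following. This is a minimal sketch, not taken from the slides; the matrix size N and the test values used to fill A and B are chosen here purely for illustration.

#include <stdio.h>

#define N 4   /* matrix dimension; chosen here only for illustration */

double A[N][N], B[N][N], C[N][N];

int main(void)
{
    int i, j, k;

    /* Fill A and B with simple test values (B is the identity matrix,
       so the result C should equal A). */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            A[i][j] = i + j;
            B[i][j] = (i == j) ? 1.0 : 0.0;
        }

    /* C = A * B using the triple loop from the slide */
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++) {
            C[i][j] = 0;
            for (k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }

    /* Print the result */
    for (i = 0; i < N; i++) {
        for (j = 0; j < N; j++)
            printf("%6.1f ", C[i][j]);
        printf("\n");
    }
    return 0;
}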

12d.5 Matrix Multiplication This can be implemented on multiple processors, where each processor is responsible for computing a different set of rows of the final matrix. As long as each processor has the needed parts of the A and B matrices, it can do this without communication (figure: the rows of C partitioned among processors)

12d.6 Matrix Multiplication If there are N rows and P processors, then each processor is responsible for N/P rows. Each processor is responsible for the rows from my_rank * N/P up to (but excluding) (my_rank + 1) * N/P (figure: rows 0*N/P to 1*N/P go to my_rank = 0, rows 1*N/P to 2*N/P go to my_rank = 1, rows 2*N/P to 3*N/P go to my_rank = 2)

12d.7 Matrix Multiplication This is coded as:

for (i = 0 + my_rank * N/P; i < 0 + (my_rank + 1) * N/P; i++)
    for (j = 0; j < N; j++) {
        C[i][j] = 0;
        for (k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];
    }

12d.8 Matrix Multiplication One problem: what if N/P is not an integer? The last processor is responsible for fewer than N/P rows, and the code on the previous slide would cause the last processor (or last few processors) to compute beyond the last row of the matrix

12d.9 Matrix Multiplication This is dealt with as follows:

blksz = (int) ceil((float) N / P);
for (i = 0 + my_rank * blksz; i < min(N, 0 + (my_rank + 1) * blksz); i++)
    for (j = 0; j < N; j++) {
        C[i][j] = 0;
        for (k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];
    }
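Note that min() is not part of standard C, so the code above presumably relies on a macro or helper defined elsewhere in the program; something like the following sketch (an assumption, not shown in the slides):

/* min() is not in the C standard library; a simple macro like this is assumed. */
#define min(x, y) ((x) < (y) ? (x) : (y))

ceil(), on the other hand, comes from <math.h>, which is why the program is linked with -lm (as in the compile command on slide 12d.28).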

12d.10 Matrix Multiplication For example, suppose N=13 and P=4. Then blksz = ceiling(13/4) = 4, and:

Processor 0: i = [0*4..1*4) = [0..4)
Processor 1: i = [1*4..2*4) = [4..8)
Processor 2: i = [2*4..3*4) = [8..12)
Processor 3: i = [3*4..min(13,4*4)) = [12..13)
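The slides concentrate on the computation and do not show how the matrices reach each process or how the result is collected. One possible arrangement is sketched below. It assumes double-precision N x N arrays A, B, and C declared on every process with N a compile-time constant, that MPI has already been initialized, that my_rank, P, and blksz are set as on the previous slides, and that a min() macro like the one above is available; it is an illustration, not the course's code.

    int start, end, p;

    /* Rank 0 is assumed to have initialized A and B; give every process a copy. */
    MPI_Bcast (A, N*N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast (B, N*N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Each process computes its block of rows, exactly as on slide 12d.9. */
    start = my_rank * blksz;
    end   = min(N, (my_rank + 1) * blksz);
    for (i = start; i < end; i++)
        for (j = 0; j < N; j++) {
            C[i][j] = 0;
            for (k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }

    /* Collect the computed rows back on rank 0. */
    if (my_rank != 0) {
        if (end > start)
            MPI_Send (&C[start][0], (end - start) * N, MPI_DOUBLE,
                      0, 0, MPI_COMM_WORLD);
    } else {
        for (p = 1; p < P; p++) {
            int s = p * blksz, e = min(N, (p + 1) * blksz);
            if (e > s)
                MPI_Recv (&C[s][0], (e - s) * N, MPI_DOUBLE,
                          p, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

Collective operations such as MPI_Scatterv and MPI_Gatherv could replace the send/receive loop, at the cost of setting up counts and displacements for the shorter final block.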

12d.11 Matrix Multiplication The assignment deals with the parallel execution of matrix multiplication

12d.12 Numerical Integration Suppose we have a non-negative, continuous function f and we want to compute the integral of f from a to b (figure: area under f between a and b)

12d.13 Numerical Integration We can approximate the integral by dividing the area into trapezoids and summing the areas of the trapezoids (figure: area under f divided into trapezoids between a and b)

12d.14 Numerical Integration If we use equal-width partitions, then the width of each partition is h = (b - a)/n (figure: equal-width trapezoids between a and b)

12d.15 Numerical Integration The area of the i-th trapezoid, between x_{i-1} = a + (i-1)*h and x_i = a + i*h, is (h/2) * (f(x_{i-1}) + f(x_i)) (figure: a single trapezoid of width h under f)

12d.16 Numerical Integration Summing the areas of all n trapezoids gives: integral ≈ h * [ (f(a) + f(b))/2 + f(x_1) + f(x_2) + ... + f(x_{n-1}) ], where x_i = a + i*h. This is the form computed by the sequential program on the following slides.

12d.17 Numerical Integration Sequential program

double f(double x);

main (int argc, char *argv[])
{
    int N, i;
    double a, b, h, x, integral;
    char *usage = "Usage: %s a b N \n";
    double elapsed_time;
    struct timeval tv1, tv2;
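The transcript does not show the #include lines; judging from the functions used, headers along these lines would be needed for the program to compile (an assumption, not the slides' text):

#include <stdio.h>      /* fprintf, printf */
#include <stdlib.h>     /* atof, atoi */
#include <sys/time.h>   /* gettimeofday, struct timeval */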

12d.18 Numerical Integration Sequential program

    if (argc < 4) {
        fprintf (stderr, usage, argv[0]);
        return -1;
    }

    a = atof(argv[1]);
    b = atof(argv[2]);
    N = atoi(argv[3]);

12d.19 Numerical Integration Sequential program

    gettimeofday(&tv1, NULL);

    h = (b - a) / N;
    integral = (f(a) + f(b))/2.0;
    x = a + h;
    for (i = 1; i < N; i++) {
        integral += f(x);
        x += h;
    }
    integral = integral*h;

    gettimeofday(&tv2, NULL);

12d.20 Numerical Integration Sequential program

    elapsed_time = (tv2.tv_sec - tv1.tv_sec) +
                   ((tv2.tv_usec - tv1.tv_usec) / 1000000.0);
    printf ("elapsed_time=\t%lf seconds\n", elapsed_time);
    printf ("With N = %d trapezoids, \n", N);
    printf ("estimate of integral from %f to %f = %f\n", a, b, integral);
}

12d.21 Numerical Integration Sequential program

double f(double x)
{
    return 6*x*x - 5*x;
}

12d.22 Numerical Integration Sequential program

$ ./integ
a = , b = , N =
elapsed_time= seconds
With N = trapezoids,
estimate of integral from to =

12d.23 Numerical Integration Parallel program Each processor will be responsible for computing the area of a subset of the trapezoids (figure: the trapezoids under f divided among processors P0, P1, and P2)

12d.24 Numerical Integration Parallel program

double f (double x);

int main(int argc, char *argv[])
{
    int N, P, mypid, blksz, i;
    double a, b, h, x, integral, localA, localB, total;
    char *usage = "Usage: %s a b N \n";
    double elapsed_time;
    struct timeval tv1, tv2;
    int abort = 0;
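The transcript jumps from these declarations to the broadcasts, so the headers and the MPI setup are not shown. They presumably look something like the following sketch; the exact argument checking and the use of the declared abort flag are assumptions.

/* Headers the parallel program needs (not shown in the transcript) */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>       /* ceil */
#include <sys/time.h>   /* struct timeval */
#include <mpi.h>

    /* ... at the start of main(), before the argument parsing and broadcasts ... */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &mypid);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &P);       /* total number of processes */

    if (argc < 4) {
        if (mypid == 0)
            fprintf(stderr, usage, argv[0]);
        abort = 1;       /* the declared abort flag is presumably used like this */
    }
    if (abort) {
        MPI_Finalize();
        return -1;
    }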

12d.25 Numerical Integration Parallel program

    a = atof(argv[1]);
    b = atof(argv[2]);
    N = atoi(argv[3]);

    MPI_Bcast (&a, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast (&b, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    MPI_Bcast (&N, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h = (b - a) / N;

12d.26 Numerical Integration Parallel program

    blksz = (int) ceil ( ((float) N) / P);
    localA = a + mypid * blksz * h;
    localB = min(b, a + (mypid + 1) * blksz * h);

    integral = (f(localA) + f(localB))/2.0;
    x = localA + h;
    for (i = 1; i < blksz && x <= localB; i++) {
        integral += f(x);
        x += h;
    }
    integral = integral*h;

12d.27 Numerical Integration Parallel program

    MPI_Reduce (&integral, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (mypid == 0)
        printf ("integral = %f\n", total);

    MPI_Finalize();
}

double f(double x)
{
    return 6*x*x - 5*x;
}
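The declarations on slide 12d.24 include elapsed_time, tv1, and tv2, but the timing code itself does not appear in the transcript. In an MPI program it could be done with gettimeofday as in the sequential version, or more simply with MPI_Wtime, roughly as follows (a sketch, not the slides' code):

    double t1, t2;                      /* wall-clock timestamps */

    MPI_Barrier(MPI_COMM_WORLD);        /* optional: line processes up before timing */
    t1 = MPI_Wtime();

    /* ... the local trapezoid computation and the MPI_Reduce ... */

    t2 = MPI_Wtime();
    if (mypid == 0)
        printf ("elapsed_time=\t%lf seconds\n", t2 - t1);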

12d.28 Numerical Integration Parallel program

$ mpicc mpiInteg.c -o mpiInteg -lm
$ mpirun -nolocal -np 4 mpiInteg
elapsed_time= seconds
integral =