Introduction to Parallel Programming with MPI


Introduction to Parallel Programming with MPI
Morris Law, SCID
Apr 25, 2018

Multi-core programming
Currently, most CPUs have multiple cores that can be utilized easily by compiling with OpenMP support. Programmers no longer need to rewrite a sequential code; they only add directives that instruct the compiler to parallelize the code with OpenMP.
Reference site: http://bisqwit.iki.fi/story/howto/openmp/

OpenMP example

/*
 * Sample program to test runtime of simple matrix multiply
 * with and without OpenMP on gcc-4.3.3-tdm1 (mingw)
 * compile with gcc -fopenmp
 * (c) 2009, Rajorshi Biswas
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <assert.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int i, j, k;
    int n;
    double temp;
    double start, end;

    printf("Enter dimension ('N' for 'NxN' matrix) (100-2000): ");
    scanf("%d", &n);
    assert(n >= 100 && n <= 2000);

    int **arr1 = malloc(sizeof(int*) * n);
    int **arr2 = malloc(sizeof(int*) * n);
    int **arr3 = malloc(sizeof(int*) * n);
    for (i=0; i<n; ++i) {
        arr1[i] = malloc(sizeof(int) * n);
        arr2[i] = malloc(sizeof(int) * n);
        arr3[i] = malloc(sizeof(int) * n);
    }

    printf("Populating array with random values...\n");
    srand(time(NULL));
    for (i=0; i<n; ++i)
        for (j=0; j<n; ++j) {
            arr1[i][j] = (rand() % n);
            arr2[i][j] = (rand() % n);
        }
    printf("Completed array init.\n");

    printf("Crunching without OMP...");
    fflush(stdout);
    start = omp_get_wtime();
    for (i=0; i<n; ++i)
        for (j=0; j<n; ++j) {
            temp = 0;
            for (k=0; k<n; ++k)
                temp += arr1[i][k] * arr2[k][j];
            arr3[i][j] = temp;
        }
    end = omp_get_wtime();
    printf(" took %f seconds.\n", end - start);

    printf("Crunching with OMP...");
    fflush(stdout);
    start = omp_get_wtime();
    #pragma omp parallel for private(i, j, k, temp)
    for (i=0; i<n; ++i)
        for (j=0; j<n; ++j) {
            temp = 0;
            for (k=0; k<n; ++k)
                temp += arr1[i][k] * arr2[k][j];
            arr3[i][j] = temp;
        }
    end = omp_get_wtime();
    printf(" took %f seconds.\n", end - start);

    return 0;
}

Compiling for OpenMP support
GCC:
  gcc -fopenmp -o foo foo.c
  gfortran -fopenmp -o foo foo.f
Intel compilers:
  icc -openmp -o foo foo.c
  ifort -openmp -o foo foo.f
PGI compilers:
  pgcc -mp -o foo foo.c
  pgf90 -mp -o foo foo.f

What is Message Passing Interface (MPI)?
A portable standard for communication between processes.
Processes communicate by exchanging messages.
Each process is a separate program.
All data is private to the process that owns it.

What is Message Passing Interface (MPI)?
MPI is a library, not a language!
Different compilers may be used, but all processes must use the same MPI library, e.g. MPICH, LAM/MPI, Open MPI, etc.
Programs are written in a standard sequential language: Fortran, C, C++, etc.

Basic idea of Message Passing Interface (MPI)
MPI environment: initialize, manage, and terminate communication among processes.
Communication between processes: point-to-point communication (send, receive, etc.) and collective communication (broadcast, gather, etc.).
Complicated data structures: communicate data such as matrices and blocks of memory effectively.

Message Passing Model
[Diagram: a serial program runs as a single process over time, while in the message passing model several processes (Process 0, Process 1, ...) run concurrently and exchange data via the interconnection network.]

General MPI program structure
MPI include file -> variable declarations -> initialize MPI environment -> do work and make message passing calls -> terminate MPI environment.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int np, rank, ierr;
    ierr = MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    /* Do some work */
    printf("Helloworld, I'm P%d of %d\n", rank, np);
    ierr = MPI_Finalize();
    return 0;
}

Sample output (3 processes):
Helloworld, I'm P0 of 3
Helloworld, I'm P1 of 3
Helloworld, I'm P2 of 3

When to use MPI?
You need a portable parallel program.
You are writing a parallel library.
You care about performance.
You have a problem that can be solved in parallel.

F77/F90 and C/C++ MPI library calls
Fortran 77/90 uses subroutines: CALL invokes the library routine; nothing is returned, and the error code is the last argument; all variables are passed by reference.
C/C++ uses functions: the function name alone invokes the library call; the function returns an integer error code; variables are passed by value unless otherwise specified.
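To illustrate the C convention, here is a minimal sketch (not from the slides) that checks the integer error code returned by an MPI call; MPI_SUCCESS and MPI_Abort are standard MPI symbols:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    /* In C the error code is the return value of the function */
    int err = MPI_Init(&argc, &argv);
    if (err != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Init failed with code %d\n", err);
        MPI_Abort(MPI_COMM_WORLD, err);
    }
    MPI_Finalize();
    return 0;
}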

Types of communication
Point-to-point communication: communication involving only two processes.
Collective communication: communication that involves a group of processes.
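The slides do not include a point-to-point example, so here is a minimal hedged sketch using the standard MPI_Send and MPI_Recv calls, in which rank 0 sends one integer to rank 1 (run with at least 2 processes):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        value = 42;                 /* data to send */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("P1 received %d from P0\n", value);
    }
    MPI_Finalize();
    return 0;
}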

Implementation of MPI

Browse the sample files
Inside your home directory, the sample zip file mpi-1.zip has been provided for this laboratory. Unzip the file:
  unzip mpi-1.zip
List the subdirectories inside mpi-1:
  ls -l mpi-1
  total 20
  drwxr-xr-x. 2 morris dean 4096 Nov 27 09:55 00openmp
  drwxrwxr-x. 2 morris dean 4096 Nov 27 09:41 0-hello
  drwxrwxr-x. 2 morris dean 4096 Nov 27 09:51 array-pi
  drwxrwxr-x. 2 morris dean 4096 Nov 27 09:49 mc-pi
  drwxrwxr-x. 2 morris dean 4096 Nov 27 09:51 series-pi
These subdirectories contain the sample MPI-1 programs, with README files, for this laboratory.

First MPI C program: hello1.c
Change to the hello directory and use an editor, e.g. nano, to open hello1.c:
  cd hello
  nano hello1.c

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int version, subversion;
    MPI_Init(&argc, &argv);
    MPI_Get_version(&version, &subversion);
    printf("Hello world!\n");
    printf("Your MPI Version is: %d.%d\n", version, subversion);
    MPI_Finalize();
    return(0);
}

First MPI Fortran program: hello1.f
Use an editor to open hello1.f:
  cd hello
  nano hello1.f

      program main
      include 'mpif.h'
      integer ierr, version, subversion
      call MPI_INIT(ierr)
      call MPI_GET_VERSION(version, subversion, ierr)
      print *, 'Hello world!'
      print *, 'Your MPI Version is: ', version, '.', subversion
      call MPI_FINALIZE(ierr)
      end

Second MPI C program: hello2.c
Use an editor to open hello2.c:
  cd hello
  nano hello2.c

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello world! I am P%d of %d\n", rank, size);
    MPI_Finalize();
    return(0);
}

Second MPI Fortran program: hello2.f
Use an editor to open hello2.f:
  cd hello
  nano hello2.f

      program main
      include 'mpif.h'
      integer rank, size, ierr
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
      print *, 'Hello world! I am P', rank, ' of ', size
      call MPI_FINALIZE(ierr)
      end

Make all files in hello
A Makefile is provided in each example directory. Running 'make' compiles all the hello examples:
  make
  /usr/lib64/mpich/bin/mpif77 -o helloF1 hello1.f
  /usr/lib64/mpich/bin/mpif77 -o helloF2 hello2.f
  /usr/lib64/mpich/bin/mpicc -o helloC1 hello1.c
  /usr/lib64/mpich/bin/mpicc -o helloC2 hello2.c

mpirun hello examples in the foreground
You may run the hello examples in the foreground by giving mpirun the number of processes and a machinefile, e.g.:
  mpirun -np 4 -machinefile machine ./helloC2
  Hello world! I am P0 of 4
  Hello world! I am P2 of 4
  Hello world! I am P3 of 4
  Hello world! I am P1 of 4
Here machine is a file listing the hostnames on which you want the programs to run; a sketch is shown below.
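As a hedged sketch (the hostnames below are placeholders, not taken from the slides), an MPICH-style machinefile simply lists one host per line:

  # hypothetical machinefile named "machine"
  node01
  node02
  node03
  node04

mpirun then distributes the requested number of processes over these hosts.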

Exercise
Following the hello example above, mpirun helloC1, helloF1, and helloF2 in the foreground with 4 processors.
Change directory to mc-pi and compile all programs inside using 'make'. Run mpi-mc-pi using 2, 4, and 8 processors.
Change directory to series-pi. Run series-pi using 2, 4, 6, and 8 processors. Note the time difference.

Parallelization example: serial-pi.c

#include <stdio.h>

static long num_steps = 10000000;
double step;

int main()
{
    int i;
    double x, pi, sum = 0.0;
    step = 1.0/(double) num_steps;
    for (i=0; i<num_steps; i++) {
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    pi = step * sum;
    printf("Est Pi= %f\n", pi);
    return 0;
}
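For reference (this step is implied but not spelled out on the slide), the loop is a midpoint-rule approximation of the integral

  \pi = \int_0^1 \frac{4}{1+x^2}\,dx \approx \sum_{i=0}^{N-1} \frac{4}{1+x_i^2}\,\Delta x, \quad \Delta x = \frac{1}{N}, \quad x_i = (i+0.5)\,\Delta x,

with N = num_steps, which is exactly what the loop above accumulates before multiplying by step.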

Parallelizing serial-pi.c into mpi-pi.c, Step 1: add the MPI environment

#include "mpi.h"
#include <stdio.h>

static long num_steps = 10000000;
double step;

int main(int argc, char *argv[])
{
    int i;
    double x, pi, sum = 0.0;
    MPI_Init(&argc, &argv);
    step = 1.0/(double) num_steps;
    for (i=0; i<num_steps; i++) {
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    pi = step * sum;
    printf("Est Pi= %f\n", pi);
    MPI_Finalize();
    return 0;
}

Parallelizing serial-pi.c into mpi-pi.c, Step 2: add variables to print ranks

#include "mpi.h"
#include <stdio.h>

static long num_steps = 10000000;
double step;

int main(int argc, char *argv[])
{
    int i;
    double x, pi, sum = 0.0;
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    step = 1.0/(double) num_steps;
    for (i=0; i<num_steps; i++) {
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    pi = step * sum;
    printf("Est Pi= %f, Processor %d of %d \n", pi, rank, size);
    MPI_Finalize();
    return 0;
}

Parallelizing serial-pi.c into mpi-pi.c, Step 3: divide the workload

#include "mpi.h"
#include <stdio.h>

static long num_steps = 10000000;
double step;

int main(int argc, char *argv[])
{
    int i;
    double x, mypi, pi, sum = 0.0;
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    step = 1.0/(double) num_steps;
    for (i=rank; i<num_steps; i+=size) {
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    mypi = step * sum;
    printf("Est Pi= %f, Processor %d of %d \n", mypi, rank, size);
    MPI_Finalize();
    return 0;
}

Parallelizing serial-pi.c into mpi-pi.c, Step 4: collect the partial results

#include "mpi.h"
#include <stdio.h>

static long num_steps = 10000000;
double step;

int main(int argc, char *argv[])
{
    int i;
    double x, mypi, pi, sum = 0.0;
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    step = 1.0/(double) num_steps;
    for (i=rank; i<num_steps; i+=size) {
        x = (i+0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }
    mypi = step * sum;
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank==0) printf("Est Pi= %f, \n", pi);
    MPI_Finalize();
    return 0;
}

Compile and run the MPI program
  $ mpicc -o mpi-pi mpi-pi.c
  $ mpirun -np 4 -machinefile machines mpi-pi

Parallelization example 2: serial-mc-pi.c

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main(int argc, char *argv[])
{
    long in, i, n;
    double x, y, q;
    time_t now;
    in = 0;
    srand(time(&now));
    printf("Input no of samples : ");
    scanf("%ld", &n);
    for (i=0; i<n; i++) {
        x = rand()/(RAND_MAX+1.0);
        y = rand()/(RAND_MAX+1.0);
        if ((x*x + y*y) < 1) in++;
    }
    q = ((double)4.0)*in/n;
    printf("pi = %.20lf\n", q);
    printf("rmse = %.20lf\n", sqrt(((double) q*(4-q))/n));
    return 0;
}

[Figure: random points sampled in a square of side 2r with an inscribed circle of radius r.]

Parallelization example 2: mpi-mc-pi.c

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main(int argc, char *argv[])
{
    long in, i, n;
    double x, y, q, Q;
    time_t now;
    int rank, size;
    MPI_Init(&argc, &argv);
    in = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    srand(time(&now)+rank);
    if (rank==0) {
        printf("Input no of samples : ");
        scanf("%ld", &n);
    }
    MPI_Bcast(&n, 1, MPI_LONG, 0, MPI_COMM_WORLD);
    for (i=0; i<n; i++) {
        x = rand()/(RAND_MAX+1.0);
        y = rand()/(RAND_MAX+1.0);
        if ((x*x + y*y) < 1) in++;
    }
    q = ((double)4.0)*in/n;
    MPI_Reduce(&q, &Q, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    Q = Q / size;
    if (rank==0) {
        printf("pi = %.20lf\n", Q);
        printf("rmse = %.20lf\n", sqrt(((double) Q*(4-Q))/n/size));
    }
    MPI_Finalize();
    return 0;
}

Compile and run mpi-mc-pi
  $ mpicc -o mpi-mc-pi mpi-mc-pi.c
  $ mpirun -np 4 -machinefile machines mpi-mc-pi

Collective communication
Scatter: distribute your data among the processes.
Reduction (e.g. PROD): information from all processes is combined to provide a condensed result by/for one process.
[Diagram: scatter splits a vector across the processes; reduction with PROD combines the values 1, 3, 5, 7 into 105 on one process.]

MPI_Scatter
Distributes data from the root to all other tasks in a group.

int MPI_Scatter(void *sendbuf, int sendcnt, MPI_Datatype sendtype,
                void *recvbuf, int recvcnt, MPI_Datatype recvtype,
                int root, MPI_Comm comm)

Input parameters:
  sendbuf   address of send buffer (choice, significant only at root)
  sendcnt   number of elements sent to each process (integer, significant only at root)
  sendtype  data type of send buffer elements (handle, significant only at root)
  recvcnt   number of elements in receive buffer (integer)
  recvtype  data type of receive buffer elements (handle)
  root      rank of sending process (integer)
  comm      communicator (handle)
Output parameter:
  recvbuf   address of receive buffer (choice)

MPI_Scatter example
Example: two vectors are distributed in order to prepare a parallel computation of their scalar product.

  MPI_Scatter(&a, 1, MPI_INT, &m, 1, MPI_INT, 2, MPI_COMM_WORLD);

[Diagram: before the call, the root (process 2) holds a[0..3] = 10, 11, 12, 13; after the call, each of processes 0-3 holds one element of a in its variable m.]
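To make the call above concrete, here is a small self-contained sketch (not from the slides; the array contents and root rank 2 mirror the diagram) that scatters one integer to each of four processes:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, m;
    int a[4] = {10, 11, 12, 13};   /* only the root's copy is actually sent */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    /* root = 2 sends one element of a to every process */
    MPI_Scatter(a, 1, MPI_INT, &m, 1, MPI_INT, 2, MPI_COMM_WORLD);
    printf("P%d received m = %d\n", rank, m);
    MPI_Finalize();
    return 0;
}

Run it with mpirun -np 4 so that the four elements of a match the four processes; each rank then prints a different element.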

MPI_Reduce
Reduces values on all processes to a single value on the root.

int MPI_Reduce(void *sendbuf, void *recvbuf, int count,
               MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

Input parameters:
  sendbuf   address of send buffer (choice)
  count     number of elements in send buffer (integer)
  datatype  data type of elements of send buffer (handle)
  op        reduce operation (handle)
  root      rank of root process (integer)
  comm      communicator (handle)
Output parameter:
  recvbuf   address of receive buffer (choice, significant only at root)

MPI_Reduce example
Example: calculation of the global minimum of variables kept by all processes, calculation of a global sum, etc.

  MPI_Reduce(&b, &d, 1, MPI_INT, MPI_SUM, 2, MPI_COMM_WORLD);

[Diagram: each process 0-3 holds a value in b; after MPI_Reduce with op MPI_SUM, the root (process 2) holds the sum of all the b values (19 in the figure) in d.]
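Since the slide mentions a global minimum but only shows the MPI_SUM call, here is a brief hedged sketch (not from the slides) that reduces each rank's value with MPI_MIN onto rank 0:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, local, global_min;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    local = 100 - rank;              /* illustrative per-rank value */
    /* combine every rank's value with MPI_MIN; only rank 0 gets the result */
    MPI_Reduce(&local, &global_min, 1, MPI_INT, MPI_MIN, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("Global minimum = %d\n", global_min);
    MPI_Finalize();
    return 0;
}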

MPI Datatypes
MPI Datatype          Corresponding C datatype
MPI_CHAR              signed char
MPI_SHORT             signed short int
MPI_INT               signed int
MPI_LONG              signed long int
MPI_UNSIGNED_CHAR     unsigned char
MPI_UNSIGNED_SHORT    unsigned short int
MPI_UNSIGNED          unsigned int
MPI_UNSIGNED_LONG     unsigned long int
MPI_FLOAT             float
MPI_DOUBLE            double
MPI_LONG_DOUBLE       long double
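As a quick illustration of how these constants are used (a hedged sketch, not from the slides), the MPI datatype argument must match the C type of the buffer being communicated:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank;
    double data[3] = {1.0, 2.0, 3.0};   /* C type double <-> MPI_DOUBLE */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* broadcast 3 doubles from rank 0 to every process */
    MPI_Bcast(data, 3, MPI_DOUBLE, 0, MPI_COMM_WORLD);
    printf("P%d has %.1f %.1f %.1f\n", rank, data[0], data[1], data[2]);
    MPI_Finalize();
    return 0;
}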

Thanks
Please give me some comments: https://goo.gl/forms/880NY3kZ9h7ay7r32
Morris Law, SCID (morris@hkbu.edu.hk)