KAUST Winter Enhancement Program 2010 (WE 244)

Presentation transcript:

KAUST Winter Enhancement Program 2010 (WE 244) MPI and OpenMP Craig C. Douglas School of Energy Resources Department of Mathematics University of Wyoming

What is MPI? MPI: Message Passing Interface. MPI is not a new programming language, but a library of functions that can be called from C, C++, Fortran, or Python. Successor to PVM (Parallel Virtual Machine). Developed by an open, international forum with representation from industry, academia, and government laboratories.

What Is It Good For? Allows data to be passed between processes in a distributed memory environment. Provides source-code portability. Allows efficient implementations. Offers a great deal of functionality. Supports heterogeneous parallel architectures.

MPI Communicator
Idea: a communicator is a group of processes that are allowed to communicate with each other. Most MPI functions take a communicator argument; MPI_COMM_WORLD is the predefined communicator containing all processes.
Note the MPI naming format: constants are written MPI_XXX, functions MPI_Xxx, and calls look like var = MPI_Xxx(parameters); or simply MPI_Xxx(parameters);
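A minimal sketch of this convention (not from the original slides): every MPI_Xxx call returns an integer error code that can be compared with MPI_SUCCESS.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int numtasks, rank, err;

  MPI_Init(&argc, &argv);

  /* Every MPI_Xxx call returns an int error code (MPI_SUCCESS on success). */
  err = MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  if (err != MPI_SUCCESS)
    MPI_Abort(MPI_COMM_WORLD, err);

  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("Process %d of %d in MPI_COMM_WORLD\n", rank, numtasks);

  MPI_Finalize();
  return 0;
}

The next slides walk through each of these steps (include, initialize, work, terminate) in turn.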

Getting Started Include MPI header file Initialize MPI environment Work: Make message passing calls Send Receive Terminate MPI environment

Include MPI header file

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv){
  …
}

Initialize MPI environment

int main(int argc, char** argv){
  int numtasks, rank;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  ...
}

Initialize MPI (cont.)
MPI_Init(&argc, &argv): no MPI functions may be called before this call.
MPI_Comm_size(MPI_COMM_WORLD, &nump): a communicator is a collection of processes that can send messages to each other. MPI_COMM_WORLD is a predefined communicator that consists of all the processes running when the program execution begins.
MPI_Comm_rank(MPI_COMM_WORLD, &myrank): lets a process find out its rank (its identification number within the communicator).

Terminate MPI environment

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv){
  …
  MPI_Finalize();
}

No MPI functions may be called after this call.

Work: Make message passing calls (Send, Receive)

if (my_rank != 0){
  MPI_Send(data, strlen(data)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
}
else{
  MPI_Recv(data, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
}

Work (cont.)

int MPI_Send(void* message, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)

int MPI_Recv(void* message, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status *status)

Hello World!!

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
  int my_rank, p, source, dest, tag = 0;
  char message[100];
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &p);

  if (my_rank != 0) {
    /* Create message */
    sprintf(message, "Hello from process %d!", my_rank);
    dest = 0;
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
  } else {
    for (source = 1; source < p; source++) {
      MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
      printf("%s\n", message);
    }
  }

  MPI_Finalize();
  return 0;
}

Compile and Run MPI

Compile: mpicc -o hello.exe mpi_hello.c
Run:     mpirun -np 5 hello.exe

Output:
$ mpirun -np 5 hello.exe
Hello from process 1!
Hello from process 2!
Hello from process 3!
Hello from process 4!

More MPI Functions

MPI_Bcast(void *m, int s, MPI_Datatype dt, int root, MPI_Comm comm)
  Sends a copy of the data in m on the process with rank root to each process in the communicator.

MPI_Reduce(void *operand, void *result, int count, MPI_Datatype datatype, MPI_Op operator, int root, MPI_Comm comm)
  Combines the operands stored in the memory referenced by operand using operation operator and stores the result in result on process root.

double MPI_Wtime(void)
  Returns a double precision value that represents the number of seconds that have elapsed since some point in the past.

MPI_Barrier(MPI_Comm comm)
  Each process in comm blocks until every process in comm has called it.
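A short hedged sketch (not in the original deck) showing these four routines together; the variable names and the value 100 are illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
  int rank, n = 0;
  double local, total, t0, t1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) n = 100;                       /* only root knows n at first */
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); /* now every rank has n */

  MPI_Barrier(MPI_COMM_WORLD);                  /* synchronize before timing */
  t0 = MPI_Wtime();

  local = (double)(rank + 1);                   /* each rank's contribution */
  MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

  t1 = MPI_Wtime();
  if (rank == 0)
    printf("n = %d, sum of (rank+1) over all ranks = %g, time = %f s\n",
           n, total, t1 - t0);

  MPI_Finalize();
  return 0;
}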

More Examples
Trapezoidal Rule: the integral from a to b of a nonnegative function f(x). Approach: estimate the area by partitioning the region into regular geometric shapes and then adding the areas of the shapes.
Compute Pi.
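As a hedged illustration of the trapezoidal rule in MPI (the pi program on the following slides uses the midpoint rule instead), the sketch below integrates an illustrative f(x) = x^2 on [0,1]; the integrand, the limits, and the number of trapezoids are assumptions made for the example.

#include <mpi.h>
#include <stdio.h>

static double f(double x) { return x * x; }      /* illustrative integrand */

int main(int argc, char **argv) {
  int rank, nprocs, i, n = 1024;                 /* n = number of trapezoids */
  double a = 0.0, b = 1.0, h, local, total, x;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  h = (b - a) / n;
  /* each rank sums the interior points i = rank+1, rank+1+nprocs, ... */
  local = 0.0;
  for (i = rank + 1; i < n; i += nprocs) {
    x = a + i * h;
    local += f(x);
  }
  if (rank == 0) local += 0.5 * (f(a) + f(b));   /* endpoint terms, added once */
  local *= h;

  MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0) printf("integral of x^2 on [0,1] ~= %.8f\n", total);

  MPI_Finalize();
  return 0;
}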

Compute PI

#include <stdio.h>
#include <math.h>
#include "mpi.h"

#define PI      3.141592653589793238462643
#define PI_STR "3.141592653589793238462643"
#define MAXLEN 40
#define f(x) (4./(1.+ (x)*(x)))

int main(int argc, char *argv[]){
  int N=0, rank, nprocrs, i, answer=1;
  double mypi, pi, h, sum, x, starttime, endtime, runtime, runtime_max;
  char buff[MAXLEN];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("CPU %d saying hello\n", rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocrs);
  if (rank==0)
    printf("Using a total of %d CPUs\n", nprocrs);

Compute PI

  while(answer){
    if (rank==0){
      printf("This program computes pi as "
             "4.*Integral{0->1}[1/(1+x^2)]\n");
      printf("(Using PI = %s)\n", PI_STR);
      printf("Input the Number of intervals: N = ");
      fgets(buff, MAXLEN, stdin);
      sscanf(buff, "%d", &N);
      printf("pi will be computed with %d intervals on %d processors\n",
             N, nprocrs);
    }

    /* Procr 0 = P(0) gives N to all other processors */
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (N<=0) goto end_program;

Compute PI

    starttime = MPI_Wtime();
    sum = 0.0;
    h = 1./N;
    for (i=1+rank; i<=N; i+=nprocrs){
      x = h*(i-0.5);
      sum += f(x);
    }
    mypi = sum*h;
    endtime = MPI_Wtime();
    runtime = endtime - starttime;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&runtime, &runtime_max, 1, MPI_DOUBLE, MPI_MAX, 0,
               MPI_COMM_WORLD);

    printf("Procr %d: runtime = %f\n", rank, runtime);
    fflush(stdout);

    if (rank==0){
      printf("For %d intervals, pi = %.14lf, error = %g\n",
             N, pi, fabs(pi-PI));

Compute PI printf("computed in = %f secs",runtime_max); fflush(stdout); printf("Do you wish to try another run? (y=1;n=0)"); fgets(buff,MAXLEN,stdin); sscanf(buff,"%d",&answer); } /*processors wait while P(0) gets new input from user*/ MPI_Barrier(MPI_COMM_WORLD); MPI_Bcast(&answer,1,MPI_INT,0,MPI_COMM_WORLD); if(!answer) break; end_program: printf("\nProcr %d: Saying good-bye!\n",rank); if(rank==0) printf("\nEND PROGRAM\n"); MPI_Finalize(); }

Compile and Run Example 2

mpicc -o pi.exe pi.c

$ mpirun -np 2 pi.exe
Procr 1 saying hello
Procr 0 saying hello
Using a total of 2 CPUs
This program computes pi as 4.*Integral{0->1}[1/(1+x^2)]
(Using PI = 3.141592653589793238462643)
Input the Number of intervals: N = 10
pi will be computed with 10 intervals on 2 processors
Procr 0: runtime = 0.000003
Procr 1: runtime = 0.000003
For 10 intervals, pi = 3.14242598500110, error = 0.000833331
computed in = 0.000003 secs

OpenMP
What does OpenMP stand for? Open specifications for Multi Processing.
It is an API with three main components: compiler directives, runtime library routines, and environment variables.
Used for writing multithreaded programs in shared memory environments.
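A minimal sketch (not from the slides) that touches all three components: a compiler directive, a runtime library routine, and a comment pointing at the corresponding environment variable.

#include <omp.h>
#include <stdio.h>

int main(void) {
  int i, sum = 0;

  /* Library routine: request 4 threads (setting OMP_NUM_THREADS in the
     environment would be an alternative way to control this). */
  omp_set_num_threads(4);

  /* Compiler directive: distribute the loop and reduce into sum. */
  #pragma omp parallel for reduction(+:sum)
  for (i = 0; i < 100; i++)
    sum += i;

  printf("sum = %d\n", sum);   /* 4950, regardless of the thread count */
  return 0;
}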

What do you need?
What programming languages? C and C++, Fortran (77, 90, 95).
What operating systems? UNIX-based ones and Windows.
Can I compile OpenMP code with gcc? Yes: gcc -o pgm.exe -fopenmp pgm.c

Some compilers for OpenMP: Free Software Foundation (GNU), Intel, Portland Group Compilers and Tools, IBM XL, SGI MIPSpro, Sun Studio 10, Absoft Pro FortranMP.

What It Does
The program starts off with a master thread, which runs for some amount of time. When the master thread reaches a region where the work can be done concurrently, it creates several threads, and they all do work in this region. When the end of the region is reached, all of the extra threads terminate and the master thread continues.
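A hedged sketch of this fork/join behavior (an assumed example, not from the deck): serial code runs on the master thread, a team of threads is created for the parallel region, and only the master continues afterwards.

#include <omp.h>
#include <stdio.h>

int main(void) {
  printf("master thread only (serial region)\n");

  /* fork: a team of threads is created here */
  #pragma omp parallel
  {
    printf("thread %d working in the parallel region\n",
           omp_get_thread_num());
  } /* join: the extra threads end, only the master continues */

  printf("master thread only again (serial region)\n");
  return 0;
}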

Example: You (the master thread) get a job moving boxes. When you go to work you bring several "friends" (sub-threads) who help you move the boxes. On pay day you do not bring any friends, and you get all of the money.

OpenMP directives
Format example: #pragma omp parallel for shared(y)
A directive always starts with #pragma omp, then the directive name (parallel for), followed by an optional clause (shared(y)), and ends with a newline.
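A short sketch using exactly the directive from this slide; the array y and the loop body are illustrative assumptions.

#include <stdio.h>

#define N 8

int main(void) {
  double y[N];
  int i;

  /* #pragma omp + directive name (parallel for) + optional clause (shared) */
  #pragma omp parallel for shared(y)
  for (i = 0; i < N; i++)
    y[i] = 2.0 * i;           /* each iteration handled by some thread */

  for (i = 0; i < N; i++)
    printf("y[%d] = %g\n", i, y[i]);
  return 0;
}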

Directives list
PARALLEL: multiple threads will execute the code in the block.
DO/for: causes the do or for loop to be executed in parallel by the worker threads.
SECTIONS: the enclosed sections are divided among the threads (each section is executed by one thread).
SINGLE: only to be executed by one thread.
PARALLEL DO/for: contains only one DO/for loop in the block.
PARALLEL SECTIONS: contains only one SECTIONS construct in the block.
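A hedged sketch (an assumed example, not from the deck) showing SECTIONS, SINGLE, and a for work-sharing loop inside one PARALLEL region.

#include <omp.h>
#include <stdio.h>

int main(void) {
  #pragma omp parallel
  {
    /* SECTIONS: the two sections may run on different threads */
    #pragma omp sections
    {
      #pragma omp section
      printf("section A on thread %d\n", omp_get_thread_num());
      #pragma omp section
      printf("section B on thread %d\n", omp_get_thread_num());
    }

    /* SINGLE: exactly one thread executes this block */
    #pragma omp single
    printf("single block on thread %d\n", omp_get_thread_num());

    /* for: iterations are divided among the threads of the team */
    #pragma omp for
    for (int i = 0; i < 4; i++)
      printf("iteration %d on thread %d\n", i, omp_get_thread_num());
  }
  return 0;
}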

Work Sharing (three slides of diagrams illustrating the work-sharing constructs)

Data scope attribute clauses
PRIVATE: variables listed in this clause get an independent copy for each thread.
SHARED: variables listed in this clause are shared among all the threads.
DEFAULT: specifies a default scope for all variables in the block.
FIRSTPRIVATE: PRIVATE, with each copy initialized from the original variable.
LASTPRIVATE: PRIVATE, with the value from the last loop iteration (or last section) copied back to the original object.
COPYIN: assigns the master thread's value of a THREADPRIVATE variable to each thread's copy.
REDUCTION: performs a reduction operation on the private copies of a variable and combines them into the shared original.
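A minimal sketch illustrating three of these clauses (private, firstprivate, reduction); the loop bounds and initial values are illustrative assumptions.

#include <stdio.h>

int main(void) {
  int i, t = 10, sum = 0;

  /* private(i):      each thread gets its own copy of i
     firstprivate(t): each thread gets its own t, initialized to 10
     reduction(+:sum): private partial sums are combined at the end */
  #pragma omp parallel for private(i) firstprivate(t) reduction(+:sum)
  for (i = 0; i < 100; i++)
    sum += i + t;

  printf("sum = %d\n", sum);   /* 4950 + 100*10 = 5950 */
  return 0;
}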

Directives and clauses

Synchronization
MASTER: only the master thread can execute this block.
CRITICAL: only one thread can execute this block at a time.
BARRIER: causes all of the threads to wait at this point until every thread reaches it.
ATOMIC: the memory location will be updated by one thread at a time.
FLUSH: makes each thread's view of the listed variables consistent with memory.
ORDERED: the enclosed part of the loop is executed in the order of the corresponding serial loop.
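A hedged sketch (not from the slides) combining ATOMIC, BARRIER, MASTER, and CRITICAL; the shared counter is purely illustrative.

#include <omp.h>
#include <stdio.h>

int main(void) {
  int count = 0;

  #pragma omp parallel
  {
    /* ATOMIC: the update of count happens one thread at a time */
    #pragma omp atomic
    count++;

    /* BARRIER: nobody passes this point until every thread arrives */
    #pragma omp barrier

    /* MASTER: only the master thread (thread 0) prints the total */
    #pragma omp master
    printf("all %d threads have incremented count: %d\n",
           omp_get_num_threads(), count);

    /* CRITICAL: one thread at a time in this block */
    #pragma omp critical
    printf("thread %d passing through the critical block\n",
           omp_get_thread_num());
  }
  return 0;
}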

Environment Variables
OMP_SCHEDULE: schedule type and chunk size used by loops declared with schedule(runtime).
OMP_NUM_THREADS: number of threads.
OMP_DYNAMIC: whether dynamic adjustment of the number of threads is allowed.
OMP_NESTED: whether nested parallelism is allowed.
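A small sketch (an assumption, not from the slides) that reports what the runtime picked up from these environment variables; the run command in the comment is only an example.

#include <omp.h>
#include <stdio.h>

int main(void) {
  /* Run as, e.g.:  OMP_NUM_THREADS=8 OMP_DYNAMIC=false ./env.exe
     The OpenMP runtime reads these settings before main() starts. */
  printf("max threads (OMP_NUM_THREADS): %d\n", omp_get_max_threads());
  printf("dynamic adjustment (OMP_DYNAMIC): %d\n", omp_get_dynamic());
  printf("nested parallelism (OMP_NESTED): %d\n", omp_get_nested());
  return 0;
}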

Library Routines OMP_SET_NUM_THREADS OMP_GET_NUM_THREADS OMP_GET_MAX_THREADS OMP_GET_THREAD_NUM OMP_GET_NUM_PROCS OMP_IN_PARALLEL OMP_SET_DYNAMIC OMP_GET_DYNAMIC OMP_SET_NESTED OMP_GET_NESTED OMP_INIT_LOCK OMP_DESTROY_LOCK OMP_SET_LOCK OMP_UNSET_LOCK OMP_TEST_LOCK
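A hedged sketch using a few of these routines, including the lock routines; using the thread ids as data is purely illustrative.

#include <omp.h>
#include <stdio.h>

int main(void) {
  omp_lock_t lock;
  int total = 0;

  omp_init_lock(&lock);              /* OMP_INIT_LOCK */
  omp_set_num_threads(4);            /* OMP_SET_NUM_THREADS */

  #pragma omp parallel
  {
    omp_set_lock(&lock);             /* OMP_SET_LOCK: one thread at a time */
    total += omp_get_thread_num();   /* OMP_GET_THREAD_NUM */
    omp_unset_lock(&lock);           /* OMP_UNSET_LOCK */
  }

  omp_destroy_lock(&lock);           /* OMP_DESTROY_LOCK */
  printf("sum of thread ids = %d (4 threads requested)\n", total);
  return 0;
}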

Example
http://beowulf.lcs.mit.edu/18.337/beowulf.html

#include <math.h>
#include <stdio.h>

#define N 16384
#define M 10

double dotproduct(int, double *);

double dotproduct(int i, double *x)
{
  double temp=0.0, denom;
  int j;
  for (j=0; j<N; j++) {          // zero based!!
    denom = (i+j)*(i+j+1)/2 + i+1;
    temp = temp + x[j]*(1/denom);
  }
  return temp;
}

int main()
{
  double *x = new double[N];
  double *y = new double[N];
  double eig = sqrt(N);
  int i,k;

  for (i=0; i<N; i++) { x[i] = 1/eig; }

  for (k=0; k<M; k++) {
    for (i=0; i<N; i++) { y[i] = 0; }

    // compute y = Ax
    #pragma omp parallel for shared(y)
    for (i=0; i<N; i++) { y[i] = dotproduct(i,x); }

    // find largest eigenvalue of y
    eig = 0;
    for (i=0; i<N; i++) { eig = eig + y[i]*y[i]; }
    eig = sqrt(eig);
    printf("The largest eigenvalue after %2d iterations is %16.15e\n", k+1, eig);

    // normalize
    for (i=0; i<N; i++) { x[i] = y[i]/eig; }
  }

  delete [] x;
  delete [] y;
  return 0;
}

OpenMP References Book: Using OpenMP: Portable Shared Memory Parallel Programming, Barbara Chapman, Gabriele Jost, Ruud van der Pas, and David Kuck https://computing.llnl.gov/tutorials/openMP http://openmp.org/wp/resources/#Tutorials http://beowulf.lcs.mit.edu/18.337/beowulf.html http://www.compunity.org/resources/compilers/index.php

MPI References Book: Parallel Programming with MPI, Peter Pacheco https://computing.llnl.gov/tutorials/mpi http://www-unix.mcs.anl.gov/mpi www.openmpi.org http://alliance.osc.edu/impi/ http://rocs.acomp.usf.edu/tut/mpi.php http://www.lam-mpi.org/tutorials/nd