1
KAUST Winter Enhancement Program 2010 (WE 244)
MPI and OpenMP Craig C. Douglas School of Energy Resources Department of Mathematics University of Wyoming
2
What is MPI? MPI: Message Passing Interface
MPI is not a new programming language, but a library with functions that can be called from C/C++/Fortran/Python
Successor to PVM (Parallel Virtual Machine)
Developed by an open, international forum with representation from industry, academia, and government laboratories
3
What Is It Good For?
Allows data to be passed between processes in a distributed memory environment
Provides source-code portability
Allows efficient implementation
A great deal of functionality
Support for heterogeneous parallel architectures
4
MPI Communicator
A communicator is a group of processes that are allowed to communicate with each other.
Most MPI functions take a communicator argument; the predefined communicator MPI_COMM_WORLD contains all processes.
Note the MPI naming format: constants are written MPI_XXX, and functions are called as var = MPI_Xxx(parameters); or MPI_Xxx(parameters);
5
Getting Started
Include MPI header file
Initialize MPI environment
Work: make message passing calls (Send, Receive)
Terminate MPI environment
6
Include MPI header file
Include File

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {
  ...
}
7
Initialize MPI environment
int main(int argc, char** argv) {
  int numtasks, rank;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  ...
}
8
Initialize MPI (cont.)
MPI_Init(&argc, &argv)
No MPI functions may be called before this call.
MPI_Comm_size(MPI_COMM_WORLD, &nump)
A communicator is a collection of processes that can send messages to each other. MPI_COMM_WORLD is a predefined communicator that consists of all the processes running when the program execution begins.
MPI_Comm_rank(MPI_COMM_WORLD, &myrank)
Lets a process find out its rank (its identification number) within the communicator.
9
Terminate MPI environment
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv) {
  ...
  MPI_Finalize();
}

No MPI functions may be called after this call.
10
Make message passing calls (Send, Receive)
Let’s work with MPI

if (my_rank != 0) {
  MPI_Send(data, strlen(data)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
} else {
  MPI_Recv(data, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
}
11
Work (cont.)

int MPI_Send(void* message, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm);

int MPI_Recv(void* message, int count, MPI_Datatype datatype,
             int source, int tag, MPI_Comm comm, MPI_Status* status);
12
Hello World!!

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
  int my_rank, p, source, dest, tag = 0;
  char message[100];
  MPI_Status status;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &p);

  if (my_rank != 0) {
    /* Create message */
    sprintf(message, "Hello from process %d!", my_rank);
    dest = 0;
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
  } else {
    for (source = 1; source < p; source++) {
      MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
      printf("%s\n", message);
    }
  }
  MPI_Finalize();
  return 0;
}
13
Compile and Run MPI
Compile: mpicc -o hello.exe mpi_hello.c
Run: mpirun -np 5 hello.exe

Output:
$ mpirun -np 5 hello.exe
Hello from process 1!
Hello from process 2!
Hello from process 3!
Hello from process 4!
14
More MPI Functions
MPI_Bcast(void *m, int s, MPI_Datatype dt, int root, MPI_Comm comm)
Sends a copy of the data in m on the process with rank root to each process in the communicator.
MPI_Reduce(void *operand, void *result, int count, MPI_Datatype datatype, MPI_Op operator, int root, MPI_Comm comm)
Combines the operands stored in the memory referenced by operand using operation operator and stores the result in result on process root.
double MPI_Wtime(void)
Returns a double precision value that represents the number of seconds that have elapsed since some point in the past.
MPI_Barrier(MPI_Comm comm)
Each process in comm blocks until every process in comm has called it.
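For concreteness, a minimal sketch (not from the slides) that uses all four of these calls together; the variable names and values are illustrative:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[]) {
  int rank, n = 0;
  double local, global, t0, t1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) n = 100;                        /* root picks a value        */
  MPI_Barrier(MPI_COMM_WORLD);                   /* line all processes up     */
  t0 = MPI_Wtime();
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);  /* root sends n to everyone  */
  local = (double) rank * n;                     /* some per-process value    */
  MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  t1 = MPI_Wtime();

  if (rank == 0) printf("sum = %f, elapsed = %f s\n", global, t1 - t0);
  MPI_Finalize();
  return 0;
}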
15
More Examples
Trapezoidal Rule
Integral from a to b of a nonnegative function f(x)
Approach: estimate the area by partitioning the region into regular geometric shapes and then adding up the areas of the shapes (see the sketch below)
Compute Pi
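The slides only show code for the pi example; as a sketch of how the trapezoidal rule is typically parallelized with MPI_Reduce, assuming an illustrative integrand f, interval [a,b], and trapezoid count n that are not taken from the slides:

#include <stdio.h>
#include "mpi.h"

#define f(x) ((x)*(x))              /* example integrand (assumed)            */

int main(int argc, char* argv[]) {
  int rank, p, i;
  double a = 0.0, b = 1.0;          /* interval endpoints (assumed)           */
  int n = 1024;                     /* total number of trapezoids (assumed)   */
  double h, local_sum = 0.0, total, x;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &p);

  h = (b - a) / n;
  /* process 'rank' handles trapezoids rank, rank+p, rank+2p, ...             */
  for (i = rank; i < n; i += p) {
    x = a + i*h;
    local_sum += 0.5 * (f(x) + f(x+h)) * h;
  }
  MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

  if (rank == 0) printf("Integral on [%f,%f] is approximately %f\n", a, b, total);
  MPI_Finalize();
  return 0;
}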
16
Compute PI

#include <stdio.h>
#include <math.h>   /* for fabs() */
#include "mpi.h"

#define PI 3.14159265358979323846
#define PI_STR "3.14159265358979323846"
#define MAXLEN 40
#define f(x) (4./(1.+ (x)*(x)))

int main(int argc, char *argv[]) {
  int N=0, rank, nprocrs, i, answer=1;
  double mypi, pi, h, sum, x, starttime, endtime, runtime, runtime_max;
  char buff[MAXLEN];

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("CPU %d saying hello\n", rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocrs);
  if (rank == 0)
    printf("Using a total of %d CPUs\n", nprocrs);
17
Compute PI

  while (answer) {
    if (rank == 0) {
      printf("This program computes pi as "
             "4.*Integral{0->1}[1/(1+x^2)]\n");
      printf("(Using PI = %s)\n", PI_STR);
      printf("Input the Number of intervals: N = ");
      fgets(buff, MAXLEN, stdin);
      sscanf(buff, "%d", &N);
      printf("pi will be computed with %d intervals on %d processors.\n",
             N, nprocrs);
    }
    /* Procr 0 = P(0) gives N to all other processors */
    MPI_Bcast(&N, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (N <= 0) goto end_program;
18
Compute PI

    starttime = MPI_Wtime();
    sum = 0.0;
    h = 1./N;
    for (i = 1+rank; i <= N; i += nprocrs) {
      x = h*(i-0.5);
      sum += f(x);
    }
    mypi = sum*h;
    endtime = MPI_Wtime();
    runtime = endtime - starttime;

    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    MPI_Reduce(&runtime, &runtime_max, 1, MPI_DOUBLE, MPI_MAX, 0,
               MPI_COMM_WORLD);

    printf("Procr %d: runtime = %f\n", rank, runtime);
    fflush(stdout);
    if (rank == 0) {
      printf("For %d intervals, pi = %.14lf, error = %g\n",
             N, pi, fabs(pi-PI));
19
Compute PI

      printf("computed in = %f secs\n", runtime_max);
      fflush(stdout);
      printf("Do you wish to try another run? (y=1;n=0) ");
      fgets(buff, MAXLEN, stdin);
      sscanf(buff, "%d", &answer);
    }
    /* processors wait while P(0) gets new input from user */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Bcast(&answer, 1, MPI_INT, 0, MPI_COMM_WORLD);
    if (!answer) break;
  }

end_program:
  printf("\nProcr %d: Saying good-bye!\n", rank);
  if (rank == 0) printf("\nEND PROGRAM\n");
  MPI_Finalize();
  return 0;
}
20
Compile and Run Example 2

mpicc -o pi.exe pi.c
$ mpirun -np 2 pi.exe
Procr 1 saying hello.
Procr 0 saying hello
Using a total of 2 CPUs
This program computes pi as 4.*Integral{0->1}[1/(1+x^2)]
(Using PI = )
Input the Number of intervals: N = 10
pi will be computed with 10 intervals on 2 processors
Procr 0: runtime =
Procr 1: runtime =
For 10 intervals, pi = , error =
computed in = secs
21
OpenMP What does OpenMP stand for?
Open specifications for Multi Processing
It is an API with three main components:
Compiler directives
Library routines
Environment variables
Used for writing multithreaded programs in shared memory environments
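A minimal sketch (not from the slides) that touches all three components: a directive, a library routine, and an environment variable.

/* compile: gcc -o hello_omp.exe -fopenmp hello_omp.c                         */
/* run, e.g.: OMP_NUM_THREADS=4 ./hello_omp.exe      (environment variable)   */
#include <stdio.h>
#include <omp.h>

int main(void) {
  #pragma omp parallel                            /* compiler directive       */
  {
    printf("Hello from thread %d of %d\n",
           omp_get_thread_num(),                  /* library routine          */
           omp_get_num_threads());
  }
  return 0;
}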
22
What do you need?
What programming languages?
C and C++
FORTRAN (77, 90, 95)
What operating systems?
UNIX-based ones
Windows
Can I compile OpenMP code with gcc?
Yes: gcc -o pgm.exe -fopenmp pgm.c
23
Some compilers for OpenMP
Free Software Foundation (GNU)
Intel
Portland Group Compilers and Tools
IBM XL
SGI MIPSpro
Sun Studio 10
Absoft Pro FortranMP
24
What It Does Program starts off with a master thread
It runs for some amount of time
When the master thread reaches a region where the work can be done concurrently
It creates several threads
They all do work in this region
When the end of the region is reached
All of the extra threads terminate
The master thread continues
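A small sketch of this fork-join behavior (illustrative, not from the slides):

#include <stdio.h>
#include <omp.h>

int main(void) {
  printf("Master thread runs alone\n");           /* serial region            */

  #pragma omp parallel                            /* fork: create a team      */
  {
    printf("Thread %d working in the parallel region\n", omp_get_thread_num());
  }                                               /* join: extra threads end  */

  printf("Master thread continues alone\n");      /* serial region again      */
  return 0;
}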
25
Example You (master thread) get a job moving boxes
When you go to work, you bring several “friends” (sub-threads) who help you move the boxes
On pay day, you do not bring any friends and you get all of the money
26
OpenMP directives
Format example: #pragma omp parallel for shared(y)
Always starts with #pragma omp
Then the directive name: parallel for
Followed by a clause (the clause is optional): shared(y)
At the end, a newline
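As context for the format above, a short sketch of how that exact directive might appear in code (the array and loop are illustrative):

#include <stdio.h>
#include <omp.h>

#define N 8

int main(void) {
  double y[N];
  int i;
  for (i = 0; i < N; i++) y[i] = i;

  /* directive name: parallel for; optional clause: shared(y)                 */
  /* the loop index i is made private to each thread automatically            */
  #pragma omp parallel for shared(y)
  for (i = 0; i < N; i++)
    y[i] = 2.0 * y[i];

  for (i = 0; i < N; i++) printf("%g ", y[i]);
  printf("\n");
  return 0;
}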
27
Directives list
PARALLEL
Multiple threads will execute the code in the block
DO/for
Causes the do or for loop to be executed in parallel by the worker threads
SECTIONS
Each enclosed section will be executed by one of the threads
SINGLE
Only to be executed by one thread
PARALLEL DO/for
Contains only one DO/for loop in the block
PARALLEL SECTIONS
Contains only one SECTIONS block
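A hedged sketch of SECTIONS and SINGLE inside one parallel region (the printed messages are placeholders):

#include <stdio.h>
#include <omp.h>

int main(void) {
  #pragma omp parallel
  {
    #pragma omp sections
    {
      #pragma omp section
      printf("Section A done by thread %d\n", omp_get_thread_num());

      #pragma omp section
      printf("Section B done by thread %d\n", omp_get_thread_num());
    }                                   /* implicit barrier after sections    */

    #pragma omp single
    printf("Only one thread (%d) prints this\n", omp_get_thread_num());
  }
  return 0;
}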
28
Work Sharing
29
Work Sharing
30
Work Sharing
31
Data scope attribute clauses
PRIVATE
Variables declared in this clause get an independent copy for each thread
SHARED
Variables declared in this clause are shared by all threads
DEFAULT
Specifies a default scope for all variables in the block
FIRSTPRIVATE
PRIVATE that also initializes each thread's copy from the original variable
LASTPRIVATE
PRIVATE that copies the value from the last loop iteration (or last section) back to the original variable
COPYIN
Copies the master thread's value of a THREADPRIVATE variable to each thread's copy
REDUCTION
Gives each thread a private copy of a shared variable and combines the copies into the shared variable at the end
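A short sketch showing several of these clauses on one loop (variable names and values are illustrative):

#include <stdio.h>
#include <omp.h>

#define N 100

int main(void) {
  int i, last = 0, offset = 10;
  double sum = 0.0;

  /* sum: one private copy per thread, combined (+) into the shared sum       */
  /* offset: firstprivate, each thread's copy starts at 10                    */
  /* last: lastprivate, keeps the value from the final iteration              */
  #pragma omp parallel for default(shared) firstprivate(offset) \
          lastprivate(last) reduction(+:sum)
  for (i = 0; i < N; i++) {
    sum += i + offset;
    last = i;
  }

  printf("sum = %g, last = %d\n", sum, last);
  return 0;
}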
32
Directives and clauses
33
Synchronization
MASTER
Only the master thread can execute this block
CRITICAL
Only one thread can execute this block at a time
BARRIER
Causes all of the threads to wait at this point until every thread has reached it
ATOMIC
The memory location will be updated by one thread at a time
FLUSH
The view of memory must be made consistent across threads
ORDERED
The loop will be executed as if it were executed serially
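A hedged sketch combining a few of these constructs (the counters are illustrative):

#include <stdio.h>
#include <omp.h>

int main(void) {
  int hits = 0, ticks = 0;

  #pragma omp parallel
  {
    #pragma omp master
    printf("Master thread %d sets things up\n", omp_get_thread_num());

    #pragma omp barrier              /* everyone waits for the master         */

    #pragma omp critical             /* one thread in this block at a time    */
    hits += 1;

    #pragma omp atomic               /* atomic update of one memory location  */
    ticks += 1;
  }

  printf("hits = %d, ticks = %d\n", hits, ticks);
  return 0;
}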
34
Environment Variables
OMP_SCHEDULE
How loop iterations are scheduled (schedule type and chunk size) for loops that use schedule(runtime)
OMP_NUM_THREADS
Number of threads
OMP_DYNAMIC
Whether a dynamic number of threads is allowed
OMP_NESTED
Whether nested parallelism is allowed
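A small sketch of how these variables influence a run; the shell lines in the comments are assumed usage, not from the slides:

/*   export OMP_NUM_THREADS=4                                                 */
/*   export OMP_SCHEDULE="dynamic,2"                                          */
/*   ./a.out                                                                  */
#include <stdio.h>
#include <omp.h>

int main(void) {
  int i;
  /* schedule(runtime) defers the scheduling choice to OMP_SCHEDULE           */
  #pragma omp parallel for schedule(runtime)
  for (i = 0; i < 8; i++)
    printf("iteration %d on thread %d\n", i, omp_get_thread_num());
  return 0;
}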
35
Library Routines
OMP_SET_NUM_THREADS
OMP_GET_NUM_THREADS
OMP_GET_MAX_THREADS
OMP_GET_THREAD_NUM
OMP_GET_NUM_PROCS
OMP_IN_PARALLEL
OMP_SET_DYNAMIC
OMP_GET_DYNAMIC
OMP_SET_NESTED
OMP_GET_NESTED
OMP_INIT_LOCK
OMP_DESTROY_LOCK
OMP_SET_LOCK
OMP_UNSET_LOCK
OMP_TEST_LOCK
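A hedged sketch using a few of these routines; in C they are spelled in lowercase (e.g. omp_set_num_threads):

#include <stdio.h>
#include <omp.h>

int main(void) {
  omp_lock_t lock;

  omp_set_num_threads(4);                  /* OMP_SET_NUM_THREADS             */
  omp_init_lock(&lock);                    /* OMP_INIT_LOCK                   */

  #pragma omp parallel
  {
    omp_set_lock(&lock);                   /* OMP_SET_LOCK                    */
    printf("Thread %d of %d (in parallel: %d)\n",
           omp_get_thread_num(),           /* OMP_GET_THREAD_NUM              */
           omp_get_num_threads(),          /* OMP_GET_NUM_THREADS             */
           omp_in_parallel());             /* OMP_IN_PARALLEL                 */
    omp_unset_lock(&lock);                 /* OMP_UNSET_LOCK                  */
  }

  omp_destroy_lock(&lock);                 /* OMP_DESTROY_LOCK                */
  return 0;
}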
36
Example http://beowulf.lcs.mit.edu/18.337/beowulf.html
#include <math.h>
#include <stdio.h>

#define N 16384
#define M 10

double dotproduct(int, double *);

double dotproduct(int i, double *x)
{
  double temp = 0.0, denom;
  int j;
  for (j = 0; j < N; j++) {      // zero based!!
    denom = (i+j)*(i+j+1)/2 + i+1;
    temp = temp + x[j]*(1/denom);
  }
  return temp;
}

int main()
{
  double *x = new double[N];
  double *y = new double[N];
  double eig = sqrt(N);
  int i, k;

  for (i = 0; i < N; i++) { x[i] = 1/eig; }

  for (k = 0; k < M; k++) {
    // compute y = Ax
    #pragma omp parallel for shared(y)
    for (i = 0; i < N; i++) { y[i] = dotproduct(i,x); }

    // find largest eigenvalue of y
    eig = 0;
    for (i = 0; i < N; i++) { eig = eig + y[i]*y[i]; }
    eig = sqrt(eig);
    printf("The largest eigenvalue after %2d iteration is %16.15e\n", k+1, eig);

    // normalize
    for (i = 0; i < N; i++) { x[i] = y[i]/eig; }
  }

  delete [] x;
  delete [] y;
  return 0;
}
37
OpenMP References
Book: Using OpenMP: Portable Shared Memory Parallel Programming, by Barbara Chapman, Gabriele Jost, and Ruud van der Pas (foreword by David J. Kuck)
38
MPI References Book: Parallel Programming with MPI, Peter Pacheco