MPI and OpenMP By: Jesus Caban and Matt McKnight

What is MPI? MPI: Message Passing Interface –It is not a new programming language; it is a library of functions that can be called from C/Fortran/Python –Successor to PVM (Parallel Virtual Machine) –Developed by an open, international forum with representation from industry, academia, and government laboratories.

What is it for? Allows data to be passed between processes in a distributed-memory environment Provides source-code portability Allows efficient implementations Offers a great deal of functionality Supports heterogeneous parallel architectures

MPI Communicator Idea: –A group of processes that are allowed to communicate with each other The most often used communicator is MPI_COMM_WORLD Note the MPI naming convention: constants are written MPI_XXX, functions are written MPI_Xxx, and calls look like var = MPI_Xxx(parameters); or MPI_Xxx(parameters);

Getting Started Include MPI header file Initialize MPI environment Work: Make message passing calls Send Receive Terminate MPI environment

Include File Include Initialize Work Terminate Include MPI header file
#include <mpi.h>
int main(int argc, char** argv){
  …
}

Initialize MPI Include Initialize Work Terminate Initialize MPI environment
int main(int argc, char** argv){
  int numtasks, rank;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  ...
}

Initialize MPI (cont.) Include Initialize Work Terminate MPI_Init(&argc, &argv) No MPI functions may be called before this call. MPI_Comm_size(MPI_COMM_WORLD, &nump) A communicator is a collection of processes that can send messages to each other. MPI_COMM_WORLD is a predefined communicator that consists of all the processes running when program execution begins. MPI_Comm_rank(MPI_COMM_WORLD, &myrank) Lets a process find out its rank.

Terminate MPI environment Include Initialize Work Terminate Terminate MPI environment
#include <mpi.h>
int main(int argc, char** argv){
  …
  MPI_Finalize();
}
No MPI functions may be called after this call.

Let’s work with MPI Include Initialize Work Terminate Work: Make message passing calls (Send, Receive)
if(my_rank != 0){
  MPI_Send(data, strlen(data)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
} else {
  MPI_Recv(data, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
}

Work (cont.) Include Initialize Work Terminate
int MPI_Send(void* message, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
int MPI_Recv(void* message, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

Hello World!!
#include <stdio.h>
#include <string.h>
#include "mpi.h"
int main(int argc, char* argv[]) {
  int my_rank, p, source, dest, tag = 0;
  char message[100];
  MPI_Status status;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  MPI_Comm_size(MPI_COMM_WORLD, &p);
  if (my_rank != 0) {
    /* Create message */
    sprintf(message, "Hello from process %d!", my_rank);
    dest = 0;
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
  } else {
    for(source = 1; source < p; source++) {
      MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
      printf("%s\n", message);
    }
  }
  MPI_Finalize();
  return 0;
}

Compile and Run MPI Compile –gcc -o hello.exe mpi_hello.c -lmpi –mpicc -o hello.exe mpi_hello.c Run –mpirun -np 5 hello.exe Output $ mpirun -np 5 hello.exe Hello from process 1! Hello from process 2! Hello from process 3! Hello from process 4!

More MPI Functions MPI_Bcast(void *m, int s, MPI_Datatype dt, int root, MPI_Comm comm) –Sends a copy of the data in m on the process with rank root to each process in the communicator. MPI_Reduce(void *operand, void *result, int count, MPI_Datatype datatype, MPI_Op operator, int root, MPI_Comm comm) –Combines the operands stored in the memory referenced by operand using operation operator and stores the result in result on process root. double MPI_Wtime(void) –Returns a double precision value that represents the number of seconds that have elapsed since some point in the past. MPI_Barrier(MPI_Comm comm) –Each process in comm blocks until every process in comm has called it.
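A minimal sketch (not from the original slides) showing how these four calls fit together; the problem size and the "work" are illustrative placeholders:

#include <stdio.h>
#include <mpi.h>
int main(int argc, char** argv){
  int rank, n = 0;
  double local = 0.0, total = 0.0, t0, t1;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) n = 1000;                        /* root chooses a problem size  */
  MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* everyone receives n          */
  MPI_Barrier(MPI_COMM_WORLD);                    /* line the processes up        */
  t0 = MPI_Wtime();
  local = (double)rank * n;                       /* stand-in for real work       */
  MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  t1 = MPI_Wtime();
  if (rank == 0) printf("total = %f, time = %f s\n", total, t1 - t0);
  MPI_Finalize();
  return 0;
}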

More Examples Trapezoidal Rule: –Integral from a to b of a nonnegative function f(x) –Approach: estimate the area by partitioning the region into regular geometric shapes (trapezoids) and then adding the areas of the shapes (a sketch of the MPI version follows after this slide) Compute Pi
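A minimal sketch of the parallel trapezoidal rule, in the spirit of Pacheco's example (this code is not from the original slides; the integrand f, the interval [0, 1], and n are placeholders, and n is assumed divisible by the number of processes). Each process integrates its own sub-interval and process 0 collects the pieces with MPI_Reduce:

#include <stdio.h>
#include <mpi.h>
double f(double x){ return x*x; }                 /* placeholder integrand        */
int main(int argc, char** argv){
  int rank, nprocs, i, n = 1024;                  /* n = total number of trapezoids */
  double a = 0.0, b = 1.0, h, local_a, local_b, local_sum, total = 0.0;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
  h = (b - a) / n;                                /* width of one trapezoid       */
  int local_n = n / nprocs;                       /* assumes n % nprocs == 0      */
  local_a = a + rank * local_n * h;               /* this process's sub-interval  */
  local_b = local_a + local_n * h;
  local_sum = (f(local_a) + f(local_b)) / 2.0;
  for (i = 1; i < local_n; i++)
    local_sum += f(local_a + i * h);
  local_sum *= h;
  MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
  if (rank == 0) printf("Integral from %f to %f is approximately %f\n", a, b, total);
  MPI_Finalize();
  return 0;
}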

Compute PI
#include <stdio.h>
#include <math.h>
#include "mpi.h"
#define PI 3.14159265358979323846
#define PI_STR "3.14159265358979323846"
#define MAXLEN 40
#define f(x) (4./(1.+ (x)*(x)))
int main(int argc, char *argv[]){
  int N=0, rank, nprocrs, i, answer=1;
  double mypi, pi, h, sum, x, starttime, endtime, runtime, runtime_max;
  char buff[MAXLEN];
  MPI_Init(&argc,&argv);
  MPI_Comm_rank(MPI_COMM_WORLD,&rank);
  printf("CPU %d saying hello\n",rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nprocrs);
  if(rank==0) printf("Using a total of %d CPUs\n",nprocrs);

Compute PI
  while(answer){
    if(rank==0){
      printf("This program computes pi as "
             "4.*Integral{0->1}[1/(1+x^2)]\n");
      printf("(Using PI = %s)\n",PI_STR);
      printf("Input the Number of intervals: N =");
      fgets(buff,MAXLEN,stdin);
      sscanf(buff,"%d",&N);
      printf("pi will be computed with %d intervals on %d processors.\n", N,nprocrs);
    }
    /* Procr 0 = P(0) gives N to all other processors */
    MPI_Bcast(&N,1,MPI_INT,0,MPI_COMM_WORLD);
    if(N<=0) goto end_program;

Compute PI
    starttime=MPI_Wtime();
    sum=0.0;
    h=1./N;
    for(i=1+rank;i<=N;i+=nprocrs){
      x=h*(i-0.5);
      sum+=f(x);
    }
    mypi=sum*h;
    endtime=MPI_Wtime();
    runtime=endtime-starttime;
    MPI_Reduce(&mypi,&pi,1,MPI_DOUBLE,MPI_SUM,0,MPI_COMM_WORLD);
    MPI_Reduce(&runtime,&runtime_max,1,MPI_DOUBLE,MPI_MAX,0,MPI_COMM_WORLD);
    printf("Procr %d: runtime = %f\n",rank,runtime);
    fflush(stdout);
    if(rank==0){
      printf("For %d intervals, pi = %.14lf, error = %g\n",N,pi,fabs(pi-PI));

Compute PI
      printf("computed in = %f secs\n",runtime_max);
      fflush(stdout);
      printf("Do you wish to try another run? (y=1;n=0)");
      fgets(buff,MAXLEN,stdin);
      sscanf(buff,"%d",&answer);
    }
    /* processors wait while P(0) gets new input from user */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Bcast(&answer,1,MPI_INT,0,MPI_COMM_WORLD);
    if(!answer) break;
  }
end_program:
  printf("\nProcr %d: Saying good-bye!\n",rank);
  if(rank==0) printf("\nEND PROGRAM\n");
  MPI_Finalize();
  return 0;
}

Compile and Run Example 2 Compile –gcc -o pi.exe pi.c -lmpi $ mpirun -np 2 pi.exe Procr 1 saying hello. Procr 0 saying hello Using a total of 2 CPUs This program computes pi as 4.*Integral{0->1}[1/(1+x^2)] (Using PI = ) Input the Number of intervals: N = 10 pi will be computed with 10 intervals on 2 processors Procr 0: runtime = Procr 1: runtime = For 10 intervals, pi = , error = computed in = secs

What is OpenMP? Similar to MPI, but used for shared memory parallelism Simple set of directives Incremental parallelism Unfortunately only works with proprietary compilers…

Compilers and Platforms
Fujitsu/Lahey Fortran, C and C++ –Intel Linux Systems –Sun Solaris Systems
HP HP-UX PA-RISC/Itanium –Fortran –C –aC++
HP Tru64 Unix –Fortran –C –C++
IBM XL Fortran and C from IBM –IBM AIX Systems
Intel C++ and Fortran Compilers from Intel –Intel IA32 Linux Systems –Intel IA32 Windows Systems –Intel Itanium-based Linux Systems –Intel Itanium-based Windows Systems
Guide Fortran and C/C++ from Intel's KAI Software Lab –Intel Linux Systems –Intel Windows Systems
PGF77 and PGF90 Compilers from The Portland Group, Inc. (PGI) –Intel Linux Systems –Intel Solaris Systems –Intel Windows/NT Systems
SGI MIPSpro 7.4 Compilers –SGI IRIX Systems
Sun ONE Studio 8, Compiler Collection, Fortran 95, C, and C++ from Sun Microsystems –Sun Solaris Platforms
VAST from Veridian Pacific-Sierra Research –IBM AIX Systems –Intel IA32 Linux Systems –Intel Windows/NT Systems –SGI IRIX Systems –Sun Solaris Systems

How do you use OpenMP? –C/C++ API Parallel Construct – when a ‘region’ of the program can be executed by multiple parallel threads, this fundamental construct starts the parallel execution (a short example follows below). #pragma omp parallel [clause[ [, ]clause] …] new-line structured-block The clause is one of the following: if (scalar-expression) private (variable-list) firstprivate (variable-list) default (shared | none) shared (variable-list) copyin (variable-list) reduction (operator : variable-list) num_threads (integer-expression)
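A minimal sketch (not from the original slides) of the parallel construct with two of the clauses listed above; the thread count is illustrative:

#include <stdio.h>
#include <omp.h>
int main(void){
  int n = 4;                               /* requested number of threads  */
  #pragma omp parallel num_threads(n) shared(n)
  {
    /* this block is executed once by each thread in the team */
    printf("Hello from thread %d of %d\n",
           omp_get_thread_num(), omp_get_num_threads());
  }
  return 0;
}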

Fundamental Constructs for Construct –Defines an iterative work-sharing construct in which the iterations of the associated loop are executed in parallel. sections Construct –Identifies a noniterative work-sharing construct that specifies a set of structured blocks to be divided among the threads; each section is executed exactly once, by one of the threads (a sketch follows below).
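A minimal sketch (not part of the original slides) showing both work-sharing constructs; the arrays and loop bodies are placeholders:

#include <stdio.h>
#include <omp.h>
int main(void){
  int i, a[100], b[100];
  #pragma omp parallel
  {
    /* iterations of the loop are divided among the threads */
    #pragma omp for
    for (i = 0; i < 100; i++) a[i] = i * i;

    /* each section is executed once, by some thread in the team */
    #pragma omp sections
    {
      #pragma omp section
      { for (int j = 0; j < 100; j++) b[j] = a[j] + 1; }
      #pragma omp section
      { printf("second section ran on thread %d\n", omp_get_thread_num()); }
    }
  }
  printf("a[10] = %d, b[10] = %d\n", a[10], b[10]);
  return 0;
}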

single Construct –Associates a structured block’s execution with only one thread parallel for Construct –Shortcut for a parallel region containing only one for directive parallel sections Construct –Shortcut for a parallel region containing only a single sections directive (a combined example follows below)
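A minimal sketch (not from the original slides) of the parallel for shortcut together with single; the sum being computed is illustrative:

#include <stdio.h>
#include <omp.h>
int main(void){
  int i, n = 1000;
  double sum = 0.0;

  /* shortcut: one directive creates the team and shares the loop */
  #pragma omp parallel for reduction(+:sum)
  for (i = 0; i < n; i++) sum += 1.0 / (i + 1);

  #pragma omp parallel
  {
    /* only one thread (not necessarily the master) executes this block */
    #pragma omp single
    printf("harmonic sum = %f\n", sum);
  }
  return 0;
}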

Master and Synchronization Directives master Construct –Specifies a structured block that is executed by the master thread of the team critical Construct –Restricts execution of the associated structured block to a single thread at a time barrier Directive –Synchronizes all threads in a team. When this construct is encountered, all threads wait until the others have reached this point.
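A minimal sketch (not from the original slides) using master, barrier, and critical; the counter is illustrative:

#include <stdio.h>
#include <omp.h>
int main(void){
  int hits = 0;
  #pragma omp parallel
  {
    #pragma omp master
    printf("master thread %d sets things up\n", omp_get_thread_num());

    #pragma omp barrier              /* wait until the master is done     */

    #pragma omp critical             /* one thread at a time updates hits */
    hits++;
  }
  printf("hits = %d\n", hits);
  return 0;
}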

atomic Construct –Ensures that a specific memory location is updated ‘atomically’ (meaning only one thread is allowed write-access at a time) flush Directive –Specifies a “cross-thread” sequence point at which all threads in a team are ensured a “clean” view of certain objects in memory ordered Construct –A structured block following this directive will iterate in the same order as if executed in a sequential loop.
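A minimal sketch (not from the original slides) contrasting atomic with an ordered region inside a loop; the loop body is illustrative:

#include <stdio.h>
#include <omp.h>
int main(void){
  int i, count = 0;

  #pragma omp parallel for ordered
  for (i = 0; i < 8; i++){
    #pragma omp atomic               /* single memory update, done atomically */
    count++;

    #pragma omp ordered              /* printed in sequential loop order      */
    printf("iteration %d done\n", i);
  }
  printf("count = %d\n", count);
  return 0;
}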

Data How do we control the data in this SMP environment? –threadprivate Directive: makes file-scope and namespace-scope variables private to a thread Data-Sharing Attributes (a short example follows below) –private – private to each thread –firstprivate – private, initialized from the original variable –lastprivate – private, with the value from the last iteration copied back –shared – shared among all threads –default – sets the default data-sharing attribute –reduction – perform a reduction on scalars –copyin – assign the same value to threadprivate variables –copyprivate – broadcast the value of a private variable from one member of a team to the others
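A minimal sketch (not from the original slides) of private, firstprivate, shared, and reduction on one loop; the variables are illustrative:

#include <stdio.h>
#include <omp.h>
int main(void){
  int i, offset = 10;           /* copied into each thread via firstprivate     */
  int scratch = 0;              /* each thread gets its own uninitialized copy  */
  long total = 0;               /* combined across threads via reduction        */
  int n = 100;                  /* shared: read-only inside the loop            */

  #pragma omp parallel for default(none) \
          private(scratch) firstprivate(offset) shared(n) reduction(+:total)
  for (i = 0; i < n; i++){
    scratch = i + offset;       /* private work variable                        */
    total += scratch;
  }
  printf("total = %ld\n", total);
  return 0;
}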

Scalability test on SGI Origin 2000 Timing results of the dot product test in milliseconds for n = 16

Timing results of matrix times matrix test in milliseconds for n = 128

Architecture comparison

References –Book: Parallel Programming with MPI, Peter Pacheco –www-unix.mcs.anl.gov/mpi