Parallel Processing (CS 676) Lecture 7: Message Passing using MPI
Jeremy R. Johnson
Parts of this lecture were derived from chapters 3-5 and 11 in Pacheco.

Introduction
Objective: To introduce distributed memory parallel programming using message passing, and the MPI standard for message passing.
Topics
–Introduction to MPI (hello.c, hello.f)
–Example problem (numeric integration)
–Collective communication
–Performance model

MPI: Message Passing Interface
Distributed memory model
–Single Program Multiple Data (SPMD)
–Communication using message passing: Send/Recv
–Collective communication: Broadcast, Reduce (Allreduce), Gather (Allgather), Scatter, Alltoall

Benefits/Disadvantages
–No new language is required
–Portable
–Good performance
–Explicitly forces the programmer to deal with local/global access
–Harder to program than shared memory: requires larger program/algorithm changes

Further Information
–en.wikipedia.org/wiki/Message_Passing_Interface
–Textbook: Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann, 1997.

Basic MPI Functions

    int MPI_Init(
        int* argc       /* in/out */,
        char*** argv    /* in/out */)

    int MPI_Finalize(void)

    int MPI_Comm_size(
        MPI_Comm communicator        /* in */,
        int* number_of_processors    /* out */)

    int MPI_Comm_rank(
        MPI_Comm communicator        /* in */,
        int* my_rank                 /* out */)

Send
A message must be packaged in an envelope containing the destination, the size, an identifying tag, and the communicator (the set of processes participating in the communication).

    int MPI_Send(
        void* message             /* in */,
        int count                 /* in */,
        MPI_Datatype datatype     /* in */,
        int dest                  /* in */,
        int tag                   /* in */,
        MPI_Comm communicator     /* in */)

Receive

    int MPI_Recv(
        void* message             /* out */,
        int count                 /* in */,
        MPI_Datatype datatype     /* in */,
        int source                /* in */,
        int tag                   /* in */,
        MPI_Comm communicator     /* in */,
        MPI_Status* status        /* out */)

Status

    status->MPI_SOURCE
    status->MPI_TAG
    status->MPI_ERROR

    int MPI_Get_count(
        MPI_Status* status        /* in */,
        MPI_Datatype datatype     /* in */,
        int* count_ptr            /* out */)
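For example, a receiver that accepts a message from any source can use the returned status to find out who actually sent it and how many elements arrived. A minimal sketch (the buffer size and variable names are assumed):

    /* Sketch: inspect the status after a wildcard receive */
    char buf[100];
    MPI_Status status;
    int count;
    MPI_Recv(buf, 100, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
    MPI_Get_count(&status, MPI_CHAR, &count);   /* number of MPI_CHARs actually received */
    printf("got %d chars from process %d (tag %d)\n", count, status.MPI_SOURCE, status.MPI_TAG);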

hello.c (part 1)

    #include <stdio.h>
    #include <string.h>
    #include "mpi.h"

    int main(int argc, char* argv[]) {
        int my_rank;          /* rank of process */
        int p;                /* number of processes */
        int source;           /* rank of sender */
        int dest;             /* rank of receiver */
        int tag = 0;          /* tag for messages */
        char message[100];    /* storage for message */
        MPI_Status status;    /* return status for receive */

        /* Start up MPI */
        MPI_Init(&argc, &argv);
        /* Find out process rank */
        MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
        /* Find out number of processes */
        MPI_Comm_size(MPI_COMM_WORLD, &p);

hello.c (part 2)

        if (my_rank != 0) {
            /* create message */
            sprintf(message, "Greetings from process %d!\n", my_rank);
            dest = 0;
            /* use strlen + 1 so that '\0' gets transmitted */
            MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        } else {
            for (source = 1; source < p; source++) {
                MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
                printf("%s\n", message);
            }
        }

        /* Shut down MPI */
        MPI_Finalize();
        return 0;
    }

AnySource

        if (my_rank != 0) {
            /* create message */
            sprintf(message, "Greetings from process %d!\n", my_rank);
            dest = 0;
            /* use strlen + 1 so that '\0' gets transmitted */
            MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        } else {
            for (source = 1; source < p; source++) {
                MPI_Recv(message, 100, MPI_CHAR, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &status);
                printf("%s\n", message);
            }
        }

Ring Communication
Each process sends a message to its successor ((my_rank + 1) mod p) and receives one from its predecessor ((my_rank - 1) mod p).

First Attempt

    sprintf(message, "Greetings from process %d!\n", my_rank);
    dest = (my_rank + 1) % p;
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    source = (my_rank + p - 1) % p;   /* add p so the result is never negative */
    MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
    printf("PE %d received: %s\n", my_rank, message);

Deadlock

    sprintf(message, "Greetings from process %d!\n", my_rank);
    dest = (my_rank + 1) % p;
    source = (my_rank + p - 1) % p;
    MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
    MPI_Send(message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    printf("PE %d received: %s\n", my_rank, message);

Buffering Assumption
The previous code is not safe: it depends on sufficient system buffering being available, and deadlocks otherwise. MPI_Sendrecv can be used to guarantee that deadlock does not occur.

SendRecv

    int MPI_Sendrecv(
        void* send_buf            /* in */,
        int send_count            /* in */,
        MPI_Datatype send_type    /* in */,
        int dest                  /* in */,
        int send_tag              /* in */,
        void* recv_buf            /* out */,
        int recv_count            /* in */,
        MPI_Datatype recv_type    /* in */,
        int source                /* in */,
        int recv_tag              /* in */,
        MPI_Comm communicator     /* in */,
        MPI_Status* status        /* out */)

Correct Version with Sendrecv

    sprintf(omessage, "Greetings from process %d!\n", my_rank);
    dest = (my_rank + 1) % p;
    source = (my_rank + p - 1) % p;
    MPI_Sendrecv(omessage, strlen(omessage)+1, MPI_CHAR, dest, tag,
                 imessage, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
    printf("PE %d received: %s\n", my_rank, imessage);

Lower-Level Implementation

    sprintf(smessage, "Greetings from process %d!\n", my_rank);
    dest = (my_rank + 1) % p;
    source = (my_rank + p - 1) % p;
    if (my_rank % 2 == 0) {
        MPI_Send(smessage, strlen(smessage)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        MPI_Recv(dmessage, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
    } else {
        MPI_Recv(dmessage, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
        MPI_Send(smessage, strlen(smessage)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    printf("PE %d received: %s\n", my_rank, dmessage);

Compiling and Executing MPI Programs with OpenMPI
To compile a C program with MPI calls
–mpicc hello.c -o hello
To run an MPI program
–mpirun -np PROCS hello
–You can provide a hostfile with -hostfile NAME (see the man page for details)
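For example (the node names below are made up), an OpenMPI hostfile named myhosts listing two nodes with four slots each, and a run that uses it:

    # myhosts: one line per node; slots = processes allowed on that node
    node01 slots=4
    node02 slots=4

    mpirun -np 8 -hostfile myhosts ./hello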

dot.c

    float Serial_dot(
        float x[]   /* in */,
        float y[]   /* in */,
        int n       /* in */) {

        int i;
        float sum = 0.0;

        for (i = 0; i < n; i++)
            sum = sum + x[i]*y[i];
        return sum;
    }

Parallel Dot

    float Parallel_dot(
        float local_x[]   /* in */,
        float local_y[]   /* in */,
        int n_bar         /* in */) {

        float local_dot;
        float dot = 0.0;

        local_dot = Serial_dot(local_x, local_y, n_bar);
        MPI_Reduce(&local_dot, &dot, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
        return dot;   /* only meaningful on process 0, the root of the reduce */
    }

Parallel All Dot

    float Parallel_all_dot(
        float local_x[]   /* in */,
        float local_y[]   /* in */,
        int n_bar         /* in */) {

        float local_dot;
        float dot = 0.0;

        local_dot = Serial_dot(local_x, local_y, n_bar);
        /* Allreduce takes no root argument: every process receives the result */
        MPI_Allreduce(&local_dot, &dot, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
        return dot;
    }

Reduce

    int MPI_Reduce(
        void* operand             /* in */,
        void* result              /* out */,
        int count                 /* in */,
        MPI_Datatype datatype     /* in */,
        MPI_Op operator           /* in */,
        int root                  /* in */,
        MPI_Comm communicator     /* in */)

Operators
–MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MAXLOC, MPI_MINLOC
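MPI_MAXLOC and MPI_MINLOC reduce value/index pairs, so they require one of the paired datatypes such as MPI_FLOAT_INT. A minimal sketch (local_max and the variable names are assumed) that finds the global maximum and the rank that owns it:

    /* Sketch: global maximum and the rank that holds it, via MPI_MAXLOC */
    struct { float value; int rank; } local_pair, global_pair;
    local_pair.value = local_max;   /* assumed to hold this process's maximum */
    local_pair.rank  = my_rank;
    MPI_Reduce(&local_pair, &global_pair, 1, MPI_FLOAT_INT, MPI_MAXLOC, 0, MPI_COMM_WORLD);
    /* on process 0: global_pair.value is the overall max, global_pair.rank its owner */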

Reduce (tree reduction of x0 ... x7)
–Step 1: x0+x4, x1+x5, x2+x6, x3+x7
–Step 2: x0+x4+x2+x6, x1+x5+x3+x7
–Step 3 (root): x0+x1+x2+x3+x4+x5+x6+x7

AllReduce

    int MPI_Allreduce(
        void* operand             /* in */,
        void* result              /* out */,
        int count                 /* in */,
        MPI_Datatype datatype     /* in */,
        MPI_Op operator           /* in */,
        MPI_Comm communicator     /* in */)

Note: unlike MPI_Reduce, there is no root argument; every process receives the result.
Operators
–MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND, MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MAXLOC, MPI_MINLOC

AllReduce [figure: after the reduction, every process holds the combined result]

Broadcast

    int MPI_Bcast(
        void* message             /* in on root, out elsewhere */,
        int count                 /* in */,
        MPI_Datatype datatype     /* in */,
        int root                  /* in */,
        MPI_Comm communicator     /* in */)
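A typical use is distributing an input value that only the root has read; this is a sketch, and the variable n is assumed:

    /* Sketch: process 0 reads a problem size and broadcasts it to everyone */
    int n;
    if (my_rank == 0)
        scanf("%d", &n);                       /* only the root has the value so far */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    /* from here on, every process has the same n */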

Broadcast [figure]

Gather

    int MPI_Gather(
        void* send_data           /* in */,
        int send_count            /* in */,
        MPI_Datatype send_type    /* in */,
        void* recv_data           /* out */,
        int recv_count            /* in */,
        MPI_Datatype recv_type    /* in */,
        int root                  /* in */,
        MPI_Comm communicator     /* in */)

[figure: processes 0-3 each contribute x0, x1, x2, x3; the root collects them into one array]
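As a sketch (LOCAL_N and MAX_PROCS are assumed compile-time bounds, not part of the original code), each process contributes its block of a distributed vector and the root ends up with the whole array; note that recv_count is the count received from each process, not the total:

    /* Sketch: gather each process's block of LOCAL_N floats onto process 0 */
    float local_block[LOCAL_N];                 /* this process's piece */
    float whole[MAX_PROCS * LOCAL_N];           /* only meaningful on the root */
    MPI_Gather(local_block, LOCAL_N, MPI_FLOAT,
               whole, LOCAL_N, MPI_FLOAT, 0, MPI_COMM_WORLD);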

Scatter

    int MPI_Scatter(
        void* send_data           /* in */,
        int send_count            /* in */,
        MPI_Datatype send_type    /* in */,
        void* recv_data           /* out */,
        int recv_count            /* in */,
        MPI_Datatype recv_type    /* in */,
        int root                  /* in */,
        MPI_Comm communicator     /* in */)

[figure: the root holds x0, x1, x2, x3 and sends one piece to each of processes 0-3]
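The reverse operation, again as a sketch with the same assumed bounds: the root deals out equal-sized blocks, and send_count is the count sent to each process:

    /* Sketch: process 0 deals out p blocks of LOCAL_N floats, one block per process */
    float whole[MAX_PROCS * LOCAL_N];           /* filled on the root before the call */
    float local_block[LOCAL_N];                 /* each process's piece after the call */
    MPI_Scatter(whole, LOCAL_N, MPI_FLOAT,
                local_block, LOCAL_N, MPI_FLOAT, 0, MPI_COMM_WORLD);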

AllGather

    int MPI_Allgather(
        void* send_data           /* in */,
        int send_count            /* in */,
        MPI_Datatype send_type    /* in */,
        void* recv_data           /* out */,
        int recv_count            /* in */,
        MPI_Datatype recv_type    /* in */,
        MPI_Comm communicator     /* in */)

[figure: processes 0-3 each contribute x0, x1, x2, x3, and every process receives the full array]

Matrix-Vector Product (block cyclic storage)
y = Ax, where y_i = Σ_{0 ≤ j < n} A_{ij} x_j for 0 ≤ i < m
–Store blocks of A, x, y in local memory
–Gather local blocks of x in each process
–Compute chunks of y in parallel
[figure: A, x, and y partitioned by rows across processes 0-3]

Parallel Matrix-Vector Product

    void Parallel_matrix_vector_product(
        LOCAL_MATRIX_T local_A,
        int m, int n,
        float local_x[], float global_x[], float local_y[],
        int local_m, int local_n) {

        /* local_m = m/p, local_n = n/p */
        int i, j;

        MPI_Allgather(local_x, local_n, MPI_FLOAT,
                      global_x, local_n, MPI_FLOAT, MPI_COMM_WORLD);
        for (i = 0; i < local_m; i++) {
            local_y[i] = 0.0;
            for (j = 0; j < n; j++)
                local_y[i] = local_y[i] + local_A[i][j]*global_x[j];
        }
    }

Embarrassingly Parallel Example
Numerical integration (trapezoid rule):
∫_a^b f(t) dt ≈ (h/2)[f(x_0) + 2f(x_1) + … + 2f(x_{n-1}) + f(x_n)],
where h = (b-a)/n and x_i = a + i·h (so x_0 = a and x_n = b).
[figure: trapezoids under f between a and b, with interior points x_1, …, x_{n-1}]
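A minimal parallel sketch of this example (Trap, the interval, and n are assumed, and n is taken to be divisible by p): each process integrates its own sub-interval and the pieces are combined with MPI_Reduce:

    /* Sketch: parallel trapezoid rule; each process integrates its own sub-interval */
    float Trap(float left, float right, int count, float h);   /* serial trapezoid rule (assumed) */

    float a = 0.0, b = 1.0;          /* whole interval */
    int   n = 1024;                  /* total number of trapezoids (assume p divides n) */
    float h = (b - a) / n;           /* same trapezoid width everywhere */
    int   local_n = n / p;           /* trapezoids per process */
    float local_a = a + my_rank * local_n * h;
    float local_b = local_a + local_n * h;
    float local_int, total_int;

    local_int = Trap(local_a, local_b, local_n, h);
    MPI_Reduce(&local_int, &total_int, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (my_rank == 0)
        printf("Integral from %f to %f = %f\n", a, b, total_int);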