Message-Passing Computing

Message-Passing Computing ITCS 4/5145 Parallel Computing, UNC-Charlotte, B. Wilkinson, 2013. Oct 18, 2013.

Computer Cluster Complete computers connected together through an interconnection network, often an Ethernet switch. The memory of each computer is not directly accessible from the other computers (a distributed memory system). Programming model: separate processes run on each computer and communicate through explicit messages to exchange data and to synchronize.

Typical cluster (diagram): user computers connect through a switch to the master node of a dedicated cluster; the master node's Ethernet interface connects through a second switch to the compute nodes, with message passing between nodes.

The patterns we have introduced are message-passing patterns, for example workpool, stencil, etc. Workpool pattern (diagram): a master process exchanges messages with slave processes, each a separate process or thread; the messages to/from pairs of processes might be combined into broadcast/scatter/gather operations. In these patterns, we do not provide for sharing of any data. Data has to be explicitly sent between processes.

Software Tools for Message-Passing Parallel Programming Late 1980s: Parallel Virtual Machine (PVM) developed. Became very popular. Mid 1990s: Message-Passing Interface (MPI) standard defined. Both provide a set of user-level libraries for message passing, used with sequential programming languages (C, C++, ...).

MPI (Message Passing Interface) Message-passing library standard developed by a group of academics and industrial partners to foster more widespread use and portability. Defines the routines, not the implementation. Several free implementations exist, e.g. Open MPI, MPICH, LAM, …

Message passing concept using library routines

Implementation Message routing between computers might be done by daemon processes installed on the computers that form the “virtual machine”. Diagram: each workstation runs a daemon process and an application program (executable); messages are sent between workstations through the network. There can be more than one process running on each computer.

Message-Passing Programming using User-level Message-Passing Libraries Two primary mechanisms needed: 1. A method of creating processes for execution on different computers 2. A method of sending and receiving messages

Creating processes on different computers

1. Multiple Program, Multiple Data (MPMD) model Different programs executed by each processor. Diagram: a separate source file for each processor is compiled to suit that processor, producing one executable per processor (processor 0 ... processor p-1).

2. Single Program, Multiple Data (SPMD) model Same program executed by each processor; control statements select different parts for each processor to execute. This is the basic MPI way. Diagram: a single source file is compiled to suit each processor, producing the executables for processor 0 ... processor p-1.

Static process creation All executables started together, done when one starts the compiled programs. Normal MPI way. It is also possible to dynamically start processes from within an executing process (fork-style) in MPI-2, which might find applicability if one does not initially know how many processes are needed.
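
A minimal sketch of MPI-2 dynamic process creation using MPI_Comm_spawn() (an illustration only, not part of the course examples; the worker executable name ./worker and the count of 4 are made-up assumptions):

#include "mpi.h"

int main(int argc, char **argv)
{
   MPI_Comm workers;        /* intercommunicator to the spawned processes */
   int errcodes[4];
   MPI_Init(&argc, &argv);
   /* Root process 0 of MPI_COMM_WORLD spawns 4 additional processes running ./worker */
   MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                  0, MPI_COMM_WORLD, &workers, errcodes);
   /* ... communicate with the workers through the "workers" intercommunicator ... */
   MPI_Finalize();
   return 0;
}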

MPI program structure Takes command line arguments (see later about executing code):

int main(int argc, char **argv) {
   MPI_Init(&argc, &argv);
   // Code executed by all processes
   MPI_Finalize();
}

How is the number of processes determined? Number of processes determined at program execution time. When you run your MPI program, you can specify how many processes you want on the command line:

mpirun -np 8 <program>

The -np option tells mpirun to run your parallel program using the specified number of processes. Originally, compile and execution commands were not part of the MPI standard. The more recent command to execute the program, mpiexec, is now regarded as part of the MPI standard.

In MPI, processes within a defined “communicating group” are given a number called a rank, starting from zero onwards. The program uses control constructs, typically IF statements, to direct processes to perform specific actions. Example:

if (rank == 0) ... /* do this */;
if (rank == 1) ... /* do this */;

Master-Slave approach Usually the computation is constructed as a master-slave model: one process (the master) performs one set of actions and all the other processes (the slaves) perform identical actions, although on different data, i.e.

if (rank == 0) ... /* master do this */;
else ... /* all slaves do this */;
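
A minimal, complete sketch of this structure (not from the slides; the “doubling” work and the variable names are assumptions made for illustration):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char **argv)
{
   int rank, size, value, result, i;
   MPI_Status status;

   MPI_Init(&argc, &argv);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Comm_size(MPI_COMM_WORLD, &size);

   if (rank == 0) {                       /* master */
      for (i = 1; i < size; i++) {
         value = i * 10;                  /* some work item for slave i */
         MPI_Send(&value, 1, MPI_INT, i, 0, MPI_COMM_WORLD);
      }
      for (i = 1; i < size; i++) {
         MPI_Recv(&result, 1, MPI_INT, i, 0, MPI_COMM_WORLD, &status);
         printf("Result from slave %d: %d\n", i, result);
      }
   } else {                               /* slaves */
      MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
      result = 2 * value;                 /* identical work on different data */
      MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
   }
   MPI_Finalize();
   return 0;
}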

Methods of sending and receiving messages

MPI point-to-point message passing using the MPI_Send() and MPI_Recv() library calls. To send a message, x, from a source process with rank 1 to a destination process with rank 2 and assign it to y: process 1 calls MPI_Send(&x, ..., 2, ...), giving the buffer holding the data and the destination rank; process 2 calls MPI_Recv(&y, ..., 1, ...), giving a buffer to hold the data and the source rank, and waits for the message from process 1.

Semantics of MPI_Send() and MPI_Recv() Called blocking, which in MPI means the routine waits until all its local actions within the process have taken place before returning. After returning, any local variables used can be altered without affecting the message transfer, but not before. MPI_Send() – when it returns, the message may not have reached its destination, but the process can continue in the knowledge that the message is safely on its way. MPI_Recv() – returns when the message has been received and the data collected. Will cause the process to stall until the message is received. Other versions of MPI_Send() and MPI_Recv() have different semantics.
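
A small sketch of this behavior (illustrative only; rank and status are assumed to have been obtained/declared as on the earlier slides):

int x = 42;
if (rank == 0) {
   MPI_Send(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
   x = 0;   /* safe: MPI_Send() has returned, so altering x cannot affect the message */
} else if (rank == 1) {
   MPI_Recv(&x, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);   /* blocks until the data has arrived */
}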

Message Tag Used to differentiate between different types of messages being sent. The message tag is carried within the message. If special tag matching is not required, a wild card message tag is used; then recv() will match with any send().
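
For illustration (not from the slides; the tag values, and the variables a, b, rank and status, are assumptions), two different tags can keep two kinds of message separate even between the same pair of processes:

#define DATA_TAG   1
#define RESULT_TAG 2

if (rank == 0) {
   MPI_Send(&a, 1, MPI_INT, 1, DATA_TAG, MPI_COMM_WORLD);
   MPI_Send(&b, 1, MPI_INT, 1, RESULT_TAG, MPI_COMM_WORLD);
} else if (rank == 1) {
   /* Each receive matches only the message carrying its own tag. */
   MPI_Recv(&a, 1, MPI_INT, 0, DATA_TAG, MPI_COMM_WORLD, &status);
   MPI_Recv(&b, 1, MPI_INT, 0, RESULT_TAG, MPI_COMM_WORLD, &status);
}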

Message Tag Example To send a message, x, from a source process with rank 1, with message tag 5, to a destination process with rank 2 and assign it to y: process 1 calls MPI_Send(&x, 2, …, 5, …), giving the buffer holding the data, the destination rank and the tag; process 2 calls MPI_Recv(&y, 1, …, 5, …), giving a buffer to hold the data, the source rank and the tag, and waits for a message from process 1 with a tag of 5.

Unsafe message passing - Example (diagram): process 0 executes send(…,1,…), then a library routine lib() that itself executes send(…,1,…); process 1 executes recv(…,0,…), then lib(), which executes recv(…,0,…). (a) Intended behavior: each user send is matched with the user recv and each library send with the library recv. (b) Possible behavior: a send from the user code is matched by the library's recv, or vice versa. Tags alone will not fix this, as the same tag numbers might be used.

MPI Solution “Communicators” Defines a communication domain - a set of processes that are allowed to communicate between themselves. Communication domains of libraries can be separated from that of a user program. Used in all point-to-point and collective MPI message-passing communications. Process rank is a “rank” in a particular communicator. Note: Intracommunicator – for communicating within a single group of processes. Intercommunicator – for communicating between two or more groups of processes.

Default Communicator MPI_COMM_WORLD Exists as the first communicator for all processes existing in the application. Process rank in MPI_COMM_WORLD obtained from: MPI_Comm_rank(MPI_COMM_WORLD, &myrank); A set of MPI routines exists for forming additional communicators, although we will not use them.
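
As an illustration only (the variable names and the even/odd split rule are assumptions), one such routine is MPI_Comm_split(), which partitions an existing communicator into new ones:

MPI_Comm subcomm;
int world_rank, sub_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
/* Processes supplying the same "color" are placed in the same new communicator;
   here even and odd world ranks form two separate groups. */
MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &subcomm);
MPI_Comm_rank(subcomm, &sub_rank);     /* rank within the new, smaller communicator */
MPI_Comm_free(&subcomm);               /* release the communicator when done */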

Parameters of blocking send MPI_Send(buf, count, datatype, dest, tag, comm) where buf is the address of the send buffer (notice it is a pointer), count is the number of items to send, datatype is the datatype of each item, dest is the rank of the destination process, tag is the message tag, and comm is the communicator.

Parameters of blocking receive MPI_Recv(buf, count, datatype, src, tag, comm, status) where buf is the address of the receive buffer, count is the maximum number of items to receive, datatype is the datatype of each item, src is the rank of the source process, tag is the message tag, comm is the communicator, and status is the status after the operation. In our code we do not check the status, but it might be good programming practice to do so. Usually the send and receive counts are the same.
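
A short sketch of such a check (the buffer size of 100 is an assumption), using the status fields and MPI_Get_count() from the MPI standard:

int buffer[100], count;
MPI_Status status;
MPI_Recv(buffer, 100, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("Received from rank %d with tag %d\n", status.MPI_SOURCE, status.MPI_TAG);
MPI_Get_count(&status, MPI_INT, &count);   /* actual number of items received (may be less than 100) */
printf("%d integers received\n", count);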

MPI Datatypes (defined in mpi.h) MPI_BYTE, MPI_PACKED, MPI_CHAR, MPI_SHORT, MPI_INT, MPI_LONG, MPI_FLOAT, MPI_DOUBLE, MPI_LONG_DOUBLE, MPI_UNSIGNED_CHAR Slide from C. Ferner, UNC-W

Wild cards -- any source or tag In MPI_Recv(), source can be MPI_ANY_SOURCE and tag can be MPI_ANY_TAG. These cause MPI_Recv() to take any message destined for the current process regardless of source and/or tag. Example:

MPI_Recv(message, 256, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);

Program Examples To send an integer x from process 0 to process 1 and assign to y:

int x, y;  // all processes have their own copies of x and y
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  // find rank
if (myrank == 0) {
   MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
   MPI_Recv(&y, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}

Another version To send an integer x from process 0 to process 1 and assign to y:

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  // find rank
if (myrank == 0) {
   int x;
   MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
   int y;
   MPI_Recv(&y, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}

What is the difference?

Sample MPI Hello World program

#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char **argv) {
   char message[20];
   int i, rank, size, type = 99;
   MPI_Status status;
   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &size);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   if (rank == 0) {
      strcpy(message, "Hello, world");
      for (i = 1; i < size; i++)
         MPI_Send(message, 13, MPI_CHAR, i, type, MPI_COMM_WORLD);
   } else
      MPI_Recv(message, 20, MPI_CHAR, 0, type, MPI_COMM_WORLD, &status);
   printf("Message from process =%d : %.13s\n", rank, message);
   MPI_Finalize();
   return 0;
}

The program sends the message “Hello, world” from the master process (rank = 0) to each of the other processes (rank != 0). Then, all processes execute a printf statement. In MPI, standard output is automatically redirected from the remote computers to the user's console (thankfully!) so the final result on the console will be:

Message from process =1 : Hello, world
Message from process =0 : Hello, world
Message from process =2 : Hello, world
Message from process =3 : Hello, world
...

except that the order of the messages might be different and is unlikely to be in ascending order of process ID; it will depend upon how the processes are scheduled.

Another Example (array)

int array[100];
…  // rank 0 fills the array with data
if (rank == 0)
   MPI_Send(array, 100, MPI_INT, 1, 0, MPI_COMM_WORLD);            // 100 elements, destination 1, tag 0
else if (rank == 1)
   MPI_Recv(array, 100, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);   // source 0, tag 0

Slide based upon slide from C. Ferner, UNC-W

Another Example (Ring) Each process (except the master) receives a token from the process with rank one less than its own rank. Then each process increments the token by 2 and sends it to the next process (with rank one more than its own). The last process sends the token back to the master. (Diagram: eight processes, 0 through 7, arranged in a ring.) Question: Do we have a pattern for this? Slide based upon slides from C. Ferner, UNC-W

Ring Example

#include <stdio.h>
#include <mpi.h>

int main (int argc, char *argv[]) {
   int token, NP, myrank;
   MPI_Status status;
   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &NP);
   MPI_Comm_rank(MPI_COMM_WORLD, &myrank);

Ring Example continued

   if (myrank == 0) {
      token = -1;   // Master sets initial value before sending.
   } else {
      // Everyone except master receives from process 1 less
      // than its own rank.
      MPI_Recv(&token, 1, MPI_INT, myrank - 1, 0, MPI_COMM_WORLD, &status);
      printf("Process %d received token %d from process %d\n",
             myrank, token, myrank - 1);
   }

Ring Example continued

   // all processes
   token += 2;   // add 2 to token before sending it
   MPI_Send(&token, 1, MPI_INT, (myrank + 1) % NP, 0, MPI_COMM_WORLD);

   // Now process 0 can receive from the last process.
   if (myrank == 0) {
      MPI_Recv(&token, 1, MPI_INT, NP - 1, 0, MPI_COMM_WORLD, &status);
      printf("Process %d received token %d from process %d\n",
             myrank, token, NP - 1);
   }
   MPI_Finalize();
   return 0;
}

Results (Ring)

Process 1 received token 1 from process 0
Process 2 received token 3 from process 1
Process 3 received token 5 from process 2
Process 4 received token 7 from process 3
Process 5 received token 9 from process 4
Process 6 received token 11 from process 5
Process 7 received token 13 from process 6
Process 0 received token 15 from process 7

Matching up sends and recvs Notice in the code how you have to be very careful matching up sends and recvs. Every send must have a matching recv. The sends return after their local actions complete, but a recv will wait for its message, so it is easy to get deadlock if the program is written wrongly. Pre-implemented patterns are designed to avoid deadlock. We will look at deadlock again.
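
An illustrative sketch of the deadlock risk and one way to avoid it (not from the slides; the variables rank, status, in, out and other are assumptions, and exactly two processes are assumed). If both processes call MPI_Recv() before MPI_Send(), each stalls waiting for the other; ordering the calls by rank, or using MPI_Sendrecv(), avoids this:

int in, out = rank;        /* illustrative data to exchange */
int other = 1 - rank;      /* partner process, assuming only ranks 0 and 1 */

/* Deadlock-prone: both processes would block in MPI_Recv() and never reach MPI_Send().
   MPI_Recv(&in, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
   MPI_Send(&out, 1, MPI_INT, other, 0, MPI_COMM_WORLD);                 */

/* Safe ordering: rank 0 sends first, rank 1 receives first. */
if (rank == 0) {
   MPI_Send(&out, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
   MPI_Recv(&in, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
} else {
   MPI_Recv(&in, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &status);
   MPI_Send(&out, 1, MPI_INT, other, 0, MPI_COMM_WORLD);
}

/* Alternatively, MPI_Sendrecv() performs the exchange safely in one call: */
MPI_Sendrecv(&out, 1, MPI_INT, other, 0,
             &in, 1, MPI_INT, other, 0, MPI_COMM_WORLD, &status);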

Measuring Execution Time MPI provides the routine MPI_Wtime() for returning the time (in seconds) from some point in the past. To measure the execution time between point L1 and point L2 in the code, one might have a construction such as:

double start_time, end_time, exe_time;
L1: start_time = MPI_Wtime();
    .
    .
L2: end_time = MPI_Wtime();
    exe_time = end_time - start_time;

Using C time routines To measure the execution time between point L1 and point L2 in the code, one might have a construction such as:

#include <time.h>
time_t t1, t2;
double elapsed_Time;
L1: time(&t1);                        /* start timer */
    .
    .
L2: time(&t2);                        /* stop timer */
    elapsed_Time = difftime(t2, t1);  /* time = t2 - t1 */
    printf("Elapsed time = %5.2f secs", elapsed_Time);

gettimeofday()

#include <sys/time.h>
double elapsed_time;
struct timeval tv1, tv2;
gettimeofday(&tv1, NULL);
…                                   /* measure time to execute this section */
gettimeofday(&tv2, NULL);
elapsed_time = (tv2.tv_sec - tv1.tv_sec) + ((tv2.tv_usec - tv1.tv_usec) / 1000000.0);

Using the time() or gettimeofday() routines may be useful if you want to compare with a sequential C version of the program with the same libraries.

Compiling and executing MPI programs (without a scheduler)

MPICH Commands Two basic commands: mpicc, a script to compile MPI programs (same command line options as gcc), and mpiexec, the MPI-2 standard command. (mpiexec replaces the earlier mpirun command, although mpirun still exists.)

Compiling/executing MPI program on command line To start MPI: depends upon the implementation, usually nothing special. (For the UNC-C cluster, make sure the MPI mpd daemons are running.) To compile MPI programs: for C, mpicc -o prog prog.c; for C++, mpiCC -o prog prog.cpp. To execute an MPI program: mpiexec -n no_procs prog or mpirun -np no_procs prog, where no_procs is a positive integer.

Setting Up the Message Passing Environment Usually computers specified in a file, called a hostfile or machines file. File contains names of computers and possibly number of processes that should run on each computer. Implementation-specific algorithm selects computers from list to run user programs. If a machines file not specified, a default machines file used or it may be that program will only run on a single computer.

Executing program on multiple computers Create a file called, say, “machines” containing the names of the computers you wish to use, each name on one line. Then specify the file with the -machinefile option. For example: mpiexec -machinefile machines -n 4 ./prog would run prog with four processes. Each process would execute on one of the machines in the list. MPI would cycle through the list of machines, giving processes to machines (round robin scheduling).

Executing program on UNCC cluster In a cluster, the internal compute nodes have names used just internally. For example, a machines file to use nodes 5, 7 and 8 and the front node of the cci-grid0x cluster would be:

cci-grid05
cci-grid07
cci-grid08
cci-gridgw.uncc.edu

Then: mpiexec -machinefile machines -n 4 ./prog would run prog with four processes, one on cci-grid05, one on cci-grid07, one on cci-grid08, and one on cci-gridgw.uncc.edu.

Specifying number of processes to execute on each computer The machines file can include how many processes to execute on each computer. For example:

# a comment
cci-grid05:2           # first 2 processes on 05
cci-grid07:3           # next 3 processes on 07
cci-grid08:4           # next 4 processes on 08
cci-gridgw.uncc.edu:1  # last process on gridgw (09)

Then to run the 10 processes in total: mpiexec -machinefile machines -n 10 ./prog If more processes were specified, they would be scheduled in round robin fashion.

MPI Scalable Process Management System (Hydra) The UNCC cci-grid0x cluster uses the MPI Scalable Process Management System (Hydra). To execute MPI programs, use: mpiexec.hydra -machinefile machines -n 4 ./prog -f <hostfile> can also specify host names on which to run the application. More information on Hydra Commands: http://software.intel.com/sites/products/documentation/hpc/ics/impi/41/lin/Reference_Manual/index.htm#Global_Options.htm Some commands do not seem to work (-hosts, -machine)

Programming Tools

Eclipse IDE PTP Parallel Tools Platform An Eclipse IDE platform that supports development of parallel programs (MPI, OpenMP). http://download.eclipse.org/tools/ptp/docs/ptp-sc11-slides-final.pdf

Visualization Tools Programs can be watched as they are executed in a space-time diagram (or process-time diagram), which shows, for each process, the periods spent computing, waiting, and in message-passing system routines, with the messages drawn between processes. Visualization tools are available for MPI, e.g., Upshot.

Questions

Next topic More on MPI