Parallel Programming with Message-Passing Interface (MPI)


Parallel Programming with Message-Passing Interface (MPI) An Introduction WW Grid Rajkumar Buyya Grid Computing and Distributed Systems (GRIDS) Lab. The University of Melbourne Melbourne, Australia www.gridbus.org

Message-Passing Programming Paradigm Each processor in a message-passing program runs a sub-program written in a conventional sequential language. All variables are private; processes communicate via special subroutine calls. (Diagram: processors P, each with its own memory M, connected by an interconnection network.)

These MPI slides are derived from: Dirk van der Knijff, High Performance Parallel Programming (PPT slides); MPI Notes, Maui HPC Centre; and the Melbourne Advanced Research Computing Centre, http://www.hpc.unimelb.edu.au

Single Program Multiple Data Introduced in data parallel programming (HPF) Same program runs everywhere Restriction on general message-passing model Some vendors only support SPMD parallel programs Usual way of writing MPI programs General message-passing model can be emulated

SPMD example
main(int argc, char **argv)
{
    if (/* process is to become master */)
        MasterRoutine(/* arguments */);
    else  /* it is a worker process */
        WorkerRoutine(/* arguments */);
}

Messages Messages are packets of data moving between sub-programs. The message passing system has to be told the following information: sending processor, source location, data type, data length, receiving processor(s), destination location, destination size.

Messages Access: each sub-program needs to be connected to the message passing system. Addressing: messages need addresses to be sent to. Reception: the receiving process must be capable of dealing with the messages it is sent. A message passing system is similar to a post office, phone line, fax, e-mail, etc. Message types: point-to-point, collective, synchronous (like a telephone) / asynchronous (like postal mail).

Point-to-Point Communication Simplest form of message passing One process sends a message to another Several variations on how sending a message can interact with execution of the sub-program

Point-to-Point variations Synchronous sends provide information about the completion of the message (e.g. fax machines). Asynchronous sends only know when the message has left (e.g. post cards). Blocking operations return from the call only when the operation has completed. Non-blocking operations return straight away; you can test/wait later for completion.

Collective Communications Collective communication routines are higher-level routines involving several processes at a time. They can be built out of point-to-point communications. Barriers synchronise processes. Broadcast is one-to-many communication. Reduction operations combine data from several processes to produce a single result.

Message Passing Systems Initially each manufacturer developed their own system, with a wide range of features that were often incompatible. Several groups developed systems for workstations. PVM (Parallel Virtual Machine) was the de facto standard before MPI: open source (NOT public domain!), a user interface to the system (daemons), and support for dynamic environments.

MPI Forum - www.mpi-forum.org Sixty people from forty different organisations, both users and vendors, from the US and Europe. A two-year process of proposals, meetings and review produced a document defining a standard Message Passing Interface (MPI). Its goals: to provide source-code portability, to allow efficient implementation, to provide a high level of functionality, to support heterogeneous parallel architectures, and parallel I/O (in MPI 2.0). MPI 1.0 contains over 115 routines/functions that can be grouped into 8 categories.

General MPI Program Structure MPI Include File Initialise MPI Environment Do work and perform message communication Terminate MPI Environment

MPI programs MPI is a library - there are NO language changes. Header files, C: #include <mpi.h> MPI function format, C: error = MPI_Xxxx(parameter, ...); or, ignoring the return code, MPI_Xxxx(parameter, ...);

MPI helloworld.c
#include <mpi.h>
#include <stdio.h>
main(int argc, char **argv)
{
    int numtasks, rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("Hello World from process %d of %d\n", rank, numtasks);
    MPI_Finalize();
}

Example - C
#include <mpi.h>
/* include other usual header files */
main(int argc, char **argv)
{
    /* initialize MPI */
    MPI_Init(&argc, &argv);
    /* main part of program */
    /* terminate MPI */
    MPI_Finalize();
    exit(0);
}

Handles MPI controls its own internal data structures. MPI releases 'handles' to allow programmers to refer to these. C handles are of distinct typedef'd types and arrays are indexed from 0. Some arguments can be of any type - in C these are declared as void *.

Initializing MPI The first MPI routine called in any MPI program must be MPI_Init. The C version accepts the arguments to main: int MPI_Init(int *argc, char ***argv); MPI_Init must be called by every MPI program. Making multiple MPI_Init calls is erroneous. MPI_Initialized is the one exception to the first rule: it may be called beforehand to test whether MPI_Init has already been called.

MPI_COMM_WORLD MPI_Init defines a communicator called MPI_COMM_WORLD for every process that calls it. All MPI communication calls require a communicator argument. MPI processes can only communicate if they share a communicator. A communicator contains a group, which is a list of processes. Each process has its rank within the communicator. A process can belong to several communicators.

Communicators MPI uses objects called communicators to define which collection of processes may communicate with each other. Every process has a unique integer identifier, its rank, assigned by the system when the process initialises. A rank is sometimes called a process ID. Processes can request information from a communicator: MPI_Comm_rank(MPI_Comm comm, int *rank) returns the rank of the calling process in comm. MPI_Comm_size(MPI_Comm comm, int *size) returns the size of the group in comm.

Finishing up An MPI program should call MPI_Finalize when all communications have completed. Once called, no other MPI calls can be made. Aborting: MPI_Abort(comm, errorcode) attempts to abort all processes listed in comm; if comm = MPI_COMM_WORLD, the whole program terminates.

MPI Programs Compilation and Execution Let us look at the Manjra Linux cluster.

Manjra: GRIDS Lab Linux Cluster Master node: manjra.cs.mu.oz.au - dual Xeon 2 GHz, 512 MB memory, 250 GB integrated storage, gigabit LAN, CDROM and floppy drives, Red Hat Linux release 7.3 (Valhalla). Worker nodes (node1..node13): each of the 13 worker nodes has a Pentium 4 2 GHz CPU and a 40 GB hard disk. (Diagram: the Manjra Linux cluster, with master manjra.cs.mu.oz.au and internal worker nodes node1, node2, ..., node13.)

How the legion cluster looks (front view and back view)

A legion cluster view from an angle!

Compile and Run Commands
Compile: manjra> mpicc helloworld.c -o helloworld
Run: manjra> mpirun -np 3 -machinefile machines.list helloworld
(-np sets the number of processes.)
The file machines.list contains the node list:
manjra.cs.mu.oz.au
node1
node2
node3
node4
node6
(node5 and node7 are not working today!)

Sample Run and Output
A run with 3 processes:
manjra> mpirun -np 3 -machinefile machines.list helloworld
Hello World from process 0 of 3
Hello World from process 1 of 3
Hello World from process 2 of 3
A run by default:
manjra> helloworld
Hello World from process 0 of 1

Sample Run and Output
A run with 6 processes:
manjra> mpirun -np 6 -machinefile machines.list helloworld
Hello World from process 0 of 6
Hello World from process 3 of 6
Hello World from process 1 of 6
Hello World from process 5 of 6
Hello World from process 4 of 6
Hello World from process 2 of 6
Note: process output need not appear in process-number order.

Sample Run and Output
A run with 6 processes:
manjra> mpirun -np 6 -machinefile machines.list helloworld
Hello World from process 0 of 6
Hello World from process 3 of 6
Hello World from process 1 of 6
Hello World from process 2 of 6
Hello World from process 5 of 6
Hello World from process 4 of 6
Note: the output order has changed. The process-to-node mapping can differ from run to run, and the machines may carry different loads; hence the difference.

Hello World with Error Check

MPI Routines

MPI Routines – C and Fortran Environment Management Point-to-Point Communication Collective Communication Process Group Management Communicators Derived Type Virtual Topologies Miscellaneous Routines

Environment Management Routines

Point-to-Point Communication

Collective Communication Routines

Process Group Management Routines

Communicators Routines

Derived Type Routines

Virtual Topologies Routines

Miscellaneous Routines

MPI Messages A message contains a number of elements of some particular datatype. MPI datatypes: basic types and derived types. Derived types can be built up from basic types. C types are different from Fortran types.

MPI Basic Datatypes - C

Point-to-Point Communication Communication between two processes: the source process sends a message to the destination process. Communication takes place within a communicator. The destination process is identified by its rank in the communicator. MPI provides four communication modes for sending messages: standard, synchronous, buffered, and ready. There is only one mode for receiving.

Standard Send Completes once the message has been sent. Note: it may or may not have been received. Programs should obey the following rules: do not assume the send will complete before the receive begins - that can lead to deadlock; do not assume the send will complete after the receive begins - that can lead to non-determinism; processes should be eager readers - they should guarantee to receive all messages sent to them - otherwise the network may be overloaded. A standard send can be implemented as either a buffered send or a synchronous send.

Standard Send (cont.) MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) buf: the address of the data to be sent. count: the number of elements of datatype that buf contains. datatype: the MPI datatype. dest: rank of the destination in communicator comm. tag: a marker used to distinguish different message types. comm: the communicator shared by sender and receiver. (In Fortran, an extra ierror argument returns the error code.)

Standard Blocking Receive Note: all sends so far have been blocking (but this only makes a difference for synchronous sends). Completes when the message has been received. MPI_Recv(buf, count, datatype, source, tag, comm, status) source: rank of the source process in communicator comm. status: returns information about the message. Synchronous blocking message-passing: the processes synchronise; the sender specifies the synchronous mode; blocking means both processes wait until the transaction has completed.

For a communication to succeed: the sender must specify a valid destination rank; the receiver must specify a valid source rank; the communicator must be the same; the tags must match; the message types must match; the receiver's buffer must be large enough. The receiver can use wildcards: MPI_ANY_SOURCE, MPI_ANY_TAG; the actual source and tag are returned in the status parameter.

Standard/Blocked Send/Receive

MPI Send/Receive a Character
#include <mpi.h>
#include <stdio.h>
int main(int argc, char *argv[])
{
    int numtasks, rank, dest, source, rc, tag = 1;
    char inmsg, outmsg = 'X';
    MPI_Status Stat;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        dest = 1;
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        printf("Rank0 sent: %c\n", outmsg);
        source = 1;
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
    }

MPI Send/Receive a Character (cont.)
    else if (rank == 1) {
        source = 0;
        rc = MPI_Recv(&inmsg, 1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
        printf("Rank1 received: %c\n", inmsg);
        dest = 0;
        rc = MPI_Send(&outmsg, 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}

Synchronous Send Completes when the message has been received. The effect is to synchronise the sender and receiver. Deadlocks if there is no matching receive. Safer than standard send, but may be slower. MPI_Ssend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) All parameters as for standard send; Fortran equivalent as usual (plus ierror).

Buffered Send Guarantees to complete immediately. Copies the message to a buffer if necessary. To use buffered mode the user must explicitly attach buffer space: MPI_Bsend(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm) MPI_Buffer_attach(void *buf, int size) Only one buffer can be attached at any one time. Buffers can be detached: MPI_Buffer_detach(void *buf, int *size)

Ready Send Completes immediately. Guaranteed to succeed if a receive is already posted. The outcome is undefined if no receive is posted. May improve performance. Requires careful attention to messaging patterns. MPI_Rsend(buf, count, datatype, dest, tag, comm) (Diagram: process 0 posts a non-blocking receive with tag 0 and a synchronous send with tag 1; process 1 completes a blocking receive with tag 1, so it knows the receive is posted, then issues a ready send with tag 0; process 0 finally tests the non-blocking receive.)

Communication Envelope Information Envelope information is returned from MPI_Recv in status. The information includes: Source: status.MPI_SOURCE in C, or status(MPI_SOURCE) in Fortran. Tag: status.MPI_TAG, or status(MPI_TAG). Count: MPI_Get_count(MPI_Status *status, MPI_Datatype datatype, int *count)

Point-to-Point Rules Message order preservation: messages do not overtake each other. This is true even for non-synchronous sends, i.e. if process A posts two sends and process B posts matching receives, then they will complete in the order they were sent. Progress: it is not possible for a matching send and receive pair to remain permanently outstanding. It is possible for a third process to match one of the pair.

Non Blocking Message Passing

Exercise: Ping Pong Write a program in which two processes repeatedly pass a message back and forth. Insert timing calls to measure the time taken for one message. Investigate how the time taken varies with the size of the message.

A simple Ping Pong.c
#include <mpi.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
    int numtasks, rank, dest, source, rc, tag = 1;
    char pingmsg[10];
    char pongmsg[10];
    char buff[100];
    MPI_Status Stat;
    strcpy(pingmsg, "ping");
    strcpy(pongmsg, "pong");
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numtasks);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) { /* Send Ping, Receive Pong */
        dest = 1;
        source = 1;
        rc = MPI_Send(pingmsg, strlen(pingmsg)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
        rc = MPI_Recv(buff, strlen(pongmsg)+1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
        printf("Rank0 Sent: %s & Received: %s\n", pingmsg, buff);
    }
Why + 1? To send the string's terminating '\0' as well.

A simple Ping Pong.c (cont.)
    else if (rank == 1) { /* Receive Ping, Send Pong */
        dest = 0;
        source = 0;
        rc = MPI_Recv(buff, strlen(pingmsg)+1, MPI_CHAR, source, tag, MPI_COMM_WORLD, &Stat);
        printf("Rank1 received: %s & Sending: %s\n", buff, pongmsg);
        rc = MPI_Send(pongmsg, strlen(pongmsg)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}

Timers C: double MPI_Wtime(void); Returns the elapsed wall-clock time in seconds (double precision) on the calling processor. The time to perform a task is measured by consulting the timer before and after.

mpich on legion cluster Compile with mpicc or mpif90 (you don't need -lmpi). Run with qsub -q pque <jobscript>, where the jobscript contains:
#PBS -np=2
mpirun <progname>

mpich After the MPI standard was announced, a portable implementation, mpich, was produced by ANL (Argonne National Lab, Chicago, US). It consists of: libraries and include files - libmpi, mpi.h; compilers - mpicc, mpif90, which know where the relevant include and library files are: mpicc helloworld.c -o helloworld; runtime loader - mpirun, with arguments -np <number of nodes> and -machinefile <file of nodenames>. mpirun implements the SPMD paradigm by starting a copy of the program on each node; the program must therefore do any differentiation itself (using the MPI_Comm_size() and MPI_Comm_rank() functions): mpirun -np 3 -machinefile machines.list helloworld NOTE: our version gets the number of CPUs and their addresses from PBS (i.e. don't use -np and/or -machinefile).

PBS PBS is a batch system - jobs get submitted to a queue. The job is a shell script to execute your program. The shell script can contain job management instructions (these instructions can also be given on the command line). PBS will allocate your job to some other computer, log in as you, and execute your script; your script must therefore contain cd's or absolute references to access files (or globus objects). Useful PBS commands: qsub - submits a job; qstat - monitors status; qdel - deletes a job from a queue.

PBS directives Some PBS directives to insert at the start of your shell script: #PBS -q <queuename> #PBS -e <filename> (stderr location) #PBS -o <filename> (stdout location) #PBS -eo (combines stderr and stdout) #PBS -t <seconds> (maximum time) #PBS -l <attribute>=<value> (eg -l nodes=2)

MPI Programs Compilation and Execution Let us look at the MARC Alpha Cluster.

Melbourne Advanced Research Computing (MARC) Alpha Cluster Exclusive nodes (cnet1..cnet16), for parallel jobs only. Each is a Compaq Personal Workstation 600au: 600 MHz 21164AXP CPU with 96 KByte internal cache, 2 MByte external cache, 192 MByte memory, 4.3 GByte Ultrawide SCSI disc, 100 Mbps Ethernet. (Diagram: the legion Alpha cluster, front-end legion.hpc.unimelb.edu.au with nodes cnet1.hpc.unimelb.edu.au ... cnet16.hpc.unimelb.edu.au.)

How the legion cluster looks (front view and back view)

A legion cluster view from an angle!

Compile and Run Commands
Compile: legion> mpicc helloworld.c -o helloworld
Run: legion> mpirun -np 3 -machinefile machines.list helloworld
(-np sets the number of processes.)
The file machines.list contains the node list:
legion.hpc.unimelb.edu.au
cnet1.hpc.unimelb.edu.au
cnet2.hpc.unimelb.edu.au
cnet3.hpc.unimelb.edu.au
cnet4.hpc.unimelb.edu.au
cnet5.hpc.unimelb.edu.au

Sample Run and Output
A run with 3 processes:
legion> mpirun -np 3 -machinefile machines.list helloworld
Hello World from process 0 of 3
Hello World from process 1 of 3
Hello World from process 2 of 3
A run by default:
legion> helloworld
Hello World from process 0 of 1

Sample Run and Output
A run with 6 processes:
legion> mpirun -np 6 -machinefile machines.list helloworld
Hello World from process 0 of 6
Hello World from process 3 of 6
Hello World from process 1 of 6
Hello World from process 5 of 6
Hello World from process 4 of 6
Hello World from process 2 of 6
Note: process output need not appear in process-number order.

Sample Run and Output
A run with 6 processes:
legion> mpirun -np 6 -machinefile machines.list helloworld
Hello World from process 0 of 6
Hello World from process 3 of 6
Hello World from process 1 of 6
Hello World from process 2 of 6
Hello World from process 5 of 6
Hello World from process 4 of 6
Note: the output order has changed. The process-to-node mapping can differ from run to run, and the machines may carry different loads; hence the difference.