Special Topics: MPI jobs

Special Topics: MPI jobs Giuseppe LA ROCCA INFN Catania giuseppe.larocca@ct.infn.it Joint EUMEDGRID-Support/EPIKH School for Application Porting Algiers, 04-15.07.2010

Table of Contents: MPI and its implementations; Wrapper script for mpi-start; Hooks for mpi-start; Defining the job and executable; Running the MPI job; References.

MPI and its implementations. The Message Passing Interface (MPI) is a de-facto standard for writing parallel applications. There are two versions of the standard, MPI-1 and MPI-2, and each has several implementations. Two implementations of MPI-1: LAM and MPICH. Two implementations of MPI-2: OpenMPI and MPICH2. Some implementations are hardware-related: e.g. InfiniBand networks require the MVAPICH v.1 or v.2 libraries. Individual sites may choose to support only a subset of these implementations, or none at all.

Goals of the MPI standard. MPI's prime goals are: to provide source-code portability; to allow efficient implementations across a range of architectures; to offer a great deal of functionality, so that the user need not cope with communication failures (such failures are dealt with by the underlying communication subsystem). In a typical run, a “master” node starts some “slave” processes by establishing SSH sessions; all processes can share a common workspace and/or exchange data via send() and receive() routines.

A bit of history... The Message Passing Interface (MPI) is a standard developed by the Message Passing Interface Forum (MPIF). It specifies portable APIs for writing message-passing programs in Fortran, C and C++. MPIF (http://www.mpi-forum.org/), with the participation of more than 40 organizations, started working on the standard in 1992. The first draft (Version 1.0), published in 1994, was strongly influenced by work at the IBM T. J. Watson Research Center. MPIF further enhanced the first version to produce a second version (MPI-2) in 1997. The latest release of the first version (Version 1.2) is offered as an update to the previous release and is contained in the MPI-2 document.

...some basic concepts. An MPI process consists of a C/C++ or Fortran 77 program which communicates with other MPI processes by calling MPI routines. The MPI routines provide the programmer with a consistent interface across a wide variety of platforms. All names of MPI routines and constants in both C and Fortran begin with the prefix MPI_ to avoid name collisions. Fortran routine names are all upper case, while C routine names are mixed case. In general, C MPI routines return an int and Fortran MPI routines have an IERROR argument. The default action on detection of an error by MPI is to abort the parallel computation, rather than return with an error code, but this behaviour can be changed.

Basic Structure of MPI Programs: Header files; Initializing MPI; MPI Communicator; MPI Function format; Communicator Size; Process Rank; Finalizing MPI.

1. Header files. All sub-programs that contain calls to an MPI subroutine MUST include the MPI header file. C: #include <mpi.h> ; Fortran: include 'mpif.h' . The header file contains definitions of MPI constants, MPI types and functions.

2. Initializing MPI. The first MPI routine called in any MPI program must be the initialisation routine MPI_INIT. Every MPI program must call this routine once, before any other MPI routine; making multiple calls to MPI_INIT is erroneous. The C version of the routine accepts argc and argv as arguments: int MPI_Init(int *argc, char ***argv); The Fortran version takes no arguments other than the error code: MPI_INIT(IERROR)

3. MPI Communicator. A communicator is a variable identifying a group of processes that are allowed to communicate with each other. There is a default communicator, MPI_COMM_WORLD, which identifies the group of all processes. The processes are ordered and numbered consecutively from 0 (in both Fortran and C), the number of each process being known as its rank; the rank identifies each process within the communicator. (Figure: the predefined communicator MPI_COMM_WORLD for 7 processes; the numbers indicate the rank of each process.)

4. MPI Function format. C: Error = MPI_XXX (parameter, …); Fortran: CALL MPI_XXX (parameter, IERROR)
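
As an illustration of this calling convention, every C MPI routine returns an error code that can be compared with MPI_SUCCESS (a minimal sketch):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int err, size;
    MPI_Init(&argc, &argv);
    /* Every C MPI routine returns an int error code. */
    err = MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (err != MPI_SUCCESS)
        fprintf(stderr, "MPI_Comm_size failed\n");
    MPI_Finalize();
    return 0;
}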

5. Communicator Size. How many processes are associated with a communicator? C: MPI_Comm_size (MPI_Comm comm, int *SIZE); Fortran: INTEGER COMM, SIZE, IERR / CALL MPI_COMM_SIZE (COMM, SIZE, IERR). Output: SIZE

6. Process Rank. What is the ID of a process in a group? C: MPI_Comm_rank (MPI_Comm comm, int *RANK); Fortran: INTEGER COMM, RANK, IERR / CALL MPI_COMM_RANK (COMM, RANK, IERR). Output: RANK

7. Finalizing MPI. An MPI program should call the routine MPI_FINALIZE when all communications have completed. This routine cleans up all MPI data structures, etc. Once it has been called, no other MPI routines may be called. Finalizing the MPI environment: C: int MPI_Finalize (); Fortran: INTEGER IERR / CALL MPI_FINALIZE (IERR)

Point-to-point & collective communication. A point-to-point communication always involves exactly two processes: one process sends a MESSAGE to the other. This distinguishes it from collective communication, which involves a whole group of processes at one time.

Blocking and Non-Blocking Communication /1

Blocking and Non-Blocking Communication /2. Blocking: slow and simple. Non-blocking: fast, complex and insecure; between the initiation and the completion the program can do some useful computation (latency hiding), but the programmer has to insert code to check for completion.

Communication Modes and MPI subroutines The Standard Send completes once the message has been sent, which may or may not imply that the message has arrived at its destination. The message may instead lie “in the communication network” for some time. MPI_SEND (buf, count, datatype, dest, tag, comm);

Synchronous Send. If the user needs to know whether the message has been received by the receiver, both processes may use synchronous communication. An acknowledgement is sent by the receiver to the sender (a ‘handshake’ process); if the ack is properly received by the sender, the send is considered complete. MPI_SSEND (buf, count, datatype, dest, tag, comm); Advantage: safer. Disadvantage: slower.

Buffered Send. A Buffered Send is guaranteed to complete immediately, copying the message to a system buffer for later transmission if necessary. The programmer has to allocate enough buffer space for the program with a call to MPI_BUFFER_ATTACH (buffer, size); buffer space is detached with a call to MPI_BUFFER_DETACH (buffer, size). Advantage: sender & receiver are not synchronised. Disadvantage: can’t pre-allocate buffer space.
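
A minimal C sketch of a buffered send; the buffer size and payload are illustrative:

/* Rank 0 does a buffered send to rank 1 (run with 2 processes). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, bufsize;
    double data = 3.14;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        char *buf;
        /* Space for one double plus MPI's bookkeeping overhead. */
        MPI_Pack_size(1, MPI_DOUBLE, MPI_COMM_WORLD, &bufsize);
        bufsize += MPI_BSEND_OVERHEAD;
        buf = (char *) malloc(bufsize);
        MPI_Buffer_attach(buf, bufsize);
        MPI_Bsend(&data, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        MPI_Buffer_detach(&buf, &bufsize);   /* returns when the message has left the buffer */
        free(buf);
    } else if (rank == 1) {
        MPI_Status status;
        MPI_Recv(&data, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
        printf("Received %f\n", data);
    }
    MPI_Finalize();
    return 0;
}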

Ready Send. Like the Buffered Send, a Ready Send completes immediately. The sending process simply throws the message out onto the communication network and hopes that the receiver is waiting to catch it. If the receiver is ready, the message will be received; otherwise the message will be dropped. MPI_RSEND (buf, count, datatype, dest, tag, comm); Advantage: high performance. Disadvantage: complex to debug in case of failures.

The standard blocking receive. The format of the standard blocking receive is: MPI_RECV (buf, count, datatype, source, tag, comm, status), where: buf is the address where the data should be placed once received (the receive buffer); count is the number of elements which buf can contain; datatype is the MPI datatype of the message; source is the rank of the source of the message in the group associated with the communicator comm; tag is used by the receiving process to select which message to receive; comm is the communicator; status contains information about the received message. A minimal send/receive example is sketched below.
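
A minimal sketch of a blocking send/receive pair between rank 0 and rank 1; the payload and tag are illustrative:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value = 42;          /* payload is illustrative */
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* Standard send: completes once the message has been handed to MPI. */
        MPI_Send(&value, 1, MPI_INT, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* Blocking receive: waits until a matching message arrives. */
        MPI_Recv(&value, 1, MPI_INT, 0, 99, MPI_COMM_WORLD, &status);
        printf("Rank 1 received %d from rank %d\n", value, status.MPI_SOURCE);
    }
    MPI_Finalize();
    return 0;
}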

Wildcards. In both Fortran and C, MPI_RECV accepts wildcards. To receive from any source: MPI_ANY_SOURCE. Example: MPI_Recv (&SIGMA_X, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MSG_RESULT, MPI_COMM_WORLD, &status);

Non-Blocking Communication. The non-blocking routines have identical arguments to their blocking counterparts, except for one extra argument. This argument, request, is important because it provides a handle used to test whether the communication has completed.

Waiting and Testing for Completion /1. Fortran: MPI_WAIT (req, status, ierr); C: MPI_Wait (MPI_Request *req, MPI_Status *status); A call to this subroutine causes the code to wait until the communication pointed to by req has completed.

Waiting and Testing for Completion /2. Fortran: MPI_TEST (req, flag, status, ierr); C: MPI_Test (MPI_Request *req, int *flag, MPI_Status *status); A call to this subroutine sets flag to true if the communication pointed to by req has completed, and to false otherwise.
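
A minimal sketch of overlapping computation with an MPI_Test polling loop; do_some_work() is a hypothetical placeholder for the application's own computation:

int flag = 0;
MPI_Status status;
MPI_Request req;   /* set by a previous MPI_Isend / MPI_Irecv */

/* Keep computing until the non-blocking communication has completed. */
while (!flag) {
    do_some_work();                 /* hypothetical computation routine */
    MPI_Test(&req, &flag, &status);
}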

MPI_ISSEND (buf, count, datatype, dest, tag, comm, request): after posting the send, the program can continue with other computations which do not alter the send buffer; before the sending process can update the send buffer, it must check that the send has completed. MPI_IRECV (buf, count, datatype, source, tag, comm, request): the receiving process can carry on with other computations until it needs the received data; it then checks whether the communication has completed before using the receive buffer. A non-blocking example is sketched below.
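
A minimal sketch combining a non-blocking send and receive with MPI_Wait; the payload and tag are illustrative:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, out = 7, in = 0;              /* payload is illustrative */
    MPI_Request req;
    MPI_Status status;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        MPI_Issend(&out, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* ... other work that does not touch 'out' ... */
        MPI_Wait(&req, &status);            /* safe to reuse 'out' after this */
    } else if (rank == 1) {
        MPI_Irecv(&in, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        /* ... other work that does not need 'in' ... */
        MPI_Wait(&req, &status);            /* 'in' is valid after this */
        printf("Rank 1 got %d\n", in);
    }
    MPI_Finalize();
    return 0;
}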

Fortran – MPI Data types

C - MPI Data types
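
For reference, a partial list of the most commonly used correspondences: C: MPI_CHAR (char), MPI_INT (int), MPI_LONG (long), MPI_FLOAT (float), MPI_DOUBLE (double), MPI_BYTE; Fortran: MPI_INTEGER (INTEGER), MPI_REAL (REAL), MPI_DOUBLE_PRECISION (DOUBLE PRECISION), MPI_COMPLEX (COMPLEX), MPI_LOGICAL (LOGICAL), MPI_CHARACTER (CHARACTER), MPI_BYTE.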

Collective Communication. There are 16 MPI collective communication subroutines, divided into four categories; the most frequently used ones (such as MPI_BCAST, MPI_GATHER and MPI_REDUCE, described next) were printed in boldface. All MPI collective communication subroutines are blocking.

MPI_BCAST The subroutine MPI_BCAST broadcasts a message from a specific process, called the root, to all the other processes in the communicator given as input.

MPI_GATHER The subroutine MPI_GATHER transmits data from all the processes in the communicator to a single receiving process.

MPI_REDUCE The subroutine MPI_REDUCE performs reduction operations, such as summation of data distributed over processes, and brings the result to the root process. A sketch using these collectives follows.
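
A minimal sketch using the three collectives just described; the values are illustrative:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, size, n = 0, sum = 0;
    int *all = NULL;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) n = 10;                        /* value chosen by the root (illustrative) */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD); /* every process now has n = 10 */

    int mine = n + rank;                          /* each process computes its own value */
    if (rank == 0) all = malloc(size * sizeof(int));
    MPI_Gather(&mine, 1, MPI_INT, all, 1, MPI_INT, 0, MPI_COMM_WORLD); /* root collects all values */

    MPI_Reduce(&mine, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);   /* root gets the global sum */
    if (rank == 0) {
        printf("Sum over %d processes: %d\n", size, sum);
        free(all);
    }
    MPI_Finalize();
    return 0;
}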

Wrapper script for mpi-start /1. mpi-start is the recommended solution to hide the implementation details of job submission. The design of mpi-start was focused on making MPI job submission as transparent as possible with respect to the cluster details! It was developed inside the Int.EU.Grid project. The RPM to be installed on all WNs can be found here. Using the mpi-start system requires the user to define a wrapper script that sets the environment variables, and a set of hooks.

Wrapper script for mpi-start /2

#!/bin/bash
# Pull in the arguments.
MY_EXECUTABLE=`pwd`/$1
MPI_FLAVOR=$2
# Convert flavor to lowercase for passing to mpi-start.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`
# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`
# Ensure the prefix is correctly set. Don't rely on the defaults.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX
# Touch the executable. It must exist for the shared file system check.
# If it does not, then mpi-start may try to distribute the executable
# when it shouldn't.
touch $MY_EXECUTABLE
# Setup for mpi-start.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh
# If these are set then you will get more debugging information.
export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1
# Invoke mpi-start.
$I2G_MPI_START

Hooks for mpi-start /1. The user may write a script which is called before and after the MPI executable is run. The pre-hook script can be used, for example, to compile the executable itself or to download data; the post-hook script can be used to analyze results or to save them on the grid. The pre- and post-hooks may be defined in separate files, but the functions must be named exactly “pre_run_hook” and “post_run_hook”.

Hooks for mpi-start /2

#!/bin/sh
# This function will be called before the MPI executable is started.
pre_run_hook () {
  # Compile the program.
  echo "Compiling ${I2G_MPI_APPLICATION}"
  # Actually compile the program.
  cmd="mpicc ${MPI_MPICC_OPTS} -o ${I2G_MPI_APPLICATION} ${I2G_MPI_APPLICATION}.c"
  echo $cmd
  $cmd
  if [ ! $? -eq 0 ]; then
    echo "Error compiling program. Exiting..."
    exit 1
  fi
  # Everything's OK.
  echo "Successfully compiled ${I2G_MPI_APPLICATION}"
  return 0
}

# This function will be called after the MPI executable has finished.
# A typical case for this is to upload the results to a Storage Element.
post_run_hook () {
  echo "Executing post hook."
  echo "Finished the post hook."
  return 0
}

Defining the job and executable /1

Running the MPI job itself is not significantly different from running a standard grid job.

JobType = "Normal";
CpuNumber = 2;
Executable = "mpi-start-wrapper.sh";
Arguments = "mpi-test MPICH";
StdOutput = "mpi-test.out";
StdError = "mpi-test.err";
InputSandbox = {"mpi-start-wrapper.sh", "mpi-hooks.sh", "mpi-test.c"};
OutputSandbox = {"mpi-test.err", "mpi-test.out"};
Requirements = Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
            && Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment);

The JobType must be "Normal" and the attribute CpuNumber must be defined.

Defining the job and executable /2

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int numprocs;  /* Number of processors */
    int procnum;   /* Processor number */

    /* Initialize MPI */
    MPI_Init(&argc, &argv);

    /* Find this processor number */
    MPI_Comm_rank(MPI_COMM_WORLD, &procnum);

    /* Find the number of processors */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

    printf("Hello world! from processor %d out of %d\n", procnum, numprocs);

    /* Shut down MPI */
    MPI_Finalize();
    return 0;
}

Hook Helpers. The shell variable $MPI_START_SHARED_FS can be checked to figure out whether the current site has a shared file system or not. The mpi_start_foreach_host shell function can be used to iterate over all the available machines in the current run.

do_foreach_node () {
  # the first parameter $1 contains the hostname
}

post_run_hook () {
  ...
  mpi_start_foreach_host do_foreach_node
}

Running the MPI job is no different from running any other grid job. If the job ran correctly, then the standard output should contain something like the following:

-<START PRE-RUN HOOK>-------------------------------------------
[…]
-<STOP PRE-RUN HOOK>---------------------------------------------
=[START]=========================================================
Hello world! from processor 1 out of 2
Hello world! from processor 0 out of 2
=[FINISHED]======================================================
-<START POST-RUN HOOK>-------------------------------------------
-<STOP POST-RUN HOOK>-------------------------------------------
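
For reference, a typical gLite submission sequence looks roughly like the following; the JDL file name and the job-ID file are illustrative:

glite-wms-job-submit -a -o jobIDfile mpi-test.jdl    # submit the job and save its ID
glite-wms-job-status -i jobIDfile                    # poll until the job reaches Done
glite-wms-job-output -i jobIDfile                    # retrieve mpi-test.out / mpi-test.err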

References: EGEE MPI guide [ link ]; EGEE MPI WG [ link ]; MPI-START Documentation [ link ]; Site config for MPI [ link ]

Thank you for your kind attention! Any questions?