Special Topics: MPI jobs

1 Special Topics: MPI jobs
Giuseppe LA ROCCA INFN Catania Joint EUMEDGRID-Support/EPIKH School for Application Porting Algiers,

2 Table of Contents
MPI and its implementations
Wrapper script for mpi-start
Hooks for mpi-start
Defining the job and executable
Running the MPI job
References

3 MPI and its implementations
The Message Passing Interface (MPI) is a de-facto standard for writing parallel applications. There are two versions of the standard, MPI-1 and MPI-2, and each version has different implementations:
Two implementations of MPI-1: LAM and MPICH.
Two implementations of MPI-2: OpenMPI and MPICH2.
Some implementations are hardware-related: e.g. InfiniBand networks require the MVAPICH v.1 or v.2 libraries.
Individual sites may choose to support only a subset of these implementations, or none at all.

4 Goals of the MPI standard
MPI's prime goals are:
To provide source-code portability
To allow efficient implementations across a range of architectures
A great deal of functionality: the user need not cope with communication failures, which are dealt with by the underlying communication subsystem.
In a typical run, a "master" node starts a number of "slave" processes by establishing SSH sessions; all processes can share a common workspace and/or exchange data based on send() and receive() routines.

5 A bit of history ...
The Message Passing Interface (MPI) is a standard developed by the Message Passing Interface Forum (MPIF). It specifies a portable API for writing message-passing programs in Fortran, C and C++.
MPIF, with the participation of more than 40 organizations, started working on the standard in 1992. The first draft (Version 1.0), published in 1994, was strongly influenced by the work at the IBM T. J. Watson Research Center.
MPIF further enhanced the first version to develop a second version (MPI-2). The latest release of the first version (Version 1.2) is offered as an update to the previous release and is contained in the MPI-2 document.

6 ..some basic concepts
An MPI process consists of a C/C++ or Fortran 77 program which communicates with other MPI processes by calling MPI routines. The MPI routines provide the programmer with a consistent interface across a wide variety of different platforms.
All names of MPI routines and constants in both C and Fortran begin with the prefix MPI_ to avoid name collisions. Fortran routine names are all upper case, while C routine names are mixed case. In general, C MPI routines return an int and Fortran MPI routines have an IERROR argument.
The default action on detection of an error by MPI is to abort the parallel computation rather than return with an error code, but this behaviour can be changed.

7 Basic Structure of MPI Programs
1. Header files
2. Initializing MPI
3. MPI Communicator
4. MPI Function format
5. Communicator Size
6. Process Rank
7. Finalizing MPI

8 1. Header files
All sub-programs that contain calls to MPI subroutines MUST include the MPI header file:
C: #include <mpi.h>
Fortran: include 'mpif.h'
The header file contains definitions of MPI constants, MPI types and functions.

9 2. Initializing MPI
The first MPI routine called in any MPI program must be the initialisation routine MPI_INIT. Every MPI program must call this routine once, before any other MPI routine; making multiple calls to MPI_INIT is erroneous.
The C version of the routine accepts pointers to argc and argv as arguments:
int MPI_Init(int *argc, char ***argv);
The Fortran version takes no arguments other than the error code:
MPI_INIT(IERROR)

10 3. MPI Communicator
The communicator is a variable identifying a group of processes that are allowed to communicate with each other. There is a default communicator, MPI_COMM_WORLD, which identifies the group of all the processes.
The processes are ordered and numbered consecutively from 0 (in both Fortran and C); the number of each process is known as its rank. The rank identifies each process within the communicator.
(Figure: the predefined communicator MPI_COMM_WORLD for 7 processes; the numbers indicate the rank of each process.)

11 4. MPI Function format
C: Error = MPI_XXX (parameter, …);
Fortran: CALL MPI_XXX (parameter, IERROR)

12 5. Communicator Size
How many processes are associated with a communicator?
C: MPI_Comm_size (MPI_Comm comm, int *size);
Fortran:
INTEGER COMM, SIZE, IERR
CALL MPI_COMM_SIZE (COMM, SIZE, IERR)
Output: SIZE

13 6. Process Rank
What is the ID of a process in a group?
C: MPI_Comm_rank (MPI_Comm comm, int *rank);
Fortran:
INTEGER COMM, RANK, IERR
CALL MPI_COMM_RANK (COMM, RANK, IERR)
Output: RANK

14 7. Finalizing MPI
An MPI program should call the MPI routine MPI_FINALIZE when all communications have completed. This routine cleans up all MPI data structures, etc. Once it has been called, no other MPI routines may be called.
Finalizing the MPI environment:
C: int MPI_Finalize ();
Fortran:
INTEGER IERR
CALL MPI_FINALIZE (IERR)

15 Point-to-point & collective communication
A point-to-point communication always involves exactly two processes: one process sends a MESSAGE to the other. This distinguishes it from collective communication, which involves a whole group of processes at one time.

16 Blocking and Non-Blocking Communication /1

17 Blocking and Non-Blocking Communication /2
Blocking: slow but simple.
Non-blocking: fast, but more complex and unsafe if used carelessly. Between the initiation and the completion of a non-blocking call the program can do some useful computation (latency hiding), but the programmer has to insert code to check for completion.

18 Communication Modes and MPI subroutines
The Standard Send completes once the message has been sent, which may or may not imply that the message has arrived at its destination; the message may instead lie "in the communication network" for some time.
MPI_SEND (buf, count, datatype, dest, tag, comm);

19 Synchronous Send
If the user needs to know whether the message has been received by the receiver, both processes may use synchronous communication: an acknowledgement is sent by the receiver to the sender ('handshake process'). If the ack is properly received by the sender, the send is considered complete.
MPI_SSEND (buf, count, datatype, dest, tag, comm);
Advantage: safer. Disadvantage: slower.

20 Buffered Send
Buffered Send guarantees to complete immediately, copying the message to a system buffer for later transmission if necessary. The programmer has to allocate enough buffer space with a call to MPI_BUFFER_ATTACH (buffer, size); buffer space is detached with MPI_BUFFER_DETACH (buffer, size).
Advantage: sender & receiver are not synchronised. Disadvantage: buffer space cannot be pre-allocated by the system; the user must attach it explicitly.

21 Ready Send
Like the Buffered Send, a Ready Send completes immediately. The sending process simply throws the message out onto the communication network and hopes that the receiver is waiting to catch it: if the receiver is ready, the message will be received, otherwise it will be dropped.
MPI_RSEND (buf, count, datatype, dest, tag, comm);
Advantage: high performance. Disadvantage: complex to debug in case of failures.

22 The standard blocking receive
The format of the standard blocking receive is:
MPI_RECV (buf, count, datatype, source, tag, comm, status)
Where:
buf is the address where the data should be placed once received (the receive buffer)
count is the number of elements which buf can contain
datatype is the MPI datatype of the message
source is the rank of the source of the message in the group associated with the communicator comm
tag is used by the receiving process to specify the message the receiver is waiting for
comm is the communicator
status contains the status of the receiving process

23 Wildcards
In both Fortran and C, MPI_RECV accepts wildcards. To receive from any source: MPI_ANY_SOURCE
MPI_Recv (&SIGMA_X, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MSG_RESULT, MPI_COMM_WORLD, &status);

24 Non-Blocking Communication
The non-blocking routines have identical arguments to their blocking counterparts, except for an extra argument, request. This argument is very important as it provides a handle which is used to test whether the communication has completed.

25 Waiting and Testing for Completion /1
Fortran: MPI_WAIT (req, status, ierr)
C: int MPI_Wait (MPI_Request *req, MPI_Status *status);
A call to this subroutine causes the code to wait until the communication pointed to by req has completed.

26 Waiting and Testing for Completion /2
Fortran: MPI_TEST (req, flag, status, ierr)
C: int MPI_Test (MPI_Request *req, int *flag, MPI_Status *status);
A call to this subroutine sets flag to true if the communication pointed to by req has completed, and sets it to false otherwise.

27 MPI_ISSEND (buf, count, datatype, dest, tag, comm, request)
After the send is initiated, the program can continue with other computations which do not alter the send buffer. Before the sending process can update the send buffer it must check that the send has completed.
MPI_IRECV (buf, count, datatype, source, tag, comm, request)
The receiving process can then carry on with other computations until it needs the received data; it then checks the receive buffer to see if the communication has completed.

28 Fortran – MPI Data types

29 C - MPI Data types

30 Collective Communication
The table on this slide (not reproduced in this transcript) shows 16 MPI collective communication subroutines, divided into four categories; the subroutines printed in boldface are used most frequently.
All the MPI collective communication subroutines are blocking.

31 MPI_BCAST
The subroutine MPI_BCAST broadcasts a message from a specific process, called the root, to all the other processes identified by the communicator given as input.

32 MPI_GATHER
The subroutine MPI_GATHER transmits data from all the processes in the communicator to a single receiving process.

33 MPI_REDUCE
The subroutine MPI_REDUCE performs reduction operations, such as the summation of data distributed over processes, and brings the result to the root process.

34 MPI and its implementations
Wrapper script for mpi-start
Hooks for mpi-start
Defining the job and executable
Running the MPI job
References

35 Wrapper script for mpi-start /1
mpi-start is the recommended solution to hide the implementation details of job submission. The design of mpi-start focused on making MPI job submission as transparent as possible with respect to the cluster details! It was developed inside the Int.EU.Grid project. The RPM to be installed on all WNs can be found here.
Using the mpi-start system requires the user to define a wrapper script that sets the environment variables, and a set of hooks.

36 Wrapper script for mpi-start /2
#!/bin/bash
# Pull in the arguments.
MY_EXECUTABLE=`pwd`/$1
MPI_FLAVOR=$2
# Convert flavor to lowercase for passing to mpi-start.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`
# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`
# Ensure the prefix is correctly set. Don't rely on the defaults.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX
# Touch the executable. It must exist for the shared file system check.
# If it does not, then mpi-start may try to distribute the executable
# when it shouldn't.
touch $MY_EXECUTABLE
# Setup for mpi-start.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh
# If these are set then you will get more debugging information.
export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1
# Invoke mpi-start.
$I2G_MPI_START

37 MPI and its implementations
Wrapper script for mpi-start
Hooks for mpi-start
Defining the job and executable
Running the MPI job
References

38 Hooks for mpi-start /1
The user may write a script which is called before and after the MPI executable is run.
The pre-hook script can be used, for example, to compile the executable itself or download data.
The post-hook script can be used to analyze the results or to save them on the grid.
The pre- and post-hooks may be defined in separate files, but the functions must be named exactly "pre_run_hook" and "post_run_hook".

39 Hooks for mpi-start /2
#!/bin/sh
# This function will be called before the MPI executable is started.
pre_run_hook () {
  # Compile the program.
  echo "Compiling ${I2G_MPI_APPLICATION}"
  # Actually compile the program.
  cmd="mpicc ${MPI_MPICC_OPTS} -o ${I2G_MPI_APPLICATION} ${I2G_MPI_APPLICATION}.c"
  echo $cmd
  $cmd
  if [ ! $? -eq 0 ]; then
    echo "Error compiling program. Exiting..."
    exit 1
  fi
  # Everything's OK.
  echo "Successfully compiled ${I2G_MPI_APPLICATION}"
  return 0
}
# This function will be called after the MPI executable has finished.
# A typical use case is to upload the results to a Storage Element.
post_run_hook () {
  echo "Executing post hook."
  echo "Finished the post hook."
  return 0
}

40 MPI and its implementations
Wrapper script for mpi-start
Hooks for mpi-start
Defining the job and executable
Running the MPI job
References

41 Defining the job and executable /1
Running the MPI job itself is not significantly different from running a standard grid job.
JobType = "Normal";
CpuNumber = 2;
Executable = "mpi-start-wrapper.sh";
Arguments = "mpi-test MPICH";
StdOutput = "mpi-test.out";
StdError = "mpi-test.err";
InputSandbox = {"mpi-start-wrapper.sh", "mpi-hooks.sh", "mpi-test.c"};
OutputSandbox = {"mpi-test.err", "mpi-test.out"};
Requirements = Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment) && Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment);
The JobType must be "Normal" and the attribute CpuNumber must be defined.

42 Defining the job and executable /2
#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
  int numprocs;  /* Number of processors */
  int procnum;   /* Processor number */

  /* Initialize MPI */
  MPI_Init(&argc, &argv);

  /* Find this processor number */
  MPI_Comm_rank(MPI_COMM_WORLD, &procnum);

  /* Find the number of processors */
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);

  printf("Hello world! from processor %d out of %d\n", procnum, numprocs);

  /* Shut down MPI */
  MPI_Finalize();
  return 0;
}

43 Hook Helpers
The shell variable $MPI_START_SHARED_FS can be checked to figure out whether the current site has a shared file system or not.
The mpi_start_foreach_host shell function can be used to iterate over all the machines available in the current run:
do_foreach_node () {
  # the first parameter $1 contains the hostname
}
post_run_hook () {
  ...
  mpi_start_foreach_host do_foreach_node
}

44 MPI and its implementations
Wrapper script for mpi-start
Hooks for mpi-start
Defining the job and executable
Running the MPI job
References

45 Running the MPI job is no different from any other grid job.
If the job ran correctly, the standard output should contain something like the following:
-<START PRE-RUN HOOK>
[…]
-<STOP PRE-RUN HOOK>
=[START]=========================================================
Hello world! from processor 1 out of 2
Hello world! from processor 0 out of 2
=[FINISHED]======================================================
-<START POST-RUN HOOK>
-<STOP POST-RUN HOOK>

46 MPI and its implementations
Wrapper script for mpi-start
Hooks for mpi-start
Defining the job and executable
Running the MPI job
References

47 References
EGEE MPI guide [ link ]
EGEE MPI WG [ link ]
MPI-START Documentation [ link ]
Site config for MPI [ link ]

48 Thank you for your kind attention !
Any questions ?

