
1 www.epikh.eu The EPIKH Project (Exchange Programme to advance e-Infrastructure Know-How) Special Topics: MPI jobs Maha Dessokey (m_dessoky@eri.sci.eg) Electronic Research Institute Joint EPIKH/EUMEDGRID-Support Event in Cairo, Cairo, 25.10.2010

2 Table of Contents: MPI and its implementations; Wrapper script for mpi-start; Hooks for mpi-start; Defining the job and executable; Running the MPI job; References

3 Some Basic Concepts The Message Passing Interface (MPI) is a standard for writing parallel applications. An MPI process consists of a C/C++ or Fortran 77 program which communicates with other MPI processes by calling MPI routines. All names of MPI routines and constants in both C and Fortran begin with the prefix MPI_ to avoid name collisions. Fortran routine names are all upper case, while C routine names are mixed case. In general, C MPI routines return an int and Fortran MPI routines have an IERROR argument.

4 Basic Structure of MPI Programs

5 Header files All sub-programs that contain calls to MPI subroutines MUST include the MPI header file. The header file contains definitions of MPI constants, MPI types and functions. Fortran: include 'mpif.h' C: #include <mpi.h>

6 Initializing MPI The first MPI routine called in any MPI program must be the initialisation routine MPI_INIT. Every MPI program must call this routine once, before any other MPI routines. The C version of the routine accepts argc and argv as arguments: int MPI_Init(int *argc, char ***argv); The Fortran version takes no arguments other than the error code: MPI_INIT(IERROR)

7 MPI Communicator A communicator is a variable identifying a group of processes that are allowed to communicate with each other. There is a default communicator, MPI_COMM_WORLD, which identifies the group of all processes. The processes are ordered and numbered consecutively from 0 (in both Fortran and C), the number of each process being known as its rank. The rank identifies each process within the communicator.

8 MPI Function format Fortran: CALL MPI_XXX (parameter, IERROR) C: error = MPI_XXX (parameter, …);

9 Communicator Size How many processes are associated with a communicator? Fortran: CALL MPI_COMM_SIZE (COMM, SIZE, IERR) C: MPI_Comm_size (MPI_Comm comm, int *size); The number of processes is returned in SIZE.

10 Process Rank What is the ID of a process in a group? Fortran: CALL MPI_COMM_RANK (COMM, RANK, IERR) C: MPI_Comm_rank (MPI_Comm comm, int *rank);

11 Finalizing MPI An MPI program should call the MPI routine MPI_FINALIZE when all communications have completed. This routine cleans up all MPI data structures, etc. Once this routine has been called, no other calls can be made to MPI routines. Fortran: CALL MPI_FINALIZE (IERR) C: int MPI_Finalize();

12 Point-to-point & collective communication A point-to-point communication always involves exactly two processes: one process sends a MESSAGE to the other. This distinguishes it from collective communication, which involves a whole group of processes at one time.

13 Blocking and Non-Blocking Communication Blocking communication is slow and simple. Non-blocking communication is fast, but more complex and error-prone: between the initiation and the completion the program can do some useful computation (latency hiding), but the programmer has to insert code to check for completion.

14 The standard blocking send/receive The format of the standard blocking send and receive is: MPI_SEND (buf, count, datatype, dest, tag, comm) and MPI_RECV (buf, count, datatype, source, tag, comm, status) Where: – buf is the address where the data should be placed once received (the receive buffer) – count is the number of elements which buf can contain – datatype is the MPI datatype for the message – dest/source is the rank of the destination/source of the message in the group associated with the communicator comm – tag is used by the receiving process to specify the message the receiver is waiting for – comm is the communicator – status contains the status of the receiving process

15 The standard Non-Blocking send/receive The non-blocking routines have identical arguments to their blocking counterparts, except for an extra argument in the non-blocking routines. This argument, request, is very important as it provides a handle which is used to test when the communication has completed. MPI_Isend (buffer, count, type, dest, tag, comm, request) MPI_Irecv (buffer, count, type, source, tag, comm, request)

16 Waiting and Testing for Completion /1 A call to the MPI_WAIT subroutine causes the code to wait until the communication pointed to by req is completed. Fortran: MPI_WAIT (req, status, ierr) C: MPI_Wait (MPI_Request *req, MPI_Status *status);

17 Waiting and Testing for Completion /2 A call to the MPI_TEST subroutine sets flag to true if the communication pointed to by req has completed, and sets flag to false otherwise. Fortran: MPI_TEST (req, flag, status, ierr) C: MPI_Test (MPI_Request *req, int *flag, MPI_Status *status);

18 Fortran – MPI Data types

19 C – MPI Data types

20 Collective Communication What distinguishes collective communication from point-to-point communication is that it always involves every process in the specified communicator. To perform a collective communication on a subset of the processes in a communicator, a new communicator has to be created.

21 Collective Communication The following table shows 16 MPI collective communication subroutines, divided into four categories. The subroutines printed in boldface are used most frequently. All the MPI collective communication subroutines are blocking.

22 MPI_BCAST The subroutine MPI_BCAST broadcasts a message from a specific process, called the root, to all the other processes identified by the communicator given as input.

23 MPI_GATHER The subroutine MPI_GATHER transmits data from all the processes in the communicator to a single receiving process.

24 MPI_ALLGATHER The subroutine MPI_ALLGATHER concatenates the data and delivers the result to all processes in the communicator. Each process in the group, in effect, performs a one-to-all broadcasting operation.

25 MPI_REDUCE The subroutine MPI_REDUCE performs reduction operations, such as summation of data distributed over processes, and brings the result to the root process.

26 MPI_Allreduce Applies a reduction operation and places the result in all tasks in the group. This is equivalent to an MPI_Reduce followed by an MPI_Bcast.

27 MPI Reduction Operations

28 Barrier Synchronization A call to the MPI_BARRIER subroutine blocks the caller until all group members have called it. The call returns at any process only after all group members have entered the call. Fortran: MPI_BARRIER (comm, ierr) C: MPI_Barrier (comm)

29 Wrapper script for mpi-start /1 mpi-start is the recommended solution to hide the implementation details of job submission. The design of mpi-start focused on making MPI job submission as transparent as possible from the cluster details! It was developed inside the Int.EU.Grid project. Using the mpi-start system requires the user to define a wrapper script that sets the environment variables, and a set of hooks.

30 Wrapper script for mpi-start /2

#!/bin/bash
# Pull in the arguments.
MY_EXECUTABLE=`pwd`/$1
MPI_FLAVOR=$2
# Convert flavor to lowercase for passing to mpi-start.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`
# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`
# Ensure the prefix is correctly set. Don't rely on the defaults.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX
# Touch the executable. It must exist for the shared file system check.
# If it does not, then mpi-start may try to distribute the executable
# when it shouldn't.
touch $MY_EXECUTABLE
# Set up for mpi-start.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER
export I2G_MPI_PRE_RUN_HOOK=mpi-hooks.sh
export I2G_MPI_POST_RUN_HOOK=mpi-hooks.sh
# If these are set then you will get more debugging information.
export I2G_MPI_START_VERBOSE=1
#export I2G_MPI_START_DEBUG=1
# Invoke mpi-start.
$I2G_MPI_START

31 Hooks for mpi-start /1 The user may write a script which is called before and after the MPI executable is run. The pre-hook script can be used, for example, to compile the executable itself or download data; the post-hook script can be used to analyze results or to save the results on the grid. The pre- and post-hook scripts may be defined in separate files, but the functions must be named exactly "pre_run_hook" and "post_run_hook".

32 Hooks for mpi-start /2

#!/bin/sh
# This function will be called before the MPI executable is started.
pre_run_hook () {
  # Compile the program.
  echo "Compiling ${I2G_MPI_APPLICATION}"
  # Actually compile the program.
  cmd="mpicc ${MPI_MPICC_OPTS} -o ${I2G_MPI_APPLICATION} ${I2G_MPI_APPLICATION}.c"
  echo $cmd
  $cmd
  if [ ! $? -eq 0 ]; then
    echo "Error compiling program. Exiting..."
    exit 1
  fi
  # Everything's OK.
  echo "Successfully compiled ${I2G_MPI_APPLICATION}"
  return 0
}

# This function will be called after the MPI executable has finished.
# A typical use for this is to upload the results to a Storage Element.
post_run_hook () {
  echo "Executing post hook."
  echo "Finished the post hook."
  return 0
}

33 Defining the job and executable /1 Running the MPI job itself is not significantly different from running a standard grid job. The JobType must be "Normal" and the attribute CpuNumber must be defined.

JobType = "Normal";
CpuNumber = 2;
Executable = "mpi-start-wrapper.sh";
Arguments = "mpi-test MPICH";
StdOutput = "mpi-test.out";
StdError = "mpi-test.err";
InputSandbox = {"mpi-start-wrapper.sh", "mpi-hooks.sh", "mpi-test.c"};
OutputSandbox = {"mpi-test.err", "mpi-test.out"};
Requirements = Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  && Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment);

34 Defining the job and executable /2

#include "mpi.h"
#include <stdio.h>

int main(int argc, char *argv[])
{
    int numprocs;   /* Number of processors */
    int procnum;    /* Processor number */

    /* Initialize MPI */
    MPI_Init(&argc, &argv);
    /* Find this processor number */
    MPI_Comm_rank(MPI_COMM_WORLD, &procnum);
    /* Find the number of processors */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    printf("Hello world! from processor %d out of %d\n", procnum, numprocs);
    /* Shut down MPI */
    MPI_Finalize();
    return 0;
}

35 Running the MPI job Running the MPI job is no different from any other grid job. If the job ran correctly, then the standard output should contain something like the following:

-------------------------------------------
[…]
-------------------------------------------
=[START]=========================================================
Hello world! from processor 1 out of 2
Hello world! from processor 0 out of 2
=[FINISHED]======================================================
-------------------------------------------
[…]
-------------------------------------------

36 References MPI-START Documentation: http://www.hlrs.de/organization/av/amt/research/mpi-start/mpi-start-documentation/ MPI Guide: https://computing.llnl.gov/tutorials/mpi/ A practical exercise can be found on the agenda site.

37 Questions?

