FESR Consorzio COMETA - Progetto PI2S2 Using MPI to run parallel jobs on the Grid Marcello Iacono Manno Consorzio Cometa

1 FESR Consorzio COMETA - Progetto PI2S2
Using MPI to run parallel jobs on the Grid
Marcello Iacono Manno, Consorzio Cometa
First Grid Tutorial for the University of Palermo, 11 December 2007

2 Outline
- Overview
- Requirements & Settings
- How to create an MPI job
- How to submit an MPI job to the Grid

3 Overview
- Parallel applications currently rely on "special" HW/SW.
- Parallel applications are "normal" on a Grid:
  - many are trivially parallelizable;
  - Grid middleware offers several parallel job types (DAG, collection).
- A common solution for non-trivial parallelism is the Message Passing Interface (MPI):
  - it is based on send() and receive() primitives;
  - a "master" node starts some "slave" processes by establishing SSH sessions;
  - all processes can share a common workspace and/or exchange data.
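The send()/receive() pattern described above can be illustrated without an MPI installation. The following is a minimal sketch and not MPI itself: Python threads and queue.Queue objects stand in for the processes and the message channels, and the names master, worker and the squaring task are hypothetical.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """'Slave': receive tasks, compute, send each result back to the master."""
    while True:
        task = inbox.get()              # blocking receive
        if task is None:                # sentinel: no more work
            break
        outbox.put((rank, task, task * task))   # send the result


def master(tasks, n_workers=2):
    """'Master': start the workers, send them tasks, receive all results."""
    outbox = queue.Queue()
    inboxes = [queue.Queue() for _ in range(n_workers)]
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], outbox))
        for rank in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    # Round-robin distribution of the tasks among the workers.
    for i, task in enumerate(tasks):
        inboxes[i % n_workers].put(task)
    for inbox in inboxes:
        inbox.put(None)
    results = {}
    for _ in tasks:
        _rank, task, value = outbox.get()
        results[task] = value
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(master([1, 2, 3, 4]))
```

In a real MPI program the queues are replaced by the library's send and receive calls, and the workers are separate processes, possibly on different Worker Nodes.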

4 MPI & Grid
- Several MPI implementations exist, but only two of them are currently supported by the Grid middleware:
  - MPICH
  - MPICH2
- Both "old" GigaBit Ethernet and "new" low-latency InfiniBand networks are supported:
  - the Cometa infrastructure will run MPI jobs on either GigaBit (MPICH, MPICH2) or InfiniBand (MVAPICH, MVAPICH2).
- Currently, MPI parallel jobs can run only inside a single Computing Element (CE):
  - several projects are studying the possibility of executing parallel jobs on Worker Nodes (WNs) belonging to different CEs.

5 JDL (1/3)
- From the user's point of view, MPI jobs are specified by:
  - setting the JDL JobType attribute to MPICH, MPICH2, MVAPICH or MVAPICH2;
  - specifying the NodeNumber attribute as well.
      JobType = "MPICH";
      NodeNumber = 2;
- The NodeNumber attribute defines the required number of CPU cores (PEs).
- Matchmaking: the Resource Broker (RB) chooses a CE (if any!) with enough free Processing Elements (PE = CPU cores), i.e. free PE# ≥ NodeNumber (otherwise "wait!").
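The matchmaking rule (free PE# ≥ NodeNumber, otherwise wait) can be sketched as follows. This is an illustration only: the CE records, their field names and the "most free PEs first" ranking are assumptions, not the Resource Broker's actual data model or ranking policy.

```python
def choose_ce(computing_elements, node_number):
    """Return the name of a CE with enough free PEs, or None (the job waits)."""
    candidates = [
        ce for ce in computing_elements if ce["free_pes"] >= node_number
    ]
    if not candidates:
        return None
    # Assumption: prefer the CE with the most free PEs; the real RB may rank differently.
    return max(candidates, key=lambda ce: ce["free_pes"])["name"]


# Hypothetical CEs, for illustration only.
ces = [
    {"name": "ce-a.example.org", "free_pes": 1},
    {"name": "ce-b.example.org", "free_pes": 8},
]

print(choose_ce(ces, 2))     # a CE with at least 2 free PEs
print(choose_ce(ces, 16))    # None: no CE is large enough, so the job waits
```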

6 JDL (2/3)
- When the JobType and NodeNumber attributes are included in a JDL script, the following expression is automatically added to the JDL Requirements expression, in order to find the best resource where the job can be executed:
    (other.GlueCEInfoTotalCPUs >= NodeNumber) && Member("MPICH", other.GlueHostApplicationSoftwareRunTimeEnvironment)
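A small sketch of how the automatic clause might combine with a user-supplied Requirements expression. The function name effective_requirements is hypothetical, and joining the two parts with && is an assumption; the slides only state that the clause is added to the Requirements expression.

```python
def effective_requirements(job_type, user_requirements=None):
    """Compose the automatic MPI clause with optional user requirements."""
    auto = (
        "(other.GlueCEInfoTotalCPUs >= NodeNumber) && "
        f'Member("{job_type}", other.GlueHostApplicationSoftwareRunTimeEnvironment)'
    )
    if user_requirements:
        # Assumption: both conditions must hold, so they are joined with &&.
        return f"({user_requirements}) && {auto}"
    return auto


print(effective_requirements("MPICH"))
print(effective_requirements(
    "MPICH",
    'other.GlueCEInfoLRMSType == "PBS" || other.GlueCEInfoLRMSType == "LSF"',
))
```

The parentheses around the user expression matter: without them the || in the user part would bind more loosely than the appended && clause.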

7 JDL (3/3)
- Executable specifies the MPI executable.
- NodeNumber specifies the number of cores.
- Arguments specifies the WN command line:
  - Executable + Arguments form the command line on the WN.
- The pre-processing script is a special script file that is sourced before launching the MPI executable:
  - warning: it runs only on the master node;
  - the actual mpirun command is issued by the middleware (... what if a proprietary script/bin?).
- The post-processing script is a special script file that is sourced after the MPI executable terminates:
  - warning: it runs only on the master node.
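The order of operations on the master node can be sketched as below. The slides do not show the command the middleware actually issues, so the mpirun -np form, the function names and the script names pre.sh and post.sh are all hypothetical placeholders.

```python
import shlex


def build_mpirun_command(executable, arguments, node_number):
    """Compose a hypothetical mpirun invocation from the JDL attributes."""
    return ["mpirun", "-np", str(node_number), executable, *shlex.split(arguments)]


def master_node_steps(executable, arguments, node_number, pre_script, post_script):
    """List, in order, what would run on the master node only."""
    command = build_mpirun_command(executable, arguments, node_number)
    return [
        f"source {pre_script}",     # before launching the MPI executable
        shlex.join(command),        # Executable + Arguments on NodeNumber cores
        f"source {post_script}",    # after the MPI executable terminates
    ]


# pre.sh and post.sh are placeholder names, not taken from the slides.
steps = master_node_steps("MPIparallel_exec", "arg1 arg2 arg3", 2, "pre.sh", "post.sh")
for step in steps:
    print(step)
```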

8 Requirements (1/2)
In order to ensure that an MPI job can run, the following requirements MUST BE satisfied:
- MPICH/MPICH2/MVAPICH/MVAPICH2:
  - the MPICH/MPICH2/MVAPICH/MVAPICH2 software must be installed, and its location included in the PATH environment variable, on all the WNs of the CE;
  - some MPI applications require a file system shared among the WNs:
    - no shared area is currently available to write user data;
    - an application may access the area of the master node (this requires modifications to the application);
    - middleware solutions are also possible (as soon as required/designed/tested/deployed).

9 Requirements (2/2)
- The job wrapper copies all the files indicated in the InputSandbox to ALL of the "slave" nodes:
  - host-based SSH authentication MUST BE properly configured between all the WNs.
- If some environment variables are needed ONLY on the "master" node, they can be set by the pre-processing script.
- If some environment variables are needed ON ALL THE NODES, a static installation is currently required (a middleware extension is under consideration).

10 mpi.jdl
    [
      Type = "Job";
      JobType = "MPICH";
      Executable = "MPIparallel_exec";
      NodeNumber = 2;
      Arguments = "arg1 arg2 arg3";
      StdOutput = "test.out";
      StdError = "test.err";
      InputSandbox = {"", "", "MPIparallel_exec"};
      OutputSandbox = {"test.err", "test.out", "executable.out"};
      Requirements = other.GlueCEInfoLRMSType == "PBS" || other.GlueCEInfoLRMSType == "LSF";
    ]
Notes on the listing:
- Requirements: Local Resource Manager (LRMS) = PBS/LSF only.
- InputSandbox: pre- and post-processing scripts, followed by the executable.
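A JDL file like the one above can also be generated programmatically. The following is a minimal sketch: render_jdl, jdl_value and Expr are hypothetical helpers, not part of any Grid middleware, and the InputSandbox here lists only the executable because the script file names are not given in the slides.

```python
class Expr(str):
    """Marks a string that must be written unquoted (a JDL expression)."""


def jdl_value(value):
    """Format a Python value as a JDL literal."""
    if isinstance(value, Expr):
        return str(value)                       # raw expression, no quotes
    if isinstance(value, str):
        return f'"{value}"'                     # string literal
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    return str(value)                           # numbers


def render_jdl(attributes):
    """Render a dict of attributes as a bracketed JDL description."""
    lines = [f"  {name} = {jdl_value(value)};" for name, value in attributes.items()]
    return "[\n" + "\n".join(lines) + "\n]"


job = {
    "Type": "Job",
    "JobType": "MPICH",
    "Executable": "MPIparallel_exec",
    "NodeNumber": 2,
    "Arguments": "arg1 arg2 arg3",
    "StdOutput": "test.out",
    "StdError": "test.err",
    "InputSandbox": ["MPIparallel_exec"],
    "OutputSandbox": ["test.err", "test.out"],
    "Requirements": Expr(
        'other.GlueCEInfoLRMSType == "PBS" || other.GlueCEInfoLRMSType == "LSF"'
    ),
}

text = render_jdl(job)
print(text)
```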

11 GigaBit vs InfiniBand
The advantage of using a low-latency network becomes more evident as the number of nodes grows.

12 CPI Test (1/4)
[marcello@infn-ui-01 mpi-0.13]$ edg-job-submit mpi.jdl
Selected Virtual Organisation name (from proxy certificate extension): cometa
Connecting to host, port 7772
Logging to host, port 9002
*********************************************************************************************
JOB SUBMIT OUTCOME
The job has been successfully submitted to the Network Server.
Use edg-job-status command to check job current status.
Your job identifier (edg_jobId) is:
-
*********************************************************************************************

13 CPI Test (2/4)
[marcello@infn-ui-01 mpi-0.13]$ edg-job-status https://infn-rb-
*************************************************************
BOOKKEEPING INFORMATION:
Status info for the Job : https://infn-rb-
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination:
reached on: Sun Jul 1 15:08:11 2007
*************************************************************

14 CPI Test (3/4)
[marcello@infn-ui-01 mpi-0.13]$ edg-job-get-output --dir /home/marcello/JobOutput/ https://infn-rb-
Retrieving files from host: ( for https://infn-rb- )
*********************************************************************************
JOB GET OUTPUT OUTCOME
Output sandbox files for the job:
-
have been successfully retrieved and stored in the directory:
/home/marcello/JobOutput/marcello_vYGU1UUfRnSktGODcwEjMw
*********************************************************************************

15 CPI Test (4/4)
[marcello@infn-ui-01 mpi-0.13]$ cat /home/marcello/JobOutput/marcello_vYGU1UUfRnSktGODcwEjMw/test.out
preprocessing script
-------------------------
Process 0 of 4 on
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 10.002570
Process 1 of 4 on
Process 3 of 4 on
Process 2 of 4 on
TID  HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
==== ========== ================ ======================= ===================
0001 infn-wn-01 /opt/lsf/6.1/lin Done                    07/01/2007 17:04:23
0002 infn-wn-01 /opt/lsf/6.1/lin Done                    07/01/2007 17:04:23
0003 infn-wn-02 /opt/lsf/6.1/lin Done                    07/01/2007 17:04:23
0004 infn-wn-02 /opt/lsf/6.1/lin Done                    07/01/2007 17:04:23
P4 procgroup file is /home/cometa005/.lsf_6826_genmpi_pifile.
postprocessing script temporary
[marcello@infn-ui-01 mpi-0.13]$
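The output above has the shape produced by the classic cpi example, which estimates pi by integrating 4/(1+x^2) over [0, 1] with the midpoint rule and splits the intervals cyclically among the processes. The sketch below reproduces that arithmetic in plain Python, with no MPI involved. It assumes the test ran that example with n = 10000 intervals; under that assumption the midpoint-rule error is about 1/(12 n^2) ≈ 8.3e-10, which is consistent with the error reported above.

```python
import math


def partial_sum(rank, size, n):
    """Midpoint-rule contribution of one process to the integral of 4/(1+x^2)."""
    h = 1.0 / n
    total = 0.0
    # Cyclic distribution: process `rank` takes intervals rank+1, rank+1+size, ...
    for i in range(rank + 1, n + 1, size):
        x = h * (i - 0.5)               # midpoint of the i-th interval
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(size=4, n=10000):
    """Combine the partial sums; in MPI this would be a reduction onto process 0."""
    return sum(partial_sum(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    pi_estimate = estimate_pi()
    print(f"pi is approximately {pi_estimate:.16f}, "
          f"Error is {abs(pi_estimate - math.pi):.16f}")
```

Each call to partial_sum is independent of the others, which is why the work can be spread over 4 processes on two Worker Nodes as in the run shown above.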

16 MPI on the web

17 Questions…
