MPI support in gLite

Enol Fernández, CSIC
MPI on the Grid

Submission/Allocation (CREAM/WMS):
– Definition of job characteristics
– Search for and selection of adequate resources
– Allocation (or co-allocation) of resources for the job

Execution (MPI-Start):
– File distribution
– Batch system interaction
– MPI implementation details
Allocation / Submission

The process count is specified with the CPUNumber attribute:

Type = "Job";
CPUNumber = 23;
Executable = "my_app";
Arguments = "-n 356 -p 4";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"my_app"};
OutputSandbox = {"std.out", "std.err"};
Requirements =
  Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);
MPI-Start

– Provides a single interface to the upper layers for running an MPI job
– Allows support for new MPI implementations without modifications to the Grid middleware
– Supports "simple" file distribution
– Provides some support to help users manage their data

[Diagram: Grid Middleware → MPI-START → MPI → Resources]
MPI-Start Design Goals

Portable:
– The program must be able to run under any supported operating system

Modular and extensible architecture:
– Plugin/component architecture

Relocatable:
– Must be independent of absolute paths, to adapt to different site configurations
– Remote "injection" of mpi-start along with the job

"Remote" debugging features
MPI-Start Architecture

[Diagram: a CORE component with three plugin families]
– Execution plugins: Open MPI, MPICH2, LAM, PACX
– Scheduler plugins: PBS/Torque, SGE, LSF
– Hooks plugins: Local, User, Compiler, File Distribution
Using MPI-Start (I)

$ cat starter.sh
#!/bin/sh
# This is a script to call mpi-start

# Set the environment variables needed by mpi-start
export I2G_MPI_APPLICATION=/bin/hostname
export I2G_MPI_APPLICATION_ARGS=
export I2G_MPI_TYPE=openmpi
export I2G_MPI_PRECOMMAND=time

# Execute mpi-start
$I2G_MPI_START

stdout:
Scientific Linux CERN SLC release 4.5 (Beryllium)
lflip30.lip.pt
lflip31.lip.pt

stderr:
real    0m0.731s
user    0m0.021s
sys     0m0.013s

The corresponding JDL:

JobType = "Normal";
CpuNumber = 4;
Executable = "starter.sh";
InputSandbox = {"starter.sh"};
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = {"std.out", "std.err"};
Requirements =
  Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment) &&
  Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment);
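For completeness, a job like this would typically be submitted through the gLite WMS command line. A minimal sketch, assuming a gLite UI with a valid proxy and that the JDL above is saved as starter.jdl (both file names are illustrative):

# Submit the job (-a delegates a proxy automatically) and record the job ID
glite-wms-job-submit -a -o jobid.txt starter.jdl

# Poll the status; once the job is done, retrieve std.out / std.err
glite-wms-job-status -i jobid.txt
glite-wms-job-output -i jobid.txt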
Using MPI-Start (II)

JDL:
...
CpuNumber = 4;
Executable = "mpi-start-wrapper.sh";
Arguments = "userapp OPENMPI some app args...";
InputSandbox = {"mpi-start-wrapper.sh"};
Environment = {"I2G_MPI_START_VERBOSE=1", ...};
...

mpi-start-wrapper.sh:

#!/bin/bash
# First argument: the executable; second: the MPI flavor;
# the rest are the application's own arguments.
MY_EXECUTABLE=$1
shift
MPI_FLAVOR=$1
shift
export I2G_MPI_APPLICATION_ARGS=$*

# Convert the flavor to lowercase for passing to mpi-start.
MPI_FLAVOR_LOWER=`echo $MPI_FLAVOR | tr '[:upper:]' '[:lower:]'`

# Pull out the correct paths for the requested flavor.
eval MPI_PATH=`printenv MPI_${MPI_FLAVOR}_PATH`

# Ensure the prefix is correctly set. Don't rely on the defaults.
eval I2G_${MPI_FLAVOR}_PREFIX=$MPI_PATH
export I2G_${MPI_FLAVOR}_PREFIX

# Set up mpi-start.
export I2G_MPI_APPLICATION=$MY_EXECUTABLE
export I2G_MPI_TYPE=$MPI_FLAVOR_LOWER

# Invoke mpi-start.
$I2G_MPI_START
MPI-Start Hooks (I)

File distribution methods:
– Copy the files needed for execution using the most appropriate method (shared filesystem, scp, mpiexec, ...)

Compiler flag checking:
– Checks the correctness of compiler flags for 32/64 bits and changes them accordingly

User hooks:
– Build applications
– Data staging
MPI-Start Hooks (II)

myhooks.sh:

#!/bin/sh
pre_run_hook () {
  echo "Compiling ${I2G_MPI_APPLICATION}"

  # Actually compile the program.
  cmd="mpicc ${MPI_MPICC_OPTS} -o ${I2G_MPI_APPLICATION} ${I2G_MPI_APPLICATION}.c"
  $cmd
  if [ ! $? -eq 0 ]; then
    echo "Error compiling program. Exiting..."
    exit 1
  fi

  # Everything's OK.
  echo "Successfully compiled ${I2G_MPI_APPLICATION}"
  return 0
}

The hook is shipped and activated through the JDL:

...
InputSandbox = {..., "myhooks.sh", ...};
Environment = {..., "I2G_MPI_PRE_HOOK=myhooks.sh"};
...
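The data staging case works the same way with a post-run hook, which runs after mpirun finishes and can therefore save output to grid storage. A minimal sketch, assuming the post-run convention mirrors the pre-run one (post_run_hook function, I2G_MPI_POST_HOOK variable); the VO name, LFN path and output file name are illustrative:

#!/bin/sh
post_run_hook () {
  echo "Saving output of ${I2G_MPI_APPLICATION}"
  # Copy the result to grid storage and register it in the catalogue;
  # the VO and LFN below are made-up example values.
  lcg-cr --vo myvo -l lfn:/grid/myvo/results/out.dat file:$PWD/out.dat
  return 0
}

with the matching JDL entry: Environment = {..., "I2G_MPI_POST_HOOK=myhooks.sh"};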
MPI-Start: more features

Remote injection:
– mpi-start can be sent along with the job: just unpack, set the environment and go!

Interactivity:
– A pre-command can be used to "control" the mpirun call:
  $I2G_MPI_PRECOMMAND mpirun ...
– This command can:
  – redirect I/O
  – redirect network traffic
  – perform accounting

Debugging (see the sketch below):
– 3 different debugging levels:
  – VERBOSE: basic information
  – DEBUG: internal flow information
  – TRACE: set -x at the beginning; full trace of the execution
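The debugging levels are switched on through mpi-start environment variables, which can be set from the JDL. A minimal sketch: I2G_MPI_START_VERBOSE appears earlier in these slides, while the DEBUG and TRACE variable names are assumed to follow the same I2G_MPI_START_* pattern:

Environment = {"I2G_MPI_START_VERBOSE=1",
               "I2G_MPI_START_DEBUG=1",
               "I2G_MPI_START_TRACE=1"};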
Future work (I)

New JDL description for parallel jobs (proposed by the EGEE MPI TF; see the sketch below):
– WholeNodes (True/False): whether or not full nodes should be reserved
– NodeNumber (default = 1): number of nodes requested
– SMPGranularity (default = 1): minimum number of cores per node
– CPUNumber (default = 1): number of job slots (processes/cores) to use

The CREAM team is working on how to support these attributes.
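A hedged sketch of how the proposed attributes could combine in a JDL, assuming the semantics listed above (the values are illustrative, and the attributes were not yet supported at the time of this talk):

// Reserve 4 whole nodes with at least 8 cores each,
// and use 32 job slots in total.
WholeNodes = True;
NodeNumber = 4;
SMPGranularity = 8;
CPUNumber = 32;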
Future work (II)

Management of non-MPI jobs:
– New execution environments (OpenMP)
– Generic parallel job support

Support for new schedulers:
– Condor and SLURM

Explore support for new architectures:
– FPGAs, GPUs, ...
More Info…

– gLite MPI PT
– MPI-Start trac: contains user, admin and developer docs
– MPI TCD
MPI-Start Execution Flow

START → dump environment, then:
1. Scheduler plugins: do we have a scheduler plugin for the current environment? If not, EXIT.
2. Ask the scheduler plugin for a machinefile in the default format.
3. Execution plugins: do we have a plugin for the selected MPI? If not, EXIT.
4. Activate the MPI plugin and prepare the mpirun call.
5. Hooks plugins: trigger the pre-run hooks.
6. Start mpirun.
7. Trigger the post-run hooks, then EXIT.
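To make the scheduler step concrete, here is a conceptual sketch of the detection logic for a PBS/Torque environment. This illustrates the idea only, not mpi-start's actual plugin interface; the MPI_START_MACHINEFILE variable name is made up:

#!/bin/sh
# PBS/Torque exports PBS_NODEFILE, a file listing one hostname per
# allocated slot; this is exactly the machinefile mpirun needs.
if [ -n "$PBS_NODEFILE" ]; then
    MPI_START_MACHINEFILE=$PBS_NODEFILE   # hypothetical variable name
    echo "Scheduler detected: PBS/Torque, machinefile: $MPI_START_MACHINEFILE"
else
    echo "No supported scheduler detected" >&2
    exit 1
fi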