MPI CUSTOMIZATION AT THE ROMA3 SITE
Antonio Budano, Federico Bitelli
MPI in ROMA3
- Our CE is a CREAM CE and is also used to manage the local queues (jobs submitted with qsub -q).
- Worker nodes are essentially of two types:
  - 16 blades with 8 cores each on an HP system
  - 8 blades with 16+ cores each on a SuperMicro system equipped with Infiniband
- The PBS nodes file is therefore composed of lines similar to:
    wn cluster.roma3 np=8 lcgpro
    wn cluster.roma3 np=16 lcgpro infiniband
- Goals:
  - Each grid MPI job must go to the Infiniband nodes
  - Local MPI jobs should exactly meet the users' requirements (e.g. #PBS -l nodes=3:ppn=6; a sketch follows below)
  - Publish in the grid: MPI-Infiniband, MPI-START, MPICH2, MPICH2-1.6, OPENMPI, OPENMPI-1.4.3, MPICH1
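As an illustration of the local-job goal above, a minimal PBS script of the kind a local user would submit; the job name, walltime and application are made up, and the mpiexec path is the mvapich2-1.6 one used later in the MPI-START slide:

    #!/bin/bash
    # Hypothetical local MPI job: exactly 3 nodes with 6 cores each (18 ranks)
    #PBS -N mpi_test
    #PBS -l nodes=3:ppn=6
    #PBS -l walltime=01:00:00
    cd $PBS_O_WORKDIR
    # mpiexec from the mvapich2 install on the Infiniband WNs
    /usr/mpi/gcc/mvapich2-1.6/bin/mpiexec -n 18 ./my_mpi_app

With the fixed scheduler (next slide) such a job really gets 3 distinct WNs with 6 slots each instead of being squeezed onto a single node.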
Local Jobs
- We had the problem that Maui/PBS did not honour the job requirements:
  - when a user asked for #PBS -l nodes=3:ppn=6
  - the system just gave him the maximum available slots on a single WN (so 16 in our case)
- We fixed this by upgrading Maui and PBS/Torque to newer versions (on the worker nodes too!!)
- We made the upgrades configuring and compiling both from the tar.gz files
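A rough sketch of that kind of build from the tarballs; the exact versions are omitted here and the prefixes/configure options are only indicative:

    # On the Torque server (and the pbs_mom part on every WN too):
    tar xzf torque-<version>.tar.gz && cd torque-<version>
    ./configure && make && make install
    cd ..
    tar xzf maui-<version>.tar.gz && cd maui-<version>
    ./configure --with-pbs=/usr/local && make && make install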
Grid Jobs
- We just configured the Torque client on the CE (used to submit grid jobs) to use a submit filter:
    ~]# cat /var/spool/torque/torque.cfg
    SUBMITFILTER /var/spool/pbs/submit_filter
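For reference, a Torque submit filter is just an executable that receives the job script on stdin and must write the (possibly modified) script to stdout; a non-zero exit code rejects the submission. A minimal do-nothing sketch (our real filter is different):

    #!/bin/sh
    # /var/spool/pbs/submit_filter -- illustrative pass-through filter
    # qsub pipes the user's job script through this program; whatever is
    # printed on stdout is what actually gets submitted.
    cat
    exit 0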
MPI INFINIBAND
- To route MPI jobs to the worker nodes with Infiniband, we edited (on the CE) /opt/glite/bin/pbs_submit.sh and added the line:
    [ -z "$bls_opt_mpinodes" ] || echo "#PBS -q mpi_ib" >> $bls_tmp_file
- That line routes MPI jobs to the queue mpi_ib.
- Then we told Torque that each job in this queue must go to the Infiniband nodes:
    set queue mpi_ib resources_default.neednodes = infiniband
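For completeness, the queue has to exist on the Torque server and the Infiniband WNs must carry the matching node property; a sketch of that side (only the neednodes line comes from the slide above, the other attributes are the usual defaults):

    # On the Torque server:
    qmgr -c "create queue mpi_ib queue_type=execution"
    qmgr -c "set queue mpi_ib enabled = True"
    qmgr -c "set queue mpi_ib started = True"
    qmgr -c "set queue mpi_ib resources_default.neednodes = infiniband"
    # and in server_priv/nodes each Infiniband WN keeps the property, e.g.:
    #   wn cluster.roma3 np=16 lcgpro infiniband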
MPI-START PROBLEM
- We wanted to use and publish our own version of MPICH2 (compiled for Infiniband): MPI-START, MPICH2, MPICH2-1.6
- To do that, the official manual says you should edit (on the WNs) the files /etc/profile.d/mpi_grid_vars.sh (& /etc/profile.d/mpi_grid_vars.csh) and add:
    export MPI_MPICH2_MPIEXEC=/usr/mpi/gcc/mvapich2-1.6/bin/mpiexec
    export MPI_MPICH2_PATH=/usr/mpi/gcc/mvapich2-1.6/
    export MPI_MPICH2_VERSION=1.6
  and similar lines in /etc/profile.d/grid-env.sh
- But jobs could not start.
- After some days of troubleshooting we saw that the problem was in the i2g-mpi-start package, in particular in the file /opt/i2g/etc/mpi-start/mpich2.mpi: this file contains some bugs.
- The corrected version will be published as soon as possible on the WIKI pages.
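A quick way to check on a WN that mpi-start will find the right MPICH2 is just to source the file edited above and look at the variables:

    source /etc/profile.d/mpi_grid_vars.sh
    echo $MPI_MPICH2_PATH $MPI_MPICH2_VERSION
    ls -l $MPI_MPICH2_MPIEXEC   # must point into /usr/mpi/gcc/mvapich2-1.6/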
BDII Configuration
- Remember to publish the information into the BDII.
- On the CE, edit /opt/glite/etc/gip/ldif/static-file-Cluster.ldif and add, in the proper place:
    GlueHostApplicationSoftwareRunTimeEnvironment: MPI-START
    GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2
    GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2-1.6
    GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI
    GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI-1.4.3
    GlueHostApplicationSoftwareRunTimeEnvironment: MPI-Infiniband
- Then: /etc/init.d/bdii restart
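Before waiting for the top-level BDII, the new tags can be checked directly on the CE's resource BDII (port and base DN are the standard gLite ones, adapt if your setup differs):

    ldapsearch -x -LLL -h localhost -p 2170 -b mds-vo-name=resource,o=grid \
        GlueHostApplicationSoftwareRunTimeEnvironment | grep MPI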
MPI STATUS
bin]# ldapsearch -xLLL -h egee-bdii.cnaf.infn.it:2170 -b o=grid '(&(objectClass=GlueHostApplicationSoftware)(GlueSubClusterUniqueID=ce-02.roma3.infn.it))' GlueHostApplicationSoftwareRunTimeEnvironment | grep MPI
GlueHostApplicationSoftwareRunTimeEnvironment: MPI-Infiniband
GlueHostApplicationSoftwareRunTimeEnvironment: MPI-START
GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2
GlueHostApplicationSoftwareRunTimeEnvironment: MPICH2-1.6
GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI
GlueHostApplicationSoftwareRunTimeEnvironment: OPENMPI-1.4.3