Testing the EGI-DRIHM TestBed


1 Testing the EGI-DRIHM TestBed
D.Cesini

2 Preliminary tests
Authentication
  CE: HelloWorld JDL submission
  SE: lcg-rep of the WRF input data file
    lcg-rep -v -d SE srm://darkstorm.cnaf.infn.it/drihm.eu/generated/ /file05cf726f f c516d
    (using LFC=lfc.ipb.ac.rs lfn:/grid/drihm.eu/cesini/genova.tgz)
  Repeated twice, with certificates released by the two replica VOMS servers
MPI && MPI-START published
  Requirements = (other.GlueCEStateStatus == "Production") && Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment) && Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment)
Published Total CPUs
  CPUs GlueCEInfoTotalCPUs in SubCluster
Published OS version
  GlueHostOperatingSystemRelease
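The publication check on this slide can be sketched offline as a small filter. This is only an illustration: the function name `passes_mpi_requirements` and the sample records are hypothetical, and a CE is assumed to be represented as a plain dictionary keyed by the Glue attribute names used in the Requirements expression.

```python
# Runtime-environment tags the Requirements expression asks for.
REQUIRED_TAGS = ("MPI-START", "OPENMPI")


def passes_mpi_requirements(ce):
    """Return True if the CE is in Production and publishes both tags.

    `ce` is a dict keyed by Glue attribute names (an assumed layout,
    not the output format of any real information-system client).
    """
    tags = set(ce.get("GlueHostApplicationSoftwareRunTimeEnvironment", []))
    in_production = ce.get("GlueCEStateStatus") == "Production"
    return in_production and all(tag in tags for tag in REQUIRED_TAGS)


# Made-up records, for illustration only.
sample_ces = {
    "ce-a.example.org": {
        "GlueCEStateStatus": "Production",
        "GlueHostApplicationSoftwareRunTimeEnvironment": ["MPI-START", "OPENMPI"],
    },
    "ce-b.example.org": {
        "GlueCEStateStatus": "Production",
        "GlueHostApplicationSoftwareRunTimeEnvironment": ["OPENMPI"],
    },
}

eligible = [name for name, ce in sample_ces.items() if passes_mpi_requirements(ce)]
print(eligible)
```

With the made-up records above, only `ce-a.example.org` is kept, because `ce-b.example.org` does not publish MPI-START.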

3 WRF test
WRF (v3.4.1) compiled in SL6 using OPENMPI (v1.6.4) and NETCDF libs (v )
Input data prepared by Antonio Parodi for the Genoa flooding case on 4th Nov 2011
  Data available for a run that starts on :00 and ends on :00
  Two nested domains, one coarse and one fine integration grid
Just one simulated hour run
Just the coarse grid used (no nesting)
Executable, input data, configuration files (namelist.input) and netcdf libs uploaded to the Grid in a tgz file (world-readable)
  lfn:/grid/drihm.eu/cesini/genova.tgz
CPUNumber = 40 (because we have the reference timings obtained at LRZ-LMU by Antonio for 40, 80, 120 processors)
No SMPGranularity required
Submitted only if the preliminary tests were OK

4 WRF JDL
CPUNumber = 40;
#SMPGranularity = 8;
Executable = "/usr/bin/mpi-start";
Arguments = "-t openmpi -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH:./netcdf-lib/ -d MPI_START_TEMP_DIR=\"$HOME/\" -vvv ./wrf.exe";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"wrf-prologue.sh"};
OutputSandbox = {"std.err", "std.out", "prologue.log", "rsl.out.0000", "rsl.error.0000"};
Prologue = "wrf-prologue.sh";
Requirements = (
  (other.GlueCEStateStatus == "Production")
  && Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  && Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  && (other.GlueCEUniqueID=="cream-02.cnaf.infn.it:8443/cream-pbs-prod-sl6")
);
RetryCount = 0;
ShallowRetryCount = -1;
MyProxyServer="myproxy.cnaf.infn.it";
FuzzyRank = true;

[$$] cat wrf-prologue.sh
export LFC_HOST=lfc.ipb.ac.rs
lcg-cp -v srm://darkstorm.cnaf.infn.it/drihm.eu/generated/ /file05cf726f f c516d file:genova.tgz
tar -xvzf genova.tgz >> prologue.log 2>&1
cp genova/* .
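Since the reference timings exist for 40, 80 and 120 processors, one way to prepare the same job for several core counts is to fill a template. The sketch below is an assumption, not part of the original test: `build_wrf_jdl` is a hypothetical helper, the attribute set is trimmed (no sandbox, prologue or retry attributes), and the Arguments string is shortened with respect to the JDL above.

```python
# Trimmed JDL template; only CPUNumber and the target CE vary.
JDL_TEMPLATE = """\
CPUNumber = {cpus};
Executable = "/usr/bin/mpi-start";
Arguments = "-t openmpi -vvv ./wrf.exe";
StdOutput = "std.out";
StdError = "std.err";
Requirements = (
  (other.GlueCEStateStatus == "Production")
  && Member("MPI-START", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  && Member("OPENMPI", other.GlueHostApplicationSoftwareRunTimeEnvironment)
  && (other.GlueCEUniqueID == "{ce_id}")
);
"""


def build_wrf_jdl(cpus, ce_id):
    """Return JDL text for a WRF MPI job pinned to one CE (hypothetical helper)."""
    if cpus <= 0:
        raise ValueError("cpus must be a positive integer")
    return JDL_TEMPLATE.format(cpus=cpus, ce_id=ce_id)


# Core counts for which the slides mention reference timings.
for cpus in (40, 80, 120):
    jdl_text = build_wrf_jdl(cpus, "cream-02.cnaf.infn.it:8443/cream-pbs-prod-sl6")
    print(jdl_text.splitlines()[0])
```

Each generated text would still need the sandbox and prologue attributes from the full JDL before it could reproduce the run described here.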

5 Available Resources (CE)
~]$ lcg-infosites --vo drihm.eu ce
# CPU Free Total Jobs Running Waiting ComputingElement
ce.ceta-ciemat.es:8443/cream-sge-drihm
ce.hpgcc.finki.ukim.mk:8443/cream-pbs-drihm
ce64.ipb.ac.rs:8443/cream-pbs-drihm
cream-02.cnaf.infn.it:8443/cream-pbs-prod-sl
cream-ce01.ariagni.hellasgrid.gr:8443/cream-pbs-drihm
cream-ce01.marie.hellasgrid.gr:8443/cream-pbs-drihm
cream-ce02.marie.hellasgrid.gr:8443/cream-pbs-drihm
cream.afroditi.hellasgrid.gr:8443/cream-pbs-drihm
cream.ipb.ac.rs:8443/cream-pbs-drihm
cream01.athena.hellasgrid.gr:8443/cream-pbs-drihm
cream01.grid.uoi.gr:8443/cream-pbs-drihm
cream01.kallisto.hellasgrid.gr:8443/cream-pbs-drihm
cream02.athena.hellasgrid.gr:8443/cream-pbs-drihm
cream1.grid.cesnet.cz:8443/cream-pbs-drihm
cream2.grid.cesnet.cz:8443/cream-pbs-drihm
dissel.nikhef.nl:2119/jobmanager-pbs-medium
emi-ce01.scope.unina.it:8443/cream-pbs-hpc
gazon.nikhef.nl:8443/cream-pbs-flex
gazon.nikhef.nl:8443/cream-pbs-medium
juk.nikhef.nl:8443/cream-pbs-flex
juk.nikhef.nl:8443/cream-pbs-medium
klomp.nikhef.nl:8443/cream-pbs-flex
klomp.nikhef.nl:8443/cream-pbs-medium
snf vm.okeanos.grnet.gr:8443/cream-pbs-drihm

GT5 LRZ-LMU not publishing in the IS on 18/07 available resource for DRIHM.eu - I had no time to investigate with the sites

6 Available Resources (SE)
~]$ lcg-infosites --vo drihm.eu se
Avail Space(kB) Used Space(kB) Type SE
SRM darkstorm.cnaf.infn.it
SRM dpm.ipb.ac.rs
SRM se.hpgcc.finki.ukim.mk
SRM se01.afroditi.hellasgrid.gr
SRM se01.ariagni.hellasgrid.gr
SRM se01.athena.hellasgrid.gr
SRM se01.grid.uoi.gr
SRM se01.kallisto.hellasgrid.gr
SRM se02.marie.hellasgrid.gr
SRM tbn18.nikhef.nl

7

8 Results
Authentication
  4 sites failed using both VOMSes proxies on CE and SE
  1 site failed for one of the VOMSes proxies, OK with the other one
  1 site OK on CE but failing on SE
  9 sites OK
  1 CE at NIKHEF is a GRAM5-based CE and AUTH worked fine
MPI && MPI-START Published
  3 sites do not publish OPENMPI and MPI-START in GlueHostApplicationSoftwareRunTimeEnvironment in all the CEs
  1 site does not publish in all CEs OPENMPI and MPI-START
  10 sites publish both TAGs in all CEs
Published Total CPUs
  1 site has one CE publishing just 4 CPUs
Published OS version
  6 sites publish SL6.x
  sites publish SL5.x
The WRF test could be run at the 3 sites that passed all the preliminary tests: CESNET (prague_cesnet_lcg2), BOLOGNA (igi-bologna) and NAPLES (UNINA-EGEE)
But it seems that at CESNET 40 cores cannot be allocated for a single job, so the job was submitted using 16 cores

9 Performances
Time to simulate 1 second in Domain1 (no nesting) during the first simulated hour
40 processors used on every system

       (s)     IGI-BOLOGNA (s)                        UNINA-EGEE (s)
               MPI - 40 cores in 2 nodes, Ethernet    MPI - 40 cores in 8 nodes, Infiniband
AVG    0.68    1.90                                   1.98
MIN    0.66    1.72                                   1.84
MAX    7.3     7.4                                    8.4

Writing operations
0.30s using 80 processors
0.20s using 120 processors
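The AVG row can be turned into relative slowdowns with a few lines of arithmetic. This is a sketch under one assumption: the header of the first timing column did not survive in the transcript, so it is treated here as the baseline without being named. The helper `slowdown_vs` is hypothetical.

```python
# AVG seconds needed to simulate 1 second in Domain1, copied from the AVG row.
avg_seconds = {
    "first column (header lost)": 0.68,
    "IGI-BOLOGNA": 1.90,
    "UNINA-EGEE": 1.98,
}


def slowdown_vs(baseline, timings):
    """Return each system's time divided by the baseline system's time."""
    base = timings[baseline]
    return {name: value / base for name, value in timings.items() if name != baseline}


ratios = slowdown_vs("first column (header lost)", avg_seconds)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the baseline time")
```

On these figures the two Grid sites take roughly 2.79 and 2.91 times as long per simulated second as the first column.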

