The eMinerals minigrid and the National Grid Service: a user's perspective
NGS169 (A. Marmier)
Objectives
1. User profile
2. Two real resources: the eMinerals minigrid and the National Grid Service
3. Practical difficulties
4. Amateurish rambling (discussion/suggestions)
User Profile 1
- Atomistic modelling community: chemistry, physics, materials science
- Potentially big users of eScience (CPU intensive, NOT data intensive)
- Codes: VASP, SIESTA, DL_POLY, CASTEP …
- Want to run parallel codes
User Profile 2
- Relative proficiency with Unix, mainframes, etc. …
- Scripting, parallel programming
- Note of caution: the speaker might be biased
- Want to run parallel codes
eMinerals: Virtual Organisation (NERC)
The eMinerals project brings together simulation scientists, applications developers and computer scientists to develop UK eScience/grid capabilities for molecular simulations of environmental issues.
Grid prototype: the minigrid
eMinerals: Minigrid resources
- 3 clusters of 16 Pentiums
- UCL Condor pool
- Cambridge Earth Sciences Condor pool
- SRB vaults
- SRB manager at Daresbury
eMinerals: Minigrid philosophy
- Globus 2
- No login possible (except on one debug/compile cluster)
- No easy file transfer (have to use SRB; see later, and the sketch below)
- Feels very ‘gridy’, but not painless
- Promotes Condor-G and home-grown wrappers
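File transfer to and from the minigrid goes through SRB rather than plain scp. A rough sketch only (not from the original slides), assuming the standard SDSC Scommands client and an existing ~/.srb configuration pointing at the Daresbury vault, with the collection path borrowed from the example on the next slide:
Sinit                                   # open an SRB session (reads ~/.srb/.MdasEnv)
Scd /home/amr.eminerals/run/TST.VASP3   # move to the job's SRB collection
Sput INCAR                              # stage an input file into the vault
Sput POTCAR
Sls                                     # check the upload
Sget OUTCAR                             # after the run, pull results back
Sexit                                   # close the session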
eMinerals: Minigrid example (my_condor_submit script)
Universe            = globus
Globusscheduler     = lake.bath.ac.uk/jobmanager-pbs
Executable          = /home/arnaud/bin/vasp-lam-intel
Notification        = NEVER
transfer_executable = true
Environment         = LAMRSH=ssh -x
GlobusRSL           = (job_type=mpi)(queue=workq)(count=4)(mpi_type=lam-intel)
Sdir                = /home/amr.eminerals/run/TST.VASP3
Sget                = INCAR,POTCAR,POSCAR,KPOINTS
Sget                = OUTCAR,CONTCAR
SRBHome             = /home/srbusr/SRB3_3_1/utilities/bin
log                 = vasp.log
error               = vasp.err
output              = vasp.out
Queue
NGS: what?
- VERY NICE PEOPLE who offer access to LOVELY clusters
- A good approximation to a real Grid
NGS: Resources
“Data” clusters: 20 compute nodes, dual Intel Xeon 3.06 GHz CPUs, 4 GB RAM
- grid-data.rl.ac.uk (RAL)
- grid-data.man.ac.uk (Manchester)
“Compute” clusters: 64 compute nodes, dual Intel Xeon 3.06 GHz CPUs, 2 GB RAM
- grid-compute.leeds.ac.uk (WRG Leeds)
- grid-compute.oesc.ox.ac.uk (Oxford)
Plus other nodes: HPCx, Cardiff, Bristol …
NGS: Setup
- grid-proxy-init
- gsi-ssh …
- Then, a “normal” machine: permanent fixed account (NGS169), Unix queuing system
- With gsi-ftp for file transfer
(a session sketch follows below)
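Not in the original slides: a minimal sketch of a typical session, assuming a valid e-Science certificate is already installed under ~/.globus, and using the Oxford node named on the next slide (local and remote paths are illustrative):
grid-proxy-init                          # create a short-lived proxy from the certificate
gsissh grid-compute.oesc.ox.ac.uk        # interactive login via GSI-SSH
globus-url-copy file:///home/arnaud/run/CONFIG \
    gsiftp://grid-compute.oesc.ox.ac.uk/home/ngs169/CONFIG   # gsi-ftp transfer (paths illustrative)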
NGS: example
globus-job-run grid-compute.oesc.ox.ac.uk/jobmanager-fork /bin/ls
globusrun -b -r grid-compute.oesc.ox.ac.uk/jobmanager-pbs -f example1.rsl
example1.rsl:
& (executable=DLPOLY.Y)
  (jobType=mpi)
  (count=4)
  (environment=(NGSMODULES intel-math:gm:dl_poly))
Interlude
Difficulty 1: access
A well known problem:
- Certificate
- Globus-enabled machine
- SRB account (2.0)
Difficulty 2: Usability
How do I submit a job?
- Directly (gsi-ssh …): login, check the queue, submit, (kill), logout (a sketch follows below)
- Remotely (globus, Condor-G)
- Different batch queuing systems on each machine (PBS, Condor, LoadLeveler …)
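A rough sketch of the direct route on a PBS-based NGS node (the host is from the NGS example slide; the script name and job id are illustrative):
gsissh grid-compute.oesc.ox.ac.uk        # login with the grid proxy
qstat                                    # check the queue
qsub dlpoly.pbs                          # submit a batch script (see the sketch on the next slide)
qdel 12345                               # kill a job if needed (illustrative job id)
exit                                     # logout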
Usability 2
- Submission usually requires a “script”
- Almost nobody writes their own scripts: it works by inheritance and adaptation
- At the moment eScience forces the user to learn the syntax of the batch queuing system (a typical script is sketched below)
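For illustration only (not from the original slides): the kind of minimal PBS script users typically inherit and adapt; the queue name, resource line and mpirun invocation all vary by site and are assumptions here.
#!/bin/bash
#PBS -q workq                  # queue name (assumption)
#PBS -l nodes=2:ppn=2          # 4 CPUs on dual-CPU nodes
#PBS -l walltime=12:00:00      # requested wall time
cd $PBS_O_WORKDIR              # run in the directory the job was submitted from
mpirun -np 4 ./DLPOLY.Y        # launch the parallel code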
Usability 3: remote submission
example1.rsl:
& (executable=DLPOLY.Y)
  (jobType=mpi)
  (count=4)
  (environment=(NGSMODULES intel-math:gm:dl_poly))
- Ignores file transfer (a possible staging sketch follows below)
- Ignores more complex submit structures
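Not part of the original example: GT2 RSL does provide staging attributes, so the missing file transfer could in principle be expressed in the RSL itself. A hedged sketch only; the URLs and file names are illustrative, and the exact (source destination) pair syntax should be checked against the GRAM documentation:
& (executable=DLPOLY.Y)
  (jobType=mpi)
  (count=4)
  (environment=(NGSMODULES intel-math:gm:dl_poly))
  (file_stage_in=(gsiftp://home.machine.ac.uk/run/CONFIG CONFIG)
                 (gsiftp://home.machine.ac.uk/run/FIELD FIELD))
  (file_stage_out=(OUTPUT gsiftp://home.machine.ac.uk/run/OUTPUT))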
Usability 4
- Ignores more complex submit structures, e.g.
  abinit < inp.txt
  cpmd.x MgO.inp
=> The user has to learn the Globus syntax :o/ (environment and RSL; a sketch follows below)
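For illustration (not from the original slides): in GT2 RSL, input redirection and command-line arguments are written with the stdin and arguments attributes, so the two cases above would become roughly:
abinit < inp.txt:
& (executable=abinit) (stdin=inp.txt)
cpmd.x MgO.inp:
& (executable=cpmd.x) (arguments=MgO.inp)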
Finally
- At the moment, no real incentives to submit remotely
- Mechanism to reward the early adopters:
  - Access to special queues
  - Longer walltime?
  - More CPUs?
CONCLUSION
- Submission scripts are very important and useful pieces of information
- Easily accessible examples would save a lot of time
- A mechanism to encourage remote submission (e.g. access to better queues)