Developments in Batch and the Grid
HEPiX Autumn Meeting 2005
Tim Bell, CERN/IT/FIO

Plugins Community Support
- Problem
  - Grid batch plugins very basic
  - Mappings incomplete or inaccurate
  - Testing difficult for Grid developers
- Solution
  - Develop community support structure
  - LRMS administrators support the code
  - LCG CVS/Savannah are available to assist
13th October 2005 Developments in Batch Tim.Bell@cern.ch

HEPiX Batch Web Pages
- Started using HEPiX Spring input
- Hosted at Caspur: http://hepix.caspur.it/afs/hepix.org/project/batch/index.html
- Pages
  - Site-specific information
    - Batch system
    - Contacts
    - Overview presentation
  - Product information
    - Batch providers' web sites
  - Grid developments and requirements
    - What is ongoing?
    - What is needed?
- Please check that the information is complete and correct. Send changes to me.
- Check for any new grid requirements in the batch area

HEPiX Batch Systems Site

GLUE 1.2 Improvements
- Slots rather than CPUs
- Per-VO views
  - Response times
  - Free slots
- Queue state
  - Open/Draining/Closed
- Sub-cluster concept introduced, but with no link to the batch section of the schema

GLUE – Improved ERT (Estimated Response Time)
- Old calculation was based on the number of waiting jobs and the wall clock time for the grid queues
- This did not consider that
  - Groups have different priorities
  - Most jobs finish early
- Result was that big sites became unattractive very quickly
- New calculation is based on the waiting time of jobs in the queue
  - Take the average for those jobs of your VO
  - If there are free slots and no waiting jobs for the VO, the ERT is immediate

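The new ERT rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code deployed at any site: the names `QueuedJob` and `estimated_response_time` are hypothetical, the waiting time is taken to be "now minus submit time" for jobs still queued, and the case of no free slots and no waiting jobs (which the slide does not cover) is returned as unknown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class QueuedJob:
    """A job still waiting in the batch queue (hypothetical record layout)."""
    vo: str
    submit_time: float  # seconds since the epoch


def estimated_response_time(
    jobs: Sequence[QueuedJob], vo: str, free_slots: int, now: float
) -> Optional[float]:
    """Return the ERT in seconds for one VO, or None if it cannot be estimated."""
    # Waiting time so far of each queued job belonging to this VO (assumption).
    waits = [now - job.submit_time for job in jobs if job.vo == vo]
    if not waits:
        # Free slots and nothing waiting for the VO: the job starts immediately.
        # No free slots and nothing waiting is not covered by the slide, so we
        # report "unknown" rather than guess.
        return 0.0 if free_slots > 0 else None
    # Otherwise use the average waiting time of the VO's own jobs.
    return sum(waits) / len(waits)
```

Because the average is taken per VO, a long queue belonging to another VO does not by itself make the site look unattractive to yours.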
GLUE – CERN ERT - Results #1

GLUE – CERN ERT - Results #2

GLUE – ERT Implementation
- Implemented by Jeff Templon, NIKHEF, with input from Laurence Field, CERN
- Common front-end program that takes text input: a list of jobs with VOs, submit time, start time, etc.
- Back ends developed for LSF and PBS
  - PBS RPM available from Jeff
  - LSF RPM under test at CERN
- Volunteers for other batch systems?
- Need to implement soon so that NIKHEF does not get sent all the grid jobs

GLUE V2 Requirements
- GLUE V2 starting in November
- Requirements to be submitted
  - Slot information for sub-clusters (e.g. how many free slots on machines with more than 4 GB RAM running SLC3)
  - CPU units based on SPEC benchmarks
- Contact Laurence.Field@cern.ch

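The sub-cluster slot question from the example above ("how many free slots on machines with more than 4 GB RAM running SLC3") amounts to a filtered sum over worker nodes. The sketch below is only an illustration: the `Node` record and its field names are hypothetical and are not part of any GLUE schema.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Node:
    """One worker node as a batch system might report it (hypothetical fields)."""
    name: str
    ram_gb: float
    os: str
    total_slots: int
    busy_slots: int


def free_slots(nodes: Iterable[Node], min_ram_gb: float, os_name: str) -> int:
    """Count free slots on nodes with MORE than min_ram_gb of RAM running os_name."""
    return sum(
        node.total_slots - node.busy_slots
        for node in nodes
        if node.ram_gb > min_ram_gb and node.os == os_name
    )
```

A call such as `free_slots(nodes, 4, "SLC3")` would then answer the example query for one site.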
Grid Job Submission
- Resource Broker/Compute Element/BLAHP chain is losing information
  - Local batch job has no parameters from user JDL
- Requirements submitted
  - Job name, CPU time, wall clock time
  - Total RAM, swap space, temporary disk space
  - Specific operating system, speed of processor
- Target is to be able to move away from per-VO job queues
- Send requirements to Maarten.Litmaath@cern.ch

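To illustrate what passing user requirements through to the local batch job could look like, here is a minimal sketch that turns a few of the fields above into PBS/Torque-style `qsub` arguments. It is an assumption-laden example, not the BLAHP implementation: `JobRequirements` and `to_qsub_args` are hypothetical names, only the job name, CPU time, wall clock time and RAM are mapped (using the common `-N` and `-l cput=,walltime=,mem=` options), and the remaining requirements are left out because their mapping is site- and batch-system-specific.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRequirements:
    """Subset of the requested JDL parameters (hypothetical container)."""
    job_name: Optional[str] = None
    cpu_time_s: Optional[int] = None
    wall_time_s: Optional[int] = None
    ram_mb: Optional[int] = None


def _hms(seconds: int) -> str:
    """Format a duration in seconds as HH:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def to_qsub_args(req: JobRequirements) -> List[str]:
    """Build qsub arguments for the requirements that were actually supplied."""
    args: List[str] = []
    if req.job_name is not None:
        args += ["-N", req.job_name]
    resources: List[str] = []
    if req.cpu_time_s is not None:
        resources.append(f"cput={_hms(req.cpu_time_s)}")
    if req.wall_time_s is not None:
        resources.append(f"walltime={_hms(req.wall_time_s)}")
    if req.ram_mb is not None:
        resources.append(f"mem={req.ram_mb}mb")
    if resources:
        # All resource limits go into a single comma-separated -l list.
        args += ["-l", ",".join(resources)]
    return args
```

With limits carried per job in this way, the scheduler can place work by what it needs rather than by which VO queue it arrived in.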
Summary
- There has been improvement since Spring
- We need to keep emphasising the issues and participating in the solutions
- We need to take ownership where we can contribute