1
CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it
EIS section: review of recent activities
Harry Renshall, Andrea Sciabà
IT-GS group meeting, November 10, 2009
2
Contents
Section overview
VO box project
ALICE activities
ATLAS activities
CMS activities
LHCb activities
Other activities
Conclusions
3
Section overview
Section members
– ALICE: Patricia, Lola
– ATLAS: Alessandro
– CMS: Andrea, Nicolò
– LHCb: Roberto, Harry (as IT-LHCb management liaison)
Ongoing projects
– VO box project
– Migration of SAM tests to Nagios
Other activities
– See next slides
4
VO box project: overview
Objective
– Enhance the reliability of experiment applications running in the computer centre
– Coordinated by Patricia
Related activities
– Audit of VO boxes: which services are run and how critical they are
– Ensure that operator procedures exist for critical VO boxes in case of failure; in the simplest case, alert the experiment
– Expand the use of Lemon and SLS; sensors can probe the application environment or use meta-information from SLS
– Where possible, automate the recovery of a failed application and/or provide simple instructions and tools to the operators
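The "automate recovery, otherwise alert the experiment" policy above can be sketched as a bounded restart loop. This is a minimal, hypothetical Python sketch: the `is_alive`, `restart` and `alert` callables stand in for whatever Lemon sensor, actuator and operator procedure a given VO box actually uses, and the default of two restart attempts is an assumption.

```python
from typing import Callable


def recover(is_alive: Callable[[], bool],
            restart: Callable[[], None],
            alert: Callable[[str], None],
            max_restarts: int = 2) -> str:
    """Check a service, try a bounded number of automatic restarts, then escalate.

    Returns "ok" if the service was already up, "recovered" if a restart
    brought it back, and "escalated" if the experiment had to be alerted.
    All three callables are hypothetical placeholders.
    """
    if is_alive():
        return "ok"
    for _ in range(max_restarts):
        restart()
        if is_alive():
            return "recovered"
    # Automatic recovery did not work: fall back to the simplest procedure,
    # which is to alert the experiment.
    alert(f"service still down after {max_restarts} restart attempts")
    return "escalated"
```

Bounding the number of restarts keeps a persistently broken service from being restarted forever without a human ever being told.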
5
VO box project: status
ALICE: the first full prototype with Lemon sensors and operator procedures is ready
ATLAS: Flavia coordinated an audit of the ATLAS VO boxes and presented plans for increasing reliability and security. The emphasis was on computer security, but the plans contain many elements for enhanced reliability. CMS was also contacted
CMS: service criticality reviewed. The goal is to provide PhEDEx and DBS with procedures and Lemon alarms by the start of data taking
LHCb: Jiri Horky (a student) and Roberto set up the basic Lemon infrastructure for some DIRAC sensors. LHCb is currently collecting the list of all needed Lemon sensors
Andrea has launched the collection of the required metrics for all experiments, to avoid duplicating metrics and effort
6
ALICE support
Assisting the migration of worker nodes and VO boxes to SL5
– Full-scale deployment of SL5 / gLite 3.2 VO boxes expected before Christmas
Set-up of the CREAM CE for ALICE at all sites
– Recently recommended by the MB for all Tier-1/2 sites
Preparation for a grid-wide MyProxy service for ALICE
– Launched a survey of the 102 sites concerned to clean up obsolete registered hosts
7
ATLAS support
Completed a web interface to display the disk quota and space used by individual ATLAS users and ATLAS sub-groups in the CASTOR analysis stager spaces (Lola)
– Data extracted from Lemon (Maarten)
– Feedback from Guido Negri, ATLAS “space manager”
– To do: web interface to manipulate quota limits for individuals, for sub-groups and within sub-groups
Significant contribution to computing operations
– Helped debug FTS 2.2
– Load on operations increased with data taking
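The per-user and per-sub-group view described above boils down to summing file sizes twice: once per user and once per sub-group, then comparing each total with its quota. The Python sketch below is hypothetical: the input format, the user-to-sub-group mapping and the quota dictionaries are assumptions for illustration, not the data model used with the Lemon data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Row = Tuple[int, Optional[int], bool]  # (used bytes, quota bytes or None, over quota?)


def _rows(usage: Dict[str, int], quotas: Dict[str, int]) -> Dict[str, Row]:
    out: Dict[str, Row] = {}
    for name, used in usage.items():
        quota = quotas.get(name)  # a missing entry means "no limit"
        out[name] = (used, quota, quota is not None and used > quota)
    return out


def usage_report(files: Iterable[Tuple[str, int]],
                 user_group: Dict[str, str],
                 user_quotas: Dict[str, int],
                 group_quotas: Dict[str, int]) -> Dict[str, Dict[str, Row]]:
    """Aggregate space used per user and per sub-group and flag quota overruns.

    files      : (user, size_in_bytes) pairs, one per file in the stager space
    user_group : maps each user name to that user's sub-group (hypothetical)
    """
    per_user: Dict[str, int] = defaultdict(int)
    per_group: Dict[str, int] = defaultdict(int)
    for user, size in files:
        per_user[user] += size
        per_group[user_group[user]] += size  # roll user usage up to the sub-group
    return {"users": _rows(per_user, user_quotas),
            "groups": _rows(per_group, group_quotas)}
```

Keeping user and sub-group quotas in separate dictionaries avoids a clash when a user name happens to equal a sub-group name.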
8
CMS support
A new Lemon metric for PhEDEx that checks its log file for errors is in production. Next steps:
– Trigger a Lemon alarm
– Define a corresponding procedure
On DBS, enabled a Lemon alarm with automatic restart when Tomcat is down
Site readiness
– Ongoing campaign with the Dashboard developers to fix all known bugs
Starting to test FTS 2.2
Our script to prestage files via SRM has been adopted by the data operations team
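A log-checking metric of the kind described for PhEDEx reduces to counting matching lines, and the planned alarm is a threshold on that count. This Python sketch is an illustration only: the `ERROR`/`FATAL` pattern and the default threshold of 5 are assumptions, not the production metric's actual rules.

```python
import re
from typing import Iterable

# Assumed error markers; the real metric's patterns are not given on the slide.
ERROR_RE = re.compile(r"\b(ERROR|FATAL)\b")


def count_errors(lines: Iterable[str]) -> int:
    """Number of log lines matching the assumed error pattern."""
    return sum(1 for line in lines if ERROR_RE.search(line))


def metric(lines: Iterable[str], threshold: int = 5) -> dict:
    """Value a sensor could report, plus the alarm decision (the next step)."""
    n = count_errors(lines)
    return {"errors": n, "alarm": n >= threshold}
```

Separating the count (the metric) from the threshold (the alarm) mirrors the two-step plan: the metric is already in production, while the alarm and its procedure come next.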
9
LHCb support
Using SLS to monitor free space in SRM space tokens
– Information visible in the Site Status Board and the DIRAC portal
Some SAM tests successfully migrated to Nagios
Testing submission to the CREAM CE in DIRAC via the gLite WMS; investigating direct submission to CREAM
Working on a new grid JDL ranking expression
– to prevent small sites from being flooded by too many pilot jobs
– to make more adequate use of sites that apply fair-share mechanisms
Exploring the idea of using the same ping method as SLS to detect hanging experiment applications and services
Discussions with FIO on how to sustain a 3-fold increase in transaction rates for the MySQL databases used by DIRAC
– IT supports Oracle but agrees to add independent instances of DIRAC and the databases
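One way to keep small sites from being flooded is to rank CEs by spare capacity relative to their size, so that a few queued pilots count heavily against a small site and barely at all against a large one. The Python sketch below is a hypothetical illustration of that idea, not the ranking expression LHCb is developing; the `CE` fields and the formula are assumptions, and it does not model fair share.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CE:
    name: str
    total_cpus: int
    free_cpus: int
    waiting_jobs: int


def rank(ce: CE) -> float:
    """Hypothetical rank: (free slots - queued jobs) normalised by site size.

    Waiting pilots weigh far more on a small site than on a large one,
    so a small site stops attracting pilots once a few are queued.
    """
    if ce.total_cpus <= 0:
        return float("-inf")  # no usable capacity: rank last
    return (ce.free_cpus - ce.waiting_jobs) / ce.total_cpus


def order(ces: List[CE]) -> List[str]:
    """CE names from most to least preferred."""
    return [ce.name for ce in sorted(ces, key=rank, reverse=True)]
```

For example, a 20-CPU site with 5 free slots and 10 waiting jobs ranks at -0.25, below a 2000-CPU site with 100 free slots and 50 waiting jobs (0.025).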
10
Other activities
Participated in revising the HEP-SSC part of the ROSCOE (Robust Scientific Communities for EGI) proposal
Increased the number of grid pool accounts (500 for ALICE and LHCb, 1000 for ATLAS and CMS)
– Shown to be sufficient in the CMS October exercise
Investigating the impact on the MyProxy servers of using SCAS to switch identities in multi-user pilot jobs
Updating the gLite User Guide (Lola, Andrea)
Data management support (Andrea)
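Pool accounts matter because each distinct grid identity is leased its own local account from a fixed pool, so the pool size caps how many identities a site can map at once. This in-memory Python sketch only illustrates that leasing idea; the account naming scheme and the error on exhaustion are assumptions, and it is not how the gLite mapping tools store their leases.

```python
from typing import Dict, List


class PoolAccounts:
    """Hypothetical in-memory pool-account leasing: each grid identity (DN)
    gets one local account from a fixed pool and keeps it on later requests."""

    def __init__(self, prefix: str, size: int) -> None:
        width = len(str(size))  # e.g. size 1000 gives names cms0001..cms1000
        self.free: List[str] = [f"{prefix}{i:0{width}d}" for i in range(1, size + 1)]
        self.lease: Dict[str, str] = {}

    def map(self, dn: str) -> str:
        if dn in self.lease:  # identity already mapped: reuse its account
            return self.lease[dn]
        if not self.free:
            raise RuntimeError("pool exhausted: no free accounts left")
        account = self.free.pop(0)
        self.lease[dn] = account
        return account
```

With 1000 CMS accounts, the 1001st distinct identity would hit the exhaustion error, which is the situation the larger pools are meant to avoid.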
11
Conclusions
Support for integration
– CREAM, FTS, SCAS, MyProxy, data management, etc.
Support for operations
– VO box project, disk space management, troubleshooting, etc.
Support for monitoring
– Site readiness, Nagios, SLS, Lemon, etc.
Support for the user community
– gLite User Guide, ROSCOE proposal, etc.