EIS section review of recent activities
Harry Renshall, Andrea Sciabà
IT-GS group meeting
CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it

Presentation transcript:

EIS section review of recent activities
Harry Renshall, Andrea Sciabà
IT-GS group meeting, November 10, 2009

Contents
Section overview
VO box project
ALICE activities
ATLAS activities
CMS activities
LHCb activities
Other activities
Conclusions

Section overview
Section members
  – ALICE: Patricia, Lola
  – ATLAS: Alessandro
  – CMS: Andrea, Nicolò
  – LHCb: Roberto, Harry (as IT-LHCb management liaison)
Ongoing projects
  – VO box project
  – Migration of SAM tests to Nagios
Other activities
  – See next slides

VO box project: overview
Objective
  – Enhance reliability of experiment applications running in the computer centre
  – Coordinated by Patricia
Related activities
  – Audit of VO boxes: what services are run and how critical they are
  – Ensure that operator procedures exist for critical VO boxes in case of failure. Simplest case: alert the experiment
  – Expand usage of Lemon and SLS. Sensors can probe the application environment or use meta-information from SLS
  – Where possible, automate recovery of a failed application and/or provide simple instructions and tools to the operators
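The last item above, automated recovery with a fallback to the operators, can be illustrated with a minimal Python sketch. It is not the Lemon or SLS implementation: the function `recover_or_escalate`, the `FakeService` class and the limit of two restart attempts are all hypothetical and chosen only to show the control flow.

```python
from typing import Callable


def recover_or_escalate(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    alert_operator: Callable[[str], None],
    max_restarts: int = 2,  # assumed limit, not taken from the slides
) -> str:
    """Try to restart a failed service; alert the operator if that does not help."""
    if is_healthy():
        return "ok"
    for _ in range(max_restarts):
        restart()
        if is_healthy():
            return "recovered"
    alert_operator("service still down after %d restart attempts" % max_restarts)
    return "escalated"


class FakeService:
    """Stand-in for an experiment application (hypothetical, for the demo only)."""

    def __init__(self, restarts_needed: int) -> None:
        self.restarts_needed = restarts_needed
        self.restarts = 0

    def is_healthy(self) -> bool:
        return self.restarts >= self.restarts_needed

    def restart(self) -> None:
        self.restarts += 1


alerts = []
service = FakeService(restarts_needed=1)
outcome = recover_or_escalate(service.is_healthy, service.restart, alerts.append)
print(outcome, alerts)
```

Passing the health check, the restart action and the alert as callables keeps the policy separate from any particular monitoring system.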

VO box project: status
ALICE: first full prototype with Lemon sensors and operator procedures is ready
ATLAS: Flavia has coordinated an audit of ATLAS VO boxes and presented plans for increasing reliability and security. The emphasis was on computer security, but the plans contain many elements for enhanced reliability. CMS was also contacted
CMS: service criticality reviewed. The goal is to provide PhEDEx and DBS with procedures and Lemon alarms by the start of data taking
LHCb: Jiri Horky (a student) and Roberto set up the basic Lemon infrastructure for some DIRAC sensors. LHCb is currently collecting the list of all needed Lemon sensors
Andrea has triggered the collection of required metrics for all experiments, to avoid duplication of metrics and effort

ALICE support
Assist in the migration of worker nodes and VO boxes to SL5
  – Full-scale deployment of SL5 / gLite 3.2 VO boxes expected before Christmas
Setup of CREAM CE for ALICE at all sites
  – Recently recommended by the MB for all Tier-1/2 sites
Preparation for a grid-wide MyProxy service for ALICE
  – Launched a survey of the 102 sites concerned to clean up obsolete registered hosts

ATLAS support
Completed a web interface to display the disk quota and space used by individual ATLAS users and ATLAS subgroups in the CASTOR analysis stager spaces (Lola)
  – Data extracted from Lemon (Maarten)
  – Feedback from Guido Negri, ATLAS “space manager”
  – To do: web interface to manipulate quota limits for individuals, subgroups and within subgroups
Significant contribution to computing operations
  – Helped debug FTS 2.2
  – Load on operations increased with data taking

CMS support
New Lemon metric for PhEDEx, which checks for errors in the log file, is in production. Next steps:
  – Trigger a Lemon alarm
  – Define a corresponding procedure
On DBS, enabled a Lemon alarm with automatic restart when Tomcat is down
Site readiness
  – Ongoing campaign with Dashboard developers to fix all known bugs
Starting to test FTS 2.2
Our script to prestage via SRM has been adopted by the data operations team
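As an illustration of the log-file metric and the planned alarm step, here is a minimal Python sketch. It does not use the Lemon sensor API, and the real PhEDEx log format is not given in the slides: the `ERROR|FATAL` pattern, the sample log lines and the alarm threshold are assumptions made for the example.

```python
import re
from typing import Iterable

# Hypothetical pattern: the real PhEDEx log format is not given in the slides.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def count_error_lines(lines: Iterable[str], pattern: re.Pattern = ERROR_PATTERN) -> int:
    """Return the number of log lines that match the error pattern."""
    return sum(1 for line in lines if pattern.search(line))


def should_raise_alarm(error_count: int, threshold: int = 5) -> bool:
    """Raise an alarm once the error count reaches the (assumed) threshold."""
    return error_count >= threshold


# Invented sample lines, for demonstration only.
sample_log = [
    "2009-11-09 10:00:01 INFO transfer started",
    "2009-11-09 10:00:05 ERROR transfer failed: timeout",
    "2009-11-09 10:00:09 INFO retrying",
    "2009-11-09 10:00:12 FATAL agent stopped",
]

metric = count_error_lines(sample_log)
print(metric, should_raise_alarm(metric, threshold=2))
```

Keeping the metric (a count) separate from the alarm decision (a threshold) mirrors the two stages listed on the slide: the metric is in production first, and the alarm is added afterwards.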

LHCb support
Using SLS to monitor free space in SRM space tokens
  – Information visible in Site Status Board and DIRAC portal
Some SAM tests successfully migrated to Nagios
Testing submission to CREAM CE in DIRAC via gLite WMS; investigating direct submission to CREAM
Working on a new grid JDL ranking expression
  – To prevent small sites from being flooded by too many pilot jobs
  – To allow more adequate usage of sites using fair-share mechanisms
Exploring the idea of using the same ping method as SLS to detect hanging experiment applications and services
Discussions with FIO on how to sustain a 3-fold increase in transaction rates for the MySQL databases used by DIRAC
  – IT supports Oracle but agrees to add independent instances of DIRAC and the databases
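The free-space monitoring mentioned in the first item can be sketched as a simple mapping from space-token usage to a status. This is only an illustration: it does not query SRM or publish to SLS, and the function name, the token names and the 10% / 5% thresholds are hypothetical.

```python
def free_space_status(
    total_bytes: int,
    used_bytes: int,
    warn_fraction: float = 0.10,      # assumed warning threshold
    critical_fraction: float = 0.05,  # assumed critical threshold
) -> str:
    """Classify a space token by the fraction of its space that is still free."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    free_fraction = (total_bytes - used_bytes) / total_bytes
    if free_fraction < critical_fraction:
        return "critical"
    if free_fraction < warn_fraction:
        return "warning"
    return "ok"


# Invented token names and sizes (total, used), for demonstration only.
tokens = {
    "EXAMPLE_DISK": (100, 50),
    "EXAMPLE_USER": (100, 93),
    "EXAMPLE_TAPE_BUFFER": (100, 97),
}

for name, (total, used) in tokens.items():
    print(name, free_space_status(total, used))
```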

Other activities
Participated in revising the HEP-SSC part of the ROSCOE (Robust Scientific Communities for EGI) proposal
Increased number of grid pool accounts (500 for ALICE and LHCb, 1000 for ATLAS and CMS)
  – Shown to be enough in the CMS October exercise
Investigating the impact on the MyProxy servers of using SCAS to change identities in multi-user pilot jobs
Update of the gLite User Guide (Lola, Andrea)
Data management support (Andrea)

Conclusions
Support to integration
  – CREAM, FTS, SCAS, MyProxy, data management, etc.
Support to operations
  – VO box project, disk space management, troubleshooting, etc.
Support to monitoring
  – Site readiness, Nagios, SLS, Lemon, etc.
Support to user community
  – gLite User Guide, ROSCOE proposal, etc.