Andrea Sciabà, CERN
CMS availability in December
Critical services: CE, SRMv2 (since December)
Critical tests:
  CE: job submission (run by CMS), CA certs (run by OPS)
  SRMv2: “lcg-cp” (copies a local file to the SRM in the provided tokens and default space, then copies it back) (run by CMS)
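The SRMv2 test above is a round-trip copy: write a file to storage, read it back, and check the result. The following is a minimal sketch of that idea only, not the actual CMS test. The names `round_trip_ok` and `local_copy` are hypothetical, and a local directory stands in for the SRM endpoint so that the example runs anywhere; a real test would plug in a function that invokes the grid copy client instead.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path
from typing import Callable

# A copy function takes (source, destination); hypothetical interface.
CopyFn = Callable[[Path, Path], None]


def round_trip_ok(copy: CopyFn, storage_dir: Path) -> bool:
    """Copy a probe file to storage and back; True if the content survives."""
    with tempfile.TemporaryDirectory() as work:
        local = Path(work) / "probe.txt"
        local.write_text("sam-probe\n")
        remote = storage_dir / "probe.txt"      # stand-in for the SRM location
        returned = Path(work) / "probe.back"
        try:
            copy(local, remote)      # local file -> storage
            copy(remote, returned)   # storage -> local file
        except OSError:
            return False             # any transfer error fails the test
        # Compare file contents byte by byte, not just metadata.
        return filecmp.cmp(local, returned, shallow=False)


def local_copy(src: Path, dst: Path) -> None:
    """Stand-in transfer using the local filesystem (assumption for the demo)."""
    shutil.copyfile(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as storage:
        print(round_trip_ok(local_copy, Path(storage)))
```

Passing the copy step in as a function keeps the pass/fail logic separate from the transfer mechanism, so the same check can be exercised without grid access.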

CERN: Just a few SRM failures; downtime on 9/12 due to Oracle upgrade. Excellent reliability.

ASGC: Oracle problem caused a very high load on the CASTOR database; debugged with help from the CERN CASTOR and DB teams. Poor reliability; no downtime was scheduled to fix the problem.

CNAF: Excellent reliability. A problem not shown by any CMS critical test: software area was unavailable on 1-2/12 due to GPFS problems.

FNAL: CE at FNAL still ignored in GridView: CE and SRMv2 are in two different BDII sites, and GridView does not aggregate them. Very good reliability, but risk of overestimating it because the CE is missing.

IN2P3: CE: electrical maintenance on 1-4/12; jobs stuck on 26/12. SRMv2: downtimes declared 17-19/12 (why barely visible on GridView?); SRM overloaded due to recursive srmls issued by CMS ProdAgent. Very poor reliability; diagnosis took a long time.

PIC: Maintenance from 14/12 to 16/12 due to storage service upgrade. Unreliability on 16/12 “amplified” due to CE test failure in a short time window not in maintenance? Modulo that, excellent reliability.
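The “amplified” unreliability at PIC can be illustrated with a small calculation. This sketch assumes reliability is measured as the fraction of time a site is up, counting only the hours outside scheduled maintenance; that definition, the function name `reliability`, and the hour values below are illustrative assumptions, not figures from the slides.

```python
def reliability(total_h: float, scheduled_down_h: float, failed_h: float) -> float:
    """Fraction of non-maintenance time during which tests passed.

    failed_h counts only failures that fall outside scheduled maintenance.
    """
    counted = total_h - scheduled_down_h   # hours that count toward reliability
    if counted <= 0:
        raise ValueError("no time outside scheduled maintenance")
    if not 0 <= failed_h <= counted:
        raise ValueError("failed hours must fit in the counted window")
    return (counted - failed_h) / counted


if __name__ == "__main__":
    # Hypothetical day with no maintenance: a 1-hour failure barely matters.
    print(reliability(24, 0, 1))
    # Hypothetical day with 22 hours of maintenance: the same 1-hour failure
    # now covers half of the 2-hour window that is counted.
    print(reliability(24, 22, 1))
```

Under these assumptions the same one-hour test failure costs about 4% of a normal day but 50% of a day that is almost entirely in scheduled maintenance, which is the amplification effect the slide describes.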

FZK: CE: extended downtime of all CEs from 25/12 to 29/12, then frequent job aborts (“unspecified gridmanager error”). SRMv2: several dCache errors, most about full devices. Poor reliability during the holidays.

RAL: Excellent reliability.