GGUS summary (4 weeks)

VO       User  Team  Alarm  Total
ALICE       5     0      1      6
ATLAS      28   215      6    249
CMS         6     1      1      8
LHCb        2    38      1     41
Totals     41   254      9    304
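The totals in this summary can be cross-checked with a few lines of Python. This is only an illustrative sketch: the names `counts` and `summarize` are assumptions, and the numbers are copied from the summary above.

```python
# Per-VO ticket counts from the 4-week GGUS summary: (User, Team, Alarm).
counts = {
    "ALICE": (5, 0, 1),
    "ATLAS": (28, 215, 6),
    "CMS": (6, 1, 1),
    "LHCb": (2, 38, 1),
}


def summarize(counts):
    """Return per-VO row totals and per-category (User, Team, Alarm) totals."""
    row_totals = {vo: sum(row) for vo, row in counts.items()}
    column_totals = tuple(sum(col) for col in zip(*counts.values()))
    return row_totals, column_totals


row_totals, column_totals = summarize(counts)
grand_total = sum(column_totals)

print(row_totals)
print(column_totals, grand_total)
```

The Alarm column total is the figure to compare against the number of ALARM tickets quoted in the report text.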



6/23/2016, WLCG MB Report, WLCG Service Report

Support-related events since last MB

- We need WLCG shifters, alarmers and management to give us meaningful values for the GGUS ‘Problem Type’ field, so that periodic reporting shows the weak areas in support more clearly.
- GGUS:61440 (CNAF-BNL network problem) was re-opened by ATLAS until the network problem is fully understood.
- EMI insists on changing the GGUS supporters’ privileges so that assignment to middleware-related Support Units (SUs) is possible only by the EGI DMSU (Deployed Middleware SU). Although this matches the ‘Service Desk’ spirit, it might slow things down. As we no longer have USAG, we need the WLCG community’s input offline a.s.a.p.
- There were 9 ALARM tickets since the Sept. 28th MB (4 weeks), 5 of which were real, all submitted by ATLAS. No ALARMs since the Oct 12th MB (where the WLCG report was not given). Details follow…

ATLAS ALARM -> CERN-CNAF TRANSFERS

What time UTC      What happened
2010/10/05 9:13    GGUS ALARM ticket opened, automatic notification to AND automatic assignment to ROC_Italy.
2010/10/05 10:23   Site acknowledges ticket and finds a StoRM backend problem.
2010/10/05 12:03   Service restored. Site puts the ticket to ‘solved’ and refers to GGUS:62745 for details.
2010/10/11 9:48    Submitter of ticket GGUS:62745 sets status ‘verified’.

Neither of the 2 tickets explains what the problem/diagnostic/solution actually was…

ATLAS ALARM -> TRANSFERS TO .FR CLOUD

What time UTC      What happened
2010/10/08 5:56    GGUS ALARM ticket opened, automatic notification to AND automatic assignment to
2010/10/08 6:31    Site acknowledges ticket and finds a network problem preventing all DB server access.
2010/10/08 7:29    Service restored.
2010/10/08 10:41   Site puts ticket to status ‘solved’.
2010/10/14 8:39    Submitter sets the ticket to status ‘verified’.

ATLAS ALARM -> CERN SLOW LSF

What time UTC      What happened
2010/09/27 15:34   GGUS ALARM ticket opened, automatic notification to AND automatic assignment to ROC_CERN.
2010/09/27 16:01   Operator acknowledges ticket and contacts the expert.
2010/09/27 16:37   Expert’s 1st diagnosis. Too many queries.
2010/09/27 20:10   Service mgr kills a home-made robot by another experiment launching >> bjob queries and puts ticket to status ‘solved’.
2010/09/28 12:21   Submitter sets ticket to status ‘verified’.

ATLAS ALARM -> CERN SLOW AFS

What time UTC      What happened
2010/10/01 7:13    GGUS ALARM ticket opened, automatic notification to AND automatic assignment to ROC_CERN.
2010/10/01 7:33    Operator acknowledges ticket and contacts the expert.
2010/10/01 9:37    IT Service manager re-classifies in CERN Remedy PRMS.
2010/10/11 15:33   Still ‘in progress’. Reminder sent during this drill.
2010/10/25 15:56   Still ‘in progress’. No reaction to the Oct 11th reminder.

ATLAS ALARM -> CERN CASTOR

What time UTC      What happened
2010/10/01 16:24   GGUS ALARM ticket opened, automatic notification to AND automatic assignment to ROC_CERN.
2010/10/01 16:41   Operator acknowledges ticket and contacts the expert.
2010/10/01 16:42   Expert starts investigation.
2010/10/01 17:23   Solved. Put DONE in SRM not propagated to CASTOR. Done by hand.
2010/10/01 17:45   Submitter ‘verified’. Shifter added x-ref to GGUS:62705.
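The five ALARM timelines above can be compared by computing the delay from ticket opening to acknowledgement and to resolution. The following is a minimal sketch, not part of the original report: the labels, the `tickets` structure and the helper `minutes_between` are assumptions made for illustration, and the timestamps are copied from the slides. For the CNAF and .fr cases the third timestamp is the ‘Service restored’ entry; for the AFS case it is `None` because the ticket was still ‘in progress’.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"

# (opened, acknowledged, restored-or-solved), all UTC, taken from the slides.
tickets = {
    "CERN-CNAF transfers": ("2010/10/05 9:13", "2010/10/05 10:23", "2010/10/05 12:03"),
    "Transfers to .fr cloud": ("2010/10/08 5:56", "2010/10/08 6:31", "2010/10/08 7:29"),
    "CERN slow LSF": ("2010/09/27 15:34", "2010/09/27 16:01", "2010/09/27 20:10"),
    "CERN slow AFS": ("2010/10/01 7:13", "2010/10/01 7:33", None),
    "CERN CASTOR": ("2010/10/01 16:24", "2010/10/01 16:41", "2010/10/01 17:23"),
}


def minutes_between(start, end):
    """Whole minutes from start to end, or None if the end time is unknown."""
    if end is None:
        return None
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


# Map each ticket label to (minutes to acknowledge, minutes to restore/solve).
delays = {
    name: (minutes_between(opened, acked), minutes_between(opened, done))
    for name, (opened, acked, done) in tickets.items()
}

for name, (to_ack, to_done) in delays.items():
    print(f"{name}: acknowledged after {to_ack} min, restored/solved after {to_done} min")
```

Keeping the unresolved AFS ticket as `None` rather than dropping it makes the open case visible in the same listing.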