GGUS summary (4 weeks)

VO       User   Team   Alarm   Total
ALICE    5      0      1       6
ATLAS
CMS      6      1      1       8
LHCb
Totals
6/23/2016 WLCG MB Report WLCG Service Report 2

Support-related events since last MB

- We need WLCG shifters, alarmers and management to give us meaningful values for the GGUS ‘Problem Type’ field, so that periodic reporting can better reveal weak areas in support.
- GGUS:61440 (CNAF-BNL network problem) was re-opened by ATLAS until the network problem is fully understood.
- EMI insists on changing the GGUS supporters’ privileges, such that assignment to middleware-related Support Units (SUs) is only possible by the EGI DMSU (Deployed Middleware SU). Although this matches the ‘Service Desk’ spirit, it might slow things down. As we no longer have a USAG, we need the WLCG community's input offline a.s.a.p.
- There were 9 ALARM tickets since the Sept. 28th MB (4 weeks), 5 of which were real, all submitted by ATLAS. There have been no ALARMs since the Oct. 12th MB (where the WLCG report was not given). Details follow…
ATLAS ALARM -> CERN-CNAF TRANSFERS

What time (UTC)     What happened
2010/10/05 9:13     GGUS ALARM ticket opened, automatic notification to […] AND automatic assignment to ROC_Italy.
2010/10/05 10:23    Site acknowledges ticket and finds a StoRM backend problem.
2010/10/05 12:03    Service restored. Site puts the ticket to ‘solved’ and refers to GGUS:62745 for details.
2010/10/11 9:48     Submitter of ticket GGUS:62745 sets status ‘verified’. Neither of the 2 tickets explains what the problem, diagnosis or solution actually was…
ATLAS ALARM -> TRANSFERS TO .FR CLOUD

What time (UTC)     What happened
2010/10/08 5:56     GGUS ALARM ticket opened, automatic notification to […] AND automatic assignment to […]
2010/10/08 6:31     Site acknowledges ticket and finds a network problem preventing all DB server access.
2010/10/08 7:29     Service restored.
2010/10/08 10:41    Site puts ticket to status ‘solved’.
2010/10/14 8:39     Submitter sets the ticket to status ‘verified’.
ATLAS ALARM -> CERN SLOW LSF

What time (UTC)     What happened
2010/09/27 15:34    GGUS ALARM ticket opened, automatic notification to […] AND automatic assignment to ROC_CERN.
2010/09/27 16:01    Operator acknowledges ticket and contacts the expert.
2010/09/27 16:37    Expert’s 1st diagnosis: too many queries.
2010/09/27 20:10    Service manager kills a home-made robot, run by another experiment, that was launching >> bjob queries, and puts ticket to status ‘solved’.
2010/09/28 12:21    Submitter sets ticket to status ‘verified’.
ATLAS ALARM -> CERN SLOW AFS

What time (UTC)     What happened
2010/10/01 7:13     GGUS ALARM ticket opened, automatic notification to […] AND automatic assignment to ROC_CERN.
2010/10/01 7:33     Operator acknowledges ticket and contacts the expert.
2010/10/01 9:37     IT Service manager re-classifies in CERN Remedy PRMS.
2010/10/11 15:33    Still ‘in progress’. Reminder sent during this drill.
2010/10/25 15:56    Still ‘in progress’. No reaction to the Oct. 11th reminder.
ATLAS ALARM -> CERN CASTOR

What time (UTC)     What happened
2010/10/01 16:24    GGUS ALARM ticket opened, automatic notification to […] AND automatic assignment to ROC_CERN.
2010/10/01 16:41    Operator acknowledges ticket and contacts the expert.
2010/10/01 16:42    Expert starts investigation.
2010/10/01 17:23    Solved. Put DONE in SRM not propagated to CASTOR. Done by hand.
2010/10/01 17:45    Submitter ‘verified’. Shifter added x-ref to GGUS:62705.
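
The five ALARM timelines above can be compared by how long each took from ticket opening to acknowledgement and to service restoration (or ‘solved’). The following is a minimal Python sketch, not part of the original report: the timestamps are copied from the slides, while the names `TICKETS`, `minutes_between` and `summarize` are illustrative assumptions. The AFS ticket was still ‘in progress’ at the time of the report, so it has no restoration time.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M"  # timestamp layout used in the slides (UTC)

# (label, opened, acknowledged, restored-or-solved); None = still 'in progress'.
# All timestamps are taken from the timelines in this report.
TICKETS = [
    ("CERN-CNAF transfers", "2010/10/05 9:13", "2010/10/05 10:23", "2010/10/05 12:03"),
    ("Transfers to .FR cloud", "2010/10/08 5:56", "2010/10/08 6:31", "2010/10/08 7:29"),
    ("CERN slow LSF", "2010/09/27 15:34", "2010/09/27 16:01", "2010/09/27 20:10"),
    ("CERN slow AFS", "2010/10/01 7:13", "2010/10/01 7:33", None),
    ("CERN CASTOR", "2010/10/01 16:24", "2010/10/01 16:41", "2010/10/01 17:23"),
]


def minutes_between(start, end):
    """Whole minutes elapsed between two timestamps written in FMT."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds() // 60)


def summarize(tickets):
    """Return (label, minutes to acknowledge, minutes to restore or None)."""
    rows = []
    for label, opened, acked, fixed in tickets:
        to_ack = minutes_between(opened, acked)
        to_fix = minutes_between(opened, fixed) if fixed else None
        rows.append((label, to_ack, to_fix))
    return rows


for label, to_ack, to_fix in summarize(TICKETS):
    fixed_text = f"{to_fix} min" if to_fix is not None else "still in progress"
    print(f"{label}: acknowledged after {to_ack} min, restored/solved after {fixed_text}")
```

The same function could be fed the ‘verified’ timestamps to measure how long submitters take to confirm a fix.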