LHCOPN operational working group Guillaume Cessieux (CNRS/FR-CCIN2P3 – EGEE SA2) third meeting CERN – December th,
Background
LHCOPN Copenhagen,
– Sites & NRENs to give feedback
– Improve relationships with LCG
– Where is GGUS?
– What is the roadmap?
GCX
Agenda
1. Ops model
– Feedback (sites - NRENs - LCG)
2. Information repository
– CERN Twiki
– GGUS
3. Implementation
– Testing
– Assessment
– Roadmap
1- Ops model
Overview of sites’ feedback

Site            Remark
CA-TRIUMF       No clear agreement
CH-CERN         Ops wg member
DE-KIT          Ops wg member
ES-PIC          Ops wg member
FR-CCIN2P3      Ops wg member
IT-INFN-CNAF    No answer
NDGF
NL-T1
TW-ASGC         No clear agreement
UK-T1-RAL       Ops wg member & confirmed
US-FNAL-CMS     No answer
US-T1-BNL       No answer
Summary of sites’ feedback (1/3)
CA-TRIUMF
– Fear of significant additional load for small events
ES-PIC
– For L3 IM, indicate to look at monitoring
– L3 IM: create ticket and then investigate
– Manage duplicate tickets by flagging one as duplicate
– Interface with CIC portal to centralise actions needed?
Summary of sites’ feedback (2/3)
FR-CCIN2P3
– Would be great if no Grid certificates were needed for the TTS (i.e. use certificates of the institute)
TW-ASGC
– How to deal with links outside the LHCOPN but affecting the LHCOPN?
Summary of sites’ feedback (3/3)
NL-T1
– Report incidents still solved when noticed
– Open a ticket and then investigate
– What are « major » changes?
– L2 IM: T{0,1} sites should interact
– Escalation process quite vague
– Several other details
Summary of network providers’ feedback (1/2)
DANTE
– Model not reliable enough: “Not prepared for the worst”
DFN
– Model cannot work seriously in a stable mode
– Inappropriate way to operate such a network: hot potatoes, cost, distributed ownership of trouble
– Works only if L3 topology is mapped on L2
Summary of network providers’ feedback (2/2)
RENATER
– E2ECU should be there and playing a role
USLHCNET
– Twiki seems unclear
– Model to tie more closely with LCG
– (What about links for T2 traffic?)
Grid feedback (1/2)
November’s GDB
– Ops model seems ok...
Rename Grid data manager → Grid data contact
– To be nominated by sites (FTS managers?)
– Role? Still unclear: no way to smartly warn VOs, experiments and Grid operations
– Grid interaction to be sorted out
EGEE broadcast not sufficient
– Need finer and more formalised
Grid feedback (2/2)
Change management DB access policy
Where is monitoring?
Scheduled downtime policy? WLCG rule to be checked
AOB raised: Sister notion for each T1
– Is this ok on network side?
Asymmetric routing and performance key issue
T1-T1 traffic with IT-INFN-CNAF...
2- Information repository
Twiki (1/3)
Authentication
– View/change allowed only for people authenticated on CERN twiki:
<!--
   * Set NOSEARCHALL = on
   * Set DENYTOPICVIEW = TWikiGuest
   * Set ALLOWTOPICVIEW =
   * Set DENYTOPICCHANGE = TWikiGuest
   * Set ALLOWTOPICCHANGE =
   * Set DENYTOPICRENAME = TWikiGuest
   * Set ALLOWTOPICRENAME =
-->
Some pages are now protected with that
– Contacts, access details...
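The effect of the settings above can be illustrated with a small Python sketch. It is a simplified model written for this note, not TWiki's own access-control code: it assumes that the deny list is checked first and that an empty allow list imposes no further restriction. The function names and the user name `SomeAuthenticatedUser` are hypothetical.

```python
# Simplified model of the topic-level settings shown on the slide.
# Assumed semantics (not taken from TWiki's code): deny list first,
# then an allow list that only restricts access when it is non-empty.
SETTINGS = """
   * Set NOSEARCHALL = on
   * Set DENYTOPICVIEW = TWikiGuest
   * Set ALLOWTOPICVIEW =
   * Set DENYTOPICCHANGE = TWikiGuest
   * Set ALLOWTOPICCHANGE =
   * Set DENYTOPICRENAME = TWikiGuest
   * Set ALLOWTOPICRENAME =
"""


def parse_settings(text):
    """Collect 'Set NAME = value' bullets into {NAME: [values]}."""
    settings = {}
    for chunk in text.split("*"):
        chunk = chunk.strip()
        if not chunk.startswith("Set ") or "=" not in chunk:
            continue
        name, _, value = chunk[len("Set "):].partition("=")
        settings[name.strip()] = [v.strip() for v in value.split(",") if v.strip()]
    return settings


def is_allowed(settings, user, action):
    """Return True if `user` may perform `action` (VIEW, CHANGE or RENAME)."""
    denied = settings.get("DENYTOPIC" + action, [])
    allowed = settings.get("ALLOWTOPIC" + action, [])
    if user in denied:
        return False
    if allowed and user not in allowed:
        return False
    return True


settings = parse_settings(SETTINGS)
for user in ("TWikiGuest", "SomeAuthenticatedUser"):
    rights = {a: is_allowed(settings, user, a) for a in ("VIEW", "CHANGE", "RENAME")}
    print(user, rights)
```

Under these assumptions the guest account is refused all three actions, while any other (authenticated) account is accepted, which matches the intent stated on the slide.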
Twiki (2/3)
Notifications through WebNotify are OK
Twiki (3/3)
OK to keep the change management DB in the Twiki?
Reorganisation of some areas?
– Technical contacts, operational contacts, NOC... Not obvious where this is
Only one regular twiki problem:
GGUS
Access opened
Some feedback on the system?
Group certificate?
Reminder and notifications strategy
Calendar/planning – requirements?
LHCOPN look? Logos, stylesheets
– Should we use:
– Licensing?
Sample notifications/reminders

***********************************************************************
This is an automated REMINDER mail. Please DO NOT REPLY!!!
***********************************************************************
Dear support staff,
this is a list of currently open tickets for support unit "VOMS" ordered by priority colour.
Reference link: https://gus.fzk.de/ws/ticket_search.php?supportunit=VOMS&status=open&radiotf=1&timeframe=no
1 open ticket(s) RED:
open ticket(s) AMBER:
0 open ticket(s) YELLOW:
0 open ticket(s) GREEN:

Dear support staff,
this is a list of currently open tickets for support unit "TPM" ordered by priority colour.
Reference link: https://gus.fzk.de/ws/ticket_search.php?supportunit=TPM&status=open&radiotf=1&timeframe=no
4 OPEN TICKET(S) RED:
OPEN TICKET(S) AMBER:
2 OPEN TICKET(S) YELLOW:
OPEN TICKET(S) GREEN:

Dear T1 network staff,
ticket #39579 is updated.
Reference Link : https://iwrgustrain.fzk.de/ws/ticket_info.php?ticket=39579
Ticket-ID :
Responsible T1 :
Status : in progress
Short description: test51
Impacted Links : CERN-ASGC-LHCOPN-002,CERN-BNL-LHCOPN-001,
Priority : less urgent
Type of Impact : Connectivity
Ticket Category : Incident L2
Last Modifier : Guillaume Cessieux
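As a way to check that such notifications really carry the information a site needs, the "Key : Value" body of the ticket update mail could be parsed automatically. The following Python sketch assumes the one-field-per-line layout of the sample above; the function name `parse_notification` is hypothetical and this is not part of GGUS.

```python
# Minimal sketch: parse the "Key : Value" lines of a ticket update mail.
# The sample reuses only field values visible in the notification above.
SAMPLE = """\
Reference Link : https://iwrgustrain.fzk.de/ws/ticket_info.php?ticket=39579
Status : in progress
Short description: test51
Impacted Links : CERN-ASGC-LHCOPN-002,CERN-BNL-LHCOPN-001,
Priority : less urgent
Type of Impact : Connectivity
Ticket Category : Incident L2
Last Modifier : Guillaume Cessieux
"""


def parse_notification(body):
    """Return a dict of fields; 'Impacted Links' becomes a list of link IDs."""
    fields = {}
    for line in body.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        fields[key.strip()] = value.strip()
    links = fields.get("Impacted Links", "")
    fields["Impacted Links"] = [l.strip() for l in links.split(",") if l.strip()]
    return fields


fields = parse_notification(SAMPLE)
print(fields["Status"], fields["Impacted Links"])
```

Splitting on the first colon is a deliberate choice here: it tolerates both "Key : Value" and "Key: Value" spellings, which both occur in the sample.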
Wrap up on Ops model
E2emon deployment/reliability: how is this really followed?
Change “problem management” name because confusing
Define the term “unreasonable” for escalation process
New link-IDs for “hidden” links
– LHCOPN-TW-ASGC-AMS-TPE-001?
– DE-KIT-I-II-LHCOPN-001?
Trouble/ticket’s responsibility
3- Implementation
Implementation: Next steps?
Fill the GGUS authentication table...
Ops model implementation testing?
Reminders, notifications...
– Use “Test” tickets?
– Be sure people are really reached
Ops model testing
– L2: Ask site to disconnect? To filter all traffic?
– L3: Ask NREN to simulate a rogue cut?
– Use backup tests?
Part of a (regular?) process?
Ops model roadmap
(timeline figure; dates not recoverable)
Timeline labels: LHCOPN Copenhagen; Ops model v2 proposed; Beta release of LHCOPN GGUS TTS; LHC startup; LHCOPN Berlin; Public release of TTS; Trial implementation; Working implementation; Ops testing dates; Backup tests...; End of EGEE-III
Quality assessment
Infrastructure and operations
Regular standard reports
Way to be protected from passivity of sites?
Service view provided for the LCG project
Responsibilities for that?
– Separate conclusion from processes to gather metrics
Backup tests
Process seems ok
– Frequency unclear
– Roadmap?
Responsibilities for that?
Pending actions
AOB
Monitoring?
– Role of ENOC’s ASPDrawer?
– ENOC’s DownCollector for the LHCOPN?
What to present at LHCOPN meeting?
Next ops meeting/phoneconf?
– Combined with another event?
– More people?
Provide some real-life examples of Ops model implementation
Extra slides
Links
Ops model –
Ops WG page –
GGUS
– Dashboard:
– Submit interface:
Proposed site implementation
(diagram) Labels: Grid Project (LCG); Sites (T0/T1s); Grid Data Manager; Router Operators / Site NOC; Grid; Network; Network providers; A, B, C
L3 mapping over L2
(diagram: two panels, “~ Now” and “Future?”, each showing Site 1, Site 2 and Site 3 with L3 and L2 layers)
How does this impact the ops model?
LHCOPN’s “hidden” links
TW-ASGC LHCOPN connectivity
(diagram) Nodes: CH-CERN R1, CH-CERN R2, US-FNAL-CMS, NL-T1, TW-ASGC, AMS, CHI, TPE
LHCOPN links
– ASGC-FERMI-LHCOPN-001, 1Gb
– CERN-ASGC-LHCOPN-004, 2Gb
– CERN-ASGC-LHCOPN-003, 10Gb
– ASGC-SARA-LHCOPN-001, 1Gb
Not LHCOPN links, but could affect LHCOPN connectivity
– AMS-CHI-001, 2.5Gb
– CHI-TPE-001, 2.5Gb
– AMS-TPE-001, 10Gb
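The distinction between LHCOPN links and "hidden" links on this slide can be sketched in a few lines of Python. The sketch assumes, from the link IDs listed here only, that an LHCOPN link carries an `LHCOPN` token in its ID while the other links do not; this rule and the helper names are illustrative assumptions, not an agreed convention.

```python
# Link IDs and capacities (in Gb) as listed on the slide.
LINKS = [
    ("ASGC-FERMI-LHCOPN-001", 1.0),
    ("CERN-ASGC-LHCOPN-004", 2.0),
    ("CERN-ASGC-LHCOPN-003", 10.0),
    ("ASGC-SARA-LHCOPN-001", 1.0),
    ("AMS-CHI-001", 2.5),
    ("CHI-TPE-001", 2.5),
    ("AMS-TPE-001", 10.0),
]


def is_lhcopn_link(link_id):
    """Assumed rule: an LHCOPN link ID contains the token 'LHCOPN'."""
    return "LHCOPN" in link_id.split("-")


def split_links(links):
    """Separate LHCOPN links from links that may still affect connectivity."""
    lhcopn = [link for link in links if is_lhcopn_link(link[0])]
    hidden = [link for link in links if not is_lhcopn_link(link[0])]
    return lhcopn, hidden


lhcopn, hidden = split_links(LINKS)
print("LHCOPN links:", [link_id for link_id, _ in lhcopn])
print("Hidden links:", [link_id for link_id, _ in hidden])
```

If the new link-IDs suggested in the wrap-up (for example LHCOPN-TW-ASGC-AMS-TPE-001?) were adopted, this token test would no longer separate the two groups, and an explicit flag per link would be needed instead.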