Published by Kellie Bradley. Modified over 8 years ago.
1
LHCOPN operational working group: third meeting
Guillaume Cessieux (CNRS/FR-CCIN2P3, EGEE SA2)
CERN, December 11-12, 2008
http://indico.cern.ch/conferenceDisplay.py?confId=44050
2
Background
LHCOPN meeting @ Copenhagen, 2008-10
- Sites & NRENs to give feedback
- Improve relationships with LCG
- Where is GGUS?
- What is the roadmap?
GCX 2008-12-11
3
Agenda
1. Ops model
- Feedback (sites, NRENs, LCG)
2. Information repository
- CERN Twiki
- GGUS
3. Implementation
- Testing
- Assessment
- Roadmap
4
1- Ops model
5
Overview of sites’ feedback

Site            Remark
CA-TRIUMF       No clear agreement
CH-CERN         Ops wg member
DE-KIT          Ops wg member
ES-PIC          Ops wg member
FR-CCIN2P3      Ops wg member
IT-INFN-CNAF    No answer
NDGF
NL-T1
TW-ASGC         No clear agreement
UK-T1-RAL       Ops wg member & confirmed
US-FNAL-CMS     No answer
US-T1-BNL       No answer
6
Summary of sites’ feedback (1/3)
CA-TRIUMF
- Fear of significant additional load for small events
ES-PIC
- L3 IM: indicate that monitoring should be checked
- L3 IM: create ticket and then investigate
- Manage duplicate tickets by flagging one as duplicated
- Interface with CIC portal to centralise action needed?
7
Summary of sites’ feedback (2/3)
FR-IN2P3CC
- It would be great if the TTS did not require Grid certificates (i.e. certificates of the institute)
TW-ASGC
- How to deal with links outside of LHCOPN but affecting the LHCOPN?
8
Summary of sites’ feedback (3/3)
NL-T1
- Report incidents still solved when noticed
- Open a ticket and then investigate
- What are « major » changes?
- L2 IM: T{0,1} sites should interact
- Escalation process quite vague
- Several other details
9
Summary of network providers’ feedback (1/2)
DANTE
- Model not reliable enough: “Not prepared for the worst”
DFN
- Model cannot work seriously in a stable mode
- Inappropriate way to operate such a network: hot potatoes, cost, distributed ownership of trouble
- Works only if L3 topology is mapped onto L2
10
Summary of network providers’ feedback (2/2)
RENATER
- E2ECU should be there and playing a role
USLHCNET
- Twiki seems unclear
- Model to tie more closely with LCG
- (What about links for T2 traffic?)
11
Grid feedback (1/2)
November’s GDB
- http://indico.cern.ch/conferenceDisplay.py?confId=20235
Ops model seems ok...
Rename Grid data manager → Grid data contact
- To be nominated by sites (FTS managers?)
- Role?
Still unclear: no way to smartly warn VO, experiments and Grid operation
- Grid interaction to be sorted out
EGEE broadcast not sufficient
- Need something finer and more formalised
12
Grid feedback (2/2)
Change management DB access policy
Where is monitoring?
Scheduled downtime policy?
WLCG rule to be checked
AOB raised: Sister notion for each T1
- Is this ok on network side?
Asymmetric routing and performance key issue
T1-T1 traffic with IT-INFN-CNAF...
13
2- Information repository
14
Twiki (1/3)
Authentication
- View/change allowed only for people authenticated on CERN twiki:
<!--
   * Set NOSEARCHALL = on
   * Set DENYTOPICVIEW = TWikiGuest
   * Set ALLOWTOPICVIEW =
   * Set DENYTOPICCHANGE = TWikiGuest
   * Set ALLOWTOPICCHANGE =
   * Set DENYTOPICRENAME = TWikiGuest
   * Set ALLOWTOPICRENAME =
-->
Some pages are now protected with that
- Contacts, access details...
15
Twiki (2/3)
Notifications through WebNotify are OK
16
Twiki (3/3)
OK to host the change management DB in it?
Reorganisation of some areas?
- Technical contacts, operational contacts, NOC... Not obvious where this is
Only one regular twiki problem:
17
GGUS
Access opened 2008-12-10
Some feedback on the system?
Group certificate?
Reminder and notifications strategy
Calendar/planning
- Requirements?
LHCOPN look? Logos, stylesheets
- Should we use: http://indico.cern.ch/getFile.py/access?contribId=14&resId=0&materialId=0&confId=15319
- Licensing?
18
Sample notifications/reminders

***********************************************************************
This is an automated REMINDER mail. Please DO NOT REPLY!!!
***********************************************************************
Dear support staff,
this is a list of currently open tickets for support unit "VOMS" ordered by priority colour.
Reference link: https://gus.fzk.de/ws/ticket_search.php?supportunit=VOMS&status=open&radiotf=1&timeframe=no
1 open ticket(s) RED: 40498
0 open ticket(s) AMBER:
0 open ticket(s) YELLOW:
0 open ticket(s) GREEN:

Dear support staff,
this is a list of currently open tickets for support unit "TPM" ordered by priority colour.
Reference link: https://gus.fzk.de/ws/ticket_search.php?supportunit=TPM&status=open&radiotf=1&timeframe=no
4 OPEN TICKET(S) RED:
https://gus.fzk.de/ws/ticket_info.php?ticket=355
https://gus.fzk.de/ws/ticket_info.php?ticket=346
https://gus.fzk.de/ws/ticket_info.php?ticket=337
https://gus.fzk.de/ws/ticket_info.php?ticket=309
0 OPEN TICKET(S) AMBER:
2 OPEN TICKET(S) YELLOW:
https://gus.fzk.de/ws/ticket_info.php?ticket=347
https://gus.fzk.de/ws/ticket_info.php?ticket=338
2 OPEN TICKET(S) GREEN:
https://gus.fzk.de/ws/ticket_info.php?ticket=356
https://gus.fzk.de/ws/ticket_info.php?ticket=354

Dear T1 network staff,
ticket #39579 is updated.
Reference Link : https://iwrgustrain.fzk.de/ws/ticket_info.php?ticket=39579
Ticket-ID : 39579
Responsible T1 :
Status : in progress
Short description: test51
Impacted Links : CERN-ASGC-LHCOPN-002,CERN-BNL-LHCOPN-001,
Priority : less urgent
Type of Impact : Connectivity
Ticket Category : Incident L2
Last Modifier : Guillaume Cessieux
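As an aid for checking that notifications really carry the expected fields, the ticket-update mail shown on this slide can be read as a list of "Key : value" lines. The following is a minimal sketch under that assumption only; the function names `parse_notification` and `impacted_links` are hypothetical and are not part of GGUS, and the sample text is copied from the slide rather than from a documented mail format.

```python
from __future__ import annotations

# Sample body copied from the slide; real mails may differ.
SAMPLE = """Dear T1 network staff,
ticket #39579 is updated.
Reference Link : https://iwrgustrain.fzk.de/ws/ticket_info.php?ticket=39579
Ticket-ID : 39579
Responsible T1 :
Status : in progress
Short description: test51
Impacted Links : CERN-ASGC-LHCOPN-002,CERN-BNL-LHCOPN-001,
Priority : less urgent
Type of Impact : Connectivity
Ticket Category : Incident L2
Last Modifier : Guillaume Cessieux
"""


def parse_notification(body: str) -> dict[str, str]:
    """Collect 'Key : value' lines into a dict (hypothetical helper)."""
    fields: dict[str, str] = {}
    for line in body.splitlines():
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # greeting or free text without a colon
        fields[key.strip()] = value.strip()
    return fields


def impacted_links(fields: dict[str, str]) -> list[str]:
    """Split the comma-separated link list, ignoring the trailing comma."""
    raw = fields.get("Impacted Links", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


fields = parse_notification(SAMPLE)
links = impacted_links(fields)
```

An empty value, such as "Responsible T1" in the sample, is kept as an empty string, which makes unfilled fields easy to spot when testing reminders.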
19
Wrap up on Ops model
E2emon deployment/reliability: how is this really followed?
Change “problem management” name because it is confusing
Define the term “unreasonable” for escalation process
New link-IDs for “hidden” links
- LHCOPN-TW-ASGC-AMS-TPE-001?
- DE-KIT-I-II-LHCOPN-001?
Trouble/ticket’s responsibility
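The link IDs quoted in these slides (for example CERN-ASGC-LHCOPN-002 or the proposed DE-KIT-I-II-LHCOPN-001) all appear to be hyphen-separated tokens ending in a numeric serial. The sketch below splits an ID on that basis; the convention is inferred from the examples only, not from a documented naming scheme, and `split_link_id` is a hypothetical helper.

```python
from __future__ import annotations


def split_link_id(link_id: str) -> tuple[list[str], int]:
    """Split a link ID into its leading tokens and trailing serial number.

    Assumes the ID ends with '-<digits>', as in the IDs shown on the slides.
    """
    body, sep, serial = link_id.rpartition("-")
    if not sep or not body or not serial.isdigit():
        raise ValueError(f"no trailing numeric serial in {link_id!r}")
    return body.split("-"), int(serial)


tokens, serial = split_link_id("DE-KIT-I-II-LHCOPN-001")
# tokens holds the name parts, serial the numeric suffix as an int
mentions_lhcopn = "LHCOPN" in tokens
```

Whether the LHCOPN token should appear in the ID of a "hidden" link is exactly the open question on this slide, so the sketch only reports its presence and does not classify links.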
20
3- Implementation
21
Implementation: Next steps?
Fill the GGUS authentication table...
22
Ops model implementation testing?
Reminders, notifications...
- Use “Test” tickets?
- Be sure people are really reached
Ops model testing
- L2: Ask site to disconnect? To filter all traffic?
- L3: Ask NREN to simulate a rogue cut?
- Use backup tests? Part of a (regular?) process?
23
Ops model roadmap
[Timeline figure with a month axis; years 2009 and 2010 are marked, but the month positions are not recoverable. Labels: LHCOPN meeting @ Copenhagen; Ops model v2 proposed; Beta release of LHCOPN GGUS TTS; LHC startup; LHCOPN meeting @ Berlin; Public release of TTS; Trial implementation; Working implementation; Ops testing dates; Backup tests...; End of EGEE-III]
24
Quality assessment
Infrastructure and operations
Regular standard reports
Way to be protected from passivity of sites?
Service view provided for the LCG project
Responsibilities for that?
- Separate conclusion from processes to gather metrics
25
Backup tests
Process seems ok
- https://twiki.cern.ch/twiki/bin/view/LHCOPN/LhcopnBackupTests
- Frequency unclear
- Roadmap? Responsibilities for that?
26
Pending actions
https://twiki.cern.ch/twiki/bin/viewauth/LHCOPN/OpsWG
27
AOB
Monitoring?
- Role of ENOC’s ASPDrawer?
- ENOC’s DownCollector for the LHCOPN?
What to present at LHCOPN meeting?
Next ops meeting/phoneconf?
- Combined with any other event?
- More people?
Provide some real-life examples of Ops model implementation
28
Extra slides
29
Links
Ops model
- https://twiki.cern.ch/twiki/bin/view/LHCOPN/WebHome
Ops WG page
- https://twiki.cern.ch/twiki/bin/view/LHCOPN/OpsWG
GGUS
- Dashboard: https://iwrgustrain.fzk.de/pages/all_lhcopn.php
- Submit interface: https://iwrgustrain.fzk.de/pages/ticket_lhcopn.php
30
Proposed site implementation
[Diagram. Labels: Grid Project (LCG); Sites (T0/T1s); Grid Data Manager; Router Operators/Site NOC; Grid; Network; Network providers; A, B, C]
31
L3 mapping over L2
[Diagram with two panels, "~ Now" and "Future?", each showing Site 1, Site 2 and Site 3 with L3 and L2 layers]
How does this impact the ops model?
32
LHCOPN’s “hidden” links
33
TW-ASGC LHCOPN connectivity
[Network diagram by Guillaume.Cessieux@cc.in2p3.fr, 2008-04-22]
Nodes: CH-CERN R1, CH-CERN R2, US-FNAL-CMS, AMS, TPE, CHI, TW-ASGC, NL-T1
Links as labelled:
- ASGC-FERMI-LHCOPN-001, 1Gb
- CERN-ASGC-LHCOPN-004, 2Gb
- CERN-ASGC-LHCOPN-003, 10Gb
- ASGC-SARA-LHCOPN-001, 1Gb
- AMS-CHI-001, 2.5Gb
- CHI-TPE-001, 2.5Gb
- AMS-TPE-001, 10Gb
Legend: LHCOPN links; not LHCOPN links, but could affect LHCOPN connectivity