
LHCOPN operational working group report Guillaume Cessieux (FR-CCIN2P3 / EGEE-SA2) on behalf of the Ops WG LHCOPN meeting, 2008-10-16, Copenhagen.



2 Schedule
1 – Ops WG:
– Ops WG: who, what, when
– The proposed tightened operational model
– Remaining work
2 – Things around GGUS and demo
GCX - LHCOPN meeting - 2008-10-16

3 Background
Actions on Operations from the last LHCOPN meeting (CERN, June 2008):
– A1: Put together a working group to complete the ops models and publish them
– A2: Take input from the ISHARE work of GN2
– A3: Clarify the operational issues with the E2ECU: What is the status of the E2ECU? What does it manage? What is the perfSONAR deployment status? How is the E2ECU service measured?
– A4: Demonstrate the GGUS/OPN ticketing system at the next meeting
– A5: Regular tests must be part of the operational procedures

4 Current way of following LHCOPN troubles
In essence:
Confusion
– No guidelines, no roles, no responsibilities
Hope
– Mail with 10 people in CC
Result:
– Running around like headless chickens (c)
– No transparency
– An operational model is required

5 Operational working group: reloaded
11 members after a public call for membership
Very interesting mix of viewpoints
– 1 NREN, 5 sites, DANTE, EGEE
Administrative details
– Mailing list: project-lhcopn-opswg AT cern.ch
– Twiki: https://twiki.cern.ch/twiki/bin/view/LHCOPN/OpsWG
Members: Emma Apted (DANTE), David Foster (CH-CERN), Ludwig Pregernig (CH-CERN), Gerard Bernabeu (ES-PIC), Bruno Hoeft (DE-KIT), Franck Simon (RENATER, FR), James Casey (CH-CERN, EGEE-SA1), Xavier Jeannin (CNRS, EGEE-SA2), Robin Tasker (UK-T1-RAL), Guillaume Cessieux (FR-CCIN2P3, EGEE-SA2), Edoardo Martelli (CH-CERN)

6 Main work done
Two fruitful meetings
– September 9-10th @CERN: http://indico.cern.ch/conferenceDisplay.py?confId=37175
– October 9-10th @CERN: http://indico.cern.ch/conferenceDisplay.py?confId=38583
The new documentation method proved powerful
Tightening the operational model
– Concrete proposal
– Light, and driven by what currently works

7 The proposed operational model
Now structured, explained and published on the twiki: https://twiki.cern.ch/twiki/bin/view/LHCOPN/OperationalModel
Key changes
– Simplification
– E2ECU’s role
– Grid interactions removed

8 Structure of the Ops model
Foundation
– Drawing conventions
– Actors & information repository management
Processes:
– Incident
– Change
– Maintenance

9 Drawing conventions
[Legend diagram: actors A to D, information repositories 1 and 2, process E]
– A is responsible for 1 (its set-up, not its contents)
– A starts process E
– A «interacts» with B
– B reads and writes into 1
– C reads from 2
– 2 notifies D (alarms…)
– 1 and 2 exchange trouble tickets (TT)
– B may «interact» with C
– * marks a possible initiator of the process
– Parentheses give the current implementation: Actor A (current implementation)
– A separate line style marks optional relations, or actors and information repositories that do not exist yet

10 LHCOPN actors
[Diagram. Actors shown: Grid projects (LCG (EGEE)); sites (T0/T1); L2 network providers (GÉANT2, NRENs…; European / non-European; public / private); LCU; NOC / router operators; grid data managers; L2 NOC; infrastructure operators; users; DANTE operation]

11 Actors and information repositories management
[Diagram showing which actor is responsible for which information repository. Actors: DANTE operation; LCU (ENOC); Grid project operation (EGEE SA1); L2 NOC. Information repositories: global web repository (Twiki); operational procedures; operational contacts; technical information; change management DB; statistics reports; Grid TTS (GGUS); LHCOPN TTS (GGUS); L2 monitoring (perfSONAR e2emon); L3 monitoring; MDM; BGP]

12 Information access
[Diagram showing which actors read, or read and write, which information repositories. Actors: sites; LCU (ENOC); L2 network providers; L2 NOC. Repositories: LHCOPN TTS (GGUS); L2 monitoring (perfSONAR e2emon); L3 monitoring; global web repository (Twiki); statistics]

13 Problem management process
[Diagram. Possible initiators: site router operators* and grid data managers*, who use L2/L3 monitoring, the global web repository (Twiki) and the LHCOPN TTS (GGUS). Flow: start → L3 incident management → (OK) → L2 incident management → (OK) → escalated incident management]
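One reading of this flow is that each stage either deals with the problem itself or ends "OK" (its layer checks out) and hands the problem to the next stage. The sketch below is a minimal illustration under that assumption. The handler functions and their keyword tests are hypothetical stand-ins for the real diagnosis work done by router operators.

```python
from typing import Callable

# Hypothetical stage handler: returns True if the problem is dealt with at
# this stage, False if the stage ends "OK" and the problem moves on.
Handler = Callable[[str], bool]


def manage_problem(problem: str, l3: Handler, l2: Handler, escalated: Handler) -> str:
    """Walk the stages in the order shown on the slide; report which one closed the problem."""
    stages = (
        ("L3 incident management", l3),
        ("L2 incident management", l2),
        ("escalated incident management", escalated),
    )
    for name, handler in stages:
        if handler(problem):
            return name
    return "unresolved"


# Toy handlers, for illustration only.
def l3_check(p: str) -> bool:
    return "BGP" in p or "router" in p


def l2_check(p: str) -> bool:
    return "fibre" in p


def phone_conf(p: str) -> bool:
    return True  # toy assumption: the escalated phone conference always closes the problem
```

With these toy handlers, `manage_problem("dark fibre cut", l3_check, l2_check, phone_conf)` passes through L3 and is closed by L2 incident management.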

14 L3 incident management process
[Diagram. Actors: router operators of the source site; router operators of the other sites involved; grid data managers; other sites. The LHCOPN TTS (GGUS) is read and written during the process, which can hand over to L2 incident management]
Scope: router down, BGP filtering, bad routing...

15 L2 incident management process
[Diagram. Starts at the end of L3 incident management. Actors: L2 NOC; router operators of the linked sites; grid data managers; all sites. The LHCOPN TTS (GGUS) is read and written during the process, which can hand over to escalated incident management]
Scope: dark fibre outages...

16 Escalated incident management
Used if a problem is not understood or solved within a reasonable delay
Backup process
– Started by a router operator
– Phone conference with all potentially involved actors
– Work plan to fix the issue to be decided
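The trigger for this backup process is time-based, so it can be expressed as a simple check. This is a hypothetical sketch. It reads the rule as "escalate once the reasonable delay has elapsed and the problem is still unsolved". The 4-hour value is a placeholder: the slide does not quantify "reasonable delay".

```python
from datetime import datetime, timedelta

# Placeholder: the Ops WG does not define "reasonable delay".
REASONABLE_DELAY = timedelta(hours=4)


def should_escalate(opened: datetime, now: datetime, solved: bool,
                    reasonable_delay: timedelta = REASONABLE_DELAY) -> bool:
    """True when the problem is still unsolved after the reasonable delay."""
    if solved:
        return False
    return now - opened > reasonable_delay
```

For a ticket opened at 08:00, an unsolved problem would trigger the phone conference after 12:00 with the default value. Passing a different `reasonable_delay` lets each process tune the threshold.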

17 Change vs. maintenance
Change management is a top-level process
– Tracks and documents changes that have an impact
– Changes are committed (implemented) through maintenance
Some maintenance operations involve no change...
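The relationship between changes and maintenance operations can be captured as a small data model. In this sketch a change is tracked on its own and is committed through one or more maintenance operations, while a maintenance may exist with no change attached. The class and field names, including the `CHG-1` identifier, are hypothetical and not taken from the change management DB.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Maintenance:
    description: str
    change_id: Optional[str] = None  # None: a maintenance with no associated change


@dataclass
class Change:
    change_id: str
    impact: str
    maintenances: list[Maintenance] = field(default_factory=list)

    def commit_with(self, m: Maintenance) -> None:
        """Record that this change is implemented through maintenance m."""
        m.change_id = self.change_id
        self.maintenances.append(m)


# A change committed through a maintenance, and a standalone maintenance.
prefix_change = Change("CHG-1", "new prefix propagated")
prefix_change.commit_with(Maintenance("router configuration update"))
ios_upgrade = Maintenance("router IOS upgrade")  # no change attached
```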

18 L3 change management
[Diagram. Actors: router operators of the source site; router operators of linked and affected sites; grid data managers; all sites. Information repositories: global web repository (Twiki), LHCOPN TTS (GGUS), monitoring. The process hands over to L3 maintenance management]
Scope: IP address changes, new prefix propagated, new filtering

19 L2 change management
[Diagram. Actors: L2 NOC; router operators of linked and affected sites; grid data managers; all sites. Information repositories: global web repository (Twiki), LHCOPN TTS (GGUS), monitoring. Related processes: L2 maintenance management, L3 change management]
Scope: new LHCOPN L2 link, change of L2 network provider for a segment...

20 L3 maintenance management process
[Diagram. Actors: router operators of the source site; router operators of impacted sites; grid data managers; all sites. Tracked in the LHCOPN TTS (GGUS)]
Scope: scheduled power outage on a site, router IOS upgrade,...

21 L2 maintenance management process
[Diagram. Actors: L2 NOC; router operators of linked sites; grid data managers; all sites. Tracked in the LHCOPN TTS (GGUS)]
Scope: optical transmitter to be changed, fibre physically rerouted...
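Both maintenance processes must work out which router operators to contact beyond the "all sites" notification. The sketch below assumes a simplified model:

- An L2 maintenance on a link concerns the two sites at its ends.
- An L3 maintenance on a site concerns every site directly linked to it.

The topology is a hypothetical example, not the real LHCOPN link list. Sites that route through the source site are deliberately ignored to keep the example short.

```python
# Hypothetical LHCOPN topology: each link is a pair of site names.
LINKS: set[tuple[str, str]] = {
    ("CH-CERN", "FR-CCIN2P3"),
    ("CH-CERN", "DE-KIT"),
    ("CH-CERN", "ES-PIC"),
    ("DE-KIT", "FR-CCIN2P3"),
}


def linked_sites(link: tuple[str, str]) -> set[str]:
    """L2 maintenance on a link: the sites at both ends."""
    return set(link)


def impacted_sites(source: str, links: set[tuple[str, str]] = LINKS) -> set[str]:
    """L3 maintenance on a source site: every site with a direct link to it."""
    return {b if a == source else a for a, b in links if source in (a, b)}
```

In this toy topology, a power outage at FR-CCIN2P3 would concern the router operators of CH-CERN and DE-KIT.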

22 Sample workflow for L2 incident/maintenance
[Diagram. Elements: site A, site B, NREN A*, NREN B, NREN C, LHCOPN TTS (GGUS), all sites, users; numbered steps 1 to 4, ending with the users]
– Drawback: delay and reliability of the propagation
+ Advantage: the way it currently works!
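The drawback noted on this slide can be made concrete with simple arithmetic over a relay chain:

- Delays add up hop by hop.
- If each hop relays the message independently with some probability, end-to-end reliability is the product of the per-hop probabilities.

The hops and all figures below are hypothetical, chosen only to show the calculation.

```python
from math import prod

# Hypothetical hops: (description, delay in minutes, probability the hop relays the message).
HOPS = [
    ("NREN A to LHCOPN TTS (GGUS)", 30, 0.95),
    ("LHCOPN TTS (GGUS) to all sites", 5, 0.99),
    ("sites to users", 60, 0.90),
]


def propagate(hops: list[tuple[str, int, float]]) -> tuple[int, float]:
    """Total delay (sum) and end-to-end reliability (product, assuming independent hops)."""
    total_delay = sum(delay for _, delay, _ in hops)
    reliability = prod(r for _, _, r in hops)
    return total_delay, reliability
```

With these numbers the chain takes 95 minutes and delivers the message with a probability of about 0.85. Each extra relay adds delay and lowers reliability.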

23 Remaining areas of work
Grid interactions (the users!)
Grid data managers
Authentication for the LHCOPN community
– Certificates!
– Restricted area on the twiki... but it requires a twiki/CERN account
Quality assessment
– Network, processes, monitoring (L2)
Implementation details
– Tools (GGUS...), notifications, communication channels...
– A lot of work...

24 Conclusion about the ops model
Not perfect... open to improvements
– The Ops WG is ready for an improvement process
– Constructive feedback is welcome
We also need to know whether it is suitable
– Key responsibilities lie with the sites
– Will the actors commit to following it?
Contact: Guillaume.Cessieux AT cc.in2p3.fr
Implementation needed ASAP

25 GGUS supporting the LHCOPN
Guillaume Cessieux (FR-CCIN2P3 / EGEE-SA2)
Thanks to the GGUS team
LHCOPN meeting, 2008-10-16, Copenhagen

26 Strong acknowledgements to the GGUS team (DE-KIT / EGEE-SA1), particularly:
– Torsten Antoni
– Helmut Dres
– Guenter Grein
for providing the LHCOPN helpdesk
GCX - LHCOPN meeting - Copenhagen - 2008-10-16

27 Schedule
What is GGUS
Why GGUS
Live demo!
– Main screenshots in the slides
Remaining work

28 What is GGUS

29 Why GGUS for the LHCOPN?
+ Existing, reliable, mature, secure, well known
+ Key features: tracking of events, reminders, notifications...
+ Grid world
+ Successful experience with the ENOC
+ Web interface and web services access
- Very complex
- Grid world
? Sustainability as part of EGEE

30 What we have for the LHCOPN
Ticket handling
– By router operators
– Tickets are ‘public’ to anyone authenticated to GGUS; only ‘support staff’ can act on them
Dashboard for LHCOPN tickets
– Will be mapped onto a calendar = planning
Really tailored for the LHCOPN
– Grid complexity removed! (VO, ROC, TPM...)
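The two rules on this slide can be sketched in a few lines: who may view versus act on a ticket, and how a dashboard maps tickets onto a calendar. This is a hypothetical model for illustration only. It does not use the GGUS web services, and the `User` and `Ticket` fields are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class User:
    authenticated: bool
    support_staff: bool


def can_view(u: User) -> bool:
    """Tickets are 'public' to anyone authenticated."""
    return u.authenticated


def can_act(u: User) -> bool:
    """Only authenticated support staff may act on tickets."""
    return u.authenticated and u.support_staff


@dataclass
class Ticket:
    ticket_id: str
    summary: str
    day: date  # hypothetical field: e.g. the scheduled day of a maintenance


def to_calendar(tickets: list[Ticket]) -> dict[date, list[str]]:
    """Group ticket IDs by day, giving a simple planning view."""
    days: dict[date, list[str]] = defaultdict(list)
    for t in tickets:
        days[t.day].append(t.ticket_id)
    return dict(days)
```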

31 GGUS architecture
[Diagram. A central application (GGUS) with a web portal interface, connected to support units: network support (ENOC); LHCOPN support; deployment support (DS 1 … DS 5); middleware support (MS 1 … MS 8); operations support; TPM; VO support (ALICE, BIOMED, ESR, …); ROCs (ROC 1 … ROC 12, …) with resource centres (RC 1 … RC X)]

32 Live demo

33 Submit form

34 Ticket view & history

35 Update form

36 LHCOPN dashboard

37 Remaining work
Authentication!
– List of certificates to be gathered
Strategy for reminders and notifications
– Target e-mail addresses to be gathered → private area on the twiki!
Templates for common tickets
– Minimum information required…
Light documentation
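A ticket template with "minimum information required" amounts to a list of mandatory fields plus a few allowed values. The field names and allowed values below are hypothetical: the actual minimum information is still to be defined.

```python
# Hypothetical minimum-information template for an LHCOPN ticket.
REQUIRED_FIELDS = ("type", "layer", "affected_sites", "start", "contact")
ALLOWED_VALUES = {
    "type": {"incident", "change", "maintenance"},
    "layer": {"L2", "L3"},
}


def missing_information(ticket: dict) -> list[str]:
    """Return the problems that would block submission; an empty list means OK."""
    problems = [name for name in REQUIRED_FIELDS if not ticket.get(name)]
    for name, allowed in ALLOWED_VALUES.items():
        value = ticket.get(name)
        if value and value not in allowed:
            problems.append(f"{name}: unexpected value {value!r}")
    return problems
```

Checking at submission time means every ticket carries the same minimum information, whichever router operator opens it.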

38 Conclusion about GGUS
The LHCOPN helpdesk is available
Several details must be sorted out before production use
– Authentication, notification, workflow, documentation,...
– Not yet perfect; will follow the ops model
Will it be accepted?

39 Questions & discussion

