LHCOPN operational working group report Guillaume Cessieux (FR-CCIN2P3 / EGEE-SA2) on behalf of the Ops WG LHCOPN meeting, 2008-10-16, Copenhagen.

Presentation transcript:

LHCOPN operational working group report Guillaume Cessieux (FR-CCIN2P3 / EGEE-SA2) on behalf of the Ops WG LHCOPN meeting, 2008-10-16, Copenhagen

Schedule
1 – Ops WG
  – Who, what, when
  – The proposed tightened operational model
  – Remaining work
2 – Things around GGUS and demo
GCX - LHCOPN meeting

Background
Actions on operations from the last LHCOPN meeting (CERN, June 2008):
– A1: Put together a working group to complete the ops models and publish them
– A2: Take input from the ISHARE work of GN2
– A3: Clarify the operational issues with the E2ECU
    What is the status of the E2ECU?
    What does it manage?
    What is the perfSONAR deployment status?
    How is the E2ECU service measured?
– A4: Demonstration of the GGUS/OPN ticketing system at the next meeting
– A5: Regular tests must be part of the operational procedures

Current way to follow LHCOPN troubles
Essence of:
Confusion
– No guidelines, no roles, no responsibilities
Hope
– Mail with 10 people in CC
Result:
– Running around like chickens without heads (c)
– No transparency
– Operational model required

Operational working group: reloaded
11 members after public call for membership
Very interesting mix of viewpoints
– 1 NREN, 5 sites, DANTE, EGEE
Administrative things
– project-lhcopn-opswg AT cern.ch
Members
– Emma Apted (DANTE)
– Gerard Bernabeu (ES-PIC)
– James Casey (CH-CERN, EGEE-SA1)
– Guillaume Cessieux (FR-CCIN2P3, EGEE-SA2)
– David Foster (CH-CERN)
– Bruno Hoeft (DE-KIT)
– Xavier Jeannin (CNRS, EGEE-SA2)
– Edoardo Martelli (CH-CERN)
– Ludwig Pregernig (CH-CERN)
– Franck Simon (RENATER, FR)
– Robin Tasker (UK-T1-RAL)

Main work done
Two fruitful meetings
– September 9-10
– October 9-10
– New method to document was powerful
Tightening the operational model – concrete proposal
– Light and driven by things currently working

The proposed operational model
Now structured, explained and published on twiki
Key changes
– Simplification
– E2ECU’s role
– Grid interactions removed

Structure of the Ops model
Foundation
– Drawing convention
– Actors & Information repository management
Processes:
– Incident
– Change
    Maintenance

Drawing conventions
[Figure: legend of the notation used in the process diagrams, with Actors A to D, Information repositories 1 and 2, and Process E]
– A is responsible for 1 (the set-up, not its contents)
– A starts process E
– A «interacts» with B
– B may «interact» with C
– B reads and writes into 1
– C reads into 2
– 2 notifies D (alarms…)
– 1 and 2 exchange TT
– * Possible initiator of the process
– [symbol] = optional (relations) or not yet existing (actors and information repositories)
– (Current implementation)
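The legend above can be read as a small set of typed relations between actors, repositories and processes. The following is only an illustrative sketch: the names `Relation`, `Edge`, `LEGEND` and `relations_of` are hypothetical and do not come from the Ops WG material, and "TT" is taken to mean trouble tickets.

```python
from enum import Enum
from typing import List, NamedTuple


class Relation(Enum):
    """Relation kinds listed on the drawing-conventions slide."""
    RESPONSIBLE_FOR = "is responsible for"
    STARTS = "starts"
    INTERACTS_WITH = "interacts with"
    READS = "reads"
    READS_WRITES = "reads and writes"
    NOTIFIES = "notifies"
    EXCHANGES_TT = "exchanges trouble tickets with"


class Edge(NamedTuple):
    source: str
    relation: Relation
    target: str
    optional: bool = False  # the legend's "may" relations


# The example relations from the legend, written out as edges.
LEGEND = [
    Edge("Actor A", Relation.RESPONSIBLE_FOR, "Repository 1"),
    Edge("Actor A", Relation.STARTS, "Process E"),
    Edge("Actor A", Relation.INTERACTS_WITH, "Actor B"),
    Edge("Actor B", Relation.INTERACTS_WITH, "Actor C", optional=True),
    Edge("Actor B", Relation.READS_WRITES, "Repository 1"),
    Edge("Actor C", Relation.READS, "Repository 2"),
    Edge("Repository 2", Relation.NOTIFIES, "Actor D"),
    Edge("Repository 1", Relation.EXCHANGES_TT, "Repository 2"),
]


def relations_of(node: str) -> List[str]:
    """Return one readable sentence per edge leaving `node`."""
    sentences = []
    for edge in LEGEND:
        if edge.source == node:
            suffix = " (optional)" if edge.optional else ""
            sentences.append(
                f"{edge.source} {edge.relation.value} {edge.target}{suffix}"
            )
    return sentences
```

Calling `relations_of("Actor B")` lists the optional interaction with Actor C and the read/write access to Repository 1, which is how each process diagram in the model can be read.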

LHCOPN Actors
[Figure: actors diagram]
Labels: Grid projects (LCG (EGEE)); Sites (T0/T1); L2 network providers (GÉANT2, NRENs…), European / non-European, public / private; LCU; DANTE Operation; NOC / router operators; Grid data managers; L2 NOC; Infrastructure operators; Users

Actors and information repositories management
[Figure: diagram of actors and the information repositories they are responsible for]
Legend: A is responsible for B
Labels: DANTE Operation; LCU (ENOC); Grid Project operation (EGEE SA1); L2 NOC; Grid TTS (GGUS); LHCOPN TTS (GGUS); Global web repository (Twiki); L2 monitoring (perfSONAR e2emon); L3 monitoring; MDM; BGP; Operational procedures; Operational contacts; Technical information; Change management DB; Statistics reports

Information access
[Figure: diagram of who reads and who writes each information repository]
Legend: A reads B; A reads and writes B
Labels: Sites; LCU (ENOC); L2 network providers; L2 NOC; LHCOPN TTS (GGUS); L2 monitoring (perfSONAR e2emon); L3 monitoring; Global web repository (Twiki); Statistics

Problem management process
[Figure: process diagram]
Legend: A goes to process B; A reads B; A interacts with B
Labels: Start; Site; * Router operators; * Grid Data manager; L2 - L3 Monitoring; Global web repository (Twiki); LHCOPN TTS (GGUS); L3 incident management; OK; L2 incident management; OK; escalated incident management

L3 Incident management process
[Figure: process diagram]
Legend: A notifies B; A interacts with B; A reads and writes B; A goes to process B
Labels: Source site involved; Site involved; Grid Data manager; * Router operators; Router operators; Other sites; 1.2; LHCOPN TTS (GGUS); L2 incident management (1.3)
Scope: Router down, BGP filtering, bad routing...

L2 Incident management process
[Figure: process diagram]
Legend: A notifies B; A interacts with B; A reads and writes B
Labels: Sites linked; * L2 NOC; Grid Data manager; * Router operators; All sites; LHCOPN TTS (GGUS); * End of L3 incident management; escalated incident management (3)
Scope: Dark fibre outages...

Escalated incident management
If problem not understood or solved within reasonable delay
Backup process
– Started by router operator
– Phone conference with all potentially involved actors
– Work plan to fix issue to be decided
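The three incident processes can be sketched as a simple hand-over chain. This is a hedged reading of the process diagrams rather than the working group's specification: it assumes a problem is tried at L3 first, then at L2, and finally escalated, and the names `Process`, `ORDER`, `next_process` and `handle` are hypothetical. The "reasonable delay" that triggers escalation is not quantified in the slides, so the sketch only records whether each stage solved the problem.

```python
from enum import Enum
from typing import Dict, List, Optional


class Process(Enum):
    L3_INCIDENT = "L3 incident management"
    L2_INCIDENT = "L2 incident management"
    ESCALATED = "escalated incident management"


# Assumed hand-over order, read from the diagrams: L3, then L2, then escalation.
ORDER = [Process.L3_INCIDENT, Process.L2_INCIDENT, Process.ESCALATED]


def next_process(current: Process, solved: bool) -> Optional[Process]:
    """Return the process to hand over to, or None when handling stops."""
    if solved:
        return None
    position = ORDER.index(current)
    if position + 1 < len(ORDER):
        return ORDER[position + 1]
    return None  # escalated management is the last stage in this sketch


def handle(outcomes: Dict[Process, bool]) -> List[Process]:
    """Walk the chain; `outcomes` maps a process to True if it solved the problem."""
    path = []
    current: Optional[Process] = ORDER[0]
    while current is not None:
        path.append(current)
        current = next_process(current, outcomes.get(current, False))
    return path
```

For example, `handle({Process.L3_INCIDENT: False, Process.L2_INCIDENT: True})` stops after L2 incident management, while an empty `outcomes` mapping walks all the way to the escalated process.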

Change vs Maintenance
Change management is a top process
– Tracks and documents changes with impacts
– Committed with maintenance
Some maintenances without change...

L3 Change Management
[Figure: process diagram]
Legend: A notifies B; A interacts with B; A reads and writes B
Labels: Source site; Grid Data manager; * Router operators; Linked sites; Affected sites; Router operators; All sites; Monitoring; Global web repository (Twiki); LHCOPN TTS (GGUS); L3 maintenance management; (2.3); (4)
Scope: IP addresses change, new prefix propagated, new filtering

L2 Change Management
[Figure: process diagram]
Legend: A notifies B; A interacts with B; A reads and writes B
Labels: Linked site; Grid Data manager; Router operators; * L2 NOC; Linked sites; Affected sites; All sites; Monitoring; Global web repository (Twiki); LHCOPN TTS (GGUS); L2 maintenance management; L3 change management (4)
Scope: New LHCOPN L2 link, change of L2 network provider for a segment...

L3 Maintenance management process
[Figure: process diagram]
Legend: A notifies B; A interacts with B; A reads and writes B
Labels: Source sites; Grid Data manager; * Router operators; Impacted sites; Router operators; All sites; LHCOPN TTS (GGUS)
Scope: scheduled power outage on site, router IOS upgrade,...

L2 Maintenance management process
[Figure: process diagram]
Legend: A notifies B; A interacts with B; A reads and writes B
Labels: * L2 NOC; Linked sites; Grid Data manager; Router operators; All sites; LHCOPN TTS (GGUS)
Scope: optical transmitter to be changed, fibre physically rerouted...
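The scope examples given for the six processes can be collected into one lookup table, which makes it easier to see which process a given event belongs to. This is a minimal sketch: the table only contains the examples named on the slides (each slide ends its list with "..."), and the names `SCOPES` and `process_for` are hypothetical.

```python
from typing import Optional

# Scope examples as given on the process slides; the lists are not exhaustive.
SCOPES = {
    "L3 incident management": ["router down", "BGP filtering", "bad routing"],
    "L2 incident management": ["dark fibre outage"],
    "L3 change management": [
        "IP addresses change",
        "new prefix propagated",
        "new filtering",
    ],
    "L2 change management": [
        "new LHCOPN L2 link",
        "change of L2 network provider for a segment",
    ],
    "L3 maintenance management": [
        "scheduled power outage on site",
        "router IOS upgrade",
    ],
    "L2 maintenance management": [
        "optical transmitter to be changed",
        "fibre physically rerouted",
    ],
}


def process_for(event: str) -> Optional[str]:
    """Return the process whose scope examples contain `event`, else None."""
    wanted = event.strip().lower()
    for process, examples in SCOPES.items():
        if any(wanted == example.lower() for example in examples):
            return process
    return None
```

An event that is not among the listed examples returns `None`, which matches the open-ended nature of the scope lists: a real deployment would need the operator to choose the process.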

Sample workflow for L2 incident/maintenance
[Figure: workflow diagram. Labels: Site A; Site B; NREN A *; NREN B; NREN C; LHCOPN TTS (GGUS); All sites; Users]
- Delay and reliability of the propagation
+ The way it currently works!

Remaining areas of work
Grid interactions (the users!)
Grid data managers
Authentication for LHCOPN community
– Certificate!
– Restricted area on twiki... But with twiki/CERN account
Quality assessment
– Network, processes, monitoring (L2)
Implementation details
– Tools (GGUS...), notifications, communication channels...
– Lot of work...

Conclusion about ops model
Not perfect... open to improvements
– Ops WG ready for improvement process
– Constructive feedback is welcome
Need also to know if this is suitable
– Key responsibilities on sites
– Commitment from actors to follow it?
Guillaume.Cessieux AT cc.in2p3.fr
Implementation needed ASAP

GGUS supporting the LHCOPN Guillaume Cessieux (FR-CCIN2P3 / EGEE-SA2) Thanks to the GGUS team LHCOPN meeting, 2008-10-16, Copenhagen

Strong acknowledgements to the GGUS team (DE-KIT / EGEE-SA1), particularly:
– Torsten Antoni
– Helmut Dres
– Guenter Grein
For providing the LHCOPN helpdesk
GCX - LHCOPN meeting - Copenhagen

Schedule
What is GGUS
Why GGUS
Live Demo!
– Main screenshots in slides
Remaining work

What is GGUS

Why GGUS for the LHCOPN?
+ Existing, reliable, mature, secure, well known
+ Key features: tracking of events, reminders, notifications...
+ Grid world
+ Successful experience with the ENOC
+ Web interface and web services access
- Very complex
- Grid world
? Sustainability as part of EGEE

What we have for LHCOPN
Ticket handling
– By router operators
– Tickets ‘public’ for anyone authenticated to GGUS
    Acting only by ‘support staff’
Dashboard for LHCOPN tickets
– Will be mapped on a calendar = planning
Really tailored for the LHCOPN
– Grid complexity removed: VO, ROC, TPM
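The ticket access rule above (visible to anyone authenticated to GGUS, actionable only by support staff) can be sketched as two checks. This is not the GGUS implementation or API: the `User` type and the functions `can_view` and `can_act` are hypothetical, and the sketch assumes that support staff must also be authenticated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    authenticated: bool = False   # holds a valid GGUS login (assumption)
    support_staff: bool = False   # e.g. a router operator with a support role


def can_view(user: User) -> bool:
    """Tickets are 'public' for anyone authenticated to GGUS."""
    return user.authenticated


def can_act(user: User) -> bool:
    """Only authenticated support staff may act on a ticket."""
    return user.authenticated and user.support_staff
```

With this split, an authenticated grid user can follow an LHCOPN ticket without being able to modify it, while an unauthenticated visitor sees nothing.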

GGUS architecture
[Figure: architecture diagram]
Labels: Interface; Webportal; Central Application (GGUS); TPM; Deployment Support (DS 1 … DS 5); Middleware Support (MS 1 … MS 8); Operations Support (ROC 1 … ROC 12, each with RC 1 … RC X); VO Support (ALICE, BIOMED, ESR, …); Network Support - ENOC; LHCOPN Support

Live Demo

Submit form

Ticket view & history

Update form

LHCOPN dashboard

Work remaining
Authentication!
– List of certificates to be gathered
Strategy for reminder and notification
– Targets to be gathered → Private area on twiki!
Template for common tickets
– Minimum information required…
Light documentation

Conclusion around GGUS
LHCOPN helpdesk available
Several details to be sorted out before production use
– Authentication, notification, workflow, documentation,...
– Not yet perfect – will follow the ops model
Will it be accepted?

Questions & discussion