1
EGEE-II INFSO-RI-031688
Enabling Grids for E-sciencE
www.eu-egee.org
EGEE and gLite are registered trademarks
LHCOPN Operational model: Roles and functions
Mathieu Goutelle (EGEE-II SA2 – CNRS), on behalf of the Operations WG
LHCOPN Operations Workshop – CERN, 2007-11-05
2
Outline
Work done since July
Service Quality definition for the LHCOPN
Operational model:
  – Main ideas
  – Roles of the identified entities
  – Interactions between the identified entities
  – Incident & Maintenance classifications
3
Since the last OPN meeting…
Building on the first draft we produced:
Identification of the roles and functions for the LHCOPN operations:
  – the End-to-End Coordination Unit (E2ECU),
  – the LHC IP Coordination Unit (LIPCU),
  – the Router Operators (R-Op),
  – and the Grid Operations Managers (Grid-OM).
Definition of the Service Quality required for the LHCOPN;
Classification of the sources of Incidents & Maintenance;
Definition of procedures for each class;
A first draft version has been produced:
  – Circulated to the LHCOPN mailing lists;
  – Needs some additional work before it can be finalized…
4
Service Quality
Networking is mentioned in the LCG MoU (version of 2 August 2007):
  – The requirements are very basic;
  – An accurate definition of the metrics & indicators used is missing.
Definition of requirements and criteria:
  – Behaviour of the OPN in case of outage;
  – Protection of the primary traffic against the backup traffic;
  – Time to respond after the very first basic investigations;
  – Maximum failure duration.
These still need to be elaborated later to meet:
  – the LHCOPN primary goal,
  – and the stakeholders' constraints and means.
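Since the MoU leaves these metrics undefined, the following is only a minimal sketch of how the two time-based criteria could be computed. It assumes each ticket records when the fault was detected, when the first response was sent after the basic investigations, and when service was restored. The `IncidentTimes` fields and the targets passed to `meets_targets` are hypothetical; the actual thresholds remain to be agreed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentTimes:
    """Hypothetical timestamps recorded on an LHCOPN incident ticket."""
    detected: datetime   # fault first detected
    responded: datetime  # first response after the basic investigations
    restored: datetime   # service restored


def time_to_respond(t: IncidentTimes) -> timedelta:
    """Delay between detection and the first response."""
    return t.responded - t.detected


def failure_duration(t: IncidentTimes) -> timedelta:
    """Failure duration, measured here (by assumption) from detection to restoration."""
    return t.restored - t.detected


def meets_targets(t: IncidentTimes, max_response: timedelta,
                  max_failure: timedelta) -> bool:
    """True if both criteria are within the given (not yet agreed) targets."""
    return (time_to_respond(t) <= max_response
            and failure_duration(t) <= max_failure)


# Illustrative values only.
inc = IncidentTimes(
    detected=datetime(2007, 11, 5, 9, 0),
    responded=datetime(2007, 11, 5, 9, 30),
    restored=datetime(2007, 11, 5, 13, 0),
)
print(time_to_respond(inc))   # 0:30:00
print(failure_duration(inc))  # 4:00:00
```

Behaviour during outages and protection of primary traffic against backup traffic are not time measurements, so they are not covered by this sketch.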
5
Main ideas
Multi-domain issue:
  – Rationale behind the introduction of the Coordination entities.
Speaking only about roles here:
  – Implementation can differ between domains/sites/…
  – Some implementation choices have not yet been decided.
We won't deal with the lower layer here:
  – NRENs/E2ECU processes are dealt with in a separate E2ECU document.

Functions                Operations              Coordination
Data movements service   Grid Operations Mgrs
IP service               Router Operators        LIPCU
E2E Layer 2 service      NRENs                   E2ECU
6
The End-to-End Coordination Unit
The physical layer of the LHCOPN is handled by the E2ECU, whose functions are:
  – Fault detection on end-to-end links: supervision of the e2e links and fault/maintenance announcement via Trouble Ticket distribution to the LIPCU. Currently restricted to Tier-0/Tier-1 and Tier-1/Tier-1 links;
  – Coordination of the troubleshooting of network incidents;
  – Provision of monthly reports to the NREN NOCs.
Already exists (implemented by DANTE) and is shared between the various projects that use e2e links, including the LHCOPN.
Procedures are described in another document.
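The scope restriction and the ticket distribution to the LIPCU can be illustrated with a small sketch. The `Link` and `TroubleTicket` types, the `announce` helper and the tier encoding (0 for the Tier-0, 1 for a Tier-1) are hypothetical, made up for illustration; they do not describe the E2ECU's actual tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Tier pairs currently supervised by the E2ECU: Tier-0/Tier-1 and Tier-1/Tier-1.
SUPERVISED_TIER_PAIRS = {(0, 1), (1, 1)}


@dataclass
class Link:
    """Hypothetical description of an e2e link between two sites."""
    name: str
    tier_a: int
    tier_b: int


@dataclass
class TroubleTicket:
    """Hypothetical fault/maintenance announcement sent by the E2ECU."""
    link: str
    kind: str                # "fault" or "maintenance"
    recipient: str = "LIPCU"


def in_e2ecu_scope(link: Link) -> bool:
    """True if the link joins a supervised pair of tiers (order-independent)."""
    return tuple(sorted((link.tier_a, link.tier_b))) in SUPERVISED_TIER_PAIRS


def announce(link: Link, kind: str) -> Optional[TroubleTicket]:
    """Build a ticket for the LIPCU, or return None if the link is out of scope."""
    if kind not in ("fault", "maintenance"):
        raise ValueError(f"unknown announcement kind: {kind}")
    if not in_e2ecu_scope(link):
        return None
    return TroubleTicket(link=link.name, kind=kind)
```

Sorting the tier pair means a Tier-1/Tier-0 link is treated the same as a Tier-0/Tier-1 one.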
7
The Grid Operations Managers
Play the role of the LHCOPN users:
  – Represent all the LHC experiments;
  – No differentiation between specific groups.
The functions of the Grid Operations Managers are the following:
  – Raising tickets with the LIPCU when they detect a problem or a problem is reported to them;
  – Acting as the sole contact point for LHCOPN-related issues: apart from the Grid-OMs, no one else from the LHC user community is entitled to send a request to the LIPCU;
  – Receiving updates about any LIPCU ticket and distributing them as they consider appropriate.
8
The Router Operators
Responsible for the Layer 3 equipment of the LHCOPN:
  – Able to modify its configuration;
  – With respect to LHCOPN operations, they deal only with the LIPCU, which coordinates their actions;
  – Also responsible for the part of the monitoring framework required for the operation of the service that is located at their site.
Often implemented in the T0/T1 NOCs:
  – There are exceptions, however, where the entity responsible for the router lies in another domain.
Their functions are:
  – Receive incident reports from the LIPCU;
  – Act upon any incident report for which they are (even partially) responsible, following the agreed procedures;
  – Notify the LIPCU of any likely incident they detect and of any maintenance they schedule on the LHCOPN;
  – Maintain the part of the monitoring framework they are responsible for (located at their site).
9
The LHC IP Coordination Unit
The service layer is coordinated by the LIPCU:
  – Fault detection on the service provided between the sites: supervision of the IP service (routing status, performance issues) and fault/maintenance announcement via Trouble Ticket distribution;
  – Helpdesk service: reception of problem reports from the R-Ops or Grid-OMs;
  – Coordination of the troubleshooting of the issues;
  – Provision of periodic reports.
Interface between the Grid-OMs and the rest of the LHCOPN operations.
Interface between the R-Ops and the E2ECU:
  – To streamline the R-Ops' procedures;
  – It is left to the LIPCU to deal with a request internally or forward it to the E2ECU, depending on the results of its preliminary investigations.
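The decision left to the LIPCU can be sketched as a simple rule. The `Findings` fields are hypothetical stand-ins for the outcome of its preliminary investigations, and the rule itself is an illustrative assumption rather than an agreed procedure: if the underlying e2e link is down, the request goes to the E2ECU; otherwise the issue lies in the IP service and the LIPCU keeps it, coordinating the R-Ops.

```python
from dataclasses import dataclass
from typing import NamedTuple


@dataclass
class Findings:
    """Hypothetical results of the LIPCU's preliminary investigations."""
    link_up: bool     # is the underlying e2e (layer 2) link up?
    routing_up: bool  # is routing over the link working?


class Decision(NamedTuple):
    handler: str  # "E2ECU" (forwarded) or "LIPCU" (handled internally)
    reason: str


def route_request(f: Findings) -> Decision:
    """Decide who handles a request (illustrative rule only)."""
    if not f.link_up:
        # Layer 2 problem: forward to the E2ECU, which coordinates the NRENs.
        return Decision("E2ECU", "e2e link down")
    if not f.routing_up:
        # IP service problem: handled internally, coordinating the R-Ops.
        return Decision("LIPCU", "routing issue")
    # Link and routing look fine: treat it as a possible performance issue.
    return Decision("LIPCU", "possible performance issue")
```

Checking the link first mirrors the layering in the roles table: a layer 2 fault explains any routing or performance symptom above it.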
10
Interactions
[Diagram: NRENs & GÉANT2 NOCs, E2ECU, Router Operators, LHC IP CU and Grid Operations Managers, linked by flows labelled "LIPCU tickets request", "LIPCU update notifications", "E2ECU tickets request" and "E2ECU update notifications".]
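Read together with the role descriptions, these interactions form a small set of permitted message flows with the LIPCU as the hub. Grid-OMs and R-Ops exchange tickets and updates only with the LIPCU, and only the LIPCU exchanges them with the E2ECU. The sketch below encodes that reading as a lookup table. The role labels and message names follow the diagram labels but are otherwise assumptions. The NREN/E2ECU exchanges are covered by a separate E2ECU document and are left out.

```python
# Permitted (sender, receiver) pairs and the message each pair carries,
# as read from the role descriptions. Role labels are illustrative.
FLOWS = {
    ("Grid-OM", "LIPCU"): "LIPCU ticket request",
    ("R-Op", "LIPCU"): "LIPCU ticket request",
    ("LIPCU", "Grid-OM"): "LIPCU update notification",
    ("LIPCU", "R-Op"): "LIPCU update notification",
    ("LIPCU", "E2ECU"): "E2ECU ticket request",
    ("E2ECU", "LIPCU"): "E2ECU update notification",
}


def message_type(sender: str, receiver: str) -> str:
    """Return the message type for a flow, or raise if the flow is not expected."""
    try:
        return FLOWS[(sender, receiver)]
    except KeyError:
        raise PermissionError(
            f"{sender} is not expected to contact {receiver} directly"
        ) from None


def contacts(role: str) -> set:
    """All roles a given role may send messages to."""
    return {receiver for (sender, receiver) in FLOWS if sender == role}


# A Grid-OM can only reach the E2ECU through the LIPCU.
print(contacts("Grid-OM"))  # {'LIPCU'}
```

Keeping the flows in one table makes the "sole contact point" rule for the Grid-OMs easy to check: their only outgoing entry points to the LIPCU.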
11
Incident & Maintenance classification
Classified by their primary sources:
  – Incident reported by a Grid-OM,
  – Incident detected by the LIPCU, e.g. by means of monitoring tools,
  – Scheduled maintenance reported by an R-Op,
  – Incident reported by an R-Op,
  – Scheduled maintenance reported by the E2ECU,
  – Incident reported by the E2ECU.
Missing sources?
  – Procedures will be updated/added accordingly.
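Each class is identified by who reports the event and whether it is an incident or scheduled maintenance, so the classification can be sketched as a lookup keyed on those two attributes. The enum and function names are hypothetical. The six entries are the classes listed above. Any unlisted combination fails the lookup, which is where the question of missing sources arises.

```python
from enum import Enum


class Source(Enum):
    GRID_OM = "Grid-OM"
    LIPCU = "LIPCU"
    R_OP = "R-Op"
    E2ECU = "E2ECU"


class Kind(Enum):
    INCIDENT = "incident"
    MAINTENANCE = "scheduled maintenance"


# The six classes listed on the slide, keyed by (primary source, kind).
CLASSES = {
    (Source.GRID_OM, Kind.INCIDENT): "Incident reported by a Grid-OM",
    (Source.LIPCU, Kind.INCIDENT): "Incident detected by the LIPCU",
    (Source.R_OP, Kind.MAINTENANCE): "Scheduled maintenance reported by an R-Op",
    (Source.R_OP, Kind.INCIDENT): "Incident reported by an R-Op",
    (Source.E2ECU, Kind.MAINTENANCE): "Scheduled maintenance reported by the E2ECU",
    (Source.E2ECU, Kind.INCIDENT): "Incident reported by the E2ECU",
}


def classify(source: Source, kind: Kind) -> str:
    """Return the class (and hence the procedure) for an event.

    Unlisted combinations raise ValueError: they correspond to the
    "missing sources" for which a procedure would have to be added.
    """
    try:
        return CLASSES[(source, kind)]
    except KeyError:
        raise ValueError(
            f"no class defined for {kind.value} from {source.value}"
        ) from None
```

Adding a procedure for a new source then amounts to adding one entry to `CLASSES`.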
12
Comments / Questions?