EGEE-II INFSO-RI
Enabling Grids for E-sciencE
EGEE and gLite are registered trademarks
LHCOPN Operational model: Roles and functions
Mathieu Goutelle (EGEE-II SA2 – CNRS), on behalf of the Operations WG
LHCOPN Operations Workshop – CERN,
Outline
Work done since July;
Service Quality definition for the LHCOPN;
Operational model:
  – Main ideas;
  – Roles of the identified entities;
  – Interactions between the identified entities;
  – Incident & Maintenance classifications.
Since the last OPN meeting…
Based on the first draft we produced;
Identification of the roles and functions for the LHCOPN operations:
  – the End-to-End Coordination Unit (E2ECU),
  – the LHC IP Coordination Unit (LIPCU),
  – the Router Operators (R-Op),
  – and the Grid Operations Managers (Grid-OM).
Definition of the Service Quality required for the LHCOPN;
Classification of sources for Incidents & Maintenance;
Definition of procedures for each class;
A first draft version has been produced:
  – Circulated to the LHCOPN mailing lists;
  – Needs some additional work to be finalized…
Service Quality
Networking is mentioned in the LCG MoU (version of 2 August 2007):
  – Requirements are very basic;
  – Accurate definitions are missing for the metrics & indicators used.
Definition of requirements and criteria:
  – Behaviour of the OPN in case of outage;
  – Protection of the primary traffic against the backup traffic;
  – Time to respond after the very first basic investigations;
  – Maximum failure duration.
Still needs to be elaborated later to meet:
  – the LHCOPN primary goal,
  – and the stakeholders' constraints and means.
Main ideas
Multi-domain issue:
  – Rationale behind the introduction of the Coordination entities;
Speaking only about roles here:
  – Implementation can differ between domains/sites/…
  – Some implementation choices are not yet decided.
We won't deal with the lower layer here:
  – Dealt with by the NRENs/E2ECU processes, in a separate E2ECU document.

Functions                 Operations               Coordination
Data movements service    Grid Operations Mgrs
IP service                Router Operators         LIPCU
E2E Layer 2 service       NRENs                    E2ECU
The End-to-End Coordination Unit
The physical layer of the LHCOPN is handled by the E2ECU, whose functions are:
  – Fault detection on end-to-end links: supervision of the e2e links and fault/maintenance announcement via Trouble Ticket distribution to the LIPCU. Currently restricted to Tier-0/Tier-1 and Tier-1/Tier-1 links;
  – Co-ordination of the troubleshooting of network incidents;
  – Provision of monthly reports to NREN NOCs.
Already exists (implemented by DANTE) and is shared between various projects that use e2e links, including the LHCOPN.
Procedures are described in another document.
The Grid Operations Managers
Play the role of the LHCOPN users:
  – Represent all the LHC experiments;
  – No differentiation between specific groups.
The functions of the Grid Operations Managers are the following:
  – Raising tickets with the LIPCU when they detect a problem or a problem is reported to them;
  – Acting as the sole contact point for LHCOPN-related issues. This means that, apart from the Grid-OMs, no one else from the LHC user community is entitled to send a request to the LIPCU;
  – Receiving updates about any LIPCU ticket and distributing them as they consider appropriate.
The Router Operators
Responsible for the Layer 3 equipment of the LHCOPN:
  – Able to modify its configuration;
  – With respect to LHCOPN operations, in contact only with the LIPCU, which coordinates their actions;
  – Also responsible for the part of the monitoring framework, required for the operation of the service, that is located at their site.
Often implemented in the T0/T1 NOCs:
  – Exceptions exist, however, where the entity responsible for the router lies in another domain.
Their functions are:
  – Receive incident reports from the LIPCU;
  – Act upon any incident report for which they are responsible (even partially) for the causes, following the agreed procedures;
  – Notify the LIPCU of any likely incident they detect or any maintenance they schedule on the LHCOPN;
  – Maintain the part of the monitoring framework they are responsible for (located at their sites).
The LHC IP Coordination Unit
The service layer is co-ordinated by the LIPCU:
  – Fault detection on the service provided between the sites: supervision of the IP service (routing status, performance issues) and fault/maintenance announcement via Trouble Ticket distribution;
  – Helpdesk service: reception of trouble reports from R-Ops or Grid-OMs;
  – Co-ordination of the troubleshooting of the issues;
  – Provision of periodic reports.
Interface between the Grid-OMs and the rest of the LHCOPN operations;
Interface between the R-Ops and the E2ECU:
  – To streamline the procedures of the R-Ops;
  – It is left to the LIPCU to deal with a request internally or forward it to the E2ECU, depending on the results of its preliminary investigations.
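The helpdesk and forwarding rules on this slide can be illustrated with a small sketch. It is not part of the operational model document: the names `Role`, `Request`, `LIPCU_REQUESTERS` and `triage` are hypothetical, and the boolean `is_ip_layer_issue` merely stands in for the outcome of the LIPCU's preliminary investigation, whose actual criteria are not given here.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    GRID_OM = "Grid Operations Manager"
    R_OP = "Router Operator"
    LIPCU = "LHC IP Coordination Unit"
    E2ECU = "End-to-End Coordination Unit"


# Roles whose trouble reports the LIPCU helpdesk receives, per the slide.
LIPCU_REQUESTERS = {Role.GRID_OM, Role.R_OP}


@dataclass
class Request:
    sender: Role
    summary: str


def triage(request: Request, is_ip_layer_issue: bool) -> Role:
    """Return the unit that handles the request after preliminary investigation.

    `is_ip_layer_issue` is an assumed flag: True means the LIPCU deals with
    the request internally, False means it forwards the request to the E2ECU.
    """
    if request.sender not in LIPCU_REQUESTERS:
        raise PermissionError(
            f"{request.sender.value} may not raise a request with the LIPCU"
        )
    return Role.LIPCU if is_ip_layer_issue else Role.E2ECU
```

For example, `triage(Request(Role.R_OP, "BGP session flapping"), is_ip_layer_issue=False)` returns `Role.E2ECU`, reflecting that R-Ops never contact the E2ECU directly and the LIPCU decides whether to forward.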
Interactions
[Diagram. Entities: NRENs & GÉANT2 NOCs, E2ECU, Router Operators, LHC IP CU, Grid Operations Managers. Flows labelled: LIPCU ticket requests, LIPCU update notifications, E2ECU ticket requests, E2ECU update notifications.]
Incident & Maintenance classification
Classified by their primary sources:
  – Incident reported by a Grid-OM,
  – Incident detected by the LIPCU, e.g. by means of monitoring tools,
  – Scheduled maintenance reported by an R-Op,
  – Incident reported by an R-Op,
  – Scheduled maintenance reported by the E2ECU,
  – Incident reported by the E2ECU.
Missing sources?
  – Procedures will be updated/added accordingly.
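As an illustration only, the six classes listed above can be written as a lookup keyed by source and event kind. The names `Source`, `Kind`, `CLASSES` and `classify` are hypothetical; the table holds just the combinations listed on this slide, so a combination not yet covered is rejected until a procedure is added for it.

```python
from enum import Enum


class Source(Enum):
    GRID_OM = "Grid-OM"
    LIPCU = "LIPCU"
    R_OP = "R-Op"
    E2ECU = "E2ECU"


class Kind(Enum):
    INCIDENT = "incident"
    MAINTENANCE = "scheduled maintenance"


# One entry per class listed on the slide; each class has its own procedure.
CLASSES = {
    (Source.GRID_OM, Kind.INCIDENT): "Incident reported by a Grid-OM",
    (Source.LIPCU, Kind.INCIDENT): "Incident detected by the LIPCU",
    (Source.R_OP, Kind.MAINTENANCE): "Scheduled maintenance reported by an R-Op",
    (Source.R_OP, Kind.INCIDENT): "Incident reported by an R-Op",
    (Source.E2ECU, Kind.MAINTENANCE): "Scheduled maintenance reported by the E2ECU",
    (Source.E2ECU, Kind.INCIDENT): "Incident reported by the E2ECU",
}


def classify(source: Source, kind: Kind) -> str:
    """Return the class name for an event, or raise if no class covers it."""
    try:
        return CLASSES[(source, kind)]
    except KeyError:
        raise ValueError(
            f"no class defined for {kind.value} from {source.value}"
        ) from None
```

Keeping the classes in a single table means a newly identified source only needs a new entry, which matches the intent that procedures are updated or added as missing sources are found.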
Comments / Questions?