
1 SA2: Networking Support Status Report
Xavier Jeannin, Activity Manager, CNRS
EGEE-III First Review, June 2009

2 SA2 Overview
6 countries and one international entity
SA2 Budget
Country           Total PM planned at M24   Total FTE
France            96                        4.0
Germany           12                        0.5
Greece            18                        0.8
Italy
Russia            6                         0.3
Spain
DANTE (GEANT2)    3                         0.1
Total             153                       6.4
Networking Support – Xavier Jeannin - EGEE-III First Review June 2009
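The FTE figures in the budget table are consistent with dividing the planned person-months by 24 months and rounding half-up to one decimal. The sketch below checks that reading; the 24-month divisor and the rounding rule are assumptions inferred from the numbers, not something the slide states, and only the figures legible in the table are used.

```python
from decimal import Decimal, ROUND_HALF_UP

MONTHS = 24  # assumption: effort is spread over the 24 months up to M24


def pm_to_fte(person_months: int, months: int = MONTHS) -> Decimal:
    """Convert person-months to full-time equivalents, rounded half-up to 0.1."""
    fte = Decimal(person_months) / Decimal(months)
    return fte.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Person-month figures that are legible in the budget table.
planned_pm = {
    "France": 96,
    "Germany": 12,
    "Greece": 18,
    "Russia": 6,
    "DANTE (GEANT2)": 3,
    "Total": 153,
}

fte_by_partner = {name: pm_to_fte(pm) for name, pm in planned_pm.items()}

for name, fte in fte_by_partner.items():
    print(f"{name:16} {planned_pm[name]:4d} PM  {fte} FTE")
```

Decimal arithmetic is used because binary floating point with Python's built-in round() would round 0.25 down to 0.2 rather than up to the 0.3 shown for Russia.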

3 SA2 Global view
SA2 – EGEE-III
Tasks:
  TSA2.1 Running the ENOC
  TSA2.2 Support for the ENOC
  TSA2.3 Overall Networking coordination
  TSA2.4 Management and general project tasks
Sub-activities (partners):
  Operational procedures (CNRS)
  WLCG Support (CNRS)
  Operational tools and maintenance (RRC-KI, CNRS)
  Monitoring (DFN)
  Troubleshooting (DFN)
  IPv6 (GARR, CNRS)
  TT exchange standardization (GRNET)
  Advanced network services (GRNET)
  Site networking needs (RedIRIS)
  TNLC

4 EGEE Network Operation Centre
A single point of contact between EGEE and the NRENs, where EGEE and the network can exchange operational information
A Network support unit in GGUS (GGUS = Global Grid User Support)
Role of the ENOC: ensuring E2E connectivity for Grid sites on the whole path
  Assess the impact on the Grid of network trouble
  Troubleshoot problems
  Provide support to users
  Identify the faulty domain
  Assess the network connectivity of the Grid sites
[Figure: the ENOC between EGEE (sites, GGUS, users, support units) and the network (NRENs, GÉANT2)]
[Figure: end-to-end path across Grid site 1, RC 1, NREN A, GÉANT2, NREN B, RC 2 and Grid site 2; GÉANT2 is operated by DANTE, and NREN A, NREN B, RC 1 and RC 2 are each operated by their own NOC]

5 Network connectivity assessment
Assessment for year 2008 on EGEE certified Grid sites (~300), using the DownCollector tool
Network troubles are not concentrated on a few sites
More than half of the connectivity problems detected are on-site
80% of off-site network troubles are solved within 30 minutes
Only ~45 per month last longer

6 ENOC metrics
19 NRENs sending their tickets, 11 languages
Network         Language     Kind
ACONET          German       NREN
CESNET          Czech
DFN
E2ECU           English      LHCOPN
GARR            Italian
GEANT2                       REGIONAL
GRNET           Greek
HEANET
HUNGARNET       Hungarian
ILAN
JANET
NORDUNET
PIONIER         Polish
RBNET/RUNNET    Russian
REDIRIS         Spanish
RENATER         French
SURFNET
SWITCH
TWAREN          Chinese
Total           11
Very few Grid user notifications about network problems
Steady stream of […]s/mth, 800 tickets/mth
75% of European EGEE certified sites covered
Usage: information processed by the ENOC is increasingly used
  Number of hits has been multiplied by 6 since 2008
  Data downloaded has increased by a factor of 5 since 2008

7 WLCG Support
EGEE will be the main user of the LHCOPN
SA2 has taken the lead in designing and implementing a pioneering federated operational model for the LHCOPN
Distributed, not centralized: Tiers are responsible for network operation

8 WLCG Support
Processes were documented and disseminated
Several meetings and training sessions helped the dissemination
Related tools were released, including a GGUS helpdesk tailored for the LHCOPN
Implementation is ongoing and will be ready for LHC start-up
[Figure: example of layer 2 incident management]

9 Operational tools and maintenance
Trouble matching and correlation for the ENOC
  Correlate tickets with monitoring data
  Better assessment of the impact on the Grid of trouble tickets
  Be able to warn Grid operations in case of a network connectivity outage at EGEE sites
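One way to picture the correlation of tickets with monitoring data is a time-window match between a ticket and the site outages seen by monitoring. This is only a minimal sketch, not the ENOC's matching algorithm: the Ticket and Outage records, their fields, and the 10-minute slack are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Ticket:
    ticket_id: str
    network: str  # network that issued the trouble ticket
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Outage:
    site: str
    network: str  # network the site is connected through
    start: datetime
    end: datetime


def overlaps(ticket, outage, slack=timedelta(minutes=10)):
    """True if the two time windows intersect, allowing some clock slack."""
    return (ticket.start - slack <= outage.end
            and outage.start <= ticket.end + slack)


def match_tickets(tickets, outages):
    """Map each ticket id to the sorted list of sites it may have impacted."""
    return {
        t.ticket_id: sorted({
            o.site for o in outages
            if o.network == t.network and overlaps(t, o)
        })
        for t in tickets
    }


t0 = datetime(2009, 1, 1, 12, 0)
hour, minute = timedelta(hours=1), timedelta(minutes=1)
tickets = [Ticket("T1", "NREN-A", t0, t0 + hour),
           Ticket("T2", "NREN-B", t0, t0 + hour)]
outages = [Outage("site-1", "NREN-A", t0 + 5 * minute, t0 + 30 * minute),
           Outage("site-2", "NREN-A", t0 + 3 * hour, t0 + 4 * hour),
           Outage("site-3", "NREN-B", t0 - 5 * minute, t0 - 2 * minute)]
matches = match_tickets(tickets, outages)
print(matches)
```

A ticket that matches no outage suggests little visible impact on the Grid, while a ticket that matches several sites is a candidate for warning Grid operations.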

10 Operational tools and maintenance
First stage of our study
The results are experimental and should improve
Future work plan includes:
  Moving from experiment to production
  Automatic ticket ranking based on matching results
  Tuning of the matching algorithm, possibly through more extensive use of topology knowledge

11 Network monitoring tools
Network monitoring tools for efficient troubleshooting
PerfSONAR-Lite TroubleShooting Services
  Based on PerfSONAR-PS
  Launch tests on demand from a Grid site under central server control:
    Bandwidth measurements
    DNS lookup
    Traceroute
    Port testing
    Ping
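Two of the on-demand tests listed above, DNS lookup and port testing, can be illustrated with the Python standard library. This is a sketch of what such probes do, not the PerfSONAR-Lite TSS code; the function names and the default timeout are assumptions.

```python
import socket


def dns_lookup(hostname):
    """Return the sorted set of addresses (IPv4 and IPv6) a name resolves to."""
    infos = socket.getaddrinfo(hostname, None, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `port_open("se.example.org", 2811)` would test the usual gridFTP control port on a hypothetical storage element. A closed port and a filtered port both return False here; telling them apart would require looking at the specific error.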

12 Network monitoring tools
First beta release is expected in June
Beta testers: CNRS, NorduNET, GARR
First version: Autumn 2009
Detection of asymmetric traffic by launching a traceroute test on the remote site
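Detecting asymmetric traffic amounts to comparing the path from A to B with the path from B to A, which is why a traceroute has to be launched on the remote site as well. The sketch below assumes both traceroutes have already been run and that each hop has been normalized to a router name; that normalization is an assumption, since raw traceroute output usually shows a different interface address of the same router in each direction.

```python
def is_asymmetric(forward_hops, reverse_hops):
    """True if the return path is not the forward path walked backwards."""
    return list(forward_hops) != list(reversed(reverse_hops))


def differing_routers(forward_hops, reverse_hops):
    """Routers that appear in only one of the two directions."""
    return sorted(set(forward_hops) ^ set(reverse_hops))


# Hypothetical router names, listed in the order each traceroute reports them.
a_to_b = ["site-a-gw", "nren-a-1", "backbone-1", "nren-b-1", "site-b-gw"]
b_to_a = ["site-b-gw", "nren-b-1", "backbone-2", "nren-a-1", "site-a-gw"]

print(is_asymmetric(a_to_b, b_to_a))
print(differing_routers(a_to_b, b_to_a))
```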

13 Sites networking needs
Assess network requirements (bandwidth, delay, jitter, etc.) for a site within the Grid, according to the kind of site and the VOs supported
Empirical approach
Deployment of perfSONAR at country scale
  RedIRIS provides significantly more effort for this task than is funded through EGEE
  First deployment in Europe of such a solution over several domains (4 domains, 8 sites); no appliance box is used
  PerfSONAR is deployed in EGEE sites and in the networks used
  Interoperability issue between perfSONAR versions: perfSONAR MDM (Multi-Domain Monitoring) and perfSONAR PS
  First deployment: end of September

14 Sites networking needs
[Figure: topology of the network monitored by this task. Labels: EGEE sites, TIER 1, Regional Network, Anella CESCA; USC, CESGA, IFAE, IFCA, PIC, UB, CAM, UAM, CIEMAT, IFIC; EB-Santander0, EB-Bilbao0, EB-Santiago0, EB-Iris4, EB-Iris2, EB-Barcelona0, EB-Madrid0; GW-Barcelona0, GW-Nacional1, GW-Nacional2, GW-Madrid0, GW-Valencia0]

15 Advanced network services
Collaboration with the AMPS (Advanced Multi-domain Provisioning System) team in order to automate network SLA establishment
Development of a web interface to manage the EGEE SLA requests
  Store and manage the EGEE users’ SLA requests
  ENOC will act on behalf of the user
  The user request is stored in the ENOC
  The ENOC validates it and then forwards it to the AMPS system to make the reservation
AutoBAHN (Automated Bandwidth Allocation across Heterogeneous Networks) has also been studied but does not seem mature at the moment
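The request handling described above (store, validate, forward to AMPS) can be sketched as a small state machine. Everything here is hypothetical: the field names, the state names, and in particular the REJECTED state, which the slide does not mention; the actual call to AMPS is deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class SlaState(Enum):
    STORED = "stored"        # request recorded by the ENOC
    VALIDATED = "validated"  # ENOC has checked the request
    REJECTED = "rejected"    # assumption: validation can fail
    FORWARDED = "forwarded"  # handed over to AMPS for the reservation


ALLOWED = {
    SlaState.STORED: {SlaState.VALIDATED, SlaState.REJECTED},
    SlaState.VALIDATED: {SlaState.FORWARDED},
    SlaState.REJECTED: set(),
    SlaState.FORWARDED: set(),
}


@dataclass
class SlaRequest:
    user: str
    source_site: str
    destination_site: str
    bandwidth_mbps: int
    state: SlaState = SlaState.STORED

    def advance(self, new_state: SlaState) -> None:
        """Move to new_state, refusing transitions that skip a step."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {new_state.name}")
        self.state = new_state


request = SlaRequest("alice", "site-a", "site-b", bandwidth_mbps=500)
request.advance(SlaState.VALIDATED)
request.advance(SlaState.FORWARDED)
print(request.state.name)
```

Keeping the transition table explicit makes it impossible to forward a request that the ENOC has not validated.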

16 Technical Network Liaison Committee
TNLC (Technical Network Liaison Committee): set up during EGEE in order to ease the technical discussions between EGEE, the NRENs and the GÉANT2 project
Participants: EGEE SA2, GÉANT2 (represented by DANTE as coordinator of GÉANT2), some of the NRENs involved in the EGEE activities, and CERN
2 meetings
Work mainly focused on:
  Monitoring: design a solution for the Grid infrastructure
  Improvement of trouble ticket contents: improve the assessment of the impact of problems on the Grid

17 Trouble ticket exchange standardization
Ticket normalization is very important to improve the efficiency of project-wide network operations (impact assessment)
Standardizing interfaces with network providers
  EGEE initiated a standardization process
  Dissemination was also made through the submission of an RFC (draft-dzis-nwg-nttdm-00) about the normalization of trouble tickets: “The Network Trouble Ticket Data Model”, Internet Draft
GRNET and the CNRS provided the ENOC with a central server translating NRENs’ tickets into standard tickets
  Designed and implemented with open source software
[Figure: trouble ticket status transition diagram]
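Translating each NREN's tickets into standard tickets can be pictured as a per-source field mapping plus a status vocabulary mapping. The sketch below is illustrative only: the source names, raw field names and standard field names are hypothetical and are not taken from the Internet Draft or from the ENOC server.

```python
# Hypothetical per-source mappings: raw field name -> standard field name.
FIELD_MAPS = {
    "nren_a": {"id": "ticket_id", "etat": "status",
               "debut": "start_time", "resume": "description"},
    "nren_b": {"TicketNo": "ticket_id", "State": "status",
               "Begin": "start_time", "Summary": "description"},
}

# Hypothetical status vocabulary, covering two source languages.
STATUS_MAP = {"ouvert": "OPEN", "ferme": "CLOSED",
              "open": "OPEN", "closed": "CLOSED"}


def normalize(source, raw):
    """Translate one raw ticket (a dict) into the common representation."""
    ticket = {"source": source}
    for raw_key, std_key in FIELD_MAPS[source].items():
        if raw_key in raw:
            ticket[std_key] = raw[raw_key]
    if "status" in ticket:
        ticket["status"] = STATUS_MAP.get(str(ticket["status"]).lower(), "UNKNOWN")
    return ticket


french = normalize("nren_a", {"id": "42", "etat": "Ouvert",
                              "debut": "2009-06-01T10:00", "resume": "Coupure fibre"})
english = normalize("nren_b", {"TicketNo": "7", "State": "closed"})
print(french)
print(english)
```

Once every ticket uses the same field names and status values, impact assessment can run over all networks with a single code path, whatever the language of the original ticket.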

18 IPv6
IPv4 public address exhaustion → hard to deploy new Grid sites
Analysis of the gLite source code
  Using the IPv6 metric (IPv6 code checker) in ETICS to point out 75 parts of the code with indications of possible non-compliant function calls: 16 invalid (i.e. duplicate, obsolete component, false positive, etc.), 29 fixed, 30 being fixed
  This analysis effectively helped developers to work on IPv6
Assessment of the evolution obtained on the gLite repository of ETICS
IPv6 compliance of external dependencies
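A source-level IPv6 check of this kind can be approximated by scanning C/C++ code for calls to IPv4-only or legacy resolver functions. This is a minimal sketch and not the ETICS metric: the list of functions is a common but incomplete selection, and the function name `find_ipv4_only_calls` is hypothetical.

```python
import re

# Calls that are IPv4-only or obsolete in IPv6-aware code.
IPV4_ONLY_CALLS = ("gethostbyname", "gethostbyaddr",
                   "inet_addr", "inet_aton", "inet_ntoa")

_PATTERN = re.compile(r"\b(" + "|".join(IPV4_ONLY_CALLS) + r")\s*\(")


def find_ipv4_only_calls(source):
    """Return (line number, function name) for each suspicious call."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            findings.append((lineno, match.group(1)))
    return findings


sample = """\
struct hostent *h = gethostbyname(name);
addr.sin_addr.s_addr = inet_addr("192.0.2.1");
int rc = getaddrinfo(name, port, &hints, &res);
"""
print(find_ipv4_only_calls(sample))
```

The scan does not skip comments or string literals, so it produces false positives, which is one reason a share of reported findings ends up classified as invalid.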

19 Current status of gLite and IPv6
IPv6 compliance
  Full IPv6 compliance, production version: LFC, DPM, globus-url-copy/gridFTP
  Full IPv6 compliance, prototype version: BDII (perl)
  IPv6 compliance to be tested/verified by SA2 (gLite part of the deployment module claimed to be IPv6 compliant): CREAM, BDII (python), WMproxy/Job submission, blah
  IPv6 porting currently ongoing: gfal, lcgutils, VOMS, WMS-server
  IPv6 porting plan exists: FTS
  Currently no known porting plans: PX, VObox, MON, dCache, Torque C/S, MPIutils, Condorutils, AMGA

20 IPv6 support 1/2
A new IPv6 code checker developed by SA2: IPv6 CARE
  It monitors the execution of any program, even without the source code, detects networking function calls and provides a diagnosis
Many informative studies
  IPv6 programming method (C/C++, Java, Python and Perl) / IPv6 testing method
  gSOAP / Axis / Axis2 / Boost:asio / gridFTP / PythonZSI / PerlSOAPLite
Assessment of the IPv6 compliance of gLite components: DPM & LFC
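As an illustration of an IPv6 programming method in Python, the usual approach is to let getaddrinfo choose the address family instead of hard-coding AF_INET. This is a generic sketch of that idiom, not material from the SA2 studies; the function name `connect_any` is hypothetical.

```python
import socket


def connect_any(host, port, timeout=5.0):
    """Connect over IPv6 or IPv4, trying each address getaddrinfo returns."""
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Remember the failure and try the next candidate address.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError(f"no usable address for {host!r}")
```

The same code then works unchanged on IPv4-only, IPv6-only and dual-stack hosts. The standard library's `socket.create_connection` implements the same loop and is normally the simpler choice.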

21 IPv6 support 2/2
SA2 provides 2 testbeds (Rome/Paris) to check IPv6 compliance
Dissemination: meetings, training session, demonstration, video
Demonstration of the first two dual-stack IPv4/IPv6 sites of EGEE at User Forum 09 → smooth transition to IPv6
IPv6 next steps
  Integration into the EGEE validation process
  Testing new gLite IPv6 modules

22 SA2 summary
SA2 activity has completed all tasks and objectives for this first year of EGEE-III
ENOC
  Deployment of PerfSONAR-Lite TroubleShooting Services
  SA2 is providing an extra effort to design a network monitoring solution with NRENs and DANTE support
  Improve the impact assessment of trouble tickets by fostering collaboration with NRENs
WLCG / LHCOPN: design of the LHCOPN operational model
IPv6: improvement of gLite / first two dual-stack sites / smooth transition to IPv6
Trouble ticket exchange standardization: submission of an RFC, “The Network Trouble Ticket Data Model”, Internet Draft
Collaboration with NRENs, TNLC
  EGEE 09 – TERENA NRENs & Grid joint meeting, Barcelona, Sept. 2009
Transition toward EGI-NGI
  Network activity understaffed within the EGI-NGI structure

