SA2: Networking Support Status Report
Xavier Jeannin, Activity Manager, CNRS
EGEE-III First Review, June 2009
EGEE-III INFSO-RI, Enabling Grids for E-sciencE
EGEE and gLite are registered trademarks

SA2 Overview

Country         Total PM planned at M24   Total FTE
France          96                        4.0
Germany         12                        0.5
Greece          18                        0.8
Italy           12                        0.5
Russia          6                         0.3
Spain           6                         0.3
GEANT2-DANTE    3                         0.1
Total                                     6.4

SA2 Global view

TSA2.1 Running the ENOC
TSA2.2 Support for the ENOC
TSA2.3 Overall Networking coordination
TSA2.4 Management and general project tasks

Sub-tasks and partners shown in the diagram:
  – IPv6 (GARR, CNRS)
  – Operational procedures (CNRS)
  – WLCG Support (CNRS)
  – Operational tools and maintenance (RRC-KI, CNRS)
  – TT exchange standardization (GRNET)
  – Advanced network services (GRNET)
  – TNLC
  – Monitoring (DFN)
  – Site networking needs (RedIRIS)
  – Troubleshooting (DFN)

EGEE Network Operation Centre (CNRS)

Role of the ENOC
  – A single point of contact between EGEE and the NRENs
  – The ENOC ensures E2E connectivity for Grid sites on the whole path

[Figure: the ENOC sits between EGEE (sites, GGUS, users, support units) and the network (sites, NRENs, GÉANT2).]
[Figure: end-to-end path Grid site 1, RC 1, NREN A, GÉANT2, NREN B, RC 2, Grid site 2. Each segment is operated by its own NOC: NOC of RC1, NOC of NREN A, DANTE for GÉANT2, NOC of NREN B, NOC of RC2.]

Network connectivity assessment

Assessment for year 2008 on EGEE certified Grid sites (~300), using the DownCollector tool
  – Network troubles are not concentrated on a few sites
  – More than half of the connectivity problems detected are on-site
  – 80% of off-site network troubles are solved within 30 minutes
  – Only ~45/month last more [...]
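
The two headline ratios on this slide (share of on-site problems, share of off-site troubles solved within 30 minutes) can be derived from a list of outage records. The following is a minimal sketch only: the `Outage` record, its fields and the `summarize` function are hypothetical and do not describe the actual DownCollector data format.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """Hypothetical outage record; not the real DownCollector schema."""
    site: str
    on_site: bool        # True if the fault was located inside the site
    duration_min: float  # minutes until connectivity was restored


def summarize(outages, threshold_min=30.0):
    """Compute the on-site share and the share of off-site outages
    that were solved within `threshold_min` minutes."""
    if not outages:
        return {"on_site_share": 0.0, "off_site_fast_share": 0.0,
                "off_site_slow_count": 0}
    off_site = [o for o in outages if not o.on_site]
    fast = [o for o in off_site if o.duration_min <= threshold_min]
    return {
        "on_site_share": (len(outages) - len(off_site)) / len(outages),
        "off_site_fast_share": len(fast) / len(off_site) if off_site else 0.0,
        "off_site_slow_count": len(off_site) - len(fast),
    }
```

Feeding one month of records to `summarize` gives the monthly count of long off-site outages in `off_site_slow_count`.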

ENOC metrics

Very few Grid user notifications about network problems
19 NRENs sending their tickets, in 11 languages
Steady stream of [...]s/mth, 800 tickets/mth
75% of European EGEE certified sites covered
Usage information processed by the ENOC is used more and more
  – Number of hits has been multiplied by 6 since 2008
  – Data downloaded have increased by a factor of 5 since 2008

Network         Language    Kind
ACONET          German      NREN
CESNET          Czech       NREN
DFN             German      NREN
E2ECU           English     LHCOPN
GARR            Italian     NREN
GEANT2          English     REGIONAL
GRNET           Greek       NREN
HEANET          English     NREN
HUNGARNET       Hungarian   NREN
ILAN            English     NREN
JANET           English     NREN
NORDUNET        English     REGIONAL
PIONIER         Polish      NREN
RBNET/RUNNET    Russian     NREN
REDIRIS         Spanish     NREN
RENATER         French      NREN
SURFNET         English     NREN
SWITCH          English     NREN
TWAREN          Chinese     NREN
Total:          11

WLCG Support (CNRS)

SA2 has also taken the lead in designing and implementing a pioneering federated operational model for the LHCOPN
  – Distributed, not centralized: the Tiers are responsible for network operation

WLCG Support

Processes were documented and disseminated
  – Several meetings and training sessions helped the dissemination
Related tools were released, including a GGUS helpdesk tailored for the LHCOPN
Implementation is ongoing and will be ready for LHC start-up
[Figure: example of layer 2 incident management]

Operational tools and maintenance (RRC-KI)

Trouble matching and correlation for the ENOC
  – Correlate tickets with monitoring data
  – Better assessment of the impact of trouble tickets on the Grid

Operational tools and maintenance

First stage of our study
The results are experimental and should improve
Future work plan includes:
  – Moving from experiment to production
  – Automatic ticket ranking based on matching results
  – Tuning of the matching algorithm, possibly through more extensive use of the topology knowledge
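
One possible way to correlate tickets with monitoring data, and to rank tickets from the result, is to match each ticket against the site outages that overlap it in time on the same network. The sketch below illustrates that idea only; it is not the matching algorithm used by the ENOC. The `Ticket` and `Outage` records, the 10-minute slack and the rule "more impacted sites ranks higher" are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Ticket:
    ticket_id: str
    network: str      # network that issued the ticket
    start: datetime
    end: datetime


@dataclass
class Outage:
    site: str
    network: str      # network serving the site (assumed known from topology)
    start: datetime
    end: datetime


def overlap_seconds(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    latest_start = max(a_start, b_start)
    earliest_end = min(a_end, b_end)
    return max(0.0, (earliest_end - latest_start).total_seconds())


def match_tickets(tickets, outages, slack=timedelta(minutes=10)):
    """Map each ticket id to the sorted list of sites whose outage overlaps
    the ticket window (widened by `slack`) on the same network."""
    matches = {}
    for t in tickets:
        sites = set()
        for o in outages:
            if o.network != t.network:
                continue
            if overlap_seconds(t.start - slack, t.end + slack,
                               o.start, o.end) > 0:
                sites.add(o.site)
        matches[t.ticket_id] = sorted(sites)
    return matches


def rank_tickets(matches):
    """Order ticket ids by number of impacted sites, highest first."""
    return sorted(matches, key=lambda tid: len(matches[tid]), reverse=True)
```

Restricting candidates to the ticket's own network is the simplest use of topology knowledge; a finer topology model would narrow the candidate set further.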

Network monitoring tools (DFN)

Network monitoring tools for efficient troubleshooting
  – PerfSONAR-Lite TroubleShooting Services
  – Based on PerfSONAR-PS
  – Launch tests on demand from a Grid site under central server control:
      – Bandwidth measurements
      – DNS lookup
      – Traceroute
      – Ping
      – Nmap

Network monitoring tools

First beta release is expected in June
  – Beta testers: CNRS, NorduNET, GARR
  – First version in autumn 2009
Detection of asymmetric traffic by launching a traceroute test on the remote site
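
Asymmetric traffic can be detected by comparing the path seen from the local site with the path seen by a traceroute launched on the remote site. The sketch below shows only the comparison step and is not taken from PerfSONAR-Lite TSS. It assumes that both traceroute outputs have already been reduced to lists of intermediate router or domain identifiers, endpoints excluded; raw traceroute IP addresses cannot be compared directly because each direction reports a different interface of the same router.

```python
def is_asymmetric(forward_hops, reverse_hops):
    """Return True if the A-to-B path is not the mirror of the B-to-A path.

    Both arguments list intermediate router identifiers only (no endpoints),
    in the order each traceroute reported them.
    """
    return list(forward_hops) != list(reversed(reverse_hops))


def path_difference(forward_hops, reverse_hops):
    """Return (routers seen only on the forward path,
               routers seen only on the reverse path)."""
    fwd = list(forward_hops)
    rev = list(reversed(reverse_hops))
    only_forward = [h for h in fwd if h not in rev]
    only_reverse = [h for h in rev if h not in fwd]
    return only_forward, only_reverse
```

For example, a forward path `["r1", "r2", "r3"]` and a reverse path `["r3", "r4", "r1"]` are asymmetric, with `r2` used only on the way out and `r4` only on the way back.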

Sites networking needs (RedIRIS)

Assess network requirements (bandwidth, delay, jitter, etc.) for a site within the Grid, according to the kind of site and the VOs supported
Empirical approach
Deployment of perfSONAR at country scale
  – RedIRIS provides significantly more effort for this task than is funded through EGEE
  – First deployment in Europe of such a solution over several domains (4 domains, 8 sites); no appliance box is used
  – perfSONAR is deployed in the EGEE sites and in the networks used
Interoperability issue between perfSONAR versions
  – perfSONAR MDM (Multi-Domain Monitoring) and perfSONAR PS
First deployment at the end of September

Sites networking needs

[Figure: topology of the network monitored by this task. Legend: Tier 1, EGEE site, regional network. Routers: GW-Nacional1, GW-Nacional2, GW-Madrid0, GW-Barcelona0, GW-Valencia0, EB-Iris2, EB-Iris4, EB-Santiago0, EB-Bilbao0, EB-Santander0, EB-Barcelona0, EB-Madrid0. Sites and networks: IFIC, Anella, CESCA, CAM, USC, IFCA, UAM, CESGA, CIEMAT, UB, PIC, IFAE.]

Advanced network services (GRNET)

Collaboration with the AMPS (Advanced Multi-domain Provisioning System) team in order to automate network SLA establishment
Development of a web interface to manage the EGEE SLA requests
  – Store and manage the EGEE users' SLA requests
  – The ENOC will act on behalf of the user
  – The user request is stored in the ENOC
  – The ENOC validates it and then forwards it to the AMPS system to make the reservation
AutoBAHN (Automated Bandwidth Allocation across Heterogeneous Networks) has also been studied but does not seem mature at the moment
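
The request flow described on this slide (store, validate, forward) can be sketched as a small state machine. Everything below is illustrative: the `SlaRequest` fields, the validation rule and the `forward` callable standing in for the AMPS interface are hypothetical, since the slide does not describe the real web interface or the AMPS API.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    STORED = "stored"
    REJECTED = "rejected"
    FORWARDED = "forwarded"


@dataclass
class SlaRequest:
    """Hypothetical SLA request; field names are assumptions."""
    user: str
    src_site: str
    dst_site: str
    bandwidth_mbps: int
    status: Status = Status.STORED


class SlaStore:
    def __init__(self, forward):
        self.requests = []
        # `forward` is a placeholder callable for the provisioning system.
        self._forward = forward

    def submit(self, request):
        """Step 1: the user request is stored."""
        self.requests.append(request)
        return request

    def process(self, request):
        """Steps 2 and 3: validate, then forward on behalf of the user."""
        valid = (request.bandwidth_mbps > 0
                 and request.src_site != request.dst_site)
        if not valid:
            request.status = Status.REJECTED
            return request.status
        self._forward(request)
        request.status = Status.FORWARDED
        return request.status
```

Keeping the forwarding step behind a callable makes the store testable without any provisioning system attached.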

Technical Network Liaison Committee

TNLC (Technical Network Liaison Committee):
  – Set up during EGEE in order to ease the technical discussions between EGEE, the NRENs and the GÉANT2 project
  – Participants: EGEE SA2, GÉANT2 (represented by DANTE as coordinator of GÉANT2), some of the NRENs involved in the EGEE activities, and CERN
  – 2 meetings
Work mainly focused on:
  – Monitoring: design a solution for the Grid infrastructure
  – Improvement of trouble ticket contents: improve the assessment of the impact of problems on the Grid

Trouble ticket exchange standardization (GRNET)

GRNET and the ENOC team provide the ENOC with a central server translating the NRENs' tickets into standard tickets
  – Designed and implemented with open source software
Ticket normalization is very important to improve the efficiency of project-wide network operations
Dissemination was also carried out through the submission of an RFC on the normalization of trouble tickets
  – "The Network Trouble Ticket Data Model", Internet Draft
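
Translating tickets that arrive in different layouts and languages into one standard form amounts to a per-source mapping of field names, date formats and status vocabulary. The sketch below illustrates that idea with two invented sources. The source names, raw field names and output fields are hypothetical: they are not the formats of any real NREN and are not taken from the Internet Draft. Timestamps are assumed to be in UTC.

```python
from datetime import datetime, timezone

# Hypothetical per-source descriptions (illustration only).
SOURCES = {
    "EXAMPLE_FR": {
        "fields": {"id": "ref", "title": "sujet",
                   "start": "debut", "status": "etat"},
        "date_format": "%d/%m/%Y %H:%M",
        "status": {"ouvert": "open", "fermé": "closed"},
    },
    "EXAMPLE_IT": {
        "fields": {"id": "numero", "title": "oggetto",
                   "start": "inizio", "status": "stato"},
        "date_format": "%Y-%m-%d %H:%M",
        "status": {"aperto": "open", "chiuso": "closed"},
    },
}


def normalize(source, raw):
    """Translate one raw ticket (a dict) into the common layout."""
    spec = SOURCES[source]
    fields = spec["fields"]
    start = datetime.strptime(raw[fields["start"]], spec["date_format"])
    status_raw = raw[fields["status"]].strip().lower()
    return {
        "source": source,
        "original_id": raw[fields["id"]],
        "title": raw[fields["title"]],
        # Unknown status words are kept visible rather than guessed.
        "status": spec["status"].get(status_raw, "unknown"),
        "start": start.replace(tzinfo=timezone.utc).isoformat(),
    }
```

Adding a new ticket source then only requires a new entry in `SOURCES`, without touching `normalize`.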

IPv6 (GARR/CNRS)

Analysis of the gLite source code
  – The IPv6 metric (IPv6 code checker) in ETICS pointed out 75 parts of the code with suspected non-compliant function calls:
      – 16 invalid (i.e. duplicate, obsolete component, false positive, etc.), 29 fixed, 30 being fixed
  – This analysis effectively helped developers to work on IPv6
Assessment of the IPv6 compliance of external gLite dependencies
[Figure: IPv6 compliance of external dependencies]
[Figure: assessment of the evolution obtained on the gLite repository of ETICS]
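
A static check for suspected non-compliant function calls can be approximated by scanning source text for socket functions that only handle IPv4 addresses. The sketch below illustrates the general technique and is not the ETICS IPv6 metric itself. The list of function names is an assumption based on the classic BSD socket calls that are IPv4-only. Because it also matches names inside comments and strings, it produces false positives of the kind the slide mentions.

```python
import re

# Classic socket calls limited to IPv4 (assumed list, not the ETICS one).
IPV4_ONLY_CALLS = ("gethostbyname", "gethostbyaddr",
                   "inet_addr", "inet_aton", "inet_ntoa")

# Match the bare function name followed by an opening parenthesis.
_PATTERN = re.compile(r"\b(" + "|".join(IPV4_ONLY_CALLS) + r")\s*\(")


def find_suspect_calls(source_text):
    """Return (line number, function name) for each suspected call."""
    findings = []
    for lineno, line in enumerate(source_text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            findings.append((lineno, match.group(1)))
    return findings
```

Each finding still needs a manual review, which corresponds to sorting results into invalid, fixed and being fixed.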

Current status of gLite and IPv6

[Figure: IPv6 compliance of gLite components, grouped by status]
Status legend:
  – Full IPv6 compliance for the production version
  – Full IPv6 compliance for a prototype version
  – IPv6 compliance to be tested/verified by SA2: gLite part of the deployment module claimed to be IPv6 compliant
  – IPv6 porting currently ongoing
  – IPv6 porting plan exists
  – Currently no known porting plans
Components shown: LFC, DPM, globus-url-copy/gridFTP, BDII (perl), CREAM, VObox, lcgutils, VOMS, PX, MON, dCache, Torque C/S, MPI utils, Condor utils, AMGA, gfal, FTS, BDII (python), WMproxy/Job submission, blah, WMS-server

IPv6 support

A new IPv6 code checker developed by SA2: IPv6 CARE
  – It monitors the execution of any program, even without the source code, detects networking function calls and provides a diagnosis
Many informative studies
  – IPv6 programming methods: C/C++, Java, Python and Perl
  – IPv6 testing methods: gSOAP, Axis, Axis2, Boost:asio, gridFTP, PythonZSI, PerlSOAPLite
  – Assessment of the IPv6 compliance of gLite components: DPM & LFC
Dissemination: meetings, training sessions, demonstration, video
  – Demonstration of the first 2 dual-stack IPv4/IPv6 sites of EGEE at User Forum 09: smooth transition to IPv6
IPv6 next steps
  – Integration into the EGEE validation process
  – Testing new gLite IPv6 modules
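
As an illustration of the kind of IPv6 programming method covered by such studies, the sketch below shows a protocol-independent client connection in Python. It is a generic example, not material from the SA2 studies. It relies on `socket.getaddrinfo` with `AF_UNSPEC` so that the same code works on IPv4-only, IPv6-only and dual-stack hosts; the function name `connect_any` is made up for the example.

```python
import socket


def connect_any(host, port, timeout=5.0):
    """Connect to host:port over whichever address family works.

    Tries every address returned by the resolver (IPv6 and IPv4) in
    order and returns the first socket that connects.
    """
    last_error = None
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Remember the failure and try the next candidate address.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("no usable address for %r" % (host,))
```

The standard library's `socket.create_connection` follows the same pattern; the explicit loop is shown here because it mirrors what ported C code does with `getaddrinfo`.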

SA2 summary

ENOC
  – Deployment of PerfSONAR-Lite TroubleShooting Services
  – SA2 is providing an extra effort to design network monitoring with NRENs and DANTE support
  – Improve the impact assessment of trouble tickets by fostering collaboration with NRENs
WLCG / LHCOPN: design of the LHCOPN operational model
IPv6
  – Improvement of gLite / first 2 dual-stack sites / smooth transition to IPv6
Trouble ticket exchange standardization
  – Submission of an RFC, "The Network Trouble Ticket Data Model", Internet Draft
Collaboration with NRENs, TNLC
  – EGEE 09 – TERENA NRENs & Grid joint meeting, Barcelona, Sept
Transition toward EGI-NGI
  – Network activity understaffed within the EGI-NGI project