Update on Network Performance Monitoring
Jeremy Nowell, EPCC
ARM-9, Karlsruhe, 5-6 February 2007
EGEE-II INFSO-RI
Enabling Grids for E-sciencE
EGEE and gLite are registered trademarks

Enabling Grids for E-sciencE EGEE-II INFSO-RI NPM Update - Jeremy Nowell, ARM-9 FZK

Overview
Reminder of NPM
Current Status
  – e2emonit deployment
  – Diagnostic Tool demo
  – Requirements gathering
Future Plans

NPM Reminder
Aim to provide easy, standardised access to NPM data
  – Regardless of how and where the data is collected
Do not mandate which networking tools are used
  – Although we provide examples to help kick-start the process
Data is provided for
  – Grid Operations, via a web browser
      To help diagnose application and site problems
  – Grid middleware
      To help optimise application performance, e.g. file transfers

NPM Reminder
Network performance data can be obtained in various ways
  – Different measurement types
      End-to-end: appropriate to what a user or application experiences, e.g. TCP achievable bandwidth
      Backbone: lower-level measurements, used to pinpoint the source of problems
  – Different measurement tools
  – Different data formats
EGEE NPM (as developed in EGEE-I JRA4) provided the world’s first single-point, single-interface access to heterogeneous data from heterogeneous frameworks
Note: we are not building measurement tools, but standardising access to NPM data across multiple domains and providing mechanisms to use that data

What’s available - Software
Clients
  – The Diagnostic Tool (DT)
      For use by people
  – The Publisher
      For use by middleware
Middleware
  – Mediator/Discoverer
Monitoring Frameworks
  – e2emonit
      Formerly EDG::WP7
      Provided and maintained by NPM team
  – PerfSONAR
      GÉANT, Abilene and ESNet data, more networks soon
  – LHC-OPN
      Soon via PerfSONAR?

What’s available - Data
Data depends on which monitoring tools are deployed!
  – We will allow access to any relevant data, provided it is available via a GGF NM-WG compliant interface
e2emonit
  – ping(er)
      Round trip time, packet loss
  – iperf
      TCP achievable bandwidth
  – udpmon
      UDP achievable bandwidth, one-way delay, UDP packet loss
PerfSONAR
  – Developed by GÉANT, Internet2 and ESNet
  – Link capacity and utilisation

Current Status
PPS deployment of e2emonit has started!
  – Simulates Tier0-Tier1 sites (not a true correspondence between sites)
  – CERN site has been running for a few weeks; others started last week
  – Hope to get useful feedback on installation, usage, robustness etc. (although it has been running on a smaller testbed for many months)
      Particularly interested in R-GMA performance
  – Can still get interesting network data from this deployment
PerfSONAR data has been available for many months
  – Maintaining a good relationship with GÉANT/PerfSONAR developers to keep abreast of developments

DT Usage (1)
Step 1: Access the NPM Diagnostic Tool.
  – The Diagnostic Tool is accessed using a standard web browser; users are individually authorised to use it. In the future, we plan to use VOMS for authorisation. Please mail us for access!
  – The intended user is a NOC/GOC/ROC operator

DT Usage (2)
Step 2: Select a Time.
  – The end-user does not have a specific time, but knows the problem occurred within the past two days.
  – The user enters the appropriate time range, specifying an End date/time of :30:00 (the current time), and a period of 2 days.
  – The user presses the Set button to confirm and the alternate time range representations update.

DT Usage (3)
Step 3: Select a Path.
  – The end-user experienced the problem between CERN and Imperial College.
  – The user selects e2emonit sites at CERN and Imperial, adds the path and then selects “Find Data For This Query”

DT Usage (4)
Step 4: Select a Metric.
  – The end-user experienced throughput problems.
  – Although there are several possibly relevant metrics to choose from (and only those measured are available to select from), the user decides to look at the Achievable Bandwidth on the path.
  – Achievable Bandwidth is selected from the Metrics box and the Set button pressed to confirm.

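The rule that only measured metrics can be selected could be sketched as below. The tool-to-metric mapping comes from the "What’s available - Data" slide; the dictionary layout and the function name `selectable_metrics` are assumptions made for illustration and are not the Diagnostic Tool's actual code.

```python
# Tool-to-metric mapping as listed for e2emonit on the data slide.
# The structure and names here are illustrative assumptions.
E2EMONIT_METRICS = {
    "ping(er)": ["round trip time", "packet loss"],
    "iperf": ["TCP achievable bandwidth"],
    "udpmon": ["UDP achievable bandwidth", "one-way delay", "UDP packet loss"],
}


def selectable_metrics(tools_on_path):
    """Return the sorted set of metrics measured by the tools on a path."""
    metrics = set()
    for tool in tools_on_path:
        # Unknown tools contribute nothing rather than raising an error.
        metrics.update(E2EMONIT_METRICS.get(tool, []))
    return sorted(metrics)


# A path where only ping(er) and iperf are running (hypothetical example).
print(selectable_metrics(["ping(er)", "iperf"]))
```

A path with no deployed tools yields an empty list, which matches the idea that nothing is offered for selection when nothing is measured.
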
DT Usage (5)
Step 5: Select a Statistic.
  – Several types of statistical data are available, such as Minimum, Maximum, Mean.
  – A particular interval can be applied to each, to provide, for example, an hourly mean over the past two days.
  – The user just wants a general overview of measurements and elects to retrieve raw data (Statistic check-box not checked).

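As a rough illustration of applying a statistic over an interval, the sketch below groups raw (timestamp, value) samples into fixed-width buckets and applies Minimum, Maximum or Mean to each. The function name `interval_stat`, the epoch-aligned bucketing and the sample values are all assumptions; the values are made up and are not measurements.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from statistics import mean


def interval_stat(samples, interval, stat=mean):
    """Apply stat to (timestamp, value) samples grouped into fixed intervals.

    Buckets are aligned to the Unix epoch, which is an assumption of this
    sketch. Returns a dict mapping each bucket start time to the statistic.
    """
    width = interval.total_seconds()
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts.timestamp() // width * width
        buckets[bucket_start].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): stat(values)
        for start, values in sorted(buckets.items())
    }


# Made-up achievable-bandwidth samples in Mbit/s (illustrative only).
samples = [
    (datetime(2007, 2, 1, 10, 5, tzinfo=timezone.utc), 90.0),
    (datetime(2007, 2, 1, 10, 35, tzinfo=timezone.utc), 110.0),
    (datetime(2007, 2, 1, 11, 5, tzinfo=timezone.utc), 80.0),
]
hourly_mean = interval_stat(samples, timedelta(hours=1))
hourly_max = interval_stat(samples, timedelta(hours=1), stat=max)
```

Passing `min` or `max` as `stat` gives the other statistics named on the slide; leaving the data ungrouped corresponds to the raw-data choice.
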
DT Usage (6)
Step 6: Select a View.
  – Currently Data Table and Time Plot views are available.
  – The user wants an overview of how the Achievable Bandwidth has changed over time, so selects the Time Plot.
  – The Query entry is complete, and the user selects Submit Query.

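The choices made in steps 2 to 6 amount to one query record, which might be sketched as follows. The class `DTQuery`, its field names and the validation are hypothetical and do not describe the Diagnostic Tool's real interface; the end date and hour used in the example are invented, since the slide's value is only partly legible.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# The two views named on the slide.
VIEWS = ("Data Table", "Time Plot")


@dataclass(frozen=True)
class DTQuery:
    """Hypothetical container for the selections made in steps 2 to 6."""

    end: datetime                    # step 2: end of the time range
    period: timedelta                # step 2: length of the time range
    source: str                      # step 3: one end of the path
    destination: str                 # step 3: other end of the path
    metric: str                      # step 4
    statistic: Optional[str] = None  # step 5: None means raw data
    view: str = "Time Plot"          # step 6

    def __post_init__(self):
        if self.view not in VIEWS:
            raise ValueError(f"unknown view: {self.view!r}")
        if self.period <= timedelta(0):
            raise ValueError("period must be positive")

    @property
    def start(self) -> datetime:
        """Start of the time range, derived from end and period."""
        return self.end - self.period


query = DTQuery(
    end=datetime(2007, 2, 5, 12, 30),  # invented date and hour
    period=timedelta(days=2),
    source="CERN",
    destination="Imperial College",
    metric="Achievable Bandwidth",
)
print(query.start, query.view)
```

Keeping the statistic optional mirrors the unchecked Statistic check-box: the absence of a value stands for a raw-data request.
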
DT Usage (7)
Step 7: Examine results.
  – The results are plotted, with Time on the x-axis and Achievable Bandwidth on the y-axis.
  – The parameters used to gather measurements are shown; here they show that the iperf tool was used to gather the achievable bandwidth information.
  – These parameters can be useful in interpreting the results.

Current Status
Requirements document MSA 1.6, “User requirements for Network Performance Monitoring Diagnostic Tool captured”, produced October/November 2006
  – SA2 requirements collected
      For both problem diagnosis and SLA monitoring
  – Direct SA1 requirements somewhat lacking
      Relied on previous documents
      No “real” data available at the time for people to see via DT
Very poor response to SA1 NPM / SA2 questionnaire about network monitoring tools sent to ROC-managers list just before EGEE ’06

Plans
Feedback from PPS deployment of e2emonit
  – Real data to look at in Diagnostic Tool
Wider/production deployment of e2emonit?
  – What data is useful?
  – Can or should we try to get on LHC-OPN connected boxes?
      They plan to use PerfSONAR
      “Real” end-to-end data may still be interesting to users…
  – Scalability issues
Do federations already have network data that we can publish via the DT?
  – Or would any want to use e2emonit?
      Balticgrid – waiting for PCPD availability

Plans
SA2 is pushing for (and needs) wide coverage of sites
  – e2emonit probably unable to do what they want
      Scalability
  – PerfSONAR is widening its reach
  – Develop NM-WG interface to site tools (e.g. netflow)?
      Unlikely to provide useful end-to-end data for EGEE-specific paths
Need information
  – What is useful and interesting for SA1?
  – What can sites provide by way of networking data already?
  – What would they be willing to deploy?
  – How can we satisfy SA2’s requirements?
Need people to use DT and give feedback

Summary
e2emonit is (at last!) starting to be deployed, on a pre-production basis.
Getting NPM data on an ongoing, production basis that is both relevant and useful will be hard.
We need feedback on what people want.