
Interoperable Measurement Frameworks: Joint Monitoring of GEANT & Abilene
Eric L. Boyd, Internet2
Nicolas Simar, DANTE

The first goal was the original goal; over time, the other goals have emerged as equally important.

Overview
- Internet2 E2E piPEs
- Geant2 Joint Research Activity 1
- Joint Monitoring of GEANT & Abilene
- Measurement Domain and Framework Interoperability and Collaboration

Internet2 E2E piPEs Overview
- What is piPEs?
- Goals
- E2E piPEs Measurement Infrastructure
- Abilene Measurement Domain

Internet2 E2E piPEs Project: the End-to-End Performance Initiative Performance Environment System (E2E piPEs). Approach: a collaborative project combining the best work of many organizations, including DANTE/GEANT, EGEE, GGF NMWG, NLANR/DAST, UCL, Georgia Tech, etc.

Internet2 E2E piPEs Goals
- Enable end users and network operators to:
  - determine E2E performance capabilities
  - locate E2E problems
  - contact the right person to get an E2E problem resolved
- Enable remote initiation of partial-path performance tests
- Make partial-path performance data publicly available
- Interoperate with other performance measurement frameworks

Measurement Infrastructure Components
For any given “partial path” segment (the dotted black line in the slide), we might run regularly scheduled tests and/or on-demand tests. This means we require the ability to make a test request, have test results stored in a database, request those test results, and retrieve the test results. A minimal sketch of these four operations follows.
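The snippet below is a minimal, illustrative sketch of that request/store/query/retrieve cycle against a toy SQLite store. The class, method, and node names are invented for illustration and are not part of the piPEs software.

```python
# Toy measurement archive: request a test, store its result, query results.
import sqlite3
import time
import uuid


class MeasurementArchive:
    """Minimal store for test requests and results, backed by SQLite."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS results "
            "(test_id TEXT, src TEXT, dst TEXT, metric TEXT, value REAL, timestamp REAL)"
        )

    def request_test(self, src, dst, metric):
        """1. Make a test request; returns a handle used for later retrieval."""
        return {"test_id": str(uuid.uuid4()), "src": src, "dst": dst, "metric": metric}

    def store_result(self, test, value):
        """2. Store a completed test result in the database."""
        self.db.execute(
            "INSERT INTO results VALUES (?, ?, ?, ?, ?, ?)",
            (test["test_id"], test["src"], test["dst"], test["metric"], value, time.time()),
        )

    def query_results(self, src, dst, metric):
        """3./4. Request and retrieve stored results for a partial path."""
        cur = self.db.execute(
            "SELECT value, timestamp FROM results WHERE src=? AND dst=? AND metric=?",
            (src, dst, metric),
        )
        return cur.fetchall()


archive = MeasurementArchive()
t = archive.request_test("lax.abilene", "chin.abilene", "throughput_mbps")
archive.store_result(t, 941.0)  # e.g. a BWCTL/Iperf result
print(archive.query_results("lax.abilene", "chin.abilene", "throughput_mbps"))
```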

Sample piPEs Deployment
Deployment is an inside-out approach: start with regularly scheduled tests inside, then make sure they play well with regularly scheduled tests outside. Imagine the “cloud” is Abilene (but stress that it need not be); the regional node might be a GigaPoP, and the end nodes might be a user's laptop, a web server, etc. A cloud can also conceivably interconnect networks: nodes inside of Abilene, an Abilene node to a regional node, an Abilene node to a node in Europe. The power here is that the framework developed is extensible and scalable.

Regularly scheduled test data across the network backbone might go into per-administrative-domain databases; we have one for all of Abilene (11 nodes). A campus or set of campuses (e.g. CENIC, various Ohio universities, various North Carolina universities) might set up its own measurement domain consisting of regional nodes, nodes at campus edges, and possibly internal network edges (e.g. the edge of a CS department), and might store regional test data in its own database. Members of a distributed research community (e.g. astronomers of the VLBI community) might have a small set of distributed servers across the globe that they regularly test between; they might store data in an application-domain test database.

piPEs Deployment
1) Abilene Backbone Deployment (Complete)
2) Hawaii Campus Deployment (Complete)
3) Campus and European Deployment (In Progress, Q1 2004)
[Deployment map: Abilene backbone deployment in green; campus deployments (e.g. Hawaii, OSU, NC State, UCSD) in solid red; in-progress roll-out to campuses and European partners in dotted red.]
“Europe” in this case means a handful of nodes scattered across the GEANT backbone and some nodes at CERN. We are trying to work very closely with DANTE/GEANT and EGEE to ensure interoperability (at a minimum) if not a shared code base across the US and Europe. This is important because many scientists in the US routinely ship data across the Atlantic, and thus their E2E concerns are international in nature.

Measurement Software Components
Boxes in black are working and deployed (either released or in prototype form); boxes in red are under development. These are the software components that make up the piPEs measurement framework. Some are released (BWCTL, OWAMP), some are in prototype form (database, traceroute, PMP, PMC, web service, network monitoring), and some are under development (“Detective” applet, discovery module, analysis module, MDI, NDT).

The Measurement Domain Interface (MDI) is a web services interface that speaks the GGF NMWG Request/Report schema and handles authentication and authorization. It is being designed to be interoperable with other measurement frameworks (current and future). The Network Diagnostic Tool (NDT) is an existing tool that the original author is integrating into piPEs. It is designed to detect common problems in the first mile (the common case for most network “issues”).

Abilene Measurement Domain
- Part of the Abilene Observatory: http://abilene.internet2.edu/observatory
- Regularly scheduled OWAMP (one-way latency) and BWCTL (Iperf wrapper) tests
- Web pages displaying:
  - Latest results: http://abilene.internet2.edu/ami/bwctl_status.cgi/TCP/now
  - “Weathermap”: http://abilene.internet2.edu/ami/bwctl_status_map.cgi/TCP/now
  - Worst 10 performing links: http://abilene.internet2.edu/ami/bwctl_worst_case.cgi/TCP/now
- Data available via web service: http://abilene.internet2.edu/ami/webservices.html

The E2E team is building the piPEs measurement framework. Internet2 has deployed an instance of that framework, the Abilene Measurement Domain (AMD). The AMD is part of the Abilene Observatory. Currently, the AMD consists of regularly scheduled OWAMP and BWCTL tests, plus the ability of a user “on the edge” to test “to the middle” (a crude divide-and-conquer approach to diagnosing E2E problems). Network monitoring is live (a prototype that will eventually be released) and allows simple analysis of network monitoring data across the backbone. In addition, we have made that data available via a web service (conforming to the schemata of the GGF NMWG). Other tools, such as NLANR's Advisor and the HENP community's MonALISA tool, can now consume that data.
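As a minimal sketch of pulling the published data, the snippet below fetches the “latest results” URL listed above and prints the start of the payload. The URL comes from the slide and may no longer be live; the response format (HTML status page vs. NMWG-style XML from the web service) depends on the endpoint, so parsing is intentionally left out.

```python
# Fetch the latest BWCTL status report from the AMD and inspect the raw payload.
from urllib.request import urlopen

URL = "http://abilene.internet2.edu/ami/bwctl_status.cgi/TCP/now"

with urlopen(URL, timeout=30) as resp:
    payload = resp.read().decode("utf-8", errors="replace")

print(payload[:500])  # first part of the report, format depends on the deployment
```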

Data Collection / Correlation
Collection Today:
- Iperf (throughput)
- OWAMP (one-way latency, loss)
- SNMP data
- Anonymized Netflow data
- Per sender, per receiver, per node pair
- IPv4 and IPv6
Collection in the Future:
- NTP data
- Traceroute
- BGP data
- First-mile analysis
Correlation Today:
- “Worst 10” throughputs
- “Worst 10” latencies
Correlation in the Future:
- 99th percentile throughput over time
- Throughput/loss for all E2E paths using a specific link
- Commonalities among first-mile analyzers
- Sum of partial paths vs. whole path

Throughput is sustained megabits per second across a link; we can do both TCP and UDP versions (the former is much more useful), and it is a standard measure of TCP performance along a path. Latency is the time it takes a single packet to get from one node to another. Loss is the percentage of packets that do not make it to their destination (usually 0% in the case of Abilene). SNMP data is gathered on a router and queryable via a protocol (SNMP). Netflow data is passively collected data about traffic through a router; it is anonymized so as to prevent the ability to track a user's network usage. All this data can be sliced on a per-sender, per-receiver, or per-node-pair basis. We are also running IPv4 and IPv6 tests (different protocols for addressing remote computers). NTP data details the accuracy of clocks; this is very important for evaluating the quality of one-way latency data, as the clocks need to be synchronized if the data is to mean anything. Traceroute is a crude form of path detection. BGP data is the right way to determine if routing has changed. First-mile analysis looks for common problems in the first mile (computer to campus edge), where it is believed (but not proved) that most problems occur.

Today we can look for the “worst 10” throughputs and latencies. If they are all “good enough”, then it's “not the network”; if they are not “good enough”, then “we have a problem”: a crude Network Operations Center (NOC) alarm system. In the future, we expect to be able to look at 99th percentile throughput over time, sliced all the ways mentioned previously; this will give us insight into the long-term quality of links as traffic generally rises on the network. We also expect to be able to isolate all paths sharing a link in common and see if there is a network problem common to that link that only manifests over longer paths. We expect to be able to detect whether many users are getting the same first-mile problem, perhaps indicative of a pervasive configuration error within a campus, for example. Some problems only manifest in the “long haul”; we expect to be able to compare the “long haul” to a succession of “short hauls” and see if a problem manifests.
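The sketch below illustrates the “Worst 10” style of correlation: average recent throughput samples per (sender, receiver) pair and report the weakest pairs. The field names, toy data, and the 100 Mbps alarm threshold are assumptions for illustration, not values from the AMD.

```python
# Illustrative "Worst 10" throughput correlation over per-node-pair samples.
from collections import defaultdict

samples = [
    # (sender, receiver, throughput in Mbps) -- toy data
    ("lax", "chin", 940.0), ("chin", "nycm", 880.0),
    ("nycm", "wash", 120.0), ("wash", "atla", 95.0),
    ("atla", "hstn", 910.0), ("hstn", "lax", 450.0),
]

by_pair = defaultdict(list)
for src, dst, mbps in samples:
    by_pair[(src, dst)].append(mbps)

# Average the recent samples per node pair, then sort ascending (worst first).
averages = {pair: sum(v) / len(v) for pair, v in by_pair.items()}
worst_10 = sorted(averages.items(), key=lambda kv: kv[1])[:10]

for (src, dst), mbps in worst_10:
    flag = "ALARM" if mbps < 100 else "ok"  # crude NOC-style alarm threshold
    print(f"{src} -> {dst}: {mbps:6.1f} Mbps  [{flag}]")
```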

Data Analysis
Analysis Today:
- Throughput over time
- Latency over time
- Loss over time
- Worrisome tests? (Any bad apples in the “Worst Ten”?)
- “Not the Network” (if the “Worst Ten” is good enough)
Analysis in the Future:
- Latency vs. loss
- How good is the network?
- Do common first-mile problems exist?
- Does a link have problems that only manifest in the long haul?
- Is the network delivering the performance required by a funded project?

The first column shows things we can do today in the AMD. The second column shows things we could do (i.e., we have the data, but haven't built the scripts to do the analysis and are not sure exactly how to interpret the data yet).
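As a sketch of the planned “throughput over time” analyses, the snippet below buckets throughput samples by day and reports a 99th percentile per bucket. The data layout, bucket size, and percentile method (nearest rank) are assumptions chosen for illustration only.

```python
# Sketch: per-day 99th percentile of throughput samples.
from collections import defaultdict
from datetime import datetime, timezone
import math


def percentile(values, p):
    """Nearest-rank percentile (0 < p <= 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


samples = [  # (unix timestamp, throughput in Mbps) -- toy data
    (1082160000, 920.0), (1082163600, 880.0), (1082246400, 450.0),
    (1082250000, 910.0), (1082332800, 930.0), (1082336400, 95.0),
]

by_day = defaultdict(list)
for ts, mbps in samples:
    day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
    by_day[day].append(mbps)

for day in sorted(by_day):
    print(day, f"p99 = {percentile(by_day[day], 99):.1f} Mbps")
```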

Data Discovery / Interoperability
Discovery in the Future:
- Where are the measurement nodes corresponding to a specific node?
- Where are the test results for a specific partial path?
Interoperability in the Future:
- Can I have a test within or to another measurement framework?
- Can I have a measurement result from within or to another measurement framework?

Discovery of measurement nodes and corresponding measurement results is a big, untouched problem. Imagine a “supertraceroute” that could tell you all the nodes (routers) along a path as well as the associated measurement nodes for each router; that is effectively what we need. There are many ways to do it, and it is a need common to lots of groups, but (as far as we know) no one is yet funded to do such work.

Overview
- Internet2 E2E piPEs
- Geant2 Joint Research Activity 1
- Joint Monitoring of GEANT & Abilene
- Measurement Domain and Framework Interoperability and Collaboration

GN2 JRA1 - Performance Monitoring and Management
- A three-year project, starting in September 2004 (15 European NRENs and DANTE involved)
- Development of a performance monitoring infrastructure operating across multiple domains
- Users can access data from different domains in a uniform way and start on-demand tests

Goals
Make network management and performance information from different domains available to various authorised user communities:
- GÉANT, NREN, and regional network NOCs
- PERT (Performance Enhancement Response Team)
- High-data-volume transfer users (such as Grids)
- End users who would like to see or understand the E2E behaviour of R&D networks

Goals
- Multi-domain focus, interoperable with other frameworks
- Tailor data representation for a subset of users
- Modify existing tools and integrate them within the infrastructure

GN2 Framework Layers

Measurement Points and Metrics
The activity will focus on five areas:
- One-way delay, IPDV, and traceroute
- Available bandwidth (IP for sure, TCP/UDP less sure)
- Flow-based traffic measurement
- Passive monitoring
- Network equipment information
Quite IP-centric.
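To make the IPDV metric concrete, the snippet below computes delay variation from consecutive one-way delay samples, in the spirit of the packet-pair definition in RFC 3393. The delay values are toy data, not OWAMP output.

```python
# Illustrative IPDV (IP delay variation) from consecutive one-way delay samples.
delays_ms = [12.1, 12.4, 11.9, 15.0, 12.2, 12.3]  # one-way delay per packet

ipdv_ms = [b - a for a, b in zip(delays_ms, delays_ms[1:])]
print("IPDV samples (ms):", ipdv_ms)
print("max |IPDV| (ms):", max(abs(v) for v in ipdv_ms))
```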

Domain Controller
Ensures that the different monitoring agents deployed in the various domains can interoperate and be queried in the same way by User Interface instances, independently of their location.
High-level functionality: providing the user interface, AA, resource discovery (pathfinder), test negotiation, and interfacing with other domains/frameworks.
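The toy class below captures the shape of the responsibilities just listed (AA hook, discovery, test negotiation, inter-domain hand-off). It is a sketch of the described role, not the GN2 design, and all names and fields are invented for illustration.

```python
# Toy domain controller: AA, resource discovery, test negotiation, hand-off.
class DomainController:
    def __init__(self, domain, measurement_points, peers=None):
        self.domain = domain
        self.measurement_points = measurement_points  # {name: capabilities}
        self.peers = peers or {}                       # {domain name: controller}

    def authenticate(self, user, credential):
        """AA hook; a real deployment would defer to the project's AA system."""
        return credential is not None

    def discover(self, node):
        """Pathfinder: which local measurement points can test toward `node`?"""
        return [mp for mp, caps in self.measurement_points.items()
                if node in caps.get("reachable", [])]

    def negotiate_test(self, src, dst, metric):
        """Schedule locally if we own the source point; otherwise hand off to a peer."""
        if src in self.measurement_points:
            return {"domain": self.domain, "src": src, "dst": dst,
                    "metric": metric, "status": "scheduled"}
        for peer in self.peers.values():
            if src in peer.measurement_points:
                return peer.negotiate_test(src, dst, metric)
        return {"status": "rejected", "reason": "unknown source measurement point"}


# Toy wiring of two domains.
geant = DomainController("GEANT", {"mp-london": {"reachable": ["node-cern"]}})
abilene = DomainController("Abilene", {"mp-lax": {"reachable": ["node-chicago"]}},
                           peers={"GEANT": geant})
print(abilene.negotiate_test("mp-lax", "node-cern", "one-way-delay"))
```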

User Interface
A User Interface retrieves data from different domains and tailors the data representation to a specific group of users. It targets NOCs, the PERT, and generic end users.
- Topology-based view
- SLA verification

Starting Point
- Starts from the GN1 Performance Monitoring framework (data retrieval, storage, and export using a well-defined interface) and takes into account other experiences
- Uses the NRENs' experience in tool development
- Needs to take into account the variety of tools and metrics existing across the NRENs

Overview
- Internet2 E2E piPEs
- Geant2 Joint Research Activity 1
- Joint Monitoring of GEANT & Abilene
- Measurement Domain and Framework Interoperability and Collaboration

American/European Collaboration Goals
- Awareness of ongoing measurement framework efforts / sharing of ideas (good, but not sufficient)
- Interoperable measurement frameworks (minimum)
  - Common means of data extraction
  - Partial path analysis possible along transatlantic paths
- Open-source shared development (a possibility, in whole or in part)
- End-to-end partial path analysis for transatlantic research communities
  - VLBI: Onsala, Sweden ↔ Haystack, Mass.
  - HENP: CERN, Switzerland ↔ Caltech, Calif.

American/European Demonstration Goals
- Demonstrate the ability to do partial path analysis between “Caltech” (Los Angeles Abilene router) and CERN
- Demonstrate the ability to do partial path analysis involving nodes in the GEANT network
- Compare and contrast measurement of a “lightpath” versus a normal IP path
- Demonstrate interoperability of piPEs and analysis tools such as Advisor and MonALISA

Demonstration Details
- Path 1: Default route between LA and CERN is across Abilene to Chicago, then across the DataTAG circuit to CERN
- Path 2: Addresses announced so that the route between LA and CERN traverses GEANT via the London node
- Path 3: “Lightpath” (discussed earlier by Rick Summerhill)
Each measurement “node” consists of a BWCTL box and an OWAMP box “next to” the router.
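A minimal sketch of driving such a measurement node from a script is shown below: a BWCTL throughput test and an OWAMP latency test toward the peer's box. The hostnames are placeholders, the snippet assumes the bwctl and owping binaries are installed, and the flags shown (-s/-c for sender and receiver, -t for duration) should be checked against the locally installed versions.

```python
# Sketch: run one BWCTL and one OWAMP test from a measurement box.
import subprocess

SENDER = "meas-lax.example.net"     # box "next to" the LA router (placeholder)
RECEIVER = "meas-cern.example.net"  # box next to the CERN router (placeholder)

bwctl_cmd = ["bwctl", "-s", SENDER, "-c", RECEIVER, "-t", "30"]  # 30 s throughput test
owamp_cmd = ["owping", RECEIVER]                                 # one-way latency test

for cmd in (bwctl_cmd, owamp_cmd):
    print("running:", " ".join(cmd))
    proc = subprocess.run(cmd, capture_output=True, text=True)
    print(proc.stdout or proc.stderr)
```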

All Roads Lead to Geneva

Results
- BWCTL: http://abilene.internet2.edu/ami/bwctl_status_eu.cgi/BW/now
- OWAMP: http://abilene.internet2.edu/ami/owamp_status_eu.cgi/now
- MonALISA: http://vinci.cacr.caltech.edu:8080
- NLANR Advisor

Insights (1)
Even with shared source and a single team of developer-installers, inter-administrative-domain coordination is difficult.
- Struggled with the basics of multiple paths: IP addresses, host configuration, software (support for source addresses, etc.)
- Struggled with cross-domain administrative coordination issues: AA (accounts), routes, port filters, MTUs, etc.
- Struggled with performance tuning of measurement nodes: host tuning, asymmetric routing, MTUs
We had log-in access, yet we still struggled with IP addresses, accounts, port filters, host tuning, host configuration (using the proper paths), software, and MTUs.

Insights (2) Connectivity takes a large amount of coordination and effort; performance takes even more of the same. Current measurement approaches have limited visibility into “lightpaths.” Having hosts participate in the measurement is one possible solution.

Insights (3)
Consider interaction with security; the lack of end-to-end transparency is problematic.
- Security filters are set up based on expected traffic patterns
- Measurement nodes create new traffic
- Lightpaths bypass expected ingress points

Next Steps for Internet2 E2E, GN2 JRA1, and EGEE
To be decided.
- Transatlantic Performance Monitoring Workshop, CERN, May 17, 2004
  - Sharing of ideas
  - Determining areas of overlap
  - Determining degree of collaboration
- Construct a trial OWAMP and BWCTL measurement domain at GEANT nodes

Overview
- Internet2 E2E piPEs
- Geant2 Joint Research Activity 1
- Joint Monitoring of GEANT & Abilene
- Measurement Domain and Framework Interoperability and Collaboration

Measurement Infrastructure Federation
Why a federation?
- Multiple measurement frameworks are in existence and under development (piPEs, NLANR Advisor, NLANR AMP, etc.)
- No static “best practice” measurement framework is likely to emerge, given academics being academics
- Future measurement frameworks can build on the shoulders of current efforts, not the feet
- Performance Measurement Architecture Workshop (NSF Grant #ANI-0314723)

Measurement Infrastructure Federation Interfaces

Measurement Infrastructure Federation Requirements
- Agreement on characteristic names
- Access and authentication
- Discovery (measurement frameworks, domains, nodes, databases)
- Test/data request schema
- Result report schema
- Inter-framework tests
- Resource allocation broker for tools
- Concatenation of homogeneous characteristics
- Results gathered by heterogeneous tools

GGF Network Measurement Working Group
- Hierarchy of network performance characteristics
- Request schema requirements and sample implementation
- Report schema requirements and sample implementation
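The snippet below builds a test-request document “in the spirit of” the NMWG request schema, to make the idea of a schema-driven request concrete. The element and attribute names (and the characteristic name) are simplified placeholders, not the actual NMWG vocabulary; real requests would use the names defined by the working group's characteristics hierarchy.

```python
# Illustrative, NMWG-flavored test request built with ElementTree.
import xml.etree.ElementTree as ET


def build_request(src, dst, characteristic, duration_s):
    req = ET.Element("measurementRequest")
    subject = ET.SubElement(req, "subject")
    ET.SubElement(subject, "source").text = src
    ET.SubElement(subject, "destination").text = dst
    ET.SubElement(req, "characteristic").text = characteristic  # placeholder name
    params = ET.SubElement(req, "parameters")
    ET.SubElement(params, "duration", units="seconds").text = str(duration_s)
    return ET.tostring(req, encoding="unicode")


print(build_request("meas-lax.example.net", "meas-cern.example.net",
                    "achievable.bandwidth", 30))
```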

Establishing a Performance Measurement Mesh
Issues include:
- Scheduling in the presence of scarce resources
- Making the tool bidirectional
- Adding security
- Ensuring correct source/target pairs
- Data collection / mining / analysis / display
Example: BWCTL for Iperf plus a prototype PMD.
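To illustrate the first issue, the toy scheduler below grants non-overlapping time slots so that a host runs only one bandwidth test at a time; this is the idea behind BWCTL's reservation step, though the real daemon's algorithm differs and this sketch is not its implementation.

```python
# Toy slot scheduler: push requests past existing reservations until they fit.
class SlotScheduler:
    def __init__(self):
        self.reservations = []  # list of (start, end) tuples

    def reserve(self, start, duration):
        """Return the earliest (start, end) at or after `start` that fits."""
        end = start + duration
        for r_start, r_end in sorted(self.reservations):
            if start < r_end and r_start < end:  # overlap: push the slot back
                start, end = r_end, r_end + duration
        self.reservations.append((start, end))
        return start, end


sched = SlotScheduler()
print(sched.reserve(0, 30))   # (0, 30)
print(sched.reserve(10, 30))  # pushed to (30, 60)
print(sched.reserve(25, 30))  # pushed to (60, 90)
```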

Open Research Issues
- Access and authentication
- Discovery of measurement nodes (“super-traceroute”)
- Discovery of measurement databases
- Inter-framework testing
- Compilation of results on partial paths
- Normalization of identical characteristics gathered by heterogeneous tools

Conclusions
- We can do partial path analysis, although making sense of the results is still a big issue.
- We can speak the same measurement language, although it is still evolving.
- We are working together in growing numbers, but we need critical mass (to become a de facto standard).
- We need to be able to find each other.
- We need to be able to verify each other's identity.
- We are trying to create tools to do partial path analysis.
- We are trying to expose diagnostic output to Grid middleware.

Feedback
- Are we on the right track? (As conceptualized, would our individual and joint goals meet the needs of the DataTAG community?)
- What's missing?
- What is of particular importance?