Internet2 E2E piPEs Project

Internet2 E2E piPEs Project
Eric L. Boyd

Internet2 E2E piPEs Goals
Enable end users & network operators to:
- determine E2E performance capabilities
- locate E2E problems
- contact the right person to get an E2E problem resolved
Enable remote initiation of partial path performance tests
Make partial path performance data publicly available
Interoperate with other performance measurement frameworks

Sample piPEs Deployment
Deployment follows an inside-out approach: start with regularly scheduled tests on the inside, make sure they play well with regularly scheduled tests on the outside, and hope that projects working on the end nodes will meet us in the middle.

Project Phases
Phase 1: Tool Beacons
- BWCTL (Complete): http://e2epi.internet2.edu/bwctl
- OWAMP (Complete): http://e2epi.internet2.edu/owamp
- NDT (Complete): http://e2epi.internet2.edu/ndt
Phase 2: Measurement Domain Support
- General Measurement Infrastructure (Prototype)
- Abilene Measurement Infrastructure Deployment (Complete): http://abilene.internet2.edu/observatory
Phase 3: Federation Support
- AA (Prototype: optional AES key, policy file, limits file)
- Discovery of measurement nodes and databases (Prototype: nearest NDT server, web page)
- Test Request/Response Schema Support (Prototype: GGF NMWG schema)

piPEs Deployment

NDT (Rich Carlson)
- Network Diagnostic Tester, developed at Argonne National Lab
- Ongoing integration into the piPEs framework
- Redirects from a well-known host to the "nearest" measurement node
- Detects common performance problems in the "first mile" (edge to campus DMZ)
- In deployment on Abilene (a scripted example follows below): http://ndt-seattle.abilene.ucaid.edu:7123
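For a quick scripted check, here is a minimal sketch of driving the NDT command-line client against the Abilene server named above. web100clt is the client that ships with NDT, but treat the exact flags and output handling as assumptions and check the documentation for your NDT release.

```python
import subprocess

# NDT server from the slide above; the web interface lives on port
# 7123, while the command-line client speaks NDT's own protocol.
SERVER = "ndt-seattle.abilene.ucaid.edu"

# "-n" names the NDT server to test against (assumed flag; see the
# web100clt man page for your NDT release).
result = subprocess.run(
    ["web100clt", "-n", SERVER],
    capture_output=True, text=True, timeout=180,
)

# web100clt prints its diagnosis (duplex mismatch, untuned TCP stack,
# first-mile congestion, and so on) to stdout; pass it through.
print(result.stdout)
```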

NDT Milestones
New features added:
- Configuration file support
- Scheduling/queuing support
- Simple server discovery protocol
- Federation mode support
- Command line client support
Open source shared development: http://sourceforge.net/projects/ndt/

NDT Future Directions
- Focus on improving problem detection algorithms (duplex mismatch, link detection)
- Complete deployment in Abilene POPs
- Expand deployment into university campus/GigaPoP networks

How Can You Participate?
- Set up BWCTL, OWAMP, and NDT beacons
- Set up a measurement domain:
  - Place tool beacons "intelligently": determine locations, determine policy, determine limits
  - "Register" beacons
  - Install piPEs software
  - Run regularly scheduled tests (see the sketch below)
  - Store performance data
  - Make performance data available via web service
  - Make visualization CGIs available
- Solve problems / alert us to case studies
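As flavor for the "run regularly scheduled tests" and "store performance data" steps, here is a minimal hand-rolled sketch: it loops over a small mesh of beacons, asks bwctl to arrange a test between each pair, and appends the raw output to a log. The host names, interval, and log format are placeholders; a real measurement domain would use the piPEs scheduling and storage software instead.

```python
import itertools
import subprocess
import time

# Hypothetical BWCTL beacons; substitute your own measurement hosts.
BEACONS = ["nms1.example.edu", "nms2.example.edu", "nms3.example.edu"]

def run_bwctl(sender: str, receiver: str) -> str:
    """Arrange one bwctl throughput test and return its raw output."""
    # -s names the sending host and -c the receiving host; bwctl
    # contacts bwctld on both ends and negotiates a test slot, so each
    # beacon's scheduling and limits policy is respected.
    proc = subprocess.run(
        ["bwctl", "-s", sender, "-c", receiver, "-t", "10"],
        capture_output=True, text=True, timeout=600,
    )
    return proc.stdout

while True:
    for sender, receiver in itertools.permutations(BEACONS, 2):
        with open("bwctl_results.log", "a") as log:
            log.write(f"=== {time.ctime()} {sender} -> {receiver} ===\n")
            log.write(run_bwctl(sender, receiver) + "\n")
    time.sleep(3600)  # crude hourly schedule
```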

Example piPEs Use Cases
- Edge-to-Middle (on-demand): automatic 2-ended test set-up
- Middle-to-Middle (regularly scheduled):
  - Raw data feeds for 3rd-party analysis tools: http://vinci.cacr.caltech.edu:8080/
  - Quality control of network infrastructure
- Edge-to-Edge (regularly scheduled): quality control of application communities
- Edge-to-Campus-DMZ (on-demand), coupled with regularly scheduled Middle-to-Middle: the end user determines who to contact about a performance problem, armed with proof

Test from the Edge to the Middle
- Divide and conquer: partial path analysis
- Install OWAMP and/or BWCTL
- Where are the nodes? http://e2epi.internet2.edu/pipes/pmp/pmp-dir.html
- Begin testing (see the sketch below):
  - http://e2epi.internet2.edu/pipes/ami/bwctl/ (key required)
  - http://e2epi.internet2.edu/pipes/ami/owamp/ (no key required)
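A minimal sketch of one on-demand edge-to-middle run, assuming the owping and bwctl clients are installed on your edge host. The node name is a placeholder standing in for whichever Abilene measurement node the directory page puts nearest you, and authentication (the key required on the BWCTL side) is omitted.

```python
import subprocess

# Placeholder: substitute the measurement node nearest your path,
# from http://e2epi.internet2.edu/pipes/pmp/pmp-dir.html
NODE = "nms1-example.abilene.ucaid.edu"

# One-way latency, edge to middle and back: owping streams test
# packets to the OWAMP daemon on the node and reports delay and loss
# separately for each direction.
subprocess.run(["owping", NODE], check=True)

# Throughput, edge to middle: bwctl negotiates a slot with the
# node's bwctld and runs an Iperf TCP test (key handling omitted).
subprocess.run(["bwctl", "-c", NODE, "-t", "10"], check=True)
```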

Example piPEs Use Cases
- Edge-to-Middle (on-demand): automatic 2-ended test set-up
- Middle-to-Middle (regularly scheduled):
  - Raw data feeds for 3rd-party analysis tools: http://vinci.cacr.caltech.edu:8080/
  - Quality control of network infrastructure
- Edge-to-Edge (regularly scheduled): quality control of application communities
- Edge-to-Campus-DMZ (on-demand), coupled with regularly scheduled Middle-to-Middle: the end user determines who to contact about a performance problem, armed with proof

Abilene Measurement Domain
- Part of the Abilene Observatory: http://abilene.internet2.edu/observatory
- Regularly scheduled OWAMP (one-way latency) and BWCTL/Iperf (throughput, loss, jitter) tests
- Web pages displaying:
  - Latest results: http://abilene.internet2.edu/ami/bwctl_status.cgi/TCP/now
  - "Weathermap": http://abilene.internet2.edu/ami/bwctl_status_map.cgi/TCP/now
  - Worst 10 performing links: http://abilene.internet2.edu/ami/bwctl_worst_case.cgi/TCP/now
- Data available via web service (a consumption sketch follows below): http://abilene.internet2.edu/ami/webservices.html

The E2E team is building the piPEs measurement framework, and Internet2 has deployed an instance of that framework, the Abilene Measurement Domain (AMD), as part of the Abilene Observatory. Currently the AMD consists of regularly scheduled OWAMP and BWCTL tests, plus the ability of a user "on the edge" to test "to the middle" (a crude divide-and-conquer approach to diagnosing E2E problems). A live network-monitoring prototype, which will eventually be released, allows simple analysis of network monitoring data across the backbone. In addition, we have made that data available via a web service conforming to the schemata of the GGF NMWG, so other tools, such as NLANR's Advisor and the HENP community's MonALISA, can now consume that data.
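As a rough sketch of consuming that web service from a script: the page above documents the service, and the snippet below fetches one result document and prints per-path throughput. The data URL and the NMWG-style element and attribute names are assumptions for illustration; consult the GGF NMWG schema documents for the real structure.

```python
import urllib.request
import xml.etree.ElementTree as ET

# http://abilene.internet2.edu/ami/webservices.html documents the
# service; the data URL below is a made-up placeholder for one of
# the result documents it describes.
SERVICE_URL = "http://abilene.internet2.edu/ami/some-result-document.xml"

with urllib.request.urlopen(SERVICE_URL, timeout=30) as resp:
    root = ET.parse(resp).getroot()

# Hypothetical NMWG-style layout: <measurement> elements carrying a
# source, a destination, and an observed throughput in bits/second.
for m in root.iter("measurement"):
    src, dst = m.get("src"), m.get("dst")
    bps = float(m.get("throughput", "0"))
    print(f"{src} -> {dst}: {bps / 1e6:.1f} Mb/s")
```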

Quality Control of Abilene Measurement Infrastructure (1)
Problem-solving approach:
- Ongoing measurements start detecting a problem
- Ad-hoc measurements are then used for problem diagnosis
Ongoing measurements:
- Expect Gbps flows on Abilene
- Stock TCP stack (albeit tuned), very sensitive to loss: a "canary in a coal mine"
- Web100 just deployed for additional reporting
- Keep a skeptical eye: an apparent problem could reflect interface contention

Quality Control of Abilene Measurement Infrastructure (2)
Regularly scheduled tests:
- Track TCP and UDP flows (BWCTL/Iperf)
- Track one-way delays (OWAMP)
- IPv4 and IPv6
Observe (a small percentile sketch follows below):
- Worst 10 TCP flows
- First-percentile TCP flow
- Fiftieth-percentile TCP flow
- Which percentile breaks the 900 Mbps threshold
General conclusions:
- On Abilene, IPv4 and IPv6 are statistically indistinguishable
- Consistently low values to one host, or across one path, indicate a problem
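To make the percentile bookkeeping concrete, here is a small sketch assuming a day's worth of per-flow TCP throughput samples in Mb/s; the sample data and the alerting choice are illustrative, not the production analysis code.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of throughput samples."""
    ordered = sorted(samples)
    rank = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[rank]

# Illustrative day of BWCTL/Iperf TCP results, in Mb/s.
flows = [978, 981, 975, 522, 968, 979, 912, 983, 974, 980]

worst10 = sorted(flows)[:10]   # the day's ten slowest flows
p01 = percentile(flows, 1)
p50 = percentile(flows, 50)

print("worst 10:", worst10)
print(f"1st percentile: {p01} Mb/s, 50th percentile: {p50} Mb/s")

# The slides' rule of thumb: a healthy backbone keeps even the 1st
# percentile above 900 Mb/s; a dip below it is worth investigating.
if p01 < 900:
    print("1st percentile below 900 Mb/s -- investigate")
```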

A (Good) Day in the Life of Abilene

First two weeks in March
- 50th percentile right at 980 Mb/s
- 1st percentile about 900 Mb/s
- Take this as a baseline.

Beware the Ides of March
- 1st percentile down to 522 Mb/s
- Circuit problems along the west coast
- N.B.: the 50th percentile is very robust.

Recovery, sort of; life through 29 April
- 1st percentile back up to the mid-800s, but lower and shakier
- N.B.: the 50th percentile is still very robust.

Ah, sudden improvement through 5 May
- 1st percentile back up above 900 Mb/s and more stable. But why??

Then, while Matt Z is tearing up the tracks
- 1st percentile back down to the 500s
- Diagnosis: something is killing Seattle. Oh, and Sunnyvale is off the air.

Matt fixes Sunnyvale, and things get (slightly) worse: both Seattle and Sunnyvale are bad.
- 1st percentile right at 500 Mb/s
- Diagnosis: a web100 interaction.

Matt fixes the web100 interaction.
- 1st percentile cruising through 700 Mb/s. Life is good.

Friday the (almost) 13th: a JUNOS upgrade induces packet loss for about four hours along many links.
- 1st percentile falls to 63 Mb/s
- Long-distance paths chiefly impacted.

A "Known" Problem
- Mid-May: the routers all got a new software load to enable a new feature
- Everything seemed to come up, but on some links utilization did not rebound
- The Worst-10 display reflected very low performance across those links
- Cause: a QoS parameter configuration format change

Nice weekend.
- 1st percentile rises to 968 Mb/s. But why??

We Found It First
- Streams over the SNVA-LOSA link all showed problems (a localization sketch follows below)
- The NOC responded: found errors on the SNVA-LOSA link (the NOC is now tracking errors more closely…)
- Live (URL subject to change): http://abilene.internet2.edu/ami/bwctl_percentile.cgi/TCPV4/1/50/14118254811367342080_14169839516075950080
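The pattern here, every stream crossing one segment degrading at once, lends itself to a simple localization heuristic. Below is a small sketch assuming you already know each test path's segment list and have a recent throughput sample per path; the path data, segment names, and threshold are illustrative.

```python
from collections import defaultdict

# Illustrative mesh results: each test path lists the backbone
# segments it crosses and its latest TCP throughput in Mb/s.
paths = {
    ("LOSA", "STTL"): (["LOSA-SNVA", "SNVA-STTL"], 480),
    ("LOSA", "DNVR"): (["LOSA-SNVA", "SNVA-DNVR"], 510),
    ("STTL", "DNVR"): (["SNVA-STTL", "SNVA-DNVR"], 975),
}

THRESHOLD = 900  # Mb/s; the slides' rule of thumb for a healthy flow

# Count, per segment, how many crossing flows are slow vs. healthy.
slow = defaultdict(int)
total = defaultdict(int)
for segments, mbps in paths.values():
    for seg in segments:
        total[seg] += 1
        if mbps < THRESHOLD:
            slow[seg] += 1

# A segment where *every* crossing flow is slow is the prime suspect.
for seg in total:
    if slow[seg] == total[seg]:
        print(f"suspect segment: {seg} ({slow[seg]}/{total[seg]} flows slow)")
```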

Example piPEs Use Cases
- Edge-to-Middle (on-demand): automatic 2-ended test set-up
- Middle-to-Middle (regularly scheduled):
  - Raw data feeds for 3rd-party analysis tools: http://vinci.cacr.caltech.edu:8080/
  - Quality control of network infrastructure
- Edge-to-Edge (regularly scheduled): quality control of application communities
  - ESnet / ITECs (3+3) [see Joe Metzger's talk to follow]
  - eVLBI
- Edge-to-Campus-DMZ (on-demand), coupled with regularly scheduled Middle-to-Middle: the end user determines who to contact about a performance problem, armed with proof

Example Application Community: VLBI (1)
- Very Long Baseline Interferometry (VLBI) is a high-resolution imaging technique used in radio astronomy.
- VLBI uses multiple radio telescopes simultaneously as an array to record data, which is then stored on magnetic tape and shipped to a central processing site for analysis.
- Goal: electronic transmission of VLBI data over high-bandwidth networks ("e-VLBI").

Example Application Community: VLBI (2)
- Path: Haystack <-> Onsala, via Abilene, Eurolink, GEANT, NorduNet, and SUNET
- Users: David Lapsley, Alan Whitney
- Constraints: lack of administrative access (needed for Iperf); heavily scheduled telescopes with limited windows for testing
- Problem: insufficient performance
- Partial path analysis with BWCTL/Iperf isolated the packet loss to local congestion in the Haystack area; the bottleneck link was upgraded.

Example Application Community: VLBI (3)
- Result: first demonstration of real-time, simultaneous correlation of data from two antennas (32 Mbps; work continues)
- Future: optimize time-of-day for non-real-time data transfers; deploy BWCTL at three more sites beyond Haystack, Onsala, and Kashima

TSEV8 Experiment
- Intensive (UT1) experiment
- Data: 18 scans, 13.9 GB of data
- Antennas: Westford, MA and Kashima, Japan
- Network: Haystack, MA to Kashima, Japan; initially 100 Mbps commodity Internet at each end, with the Kashima link upgraded to 1 Gbps just prior to the experiment

TSEV8 e-VLBI Network

Network Issues
- In the week leading up to the experiment, the network showed extremely poor throughput: ~1 Mbps!
- Network analysis/troubleshooting was required.
- Traditional approach: pair-wise Iperf testing between hosts along the transfer path, with step-by-step tracing of link utilization via the Internet2 and TransPAC/APAN network monitoring websites. Time consuming, error prone, and not conclusive.
- New approach: automated Iperf testing using Internet2's BWCTL tool (which allows partial path analysis), with link utilization statistics integrated into one single website. No maintenance required once set up; for the first time, an overall view of the network and its bandwidth on a segment-by-segment basis. (See the sketch below.)
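A minimal sketch of that kind of automated, segment-by-segment testing, assuming BWCTL daemons at intermediate measurement points along the Haystack-to-Kashima path; the host names are placeholders and the output parsing is deliberately naive.

```python
import re
import subprocess

# Placeholder measurement points ordered along the transfer path.
HOPS = [
    "ms-haystack.example.edu",
    "ms-chicago.example.net",
    "ms-tokyo.example.jp",
    "ms-kashima.example.jp",
]

# Test every adjacent pair: a single bad segment shows up as one
# low-throughput hop pair, which is the point of partial path analysis.
for sender, receiver in zip(HOPS, HOPS[1:]):
    proc = subprocess.run(
        ["bwctl", "-s", sender, "-c", receiver, "-t", "10"],
        capture_output=True, text=True, timeout=600,
    )
    # Naive scrape of the Iperf summary line that bwctl relays.
    match = re.search(r"([\d.]+)\s*Mbits/sec", proc.stdout)
    rate = match.group(1) + " Mb/s" if match else "no result"
    print(f"{sender} -> {receiver}: {rate}")
```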

E-VLBI Network Monitoring http://web.haystack.mit.edu/staff/dlapsley/tsev7.html

E-VLBI Network Monitoring http://web.haystack.mit.edu/staff/dlapsley/tsev7.html

E-VLBI Network Monitoring
- Centralized/integrated network monitoring helped enable identification of the bottleneck (a hardware fault)
- Automated monitoring allows a view of network throughput variation over time, highlighting route changes and network outages
- Automated monitoring also helps to highlight throughput issues at the end points, e.g. network interface card failures and untuned TCP stacks
- Integrated monitoring provides an overall view of network behavior at a glance

Result
- Successful UT1 experiment completed June 30, 2004
- New record time for transfer and calculation of the UT1 offset: 4.5 hours (down from 21 hours)

Acknowledgements
- Yasuhiro Koyama, Masaki Hirabaru, and colleagues at the National Institute of Information and Communications Technology
- Brian Corey, Mike Poirier, and colleagues from MIT Haystack Observatory
- The Internet2, TransPAC/APAN, and JGN2 networks
- Staff at the APAN Tokyo XP
- Tom Lehman, University of Southern California Information Sciences Institute East

Example piPEs Use Cases
- Edge-to-Middle (on-demand): automatic 2-ended test set-up
- Middle-to-Middle (regularly scheduled):
  - Raw data feeds for 3rd-party analysis tools: http://vinci.cacr.caltech.edu:8080/
  - Quality control of network infrastructure
- Edge-to-Edge (regularly scheduled): quality control of application communities
- Edge-to-Campus-DMZ (on-demand), coupled with regularly scheduled Middle-to-Middle: the end user determines who to contact about a performance problem, armed with proof

How Can You Participate?
- Set up BWCTL, OWAMP, and NDT beacons
- Set up a measurement domain:
  - Place tool beacons "intelligently": determine locations, determine policy, determine limits
  - "Register" beacons
  - Install piPEs software
  - Run regularly scheduled tests
  - Store performance data
  - Make performance data available via web service
  - Make visualization CGIs available
- Solve problems / alert us to case studies

Extra Slides

American / European Collaboration Goals
- Awareness of ongoing measurement framework efforts / sharing of ideas (good, but not sufficient)
- Interoperable measurement frameworks (the minimum): a common means of data extraction, making partial path analysis possible along transatlantic paths
- Open source shared development (a possibility, in whole or in part)
- End-to-end partial path analysis for transatlantic research communities:
  - VLBI: Haystack, Mass. <-> Onsala, Sweden
  - HENP: Caltech, Calif. <-> CERN, Switzerland

American / European Collaboration Achievements
- UCL E2E Monitoring Workshop 2003: http://people.internet2.edu/~eboyd/ucl_workshop.html
- Transatlantic Performance Monitoring Workshop 2004: http://people.internet2.edu/~eboyd/transatlantic_workshop.html
- Caltech <-> CERN demo
- Haystack, USA <-> Onsala, Sweden demo
- piPEs software evaluation (in progress)
- Architecture reconciliation (in progress)

Example Application Community: ESnet / Abilene (1)
- 3+3 group: US Govt. labs (LBL, FNAL, BNL) and universities (NC State, OSU, SDSC); http://measurement.es.net/
- Observed: a 400-microsecond one-way latency jump, noticed by Joe Metzger
- Detected: the circuit connecting a router in the CentaurLab to the NCNI edge router had moved to a different path on a metro DWDM system, a 60 km optical distance increase; confirmed by John Moore

Example Application Community: ESnet / Abilene (2)

American/European Demonstration Goals
- Demonstrate the ability to do partial path analysis between "Caltech" (the Los Angeles Abilene router) and CERN.
- Demonstrate the ability to do partial path analysis involving nodes in the GEANT network.
- Compare and contrast measurement of a "lightpath" versus a normal IP path.
- Demonstrate interoperability of piPEs and analysis tools such as Advisor and MonALISA.

Demonstration Details
- Path 1: the default route between LA and CERN, across Abilene to Chicago and then across the DataTAG circuit to CERN
- Path 2: addresses announced so that the route between LA and CERN traverses GEANT via the London node
- Path 3: a "lightpath" (discussed earlier by Rick Summerhill)
- Each measurement "node" consists of a BWCTL box and an OWAMP box "next to" the router.

All Roads Lead to Geneva

Results
- BWCTL: http://abilene.internet2.edu/ami/bwctl_status_eu.cgi/BW/14123130651515289600_14124243902743445504
- OWAMP: http://abilene.internet2.edu/ami/owamp_status_eu.cgi/14123130651515289600_14124243902743445504
- MonALISA
- NLANR Advisor

Insights (1)
- Even with shared source and a single team of developer-installers, inter-administrative-domain coordination is difficult.
- Struggled with the basics of multiple paths: IP addresses, host configuration (using the proper paths), software (supporting source addresses, etc.)
- Struggled with cross-domain administrative coordination: AA (accounts), routes, port filters, MTUs, etc.
- Struggled with performance tuning of the measurement nodes: host tuning, asymmetric routing, MTUs
- Even with log-in access, we still struggled with IP addresses, accounts, port filters, host tuning, host configuration, and software.

Insights (2)
- Connectivity takes a large amount of coordination and effort; performance takes even more of the same.
- Current measurement approaches have limited visibility into "lightpaths." Having the hosts participate in the measurement is one possible solution.

Insights (3)
- Consider the interaction with security; the lack of end-to-end transparency is problematic.
- Security filters are set up based on expected traffic patterns; measurement nodes create new traffic.
- Lightpaths bypass expected ingress points.