Network Path and Application Diagnostics

Network Path and Application Diagnostics
Matt Mathis, John Heffner, Ragu Reddy
7/17/06
http://www.psc.edu/~mathis/papers/PathDiag20060717.ppt (Corrected)

Outline
- NPAD/Pathdiag - Why should you care?
- What are the real performance problems?
- Automatic diagnosis
- Deployment

NPAD/Pathdiag - Why should you care?
- One-click automatic performance diagnosis
  - Designed for (non-expert) end users
- Accurate end-system and last-mile diagnosis
  - Eliminate most false pass results
  - Accurate distinction between host and path flaws
  - Accurate and specific identification of most flaws
- Basic networking tutorial info
  - Help the end user understand the problem
  - Help train 1st-tier support (sysadmin or netadmin)
  - Backup documentation for support escalation
- Empower the user to get it fixed
  - The same reports for users and admins

Recalibrate user expectations
- Long history of very poor network performance
  - Users do not know what to expect
  - Users have become completely numb
- Goal for new baseline user expectations: 1 gigabyte in less than 2 minutes (~67 Mb/s)
  - Everyone should be able to reach these rates by default
  - People who can't should know why or be very angry

The Wizard Gap

The Wizard Gap Updated
- Experts have topped out end systems & links
  - 10 Gb/s NIC bottleneck
  - 40 Gb/s "link" bandwidth (striped)
- Median I2 bulk rate is 3 Mbit/s
  - See http://netflow.internet2.edu/weekly/
- Current gap is about 3000:1
- Closing the first factor of 30 should now be "easy"

TCP tuning requires expert knowledge
- By design, TCP/IP hides the 'net from upper layers
  - TCP/IP provides basic reliable data delivery
  - The "hourglass" between applications and networks
- This is a good thing, because it allows:
  - Invisible recovery from data loss, etc.
  - Old applications to use new networks
  - New applications to use old networks
- But then (nearly) all problems have the same symptom: less than expected performance
  - The details are hidden from nearly everyone

TCP tuning is painful debugging
- All problems reduce performance
  - But the specific symptoms are hidden
- Any one problem can prevent good performance
  - Completely masking all other problems
  - Trying to fix the weakest link of an invisible chain
- General tendency is to guess and "fix" random parts
  - Repairs are sometimes "random walks"
  - Repairing one problem at a time, at best
- The solution is to instrument TCP

The Web100 project
- Instrumentation and autotuning for TCP
- New insight: TCP has the ideal diagnostic vantage point
  - TCP-ESTATS-MIB is now past IETF WG last call
  - Will be a standards-track RFC soon
  - Prototypes for Linux (www.Web100.org) and Windows Vista
- TCP autotuning
  - Automatically adjusts TCP buffers
  - Linux 2.6.17 default maximum window size is 4 MBytes (see the check sketched below)
  - Announced for Vista - details unknown
- New insight: nearly all symptoms scale with round-trip time
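For readers who want to verify the autotuning limits mentioned above, here is a minimal sketch (not part of Web100 or pathdiag) that reads the standard Linux sysctls tcp_rmem/tcp_wmem and compares their maximum against the bandwidth-delay product of an assumed target path; the 67 Mb/s and 70 ms figures are illustrative assumptions.

```python
#!/usr/bin/env python3
# Minimal sketch, assuming a Linux host: compare the kernel's TCP
# autotuning buffer limits against the bandwidth-delay product of an
# assumed target path. Not part of NPAD/pathdiag.

def read_sysctl_triple(path):
    """Return (min, default, max) from a tcp_rmem/tcp_wmem style sysctl."""
    with open(path) as f:
        return tuple(int(v) for v in f.read().split())

target_rate_bps = 67e6   # assumed goal: ~1 gigabyte in 2 minutes
target_rtt_s = 0.070     # assumed cross-country RTT

bdp_bytes = target_rate_bps / 8 * target_rtt_s  # window needed to fill the path

for name in ("tcp_rmem", "tcp_wmem"):
    _, _, max_bytes = read_sysctl_triple(f"/proc/sys/net/ipv4/{name}")
    verdict = "OK" if max_bytes >= bdp_bytes else "too small"
    print(f"{name}: max {max_bytes} bytes vs BDP {bdp_bytes:.0f} bytes -> {verdict}")
```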

Nearly all symptoms scale with RTT
- For example: TCP buffer space, network loss and reordering, etc.
- On a short path, TCP can compensate for the flaw (see the worked example below)
  - Local client to server: all applications work, including all standard diagnostics
  - Remote client to server: all applications fail, leading to faulty implication of other components
- (Talk through the diagram: typical campus, clean backbone, S, LC and RC)
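To see the RTT scaling concretely, here is an illustrative calculation (not from the slides) using the well-known Mathis/Semke/Mahdavi/Ott throughput model, rate ~ (MSS/RTT) * C/sqrt(p): the same modest loss rate that is invisible on a campus path makes a cross-country transfer crawl. The loss rate and RTTs chosen are assumptions for illustration.

```python
#!/usr/bin/env python3
# Illustrative only: the Mathis et al. model rate ~ (MSS / RTT) * C / sqrt(p)
# shows how one flaw (a fixed loss rate) is harmless on a short path but
# dominates on a long one.
from math import sqrt

MSS = 1460            # bytes, typical Ethernet payload
C = sqrt(3.0 / 2.0)   # model constant (no delayed ACKs)
loss = 1e-4           # assumed flaw: one loss per 10,000 segments

for label, rtt in (("local, 1 ms RTT", 0.001), ("cross-country, 70 ms RTT", 0.070)):
    rate_bps = MSS * 8 * C / (rtt * sqrt(loss))
    print(f"{label}: ~{rate_bps / 1e6:.0f} Mb/s")
# Prints roughly 1400 Mb/s locally vs roughly 20 Mb/s cross-country.
```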

The confounded problems
- For nearly all network flaws, the only symptom is reduced performance
  - But the reduction is scaled by RTT
- Therefore, flaws are undetectable on short paths
  - False pass for even the best conventional diagnostics
  - Leads to faulty inductive reasoning about flaw locations
  - Diagnosis often relies on tomography and complicated inference techniques
- This is the real end-to-end problem
- (Ask for questions)

The NPAD solution
- For applications (and upper layers): bench test over an (emulated) ideal long path
  - Topic of a future talk
- "Pathdiag" tests short path sections to localize a flaw
  - Use Web100 to collect detailed statistics: loss, delay, queuing properties, etc.
  - Use models to extrapolate results to the full path, assuming that the rest of the path is ideal (a simplified version of this extrapolation is sketched below)
- You have to specify the end-to-end performance goal: data rate and RTT
- Pass/fail on the basis of the extrapolated performance
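The sketch below shows, under simplifying assumptions, the kind of extrapolation described on this slide: given only the end-to-end target (data rate and RTT), derive the window the end systems must support and the worst loss rate the whole path can tolerate, using the same model constant as above. It is not the actual pathdiag code, and the target figures are assumptions.

```python
#!/usr/bin/env python3
# Simplified sketch of extrapolating a short-section test to an end-to-end
# target (not the actual pathdiag models). Target figures are assumptions.
from math import sqrt

MSS = 1460                # bytes
C = sqrt(3.0 / 2.0)       # Mathis-model constant

target_rate_bps = 67e6    # assumed end-to-end goal
target_rtt_s = 0.070      # assumed end-to-end RTT

# Window the end systems must support: the bandwidth-delay product.
required_window_bytes = target_rate_bps / 8 * target_rtt_s

# Worst loss rate that still meets the goal, from
# rate = (MSS * 8 * C) / (RTT * sqrt(p)) solved for p.
max_loss = (MSS * 8 * C / (target_rate_bps * target_rtt_s)) ** 2

print(f"required window   : {required_window_bytes / 1e6:.2f} MB")
print(f"max tolerable loss: {max_loss:.1e} (about 1 per {1 / max_loss:,.0f} segments)")
# A test section that shows more loss than this fails, even if local
# applications over the short path still perform well.
```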

Deploy as a Diagnostic Server
- Use pathdiag in a Diagnostic Server (DS)
- Specify end-to-end target performance from server (S) to client (C): RTT and data rate
- Measure the performance from DS to C
  - Use Web100 in the DS to collect detailed statistics on both the path and the client
- Extrapolate performance assuming an ideal backbone
- Pass/fail on the basis of the extrapolated performance
- (Reversed data from the earlier slide)

Demo (Laptop, PSC)

Key NPAD/pathdiag features
- Results are intended for end users
  - Provides a list of specific items to be corrected
  - Failed tests are showstoppers for fast apps
  - Includes explanations and tutorial information
- Clear differentiation between client and path problems
  - Accurate escalation to network or system admins
  - The reports are public and can be viewed by either
- Coverage for a majority of OS and last-mile network flaws
  - Most of the remaining flaws can be detected with pathdiag in the client, or with traceroute
- Eliminates nearly all(?) false pass results

More features
- Tests become more sensitive as the path gets shorter
  - Conventional diagnostics become less sensitive
  - Depending on the models, perhaps too sensitive
  - The new problem is false fail (e.g. queue space tests)
- Flaws no longer completely mask other flaws
  - A single test often detects several flaws, e.g. both OS and network flaws in the same test
  - They can be repaired concurrently
- Archived DS results include raw Web100 data
  - Can be reprocessed with updated reporting software: new reports from old data
  - Critical feedback for the NPAD project
  - We really want to collect "interesting" failures

NPAD/pathdiag deployment
- Why should a campus networking organization care?
  - "Zero effort" solution to mis-tuned end systems
  - Accurate reports of real problems: you have the same view as the user
  - Saves time when there really is a problem
  - You can document reality for management
- Suggestion: require pathdiag reports for all performance problems

What about the impact of the test traffic?
- NPAD/pathdiag is single-threaded: only one test at a time
- Same load as any well-tuned TCP application
- Protected by TCP "fairness"
  - Large flows are generally "softer" than small flows
  - Large flows are easily disturbed by small flows

Impact
- Automatically diagnose first-level problems
  - Easily expose all path bottlenecks that limit performance to less than 100 Mb/s
  - Easily expose all end-system/OS problems that limit performance to less than 100 Mb/s (will become moot as autotuning is deployed)
- Empower the users to apply the proper motivation
- Still need to recalibrate user expectations (see the arithmetic below)
  - Less than 1 gigabyte / 2 minutes is too slow
  - Many paths should support 5 gigabytes/minute (less than 1 Gb/s)
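As a quick sanity check on the rate targets quoted here and on the "Recalibrate user expectations" slide (illustrative arithmetic only, not from the original deck):

```python
#!/usr/bin/env python3
# Convert the slide's volume/time targets into line rates.
def rate_mbps(gigabytes, seconds):
    return gigabytes * 8e9 / seconds / 1e6

print(f"1 GB in 2 minutes -> {rate_mbps(1, 120):.0f} Mb/s")  # ~67 Mb/s
print(f"5 GB in 1 minute  -> {rate_mbps(5, 60):.0f} Mb/s")   # ~667 Mb/s, under 1 Gb/s
```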

Download and install
- User documentation: http://www.psc.edu/networking/projects/pathdiag/
  - Follow the link to "Installing a Server"
- Easily customized with a site-specific skin
- Designed to be easily upgraded with new releases
  - Roughly every 2 months
  - Improving reports through ongoing field experience
- Drops into existing NDT servers
  - Plans for future integration
- Enjoy!

Backup slides

Blast from the past
- Same base algorithm as "Windowed Ping" [Mathis, INET '94], aka "mping"
  - See http://www.psc.edu/~mathis/wping/
  - Killer diagnostic in use at PSC in the early 90s
  - Stopped working with the advent of "fast path" routers
- Use a simple fixed-window protocol (see the sketch below)
  - Scan window size in 1-second steps
  - Measure data rate, loss rate, RTT, etc. as the window changes
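The toy sketch below approximates the fixed-window scan described above: hold a fixed number of probes outstanding, step the window up once per second, and record rate, loss, and RTT at each step. It is not the original mping code; it sends bursts rather than maintaining a true sliding window, and it assumes a UDP echo responder at ECHO_HOST:ECHO_PORT (a placeholder, not part of the original tool).

```python
#!/usr/bin/env python3
# Crude approximation of the windowed-ping scan (not the original mping):
# for each window size, spend one second sending bursts of "window" probes
# and draining the replies, then report rate / loss / average RTT.
import socket
import time

ECHO_HOST, ECHO_PORT = "192.0.2.1", 7   # placeholder UDP echo responder
PROBE = b"x" * 64

def scan(max_window=32, step_time=1.0):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(0.2)
    s.connect((ECHO_HOST, ECHO_PORT))
    for window in range(1, max_window + 1):
        sent = received = 0
        rtts = []
        deadline = time.time() + step_time
        while time.time() < deadline:
            t0 = time.time()
            for _ in range(window):        # put "window" probes in flight
                s.send(PROBE)
                sent += 1
            for _ in range(window):        # drain whatever comes back
                try:
                    s.recv(len(PROBE))
                    received += 1
                    rtts.append(time.time() - t0)  # crude per-burst RTT
                except OSError:            # timeout or ICMP error counts as loss
                    break
        loss = 1 - received / sent if sent else 0.0
        rate_bps = received * len(PROBE) * 8 / step_time
        rtt_ms = 1e3 * sum(rtts) / len(rtts) if rtts else float("nan")
        print(f"window {window:3d}: {rate_bps / 1e3:8.1f} kb/s, "
              f"loss {loss:5.1%}, RTT {rtt_ms:6.1f} ms")

if __name__ == "__main__":
    scan()
```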