Correlating Internet Performance & Route Changes to Assist in Troubleshooting from an End-user Perspective. Les Cottrell, Connie Logg, Jiri Navratil.

Presentation transcript:

1 Correlating Internet Performance & Route Changes to Assist in Troubleshooting from an End-user Perspective
Les Cottrell, Connie Logg, Jiri Navratil, SLAC
Passive and Active Monitoring Workshop, Antibes, Juan-les-Pins, France, April 19-20
Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP

2 Outline
Set of integrated measurement tools to aid in troubleshooting for the end “user”:
– Traceroute measurements/analysis
– Topology visualization
– Lightweight bandwidth estimation
– Overall visualization
– Automated detection of level-change anomalies
– Correlation of performance & route changes

3 Traceroute measurement
Every 10 minutes for each host:
– Run standard traceroute: 2 sec timeout, 1 query/hop, <= 30 hops
– For some hosts use ICMP traceroute
  » End host responds (7/40)
  » Intermediate host responds (1/40)
  » In two cases UDP probes work better than ICMP
  » In one case neither ICMP nor UDP probes help
– Both forward & reverse routes (use ssh for reverse route)
  » Need ssh access to remote host for reverse traceroute
  » Else no reverse route (not a disaster)
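
As a rough illustration of the probe settings listed on this slide, the following Python sketch builds the traceroute command lines. It assumes the common Unix `traceroute` options `-w` (wait time), `-q` (queries per hop), `-m` (max hops) and `-I` (ICMP probes); the function names and host names are hypothetical and not taken from the IEPM code.

```python
import shlex


def build_traceroute_cmd(host, use_icmp=False, timeout_s=2, queries=1, max_hops=30):
    """Return an argv list for one traceroute run with the slide's settings."""
    cmd = ["traceroute", "-w", str(timeout_s), "-q", str(queries), "-m", str(max_hops)]
    if use_icmp:
        # ICMP echo probes instead of the default UDP probes (often needs root).
        cmd.append("-I")
    cmd.append(host)
    return cmd


def build_reverse_cmd(remote_host, local_host, **options):
    """Return an argv list that runs the reverse traceroute on the remote host via ssh."""
    inner = build_traceroute_cmd(local_host, **options)
    return ["ssh", remote_host, shlex.join(inner)]


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    print(build_traceroute_cmd("remote.example.org"))
    print(build_reverse_cmd("remote.example.org", "monitor.example.org"))
```

The argv lists can be handed to `subprocess.run` from a scheduler that fires every 10 minutes; when ssh access is unavailable, only the forward command is run.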

4 Significant changes
Compare current and previous traceroutes:
– If traceroute reports “unknown host” => unknown (!)
– Else for each hop/node:
  – If both current & previous hops have valid IP addresses:
    – If different, i.e. some kind of route change has occurred:
      » If IPs same for first 3 octets => same subnet/colo (:)
      » Else if IPs in same AS => same AS (a)
      » Else significant change => assign unique route number
        If only one hop different => color route # orange
        Else => color route # red
    – Elseif 30 hops => no route change but last hop unreachable (|)
      » If last hop not pingable => color red (|)
    – Else => no route change (●)
  – Elseif one or both IPs are “*” (i.e. router does not respond & traceroute reports “*”) => route change unclear (*)
– If “Icmp checksum is wrong” => color character orange
– If significant bandwidth change => color cell
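
The per-hop comparison above can be sketched in Python as follows. This is a minimal illustration, not the Traceanal code: the function names are hypothetical, the AS lookup is passed in as a caller-supplied function, routes of unequal length are padded with "*" (an assumption), and the "unknown host", 30-hop and checksum cases are left out.

```python
from itertools import zip_longest


def same_subnet(ip_a, ip_b):
    """True if two dotted-quad IPv4 addresses share their first three octets."""
    return ip_a.split(".")[:3] == ip_b.split(".")[:3]


def classify_hop(prev_ip, cur_ip, asn_of):
    """Return the slide's marker for one hop, or 'significant' for a real change."""
    if prev_ip == "*" or cur_ip == "*":
        return "*"  # router did not respond: route change unclear
    if prev_ip == cur_ip:
        return "●"  # no route change
    if same_subnet(prev_ip, cur_ip):
        return ":"  # same subnet/colo
    prev_as, cur_as = asn_of(prev_ip), asn_of(cur_ip)
    if prev_as is not None and prev_as == cur_as:
        return "a"  # same AS
    return "significant"


def classify_route(prev_hops, cur_hops, asn_of):
    """Return (per-hop markers, color), where color is None, 'orange' or 'red'."""
    marks = [classify_hop(p, c, asn_of)
             for p, c in zip_longest(prev_hops, cur_hops, fillvalue="*")]
    significant = marks.count("significant")
    if significant == 0:
        return marks, None
    # One differing hop is flagged orange, more than one red.
    return marks, "orange" if significant == 1 else "red"
```

A route whose color is not None would then be assigned a new unique route number by the caller.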

5 Route table
– Compact so can see many routes at once
– History navigation
– Multiple route changes (due to GEANT), later restored to original route
– Available bandwidth
– Raw traceroute logs for debugging
– Textual summary of traceroutes for to ISP
– Description of route numbers with date last seen
– User readable (web table) routes for this host for this day
– Route # at start of day, gives idea of route stability
– Mouseover for hops & RTT

6 Another example
– TCP probe type
– Host not pingable
– Intermediate router does not respond
– ICMP checksum error
– Level change
– Get AS information for routes

7 Topology
– Choose times and hosts and submit request
– Nodes colored by ISP
– Mouseover shows node names
– Click on node to see subroutes
– Click on end node to see its path back
– Also can get raw traceroutes with AS's
Figure labels: SLAC, ESnet, GEANT, JAnet, CESnet, IN2P3, CLRC, DL; alternate route; hour of day

8 Available bandwidth
Uses ABwE/Abing (packet pair dispersion)
– Needs server at remote end or ssh to launch server
– Fast (< 1 sec)
– Lightweight: < 40 packets for both forward & reverse estimates (5800 Bytes)
– Uses min delay for capacity
– Inter-packet dispersion for cross-traffic
– Available BW = Capacity (min RTT) - Cross-traffic (var)
Good agreement with other methods
– Even with poor absolute agreement (25% of cases) can spot changes
– Also provides RTT
Make measurements to about 60 hosts at 5 minute intervals (deployed in IEPM, MonALISA, PlanetLab)
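
To make the relation Available BW = Capacity - Cross-traffic concrete, here is a small Python sketch of a textbook packet-pair model. It is an assumption-laden simplification and not necessarily the computation ABwE/Abing performs: the minimum observed dispersion is taken to reflect the bottleneck capacity, any extra dispersion is attributed to cross-traffic queued between the two packets, and the function name and `packet_bytes` parameter are hypothetical.

```python
def estimate_available_bw(dispersions_s, packet_bytes):
    """Estimate (capacity, cross_traffic, available) in bits/s from packet-pair gaps.

    dispersions_s: receiver-side time gaps (seconds) between the two packets
    of each pair; packet_bytes: size of each probe packet.
    """
    if not dispersions_s or min(dispersions_s) <= 0:
        raise ValueError("need at least one positive dispersion sample")
    bits = packet_bytes * 8
    t_min = min(dispersions_s)
    # Smallest gap: the pair crossed the bottleneck with nothing in between.
    capacity = bits / t_min
    # A gap t > t_min means (t - t_min) of bottleneck time served cross-traffic,
    # so the cross-traffic rate seen by that pair is capacity * (t - t_min) / t.
    rates = [capacity * (t - t_min) / t for t in dispersions_s]
    cross_traffic = sum(rates) / len(rates)
    return capacity, cross_traffic, capacity - cross_traffic
```

For example, 1250-byte packets with gaps of 0.1 ms and 0.2 ms give a capacity of about 100 Mbits/s, cross-traffic of about 25 Mbits/s and about 75 Mbits/s available.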

9 Available bandwidth
Plot: SLAC to Caltech, Mar 19, 2004. Curves: dynamic bandwidth capacity (DBC), cross-traffic, available bandwidth (= DBC - cross-traffic), iperf.

10 Achievable throughput & file transfer
IEPM-BW
– High impact (iperf, bbftp, GridFTP …) measurements min intervals
Plot annotations: select focal area; forward route change; reverse route change. Series: min RTT, iperf, bbftp, iperf1, abing.

11 Put it all together
Two examples:
– Agreement of iperf & abing
– Route changes and available bandwidth

12 Bandwidth history over 28 days (ABwE and Iperf)
During this time we can see several different situations caused by different routing from SLAC to Caltech:
– Drop to 100 Mbits/s caused by routing (BGP) errors
– Drop to 622 Mbits/s path
– Back to new CENIC path
– New CENIC path 1000 Mbits/s
– Reverse routing changes
– Forward routing changes
Scatter plots of Iperf versus ABwE on different paths (range 20–800 Mbits/s) show agreement of the two methods (28 days history)
Plot series: ABwE, Iperf, RTT, bbftp, Iperf 1 stream

13 Changes in network topology (BGP) can result in dramatic changes in performance
Figure panels:
– Snapshot of traceroute summary table (rows: remote host; columns: hour)
– Samples of traceroute trees generated from the table
– ABwE measurement one/minute for 24 hours, Thurs Oct 9 9:00am to Fri Oct 10 9:01am (Mbits/s; series: dynamic BW capacity (DBC), cross-traffic (XT), available BW = DBC - XT)
Annotations:
– Drop in performance (from original path SLAC-CENIC-Caltech to SLAC-ESnet-LosNettos (100 Mbps)-Caltech)
– Back to original path
– Changes detected by IEPM-Iperf and ABwE
– ESnet-LosNettos segment in the path (100 Mbits/s)
Notes:
1. Caltech misrouted via Los-Nettos 100 Mbps commercial net 14:00-17:00
2. ESnet/GEANT working on routes from 2:00 to 14:00
3. A previous occurrence went unnoticed for 2 months
4. Next step is to auto detect and notify

14 Automatic step change detection
Too many graphs to review each morning!
Motivated by drop in bandwidth between SLAC & Caltech
– Started late August 2003
– Reduced achievable throughput by factor of 5
– Not noticed until October 2003
– Caused by faulty routing over commercial network
– After notifying ISP, it was fixed in 4 hours!
– See for details: http://
Plot: SLAC to Caltech achievable throughput, April – November 2003, with the start of the drop marked

15 Automatic available bandwidth step change detection
Still developing, evolving from earlier work:
– Arithmetic weighted moving averages
– NLANR work, see
Roughly speaking:
– Has a history buffer to describe past behavior
  » History buffer duration currently 600 mins
– Plus a trigger buffer of data suggesting a change
  » Trigger buffer duration (evaluating typically mins) indicates how long the change has to occur for
– History mean (μ) and std. dev. (σ) are used by the trigger selector
  » If new_value is outside μ ± sensitivity·σ, add it to the trigger buffer
  » If new_value is outside μ ± 2·sensitivity·σ, then it is also an outlier (don’t add to stats)
  » Else it goes in the history buffer

16 Algorithm
If this is a trigger value, compare with μ and save direction of change
If this is a trigger and the direction has changed, reset trigger buffer
– Move trigger data to history buffer, recalculate stats, clear trigger buffer
If trigger buffer is full, calculate trigger mean μ_t and std. dev. σ_t
– If (μ - μ_t)/μ > threshold then raise an alert & reset trigger buffer
– Else remove oldest value from trigger buffer
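
The history/trigger buffer scheme can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the IEPM implementation: the class and method names are hypothetical, buffer sizes are counted in samples rather than minutes, the relative change is tested in both directions with an absolute value (the slide writes (μ - μ_t)/μ), and the separate outlier rule is omitted because the slides do not say how outliers interact with the buffers.

```python
from collections import deque
from statistics import mean, pstdev


class StepChangeDetector:
    def __init__(self, history_len=600, trigger_len=30, sensitivity=2.0, threshold=0.4):
        self.history = deque(maxlen=history_len)  # describes past behavior
        self.trigger = []                         # values suggesting a change
        self.trigger_len = trigger_len
        self.sensitivity = sensitivity
        self.threshold = threshold
        self.direction = 0                        # +1 rise, -1 drop, 0 unset

    def _reset_trigger(self):
        """Move trigger data to the history buffer and clear the trigger buffer."""
        self.history.extend(self.trigger)
        self.trigger = []
        self.direction = 0

    def add(self, value):
        """Feed one measurement; return True if it completes a step-change alert."""
        if len(self.history) < 2:
            self.history.append(value)  # warm-up: not enough data for stats yet
            return False
        mu, sigma = mean(self.history), pstdev(self.history)
        if abs(value - mu) <= self.sensitivity * sigma:
            self.history.append(value)  # ordinary value
            return False
        direction = 1 if value > mu else -1
        if self.trigger and direction != self.direction:
            self._reset_trigger()       # direction flipped: start over
            mu = mean(self.history)     # recalculate stats
        self.direction = direction
        self.trigger.append(value)
        if len(self.trigger) < self.trigger_len:
            return False
        mu_t = mean(self.trigger)
        if mu != 0 and abs(mu - mu_t) / abs(mu) > self.threshold:
            self._reset_trigger()
            return True
        self.trigger.pop(0)             # not a big enough change: slide the window
        return False
```

With a 20-sample history alternating between 99 and 101 followed by a sustained drop to 50, a detector configured with `trigger_len=5` and `threshold=0.4` raises its alert on the fifth low sample.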

17 Examples
SLAC to Caltech available bandwidth, April 6-8, 2004 (plot with alerts marked)
– History duration: 600 mins, trigger duration: 30 mins, threshold: 40%, sensitivity: 2
– With trigger duration: 60 only see one alert, with trigger duration: 10 catch alerts
– Route change
SLAC to NIKHEF (Amsterdam) (plot of available bandwidth in Mbit/s, with route changes and unreachable periods marked)

18 BW vs route changes
Route & throughput changes from 11/28/03 thru 2/2/04
– Most (80%) route changes do not result in a throughput change
– About half of throughput changes are due to route changes
Table columns: Location (# nodes); # route changes; # with throughput increase; # with throughput decrease; # throughput changes; # throughput changes with route change; # throughput changes without route change
Table rows: Europe (8); Canada & US (21); Japan (13)

19 More Information
ABwE:
IEPM:
Traceroute examples: www.slac.stanford.edu/comp/net/iepmlite/tracesummaries/today.html
Step change analysis: