Experiences in Traceroute and Available Bandwidth Change Analysis
Connie Logg, Les Cottrell & Jiri Navratil, SIGCOMM'04 Workshops, September 3, 2004
Modern data intensive science such as HENP requires the ability to copy large amounts of data between collaborating sites. This in turn requires high-performance, reliable end-to-end network paths and the ability to take advantage of them. End-users thus need both long-term and near real-time estimates of the network and application performance of such paths for planning, setting expectations, and trouble-shooting. The IEPM-BW (Internet End-to-end Performance Monitoring - BandWidth) project was instigated in 2001 to meet the above needs for the BaBar HENP community. This produced a toolkit for monitoring Round Trip Times (RTT), TCP throughput (iperf), file copy throughput (bbftp, bbcp and GridFTP), traceroute, and more recently lightweight cross-traffic and available bandwidth measurements (ABwE). Since then it has been extended to LHC, CDF, D0, ESnet, Grid, and high-performance network Research & Education sites; about 60-70 paths are now being monitored (including about 50 remote sites), and the monitoring toolkit has been installed at ten sites and is in production at three or four sites, in particular FNAL (for CMS, CDF and D0) and SLAC (for BaBar and PPDG). Each monitoring site is relatively independent, and the monitoring is designed to map to the design of modern HENP tiering of sites, i.e. it is hierarchical rather than full mesh. The monitoring toolkit is installed at the site, and its contact chooses the remote hosts it wishes to monitor. Work is currently in progress to analyze and visualize the traceroute measurements and to automatically detect anomalous step-down changes in bandwidth. Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM); also supported by IUPAP.
Motivation
High Energy Nuclear Physics (HENP) analysis requires the effective distribution of large amounts of data between collaborators worldwide. In 2001 we started development on a project, IEPM-BW (Internet End-to-end Performance Monitoring of BandWidth), for the network paths to our collaborators. It has evolved over time from network-intensive measurements (run about every 90 minutes) to lightweight, non-intensive measurements (run about every 3-5 minutes).
IEPM-BW Version 1
IEPM-BW version 1 (September 2001) performed sequential heavyweight (iperf and file transfer) and lightweight (ping and traceroute) measurements to a few dozen collaborator nodes; it could only be run a few times a day. Concurrently, an Available BandWidth Estimation (ABwE) tool was being developed by Jiri Navratil to perform lightweight available bandwidth (ABW) and link dynamic bottleneck capacity (DBCAP) estimates.
IEPM-BW Version 2
IEPM-BW version 2 incorporated ABwE as a probing tool, and extensive comparisons were made between the heavyweight iperf and file transfer tests and the ABwE results. The ABwE results tracked well with the iperf and file transfer tests in many cases.
Examples
(Figures: 28 days of bandwidth history from SLAC to Caltech as seen by ABwE, iperf (1 stream), and bbftp, showing three different situations caused by routing changes: a new 1000 Mbit/s CENIC path, a drop to a 622 Mbit/s path, and forward and reverse routing changes before returning to the new CENIC path. Also scatter plots of iperf versus ABwE on different paths (range 20-800 Mbit/s) showing agreement of the two methods over the 28-day history.)
Challenges
The monitoring was very useful to us, but there were too many graphs and reports to examine manually every day, and we could only run it a few times a day. We needed to automate what the brain does: pick out changes. Changes of concern: route and bandwidth.
Traceroute Analysis
Need a way to visualize traceroutes taken at regular intervals to several tens of remote hosts. Report the pathologies identified. Allow quick visual inspection for multiple route changes, significant route changes, and pathologies. Drill down to more detailed information: histories, topologies, related bandwidth and alerts.
Display Many Routes on a Single Page
One page per day; one row per host, one column per hour. Identify unique routes with a number and be able to inspect the route associated with a route number. Provide for analysis of long-term route evolutions. Use a single character to identify a route that has not significantly changed; the character identifies the pathology of the route (usually a period (.) = no change). The route number at the start of the day gives an idea of route stability. (Figure annotations: multiple route changes due to GEANT, later restored to the original route; a period (.) means no change.)
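To make the per-day summary concrete, here is a minimal sketch (not the production IEPM-BW code) of how such a grid could be built: a route number marks a change and a period marks no change. The function and data-structure names are illustrative assumptions.

```python
# Sketch: build a per-day route summary, one row per host, one cell per sample.
def summarize_day(measurements, route_numbers):
    """measurements: {host: [(hour, route_tuple), ...]} for one day;
    route_numbers: {route_tuple: int} assigning an ID to each unique route."""
    grid = {}
    for host, samples in measurements.items():
        row, last_route = [], None
        for hour, route in sorted(samples):
            if route == last_route:
                row.append('.')                       # '.' = no change
            else:
                row.append(str(route_numbers[route]))  # route number on change
            last_route = route
        grid[host] = row
    return grid
```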
Pathology Encodings
There are several pathologies associated with traceroutes, and we needed to find a way to encode them:
Hop does not respond (*)
End host does not respond, i.e. 30 hops (|)
Stutters (")
Hop change only affects the 4th octet (:)
Hop change but address in the same AS (a)
ICMP checksum (orange)
! Annotation, e.g. network unreachable, admin blocked
Multi-homed host
Probe type: UDP or ICMP
Pathology Encodings
(Figure: annotated traceroute summary showing examples of each encoding: change but same AS, no change, probe type, change in only the 4th octet, end host not pingable, hop does not respond, stutter, multihomed, ICMP checksum, ! annotation (!X).)
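For illustration, a small sketch of the single-character encoding described above; the characters come from the slides, while the dictionary keys and helper function are hypothetical names introduced here.

```python
# Hypothetical mapping from traceroute pathology to its one-character code.
PATHOLOGY_CHARS = {
    "no_change": ".",               # route unchanged
    "hop_no_response": "*",         # a hop did not respond
    "end_host_no_response": "|",    # end host not reached (30 hops)
    "stutter": '"',                 # repeated (stuttering) hop
    "fourth_octet_change": ":",     # hop change only in the 4th octet
    "same_as_change": "a",          # hop changed but stayed in the same AS
}

def encode(pathology):
    """Return the one-character code for a pathology ('?' if unknown)."""
    return PATHOLOGY_CHARS.get(pathology, "?")
```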
Navigation
traceroute to CCSVSN04.IN2P3.FR (134.158.104.199), 30 hops max, 38 byte packets
1 rtr-gsr-test (134.79.243.1) 0.102 ms
…
13 in2p3-lyon.cssi.renater.fr (193.51.181.6) 154.063 ms !X
History Channel
AS’ information
Changes in network topology (BGP) can result in dramatic changes in performance.
(Figures: a snapshot of the traceroute summary table and, per hour, traceroute trees generated from it, alongside ABwE measurements (one per minute for 24 hours, Thursday Oct 9 9:00am to Friday Oct 10 9:01am) showing dynamic bandwidth capacity (DBC), cross-traffic (XT), and available bandwidth (DBC - XT) in Mbit/s. The drop in performance corresponds to the route changing from the original SLAC-CENIC-Caltech path to SLAC-ESnet-LosNettos(100 Mbit/s)-Caltech and then back to the original path; the changes were detected by both IEPM-iperf and ABwE.)
Notes: 1. Caltech was misrouted via the Los-Nettos 100 Mbit/s commercial network from 14:00 to 17:00. 2. ESnet/GEANT were working on routes from 2:00 to 14:00. 3. A previous occurrence went unnoticed for 2 months. 4. The next step is to auto-detect and notify.
Data Display
There are many different ways to look at the traceroute data: output from the traceroute command, a tabular format that facilitates comparisons, and topology maps. Tabular format:
#date #time #hops epoch rtno node route
08/31/2004 11:13:19 14 1093975999 3 node1.cesnet.cz ...,134.55.209.1,...,134.55.209.58,62.40.103.213,...,195.113.xxx.xxx
08/31/2004 11:23:37 14 1093976617 2 node1.cesnet.cz ...,134.55.209.1,...,134.55.209.200,62.40.103.214,...,195.113.xxx.xxx
08/31/2004 11:33:38 14 1093977218 3 node1.cesnet.cz ...,134.55.209.1,...,134.55.209.58,62.40.103.213,...,195.113.xxx.xxx
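As an illustration of working with the tabular format above, here is a minimal parsing sketch; the field order follows the header line shown, and the function name is hypothetical.

```python
# Sketch: parse one line of the tabular traceroute log
# (date, time, hop count, epoch, route number, node, comma-separated route).
def parse_trace_line(line):
    date, time, hops, epoch, rtno, node, route = line.split(None, 6)
    return {
        "date": date,
        "time": time,
        "hops": int(hops),
        "epoch": int(epoch),
        "route_number": int(rtno),
        "node": node,
        "route": tuple(route.split(",")),  # hop addresses (elided hops appear as '...')
    }

example = ("08/31/2004 11:13:19 14 1093975999 3 node1.cesnet.cz "
           "134.55.209.1,134.55.209.58,62.40.103.213")
print(parse_trace_line(example)["route_number"])  # -> 3
```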
Data Display
Historical list of routes (route number, first seen date, last seen date, hops):
#rt# firstseen lastseen route
0 1086844945 1089705757 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx
1 1087467754 1089702792 ...,192.68.191.83,171.64.1.132,137,...,131.215.xxx.xxx
2 1087472550 1087473162 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx
3 1087529551 1087954977 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx
4 1087875771 1087955566 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,(n/a),131.215.xxx.xxx
5 1087957378 1087957378 ...,192.68.191.83,137.164.23.41,137.164.22.37,...,131.215.xxx.xxx
6 1088221368 1088221368 ...,192.68.191.146,134.55.209.1,134.55.209.6,...,131.215.xxx.xxx
7 1089217384 1089615761 ...,192.68.191.83,137.164.23.41,(n/a),...,131.215.xxx.xxx
8 1089294790 1089432163 ...,192.68.191.83,137.164.23.41,137.164.22.37,(n/a),...,131.215.xxx.xxx
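A companion sketch of how such a historical route list could be maintained: each unique hop sequence gets a route number with first-seen and last-seen timestamps. The exact bookkeeping in IEPM-BW may differ; the names here are illustrative.

```python
# Sketch: assign a route number to each unique hop sequence and track
# when it was first and last seen (epoch seconds, as in the table above).
def update_route_history(history, route, epoch):
    """history maps route (tuple of hops) -> [route_number, first_seen, last_seen]."""
    if route not in history:
        history[route] = [len(history), epoch, epoch]       # next free route number
    else:
        history[route][2] = max(history[route][2], epoch)    # update last seen
    return history[route][0]

history = {}
update_route_history(history, ("192.68.191.83", "137.164.23.41"), 1086844945)
update_route_history(history, ("192.68.191.83", "137.164.23.41"), 1089705757)
print(history)  # one route, number 0, with first/last seen timestamps
```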
Summary
One page per day to eyeball for route changes, with links provided for ease of further examination. We do not yet alert on traceroute changes, but the traceroute analysis is integrated with the Bandwidth Change Analysis.
Bandwidth Change Analysis
The purpose is to generate "alerts" about "major" drops in available bandwidth and/or link capacity. Simplistically, the data is buffered into a History buffer and a Trigger buffer. Examine the time spacing of the data to calculate the size of the History and Trigger buffers; we have chosen a History buffer of about 24 hours and a Trigger buffer of about 3 hours. Pick a threshold of change to alert on (~40%). Start with the oldest data, load about 3 hours of data into the history buffer, and calculate its mean and standard deviation (histmean and histsd).
Methodology
Examine the data in order from oldest to newest. If value > histmean - 2*histsd: put it in the history buffer, remove the oldest value from the trigger buffer, and recalculate histmean and histsd. Else put it in the trigger buffer. If the trigger buffer is not full, continue with the next value; if the trigger buffer is full, proceed as described below.
Trigger Buffer is Full
Calculate the trigger buffer mean (trigmean) and standard deviation (trigsd). If (histmean - trigmean)/histmean > threshold, raise a 1st level alert. Load the trigger buffer data into the History buffer, clear the trigger buffer, and continue processing the data. NOTE: the process actually has various filtering conditions in it. This is not the end: it only identifies first-level trigger conditions.
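A minimal sketch of the history/trigger buffer procedure as described on these slides. The 2-sigma test, the roughly 24-hour/3-hour buffer sizes, and the ~40% threshold come from the slides; treating the history buffer as a fixed-length sliding window and the sample counts used here are assumptions, and the filtering conditions mentioned above are omitted.

```python
from collections import deque
from statistics import mean, stdev

def detect_drops(values, hist_len=480, trig_len=60, threshold=0.4):
    """First-level trigger detection over a time-ordered series of
    available-bandwidth estimates. hist_len ~ 24 h and trig_len ~ 3 h of
    samples (exact counts depend on probe spacing); threshold ~ 40%."""
    history = deque(values[:trig_len], maxlen=hist_len)  # seed with the oldest data
    trigger = deque()
    events = []
    for i, v in enumerate(values[trig_len:], start=trig_len):
        hmean, hsd = mean(history), stdev(history)
        if v > hmean - 2 * hsd:
            history.append(v)            # normal value: extend the history
            if trigger:
                trigger.popleft()        # remove the oldest trigger value
        else:
            trigger.append(v)            # suspiciously low value
            if len(trigger) >= trig_len:  # trigger buffer is full
                tmean = mean(trigger)
                if (hmean - tmean) / hmean > threshold:
                    events.append((i, hmean, tmean))  # 1st level alert
                history.extend(trigger)  # fold trigger data into the history
                trigger.clear()
    return events

# Usage: events = detect_drops(abwe_series), where abwe_series is a list of Mbit/s values.
```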
Examples
Examples
Challenges
Diurnal variations. (Figure showing capacity, available bandwidth, RTT, and cross-traffic over time.)
Challenges
Unusual variations. (Figures comparing a trigger buffer length of 10 points with a length of 30 points.) In this case, we wanted to know that this was happening.
Considerations
From the performance monitoring perspective of managing production networks, we are primarily concerned about pathologies that interfere with the production process. We are not really interested in the minor ebb and flow of network traffic. We do want to be alerted to pathologies which may be affecting the production process.
Challenges
There are many algorithms that may be useful for analyzing the data for various types of variations, some of which we may not be concerned with. Developing code for this is challenging and complex, but it can be done. The problem is that the CPU power and elapsed time needed to analyze the monitoring data with all of these analysis tools are impractical; most of us cannot afford supercomputers or farms to do it, and the analysis and identification of "events" must be timely.
Solutions
A quick first-level trigger analysis that can be run frequently to check for "events". Provide a web page for looking at general health and first-level trigger occurrences. We can also invoke immediate but synchronized, more extensive tests to verify drops. Feed the event data (and longer-term data) into more sophisticated analysis to filter for serious "alerts". Save event signatures for future reference.
IEPM-BW Future
IEPM-BW Version 3 is being architected to facilitate frequent light available bandwidth and link capacity measurements. It will use an SQL database to manage the probe, monitoring host, and target host specifications as well as the probe data and analysis results, with frequent lightweight first-trigger-level change analysis.
Long Term
A facility for scheduling on-demand and automatic heavyweight bandwidth tests in response to triggers. Automatically feed results into more complex analysis code to filter only for "real" alerts. Distributed monitoring.
References
ABwE: A Practical Approach to Available Bandwidth Estimation, Jiri Navratil and Les Cottrell.
Automated Event Detection for Active Measurement Systems, A. J. McGregor and H-W. Braun, Passive and Active Measurements 2001.
Overview of IEPM-BW Bandwidth Testing of Bulk Data Transfer, Les Cottrell and Connie Logg.
Experiences and Results from a New High Performance Network and Application Monitoring Toolkit, Les Cottrell, Connie Logg, and I-Heng Mei.
Correlating Internet Performance Changes and Route Changes to Assist in Trouble-Shooting from an End-User Perspective, Connie Logg, Jiri Navratil, and Les Cottrell.
Miscellaneous, SLAC.
Future (continued)
Further develop the Bandwidth Change Analysis: we now have the 1st level trigger mechanism; develop more extensive analysis to analyze identified events; develop algorithms to automatically conduct other tests and integrate those results into further trigger analysis.
IEPM-BW Version 3 Architecture Changes
An SQL database for host specifications, probe tool specifications, the probe scheduling mechanism, analysis results, and a knowledge base. Scheduling mechanisms for lightweight vs. heavyweight probes. Distributed monitoring and remote data retrieval for "grid" analysis. Change analysis (route and bandwidth) and alerts.
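For illustration only, a hypothetical minimal sketch of the kind of schema such an SQL database might use; the table and column names are invented here and are not the actual IEPM-BW Version 3 schema.

```python
# Hypothetical sketch of a Version 3 style database: monitoring targets,
# probe tools, probe results, and alerts. Names are illustrative only.
import sqlite3

conn = sqlite3.connect("iepm_bw.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS target_host (
    id INTEGER PRIMARY KEY, name TEXT, address TEXT, site TEXT);
CREATE TABLE IF NOT EXISTS probe_tool (
    id INTEGER PRIMARY KEY, name TEXT,           -- e.g. abwe, iperf, bbftp
    heavyweight INTEGER, interval_s INTEGER);    -- scheduling hints
CREATE TABLE IF NOT EXISTS probe_result (
    id INTEGER PRIMARY KEY,
    target_id INTEGER REFERENCES target_host(id),
    tool_id INTEGER REFERENCES probe_tool(id),
    epoch INTEGER, value_mbps REAL);
CREATE TABLE IF NOT EXISTS alert (
    id INTEGER PRIMARY KEY,
    result_id INTEGER REFERENCES probe_result(id),
    kind TEXT, detail TEXT);
""")
conn.commit()
```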
… and Topology
Choose times and hosts and submit a request. (Figure: traceroute topology by hour of day, with paths from SLAC through ESnet, GEANT, and JAnet to remote sites such as CESnet, IN2P3, DL, and CLRC, including alternate routes.) Nodes are colored by ISP; mouseover shows node names; click on a node to see its subroutes; click on an end node to see its path back. Raw traceroutes with AS information can also be obtained.
In Progress
Code is being rewritten to: allow for standalone use, integrate into IEPM-BW version 3, and integrate with the Bandwidth Change Analysis.
Bandwidth Change Analysis
Available BandWidth Estimation (ABwE), developed by Jiri Navratil, is used to perform frequent probes for link capacity, available bandwidth, and cross-traffic load. During its development we noticed that the ABwE and iperf results tracked very closely. Iperf is network intensive, and it is only practical to do a few measurements a day; ABwE is very lightweight and can make measurements about every 3 minutes for about 60 nodes. We therefore took a look at using ABwE measurements for monitoring and alerting on bandwidth changes.
Futures
IEPM-BW Version 3: asynchronous, frequent lightweight probes, plus synchronous, less frequent heavyweight probes that can be used to check out changes indicated by the lightweight probes. Technology is changing rapidly: new routing, buffering, hardware, software, protocols, etc. will require new probe techniques, so we will provide a framework within which to evaluate new probe techniques.
Futures IEPM-BW will continue to be useful for developing countries
More information
Example traceroute summary: http://www.slac.stanford.edu/comp/net/bandwidth-tests/hercules/tracesummaries/today.html
Topology: http://pcgiga.cern.ch:8080/cgi-bin/pnets.pl
IEPM-BW home page: http://www-iepm.slac.stanford.edu/bw/
ABwE lightweight bandwidth estimation: http://www-iepm.slac.stanford.edu/abing/
Examples
(Figures: 28 days of bandwidth history from SLAC to Caltech as seen by ABwE, iperf (1 stream), bbftp, and RTT, showing several different situations caused by routing changes: a new 1000 Mbit/s CENIC path, a drop to a 622 Mbit/s path, a drop to 100 Mbit/s caused by routing (BGP) errors, and forward and reverse routing changes before returning to the new CENIC path. ABwE also works well on DSL and wireless networks. Also scatter plots of iperf versus ABwE on different paths (range 20-800 Mbit/s) showing agreement of the two methods over the 28-day history.)