End-to-End Routing Behavior in the Internet Vern Paxson Presented by Zhichun Li
Idea Use end-to-end measurements to determine: route pathologies, route stability, route symmetry. Key property (N^2 scaling): use N sites to measure on the order of N^2 Internet paths.
Definitions Virtual path: network level abstraction of “direct link” between two hosts. At the network layer, it is realized by a single route. Autonomous system (AS): collection of routers and hosts controlled by a single administrative entity.
Routing Protocols Interior Gateway Protocol (IGP): routing protocol for entities within the same AS. Border Gateway Protocol (BGP): for inter-AS routing. Each AS keeps a routing table with reachable hosts and corresponding costs. When a change is detected, only the affected part of the routing table is shared.
Methodology Run a Network Probe Daemon (NPD) at a number of Internet sites (37)
Methodology Each NPD site periodically measures the route to another NPD site using traceroute. Two sets of experiments: D1: each virtual path between two NPDs measured with a mean interval of 1-2 days, Nov-Dec 1994. D2: each virtual path measured with a bimodal distribution of inter-measurement intervals, Nov-Dec 1995: 60% with a mean of 2 hours, 40% with a mean of 2.75 days. Measurements in D2 were paired: measure A=>B and then B=>A.
Methodology Links traversed during D1 and D2
Methodology Exponential sampling Unbiased sampling – measures instantaneous signal with equal probability PASTA principle – Poisson Arrivals See Time Averages Is data representative? Argue that sampled AS’s are on half of the Internet routes Confidence intervals for probability that an event occurs
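Exponential sampling can be sketched in a few lines. This is a minimal illustration, not the NPD implementation: the function name `poisson_schedule`, the 2-hour mean, and the 30-day window are assumptions chosen for the example. Gaps between measurements are drawn from an exponential distribution, so measurement times form a Poisson process and, by PASTA, the samples see the time-average behavior of the path.

```python
import random

def poisson_schedule(mean_interval_s, duration_s, seed=None):
    """Return measurement times (seconds) with exponentially distributed gaps."""
    rng = random.Random(seed)
    t = 0.0
    times = []
    while True:
        # expovariate takes the rate (1 / mean), not the mean itself
        t += rng.expovariate(1.0 / mean_interval_s)
        if t >= duration_s:
            return times
        times.append(t)

# Hypothetical example: mean gap of 2 hours over a 30-day window
schedule = poisson_schedule(mean_interval_s=2 * 3600, duration_s=30 * 86400, seed=1)
print(len(schedule), "measurements scheduled")
```

A fixed-interval schedule would be simpler, but it can synchronize with periodic network behavior and bias the estimates; exponential gaps avoid that.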
Limitations Just a small subset of Internet paths. Just two points at a time. With only end-to-end measurements, it is difficult to say why something happened. 5%-8% of the time the NPDs could not be contacted, which introduces a bias toward underestimation. Why?
Routing Pathologies Persistent routing loops Temporary routing loops Erroneous routing Connectivity altered mid-stream Temporary outages (> 30 sec)
Routing Loops & Erroneous Routing Persistent routing loops (10 in D1 and 50 in D2) Several hours long (e.g., > 10 hours) Largest: 5 routers All loops intra-domain Transient routing loops (2 in D1 and 24 in D2) Several seconds Usually occur after outages Erroneous routing (one in D1) A route UK=>USA goes through Israel
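A routing loop shows up in a traceroute as the same routers reappearing at later hops. The check below is a minimal sketch under that assumption. The function name `find_loop` and the router names are hypothetical, and the criteria actually used in the study (for example, to separate persistent from transient loops) may differ.

```python
def find_loop(hops):
    """Return the repeating cycle of routers if the hop list revisits a router, else None.

    A router answering at two adjacent hops is not treated as a loop.
    """
    last_seen = {}
    for i, router in enumerate(hops):
        if router in last_seen and i - last_seen[router] > 1:
            # Routers between the two visits form the cycle
            return hops[last_seen[router]:i]
        last_seen[router] = i
    return None

# Hypothetical hop lists
print(find_loop(["r1", "r2", "r3", "r2", "r3", "r2"]))  # cycle between r2 and r3
print(find_loop(["r1", "r2", "r3", "r4"]))              # no loop
```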
Route Changes Connectivity change in mid-stream (10 in D1 and 155 in D2): the route changes during a measurement. Recovery times are bimodal: (1) hundreds of milliseconds to seconds; (2) on the order of minutes. Route fluttering: rapid route oscillation. Very little fluttering was seen, and it occurred only within a single AS.
Example of Route Fluttering wustl (St. Louis) to umann (Mannheim, Germany) Solid: 17 hops, dotted: 29 hops
Problems with Fluttering Path properties are difficult to predict: this confuses RTT estimation in TCP and may trigger false retransmission timeouts. Packet reordering: the TCP receiver generates DUPACKs, which may trigger spurious fast retransmits. These problems are serious only for large-scale flutter; localized flutter is usually harmless.
Infrastructure Failures “host unreachable” from router well inside the network. 0.21% in D1, estimate availability rate 99.8%. This dropped to 99.5% in D2.
NPDs unreachable due to too many hops (6 in D2): unreachable when the path exceeds 30 hops. Path length is not necessarily correlated with distance: a 1500 km end-to-end route of 3 hops; a 3 km (MIT – Harvard) end-to-end route of 11 hops.
Temporary Outages Sequence of traceroute packets lost due to temporary loss of connectivity or heavy congestion. In D1(D2), 55% (43%) had 0 losses, 44% (55%) had 1 to 5 losses, and 0.96% (2.2%) had 6 or more.
Distribution of Long Outages (> 30 sec)
Time-of-Day patterns The mean time of day between source and destination is associated with each measurement. Temporary outages: min (0.4%) occurred during 1:00-2:00 h, max (8.0%) during 15:00-16:00 h. Infrastructure failures: min (1.2%) at 9:00-10:00 h, peak during 15:00-16:00 h.
Pathology Summary
Routing Stability Two definitions of stability: Prevalence: the likelihood of observing a particular route, i.e. the steady-state probability that a virtual path at an arbitrary point in time uses a particular route. Conclusion: in general, Internet paths are strongly dominated by a single route. Persistence: how long a route remains unchanged. Affects the utility of storing state in routers. Conclusion: routing changes occur over a wide range of time scales, from minutes to days.
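Routing Stability Routing Prevalence Let $\pi_r$ be the steady-state probability that a virtual path (VP) uses route r at an arbitrary time. Due to PASTA, an unbiased estimator of $\pi_r$ is $\hat{\pi}_r = k_r / n$, where n is the number of measurements of the VP and $k_r$ is the number of those that observed route r. The prevalence of the dominant route is analyzed.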
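The prevalence estimate can be illustrated with a short sketch, assuming the estimator is simply the fraction of a virtual path's measurements that observed a given route. The function names and the example routes are hypothetical; each route is represented as a tuple of router names.

```python
from collections import Counter

def prevalence(observed_routes):
    """Estimate each route's prevalence as (times observed) / (total measurements)."""
    counts = Counter(observed_routes)
    n = len(observed_routes)
    return {route: k / n for route, k in counts.items()}

def dominant_route(observed_routes):
    """Return the most frequently observed route and its estimated prevalence."""
    estimates = prevalence(observed_routes)
    route = max(estimates, key=estimates.get)
    return route, estimates[route]

# Hypothetical measurements of one virtual path: three see one route, one sees another
measurements = [("a", "b", "c")] * 3 + [("a", "d", "c")]
print(dominant_route(measurements))  # (('a', 'b', 'c'), 0.75)
```

The estimate is only unbiased if the measurement times are sampled as a Poisson process, which is why the exponential sampling matters.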
Routing Prevalence In general, Internet paths are strongly dominated by a single route, especially when observed at coarser granularity (city or AS rather than individual hosts).
Routing Persistence The notion of persistence depends on what is deemed persistent. A series of measurements are undertaken to classify routes according to their alternation frequency.
Routing Symmetry Sources of Routing Asymmetry: Link cost metrics can themselves be asymmetric in the two directions. “Hot potato” routing between competing providers: each hands traffic off to the other as early as possible.
Routing Symmetry Analysis of Routing Symmetry Measurements were paired to ensure that an asymmetry is actually being captured. Asymmetry is quite common (49% at city granularity, 30% at AS granularity). Size of Asymmetries: the majority are confined to one hop (one city or AS).
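A symmetry check at a chosen granularity can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: the functions `collapse` and `is_symmetric`, the router names, and the router-to-city mapping are all hypothetical. Each route is reduced to the sequence of cities (or ASes) it visits, and the forward route is compared with the reverse route read backwards.

```python
def collapse(route, label):
    """Map each router to its city/AS label and merge consecutive duplicates."""
    out = []
    for hop in route:
        tag = label[hop]
        if not out or out[-1] != tag:
            out.append(tag)
    return out

def is_symmetric(forward, reverse, label):
    """True if A=>B and B=>A visit the same labels in opposite order."""
    return collapse(forward, label) == collapse(reverse, label)[::-1]

# Hypothetical router-to-city mapping and a paired measurement
city = {"a1": "A", "x1": "X", "x2": "X", "y1": "Y", "b1": "B"}
fwd = ["a1", "x1", "x2", "b1"]   # A => B via city X
rev = ["b1", "y1", "a1"]         # B => A via city Y
print(is_symmetric(fwd, rev, city))                     # False: X one way, Y the other
print(is_symmetric(fwd, ["b1", "x2", "a1"], city))      # True at city granularity
```

The second call shows why granularity matters: the two directions traverse different routers but the same cities, so the path counts as symmetric at city granularity.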
Summary Pathologies doubled during 1995. Asymmetry is quite common. Paths are heavily dominated by a single route. Over 2/3 of Internet paths are reasonably stable (routes persist for days or longer). The other 1/3 vary over many time scales.