1 Locating Internet Bottlenecks: Algorithms, Measurement, and Implications Ningning Hu (CMU), Li Erran Li (Bell Labs), Zhuoqing Morley Mao (U. Mich), Peter Steenkiste (CMU), Jia Wang (AT&T) SIGCOMM’04
2 Goal Locate network bottlenecks along end-to-end paths With such information, network operators can improve routing
3 Difficulties End users cannot obtain information about network internals High measurement overhead
4 Proposed algorithm – Pathneck Pathneck is an active probing tool Low overhead (on the order of 10s-100s of KB) Fast (on the order of seconds) Single-end control (sender only) High accuracy
5 Outline Algorithm Internet validation Testbed validation Internet measurement Applications Conclusion
6 Definition Bottleneck link: link with the smallest available bandwidth. Available bandwidth: residual bandwidth. Choke link: link that has lower available bandwidth than the partial path from the source to that link. Choke point: upstream router of a choke link. [Figure: routers R1, R2, R3 connected by links L1, L2]
7 Definition The last choke link is the bottleneck link. [Figure: path from R1 to R7 over links L1 to L6, with the choke links and the bottleneck marked]
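The two definition slides can be made concrete with a minimal sketch. It assumes the available bandwidth of every link is already known, which is exactly what an end user cannot observe; the bandwidth values and function names are hypothetical. The first link is treated as a choke link because the partial path before it is empty.

```python
import math

def choke_links(avail_bw):
    """Indices of links whose available bandwidth is strictly lower than
    that of every upstream link (the partial path from the source)."""
    chokes = []
    path_min = math.inf  # assumption: an empty partial path imposes no limit
    for i, bw in enumerate(avail_bw):
        if bw < path_min:
            chokes.append(i)
            path_min = bw
    return chokes

def bottleneck(avail_bw):
    """The last choke link is the link with the smallest available bandwidth."""
    return choke_links(avail_bw)[-1]

# Hypothetical available bandwidth (Mbps) for links L1..L6.
bw = [100, 80, 90, 40, 60, 45]
print(choke_links(bw))  # indices of choke links, 0-based
print(bottleneck(bw))   # index of the bottleneck link
```

With these values the choke links are L1, L2 and L4, and L4 is the bottleneck.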
8 Recursive Packet Train (RPT) in Pathneck Load packets: 60 pkts, 500 B, TTL 255 Measurement packets (at both ends of the train): 30 pkts, 60 B Load packets are used to measure available bandwidth Measurement packets are used to obtain location information All are UDP packets
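A rough sketch of the train layout, to show where the overhead figure comes from. The packet counts and sizes are the ones on the slide; the TTL ordering of the measurement packets (1 up to 30 at the head, 30 down to 1 at the tail) is an assumption not spelled out here, and the `Packet` type and `build_rpt` name are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    kind: str   # "measurement" or "load"
    ttl: int
    size: int   # bytes

def build_rpt(n_meas=30, n_load=60, meas_size=60, load_size=500):
    """Describe one recursive packet train as a list of packets.

    Assumption: head measurement packets carry TTL 1..n_meas and tail
    measurement packets carry TTL n_meas..1, so each router on the path
    drops one packet from each end of the train.
    """
    head = [Packet("measurement", ttl, meas_size) for ttl in range(1, n_meas + 1)]
    load = [Packet("load", 255, load_size) for _ in range(n_load)]
    tail = [Packet("measurement", ttl, meas_size) for ttl in range(n_meas, 0, -1)]
    return head + load + tail

train = build_rpt()
total_bytes = sum(p.size for p in train)
print(len(train), total_bytes)
```

Under these assumptions one train is 120 packets and 33,600 bytes, which is consistent with the "10s-100s of KB" overhead claimed earlier when a destination is probed several times.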
9-13 Gap value [Figure, built up over five slides: time axis between sender and router for one packet train. The router drops a measurement packet and sends an ICMP, which the sender receives. The router later drops a second measurement packet and sends another ICMP, which the sender also receives. The gap value is the time between the two ICMP packets.]
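As a minimal sketch of the bookkeeping on the sender side: for each hop, the gap value is taken as the difference between the arrival times of the two ICMP packets from that hop. The dictionary layout, the microsecond timestamps, and the function name are hypothetical; hops with a missing ICMP are skipped.

```python
def gap_values(icmp_recv_times):
    """Compute per-hop gap values from ICMP arrival times at the sender.

    icmp_recv_times maps hop number -> (t_head, t_tail), the arrival times
    of the ICMP packets triggered by the head and tail measurement packets.
    A missing ICMP is represented by None, and that hop is left out.
    """
    gaps = {}
    for hop, (t_head, t_tail) in icmp_recv_times.items():
        if t_head is None or t_tail is None:
            continue
        gaps[hop] = t_tail - t_head
    return gaps

# Hypothetical arrival times in microseconds.
times = {1: (1000, 3400), 2: (1500, 3900), 3: (2100, None), 4: (2600, 6200)}
print(gap_values(times))
```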
14 Train length Link capacity: train_rate > a_bw, train_length increases; train_rate ≤ a_bw, train_length stays the same Traffic load: heavily loaded, train_length increases; lightly loaded, train_length stays the same
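The rule on this slide can be illustrated with a small sketch. The slide only says whether the train length changes; how much it grows is modeled here with a simple fluid cross-traffic model, which is an assumption made for illustration and not something the slides state. All names and numbers are hypothetical.

```python
def train_length_after_link(length_in, train_bits, capacity, avail_bw):
    """Train length (seconds) after crossing one link.

    Rates are in bits per second. If the train rate does not exceed the
    available bandwidth, the length is unchanged. Otherwise the train shares
    the link with cross traffic (assumed fluid model) and stretches out.
    """
    rate_in = train_bits / length_in
    if rate_in <= avail_bw:
        return length_in
    cross = capacity - avail_bw                      # rate of competing traffic
    rate_out = capacity * rate_in / (rate_in + cross)
    return train_bits / rate_out

bits = 60 * 500 * 8          # 60 load packets of 500 B
length = 0.003               # 3 ms train, roughly 80 Mbps

lightly_loaded = train_length_after_link(length, bits, 100e6, 90e6)
heavily_loaded = train_length_after_link(length, bits, 100e6, 40e6)
print(lightly_loaded, heavily_loaded)
```

The lightly loaded link leaves the length untouched, while the heavily loaded one increases it; that increase is what shows up as a larger gap value at the following routers.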
15 Transmission of RPT R1 S R2 R g1 g2 g gap values are the raw measurement
16 Inference Model – Step 1 Label the gap sequence Remove a hop's data if both ICMP packets cannot be obtained Remove the entire probing data if gap values are available for no more than half of the routers on the path Fix hill and valley points Given a certain number of steps, minimize the total distance between the individual values and the average step values
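The last item of Step 1 amounts to fitting a step function to the gap sequence. A minimal dynamic-programming sketch follows; it assumes "distance" means absolute difference and it leaves out the hill/valley fix and the data-removal rules. The function name and the sample gap values are hypothetical.

```python
def fit_steps(gaps, n_steps):
    """Split gaps into n_steps contiguous steps, minimizing the total
    absolute distance between each value and the average of its step.

    Returns (total_distance, [(start, end), ...]) with end exclusive.
    """
    n = len(gaps)

    def cost(i, j):
        seg = gaps[i:j]
        avg = sum(seg) / len(seg)
        return sum(abs(g - avg) for g in seg)

    inf = float("inf")
    # best[k][j]: minimum distance when gaps[:j] is covered by k steps
    best = [[inf] * (n + 1) for _ in range(n_steps + 1)]
    cut = [[0] * (n + 1) for _ in range(n_steps + 1)]
    best[0][0] = 0.0
    for k in range(1, n_steps + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j] = c
                    cut[k][j] = i

    # Walk back through the recorded cut positions to recover the steps.
    bounds, j = [], n
    for k in range(n_steps, 0, -1):
        i = cut[k][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()
    return best[n_steps][n], bounds

# Hypothetical gap values (microseconds) for eight hops.
gaps = [100, 102, 98, 150, 152, 149, 200, 201]
distance, steps = fit_steps(gaps, 3)
print(steps)
```

The boundaries between steps are the candidate choke points: the hops where the gap value jumps.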
17 Inference Model – Step 2 Confidence Threshold (conf) Percentage change of available bandwidth To filter out the gap measurement noise Default: conf ≥ 10% available bandwidth change Detection Rate (d_rate) # positive probing / # total probing A hop must appear as a choke point for at least M times (d_rate ≥ M/N) To select the most frequent choke point Default: d_rate ≥ 5/10 = 50%
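The two filters of Step 2 can be sketched as follows. The sketch assumes each probing has already produced candidate choke points with a conf value attached (how conf is computed from the gap values is not shown on the slide, so it is taken as input). The data layout and function name are hypothetical.

```python
from collections import Counter

def frequent_choke_points(probings, conf_min=0.10, d_rate_min=0.5):
    """Apply the conf and d_rate filters.

    probings is a list with one dict per probing, mapping hop -> conf for
    every candidate choke point seen in that probing. A hop counts as
    positive in a probing if its conf is at least conf_min. Returns
    hop -> d_rate for hops whose d_rate is at least d_rate_min.
    """
    hits = Counter()
    for candidates in probings:
        for hop, conf in candidates.items():
            if conf >= conf_min:
                hits[hop] += 1
    n = len(probings)
    return {hop: count / n for hop, count in hits.items() if count / n >= d_rate_min}

# Hypothetical results of N = 10 probings of one destination.
probings = [{4: 0.30, 7: 0.05}] * 7 + [{7: 0.05, 9: 0.20}] * 3
print(frequent_choke_points(probings))
```

Here hop 7 is dropped by the conf filter (5% change), hop 9 by the d_rate filter (3 of 10 probings), and only hop 4 survives with d_rate 0.7.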
18 Inference Model – Step 3 Rank choke points Bottleneck is the choke point with largest gap value
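Step 3 is a simple ranking. A minimal sketch, assuming each surviving choke point comes with a representative gap value (for instance an average over the probings, which is an assumption here); the names and numbers are hypothetical.

```python
def rank_choke_points(gap_by_hop):
    """Order choke points by gap value, largest first.

    The first entry is reported as the bottleneck.
    """
    return sorted(gap_by_hop, key=gap_by_hop.get, reverse=True)

# Hypothetical gap values (microseconds) for three choke points.
ranked = rank_choke_points({2: 2900, 4: 3600, 6: 3100})
print(ranked)
```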
19 Pathneck – configuration Each probing set contains packets Probe the same destination times Each probing set takes one RTT (wait for 3 seconds, max RTT) conf ≥ 10% filtering d_rate ≥ 50% filtering
20 Output from Pathneck Bottleneck location (last choke point) Upper or lower bound for the link available bandwidth Based on the gap values from each router (details in the paper)
21 Limitations Cannot measure the last hop Limited ICMP rate ICMP packet generation time and reverse path congestion can introduce measurement error Generation time is insignificant Filter out measurement outliers [Figure: sender/router timeline contrasting the measured gap value with the true gap value]
22 Limitations Packet loss and route change will disable the measurements Multiple probings can help Cannot pass firewalls Similar to most other tools and usually not bottleneck Bias towards early choke points If change is insignificant, filtered out by confidence threshold
23 Validation Internet validation Abilene network Testbed validation Emulab, a fully controlled environment
24 Internet validation (Abilene) Sources: CMU and University of Utah 22 probing destinations for each source Each of the 11 major routers on the Abilene backbone is included in at least one probing path For each destination, probe 100 times with a 2-second interval between consecutive probings
25 Internet validation (Abilene) Detected only 5 non-first-hop bottlenecks Abilene paths are over-provisioned The detected bottlenecks are outside the Abilene network, so they cannot be verified
26 Testbed validation (Emulab) 100 probing sets Use only the results for which all ICMP packets were received Entire probing interval is about 1 min
27 Comparing impact of capacity and load Left figure: Fix X to 50 Mbps Vary Y from 21 to 30 Mbps with step size 1 Mbps Right figure: Set X and Y to 50 Mbps Vary the CBR load on Y from 29 to 20 Mbps Bottleneck available bandwidth changes from 21 to 30 Mbps
28 Testbed validation (Emulab) Probing set can identify Y as bottleneck 86 individual probings: 7 X (correct), 65 Y (correct), 14 X (incorrect) Errors are due to the small difference between X and Y
29 Testbed validation (Emulab) 67 X (correct), 2 Y (correct), 8 X (incorrect) Errors are due to the small difference between X and Y
30 Testbed validation (Emulab)
31 Measurement Methodology Probing sources 58 probing sources (from PlanetLab & RON) Probing destinations Over 3,000 destinations from each source Covers as many distinct AS paths as possible 10 probings for each destination conf 10%, d_rate 50% Duration is within 2 days
32 Popularity <2% of paths report more than 3 choke links Popularity = # positive probes of link b / # probes that traverse link b Half of the choke links are detected in 20% or less of the probes Sometimes a choke link cannot be detected due to bursty traffic (filtered)
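The popularity metric is a plain ratio, sketched below. The representation of a probe as a pair (links traversed, links reported as choke links) and the link names are hypothetical.

```python
def popularity(probes, link):
    """Popularity of a link: positive probes / probes that traverse it.

    probes is a list of (links_traversed, choke_links) pairs, one per probe.
    Returns None if no probe traverses the link.
    """
    traversing = [(path, chokes) for path, chokes in probes if link in path]
    if not traversing:
        return None
    positive = sum(1 for _, chokes in traversing if link in chokes)
    return positive / len(traversing)

# Hypothetical probes: five traverse link "b", one reports it as a choke link.
probes = [
    ({"a", "b", "c"}, {"b"}),
    ({"a", "b", "c"}, {"c"}),
    ({"a", "b", "d"}, set()),
    ({"a", "b", "d"}, {"d"}),
    ({"a", "b", "c"}, {"a"}),
    ({"a", "e"}, {"e"}),
]
print(popularity(probes, "b"))
```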
33 Bottleneck Distribution Common assumption: bottlenecks are most likely to appear on the peering and access links, i.e., on Inter-AS links Identifying Inter/Intra-AS links Using AS numbers alone is not enough (Mao et al. [SIGCOMM03]) We define Intra-AS links as links at least one hop away from links where the AS number changes Two types of Inter-AS links: Inter0-AS & Inter1-AS links We identify a subset of the real Intra-AS links
34 Bottleneck Distribution (cont.) Up to 40% of bottleneck links are Intra-AS Consistent with earlier results [Akella et al IMC03]
35 Location
36 Stability Sample 30 destinations randomly Divide the 3-hour measurement into 9 epochs of 20 minutes each In each epoch, run 5 probing trains
37 Conclusion Pathneck is effective and efficient in locating bottlenecks Only the sender is modified, low overhead Up to 40% of bottleneck links are Intra-AS 54% of the bottlenecks can be inferred correctly Can guide overlay routing and multihoming
38 References Ningning Hu et al., “Locating Internet Bottlenecks: Algorithms, Measurement, and Implications,” SIGCOMM’04 Related technical report
39 Abilene Network Map
40 MRTG for Abilene Network
41 Table 4: Probing sources from PlanetLab (PL) and RON
42 Table 4: Probing sources from PlanetLab (PL) and RON
43 Impact of configuration parameters
44 Inference 54% of inferences are successful for 12,212 paths with “enough information” Helps to reduce the measurement overhead [Figure: source S, destination D, and routers R along the paths]
45 Inference Take lowest upper bound and highest lower bound Include upper bound if standard deviation is less than 20% of average Divide into training set and testing set Exclude if testing set cannot identify bottleneck
46 Inference