End-to-end Bandwidth Estimation in the Wide Internet
Daniele Croce
PhD dissertation, April 16, 2010
Internet is wonderful
“Breakfast Can Wait. The Day’s First Stop Is Online.” [NYTimes ’09]
But is our connection performing well?
The Internet
Interconnected networks
–Different technologies, many operators
–No global view
[Figure: a path crossing networks Net1, Net2, Net3, Net4]
Objective: characterize the end-to-end (E2E) performance
Performance metrics
Simple metrics
–Packet loss
–Delay (one-way, RTT), jitter
–(TCP) throughput
Advanced metrics
–End-to-end capacity: $C = \min_i C_i$
–End-to-end available bandwidth (AB), i.e., the unused capacity: $A = \min_i A_i$
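Available bandwidth
On a generic link $i$, with capacity $C_i$ and average utilization $u_i$ over the measurement interval: $A_i = C_i (1 - u_i)$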
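An example: narrow link and tight link
[Figure: a path of three links with (capacity, available bandwidth) of (100, 90), (1000, 400) and (155, 20) Mbps. The narrow link is the 100 Mbps link; the tight link is the 155 Mbps link with only 20 Mbps available. End-to-end: C = 100 Mbps, AB = 20 Mbps]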
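The narrow/tight distinction follows directly from the two minima $C = \min_i C_i$ and $A = \min_i A_i$. A minimal Python sketch using the (capacity, AB) pairs of the example above; the link order and the function name `path_metrics` are assumptions made for illustration only.

```python
from typing import List, Tuple


def path_metrics(links: List[Tuple[float, float]]) -> dict:
    """links: one (capacity, available_bandwidth) pair per hop, in Mbps.

    End-to-end capacity is set by the narrow link (min capacity);
    end-to-end AB is set by the tight link (min available bandwidth).
    """
    capacities = [c for c, _ in links]
    avail = [a for _, a in links]
    narrow = min(range(len(links)), key=lambda i: capacities[i])
    tight = min(range(len(links)), key=lambda i: avail[i])
    return {
        "capacity": capacities[narrow],
        "available_bandwidth": avail[tight],
        "narrow_link": narrow,
        "tight_link": tight,
    }


# (capacity, AB) pairs from the example, hypothetical hop order
print(path_metrics([(100, 90), (1000, 400), (155, 20)]))
# {'capacity': 100, 'available_bandwidth': 20, 'narrow_link': 0, 'tight_link': 2}
```

Note that the narrow and tight links are different hops here: the path with the smallest pipe is not the one limiting the spare capacity.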
Contribution 1
Existing tools require access to both end hosts
–Impossible between different organizations!
Three single-ended tools
Contribution 2
Large-scale deployments of active AB tools
–Routing, P2P optimization, TCP improvements
Performance evaluation of AB techniques in large-scale measurement systems
Contribution 3 (NEW!!!)
Three AB measurement paradigms exist:
–PRM (Probe Rate Model): “Is the probing rate higher than the AB?”
–PGM (Probe Gap Model): “Has the inter-packet gap increased?”
–PDM (Probe Delay Model): “Has the packet been queued?”
PDM has only been studied analytically or through simulation: is it better than PRM or PGM?
Real implementation and comparison with classic PRM and PGM tools
SINGLE-ENDED TECHNIQUES
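Non-cooperative estimation
[Figure: the sender sends TCP ACK probes through the DSLAM to the receiver, which replies with TCP RSTs; the forward path (ACKs) and the reverse path (RSTs) are shown separately]
$RTT = OWD_f + OWD_r$
Can we separate the effects of the two paths?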
Where is the tight link?
[Figure: ACK probes on the forward path, RSTs on the reverse path]
RSTs are always 40 bytes, no matter the size of the ACK probes
By varying the ACK size:
–We can load the two paths equally ($S_{ACK} = S_{RST}$)
–We can load the downlink more than the uplink ($S_{ACK} > S_{RST}$)
–We can NOT load the uplink more than the downlink: that would require $S_{ACK} < S_{RST}$, but a 40-byte RST is already the smallest TCP/IP packet
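The byte arithmetic behind these cases can be sketched in Python. This is a minimal sketch under stated assumptions: every ACK probe elicits exactly one 40-byte RST, rates are counted at the IP layer, and ADSL link-layer (ATM) overhead is ignored; the function name `path_loads` is hypothetical.

```python
RST_SIZE = 40  # bytes: a RST carries no payload (20-byte IPv4 + 20-byte TCP header)


def path_loads(ack_size: int, pkt_rate: float) -> tuple:
    """Return (downlink_bps, uplink_bps) generated by ACK probes of
    `ack_size` bytes sent at `pkt_rate` packets/s (assumed: one RST per ACK)."""
    if ack_size < RST_SIZE:
        # A TCP/IPv4 ACK without options is already 40 bytes long
        raise ValueError("ACK probes cannot be smaller than a 40-byte RST")
    downlink = pkt_rate * ack_size * 8  # ACKs travel sender -> ADSL host
    uplink = pkt_rate * RST_SIZE * 8    # RSTs travel ADSL host -> sender
    return downlink, uplink


# S_ACK = S_RST: both directions carry the same load
print(path_loads(40, 100))    # (32000, 32000)
# S_ACK > S_RST: the downlink carries 1500/40 = 37.5 times the uplink load
print(path_loads(1500, 100))  # (1200000, 32000)
```

Because the uplink load is pinned at 40 bytes per probe, the ratio $S_{ACK}/S_{RST}$ can only be at least 1, which is exactly the asymmetry the slide exploits.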
ABw-Probe (ABP)
Measuring the downlink (no uplink traffic)
Impact of “cross”-traffic on the uplink
[Figure: cooperative vs. non-cooperative estimates]
Uplink cross-traffic
Filtering uplink cross-traffic
Cross-traffic is not made of MTU-sized packets only
–Use DT to remove large packets
–Then use RR for refining
FAB-probe (large-scale)
Do we really need 40 kbps precision?
Real-world experience
Tested on 1244 ADSL hosts from 10 different ISPs
–Participating in the Kademlia DHT (eMule)
–Found with the KAD crawler (ACM IMC 2007)
–ADSL hosts selected using MaxMind
Measured:
1. Capacity of the ADSL link
2. A snapshot of the available bandwidth
3. Average AB over 10 days
–82 hosts online for over one month
–Static IP address
–Measured every 5 minutes
On average, 6 seconds per measurement
Capacity estimation
Comparison of two large ISPs
The policy used by Free is quite uncommon (see IMC07)
[Figure: downlink and uplink capacity of the two ISPs; labeled values 0.7 Mbps, 2.5 Mbps, 0.3 Mbps, 1 Mbps]
Available bandwidth (I)
Snapshot of 1244 (eMule) hosts
Hosts are divided into congested and idle
Available bandwidth (II)
82 hosts, 10-day average
–Each point is the average of one user over 10 days
30% congested, 30-40% frequently idle
ANALYSIS OF LARGE-SCALE AB MEASUREMENT SYSTEMS
Motivation
We have a dream: measure AB everywhere
–Route selection, server selection
–Overlay performance optimization
–TCP improvements
–...
Naïve approach:
–Pick one of the existing techniques!
BUT what if we all do the same simultaneously?
Interference between measurements
In brief
Existing techniques
–Pathload, Spruce, pathChirp
Experimental testbed
–All tools suffer from mutual interference, but not in the same way!!!
–High intrusiveness and overhead
Analytical models
–Probability of interference
–Measurement bias
What can we do?
Pathload – Packet Trains
Probing strategy:
–Iteratively send N trains at different rates
–Binary search to converge to the AB
Inference:
–Detect a One-Way Delay increase (rate > AB)
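The rate iteration can be sketched as a plain binary search. This is a simplified sketch, not Pathload's code: the callback `rate_exceeds_ab` is a hypothetical stand-in for sending a fleet of trains and testing their OWDs for an increasing trend, and Pathload's grey-region handling is omitted.

```python
from typing import Callable, Tuple


def pathload_search(rate_exceeds_ab: Callable[[float], bool],
                    r_min: float, r_max: float,
                    resolution: float) -> Tuple[float, float, int]:
    """Binary search on the probing rate (simplified Pathload iteration).

    `rate_exceeds_ab(r)` stands in for sending N trains at rate r and
    testing whether their one-way delays increase (rate > AB).
    Returns (low, high, iterations); [low, high] brackets the AB.
    """
    low, high, iterations = r_min, r_max, 0
    while high - low > resolution:
        rate = (low + high) / 2
        if rate_exceeds_ab(rate):
            high = rate  # OWDs increase: the rate is above the AB
        else:
            low = rate   # no OWD trend: the rate is at or below the AB
        iterations += 1
    return low, high, iterations


# Hypothetical oracle for a path whose AB is 6 Mbps (rates in Mbps)
print(pathload_search(lambda r: r > 6.0, 0.0, 10.0, 0.1))
# (5.9375, 6.015625, 7)
```

Each iteration halves the search interval and costs one fleet of trains, so the number of iterations grows with $\log_2$ of range over resolution; a coarser target resolution directly shortens a measurement.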
Spruce – Packet Pairs
Probing strategy:
–Two packets with a specific inter-packet gap
Inference:
–Measure the dispersion (gap increase) of the pair
–Accuracy is debated, out of our scope
[Figure: a pair sent with gap $\Delta_{in}$ crosses the bottleneck and leaves with gap $\Delta_{out}$]
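For reference, the Spruce estimator turns each pair's dispersion into an AB sample, $A = C\,(1 - (\Delta_{out} - \Delta_{in})/\Delta_{in})$, with $\Delta_{in}$ set to the transmission time of one probe at the bottleneck, and averages the samples. A minimal Python sketch; the function name and the example numbers (10 Mbps bottleneck, 1500-byte probes, 40% gap stretch) are hypothetical.

```python
from statistics import mean


def spruce_estimate(capacity_bps: float, packet_bits: float,
                    gaps_out: list) -> float:
    """Spruce-style estimate: A = C * (1 - (gap_out - gap_in) / gap_in),
    averaged over all pairs. gap_in is the transmission time of one probe
    at the bottleneck, so each pair is sent at the bottleneck capacity."""
    gap_in = packet_bits / capacity_bps
    samples = [capacity_bps * (1 - (g - gap_in) / gap_in) for g in gaps_out]
    return mean(samples)


C = 10e6               # hypothetical 10 Mbps bottleneck
gap_in = 1500 * 8 / C  # 1.2 ms for 1500-byte probes
# Cross-traffic stretched every gap by 40%, i.e. roughly 6 Mbps available
print(spruce_estimate(C, 1500 * 8, [1.4 * gap_in, 1.4 * gap_in]))
```

The formula also explains the interference problem: any foreign packet that lands between the two probes inflates $\Delta_{out}$ exactly as cross-traffic would, and is counted as load.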
Interference in Spruce
One pair interfering…
[Figure: a foreign pair lands inside the gap, so $\Delta_{out}$ grows with respect to $\Delta_{in}$: 100% error!]
What is the probability that this happens?
–Hint: similar to the ALOHA protocol
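Following the ALOHA hint, a first-order estimate treats each host's pairs as a Poisson process and asks whether a foreign pair starts within the "vulnerable period" of twice the pair duration. This is a sketch of the pure-ALOHA analogy only, not the dissertation's exact model; the function name and the numbers (10 pairs/s per host, 1.2 ms pairs) are hypothetical.

```python
import math


def p_interference(n_hosts: int, pair_rate: float, duration: float) -> float:
    """Pure-ALOHA analogy for Spruce pairs.

    Each of n_hosts other senders starts pairs as a Poisson process of
    `pair_rate` pairs/s; a pair occupies the bottleneck for `duration` s.
    Our pair is disturbed if any foreign pair starts within the
    2 * duration window around its own start (the ALOHA vulnerable period).
    """
    offered_load = n_hosts * pair_rate * duration  # G in ALOHA terms
    return 1 - math.exp(-2 * offered_load)


# Hypothetical numbers: 1.2 ms pairs, 10 pairs/s per host
for n in (1, 10, 30):
    print(n, round(p_interference(n, 10, 1.2e-3), 3))
```

As in ALOHA, the probability grows quickly with the offered load $G$, so it is the number of concurrent senders, not any single one, that makes interference likely.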
pathChirp – Packet “chirps”
Probing strategy:
–One train with exponentially increasing rate
Inference:
–Detect a One-Way Delay increase
[Figure: OWD along the chirp, rising once the rate exceeds the ABw limit]
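The chirp idea can be sketched in two steps: build the geometrically increasing rates, then find where the queuing delay starts to grow for good. This is a deliberately simplified inference, assuming noise-free delays; pathChirp's actual excursion-based analysis is more robust. The function names, the spread factor of 2 and the delay values are hypothetical.

```python
from typing import List


def chirp_rates(packet_bits: float, low_rate: float, spread: float,
                n_packets: int) -> List[float]:
    """Instantaneous rates of an exponential chirp: each gap is the previous
    one divided by the spread factor, so the rate grows geometrically."""
    gap0 = packet_bits / low_rate
    return [packet_bits / (gap0 / spread ** k) for k in range(n_packets - 1)]


def simple_chirp_estimate(rates: List[float], delays: List[float]) -> float:
    """Return the rate of the first gap after which the queuing delays keep
    increasing until the end of the chirp. If they never do, report the
    highest probed rate (the AB is at least that high)."""
    if len(delays) != len(rates) + 1:
        raise ValueError("need one delay per packet and one rate per gap")
    for k in range(len(rates)):
        if all(delays[j] < delays[j + 1] for j in range(k, len(delays) - 1)):
            return rates[k]
    return rates[-1]


rates = chirp_rates(8000, 1e6, 2.0, 5)   # about 1, 2, 4, 8 Mbps
delays = [0.0, 0.0, 0.0, 1e-4, 3e-4]     # queue starts building at the 3rd gap
print(simple_chirp_estimate(rates, delays))  # about 4 Mbps
```

A single chirp sweeps many rates at once, which is why pathChirp needs no binary search; the price is a constant probing load per chirp.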
Testbed results
62 hosts running Linux
–Half are senders, half receivers
Single bottleneck (10 Mbps), CBR traffic
–Ideal conditions for ABw tools
–Errors are due to mutual interference only
Pathload
Spruce
True?? How much OVERHEAD?
Results are biased
pathChirp
Results seem better
True?? High OVERHEAD!
Intrusiveness
[Figure: intrusiveness of the tools, with annotations x10 and x100]
Possible solutions
Mutual interference
–Direct probing is more promising: simple, Spruce-like algorithms, no binary search
–Identify interference (and correct it)??
Overhead
–“In-band” measurements (piggy-backing): best, no overhead at all, but complex (SIGCOMM09) and subject to delay constraints
–“Out-of-band” measurements: at least, make the overhead scale with the ABw!
Let’s help each other! Network tomography
Conclusions
Non-cooperative estimation
–Three highly optimized tools
–No need to install software or buy new equipment
–An Italian ISP is already interested!
Analysis of large-scale AB measurements
–Tools cannot be used off-the-shelf: mutual interference, intrusiveness, overhead
–Interference can be predicted and modeled
–Discussed possible solutions
Future work includes
–Technologies different from ADSL (cable, FTTH)
–New, lightweight techniques (passive?), tomography
BACKUP
Collisions with ON-OFF measurements
Even a few hosts cause > 10% collisions!
Non-cooperative estimation
Which hosts answer which probes (Monarch, IMC’06)
Measurement bias: Spruce
Measurement error in Spruce
–Depends on the number n of interfering pairs:
–The average number of interfering pairs is
This explains why the Spruce bias is proportional to
Pathload interference, two trains
pathChirp, two chirps
With only two trains, errors up to 80%!!!
Measurement overhead
Spruce
–Overhead = min(240 kbps, 5% of the bottleneck capacity)
–A few hosts can consume a LOT of bandwidth!
Pathload
–Overhead ≈ ABw
–Con: measurements consume all the ABw
–Pro: overhead “scales” with the ABw
pathChirp
–Overhead = 300 kbps (tunable parameter)
–What if 10 hosts are measuring? If 100?
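The closing questions can be answered with back-of-the-envelope arithmetic. A minimal sketch, assuming the worst case where all hosts probe continuously and simultaneously through the same 10 Mbps bottleneck (the testbed value); Pathload is left out because its overhead tracks the ABw itself. The helper names are hypothetical.

```python
def spruce_overhead(capacity_bps: float) -> float:
    """Spruce: min(240 kbps, 5% of the bottleneck capacity)."""
    return min(240e3, 0.05 * capacity_bps)


PATHCHIRP_OVERHEAD = 300e3  # bps per host (tunable parameter)


def aggregate_share(n_hosts: int, per_host_bps: float,
                    capacity_bps: float) -> float:
    """Fraction of the bottleneck consumed when n_hosts measure at once."""
    return n_hosts * per_host_bps / capacity_bps


C = 10e6  # assumed shared 10 Mbps bottleneck
for n in (10, 100):
    print(f"{n} hosts: Spruce {aggregate_share(n, spruce_overhead(C), C):.0%}, "
          f"pathChirp {aggregate_share(n, PATHCHIRP_OVERHEAD, C):.0%}")
# 10 hosts: Spruce 24%, pathChirp 30%
# 100 hosts: Spruce 240%, pathChirp 300%
```

With 100 hosts, both fixed-rate tools would need more than the whole bottleneck, which is why the slides argue for overhead that scales with the ABw.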
With traffic load
20 hosts running, ABw = 6 Mbps
All tools together
9 hosts per tool (27 senders)
Delay-based tools
Consider a single-server queue
–The utilization can be computed as $u = 1 - \pi_0$, where $\pi_0$ is the probability of the queue being empty
–Probe Delay Model (PDM) tools estimate $\pi_0$
PDM tools
–Make no assumptions on cross-traffic
–Inject very little overhead: no need for high probing rates
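Combining $u = 1 - \pi_0$ with $A = C(1 - u)$ gives $A = C\,\pi_0$, so a PDM tool only needs the fraction of probes that found the queue empty. A minimal sketch, not the dissertation's implementation: it assumes the minimum observed OWD corresponds to an empty queue and counts a probe as "not queued" if its OWD is within a threshold of that minimum (default 100 µs, the tolerance reported in the threshold-problem backup slide). A constant clock offset cancels because only differences from the minimum are used; clock skew is ignored. The function name and OWD values are hypothetical.

```python
from typing import Sequence


def pdm_estimate(owds: Sequence[float], capacity_bps: float,
                 threshold: float = 100e-6) -> float:
    """PDM-style sketch: pi_0 is the fraction of probes whose one-way delay
    is within `threshold` seconds of the minimum observed delay (assumed to
    mean an empty queue). Then u = 1 - pi_0 and AB = C * (1 - u) = C * pi_0."""
    base = min(owds)  # assumed: propagation + transmission, no queuing
    empty = sum(1 for d in owds if d - base <= threshold)
    pi_0 = empty / len(owds)
    return capacity_bps * pi_0


# Hypothetical OWDs in seconds (a constant clock offset would not matter)
owds = [0.0100, 0.01005, 0.0105, 0.0102, 0.0100, 0.0110, 0.01003, 0.0104]
print(pdm_estimate(owds, 10e6))  # 4 of 8 probes saw an empty queue -> 5000000.0
```

Probes here only need to sample the queue state, not fill the link, which is where the low overhead of PDM comes from; the price is the sensitivity to the threshold choice.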
Forecaster model
The AB is estimated by “projecting” the utilization
Threshold problem
In our experiments, we must allow ~100 µs for inaccuracies!