End Point-Based Routing Strategies for Improving Internet Performance and Resilience. Aditya Akella, Thesis Oral, June 22, 2005.
Internet Access Speeds are Improving: from a few kbps (years ago) to 1.5 Mbps, 10 Mbps, 45 Mbps, and now >100 Mbps (well-connected).
Higher Access Speed, Better Internet Experience? [Figure: download times across the Internet for well-connected end-points; link labels 100 Mb, 1 Gb, 1 Tb, 1 Tbps]
Performance Bottlenecks in the Internet: Constrained network links limit performance (response times, transfer speeds, resilience). Wide-area bottlenecks: little spare capacity; inside or between ISPs. [Figure: ISPs Verio, ATT, Cogent, MCI, Sprint, Verizon]
Wide-area Bottlenecks are Prevalent: Quite prevalent and very congested [Akella03]: 50% of paths have bottlenecks; spare capacity < 10 Mbps. No hope for good performance?
Internet Routing is Rigid: The Internet's rich topology offers many alternate paths! But Internet routing is rigid and not performance-aware: it provides one path per destination. Must enable extra flexibility for end-points. Extra flexibility: more routes, informed choice; special mechanisms that work over Internet routing.
Central Questions: How much extra flexibility do end-points need? What mechanisms could they employ? Past research advocates arbitrary flexibility through Internet-wide special-purpose infrastructures. My thesis: Arbitrary flexibility is not necessary. It is sufficient to intelligently choose from 2 or 3 routes. No special infrastructure; end-point based.
Multihoming Route Control: Use multiple ISP connections in a smart manner. Up to 30% better Internet performance (RTT, throughput, availability). Performance comparable with enabling arbitrary flexibility. Benefits can be realized via simple techniques, e.g., ISP probing. [Figure: ISPs Verizon, ATT, Cogent, MCI, Sprint, Verio]
Dissertation Road-Map: Where do performance problems lie? (IMC 03). What mechanisms help overcome the problems? (SIGCOMM 03, SIGCOMM 04). Practical Multihoming Route Control System (SIGCOMM 03, USENIX 04). Will such systems be effective in the future? (PODC 03). Scope markers: Thesis proposal; This talk.
Outline of the Talk: Wide-Area Bottlenecks; Past Work on Improving Internet Performance; Benefits of Multihoming Route Control; Route Control Implementation; Performance Scaling in the Internet; Summary and Open Issues.
Measuring Wide-Area Bottlenecks: A wide-area bottleneck is where an unconstrained TCP flow sees delays and losses. A new TCP-like tool to identify bottlenecks: BFind. 78 ISPs probed in all, from 26 sources (PlanetLab). [Figure: probed ISPs by tier, tier-1 through tier-4]
Measurement Results: Found bottlenecks in 900 paths (out of 2028), ~45% of all paths; 55% had >40 Mbps available capacity.
Link class                                 %bottlenecks   %all links
Intra-ISP links: Tiers 3, 4                25%            9%
Intra-ISP links: Tier 1                    15%            51%
Inter-ISP links: Tiers (3,4) – Tiers (3,4) 17%            4%
Inter-ISP links: Tier 1 – Tier 1           2%             6%
Low-latency                                38%            57%
High-latency                               19%
Available bandwidth at bottlenecks: Tier-1 ISP bottlenecks have the highest available bandwidth; tier-2 and tier-3 bottlenecks are identical.
Performance-Aware Internet Routing: Intra-domain traffic engineering [Shaikh99]; inter-domain traffic engineering [Feamster03]. Estimate traffic, assign routing weights such that no link is overloaded.
Matrix over nodes A, B, C, D (from the slide figure):
     A     B     C     D
A    -     1.0   0.6   0.1
B    0.2   -     0.3   1.0
C    0.8   0.4   -     0.2
D    0.5   0.7   0.5   -
[Figure: ATT with nodes A, B, C, D; Sprint with nodes A', B', C', D']
Problems: limited to 1-2 ISPs; not end-to-end; coarse time-scales.
Overlay Routing for Better End-to-End Performance: An overlay network composes Internet routes on the fly through overlay nodes. n! route choices; very high flexibility. Significantly improves Internet performance [Savage99, Andersen01]. Example: download cnn.com over Internet2. Problems: poor interaction with ISP policies; expensive. Is arbitrary flexibility essential?
Multihoming Route Control: Multiple ISPs give multiple BGP paths. ISPs differ in connectivity and performance (e.g., one is better at night, another is better in the mornings). Also, ISP paths are non-overlapping [Akella03, Teixeira03]. Cleverly exploit this variation and independence: Multihoming Route Control is the clever use of multiple ISP routes. Moderately improved routing flexibility; end-only mechanism. [Figure: ISPs Verio, Sprint, ATT]
Multihoming Route Control Challenges: Potential benefits (an oracle tells which ISP is best)? How many ISPs? Which ISPs? Benefits in practice? Deployment issues? Pricing, global effects, routing tables. Later in the talk… [Figure: ISPs Verio, Sprint, ATT]
Measurement Testbed: Multihoming emulation to answer: Potential benefits? How many ISPs? Which ISPs? Akamai: 68 nodes, 17 US cities; nodes singly-homed to distinct ISPs; attached to ISPs of different "sizes". Concurrent measurements over 1 week, then offline analysis. Multihoming setup: control traffic scheduling; multiple destinations; many cities. City table: Atlanta 4, Boston 3, Chicago 8, Dallas, Los Angeles 6, New York, San Francisco 10, Seattle, Washington DC, Others 13.
Potential Benefits of Multihoming to k ISPs (Potential benefits? How many ISPs? Which ISPs?): k-Multihoming: pick k ISPs from all those serving the city and use the best set of k ISPs. Compare: compute the ratio (perf from k ISPs) / (perf when using all ISPs); average across destinations and time. Emulated multihoming example: total ISPs in city: 3 (Sprint, Verio, ATT); k = 2: compare the best when using Sprint & Verio against the best when using Sprint, ATT & Verio.
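The k-multihoming metric can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the RTT samples are made up, the function names are hypothetical, and it assumes the ratio is computed per (destination, time) sample and then averaged, which is only one reading of "compute ratio; average across destinations, time".

```python
from itertools import combinations
from statistics import mean

# Hypothetical RTT samples in ms: rtt[isp][destination] is a list over time.
rtt = {
    "Sprint": {"d1": [40, 60], "d2": [30, 30]},
    "Verio":  {"d1": [50, 40], "d2": [45, 20]},
    "ATT":    {"d1": [80, 80], "d2": [25, 35]},
}

def best_rtt(isps, dest, t):
    """RTT seen when an oracle picks the best ISP among `isps` at time t."""
    return min(rtt[isp][dest][t] for isp in isps)

def k_multihoming_ratio(subset):
    """Average of (RTT using `subset`) / (RTT using all ISPs); 1.0 is optimal."""
    all_isps = list(rtt)
    dests = rtt[all_isps[0]]
    ratios = [
        best_rtt(subset, d, t) / best_rtt(all_isps, d, t)
        for d in dests
        for t in range(len(dests[d]))
    ]
    return mean(ratios)

def best_k_set(k):
    """The set of k ISPs with the lowest (best) average ratio."""
    return min(combinations(sorted(rtt), k), key=k_multihoming_ratio)

for k in (1, 2, 3):
    chosen = best_k_set(k)
    print(k, chosen, round(k_multihoming_ratio(chosen), 4))
```

With these made-up samples the best single ISP is still well above 1.0, while the best pair is already close to the all-ISP optimum, which is the shape of result the slide describes.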
Round-Trip Time (RTT) Improvement: Metric: k-multihoming RTT / all-multihoming RTT, averaged across destinations. Up to 30% average RTT improvement relative to a single ISP. Diminishing returns beyond 3 ISPs. NYC, 2-multihoming relative to best ISP: median RTT improves by 8 ms; 90th percentile RTT improves by 40 ms. Negligible? Coast-to-coast: 50 ms; significant for Web transfers (~10 RTTs). Throughput (1 MB transfers) and availability (pings, traceroutes) benefits are similar (e.g., 20% for throughput).
Choosing Your ISPs: A Case Study in SF. Potential benefits? ~30% improvement. How many ISPs? No benefit beyond 3. Which ISPs? Good combination. Selection strategies shown: Optimal; Greedy & Smart; Naive. ISP ranks and performance: ranks 1, 2, 5: 1.06; rank 1: 1.38; ranks 1, 2, 3: 1.14; ranks 10, 9, 8: 1.25. Moderate improvement in flexibility helps a lot! Moderately improved end-only flexibility vs. arbitrary flexibility from overlay routing?
Comparing Against Overlay Routing: "k-Overlay routing" is strictly better than k-Multihoming: k·n! paths vs. k paths. To what extent is overlay routing better? Use testbed machines as intermediate overlay nodes.
k-Overlays vs. k-Multihoming: Metric: k-multihoming RTT / k-overlay RTT. 1-Overlays vs. 3-Multihoming: multihoming ~2% better in some cities, identical in others; multihoming is essential to overcome serious first-hop ISP problems. 3-Overlays relative to 3-Multihoming: 3-overlay routing RTT is 6% better on average (throughput difference less than 3%). Across city-destination pairs: median RTT difference: 85% are less than 5 ms; 90th percentile RTT difference: 85% are less than 10 ms.
Benefits of Multihoming Route Control: Multihoming is compliant with Internet routing. Good Internet performance is achievable with Internet routing. End-to-end performance today: there is still some hope in Internet routing!
Ideal Multihoming: Three assumptions: perfect information; no overhead; traffic control. A practical implementation must address these. [Figure: best ISP at 8:00am, 8:30am, 9:00am, 10:00am]
Keeping Up-to-date Information: Regularly monitor performance over ISP links. Passive probing: use existing transfers. Active probing: send out-of-band probes. How many destinations? The top few by request volume. The current sample is a good estimate of future ISP performance: no history! [Figure: ISPs Verio, Sprint, ATT]
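The monitoring policy on this slide (probe only the top few destinations, trust the latest sample, keep no history) can be sketched as follows. This is an illustrative sketch, not the thesis implementation: `top_destinations`, `RouteController`, and the sample values are hypothetical names and data, and the probing itself (passive or active) is assumed to happen elsewhere and feed `record_sample`.

```python
from collections import Counter

def top_destinations(request_log, n):
    """Return the n destinations with the highest request volume."""
    return [dest for dest, _ in Counter(request_log).most_common(n)]

class RouteController:
    """Keeps only the most recent RTT sample per (destination, ISP)."""

    def __init__(self, isps):
        self.isps = list(isps)
        self.latest = {}  # (dest, isp) -> latest RTT sample in ms

    def record_sample(self, dest, isp, rtt_ms):
        # Overwrite the previous sample: no history is kept.
        self.latest[(dest, isp)] = rtt_ms

    def best_isp(self, dest, default=None):
        """Pick the ISP whose latest sample to `dest` is lowest."""
        candidates = [
            (self.latest[(dest, isp)], isp)
            for isp in self.isps
            if (dest, isp) in self.latest
        ]
        if not candidates:
            return default  # unmonitored destination: fall back
        return min(candidates)[1]

log = ["cnn.com", "a.com", "cnn.com", "b.com", "a.com", "cnn.com"]
monitored = top_destinations(log, 2)

rc = RouteController(["Verio", "Sprint", "ATT"])
for isp, sample in (("Verio", 80), ("Sprint", 50), ("ATT", 65)):
    rc.record_sample("cnn.com", isp, sample)
print(monitored, rc.best_isp("cnn.com"))
```

Because each new sample replaces the old one, a single bad measurement switches the route immediately; that is the trade-off of trusting the current sample instead of a smoothed history.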
Directing Traffic on Optimal Links: Outbound control is easy; inbound is hard! Internet routing is destination-address based. Example: the network owns 10.0.0.0/16; split the net block into 3 parts: 10.0.0.0/18, 10.0.64.0/18, 10.0.192.0/18 (ISPs: Verio, Sprint, ATT). A request to CNN.com (x.x.x.x) carries source address 10.0.192.1, so the response is sent to 10.0.192.1. Routing table impact? Use provider-assigned addresses.
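The address-based inbound control on this slide can be illustrated with Python's standard `ipaddress` module. This is a sketch under assumptions: the slide lists three /18 sub-blocks and three ISPs but does not say which block goes with which ISP, so the `isp_for_block` mapping below is hypothetical, as are the helper names.

```python
import ipaddress

block = ipaddress.ip_network("10.0.0.0/16")

# A /16 splits into four /18s; the slide uses three of them.
quarters = list(block.subnets(new_prefix=18))

# Hypothetical assignment of the slide's three sub-blocks to ISPs.
isp_for_block = {
    ipaddress.ip_network("10.0.0.0/18"): "Verio",
    ipaddress.ip_network("10.0.64.0/18"): "Sprint",
    ipaddress.ip_network("10.0.192.0/18"): "ATT",
}

def inbound_isp(source_address):
    """ISP over which replies to `source_address` arrive, or None."""
    addr = ipaddress.ip_address(source_address)
    for net, isp in isp_for_block.items():
        if addr in net:
            return isp
    return None

def source_address_for(isp, host_index=1):
    """An address to use as the (NAT) source so replies return via `isp`."""
    for net, name in isp_for_block.items():
        if name == isp:
            return str(net[host_index])
    raise KeyError(isp)

print([str(q) for q in quarters])
print(inbound_isp("10.0.192.1"), source_address_for("ATT"))
```

Choosing the source address of an outgoing request therefore selects the inbound link, since the reply is routed by its destination address.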
Performance Evaluation: Route control implemented over a Linux-based Web proxy. Trace-based evaluation of a 3-multihomed network using DummyNet. Web performance is < 10% away from optimal (i.e., the oracle) and about 30% better than a single ISP. Sampling every 60 s gives good response times and resilience. Performance penalty due to probing and NAT: < 2%.
Related Work on Route Control: Commercial route control products for data centers, e.g., Internap, netVMG. Multihomed overlays for reliable Web access [Andersen05]. Multihomed load balancing across DSL links [Guo04]. Path switching for VoIP applications [Tao04]. Follow-up study [Qiu04]: Cost: with dedicated links, not an issue; with usage-based charging, online algorithms achieve cost and performance 10% away from optimal offline cases. Global effects (simulation of equilibria): with many multihomed users, average latency worsens by < 1 ms; other singly-homed users see negligibly worse latencies. Good news for multihoming!
Route Control Contributions: End-only flexibility of Multihoming Route Control (use multiple ISP connections in a smart manner): 30% better Internet performance using 3 ISPs. Comparable with overlay routing. NAT and active/passive measurement can help realize the benefits in practice. [Figure: ISPs Verizon, MCI, Cogent, Sprint, ATT, Verio]
Performance Scaling in the Internet: Network growth maintains the "power-law". Theorem: the expected maximum load on edges in the ISP-level graph is Ω(n^1.8) (n nodes; unit traffic; routed along shortest paths). Real-life example: Japan [Fukuda05]. Traffic volumes increase, and usage patterns may change. But link capacities will also improve (Moore's law). However, routing and structure may impose restrictions: some portions carry a lot more traffic, so their speeds need to scale faster. Poor scaling properties? Routing-based schemes are ineffective at ensuring good performance. Straw-man approach: alter the structure. Add parallel links between adjacent ISPs, as a function of the "degrees" of the ISPs. Adding parallel links in proportion to the minimum degree achieves linear scaling. Good future performance: basic changes to topology and/or routing.
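The quantity in the theorem, maximum edge load under unit traffic and shortest-path routing, can be measured on small graphs with the sketch below. It is an illustration only and does not reproduce the theorem or its graph model: `max_edge_load` routes one unit per unordered node pair along a single BFS shortest path, and `preferential_attachment_graph` is a simplified, hypothetical generator (each new node attaches with one edge, so the result is a tree) used just to produce a skewed-degree example.

```python
import random
from collections import defaultdict, deque

def max_edge_load(adj):
    """Largest number of node pairs routed over any single edge.

    One unit of traffic per unordered node pair, sent along one BFS
    shortest path (ties broken by BFS visiting order). Nodes are ints.
    """
    load = defaultdict(int)
    for s in adj:
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
        for t in parent:
            if t <= s:  # count each unordered pair once
                continue
            node = t
            while parent[node] is not None:
                load[frozenset((node, parent[node]))] += 1
                node = parent[node]
    return max(load.values()) if load else 0

def preferential_attachment_graph(n, seed=0):
    """Simplified growth model: a new node links to one existing node,
    chosen with probability proportional to its current degree."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    endpoints = [0, 1]  # each node appears once per incident edge
    for new in range(2, n):
        target = rng.choice(endpoints)
        adj[new] = {target}
        adj[target].add(new)
        endpoints += [new, target]
    return adj

for n in (50, 100, 200):
    print(n, max_edge_load(preferential_attachment_graph(n)))
```

Comparing the printed maximum load against n gives a feel for how the most loaded link grows with network size; the straw-man remedy on the slide would divide that load across parallel links.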
Summary of Contributions: "An integrated approach to optimizing Internet performance": in-depth analysis of current performance bottlenecks; analysis of methods to circumvent bottlenecks; study of performance scaling in the future Internet. Several important lessons for the design of the Internet's routing protocol and infrastructure! The most significant contribution, though: Do end-networks today require special support from protocols or infrastructure? NO.
Open Issues: Longer-term measurement analysis: bottlenecks: persistence; multihoming: ISP choice. Why diminishing returns? A result of the Internet's ISP hierarchy? Global effect of route control? Dynamics of interactions, pricing, traffic engineering. Better models for congestion scaling: the ISP-level graph may not stay "power-law"; what then? Analysis of the router-level structure.