BGP Inefficiencies Supplemental slides 02/14/2007 Aditya Akella
BGP Complexity
BGP is a very complicated protocol
–Too many knobs
–Need to accommodate (sub-optimal) ISP policies
–Requires complex, human configuration
For all its complexity, BGP offers no guarantees
–Performance??
–Reliability??
–Correctness??
–Reachability??
All of BGP’s complexity begets… Headache!
BGP Pitfalls and Problems
–Misconfiguration
–Convergence
–Performance
–Reliability
–Stability
–Security
–And the list goes on…
Favorite Scapegoat! BGP Networking community
Misconfiguration [Mahajan02Sigcomm]
Origin misconfiguration: accidentally inject routes for prefixes into global BGP tables
Self deaggregation (failure to aggregate)
  Old route: a.b.0.0/16 X Y Z
  New route: a.b.c.0/24 X Y Z
Related origin (likely connected to the network; human error)
  Old route: a.b.0.0/16 X Y Z
  New route: a.b.0.0/16 X Y
             a.b.0.0/16 X Y Z O
             a.b.c.0/24 X Y
             a.b.c.0/24 X Y Z O
Foreign origin (address space hijack!)
  Old route: a.b.0.0/16 X Y Z
  New route: a.b.0.0/16 X Y O
             a.b.c.0/24 X Y O
             e.f.g.h/i X Y O
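The three categories above can be sketched as a small classifier. This is a minimal illustration, not the paper's method: the function name `classify_origin_change`, the (prefix, AS path) tuple layout, and the simplified rules are assumptions made here, and the AS numbers and prefixes are placeholders for X, Y, Z, O and a.b.0.0/16.

```python
from ipaddress import ip_network

def classify_origin_change(old, new):
    """Classify a new route against an old one (hypothetical helper).

    old/new are (prefix, as_path) tuples; the origin AS is last in the path.
    """
    old_net, old_path = ip_network(old[0]), old[1]
    new_net, new_path = ip_network(new[0]), new[1]
    # A prefix outside the old block (the e.f.g.h/i case) is treated as foreign.
    if not new_net.subnet_of(old_net):
        return "foreign origin"
    if new_path[-1] == old_path[-1]:
        # Same origin announcing a more-specific prefix: failure to aggregate.
        if new_net.prefixlen > old_net.prefixlen:
            return "self deaggregation"
        return "no origin change"
    # New origin sits on the old path, or the old origin sits on the new path.
    if new_path[-1] in old_path or old_path[-1] in new_path:
        return "related origin"
    return "foreign origin"

# Placeholder values: X, Y, Z, O -> private AS numbers; a.b.0.0/16 -> 10.1.0.0/16.
X, Y, Z, O = 64501, 64502, 64503, 64504
old_route = ("10.1.0.0/16", [X, Y, Z])
print(classify_origin_change(old_route, ("10.1.2.0/24", [X, Y, Z])))
print(classify_origin_change(old_route, ("10.1.0.0/16", [X, Y, O])))
```

The sketch compares only against a single old route; a real detector would compare against the whole routing table history.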
Misconfiguration
Export misconfiguration: export route to a peer in violation of policy
Export: Provider → AS → Provider
  Policy violation: route exported to provider was imported from a provider
Export: Provider → AS → Peer
  Policy violation: route exported to peer was imported from a provider
Export: Peer → AS → Provider
  Policy violation: route exported to provider was imported from a peer
Export: Peer → AS → Peer
  Policy violation: route exported to peer was imported from a peer
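The four violations share one rule: a route imported from a provider or peer must not be exported to a provider or peer. A minimal sketch of that check follows; the function name and the relationship labels ("provider", "peer", "customer") are assumptions for illustration.

```python
# Neighbors that must not be given transit to each other.
NON_CUSTOMERS = {"provider", "peer"}

def violates_export_policy(imported_from, exported_to):
    """True if exporting the route matches one of the four violations above."""
    return imported_from in NON_CUSTOMERS and exported_to in NON_CUSTOMERS

for src in ("provider", "peer", "customer"):
    for dst in ("provider", "peer", "customer"):
        flag = "VIOLATION" if violates_export_policy(src, dst) else "ok"
        print(f"{src} -> AS -> {dst}: {flag}")
```

Routes imported from or exported to customers pass the check, since carrying customer traffic is what the AS is paid to do.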
Interesting Observations
Origin misconfig
–72% of new routes may be misconfigurations
–11-13% of misconfig incidents affect connectivity
  Pings and checks
–Self de-aggregation is the main cause
Export misconfig
–Up to 500 misconfiguration incidents per day
–All forms are prevalent, although provider-AS-provider is more likely
Effects and Causes
Effects
–Routing load
–Connectivity disruption
–Extra traffic
–Policy violation
Causes (Origin misconfig)
–Router vendor software bugs: announce and withdraw routes on reboot
–Reliance on upstream filtering
–New configuration not saved to stable storage (separate command and no autosave!)
–Hijacks of address spaces
–Forgetting to install a filter
–Human operators and poor interface
Export Misconfig
[Figure: AS A with neighbors P1, P2 and C]
–Intended policy: provide transit to C through link A-C
–Configured policy: export all routes originated by C to P1 and P2
–Correct policy: export only when AS path is “C”
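The export misconfiguration example can be made concrete by comparing the two filters. This is a sketch under assumptions: the AS numbers are placeholders for C, P1 and P2, AS paths are lists with the origin last, and it supposes that A also hears C's prefix via P1 and P2.

```python
# Placeholder AS numbers standing in for A's neighbors C, P1 and P2.
C, P1, P2 = 64503, 64501, 64502

def configured_filter(as_path):
    """Misconfigured: export anything originated by C (origin is the last AS)."""
    return as_path[-1] == C

def correct_filter(as_path):
    """Correct: export only routes whose AS path is exactly "C"."""
    return as_path == [C]

# Routes to C's prefix as A might see them: directly from C, and via P1 and P2.
routes_at_a = [[C], [P1, C], [P2, C]]

# Paths the configured filter lets through but the correct one would block.
leaked = [p for p in routes_at_a if configured_filter(p) and not correct_filter(p)]
print(leaked)
```

Each leaked path means A re-advertises a route learned from one of P1/P2 to the other, which is the provider-AS-provider violation.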
BGP Convergence [Labovitz00Sigcomm]
Conventional beliefs
–Path vector converges faster than traditional DV (eliminates the count-to-infinity problem)
–Internet path restoration takes on the order of tens of seconds
Convergence
–Recovery after a fault may take as much as ten minutes
–A single routing fault could result in multiple announcements and withdrawals
–Loss and RTT around times of faults are much worse
Upon route withdrawal, explore paths of increasing length
–In the worst case, could explore n! paths
–Depends on which messages are processed and when
A minimum interval between update messages could reduce the number of messages
–Forces all outstanding messages to be processed
End-to-End Routing Behavior [Paxson96Sigcomm]
Large-scale routing behavior as seen by end-hosts, based on analysis of traceroutes
Pathologies: persistent routing loops, routing failures and long connectivity outages
Stability: 9% of routes changed every tens of minutes, 30% about every 6 hours, and 68% took a few days
Symmetry: more than half of paths probed were asymmetric at router level
Inefficiencies in BGP & Internet Routing
Route convergence and oscillations
Poor reliability
–No way to exploit redundancy in Internet paths
Inefficiency: sub-optimal RTTs and throughputs
–What are some of the causes?
  Policies in routing: inter-domain and intra-domain
  Lack of direct routes, “sparseness” of the Internet graph
Inefficiency of Routes [Spring03Sigcomm]
Three classes of reasons for poor performance (“inflation”)
–Intra-domain topology and policy
  Topology: no direct link between all cities
  Routing policy: “shortest paths” may be avoided due to engineering
–ISP peering
  Peering topology: limited peering between ISPs
  Peering policy: hot-potato routing or early-exit routing
–Inter-domain
  Topology: AS graph is sparse
  Inter-domain policies: policies are policies
Path Inflation Summary
Internet Bottlenecks
[Figure: slow, flaky home connection; big, fat pipe(s); high-speed “core”]
Last-mile, slow access links limit transfer bandwidth
Most bottlenecks are last-mile
As access technology improves (100Mbps home connection)…
Non-access or Wide-Area Bottlenecks?
Wide-Area Bottlenecks
[Figure: wide-area Internet / high-speed “core” (Sprint, ATT, UUNet) surrounded by small, very small and tiny ISPs, crossed by an unconstrained TCP flow]
Wide-area bottleneck: where an unconstrained TCP flow sees delays and losses
–Link with the least available bandwidth
–Not the “traditional” bottlenecks
–May not be congested
Measurement Tool: BFind
[Figure: source, bottleneck, dest]
Ideally… monitor queues, identify where queues build up
But no control over destination
Emulate the whole process from the source!
Measurement Tool: BFind
[Figure: source and dest; Round 1, Round 2, …, Round j]
At the source
–Rate-controlled UDP stream (starting at 1Mbps) to force queueing
–Rounds of traceroutes: monitor links for queueing, report to UDP process
Rounds
–Round 1: No queueing! Rate for round 2: 1+δ Mbps (δ = rate step)
–Round 2: No queueing! Rate for round 3: 1+2δ Mbps
–Round j: Queueing on #2! Flag #2, keep current rate for round j+1
–If #2 flagged too many times, quit. Identify #2 as bottleneck
BFind functions like TCP: gradually increase send rate until it hits the bottleneck
Can identify key properties of the bottleneck
–Location, latency, available bandwidth (== send rate of BFind before quitting)
–Single-ended control
Quits after 180s and before send rate hits 50Mbps
BFind validation: wide-area experiments and simulations
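The round structure can be sketched as a toy simulation. This is not the BFind implementation: the real tool infers queueing from traceroute delays, whereas here a link is simply assumed to queue when the send rate exceeds its available bandwidth. The function name, the step size, the flag threshold and the round cap are all hypothetical parameters.

```python
def bfind_sketch(avail_mbps, start=1.0, step=1.0, max_rate=50.0,
                 max_flags=3, max_rounds=180):
    """Return (bottleneck link index or None, send rate when quitting).

    avail_mbps lists the available bandwidth of each link, source to dest.
    """
    flags = [0] * len(avail_mbps)
    rate = start
    for _ in range(max_rounds):
        if rate >= max_rate:
            break  # give up before the rate cap, no bottleneck found
        # First link from the source that cannot carry the current rate.
        queued = next((i for i, bw in enumerate(avail_mbps) if rate > bw), None)
        if queued is None:
            rate += step  # no queueing this round: probe faster
            continue
        flags[queued] += 1  # queueing seen: flag the link, keep the rate
        if flags[queued] >= max_flags:
            return queued, rate
    return None, rate

print(bfind_sketch([100.0, 8.0, 40.0]))
print(bfind_sketch([100.0, 100.0]))
```

In the first call the second link (index 1) is flagged repeatedly once the rate passes 8 Mbps; in the second call no link queues before the 50 Mbps cap, so no bottleneck is reported.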
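Results: Location
Intra-ISP links (49% of bottlenecks)
  Tier 4: 3% of bottlenecks, 1% of all links
  Tier 3: 9% of bottlenecks, 8% of all links
  Tier 2: 12% of bottlenecks, 13% of all links
  Tier 1: 25% of bottlenecks, 63% of all links
Inter-ISP links (51% of bottlenecks)
  Tier 4 – 4, 3, 2, 1: 14% of bottlenecks, 1% of all links
  Tier 3 – 3, 2, 1: 17% of bottlenecks, 3% of all links
  Tier 2 – 2, 1: 12% of bottlenecks, 4% of all links
  Tier 1 – 1: 8% of bottlenecks, 6% of all links
[Figure: example path with two peering links and four intra-ISP links]
–Bottleneck is one of the two peering links with 50% chance: probability of a given peering link being the bottleneck = 0.25
–Bottleneck is one of the four non-peering links with 50% chance: probability of a given intra-ISP link being the bottleneck = 0.125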
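The arithmetic on this slide can be checked in a few lines. The dictionary layout and the helper name `per_link_probability` are illustrative choices; the numbers are the slide's own, with the 49%/51% split rounded to 50/50 as the slide does.

```python
# %bottlenecks per tier, taken from the slide.
intra_pct = {"Tier 4": 3, "Tier 3": 9, "Tier 2": 12, "Tier 1": 25}
inter_pct = {"Tier 4 - 4,3,2,1": 14, "Tier 3 - 3,2,1": 17,
             "Tier 2 - 2,1": 12, "Tier 1 - 1": 8}

intra_total = sum(intra_pct.values())  # share of bottlenecks inside ISPs
inter_total = sum(inter_pct.values())  # share of bottlenecks between ISPs

def per_link_probability(class_share, links_in_class):
    """Split a link class's bottleneck share evenly over its links."""
    return class_share / links_in_class

p_peering = per_link_probability(0.5, 2)  # two peering links on the path
p_intra = per_link_probability(0.5, 4)    # four intra-ISP links on the path
print(intra_total, inter_total, p_peering, p_intra)
```

On such a path, an individual peering link is twice as likely to be the bottleneck as an individual intra-ISP link, even though the two classes account for about the same share overall.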
Results: Available Bandwidth
Intra-ISP links
–Tier-1 ISPs are the best
–Tier-3 ISPs have slightly higher available bandwidth than tier-2
Inter-ISP links
–Tier-1 – 1 peering is the best
–Peering involving tiers 2, 3 similar
Performance: End-to-End Perspective From an end-to-end view… –Is there a way of extracting better performance? Is there scope? How do we realize this? Scope: Savage99, CMU Multihoming work Reality: UW’s “Detour” system, MIT’s RON, Akamai’s SureRoute, CMU’s Route Control implementation
Quantifying Performance Loss [Savage99Sigcomm] Measure round trip time (RTT) and loss rate between pairs of hosts Alternate path characteristics –30-55% of hosts had lower latency –10% of alternate routes have 50% lower latency –75-85% have lower loss rates
Bandwidth Estimation
RTT & loss for multi-hop path
–RTT by addition
–Loss: either the worst hop or the combination of hops. Why?
  Large number of flows → combination of probabilities
  Small number of flows → worst hop
Bandwidth calculation
–TCP bandwidth is based primarily on loss and RTT
70-80% of paths have better bandwidth
10-20% of paths have 3x improvement
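The composition rules can be written out directly. The slide does not say which TCP model is used, so the bandwidth function below assumes the simplified Mathis-style approximation (throughput proportional to MSS / (RTT · sqrt(loss))); the function names and the sample hop values are hypothetical.

```python
import math

def compose_rtt(hops):
    """RTT of a multi-hop path: add the per-hop RTTs (seconds)."""
    return sum(rtt for rtt, _ in hops)

def compose_loss_combined(hops):
    """Many flows: treat hop losses as independent probabilities."""
    p_delivered = 1.0
    for _, loss in hops:
        p_delivered *= (1.0 - loss)
    return 1.0 - p_delivered

def compose_loss_worst(hops):
    """Few flows: the worst hop dominates."""
    return max(loss for _, loss in hops)

def tcp_bandwidth_bps(mss_bytes, rtt_s, loss):
    """Assumed Mathis-style estimate: (MSS / RTT) * sqrt(3/2) / sqrt(loss)."""
    return (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss)

# Hypothetical two-hop alternate path: (RTT in seconds, loss rate) per hop.
hops = [(0.040, 0.01), (0.060, 0.02)]
rtt = compose_rtt(hops)
print(rtt, compose_loss_combined(hops), compose_loss_worst(hops))
print(tcp_bandwidth_bps(1460, rtt, compose_loss_worst(hops)))
```

Under this model, quartering the loss rate at a fixed RTT doubles the estimated bandwidth, which is why both loss and RTT matter when comparing alternate paths.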
Possible Sources of Alternate Paths
A few really good or bad AS’s
–No, benefit of top ten hosts not great
Better congestion or better propagation delay?
–How to measure? Propagation = 5th percentile of delays
–Both contribute to improvement of performance
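The "propagation = 5th percentile of delays" rule can be sketched as follows. The nearest-rank percentile and the split of the mean delay into propagation plus a congestion remainder are assumptions made for this illustration; the slide only gives the percentile rule.

```python
def propagation_delay(samples, pct=5):
    """Estimate propagation delay as the pct-th percentile (nearest rank)."""
    ordered = sorted(samples)
    # Integer ceiling of pct% of the sample count, at least rank 1.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]

def congestion_delay(samples, pct=5):
    """Assumed remainder: mean delay minus the propagation estimate."""
    return sum(samples) / len(samples) - propagation_delay(samples, pct)

# Hypothetical delay samples in milliseconds.
delays_ms = list(range(1, 101))
print(propagation_delay(delays_ms), congestion_delay(delays_ms))
```

Comparing the two components for a default path and an alternate path shows whether the alternate wins on propagation, on congestion, or on both.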
Overlay Networks
Basic idea:
–Treat multiple hops through IP network as one hop in overlay network
–Run routing protocol on overlay nodes
Why?
–For performance – like the Savage 99 paper showed
–For efficiency – can make core routers very simple (e.g. CSFQ); also aids deployment (e.g. active networks)
–For functionality – can provide new features such as multicast, active processing
Future of Overlay Application specific overlays –Why should overlay nodes only do routing? Caching –Intercept requests and create responses Transcoding –Changing content of packets to match available bandwidth Peer-to-peer applications
Overlay Challenges
“Routers” no longer have complete knowledge about the links they are responsible for
How do you build an efficient overlay?
–Probably don’t want all N² links – which links to create?
–Without direct knowledge of the underlying topology, how to know what’s nearby and what is efficient?
Do we need overlays for performance?
Number of Route Choices
[Figure labels: multiple candidate paths; single path; multiple BGP paths]
Flexible control of end-to-end path → many route choices
BGP: one path via each ISP → choices linked to #ISPs
Few more route choices…?
Route Selection Mechanism
[Figure labels: best performing path; least AS hops, policy compliant; current best performing BGP path]
BGP: simple, coarse metrics such as least AS hops, policy
Overlays: complex, performance-oriented selection
“Multihoming route control”: sophisticated (smart) selection among multiple BGP routes
Overlay Routing vs. Multihoming Route Control
Cost
–Route control: connectivity fees (Sprint $$, Genuity $$, ATT $$)
–Overlay routing: connectivity fees + overlay fee (ATT $$, overlay provider $$)
Operational issues
–Route control: with a /18 netblock, announce /20 sub-blocks to ISPs; if all multihomed ends do this → routing table expansion
–Overlay routing: overlay node forces intermediate ISP to provide transit → bad interactions with policies
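The /18-to-/20 split behind the routing table concern can be shown with Python's standard `ipaddress` module. The 10.0.0.0/18 block and the site count are placeholder values chosen for illustration.

```python
from ipaddress import ip_network

# Placeholder /18 netblock owned by a multihomed site.
netblock = ip_network("10.0.0.0/18")

# The /20 sub-blocks the site could announce separately to its ISPs.
sub_blocks = list(netblock.subnets(new_prefix=20))
for block in sub_blocks:
    print(block)

def table_entries(sites, announcements_per_site):
    """Global routing entries contributed by a set of multihomed sites."""
    return sites * announcements_per_site

# Hypothetical: 1000 sites announcing four /20s each instead of one /18.
growth = table_entries(1000, len(sub_blocks)) / table_entries(1000, 1)
print(growth)
```

One aggregate becomes four routes, so if every multihomed site de-aggregates this way its contribution to the global table grows fourfold.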