Internet Routing: Measurement, Modeling, and Analysis Dr. Jia Wang AT&T Labs Research Florham Park, NJ 07932, USA Prof. Zhuoqing Morley Mao Department of EECS University of Michigan Ann Arbor, MI 48109, USA ACM Sigmetrics 2005 Tutorial
2 Outline 1. Overview of inter-domain routing 2. Measuring inter-domain paths 3. BGP measurement 4. BGP modeling Our opinions should not be taken to represent AT&T policies
Part I: Overview of Inter-domain Routing
4 Internet Loose cooperative effort of Internet Service Providers (ISPs) E.g., AT&T, Sprint, UUNet, AOL Best effort service Connectedness Anyone connected to the Internet can exchange traffic with anyone else connected to the Internet
5 Internet routing Control plane: exchange routes over routing sessions Data plane: forward traffic; fail over to alternate route [Figure: hosts such as rusty.cs.berkeley.edu, each with an IP address inside an announced prefix (a /16 and a /20), exchanging IP traffic across the Internet]
6 Internet routing domain Autonomous routing domain Network devices under same technical and administrative control Common routing policy E.g., ISPs, enterprise networks Autonomous system Autonomous routing domain with an AS number (ASN) AS numbers: 16-bit integers Public AS number: 1 – Private AS number: – Examples AT&T: 7018, 6431, … Sprint: 1239, 1240, … MIT: 3
7 More than 20,000 ASes today [Figure: IP traffic between Berkeley and CNN crossing Autonomous Systems including Calren, Level3, GNN, Qwest, Sprint, UUnet, and AT&T; node types: university, company, business ISP]
8 Internet routing architecture IP traffic Berkeley CNN Level3 Internet CalrenGNN Inter-domain routing Intra-domain routing
9 Intra-domain routing Run within a certain network infrastructure Optimize routes taken between points within a network Interior Gateway Protocols (IGPs) Metrics based OSPF (Open Shortest Path First) RIP (Routing Information Protocol) IS-IS (Intermediate System to Intermediate System)
10 Inter-domain routing Run between networks Provide full connectivity of entire Internet Exterior Gateway Protocol (EGP) Policy based BGP (Border Gateway Protocol)
11 Link state protocols Examples: OSPF, IS-IS Based on Dijkstra’s shortest path computation Each router periodically floods immediate reachability information to other routers Fast convergence High communication and computation overhead Not scalable for large networks Requires periodic refreshes
12 Vectoring protocols Distance vs. Path Vector Distance: hop count (RIP) Path: entire path (BGP) Helps identify loops Supports policy-based routing based on path Minimal communication overhead Takes longer to converge, i.e., in proportion to the maximum path length
13 Link state vs. vectoring Link state: OSPF, IS-IS (both IGPs) Vectoring: RIP (IGP), BGP (EGP) BGP is a path vector protocol
14 Classful addressing IPv4: 32 bits Five classes of networks
Class | Address | Mask | # of networks | # of hosts
A | 0* | | | ~1.6M
B | 10* | | |
C | 110* | | ~2.1M | 255
D | Used for multicast
E | Reserved and currently unused
Improve scaling factor of routing in the Internet => classless
15 CIDR: Classless Inter-domain Routing (RFC1519) No implicit mask based on the class of the network Explicit masks passed in the routing protocol Allow aggregation and hierarchical routing IP address: Mask: CIDR representation: /22 Address Mask Network prefix Host identifier
16 Address aggregation Internet / / / /24 ISP A ISP B / /16
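The CIDR notation and aggregation described on the two slides above can be illustrated with Python's standard `ipaddress` module. This is a minimal sketch: the 10.1.4.0/22 block and its four /24 components are hypothetical values chosen for illustration, not the addresses from the original slides.

```python
import ipaddress

def describe_prefix(cidr: str) -> dict:
    """Split a CIDR prefix into its explicit mask and network/host parts."""
    net = ipaddress.ip_network(cidr, strict=True)
    return {
        "network": str(net.network_address),
        "mask": str(net.netmask),
        "prefix_len": net.prefixlen,          # bits in the network prefix
        "host_bits": 32 - net.prefixlen,      # bits left for the host identifier
        "addresses": net.num_addresses,
    }

def aggregate(prefixes):
    """Collapse contiguous prefixes into the fewest covering prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]

# Hypothetical example: four customer /24s announced as a single /22.
info = describe_prefix("10.1.4.0/22")
summary = aggregate(["10.1.4.0/24", "10.1.5.0/24", "10.1.6.0/24", "10.1.7.0/24"])
print(info, summary)
```

Aggregation only succeeds when the component prefixes are contiguous and aligned; a gap in the middle leaves more than one prefix in the result.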
17 Routing and forwarding Routing The decision process of choosing optimal path that is consistent with the administrative or technical policy Forwarding The act of receiving a packet, doing a lookup, and copying a packet to the next hop
18 Classless forwarding Internet IP traffic PrefixNext hop / / /
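Classless forwarding picks the most specific (longest) matching prefix for each destination. The sketch below uses a linear scan for clarity; real routers use tries or hardware lookups. The table entries and next-hop names are hypothetical.

```python
import ipaddress

def longest_prefix_match(table, dst):
    """Return the next hop of the longest prefix covering dst, or None."""
    addr = ipaddress.ip_address(dst)
    best = None  # (network, next_hop) of the most specific match so far
    for prefix, next_hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, next_hop)
    return None if best is None else best[1]

# Hypothetical forwarding table: overlapping prefixes with different next hops.
TABLE = {
    "10.0.0.0/8": "hop-A",
    "10.1.0.0/16": "hop-B",
    "10.1.4.0/22": "hop-C",
}
print(longest_prefix_match(TABLE, "10.1.5.9"))
```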
19 Inter-domain routing with CIDR support BGP-4 [RFC1771] De facto EGP Carry routing information between ASes Path vector protocol Policy based routing Run on top of TCP for reliability Basic operations Set up BGP session Exchange all candidate routes Send incremental updates
20 Establish BGP session Establish neighboring session between and PrefixNext hop / / PrefixNext hop / / TCP 179
21 Exchange all candidate routes PrefixNext hop / / / / PrefixNext hop / / / / / / / /
22 Send incremental updates PrefixNext hop / / / / PrefixNext hop / / / / Withdraw /16
23 BGP messages OPEN: set up a peering session UPDATE: announce new routes or withdraw previously announced routes NOTIFICATION: shut down a peering session KEEPALIVE: confirm active connection at regular interval
24 Internal vs. external BGP Internet I-BGP E-BGP AS A AS B AS C E-BGP update I-BGP update I-BGP update
25 Scaling I-BGP for large AS Route reflectors Confederations E-BGP update RR Only best paths being sent by RR AS 1000 EBGP IBGP AS AS 65020
26 Establish connectivity / PrefixNext hop AS path / EBGP IBGP EBGP AS 1 AS 2 AS 3 PrefixNext hop AS path / PrefixNext hop AS path /
27 IGP and BGP working together / PrefixNext hop AS path / EBGP IBGP EBGP AS 1 AS 2 AS 3 PrefixNext hop AS path / PrefixNext hop / / /30
28 Policy routing ISP1 ISP4ISP3 Cust1Cust2 ISP2 traffic Connectivity DOES NOT imply reachability! Policy determines how traffic can flow on the Internet
29 BGP routing process Routes received from peers -> Apply input policy -> Select best route -> Best routes -> Apply output policy -> Routes advertised to peers Best routes populate the routing table and forwarding table BGP is not shortest path routing!
30 Best route selection Highest local preference Shortest AS path Lowest MED (Multi-Exit-Discriminator) I-BGP < E-BGP Lowest I-BGP cost to E-BGP egress Tie breaking rules
31 Best route selection Highest local preference To enforce economic relationships between domains Shortest AS path Lowest MED (Multi-Exit-Discriminator) I-BGP < E-BGP Lowest I-BGP cost to E-BGP egress Tie breaking rules
32 Best route selection Highest local preference Shortest AS path Compare the quality of routes, assuming shorter AS-path length is better Lowest MED (Multi-Exit-Discriminator) I-BGP < E-BGP Lowest I-BGP cost to E-BGP egress Tie breaking rules
33 Best route selection Highest local preference Shortest AS path Lowest MED (Multi-Exit-Discriminator) To implement “cold potato” routing between neighboring domains I-BGP < E-BGP Lowest I-BGP cost to E-BGP egress Tie breaking rules
34 Best route selection Highest local preference Shortest AS path Lowest MED (Multi-Exit-Discriminator) I-BGP < E-BGP Prefer EBGP routes to IBGP routes Lowest I-BGP cost to E-BGP egress Tie breaking rules
35 Best route selection Highest local preference Shortest AS path Lowest MED (Multi-Exit-Discriminator) I-BGP < E-BGP Lowest I-BGP cost to E-BGP egress Prefer routes via the nearest IGP neighbor To implement “hot potato” routing Tie breaking rules
36 Best route selection Highest local preference Shortest AS path Lowest MED (Multi-Exit-Discriminator) I-BGP < E-BGP Lowest I-BGP cost to E-BGP egress Tie breaking rules Router ID based: lowest router ID Age based: oldest route
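The ordered decision steps on the preceding slides can be sketched as a lexicographic sort key. This is a simplification under stated assumptions: the `Route` fields are hypothetical names, MED is compared across all routes (real implementations usually compare MED only among routes from the same neighboring AS), and only the router-ID tie-breaker is modeled.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Route:
    local_pref: int          # higher is better
    as_path: tuple           # shorter is better
    med: int                 # lower is better
    learned_via_ebgp: bool   # E-BGP preferred over I-BGP
    igp_cost: int            # lower cost to the egress is better ("hot potato")
    router_id: int           # lowest router ID breaks remaining ties

def preference_key(r: Route):
    """Smaller tuples win; each element mirrors one step of the selection list."""
    return (
        -r.local_pref,
        len(r.as_path),
        r.med,
        0 if r.learned_via_ebgp else 1,
        r.igp_cost,
        r.router_id,
    )

def best_route(routes):
    return min(routes, key=preference_key)

a = Route(100, (1, 2), 0, True, 5, 1)
b = Route(200, (1, 2, 3, 4), 10, False, 50, 9)
print(best_route([a, b]))  # b wins: local preference outranks AS path length
```

Because local preference is compared first, a long path through a customer can beat a short path through a provider, which is one reason BGP is not shortest path routing.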
37 BGP route propagation Not all possible routes propagate Commercial relationships determine policies for Route import Route selection Route export
38 Typical AS relationships Provider-customer: the customer pays the provider for transit Peer-peer: peers typically exchange their respective customers’ traffic for free Siblings: mutual transit agreement; provide connectivity to the rest of the Internet for each other
39 AS relationships translate into BGP export rules Export to a provider or a peer Allowed: its routes and routes of its customers and siblings Disallowed: routes learned from other providers or peers Export to a customer or a sibling Allowed: its routes, the routes of its customers and siblings, and routes learned from its providers and peers
40 Which AS paths are legal? Valley-free: after traversing a provider-customer or peer-peer edge, a path cannot traverse a customer-provider or peer-peer edge Invalid paths: >= 2 peer links, downhill-uphill, downhill-peer, peer-uphill
41 Example of valley-free paths X X [1 2 3], [ ] are valley-free [1 4 3], [ ] are not valley free
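The valley-free rule can be checked with a single pass over the edge types of a path. In this sketch each edge is labeled "up" (customer-provider), "peer", or "down" (provider-customer); the labels are an assumed encoding, and sibling edges are not modeled.

```python
def is_valley_free(edges):
    """Return True if the edge sequence has the shape up* peer? down*."""
    descending = False  # becomes True after the first peer or down edge
    for edge in edges:
        if edge == "up":
            if descending:
                return False      # downhill-uphill or peer-uphill
        elif edge == "peer":
            if descending:
                return False      # second peer link, or downhill-peer
            descending = True
        elif edge == "down":
            descending = True
        else:
            raise ValueError(f"unknown edge type: {edge!r}")
    return True

print(is_valley_free(["up", "peer", "down"]))  # a legal path shape
print(is_valley_free(["down", "up"]))          # a valley
```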
42 Inferring AS relationships Identify the AS-level hierarchy of Internet Not shortest path routing Predict AS-level paths Traffic engineering Understand the Internet better Correlate with and interpret BGP update Identify BGP misconfigurations E.g., errors in BGP export rules
43 Existing approaches “On Inferring Autonomous Systems Relationships in the Internet,” L. Gao, IEEE Global Internet; “Characterizing the Internet Hierarchy from Multiple Vantage Points,” L. Subramanian, S. Agarwal, J. Rexford, and R. Katz, IEEE Infocom; “Computing the Types of the Relationships between Autonomous Systems,” G. Battista, M. Patrignani, and M. Pizzonia, IEEE Infocom; “On AS-level Path Inference,” Z. Mao, L. Qiu, J. Wang, and Y. Zhang, ACM Sigmetrics, 2005.
44 Policy routing causes path inflation End-to-end paths are significantly longer than necessary Why? Topology and routing policy choices within an ISP, between pairs of ISPs, and across the global Internet Peering policies and interdomain routing lead to significant inflation Interdomain path inflation arises because BGP lacks policy mechanisms that make it convenient to engineer good paths across ISPs
45 Path inflation Based on [Mahajan03] Comparing actual Internet paths with hypothetical “direct” link
Part II: Measuring Inter-domain Forwarding Paths
47 Why do we care? Characterize end-to-end network paths Latency Capacity Link utilization Loss rate. Diagnose routing anomalies Forwarding loop, blackholes, routing changes, unexpected paths, main component of end-to-end latency. Discover Internet topology Server placement
48 Key challenge Need to understand how packets flow through the Internet without real-time access to proprietary routing data from each domain. Identify accurate packet forwarding paths Characterize the performance metrics of each hop along the paths
49 Existing approaches With access to the source: AS-level traceroute “Towards an Accurate AS-Level Traceroute Tool,” Z. Mao, J. Rexford, J. Wang, and R. Katz, ACM Sigcomm; “Scalable and Accurate Identification of AS-Level Forwarding Paths,” Z. Mao, D. Johnson, J. Rexford, J. Wang, and R. Katz, IEEE Infocom Without access to the source: Routescope “On AS-level Path Inference,” Z. Mao, L. Qiu, J. Wang, and Y. Zhang, ACM Sigmetrics, 2005.
50 AS-Level Traceroute Traceroute gives IP level forwarding path IP address of the router interfaces on a forwarding path RTT statistics for each hop along the way
51 Traceroute from AT&T Research to
traceroute to cnn.com ( ), 30 hops max, 40 byte packets
1 oden ( ) 1 ms 1 ms 1 ms
2 * * *
3 attlr-gate ( ) 2 ms 2 ms 2 ms
( ) 3 ms 4 ms 4 ms
5 gbr6-p52.n54ny.ip.att.net ( ) 4 ms 4 ms 4 ms
6 tbr2-p n54ny.ip.att.net ( ) 4 ms (ttl=249!) 5 ms (ttl=249!) 5 ms (ttl=249!)
7 ggr2-p390.n54ny.ip.att.net ( ) 4 ms 5 ms 4 ms
8 att-gw.ny.aol.net ( ) 4 ms 4 ms 4 ms
9 bb2-nye-P1-0.atdn.net ( ) 4 ms 4 ms 4 ms
10 bb2-vie-P8-0.atdn.net ( ) 13 ms (ttl=245!) 12 ms (ttl=245!) 12 ms (ttl=245!)
11 bb1-vie-P11-0.atdn.net ( ) 10 ms 10 ms 10 ms
12 bb1-cha-P7-0.atdn.net ( ) 20 ms 20 ms 20 ms
13 bb1-atm-P6-0.atdn.net ( ) 25 ms 25 ms 25 ms
14 pop1-atl-P4-0.atdn.net ( ) 25 ms (ttl=243!) 24 ms (ttl=243!) 24 ms (ttl=243!)
15 * * *
16 * * *
17 * * *
18 * * *
19 * * *
20 * * *
21 * * *
22 * * *
23 * * *
24 * * *
25 * * *
26 * * *
27 * * *
28 * * *
29 * * *
30 * * *
Destination unreachable! Who is responsible for the forwarding problem?
52 Need to know Inter-domain level path Obtain AS level paths BGP AS path Traceroute AS path
53 BGP AS path Forwarding path: data traffic Signaling path: control traffic [Figure: AS A, AS B, AS C; prefix d originates in AS C; announcements d: path=[C] and d: path=[B C]; table entry: prefix d, AS path A B C …] Is BGP AS path the answer? No!
54 BGP AS path is not the answer Requires timely access to BGP data Signaling path may differ from forwarding path Route aggregation and filtering Routing anomalies: e.g., deflections, loops [Griffin2002] BGP misconfigurations: e.g., incorrect AS prepending Two paths may differ precisely when operators most need accurate data to diagnose a problem!
55 Traceroute AS path Obtain IP level path using traceroute Map IP addresses to ASes [Figure: path from source to destination through AS A, AS B, AS C, AS D, with router hops a, b, c, d, e] Is traceroute AS path the answer? No!
56 Traceroute AS path is not the answer Identifying ASes along forwarding path is surprisingly difficult! Internet route registry Origin AS in BGP routes
57 Internet route registry Whois database E.g. NANOG traceroute, prtraceroute Out-of-date, incomplete Address allocation to customers Acquisition, mergers, break-ups
58 Origin AS in BGP routes Last AS in the AS path for each prefix More accurate and complete than whois data PrefixAS path dA B C ……
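Mapping a traceroute path to an AS path via BGP origin ASes can be sketched as follows. This is an illustrative simplification, not the tool from the cited papers: the BGP table, prefixes, and private-range AS numbers are hypothetical, unmapped hops are simply skipped, and consecutive hops in the same AS are collapsed.

```python
import ipaddress

def origin_as(bgp_table, ip):
    """Origin AS = last AS in the AS path of the longest matching prefix."""
    addr = ipaddress.ip_address(ip)
    matches = []
    for prefix, as_path in bgp_table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, as_path[-1]))
    if not matches:
        return None  # unannounced address: no mapping available
    return max(matches)[1]

def traceroute_as_path(bgp_table, hops):
    """Map each IP hop to an AS and collapse runs of the same AS."""
    path = []
    for ip in hops:
        asn = origin_as(bgp_table, ip)
        if asn is not None and (not path or path[-1] != asn):
            path.append(asn)
    return path

# Hypothetical BGP table: prefix -> AS path as seen at the vantage point.
BGP_TABLE = {
    "10.0.0.0/8": (64500, 64501),
    "10.1.0.0/16": (64500, 64502),
    "172.16.0.0/12": (64503,),
}
HOPS = ["10.0.0.1", "10.0.0.2", "10.1.0.1", "192.0.2.1", "172.16.0.1"]
print(traceroute_as_path(BGP_TABLE, HOPS))
```

Skipping unmapped hops hides exactly the cases (IXPs, unannounced infrastructure) that make this naive mapping inaccurate.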
59 Limitations of BGP origin AS Multiple Origin AS (MOAS) Multi-homing Misconfiguration Internet eXchange Points (IXPs) Infrastructure addresses may not be advertised Not required to be announced publicly Security concerns Addresses announced by someone else Statically routed customers Shared equipment at the boundary between ASes Need accurate IP-to-AS mapping!
60 Accurate AS-level traceroute Combine BGP and traceroute data to find a better answer!
61 Assumptions IP-to-AS mapping Mappings from BGP tables are mostly correct and change slowly BGP paths and forwarding paths mostly match 70% of BGP paths and traceroute paths match
62 BGP path and traceroute path could differ! Inaccurate IP-to-AS mapping Traceroute problems Legitimate mismatches
63 BGP path and traceroute path could differ! Inaccurate IP-to-AS mapping Internet eXchange Points (IXPs) Sibling ASes Unannounced infrastructure addresses Traceroute problems Legitimate mismatches
64 Internet eXchange Points (IXPs) Shared infrastructure connected to multiple service providers Exchange BGP routes and data traffic May have its own AS number or be announced by participating ASes Dedicated BGP sessions between pairs of participating ASes E.g., Mae-East, Mae-West, PAIX.
65 IXPs cause extra AS hop Extra AS hop in traceroute path Large number of fan-in and fan-out ASes Non transit AS, small address block, likely MOAS A B C D E F G Traceroute AS pathBGP AS path B C F G AE
66 Sibling ASes Single organization owns and manages multiple ASes May share address space Cause extra AS hop Large fan-in and fan-out for the “sibling AS pair” Traceroute AS path BGP AS path A B C D E F G H A B C D E F G
67 Unannounced infrastructure addresses ASes do not necessarily announce infrastructure via BGP Lead to “unmapped” addresses Sometimes fall into supernet announced by AS’s provider or sibling
68 Unannounced infrastructure addresses 1. A,C AS A AS B AS C 2. A 3. B,A4. A,C,A Extra AS hop in traceroute path Missing AS hop in traceroute path Substitute AS hop AS loop in traceroute path
69 BGP path and traceroute path could differ! Inaccurate IP-to-AS mapping Traceroute problems Forwarding path changing during traceroute Interface numbering at AS boundaries ICMP response refers to outgoing interface Legitimate mismatches
70 Forwarding path changing during traceroute AS AAS BAS C AS AAS C AS DAS E AS D AS hop B is substituted by AS D in the traceroute path Route flaps between A B C and A D E
71 Interface numbering at AS boundaries AS AAS BAS C AS AAS C Missing AS hop B in traceroute path
72 ICMP response refers to outgoing interface AS B AS AAS C ICMP message Extra AS hop B in traceroute path
73 BGP path and traceroute path could differ! Inaccurate IP-to-AS mapping Traceroute problems Legitimate mismatches Route aggregation and filtering Routing anomalies, e.g., deflections
74 Route aggregation/filtering /8 B C /8 C /16 C D AS BAS CAS A Extended traceroute path due to filtering by AS B
75 Mismatch patterns and causes Patterns: Extra AS, Miss AS, AS Loop, Subst AS, Other Causes (X marks per row): IXP: X; Sibling ASes: X X X X; Unannounced IP: X X X X; Aggregation/filtering: X; Inter-AS interface: X X; ICMP source address: X X X X; Routing anomaly: X X X X X
76 BGP and traceroute data collection Initial mappings from origin AS of a large set of BGP tables Traceroute paths from multiple locations Compare Look for known causes of mismatches (e.g., IXP, sibling ASes) Edit IP-to-AS mappings (a single change explaining a large number of mismatches) For each location: Combine all locations: Local BGP pathsTraceroute AS paths For each location: (Ignoring unstable paths)
77 Measurement setup Eight vantage points Upstream providers: US-centric tier-1 ISPs Sweep all routable IP address space About 200,000 IP addresses, 160,000 prefixes, 15,000 destination ASes
78 Preprocessing BGP paths Discard prefixes with BGP paths containing Routing changes based on BGP updates Private AS numbers ( ) Empty AS paths (local destinations) AS loops from misconfiguration AS SET instead of AS sequence Less than 1% prefixes affected
79 Preprocessing traceroute paths Resolving incomplete traceroute paths Unresolved hops within a single AS map to that AS Unmapped hops between ASes Try to match to neighboring AS using DNS, Whois Trim unresponsive (*) hops at the end Compare with the beginning of local BGP paths MOAS at the end of paths Assume multi-homing without BGP Validation using AT&T router configurations More than 98% cases validated
80 Initial IP-to-AS Mapping
 | Whois | Combined BGP tables | Resolving incompletes
Match | 44.7% | 73.2% | 78.0%
Mismatch | 29.4% | 8.3% | 9.0%
Ratio | | |
81 Heuristics to improve mappings Overall modification to mappings 10% IP-to-AS mappings modified 25 IXPs identified 28 pairs of sibling ASes found 1150 of the /24 prefixes shared
 | IXP | Sibling ASes | Unannounced address space
Match | 84.4% | 85.9% | 90.6%
Mismatch | 8.7% | 7.8% | 3.5%
Ratio | | |
82 Systematic optimization Dynamic-programming and iterative improvement Initial IP-to-AS mapping derived from BGP routing tables Identify a small number of modifications that significantly improve the match rate. 95% match ratio, less than 3% changes, very robust
83 Optimization results
Input mapping | Mismatch
Full initial mapping | 5.23%
Heuristically optimized mapping | 3.08%
Omit 10% initial mapping | 6.57%
Omit 4 probing sources | 6.34%
Omit probing destinations (one probe per unique BGP path) | 7.12%
84 AS-level path inference Without access to the source Challenges Asymmetric routes: 60% Complicated routing policies Multihomed networks Find the shortest policy path that conforms to AS relationships
85 Routescope Assumptions Explicit AS relationship Peer-peer Provider-customer Shortest policy AS path preferred Valley-free Uniform routing policy within an AS AS destination based uniform routing Stability These assumptions are mostly correct.
86 AS path inference algorithm Compose AS graph based on BGP tables Infer AS relationships Classify edges based on AS relationship Customer-provider (UP) link Provider-customer (DOWN) link Peering (FLAT) link Compute the shortest policy path conforming to the “valley-free” rule using a modified Dijkstra’s algorithm Infer the first AS hop if multiple paths are returned
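The shortest valley-free path computation can be sketched as a search over (AS, phase) states, where the phase records whether the path has already started descending. This sketch substitutes breadth-first search for the modified Dijkstra’s algorithm named on the slide, which is equivalent when every AS hop has unit cost; the graph and helper names are hypothetical.

```python
from collections import deque

def make_links(customer_provider, peers):
    """Build directed, typed links from relationship lists."""
    links = {}
    for customer, provider in customer_provider:
        links[(customer, provider)] = "up"    # customer-provider (UP)
        links[(provider, customer)] = "down"  # provider-customer (DOWN)
    for a, b in peers:
        links[(a, b)] = "peer"                # peering (FLAT)
        links[(b, a)] = "peer"
    return links

def shortest_valley_free_path(links, src, dst):
    """BFS over (AS, descending) states; returns an AS list or None."""
    adj = {}
    for (u, v), kind in links.items():
        adj.setdefault(u, []).append((v, kind))
    start = (src, False)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        node, descending = state
        if node == dst:
            path = []
            while state is not None:
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        for nxt, kind in adj.get(node, []):
            if descending and kind != "down":
                continue  # after a peer or down edge only down edges are legal
            nxt_state = (nxt, descending or kind != "up")
            if nxt_state not in parent:
                parent[nxt_state] = state
                queue.append(nxt_state)
    return None

# Hypothetical graph: AS 4 buys transit from ASes 1 and 3; both buy from AS 2.
LINKS = make_links([(4, 1), (4, 3), (1, 2), (3, 2)], peers=[])
print(shortest_valley_free_path(LINKS, 1, 3))  # goes via provider 2, not customer 4
```

Tracking the phase in the search state is what prevents the two-hop path through the shared customer from being chosen, even though it is just as short.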
87 AS path inference accuracy TotalMatchMatch length Exact match ShorterLonger AS7018 (tier-1) %83%35%17%0% AS2152 (tier-2) % 10%35%0% AS8121 (tier-3) %27%3%69%4% All BGP gateways %73%30%22%4% US BGP gateways %62%27%34%4% If the first hop is known, 15% of mismatches can be eliminated.
88 First hop inference Gather candidate first hop ASes from S by launching traceroutes to S from multiple vantage points Identify the transition point T that is likely to be on the path from S to D by testing hop_count(S,T) + hop_count(T,D) = hop_count(S,D) [Figure: source AS S, candidate transition points AS T1 and AS T2, AS C, destination AS D] Only have access to D
89 Hop count inference Hop_count(S,T) = hop_count(T,S) To infer hop_count(H,D): H = T or S Send ping packet to H Guess the initial TTL value TTL0 set by H Get TTL value TTL1 in ICMP response packet received from H Hop_count(H,D) = TTL0 - TTL1 + 1 Common value for TTL0: 32 (Win95/98/Me) 64 (Linux, Compaq Tru64) 128 (Win NT/2000/XP) 255 (most UNIX systems)
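The TTL-based hop count estimate can be sketched as below. The guessing rule is an assumption: it picks the smallest common initial TTL that is at least the observed value, which is only reasonable when paths are shorter than the gap between adjacent common values.

```python
COMMON_INITIAL_TTLS = (32, 64, 128, 255)  # values listed on the slide

def guess_initial_ttl(observed_ttl: int) -> int:
    """Guess TTL0 as the smallest common initial value >= the observed TTL."""
    if not 0 < observed_ttl <= 255:
        raise ValueError("TTL must be in 1..255")
    for ttl0 in COMMON_INITIAL_TTLS:
        if observed_ttl <= ttl0:
            return ttl0
    raise AssertionError("unreachable: 255 covers every valid TTL")

def hop_count(observed_ttl: int) -> int:
    """Hop_count(H,D) = TTL0 - TTL1 + 1, with TTL1 the TTL seen in the reply."""
    return guess_initial_ttl(observed_ttl) - observed_ttl + 1

print(hop_count(52))   # reply TTL 52 suggests TTL0 = 64
print(hop_count(243))  # reply TTL 243 suggests TTL0 = 255
```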
90 Improvement with known first AS hop TotalMatch lengthImprovement AS7018 (tier-1) %3% AS2152 (tier-2) %12% AS8121 (tier-3) %21% All BGP gateways %8% US BGP gateways %15%
91 Possible causes of inaccuracy Complicated AS relationships: 15% paths Two consecutive FLAT links DOWN link followed by a FLAT link FLAT link followed by UP link Routing policies Shortest path vs. customer routes Inconsistent advertisement to different peering locations BGP tie-breaking rules AS prepending:>28% ASes
Part III: BGP Measurement
93 BGP routing updates Route updates at prefix level No activity in “steady state” Routing messages indicate changes, no refreshes
94 Internet routing instability Large # of BGP updates Failures Policy changes Redundant messages Routing instability Route keeps changing, e.g., routes keep going up and down
95 Implications Router overhead Transient delay and loss Unreachable hosts High loss rate High jitter Long delays Significant packet reordering Poor predictability of traffic flow How do we know if the instability is due to routing or network congestion?
96 Measure BGP stability First work by Labovitz et al. Methodology Collect routing messages from five public exchange points BGP information considered AS path Next hop: next hop to reach a network Two routes are the same if they have the same AS path and next hop Other attributes (e.g., MED, communities) ignored Focus on forwarding path stability
97 Measurement methodology
98 BGP information exchange Announcements: a router has either Learned of a new route, or Made a policy decision that it prefers a new route Withdrawals: a router concludes that a network is no longer reachable Explicit: associated to the withdrawal message Implicit: (in effect an announcement) when a route is replaced as a result of an announcement message In steady state BGP updates should be only the result of infrequent policy changes BGP is stateful, requires no refreshes Update rate: indication of network stability
99 Example of delayed convergence Example topology: d Assuming node 1 has a route to a destination, and it withdraws the route: Stage (msg processed)Msg queued 0: 1->{2,3,4}W 1: 1->{2,3,4}W2->{3,4}A[241], 3->{2,4}A[341], 4->{2,3}A[431] 0 2: [1] 3: [1] 4: [1] 1 [41] [31] 2: 2->{3,4}A[241] 3->{2,4}A[341], 4->{2,3}A[431] 3: 3->{2,4}A[341]4->{2,3}A[431], 4->{2,3}W 4: 4->{2,3}A[431] 4 [431] [241] -- MinRouteAdver timer expires:4->{2,3}W, 3->{2,4}A[3241], 2->{3,4}A[2431] … (omitted) Note: In response to a withdrawal from 1, node 3 sends out 3 messages: 3->{2,4}A[341], 3->{2,4}A[3241], 3->{2,4}W 9: 3->{2,4}W stage node 9 --
100 Types of inter-domain routing updates Forwarding instability may reflect topology changes Policy fluctuations (routing instability) may reflect changes in routing policy information Pathological updates redundant updates that are neither routing nor forwarding instability Instability forwarding instability and policy fluctuation change forwarding path
101 Routing successive events (instability) WADiff W: a route is explicitly withdrawn as it becomes unreachable A: is later replaced with an alternative route Forwarding instability AADiff A: a route is implicitly withdrawn A: then replaced by an alternative route as the original route becomes unavailable or a new preferred route becomes available Forwarding instability
102 Routing successive events (pathological instability) WADup W: a route is explicitly withdrawn A: then reannounced later forwarding instability or pathological behavior AADup A: a route is implicitly withdrawn A: then replaced with a duplicate of the original route pathological behavior or policy fluctuation WWDup The repeated transmission of BGP withdrawals for a prefix that is currently unreachable (pathological behavior)
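The pairwise update taxonomy on the two slides above can be sketched as a small classifier over the update stream for one prefix from one peer. Following the methodology slide, a route is the pair (AS path, next hop). The "Other" label for an announcement followed by a withdrawal is an assumption of this sketch, since the slides do not name that pair.

```python
def classify_updates(updates):
    """Label each update after the first by its relation to earlier updates.

    updates: list of ("A", route) or ("W", None), where route is
    (as_path, next_hop).
    """
    labels = []
    last_route = None   # most recently announced route, kept across withdrawals
    prev_kind = None
    for kind, route in updates:
        if prev_kind is not None:
            if kind == "W":
                labels.append("WWDup" if prev_kind == "W" else "Other")
            else:
                suffix = "Dup" if route == last_route else "Diff"
                prefix = "WA" if prev_kind == "W" else "AA"
                labels.append(prefix + suffix)
        if kind == "A":
            last_route = route
        prev_kind = kind
    return labels

# Hypothetical routes: (AS path, next hop).
R1 = ((7018, 3), "nh1")
R2 = ((1239, 3), "nh2")
STREAM = [("A", R1), ("A", R2), ("W", None), ("W", None), ("A", R2), ("A", R2)]
print(classify_updates(STREAM))
```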
103 Measurement findings: overview Year 2000 BGP updates more than one order of magnitude larger than expected Routing information dominated by pathological updates Implementation problems BGP self-synchronization Unconstrained routing policies
104 Routing problem findings Implementation problems Redundant updates Routers do not maintain the history of the announcements sent to neighbors Self-synchronization BGP routers exchange information simultaneously may lead to periodic link/router failures Unconstrained routing policies may lead to persistent route oscillations
105 Instability measurement Instability and redundant updates exhibit strong correlation with load (30-second, 24-hour, and seven-day periods) Instability usually exhibits high frequency Pathological updates exhibit both high and low frequencies
106 Non-localized instability No single AS dominates instability statistics No correlation between size of AS and its impact on instability statistics There is no small set of paths that dominate instability statistics
107 Measurement conclusions Routing in the Internet exhibits many undesirable behaviors Instability over a wide range of time scales Asymmetric routes Network outages Problem seems to worsen Many problems are due to software bugs or inefficient router architectures
108 Lessons Even after decades of experience, routing in the Internet is not a solved problem This attests to the difficulty and complexity of building distributed algorithms in the Internet, i.e., in a heterogeneous environment with products from various vendors Simple protocols are more likely to be understood and implemented correctly
109 Better understanding of BGP dynamics Difficulties Multiple administrative domains Unknown information (policies, topologies) Unknown operational practices Ambiguous protocol specs Proposal: a controlled active measurement infrastructure for continuous BGP monitoring – BGP Beacons.
110 What is a BGP Beacon? An unused, globally visible prefix with known Announce/Withdrawal schedule For long-term, public use
111 Who will benefit from BGP Beacon? Researchers: study BGP dynamics To calibrate and interpret BGP updates To study convergence behavior To analyze routing and data plane interaction Network operators Serve to debug reachability problems Test effects of configuration changes: E.g., flap damping setting
112 Related work Differences from Labovitz’s “BGP fault-injector” Long-term, publicly documented Varying advertisement schedule Beacon sequence number (AGG field) Enabler for much research in routing dynamics RIPE RIS Beacons Set up at 9 exchange points
113 Internet Active measurement infrastructure BGP Beacon # /24 1:Oregon RouteViews Stub AS Upstream provider Upstream provider ISP Many Observation points: 2. RIPE ISP 6.Berkeley 4. Verio 3.AT&T 5. MIT Send route update
114 Deployed PSG Beacons PrefixSrc AS Start date Upstream provider AS Beacon host Beacon location / /10/022914, 1239Randy BushWA, US / /4/023701, 2914Dave MeyerOR, US / /25/021221Geoff HustonAustralia / /24/022914, 8001Andrew PartanMD, US / /12/032914, 1239Randy BushWA, US
115 Deployed PSG Beacons B1, 2, 3, 5: Announced and withdrawn with a fixed period (2 hours) between updates 1st daily ANN: 3:00AM GMT 1st daily WD: 1:00AM GMT B4: varying period B5: fail-over experiments Software available at:
116 Beacon 5 schedule Live host behind the beacon for data analysis Study fail-over Behavior for multi-homed customers
117 Beacon terminology Beacon prefix: /24 Input signal: Beacon-injected change 3:00:00 GMT: Announce (A0) 5:00:00 GMT: Withdrawal (W) Output signal: 5:00:10 A1 5:00:40 W 5:01:10 A2 Signal length: number of updates in output signal (3 updates) Signal duration: time between first and last update in the signal (5:00:10 to 5:01:10, 60 seconds) Inter-arrival time: time between consecutive updates [Figure: Beacon AS, Internet, observation points RouteView and AT&T]
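The three output-signal metrics can be computed directly from update timestamps. A minimal sketch, using the three output updates from the slide (5:00:10, 5:00:40, 5:01:10); the function name and dictionary keys are hypothetical.

```python
from datetime import datetime

def signal_stats(timestamps):
    """Compute signal length, duration, and inter-arrival times in seconds."""
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    return {
        "length": len(times),                                   # number of updates
        "duration": (times[-1] - times[0]).total_seconds(),     # first to last update
        "inter_arrival": gaps,                                  # consecutive gaps
    }

stats = signal_stats(["05:00:10", "05:00:40", "05:01:10"])
print(stats)
```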
118 Process Beacon data Identify output signals, ignore external events Data cleaning Anchor prefix as reference Same origin AS as beacon prefix Statically nailed down Minimize interference between consecutive input signals Beacon period is set to 2 hours Time stamp and sequence number Attach additional information in the BGP updates Make use of a transitive attribute: Aggregator fields
119 Beacon data cleaning process Goal Clearly identify updates associated with injected routing change Discard beacon events influenced by external routing changes
120 Cumulative Beacon statistics: significant noise Current observation points: 111 peers: RIPE, Route-View, Berkeley, MIT, MIT-RON nodes, ATT-Research, AT&T, AMS-IXP, Verio Avg expansion: 2*0.2+1*0.8=1.2
121 Cumulative Beacon statistics: significant noise Example response to ANN-beacon at peer p R1: ASpath= R2: ASpath= 100 events: 20: R1 R2, 80: R2 BeaconMax no. transient routes Max ANN- out-signal length Max WD- out-signal length Max ANN-avg expansion Max WD-avg expansion Out-signal length=1No. transient routes=2
122 Cisco vs. Juniper update rate-limiting Known last-hop Cisco and Juniper routers from the same AS and location Average signal length: average number of updates observed for a single beacon-injected change
123 “Cisco-like” last-hop routers Linear increase in signal duration (sec) with respect to signal length Slope = 30 seconds Due to Cisco’s default rate-limiting setting
124 (sec) “Juniper-like” last-hop routers Signal duration relatively stable wrt increase in signal length Shorter signal duration compared to “Cisco-like” last-hops
125 Route flap damping A mechanism to punish unstable routes by suppressing them Reduce router processing load due to instability Prevent sustained routing oscillations Do not sacrifice convergence times for well-behaved routes It has been conjectured that a single announcement can cause route suppression.
126 RFC2439: Route flap damping Scope Inbound external routes Per neighbor, per destination Penalty Flap: route change Increases for each flap Decays exponentially [Figure: exponentially decayed penalty vs. time (min), Cisco default setting: suppress threshold 2000, reuse threshold 750]
127 Route flap damping analysis Strong evidence for withdrawal- and announcement-triggered suppression.
128 Distinguish between announcement and withdrawal Summary: WD-triggered suppression is more likely than ANN-triggered suppression Cisco: overall more likely to trigger suppression than Juniper (AAAW pattern) Juniper: more aggressive for AWAW pattern
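The exponentially decayed penalty can be sketched numerically. The suppress (2000) and reuse (750) thresholds come from the slide; the per-flap penalty of 1000 and the 15-minute half-life are assumed here as the commonly cited Cisco defaults and are not stated in the source. The penalty ceiling and maximum suppress time are not modeled.

```python
import math

SUPPRESS_THRESHOLD = 2000.0   # from the slide
REUSE_THRESHOLD = 750.0       # from the slide
PER_FLAP_PENALTY = 1000.0     # assumption: commonly cited Cisco default
HALF_LIFE_MIN = 15.0          # assumption: commonly cited Cisco default

def penalty_at(t_min, flap_times_min):
    """Sum of per-flap penalties, each halved every HALF_LIFE_MIN minutes."""
    return sum(
        PER_FLAP_PENALTY * 0.5 ** ((t_min - tf) / HALF_LIFE_MIN)
        for tf in flap_times_min
        if tf <= t_min
    )

def minutes_until_reuse(penalty):
    """Time for a penalty to decay to the reuse threshold (0 if already below)."""
    if penalty <= REUSE_THRESHOLD:
        return 0.0
    return HALF_LIFE_MIN * math.log2(penalty / REUSE_THRESHOLD)

# Three flaps one minute apart push the penalty past the suppress threshold.
p = penalty_at(2.0, [0.0, 1.0, 2.0])
print(round(p), p > SUPPRESS_THRESHOLD, round(minutes_until_reuse(p), 1))
```

Under these assumed parameters a suppressed route stays unusable for tens of minutes after the last flap, which is why damping can delay convergence for routes that flap only briefly.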
129 Convergence analysis Summary: Withdrawals converge slower than announcements Most beacon events converge within 3 minutes
130 Output signal duration
131 Beacon 1’s upstream change Single-homed (AS2914) Multi-homed (AS1,2914) Multi-homed (AS1239, 2914)
132 Beacon for identifying router behavior Beacon 2 seen from RouteView data Rate-limiting timer: 30 seconds Different rate-limiting behavior: Cisco vs. Juniper
133 Inter-arrival time analysis
134 Inter-arrival time modeling Geometric distribution (body): Update rate-limiting behavior: every 30 sec Prob(missing update train) independent of how many already missed Mass at 1: Discretization of timestamps for times<1 Shifted exponential distribution (tail): Most likely due to route flap damping
135 Motivation C BR C C C AS1 AS2AS3 destination A B C D Failure Disruption Congestion Mitigation AS4 source A backbone network is vulnerable to routing changes that occur in other domains.
136 Goal Identify important routing anomalies Lost reachability Persistent flapping Large traffic shifts Contributions: Build a tool to identify a small number of important routing disruptions from a large volume of raw BGP updates in real time. Use the tool to characterize routing disruptions in an operational network
137 Capturing Routing Changes C BR C CPE BGP Monitor C BR C C C C C C C C C iBGP eBGP Updates Best routes A large operational network (8/16/2004 – 10/ )
138 Challenges Large volume of BGP updates Millions daily, very bursty Too much for an operator to manage Different from root-cause analysis Identify changes and their effects Focus on actionable events rather than diagnosis Diagnose causes in/near the AS
139 System Architecture BGP Updates (10^6) -> BGP Update Grouping -> Events (10^5) and Persistent Flapping Prefixes (10^1) -> Event Classification -> “Typed” Events -> Event Correlation -> Clusters (10^3) and Frequent Flapping Prefixes (10^1) -> Traffic Impact Prediction (with Netflow Data) -> Large Disruptions (10^1) From millions of updates to a few dozen reports
140 Grouping BGP Updates into Events Challenge: A single routing change leads to multiple update messages and affects routing decisions at multiple routers Approach: Group together all updates for a prefix with inter-arrival < 70 seconds Flag prefixes with changes lasting > 10 minutes. [Figure: BGP Updates -> BGP Update Grouping -> Events, Persistent Flapping Prefixes]
141 Grouping Thresholds Based on our understanding of BGP and data analysis Event timeout: 70 seconds (2 * MRAI timer + 10 seconds; 98% of inter-arrival times < 70 seconds) Convergence timeout: 10 minutes (BGP usually converges within a few minutes; 99.9% of events < 10 minutes)
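The grouping rule can be sketched offline in a few lines. This is a minimal illustration under stated assumptions, not the tool's implementation: the input format (a list of `(timestamp, prefix)` pairs), the function name `group_updates`, and the example prefixes are all hypothetical, and the real system works online rather than on a stored list.

```python
from collections import defaultdict

EVENT_TIMEOUT = 70          # seconds: 2 * MRAI (30 s) + 10 s, as on the slide
CONVERGENCE_TIMEOUT = 600   # seconds: 10 minutes, as on the slide


def group_updates(updates):
    """Group (timestamp, prefix) updates into per-prefix events.

    Returns (events, flapping): events is a list of (prefix, start, end);
    flapping is the set of prefixes with an event lasting over 10 minutes.
    The input format is an assumption made for this sketch.
    """
    by_prefix = defaultdict(list)
    for ts, prefix in updates:
        by_prefix[prefix].append(ts)

    events = []
    for prefix, stamps in by_prefix.items():
        stamps.sort()
        start = prev = stamps[0]
        for ts in stamps[1:]:
            if ts - prev >= EVENT_TIMEOUT:   # a long gap closes the event
                events.append((prefix, start, prev))
                start = ts
            prev = ts
        events.append((prefix, start, prev))

    flapping = {p for p, start, end in events
                if end - start > CONVERGENCE_TIMEOUT}
    return events, flapping


updates = [(0, "10.0.0.0/8"), (30, "10.0.0.0/8"), (200, "10.0.0.0/8")]
updates += [(t, "198.51.100.0/24") for t in range(0, 700, 60)]
events, flapping = group_updates(updates)
```

In this example the first prefix yields two events (the 170-second gap splits them), while the second prefix yields one 660-second event and is flagged as persistently flapping.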
142 Persistent Flapping Prefixes Types of persistent flapping Conservative damping parameters (78.6%) Protocol oscillations due to MED (18.3%) Unstable interfaces or BGP sessions (3.0%) A surprising finding: 15.2% of updates were caused by persistent-flapping prefixes even though flap damping is enabled.
143 Example: Unstable eBGP Session [Figure: ISP, Peer, Customer; routers E_A, E_B, E_C, E_D; prefix p] Flap damping parameters are session-based Damping is not implemented for iBGP sessions
144 Event Classification Challenge: Major concerns in network management: changes in reachability; heavy load of routing messages on the routers; change of flow of the traffic through the network Solution: classify events by severity of their impact [Figure: Events → Event Classification → “Typed” Events, e.g., Loss/Gain of Reachability]
145 Event Category – “No Disruption” [Figure: ISP with routers E_A–E_E, neighbors AS 1 and AS 2, prefix p; no traffic shift] “No Disruption”: no border router has any traffic shift. (50.3%)
146 Event Category – “Internal Disruption” [Figure: ISP with routers E_A–E_E, neighbors AS 1 and AS 2, prefix p; internal traffic shift] “Internal Disruption”: all traffic shifts are internal. (15.6%)
147 Event Category – “Single External Disruption” [Figure: ISP with routers E_A–E_E, neighbors AS 1 and AS 2, prefix p; external traffic shift] “Single External Disruption”: only one of the traffic shifts is external. (20.7%)
148 Statistics on Event Classification
Category                       Events   Updates
No Disruption                  50.3%    48.6%
Internal Disruption            15.6%     3.4%
Single External Disruption     20.7%     7.9%
Multiple External Disruption    7.4%    18.2%
Loss/Gain of Reachability       6.0%    21.9%
First 3 categories have significant day-to-day variations
The number of updates per event depends on the type of event and the number of affected routers
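The five categories can be read as a decision rule over the per-router traffic shifts of one event. The sketch below is an assumption-laden illustration: it takes the per-router shift labels (`None`, `"internal"`, `"external"`) and the reachability flags as given inputs, and the function name `classify_event` is hypothetical. How the tool derives those labels from routes is not shown in these slides.

```python
def classify_event(shifts, reachable_before=True, reachable_after=True):
    """Classify one event from its per-border-router traffic shifts.

    shifts: one entry per border router: None, "internal", or "external"
    (a hypothetical encoding chosen for this sketch).
    """
    if reachable_before != reachable_after:
        return "Loss/Gain of Reachability"
    external = sum(1 for s in shifts if s == "external")
    internal = sum(1 for s in shifts if s == "internal")
    if external == 0 and internal == 0:
        return "No Disruption"
    if external == 0:
        return "Internal Disruption"      # every shift is internal
    if external == 1:
        return "Single External Disruption"
    return "Multiple External Disruption"
```

For example, `classify_event([None, "internal", "external"])` falls into the single-external category because exactly one shift is external.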
149 Event Correlation Challenge: A single routing change affects multiple destination prefixes Solution: group events of the same type that occur close together in time [Figure: “Typed” Events → Event Correlation → Clusters]
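One simple way to realize "same type, close in time" is to chain events of the same type whose start times fall within a window of each other. This is only a sketch: the 60-second `window`, the `(start_time, event_type, prefix)` tuple format, and the function name `correlate` are assumptions for illustration, and the actual tool may correlate on additional attributes.

```python
def correlate(events, window=60):
    """Cluster same-type events whose start times chain within `window` seconds.

    events: list of (start_time, event_type, prefix) tuples (hypothetical format).
    Returns a list of clusters, each a list of events in time order.
    """
    clusters = []
    latest = {}  # event_type -> index of the most recent cluster of that type
    for ev in sorted(events):
        start, etype, _prefix = ev
        idx = latest.get(etype)
        if idx is not None and start - clusters[idx][-1][0] <= window:
            clusters[idx].append(ev)      # close enough: extend the cluster
        else:
            clusters.append([ev])         # otherwise start a new cluster
            latest[etype] = len(clusters) - 1
    return clusters
```

A different window, or a rule that also compares the routers involved, would change the clusters; the point is only that many per-prefix events collapse into few clusters.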
150 eBGP Session Reset Caused most of the “single external disruption” events Detection: check whether the number of prefixes using that session as the best route changes dramatically Validation with Syslog router report (95%) [Figure: number of prefixes over time, with session failure and session recovery marked]
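The prefix-count check can be sketched as a scan over periodic samples. Everything specific here is an assumption: the slides give no numeric definition of "dramatically", so `drop_fraction=0.9`, the `(time, count)` sample format, and the name `find_session_resets` are hypothetical.

```python
def find_session_resets(samples, drop_fraction=0.9):
    """Flag session failures and recoveries from per-session prefix counts.

    samples: list of (time, number of prefixes whose best route uses the session).
    drop_fraction is a hypothetical threshold: a failure is a drop of at least
    that fraction between consecutive samples; a recovery is a return to at
    least that fraction of the pre-failure count.
    """
    failures, recoveries = [], []
    baseline = None   # count just before the failure, set while the session is down
    prev = None
    for t, n in samples:
        if baseline is None:
            if prev is not None and prev > 0 and n <= prev * (1 - drop_fraction):
                failures.append(t)
                baseline = prev
        elif n >= baseline * drop_fraction:
            recoveries.append(t)
            baseline = None
        prev = n
    return failures, recoveries
```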
151 Hot-Potato Changes Caused “internal disruption” events Validation with OSPF measurement (95%) [Teixeira et al – SIGMETRICS’ 04] “Hot-potato routing” = route to closest egress point [Figure: ISP with routers E_A, E_B, E_C and destination P]
152 Traffic Impact Prediction Challenge: Routing changes have different impacts on the network, depending on the popularity of the affected destinations Solution: weigh each cluster by traffic volume [Figure: Clusters and Netflow Data → Traffic Impact Prediction → Large Disruptions]
153 Traffic Impact Prediction Traffic weight: per-prefix measurement from Netflow; 10% of prefixes account for 90% of traffic Traffic weight of a cluster: the sum of the “traffic weights” of its prefixes A small number of large clusters have large traffic weight, mostly session resets and hot-potato changes
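The cluster weight is a plain sum over the cluster's prefixes. In the sketch below, the per-prefix weights are assumed to be fractions of total traffic taken from Netflow; the function name, the dictionary format, and the example values are hypothetical.

```python
def cluster_weight(cluster_prefixes, prefix_weight):
    """Sum the traffic weights of the distinct prefixes in a cluster.

    prefix_weight maps prefix -> fraction of total traffic (assumed to come
    from Netflow); prefixes with no measured traffic count as 0.
    """
    return sum(prefix_weight.get(p, 0.0) for p in set(cluster_prefixes))


# Hypothetical weights: a few popular prefixes carry most of the traffic.
weights = {"a": 0.5, "b": 0.25, "c": 0.125}
clusters = {"session reset": ["a", "b"], "small change": ["c", "x"]}
ranked = sorted(clusters,
                key=lambda name: cluster_weight(clusters[name], weights),
                reverse=True)
```

Ranking clusters by this weight is what lets the final report list only the few disruptions that affect a large share of traffic.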
154 Performance Evaluation Memory Static memory: “current routes”, 600 MB Dynamic memory: “clusters”, 300 MB Speed 99% of 1-second intervals of updates can be processed within 1 second Occasional execution lag Every 70-second interval of updates can be processed within 70 seconds Measurements were based on a 900 MHz CPU
155 Conclusion of BGP Troubleshooting Tool BGP troubleshooting system Fast, online fashion Operators’ concerns (reachability, flapping, traffic) Significant information reduction: millions of updates → a few dozen large disruptions Uncovered important network behavior Hot-potato changes Session resets Persistent-flapping prefixes
Part IV: BGP Modeling
157 BGP Is Not Guaranteed to Converge! BGP is not guaranteed to converge to a stable routing. Policy inconsistencies can lead to “livelock” protocol oscillations. Goal: Design a simple, tractable, and complete model of BGP Example application: a sufficient condition to guarantee convergence.
158 BGP is Solving What Problem? Underlying problem → distributed means of computing a solution: Shortest Paths → RIP, OSPF, IS-IS; X? → BGP Knowing X can aid in the design of policy analysis algorithms and heuristics, aid in the analysis and design of BGP and extensions, help explain some BGP routing anomalies, and provide a fun way of thinking about the protocol
159 Separate Dynamic and Static Semantics Static semantics: BGP policies, modeled by the Stable Paths Problem (SPP) Dynamic semantics: BGP, modeled by SPVP SPVP: Simple Path Vector Protocol, a distributed algorithm for solving the Stable Paths Problem
160 What is the Stable Paths Problem? An instance consists of: a graph of nodes and edges; node 0, called the origin; for each non-zero node, a set of permitted paths to the origin (this set always contains the “null path”); a ranking of permitted paths at each node (the null path is always least preferred) [Figure: example instance; each node’s permitted paths are listed from most preferred to least preferred (not null)]
161 A Solution to SPP A solution is an assignment of permitted paths to each node such that (1) node u’s assigned path is either the null path or a path uwP, where wP is assigned to node w and {u,w} is an edge in the graph, and (2) each node is assigned the highest ranked path among those consistent with the paths assigned to its neighbors
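The two conditions translate directly into a stability check. The sketch below is an illustration, not part of the slides: paths are encoded as tuples of node numbers, the null path as the empty tuple, and each node's permitted paths as a list ordered from most to least preferred. The example instance is the well-known DISAGREE gadget from the SPP literature, with preferences supplied here as an assumption because the figure is not reproduced in this text.

```python
ORIGIN = 0
NULL = ()   # the null path (no route to the origin)


def best_choice(u, permitted, assignment):
    """Highest-ranked permitted path of u consistent with its neighbors' paths."""
    for path in permitted[u]:                 # most preferred first
        next_hop, rest = path[1], path[1:]
        # u can use path u w P only if w is the origin or w is assigned w P.
        if rest == (ORIGIN,) or assignment.get(next_hop) == rest:
            return path
    return NULL


def is_solution(permitted, assignment):
    """True if every node holds its best path given the others' assignments."""
    return all(assignment[u] == best_choice(u, permitted, assignment)
               for u in permitted)


# DISAGREE (assumed preferences): each node prefers going through the other.
DISAGREE = {1: [(1, 2, 0), (1, 0)],
            2: [(2, 1, 0), (2, 0)]}
```

Under these preferences, `{1: (1, 2, 0), 2: (2, 0)}` and `{1: (1, 0), 2: (2, 1, 0)}` both pass the check, while the assignment in which both nodes use their direct paths does not.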
162 A Solution to SPP A solution need not represent a shortest path tree or a spanning tree
163 There can be Multiple Solutions to an SPP First solution Second solution DISAGREE
164 Multiple Solutions Can Occur Due to Recovery: Remove primary link, then restore primary link [Figure labels: primary link, backup link]
165 Ranking BGP Paths 1. Highest local preference 2. Shortest AS path length 3. Lowest origin type: IGP < EGP < INCOMPLETE 4. Lowest MED value 5. eBGP preferred over iBGP 6. Lowest IGP cost 7. Tie breaking
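The ranking steps behave like a lexicographic comparison, which a sort key expresses compactly. This is a simplified sketch: the `Route` fields and the use of a router ID as the final tie-breaker are assumptions for illustration, eBGP-learned routes are preferred over iBGP-learned ones as in the standard BGP decision process, and the key compares MED across all routes, whereas MED is normally compared only between routes from the same neighboring AS.

```python
from dataclasses import dataclass

ORIGIN_RANK = {"IGP": 0, "EGP": 1, "INCOMPLETE": 2}   # lower is preferred


@dataclass(frozen=True)
class Route:
    # Hypothetical record; field names are chosen for this sketch.
    local_pref: int
    as_path: tuple
    origin: str
    med: int
    ebgp: bool        # True if learned over eBGP, False if over iBGP
    igp_cost: int
    router_id: int    # assumed final tie-breaker


def preference_key(r):
    """Smaller key = more preferred route."""
    return (-r.local_pref,           # highest local preference
            len(r.as_path),          # shortest AS path
            ORIGIN_RANK[r.origin],   # IGP < EGP < INCOMPLETE
            r.med,                   # lowest MED (simplified: compared globally)
            0 if r.ebgp else 1,      # eBGP over iBGP
            r.igp_cost,              # lowest IGP cost
            r.router_id)             # tie breaking


def best_route(routes):
    return min(routes, key=preference_key)
```

Because local preference is compared first, a longer AS path with a higher local preference still wins, which is how policy overrides path length.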
166 Bad Gadget: No Solution Stage 1: 1: (1 0), 2: (2 1 0), 3: (3 0) Stage 2: 1: (1 3 0), 2: (2 0), 3: (3 2 0) Back to stage 1
167 Bad Gadget: No Solution Stage 1: 1: (1 0), 2: (2 0), 3: (3 2 0) Stage 2: 1: (1 3 0), 2: (2 1 0), 3: (3 0) Back to stage 1
168 Has A Solution, But Can Get Trapped: As with DISAGREE, this part has two distinct solutions This part has a solution only when node 1 is assigned the direct path (1 0).
170 How To Solve An SPP? Exponential complexity Just enumerate all path assignments, And check stability of each…. NP-complete 3-SAT can be reduced to SPP
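The enumerate-and-check approach can be sketched for tiny instances. The encoding is an assumption of this sketch: paths are tuples of node numbers, the null path is the empty tuple, and permitted paths are listed from most to least preferred. The BAD GADGET and DISAGREE preferences below are the standard ones from the SPP literature, supplied here because the figures are not reproduced in this text.

```python
from itertools import product

ORIGIN, NULL = 0, ()


def best_choice(u, permitted, assignment):
    """Highest-ranked permitted path of u consistent with its neighbors' paths."""
    for path in permitted[u]:
        next_hop, rest = path[1], path[1:]
        if rest == (ORIGIN,) or assignment.get(next_hop) == rest:
            return path
    return NULL


def solutions(permitted):
    """Try every path assignment and keep the stable ones (exponential time)."""
    nodes = sorted(permitted)
    found = []
    for combo in product(*[permitted[u] + [NULL] for u in nodes]):
        assignment = dict(zip(nodes, combo))
        if all(assignment[u] == best_choice(u, permitted, assignment)
               for u in nodes):
            found.append(assignment)
    return found


# Assumed preferences: each node prefers the path through its neighbor.
BAD_GADGET = {1: [(1, 3, 0), (1, 0)],
              2: [(2, 1, 0), (2, 0)],
              3: [(3, 2, 0), (3, 0)]}
DISAGREE = {1: [(1, 2, 0), (1, 0)],
            2: [(2, 1, 0), (2, 0)]}
```

With these instances, `solutions(BAD_GADGET)` is empty and `solutions(DISAGREE)` contains two assignments. The number of assignments grows as the product of the per-node path counts, which is why this only works for toy examples.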
171 Distributed Algorithms to Solve SPP OSPF-like: Distribute topology, path ranks Solve SPP locally Exponential worst case How to avoid loops if multiple solutions exist? RIP-like: Pick the best path from neighbors’ paths Tell neighbors about changes Can diverge Not guaranteed to find a solution even if it exists No bound on convergence time
172 SPVP Protocol Pick the best path available at any time
process spvp[u] {
  receive P from w {
    rib-in(u w) := u P
    if rib(u) != best(u) {
      rib(u) := best(u)
      foreach v in peers(u) {
        send rib(u) to v
      }
    }
  }
}
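A toy simulation shows the protocol's behavior on small instances. This sketch simplifies SPVP considerably: instead of messages and per-neighbor rib-in tables, each node reads its neighbors' current paths directly, and nodes are activated in a fixed round-robin order. The tuple encoding of paths and the BAD GADGET and DISAGREE preferences (the standard ones from the SPP literature) are assumptions made for illustration.

```python
ORIGIN, NULL = 0, ()


def best_choice(u, permitted, rib):
    """Highest-ranked permitted path of u consistent with neighbors' current paths."""
    for path in permitted[u]:                 # most preferred first
        next_hop, rest = path[1], path[1:]
        if rest == (ORIGIN,) or rib.get(next_hop) == rest:
            return path
    return NULL


def run_spvp(permitted, max_rounds=20):
    """Round-robin activation. Returns (status, rib), where status is
    "converged", "oscillates" (a state repeated), or "undecided"."""
    nodes = sorted(permitted)
    rib = {u: NULL for u in nodes}
    seen = set()
    for _ in range(max_rounds):
        changed = False
        for u in nodes:
            new = best_choice(u, permitted, rib)
            if new != rib[u]:
                rib[u] = new                  # a real node would now notify its peers
                changed = True
        if not changed:
            return "converged", rib
        state = tuple(rib[u] for u in nodes)
        if state in seen:                     # deterministic schedule: it will loop forever
            return "oscillates", rib
        seen.add(state)
    return "undecided", rib


BAD_GADGET = {1: [(1, 3, 0), (1, 0)],
              2: [(2, 1, 0), (2, 0)],
              3: [(3, 2, 0), (3, 0)]}
DISAGREE = {1: [(1, 2, 0), (1, 0)],
            2: [(2, 1, 0), (2, 0)]}
```

On BAD GADGET the simulation alternates between the assignments 1: (1 0), 2: (2 1 0), 3: (3 0) and 1: (1 3 0), 2: (2 0), 3: (3 2 0) and reports an oscillation. On DISAGREE this particular schedule converges to one of the two solutions; other activation orders can behave differently.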
173 SPVP and SPP SPVP wanders around assignment space [Figure: regions labeled “SPP solvable” and “SPVP can diverge”; labels: must converge, must diverge]
174 A sufficient condition for sanity If an instance of SPP has an acyclic dispute digraph, then: Static (SPP): solvable; unique solution; all sub-problems uniquely solvable Dynamic (SPVP): safe (can’t diverge); predictable restoration; robust with respect to link/node failures
175 Dispute Digraph Example BAD GADGET II CYCLE
176 Dispute Wheels [Figure: wheel with nodes u_0, u_1, u_2, …, u_i, u_(i+1), …, u_k; paths Q_0, Q_1, Q_2, …, Q_i, Q_(i+1), …, Q_k; paths R_0, R_1, …, R_i, …, R_k] At u_i, rank of Q_i is less than or equal to rank of R_i Q_(i+1) There exists a dispute wheel iff there exists a cycle in the dispute digraph
177 Dispute Wheel Example
178 A Dynamic Solution Extend SPVP with a history attribute, A route’s history contains a path in the dispute digraph that “explains” how the route was obtained, A route history will contain a dispute cycle if and only if a policy dispute is dynamically realized. If a route’s history contains a cycle, then suppress it ….