Understanding and Limiting BGP Instabilities. Zhi-Li Zhang, Jaideep Chandrashekar, Kuai Xu
BGP: Internet Glue. A “path-vector” routing protocol. Allows networks to tell other networks about the destinations they are “responsible” for and how to reach them, using “route advertisements”, also called “NLRI” (“network-layer reachability information”).
BGP: Internet Glue (cont’d). Policy-based: allows ISPs to richly express their routing policy, both in selecting outbound paths and in announcing internal routes. A relatively “simple” protocol, but configuration is complex, and the entire world can see, and be impacted by, misconfigurations.
ASes & AS Numbers (ASNs). An autonomous system is an independent routing domain that has been assigned an Autonomous System Number (ASN). Currently over 15,000 are in use; a range of ASNs is reserved as “private”. Examples: AS 57 (U of Minnesota GigaPoP), AS 217 (U of Minnesota), AS 701 (UUNET), AS 1239 (Sprint). ASNs represent atoms of BGP routing policy.
Internet Connectivity of University of Minnesota. [Figure: AS-level connectivity diagram showing AS 1 (Genuity), AS 57 (UMN GigaPoP), AS 7911 (Wiltel), Internet2, AS 217 (UMN), and AS 1998 (State of Minnesota), with the university's /16 prefix.]
Architecture of Internet Routing. EGP = Exterior Gateway Protocol, run between ASes (AS 1, AS 2); policy based: BGP. IGP = Interior Gateway Protocol, run inside an AS; metric based: OSPF, IS-IS, RIP.
Simplified BGP Operations. 1. Establish a session on TCP port 179. 2. Exchange all active routes. 3. Exchange incremental updates: while the connection is ALIVE, exchange route UPDATE messages. [Figure: BGP session between AS1 and AS2.]
Types of BGP Messages. Open: establishes a peering session. Keep Alive: handshake at regular intervals. Notification: shuts down a peering session. Update: announces new routes or withdraws previously announced routes. Announcement: prefix + attribute values. Withdrawal: prefix only.
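The split of an Update into announcements (prefix + attributes) and withdrawals (prefix only) can be sketched as follows. This is a minimal illustration, not a wire-format implementation: the `Update` class, the `apply_update` helper, and the dictionary used as a routing table are assumptions made for this example, and the prefixes are documentation addresses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Update:
    """Hypothetical in-memory view of one BGP UPDATE message."""
    announced: List[str] = field(default_factory=list)          # prefixes being announced
    attributes: Dict[str, object] = field(default_factory=dict)  # shared by all announced prefixes
    withdrawn: List[str] = field(default_factory=list)           # prefixes only, no attributes


def apply_update(table: Dict[str, Dict[str, object]], update: Update) -> None:
    """Apply one incremental update to a prefix -> attributes table."""
    for prefix in update.withdrawn:
        table.pop(prefix, None)              # a withdrawal removes the route
    for prefix in update.announced:
        table[prefix] = dict(update.attributes)  # an announcement installs or replaces it


table: Dict[str, Dict[str, object]] = {}
apply_update(table, Update(announced=["192.0.2.0/24", "198.51.100.0/24"],
                           attributes={"AS_PATH": [1239, 6341], "ORIGIN": 0}))
apply_update(table, Update(withdrawn=["198.51.100.0/24"]))
print(sorted(table))
```

After the initial exchange of all active routes, only messages of this kind are needed to keep the two tables in step.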
BGP Attributes
Value  Code                       Reference
1      ORIGIN                     [RFC1771]
2      AS_PATH                    [RFC1771]
3      NEXT_HOP                   [RFC1771]
4      MULTI_EXIT_DISC            [RFC1771]
5      LOCAL_PREF                 [RFC1771]
6      ATOMIC_AGGREGATE           [RFC1771]
7      AGGREGATOR                 [RFC1771]
8      COMMUNITY                  [RFC1997]
9      ORIGINATOR_ID              [RFC2796]
10     CLUSTER_LIST               [RFC2796]
11     DPA                        [Chen]
12     ADVERTISER                 [RFC1863]
13     RCID_PATH / CLUSTER_ID     [RFC1863]
14     MP_REACH_NLRI              [RFC2283]
15     MP_UNREACH_NLRI            [RFC2283]
16     EXTENDED COMMUNITIES       [Rosen]
A further value is reserved for development.
Not all attributes need to be present in every announcement.
Two Types of BGP Neighbor Relationships. External neighbor (eBGP): in a different Autonomous System. Internal neighbor (iBGP): in the same Autonomous System. iBGP is routed (using IGP!). [Figure: eBGP sessions between AS1 and AS2, iBGP session inside an AS.]
iBGP Peers Must be Fully Meshed. iBGP neighbors do not announce routes received via iBGP to other iBGP neighbors. iBGP is needed to avoid routing loops within an AS. Injecting external routes into the IGP does not scale and causes BGP policy information to be lost. BGP does not provide “shortest path” routing. [Figure: an eBGP update and the resulting iBGP updates.]
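Because routes learned over iBGP are not re-announced to other iBGP neighbors, every pair of BGP routers in the AS needs its own session. A quick count shows how fast that grows; the function name is illustrative only.

```python
def ibgp_full_mesh_sessions(num_routers: int) -> int:
    """Sessions needed so that every pair of routers in the AS peers directly."""
    if num_routers < 0:
        raise ValueError("num_routers must be non-negative")
    return num_routers * (num_routers - 1) // 2


for n in (4, 10, 100):
    print(n, "routers need", ibgp_full_mesh_sessions(n), "iBGP sessions")
```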
AS PATH Attribute. [Figure: a /16 prefix originated by AT&T Research (AS Path = 6341) and the AS Path recorded for it at AT&T, AS 1239 (Sprint), AS 1755 (Ebone), AS 3549 (Global Crossing), AS 1129 (Global Access), and the RIPE NCC RIS project.]
Inter-domain Loop Prevention. BGP at AS YYY will never accept a route whose AS PATH contains YYY. [Figure: an AS is offered a /16 route whose AS PATH already contains its own number: “Don’t Accept!”]
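The rule amounts to a membership test on the received AS path. The sketch below assumes AS paths are plain lists of integers; `accept_route` and `prepend_own_as` are hypothetical helper names, and the AS numbers in the example are arbitrary.

```python
from typing import List


def accept_route(local_asn: int, as_path: List[int]) -> bool:
    """Refuse any route whose AS path already contains our own AS number."""
    return local_asn not in as_path


def prepend_own_as(local_asn: int, as_path: List[int]) -> List[int]:
    """Before exporting to an external neighbor, an AS puts its own number in front."""
    return [local_asn] + as_path


path_seen_by_as1 = [1239, 1, 6341]             # AS 1 already appears in the path
print(accept_route(1, path_seen_by_as1))       # the route has looped back to AS 1
print(accept_route(1755, path_seen_by_as1))    # AS 1755 is not on the path yet
```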
BGP Best Path Selection. Rules are applied in order until one route remains: 1. Ignore if exit point unreachable. 2. Highest local preference. 3. Lowest AS path length. 4. Lowest origin type. 5. Lowest MED (with same next hop AS). 6. Lowest IGP cost to next hop. 7. Lowest router ID of BGP speaker.
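One way to read this list is as a chain of filters, each narrowing the candidate set. The sketch below follows the seven rules in order; the `Route` fields and helper names are assumptions for illustration, and vendor-specific tie-breakers are left out. MED is compared only among routes whose AS path starts with the same neighbor AS.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    exit_reachable: bool
    local_pref: int
    as_path: Tuple[int, ...]   # first element is the next-hop (neighbor) AS
    origin: int                # 0 = IGP, 1 = EGP, 2 = INCOMPLETE; lower wins
    med: int
    igp_cost: int
    router_id: int


def keep_lowest(routes: List[Route], key: Callable[[Route], int]) -> List[Route]:
    """Keep only the routes that tie for the lowest key value."""
    best = min(key(r) for r in routes)
    return [r for r in routes if key(r) == best]


def neighbor_as(route: Route) -> Optional[int]:
    return route.as_path[0] if route.as_path else None


def lowest_med_per_neighbor(routes: List[Route]) -> List[Route]:
    """Drop a route only if another route from the same neighbor AS has a lower MED."""
    lowest: Dict[Optional[int], int] = {}
    for r in routes:
        nbr = neighbor_as(r)
        lowest[nbr] = min(lowest.get(nbr, r.med), r.med)
    return [r for r in routes if r.med == lowest[neighbor_as(r)]]


def best_path(routes: List[Route]) -> Optional[Route]:
    cands = [r for r in routes if r.exit_reachable]       # rule 1
    if not cands:
        return None
    cands = keep_lowest(cands, lambda r: -r.local_pref)   # rule 2: highest wins
    cands = keep_lowest(cands, lambda r: len(r.as_path))  # rule 3
    cands = keep_lowest(cands, lambda r: r.origin)        # rule 4
    cands = lowest_med_per_neighbor(cands)                # rule 5
    cands = keep_lowest(cands, lambda r: r.igp_cost)      # rule 6
    cands = keep_lowest(cands, lambda r: r.router_id)     # rule 7
    return cands[0]
```

Note that AS path length is consulted only after local preference, which is why policy can override a shorter path.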
In a nutshell. BGP = Path Vector Protocol + Policies. The path vector protocol is very simple: distribute reachability, prevent loops. All the complexity is introduced by locally administered policies, which determine which paths are selected and which neighbors they are exported to.
Path Exploration and Slow Convergence
What is Path Exploration? When a link fails (or is repaired), routers “go through” a sequence of paths before selecting a “converged” path. This results from dependencies in advertised “path vectors”: a router’s best path is an extension of a neighbor’s best path, which extends a best path from one of its own neighbors, and so on.
What is Path Exploration (cont’d). When a link fails, a set of dependent paths becomes invalid (or obsolete). These paths are removed from the system one by one: a router selects a path (possibly invalid) and propagates it, receives a withdrawal, selects the next best path (possibly invalid), receives another withdrawal, and repeats until no more invalid paths remain.
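The select, withdraw, select-again cycle can be sketched from the point of view of a single router. This is a simplified model, not a BGP simulation: the router is assumed to hold a fixed, preference-ordered list of paths toward one destination, and a withdrawal is assumed to arrive for each selected path that crosses the failed edge. The topology is hypothetical and `explore` is an illustrative name.

```python
from typing import List, Optional, Tuple

Path = Tuple[int, ...]


def uses_edge(path: Path, edge: Tuple[int, int]) -> bool:
    """True if the path crosses the given directed edge."""
    return any((a, b) == edge for a, b in zip(path, path[1:]))


def explore(ranked_paths: List[Path], failed_edge: Tuple[int, int]) -> List[Optional[Path]]:
    """Sequence of best paths the router selects until it converges."""
    selected: List[Optional[Path]] = []
    for path in ranked_paths:                 # most preferred first
        selected.append(path)
        if not uses_edge(path, failed_edge):
            return selected                   # valid path: no withdrawal follows
        # invalid path: a withdrawal arrives later and the router moves on
    selected.append(None)                     # every path was invalid: unreachable
    return selected


paths_at_7 = [(7, 4, 2, 1, 0), (7, 5, 2, 1, 0), (7, 6, 3, 1, 0)]
print(explore(paths_at_7, failed_edge=(2, 1)))   # two invalid selections, then a valid one
print(explore(paths_at_7, failed_edge=(1, 0)))   # all three explored before giving up
```

Each invalid selection costs at least one announcement interval, which is where the convergence delay comes from.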
Path Exploration Example. [Figure: network in a steady state.]
Path Exploration Example (cont’d). [Figure: a withdrawal (W) in the example network, involving nodes 9 and 8.]
Path Exploration Example (cont’d). Two paths both contain the “problem edge”, so additional messages are needed to force 7 to flush these “bad paths”. The number of “spurious messages” increases with the “richness” of connectivity.
Impact of Path Exploration. In general, convergence time is O(LΔ), where ‘L’ is the length of the longest simple path in the network and ‘Δ’ is the time between successive announcements. From measurements: up to 15 minutes to converge (after link failure).
Impact of Path Exploration (cont’d). Delays a router from picking valid, alternate paths: it first has to go through all the invalid paths. Causes large-scale packet losses in a short duration, since core routers process millions of packets a second. In the absence of path exploration, convergence time is Ω(Dh), where ‘D’ is the “diameter” of the network (D << L) and ‘h’ is the message processing time at a node.
Causes for Path Exploration. Invalid paths are selected, propagated, then withdrawn. Routers waste time processing “stale information”, which delays convergence to valid, perhaps less preferred, alternate paths. Key issue: how to distinguish invalid paths from “valid” paths. This is difficult in BGP because AS Paths are high level and abstract.
AS PATHS: High Level Connectivity. AS 217 and AS 3 receive the same AS PATH [ ], but the underlying physical paths are disjoint. [Figure: AS 81, AS 217, AS 1239, and AS 3.]
Naïve Solutions Fail. TAG withdrawals: when a router generates a withdrawal, tag it with the cause/location, e.g. WDRAW: (2,1) failed.
Naïve Solutions Fail (cont’d). AS Paths do not describe (or reflect) internal AS topology. When an internal edge fails, which AS Path is affected: [10] or [210]?
Naïve Solutions Fail (cont’d). The link between 3.2 and 6.1 fails. 6.1 generates a withdrawal and tags it with the failed edge. Should 6.3 remove all paths containing that edge? [Figure: routers in AS 3 and the neighboring AS.]
EPIC --- A Simple Solution. Exploit Path dependencies to Invalidate Paths. To avoid path exploration: when a link fails, a set of dependent paths becomes invalid, and all the dependent paths must be removed from the system. Dependent paths cannot be described using only AS Paths, so AS Paths are annotated with additional information (forward edge sequence numbers). These can capture path dependencies and can distinguish valid from invalid paths.
Forward Edge Sequence Numbers. When an AS Path is advertised to an external AS neighbor, include the fesn of the “forward” external edge. fesn = edge identifier + sequence number. [Figure: edge between AS X and AS Y.]
Forward Edge Sequence Numbers (cont’d). Defined per destination, for every AS-AS edge. When AS X sends a route to AS Y, the fesn (X:Y, n) is attached; if the route already has a previously attached fesn, the new fesn is prepended to it, forming the fesnList. [Figure: ASes X, Y, Z, and W, with edge labels (X:Y, n) and (X:Z, m).]
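The prepend rule can be sketched as a small export function. The representation is an assumption made for illustration: an fesn is a pair `((X, Y), n)`, a route is an AS path plus a fesnList, and `export_route` is a hypothetical name. The sequence numbers reuse the values shown for the path 4210 elsewhere in these slides.

```python
from typing import List, Tuple

Fesn = Tuple[Tuple[int, int], int]   # ((sender AS, receiver AS), sequence number)


def export_route(as_path: List[int], fesn_list: List[Fesn],
                 sender: int, receiver: int, seq: int) -> Tuple[List[int], List[Fesn]]:
    """Route as seen by `receiver` after `sender` announces it over edge sender:receiver."""
    new_fesn: Fesn = ((sender, receiver), seq)
    # Both the AS number and the fesn of the forward edge go on the front.
    return [sender] + as_path, [new_fesn] + fesn_list


# Destination in AS 0, propagated along 0 -> 1 -> 2 -> 4 -> 7.
path, fesns = export_route([], [], sender=0, receiver=1, seq=7)
path, fesns = export_route(path, fesns, sender=1, receiver=2, seq=7)
path, fesns = export_route(path, fesns, sender=2, receiver=4, seq=14)
path, fesns = export_route(path, fesns, sender=4, receiver=7, seq=11)
print(path)
print(fesns)
```

Two routes with the same AS path can therefore carry different fesnLists if they crossed different edges.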
fesn Management. When a link fails, its fesn does not change; the same value is carried in withdrawals. When the link is repaired, AS X increments the sequence number, and subsequent route announcements carry the “updated” fesn. So a larger fesn always corresponds to “newer” information.
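A minimal sketch of this bookkeeping, assuming one counter per edge and a single destination (the slides define fesn per destination); `EdgeFesn` and `is_newer` are hypothetical names.

```python
from typing import Tuple

Fesn = Tuple[Tuple[str, str], int]


class EdgeFesn:
    """Sequence-number state that AS X keeps for its edge to AS Y."""

    def __init__(self, src: str, dst: str, seq: int = 0) -> None:
        self.edge = (src, dst)
        self.seq = seq
        self.up = True

    def current(self) -> Fesn:
        return (self.edge, self.seq)

    def fail(self) -> Fesn:
        """On failure the fesn is unchanged; withdrawals carry this same value."""
        self.up = False
        return self.current()

    def repair(self) -> Fesn:
        """On repair the sequence number is incremented for later announcements."""
        self.up = True
        self.seq += 1
        return self.current()


def is_newer(a: Fesn, b: Fesn) -> bool:
    """For the same edge, the larger sequence number is the newer information."""
    return a[0] == b[0] and a[1] > b[1]


edge = EdgeFesn("X", "Y", seq=7)
carried_in_withdrawal = edge.fail()
announced_after_repair = edge.repair()
print(carried_in_withdrawal, announced_after_repair)
```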
fesnList Propagation. Routes shown as [AS Path] {fesnList}:
[0] {(0:1, 7)}
[10] {(0:1, 7)(1:2, 7)}
[10] {(0:1, 7)(1:3, 3)}
[4210] {(0:1, 7)(1:2, 7)(2:4, 14)(4:7, 11)}
Same AS Path, distinct fesnLists.
fesnList Propagation. Routing table at AS 7, after the routes are processed at all nodes:
AS Path  fesnList
4210     (4:7, 11) (2:4, 14) (1:2, 7) (0:1, 7)
5210     (5:7, 10) (2:5, 14) (1:2, 7) (0:1, 7)
6310     (6:7, 3) (3:6, 7) (1:3, 7) (0:1, 7)
Invalidating Paths upon Failure. When a router generates a withdrawal: the fesnList of the withdrawn route (“path stem”) is attached to the withdrawal. When a router receives a withdrawal: 1. It invalidates all routes containing the fesnList. 2. It selects a new best path. 3. If the best path has changed, it sends the new best route to its neighbors, with the withdrawal piggybacked. 4. If there is no valid path, only the withdrawal is forwarded.
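Invalidating Paths: Example. Withdrawal W: {(1:2, 7), (0:1, 7)}. Routing table at AS 7:
AS Path  fesnList
4210     (4:7, 11) (2:4, 14) (1:2, 7) (0:1, 7)
5210     (5:7, 10) (2:5, 14) (1:2, 7) (0:1, 7)
6310     (6:7, 3) (3:6, 7) (1:3, 7) (0:1, 7)
Routes 4210 and 5210 contain the withdrawn fesnList, leaving 6310 (shown as 76310).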
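The receive-side steps can be sketched against the example table. Two points are assumptions rather than anything stated in the slides: “containing the fesnList” is read as the stem appearing as a contiguous run inside a route's fesnList, and a route entry only matches if its sequence number is not newer than the one in the withdrawal. Names such as `contains_stem` and `process_withdrawal` are illustrative.

```python
from typing import List, Optional, Tuple

Fesn = Tuple[Tuple[int, int], int]
Route = Tuple[List[int], List[Fesn]]   # (AS path, fesnList), newest fesn first


def contains_stem(fesn_list: List[Fesn], stem: List[Fesn]) -> bool:
    """True if `stem` occurs as a contiguous run in `fesn_list`."""
    if not stem:
        return False
    n = len(stem)
    for i in range(len(fesn_list) - n + 1):
        window = fesn_list[i:i + n]
        # Same edges in the same order; the route's copy must not be newer.
        if all(e == se and s <= ss for (e, s), (se, ss) in zip(window, stem)):
            return True
    return False


def process_withdrawal(table: List[Route], stem: List[Fesn]) -> Tuple[List[Route], Optional[Route]]:
    """Drop every route that depends on the stem, then pick the best survivor."""
    valid = [route for route in table if not contains_stem(route[1], stem)]
    best = valid[0] if valid else None   # table is assumed ordered by preference
    return valid, best


table_at_7: List[Route] = [
    ([4, 2, 1, 0], [((4, 7), 11), ((2, 4), 14), ((1, 2), 7), ((0, 1), 7)]),
    ([5, 2, 1, 0], [((5, 7), 10), ((2, 5), 14), ((1, 2), 7), ((0, 1), 7)]),
    ([6, 3, 1, 0], [((6, 7), 3), ((3, 6), 7), ((1, 3), 7), ((0, 1), 7)]),
]
withdrawal = [((1, 2), 7), ((0, 1), 7)]
remaining, best = process_withdrawal(table_at_7, withdrawal)
print([path for path, _ in remaining], best[0] if best else None)
```

Both invalid routes are removed in a single step, so the router never selects 5210 after 4210 is withdrawn.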
Handling Link Repairs. When the link is repaired: 1. AS X increments the fesn for the edge. 2. AS X generates a new route announcement to send to AS Y (reflecting the updated fesn). 3. At AS Y, the route is installed into the routing table, and a subsequent route update may be generated. 4. After all updates have been processed, every fesnList containing (X:Y, n) will reflect the updated value.
What about Multiple Edges? Each edge is associated with a minor fesn, in contrast with the major fesn for the “logical” AS-AS edge. All edges between two ASes share the same major fesn but have distinct minor fesn’s. A minor fesn is incremented along with its corresponding edge; the major fesn is incremented only if all edges are affected.
Minor fesn’s. Minor fesn’s are only used between adjacent ASes. All routers in AS 6 include the minor fesn in route updates. When the updates are exported externally (to AS 7), the minor fesn is removed. [Figure: edges between AS 3 and its neighbor AS labeled (11), 7, (13): common major fesn, distinct minor fesn’s.]
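One possible reading of the major/minor scheme is sketched below. The slides do not spell out exactly when each counter is bumped, so the rule used here is an assumption: a minor fesn is incremented when its own link is repaired, and the major fesn is incremented only when a repair restores the logical edge after every parallel link was down. `MultiEdge` and its methods are hypothetical names.

```python
from typing import Dict, Iterable, Tuple


class MultiEdge:
    """Several physical links forming one logical AS-AS edge."""

    def __init__(self, link_ids: Iterable[str], major: int = 0) -> None:
        self.major = major
        self.minor: Dict[str, int] = {link: 0 for link in link_ids}
        self.up: Dict[str, bool] = {link: True for link in self.minor}

    def fail(self, link: str) -> None:
        self.up[link] = False              # counters do not change on failure

    def repair(self, link: str) -> None:
        all_were_down = not any(self.up.values())
        self.up[link] = True
        self.minor[link] += 1              # event on this link only
        if all_were_down:
            self.major += 1                # the whole logical edge was affected

    def internal_fesn(self, link: str) -> Tuple[int, int]:
        """Value used between the adjacent ASes: (major, minor)."""
        return (self.major, self.minor[link])

    def external_fesn(self) -> int:
        """Value exported further on: the minor fesn is stripped."""
        return self.major


edge = MultiEdge(["a", "b"], major=7)
edge.fail("a")
edge.repair("a")
print(edge.internal_fesn("a"), edge.internal_fesn("b"), edge.external_fesn())
```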
fesn – Key Properties. The sequence number is monotonic: new events will have higher values. This imposes a partial ordering on the fesnLists, so old information can be easily detected and discarded. Allows a compact, correct description of invalid paths, i.e. the fesnList in a withdrawal captures all obsolete paths.
EPIC Properties. No router will select an invalid path after receiving any update triggered by a single failure event. No router will select an invalid path after receiving at least one update triggered by each of a set of multiple failure events. Achieves optimal bounds for a path vector protocol. Routers may still explore paths, but these paths are all valid.
EPIC Performance (vs BGP)
                        BGP                    EPIC
Fail Down   Time        (L-2)Δ                 (D-1)h
            Messages    (L-2)(|E|-1)           |E|-1
Fail Over   Time        (L’+D’-1)Δ             D’(h+Δ)
            Messages    (L’+D’-1)(|E’|-1)      (|E’|-1)D’(h+Δ)/Δ
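To get a feel for the gap, the bounds can be evaluated for sample values. The formulas are taken from the comparison above; the numbers plugged in (L = 10, D = 4, |E| = 20, Δ = 30 s, h = 0.1 s) are purely hypothetical and are not measurements, and the function names are illustrative. For fail-over, the primed quantities L’, D’, |E’| are passed in the same parameter positions.

```python
from typing import Dict


def fail_down(L: int, D: int, E: int, delta: float, h: float) -> Dict[str, Dict[str, float]]:
    """Bounds when the destination becomes unreachable."""
    return {
        "BGP": {"time": (L - 2) * delta, "messages": (L - 2) * (E - 1)},
        "EPIC": {"time": (D - 1) * h, "messages": E - 1},
    }


def fail_over(L: int, D: int, E: int, delta: float, h: float) -> Dict[str, Dict[str, float]]:
    """Bounds when an alternate path exists (arguments are L', D', |E'|)."""
    return {
        "BGP": {"time": (L + D - 1) * delta, "messages": (L + D - 1) * (E - 1)},
        "EPIC": {"time": D * (h + delta), "messages": (E - 1) * D * (h + delta) / delta},
    }


sample = dict(L=10, D=4, E=20, delta=30.0, h=0.1)   # hypothetical values
for name, bounds in (("fail down", fail_down(**sample)), ("fail over", fail_over(**sample))):
    for proto, values in bounds.items():
        print(f"{name:9s} {proto:4s} time={values['time']:.1f}s messages={values['messages']:.1f}")
```

With Δ much larger than h, the fail-down time for EPIC is dominated by message processing rather than by the announcement interval.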
Root Cause Analysis of BGP Events
BGP Routing Dynamics. BGP routing instabilities: BGP routing suffers from many problems, e.g., misconfigurations, link failures, policy changes, slow convergence, etc. BGP update streams are visible from all BGP-monitoring vantage points. Open research problems: What are the common characteristics of BGP dynamics? What are the primary causes of BGP routing dynamics? How to visualize BGP dynamics?
BGP Routing Update (per second). [Figure: time vs. number of BGP updates at prefix level. View: UMN. Time: 2003/12/07 – 2003/12/14. Annotations: BGP Update Burst, BGP Update Noise.]
BGP Routing Update (per second) (cont.). [Figure: time vs. number of BGP updates at AS level. View: UMN. Time: 2003/12/07 – 2003/12/14. Annotations: BGP Update Burst, BGP Update Noise.]
Modeling BGP Routing Dynamics. Modeling BGP dynamics on all prefixes/ASes is challenging: ~120,000 prefixes and ~16,000 ASes; high-dimensional time series; BGP updates are temporally and spatially correlated.