Beyond BGP
Dan Massey
Colorado State University
24 October
Internet Routing
- Challenges Facing Internet Routing
  - Internet Has Grown Dramatically
    - Large number of routing entries
    - High volumes of updates
    - Frequent topological changes
  - Fault Model Has Changed Dramatically
    - More malfunctioning components
    - Intentional attacks
- Do we need a fundamentally new routing architecture?
Toward a New Architecture
- One claim: BGP is nearing the end of its useful lifetime
  - The Internet will soon collapse unless we act!!
- Other claim: BGP is the best engineering solution we are likely to produce
  - We need incremental patches to new problems
- Who is right?
  - Beyond BGP uses
    - Measurements to assess where we are
    - Identification of (new?) routing requirements
    - Development of changes (incremental or new system) to address the above
How Did We Get To BGP
- Simple Distance Vector Routing Algorithms
  - Used in early Internet routing designs
  - Convey only limited information
  - Prone to long-lasting loops
- Expensive Link State Routing Algorithms
  - Learn the full network topology
  - Signal every change in every link
- Path Vector Routing (BGP)
  - Middle ground that signals some path data
  - But does not signal the full topology
RIP and DBF
- RIP and DBF: exchange distance info
- RIP: keep shortest path only
  - B's route to D: Nexthop=A, Dist=4
- Distributed Bellman-Ford (DBF): keep distance info from all neighbors
  - B's route to D: Nexthop=A, Dist=4; Alternate Nexthop=C, Dist=4
- 30-second refresh interval
- Damping timer to space out two triggered updates: 1~5 seconds
- Poison reverse: B sends infinity distance to A
[Figure: topology with nodes A, B, C, D, E, F; advertised distances D:1, D:3, D:2, D:3 and D: infinity]
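The difference between the two tables can be sketched in a few lines of Python. This is only an illustration, not the simulated protocols: the class names, the unit link cost, and the `receive`/`best`/`advertise_to` methods are assumptions made for the example, and timers are left out.

```python
INF = float("inf")

class DbfNode:
    """DBF-style table: remembers the distance advertised by every neighbor."""

    def __init__(self, link_cost=1):
        self.link_cost = link_cost      # assumed: every link has the same cost
        self.learned = {}               # neighbor -> distance it advertised

    def receive(self, neighbor, dist):
        self.learned[neighbor] = dist

    def best(self):
        """Return (nexthop, distance), or (None, INF) when unreachable."""
        candidates = [(d + self.link_cost, n)
                      for n, d in self.learned.items() if d < INF]
        if not candidates:
            return None, INF
        dist, nexthop = min(candidates)
        return nexthop, dist

    def advertise_to(self, neighbor):
        """Poison reverse: report infinity to the neighbor we route through."""
        nexthop, dist = self.best()
        return INF if nexthop == neighbor else dist


class RipNode(DbfNode):
    """RIP-style table: keeps only the current shortest path."""

    def receive(self, neighbor, dist):
        nexthop, current = self.best()
        if neighbor == nexthop or dist + self.link_cost < current:
            self.learned = {neighbor: dist}


dbf, rip = DbfNode(), RipNode()
for node in (dbf, rip):
    node.receive("A", 3)
    node.receive("C", 3)
    node.receive("A", INF)      # A reports that its path to D failed
```

After A's failure report, the DBF node falls back to C at once, while the RIP node has no route until C's next advertisement arrives.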
BGP Background
- Internet: composed of thousands of Autonomous Systems (ASes)
- BGP (Border Gateway Protocol): the de facto inter-AS routing protocol
[Figure: BGP routers R1 to R6 across AS A, AS B, AS C, and AS E]
How BGP works
- Uses a path vector protocol
  - Similar to a distance vector protocol
  - Route includes the entire path (sequence of nodes)
- Consider an AS as a node
- What if no path is available?
[Figure: nodes A, B, C, D, E, showing B's route to D via A and via C]
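A minimal Python sketch of path-vector route selection, treating each AS as a node. The function name, the shortest-path tie-break, and the example topology are assumptions for illustration; real BGP applies policy before path length.

```python
def select_route(own_as, offers):
    """Pick a route from neighbor advertisements.

    offers maps neighbor -> advertised path (tuple of ASes ending at the
    destination), or None if that neighbor has withdrawn the route.
    Paths that already contain own_as would form a loop and are rejected.
    """
    valid = {n: p for n, p in offers.items()
             if p is not None and own_as not in p}
    if not valid:
        return None                     # no path available
    neighbor = min(valid, key=lambda n: (len(valid[n]), n))
    return (own_as,) + valid[neighbor]


# B hears a short path from A and a longer one from C.
before = select_route("B", {"A": ("A", "D"), "C": ("C", "E", "D")})
# A withdraws; B falls back to the path via C.
after = select_route("B", {"A": None, "C": ("C", "E", "D")})
# C's only offer goes back through B, so B has no usable path.
looped = select_route("B", {"A": None, "C": ("C", "B", "A", "D")})
```

Because the whole path is carried, B can discard the looping offer immediately, which a pure distance vector cannot do.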
Path Vector Routing Changes
- Worms triggered edge instability
  - Routers crashed due to ARP cache overflow.
  - Links were congested by worm traffic.
- BGP Path Exploration Exacerbates Dynamics
  - Obsolete backup path is used and convergence is delayed
[Figure: nodes A, B, C, D, E; a withdraw message and B's route to D via A and via C]
Policies and Policy Withdrawal
- But A could stop advertising to B due to a policy change while the path is still valid!
- Attach a Failure Withdrawal Community Attribute
- Only apply the approach to failure withdrawals
[Figure: nodes A, B, C, D, E; a policy withdraw and B's route to D via A and via C]
BGP Traffic Engineering
- We assumed an AS could be modeled as a node with a single best path to the destination
- But a single AS may advertise more than one path
- Divide one AS into logical ASes such that
  - All routers within a logical AS have the same best path
  - Each logical AS can be modeled as a node
[Figure: BGP traffic engineering, showing the path chosen by R4 and the path chosen by R5]
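The grouping into logical ASes can be illustrated with a short Python sketch. The function name, router names, and paths below are hypothetical; the slide does not say how the grouping is computed.

```python
from collections import defaultdict

def logical_ases(best_paths):
    """Group routers of one AS by their best path to the destination.

    best_paths maps router -> best path (sequence of ASes). Each group of
    routers sharing a best path forms one logical AS.
    """
    groups = defaultdict(list)
    for router, path in best_paths.items():
        groups[tuple(path)].append(router)
    return {path: sorted(routers) for path, routers in groups.items()}


# Hypothetical AS in which R4 and R6 agree on a path and R5 does not.
groups = logical_ases({
    "R4": ("C", "D"),
    "R5": ("E", "D"),
    "R6": ("C", "D"),
})
```

Each key of `groups` then stands for one node in the path-vector model.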
Number of Updates
[Figure: number of updates vs. number of ASes in network, for Original BGP and Enhanced BGP]
- Substantial reduction is achieved, e.g. to 1419 in the 60-AS topology
- MinRouteAdver timer: within 30 seconds, only one advertisement is allowed. It “packs” consecutive changes into one update.
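How the MinRouteAdver timer packs changes can be sketched as follows. The class and method names are made up for the example, time is passed in explicitly instead of using a real clock, and per-peer and per-prefix details are ignored.

```python
class MraiSender:
    """Allows one advertisement per interval and remembers only the latest
    pending route, so consecutive changes collapse into a single update."""

    def __init__(self, interval=30.0):
        self.interval = interval
        self.last_sent = None   # time of the last advertisement
        self.pending = None     # most recent route not yet advertised
        self.sent = []          # (time, route) pairs actually sent

    def route_changed(self, now, route):
        self.pending = route    # overwrite any older pending change
        self.flush(now)

    def flush(self, now):
        """Send the pending route if the timer allows it."""
        if self.pending is None:
            return
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.sent.append((now, self.pending))
            self.last_sent = now
            self.pending = None


sender = MraiSender(interval=30.0)
sender.route_changed(0, "path-1")    # sent immediately
sender.route_changed(5, "path-2")    # held back by the timer
sender.route_changed(12, "path-3")   # replaces path-2
sender.flush(30)                     # timer expires: only path-3 goes out
```

Three changes produce two advertisements; the intermediate path is never announced.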
Convergence Time
[Figure: convergence time (seconds) vs. number of ASes in network, for Original BGP and Enhanced BGP]
- Enhanced BGP reduces the convergence time substantially, e.g. to 19.5 seconds in the 60-AS topology
- Elimination of one advertisement can cut convergence time by 30 seconds
Improving Path Vector Convergence
- Infocom 02 [4] uses consistency to detect invalid paths.
  - Reject a path through r1 if r1 is a direct neighbor and r1's own path is not consistent with it
  - Adjusted to account for policy and implemented in BGP
- Infocom 03 [Afek, et al] quickly flushes invalid paths.
  - BGP requires updates to be separated by a minimum interval
  - Send a withdraw (to flush the route) if blocked by the interval
- Our recent work [5] attaches a new attribute: Root Cause Notification (RCN)
  - Identifies the failed link and includes a sequence number.
  - Allows any route relying on the failed link to be rejected.
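The Root Cause Notification idea can be sketched in Python. This is an assumption-laden illustration, not the attribute format from [5]: the class name, the `(seq, is_up)` record, and the example paths are invented.

```python
class RcnState:
    """Tracks the newest status heard for each link, keyed by sequence number."""

    def __init__(self):
        self.latest = {}    # frozenset({a, b}) -> (seq, is_up)

    def notify(self, link, seq, is_up):
        """Record a root cause notification; stale sequence numbers are ignored."""
        key = frozenset(link)
        if key not in self.latest or seq > self.latest[key][0]:
            self.latest[key] = (seq, is_up)

    def is_valid(self, path):
        """A path is rejected if any hop on it is known to have failed."""
        for hop in zip(path, path[1:]):
            state = self.latest.get(frozenset(hop))
            if state is not None and not state[1]:
                return False
        return True


rcn = RcnState()
rcn.notify(("A", "D"), seq=1, is_up=False)        # link A-D failed
via_a = rcn.is_valid(("B", "A", "D"))             # relies on the failed link
via_c = rcn.is_valid(("B", "C", "E", "D"))        # avoids it
obsolete = rcn.is_valid(("C", "B", "A", "D"))     # obsolete backup path
```

Every obsolete backup path through the failed link is rejected in one step, so there is nothing left to explore.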
Analyzing Path Vector Convergence
- Route fail-over has two stages.
- First, nodes inside the blue triangle lose routes and explore backup paths.
  - All short invalid paths are explored
- Second, an edge (a0) eventually selects the valid backup path via Sk.
  - Valid routes begin to propagate through the blue triangle.
Generic Convergence Results

Algorithm     Fail-Over Convergence Bound
SPVP (BGP)    (N-1)(M + ld) + 3 Pmax (|E| - degree(G,0))
SPVP-AS       (N - degree(G,0))(M + ld) + 3 Pmax (|E| - |E^| + degree(G^))
SPVP-GF       (N-1) ld + 3 Pmax (|E| - degree(G,0))
SPVP-RCN      Distance(G,0) ld + Pmax Distance(G,0)

Pmax = node processing delay, ld = link delay, M = minimum advertisement interval
Simulation Results
What About Security?
- Convergence Discussion Neglects Security
  - What if routers send intentionally bad information?
- What is the Simplest Possible Attack?
  - Announce someone else's routes
- Example: Suppose Univ. of Colorado announces it is the origin for /16
  - In other words, CU announces CSU IP address space
- Can this Happen and/or What Would Prevent It?
Multiple Origin AS (MOAS) Cases
- Prefixes originate from Multiple Origin AS (MOAS)
  - Lower curve likely due to valid operational needs
- Spikes are errors that disrupt routing to prefix
  - Includes loss of routes to top level DNS servers
Infrastructure Faults and Attacks
- BGP and DNS Provide No Authentication
  - Faults and attacks can misdirect traffic.
  - One (of many) examples observed from BGP logs.
  - Server could have replied with false DNS data.
[Figure: the Internet, c.gtld-servers.net, and a BGP monitor; labels "originates route to /24" and "ISPs announced new path for 20 minutes to 3 hours"]
BGP-based Solution Example

Example configuration:
    router bgp 59
     neighbor remote-as 52
     neighbor send-community
     neighbor route-map setcommunity out
    route-map setcommunity
     match ip address /8
     set community 59:MOAS 58:MOAS additive

[Figure: AS58, AS52 and further ASes exchanging announcements "18/8, PATH, MOAS{58,59}", "18/8, PATH, MOAS{4,58,59}" and "18/8, PATH, MOAS{52, 58}"]
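One way a receiver could use the attached MOAS lists is sketched below in Python. The function name and the two checks (all lists identical, each origin present in its own list) are assumptions for illustration; they are not taken from the configuration above.

```python
def moas_conflict(announcements):
    """Return True if announcements for one prefix carry inconsistent MOAS lists.

    announcements is a list of (origin_as, moas_list) pairs for the same prefix.
    """
    lists = {frozenset(moas) for _, moas in announcements}
    if len(lists) > 1:
        return True         # two announcements disagree on who may originate
    return any(origin not in moas for origin, moas in announcements)


# Both legitimate origins attach the same list.
valid = [(58, {58, 59}), (59, {58, 59})]
# AS52 announces the prefix with a list that includes itself.
forged_list = valid + [(52, {52, 58})]
# AS52 copies the valid list but is not in it.
copied_list = valid + [(52, {58, 59})]
```

The check raises an alarm but cannot tell which list is the forged one; that needs out-of-band verification.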
BGP False Origin Detection: Simulation Results
[Figure: (a) One Origin AS; (b) Two Origin ASes]
A Simple Filter
- Current BGP provides dynamic routes
  - Explore the opposite extreme...
- Select a single static route to each server.
  - Apply AS path filters to block all other announcements.
    - Also filter against more specifics.
- Routes change on a frequency of months, if at all.
  - Change in IP address, origin AS, or transit policy.
  - Adjust route only after off-line verification
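A minimal Python sketch of such a filter, using the standard `ipaddress` module. The class name, the documentation prefix 192.0.2.0/24, and the AS numbers are placeholders, not values from the study.

```python
import ipaddress

class StaticRouteFilter:
    """Accepts exactly one trusted AS path per protected prefix and blocks
    every other announcement for it, including more-specific prefixes."""

    def __init__(self, trusted):
        # trusted: prefix string -> the single AS path accepted for it
        self.trusted = {ipaddress.ip_network(prefix): tuple(path)
                        for prefix, path in trusted.items()}

    def accept(self, prefix, as_path):
        net = ipaddress.ip_network(prefix)
        for protected, path in self.trusted.items():
            if net == protected:
                return tuple(as_path) == path
            if net.version == protected.version and net.subnet_of(protected):
                return False    # more specific than a protected prefix
        return True             # unprotected prefixes follow normal BGP


flt = StaticRouteFilter({"192.0.2.0/24": (64500, 64510)})
trusted_ok = flt.accept("192.0.2.0/24", (64500, 64510))
other_path = flt.accept("192.0.2.0/24", (64501, 64511))
more_specific = flt.accept("192.0.2.128/25", (64500, 64510))
unrelated = flt.accept("198.51.100.0/24", (64502,))
```

Updating the filter means editing the `trusted` table by hand, which is the off-line verification step.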
Why This Works: Theory
- Scale is limited to a small number of routes.
  - No exponential growth in top level DNS servers.
- Loss of a server is tolerable, invalid server is not.
  - Resolvers detect and time out unreachable servers.
    - Provided surviving servers handle load, cost is some delay.
- Expect predictable properties and stable routes.
  - Servers don’t change without non-trivial effort.
  - Servers located in highly available locations.
Why This Works: Data
- Analysis based on BGP updates from RIPE.
  - Archive of BGP updates sent by each peer.
  - 9 ISPs from US, Europe, and Japan.
  - February April 2002
- Some data collection notes
  - Used only peers that exchange full routing tables
    - Otherwise some route changes are hidden by policies
  - Adjusted data to discount multi-hop effect.
    - Multi-hop peering session resets don’t reflect ISP ops.
Impact on Reachability: ISP1 (US/Tier 1)
How Static Are The Routes?
- 3 changes in route to “A” over 14 months.
- 2 (valid) changes in the origin AS
  - 5/19/01 origin AS changed from 6245 to
  - 6/4/01 origin AS changed from to
- 1 change in transit AS routing policy
  - 11/8/01 (*,10913, 10913, 10913,*) -> (*,10913, *)
  - Could have built filter to allow this...
What Routes Are Lost?
- Results from 3/1/01 until 5/19/01 AS change.
  - Reduced reachability to “A” from % to %
- 18 events when trusted route was withdrawn
  - 2 resulted in no route available (28 secs, 103 secs)
  - 8 instances of a back-up route lasting over 3 minutes
  - Longest lasting back-up advertised for 15 minutes
- Similar results for other time periods and servers.
Example of Filtered Routes
- With filter no route at 16:06:
[Figure: routes to * server; no route at 16:08:30]
Worst Case In Study: ISP 3 (Europe)
- ISP 3 used one main route and a small number of consistent back-up routes.
Toward a More Balanced Approach
- Required infrequent updates to the filter.
  - Especially useful to automate infrequent tasks.
    - Natural tendency to forget task or forget how to do task
- More paths improve robustness
  - Simple filter allowed only 1 path.
  - ISP3’s reachability can be improved if filter allows two routes…
- Strike a balance between allowing dynamic changes and restricting to trusted paths.
BGP Adaptive Filters
- Slow down the route dynamics and add validation.
  - Apply hysteresis before accepting new paths
  - Add options for validating new paths:
    - Believe route based purely on hysteresis
    - Probabilistic query/response testing against known data.
    - Trigger off-line checking (did origin AS really change?)
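The hysteresis step can be sketched in Python. The class name, the one-hour hold time, and the rule that a candidate must be seen continuously are assumptions for the example; the slides do not give the actual parameters.

```python
class HysteresisFilter:
    """Keeps using the accepted path until a new path has persisted
    for at least hold_time seconds."""

    def __init__(self, hold_time, accepted):
        self.hold_time = hold_time
        self.accepted = accepted        # currently trusted path
        self.candidate = None           # new path under observation
        self.candidate_since = None

    def observe(self, now, path):
        """Feed the path currently announced; return the path to use."""
        if path == self.accepted:
            self.candidate = None       # announcement reverted: reset
        elif path != self.candidate:
            self.candidate, self.candidate_since = path, now
        elif now - self.candidate_since >= self.hold_time:
            self.accepted, self.candidate = path, None
        return self.accepted


old, new = ("A", "D"), ("C", "E", "D")
flt = HysteresisFilter(hold_time=3600, accepted=old)
first = flt.observe(0, new)         # new path appears, not yet believed
middle = flt.observe(1800, new)     # still within the hold time
final = flt.observe(3600, new)      # persisted long enough: accepted
```

A short-lived false announcement never replaces the trusted path; the price is a delay of `hold_time` for every legitimate change.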
Impacts on Reachability: ISP1
[Figure: reachability of root servers and gTLD servers]
Impacts on Reachability: ISP3
[Figure: reachability of root servers and gTLD servers]
Convergence And Authentication
- BGP Suffers From Both Convergence Problems and Authentication Problems
  - Convergence fixes are good, if no attacks.
  - Authentication fixes work for redundant sites
- Can you improve both convergence and authentication in a realistic environment?
  - Do you need to replace BGP?
    - If yes, with what?
  - Would you pick BGP for your new network?
    - If no, what would you do instead?
- Wide Variety of Other Routing Challenges
  - Check out CS 580 and BBGP Project if interested
BGP Measurement and Artifacts
- BGP peers establish a TCP session and send the full route table (120K+ routes)
  - Updates sent only if routes change.
- Our results show frequent session resets between ISP routers and the monitoring point.
  - Monitoring point sessions cross multiple systems in the Internet.
  - Each reset adds 120K updates.
  - But very few ISP-ISP session resets.
- Our work in [1] presents rules to remove session reset artifacts.
[Figure: timeline of initial table (120K+ routes), route changes, then initial table (120K+ routes) again]
BGP Updates During Slammer Worm
BGP Updates During Nimda Worm
[Figure: BGP updates around the attack, split into measurement artifacts, routing changes, and total]
What Our Analysis Shows
- A substantial percentage of the BGP messages during the worm attack were not about route changes
[Figure: breakdown of BGP messages with shares 40.2%, 37.6%, 8.8%, 8.3%]
FRTR: Improving Peer Communication
- BGP Updates Are Not (Topology) Event Driven
  - Session resets trigger high volume surges
    - Govindan shows cascade failures can result.
- Lifetime of Invalid Routes is Unbounded
  - Never recover (until reset) if update is somehow lost.
    - Despite TCP, we found cases of “lost” withdrawals.
  - Attacker can poison a route with one update.
- Soft-state (periodic re-announce) is too costly…
- FRTR Uses Periodic Bloom Filter Digests
  - Digests quickly confirm state after session reset.
  - Periodic digests bound lifetime of faults (w/ high prob).
  - Co-Author Keyur Patel (Cisco) is exploring Cisco development.
FRTR Performance
- For each route at receiver, check against the digest.
  - Bloom filter results in no false negatives.
- Compare total digests for missing route detection.
  - False positive possible with known rate.
  - Add salts to reduce the chance of repeated false positives.
- Overhead is a function of digest size and frequency.
- Work with Cisco suggests a 1.3% overhead increase.
- Complete details to appear in [2] (DSN 2004)
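The two digest checks can be sketched with a small salted Bloom filter in Python. This is not the FRTR wire format: the class name, digest size, hash construction (SHA-256 from `hashlib`), and route strings are all assumptions for illustration.

```python
import hashlib

class BloomDigest:
    """Fixed-size Bloom filter over route strings, with a per-round salt."""

    def __init__(self, routes=(), size_bits=1024, num_hashes=4, salt=b""):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.salt = salt
        self.bits = 0                   # bit array stored as one integer
        for route in routes:
            self.add(route)

    def _positions(self, route):
        for i in range(self.num_hashes):
            data = self.salt + bytes([i]) + route.encode()
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, route):
        for pos in self._positions(route):
            self.bits |= 1 << pos

    def might_contain(self, route):
        """False means definitely absent; True may be a false positive."""
        return all((self.bits >> pos) & 1 for pos in self._positions(route))


sender_routes = ["192.0.2.0/24 via 64500", "198.51.100.0/24 via 64501"]
receiver_routes = sender_routes + ["203.0.113.0/24 via 64502"]  # missed a withdraw

digest = BloomDigest(sender_routes, salt=b"round-1")

# Check 1: receiver routes that the sender's digest does not cover are invalid.
stale = [r for r in receiver_routes if not digest.might_contain(r)]

# Check 2: the receiver builds its own digest; a mismatch reveals a difference.
in_sync = BloomDigest(sender_routes, salt=b"round-1").bits == digest.bits
```

Changing the salt each round changes which bits a route maps to, so a route that slips through as a false positive once is unlikely to slip through again.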
Packet Delivery during Routing Convergence
- Failures do occur in the Internet
  - 20% of intra-ISP links have an MTTF < 1 day [Diot:IMW02]
  - 40% of inter-ISP routes have an MTT-Change < 1 day [Labovitz:FTCS-29]
- Routing convergence after failure takes time
  - IS-IS (intra-ISP protocol): 5+ seconds [Diot:IMW02]
  - BGP (inter-ISP protocol): 3+ minutes [Labovitz:Sigcomm00]
- Packets can be delivered during convergence
[Figure: topology with nodes A, B, C, D, E, F, G]
What Is the Goal of Routing
- How to maximize packet delivery during routing convergence?
  - Topological connectivity’s impact?
  - Studying: RIP, Distributed Bellman-Ford (DBF), BGP
  - Previous work focused on: preventing loops, minimizing convergence time and routing overhead
- This problem becomes more important with
  - Larger Internet topology [Huston01] --> higher freq. of component failures
  - Richer connectivity [Huston01] --> potentially helps with more alternate paths
  - Higher bandwidth --> more packets sent during convergence
Simulation
- Conducted on 7 by 7 mesh topologies similar to those in [Baran64], 20 pkts/second
- Measure packet loss, loops, path convergence time, throughput, and e2e delay.
- Simulated node degree range [3 ~ 16]
Packet Losses (I): Observation
[Figure: packet loss vs. node degree for RIP, DBF, BGP’ and BGP]
- Packet losses of DBF, BGP’ and BGP decrease to zero at degree 6.
- Richer connectivity helps RIP little.
Packet Loss (II): Lessons Learned
- Keeping alternate paths
- Connectivity matters
  - No immediately available alternative due to poor connectivity and poison reverse
  - Alternative is more likely with richer connectivity
[Figure: two topologies with nodes A, B, C, D, E, F, labeled "RIP:" and "DBF, BGP:"]
Is an alternate path valid?
- Valid alternate paths: not using the failed link
  - Poison reverse and BGP’s path information are not enough! [Pei:Infocom2002]
- Richer connectivity --> reduces one single link’s impact --> better availability of a valid (but possibly suboptimal) path
[Figure: topology with nodes A, B, C, D, E, F, U, V, W, X]
Transient Loops (I): Observation
[Figure: losses due to loops vs. node degree for DBF, BGP’ and BGP]
- BGP has the most loops!
- RIP has no loops
- Richer connectivity reduces the chance of looping.
Transient Loops (II): Msg Propagation
- Damping timer slows the msg propagation, causing looping
- Richer connectivity can reduce the chance of looping
- More details in: “A Study of Transient Loops in BGP”
[Figure: topology with nodes A, B, C, D, E, F, U, V, W, X, Y; path labels D:<BAEF>; timer label "30 seconds!"]
Instantaneous Throughput
[Figure: throughput (pkts/second) vs. time for RIP, DBF, BGP’ and BGP]
Packet Delay During Convergence
Forwarding Path Convergence Time
[Figure: convergence time vs. node degree]
- Time till there is no routing msg: BGP: 70, BGP’: 10
- Time till the forwarding path from S to D stabilizes: BGP: 13, BGP’: 2
- BGP: no loss at degree 6 or higher
- Shall we still tune the MRAI timer to minimize convergence time (with the risk of increasing overhead)?
Packet Delivery After a Failure