Internet Routing (COS 598A) Today: Interdomain Routing Convergence Jennifer Rexford Tuesdays/Thursdays.

Internet Routing (COS 598A) Today: Interdomain Routing Convergence Jennifer Rexford Tuesdays/Thursdays 11:00am-12:20pm

Outline BGP convergence –Causes of routing changes –Detecting session failures –BGP path exploration Route-flap damping –Damping persistent flapping –Interaction with path exploration Stability of popular destinations –Are things really all that bad? Reducing convergence delay –Avoiding complete path exploration –Why this is harder than it looks

Causes of BGP Routing Changes Topology changes –Equipment going up or down –Deployment of new routers or sessions BGP session failures –Due to equipment failures, maintenance, etc. –Or, due to congestion on the physical path Changes in routing policy –Reconfiguration of preferences –Reconfiguration of route filters Persistent protocol oscillation –More on this next week!

BGP Session Operation –Establish session on TCP port 179 –Exchange all active routes –Exchange incremental updates –While the connection is alive, exchange route UPDATE messages [Figure: BGP session between AS1 and AS2]

BGP Session Failure BGP runs over TCP –BGP only sends updates when changes occur –TCP doesn’t detect lost connectivity on its own Detecting a failure –Keep-alive: 60 seconds –Hold timer: 180 seconds Reacting to a failure –Discard all routes learned from the neighbor –Send new updates for any routes that change
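
The failure-detection timers above can be sketched as follows. This is a minimal illustration, not router code: the `Session` class and its method names are hypothetical, and only the 60-second and 180-second values come from the slide.

```python
from dataclasses import dataclass

KEEPALIVE_INTERVAL = 60   # seconds between keep-alives (value from the slide)
HOLD_TIME = 180           # seconds of silence tolerated (value from the slide)


@dataclass
class Session:
    """Hypothetical per-neighbor state; names are illustrative only."""
    last_heard: float = 0.0   # time the last KEEPALIVE or UPDATE arrived

    def on_message(self, now: float) -> None:
        # Any message from the neighbor restarts the hold timer.
        self.last_heard = now

    def is_failed(self, now: float) -> bool:
        # The session is declared down once the hold time has elapsed
        # without hearing anything from the neighbor.
        return now - self.last_heard >= HOLD_TIME


def keepalives_missed_before_failure() -> int:
    # With these values, three consecutive keep-alives must be lost.
    return HOLD_TIME // KEEPALIVE_INTERVAL
```

With these settings a silent failure can go unnoticed for up to three minutes, after which all routes learned from the neighbor are discarded.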

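Routing Change: Before and After [Figure: before the change, AS 1 uses (1,0), AS 2 uses (2,0), AS 3 uses (3,1,0); after the change, AS 1 uses (1,2,0), AS 2 uses (2,0), AS 3 uses (3,2,0)]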

Routing Change: Path Exploration AS 1 –Delete the route (1,0) –Switch to next route (1,2,0) –Send route (1,2,0) to AS 3 AS 3 –Sees (1,2,0) replace (1,0) –Compares to route (2,0) –Switches to using AS 2 [Figure: AS 1 uses (1,2,0), AS 2 uses (2,0), AS 3 uses (3,2,0)]

Routing Change: Path Exploration Initial situation –Destination 0 is alive –All ASes use direct path When destination dies –All ASes lose direct path –All switch to longer paths –Eventually withdrawn E.g., AS 2 –(2,0) → (2,1,0) –(2,1,0) → (2,3,0) –(2,3,0) → (2,1,3,0) –(2,1,3,0) → null [Figure: candidate paths at AS 1: (1,0), (1,2,0), (1,3,0); at AS 2: (2,0), (2,1,0), (2,3,0), (2,1,3,0); at AS 3: (3,0), (3,1,0), (3,2,0)]
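
The AS 2 sequence above can be replayed with a small sketch. It assumes shortest-AS-path selection with a deterministic tie-break and one particular order of message arrivals; the function names and the event script are illustrative assumptions, and real BGP timing may produce a different sequence.

```python
from typing import Dict, List, Optional, Tuple

Path = Tuple[int, ...]


def best_route(me: int, rib_in: Dict[int, Optional[Path]]) -> Optional[Path]:
    """Pick the shortest loop-free path among routes learned from neighbors."""
    candidates = []
    for path in rib_in.values():
        if path is None or me in path:
            continue  # withdrawn, or rejected because it would form a loop
        candidates.append((me,) + path)
    # Shortest AS path wins; ties are broken by tuple order (an assumption).
    return min(candidates, key=lambda p: (len(p), p)) if candidates else None


def replay(me: int,
           rib_in: Dict[int, Optional[Path]],
           events: List[Tuple[int, Optional[Path]]]) -> List[Optional[Path]]:
    """Apply (neighbor, new_path) events and record each change of best route."""
    history = [best_route(me, rib_in)]
    for neighbor, path in events:
        rib_in[neighbor] = path          # None models a withdrawal
        best = best_route(me, rib_in)
        if best != history[-1]:
            history.append(best)
    return history


# AS 2's view: direct route from 0, plus (1,0) from AS 1 and (3,0) from AS 3.
rib = {0: (0,), 1: (1, 0), 3: (3, 0)}
# One assumed arrival order after destination 0 dies.
events = [
    (0, None),        # direct route withdrawn
    (1, (1, 3, 0)),   # AS 1 has moved to (1,3,0)
    (3, (3, 2, 0)),   # AS 3 has moved to a path through AS 2 (a loop for AS 2)
    (1, None),        # AS 1 finally withdraws
]
history = replay(2, rib, events)
```

Each entry in `history` is a route that AS 2 would advertise in turn, which is why a single failure can generate several updates per AS.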

Convergence Overhead and Delay Path exploration is expensive –Large number of possible paths –Might have to explore (nearly) all of them Minimum Route Advertisement Interval –Minimum time between advertisement of routes for a given destination to a given neighbor –Rate limit on BGP update messages –… and allows combining multiple messages in one –Typical value of 30 seconds Convergence delay –(30 seconds) * (# of paths)

Four Kinds of BGP Routing Changes Destination becomes reachable –Switch from no path to a new path Better path becomes available –Switch from old path to new, better path Best path becomes unavailable –Switch from old path to new, worse path Destination becomes unreachable –Switch from old path to no path at all [Figure annotation: lower delay for the first two kinds, higher delay for the last two]

Questions About Convergence Delay Reduce the MRAI timer? –High message overhead on the router? –Delays from overloading the CPU? –What is the right value? Dependence on topology? –Worst-case: n!, which arises with a fully-connected graph (i.e., a clique), no filtering of advertisements, shortest-path routing, and a destination that dies completely –Typical case?
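
To get a feel for the factorial growth, the sketch below enumerates every loop-free path from one AS to the destination in a clique and multiplies the count by the 30-second MRAI, following the (30 seconds) * (# of paths) estimate from the earlier slide. The function names are hypothetical, and the result is a pessimistic bound under the worst-case assumptions listed above, not a prediction for real topologies.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

MRAI_SECONDS = 30  # typical value quoted in the slides


def simple_paths(src: int, others: Sequence[int], dest: int) -> List[Tuple[int, ...]]:
    """All loop-free paths src -> dest through any ordered subset of `others`."""
    paths = []
    for k in range(len(others) + 1):
        for middle in permutations(others, k):
            paths.append((src, *middle, dest))
    return paths


def exploration_delay_bound(num_paths: int, mrai: int = MRAI_SECONDS) -> int:
    """Upper bound in seconds if every path is explored one MRAI apart."""
    return num_paths * mrai


# AS 2 in the three-AS example has five candidate paths to destination 0.
paths_as2 = simple_paths(2, [1, 3], 0)
bound_as2 = exploration_delay_bound(len(paths_as2))
```

Adding just two more ASes to the clique raises the candidate count for one AS from 5 to 65, which is why the typical case matters far more than the worst case.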

Route Flap Damping

Persistent Routing Changes Causes –Link with intermittent connectivity –Congestion causing repeated session resets –Persistent oscillation due to policy conflicts Effects –Lots of BGP update messages –Disruptions to data traffic –High overhead on routers Solution –Suppress paths that go up/down repeatedly –… to avoid updates and prefer stable paths

Route Flap Damping BGP-speaking router –One or more BGP neighbors –Keep an “RIB-in” per neighbor –Select single best route per destination prefix Route-flap damping –Penalty counter per (peer, prefix) pair –Increment penalty when peer changes route –Decrease penalty over time when route is stable Designed and deployed in the mid 1990s –Widely viewed as helping improve stability

Example: Why Damping is Good Consider AS 3 –Path #1: (3,1,0) –Path #2: (3,2,0) If link (1,0) fails –AS 3 switches routes If link (1,0) restores –AS 3 switches routes If this happens a lot –Better for AS 3 to stick with (3,2,0) [Figure: AS 1 with route (1,0) and AS 2 with route (2,0)]

Damping Penalty Function [Figure: penalty plotted against time, with the suppression threshold and the lower reuse threshold marked]

Configurable Damping Parameters Penalty for a routing change –May vary with the type of update message –Advertisement vs. withdraw? Attributes change? Decaying in absence of a change –Exponent in the exponential decay Suppression threshold –Trigger for damping the route –Determines how many updates are tolerated Reuse threshold –Trigger for considering the route again –Determines how long the route is not usable
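
The four parameters above fit together roughly as in this sketch. The `DampingState` class is hypothetical, and the numeric defaults (penalty of 1000 per change, thresholds of 2000 and 750, 15-minute half-life) are illustrative assumptions, not values taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class DampingState:
    """Hypothetical penalty state for one (peer, prefix) pair."""
    penalty_per_change: float = 1000.0   # assumed value
    suppress_threshold: float = 2000.0   # assumed value
    reuse_threshold: float = 750.0       # assumed value
    half_life: float = 900.0             # seconds; assumed value
    penalty: float = 0.0
    last_update: float = 0.0
    suppressed: bool = False

    def _decay(self, now: float) -> None:
        # Exponential decay: the penalty halves every `half_life` seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def on_route_change(self, now: float) -> None:
        # Each change adds a fixed penalty on top of the decayed value.
        self._decay(now)
        self.penalty += self.penalty_per_change
        if self.penalty > self.suppress_threshold:
            self.suppressed = True

    def is_usable(self, now: float) -> bool:
        # A suppressed route returns only after decaying below the reuse threshold.
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_threshold:
            self.suppressed = False
        return not self.suppressed


state = DampingState()
for t in (0.0, 30.0, 60.0):   # three changes, 30 seconds apart
    state.on_route_change(t)
```

Under these assumed values the third change suppresses the route, and it stays unusable for roughly half an hour. The gap between the two thresholds is what keeps a route from bouncing in and out of suppression.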

Best Common Practices for Damping Different parameters for different prefixes –More aggressive with small address blocks –Disable damping on certain prefixes (e.g., corresponding to the DNS root servers) Avoid suppressing stable routes –Tolerate at least four routing changes Suppress unstable routes for quite a while –Values ranging from 10 minutes to 1 hour –Values of 30 minutes are not uncommon

Interaction with Path Exploration BGP routing convergence –Explore one or more alternate paths –Number of alternate paths may be quite high –Time between steps is small (e.g., 30 seconds) Triggering route-flap damping –Increasing penalty with each step –Only small amount of decay between steps Convergence may trigger route flap damping –Convergence may involve more than 4 changes –Routing change may trigger lost connectivity!!! –Confirmed by recent active measurement studies
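
A short calculation shows why closely spaced exploration steps can trip damping even when four changes are tolerated. The function name and the numeric parameters (penalty of 1000 per change, threshold of 4000, 15-minute half-life) are assumptions chosen for illustration; only the 30-second spacing comes from the slides.

```python
from typing import Optional


def changes_until_suppressed(interval: float,
                             penalty_per_change: float = 1000.0,
                             suppress_threshold: float = 4000.0,
                             half_life: float = 900.0,
                             max_changes: int = 1000) -> Optional[int]:
    """Number of evenly spaced changes needed to exceed the threshold.

    Returns None if the penalty never exceeds the threshold within
    `max_changes` changes (the decay outpaces the increments).
    """
    decay = 0.5 ** (interval / half_life)   # decay factor between two changes
    penalty = 0.0
    for count in range(1, max_changes + 1):
        penalty = penalty * decay + penalty_per_change
        if penalty > suppress_threshold:
            return count
    return None


during_exploration = changes_until_suppressed(interval=30.0)    # MRAI spacing
hourly_flaps = changes_until_suppressed(interval=3600.0)        # slow flapping
```

With 30-second spacing the penalty barely decays between steps, so the fifth exploration step suppresses an otherwise stable route, while a route that changes once an hour is never suppressed under the same assumed parameters.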

Effects of Damping are Confusing AS 0 is a stable network Link (1,3) fails a lot –AS 3 switches routes back and forth a lot –Sends new BGP updates to its customers –Suppose AS 3 does not apply route-flap damping AS 3’s customers –Eventually dampen route –Causes lost reachability to destination in AS 0 How can AS 0 diagnose this problem, and fix it?

Open Questions Want to suppress unstable routes –Otherwise, lots of update messages –… and lots of transient disruptions Yet, want to tolerate path exploration –Otherwise, you suppress stable routes –… and black-hole otherwise reachable destinations How to reconcile? –Better flap-damping parameters? –More information in update messages? –Something more gentle than suppression?

BGP Stability of Popular Destinations

BGP Routing and Traffic Popularity A possible saving grace… –Most BGP updates due to few prefixes –… and, most traffic due to few prefixes –... but, hopefully not the same prefixes Popularity vs. BGP stability –Do popular prefixes have stable routes? Yes, for ~ 10 days at a stretch! –Does most traffic travel on stable routes? A resounding yes! –Direct correlation of popularity and stability? Well, no, not exactly…

BGP Updates BGP updates for March 2002 –AT&T route reflector –RouteViews and RIPE-NCC Data preprocessing –Filter duplicate BGP updates –Filter resets of monitor sessions –Removes 7-30% of updates Grouping updates into “events” –Updates for the same prefix –Close together in time (45 sec) –Reduces sensitivity to timing Confirmed: few prefixes responsible for most events
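
The grouping step can be sketched as below. One assumption is made explicit: the 45-second window is measured between consecutive updates for the same prefix, which the slide does not spell out. The function name and the sample prefixes are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

EVENT_TIMEOUT = 45.0  # seconds, from the slide


def group_into_events(updates: List[Tuple[float, str]],
                      timeout: float = EVENT_TIMEOUT) -> Dict[str, List[List[float]]]:
    """Group (timestamp, prefix) updates into per-prefix events.

    An update joins the current event for its prefix if it arrives within
    `timeout` seconds of the previous update for that prefix (an assumed
    reading of "close together in time"); otherwise it starts a new event.
    """
    events: Dict[str, List[List[float]]] = defaultdict(list)
    for ts, prefix in sorted(updates):
        prefix_events = events[prefix]
        if prefix_events and ts - prefix_events[-1][-1] <= timeout:
            prefix_events[-1].append(ts)
        else:
            prefix_events.append([ts])
    return dict(events)


sample = [
    (0.0, "10.0.0.0/8"), (30.0, "10.0.0.0/8"), (70.0, "10.0.0.0/8"),
    (200.0, "10.0.0.0/8"), (10.0, "192.0.2.0/24"),
]
grouped = group_into_events(sample)
```

Counting events rather than raw updates means a single convergence episode with several path-exploration steps is counted once.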

Two Views of Prefix Popularity AT&T traffic data –Netflow data on peering links –Aggregated to the prefix level –Outbound from AT&T customers –Inbound to AT&T customers NetRatings Web sites –NetRatings top-25 list –Convert to site names –DNS to get IP addresses –Clustered into 33 prefixes [Figure: traffic flowing in and out between AT&T and the Internet; example prefix: Amazon /20]

Traffic Volume vs. BGP Events (CDF) –50% of events: 1.4% of traffic (4.5% of prefixes) –50% of traffic: 0.1% of events (0.3% of prefixes)

Update Events/Day (CCDF, log-log plot) –1% had > 5 events per day –No “popular” prefix had > 3 events per day –Most “popular” prefixes had < 0.2 events/day and just 1 update/event

An Interpretation of the Results Popular → stable –Well-managed –Few failures and fast recovery –Single-update events to alternate routes Unstable → unpopular –Persistent flaps: hard to reach –Frequent flaps: poorly-managed sites Unpopular does not imply unstable –Most prefixes are quite stable –Well-managed, simple configurations –Managed by upstream provider

Avoiding Path Exploration

Reducing Path Exploration By Tagging When AS 1 sees (1,0) fail –Switches to (1,2,0) –Why not say “because the link (1,0) has failed”? –Allow ASes to discard all paths that use edge (1,0) Should reduce exploration –E.g., AS 3 should not consider (3,2,1,0) –E.g., AS 2 should not consider (2,3,1,0) Seems appealing, but… [Figure: candidate paths at AS 1: (1,0), (1,2,0), (1,3,0); at AS 2: (2,0), (2,1,0), (2,3,0); at AS 3: (3,0), (3,1,0), (3,2,0)]
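
The pruning idea can be sketched as a filter over candidate paths. The helper names are hypothetical, and the sketch treats each AS as a single node with a single link per neighbor, which is exactly the simplification the later slides call into question.

```python
from typing import List, Tuple

Path = Tuple[int, ...]
Edge = Tuple[int, int]


def uses_edge(path: Path, edge: Edge) -> bool:
    """True if the path traverses `edge` in the given direction."""
    return any((a, b) == edge for a, b in zip(path, path[1:]))


def prune(candidates: List[Path], failed_edge: Edge) -> List[Path]:
    """Discard every candidate path that crosses the failed edge."""
    return [p for p in candidates if not uses_edge(p, failed_edge)]


# AS 3's candidates when it learns that link (1,0) has failed.
as3_candidates = [(3, 0), (3, 1, 0), (3, 2, 0), (3, 2, 1, 0)]
as3_remaining = prune(as3_candidates, (1, 0))
```

With the failed edge known, AS 3 skips (3,1,0) and (3,2,1,0) immediately instead of discovering one update at a time that they no longer work.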

Problem #1: Timing of Information How long should the ASes believe the info? –What if the link (1,0) comes back up? –What if the info about the failure is still propagating? Do the ASes need to remember the old paths? –E.g., should AS 2 remember (2,3,1,0) in case it learns later that (1,0) has come back up? –BGP is an incremental protocol, so forgetting information may be risky unless you will get it back again But, these issues are probably surmountable –… with some attention to the details

Problem #2: AS With Multiple Routers/Links BGP introduces abstraction –Treats each AS as a single node –Doesn’t distinguish between links Example: one link fails –Should AS 1 tell others? –Need to identify which link? –Does it introduce more updates? Internal BGP details matter –Some AS 1 routers don’t know about both paths through AS 0… [Figure: AS 1 and AS 0, with destination d]

Internal BGP Convergence Briefly, the border router has no route at all!

Questions Can we reduce path exploration? –Hints in the BGP update messages –To avoid exploring a set of related paths Handling the challenges –Timing details –Multiple routers and links per AS –… without excessive overhead Can we change the problem? –Server per AS that stores all candidate routes –Exchanging information about the root cause

Next Time: Protocol Divergence Two papers –“The Stable Paths Problem and Interdomain Routing” –“Stable Interdomain Routing Without Global Coordination” Review only of the first paper –Summary –Why accept –Why reject –Future work Optional NANOG video on “BGP Wedgies”