Transient BGP Loops Do they matter, and what can be done about them? Nate Kushman MIT/Akamai Srikanth Kandula, Dina Katabi and John Wroclawski.

What causes “Transient BGP Loops”?
[Diagram, shown as a sequence of animation frames. Nodes: MIT, Bob, Joe, AT&T, Sprint. Frame labels: Maintenance; Withdraw MIT; Routing Loop]

How common are “Transient Inter-domain Routing Loops”?
Sprint Study (IMC 2003, IMW 2002):
– Looked at packet traces from the Sprint backbone
– Up to 90% of the observed packet loss was caused by routing loops
– 60-100% of the loops attributable to BGP

Routing Loop Damage
Our Study:
– 20 vantage points with BGP feeds
– 2 months
– 70,000 unique prefixes
– Pinged once every 2 minutes
– Tracerouted once every 30 minutes
– TTL Exceeded responses used to detect loops
– Additional pings and traceroutes when loops detected
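The study relies on TTL Exceeded responses to spot loops. A minimal sketch of one way to flag a loop in a traceroute hop list is shown here: a router address that reappears after a different hop has answered in between. The function name and the sample addresses are hypothetical, and the slides do not describe the authors' actual tooling.

```python
from typing import Dict, Optional, Sequence, Tuple


def find_loop(hops: Sequence[Optional[str]]) -> Optional[Tuple[int, int]]:
    """Return (first_index, repeat_index) for the first revisited hop, or None.

    `hops` lists the responding router address for each TTL in order;
    None marks a probe that got no reply. The same router answering for
    consecutive TTLs is not treated as a loop in this sketch.
    """
    first_seen: Dict[str, int] = {}
    prev: Optional[str] = None
    for i, hop in enumerate(hops):
        if hop is None or hop == prev:
            continue
        if hop in first_seen:
            return first_seen[hop], i
        first_seen[hop] = i
        prev = hop
    return None


# Hypothetical traceroute outputs (documentation addresses).
looping_path = ["192.0.2.1", "192.0.2.5", "192.0.2.9", "192.0.2.5", "192.0.2.9"]
clean_path = ["192.0.2.1", "192.0.2.5", None, "192.0.2.9"]

print(find_loop(looping_path))
print(find_loop(clean_path))
```

A single traceroute taken during a route change can also show a repeated hop without a real loop, which is presumably why the study follows up with additional pings and traceroutes.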

Routing Loop Damage 10-15% of updates cause routing loops

Collateral Damage
[Diagram labels: AS A, AS B, AS C, AS D, AS E, AS F; X; Collateral Damage]

Collateral Damage Prefixes sharing a loopy link see 19% loss
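The slides do not spell out why unrelated prefixes suffer. One plausible mechanism is that a looping packet keeps crossing the same link until its TTL runs out, using up capacity that other traffic on that link needs. The sketch below counts link crossings for one packet. The AS names are borrowed from the diagram, but the next-hop tables, the loop location, and the simplified TTL handling are assumptions made for illustration.

```python
from collections import Counter


def link_crossings(next_hop, src, dst, ttl=64):
    """Forward one packet hop by hop and count how often each link is crossed.

    `next_hop` maps a node to the neighbor it forwards to for this
    destination. Each forwarding step costs one unit of TTL.
    Returns (Counter keyed by frozenset({a, b}), delivered flag).
    """
    crossings = Counter()
    node = src
    while node != dst and ttl > 0:
        nxt = next_hop[node]
        crossings[frozenset((node, nxt))] += 1
        node = nxt
        ttl -= 1
    return crossings, node == dst


# Assumed toy tables toward destination F.
healthy = {"A": "B", "B": "C", "C": "D", "D": "F"}
looping = {"A": "B", "B": "C", "C": "D", "D": "C"}  # D points back at C

ok_counts, ok_delivered = link_crossings(healthy, "A", "F")
loop_counts, loop_delivered = link_crossings(looping, "A", "F")

print(ok_counts[frozenset(("C", "D"))], ok_delivered)
print(loop_counts[frozenset(("C", "D"))], loop_delivered)
```

With these assumed tables, the single looping packet crosses the C-D link dozens of times before it is dropped, while the healthy packet crosses it once.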

What should be done? We should prevent forwarding loops

A loop occurs because: one AS pushes a route update to its data plane, but other ASes, not yet aware of the change, still send packets along the old route

How can we avoid Routing Loops?
[Diagram: MIT, Bob, Joe, AT&T, Sprint; Maintenance; Withdraw MIT]
AT&T still thinks Joe is routing through Bob.
What if AT&T knew about Joe’s change before making its own?

Suspension
– Continue to route traffic
– Tell control system not to propagate the route

How can we avoid Routing Loops?
[Diagram: MIT, Bob, Joe, AT&T, Sprint; Maintenance; Withdraw MIT]
What if Joe sends its update before changing its forwarding table, and also waits for an Ack from AT&T before updating its forwarding table?
Then we can be sure that AT&T knows about the path change before it happens and will not use the path.
Instead, AT&T will move immediately to the Sprint path and the loop is avoided.
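The ordering argument above can be sketched as a toy simulation. The AS names come from the diagram, but the next-hop tables, the three-step schedule, and all function names are assumptions for illustration. This is not the authors' protocol implementation, and it does not model BGP path selection. It also assumes Joe's old path through Bob stays usable while Joe waits for the Ack, as it would before planned maintenance begins.

```python
def has_loop(fib, src, dst):
    """True if a packet from src revisits a node before reaching dst."""
    node, visited = src, set()
    while node != dst:
        if node in visited:
            return True
        visited.add(node)
        node = fib[node]
    return False


def any_loop(fib, dst="MIT"):
    """True if any node's packets toward dst would loop under this table."""
    return any(has_loop(fib, src, dst) for src in fib)


# Assumed starting next hops toward MIT: AT&T forwards via Joe, Joe via Bob.
START = {"Bob": "MIT", "Sprint": "MIT", "Joe": "Bob", "AT&T": "Joe"}


def plain_order(fib):
    """Joe changes its forwarding table first; AT&T hears about it later."""
    joe_moved = {**fib, "Joe": "AT&T"}
    att_moved = {**joe_moved, "AT&T": "Sprint"}
    return [joe_moved, att_moved]


def ack_first_order(fib):
    """Joe announces, waits for AT&T's Ack, and only then changes its table."""
    announced = dict(fib)                       # Joe still forwards via Bob
    att_moved = {**announced, "AT&T": "Sprint"}  # AT&T reacts, then Acks
    joe_moved = {**att_moved, "Joe": "AT&T"}     # Ack received, Joe switches
    return [announced, att_moved, joe_moved]


plain_loops = [any_loop(state) for state in plain_order(START)]
ack_loops = [any_loop(state) for state in ack_first_order(START)]

print("plain ordering:", plain_loops)
print("ack-first ordering:", ack_loops)
```

Both orderings end with the same forwarding tables. Only the plain ordering passes through an intermediate state in which Joe and AT&T point at each other.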

More Generally
We have proven:
– Loops are prevented in the general case
– Convergence properties similar to normal BGP
All sorts of good proofs and stuff: –

Your feedback
Clearly:
– Planned maintenance events (20% of update events are caused by planned maintenance)
– Link up events
What about?
– Unplanned link down events
– Trade-off between loss on current path and collateral damage

In Short
– Routing loops cause significant performance problems
– Even prefixes with no BGP updates are significantly affected by loops
– A simple change to BGP can avoid all routing loops

Questions?