CompSci 514: Computer Networks
Lecture 5: BGP problems
Xiaowei Yang
4/12/2015



Today
Known problems of BGP
– Multi-homing
– Instability
– Delayed convergence (slow failover)
Discussing fixes
– Root cause, ghost flushing etc.

Failover
BGP is designed for scaling more than for fast failover
– Many mechanisms favor this balance
– Route flap damping, for example: if a route changes excessively (“flapping”), ignore it for some time. This has unexpected effects on convergence times.
– Route advertisement/withdrawal timers are in the 30-second range
– Effect: tens of seconds to many minutes to recover from “simple” failures
– 15-30 minute outages are not uncommon
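The flap-damping idea above can be sketched as a per-route penalty that grows with every flap and decays exponentially, with the route ignored while the penalty stays high. This is a minimal sketch: the class name and all constants (penalty per flap, suppress and reuse limits, half-life) are illustrative assumptions, not values from the lecture.

```python
# Illustrative constants (assumptions, not taken from the slides).
PENALTY_PER_FLAP = 1000.0
SUPPRESS_LIMIT = 2000.0
REUSE_LIMIT = 750.0
HALF_LIFE_S = 15 * 60


class DampedRoute:
    """Tracks a flap penalty for one route (hypothetical helper class)."""

    def __init__(self):
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # Exponential decay: the penalty halves every HALF_LIFE_S seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / HALF_LIFE_S)
        self.last_update = now

    def flap(self, now):
        """Record one routing change at time `now` (seconds)."""
        self._decay(now)
        self.penalty += PENALTY_PER_FLAP
        if self.penalty > SUPPRESS_LIMIT:
            self.suppressed = True

    def usable(self, now):
        """Return True if the route may be used at time `now`."""
        self._decay(now)
        if self.suppressed and self.penalty < REUSE_LIMIT:
            self.suppressed = False
        return not self.suppressed


route = DampedRoute()
for t in (0, 10, 20):          # three flaps within 20 seconds
    route.flap(t)
suppressed_at_30s = not route.usable(30)
usable_after_1h = route.usable(3600)
```

With these assumed constants, three quick flaps push the penalty over the suppress limit, and the route stays ignored until the penalty has decayed below the reuse limit, long after the flapping stopped. That is one way damping can stretch recovery to many minutes.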

Multi-homing
Connect to multiple providers
– Goal: higher availability, more capacity
Problems:
– Provider-based addressing breaks
– Everyone needs their own address space

Multi-homing increases routing table size
[Figure: Multi-home.com connects to both ISP1 and ISP2. ISP1 announces "You can reach /8 and /16 via ISP1"; routing-table entries list /8 and /16 prefixes via ISP1 and ISP2.]
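A small sketch of why multi-homing adds table entries, using Python's standard `ipaddress` module. The prefixes `10.0.0.0/8` and `10.1.0.0/16` are hypothetical stand-ins chosen for illustration; they are not the addresses from the slide.

```python
import ipaddress

# Hypothetical prefixes for illustration only.
isp1_block = ipaddress.ip_network("10.0.0.0/8")
customer = ipaddress.ip_network("10.1.0.0/16")   # carved out of ISP1's block

# Single-homed: the customer's /16 is covered by ISP1's /8,
# so the rest of the Internet needs only the aggregate.
single_homed = list(ipaddress.collapse_addresses([isp1_block, customer]))

# Multi-homed: ISP2 has to announce the /16 on its own, so the /16
# can no longer hide inside the /8 and both prefixes are carried.
multi_homed = sorted({isp1_block, customer})

print(len(single_homed), len(multi_homed))
```

The single-homed case collapses to one aggregate, while the multi-homed case keeps the more specific prefix as a separate entry.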

Global routing tables continue to grow
Source:

Other BGP problems
Convergence: BGP may explore many routes before finding the right new one.
– Labovitz et al., SIGCOMM 2000
Correctness: routes may not be valid, visible, or loop-free.
Security: There is none!
– Some providers filter what announcements their customers can make. Not all do.
– See paper discussion site for pointers

Measurement studies
Two papers (measurement)
– End-to-end traffic
– Routing messages
Experimental techniques
Results

Internet Routing Instability
Goals: measure how often BGP sends updates to change routes
Methodology:
– Analyzing BGP logs collected over a long period

Terms
WADiff: withdrawal → announcement (different route)
AADiff: announcement → announcement (different route)
WADup: same route, withdrawal → announcement
AADup: same route, announcement → announcement
WWDup: same route, withdrawal → withdrawal
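The taxonomy above can be sketched as a small classifier over the update stream for one (peer, prefix) pair. This is a simplified illustration: routes are compared by AS path only, the function name and input format are assumptions, and the label "AW" (an announcement followed by a withdrawal) is added here only so that every update gets a label.

```python
def classify_updates(updates):
    """Label consecutive BGP updates for a single (peer, prefix) pair.

    `updates` is a list of ("A", as_path) announcements and ("W", None)
    withdrawals in arrival order. Returns one label per update after the first.
    """
    labels = []
    last_path = None        # most recently announced path, kept across withdrawals
    prev_kind = None
    for kind, path in updates:
        if prev_kind is not None:
            if kind == "W":
                # A withdrawal after a withdrawal is a duplicate.
                labels.append("WWDup" if prev_kind == "W" else "AW")
            else:
                same = path == last_path
                prefix = "WA" if prev_kind == "W" else "AA"
                labels.append(prefix + ("Dup" if same else "Diff"))
        if kind == "A":
            last_path = path
        prev_kind = kind
    return labels


example = [
    ("A", (1, 2)), ("A", (1, 2)), ("W", None), ("W", None),
    ("A", (1, 2)), ("A", (1, 3)), ("W", None), ("A", (4, 3)),
]
labels = classify_updates(example)
```

For the example stream, the labels cover each of the five categories once, plus two ordinary withdrawals.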

Observed pathologies
Repeated WWDup, WADup, AADup
Why are they pathologies?

Majority of BGP updates are WWDup
WWDups come from ASes that never announced the routes being withdrawn
Why?
– Stateless BGP: the router does not remember what it has sent to its peers
– So it sends withdrawals to all peers
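A minimal sketch of the difference between a stateless and a stateful sender when a route goes away. The function names and the peer list are hypothetical; the point is only that a sender without per-peer state withdraws to peers that never heard the announcement.

```python
def withdrawals_stateless(peers, advertised_to):
    """Stateless sender: withdraw to every peer, whatever was advertised."""
    return set(peers)


def withdrawals_stateful(peers, advertised_to):
    """Stateful sender: withdraw only to peers that received the route."""
    return set(peers) & set(advertised_to)


peers = ["AS1", "AS2", "AS3", "AS4"]
advertised_to = ["AS2"]          # the route was only ever announced to AS2

# Withdrawals that reach peers which never saw the announcement.
extra = withdrawals_stateless(peers, advertised_to) - withdrawals_stateful(peers, advertised_to)
```

In this example three of the four withdrawals are unnecessary, which is the kind of redundant traffic the WWDup category captures.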

Possible origins of instability
Stateless BGP
Physical link errors
Unjittered timers
IGP/BGP interactions
Conflicting routing policies

Data analysis techniques
Time series analysis
Frequency analysis
– Fast Fourier transform
– Maximum entropy spectral estimation
The two estimation methods differ, but both find significant frequencies at seven days and 24 hours
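As a rough illustration of the frequency analysis, the sketch below finds the dominant periods in a synthetic hourly update count. Two things are assumptions: the series is made up (a daily cycle plus a weaker weekly cycle), and a plain O(n^2) discrete Fourier transform stands in for the FFT so the example needs only the standard library.

```python
import cmath
import math

HOURS = 24 * 7 * 4   # four weeks of hourly update counts (synthetic)

# Synthetic series (assumption): a daily cycle plus a weaker weekly cycle.
series = [
    100 + 30 * math.sin(2 * math.pi * t / 24) + 15 * math.sin(2 * math.pi * t / 168)
    for t in range(HOURS)
]


def dft_magnitudes(x):
    """Magnitude of each DFT bin k = 1 .. n/2 (plain O(n^2) transform)."""
    n = len(x)
    mean = sum(x) / n                      # remove the constant component
    mags = {}
    for k in range(1, n // 2 + 1):
        acc = sum((x[t] - mean) * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags[k] = abs(acc)
    return mags


mags = dft_magnitudes(series)
top_two = sorted(mags, key=mags.get, reverse=True)[:2]
periods_hours = sorted(HOURS / k for k in top_two)   # bin k corresponds to period n / k
```

On this synthetic input the two strongest bins correspond to periods of 24 hours and 168 hours (seven days).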

Main results
Many more updates than expected
– 99% are pathological. Impressive!
– A taxonomy to analyze pathologies
Speculation about causes
– Configuration errors, router bugs
– Correlated with traffic load, perhaps due to router architectures
– Open research question: root cause of updates
Motivated much follow-up work

End-to-end routing behavior
Goals: study routing pathologies, route stability, and routing asymmetry
Methodology:
– End-to-end measurements
– Traceroutes from N sites, N^2 paths
– Exponentially spaced-out sampling
    Nice properties! Unbiased
    PASTA (Poisson arrivals see time averages): the fraction of measurements that observe a given state equals the fraction of time that the system spends in that state
– Two datasets, D1 and D2, with different intervals: 1~2 days, 2 hrs, 2.75 days
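The PASTA property can be checked with a small simulation. The system below is hypothetical: it is "busy" for 3 seconds out of every 10, so the true time fraction is 0.3. Sampling at exponentially distributed gaps should estimate that fraction, while sampling at a fixed interval equal to the system's period does not.

```python
import random


def time_in_state(t, period=10.0, busy=3.0):
    """Hypothetical system: 'busy' for the first `busy` seconds of each period."""
    return (t % period) < busy


def poisson_sample_fraction(rate, horizon, seed=1):
    """Sample at exponentially distributed gaps; return fraction seen busy."""
    rng = random.Random(seed)
    t, hits, total = 0.0, 0, 0
    while True:
        t += rng.expovariate(rate)   # exponential inter-sample gap
        if t >= horizon:
            break
        total += 1
        hits += time_in_state(t)
    return hits / total


def periodic_sample_fraction(interval, horizon):
    """Fixed-interval sampling, which can lock onto the system's own period."""
    n = int(horizon / interval)
    return sum(time_in_state(i * interval) for i in range(n)) / n


poisson_est = poisson_sample_fraction(rate=0.1, horizon=200_000.0)
periodic_est = periodic_sample_fraction(interval=10.0, horizon=200_000.0)
```

The periodic sampler always lands at the start of a period and reports the system as busy every time, whereas the exponentially spaced sampler comes out close to 0.3.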

Measurement infrastructure
37 hosts
8% of the Internet at that time
Pair-wise traceroute

Pathologies
Persistent loops: %, some lasted hours
– Same router appearing in a traceroute more than three times
Erroneous routing: packets to UCL sent to Israel
Mid-stream change: 0.16% (D1) and 0.44% (D2)
– Suggests route change
Infrastructure failure: availability 99.8%, 99.5%
– Host unreachable
– Telephone networks: 2 hours in 40 years, five nines
Outages: more than six packet losses in a traceroute
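The loop criterion on this slide (the same router appearing more than three times in one traceroute) can be sketched as a simple counter. The function name, the hop representation (a list of router identifiers, with `None` for a hop that did not reply), and the sample traces are assumptions for illustration.

```python
from collections import Counter


def has_persistent_loop(hops, threshold=3):
    """Flag a traceroute if any router appears more than `threshold` times."""
    counts = Counter(h for h in hops if h is not None)   # None = no reply
    return any(c > threshold for c in counts.values())


looping = ["a", "b", "c", "b", "c", "b", "c", "b", "c"]
normal = ["a", "b", "c", "d"]

loop_found = has_persistent_loop(looping)
loop_in_normal = has_persistent_loop(normal)
```

A single traceroute only shows that a loop existed at measurement time; deciding that it is persistent would take repeated measurements, which this sketch does not model.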

Stabilities
Prevalence: the probability of observing a particular route
Persistence: how long a route lasts
Examples:
– R1, R2, R1, R2
– R1, R1, R2, R2
– Same prevalence, different persistence
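The two example sequences can be worked through in a few lines. In this sketch, prevalence is the fraction of observations showing a route, and persistence is approximated by the length of consecutive runs of the same route; treating run length as a stand-in for duration assumes evenly spaced observations, and the helper names are hypothetical.

```python
from itertools import groupby


def prevalence(observations, route):
    """Fraction of observations in which `route` was seen."""
    return observations.count(route) / len(observations)


def run_lengths(observations):
    """Lengths of consecutive runs of the same route, in observation order."""
    return [(route, len(list(run))) for route, run in groupby(observations)]


seq_a = ["R1", "R2", "R1", "R2"]
seq_b = ["R1", "R1", "R2", "R2"]

same_prevalence = prevalence(seq_a, "R1") == prevalence(seq_b, "R1")
runs_a = run_lengths(seq_a)
runs_b = run_lengths(seq_b)
```

Both sequences show R1 half of the time, but the first changes route at every observation while the second changes only once.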

Prevalence of dominant route
$P_{\mathrm{dom}}^{p} = k_p / n_p$ (of the $n_p$ measurements of path $p$, $k_p$ observed the dominant route)
Internet paths are strongly dominated by one path

/3 of routes persist for days

Path asymmetry
Common
– Don’t assume path symmetry in your design
– 49% of measurements have asymmetric paths that differ by at least one city
– 30% observed different ASes
– 20% differ by more than one city/AS
– Q: what might cause it?

Comments on this paper
Seminal work on Internet measurement
Solid data
Rigorous analysis

Delayed Internet Convergence
Measurement
Problem discovery
Modeling & analysis
Improvement
Methodologies

Experiment setup
Actively inject BGP faults
– How is a fault injected?
Passively listen at peering sessions, and use NTP-synchronized machines to calculate the convergence time
Actively send probe packets to observe end-to-end packet loss and latency
Much later BGP work uses similar measurement techniques.

Results show delayed convergence
Bad news travels slow.

Slow routing convergence results in poor end-to-end performance

What causes the delayed routing convergence?
A simple BGP convergence model reveals that in the worst case, all possible paths are explored before a prefix is withdrawn.
No minimum advertisement timer: synchronized network, global message queue
[Figure: nodes 0, 1, 2 and destination R. Each node's routing table is shown as a triple of paths, e.g. (*0R, 1R, ∞), stepping through paths such as 01R, 10R, 20R until every entry is ∞.]

Min Route Advertisement Interval timer (MRAI) reduces message count
Why?
– MRAI introduces synchronization. Multiple announcements are combined into one announcement, reducing the total message count.
However, the convergence time becomes proportional to timer_interval * (n-3)
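A quick worked example of the timer_interval * (n-3) relationship. The 30-second default follows the timer range mentioned earlier in the lecture; taking the proportionality constant as 1 and the function name are assumptions, so the numbers show scale rather than predictions.

```python
def mrai_convergence_bound(n_nodes, timer_interval_s=30.0):
    """timer_interval * (n - 3), taking the proportionality constant as 1."""
    return timer_interval_s * (n_nodes - 3)


# Estimated convergence time in seconds for a few network sizes.
estimates = {n: mrai_convergence_bound(n) for n in (5, 10, 20)}
```

Under these assumptions, going from 5 to 20 nodes moves the estimate from one minute to several minutes, which is consistent with the multi-minute recovery times discussed in the lecture.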

Let’s brainstorm…
How can we fix the slow convergence problem?
– What is the solution proposed by the authors?
    Sender-side loop detection. When a sender detects a loop, it sends a withdrawal to a neighbor immediately. Since withdrawals are not subject to the MRAI delay, this improvement reduces both message count and convergence time.
– What exactly is the root cause of BGP’s slow convergence problem?
– Can you come up with any solution?

Sender-side loop detection
Without sender-side loop detection:
– AS3 → AS1: 301R
– This announcement is sent out when the MRAI timer expires
With sender-side loop detection:
– AS3 → AS1: withdrawal
– The withdrawal is sent out immediately. AS1 knows it has no path.
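The AS3 → AS1 example can be sketched as a per-neighbor decision at the sender. The function name and the list representation of AS paths are assumptions; timers are not modeled, so the sketch shows only which message is chosen, not when it is sent.

```python
def message_for_neighbor(my_as, best_path, neighbor_as):
    """Decide what to send to `neighbor_as` about one prefix.

    `best_path` is the AS path currently selected (e.g. [0, 1, "R"]),
    or None if no path is known. Returns ("announce", path) or ("withdraw", None).
    """
    if best_path is None or neighbor_as in best_path:
        # The neighbor would reject a path containing itself as a loop,
        # so the sender tells it directly that no usable path exists.
        return ("withdraw", None)
    return ("announce", [my_as] + best_path)


# AS3 has selected path 0 1 R; announcing 3 0 1 R to AS1 would loop through AS1.
to_as1 = message_for_neighbor(3, [0, 1, "R"], 1)
to_as2 = message_for_neighbor(3, [0, 1, "R"], 2)
```

Here AS1 receives a withdrawal instead of the looping path 301R, while AS2, which is not on the path, still receives the announcement.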

BGP assertion
Detect path inconsistency between different neighbors
If an inconsistency is found, give paths learned from direct neighbors higher priority
Sensitive to topology
Does not eliminate all invalid paths
[Figure: nodes N1, N2, R, D, X, Y, with a failure marked X. N1 → R: N1 X Y D. R: N1 D, N2 N1 D.]

Ghost flushing
If the new path is worse than the last announced path, and the router advertisement timer has not expired yet, send a withdrawal immediately. The withdrawal flushes “ghost” information.
Reduces the convergence time because withdrawals are not delayed by MRAI, but does not help much with “Tlong.”
[Figure: nodes N1, N2, R, D, X, Y, with a failure marked X. N1 → R: withdrawal. R: N1 D, N2 N1 D.]
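The ghost-flushing rule can be written as a small decision function. This is a sketch under stated assumptions: "worse" is approximated by a longer AS path, the function name and return format are hypothetical, and the later announcement of the new path once the timer expires is left out.

```python
def ghost_flush_action(new_path, last_announced, mrai_expired):
    """Return the message to send now, or None to wait for the MRAI timer.

    Paths are lists of AS numbers; "worse" is approximated here by a longer
    AS path, which is a simplifying assumption.
    """
    if new_path is None:
        return ("withdraw", None)          # withdrawals are never delayed
    if mrai_expired:
        return ("announce", new_path)
    if last_announced is not None and len(new_path) > len(last_announced):
        return ("withdraw", None)          # flush the stale ("ghost") path now
    return None                            # hold the update until MRAI expires


worse_before_timer = ghost_flush_action([2, 1, 0], [1, 0], mrai_expired=False)
better_before_timer = ghost_flush_action([1, 0], [2, 1, 0], mrai_expired=False)
worse_after_timer = ghost_flush_action([2, 1, 0], [1, 0], mrai_expired=True)
```

Only the first case triggers a flush: the router has switched to a worse path but cannot announce it yet, so it withdraws the previously announced path rather than leaving neighbors with stale information.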

BGP root cause notification
Neither BGP assertion nor ghost flushing works well in this topology.
– Why?
– BGP assertion: 3 and 6 are both direct neighbors of 5, but their announcements may be inconsistent
– BGP ghost flushing: the newer path is subject to the MRAI delay
Explicitly send out link up/down information
Essentially adds link-state information into BGP
A sequence number is used to order the notifications.
Open research problem: can you get rid of the sequence number?
[Figure: example topology including node 6, with a failed link marked X.]
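A minimal sketch of how sequence numbers could order link up/down notifications and let a router discard paths that cross a failed link. The class, its methods, and the message fields are hypothetical; the node numbers are borrowed from the slide, but the topology itself is not reproduced.

```python
class LinkStateView:
    """Keeps the newest known status of each link (hypothetical helper)."""

    def __init__(self):
        self.latest = {}    # link -> (sequence number, is_up)

    def apply(self, link, seq, is_up):
        """Apply a notification; return False if it is stale and ignored."""
        known = self.latest.get(link)
        if known is not None and known[0] >= seq:
            return False
        self.latest[link] = (seq, is_up)
        return True

    def path_valid(self, path):
        """A path is invalid if it crosses any link currently known to be down."""
        for a, b in zip(path, path[1:]):
            status = self.latest.get(frozenset((a, b)))
            if status is not None and not status[1]:
                return False
        return True


view = LinkStateView()
link_5_6 = frozenset((5, 6))
accepted_down = view.apply(link_5_6, seq=2, is_up=False)   # newest news: link is down
accepted_old_up = view.apply(link_5_6, seq=1, is_up=True)  # older "up" arrives late
path_via_failed_link = view.path_valid([3, 5, 6])
path_avoiding_link = view.path_valid([3, 5])
```

Without the sequence number, the late "up" notification would overwrite the newer "down" and the invalid path would be kept, which is why ordering matters in this design.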

Summary
BGP’s slow convergence problem and other problems
Convergence involves a tradeoff among message overhead, processing overhead, and latency.
We do not yet know the best solution to address this problem.

Comments
Measurement paper
– Data
– Data collection techniques
– Data analysis
When to write a measurement paper