Nick Feamster Interdomain Routing Correctness and Stability.

Slides:



Advertisements
Similar presentations
Enterprise Network Troubleshooting Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina, Manas Khadilkar, Aditi Thanekar)
Advertisements

Improving Internet Availability. Some Problems Misconfiguration Miscoordination Efficiency –Market efficiency –Efficiency of end-to-end paths Scalability.
Data Mining Challenges for Network Management Nick Feamster, Georgia Tech Dave Andersen, CMU (joint with Jay Lepreau and Emulab)
Networking Research Nick Feamster CS Nick Feamster Ph.D. from MIT, Post-doc at Princeton this fall Arriving January 2006 –Here off-and-on until.
Internet Availability Nick Feamster Georgia Tech.
Nick Feamster Research Interest: Networked Systems Arriving January 2006 Likely teaching CS 7260 in Spring 2005 Here off-and-on until then. works.
Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)
Network Operations Research Nick Feamster
1 Interdomain Traffic Engineering with BGP By Behzad Akbari Spring 2011 These slides are based on the slides of Tim. G. Griffin (AT&T) and Shivkumar (RPI)
1 Copyright  1999, Cisco Systems, Inc. Module10.ppt10/7/1999 8:27 AM BGP — Border Gateway Protocol Routing Protocol used between AS’s Currently Version.
© J. Liebeherr, All rights reserved 1 Border Gateway Protocol This lecture is largely based on a BGP tutorial by T. Griffin from AT&T Research.
Does BGP Solve the Shortest Paths Problem? Timothy G. Griffin Joint work with Bruce Shepherd and Gordon Wilfong Bell Laboratories, Lucent Technologies.
Fundamentals of Computer Networks ECE 478/578 Lecture #18: Policy-Based Routing Instructor: Loukas Lazos Dept of Electrical and Computer Engineering University.
1 Interdomain Routing Protocols. 2 Autonomous Systems An autonomous system (AS) is a region of the Internet that is administered by a single entity and.
Best Practices for ISPs
Towards a Logic for Wide-Area Internet Routing Nick Feamster and Hari Balakrishnan M.I.T. Computer Science and Artificial Intelligence Laboratory Kunal.
1 Route Control Platform – IEEE CCW 2004 Route Control Platform Making an AS look and act like one router Aman Shaikh AT&T Labs - Research IEEE CCW 2004.
Announcement  Slides and reference materials available at  Slides and reference materials available.
BGP Safety with Spurious Updates Martin Suchara in collaboration with: Alex Fabrikant and Jennifer Rexford IEEE INFOCOM April 14, 2011.
1 Tutorial 5 Safe “Peering Backup” Routing With BGP Based on:
S ufficient C onditions to G uarantee P ath V isibility Akeel ur Rehman Faridee
Tutorial 5 Safe Routing With BGP Based on: Internet.
Internet Networking Spring 2004 Tutorial 5 Safe “Peering Backup” Routing With BGP.
Stable Internet Routing Without Global Coordination Jennifer Rexford Princeton University Joint work with Lixin Gao (UMass-Amherst)
Slide -1- February, 2006 Interdomain Routing Gordon Wilfong Distinguished Member of Technical Staff Algorithms Research Department Mathematical and Algorithmic.
1 Route Control Platform – IEEE CCW 2004 Route Control Platform Making an AS look and act like a router Aman Shaikh AT&T Labs - Research IEEE CCW 2004.
Interdomain Routing Establish routes between autonomous systems (ASes). Currently done with the Border Gateway Protocol (BGP). AT&T Qwest Comcast Verizon.
Troubleshooting Network Configuration Nick Feamster CS 6250: Computer Networking Fall 2011.
Inherently Safe Backup Routing with BGP Lixin Gao (U. Mass Amherst) Timothy Griffin (AT&T Research) Jennifer Rexford (AT&T Research)
1 Design and implementation of a Routing Control Platform Matthew Caesar, Donald Caldwell, Nick Feamster, Jennifer Rexford, Aman Shaikh, Jacobus van der.
Economic Incentives in Internet Routing Jennifer Rexford Princeton University
Stable Internet Routing Without Global Coordination Jennifer Rexford AT&T Labs--Research
1 Interdomain Routing Policy Reading: Sections plus optional reading COS 461: Computer Networks Spring 2008 (MW 1:30-2:50 in COS 105) Jennifer Rexford.
Stable Internet Routing Without Global Coordination Jennifer Rexford AT&T Labs--Research
Stable Internet Routing Without Global Coordination Jennifer Rexford AT&T Labs--Research Joint work with Lixin Gao.
Interdomain Routing and the Border Gateway Protocol (BGP) Reading: Section COS 461: Computer Networks Spring 2011 Mike Freedman
Interdomain Routing (Nick Feamster) February 4, 2008.
© 2009 Cisco Systems, Inc. All rights reserved. ROUTE v1.0—6-1 Connecting an Enterprise Network to an ISP Network BGP Attributes and Path Selection Process.
CS 3700 Networks and Distributed Systems Inter Domain Routing (It’s all about the Money) Revised 8/20/15.
Jennifer Rexford Fall 2014 (TTh 3:00-4:20 in CS 105) COS 561: Advanced Computer Networks BGP.
Chapter 9. Implementing Scalability Features in Your Internetwork.
Towards an Internet that “Never Fails” Hari Balakrishnan MIT Joint work with Nick Feamster, Scott Shenker, Mythili Vutukuru.
Border Gateway Protocol (BGP) W.lilakiatsakun. BGP Basics (1) BGP is the protocol which is used to make core routing decisions on the Internet It involves.
Stable Internet Routing Without Global Coordination Jennifer Rexford Princeton University Joint work with Lixin Gao,
Evolving Toward a Self-Managing Network Jennifer Rexford Princeton University
Evolving Toward a Self-Managing Network Jennifer Rexford Princeton University
1 Agenda for Today’s Lecture The rationale for BGP’s design –What is interdomain routing and why do we need it? –Why does BGP look the way it does? How.
© 2005 Cisco Systems, Inc. All rights reserved. BGP v3.2—5-1 Customer-to-Provider Connectivity with BGP Connecting a Multihomed Customer to a Single Service.
Bringing External Connectivity and Experimenters to GENI Nick Feamster Georgia Tech.
Michael Schapira, Princeton University Fall 2010 (TTh 1:30-2:50 in COS 302) COS 561: Advanced Computer Networks
CSci5221: BGP Policies1 Inter-Domain Routing: BGP, Routing Policies, etc. BGP Path Selection and Policy Routing Stable Path Problem and Policy Conflicts.
Doing Don’ts: Modifying BGP Attributes within an Autonomous System Luca Cittadini, Stefano Vissicchio, Giuseppe Di Battista Università degli Studi RomaTre.
1 Internet Routing 11/11/2009. Admin. r Assignment 3 2.
CS 3700 Networks and Distributed Systems
CS 3700 Networks and Distributed Systems
New Directions in Routing
Border Gateway Protocol
Interdomain Routing (Nick Feamster).
COS 561: Advanced Computer Networks
BGP supplement Abhigyan Sharma.
Interdomain Traffic Engineering with BGP
Can Economic Incentives Make the ‘Net Work?
Inter-Domain Routing: BGP, Routing Policies, etc.
COS 561: Advanced Computer Networks
COS 561: Advanced Computer Networks
COS 561: Advanced Computer Networks
BGP Policies Jennifer Rexford
COS 561: Advanced Computer Networks
COS 461: Computer Networks
Presentation transcript:

Nick Feamster Interdomain Routing Correctness and Stability

2 Is correctness really that important?

3 The Internet is increasingly becoming part of the mission-critical Infrastructure (a public utility!). Big problem: Very poor understanding of how to manage it.

4 Why does routing go wrong? Complex policies –Competing / cooperating networks –Each with only limited visibility Large scale –Tens of thousands networks –…each with hundreds of routers –…each routing to hundreds of thousands of IP prefixes

5 What can go wrong? Two-thirds of the problems are caused by configuration of the routing protocol Some things are out of the hands of networking research But…

6 Review: Simple operation… Route Advertisement Autonomous Systems (ASes) Session DestinationNext-hopAS Path / MIT

7 …but complex configuration! Which neighboring networks can send traffic Where traffic enters and leaves the network How routers within the network learn routes to external destinations Flexibility for realizing goals in complex business landscape FlexibilityComplexity Traffic Route No Route

8 Configuration Semantics Ranking: route selection Dissemination: internal route advertisement Filtering: route advertisement Customer Competitor Primary Backup

9 What types of problems does configuration cause? Persistent oscillation (today’s reading) Forwarding loops Partitions “Blackholes” Route instability …

10 These problems are real “…a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint.” -- news.com, April 25, 1997 UUNet Florida Internet Barn Sprint

11 These problems are real “…a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint.” -- news.com, April 25, 1997 “Microsoft's websites were offline for up to 23 hours...because of a [router] misconfiguration…it took nearly a day to determine what was wrong and undo the changes.” -- wired.com, January 25, 2001 “WorldCom Inc…suffered a widespread outage on its Internet backbone that affected roughly 20 percent of its U.S. customer base. The network problems…affected millions of computer users worldwide. A spokeswoman attributed the outage to "a route table issue." -- cnn.com, October 3, 2002 "A number of Covad customers went out from 5pm today due to, supposedly, a DDOS (distributed denial of service attack) on a key Level3 data center, which later was described as a route leak (misconfiguration).” -- dslreports.com, February 23, 2004

12 Several “Big” Problems a Week

13 Why is routing hard to get right? Defining correctness is hard Interactions cause unintended consequences –Each network independently configured –Unintended policy interactions Operators make mistakes –Configuration is difficult –Complex policies, distributed configuration

14 Correctness Specification Safety The protocol converges to a stable path assignment for every possible initial state and message ordering The protocol does not oscillate

15 Safety: No Persistent Oscillation Varadhan, Govindan, & Estrin, “Persistent Route Oscillations in Interdomain Routing”, 1996

16 Strawman: Global Policy Check Require each AS to publish its policies Detect and resolve conflicts Problems: ASes typically unwilling to reveal policies Checking for convergence is NP-complete Failures may still cause oscillations

17 Think Globally, Act Locally Key features of a good solution –Safety: guaranteed convergence –Expressiveness: allow diverse policies for each AS –Autonomy: do not require revelation/coordination –Backwards-compatibility: no changes to BGP Local restrictions on configuration semantics –Ranking –Filtering

18 Main Idea of Today’s Paper Permit only two business arrangements –Customer-provider –Peering Constrain both filtering and ranking based on these arrangements to guarantee safety Surprising result: these arrangements correspond to today’s (common) behavior Gao & Rexford, “Stable Internet Routing without Global Coordination”, IEEE/ACM ToN, 2001

19 Relationship #1: Customer-Provider Filtering –Routes from customer: to everyone –Routes from provider: only to customers providers customer From the customer To other destinations advertisements traffic From other destinations To the customer customer providers

20 Relationship #2: Peering Filtering –Routes from peer: only to customers –No routes from other peers or providers advertisements traffic customer peer

21 Rankings Routes from customers over routes from peers Routes from peers over routes from providers provider peer customer

22 Additional Assumption: Hierarchy Disallowed!

23 Safety: Proof Sketch System state: the current route at each AS Activation sequence: revisit some router’s selection based on those of neighboring ASes

24 Activation Sequence: Intuition Activation: emulates a message ordering –Activated router has received and processed all messages corresponding to the system state “Fair” activation: all routers receive and process outstanding messages

25 Safety: Proof Sketch State: the current route at each AS Activation sequence: revisit some router’s selection based on those of neighboring ASes Goal: find an activation sequence that leads to a stable state Safety: satisfied if that activation sequence is contained within any “fair” activation sequence

26 Proof, Step 1: Customer Routes Activate ASes from customer to provider –AS picks a customer route if one exists –Decision of one AS cannot cause an earlier AS to change its mind An AS picks a customer route when one exists

27 Proof, Step 2: Peer & Provider Routes Activate remaining ASes from provider to customer –Decision of one Step-2 AS cannot cause an earlier Step- 2 AS to change its mind –Decision of Step-2 AS cannot affect a Step-1 AS AS picks a peer or provider route when no customer route is available

28 Ranking and Filtering Interactions Allowing more flexibility in ranking –Allow same preference for peer and customer routes –Never choose a peer route over a shorter customer route … at the expense of stricter AS graph assumptions –Hierarchical provider-customer relationship (as before) –No private peering with (direct or indirect) providers Peering

29 Some problems Requires acyclic hierarchy (global condition) Cannot express many business relationships Abovenet Verio PSINet Sprint Customer Question: Can we relax the constraints on filtering? What happens to rankings?

30 Other Possible Local Rankings Accept only next-hop rankings –Captures most routing policies –Generalizes customer/peer/provider –Problem: system not safe Accept only shortest hop count rankings –Guarantees safety under filtering –Problem: not expressive *,2*,0* 2*,1*,0*1*, 3*, 0* Feamster, Johari, & Balakrishnan, “Implications of Autonomy for the Expressiveness of Policy Routing”, SIGCOMM 2005

31 What Rankings Violate Safety? Theorem. Permitting paths of length n+2 over paths of length n will violate safety under filtering. Theorem. Permitting paths of length n+1 over paths of length n will result in a dispute wheel. Feamster, Johari, & Balakrishnan, “Implications of Autonomy for the Expressiveness of Policy Routing”, SIGCOMM 2005

32 What about properties of resulting paths, after the protocol has converged? We need additional correctness properties.

33 Correctness Specification Safety The protocol converges to a stable path assignment for every possible initial state and message ordering The protocol does not oscillate Path Visibility Every destination with a usable path has a route advertisement Route Validity Every route advertisement corresponds to a usable path Example violation: Network partition Example violation: Routing loop If there exists a path, then there exists a route If there exists a route, then there exists a path

34 Path Visibility: Internal BGP (iBGP) “iBGP” Default: “Full mesh” iBGP. Doesn’t scale. Large ASes use “Route reflection” Route reflector: non-client routes over client sessions; client routes over all sessions Client: don’t re-advertise iBGP routes.

35 iBGP Signaling: Static Check Theorem. Suppose the iBGP reflector-client relationship graph contains no cycles. Then, path visibility is satisfied if, and only if, the set of routers that are not route reflector clients forms a clique. Condition is easy to check with static analysis.

36 How do we guarantee these additional properties in practice?

37 Today: Reactive Operation Problems cause downtime Problems often not immediately apparent What happens if I tweak this policy…? ConfigureObserve Wait for Next Problem Desired Effect? Revert No Yes

38 Goal: Proactive Operation Idea: Analyze configuration before deployment Configure Detect Faults Deploy rcc Many faults can be detected with static analysis. Feamster & Balakrishnan, “Detecting BGP Configuration Faults with Static Analysis”, NSDI 2005

39 “rcc” rcc Overview Normalized Representation Correctness Specification Constraints Faults Analyzing complex, distributed configuration Defining a correctness specification Mapping specification to constraints Challenges Distributed router configurations (Single AS) Feamster & Balakrishnan, “Detecting BGP Configuration Faults with Static Analysis”, NSDI 2005

40 rcc Implementation PreprocessorParser Verifier Distributed router configurations Relational Database (mySQL) Constraints Faults (Cisco, Avici, Juniper, Procket, etc.) Feamster & Balakrishnan, “Detecting BGP Configuration Faults with Static Analysis”, NSDI 2005

41 Summary: Faults across 17 ASes Route ValidityPath Visibility Every AS had faults, regardless of network size Most faults can be attributed to distributed configuration

42 rcc: Take-home lessons Static configuration analysis uncovers many errors Major causes of error: –Distributed configuration –Intra-AS dissemination is too complex –Mechanistic expression of policy About 100 downloads (70 network operators)

43 Two Philosophies This lecture: Accept the Internet as is. Devise “band-aids”. Another direction: Redesign Internet routing to guarantee safety, route validity, and path visibility

44 Preventing Errors in the First Place iBGP RCP After: RCP gets “best” iBGP routes (and IGP topology) iBGP eBGP Before: conventional iBGP Feamster et al., “The Case for Separating Routing from Routers”, SIGCOMM FDNA, 2004 Caesar et al., “Design and Implementation of a Routing Control Platform”, NSDI, 2005

45

46 Ranking Configuration Syntax (Example) Dissemination router bgp 7018 neighbor remote-as neighbor route-map IMPORT in neighbor remote-as 7018 neighbor route-reflector-client ! route-map IMPORT permit 1 match ip address 199 set local-preference 80 ! route-map IMPORT permit 2 match as-path 99 set local-preference 110 ! route-map IMPORT permit 3 set community 7018:1000 ! ip as-path access-list 99 permit ^65000$ access-list 199 permit ip host host access-list 199 permit ip host host

47 Why is Routing Hard to Get Right? Defining correctness is hard Operators make mistakes –Configuration is difficult –Complex policies, distributed configuration Interactions cause unintended consequences –Each network independently configured –Unintended policy interactions

48 Which faults does rcc detect? Faults found by rcc Latent faults Potentially active faults End-to-end failures

49 Normalizing Router Configuration Hundreds of routers distributed across an AS Thousands to tens of thousands of lines per router Many ways to express identical policy Challenge: Solution: Express configuration with centralized tables Check constraints by issuing queries on tables Sessions Filters Rankings

50 Route Validity: Consistent Export Malice/deception iBGP signaling partition Inconsistent export policy neighbor route-map PEER permit 10 set prepend 123 neighbor route-map PEER permit 10 set prepend Possible Causes Neighbor AS Export ExportClause Prepend Policy normalization makes comparison easy.

51 Inconsistent Export Observed at AT&T Feamster et al., “BorderGuard: Detecting Cold Potatoes from Peers”. ACM IMC, October % of destinations inconsistent for >4 days Percentage of destinations with inconsistent routes Percentage of time

52 Example: “Bogon” Routes Feamster et al., “An Empirical Study of ‘Bogon’ Route Advertisements”. ACM CCR, January 2005.

53 rcc Interface

54 Parsing Configuration

55 List of Faults