Beyond BGP Dan Massey Colorado State University. 24 October

Presentation transcript:

Beyond BGP Dan Massey Colorado State University

Internet Routing
• Challenges Facing Internet Routing
  ▪ Internet Has Grown Dramatically
    – Large number of routing entries
    – High volumes of updates
    – Frequent topological changes
  ▪ Fault-Model Has Changed Dramatically
    – More malfunctioning components
    – Intentional attacks
• Do we need a fundamentally new routing architecture?

Toward a New Architecture
• One claim: BGP is nearing the end of its useful lifetime
  ▪ The Internet will soon collapse unless we act!!
• Other claim: BGP is the best engineering solution we are likely to produce
  ▪ We need incremental patches to new problems
• Who is right?
  ▪ Beyond BGP uses
    – Measurements to assess where we are
    – Identification of (new?) routing requirements
    – Development of changes (incremental or new system) to address the above

How Did We Get To BGP
• Simple Distance Vector Routing Algorithms
  ▪ Used in early Internet routing designs
  ▪ Convey only limited information
  ▪ Prone to long-lasting loops
• Expensive Link State Routing Algorithms
  ▪ Learn the Full Network Topology
  ▪ Signal every change in every link
• Path Vector Routing (BGP)
  ▪ Middle ground that signals some path data
  ▪ But does not signal the full topology

RIP and DBF
• RIP and DBF: Exchange distance info.
• RIP: Keep shortest path only
  ▪ B’s route to D: Nexthop=A, Dist=4
• Distributed Bellman-Ford (DBF): Keep distance info from all neighbors
  ▪ B’s route to D: Nexthop=A, Dist=4; Alternate Nexthop=C, Dist=4
• 30 sec refreshing interval
• Damping timer to space out two triggered updates: 1~5 seconds
• Poison reverse: B sends infinity distance to A (D: infinity)
[Figure: topology with nodes A, B, C, D, E, F; advertised distances to D labeled D:1, D:3, D:2, D:3]
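The difference between the two bookkeeping styles can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name, the single-destination table, the unit link cost, and the use of 16 as "infinity" (RIP's conventional unreachable metric) are choices made here, not taken from the slides.

```python
INFINITY = 16  # assumption: RIP's conventional "unreachable" metric


class DistanceVectorNode:
    """Toy node that tracks routes to one destination (hypothetical helper)."""

    def __init__(self, name, keep_alternates):
        self.name = name
        self.keep_alternates = keep_alternates  # True: DBF-like, False: RIP-like
        self.heard = {}  # neighbor -> distance via that neighbor

    def best_route(self):
        """Return (nexthop, distance) or None if nothing usable is known."""
        usable = {n: d for n, d in self.heard.items() if d < INFINITY}
        if not usable:
            return None
        nexthop = min(usable, key=lambda n: (usable[n], n))
        return nexthop, usable[nexthop]

    def receive(self, neighbor, advertised_distance, link_cost=1):
        distance = min(advertised_distance + link_cost, INFINITY)
        if self.keep_alternates:
            # DBF-like: remember what every neighbor said.
            self.heard[neighbor] = distance
            return
        # RIP-like: remember only the current best route.
        best = self.best_route()
        if best is None or neighbor == best[0] or distance < best[1]:
            self.heard = {neighbor: distance}

    def neighbor_failed(self, neighbor):
        self.heard.pop(neighbor, None)

    def advertise_to(self, neighbor):
        """Distance announced to a neighbor, with poison reverse."""
        best = self.best_route()
        if best is None or best[0] == neighbor:
            return INFINITY  # poison reverse: never offer a route back to its next hop
        return best[1]


# B hears D:3 from both A and C, as in the slide's example values.
rip_b = DistanceVectorNode("B", keep_alternates=False)
dbf_b = DistanceVectorNode("B", keep_alternates=True)
for node in (rip_b, dbf_b):
    node.receive("A", 3)
    node.receive("C", 3)
```

After the link to A fails, the DBF-like node still has the alternate via C, while the RIP-like node has to wait for a fresh advertisement.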

BGP Background
• Internet: composed of thousands of Autonomous Systems (ASes).
• BGP (Border Gateway Protocol): the de facto inter-AS routing protocol
[Figure: AS A, AS B, AS C, and AS E connected through BGP routers R1 to R6]

How BGP works
• Uses path vector protocol
  – similar to distance vector protocol.
• Consider an AS as a node
• Route includes entire path (sequence of nodes)
• What if no path available?
[Figure: topology with nodes A, B, C, D, E; B’s route to D, with labels "Route via A =" and "Route via C ="]
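Because each route carries the entire path, a node can discard any path that already contains itself. The sketch below illustrates that idea only; the function names, the shortest-path tie-break, and the example paths are assumptions made for illustration and are not the paths shown in the slide's figure.

```python
def accept_path(own_as, as_path):
    """Loop check: a path that already contains this AS is rejected."""
    return own_as not in as_path


def select_best(own_as, candidates):
    """Pick the shortest loop-free path.

    candidates maps neighbor -> path advertised by that neighbor
    (the neighbor first, the destination last).
    Returns the full path from own_as, or None if no path is available.
    """
    valid = {n: p for n, p in candidates.items() if accept_path(own_as, p)}
    if not valid:
        return None
    neighbor = min(valid, key=lambda n: (len(valid[n]), n))
    return [own_as] + list(valid[neighbor])


# Hypothetical advertisements received by B for destination D.
offers = {"A": ["A", "D"], "C": ["C", "E", "D"]}
best = select_best("B", offers)
```

Real BGP ranks routes by policy before path length, so the tie-break here is a simplification.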

Path Vector Routing Changes
• Worms triggered edge instability
  ▪ Routers crashed due to ARP cache overflow.
  ▪ Links were congested by worm traffic.
• BGP Path Exploration Exacerbates Dynamics
  ▪ Obsolete backup path is used and convergence is delayed
[Figure: topology with nodes A, B, C, D, E and a withdraw message; B’s route to D, with labels "Route via A=" and "Route via C="]

Policies and Policy Withdrawal
• But A could stop advertising to B due to a policy change, path is still valid!
• Attach a Failure Withdrawal Community Attribute
• Only apply the approach to failure withdrawal
[Figure: topology with nodes A, B, C, D, E and a policy withdraw; B’s route to D, with labels "Route via A=" and "Route via C="]

BGP Traffic Engineering
• BGP Traffic Engineering:
  ▪ R4 chooses path
  ▪ R5 chooses path
• We assumed an AS could be modeled as a node with a single best path to the destination
• But a single AS may advertise more than one path.
• Divide one AS into Logical ASes such that all routers within a logical AS have the same best path --> each logical AS can be modeled as a node.

Number of Updates
• Substantial reduction is achieved. E.g to 1419 in the 60-AS topology
• MinRouteAdver timer: within 30 seconds, only one advertisement is allowed. It “packs” consecutive changes into one update.
[Figure: number of updates vs. number of ASes in network; curves: Original BGP, Enhanced BGP]
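The "packing" effect of the MinRouteAdver timer can be sketched as a small rate limiter. This is a simplified illustration, not router code: the class and method names are hypothetical, time is passed in explicitly as seconds, and a single peer and prefix are assumed.

```python
class MraiLimiter:
    """Allow one advertisement per interval; hold only the latest change."""

    def __init__(self, interval=30.0):
        self.interval = interval
        self.last_sent = None  # time of the last advertisement
        self.pending = None    # most recent change being held back

    def route_changed(self, now, route):
        """Return the route to advertise now, or None if it must wait."""
        if self.last_sent is None or now - self.last_sent >= self.interval:
            self.last_sent = now
            self.pending = None
            return route
        self.pending = route  # overwrites any earlier held change
        return None

    def timer_expired(self, now):
        """Called periodically; releases the held change once the interval is over."""
        if self.pending is not None and now - self.last_sent >= self.interval:
            route, self.pending = self.pending, None
            self.last_sent = now
            return route
        return None


limiter = MraiLimiter(interval=30.0)
sent = [
    limiter.route_changed(0, "path-1"),   # sent immediately
    limiter.route_changed(5, "path-2"),   # held
    limiter.route_changed(12, "path-3"),  # held, replaces path-2
    limiter.timer_expired(30),            # releases path-3 only
]
```

Three route changes produce two advertisements, which is the reduction in update count the slide describes; the cost is that the last change waits for the timer.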

Convergence time
• Enhanced BGP reduces the convergence time substantially. E.g seconds to 19.5 seconds in the 60-AS topology
• Elimination of one advertisement can cut convergence time by 30 seconds
[Figure: convergence time (seconds) vs. number of ASes in network; curves: Original BGP, Enhanced BGP]

Improving Path Vector Convergence
• Infocom 02 [4] uses consistency to detect invalid paths.
  ▪ Reject path if r1 is a direct neighbor r1’s path is not
  ▪ Adjusted to account for policy and implemented in BGP
• Infocom 03 [Afek, et al] quickly flushes invalid paths.
  ▪ BGP requires updates be separated by a min interval
  ▪ Send withdraw (to flush route) if blocked by the interval
• Our recent work [5] attaches a new attribute: Root Cause Notification (RCN)
  ▪ Identifies the failed link and includes a sequence number.
  ▪ Allows any route relying on the failed link to be rejected.
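The RCN idea can be sketched as a table of link states keyed by sequence number. This is a loose illustration of the two properties listed above, not the attribute format from [5]: the class name, the per-link sequence numbers, and the example topology are assumptions.

```python
class RootCauseTable:
    """Remember the freshest known state of each link (hypothetical helper)."""

    def __init__(self):
        self.status = {}  # frozenset({a, b}) -> (sequence number, link_is_up)

    def notify(self, link, seq, link_is_up):
        """Record a notification; return False if it is stale."""
        key = frozenset(link)
        known = self.status.get(key)
        if known is None or seq > known[0]:
            self.status[key] = (seq, link_is_up)
            return True
        return False

    def path_is_usable(self, as_path):
        """Reject any path that relies on a link known to be down."""
        for hop in zip(as_path, as_path[1:]):
            state = self.status.get(frozenset(hop))
            if state is not None and not state[1]:
                return False
        return True


table = RootCauseTable()
table.notify(("A", "D"), seq=1, link_is_up=False)  # link A-D failed

candidates = [["B", "A", "D"], ["B", "C", "E", "D"]]
usable = [p for p in candidates if table.path_is_usable(p)]
```

Only the path through C and E survives, so the obsolete backup through A is never explored. The sequence number keeps a late-arriving old notification from overriding newer state.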

Analyzing Path Vector Convergence
• Route fail-over has two stages.
• First, nodes inside the blue triangle lose routes and explore backup paths.
  ▪ All short invalid paths are explored
• Second, an edge (a0) eventually selects the valid backup path via Sk.
  ▪ Valid routes begin to propagate through the blue triangle.

Generic Convergence Results
Algorithm     Fail-Over Convergence Bounds
SPVP (BGP)    (N-1) (M + ld) + 3 Pmax (|E| - degree(G,0))
SPVP-AS       (N - degree(G,0)) (M + ld) + 3 Pmax (|E| - |E^| + Degree(G^))
SPVP-GF       (N-1) ld + 3 Pmax (|E| - degree(G,0))
SPVP-RCN      Distance(G,0) (ld) + (Pmax) Distance(G,0)
Pmax = Node Processing Delay, ld = Link Delay, M = Minimum Advertisement Interval
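To get a feel for how the terms compare, three of the bounds can be evaluated directly. The sketch below reads the SPVP, SPVP-GF, and SPVP-RCN rows literally; the function names and all input values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def bound_spvp(n, m, ld, pmax, edges, degree0):
    """(N-1)(M + ld) + 3 Pmax (|E| - degree(G,0))"""
    return (n - 1) * (m + ld) + 3 * pmax * (edges - degree0)


def bound_spvp_gf(n, ld, pmax, edges, degree0):
    """(N-1) ld + 3 Pmax (|E| - degree(G,0))"""
    return (n - 1) * ld + 3 * pmax * (edges - degree0)


def bound_spvp_rcn(distance, ld, pmax):
    """Distance(G,0) ld + Pmax Distance(G,0)"""
    return distance * ld + pmax * distance


# Hypothetical inputs: 5 nodes, 8 edges, destination of degree 2,
# M = 30 s, and (unrealistically large) 1 s link and processing delays.
params = dict(n=5, ld=1, pmax=1, edges=8, degree0=2)
spvp = bound_spvp(m=30, **params)   # 4 * 31 + 3 * 6
gf = bound_spvp_gf(**params)        # 4 * 1 + 3 * 6
rcn = bound_spvp_rcn(distance=3, ld=1, pmax=1)
```

With these numbers the (N-1)·M term dominates the plain SPVP bound, which is why removing the dependence on the minimum advertisement interval matters so much.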

Simulation Results

What About Security?
• Convergence Discussion Neglects Security
  ▪ What if routers send intentionally bad information?
• What is the Simplest Possible Attack?
  ▪ Announce someone else’s routes
• Example: Suppose Univ. of Colorado announces it is the origin for /16
  ▪ In other words, CU announces CSU IP Address Space
• Can this Happen and/or What Would Prevent It?

Multiple Origin AS (MOAS) Cases
• Prefixes originate from Multiple Origin AS (MOAS)
  ▪ Lower curve likely due to valid operational needs
• Spikes are errors that disrupt routing to prefix
  ▪ Includes loss of routes to top level DNS servers
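Counting MOAS cases amounts to grouping announcements by prefix and looking at the last AS on each path. A minimal sketch follows, assuming announcements are available as (prefix, AS path) pairs; the function name and the sample data (documentation prefixes and reserved AS numbers) are made up for illustration.

```python
from collections import defaultdict


def find_moas(announcements):
    """Return {prefix: set of origin ASes} for prefixes with more than one origin.

    Each announcement is (prefix, as_path); the origin AS is the last
    element of the path.
    """
    origins = defaultdict(set)
    for prefix, as_path in announcements:
        if as_path:
            origins[prefix].add(as_path[-1])
    return {p: o for p, o in origins.items() if len(o) > 1}


# Hypothetical table snapshot.
snapshot = [
    ("192.0.2.0/24", [64500, 64501]),
    ("192.0.2.0/24", [64502, 64503]),     # second origin: a MOAS case
    ("198.51.100.0/24", [64500, 64504]),
    ("198.51.100.0/24", [64502, 64504]),  # same origin via another path
]
moas = find_moas(snapshot)
```

A check like this only flags the conflict; it cannot tell a valid multi-homed prefix from a false origin.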

Infrastructure Faults and Attacks
• BGP and DNS Provide No Authentication
  ▪ Faults and attacks can mis-direct traffic.
  ▪ One (of many) examples observed from BGP logs.
  ▪ Server could have replied with false DNS data.
• ISPs announced new path for 20 minutes to 3 hours
[Figure: Internet, c.gtld-servers.net, and a BGP monitor; label "originates route to /24"]

BGP-based Solution Example
Example configuration:
router bgp 59
 neighbor remote-as 52
 neighbor send-community
 neighbor route-map setcommunity out
route-map setcommunity
 match ip address /8
 set community 59:MOAS 58:MOAS additive
[Figure: AS58, AS52, and a further AS announcing 18/8; announcement labels "18/8, PATH, MOAS{4,58,59}", "18/8, PATH, MOAS{58,59}", "18/8, PATH, MOAS{52, 58}"]
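On the receiving side, the check is a comparison of the MOAS lists attached to announcements of the same prefix. The sketch below is one possible reading of that check, not the deployed logic: the function name and the two conflict rules (lists must agree, and each origin must appear in its own list) are assumptions.

```python
def moas_conflict(announcements):
    """Return True if announcements for one prefix carry inconsistent MOAS lists.

    Each announcement is (origin_as, moas_list), where moas_list is the set
    of ASes that the announcement claims may originate the prefix.
    """
    lists = {frozenset(moas) for _, moas in announcements}
    if len(lists) > 1:
        return True  # origins disagree about who may announce the prefix
    # An origin that is missing from the agreed list is also suspicious.
    return any(origin not in moas for origin, moas in announcements)


# AS58 and AS59 agree; a third announcement carries a different list,
# as in the figure's MOAS{52, 58} label.
valid = [(58, {58, 59}), (59, {58, 59})]
with_false_origin = valid + [(52, {52, 58})]
```

An attacker can forge its own list, but it cannot stop the valid origins' lists from reaching other routers, so the disagreement stays visible wherever both announcements are seen.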

BGP false origin detection
Simulation Results
[Figure panels: (a) One Origin AS; (b) Two Origin AS’s]

A Simple Filter
• Current BGP provides dynamic routes
  ▪ Explore the opposite extreme...
• Select a single static route to each server.
  ▪ Apply AS path filters to block all other announcements.
    – Also filter against more specifics.
• Route changes on a frequency of months, if at all.
  ▪ Change in IP address, origin AS, or transit policy.
  ▪ Adjust route only after off-line verification
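The static filter can be sketched as a predicate over (prefix, AS path) pairs. This is an illustration only: the function names are hypothetical, the protected prefix and AS numbers are documentation values rather than real DNS server routes, and a real deployment would express this as router filter configuration.

```python
import ipaddress


def make_static_filter(trusted):
    """Build an accept(prefix, as_path) predicate.

    trusted maps each protected prefix to the single AS path that is allowed.
    """
    table = {ipaddress.ip_network(p): tuple(path) for p, path in trusted.items()}

    def accept(prefix, as_path):
        net = ipaddress.ip_network(prefix)
        for protected, path in table.items():
            if net == protected:
                return tuple(as_path) == path  # only the trusted path passes
            if net.version == protected.version and net.subnet_of(protected):
                return False  # block more specifics of a protected prefix
        return True  # prefixes that are not protected are left alone

    return accept


accept = make_static_filter({"192.0.2.0/24": (64500, 64501)})
```

For example, `accept("192.0.2.0/24", [64500, 64501])` passes, while the same prefix over any other path, or a /25 carved out of it, is dropped.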

Why This Works: Theory
• Scale is limited to a small number of routes.
  ▪ No exponential growth in top level DNS servers.
• Loss of a server is tolerable, invalid server is not.
  ▪ Resolvers detect and time-out unreachable servers.
    – Provided surviving servers handle load, cost is some delay.
• Expect predictable properties and stable routes.
  ▪ Servers don’t change without non-trivial effort.
  ▪ Servers located in highly available locations.

Why This Works: Data
• Analysis based on BGP updates from RIPE.
  ▪ Archive of BGP updates sent by each peer.
  ▪ 9 ISPs from US, Europe, and Japan.
  ▪ February April 2002
• Some data collection notes
  ▪ Used only peers that exchange full routing tables
    – Otherwise some route changes are hidden by policies
  ▪ Adjusted data to discount multi-hop effect.
    – Multi-hop peering session resets don’t reflect ISP ops.

Impact on Reachability
[Figure: ISP1 (US/Tier 1)]

How Static Are The Routes?
• 3 changes in route to “A” over 14 months.
• 2 (valid) changes in the origin AS
  ▪ 5/19/01 origin AS changed from 6245 to
  ▪ 6/4/01 origin AS changed from to
• 1 change in transit AS routing policy
  ▪ 11/8/01 (*,10913, 10913, 10913,*) -> (*,10913, *)
  ▪ Could have built filter to allow this...

What Routes Are Lost?
• Results from 3/1/01 until 5/19/01 AS change.
  ▪ Reduced reachability to “A” from % to %
• 18 events when trusted route was withdrawn
  ▪ 2 resulted in no route available (28 secs, 103 secs)
  ▪ 8 instances of a back-up route lasting over 3 minutes
  ▪ Longest lasting back-up advertised for 15 minutes
• Similar results for other time periods and servers.

Example of Filtered Routes
• With filter no route at 16:06: * server
• No route at 16:08:30

Worst Case In Study
• ISP 3 used one main route and a small number of consistent back-up routes.
[Figure: ISP 3 (Europe)]

Toward a More Balanced Approach
• Required infrequent updates to the filter.
  ▪ Especially useful to automate infrequent tasks.
    – Natural tendency to forget task or forget how to do task
• More paths improve robustness
  ▪ Simple filter allowed only 1 path.
  ▪ ISP3’s reachability can be improved if filter allows two routes…
• Strike a balance between allowing dynamic changes and restricting to trusted paths.

BGP Adaptive Filters
• Slow down the route dynamics and add validation.
  ▪ Apply hysteresis before accepting new paths
  ▪ Add options for validating new paths:
    – Believe route based purely on hysteresis
    – Probabilistic query/response testing against known data.
    – Trigger off-line checking (did origin AS really change?)
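The hysteresis option can be sketched as a filter that keeps believing the trusted path until a new path has been announced continuously for some hold time. The class name, the one-hour hold time, and the example AS paths are assumptions for illustration; the slides do not give concrete values.

```python
class HysteresisFilter:
    """Accept a new path only after it has persisted for hold_time seconds."""

    def __init__(self, trusted_path, hold_time):
        self.trusted = trusted_path
        self.hold_time = hold_time
        self.candidate = None        # new path currently under observation
        self.candidate_since = None  # when the candidate was first seen

    def observe(self, now, path):
        """Feed the currently announced path; return the path the filter believes."""
        if path == self.trusted:
            self.candidate = None  # the change did not last: reset the clock
            return self.trusted
        if path != self.candidate:
            self.candidate, self.candidate_since = path, now
        elif now - self.candidate_since >= self.hold_time:
            self.trusted, self.candidate = path, None  # believed purely on hysteresis
        return self.trusted


old, new = (64500, 64501), (64502, 64501)
flt = HysteresisFilter(trusted_path=old, hold_time=3600)
history = [
    flt.observe(0, new),     # new path appears: still believe the old one
    flt.observe(1800, new),  # not long enough yet
    flt.observe(2000, old),  # old path is back: candidate discarded
    flt.observe(3000, new),  # clock restarts
    flt.observe(6600, new),  # persisted for the full hold time: accepted
]
```

Short-lived false announcements, like the 20-minute to 3-hour events seen in the BGP logs, would need to outlast the hold time to be believed, so the choice of hold time trades attack resistance against how quickly valid changes are picked up.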

Impacts on Reachability
[Figure: ISP1; panels: Root servers, gTLD servers]

Impacts on Reachability
[Figure: ISP3; panels: Root servers, gTLD servers]

Convergence And Authentication
• BGP Suffers From Both Convergence Problems and Authentication Problems
  ▪ Convergence fixes are good, if no attacks.
  ▪ Authentication fixes work for redundant sites
• Can you improve both convergence and authentication in a realistic environment?
  ▪ Do you need to replace BGP?
    – If yes, with what?
  ▪ Would you pick BGP for your new network?
    – If no, what would you do instead?
• Wide Variety of Other Routing Challenges
  ▪ Check out CS 580 and BBGP Project if interested

BGP Measurement and Artifacts
• BGP peers establish TCP session and send full route table (120K+ routes)
  ▪ Updates sent only if routes change.
• Our results show frequent session resets between ISP routers and the monitoring point.
  ▪ Monitoring point sessions cross multiple systems in the Internet.
  ▪ Each reset adds 120K updates.
  ▪ But very few ISP-ISP session resets.
• Our work in [1] presents rules to remove session reset artifacts.
[Figure: timeline: Initial Table (120K+ routes), Route Changes, Initial Table (120K+ routes)]

BGP Updates During Slammer Worm

BGP Updates During Nimda Worm
[Figure: labels: Measurement Artifacts, Routing Changes, Total, Attack]

What Our Analysis Shows
• A substantial percentage of the BGP messages during the worm attack were not about route changes
[Figure: chart with values 40.2%, 37.6%, 8.8%, 8.3%]

FRTR: Improving Peer Communication
• BGP Updates Are Not (Topology) Event Driven
  ▪ Session resets trigger high volume surges
    – Govindan shows cascade failures can result.
• Lifetime of Invalid Routes is Unbounded
  ▪ Never recover (until reset) if update is somehow lost.
    – Despite TCP, we found cases of “lost” withdrawals.
  ▪ Attacker can poison a route with one update.
• Soft-state (periodic re-announce) is too costly…
• FRTR Uses Periodic Bloom Filter Digests
  ▪ Digests quickly confirm state after session reset.
  ▪ Periodic digests bound lifetime of faults (w/ high prob).
  ▪ Co-Author Keyur Patel (Cisco) is exploring Cisco development.

FRTR Performance
• For each route at receiver, check against the digest.
  ▪ Bloom filter results in no false negatives.
• Compare total digests for missing route detection.
  ▪ False positive possible with known rate.
  ▪ Add salts to reduce the chance of repeated false positives.
• Overhead is a function of digest size and frequency.
• Work with Cisco suggests a 1.3% overhead increase.
• Complete Details to appear in [2] (DSN 2004)
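The receiver-side check can be sketched with a small salted Bloom filter. This is not the FRTR digest format: the class name, the bit-array size, the number of hash functions, the use of SHA-256, and the route strings are all assumptions made for illustration.

```python
import hashlib


class RouteDigest:
    """Salted Bloom filter over route strings (hypothetical helper)."""

    def __init__(self, num_bits=4096, num_hashes=4, salt=b""):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt
        self.bits = 0  # bit array stored as one integer

    def _positions(self, route):
        for i in range(self.num_hashes):
            data = self.salt + bytes([i]) + route.encode()
            digest = hashlib.sha256(data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, route):
        for pos in self._positions(route):
            self.bits |= 1 << pos

    def might_contain(self, route):
        # True for every added route; occasionally True for others.
        return all((self.bits >> pos) & 1 for pos in self._positions(route))


def stale_routes(receiver_routes, digest):
    """Routes the receiver holds that the sender's digest definitely lacks."""
    return [r for r in receiver_routes if not digest.might_contain(r)]


# The sender summarizes what it currently announces; the salt changes per round.
sender = ["192.0.2.0/24 via 64500 64501", "198.51.100.0/24 via 64500 64502"]
digest = RouteDigest(salt=b"round-1")
for route in sender:
    digest.add(route)

# The receiver still holds a route whose withdrawal was "lost".
receiver = sender + ["203.0.113.0/24 via 64500 64503"]
suspect = stale_routes(receiver, digest)
```

A route that the sender really announces always matches its digest, so any route that fails the check is known to be stale. A stale route can slip through as a false positive, and changing the salt each round makes it unlikely that the same route slips through repeatedly.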

Packet Delivery during Routing Convergence
• Failures do occur in the Internet
  – 20% of intra-ISP links have a MTTF < 1 day [Diot:IMW02]
  – 40% of Inter-ISP routes have a MTT-Change < 1 day [Labovitz:FTCS-29]
• Routing convergence after failure takes time
  – IS-IS (Intra-ISP protocol): 5+ seconds [Diot:IMW02]
  – BGP (Inter-ISP protocol): 3+ minutes [Labovitz:Sigcomm00]
• Packets can be delivered during convergence
[Figure: topology with nodes A, B, C, D, E, F, G]

What Is the Goal of Routing
• How to maximize packet delivery during routing convergence?
  – Topological connectivity’s impact?
  – Studying: RIP, Distributed Bellman-Ford (DBF), BGP
  – Previous work focused on: preventing loops, minimizing convergence time and routing overhead
• This problem becomes more important with
  – Larger Internet topology [Huston01] --> higher freq. of component failures
  – Richer connectivity [Huston01] --> potentially helps with more alternate paths
  – Higher bandwidth --> more packets sent during convergence

Simulation conducted
• 7 by 7 mesh topologies similar to those in [Baran64]
• 20 pkts/second
• Measure packet loss, loops, path convergence time, throughput, and e2e delay.
• Simulated node degree range [3 ~ 16]

Packet Losses (I): Observation
• Packet losses of DBF, BGP’ and BGP decrease to zero at degree 6.
• Richer connectivity helps RIP little.
[Figure: packet loss vs. node degree; curves: RIP; DBF, BGP’ and BGP]

Packet Loss (II): Lessons Learned
• Keeping alternate paths
• Connectivity Matters
  – RIP: no immediate available alternative due to poor connectivity and poison reverse
  – DBF, BGP: alternative is more likely with richer connectivity
[Figure: two copies of a topology with nodes A, B, C, D, E, F]

Is an alternate path valid?
• Valid Alternate Paths: not using the failed link
  ▪ Poison reverse and BGP’s path information are not enough! [Pei:Infocom2002]
• Richer connectivity --> reduces one single link’s impact, better availability of valid (but may be suboptimal) path
[Figure: topology with nodes A, B, C, D, E, F, U, V, W, X; labels "C2", "D:"]

Transient Loops (I): Observation
• BGP has the most loops!
• RIP has no loops
• Richer connectivity reduces the chance of looping.
[Figure: losses due to loops vs. node degree; curves: DBF, BGP’, BGP]

Transient Loops (II): Msg Propagation
• Damping timer slows the msg propagation, causing looping
• Richer connectivity can reduce the chance of looping
• More details in: “A Study of Transient Loops in BGP”
[Figure: topology with nodes A, B, C, D, E, F, U, V, W, X, Y; labels "D:", "D:<BAEF>", "D:<BAEF>", "30 seconds!"]

Instantaneous Throughput
[Figure: throughput (pkts/second) vs. time; curves: RIP, DBF, BGP’, BGP]

Packet Delay During Convergence

Forwarding Path Convergence time
• BGP: no loss at degree 6 or higher
• Shall we still tune MRAI timer to minimize convergence time (with the risk of increasing overhead)?
[Figure: convergence time vs. node degree. Time till there is no routing msg: BGP: 70, BGP’: 10. Time till the forwarding path from S to D stabilizes: BGP: 13, BGP’: 2.]

Packet Delivery After a Failure