A survey of Internet infrastructure reliability Presented by Kundan Singh PhD candidacy exam May 2, 2003.


2 Agenda: Introduction. Routing problems: route oscillations, slow convergence, scaling, configuration. Reliability via DNS, transport, application. Effect on VoIP.

3 Overview of Internet routing. Figure: autonomous systems (AT&T as an international provider, MCI, regional providers, a cable modem provider, a campus); OSPF (optimize path); BGP (policy based).

4 Border gateway protocol: Runs over TCP; messages are OPEN, UPDATE, KEEPALIVE, NOTIFICATION. Hierarchical peering relationship. Export all routes to customers; export only customer and local routes to peers and providers. Path-vector: optimal AS path satisfying policy. Figure: provider, customer, peer and backup links, with AS paths to destination d (1247, 247, 47). [1] A border gateway protocol (BGP-4), RFC 1771
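
The export rule on this slide can be sketched as a small predicate. This is a minimal illustration, not router configuration: the `Rel` enum and the `should_export` function are hypothetical names introduced here, and real policies are usually richer than this two-line rule.

```python
from enum import Enum


class Rel(Enum):
    """Business relationship of the AS a route was learned from or is sent to."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"
    LOCAL = "local"  # route originated by this AS (valid only as an origin)


def should_export(learned_from: Rel, neighbor: Rel) -> bool:
    """Return True if a route learned from `learned_from` may be sent to `neighbor`."""
    if neighbor is Rel.LOCAL:
        raise ValueError("LOCAL describes a route origin, not a neighbor")
    if neighbor is Rel.CUSTOMER:
        # Customers receive all routes.
        return True
    # Peers and providers receive only customer and local routes.
    return learned_from in (Rel.CUSTOMER, Rel.LOCAL)
```

Under this rule a route learned from one provider is never re-exported to another provider or peer, so an AS does not carry transit traffic between its providers and peers.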

5 Route selection, in order: Local AS preference; AS path length; Multi-exit discriminator (MED); Prefer external-BGP over internal-BGP; Use internal routing metrics (e.g., OSPF); Use identifier as last tie breaker. Figure labels: AS1, AS2, AS3, AS4; B1, B2, B3, B4; R1, R2; C1, C2.
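
The selection order above behaves like a lexicographic comparison, which can be sketched with a sort key. The `Route` fields and function names are hypothetical, chosen for illustration. The sketch also simplifies MED handling: real BGP compares MED only between routes from the same neighboring AS, whereas this version compares it across all candidates.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Route:
    local_pref: int            # higher is preferred
    as_path: Tuple[int, ...]   # shorter is preferred
    med: int                   # lower is preferred
    learned_via_ebgp: bool     # external-BGP preferred over internal-BGP
    igp_cost: int              # lower internal metric (e.g., OSPF) is preferred
    router_id: int             # lowest identifier is the last tie breaker


def preference_key(r: Route):
    """Tuple ordered so that the smallest key is the most preferred route."""
    return (
        -r.local_pref,
        len(r.as_path),
        r.med,
        0 if r.learned_via_ebgp else 1,
        r.igp_cost,
        r.router_id,
    )


def best_route(routes: Iterable[Route]) -> Route:
    """Pick the most preferred route among the candidates for one prefix."""
    return min(routes, key=preference_key)
```

Because local preference is compared first, a longer AS path with a higher local preference still wins, which is how policy overrides path length.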

6 Route oscillation: Each AS policy is independent. Persistent vs transient. Does not occur if routing is distance based. Solutions: static graph analysis; policy guidelines; dynamic “flap” damping. Figure: nodes 0, 1, 2. [2] Persistent route oscillations in inter-domain routing

7 Static analysis: Abstract models: Solvable? Resilience on link failure? Multiple solutions? Sometimes solvable? Does not work: NP complete; relies on Internet routing registries. [7] An analysis of BGP convergence properties

8 Policy guidelines: MUST prefer customer over peer/provider. MUST have lowest preference for backup path; “avoidance level” increases as path traverses. Works even on failure and is consistent with current practice. Limits the policy usage. [3] Stable internet routing without global co-ordination [4] Inherently safe backup routing with BGP

9 Convergence in intra-domain. IS-IS, millisecond convergence: detect change (hardware, keep-alive); improved incremental SPF; link “down” immediate, “up” delayed; propagate update before calculating SPF; keep-alive before data packets; detect duplicate updates. OSPF stability: sub-second keep-alive; randomization; multiple failures; loss resilience. Distance vector: count to infinity. [5] Towards milli-second IGP convergence [6] Stability issues in OSPF routing

10 BGP convergence. Figure: nodes 0, 1, 2 and destination R; paths to R known at node 0: (R, 1R, 2R); at node 1: (0R, R, 2R); at node 2: (0R, 1R, R). [7] An analysis of BGP convergence properties [8] Experimental study of delayed internet routing convergence

11 BGP convergence. Figure: direct paths to R withdrawn; paths known at node 0: (-, 1R, 2R); at node 1: (0R, -, 2R); at node 2: (0R, 1R, -). Messages: 0->1: 01R; 0->2: 01R; 1->0: 10R; 1->2: 10R; 2->0: 20R; 2->1: 20R.

12 BGP convergence. Figure: paths known at node 0: (-, 1R, 2R); at node 1: (-, -, 2R); at node 2: (01R, 1R, -). Messages: 1->0: 10R; 1->2: 10R; 1->0: 12R; 1->2: 12R; 2->0: 20R; 2->1: 20R; 2->0: 21R; 2->1: 21R. Figure label: 01R.

13 BGP convergence. Figure: paths known at node 0: (-, -, 2R); at node 1: (-, -, 2R); at node 2: (01R, 10R, -). Messages: 1->0: 12R; 1->2: 12R; 2->0: 20R; 2->1: 20R; 2->0: 21R; 2->1: 21R; 2->0: 201R; 2->1: 201R; 0->1: W; 0->2: W. Figure label: 10R.

14 BGP convergence. Figure: nodes 0, 1, 2 and destination R; state (-, -, -) after 48 steps. MinRouteAdver applied to announcements: in 13 steps. Sender-side loop detection: one step.
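
The two loop checks involved in this example can be sketched as follows. This is an illustrative model of path-vector loop detection, assuming paths are tuples of node labels as in the figure; the function names are hypothetical.

```python
from typing import Sequence


def receiver_accepts(my_node: str, path: Sequence[str]) -> bool:
    """Receiver-side check: discard any path that already contains this node."""
    return my_node not in path


def sender_should_announce(neighbor: str, path: Sequence[str]) -> bool:
    """Sender-side check: do not announce a path that the neighbor would
    reject anyway because the neighbor already appears in it."""
    return neighbor not in path
```

With receiver-side detection alone, node 0 still sends 01R to node 1, which then discards it. With the sender-side check, that message is never sent, which removes invalid paths from the exploration.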

15 BGP convergence [2]: Latency due to path exploration. Fail-over latency = 30 n, where n = longest backup path length. Converges within 3 min; some oscillations last up to 15 min. Loss and delay during convergence. “Up” converges faster than “down”. Verified using experiment. [8] An experimental study of delayed internet routing convergence [9] The impact of internet policy and topology on delayed routing convergence
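
As a worked instance of the fail-over bound, the sketch below evaluates 30 n for a given backup path length. It assumes the constant 30 is in seconds (the usual default of the MinRouteAdver timer); the slide does not state the unit, and the function name is hypothetical.

```python
def failover_latency_seconds(longest_backup_path_len: int,
                             min_route_adver_s: int = 30) -> int:
    """Fail-over latency estimate: 30 * n, with n the longest backup path length.

    Assumption: the factor 30 is the MinRouteAdver interval in seconds.
    """
    if longest_backup_path_len < 0:
        raise ValueError("path length must be non-negative")
    return min_route_adver_s * longest_backup_path_len


# A backup path of length 6 gives 180 seconds under this assumption.
print(failover_latency_seconds(6))
```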

16 BGP convergence [3]: Path exploration => latency. More dense peering => more latency. Large providers: better convergence. Most erroneous paths are due to misconfiguration or software bugs. [9] The impact of internet policy and topology on delayed routing convergence

17 BGP convergence [4]: Route flap damping: to avoid excessive flaps, penalize updated routes; penalty decays exponentially; “suppression” and “reuse” thresholds; worsens convergence. Selective damping: do not penalize if path length keeps increasing; attach a preference with route. [10] Route flap damping exacerbates Internet routing convergence
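
The damping mechanism described on this slide can be sketched as a per-route penalty with exponential decay. The class name and the numeric defaults (penalty per flap, thresholds, half-life) are illustrative assumptions and do not come from the slides.

```python
class DampedRoute:
    """Per-route flap damping state with an exponentially decaying penalty."""

    def __init__(self, penalty_per_flap: float = 1000.0,
                 suppress: float = 2000.0, reuse: float = 750.0,
                 half_life_s: float = 900.0) -> None:
        # All four defaults are assumed values for illustration.
        self.penalty_per_flap = penalty_per_flap
        self.suppress = suppress
        self.reuse = reuse
        self.half_life_s = half_life_s
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now: float) -> None:
        """Halve the penalty every half_life_s seconds since the last update."""
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life_s)
        self.last_update = now
        # A suppressed route is reused once the penalty drops below "reuse".
        if self.suppressed and self.penalty < self.reuse:
            self.suppressed = False

    def flap(self, now: float) -> None:
        """Record one update for this route at time `now` (seconds)."""
        self._decay(now)
        self.penalty += self.penalty_per_flap
        if self.penalty > self.suppress:
            self.suppressed = True

    def is_usable(self, now: float) -> bool:
        """True if the route is not currently suppressed."""
        self._decay(now)
        return not self.suppressed
```

The two thresholds give hysteresis: a route is suppressed above the higher threshold and released only below the lower one. A route that flapped during path exploration therefore stays suppressed after it has stabilized, which is how damping can worsen convergence.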

18 BGP convergence [5]: 12R and 235R are inconsistent; prefer the directly learnt 235R. Order of magnitude improvement. Distinguish failure from policy change. Figure (labels as extracted): 1 20R 35 12R 235R 2R. [11] Improving BGP convergence through consistency assertions

19 BGP scaling: Full mesh of logical connections within an AS. Add hierarchy.

20 BGP scaling [2]: Route reflector: more popular; upgrade only RR. Confederations: sub-divide AS. Fewer updates and sessions. [12] A comparison of scaling techniques for BGP
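
The reduction in session count can be made concrete with a short calculation. The sketch assumes a simple layout in which the route reflectors are fully meshed among themselves and every other router is a client of exactly one reflector; the function names are hypothetical.

```python
def full_mesh_sessions(routers: int) -> int:
    """Internal BGP sessions when every router peers with every other router."""
    return routers * (routers - 1) // 2


def reflector_sessions(routers: int, reflectors: int = 1) -> int:
    """Sessions with `reflectors` fully meshed route reflectors and each
    remaining router attached to exactly one reflector (assumed layout)."""
    if not 1 <= reflectors <= routers:
        raise ValueError("need between 1 and `routers` reflectors")
    clients = routers - reflectors
    return reflectors * (reflectors - 1) // 2 + clients


# 100 routers: 4950 sessions in a full mesh, 99 with a single reflector.
print(full_mesh_sessions(100), reflector_sessions(100))
```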

21 BGP scaling [3]: May have loop if signaling path is not forwarding path. Persistent oscillations possible. Modify to pass multiple route information within an AS. Figure labels: RR, C1, C2, P, Q; signaling path; “Choose Q”, “Choose P”; logical BGP session; physical link. [13] On the correctness of IBGP configuration [14] Route oscillations in I-BGP with route reflections

22 BGP stability: Initial experiment (’96): 99% redundant updates, caused by implementation or configuration bugs. After bug fixes (97-98): well distributed across AS and prefix. [15] Internet routing instabilities [16] Experimental study of Internet stability and wide-area backbone failures

23 BGP stability [2]: Inter-domain experiment (’98): 9 months, 9GB, routes, 3 ISP, 15 min filtering. 25-35% of routes are 99.99% available. 10% of routes are less than 95% available. [16] Experimental study of Internet stability and wide-area backbone failures
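
To put the availability figures in perspective, the following sketch converts an availability percentage into expected unavailable time per year. It assumes a 365-day year and is plain arithmetic, not a result from the cited study; the function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Expected unavailable minutes per year for a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR


# 99.99% is under an hour per year; 95% is more than 18 days per year.
print(round(downtime_minutes_per_year(99.99), 2))
print(round(downtime_minutes_per_year(95.0) / (24 * 60), 2))
```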

24 BGP stability [3]: Failure: more than 50% have MTTF > 15 days; 75% failed in 30 days; most fail-over/re-route within 2 days (increased since ’94). Repair: 40% of route failures repaired in < 10 min, 60% in 30 min. Small fraction of routes affect majority of instability. Weekly/daily frequency => congestion possible. [16] Experimental study of Internet stability and wide-area backbone failures [24] End-to-end routing behavior in the Internet

25 BGP stability [4]: Backbone routers: interface MTTF 40 days; 80% of failures resolved in 2 hr; maintenance, power and PSTN are major causes of outages (approx 16% each); overall uptime of 99%. Popular destinations: quite robust; average duration is less than 20 s => due to convergence. [16] Experimental study of Internet stability and wide-area backbone failures [17] BGP routing stability of popular destinations

26 BGP under stress: Congestion: prioritize routing control messages over data. Routing table size: AS count, prefix length, multi-home, NAT; effects: number of updates, convergence; configuration, no universal filter. Real routers: “malloc” failure; cascading effect; prefix limiting option; graceful restart. CodeRed/Nimda: quite robust; some features get activated during stress. Cascading failures. [18] Routing Stability in Congested Networks: Experimentation and Analysis [19] Analyzing the Internet BGP routing table [20] An empirical study of router response to large BGP routing table load [21] Observation and analysis of BGP behavior under stress [22] Network Resilience: Exploring Cascading Failures within BGP

27 BGP misconfiguration: Failure to summarize, hijack, advertise internal prefix, or policy. prefix each day; ¾ of new advertisements are a result; 4% of prefixes affect connectivity. Causes: initialization bug (22%), reliance on upstream filtering (14%), from IGP (32%); bad ACL (34%), prefix based (8%). Conclusion: user interface, authentication, consistency verification, transaction semantics for commands. [23] Understanding BGP misconfiguration

28 Reactive routing: Resilient overlay network: detect failure (outages, loss) and reroute; application control of metric, expressive policy; scalability suffers. Failures are frequent and occur everywhere: 90% of failures last less than 15 min, 70% less than 5 min, median is just over 3 min; many near edge, inside AS; helps in case of multi-homing; failures in core more related with BGP. [26] Resilient overlay networks [27] Measuring the effect of Internet path faults on reactive routing

29 Reliable multicast: Reliable, sequenced, loosely synchronized. Existing TCP; ACK aggregation; local recovery possible. Performance. Linux-2.0.x: BSD packet filter, IP firewall and raw socket. [30] IRMA: A reliable multicast architecture for the Internet

30 Transport layer fail-over: Server fail-over: front-end bottleneck or forge IP address. Migrate TCP: works for static data (http pages); needs application stream mapping; implemented in Apache 1.3; huge overhead for short service. [29] Fine grained failover using connection migration

31 DNS performance: Low TTL: latency grows by 2 orders of magnitude; client and local name server may be distant; embedded object. 23% no answer, 13% failure answer; 27% sent to root server failed. TTL as low as 10 min. Share DNS cache by < clients. [33] On the effectiveness of DNS-based server selection [32] DNS performance and effectiveness of caching

32 DNS replication: Replicate entire DNS in distributed servers. Figure labels: Network, RNS, AS. [31] A replicated architecture for the domain name system

33 Reliable server pooling [34] Architecture for reliable server pooling [35] Requirements for reliable server pooling [36] Comparison of protocols for reliable server pooling

34 PSTN failures: Switch vendors aim for % availability. Network availability varies (domestic US calls > 99.9%). Study in ’97: overload caused 44% of customer-minutes; mostly short outages; human error caused 50% of outages; software only 14%. No convergence problem. [37] Sources of failures in PSTN

35 VoIP: Backbone links underutilized. Tier-1 backbone (Sprint) has good delay and loss characteristics: average scattered loss 0.19% (mostly single packet loss, use FEC); 99.9% of probes have < 33 ms delay; most burst loss due to routing problems; mean opinion score: 4.34 out of 5; customer sites have more problems. Internet backbone: can be provided by some ISPs, but many lead to poor performance; adaptive delay is needed for bad paths; mostly due to reliability and router operation, not traffic load; choice of audio codec. [28] Understanding traffic dynamics at a backbone POP [39] Impact of link failures on VoIP performance [38] Assessing the quality of voice communications over Internet backbones

36 VoIP [2]: Prevalent but not persistent path. Very asymmetric loss; bursty. Outages = more than 300 ms loss. More than 23% of losses are outages. Outages are similar for different networks. Call abortion due to poor quality. Net availability = 98%. [25] Measurement and interpretation of Internet packet loss [41] Assessment of VoIP service availability in the current Internet
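
The outage definition on this slide (a loss lasting more than 300 ms) can be applied to a packet trace as sketched below. The sketch assumes probes are sent at a fixed interval (20 ms here, an assumed value) and that the trace is a list of booleans, one per probe; the function names are hypothetical and this is not the measurement method of the cited studies.

```python
from itertools import groupby
from typing import List, Sequence


def loss_bursts_ms(received: Sequence[bool],
                   packet_interval_ms: int = 20) -> List[int]:
    """Duration in ms of each run of consecutive lost packets."""
    return [
        sum(1 for _ in run) * packet_interval_ms
        for ok, run in groupby(received)
        if not ok
    ]


def outages_ms(received: Sequence[bool], packet_interval_ms: int = 20,
               threshold_ms: int = 300) -> List[int]:
    """Loss bursts that last longer than the outage threshold."""
    return [d for d in loss_bursts_ms(received, packet_interval_ms)
            if d > threshold_ms]


# Two loss bursts (40 ms and 400 ms); only the second counts as an outage.
trace = [True] * 5 + [False] * 2 + [True] * 3 + [False] * 20 + [True] * 4
print(loss_bursts_ms(trace), outages_ms(trace))
```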

37 Future work: End system and higher layer protocol reliability and availability. Mechanism to reduce effect of outages in VoIP. Redundancy of VoIP systems during outages. Convergence and scaling of TRIP, which is similar to BGP. Scaling (DNS) + Reliable (server pool).