Management of Routing Protocols in IP Networks. Ph.D. Defense, Aman Shaikh, Computer Engineering, UCSC, November 18, 2003.

Presentation transcript:

2 Aman Shaikh Ph.D. Defense Introduction Internet connects millions of computers –Internet is packet-switched: Each packet travels independently of the rest Routers provide connectivity –Routers forward packets so that they reach their ultimate destination Forwarding is destination-based and hop-by-hop –Router decides next-hop (i.e., neighbor router) for each packet based on its destination address Routing protocols allow routers to determine next-hop(s) for every destination

3 Aman Shaikh Ph.D. Defense Management of Routing Infrastructure Management of routing infrastructure is a nightmare –“Simple core (= routing infrastructure), smart edge (= end hosts)” design paradigm Internet only provides a best-effort, connectionless, unreliable service Routing is not designed with manageability in mind –Large distributed system Hundreds of routers and thousands of links in big service provider networks Variety of routing protocols –The infrastructure is evolving New services require new protocols and devices

4 Aman Shaikh Ph.D. Defense Dissertation Contribution Focuses on management of Open Shortest Path First (OSPF) protocol –OSPF is widely used to control routing within service provider and enterprise networks Three areas of focus –Monitoring –Characterization –Maintenance

5 Aman Shaikh Ph.D. Defense Monitoring Motivation: –Effective management requires sound monitoring systems Contribution: –Design and implementation of an OSPF monitor –Deployment in two commercial networks Has proved valuable for trouble-shooting and identifying impending problems in early stage Collection and archiving of OSPF data that is used for performance improvement, post-mortem analysis and further research

6 Aman Shaikh Ph.D. Defense Characterization Motivation: –Need sound simulation and analytical models for scalability studies, addition of new features etc... How do we parameterize these models? –Need vendor-independent benchmarking methods Contribution: –Black-box techniques for estimating OSPF processing delays within a router Has become basis for OSPF benchmarking standardization efforts –Case study of OSPF dynamics in an enterprise network

7 Aman Shaikh Ph.D. Defense Maintenance Motivation: –Maintenance of routers occurs fairly frequently Protocol enhancements, bug fixes, hardware/software upgrades –During maintenance, operators have to withdraw router undergoing maintenance Leads to route flapping and instability –How to perform seamless maintenance? Contribution: –I’ll Be Back (IBB) capability for OSPF Allows “router-under-maintenance” to be used for forwarding

8 Aman Shaikh Ph.D. Defense Outline Background –Routing and OSPF overview –Design of an IP router Monitoring –OSPF Monitor Characterization –Black-box measurements for OSPF –Case study of OSPF dynamics Maintenance –I’ll Be Back (IBB) Capability for OSPF Conclusions and future work

9 Aman Shaikh Ph.D. Defense Routing in the Internet Internet is a collection of Autonomous Systems (ASes) Two classes of routing protocols –IGP (Interior Gateway Protocols) Used within an AS Example: OSPF, IS-IS, RIP, EIGRP –EGP (Exterior Gateway Protocols) Used across ASes Example: BGP [Figure: five ASes, AS1 to AS5, running IGPs (OSPF, RIP, IS-IS) internally and connected to one another by BGP]

10 Aman Shaikh Ph.D. Defense Overview of OSPF OSPF is a link-state protocol –Every router learns entire network topology Topology is represented as graph –Routers are vertices, links are edges –Every link is assigned weight through configuration –Every router uses Dijkstra’s single source shortest path algorithm to build its forwarding table Router builds Shortest Path Tree (SPT) with itself as root Shortest Path Calculation (SPF) –Packets are forwarded along shortest paths defined by link weights
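The SPF step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code of any router: the graph representation (a dict of dicts of link weights) and the function name `spf_next_hops` are assumptions made for the example, and equal-cost multipath is ignored.

```python
import heapq

def spf_next_hops(graph, root):
    """Dijkstra from `root`; returns {destination: (cost, first_hop)}.

    graph: {router: {neighbor: link_weight}} (hypothetical representation).
    """
    dist = {root: 0}
    first_hop = {}
    heap = [(0, root, None)]  # (cost so far, node, first hop used from root)
    done = set()
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # an older, more expensive entry for this node
        done.add(node)
        if node != root:
            first_hop[node] = hop
        for nbr, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if nbr not in done and new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                # Leaving the root, the first hop is the neighbor itself;
                # further out, it is inherited from the parent in the tree.
                heapq.heappush(heap, (new_cost, nbr, nbr if node == root else hop))
    return {dest: (dist[dest], first_hop[dest]) for dest in first_hop}

# Example topology (made up): A reaches C more cheaply through B.
topology = {
    "A": {"B": 1, "C": 5},
    "B": {"A": 1, "C": 1},
    "C": {"A": 5, "B": 1},
}
table = spf_next_hops(topology, "A")
```

The returned mapping plays the role of the forwarding table: for each destination it records the total path cost and the neighbor to hand the packet to.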

11 Aman Shaikh Ph.D. Defense Areas in OSPF OSPF allows domain to be divided into areas for scalability –Areas are numbered 0, 1, 2 … –Hub-and-spoke with area 0 as hub –Every link is assigned to exactly one area –Routers with links in multiple areas are called border routers [Figure: Area 1 and Area 2 attached to Area 0 through border routers]

12 Aman Shaikh Ph.D. Defense Summarization with Areas Each router learns –Entire topology of its attached areas –Information about subnets in remote areas and their distance from the border routers Distance = sum of link costs from border router to subnet [Figure: an OSPF domain with routers R1, R2, R3 and border routers B1, B2, shown next to R1's view of the same domain; subnet addresses and link costs were lost in extraction]
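How a router combines these summaries with its own area topology can be sketched as follows. This is an illustrative simplification: the function name, the dict layouts and the example prefix are assumptions, and the choice of the cheapest border router follows standard OSPF inter-area route selection rather than anything stated on the slide.

```python
def inter_area_route(dist_to_border, summaries, subnet):
    """Pick the border router giving the lowest total cost to a remote subnet.

    dist_to_border: {border_router: intra-area cost from this router}
    summaries: {border_router: {subnet: advertised distance from that border router}}
    Returns (total_cost, border_router), or None if no border router advertises it.
    """
    best = None
    for border, advertised in summaries.items():
        if subnet in advertised and border in dist_to_border:
            total = dist_to_border[border] + advertised[subnet]
            if best is None or total < best[0]:
                best = (total, border)
    return best

# Made-up numbers: B1 is farther away but much closer to the subnet.
route = inter_area_route(
    {"B1": 10, "B2": 5},
    {"B1": {"10.1.0.0/16": 3}, "B2": {"10.1.0.0/16": 12}},
    "10.1.0.0/16",
)
```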

13 Aman Shaikh Ph.D. Defense Link State Advertisements (LSAs) Every router describes its local connectivity in Link State Advertisements (LSAs) Router originates an LSA due to… –Change in network topology Example: link goes down or comes up –Periodic soft-state refresh Recommended value of interval is 30 minutes LSA is flooded to other routers in the domain –Flooding is reliable and hop-by-hop –Includes change and refresh LSAs –Flooding leads to duplicate copies of LSAs being received Every router stores LSAs (self-originated + received) in link-state database (= topology graph)

14 Aman Shaikh Ph.D. Defense Adjacency Neighbor routers (i.e., routers connected by a physical link) form an adjacency The purpose is to make sure –Link is operational and routers can communicate with each other –Neighbor routers have consistent view of network topology To avoid loops and black holes Link gets used for data forwarding only after adjacency is established Use of periodic Hellos to monitor the status of link and adjacency

15 Aman Shaikh Ph.D. Defense Design of an IP Router [Figure: router block diagram. Control plane: Route Processor (CPU) running an OSPF process, a BGP process and a RIP process, each with its own routing calculation, plus a Route Manager. Data plane: interface cards with forwarding, a switching fabric carrying data packets, and the Forwarding Information Base (FIB)]

16 Aman Shaikh Ph.D. Defense Outline Background Monitoring –Motivation: Effective management requires sound monitoring systems –Contribution: OSPF monitor Design –Three components and their functionality Deployment in two commercial networks –How OSPF Monitor is being used –Lessons learnt through deployment Characterization Maintenance Conclusions and future work

17 Aman Shaikh Ph.D. Defense OSPF Monitor: Objectives Real-time analysis of OSPF behavior –Trouble-shooting, alerting –Real-time snapshots of OSPF network topology Off-line analysis –Post-mortem analysis of recurring problems –Identify anomaly signatures and use them to predict impending problems –Allow operators to tune configurable parameters –Improve maintenance procedures –Analyze OSPF behavior in commercial networks

18 Aman Shaikh Ph.D. Defense Related Work Route monitoring –Commercial IP monitors Route Dynamics (IPSUM), Route Explorer (PacketDesign) –IPMON project at Sprint IS-IS and BGP listeners –RouteViews and RIPE Collects BGP updates from several networks Topology tracking –OSPF topology server [shaikh:jsac02] Evaluation and comparison of LSA-based versus SNMP-based approaches –Rocketfuel project at UW Seattle Inference of intra-domain topologies from end-to-end measurements

19 Aman Shaikh Ph.D. Defense Components Data collection: LSA Reflector (LSAR) –Passively collects OSPF LSAs from network –“Reflects” streams of LSAs to LSAG –Archives LSAs for analysis by OSPFScan Real-time analysis: LSA aGgregator (LSAG) –Monitors network for topology changes, LSA storms, node flaps and anomalies Off-line analysis: OSPFScan –Tools for analysis of LSA archives Post-mortem analysis of recurring problems, performance improvement, what-if analysis, OSPF dynamics

20 Aman Shaikh Ph.D. Defense Example [Figure: an OSPF network with Area 0, Area 1 and Area 2. LSAR 1 and LSAR 2 receive LSAs from the network and “reflect” them to the LSAG for real-time monitoring; the LSA archives are replicated to OSPFScan for off-line analysis]

21 Aman Shaikh Ph.D. Defense How LSAR attaches to Network Host mode –Join multicast group –Adv: completely passive –Disadv: not reliable, delayed initialization of LSDB Full adjacency mode –Form full adjacency with a router –Adv: reliable, immediate initialization of LSDB –Disadv: LSAR’s instability can impact entire network Partial adjacency mode –Keep adjacency in a state that allows LSAR to receive LSAs, but does not allow data forwarding over link –Adv: reliable, LSAR’s instability does not impact entire network, immediate initialization of LSDB –Disadv: can raise alarms on the router

22 Aman Shaikh Ph.D. Defense LSA aGgregator (LSAG) Analyzes “reflected” LSAs from LSARs over TCP connections in real-time Generates console messages: –Changes in OSPF network topology ADJACENCY COST CHANGE: rtr (intf ) → rtr old_cost 1000 new_cost area –Node flaps RTR FLAP: rtr no_flaps 7 flap_window 570 sec –LSA storms LSA STORM: lstype 3 lsid advrt area no_lsas 7 storm_window 470 sec –Anomalous behavior TYPE-3 ROUTE FROM NON-BORDER RTR: ntw /24 rtr area
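A flap alert of the kind shown in the console messages could be produced by a sliding-window counter. The sketch below is an assumption about how such a check might look, not the LSAG implementation: the class name, the default window and threshold, and the exact message layout (which only imitates the slide) are all hypothetical.

```python
from collections import deque

class FlapDetector:
    """Counts up/down transitions per router inside a sliding time window."""

    def __init__(self, window_sec=600, min_flaps=5):
        self.window_sec = window_sec  # hypothetical default
        self.min_flaps = min_flaps    # hypothetical default
        self.events = {}              # router id -> deque of transition timestamps

    def record(self, router, timestamp):
        """Register one transition; return an alert string or None."""
        window = self.events.setdefault(router, deque())
        window.append(timestamp)
        # Drop transitions that fell out of the window.
        while timestamp - window[0] > self.window_sec:
            window.popleft()
        if len(window) >= self.min_flaps:
            span = int(timestamp - window[0])
            return f"RTR FLAP: rtr {router} no_flaps {len(window)} flap_window {span} sec"
        return None

detector = FlapDetector(window_sec=600, min_flaps=3)
for t in (0, 100, 250):
    alert = detector.record("r1", t)
```

Keeping only timestamps per router keeps memory proportional to the number of recent transitions, which matters if the detector runs continuously on a live LSA stream.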

23 Aman Shaikh Ph.D. Defense OSPFScan Tools for off-line analysis of LSA archives –Parse, select (based on queries), and analyze Derivation and analysis of auxiliary information from LSA archives –LSAs indicating network topology changes –Routing table entries How OSPF routing tables evolved in response to network changes What an end-to-end path within the OSPF domain looked like at any instant –Topology changes as graph-based abstraction Vertex addition/deletion and link addition/deletion/change_weight Playback of topology change events –Essentially an LSAG playback

24 Aman Shaikh Ph.D. Defense Deployment Deployed in two commercial networks –Enterprise network 15 areas, 500+ routers; Ethernet-based LANs Deployed since February, 2002 LSA archive size: 10 MB/day LSAR connection: host mode –ISP network Area 0, 100+ routers; Point-to-point links Deployed since January, 2003 LSA archive size: 8 MB/day LSAR connection: partial adjacency mode

25 Aman Shaikh Ph.D. Defense LSAG in Day-to-day Operations Generation of alarms by feeding messages into higher layer network management systems –Correlation and grouping of messages into a single alarm –Prioritization of messages Validation of maintenance steps and monitoring the impact of these steps on network-wide OSPF behavior –Example: Operators change link weights to carry out maintenance activities A “link-audit” web-page allows operators to keep track of link weights in real-time

26 Aman Shaikh Ph.D. Defense Problems Caught by LSAG Equipment problem –Detected internal problems in a crucial router in enterprise network Problem manifested as episodes of OSPF adjacency flapping Configuration problem –Identified assignment of same router-ids to two routers in enterprise network OSPF implementation bug –Caught a bug in refresh algorithm of routers from a particular vendor in ISP network Bug resulted in a much faster refresh of LSAs than standards-mandated rate

27 Aman Shaikh Ph.D. Defense Long Term Analysis by OSPFScan LSA traffic analysis –Identified excessive duplicate LSA traffic in some areas of the enterprise network Led to root-cause analysis and preventative steps Generation of statistics –Inter-arrival time of change LSAs in the ISP network Fine-tuning configurable timers related to SPF calculation –Mean down-time and up-time for links and routers in the ISP network Assessment of reliability and availability as ISP network gears for deployment of new services

28 Aman Shaikh Ph.D. Defense Lessons Learnt through Deployment New tools reveal new failure modes Real networks exhibit significant activity –Maintenance and genuine problems Archive all LSAs –LSA volume is manageable Stability and reliability of monitor is extremely important Keep data collection separate from its analysis –Keep data collector as simple as possible Add functionality incrementally and through interaction with users

29 Aman Shaikh Ph.D. Defense Summary Three component architecture –LSAR: LSA capture from the network –LSAG: real-time analysis of LSA stream Detection and trouble-shooting of problems –OSPFScan: off-line analysis tools for LSA archives Post-mortem analysis of recurring problems, performance improvement, what-if analysis, OSPF dynamics Deployed in two commercial networks –Has proven a valuable network management tool –“OSPF Monitor was a lifesaver” VP of Networking, Enterprise network –When monitor caught an impending failure in an early stage

30 Aman Shaikh Ph.D. Defense Outline Background Monitoring Characterization –Motivation: Simulation and analytical models, benchmarking –Contributions: Black-box techniques for estimating OSPF processing delays on a router –Tasks we measure, methodology, results for Cisco and GateD Case study of OSPF dynamics in an enterprise network Maintenance Conclusions and future work

31 Aman Shaikh Ph.D. Defense Black-box Measurements for OSPF OSPF processing delays within a router matter! –Add up to impact convergence and stability –Guidance in tuning configurable parameters, head to head vendor comparisons, simulation models Instrumenting routing code for measuring delays is challenging –Commercial implementations are proprietary –May involve grappling with Numerous code versions, hardware platforms, and developers Use black-box measurements –Measure the timing delays using external observations –Applied to Cisco and GateD OSPF implementations

32 Aman Shaikh Ph.D. Defense Related Work White-box measurements for IS-IS [alaettinoglu] –SPF delays reported are comparable to results obtained by us Empirical analysis of router behavior under large BGP routing tables [chang:imw02] –Cisco and Juniper routers Benchmarking Methodology working group (bmwg) at IETF –Drafts related to OSPF benchmarking Our black-box methods are basis for some benchmark tests

33 Aman Shaikh Ph.D. Defense What tasks did we measure? –LSA processing –LSA flooding –SPF calculation –FIB update [Figure: router block diagram (route processor with OSPF process and topology view, FIB, interface card with forwarding, switching fabric) annotated with an incoming LSA, its LS Ack, and the four measured tasks]

34 Aman Shaikh Ph.D. Defense Methodology –Load emulated topology on target router –Initiate task of interest –Measure the time for task [Figure: testbed in which TopTracker sends LSAs describing an emulated topology to the target router]

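35 Aman Shaikh Ph.D. Defense Measuring Task Time 1. Use a black-box method to bracket task start and finish times 2. Subtract out intervals that precede and exceed these times: X = A - (B + C) [Figure: timeline running from the top bracket event to the bottom bracket event, with the task start time and task finish time in between. A is the whole bracketed interval, X the task time, and B and C the intervals that fall outside the task but inside the bracket]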

36 Aman Shaikh Ph.D. Defense Measuring SPF Calculation X = A - (B + C + D + E) Estimate the overhead = B + C + D + E [Figure: message timeline between TopTracker and the target router. TopTracker: load desired topology, send initiator LSA, send duplicate LSA, ack for duplicate LSA arrives. Target router: initiator LSA arrives, SPF calculation starts, SPF calculation ends, send ack for duplicate LSA. A is the bracketed interval, X the SPF calculation time, and B, C, D, E the overhead intervals]

37 Aman Shaikh Ph.D. Defense Estimating the Overhead Remove SPF calculation from bracket –spf_delay = 60 seconds overhead = B + C + D + E [Figure: the same message timeline with the SPF calculation pushed outside the bracket. TopTracker: send initiator LSA, send duplicate LSA, ack for duplicate LSA arrives. Target router: initiator LSA arrives, initiator LSA processing done, duplicate LSA arrives, duplicate LSA processing done and ack sent, SPF calculation starts]
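The bracket arithmetic from the measurement slides (X = A - overhead, with overhead = B + C + D + E measured in a separate run where the SPF calculation is delayed out of the bracket) reduces to a subtraction of two measured quantities. The sketch below assumes, as an illustration only, that both quantities are averaged over repeated trials; the function name and the sample values are invented.

```python
from statistics import mean

def estimate_task_time(bracket_samples, overhead_samples):
    """Estimate X = A - overhead from repeated measurements.

    bracket_samples: measured A values (bracket with the task inside).
    overhead_samples: measured B + C + D + E values (task moved out of the bracket).
    """
    return mean(bracket_samples) - mean(overhead_samples)

# Invented numbers, in milliseconds.
spf_time_ms = estimate_task_time([12.0, 14.0], [3.0, 5.0])
```

Because the overhead is measured rather than modelled, the subtraction works without any knowledge of the router's internals, which is what makes the method black-box.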

38 Aman Shaikh Ph.D. Defense Results Results for Cisco GSR, 7513 and GateD –For GateD, comparison of black-box results with those obtained using instrumentation (white-box) –Route processors Cisco: 200 MHz R5000 processor GateD: 500 MHz AMD-K6 processor Topology: full n × n mesh with random OSPF edge weights –n in range 10, 20, …, 100

39 Aman Shaikh Ph.D. Defense Results for Cisco Routers Observations –Similar results for two models –SPF calculation time is O(n^2)

40 Aman Shaikh Ph.D. Defense Results for GateD Observations: –Black-box over-estimates white-box measurement –Black-box captures the characteristics very well

41 Aman Shaikh Ph.D. Defense Summary Black-box methods for estimating OSPF processing delays –Work across wide range of time delays –Work for pure CPU bound tasks –Effective in capturing scaling –Match with white-box measurements Applied methods to Cisco GSR and 7513 –LSA Processing: microseconds –LSA flooding: milliseconds Pacing timer is the determining factor –SPF calculation: 1-40 milliseconds O(n^2) behavior for full n × n mesh –FIB update time: milliseconds No dependence on topology size

42 Aman Shaikh Ph.D. Defense Outline Background Monitoring Characterization –Motivation: Simulation and analytical models, benchmarking –Contributions: Black-box techniques for estimating OSPF processing delays on a router Case study of OSPF dynamics in an enterprise network –Enterprise network topology, categorization of LSA traffic, results Maintenance Conclusions and future work

43 Aman Shaikh Ph.D. Defense Case Study of OSPF Dynamics OSPF behavior in commercial networks is not well understood Understanding dynamics of LSA traffic is key to better understanding of OSPF –Bulk of OSPF processing is due to LSAs –Big impact on OSPF convergence, (in)stability Analysis of LSA archives collected by OSPF monitor in enterprise network –Focus on April, 2002 data

44 Aman Shaikh Ph.D. Defense Related Work Several studies focusing on BGP dynamics in the Internet –Relatively easy to collect BGP data –BGP is more complicated OSPF dynamics in a regional service provider network (MichNet) [watson:icdcs03] –One year's worth of data –Several findings are similar to our observations Analysis of OSPF stability through simulations [basu:sigcomm01]

45 Aman Shaikh Ph.D. Defense Enterprise Network Provides customers with connectivity to applications and databases residing in data center OSPF network –15 areas, 500 routers This case study covers 8 areas, 250 routers One month: April, 2002 –Ethernet-based LANs Customers are connected via leased lines –Customer routes are injected via EIGRP into OSPF The routes are propagated via external LSAs

46 Aman Shaikh Ph.D. Defense Enterprise Network Topology Monitor uses host mode to receive LSAs [Figure: OSPF domain made up of Area 0 and Areas A, B and C, with database and application servers, customers whose routes are external (EIGRP), and a detail of Area A and Area 0 showing border routers B1 and B2, LAN 1, LAN 2 and the monitor]

47 Aman Shaikh Ph.D. Defense Categorizing LSA Traffic Refresh LSA traffic –Originated due to periodic soft-state refresh –Forms base-line LSA traffic –Can be predicted using configuration information Change LSA traffic –Originated due to changes in network topology E.g., link goes down/comes up –Allows detection of anomalies and problems Duplicate LSA traffic –Received due to redundancy in flooding –Overhead: wastes resources
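One way to sort received LSAs into these three categories is to compare each copy with the last stored instance of the same LSA. The sketch below is a simplified assumption, not the classifier used in the case study: it keys LSAs by an opaque `key`, compares sequence numbers only (real OSPF also considers checksum and age), treats any copy that is not newer as a duplicate, and counts the first sighting of an LSA as a change.

```python
def categorize_lsa(lsdb, key, seq, body):
    """Return 'duplicate', 'refresh' or 'change' and update the database.

    lsdb: {key: (seq, body)}, the last accepted instance per LSA.
    key: identifies the LSA, e.g. (type, link-state id, advertising router).
    """
    stored = lsdb.get(key)
    if stored is not None and seq <= stored[0]:
        return "duplicate"  # not newer than what we hold: redundant flooding copy
    lsdb[key] = (seq, body)
    if stored is None or stored[1] != body:
        return "change"     # new LSA, or newer instance with different contents
    return "refresh"        # newer instance, same contents: soft-state refresh

db = {}
lsa_key = ("router", "r1", "r1")  # hypothetical identifiers
labels = [
    categorize_lsa(db, lsa_key, 1, "links: a"),
    categorize_lsa(db, lsa_key, 1, "links: a"),
    categorize_lsa(db, lsa_key, 2, "links: a"),
    categorize_lsa(db, lsa_key, 3, "links: a, b"),
]
```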

48 Aman Shaikh Ph.D. Defense LSA Traffic in Different Areas [Figure: four panels (Area 0, Area 2, Area 3, Area 4) plotting LSA traffic against days, split into refresh LSAs, change LSAs and duplicate LSAs. Annotations: an artifact caused by a 23-hour day (Apr 7) and a genuine anomaly]

49 Aman Shaikh Ph.D. Defense Baseline LSA Traffic: Refresh LSAs Refresh LSA traffic can be reliably predicted using router configuration files –Important for workload generation [Figure: refresh LSA traffic against days for Area 2 and Area 3]

50 Aman Shaikh Ph.D. Defense Refresh process is not synchronized No evidence of synchronization –Contrary to simulation-based study [basu:sigcomm01] Reasons –Changes in the topology help break synchronization –LSA refresh at one router is not coupled with LSA refresh at other routers –Drift in the refresh interval of different routers

51 Aman Shaikh Ph.D. Defense Change LSAs Internal to OSPF domain versus external –Change LSAs due to external events dominated –Not surprising due to large number of leased lines and import of customer routes into OSPF Customer volatility → network volatility [Figure: change LSAs against days]

52 Aman Shaikh Ph.D. Defense Root Causes of Change LSAs Persistent problem → link flaps → numerous change LSAs –Internal LSA spikes → hardware router problems OSPF monitor identified a problem (not visible to other network management tools) early and led to preventive maintenance –External LSA spikes → customer route volatility Overload of an external link to a customer between 9 PM and 3 AM caused EIGRP session to flap

53 Aman Shaikh Ph.D. Defense Overhead: Duplicate LSAs Why do some areas witness substantial duplicate LSA traffic, while other areas do not witness any? –OSPF flooding over LANs leads to control plane asymmetries and to imbalances in duplicate LSA traffic [Figure: duplicate LSAs against days]

54 Aman Shaikh Ph.D. Defense Summary Refresh LSAs: constituted bulk of overall LSA traffic –No evidence of synchronization between different routers –Refresh LSA traffic predictable from configuration information Change LSAs: mostly indicated persistent yet partial failure modes –Internal LSA spikes → hardware router problems → preventive router maintenance –External LSA spikes → customer congestion problems → “preventive” customer care Duplicate LSAs: arose from control plane asymmetries –Simple configuration changes could eliminate duplicate LSAs and improve performance

55 Aman Shaikh Ph.D. Defense Outline Background Monitoring Characterization Maintenance –Motivation: Seamless maintenance and upgrades of routers –Minimal instability and flaps –Contribution: I’ll Be Back (IBB) capability for OSPF –What IBB capability provides, how capability is implemented, performance analysis Conclusions and future work

56 Aman Shaikh Ph.D. Defense Maintenance is a Pain Maintenance of routers is a way of life in commercial networks –Extensions to routing protocols, new functionality, hardware and software upgrades, bug fixes Maintenance is a painful exercise –During maintenance, operators withdraw “router-under-maintenance” from forwarding service Leads to route flaps, traffic disruption and instability –Operators have to carefully schedule maintenance Schedule them during night when load is moderate Stagger maintenance of different routers across time

57 Aman Shaikh Ph.D. Defense We can do better Observation: router can continue forwarding even while its routing process is inactive, at least for a while –Current routers have separate routing and forwarding paths Routing in software (CPU) Forwarding in hardware (switching) Need to extend routing protocols since they always try to route around inactive router –Our proposal: IBB (I’ll Be Back) extensions to OSPF

58 Aman Shaikh Ph.D. Defense IBB Proposal in a Nutshell OSPF process on router R needs to be shut down Before shutdown, R informs other routers that it is going to be inactive for a while R specifies a time period (IBB Timeout) by which it expects to become operational again Other routers continue using R for forwarding during IBB Timeout period If R comes back within IBB Timeout period, no routing instability or flaps Else other routers start forwarding packets around R
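The timeout rule in this proposal can be pictured as a small table kept by every other router. The sketch below is a hedged illustration of that bookkeeping only: the class and method names are hypothetical, time is a plain number of seconds, and the loop and black-hole checks that the IBB algorithm adds on top are left out.

```python
class IBBTable:
    """Tracks routers that announced an IBB Timeout (hypothetical helper)."""

    def __init__(self):
        self.deadline = {}  # router id -> time by which it expects to be back

    def announce(self, router, now, ibb_timeout):
        """Router says it is going inactive and will return within ibb_timeout."""
        self.deadline[router] = now + ibb_timeout

    def returned(self, router):
        """Router's OSPF process is operational again."""
        self.deadline.pop(router, None)

    def within_ibb_period(self, router, now):
        """True while an inactive router may still be used for forwarding."""
        return router in self.deadline and now <= self.deadline[router]

table = IBBTable()
table.announce("R", now=100, ibb_timeout=300)
still_usable = table.within_ibb_period("R", now=250)
expired = not table.within_ibb_period("R", now=401)
```

Once `within_ibb_period` turns false without the router having returned, the other routers would fall back to normal OSPF behavior and route around it.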

59 Aman Shaikh Ph.D. Defense Related Work Graceful restart proposals for various routing protocols at IETF –Graceful restart proposal for OSPF by John Moy Alex Zinin’s proposal to avoid flaps upon restart of OSPF process –Process has to come up before other routers notice it was shut down –Provides small window of opportunity Use of redundant route processors and seamless transfer of control –NSR (Avici), High Availability Initiative (Cisco)

60 Aman Shaikh Ph.D. Defense What if topology changes R cannot update its forwarding table to reflect the change –Can lead to loops or black holes [Figure: routers A, B and R shown in two states: (a) topology when R went down, (b) topology changes while R is inactive]

61 Aman Shaikh Ph.D. Defense Handling Changes: Three Options Don’t do anything Stop using R: John Moy’s proposal –Inadvertent changes during upgrade are likely Example: flapping due to a bad interface somewhere –But not all changes are bad Do not always lead to loops or black holes Stop using R only when loop or black hole gets formed –And only for destinations for which there is a problem Our approach

62 Aman Shaikh Ph.D. Defense Roadmap of Algorithm Single area, single inactive router case –Loop formation –Black hole formation Single area, multiple inactive routers case –Loop formation Multiple areas –Black hole formation and area partitions

63 Aman Shaikh Ph.D. Defense Single Area, Single Inactive Router Problem Formulation –Inactive Router = R –All routers other than R have the same image of the topology graph –R’s image is that of the past, i.e., the time at which it went down –Source = S, Destination = D –Next hop(R, D) = Y –Actual path a packet takes from S to D = P(S → D)

64 Aman Shaikh Ph.D. Defense Loop Detection P(S → D) has a loop iff S and Y have R on their paths to D in their SPTs If there is a loop, neighbor can always detect it [Figure: example with routers S, Y, R and destination D, showing the topology when R went down, the topology after a change while R is inactive, and the SPTs of S and Y, both of which have R on the path to D; link weights were garbled in extraction]

65 Aman Shaikh Ph.D. Defense Loop Prevention Every router needs to calculate a path to D such that R does not appear on it [Figure: changed topology while R is inactive, with S and Y calculating paths to D without R on them; link weights were garbled in extraction]

66 Aman Shaikh Ph.D. Defense Loop Avoidance Procedure R sends forwarding table to neighbors before shutdown - Thus, Y knows that next hop(R, D) is Y Detection: during SPF calculation neighbors detect loops - Y checks if R exists on the path to D or not Upon detection, neighbors send avoid messages to other routers in the domain - avoid(R, D) = avoid using R for reaching D Prevention: upon receiving avoid(R, D) message, other routers calculate a new path to D without R on it
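The detection and prevention steps can be tried out on a toy graph. The sketch below is an illustration under stated assumptions, not the IBB implementation in GateD: the helper names, the four-node topology and its weights are invented, a single shortest path is assumed per destination (no equal-cost ties), and the avoid message exchange is reduced to a function argument.

```python
import heapq

def shortest_path(graph, src, dst, avoid=None):
    """One shortest path src -> dst as a node list, never entering `avoid`.

    Returns None if dst is unreachable. graph: {node: {neighbor: weight}}.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            break
        for nbr, weight in graph[node].items():
            if nbr == avoid or nbr in done:
                continue
            if cost + weight < dist.get(nbr, float("inf")):
                dist[nbr] = cost + weight
                prev[nbr] = node
                heapq.heappush(heap, (cost + weight, nbr))
    if dst not in done:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def uses_router(graph, src, dst, r):
    path = shortest_path(graph, src, dst)
    return path is not None and r in path

def detect_loop(graph, s, y, r, d):
    """Detection rule from the slides: loop iff S and Y both reach D via R,
    where Y is the (stale) next hop that inactive router R uses for D."""
    return uses_router(graph, s, d, r) and uses_router(graph, y, d, r)

# Invented topology after a change: the Y-D link became expensive, so S and Y
# now prefer to go through R, while R still forwards traffic for D to Y.
changed = {
    "S": {"Y": 1},
    "Y": {"S": 1, "R": 2, "D": 20},
    "R": {"Y": 2, "D": 6},
    "D": {"Y": 20, "R": 6},
}
if detect_loop(changed, "S", "Y", "R", "D"):
    # Prevention: on avoid(R, D), compute a path to D that leaves R out.
    safe_path = shortest_path(changed, "S", "D", avoid="R")
```

In this example the detour is more expensive than the path through R, which is the price paid for keeping packets out of the loop until R returns.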

67 Aman Shaikh Ph.D. Defense Performance Maximum effect on SPF calculation –Quantify overhead –Impact of topology size Prototype Implementation –IBB extensions incorporated into GateD 4.0.7

68 Aman Shaikh Ph.D. Defense Testbed Setup [Figure, two panels. Physical topology: SUT and TopTracker on a LAN, with LSAs sent to the SUT. Emulated topology (SUT’s view of the topology): SUT, node X, router under maintenance R, and a complete graph with n nodes (M1, …)] SUT = System Under Test, where IBB overhead is measured

69 Aman Shaikh Ph.D. Defense Experiment Sequence (time in minutes; columns: GateD on SUT, IBB-GateD on SUT) T = 0: Bring R down (GateD); bring R down in IBB mode (IBB-GateD). T = 4: Send avoid(R, M_j) messages to SUT (1 ≤ j ≤ n). T = 8: Bring R up. Case A: inactive router. Case B: inactive router, avoid it. Overhead = (mean SPF time in Case B) / (mean SPF time in Case A)

70 Aman Shaikh Ph.D. Defense Result Overhead remains constant at roughly 2.0 as n increases Sources of overhead: –Second SPF calculation –Graph in case B is larger than graph in case A

71 Aman Shaikh Ph.D. Defense Summary IBB proposal: extend OSPF so that a router can be used for forwarding even while its OSPF process is inactive Main contribution: algorithm that gracefully handles topology changes –Stops using the inactive router for a destination if using the router can lead to loops or black holes –Overhead of the algorithm is modest Shows good scaling behavior in terms of topology size

72 Aman Shaikh Ph.D. Defense Outline Background Monitoring Characterization Maintenance Conclusions and future work

73 Aman Shaikh Ph.D. Defense Conclusions Monitoring –Design and implementation of an OSPF monitor –Deployment in two commercial networks Characterization –Black-box techniques for estimating OSPF processing delays within a router –Case study of OSPF dynamics in enterprise network Maintenance –I’ll Be Back (IBB) capability for OSPF that allows a “router-under-maintenance” to be used for forwarding

74 Aman Shaikh Ph.D. Defense Future Work Three principal directions for future work –Application of this work to other routing protocols IS-IS is very similar to OSPF EIGRP, RIP and BGP bring their own set of challenges –Distance-vector nature of the protocols –BGP also brings scalability issues –Other areas related to routing and network management Security, network design, configuration management, simulation & modeling How performance of routing infrastructure affects user-perceived performance –More work in each of three focus areas

75 Aman Shaikh Ph.D. Defense Future Work for Monitoring Real-time analysis –More meaningful alerting Correlation with other fault and performance data Learn from past events –Prioritization of alerts Off-line analysis –Correlation with other data sources Work already underway: BGP, fault, performance –Identification of problem signatures and feeding them into real-time component for problem prediction

76 Aman Shaikh Ph.D. Defense Future Work for Characterization Expand measurements to cover other router vendors and commercial networks Use results to build simulation and analytical models –Validation of models

77 Aman Shaikh Ph.D. Defense Future Work for Maintenance Improvements to IBB scheme –Incremental deployment –Reduction in overhead How to use IBB-like schemes in conjunction with other approaches –Routing software that can be upgraded without bringing the process down –Use of redundant route processors and seamless transfer of control –Scheduling maintenance tasks such that they have minimal impact

78 Aman Shaikh Ph.D. Defense Holy Grail Networks that manage themselves!

79 Aman Shaikh Ph.D. Defense Grill me... Probably your last chance… :-) Q and A

80 Aman Shaikh Ph.D. Defense Backups

81 Aman Shaikh Ph.D. Defense Partial Adjacency for LSAR [Figure: LSAR and router R held in a partial adjacency state; message labels: “I have LSA L”, “Please send me LSA L”, “I need LSA L from LSAR”] LSAR does not originate any LSAs The link between LSAR and R is not used for data forwarding LSAR does not install any routes in forwarding table Router R does not advertise a link to LSAR Routers (except R) are not aware of the presence of LSAR Does not trigger SPF calculations in network LSAR’s going up/down does not impact the network

82 Aman Shaikh Ph.D. Defense Multiple Inactive Routers for IBB Loop Avoidance –Change in loop detection conditions –Simplification for loop prevention No change in black-hole detection

83 Aman Shaikh Ph.D. Defense Loop Avoidance Set of inactive routers: R_1, R_2, …, R_n Loop avoidance procedure applies for each inactive router –Detection Router detects loops for all its inactive neighbors –Prevention A router can get avoid(R_i, D) messages for j inactive routers (j <= n) The router avoids these j forbidden routers on its path to D Problem: Set of forbidden routers can be different for different destinations –O(n) shortest path calculations, where n = number of vertices

84 Aman Shaikh Ph.D. Defense Simplification Router avoids all inactive routers if it has some forbidden routers on its path to D –Calculate two SPTs: –SPT with all inactive routers on it –SPT w/o any inactive router on it –If the path to D does not contain any forbidden routers on it, Pick next hop for D from the first SPT –Else, Pick next hop for D from the second SPT
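The two-SPT simplification above can be sketched as follows. This is an illustration under assumptions rather than the prototype's code: the topology, the costs, and the names `spt`, `path_to`, `pick_next_hop` and `forbidden` (a per-destination record of received avoid messages) are all hypothetical. Two trees are computed once, and each destination then picks its next hop from one of them, instead of running a separate shortest-path calculation per destination.

```python
import heapq

def spt(graph, source, excluded=frozenset()):
    """Dijkstra; returns the parent map of the shortest-path tree rooted at
    source, never entering nodes in `excluded`."""
    dist = {source: 0}
    parent = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            if v in excluded:
                continue
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent

def path_to(parent, dest):
    """Walk the parent map back from dest; None if dest is unreachable."""
    if dest not in parent:
        return None
    nodes = []
    while dest is not None:
        nodes.append(dest)
        dest = parent[dest]
    return nodes[::-1]

def pick_next_hop(graph, source, dest, inactive, forbidden):
    """Use the full SPT unless its path to dest crosses a router forbidden
    for dest; in that case fall back to the SPT without any inactive router."""
    first = path_to(spt(graph, source), dest)
    if first is not None and not forbidden.get(dest, set()) & set(first):
        return first[1]
    second = path_to(spt(graph, source, excluded=frozenset(inactive)), dest)
    return second[1] if second is not None else None

# Hypothetical topology with symmetric costs; R1 and R2 are inactive.
graph = {
    "S": {"R1": 1, "R2": 1, "Y": 5},
    "R1": {"S": 1, "D1": 1},
    "R2": {"S": 1, "D2": 1},
    "Y": {"S": 5, "D1": 5, "D2": 5},
    "D1": {"R1": 1, "Y": 5},
    "D2": {"R2": 1, "Y": 5},
}
inactive = {"R1", "R2"}
forbidden = {"D1": {"R1"}}  # S received avoid(R1, D1) only

hop_d1 = pick_next_hop(graph, "S", "D1", inactive, forbidden)
hop_d2 = pick_next_hop(graph, "S", "D2", inactive, forbidden)
```

In this example D1 falls back to the second tree (next hop Y), while D2 keeps using the inactive router R2 because no avoid message names it.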

85 Aman Shaikh Ph.D. Defense Multiple Inactive Routers: Loop Detection Loop detection condition for single inactive router cannot detect all loops when multiple routers are inactive Two new conditions for loop detection by neighbors –Generalization of loop detection for single inactive router Conditions can result in false positives Evaluation using realistic OSPF topology graphs with two inactive routers –Using the two conditions together eliminates most false positives (90% hit-rate), but not all...

86 Aman Shaikh Ph.D. Defense Publications Aman Shaikh, Mukul Goyal, Albert Greenberg, Raju Rajan and K.K. Ramakrishnan, An OSPF Topology Server: Design and Evaluation, IEEE J-SAC, 20(4), May Aman Shaikh and Albert Greenberg, OSPF Monitoring: Architecture, Design, and Deployment Experience, submitted to NSDI, Aman Shaikh and Albert Greenberg, Experience in Black-box OSPF Measurement, In Proc. ACM SIGCOMM IMW, pp , November 2001 Aman Shaikh, Chris Isett, Albert Greenberg, Matthew Roughan and Joel Gottlieb, A Case Study of OSPF Behavior in a Large Enterprise Network, In Proc. ACM SIGCOMM IMW, pp , November Aman Shaikh, Rohit Dube and Anujan Varma, Avoiding Instability during Graceful Shutdown of OSPF, In Proc. IEEE INFOCOM, June Aman Shaikh, Rohit Dube and Anujan Varma, Avoiding Instability during Graceful Shutdown of Multiple OSPF Routers, submitted to IEEE/ACM Transactions on Networking (ToN).