Published by Shania Ida. Modified over 9 years ago.
1
1 Aman Shaikh Ph.D. Defense Management of Routing Protocols in IP Networks Ph.D. Defense Aman Shaikh Computer Engineering, UCSC November 18, 2003
2
2 Aman Shaikh Ph.D. Defense Introduction Internet connects millions of computers –Internet is packet-switched: Each packet travels independently of the rest Routers provide connectivity –Routers forward packets so that they reach their ultimate destination Forwarding is destination-based and hop-by-hop –Router decides next-hop (i.e., neighbor router) for each packet based on its destination address Routing protocols allow routers to determine next-hop(s) for every destination
3
3 Aman Shaikh Ph.D. Defense Management of Routing Infrastructure Management of routing infrastructure is a nightmare –“Simple core (= routing infrastructure), smart edge (= end hosts)” design paradigm Internet only provides a best-effort, connectionless, unreliable service Routing is not designed with manageability in mind –Large distributed system Hundreds of routers and thousands of links in big service provider networks Variety of routing protocols –The infrastructure is evolving New services require new protocols and devices
4
4 Aman Shaikh Ph.D. Defense Dissertation Contribution Focuses on management of Open Shortest Path First (OSPF) protocol –OSPF is widely used to control routing within service provider and enterprise networks Three areas of focus –Monitoring –Characterization –Maintenance
5
5 Aman Shaikh Ph.D. Defense Monitoring Motivation: –Effective management requires sound monitoring systems Contribution: –Design and implementation of an OSPF monitor –Deployment in two commercial networks Has proved valuable for trouble-shooting and identifying impending problems in early stage Collection and archiving of OSPF data that is used for performance improvement, post-mortem analysis and further research
6
6 Aman Shaikh Ph.D. Defense Characterization Motivation: –Need sound simulation and analytical models for scalability studies, addition of new features etc... How do we parameterize these models? –Need vendor-independent benchmarking methods Contribution: –Black-box techniques for estimating OSPF processing delays within a router Has become basis for OSPF benchmarking standardization efforts –Case study of OSPF dynamics in an enterprise network
7
7 Aman Shaikh Ph.D. Defense Maintenance Motivation: –Maintenance of routers occurs fairly frequently Protocol enhancements, bug fixes, hardware/software upgrades –During maintenance, operators have to withdraw router undergoing maintenance Leads to route flapping and instability –How to perform seamless maintenance? Contribution: –I’ll Be Back (IBB) capability for OSPF Allows “router-under-maintenance” to be used for forwarding
8
8 Aman Shaikh Ph.D. Defense Outline Background –Routing and OSPF overview –Design of an IP router Monitoring –OSPF Monitor Characterization –Black-box measurements for OSPF –Case study of OSPF dynamics Maintenance –I’ll Be Back (IBB) Capability for OSPF Conclusions and future work
9
9 Aman Shaikh Ph.D. Defense Routing in the Internet Internet is a collection of Autonomous Systems (ASes) Two classes of routing protocols –IGP (Interior Gateway Protocols) Used within an AS Examples: OSPF, IS-IS, RIP, EIGRP –EGP (Exterior Gateway Protocols) Used across ASes Example: BGP [Figure: AS1 through AS5, running IGPs such as OSPF, RIP, and IS-IS internally and connected to each other by BGP]
10
10 Aman Shaikh Ph.D. Defense Overview of OSPF OSPF is a link-state protocol –Every router learns entire network topology Topology is represented as graph –Routers are vertices, links are edges –Every link is assigned weight through configuration –Every router uses Dijkstra’s single source shortest path algorithm to build its forwarding table Router builds Shortest Path Tree (SPT) with itself as root Shortest Path Calculation (SPF) –Packets are forwarded along shortest paths defined by link weights
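The SPF step described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the code of any router implementation: the function name `spf`, the dictionary-based graph format, and the three-router topology in the usage note are all assumptions made for the example.

```python
import heapq

def spf(graph, root):
    """Dijkstra's algorithm rooted at `root`.

    graph: {router: {neighbor: link_weight}}; every router must be a key.
    Returns (dist, next_hop): the cost of the shortest path to each router
    and the neighbor of `root` that the shortest path starts with.
    """
    dist = {root: 0}
    next_hop = {}
    heap = [(0, root, None)]  # (cost so far, router, first hop from root)
    done = set()
    while heap:
        cost, node, first = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry for an already-settled router
        done.add(node)
        if first is not None:
            next_hop[node] = first
        for nbr, weight in graph[node].items():
            new_cost = cost + weight
            if nbr not in done and new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                # Direct neighbors of the root are their own first hop.
                heapq.heappush(heap, (new_cost, nbr, nbr if node == root else first))
    return dist, next_hop
```

For a hypothetical triangle with weights A-B = 1, B-C = 2, A-C = 5, calling `spf(graph, "A")` reaches C through B at cost 3, so the forwarding entry for C points at B rather than at the direct but more expensive link.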
11
11 Aman Shaikh Ph.D. Defense Areas in OSPF OSPF allows domain to be divided into areas for scalability –Areas are numbered 0, 1, 2 … –Hub-and-spoke with area 0 as hub –Every link is assigned to exactly one area –Routers with links in multiple areas are called border routers [Figure: Area 1 and Area 2 attached to Area 0 through border routers]
12
12 Aman Shaikh Ph.D. Defense Summarization with Areas Each router learns –Entire topology of its attached areas –Information about subnets in remote areas and their distance from the border routers Distance = sum of link costs from border router to subnet [Figure: an OSPF domain with Area 0 (routers R1, R2, R3) and Area 1 (routers C1, C2; subnets 10.10.4.0/24 and 10.10.5.0/24) joined by border routers B1 and B2, shown next to R1’s view, in which Area 1 is reduced to the distances from B1 and B2 to each subnet]
13
13 Aman Shaikh Ph.D. Defense Link State Advertisements (LSAs) Every router describes its local connectivity in Link State Advertisements (LSAs) Router originates an LSA due to… –Change in network topology Example: link goes down or comes up –Periodic soft-state refresh Recommended value of interval is 30 minutes LSA is flooded to other routers in the domain –Flooding is reliable and hop-by-hop –Includes change and refresh LSAs –Flooding leads to duplicate copies of LSAs being received Every router stores LSAs (self-originated + received) in link-state database (= topology graph)
14
14 Aman Shaikh Ph.D. Defense Adjacency Neighbor routers (i.e., routers connected by a physical link) form an adjacency The purpose is to make sure –Link is operational and routers can communicate with each other –Neighbor routers have consistent view of network topology To avoid loops and black holes Link gets used for data forwarding only after adjacency is established Use of periodic Hellos to monitor the status of link and adjacency
15
15 Aman Shaikh Ph.D. Defense Design of an IP Router [Figure: router architecture. Control plane: a route processor (CPU) running OSPF, BGP, and RIP processes, each with its own routing calculation, plus a route manager. Data plane: interface cards that forward data packets using the Forwarding Info. Base (FIB), connected by a switching fabric]
16
16 Aman Shaikh Ph.D. Defense Outline Background Monitoring –Motivation: Effective management requires sound monitoring systems –Contribution: OSPF monitor Design –Three components and their functionality Deployment in two commercial networks –How OSPF Monitor is being used –Lessons learnt through deployment Characterization Maintenance Conclusions and future work
17
17 Aman Shaikh Ph.D. Defense OSPF Monitor: Objectives Real-time analysis of OSPF behavior –Trouble-shooting, alerting –Real-time snapshots of OSPF network topology Off-line analysis –Post-mortem analysis of recurring problems –Identify anomaly signatures and use them to predict impending problems –Allow operators to tune configurable parameters –Improve maintenance procedures –Analyze OSPF behavior in commercial networks
18
18 Aman Shaikh Ph.D. Defense Related Work Route monitoring –Commercial IP monitors Route Dynamics (IPSUM), Route Explorer (PacketDesign) –IPMON project at Sprint IS-IS and BGP listeners –RouteViews and RIPE Collects BGP updates from several networks Topology tracking –OSPF topology server [shaikh:jsac02] Evaluation and comparison of LSA-based versus SNMP-based approaches –Rocketfuel project at UW Seattle Inference of intra-domain topologies from end-to-end measurements
19
19 Aman Shaikh Ph.D. Defense Components Data collection: LSA Reflector (LSAR) –Passively collects OSPF LSAs from network –“Reflects” streams of LSAs to LSAG –Archives LSAs for analysis by OSPFScan Real-time analysis: LSA aGgregator (LSAG) –Monitors network for topology changes, LSA storms, node flaps and anomalies Off-line analysis: OSPFScan –Tools for analysis of LSA archives Post-mortem analysis of recurring problems, performance improvement, what-if analysis, OSPF dynamics
20
20 Aman Shaikh Ph.D. Defense Example [Figure: an OSPF network with Areas 0, 1, and 2. LSAR 1 and LSAR 2 receive LSAs from the network, “reflect” them to LSAG for real-time monitoring, and write them to an LSA archive; the LSA archive is replicated for off-line analysis by OSPFScan]
21
21 Aman Shaikh Ph.D. Defense How LSAR attaches to Network Host mode –Join multicast group –Adv: completely passive –Disadv: not reliable, delayed initialization of LSDB Full adjacency mode –Form full adjacency with a router –Adv: reliable, immediate initialization of LSDB –Disadv: LSAR’s instability can impact entire network Partial adjacency mode –Keep adjacency in a state that allows LSAR to receive LSAs, but does not allow data forwarding over link –Adv: reliable, LSAR’s instability does not impact entire network, immediate initialization of LSDB –Disadv: can raise alarms on the router
22
22 Aman Shaikh Ph.D. Defense LSA aGgregator (LSAG) Analyzes “reflected” LSAs from LSARs over TCP connections in real-time Generates console messages: –Changes in OSPF network topology ADJACENY COST CHANGE: rtr 10.0.0.1 (intf 10.0.0.2) rtr 10.0.0.5 old_cost 1000 new_cost 50000 area 0.0.0.0 –Node flaps RTR FLAP: rtr 10.0.0.12 no_flaps 7 flap_window 570 sec –LSA storms LSA STORM: lstype 3 lsid 10.1.0.0 advrt 10.0.0.3 area 0.0.0.0 no_lsas 7 storm_window 470 sec –Anomalous behavior TYPE-3 ROUTE FROM NON-BORDER RTR: ntw 10.3.0.0/24 rtr 10.0.0.6 area 0.0.0.0
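Feeding such messages into a higher-layer management system requires splitting them into fields. The sketch below assumes a grammar inferred only from the sample messages on this slide (a message type, a colon, then alternating key/value tokens with an optional trailing `sec` unit); the real LSAG format may differ, and the function name `parse_lsag_message` is hypothetical. It does not handle the parenthesised `(intf …)` form of the cost-change message.

```python
def parse_lsag_message(line):
    """Split an LSAG-style console message into (message_type, fields).

    Assumed grammar, inferred from the slide examples only:
        TYPE: key value key value ... [sec]
    All values are kept as strings.
    """
    kind, _, rest = line.partition(": ")
    tokens = rest.split()
    if tokens and tokens[-1] == "sec":
        tokens = tokens[:-1]  # drop the trailing unit on window lengths
    fields = dict(zip(tokens[0::2], tokens[1::2]))
    return kind, fields
```

Applied to the node-flap example, this yields the type `RTR FLAP` with the fields `rtr`, `no_flaps`, and `flap_window`, which is enough to group or prioritise messages by router.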
23
23 Aman Shaikh Ph.D. Defense OSPFScan Tools for off-line analysis of LSA archives –Parse, select (based on queries), and analyze Derivation and analysis of auxiliary information from LSA archives –LSAs indicating network topology changes –Routing table entries How OSPF routing tables evolved in response to network changes What an end-to-end path within the OSPF domain looked like at any instant –Topology changes as graph-based abstraction Vertex addition/deletion and link addition/deletion/change_weight Playback of topology change events –Essentially an LSAG playback
24
24 Aman Shaikh Ph.D. Defense Deployment Deployed in two commercial networks –Enterprise network 15 areas, 500+ routers; Ethernet-based LANs Deployed since February, 2002 LSA archive size: 10 MB/day LSAR connection: host mode –ISP network Area 0, 100+ routers; Point-to-point links Deployed since January, 2003 LSA archive size: 8 MB/day LSAR connection: partial adjacency mode
25
25 Aman Shaikh Ph.D. Defense LSAG in Day-to-day Operations Generation of alarms by feeding messages into higher layer network management systems –Correlation and grouping of messages into a single alarm –Prioritization of messages Validation of maintenance steps and monitoring the impact of these steps on network-wide OSPF behavior –Example: Operators change link weights to carry out maintenance activities A “link-audit” web-page allows operators to keep track of link weights in real-time
26
26 Aman Shaikh Ph.D. Defense Problems Caught by LSAG Equipment problem –Detected internal problems in a crucial router in enterprise network Problem manifested as episodes of OSPF adjacency flapping Configuration problem –Identified assignment of same router-ids to two routers in enterprise network OSPF implementation bug –Caught a bug in refresh algorithm of routers from a particular vendor in ISP network Bug resulted in a much faster refresh of LSAs than standards-mandated rate
27
27 Aman Shaikh Ph.D. Defense Long Term Analysis by OSPFScan LSA traffic analysis –Identified excessive duplicate LSA traffic in some areas of the enterprise network Led to root-cause analysis and preventative steps Generation of statistics –Inter-arrival time of change LSAs in the ISP network Fine-tuning configurable timers related to SPF calculation –Mean down-time and up-time for links and routers in the ISP network Assessment of reliability and availability as ISP network gears for deployment of new services
28
28 Aman Shaikh Ph.D. Defense Lessons Learnt through Deployment New tools reveal new failure modes Real networks exhibit significant activity –Maintenance and genuine problems Archive all LSAs –LSA volume is manageable Stability and reliability of monitor is extremely important Keep data collection separate from its analysis –Keep data collector as simple as possible Add functionality incrementally and through interaction with users
29
29 Aman Shaikh Ph.D. Defense Summary Three-component architecture –LSAR: LSA capture from the network –LSAG: real-time analysis of LSA stream Detection and trouble-shooting of problems –OSPFScan: off-line analysis tools for LSA archives Post-mortem analysis of recurring problems, performance improvement, what-if analysis, OSPF dynamics Deployed in two commercial networks –Has proven a valuable network management tool –“OSPF Monitor was a lifesaver” (VP of Networking, enterprise network) –When monitor caught an impending failure in an early stage
30
30 Aman Shaikh Ph.D. Defense Outline Background Monitoring Characterization –Motivation: Simulation and analytical models, benchmarking –Contributions: Black-box techniques for estimating OSPF processing delays on a router –Tasks we measure, methodology, results for Cisco and GateD Case study of OSPF dynamics in an enterprise network Maintenance Conclusions and future work
31
31 Aman Shaikh Ph.D. Defense Black-box Measurements for OSPF OSPF processing delays within a router matter! –Add up to impact convergence and stability –Guidance in tuning configurable parameters, head to head vendor comparisons, simulation models Instrumenting routing code for measuring delays is challenging –Commercial implementations are proprietary –May involve grappling with Numerous code versions, hardware platforms, and developers Use black-box measurements –Measure the timing delays using external observations –Applied to Cisco and GateD OSPF implementations
32
32 Aman Shaikh Ph.D. Defense Related Work White-box measurements for IS-IS [alaettinoglu] –SPF delays reported are comparable to results obtained by us Empirical analysis of router behavior under large BGP routing tables [chang:imw02] –Cisco and Juniper routers Benchmarking Methodology working group (bmwg) at IETF –Drafts related to OSPF benchmarking Our black-box methods are basis for some benchmark tests
33
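33 Aman Shaikh Ph.D. Defense What tasks did we measure? –LSA processing –LSA flooding –SPF calculation –FIB update [Figure: an LSA arriving at a router. The OSPF process on the route processor (CPU) processes the LSA, floods it (LSA forwarding, LS Ack), and runs the SPF calculation on its topology view; the FIB update then reaches the interface cards, which forward data packets across the switching fabric]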
34
34 Aman Shaikh Ph.D. Defense Methodology Testbed: TopTracker attached to the target router –Load emulated topology on target router –Initiate task of interest –Measure the time for task [Figure: TopTracker sends LSAs describing the emulated topology to the target router]
35
35 Aman Shaikh Ph.D. Defense Measuring Task Time 1. Use a black-box method to bracket task start and finish times 2. Subtract out intervals that precede and exceed these times: X = A - (B + C) [Figure: timeline. A spans the top bracket event to the bottom bracket event, B is the interval before the task start time, C is the interval after the task finish time, and X is the task time]
36
36 Aman Shaikh Ph.D. Defense Measuring SPF Calculation TopTracker: load desired topology, send initiator LSA, send duplicate LSA Target router: initiator LSA arrives, SPF calculation starts, SPF calculation ends, send ack for duplicate LSA TopTracker: ack for duplicate LSA arrives X = A – (B + C + D + E) Estimate the overhead = B + C + D + E [Figure: timeline between TopTracker and the target router marking the bracket A, the SPF calculation time X, and the overhead intervals B, C, D, and E]
37
37 Aman Shaikh Ph.D. Defense Estimating the Overhead Remove SPF calculation from bracket –spf_delay = 60 seconds overhead = B + C + D + E [Figure: timeline between TopTracker and the target router with the events: send initiator LSA, initiator LSA arrives, initiator LSA processing done, send duplicate LSA, duplicate LSA arrives, duplicate LSA processing done and ack sent, ack for duplicate LSA arrives, SPF calculation starts; the intervals B, C, D, and E make up the overhead]
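The arithmetic behind the bracketing method can be written down directly. The sketch below is an illustration under stated assumptions: the function name `estimate_task_time`, the use of a plain mean over repeated trials, and all the millisecond values are hypothetical and are not measurements from the dissertation. It computes X = A - overhead, where A comes from runs with the task inside the bracket and the overhead (B + C + D + E) comes from runs with the task pushed outside the bracket, for example by a large spf_delay.

```python
from statistics import mean

def estimate_task_time(bracket_with_task_ms, bracket_without_task_ms):
    """Estimate task time X from two sets of black-box measurements.

    bracket_with_task_ms:    observed bracket lengths A, task inside the bracket
    bracket_without_task_ms: observed bracket lengths with the task removed,
                             i.e. the overhead B + C + D + E
    Returns X = mean(A) - mean(overhead).
    """
    a = mean(bracket_with_task_ms)
    overhead = mean(bracket_without_task_ms)
    return a - overhead
```

With hypothetical brackets of 40, 42, and 44 ms and overhead runs of 10, 12, and 14 ms, the estimate is 30 ms. Averaging over several trials is one simple way to smooth out timing jitter; other estimators are equally possible.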
38
38 Aman Shaikh Ph.D. Defense Results Results for Cisco GSR, 7513 and GateD –For GateD, comparison of black-box results with those obtained using instrumentation (white-box) –Route processors Cisco: 200 MHz R5000 processor GateD: 500 MHz AMD-K6 processor Topology: full n x n mesh with random OSPF edge weights –n in range 10, 20, …, 100
39
39 Aman Shaikh Ph.D. Defense Results for Cisco Routers Observations –Similar results for two models –SPF calculation time is O(n^2)
40
40 Aman Shaikh Ph.D. Defense Results for GateD Observations: –Black-box over-estimates white-box measurement –Black-box captures the characteristics very well
41
41 Aman Shaikh Ph.D. Defense Summary Black-box methods for estimating OSPF processing delays –Work across wide range of time delays –Work for pure CPU bound tasks –Effective in capturing scaling –Match with white-box measurements Applied methods to Cisco GSR and 7513 –LSA processing: 100-800 microseconds –LSA flooding: 30-40 milliseconds Pacing timer is the determining factor –SPF calculation: 1-40 milliseconds O(n^2) behavior for full n x n mesh –FIB update time: 100-300 milliseconds No dependence on topology size
42
42 Aman Shaikh Ph.D. Defense Outline Background Monitoring Characterization –Motivation: Simulation and analytical models, benchmarking –Contributions: Black-box techniques for estimating OSPF processing delays on a router Case study of OSPF dynamics in an enterprise network –Enterprise network topology, categorization of LSA traffic, results Maintenance Conclusions and future work
43
43 Aman Shaikh Ph.D. Defense Case Study of OSPF Dynamics OSPF behavior in commercial networks is not well understood Understanding dynamics of LSA traffic is key to better understanding of OSPF –Bulk of OSPF processing is due to LSAs –Big impact on OSPF convergence, (in)stability Analysis of LSA archives collected by OSPF monitor in enterprise network –Focus on April, 2002 data
44
44 Aman Shaikh Ph.D. Defense Related Work Several studies focusing on BGP dynamics in the Internet –Relatively easy to collect BGP data –BGP is more complicated OSPF dynamics in a regional service provider network (MichNet) [watson:icdcs03] –One year worth of data –Several findings are similar to our observations Analysis of OSPF stability through simulations [basu:sigcomm01]
45
45 Aman Shaikh Ph.D. Defense Enterprise Network Provides customers with connectivity to applications and databases residing in data center OSPF network –15 areas, 500 routers This case study covers 8 areas, 250 routers One month: April, 2002 –Ethernet-based LANs Customers are connected via leased lines –Customer routes are injected via EIGRP into OSPF The routes are propagated via external LSAs
46
46 Aman Shaikh Ph.D. Defense Enterprise Network Topology Monitor uses host mode to receive LSAs [Figure: OSPF domain with Area 0 and Areas A, B, and C; servers hosting databases and applications; customers attached from outside the OSPF domain via EIGRP (external). Detail of Area A: border routers B1 and B2 toward Area 0, LAN 1, LAN 2, and the monitor]
47
47 Aman Shaikh Ph.D. Defense Categorizing LSA Traffic Refresh LSA traffic –Originated due to periodic soft-state refresh –Forms base-line LSA traffic –Can be predicted using configuration information Change LSA traffic –Originated due to changes in network topology E.g, link goes down/comes up –Allows detection of anomalies and problems Duplicate LSA traffic –Received due to redundancy in flooding –Overhead -- wastes resources
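One way to separate the three categories in an LSA trace is to compare each received LSA with the last instance stored for the same LSA. The rules below are a simplified assumption made for illustration, not the classifier used in the case study: an LSA with an already-seen sequence number counts as a duplicate, a new sequence number with unchanged contents counts as a refresh, and anything else (including the first sighting) counts as a change. The function name, the tuple key, and the string bodies are hypothetical, and sequence numbers are assumed to only increase.

```python
def classify_lsa(lsdb, key, seq, body):
    """Classify one received LSA and update the database `lsdb` in place.

    lsdb: {key: (seq, body)} holding the latest instance of each LSA
    key:  identifies the LSA, e.g. (ls_type, ls_id, advertising_router)
    seq:  LS sequence number of the received instance
    body: the LSA contents, in any comparable form
    """
    prev = lsdb.get(key)
    if prev is not None and prev[0] == seq:
        return "duplicate"  # same instance received again via flooding
    lsdb[key] = (seq, body)
    if prev is not None and prev[1] == body:
        return "refresh"    # new instance, identical contents
    return "change"         # first sighting or different contents
```

Replaying an archive through this function gives per-category counts that can be bucketed by day and by area.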
48
48 Aman Shaikh Ph.D. Defense LSA Traffic in Different Areas [Figure: four panels (Area 0, Area 2, Area 3, Area 4) plotting refresh, change, and duplicate LSAs against days. Annotations: an artifact caused by the 23-hour day (Apr 7) and a genuine anomaly]
49
49 Aman Shaikh Ph.D. Defense Baseline LSA Traffic: Refresh LSAs Refresh LSA traffic can be reliably predicted using router configuration files –Important for workload generation [Figure: refresh LSA traffic against days for Area 2 and Area 3]
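A back-of-the-envelope version of this prediction follows from the refresh interval alone. The sketch assumes an idealized network in which every LSA is refreshed exactly once per interval (30 minutes is the recommended value quoted earlier in the talk) and in which the number of LSAs in an area has already been derived from the configuration files; the function name and the count of 100 LSAs are hypothetical.

```python
def predicted_refresh_lsas(num_lsas, hours=24, refresh_interval_min=30):
    """Expected refresh LSAs in a window of `hours`, assuming each of the
    `num_lsas` LSAs is refreshed exactly once per refresh interval."""
    return num_lsas * (hours * 60) // refresh_interval_min
```

For 100 LSAs this gives 4800 refresh LSAs on a normal day and 4600 on a 23-hour day, which is the kind of dip a daylight-saving change would produce in a per-day count. Routers whose refresh timers drift or deviate from the configured interval would shift the observed numbers away from this estimate.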
50
50 Aman Shaikh Ph.D. Defense Refresh process is not synchronized No evidence of synchronization –Contrary to simulation-based study [basu:sigcomm01] Reasons –Changes in the topology help break synchronization –LSA refresh at one router is not coupled with LSA refresh at other routers –Drift in the refresh interval of different routers
51
51 Aman Shaikh Ph.D. Defense Change LSAs Internal to OSPF domain versus external –Change LSAs due to external events dominated –Not surprising due to large number of leased lines and import of customer routes into OSPF Customer volatility → network volatility [Figure: change LSAs against days]
52
52 Aman Shaikh Ph.D. Defense Root Causes of Change LSAs Persistent problem → flapping → numerous change LSAs –Internal LSA spikes → hardware router problems OSPF monitor identified a problem (not visible to other network management tools) early and led to preventive maintenance –External LSA spikes → customer route volatility Overload of an external link to a customer between 9 PM and 3 AM caused EIGRP session to flap [Figure annotation: link flaps]
53
53 Aman Shaikh Ph.D. Defense Overhead: Duplicate LSAs Why do some areas witness substantial duplicate LSA traffic, while other areas do not witness any? –OSPF flooding over LANs leads to control plane asymmetries and to imbalances in duplicate LSA traffic [Figure: duplicate LSAs against days]
54
54 Aman Shaikh Ph.D. Defense Summary Refresh LSAs: constituted bulk of overall LSA traffic –No evidence of synchronization between different routers –Refresh LSA traffic predictable from configuration information Change LSAs: mostly indicated persistent yet partial failure modes –Internal LSA spikes → hardware router problems → preventive router maintenance –External LSA spikes → customer congestion problems → “preventive” customer care Duplicate LSAs: arose from control plane asymmetries –Simple configuration changes could eliminate duplicate LSAs and improve performance
55
55 Aman Shaikh Ph.D. Defense Outline Background Monitoring Characterization Maintenance –Motivation: Seamless maintenance and upgrades of routers –Minimal instability and flaps –Contribution: I’ll Be Back (IBB) capability for OSPF –What IBB capability provides, how capability is implemented, performance analysis Conclusions and future work
56
56 Aman Shaikh Ph.D. Defense Maintenance is a Pain Maintenance of routers is a way of life in commercial networks –Extensions to routing protocols, new functionality, hardware and software upgrades, bug fixes Maintenance is a painful exercise –During maintenance, operators withdraw “router-under-maintenance” from forwarding service Leads to route flaps, traffic disruption and instability –Operators have to carefully schedule maintenance Schedule them during night when load is moderate Stagger maintenance of different routers across time
57
57 Aman Shaikh Ph.D. Defense We can do better Observation: router can continue forwarding even while its routing process is inactive, at least for a while –Current routers have separate routing and forwarding paths Routing in software (CPU) Forwarding in hardware (switching) Need to extend routing protocols since they always try to route around inactive router –Our proposal: IBB (I’ll Be Back) extensions to OSPF
58
58 Aman Shaikh Ph.D. Defense IBB Proposal in a Nutshell OSPF process on router R needs to be shut down Before shutdown, R informs other routers that it is going to be inactive for a while R specifies a time period (IBB Timeout) by which it expects to become operational again Other routers continue using R for forwarding during IBB Timeout period If R comes back within IBB Timeout period, no routing instability or flaps Else other routers start forwarding packets around R
59
59 Aman Shaikh Ph.D. Defense Related Work Graceful restart proposals for various routing protocols at IETF –Graceful restart proposal for OSPF by John Moy Alex Zinin’s proposal to avoid flaps upon restart of OSPF process –Process has to come up before other routers notice it was shut down –Provides small window of opportunity Use of redundant route processors and seamless transfer of control –NSR (Avici), High Availability Initiative (Cisco)
60
60 Aman Shaikh Ph.D. Defense What if topology changes R cannot update its forwarding table to reflect the change –Can lead to loops or black holes [Figure: routers A, B, and R. (a) Topology when R went down, with link weights 3, 2, and 6. (b) Topology changes while R is inactive: the weight 3 becomes 10]
61
61 Aman Shaikh Ph.D. Defense Handling Changes: Three Options Don’t do anything Stop using R: John Moy’s proposal –Inadvertent changes during upgrade are likely Example: flapping due to a bad interface somewhere –But all changes are not bad Do not always lead to loops or black holes Stop using R only when loop or black hole gets formed –And only for destinations for which there is a problem Our approach
62
62 Aman Shaikh Ph.D. Defense Roadmap of Algorithm Single area, single inactive router case –Loop formation –Black hole formation Single area, multiple inactive routers case –Loop formation Multiple areas –Black hole formation and area partitions
63
63 Aman Shaikh Ph.D. Defense Single Area, Single Inactive Router Problem Formulation –Inactive Router = R –All routers other than R have the same image of the topology graph –R’s image is that of the past, i.e., the time at which it went down –Source = S, Destination = D –Next hop(R, D) = Y –Actual path a packet takes from S to D = P(S → D)
64
64 Aman Shaikh Ph.D. Defense Loop Detection P(S → D) has a loop iff S and Y have R on their paths to D in their SPTs If there is a loop, neighbor can always detect it [Figure: routers S, Y, R, and D. Panels: topology when R went down; topology changes while R is inactive (a link weight rises from 3 to 10); the SPTs of S and Y, both of which have R on their paths to D]
65
65 Aman Shaikh Ph.D. Defense Loop Prevention Every router needs to calculate a path to D such that R does not appear on it [Figure: changed topology while R is inactive; S and Y calculate paths to D w/o R on them]
66
66 Aman Shaikh Ph.D. Defense Loop Avoidance Procedure R sends forwarding table to neighbors before shutdown - Thus, Y knows that next hop(R, D) is Y Detection: during SPF calculation neighbors detect loops - Y checks if R exists on the path to D or not Upon detection, neighbors send avoid messages to other routers in the domain - avoid(R, D) = avoid using R for reaching D Prevention: upon receiving avoid(R, D) message, other routers calculate a new path to D without R on it
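The detection and prevention steps can be sketched for the single-area, single-inactive-router case. This is an illustration under assumptions, not the IBB implementation in GateD: the helper names `shortest_path` and `detects_loop`, the dictionary graph format, and the three-router topology in the usage note are hypothetical. Detection is the check that Y performs: R's frozen forwarding table sends traffic for D to Y, and Y's own shortest path to D runs through R. Prevention recomputes the path with R excluded.

```python
import heapq

def shortest_path(graph, src, dst, avoid=frozenset()):
    """Dijkstra returning the router list of a shortest src -> dst path,
    or None if dst is unreachable. Routers in `avoid` are never used.
    graph: {router: {neighbor: link_weight}}."""
    heap = [(0, src, [src])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in done:
            continue
        done.add(node)
        for nbr, weight in graph[node].items():
            if nbr not in done and nbr not in avoid:
                heapq.heappush(heap, (cost + weight, nbr, path + [nbr]))
    return None

def detects_loop(graph, y, r, d, r_frozen_next_hop):
    """Check run by neighbor y of the inactive router r for destination d.

    r_frozen_next_hop: the forwarding table r sent before shutdown, {dest: next_hop}.
    A loop exists if r forwards traffic for d to y while y's current
    shortest path to d goes through r.
    """
    path = shortest_path(graph, y, d)
    return r_frozen_next_hop.get(d) == y and path is not None and r in path
```

In a hypothetical topology with R-Y = 1, R-D = 10, and Y-D raised from 1 to 20 while R is inactive, Y's new shortest path to D is Y, R, D, but R still forwards to Y, so `detects_loop` returns True. Y would then announce avoid(R, D), and routers receiving it would call `shortest_path(graph, src, "D", avoid={"R"})` to obtain a path without R.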
67
67 Aman Shaikh Ph.D. Defense Performance Maximum effect on SPF calculation –Quantify overhead –Impact of topology size Prototype Implementation –IBB extensions incorporated into GateD 4.0.7
68
68 Aman Shaikh Ph.D. Defense Testbed Setup SUT = System Under Test, where IBB overhead is measured [Figure: physical topology: SUT and TopTracker on a LAN, with TopTracker sending LSAs to SUT. SUT’s view of the topology: the emulated topology, in which R (the router under maintenance) and a complete graph with n nodes (M1, …) are reachable over the LAN, with link weights of 1 and 20 shown]
69
69 Aman Shaikh Ph.D. Defense Experiment Sequence Time (mins), for GateD on SUT and for IBB-GateD on SUT: –T = 0: Bring R down (GateD); bring R down in IBB mode (IBB-GateD) –T = 4: Send avoid(R, M_j) messages to SUT (1 ≤ j ≤ n) –T = 8: Bring R up Case A: inactive rtr Case B: inactive rtr, avoid it Overhead = (mean SPF time in Case B) / (mean SPF time in Case A)
70
70 Aman Shaikh Ph.D. Defense Result Overhead remains constant at roughly 2.0 as n increases Sources of overhead: –Second SPF calculation –Graph in case B is larger than graph in case A
71
71 Aman Shaikh Ph.D. Defense Summary IBB proposal: extend OSPF so that a router can be used for forwarding even while its OSPF process is inactive Main contribution: algorithm that gracefully handles topology changes –Stops using the inactive router for a destination if using the router can lead to loops or black holes –Overhead of the algorithm is modest Shows good scaling behavior in terms of topology size
72
72 Aman Shaikh Ph.D. Defense Outline Background Monitoring Characterization Maintenance Conclusions and future work
73
73 Aman Shaikh Ph.D. Defense Conclusions Monitoring –Design and implementation of an OSPF monitor –Deployment in two commercial networks Characterization –Black-box techniques for estimating OSPF processing delays within a router –Case study of OSPF dynamics in enterprise network Maintenance –I’ll Be Back (IBB) capability for OSPF that allows a “router-under-maintenance” to be used for forwarding
74
74 Aman Shaikh Ph.D. Defense Future Work Three principal directions for future work –Application of this work to other routing protocols IS-IS is very similar to OSPF EIGRP, RIP and BGP bring their own set of challenges –Distance-vector nature of the protocols –BGP also brings scalability issues –Other areas related to routing and network management Security, network design, configuration management, simulation & modeling How performance of routing infrastructure affects user-perceived performance –More work in each of three focus areas
75
75 Aman Shaikh Ph.D. Defense Future Work for Monitoring Real-time analysis –More meaningful alerting Correlation with other fault and performance data Learn from past events –Prioritization of alerts Off-line analysis –Correlation with other data sources Work already underway: BGP, fault, performance –Identification of problem signatures and feeding them into real-time component for problem prediction
76
76 Aman Shaikh Ph.D. Defense Future Work for Characterization Expand measurements to cover other router vendors and commercial networks Use results to build simulation and analytical models –Validation of models
77
77 Aman Shaikh Ph.D. Defense Future Work for Maintenance Improvements to IBB scheme –Incremental deployment –Reduction in overhead How to use IBB-like schemes in conjunction with other approaches –Routing software that can be upgraded without bringing the process down –Use of redundant route processors and seamless transfer of control –Scheduling maintenance task such that they have minimal impact
78
78 Aman Shaikh Ph.D. Defense Holy Grail Networks that manage themselves!
79
79 Aman Shaikh Ph.D. Defense Grill me... Probably your last chance… :-) Q and A
80
80 Aman Shaikh Ph.D. Defense Backups
81
81 Aman Shaikh Ph.D. Defense Partial Adjacency for LSAR –LSAR does not originate any LSAs –LSAR–R link not used for data forwarding LSAR does not install any routes in forwarding table Router R does not advertise a link to LSAR –Routers (except R) not aware of the presence of LSAR –Does not trigger SPF calculations in network –LSAR’s going up/down does not impact the network [Figure: LSAR and router R held in a partial adjacency state, with the exchange “I have LSA L”, “Please send me LSA L”, and “I need LSA L from LSAR”]
82
82 Aman Shaikh Ph.D. Defense Multiple Inactive Routers for IBB Loop Avoidance –Change in loop detection conditions –Simplification for loop prevention No change in black-hole detection
83
83 Aman Shaikh Ph.D. Defense Loop Avoidance Set of inactive routers: R_1, R_2, …, R_n Loop avoidance procedure applies for each inactive router –Detection Router detects loops for all its inactive neighbors –Prevention A router can get avoid(R_i, D) messages for j inactive routers (j <= n) The router avoids these j forbidden routers on its path to D Problem: Set of forbidden routers can be different for different destinations –O(n) shortest path calculations n = number of vertices
84
84 Aman Shaikh Ph.D. Defense Simplification Router avoids all inactive routers if it has some forbidden routers on its path to D –Calculate two SPTs: –SPT with all inactive routers on it –SPT w/o any inactive router on it –If the path to D does not contain any forbidden routers on it, Pick next hop for D from the first SPT –Else, Pick next hop for D from the second SPT
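The selection rule on this slide reduces to a small function once the two trees are available. In the sketch below, the function name `pick_next_hop` and the representation of each SPT result as a router list from the computing router to D are assumptions made for illustration; computing the two SPTs themselves is left out.

```python
def pick_next_hop(path_with_inactive, path_without_inactive, forbidden):
    """Choose the next hop toward D under the two-SPT simplification.

    path_with_inactive:    path to D from the SPT that may use inactive routers
    path_without_inactive: path to D from the SPT that uses no inactive router
    forbidden:             routers named in avoid(R_i, D) messages for this D
    Both paths are lists that start at the computing router and end at D.
    """
    chosen = path_with_inactive
    if any(router in forbidden for router in path_with_inactive[1:]):
        chosen = path_without_inactive  # fall back to the tree without inactive routers
    return chosen[1]
```

The point of the simplification is that two trees suffice for all destinations, instead of a separate shortest-path calculation for every distinct forbidden set; the price is that a router sometimes avoids inactive routers that were not actually forbidden for that destination.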
85
85 Aman Shaikh Ph.D. Defense Multiple Inactive Routers: Loop Detection Loop detection condition for single inactive router cannot detect all loops when multiple routers are inactive Two new conditions for loop detection by neighbors –Generalization of loop detection for single inactive router Conditions can result in false positives Evaluation using realistic OSPF topology graphs with two inactive routers –Using the two conditions together eliminates most false positives (90% hit-rate), but not all...
86
86 Aman Shaikh Ph.D. Defense Publications Aman Shaikh, Mukul Goyal, Albert Greenberg, Raju Rajan and K.K. Ramakrishnan, An OSPF Topology Server: Design and Evaluation, IEEE J-SAC, 20(4), May 2002. Aman Shaikh and Albert Greenberg, OSPF Monitoring: Architecture, Design, and Deployment Experience, submitted to NSDI, 2004. Aman Shaikh and Albert Greenberg, Experience in Black-box OSPF Measurement, In Proc. ACM SIGCOMM IMW, pp. 113-125, November 2001. Aman Shaikh, Chris Isett, Albert Greenberg, Matthew Roughan and Joel Gottlieb, A Case Study of OSPF Behavior in a Large Enterprise Network, In Proc. ACM SIGCOMM IMW, pp. 217-230, November 2002. Aman Shaikh, Rohit Dube and Anujan Varma, Avoiding Instability during Graceful Shutdown of OSPF, In Proc. IEEE INFOCOM, June 2002. Aman Shaikh, Rohit Dube and Anujan Varma, Avoiding Instability during Graceful Shutdown of Multiple OSPF Routers, submitted to IEEE/ACM Transactions on Networking (ToN).