1
Management of Routing Protocols in IP Networks
Ph.D. Defense, Aman Shaikh, Computer Engineering, UCSC, November 18, 2003
2
Introduction Internet connects millions of computers
Internet is packet-switched: Each packet travels independently of the rest Routers provide connectivity Routers forward packets so that they reach their ultimate destination Forwarding is destination-based and hop-by-hop Router decides next-hop (i.e., neighbor router) for each packet based on its destination address Routing protocols allow routers to determine next-hop(s) for every destination Ph.D. Defense
3
Management of Routing Infrastructure
Management of routing infrastructure is a nightmare “Simple core (= routing infrastructure), smart edge (= end hosts)” design paradigm Internet only provides a best-effort, connectionless, unreliable service Routing is not designed with manageability in mind Large distributed system Hundreds of routers and thousands of links in big service provider networks Variety of routing protocols The infrastructure is evolving New services require new protocols and devices Ph.D. Defense
4
Dissertation Contribution
Focuses on management of Open Shortest Path First (OSPF) protocol OSPF is widely used to control routing within service provider and enterprise networks Three areas of focus Monitoring Characterization Maintenance Ph.D. Defense
5
Monitoring
Motivation:
  Effective management requires sound monitoring systems
Contribution:
  Design and implementation of an OSPF monitor
  Deployment in two commercial networks
    Has proved valuable for troubleshooting and for identifying impending problems at an early stage
  Collection and archiving of OSPF data used for performance improvement, post-mortem analysis, and further research
6
Characterization
Motivation:
  Need sound simulation and analytical models for scalability studies, addition of new features, etc.
    How do we parameterize these models?
  Need vendor-independent benchmarking methods
Contribution:
  Black-box techniques for estimating OSPF processing delays within a router
    Has become the basis for OSPF benchmarking standardization efforts
  Case study of OSPF dynamics in an enterprise network
7
Maintenance
Motivation:
  Maintenance of routers occurs fairly frequently
    Protocol enhancements, bug fixes, hardware/software upgrades
  During maintenance, operators have to withdraw the router undergoing maintenance
    Leads to route flapping and instability
  How to perform seamless maintenance?
Contribution:
  I’ll Be Back (IBB) capability for OSPF
    Allows the “router-under-maintenance” to be used for forwarding
8
Outline
Background
  Routing and OSPF overview
  Design of an IP router
Monitoring
  OSPF Monitor
Characterization
  Black-box measurements for OSPF
  Case study of OSPF dynamics
Maintenance
  I’ll Be Back (IBB) Capability for OSPF
Conclusions and future work
9
Routing in the Internet
[Figure: five autonomous systems (AS1 to AS5) interconnected by BGP; the IGPs shown inside the ASes are OSPF, IS-IS, and RIP]
Internet is a collection of Autonomous Systems (ASes)
Two classes of routing protocols
  IGP (Interior Gateway Protocols)
    Used within an AS
    Examples: OSPF, IS-IS, RIP, EIGRP
  EGP (Exterior Gateway Protocols)
    Used across ASes
    Example: BGP
10
Overview of OSPF
OSPF is a link-state protocol
  Every router learns the entire network topology
  Topology is represented as a graph
    Routers are vertices, links are edges
    Every link is assigned a weight through configuration
Every router uses Dijkstra’s single-source shortest-path algorithm to build its forwarding table
  Router builds a Shortest Path Tree (SPT) with itself as root
    Shortest Path Calculation (SPF)
  Packets are forwarded along shortest paths defined by link weights
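A minimal sketch of the shortest-path calculation described on this slide, assuming the link-state database has already been reduced to a weighted adjacency map. The function name `spf`, the graph layout, and the example routers and weights are illustrative and are not taken from any router implementation.

```python
import heapq

def spf(graph, root):
    """Dijkstra's algorithm rooted at `root`.

    graph: dict mapping router -> {neighbor: link_weight} (hypothetical layout).
    Returns (dist, first_hop): cost to each reachable router, and the
    neighbor of `root` through which that router is reached.
    """
    dist = {root: 0}
    first_hop = {}
    # Heap entries: (cost so far, router, first hop used when leaving the root).
    heap = [(0, root, None)]
    done = set()
    while heap:
        cost, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale entry; a cheaper path was already finalized
        done.add(node)
        if hop is not None:
            first_hop[node] = hop
        for nbr, weight in graph[node].items():
            new_cost = cost + weight
            if nbr not in done and new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                # Direct neighbors of the root are their own first hop.
                heapq.heappush(heap, (new_cost, nbr, nbr if node == root else hop))
    return dist, first_hop

# Hypothetical three-router topology.
example = {
    "R1": {"R2": 1, "R3": 5},
    "R2": {"R1": 1, "R3": 2},
    "R3": {"R1": 5, "R2": 2},
}
distances, next_hops = spf(example, "R1")
```

In this example R1 reaches R3 through R2 (cost 3) rather than over the direct link (cost 5), so the forwarding entry for R3 points at R2.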
11
Areas in OSPF OSPF allows domain to be divided into areas for scalability Areas are numbered 0, 1, 2 … Hub-and-spoke with area 0 as hub Every link is assigned to exactly one area Routers with links in multiple areas are called border routers Border routers Area 1 Area 2 Area 0 Ph.D. Defense
12
Summarization with Areas
Each router learns
  Entire topology of its attached areas
  Information about subnets in remote areas and their distance from the border routers
    Distance = sum of link costs from border router to subnet
[Figure: an OSPF domain with Area 0 (R1, R2, R3) and Area 1 (C1, C2, two /24 subnets) joined by border routers B1 and B2, shown next to R1’s view, in which Area 1 appears only as the two /24 subnets with their distances from B1 and B2]
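The summarization rule above can be sketched as follows: a router inside Area 0 adds its own cost to each border router to the distance that border router advertises for the remote subnet, and keeps the cheapest total. The function name and all numbers are hypothetical; they are not the values from the slide's figure.

```python
def best_inter_area_route(cost_to_border, advertised):
    """Choose the border router giving the lowest total cost to a remote subnet.

    cost_to_border: {border_router: cost from this router to the border router}
    advertised:     {border_router: distance from the border router to the subnet}
    Returns (border_router, total_cost).
    """
    totals = {
        border: cost_to_border[border] + distance
        for border, distance in advertised.items()
        if border in cost_to_border  # ignore border routers we cannot reach
    }
    best = min(totals, key=totals.get)
    return best, totals[best]

# Hypothetical costs: B2 advertises a shorter distance, but B1 is much closer.
route = best_inter_area_route({"B1": 100, "B2": 200}, {"B1": 70, "B2": 10})
```

The router never sees the links inside the remote area; it only needs the advertised distances, which is what keeps the per-area topology small.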
13
Link State Advertisements (LSAs)
Every router describes its local connectivity in Link State Advertisements (LSAs) Router originates an LSA due to… Change in network topology Example: link goes down or comes up Periodic soft-state refresh Recommended value of interval is 30 minutes LSA is flooded to other routers in the domain Flooding is reliable and hop-by-hop Includes change and refresh LSAs Flooding leads to duplicate copies of LSAs being received Every router stores LSAs (self-originated + received) in link-state database (= topology graph) Ph.D. Defense
14
Adjacency Neighbor routers (i.e., routers connected by a physical link) form an adjacency The purpose is to make sure Link is operational and routers can communicate with each other Neighbor routers have consistent view of network topology To avoid loops and black holes Link gets used for data forwarding only after adjacency is established Use of periodic Hellos to monitor the status of link and adjacency Ph.D. Defense
15
Design of an IP Router
[Figure: router architecture. Control plane: a route processor (CPU) running OSPF, BGP, and RIP processes, each with its own routing calculation, feeding a Route Manager. Data plane: the Forwarding Information Base (FIB), interface cards that forward data packets, and the switching fabric]
16
Outline
Background
Monitoring
  Motivation: effective management requires sound monitoring systems
  Contribution: OSPF monitor
    Design
      Three components and their functionality
    Deployment in two commercial networks
      How OSPF Monitor is being used
      Lessons learnt through deployment
Characterization
Maintenance
Conclusions and future work
17
OSPF Monitor: Objectives
Real-time analysis of OSPF behavior Trouble-shooting, alerting Real-time snapshots of OSPF network topology Off-line analysis Post-mortem analysis of recurring problems Identify anomaly signatures and use them to predict impending problems Allow operators to tune configurable parameters Improve maintenance procedures Analyze OSPF behavior in commercial networks Ph.D. Defense
18
Related Work Route monitoring Topology tracking Commercial IP monitors
Route Dynamics (IPSUM), Route Explorer (PacketDesign) IPMON project at Sprint IS-IS and BGP listeners RouteViews and RIPE Collects BGP updates from several networks Topology tracking OSPF topology server [shaikh:jsac02] Evaluation and comparison of LSA-based versus SNMP-based approaches Rocketfuel project at UW Seattle Inference of intra-domain topologies from end-to-end measurements Ph.D. Defense
19
Components
Data collection: LSA Reflector (LSAR)
  Passively collects OSPF LSAs from the network
  “Reflects” streams of LSAs to LSAG
  Archives LSAs for analysis by OSPFScan
Real-time analysis: LSA aGgregator (LSAG)
  Monitors network for topology changes, LSA storms, node flaps, and anomalies
Off-line analysis: OSPFScan
  Tools for analysis of LSA archives
  Post-mortem analysis of recurring problems, performance improvement, what-if analysis, OSPF dynamics
20
Example
[Figure: an OSPF network with Area 0, Area 1, and Area 2. LSAR 1 and LSAR 2 collect LSAs from the areas, write them to LSA archives (which are replicated), and “reflect” them to LSAG for real-time monitoring; OSPFScan reads the archives for off-line analysis]
21
How LSAR attaches to Network
Host mode Join multicast group Adv: completely passive Disadv: not reliable, delayed initialization of LSDB Full adjacency mode Form full adjacency with a router Adv: reliable, immediate initialization of LSDB Disadv: LSAR’s instability can impact entire network Partial adjacency mode Keep adjacency in a state that allows LSAR to receive LSAs, but does not allow data forwarding over link Adv: reliable, LSAR’s instability does not impact entire network, immediate initialization of LSDB Disadv: can raise alarms on the router Ph.D. Defense
22
LSA aGgregator (LSAG)
Analyzes “reflected” LSAs from LSARs over TCP connections in real time
Generates console messages:
  Changes in OSPF network topology
    ADJACENCY COST CHANGE: rtr (intf ) rtr old_cost 1000 new_cost area
  Node flaps
    RTR FLAP: rtr no_flaps 7 flap_window 570 sec
  LSA storms
    LSA STORM: lstype 3 lsid advrt area no_lsas 7 storm_window 470 sec
  Anomalous behavior
    TYPE-3 ROUTE FROM NON-BORDER RTR: ntw /24 rtr area
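The slides do not say how LSAG decides that a router is flapping. One plausible approach, sketched here purely as an assumption, is a sliding-window counter per router that emits a message shaped like the RTR FLAP line above. The class name, thresholds, and router id are hypothetical.

```python
from collections import deque

class FlapDetector:
    """Sliding-window flap counter (illustrative; not LSAG's actual logic)."""

    def __init__(self, min_flaps, window_sec):
        self.min_flaps = min_flaps    # flaps needed before a message is raised
        self.window_sec = window_sec  # only flaps this recent are counted
        self.events = {}              # router id -> deque of flap timestamps

    def record(self, router, timestamp):
        """Record one flap; return a message once the threshold is reached, else None."""
        times = self.events.setdefault(router, deque())
        times.append(timestamp)
        # Drop flaps that have fallen out of the window.
        while timestamp - times[0] > self.window_sec:
            times.popleft()
        if len(times) >= self.min_flaps:
            span = timestamp - times[0]
            return f"RTR FLAP: rtr {router} no_flaps {len(times)} flap_window {span} sec"
        return None

detector = FlapDetector(min_flaps=3, window_sec=600)
first = detector.record("r1", 0)
second = detector.record("r1", 100)
third = detector.record("r1", 250)
```

Keeping only timestamps per router makes the state small, which fits the goal of a real-time component that must stay stable under LSA storms.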
23
OSPFScan
Tools for off-line analysis of LSA archives
  Parse, select (based on queries), and analyze
Derivation and analysis of auxiliary information from LSA archives
  LSAs indicating network topology changes
  Routing table entries
    How OSPF routing tables evolved in response to network changes
    What an end-to-end path within the OSPF domain looked like at any instant
  Topology changes as graph-based abstraction
    Vertex addition/deletion and link addition/deletion/change_weight
Playback of topology change events
  Essentially an LSAG playback
24
Deployment
Deployed in two commercial networks
Enterprise network
  15 areas, 500+ routers; Ethernet-based LANs
  Deployed since February 2002
  LSA archive size: 10 MB/day
  LSAR connection: host mode
ISP network
  Area 0, 100+ routers; point-to-point links
  Deployed since January 2003
  LSA archive size: 8 MB/day
  LSAR connection: partial adjacency mode
25
LSAG in Day-to-day Operations
Generation of alarms by feeding messages into higher layer network management systems Correlation and grouping of messages into a single alarm Prioritization of messages Validation of maintenance steps and monitoring the impact of these steps on network-wide OSPF behavior Example: Operators change link weights to carry out maintenance activities A “link-audit” web-page allows operators to keep track of link weights in real-time Ph.D. Defense
26
Problems Caught by LSAG
Equipment problem Detected internal problems in a crucial router in enterprise network Problem manifested as episodes of OSPF adjacency flapping Configuration problem Identified assignment of same router-ids to two routers in enterprise network OSPF implementation bug Caught a bug in refresh algorithm of routers from a particular vendor in ISP network Bug resulted in a much faster refresh of LSAs than standards-mandated rate Ph.D. Defense
27
Long Term Analysis by OSPFScan
LSA traffic analysis Identified excessive duplicate LSA traffic in some areas of the enterprise network Led to root-cause analysis and preventative steps Generation of statistics Inter-arrival time of change LSAs in the ISP network Fine-tuning configurable timers related to SPF calculation Mean down-time and up-time for links and routers in the ISP network Assessment of reliability and availability as ISP network gears for deployment of new services Ph.D. Defense
28
Lessons Learnt through Deployment
New tools reveal new failure modes Real networks exhibit significant activity Maintenance and genuine problems Archive all LSAs LSA volume is manageable Stability and reliability of monitor is extremely important Keep data collection separate from its analysis Keep data collector as simple as possible Add functionality incrementally and through interaction with users Ph.D. Defense
29
Summary Three component architecture
LSAR: LSA capture from the network LSAG: real-time analysis of LSA stream Detection and trouble-shooting of problems OSPFScan: off-line analysis tools for LSA archives Post-mortem analysis of recurring problems, performance improvement, what-if analysis, OSPF dynamics Deployed in two commercial networks Has proven a valuable network management tool “OSPF Monitor was a lifesaver” VP of Networking, Enterprise network When monitor caught an impending failure in an early stage Ph.D. Defense
30
Outline
Background
Monitoring
Characterization
  Motivation: simulation and analytical models, benchmarking
  Contributions:
    Black-box techniques for estimating OSPF processing delays on a router
      Tasks we measure, methodology, results for Cisco and GateD
    Case study of OSPF dynamics in an enterprise network
Maintenance
Conclusions and future work
31
Black-box Measurements for OSPF
OSPF processing delays within a router matter! Add up to impact convergence and stability Guidance in tuning configurable parameters, head to head vendor comparisons, simulation models Instrumenting routing code for measuring delays is challenging Commercial implementations are proprietary May involve grappling with Numerous code versions, hardware platforms, and developers Use black-box measurements Measure the timing delays using external observations Applied to Cisco and GateD OSPF implementations Ph.D. Defense
32
Related Work White-box measurements for IS-IS [alaettinoglu]
SPF delays reported are comparable to results obtained by us Empirical analysis of router behavior under large BGP routing tables [chang:imw02] Cisco and Juniper routers Benchmarking Methodology working group (bmwg) at IETF Drafts related to OSPF benchmarking Our black-box methods are basis for some benchmark tests Ph.D. Defense
33
What tasks did we measure?
[Figure: tasks measured on the target router. Within the OSPF process on the route processor (CPU): LSA processing, LSA flooding (with LS Ack), and SPF calculation over the topology view; then the FIB update towards the forwarding engines on the interface cards, which forward data packets through the switching fabric]
34
Methodology
[Figure: testbed in which TopTracker sends LSAs to the target router]
Load emulated topology on target router
Initiate task of interest
Measure the time for task
35
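Measuring Task Time
Use a black-box method to bracket task start and finish times
Subtract out intervals that precede and exceed these times
[Figure: timeline showing a top bracket event, the task start time, the task finish time, and a bottom bracket event. A is the interval between the two bracket events, B the interval before the task starts, X the task itself, and C the interval after the task finishes]
X = A - (B + C)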
36
Measuring SPF Calculation
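[Figure: message timeline between TopTracker and the target router]
TopTracker events: load desired topology; send initiator LSA; send duplicate LSA; ack for duplicate LSA arrives
Target router events: initiator LSA arrives; SPF calculation starts; SPF calculation ends; send ack for duplicate LSA
Intervals marked on the timeline: A (the whole bracket), X (the SPF calculation), and B, C, D, E (the overhead)
X = A - (B + C + D + E)
Estimate the overhead = B + C + D + E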
37
Estimating the Overhead
Remove SPF calculation from bracket
  spf_delay = 60 seconds
[Figure: message timeline between TopTracker and the target router with the SPF calculation pushed outside the bracket]
TopTracker events: send initiator LSA; send duplicate LSA; ack for duplicate LSA arrives
Target router events: initiator LSA arrives; initiator LSA processing done; duplicate LSA arrives; duplicate LSA processing done, send ack; SPF calculation starts
overhead = B + C + D + E
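The subtraction X = A - (B + C + D + E) can be sketched as a small helper that combines two sets of runs: bracket times measured with the SPF calculation inside the bracket, and overhead times measured with it pushed outside by a long spf_delay. Averaging over repeated runs is an assumption of this sketch, as are the function name and the sample values.

```python
from statistics import mean

def estimate_task_time(bracket_samples, overhead_samples):
    """Estimate X = A - overhead from repeated measurements.

    bracket_samples:  measured values of A (task inside the bracket)
    overhead_samples: measured values of B + C + D + E (task removed from the bracket)
    Using the mean of each set is an assumption, not the author's stated method.
    """
    return mean(bracket_samples) - mean(overhead_samples)

# Hypothetical measurements in milliseconds.
spf_time_ms = estimate_task_time([12, 13, 11], [2, 3, 1])
```

Because both sets of runs share the same messaging and LSA-processing steps, whatever those steps cost cancels out in the difference, which is what lets the method work without access to the router's code.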
38
Results
Results for Cisco GSR, 7513 and GateD
  For GateD, comparison of black-box results with those obtained using instrumentation (white-box)
Route processors
  Cisco: 200 MHz R5000 processor
  GateD: 500 MHz AMD-K6 processor
Topology: full n × n mesh with random OSPF edge weights
  n in range 10, 20, …, 100
39
Results for Cisco Routers
Observations
  Similar results for the two models
  SPF calculation time is O(n^2)
40
Results for GateD Observations:
Black-box over-estimates white-box measurement Black-box captures the characteristics very well Ph.D. Defense
41
Summary
Black-box methods for estimating OSPF processing delays
  Work across a wide range of time delays
  Work for pure CPU-bound tasks
  Effective in capturing scaling
  Match with white-box measurements
Applied methods to Cisco GSR and 7513
  LSA processing: microseconds
  LSA flooding: milliseconds
    Pacing timer is the determining factor
  SPF calculation: 1-40 milliseconds
    O(n^2) behavior for full n × n mesh
  FIB update time: milliseconds
    No dependence on topology size
42
Outline
Background
Monitoring
Characterization
  Motivation: simulation and analytical models, benchmarking
  Contributions:
    Black-box techniques for estimating OSPF processing delays on a router
    Case study of OSPF dynamics in an enterprise network
      Enterprise network topology, categorization of LSA traffic, results
Maintenance
Conclusions and future work
43
Case Study of OSPF Dynamics
OSPF behavior in commercial networks is not well understood Understanding dynamics of LSA traffic is key to better understanding of OSPF Bulk of OSPF processing is due to LSAs Big impact on OSPF convergence, (in)stability Analysis of LSA archives collected by OSPF monitor in enterprise network Focus on April, 2002 data Ph.D. Defense
44
Related Work Several studies focusing on BGP dynamics in the Internet
Relatively easy to collect BGP data BGP is more complicated OSPF dynamics in a regional service provider network (MichNet) [watson:icdcs03] One year worth of data Several findings are similar to our observations Analysis of OSPF stability through simulations [basu:sigcomm01] Ph.D. Defense
45
Enterprise Network Provides customers with connectivity to applications and databases residing in data center OSPF network 15 areas, 500 routers This case study covers 8 areas, 250 routers One month: April, 2002 Ethernet-based LANs Customers are connected via leased lines Customer routes are injected via EIGRP into OSPF The routes are propagated via external LSAs Ph.D. Defense
46
Enterprise Network Topology
[Figure: enterprise network topology. Customers connect over EIGRP (external to the OSPF domain). The OSPF domain consists of Area 0 and Areas A, B, and C, with servers hosting databases and applications. The detail of Area A shows border routers B1 and B2, LAN 1 and LAN 2, and the monitor]
Monitor uses host mode to receive LSAs
47
Categorizing LSA Traffic
Refresh LSA traffic Originated due to periodic soft-state refresh Forms base-line LSA traffic Can be predicted using configuration information Change LSA traffic Originated due to changes in network topology E.g, link goes down/comes up Allows detection of anomalies and problems Duplicate LSA traffic Received due to redundancy in flooding Overhead -- wastes resources Ph.D. Defense
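One way to apply these three categories to a stream of received LSAs is to compare each instance with the last one stored for the same LSA. This is a sketch under assumptions: the key, sequence number, and body fields are simplified stand-ins for the real LSA header and contents, and the slides do not state that the study classified traffic exactly this way.

```python
def classify_lsa(db, key, seq, body):
    """Classify one received LSA instance and update the database.

    db:   dict mapping LSA key -> (latest sequence number, body)
    key:  hypothetical identifier, e.g. (ls_type, ls_id, advertising_router)
    Returns "duplicate", "refresh", or "change".
    """
    prev = db.get(key)
    if prev is not None and seq <= prev[0]:
        return "duplicate"   # same or older instance, received again via flooding
    db[key] = (seq, body)
    if prev is not None and body == prev[1]:
        return "refresh"     # newer instance carrying unchanged contents
    return "change"          # first sighting, or contents differ

db = {}
key = ("router", "10.0.0.1", "10.0.0.1")  # hypothetical LSA key
labels = [
    classify_lsa(db, key, 1, "links=A,B"),
    classify_lsa(db, key, 1, "links=A,B"),
    classify_lsa(db, key, 2, "links=A,B"),
    classify_lsa(db, key, 3, "links=A"),
]
```

The first instance of any LSA counts as a change here because it adds information to the database; a study might instead exclude the initial database load from the statistics.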
48
LSA Traffic in Different Areas
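[Figure: per-area plots of LSA traffic against days (panels labelled Area 2, Area 3, and Area 4), with separate curves for refresh, change, and duplicate LSAs. Annotations mark a genuine anomaly and an artifact caused by a 23-hour day (Apr 7)]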
49
Baseline LSA Traffic: Refresh LSAs
Refresh LSA traffic can be reliably predicted using router configuration files
  Important for workload generation
[Figure: refresh LSA traffic plotted against days for Area 2 and Area 3]
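A back-of-the-envelope version of this prediction, assuming the number of LSAs in an area has already been derived from the configuration files and that every LSA is refreshed exactly once per interval (the 30-minute value is the recommended interval mentioned earlier in the talk). The function name and the LSA count are hypothetical.

```python
def expected_refresh_lsas(num_lsas, hours=24, refresh_interval_min=30):
    """Expected number of refresh LSAs seen over `hours` for `num_lsas` distinct LSAs."""
    return num_lsas * hours * 60 // refresh_interval_min

# Hypothetical area with 1000 LSAs.
normal_day = expected_refresh_lsas(1000)
short_day = expected_refresh_lsas(1000, hours=23)
```

The second call illustrates why a 23-hour day shows up as a dip in daily counts even though nothing changed in the network. Real routers drift around the nominal interval, so measured counts will only approximate this figure.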
50
Refresh process is not synchronized
No evidence of synchronization Contrary to simulation-based study [basu:sigcomm01] Reasons Changes in the topology help break synchronization LSA refresh at one router is not coupled with LSA refresh at other routers Drift in the refresh interval of different routers Ph.D. Defense
51
Change LSAs
[Figure: change LSAs plotted against days]
Internal to OSPF domain versus external
  Change LSAs due to external events dominated
  Not surprising, given the large number of leased lines and the import of customer routes into OSPF
  Customer volatility → network volatility
52
Root Causes of Change LSAs
Persistent problem → flapping → numerous change LSAs
Internal LSA spikes → hardware router problems
  OSPF monitor identified a problem (not visible to other network management tools) early and led to preventive maintenance
External LSA spikes → customer route volatility
  Overload of an external link to a customer between 9 PM and 3 AM caused the EIGRP session to flap
Link flaps
53
Overhead: Duplicate LSAs
[Figure: duplicate LSA traffic plotted against days]
Why do some areas witness substantial duplicate LSA traffic, while other areas do not witness any?
  OSPF flooding over LANs leads to control plane asymmetries and to imbalances in duplicate LSA traffic
54
Summary
Refresh LSAs: constituted bulk of overall LSA traffic
  No evidence of synchronization between different routers
  Refresh LSA traffic predictable from configuration information
Change LSAs: mostly indicated persistent yet partial failure modes
  Internal LSA spikes → hardware router problems → preventive router maintenance
  External LSA spikes → customer congestion problems → “preventive” customer care
Duplicate LSAs: arose from control plane asymmetries
  Simple configuration changes could eliminate duplicate LSAs and improve performance
55
Outline
Background
Monitoring
Characterization
Maintenance
  Motivation: seamless maintenance and upgrades of routers
    Minimal instability and flaps
  Contribution: I’ll Be Back (IBB) capability for OSPF
    What IBB capability provides, how capability is implemented, performance analysis
Conclusions and future work
56
Maintenance is a Pain Maintenance of routers is a way of life in commercial networks Extensions to routing protocols, new functionality, hardware and software upgrades, bug fixes Maintenance is a painful exercise During maintenance, operators withdraw “router-under-maintenance” from forwarding service Leads to route flaps, traffic disruption and instability Operators have to carefully schedule maintenance Schedule them during night when load is moderate Stagger maintenance of different routers across time Ph.D. Defense
57
We can do better Observation: router can continue forwarding even while its routing process is inactive, at least for a while Current routers have separate routing and forwarding paths Routing in software (CPU) Forwarding in hardware (switching) Need to extend routing protocols since they always try to route around inactive router Our proposal: IBB (I’ll Be Back) extensions to OSPF Ph.D. Defense
58
IBB Proposal in a Nutshell
OSPF process on router R needs to be shut down
Before shutdown, R informs other routers that it is going to be inactive for a while
  R specifies a time period (IBB Timeout) by which it expects to become operational again
Other routers continue using R for forwarding during the IBB Timeout period
  If R comes back within the IBB Timeout period: no routing instability or flaps
  Else: other routers start forwarding packets around R
59
Related Work
Graceful restart proposals for various routing protocols at IETF
  Graceful restart proposal for OSPF by John Moy
Alex Zinin’s proposal to avoid flaps upon restart of OSPF process
  Process has to come up before other routers notice it was shut down
  Provides a small window of opportunity
Use of redundant route processors and seamless transfer of control
  NSR (Avici), High Availability Initiative (Cisco)
60
What if topology changes
R cannot update its forwarding table to reflect the change
  Can lead to loops or black holes
[Figure: two versions of a topology with routers A, B, and R. (a) Topology when R went down, with link weights 3, 2, and 6. (b) Topology changes while R is inactive, with link weights 10, 2, and 6]
61
Handling Changes: Three Options
Don’t do anything
Stop using R: John Moy’s proposal
  Inadvertent changes during upgrade are likely
    Example: flapping due to a bad interface somewhere
  But not all changes are bad
    Do not always lead to loops or black holes
Stop using R only when a loop or black hole gets formed
  And only for destinations for which there is a problem
  Our approach
62
Roadmap of Algorithm Single area, single inactive router case
Loop formation Black hole formation Single area, multiple inactive routers case Multiple areas Black hole formation and area partitions Ph.D. Defense
63
Single Area, Single Inactive Router
Problem Formulation
  Inactive router = R
  All routers other than R have the same image of the topology graph
  R’s image is that of the past, i.e., the time at which it went down
  Source = S, Destination = D
  Next hop(R, D) = Y
  Actual path a packet takes from S to D = P(S→D)
64
Loop Detection
P(S→D) has a loop iff S and Y have R on their paths to D in their SPTs
[Figure: topology with routers S, R, Y, and D, shown when R went down and again after a change while R is inactive (a link weight of 3 becomes 10), together with the SPT in which S and Y have R on their paths to D]
If there is a loop, a neighbor can always detect it
65
Loop Prevention
Every router needs to calculate a path to D such that R does not appear on it
[Figure: the changed topology while R is inactive, and the paths to D that S and Y calculate without R on them]
66
Loop Avoidance Procedure
R sends its forwarding table to neighbors before shutdown
  Thus, Y knows that next hop(R, D) is Y
Detection: during SPF calculation, neighbors detect loops
  Y checks whether R exists on its path to D
Upon detection, neighbors send avoid messages to other routers in the domain
  avoid(R, D) = avoid using R for reaching D
Prevention: upon receiving an avoid(R, D) message, other routers calculate a new path to D without R on it
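The detection and prevention steps can be sketched on a small graph. The helper names and the four-router topology below are hypothetical (the weights are chosen so that a loop appears once one link gets more expensive), the graph is assumed to stay connected, and path computation per destination stands in for the SPF run inside a real OSPF implementation.

```python
import heapq

def shortest_path(graph, src, dst):
    """Return the routers on a shortest path from src to dst (Dijkstra)."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            break
        for nbr, weight in graph[node].items():
            new_cost = cost + weight
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def neighbor_detects_loop(graph, inactive, neighbor, dest):
    """Detection: the stale next hop of `inactive` now routes back through it."""
    return inactive in shortest_path(graph, neighbor, dest)

def path_avoiding(graph, inactive, src, dest):
    """Prevention: recompute the path on a copy of the graph without `inactive`."""
    pruned = {
        node: {nbr: w for nbr, w in nbrs.items() if nbr != inactive}
        for node, nbrs in graph.items() if node != inactive
    }
    return shortest_path(pruned, src, dest)

# Hypothetical topology after the Y-D link cost rose from 3 to 10 while R is inactive.
# R's stale forwarding table still sends traffic for D to Y.
changed = {
    "S": {"R": 1, "D": 20},
    "R": {"S": 1, "Y": 2, "D": 6},
    "Y": {"R": 2, "D": 10},
    "D": {"S": 20, "R": 6, "Y": 10},
}
loop = neighbor_detects_loop(changed, "R", "Y", "D")
safe_path = path_avoiding(changed, "R", "S", "D")
```

Here Y's new shortest path to D runs through R, while R still forwards to Y, so packets would bounce between the two. After avoid(R, D) is flooded, S falls back to its direct but more expensive link.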
67
Performance
Maximum effect on SPF calculation
  Quantify overhead
  Impact of topology size
Prototype implementation
  IBB extensions incorporated into GateD 4.0.7
68
Testbed Setup
[Figure: two panels. Physical topology: TopTracker and the SUT share a LAN, and TopTracker sends LSAs to the SUT. SUT’s view of the topology: the SUT, the router under maintenance R, and an emulated topology that is a complete graph with n nodes]
SUT (System Under Test) = where IBB overhead is measured
69
Experiment Sequence
Two runs: GateD on SUT and IBB-GateD on SUT
  T = 0 min: bring R down (GateD) or bring R down in IBB mode (IBB-GateD)
    Case A: inactive router
  T = 4 min: send avoid(R, Mj) messages to SUT (1 ≤ j ≤ n)
    Case B: inactive router, avoid it
  T = 8 min: bring R up
Overhead = (mean SPF time in Case B) / (mean SPF time in Case A)
70
Result Overhead remains constant at roughly 2.0 as n increases
Sources of overhead: Second SPF calculation Graph in case B is larger than graph in case A Ph.D. Defense
71
Summary IBB proposal: extend OSPF so that a router can be used for forwarding even while its OSPF process is inactive Main contribution: algorithm that gracefully handles topology changes Stops using the inactive router for a destination if using the router can lead to loops or black holes Overhead of the algorithm is modest Shows good scaling behavior in terms of topology size Ph.D. Defense
72
Outline
Background
Monitoring
Characterization
Maintenance
Conclusions and future work
73
Conclusions
Monitoring
  Design and implementation of an OSPF monitor
  Deployment in two commercial networks
Characterization
  Black-box techniques for estimating OSPF processing delays within a router
  Case study of OSPF dynamics in enterprise network
Maintenance
  I’ll Be Back (IBB) capability for OSPF that allows a “router-under-maintenance” to be used for forwarding
74
Future Work Three principal directions for future work
Application of this work to other routing protocols IS-IS is very similar to OSPF EIGRP, RIP and BGP bring their own set of challenges Distance-vector nature of the protocols BGP also brings scalability issues Other areas related to routing and network management Security, network design, configuration management, simulation & modeling How performance of routing infrastructure affects user-perceived performance More work in each of three focus areas Ph.D. Defense
75
Future Work for Monitoring
Real-time analysis More meaningful alerting Correlation with other fault and performance data Learn from past events Prioritization of alerts Off-line analysis Correlation with other data sources Work already underway: BGP, fault, performance Identification of problem signatures and feeding them into real-time component for problem prediction Ph.D. Defense
76
Future Work for Characterization
Expand measurements to cover other router vendors and commercial networks Use results to build simulation and analytical models Validation of models Ph.D. Defense
77
Future Work for Maintenance
Improvements to IBB scheme
  Incremental deployment
  Reduction in overhead
How to use IBB-like schemes in conjunction with other approaches
  Routing software that can be upgraded without bringing the process down
  Use of redundant route processors and seamless transfer of control
  Scheduling maintenance tasks such that they have minimal impact
78
Holy Grail
Networks that manage themselves!
79
Grill me ...
Probably your last chance… :-)
Q and A
80
Backups Ph.D. Defense
81
Partial Adjacency for LSAR
[Figure: exchange between router R and LSAR. LSAR: “I have LSA L”. R: “I need LSA L from LSAR”, followed by repeated requests “Please send me LSA L”. The adjacency stays in a partial state]
Router R does not advertise a link to LSAR
  Routers (except R) are not aware of the presence of LSAR
  Does not trigger SPF calculations in the network
LSAR going up or down does not impact the network
LSAR does not originate any LSAs
LSAR-R link is not used for data forwarding
  LSAR does not install any routes in its forwarding table
82
Multiple Inactive Routers for IBB
Loop Avoidance Change in loop detection conditions Simplification for loop prevention No change in black-hole detection Ph.D. Defense
83
Loop Avoidance Set of inactive routers: R1, R2, …, Rn
Loop avoidance procedure applies for each inactive router Detection Router detects loops for all its inactive neighbors Prevention A router can get avoid(Ri, D) messages for j inactive routers (j <= n) The router avoids these j forbidden routers on its path to D Problem: Set of forbidden routers can be different for different destinations O(n) shortest path calculations n = number of vertices Ph.D. Defense
84
Simplification Router avoids all inactive routers if it has some forbidden routers on its path to D Calculate two SPTs: SPT with all inactive routers on it SPT w/o any inactive router on it If the path to D does not contain any forbidden routers on it, Pick next hop for D from the first SPT Else, Pick next hop for D from the second SPT Ph.D. Defense
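The two-SPT simplification can be sketched per destination as follows. For brevity the sketch computes one shortest path per call instead of building two full trees, and it assumes the destination stays reachable without the inactive routers. The function names and the example topology are hypothetical.

```python
import heapq

def shortest_path(graph, src, dst, excluded=frozenset()):
    """Dijkstra from src to dst, never stepping onto routers in `excluded`."""
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    done = set()
    while heap:
        cost, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        if node == dst:
            break
        for nbr, weight in graph[node].items():
            if nbr in excluded:
                continue
            new_cost = cost + weight
            if new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]

def next_hop(graph, src, dest, inactive, forbidden):
    """Pick the next hop for `dest` using the simplification above.

    inactive:  set of all inactive routers
    forbidden: inactive routers named in avoid(Ri, dest) messages received so far
    """
    full = shortest_path(graph, src, dest)  # first SPT: inactive routers allowed
    if not forbidden.intersection(full):
        return full[1]
    # Second SPT: avoid every inactive router, not just the forbidden ones.
    return shortest_path(graph, src, dest, excluded=frozenset(inactive))[1]

# Hypothetical topology with two inactive routers, R1 and R2.
topo = {
    "S": {"R1": 1, "R2": 2, "D": 10},
    "R1": {"S": 1, "D": 1},
    "R2": {"S": 2, "D": 2},
    "D": {"S": 10, "R1": 1, "R2": 2},
}
hop_when_r1_forbidden = next_hop(topo, "S", "D", {"R1", "R2"}, {"R1"})
hop_when_r2_forbidden = next_hop(topo, "S", "D", {"R1", "R2"}, {"R2"})
```

Avoiding every inactive router whenever any forbidden one is on the path may give up a usable route through R2, but it caps the work at two shortest-path computations no matter how the forbidden sets differ across destinations.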
85
Multiple Inactive Routers: Loop Detection
Loop detection condition for a single inactive router cannot detect all loops when multiple routers are inactive
Two new conditions for loop detection by neighbors
  Generalization of loop detection for a single inactive router
  Conditions can result in false positives
Evaluation using realistic OSPF topology graphs with two inactive routers
  Using the two conditions together eliminates most false positives (90% hit-rate), but not all...
86
Publications
Aman Shaikh, Mukul Goyal, Albert Greenberg, Raju Rajan and K.K. Ramakrishnan, An OSPF Topology Server: Design and Evaluation, IEEE J-SAC, 20(4), May 2002.
Aman Shaikh and Albert Greenberg, OSPF Monitoring: Architecture, Design, and Deployment Experience, submitted to NSDI, 2004.
Aman Shaikh and Albert Greenberg, Experience in Black-box OSPF Measurement, In Proc. ACM SIGCOMM IMW, pp , November 2001.
Aman Shaikh, Chris Isett, Albert Greenberg, Matthew Roughan and Joel Gottlieb, A Case Study of OSPF Behavior in a Large Enterprise Network, In Proc. ACM SIGCOMM IMW, pp , November 2002.
Aman Shaikh, Rohit Dube and Anujan Varma, Avoiding Instability during Graceful Shutdown of OSPF, In Proc. IEEE INFOCOM, June 2002.
Aman Shaikh, Rohit Dube and Anujan Varma, Avoiding Instability during Graceful Shutdown of Multiple OSPF Routers, submitted to IEEE/ACM Transactions on Networking (ToN).