OSPF Monitor Architecture, Design and Deployment Experience

Presentation transcript:

OSPF Monitor Architecture, Design and Deployment Experience
  Aman Shaikh, Albert Greenberg
  AT&T Labs - Research
  NSDI 2004
  OSPF Monitor - NSDI 2004

Objectives for OSPF Monitor
  Real-time analysis of OSPF behavior
    Trouble-shooting, alerting, validation of maintenance
    Real-time snapshots of OSPF network topology
  Off-line analysis
    Post-mortem analysis of recurring problems
    Generate statistics and reports about network performance
    Identify anomaly signatures
    Facilitate tuning of configurable parameters
    Improve maintenance procedures
  Analyze OSPF behavior in commercial networks

OSPF Monitor in a Nutshell
  Collect OSPF LSAs (Link State Advertisements) passively from network
    Every router describes its local connectivity in an LSA
    Router originates an LSA due to...
      Change in network topology
      Periodic soft-state refresh
    LSA is flooded to other routers in the domain
      Flooding is reliable and hop-by-hop
      Flooding leads to duplicate copies of LSAs being received
    Every router stores LSAs (self-originated + received) in link-state database (= topology graph)
  Real-time analysis of LSA streams
  Archive LSAs for off-line analysis

Components
  Data collection: LSA Reflector (LSAR)
    Passively collects OSPF LSAs from network
    “Reflects” streams of LSAs to LSAG
    Archives LSAs for analysis by OSPFScan
  Real-time analysis: LSA aGgregator (LSAG)
    Monitors network for topology changes, LSA storms, node flaps and anomalies
  Off-line analysis: OSPFScan
    Supports queries on LSA archives
    Allows playback and modeling of topology changes
    Allows emulation of OSPF routing

Example
  [Figure: an OSPF network with Area 0, Area 1 and Area 2. LSAR 1 and LSAR 2 receive LSAs from the network, keep LSA archives, and “reflect” the LSAs over TCP connections to LSAG for real-time monitoring. The LSA archives are replicated for off-line analysis with OSPFScan.]

How LSAR attaches to Network
  Host mode
    Join multicast group
    Adv: completely passive
    Disadv: not reliable, delayed initialization of LSDB
  Full adjacency mode
    Form full adjacency (= peering session) with a router
    Adv: reliable, immediate initialization of LSDB
    Disadv: LSAR’s instability can impact entire network
  Partial adjacency mode
    Keep adjacency in a state that allows LSAR to receive LSAs, but does not allow data forwarding over link
    Adv: reliable, LSAR’s instability does not impact entire network, immediate initialization of LSDB
    Disadv: can raise alarms on the router

Partial Adjacency for LSAR
  [Figure: exchange between router R and LSAR. LSAR: “I have LSA L”. R: “I need LSA L from LSAR”, followed by repeated requests “Please send me LSA L”. The adjacency remains in a partial state.]
  Router R does not advertise a link to LSAR
  LSAR does not originate any LSAs
  Routers (except R) not aware of LSAR’s presence
    Does not trigger routing calculations in network
    LSAR’s going up/down does not impact network
  LSAR-R link is not used for data forwarding

LSA aGgregator (LSAG)
  Analyzes “reflected” LSAs from LSARs in real-time
  Generates console messages:
    Change in OSPF network topology
      ADJACENCY COST CHANGE: rtr 10.0.0.1 (intf 10.0.0.2) -> rtr 10.0.0.5 old_cost 1000 new_cost 50000 area 0.0.0.0
    Node flaps
      RTR FLAP: rtr 10.0.0.12 no_flaps 7 flap_window 570 sec
    LSA storms
      LSA STORM: lstype 3 lsid 10.1.0.0 advrt 10.0.0.3 area 0.0.0.0 no_lsas 7 storm_window 470 sec
    Anomalous behavior
      TYPE-3 ROUTE FROM NON-BORDER RTR: ntw 10.3.0.0/24 rtr 10.0.0.6 area 0.0.0.0
  Dumps snapshots of network topology
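
A minimal sketch of how a node-flap message like the RTR FLAP example could be produced: count a router's state transitions inside a sliding time window and report once the count reaches a threshold. The slides do not describe LSAG's actual algorithm, so the class name, the window length and the threshold are assumptions made for illustration.

```python
from collections import defaultdict, deque


class FlapDetector:
    """Counts per-router state transitions inside a sliding time window."""

    def __init__(self, window_sec=600.0, threshold=5):
        # Both values are illustrative assumptions, not LSAG defaults.
        self.window_sec = window_sec
        self.threshold = threshold
        self._events = defaultdict(deque)  # router id -> transition timestamps

    def record(self, router_id, timestamp):
        """Record one transition; return a message string, or None."""
        events = self._events[router_id]
        events.append(timestamp)
        # Drop transitions that have fallen out of the window.
        while events and timestamp - events[0] > self.window_sec:
            events.popleft()
        if len(events) >= self.threshold:
            span = int(events[-1] - events[0])
            return (f"RTR FLAP: rtr {router_id} "
                    f"no_flaps {len(events)} flap_window {span} sec")
        return None


detector = FlapDetector(window_sec=600.0, threshold=3)
alerts = [detector.record("10.0.0.12", t) for t in (0.0, 100.0, 250.0)]
```

Here only the third transition produces a message, because it is the first one that brings the count inside the window up to the threshold.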

OSPFScan
  Tools for off-line analysis of LSA archives
    Parse, select (based on queries), and analyze
  Functionality supported by OSPFScan
    Classification of LSA traffic
      Change LSAs, refresh LSAs, duplicate LSAs
    Emulation of OSPF routing
      How OSPF routing tables evolved in response to network changes
      What an end-to-end path within the OSPF domain looked like at any instant
    Modeling of topology changes
      Vertex addition/deletion and link addition/deletion/change_cost
    Playback of topology change events
    Statistics and report generation
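
The change/refresh/duplicate classification can be sketched as follows. This is a simplified illustration, not OSPFScan's code: it assumes an LSA is identified by (type, link-state ID, advertising router), that a copy whose sequence number is not newer than the stored one counts as a duplicate, and that a newer instance is a refresh when its contents are unchanged. The Lsa record and the classify function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lsa:
    ls_type: int
    ls_id: str
    adv_router: str
    seq: int
    body: bytes  # link descriptions, compared to detect topology changes


def classify(database, lsa):
    """Return 'change', 'refresh' or 'duplicate' and update the database."""
    key = (lsa.ls_type, lsa.ls_id, lsa.adv_router)
    stored = database.get(key)
    if stored is not None and lsa.seq <= stored.seq:
        return "duplicate"  # this instance (or a newer one) was already seen
    database[key] = lsa
    if stored is None or stored.body != lsa.body:
        return "change"     # first sighting, or contents differ
    return "refresh"        # newer instance with identical contents


db = {}
first = Lsa(1, "10.0.0.1", "10.0.0.1", seq=1, body=b"cost=10")
labels = [
    classify(db, first),
    classify(db, first),
    classify(db, Lsa(1, "10.0.0.1", "10.0.0.1", seq=2, body=b"cost=10")),
    classify(db, Lsa(1, "10.0.0.1", "10.0.0.1", seq=3, body=b"cost=50")),
]
```

The four calls yield change, duplicate, refresh and change, in that order.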

Performance Evaluation
  Performance of LSAR and LSAG through lab experiments
    LSAR and LSAG are key to real-time monitoring
  How performance scales with LSA-rate and network size

Experimental Setup
  [Figure: experimental setup. Labels: PC, SUT, Zebra with emulated topology, LSAR, LSAG, OSPF adjacency, TCP connections carrying LSAs.]
  Measure LSA pass-through time for LSAR
  Measure LSA processing time for LSAG

Methodology
  Send a burst of LSAs from Zebra to LSAR
    Vary number of LSAs (l) in a burst of 1 sec duration
  Use of fully connected graph as the emulated topology
    Vary number of nodes (n) in the topology
  Performance measurements
    LSAR performance: LSA “pass-through” time
      Zebra measures time difference between sending an LSA to LSAR and receiving it back from LSAR
    LSAG performance: LSA processing time
      Instrumentation of LSAG code

LSAR Performance

LSAG Performance

Deployment
  Tier-1 ISP network
    Area 0, 100+ routers; point-to-point links
    Deployed since January, 2003
    LSA archive size: 8 MB/day
    LSAR connection: partial adjacency mode
  Enterprise network
    15 areas, 500+ routers; Ethernet-based LANs
    Deployed since February, 2002
    LSA archive size: 10 MB/day
    LSAR connection: host mode

LSAG in Day-to-day Operations
  Generation of alarms by feeding messages into higher layer network management systems
    Grouping of messages to reduce the number of alarms
    Prioritization of messages
  Validation of maintenance steps and monitoring the impact of these steps on network-wide OSPF behavior
    Example:
      Network operators use cost-out/cost-in of links to carry out maintenance
      A “link-audit” web-page allows operators to keep track of link costs in real-time

Problems Caught by LSAG
  Equipment problem
    Detected internal problems in a crucial router in enterprise network
      Problem manifested as episodes of OSPF adjacency flapping
  Configuration problem
    Identified assignment of same router-id to two routers in enterprise network
  OSPF implementation bug
    Caught a bug in type-3 LSA generation code of a router vendor in ISP network
    Faster refresh of LSAs than standards-mandated rate

Long Term Analysis by OSPFScan
  LSA traffic analysis
    Identified excessive duplicate LSA traffic in some areas of enterprise network
      Led to root-cause analysis and preventative steps
  Statistics generation
    Inter-arrival time of change LSAs in ISP network
      Fine-tuning configurable timers related to route calculation (= SPF calculation)
    Mean down-time and up-time for links and routers in ISP network
      Assessment of reliability and availability
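
As an illustration of the mean down-time and up-time statistic, the sketch below averages the lengths of completed up and down intervals for one link. The event format, the function name and the sample timestamps are assumptions; the slides do not show how OSPFScan derives these figures from the LSA archive.

```python
def mean_durations(events):
    """events: time-ordered (timestamp_sec, state) pairs, state 'up' or 'down'.

    Returns (mean_up, mean_down) over completed intervals, using None when
    no interval of that kind was observed. The interval that is still open
    after the last event is ignored.
    """
    totals = {"up": 0.0, "down": 0.0}
    counts = {"up": 0, "down": 0}
    for (start, state), (end, _next_state) in zip(events, events[1:]):
        totals[state] += end - start
        counts[state] += 1

    def mean(state):
        return totals[state] / counts[state] if counts[state] else None

    return mean("up"), mean("down")


# Hypothetical history of one link, in seconds since the start of the archive.
events = [(0, "up"), (3600, "down"), (3660, "up"), (10860, "down"), (10980, "up")]
mean_up, mean_down = mean_durations(events)
```

With this sample history the link was up for 3600 s and 7200 s and down for 60 s and 120 s, so the means are 5400 s and 90 s.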

Lessons Learned through Deployment
  New tools reveal new failure modes
  Real-time alerting and off-line analysis are complementary
  Distributed architecture helped a lot
  OSPF exhibits significant activity in real networks
    Maintenance and genuine problems
  Add functionality incrementally and through interaction with users
  Archive all LSAs
    LSA volume is manageable
    Don’t throw away refresh and duplicate LSAs

Conclusion
  Three component architecture
    LSAR: data collection
    LSAG: real-time analysis
    OSPFScan: off-line analysis
  Performance analysis
    LSAR and LSAG scale well as LSA-rate and network size increase
  Deployment
    Deployed in Tier-1 ISP and Enterprise network
    Has proved to be an extremely valuable tool for network management
    “OSPF Monitor was a Lifesaver”
      VP of Networking, Enterprise network

Future Work
  Real-time analysis
    Correlation with other fault and performance data for more meaningful alerting
    Prioritization of alerts
  Off-line analysis
    Correlation with other data sources
      Work already underway: BGP, fault, performance
    Identification of problem signatures and feeding them into real-time component for problem prediction

Backup Slides

Overview of OSPF
  OSPF is a link-state protocol
    Every router learns entire network topology
    Topology is represented as graph
      Routers are vertices, links are edges
      Every link is assigned weight through configuration
  Every router uses Dijkstra’s single source shortest path algorithm to build its forwarding table
    Router builds Shortest Path Tree (SPT) with itself as root
    Shortest Path Calculation (SPF)
    Packets are forwarded along shortest paths defined by link weights
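
The shortest path calculation described here can be sketched with a textbook Dijkstra implementation. The router names and link weights below are made up for illustration, and a real OSPF implementation also handles areas, multi-access networks and equal-cost paths, which this sketch omits.

```python
import heapq


def shortest_path_tree(graph, root):
    """graph: {router: {neighbor: link_weight}}. Returns (dist, parent)."""
    dist = {root: 0}
    parent = {root: None}
    heap = [(0, root)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry for an already settled router
        done.add(node)
        for nbr, weight in graph[node].items():
            nd = d + weight
            if nbr not in dist or nd < dist[nbr]:
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, parent


def next_hop(parent, root, dest):
    """Walk the tree back from dest to find the first hop out of root."""
    if dest == root:
        return None
    node = dest
    while parent[node] != root:
        node = parent[node]
    return node


# Hypothetical four-router topology with symmetric link weights.
graph = {
    "R1": {"R2": 10, "R3": 50},
    "R2": {"R1": 10, "R3": 20},
    "R3": {"R1": 50, "R2": 20, "R4": 5},
    "R4": {"R3": 5},
}
dist, parent = shortest_path_tree(graph, "R1")
hop_to_r4 = next_hop(parent, "R1", "R4")
```

In this example R1 reaches R3 through R2 (cost 30) rather than over the direct link (cost 50), so its forwarding-table entry for R4 points at R2.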

Areas in OSPF
  OSPF allows domain to be divided into areas for scalability
    Areas are numbered 0, 1, 2 …
    Hub-and-spoke with area 0 as hub
  Every link is assigned to exactly one area
  Routers with links in multiple areas are called border routers
  [Figure: Area 1 and Area 2 attached to Area 0 through border routers]

Summarization with Areas
  Each router learns
    Entire topology of its attached areas
    Information about subnets in remote areas and their distance from the border routers
      Distance = sum of link costs from border router to subnet
  [Figure: an OSPF domain in which border routers B1 and B2 connect Area 0 (routers R1, R2, R3) to Area 1 (routers C1, C2; subnets 10.10.4.0/24 and 10.10.5.0/24), shown next to R1’s view of the same domain. Link costs shown: 100, 200, 500, 400, 300 in Area 0 and 20, 10, 50 in Area 1. In R1’s view, Area 1 is reduced to the distances 20, 70, 10 and 60 from the border routers to the two subnets.]
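
A short sketch of how a router could combine these summaries with its own area's topology: the cost to a remote subnet through a border router is the cost to reach that border router plus the distance the border router advertises, and the router picks the smallest total. The function name and all numbers are hypothetical; they are not a reading of the figure.

```python
def best_inter_area_route(cost_to_border, summaries, subnet):
    """Return (total_cost, border_router) for the cheapest path, or None.

    cost_to_border: {border_router: cost from this router to it}
    summaries: {border_router: {subnet: advertised distance}}
    """
    candidates = [
        (cost_to_border[border] + advertised[subnet], border)
        for border, advertised in summaries.items()
        if subnet in advertised and border in cost_to_border
    ]
    return min(candidates) if candidates else None


# Hypothetical values for a router in area 0.
cost_to_border = {"B1": 100, "B2": 130}
summaries = {
    "B1": {"10.10.4.0/24": 20, "10.10.5.0/24": 70},
    "B2": {"10.10.4.0/24": 60, "10.10.5.0/24": 10},
}
route_4 = best_inter_area_route(cost_to_border, summaries, "10.10.4.0/24")
route_5 = best_inter_area_route(cost_to_border, summaries, "10.10.5.0/24")
```

With these numbers the first subnet is reached through B1 at cost 120 and the second through B2 at cost 140, even though B2 is the more distant border router.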

Link State Advertisements (LSAs)
  Every router describes its local connectivity in Link State Advertisements (LSAs)
  Router originates an LSA due to…
    Change in network topology
      Example: link goes down or comes up
    Periodic soft-state refresh
      Recommended value of interval is 30 minutes
  LSA is flooded to other routers in the domain
    Flooding is reliable and hop-by-hop
    Includes change and refresh LSAs
    Flooding leads to duplicate copies of LSAs being received
  Every router stores LSAs (self-originated + received) in link-state database (= topology graph)

Adjacency
  Neighbor routers (i.e., routers connected by a physical link) form an adjacency
  The purpose is to make sure
    Link is operational and routers can communicate with each other
    Neighbor routers have consistent view of network topology
      To avoid loops and black holes
  Link gets used for data forwarding only after adjacency is established
  Use of periodic Hellos to monitor the status of link and adjacency

Equipment Problem at Enterprise Network
  Internal errors in a router in area 0
    Episodes where router would drop adjacencies with other routers
  Problem manifested in LSAG as “ADJ UP” and “ADJ DOWN” messages
  Not visible in other network management systems
  Led to proactive maintenance

LSA Traffic in Enterprise Network
  [Figure: four panels (Area 0, Area 2, Area 3, Area 4) plotting refresh, change and duplicate LSA traffic against days. Annotations: “Genuine Anomaly” and “Artifact: 23 hr day (Apr 7)”.]

Overhead: Duplicate LSAs
  [Figure: duplicate LSA traffic plotted against days]
  Why do some areas witness substantial duplicate LSA traffic, while other areas do not witness any?
    OSPF flooding over LANs leads to control plane asymmetries and to imbalances in duplicate LSA traffic