Internet Routing (COS 598A) Today: Detecting Anomalies Inside an AS Jennifer Rexford Tuesdays/Thursdays.

Class time: Tuesdays/Thursdays 11:00am-12:20pm

Outline
– Traffic
  – SNMP link statistics
  – Packet and flow monitoring
– Network topology
  – IP routers and links
  – Fault data, layer-2 topology, and configuration
  – Intradomain route monitoring
– Interdomain routes
  – BGP route monitoring
  – Analysis of BGP update data
– Conclusions

Why Is Traffic Measurement Important?
– Billing the customer
  – Measure usage on links to/from customers
  – Apply a billing model to generate a bill
– Traffic engineering and capacity planning
  – Measure the traffic matrix (i.e., offered load)
  – Tune the routing protocol or add new capacity
– Denial-of-service attack detection
  – Identify anomalies in the traffic
  – Configure routers to block the offending traffic
– Analyzing application-level issues
  – Evaluate the benefits of deploying a Web caching proxy
  – Quantify the fraction of traffic that is P2P file sharing

Collecting Traffic Data: SNMP
– Simple Network Management Protocol
  – Standard Management Information Base (MIB)
  – Protocol for querying the MIBs
– Advantage: ubiquitous
  – Supported on all networking equipment
  – Multiple products for polling and analyzing data
– Disadvantages: dumb
  – Coarse granularity of the measurement data (e.g., number of bytes/packets per interface per 5 minutes)
  – Cannot express complex queries on the data
  – Unreliable delivery of the data using UDP
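To make the "bytes per interface per 5 minutes" granularity concrete, here is a minimal sketch of how a poller might turn two successive byte-counter readings into an average link utilization. It assumes 32-bit counters such as the IF-MIB `ifInOctets` object (which can wrap between polls) and at most one wrap per interval; the function names are hypothetical.

```python
COUNTER32_MODULUS = 2**32  # ifInOctets/ifOutOctets are 32-bit Counter32 objects


def counter_delta(prev: int, curr: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Bytes counted between two polls, tolerating at most one counter wrap."""
    return (curr - prev) % modulus


def utilization(prev: int, curr: int, interval_s: float, capacity_bps: float) -> float:
    """Average utilization (0.0 to 1.0) of a link over one polling interval."""
    bits = counter_delta(prev, curr) * 8
    return bits / (interval_s * capacity_bps)
```

Everything finer than the polling interval (bursts, individual flows) is invisible to this calculation, which is exactly the "coarse granularity" drawback listed above. Polling the 64-bit `ifHCInOctets` counter avoids frequent wraps on fast links.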

Collecting Traffic Data: Packet Monitoring
– Packet monitoring
  – Passively collecting IP packets on a link
  – Recording IP, TCP/UDP, or application-layer traces
– Advantages: details
  – Fine-grain timing information (e.g., can analyze the burstiness of the traffic)
  – Fine-grain packet contents (addresses, port numbers, TCP flags, URLs, etc.)
– Disadvantages: overhead
  – Hard to keep up with high-speed links
  – Often requires a separate monitoring device
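As one illustration of what fine-grain timing enables, the sketch below bins a packet trace into fixed time intervals and reports the peak-to-mean ratio, a simple burstiness measure. The trace format `(timestamp_seconds, size_bytes)` and the choice of metric are assumptions for illustration, not a method from the slides.

```python
import math
from collections import Counter


def bytes_per_bin(packets, bin_s):
    """Sum packet sizes into fixed-width time bins.

    packets: iterable of (timestamp_seconds, size_bytes) from a packet trace.
    """
    bins = Counter()
    for ts, size in packets:
        bins[int(ts // bin_s)] += size
    return bins


def peak_to_mean(packets, bin_s, duration_s):
    """Busiest bin divided by the average bin; 1.0 means perfectly smooth traffic."""
    bins = bytes_per_bin(packets, bin_s)
    if not bins:
        return 0.0
    n_bins = math.ceil(duration_s / bin_s)  # count empty bins too
    mean = sum(bins.values()) / n_bins
    return max(bins.values()) / mean
```

Shrinking `bin_s` exposes burstiness at smaller time scales, which SNMP's 5-minute counters cannot show.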

Collecting Traffic Data: Flow Statistics
– Flow monitoring (e.g., Cisco NetFlow)
  – Statistics about groups of related packets (e.g., same IP/TCP headers and close in time)
  – Recording header information, counts, and time
– Advantages: detail with less overhead
  – Almost as good as packet monitoring, except no fine-grain timing information or packet contents
  – Often implemented directly on the interface card
– Disadvantages: trades off detail against overhead
  – Less detail than packet monitoring
  – Less ubiquitous than SNMP statistics
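The phrase "same IP/TCP headers and close in time" can be sketched as follows: packets sharing a 5-tuple key are accumulated into one record until the key has been idle longer than a timeout. The 15-second idle timeout and the record fields are assumptions chosen for illustration; real flow exporters also apply an active timeout and other rules not modeled here.

```python
from dataclasses import dataclass


@dataclass
class FlowRecord:
    key: tuple        # (src, dst, proto, sport, dport)
    first: float      # timestamp of the first packet
    last: float       # timestamp of the most recent packet
    packets: int = 0
    bytes: int = 0


def build_flows(packets, idle_timeout=15.0):
    """Aggregate time-ordered packets into flow records.

    packets: iterable of (ts, src, dst, proto, sport, dport, size).
    ASSUMPTION: a flow ends once its key is idle for more than idle_timeout seconds.
    """
    active = {}
    finished = []
    for ts, src, dst, proto, sport, dport, size in packets:
        key = (src, dst, proto, sport, dport)
        rec = active.get(key)
        if rec is not None and ts - rec.last > idle_timeout:
            finished.append(rec)  # the old flow timed out; start a new one
            rec = None
        if rec is None:
            rec = FlowRecord(key, ts, ts)
            active[key] = rec
        rec.last = ts
        rec.packets += 1
        rec.bytes += size
    finished.extend(active.values())  # flush flows still open at end of trace
    return finished
```

Only per-flow counts and first/last times survive, which is why flow data lacks the fine-grain timing and payload detail of packet traces.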

Using the Traffic Data in Network Operations
– SNMP byte/packet counts: everywhere
  – Tracking link utilizations and detecting anomalies
  – Generating bills for traffic on customer links
  – Inference of the offered load (i.e., traffic matrix)
– Packet monitoring: selected locations
  – Analyzing the small time-scale behavior of traffic
  – Troubleshooting specific problems on demand
– Flow monitoring: selective, e.g., network edge
  – Tracking the application mix
  – Direct computation of the traffic matrix
  – Input to denial-of-service attack detection
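"Direct computation of the traffic matrix" from edge flow data can be sketched as: for each flow measured at an ingress router, look up the egress router for its destination by longest-prefix match, then sum bytes per (ingress, egress) pair. The egress table here is a hypothetical input; in an operational network it would be derived from routing data.

```python
import ipaddress
from collections import defaultdict


def egress_for(dst, egress_table):
    """Longest-prefix match of dst against a {prefix: egress_router} table."""
    addr = ipaddress.ip_address(dst)
    best = None  # (prefix_length, router)
    for prefix, router in egress_table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, router)
    return None if best is None else best[1]


def traffic_matrix(flows, egress_table):
    """flows: iterable of (ingress_router, dst_ip, bytes) from edge flow records."""
    tm = defaultdict(int)
    for ingress, dst, nbytes in flows:
        egress = egress_for(dst, egress_table)
        if egress is not None:  # skip destinations with no known route
            tm[(ingress, egress)] += nbytes
    return dict(tm)
```

The linear scan keeps the sketch short; a real tool would use a trie for the longest-prefix match.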

Network Topology

IP Topology
– Topology information
  – Routers
  – Links, and their capacities
    – Internal links inside the AS
    – Edge links connecting to neighboring domains
– Ways to learn the topology
  – Inventory database
  – SNMP polling/traps
  – Traceroute
  – Route monitoring
  – Router configuration data

Below IP
– Layer-2 paths
  – ATM virtual circuits
  – Frame Relay virtual circuits
– Mapping to lower layers
  – Specific fibers
  – Shared optical amplifiers
  – Shared conduits
  – Physical length (propagation delay)
– Information not visible to IP
  – Stored in an inventory database
  – Not necessarily generated/updated automatically

Intradomain Monitoring: OSPF Protocol
– Link-state protocol
  – Routers flood Link State Advertisements (LSAs)
  – Routers compute shortest paths based on weights
  – Routers identify the next hop to reach other routers
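The last two steps (compute shortest paths from weights, identify next hops) can be sketched with Dijkstra's algorithm over the topology a router assembles from LSAs. The graph below is hypothetical, and ties are broken by whichever path is found first rather than split across equal-cost paths as a real OSPF implementation may do.

```python
import heapq


def next_hops(graph, source):
    """Shortest-path distance and first hop from source to every other router.

    graph: {router: {neighbor: weight}} with non-negative link weights.
    Returns {destination: (distance, first_hop)}.
    """
    dist = {source: 0}
    first = {source: None}
    heap = [(0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                # A neighbor of the source is its own first hop; otherwise inherit.
                first[v] = v if u == source else first[u]
                heapq.heappush(heap, (nd, v))
    return {r: (dist[r], first[r]) for r in dist if r != source}
```

A route monitor that collects the same LSAs can run this computation for every router to reconstruct the network-wide forwarding paths.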

Intradomain Route Monitoring
– Construct a continuous view of the topology
  – Detect when equipment goes up or down
  – Input to traffic-engineering and planning tools
– Detect routing anomalies
  – Identify failures, LSA storms, and route flaps
  – Verify that the LSA load matches expectations
  – Flag strange weight settings as misconfigurations
– Analyze convergence delay
  – Monitor LSAs in multiple locations with go…
  – Compare the times when LSAs arrive
– Detect router implementation mistakes

Passive Collection of LSAs
– OSPF is a flooding protocol
  – Every LSA is sent on every participating link
  – Very helpful for simplifying the monitor
– Can participate in the protocol
  – Shared media (e.g., Ethernet): join the multicast group and listen to LSAs
  – Point-to-point links: establish an adjacency with a router
– … or passively monitor packets on a link
  – Tap a link and capture the OSPF packets

Reducing the Volume of Information
– Prioritizing the messages
  – Router failure over router recovery
  – Link failure or weight change over a refresh
  – Informational messages about weight settings
– Grouping related messages
  – Link failure: group messages for the two ends
  – Router failure: group the affected links
  – Common failure: group links failing close in time
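The grouping rules can be sketched as follows: each link is normalized so that reports from its two ends map to the same key, and events are chained into one group while successive events arrive within a short window. The event format and the 10-second window are illustrative assumptions, not values from the slides.

```python
def group_link_events(events, window_s=10.0):
    """Chain time-sorted LSA-derived events into groups.

    events: time-sorted (ts, router, neighbor, kind), e.g. kind "down" or "up".
    ASSUMPTION: an event joins the current group if it arrives within
    window_s seconds of the previous event.
    """
    groups = []
    for ev in events:
        if groups and ev[0] - groups[-1][-1][0] <= window_s:
            groups[-1].append(ev)
        else:
            groups.append([ev])
    return groups


def summarize(group):
    """Collapse a group into its distinct links and event kinds.

    Sorting each (router, neighbor) pair merges the reports from both ends of a link.
    """
    links = sorted({tuple(sorted((ev[1], ev[2]))) for ev in group})
    kinds = sorted({ev[3] for ev in group})
    return links, kinds
```

An operator would then see one summarized group per incident instead of one message per LSA.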

Anomalies Found in the Shaikh04 Paper
– Intermittent hardware problem
  – Router periodically losing OSPF adjacencies
  – Risk of network partition if a 2nd failure occurred
– External link flaps
  – Congestion on an edge link causing lost messages
  – Lost adjacency leading to flapping routes
– Configuration errors
  – Two routers assigned the same IP address
  – Inefficient configuration leading to duplicate LSAs
– Vendor implementation bug
  – More frequent refreshing of LSAs than specified

Interdomain Route Monitoring

Motivation for BGP Monitoring
– Visibility into external destinations
  – What neighboring ASes are telling you
  – How you are reaching external destinations
– Detecting anomalies
  – Increases in the number of destination prefixes
  – Lost reachability to some destinations
  – Route hijacking
  – Instability of the routes
– Input to traffic-engineering tools
  – Knowing the current routes in the network
– Workload for testing routers
  – Realistic message traces to play back to routers

BGP Monitoring: A Wish List
– Ideally: knowing what the router knows
  – All externally-learned routes
  – Before policy has modified the attributes
  – Before a single best route is picked
– How to achieve this
  – Special monitoring session on routers that tells everything they have learned
  – Packet monitoring on all links with BGP sessions
– If you can’t do that, you could always do…
  – Periodic dumps of routing tables
  – BGP session to learn the best route from the router

Using Routers to Monitor BGP
– Talk to operational routers using SNMP or telnet at the command line
  (-) BGP table dumps are expensive
  (+) Table dumps show all alternate routes
  (-) Update dynamics lost
  (-) Restricted to interfaces provided by vendors
– Establish a “passive” BGP session (eBGP or iBGP) from a workstation running BGP software
  (+) BGP table dumps do not burden operational routers
  (-) Receives only best routes from the BGP neighbor
  (+) Update dynamics captured
  (+) Not restricted to interfaces provided by vendors

Collect BGP Data From Many Routers
[Figure: map of backbone routers in many U.S. cities and a route monitor]
– BGP is not a flooding protocol, so the route monitor needs BGP sessions with many routers

Detecting Important Routing Changes
– Large volume of BGP update messages
  – Around 2 million/day, and very bursty
  – Too much for an operator to manage
– Identify important anomalies
  – Lost reachability
  – Persistent flapping
  – Large traffic shifts
– Not the same as root-cause analysis
  – Identify changes and their effects
  – Focus on mitigation, rather than diagnosis
  – Diagnose causes if they occur in/near the AS

Challenge #1: Excess Update Messages
– A single routing change
  – Leads to multiple update messages
  – Affects the routing decision at multiple routers
[Figure: BGP updates from the routers feed a "BGP Update Grouping" stage, which outputs events and a list of persistent flapping prefixes]
– Group updates for a prefix with inter-arrival < 70 seconds, and flag prefixes with changes lasting > 10 minutes.
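The grouping rule on this slide can be sketched directly: consecutive updates for the same prefix belong to one event while their inter-arrival time is under 70 seconds, and a prefix is flagged as persistently flapping if any of its events lasts longer than 10 minutes. The update format `(timestamp_seconds, prefix)` and the function names are hypothetical.

```python
from collections import defaultdict

EVENT_TIMEOUT_S = 70    # inter-arrival threshold from the slide
PERSISTENT_S = 600      # 10 minutes


def group_updates(updates, timeout=EVENT_TIMEOUT_S):
    """Group time-sorted (ts, prefix) updates into per-prefix events.

    Returns {prefix: [(start_ts, end_ts, update_count), ...]}.
    """
    events = defaultdict(list)
    for ts, prefix in updates:
        evs = events[prefix]
        if evs and ts - evs[-1][1] < timeout:
            start, _, count = evs[-1]
            evs[-1] = (start, ts, count + 1)  # extend the current event
        else:
            evs.append((ts, ts, 1))  # start a new event
    return dict(events)


def persistent_flapping(events, min_duration=PERSISTENT_S):
    """Prefixes with at least one event lasting longer than min_duration."""
    return sorted(
        p for p, evs in events.items()
        if any(end - start > min_duration for start, end, _ in evs)
    )
```

A prefix updated every minute keeps extending the same event, so it is caught by the duration check even though each individual gap looks normal.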

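Determine “Event Timeout”
[Figure: cumulative distribution of BGP update inter-arrival time, including BGP beacon data; marked point (70 s, 98%)]
– About 98% of inter-arrival times are at most 70 seconds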
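Reading a timeout off the CDF amounts to computing per-prefix inter-arrival times and taking an empirical quantile. The sketch below does that; the 98% target mirrors the marked point on the plot, while the input format and the nearest-rank quantile definition are assumptions.

```python
import math


def inter_arrival_times(updates):
    """Gaps between consecutive updates for the same prefix.

    updates: iterable of (timestamp_seconds, prefix), in any order.
    """
    last = {}
    gaps = []
    for ts, prefix in sorted(updates):
        if prefix in last:
            gaps.append(ts - last[prefix])
        last[prefix] = ts
    return gaps


def empirical_quantile(values, q):
    """Smallest value v such that at least a fraction q of values are <= v."""
    vals = sorted(values)
    if not vals:
        raise ValueError("need at least one value")
    k = max(math.ceil(q * len(vals)) - 1, 0)
    return vals[k]
```

Applied to a full day of updates, `empirical_quantile(inter_arrival_times(updates), 0.98)` would give the kind of cutoff the slide reports; the resulting number depends entirely on the data.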

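Event Duration: Persistent Flapping
[Figure: complementary cumulative distribution of event duration; marked point (600 s, 0.1%), labeled "Long Events"]
– About 0.1% of events last longer than 600 seconds (10 minutes)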

Detecting Persistent Flapping
– Significant persistent flapping
  – 15.2% of all BGP update messages
  – … though a small number of destination prefixes
  – Surprising, especially since flap damping is used
– Types of persistent flapping
  – Conservative flap-damping parameters (78.6%)
  – Protocol oscillations, e.g., MED oscillation (18.3%)
  – Unstable interface or BGP session (3.0%)

Example: Unstable eBGP Session
[Figure: AT&T border routers EA, EB, EC, and ED, a peer, a customer, and destination prefix p]
– Flap-damping parameters are session-based
– Damping is not implemented for iBGP sessions

Challenge #2: Identify Important Events
– Major concerns of network operators
  – Changes in reachability
  – Heavy load of routing messages on the routers
  – Flow of the traffic through the network
[Figure: events feed an "Event Classification" stage that outputs "typed" events: No Disruption, Loss/Gain of Reachability, Internal Disruption, Single External Disruption, Multiple External Disruption]
– Classify events by the type of impact they have on the network

Event Category – “No Disruption”
[Figure: AT&T border routers EA through EE, neighbors AS 1 and AS 2, prefix p; no traffic shift]
– “No Disruption”: none of the border routers has a traffic shift

Event Category – “Internal Disruption”
[Figure: AT&T border routers EA through EE, neighbors AS 1 and AS 2, prefix p; internal traffic shift]
– “Internal Disruption”: all of the traffic shifts are internal

Event Type: “Single External Disruption”
[Figure: AT&T border routers EA through EE, neighbors AS 1 and AS 2, prefix p; external traffic shift]
– “Single External Disruption”: traffic at one exit point shifts to other exit points
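One way to operationalize these categories is to compare, for each border router, the egress point it uses for the prefix before and after the event. The sketch below rests on an interpretation that is an assumption, not the paper's exact definitions: an "external" shift is a border router that was exiting via its own external route and afterward uses a different exit, and loss/gain means the prefix goes from reachable somewhere in the AS to reachable nowhere (or the reverse).

```python
def classify_event(before, after):
    """Classify one routing event for a single prefix.

    before, after: {border_router: egress_router, or None if it has no route}.
    ASSUMPTION: an external shift is a router that used its own external
    route (egress == itself) before the event and a different exit after.
    """
    reachable_before = any(e is not None for e in before.values())
    reachable_after = any(e is not None for e in after.values())
    if reachable_before != reachable_after:
        return "Loss/Gain of Reachability"
    routers = set(before) | set(after)
    changed = [r for r in routers if before.get(r) != after.get(r)]
    if not changed:
        return "No Disruption"
    external = [r for r in changed if before.get(r) == r]
    if not external:
        return "Internal Disruption"
    if len(external) == 1:
        return "Single External Disruption"
    return "Multiple External Disruption"
```

Under this reading, an internal disruption corresponds to routers picking a different existing exit (for example, after an IGP cost change), while every exit point that stops using its own route adds one external shift.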

Statistics on Event Classification

Event category               | Events | Updates
No Disruption                | 50.3%  | 48.6%
Internal Disruption          | 15.6%  | 3.4%
Single External Disruption   | 20.7%  | 7.9%
Multiple External Disruption | 7.4%   | 18.2%
Loss/Gain of Reachability    | 6.0%   | 21.9%

Challenge #3: Multiple Destinations
– A single routing change
  – Affects multiple destination prefixes
[Figure: "typed" events feed an "Event Correlation" stage that outputs clusters]
– Group events of the same type that occur close in time
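Event correlation can be sketched as: sort the typed events by start time, separate them by type, and start a new cluster whenever the gap to the previous event of that type exceeds a window. The 70-second window and the event tuple format are hypothetical choices for illustration; the slide does not give a value.

```python
from collections import defaultdict


def correlate(events, window_s=70.0):
    """Cluster typed events of the same type that occur close in time.

    events: iterable of (start_ts, event_type, prefix).
    ASSUMPTION: window_s is an illustrative gap threshold.
    Returns a list of (event_type, [prefixes]) clusters.
    """
    by_type = defaultdict(list)
    for ev in sorted(events):
        by_type[ev[1]].append(ev)
    clusters = []
    for etype, evs in by_type.items():
        current = [evs[0]]
        for ev in evs[1:]:
            if ev[0] - current[-1][0] <= window_s:
                current.append(ev)
            else:
                clusters.append((etype, [e[2] for e in current]))
                current = [ev]
        clusters.append((etype, [e[2] for e in current]))
    return clusters
```

A large cluster of one type, such as many "single external disruption" events at once, is what an eBGP session reset would look like.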

Main Causes of Large Clusters
– External BGP session resets
  – Failure/recovery of an external BGP session
  – E.g., session to another large tier-1 ISP
  – Caused “single external disruption” events
  – Validated by looking at syslog reports on routers
– Hot-potato routing changes
  – Failure/recovery of an intradomain link
  – E.g., leads to changes in IGP path costs
  – Caused “internal disruption” events
  – Validated by looking at OSPF measurements

Challenge #4: Popularity of Destinations
– Impact of event on traffic
  – Depends on the popularity of the destinations
[Figure: clusters and Netflow data feed a "Traffic Impact Prediction" stage that outputs large disruptions]
– Weight the group of destinations by the traffic volume

Traffic Impact Prediction
– Traffic weight
  – Per-prefix measurements from Netflow
  – 10% of prefixes account for 90% of traffic
– Traffic weight of a cluster
  – The sum of the “traffic weight” of its prefixes
– Flag clusters with heavy traffic
  – A few large clusters have large traffic weight
  – Mostly session resets and hot-potato changes
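The weighting step follows directly from the definitions above: a prefix's traffic weight is its share of measured bytes, and a cluster's weight is the sum over its prefixes. The sketch flags clusters above a threshold; the threshold value and input formats are hypothetical.

```python
def prefix_weights(bytes_by_prefix):
    """Fraction of total measured traffic carried by each prefix."""
    total = sum(bytes_by_prefix.values())
    return {p: b / total for p, b in bytes_by_prefix.items()}


def heavy_clusters(clusters, weights, threshold=0.01):
    """Clusters whose summed traffic weight reaches the threshold, heaviest first.

    clusters: list of (cluster_id, [prefixes]).
    Prefixes with no measured traffic contribute zero weight.
    """
    scored = [(cid, sum(weights.get(p, 0.0) for p in prefixes)) for cid, prefixes in clusters]
    return sorted((c for c in scored if c[1] >= threshold), key=lambda c: c[1], reverse=True)
```

Because traffic is so concentrated on a few prefixes, a small cluster can outweigh a much larger one, which is why ranking by weight rather than cluster size matters.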

Conclusions
– Network troubleshooting from the inside
  – Traffic, topology, and routing data
  – Easier to understand what’s going on
  – … though still challenging to collect/analyze data
– Traffic measurement
  – SNMP, packet monitoring, and flow monitoring
– Routing monitors
  – Track network state and identify anomalies
  – Intradomain monitor capturing LSAs
  – BGP monitor capturing BGP updates

Next Time: BGP Routing Table Size
– Three papers
  – “On characterizing BGP routing table growth”
  – “An empirical study of router response to large BGP routing table load”
  – “A framework for interdomain route aggregation”
– Review only the first paper
  – Summary
  – Why accept
  – Why reject
  – Avenues for future work
– Optional
  – Vannevar Bush, “As We May Think” (1945)