1 Diagnosing Network Disruptions with Network-wide Analysis Yiyi Huang, Nick Feamster, Anukool Lakhina*, Jim Xu College of Computing, Georgia Tech * Guavus,

Slides:



Advertisements
Similar presentations
Enterprise Network Troubleshooting Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina, Manas Khadilkar, Aditi Thanekar)
Advertisements

Nick Feamster Georgia Tech
Wenke Lee and Nick Feamster Georgia Tech Botnet and Spam Detection in High-Speed Networks.
Improving Internet Availability. Some Problems Misconfiguration Miscoordination Efficiency –Market efficiency –Efficiency of end-to-end paths Scalability.
Dynamics of Online Scam Hosting Infrastructure
Data Mining Challenges for Network Management Nick Feamster, Georgia Tech Dave Andersen, CMU (joint with Jay Lepreau and Emulab)
Diagnosing Network Disruptions with Network-wide Analysis Yiyi Huang, Nick Feamster, Anukool Lakhina, Jim Xu College of Computing, Georgia Tech Boston.
Networking Research Nick Feamster CS Nick Feamster Ph.D. from MIT, Post-doc at Princeton this fall Arriving January 2006 –Here off-and-on until.
Challenges in Making Tomography Practical
Data-Plane Accountability with In-Band Path Diagnosis Murtaza Motiwala, Nick Feamster Georgia Tech Andy Bavier Princeton University.
Research Summary Nick Feamster. The Big Picture Improving Internet availability by making networks easier to operate Three approaches –From the ground.
Spamming with BGP Spectrum Agility Anirudh Ramachandran Nick Feamster Georgia Tech.
Nick Feamster Research Interest: Networked Systems Arriving January 2006 Likely teaching CS 7260 in Spring 2005 Here off-and-on until then. works.
Characterizing VLAN-Induced Sharing in a Campus Network
Path Splicing Nick Feamster, Murtaza Motiwala, Megan Elmore, Santosh Vempala.
Multihoming and Multi-path Routing
Nick Feamster Research: Network security and operations Teaching CS 7260 in Spring 2007 CS 7001 Mini-projects: –
Nick Feamster Research: Network security and operations –Helping network operators run the network better –Helping users help themselves Lab meetings:
Nick Feamster Research: Network security and operations –Helping network operators run the network better –Helping users help themselves Lab meetings:
The Datapository Dave Andersen, CMU James Moss, CMU Nick Feamster, Georgia Tech
Nick Feamster Research: Network security and operations –Helping network operators run the network better –Helping users help themselves Lab meetings:
Network Troubleshooting: rcc and Beyond Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina)
Network Security Highlights Nick Feamster Georgia Tech.
1 Dynamics of Online Scam Hosting Infrastructure Maria Konte, Nick Feamster Georgia Tech Jaeyeon Jung Intel Research.
1 Network-Level Spam Detection Nick Feamster Georgia Tech.
Nick Feamster Georgia Tech Joint work with Mukarram bin Tariq, Murtaza Motiwala, Yiyi Huang, Mostafa Ammar, Anukool Lakhina, Jim Xu Detecting and Diagnosing.
Network Operations Research Nick Feamster
Path Splicing with Network Slicing Nick Feamster Murtaza Motiwala Santosh Vempala.
Cabo: Concurrent Architectures are Better than One Nick Feamster, Georgia Tech Lixin Gao, UMass Amherst Jennifer Rexford, Princeton.
Network Security Highlights Nick Feamster Georgia Tech.
Nick Feamster Georgia Tech
Multihoming and Multi-path Routing
© 2006 Cisco Systems, Inc. All rights reserved. MPLS v MPLS VPN Technology Introducing the MPLS VPN Routing Model.
BGP Overview Processing BGP Routes.
URCA: Pulling out Anomalies by their Root Causes Fernando Silveira and Christophe Diot.
© 2006 Cisco Systems, Inc. All rights reserved. MPLS v2.2—5-1 MPLS VPN Implementation Configuring BGP as the Routing Protocol Between PE and CE Routers.
Technical Aspects of Peering Session 4. Overview Peering checklist/requirements Peering step by step Peering arrangements and options Exercises.
BGP route propagation between neighboring domains Renata Teixeira Laboratoire LIP6 – CNRS University Pierre et Marie Curie – Paris 6 with Steve Uhlig (Delft.
Detectability of Traffic Anomalies in Two Adjacent Networks Augustin Soule, Haakon Ringberg, Fernando Silveira, Jennifer Rexford, Christophe Diot.
The Connectivity and Fault-Tolerance of the Internet Topology
Fundamentals of Computer Networks ECE 478/578 Lecture #18: Policy-Based Routing Instructor: Loukas Lazos Dept of Electrical and Computer Engineering University.
1 BGP Anomaly Detection in an ISP Jian Wu (U. Michigan) Z. Morley Mao (U. Michigan) Jennifer Rexford (Princeton) Jia Wang (AT&T Labs)
S ufficient C onditions to G uarantee P ath V isibility Akeel ur Rehman Faridee
More on BGP Check out the links on politics: ICANN and net neutrality To read for next time Path selection big example Scaling of BGP.
1 Design and implementation of a Routing Control Platform Matthew Caesar, Donald Caldwell, Nick Feamster, Jennifer Rexford, Aman Shaikh, Jacobus van der.
A Routing Control Platform for Managing IP Networks Jennifer Rexford Princeton University
Measurement and Monitoring Nick Feamster Georgia Tech.
Understanding Network Failures in Data Centers: Measurement, Analysis and Implications Phillipa Gill University of Toronto Navendu Jain & Nachiappan Nagappan.
Routing Measurements Matt Zekauskas, ITF Meeting 2006-Apr-24.
Towards a Logic for Wide- Area Internet Routing Nick Feamster Hari Balakrishnan.
CS 3700 Networks and Distributed Systems Inter Domain Routing (It’s all about the Money) Revised 8/20/15.
Remote Trigger Black Hole 111. Remotely Triggered Black Hole Filtering We use BGP to trigger a network wide response to a range of attack flows. A simple.
Automatic network configuration: Position presentation Cristel Pelsser WODNAFO, Feb
The New Policy for Enterprise Networking Robert Bays Chief Scientist June 2002.
© 2005 Cisco Systems, Inc. All rights reserved. BGP v3.2—6-1 Scaling Service Provider Networks Scaling IGP and BGP in Service Provider Networks.
© 2005 Cisco Systems, Inc. All rights reserved. BGP v3.2—5-1 Customer-to-Provider Connectivity with BGP Connecting a Multihomed Customer to a Single Service.
© 2005 Cisco Systems, Inc. All rights reserved. BGP v3.2—2-1 BGP Transit Autonomous Systems Forwarding Packets in a Transit AS.
© 2005 Cisco Systems, Inc. All rights reserved. BGP v3.2—7-1 Optimizing BGP Scalability Improving BGP Convergence.
Route Selection Using Attributes
1 Effective Diagnosis of Routing Disruptions from End Systems Ying Zhang Z. Morley Mao Ming Zhang.
Predictive Analytics derived from HVAC and PMU data at UCSD Chuck Wells Industry Principal OSIsoft, LLC 1.
CS 3700 Networks and Distributed Systems
Scaling Service Provider Networks
BGP 1. BGP Overview 2. Multihoming 3. Configuring BGP.
CS 3700 Networks and Distributed Systems
Jian Wu (University of Michigan)
Connecting an Enterprise Network to an ISP Network
Scaling Service Provider Networks
Backbone Networks Mike Freedman COS 461: Computer Networks
Computer Networks Protocols
Presentation transcript:

1 Diagnosing Network Disruptions with Network-wide Analysis Yiyi Huang, Nick Feamster, Anukool Lakhina*, Jim Xu College of Computing, Georgia Tech * Guavus, Inc.

2 Problem Overview Network routing disruptions are frequent –On Abilene from January 1,2006 to June 30, s, 282 disruptions How to help network operators deal with disruptions quickly? –Massive amounts of data –Lots of noise –Need for fast detection

3 Existing Approaches Many existing tools and data sources –Tivoli Netcool, SNMP, Syslog, IGP, BGP, etc. Possible issues –Noise level –Time to detection Network-wide correlation/analysis –Not just reporting on manually specified traps This talk: Explore complementary data sources –First step: Mining BGP routing data

4 Challenges: Analyzing Routing Data Large volume of data Lack of semantics in a single stream of routing updates Needed: Mining, not simple reporting Idea: Can we improve detection by mining network- wide dependencies across routing streams?

5 Key Idea: Network-Wide Analysis Structure and configuration of the network gives rise to dependencies across routers Analysis should be cognizant of these dependencies. Dont treat streams of data independently. Big network events may cause correlated blips.

6 Overview

7 Approach: network-wide, multivariate analysis –Model network-wide dependencies directly from the data –Extract common trends –Look for deviations from those trends High detection rate (for acceptable false positives) –100% of node/link disruptions, 60% of peer disruptions Fast detection –Current time to reporting (in minutes) Detection

8 Identification: Approach Classify disruptions into four types –Internal node, internal link, peer, external node Track three features 1.Global iBGP next-hops 2.Local iBGP next-hops 3.Local eBGP next-hops Approach Goal

9 Identification: Results

10 Main Results 90% of local disruptions are visible in BGP –Many disruptions are low volume –Disruption size can vary by several orders of magnitude About 75% involve more than 2 routers –Analyze data across streams –BGP routing data is but one possible input data set Detection: –100% of node and link disruptions –60% of peer disruptions Identification –100% of node disruptions, –74% of link disruptions –93% of peer disruptions

11 Questions for Network Operators How happy are you with existing approaches? What are the most common types of network faults you must diagnose? How effective are existing tools in terms of: –Reducing the noise level in reporting? –Fast detection? What about incorporating other data (e.g., active probes)? Can you help us work with you to improve detection/identification of disruptions? Please send thoughts to feamster at cc.gatech.edu.