Enterprise Network Troubleshooting Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina, Manas Khadilkar, Aditi Thanekar)

Presentation transcript:

2 Three Disjoint Views of the Network
Policy: The operators' wish list
Static: What the configurations say
Dynamic: The behavior that users witness
[Diagram labels: Policy, Static, Dynamic; Generation; Error Checking and Deployment; rancid/rcc, FIREMAN/Lumeta; ping, traceroute, …]
Independent analyses!

3 A Closer Look
Proactive analysis
  – Fault avoidance
  – Policy conformance
Reactive diagnosis
  – Correcting network faults
      Detection
      Localization
  – Active and passive measurements
  – Need users' perspective
Idea: These analyses should inform each other
Two studies
  1. Routing
  2. Firewalls

4 Catastrophic Configuration Faults

"…a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint."
-- news.com, April 25, 1997

"Microsoft's websites were offline for up to 23 hours...because of a [router] misconfiguration…it took nearly a day to determine what was wrong and undo the changes."
-- wired.com, January 25, 2001

"WorldCom Inc…suffered a widespread outage on its Internet backbone that affected roughly 20 percent of its U.S. customer base. The network problems…affected millions of computer users worldwide. A spokeswoman attributed the outage to "a route table issue.""
-- cnn.com, October 3, 2002

"A number of Covad customers went out from 5pm today due to, supposedly, a DDOS (distributed denial of service attack) on a key Level3 data center, which later was described as a route leak (misconfiguration)."
-- dslreports.com, February 23, 2004

5 Case 1: Network-Wide Routing Analysis
Proactive routing configuration analysis
Idea: Analyze configuration before deployment
[Diagram labels: Configure, Detect Faults, Deploy, rcc]
Many faults can be detected with static analysis.
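
As one illustration of a fault that static analysis can catch before deployment, the sketch below scans a router configuration for route-maps that are applied to a BGP neighbor but never defined. It is a minimal sketch under stated assumptions: Cisco IOS-style syntax, a hypothetical helper named `undefined_route_maps`, and a made-up sample configuration. It is not rcc's implementation, and the slides do not say which checks rcc performs.

```python
import re

# Hypothetical helper for illustration only; not part of rcc.
# Matches lines such as: "neighbor 192.0.2.1 route-map NAME in"
NEIGHBOR_RE = re.compile(r"^\s*neighbor\s+\S+\s+route-map\s+(\S+)\s+(?:in|out)\s*$")
# Matches lines such as: "route-map NAME permit 10"
DEFINE_RE = re.compile(r"^\s*route-map\s+(\S+)\s+(?:permit|deny)\b")


def undefined_route_maps(config_text):
    """Return sorted names of route-maps referenced by a neighbor but never defined."""
    referenced, defined = set(), set()
    for line in config_text.splitlines():
        match = NEIGHBOR_RE.match(line)
        if match:
            referenced.add(match.group(1))
            continue
        match = DEFINE_RE.match(line)
        if match:
            defined.add(match.group(1))
    return sorted(referenced - defined)


# Made-up configuration: CUSTOMER-OUT is applied but never defined.
SAMPLE_CONFIG = """\
router bgp 65000
 neighbor 192.0.2.1 remote-as 65001
 neighbor 192.0.2.1 route-map CUSTOMER-IN in
 neighbor 192.0.2.1 route-map CUSTOMER-OUT out
!
route-map CUSTOMER-IN permit 10
 match ip address prefix-list CUSTOMER-PREFIXES
"""

if __name__ == "__main__":
    print(undefined_route_maps(SAMPLE_CONFIG))
```

Running a check like this between the configure and deploy steps flags the dangling reference without touching a live router, which is the point of analyzing configuration before deployment.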

6 Operators Find Static Analysis Useful

"That's wicked!"
-- Nicolas Strina, ip-man.net

"Thanks again for a great tool."
-- Paul Piecuch, IT Manager

"...good to finally see more coverage of routing as distributed programming. From my experience, the principles of software engineering eliminate a vast majority of errors."
-- Joe Provo, rcn.com

"I find your approach useful, it is really not fun (but critical for the health of the network) to keep track of the inconsistencies among different routers…a configuration verifier like yours can give the operator a degree of confidence that the sky won't fall on his head real soon now."
-- Arnaud Le Tallanter, clara.net

7 Yes, but Surprises Happen!
Link failures
Node failures
Traffic volumes shift
Network devices wedged
…
Two problems
  – Detection
  – Localization

8 Detection: Analyze Routing Dynamics
Idea: Routers exhibit correlated behavior
Blips across signals may be more operationally interesting than any spike in one.

9 Detection: Three Types of Events
Single-router bursts
Correlated bursts
Multi-router bursts
[Figure labels: Common; Commonly missed using thresholds]
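
To make the contrast with thresholds concrete, the sketch below compares a per-router threshold against a simple network-wide check that looks for modest blips occurring on several routers in the same time bin. It is a simplified stand-in, not the authors' detection method: the function names, the fixed threshold of 50, the "twice the router's own median" rule, and the update counts are all assumptions made for illustration.

```python
from statistics import median


def single_router_alarms(counts, threshold=50):
    """Time bins where any one router's update count reaches a fixed threshold."""
    n_bins = len(next(iter(counts.values())))
    return [t for t in range(n_bins)
            if any(series[t] >= threshold for series in counts.values())]


def correlated_alarms(counts, factor=2.0, min_routers=3):
    """Time bins where at least `min_routers` routers exceed `factor` times
    their own median count in the same bin."""
    n_bins = len(next(iter(counts.values())))
    baselines = {router: median(series) for router, series in counts.items()}
    alarms = []
    for t in range(n_bins):
        blipping = [r for r, series in counts.items()
                    if series[t] > factor * baselines[r]]
        if len(blipping) >= min_routers:
            alarms.append(t)
    return alarms


# Made-up routing-update counts per time bin for four routers.
# Bin 3: one large spike on r1 only. Bin 5: small blips on every router.
COUNTS = {
    "r1": [5, 6, 5, 80, 5, 15, 6, 5],
    "r2": [4, 5, 4, 5, 4, 12, 5, 4],
    "r3": [7, 6, 7, 6, 7, 18, 6, 7],
    "r4": [3, 3, 4, 3, 3, 9, 3, 4],
}

if __name__ == "__main__":
    print("threshold alarms:", single_router_alarms(COUNTS))
    print("correlated alarms:", correlated_alarms(COUNTS))
```

With this data the fixed threshold fires only on the single-router spike in bin 3, while the network-wide check fires only on bin 5, where no individual router comes close to the threshold.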

10 Localization: Joint Dynamic/Static
Which routers are border routers for that burst
Topological properties of routers in the burst
[Diagram labels: Static, Dynamic, Proactive, Analysis, Deployment, Reactive, Detection, Diagnosis/Correction]
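
A joint dynamic/static query of this kind can be sketched as a join between the set of routers seen in a burst (dynamic) and facts derived from configurations (static). Everything below is hypothetical: the router names, the role table, the link list, and the `summarize_burst` helper are assumptions for illustration, not the authors' tooling.

```python
def neighbors(links, router):
    """Routers directly connected to `router` in the static topology."""
    found = set()
    for a, b in links:
        if a == router:
            found.add(b)
        elif b == router:
            found.add(a)
    return found


def summarize_burst(burst_routers, static_roles, links):
    """Join a dynamic burst with static roles and topology.

    Returns the border routers involved in the burst and the routers outside
    the burst that are adjacent to every router in it (candidate locations).
    """
    border = sorted(r for r in burst_routers if static_roles.get(r) == "border")
    common = None
    for router in burst_routers:
        adjacent = neighbors(links, router)
        common = adjacent if common is None else common & adjacent
    common = (common or set()) - set(burst_routers)
    return {"border": border, "common_neighbors": sorted(common)}


# Hypothetical static data, as if extracted from router configurations.
STATIC_ROLES = {
    "atl-br1": "border", "atl-br2": "border",
    "atl-core1": "core", "atl-core2": "core",
}
LINKS = [
    ("atl-br1", "atl-core1"), ("atl-br1", "atl-core2"),
    ("atl-br2", "atl-core1"), ("atl-br2", "atl-core2"),
    ("atl-core1", "atl-core2"),
]

if __name__ == "__main__":
    # Hypothetical dynamic data: routers whose updates formed one burst.
    print(summarize_burst({"atl-br2", "atl-core1"}, STATIC_ROLES, LINKS))
```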

11 Case 2: Firewalls
Georgia Tech Campus Network
  – Research and Administrative Network
  – 180 buildings
  – 130+ firewalls
  – 1700+ switches
  – ports
Problem: Availability/Reachability
  – Flux in firewall, router, switch configurations
  – No common authority over changes made

12 Specific Focus: Firewall Configuration
Difficult to understand and audit configs
Subject to continual modifications
  – Roughly 1-2 touches per day
Federated policy, distributed dependencies
  – Each department has independent policies
  – Local changes may affect global behavior
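
The last point, that a local change can alter global behavior, can be sketched with a toy model in which a flow must be permitted by every firewall on its path. The model is an assumption made for illustration: first-match rule evaluation with default deny, rules written as `(action, source network, destination network, port)` tuples, and made-up addresses. It does not describe the Georgia Tech configuration or any particular firewall product.

```python
from ipaddress import ip_address, ip_network


def allows(rules, src, dst, port):
    """First-match evaluation of one firewall; default deny if nothing matches.

    A rule port of None matches any port.
    """
    for action, src_net, dst_net, rule_port in rules:
        if (ip_address(src) in ip_network(src_net)
                and ip_address(dst) in ip_network(dst_net)
                and rule_port in (None, port)):
            return action == "permit"
    return False


def reachable(path, src, dst, port):
    """A flow gets through only if every firewall on the path permits it."""
    return all(allows(rules, src, dst, port) for rules in path)


# Hypothetical campus-wide firewall: HTTPS to the 10.1.0.0/16 block is allowed.
CAMPUS = [("permit", "0.0.0.0/0", "10.1.0.0/16", 443)]

# Hypothetical department firewall before and after a local change that
# blocks one source block ahead of the existing permit rule.
DEPT_BEFORE = [("permit", "0.0.0.0/0", "10.1.2.0/24", 443)]
DEPT_AFTER = [("deny", "10.9.0.0/16", "10.1.2.0/24", None)] + DEPT_BEFORE

if __name__ == "__main__":
    flow = ("10.9.4.7", "10.1.2.10", 443)
    print("before:", reachable([CAMPUS, DEPT_BEFORE], *flow))
    print("after: ", reachable([CAMPUS, DEPT_AFTER], *flow))
```

The campus rules are identical in both runs, yet the end-to-end answer for the flow changes, which is why auditing each firewall in isolation is not enough when policy is federated.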

13 (Immediate) Open Issues
Reachability and reliability of controller
Service-level probes
  – Diagnostic tools != Service-level Happiness
Policy conformance