Enterprise Network Troubleshooting Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina, Manas Khadilkar, Aditi Thanekar)

1 Enterprise Network Troubleshooting Nick Feamster Georgia Tech (joint with Russ Clark, Yiyi Huang, Anukool Lakhina, Manas Khadilkar, Aditi Thanekar)

2 Three Disjoint Views of the Network
- Policy: The operator's wish list
- Static: What the configurations say
- Dynamic: The behavior that users witness
[Diagram: Policy, Static, Dynamic; Generation; Error Checking and Deployment; tools: rancid/rcc, FIREMAN/Lumeta, ping, traceroute, …]
Independent analyses!

3 A Closer Look
- Proactive analysis
  - Fault avoidance
  - Policy conformance
- Reactive diagnosis
  - Correcting network faults
    - Detection
    - Localization
  - Active and passive measurements
  - Need users' perspective
Idea: These analyses should inform each other
Two studies
1. Routing
2. Firewalls

4 Catastrophic Configuration Faults
- …a glitch at a small ISP… triggered a major outage in Internet access across the country. The problem started when MAI Network Services...passed bad router information from one of its customers onto Sprint. -- news.com, April 25, 1997
- Microsoft's websites were offline for up to 23 hours...because of a [router] misconfiguration…it took nearly a day to determine what was wrong and undo the changes. -- wired.com, January 25, 2001
- WorldCom Inc…suffered a widespread outage on its Internet backbone that affected roughly 20 percent of its U.S. customer base. The network problems…affected millions of computer users worldwide. A spokeswoman attributed the outage to "a route table issue." -- cnn.com, October 3, 2002
- A number of Covad customers went out from 5pm today due to, supposedly, a DDOS (distributed denial of service attack) on a key Level3 data center, which later was described as a route leak (misconfiguration). -- dslreports.com, February 23, 2004

5 Case 1: Network-Wide Routing Analysis
- Proactive routing configuration analysis
- Idea: Analyze configuration before deployment
[Diagram: Configure, Detect Faults (rcc), Deploy]
Many faults can be detected with static analysis.
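
To make the idea of checking configuration before deployment concrete, here is a minimal sketch of one kind of static check. It is not rcc's implementation; it assumes a hypothetical, already-parsed data model (`sessions`, mapping each router to the iBGP neighbors it configures) and a design in which iBGP is meant to be a full mesh.

```python
from itertools import combinations


def find_ibgp_faults(sessions):
    """Report router pairs whose iBGP session is missing or one-sided.

    `sessions` is a hypothetical parsed view of the configurations:
    router name -> set of iBGP neighbors that router configures.
    Assumes the intended design is a full mesh (no route reflectors).
    """
    faults = []
    for a, b in combinations(sorted(sessions), 2):
        a_to_b = b in sessions[a]
        b_to_a = a in sessions[b]
        if a_to_b and b_to_a:
            continue  # session configured at both ends
        kind = "one-sided session" if (a_to_b or b_to_a) else "missing session"
        faults.append((kind, a, b))
    return faults


# Illustrative configurations: r2 never configures a session to r3.
configs = {
    "r1": {"r2", "r3"},
    "r2": {"r1"},
    "r3": {"r1", "r2"},
}
faults = find_ibgp_faults(configs)
print(faults)
```

Because the check reads only configuration, it can run in the "Detect Faults" step, before anything is deployed.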

6 Operators Find Static Analysis Useful
- That's wicked! -- Nicolas Strina, ip-man.net
- Thanks again for a great tool. -- Paul Piecuch, IT Manager
- ...good to finally see more coverage of routing as distributed programming. From my experience, the principles of software engineering eliminate a vast majority of errors. -- Joe Provo, rcn.com
- I find your approach useful, it is really not fun (but critical for the health of the network) to keep track of the inconsistencies among different routers…a configuration verifier like yours can give the operator a degree of confidence that the sky won't fall on his head real soon now. -- Arnaud Le Tallanter, clara.net

7 Yes, but Surprises Happen!
- Link failures
- Node failures
- Traffic volumes shift
- Network devices wedged
- …
Two problems
- Detection
- Localization

8 Detection: Analyze Routing Dynamics
- Idea: Routers exhibit correlated behavior
- Blips across signals may be more operationally interesting than any spike in one.

9 Detection: Three Types of Events
- Single-router bursts
- Correlated bursts
- Multi-router bursts
[Figure labels: "Common"; "Commonly missed using thresholds"]
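
The contrast between a per-router threshold and a correlated view can be sketched as follows. The slides do not specify the detection method, so this is a deliberately simple stand-in: it assumes hypothetical per-router routing-update counts per time bin, and it flags a bin when several routers deviate modestly at the same time. The names `threshold_alarms`, `correlated_alarms`, `z_min`, and `min_routers` are illustrative only.

```python
from statistics import mean, pstdev


def zscores(series):
    """Standardize one router's update counts (0.0 if the series is flat)."""
    mu, sigma = mean(series), pstdev(series)
    return [0.0 if sigma == 0 else (x - mu) / sigma for x in series]


def threshold_alarms(counts, limit):
    """Bins in which any single router exceeds a fixed update-count limit."""
    n_bins = len(next(iter(counts.values())))
    return [t for t in range(n_bins)
            if any(series[t] > limit for series in counts.values())]


def correlated_alarms(counts, z_min=1.5, min_routers=3):
    """Bins in which at least `min_routers` routers deviate together."""
    scores = {router: zscores(series) for router, series in counts.items()}
    n_bins = len(next(iter(counts.values())))
    return [t for t in range(n_bins)
            if sum(1 for z in scores.values() if z[t] >= z_min) >= min_routers]


# Made-up data: router D has one large spike at bin 2; routers A, B, and C
# each show only a small blip, but all at bin 5.
counts = {
    "A": [10, 10, 10, 10, 10, 20, 10, 10],
    "B": [8, 8, 8, 8, 8, 16, 8, 8],
    "C": [12, 12, 12, 12, 12, 22, 12, 12],
    "D": [10, 10, 500, 10, 10, 10, 10, 10],
}
print(threshold_alarms(counts, limit=100))  # [2]
print(correlated_alarms(counts))            # [5]
```

With these numbers the fixed threshold sees only the single-router spike, while the correlated check sees only the small simultaneous blips, which is the kind of event a threshold tends to miss.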

10 Localization: Joint Dynamic/Static
- Which routers are border routers for that burst?
- Topological properties of routers in the burst
[Diagram: Static, Dynamic; Proactive: Analysis, Deployment; Reactive: Detection, Diagnosis/Correction]
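
One way to picture the joint dynamic/static step is as a join between the set of routers seen in a burst and facts read from the configurations. The sketch below is an assumption-laden illustration, not the authors' method: `ebgp_sessions` and `links` are hypothetical static inputs, and "adjacent to every router in the burst" is just one example of a topological property.

```python
def localize(burst_routers, ebgp_sessions, links):
    """Join a dynamic burst with static configuration facts.

    burst_routers: routers observed in the burst (dynamic data).
    ebgp_sessions: hypothetical map, router -> list of external peers
                   taken from the configurations (static data).
    links:         hypothetical map, router -> set of adjacent routers.
    """
    burst = set(burst_routers)
    # Border routers are the burst members with at least one eBGP session.
    border = sorted(r for r in burst if ebgp_sessions.get(r))
    # Routers adjacent to every burst member are candidate common causes.
    neighbor_sets = [links.get(r, set()) for r in burst]
    common = set.intersection(*neighbor_sets) - burst if neighbor_sets else set()
    return {"border_routers": border, "common_neighbors": sorted(common)}


# Made-up topology and peers for illustration.
ebgp_sessions = {"r1": ["AS65001"], "r4": ["AS65002"]}
links = {
    "r1": {"core1", "r2"},
    "r2": {"core1", "r1"},
    "r3": {"core1"},
    "core1": {"r1", "r2", "r3"},
}
result = localize(["r1", "r2", "r3"], ebgp_sessions, links)
print(result)
```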

11 Case 2: Firewalls
- Georgia Tech Campus Network
  - Research and Administrative Network
  - 180 buildings
  - 130+ firewalls
  - 1700+ switches
  - 55000+ ports
- Problem: Availability/Reachability
  - Flux in firewall, router, switch configurations
  - No common authority over changes made

12 Specific Focus: Firewall Configuration
- Difficult to understand and audit configs
- Subject to continual modifications
  - Roughly 1-2 touches per day
- Federated policy, distributed dependencies
  - Each department has independent policies
  - Local changes may affect global behavior
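
The point that a local change can affect global behavior can be illustrated with a small sketch. It assumes a simplified, hypothetical rule format (`action`, source network, destination network, port), first-match semantics with default deny, and made-up private addresses; real firewall configurations are considerably richer.

```python
from ipaddress import ip_address, ip_network


def allows(rules, src, dst, port):
    """First-match evaluation; deny by default when no rule matches."""
    for action, src_net, dst_net, rule_port in rules:
        if (ip_address(src) in ip_network(src_net)
                and ip_address(dst) in ip_network(dst_net)
                and rule_port in (port, "any")):
            return action == "allow"
    return False


def reachable(path, src, dst, port):
    """A flow gets through only if every firewall on the path allows it."""
    return all(allows(rules, src, dst, port) for rules in path)


# Hypothetical campus-level and department-level rule sets.
campus = [("allow", "0.0.0.0/0", "10.2.0.0/16", 443)]
dept_before = [("allow", "0.0.0.0/0", "10.2.3.0/24", 443)]
# A local edit: the department blocks another campus subnet.
dept_after = [("deny", "10.1.0.0/16", "10.2.3.0/24", "any")] + dept_before

flow = ("10.1.5.9", "10.2.3.7", 443)
print(reachable([campus, dept_before], *flow))  # True
print(reachable([campus, dept_after], *flow))   # False
```

The campus rules are unchanged in both cases; only the department's edit differs, yet end-to-end reachability for the flow changes.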

13 (Immediate) Open Issues
- Reachability and reliability of controller
- Service-level probes
  - Diagnostic tools != Service-level Happiness
- Policy conformance

