Motivation: Finding the root cause of a symptom

Presentation transcript:

Differential Provenance: Better Network Diagnostics with Reference Events
Ang Chen, Yang Wu, Andreas Haeberlen, Wenchao Zhou+, Boon Thau Loo
University of Pennsylvania, Georgetown University+

Motivation: Finding the root cause of a symptom
[Figure labels: Internet, 4.3.2.0/24, 4.3.3.0/24, DPI, Web server 1, Web server 2, Bob. Annotations: "Overly specific flow entry", "Traffic arriving at the wrong server !?!"]
Networks can (and frequently do!) have bugs.
Example: Software-defined networks.
We need a good debugger!

Debugging networks with provenance
[Figure: nodes A, B, C and packet P. Provenance vertices: C received packet; B sent packet; B received packet; Rule match on B; A sent packet; A received packet; Rule match on A; Rule installed by controller; Incoming packet at controller.]
Typical debuggers tell us what happened:
NetSight: Packet histories
Y!: Network provenance
Key benefit: Rich explanation of what, when, and why.
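A provenance tree like the one on this slide can be sketched as a small recursive data structure. The example below is only an illustration: the `Vertex` class, the `render` and `size` helpers, and the exact parent/child wiring of the vertex labels are assumptions, since the transcript lists the labels but not the edges of the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """One event in a provenance tree, plus the events that caused it."""
    event: str
    causes: list = field(default_factory=list)


def render(vertex, depth=0):
    """Return the tree as indented lines: the effect first, its causes below."""
    lines = ["  " * depth + vertex.event]
    for cause in vertex.causes:
        lines.extend(render(cause, depth + 1))
    return lines


def size(vertex):
    """Count vertices, visiting a shared cause once per path that reaches it."""
    return 1 + sum(size(cause) for cause in vertex.causes)


# Hypothetical wiring of the vertex labels from the slide.
controller = Vertex("Rule installed by controller",
                    [Vertex("Incoming packet at controller")])
a_sent = Vertex("A sent packet",
                [Vertex("A received packet"),
                 Vertex("Rule match on A", [controller])])
b_sent = Vertex("B sent packet",
                [Vertex("B received packet", [a_sent]),
                 Vertex("Rule match on B", [controller])])
root = Vertex("C received packet", [b_sent])

print("\n".join(render(root)))
```

Reading the printed tree from top to bottom answers "why did C receive the packet?" one causal step at a time, which is the kind of explanation the slide attributes to provenance-based debuggers.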

Problem: Explanation can be too big!
[Figure: a large provenance tree. Root: "Packet arrives at wrong server". Root cause: the faulty rule, "Rule 7: Next-hop=port2".]
The problem: Finding the root cause in a large provenance tree.

Key insight: Use reference events!
[Figure labels: Bob, DPI, Web server 1, Web server 2.]
Remember that some packets were routed correctly.
The same things should have happened to all packets!
Key insight: If we have both a (bad) symptom and a (good) reference, we only need to reason about the differences between them!

A new debugger
[Figure: Bob passes a fault and a reference to the debugger, which reports "Field 3 of config entry 4 is wrong!"]
Bob collects both a bad symptom and a good reference.
Bob sends both events to the debugger.
Debugger generates provenance, outputs difference.
Ideally, there is only one diff: the root cause!

Outline
Motivation: Network diagnostics
Background
Key insight
A new debugger
Differential provenance
Are references typically available?
Strawman approach
Our approach
Initial results
Conclusion

Are references typically available?
Survey: Posts on the ‘Outages’ mailing list in Sept-Dec 2014.
64 posts related to diagnostics.
42/64 (66%) posts involve both a fault and some reference.
Examples:
Some DNS servers have stale records, but others are good.
Probes sometimes fail, sometimes succeed.
More examples in the paper.

Strawman solution
[Figure: bad provenance - reference provenance = ?]
A strawman solution: Pick out the nodes that differ between the two trees.
Bad provenance: 201 nodes
Reference provenance: 156 nodes
Naïve diff: 278 nodes!

Why does the strawman not work?
[Figure annotation: Faulty rule]
Observation: The diff can be larger than the individual trees.
Reason #1: Differences that “do not matter”. E.g., timestamps, packet payloads, etc.
Reason #2: “Butterfly effect”. A small difference can change later events drastically!
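The observation that a naïve diff can exceed either tree is easy to reproduce on a toy example. The sketch below assumes each tree is flattened into a set of vertices and that the diff is a plain set difference in both directions; the vertex tuples, addresses, and timestamps are made up for illustration and are not the data behind the 201/156/278 figures.

```python
def naive_diff(bad, ref):
    """Vertices that appear in exactly one of the two trees."""
    return bad ^ ref


# Hypothetical vertices: (event, node, source IP, timestamp).
bad = {
    ("rule installed", "controller", None, 0),
    ("received", "A", "4.3.2.1", 10),
    ("sent", "A", "4.3.2.1", 11),
    ("received", "server2", "4.3.2.1", 12),
}
ref = {
    ("rule installed", "controller", None, 0),
    ("received", "A", "4.3.3.1", 20),
    ("sent", "A", "4.3.3.1", 21),
    ("received", "server1", "4.3.3.1", 22),
}

diff = naive_diff(bad, ref)
print(len(bad), len(ref), len(diff))
```

Only the final delivery differs in a way that matters, yet every vertex that carries a timestamp or source address counts as different, so the diff has more vertices than either input.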

Differential provenance
[Figure: bad provenance and reference provenance. Output: "Rule 7: change port", "Rule 9: change range".]
Approach: Change past events, and think about what could have happened.
(1) Find some early ‘differences’ in the trees.
(2) Change the faulty node to a correct equivalent.
(3) Use replay to determine what would have happened.
(4) Output the set of changes that align the trees.
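The four steps above can be sketched on a toy model. This is a heavily simplified illustration, not the authors' algorithm: it assumes a single switch with an ordered rule table, treats "replay" as re-running a first-match lookup, and considers only one candidate change. The rule table, rule 12, and the function names `replay` and `diagnose` are hypothetical; only rule 7 and its ports come from the slides.

```python
import ipaddress


def replay(rules, src):
    """Re-run forwarding for one packet: first matching rule wins."""
    addr = ipaddress.ip_address(src)
    for rule_id, (prefix, port) in rules.items():
        if addr in ipaddress.ip_network(prefix):
            return rule_id, port
    return None, None


def diagnose(rules, bad_src, good_src):
    """Return the rule changes that make the bad packet behave like the reference."""
    _, good_port = replay(rules, good_src)
    bad_rule, bad_port = replay(rules, bad_src)
    changes = []
    # (1) Find a difference: the two packets leave on different ports.
    if bad_rule is not None and bad_port != good_port:
        # (2) Change the faulty rule to match the reference behavior.
        patched = dict(rules)
        prefix, _ = rules[bad_rule]
        patched[bad_rule] = (prefix, good_port)
        # (3) Replay the bad packet against the patched rule table.
        _, new_port = replay(patched, bad_src)
        # (4) Keep the change only if the outcomes now align.
        if new_port == good_port:
            changes.append(
                f"Rule {bad_rule}: next hop should be port {good_port}, not {bad_port}"
            )
    return changes


# Hypothetical rule table; rule 7 is the overly specific entry.
rules = {
    7: ("4.3.2.0/24", 2),
    12: ("4.3.0.0/16", 1),
}
print(diagnose(rules, bad_src="4.3.2.1", good_src="4.3.3.1"))
```

A fuller version would search over many candidate changes and replay whole executions rather than a single lookup; the sketch only shows the change-replay-compare loop on one rule.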

Technical challenges
Challenge #1: Where do we start?
Heuristics: Change early events, minimum changes…
E.g., prefer changing 1 event to changing 1000 events.
Challenge #2: How should we make the change?
Approach: Think about what should have happened.
E.g., packet should go to switch 2, not 1.
Challenge #3: Irrelevant differences?
Approach: Equivalence relations between events.
E.g., IPs 4.3.2.1 and 4.3.3.1.
See paper for more details.
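One way to picture an equivalence relation between events is to map each event to a canonical form before comparing. The sketch below is an assumption-laden illustration: the event fields, the set of ignored fields, and the choice of 4.3.2.0/23 as the class containing both 4.3.2.1 and 4.3.3.1 are all hypothetical, and the paper may define equivalence differently.

```python
import ipaddress

# Fields assumed not to matter for diagnosis (see Reason #1 earlier in the talk).
IGNORED_FIELDS = {"timestamp", "payload"}

# Hypothetical class of sources that should be treated alike.
SAME_CLASS = ipaddress.ip_network("4.3.2.0/23")


def canonical(event):
    """Drop ignored fields and replace in-class source IPs by the class itself."""
    out = {}
    for key, value in event.items():
        if key in IGNORED_FIELDS:
            continue
        if key == "src" and ipaddress.ip_address(value) in SAME_CLASS:
            value = str(SAME_CLASS)
        out[key] = value
    return out


def equivalent(a, b):
    """Two events are equivalent if their canonical forms are equal."""
    return canonical(a) == canonical(b)


bad_event = {"type": "received", "node": "A", "src": "4.3.2.1", "timestamp": 10}
ref_event = {"type": "received", "node": "A", "src": "4.3.3.1", "timestamp": 20}
other_event = {"type": "received", "node": "B", "src": "4.3.3.1", "timestamp": 20}

print(equivalent(bad_event, ref_event))    # same event up to irrelevant details
print(equivalent(bad_event, other_event))  # different node, so not equivalent
```

With such a relation, the two arrival events stop counting as differences, and only events that differ in a meaningful field remain for the debugger to reason about.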

Setup
[Figure labels: Internet, 4.3.2.0/24, 4.3.3.0/24, DPI, Web server 1. Annotation: "Overly specific flow entry".]
Platform: RapidNet
SDN: 6 switches, 2 servers
The symptom: misrouted packets from 4.3.2.0/24
The reference: packets from 4.3.3.0/24

Initial results
Fault: 201 nodes
Reference: 156 nodes
[Figure: the naïve diff of the fault and reference trees is contrasted with the output of differential provenance: "Rule 7: next hop should be port 1, not 2!"]
Differential provenance finds a single node (the faulty rule) to be the root cause!

Conclusion
Debugging networks is hard. Need good debuggers!
Provenance can find the causes of an event.
Problem: Explanation can be too detailed.
Idea: Use reference events.
Sufficient to find the (few) differences to the observed symptom.
New debugger based on differential provenance.
Result: Very precise diagnostics.
Ideally, can identify a single root cause!
Thanks!