I Know What Your Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks Nikhil Handigol With Brandon Heller, Vimal Jeyakumar, David Mazières,

Slides:



Advertisements
Similar presentations
Virtual Network Diagnosis as a Service Wenfei Wu (UW-Madison) Guohui Wang (Facebook) Aditya Akella (UW-Madison) Anees Shaikh (IBM System Networking)
Advertisements

Logically Centralized Control Class 2. Types of Networks ISP Networks – Entity only owns the switches – Throughput: 100GB-10TB – Heterogeneous devices:
Programming Protocol-Independent Packet Processors
CloudWatcher: Network Security Monitoring Using OpenFlow in Dynamic Cloud Networks or: How to Provide Security Monitoring as a Service in Clouds? Seungwon.
OpenFlow Costin Raiciu Using slides from Brandon Heller and Nick McKeown.
Leveraging SDN Layering to Systematically Troubleshoot Networks Brandon Heller ★ Colin Scott  Nick McKeown ⌘ Scott Shenker  Andreas Wundsam § Hongyi.
Software-Defined Networking, OpenFlow, and how SPARC applies it to the telecommunications domain Pontus Sköldström - Wolfgang John – Elisa Bellagamba November.
OpenFlow : Enabling Innovation in Campus Networks SIGCOMM 2008 Nick McKeown, Tom Anderson, et el. Stanford University California, USA Presented.
USING PACKET HISTORIES TO TROUBLESHOOT NETWORKS Presented by: Yi Gao Emnets Seminar
Troubleshooting SDNs Peyman Kazemian. Why SDN Troubleshooting SDN decouples software (control plane) from hardware (data plane). Opens doors for innovation.
SDN and Openflow.
Flowspace revisited OpenFlow Basics Flow Table Entries Switch Port MAC src MAC dst Eth type VLAN ID IP Src IP Dst IP Prot L4 sport L4 dport Rule Action.
“ElasticTree: Saving energy in data center networks“ by Brandon Heller, Seetharaman, Mahadevan, Yiakoumis, Sharma, Banerjee, McKeown presented by Nicoara.
Traffic Management - OpenFlow Switch on the NetFPGA platform Chun-Jen Chung( ) SriramGopinath( )
1 A Policy-aware Switching Layer for Data Centers Dilip Joseph Arsalan Tavakoli Ion Stoica University of California at Berkeley.
A victim-centric peer-assisted framework for monitoring and troubleshooting routing problems.
COS 420 Day 16. Agenda Assignment 3 Corrected Poor results 1 C and 2 Ds Spring Break?? Assignment 4 Posted Chap Due April 6 Individual Project Presentations.
Understanding Network Failures in Data Centers: Measurement, Analysis and Implications Phillipa Gill University of Toronto Navendu Jain & Nachiappan Nagappan.
An Overview of Software-Defined Network Presenter: Xitao Wen.
Presenter: Chi-Hung Lu 1. Problems Distributed applications are hard to validate Distribution of application state across many distinct execution environments.
Chapter 4 Queuing, Datagrams, and Addressing
Application-Aware Aggregation & Traffic Engineering in a Converged Packet-Circuit Network Saurav Das, Yiannis Yiakoumis, Guru Parulkar Nick McKeown Stanford.
Formal checkings in networks James Hongyi Zeng with Peyman Kazemian, George Varghese, Nick McKeown.
TELE202 Lecture 10 Internet Protocols (2) 1 Lecturer Dr Z. Huang Overview ¥Last Lecture »Internet Protocols (1) »Source: chapter 15 ¥This Lecture »Internet.
How SDNs will tame networks Nick McKeown Stanford University.
Composing Software Defined Networks Jennifer Rexford Princeton University With Joshua Reich, Chris Monsanto, Nate Foster, and.
How SDN will shape networking
Chapter 4: Managing LAN Traffic
OpenFlow: Enabling Technology Transfer to Networking Industry Nikhil Handigol Nikhil Handigol Cisco Nerd.
Frenetic: A Programming Language for Software Defined Networks Jennifer Rexford Princeton University Joint work with Nate.
Software-Defined Networks Jennifer Rexford Princeton University.
PA3: Router Junxian (Jim) Huang EECS 489 W11 /
Aaron Gember Aditya Akella University of Wisconsin-Madison
Where is the Debugger for my Software-Defined Network? [ndb]
VeriFlow: Verifying Network-Wide Invariants in Real Time
CS : Software Defined Networks 3rd Lecture 28/3/2013
Traffic Management - OpenFlow Switch on the NetFPGA platform Chun-Jen Chung( ) Sriram Gopinath( )
POSTECH DP&NM Lab. Internet Traffic Monitoring and Analysis: Methods and Applications (1) 5. Passive Monitoring Techniques.
NetFlow: Digging Flows Out of the Traffic Evandro de Souza ESnet ESnet Site Coordinating Committee Meeting Columbus/OH – July/2004.
workshop eugene, oregon What is network management? System & Service monitoring  Reachability, availability Resource measurement/monitoring.
© Jörg Liebeherr (modified by M. Veeraraghavan) 1 ICMP: A helper protocol to IP The Internet Control Message Protocol (ICMP) is the protocol used for error.
Metadata Management of Terabyte Datasets from an IP Backbone Network: Experience and Challenges Sue B. Moon and Timothy Roscoe.
Jennifer Rexford Princeton University MW 11:00am-12:20pm Measurement COS 597E: Software Defined Networking.
Networking Fundamentals. Basics Network – collection of nodes and links that cooperate for communication Nodes – computer systems –Internal (routers,
Next Generation Operating Systems Zeljko Susnjar, Cisco CTG June 2015.
Programming Languages for Software Defined Networks Jennifer Rexford and David Walker Princeton University Joint work with the.
Network Layer4-1 Datagram networks r no call setup at network layer r routers: no state about end-to-end connections m no network-level concept of “connection”
Motivation: Finding the root cause of a symptom
Jennifer Rexford Princeton University MW 11:00am-12:20pm Data-Plane Verification COS 597E: Software Defined Networking.
1 Switching and Forwarding Sections Connecting More Than Two Hosts Multi-access link: Ethernet, wireless –Single physical link, shared by multiple.
Coping with Link Failures in Centralized Control Plane Architecture Maulik Desai, Thyagarajan Nandagopal.
Header Space Analysis: Static Checking for Networks Broadband Network Technology Integrated M.S. and Ph.D. Eun-Do Kim Network Standards Research Section.
SDN and Beyond Ghufran Baig Mubashir Adnan Qureshi.
1 Netflow Collection and Aggregation in the AT&T Common Backbone Carsten Lund.
1 Root-Cause Network Troubleshooting Optimizing the Process Tim Titus CTO PathSolutions.
Verifying & Testing SDN Data & Control Planes: Header Space Analysis.
Introduction to Networks
Programming SDN 1 Problems with programming with POX.
SDN challenges Deployment challenges
CIS 700-5: The Design and Implementation of Cloud Networks
SDN Network Updates Minimum updates within a single switch
Jennifer Rexford Princeton University
Praveen Tammana† Rachit Agarwal‡ Myungjin Lee†
Network and Services Management
Challenges in Network Troubleshooting In big scale networks, when an issue like latency or packet drops occur its very hard sometimes to pinpoint.
The Stanford Clean Slate Program
Software Defined Networking
Programmable Networks
COMPUTER NETWORKS CS610 Lecture-16 Hammad Khalid Khan.
Chapter 4: outline 4.1 Overview of Network layer data plane
Presentation transcript:

I Know What Your Packet Did Last Hop: Using Packet Histories to Troubleshoot Networks Nikhil Handigol With Brandon Heller, Vimal Jeyakumar, David Mazières, Nick McKeown NSDI 2014, Seattle, WA April 2, 2014

2 Bug Story: Incomplete Handover A B Switch X WiFi AP YWiFi AP Z Match: Action Src A, Dst B: Output to Y Match: Action Src A, Dst B: Output to Y

Network Outages make news headlines 3 Hosting.com's New Jersey data center was taken down on June 1, 2010, igniting a cloud outage and connectivity loss for nearly two hours… Hosting.com said the connectivity loss was due to a software bug in a Cisco switch that caused the switch to fail. On April 26, 2010, NetSuite suffered a service outage that rendered its cloud-based applications inaccessible to customers worldwide for 30 minutes… NetSuite blamed a network issue for the downtime. The Planet was rocked by a pair of network outages that knocked it off line for about 90 minutes on May 2, The outages caused disruptions for another 90 minutes the following morning.... Investigation found that the outage was caused by a fault in a router in one of the company's data centers.

Troubleshooting Networks is Hard Today 4 ping traceroute tcpdump/SPAN/sFlow SNMP Forwarding State Tedious and ad hoc Requires skill and experience Not guaranteed to provide helpful answers Lots and lots of graphs (source: NANOG Survey in “Automatic Test Packet Generation”, Hongyi Zeng, et. al.)

We want complete network visibility 5 ping traceroutesFlow SNMP Complete visibility: every event that ever happened to every packet 0100 We want to be here Visibility Spectrum

Talk Outline 1.How to achieve complete network visibility – An abstraction: Packet History – A platform: NetSight 2.Why achieving complete visibility is feasible – Data compression – MapReduce-style scale-out design 6

7 Forwarding State Packet History Packet history = Path taken by a packet + Header modifications + Switch state encountered

Our Troubleshooting Workflow 8 Forwarding State 1.Record and store all packet histories 2.Query and use packet histories of errant packets

NetSight A platform to capture and filter packet histories of interest 9

Postcard Collector 10 Control Plane Flow Table State Recorder

Postcard Collector 11 Control Plane Flow Table State Recorder Match ACT Match ACT

Postcard Collector 12 Control Plane Flow Table State Recorder

Postcard Collector 13 Control Plane Flow Table State Recorder Version -> Flow Table State Packet Header Switch ID Output port Version Step 1: Generate postcards

Reconstructing Packet Histories Step 2: Group postcards by generating packet Packet Header Switch ID Output port Version Packet Header Switch ID Output port Version Packet Header Switch ID Output port Version Packet Header Switch ID Output port Version 14

Reconstructing Packet Histories Step 3: Sort postcards using topology 15 Topo-sort

Postcard Collector 16 Control Plane Flow Table State Recorder … 7. … … 7. … … 7. … … 7. … … 7. … … 7. … … 7. … … 7. …

17 Control Plane Flow Table State Recorder Postcard Collector NetSight API Packet History Filter: A regular-expression-like language to specify packet histories of interest Reachability errors Isolation violation Black holes Waypoint routing violation Troubleshooting Apps Postcards Packet History Assembly Troubleshooting Application Troubleshooting App Filtered Packet Histories Filtered Packet Histories

Bug Story: Incomplete Handover 18 Packet History Filter Packet History WiFi AP YWiFi AP Z Switch X Packet History Filter “Pkts from server not reaching the client” Packet History Switch X: inport: p0, outports: [p1] mods: [...] state version: 3 Switch Y: inport p1, outports: [p3] mods:... … Y X

Troubleshooting Apps netshark nprof ndb netwatch 19 ndb : Interactive network debugger netwatch : Live network invariant monitor netshark : Network-wide wireshark nprof : Hierarchical network profiler

But will it scale? 20

Why generating postcards for every packet at every hop is crazy! Network Overhead – 64 byte-postcard/pkt/hop – Stanford Network: 5 hops avg, 1031 byte avg pkt – 31% extra traffic! Processing Overhead – Packet history assembly and filtering Storage Overhead 21

Why generating postcards for every packet at every hop is ^ crazy! Cost is OK for low-utilization networks – E.g., test networks, “bring-up phase” networks – Single server can handle entire Stanford traffic 22 not

Why generating postcards for every packet at every hop is ^ crazy! Huge redundancy in packet header fields – Only a few fields change – IP ID, TCP seq. no. – Postcards can be compressed to bytes/pkt 23 not Diff-based compression

Why generating postcards for every packet at every hop is ^ crazy! Postcard processing is embarrassingly parallel – Each packet history can be processed independent of other packet histories 24 not AssemblyFiltering

Scaling NetSight Performance 25 Switch NetSight Server Disk … …… … Postcards Shuffle Compressed Postcard Lists Compressed Packet Histories

Scaling NetSight Performance 26 Switch NetSight Server Disk … …… … Postcards Shuffle Compressed Postcard Lists Compressed Packet Histories

Scaling NetSight Performance 27 Switch NetSight Server Disk … …… … Postcards Shuffle Compressed Postcard Lists Compressed Packet Histories

Scaling NetSight Performance 28 Switch NetSight Server Disk … …… … Postcards Shuffle Compressed Postcard Lists Compressed Packet Histories

NetSight Variants 29

NetSight-SwitchAssist moves postcard compression to switches 30 Switch NetSight Server Disk … … … Shuffle Move postcard compression to switches with simple hardware mechanisms

NetSight-HostAssist exploits visibility from the hypervisor 31 Switch NetSight Server Disk … … … Shuffle HV Packet Header Packet Header Mini-postcards contain only unique pkt ID and switch state version (1) Store packet header at the hypervisor (2) Add unique pkt ID

Overhead Reduction in NetSight Basic (naïve) NetSight : 31% extra traffic in Stanford backbone network NetSight Switch-Assist: 7% NetSight Host-Assist: 3% 32

Takeaways Complete network visibility is possible – Packet History: a powerful troubleshooting abstraction that gives complete visibility – NetSight: a platform to capture and filter packet histories of interest Complete network visibility is feasible – It is possible to collect and filter packet histories at scale 33

34 Every N et S ight A PI