Fast and Efficient Flow and Loss Measurement for Data Center Networks

Fast and Efficient Flow and Loss Measurement for Data Center Networks
Yuliang Li, Rui Miao, Changhoon Kim, Minlan Yu

FlowRadar: Capture all flows on a fine time scale

Flow coverage
We need to inspect each individual flow, defined by its 5-tuple: source and destination IPs, ports, and protocol. Use cases:
- Detecting transient loops and blackholes
- Fine-grained traffic analysis

Temporal coverage
We need flow information on a fine time scale. Use cases:
- Loss rates at short time scales
- Timely attack detection (e.g., DoS)

Key insight: division of labor
Goal: monitor all the flows on a fine time scale.
- Mirroring puts the overhead at the collector, which has limited bandwidth and storage, so it needs sampling.
- NetFlow puts the overhead at the switches, which have limited per-packet processing time and limited memory (10s of MB).

Key insight: division of labor
Goal: monitor all the flows on a fine time scale.
FlowRadar divides the labor: switches keep small data structures with fixed operations per packet, and the collector does the heavy decoding, keeping the overhead low at both.

FlowRadar architecture
- Switches: compact flow counters with fixed per-packet operations, reported periodically.
- Collector: extracts per-flow counters across switches and feeds the flows & counters to analysis applications.

Challenge: how to handle collisions?
Handling hash collisions is hard:
- Large hash tables → high memory usage
- Linked lists / cuckoo hashing → multiple, non-constant memory accesses per packet

Solution: embrace collisions
Instead of resolving hash collisions, embrace them: this takes less memory and a constant number of memory accesses per packet.

Solution: embrace collisions
Embracing collisions means XORing all colliding flows into the same cell. Each flow is hashed by k functions (H1, H2, H3 in the example) into k cells of a counting table, and each cell keeps three fields: FlowXor (the XOR of all flows hashed there), FlowCount (the number of flows hashed there), and PacketCount (the total packets of those flows, i.e., the sum of S(x), where S(x) is the number of packets of flow x). [Based on the Invertible Bloom Lookup Table, arXiv 2011]

Flow filter to identify new flows
Per-packet processing in the encoded flowset:
1. Check and update the flow filter, a Bloom filter that identifies new flows.
2. Update the counting table: the first packet of a new flow updates all fields (FlowXor, FlowCount, PacketCount) of its k cells; subsequent packets update only PacketCount. [Based on the Invertible Bloom Lookup Table, arXiv 2011]
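
To make the per-packet update concrete, here is a minimal Python sketch of the encoded flowset. The names and sizes (K_HASHES, NUM_CELLS, the blake2b hashing, and the integer flow key) are my illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of FlowRadar's encoded flowset (illustrative names/sizes).
import hashlib

K_HASHES = 3       # hash functions per structure (assumed)
NUM_CELLS = 1024   # counting-table size (assumed)

def _hash(flow, salt, mod):
    """Hash a flow (an int encoding of the 5-tuple) with a salt, modulo `mod`."""
    data = repr((salt, flow)).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % mod

class EncodedFlowset:
    def __init__(self, seed=0):
        self.seed = seed                         # per-switch seed: different hashes per hop
        self.bloom = [False] * (8 * NUM_CELLS)   # flow filter (plain Bloom filter)
        self.cells = [{"flow_xor": 0, "flow_count": 0, "packet_count": 0}
                      for _ in range(NUM_CELLS)]

    def cell_indexes(self, flow):
        return [_hash(flow, (self.seed, "cell", i), NUM_CELLS) for i in range(K_HASHES)]

    def add_packet(self, flow):
        # 1. Check and update the flow filter.
        bits = [_hash(flow, (self.seed, "bloom", i), len(self.bloom))
                for i in range(K_HASHES)]
        is_new = not all(self.bloom[b] for b in bits)
        for b in bits:
            self.bloom[b] = True
        # 2. Update the counting table.
        for j in self.cell_indexes(flow):
            cell = self.cells[j]
            if is_new:                    # first packet of a new flow: all fields
                cell["flow_xor"] ^= flow
                cell["flow_count"] += 1
            cell["packet_count"] += 1     # every packet: only PacketCount
```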

Easy to implement in merchant silicon
- Switch data plane: fixed operations per packet; small memory (2.36 MB for 100K flows)
- Switch control plane: collects the small encoded flowset every 10 ms
We implemented it in the P4 language, making it portable to many P4-compatible switch chips.

FlowRadar architecture
Switches compact flow counters with fixed per-packet operations and periodically report their encoded flowsets. The collector decodes them in two stages: Stage 1, SingleDecode; Stage 2, network-wide decode. The resulting flows & counters feed the analysis applications.

Stage 1: SingleDecode
Input: an encoded flowset (flow filter plus counting table of FlowXor, FlowCount, PacketCount cells) from a single switch.
Output: per-flow counters (flow → #packets, e.g., a → S(a)).

Stage 1: SingleDecode
- Find a pure cell: a cell with exactly one flow (FlowCount = 1). Its FlowXor is that flow and its PacketCount is the flow's counter (e.g., flow a with 5 packets).
- Remove the decoded flow from all k cells it hashes to: XOR the flow out of FlowXor, decrement FlowCount by 1, and subtract its packet count from PacketCount.

Stage 1: SingleDecode
- Find a pure cell: a cell with exactly one flow.
- Remove the flow from all its cells; this can create more pure cells.
- Iterate until no pure cells remain.
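
Continuing the sketch above, SingleDecode is a peeling loop over pure cells (again a sketch under the same assumptions, not the paper's code):

```python
def single_decode(fs):
    """Peel pure cells (flow_count == 1) until none remain; returns {flow: #packets}."""
    decoded = {}
    while True:
        pure = next((c for c in fs.cells if c["flow_count"] == 1), None)
        if pure is None:
            return decoded                 # leftovers need network-wide decoding
        flow, pkts = pure["flow_xor"], pure["packet_count"]
        decoded[flow] = pkts
        for j in fs.cell_indexes(flow):    # remove the flow from all k cells,
            cell = fs.cells[j]             # possibly creating new pure cells
            cell["flow_xor"] ^= flow
            cell["flow_count"] -= 1
            cell["packet_count"] -= pkts
```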

Stage 1: SingleDecode
When no pure cells remain (e.g., a cell still holding c⊕d with FlowCount 2), SingleDecode stalls with some flows undecoded. We want to leverage network-wide information to decode more flows.

FlowRadar architecture
The collector's Stage 2, network-wide decode, combines the encoded flowsets periodically reported by all switches.

Key insight: overlapping sets of flows
- The sets of flows overlap across hops; we can use this redundancy to decode more flows.
- Use different hash functions across hops: a flow stuck in collisions at one switch may sit in a pure cell at another. In the slide's example, 5 cells suffice to decode 4 flows.
- The collector can leverage flowsets from all switches to decode more flows.

Key insight: overlapping sets of flows
- The sets of flows overlap across hops; use different hash functions across hops to exploit the redundancy.
- Provision switch memory based on avg(#flows), not max(#flows): SingleDecode handles normal cases, and network-wide decode handles bursts of flows.

Challenge 1: sets of flows are not fully overlapped
- Flows from one switch may go to different next hops.
- One switch may receive flows from multiple upstream hops.

Solution: zigzag decode with flow filters
A flow decoded by SingleDecode at one switch is checked against a neighboring switch's flow filter and, on a match, removed from that switch's counting table, enabling further SingleDecode there, and vice versa (e.g., decoding {a, b, c, d} at one switch helps decode {a, c, d, e} at the other); a sketch follows below. This approach:
- Generalizes to the entire network
- Needs no routing information
- Supports incremental deployment
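
A minimal sketch of the zigzag FlowDecode across switches, continuing the code above. Note it removes only the flow fields, since counters may differ across hops (Challenge 2, next slide), and it ignores Bloom-filter false positives that the full system must handle:

```python
def flow_decode(flowsets):
    """Zigzag FlowDecode: a flow peeled anywhere is removed from every switch
    whose flow filter contains it, which can create new pure cells there.
    Returns one set of decoded flows per flowset; counters are solved later."""
    def in_filter(fs, flow):
        return all(fs.bloom[_hash(flow, (fs.seed, "bloom", i), len(fs.bloom))]
                   for i in range(K_HASHES))

    decoded = [set() for _ in flowsets]
    changed = True
    while changed:
        changed = False
        for fs in flowsets:
            pure = next((c for c in fs.cells if c["flow_count"] == 1), None)
            if pure is None:
                continue
            flow, changed = pure["flow_xor"], True
            for k, other in enumerate(flowsets):
                if flow not in decoded[k] and in_filter(other, flow):
                    for j in other.cell_indexes(flow):   # flow fields only
                        other.cells[j]["flow_xor"] ^= flow
                        other.cells[j]["flow_count"] -= 1
                    decoded[k].add(flow)
    return decoded
```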

Challenge 2: different counter values
The counters of the same flow may differ across hops:
- Some packets may get dropped
- Some packets may still be in flight

Solution: calculate counters for each switch
Solve a linear equation system per switch:
- After FlowDecode we already know the full set of flows at each switch, so each cell's PacketCount is a known linear combination of per-flow counters, e.g., S(a)+S(b)+S(d)=14 for a cell holding a⊕b⊕d with FlowCount 3.
- The system is solvable with n flows and ≥ n independent combinations of counters.

Solving the linear equation system
Challenge: solving the linear equation system quickly. The matrix is very sparse: each column has exactly k ones. We use iterative solvers, sped up in two ways:
- Start the iteration from close approximations of the counters, obtained during FlowDecode.
- Stop the iteration once every result is within ±0.5 of an integer (counters are integers, so they can then be rounded).
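
A sketch of CounterDecode, continuing the code above, using SciPy's sparse LSQR solver (my choice of solver and names; the ±0.5 stopping rule is approximated here by rounding the converged solution, and the warm start x0 plays the role of the close approximations from FlowDecode):

```python
import numpy as np
from scipy.sparse import lil_matrix
from scipy.sparse.linalg import lsqr

def counter_decode(fs, flows, x0=None):
    """Solve A x = b, where row j says: the counters of all flows hashed to
    cell j sum to that cell's PacketCount. A has k ones per column."""
    flows = list(flows)
    A = lil_matrix((NUM_CELLS, len(flows)))
    for col, flow in enumerate(flows):
        for j in fs.cell_indexes(flow):
            A[j, col] += 1.0            # += in case two hashes pick the same cell
    b = np.array([c["packet_count"] for c in fs.cells], dtype=float)
    x = lsqr(A.tocsr(), b, x0=x0)[0]    # warm start from FlowDecode estimates
    return dict(zip(flows, np.rint(x).astype(int)))  # counters are integers
```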

FlowRadar architecture
Putting it together: each switch periodically reports its encoded flowset (flow filter plus counting table of FlowXor, FlowCount, PacketCount). The collector runs Stage 1 (SingleDecode) and Stage 2 (network-wide decode), which splits into Stage 2.1 FlowDecode and Stage 2.2 CounterDecode, producing per-flow counters for each switch (e.g., a→5, b→7, c→4 at one switch; x→3, y→5, z→6 at another) for the analysis applications.

Evaluations
- Memory usage is efficient
- Bandwidth usage is low
- NetDecode vs. SingleDecode

Evaluation setup
- Simulation of a k=8 FatTree (80 switches, 128 hosts) in ns-3.
- Memory is configured based on avg(#flows); when a burst of flows occurs, network-wide decode is used. The worst case is all switches pushed to max(#flows).
- Traffic: each switch carries the same number of flows, and thus uses the same memory.
- Each switch reports its flowset every 10 ms.

Memory efficiency
[Plot] FlowRadar uses 24.8 MB, compared against an impractical baseline of one cell per flow (#cells = #flows).

Bandwidth usage
- Evaluated based on Facebook data centers (SIGCOMM'15): each ToR has fewer than 100K flows per 10 ms → less than 2.3 Gbps of report traffic.
- Each ToR talks to 44 10G hosts, so FlowRadar consumes about 0.52% of total ToR bandwidth (2.3 Gbps / 440 Gbps).

NetDecode vs. SingleDecode
[Plot] SingleDecode handles 100K flows in 10 ms; NetDecode needs more time, decoding 126.8K flows in 3 s.

NetDecode
- CounterDecode takes the majority of the decoding time.
- CounterDecode limits the number of counters that can be decoded.
- If #flows > 126.8K, we can still decode all flows (up to 152K), without their counters, using FlowDecode alone.

FlowRadar analyzer
Beyond decoding, the flows & counters produced by the collector feed a set of analysis applications.

Analysis applications
Flow coverage:
- Transient loops/blackholes
- Errors in match-action tables
- Fine-grained traffic analysis
Temporal coverage:
- Per-flow loss rates at short time scales
- ECMP load imbalance
- Timely attack detection

Per-flow loss map: better temporal coverage
FlowRadar detects losses after every flowlet by comparing per-flow counters across hops (e.g., 35 packets at switch 1 vs. 34 at switch 2, then 15 vs. 14), whereas NetFlow detects the loss only at much coarser time scales.

FlowRadar conclusion
- Reports all per-flow counters on a fine time scale.
- Fully leverages both the switches and the collector:
  - Switches: encoded flowsets with fixed per-packet processing time and low memory usage
  - Collector: network-wide decoding
- Unlike NetFlow and Mirroring, FlowRadar keeps overhead low at both the switches and the collector.

LossRadar: Detect individual losses on a fine time scale

Losses have significant impact
- Significant latency increases and throughput drops
- SLA violations and lost revenue
- It can take operators tens of hours to find and fix the problem
Takeaway: we want to know about losses ASAP.

Challenge: many types of losses
Losses have many causes: misconfigurations, updates, grey failures, resource shortage, and corruption. They can occur anywhere in the switch pipeline: parser, ingress match-actions (L2→L3→ACL), shared buffer, egress match-actions, or switch-CPU configuration.
Takeaway: we need a generic tool.

Challenge: losses could happen everywhere
With ECMP there are multiple possible paths, and TCP only knows that it has losses, not where they happened.
Takeaway: we need the locations of the losses.

Challenge: lack of details about losses
Without details it is difficult to infer root causes: different flows experience different loss patterns (e.g., flow 1 is dropped while flow 2 passes), and loss patterns change over time.
Takeaway: we need the information of individual losses.

LossRadar overview
- Fast detection → periodically send traffic digests to the collector
- Knowing the location → install meters to monitor traffic
- Info on individual losses → include details of each loss
- Being generic → compare the sets of packets

How to cover the whole network
Place an upstream meter (UM) and a downstream meter (DM) around each segment to be monitored, covering all pipelines: one UM/DM pair covers the ingress pipeline, another covers the shared buffer and egress pipeline, and pairs across links cover the paths between switches (S1, S2, S3 in the example).

How to compare the sets of packets
Using an Invertible Bloom Filter (IBF) [SIGCOMM'11]:
- Only O(#losses) memory
- Each end keeps a small piece of memory; identical packets recorded at both ends cancel out, so only the differences (the lost packets) remain for the collector.

How to achieve low memory usage
Each meter keeps a small table of (xorSum, count) cells. Walking through the example:
- Packet a (keyed by its 5-tuple + IPID) is XORed into the upstream meter, then dropped, so it never reaches the downstream meter.
- Packet b passes through and is recorded at both meters (the upstream cell now holds a⊕b with count 2).
- Packet c is recorded upstream and then dropped; packet d passes and is recorded at both.
- The collector subtracts the downstream digest from the upstream digest: packets seen at both ends cancel out, leaving only the dropped packets a and c (e.g., a⊕c with count 2), which are then peeled out one by one.
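
A self-contained Python sketch of the meter and the collector-side subtraction. Sizes, hashing, and the integer packet key are my assumptions, not the paper's implementation; the demo at the bottom replays the walkthrough above:

```python
import hashlib

K = 3          # hash functions (assumed)
CELLS = 256    # cells per meter (assumed)

def _h(pkt, i):
    data = repr((i, pkt)).encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % CELLS

class Meter:
    """One LossRadar meter: per-cell xorSum and count."""
    def __init__(self):
        self.xor_sum = [0] * CELLS
        self.count = [0] * CELLS

    def add(self, pkt):                   # pkt: int encoding of 5-tuple + IPID
        for i in range(K):
            self.xor_sum[_h(pkt, i)] ^= pkt
            self.count[_h(pkt, i)] += 1

def decode_losses(um, dm):
    """Subtract the downstream digest from the upstream one, then peel."""
    xor_sum = [u ^ d for u, d in zip(um.xor_sum, dm.xor_sum)]
    count = [u - d for u, d in zip(um.count, dm.count)]
    losses = []
    while True:
        j = next((j for j in range(CELLS) if count[j] == 1), None)
        if j is None:
            return losses
        pkt = xor_sum[j]
        losses.append(pkt)
        for i in range(K):                # remove the decoded packet
            xor_sum[_h(pkt, i)] ^= pkt
            count[_h(pkt, i)] -= 1

# The walkthrough above: a and c are dropped, b and d pass.
um, dm = Meter(), Meter()
for pkt in (0xA, 0xB, 0xC, 0xD):
    um.add(pkt)
for pkt in (0xB, 0xD):
    dm.add(pkt)
assert sorted(decode_losses(um, dm)) == [0xA, 0xC]
```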

Benefits
- Memory-efficient: memory is proportional to #losses; about 10 KB per meter under the traffic pattern reported in DCTCP.
- Extensible to collect more packet information: TTL (helps identify loops), timestamps tagged into the packet headers at the upstream meter, or any other fields that programmable switches can provide.

Challenges
- How to ensure the UM and DM capture the same set of packets: use the packet header to carry a batch ID (see the sketch below).
- Incremental deployment: put a UM and DM around the undeployed black box and compare sum(UM_i) against sum(DM_j).
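
One way to realize the batch-ID idea (my construction, not necessarily the paper's): the upstream tags each packet with a one-bit batch ID that flips every interval and keeps one meter per batch; the downstream selects its meter by the bit carried in the header rather than its own clock, so both ends bin every packet into the same batch even while it is in flight:

```python
import time

BATCH_MS = 10   # batch interval (assumed)

class BatchedMeter:
    """Two alternating meters indexed by a 1-bit batch ID."""
    def __init__(self):
        self.meters = [Meter(), Meter()]      # Meter from the sketch above

    @staticmethod
    def current_batch():
        return (int(time.time() * 1000) // BATCH_MS) % 2

    def record_upstream(self, pkt):
        b = self.current_batch()
        self.meters[b].add(pkt)
        return b                              # carried to the downstream in the header

    def record_downstream(self, pkt, batch_bit):
        self.meters[batch_bit].add(pkt)       # select by the header bit, not the clock
```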

LossRadar conclusion
- A fast and efficient loss-detection system
- Provides the locations and details of individual losses
- Generic to all types of losses
- Memory proportional only to #losses
Future work: design an analyzer to diagnose problems via temporal analysis, correlation across flows and switches, and combining with information provided by hosts.

Conclusion
- Full visibility into data centers: report all the flows and all the lost packets within a few milliseconds, across all the switches.
- Efficient data structures on programmable switches: both FlowRadar and LossRadar are implemented in P4.
- Technology transfer to Barefoot switches (see my demo tomorrow).