NitroSketch: Robust and General Sketch-based Monitoring in Software Switches Alan (Zaoxing) Liu Joint work with Ran Ben-Basat, Gil Einziger, Yaron Kassner, Vladimir Braverman, Roy Friedman, and Vyas Sekar

Need for network telemetry in software switches
Where: virtual switches, white-box switches, DPDK, FD.io, … [figure: VMs and their apps attached to a virtual switch built on DPDK/BESS]
Use cases: Traffic Engineering ("flow size distribution"), Anomaly Detection ("entropy", "traffic changes"), Worm Detection ("SuperSpreaders"), Accounting ("heavy hitters")
Requirements: performance, accuracy

Sketches appear to be a promising alternative
[figure: a packet's flow key k is hashed by H1(k), H2(k), …, Hd(k); each hash selects one counter to increment (+1) in its own array, across d arrays of r counters each; flow keys are kept separately]
Sketches require small memory with bounded error rates, and guarantee fidelity on arbitrary workloads.
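To make the picture concrete, here is a minimal Count-Min-style sketch (a toy illustration, not the paper's code): d arrays of r counters, with one hash computation and one counter increment per array for every packet. The class and parameter names are hypothetical.

```python
import random

class CountMinSketch:
    """Toy Count-Min sketch: d arrays of r counters, one hash per array."""

    def __init__(self, d=4, r=1024, seed=0):
        self.d, self.r = d, r
        rng = random.Random(seed)
        # Per-array hash seeds stand in for d independent hash functions.
        self.seeds = [rng.getrandbits(32) for _ in range(d)]
        self.counters = [[0] * r for _ in range(d)]

    def _index(self, i, key):
        return hash((self.seeds[i], key)) % self.r

    def update(self, key, value=1):
        # d hash computations and d counter updates per packet.
        for i in range(self.d):
            self.counters[i][self._index(i, key)] += value

    def query(self, key):
        # Collisions only inflate counters, so the minimum is an overestimate.
        return min(self.counters[i][self._index(i, key)] for i in range(self.d))
```

For example, after `update("flowA")` is called 100 times, `query("flowA")` returns at least 100, and is close to 100 when collisions are rare.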

Reality: sketches in software switches are not performant! A single core hits 100% CPU and cannot meet high line-rates: < 10 Gbps.

Existing proposals to speed up sketches?
SketchVisor [SIGCOMM’17]: ✔ performance increase (still < 10 Mpps); ✗ not robust on arbitrary workloads.
Elastic Sketch [SIGCOMM’18]: ✔ performance increase (> 14.8 Mpps); ✗ not robust on arbitrary workloads.
R-HHH [SIGCOMM’17]: ✔ performance increase (up to 14.8 Mpps); ✗ not general across many tasks.

NitroSketch in a nutshell: a sketching framework for software switches that simultaneously achieves: Performance: line rate (40 Gbps) with minimal CPU and memory. Generality: supports a variety of measurement tasks. Robustness: accuracy guarantees for any workload.

NitroSketch approach: systematically analyze the performance bottlenecks; learn key insights from strawman solutions (sampling, one-level hash); reformulate sketching for software from first principles - trade off a slight memory increase, and sample counter updates rather than packets.

Outline for this talk Motivation Understanding bottlenecks Design Insights Strawman ideas Our proposals Evaluation Conclusions and future work

Bottleneck analysis: performance benchmarks using Intel VTune Amplifier; hotspots shown for UnivMon [SIGCOMM’16].

Bottleneck B1: many (independent) hash computations per packet (~37% of CPU). [figure: each packet triggers per-array hashing and +1 updates across d arrays of r counters, plus a top-keys structure]

Bottleneck B2: many counter updates per packet (~15% of CPU). [figure: the same per-packet update path across d arrays of r counters]

Bottleneck B3: tracking flow keys is also expensive: flow-key data structure operations (e.g., a heap) cost ~10% of CPU. [figure: the same update path, with the top-keys structure highlighted]

Outline for this talk Motivation Understanding bottlenecks Design Insights Strawman ideas Our proposals Evaluation Conclusions and future work

Strawman 1: reduce hashes by using a single array?
[figure: traditional layout with d arrays and hashes H1 … Hd vs. a single larger hash array; the larger array may spill from CPU cache into DRAM]
Pros: a simple way to reduce hash computations.
Cons: the memory increase may cause cache misses, and even one lightweight hash per packet may be too expensive!

Strawman 2: reduce updates by packet sampling?
[figure: flip a coin with probability p per packet; only sampled packets update the sketch]
Pros: a simple way to reduce hash and counter updates.
Cons: the memory increase may cause cache misses; even one coin flip per packet may be too expensive; and it incurs an accuracy-convergence tradeoff.

Key lessons from strawman solutions
We can trade memory for CPU reduction, but need to ensure cache residency.
Sampling is promising, but we need to manage per-packet ops and convergence.

How do we tackle this? We trade memory for CPU reduction while ensuring cache residency. Sampling is promising, but we need to manage per-packet ops and convergence.

Key idea: sample counter updates, not packets!
[figure: uniform packet sampling (each sampled packet updates every one of the d arrays) vs. uniform counter sampling (each packet updates each array independently with some probability)]
Rethinking how sampling is combined with sketches: with uniform packet sampling, every sampled packet updates a counter in all d arrays, and to preserve the accuracy guarantee, memory must grow by some polynomial in 1/p to offset the loss from sampling. We instead sample which counter updates to perform: for each packet, each array's counter is updated only with some probability. This achieves a better tradeoff between memory and CPU reduction.

Key idea: sample counter updates, not packets!
[figure: for each packet, a coin flip with probability p per array decides whether that array's counter is updated]
Multiple independent hashes remain the key to memory efficiency. Per-packet hash and counter updates can be reduced to less than one on average, and memory increases only by O(1/p): much better than uniform packet sampling. Two issues remain: (1) we still incur many per-packet coin flips, and (2) the sketch requires some amount of time before it converges.
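The reason sampled updates can stay accurate is that compensated increments are unbiased: a counter incremented by 1/p with probability p has the correct expected value. A quick check (toy simulation under these assumptions, not the paper's code):

```python
import random

def sampled_count(n_packets, p, seed=1):
    """Increment a single counter by 1/p with probability p per packet."""
    rng = random.Random(seed)
    counter = 0.0
    for _ in range(n_packets):
        if rng.random() < p:      # coin flip per (packet, counter) pair
            counter += 1.0 / p    # compensate for the skipped updates
    return counter

# With enough packets, the estimate concentrates around the true count:
est = sampled_count(1_000_000, p=0.01)
```

With p = 0.01 and a million packets, the relative error of `est` is around 1% (standard deviation roughly sqrt(n·p·(1-p))/(n·p)), which illustrates why convergence needs enough packets.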

How do we tackle this? We can trade memory for CPU reduction, but need to ensure the sketch stays cache-resident. Sampling works, but we need to manage per-packet ops and convergence. [figure: each packet now triggers at most a few sampled +1 updates]

Trick: geometric sampling to reduce per-packet ops
[figure: uniform sampling flips a coin with probability p for every one of packets 1-6; equivalently, geometric sampling draws the interval to the next sampled packet (here, an interval of 4), skips the packets in between, and flips a coin only at each sampled packet to pick the next interval]
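The equivalence rests on a standard fact: for independent Bernoulli(p) trials, the gap to the next success is geometrically distributed, so one draw per *sampled* packet replaces one coin flip per packet. A sketch of the trick (illustrative helper names, assuming 0 < p < 1):

```python
import math
import random

def next_gap(p, rng):
    """Packets to skip before the next sampled one (assumes 0 < p < 1).
    Inverse-transform draw of a Geometric(p): P(gap >= k) = (1-p)^k."""
    u = 1.0 - rng.random()  # u in (0, 1]
    return int(math.log(u) / math.log(1.0 - p))

def geometric_sample(n_packets, p, seed=2):
    """Indices of sampled packets: one random draw per sampled packet,
    instead of one Bernoulli(p) coin flip per packet."""
    rng = random.Random(seed)
    sampled = []
    i = next_gap(p, rng)
    while i < n_packets:
        sampled.append(i)
        i += 1 + next_gap(p, rng)
    return sampled
```

Over many packets, the fraction of sampled indices matches p, exactly as per-packet coin flips would, but the common case (a skipped packet) costs no random draw at all.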

Trick: adapt the sampling rate to manage convergence
A sampling-based approach needs to receive enough packets before it becomes accurate (it needs to converge). When the packet rate isn't high, we can sample more packets ⇒ faster convergence. Adaptively adjust the sampling rate, e.g.:
Packet rate 1 Mpps: sample with probability 1/1.
Packet rate 8 Mpps: sample with probability 1/8.
Packet rate 64 Mpps: sample with probability 1/64.
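One simple way to realize this policy (a hypothetical helper; the function name and the target threshold are assumptions, not the paper's parameters) is to pick p so that the rate of sampled updates stays roughly constant:

```python
def adapt_sampling_prob(pkts_per_sec, target_sampled_pps=1_000_000):
    """Pick a sampling probability so the sampled-update rate stays roughly
    constant: low traffic -> sample everything (fast convergence),
    high traffic -> sample less (low CPU). The threshold is illustrative."""
    if pkts_per_sec <= target_sampled_pps:
        return 1.0
    return target_sampled_pps / pkts_per_sec

# Mirrors the slide: 1 Mpps -> 1/1, 8 Mpps -> 1/8, 64 Mpps -> 1/64.
```

This keeps CPU cost bounded at high rates while letting a lightly loaded switch converge as fast as an unsampled sketch.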

NitroSketch: putting it together
[figure: for each packet, only the sampled arrays (e.g., arrays 1 and 4) are updated with +1/p; geometric sampling selects the next array to update and the rest are skipped]
Update sampled counters with +1/p. Adjust the sampling rate when needed. Trade a small amount of space for a large CPU speedup.
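Putting the pieces together, here is a toy Count-Min update loop with counter sampling and geometric skips (an illustrative sketch under the assumptions above, not the released NitroSketch implementation; it treats the d arrays of each packet as consecutive slots in one sampling stream):

```python
import math
import random

class NitroCMS:
    """Toy Count-Min sketch with NitroSketch-style counter sampling:
    each (packet, array) slot is updated with probability p and adds 1/p;
    geometric skips avoid a coin flip per slot. Assumes 0 < p < 1."""

    def __init__(self, d=4, r=4096, p=0.1, seed=0):
        self.d, self.r, self.p = d, r, p
        self.rng = random.Random(seed)
        self.seeds = [self.rng.getrandbits(32) for _ in range(d)]
        self.counters = [[0.0] * r for _ in range(d)]
        self.next_slot = self._geometric()  # position in the flattened slot stream

    def _geometric(self):
        u = 1.0 - self.rng.random()  # u in (0, 1]
        return int(math.log(u) / math.log(1.0 - self.p))

    def update(self, key):
        # Slots 0..d-1 belong to this packet; larger values skip into the future.
        while self.next_slot < self.d:
            i = self.next_slot
            idx = hash((self.seeds[i], key)) % self.r
            self.counters[i][idx] += 1.0 / self.p  # unbiased compensation
            self.next_slot += 1 + self._geometric()
        self.next_slot -= self.d  # advance to the next packet's slots

    def query(self, key):
        return min(self.counters[i][hash((self.seeds[i], key)) % self.r]
                   for i in range(self.d))
```

Most packets fall entirely inside a geometric skip and touch no counter at all; the occasional sampled slot does one hash and one +1/p update, which is where the order-of-magnitude CPU savings come from.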

NitroSketch is theoretically robust! NitroSketch offers accuracy guarantees for a variety of measurement tasks. Our theoretical analysis holds after receiving enough packets; in practice, we need ~2-4 million packets to converge. Check out our paper for more details!

NitroSketch inline implementation
[figure: OVS-DPDK architecture: in user space, vswitchd runs the dpif-netdev forwarding pipeline (EMC, dp classifier) on PMD threads; NitroSketch runs inline in the PMDs, with a shared buffer to a controller or other users; the NICs are driven from user space, bypassing the kernel]
Other versions: FD.io VPP, BESS

NitroSketch achieves 40 Gbps on software switches: two threads with OVS-DPDK, VPP, and BESS on an Intel XL710 NIC. NitroSketch uses no extra cores.

NitroSketch can achieve higher throughput: in-memory, single thread (Intel E5-2620 v4 CPU); the algorithms use 5-10 independent hash functions. [figure: throughput of the original sketches vs. with NitroSketch]

Guaranteed accuracy after convergence: after receiving 2-4 million packets, sketches achieve accuracy comparable to (or better than) the original sketches.

NitroSketch outperforms other solutions: NitroSketch achieves higher accuracy once converged.

Conclusions
Sketching is a promising alternative for software-switch-based telemetry, but the performance of existing sketches is far from optimal, and existing efforts fall short on performance, robustness, or generality.
NitroSketch key ideas: trade off a small memory increase; sample counters, not packets; geometric sampling to reduce per-packet ops; adaptive sampling.
NitroSketch improves the performance of sketches by 1-2 orders of magnitude while retaining robustness and generality.
https://github.com/zaoxing/NitroSketch
Alan Liu: zaoxing@cmu.edu