Software-defined Measurement

Presentation transcript:

Software-defined Measurement. Minlan Yu, University of Southern California. Joint work with Lavanya Jose, Rui Miao, Masoud Moshref, Ramesh Govindan, Amin Vahdat.

Management = Measurement + Control. Accounting: count resource usage for tenants. Traffic engineering: identify large traffic aggregates and traffic changes; understand flow characteristics (flow size, etc.). Performance diagnosis: why does my application have high delay or low throughput? Management is important, yet underexplored: it takes 80% of the IT budget and is responsible for 62% of outages. Measurement is at least half of network management: figuring out what's going on is harder than deciding what to do.

Yet, measurement is underexplored. Measurement is an afterthought in network devices: control functions are optimized with many resources, while measurement support with NetFlow/sFlow is limited and fixed. Traffic analysis is incomplete and indirect. Incomplete: samples may not catch all events. Indirect: offline analysis is based on pre-collected logs. A network-wide view of traffic is especially difficult because data are collected at different times and places.

Software-defined Measurement. SDN offers unique opportunities for measurement: simple, reusable primitives at switches; diverse and dynamic analysis at the controller; a network-wide view. [Figure: a controller running heavy hitter detection and change detection tasks (1) configures and reconfigures resources at the switches and (2) fetches statistics from them.]

Challenges. Diverse measurement tasks: generic measurement primitives for diverse tasks; a measurement library for easy programming. Limited resources at switches: new data structures to reduce memory usage; multiplexing across many tasks.

Software-defined Measurement: OpenSketch (NSDI’13) and DREAM (SIGCOMM’14). Data plane primitives: sketch-based commodity switch components (OpenSketch); flow-based OpenFlow TCAM (DREAM). Resource allocation across tasks: optimization with provable resource-accuracy bounds (OpenSketch); dynamic allocation with an accuracy estimator (DREAM). Prototype: open-source NetFPGA + sketch library (OpenSketch); networks of hardware switches and Open vSwitch (DREAM). Speaker notes: In my other work, I also use optimization theory, random walks, hashing, and graph theory. These algorithms and data structures are not useful unless we can demonstrate them with real prototypes. Prototypes alone are not sufficient for real-world impact, so I also actively collaborate with industry.

Software-defined Measurement with Sketches (NSDI’13)

Software Defined Networking. Controller: configures devices and collects measurements. API to the data plane (OpenFlow): each rule has fields, an action, and counters, e.g. Src=1.2.3.4, drop, #packets, #bytes. Switches: forward and measure packets. Rethink the abstractions for measurement: packet/byte counters are not enough. Large traffic aggregates need too many counters at switches (one counter for each micro-flow), and the controller has to pull too many statistics.

Tradeoff of Generality and Efficiency. Generality: supporting a wide variety of measurement tasks. Who's sending a lot to 23.43.0.0/16? Is someone being DDoS-ed? How many people downloaded files from 10.0.2.1? Efficiency: enabling high link speeds (40 Gbps or larger); ensuring low cost (cheap switches with small memory); easy to implement with commodity switch components. E.g., Cisco NetFlow.

NetFlow: General, Not Efficient. Cisco NetFlow/sFlow log sampled packets or flow-level counters. General: OK for many measurement tasks, but not ideal for any single task. Not efficient: it is hard to determine the right sampling rate, measurement accuracy depends on the traffic distribution, and different problems need different sampling solutions. NetFlow is often turned off or not even available in datacenters.

Streaming Algo: Efficient, Not General. Streaming algorithms summarize packet information with sketches, e.g. a Count-Min Sketch answers "Who's sending a lot to host A?"; a bitmap counts unique items; a Bloom filter checks set membership. Not general: each algorithm solves just one question, requires customized hardware or network processors, and it is hard to implement every solution in practice. [Figure: Count-Min Sketch. In the data plane, the byte count from 23.43.12.1 is added to one counter in each of three rows selected by Hash1, Hash2, and Hash3. In the control plane, a query for 23.43.12.1 reads the counters 5, 3, and 4 and picks the minimum, 3.]
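A minimal sketch of the Count-Min Sketch update and "pick min" query described on this slide may help make the figure concrete. The class name, table sizes, and the use of salted BLAKE2 hashes are illustrative assumptions, not details from the talk.

```python
import hashlib


class CountMinSketch:
    """depth rows of width counters; each row uses its own hash function."""

    def __init__(self, width=1024, depth=3):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # Assumption: salting the key with the row number stands in for
        # the independent hash functions Hash1..Hash3 on the slide.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.width

    def update(self, key, nbytes):
        # Data plane: add the packet size to one counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += nbytes

    def query(self, key):
        # Control plane: collisions can only inflate a counter,
        # so the minimum over the rows is the tightest estimate.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


sketch = CountMinSketch(width=64, depth=3)
for src, size in [("23.43.12.1", 1500), ("10.0.2.1", 400), ("23.43.12.1", 1500)]:
    sketch.update(src, size)
estimate = sketch.query("23.43.12.1")  # never below the true 3000 bytes
```

The estimate can overcount when flows collide in every row, but it never undercounts, which is why taking the minimum is safe.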

Where is the Sweet Spot? On the spectrum from general to efficient, NetFlow/sFlow is general (but too expensive) and streaming algorithms are efficient (but not practical); OpenSketch sits in between. OpenSketch: a general and efficient data plane based on sketches; a modularized control plane with automatic configuration. Speaker note: A new data structure that harnesses the capacity of the TCAMs across all the switches to give the illusion of a much larger TCAM.

Flexible Measurement Data Plane. Picking the packets to measure: hashes to represent a compact set of flows (e.g. a set of blacklisted IPs); classification of flows with different resources/accuracy (e.g. filter out traffic for 23.43.0.0/16). Storing and exporting the data: a table with flexible indexing; complex indexing using hashes and classification; diverse mappings between counters and flows.

A three-stage pipeline. Hashing: a few hash functions on the packet source. Classification: based on hash values or packet fields. Counting: update a few counters with simple calculations. [Figure: the byte count from 23.43.12.1 is hashed by Hash1, Hash2, and Hash3 into three rows of counters.]
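The three stages can be sketched in software as follows. This is only an illustration under stated assumptions: the constants `NUM_HASHES` and `TABLE_SIZE`, the function names, and the choice to classify on the destination prefix 23.43.0.0/16 are hypothetical, and a real switch would implement these stages in hardware.

```python
import hashlib
import ipaddress

NUM_HASHES = 3   # hypothetical: one hash function per counter row
TABLE_SIZE = 16  # hypothetical number of counters per row
MONITORED = ipaddress.ip_network("23.43.0.0/16")

counters = [[0] * TABLE_SIZE for _ in range(NUM_HASHES)]


def hash_stage(src_ip):
    # Stage 1: a few hash functions on the packet source.
    indexes = []
    for i in range(NUM_HASHES):
        d = hashlib.blake2b(f"{i}:{src_ip}".encode(), digest_size=4).digest()
        indexes.append(int.from_bytes(d, "big") % TABLE_SIZE)
    return indexes


def classify_stage(dst_ip):
    # Stage 2: a TCAM-like wildcard match, here on the destination prefix.
    return ipaddress.ip_address(dst_ip) in MONITORED


def count_stage(indexes, nbytes):
    # Stage 3: simple arithmetic on counters addressed by the hash values.
    for row, idx in enumerate(indexes):
        counters[row][idx] += nbytes


def process_packet(src_ip, dst_ip, nbytes):
    indexes = hash_stage(src_ip)
    if classify_stage(dst_ip):
        count_stage(indexes, nbytes)


process_packet("1.2.3.4", "23.43.12.1", 1000)
process_packet("1.2.3.4", "8.8.8.8", 500)  # outside the monitored prefix
process_packet("5.6.7.8", "23.43.0.9", 200)
row_totals = [sum(row) for row in counters]
```

Each counted packet adds its size exactly once per row, so every row sums to the 1200 bytes sent to the monitored prefix.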

Build on Existing Switch Components. A few simple hash functions: 4-8 three-wise or five-wise independent hash functions; leverage traffic diversity to approximate truly random functions. A few TCAM entries for classification: match on both packets and hash values; avoid matching on individual micro-flow entries. Flexible counters in SRAM: many logical tables for different sketches; different numbers and sizes of counters; access counters by address.

Modularized Measurement Library. A measurement library of sketches: bitmap, Bloom filter, Count-Min Sketch, etc.; easy to implement with the data plane pipeline; supports diverse measurement tasks. Implementing heavy hitters with OpenSketch ("Who's sending a lot to 23.43.0.0/16?"): a count-min sketch counts the volume of flows, and a reversible sketch identifies the flows with heavy counts in the count-min sketch.

Support Many Measurement Tasks. Measurement programs, their building blocks, and lines of code:
Heavy hitters. Building blocks: count-min sketch; reversible sketch. Lines of code: Config: 10, Query: 20.
Superspreaders. Building blocks: count-min sketch; bitmap; reversible sketch. Lines of code: Query: 14.
Traffic change detection. Lines of code: Query: 30.
Traffic entropy on port field. Building blocks: multi-resolution classifier; count-min sketch. Lines of code: Query: 60.
Flow size distribution. Building blocks: multi-resolution classifier; hash table. Lines of code: Config: 10, Query: 109.

Resource management. Automatic configuration within a task: pick the right sketches for the measurement task; allocate resources across sketches based on provable resource-accuracy curves. Resource allocation across tasks: operators simply specify the relative importance of tasks; minimize the weighted error using convex optimization; decompose into optimization problems of individual tasks.
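As a worked illustration of minimizing weighted error across tasks, suppose (purely as an assumption, not a curve taken from the talk) that each task's error falls as c / memory, as it does for the width of a Count-Min Sketch. Minimizing the sum of weight * c / memory under a fixed memory budget then has a closed form: each task's share is proportional to sqrt(weight * c). The function and task names below are hypothetical.

```python
import math


def allocate(total_memory, tasks):
    """tasks maps name -> (weight, c), with assumed error curve error = c / memory.

    Setting the derivative of sum(w * c / m) + lambda * sum(m) to zero
    gives m proportional to sqrt(w * c).
    """
    roots = {name: math.sqrt(w * c) for name, (w, c) in tasks.items()}
    norm = sum(roots.values())
    return {name: total_memory * r / norm for name, r in roots.items()}


def weighted_error(alloc, tasks):
    return sum(w * c / alloc[name] for name, (w, c) in tasks.items())


# The operator says heavy hitter detection matters four times as much.
tasks = {"heavy_hitters": (4.0, 1.0), "change_detection": (1.0, 1.0)}
alloc = allocate(600.0, tasks)
equal = {name: 300.0 for name in tasks}
```

With these inputs the more important task gets 400 units and the other 200, which yields a lower weighted error than an equal 300/300 split.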

OpenSketch Architecture

Evaluation. Prototype on NetFPGA: no effect on data plane throughput; line-speed measurement performance. Trace-driven simulators: OpenSketch, NetFlow, and streaming algorithms; one-hour CAIDA packet traces on a backbone link. Tradeoff between generality and efficiency: How efficient is OpenSketch compared to NetFlow? How accurate is OpenSketch compared to specific streaming algorithms? OpenSketch configuration: 4 bytes per counter; 3-5 2-universal hash functions. NetFlow configuration: 32 bytes per entry; different sampling rates (1/50 to 1/1000); count the actual number of flow entries used.

Heavy Hitters: false positives/negatives. Identify flows taking > 0.5% of bandwidth. OpenSketch requires less memory with higher accuracy: it has no false negatives with 85 KB of memory and no false positives when the switch has 600 KB of memory. In contrast, NetFlow needs 724 KB of memory to achieve 3% false positives/negatives.

Tradeoff Efficiency for Generality. In theory, OpenSketch requires 6 times the memory of a complex streaming algorithm. For example, the Space-Saving heavy hitter detection algorithm [8] maintains a hash table of items and counts, and requires customized operations such as keeping a pointer to the item with the minimum count and replacing the minimum-count entry with a new item if the item does not have an entry.

OpenSketch Conclusion. Bridging the gap between theory and practice. Leveraging good properties of sketches: provable accuracy-memory tradeoff. Making sketches easy to implement and use: generic support for different measurement tasks; easy to implement with commodity switch hardware; modularized library for easy programming.

Dynamic Resource Allocation For TCAM-based Measurement (SIGCOMM’14)

SDM Challenges. Many measurement tasks: task types (heavy hitter, change detection); header fields (source IP, destination port); traffic aggregates (to drill down); per-tenant tasks. Limited measurement resources: limited CPU for measurement; cheap switches with limited memory. We focus on hardware switches with TCAM. [Figure: many management tasks (several heavy hitter detection tasks and a change detection task) run at the controller on top of a dynamic resource allocator, which (1) configures and reconfigures the limited TCAM resources at the switches and (2) fetches statistics.]

Dynamic Resource Allocator. Diminishing return of resources: more resources bring smaller accuracy gains and find less significant outputs, so operators can accept an accuracy bound below 100%. [Figure: recall (detected true HHs / all HHs) for finding 8 Mbps heavy hitters on a CAIDA trace.]

Dynamic Resource Allocator. Temporal and spatial resource multiplexing: traffic varies over time and across switches, and the resources needed for an accuracy bound depend on the traffic. [Figure: recall (detected true HHs / all HHs).]

Challenges. No ground truth of resource-accuracy: hard to do traditional convex optimization; needs new ways to estimate accuracy on the fly and to adaptively increase or decrease resources accordingly. Spatial and temporal changes: task and traffic dynamics; coordinate multiple switches to keep a task accurate; spatial and temporal resource adaptation.

Dynamic Resource Allocator. Decompose the resource allocator per switch: each switch separately increases or decreases resources. When and how to change resources? [Figure: tasks at the controller (heavy hitter detection, change detection) report their estimated accuracy to the dynamic resource allocator, which returns the allocated resources.]

Per-switch Resource Allocator: When? When does a task on a switch need more resources? Example: a heavy hitter detection task runs on switches A and B. A detects 5 out of 20 HHs (local accuracy = 25%), B detects 9 out of 10 (local accuracy = 90%), and overall 14 out of 30 HHs are detected (global accuracy = 47%). Deciding on A's local accuracy (25%) alone is not enough: if the bound is 40%, the global accuracy already meets it, so there is no need to increase A's resources. Deciding on the global accuracy (47%) alone is not enough: if the bound is 80%, increasing B's resources is not helpful, because B is already at 90%. Conclusion: a task needs more resources on a switch when max(local, global) < accuracy bound.
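The rule from this slide can be checked against its own numbers with a few lines of code. The function names are hypothetical; only the max(local, global) < bound condition and the detection counts come from the slide.

```python
def recall(detected, total):
    # Accuracy as used on the slide: detected true HHs / all HHs.
    return detected / total


def needs_more_resources(local_accuracy, global_accuracy, bound):
    # A task asks for more resources on a switch only when both its
    # local accuracy there and its global accuracy are below the bound.
    return max(local_accuracy, global_accuracy) < bound


local_a = recall(5, 20)      # switch A: 25%
local_b = recall(9, 10)      # switch B: 90%
global_acc = recall(14, 30)  # whole task: about 47%

a_at_40 = needs_more_resources(local_a, global_acc, 0.40)
a_at_80 = needs_more_resources(local_a, global_acc, 0.80)
b_at_80 = needs_more_resources(local_b, global_acc, 0.80)
```

With a 40% bound, A needs nothing because the global accuracy already satisfies the bound; with an 80% bound, A needs more resources while B does not.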

Per-Switch Resource Allocator: How? How to adapt resources? Take from rich tasks and give to poor tasks. How much resource to take or give? Use an adaptive change step for fast convergence: small steps close to the bound, large steps otherwise.
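One way to picture "take from rich, give to poor" with an adaptive step is the sketch below. It is not the allocator from the talk: the linear step rule, the `max_step` and `min_step` parameters, and the single donor/taker pairing are all simplifying assumptions made for illustration.

```python
def step_size(accuracy, bound, max_step=32, min_step=1):
    # Assumed rule: the step grows linearly with the distance from the
    # bound, so tasks near the bound move in small steps.
    gap = abs(accuracy - bound)
    return max(min_step, round(max_step * gap))


def rebalance(alloc, accuracy, bound):
    """Move TCAM entries from the richest task to the poorest one.

    alloc maps task -> entries; accuracy maps task -> estimated accuracy.
    The total number of entries is conserved.
    """
    rich = [t for t in alloc if accuracy[t] >= bound]
    poor = [t for t in alloc if accuracy[t] < bound]
    if not rich or not poor:
        return dict(alloc)
    donor = max(rich, key=lambda t: accuracy[t])
    taker = min(poor, key=lambda t: accuracy[t])
    # Never take the donor's last entry.
    step = min(step_size(accuracy[taker], bound), alloc[donor] - 1)
    new_alloc = dict(alloc)
    new_alloc[donor] -= step
    new_alloc[taker] += step
    return new_alloc


alloc = {"A": 64, "B": 64}
accuracy = {"A": 0.25, "B": 0.90}
new_alloc = rebalance(alloc, accuracy, bound=0.80)
```

Here task A is far below the 80% bound, so it receives a large step of 18 entries from B; a task at 78% would receive a step of only 1.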

Task Implementation. [Figure: tasks at the controller (heavy hitter detection, change detection) report estimated accuracy to the dynamic resource allocator and receive allocated resources; each task (1) configures and reconfigures resources at the switches and (2) fetches statistics.]

Flow-based algorithms using TCAM. Goal: maximize accuracy given limited resources. A general resource-aware algorithm for different tasks (e.g. HH, HHH, change detection) and multiple switches (e.g. HHs from different switches). Assumption: each flow is seen at one switch (e.g. at sources). [Figure: prefix tree of TCAM counters. The current configuration monitors *** (36); the new configuration monitors 0** (26) and 1** (10). Below them are 00* (12), 01* (14), 10* (5), and 11* (5), and then the leaves 000 through 111.]

Divide & Merge at Multiple Switches. Divide: monitor children to increase accuracy. This requires more resources on a set of switches (in the example, an additional entry on switch B). Merge: monitor the parent to free resources. Each node keeps the set of switches it frees after a merge; finding the least important prefixes to merge is a minimum set cover problem. [Figure: prefix 0** (26) is currently monitored on switches {A, B, C} (A: 0**, B: 0**, C: 0**). After dividing, 00* (12) is monitored on {A, B} and 01* (14) on {B, C} (A: 00*, B: 00* and 01*, C: 01*).]
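The divide step can be illustrated on a single switch with a toy 3-bit address space. This is a simplified sketch: the merge step and the multi-switch set cover are left out, the greedy "split the largest prefix above the threshold" policy is an assumption, and the leaf volumes are assumed values chosen so that the internal sums match the 36 / 26 / 10 / 12 / 14 / 5 / 5 counters from the prefix-tree example.

```python
# Assumed per-address volumes; only the internal-node sums come from the slides.
LEAVES = {"000": 5, "001": 7, "010": 12, "011": 2,
          "100": 5, "101": 0, "110": 2, "111": 3}


def volume(prefix):
    # What a TCAM counter installed for this wildcard prefix would report.
    fixed_bits = prefix.rstrip("*")
    return sum(v for addr, v in LEAVES.items() if addr.startswith(fixed_bits))


def divide(prefix):
    # Replace one prefix by its two children, e.g. 0** -> 00*, 01*.
    fixed_bits = prefix.rstrip("*")
    return [(fixed_bits + bit).ljust(len(prefix), "*") for bit in "01"]


def drill_down(budget, threshold):
    """Greedily divide the largest prefix until the TCAM budget is used."""
    monitored = ["***"]
    while len(monitored) < budget:
        candidates = [p for p in monitored if "*" in p and volume(p) > threshold]
        if not candidates:
            break
        target = max(candidates, key=volume)
        monitored.remove(target)
        monitored.extend(divide(target))
    return sorted(monitored)


monitored = drill_down(budget=4, threshold=10)
heavy_hitters = [p for p in monitored if "*" not in p and volume(p) > threshold_value]  \
    if (threshold_value := 10) else []
```

With four TCAM entries the drill-down ends at 00*, 010, 011, and 1**, and reports 010 as the only exact heavy hitter; the possible heavy hitter under 00* stays hidden until more entries are available.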

Accuracy Estimation: Heavy Hitter Detection. Precision: any monitored leaf with volume > threshold is a true HH, so the reported HHs are the true ones. Recall: estimate the missing HHs using the volume and level of each counter. How to estimate missed HHs? With threshold = 10, a counter at level 2 has missed at most 2 HHs, and a counter of size 26 has missed at most 2 HHs. HHH is more challenging because reported HHHs depend on each other. [Figure: prefix tree with threshold = 10: *** (76); 0** (26) and 1** (50); 00* (12), 01* (14), 10* (15), and 11* (35); then the leaves 000 through 111.]
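A rough sketch of the recall estimate follows. It assumes the bound on missed heavy hitters under a non-exact prefix is min(2^wildcard_bits, floor(volume / threshold)), which is one reading of the two hints on the slide (bounded by level, bounded by size); the function name and the leaf values 110 = 15 and 111 = 20 are hypothetical.

```python
def estimate_recall(monitored, threshold):
    """monitored maps a prefix such as '0**' or '110' to its counter value.

    Exact prefixes above the threshold are detected HHs. Each wildcard
    prefix may hide HHs: at most one per leaf under it, and at most
    volume // threshold of them in total. Because the number of missed
    HHs is an upper bound, the result is a conservative recall estimate.
    """
    detected = 0
    missed_bound = 0
    for prefix, vol in monitored.items():
        wildcards = prefix.count("*")
        if wildcards == 0:
            if vol > threshold:
                detected += 1
        else:
            missed_bound += min(2 ** wildcards, vol // threshold)
    total = detected + missed_bound
    return detected / total if total else 1.0


# Counters currently installed (leaf values are assumed for illustration).
monitored = {"0**": 26, "10*": 15, "110": 15, "111": 20}
recall_estimate = estimate_recall(monitored, threshold=10)
```

Here two heavy hitters are detected, 0** may hide up to two more (26 // 10), and 10* up to one, giving an estimated recall of 2 / 5 = 0.4.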

DREAM Overview. A task is specified by: task type (heavy hitter, hierarchical heavy hitter, change detection); task-specific parameters (HH threshold); packet header field (source IP); filter (src IP=10/24, dst IP=10.2/16); accuracy bound (80%). Workflow through the DREAM SDN controller, which holds the resource allocator and task objects 1 to n: 1) instantiate task; 2) accept/reject; 3) configure counters; 4) fetch counters; 5) report; 6) estimate accuracy; 7) allocate/drop. Prototype implementation with DREAM algorithms on Floodlight and Open vSwitch.

Evaluation. Goals: How accurate are tasks in DREAM? (Satisfaction: the fraction of a task's lifetime spent above the given accuracy.) How many more accurate tasks can DREAM support? (% of rejected/dropped tasks.) How fast is the DREAM control loop? Compared to two baselines. Equal: divides switch resources equally among the tasks with traffic on the switch, changes resources when tasks arrive or leave, and never rejects or drops tasks. Fixed: allocates a fixed 1/n share of resources (1/32) to each task and rejects extra tasks if resources are not enough. Prototype implementation with DREAM algorithms on Floodlight and Open vSwitch.

Prototype Results (256 tasks of various types on 8 switches). DREAM: high satisfaction for the mean and the 5th percentile of tasks, with low rejection. Equal: only keeps small tasks satisfied. Fixed: high rejection because it over-provisions for small tasks.

Prototype Results. DREAM: high satisfaction for the mean and the 5th percentile of tasks, at the expense of more rejection. Equal and Fixed: only keep small tasks satisfied.

Control Loop Delay. Allocation delay is negligible compared to the other delays. Incremental saving reduces the save delay.

DREAM Conclusion. Challenges with software-defined measurement: diverse and dynamic measurement tasks; limited resources at switches. Dynamic resource allocation across tasks: accuracy estimators for TCAM-based algorithms; spatial and temporal resource multiplexing. Future work: apply DREAM to the hash-based primitives (sketches); time-slicing measurement; other service interfaces to tasks.

Summary. Software-defined measurement: measurement is important, yet underexplored; SDN brings new opportunities to measurement; it is time to rebuild the entire measurement stack. Our work: OpenSketch, generic and efficient measurement on sketches; DREAM, dynamic resource allocation for many tasks.

Thanks!