SCREAM: Sketch Resource Allocation for Software-defined Measurement

Presentation transcript:

SCREAM: Sketch Resource Allocation for Software-defined Measurement (CoNEXT’15). Masoud Moshref, Minlan Yu, Ramesh Govindan, Amin Vahdat. Speaker notes: I will show you how to fit many measurement tasks, each with high accuracy, onto switches with limited resources. Masoud is on the job market.

Measurement Is Crucial for Network Management. Network management for multiple tenants covers traffic engineering, anomaly detection, accounting, and DDoS detection. All of these need fine-grained visibility into how the network works. These goals are achieved with measurement tasks: heavy hitter detection (HH), hierarchical heavy hitter detection (HHH), change detection, and super source detection (SSD). Speaker notes: There are multiple tenants and many management goals, and these goals are achieved with these measurement tasks. Anomaly detection finds performance bugs. Traffic engineering algorithms need to know the big flows. For DDoS detection, we need to know whether a set of IPs sends too much traffic. One kind of anomaly is a host that communicates with too many destinations, which a super source detection task can detect.

Software-defined Measurement. A controller (DREAM [SIGCOMM’14] / SCREAM [CoNEXT’15]) runs Task 1 and Task 2. The tasks configure Switch A and Switch B. Each switch keeps counters for Task 1 and Task 2, and the controller collects them. Speaker notes: Emphasize the workflow: tasks configure switches, switches measure and send counters to the controller, and tasks report to the user.

Our Focus: Sketch-based Measurement. Sketches are summaries of streaming data that approximately answer specific queries; for example, a bitmap counts unique items. DREAM [SIGCOMM’14] uses OpenFlow counters and SCREAM [CoNEXT’15] uses sketches, and the two differ in three ways:
- Memory: OpenFlow counters live in expensive, power-hungry TCAM, while sketches use cheaper SRAM.
- Counters: OpenFlow offers only volume counters, while sketches offer both volume and connection counters.
- Coverage: OpenFlow counts selected prefixes, while sketches cover all traffic, all the time.

In short, sketches use cheaper memory and are more expressive. Speaker notes: Don't present sketches as our contribution. Sketches are memory-efficient, streaming, approximate, and answer specific queries. Because they cover all traffic all the time, sketches are more accurate for variable traffic and support more tasks, such as flow size distribution.
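The slide's bitmap example can be made concrete with standard linear counting. Each item sets one hashed bit, and the number of distinct items is estimated from the fraction of bits that are still zero. This is a minimal sketch for illustration, not SCREAM's code. The SHA-256 hashing and the 4096-bit size are assumptions.

```python
import hashlib
import math


class Bitmap:
    """Linear-counting bitmap that estimates how many distinct items were seen."""

    def __init__(self, num_bits: int):
        self.m = num_bits
        self.bits = [0] * num_bits

    def _index(self, item: str) -> int:
        # Hash choice (SHA-256) is an illustrative assumption; any uniform hash works.
        digest = hashlib.sha256(item.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Repeated items set the same bit, so duplicates do not inflate the count.
        self.bits[self._index(item)] = 1

    def estimate(self) -> float:
        zeros = self.bits.count(0)
        if zeros == 0:
            return math.inf  # saturated: the bitmap needs more bits
        # Linear counting estimator: n ~ -m * ln(fraction of bits still zero)
        return -self.m * math.log(zeros / self.m)


bm = Bitmap(4096)
for rep in range(5):
    for i in range(1000):
        bm.add(f"10.0.{i // 256}.{i % 256}")  # 1000 distinct destinations, each seen 5 times
print(round(bm.estimate()))
```

Counting distinct destinations per source is the kind of query behind super source detection. A sketch answers it in a fixed amount of SRAM rather than one counter per flow.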

Sketch Example: Count-Min Sketch. The sketch has d rows of w counters, and each row has its own hash function. At packet arrival (e.g., a 1 KB packet from an IP), the sketch hashes the IP with h1, h2, and h3 and increments one counter in each row (2+1=3, 4+1=5, 1+1=2). At query time, the traffic size of the IP comes from the row with the fewest collisions: Min(3, 5, 2) = 2. Resource-accuracy trade-off: the error has a provable bound given traffic properties. Speaker notes: Explain what Count-Min means: at query time we take the minimum to pick the row that had the fewest hash collisions. Highlight the traffic dependence early (emphasize it here): given a sketch of a specific size, its error depends on traffic properties such as the total traffic size.
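A minimal Count-Min sketch that follows the slide's update and query steps might look like the block below. Deriving each row's hash by salting SHA-256 with the row number is an assumption made for illustration. The traffic values are hypothetical sizes in KB.

```python
import hashlib


class CountMinSketch:
    """d rows of w counters; each row uses its own hash function."""

    def __init__(self, width: int, depth: int):
        self.w, self.d = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Per-row hash obtained by salting SHA-256 with the row number (an assumption).
        digest = hashlib.sha256(f"{row}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def update(self, key: str, size: int) -> None:
        # At packet arrival: add the packet size to one counter in every row.
        for r in range(self.d):
            self.rows[r][self._index(r, key)] += size

    def query(self, key: str) -> int:
        # Collisions only add to counters, so each row over-estimates;
        # the minimum picks the row with the fewest collisions.
        return min(self.rows[r][self._index(r, key)] for r in range(self.d))


cms = CountMinSketch(width=16, depth=3)
traffic = {f"10.0.0.{i}": (i % 7 + 1) for i in range(50)}  # hypothetical KB per source IP
for ip, kb in traffic.items():
    cms.update(ip, kb)
```

With 50 sources and only 16 counters per row, collisions are common. Every answer is still an over-estimate, never an under-estimate.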

Challenges: Limited Counters for Many Tasks. Switch resources are limited and shared. SRAM capacity is small (e.g., 128 MB), and it is shared with other functions such as routing. Guaranteeing accuracy takes a lot of memory: 1 MB-32 MB per task. Even if a task had the entire 128 MB SRAM to itself, it would hold only 4 tasks at 32 MB each, or 128 tasks at 1 MB each. Meanwhile, there are many task instances:
- 3 task types: heavy hitter, hierarchical heavy hitter, and super source.
- Different flow aggregates: rack, application, and source/destination/port.
- 1000s of tenants.

Goal: Many Accurate Sketch-based Measurements. Users dynamically instantiate a variety of measurement tasks. SCREAM supports the largest number of measurement tasks while maintaining measurement accuracy. Speaker notes: At a high level, our contribution is flexible measurement: users can dynamically instantiate many complex measurement tasks in the network. SCREAM accommodates the largest number of these tasks while maintaining accuracy. It does so by leveraging the trade-off between aggregate switch resource consumption and measurement accuracy.

Approach: Dynamic Resource Allocation. The resource-accuracy trade-off depends on traffic. Count-Min has a provable error bound given traffic properties, for example the skew of traffic from each IP. Allocating for the worst case uses more than 10x the counters needed on average (figure: required memory vs. skew). SCREAM therefore allocates memory dynamically for the current traffic.
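For reference, the standard Count-Min analysis makes the resource-accuracy trade-off concrete. This is the textbook bound, not necessarily the exact bound SCREAM uses. The notation is introduced here: $a$ is the traffic vector, $a_x$ is the size of key $x$, and $C[i, j]$ is counter $j$ in row $i$. Hash functions are assumed pairwise independent within a row and independent across rows.

```latex
\hat{a}_x = \min_{1 \le i \le d} C[i, h_i(x)] \ge a_x,
\qquad
\mathbb{E}\big[C[i, h_i(x)] - a_x\big] \le \frac{\lVert a \rVert_1}{w}

% Markov per row gives probability at most 1/e; the d rows are independent
\Pr\!\left[\hat{a}_x \ge a_x + \frac{e}{w}\lVert a \rVert_1\right] \le e^{-d}
```

Choosing $w = \lceil e/\epsilon \rceil$ and $d = \lceil \ln(1/\delta) \rceil$ bounds the error by $\epsilon \lVert a \rVert_1$ with probability at least $1 - \delta$. The guarantee is worst-case and grows with total traffic. How far the actual error falls below it depends on how traffic is spread across keys.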

Opportunity: Temporal Multiplexing. The memory each task requires varies over time (figure: required memory of Task 1 and Task 2 over time). Memory can therefore be multiplexed among tasks over time to support more tasks.

Opportunity: Spatial Multiplexing. The memory each task requires also varies across switches (figure: required memory of Task 1 and Task 2 on Switch A and Switch B). Memory can therefore be multiplexed among tasks across switches to support more tasks.
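A toy calculation shows why multiplexing helps. A static scheme reserves every task's peak demand, while per-epoch reallocation only needs the peak of the combined demand. The per-epoch demands below are hypothetical numbers chosen so that the two tasks peak at different times.

```python
def static_allocation(demands: dict[str, list[int]]) -> int:
    # Reserve every task's peak demand for its whole lifetime.
    return sum(max(series) for series in demands.values())


def dynamic_allocation(demands: dict[str, list[int]]) -> int:
    # Re-allocate every epoch: only the peak of the combined demand matters.
    per_epoch_totals = [sum(epoch) for epoch in zip(*demands.values())]
    return max(per_epoch_totals)


# Hypothetical required memory (KB) per epoch; the two tasks peak at different times.
demands = {"task1": [8, 2, 2, 8], "task2": [2, 8, 8, 2]}
print(static_allocation(demands), dynamic_allocation(demands))  # prints: 16 10
```

The spatial case follows the same reasoning: a task does not need its peak allocation on every switch.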

Key Insight. Leverage spatial and temporal multiplexing, and dynamically allocate switch memory to each task so that many tasks achieve sufficient accuracy. DREAM has the same insight; SCREAM applies it to sketches.

SCREAM Contributions. 1) A dynamic resource allocator that allocates memory among sketch-based task instances across switches while maintaining sufficient accuracy. 2) Support for 3 sketch-based task types: heavy hitter (HH), hierarchical heavy hitter (HHH), and super source (SSD). These serve anomaly detection, traffic engineering, and DDoS detection.

SCREAM Iterative Workflow. SCREAM repeats three steps. Collect & report turns counters into task output. Estimate accuracy computes each task's accuracy. Allocate resources sets each task's memory size.

SCREAM Iterative Workflow (example 1). Collect & report: merge counters from the switches. Estimate accuracy: Task 1's accuracy is below 80%. Allocate resources: give more memory to Task 1.

SCREAM Iterative Workflow (example 2). Collect & report: merge counters from the switches. Estimate accuracy: the skew of Task 2's traffic changes, and Task 2's accuracy drops below 80%. Allocate resources: give more memory to Task 2.
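One epoch of the allocate-resources step can be sketched as moving a fixed step of memory from tasks that meet the 80% bound to tasks that do not. This is a hypothetical simplification. DREAM's actual allocator is designed for fast and stable convergence and is not reproduced here. The `Task` type, `reallocate`, and the 64 KB step are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    memory_kb: int
    estimated_accuracy: float  # output of the accuracy estimator for this epoch


def reallocate(tasks: list[Task], capacity_kb: int, bound: float = 0.8, step_kb: int = 64) -> None:
    """One hypothetical allocation epoch: move memory from tasks that meet the
    accuracy bound to tasks that do not (modifies tasks in place)."""
    poor = [t for t in tasks if t.estimated_accuracy < bound]
    if not poor:
        return
    free = capacity_kb - sum(t.memory_kb for t in tasks)
    need = step_kb * len(poor)
    # Reclaim memory, starting from the most accurate tasks, until the deficit is covered.
    for t in sorted(tasks, key=lambda x: x.estimated_accuracy, reverse=True):
        if free >= need:
            break
        if t.estimated_accuracy >= bound and t.memory_kb > step_kb:
            t.memory_kb -= step_kb
            free += step_kb
    # Give one step of memory to each task below the bound, least accurate first.
    for t in sorted(poor, key=lambda x: x.estimated_accuracy):
        if free < step_kb:
            break
        t.memory_kb += step_kb
        free -= step_kb


tasks = [Task("task1", 256, 0.70), Task("task2", 256, 0.95)]
reallocate(tasks, capacity_kb=512)
print([(t.name, t.memory_kb) for t in tasks])  # [('task1', 320), ('task2', 192)]
```

Running this once per epoch, after counters are collected and accuracy is re-estimated, mirrors the collect, estimate, and allocate loop on the slides.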

SCREAM Challenges.
- Collect & report: implementing network-wide tasks using sketches.
- Estimate accuracy: estimating accuracy without the ground truth.
- Allocate resources: fast and stable allocation, as in DREAM [SIGCOMM’14].

Challenge: Merge Sketches of Different Sizes. Consider a network-wide heavy hitter (HH) task that finds source IPs sending more than 10 Mbps. The task faces three constraints:
1. A task can have traffic from multiple switches.
2. The sketches on these switches may have different sizes.
3. Previous work can only merge sketches of the same size.

(Figure: a 25 Mbps flow, with 10 Mbps on Switch A and 15 Mbps on Switch B. Both sketches have d rows, with widths w1 and w2.) Speaker notes: Let me give an example. Consider a heavy hitter detection task that finds source IPs sending more than 10 Mbps. It has traffic from two switches, A and B, and it wants the size of a flow that sends 10 Mbps on one and 15 Mbps on the other. It uses a Count-Min sketch. When we change the memory given to its sketch on a switch, we change the number of counters per row (w). Previous work simply adds the two counter arrays, which is impossible when the arrays have different sizes.

SCREAM Solution to Merge Sketches for HH Detection. Previous work: min of sums. SCREAM: sum of mins. In the example, the flow carries 25 Mbps in total, 10 on Switch A and 15 on Switch B. It maps to counters 10, 30, 70 in the three rows of Switch A's sketch and to 40, 50, 20 in Switch B's.
- Min of sums: Min(10+40, 30+50, 70+20) = Min(50, 80, 90) = 50.
- Sum of mins: Min(10, 30, 70) + Min(40, 50, 20) = 10 + 20 = 30.

Both over-approximate the true 25, so the smaller result is more accurate. Speaker notes: Previous work can only merge sketches of the same size. A natural extension to sketches of different sizes is to find the flow's counter in each row and sum the counters of the same row across sketches. We call this approach min of sums. Here I show the counters of each row in different colors; taking the minimum of the sums gives 50. Another approach is to take the estimate from each sketch and add the estimates together. We call this sum of mins, and it gives 30 in this example. Sum of mins is never larger than min of sums. Because Count-Min always over-approximates, smaller is more accurate, so SCREAM uses sum of mins.
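Both merge rules can be written down directly. Each takes the counters the flow maps to in every row of every switch's sketch. Each sketch is indexed with its own hash functions, so the widths w1 and w2 never enter the computation. The counter values are read off the slide's example.

Why the inequality always holds: for every row i, a_i + b_i ≥ min(a) + min(b), so the min of sums is at least the sum of mins. Each per-switch min is still at least that switch's true traffic, so the sum of mins never under-estimates.

```python
def min_of_sums(per_switch_counters: list[list[int]]) -> int:
    """Previous approach: add the flow's counters row by row across switches, then take the min.
    per_switch_counters[s][i] is the counter the flow hashes to in row i of switch s.
    Requires the same number of rows on every switch."""
    return min(sum(row) for row in zip(*per_switch_counters))


def sum_of_mins(per_switch_counters: list[list[int]]) -> int:
    """SCREAM's approach: take each switch's Count-Min estimate, then add the estimates."""
    return sum(min(counters) for counters in per_switch_counters)


switch_a = [10, 30, 70]  # the flow's counter in each of the 3 rows (width w1)
switch_b = [40, 50, 20]  # same flow on switch B (width w2)
print(min_of_sums([switch_a, switch_b]), sum_of_mins([switch_a, switch_b]))  # prints: 50 30
```

Sum of mins also works when the sketches have different numbers of rows, because each switch's estimate is computed independently.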

SCREAM Solutions.
- Collect & report (network-wide task implementation using sketches): merge sketches of different sizes for HH, HHH, and SSD, plus an SSD algorithm with higher and more stable accuracy.
- Estimate accuracy: accuracy estimation without the ground truth.
- Allocate resources: fast and stable allocation, as in DREAM [SIGCOMM’14].

Precision Estimation for Heavy Hitter Detection. Precision = (true HHs among the detected HHs) / (detected HHs). The expected number of true HHs among the detected ones is the sum of P[detected HH is true]. Insight: relate this probability to the error on the counters of the detected HHs. A detected HH's estimate is above the threshold. It is a true HH only if its real value is also above the threshold, that is, if the error is smaller than (Estimate - Threshold). Hence P[detected HH is true] = 1 - P[Error ≥ Estimate - Threshold]. Speaker notes: To estimate the probability that a detected HH is a true HH, we need the probability that the error is at least the difference between the estimated value and the threshold. Next I describe how to find this probability.
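The precision argument can be written out explicitly. The notation is introduced here for illustration: $D$ is the set of detected HHs, $T$ the threshold, $\hat{s}_h$ the estimate and $s_h$ the real size of $h$, and $\varepsilon_h = \hat{s}_h - s_h \ge 0$ the Count-Min error.

```latex
\text{Precision} = \frac{\lvert \{ h \in D : s_h > T \} \rvert}{\lvert D \rvert},
\qquad
\mathbb{E}[\text{Precision}] = \frac{1}{\lvert D \rvert} \sum_{h \in D} \Pr[s_h > T]

% s_h = \hat{s}_h - \varepsilon_h, so s_h > T exactly when \varepsilon_h < \hat{s}_h - T
\Pr[s_h > T] = \Pr[\varepsilon_h < \hat{s}_h - T] = 1 - \Pr[\varepsilon_h \ge \hat{s}_h - T]
```

The expectation step uses linearity of expectation over the indicator "h is a true HH" for each detected $h$.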

Precision Estimation Step 1: Find a Bound on the Error. Insight: relate the probability to the error on the counters of the detected HHs. Idea: use the average error in Markov's inequality to bound P[Error ≥ Estimate - Threshold], and therefore P[detected HH is true] = 1 - P[Error ≥ Estimate - Threshold]. Speaker notes: This is the strawman solution. We know the average error on each Count-Min counter. Using Markov's inequality, we can therefore bound the probability that the error exceeds the estimate minus the threshold.
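Step 1 can be sketched as follows. Under uniform hashing, the expected collision on one counter is at most total / width. Markov's inequality bounds one row's error from that average. A Count-Min estimate is wrong by at least the margin only if every row is, so with independent row hashes the per-row bounds multiply. This is the standard Count-Min argument, assumed here; SCREAM's exact estimator may differ. The function names are hypothetical.

```python
def p_true_hh(estimate: float, threshold: float, total: float, width: int, depth: int) -> float:
    """Lower bound on P[detected HH is true] = 1 - P[Error >= Estimate - Threshold]."""
    margin = estimate - threshold
    if margin <= 0:
        # The error is never negative, so P[Error >= margin] = 1.
        return 0.0
    # Expected collision on one counter, assuming uniform hashing: at most total / width.
    avg_err = total / width
    # Markov's inequality per row: P[row error >= margin] <= avg_err / margin.
    p_row = min(1.0, avg_err / margin)
    # The Count-Min error is the minimum over d rows; with independent row hashes,
    # it exceeds the margin only if every row does.
    return 1.0 - p_row ** depth


def estimated_precision(estimates: list[float], threshold: float, total: float,
                        width: int, depth: int) -> float:
    """Expected precision: mean of P[detected HH is true] over the detected HHs (non-empty list)."""
    probs = [p_true_hh(e, threshold, total, width, depth) for e in estimates]
    return sum(probs) / len(probs)
```

For example, suppose total traffic is 1000 units, the width is 100, the depth is 3, and the threshold is 40. A detection with estimate 60 has a 20-unit margin against an average error of 10, which gives 1 - 0.5^3 = 0.875. A detection with estimate 50 gets no credit, because Markov's inequality is too loose at a margin equal to the average error.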

Precision Estimation Step 2: Improve the Bound. Insight: the average error on a counter in a Count-Min row is the collision from heavy items plus the collision from small items, and the counter indices of the detected HHs reveal the heavy collisions. Idea: apply Markov's inequality only to the small items. Speaker notes: We know the counter indices of the heavy items, so we can find their collisions directly. We therefore don't need Markov's inequality for that part of the error.
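One way to express Step 2, under an assumption stated loudly: the heavy collision h_i in each row, meaning the traffic of other detected HHs that share this HH's counter, is taken as known exactly. In practice it would have to be approximated. Write c_i for the counter and s_i for the small-item collision in row i. The real size equals c_i - h_i - s_i in every row. So with the refined estimate min_i(c_i - h_i), the remaining error is min_i s_i, and Markov's inequality only needs the small-item average. The function and its inputs are hypothetical. `small_avg` could, for example, be (total - detected HH traffic) / width.

```python
def p_true_hh_refined(counters: list[float], heavy_collisions: list[float],
                      threshold: float, small_avg: float) -> float:
    """Refined lower bound on P[detected HH is true].

    counters[i]: the detected HH's counter in row i.
    heavy_collisions[i]: traffic of other detected HHs sharing that counter
        (hypothetical input; assumed known exactly here).
    small_avg: average per-counter collision from small items only.
    """
    # Removing the known heavy collisions gives a tighter estimate; what remains
    # is the HH's own traffic plus small-item collisions s_i >= 0.
    estimate = min(c - h for c, h in zip(counters, heavy_collisions))
    margin = estimate - threshold
    if margin <= 0:
        return 0.0
    # Markov's inequality on small-item collisions only, one factor per row.
    p_row = min(1.0, small_avg / margin)
    return 1.0 - p_row ** len(counters)
```

Because heavy items no longer inflate the average used in Markov's inequality, the per-row factor shrinks. The estimated probability moves closer to the true one.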

SCREAM Solutions.
- Collect & report (network-wide task implementation using sketches): merge sketches of different sizes for HH, HHH, and SSD, plus an SSD algorithm with higher and more stable accuracy.
- Estimate accuracy (accuracy estimation without the ground truth): precision estimators for HH, HHH, and SSD tasks.
- Allocate resources: fast and stable allocation, as in DREAM [SIGCOMM’14].

Evaluation. Metrics:
- Satisfaction of a task: the fraction of the task's lifetime with sufficient accuracy.
- Percentage of rejected tasks.

Alternatives:
- OpenSketch: allocates at task instantiation for a bounded error under worst-case traffic (tested with different bounds).
- Oracle: knows in advance the resources a task requires in each switch.

Speaker notes: OpenSketch allocates for a bounded relative error based on worst-case traffic; we test it with different error bounds.
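The two metrics are simple ratios, sketched here with two assumptions: a task's lifetime is counted in measurement epochs, and "sufficient accuracy" means reaching the 80% bound used in the evaluation. The per-epoch accuracies and task counts in the example are hypothetical.

```python
def satisfaction(accuracy_per_epoch: list[float], bound: float = 0.8) -> float:
    """Fraction of a task's lifetime (counted in measurement epochs) with accuracy >= bound."""
    satisfied = sum(1 for acc in accuracy_per_epoch if acc >= bound)
    return satisfied / len(accuracy_per_epoch)


def rejection_percent(num_rejected: int, num_submitted: int) -> float:
    """Percentage of submitted tasks that the allocator could not admit."""
    return 100.0 * num_rejected / num_submitted


print(satisfaction([0.9, 0.7, 0.85, 0.8]), rejection_percent(32, 256))  # prints: 0.75 12.5
```

The two metrics pull in opposite directions. An allocator can raise satisfaction by rejecting more tasks, which is why both are reported.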

Evaluation Setting. Simulation of 8 switches with the following setup:
- 256 task instances (HH, HHH, SSD, and a combination).
- Accuracy bound of 80%.
- 5-minute tasks arriving within 20 minutes.
- A 2-hour CAIDA trace.

Speaker notes: We tested each task type separately and a combination of them.

SCREAM Provides High Accuracy for More Tasks. SCREAM achieves high satisfaction and a low rejection rate. OpenSketch with a loose bound under-provisions, which leads to low satisfaction. With a tight bound it over-provisions, which leads to a high rejection rate. Speaker notes: We tested OpenSketch with 10%, 50%, and 90% relative error bounds.

SCREAM’s Performance Is Close to an Oracle. SCREAM's satisfaction is lower than the Oracle's for two reasons: iterative allocation takes time, and accuracy estimation has errors.

Other Evaluations.
- Changing traffic skew: SCREAM supports more accurate tasks than OpenSketch.
- Accuracy estimation error: SCREAM's accuracy estimation has 5% error on average.
- Other accuracy metrics: tasks in SCREAM have high recall (few false negatives).
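Recall complements the precision that SCREAM estimates. Precision asks how many detected HHs are real, and recall asks how many real HHs were detected. Both are computed below against a known ground-truth set. Treating an empty set as a perfect score is an assumed convention, and the IP sets are hypothetical.

```python
def precision_recall(detected: set[str], true_hh: set[str]) -> tuple[float, float]:
    """Precision: fraction of detected HHs that are real. Recall: fraction of real HHs detected."""
    hits = len(detected & true_hh)
    # Convention (an assumption): an empty set yields a perfect score for that metric.
    precision = hits / len(detected) if detected else 1.0
    recall = hits / len(true_hh) if true_hh else 1.0
    return precision, recall


p, r = precision_recall({"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"},
                        {"10.0.0.1", "10.0.0.2", "10.0.0.9"})
```

Here two of four detections are real (precision 0.5), and two of three real heavy hitters were found (recall of about 0.67).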

Conclusion. Measurement is crucial for SDN management in a resource-constrained environment. SCREAM makes sketch-based SDM practical through dynamic memory allocation. It implements network-wide tasks using sketches and estimates accuracy for 3 types of tasks. SCREAM is available at github.com/USC-NSL/SCREAM.