Optimal Elephant Flow Detection Presented by: Gil Einziger,

Joint work with Ran Ben Basat, Roy Friedman and Yaron Kassner. 03/09/2017.

Background: Network Measurements. Measurements are useful for network optimization: load balancing, traffic engineering, quality of service, and intrusion/anomaly detection. Typical questions: What is the current throughput of a specific flow? What is the current packet count of a specific flow? What is the current throughput of a specific link? (Figure: a stream of packets, each labeled with its flow ID.)

Background: Measurement Challenges. Measurement is difficult to implement in practice. Updates must happen at line speed: up to 14.88 million packets per second on older 10 Gbit/s links (and much faster links exist). The volume of data is also large.
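
The 14.88 million figure follows from minimum-size Ethernet frames. Here is a quick check. It assumes standard Ethernet framing, which the slide does not state: 64-byte frames, plus 8 bytes of preamble/start delimiter and a 12-byte inter-frame gap on the wire.

```python
# Minimum-size Ethernet frame (64 B) plus per-frame wire overhead:
# 8 B preamble + start-of-frame delimiter and a 12 B inter-frame gap.
FRAME_BYTES = 64
OVERHEAD_BYTES = 8 + 12
LINK_BPS = 10e9  # 10 Gbit/s

bits_per_packet = (FRAME_BYTES + OVERHEAD_BYTES) * 8   # 672 bits on the wire
packets_per_second = LINK_BPS / bits_per_packet        # about 14.88 million
ns_per_packet = 1e9 / packets_per_second               # time budget per packet

print(f"{packets_per_second / 1e6:.2f} Mpps, {ns_per_packet:.1f} ns per packet")
# prints: 14.88 Mpps, 67.2 ns per packet
```

At that rate each packet leaves a budget of roughly 67 ns, so the per-packet measurement work has to be very small.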

Background: Measurement Problems (Hardware). No suitable memory technology exists: SRAM is fast but too small, and DRAM is large but slow. A data structure small enough to fit in SRAM gives a speedup.

Background: Measurement Problems (Software). Memory is less constrained, but the key limitation is speed. Methods of acceleration include reducing algorithmic complexity and sampling. Trading memory for speed makes sense!

Problem Statement: Estimate flows' volume (in bytes). Given a flow's identifier, we provide (approximate) answers to queries such as: What is the byte volume of flow 7?
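
As a baseline, exact per-flow byte counting costs one hash-table update per packet. However, it needs a counter for every distinct flow, and that is what makes it too large for fast memory. The function name `exact_volumes` and the (flow_id, size) packet format below are illustrative assumptions.

```python
def exact_volumes(packets):
    """Exact byte volume per flow: one counter for every distinct flow ID."""
    volume = {}
    for flow_id, size in packets:
        volume[flow_id] = volume.get(flow_id, 0) + size
    return volume


trace = [(7, 1500), (2, 64), (7, 40), (3, 576), (7, 1500)]
volumes = exact_volumes(trace)
print(volumes[7])  # 3040 bytes for flow 7
```

The approximate algorithms below answer the same query with a bounded number of counters instead of one per flow.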

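Our Contributions. We present the first (asymptotically) optimal algorithm for weighted frequency estimation. Comparison (space; query time; update time; deterministic):
Space Saving: $O(\varepsilon^{-1})$; $O(1)$; $O(\log \varepsilon^{-1})$; yes.
Count Min Sketch: $O(\varepsilon^{-1}\log\delta^{-1})$; $O(\log\delta^{-1})$; $O(\log\delta^{-1})$; no.
IM-SUM: $O(\varepsilon^{-1})$; $O(1)$; $O(1)$ amortized; yes.
DIM-SUM: $O(\varepsilon^{-1})$; $O(1)$; $O(1)$ worst case; yes.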
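
To illustrate the Count Min Sketch row, here is a minimal weighted Count Min Sketch. It assumes the usual parameterization: width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, which bounds the error by ε·TotalSum with probability at least 1 - δ. The class name is illustrative. Python's built-in `hash` with a per-row salt stands in for pairwise-independent hash functions, which is a simplifying assumption.

```python
import math
import random


class CountMinSketch:
    """Weighted Count Min Sketch (illustrative sketch)."""

    def __init__(self, eps, delta, seed=0):
        self.width = math.ceil(math.e / eps)          # counters per row
        self.depth = math.ceil(math.log(1 / delta))   # rows: O(log 1/delta)
        rng = random.Random(seed)
        # One random salt per row; hashing (salt, flow) stands in for
        # pairwise-independent hash functions (an assumption of this sketch).
        self.salts = [rng.getrandbits(64) for _ in range(self.depth)]
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _column(self, row, flow):
        return hash((self.salts[row], flow)) % self.width

    def update(self, flow, weight):
        # One counter per row is incremented: O(depth) work.
        for r in range(self.depth):
            self.rows[r][self._column(r, flow)] += weight

    def query(self, flow):
        # Collisions only add weight, so the minimum never underestimates.
        return min(self.rows[r][self._column(r, flow)] for r in range(self.depth))


cms = CountMinSketch(eps=0.01, delta=0.01)
for flow, size in [(7, 1500), (2, 64), (7, 40), (3, 576)]:
    cms.update(flow, size)
print(cms.width, cms.depth, cms.query(7) >= 1540)  # 272 5 True
```

Both update and query touch one counter per row, which is where the $O(\log\delta^{-1})$ times come from. The error bound holds only with probability 1 - δ, hence "no" in the deterministic column.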

Gist of existing works. Existing works maintain a flow cache of limited size. When the cache is full, the smallest flow is evicted, which requires logarithmic time per update. What can we do differently?
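
The logarithmic eviction cost shows up in a minimal weighted Space Saving sketch. It assumes a binary min-heap with lazy deletion to find the smallest flow, which is one common choice; the slides do not name the data structure. `SpaceSaving`, `capacity` and the method names are illustrative. With this heap, each update costs O(log capacity) amortized.

```python
import heapq


class SpaceSaving:
    """Weighted Space Saving with a lazily updated min-heap (illustrative sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity   # maximum number of monitored flows
        self.counts = {}           # flow id -> estimated volume
        self.heap = []             # (volume, flow id) pairs; may hold stale pairs

    def _push(self, flow, volume):
        heapq.heappush(self.heap, (volume, flow))  # O(log n)
        if len(self.heap) > 2 * self.capacity:
            # Too many stale pairs: rebuild the heap from the live counters.
            self.heap = [(v, f) for f, v in self.counts.items()]
            heapq.heapify(self.heap)

    def _pop_min(self):
        # Pop until the top pair matches a live counter; each pop is O(log n).
        while True:
            volume, flow = heapq.heappop(self.heap)
            if self.counts.get(flow) == volume:
                return flow, volume

    def update(self, flow, weight):
        if flow in self.counts:
            self.counts[flow] += weight
        elif len(self.counts) < self.capacity:
            self.counts[flow] = weight
        else:
            # Cache full: evict the smallest flow; the newcomer inherits its volume.
            victim, min_volume = self._pop_min()
            del self.counts[victim]
            self.counts[flow] = min_volume + weight
        self._push(flow, self.counts[flow])

    def query(self, flow):
        # Unmonitored flows are reported as 0 in this sketch.
        return self.counts.get(flow, 0)


ss = SpaceSaving(capacity=4)
for flow, weight in [(0, 2), (4, 5), (0, 2), (3, 8), (2, 6), (3, 5), (5, 2)]:
    ss.update(flow, weight)
print(ss.query(5), ss.query(0))  # 6 0
```

The packets are the ones used in the IM-SUM example below. With a 4-entry cache, the last packet evicts only the single smallest flow (flow 0, volume 4), and flow 5 inherits that volume: 4 + 2 = 6. Every eviction pays for a heap operation.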

Iterative Median SUMing. In IM-SUM, instead of removing the minimum every time, we: double the memory; periodically find the median (in linear time); and periodically remove all flows whose volume is less than the median. The result is (amortized) constant-time updates.
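
The amortized constant-time claim can be seen with a short accounting argument. It assumes a table of 2M counters, O(1) hash-table lookups and a linear-time median. It also assumes that each purge frees M slots, which is what happens when half of the entries fall below the median.

```latex
\begin{aligned}
\text{cost of one purge} &= \underbrace{O(2M)}_{\text{find the median}} + \underbrace{O(2M)}_{\text{remove entries below it}} = O(M),\\
\text{new flows admitted between two purges} &\ge M,\\
\text{amortized cost per packet} &= O(1) + \frac{O(M)}{M} = O(1).
\end{aligned}
```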

Iterative Median SUMing: Example. Arriving packets are admitted to the flow table with the Last Median added to their volume. (Figure: Last Median = 0; a packet of flow 0 with weight 2 arrives; flow table: flow 0 → 2 + 0 = 2.)

Iterative Median SUMing: Example. Arriving packets are admitted to the flow table. (Figure: a packet of flow 4 with weight 5 arrives; flow table: flow 0 → 2, flow 4 → 5 + 0 = 5.)

Iterative Median SUMing: Example. If the flow already has an entry, update its volume. (Figure: a packet of flow 0 with weight 2 arrives; flow table: flow 0 → 2 + 2 = 4, flow 4 → 5.)

Iterative Median SUMing: Example. Arriving packets are admitted to the flow table. (Figure: a packet of flow 3 with weight 8 arrives; flow table: flow 0 → 4, flow 4 → 5, flow 3 → 8 + 0 = 8.)

Iterative Median SUMing: Example. Arriving packets are admitted to the flow table. (Figure: a packet of flow 2 with weight 6 arrives; flow table: flow 0 → 4, flow 4 → 5, flow 3 → 8, flow 2 → 6 + 0 = 6.)

Iterative Median SUMing: Example. If the flow already has an entry, update its volume. (Figure: a packet of flow 3 with weight 5 arrives; flow table: flow 0 → 4, flow 4 → 5, flow 3 → 8 + 5 = 13, flow 2 → 6.)

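Iterative Median SUMing: Example. If the table is full, find the median. (Figure: a packet of flow 5 with weight 2 arrives and the table is full: flow 0 → 4, flow 4 → 5, flow 3 → 13, flow 2 → 6. The median is (5 + 6)/2 = 5.5, which becomes the Last Median.)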
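
The median step can be done in expected linear time with quickselect. The sketch below uses randomized selection, swapped in for illustration; a worst-case linear bound would need median-of-medians selection instead. For an even number of entries it averages the two middle values, which reproduces the 5.5 on the slide. `quickselect` and `median` are illustrative names.

```python
import random


def quickselect(values, k, rng=random):
    """Return the k-th smallest value (0-based) in expected linear time."""
    vals = list(values)
    while True:
        pivot = rng.choice(vals)
        lows = [v for v in vals if v < pivot]
        pivots = [v for v in vals if v == pivot]
        if k < len(lows):
            vals = lows                      # answer is below the pivot
        elif k < len(lows) + len(pivots):
            return pivot                     # answer equals the pivot
        else:
            k -= len(lows) + len(pivots)     # answer is above the pivot
            vals = [v for v in vals if v > pivot]


def median(values):
    """Median; for an even count, the mean of the two middle values."""
    n = len(values)
    if n % 2 == 1:
        return quickselect(values, n // 2)
    return (quickselect(values, n // 2 - 1) + quickselect(values, n // 2)) / 2


print(median([4, 5, 13, 6]))  # 5.5, the flow-table volumes from the example
```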

Iterative Median SUMing: Example. If the table is full, find the median and remove the entries below it. (Figure: Last Median = 5.5; flow 0 (volume 4) and flow 4 (volume 5) fall below the median and are removed; flow 3 → 13 and flow 2 → 6 remain.)

Iterative Median SUMing: Example. If the table is full, find the median, remove the entries below it, and admit the new entry with volume median + weight. (Figure: Last Median = 5.5; flow table: flow 5 → 2 + 5.5 = 7.5, flow 3 → 13, flow 2 → 6.)

Iterative Median SUMing: Example. A query for a monitored item returns its flow-table volume: Query(5) = 7.5. A query for an unmonitored item returns the Last Median: Query(4) = 5.5. (Figure: Last Median = 5.5; flow table: flow 5 → 7.5, flow 3 → 13, flow 2 → 6.)
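
Putting the steps together gives a minimal Python sketch of IM-SUM as described on these slides. This is a reconstruction, not the authors' code from the repository linked at the end. It assumes each purge removes exactly the M smallest of the 2M entries, which matches "remove entries below the median" when volumes are distinct, and records the median as the Last Median. `IMSum`, `update` and `query` are illustrative names.

```python
import statistics


class IMSum:
    """Sketch of IM-SUM as described on the slides (not the authors' code)."""

    def __init__(self, m):
        self.m = m                # the table holds up to 2*m flows
        self.table = {}           # flow id -> estimated volume
        self.last_median = 0      # answer for flows not in the table

    def _purge(self):
        # statistics.median sorts (O(M log M)); the slides call for a
        # linear-time median so that updates are amortized O(1).
        self.last_median = statistics.median(self.table.values())
        # Remove the m smallest entries (those below the median).
        for flow in sorted(self.table, key=self.table.get)[: self.m]:
            del self.table[flow]

    def update(self, flow, weight):
        if flow in self.table:
            self.table[flow] += weight
            return
        if len(self.table) == 2 * self.m:
            self._purge()
        # New flows start from the Last Median.
        self.table[flow] = self.last_median + weight

    def query(self, flow):
        return self.table.get(flow, self.last_median)


# Replay the packets from the example slides with M = 2 (a 4-entry table).
sketch = IMSum(m=2)
for flow, weight in [(0, 2), (4, 5), (0, 2), (3, 8), (2, 6), (3, 5), (5, 2)]:
    sketch.update(flow, weight)

print(sketch.table)                        # {3: 13, 2: 6, 5: 7.5}
print(sketch.query(5), sketch.query(4))    # 7.5 5.5
```

Flow 4 was evicted with a true volume of 5, so Query(4) = 5.5 overestimates it, as the guarantee f(x) ≤ Query(x) allows.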

IM-SUM: Guarantees. When IM-SUM is configured with 2M counters, it guarantees that $f(x) \le \mathrm{Query}(x) \le f(x) + \frac{\mathrm{TotalSum}}{M}$, where $f(x)$ is the true volume of flow $x$. Thus, for an error of $\varepsilon \cdot \mathrm{TotalSum}$ we require $2/\varepsilon$ counters, which is (asymptotically) optimal.
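
One way to see where the TotalSum/M term comes from is the proof sketch below. It is our own reconstruction, not taken from the slides. It assumes each purge k removes exactly the M smallest of the 2M counters and records their median $q_k$. Here $N_k$ is the total weight seen before purge k, $S_k$ is the sum of the M surviving counters after purge k, and $q_0 = S_0 = N_0 = 0$.

```latex
\begin{aligned}
&\text{Between purges } k-1 \text{ and } k, \text{ each admitted flow starts from a base of } q_{k-1}:\\
&\qquad \textstyle\sum_{\text{table}} c = S_{k-1} + M\,q_{k-1} + (N_k - N_{k-1}).\\
&\text{All counters are } \ge q_{k-1}, \text{ and the } M \text{ largest are } \ge q_k, \text{ so}\\
&\qquad M\,q_k \le S_{k-1} + (N_k - N_{k-1}), \qquad S_k \le S_{k-1} + (N_k - N_{k-1}).\\
&\text{By induction, } S_k \le N_k, \text{ hence } q_k \le N_k / M \le \mathrm{TotalSum}/M.\\
&\text{A monitored counter lies in } [f(x),\, f(x) + q], \text{ and an unmonitored flow is answered with } q \ge f(x), \text{ so}\\
&\qquad f(x) \le \mathrm{Query}(x) \le f(x) + \mathrm{TotalSum}/M.\\
&\text{With } M = 1/\varepsilon, \text{ the error is at most } \varepsilon \cdot \mathrm{TotalSum} \text{ using } 2M = 2/\varepsilon \text{ counters.}
\end{aligned}
```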

Summary. IM-SUM and DIM-SUM are compared against Space Saving and Count Min Sketch (the same comparison as on the Our Contributions slide). In this talk we saw IM-SUM, with amortized constant-time updates. The paper also presents DIM-SUM, with worst-case constant-time updates.

Empirical Evaluation: UCLA packet trace (captured on the UCLA campus). (Figure: IM-SUM and DIM-SUM compared with Space Saving and Count Min Sketch.)

Empirical Evaluation: San Jose Internet trace (backbone link in San Jose). (Figure: IM-SUM and DIM-SUM compared with Space Saving and Count Min Sketch.)

Contributions. Theoretical: the first memory-optimal, constant-time heavy hitters algorithm for weighted inputs. Practical: speedup on real Internet packet traces.

Thank You! IM-SUM and DIM-SUM are open sourced: https://github.com/kassnery/dimsum