By: Ran Ben Basat, Technion, Israel

Presentation transcript:

Randomized Admission Policy for Efficient Top-k and Frequency Estimation. By: Ran Ben Basat, Technion, Israel. Based on joint work with Gil Einziger, Roy Friedman, and Yaron Kassner. IEEE INFOCOM 2017. 4/18/2019

Motivation: Computing network statistics (load balancing, fairness, anomaly detection). Monitoring a large number of flows. Allowing real-time queries.

Frequency Estimation: How many packets has a given flow sent? How about another flow? We can't allocate a counter for each flow!

Space Saving (Metwally et al.): For a stream of N items and m allocated counters, Space Saving guarantees $f_x \le \mathrm{Query}(x) \le f_x + N/m$, where $f_x$ is the true frequency of x. Keep a set of m counters; each counter stores the ID of its associated item. If an element that has a counter arrives, increase its counter by one. If an element without a counter arrives, reassign the minimal counter to it and increment that counter. When queried for an element that has a counter, return its value (3 in the slide's example). When queried for an unmonitored element, return the value of the minimal counter (2 in the slide's example). For top-k, simply return the IDs of the largest counters (e.g., guess that the item with the largest counter is the most frequent element). To achieve an $N\epsilon$ approximation, $\lceil 1/\epsilon \rceil$ counters are needed.
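The update and query rules on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name `SpaceSaving` and its method names are hypothetical, and the minimal counter is found with a linear scan rather than the specialized structure a production version would use. It also assumes that an unused counter is simply handed to a new item with value 1, and that a query for an unmonitored item returns 0 while counters are still free.

```python
class SpaceSaving:
    """Sketch of Space Saving with m counters (names are illustrative)."""

    def __init__(self, m):
        self.m = m
        self.counters = {}  # item ID -> counter value

    def update(self, item):
        if item in self.counters:
            # The item already has a counter: increase it by one.
            self.counters[item] += 1
        elif len(self.counters) < self.m:
            # Assumption: a free counter is assigned with value 1.
            self.counters[item] = 1
        else:
            # Reassign the minimal counter to the new item and increment it.
            victim = min(self.counters, key=self.counters.get)
            self.counters[item] = self.counters.pop(victim) + 1

    def query(self, item):
        if item in self.counters:
            return self.counters[item]
        if len(self.counters) < self.m:
            return 0  # free counters remain, so nothing was ever evicted
        # Unmonitored item: return the value of the minimal counter.
        return min(self.counters.values())

    def top_k(self, k):
        # IDs of the k largest counters.
        return sorted(self.counters, key=self.counters.get, reverse=True)[:k]


if __name__ == "__main__":
    sketch = SpaceSaving(m=2)
    for packet in "aaabbc":
        sketch.update(packet)
    print(sketch.query("a"), sketch.query("b"), sketch.top_k(1))
```

With m = ⌈1/ε⌉ counters, the additive error N/m in the guarantee is at most Nε, which is where the counter requirement on the slide comes from.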

Space Saving (Metwally et al.): Invented in the database community. Works well on skewed data, where a small fraction of the elements account for most of the appearances, but not as well when the data is heavy-tailed, which is often the case in flow monitoring.

Illustration

The solution: Probabilistic Admission
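The slide does not spell out the admission rule, so the following is only a sketch under a stated assumption: an unmonitored item takes over the minimal counter with probability 1/(c_min + 1), where c_min is the minimal counter's value, and is otherwise ignored. The class name `RandomizedAdmission`, the `rng` parameter, and the handling of free counters are all hypothetical choices made for illustration.

```python
import random


class RandomizedAdmission:
    """Sketch of a counter table with probabilistic admission.

    Assumption (not stated on the slide): an unmonitored item replaces
    the minimal counter with probability 1 / (c_min + 1).
    """

    def __init__(self, m, rng=None):
        self.m = m
        self.counters = {}  # item ID -> counter value
        self.rng = rng or random.Random()

    def update(self, item):
        if item in self.counters:
            self.counters[item] += 1
            return
        if len(self.counters) < self.m:
            # Assumption: free counters are assigned unconditionally.
            self.counters[item] = 1
            return
        victim = min(self.counters, key=self.counters.get)
        c_min = self.counters[victim]
        # Flip a biased coin: large minimal counters are rarely replaced,
        # so most tail items are filtered out.
        if self.rng.random() < 1.0 / (c_min + 1):
            del self.counters[victim]
            self.counters[item] = c_min + 1

    def query(self, item):
        return self.counters.get(item, 0)


if __name__ == "__main__":
    table = RandomizedAdmission(m=4, rng=random.Random(0))
    for packet in ["heavy"] * 1000 + [f"tail{i}" for i in range(100)]:
        table.update(packet)
    print(table.counters)
```

The intent of such a rule is that an item seen only once rarely displaces an established counter, while an item that keeps arriving is eventually admitted.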

Limited Associativity: Randomly distribute elements into small, constant-size algorithm instances (in the slide's example, an element hashes to instance h = 5). Implementing constant-size instances reduces data structure complexity to (nearly) zero.
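One way to picture limited associativity is a table of small buckets, each acting as an independent fixed-size instance that is searched with a plain linear scan. The sketch below is an assumption-laden illustration: the class name `LimitedAssociativity`, the parameters `num_sets` and `ways`, and the use of `zlib.crc32` as the hash are all hypothetical, and each bucket runs the basic Space Saving replacement rule rather than any specific policy from the talk.

```python
import zlib


class LimitedAssociativity:
    """Sketch: hash each item to one small, fixed-size counter bucket."""

    def __init__(self, num_sets, ways):
        self.ways = ways  # counters per bucket (assumed >= 1)
        self.sets = [[] for _ in range(num_sets)]  # each entry: [item, count]

    def _bucket(self, item):
        # Hypothetical hash choice; any well-mixing hash would do.
        h = zlib.crc32(repr(item).encode("utf-8"))
        return self.sets[h % len(self.sets)]

    def update(self, item):
        bucket = self._bucket(item)
        for entry in bucket:  # linear scan over at most `ways` entries
            if entry[0] == item:
                entry[1] += 1
                return
        if len(bucket) < self.ways:
            bucket.append([item, 1])
            return
        # Bucket is full: reuse its minimal counter for the new item.
        victim = min(bucket, key=lambda e: e[1])
        victim[0] = item
        victim[1] += 1

    def query(self, item):
        bucket = self._bucket(item)
        for stored, count in bucket:
            if stored == item:
                return count
        if len(bucket) < self.ways:
            return 0
        return min(count for _, count in bucket)


if __name__ == "__main__":
    table = LimitedAssociativity(num_sets=8, ways=4)
    for flow in [i % 5 for i in range(100)]:
        table.update(flow)
    print([table.query(flow) for flow in range(5)])
```

Because every bucket holds only a constant number of entries, finding an item or the minimal counter is a short scan, so no heap or linked structure is needed.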

Evaluation – Top-𝑘 How many counters are needed to identify top-32?

Evaluation – Top-𝑘 What precision can we achieve when retrieving 50% of the top-512?

Evaluation – Top-𝑘 How long does it take for the algorithms to converge?

Evaluation – Frequency Estimation How well can we estimate the sender’s packet count?

Limited Associativity Effect

Takeaways: Randomization allows effective filtering of tail items and increases accuracy. Small associativity allows an implementation without auxiliary data structures, saving space and time.

Any questions?

[Figures: evaluation plots. Axes recoverable from the slides: Recall, Precision, and Mean Square Error, plotted against Number of Packets [x100K], Number of Videos [x100K], and a horizontal range from 2^5 to 2^11; precision is also plotted against recall.]