Maintaining Stream Statistics over Sliding Windows


Maintaining Stream Statistics over Sliding Windows Ariel Rosenfeld

Streams Here, There, Everywhere! Network Traffic Engineering. Call Record Analysis. Sensor Data Analysis. Medical, Financial Monitoring. Etc.

Sliding Window Model Time increases from left to right: …1 0 1 0 0 0 1 0 1 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1… Window size = N, ending at the current time.

The Problem: Basic Counting Count the number of ones among the last N elements (a window of size N). Exact solution: Θ(N) memory. Approximate solution: can we get a good approximation with o(N) memory?
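As a point of reference for the Θ(N) exact solution on this slide, here is a minimal Python sketch that simply stores the whole window. The class name `ExactWindowCounter` and its methods are illustrative assumptions, not names from the presentation.

```python
from collections import deque


class ExactWindowCounter:
    """Exact count of ones among the last n bits; stores all n bits."""

    def __init__(self, n):
        self.window = deque(maxlen=n)  # the n most recent bits
        self.ones = 0                  # number of ones currently in the window

    def add(self, bit):
        # When the window is full, the oldest bit is about to expire.
        if len(self.window) == self.window.maxlen:
            self.ones -= self.window[0]
        self.window.append(bit)        # deque drops the oldest bit itself
        self.ones += bit

    def count(self):
        return self.ones


counter = ExactWindowCounter(4)
for bit in [1, 1, 0, 1, 1, 1]:
    counter.add(bit)
print(counter.count())  # ones among the last 4 bits: 0, 1, 1, 1
```

The memory cost is one stored bit per window position, which is what the approximate structures on the later slides try to avoid.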

Sliding Window Computation Main difficulty: discounting expiring data. As each element arrives, one element expires, and the value of the expiring element can't be known exactly without storing the whole window. How do we update our structure? One solution: use histograms. …1 1 0 1 1 1 0 1 0 1 0 0 1 0 1 0 0 0 0 0 1 0 Bucket sums = (3,2,1,2)

Results Exponential Histogram (EH): (1 + ε)-approximation, with k = 1/ε. Space: $O(\frac{1}{\varepsilon} \log^2 N)$ bits. Time: O(log N) worst case, O(1) amortized.

Histograms (reminder)

Example k/2 = 1. Bucket sizes before the new element: 4,2,2,1. Bucket sizes after it arrives: 4,2,2,1,1. ….1 1 0 1 1 1 0 1 0 1 0 0 1 0 1 1 1 1… (figure labels: element arrived this step; future)
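The bucket sizes in this example can be traced with a small Python sketch. It assumes the usual merge rule implied by the invariant slide: each size may have at most k/2 + 1 buckets, and when a size exceeds that, its two oldest buckets are merged into one of double size, which may cascade. The function name `add_one` and the parameter `max_per_size` are hypothetical.

```python
def add_one(sizes, max_per_size):
    """Return the bucket sizes (oldest first) after a new 1 arrives."""
    sizes = sizes + [1]                  # the new 1 gets its own bucket
    size, i = 1, len(sizes) - 1          # i: newest bucket of the current size
    while True:
        j = i
        while j >= 0 and sizes[j] == size:
            j -= 1                       # buckets j+1 .. i have this size
        if i - j <= max_per_size:
            return sizes
        sizes[j + 1:j + 3] = [2 * size]  # merge the two oldest of this size
        size, i = 2 * size, j + 1        # the merge may overflow the next size


sizes = [4, 2, 2, 1]                     # k/2 = 1, so at most 2 buckets per size
sizes = add_one(sizes, max_per_size=2)   # [4, 2, 2, 1, 1]
sizes = add_one(sizes, max_per_size=2)   # [4, 4, 2, 1] after two cascading merges
print(sizes)
```

The first arrival matches the two states on the slide. The second arrival is a continuation under the assumed merge rule and is not part of the transcript.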

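Observations The error is in the last (leftmost) bucket. Bucket sizes (left to right): $C_m, C_{m-1}, \ldots, C_2, C_1$. Absolute error $\le C_m/2$. Answer $\ge C_{m-1} + \cdots + C_2 + C_1 + 1$. Relative error $\le \frac{C_m}{2(C_{m-1} + \cdots + C_2 + C_1 + 1)}$. Maintain: $\frac{C_m}{2(C_{m-1} + \cdots + C_2 + C_1 + 1)} \le \frac{1}{k}$.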
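The bound on this slide can be written out step by step. The numeric instance uses the bucket sizes 4,2,2,1 from the example slide with k = 2, and assumes the estimate counts half of the oldest bucket.

```latex
% Only the oldest bucket C_m may straddle the window boundary.
% At least one and at most C_m of its ones are still inside the window,
% so counting C_m / 2 for it is off by at most C_m / 2.
\[
  \text{absolute error} \le \frac{C_m}{2},
  \qquad
  \text{true count} \ge 1 + \sum_{i=1}^{m-1} C_i .
\]
% Dividing the first bound by the second gives the relative error.
\[
  \text{relative error}
  \le \frac{C_m}{2\left(1 + \sum_{i=1}^{m-1} C_i\right)} .
\]
% Instance: sizes 4, 2, 2, 1 with k = 2.
\[
  \frac{4}{2\,(1 + 2 + 2 + 1)} = \frac{1}{3} \le \frac{1}{2} = \frac{1}{k} .
\]
```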

Observations Every bucket will become the last bucket in the future. New elements may be all zeros. Bucket sizes (left to right): $C_m, C_{m-1}, \ldots, C_2, C_1$. For every bucket i, $\frac{C_i}{2(C_{i-1} + \cdots + C_2 + C_1 + 1)} \le \frac{1}{k}$.

Invariant Maintain $\frac{C_i}{2(C_{i-1} + \cdots + C_2 + C_1 + 1)} \le \frac{1}{k}$. Exponentially increasing bucket sizes from right to left. At least k/2 and at most k/2 + 1 buckets of each size ($1, 2, 4, 8, \ldots, 2^i, \ldots$).
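Putting the invariant together with expiry, a minimal Python sketch of an exponential histogram might look like the following. It is an illustration under stated assumptions, not the presenter's implementation: each bucket stores the timestamp of its newest 1 and its size, a size is allowed at most k/2 + 1 buckets as on this slide, the two oldest buckets of an overflowing size are merged, and the estimate counts half of the oldest bucket. All names (`ExponentialHistogram`, `add`, `estimate`, `max_per_size`) are hypothetical.

```python
class ExponentialHistogram:
    """Approximate count of ones among the last `window_size` bits."""

    def __init__(self, window_size, k):
        self.window_size = window_size
        self.max_per_size = k // 2 + 1  # assumed rule: at most k/2 + 1 per size
        self.buckets = []               # (timestamp of newest 1, size), oldest first
        self.total = 0                  # sum of all bucket sizes
        self.time = 0

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its newest 1 has left the window.
        if self.buckets and self.buckets[0][0] <= self.time - self.window_size:
            self.total -= self.buckets.pop(0)[1]
        if bit:
            self.buckets.append((self.time, 1))
            self.total += 1
            self._merge()

    def _merge(self):
        size, i = 1, len(self.buckets) - 1
        while True:
            j = i
            while j >= 0 and self.buckets[j][1] == size:
                j -= 1                  # buckets j+1 .. i have this size
            if i - j <= self.max_per_size:
                return
            # Merge the two oldest buckets of this size; keep the newer timestamp.
            self.buckets[j + 1:j + 3] = [(self.buckets[j + 2][0], 2 * size)]
            size, i = 2 * size, j + 1

    def estimate(self):
        if not self.buckets:
            return 0
        # Only the oldest bucket is uncertain, so count half of it.
        return self.total - self.buckets[0][1] // 2


eh = ExponentialHistogram(window_size=1000, k=2)
for _ in range(9):
    eh.add(1)
print([size for _, size in eh.buckets])  # [4, 2, 2, 1], as in the example slide
print(eh.estimate())
```

Keeping the buckets in a plain list makes expiry linear in the number of buckets, which is acceptable for a sketch; a linked structure would be the natural choice if per-element cost mattered.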

Guarantees Error guarantee: relative error $\le \frac{C_m}{2(C_{m-1} + \cdots + C_2 + C_1 + 1)} \le \frac{1}{k}$. Number of buckets: O(k log N). Each bucket requires O(log N) bits. Total memory: $O(k \log^2 N)$ bits.
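The bucket count and memory bound on this slide follow from the invariant. The short derivation below assumes, as the invariant slide states, that there are at least k/2 and at most k/2 + 1 buckets of every size, and writes 2^j for the largest bucket size (the symbol j is introduced here for illustration).

```latex
% All buckets except the oldest lie entirely inside the window,
% so their sizes sum to at most N. Each size 1, 2, ..., 2^{j-1}
% occurs at least k/2 times:
\[
  \frac{k}{2}\left(1 + 2 + \cdots + 2^{j-1}\right)
  = \frac{k}{2}\left(2^{j} - 1\right) \le N
  \quad\Longrightarrow\quad
  j \le \log_2\!\left(\frac{2N}{k} + 1\right).
\]
% There are j + 1 distinct sizes with at most k/2 + 1 buckets each:
\[
  \#\text{buckets} \le \left(\frac{k}{2} + 1\right)(j + 1) = O(k \log N).
\]
% Each bucket stores a timestamp and a size in O(log N) bits:
\[
  \text{total memory} = O(k \log N) \cdot O(\log N) = O(k \log^2 N) \text{ bits}.
\]
```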

Random Counter If the exact size of a bucket is not a must: Number of buckets: O(k log N). Each bucket requires O(log log N) bits. Total memory: O(k log N log log N) bits.
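The transcript does not say which random counter is meant. Assuming it refers to a Morris-style probabilistic counter, which stores only an exponent and therefore needs O(log log N) bits to count up to N, a minimal Python sketch could look like this. The class `MorrisCounter` and its methods are hypothetical names.

```python
import random


class MorrisCounter:
    """Probabilistic counter that stores only an exponent."""

    def __init__(self, rng=None):
        self.exponent = 0
        self.rng = rng or random.Random()

    def increment(self):
        # Raise the exponent with probability 2^(-exponent).
        if self.rng.random() < 2.0 ** -self.exponent:
            self.exponent += 1

    def estimate(self):
        # 2^exponent - 1 is an unbiased estimate of the number of increments.
        return 2 ** self.exponent - 1


counter = MorrisCounter(random.Random(0))
for _ in range(1000):
    counter.increment()
print(counter.exponent, counter.estimate())
```

A single counter has high variance, so the value printed for one run can be far from 1000; averaging several independent counters tightens the estimate at the cost of extra space.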