Maintaining Stream Statistics over Sliding Windows Ariel Rosenfeld
Streams Here, There, Everywhere! Network Traffic Engineering. Call Record Analysis. Sensor Data Analysis. Medical, Financial Monitoring. Etc, etc, etc.
Sliding Window Model …1 0 1 0 0 0 1 0 1 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1… Time increases along the stream. The window covers the N most recent elements, ending at the current time. Window Size = N.
The Problem: Basic Counting. Count the number of ones among the last N elements of the stream. Exact solution: Θ(N) bits of memory. Approximate solution: ? Can we get a good approximation with o(N) memory?
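For reference, the exact solution can be sketched as below. It keeps every bit of the window, which is where the Θ(N) memory goes. The class name `ExactWindowCounter` and its methods are hypothetical, chosen only for this illustration.

```python
from collections import deque


class ExactWindowCounter:
    """Exact count of 1s among the last `window` bits (stores the whole window)."""

    def __init__(self, window):
        self.bits = deque(maxlen=window)  # the N most recent bits
        self.ones = 0                     # running count of 1s in the window

    def add(self, bit):
        # When the window is full, the oldest bit expires as the new one arrives.
        if len(self.bits) == self.bits.maxlen:
            self.ones -= self.bits[0]
        self.bits.append(bit)  # deque(maxlen=...) discards the oldest bit itself
        self.ones += bit

    def count(self):
        return self.ones


counter = ExactWindowCounter(window=4)
for bit in [1, 0, 1, 1, 1, 0]:
    counter.add(bit)
exact = counter.count()  # the last four bits are 1, 1, 1, 0
```

The expiring bit is known here only because the whole window is stored. That is exactly what an o(N) structure cannot afford.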
Sliding Window Computation Main difficulty: discounting expiring data. As each element arrives, one element expires, but with o(N) memory the value of the expiring element cannot be known exactly. How do we update our structure? One solution: use histograms. …1 1 0 1 1 1 0 1 0 1 0 0 1 0 1 0 0 0 0 0 1 0 Bucket sums = (3,2,1,2)
Results Exponential Histogram (EH): $(1 + \varepsilon)$-approximation, with $k = 1/\varepsilon$. Space: $O\left(\frac{1}{\varepsilon} \log^2 N\right)$ bits. Time: $O(\log N)$ worst case, $O(1)$ amortized.
Histograms (reminder)
Example (k/2 = 1). Bucket sizes before the new element arrives: 4,2,2,1. Bucket sizes after a 1 arrives: 4,2,2,1,1. …1 1 0 1 1 1 0 1 0 1 0 0 1 0 1 1 1 1… (Figure labels: element arrived this step; future.)
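Observations The only error is in the last (leftmost) bucket. Bucket sizes (left to right): $C_m, C_{m-1}, \dots, C_2, C_1$. Absolute error $\le C_m/2$. Answer $\ge C_{m-1} + \dots + C_2 + C_1 + 1$. Relative error $\le \dfrac{C_m}{2\,(C_{m-1} + \dots + C_2 + C_1 + 1)}$. Maintain: $\dfrac{C_m}{2\,(C_{m-1} + \dots + C_2 + C_1 + 1)} \le \dfrac{1}{k}$.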
Observations Every bucket will become the last bucket at some point in the future. New elements may be all zeros. Bucket sizes (left to right): $C_m, C_{m-1}, \dots, C_2, C_1$. So for every bucket $i$, require $\dfrac{C_i}{2\,(C_{i-1} + \dots + C_2 + C_1 + 1)} \le \dfrac{1}{k}$.
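As a quick check of this per-bucket condition, take bucket sizes 4, 2, 2, 1 (left to right) with k = 2. These values are assumed here for illustration and correspond to a setting with k/2 = 1, so that $C_4 = 4$, $C_3 = 2$, $C_2 = 2$, $C_1 = 1$ and $1/k = 1/2$:

```latex
\begin{align*}
i = 4:&\quad \frac{C_4}{2\,(C_3 + C_2 + C_1 + 1)} = \frac{4}{2\,(2 + 2 + 1 + 1)} = \frac{1}{3} \le \frac{1}{2} \\
i = 3:&\quad \frac{C_3}{2\,(C_2 + C_1 + 1)} = \frac{2}{2\,(2 + 1 + 1)} = \frac{1}{4} \le \frac{1}{2} \\
i = 2:&\quad \frac{C_2}{2\,(C_1 + 1)} = \frac{2}{2\,(1 + 1)} = \frac{1}{2} \le \frac{1}{2} \\
i = 1:&\quad \frac{C_1}{2 \cdot 1} = \frac{1}{2} \le \frac{1}{2}
\end{align*}
```

Every bucket satisfies the bound, so whichever bucket ends up last, the relative error stays within 1/k.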
Invariant Maintain $\dfrac{C_i}{2\,(C_{i-1} + \dots + C_2 + C_1 + 1)} \le \dfrac{1}{k}$. Bucket sizes increase exponentially from right to left. At least $k/2$ and at most $k/2 + 1$ buckets of each size $(1, 2, 4, 8, \dots, 2^i, \dots)$.
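One way to maintain this invariant is sketched below. It is a minimal illustration, not a definitive implementation. It assumes each bucket stores the timestamp of its most recent 1 together with its size. It assumes that once a size has k/2 + 2 buckets, the two oldest are merged into one bucket of double size. It assumes the estimate is the total of all buckets minus half of the oldest one. The class and method names are hypothetical.

```python
class ExponentialHistogram:
    """Approximate count of 1s among the last `window` bits (illustrative sketch)."""

    def __init__(self, window, k):
        self.window = window
        self.max_same = k // 2 + 1  # at most k/2 + 1 buckets of each size
        self.time = 0
        # Newest bucket first; each bucket is (timestamp of its most recent 1, size).
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once its most recent 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit == 1:
            self.buckets.insert(0, (self.time, 1))
            self._merge()

    def _merge(self):
        b, i = self.buckets, 0
        while i < len(b):
            size, j = b[i][1], i
            while j < len(b) and b[j][1] == size:
                j += 1  # b[i:j] holds every bucket of this size
            if j - i <= self.max_same:
                break
            # Too many of this size: merge the two oldest into one bucket of
            # double size, keeping the newer of the two timestamps.
            b[j - 2:j] = [(b[j - 2][0], 2 * size)]
            i = j - 2  # the merged bucket may overflow the next size

    def sizes(self):
        """Bucket sizes from oldest (leftmost) to newest (rightmost)."""
        return [size for _, size in reversed(self.buckets)]

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        return total - self.buckets[-1][1] // 2  # count half of the oldest bucket


eh = ExponentialHistogram(window=1000, k=2)  # k/2 = 1
for _ in range(9):
    eh.add(1)
before = eh.sizes()  # [4, 2, 2, 1]
eh.add(1)
after = eh.sizes()   # [4, 2, 2, 1, 1]
```

The sketch favors readability: the list insert and the sum in `estimate` are linear in the number of buckets, so it does not achieve the O(1) amortized update time. A tuned version would keep running totals and a linked structure.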
Guarantees. Error guarantee: relative error $\le \dfrac{C_m}{2\,(C_{m-1} + \dots + C_2 + C_1 + 1)} \le \dfrac{1}{k}$. Number of buckets: $O(k \log N)$. Each bucket requires $O(\log N)$ bits. Total memory: $O(k \log^2 N)$ bits.
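To get a feel for these bounds, the rough calculation below plugs in concrete numbers. It assumes all hidden constants equal 1, which the asymptotic statements do not promise, so the result is an order-of-magnitude illustration only. The function name `eh_bits_estimate` is hypothetical.

```python
import math


def eh_bits_estimate(n, k):
    """Rough k * log^2(N) bit count, with every hidden constant assumed to be 1."""
    log_n = math.ceil(math.log2(n))
    buckets = k * log_n     # O(k log N) buckets
    return buckets * log_n  # O(log N) bits per bucket


n, k = 1_000_000, 10             # window of a million bits, 10% relative error
approx_bits = eh_bits_estimate(n, k)
exact_bits = n                   # the exact solution stores the whole window
```

Under these assumptions the histogram needs on the order of a few thousand bits, against a million for the exact window.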
Random Counter If the exact size of each bucket is not "a must": Number of buckets: $O(k \log N)$. Each bucket requires $O(\log\log N)$ bits. Total memory: $O(k \log N \log\log N)$ bits.