Published by Damian Gregory. Modified over 8 years ago.
1
1 Approximating Quantiles over Sliding Windows Srimathi Harinarayanan CMPS 565
2
2 Streams Here, There, Everywhere! 1 0 1 1 1 0 1 0 0 1 1 Network Traffic Engineering. Call Record Analysis. Sensor Data Analysis. Medical, Financial Monitoring. Etc, etc, etc.
3
3 Problem Definition. Data stream environment: one pass over the data; each data element is a value. Φ-quantile (Φ ∈ (0, 1]): the element with rank ⌈ΦN⌉ in an ordered sequence of N data elements.
4
4 Stream (t0 to t15): 12, 10, 11, 10, 1, 10, 11, 9, 6, 7, 8, 11, 4, 5, 2, 3. Sorted: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 11, 11, 11, 12 (N = 16). The 0.5-quantile returns the element ranked 8 (0.5 × 16), which is 8. The 0.75-quantile returns the element ranked 12 (0.75 × 16), which is 10.
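The rank rule on this slide can be checked with a few lines of Python. This is a minimal sketch of the exact (non-streaming) definition only; the function name `phi_quantile` is illustrative and not taken from the presentation.

```python
import math

def phi_quantile(values, phi):
    """Return the element of rank ceil(phi * N) in sorted order (ranks start at 1)."""
    if not 0 < phi <= 1:
        raise ValueError("phi must lie in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(phi * len(ordered))
    return ordered[rank - 1]

# Sorted values from the slide's example (N = 16).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 11, 11, 11, 12]
print(phi_quantile(data, 0.5))   # element of rank 8
print(phi_quantile(data, 0.75))  # element of rank 12
```

Sorting the whole input needs O(N) memory, which is exactly what the summaries in the rest of the talk try to avoid.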
5
5 3 Models. Data Stream Model: compute the Φ-quantile over all the data items seen so far. Sliding Window Model: compute the Φ-quantile over the N most recent elements of the data stream. n-of-N Model: for any n ≤ N, compute the Φ-quantile over the n most recent elements of the data stream.
6
6 Sliding Window Model. [Figure: bit stream … 1 0 1 0 0 0 1 0 1 1 1 1 1 1 0 0 0 1 0 1 0 0 1 1 … with time increasing toward the current time; the window of size N covers the most recent N elements.]
7
7 Sliding window model. Stream (t0 to t15): 12, 10, 11, 10, 1, 10, 11, 9, 6, 7, 8, 11, 4, 5, 2, 3. Window size = 12. At time t11 the window holds t0 to t11 (sorted: 1, 6, 7, 8, 9, 10, 10, 10, 11, 11, 11, 12) and the 0.5-quantile returns 10. At time t15 the window holds t4 to t15 (sorted: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11) and the 0.5-quantile returns 6.
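A brute-force version of the sliding window model makes the two results on this slide easy to reproduce. The sketch below keeps the whole window in memory, so it is a baseline rather than a space-efficient summary. The arrival order in `STREAM` is an assumption: it is one order that is consistent with the sorted list on slide 4 and with the results quoted here.

```python
import math
from collections import deque

def sliding_quantiles(stream, window_size, phi):
    """Yield (t, quantile) for every time t at which the window is full."""
    window = deque(maxlen=window_size)  # the oldest element drops out automatically
    for t, value in enumerate(stream):
        window.append(value)
        if len(window) == window_size:
            ordered = sorted(window)
            yield t, ordered[math.ceil(phi * window_size) - 1]

# Assumed arrival order for t0..t15 (hypothetical reconstruction, see lead-in).
STREAM = [12, 10, 11, 10, 1, 10, 11, 9, 6, 7, 8, 11, 4, 5, 2, 3]

results = dict(sliding_quantiles(STREAM, window_size=12, phi=0.5))
print(results[11], results[15])
```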
8
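8 n-of-N model. Stream (t0 to t15): 12, 10, 11, 10, 1, 10, 11, 9, 6, 7, 8, 11, 4, 5, 2, 3. N = 12. At time t11 with n = 8, the 8 most recent elements are 1, 10, 11, 9, 6, 7, 8, 11 (sorted: 1, 6, 7, 8, 9, 10, 11, 11) and the 0.5-quantile returns 8. At time t15 with n = 4, the 4 most recent elements are 4, 5, 2, 3 (sorted: 2, 3, 4, 5) and the 0.5-quantile returns 3.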
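The n-of-N model differs from the sliding window model only in that the query chooses n at query time. A minimal exact sketch, assuming the same hypothetical arrival order for t0..t15 and an illustrative function name `n_of_N_quantile`:

```python
import math

def n_of_N_quantile(stream, t, n, N, phi):
    """Exact phi-quantile of the n most recent elements at time t, for any n <= N."""
    if not 1 <= n <= N:
        raise ValueError("n must satisfy 1 <= n <= N")
    recent = stream[max(0, t - n + 1): t + 1]
    ordered = sorted(recent)
    return ordered[math.ceil(phi * len(ordered)) - 1]

# Assumed arrival order for t0..t15 (hypothetical reconstruction).
STREAM = [12, 10, 11, 10, 1, 10, 11, 9, 6, 7, 8, 11, 4, 5, 2, 3]

print(n_of_N_quantile(STREAM, t=11, n=8, N=12, phi=0.5))
print(n_of_N_quantile(STREAM, t=15, n=4, N=12, phi=0.5))
```

A real n-of-N summary has to answer for every n without knowing it in advance, which this baseline does only by storing all N elements.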
9
9 Applications: Sliding Window Model in Data Streams. Useful for network traffic management and sensor data. Finding the top-ranked web pages among the N most recently accessed pages. In financial markets, investors are often interested in the N most recent bids.
10
10 Previous Work on Approximating Quantiles in One Scan of Data. G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Approximate medians and other quantiles in one pass and with limited memory. Space: $O((1/\varepsilon)\log^2(\varepsilon N))$. G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries (GK algorithm). Space: $O((1/\varepsilon)\log(\varepsilon N))$. The GK algorithm is the most efficient of these because it uses the least space, and it does not require advance knowledge of N.
11
11 Definitions. Φ-quantile: a Φ-quantile (Φ ∈ (0, 1]) of an ordered sequence of N data elements is the element with rank ⌈ΦN⌉. Quantile query: given Φ, find the data element with rank ⌈ΦN⌉ among all elements in the stream. Variation: the N most recent elements (sliding window model). ε-approximate: for the target rank r = ⌈ΦN⌉, return an element whose rank lies within the interval [r − εN, r + εN].
12
12 Computation of Quantile Summaries over Sliding Windows: 2 Methods. (1) Xuemin Lin, Hongjun Lu, Jian Xu, Jeffrey Xu Yu. Continuously Maintaining Quantile Summaries of the Most Recent N Elements over a Data Stream. IEEE, 2004. (2) Arvind Arasu, Gurmeet Singh Manku (Stanford University). Approximating frequency counts and quantiles using sliding window model. 2004.
13
13 Computation of Quantile Summaries over Sliding Windows: LLXY04. GK algorithm + concept of aging (computing quantiles over a sliding window of the most recent N elements). Under the sliding window model, a summary is maintained for the N most recently seen data elements. Eliminating exactly the outdated elements requires O(N) space.
14
14 ε-approximate. A quantile summary for a data sequence is ε-approximate if, for any given rank r, it returns a value whose rank r' is guaranteed to be within the interval [r − εN, r + εN]. Example: for a data stream with 100 elements, a 0.5-quantile query with ε = 0.1 returns a value v whose true rank is within [40, 60].
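The tolerance interval in this definition is simple arithmetic, sketched below. The helper names `rank_interval` and `within_tolerance` are illustrative and not part of the presentation.

```python
import math

def rank_interval(n, phi, eps):
    """Return (r - eps*n, r + eps*n) for the target rank r = ceil(phi * n)."""
    r = math.ceil(phi * n)
    return r - eps * n, r + eps * n

def within_tolerance(true_rank, n, phi, eps):
    """True if an answer with the given true rank is an acceptable eps-approximate answer."""
    low, high = rank_interval(n, phi, eps)
    return low <= true_rank <= high

# Slide example: 100 elements, phi = 0.5, eps = 0.1, so the target rank is 50.
low, high = rank_interval(100, 0.5, 0.1)
print(round(low), round(high))
print(within_tolerance(45, 100, 0.5, 0.1), within_tolerance(61, 100, 0.5, 0.1))
```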
15
15 Quantile Sketch. Data structure: $\{(v_i, r_i^-, r_i^+) : 1 \le i \le m\}$. Each value $v_i$ is one of the elements seen so far; $r_i^-$ is a lower bound on the rank of $v_i$ and $r_i^+$ is an upper bound on the rank of $v_i$. Invariants: $v_i \le v_{i+1}$ and $r_i^- \le r_{i+1}^-$ for $1 \le i \le m-1$, and $r_i^- \le r_i \le r_i^+$, where $r_i$ is the rank of $v_i$.
16
16 Example. For the 16-element stream t0 to t15 of slide 4, a quantile sketch consisting of 6 tuples: {(1,1,1), (2,2,9), (3,3,10), (5,4,10), (10,10,10), (12,16,16)}
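The six tuples can be checked against the sketch invariants using the sorted data from slide 4. The code below is a minimal validity check, not part of the LLXY04 algorithm. Because 10 and 11 occur three times each, it assumes a value may take any of the positions it occupies in sorted order as its rank.

```python
from bisect import bisect_left, bisect_right

def sketch_is_valid(sketch, sorted_data):
    """Check ordering and that each [r-, r+] contains a rank of its value."""
    prev_v = prev_lo = None
    for v, lo, hi in sketch:
        first = bisect_left(sorted_data, v) + 1  # lowest rank v can take
        last = bisect_right(sorted_data, v)      # highest rank v can take
        if last < first or lo > hi:              # v absent, or empty interval
            return False
        if hi < first or lo > last:              # no rank of v lies in [lo, hi]
            return False
        if prev_v is not None and (v < prev_v or lo < prev_lo):
            return False                         # values or lower bounds out of order
        prev_v, prev_lo = v, lo
    return True

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 11, 11, 11, 12]
sketch = [(1, 1, 1), (2, 2, 9), (3, 3, 10), (5, 4, 10), (10, 10, 10), (12, 16, 16)]
print(sketch_is_valid(sketch, data))
```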
17
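17 ε-approximate sketch. Theorem: suppose a sketch S satisfies (1) $r_1^+ \le \varepsilon N + 1$, (2) $r_m^- \ge (1-\varepsilon)N$, and (3) $r_i^+ \le r_{i-1}^- + 2\varepsilon N$ for $2 \le i \le m$. Then S is ε-approximate; that is, for each $\Phi \in (0,1]$ there is a tuple $(v_i, r_i^-, r_i^+)$ in S such that $\lceil \Phi N \rceil - \varepsilon N \le r_i^- \le r_i^+ \le \lceil \Phi N \rceil + \varepsilon N$.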
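The conditions of the theorem are cheap to test on a concrete sketch. In the code below the third condition is taken to be $r_i^+ \le r_{i-1}^- + 2\varepsilon N$, following the LLXY04 paper; treat that form as an assumption. The example sketch from slide 16 passes with ε = 0.25 and N = 16, where the second tuple meets the third condition with equality (9 = 1 + 8).

```python
def satisfies_theorem_conditions(sketch, n, eps):
    """Check the three sufficient conditions for a sketch to be eps-approximate."""
    first_hi = sketch[0][2]
    last_lo = sketch[-1][1]
    if first_hi > eps * n + 1:        # condition 1: r_1^+ <= eps*N + 1
        return False
    if last_lo < (1 - eps) * n:       # condition 2: r_m^- >= (1 - eps)*N
        return False
    for (_, prev_lo, _), (_, _, hi) in zip(sketch, sketch[1:]):
        if hi > prev_lo + 2 * eps * n:  # condition 3 (assumed form)
            return False
    return True

sketch = [(1, 1, 1), (2, 2, 9), (3, 3, 10), (5, 4, 10), (10, 10, 10), (12, 16, 16)]
print(satisfies_theorem_conditions(sketch, n=16, eps=0.25))
print(satisfies_theorem_conditions(sketch, n=16, eps=0.125))
```

With the tighter ε = 0.125 the gap between (1,1,1) and (2,2,9) is too wide, so the same sketch no longer meets the conditions.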
18
18 Query. Same stream and quantile sketch of 6 tuples, with ε = 0.25: {(1,1,1), (2,2,9), (3,3,10), (5,4,10), (10,10,10), (12,16,16)}. The 0.5-quantile asks for the $v_i$ of rank r = 8, with εN = 4. Find the first tuple satisfying $r - \varepsilon N \le r_i^- \le r_i^+ \le r + \varepsilon N$, that is $4 \le r_i^- \le r_i^+ \le 12$, and return its $v_i$: (5,4,10) => return 5.
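The lookup can be sketched as a linear scan. The rule used below, $r - \varepsilon N \le r_i^- \le r_i^+ \le r + \varepsilon N$, is an assumed form that follows from the ε-approximate definition on slide 14. With the sketch as listed, whose fourth tuple is (5, 4, 10), the first qualifying tuple carries the value 5.

```python
import math

def query(sketch, n, phi, eps):
    """Return v_i of the first tuple whose rank bounds lie within eps*n of the target rank."""
    r = math.ceil(phi * n)
    for v, lo, hi in sketch:
        if r - eps * n <= lo and hi <= r + eps * n:
            return v
    return None  # cannot happen for a sketch that really is eps-approximate

sketch = [(1, 1, 1), (2, 2, 9), (3, 3, 10), (5, 4, 10), (10, 10, 10), (12, 16, 16)]
print(query(sketch, n=16, phi=0.5, eps=0.25))
```

The true rank of 5 in the data is 5, which lies inside the allowed interval [4, 12] around the target rank 8.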
19
19 One-Pass summary for sliding windows. Continuously divide the stream into buckets based on the arrival order of the data elements. The capacity of each bucket is $\lfloor \varepsilon N / 2 \rfloor$. For each bucket, an $\varepsilon/4$-approximate sketch is maintained continuously by the GK algorithm. Once a bucket is full, its $\varepsilon/4$-approximate sketch is compressed into an $\varepsilon/2$-approximate sketch. The oldest bucket is expired when the total number of elements reaches N + 1.
20
20 [Figure: the most recent N elements divided into buckets. The oldest bucket is expired, each full bucket holds a compressed $\varepsilon/2$-approximate sketch, and the current bucket is summarized with the GK summary technique.]
21
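21 Example: N = 8, ε = 1, bucket capacity $\lfloor \varepsilon N / 2 \rfloor$ = 4. Stream: 1, 2, 3, 4, 5, 6, 7, 8, 9. When the current bucket is full, its $\varepsilon/4$-approximate sketch is compressed into an $\varepsilon/2$-approximate sketch and a new current bucket is started. When 9 arrives, the oldest bucket (1 to 4) is expired.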
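The bucket bookkeeping, without the sketches themselves, can be simulated directly. The code below is a simplified illustration: each bucket stores its raw elements instead of a GK or compressed sketch, and the capacity ⌊εN/2⌋ is an assumption chosen to match the value 4 in this example. The class name `BucketedWindow` is hypothetical.

```python
import math
from collections import deque

class BucketedWindow:
    """Arrival-ordered buckets; the oldest bucket is dropped once N + 1 elements are held."""

    def __init__(self, window_size, eps):
        self.window_size = window_size
        self.capacity = max(1, math.floor(eps * window_size / 2))  # assumed capacity
        self.buckets = deque([[]])
        self.count = 0

    def insert(self, value):
        if len(self.buckets[-1]) == self.capacity:
            # A real implementation would have compressed the full bucket's sketch.
            self.buckets.append([])
        self.buckets[-1].append(value)
        self.count += 1
        if self.count > self.window_size:  # N + 1 elements: expire the oldest bucket
            expired = self.buckets.popleft()
            self.count -= len(expired)

w = BucketedWindow(window_size=8, eps=1)
for x in range(1, 10):
    w.insert(x)
print(w.capacity, list(w.buckets))
```

After 9 arrives only the buckets holding 5 to 8 and 9 remain, although 2, 3 and 4 are still inside the window of the 8 most recent elements.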
22
22 Compress. Compress an $\varepsilon/2$-approximate sketch into an ε-approximate sketch; the compressed sketch takes O(1/ε) space. Why not maintain the ε-approximate sketch in each bucket directly? Because the compress technique keeps about half the number of tuples that the directly maintained ε-approximate sketch would hold.
23
23 Merge. There are h data streams $D_i$, and each $D_i$ has $N_i$ data elements. Suppose each $S_i$ is an ε-approximate sketch of $D_i$. $S_{merge}$ is a sketch of $D_1 \cup \dots \cup D_h$ with $|S_{merge}| = \sum_{i=1}^{h} |S_i|$, and $S_{merge}$ is also an ε-approximate sketch.
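One common way to merge rank-bounded sketches is to add, for each tuple, the rank bounds contributed by the other sketches. The code below is a sketch of that generic technique under stated assumptions; it is not claimed to be the exact procedure of LLXY04. The function `merge_sketches` and its `sizes` argument (the number of elements $N_i$ behind each sketch) are illustrative names.

```python
from bisect import bisect_right

def merge_sketches(sketches, sizes):
    """Merge sketches of disjoint streams; output holds sum(len(S_i)) tuples."""
    merged = []
    for i, sketch in enumerate(sketches):
        for v, lo, hi in sketch:
            for j, other in enumerate(sketches):
                if j == i:
                    continue
                values = [t[0] for t in other]
                k = bisect_right(values, v)   # other[:k] have value <= v
                if k > 0:
                    lo += other[k - 1][1]     # at least r^- of the largest value <= v
                if k < len(other):
                    hi += other[k][2] - 1     # fewer than r^+ of the next larger value
                else:
                    hi += sizes[j]            # possibly every element of stream j
            merged.append((v, lo, hi))
    merged.sort()
    return merged

# Two exact sketches (r^- = r^+ = rank) over the streams {1, 3, 5} and {2, 4, 6}.
s1 = [(1, 1, 1), (3, 2, 2), (5, 3, 3)]
s2 = [(2, 1, 1), (4, 2, 2), (6, 3, 3)]
print(merge_sketches([s1, s2], sizes=[3, 3]))
```

With exact input sketches the merged bounds collapse to the true ranks in the union, which is a quick sanity check on the rule.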
24
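24 Another Problem. Example with ε = 1 and N = 8: the bucket holding 1, 2, 3, 4 has expired, the bucket holding 5, 6, 7, 8 is full, and 9 is in the current bucket. The first tuple in $S_{merge}$ has value 5 with rank bounds counted only over the unexpired buckets, but the rank of 5 in the window is 4. $S_{merge}$ alone is therefore not a valid approximate sketch of the window.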
25
25 Lift. To solve the previous problem, a “lift” operation raises the value of $r_i^+$ by $\lfloor \varepsilon N / 2 \rfloor$ for each tuple i. If S is an $\varepsilon/2$-approximate sketch, then $S_{lift}$ is an ε-approximate sketch. That is why the bucket size is $\lfloor \varepsilon N / 2 \rfloor$ and an $\varepsilon/2$-approximate sketch is maintained as each bucket's summary.
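The effect of the lift can be seen on the example from slide 24. The code below assumes the lift amount is the bucket capacity ⌊εN/2⌋ (4 when ε = 1 and N = 8), since at most one bucket's worth of window elements can be missing from the merged sketch. The input tuples are exact ranks computed over the unexpired buckets only, which is a simplification.

```python
import math

def lift(sketch, window_size, eps):
    """Raise every upper rank bound by the assumed lift amount floor(eps * N / 2)."""
    amount = math.floor(eps * window_size / 2)
    return [(v, lo, hi + amount) for v, lo, hi in sketch]

# Rank bounds counted over the unexpired buckets [5, 6, 7, 8] and [9] only.
merged = [(5, 1, 1), (6, 2, 2), (7, 3, 3), (8, 4, 4), (9, 5, 5)]
lifted = lift(merged, window_size=8, eps=1)

# True ranks inside the window 2..9; the elements 2, 3, 4 left with their bucket.
window = [2, 3, 4, 5, 6, 7, 8, 9]
for v, lo, hi in lifted:
    true_rank = window.index(v) + 1
    print(v, (lo, hi), true_rank, lo <= true_rank <= hi)
```

Before the lift, the tuple for 5 claims rank 1 while its rank in the window is 4; after the lift its interval [1, 5] covers the true rank.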
26
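26 Query. Step 1: merge the local sketches of all buckets, including the current bucket, into $S_{merge}$. Step 2: lift $S_{merge}$ to obtain $S_{lift}$. Step 3: for a given rank $r = \lceil \Phi N \rceil$, find the first tuple in $S_{lift}$ such that $r - \varepsilon N \le r_i^- \le r_i^+ \le r + \varepsilon N$, and return $v_i$.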
27
27 Space: Sliding Window, LLXY '04. $O(1/\varepsilon^2 + \log(\varepsilon^2 N)/\varepsilon)$. Reason: the sketch in each bucket produced by the GK algorithm takes $O(\log(\varepsilon^2 N)/\varepsilon)$ space, which is compressed to $O(1/\varepsilon)$ once the bucket is full, and there are $O(1/\varepsilon)$ buckets.
28
28 Performance Studies. Sliding window model: compared with the ARS-algorithm on average errors, space consumption, and data distributions. n-of-N model: compared with the heuristic algorithm nN’ on average errors, space consumption, and query performance.
29
29 Conclusion. The work presented is among the first attempts to develop space-efficient, one-pass, deterministic quantile summary algorithms with performance guarantees under the sliding window model of data streams.
30
30 Approximating quantiles using sliding window model: Manku's. Approach: GK algorithm + concept of aging. Improves over [LLXY '04]. [LLXY '04] space: $O(1/\varepsilon^2 + \log(\varepsilon^2 N)/\varepsilon)$. Manku's space: $O((1/\varepsilon)\log(1/\varepsilon)\log N)$. At any point in time, ε-approximate quantiles for any $\Phi \in (0, 1]$ over the current contents of the sliding window can be computed from the maintained state. The goal is to minimize the space required for maintaining that state.
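To get a feel for the two bounds, the expressions can be evaluated with all hidden constants set to 1. This is only a rough illustration under two assumptions: the second bound is read as (1/ε)·log(1/ε)·log N, and logarithms are taken base 2. The function names are illustrative.

```python
import math

def llxy_space(eps, n):
    """1/eps^2 + log(eps^2 * N)/eps, with constants ignored (needs eps^2 * N >= 1)."""
    return 1 / eps**2 + math.log2(eps**2 * n) / eps

def arasu_manku_space(eps, n):
    """(1/eps) * log(1/eps) * log(N), with constants ignored (assumed grouping)."""
    return (1 / eps) * math.log2(1 / eps) * math.log2(n)

eps, n = 0.001, 10**9
print(round(llxy_space(eps, n)), round(arasu_manku_space(eps, n)))
```

The 1/ε² term dominates the first bound as ε shrinks, which is where the improvement comes from. For moderate ε the comparison depends on constants that this calculation ignores.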
31
31-38 Overview. [Figure sequence over a window of N elements; only the slide title and the label N survive.]
39
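39 Details. [Figure: the window of N elements is covered by blocks at levels 0, 1, 2, ..., log(1/ε), with per-level precisions $\varepsilon_0, \varepsilon_1, \varepsilon_2, \dots$; the smallest blocks have size εN/4 = O(εN).]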
40
40 Space Requirement: $O((1/\varepsilon)\log(1/\varepsilon)\log N)$. Space required for level-ℓ blocks: (size of a quantile sketch, $1/\varepsilon_\ell$) × (number of “active” blocks, $N/N_\ell$). Space required for the GK algorithm: $(1/\varepsilon)\log(\varepsilon N)$. Total: $O((1/\varepsilon)\log(1/\varepsilon)\log N)$.
41
41 Conclusion. The work presented improves on the first method (LLXY '04) with respect to space. The paper also provides a randomized quantile-finding algorithm with a further improvement in space.
42
42 Any Questions?