1
Time-Decaying Sketches for Sensor Data Aggregation
Graham Cormode, AT&T Labs, Research
Srikanta Tirthapura, Dept. of Electrical and Computer Engineering, Iowa State University
Bojian Xu, Dept. of Electrical and Computer Engineering, Iowa State University
2
2/30 Mean of the Temperatures in the Last 30 Minutes
[Figure: sensors reporting timestamped temperature readings, e.g. 76F at 11:45, 73F at 11:40, 79F at 11:30, 70F at 11:22, 76F at 11:15.]
3
3/30 Sketch
[Figure: the timestamped temperature readings at each sensor, e.g. 76F at 11:45, 73F at 11:40, 79F at 11:30, are summarized by a sketch.]
4
4/30 Sketch Merging
[Figure: sketches are merged to produce the answer.]
5
5/30 General Time Decay
General decay function: [formula not preserved]
Time-decayed value of an element at time c is: [formula not preserved]
[Plot: decay function against age.]
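A small Python sketch of how a decay function might weight an observation. The formulas on this slide were not preserved, so the form v * f(c - t) is an assumption, chosen to be consistent with the (v, t, id) model and the f() notation on the later slides. The three decay functions and their parameters are illustrative and are not taken from the talk.

```python
import math

def sliding_window(window):
    """Weight 1 while 0 <= age < window, else 0 (e.g. 'the last 30 minutes')."""
    return lambda age: 1.0 if 0 <= age < window else 0.0

def exponential(rate):
    """Weight exp(-rate * age)."""
    return lambda age: math.exp(-rate * age)

def polynomial(power):
    """Weight (age + 1) ** -power."""
    return lambda age: (age + 1.0) ** (-power)

def decayed_value(v, t, c, f):
    """Assumed form: value v created at time t, observed at time c >= t."""
    return v * f(c - t)
```

With a 30-minute window, a reading of 76 taken 15 minutes ago keeps its full weight, while one taken 30 minutes ago contributes nothing.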
6
6/30 Formal Model of the Data (on One Sensor)
Data stream: e_0 = (v_0, t_0, id_0), e_1 = (v_1, t_1, id_1), …
– v: value
– t: timestamp of creation
– id: a unique id of the observation
User-defined time decay: [formula not preserved]
Asynchronous arrival: it is possible that t_i > t_j while i < j
Duplicates: id_i = id_j is possible
– Assume: if id_i = id_j, then v_i = v_j and t_i = t_j
7
7/30 Contribution
First mergeable sketch that combines the following:
Space logarithmic in the universe size
Guaranteed accuracy
Properties: any time decay model, asynchronous arrival, duplicate insensitive
Aggregates: sum, quantile, frequent elements
Data aggregation under any multi-path routing protocol
8
8/30 Related Work
Columns: Any time decay model | Asynchronous arrival | Duplicate insensitive | Sum | Quantile | Frequent elements
[1] √√
[2] √√
[3] √√
[4] √√
Our work: √√√√√√
[The column positions of the check marks for [1] to [4] are not recoverable from this transcript.]
1. S. Nath, P. B. Gibbons, S. Seshan and Z. R. Anderson, “Synopsis diffusion for robust aggregation in sensor networks”, SenSys 2004
2. J. Considine, F. Li, G. Kollios and J. Byers, “Approximate Aggregation Techniques for Sensor Databases”, ICDE 2004
3. E. Cohen and M. Strauss, “Maintaining time-decaying stream aggregates”, PODS 2003; Journal of Algorithms, 2006
4. S. Tirthapura, B. Xu and C. Busch, “Sketching Asynchronous Streams Over Sliding Windows”, PODC 2006
9
9/30 Outline
Problem: time-decayed sum of distinct elements over an asynchronous stream.
Focus on the integral decay model: [expression not preserved] is always an integer.
10
10/30 Estimate of the Sum (on One Sensor)
Given:
– Stream: R = (v_0, t_0, id_0), …, (v_n, t_n, id_n), …
– User-defined decay function: f()
Maintain: [formula not preserved]
– c: current time
– D: set of distinct elements in R
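For reference, an exact (non-sketch) baseline for the quantity being estimated can be written in a few lines of Python. The maintained formula is not preserved on this slide, so the sum of v * f(c - t) over the distinct elements D is an assumption based on the definitions of c, D and f() given here. The function name and the window length in the usage note are hypothetical. The baseline stores every distinct id, which is the cost a sketch is meant to avoid.

```python
def exact_decayed_sum(stream, c, f):
    """Sum v * f(c - t) over distinct ids; stream holds (v, t, id) tuples.

    Arrival order does not matter, and repeated ids are counted once.
    """
    seen = {}
    for v, t, ident in stream:
        seen.setdefault(ident, (v, t))  # duplicates carry the same (v, t)
    return sum(v * f(c - t) for v, t in seen.values() if t <= c)
```

For the five observations (22, 16, 6), (32, 17, 9), (7, 16, 11), (21, 18, 8) and the duplicate (32, 17, 9), a window decay of length 10 queried at c = 20 gives 22 + 32 + 7 + 21 = 82.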
11
11/30 Estimate of the Sum (cont’d)
Linear space lower bound on duplicate-insensitive sum (Alon, Matias and Szegedy, STOC 1996). The bound holds for:
– any deterministic approximate algorithm
– any randomized algorithm giving an exact result
Goal: continuously maintain an (ε, δ)-estimate of: [formula not preserved]
– User inputs: [symbols not preserved]
– D: set of distinct elements in R
An (ε, δ)-estimate for X is a random variable Y such that Pr[|Y − X| > εX] < δ.
12
12/30 Algorithm for Sum (High Level Picture)
[Figure: the sum of v_1 = 4 and v_2 = 8 is reduced to a count by random sampling. The integers representing each value are selected (√) at SampleRate = p.]
Count the number of selected integers
Multiply by 1/p
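The sum-to-count reduction on this slide can be sketched in Python as follows. This is a minimal illustration, assuming each value v is expanded into v unit items and each item is kept independently with probability p. The function name and the use of Python's random module are choices made for this example, not details from the talk.

```python
import random

def estimate_sum(values, p, rng):
    """Expand each value v into v unit items, keep each with probability p,
    count the kept items and scale the count by 1/p."""
    selected = 0
    for v in values:
        for _ in range(v):
            if rng.random() < p:
                selected += 1
    return selected / p
```

With p = 1 every item is kept and the estimate for the values 4 and 8 is exactly 12. With p < 1 a single run is noisy, but the estimate is unbiased, so the average over many runs approaches 12.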
13
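13/30 Duplicate Detection
[Figure: Copy 1 and Copy 2 of an element x pass through the same hash function, which drives the random sampling, so both copies receive the same selection decision (√).]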
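One way to picture hash-driven sampling is the Python sketch below. It assumes that the sampling decision for the k-th unit item of an observation is a deterministic function of (id, k), so a second copy of the same observation reproduces exactly the same decisions. SHA-256 and the helper names are stand-ins chosen for this example. The talk does not say which hash family is used.

```python
import hashlib

def unit_hash(key):
    """Map a key deterministically to a float in [0, 1)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def selected(ident, k, p):
    """The k-th unit item of observation `ident` is sampled iff its hash < p."""
    return unit_hash((ident, k)) < p

def count_selected(observations, p):
    """Count sampled unit items; observations are (v, t, id) tuples."""
    chosen = {(ident, k)
              for v, _t, ident in observations
              for k in range(v) if selected(ident, k, p)}
    return len(chosen)
```

Because both copies select the same (id, k) pairs, feeding an observation twice leaves the count unchanged.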
14
14/30 Intuition - I
[Figure: an element (v, t, id) is sampled at a given sample rate.]
By the Chebyshev inequality, for an ε-approximation of the count with constant probability: [condition not preserved]
15
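15/30 Intuition - II
[Figure: a timeline marked at t and at a later time t + [symbol not preserved]. The sample rate to use is left as an open question.]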
16
16/30 Maintain Multiple Samples
[Figure: samples kept at sample rates p_0 = 1, p_1 = 1/2, p_2 = 1/4, with SampleRate p_j in general. The sample size is marked "SIZE ??".]
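A hedged Python sketch of keeping several samples at rates 1, 1/2, 1/4, and so on. It assumes that one hash value per item decides membership at every level (an item is at level j when its hash is below 2^-j), which makes the samples nested. That nesting, the SHA-256 hash and the function names are assumptions made for this illustration.

```python
import hashlib

def unit_hash(key):
    """Map a key deterministically to a float in [0, 1)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def max_level(key, num_levels):
    """Largest j < num_levels with unit_hash(key) < 2**-j (level 0 always holds)."""
    h = unit_hash(key)
    j = 0
    while j + 1 < num_levels and h < 2.0 ** -(j + 1):
        j += 1
    return j

def build_samples(keys, num_levels):
    """Return one set per level; level j samples at rate 2**-j."""
    samples = [set() for _ in range(num_levels)]
    for key in keys:
        for j in range(max_level(key, num_levels) + 1):
            samples[j].add(key)
    return samples
```

Level 0 holds every item, and each further level holds roughly half of the level before it.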
17
17/30 Faster Sampling
RangeSample (Pavan & Tirthapura, SICOMP 2007)
– Efficiently compute the number of selected integers
[Figure: samples at sample rates p_0 = 1, p_1 = 1/2, p_2 = 1/4, with SampleRate p_j in general; selected integers are marked √. The sample size is marked "SIZE ??".]
18
18/30 Expiry Time
[Figure: the element e = (v, t, id) shown at time t and at a later time, with its selected integers marked √, up to its expiry time.]
Binary search over [t, t_max] using RangeSample
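The binary search for an expiry time might look like the Python sketch below. Several pieces are assumptions: the decayed weight is taken as floor(v * f(c - t)) with f non-increasing, the expiry time is taken to be the first integer time at which none of the element's integers is selected, and a naive loop stands in for RangeSample. The hash, the linear decay function and all names are hypothetical.

```python
import hashlib
import math

def unit_hash(key):
    """Map a key deterministically to a float in [0, 1)."""
    digest = hashlib.sha256(repr(key).encode()).digest()
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

def count_selected(ident, w, p):
    """Naive stand-in for RangeSample: how many of the integers 0..w-1
    belonging to `ident` hash below the sample rate p."""
    return sum(1 for k in range(w) if unit_hash((ident, k)) < p)

def expiry_time(v, t, ident, p, f, t_max):
    """First integer time in [t, t_max + 1] at which the element has no
    selected integer left. Relies on f being non-increasing in the age."""
    def in_sample(c):
        w = math.floor(v * f(c - t))
        return count_selected(ident, w, p) > 0

    lo, hi = t, t_max + 1
    while lo < hi:
        mid = (lo + hi) // 2
        if in_sample(mid):
            lo = mid + 1   # still sampled at mid, so expiry is later
        else:
            hi = mid       # already gone at mid
    return lo

def linear_decay(age):
    """Illustrative decay: weight falls linearly to 0 at age 6."""
    return max(0.0, 1.0 - age / 6.0)
```

At sample rate 1 the element (22, 16, 6) under this illustrative decay expires at time 22, when its weight first reaches 0. At lower sample rates it can only expire earlier.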
19
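19/30 Sketch Structure
[Figure: the sketch holds one sample per level. Level 0, Level 1 and Level 2 use sample rates p = 1, 1/2 and 1/4, followed by 1/8. Each level j carries a value t_j (t_0, t_1, t_2).]
t_j: largest expiry time of all the elements discarded from the sample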
20
20/30 Current time: 17
Data (v, t, id): e_1 = (22, 16, 6)
Expiry times at levels 0, 1, 2: e_1: 22, 19, 17
Level 0 (p = 1): (e_1, 22)
Level 1 (p = 1/2): (e_1, 19)
Level 2 (p = 1/4): no entries shown
21
21/30 Current time: 18 (previously 17)
Data (v, t, id): e_1 = (22, 16, 6), e_2 = (32, 17, 9), e_3 = (7, 16, 11)
Expiry times at levels 0, 1, 2: e_1: 22, 19, 17; e_2: 23, 21, 18; e_3: 21, 16, 16
Level 0 (p = 1): (e_3, 21) (e_2, 23) (e_1, 22)
Level 1 (p = 1/2): (e_2, 21) (e_1, 19)
Level 2 (p = 1/4): no entries shown
22
22/30 Current time: 20 (previously 17, 18)
Data (v, t, id): e_1 = (22, 16, 6), e_2 = (32, 17, 9), e_3 = (7, 16, 11), e_4 = (21, 18, 8)
Expiry times at levels 0, 1, 2: e_1: 22, 19, 17; e_2: 23, 21, 18; e_3: 21, 16, 16; e_4: 23, 21, 20
Level 0 (p = 1): (e_4, 23) (e_2, 23) (e_1, 22), with (e_3, 21) pushed out
Level 1 (p = 1/2): (e_4, 21) (e_2, 21) (e_1, 19)
Level 2 (p = 1/4): no entries shown
Discard the element with smallest expiry time
23
23/30 Current time: 20 (previously 17, 18)
Data (v, t, id): e_1 = (22, 16, 6), e_2 = (32, 17, 9), e_3 = (7, 16, 11), e_4 = (21, 18, 8)
Expiry times at levels 0, 1, 2: e_1: 22, 19, 17; e_2: 23, 21, 18; e_3: 21, 16, 16; e_4: 23, 21, 20
Level 0 (p = 1): (e_4, 23) (e_2, 23) (e_1, 22), t_0 = 21
Level 1 (p = 1/2): (e_4, 21) (e_2, 21) (e_1, 19)
Level 2 (p = 1/4): no entries shown
24
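24/30 Current time: 20 (previously 17, 18)
Data (v, t, id): e_1 = (22, 16, 6), e_2 = (32, 17, 9), e_3 = (7, 16, 11), e_4 = (21, 18, 8), e_5 = (32, 17, 9)
e_5 is a duplicate
Expiry times at levels 0, 1, 2: e_1: 22, 19, 17; e_2: 23, 21, 18; e_3: 21, 16, 16; e_4: 23, 21, 20; e_5: level 2 only shown, 18
Level 0 (p = 1): (e_4, 23) (e_2, 23) (e_1, 22), t_0 = 21
Level 1 (p = 1/2): (e_4, 21) (e_2, 21) (e_1, 19)
Level 2 (p = 1/4): no entries shown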
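The per-level bookkeeping in this worked example can be mimicked with a short Python class. It is a sketch under stated assumptions: the capacity of 3 follows the "keeps 3 distinct items" remark on the merging slide, an element whose expiry time is not after the current time is skipped (a convention inferred from the example, where such elements never appear in a sample), and the class and attribute names are hypothetical. Expiry times are passed in directly instead of being computed.

```python
class LevelSample:
    """One level of the sketch: at most `capacity` entries (id -> expiry)."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.entries = {}       # id -> expiry time at this level
        self.t_discard = None   # largest expiry time among discarded entries

    def insert(self, ident, expiry, now):
        if expiry <= now or ident in self.entries:
            return  # already expired at this level, or a duplicate id
        self.entries[ident] = expiry
        if len(self.entries) > self.capacity:
            # Discard the element with the smallest expiry time.
            victim = min(self.entries, key=self.entries.get)
            dropped = self.entries.pop(victim)
            if self.t_discard is None or dropped > self.t_discard:
                self.t_discard = dropped
```

Feeding level 0 the expiry times from the example (ids 6, 9, 11, 8, then the duplicate id 9) leaves ids 6, 9 and 8 in the sample and records 21 as the largest discarded expiry time, matching t_0 = 21 on the slide.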
25
25/30 Answer a Query for the Decayed Sum
Current time = 20
Level 0 (p = 1): (e_4, 23) (e_2, 23) (e_1, 22), t_0 = 21
Level 1 (p = 1/2): (e_4, 21) (e_2, 21) (e_1, 19)
Level 2 (p = 1/4): no entries shown
Level used to answer the query: elements e_2 and e_4 are marked (√)
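A hedged reading of how the query level might be chosen, in Python. The rule used here is an inference from this example and not a statement of the authors' algorithm: a level is usable once every element it has discarded has expired (its t value is unset or not after the current time), the first usable level is taken, and an entry counts as live when its expiry time is after the current time. The function name and data layout are hypothetical. Turning the live entries into an estimate would additionally require counting their selected integers and scaling by 1/p, as on the high-level slide, which this sketch omits.

```python
def pick_level(levels, now):
    """levels: list of (entries, t_discard) pairs, level 0 first, where
    entries maps id -> expiry time. Returns (level index, sorted live ids)
    for the first usable level, or None if no level is usable."""
    for j, (entries, t_discard) in enumerate(levels):
        if t_discard is None or t_discard <= now:
            live = sorted(i for i, exp in entries.items() if exp > now)
            return j, live
    return None
```

On the data of this slide, level 0 is skipped at time 20 because t_0 = 21, and level 1 answers the query with e_2 and e_4 (e_1 expired at 19).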
26
26/30 Over the Whole Sensor Network
Each sample keeps 3 distinct items with largest expiry time.
Sketch 1: (e_3, 13) (e_2, 9) (e_1, 6)
Sketch 2: (e_3, 13) (e_5, 10) (e_4, 6)
Result of merging sketches 1 and 2 (union): (e_3, 13) (e_5, 10) (e_2, 9)
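The merge step shown here can be reproduced with a few lines of Python: take the union of the two samples by id and keep the 3 entries with the largest expiry times. Tracking the largest expiry time among the dropped entries follows the definition of t_j on the sketch-structure slide, but applying it during a merge is an assumption, as are the function name and the tie-breaking by id.

```python
def merge_samples(a, b, capacity=3):
    """a, b: (entries, t_discard) pairs, entries maps id -> expiry time.
    Returns the merged (entries, t_discard) pair."""
    (ea, ta), (eb, tb) = a, b
    union = dict(ea)
    for ident, expiry in eb.items():
        union[ident] = max(expiry, union.get(ident, expiry))
    # Largest expiry first; ties broken by id to keep the result deterministic.
    ranked = sorted(union.items(), key=lambda kv: (-kv[1], kv[0]))
    kept, dropped = ranked[:capacity], ranked[capacity:]
    t_candidates = [t for t in (ta, tb) if t is not None]
    t_candidates += [expiry for _, expiry in dropped]
    return dict(kept), (max(t_candidates) if t_candidates else None)
```

Merging the two sketches on this slide yields (e_3, 13), (e_5, 10), (e_2, 9), with e_1 and e_4 (both expiring at 6) dropped. Because the shared element e_3 is keyed by its id, it appears once in the result.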
27
27/30 Algorithm Complexity
Space complexity: [bound not preserved]
Time complexity:
– Expected time for processing one item: [bound not preserved]
– Time for answering a query: [bound not preserved]
– Time for merging two sketches: [bound not preserved]
28
28/30 Conclusion
First sketch that combines the following:
Space logarithmic in the universe size
Guaranteed accuracy
Properties: any time decay model, asynchronous arrival, duplicate insensitive
Aggregates: sum, quantile, frequent elements
Data aggregation under any multi-path routing protocol
29
29/30 Ongoing and Future Work
Implementation
– Observed results better than theoretical predictions
Better duplicate-insensitive sketches for specific decay models?
Other aggregates, such as variance and clustering?
30
30/30 THANKS