Published by Cheyanne Seckler. Modified over 10 years ago.
1
Probabilistic Skyline Operator over Sliding Windows
Wenjie Zhang, University of New South Wales & NICTA, Australia
Joint work: Xuemin Lin, Ying Zhang, Wei Wang (UNSW & NICTA); Jeffrey Xu Yu (CUHK)
2
Outline
  Background
  Framework
  Algorithms
  Experiment
  Conclusion
3
Background
Elements arrive continuously, each with an occurrence probability.
Problem: how to continuously compute skylines over a sliding window of size N (elements)?
[Figure: elements 1 to 6 with labelled probabilities 0.1, 0.4, 0.1, 0.8, 0.5, 1; sliding window: N = 5]
4
Background
Multi-criteria decision making regarding uncertain data:
  Online auction
  Financial market
  …
5
Related work
Probabilistic skyline computation
  Probabilistic skyline (VLDB07)
  Probabilistic reverse skyline (SIGMOD08)
Uncertain stream processing
  Probabilistic aggregates and sketches over uncertain streams (SIGMOD07, SODA07, PODS07)
  Frequent items on uncertain streams (SIGMOD08)
  Top-k queries over uncertain sliding window (VLDB08)
  …
6
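Models and Problem Definition
Model: DS is a stream of elements; each element a lies in a d-dimensional space and has an occurrence probability P(a) in (0, 1].
The skyline probability of an element a is its own occurrence probability times the probability that no element a' dominating it (a' \prec a) among the most recent N elements DS_N occurs:
  P_{sky}(a) = P(a) \cdot \prod_{a' \in DS_N,\ a' \prec a} (1 - P(a'))
Problem definition: retrieve, from the most recent N elements, those whose skyline probability is no less than a given threshold q.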
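The definition above can be made concrete with a brute-force sketch. It assumes that the skyline probability is the element's own probability times the probability that none of its dominators occurs, and that smaller values are better in every dimension; the slide fixes neither the formula text nor the preference direction. The names Element, dominates, skyline_probability and probabilistic_skyline are illustrative only.

```python
from math import prod
from typing import NamedTuple, Sequence


class Element(NamedTuple):
    point: tuple   # coordinates in the d-dimensional space
    prob: float    # occurrence probability P(a), in (0, 1]


def dominates(a: Element, b: Element) -> bool:
    """a dominates b: no worse in every dimension, strictly better in one.
    Assumption: smaller coordinate values are preferred."""
    return (all(x <= y for x, y in zip(a.point, b.point))
            and any(x < y for x, y in zip(a.point, b.point)))


def skyline_probability(a: Element, window: Sequence[Element]) -> float:
    """P(a) times the probability that no dominating element in the window occurs."""
    return a.prob * prod(1.0 - o.prob for o in window if dominates(o, a))


def probabilistic_skyline(stream: Sequence[Element], n: int, q: float) -> list:
    """Elements among the most recent n whose skyline probability is at least q."""
    window = list(stream)[-n:]
    return [a for a in window if skyline_probability(a, window) >= q]
```

This recomputes everything from the full window on each call, which costs O(N^2) dominance tests; it is meant only as a reference against which the incremental techniques in the talk can be compared.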
7
Challenges and Contributions
Space efficiency
  Contribution: space reduction from O(N) to O(ln^{d-1} N)
Time efficiency
  Contribution: R-tree based efficient incremental algorithms
8
Outline
  Background and Preliminaries
  Framework
  Algorithms
  Experiment
  Conclusion
9
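Framework: what to keep?
Window size N: 5; probability threshold q: 0.5
[Figure: elements 1 to 5 with labelled probabilities 0.1, 0.4, 0.1, 0.8]
P_old(2) = 1 - P(1)
P_new(2) = (1 - P(3)) * (1 - P(4))
P_old(2) covers the dominating element that arrived before element 2; P_new(2) covers the dominating elements that arrived after it. The newer dominators stay in the window for as long as element 2 does, so P_new(2) can only shrink.
Since P_new(2) < q, element 2 will never become a skyline point in the window.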
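A minimal sketch of the split into P_old and P_new follows. It assumes the window is a list of (point, probability) pairs in arrival order and that smaller coordinates are preferred; the function names and the sample coordinates are hypothetical, not taken from the slide's figure.

```python
from math import prod


def dominates(p: tuple, r: tuple) -> bool:
    """p dominates r (assumption: smaller is better in every dimension)."""
    return all(x <= y for x, y in zip(p, r)) and any(x < y for x, y in zip(p, r))


def split_probabilities(window: list, i: int) -> tuple:
    """Return (P_old, P_new) for window[i].
    P_old multiplies (1 - P) over dominators that arrived earlier,
    P_new over dominators that arrived later."""
    point, _ = window[i]
    p_old = prod(1.0 - pr for pt, pr in window[:i] if dominates(pt, point))
    p_new = prod(1.0 - pr for pt, pr in window[i + 1:] if dominates(pt, point))
    return p_old, p_new


def keep_candidates(window: list, q: float) -> list:
    """Drop every element whose P_new is already below q."""
    return [window[i] for i in range(len(window))
            if split_probabilities(window, i)[1] >= q]


# Hypothetical data: the second element has one older and two newer dominators.
window = [((1.0, 1.0), 0.5), ((3.0, 3.0), 0.9), ((2.0, 2.0), 0.4), ((2.5, 1.5), 0.3)]
p_old, p_new = split_probabilities(window, 1)   # p_new is 0.6 * 0.7, below q = 0.5
```

With q = 0.5 the second element is dropped even though its own probability is 0.9, because its newer dominators will outlive it in the window.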
10
Framework: what to keep?
Candidate set S_{N,q}: the elements a in the window with P_new(a) ≥ q.
Correctness:
  (1) no missing skyline points
  (2) no false hits when determining S_{N,q}
  (3) no false positives when determining skyline results
  (4) no false negatives when determining skyline results
A probability computed from S_{N,q} alone may not be accurate, but it still satisfies the threshold requirement.
11
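Framework
Space required for S_{N,q}: S_{N,q} is the minimum information that must be maintained to get a correct answer.
Window size N: 4; probability threshold q: 0.5
[Figure: elements 1 to 4 with labelled probabilities 0.3, 0.8, 0.4, 0.9]
While its two older dominators are in the window: P_sky(3) = 0.9 * (1 - 0.4) * (1 - 0.3) = 0.378 < q
After elements 1 and 2 expire: P_sky(3) = 0.9 > q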
12
Space of Candidate Set
Theorem: in the average case under uniform distributions, the candidate set requires poly-logarithmic space, O(f(q) ln^{d-1} N).
13
Outline
  Background and Preliminaries
  Framework
  Algorithms
  Experiment
  Conclusion
14
Algorithms
We maintain two R-trees:
  R1: SKY_{N,q} (skylines)
  R2: S_{N,q} - SKY_{N,q} (candidates minus skylines)
15
Algorithms
Elements (occurrence probability): 1 (.1), 2 (.1), 3 (.4), 4 (.1), 5 (.8), 6 (.8), 7 (.6), 8 (.2), 9 (.5), 10 (.2), 11 (.6), 12 (.1), 13 (.1)
Window size N: 13; probability threshold q: 0.2
[Figure legend: not in S_{N,q}; R1: SKY_{N,q}; R2: S_{N,q} - SKY_{N,q}]
16
Algorithms
New element arrives
  Check P_sky & P_new on R1
  Check P_new on R2
  Handle elements with P_new < q
Old element expires
  Update P_old
  Check P_sky on R2
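The two events above can be sketched without R-trees as a flat list of candidates. This is a naive illustration, not the talk's R-tree algorithm: it stores P_new per candidate, recomputes P_old from the retained candidates on demand (relying on the talk's statement that probabilities based on the candidate set still satisfy the threshold requirement), and assumes that smaller coordinates are preferred. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Candidate:
    seq: int            # arrival position in the stream
    point: tuple
    prob: float
    p_new: float = 1.0  # product of (1 - P) over newer dominators


def dominates(p: tuple, r: tuple) -> bool:
    """Assumption: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(p, r)) and any(x < y for x, y in zip(p, r))


class NaiveWindow:
    def __init__(self, n: int, q: float):
        self.n, self.q, self.seq = n, q, 0
        self.candidates = []   # kept in arrival order

    def arrive(self, point: tuple, prob: float) -> None:
        self.seq += 1
        # Old element expires: keep only the most recent n arrivals.
        self.candidates = [c for c in self.candidates if c.seq > self.seq - self.n]
        # New element arrives: it lowers P_new of every candidate it dominates.
        for c in self.candidates:
            if dominates(point, c.point):
                c.p_new *= 1.0 - prob
        # Elements with P_new < q can never qualify again, so drop them.
        self.candidates = [c for c in self.candidates if c.p_new >= self.q]
        self.candidates.append(Candidate(self.seq, point, prob))

    def skyline(self) -> list:
        result = []
        for i, c in enumerate(self.candidates):
            # P_old restricted to the retained older candidates.
            p_old = prod(1.0 - o.prob for o in self.candidates[:i]
                         if dominates(o.point, c.point))
            if c.prob * p_old * c.p_new >= self.q:
                result.append(c)
        return result
```

Each update scans the whole candidate list; the R-trees in the talk exist to avoid exactly this scan by handling whole groups of elements at once.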
17
Algorithms: new element arrives
Delete an entry:
Elements: 2 (.1), 3 (.4), 4 (.1), 5 (.8), 6 (.8), 7 (.6), 8 (.2), 9 (.5), 10 (.2), 11 (.6), 12 (.1), 13 (.1); new element: 14 (0.8)
R1: SKY_{N,q}; R2: S_{N,q} - SKY_{N,q}
Window size N: 13; probability threshold q: 0.2
Before update: P_new (min, max): (1, 1); P_sky (min, max): (0.8, 0.8); global P_new = 1 - 0.2
After update: global P_new *= 1 - 0.8, so P_new drops to 0.8 * 0.2 = 0.16 < q
Delete from R1.
18
Algorithms: new element arrives
Move an entry from R1 to R2:
Elements: 2 (.1), 3 (.4), 4 (.1), 7 (.6), 8 (.2), 9 (.5), 10 (.2), 11 (.6), 12 (.1), 13 (.1); new element: 14 (0.8)
R1: SKY_{N,q}; R2: S_{N,q} - SKY_{N,q}
Window size N: 13; probability threshold q: 0.2
Before update: P_new (min, max): (1, 1); P_sky (min, max): (0.24, 0.6); global P_new = 1
After update: global P_new *= 1 - 0.8
min P_new = 0.2 ≥ q; max P_sky = 0.12 < q
Move from R1 to R2.
19
Algorithms: new element arrives
Elements: 2 (.1), 3 (.4), 4 (.1), 7 (.6), 8 (.2), 9 (.5), 10 (.2), 11 (.6), 12 (.1), 13 (.1); new element: 14 (0.8)
R1: SKY_{N,q}; R2: S_{N,q} - SKY_{N,q}
Window size N: 13; probability threshold q: 0.2
Before update: P_new (min, max): (0.9, 1); global P_new = 1
After update: global P_new *= 1 - 0.8
min P_new < q; max P_new ≥ q
Drill down and delete element 2.
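The three outcomes illustrated so far (delete an entry, move it from R1 to R2, drill down) can be sketched as one decision on an entry's (min, max) ranges. This is an illustration under assumptions: the Entry class is hypothetical, both ranges are taken to be scaled by the entry's lazily recorded global P_new, the survival factor 1 - P(new) is passed in directly, and the case of an R1 entry whose P_sky range straddles q is handled by drilling down, which the slides do not show.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    p_new: tuple                # stored (min, max) of P_new below this entry
    p_sky: tuple                # stored (min, max) of P_sky below this entry
    global_p_new: float = 1.0   # lazy multiplier not yet pushed to children
    in_r1: bool = True          # True if the entry lives in R1 (skylines)


def apply_arrival(entry: Entry, survive: float, q: float) -> str:
    """Update an entry fully dominated by the new element.
    survive is 1 - P(new element); returns the action to take."""
    entry.global_p_new *= survive
    g = entry.global_p_new
    min_new, max_new = entry.p_new[0] * g, entry.p_new[1] * g
    min_sky, max_sky = entry.p_sky[0] * g, entry.p_sky[1] * g
    if max_new < q:
        return "delete"        # nothing below can stay a candidate
    if min_new < q:
        return "drill down"    # some elements below fall under q
    if entry.in_r1 and max_sky < q:
        return "move to R2"    # still candidates, no longer skylines
    if entry.in_r1 and min_sky < q:
        return "drill down"    # assumption: mixed entry is split further
    return "keep"


# Values from the slides: q = 0.2, new element 14 with probability 0.8.
first = apply_arrival(Entry((1.0, 1.0), (0.8, 0.8), global_p_new=0.8), 0.2, 0.2)
second = apply_arrival(Entry((1.0, 1.0), (0.24, 0.6)), 0.2, 0.2)
```

The comparisons at the threshold are exact floating-point comparisons, so a real implementation would likely need a tolerance or exact arithmetic when a product lands on q.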
20
Algorithms: new element arrives
Update P_old:
Elements: 2 (.1), 3 (.4), 4 (.1), 7 (.6), 8 (.2), 9 (.5), 10 (.2), 11 (.6), 12 (.1), 13 (.1); new element: 14 (0.8)
R1: SKY_{N,q}; R2: S_{N,q} - SKY_{N,q}
Window size N: 13; probability threshold q: 0.2
Update P_old of 12 & 13: global P_old /= (1 - 0.1)
21
Algorithms: new element arrives
Elements: 3 (.4), 4 (.1), 7 (.6), 8 (.2), 9 (.5), 10 (.2), 11 (.6), 12 (.1), 13 (.1); new element: 14 (0.8)
R1: SKY_{N,q}; R2: S_{N,q} - SKY_{N,q}
Window size N: 13; probability threshold q: 0.2
Insert the new element: P_new = 1; compute P_sky.
22
Algorithm: old element expires
Delete it from R1 or R2.
Update P_old of the remaining elements: record a global P_old on intermediate entries fully dominated by it.
Check P_sky after the update.
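At the level of single elements, the expiry step can be sketched as dividing the expired element's factor back out of P_old and reporting which elements now reach the threshold. This is a flat-list illustration with hypothetical names; it omits the lazy global P_old recorded on intermediate R-tree entries, assumes smaller coordinates are preferred, and assumes the expired element's probability is below 1 so that the division is defined.

```python
from dataclasses import dataclass


@dataclass
class Tracked:
    point: tuple
    prob: float
    p_old: float   # product of (1 - P) over older dominators
    p_new: float   # product of (1 - P) over newer dominators


def dominates(p: tuple, r: tuple) -> bool:
    """Assumption: smaller is better in every dimension."""
    return all(x <= y for x, y in zip(p, r)) and any(x < y for x, y in zip(p, r))


def expire(expired_point: tuple, expired_prob: float,
           remaining: list, q: float) -> list:
    """Remove the expired element's factor from P_old of everything it dominated.
    Returns the elements whose P_sky rises from below q to at least q."""
    if expired_prob >= 1.0:
        # Dividing by (1 - 1) is undefined; P_old would have to be recomputed.
        raise ValueError("expired element with probability 1 needs recomputation")
    promoted = []
    for e in remaining:
        if dominates(expired_point, e.point):
            before = e.prob * e.p_old * e.p_new
            e.p_old /= 1.0 - expired_prob
            after = e.prob * e.p_old * e.p_new
            if before < q <= after:
                promoted.append(e)   # would move from R2 to R1
    return promoted
```

Only elements dominated by the expired one are touched, which is why the talk can record the division once on any R-tree entry that the expired element fully dominates.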
23
Algorithms: old element expires
Elements: 3 (.4), 4 (.1), 7 (.6), 8 (.2), 9 (.5), 10 (.2), 11 (.6), 12 (.1), 13 (.1), 14 (0.8)
R1: SKY_{N,q}; R2: S_{N,q} - SKY_{N,q}
Window size N: 13; probability threshold q: 0.2
P_old(7) /= 1 - P(3)
global P_old /= 1 - P(4)
24
Algorithms: handling multiple thresholds
Continuous queries
  Users specify k probability thresholds q_1, …, q_k (q_i < q_{i-1}).
  Solution: instead of maintaining R1, we maintain R_1, …, R_k, each corresponding to a confidence value.
Ad-hoc queries
  Users issue a query: retrieve skylines with probability at least q' (q' ≥ q_k).
  Solution: find an R_i with q_i ≤ q' < q_{i-1}. Then all elements in {R_j : j < i-1} are results. We search R_{i-1} to output the qualifying skylines.
25
Experiment
Data sets:
  Real: stock transactions; 2-d; probabilities assigned randomly; size: 2 million
  Synthetic: spatial location (independent or anti-correlated); probability (uniform or normal); 2-d to 5-d; size: 2 million
Default values: p: 0.3; d: 3; N: 1M; spatial distribution: anti-correlated; probability: uniform
26
Experiment: space
Space usage is about 0.1% of the sliding window size for 2-d data; around 89% of the space is saved even for 5-d data.
27
Experiment: space
The size of S_{N,q} decreases as P_u increases, while the size of SKY_{N,q} increases with it.
28
Experiment: space
29
Experiment: time
30
Experiment: time
Maintenance time increases with the number of probability thresholds; query time decreases with it.
31
Conclusion
We characterize a candidate set of minimum size and propose time-efficient techniques.
We extend the framework to handle multiple thresholds.
32
Thanks!