Published by Anissa Mole. Modified over 9 years ago.
1
Advisor: Prof. 陳良弼
Presenter: 鄧雅文 (97753034)
2
◦ Introduction
◦ Related Work
◦ Problem Formulation
◦ Future Work
3
Top-k query on certain data
◦ Rank results according to a user-defined score
◦ Important for exploring large databases
◦ E.g., top-2 = {T1, T2}

TID  PID  Score
T1   A    100
T2   B    90
T3   C    80
T4   D    70
4
Uncertain database
◦ How to define top-k on uncertain data?
◦ Mutually exclusive rules, e.g., T1 ⊕ T4

TID  PID  Score  Pr.
T1   A    100    0.2
T2   B    90     0.9
T3   C    80     0.6
T4   A    70     0.8
…    …    …      …
5
C. C. Aggarwal and P. S. Yu. A Survey of Uncertain Data Algorithms and Applications. In TKDE, 2009.
◦ Causes of uncertainty: sensor networks, privacy, trajectory prediction, …
◦ The main areas of research on uncertain data:
  Modeling of uncertain data
  Uncertain data management: top-k query, range query, NN query, …
  Uncertain data mining: clustering, classification, frequent patterns, outliers, …
6
M. Soliman, I. Ilyas, and K. Chang. Top-k Query Processing in Uncertain Databases. In ICDE, 2007. ◦ Possible Worlds
7
◦ U-Topk query
  Return the k tuples that can co-exist in a possible world with the highest probability
  E.g., {T1, T2} as U-Top2
◦ U-kRanks query
  Return k tuples, each of which is the clear winner at its rank over all possible worlds
  E.g., {T2, T6} as U-2Ranks
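The two definitions can be checked by brute force over possible worlds. The sketch below is only an illustration: it assumes four independent tuples t1, t2, t5, t6 in descending score order, with the probabilities used in the walkthrough slides, and no exclusion rules. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# Assumption: independent tuples, listed in descending score order.
TUPLES = [("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)]


def possible_worlds(tuples):
    """Yield (world, probability); a world lists the present ids in score order."""
    for mask in product([True, False], repeat=len(tuples)):
        prob = 1.0
        world = []
        for (tid, p), present in zip(tuples, mask):
            prob *= p if present else 1.0 - p
            if present:
                world.append(tid)
        if prob > 0.0:
            yield tuple(world), prob


def u_topk(tuples, k):
    """Most probable top-k vector, with its probability summed over all worlds."""
    score = defaultdict(float)
    for world, prob in possible_worlds(tuples):
        if len(world) >= k:
            score[world[:k]] += prob
    return max(score.items(), key=lambda kv: kv[1])


def rank_probabilities(tuples, k):
    """Map (tuple id, rank) to the probability that the tuple holds that rank."""
    table = defaultdict(float)
    for world, prob in possible_worlds(tuples):
        for rank, tid in enumerate(world[:k], start=1):
            table[(tid, rank)] += prob
    return table
```

Under these assumptions `u_topk(TUPLES, 2)` gives {t1, t2} with probability 0.28, and t2 is the rank-1 winner with 0.42. At rank 2, t5 and t6 tie at 0.324 in this simplified setting; the slide reports t6.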
8
U-Topk: finding the U-Top2 query answer (animation walkthrough)
◦ Storage layer delivers tuples in score order: t1: 0.4, t2: 0.7, t5: 0.6
◦ Buffer: priority queue of states ordered by probability; state s l,i holds l tuples after the first i tuples have been read
  s 0,0 = {}        p = 1
  s 1,1 = {t1}      p = 0.4
  s 0,1 = {}        p = 0.6
  s 1,2 = {t2}      p = 0.42
  s 0,2 = {}        p = 0.18
  s 2,3 = {t2, t5}  p = 0.252
  s 1,3 = {t2}      p = 0.168
  s 2,2 = {t1, t2}  p = 0.28
  s 1,2 = {t1}      p = 0.12
◦ Complete: return {t1, t2} as top-2
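The state expansion on this slide can be sketched as a best-first search. This is a minimal sketch that assumes independent tuples arriving in score order; the function name and the heap entry layout are hypothetical, and the original algorithm also handles exclusion rules, which are omitted here.

```python
import heapq


def u_topk_best_first(tuples, k):
    """Best-first search over states (probability, tuples read, chosen ids).

    Assumes independent tuples in descending score order. A state's
    probability can only shrink when it is extended, so the first state
    of length k popped from the queue is the most probable top-k vector.
    """
    heap = [(-1.0, 0, ())]  # negated probability turns heapq into a max-queue
    while heap:
        neg_p, seen, chosen = heapq.heappop(heap)
        if len(chosen) == k:
            return chosen, -neg_p
        if seen == len(tuples):
            continue  # no tuples left; this state cannot reach length k
        tid, p = tuples[seen]
        if p > 0.0:  # next tuple exists
            heapq.heappush(heap, (neg_p * p, seen + 1, chosen + (tid,)))
        if p < 1.0:  # next tuple does not exist
            heapq.heappush(heap, (neg_p * (1.0 - p), seen + 1, chosen))
    return None


# Tuples shown in the storage layer of the slide.
answer, prob = u_topk_best_first([("t1", 0.4), ("t2", 0.7), ("t5", 0.6)], 2)
```

With the three tuples from the slide, the search pops {} (1), {} (0.6), {t2} (0.42), {t1} (0.4) and then {t1, t2} (0.28), which is returned without expanding {t2, t5} (0.252).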
9
U-kRanks: finding the U-2Ranks query answer (animation walkthrough)
◦ Start: {} 1
◦ Read t1: 0.4
  States: {} 0.6, {t1} 0.4
  P(t1,1) = 0.4
◦ Read t2: 0.7
  States: {} 0.18, {t2} 0.42, {t1} 0.12, {t1, t2} 0.28
  P(t2,1) = 0.42, P(t2,2) = 0.28
  Report top1: t2 (0.42)
◦ Read t5: 0.6
  States: {} 0.072, {t5} 0.108, {t1} 0.048, {t1, t5} 0.072, {t2} 0.168, {t2, t5} 0.252
  P(t5,2) = 0.324
◦ Read t6: 1
  States: {t6} 0.072, {t1, t6} 0.048, {t2, t6} 0.168, {t5, t6} 0.108
  P(t6,2) = 0.324
  Report top2: t6 (0.324)
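The rank probabilities in this walkthrough can be reproduced without materialising the states. The sketch below is a simplification and not the authors' implementation: it assumes independent tuples in score order and tracks only the probability that exactly j of the tuples read so far exist. All names are hypothetical.

```python
import math


def rank_probabilities_dp(tuples, k):
    """Return {(tuple id, rank): probability} for ranks 1..k.

    Assumes independent tuples in descending score order. count[j] is the
    probability that exactly j of the tuples read so far exist (j < k).
    """
    count = [1.0] + [0.0] * (k - 1)
    table = {}
    for tid, p in tuples:
        for rank in range(1, k + 1):
            # tid is at `rank` when it exists and rank-1 earlier tuples exist
            table[(tid, rank)] = p * count[rank - 1]
        for j in range(k - 1, 0, -1):  # update high to low to reuse old values
            count[j] = count[j] * (1.0 - p) + count[j - 1] * p
        count[0] *= 1.0 - p
    return table


def winners(table, rank):
    """All tuple ids whose probability at `rank` is (numerically) maximal."""
    best = max(pr for (_, r), pr in table.items() if r == rank)
    return [tid for (tid, r), pr in table.items()
            if r == rank and math.isclose(pr, best)]


TABLE = rank_probabilities_dp(
    [("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)], 2)
```

Under the independence assumption, `winners(TABLE, 1)` is t2 (0.42), while t5 and t6 tie at rank 2 with 0.324 each; the slide reports t6.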
10
M. Hua, J. Pei, W. Zhang, X. Lin. Ranking Queries on Uncertain Data: A Probabilistic Threshold Approach. In SIGMOD, 2008.
◦ PT-k query
  Return the set of all tuples whose top-k probability is at least p
  E.g., {T1, T2, T5} as PT-2 (with p = 0.4)
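A PT-k answer can be sketched with a small dynamic program. The example assumes independent tuples t1, t2, t5, t6 in score order with the probabilities from the U-kRanks walkthrough and no exclusion rules; `pt_k` is a hypothetical name, and the paper's own algorithm is more general.

```python
def pt_k(tuples, k, threshold):
    """Ids of tuples whose top-k probability is at least `threshold`.

    Assumes independent tuples in descending score order. A tuple is in the
    top-k when it exists and fewer than k higher-scored tuples exist.
    """
    count = [1.0] + [0.0] * (k - 1)  # count[j]: exactly j earlier tuples exist
    answer = []
    for tid, p in tuples:
        top_k_probability = p * sum(count)
        if top_k_probability >= threshold:
            answer.append(tid)
        for j in range(k - 1, 0, -1):
            count[j] = count[j] * (1.0 - p) + count[j - 1] * p
        count[0] *= 1.0 - p
    return answer


RESULT = pt_k([("t1", 0.4), ("t2", 0.7), ("t5", 0.6), ("t6", 1.0)], 2, 0.4)
```

Here the top-2 probabilities are 0.4, 0.7, 0.432 and 0.396, so the answer is {t1, t2, t5}, in line with the slide's PT-2 example.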
11
C. Jin, K. Yi, L. Chen, J. Yu, X. Lin. Sliding-Window Top-k Queries on Uncertain Streams. In VLDB, 2008.
◦ Applicable to the definitions of top-k above
◦ Maintain compact sets
  A compact set of a window guarantees that tuples outside it cannot be in the top-k answer of that window
◦ Both time- and space-efficient
12
T. Ge, S. Zdonik, and S. Madden. Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. In SIGMOD, 2009.
◦ Addresses the tradeoff between reporting high-scoring tuples and reporting tuples with a high probability of being in the top-k
◦ Return a number of typical vectors that efficiently sample the distribution of all potential top-k tuple vectors
13
Example:
◦ In an International Tenpin Bowling Championship, the events include singles, doubles, and trios. Because of a limited budget, the coach can send only 3 players. We therefore want these 3 players to have a relatively high probability of performing well across all 3 types of events.
14
◦ U-Top3 = {T2, T5, T6}
◦ But U-Top2 = {T1, T2}, U-Top1 = {T1}
◦ How about also considering {T1, T2, T5} as top-3?

TID  Player  Pr.
T1   A       0.4100
T2   D       0.6200
T3   B       0.1400
T4   C       0.3400
T5   C       0.6600
T6   B       0.8600
T7   D       0.3800
T8   A       0.5900

Possible World          Pr.
PW1   T1, T2, T3, T4    0.0121
PW2   T1, T2, T3, T5    0.0235
PW3   T1, T2, T4, T6    0.0743
PW4   T1, T2, T5, T6    0.1443
PW5   T1, T3, T4, T7    0.0074
PW6   T1, T3, T5, T7    0.0144
PW7   T1, T4, T6, T7    0.0456
PW8   T1, T5, T6, T7    0.0884
PW9   T2, T3, T4, T8    0.0174
PW10  T2, T3, T5, T8    0.0338
PW11  T2, T4, T6, T8    0.1070
PW12  T2, T5, T6, T8    0.2076
PW13  T3, T4, T7, T8    0.0107
PW14  T3, T5, T7, T8    0.0207
PW15  T4, T6, T7, T8    0.0656
PW16  T5, T6, T7, T8    0.1273
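The possible-world table and the three U-Topk answers can be reproduced with a short enumeration. The sketch rests on two assumptions read off the table rather than stated on the slide: tuples are ranked in TID order (T1 highest), and each player contributes exactly one tuple per world, since the two probabilities of every player sum to 1. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

# (TID, player, probability), listed in assumed rank order (T1 ranks highest).
TUPLES = [("T1", "A", 0.41), ("T2", "D", 0.62), ("T3", "B", 0.14),
          ("T4", "C", 0.34), ("T5", "C", 0.66), ("T6", "B", 0.86),
          ("T7", "D", 0.38), ("T8", "A", 0.59)]
RANK = {tid: i for i, (tid, _, _) in enumerate(TUPLES)}


def possible_worlds(tuples):
    """Yield (world, probability); each world picks one tuple per player."""
    by_player = defaultdict(list)
    for tid, player, p in tuples:
        by_player[player].append((tid, p))
    for combo in product(*by_player.values()):
        prob = 1.0
        for _, p in combo:
            prob *= p
        world = tuple(sorted((tid for tid, _ in combo), key=RANK.get))
        yield world, prob


def u_topk(tuples, k):
    """Most probable top-k vector and its probability."""
    score = defaultdict(float)
    for world, prob in possible_worlds(tuples):
        score[world[:k]] += prob
    return max(score.items(), key=lambda kv: kv[1])


WORLDS = dict(possible_worlds(TUPLES))
```

This yields 16 worlds whose probabilities match the table to four decimals, for example about 0.0121 for (T1, T2, T3, T4). `u_topk` returns {T2, T5, T6} for k = 3 (about 0.2076), {T1, T2} for k = 2 (about 0.2542) and {T1} for k = 1 (0.41), which is the inconsistency across k that the slide points out.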
15
We choose the answers of a top-k query based not only on the probability (P) but also on the confidence (C).
◦ Confidence: expresses the top-(k-1) probabilities of the sets formed by k-1 tuples of a possible top-k answer
◦ E.g., k = 3, with {T1, T2, T3} as a possible top-k answer with P = 0.0356
  C is composed in some way of:
  Pr({T1, T2} is top-2) = 0.2542 and its confidence,
  Pr({T1, T3} is top-2) = 0.0218 and its confidence,
  Pr({T2, T3} is top-2) = 0.0512 and its confidence
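The ingredients of the confidence in this example can be recomputed from the bowling data. The sketch only evaluates the probabilities quoted on the slide and does not define the confidence function itself, which the slides leave open. It assumes the bowling table from the example, TID rank order, and one tuple per player in every world; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, product

# (TID, player, probability), listed in assumed rank order (T1 ranks highest).
TUPLES = [("T1", "A", 0.41), ("T2", "D", 0.62), ("T3", "B", 0.14),
          ("T4", "C", 0.34), ("T5", "C", 0.66), ("T6", "B", 0.86),
          ("T7", "D", 0.38), ("T8", "A", 0.59)]
RANK = {tid: i for i, (tid, _, _) in enumerate(TUPLES)}


def top_set_probability(candidate):
    """Probability that the len(candidate) best tuples of a world are `candidate`."""
    target = tuple(sorted(candidate, key=RANK.get))
    by_player = defaultdict(list)
    for tid, player, p in TUPLES:
        by_player[player].append((tid, p))
    total = 0.0
    for combo in product(*by_player.values()):  # one tuple per player
        world = sorted((tid for tid, _ in combo), key=RANK.get)
        if tuple(world[:len(target)]) == target:
            prob = 1.0
            for _, p in combo:
                prob *= p
            total += prob
    return total


def subset_probabilities(candidate):
    """Top-(k-1) probability of every (k-1)-subset of a candidate top-k set."""
    return {sub: top_set_probability(sub)
            for sub in combinations(candidate, len(candidate) - 1)}


P = top_set_probability(("T1", "T2", "T3"))
PARTS = subset_probabilities(("T1", "T2", "T3"))
```

`P` comes out at about 0.0356, and `PARTS` holds about 0.2542 for {T1, T2}, 0.0218 for {T1, T3} and 0.0512 for {T2, T3}, matching the slide.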
16
Since every possible top-k answer has two features, probability (P) and confidence (C), we return only the non-dominated answers as the result set.
◦ E.g.,
  {T1, T3, T5}: P = 0.8, C = 0.4
  {T1, T4, T7}: P = 0.5, C = 0.7
  {T2, T6, T7}: P = 0.3, C = 0.2 (dominated, so it will not be returned)
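The non-dominated filter on this slide is a two-dimensional skyline. A minimal sketch follows, assuming the usual dominance definition (at least as good in both P and C, strictly better in at least one), which the slide does not spell out; the names are hypothetical.

```python
# Candidate answers from the slide: (tuple set, probability P, confidence C).
CANDIDATES = [
    (("T1", "T3", "T5"), 0.8, 0.4),
    (("T1", "T4", "T7"), 0.5, 0.7),
    (("T2", "T6", "T7"), 0.3, 0.2),
]


def dominates(a, b):
    """True if a is at least as good as b in P and C and strictly better in one."""
    return a[1] >= b[1] and a[2] >= b[2] and (a[1] > b[1] or a[2] > b[2])


def skyline(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


RESULT = [answer for answer, _, _ in skyline(CANDIDATES)]
```

`RESULT` keeps {T1, T3, T5} and {T1, T4, T7}; {T2, T6, T7} is dominated by both and is dropped. The pairwise check is quadratic in the number of candidates, which is fine for a sketch but is not meant as an efficient algorithm.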
17
◦ Formulate the confidence function
◦ Find an algorithm to generate the result set
◦ Try to calculate the confidence in an efficient way
◦ Carry out an empirical study on datasets