1
Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers. Presented by Qian Wan, HKUST. Based on [1][2]
2
Introduction: Uncertain Data Management. Modeling uncertain data: possible worlds model. Uncertain data management: top-k, join, kNN, skyline, indexing, etc. Uncertain data mining: clustering, classification, frequent pattern, outlier detection.
3
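Introduction: Data Representation. A simple way to represent probabilistic data: each tuple has a confidence (membership probability). Pr(instance) = ∏ Pr(presence) × ∏ Pr(absence), where the first product runs over the tuples present in the instance and the second over the tuples absent from it. Mutual exclusion constraints for each tuple*. Scoring function*.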
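The instance probability above can be sketched in a few lines of Python. This is a minimal illustration that assumes tuples are independent (no mutual exclusion constraints); the function name `world_probability` and the tuple identifiers are hypothetical, not taken from the slides.

```python
def world_probability(probs, present):
    """Probability of one possible world (instance).

    probs:   dict mapping tuple id -> membership probability
    present: set of tuple ids that exist in this world
    Assumption: tuples are independent (no mutual exclusion rules).
    """
    result = 1.0
    for tuple_id, p in probs.items():
        # Present tuples contribute p, absent tuples contribute (1 - p).
        result *= p if tuple_id in present else 1.0 - p
    return result


# Hypothetical three-tuple table.
example_probs = {"T1": 0.5, "T2": 0.4, "T3": 0.8}
example_world = world_probability(example_probs, {"T1", "T3"})
```

Summing `world_probability` over every subset of tuples gives 1, which is a quick sanity check on the model.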
4
Introduction: Other Works. U-Topk: returns the k tuples that co-exist in a possible world and have the highest probability of being the top-k. U-kRanks and PT-k: return tuples according to the marginal distribution of top-k results.
5
Introduction: Other Works (Example)
6
Introduction: Other Works (drawbacks). The top-k result may be atypical. The distribution of scores is not used.
7
Introduction: c-Typical-Top k. The 3-Typical-Top 2 scores of this example are {118, 183, 235}. The expected distance is 6.6. The vectors are {(T2, T6), (T7, T6), (T7, T3)}.
8
Algorithm: Distribution of top-2 tuples’ scores
9
Algorithm – Naïve approach. INPUT: tuples with membership probabilities. OUTPUT: top-k score distribution. IDEA: recursively go through all possible worlds to calculate all probabilities, until reaching a threshold.
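A minimal sketch of the naïve idea, under stated assumptions: tuples are independent, each is given as a hypothetical `(score, probability)` pair, and worlds with fewer than k tuples contribute nothing. For brevity it enumerates every world iteratively and does not implement the threshold-based early stop mentioned on the slide.

```python
from collections import defaultdict
from itertools import product


def topk_score_distribution_naive(tuples, k):
    """Enumerate all 2^n possible worlds and accumulate, for each world
    with at least k tuples, the probability of its top-k total score.

    tuples: list of (score, membership_probability) pairs,
            assumed independent (illustrative input format).
    Returns {total_score: probability}.
    """
    dist = defaultdict(float)
    for world in product([False, True], repeat=len(tuples)):
        prob = 1.0
        present_scores = []
        for (score, p), exists in zip(tuples, world):
            prob *= p if exists else 1.0 - p
            if exists:
                present_scores.append(score)
        if len(present_scores) >= k:
            top_k_total = sum(sorted(present_scores, reverse=True)[:k])
            dist[top_k_total] += prob
    return dict(dist)


# Hypothetical data: three tuples, top-2 total score distribution.
naive_dist = topk_score_distribution_naive([(10, 0.5), (8, 0.4), (5, 0.8)], 2)
```

The cost is exponential in the number of tuples, which is what motivates the DP approach.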
10
Algorithm – a DP approach. With tuples sorted by descending score, D(i,j) is the score distribution of the top-j tuples among Ti, …, Tn. The main problem is to compute D(1,k).
11
Algorithm – a DP approach. Transformation: D(i,j) = TF[D(i+1,j), D(i+1,j-1)]. From D(i+1,j) (Ti absent): for each (v,p), add (v, p(1-pi)). From D(i+1,j-1) (Ti present): for each (v,p), add (v+si, p·pi). Merge duplicate items. Bottom-up DP. Approximation.
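The transformation can be sketched as a bottom-up DP in Python. This is an illustrative reading of the recurrence, not the authors' implementation: it assumes independent tuples given as hypothetical `(score, probability)` pairs, uses D(i,0) = {(0, 1)} and an empty D(n+1, j) for j > 0 as boundary conditions, and omits the approximation step.

```python
from collections import defaultdict


def topk_score_distribution_dp(tuples, k):
    """Compute D(1, k) bottom-up.

    tuples: list of (score, membership_probability) pairs,
            assumed independent (illustrative input format).
    Returns {total_score: probability}.
    """
    # The recurrence requires tuples in descending score order.
    ordered = sorted(tuples, key=lambda t: t[0], reverse=True)

    # row[j] holds D(i+1, j). Past the last tuple, only j = 0 is
    # feasible: the empty selection with total score 0.
    row = [{0: 1.0}] + [{} for _ in range(k)]

    for score, p in reversed(ordered):  # i = n, n-1, ..., 1
        new_row = [{0: 1.0}]            # D(i, 0)
        for j in range(1, k + 1):
            merged = defaultdict(float)
            # Ti absent: top-j comes entirely from T(i+1)..Tn.
            for v, q in row[j].items():
                merged[v] += q * (1.0 - p)
            # Ti present: Ti plus the top-(j-1) from T(i+1)..Tn.
            for v, q in row[j - 1].items():
                merged[v + score] += q * p
            new_row.append(dict(merged))  # duplicates already merged
        row = new_row
    return row[k]


# Hypothetical data: three tuples, top-2 total score distribution.
dp_dist = topk_score_distribution_dp([(10, 0.5), (8, 0.4), (5, 0.8)], 2)
```

Merging duplicates relies on exact equality of the summed scores, so integer scores are assumed here; with floating-point scores some rounding or bucketing would be needed.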
12
Handling More Realistic Scenarios. Handling mutually exclusive (ME) rules: compress the ME group; refine by lead tuple region. Handling ties: when two tuples have the same score, rank them according to probability.
13
Algorithm: 3-Typical-Top 2 scores
14
c-Typical-Top k. The 3-Typical-Top 2 scores of this example are {118, 183, 235}. The expected distance is 6.6. The vectors are {(T2, T6), (T7, T6), (T7, T3)}.
15
Computing c-Typical-Top k. Define F^a(j) to be the optimal objective value over {sj, …, sn}, where a is the number of typical scores. G^a(j) is defined analogously.
16
Computing c-Typical-Top k. Solve the optimization problem over the two functions F and G using DP, starting from the boundary conditions.
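As a hedged illustration of the optimization, the sketch below picks c typical scores that minimize the expected distance from a score drawn from the distribution to its nearest typical score. It is not the F/G recurrence from the slides: it swaps in a plain segment-partition DP, which works because in one dimension each typical score serves a contiguous run of sorted scores. It assumes the typical scores are chosen from the scores in the distribution; the function name and input format are hypothetical.

```python
def c_typical_scores(dist, c):
    """Choose c typical scores from a score distribution.

    dist: {score: probability} (illustrative input format)
    Returns (expected_distance, list_of_typical_scores).
    """
    pts = sorted(dist.items())  # (score, probability), ascending score
    n = len(pts)
    c = min(c, n)

    def best_center(lo, hi):
        # Cheapest single typical score for pts[lo..hi], inclusive.
        best = None
        for m in range(lo, hi + 1):
            cost = sum(p * abs(v - pts[m][0]) for v, p in pts[lo:hi + 1])
            if best is None or cost < best[0]:
                best = (cost, pts[m][0])
        return best

    inf = float("inf")
    # dp[a][r]: best (cost, centers) covering the first r points
    # with exactly a typical scores.
    dp = [[(inf, [])] * (n + 1) for _ in range(c + 1)]
    dp[0][0] = (0.0, [])
    for a in range(1, c + 1):
        for r in range(a, n + 1):
            for split in range(a - 1, r):  # last segment: pts[split..r-1]
                prev_cost, prev_centers = dp[a - 1][split]
                if prev_cost == inf:
                    continue
                seg_cost, center = best_center(split, r - 1)
                total = prev_cost + seg_cost
                if total < dp[a][r][0]:
                    dp[a][r] = (total, prev_centers + [center])
    return dp[c][n]


# Hypothetical distribution with two clusters of scores.
typical_cost, typical = c_typical_scores({10: 0.2, 12: 0.3, 20: 0.4, 21: 0.1}, 2)
```

This version recomputes segment costs from scratch and is meant for small distributions only; it trades efficiency for being easy to check against brute force.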
17
Empirical Study: 3-Typical vs. U-Topk
18
Empirical Study
20
Q&A
21
References. [1] Charu C. Aggarwal, Philip S. Yu. “A Survey of Uncertain Data Algorithms and Applications”. IEEE Transactions on Knowledge and Data Engineering, 2009. [2] Tingjian Ge, Stan Zdonik, Samuel Madden. “Top-k Queries on Uncertain Data: On Score Distribution and Typical Answers”. SIGMOD, 2009.