Download presentation
Presentation is loading. Please wait.
Published byJeffrey Morrison Modified over 8 years ago
1
A Unified Approach to Ranking in Probabilistic Databases Jian Li, Barna Saha, Amol Deshpande University of Maryland, College Park, USA VLDB 2009 2010. 2. 19 Presented by JongHeum Yeon
2
Probabilistic Databases Motivation Increasing amounts of uncertain data Information retrieval, data integration and cleaning, text analytics, social network analysis Probabilistic Databases Annotate tuples with existence probabilistics, and/or attribute values with probability distribution Interpretation according to the “possible worlds semantics” 2Copyright© 2010 by CEBTCenter for E-Business Technology
3
Possible World Semantics IDScoreProb t1t1 2000.2 t2t2 1500.8 t3t3 1000.4 3Copyright© 2010 by CEBTCenter for E-Business Technology IDScore t1t1 200 t2t2 150 t3t3 100 IDScore t1t1 200 t2t2 150 IDScore t2t2 150 t3t3 100 pw1 pw2 pw3 w.p. 0.064 A probabilistic table (assume tuple-independence) w.p. 0.096 w.p. 0.256
4
Possible World Semantics IDScoreProb t1t1 2000.2 t2t2 1500.8 t3t3 1000.4 4Copyright© 2010 by CEBTCenter for E-Business Technology IDScore t1t1 200 t2t2 150 t3t3 100 IDScore t1t1 200 t2t2 150 IDScore t2t2 150 t3t3 100 pw1 pw2 pw3 w.p. 0.064 A probabilistic table (assume tuple-independence) w.p. 0.096 w.p. 0.256 Score values are used to rank the tuples in every pw The top-1 answer for each possible world
5
Probabilistic Databases (cont’d) 5Copyright© 2010 by CEBTCenter for E-Business Technology
6
Top-k Queries Multi-criteria optimization problem [score=100, Pr(t 1 )=0.5] vs. [score=50, pr(t 2 )=1.0] Many Prior Proposals Return k tuples t with the highest score(t)Pr(t) [exp. score] Return the most probable top k-answer [U-top-k] – [Solimanet al. ’07] At rank i, return tuple with max. prob. Of being at rank i [U-rank-k] – [Solimanet al. ’07] Return k tuples t with the largest Pr(r(t)≤k) values [PT-k/GT-k] – [Huaet al. ’08] [Zhang et al. ’08] Return k tuples t with smallest expected rank:∑ pw Pr(pw)r pw (t) – [Cormodeet al. ’09] 6Copyright© 2010 by CEBTCenter for E-Business Technology
7
Top-k Queries (cont’d) Which one should we use? Comparing different ranking functions Normalized Kendall Distance between two top-k answers: – Penalizes #reversals and #mismatches – Lies in [0,1], 0: Same answers; 1: Disjoint answers 7Copyright© 2010 by CEBTCenter for E-Business Technology
8
Unified Approach Define two parameterized ranking functions: PRF w ; PRF e that can simulate or approximate a variety of ranking functions PRF e much more efficient to evaluate (than PRF w ) 8Copyright© 2010 by CEBTCenter for E-Business Technology
9
Outline Parameterized Ranking Functions Computing PRF Independent Tuples Computing PRF e (α) x-Tuples Summery of Other Results Approximating Ranking Functions Experiments 9Copyright© 2010 by CEBTCenter for E-Business Technology
10
Parameterized Ranking Function Weight Function Parameterized Ranking Function (PRF) Return k tuples with the highest values. 10Copyright© 2010 by CEBTCenter for E-Business Technology
11
Parameterized Ranking Function (cont’d) ω(t,i)= 1 : Rank the tuples by probabilities ω(t,i)=score(t): E-Score PRF e (α): ω(i)= α i where α can be a real or a complex number PRF ω (h): Generalizes PT/GT-k and U-Rank 11Copyright© 2010 by CEBTCenter for E-Business Technology
12
Computing PRF: Independent Tuples 12Copyright© 2010 by CEBTCenter for E-Business Technology
13
Computing PRF: Independent Tuples (cont’d) 13Copyright© 2010 by CEBTCenter for E-Business Technology
14
Computing PRF: Independent Tuples (cont’d) 14Copyright© 2010 by CEBTCenter for E-Business Technology
15
Computing PRF e (α): Independent Tuples 15Copyright© 2010 by CEBTCenter for E-Business Technology
16
Summary of Other Results PRF computation for probabilistic and/xor trees Generalizes x-tuplesby allowing both mutual exclusivity and coexistence PRF computation : O(n3) PRF e computation : O(nlogn+nd) (d = height of the tree) PRF computation on graphical models A polynomial time algorithm when the junction tree has bounded treewidth A nontrivial dynamic program combined with the generating function method. 16Copyright© 2010 by CEBTCenter for E-Business Technology
17
Approximating Ranking Functions 17Copyright© 2010 by CEBTCenter for E-Business Technology
18
Experiments: Approximation Ability 18Copyright© 2010 by CEBTCenter for E-Business Technology
19
Experiments: Execution Time 19Copyright© 2010 by CEBTCenter for E-Business Technology
20
Conclusion Proposed a unifying framework for ranking over probabilistic databases through: Parameterized ranking functions Incorporation of user feedback Designed highly efficient algorithms for computing PRF and PRF e Developed novel approximation techniques for approximating PRF w with PRF e 20Copyright© 2010 by CEBTCenter for E-Business Technology
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.