Probabilistic Similarity Queries in Uncertain Databases (Ludwig-Maximilians-University Munich, Database Systems Group, Department Institute for Informatics)


1 Probabilistic Similarity Queries in Uncertain Databases
Matthias Renz
Ludwig-Maximilians-Universität München, Database Systems Group, Department Institute for Informatics
Munich, Germany
www.dbs.ifi.lmu.de
Dagstuhl Seminar 2008: Uncertainty Management in Information Systems

2 Outline
DATABASE SYSTEMS GROUP, M. Renz: Probabilistic Similarity Queries in Uncertain Databases, Seminar 08421, Schloss Dagstuhl 2008
Introduction
Probabilistic Similarity Queries
  – multi-step query processing
  – probabilistic ε-range/k-NN queries
Probabilistic Similarity Ranking
  – probabilistic ranking models
  – efficient computation of probabilistic ranking queries
Summary

3 Introduction
modern database applications involve spatial, temporal and multimedia data
attributes are often vague and imprecise
  – sensor data, e.g. traffic monitoring
  – feature extraction, e.g. person identification
⇒ probabilistic databases

4 Introduction
types of probabilistic databases
  – relational uncertainty representation
      tuples with confidence, e.g. x-relation model (Trio system)

      ID   NAME   CONF
      p1   john   0.6
      p2   fred   0.3
      p3   mary   0.7
      …    …      …

  – uncertainty in feature spaces
      uncertain vectors
      representations: continuous, discrete (point objects)
  – spatial uncertainty representation
      uncertain spatially extended objects
[Figure: plot with axes x and y]

6 Introduction
Probabilistic Similarity Queries
  – given:
      database with uncertain vectors
      (uncertain) query object Q
  – queries:
      ε-range query
      k-NN query
      ranking query
[Figure: three panels, one per query type, each showing the query object Q; labels ε and 1, 2, 3]

7 Introduction
Probabilistic Similarity Queries
  – given: two databases DB_A and DB_B with uncertain vectors
  – queries: join query
  – challenges: uncertain similarity distances, uncertain query results

8 Outline
Introduction
Probabilistic Similarity Queries
  – multi-step query processing
  – probabilistic ε-range/k-NN queries
Probabilistic Similarity Ranking
  – probabilistic ranking models
  – efficient computation of probabilistic ranking queries
Summary

9 Modelling Uncertainty in Feature Spaces
Uncertain Vector Data
  – vector data in d-dimensional space ℝ^d
  – objects are represented by multiple d-dimensional vectors that are mutually exclusive
      a confidence value is assigned to each vector
  – types of uncertain object representations
      pdf (continuous)
      vector samples (discrete)
[Figure: two x/y plots, one showing a continuous pdf and one showing discrete vector samples]

10 Probabilistic Similarity Queries
Example: Probabilistic ε-Range Query
  – query object and set of uncertain objects (discrete):
      q = {q_1,…,q_M} and o_i = {o_i,1,…,o_i,N}
  – distance between q and o_i: [equation not preserved in the transcript]
  – probability that the distance between q and o_i is less than ε ∈ ℝ_0^+: [equation not preserved in the transcript]
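The probability on this slide can be illustrated with a minimal sketch. It assumes, beyond what the transcript states, that all samples of an object are equally likely and that q and o_i are independent, so each sample pair carries weight 1/(M·N). The function name `range_probability` and the sample coordinates are hypothetical.

```python
import math
from itertools import product


def range_probability(q_samples, o_samples, eps):
    """Estimate P(d(q, o) <= eps) for two discrete uncertain objects.

    Assumption (not from the slides): samples are equally weighted and the
    two objects are independent, so every sample pair has weight 1/(M*N).
    """
    hits = sum(
        1
        for q, o in product(q_samples, o_samples)
        if math.dist(q, o) <= eps  # Euclidean distance, Python 3.8+
    )
    return hits / (len(q_samples) * len(o_samples))


# Hypothetical 2-d samples for a query object and one database object.
q = [(0.0, 0.0), (1.0, 0.0)]
o = [(0.0, 1.0), (3.0, 0.0), (1.0, 1.0)]
p = range_probability(q, o, eps=1.0)  # 2 of the 6 sample pairs are within eps
```

If the samples carry individual confidence values, the count would be replaced by a sum over the products of the pair's confidences.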

11 Probabilistic Similarity Queries
Clustered Object Representation
build approximations by grouping vector points of an object into clusters
  – object: o = {o_1,..,o_s}
  – simple object approximation: MBR(o)
  – clustered object approximation: MBR(C_1(o)),.., MBR(C_k(o))

12 Probabilistic Similarity Queries
advantages of clustered object approximation
  – efficiently managed by spatial access methods, e.g. R-tree, X-tree
  – supports multi-step query processing
      true hits can be reported very early
      reduced refinement cost
  – efficient computation of approximate answers
PTSQ and PTopkSQ efficiently supported

13 Probabilistic Similarity Queries
multi-step query processing:
  – probabilistic filter
      estimation of probability p = P(d(o,q) ≤ ε):
      0.3 ≤ P(d(o,q) ≤ ε) ≤ 0.6
      (0.3: lower bounding prob. estimation, 0.6: upper bounding prob. estimation)
[Figure: query point q with its ε-range and uncertain object o in clustered object representation; cluster labels 0.3, 0.1, 0.2, 0.1]
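One way to obtain such bounds from the clustered representation is sketched below. It assumes each cluster is stored as an MBR together with the share of the object's samples it contains: clusters lying completely inside the ε-range contribute to the lower bound, clusters that merely intersect it contribute to the upper bound. The helper names and the cluster geometry are hypothetical; only the weights 0.3, 0.1 and 0.2 echo the figure labels.

```python
import math


def min_dist(q, lo, hi):
    """Smallest Euclidean distance from point q to the box [lo, hi]."""
    return math.sqrt(
        sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi))
    )


def max_dist(q, lo, hi):
    """Largest Euclidean distance from point q to the box [lo, hi]."""
    return math.sqrt(
        sum(max(abs(x - l), abs(x - h)) ** 2 for x, l, h in zip(q, lo, hi))
    )


def probability_bounds(q, clusters, eps):
    """Return (lower, upper) bounds for P(d(o, q) <= eps).

    clusters: list of (lo, hi, weight); weight is the assumed share of the
    object's samples that fall into that cluster MBR.
    """
    lower = sum(w for lo, hi, w in clusters if max_dist(q, lo, hi) <= eps)
    upper = sum(w for lo, hi, w in clusters if min_dist(q, lo, hi) <= eps)
    return lower, upper


# Hypothetical cluster MBRs around a query point at the origin.
clusters = [
    ((-0.5, -0.5), (0.5, 0.5), 0.3),  # completely inside the eps-range
    ((1.5, 0.0), (2.5, 1.0), 0.1),    # intersects the eps-range
    ((-1.0, 1.5), (0.0, 3.0), 0.2),   # intersects the eps-range
    ((3.0, 3.0), (4.0, 4.0), 0.4),    # completely outside
]
p_low, p_up = probability_bounds((0.0, 0.0), clusters, eps=2.0)
```

With this layout the bounds come out at roughly 0.3 and 0.6, up to floating-point rounding.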

14 Probabilistic Similarity Queries
Filter Step for PSQs:
  – Probabilistic ε-Range Queries (PTSQ type):
      for each uncertain object o:
        – compute lower and upper bounding probabilities based on cluster representations
        – if lower bounding probability P_low > τ, then report o
        – if upper bounding probability P_upper < τ, then prune o
        – otherwise refine o (partial refinement)
[Figure: query point q with its ε-range; cluster labels 0.3, 0.1, 0.2, 0.1]
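The three-way decision of the filter step can be written down directly. This is a sketch only: the name `filter_step`, the string labels and the symbol `tau` for the probability threshold are assumptions made for illustration.

```python
def filter_step(p_low, p_up, tau):
    """Classify one uncertain object from its probability bounds.

    p_low, p_up: lower and upper bound of P(d(o, q) <= eps).
    tau: probability threshold of the query (assumed name).
    """
    if p_low > tau:
        return "report"  # true hit, no refinement needed
    if p_up < tau:
        return "prune"   # cannot qualify, discard
    return "refine"      # bounds are inconclusive, look at the samples


# With the bounds 0.3 <= P <= 0.6 the outcome depends on the threshold.
decisions = [filter_step(0.3, 0.6, tau) for tau in (0.2, 0.5, 0.7)]
```

Only objects that end up in the "refine" case need their samples inspected, which is where the reduced refinement cost comes from.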

15 Probabilistic Similarity Queries
Filter Step for PSQs:
  – Probabilistic k-NN Queries (PTSQ type)
Example: upper bounding probability that p is NN is P_upper = 0.7
[Figure: query point q, object o and object p; cluster labels 0.3, 0.1, 0.2, 0.1]

16 Outline
Introduction
Probabilistic Similarity Queries
  – multi-step query processing
  – probabilistic ε-range/k-NN queries
Probabilistic Similarity Ranking
  – probabilistic ranking models
  – efficient computation of probabilistic ranking queries
Summary

17 Probabilistic Similarity Ranking
Ranking Queries
  – very important for similarity search applications
  – give the most relevant answers first
  – are more flexible than ε-range and NN queries
probabilistic ranking queries
  – results are associated with confidence values
  – in contrast to ε-range / NN queries: no unique query predicate

18 Probabilistic Similarity Ranking
output of probabilistic ranking:
  – for each object: discrete pdf over ranking positions
      prob_ranked_q : D × {1,..,N} → [0..1]
  – prob_ranked_q(o,k) reports the probability that object o is exactly the k-th nearest neighbor of the query object q
[Figure: bar chart of probability over ranking position k = 1..10]

19 Probabilistic Similarity Ranking
Example: Probabilistic Ranking Output
[Figure, left panel "vector space": query object q and objects A to T]
[Figure, right panel "probabilistic ranking output": probability table with axes Objects, ranking coefficient k and Probability]

20 Probabilistic Similarity Ranking
probabilistic ranking output is inconvenient for most users
coping with probabilistic ranking:
  – ranking with unique order and confidences

      Rank   OID   Conf.
      1.     A     0.6
      2.     B     0.7
      3.     E     0.2
      4.     C     0.3
      …      …     …

21 Probabilistic Similarity Ranking
probabilistic ranking output is inconvenient for most users
coping with probabilistic ranking:
  – ranking with unique order and confidences
  – aggregate conf. values to deterministic results

      Rank   OID   Conf.
      1.     A     0.6
      2.     B     0.7
      3.     E     0.2
      4.     C     0.3
      …      …     …

Which ranking order?
How should we extract the conf. from the prob. ranking output?

22 Probabilistic Similarity Ranking
Approaches for Ranking Objects:
  – Approach 1: highest confidence [Soliman ICDE'07, Yi ICDE'08]
  – problem: duplicates, neglected objects
  – Example:
      Result: 1. (A,0.45) | 2. (C,0.40) | 3. (C,0.45)
      or with duplicate elimination: 1. (A,0.45) | 2. (C,0.40) | 3. (B,0.35)

23 Probabilistic Similarity Ranking
Approaches for Ranking Objects:
  – Approach 2: highest aggregated confidence
      the object with the highest prob. that it is one of the first k objects is assigned to ranking position k
      sensible with duplicate elimination
  – Example:
      Result: 1. (A,0.45) | 2. (B,0.35) | 3. (C,0.45)
      or: 1. (A,0.45) | 2. (B,0.75) | 3. (C,1.00)
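Approach 2 can be sketched as follows, assuming the probabilistic ranking output is available as a table of prob_ranked_q(o,k) values. The table below is hypothetical (the one behind the slide is not part of the transcript); it was chosen so that the aggregated confidences agree with the second result line above. The function name is also an assumption.

```python
def rank_by_aggregated_confidence(prob_ranked):
    """Assign to position k the unassigned object most likely to be in the top k.

    prob_ranked[o][k - 1] = P(o is exactly the k-th nearest neighbour).
    Returns a list of (object, aggregated confidence), one entry per rank.
    """
    n_ranks = len(next(iter(prob_ranked.values())))
    cumulative = {o: 0.0 for o in prob_ranked}
    assigned = set()
    result = []
    for k in range(n_ranks):
        # P(o is among the first k+1 objects) = sum of its first k+1 entries.
        for o, probs in prob_ranked.items():
            cumulative[o] += probs[k]
        candidates = [o for o in prob_ranked if o not in assigned]
        if not candidates:
            break
        best = max(candidates, key=lambda o: cumulative[o])
        assigned.add(best)  # duplicate elimination
        result.append((best, cumulative[best]))
    return result


# Hypothetical probability table: rows and columns each sum to 1.
table = {
    "A": [0.45, 0.25, 0.30],
    "B": [0.40, 0.35, 0.25],
    "C": [0.15, 0.40, 0.45],
}
ranking = rank_by_aggregated_confidence(table)
```

For this table the order is A, B, C with aggregated confidences of about 0.45, 0.75 and 1.00.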

24 Probabilistic Similarity Ranking
further approaches to determine the ranking order, e.g.
  – expected ranking position
  – etc.
most intuitive and robust: Approach 2
problem:
  – full probabilistic ranking information is required
required:
  – efficient computation of prob. ranking output

25 Probabilistic Similarity Ranking
Iterative Probability Computation
  – ranking applied on object vectors (samples)
  – during the radial sweep: maintain for each object o the probability P_o

      o     A     B     C     D
      P_o   0.2   0.4   0.1   0.0

  – for each accessed sample o_i,j, compute the probability P(o_i,j,k) that exactly (k-1) objects o ≠ o_i are within the sweep-range ε, for k = 1..N
[Figure: radial sweep around the query object with increasing range ε]
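The sweep can be sketched as below. It rests on two assumptions that the transcript does not spell out: P_o is read as the probability that o lies within the current sweep range, and all samples of an object are equally likely, so P_o grows by 1/(number of samples) each time one of o's samples is passed. The generator name and the sample data are hypothetical.

```python
import math


def radial_sweep(q, objects):
    """Visit all samples in order of increasing distance from q.

    objects: dict mapping an object id to its list of sample points.
    Yields (distance, object id, snapshot of P_o) for every accessed sample,
    where the snapshot reflects the samples strictly before the current one.
    """
    samples = sorted(
        (math.dist(q, s), oid) for oid, pts in objects.items() for s in pts
    )
    p_within = {oid: 0.0 for oid in objects}
    for dist, oid in samples:
        yield dist, oid, dict(p_within)
        # Assumed uniform sample weights: one more sample of oid is in range.
        p_within[oid] += 1.0 / len(objects[oid])


# Hypothetical one-sided layout along the x-axis.
objects = {
    "A": [(1.0, 0.0), (4.0, 0.0)],
    "B": [(2.0, 0.0), (3.0, 0.0)],
}
steps = list(radial_sweep((0.0, 0.0), objects))
```

Each snapshot provides the per-object probabilities from which P(o_i,j,k) is computed for the sample just accessed; the entry of o_i itself is ignored at that point.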

26 Probabilistic Similarity Ranking
computation of P(o_i,j,k): [equation not preserved in the transcript]
problem: comp. very expensive
  – a lot of possibilities for […]_i must be reconsidered
1st approach:
  – pruning objects that are beyond ε: reduce DB → DB' (|DB'| << |DB|)

27 Probabilistic Similarity Ranking
applying only relevant objects:

      object   A     B     F     D     H     C     E     G
      P_o      1.0   1.0   0.8   0.6   0.2   0.1   0.0   0.0

[Figure: query object q, sample o_i,j on the ε-range, objects A, B, F, D, H, C, E, G, I; object sets labelled N, N' and N'']

28 Probabilistic Similarity Ranking
problem: computation still exponential
2nd approach:
  – problem can be solved in polynomial time by means of dynamic programming technique:
      P(2 of 4 in ε-range) is composed of
        P(1 of 3 in ε-range), assuming C in ε-range
        P(2 of 3 in ε-range), assuming C not in ε-range
[Figure: three panels showing q, o_i,j and the objects F, D, H, C relative to the ε-range]

29 Probabilistic Similarity Ranking
problem: computation still exponential
2nd approach:
  – problem can be solved in polynomial time by means of dynamic programming technique:
  – recursive function: [equation not preserved in the transcript]
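The decomposition illustrated on the previous slide (condition on whether one object is inside the ε-range or not) corresponds to the standard recurrence P(m of j) = P(m-1 of j-1)·p_j + P(m of j-1)·(1-p_j). The sketch below is not the recursive function from the slide, which the transcript does not preserve; it assumes mutually independent objects and uses the per-object probabilities listed on slide 27. The function name is hypothetical.

```python
def count_distribution(probs):
    """Return dist with dist[m] = P(exactly m objects lie within the eps-range).

    probs: per-object probabilities of being within the range; the objects
    are assumed to be mutually independent. Runs in O(n^2) time instead of
    enumerating all 2^n inside/outside combinations.
    """
    dist = [1.0]  # with zero objects, "exactly 0 in range" is certain
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for m, pm in enumerate(dist):
            nxt[m] += pm * (1.0 - p)  # current object is outside the range
            nxt[m + 1] += pm * p      # current object is inside the range
        dist = nxt
    return dist


# Probabilities of the objects other than o_i, taken from slide 27.
within = {"A": 1.0, "B": 1.0, "F": 0.8, "D": 0.6,
          "H": 0.2, "C": 0.1, "E": 0.0, "G": 0.0}
dist = count_distribution(within.values())
# dist[k - 1] is then the probability that exactly k-1 other objects are
# within the sweep range, i.e. the quantity called P(o_i,j,k) on slide 25.
```

Objects with probability 0.0 leave the distribution unchanged and objects with probability 1.0 only shift it by one position, which is why restricting the computation to the remaining uncertain objects is safe.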

30 Outline
Introduction
Probabilistic Similarity Queries
  – multi-step query processing
  – probabilistic ε-range/k-NN queries
Probabilistic Similarity Ranking
  – probabilistic ranking models
  – efficient computation of probabilistic ranking queries
Summary

31 Summary
approaches to accelerate probabilistic similarity queries in vector spaces
assumptions:
  – objects are mutually independent
  – discrete uncertainty representations
supported by
  – traditional access methods
  – multi-step query processing techniques
very high speed-up factor using dynamic programming

32 Discussion
Any questions?
Thank you for your attention.

