Probabilistic Data Management

Chapter 8: Probabilistic Query Answering (6)

Objectives
In this chapter, you will explore the definitions of more probabilistic query types:
- Probabilistic top-k query

Recall: Probabilistic Query Types
Queries over an uncertain/probabilistic database:
Probabilistic Spatial Query
- Probabilistic range query
- Probabilistic k-nearest neighbor query
- Probabilistic group nearest neighbor (PGNN) query
- Probabilistic reverse k-nearest neighbor query
- Probabilistic spatial join / similarity join
Probabilistic Preference Query
- Probabilistic top-k query (or ranked query)
- Probabilistic skyline query
- Probabilistic reverse skyline query

Motivation Example
In a coal mine surveillance application, a number of sensors are deployed to detect gas density, temperature, and so on.
Assume we have a preference function f(O) = O.temp + O.den.
Top-k query: retrieve the k sensors with the highest scores (the most dangerous ones).
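As a minimal sketch of the certain-data case, the top-k query under f(O) = O.temp + O.den can be written as follows. The sensor ids and readings are hypothetical values chosen only for illustration; they do not come from the slides.

```python
import heapq

# Hypothetical sensor readings (illustrative values, not from the slides).
sensors = {
    "s1": {"temp": 30.0, "den": 0.8},
    "s2": {"temp": 45.0, "den": 0.2},
    "s3": {"temp": 28.0, "den": 0.5},
    "s4": {"temp": 41.0, "den": 0.9},
}

def f(o):
    """Preference function from the slide: f(O) = O.temp + O.den."""
    return o["temp"] + o["den"]

def top_k(objects, k):
    """Return the ids of the k objects with the highest preference scores."""
    return heapq.nlargest(k, objects, key=lambda oid: f(objects[oid]))

result = top_k(sensors, 2)
print(result)
```

With exact readings the answer is a single ranked list; the rest of the chapter deals with what replaces this list once the readings are uncertain.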

Motivation Example (cont'd)
Sensor data usually contain noise.
The reported data can be modeled as uncertain objects.
Goal: obtain top-k query answers over uncertain data with high confidence.
[Figure: actual data]

Background of Probabilistic Top-k Query
Under possible worlds semantics:
- Each tuple t is associated with a score t.score
- Each tuple t is associated with an existence probability t.prob
[Figure: possible worlds and the query answer in each possible world]

Different Semantics of Probabilistic Top-k Query
Top-k query in probabilistic databases:
- Consider each possible world from which top-k answers are retrieved
- Aggregate the top-k answers (weighted by the probabilities of possible worlds)
Aggregation semantics:
- Uncertain Top-k (U-Topk) [Soliman et al., ICDE 2007]
- Uncertain Rank-k (U-kRanks) [Soliman et al., ICDE 2007]
- Probabilistic Threshold Top-k (PT(h)) [Hua et al., SIGMOD 2008]
- Expected Ranks (Exp-Rank) [Cormode et al., ICDE 2009]
- Expected Score (E-Score) [Cormode et al., ICDE 2009]

Uncertain Top-k (U-Topk) [Soliman et al., ICDE 2007]
Group the possible worlds by their top-k answer vectors.
Find the one top-k answer vector that appears in possible worlds with the highest total probability.
[Figure: probabilistic database, its possible worlds, the top-k answer vector of each world, and the U-Topk answers]

Example of U-Topk
Given the uncertain database below and k = 2:

Tuple  Score  P(t)
t1     100    0.4
t2     85     0.5
t3     70     1
t4     60     0.5

Rules (tuples in the same rule are mutually exclusive):
R1 { t1 }
R2 { t2, t4 }
R3 { t3 }

Possible World (W)   Pr(W)
{ t1, t2, t3 }       P(t1)P(t2)P(t3) = 0.2
{ t1, t3, t4 }       P(t1)P(t3)P(t4) = 0.2
{ t2, t3 }           (1-P(t1))P(t2)P(t3) = 0.3
{ t3, t4 }           (1-P(t1))P(t3)P(t4) = 0.3

Probabilities of the top-2 answer vectors:
Pr[{ t1, t2 }] = 0.2
Pr[{ t1, t3 }] = 0.2
Pr[{ t2, t3 }] = 0.3
Pr[{ t3, t4 }] = 0.3

Final Result: {t2, t3} or {t3, t4} (tied at 0.3)
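The example can be reproduced with a brute-force sketch that enumerates possible worlds from the generation rules and then groups them by top-k answer vector. This is only an illustration of the semantics, not the algorithm from Soliman et al.; the function names are made up for this sketch, and P(t4) = 0.5 is inferred from Pr({t1, t3, t4}) = 0.2.

```python
from collections import defaultdict
from itertools import product

# Tuples from the example: id -> (score, existence probability).
# P(t4) = 0.5 is an inferred value (see lead-in).
tuples = {"t1": (100, 0.4), "t2": (85, 0.5), "t3": (70, 1.0), "t4": (60, 0.5)}
# Generation rules: tuples in the same rule are mutually exclusive.
rules = [["t1"], ["t2", "t4"], ["t3"]]

def possible_worlds(tuples, rules):
    """Yield (world, probability), picking at most one tuple per rule."""
    options = []
    for rule in rules:
        choices = [(t, tuples[t][1]) for t in rule]
        none_prob = 1.0 - sum(p for _, p in choices)
        if none_prob > 1e-12:  # the rule may contribute no tuple at all
            choices.append((None, none_prob))
        options.append(choices)
    for combo in product(*options):
        world = frozenset(t for t, _ in combo if t is not None)
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield world, prob

def u_topk(tuples, rules, k):
    """Sum world probabilities per top-k answer vector."""
    vectors = defaultdict(float)
    for world, prob in possible_worlds(tuples, rules):
        ranked = sorted(world, key=lambda t: tuples[t][0], reverse=True)
        vectors[tuple(ranked[:k])] += prob
    return dict(vectors)

answers = u_topk(tuples, rules, 2)
best = max(answers.values())
winners = sorted(v for v, p in answers.items() if abs(p - best) < 1e-9)
print(answers, winners)
```

Enumeration is exponential in the number of rules, so this only serves to check small examples by hand.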

Uncertain Rank-k (U-kRanks) [Soliman et al., ICDE 2007]
For each j ∈ [1, k], group the possible worlds by the tuple at the j-th rank.
For each j ∈ [1, k], find the one tuple that holds the j-th rank in possible worlds with the highest total probability.
[Figure: probabilistic database, its possible worlds, the tuple with the j-th rank in each world, and the U-kRanks answers]

Example of U-kRanks
Given the uncertain database below and k = 2:

Tuple  Score  P(t)
t1     100    0.4
t2     85     0.5
t3     70     1
t4     60     0.5

Rules (tuples in the same rule are mutually exclusive):
R1 { t1 }
R2 { t2, t4 }
R3 { t3 }

Possible World (W)   Pr(W)
{ t1, t2, t3 }       P(t1)P(t2)P(t3) = 0.2
{ t1, t3, t4 }       P(t1)P(t3)P(t4) = 0.2
{ t2, t3 }           (1-P(t1))P(t2)P(t3) = 0.3
{ t3, t4 }           (1-P(t1))P(t3)P(t4) = 0.3

At rank i = 1: Pr[t1] = 0.4, Pr[t2] = 0.3, Pr[t3] = 0.3
At rank i = 2: Pr[t2] = 0.2, Pr[t3] = 0.5, Pr[t4] = 0.3

Final Result: {t1, t3} (t1 at rank 1, t3 at rank 2)
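The per-rank probabilities can be checked with a short sketch that works directly on the four possible worlds listed in the example. The worlds are hard-coded from the table, and the helper names are hypothetical.

```python
from collections import defaultdict

scores = {"t1": 100, "t2": 85, "t3": 70, "t4": 60}
# Possible worlds and their probabilities, copied from the example table.
worlds = [
    ({"t1", "t2", "t3"}, 0.2),
    ({"t1", "t3", "t4"}, 0.2),
    ({"t2", "t3"}, 0.3),
    ({"t3", "t4"}, 0.3),
]

def rank_probabilities(worlds, scores, k):
    """rank_prob[j][t] = probability that tuple t holds rank j (1-based)."""
    rank_prob = {j: defaultdict(float) for j in range(1, k + 1)}
    for world, prob in worlds:
        ranked = sorted(world, key=scores.get, reverse=True)
        for j, t in enumerate(ranked[:k], start=1):
            rank_prob[j][t] += prob
    return rank_prob

def u_kranks(worlds, scores, k):
    """For each rank j, return the tuple most likely to hold that rank."""
    rank_prob = rank_probabilities(worlds, scores, k)
    return [max(rank_prob[j], key=rank_prob[j].get) for j in range(1, k + 1)]

rank_prob = rank_probabilities(worlds, scores, 2)
answer = u_kranks(worlds, scores, 2)
print(answer)
```

Because each rank is decided independently, the same tuple may in general win more than one rank under this semantics.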

Probabilistic Threshold Top-k (PT(h)) [Hua et al., SIGMOD 2008]
Group by the tuples in the top-h answer sets of the possible worlds.
Find k tuples that are in the top-h answer sets of possible worlds with the highest probabilities.
[Figure: probabilistic database, its possible worlds, the top-h answer set of each world, and the PT(h) answers]

Example of PT-k
Given the uncertain database below, k = 2, and threshold = 0.5:

Tuple  Score  P(t)
t1     100    0.4
t2     85     0.5
t3     70     1
t4     60     0.5

Rules (tuples in the same rule are mutually exclusive):
R1 { t1 }
R2 { t2, t4 }
R3 { t3 }

Possible World (W)   Pr(W)
{ t1, t2, t3 }       P(t1)P(t2)P(t3) = 0.2
{ t1, t3, t4 }       P(t1)P(t3)P(t4) = 0.2
{ t2, t3 }           (1-P(t1))P(t2)P(t3) = 0.3
{ t3, t4 }           (1-P(t1))P(t3)P(t4) = 0.3

Probability of each tuple being in the top-2:
Pr[t1] = 0.4
Pr[t2] = 0.5
Pr[t3] = 0.8
Pr[t4] = 0.3

Threshold = 0.5
Final Result: {t2, t3}
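A small sketch of the threshold semantics, again over the hard-coded possible worlds of the example: sum, for each tuple, the probability of the worlds in which it appears among the top k, then keep the tuples that reach the threshold. The function names are assumptions made for this illustration.

```python
scores = {"t1": 100, "t2": 85, "t3": 70, "t4": 60}
# Possible worlds and their probabilities, copied from the example table.
worlds = [
    ({"t1", "t2", "t3"}, 0.2),
    ({"t1", "t3", "t4"}, 0.2),
    ({"t2", "t3"}, 0.3),
    ({"t3", "t4"}, 0.3),
]

def topk_probabilities(worlds, scores, k):
    """Probability that each tuple is among the top k of a possible world."""
    prob = {t: 0.0 for t in scores}
    for world, p in worlds:
        ranked = sorted(world, key=scores.get, reverse=True)
        for t in ranked[:k]:
            prob[t] += p
    return prob

def pt_k(worlds, scores, k, threshold):
    """Tuples whose top-k probability is at least the threshold."""
    prob = topk_probabilities(worlds, scores, k)
    # Small tolerance so that values equal to the threshold are kept
    # despite floating-point rounding.
    return sorted(t for t, p in prob.items() if p >= threshold - 1e-9)

prob = topk_probabilities(worlds, scores, 2)
answer = pt_k(worlds, scores, 2, 0.5)
print(prob, answer)
```

Note that the answer size is driven by the threshold: a higher threshold of 0.6 would leave only t3 in this example.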

Expected Ranks (Exp-Rank) [Cormode et al., ICDE 2009]
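Expected rank of t1: $E[r(t_1)] = \sum_{pw} r_{pw}(t_1) \cdot \Pr(pw)$, where $r_{pw}(t_1)$ is the rank of t1 in possible world pw.
Find the k tuples with the best expected ranks (a numerically smaller expected rank is better).
[Figure: probabilistic database with tuples t1, t2, ... and their alternatives, expanded into possible worlds]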

Expected Score (E-Score) [Cormode et al., ICDE 2009]
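Expected score of t1: $\sum_{pw} \mathrm{score}(t_1) \cdot \Pr(pw)$, where score(t1) is evaluated in possible world pw.
Find the k tuples with the highest expected scores.
[Figure: probabilistic database with tuples t1, t2, ... and their alternatives, expanded into possible worlds]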

Example of Expected Ranks
If a tuple does not appear in a world, its rank is considered to be the last one, that is, one position after the tuples present in that world (|W| + 1).
Given the uncertain database below and k = 2:

Tuple  Score  P(t)
t1     100    0.4
t2     85     0.5
t3     70     1
t4     60     0.5

Rules (tuples in the same rule are mutually exclusive):
R1 { t1 }
R2 { t2, t4 }
R3 { t3 }

Possible World (W)   Pr(W)
{ t1, t2, t3 }       P(t1)P(t2)P(t3) = 0.2
{ t1, t3, t4 }       P(t1)P(t3)P(t4) = 0.2
{ t2, t3 }           (1-P(t1))P(t2)P(t3) = 0.3
{ t3, t4 }           (1-P(t1))P(t3)P(t4) = 0.3

E[R(t1)] = 1×0.2 + 1×0.2 + 3×0.3 + 3×0.3 = 2.2
E[R(t2)] = 2.4
E[R(t3)] = 1.9
E[R(t4)] = 2.9

Final Result: {t3, t1} (the two smallest expected ranks)
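The four expected ranks can be verified with the sketch below. It assumes, as the numbers in the example imply, that a tuple missing from a world W is assigned rank |W| + 1; the possible worlds are hard-coded from the table and the function name is hypothetical.

```python
scores = {"t1": 100, "t2": 85, "t3": 70, "t4": 60}
# Possible worlds and their probabilities, copied from the example table.
worlds = [
    ({"t1", "t2", "t3"}, 0.2),
    ({"t1", "t3", "t4"}, 0.2),
    ({"t2", "t3"}, 0.3),
    ({"t3", "t4"}, 0.3),
]

def expected_ranks(worlds, scores):
    """Expected 1-based rank of every tuple over all possible worlds."""
    exp = {t: 0.0 for t in scores}
    for world, p in worlds:
        ranked = sorted(world, key=scores.get, reverse=True)
        position = {t: j for j, t in enumerate(ranked, start=1)}
        last = len(world) + 1  # assumed rank of a tuple absent from this world
        for t in scores:
            exp[t] += p * position.get(t, last)
    return exp

exp = expected_ranks(worlds, scores)
# Smaller expected rank is better, so sort in ascending order.
answer = sorted(exp, key=exp.get)[:2]
print(exp, answer)
```

Unlike U-Topk and U-kRanks, this semantics can return t1 and t3 together even though {t1, t3} is not the most probable top-2 vector.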

Unified Ranking Functions
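Parameterized Ranking Function (PRF):
$g_w(t) = \sum_{i > 0} w(t, i) \cdot \Pr(r(t) = i)$
where $w(t, i)$ is the weight function and $r(t)$ is the rank of tuple t in a possible world.
A probabilistic top-k query returns the k tuples with the highest $|g_w(t)|$ values.
Reference: Li, J., Deshpande, A. A Unified Approach to Ranking in Probabilistic Databases. In VLDB, 2009.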

Unified Ranking Functions (cont'd)
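Special cases of the weight function:
- When $w(t, i) = 1$, the result is the set of k tuples with the highest existence probability
- When $w(t, i) = \mathrm{score}(t)$, the result is E-Score
- When $w(t, i) = 1$ for $i \le h$ and $0$ otherwise, the result is PT(h)
- When $w(t, i) = 1$ for $i = j$ and $0$ otherwise, the result is the rank-j answer of U-Rank
- PRF cannot simulate U-Topk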
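These special cases can be tried out on the running four-tuple example. The sketch assumes the PRF value of a tuple t is the weighted sum of w(t, i) · Pr(r(t) = i) over ranks i, following Li and Deshpande; the rank distribution is computed by brute force from the hard-coded possible worlds, and all function names are hypothetical.

```python
scores = {"t1": 100, "t2": 85, "t3": 70, "t4": 60}
# Possible worlds and their probabilities from the running example.
worlds = [
    ({"t1", "t2", "t3"}, 0.2),
    ({"t1", "t3", "t4"}, 0.2),
    ({"t2", "t3"}, 0.3),
    ({"t3", "t4"}, 0.3),
]

def rank_distribution(worlds, scores):
    """dist[t][i] = Pr(r(t) = i), with 1-based ranks; absent tuples get no rank."""
    dist = {t: {} for t in scores}
    for world, p in worlds:
        ranked = sorted(world, key=scores.get, reverse=True)
        for i, t in enumerate(ranked, start=1):
            dist[t][i] = dist[t].get(i, 0.0) + p
    return dist

def prf(dist, w):
    """PRF value of each tuple for weight function w(t, i) (assumed form)."""
    return {t: sum(w(t, i) * p for i, p in ranks.items())
            for t, ranks in dist.items()}

def top_k(values, k):
    """The k tuples with the largest absolute PRF values."""
    return sorted(values, key=lambda t: abs(values[t]), reverse=True)[:k]

dist = rank_distribution(worlds, scores)
existence = prf(dist, lambda t, i: 1.0)                    # w = 1
escore = prf(dist, lambda t, i: scores[t])                 # w = score(t)
pt2 = prf(dist, lambda t, i: 1.0 if i <= 2 else 0.0)       # PT(h) with h = 2
rank2 = prf(dist, lambda t, i: 1.0 if i == 2 else 0.0)     # U-Rank, j = 2
print(top_k(escore, 2), pt2, max(rank2, key=rank2.get))
```

Only the weight function changes between the four calls, which is the point of the unified formulation.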

Unified Ranking Functions (cont'd)
Two new semantics: $\mathrm{PRF}^w(h)$ and $\mathrm{PRF}^e(\alpha)$
- $\mathrm{PRF}^w(h)$: $w(t, i) = w_i$ for $i \le h$, and $w(t, i) = 0$ for $i > h$
- $\mathrm{PRF}^e(\alpha)$: $w(t, i) = \alpha^i$, where $\alpha$ can be a real or complex number

Ranking Algorithms
Assuming tuple independence:
- Compute the probability that a tuple $t_i$ has the j-th rank
- Observation: the coefficient $c_j$ of $x^j$ in a generating function $F^i(x)$ is exactly the probability that $t_i$ is at rank j

Example
Incremental computation of $F^i(x)$: consider the rank of a tuple t3.
[Figure: worked generating-function example for t3, including the term 0.4x]
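The incremental computation can be sketched in a few lines. The sketch assumes the generating function has the product form used by Li and Deshpande for independent tuples sorted by decreasing score: F^i(x) is the product of (1 - p_j + p_j·x) over all higher-scored tuples t_j, multiplied by p_i·x. The probabilities 0.5, 0.6, 0.4 are hypothetical and are not the values of the slide's lost figure.

```python
def rank_polynomials(probs):
    """Coefficient lists of F^i(x) for independent tuples.

    probs: existence probabilities, ordered by decreasing score.
    Returns a list F where F[i][j] is the coefficient of x^j in the
    generating function of the (i+1)-th tuple, i.e. the probability
    that this tuple is at rank j.
    """
    prefix = [1.0]  # coefficients of the product over higher-scored tuples
    result = []
    for p in probs:
        # F^i(x) = prefix(x) * p * x: shift by one degree and scale by p.
        result.append([0.0] + [c * p for c in prefix])
        # Extend the prefix product with the factor (1 - p + p*x),
        # reusing the work done for the previous tuples.
        nxt = [0.0] * (len(prefix) + 1)
        for d, c in enumerate(prefix):
            nxt[d] += c * (1.0 - p)
            nxt[d + 1] += c * p
        prefix = nxt
    return result

# Hypothetical probabilities for three tuples t1, t2, t3 (by decreasing score).
F = rank_polynomials([0.5, 0.6, 0.4])
print(F[2])  # coefficients of F^3(x): index j holds Pr(t3 is at rank j)
```

Each tuple costs one polynomial-by-linear-factor multiplication, so all rank distributions are obtained in time quadratic in the number of tuples instead of enumerating possible worlds.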

Ranking Algorithms (cont'd)
Assuming a correlated database represented by an and/xor tree:
- Generating functions are computed on the and/xor tree
- Observation: the coefficient $c_j$ of the term $x^{j-1}y$ is $\Pr(r(t_i) = j)$
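A rough sketch of this idea on the running example follows. It assumes the construction described by Li and Deshpande: the leaf of the target tuple is labeled y, leaves of higher-scored tuples are labeled x, all other leaves are 1; an and node multiplies its children, and a xor node takes the probability-weighted sum of its children plus the leftover probability as a constant. The tuple-based tree encoding and the function names are invented for this illustration.

```python
from collections import defaultdict

def poly_mul(a, b):
    """Multiply bivariate polynomials stored as {(deg_x, deg_y): coeff}."""
    out = defaultdict(float)
    for (ax, ay), ca in a.items():
        for (bx, by), cb in b.items():
            out[(ax + bx, ay + by)] += ca * cb
    return dict(out)

def poly_add_scaled(a, b, scale):
    """Return a + scale * b."""
    out = defaultdict(float, a)
    for key, c in b.items():
        out[key] += scale * c
    return dict(out)

def gen_func(node, target, scores):
    """Generating function of an and/xor tree for the given target tuple."""
    kind = node[0]
    if kind == "leaf":
        t = node[1]
        if t == target:
            return {(0, 1): 1.0}   # y marks the target tuple
        if scores[t] > scores[target]:
            return {(1, 0): 1.0}   # x marks a higher-scored tuple
        return {(0, 0): 1.0}       # lower-scored tuples do not affect the rank
    if kind == "and":
        result = {(0, 0): 1.0}
        for child in node[1]:
            result = poly_mul(result, gen_func(child, target, scores))
        return result
    # xor node: children are (probability, subtree) pairs, at most one occurs.
    result = {(0, 0): 1.0 - sum(p for p, _ in node[1])}
    for p, child in node[1]:
        result = poly_add_scaled(result, gen_func(child, target, scores), p)
    return result

scores = {"t1": 100, "t2": 85, "t3": 70, "t4": 60}
# Running example: one xor node per generation rule under an and root.
tree = ("and", [
    ("xor", [(0.4, ("leaf", "t1"))]),
    ("xor", [(0.5, ("leaf", "t2")), (0.5, ("leaf", "t4"))]),
    ("xor", [(1.0, ("leaf", "t3"))]),
])

F = gen_func(tree, "t3", scores)
# The coefficient of x^(j-1) * y is Pr(r(t3) = j).
rank_prob = {dx + 1: c for (dx, dy), c in F.items() if dy == 1}
print(rank_prob)
```

For t3 this yields rank probabilities 0.3, 0.5 and 0.2 for ranks 1, 2 and 3, which agrees with what a direct enumeration of the four possible worlds gives.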

Summary
Probabilistic top-k query:
- Different semantics w.r.t. ranks and probabilities in possible worlds
- A unified approach
