Download presentation
Presentation is loading. Please wait.
1
1 Probabilistic/Uncertain Data Management Slides based on the Suciu/Dalvi SIGMOD’05 tutorial 1.Dalvi, Suciu. “Efficient query evaluation on probabilistic databases”, VLDB’2004. 2.Das Sarma et al. “Working models for uncertain data”, ICDE’2006.
2
2 What is a Probabilistic Database ? “An item belongs to the database” is a probabilistic event –Tuple-existence uncertainty –Attribute-value uncertainty “A tuple is an answer to the query” is a probabilistic event Can be extended to all data models; we discuss only probabilistic relational data
3
3 Possible Worlds Semantics int, char(30), varchar(55), datetime Employee(name:varchar(55), dob:datetime, salary:int) Attribute domains: Relational schema: # values: 2 32, 2 120, 2 440, 2 64 # of tuples: 2 440 £ 2 64 £ 2 23 # of instances: 2 2 440 £ 2 64 £ 2 23 Employee(...), Projects(... ), Groups(...), WorksFor(...) Database schema: # of instances: N (= BIG but finite)
4
4 The Definition The set of all possible database instances: INST = {I 1, I 2, I 3,..., I N } Definition A probabilistic database I p is a probability distribution on INST s.t. i=1,N Pr(I i ) = 1 Pr : INST ! [0,1] Definition A possible world is I s.t. Pr(I) > 0 will use Pr or I p interchangeably
5
5 Query Semantics Given a query Q and a probabilistic database I p, what is the meaning of Q(I p ) ?
6
6 Query Semantics Semantics 1: Possible Answers A probability distribution on sets of tuples 8 A. Pr(Q = A) = I 2 INST. Q(I) = A Pr(I) Semantics 2: Possible Tuples A probability function on tuples 8 t. Pr(t 2 Q) = I 2 INST. t 2 Q(I) Pr(I)
7
7 Example: Query Semantics NameCityProduct JohnSeattleGizmo JohnSeattleCamera SueDenverGizmo SueDenverCamera Pr(I 1 ) = 1/3 NameCityProduct JohnBostonGizmo SueDenverGizmo SueSeattleGadget NameCityProduct JohnSeattleGizmo JohnSeattleCamera SueSeattleCamera NameCityProduct JohnBostonCamera SueSeattleCamera Pr(I 2 ) = 1/12 Pr(I 3 ) = 1/2 Pr(I 4 ) = 1/12 SELECT DISTINCT x.product FROM Purchase p x, Purchase p y WHERE x.name = 'John' and x.product = y.product and y.name = 'Sue' SELECT DISTINCT x.product FROM Purchase p x, Purchase p y WHERE x.name = 'John' and x.product = y.product and y.name = 'Sue' Possible answers semantics: Answer setProbability Gizmo, Camera1/3Pr(I 1 ) Gizmo1/12Pr(I 2 ) Camera7/12P(I 3 ) + P(I 4 ) TupleProbability Camera11/12Pr(I 1 )+P(I 3 ) + P(I 4 ) Gizmo5/12Pr(I 1 )+Pr(I 2 ) Possible tuples semantics: Purchase p
8
8 Possible Worlds Query Semantics Possible answers semantics Precise Can be used to compose queries Difficult user interface Possible tuples semantics Less precise, but simple; sufficient for most apps Cannot be used to compose queries Simple user interface
9
9 Possible Worlds Semantics: Summary Complete model; Clean formal semantics for SQL queries Not very useful as a representation or implementation tool HUGE number of possible worlds! Need more effective representation formalisms Something that users can understand/explore Allow more efficient query execution –Avoid “possible worlds explosion” Perhaps giving up completeness
10
10 Representation Formalisms Problem Need a good representation formalism Will be interpreted as possible worlds Several formalisms exists, but no winner Main open problem in probabilistic db
11
11 Evaluation of Formalisms Completeness? What possible worlds can it represent? What probability distributions on worlds? Closure? Is it closed under evaluation of query operators?
12
12 Outline A complete formalism: Intensional Databases Incomplete formalisms: Various expressibility/complexity tradeoffs Focus on Explicit Independent Tuples
13
13 Intensional Database [Fuhr&Roelleke:1997] Atomic event ids Intensional probabilistic database J: each tuple t has an event attribute t.E e 1, e 2, e 3, … p 1, p 2, p 3, … 2 [0,1] e 3 Æ (e 5 Ç : e 2 ) Probabilities: Event expressions: Æ, Ç, :
14
14 Intensional DB ) Possible Worlds NameAddressE JohnSeattle e 1 Æ (e 2 Ç e 3 ) SueDenver (e 1 Æ e 2 ) Ç (e 2 Æ e 3 ) J = IpIp e1e2e3=e1e2e3= 000001010011100101110111 JohnSeattle SueDenver JohnSeattleSueDenver ; (1-p 1 )(1-p 2 )(1-p 3 ) +(1-p 1 )(1-p 2 )p 3 +(1-p 1 )p 2 (1-p 3 ) +p 1 (1-p 2 )(1-p 3 p 1 (1-p 2 ) p 3 (1-p 1 )p 2 p 3 p 1 p 2 (1-p 3 ) +p 1 p 2 p 3
15
15 Possible Worlds ) Intensional DB NameAddressE JohnSeattle E 1 Ç E 2 JohnBoston E 1 Ç E 4 SueSeattle E 1 Ç E 2 Ç E 3 E 1 = e 1 E 2 = : e 1 Æ e 2 E 3 = : e 1 Æ : e 2 Æ e 3 E 4 = : e 1 Æ : e 2 Æ : e 3 Æ e 4 “Prefix code” Pr(e 1 ) = p 1 Pr(e 2 ) = p 2 /(1-p 1 ) Pr(e 3 ) = p 3 /(1-p 1 -p 2 ) Pr(e 4 ) = p 4 /(1-p 1 -p 2 -p 3 ) J = Intensional DBs are complete NameAddress JohnSeattle JohnBoston SueSeattle NameAddress JohnSeattle SueSeattle NameAddress SueSeattle NameAddress JohnBoston p1p1 p2p2 p3p3 p4p4 =I p
16
16 Closure Under Operators vEvE £ v1v1 E1E1 v1v1 v2v2 E1 ÆE2E1 ÆE2 v2v2 E2E2 vE1E1 vE2E2 …… v E 1 Ç E 2 Ç.. One still needs to compute probability of event expression [Fuhr&Roelleke:1997] - vE1E1 v E 1 Æ: E 2 vE2E2
17
17 Summary on Intensional Databases Event expression for each tuple Possible worlds: any subset Probability distribution: any Complete… but impractical Evaluate the probability of long event expressions Important abstraction: consider restrictions Related to c-tables [Imilelinski&Lipski:1984]
18
18 Restricted Formalisms Explicit tuples Have a tuple template for every tuple that may appear in a possible world Focus on the case of independent tuple events
19
19 Explicit Independent Tuples NameCityEpr JohnSeattlee1e1 0.8 SueBostone2e2 0.6 FredBostone3e3 0.9 0 0 independent Atomic, distinct. May use TIDs. tuple = independent event Can be easily extended to capture attribute-value uncertainty
20
20 Explicit Independent Tuples Pr(I) = t 2 I pr(t) £ t I (1-pr(t)) No restrictions pr : TUP ! [0,1] Tuple independent probabilistic database INST = P (TUP) N = 2 M TUP = {t 1, t 2, …, t M } = all tuples
21
21 Tuple Prob. ) Possible Worlds NameCitypr JohnSeattlep 1 = 0.8 SueBostonp 2 = 0.6 FredBostonp 3 = 0.9 I p = NameCity JohnSeattl SueBosto FredBosto NameCity SueBosto FredBosto NameCity JohnSeattl FredBosto NameCity JohnSeattl SueBosto NameCity FredBosto NameCity SueBosto NameCity JohnSeattl ; I1I1 (1-p 1 ) (1-p 2 ) (1-p 3 ) I2I2 p 1 (1-p 2 )(1-p 3 ) I3I3 (1-p 1 )p 2 (1-p 3 ) I4I4 (1-p 1 )(1-p 2 )p 3 I5I5 p 1 p 2 (1-p 3 ) I6I6 p 1 (1-p 2 )p 3 I7I7 (1-p 1 )p 2 p 3 I8I8 p1p2p3p1p2p3 = 1 J = E[ size(I p ) ] = 2.3 tuples
22
22 Tuple-Independent DBs are Incomplete NameAddresspr JohnSeattlep1p1 SueSeattlep2p2 NameAddress JohnSeattle SueSeattle NameAddress JohnSeattle p1p1 p1p2p1p2 =I p ; 1-p 1 - p 1 p 2 Very limited – cannot capture correlations across tuples Not Closed Query operators can introduce complex correlations!
23
23 Tuple Prob. ) Query Evaluation NameCitypr JohnSeattlep1p1 SueBostonp2p2 FredBostonp3p3 CustomerProductDatepr JohnGizmo...q1q1 JohnGadget...q2q2 JohnGadget...q3q3 SueCamera...q4q4 SueGadget...q5q5 SueGadget...q6q6 FredGadget...q7q7 SELECT DISTINCT x.city FROM Person x, Purchase y WHERE x.Name = y.Customer and y.Product = ‘Gadget’ SELECT DISTINCT x.city FROM Person x, Purchase y WHERE x.Name = y.Customer and y.Product = ‘Gadget’ TupleProbability Seattle Boston 1-(1-q 2 )(1-q 3 )p 1 ( ) 1- (1- ) £ (1 - ) p 2 ( )1-(1-q 5 )(1-q 6 ) p 3 q 7
24
24 Application 1: Similarity Predicates NameCityProfession JohnSeattlestatistician SueBostonmusician FredBostonphysicist CustProductCategory JohnGizmodishware JohnGadgetinstrument JohnGadgetinstrument SueCameramusicware SueGadgetmicrophone SueGadgetinstrument FredGadgetmicrophone SELECT DISTINCT x.city FROM Person x, Purchase y WHERE x.Name = y.Cust and y.Product = ‘Gadget’ and x.profession ~ ‘scientist’ and y.category ~ ‘music’ SELECT DISTINCT x.city FROM Person x, Purchase y WHERE x.Name = y.Cust and y.Product = ‘Gadget’ and x.profession ~ ‘scientist’ and y.category ~ ‘music’ Step 1: evaluate ~ predicates
25
25 Application 1: Similarity Predicates NameCityProfessionpr JohnSeattlestatisticianp 1 =0.8 SueBostonmusicianp 2 =0.2 FredBostonphysicistp 3 =0.9 CustProductCategorypr JohnGizmodishwareq 1 =0.2 JohnGadgetinstrumentq 2 =0.6 JohnGadgetinstrumentq 3 =0.6 SueCameramusicwareq 4 =0.9 SueGadgetmicrophoneq 5 =0.7 SueGadgetinstrumentq 6 =0.6 FredGadgetmicrophoneq 7 =0.7 SELECT DISTINCT x.city FROM Person p x, Purchase p y WHERE x.Name = y.Cust and y.Product = ‘Gadget’ and x.profession ~ ‘scientist’ and y.category ~ ‘music’ SELECT DISTINCT x.city FROM Person p x, Purchase p y WHERE x.Name = y.Cust and y.Product = ‘Gadget’ and x.profession ~ ‘scientist’ and y.category ~ ‘music’ TupleProbability Seattlep 1 (1-(1-q 2 )(1-q 3 )) Boston 1-(1-p 2 (1-(1-q 5 )(1-q 6 ))) £ (1-p 3 q 7 ) Step 1: evaluate ~ predicates Step 2: evaluate rest of query
26
26 Summary on Explicit Independent Tuples Independent tuples Possible worlds: subsets Probability distribution: restricted Closure: no
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.