Download presentation
Presentation is loading. Please wait.
Published byDerrick Dickerson Modified over 9 years ago
1
1 Special Cases: Query Semantics: (“Marginal Probabilities”) Run query Q against each instance D i ; for each answer tuple t, sum up the probabilities of all instances D i where t exists. A probabilistic database D p (compactly) encodes a probability distribution over a finite set of deterministic database instances D i. D 1 : 0.42D 2 : 0.18D 3 : 0.28D 4 : 0.12 (1) D p tuple-independent(II) D p block-independent Note: (I) and (II) are not equivalent! Probabilistic Database
2
Soft Rules vs. Hard Rules (Soft) Deduction Rules vs. (Hard) Consistency Constraints PPeople may live in more than one place livesIn(x,y) marriedTo(x,z) livesIn(z,y) livesIn(x,y) hasChild(x,z) livesIn(z,y) PPeople are not born in different places/on different dates bornIn(x,y) bornIn(x,z) y=z bornOn(x,y) bornOn(x,z) y=z PPeople are not married to more than one person (at the same time, in most countries?) marriedTo(x,y,t 1 ) marriedTo(x,z,t 2 ) y≠z disjoint(t 1,t 2 ) [0.8] [0.5]
3
Basic Types of Inference MAP Inference Find the most likely assignment to query variables y under a given evidence x. Compute: arg max y P( y | x) (NP-complete for MaxSAT) Marginal/Success Probabilities Probability that query y is true in a random world under a given evidence x. Compute: ∑ y P( y | x ) (#P-complete already for conjunctive queries)
4
[Yahya,Theobald: RuleML’11 Dylla,Miliaraki,Theobald: ICDE’13] Deductive Grounding with Lineage (SLD Resolution in Datalog/Prolog) \/ /\ graduatedFrom (Surajit, Princeton) [0.7] graduatedFrom (Surajit, Princeton) [0.7] hasAdvisor (Surajit,Jeff )[0.8] hasAdvisor (Surajit,Jeff )[0.8] worksAt (Jeff,Stanford )[0.9] worksAt (Jeff,Stanford )[0.9] graduatedFrom (Surajit, Stanford) [0.6] graduatedFrom (Surajit, Stanford) [0.6] Query graduatedFrom(Surajit, y) Query graduatedFrom(Surajit, y) CD AB A (B (C D)) A (B (C D)) graduatedFrom (Surajit, Princeton) graduatedFrom (Surajit, Princeton) graduatedFrom (Surajit, Stanford) graduatedFrom (Surajit, Stanford) Q1Q1 Q2Q2 Rules hasAdvisor(x,y) worksAt(y,z) graduatedFrom(x,z) [0.4] graduatedFrom(x,y) graduatedFrom(x,z) y=z Rules hasAdvisor(x,y) worksAt(y,z) graduatedFrom(x,z) [0.4] graduatedFrom(x,y) graduatedFrom(x,z) y=z Base Facts graduatedFrom(Surajit, Princeton) [0.7] graduatedFrom(Surajit, Stanford) [0.6] graduatedFrom(David, Princeton) [0.9] hasAdvisor(Surajit, Jeff) [0.8] hasAdvisor(David, Jeff) [0.7] worksAt(Jeff, Stanford) [0.9] type(Princeton, University) [1.0] type(Stanford, University) [1.0] type(Jeff, Computer_Scientist) [1.0] type(Surajit, Computer_Scientist) [1.0] type(David, Computer_Scientist) [1.0] Base Facts graduatedFrom(Surajit, Princeton) [0.7] graduatedFrom(Surajit, Stanford) [0.6] graduatedFrom(David, Princeton) [0.9] hasAdvisor(Surajit, Jeff) [0.8] hasAdvisor(David, Jeff) [0.7] worksAt(Jeff, Stanford) [0.9] type(Princeton, University) [1.0] type(Stanford, University) [1.0] type(Jeff, Computer_Scientist) [1.0] type(Surajit, Computer_Scientist) [1.0] type(David, Computer_Scientist) [1.0]
5
Lineage & Possible Worlds 1) Deductive Grounding Dependency graph of the query Trace lineage of individual query answers 2) Lineage DAG (not in CNF), consisting of Grounded soft & hard rules Probabilistic base facts 3) Probabilistic Inference Compute marginals: P(Q): sum up the probabilities of all possible worlds that entail the query answers’ lineage P(Q|H): drop “impossible worlds” \/ /\ graduatedFrom (Surajit, Princeton) [0.7] graduatedFrom (Surajit, Princeton) [0.7] hasAdvisor (Surajit,Jeff )[0.8] hasAdvisor (Surajit,Jeff )[0.8] worksAt (Jeff,Stanford )[0.9] worksAt (Jeff,Stanford )[0.9] graduatedFrom (Surajit, Stanford) [0.6] graduatedFrom (Surajit, Stanford) [0.6] Query graduatedFrom(Surajit, y) Query graduatedFrom(Surajit, y) 0.7x(1-0.888)=0.078(1-0.7)x0.888=0.266 1-(1-0.72)x(1-0.6) =0.888 0.8x0.9 =0.72 CD AB A (B (C D)) A (B (C D)) graduatedFrom (Surajit, Princeton) graduatedFrom (Surajit, Princeton) graduatedFrom (Surajit, Stanford) graduatedFrom (Surajit, Stanford) Q1Q1 Q2Q2 [Das Sarma,Theobald,Widom: ICDE’08 Dylla,Miliaraki,Theobald: ICDE’13]
6
Possible Worlds Semantics 1.0 0.2664 0.412 P(Q 2 )=0.2664 P(Q 2 |H)=0.2664 / 0.412 = 0.6466 P(Q 1 )=0.0784P(Q 1 |H)=0.0784 / 0.412 = 0.1903 0.0784 Hard rule H: A (B (C D))
7
Basic Complexity Issue Theorem [Valiant:1979] For a Boolean expression E, computing P(E) is #P-complete. NP = class of problems of the form “is there a witness ?” SAT #P = class of problems of the form “how many witnesses ?” #SAT NP = class of problems of the form “is there a witness ?” SAT #P = class of problems of the form “how many witnesses ?” #SAT The decision problem for 2CNF is in PTIME. The counting problem for 2CNF is already #P-complete. [Suciu & Dalvi: SIGMOD’05 Tutorial on "Foundations of Probabilistic Answers to Queries"]
8
Tuple-Independent Probabilistic Database Theorem: The query answering problem for the above join is #P-hard. A probabilistic database D p (compactly) encodes a probability distribution over a finite set of deterministic database instances D i. Probabilistic Database Which professors work at a university that is located in CA?
9
Probabilistic & Temporal Database Sequenced Semantics & Snapshot Reducibility: Built-in semantics: reduce temporal-relational operators to their non- temporal counterparts at each snapshot (i.e., time point) of the database. Coalesce/split tuples with consecutive time intervals based on their lineages. Non-Sequenced Semantics Queries can freely manipulate timestamps just like regular attributes. Single temporal operator ≤ T supports all of Allen’s 13 temporal relations. Deduplicate tuples with overlapping time intervals based on their lineages. A temporal-probabilistic database D Tp (compactly) encodes a probability distribution over a finite set of deterministic database instances D i at each time point of a finite time domain T. [Dignös, Gamper, Böhlen: SIGMOD’12] [Dylla,Miliaraki,Theobald: PVLDB’13
10
Temporal Alignment & Deduplication Example Non-Sequenced Semantics: f1f1 193619761988 f 2 ¬f 3 f 2 f 3 f 1 ¬f 3 f 1 f 3 (f 1 f 3 ) (f 1 ¬f 3 ) (f 1 f 3 ) (f 1 ¬f 3 ) (f 2 f 3 ) (f 2 ¬f 3 ) (f 1 f 3 ) (f 2 ¬f 3 ) TT marriedTo(x,y)[t b1,T max ) wedding(x,y)[t b1,t e1 ) ¬divorce(x,y)[t b2,t e2 ) marriedTo(x,y)[t b1,t e2 ) wedding(x,y)[t b1,t e1 ) divorce(x,y)[t b2,t e2 ) t e1 ≤ T t b2 Base Facts Derived Facts Dedupl. Facts wedding(DeNiro,Abbott) divorce(DeNiro,Abbott) T max f2f2 f3f3 T min
11
0.08 0.12 0.16 0.4 0.6 ‘03 ‘05‘07 playsFor(Beckham, Real, T 1 ) Base Facts Derived Facts 0.2 0.1 0.4 ‘05‘00‘02‘07 playsFor(Ronaldo, Real, T 2 ) ‘04 ‘03‘04 ‘07 ‘05 playsFor(Beckham, Real, T 1 ) playsFor(Ronaldo, Real, T 2 ) overlaps(T 1, T 2, T 3 ) t 3 teamMates(Beckham, Ronaldo, t 3 ) teamMates(Beckham, Ronaldo, T 3 ) Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13] Example using the Allen predicate overlaps
12
0.4 0.6 ‘03 ‘05‘07 playsFor(Beckham, Real, T 1 ) Base Facts Derived Facts playsFor(Ronaldo, Real, T 2 ) 0.2 0.1 ‘05‘00‘02‘07 ‘04 0.4 0.08 0.12 0.16 ‘03‘04 ‘07 ‘05 playsFor(Zidane, Real, T 3 ) teamMates(Beckham, Zidane, T 5 ) teamMates(Ronaldo, Zidane, T 6 ) teamMates(Beckham, Ronaldo, T 4 ) Non-independent Independent Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13]
13
playsFor(Beckham, Real, T 1 ) Base Facts Derived Facts playsFor(Ronaldo, Real, T 2 ) playsFor(Zidane, Real, T 3 ) teamMates(Beckham, Zidane, T 5 ) teamMates(Ronaldo, Zidane, T 6 ) Non-independent Independent Closed and complete representation model (incl. lineage) Temporal alignment is polyn. in the number of input intervals Confidence computation per interval remains #P-hard In general requires Monte Carlo approximations (Luby-Karp for DNF, MCMC-style sampling), decompositions, or top-k pruning Closed and complete representation model (incl. lineage) Temporal alignment is polyn. in the number of input intervals Confidence computation per interval remains #P-hard In general requires Monte Carlo approximations (Luby-Karp for DNF, MCMC-style sampling), decompositions, or top-k pruning teamMates(Beckham, Ronaldo, T 4 ) Need Lineage! Inference in Probabilistic-Temporal Databases [Wang,Yahya,Theobald: MUD’10; Dylla,Miliaraki,Theobald: PVLDB’13]
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.