A COURSE ON PROBABILISTIC DATABASES
Dan Suciu, University of Washington
June, 2014

Outline
1. Motivating Applications
2. The Probabilistic Data Model (Chapter 2)
3. Extensional Query Plans
4. The Complexity of Query Evaluation (Chapter 3)
5. Extensional Evaluation
6. Intensional Evaluation (Chapter 5)
7. Conclusions
(The slide groups these items into Parts 1 to 4.)

Overview
- Review: Unions of Conjunctive Queries (UCQ)
- Four simple rules for evaluating queries Q
- Big Dichotomy Theorem:
  1. If the rules succeed → Q is safe → in PTIME
  2. If the rules fail → Q is unsafe → #P-complete
- Compare to the Small Dichotomy Theorem, which applies only to conjunctive queries w/o self-joins:
  - Case 1 holds precisely when Q is hierarchical
  - Case 2 holds precisely when Q is not hierarchical

Review: Unions of Conjunctive Queries

Owners of items in either "Office444" or "Hall7":

Q(z) = Owner(z,x1), Location(x1,t1,"Office444") ∨ Owner(z,x2), Location(x2,t2,"Hall7")

Same as (a union of conjunctive queries):

Q(z) = ∃x1 ∃t1 (Owner(z,x1) ∧ Location(x1,t1,"Office444")) ∨ ∃x2 ∃t2 (Owner(z,x2) ∧ Location(x2,t2,"Hall7"))

Same as:

Q(z) = Owner(z,x) ∧ ∃t [Location(x,t,"Office444") ∨ Location(x,t,"Hall7")]

We will use these laws:
1. Distributivity law for ∨, ∧
2. Commutativity law for ∃, ∨: (∃x P(x)) ∨ (∃y T(y)) = ∃z (P(z) ∨ T(z))

Four Rules for Computing Query Probabilities
- Independent join
- Independent project
- Independent union
- Inclusion/exclusion
Rules apply to Boolean queries only.

Rule 1: Independent Join
P(Q1 ∧ Q2) = P(Q1) P(Q2)
If Q1 and Q2 are independent (meaning: no common atoms)

Rule 2: Independent Project
P(∃z Q) = 1 - Π_{a ∈ Domain} (1 - P(Q[a/z]))
If z is a "separator variable" in Q, meaning that for any two distinct constants a, b, Q[a/z] and Q[b/z] are independent

Rule 3: Independent Union
P(Q1 ∨ Q2) = 1 - (1 - P(Q1))(1 - P(Q2))
If Q1 and Q2 are independent (meaning: no common atoms)
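As a minimal sketch, Rules 1 to 3 can be written as small Python functions over plain probabilities. The function names are illustrative (they do not come from the course), and the caller is assumed to have already checked the independence side conditions.

```python
import math

def independent_join(p1: float, p2: float) -> float:
    """Rule 1: P(Q1 AND Q2) when Q1 and Q2 are independent."""
    return p1 * p2

def independent_project(probs) -> float:
    """Rule 2: P(exists z. Q), given P(Q[a/z]) for every constant a,
    assuming z is a separator variable."""
    return 1.0 - math.prod(1.0 - p for p in probs)

def independent_union(p1: float, p2: float) -> float:
    """Rule 3: P(Q1 OR Q2) when Q1 and Q2 are independent."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

if __name__ == "__main__":
    print(independent_join(0.5, 0.4))
    print(independent_project([0.5, 0.5, 0.5]))
    print(independent_union(0.5, 0.5))
```

An empty domain gives `independent_project([]) == 0.0`, which matches the intuition that an existential over nothing cannot hold.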

Example

Q_U = R(x1),S(x1,y1) ∨ T(x2),S(x2,y2)
    = ∃x1 ∃y1 R(x1) ∧ S(x1,y1) ∨ ∃x2 ∃y2 T(x2) ∧ S(x2,y2)

Commute ∃ with ∨:
Q_U = ∃z [R(z) ∧ S(z,y1) ∨ T(z) ∧ S(z,y2)]

Independent project: for a ≠ b, Q_U[a/z] and Q_U[b/z] are independent, because the atoms R(a), S(a,y1), T(a), S(a,y2) are distinct from R(b), S(b,y1), T(b), S(b,y2):
P(Q_U) = 1 - Π_{a ∈ Domain} (1 - P[R(a) ∧ S(a,y1) ∨ T(a) ∧ S(a,y2)])

Distribute ∧ over ∨:
P(Q_U) = 1 - Π_{a ∈ Domain} (1 - P[(R(a) ∨ T(a)) ∧ ∃y. S(a,y)])

Independent join:
P(Q_U) = 1 - Π_{a ∈ Domain} (1 - P[R(a) ∨ T(a)] P[∃y. S(a,y)])

Independent union on R(a) ∨ T(a), independent project on y:
P(Q_U) = 1 - Π_{a ∈ Domain} (1 - (1 - (1 - P[R(a)])(1 - P[T(a)])) (1 - Π_{b ∈ Domain} (1 - P[S(a,b)])))
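The closed formula for P(Q_U) can be checked numerically. The sketch below evaluates it on a small tuple-independent database and compares the result with a brute-force sum over all possible worlds. The database contents and the function names are hypothetical, chosen only for illustration.

```python
import itertools
import math

# Hypothetical tuple-independent database: tuple -> marginal probability.
R = {"a": 0.5, "b": 0.3}
T = {"a": 0.2, "b": 0.9}
S = {("a", "c"): 0.4, ("a", "d"): 0.7, ("b", "c"): 0.6}

def safe_plan_qu(R, T, S):
    """P(Q_U) via the closed formula from the example."""
    domain = set(R) | set(T) | {x for x, _ in S}
    product = 1.0
    for a in domain:
        # Independent union: P[R(a) or T(a)]
        p_rt = 1.0 - (1.0 - R.get(a, 0.0)) * (1.0 - T.get(a, 0.0))
        # Independent project on y: P[exists y. S(a, y)]
        p_s = 1.0 - math.prod(1.0 - p for (x, _), p in S.items() if x == a)
        # Independent join, then the factor of the outer product
        product *= 1.0 - p_rt * p_s
    return 1.0 - product

def brute_force_qu(R, T, S):
    """P(Q_U) as the total weight of the possible worlds where Q_U holds."""
    facts = ([("R", k, p) for k, p in R.items()]
             + [("T", k, p) for k, p in T.items()]
             + [("S", k, p) for k, p in S.items()])
    total = 0.0
    for world in itertools.product([False, True], repeat=len(facts)):
        weight = 1.0
        present = set()
        for (rel, key, p), keep in zip(facts, world):
            weight *= p if keep else 1.0 - p
            if keep:
                present.add((rel, key))
        holds = any(
            ("S", (x, y)) in present
            and (("R", x) in present or ("T", x) in present)
            for (x, y) in S
        )
        if holds:
            total += weight
    return total

if __name__ == "__main__":
    print(safe_plan_qu(R, T, S), brute_force_qu(R, T, S))
```

The brute-force check enumerates 2^n worlds for n tuples, so it is only usable on toy inputs; the safe plan runs in time polynomial in the size of the database.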

Rule 4: Inclusion-Exclusion

P(Q1 ∧ Q2 ∧ Q3) = P(Q1) + P(Q2) + P(Q3) - P(Q1 ∨ Q2) - P(Q1 ∨ Q3) - P(Q2 ∨ Q3) + P(Q1 ∨ Q2 ∨ Q3)

Note: this is the dual of the more popular formula:

P(Q1 ∨ Q2 ∨ Q3) = P(Q1) + P(Q2) + P(Q3) - P(Q1 ∧ Q2) - P(Q1 ∧ Q3) - P(Q2 ∧ Q3) + P(Q1 ∧ Q2 ∧ Q3)
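The two-query case of Rule 4 follows from the popular form by moving one term across the equality. A short derivation, written in LaTeX:

```latex
\begin{align*}
P(Q_1 \lor Q_2) &= P(Q_1) + P(Q_2) - P(Q_1 \land Q_2) \\
\Longrightarrow \quad P(Q_1 \land Q_2) &= P(Q_1) + P(Q_2) - P(Q_1 \lor Q_2)
\end{align*}
```

The dual form is the useful one here because it turns a conjunction into probabilities of queries that are single conjuncts or disjunctions of conjuncts.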

Example

Q_J = R(x1),S(x1,y1), T(x2),S(x2,y2)
    = [∃x1 ∃y1 R(x1) ∧ S(x1,y1)] ∧ [∃x2 ∃y2 T(x2) ∧ S(x2,y2)]

Q_J = Q1 ∧ Q2, where
Q1 = R(x1),S(x1,y1)
Q2 = T(x2),S(x2,y2)

P(Q_J) = P(Q1) + P(Q2) - P(Q1 ∨ Q2)

- Q1 = a hierarchical conjunctive query w/o self-joins
- Q2 = similar
- Q1 ∨ Q2 = Q_U, which we saw a couple of slides ago
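The inclusion/exclusion step for Q_J can be sketched in Python: each of the three terms is computed with Rules 1 to 3, and the result is compared against a brute-force sum over possible worlds. The database and all function names are hypothetical.

```python
import itertools
import math

# Hypothetical tuple-independent database: tuple -> marginal probability.
R = {"a": 0.5, "b": 0.3}
T = {"a": 0.2, "b": 0.9}
S = {("a", "c"): 0.4, ("a", "d"): 0.7, ("b", "c"): 0.6}

def exists_s(S, a):
    """P[exists y. S(a, y)] by independent project on y."""
    return 1.0 - math.prod(1.0 - p for (x, _), p in S.items() if x == a)

def p_unary_join_s(U, S):
    """P[exists x, y. U(x) and S(x, y)]: project on x, join inside."""
    return 1.0 - math.prod(1.0 - p * exists_s(S, a) for a, p in U.items())

def p_qu(R, T, S):
    """P(Q1 or Q2) = P(Q_U), using the closed formula for Q_U."""
    return 1.0 - math.prod(
        1.0 - (1.0 - (1.0 - R.get(a, 0.0)) * (1.0 - T.get(a, 0.0))) * exists_s(S, a)
        for a in set(R) | set(T)
    )

def p_qj(R, T, S):
    """Inclusion/exclusion: P(Q1 and Q2) = P(Q1) + P(Q2) - P(Q1 or Q2)."""
    return p_unary_join_s(R, S) + p_unary_join_s(T, S) - p_qu(R, T, S)

def brute_force_qj(R, T, S):
    """P(Q_J) as the total weight of the possible worlds where Q_J holds."""
    r_keys, t_keys, s_keys = list(R), list(T), list(S)
    probs = [R[k] for k in r_keys] + [T[k] for k in t_keys] + [S[k] for k in s_keys]
    total = 0.0
    for world in itertools.product([False, True], repeat=len(probs)):
        weight = math.prod(p if keep else 1.0 - p for p, keep in zip(probs, world))
        r = {k for k, keep in zip(r_keys, world) if keep}
        t = {k for k, keep in zip(t_keys, world[len(r_keys):]) if keep}
        s = {k for k, keep in zip(s_keys, world[len(r_keys) + len(t_keys):]) if keep}
        q1 = any(x in r for x, _ in s)
        q2 = any(x in t for x, _ in s)
        if q1 and q2:
            total += weight
    return total

if __name__ == "__main__":
    print(p_qj(R, T, S), brute_force_qj(R, T, S))
```

Q1 and Q2 share the relation S, so Rule 1 does not apply to Q_J directly; the detour through the union is what makes the computation possible.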

Lesson 3
We need unions in order to handle self-joins!
- Conjunctive Queries = not a "natural" class of queries for Probabilistic DBs
- Unions of Conjunctive Queries = the "natural" class of queries

Unsafe Queries – When the Rules Fail

H0 = R(x),S(x,y),T(y)
H1 = R(x0),S(x0,y0) ∨ S(x1,y1),T(y1)
   = ∃z [R(z) ∧ S(z,y0) ∨ S(x1,z) ∧ T(z)]

Unlike Q_U, here z occurs in different positions in S, and we cannot apply Independent Project.

Unsafe Queries – When the Rules Fail

H0 = R(x),S(x,y),T(y)
H1 = R(x0),S(x0,y0) ∨ S(x1,y1),T(y1)
H2 = R(x0),S1(x0,y0) ∨ S1(x1,y1),S2(x1,y1) ∨ S2(x2,y2),T(y2)
H3 = R(x0),S1(x0,y0) ∨ S1(x1,y1),S2(x1,y1) ∨ S2(x2,y2),S3(x2,y2) ∨ S3(x3,y3),T(y3)
...

Theorem. Each query H_k is #P-hard.
The proof is in [Dalvi&S, JACM'2012].
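H0 is the standard example of a non-hierarchical conjunctive query. The sketch below tests the hierarchy condition, assuming the usual definition (not spelled out on these slides): for any two variables, the sets of atoms that contain them are either nested or disjoint. The query encoding and the function name are illustrative, and the test applies only to conjunctive queries without self-joins, so it covers H0 but not the unions H1, H2, ...

```python
from itertools import combinations

def is_hierarchical(atoms):
    """atoms: list of (relation_name, variables) of a conjunctive query
    without self-joins. Returns True iff, for every pair of variables,
    their sets of atoms are nested or disjoint."""
    at = {}
    for rel, variables in atoms:
        for v in variables:
            at.setdefault(v, set()).add(rel)
    for x, y in combinations(at, 2):
        ax, ay = at[x], at[y]
        if not (ax <= ay or ay <= ax or ax.isdisjoint(ay)):
            return False
    return True

# H0 = R(x), S(x,y), T(y): at(x) = {R, S} and at(y) = {S, T} overlap
# without being nested, so H0 is not hierarchical.
H0 = [("R", ("x",)), ("S", ("x", "y")), ("T", ("y",))]
# Q1 = R(x1), S(x1,y1): at(y1) = {S} is contained in at(x1) = {R, S}.
Q1 = [("R", ("x1",)), ("S", ("x1", "y1"))]

if __name__ == "__main__":
    print(is_hierarchical(H0), is_hierarchical(Q1))
```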

The Amazing Queries H_k

H3 = R(x0),S1(x0,y0) ∨ S1(x1,y1),S2(x1,y1) ∨ S2(x2,y2),S3(x2,y2) ∨ S3(x3,y3),T(y3)

H_k is #P-hard. But if we drop any one conjunctive query, then it is in PTIME.

Independent union:
  = ∃z [S2(x2,z),S3(x2,z) ∨ S3(x3,z),T(z)]
  = ∃z [∃x3 S3(x3,z)] ∧ [(∃x2 S2(x2,z)) ∨ T(z)]
  = etc.

Where We Are
- We have seen examples of unsafe queries: H_k
- But if a query Q has H_k as a subquery, it is not necessarily unsafe
- When the four rules succeed, then Q is safe
- But inclusion/exclusion is insufficient: we need to replace it with the Möbius inversion formula
- We will discuss these issues, then state the Big Dichotomy Theorem

A Safe Query with H1 as Subquery

DNF:
Q_V = R(x1),S(x1,y1) ∨ S(x2,y2),T(y2) ∨ R(x3),T(y3)

The first two disjuncts are H1 (unsafe!); the third, R(x3),T(y3), is a disconnected query.

CNF:
Q_V = [S(x2,y2),T(y2) ∨ R(x3)] ∧ [R(x1),S(x1,y1) ∨ T(y3)]

Inclusion/exclusion, with q1 and q2 the two conjuncts of the CNF:
P(Q_V) = P(q1 ∧ q2) = P(q1) + P(q2) - P(q1 ∨ q2)

Here q1 ∨ q2 = R(x3) ∨ T(y3), so every term is in PTIME!
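The computation of P(Q_V) can be sketched in Python: each of the three inclusion/exclusion terms is evaluated with Rules 1 to 3, and the result is checked against a brute-force sum over possible worlds. The database and the helper names are hypothetical.

```python
import itertools
import math

# Hypothetical tuple-independent database: tuple -> marginal probability.
R = {"a": 0.5, "b": 0.3}
T = {"c": 0.2, "d": 0.9}
S = {("a", "c"): 0.4, ("a", "d"): 0.7, ("b", "c"): 0.6}

def union(p1, p2):
    """Independent union of two independent events."""
    return 1.0 - (1.0 - p1) * (1.0 - p2)

def p_exists(U):
    """P[exists x. U(x)] by independent project."""
    return 1.0 - math.prod(1.0 - p for p in U.values())

def p_join(U, S, pos):
    """P[exists x, y. U(v) and S(x, y)], where v is the S-column at index pos."""
    return 1.0 - math.prod(
        1.0 - p * (1.0 - math.prod(1.0 - q for key, q in S.items() if key[pos] == c))
        for c, p in U.items()
    )

def p_qv(R, S, T):
    p_q1 = union(p_join(T, S, 1), p_exists(R))    # q1 = S(x2,y2),T(y2) or R(x3)
    p_q2 = union(p_join(R, S, 0), p_exists(T))    # q2 = R(x1),S(x1,y1) or T(y3)
    p_q1_or_q2 = union(p_exists(R), p_exists(T))  # q1 or q2 = R(x3) or T(y3)
    return p_q1 + p_q2 - p_q1_or_q2

def brute_force_qv(R, S, T):
    """P(Q_V) from its DNF, summed over all possible worlds."""
    facts = ([("R", k, p) for k, p in R.items()]
             + [("S", k, p) for k, p in S.items()]
             + [("T", k, p) for k, p in T.items()])
    total = 0.0
    for world in itertools.product([False, True], repeat=len(facts)):
        weight = 1.0
        r, s, t = set(), set(), set()
        for (rel, key, p), keep in zip(facts, world):
            weight *= p if keep else 1.0 - p
            if keep:
                {"R": r, "S": s, "T": t}[rel].add(key)
        holds = (any(x in r for x, _ in s)
                 or any(y in t for _, y in s)
                 or (bool(r) and bool(t)))
        if holds:
            total += weight
    return total

if __name__ == "__main__":
    print(p_qv(R, S, T), brute_force_qv(R, S, T))
```

No term of the CNF expansion contains H1 as a whole, which is why the query is safe even though H1 appears inside its DNF.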

Inclusion/Exclusion is Insufficient

Q_W = [R(x0),S1(x0,y0) ∨ S2(x2,y2),S3(x2,y2)] ∧   /* Q1 */
      [R(x0),S1(x0,y0) ∨ S3(x3,y3),T(y3)] ∧       /* Q2 */
      [S1(x1,y1),S2(x1,y1) ∨ S3(x3,y3),T(y3)]     /* Q3 */

P(Q_W) = P(Q1) + P(Q2) + P(Q3) - P(Q1 ∨ Q2) - P(Q2 ∨ Q3) - P(Q1 ∨ Q3) + P(Q1 ∨ Q2 ∨ Q3)

- Q1 ∨ Q2 ∨ Q3 = H3 (hard!), and also Q1 ∨ Q3 = H3: these two terms are #P-hard
- All the other terms are in PTIME
- The two hard terms carry opposite signs and cancel, but plain inclusion/exclusion still tries to evaluate each of them

August Ferdinand Möbius
- Möbius strip
- Möbius function μ in number theory
- Generalized to lattices [Stanley'97, Rota'09]
- And now to queries!

The CNF Lattice

Definition. The CNF lattice of Q = Q1 ∧ Q2 ∧ … is: see the formal definition in the book.

Example:

Q_W = [R(x0),S1(x0,y0) ∨ S2(x2,y2),S3(x2,y2)] ∧   /* Q1 */
      [R(x0),S1(x0,y0) ∨ S3(x3,y3),T(y3)] ∧       /* Q2 */
      [S1(x1,y1),S2(x1,y1) ∨ S3(x3,y3),T(y3)]     /* Q3 */

[Figure: the CNF lattice of Q_W, with a top element 1̂ = max(L) and the nodes Q1, Q2, Q3, Q1 ∨ Q2, Q2 ∨ Q3, and Q1 ∨ Q2 ∨ Q3 (= Q1 ∨ Q3). Each node is marked as either in PTIME or #P-hard.]

The Möbius Function

Def. The Möbius function:
  μ(1̂, 1̂) = 1
  μ(u, 1̂) = - Σ_{u < v ≤ 1̂} μ(v, 1̂)

Möbius' Inversion Formula:
  P(Q) = - Σ_{Qi < 1̂} μ(Qi, 1̂) P(Qi)

New Rule: Inclusion/Exclusion → Möbius' Inversion Formula

[Figure: the lattice, annotated step by step with the values of μ.]
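A small Python sketch of the recursive definition, applied to the CNF lattice of Q_W. It rests on an encoding that is an assumption of this sketch, not the book's formal definition: each clause Qi is represented by the set of its disjuncts (labeled h0 to h3 after the four components of H3), a disjunction of clauses is the union of these sets, 1̂ is the empty set, and u < v holds when u's set strictly contains v's set. That encoding is sound here because no disjunct implies another.

```python
from functools import lru_cache
from itertools import combinations

# Disjuncts of H3 (hypothetical labels):
#   h0 = R,S1   h1 = S1,S2   h2 = S2,S3   h3 = S3,T
clauses = {
    "Q1": frozenset({"h0", "h2"}),
    "Q2": frozenset({"h0", "h3"}),
    "Q3": frozenset({"h1", "h3"}),
}

TOP = frozenset()  # the top element 1^ (empty disjunction of clauses)

# Lattice elements: disjunctions of every subset of clauses, deduplicated.
elements = set()
names = list(clauses)
for size in range(len(names) + 1):
    for combo in combinations(names, size):
        elements.add(frozenset().union(*(clauses[n] for n in combo)))

@lru_cache(maxsize=None)
def mobius(u):
    """mu(u, 1^), following the recursive definition."""
    if u == TOP:
        return 1
    # v is strictly above u in the lattice iff v is a proper subset of u.
    return -sum(mobius(v) for v in elements if v < u)

if __name__ == "__main__":
    for e in sorted(elements, key=lambda s: (len(s), sorted(s))):
        print(sorted(e), mobius(e))
```

In this encoding the bottom element Q1 ∨ Q2 ∨ Q3 (which equals Q1 ∨ Q3, that is, H3) gets μ = 0, so the #P-hard term drops out of the inversion formula, while the remaining six elements keep coefficients of +1 or -1.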

The Big Dichotomy Theorem

Dichotomy Theorem. Fix a UCQ query Q.
1. If the rules terminate, then P(Q) is in PTIME
2. If the rules fail, then P(Q) is #P-complete

The proof is in [Dalvi&S, JACM'2012].

The dichotomy into PTIME/#P-complete is based on "syntax", where "syntax" includes the Möbius function!

Lesson 5
Four simple rules are all we need to compute query probabilities in PTIME:
- Independent join
- Independent project
- Independent union
- Inclusion/Exclusion → Möbius inversion formula
Inclusion/exclusion is not used in modern model counting systems! It is specific to probabilistic databases.

Representation Theorem
Do we really need the lattice and the Möbius function? Yes! For every lattice one can construct a query Q s.t.:
- Q is in PTIME if μ = 0
- Q is #P-complete if μ ≠ 0
This suggests that using the Möbius function is unavoidable in Probabilistic Databases.

Representation Theorem

THEOREM. Every lattice L is the CNF lattice of a query Q, s.t.:
- The query at 0̂ (= min(L)) is hard for #P
- All other queries are in PTIME

Q is in PTIME iff μ(0̂, 1̂) = 0!

Examples: Q_W, Q_WW, Q_9
[Figure: the CNF lattices of the example queries, labeled "PTIME!".]

Landscape of Probabilistic Databases
[Figure: queries arranged by complexity. Region labels: PTIME ("have safe plans"), #P-hard ("have approximate plans"), hierarchical, non-hierarchical. Queries shown: Q_U, Q_J, Q_V, Q_W, Q_9, H0, H1, H2, H3.]

Extensional Plans for UCQ
Recall the extensional operators for Conjunctive Queries w/o self-joins:
- Independent join: ⋈
- Independent projection: Π
- Selection: σ
Now we need two more operators:
- Independent union: ∪^i
- Möbius sum: Σ_{μ1,μ2,μ3}

Independent-Union and Möbius-Sum

Independent union ∪^i:

R(A)        T(A)        R ∪^i T
A   P       A   P       A   P
a1  p1      a2  q2      a1  p1
a2  p2      a3  q3      a2  1-(1-p2)(1-q2)
a3  p3      a4  q4      a3  1-(1-p3)(1-q3)
                        a4  q4

SELECT COALESCE(R.A, T.A) AS A,
       1.0 - (1.0 - (CASE WHEN R.P IS NULL THEN 0 ELSE R.P END)) *
             (1.0 - (CASE WHEN T.P IS NULL THEN 0 ELSE T.P END)) AS P
FROM R FULL OUTER JOIN T ON R.A = T.A;

Möbius sum Σ_{μ1,μ2,μ3}:

R(A)        T(A)        U(A)        Σ_{μ1,μ2,μ3}(R, T, U)
A   P       A   P       A   P       A   P
a1  p1      a2  q2      a1  s1      a1  μ1*p1 + μ3*s1
a2  p2      a3  q3      a3  s3      a2  μ1*p2 + μ2*q2
a3  p3      a4  q4                  a3  μ1*p3 + μ2*q3 + μ3*s3
                                    a4  μ2*q4

SELECT … -- long query -- here
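As a rough sketch of the two new operators, each probabilistic table can be modeled as a Python dict from the value of A to its probability. The function names are illustrative, and a missing key is treated as probability 0, in the same way the SQL treats a NULL from the outer join.

```python
def independent_union(r, t):
    """Independent union: per value of A, 1 - (1 - p)(1 - q),
    with absent tuples counted as probability 0."""
    return {
        a: 1.0 - (1.0 - r.get(a, 0.0)) * (1.0 - t.get(a, 0.0))
        for a in r.keys() | t.keys()
    }

def mobius_sum(weights, tables):
    """Mobius sum: per value of A, the weighted sum of the input
    probabilities, one weight (Mobius coefficient) per input table."""
    keys = set().union(*tables)
    return {
        a: sum(mu * table.get(a, 0.0) for mu, table in zip(weights, tables))
        for a in keys
    }

if __name__ == "__main__":
    r = {"a1": 0.5, "a2": 0.5, "a3": 0.2}
    t = {"a2": 0.5, "a3": 0.5, "a4": 0.1}
    u = {"a1": 0.25, "a3": 0.1}
    print(independent_union(r, t))
    print(mobius_sum([1, -1, 1], [r, t, u]))
```

The output of a Möbius sum is a probability only when its inputs and weights come from a valid inversion formula; an intermediate weighted sum taken in isolation can be negative.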

Extensional Plans for UCQ

SELECT DISTINCT r.z
FROM R r, S s1, T t, S s2
WHERE r.x = s1.x AND t.z = r.z AND t.x = s2.x

[Figure: extensional plan. A Möbius sum Σ_{+1,-1,+1} on z combines the subplan Π_z(R(z,x) ⋈_x Π_x S(x,y)), the subplan Π_z(T(z,x) ⋈_x Π_x S(x,y)), and the independent union ∪^i on z of these two subplans.]

Can write the plan back in SQL… but it won't fit on one slide.

Summary: Extensional Query Evaluation
- Four rules can evaluate all queries that are in PTIME
  - Actually, a fifth rule is needed (ranking), see book
- Big Dichotomy Theorem:
  - If the rules succeed → query is safe → in PTIME
  - If the rules fail → query is unsafe → #P-complete
- Inclusion/exclusion is specific to probabilistic databases and is not used by modern model counters: will discuss next.