A COURSE ON PROBABILISTIC DATABASES
Dan Suciu, University of Washington
June, 2014



Outline
1. Motivating Applications
2. The Probabilistic Data Model (Chapter 2)
3. Extensional Query Plans (Chapter …)
4. The Complexity of Query Evaluation (Chapter 3)
5. Extensional Evaluation (Chapter …)
6. Intensional Evaluation (Chapter 5)
7. Conclusions
Parts: Part 1, Part 2, Part 3, Part 4

The Model Counting Problem
Given a Boolean formula F, compute the number of satisfying assignments, #F. If F has n Boolean variables, then 0 ≤ #F ≤ 2^n.
SAT is the problem: given F, check whether there exists a satisfying assignment. SAT is NP-complete.
#SAT is the model counting problem: given F, compute #F. #SAT is #P-complete.
We can reduce SAT to #SAT: F is satisfiable if and only if #F > 0.
Probability computation is the problem: if each Boolean variable X_i is set to true independently with probability p_i, compute P(F), the probability that F is true. This problem is also #P-complete.
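As a minimal sketch of these definitions (not an efficient algorithm), the following brute-force code counts models and computes P(F) by enumerating all 2^n assignments. The function names and the toy formula are assumptions made for illustration; they do not come from the slides.

```python
from fractions import Fraction
from itertools import product


def count_models(formula, n):
    """Return #F by trying all 2^n assignments (exponential time)."""
    return sum(1 for bits in product([False, True], repeat=n) if formula(bits))


def probability(formula, probs):
    """Return P(F) when variable i is true independently with probability probs[i]."""
    total = Fraction(0)
    for bits in product([False, True], repeat=len(probs)):
        if formula(bits):
            weight = Fraction(1)
            for bit, p in zip(bits, probs):
                weight *= p if bit else 1 - p
            total += weight
    return total


# Hypothetical toy formula F = X_0 OR X_1 over n = 2 variables.
def toy(bits):
    return bits[0] or bits[1]


n = 2
models = count_models(toy, n)
is_satisfiable = models > 0  # SAT answered from the #SAT answer
half = [Fraction(1, 2)] * n
# With every probability equal to 1/2, #F = 2^n * P(F).
assert models == 2**n * probability(toy, half)
print(models, is_satisfiable, probability(toy, half))  # prints: 3 True 3/4
```

Exact fractions are used instead of floats so that the identity #F = 2^n · P(F) can be checked with equality.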

Example
The Model Counting Problem (#SAT): compute #F.
F = X_1 Y_1 ∨ X_1 Y_2 ∨ X_2 Y_3 ∨ X_2 Y_4

Example
F = X_1 Y_1 ∨ X_1 Y_2 ∨ X_2 Y_3 ∨ X_2 Y_4
The Model Counting Problem (#SAT): compute #F.
The Probability Computation Problem: compute P(F), where P(X_1) = p_1, P(X_2) = p_2, P(Y_1) = q_1, P(Y_2) = q_2, P(Y_3) = q_3, P(Y_4) = q_4.

Example
F = X_1 Y_1 ∨ X_1 Y_2 ∨ X_2 Y_3 ∨ X_2 Y_4
The Model Counting Problem (#SAT): compute #F.
The Probability Computation Problem: compute P(F), where P(X_1) = p_1, P(X_2) = p_2, P(Y_1) = q_1, P(Y_2) = q_2, P(Y_3) = q_3, P(Y_4) = q_4.
Re-group: F = [X_1 (Y_1 ∨ Y_2)] ∨ [X_2 (Y_3 ∨ Y_4)]
The two brackets share no variables, so they are independent events, and so are the factors inside each bracket:
P(F) = 1 - [1 - p_1 (1 - (1 - q_1)(1 - q_2))] · [1 - p_2 (1 - (1 - q_3)(1 - q_4))]

Example
F = X_1 Y_1 ∨ X_1 Y_2 ∨ X_2 Y_3 ∨ X_2 Y_4
The Probability Computation Problem: compute P(F), where P(X_1) = p_1, P(X_2) = p_2, P(Y_1) = q_1, P(Y_2) = q_2, P(Y_3) = q_3, P(Y_4) = q_4.
Re-group: F = [X_1 (Y_1 ∨ Y_2)] ∨ [X_2 (Y_3 ∨ Y_4)]
P(F) = 1 - [1 - p_1 (1 - (1 - q_1)(1 - q_2))] · [1 - p_2 (1 - (1 - q_3)(1 - q_4))]
The Model Counting Problem (#SAT): compute #F.
#F = 2^6 · P(F) when p_1 = … = q_4 = 0.5. Here P(F) = 1 - 0.625 · 0.625 = 0.609375, so #F = 64 · 0.609375 = 39.
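The relation between the regrouped probability formula and the model count can be checked mechanically. The sketch below evaluates the regrouped expression with all probabilities set to 1/2, multiplies by 2^6, and prints the result next to a brute-force count over all 64 assignments. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def f(x1, x2, y1, y2, y3, y4):
    # F = X_1 Y_1 v X_1 Y_2 v X_2 Y_3 v X_2 Y_4
    return (x1 and y1) or (x1 and y2) or (x2 and y3) or (x2 and y4)


def p_closed_form(p1, p2, q1, q2, q3, q4):
    # Regrouped formula from the slide; valid because the two brackets
    # share no variables and all variables are independent.
    left = p1 * (1 - (1 - q1) * (1 - q2))
    right = p2 * (1 - (1 - q3) * (1 - q4))
    return 1 - (1 - left) * (1 - right)


# Count satisfying assignments directly.
brute_force_count = sum(1 for bits in product([False, True], repeat=6) if f(*bits))

# Recover the count from the probability at p = q = 1/2.
half = Fraction(1, 2)
closed_form_count = 2**6 * p_closed_form(half, half, half, half, half, half)

print(brute_force_count, closed_form_count)
```

The two printed numbers should agree, which gives a quick sanity check on any hand-computed value of #F.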

Now let’s try this:
F = X_1 X_2 ∨ X_1 X_3 ∨ X_2 X_3

Now let’s try this:
F = X_1 X_2 ∨ X_1 X_3 ∨ X_2 X_3
No clever grouping seems possible. Use brute force:

X_1  X_2  X_3  F  P
0    0    0    0
0    0    1    0
0    1    0    0
0    1    1    1  (1 - p_1) p_2 p_3
1    0    0    0
1    0    1    1  p_1 (1 - p_2) p_3
1    1    0    1  p_1 p_2 (1 - p_3)
1    1    1    1  p_1 p_2 p_3

Now let’s try this:
F = X_1 X_2 ∨ X_1 X_3 ∨ X_2 X_3
No clever grouping seems possible. Use brute force:

X_1  X_2  X_3  F  P
0    0    0    0
0    0    1    0
0    1    0    0
0    1    1    1  (1 - p_1) p_2 p_3
1    0    0    0
1    0    1    1  p_1 (1 - p_2) p_3
1    1    0    1  p_1 p_2 (1 - p_3)
1    1    1    1  p_1 p_2 p_3

P(F) = (1 - p_1) p_2 p_3 + p_1 (1 - p_2) p_3 + p_1 p_2 (1 - p_3) + p_1 p_2 p_3
#F = 4
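For reference, the brute-force sum can be simplified by expanding each product, and setting all probabilities to 1/2 recovers the model count. This is a worked check, not part of the original slides:

```latex
\begin{align*}
P(F) &= (1-p_1)p_2p_3 + p_1(1-p_2)p_3 + p_1p_2(1-p_3) + p_1p_2p_3 \\
     &= p_1p_2 + p_1p_3 + p_2p_3 - 2\,p_1p_2p_3 .
\end{align*}
% With p_1 = p_2 = p_3 = 1/2:
\[
P(F) = \tfrac{3}{4} - \tfrac{2}{8} = \tfrac{1}{2},
\qquad \#F = 2^3 \cdot P(F) = 4 .
\]
```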

Complexity of Model Counting
SAT is the problem: given F, check whether there exists a satisfying assignment.
#SAT is the problem: given F, compute #F.
NP = class of problems of the form “is there a witness?” Example: SAT.
#P = class of problems of the form “how many witnesses?” Example: #SAT.
Theorem [Valiant:1979]. #SAT is #P-complete.
Interesting fact: the decision problem 2CNF is in PTIME, but the counting problem #2CNF is #P-complete.

PP2DNF Formulas
Definition. A Positive, Partitioned 2DNF (PP2DNF) formula is: F = X_{i_1} Y_{j_1} ∨ X_{i_2} Y_{j_2} ∨ … ∨ X_{i_n} Y_{j_n}
Example: F = X_1 Y_3 ∨ X_2 Y_1 ∨ X_2 Y_3 ∨ X_3 Y_2
Theorem [Provan and Ball 1982]. Model counting for PP2DNF is #P-complete.
Even for such simple formulas, counting the number of models is #P-complete!
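To make the shape of a PP2DNF formula concrete, here is a small counting sketch. It assumes a hypothetical encoding of the formula as a list of (i, j) pairs, one per clause X_i Y_j. It enumerates only the X assignments and counts the Y side in closed form; this is still exponential in the number of X variables, as expected for a #P-complete problem, and it is not a method taken from the slides.

```python
from itertools import product


def count_pp2dnf(clauses, n_x, n_y):
    """Count models of a PP2DNF given as (i, j) pairs meaning X_i AND Y_j.

    Enumerates the 2^n_x assignments of the X variables only; for each one,
    the Y side is counted in closed form. Still exponential in n_x.
    """
    total = 0
    for xs in product([False, True], repeat=n_x):
        # Y variables that appear in a clause whose X variable is true.
        needed = {j for (i, j) in clauses if xs[i - 1]}
        # F holds iff at least one of those Y variables is true, so we
        # subtract the Y assignments in which all of them are false.
        total += 2**n_y - 2 ** (n_y - len(needed))
    return total


# F = X_1 Y_3 v X_2 Y_1 v X_2 Y_3 v X_3 Y_2 (the example on the slide)
example = [(1, 3), (2, 1), (2, 3), (3, 2)]
print(count_pp2dnf(example, n_x=3, n_y=3))
```

The partition into X and Y variables is what makes the inner count easy: once the X values are fixed, the formula collapses to a plain disjunction of Y variables.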

Queries are #P-Complete
Theorem [Dalvi&S.04]. Consider the query H_0 = R(x), S(x,y), T(y). The problem "given a probabilistic database D, compute P(H_0)" is #P-complete in the size of D.
Proof: by reduction from #PP2DNF.
Example: suppose F = X_1 Y_1 ∨ X_1 Y_2 ∨ X_2 Y_2. Build the following database:

R
X    P
x1   0.5
x2   0.5

S
X    Y
x1   y1
x1   y2
x2   y2

T
Y    P
y1   0.5
y2   0.5

Then P(F) = P(H_0).
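The reduction can be sketched in code. The version below builds R, S, and T from a PP2DNF formula and computes P(H_0) by enumerating possible worlds. It assumes that the S tuples are certain (probability 1), which is how the example database is read here; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_h0(R, S, T):
    """P(H_0) for H_0 = R(x), S(x,y), T(y), by enumerating possible worlds.

    R and T map a constant to its tuple probability; S is a set of (x, y)
    pairs assumed to be certain (probability 1), as in the reduction.
    """
    xs, ys = sorted(R), sorted(T)
    total = Fraction(0)
    for r_bits in product([False, True], repeat=len(xs)):
        for t_bits in product([False, True], repeat=len(ys)):
            weight = Fraction(1)
            for x, present in zip(xs, r_bits):
                weight *= R[x] if present else 1 - R[x]
            for y, present in zip(ys, t_bits):
                weight *= T[y] if present else 1 - T[y]
            in_r = {x for x, present in zip(xs, r_bits) if present}
            in_t = {y for y, present in zip(ys, t_bits) if present}
            # H_0 is true in this world if some S(x, y) joins R(x) and T(y).
            if any(x in in_r and y in in_t for (x, y) in S):
                total += weight
    return total


def database_from_pp2dnf(clauses):
    """Build R, S, T from clauses given as (x, y) name pairs."""
    half = Fraction(1, 2)
    R = {x: half for x, _ in clauses}
    T = {y: half for _, y in clauses}
    return R, set(clauses), T


# F = X_1 Y_1 v X_1 Y_2 v X_2 Y_2
clauses = [("x1", "y1"), ("x1", "y2"), ("x2", "y2")]
R, S, T = database_from_pp2dnf(clauses)
p = prob_h0(R, S, T)
model_count = p * 2 ** (len(R) + len(T))  # #F = 2^(number of variables) * P(F)
print(p, model_count)  # prints: 1/2 8
```

Each tuple R(x_i) plays the role of the Boolean variable X_i and each T(y_j) the role of Y_j, so a polynomial-time algorithm for P(H_0) would yield one for counting PP2DNF models.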

Where We Are
Safe queries have safe plans: in PTIME.
Unsafe queries have no safe plans: provably #P-complete.
Problem: for a given query, determine if it is safe.
Next: Conjunctive Queries without Self-joins.
Later: Unions of Conjunctive Queries.

Review: Conjunctive Queries
A conjunctive query: Q(z) = ∃x ∃t.(Owner(z,x) ∧ Location(x,t,"Office444"))
Same as: Q(z) = Owner(z,x), Location(x,t,"Office444")

Review: Conjunctive Queries
A conjunctive query: Q(z) = ∃x ∃t.(Owner(z,x) ∧ Location(x,t,"Office444"))
Same as: Q(z) = Owner(z,x), Location(x,t,"Office444")
Terminology: each conjunct, such as Owner(z,x), is an atom; z is the head variable; x and t are existential variables.

Review: Conjunctive Queries
A conjunctive query: Q(z) = ∃x ∃t.(Owner(z,x) ∧ Location(x,t,"Office444"))
Same as: Q(z) = Owner(z,x), Location(x,t,"Office444")
Terminology: each conjunct, such as Owner(z,x), is an atom; z is the head variable; x and t are existential variables.
A conjunctive query has a self-join if the same relation name occurs in more than one atom:
Q() = R(z,x_1), S(z,x_1,y_1), T(z,x_2), S(z,x_2,y_2)
Q() = R(x,y), R(y,z)

Review: Conjunctive Queries
A conjunctive query: Q(z) = ∃x ∃t.(Owner(z,x) ∧ Location(x,t,"Office444"))
Same as: Q(z) = Owner(z,x), Location(x,t,"Office444")
Terminology: each conjunct, such as Owner(z,x), is an atom; z is the head variable; x and t are existential variables.
A conjunctive query has a self-join if the same relation name occurs in more than one atom:
Q() = R(z,x_1), S(z,x_1,y_1), T(z,x_2), S(z,x_2,y_2)
Q() = R(x,y), R(y,z)
Conjunctive queries without self-joins (no repeated relation names):
Q() = R(x), S(x,y)
H_0() = R(x), S(x,y), T(y)
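Checking for self-joins is a purely syntactic test. The sketch below assumes a hypothetical encoding in which a query is a list of atoms and each atom is a pair (relation name, argument tuple).

```python
from collections import Counter

# A query is a list of atoms; an atom is (relation_name, tuple_of_arguments).
# This encoding is a hypothetical one chosen for the example.


def has_self_join(query):
    """True if some relation name occurs in more than one atom."""
    counts = Counter(relation for relation, _ in query)
    return any(c > 1 for c in counts.values())


q1 = [("R", ("z", "x1")), ("S", ("z", "x1", "y1")),
      ("T", ("z", "x2")), ("S", ("z", "x2", "y2"))]
q2 = [("R", ("x", "y")), ("R", ("y", "z"))]
h0 = [("R", ("x",)), ("S", ("x", "y")), ("T", ("y",))]

print(has_self_join(q1), has_self_join(q2), has_self_join(h0))  # prints: True True False
```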

Hierarchical Queries
at(x) = set of atoms containing the variable x in a key position.
Definition. A query Q is hierarchical if for all existential variables x, y: at(x) ⊆ at(y) or at(x) ⊇ at(y) or at(x) ∩ at(y) = ∅.

Hierarchical Queries
at(x) = set of atoms containing the variable x in a key position.
Definition. A query Q is hierarchical if for all existential variables x, y: at(x) ⊆ at(y) or at(x) ⊇ at(y) or at(x) ∩ at(y) = ∅.
Hierarchical: Q = R(x,y), S(x,z). Here at(x) = {R, S}, at(y) = {R}, at(z) = {S}.
Non-hierarchical: H_0 = R(x), S(x,y), T(y). Here at(x) = {R, S} and at(y) = {S, T} overlap, but neither contains the other.
[Figure: diagrams of the sets at(x), at(y), at(z) over the atoms of each query.]
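The hierarchy test translates directly into set operations. The sketch below uses a hypothetical encoding of a query as a list of (relation name, argument tuple) atoms and, as a simplifying assumption, treats every argument position as a key position.

```python
from itertools import combinations


def at(query, var):
    """Indices of the atoms that contain the variable var.

    Assumption: every argument position counts as a key position.
    """
    return {k for k, (_, args) in enumerate(query) if var in args}


def is_hierarchical(query, existential_vars):
    """Check the pairwise containment-or-disjointness condition."""
    for x, y in combinations(existential_vars, 2):
        ax, ay = at(query, x), at(query, y)
        if not (ax <= ay or ax >= ay or ax.isdisjoint(ay)):
            return False
    return True


q = [("R", ("x", "y")), ("S", ("x", "z"))]
h0 = [("R", ("x",)), ("S", ("x", "y")), ("T", ("y",))]

print(is_hierarchical(q, ["x", "y", "z"]))  # prints: True
print(is_hierarchical(h0, ["x", "y"]))      # prints: False
```

Atoms are identified by their position in the list rather than by relation name, so the same function would still behave sensibly on a query with repeated relation names.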

The Small Dichotomy Theorem
Fix a schema for probabilistic databases where all tuples are independent.
Theorem [Dalvi&S.04]. For every conjunctive query Q without self-joins:
If Q is hierarchical, then computing P(Q) is in PTIME.
If Q is not hierarchical, then computing P(Q) is #P-complete.
Examples: Q = R(x,y), S(x,z) is hierarchical (PTIME); H_0 = R(x), S(x,y), T(y) is non-hierarchical (#P-complete).

Proof: Part I
Hierarchical ⇒ PTIME
If Q is hierarchical, then use the query’s hierarchy to derive a safe plan.
Example: Q = R(x,y), S(x,z)
Safe plan: Π^i_{-x}( Π^i_{-y}(R(x,y)) ⋈_x Π^i_{-z}(S(x,z)) )
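A minimal sketch of how such a plan could be evaluated for Q = R(x,y), S(x,z) on a tuple-independent database: project out y and z with an independent "or", multiply the two sides for each x, then project out x. The data and function names are hypothetical, and a brute-force possible-worlds computation is included only to check the result.

```python
from fractions import Fraction
from itertools import product


def independent_or(probs):
    """Probability that at least one of several independent events occurs."""
    none = Fraction(1)
    for p in probs:
        none *= 1 - p
    return 1 - none


def safe_plan(R, S):
    """P(Q) for Q = R(x,y), S(x,z); R and S map tuples to probabilities."""
    per_x = []
    for x in {t[0] for t in R} & {t[0] for t in S}:
        p_r = independent_or(p for (a, _), p in R.items() if a == x)  # project out y
        p_s = independent_or(p for (a, _), p in S.items() if a == x)  # project out z
        per_x.append(p_r * p_s)                                       # join on x
    return independent_or(per_x)                                      # project out x


def brute_force(R, S):
    """Same probability by summing the weights of all possible worlds."""
    tuples = [("R", t, p) for t, p in R.items()] + [("S", t, p) for t, p in S.items()]
    total = Fraction(0)
    for bits in product([False, True], repeat=len(tuples)):
        weight = Fraction(1)
        r_xs, s_xs = set(), set()
        for (name, t, p), present in zip(tuples, bits):
            weight *= p if present else 1 - p
            if present:
                (r_xs if name == "R" else s_xs).add(t[0])
        # Q is true if some x value appears in both R and S in this world.
        if r_xs & s_xs:
            total += weight
    return total


# Hypothetical tuple-independent instance.
R = {("a", 1): Fraction(1, 2), ("a", 2): Fraction(1, 3), ("b", 1): Fraction(1, 4)}
S = {("a", 7): Fraction(1, 5), ("b", 8): Fraction(2, 3), ("b", 9): Fraction(1, 2)}
assert safe_plan(R, S) == brute_force(R, S)
print(safe_plan(R, S))
```

The plan touches each tuple a constant number of times per x value, whereas the brute-force check enumerates 2^(number of tuples) worlds.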

Proof: Part II
Non-hierarchical ⇒ #P-hard
If Q is not hierarchical, then there exist x, y such that at(x) ⊈ at(y), at(x) ⊉ at(y), and at(x) ∩ at(y) ≠ ∅.
Hence there exist atoms R, S, T with R ∈ at(x) - at(y), S ∈ at(x) ∩ at(y), T ∈ at(y) - at(x):
Q = …, R(x, …), S(x,y,…), T(y,…), …
…and we use a reduction from H_0.

Summary of the Complexity
Safe queries are in PTIME.
Unsafe queries are #P-complete.
Small Dichotomy Theorem: classifies every query into one of these two classes, but only applies to Conjunctive Queries without Self-joins.
Big Dichotomy Theorem: applies to all Unions of Conjunctive Queries (discussed next).