CSE 636 Data Integration Conjunctive Queries Containment Mappings / Canonical Databases Slides by Jeffrey D. Ullman Fall 2006.

Presentation transcript:


2 Conjunctive Queries (CQ) A CQ is a single Datalog rule, with all subgoals assumed to be EDB. Meaning of a CQ is the mapping from databases (the EDB) to the relation produced for the head predicate by applying that rule to the EDB.

3 Containment of CQ’s Q1 ⊆ Q2 iff for all databases D, Q1(D) ⊆ Q2(D). Example:
– Q1: p(X,Y) :- arc(X,Z) & arc(Z,Y)
– Q2: p(X,Y) :- arc(X,Z) & arc(W,Y)
DB is a graph; Q1 produces paths of length 2, Q2 produces pairs of nodes with an arc out and in, respectively.

4 Example - Continued Whenever there is a path from X to Y, it must be that X has an arc out, and Y an arc in. Thus, every fact (tuple) produced by Q1 is also produced by Q2. That is, Q1 ⊆ Q2.
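To make the example concrete, the sketch below evaluates both queries over one small graph. The `evaluate` helper, the tuple encoding of atoms, and the sample arcs are assumptions made for illustration and do not come from the slides. Agreement on a single database illustrates the containment but does not prove it.

```python
from itertools import product

def evaluate(head_args, body, db):
    """Naively apply one CQ to an EDB.

    body is a list of (predicate, args) atoms whose args are all variables;
    db maps each predicate name to a set of tuples. Every head variable is
    assumed to occur in the body.
    """
    variables = sorted({a for _, args in body for a in args})
    domain = sorted({v for rel in db.values() for t in rel for v in t})
    answers = set()
    # Try every assignment of domain values to the variables.
    for values in product(domain, repeat=len(variables)):
        s = dict(zip(variables, values))
        if all(tuple(s[a] for a in args) in db.get(pred, set())
               for pred, args in body):
            answers.add(tuple(s[a] for a in head_args))
    return answers

# Hypothetical sample graph: 1 -> 2 -> 3.
db = {"arc": {(1, 2), (2, 3)}}
q1 = (("X", "Y"), [("arc", ("X", "Z")), ("arc", ("Z", "Y"))])
q2 = (("X", "Y"), [("arc", ("X", "Z")), ("arc", ("W", "Y"))])

q1_result = evaluate(*q1, db)
q2_result = evaluate(*q2, db)
print(q1_result)               # {(1, 3)}
print(q1_result <= q2_result)  # True
```

The only length-2 path is 1 to 3, and node 1 has an arc out while node 3 has an arc in, so the pair also appears in the Q2 result.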

5 Why Care About CQ Containment? Important optimization: if we can break a query into terms that are CQ’s, we can eliminate those terms contained in another. Especially important when we deal with integration of information: CQ containment is almost the only way to tell what information from sources we don’t need.

6 Why Care? - Continued Containment tests imply equivalence-of-programs tests.
– Any theory of program (query) design or optimization requires us to know when programs are equivalent.
– CQ’s, and some generalizations to be discussed, are the most powerful class of programs for which equivalence is known to be decidable.

7 Why Care? - Concluded Although CQ theory first appeared at a database conference, the AI community has taken CQ’s to heart. CQ’s, or similar logics like description logic, are used in a number of AI applications. –Again, their design theory is really containment and equivalence.

8 Testing Containment Two approaches:
1. Containment mappings.
2. Canonical databases.
Really the same in the simple CQ case covered so far. Containment is NP-complete, but CQ’s tend to be small so here is one case where intractability doesn’t hurt you.

9 Containment Mappings A mapping from the variables of CQ Q2 to the variables of CQ Q1, such that:
1. The head of Q2 is mapped to the head of Q1.
2. Each subgoal of Q2 is mapped to some subgoal of Q1 with the same predicate.
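The two conditions can be checked by brute force over all variable assignments. The following is a minimal sketch, assuming queries are encoded as (head arguments, body) pairs that share a head predicate and contain no constants; `find_containment_mapping` is a hypothetical helper name, and the test queries are the arc example from slide 3.

```python
from itertools import product

def find_containment_mapping(q_from, q_to):
    """Search for a containment mapping from q_from to q_to, or return None."""
    head_from, body_from = q_from
    head_to, body_to = q_to
    vars_from = sorted(set(head_from) | {a for _, args in body_from for a in args})
    vars_to = sorted(set(head_to) | {a for _, args in body_to for a in args})
    targets = {(pred, tuple(args)) for pred, args in body_to}
    # Enumerate every function from the variables of q_from to those of q_to.
    for image in product(vars_to, repeat=len(vars_from)):
        m = dict(zip(vars_from, image))
        # Condition 1: the head maps to the head.
        head_ok = tuple(m[a] for a in head_from) == tuple(head_to)
        # Condition 2: each subgoal maps to a subgoal with the same predicate.
        body_ok = all((pred, tuple(m[a] for a in args)) in targets
                      for pred, args in body_from)
        if head_ok and body_ok:
            return m
    return None

q1 = (("X", "Y"), [("arc", ("X", "Z")), ("arc", ("Z", "Y"))])
q2 = (("X", "Y"), [("arc", ("X", "Z")), ("arc", ("W", "Y"))])

print(find_containment_mapping(q2, q1))  # {'W': 'Z', 'X': 'X', 'Y': 'Y', 'Z': 'Z'}
print(find_containment_mapping(q1, q2))  # None
```

The search is exponential in the number of variables, which is acceptable for the small queries the slides have in mind.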

10 Important Theorem There is a containment mapping from Q2 to Q1 if and only if Q1 ⊆ Q2. Note that the containment mapping is opposite the containment: it goes from the larger (containing) CQ to the smaller (contained) CQ.

11 Example
Q1: p(X,Y) :- r(X,Z) & g(Z,Z) & r(Z,Y)
Q2: p(A,B) :- r(A,C) & g(C,D) & r(D,B)
[Figure: the graph pattern Q1 looks for, on nodes X, Z, Y, and the pattern Q2 looks for, on nodes A, C, D, B.]
Since C=D is possible, expect Q1 ⊆ Q2.

12 Example - Continued
Q1: p(X,Y) :- r(X,Z) & g(Z,Z) & r(Z,Y)
Q2: p(A,B) :- r(A,C) & g(C,D) & r(D,B)
Containment mapping: m(A)=X; m(B)=Y; m(C)=m(D)=Z.

13 Example - Concluded
Q1: p(X,Y) :- r(X,Z) & g(Z,Z) & r(Z,Y)
Q2: p(A,B) :- r(A,C) & g(C,D) & r(D,B)
No containment mapping from Q1 to Q2.
– g(Z,Z) can only be mapped to g(C,D).
– No other g subgoals in Q2.
– But then Z must map to both C and D, which is impossible.
Thus, Q1 is properly contained in Q2.
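Both claims in this example can be checked mechanically. The sketch below assumes the same tuple encoding of queries used for illustration elsewhere (it is not from the slides); `is_containment_mapping` is a hypothetical helper that verifies a given mapping rather than searching for one.

```python
from itertools import product

def is_containment_mapping(m, q_from, q_to):
    """Check the head and subgoal conditions for a given variable mapping m."""
    head_from, body_from = q_from
    head_to, body_to = q_to
    targets = {(pred, tuple(args)) for pred, args in body_to}
    head_ok = tuple(m[v] for v in head_from) == tuple(head_to)
    body_ok = all((pred, tuple(m[v] for v in args)) in targets
                  for pred, args in body_from)
    return head_ok and body_ok

q1 = (("X", "Y"), [("r", ("X", "Z")), ("g", ("Z", "Z")), ("r", ("Z", "Y"))])
q2 = (("A", "B"), [("r", ("A", "C")), ("g", ("C", "D")), ("r", ("D", "B"))])

# The mapping given on the slide, from Q2 to Q1.
m = {"A": "X", "B": "Y", "C": "Z", "D": "Z"}
forward = is_containment_mapping(m, q2, q1)

# Reverse direction: try all 4**3 maps from {X, Y, Z} to {A, B, C, D}.
reverse = any(
    is_containment_mapping(dict(zip("XYZ", image)), q1, q2)
    for image in product("ABCD", repeat=3)
)
print(forward, reverse)  # True False
```

Every reverse candidate fails on g(Z,Z), because its image has two equal arguments and Q2 contains only g(C,D).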

14 Another Example
Q1: p(X,Y) :- r(X,Y) & g(Y,Z)
Q2: p(A,B) :- r(A,B) & r(A,C)
[Figure: the graph pattern Q1 looks for, on nodes X, Y, Z, and the pattern Q2 looks for, on nodes A, B, C.]

15 Example - Continued
Q1: p(X,Y) :- r(X,Y) & g(Y,Z)
Q2: p(A,B) :- r(A,B) & r(A,C)
Containment mapping: m(A)=X; m(B)=m(C)=Y.
Notice two subgoals can map to one. And not every subgoal need be a target.

16 Example - Concluded
Q1: p(X,Y) :- r(X,Y) & g(Y,Z)
Q2: p(A,B) :- r(A,B) & r(A,C)
No containment mapping from Q1 to Q2.
– g(Y,Z) cannot map anywhere, since there is no g subgoal in Q2.
Thus, Q1 is properly contained in Q2.

17 Proof of Containment-Mapping Theorem First, assume there is a CM m : Q2 → Q1. Let D be any database; we must show that Q1(D) ⊆ Q2(D). Suppose t is a tuple in Q1(D); we must show t is also in Q2(D).

18 Proof - (2) Since t is in Q1(D), there is a substitution s from the variables of Q1 to values that:
1. Makes every subgoal of Q1 a fact in D.
– More precisely, if p(X,Y,…) is a subgoal, then [s(X),s(Y),…] is a tuple in the relation for p.
2. Turns the head of Q1 into t.

19 Proof - (3)
Consider the effect of applying m and then s to Q2.
[Diagram: m maps the head and each subgoal of Q2 to the head and a subgoal of Q1; s then maps the head of Q1 to t and each subgoal of Q1 to a tuple of D.]
s ∘ m maps each subgoal of Q2 to a tuple of D. And the head of Q2 becomes t, proving t is also in Q2(D); i.e., Q1 ⊆ Q2.
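The composition step can be traced on the queries from slide 11. In this sketch the database D and the substitution s are hypothetical values chosen for illustration; m is the containment mapping given on slide 12.

```python
q1 = (("X", "Y"), [("r", ("X", "Z")), ("g", ("Z", "Z")), ("r", ("Z", "Y"))])
q2 = (("A", "B"), [("r", ("A", "C")), ("g", ("C", "D")), ("r", ("D", "B"))])

# Hypothetical database D, and a substitution s witnessing t = (1, 3) in Q1(D).
db = {"r": {(1, 2), (2, 3)}, "g": {(2, 2)}}
s = {"X": 1, "Z": 2, "Y": 3}
m = {"A": "X", "B": "Y", "C": "Z", "D": "Z"}

# Compose: first m (Q2 variables -> Q1 variables), then s (Q1 variables -> values).
s_after_m = {v: s[m[v]] for v in m}

head_q2, body_q2 = q2
# Every subgoal of Q2 becomes a tuple of D under the composed substitution.
subgoals_in_db = all(tuple(s_after_m[a] for a in args) in db[pred]
                     for pred, args in body_q2)
# The head of Q2 becomes the same tuple t that s produced from the head of Q1.
t = tuple(s_after_m[a] for a in head_q2)
print(s_after_m)          # {'A': 1, 'B': 3, 'C': 2, 'D': 2}
print(subgoals_in_db, t)  # True (1, 3)
```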

20 Proof of Converse Now, we must assume Q1 ⊆ Q2, and show there is a containment mapping from Q2 to Q1. Key idea: frozen CQ Q.
1. For each variable of Q, create a corresponding, unique constant.
2. Frozen Q is a DB with one tuple formed from each subgoal of Q, with constants in place of variables.

21 Example: Frozen CQ p(X,Y):- r(X,Z) & g(Z,Z) & r(Z,Y) Let’s use lower-case letters as constants corresponding to variables. Then frozen CQ is: Relation R for predicate r = {(x,z), (z,y)}. Relation G for predicate g = {(z,z)}.
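Freezing is a purely mechanical step. The following is a minimal sketch, assuming the query has no constants and that lowercasing a variable name yields a unique constant, as in the slide's convention; `freeze` and the tuple encoding are names chosen for illustration.

```python
def freeze(head_args, body):
    """Return (frozen database, frozen head) for a CQ without constants."""
    def const(v):
        return v.lower()  # convention from the slide: X -> x
    db = {}
    for pred, args in body:
        # One tuple per subgoal, with constants in place of variables.
        db.setdefault(pred, set()).add(tuple(const(a) for a in args))
    return db, tuple(const(a) for a in head_args)

head = ("X", "Y")
body = [("r", ("X", "Z")), ("g", ("Z", "Z")), ("r", ("Z", "Y"))]
frozen_db, frozen_head = freeze(head, body)
print(frozen_db["r"] == {("x", "z"), ("z", "y")})  # True
print(frozen_db["g"] == {("z", "z")})              # True
print(frozen_head)                                 # ('x', 'y')
```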

22 Converse - (2) Suppose Q1 ⊆ Q2, and let D be the frozen Q1. Claim: Q1(D) contains the frozen head of Q1, that is, the head of Q1 with variables replaced by their corresponding constants.
– Proof: the “freeze” substitution makes all subgoals in D, and makes the head become the frozen head.

23 Converse - (3) Since Q1 ⊆ Q2, the frozen head of Q1 must also be in Q2(D). Thus, there is a mapping s from variables of Q2 to D that turns subgoals of Q2 into tuples of D and turns the head of Q2 into the frozen head of Q1. But tuples of D are frozen subgoals of Q1, so s followed by “unfreeze” is a containment mapping from Q2 to Q1.

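24 In Pictures
Q2: h(X,Y) :- … p(Y,Z) …
Q1: h(U,V) :- … p(A,B) …
[Diagram: s maps h(X,Y) to the frozen head h(u,v) and p(Y,Z) to the tuple p(a,b) of D; freeze maps h(U,V) to h(u,v) and p(A,B) to p(a,b).]
s followed by inverse of freeze maps each subgoal p(Y,Z) of Q2 to a subgoal p(A,B) of Q1 and maps h(X,Y) to h(U,V).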

25 Dual View of CM’s Instead of thinking of a CM as a mapping on variables, think of a CM as a mapping from atoms to atoms. Required conditions:
1. The head must map to the head.
2. Each subgoal maps to a subgoal.
3. As a consequence, no variable is mapped to two different variables.
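The atom-to-atom view suggests a backtracking search: pick a target subgoal for each source subgoal and reject any choice that would send one variable to two different variables. The sketch below is one possible rendering under the assumption of constant-free queries with a shared head predicate; `atom_mapping` is a hypothetical name, and the queries are those of slide 14.

```python
def atom_mapping(q_from, q_to):
    """Map atoms of q_from to atoms of q_to; return the induced variable map or None."""
    head_from, body_from = q_from
    head_to, body_to = q_to

    def extend(m, src_args, dst_args):
        # Extend m so that src_args maps onto dst_args, or return None on conflict.
        if len(src_args) != len(dst_args):
            return None
        m = dict(m)
        for a, b in zip(src_args, dst_args):
            if m.setdefault(a, b) != b:
                return None  # a would map to two different variables
        return m

    def search(i, m):
        if i == len(body_from):
            return m
        pred, args = body_from[i]
        for pred_to, args_to in body_to:
            if pred_to == pred:  # subgoals map only to the same predicate
                m2 = extend(m, args, args_to)
                if m2 is not None:
                    found = search(i + 1, m2)
                    if found is not None:
                        return found
        return None

    start = extend({}, head_from, head_to)  # the head must map to the head
    return None if start is None else search(0, start)

q1 = (("X", "Y"), [("r", ("X", "Y")), ("g", ("Y", "Z"))])
q2 = (("A", "B"), [("r", ("A", "B")), ("r", ("A", "C"))])
print(atom_mapping(q2, q1))  # {'A': 'X', 'B': 'Y', 'C': 'Y'}
print(atom_mapping(q1, q2))  # None
```

Compared with enumerating every variable assignment, this prunes early: the reverse direction fails as soon as g(Y,Z) finds no g subgoal to map to.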

26 Canonical Databases
Replace all variables in Q1 by distinct constants.
Consider the canonical database D_c that contains exactly the tuples corresponding to the subgoals in the “frozen” body of Q1.
Evaluate Q2 on the canonical database.
Q1 is contained in Q2 iff the “frozen head” of Q1 is in Q2(D_c).

27 Why Canonical DB Test Works Let D = frozen body of Q1; h = frozen head of Q1. Theorem: Q1 ⊆ Q2 iff Q2(D) contains h. Proof (only if): Suppose Q2(D) does not contain h. Since Q1(D) surely contains h, it follows that Q1 is not contained in Q2.

28 Proof (if): Suppose Q2(D) contains h. Then there is a mapping from the variables of Q2 to the constants of D that maps:
– The head of Q2 to h.
– Each subgoal of Q2 to a frozen subgoal of Q1.
This mapping, followed by “unfreeze,” is a containment mapping, so Q1 ⊆ Q2.

29 Constants CQ’s are often allowed to have constants in subgoals. –Corresponds to selection in relational algebra. CM’s and CM test are the same, but: 1. A variable can map to one variable or one constant. 2. A constant can only map to itself.
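The two extra rules fit into a brute-force search with little change. The sketch below is an illustration only: the convention that variables are strings starting with an uppercase letter, the helper names, and the two sample queries (Q1 selects on the constant "a", Q2 does not) are all assumptions, not taken from the slides.

```python
from itertools import product

def is_var(term):
    # Assumed convention: variables are strings starting with an uppercase letter.
    return isinstance(term, str) and term[:1].isupper()

def find_mapping_with_constants(q_from, q_to):
    """Search for a containment mapping when subgoals may contain constants."""
    head_from, body_from = q_from
    head_to, body_to = q_to
    terms_from = set(head_from) | {a for _, args in body_from for a in args}
    terms_to = set(head_to) | {a for _, args in body_to for a in args}
    vars_from = sorted(t for t in terms_from if is_var(t))
    targets = {(pred, tuple(args)) for pred, args in body_to}
    # Rule 1: a variable may map to any one variable or constant of q_to.
    for image in product(sorted(terms_to, key=str), repeat=len(vars_from)):
        m = dict(zip(vars_from, image))

        def apply(term):
            # Rule 2: a constant can only map to itself.
            return m[term] if is_var(term) else term

        head_ok = tuple(apply(a) for a in head_from) == tuple(head_to)
        body_ok = all((pred, tuple(apply(a) for a in args)) in targets
                      for pred, args in body_from)
        if head_ok and body_ok:
            return m
    return None

# Hypothetical queries: Q1: p(X) :- r(X, a)   Q2: p(X) :- r(X, Y)
q1 = (("X",), [("r", ("X", "a"))])
q2 = (("X",), [("r", ("X", "Y"))])
print(find_mapping_with_constants(q2, q1))  # {'X': 'X', 'Y': 'a'}
print(find_mapping_with_constants(q1, q2))  # None
```

The mapping from Q2 to Q1 sends the variable Y to the constant a, so Q1 ⊆ Q2; in the other direction the constant a would have to map to the variable Y, which the second rule forbids.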

30 Example
Q2: p(X) :- r(X, Y), r(Y, Z), r(Z, W)
Q1: p(X) :- r(X, Y), r(Y, X)
Here is the test for whether Q2 contains Q1:
– Choose constants X → 0; Y → 1
– Canonical database from Q1 is D = {r(0, 1), r(1, 0)}
– Q2(D) = {p(0), p(1)}
– Since the frozen head of Q1, p(0), is in Q2(D), Q2 contains Q1
Note that the instantiation of Q2 that shows p(0) is in Q2(D) is X → 0; Y → 1; Z → 0; W → 1
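The test in this example can be reproduced end to end. The sketch below assumes constant-free queries encoded as (head arguments, body) pairs; `evaluate` and `canonical_db` are hypothetical helpers, and numbering the sorted variables happens to give the same constants as the slide (X → 0, Y → 1).

```python
from itertools import product

def evaluate(head_args, body, db):
    """Naively apply one CQ (all arguments are variables) to a database."""
    variables = sorted({a for _, args in body for a in args})
    domain = sorted({v for rel in db.values() for t in rel for v in t})
    answers = set()
    for values in product(domain, repeat=len(variables)):
        s = dict(zip(variables, values))
        if all(tuple(s[a] for a in args) in db.get(pred, set())
               for pred, args in body):
            answers.add(tuple(s[a] for a in head_args))
    return answers

def canonical_db(head_args, body):
    """Freeze a CQ: number its variables and turn each subgoal into a tuple."""
    variables = sorted(set(head_args) | {a for _, args in body for a in args})
    frz = {v: i for i, v in enumerate(variables)}  # X -> 0, Y -> 1, ...
    db = {}
    for pred, args in body:
        db.setdefault(pred, set()).add(tuple(frz[a] for a in args))
    return db, tuple(frz[a] for a in head_args)

q2 = (("X",), [("r", ("X", "Y")), ("r", ("Y", "Z")), ("r", ("Z", "W"))])
q1 = (("X",), [("r", ("X", "Y")), ("r", ("Y", "X"))])

d, frozen_head = canonical_db(*q1)
q2_on_d = evaluate(*q2, d)
print(d)                       # {'r': {(0, 1), (1, 0)}}
print(q2_on_d)                 # {(0,), (1,)}
print(frozen_head in q2_on_d)  # True, so Q1 is contained in Q2

# The reverse test: freeze Q2 and evaluate Q1 on it.
d2, frozen_head2 = canonical_db(*q2)
print(frozen_head2 in evaluate(*q1, d2))  # False, so Q2 is not contained in Q1
```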