Containment Mappings, Canonical Databases, Saraiya’s Algorithm

Conjunctive Queries: Containment Mappings, Canonical Databases, Saraiya’s Algorithm

Conjunctive Queries A CQ is a single Datalog rule, with all subgoals assumed to be EDB. Meaning of a CQ is the mapping from databases (the EDB) to the relation produced for the head predicate by applying that rule to the EDB.

Containment of CQ’s Q1 ⊆ Q2 iff for all databases D, Q1(D) ⊆ Q2(D). Example:
Q1: p(X,Y) :- arc(X,Z) & arc(Z,Y)
Q2: p(X,Y) :- arc(X,Z) & arc(W,Y)
DB is a graph; Q1 produces the endpoints of paths of length 2, and Q2 produces pairs (X,Y) where X has an arc out and Y has an arc in.

Example --- Continued Whenever there is a path of length 2 from X to Y, it must be that X has an arc out, and Y an arc in. Thus, every fact (tuple) produced by Q1 is also produced by Q2. That is, Q1 ⊆ Q2.
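The containment in this example can be spot-checked on a concrete graph. The sketch below is a naive evaluator and is not part of the original slides; the function name `evaluate`, the tuple encoding of a CQ, and the sample `arc` relation are assumptions made for illustration.

```python
from itertools import product

def evaluate(head, body, db):
    """Return the set of head tuples a CQ produces on database db.

    head: tuple of variable names; body: list of (predicate, args);
    db: dict mapping a predicate name to a set of tuples.
    """
    variables = sorted({a for _, args in body for a in args})
    domain = sorted({v for rel in db.values() for t in rel for v in t})
    answers = set()
    # Naive evaluation: try every assignment of domain values to variables.
    for values in product(domain, repeat=len(variables)):
        s = dict(zip(variables, values))
        if all(tuple(s[a] for a in args) in db.get(pred, set())
               for pred, args in body):
            answers.add(tuple(s[h] for h in head))
    return answers

# Q1 and Q2 from the example; the arc relation is an invented sample graph.
q1 = (("X", "Y"), [("arc", ("X", "Z")), ("arc", ("Z", "Y"))])
q2 = (("X", "Y"), [("arc", ("X", "Z")), ("arc", ("W", "Y"))])
db = {"arc": {(1, 2), (2, 3), (4, 5)}}

r1 = evaluate(*q1, db)
r2 = evaluate(*q2, db)
print(r1 <= r2)   # subset test on this one database
```

A check like this can refute containment (by finding a tuple in Q1(D) that is missing from Q2(D)) but cannot prove it, since the definition quantifies over all databases.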

Why Care About CQ Containment? Important optimization: if we can break a query into terms that are CQ’s, we can eliminate those terms contained in another. Especially important when we deal with integration of information: CQ containment is almost the only way to tell what information from sources we don’t need.

Why Care? --- Continued Containment tests imply equivalence-of-programs tests. Any theory of program (query) design or optimization requires us to know when programs are equivalent. CQ’s, and some generalizations to be discussed, are the most powerful class of programs for which equivalence is known to be decidable.

Why Care --- Concluded Although CQ theory first appeared at a database conference, the AI community has taken CQ’s to heart. CQ’s, or similar logics like description logic, are used in a number of AI applications. Again --- their design theory is really containment and equivalence.

Testing Containment Two approaches: (1) containment mappings; (2) canonical databases. They are really the same in the simple CQ case covered so far. Containment is NP-complete, but CQ’s tend to be small, so this is one case where intractability doesn’t hurt you.

Containment Mappings A mapping from the variables of CQ Q2 to the variables of CQ Q1, such that: The head of Q2 is mapped to the head of Q1. Each subgoal of Q2 is mapped to some subgoal of Q1 with the same predicate.

Important Theorem There is a containment mapping from Q2 to Q1 if and only if Q1 ⊆ Q2. Note that the containment mapping is opposite the containment: it goes from the larger (containing) CQ to the smaller (contained) CQ.

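Example
Q1: p(X,Y) :- r(X,Z) & g(Z,Z) & r(Z,Y)
Q2: p(A,B) :- r(A,C) & g(C,D) & r(D,B)
[Figure: Q1 looks for an r-arc from X to Z, a g-loop on Z, and an r-arc from Z to Y; Q2 looks for an r-arc from A to C, a g-arc from C to D, and an r-arc from D to B.]
Since C=D is possible, expect Q1 ⊆ Q2.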

Example --- Continued
Q1: p(X,Y) :- r(X,Z) & g(Z,Z) & r(Z,Y)
Q2: p(A,B) :- r(A,C) & g(C,D) & r(D,B)
Containment mapping: m(A)=X; m(B)=Y; m(C)=m(D)=Z.

Example --- Concluded
Q1: p(X,Y) :- r(X,Z) & g(Z,Z) & r(Z,Y)
Q2: p(A,B) :- r(A,C) & g(C,D) & r(D,B)
No containment mapping from Q1 to Q2. g(Z,Z) can only be mapped to g(C,D), since there are no other g subgoals in Q2. But then Z must map to both C and D, which is impossible. Thus, Q1 is properly contained in Q2.
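For CQ's this small, a containment mapping can be found by exhaustive search. The following sketch tries every assignment of the source query's variables to the target query's variables, which is exponential in general and consistent with the NP-completeness noted earlier. The name `find_cm` and the (predicate, arguments) encoding are hypothetical, and constants are not handled.

```python
from itertools import product

def image(m, atom):
    """Apply variable mapping m to an atom (predicate, args)."""
    pred, args = atom
    return (pred, tuple(m[a] for a in args))

def find_cm(q_from, q_to):
    """Return a containment mapping from q_from to q_to as a dict, or None."""
    (head_f, body_f), (head_t, body_t) = q_from, q_to
    vars_f = sorted({a for _, args in [head_f] + body_f for a in args})
    vars_t = sorted({a for _, args in [head_t] + body_t for a in args})
    targets = set(body_t)
    for choice in product(vars_t, repeat=len(vars_f)):
        m = dict(zip(vars_f, choice))
        # Head must map to head; every subgoal must land on some subgoal.
        if image(m, head_f) == head_t and all(image(m, g) in targets
                                              for g in body_f):
            return m
    return None

q1 = (("p", ("X", "Y")),
      [("r", ("X", "Z")), ("g", ("Z", "Z")), ("r", ("Z", "Y"))])
q2 = (("p", ("A", "B")),
      [("r", ("A", "C")), ("g", ("C", "D")), ("r", ("D", "B"))])

print(find_cm(q2, q1))   # a mapping exists, so Q1 is contained in Q2
print(find_cm(q1, q2))   # no mapping in the other direction
```

Because the mapping is built as a function on variables, the "Z must map to both C and D" conflict never arises explicitly; the search simply finds no assignment that sends g(Z,Z) onto g(C,D).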

Another Example
Q1: p(X,Y) :- r(X,Y) & g(Y,Z)
Q2: p(A,B) :- r(A,B) & r(A,C)
[Figure: Q1 looks for an r-arc from X to Y and a g-arc from Y to Z; Q2 looks for r-arcs from A to B and from A to C.]

Example --- Continued
Q1: p(X,Y) :- r(X,Y) & g(Y,Z)
Q2: p(A,B) :- r(A,B) & r(A,C)
Containment mapping: m(A)=X; m(B)=m(C)=Y. Notice that two subgoals of Q2 can map to one subgoal of Q1, and not every subgoal of Q1 need be a target (nothing maps to g(Y,Z)).

Example --- Concluded
Q1: p(X,Y) :- r(X,Y) & g(Y,Z)
Q2: p(A,B) :- r(A,B) & r(A,C)
No containment mapping from Q1 to Q2. g(Y,Z) cannot map anywhere, since there is no g subgoal in Q2. Thus, Q1 is properly contained in Q2.

Proof of Containment-Mapping Theorem --- (1) First, assume there is a CM m : Q2->Q1. Let D be any database; we must show that Q1(D) ⊆ Q2(D). Suppose t is a tuple in Q1(D); we must show t is also in Q2(D).

Proof --- (2) Since t is in Q1(D), there is a substitution s from the variables of Q1 to values that: Makes every subgoal of Q1 a fact in D. More precisely, if p(X,Y,…) is a subgoal, then [s(X),s(Y),…] is a tuple in the relation for p. Turns the head of Q1 into t.

Proof --- (3) Consider the effect of applying m and then s to Q2.
[Figure: m maps the head and each subgoal of Q2 to the head and a subgoal of Q1; s then maps the head of Q1 to t and each subgoal of Q1 to a tuple of D.]
The composition of m followed by s (written sm) maps each subgoal of Q2 to a tuple of D, and the head of Q2 becomes t, proving t is also in Q2(D); i.e., Q1 ⊆ Q2.
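The composition step can be traced on the earlier r/g example. In the sketch below, the mapping `m` is the containment mapping given in that example, while the database `db` and the substitution `s` are invented values chosen so that s makes every subgoal of Q1 true; none of the variable names come from the slides.

```python
# Containment mapping m from Q2 to Q1 (from the earlier example).
m = {"A": "X", "B": "Y", "C": "Z", "D": "Z"}

# Assumed database D and substitution s that satisfies Q1's body in D.
db = {"r": {(1, 2), (2, 3)}, "g": {(2, 2)}}
s = {"X": 1, "Y": 3, "Z": 2}

q2_head = ("p", ("A", "B"))
q2_body = [("r", ("A", "C")), ("g", ("C", "D")), ("r", ("D", "B"))]

# Apply m first, then s: this is the composed substitution on Q2's variables.
sm = {v: s[m[v]] for v in m}

# Every subgoal of Q2 becomes a tuple of D under the composition...
facts = [(pred, tuple(sm[a] for a in args)) for pred, args in q2_body]
all_in_db = all(tup in db[pred] for pred, tup in facts)

# ...and the head of Q2 becomes the same tuple t that s gives Q1's head.
head_tuple = tuple(sm[a] for a in q2_head[1])
print(all_in_db, head_tuple)
```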

Proof of Converse --- (1) Now, we must assume Q1 ⊆ Q2, and show there is a containment mapping from Q2 to Q1. Key idea: the frozen CQ Q. For each variable of Q, create a corresponding, unique constant. Frozen Q is a DB with one tuple formed from each subgoal of Q, with constants in place of variables.

Example: Frozen CQ p(X,Y):-r(X,Z) & g(Z,Z) & r(Z,Y) Let’s use lower-case letters as constants corresponding to variables. Then frozen CQ is: Relation R for predicate r = {(x,z), (z,y)}. Relation G for predicate g = {(z,z)}.
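Freezing is mechanical enough to write down directly. This is a minimal sketch assuming variables are upper-case strings and that lower-casing a variable name yields its unique constant, as in the example above; the function name `freeze` and the data encoding are illustrative choices.

```python
def freeze(head, body):
    """Return (frozen head tuple, canonical database) for a CQ.

    head and each subgoal are (predicate, args) pairs. Each variable is
    replaced by its lower-case name, which serves as its unique constant.
    """
    db = {}
    for pred, args in body:
        # One tuple per subgoal, stored in the relation for its predicate.
        db.setdefault(pred, set()).add(tuple(a.lower() for a in args))
    frozen_head = tuple(a.lower() for a in head[1])
    return frozen_head, db

q = (("p", ("X", "Y")),
     [("r", ("X", "Z")), ("g", ("Z", "Z")), ("r", ("Z", "Y"))])

frozen_head, frozen_db = freeze(*q)
print(frozen_head)
print(frozen_db)
```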

Converse --- (2) Suppose Q1 ⊆ Q2, and let D be the frozen Q1. Claim: Q1(D) contains the frozen head of Q1, that is, the head of Q1 with variables replaced by their corresponding constants. Proof: the “freeze” substitution turns every subgoal of Q1 into a tuple of D, and makes the head become the frozen head.

Converse --- (3) Since Q1 ⊆ Q2, the frozen head of Q1 must also be in Q2(D). Thus, there is a mapping s from variables of Q2 to D that turns subgoals of Q2 into tuples of D and turns the head of Q2 into the frozen head of Q1. But tuples of D are frozen subgoals of Q1, so s followed by “unfreeze” is a containment mapping from Q2 to Q1.

In Pictures
Q2: h(X,Y) :- … p(Y,Z) …
Q1: h(U,V) :- … p(A,B) …
[Figure: s maps h(X,Y) to h(u,v) and p(Y,Z) to the tuple p(a,b) of D; freeze maps h(U,V) to h(u,v) and p(A,B) to p(a,b).]
s followed by the inverse of freeze maps each subgoal p(Y,Z) of Q2 to a subgoal p(A,B) of Q1 and maps h(X,Y) to h(U,V).

Dual View of CM’s Instead of thinking of a CM as a mapping on variables, think of a CM as a mapping from atoms to atoms. Required conditions: The head must map to the head. Each subgoal maps to a subgoal. As a consequence, no variable is mapped to two different variables.

Canonical Databases General idea: test Q1 ⊆ Q2 by checking that Q1(D1) ⊆ Q2(D1),…, Q1(Dn) ⊆ Q2(Dn), where D1,…,Dn are the canonical databases. For the standard CQ case, we only need one canonical DB: the frozen Q1. But in more general forms of queries, larger sets of canonical DB’s are needed.

Why Canonical DB Test Works Let D = frozen body of Q1; h = frozen head of Q1. Theorem: Q1 ⊆ Q2 iff Q2(D) contains h. Proof (only if): Suppose Q2(D) does not contain h. Since Q1(D) surely contains h, it follows that Q1 is not contained in Q2.

Proof (if): Suppose Q2(D) contains h. Then there is a mapping from the variables of Q2 to the constants of D that maps: The head of Q2 to h. Each subgoal of Q2 to a frozen subgoal of Q1. This mapping, followed by “unfreeze,” is a containment mapping, so Q1 ⊆ Q2.
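The theorem turns containment into a single query evaluation. Below is a hedged sketch of that test for constant-free CQ's: it freezes Q1, evaluates Q2 on the result by extending partial substitutions one subgoal at a time, and looks for the frozen head. The helper names (`freeze`, `answers`, `contained_in`) and the lower-casing convention for constants are assumptions, not part of the slides.

```python
def freeze(head, body):
    """Frozen head and canonical DB: each variable becomes its lower-case name."""
    db = {}
    for pred, args in body:
        db.setdefault(pred, set()).add(tuple(a.lower() for a in args))
    return tuple(a.lower() for a in head[1]), db

def answers(head, body, db):
    """Evaluate a CQ on db by growing substitutions subgoal by subgoal."""
    subs = [{}]
    for pred, args in body:
        extended = []
        for s in subs:
            for tup in db.get(pred, ()):
                t = dict(s)
                # Bind each variable, rejecting tuples that clash with
                # an earlier binding of the same variable.
                if all(t.setdefault(a, v) == v for a, v in zip(args, tup)):
                    extended.append(t)
        subs = extended
    return {tuple(s[a] for a in head[1]) for s in subs}

def contained_in(q1, q2):
    """Canonical-database test: Q1 is contained in Q2 iff h is in Q2(D)."""
    h, d = freeze(*q1)
    return h in answers(*q2, d)

q1 = (("p", ("X", "Y")), [("r", ("X", "Y")), ("g", ("Y", "Z"))])
q2 = (("p", ("A", "B")), [("r", ("A", "B")), ("r", ("A", "C"))])

print(contained_in(q1, q2))
print(contained_in(q2, q1))
```

The substitution that produces h in `answers` is exactly the mapping the proof "unfreezes" into a containment mapping, so this test and the containment-mapping search do the same work in different clothing.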

Constants CQ’s are often allowed to have constants in subgoals. Corresponds to selection in relational algebra. CM’s and CM test are the same, but: A variable can map to one variable or one constant. A constant can only map to itself.

Example
Q2: p(X) :- e(X,Y)
Q1: p(A) :- e(A,10)
The CM from Q2 to Q1 maps X->A and Y->10. Thus, Q1 ⊆ Q2. A CM from Q1 to Q2 would have to map constant 10 to variable Y; hence no such mapping exists.
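The two extra rules for constants fit naturally into a backtracking search. In this sketch, a term counts as a variable if it is a string starting with an upper-case letter and as a constant otherwise; that convention, along with the names `is_var`, `extend`, and `find_cm`, is an assumption for illustration.

```python
def is_var(term):
    """Assumed convention: variables are capitalized strings."""
    return isinstance(term, str) and term[:1].isupper()

def extend(m, src, dst):
    """Extend mapping m so that atom src maps onto atom dst, or return None."""
    if src[0] != dst[0] or len(src[1]) != len(dst[1]):
        return None
    m = dict(m)
    for a, b in zip(src[1], dst[1]):
        if is_var(a):
            # A variable maps to exactly one variable or constant.
            if m.setdefault(a, b) != b:
                return None
        elif a != b:
            # A constant can only map to itself.
            return None
    return m

def find_cm(q_from, q_to):
    """Backtracking search for a containment mapping from q_from to q_to."""
    (head_f, body_f), (head_t, body_t) = q_from, q_to

    def search(m, goals):
        if m is None:
            return None
        if not goals:
            return m
        for target in body_t:
            found = search(extend(m, goals[0], target), goals[1:])
            if found is not None:
                return found
        return None

    return search(extend({}, head_f, head_t), body_f)

q2 = (("p", ("X",)), [("e", ("X", "Y"))])
q1 = (("p", ("A",)), [("e", ("A", 10))])

print(find_cm(q2, q1))   # Y is sent to the constant 10
print(find_cm(q1, q2))   # fails: 10 cannot map to the variable Y
```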

Saraiya’s Algorithm Containment of CQ’s is NP-complete. But Saraiya’s algorithm is a linear-time test for the common situation where Q1 (the contained query) has no more than two subgoals with any one predicate. Reduction to 2SAT. We’ll give a simple, quadratic version.

Saraiya’s Algorithm --- (2) For any subgoal p(…) of Q2, where there is only one p-subgoal of Q1, we know exactly where p(…) must map. If there is a subgoal of Q2 that can map to two different subgoals of Q1, assume one choice, and chase down the “consequences.”

Consequences If p(X1,…,Xn) is known to map to p(Y1,…,Yn), then we know each variable Xi maps to Yi. If p(X1,…,Xn) is a subgoal of Q2, and we know Xi maps to some variable Z, and only one of the p-subgoals of Q1 has Z in the ith component, then p(X1,…,Xn) must map to that subgoal.

Saraiya’s Algorithm --- (3) Eventually, one of two things happens: (1) We derive a contradiction: a subgoal or variable that must map to two different things. (2) We close the set of inferences: there is no contradiction, and no more consequences.

Case (1): Contradiction In this case, we go back and try the other choice if there is one, and fail if there is no other choice.

Case (2): Closure In this case, we have found some variables and subgoals of Q2 that can be mapped as chosen, with no effect on any remaining subgoals or variables. Fix these choices, and consider any remaining subgoals. If all subgoals are now mapped, we have found a CM and are done.

Example
Q2: p(X) :- a(X,Y) & b(Y,Z) & b(Z,W) & a(W,X)
Q1: p(B) :- a(A,B) & a(B,A) & b(A,C) & b(C,B)
Start by choosing a(X,Y) -> a(A,B). Then X->A and Y->B. Now, b(Y,Z) must map to some b(B,?). But neither b-subgoal of Q1 has first component B, so this choice leads to a contradiction.

Example --- Continued
Q2: p(X) :- a(X,Y) & b(Y,Z) & b(Z,W) & a(W,X)
Q1: p(B) :- a(A,B) & a(B,A) & b(A,C) & b(C,B)
We thus know that in any CM, a(X,Y) maps to a(B,A). Thus, X->B and Y->A. Then b(Y,Z) must map to b(A,C), and Z->C. Thus, b(Z,W) -> b(C,B), and W->B. But now a(W,X) cannot map to a(A,B) [W doesn’t map to A] or to a(B,A) [X doesn’t map to A]. Complete failure: there is no CM from Q2 to Q1.

Example --- Slight Variation
Q2: p(X) :- a(X,Y) & b(Y,Z) & b(Z,W) & a(W,X)
Q1: p(B) :- a(A,B) & a(B,A) & b(A,C) & b(C,A)
We thus know that in any CM, a(X,Y) maps to a(B,A). Thus, X->B and Y->A. Then b(Y,Z) must map to b(A,C), and Z->C. Thus, b(Z,W) -> b(C,A), and W->A. Now, a(W,X) -> a(A,B), and there are no more consequences. We have a CM.
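The choose-and-chase procedure can be sketched in code and replayed on both versions of this example. What follows is an illustrative reading of the simple version described in these slides, not Saraiya's published linear-time algorithm: it starts from the head mapping, repeatedly discards candidate targets that clash with known variable bindings, and, when a subgoal still has two candidates after closure, tries each in turn. All function names are hypothetical, constants are not handled, and the sketch refuses inputs where some predicate has more than two candidate subgoals in Q1.

```python
def match(m, src, dst):
    """Extend mapping m so atom src maps onto atom dst; None on a clash."""
    if src[0] != dst[0] or len(src[1]) != len(dst[1]):
        return None
    m = dict(m)
    for a, b in zip(src[1], dst[1]):
        if m.setdefault(a, b) != b:
            return None
    return m

def chase(m, cands, body2):
    """Apply the consequence rules to closure; None signals a contradiction."""
    cands = dict(cands)
    changed = True
    while changed:
        changed = False
        for g in body2:
            live = [c for c in cands[g] if match(m, g, c) is not None]
            if not live:
                return None                  # g has nowhere left to map
            if len(live) == 1:               # forced: bind g's variables
                new_m = match(m, g, live[0])
                if new_m != m:
                    m, changed = new_m, True
            if live != cands[g]:
                cands[g], changed = live, True
    return m, cands

def saraiya_cm(q2, q1):
    """Return a CM from q2 to q1, or None (assumes <= 2 candidates each)."""
    (head2, body2), (head1, body1) = q2, q1
    cands = {g: [c for c in body1 if c[0] == g[0]] for g in body2}
    if any(len(c) > 2 for c in cands.values()):
        raise ValueError("sketch assumes at most two subgoals per predicate")
    m = match({}, head2, head1)
    state = chase(m, cands, body2) if m is not None else None
    while state is not None:
        m, cands = state
        open_goals = [g for g in body2 if len(cands[g]) == 2]
        if not open_goals:
            return m                         # every subgoal is mapped
        g, state = open_goals[0], None
        for choice in cands[g]:              # Case (1): on failure, try other
            state = chase(m, {**cands, g: [choice]}, body2)
            if state is not None:
                break                        # Case (2): closure, fix choices
    return None

q2 = (("p", ("X",)), [("a", ("X", "Y")), ("b", ("Y", "Z")),
                      ("b", ("Z", "W")), ("a", ("W", "X"))])
q1 = (("p", ("B",)), [("a", ("A", "B")), ("a", ("B", "A")),
                      ("b", ("A", "C")), ("b", ("C", "B"))])
q1_variant = (("p", ("B",)), [("a", ("A", "B")), ("a", ("B", "A")),
                              ("b", ("A", "C")), ("b", ("C", "A"))])

print(saraiya_cm(q2, q1))
print(saraiya_cm(q2, q1_variant))
```

Unlike the slides, this sketch maps the head first, so X->B is known before any subgoal is examined and the wrong choice a(X,Y) -> a(A,B) is eliminated immediately instead of being tried and retracted.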