Answering Queries Using Views LMSS’95 Laks V.S. Lakshmanan Dept. of Comp. Science UBC.



Problem & Motivation
Given a query Q: p ← q1, …, qm and a set of (materialized) views v1, …, vn (presumably defined in terms of the q's and/or other base relations), can we answer Q using the views? Can we do so while removing the maximal set of redundant subgoals (q's) from the rewrite?
Why care?
- The q's may be purely logical, not stored anywhere (information integration).
- The q's may be expensive to access (data warehousing).
- Views offer an additional option for classical query optimization (QO).

Key contributions
- Main focus: CQs, possibly with builtins.
- Characterization of when Q is answerable using the views.
- Intractability results.
- Sufficient condition (=> PTIME algorithm) for special cases.
- Investigation of the effect of builtins.

A Motivating Example [Halevy01]
DB schema:
Prof(name, area), Course(c-number, title),
Teaches(prof, c-number, quarter, evaluation),
Registered(student, c-number, quarter), Major(student, dept),
WorksIn(prof, dept), Advises(prof, student).
Q:
select R.student, C.title
from Teaches T, Prof P, Registered R, Course C
where P.name=T.prof and T.c-number=R.c-number and T.quarter=R.quarter
  and R.c-number=C.c-number and C.c-number ≥ 500 and P.area="DB".

Motivating Example (contd.)
Materialized view:
create view Graduate as
select R.student, C.title, C.c-number, R.quarter
from Registered R, Course C
where R.c-number=C.c-number and C.c-number ≥ …
Q can be answered (rewritten) using the view as:
select G.student, G.title
from Teaches T, Prof P, Graduate G
where P.name=T.prof and T.c-number=G.c-number and T.quarter=G.quarter
  and G.c-number ≥ 500 and P.area="DB".

Motivating Example (contd.)
Why does it work?
- V contains the attributes required for the output.
- The conditions imposed in Q are stronger than those imposed in V.
- V contains the attributes needed to enforce the remaining conditions in Q.
We'll look at the underlying theory next.
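
The equivalence of Q and its rewriting can be checked on a toy instance. The sketch below uses Python's sqlite3 module. The sample rows are made up, the columns are named c_number in place of c-number, and the view threshold of 400 is an assumption, since the transcript does not preserve the constant in the definition of Graduate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Prof(name TEXT, area TEXT);
CREATE TABLE Course(c_number INTEGER, title TEXT);
CREATE TABLE Teaches(prof TEXT, c_number INTEGER, quarter TEXT, evaluation TEXT);
CREATE TABLE Registered(student TEXT, c_number INTEGER, quarter TEXT);

-- Hypothetical sample data.
INSERT INTO Prof VALUES ('ann', 'DB'), ('bob', 'AI');
INSERT INTO Course VALUES (504, 'Data Management'), (450, 'Intro DB'), (540, 'ML');
INSERT INTO Teaches VALUES ('ann', 504, 'F15', 'good'),
                           ('ann', 450, 'F15', 'ok'),
                           ('bob', 540, 'F15', 'good');
INSERT INTO Registered VALUES ('sue', 504, 'F15'), ('tom', 450, 'F15'),
                              ('uma', 540, 'F15'), ('vic', 504, 'W16');

-- The threshold 400 is an assumed value (any constant <= 500 would do).
CREATE VIEW Graduate AS
  SELECT R.student AS student, C.title AS title,
         C.c_number AS c_number, R.quarter AS quarter
  FROM Registered R, Course C
  WHERE R.c_number = C.c_number AND C.c_number >= 400;
""")

# The original query Q over the base relations.
original = sorted(conn.execute("""
  SELECT R.student, C.title
  FROM Teaches T, Prof P, Registered R, Course C
  WHERE P.name = T.prof AND T.c_number = R.c_number AND T.quarter = R.quarter
    AND R.c_number = C.c_number AND C.c_number >= 500 AND P.area = 'DB'
""").fetchall())

# The rewriting of Q that reads Graduate instead of Registered and Course.
rewritten = sorted(conn.execute("""
  SELECT G.student, G.title
  FROM Teaches T, Prof P, Graduate G
  WHERE P.name = T.prof AND T.c_number = G.c_number AND T.quarter = G.quarter
    AND G.c_number >= 500 AND P.area = 'DB'
""").fetchall())

print(original, rewritten)
```

Agreement on one instance is only a sanity check; equivalence on all databases is what the theory in the following slides establishes.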

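An example (1)
[Figure: query Q drawn over variables X, Y, Z, W, U with predicates p, p0, p1, p2; view V over variables A, B, C, D; and Q redrawn with v in place of some of its subgoals.]
Not leveraged. Is it leverageable?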

Some definitions
Def.: Given a CQ Q and conjunctive views v1, …, vn, a CQ Q' is a rewriting of Q provided:
- Q' is equivalent to Q.
- Q' contains one or more v's. (Revisit the motivating example.)
Note:
- Only "pure" CQs for now.
- When builtins are allowed, we may lose closure.

More definitions
Def.: A rewriting Q' is locally minimal if no subgoal can be removed without losing equivalence. It is globally minimal if it has the fewest (database) subgoals among all rewritings. It is complete if it contains only v's and builtins.
Which is the easier objective? Why count only DB subgoals? Should we always remove subgoals?

Another example (2)
Q: p(X,Y) ← a(X,Z), b(X,W), a(Y,Z).
V: v(A) ← a(A,B), b(A,C).
Q': p(X,Y) ← v(X), a(X,Z), b(X,W), a(Y,Z).
No subgoals are removed; yet v() acts as a filter and cuts down the size of the intermediate result early.
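
Example (2) can be tried out with a small nested-loop evaluator for conjunctive queries. This is a minimal sketch, not an algorithm from the slides: the function eval_cq, the encoding of a body as (relation, arguments) pairs, and the sample tuples are all assumptions made for illustration, and every argument is treated as a variable.

```python
def eval_cq(head, body, db):
    """Evaluate a pure conjunctive query by nested-loop joins.

    head: tuple of variable names to output.
    body: list of (relation_name, tuple_of_variables).
    db:   dict mapping relation_name -> set of tuples.
    """
    bindings = [{}]
    for rel, args in body:
        extended = []
        for binding in bindings:
            for tup in db[rel]:
                trial = dict(binding)
                # Bind each variable, or check it against its earlier binding.
                if all(trial.setdefault(a, val) == val for a, val in zip(args, tup)):
                    extended.append(trial)
        bindings = extended
    return {tuple(b[v] for v in head) for b in bindings}


# Hypothetical base relations.
db = {
    "a": {(1, 10), (2, 10), (3, 11)},
    "b": {(1, 7)},
}

# Materialize the view v(A) <- a(A,B), b(A,C).
db["v"] = eval_cq(("A",), [("a", ("A", "B")), ("b", ("A", "C"))], db)

q_body = [("a", ("X", "Z")), ("b", ("X", "W")), ("a", ("Y", "Z"))]
q_prime_body = [("v", ("X",))] + q_body  # v(X) placed first, acting as a filter

answers_q = eval_cq(("X", "Y"), q_body, db)
answers_q_prime = eval_cq(("X", "Y"), q_prime_body, db)
print(answers_q, answers_q_prime)
```

Because v(X) is joined first, only bindings for X that survive the view are carried into the remaining joins, which is the filtering effect the slide describes.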

Example of complete rewriting (3)
Revisit example (1). Add another view v'(A,B) ← p1(A,C), p2(C,B), p0(H,K).
Q'': p(X,U) ← v(X,Z), v'(X,U).
Note:
- Even without p0, the complete rewriting works.
- The presence of p0 shows that sometimes we have to do "parallel" reasoning.

First Key Proposition
Prop.: Let Q and V be CQs. Q is rewritable using V iff on every DB D, whenever V(D) is empty, so is Q(D).
Intuition: head(Q) ← body(Q) & body(V) is a legal rewriting, in some sense the "loosest" one. This rewriting is equivalent to Q iff the above property holds. □
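
For pure CQs, the condition in the proposition (Q(D) nonempty forces V(D) nonempty) holds exactly when body(V) can be mapped homomorphically into body(Q), by the usual containment-mapping argument applied to the Boolean versions of the two queries. The backtracking search below is a sketch under that reading. The function name find_mapping and the encoding of bodies are assumptions, all arguments are treated as variables, and the search is exponential in the worst case.

```python
def find_mapping(src_body, dst_body, mapping=None):
    """Search for a homomorphism from src_body into dst_body.

    Both bodies are lists of (relation_name, tuple_of_variables).
    Returns a dict from source variables to target variables, or None.
    """
    mapping = dict(mapping or {})
    if not src_body:
        return mapping
    (rel, args), rest = src_body[0], src_body[1:]
    for rel2, args2 in dst_body:
        if rel2 != rel or len(args2) != len(args):
            continue
        trial = dict(mapping)
        # Extend the mapping consistently, or skip this target subgoal.
        if all(trial.setdefault(a, t) == t for a, t in zip(args, args2)):
            result = find_mapping(rest, dst_body, trial)
            if result is not None:
                return result
    return None


# Q and V from example (2).
q_body = [("a", ("X", "Z")), ("b", ("X", "W")), ("a", ("Y", "Z"))]
v_body = [("a", ("A", "B")), ("b", ("A", "C"))]

usable = find_mapping(v_body, q_body)

# A hypothetical view over a relation c that Q never mentions.
unusable = find_mapping([("c", ("A",))], q_body)
print(usable, unusable)
```

A mapping found this way also shows how to instantiate the view's head when it is added to the body of Q, as in Q' of example (2).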

Some complexity remarks
CQ containment is NP-complete even without builtins.
Previous prop. ==> when Q is a CQ [+ builtins] and V is a CQ, testing the existence of a rewriting is NP-complete.
When both can contain builtins, the complexity rises to Π^p_2-complete.

Key Result
Lemma 3.3: Let Q be a CQ and V a set of conjunctive views, all without builtins. Then:
1. If Q' is a locally minimal rewriting of Q using V, then the set of DB subgoals in Q' is isomorphic to a subset of such subgoals in Q.
2. For every rewriting, ∃ a corresponding rewriting which introduces no new variables.
3. Similar results hold when builtins are present, except that a union of CQs is needed in the rewrite.

Proof: Main ideas
Q: p ← q1, …, qm.
Views: v1 ← …, …, vk ← ….
Q': p ← qi, …, qj, vl, …, vn.
Q'': p ← qi, …, qj, q's, …, q's (Q' with each view subgoal replaced by its body).
[Diagram: mappings between Q'' and Q.]
Assume wlog that Q has no redundant subgoals; they can always be thrown out.

Proof (contd.)
Q'': p ← qi, …, qj, q's, …, q's.
Q: p ← q1, …, qm.
Let φ be a containment mapping from Q to Q'' and ψ a containment mapping from Q'' to Q.
C = core of body of Q; S = φ(C); C' = ψ(S).
C' must be isomorphic to C. (Why?) ==> φ is an isomorphism from C to S.

Proof (contd.)
If Q' contains DB subgoals not isomorphic to some subgoal in S, remove them. Q' remains equivalent to Q. (Why?)
So every DB subgoal in Q' is isomorphic to some subgoal in S, i.e., to some subgoal in C (aka Q).
So Q' has no DB subgoal outside of Q.
Proof of 2 and 3: similar reasoning. □

How large is a minimal rewrite?
Lemma 3.5: A locally minimal complete rewriting contains at most as many subgoals as there are in the body of Q.
Proof:
Q: p ← q1, …, qm.
Q': p ← v1, …, vn (where n > m).
Q'': p ← exp(v1), …, exp(vn).
Q and Q'' are equivalent. Consider a containment mapping from Q to Q''. Each qi maps to (some subgoal inside) exactly one exp(vj). By the pigeonhole principle, nothing in exp(vk) is mapped to, for some vk. Remove that vk, yielding a smaller rewrite than Q'. Contradiction. □
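
The expansion step used in the proof, replacing each view subgoal v by its body exp(v), can be sketched as follows. The function unfold, the encoding of views as (head variables, body) pairs, and the renaming scheme for existential variables (a numeric suffix per view occurrence) are illustrative assumptions. Variables are taken to be strings starting with an uppercase letter.

```python
import itertools


def is_var(term):
    """Variables are strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def unfold(body, views):
    """Replace every view subgoal in body by the view's definition.

    body:  list of (relation_name, tuple_of_terms).
    views: dict view_name -> (head_variables, view_body).
    Head variables are substituted by the subgoal's arguments; the other
    (existential) variables get a fresh suffix per view occurrence.
    """
    counter = itertools.count(1)
    expanded = []
    for rel, args in body:
        if rel not in views:
            expanded.append((rel, args))
            continue
        head_vars, view_body = views[rel]
        n = next(counter)
        subst = dict(zip(head_vars, args))
        for r, terms in view_body:
            renamed = tuple(
                subst.get(t, f"{t}_{n}") if is_var(t) else t for t in terms
            )
            expanded.append((r, renamed))
    return expanded


# The view and rewriting Q' from example (2).
views = {"v": (("A",), [("a", ("A", "B")), ("b", ("A", "C"))])}
q_prime_body = [("v", ("X",)), ("a", ("X", "Z")), ("b", ("X", "W")), ("a", ("Y", "Z"))]

q_double_prime_body = unfold(q_prime_body, views)
print(q_double_prime_body)
```

The unfolded body mentions base relations only, so it can be compared with Q by ordinary containment mappings, which is how the proof argues about Q''.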

Answering queries using views in general
When builtins are allowed, the complexity goes up.
In general, we need to consider a union of CQs in the rewriting.
Finding a (locally) minimal rewriting: several efficient algorithms have been developed. One of the best known is the MiniCon algorithm (Rachel's thesis).

Example illustrating subtleties
source1(E,P,M) ← emp(E), phone(E,P), mgr(E,M).
source2(E,O,D) ← emp(E), office(E,O), dept(E,D).
source3(E,P) ← emp(E), phone(E,P), dept(E,'toy').
Q: q1(O,P) ← phone(E,P), office(E,O).
q1(O,P) ← source1(E,P,M), source2(E,O,D).
- Locally minimal.
q1(O,P) ← source3(E,P), source2(E,O,D).
- Another locally minimal rewriting.
Neither is equivalent to Q, so each is a contained rewriting.
Equivalent rewritings: for classical QO.
(Maximally) contained rewritings: for data integration.
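
The gap between Q and its two contained rewritings shows up on a small instance. The sketch below is illustrative only: the evaluator eval_cq, the sample employees, and all data values are made up, and variables are taken to be strings starting with an uppercase letter so that 'toy' is read as a constant.

```python
def is_var(term):
    """Variables are strings starting with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def eval_cq(head, body, db):
    """Nested-loop evaluation of a conjunctive query with constants."""
    bindings = [{}]
    for rel, args in body:
        extended = []
        for binding in bindings:
            for tup in db[rel]:
                trial = dict(binding)
                ok = True
                for term, val in zip(args, tup):
                    if is_var(term):
                        if trial.setdefault(term, val) != val:
                            ok = False
                            break
                    elif term != val:  # constant must match the stored value
                        ok = False
                        break
                if ok:
                    extended.append(trial)
        bindings = extended
    return {tuple(b[v] for v in head) for b in bindings}


# Hypothetical base data: e3 has no dept, and only e1 has a manager.
db = {
    "emp": {("e1",), ("e2",), ("e3",)},
    "phone": {("e1", "555-1"), ("e2", "555-2"), ("e3", "555-3")},
    "office": {("e1", "o1"), ("e2", "o2"), ("e3", "o3")},
    "mgr": {("e1", "m1")},
    "dept": {("e1", "toy"), ("e2", "toy")},
}

# Materialize the three sources from their definitions.
db["source1"] = eval_cq(("E", "P", "M"),
                        [("emp", ("E",)), ("phone", ("E", "P")), ("mgr", ("E", "M"))], db)
db["source2"] = eval_cq(("E", "O", "D"),
                        [("emp", ("E",)), ("office", ("E", "O")), ("dept", ("E", "D"))], db)
db["source3"] = eval_cq(("E", "P"),
                        [("emp", ("E",)), ("phone", ("E", "P")), ("dept", ("E", "toy"))], db)

q = eval_cq(("O", "P"), [("phone", ("E", "P")), ("office", ("E", "O"))], db)
r1 = eval_cq(("O", "P"), [("source1", ("E", "P", "M")), ("source2", ("E", "O", "D"))], db)
r2 = eval_cq(("O", "P"), [("source3", ("E", "P")), ("source2", ("E", "O", "D"))], db)
print(q, r1, r2)
```

On this instance each rewriting returns a subset of Q's answers, and even their union misses the employee with no department, which is why data integration asks for maximally contained rather than equivalent rewritings.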