Manipulation of Query Expressions. Outline Query unfolding Query containment and equivalence Answering queries using views.

Slides:



Advertisements
Similar presentations
Uncertainty in Data Integration Ai Jing
Advertisements

Software AG provides no commitment to deliver any of the features described herein, and reserves the right to change its product roadmap from time to time.
1 Datalog: Logic Instead of Algebra. 2 Datalog: Logic instead of Algebra Each relational-algebra operator can be mimicked by one or several Database Logic.
Relational data objects 1 Lecture 6. Relational data objects 2 Answer to last lectures activity.
Automatic Verification Book: Chapter 6. How can we check the model? The model is a graph. The specification should refer the the graph representation.
1 Decidable Containment of Recursive Queries Diego Calvanese, Giuseppe De Giacomo, Moshe Y. Vardi presented by Axel Polleres
A Normal Form for XML Documents Marcelo Arenas Leonid Libkin Department of Computer Science University of Toronto.
Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.
Relational Calculus and Datalog
Step By Step Lei Yang Computer Science Department.
ANHAI DOAN ALON HALEVY ZACHARY IVES CHAPTER 14: DATA PROVENANCE PRINCIPLES OF DATA INTEGRATION.
Optimization of Sequential Networks Step in Synthesis: Problem Flow Table Reduce States Minimum-State Table State Assignment Circuit Transition Table Flip-Flop.
Data Bits Many to Many Subkeys JoinsQueries Attributes $100 $200 $300 $400 $500 $100 $200 $300 $400 $500 Final DataBit.
CSE 636 Data Integration Answering Queries Using Views Bucket Algorithm.
CSE 636 Data Integration Data Integration Approaches.
CHAPTER 3: DESCRIBING DATA SOURCES
CSE 636 Data Integration Conjunctive Queries Containment Mappings / Canonical Databases Slides by Jeffrey D. Ullman.
2005conjunctive-ii1 Query languages II: equivalence & containment (Motivation: rewriting queries using views)  conjunctive queries – CQ’s  Extensions.
1 Conjunctions of Queries. 2 Conjunctive Queries A conjunctive query is a single Datalog rule with only non-negated atoms in the body. (Note: No negated.
D ATABASE S YSTEMS I R ELATIONAL A LGEBRA. 22 R ELATIONAL Q UERY L ANGUAGES Query languages (QL): Allow manipulation and retrieval of data from a database.
Logic.
SECTION 21.5 Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION.
Answering Queries Using Views Advanced DB Class Presented by David Fuhry March 9, 2006.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A Modified by Donghui Zhang.
INFS614, Fall 08 1 Relational Algebra Lecture 4. INFS614, Fall 08 2 Relational Query Languages v Query languages: Allow manipulation and retrieval of.
Efficient Query Evaluation on Probabilistic Databases
1 Answering Queries Using Views Alon Y. Halevy Based on Levy et al. PODS ‘95.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
Constraint Logic Programming Ryan Kinworthy. Overview Introduction Logic Programming LP as a constraint programming language Constraint Logic Programming.
SECTIONS 21.4 – 21.5 Sanuja Dabade & Eilbroun Benjamin CS 257 – Dr. TY Lin INFORMATION INTEGRATION.
Relational Algebra on Bags A bag is like a set, but an element may appear more than once. –Multiset is another name for “bag.” Example: {1,2,1,3} is a.
CSE 636 Data Integration Answering Queries Using Views Overview.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
The Query Compiler 16.1 Parsing and Preprocessing Meghna Jain(205) Dr. T. Y. Lin.
Lesson 6. Refinement of the Operator Model This page describes formally how we refine Figure 2.5 into a more detailed model so that we can connect it.
Relational Algebra Chapter 4 - part I. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
CSCD343- Introduction to databases- A. Vaisman1 Relational Algebra.
ANHAI DOAN ALON HALEVY ZACHARY IVES Chapter 12: Ontologies and Knowledge Representation PRINCIPLES OF DATA INTEGRATION.
ANHAI DOAN ALON HALEVY ZACHARY IVES Chapter 6: General Schema Manipulation Operators PRINCIPLES OF DATA INTEGRATION.
Query Processing Presented by Aung S. Win.
Mining Association Rules of Simple Conjunctive Queries Bart Goethals Wim Le Page Heikki Mannila SIAM /8/261.
Presenter: Dongning Luo Sept. 29 th 2008 This presentation based on The following paper: Alon Halevy, “Answering queries using views: A Survey”, VLDB J.
Database Systems Normal Forms. Decomposition Suppose we have a relation R[U] with a schema U={A 1,…,A n } – A decomposition of U is a set of schemas.
CHAPTER 8: MANAGING DATA RESOURCES. File Organization Terms Field: group of characters that represent something Record: group of related fields File:
Lecture 05 Structured Query Language. 2 Father of Relational Model Edgar F. Codd ( ) PhD from U. of Michigan, Ann Arbor Received Turing Award.
Mediators, Wrappers, etc. Based on TSIMMIS project at Stanford. Concepts used in several other related projects. Goal: integrate info. in heterogeneous.
Advanced Topics in Propositional Logic Chapter 17 Language, Proof and Logic.
CSE S. Tanimoto Horn Clauses and Unification 1 Horn Clauses and Unification Propositional Logic Clauses Resolution Predicate Logic Horn Clauses.
1 Relational Algebra and Calculas Chapter 4, Part A.
Chapter 7: Relations Relations(7.1) Relations(7.1) n-any Relations & their Applications (7.2) n-any Relations & their Applications (7.2)
1 Relational Algebra Chapter 4, Sections 4.1 – 4.2.
CSCD34-Data Management Systems - A. Vaisman1 Relational Algebra.
Copyright, Harris Corporation & Ophir Frieder, The Process of Normalization.
Database Management Systems, R. Ramakrishnan1 Relational Algebra Module 3, Lecture 1.
CS848 Presentation Heng YU (Henry)
Chapter 11 Introduction to Computational Complexity Copyright © 2011 The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 1.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
Answering Queries Using Views Presented by: Mahmoud ELIAS.
©Silberschatz, Korth and Sudarshan2.1Database System Concepts - 6 th Edition Chapter 8: Relational Algebra.
Module 2: Intro to Relational Model
Computing Full Disjunctions
Chapter 2: Intro to Relational Model
Query Execution Presented by Khadke, Suvarna CS 257
Relational Algebra 461 The slides for this text are organized into chapters. This lecture covers relational algebra, from Chapter 4. The relational calculus.
Chapter 2: Intro to Relational Model
Chapter 2: Intro to Relational Model
Chapter 2: Intro to Relational Model
Example of a Relation attributes (or columns) tuples (or rows)
Chapter 2: Intro to Relational Model
Materializing Views With Minimal Size To Answer Queries
Presentation transcript:

Manipulation of Query Expressions

Outline Query unfolding Query containment and equivalence Answering queries using views

Query unfolding/expansion Compositionality: we can write queries that refer to views (named queries) in their body  Advantage of declarative languages Query unfolding: process of undoing the composition of queries  Given a query that refer to views, query unfolding rewrites the query so that it refers database tables

Example Relation Flight stores pairs of cities between which there is a direct flight Relation Hub stores the set of hub cities of the airline Query Q1 asks for pairs of cities between which there is a flight that goes through a hub Query Q2 asks for pairs of cities that are on the same outgoing path from a hub Q1(X,Y) :- Flight(X,Z), Hub(Z), Flight(Z,Y) Q2(X,Y) :- Hub(Z), Flight(Z,X), Flight(X,Y) Q3(X,Z) :- Q1(X,Y), Q2(Y,Z) Unfolding of Q3 is: Q3’(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y), Hub(W), Flight(W,Y), Flight(Y,Z)

Observations Query unfolding may lead to a resulting query that has subgoals that seem redundant  There are algorithms to eliminate those The query and the view may have interpreted predicates that are satisfied in isolation, but after the unfolding, the query is unsatisfiable Through repeated applications of the unfolding step, the number of subgoals may increase exponentially Unfolding does not necessarily yield more efficient ways to execute the query Unfolding allows to explore a larger set of query plans due to the existence of a larger set of possible orderings of join operations

Query containment and equivalence Q3’(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y), Hub(W), Flight(W,Y), Flight(Y,Z) Should produce the same result as: Q4(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y), Flight(Y,Z) The result of Q4 should be a superset of the result of Q5: Q5(X,Z) :- Flight(X,U), Hub(U), Flight(U,Y), Hub(Y), Flight(Y,Z) That considers both intermediate stops to be hubs

Formal definition Containment and equivalence Let Q1 and Q2 be two queries of the same arity. Q1 is contained in Q2 (Q1  Q2) if for any database D, Q1(D)  Q2(D) Q1 is equivalent to Q2 (Q1  Q2) if Q1  Q2 and Q2  Q1

Answering queries using views Important use of query containment: when we detect that query Q1 is contained in query Q2, then the answer to Q1 can be often obtained from the answer to Q2 Example:  Given the following database of films Movie(ID, title, year, genre) Director(ID, director) Actor(ID, actor)  And the query asking for comedies produced after 1950 where the director was also an actor in the movie Q(T,Y,D) :- Movie(I,T,Y,G), Y>=1950, G = ‘Comedy’, Director(I,D), Actor(I,D )

Answering queries using views (cont.) Suppose there is the following view available: V1(T,Y,D) :- Movie(I,T,Y,G), Y>=1940, G=‘Comedy’, Director(I,D), Actor(I,D) Then: V1  Q We can use V1 to answer Q, because the Year attribute is in the head of V1 Q’(T,Y,D) :- V1(T,Y,D), Y>= 1950 This can be more efficient, because the join operations in Q do not need to executed, since V1 is pre-computed

Another example If, instead of V1, we only have access to the views: V2(I,T,Y) :- Movie(I,T,Y,G), Y>=1950, G=‘Comedy’ V3(I,D) :- Director(I,D), Actor(I,D) Neither V2 nor V3 contain Q, but we can still answer Q using V2 and V3: Q’’(T,Y,D) :- V2(I,T,Y), V3(I,D)

Anwering queries using views – formal definition Given a query Q and a set of views V1,..., Vm, a rewriting of the query using the views is a query expression Q’ that refers only to views V1,..., Vm  In SQL, a query refers only to views if all the relations mentioned in the from clause are views  In conjunctive queries, that means all the subgoals are views with the exception of interpreted atoms.

Equivalent query rewritings Given a query Q and a set of views V1,..., Vm, the query Q’ is an equivalent rewriting of Q using V1,...Vm if  Q’ refers only to the views in V1,...,Vm  Q’ is equivalent to Q Example:  Q’’(T,Y,D) :- V2(I,T,Y), V3(I,D) is an equivalent rewriting of Q using V2 and V3

Example There may not always exist an equivalent rewriting of a query using a set of views In that case, we may want to know what is the best we can do with the set of views If we have: V4(I,T,Y) :- Movie(I,T,Y,G), Y>=1960, G=‘Comedy’ It cannot be used to completely answer Q: Q(T,Y,D) :- Movie(I,T,Y,G), Y>=1950, G = ‘Comedy’, Director(I,D), Actor(I,D ) V3(I,D) :- Director(I,D), Actor(I,D). The best we can obtain in terms of rewriting is: Q’’’(T,Y,D) :- V4(I,T,Y), V3(I,D) Q’’’ is called a maximally-contained rewriting of Q using V3 and V4

Maximally-contained rewritings Let Q be a query and V1,...,Vm a set of view definitions, and L a query language. The query Q’ is a maximally-contained rewriting of Q using V1,..., Vm wrt L if:  Q’ is a query in L that refers only to views V1,..., Vm  Q’ is contained in Q, and  There is no rewriting Q1  L, such that Q’ Q1 Q and Q1 is not equivalent to Q’  There are algorithms to compute the rewriting of a query using a set of views

When is a view relevant to a query? If the set of relations it mentions overlaps with that of the query, and it selects some of the attributes selected by the query If the query applies predicates to attributes it has in common with the view, then the view must apply either equivalent or logically weaker predicates to be part of an equivalent rewriting If the view applies a logically stronger predicate, it may be part of a contained rewriting

Examples To answer the query: Q(T,Y,D) :- Movie(I,T,Y,G), Y>=1950, G = ‘Comedy’, Director(I,D), Actor(I,D ) Can the following views be used? V6(T,Y) :- Movie(I,T,Y,G),Y>=1950,G=‘Comedy’ V7(I,T,Y) :- Movie(I,T,Y,G), Y>=1950,G=‘Comedy’, Award(I,W) V8(I,T) :- Movie(I,T,Y,G),Y>=1940,G=‘comedy’ Similar to V1, but doesn’t include the ID attribute of Movie in the head, so cannot be joined with V2 Applies an additional condition (Award) that doesn’t exist in the query and cannot be used in an equivalent rewriting Applies a weaker predicate on the year than in the query, but the year is not included in the head

References Chapter 3, Draft of the book on “Principles of Data Integration” by AnHai Doan, Alon Halevy, Zachary Ives (in preparation). Alon Y. Halevy. “Answering Queries Using Views: A Survey”, VLDB Journal, 10(4), 2001.