CSE 636 Data Integration Answering Queries Using Views Overview.

Presentation transcript:

CSE 636 Data Integration Answering Queries Using Views Overview

2 The Problem
Given a query Q and a set of view definitions V1,…,Vn: is it possible to answer Q using only the V’s?
Views:
  V1(A,B) :- cites(A,B), cites(B,A)
  V2(C,D) :- sameTopic(C,D), cites(C,C1), cites(D,D1)
Query:
  q(X,Y) :- sameTopic(X,Y), cites(X,Y), cites(Y,X)
Query rewriting:
  q’(X,Y) :- V1(X,Y), V2(X,Y)
Unfolding of the rewriting:
  q’’(X,Y) :- cites(X,Y), cites(Y,X), sameTopic(X,Y), cites(X,Z), cites(Y,W)

3 Motivation
Local-As-View (LAV) Integration Approach: data sources are described as views over the global schema.
Global schema:
  ForSale(name, year, country, category)
  Review(product, review, category)
French cars data source:
  V1(name, year) :- ForSale(name, year, "France", "auto"), year > 1990
Car review database:
  V2(product, review) :- Review(product, review, "auto")
Query:
  q(X,Y,R) :- ForSale(X,Y,C,"auto"), Review(X,R,"auto"), Y > 1985

4 LAV Assumptions
1. There is a set of predicates that define the global schema
   - These do not exist as stored relations
2. Each data source has its capabilities defined by views, which are conjunctive queries (CQs) whose subgoals involve the global predicates
3. A query is a CQ over the global predicates
4. A rewriting is an expression (union of CQs) involving the views
   - Ideally, the rewriting is equivalent to the query
   - In practice, we have to be happy with a rewriting maximally contained in the query

5 Interpretation of Views
A view describes some of the facts that are available at the source.
A view does not define exactly what is at the source.
  - Example: the view V2(p, r) :- Review(p, r, "auto") says that the source has some Review-facts with third component "auto", not necessarily all of them
  - V2 could even be empty although Review(p, r, "auto") is not

6 Interpretation of Views (2)
In other words: the :- separator between head and body of a view definition should not be interpreted as "if".
Rather, it is "only if": a tuple appears at the source only if the body holds for it over the global predicates.
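
The "only if" reading can be illustrated with a minimal sketch. The facts, the helper names (v2_definition, is_sound_source) and the three sample sources are hypothetical, made up for illustration. A source is consistent with its view definition when it holds a subset of what the view body produces, including the empty set.

```python
# Hypothetical global Review facts; under LAV these are never stored anywhere.
review = {
    ("clio", "good value", "auto"),
    ("c4", "comfortable", "auto"),
    ("tgv", "fast", "train"),
}


def v2_definition(review_facts):
    """Evaluate the body of V2(p, r) :- Review(p, r, "auto")."""
    return {(p, r) for (p, r, category) in review_facts if category == "auto"}


def is_sound_source(source_tuples, review_facts):
    """'Only if': every tuple at the source must satisfy the view body."""
    return source_tuples <= v2_definition(review_facts)


partial_source = {("clio", "good value")}  # some of the auto reviews
empty_source = set()                       # allowed, even though reviews exist
bad_source = {("tgv", "fast")}             # violates the view: not an auto review

print(is_sound_source(partial_source, review))
print(is_sound_source(empty_source, review))
print(is_sound_source(bad_source, review))
```

The check is a subset test rather than an equality test, which is exactly the difference between "only if" and "if and only if".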

7 Rewriting
French cars data source:
  V1(name, year) :- ForSale(name, year, "France", "auto"), year > 1990
Car review database:
  V2(product, review) :- Review(product, review, "auto")
Query:
  q(X,Y,R) :- ForSale(X,Y,C,"auto"), Review(X,R,"auto"), Y > 1985
Query rewriting:
  q’(X,Y,R) :- V1(X,Y), V2(X,R)
Note: the rewriting is not equivalent to the query (V1 offers only French cars from after 1990), but with these two views we can’t do any better.
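
A small sketch of why the rewriting is contained in the query but not equivalent to it. The car and review facts are invented for illustration, and the sources are assumed here to hold everything their view definitions allow, which is the best case for the rewriting.

```python
# Hypothetical global facts (not stored anywhere in a real LAV system).
for_sale = {
    ("clio", 1995, "France", "auto"),
    ("r5", 1988, "France", "auto"),
    ("golf", 1996, "Germany", "auto"),
}
review = {
    ("clio", "good value", "auto"),
    ("r5", "classic", "auto"),
    ("golf", "solid", "auto"),
}


def query(for_sale, review):
    """q(X,Y,R) :- ForSale(X,Y,C,"auto"), Review(X,R,"auto"), Y > 1985"""
    return {
        (x, y, r)
        for (x, y, c, cat) in for_sale if cat == "auto" and y > 1985
        for (p, r, rcat) in review if p == x and rcat == "auto"
    }


# Best-case source contents: exactly what the view bodies produce.
v1 = {(n, y) for (n, y, c, cat) in for_sale
      if c == "France" and cat == "auto" and y > 1990}
v2 = {(p, r) for (p, r, cat) in review if cat == "auto"}


def rewriting(v1, v2):
    """q'(X,Y,R) :- V1(X,Y), V2(X,R)"""
    return {(x, y, r) for (x, y) in v1 for (p, r) in v2 if p == x}


answers_q = query(for_sale, review)
answers_rw = rewriting(v1, v2)
print(sorted(answers_rw))
print(answers_rw <= answers_q, answers_rw == answers_q)
```

On this data the rewriting misses the 1988 French car and the German car: neither is visible through V1, so no rewriting over V1 and V2 could return them.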

8 Formal Definition: Rewriting
Given a query Q and a set of view definitions V1,…,Vn:
- Q’ is a rewriting of the query using the V’s if it refers only to the views or to arithmetic predicates
- Q’ is an equivalent rewriting of Q using the V’s if Q’ is equivalent to Q
- Q’ is a maximally-contained rewriting of Q using the V’s w.r.t. a query language L if Q’ is a rewriting in L that is contained in Q, and there is no other rewriting Q’’ in L such that Q’’ strictly contains Q’ and Q’’ is contained in Q

9 Usability Conditions for Views
Query: q(X,Z) :- r(X,Y), s(Y,Z), t(X,Z), Y > 5
What can go wrong?
  V1(A,B) :- r(A,C), s(C1,B)                (join predicate not applied)
  V2(A,B) :- r(A,C), s(C,B), C > 1          (predicate too weak)
  V3(A) :- r(A,B), s(B,C), t(A,C), B > 5    (needed argument is projected out; it can be recovered if we have a functional dependency t: A → C)

10 What Makes a Rewriting R Useful?
1. There must be no other rewriting containing R
2. When the views in R are unfolded into global predicates, R is contained in the original query

11 View Unfolding
If V(X,Y) is a subgoal in a rewriting, then replace V(X,Y) with V’s body by:
1. Renaming the local variables of the view’s body (those that appear only in the body) to fresh variables not used elsewhere
2. Substituting the variables of the subgoal V(X,Y) for the variables of V’s head

12 Example
Consider the subgoal V2(X,Y) in the rewriting of our first example:
  q’(X,Y) :- V1(X,Y), V2(X,Y)
V2’s definition:
  V2(C,D) :- sameTopic(C,D), cites(C,C1), cites(D,D1)
After step 1:
  V2(C,D) :- sameTopic(C,D), cites(C,Z), cites(D,W)
After step 2, substituting C → X and D → Y:
  V2(X,Y) :- sameTopic(X,Y), cites(X,Z), cites(Y,W)
Rewriting becomes:
  q’(X,Y) :- V1(X,Y), sameTopic(X,Y), cites(X,Z), cites(Y,W)
Subgoal V1(X,Y) is unfolded similarly.
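
The two unfolding steps can be sketched in a few lines. This is an illustrative sketch, not code from the course. It assumes atoms are (predicate, args) pairs, variables start with an uppercase letter, view heads contain only distinct variables, and no input variable is already named Fresh1, Fresh2, and so on.

```python
from itertools import count


def is_var(term):
    # Assumption: variables start with an uppercase letter; the rest are constants.
    return term[:1].isupper()


def unfold(rewriting_body, views):
    """Replace every view subgoal by the view's body over the global predicates."""
    fresh = count(1)
    result = []
    for pred, args in rewriting_body:
        if pred not in views:            # not a view subgoal: keep as is
            result.append((pred, args))
            continue
        head_args, body = views[pred]
        # Step 2: map each head variable of the view to the subgoal's argument.
        subst = dict(zip(head_args, args))
        for b_pred, b_args in body:
            new_args = []
            for term in b_args:
                if is_var(term) and term not in subst:
                    # Step 1: a local variable gets a fresh, unique name.
                    subst[term] = f"Fresh{next(fresh)}"
                new_args.append(subst.get(term, term))
            result.append((b_pred, tuple(new_args)))
    return result


views = {
    "V1": (("A", "B"), [("cites", ("A", "B")), ("cites", ("B", "A"))]),
    "V2": (("C", "D"), [("sameTopic", ("C", "D")),
                        ("cites", ("C", "C1")), ("cites", ("D", "D1"))]),
}
unfolding = unfold([("V1", ("X", "Y")), ("V2", ("X", "Y"))], views)
for atom in unfolding:
    print(atom)
```

Fresh1 and Fresh2 play the roles of Z and W in the slide. The substitution is rebuilt for every subgoal, so two occurrences of the same view never share local variables.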

13 Important Points
To test containment of a rewriting in a query, we first unfold the views in the rewriting, then test CQ containment of the unfolding in the query.
The view definition describes what any tuple of the view looks like, so CQ containment implies that the rewriting will provide only true answers.

14 The Picture
Query:
  q(X,Y) :- sameTopic(X,Y), cites(X,Y), cites(Y,X)
Query rewriting:
  q’(X,Y) :- V1(X,Y), V2(X,Y)
Unfolding of the rewriting:
  q’’(X,Y) :- cites(X,Y), cites(Y,X), sameTopic(X,Y), cites(X,Z), cites(Y,W)
Is there a containment mapping?
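
One way to answer the question mechanically is a brute-force search for a containment mapping from the query to the unfolding. The sketch below is illustrative only. It assumes atoms are (predicate, args) pairs, variables start with an uppercase letter, there are no arithmetic predicates, and both heads have the same arity. The function name containment_mapping is hypothetical.

```python
def is_var(term):
    # Assumption: variables start with an uppercase letter.
    return term[:1].isupper()


def containment_mapping(q_head, q_body, p_head, p_body):
    """Search for a mapping from Q's variables to P's terms that sends Q's head
    to P's head and every subgoal of Q to some subgoal of P. If one exists,
    P is contained in Q. Returns the mapping, or None."""

    def extend(mapping, src, dst):
        mapping = dict(mapping)          # work on a copy so backtracking is safe
        for s, d in zip(src, dst):
            if is_var(s):
                if mapping.setdefault(s, d) != d:
                    return None          # s is already mapped to something else
            elif s != d:
                return None              # constants must map to themselves
        return mapping

    def search(i, mapping):
        if i == len(q_body):
            return mapping
        pred, args = q_body[i]
        for p_pred, p_args in p_body:
            if p_pred == pred and len(p_args) == len(args):
                extended = extend(mapping, args, p_args)
                if extended is not None:
                    found = search(i + 1, extended)
                    if found is not None:
                        return found
        return None

    start = extend({}, q_head, p_head)
    return None if start is None else search(0, start)


q_head = ("X", "Y")
q_body = [("sameTopic", ("X", "Y")), ("cites", ("X", "Y")), ("cites", ("Y", "X"))]
u_body = [("cites", ("X", "Y")), ("cites", ("Y", "X")), ("sameTopic", ("X", "Y")),
          ("cites", ("X", "Z")), ("cites", ("Y", "W"))]

mapping = containment_mapping(q_head, q_body, q_head, u_body)
reverse = containment_mapping(q_head, u_body, q_head, q_body)
print(mapping)
print(reverse)
```

For this example a mapping exists in both directions (Z and W can be sent to Y and X), so under these assumptions the unfolding is equivalent to the query, not merely contained in it.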

15 Important Points (2)
There is no guarantee that a rewriting supplies any answers to the query.
Comparing different rewritings by testing whether one is contained in another must be done at the level of the views, that is, on the rewritings as written (folded), not on their unfoldings.

16 Example
Two sources might have similar views, defined by:
  V2(C,D) :- sameTopic(C,D), cites(C,C1), cites(D,D1)
  V3(E,F) :- sameTopic(E,F), cites(E,E1), cites(F,F1)
But the sources actually have different sets of tuples.

17 Example - Continued
Then, the two rewritings:
  q’(X,Y) :- V1(X,Y), V2(X,Y)
  q’’(X,Y) :- V1(X,Y), V3(X,Y)
have the same unfolding, but there is no reason to believe one rewriting is contained in the other.
One view could provide lots of tuples, the other, few or none.
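
A tiny sketch of this point, with invented source contents: V2 and V3 have identical definitions but the sources hold different tuples, so neither rewriting's answers contain the other's.

```python
def evaluate(v1, other_view):
    """q(X,Y) :- V1(X,Y), Vi(X,Y): both views must supply the same pair."""
    return v1 & other_view


# Hypothetical source contents; all are consistent with the view definitions.
v1 = {("p1", "p2"), ("p3", "p4")}
v2 = {("p1", "p2")}   # what the V2 source happens to hold
v3 = {("p3", "p4")}   # what the V3 source happens to hold

answers_q1 = evaluate(v1, v2)   # q'
answers_q2 = evaluate(v1, v3)   # q''
print(answers_q1, answers_q2)
print(answers_q1 <= answers_q2, answers_q2 <= answers_q1)
```

Both containment checks come out false here, which is why a union of the two rewritings is needed to collect all available answers.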

18 Important Points (3)
On the other hand, when one rewriting, folded, is contained in another, we can be sure that the first provides no answers that the second does not also provide.

19 Example
Here are two rewritings:
  q’(X,Y) :- V1(X,Y), V2(X,Y)
  q’’(X,"WSDL") :- V1(X,"WSDL"), V2(X,"WSDL")
There is a containment mapping q’ → q’’
  - Thus, q’’ ⊆ q’ at the level of views
No matter what tuples V1 and V2 represent, q’ provides all answers q’’ provides.
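
The containment at the level of views can be spot-checked by evaluating both rewritings directly over view contents. The extents below are arbitrary, invented examples, and checking a few instances illustrates the containment but does not prove it. The proof is the containment mapping X → X, Y → "WSDL".

```python
def eval_q1(v1, v2):
    """q'(X,Y) :- V1(X,Y), V2(X,Y)"""
    return v1 & v2


def eval_q2(v1, v2):
    """q''(X,"WSDL") :- V1(X,"WSDL"), V2(X,"WSDL")"""
    return {(x, y) for (x, y) in v1 if y == "WSDL" and (x, y) in v2}


# Arbitrary, made-up view contents.
extents = [
    ({("a", "WSDL"), ("b", "SOAP")}, {("a", "WSDL"), ("b", "SOAP"), ("c", "WSDL")}),
    ({("a", "WSDL")}, set()),
    ({("a", "SOAP")}, {("a", "SOAP")}),
]
for v1, v2 in extents:
    print(eval_q2(v1, v2) <= eval_q1(v1, v2), eval_q1(v1, v2), eval_q2(v1, v2))
```

In the third instance q’ returns a tuple while q’’ returns none, so the containment is strict in general.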

20 Finding All Rewritings
For conjunctive queries with no arithmetic predicates, the following holds:
If Q has an equivalent rewriting using V, then there exists one with no more conjuncts than Q [Levy, Mendelzon, Sagiv & Srivastava, PODS 95]
  - The rewriting problem is NP-complete
Maximally-contained rewriting: the union of all conjunctive rewritings whose length is at most that of the query
LMSS Test: if a query has n subgoals, then we only need to consider rewritings with at most n subgoals
  - Any other rewriting must be contained in one with at most n subgoals

21 A Naive Algorithm
- Consider all rewritings containing up to as many views as Q has subgoals
- Test each unfolding for containment in Q
- Take the union of the contained ones
Exponential and brute force; makes use of the LMSS test.
Can we do better?
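
The naive algorithm can be sketched as below. This is an illustrative simplification, not the course's implementation. It assumes no arithmetic predicates, atoms as (predicate, args) pairs, variables starting with an uppercase letter, and view heads made of distinct variables. It also draws candidate view arguments only from the query's own variables. The names unfold, contained_in and naive_rewritings are hypothetical.

```python
from itertools import combinations, product


def is_var(term):
    return term[:1].isupper()


def unfold(body, views):
    """Unfold view subgoals into global predicates with fresh local variables."""
    out, n = [], 0
    for pred, args in body:
        head_args, v_body = views[pred]
        subst = dict(zip(head_args, args))
        for b_pred, b_args in v_body:
            new_args = []
            for term in b_args:
                if is_var(term) and term not in subst:
                    n += 1
                    subst[term] = f"Fresh{n}"
                new_args.append(subst.get(term, term))
            out.append((b_pred, tuple(new_args)))
    return out


def contained_in(p_head, p_body, q_head, q_body):
    """True if a containment mapping from Q to P exists, i.e. P is contained in Q."""
    def extend(mapping, src, dst):
        mapping = dict(mapping)
        for s, d in zip(src, dst):
            if is_var(s):
                if mapping.setdefault(s, d) != d:
                    return None
            elif s != d:
                return None
        return mapping

    def search(i, mapping):
        if i == len(q_body):
            return True
        pred, args = q_body[i]
        for p_pred, p_args in p_body:
            if p_pred == pred and len(p_args) == len(args):
                extended = extend(mapping, args, p_args)
                if extended is not None and search(i + 1, extended):
                    return True
        return False

    start = extend({}, q_head, p_head)
    return start is not None and search(0, start)


def naive_rewritings(q_head, q_body, views):
    """Enumerate candidates with at most len(q_body) view subgoals (LMSS bound)
    and keep those whose unfolding is contained in the query."""
    q_vars = sorted({t for _, args in q_body for t in args if is_var(t)})
    atoms = [(name, args) for name, (head, _) in views.items()
             for args in product(q_vars, repeat=len(head))]
    found = []
    for k in range(1, len(q_body) + 1):
        for body in combinations(atoms, k):
            if not set(q_head) <= {t for _, args in body for t in args}:
                continue  # unsafe candidate: a head variable is missing
            if contained_in(q_head, unfold(body, views), q_head, q_body):
                found.append(list(body))
    return found  # the union of these conjunctive rewritings


views = {
    "V1": (("A", "B"), [("cites", ("A", "B")), ("cites", ("B", "A"))]),
    "V2": (("C", "D"), [("sameTopic", ("C", "D")),
                        ("cites", ("C", "C1")), ("cites", ("D", "D1"))]),
}
q_head = ("X", "Y")
q_body = [("sameTopic", ("X", "Y")), ("cites", ("X", "Y")), ("cites", ("Y", "X"))]
found = naive_rewritings(q_head, q_body, views)
print(len(found), found[0])
```

Many of the rewritings found are redundant variants of q’(X,Y) :- V1(X,Y), V2(X,Y), and the candidate space grows exponentially with the number of subgoals, which is the motivation for the practical algorithms on the next slide.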

22 Practical Algorithms
- Bucket Algorithm
- Inverse Rules Algorithm
- MiniCon Algorithm
Excellent survey:
  - Answering Queries Using Views: A Survey
  - By Alon Halevy
  - VLDB Journal, 2000

23 References
Jeffrey D. Ullman
  - www-db.stanford.edu/~ullman/cs345-notes.html
  - Lecture Slides
Alon Halevy
  - Answering Queries Using Views: Applications, Algorithms and Opportunities
  - International Workshop on Databases and Programming Languages (DBPL), 1999
  - Invited Talk