Answering Queries Using Views: The Last Frontier.

The Problem
Given a query Q and a set of view definitions V1,…,Vn: is it possible to answer Q using only the V's?
Views:
  V1(A,B) :- cites(A,B), cites(B,A)
  V2(C,D) :- sameTopic(C,D), cites(C,C1), cites(D,D1)
Query:
  q(X,Y) :- sameTopic(X,Y), cites(X,Y), cites(Y,X)
Query rewriting:
  q'(X,Y) :- V1(X,Y), V2(X,Y)
Unfolding of the rewriting:
  q''(X,Y) :- cites(X,Y), cites(Y,X), sameTopic(X,Y), cites(X,Z), cites(Y,W)
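The unfolding step above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the tuple encoding of atoms, the function name unfold, and the fresh-variable prefix _E are assumptions made here, and the sketch assumes view heads contain distinct variables and no constants.

```python
from itertools import count

# Hypothetical encoding: an atom is (predicate, argument_tuple);
# a view maps its name to (head_variables, body_atoms).
VIEWS = {
    "V1": (("A", "B"), [("cites", ("A", "B")), ("cites", ("B", "A"))]),
    "V2": (("C", "D"), [("sameTopic", ("C", "D")),
                        ("cites", ("C", "C1")), ("cites", ("D", "D1"))]),
}

def unfold(body, views):
    """Replace every view atom by the view's body.

    Head variables of the view are bound to the atom's arguments;
    every other view variable gets a fresh name (_E1, _E2, ...).
    Assumes the rewriting itself uses no variable starting with _E.
    """
    fresh = count(1)
    result = []
    for pred, args in body:
        if pred not in views:
            result.append((pred, args))  # base or interpreted atom: keep
            continue
        head, view_body = views[pred]
        mapping = dict(zip(head, args))
        for vpred, vargs in view_body:
            new_args = []
            for var in vargs:
                if var not in mapping:
                    mapping[var] = f"_E{next(fresh)}"
                new_args.append(mapping[var])
            result.append((vpred, tuple(new_args)))
    return result

rewriting = [("V1", ("X", "Y")), ("V2", ("X", "Y"))]
unfolded = unfold(rewriting, VIEWS)
for pred, args in unfolded:
    print(f"{pred}({','.join(args)})")
```

The printed atoms match q'' up to the names of the two existential variables, which appear here as _E1 and _E2 rather than Z and W.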

Another Example
French cars data source:
  DB1(name, year) :- ForSale(name, year, "France", "auto"), year >
Car review database:
  DB2(product, review) :- Review(product, review, "auto")
Query:
  q(X,Y,R) :- ForSale(X,Y,C,"auto"), Review(X,R,"auto"), Y >
Query plan:
  q'(X,Y,R) :- DB1(X,Y), DB2(X,R)
Note: the rewriting is not equivalent to the query, but we can't do any better.

Motivation
Answering queries using views:
  Query optimization
  Physical data independence
  Data integration
  Web-site management
  Data warehouse design
  Theory
  Algorithms
  Commercial systems
  Semantic data caching
Survey paper: alon/views-survey.ps

Dimensions of the Problem
  View definition language
  Query language
  Semantic constraints (e.g., FDs, inclusion dependencies)
  Completeness/soundness of the views
  Output: query execution plan or logical plan
  Equivalent or maximally-contained rewriting

Usability Conditions
Query:
  q(X,Z) :- r(X,Y), s(Y,Z), t(X,Z), Y > 5
What can go wrong?
  V1(A,B) :- r(A,C), s(C1,B)   (join predicate not applied)
  V2(A,B) :- r(A,C), s(C,B), C > 1   (predicate too weak)
  V3(A,B) :- r(A,B), r1(A,B)   (irrelevant condition)
  V4(A) :- r(A,B), s(B,C), t(A,C), B > 5   (needed argument is projected out; can be recovered if we have a functional dependency t: A --> C)
See [Larson & Yang, 87] and [LMSS-95] for conditions.

Formal Definition: Rewriting
Given a query Q and a set of view definitions V1,…,Vn:
Q' is a rewriting of the query using the V's if it refers only to the views or to interpreted predicates.
Q' is an equivalent rewriting of Q using the V's if Q' is equivalent to Q.
Q' is a maximally-contained rewriting of Q w.r.t. a query language L using the V's if Q' is a rewriting in L that is contained in Q, and there is no other rewriting Q'' in L such that Q'' strictly contains Q' and Q'' is contained in Q.
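Equivalence and containment in these definitions can be tested mechanically for conjunctive queries without interpreted predicates: Q1 is contained in Q2 exactly when there is a containment mapping (a homomorphism) from Q2 to Q1 that sends head to head. The following is a minimal sketch under that restriction. The encoding and the name contained_in are assumptions made here, every term is treated as a variable (no constants), and the search is a plain exponential backtracking.

```python
def contained_in(q1, q2):
    """Return True if q1 is contained in q2.

    A query is (head_variables, body), a body is a list of
    (predicate, argument_tuple) atoms. Looks for a mapping of q2's
    variables to q1's variables that maps head to head and every
    body atom of q2 onto some body atom of q1.
    """
    (head1, body1), (head2, body2) = q1, q2

    def extend(mapping, src, dst):
        # Try to add src[i] -> dst[i]; None if it conflicts.
        new = dict(mapping)
        for s, d in zip(src, dst):
            if new.setdefault(s, d) != d:
                return None
        return new

    def search(i, mapping):
        if i == len(body2):
            return True
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 == pred and len(args1) == len(args):
                new = extend(mapping, args, args1)
                if new is not None and search(i + 1, new):
                    return True
        return False

    if len(head1) != len(head2):
        return False
    start = extend({}, head2, head1)
    return start is not None and search(0, start)

# The citation example: the query and the unfolding of its rewriting.
q = (("X", "Y"), [("sameTopic", ("X", "Y")),
                  ("cites", ("X", "Y")), ("cites", ("Y", "X"))])
unfolded = (("X", "Y"), [("cites", ("X", "Y")), ("cites", ("Y", "X")),
                         ("sameTopic", ("X", "Y")),
                         ("cites", ("X", "Z")), ("cites", ("Y", "W"))])
# Unfolding of a rewriting that uses V1 only.
v1_only = (("X", "Y"), [("cites", ("X", "Y")), ("cites", ("Y", "X"))])

equivalent = contained_in(q, unfolded) and contained_in(unfolded, q)
print("rewriting with V1 and V2 is equivalent:", equivalent)
print("V1 alone is contained in q:", contained_in(v1_only, q))
```

On the citation example, containment holds in both directions for the rewriting that uses V1 and V2, so it is an equivalent rewriting. The V1-only rewriting contains q but is not contained in it, so it is not a valid contained rewriting.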

A Basic Decidability Result
For conjunctive queries with no interpreted predicates, the following holds:
  If Q has an equivalent rewriting using V, then there exists one with no more conjuncts than Q. [Levy, Mendelzon, Sagiv & Srivastava, PODS-95]
  The rewriting problem is NP-complete.
  The bound holds even if the views have interpreted predicates.
  Maximally-contained rewriting: the union of all conjunctive rewritings of the length of the query or less.

Certain Answers
Given: a query Q, view definitions V1,…,Vn, and extensions of the views v1,…,vn.
Consider the set of databases D that are consistent with V1,…,Vn and v1,…,vn.
The tuple t is a certain answer to Q if it would be an answer in every database in D.
Note: an equivalent rewriting provides all certain answers.

Finding All Answers from Views
If a rewriting is equivalent: you definitely get all answers.
Maximal containment: only w.r.t. a specific query language.
So what is the complexity of finding all the answers? [Abiteboul & Duschka, PODS-98], [Grahne and Mendelzon, ICDT-99]: surprisingly hard!
Certain answers: given specific extensions v1,…,vn to the views, is the tuple t an answer in every database D that is consistent with the extensions v1,…,vn?

Why & When is it Hard?
Sources can be:
  sound (open-world assumption)
  complete
  sound and complete (closed-world assumption)
If sources are either all sound or all complete, then a maximally-contained rewriting exists.
If the query contains interpreted predicates, the problem is NP-hard.
If sources are sound and complete, the problem is NP-complete.

Graph Colorability as Views
  V1(X) :- edge(X,Y)   (the set of nodes in the graph)
  V2(Z) :- color(X,Z)   (the set {red, green, blue})
  V3(X,Y) :- edge(X,Y)   (the set of edges)
Query:
  q(a) :- edge(X,Y), color(X,Z), color(Y,Z)
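The connection to 3-colorability can be illustrated by brute force. The sketch below is not from the slides and rests on an assumption stated here explicitly: the sound-and-complete views fix the edge relation and force every node to receive at least one of the three colors. Under that assumption, giving a node extra colors can only make the query easier to satisfy, so it is enough to enumerate databases with exactly one color per node. The query then holds in all of them exactly when the graph has no proper 3-coloring. The names query_holds and is_certain are hypothetical.

```python
from itertools import product

COLORS = ("red", "green", "blue")

def query_holds(edges, coloring):
    """q(a) :- edge(X,Y), color(X,Z), color(Y,Z):
    true if some edge has both endpoints with the same color."""
    return any(coloring[x] == coloring[y] for x, y in edges)

def is_certain(edges):
    """True if q holds under every assignment of one color per node,
    i.e. if the graph is NOT 3-colorable (assumption: every node
    must be colored, see the lead-in above)."""
    nodes = sorted({n for edge in edges for n in edge})
    return all(
        query_holds(edges, dict(zip(nodes, assignment)))
        for assignment in product(COLORS, repeat=len(nodes))
    )

triangle = [(1, 2), (2, 3), (1, 3)]  # 3-colorable
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]  # not 3-colorable

print("triangle:", is_certain(triangle))
print("K4:", is_certain(k4))
```

The enumeration visits 3^n colorings for n nodes, which is consistent with the hardness claim: deciding whether a is a certain answer is at least as hard as deciding non-3-colorability.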

Potpourri
System-R optimization extensions: [Tsatalos et al., VLDB-94], [Chaudhuri et al., ICDE-95].
VLDB-98: Oracle's implemented algorithm.
Infinite # of views: [LRU, PODS-96], [VP, VLDB-97].
Polynomial-time cases: [Chekuri & Rajaraman, ICDT-97].
Description logics: [Calvanese et al. 99].
Inclusion dependencies: [Gryz, ICDE-97].
Unions in views: [Afrati et al., ICDT-99], [Duschka's thesis].
Semi-structured data: [VP, SIGMOD-99].

Containment Queries over Views [Millstein, Levy, Friedman, PODS-2000]
Motivation: equivalence of queries to data integration systems.
Two different queries can be equivalent given a specific set of sources.
Certain(Q1) = Certain(Q2)?
Π^p_2 for the conjunctive query case.
Is decidable in some cases where the maximally-contained rewriting is recursive.