Answering Queries Using Views Presented by: Mahmoud ELIAS.



Plan
- Introduction
- Motivations
- Views
- Problem definition
- When is a view usable for a query?
- Answering queries using views for data integration
- Conclusions
- Bibliography

Introduction
Relevance to a wide variety of data management problems.
Example: University schema:
- Prof(name, area)
- Course(c-number, title)
- Teaches(prof, c-number, quarter, evaluation)
- Registered(student, c-number, quarter)
- Major(student, dept)
- WorksIn(prof, dept)
- Advises(prof, student)

Motivations
- Query optimization: using a view can speed up the computation of the query, but pay attention to indexes (a plan over a view is not necessarily cheaper than one over indexed base relations).
- Maintaining physical data independence.
- Data integration: a uniform query interface to a multitude of autonomous, heterogeneous data sources.

Motivations (cont.)
- Data warehouse design: we must be able to answer all the required queries over the warehouse using only these views.
- Semantic data caching: check whether the cached results of a previously computed query can be used for a new query.

Choosing and creating views
- System analysis
- Feedback
- Statistics
- Building views
- The user will never be satisfied

Problem definition
- A view is a derived relation defined in terms of stored base relations.
- A query Q1 is said to be contained in a query Q2 (Q1 ⊆ Q2) if, for all databases D, the set of tuples computed for Q1 is a subset of those computed for Q2.
- Two queries are said to be equivalent if Q1 ⊆ Q2 and Q2 ⊆ Q1.
- Given a query Q and a set of view definitions V1, V2, …, Vm, a rewriting of the query using the views is a query expression Q’ that refers only to the given views.
- Equivalent rewriting vs. maximally-contained rewriting: an equivalent rewriting returns exactly the answers of Q, whereas a maximally-contained rewriting returns the largest set of answers to Q that can be obtained from the views.
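For conjunctive queries without comparison predicates, containment can be tested by searching for a containment mapping: Q1 ⊆ Q2 holds exactly when the variables of Q2 can be mapped to terms of Q1 so that the head of Q2 becomes the head of Q1 and every subgoal of Q2 becomes a subgoal of Q1. The sketch below is a minimal illustration of that test, not part of the original slides. The query encoding, the convention that variables start with an uppercase letter, and the function names are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Atom = Tuple[str, Tuple[str, ...]]          # (predicate, arguments)
Query = Tuple[Tuple[str, ...], List[Atom]]  # (head arguments, body atoms)


def is_var(term: str) -> bool:
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def extend(mapping: Dict[str, str], source: Sequence[str],
           target: Sequence[str]) -> Optional[Dict[str, str]]:
    """Extend `mapping` so that the `source` terms map onto `target`, or fail."""
    if len(source) != len(target):
        return None
    new = dict(mapping)
    for s, t in zip(source, target):
        if is_var(s):
            if new.setdefault(s, t) != t:   # variable already mapped elsewhere
                return None
        elif s != t:                        # a constant must map to itself
            return None
    return new


def contained_in(q1: Query, q2: Query) -> bool:
    """True if q1 is contained in q2 (a containment mapping q2 -> q1 exists)."""
    head1, body1 = q1
    head2, body2 = q2

    def search(i: int, mapping: Dict[str, str]) -> bool:
        if i == len(body2):
            return True                     # every subgoal of q2 was mapped
        pred2, args2 = body2[i]
        for pred1, args1 in body1:
            if pred1 != pred2:
                continue
            extended = extend(mapping, args2, args1)
            if extended is not None and search(i + 1, extended):
                return True
        return False

    start = extend({}, head2, head1)
    return start is not None and search(0, start)


# Students registered in any course vs. students registered in course c444.
Q_ALL = (("S",), [("Registered", ("S", "C", "Q"))])
Q_444 = (("S",), [("Registered", ("S", "c444", "Q"))])
print(contained_in(Q_444, Q_ALL), contained_in(Q_ALL, Q_444))
```

The search is exponential in the worst case, which is consistent with containment of conjunctive queries being NP-complete; for the short queries used in these slides it is immediate.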

When is a view usable for a query?
- A view can be useful for a query if the set of relations it mentions overlaps with that of the query, and it selects some of the attributes selected by the query.
- If the view contains grouping and aggregation but the query does not, the view cannot be used to answer the query, unless the query removes duplicates in its select clause.

Example1:
Query:
  select Advises.prof, Advises.student, Registered.quarter
  from Registered, Teaches, Advises
  where Registered.c-number = Teaches.c-number
    and Registered.quarter = Teaches.quarter
    and Advises.prof = Teaches.prof
    and Advises.student = Registered.student
    and Registered.quarter ≥ « winter98 »
View:
  create view V1 as
  select Registered.student, Teaches.prof, Registered.quarter
  from Registered, Teaches
  where Registered.c-number = Teaches.c-number
    and Registered.quarter = Teaches.quarter
    and Registered.quarter ≥ « winter97 »
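V1 is usable here because it applies the same join between Registered and Teaches, keeps the attributes the query needs, and has a weaker selection on quarter than the query. The sketch below checks one possible rewriting over V1 against the original query on a small in-memory SQLite database. The sample rows, the integer encoding of quarters, and the column name c_number (standing in for c-number) are assumptions made for the example.

```python
import sqlite3

# Assumed encoding: quarter = year * 10 + term (1 = winter ... 4 = autumn),
# so chronological order matches integer order.
WINTER98 = 19981

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Registered(student TEXT, c_number INTEGER, quarter INTEGER);
CREATE TABLE Teaches(prof TEXT, c_number INTEGER, quarter INTEGER, evaluation REAL);
CREATE TABLE Advises(prof TEXT, student TEXT);

-- Made-up sample data.
INSERT INTO Registered VALUES ('ann', 444, 19981), ('bob', 444, 19971), ('ann', 333, 19964);
INSERT INTO Teaches VALUES ('smith', 444, 19981, 4.5), ('smith', 444, 19971, 4.0),
                           ('jones', 333, 19964, 3.5);
INSERT INTO Advises VALUES ('smith', 'ann'), ('smith', 'bob'), ('jones', 'ann');

-- V1 from the slide, with winter97 encoded as 19971.
CREATE VIEW V1 AS
  SELECT Registered.student AS student, Teaches.prof AS prof,
         Registered.quarter AS quarter
  FROM Registered, Teaches
  WHERE Registered.c_number = Teaches.c_number
    AND Registered.quarter = Teaches.quarter
    AND Registered.quarter >= 19971;
""")

# The query from the slide, over the base relations.
original = conn.execute("""
  SELECT Advises.prof, Advises.student, Registered.quarter
  FROM Registered, Teaches, Advises
  WHERE Registered.c_number = Teaches.c_number
    AND Registered.quarter = Teaches.quarter
    AND Advises.prof = Teaches.prof
    AND Advises.student = Registered.student
    AND Registered.quarter >= ?
  ORDER BY 1, 2, 3
""", (WINTER98,)).fetchall()

# A rewriting that joins Advises with V1 and re-applies the stronger selection.
rewritten = conn.execute("""
  SELECT Advises.prof, Advises.student, V1.quarter
  FROM Advises, V1
  WHERE Advises.prof = V1.prof
    AND Advises.student = V1.student
    AND V1.quarter >= ?
  ORDER BY 1, 2, 3
""", (WINTER98,)).fetchall()

print(original == rewritten, original)
```

Agreement on one sample database does not prove equivalence; it only illustrates that the rewriting has to keep the stronger condition on quarter, because V1 also contains tuples from winter97 onward.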

Example2:
View:
  create view V2 as
  select c-number, year, max(evaluation) as maxeval, count(*) as offerings
  from Teaches
  where c-number ≥ 400
  group by c-number, year
Query:
  select year, count(*), max(evaluation)
  from Teaches
  where c-number ≥ 500
  group by year
Rewriting using V2:
  select year, sum(offerings), max(maxeval)
  from V2
  where c-number ≥ 500
  group by year
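The rewriting works because V2 groups on a finer key (c-number, year) than the query (year): the per-year count is the sum of the per-course counts, and the per-year maximum is the maximum of the per-course maxima. The following sketch compares the query and its rewriting on a small SQLite database. The sample rows are made up, c_number stands in for c-number, and Teaches is given a year column as in this example rather than the quarter column of the introductory schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Teaches(prof TEXT, c_number INTEGER, year INTEGER, evaluation REAL);

-- Made-up sample data.
INSERT INTO Teaches VALUES
  ('a', 500, 1997, 4.0),
  ('b', 500, 1997, 3.0),
  ('c', 600, 1997, 4.5),
  ('d', 450, 1997, 5.0),
  ('e', 500, 1998, 2.5);

-- V2 from the slide.
CREATE VIEW V2 AS
  SELECT c_number, year, MAX(evaluation) AS maxeval, COUNT(*) AS offerings
  FROM Teaches
  WHERE c_number >= 400
  GROUP BY c_number, year;
""")

# The query over the base relation.
original = conn.execute("""
  SELECT year, COUNT(*), MAX(evaluation)
  FROM Teaches
  WHERE c_number >= 500
  GROUP BY year
  ORDER BY year
""").fetchall()

# The rewriting over V2: counts are summed, maxima are re-maximised.
rewritten = conn.execute("""
  SELECT year, SUM(offerings), MAX(maxeval)
  FROM V2
  WHERE c_number >= 500
  GROUP BY year
  ORDER BY year
""").fetchall()

print(original == rewritten, original)
```

The course numbered 450 is in V2 but is filtered out by the rewriting, which is why the condition c-number ≥ 500 has to be repeated over the view.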

Conjunctive queries
- Safe: each variable in the head appears in the body.
- No arithmetic comparisons in predicates.
(Diagram of a rule with its head, body, and subgoals labelled.)

The Bucket algorithm
The number of query rewritings that need to be considered can be drastically reduced if we first consider each subgoal in the query in isolation, and determine which views may be relevant to each subgoal.
Two steps:
- Create a bucket for each subgoal in Q, containing the views that are relevant to that subgoal.
- Consider query rewritings that are conjunctive queries, each consisting of one conjunct from every bucket.

Example
Views:
  V1(st,c-n,qu,ti) :- Registered(st,c-n,qu), Course(c-n,ti), c-n ≥ 500, qu ≥ Aut98
  V2(st,pr,c-n,qu) :- Registered(st,c-n,qu), Teaches(pr,c-n,qu)
  V3(st,c-n) :- Registered(st,c-n,qu), qu ≤ Aut94
  V4(pr,c-n,ti,qu) :- Registered(st,c-n,qu), Course(c-n,ti), Teaches(pr,c-n,qu), qu ≤ Aut97
Query:
  Q(S,C,P) :- Teaches(P,C,Q), Registered(S,C,Q), Course(C,T), C ≥ 300, Q ≥ Aut95
Buckets:
  Teaches(P,C,Q)    Registered(S,C,Q)    Course(C,T)
  V2(S’,P,C,Q)      V1(S,C,Q,T’)         V1(S’,C,Q’,T)
  V4(P,C,T’,Q)      V2(S,P’,C,Q)         V4(P’,C,T,Q’)
V3 appears in no bucket because its condition qu ≤ Aut94 contradicts Q ≥ Aut95.

Example (cont.)
All combinations:
- q’(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’), V1(S’,C,Q’,T), which simplifies to q’(S,C,P) :- V2(S’,P,C,Q), V1(S,C,Q,T’)
- q’(S,C,P) :- V4(P,C,T’,Q), V1(S,C,Q,T’), V4(P’,C,T,Q’) (unsatisfiable, since V1 requires a quarter ≥ Aut98 and V4 a quarter ≤ Aut97)
- q’(S,C,P) :- V2(S,P,C,Q), V4(P,C,T’,Q)
The algorithm produces a maximally-contained rewriting, taken as the union of the combinations that survive.
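The first step of the bucket algorithm on this example can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm as published: every atom is assumed to have distinct variables as arguments, comparisons are reduced to one inclusive interval per variable, and quarters such as Aut95 are encoded as the integer 1995. A view subgoal is placed in a bucket when it has the same predicate as the query subgoal, when every head variable of the query maps to a head variable of the view, and when the intervals are compatible. The second step is only enumerated here; the containment check that the full algorithm applies to each candidate is not implemented.

```python
from itertools import product

INF = float("inf")

# A rule is (head variables, body atoms, bounds). An atom is (predicate, args);
# bounds maps a variable to an inclusive (low, high) interval.
# Assumed encoding: the quarter AutNN is the integer 19NN.
QUERY = (
    ("S", "C", "P"),
    [("Teaches", ("P", "C", "Q")),
     ("Registered", ("S", "C", "Q")),
     ("Course", ("C", "T"))],
    {"C": (300, INF), "Q": (1995, INF)},
)
VIEWS = {
    "V1": (("st", "cn", "qu", "ti"),
           [("Registered", ("st", "cn", "qu")), ("Course", ("cn", "ti"))],
           {"cn": (500, INF), "qu": (1998, INF)}),
    "V2": (("st", "pr", "cn", "qu"),
           [("Registered", ("st", "cn", "qu")), ("Teaches", ("pr", "cn", "qu"))],
           {}),
    "V3": (("st", "cn"),
           [("Registered", ("st", "cn", "qu"))],
           {"qu": (-INF, 1994)}),
    "V4": (("pr", "cn", "ti", "qu"),
           [("Registered", ("st", "cn", "qu")), ("Course", ("cn", "ti")),
            ("Teaches", ("pr", "cn", "qu"))],
           {"qu": (-INF, 1997)}),
}
UNBOUNDED = (-INF, INF)


def overlaps(a, b):
    """True if two inclusive intervals share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


def build_buckets(query, views):
    """Step 1: one bucket of (view name, renamed head) per query subgoal."""
    q_head, q_body, q_bounds = query
    buckets = []
    for q_pred, q_args in q_body:
        bucket = []
        for name, (v_head, v_body, v_bounds) in views.items():
            for v_pred, v_args in v_body:
                if v_pred != q_pred:
                    continue
                pairs = list(zip(q_args, v_args))
                # Head variables of the query must be exposed by the view.
                visible = all(v in v_head for q, v in pairs if q in q_head)
                # Comparison predicates must not contradict each other.
                consistent = all(
                    overlaps(q_bounds.get(q, UNBOUNDED), v_bounds.get(v, UNBOUNDED))
                    for q, v in pairs)
                if visible and consistent:
                    rename = {v: q for q, v in pairs}
                    # Unmapped view variables are shown primed, e.g. st'.
                    bucket.append((name, tuple(rename.get(v, v + "'") for v in v_head)))
        buckets.append(bucket)
    return buckets


BUCKETS = build_buckets(QUERY, VIEWS)
# Step 2 (enumeration only): one entry from every bucket.
CANDIDATES = list(product(*BUCKETS))

for (pred, args), bucket in zip(QUERY[1], BUCKETS):
    print(pred, args, "->", bucket)
print(len(CANDIDATES), "candidate combinations")
```

With these definitions the buckets match the table in the example: V3 is rejected by the interval test, and V4 is kept out of the Registered bucket because it does not expose the student in its head.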

The Inverse-rules algorithm
Construct a set of rules that invert the view definitions.
View:
  V3(dept, c-number) :- Major(student, dept), Registered(student, c-number)
Inverse rules:
  Major(f1(dept,X), dept) :- V3(dept,X)
  Registered(f1(Y,c-number), c-number) :- V3(Y,c-number)
Query:
  Q(dept) :- Major(student,dept), Registered(student,444)
V3 = {(CS,444), (EE,444), (CS,333)}
→ Major = {(f1(CS,444), CS), (f1(EE,444), EE), (f1(CS,333), CS)}
  Registered = {(f1(CS,444), 444), (f1(EE,444), 444), (f1(CS,333), 333)}
→ Q = {CS, EE}
For a more efficient rewriting, unfold the inverse rules and remove redundant subgoals from the unfolded rules.
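The computation in this example can be replayed in a few lines. The sketch below applies the two inverse rules to the extension of V3 and then evaluates Q over the derived relations. Representing the function term f1(dept, c-number) as a tagged tuple is an assumption made for the example, as are the variable and function names.

```python
# Extension of the view V3(dept, c-number) from the slide.
V3 = {("CS", 444), ("EE", 444), ("CS", 333)}


def f1(dept, c_number):
    """Function (Skolem) term standing for the unknown student behind a V3 tuple."""
    return ("f1", dept, c_number)


# Inverse rules:
#   Major(f1(dept, X), dept)              :- V3(dept, X)
#   Registered(f1(Y, c-number), c-number) :- V3(Y, c-number)
major = {(f1(dept, c), dept) for dept, c in V3}
registered = {(f1(dept, c), c) for dept, c in V3}


def q(major, registered, course=444):
    """Q(dept) :- Major(student, dept), Registered(student, course)."""
    return {dept
            for student, dept in major
            for student2, c in registered
            if student == student2 and c == course}


answers = q(major, registered)
print(sorted(answers))
```

The function terms only serve as join values; none of them appears in the answer because the head of Q contains only dept.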

Bucket vs. Inverse-rules
- Both algorithms produce a maximally-contained rewriting.
- Computing buckets is similar in spirit to computing the inverse rules: both compute the views that are relevant to single atoms of the database relations.
- The Bucket algorithm computes the relevant views by taking into consideration the context in which the atom appears in the query.
- The inverse rules can be computed once and are applicable to any query.

The MiniCon algorithm
It addresses the limitations of the previous algorithms. Instead of building rewritings by combining rewritings for each query subgoal or database relation, it considers how each of the variables in the query can interact with the available views (MiniCon Description, MCD).
Query:
  Q(D) :- Major(S,D), Registered(S,444,Q), Advises(P,S)
Views:
  V1(dept) :- Major(student,dept), Registered(student,444,quarter)
  V2(prof,dept,area) :- Advises(prof,student), Prof(name,area)
  V3(dept,c-number) :- Major(student,dept), Registered(student,c-number,quarter), Advises(prof,student)
V1 covers Major and Registered but does not expose student in its head, so it cannot be joined with a view covering Advises(P,S). V3 covers all three subgoals.

Results
The key advantage of the MiniCon algorithm is that its second phase considers far fewer combinations of MCDs than the Cartesian product of the buckets or the number of unfoldings of inverse rules.

Conclusions
- Using views to answer queries is an important problem, especially for information integration on the web.
- Query containment and containment mappings provide the key to solving the problem.

Conclusions (cont.)
- The variants of the problem are NP-complete.
- This is not too bad, since queries are usually short.
- In many practical cases, there is an algorithm for solving the problem.

Bibliography
[Lev00] Alon Y. Levy. Answering Queries Using Views: A Survey. Department of Computer Science and Engineering, University of Washington, pages 1-43, 2000.
[Mit99] Prasenjit Mitra. An Algorithm for Answering Queries Efficiently Using Views. Infolab, Stanford University, pages 1-13, September 1999.
[DG97b] Oliver M. Duschka and Michael R. Genesereth. Query Planning in InfoMaster. In Proceedings of the ACM Symposium on Applied Computing, San Jose, CA, 1997.