CS848 Presentation Heng YU (Henry)


Paper to present
Answering queries using views: A survey, by A. Y. Halevy. VLDB Journal 10: pp

Outline
– Introduction with examples
– Formal problem definitions
– Conditions of view usability
– Using materialized views in query optimization
– Answering queries using views in data integration
– Theoretical results
– Extensions
– Conclusion and challenges

Introduction

Problems (informal)
Given a query Q and a set of views V1, ..., Vn over a database schema:
– Is it possible to answer Q using only the answers to V1, ..., Vn?
– What is the maximal set of tuples in the answer of Q that we can get from V1, ..., Vn?
– If we can access both the views and the database relations, what is the cheapest query execution plan for answering Q?

Fields of application
– Query optimization
– Physical data independence
– Data integration
– More, e.g. semantic caching

Example: a university schema
  Prof(name, area)
  Course(c-number, title)
  Teaches(prof, c-number, quarter)
  Registered(student, c-number, quarter)
  Major(student, dept)
  Works(prof, dept)
  Advises(prof, student)
Keys: Prof(name), Course(c-number)
Graduate course: c-number ≥ 400
Ph.D. course: c-number ≥ 500

Query optimization
Suppose we have a view for graduate course registration info:
  create view Graduate as
    select Registered.student, Course.title, Course.c-number, Registered.quarter
    from Registered, Course
    where Registered.c-number = Course.c-number
      and Course.c-number ≥ 400

Query optimization (cont.)
We want to find the students registered in Ph.D.-level courses taught by a professor who is interested in the DB area:
  select Registered.student, Course.title
  from Teaches, Prof, Registered, Course
  where Prof.name = Teaches.prof
    and Teaches.c-number = Registered.c-number
    and Teaches.quarter = Registered.quarter
    and Registered.c-number = Course.c-number
    and Course.c-number ≥ 500
    and Prof.area = 'DB'

Query:
  select Registered.student, Course.title
  from Teaches, Prof, Registered, Course
  where Prof.name = Teaches.prof
    and Teaches.c-number = Registered.c-number
    and Teaches.quarter = Registered.quarter
    and Registered.c-number = Course.c-number
    and Course.c-number ≥ 500
    and Prof.area = 'DB'
View:
  create view Graduate as
    select Registered.student, Course.title, Course.c-number, Registered.quarter
    from Registered, Course
    where Registered.c-number = Course.c-number
      and Course.c-number ≥ 400

Query optimization (cont.)
Result of query rewriting:
  select Graduate.student, Graduate.title
  from Teaches, Prof, Graduate
  where Prof.name = Teaches.prof
    and Teaches.c-number = Graduate.c-number
    and Teaches.quarter = Graduate.quarter
    and Graduate.c-number ≥ 500
    and Prof.area = 'DB'
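The equivalence of the original query and its rewriting over the Graduate view can be checked on a small instance. The following is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and column names use underscores (c_number) instead of hyphens, which is an assumption made only so the names are valid SQL identifiers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table prof(name text primary key, area text);
create table course(c_number integer primary key, title text);
create table teaches(prof text, c_number integer, quarter text);
create table registered(student text, c_number integer, quarter text);

-- illustrative rows only, not taken from the slides
insert into prof values ('Ann', 'DB'), ('Bob', 'AI');
insert into course values (350, 'Intro DB'), (450, 'Data Mgmt'),
                          (848, 'Query Opt'), (886, 'Planning');
insert into teaches values ('Ann', 848, 'F05'), ('Ann', 450, 'F05'),
                           ('Bob', 886, 'F05');
insert into registered values ('s1', 848, 'F05'), ('s2', 450, 'F05'),
                              ('s3', 886, 'F05'), ('s4', 848, 'W06');

create view graduate as
  select registered.student, course.title, course.c_number, registered.quarter
  from registered, course
  where registered.c_number = course.c_number and course.c_number >= 400;
""")

# The original query over the base relations.
original = conn.execute("""
  select registered.student, course.title
  from teaches, prof, registered, course
  where prof.name = teaches.prof
    and teaches.c_number = registered.c_number
    and teaches.quarter = registered.quarter
    and registered.c_number = course.c_number
    and course.c_number >= 500 and prof.area = 'DB'
""").fetchall()

# The rewriting, which replaces the Registered-Course join by the view.
rewritten = conn.execute("""
  select graduate.student, graduate.title
  from teaches, prof, graduate
  where prof.name = teaches.prof
    and teaches.c_number = graduate.c_number
    and teaches.quarter = graduate.quarter
    and graduate.c_number >= 500 and prof.area = 'DB'
""").fetchall()

print(sorted(original) == sorted(rewritten))
```

Agreement on one instance does not prove equivalence; it only illustrates why the view is usable: it keeps the join attributes and applies a weaker selection (≥ 400) that the rewriting tightens to ≥ 500.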

Maintaining physical data independence
– Relational database systems rely on a 1-1 mapping between relations and files.
– In object-oriented and semistructured databases, the logical model is more redundant and does not reflect the optimal physical design.
– Physical storage can be described as views over the logical model, e.g. GMAP (Tsatalos et al. 96).

Maintaining physical data independence (cont.)
GMAP (generalized multi-level access paths)
  def gmap G1 as b+-tree by
    given Student.name
    select Department
    where Student major Department
  def gmap G2 as b+-tree by
    given Student.name
    select Course.c-number
    where Student registered Course
  def gmap G3 as b+-tree by
    given Course.c-number
    select Department
    where Student registered Course and Student major Department

Maintaining physical data independence (cont.)
Query:
  select Student.name, Department
  where Student registered Course
    and Student major Department
    and Course.c-number ≥ 500
Plans (π = projection, σ = selection, ⋈ = join):
  1. π[Student.name, Department] (σ[Course.c-number ≥ 500] (G1 ⋈[Student.name] G2))
  2. σ[Course.c-number ≥ 500] (G3) ⋈[Course.c-number] G2

Data integration Providing a uniform query interface to a multitude of autonomous heterogeneous data sources. Giving users a mediated schema. Local as View: specifying data source descriptions as a view over the mediated schema.

Data integration (cont.)
Example mediated schema:
  Prof(name, area)
  Course(c-number, title, univ)
  Teaches(prof, c-number, quarter, univ)
  Register(student, c-number, quarter)
  Major(student, dept)
  Works(prof, dept)
  Advises(prof, student)

Data integration (cont.)
Suppose we have only 2 views available:
  create view DB-courses as
    select Course.title, Teaches.prof, Course.c-number, Course.univ
    from Teaches, Course
    where Teaches.c-number = Course.c-number
      and Teaches.univ = Course.univ
      and Course.title = 'Database Systems'
  create view UW-phd-courses as
    select Course.title, Teaches.prof, Course.c-number, Course.univ
    from Teaches, Course
    where Teaches.c-number = Course.c-number
      and Course.univ = 'UW'
      and Teaches.univ = 'UW'
      and Course.c-number ≥ 500

Data integration (cont.)
Query: who teaches database courses at UW?
  select prof
  from DB-courses
  where univ = 'UW'
Query: all graduate courses at UW. The views cannot return every such course, so the best we can do is a contained rewriting:
  select title, c-number
  from DB-courses
  where univ = 'UW' and c-number ≥ 400
  UNION
  select title, c-number
  from UW-phd-courses
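To see why the second rewriting is only contained in the query rather than equivalent to it, the following sketch builds a small instance of the mediated schema with sqlite3, defines the two sources as views, and compares the rewriting against the query evaluated directly on the mediated relations. In a real integration system the mediated relations are not stored, so the direct evaluation here is purely for illustration. The rows and the underscore names (db_courses, uw_phd_courses, c_number) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table course(c_number integer, title text, univ text);
create table teaches(prof text, c_number integer, quarter text, univ text);

-- illustrative rows only
insert into course values
  (448, 'Database Systems', 'UW'), (348, 'Database Systems', 'UW'),
  (501, 'Compilers', 'UW'), (451, 'Operating Systems', 'UW'),
  (544, 'Database Systems', 'MIT');
insert into teaches values
  ('Ann', 448, 'F05', 'UW'), ('Ann', 348, 'W06', 'UW'),
  ('Bob', 501, 'F05', 'UW'), ('Cid', 451, 'F05', 'UW'),
  ('Eve', 544, 'F05', 'MIT');

create view db_courses as
  select course.title, teaches.prof, course.c_number, course.univ
  from teaches, course
  where teaches.c_number = course.c_number
    and teaches.univ = course.univ
    and course.title = 'Database Systems';

create view uw_phd_courses as
  select course.title, teaches.prof, course.c_number, course.univ
  from teaches, course
  where teaches.c_number = course.c_number
    and course.univ = 'UW' and teaches.univ = 'UW'
    and course.c_number >= 500;
""")

# The rewriting: uses only the two source views.
rewriting = set(conn.execute("""
  select title, c_number from db_courses
  where univ = 'UW' and c_number >= 400
  union
  select title, c_number from uw_phd_courses
"""))

# The query itself, evaluated on the (normally virtual) mediated relations.
direct = set(conn.execute("""
  select title, c_number from course
  where univ = 'UW' and c_number >= 400
"""))

print(sorted(direct - rewriting))  # graduate courses no source can return
```

In this instance the 400-level Operating Systems course is neither a database course nor a Ph.D.-level course, so neither source exports it and the rewriting misses it.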

Comparison of the two applications
                     | Query optimization and physical design        | Data integration
Output               | Query execution plan Q'                       | Query Q'
Equivalence with Q   | Q' must be equivalent to Q                    | Q' can be equivalent to or contained in Q
Data accessed        | Original relational data + materialized views | Only views
Number of views      | Modest                                        | Huge
View completeness    | Yes                                           | No
Rewriting reasoning  | Logical correctness + cost model              | Logical correctness

Formal Problem Definition

Query containment and equivalence
Definition: A query Q1 is said to be contained in a query Q2, denoted by Q1 ⊆ Q2, if for all database instances D, the set of tuples computed for Q1 is a subset of those computed for Q2, i.e., Q1(D) ⊆ Q2(D). The two queries are equivalent if Q1 ⊆ Q2 and Q2 ⊆ Q1.
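For conjunctive queries without comparison predicates, containment can be decided by searching for a containment mapping: Q1 ⊆ Q2 holds exactly when the variables of Q2 can be mapped to terms of Q1 so that the head of Q2 maps to the head of Q1 and every subgoal of Q2 maps to some subgoal of Q1. The sketch below is a brute-force search, not an optimized implementation. The encoding (a query as a pair of head tuple and list of atoms, with variables written as capitalized strings) and the two sample queries are assumptions made for illustration.

```python
def is_var(term):
    # Convention assumed here: variables are capitalized strings.
    return isinstance(term, str) and term[:1].isupper()

def extend(mapping, src, dst):
    """Extend mapping so that the term tuple src maps onto dst, or return None."""
    if len(src) != len(dst):
        return None
    result = dict(mapping)
    for s, d in zip(src, dst):
        if is_var(s):
            if result.setdefault(s, d) != d:
                return None   # variable already mapped to a different term
        elif s != d:
            return None       # constants must match exactly
    return result

def contained_in(q1, q2):
    """True if q1 is contained in q2 (conjunctive queries, no comparisons)."""
    head1, body1 = q1
    head2, body2 = q2
    start = extend({}, head2, head1)
    if start is None:
        return False

    def search(i, mapping):
        if i == len(body2):
            return True
        pred, args = body2[i]
        for pred1, args1 in body1:
            if pred1 == pred:
                bigger = extend(mapping, args, args1)
                if bigger is not None and search(i + 1, bigger):
                    return True
        return False

    return search(0, start)

# q_taught(S) :- Registered(S, C, Q), Teaches(P, C, Q)
q_taught = (("S",), [("Registered", ("S", "C", "Q")),
                     ("Teaches", ("P", "C", "Q"))])
# q_all(S) :- Registered(S, C, Q)
q_all = (("S",), [("Registered", ("S", "C", "Q"))])

print(contained_in(q_taught, q_all), contained_in(q_all, q_taught))
```

The first test succeeds because adding the Teaches subgoal can only remove answers; the second fails because no subgoal of q_all can be the image of Teaches.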

Equivalent rewritings
Definition: Let Q be a query and V = {V1, V2, …, Vm} be a set of view definitions. The query Q' is an equivalent rewriting of Q using V if:
– Q' refers only to the views in V;
– Q' is equivalent to Q.

Maximally-contained rewritings
Definition: Let Q be a query, V = {V1, V2, …, Vm} be a set of view definitions, and L be a query language. The query Q' is a maximally-contained rewriting of Q using V w.r.t. L if:
– Q' is a query in L that refers only to the views in V;
– Q' is contained in Q;
– there is no rewriting Q1 ∈ L such that Q' ⊆ Q1 ⊆ Q and Q1 is not equivalent to Q'.

Certain answers
– Problem: finding all the answers to a query given a set of views.
– Not equivalent to a maximally-contained rewriting, because maximal containment depends on the query language.
– Formalized by certain answers (Abiteboul et al. 98).
A tuple α is a certain answer of Q w.r.t. a set of view definitions {Vi} and their extensions {vi} if α is in Q(D) for every possible database instance D such that Vi(D) = vi (closed-world assumption, CWA) or Vi(D) ⊇ vi (open-world assumption, OWA).

Conditions of view usability

View usability conditions
For an SPJ view V to be usable in an equivalent rewriting of an SPJ query Q under bag semantics:
1. There is a mapping ψ from the occurrences of tables mentioned in the from clause of V to those mentioned in the from clause of Q, mapping every table name to itself. For bag semantics, ψ must be a 1-1 mapping.
2. V must either apply the join and selection predicates in Q on the attributes of the tables in the domain of ψ, or apply to them a logically weaker selection and select the attributes on which predicates still need to be applied.
3. V must not project out any attributes of the tables in the domain of ψ that are needed in the selection of Q.

Using materialized views in query optimization

System-R style optimization
Single-table access paths
  Traditional optimizer: access paths on all tables.
  Optimizer using views: also considers usable materialized views.
Combining partial plans
  Traditional optimizer: the predicates of the two partial plans are known, and the cheapest combination is considered.
  Optimizer using views: considers joining partial plans with several alternative join predicates.
Pruning of plans
  Traditional optimizer: saves the cheapest plan of each equivalence class.
  Optimizer using views: compares any pair of plans and discards one if a cheaper plan dominates it.
Termination testing
  Traditional optimizer: has the equivalence class that includes all relations in the query been considered?
  Optimizer using views: have all partial plans been examined?

System-R style (cont.)

Queries with grouping and aggregation
Example view:
  create view V as
    select c-number, year, Max(evaluation) as maxeval, Count(*) as offerings
    from Teaches
    where c-number ≥ 400
    group by c-number, year
Query:
  select year, Count(*), Max(evaluation)
  from Teaches
  where c-number ≥ 500
  group by year

Queries with grouping and aggregation (cont.)
The query can be rewritten to:
  select year, Sum(offerings), Max(maxeval)
  from V
  where c-number ≥ 500
  group by year
Comments: grouping and aggregation impose additional restrictions on view usability.
– The grouping in the view must be finer than that in the query.
– The aggregates in the query must be recoverable from the output fields and aggregates of the view.
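The recovery of aggregates from the view can be checked on sample data: Count(*) per year is the sum of the per-course counts stored in the view, and Max(evaluation) per year is the maximum of the per-course maxima. The sketch below uses sqlite3. The table layout teaches(prof, c_number, year, evaluation), the rows, and the underscore column name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table teaches(prof text, c_number integer, year integer,
                     evaluation integer);

-- illustrative rows only
insert into teaches values
  ('Ann', 848, 2004, 45), ('Bob', 848, 2004, 39),
  ('Ann', 550, 2004, 48), ('Cid', 450, 2004, 50),
  ('Ann', 848, 2005, 41);

create view v as
  select c_number, year, max(evaluation) as maxeval, count(*) as offerings
  from teaches
  where c_number >= 400
  group by c_number, year;
""")

# The query over the base relation.
direct = conn.execute("""
  select year, count(*), max(evaluation)
  from teaches
  where c_number >= 500
  group by year
  order by year
""").fetchall()

# The rewriting: re-aggregates the view's finer (c_number, year) groups.
rewritten = conn.execute("""
  select year, sum(offerings), max(maxeval)
  from v
  where c_number >= 500
  group by year
  order by year
""").fetchall()

print(direct == rewritten)
```

The 450-level course has the highest evaluation in the data, but it is filtered out by both versions, so it does not affect the result.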

Answering queries using views for data integration

Main approaches
Use a datalog representation for both the query Q and the views V.
Algorithms:
– Bucket algorithm (Levy et al. 96)
– Inverse rules algorithm (Qian et al. 96)
– MiniCon algorithm (Pottinger et al. 00)

Bucket algorithm
Step 1: create a bucket for each non-comparison subgoal g in Q. For each subgoal g' in a view V, if there is a unifier θ for g and g' such that, after unification,
  1) the comparison predicates in Q and V are simultaneously satisfiable, and
  2) whenever a variable appears in head(Q) and in subgoal g of the query, the corresponding variable in g' also appears in head(V),
then add θ(head(V)) to the bucket of g.
Step 2: enumerate candidate conjunctive rewritings, each consisting of one conjunct from each bucket. A candidate is a conjunctive rewriting if either
  1) the conjunction is contained in Q, or
  2) it is possible to add comparison atoms such that the resulting conjunction is contained in Q.

Bucket algorithm example
V1(student, c-number, quarter, title) :- Registered(student, c-number, quarter), Course(c-number, title), c-number ≥ 500, quarter ≥ Aut98.
V2(student, prof, c-number, quarter) :- Registered(student, c-number, quarter), Teaches(prof, c-number, quarter).
V3(student, c-number) :- Registered(student, c-number, quarter), quarter ≤ Aut94.
V4(prof, c-number, title, quarter) :- Registered(student, c-number, quarter), Course(c-number, title), Teaches(prof, c-number, quarter), quarter ≤ Aut97.

Bucket algorithm example (cont.)
Query:
  Q(S,C,P) :- Teaches(P,C,Q), Registered(S,C,Q), Course(C,T), C ≥ 300, Q ≥ Aut95.
Buckets:
  Teaches(P,C,Q) | Registered(S,C,Q) | Course(C,T)
  V2(S',P,C,Q)   | V1(S,C,Q,T')      | V1(S',C,Q',T)
  V4(P,C,T',Q)   | V2(S,P',C,Q)      | V4(P',C,T,Q')
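The bucket-construction step for this example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the full algorithm: it handles only atoms without constants or repeated variables, unifies arguments by position, checks comparison predicates one subgoal at a time, keys buckets by predicate name, and records only the view name rather than the unified view head. Quarters are encoded as two-digit years (Aut98 becomes 98), and all Python names are hypothetical.

```python
INF = float("inf")

# Each view: (head variables, body atoms, interval bounds on its variables).
VIEWS = {
    "V1": (("student", "c", "quarter", "title"),
           [("Registered", ("student", "c", "quarter")),
            ("Course", ("c", "title"))],
           {"c": (500, INF), "quarter": (98, INF)}),
    "V2": (("student", "prof", "c", "quarter"),
           [("Registered", ("student", "c", "quarter")),
            ("Teaches", ("prof", "c", "quarter"))],
           {}),
    "V3": (("student", "c"),
           [("Registered", ("student", "c", "quarter"))],
           {"quarter": (-INF, 94)}),
    "V4": (("prof", "c", "title", "quarter"),
           [("Registered", ("student", "c", "quarter")),
            ("Course", ("c", "title")),
            ("Teaches", ("prof", "c", "quarter"))],
           {"quarter": (-INF, 97)}),
}

QUERY = (("S", "C", "P"),
         [("Teaches", ("P", "C", "Q")),
          ("Registered", ("S", "C", "Q")),
          ("Course", ("C", "T"))],
         {"C": (300, INF), "Q": (95, INF)})

def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])

def build_buckets(query, views):
    q_head, q_body, q_bounds = query
    unbounded = (-INF, INF)
    buckets = {}
    for pred, q_args in q_body:
        bucket = []
        for name, (v_head, v_body, v_bounds) in views.items():
            for v_pred, v_args in v_body:
                if v_pred != pred:
                    continue
                pairs = list(zip(q_args, v_args))
                # Condition 2: head variables of Q must be exported by the view.
                exported = all(v in v_head for q, v in pairs if q in q_head)
                # Condition 1: comparisons must be simultaneously satisfiable.
                satisfiable = all(
                    overlaps(q_bounds.get(q, unbounded),
                             v_bounds.get(v, unbounded))
                    for q, v in pairs)
                if exported and satisfiable and name not in bucket:
                    bucket.append(name)
        buckets[pred] = bucket
    return buckets

buckets = build_buckets(QUERY, VIEWS)
print(buckets)
```

V3 is rejected for Registered because its quarters end at Aut94 while the query starts at Aut95, and V4 is rejected for Registered because it does not export the student variable.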

Bucket algorithm example (cont.)
Result of rewriting:
  q'(S,C,P) :- V2(S',P,C,Q), V1(S,C,Q,T')
  q'(S,C,P) :- V4(P,C,T',Q), V1(S,C,Q,T'), V4(P',C,T,Q')
  q'(S,C,P) :- V2(S,P,C,Q), V4(P,C,T',Q)
The second query is empty, because V4 requires quarter ≤ Aut97 while V1 requires quarter ≥ Aut98. The result is therefore the union of the first and the third conjunctive queries.

Bucket algorithm comments
Advantages
– Prunes a significant number of candidate rewritings.
– Returns a maximally-contained rewriting when the query has no comparison predicates.
Disadvantages
– The Cartesian product of the buckets is still large.
– Testing query containment is costly; it is Π^p_2-complete.

Inverse-rules algorithm
– Construct a set of rules that invert the view definitions.
– Idea: each tuple in the head of a view definition is a witness of tuples in the relations corresponding to the subgoals in its body.
– Assign one Skolem function symbol to each existential variable in the view definition.

Inverse-rules algorithm example
View definition:
  V3(dept, c-number) :- Major(student, dept), Registered(student, c-number)
Inverse rules:
  Major(f1(dept, X), dept) :- V3(dept, X)
  Registered(f1(Y, c-number), c-number) :- V3(Y, c-number)

Inverse-rules algorithm example (cont.)
Query:
  q(dept) :- Major(student, dept), Registered(student, 444)
V3 has tuples: {(CS, 444), (EE, 444), (CS, 333)}
Applying the inverse rules:
  Major: {(f1(CS, 444), CS), (f1(EE, 444), EE), (f1(CS, 333), CS)}
  Registered: {(f1(CS, 444), 444), (f1(EE, 444), 444), (f1(CS, 333), 333)}
Answer: {CS, EE}
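The example can be replayed in Python by representing each Skolem term f1(dept, c-number) as a tagged tuple. This is a minimal sketch of the idea for this one view and query, not a general datalog engine. The tuple encoding of Skolem terms and the variable names are assumptions.

```python
# Extension of the view V3(dept, c-number).
v3 = {("CS", 444), ("EE", 444), ("CS", 333)}

# Inverse rules: every view tuple witnesses one unknown student, named by
# the Skolem term f1(dept, c_number), who majors in dept and is registered
# in c_number.
major = {(("f1", dept, c_number), dept) for dept, c_number in v3}
registered = {(("f1", dept, c_number), c_number) for dept, c_number in v3}

# q(dept) :- Major(student, dept), Registered(student, 444)
answer = {dept
          for student, dept in major
          for other, c_number in registered
          if student == other and c_number == 444}

print(sorted(answer))
```

The Skolem terms never appear in the answer; they only serve to join the Major and Registered tuples that came from the same view tuple.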

Inverse-rules algorithm comments
Advantages
– Simplicity and modularity.
– Returns a maximally-contained rewriting.
Disadvantages
– Keeps more views that do not contribute to the answer than the bucket algorithm does.
– Requires recomputing the relations from the views, so the benefit of using precomputed materialized views is lost.

MiniCon algorithm
– An improvement on the bucket algorithm that aims to eliminate more views that are useless to the query.
– When we find a unification between a subgoal g' in V and a subgoal g in Q, all other subgoals that join with g in Q are examined. V must either have the join attribute in its head or contain the corresponding joined subgoals in its body.
– For each view, compute a MiniCon consisting of all the subgoals of the query to which the view contributes.

MiniCon example
  q(D) :- Major(S, D), Registered(S, 444, Q), Advises(P, S)
  V1(dept) :- Major(student, dept), Registered(student, 444, quarter).
  V2(prof, dept, area) :- Advises(prof, student), Prof(name, area).
  V3(dept, c-number) :- Major(student, dept), Registered(student, c-number, quarter), Advises(prof, student).
MiniCon(V1) = ∅, MiniCon(V2) = ∅, MiniCon(V3) = {Major, Registered, Advises}

Theoretical results (very selective)

Completeness
Question: given a query Q and a set of views V, will the algorithm find an equivalent rewriting of Q using V when one exists?
When a conjunctive query Q has no comparison predicates and has n subgoals, there exists an equivalent conjunctive rewriting of Q using V only if there is a rewriting with at most n subgoals. Deciding whether such a rewriting exists is NP-hard (Levy et al. 1995).

Recursive rewriting
Goal: when we apply a maximally-contained rewriting, we also want to get the set of all certain answers.
Recursive query rewriting is necessary when:
– The query is recursive.
– Database relations have functional dependencies.
– There exist access pattern limitations on the views.
– Views have unions.
– Additional semantic information about class hierarchies on objects is expressed in DL.

Recursive rewriting (example with FDs)
Relation: schedule(Airline, Flight_no, Date, Pilot, Aircraft)
FDs: Pilot -> Airline, Aircraft -> Airline
View:
  v(D, P, C) :- schedule(A, N, D, P, C)
Query:
  Q(P) :- schedule(A, N, D, 'mike', C), schedule(A, N', D', P, C')
Rewriting:
  relevantPilot('mike')
  relevantAircraft(C) :- v(D, 'mike', C)
  relevantAircraft(C) :- v(D, P, C), relevantPilot(P)
  relevantPilot(P) :- relevantPilot(P1), relevantAircraft(C), v(D1, P1, C), v(D2, P, C)
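The recursive rewriting can be evaluated bottom-up by iterating its rules until no new facts appear. The sketch below hand-codes that fixpoint for the relevantPilot and relevantAircraft rules. The view tuples are invented for illustration, and the function name is hypothetical.

```python
# Extension of the view v(Date, Pilot, Aircraft); illustrative tuples only.
V = {
    ("d1", "mike", "a1"),
    ("d2", "ann", "a1"),
    ("d3", "ann", "a2"),
    ("d4", "john", "a2"),
    ("d5", "sue", "a9"),
}

def relevant_pilots(v, start="mike"):
    """Naive bottom-up evaluation of the recursive rewriting."""
    pilots = {start}      # relevantPilot('mike')
    aircraft = set()
    while True:
        # relevantAircraft(C) :- v(D, P, C), relevantPilot(P)
        # (this also covers the rule for 'mike', who is a relevant pilot)
        new_aircraft = aircraft | {c for (_, p, c) in v if p in pilots}
        # relevantPilot(P) :- relevantPilot(P1), relevantAircraft(C),
        #                     v(D1, P1, C), v(D2, P, C)
        new_pilots = pilots | {
            p
            for (_, p1, c1) in v
            if p1 in pilots and c1 in new_aircraft
            for (_, p, c) in v
            if c == c1
        }
        if new_aircraft == aircraft and new_pilots == pilots:
            return pilots
        aircraft, pilots = new_aircraft, new_pilots

result = relevant_pilots(V)
print(sorted(result))
```

In this data john never shares an aircraft with mike, yet he is an answer: ann shares aircraft a1 with mike and a2 with john, and the FDs force all three to the same airline. No fixed-length conjunctive rewriting captures chains of arbitrary length, which is why recursion is needed.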

Finding certain answers
Open-world assumption:
– Polynomial in most practical cases.
– NP-hard (in the size of the view extensions) if unions are allowed in view definitions or inequality predicates are allowed in the query language.
Closed-world assumption:
– co-NP-hard even if both views and queries are CQs and have no comparison predicates.
Cf. GAV: polynomial.
In case views can contain incorrect tuples (assuming no comparison predicates in the views or the query):
– If all views are complete, or all views may have incorrect tuples: polynomial in the size of the view extensions.
– Otherwise: co-NP-hard.

Extensions

Object query languages (OQL) (Florescu 96)
– More semantic information for class hierarchies and attributes.
– OQL does not clearly separate select and where clauses; both can have path navigation.
Access pattern limitations (Rajaraman 95)
– Restricted parameterized queries on views, e.g. CitationDB^bf(X,Y) :- Cites(X,Y)
– A finite rewriting requires recursion.

Conclusion and challenges

Answering queries using views plays a significant role in query optimization, physical data independence, and data integration.
New fields to explore:
– Consider new query languages.
– Consider integration constraints.
– Bridge the gap between query optimization and data integration.
– Facilitate data warehouse queries: query result reuse, incremental computation.
– Decide which views are materialized first.

Thank you