Materializing Views With Minimal Size To Answer Queries


Materializing Views With Minimal Size To Answer Queries Rada Chirkova (North Carolina State University) and Chen Li (University of California, Irvine)

Materializing Minimal-Size Views
Context: relational databases
The problem: minimize the amount of data required to answer queries, by:
- automatically designing new relations (views), and
- precomputing and storing (materializing) the new relations
Central issue: inventing new views to materialize
Applications include:
- Mediators in data-integration systems
- “Database as a service” in enterprise computing
Speaker notes:
- People use a database to store data and ask queries, often the same queries again, while data can be added or changed; the goal is to make these queries more efficient.
- Stress that the workload has multiple queries.
- Not discussed: update costs or indexing costs.
- The big question: why materialize views at all (the ancestor example).
- Why view selection is not a completely satisfactory solution.
Chirkova, Halevy, and Suciu VLDB-2001 5/9/2019 A formal perspective on the view selection problem
Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003

Example: Modified TPC-H Query
Q(name,o_date,priority,comment,o_key,quantity,shipmode) :-
    customer(c_key,name,'building'),
    order(o_key,c_key,o_date,priority,comment),
    lineitem(lineno,o_key,quantity,shipmode).
V1(name,o_date,priority,comment,o_key) :-
V2(o_key,quantity,shipmode) :-

Partial Answer to the Query Q
Name  O_Date    Priority  Comment  O_Key   Quantity  Shipmode
Tom   3/14/95   0         close…   134721  26        REG AIR
Tom   3/14/95   0         close…   134721  75        REG AIR
Tom   3/14/95   0         close…   134721  43        AIR
Jack  12/21/94  0         final…   571683  43        MAIL
Jack  12/21/94  0         final…   571683  33        AIR

Minimal-Size Views for the Query Q
Q(name,o_date,priority,comment,o_key,quantity,shipmode) :-
    customer(c_key,name,'building'),
    order(o_key,c_key,o_date,priority,comment),
    lineitem(lineno,o_key,quantity,shipmode).
V1(name,o_date,priority,comment,o_key) :-
V2(o_key,quantity,shipmode) :-
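To make the size argument concrete, the sketch below takes the five rows of the partial answer to Q shown earlier and splits them into the two view schemas. It rests on two assumptions that the slides do not state: that V1 and V2 hold exactly the projections of Q's answer onto their head attributes, and that "size" is counted in stored cells. The helper names `project` and `cells` are hypothetical.

```python
# Rows of the partial answer to Q, copied from the slide.
COLUMNS = ("name", "o_date", "priority", "comment", "o_key", "quantity", "shipmode")
Q_ROWS = {
    ("Tom", "3/14/95", 0, "close…", 134721, 26, "REG AIR"),
    ("Tom", "3/14/95", 0, "close…", 134721, 75, "REG AIR"),
    ("Tom", "3/14/95", 0, "close…", 134721, 43, "AIR"),
    ("Jack", "12/21/94", 0, "final…", 571683, 43, "MAIL"),
    ("Jack", "12/21/94", 0, "final…", 571683, 33, "AIR"),
}


def project(rows, columns, keep):
    """Project a set of tuples onto the attributes in `keep` (set semantics)."""
    idx = [columns.index(c) for c in keep]
    return {tuple(row[i] for i in idx) for row in rows}


def cells(rows):
    """Size measure assumed here: total number of stored attribute values."""
    return sum(len(row) for row in rows)


# Assumption: each view stores the projection of Q's answer onto its head.
v1 = project(Q_ROWS, COLUMNS, ("name", "o_date", "priority", "comment", "o_key"))
v2 = project(Q_ROWS, COLUMNS, ("o_key", "quantity", "shipmode"))

# Rewriting of Q over the views: join V1 and V2 on o_key
# (last attribute of V1, first attribute of V2).
rejoined = {a + b[1:] for a in v1 for b in v2 if a[4] == b[0]}

print(cells(Q_ROWS), cells(v1) + cells(v2), rejoined == Q_ROWS)
```

On these five rows the two views together store fewer cells than the answer itself, because the customer and order attributes are no longer repeated once per line item.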

Questions
How do we know that views V1 and V2 are minimal-size views for the query Q? On what databases?
How to find a set of minimal-size views, given a set of queries and a database:
- Is the problem decidable? For what inputs?
- What is the complexity of the problem?
- Are there good efficient algorithms for finding minimal-size views?

Preliminaries
Two queries are equivalent if they return the same answers on any database.
An equivalent rewriting of a query Q in terms of views V is a query that:
- is defined using the relations in V only, and
- is equivalent to Q
A conjunctive query (view) can be defined using only equality selections, projections, and joins.
A disjunctive query (view) can be defined as a union of a finite number of conjunctive queries (views).
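For conjunctive queries under set semantics, equivalence can be tested with the classical containment-mapping criterion (Chandra and Merlin): Q1 is contained in Q2 exactly when some mapping of Q2's variables sends Q2's head to Q1's head and every subgoal of Q2 to a subgoal of Q1. The following is a minimal sketch of that test, not code from the talk. The query encoding (a head tuple plus a list of `(predicate, arguments)` pairs) and the convention that terms starting with an uppercase letter are variables are assumptions made for this example.

```python
def is_var(term):
    """Assumed convention: variables start with an uppercase letter."""
    return term[:1].isupper()


def unify(src_args, dst_args, mapping):
    """Extend `mapping` so that src_args maps onto dst_args, or return None."""
    mapping = dict(mapping)
    for s, d in zip(src_args, dst_args):
        if is_var(s):
            if mapping.setdefault(s, d) != d:
                return None  # variable already mapped to a different term
        elif s != d:
            return None  # constants must map to themselves
    return mapping


def find_mapping(src, dst):
    """Search for a containment mapping from query `src` to query `dst`."""
    (src_head, src_body), (dst_head, dst_body) = src, dst
    if len(src_head) != len(dst_head):
        return None
    start = unify(src_head, dst_head, {})
    if start is None:
        return None

    def extend(i, mapping):
        if i == len(src_body):
            return mapping
        pred, args = src_body[i]
        for dst_pred, dst_args in dst_body:
            if dst_pred == pred and len(dst_args) == len(args):
                candidate = unify(args, dst_args, mapping)
                if candidate is not None:
                    result = extend(i + 1, candidate)
                    if result is not None:
                        return result
        return None  # backtrack

    return extend(0, start)


def contained_in(q1, q2):
    """q1 is contained in q2 iff a containment mapping from q2 to q1 exists."""
    return find_mapping(q2, q1) is not None


def equivalent(q1, q2):
    return contained_in(q1, q2) and contained_in(q2, q1)


# Qa(X) :- p(X,Y), p(X,Z).   Qb(X) :- p(X,Y).   Qc(X) :- p(X,X).
QA = (("X",), [("p", ("X", "Y")), ("p", ("X", "Z"))])
QB = (("X",), [("p", ("X", "Y"))])
QC = (("X",), [("p", ("X", "X"))])

print(equivalent(QA, QB), contained_in(QC, QB), equivalent(QB, QC))
```

Here `QA` and `QB` are equivalent because the redundant second subgoal of `QA` folds onto the first, while `QC` is contained in `QB` but not equivalent to it.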

Problem Specification
Input:
- Database instance D with schema R
- Workload Q of queries on D
Output (optimal solution): a set V of views, such that:
- each query in Q has an equivalent rewriting in terms of V, and
- the total size of the views, Σ_{Vi ∈ V} size(Vi), is minimal on D
Speaker notes:
- Stress that the workload has multiple queries.
- Not discussed: update costs or indexing costs.
- Why does the rewriting use only the views? Because all original relations can be dematerialized (and some new views can be original relations).

Assumptions
- Single database instance
- Set semantics
- Finite query workloads
- Conjunctive queries
- Disjunctive views and rewritings
Speaker note: the “no indexes” assumption is essential.

Main Results
- Decidability and upper bounds on the complexity of the problem
- Relationship between a restriction on the language of the queries and the language of optimal views
- Dynamic-programming algorithm for finding an optimal solution for conjunctive queries (restricted case)

Conjunctive Views and Rewritings
Theorem. Given a query workload Q and a database D, it is possible to construct a finite search space of views that includes all views in all optimal solutions for Q on D. The number of views in the search space is at most doubly-exponential in the size of the input query workload Q.
Corollary. The problem of finding a minimal-size conjunctive viewset is decidable for finite workloads of conjunctive queries, assuming all rewritings are conjunctive.

Self-Joins in Queries
Q1(X,Y) :- p(X,Z), p(Z,T), s(Z,Y). // self-join
Q2(X,Y) :- p(X,Z), r(Z,T), s(Z,Y). // no self-joins
Result 1. For some databases and queries, there is a set of disjunctive views that is better than any conjunctive solution. (Example: a single query with self-joins.)
Result 2. The problem of finding an optimal solution in the space of disjunctive views is decidable, assuming conjunctive rewritings.
Result 3. It is not necessary to consider disjunctive rewritings.
Result 4. The size of the search space of views is at most triply-exponential in the size of the input query workload.
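Whether a conjunctive query has a self-join can be read off its body: some relation name occurs in more than one subgoal. The sketch below checks this for Q1 and Q2 from the slide; the body encoding as `(predicate, arguments)` pairs and the function name `self_joined_relations` are assumptions made for the example.

```python
from collections import Counter


def self_joined_relations(body):
    """Return the relation names that appear in more than one subgoal."""
    counts = Counter(pred for pred, _args in body)
    return sorted(pred for pred, n in counts.items() if n > 1)


# Q1(X,Y) :- p(X,Z), p(Z,T), s(Z,Y).
Q1_BODY = [("p", ("X", "Z")), ("p", ("Z", "T")), ("s", ("Z", "Y"))]
# Q2(X,Y) :- p(X,Z), r(Z,T), s(Z,Y).
Q2_BODY = [("p", ("X", "Z")), ("r", ("Z", "T")), ("s", ("Z", "Y"))]

print(self_joined_relations(Q1_BODY))  # relation p occurs twice in Q1
print(self_joined_relations(Q2_BODY))  # no relation repeats in Q2
```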

Queries Without Self-Joins: The Problem Is in NP
disjunctive views
conjunctive views
subexpression views
full-reducer views

1. Conjunctive Views Are Enough
Theorem. Given a database D and a set of queries Q without self-joins, suppose a set V of disjunctive views is a solution for (D,Q). Then there exists another solution V’ for (D,Q), such that:
- all views in V’ are conjunctive, and
- size(V’) ≤ size(V).
Corollary. For any database and any set of queries without self-joins, some optimal disjunctive solution is a set of conjunctive views.

What We Have Shown
disjunctive views
conjunctive views

Idea of the Proof
Given:
- Q(…) :- S1(…), S2(…), …, Sn(…);
- a rewriting P of Q that uses V: V = V1 ∪ V2 ∪ … ∪ Vt
Then there exists V’ = V’1 ∪ V’2 ∪ … ∪ V’t such that:
- for some mapping μ, each V’i is an image of Vi, and
- each V’i alone can replace any Vj in the rewriting of Q

Details of the Proof (1)
P ≡ Q, P = P1 ∪ P2 ∪ ... ∪ Ps
There exists a conjunctive query Pi: Pi ≡ Q
Pi(…) :- Vi1(…), …, Vij(…), …, Vim(…), G(…).
Fix any Vij in Pi; consider, in P,
Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…).
Because Pr is contained in Q, there exists a mapping β from Q to the expansion of Pr.
We can always change β to redirect all subgoals of Q that map into subgoals of Vij in Pr.

Details of the Proof (2)
We can always change β to redirect all subgoals of Q that map into subgoals of more than one Vij in Pr.
Then, we can replace Pr with P’r:
Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…).
P’r(…) :- Vij(…), G(…).
And P’r ≡ Q

Details of the Proof (3)
Changing β to redirect all subgoals of Q that map into subgoals of Vij in Pr:
Q(…) :- …, Sk(…,W,…), …
Pr^exp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), …
Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…)
Diagram labels: β’, β

Details of the Proof (4)
Thus, we can replace Pr with P’r:
Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…).
P’r(…) :- Vij(…), G(…).
And P’r ≡ Q

2. Subexpression Views Are Enough
Theorem. Given a database D and a set of queries Q without self-joins, suppose a set V of disjunctive views is a solution for (D,Q). Then there exists another solution V’ for (D,Q), such that:
- all views in V’ are conjunctive subexpression-type, and
- size(V’) ≤ size(V).
Corollary. For any database and set of queries without self-joins, some optimal disjunctive solution is a set of conjunctive subexpression-type views.
The size of the search space of views is at most singly-exponential in the size of the input query workload.

3. Full-Reducer Views Are Enough
A view V is a full-reducer view for a query Q if V and Q have the same body.
Theorem. Given a database D and a single query Q without self-joins, suppose a set V of disjunctive views is a solution for (D,Q). Then there exists another solution V’ for (D,Q), such that:
- all views in V’ are conjunctive full-reducer views for Q, and
- size(V’) ≤ size(V).
Corollary. For any database and any query without self-joins, some optimal disjunctive solution is a set of conjunctive full-reducer views.

Using Full-Reducer Views To Rewrite Sets of Queries
For query workloads with more than one query, we can merge optimal full-reducer views for individual queries in the workload, and the number of subgoals in the merged views never exceeds the number of subgoals in full-reducer views.

What We Have Shown
disjunctive views
conjunctive views
subexpression views
full-reducer views

The Problem Is in NP
Theorem. Given a database instance, for any finite workload of conjunctive queries without self-joins, the problem of finding a minimal-size disjunctive viewset is in NP.

Generating Minimal-Size Views
Input: a conjunctive query without self-joins and a database
Output: a minimal-size disjunctive viewset for the query on the database
Method: produce a minimal-size set of conjunctive full-reducer views, by doing exhaustive search in the space of the views using a dynamic-programming algorithm (cf. query optimization in System R)
- The algorithm returns an optimal solution
- Can be modified to work for non-singleton query workloads

Heuristics for Generating Views
- Consider only those views that “cover” up to a fixed number of subgoals of the query
- Consider only those views that have up to a fixed number of head attributes
- Apply the algorithm separately to several subsets of subgoals of the query, then combine the solutions
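The first heuristic can be pictured as a cap on which subgoal sets are enumerated as candidate view bodies. The sketch below is only an illustration under an assumption the slides leave open: a view is taken to "cover" exactly the query subgoals that appear in its body. The function name `candidate_subgoal_sets` is hypothetical, and the subgoal names come from the TPC-H example query.

```python
from itertools import combinations


def candidate_subgoal_sets(subgoals, max_covered):
    """Yield every non-empty combination of at most `max_covered` subgoals."""
    for k in range(1, max_covered + 1):
        yield from combinations(subgoals, k)


SUBGOALS = ("customer", "order", "lineitem")

# Without the cap, every non-empty subset of subgoals is a candidate.
unrestricted = list(candidate_subgoal_sets(SUBGOALS, len(SUBGOALS)))
# With the cap set to 2, the three-subgoal candidate is pruned.
restricted = list(candidate_subgoal_sets(SUBGOALS, 2))

print(len(unrestricted), len(restricted))
```

With n subgoals and a fixed cap k, the number of candidate bodies grows polynomially in n rather than as 2^n - 1, which is the point of bounding the search.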

Main Results
- Decidability and upper bounds on the complexity of the problem
- Relationship between a restriction on the language of the queries and the language of optimal views
- Dynamic-programming algorithm for finding an optimal solution for conjunctive queries (restricted case)

Some Directions of Future Work
Rewriting queries in more expressive languages:
- built-in predicates
- disjunctive queries
- …
Using more expressive languages of views and rewritings
Maximally-contained rewritings of queries in terms of views

Reference
Jia Li, Rada Chirkova, and Chen Li. Minimizing Data-Communication Costs by Decomposing Query Results in Client-Server Environments. UCI ICS Technical Report, 2003.
http://www-db.ics.uci.edu/pages/raccoon/