Materializing Views With Minimal Size To Answer Queries Rada Chirkova (North Carolina State University) and Chen Li (University of California, Irvine)
Materializing Minimal-Size Views Chirkova, Halevy, and Suciu VLDB-2001 5/9/2019 Materializing Minimal-Size Views Context: relational databases The problem: minimize the amount of data required to answer queries, by: automatically designing new relations (views), and precomputing and storing (materializing) the new relations Central issue: inventing new views to materialize Applications include: Mediators in data-integration systems “Database as a service” in enterprise computing 02/06/02: need to split into two slides? (want to talk about related work - view selection) The talk needs to be 45 minutes! People use db: put data, ask queries (same queries again) - can add or change data => make these queries more efficient Stress that *multiple* queries Will not be discussing: - update costs - or indexing costs The big question: why need to materialize views at all => the ancestor example Why view selection is not a completely satisfactory solution Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 2 A formal perspective on the view selection problem
Example: Modified TPC-H Query Q(name,o_date,priority,comment,o_key,quantity, shipmode) :- customer(c_key,name,’building’), order(o_key,c_key,o_date,priority,comment), lineitem(lineno,o_key,quantity,shipmode). V1(name,o_date,priority,comment,o_key) :- V2(o_key,quantity,shipmode) :- Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 3
Partial Answer to the Query Q Name O_Date Priority Comment O_Key Quantity Shipmode Tom 3/14/95 0 close… 134721 26 REG AIR Tom 3/14/95 0 close… 134721 75 REG AIR Tom 3/14/95 0 close… 134721 43 AIR Jack 12/21/94 0 final… 571683 43 MAIL Jack 12/21/94 0 final… 571683 33 AIR Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 4
Minimal-Size Views for the Query Q Q(name,o_date,priority,comment,o_key,quantity, shipmode) :- customer(c_key,name,’building’), order(o_key,c_key,o_date,priority,comment), lineitem(lineno,o_key,quantity,shipmode). V1(name,o_date,priority,comment,o_key) :- V2(o_key,quantity,shipmode) :- Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 5
Questions How do we know that views V1 and V2 are minimal-size views for the query Q? On what databases? How to find a set of minimal-size views, given a set of queries and a database: Is the problem decidable? For what inputs? What is the complexity of the problem? Are there good efficient algorithms for finding minimal-size views? Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 6
Chirkova, Halevy, and Suciu VLDB-2001 5/9/2019 Preliminaries Two queries are equivalent if they return the same answers on any database. An equivalent rewriting of a query Q in terms of views V is a query that: is defined using the relations in V only, and is equivalent to Q A conjunctive query (view) can be defined using only equality selections, projections, and joins A disjunctive query (view) can be defined as a union of a finite number of conjunctive queries (views) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 7 A formal perspective on the view selection problem
Problem Specification Chirkova, Halevy, and Suciu VLDB-2001 5/9/2019 Problem Specification Input: Database instance D with schema R Workload Q of queries on D Output (optimal solution): a set V of views, such that: each query in Q has an equivalent rewriting in terms of V, and the total size of the views, SVi Î V size(Vi), is minimal on D Stress that *multiple* queries Will not be discussing: - update costs - or indexing costs Why rewriting uses only the views? - because can dematerialize all original relations (and some new views can be original relations) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 8 A formal perspective on the view selection problem
Chirkova, Halevy, and Suciu VLDB-2001 5/9/2019 Assumptions Single database instance Set semantics Finite query workloads Conjunctive queries Disjunctive views and rewritings 02/06/02: the “no indexes” assumption is essential Put weighted sum on slides? Set semantics? Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 9 A formal perspective on the view selection problem
Chirkova, Halevy, and Suciu VLDB-2001 5/9/2019 Main Results Decidability and upper bounds on the complexity of the problem Relationship between: a restriction on the language of the queries, and the language of optimal views Dynamic-programming algorithm for finding an optimal solution for conjunctive queries (restricted case) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 10 A formal perspective on the view selection problem
Conjunctive Views and Rewritings Theorem. Given a query workload Q and a database D. It is possible to construct a finite search space of views that includes all views in all optimal solutions for Q on D. The number of views in the search space is at most doubly-exponential in the size of the input query workload Q. Corollary. The problem of finding a minimal-size conjunctive viewset is decidable for finite workloads of conjunctive queries, assuming all rewritings are conjunctive. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 11
Self-Joins in Queries Q1(X,Y) :- p(X,Z), p(Z,T), s(Z,Y). // self-join Q2(X,Y) :- p(X,Z), r(Z,T), s(Z,Y). // no self-joins Result 1. For some databases and queries, there is a set of disjunctive views that is better than any conjunctive solution. Example for a single query with self-joins Result 2. The problem of finding an optimal solution in the space of disjunctive views is decidable, assuming conjunctive rewritings. Result 3. It is not necessary to consider disjunctive rewritings. Result 4. The size of the search space of views is at most triply-exponential in the size of the input query workload. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 12
Queries Without Self-Joins: The Problem Is in NP Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 13
Queries Without Self-Joins: The Problem Is in NP disjunctive views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 13
Queries Without Self-Joins: The Problem Is in NP disjunctive views conjunctive views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 13
Queries Without Self-Joins: The Problem Is in NP disjunctive views conjunctive views subexpression views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 13
Queries Without Self-Joins: The Problem Is in NP disjunctive views conjunctive views subexpression views full-reducer views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 13
1. Conjunctive Views Are Enough Theorem. Given a database D and a set of queries Q without self-joins. Suppose a set V of disjunctive views is a solution for (D,Q). Then there exists another solution V’ for (D,Q), such that: all views in V’ are conjunctive, and size (V’) £ size (V). Corollary. For any database and any set of queries without self-joins, some optimal disjunctive solution is a set of conjunctive views. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 14
What We Have Shown disjunctive views conjunctive views 15 Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 15
Idea of the Proof Given: Q(…) :- S1(…), S2(…), …, Sn(…); rewriting P of Q that uses V: V = V1 È V2 È … È Vt Then there exists: V’ = V’1 È V’2 È … È V’t such that: for some mapping m, each V’i is an image of Vi, and each V’i alone can replace any Vj in the rewriting of Q Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 16
Details of the Proof (1) P º Q, P = P1 È P2 È ... È Ps There exists a conjunctive query Pi: Pi º Q Pi (…) :- Vi1(…), …, Vij(…), …, Vim(…), G(…). Fix any Vij in Pi; consider, in P, Pr (…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). Because Pr is contained in Q, there exists a mapping b from Q to the expansion of Pr We can always change b, to redirect all subgoals of Q that map into subgoals of Vij in Pr Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 17
Details of the Proof (1) P º Q, P = P1 È P2 È ... È Ps There exists a conjunctive query Pi: Pi º Q Pi (…) :- Vi1(…), …, Vij(…), …, Vim(…), G(…). Fix any Vij in Pi; consider, in P, Pr (…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). Because Pr is contained in Q, there exists a mapping b from Q to the expansion of Pr We can always change b, to redirect all subgoals of Q that map into subgoals of Vij in Pr Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 17
Details of the Proof (1) P º Q, P = P1 È P2 È ... È Ps There exists a conjunctive query Pi: Pi º Q Pi (…) :- Vi1(…), …, Vij(…), …, Vim(…), G(…). Fix any Vij in Pi; consider, in P, Pr (…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). Because Pr is contained in Q, there exists a mapping b from Q to the expansion of Pr We can always change b, to redirect all subgoals of Q that map into subgoals of Vij in Pr Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 17
Details of the Proof (1) P º Q, P = P1 È P2 È ... È Ps There exists a conjunctive query Pi: Pi º Q Pi (…) :- Vi1(…), …, Vij(…), …, Vim(…), G(…). Fix any Vij in Pi; consider, in P, Pr (…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). Because Pr is contained in Q, there exists a mapping b from Q to the expansion of Pr We can always change b, to redirect all subgoals of Q that map into subgoals of Vij in Pr Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 17
Details of the Proof (2) We can always change b, to redirect all subgoals of Q that map into subgoals of more than one Vij in Pr Then, we can replace Pr with P’r: Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). P’r(…):- Vij(…), G(…). And P’r º Q Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 18
Details of the Proof (3) Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) b Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) b Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) b Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) b Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (3) Changing b, to redirect all subgoals of Q that map into subgoals of Vij in Pr : Q(…) :- …, Sk(…,W,…), … Prexp(…) :- …, Sk(…,Y’,…), …, Sk(…,Y,…), … Pr(…) :- Vij(…), Vij(…), …, Vij(…), G(…) b’ b Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 19
Details of the Proof (4) Thus, we can replace Pr with P’r: Pr(…) :- Vij(…), …, Vij(…), …, Vij(…), G(…). P’r(…):- Vij(…), G(…). And P’r º Q Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 20
2. Subexpression Views Are Enough Theorem. Given a database D and a set of queries Q without self-joins. Suppose a set V of disjunctive views is a solution for (D,Q). Then there exists another solution V’ for (D,Q), such that: all views in V’ are conjunctive subexpression-type, and size (V’) £ size (V). Corollary. For any database and set of queries without self-joins, some optimal disjunctive solution is a set of conjunctive subexpression-type views. The size of the search space of views is at most singly-exponential in the size of the input query workload Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 21
3. Full-Reducer Views Are Enough A view V is a full-reducer view for a query Q if V and Q have the same body. Theorem. Given a database D and a single query Q without self-joins. Suppose a set V of disjunctive views is a solution for (D,Q). Then there exists another solution V’ for (D,Q), such that: all views in V’ are conjunctive full-reducer views for Q, and size (V’) £ size (V). Corollary. For any database and any query without self-joins, some optimal disjunctive solution is a set of conjunctive full-reducer views. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 22
Using Full-Reducer Views To Rewrite Sets of Queries For query workloads with more than one query, we can merge optimal full-reducer views for individual queries in the workload - and the number of subgoals in the merged views never exceeds the number of subgoals in full-reducer views. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 23
What We Have Shown disjunctive views conjunctive views subexpression views full-reducer views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 24
The Problem Is in NP Theorem. Given a database instance, for any finite workload of conjunctive queries without self-joins, the problem of finding a minimal-size disjunctive viewset is in NP. Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 25
Generating Minimal-Size Views Input: a conjunctive query without self-joins and a database Output: a minimal-size disjunctive viewset for the query on the database Method: produce a minimal-size set of conjunctive full-reducer views, by doing exhaustive search in the space of the views using a dynamic-programming algorithm (cf. query optimization in System R) The algorithm returns an optimal solution Can be modified to work for non-singleton query workloads Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 26
Heuristics for Generating Views Consider only those views that “cover” up to a fixed number of subgoals of the query Consider only those views that have up to a fixed number of head attributes Apply the algorithm separately to several subsets of subgoals of the query, then combine the solutions Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 27
Chirkova, Halevy, and Suciu VLDB-2001 5/9/2019 Main Results Decidability and upper bounds on the complexity of the problem Relationship between: a restriction on the language of the queries, and the language of optimal views Dynamic-programming algorithm for finding an optimal solution for conjunctive queries (restricted case) Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 28 A formal perspective on the view selection problem
Some Directions of Future Work Chirkova, Halevy, and Suciu VLDB-2001 5/9/2019 Some Directions of Future Work Rewriting queries in more expressive languages: built-in predicates disjunctive queries … Using more expressive languages of views and rewritings Maximally-contained rewritings of queries in terms of views Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 29 A formal perspective on the view selection problem
Chirkova, Halevy, and Suciu VLDB-2001 5/9/2019 Reference Jia Li, Rada Chirkova, and Chen Li. Minimizing Data-Communication Costs by Decomposing Query Results in Client-Server Environments. UCI ICS Technical Report, 2003. http://www-db.ics.uci.edu/pages/raccoon/ Chirkova and Li Materializing Views with Minimal Size to Answer Queries 6/09/2003 30 A formal perspective on the view selection problem