Generating Efficient Plans for Queries Using Views Chen Li Stanford University with Foto Afrati (National Technical University of Athens) and Jeff Ullman.

Generating Efficient Plans for Queries Using Views Chen Li Stanford University with Foto Afrati (National Technical University of Athens) and Jeff Ullman (Stanford University) SIGMOD, Santa Barbara, CA, May 23, 2001

2 Answering queries using views
How to answer a query using only the results of views? [LMSS95]
Many applications:
– Data warehouses
– Data integration
– Query optimization
– …
[Figure: base relations R1, …, Rm; views V1, V2, …, Vn; query Q]

3 An example
View: V1(M, D, C) :- car(M, D), loc(D, C)
Query Q: Q(M, C) :- car(M, anderson), loc(anderson, C)
Rewriting P1: Q(M, C) :- V1(M, anderson, C)

car:
Make   Dealer
BMW    Alison
Honda  Anderson
Ford   Varsity
…      …

loc:
Dealer    City
Anderson  Palo Alto
Varsity   Redwood City
Alison    Mountain View
…         …
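The example can be checked with a few lines of Python. This is only a sketch: it uses the three sample rows visible in each table (lowercased), models relations as sets of tuples, and the variable names are chosen for illustration.

```python
# Sample rows from the slide's car and loc tables (constants lowercased).
car = {("bmw", "alison"), ("honda", "anderson"), ("ford", "varsity")}
loc = {("anderson", "palo alto"), ("varsity", "redwood city"),
       ("alison", "mountain view")}

# View V1(M, D, C) :- car(M, D), loc(D, C)
v1 = {(m, d, c) for (m, d) in car for (d2, c) in loc if d == d2}

# Query Q(M, C) :- car(M, anderson), loc(anderson, C), on the base relations.
q = {(m, c)
     for (m, d) in car if d == "anderson"
     for (d2, c) in loc if d2 == "anderson"}

# Rewriting P1: Q(M, C) :- V1(M, anderson, C), using the view only.
p1 = {(m, c) for (m, d, c) in v1 if d == "anderson"}

print(q)        # answers computed from car and loc
print(p1 == q)  # the rewriting returns the same answers
```

On this data both the query and the rewriting return the single pair (honda, palo alto).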

4 Existing algorithms
Bucket algorithm [LRO96], Inverse-rule algorithm [DG97], MiniCon algorithm [PL00], …
However, instead of generating
P1: Q(M, C) :- V1(M, anderson, C)
they generate the rewriting
P2: Q(M, C) :- V1(M, anderson, C1), V1(M1, anderson, C)
Why P2, not P1?
– These algorithms take the Open-World Assumption (OWA): “P2 ⊇ P1.”
– However, under the Closed-World Assumption (CWA): “P1 = P2.”
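A small sketch of why the two rewritings coincide under the CWA but can differ under the OWA. The rows below (toyota, san jose) are hypothetical and not from the slides; `complete_v1` models a view that stores every tuple of its definition, and `partial_v1` models a source that stores only some of them.

```python
def p1(v1):
    # P1: Q(M, C) :- V1(M, anderson, C)
    return {(m, c) for (m, d, c) in v1 if d == "anderson"}

def p2(v1):
    # P2: Q(M, C) :- V1(M, anderson, C1), V1(M1, anderson, C)
    makes = {m for (m, d, _) in v1 if d == "anderson"}
    cities = {c for (_, d, c) in v1 if d == "anderson"}
    return {(m, c) for m in makes for c in cities}

# Hypothetical base data: two makes sold by anderson, which has two locations.
car = {("honda", "anderson"), ("toyota", "anderson")}
loc = {("anderson", "palo alto"), ("anderson", "san jose")}

# CWA: the view holds every tuple of car joined with loc.
complete_v1 = {(m, d, c) for (m, d) in car for (d2, c) in loc if d == d2}

# OWA: the view holds only some of those tuples.
partial_v1 = {("honda", "anderson", "palo alto"),
              ("toyota", "anderson", "san jose")}

print(p1(complete_v1) == p2(complete_v1))  # same answers on a complete view
print(len(p1(partial_v1)), len(p2(partial_v1)))  # P2 recovers more answers
```

On the partial view, P2 returns all four (make, city) pairs while P1 returns only two, which is why OWA algorithms prefer P2 even though it has an extra subgoal.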

5 Differences between OWA and CWA
W1(Make, Dealer) :- car(Make, Dealer)
W2(Make, Dealer) :- car(Make, Dealer)
CWA: W1 = W2 = all car tuples
– W1 and W2 have all car tuples.
– E.g.: W1 and W2 are computed from the same car table in a database.
OWA:
– W1 and W2 have some car tuples.
– E.g.: W1 and W2 are from two different web sites.

6 Our problem: generating efficient plans using views under CWA
Existing algorithms work under both assumptions.
Our study
– takes the CWA.
– considers efficiency of rewritings.
[Figure: base relations R1, R2, …, Rm; materialized views V1, V2, …, Vn; query Q. Efficient plans?]

7 Challenge: in what space should we generate rewritings?
Base relations: car(Make, Dealer), loc(Dealer, City), part(Store, Make, City)
Query: Q(S, C) :- car(M, a), loc(a, C), part(S, M, C), where a = ‘anderson’
Views:
V1(M, D, C) :- car(M, D), loc(D, C)
V2(S, M, C) :- part(S, M, C)
V3(S) :- car(M, a), loc(a, C), part(S, M, C)
Rewritings:
P1: Q(S, C) :- V1(M, a, C), V2(S, M, C)
P2: Q(S, C) :- V3(S), V1(M, a, C), V2(S, M, C)
P2 could be more efficient than P1!

8 Focus
Input: views V1, V2, …, Vn, query Q, and a cost model CM.
Step 1: generate a rewriting P (logical plan).
Step 2: generate an efficient physical plan from P.
We focus on the logical level (step 1).
– Prune the rewriting space to generate “good” rewritings.
– Different from the one-step approach [CKPS95, ZCLPU00].
Both steps are cost-based.
We consider select-project-join queries, i.e., conjunctive queries.

9 Rest of the talk
Three cost models:
– CM1: number of subgoals in a physical plan
– CM2: sizes of views and intermediate relations
– CM3: CM2 + dropping attributes in intermediate relations
Experimental results
Conclusion and future directions

10 Cost model CM1
CM1: number of subgoals in a physical plan
– Goal: generate rewritings with minimum number of subgoals
Motivations:
– Reduce the number of joins
– Reduce the number of view accesses
Example:
– P1: Q(S, C) :- V1(M, a, C), V2(S, M, C) (more efficient)
– P2: Q(S, C) :- V1(M1, a, C), V1(M, a, C1), V2(S, M, C)
A view can appear more than once in different “forms.”

11 Results under CM1
Analyze the rewriting space:
– Find an interesting structure of the space;
– Show a procedure to reduce the number of subgoals in a rewriting.
Develop an algorithm CoreCover:
– Input: a query Q, views V1, …, Vn
– Output: rewritings with minimum number of subgoals
Optimality: if there is a rewriting, then CoreCover is guaranteed to find a rewriting with minimum number of subgoals.

12 CoreCover: example
Query: Q(S, C) :- car(M, a), loc(a, C), part(S, M, C)
Construct database D = { car(m0, a), loc(a, c0), part(s0, m0, c0) }
Evaluate views on D:
V1(M, D, C) :- car(M, D), loc(D, C) → V1(m0, a, c0)
V2(S, M, C) :- part(S, M, C) → V2(s0, m0, c0)
V3(S) :- car(M, a), loc(a, C), part(S, M, C) → V3(s0)
View tuples: V1(M, a, C), V2(S, M, C), V3(S)
Intuition: translate the problem to a set-covering problem.
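The first two steps of this example (building D from the query, then evaluating the views on D) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the atom encoding, the uppercase-means-variable convention, and the helper names `is_var`, `freeze`, and `evaluate` are all assumptions made here.

```python
def is_var(term):
    # Convention for this sketch: variables start with an uppercase letter.
    return term[0].isupper()

def freeze(term):
    # Turn variable M into the fresh constant m0; constants stay unchanged.
    return term.lower() + "0" if is_var(term) else term

def evaluate(head, body, db):
    """Naively evaluate a conjunctive query; return the set of head tuples."""
    bindings = [{}]
    for pred, args in body:
        extended = []
        for b in bindings:
            for fact in db.get(pred, set()):
                nb = dict(b)
                # A variable must agree with its earlier binding;
                # a constant must equal the fact's value.
                if all(nb.setdefault(a, v) == v if is_var(a) else a == v
                       for a, v in zip(args, fact)):
                    extended.append(nb)
        bindings = extended
    return {tuple(b[h] for h in head) for b in bindings}

query_body = [("car", ("M", "a")), ("loc", ("a", "C")),
              ("part", ("S", "M", "C"))]
views = {
    "V1": (("M", "D", "C"), [("car", ("M", "D")), ("loc", ("D", "C"))]),
    "V2": (("S", "M", "C"), [("part", ("S", "M", "C"))]),
    "V3": (("S",), query_body),
}

# Construct D by freezing every query variable into a distinct constant.
canonical_db = {}
for pred, args in query_body:
    canonical_db.setdefault(pred, set()).add(tuple(freeze(a) for a in args))

# Evaluate each view on D to obtain its view tuples.
view_tuples = {name: evaluate(head, body, canonical_db)
               for name, (head, body) in views.items()}
print(view_tuples)
```

Replacing the frozen constants m0, c0, s0 by the original variables M, C, S turns these results into the view tuples V1(M, a, C), V2(S, M, C), and V3(S) listed on the slide.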

13 CoreCover: example (cont.)
Query: Q(S, C) :- car(M, a), loc(a, C), part(S, M, C)
Views:
V1(M, D, C) :- car(M, D), loc(D, C)
V2(S, M, C) :- part(S, M, C)
V3(S) :- car(M, a), loc(a, C), part(S, M, C)
View tuples: V1(M, a, C), V2(S, M, C), V3(S)
Find query subgoals “covered” by each view tuple.
[Figure: view tuples V1(M, a, C), V2(S, M, C), V3(S) linked to the query subgoals car(M, a), loc(a, C), part(S, M, C) that they cover]
Find minimal covers of query subgoals using view tuples:
Q(S, C) :- V1(M, a, C), V2(S, M, C)
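The covering step can be sketched as a brute-force minimum set cover. The `covered_by` table below is an assumption read off the view definitions: V1(M, a, C) covers car and loc, V2(S, M, C) covers part, and V3(S) is assumed to cover nothing on its own because it exposes neither M nor C. The function name `minimum_covers` is hypothetical.

```python
from itertools import combinations

def minimum_covers(subgoals, covered_by):
    """Return every smallest set of view tuples that covers all subgoals."""
    names = sorted(covered_by)
    for k in range(1, len(names) + 1):
        hits = [combo for combo in combinations(names, k)
                if set().union(*(covered_by[n] for n in combo)) >= subgoals]
        if hits:
            return hits  # stop at the first size that admits a cover
    return []

subgoals = {"car(M, a)", "loc(a, C)", "part(S, M, C)"}

# Assumed coverage of each view tuple (see the lead-in above).
covered_by = {
    "V1(M, a, C)": {"car(M, a)", "loc(a, C)"},
    "V2(S, M, C)": {"part(S, M, C)"},
    "V3(S)": set(),
}

for cover in minimum_covers(subgoals, covered_by):
    print("Q(S, C) :- " + ", ".join(cover))
```

Exhaustive search is exponential in the number of view tuples, so this only suits small examples; it is meant to show the shape of the problem, not how CoreCover searches.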

14 Algorithm: CoreCover
Step 1: construct database D from Q.
Step 2: evaluate views on D to obtain the “view tuples” T1, T2, …, Tk.
Step 3: find query subgoals (among G1, G2, G3, …, Gm) “answered” by each view tuple.
Step 4: find minimal covers of query subgoals using view tuples; these covers give the rewritings.

15 Cost model CM2: considering sizes of views and intermediate relations
Motivation: the cost of V1 ⋈ V2 is related to size(V1) and size(V2).
Physical plan: Q( ) :- V1, V2, V3, …, Vn
“IR”: intermediate relation (IR1, IR2, …, IRn)
Cost = size(V1) + size(V2) + … + size(Vn) + size(IR1) + size(IR2) + … + size(IRn)

16 Results under CM2
Observation: adding more views may make a rewriting more efficient.
P1: Q(S, C) :- V1(M, a, C), V2(S, M, C)
P2: Q(S, C) :- V3(S), V1(M, a, C), V2(S, M, C)
If V3(S) is very selective, P2 can be more efficient than P1.
Larger search space: rewritings using view tuples produce an optimal physical plan under CM2.
– Modify CoreCover to find these rewritings.
– We discuss how to condense rewritings.
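A toy calculation of the CM2 cost shows how a selective V3(S) can pay for itself. Everything below is hypothetical: the data, the helper names `join` and `cm2_cost`, the join order chosen for P2, and the reading that IR1 is the first view itself and IRi is the join of the first i views with all attributes kept.

```python
def join(left, right):
    """Natural join of two lists of dict rows on their shared variables."""
    return [{**l, **r} for l in left for r in right
            if all(l[k] == r[k] for k in l.keys() & r.keys())]

def cm2_cost(plan):
    """Sum of view sizes plus sizes of all intermediate relations."""
    cost = sum(len(view) for view in plan)
    ir = plan[0]            # assumption: IR1 is the first view itself
    cost += len(ir)
    for view in plan[1:]:
        ir = join(ir, view)
        cost += len(ir)
    return cost

# Hypothetical instances, with V1 already restricted to dealer a.
v1 = [{"M": f"m{i}", "C": "c0"} for i in range(50)]
v2 = ([{"S": "s0", "M": "m0", "C": "c0"}] +
      [{"S": f"s{i}", "M": f"m{i}", "C": "c1"} for i in range(1, 50)])
v3 = [{"S": "s0"}]          # very selective: a single store

cost_p1 = cm2_cost([v1, v2])       # P1: V1, V2
cost_p2 = cm2_cost([v3, v2, v1])   # P2, joining V3 with V2 first
print(cost_p1, cost_p2)
```

Here P2 has one more subgoal than P1, yet its intermediate relations stay tiny because V3 filters V2 down to one row before V1 is joined in.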

17 Cost model CM3: dropping nonrelevant attributes
CM2: assumes all attributes are kept in IRs.
CM3: assumes attributes can be dropped in IRs to reduce sizes.
Bad news: we did not find a space that is guaranteed to produce an optimal physical plan.
Good news: we found a heuristic that lets the optimizer drop more attributes.
[Figure: plan Q( ) :- …, Vi, Vi+1, … with attribute Y in intermediate relation IRi]

18 Drop what attributes?
Drop Y if: (1) Y is not used in later joins, and (2) Y is not in the answers.
Called the “supplementary-relation approach” [BR87].
[Figure: plan Q( ) :- …, Vi, Vi+1, … with attribute Y in intermediate relation IRi]

19 Search space under CM3?
Base relations: r(A, B), s(C, D), t(E, F)
Q(A) :- r(A, A), t(A, B), s(B, B)
V1(A, B) :- r(A, A), s(B, B)
V2(A, B) :- t(A, B), s(B, B)
Rewritings using view tuples may not produce optimal physical plans!
Rewriting using view tuples: P1: Q(A) :- V1(A, B), V2(A, B)
A more efficient rewriting: P2: Q(A) :- V1(A, C), V2(A, B)
Note: P1 and P2 both compute the answers to Q.

20 Targeting rewritings to facilitate dropping of attributes
Goal: after the transformation, we may drop more attributes.
Main idea: given a sequence of subgoals, rename variables.
If, after renaming Y → Y’ in IRi, the new rewriting is still equivalent to Q, then drop Y’ in IRi even if Y appears in later joins.
P1: Q(A) :- V1(A, B), V2(A, B)
P2: Q(A) :- V1(A, C), V2(A, B)
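The effect of the renaming on P1 and P2 can be seen by computing which variables each intermediate relation has to keep under the two-part rule (keep a variable only if it is in the answer or used by a later subgoal). This is a minimal sketch; `kept_attributes` and the plan encoding are hypothetical names introduced here.

```python
def kept_attributes(head_vars, subgoals):
    """For each prefix of the plan, list the variables its IR must keep."""
    kept = []
    seen = set()
    for i, (_, args) in enumerate(subgoals):
        seen |= set(args)
        # Variables that some later subgoal still joins on.
        later = {a for _, later_args in subgoals[i + 1:] for a in later_args}
        kept.append(sorted(v for v in seen if v in head_vars or v in later))
    return kept

head = {"A"}
p1 = [("V1", ("A", "B")), ("V2", ("A", "B"))]
p2 = [("V1", ("A", "C")), ("V2", ("A", "B"))]

print(kept_attributes(head, p1))  # IR1 must carry B for the join with V2
print(kept_attributes(head, p2))  # after renaming, IR1 carries only A
```

In P2 the second argument of V1 is no longer shared with V2, so the first intermediate relation shrinks to the single attribute A.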

21 Experimental study
Purpose:
– Test how fast CoreCover generates rewritings (cost model CM1).
– Analyze its efficiency and scalability.
Experiment setup:
– A query generator (in Java). Input parameters:
  Number of base relations
  Number of attributes in a relation
  Number of views (1-1000), queries (5)
  Number of subgoals in a view and a query
  Shape of queries and views (star, chain, …)
– Implemented in Java on a dual-processor Sun Ultra 2 workstation running SunOS 5.6 with 256MB of memory

22 Star queries and views Each query has 8 subgoals, and each view has 1, 2, or 3 subgoals. No attribute projection in the head of the queries/views.

23 Chain queries and views Each query has 8 subgoals, and each view has 1, 2, or 3 subgoals. 1 variable is projected in the head of the queries/views.

24 Conclusion
Generating efficient plans using views under CWA:
– Cost model CM1: number of subgoals in a plan
  Analysis of the rewriting space
  A search space for rewritings
  CoreCover: finding rewritings with minimum number of subgoals
– Cost model CM2: sizes of views and IRs
  A search space for rewritings
  Condense rewritings
– Cost model CM3: dropping irrelevant attributes in IRs
  A heuristic to help the optimizer drop attributes

25 Future work
More complicated queries and views:
– Arithmetic comparisons (=, …)
– Aggregations
Different assumptions:
– Open-world assumption
– Maximally-contained rewritings
Constraints:
– Functional dependencies
– Foreign-key constraints

26 Thank you! Questions?

27 Differences between CoreCover and MiniCon
CoreCover takes the CWA, and MiniCon takes the OWA.
MiniCon tries to minimize the number of query subgoals, but it has no guarantee.
Technical differences:
– CoreCover is more “aggressive” than MiniCon about finding query subgoals answered by a view tuple.
– Finding set covers of query subgoals: CoreCover allows overlapping, and MiniCon does not allow it.

28 Difference from earlier studies
Input: views V1, V2, …, Vn, query Q, and a cost model CM.
Step 1: generate a rewriting P (logical plan).
Step 2: generate an efficient physical plan from P.
One-step approach: [CKPS95, ZCLPU00].
We focus on the logical level (step 1).
– Prune the rewriting space to generate “good” rewritings.
– Cost-based.

29 Rewriting space
[Figure: classes of rewritings (all rewritings, minimal rewritings, locally minimal rewritings, containment minimal rewritings, globally minimal rewritings), with P and P’ marked]
Rewriting P → P’: remove its redundant subgoals [Chandra & Merlin 77].

30 Rewriting space (cont.)
[Figure: classes of rewritings (all rewritings, minimal rewritings, locally minimal rewritings, containment minimal rewritings, globally minimal rewritings), with P, P’, and P’’ marked]
P’ → P’’: remove its subgoals while retaining its equivalence to Q:
P3: Q(S, C) :- V3(S), V1(M, a, C), V2(S, M, C)
V3(S) can still be removed.

31 Rewriting space (cont.)
[Figure: classes of rewritings (all rewritings, minimal rewritings, locally minimal rewritings, containment minimal rewritings, globally minimal rewritings), with P, P’, P’’, and P* marked]
P’’ → P*: transform P’’ using the mapping from the expansion of P’’ to the query:
P1: Q(S,C) :- v1(M1,a,C),v1(M,a,C1),v2(S,M,C) → P2: Q(S,C) :- v1(M,a,C), v2(S,M,C)

32 Concise representation of rewritings
Problem: as the number of views increases, the number of rewritings could be large!
Solution:
– Group views into equivalence classes.
– Group view tuples into equivalence classes based on their covered query subgoals.
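Grouping view tuples by the query subgoals they cover amounts to bucketing on the covered set. The sketch below uses made-up tuple names T1, T2, … and subgoal names G1, G2, …; the function `group_by_cover` is hypothetical.

```python
from collections import defaultdict

def group_by_cover(covered_by):
    """Put view tuples that cover the same query subgoals in one class."""
    classes = defaultdict(list)
    for name, cover in covered_by.items():
        classes[frozenset(cover)].append(name)
    return sorted(sorted(members) for members in classes.values())

# Hypothetical coverage information for five view tuples.
covered_by = {
    "T1": {"G1"},
    "T2": {"G2", "G3"},
    "T3": {"G4"},
    "T5": {"G2", "G3"},
    "T6": {"G1"},
}

print(group_by_cover(covered_by))
```

Only one representative per class needs to enter the covering step; the other members of a class can be swapped in afterwards to obtain further rewritings.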

33 Advantages
– Number of equivalence classes bounded by the number of query subgoals.
– The optimizer finds efficient physical plans by considering the “representative rewritings,” then decides how to make them more efficient by adding more view tuples.
– The optimizer can replace a view tuple in a rewriting by another view tuple in the same equivalence class to have another rewriting.
Equivalence classes of views V1, V2, …, Vn: {V1, V3}, {V4, V10, V15}, {V2, V9}, …
Equivalence classes of view tuples T1, T2, …, Tn: {T2, T5}, {T1, T6, T9}, {T3}, …

34 Main results of experiments CoreCover has good efficiency and scalability. By grouping views and view tuples into equivalence classes, we can reduce the number of views and view tuples used by CoreCover.

35 Star queries and views: Number of equivalence classes

36 Star queries and views: Number of view tuples