Optimizing Queries Using Materialized Views

Slides:



Advertisements
Similar presentations
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Advertisements

Tuning: overview Rewrite SQL (Leccotech)Leccotech Create Index Redefine Main memory structures (SGA in Oracle) Change the Block Size Materialized Views,
February 18, 2012 Lesson 3 Standard SQL. Lesson 3 Standard SQL.
1 Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
CS4432: Database Systems II
CS CS4432: Database Systems II Logical Plan Rewriting.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Ingres/Vectorwise Implementation Details XXV Ingres Benutzerkonferenz 2012 Confidential © 2011 Actian Corporation Doug Inkster 1 of 9.
SPRING 2004CENG 3521 Query Evaluation Chapters 12, 14.
DB performance tuning using indexes Section 8.5 and Chapters 20 (Raghu)
Query Processing & Optimization
Chapter 8 Physical Database Design. McGraw-Hill/Irwin © 2004 The McGraw-Hill Companies, Inc. All rights reserved. Outline Overview of Physical Database.
Optimizing queries using materialized views J. Goldstein, P.-A. Larson SIGMOD 2001.
Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved.
Practical Database Design and Tuning. Outline  Practical Database Design and Tuning Physical Database Design in Relational Databases An Overview of Database.
Optimizing Queries Using Materialized Views Qiang Wang CS848.
Database Management 9. course. Execution of queries.
QMapper for Smart Grid: Migrating SQL-based Application to Hive Yue Wang, Yingzhong Xu, Yue Liu, Jian Chen and Songlin Hu SIGMOD’15, May 31–June 4, 2015.
Ashwani Roy Understanding Graphical Execution Plans Level 200.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Query Optimization Arash Izadpanah. Introduction: What is Query Optimization? Query optimization is the process of selecting the most efficient query-evaluation.
Chapter 16 Practical Database Design and Tuning Copyright © 2004 Pearson Education, Inc.
1 Chapter 10 Joins and Subqueries. 2 Joins & Subqueries Joins – Methods to combine data from multiple tables – Optimizer information can be limited based.
SQL Server 2005 Implementation and Maintenance Chapter 3: Tables and Views.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
SqlExam1Review.ppt EXAM - 1. SQL stands for -- Structured Query Language Putting a manual database on a computer ensures? Data is more current Data is.
Relational Operator Evaluation. Overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g.,
Chapter 8 Physical Database Design. Outline Overview of Physical Database Design Inputs of Physical Database Design File Structures Query Optimization.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
Query Processing – Implementing Set Operations and Joins Chap. 19.
CS4432: Database Systems II Query Processing- Part 1 1.
More SQL: Complex Queries, Triggers, Views, and Schema Modification
Practical Database Design and Tuning
More SQL: Complex Queries,
Some TPC-H queries on Teradata and PostgreSQL
Indexes By Adrienne Watt.
Relational Database Design
Module 2: Intro to Relational Model
Database Management System
SQL Implementation & Administration
PL/SQL LANGUAGE MULITPLE CHOICE QUESTION SET-1
Lecture 2 The Relational Model
Bolin Ding Silu Huang* Surajit Chaudhuri Kaushik Chakrabarti Chi Wang
Database Performance Tuning and Query Optimization
Evaluation of Relational Operations
SQL Structured Query Language 11/9/2018 Introduction to Databases.
File Processing : Query Processing
Relational Operations
Physical Database Design
Practical Database Design and Tuning
Chapter 4 Indexes.
CH 4 Indexes.
More SQL: Complex Queries, Triggers, Views, and Schema Modification
CH 4 Indexes.
Four Rules For Columnstore Query Performance
Contents Preface I Introduction Lesson Objectives I-2
Overview of Query Evaluation
Implementation of Relational Operations
Chapter 2: Intro to Relational Model
Relational Database Design
Chapter 8 Advanced SQL.
Chapter 11 Database Performance Tuning and Query Optimization
Example of a Relation attributes (or columns) tuples (or rows)
Chapter 2: Intro to Relational Model
Evaluation of Relational Operations: Other Techniques
A Framework for Testing Query Transformation Rules
Diving into Query Execution Plans
Yan Huang - CSCI5330 Database Implementation – Query Processing
All about Indexes Gail Shaw.
CS 405G: Introduction to Database Systems
Presentation transcript:

Optimizing Queries Using Materialized Views Paul Larson & Jonathan Goldstein Microsoft Research 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Materialized views Precomputed, stored result defined by a view expression Faster queries but slower updates Issues View design View exploitation View maintenance View exploitation: determine whether and how a query (sub)expression can be computed from existing views 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Query optimization Generate rewrites, estimate cost, choose lowest-cost alternative Generating rewrites in SQL Server Apply local algebraic transformation rules to generate substitute expressions Logical exploration followed by physical optimization View matching is a logical rule that fires a view matching algorithm 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Example view create view v1 with schemabinding as select s_suppkey, s_name, s_nationkey, count_big(*) as cnt, sum(l_extendedprice*l_quantity) as grv from dbo.lineitem, dbo.supplier where p_partkey < 1000 and l_suppkey = s_suppkey group by s_suppkey, s_name, s_nationkey   create unique clustered index v1_cidx on v1(s_suppkey) create index v1_sidx on v1( grv, s_name) 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Example query Select n_nationkey, n_name, sum(l_extendedprice*l_quantity) from lineitem, supplier, nation where l_partkey between 100 and 500 and l_suppkey = s_suppkey and s_nationkey = n_nationkey group by n_nationkey, n_name Execution time on 1GB TPC-R database: 99 sec (cold), 27 sec (hot) 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Rewrite using v1 Select n_nationkey, n_name, smp from (select s_nationkey, sum(l_extendedprice*l_quantity) as smp from lineitem, supplier where l_suppkey = s_suppkey and l_suppkey between 100 and 500 group by s_nationkey) as sq1, nation where s_nationkey = n_nationkey Select n_nationkey, n_name, smp from (select s_nationkey, sum(grv)as smp from v1 where s_suppkey between 100 and 500 group by s_nationkey ) as sq1, nation where s_nationkey = n_nationkey Execution time on 1GB TPCD-R database: less than 1 sec 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Outline of the talk View matching algorithm Algorithm overview SPJ expressions, same tables referenced Extra tables in the view Grouping and aggregation Fast filtering of views Experimental results 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Design objectives SPJG views and query expressions Single-view substitutes Fast algorithm Scale to hundreds, even thousands of views 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Algorithm overview Quickly dismiss most views that cannot be used Detailed checking of remaining candidate views Construct substitute expressions 11/9/2018 Paul Larson, View matching

When can a SPJ expression be computed from a view? View contains all required rows The required rows can be selected from the view All output expressions can be computed from the view output All output rows occur with the right duplication factor (not always required) 11/9/2018 Paul Larson, View matching

Column equivalence classes W = PE and PNE PE = column equality predicates (R.Ci = S.Cj) PNE = all other predicates Compute column equivalence classes using PE Columns in the same equivalence class interchangeable in PNE, output expressions, and grouping expressions Replace column references by references to equivalence classes 11/9/2018 Paul Larson, View matching

View contains all required rows? Assumption: query and view reference the same tables Wq  Wv (containment) Pq1Pq2…Pqm  Pv1Pv2…Pvn Convert predicates to CNF Check that every Pvi matches some Pqj Shallow or deep matching? Too conservative – can do better 11/9/2018 Paul Larson, View matching

Exploiting column equivalences and range predicates PEq  PRq PUq  PEv  PRv Puv PE = column equality predicates (R.Ci = S.Cj) PR = range predicates (R.Ci < 50) PU = residual (uninterpreted) predicates PEq  PEv (Equijoin subsumption) PEq  PRq  PRv (Range subsumption) PEq PUq  PUv (Residual subsumption) 11/9/2018 Paul Larson, View matching

Equijoin subsumption test PEq  PEv Compute column equivalence classes for the query and the view Every view equivalence class must be a subset of some query equivalence class 11/9/2018 Paul Larson, View matching

Range subsumption test PEq  PRq  PRv Compute range intervals for every column equivalence class (initially (-,+)) Check that every query range interval is contained in a range interval of the corresponding view equivalence class 11/9/2018 Paul Larson, View matching

Residual subsumption test PEq PUq  PUv Treat as uninterpreted predicates Convert to CNF Apply predicate matching algorithm, taking into account column equivalences Currently using a shallow matching algorithm (convert to strings, compare strings) 11/9/2018 Paul Larson, View matching

Selecting rows from the view Compensating predicates Unmatched column equality predicates from the query Range predicates obtained when comparing query and view ranges Unmatched residual predicates from the query All column references must map to an output column in the view (taking into account column equivalences) 11/9/2018 Paul Larson, View matching

Compute output expressions Map simple column references to view output columns (taking into account column equivalences) Complex scalar expressions Check whether view outputs a matching expression Otherwise, check whether all operand columns available in view output 11/9/2018 Paul Larson, View matching

Correct duplication factor? Always true when query and view reference the same tables 11/9/2018 Paul Larson, View matching

Extra tables in the view View: R join S join T Query: R join S View usable if every row in (R join S) joins with exactly one row in T Row-extension join Corresponds to a foreign key from S to T Foreign key columns must be non-null Referenced columns in T must be a unique key 11/9/2018 Paul Larson, View matching

View join graph and the hub Row-extension join 11/9/2018 Paul Larson, View matching

If view contains extra tables… Compute hub of view join graph Hub must be a subset of tables used in the query Logically add the extra tables to the query through row-extension joins Just modify query’s column equivalence classes Proceed normally because query and view now reference the same tables 11/9/2018 Paul Larson, View matching

Group-by queries and views SPJ part of view contains all required rows and with correct duplication factor Compensating predicates computable View less or equally aggregated Query grouping columns available if further grouping required Query output expressions computable 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Further aggregation GB list of query must be a subset of GB list of view Query must use only partitionable aggregates Count, sum, min, max 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Example view and query Create view SalesByCust with schemabinding as Select c_custkey, c_name, c_mktsegment, n_name, count_big(*) as cnt, sum(o_totalprice)as stp from orders, customer, nation where c_custkey between 1000 and 5000 and o_custkey = c_custkey and c_nationkey = n_nationkey group by c_custkey, c_name, c_mktsegment, n_name Select c_mktsegment, sum(o_totalprice) from orders, customer where c_custkey between 1000 and 2000 and o_custkey = c_custkey group by c_mktsegment 11/9/2018 Paul Larson, View matching

Rewritten example query View hub {orders} subset of {orders, customer} Compensating predicate (c_custkey <= 2000) computable Query GB-list subset of view GB-list Output expressions computable Select c_mktsegment, sum(stp) from SalesByCust where c_custkey <= 2000 group by c_mktsegment 11/9/2018 Paul Larson, View matching

Fast filtering of views View descriptions in memory Too expensive to check all views each time Filter tree – index on view descriptions Tree subdivides views into smaller and smaller subsets Locating candidate views by traversing down the tree 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Filter tree structure Lattice index Key (set) Pointers Filter tree node 11/9/2018 Paul Larson, View matching

Source table condition TSv = set of tables referenced in view TSq must be a subset of TSv Subdivide views based on set of tables referenced Filter tree node with key = table set 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Hub condition View hub must be a subset of query’s source tables Add another level to the tree One tree node for each subset of views Further subdivide each set of views based on view hubs 11/9/2018 Paul Larson, View matching

Other partitioning conditions Output columns View’s output columns must be a superset of query’s output columns Grouping columns View’s GB list must be a subset of view’s GB list Range constrained columns View’s RC columns must be a subset of query’s RC columns Residual predicates View’s RP set must be a subset of query’s RP set Must consider column equivalences everywhere 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Experimental results Prototyped in SQL Server code base Database: TPC-H/R at 500MB Views: up to 1000 views Randomly generated, 75% with grouping Queries: 1000 queries 2:40%, 3:20%, 4:17%, 5:13%, 6:8%, 7:2% Machine: 700 MHz Pentium, 128MB 11/9/2018 Paul Larson, View matching

Paul Larson, View matching 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Statistics About 17.8 invocations per query Filter tree was highly effective Average fraction of views in candidate set 100 views 0.29%, 1000 views 0.36% 15-20% of candidates produced substitutes Avg. no of substitutes produced per query 100 views 0.7, 1000 views 10.5 11/9/2018 Paul Larson, View matching

Paul Larson, View matching 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Conclusion Our view matching algorithm is Flexible column equivalences, range predicates, hubs Fast and scalable But limited to SPJG expressions and single-view substitutes 11/9/2018 Paul Larson, View matching

Paul Larson, View matching Possible extensions Additional substitutes Back-joins to base tables Union of views Additional view types Self-joins Grouping sets, cube and rollup Outer joins Union views 11/9/2018 Paul Larson, View matching