Lecture 10: Query Optimization II, Automatic Database Design

Recap: Query Planning
Use an analytical cost model to estimate the time needed for a query execution plan tree.
Selectivity (fraction of tuples returned from input):
– col = value: 1/ICARD(col), i.e., 1/n where n is the number of unique values of col; 1/10 if no index
– col > value: (max - value) / (max - min), or 1/3 otherwise
– col1 = col2: 1/max(ICARD(col1), ICARD(col2)), or 1/10 otherwise
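The selectivity rules above can be sketched as three small functions. This is a minimal sketch, not the System R code: passing `None` for an unknown statistic, clamping out-of-range constants to [0, 1], and falling back to the one known ICARD when only one join column has an index are assumptions of this example.

```python
from typing import Optional

def sel_eq_const(icard: Optional[int]) -> float:
    """col = value: 1/ICARD(col) when the distinct count is known, else 1/10."""
    return 1.0 / icard if icard else 0.1

def sel_gt_const(value: float, lo: Optional[float], hi: Optional[float]) -> float:
    """col > value: (max - value) / (max - min) when min/max are known, else 1/3."""
    if lo is None or hi is None or hi <= lo:
        return 1.0 / 3
    frac = (hi - value) / (hi - lo)
    # Assumption: clamp so constants outside [min, max] give 0 or 1, not <0 or >1.
    return min(1.0, max(0.0, frac))

def sel_eq_cols(icard1: Optional[int], icard2: Optional[int]) -> float:
    """col1 = col2: 1/max(ICARD(col1), ICARD(col2)); uses whichever ICARD is known, else 1/10."""
    known = [c for c in (icard1, icard2) if c]
    return 1.0 / max(known) if known else 0.1
```

The estimated output size is then the input cardinality times the selectivity: for a 10,000-row table whose column has 50 distinct values, `col = value` is estimated to return 10,000 × 1/50 = 200 rows.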

Selinger Heuristics
– Push down all filters and projections
– Skip cross-joins (cartesian products)
– Left-deep plans only
Combined with dynamic programming, this gets optimization time from O(n!) down to O(n·2^n).
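The "skip cross-joins" heuristic amounts to rejecting any left-deep order in which the next relation has no join predicate to the relations already joined. A minimal sketch, assuming join predicates are given as undirected edges between relation names (the function name and edge representation are hypothetical):

```python
from typing import FrozenSet, Iterable, Sequence, Set

def introduces_cross_join(order: Sequence[str], edges: Iterable[FrozenSet[str]]) -> bool:
    """True if some relation in the left-deep order shares no join predicate with its prefix."""
    edge_set: Set[FrozenSet[str]] = set(edges)
    joined: Set[str] = {order[0]}
    for rel in order[1:]:
        # A cross join happens when rel has no predicate linking it to anything joined so far.
        if not any(frozenset((rel, prev)) in edge_set for prev in joined):
            return True
        joined.add(rel)
    return False
```

With predicates A–B and B–C, the order A, C, B is rejected because C is joined to A with no predicate, while B, C, A is kept.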

Selinger Optimizer Algorithm
Algorithm: compute the optimal way to generate every sub-join of size 1, size 2, ..., n (in that order), e.g. {A}, {B}, {C}, {AB}, {AC}, {BC}, {ABC}
R ← set of relations to join
For i in {1...|R|}:
  for S in {all length-i subsets of R}:
    optjoin(S) = optjoin(S-a) join a, where a is the relation that minimizes:
      cost(optjoin(S-a)) + min. cost to join optjoin(S-a) to a + min. access cost for a
(cost(optjoin(S-a)) was precomputed in the previous iteration!)
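The visiting order matters: processing subsets strictly by size guarantees that every S-a has already been solved when S is reached. A minimal sketch of that enumeration order using `itertools.combinations` (the helper name is hypothetical):

```python
from itertools import combinations
from typing import List, Sequence, Tuple

def subsets_by_size(relations: Sequence[str]) -> List[Tuple[str, ...]]:
    """All non-empty subsets, smallest first: the order in which the DP fills its table."""
    order: List[Tuple[str, ...]] = []
    for size in range(1, len(relations) + 1):
        order.extend(combinations(relations, size))
    return order

print(["".join(s) for s in subsets_by_size("ABC")])
# ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']
```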

Selinger, as code
R ← set of relations to join
optcost_ø = 0  // base case: the empty join costs nothing
for i in {1...|R|}:
  for S in {all length-i subsets of R}:
    optcost_S = ∞
    optjoin_S = ø
    for a in S:  // a is a relation
      c = optcost_{S-a} + min. cost to join optjoin_{S-a} to a + min. access cost for a
      if c < optcost_S:
        optcost_S = c
        optjoin_S = optjoin_{S-a} joined optimally with a
This is the same algorithm as on the previous slide, written differently. (optcost_{S-a} was pre-computed in the previous iteration!)
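A runnable version of the loop above, as a minimal sketch: the cost model is passed in as two caller-supplied functions, `access_cost(a)` and `join_cost(outer, a)`, which are hypothetical stand-ins for "min. access cost for a" and "min. cost to join optjoin(S-a) to a". Only left-deep plans are produced, and no interesting orders are tracked.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

Plan = Tuple[str, ...]  # left-deep join order, outermost relation first

def selinger(relations: Sequence[str],
             access_cost: Callable[[str], float],
             join_cost: Callable[[FrozenSet[str], str], float]) -> Tuple[float, Plan]:
    """Left-deep join-order DP over subsets of `relations`.

    access_cost(a): min. access cost for relation a (hypothetical cost model).
    join_cost(outer, a): min. cost to join the best plan for `outer` to a.
    """
    optcost: Dict[FrozenSet[str], float] = {frozenset(): 0.0}  # base case: empty join
    optjoin: Dict[FrozenSet[str], Plan] = {frozenset(): ()}
    for i in range(1, len(relations) + 1):
        for subset in combinations(relations, i):
            s = frozenset(subset)
            optcost[s] = float("inf")
            for a in subset:  # try each relation as the last (innermost) one
                rest = s - {a}  # solved in the previous size round
                join = join_cost(rest, a) if rest else 0.0
                c = optcost[rest] + join + access_cost(a)
                if c < optcost[s]:
                    optcost[s] = c
                    optjoin[s] = optjoin[rest] + (a,)
    full = frozenset(relations)
    return optcost[full], optjoin[full]
```

The table is keyed by `frozenset`, not by ordered prefix. That is exactly the assumption that makes the DP valid: the cost of joining `a` depends only on which relations are already joined, not on the order in which they were joined.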

Example
4 relations: ABCD (only consider NL joins)
Optjoin:
A = best way to access A (e.g., sequential scan, or predicate pushdown into an index...)
B = best way to access B
C = best way to access C
D = best way to access D
{A,B} = AB (or BA)
{A,C} = AC (or CA)
{B,C} = BC (or CB)
{A,D} …
{B,D}
{C,D}

Example (cont'd)
Optjoin:
{A,B,C} = cheapest of
  remove A: ({B,C})A
  remove B: ({A,C})B
  remove C: ({A,B})C
{A,C,D} = …
{A,B,D} = …
{B,C,D} = …
…
{A,B,C,D} = cheapest of
  remove A: ({B,C,D})A
  remove B: ({A,C,D})B
  remove C: ({A,B,D})C
  remove D: ({A,B,C})D
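The candidates listed for each subset follow mechanically from "remove one relation, reuse the best plan for the rest". A minimal sketch that generates them (the helper name is hypothetical); it prints the four candidates for {A,B,C,D} in the slide's notation:

```python
from typing import FrozenSet, List, Tuple

def candidate_splits(s: FrozenSet[str]) -> List[Tuple[FrozenSet[str], str]]:
    """For subset S, the (S - a, a) pairs the DP compares: best plan for S - a, then join a."""
    return [(s - {a}, a) for a in sorted(s)]

for rest, a in candidate_splits(frozenset("ABCD")):
    print(f"remove {a}: ({{{','.join(sorted(rest))}}}){a}")
# remove A: ({B,C,D})A
# remove B: ({A,C,D})B
# remove C: ({A,B,D})C
# remove D: ({A,B,C})D
```

Each subset of size k has exactly k candidates, which is where the per-subset factor in the complexity bound comes from.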

Complexity
Number of subsets of a set of size n = |power set of n| = 2^n (here, n is the number of relations).
How much work per subset? We have to iterate through each element of each subset, so this is at most n·2^n complexity (vs. n!).
n = 12: 49K vs. 479M
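The n = 12 figures can be checked directly; the two helper names are hypothetical:

```python
import math

def dp_work(n: int) -> int:
    """Upper bound on DP steps: 2^n subsets, at most n candidate last relations each."""
    return n * 2 ** n

def left_deep_orders(n: int) -> int:
    """Number of left-deep join orders an exhaustive search would enumerate."""
    return math.factorial(n)

print(dp_work(12), left_deep_orders(12))  # 49152 479001600
```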

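Interesting Orders
Push down sorts when it is profitable
– Merge joins are often faster than NLJ (nested-loop joins), especially when inputs are already sorted on the join key
Another round of dynamic programming: track the cheapest plan for each subset and each interesting order
For k interesting orders, the complexity is k·n·2^n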
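With interesting orders, the DP table holds one entry per (subset, sort order) instead of one per subset, which is where the extra factor k comes from. A minimal sketch of the per-subset pruning step, assuming each candidate plan is a `(cost, order, description)` tuple with `None` meaning "no useful order" (the tuple layout and plan descriptions are hypothetical):

```python
import math
from typing import Dict, Iterable, Optional, Tuple

# One candidate plan for a fixed subset S: (cost, sort order it produces or None, description)
Candidate = Tuple[float, Optional[str], str]

def best_per_order(cands: Iterable[Candidate]) -> Dict[Optional[str], Candidate]:
    """Keep the cheapest plan per interesting order (None = no useful order) for one subset."""
    best: Dict[Optional[str], Candidate] = {}
    for cand in cands:
        cost, order, _ = cand
        if order not in best or cost < best[order][0]:
            best[order] = cand
    # An unordered plan is dominated if some ordered plan is at least as cheap.
    cheapest_ordered = min((c for c, o, _ in best.values() if o is not None), default=math.inf)
    if None in best and best[None][0] >= cheapest_ordered:
        del best[None]
    return best
```

A more expensive sorted plan survives alongside the cheapest unsorted one, because its output order may let a later merge join skip a sort.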

Study Break – Join Ordering
For the query:
SELECT * FROM A, B, C, D WHERE A.v = B.v AND B.w = C.w AND C.w = D.w;
How many left-deep plans are possible?
How many plans (or subsets of plans) do we evaluate using the optimization algorithm?
Which one(s) can we eliminate as cross products?
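One way to check answers to the study break is to enumerate by brute force. This sketch assumes only the three predicates written in the WHERE clause count as join edges (A–B, B–C, C–D); if the implied predicate B.w = D.w is also used, more orders and subsets become cross-product-free. It ignores access-path and join-method choices, so "plan" here means join order.

```python
from itertools import combinations, permutations

RELS = "ABCD"
# Assumption: only the explicit predicates A.v=B.v, B.w=C.w, C.w=D.w are join edges.
EDGES = {frozenset("AB"), frozenset("BC"), frozenset("CD")}

def connected_prefixes(order) -> bool:
    """True if every relation joins to the prefix via some predicate (no cross product)."""
    joined = {order[0]}
    for rel in order[1:]:
        if not any(frozenset((rel, p)) in EDGES for p in joined):
            return False
        joined.add(rel)
    return True

def is_connected(subset) -> bool:
    """A subset can be joined without cross products iff some order of it has connected prefixes."""
    return any(connected_prefixes(p) for p in permutations(subset))

all_orders = list(permutations(RELS))                            # left-deep orders
no_cross = [o for o in all_orders if connected_prefixes(o)]
subsets = [s for k in range(1, 5) for s in combinations(RELS, k)]  # DP table entries
connected_subsets = [s for s in subsets if is_connected(s)]
print(len(all_orders), len(no_cross), len(subsets), len(connected_subsets))
# 24 8 15 10
```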

Automatic DB Design
Key idea: optimize data layout for performance
– Make a well-known set of queries execute fast
– Use cost models to estimate the utility of different designs
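A common way to combine cost models with a search heuristic is greedy selection: repeatedly add the candidate index or view that most lowers the estimated cost of the workload, until nothing helps or storage runs out. This is a minimal sketch of that one heuristic, not necessarily the method used in the lecture; `whatif_cost(q, cfg)` is a hypothetical hook standing in for an optimizer that can cost query `q` as if configuration `cfg` existed, and sizes and budget are in arbitrary units.

```python
from typing import Callable, Dict, FrozenSet, Optional, Sequence

Config = FrozenSet[str]  # names of the indexes / materialized views that exist

def greedy_design(workload: Sequence[str],
                  candidates: Dict[str, int],
                  budget: int,
                  whatif_cost: Callable[[str, Config], float]) -> Config:
    """Greedy physical design: add the structure that most lowers total estimated cost.

    candidates maps a structure name to its storage size; whatif_cost(q, cfg) is a
    hypothetical hook returning the estimated cost of query q under configuration cfg.
    """
    def total(cfg: Config) -> float:
        return sum(whatif_cost(q, cfg) for q in workload)

    chosen: Config = frozenset()
    used, current = 0, total(chosen)
    while True:
        best_name: Optional[str] = None
        best_cost = current
        for name, size in candidates.items():
            if name in chosen or used + size > budget:
                continue  # already picked, or would exceed the storage budget
            c = total(chosen | {name})
            if c < best_cost:
                best_name, best_cost = name, c
        if best_name is None:  # no remaining structure both fits and helps
            return chosen
        chosen = chosen | {best_name}
        used += candidates[best_name]
        current = best_cost
```

Greedy search is cheap (one what-if call per query per candidate per round) but can miss combinations whose benefit only appears when two structures are chosen together.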

Materialized Views
sales: (saleid, date, time, register, product, price, ...)
  CREATE MATERIALIZED VIEW sales_by_date AS
  SELECT date, product, sum(price), count(*) AS quantity
  FROM sales
  GROUP BY date, product
Key properties:
– Kept up to date as data is added
– Selected for use automatically by the optimizer when appropriate
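Both key properties can be illustrated with an in-memory stand-in for `sales_by_date`. This is a minimal sketch, not how any particular DBMS implements view maintenance: the class and method names are hypothetical, and only inserts are handled (deletes and updates would need to decrement the affected group).

```python
from collections import defaultdict
from typing import Dict, Tuple

class SalesByDate:
    """In-memory stand-in for sales_by_date: (date, product) -> [sum(price), count(*)]."""

    def __init__(self) -> None:
        self.groups: Dict[Tuple[str, str], list] = defaultdict(lambda: [0.0, 0])

    def on_insert(self, date: str, product: str, price: float) -> None:
        # Kept up to date as data is added: only the affected group is touched.
        g = self.groups[(date, product)]
        g[0] += price
        g[1] += 1

    def revenue_on(self, date: str) -> float:
        # A query like SELECT sum(price) FROM sales WHERE date = ? can be answered
        # by summing the pre-aggregated groups instead of scanning sales.
        return sum(s for (d, _), (s, _) in self.groups.items() if d == date)
```

SUM and COUNT are easy to maintain incrementally because each insert changes them by a known amount; aggregates such as MAX are harder once deletes are allowed.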

Conclusions
Use dynamic programming to efficiently enumerate the costs of different query plans
– Start with one table and add more
Physical DB design is complicated!
– Picking the right indexes and materialized views
– Combinations of heuristics and what-if cost modeling are needed
– Designs may need to adapt to changing workloads