Lecture 9 Query Optimization.

Slides:



Advertisements
Similar presentations
Recap: Mining association rules from large datasets
Advertisements

Database Management Systems, R. Ramakrishnan and J. Gehrke1 Evaluation of Relational Operations Chapter 12, Part A.
Salvatore Ruggieri SIGKDD2010 Frequent Regular Itemset Mining 2010/9/2 1.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Selinger Optimizer Lecture 10 October 15, 2009 Sam Madden.
6.830 Lecture 9 10/1/2014 Join Algorithms. Database Internals Outline Front End Admission Control Connection Management (sql) Parser (parse tree) Rewriter.
Lecture 8 Join Algorithms. Intro Until now, we have used nested loops for joining data – This is slow, n^2 comparisons How can we do better? – Sorting.
CS 540 Database Management Systems
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
1 Lecture 23: Query Execution Friday, March 4, 2005.
6.830 Lecture 10 Query Optimization 10/6/2014. Selinger Optimizer Algorithm algorithm: compute optimal way to generate every sub-join: size 1, size 2,...
Lecture 10 Query Optimization II Automatic Database Design.
6.830 Lecture 11 Query Optimization & Automatic Database Design 10/8/2014.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
1 Implementation of Relational Operations Module 5, Lecture 1.
1 Chapter 10 Query Processing: The Basics. 2 External Sorting Sorting is used in implementing many relational operations Problem: –Relations are typically.
1  Simple Nested Loops Join:  Block Nested Loops Join  Index Nested Loops Join  Sort Merge Join  Hash Join  Hybrid Hash Join Evaluation of Relational.
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #9.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Introduction to Database Systems 1 Join Algorithms Query Processing: Lecture 1.
CPSC-608 Database Systems Fall 2011 Instructor: Jianer Chen Office: HRBB 315C Phone: Notes #8.
1 Query Processing: The Basics Chapter Topics How does DBMS compute the result of a SQL queries? The most often executed operations: –Sort –Projection,
Query Processing & Optimization
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
Evaluation of Relational Operations. Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation.
Access Path Selection in a Relation Database Management System (summarized in section 2)
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
1 Implementation of Relational Operations: Joins.
Lecture 11 Main Memory Databases Midterm Review. Time breakdown for Shore DBMS Source: “OLTP Under the Looking Glass”, SIGMOD 2008 Systematically removed.
Access Path Selection in a Relational Database Management System Selinger et al.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
A small country has 4 provinces – A, B, C and D. Each province contains 30%, 20%, 10% and 40% of the population in the country, respectively. The.
CPS216: Advanced Database Systems Notes 07:Query Execution Shivnath Babu.
Status “Lifetime of a Query” –Query Rewrite –Query Optimization –Query Execution Optimization –Use cost-estimation to iterate over all possible plans,
Copyright © Curt Hill Query Evaluation Translating a query into action.
1 Fast Computation of Sparse Datacubes Vicky :: Cao Hui Ping Sherman :: Chow Sze Ming CTH :: Chong Tsz Ho Ronald :: Woo Lok Yan Ken :: Yiu Man Lung.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 13 – Query Evaluation.
Relational Operator Evaluation. Overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g.,
1 Ch20. Dynamic Programming. 2 BIRD’S-EYE VIEW Dynamic programming The most difficult one of the five design methods Has its foundation in the principle.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations – Join Chapter 14 Ramakrishnan and Gehrke (Section 14.4)
Lecture 17: Query Execution Tuesday, February 28, 2001.
David Konopnicki –1997, Rev. MS Optimizing Join Statements To choose an execution plan for a join statement, the optimizer must choose: ä Access.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Implementation of Database Systems, Jarek Gryz1 Evaluation of Relational Operations Chapter 12, Part A.
March 11, 2005 Recursion (Implementation) Satish Dethe
1 Choosing an Order for Joins. 2 What is the best way to join n relations? SELECT … FROM A, B, C, D WHERE A.x = B.y AND C.z = D.z Hash-Join Sort-JoinIndex-Join.
Alon Levy 1 Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation. – Projection ( ) Deletes.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Evaluation of Relational Operations Chapter 14, Part A (Joins)
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Chapter 10 The Basics of Query Processing. Copyright © 2005 Pearson Addison-Wesley. All rights reserved External Sorting Sorting is used in implementing.
CS4432: Database Systems II Query Processing- Part 1 1.
Database Applications (15-415) DBMS Internals- Part VII Lecture 16, October 25, 2016 Mohammad Hammoud.
Data Mining and Its Applications to Image Processing
Frequent Pattern Mining
Data Engineering Query Optimization (Cost-based optimization)
Lecture 16: Relational Operators
Evaluation of Relational Operations
Database Management Systems (CS 564)
Evaluation of Relational Operations: Other Operations
Query Optimization Overview
Database Query Execution
External Joins Query Optimization 10/4/2017
Binomial Expansion 2 L.O. All pupils can follow where the binomial expansion comes from All pupils understand binomial notation All pupils can use the.
Design and Analysis of Multi-Factored Experiments
Overview of Query Evaluation
External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot.
Evaluation of Relational Operations: Other Techniques
Presentation transcript:

Lecture 9 Query Optimization

Selinger Optimizer Algorithm algorithm: compute optimal way to generate every sub-join: size 1, size 2, ... n (in that order) e.g. {A}, {B}, {C}, {AB}, {AC}, {BC}, {ABC} R set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a Precomputed in previous iteration!

Join Costs Notation: P partitions / passes over data; assuming hash is O(1) Sort-Merge Simple Hash Grace Hash I/O: 3 (|R| + |S|) CPU: O(P x {S}/P log {S}/P) I/O: P (|R| + |S|) CPU: O({R} + {S}) I/O: 3 (|R| + |S|) Grace hash is generally a safe bet, unless memory is close to size of tables, in which case simple can be preferable Extra cost of sorting makes sort merge unattractive unless there is a way to access tables in sorted order (e.g., a clustered index), or a need to output data in sorted order (e.g., for a subsequent ORDER BY)

Query Cost Model Selectivity estimates Data path Single relation Multi-relation Data path Index Sequential scan

Study Break: Cost Estimation For the query: SELECT * FROM A,B WHERE A.v > 5 and B.v < 3 AND A.x = B.x; Each table has 1000 tuples and a range of values from [1,10] Draw up a query execution plan Estimate the cardinality of each step How many tuples are in its output?

Join Ordering, as code R set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optcosts = ∞ optjoinS = ø for a in S: //a is a relation csa = optcosts-a + min. cost to join (S-a) to a + min. access cost for a if csa < optcosts optcosts = csa optjoins = optjoin(S-a) joined optimally w/ a Pre-computed in previous iteration!

Example 4 Relations: ABCD (only consider NL join) Optjoin: A = best way to access A (e.g., sequential scan, or predicate pushdown into index...) B = " " " " B C = " " " " C D = " " " " D {A,B} = AB or BA {A,C} = AC or CA {B,C} = BC or CB {A,D} {B,D} {C,D} R set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a Optjoin

Example (con’t) Optjoin R set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a Optjoin Optjoin {A,B,C} = remove A: compare A({B,C}) to ({B,C})A remove B: compare ({A,C})B to B({A,C}) remove C: compare C({A,B}) to ({A,B})C {A,C,D} = … {A,B,D} = … {B,C,D} = … … {A,B,C,D} = remove A: compare A({B,C,D}) to ({B,C,D})A remove B: compare B({A,C,D}) to ({A,C,D})B remove C: compare C({A,B,D}) to ({A,B,D})C remove D: compare D({A,C,C}) to ({A,B,C})D ABC != BCA because of interesting orders like sorts or group bys before the join

Study Break: Join Ordering For the query: SELECT * FROM A, B, C WHERE A.v = B.v and B.w = C.w; All tables have 1,000 tuples and A has 1,000 unique values, B has 100, C has 500 Enumerate all possible plans What are the costs of each possible pairings? What is the min cost plan?

Complexity Number of subsets of set of size n = |power set of n| = 2n (here, n is number of relations) How much work per subset? Have to iterate through each element of each subset, so this at most n n2n complexity (vs n!) n=12  49K vs 479M R set of relations to join For i in {1...|R|}: for S in {all length i subsets of R}: optjoin(S) = a join (S-a), where a is the relation that minimizes: cost(optjoin(S-a)) + min. cost to join (S-a) to a + min. access cost for a Optjoin (string of length n, 0 if element is in, 1 if it is out; clearly, 2^n such strings)