Lecture 24: Query Execution and Optimization


Lecture 24: Query Execution and Optimization Monday, November 24, 2003

Outline
Finish query execution (really)
Query optimization: algebraic laws (16.2)
Cost-based optimization (16.5, 16.6)

Example
Product(pname, maker), Company(cname, city)
How do we execute this query?
SELECT Product.pname
FROM Product, Company
WHERE Product.maker = Company.cname AND Company.city = 'Seattle'

Example
Product(pname, maker), Company(cname, city)
Assume:
Clustered index: Product.pname, Company.cname
Unclustered index: Product.maker, Company.city

Logical Plan:
Join Product(pname, maker) with σ city=“Seattle” (Company(cname, city)) on maker=cname

Physical plan 1:
Index-based selection σ city=“Seattle” on Company(cname, city)
Index-based join on cname=maker with Product(pname, maker)

Physical plans 2a and 2b:
Product(pname, maker): scan and sort (2a), or index scan (2b)
Company(cname, city): index scan
Merge-join on maker=cname, with selection σ city=“Seattle”
Which one is better?

Physical plan 1, with costs:
Index-based selection σ city=“Seattle” on Company(cname, city): T(Company) / V(Company, city)
Index-based join on cname=maker with Product(pname, maker): × T(Product) / V(Product, maker)
Total cost: T(Company) / V(Company, city) × T(Product) / V(Product, maker)

Physical plans 2a and 2b, with costs:
Product(pname, maker): scan and sort (2a), cost 3B(Product); or index scan (2b), cost T(Product)
Company(cname, city): table scan, cost B(Company)
Merge-join on maker=cname and selection σ city=“Seattle”: no extra cost (why?)
Total cost:
(2a): 3B(Product) + B(Company)
(2b): T(Product) + B(Company)

Which one is better?
Plan 1: T(Company) / V(Company, city) × T(Product) / V(Product, maker)
Plan 2a: B(Company) + 3B(Product)
Plan 2b: B(Company) + T(Product)
It depends on the data!

Example
T(Company) = 5,000, B(Company) = 500, M = 100
T(Product) = 100,000, B(Product) = 1,000
We may assume V(Product, maker) ≈ T(Company) (why?)
Case 1: V(Company, city) ≈ T(Company), with V(Company, city) = 2,000
Case 2: V(Company, city) << T(Company), with V(Company, city) = 20

Which Plan is Best?
Evaluate each plan for Case 1 and for Case 2:
Plan 1: T(Company) / V(Company, city) × T(Product) / V(Product, maker)
Plan 2a: B(Company) + 3B(Product)
Plan 2b: B(Company) + T(Product)
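The three cost formulas can be evaluated directly from the statistics on the example slide. The sketch below does that arithmetic; it takes the slide's assumption V(Product, maker) ≈ T(Company) as an exact equality, and the function name `plan_costs` is purely illustrative.

```python
# Statistics from the example slide.
T_company, B_company = 5_000, 500
T_product, B_product = 100_000, 1_000
# Assumption (from the slide, treated here as exact): V(Product, maker) = T(Company).
V_product_maker = T_company

def plan_costs(v_company_city):
    """Estimated cost of each physical plan for a given V(Company, city)."""
    return {
        "plan 1": (T_company / v_company_city) * (T_product / V_product_maker),
        "plan 2a": B_company + 3 * B_product,
        "plan 2b": B_company + T_product,
    }

for label, v in [("case 1", 2_000), ("case 2", 20)]:
    costs = plan_costs(v)
    cheapest = min(costs, key=costs.get)
    print(label, costs, "cheapest:", cheapest)
```

Under these numbers Plan 1 costs 50 in Case 1 and 5,000 in Case 2, while Plans 2a and 2b cost 3,500 and 100,500 in both cases. So Plan 1 is cheapest in Case 1 and Plan 2a is cheapest in Case 2.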

Lessons
We need to consider several physical plans, even for one simple logical plan.
There is no magic “best” plan: it depends on the data.
To make the right choice, we need statistics over the data: the B’s, the T’s, the V’s.

The three components of an optimizer
We need three things in an optimizer:
Algebraic laws
An optimization algorithm
A cost estimator

Algebraic Laws
Commutative and associative laws:
R ∪ S = S ∪ R, R ∪ (S ∪ T) = (R ∪ S) ∪ T
R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T
R ⋈ S = S ⋈ R, R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T
Distributive laws:
R ⋈ (S ∪ T) = (R ⋈ S) ∪ (R ⋈ T)

Algebraic Laws
Laws involving selection:
σ C AND C’ (R) = σ C (σ C’ (R)) = σ C (R) ∩ σ C’ (R)
σ C OR C’ (R) = σ C (R) ∪ σ C’ (R)
σ C (R ⋈ S) = σ C (R) ⋈ S, when C involves only attributes of R
σ C (R – S) = σ C (R) – S
σ C (R ∪ S) = σ C (R) ∪ σ C (S)
σ C (R ∩ S) = σ C (R) ∩ S

Algebraic Laws
Example: R(A, B, C, D), S(E, F, G)
σ F=3 (R ⋈ D=E S) = ?
σ A=5 AND G=9 (R ⋈ D=E S) = ?
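One way to answer both questions is to push each condition to the relation that owns its attributes: F and G belong to S, and A belongs to R. The sketch below checks the resulting rewrites on small made-up instances of R and S. The rows and the helper `join` are illustrative assumptions, and a check on one instance illustrates the law rather than proving it.

```python
from itertools import product

# Toy instances of R(A, B, C, D) and S(E, F, G); the rows are made up.
R = [(5, 1, 1, 10), (5, 2, 2, 20), (7, 3, 3, 10)]
S = [(10, 3, 9), (10, 4, 9), (20, 3, 8)]

def join(r_rows, s_rows):
    """R join S on D = E, returned as concatenated tuples (A, B, C, D, E, F, G)."""
    return [r + s for r, s in product(r_rows, s_rows) if r[3] == s[0]]

# Selection on F=3 applied after the join, versus pushed down to S.
late_1 = [t for t in join(R, S) if t[5] == 3]
early_1 = join(R, [s for s in S if s[1] == 3])

# Selection on A=5 AND G=9 applied after the join,
# versus split into A=5 on R and G=9 on S.
late_2 = [t for t in join(R, S) if t[0] == 5 and t[6] == 9]
early_2 = join([r for r in R if r[0] == 5], [s for s in S if s[2] == 9])

print(sorted(late_1) == sorted(early_1), sorted(late_2) == sorted(early_2))
```

In both cases the pushed-down form joins fewer tuples, which is the point of the rewrite.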

Algebraic Laws
Laws involving projections:
Π M (R ⋈ S) = Π N (Π P (R) ⋈ Π Q (S)), where N, P, Q are appropriate subsets of attributes of M
Π M (Π N (R)) = Π M (R), when M ⊆ N
Example: R(A, B, C, D), S(E, F, G)
Π A,B,G (R ⋈ D=E S) = Π ? (Π ? (R) ⋈ D=E Π ? (S))
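One possible way to fill in the question marks is shown below, as a worked instance rather than the only valid answer. Each inner projection keeps the attributes needed in the output plus the join attribute on its side (D for R, E for S), and the outer projection then drops the join attributes.

```latex
\Pi_{A,B,G}\bigl(R \bowtie_{D=E} S\bigr)
  \;=\;
\Pi_{A,B,G}\bigl(\Pi_{A,B,D}(R) \bowtie_{D=E} \Pi_{E,G}(S)\bigr)
```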

Algebraic Laws
Laws involving grouping and aggregation:
δ(γ A, agg(B) (R)) = γ A, agg(B) (R)
γ A, agg(B) (δ(R)) = γ A, agg(B) (R), if agg is “duplicate insensitive”
Which of the following are “duplicate insensitive”? sum, count, avg, min, max
γ A, agg(D) (R(A,B) ⋈ B=C S(C,D)) = γ A, agg(D) (R(A,B) ⋈ B=C (γ C, agg(D) S(C,D)))
Why is this true? Why would we do it?
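A quick experiment suggests the answer to the “duplicate insensitive” question. The sketch applies each aggregate to a made-up bag of values and to the same values with duplicates removed. A single example can show that an aggregate is sensitive to duplicates, and it only illustrates, not proves, that min and max are insensitive.

```python
from statistics import mean

bag = [10, 10, 20, 30, 30, 30]   # made-up values of B in one group, with duplicates
deduped = list(set(bag))          # the values that duplicate elimination would keep

aggregates = {"sum": sum, "count": len, "avg": mean, "min": min, "max": max}

# An aggregate is duplicate insensitive only if removing duplicates never changes it.
insensitive = {name: f(bag) == f(deduped) for name, f in aggregates.items()}
print(insensitive)
```

Here sum, count, and avg all change once duplicates are removed, while min and max do not.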

Heuristic-Based Optimizations
Query rewriting based on algebraic laws
Results in better queries most of the time
Heuristic number 1: Push selections down
Heuristic number 2: Sometimes push selections up, then down

Predicate Pushdown
Before: Π pname (σ price>100 AND city=“Seattle” (Product ⋈ maker=name Company))
After: Π pname (σ price>100 (Product) ⋈ maker=name σ city=“Seattle” (Company))
The earlier we process selections, the fewer tuples we need to manipulate higher up in the tree (but this may cause us to lose an important ordering of the tuples, if we use indexes).

Predicate Pushdown
For each company, find the maximal price of its products.
Original query:
SELECT y.name, MAX(x.price)
FROM product x, company y
WHERE x.maker = y.name
GROUP BY y.name
HAVING MAX(x.price) > 100
Rewritten query:
SELECT y.name, MAX(x.price)
FROM product x, company y
WHERE x.maker = y.name AND x.price > 100
GROUP BY y.name
HAVING MAX(x.price) > 100
Advantage: the size of the join will be smaller.
Requires transformation rules specific to the grouping/aggregation operators.
Won’t work if we replace Max by Min.
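The claim that the rewrite is safe for Max but not for Min can be checked on a tiny database. The sketch below uses Python's built-in `sqlite3` module with made-up rows; the helper `run` and the sample data are illustrative assumptions, not part of the lecture.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE company(name TEXT);
    CREATE TABLE product(pname TEXT, maker TEXT, price REAL);
    INSERT INTO company VALUES ('Acme'), ('Bolt');
    INSERT INTO product VALUES
        ('a1', 'Acme', 50), ('a2', 'Acme', 150),
        ('b1', 'Bolt', 20), ('b2', 'Bolt', 80);
""")

def run(agg, pushed):
    """Run the grouping query with aggregate `agg`; optionally push price > 100 into WHERE."""
    extra = "AND x.price > 100" if pushed else ""
    sql = f"""
        SELECT y.name, {agg}(x.price)
        FROM product x, company y
        WHERE x.maker = y.name {extra}
        GROUP BY y.name
        HAVING {agg}(x.price) > 100
        ORDER BY y.name
    """
    return con.execute(sql).fetchall()

print("MAX:", run("MAX", False), run("MAX", True))
print("MIN:", run("MIN", False), run("MIN", True))
```

With Max, both versions return only Acme with price 150. With Min, the original query returns nothing, because Acme's cheapest product costs 50, while the pushed-down version wrongly reports Acme: the filter removed the cheap product before the minimum was computed.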

Cost-based Optimizations
Main idea: apply algebraic laws until the estimated cost is minimal
Practically: start from partial plans, introduce operators one by one (will see in a few slides)
Problem: there are too many ways to apply the laws, hence too many (partial) plans

Cost-based Optimizations
Approaches:
Top-down: the partial plan is a top fragment of the logical plan
Bottom-up: the partial plan is a bottom fragment of the logical plan

Search Strategies
Branch-and-bound:
Remember the cheapest complete plan P seen so far and its cost C
Stop generating partial plans whose cost is > C
If a cheaper complete plan is found, replace P, C
Hill climbing:
Remember only the cheapest partial plan seen so far
Dynamic programming:
Remember all the cheapest partial plans
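The branch-and-bound strategy can be sketched on a toy join-ordering problem. Everything about the cost model here is an assumption made for illustration: the cardinalities are invented, every join keeps 1 in 100 of the cross product, only left-deep orders are explored, and the cost of a plan is the sum of its intermediate result sizes.

```python
# Hypothetical cardinalities; every join keeps 1 in SEL_DENOM of the cross product.
CARD = {"R1": 1000, "R2": 10, "R3": 200, "R4": 50}
SEL_DENOM = 100  # integer arithmetic keeps the toy numbers exact

def search(prefix, size, cost, best):
    """Extend a left-deep partial plan, pruning it once it costs more than C."""
    if cost > best["cost"]:              # bound: already worse than the best complete plan
        best["pruned"] += 1
        return
    if len(prefix) == len(CARD):         # complete plan: replace P, C only if cheaper
        if cost < best["cost"]:
            best["cost"], best["plan"] = cost, prefix
        return
    for rel in CARD:
        if rel not in prefix:
            new_size = size * CARD[rel] // SEL_DENOM
            search(prefix + (rel,), new_size, cost + new_size, best)

best = {"cost": float("inf"), "plan": None, "pruned": 0}
for first in CARD:
    search((first,), CARD[first], 0, best)
print(best)
```

Pruning is safe in this model because adding a join can only increase the accumulated cost, so a partial plan that already exceeds C cannot lead to a cheaper complete plan.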

Dynamic Programming
Unit of optimization: select-project-join
Push selections down, pull projections up

Join Trees
R1 ⋈ R2 ⋈ … ⋈ Rn
Join tree:
A plan = a join tree
A partial plan = a subtree of a join tree
(Figure: a join tree with leaves R3, R1, R2, R4)

Types of Join Trees
Left deep:
(Figure: a left-deep join tree with leaves R4, R2, R5, R3, R1)

Types of Join Trees
Bushy:
(Figure: a bushy join tree with leaves R3, R2, R4, R1, R5)

Types of Join Trees
Right deep:
(Figure: a right-deep join tree with leaves R3, R1, R5, R2, R4)
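To get a feel for how many join trees of each type exist, the sketch below counts them for n relations. These are standard combinatorial counts rather than figures from the slides. A left-deep (or right-deep) tree is fixed by the order of its leaves, giving n! trees. An arbitrary tree, bushy shapes included, is a binary tree shape with n leaves (a Catalan number of them) combined with one of the n! leaf orders. The counts treat R ⋈ S and S ⋈ R as different trees.

```python
from math import comb, factorial

def left_deep_trees(n):
    """Left-deep join trees over n relations: one per ordering of the leaves."""
    return factorial(n)

def all_join_trees(n):
    """All binary join trees over n relations, bushy ones included."""
    shapes = comb(2 * (n - 1), n - 1) // n   # Catalan number: tree shapes with n leaves
    return shapes * factorial(n)

for n in range(2, 8):
    print(n, left_deep_trees(n), all_join_trees(n))
```

For the five relations in the figures this gives 120 left-deep trees and 1,680 trees overall, and both counts grow quickly with n.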

Problem
Given: a query R1 ⋈ R2 ⋈ … ⋈ Rn
Assume we have a function cost() that gives us the cost of every join tree
Find the best join tree for the query
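A minimal sketch of how dynamic programming could attack this problem: for every subset of relations, remember the cheapest join tree found for it, and build larger subsets from smaller ones. The cost function is a stand-in chosen for illustration (the sum of the sizes of all join results in the tree), and the cardinalities and the 1-in-100 selectivity are invented. This is not the specific algorithm from the lecture.

```python
from itertools import combinations
from math import prod

# Hypothetical statistics: cardinalities, and every join keeps 1 in SEL_DENOM tuples.
CARD = {"R1": 1000, "R2": 10, "R3": 200, "R4": 50}
SEL_DENOM = 100

def size(rels):
    """Estimated size of the join of a set of relations under the toy model."""
    return prod(CARD[r] for r in rels) // SEL_DENOM ** (len(rels) - 1)

def best_join_tree():
    """Return (cost, tree) for the cheapest join tree, allowing bushy trees."""
    names = sorted(CARD)
    # best[subset] = (cost, tree); a single relation costs nothing to "join".
    best = {frozenset([r]): (0, r) for r in names}
    for k in range(2, len(names) + 1):
        for subset in combinations(names, k):
            whole = frozenset(subset)
            for i in range(1, k):                     # try every split into two sides
                for left in combinations(subset, i):
                    l = frozenset(left)
                    r = whole - l
                    cost = best[l][0] + best[r][0] + size(whole)
                    if whole not in best or cost < best[whole][0]:
                        best[whole] = (cost, (best[l][1], best[r][1]))
    return best[frozenset(names)]

cost, tree = best_join_tree()
print(cost, tree)
```

The table has one entry per subset, so the work grows exponentially in n, but far more slowly than enumerating every join tree separately.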