

Query Optimization R&G, Chapter 15 Lecture 16

Administrivia
Homework 3 available today
– Written exercise; will be posted on class website
– Due date: Tuesday, March 20 by end of class period
Homework 4 available later this week
– Implement nested loops and hash join operators for minibase
– Due date: April 5 (after Spring Break)
Midterm 2 is 3/22, 2 weeks from today
– In class, covers lectures
– Review will be held Tuesday 3/ pm 306 Soda Hall

Review
[Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Markers: "Now you are here", "You were here".]
Operators are the building blocks for computing results of queries
– Sort, Project, Join, Filter, access methods for files, ...
Query plans are a tree of operators that compute the result of a query
Optimization is the process of picking the best plan
Execution is the process of executing the plan

Query Plans: turning text into tuples
Query:
  SELECT A.aname, max(F.feedingshift)
  FROM Animals A, Feeding F
  WHERE A.aid = F.aid AND (A.species = 'Big Cat' or A.species = 'Bear')
  GROUP BY A.aname
  HAVING COUNT(*) > 1
Query Plan: operators applied to stored tuples such as
  …
  40100 Aslan Big Cat
  50300 Baloo Bear
  60100 Bageera Big Cat
  Shere Khan Big Cat
  90100 Dumbo Elephant
  …
Query Result:
  Name        Shift
  Aslan       3
  Bageera     3
  Elsa        3
  Shere Khan  3
  Tigger      2

Operator Review
Access Path: pulls tuples from tables
– File scans
– Index scans (clustered or unclustered)
– Index-only scans
Select (or Filter): conditionally excludes tuples
– Can be `pushed'/combined with Access Path operator
  Use indexes where possible and apply other predicates on the result
– Can also be applied at intermediate point in query plan
Projection: removes columns and duplicates
– Column projection often done by operators
– Duplicate elimination via Sort or Hash

Operator Review
Sort: sorts tuples in a particular order
– Simple merge sort
– General external merge sort (with various optimizations)
– B+ tree traversal
Join: combines tuples from 2 other operators
– Page nested loops
– Block nested loops
– Index nested loops
– Sort-merge join
– Hash join
Other operators for
– Group By, Temping, …

Query Optimization steps
1. Parse query from text to `intermediate model'
2. Traverse `intermediate model' and produce alternate query plans
   – Query plan = tree of relational operators
   – Optimizer keeps track of cost and properties of plans
3. Pick the cheapest plan
4. Pass cheapest plan on to query execution engine to execute and produce results of query
[Figure: the query text below goes through the query parser, which produces query blocks (Block 1, Block 2, Block 3); the query optimizer compares alternate plans labeled Cost = 200, Cost = 150, and Cost = 500 and passes one on to the execution engine.]
  SELECT A.aname, max(F.feedingshift)
  FROM Animals A, Feeding F
  WHERE A.aid = F.aid AND (A.species = 'Big Cat' or A.species = 'Bear')
  GROUP BY A.aname
  HAVING COUNT(*) > 1
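Step 3 above can be sketched in a few lines. This is only an illustration, assuming each candidate plan already carries an estimated cost; the `CandidatePlan` class, its field names, and the plan descriptions are hypothetical, and the three costs are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidatePlan:
    """Hypothetical stand-in for a plan the optimizer has costed."""
    description: str
    estimated_cost: int


def pick_cheapest(plans):
    """Return the candidate with the lowest estimated cost."""
    return min(plans, key=lambda plan: plan.estimated_cost)


# Costs taken from the slide's figure; the descriptions are placeholders.
candidates = [
    CandidatePlan("plan A", 200),
    CandidatePlan("plan B", 150),
    CandidatePlan("plan C", 500),
]

cheapest = pick_cheapest(candidates)
print(cheapest.description, cheapest.estimated_cost)
```

A real optimizer also tracks plan properties (for example, sort order) alongside cost, as step 2 notes, so a cheaper plan does not always replace a more expensive one with a useful property.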

Query Blocks: Units of Optimization
Intermediate model is a set of query blocks
– 1 per SELECT/FROM/WHERE/GROUP BY/HAVING clause
A query with a single query block:
  SELECT A.aname, max(F.feedingshift)
  FROM Animals A, Feeding F
  WHERE A.aid = F.aid AND (A.species = 'Big Cat' or A.species = 'Bear')
  GROUP BY A.aname
  HAVING COUNT(*) > 1
Subqueries produce nested query blocks
– treated as calls to a subroutine, made once per tuple produced by outer query block
– sometimes subqueries can be rewritten to produce cheaper plan
A query with an outer query block and a nested query block:
  SELECT S.sname
  FROM Sailors S
  WHERE S.age IN (
    SELECT MAX (S2.age)
    FROM Sailors S2
    GROUP BY S2.rating )
[Figure labels: Query Block; Outer Query Block; Nested Query Block; Rewritten Query Block X]

Query blocks are optimized 1 at a time
1. Convert block to relational algebra tree
2. Traverse tree and build plan bottom up:
   Pick best access method for each relation in FROM clause
     Applying predicates if possible
   Consider all join trees
     All ways to join relations in FROM clause 1-at-a-time
     Consider multiple permutations and join methods
     – But not all! Too many choices
     – Restrict to left-deep plans
     – Prune bad plans along the way
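To make the size of the left-deep plan space concrete, here is a rough sketch that enumerates every left-deep join order for a list of relations. It ignores join methods, costs, and pruning, so it is not the optimizer's actual search; the function name and the third relation `Boats` are assumptions added for illustration.

```python
from functools import reduce
from itertools import permutations


def left_deep_join_orders(relations):
    """Yield each left-deep join tree as nested (left, right) pairs.

    In a left-deep tree the right input of every join is a base relation,
    so each permutation of the relations gives exactly one tree.
    """
    for order in permutations(relations):
        yield reduce(lambda left, right: (left, right), order)


# "Boats" is a hypothetical third relation, added only to show the growth.
orders = list(left_deep_join_orders(["Reserves", "Sailors", "Boats"]))
for tree in orders:
    print(tree)
print(len(orders), "left-deep orders")
```

With n relations this yields n! orders before join methods are even considered, which is why bad plans are pruned along the way.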

Converting Query Blocks to Relational Algebra Trees
SQL is relationally complete; can express everything in relational algebra
Query block:
  SELECT S.sname
  FROM Reserves R, Sailors S
  WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
Relational algebra:
  Π sname (σ bid=100 ∧ rating>5 (Reserves ⋈ sid=sid Sailors))
Tree, root first: Π sname; σ bid=100 ∧ rating>5; ⋈ sid=sid; leaves Reserves and Sailors

SQL extends Relational Algebra
SQL is more powerful than relational algebra
– extend relational algebra to include aggregate ops: GROUP BY, HAVING
How is this query block expressed?
  SELECT S.sname FROM Sailors S WHERE S.age IN (constant set from subquery)
  Π sname (σ (age in set from subquery) Sailors)
And this query block?
  SELECT MAX (S2.age) FROM Sailors S2 GROUP BY S2.rating
  Π Max(age) (GroupBy Rating (Sailors))

Why optimize?
[Figure: relational algebra tree, root first: Π sname; σ bid=100 ∧ rating>5; ⋈ sid=sid; leaves Reserves and Sailors]
Operators have implementation choices
– Index scan? File scan? Nested loop join? Sort merge?
Operators can also be applied in different order!

Motivating Example -- Schema used
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)
As seen in previous lectures…
Reserves:
– Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
– Assume there are 100 boats
Sailors:
– Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
– Assume there are 10 different ratings
Assume we have 5 pages in our buffer pool!

Motivating Example
  SELECT S.sname
  FROM Reserves R, Sailors S
  WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
Plan: Π sname (On-the-fly) over σ bid=100 ∧ rating>5 (On-the-fly) over Sailors ⋈ sid=sid Reserves (Page-Oriented Nested Loops, Sailors as outer)
Cost: 500 + 500*1000 = 500,500 I/Os
Not the worst plan, but…
Misses several opportunities:
– selections could have been `pushed' earlier,
– indexes might have been helpful….
Goal of optimization: To find more efficient plans that compute the same answer.

Selectivity calculation
Sailors: 500 pages, 80 tuples per page, 10 ratings
Selectivity of S.rating > 5?
– ½ -> 500*80/2 = 20,000 tuples
– 20,000/80 = 250 pages
Reserves: 1000 pages, 100 tuples per page, 100 boats
Selectivity of R.bid = 100?
– 1/100 -> 1000*100/100 = 1000 tuples
– 1000/100 = 10 pages
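The two estimates above follow the same recipe: multiply the relation's tuple count by a reduction factor, then divide by tuples per page. The sketch below reproduces the arithmetic, assuming uniformly distributed values; the function and parameter names are made up for illustration.

```python
import math
from fractions import Fraction


def estimate_selection(pages, tuples_per_page, reduction_factor):
    """Return (tuples, pages) expected to survive a selection.

    Assumes values are uniformly distributed, as on the slide.
    """
    tuples = pages * tuples_per_page * reduction_factor
    return int(tuples), math.ceil(tuples / tuples_per_page)


# rating > 5 with 10 ratings: half of the values qualify.
sailors = estimate_selection(500, 80, Fraction(5, 10))
# bid = 100 with 100 boats: one value in a hundred qualifies.
reserves = estimate_selection(1000, 100, Fraction(1, 100))

print("Sailors, rating > 5:", sailors)
print("Reserves, bid = 100:", reserves)
```

`Fraction` is used instead of floats only to keep the page counts exact.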

Alternative Plans – Push Selects (No Indexes)
Before: Π sname (On-the-fly) over σ bid=100 ∧ rating>5 (On-the-fly) over Sailors ⋈ sid=sid Reserves (Page-Oriented Nested Loops)
– 500,500 IOs
After pushing rating > 5 below the join: Π sname (On-the-fly) over σ bid=100 (On-the-fly) over (σ rating>5 Sailors) ⋈ sid=sid Reserves (Page-Oriented Nested Loops)
– 500 + 250*1000 = 250,500 IOs

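Alternative Plans – Push Selects (No Indexes)
Before: Π sname (On-the-fly) over σ bid=100 (On-the-fly) over (σ rating>5 Sailors) ⋈ sid=sid Reserves (Page-Oriented Nested Loops)
– 250,500 IOs
After also pushing bid = 100 below the join: Π sname (On-the-fly) over (σ rating>5 Sailors) ⋈ sid=sid (σ bid=100 Reserves) (Page-Oriented Nested Loops, both selections on-the-fly)
– 500 + 250*1000 = 250,500 IOs; Reserves is still scanned in full for every outer page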

Alternative Plans – Try different join order
Before: Π sname (On-the-fly) over (σ rating>5 Sailors) ⋈ sid=sid (σ bid=100 Reserves) (Page-Oriented Nested Loops, Sailors as outer)
– 250,500 IOs
After the swap: Π sname (On-the-fly) over (σ bid=100 Reserves) ⋈ sid=sid (σ rating>5 Sailors) (Page-Oriented Nested Loops, Reserves as outer)
– 1000 + 10*500 = 6000 IOs

Alternative Plans – Push Selects and precompute result (No Indexes)
Before: Π sname (On-the-fly) over (σ bid=100 Reserves) ⋈ sid=sid (σ rating>5 Sailors) (Page-Oriented Nested Loops, selections on-the-fly)
– 6000 IOs
After: same join order, but σ rating>5 Sailors is precomputed (Scan & Write to temp T2)
– 1000 + 500 + 250 + (10 * 250) = 4250 IOs

Alternative Plans – Try different join order
Before: Reserves as outer (σ bid=100 on-the-fly), σ rating>5 Sailors written to temp T2 (Page-Oriented Nested Loops)
– 4250 IOs
After the swap: Sailors as outer (σ rating>5 on-the-fly), σ bid=100 Reserves written to temp T2 (Page-Oriented Nested Loops)
– 500 + 1000 + 10 + (250 * 10) = 4010 IOs

Optimized query is 124x cheaper than the original!
Original: Π sname (On-the-fly) over σ bid=100 ∧ rating>5 (On-the-fly) over Sailors ⋈ sid=sid Reserves (Page-Oriented Nested Loops)
– 500,500 IOs
Optimized: Π sname (On-the-fly) over (σ rating>5 Sailors, on-the-fly) ⋈ sid=sid (σ bid=100 Reserves, Scan & Write to temp T2) (Page-Oriented Nested Loops)
– 4010 IOs
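The progression from 500,500 IOs down to 4010 IOs can be reproduced with a small calculation. This is a sketch under the slides' assumptions (500 Sailors pages with 250 surviving rating > 5, 1000 Reserves pages with 10 surviving bid = 100); the helper name and plan labels are made up for illustration.

```python
SAILORS = 500            # pages in Sailors
RESERVES = 1000          # pages in Reserves
SAILORS_SELECTED = 250   # pages of Sailors with rating > 5
RESERVES_SELECTED = 10   # pages of Reserves with bid = 100


def page_nested_loops(outer_read, outer_pages, inner_pages):
    """Read the outer once, then scan the whole inner once per outer page."""
    return outer_read + outer_pages * inner_pages


plan_costs = {
    "no pushdown, Sailors outer":
        page_nested_loops(SAILORS, SAILORS, RESERVES),
    "push rating>5, Sailors outer":
        page_nested_loops(SAILORS, SAILORS_SELECTED, RESERVES),
    "push both, Reserves outer":
        page_nested_loops(RESERVES, RESERVES_SELECTED, SAILORS),
    # Materializing a selection costs a scan plus a write of the result;
    # the join then reads the small temp instead of the base relation.
    "Reserves outer, Sailors selection in temp":
        RESERVES + (SAILORS + SAILORS_SELECTED)
        + RESERVES_SELECTED * SAILORS_SELECTED,
    "Sailors outer, Reserves selection in temp":
        SAILORS + (RESERVES + RESERVES_SELECTED)
        + SAILORS_SELECTED * RESERVES_SELECTED,
}

best = min(plan_costs, key=plan_costs.get)
speedup = plan_costs["no pushdown, Sailors outer"] // plan_costs[best]

for name, cost in plan_costs.items():
    print(f"{cost:>7,} IOs  {name}")
print(f"best: {best} ({speedup}x cheaper than the original)")
```

Pushing a selection on the inner alone does not appear as a separate entry because, without materialization, the inner is still scanned in full for every outer page.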

More Alternative Plans (No Indexes)
Main difference: Sort Merge Join
Plan: Π sname (On-the-fly) over (σ bid=100 Reserves; scan, write to temp T1) ⋈ sid=sid (σ rating>5 Sailors; scan, write to temp T2) (Sort-Merge Join)
With 5 buffers, cost of plan:
– Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution) = 1010.
– Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings) = 750.
– Sort T1 (2*2*10) + sort T2 (2*4*250) + merge (10+250) = 2300.
– Total: 4060 page I/Os.
Merge passes after the initial run-forming pass: ceil(log4(ceil(10/5))) = 1 for T1; ceil(log4(ceil(250/5))) = 3 for T2.
If we use BNL join, join = 10+4*250 = 1010 (T1 is read in ceil(10/3) = 4 blocks), total cost = 2770.
Can also `push' projections, but must be careful!
– T1 has only sid, T2 only sid, sname:
– T1 fits in 3 pgs, cost of BNL under 250 pgs, so the total drops further.
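The sort-merge and block-nested-loops totals can be checked with a short calculation. The sketch below assumes the usual external merge sort cost model (one run-forming pass with B buffers, then (B-1)-way merge passes, each pass reading and writing every page) and a BNL block size of B-2 pages; the function names are illustrative.

```python
import math

BUFFERS = 5


def sort_passes(pages, buffers):
    """Passes of external merge sort: one to form runs, then (B-1)-way merges."""
    runs = math.ceil(pages / buffers)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes


def sort_cost(pages, buffers):
    """Each pass reads and writes every page once."""
    return 2 * sort_passes(pages, buffers) * pages


def block_nested_loops(outer_pages, inner_pages, buffers):
    """Read the outer once; scan the inner once per block of B-2 outer pages."""
    blocks = math.ceil(outer_pages / (buffers - 2))
    return outer_pages + blocks * inner_pages


t1_pages, t2_pages = 10, 250        # temps after the pushed selections
make_t1 = 1000 + t1_pages           # scan Reserves, write T1
make_t2 = 500 + t2_pages            # scan Sailors, write T2

sort_merge_join = (sort_cost(t1_pages, BUFFERS)
                   + sort_cost(t2_pages, BUFFERS)
                   + t1_pages + t2_pages)
sort_merge_total = make_t1 + make_t2 + sort_merge_join
bnl_total = make_t1 + make_t2 + block_nested_loops(t1_pages, t2_pages, BUFFERS)

print("sort-merge total:", sort_merge_total)
print("BNL total:", bnl_total)
```

The pass counts are computed with an integer loop rather than a floating-point logarithm to avoid rounding surprises at exact powers of B-1.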

More Alt Plans: Indexes
Plan: Π sname (On-the-fly) over σ rating>5 (On-the-fly) over (σ bid=100 Reserves; use hash index, do not write to temp) ⋈ sid=sid Sailors (Index Nested Loops, with pipelining)
With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.
INL with outer not materialized.
Decision not to push rating>5 before the join is based on availability of sid index on Sailors.
Cost: Selection of Reserves tuples (10 I/Os); then, for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os.
– 10 I/Os for 1000 tuples on 10 pages
– For each tuple assume 1.2 pages to find match
Join column sid is a key for Sailors.
– At most one matching tuple, unclustered index on sid OK.
– Projecting out unnecessary fields from outer doesn't help.
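The 1210 I/O figure is the sum of the outer selection cost and one index probe per qualifying outer tuple. A minimal sketch of that arithmetic, using the slide's assumed 1.2 I/Os per probe (the function and parameter names are illustrative):

```python
def index_nested_loops_cost(outer_io, outer_tuples, io_per_probe):
    """Cost of fetching the outer tuples plus one index probe per outer tuple."""
    return outer_io + outer_tuples * io_per_probe


# 10 I/Os fetch the 1000 Reserves tuples with bid = 100 via the clustered index;
# each tuple then probes the sid index on Sailors at an assumed 1.2 I/Os.
total = index_nested_loops_cost(outer_io=10, outer_tuples=1000, io_per_probe=1.2)
print(round(total), "I/Os")
```

Because the probe count depends on outer tuples rather than outer pages, projecting fields out of the outer does not lower this cost.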

What is needed for optimization?
A closed set of operators
– Relational ops (table in, table out)
– Encapsulation based on iterators
Plan space, based on
– Relational equivalences, different implementations
Cost Estimation, based on
– Cost formulas
– Size estimation, based on
  Catalog information on base tables
  Selectivity (Reduction Factor) estimation
A search algorithm
– To sift through the plan space based on cost!
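"Table in, table out" with iterator encapsulation means any operator can be stacked on any other. As a rough illustration, Python generators can stand in for iterators; the operator functions and the sample Sailors rows below are invented for the example and do not come from the lecture.

```python
def scan(table):
    """Access path: yield each stored tuple."""
    for row in table:
        yield row


def select(child, predicate):
    """Filter: pass along only tuples that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row


def project(child, columns):
    """Column projection (no duplicate elimination in this sketch)."""
    for row in child:
        yield {column: row[column] for column in columns}


# Hypothetical sample rows using the Sailors schema.
sailors = [
    {"sid": 1, "sname": "sailor_a", "rating": 7, "age": 30.0},
    {"sid": 2, "sname": "sailor_b", "rating": 3, "age": 25.0},
]

# Each operator consumes tuples from its child on demand.
plan = project(select(scan(sailors), lambda row: row["rating"] > 5), ["sname"])
result = list(plan)
print(result)
```

Since every operator exposes the same interface, the optimizer can reorder or swap implementations without changing the operators around them.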