Query Optimization R&G, Chapter 15 Lecture 16. Administrivia Homework 3 available today –Written exercise; will be posted on class website –Due date:

Query Optimization R&G, Chapter 15 Lecture 16

Administrivia
Homework 3 available today
– Written exercise; will be posted on class website
– Due date: Tuesday, March 20 by end of class period
Homework 4 available later this week
– Implement nested loops and hash join operators for minibase
– Due date: April 5 (after Spring Break)
Midterm 2 is 3/22, 2 weeks from today
– In class, covers lectures 10-17
– Review will be held Tuesday 3/20 7-9 pm 306 Soda Hall

Review
DBMS layers, top to bottom:
– Query Optimization and Execution (now you are here)
– Relational Operators (you were here)
– Files and Access Methods
– Buffer Management
– Disk Space Management
– DB
Operators are the building blocks for computing results of queries
– Sort, Project, Join, Filter, access methods for files...
A query plan is a tree of operators that computes the result of a query
Optimization is the process of picking the best plan
Execution is the process of executing the plan

Query Plans: turning text into tuples
Query:
SELECT A.aname, max(F.feedingshift)
FROM Animals A, Feeding F
WHERE A.aid = F.aid AND (A.species = 'Big Cat' or A.species = 'Bear')
GROUP BY A.aname
HAVING COUNT(*) > 1
Query Plan: a tree of Operators over the Feeding and Animals tables
[Figure: sample rows of the Feeding and Animals tables; the Animals rows shown include Aslan (Big Cat), Baloo (Bear), Bageera (Big Cat), Shere Khan (Big Cat), Dumbo (Elephant)]
Query Result:
Name        Shift
Aslan       3
Bageera     3
Elsa        3
Shere Khan  3
Tigger      2

Operator Review
Access Path: pulls tuples from tables
– File scans
– Index scans (clustered or unclustered)
– Index-only scans
Select (or Filter): conditionally excludes tuples
– Can be 'pushed'/combined with Access Path operator
  Use indexes where possible and apply other predicates on the result
– Can also be applied at intermediate point in query plan
Projection: removes columns and duplicates
– Column projection often done by operators
– Duplicate elimination via Sort or Hash

Operator Review
Sort: sorts tuples in a particular order
– Simple merge sort
– General external merge sort (with various optimizations)
– B+ tree traversal
Join: combine tuples from 2 other operators
– Page nested loops
– Block nested loops
– Index nested loops
– Sort-merge join
– Hash-join
Other operators for
– Group By, Temping, …

Query Optimization steps
1. Parse query from text to 'intermediate model'
2. Traverse 'intermediate model' and produce alternate query plans
– Query plan = tree of relational operators
– Optimizer keeps track of cost and properties of plans
3. Pick the cheapest plan
4. Pass cheapest plan on to query execution engine to execute and produce results of query
Example query:
SELECT A.aname, max(F.feedingshift)
FROM Animals A, Feeding F
WHERE A.aid = F.aid AND (A.species = 'Big Cat' or A.species = 'Bear')
GROUP BY A.aname
HAVING COUNT(*) > 1
[Figure: query text, then Query parser, then Block 1, Block 2, Block 3, then Query optimizer, then candidate plans with Cost = 200, Cost = 150, Cost = 500, then To execution engine]

Query Blocks: Units of Optimization
Intermediate model is a set of query blocks
– 1 per SELECT/FROM/WHERE/GROUP BY/HAVING clause
Query Block:
SELECT A.aname, max(F.feedingshift)
FROM Animals A, Feeding F
WHERE A.aid = F.aid AND (A.species = 'Big Cat' or A.species = 'Bear')
GROUP BY A.aname
HAVING COUNT(*) > 1
Subqueries produce nested query blocks
– treated as calls to a subroutine, made once per tuple produced by outer query block
– sometimes subqueries can be rewritten to produce cheaper plan
Outer Query Block:
SELECT S.sname
FROM Sailors S
WHERE S.age IN ( nested query block )
Nested Query Block:
SELECT MAX (S2.age)
FROM Sailors S2
GROUP BY S2.rating
[Figure labels: Outer Query Block, Nested Query Block, Rewritten Query Block]

Query blocks are optimized 1 at a time
1. Convert block to relational algebra tree
2. Traverse tree and build plan bottom up:
– Pick best access method for each relation in FROM clause
  Applying predicates if possible
– Consider all join trees
  All ways to join relations in FROM clause 1-at-a-time
  Consider multiple permutations and join methods
  – But not all! too many choices
  – Restrict to left-deep plans
  – Prune bad plans along the way
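The search over join orders can be sketched in a few lines of Python. This is only an illustration under assumptions: the names `left_deep` and `cheapest_left_deep` are hypothetical, the cost function is supplied by the caller, and the search is exhaustive (it does no pruning, unlike the optimizer described above).

```python
from itertools import permutations

def left_deep(order):
    """Render a join order as a left-deep tree: ((A JOIN B) JOIN C) ..."""
    tree = order[0]
    for relation in order[1:]:
        tree = f"({tree} JOIN {relation})"
    return tree

def cheapest_left_deep(relations, cost_of):
    """Try every left-deep join order and keep the cheapest.

    `cost_of` is a caller-supplied (hypothetical) cost function that
    takes a tuple of relation names in join order.
    """
    return min(permutations(relations), key=cost_of)

plans = [left_deep(p) for p in permutations(["A", "B", "C"])]
print(len(plans))  # 6
print(plans[0])    # ((A JOIN B) JOIN C)
```

With n relations there are n! left-deep orders, and each join can use several join methods, which is why the plan space has to be restricted and pruned.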

Converting Query Blocks to Relational Algebra Trees
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
SQL is relationally complete; can express everything in relational algebra
Π sname (σ bid=100 ∧ rating>5 (Reserves ⋈ sid=sid Sailors))
[Tree, bottom up: Reserves and Sailors; join on sid=sid; select bid=100 ∧ rating > 5; project sname]

SQL extends Relational Algebra
SQL is more powerful than relational algebra
– extend relational algebra to include aggregate ops: GROUP BY, HAVING
How is this query block expressed?
SELECT S.sname FROM Sailors S WHERE S.age IN (constant set from subquery)
– Π sname (σ (age in set from subquery) Sailors)
And this query block?
SELECT MAX (S2.age) FROM Sailors S2 GROUP BY S2.rating
– Π Max(age) (GroupBy Rating (Sailors))
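As a rough illustration of the two blocks, the Python sketch below evaluates the nested block once and feeds its result to the outer block as a constant set. The sample tuples and the function names `max_age_per_rating` and `names_with_age_in` are made up for this example.

```python
# Hypothetical Sailors tuples (sname, rating, age), for illustration only.
sailors = [
    ("Ann", 7, 30.0),
    ("Bob", 7, 41.5),
    ("Cy", 9, 25.0),
    ("Di", 9, 22.0),
]

def max_age_per_rating(rows):
    """Nested block: GroupBy rating, then Max(age) within each group."""
    best = {}
    for _, rating, age in rows:
        best[rating] = max(age, best.get(rating, age))
    return set(best.values())

def names_with_age_in(rows, ages):
    """Outer block: select age IN ages, then project sname."""
    return [sname for sname, _, age in rows if age in ages]

ages = max_age_per_rating(sailors)       # computed once, then reused
print(sorted(ages))                      # [25.0, 41.5]
print(names_with_age_in(sailors, ages))  # ['Bob', 'Cy']
```

Because this subquery does not refer to the outer tuple, computing its result once gives the same answer as calling it once per outer tuple.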

Why optimize?
[Plan tree, bottom up: Reserves and Sailors; join on sid=sid; select bid=100 ∧ rating > 5; project sname]
Operators have implementation choices
– Index scan? File scan? Nested loop join? Sort merge?
Operators can also be applied in different order!

Motivating Example -- Schema used
Sailors (sid: integer, sname: string, rating: integer, age: real)
Reserves (sid: integer, bid: integer, day: dates, rname: string)
As seen in previous lectures…
Reserves:
– Each tuple is 40 bytes long, 100 tuples per page, 1000 pages.
– Assume there are 100 boats
Sailors:
– Each tuple is 50 bytes long, 80 tuples per page, 500 pages.
– Assume there are 10 different ratings
Assume we have 5 pages in our buffer pool!

Motivating Example
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid=S.sid AND R.bid=100 AND S.rating>5
Plan: Sailors (outer, 500 pages) joined with Reserves (inner, 1000 pages) on sid=sid (Page-Oriented Nested loops); then select bid=100 ∧ rating > 5 (On-the-fly); then project sname (On-the-fly)
Cost: 500+500*1000 = 500,500 I/Os
Not the worst plan, but…
Misses several opportunities:
– selections could have been 'pushed' earlier,
– indexes might have been helpful….
Goal of optimization: To find more efficient plans that compute the same answer.

Selectivity calculation
Sailors: 500 pages, 80 tuples per page, 10 ratings
Selectivity of S.rating > 5?
– ½ -> 500*80/2 = 20,000 tuples
– 20,000/80 = 250 pages
Reserves: 1000 pages, 100 tuples per page, 100 boats
Selectivity of R.bid = 100?
– 1/100 -> 1000*100/100 = 1000 tuples
– 1000/100 = 10 pages
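The same arithmetic, as a minimal Python sketch. The function name `selection_size` and the use of `Fraction` for the reduction factor are illustrative choices, and the estimate assumes uniformly distributed values, as the slide does.

```python
import math
from fractions import Fraction

def selection_size(pages, tuples_per_page, reduction_factor):
    """Estimate (tuples, pages) that survive a selection.

    Assumes values are uniformly distributed.
    """
    tuples = pages * tuples_per_page * reduction_factor
    return int(tuples), math.ceil(tuples / tuples_per_page)

# S.rating > 5 with 10 ratings: half of the tuples qualify.
sailors_tuples, sailors_pages = selection_size(500, 80, Fraction(1, 2))
# R.bid = 100 with 100 boats: 1 in 100 tuples qualifies.
reserves_tuples, reserves_pages = selection_size(1000, 100, Fraction(1, 100))

print(sailors_tuples, sailors_pages)    # 20000 250
print(reserves_tuples, reserves_pages)  # 1000 10
```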

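Alternative Plans – Push Selects (No Indexes)
Original plan: Sailors (outer, 500 pages) joined with Reserves (inner, 1000 pages) on sid=sid (Page-Oriented Nested loops); select bid=100 ∧ rating > 5 (On-the-fly); project sname (On-the-fly)
– 500,500 IOs
Push the select on Sailors: select rating > 5 on Sailors (On-the-fly; 500 pages in, 250 pages out) as outer; join with Reserves (1000 pages) on sid=sid (Page-Oriented Nested loops); select bid=100 (On-the-fly); project sname (On-the-fly)
– 500 + 250*1000 = 250,500 IOs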

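Alternative Plans – Push Selects (No Indexes)
Previous plan: select rating > 5 on Sailors (On-the-fly) as outer; join with Reserves on sid=sid (Page-Oriented Nested loops); select bid=100 (On-the-fly); project sname (On-the-fly)
– 250,500 IOs
Also push the select on Reserves: select rating > 5 on Sailors (On-the-fly; 500 pages in, 250 pages out) as outer; select bid = 100 on Reserves (On-the-fly; 1000 pages in, 10 pages out) as inner; join on sid=sid (Page-Oriented Nested loops); project sname (On-the-fly)
– 500 + 250*1000 = 250,500 IOs (no savings: all 1000 pages of Reserves are still read for every outer page)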

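Alternative Plans – Try different join order
Before the swap: Sailors with rating > 5 (On-the-fly) as outer, Reserves with bid=100 (On-the-fly) as inner, joined on sid=sid (Page-Oriented Nested loops); project sname (On-the-fly)
– 250,500 IOs
After the swap: Reserves with bid=100 (On-the-fly; 1000 pages in, 10 pages out) as outer, Sailors (500 pages) with rating > 5 (On-the-fly) as inner, joined on sid=sid (Page-Oriented Nested loops); project sname (On-the-fly)
– 1000 + 10*500 = 6000 IOs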

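Alternative Plans – Push Selects and precompute result (No Indexes)
Previous plan: Reserves with bid=100 (On-the-fly) as outer, Sailors with rating > 5 (On-the-fly) as inner, joined on sid=sid (Page-Oriented Nested loops); project sname (On-the-fly)
– 6000 IOs
Precompute the inner: select rating > 5 on Sailors (Scan & Write to temp T2; 500 pages read, 250 pages written); Reserves with bid=100 (On-the-fly; 1000 pages in, 10 pages out) as outer, T2 as inner, joined on sid=sid (Page-Oriented Nested loops); project sname (On-the-fly)
– 1000 + 500 + 250 + (10 * 250) = 4250 IOs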

Alternative Plans – Try different join order
Before the swap: Reserves with bid=100 (On-the-fly) as outer, Sailors with rating > 5 (Scan & Write to temp T2) as inner, joined on sid=sid (Page-Oriented Nested loops); project sname (On-the-fly)
– 4250 IOs
After the swap: select bid=100 on Reserves (Scan & Write to temp T2; 1000 pages read, 10 pages written); Sailors with rating>5 (On-the-fly; 500 pages in, 250 pages out) as outer, T2 as inner, joined on sid=sid (Page-Oriented Nested loops); project sname (On-the-fly)
– 500 + 1000 + 10 + (250 * 10) = 4010 IOs

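Optimized query is about 125x cheaper than the original! (500,500 / 4010 ≈ 124.8)
Original plan: Sailors as outer, Reserves as inner, joined on sid=sid (Page-Oriented Nested loops); select bid=100 ∧ rating > 5 (On-the-fly); project sname (On-the-fly)
– 500,500 IOs
Optimized plan: Sailors with rating>5 (On-the-fly) as outer, Reserves with bid=100 (Scan & Write to temp T2) as inner, joined on sid=sid (Page-Oriented Nested loops); project sname (On-the-fly)
– 4010 IOs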
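The costs of the five page-oriented nested loops plans above can be recomputed with a short Python sketch. The helper name `page_nested_loops` and the plan labels are illustrative; the page counts are the ones used on the slides.

```python
SAILORS, RESERVES = 500, 1000        # pages in each base table
SAILORS_SEL, RESERVES_SEL = 250, 10  # pages surviving rating > 5 and bid = 100

def page_nested_loops(outer_scan, outer_pages, inner_pages):
    """I/O cost: read the outer once, then scan the inner once per outer
    page that reaches the join."""
    return outer_scan + outer_pages * inner_pages

costs = {
    "original": page_nested_loops(SAILORS, SAILORS, RESERVES),
    "push rating > 5 on outer": page_nested_loops(SAILORS, SAILORS_SEL, RESERVES),
    "swap join order": page_nested_loops(RESERVES, RESERVES_SEL, SAILORS),
    # Materializing the inner adds one scan of the base table plus one
    # write of the temp file; the join then scans the smaller temp file.
    "materialize Sailors": (SAILORS + SAILORS_SEL)
    + page_nested_loops(RESERVES, RESERVES_SEL, SAILORS_SEL),
    "materialize Reserves": (RESERVES + RESERVES_SEL)
    + page_nested_loops(SAILORS, SAILORS_SEL, RESERVES_SEL),
}

for name, cost in costs.items():
    print(f"{name}: {cost} I/Os")
```

The loop prints 500500, 250500, 6000, 4250 and 4010 I/Os, matching the slide totals.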

More Alternative Plans (No Indexes)
Plan: select bid=100 on Reserves (Scan; write to temp T1); select rating > 5 on Sailors (Scan; write to temp T2); join T1 and T2 on sid=sid (Sort-Merge Join); project sname (On-the-fly)
Main difference: Sort Merge Join
With 5 buffers, cost of plan:
– Scan Reserves (1000) + write temp T1 (10 pages, if we have 100 boats, uniform distribution) = 1010.
– Scan Sailors (500) + write temp T2 (250 pages, if we have 10 ratings) = 750.
– Sort T1 (2*2*10) + sort T2 (2*4*250) + merge (10+250) = 2300
  Merge passes after the initial pass: ceil(log4 ceil(10/5)) = 1 for T1; ceil(log4 ceil(250/5)) = 3 for T2
– Total: 4060 page I/Os.
If we use BNL join, join = 10+4*250 (T1 is read in ceil(10/3) = 4 blocks), total cost = 2770.
Can also 'push' projections, but must be careful!
– T1 has only sid, T2 only sid, sname:
– T1 fits in 3 pgs, cost of BNL under 250 pgs, total < 2000.
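A minimal Python sketch of the two totals above, assuming the usual external merge sort cost model (pass 0 builds runs of B pages, every later pass merges B-1 runs, and each pass reads and writes every page). The function names are illustrative.

```python
import math

def sort_passes(pages, buffers):
    """Passes of external merge sort: pass 0 makes runs of `buffers` pages,
    each later pass merges `buffers - 1` runs at a time."""
    runs = math.ceil(pages / buffers)
    passes = 1
    while runs > 1:
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return passes

def sort_cost(pages, buffers):
    # Every pass reads and writes each page once.
    return 2 * pages * sort_passes(pages, buffers)

B = 5                                   # buffer pages
T1, T2 = 10, 250                        # pages in the two temp files
materialize = (1000 + T1) + (500 + T2)  # scan base tables, write temps
smj = sort_cost(T1, B) + sort_cost(T2, B) + (T1 + T2)
bnl = T1 + math.ceil(T1 / (B - 2)) * T2  # outer read in blocks of B-2 pages

print(materialize + smj)  # 4060
print(materialize + bnl)  # 2770
```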

More Alt Plans: Indexes
Plan: select bid=100 on Reserves (Use hash Index, do not write to temp) as outer; join with Sailors on sid=sid (Index Nested Loops, with pipelining); select rating > 5 (On-the-fly); project sname (On-the-fly)
– 10 I/Os for 1000 tuples on 10 pages
– For each tuple assume 1.2 pages to find match
With clustered index on bid of Reserves, we get 100,000/100 = 1000 tuples on 1000/100 = 10 pages.
INL with outer not materialized.
Decision not to push rating>5 before the join is based on availability of sid index on Sailors.
Cost: Selection of Reserves tuples (10 I/Os); then, for each, must get matching Sailors tuple (1000*1.2); total 1210 I/Os.
Join column sid is a key for Sailors.
– At most one matching tuple, unclustered index on sid OK.
– Projecting out unnecessary fields from outer doesn't help.
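The index nested loops total can be checked with a small Python sketch. The function name is hypothetical, and the 1.2 I/Os per probe is the slide's assumption for finding the matching Sailors tuple.

```python
from fractions import Fraction

def index_nested_loops_cost(outer_pages, outer_tuples, probe_cost):
    """Read the outer once, then pay `probe_cost` I/Os per outer tuple to
    fetch its match through the index on the inner."""
    return outer_pages + outer_tuples * probe_cost

# 10 pages hold the 1000 Reserves tuples with bid = 100.
cost = index_nested_loops_cost(10, 1000, Fraction("1.2"))
print(int(cost))  # 1210
```

`Fraction` is used only to keep the arithmetic exact; a plain float would serve for a rough estimate.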

What is needed for optimization?
A closed set of operators
– Relational ops (table in, table out)
– Encapsulation based on iterators
Plan space, based on
– Relational equivalences, different implementations
Cost Estimation, based on
– Cost formulas
– Size estimation, based on
  Catalog information on base tables
  Selectivity (Reduction Factor) estimation
A search algorithm
– To sift through the plan space based on cost!
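The "table in, table out" and iterator ideas can be illustrated with Python generators. This is only a sketch: the operator functions and the sample Sailors tuples are hypothetical, and a real engine would read pages from disk rather than an in-memory list.

```python
def scan(table):
    """Access path: yield every tuple of an in-memory 'table'."""
    yield from table

def select(child, predicate):
    """Filter: pass along only tuples that satisfy the predicate."""
    return (t for t in child if predicate(t))

def project(child, columns):
    """Projection (no duplicate elimination): keep only the named columns."""
    return (tuple(t[c] for c in columns) for t in child)

# Hypothetical Sailors tuples, for illustration only.
sailors = [
    {"sid": 1, "sname": "Ann", "rating": 7, "age": 30.0},
    {"sid": 2, "sname": "Bob", "rating": 3, "age": 41.5},
    {"sid": 3, "sname": "Cy", "rating": 9, "age": 25.0},
]

# Each operator consumes an iterator and produces one, so they compose.
plan = project(select(scan(sailors), lambda t: t["rating"] > 5), ["sname"])
print(list(plan))  # [('Ann',), ('Cy',)]
```

Because every operator has the same interface, the optimizer can reorder or swap implementations without changing the operators around them.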
