1 Lecture 25 Friday, November 30, 2001. 2 Outline Query execution –Two pass algorithms based on indexes (6.7) Query optimization –From SQL to logical.

Slides:



Advertisements
Similar presentations
Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.
Advertisements

Query Optimization May 31st, Today A few last transformations Size estimation Join ordering Summary of optimization.
1 Lecture 23: Query Execution Friday, March 4, 2005.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query Optimization Goal: Declarative SQL query
Query Optimization. Query Optimization Process (simplified a bit) Parse the SQL query into a logical tree: –identify distinct blocks (corresponding to.
Lecture 14: Query Optimization. This Lecture Query rewriting Cost estimation –We have learned how atomic operations are implemented and their cost –We’ll.
Query Rewrite: Predicate Pushdown (through grouping) Select bid, Max(age) From Reserves R, Sailors S Where R.sid=S.sid GroupBy bid Having Max(age) > 40.
Lecture 24: Query Execution Monday, November 20, 2000.
Query Optimization: Transformations May 29 th, 2002.
1 Lecture 22: Query Execution Wednesday, March 2, 2005.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
1 Lecture 7: Query Execution and Optimization Tuesday, February 20, 2007.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
CSE 444: Lecture 24 Query Execution Monday, March 7, 2005.
CS411 Database Systems Kazuhiro Minami 12: Query Optimization.
Query Optimization March 6 th, Query Optimization Process (simplified a bit) Parse the SQL query into a logical tree: –identify distinct blocks.
Query Optimization March 10 th, Very Big Picture A query execution plan is a program. There are many of them. The optimizer is trying to chose a.
1 Lecture 23: Query Execution Wednesday, March 8, 2006.
CS411 Database Systems Kazuhiro Minami 11: Query Execution.
CPS216: Advanced Database Systems Notes 09:Query Optimization (Cost-based optimization) Shivnath Babu.
CS411 Database Systems Kazuhiro Minami 12: Query Optimization.
CS4432: Database Systems II Query Processing- Part 2.
1 Lecture 25: Query Optimization Wednesday, November 26, 2003.
Lecture 17: Query Execution Tuesday, February 28, 2001.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
CS 440 Database Management Systems Query Optimization 1.
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Query Processing Spring 2016.
1 Lecture 23: Query Execution Monday, November 26, 2001.
CSE 544: Lecture 14 Wednesday, 5/15/2002 Optimization, Size Estimation.
CS4432: Database Systems II Query Processing- Part 1 1.
Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Query Optimization Spring 2016.
1 Lecture 24: Query Execution Monday, November 27, 2006.
CS 440 Database Management Systems
15.1 – Introduction to physical-Query-plan operators
CS 440 Database Management Systems
Lecture 24: Query Execution and Optimization
Indexing and Execution
Query Execution Presented by Khadke, Suvarna CS 257
Cse 344 April 25th – Disk i/o.
Introduction to Database Systems CSE 444 Lecture 22: Query Optimization November 26-30, 2007.
Lecture 26: Query Optimization
Query Optimization and Perspectives
Lectures 5: Introduction to SQL 4
Lecture 25: Query Execution
Query Execution Presented by Jiten Oswal CS 257 Chapter 15
Relational Algebra Friday, 11/14/2003.
Lecture 24: Query Execution
Lecture 25: Query Optimization
Query Execution Index Based Algorithms (15.6)
Lecture 23: Query Execution
Lecture 22: Query Execution
CSE 444: Lecture 25 Query Execution
Lecture 22: Query Execution
Monday, 5/13/2002 Hash table indexes, query optimization
Query Optimization March 7th, 2003.
Lecture 11: B+ Trees and Query Execution
Query Optimization May 16th, 2002
CSE 544: Query Execution Wednesday, 5/12/2004.
Lecture 23: Monday, November 25, 2002.
CSE 544: Optimizations Wednesday, 5/10/2006.
Lecture 26 Monday, December 3, 2001.
Lecture 27 Wednesday, December 5, 2001.
Lecture 22: Friday, November 22, 2002.
Lecture 24: Query Execution
Lecture 20: Query Execution
Lecture 24: Wednesday, November 27, 2002.
Presentation transcript:

1 Lecture 25 Friday, November 30, 2001

2 Outline Query execution –Two pass algorithms based on indexes (6.7) Query optimization –From SQL to logical plans (7.3) –Algebraic laws (7.2) –Choosing an order for Joins (7.6)

3 Indexed Based Algorithms Recall that in a clustered index all tuples with the same value of the key are clustered on as few blocks as possible Note: book uses another term: “clustering index”. Difference is minor… a a aa a a a aa

4 Index Based Selection Selection on equality:  a=v (R) Clustered index on a: cost B(R)/V(R,a) Unclustered index on a: cost T(R)/V(R,a)

5 Index Based Selection Example: Table scan: –If R is clustered: B(R) = 2,000 I/Os –If R is unclustered: T(R) = 100,000 I/Os Index based selection: –If index is clustered: B(R)/V(R,a) = 100 –If index is unclustered: T(R)/V(R,a) = 5,000 Notice: when V(R,a) is small, then unclustered index is useless B(R) = 2000 T(R) = 100,000 V(R, a) = 20 cost of  a=v (R) = ?

6 Index Based Join R S Assume S has an index on the join attribute Iterate over R, for each tuple fetch corresponding tuple(s) from S Assume R is clustered. Cost: –If index is clustered: B(R) + T(R)B(S)/V(S,a) –If index is unclustered: B(R) + T(R)T(S)/V(S,a)

7 Index Based Join Assume both R and S have a sorted index (B+ tree) on the join attribute Then perform a merge join –called zig-zag join Cost: B(R) + B(S)

8 Example Product(pname, maker), Company(cname, city) Clustered index: Product.pname, Company.cname Unclustered index: Product.maker, Company.city Select Product.pname From Product, Company Where Product.maker=Company.cname and Company.city = “Seattle” Select Product.pname From Product, Company Where Product.maker=Company.cname and Company.city = “Seattle”

9 Case 1: V(Company, city)  T(Company) V(Product, maker)  T(Product) Case 2: V(Company, city) << T(Company) V(Product, maker)  T(Product) Case 3: V(Company, city) << T(Company) V(Product, maker) << T(Product)  city=“Seattle” ProductCompany maker=cname T(Company) = 5,000 B(Company) = 500 M = 100 T(Product) = 100,000 B(Product) = 1,000 T(Company) = 5,000 B(Company) = 500 M = 100 T(Product) = 100,000 B(Product) = 1,000 V(Company,city) = 2,000 V(Product,maker) = 20,000 V(Company,city) = 2,000 V(Product,maker) = 20,000 V(Company,city) = 20 V(Product,maker) = 20,000 V(Company,city) = 20 V(Product,maker) = 20,000 V(Company,city) = 20 V(Product,maker) = 100 V(Company,city) = 20 V(Product,maker) = 100

10 Case 1: –Index-based selection: T(Company) / V(Company, city) –Index-based join: x T(Product) / V(Product, maker) Case 3: –Table scan and selection on Company: B(Company) sorted !!! –Merge-join with Product: + B(Product) Case 2: –Plan 1 costs 250 x 5 = 1250, Plan 3 costs 1,500 –Plan 1 is better T(Company) / V(Company, city) x T(Product) / V(Product, maker) = 2.5 x 5  13 T(Company) / V(Company, city) x T(Product) / V(Product, maker) = 2.5 x 5  13 B(Company) + B(Product) = 1,500

11 Optimization Start: convert SQL to Logical Plan Algebraic laws: –foundation for every optimization Two approaches to optimizations: –Heuristics: apply laws that seem to result in cheaper plans –Cost based: estimate size and cost of intermediate results, search systematically for best plan

12 Converting from SQL to Logical Plans Select a1, …, an From R1, …, Rk Where C Select a1, …, an From R1, …, Rk Where C  a1,…,an (  C (R1 R2 … Rk))  a1,…,an (  b1, …, bm, aggs (  C (R1 R2 … Rk))) Select a1, …, an From R1, …, Rk Where C Group by b1, …, bl Select a1, …, an From R1, …, Rk Where C Group by b1, …, bl

13 Converting Nested Queries Select distinct product.name From product Where product.maker in (Select company.name From company where company.city=“Seattle”) Select distinct product.name From product Where product.maker in (Select company.name From company where company.city=“Seattle”) Select distinct product.name From product, company Where product.maker = company.name AND company.city=“Seattle” Select distinct product.name From product, company Where product.maker = company.name AND company.city=“Seattle”

14 Converting Nested Queries Select distinct x.name, x.maker From product x Where x.color= “blue” AND x.price >= ALL (Select y.price From product y Where x.maker = y.maker AND y.color=“blue”) Select distinct x.name, x.maker From product x Where x.color= “blue” AND x.price >= ALL (Select y.price From product y Where x.maker = y.maker AND y.color=“blue”) How do we convert this one to logical plan ?

15 Converting Nested Queries Select distinct x.name, x.maker From product x Where x.color= “blue” AND x.price < SOME (Select y.price From product y Where x.maker = y.maker AND y.color=“blue”) Select distinct x.name, x.maker From product x Where x.color= “blue” AND x.price < SOME (Select y.price From product y Where x.maker = y.maker AND y.color=“blue”) Let’s compute the complement first:

16 Converting Nested Queries Select distinct x.name, x.maker From product x, product y Where x.color= “blue” AND x.maker = y.maker AND y.color=“blue” AND x.price < y.price Select distinct x.name, x.maker From product x, product y Where x.color= “blue” AND x.maker = y.maker AND y.color=“blue” AND x.price < y.price This one becomes a SFW query: This returns exactly the products we DON’T want, so…

17 Converting Nested Queries (Select x.name, x.maker From product x Where x.color = “blue”) EXCEPT (Select x.name, x.maker From product x, product y Where x.color= “blue” AND x.maker = y.maker AND y.color=“blue” AND x.price < y.price) (Select x.name, x.maker From product x Where x.color = “blue”) EXCEPT (Select x.name, x.maker From product x, product y Where x.color= “blue” AND x.maker = y.maker AND y.color=“blue” AND x.price < y.price)

18 Algebraic Laws Commutative and Associative Laws –R U S = S U R, R U (S U T) = (R U S) U T –R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T –R S = S R, R (S T) = (R S) T Distributive Laws –R (S U T) = (R S) U (R T)

19 Algebraic Laws Laws involving selection: –  C AND C’ (R) =  C (  C’ (R)) =  C (R) ∩  C’ (R) –  C OR C’ (R) =  C (R) U  C’ (R) –  C (R S) =  C (R) S When C involves only attributes of R –  C (R – S) =  C (R) – S –  C (R U S) =  C (R) U  C (S) –  C (R ∩ S) =  C (R) ∩ S

20 Algebraic Laws Example: R(A, B, C, D), S(E, F, G) –  F=3 (R S) = ? –  A=5 AND G=9 (R S) = ? D=E

21 Algebraic Laws Laws involving projections –  M (R S) =  N (  P (R)  Q (S)) Where N, P, Q are appropriate subsets of attributes of M –  M (  N (R)) =  M,N (R) Example R(A,B,C,D), S(E, F, G) –  A,B,G (R S) =  ? (  ? (R)  ? (S)) D=E

22 Heuristic Based Optimizations Query rewriting based on algebraic laws Result in better queries most of the time Heuristics number 1: –Push selections down Heuristics number 2: –Sometimes push selections up, then down

23 Predicate Pushdown Product Company maker=name  price>100 AND city=“Seattle” pname Product Company maker=name price>100 pname city=“Seattle” The earlier we process selections, less tuples we need to manipulate higher up in the tree (but may cause us to loose an important ordering of the tuples, if we use indexes).

24 Predicate Pushdown Select y.name, Max(x.price) From product x, company y Where x.maker = y.name GroupBy y.name Having Max(x.price) > 100 Select y.name, Max(x.price) From product x, company y Where x.maker = y.name GroupBy y.name Having Max(x.price) > 100 Select y.name, Max(x.price) From product x, company y Where x.maker=y.name and x.price > 100 GroupBy y.name Having Max(x.price) > 100 Select y.name, Max(x.price) From product x, company y Where x.maker=y.name and x.price > 100 GroupBy y.name Having Max(x.price) > 100 For each company, find the maximal price of its products. Advantage: the size of the join will be smaller. Requires transformation rules specific to the grouping/aggregation operators. Won’t work if we replace Max by Min.

25 Pushing predicates up Create View V1 AS Select x.category, Min(x.price) AS p From product x Where x.price < 20 GroupBy x.category Create View V1 AS Select x.category, Min(x.price) AS p From product x Where x.price < 20 GroupBy x.category Create View V2 AS Select y.name, x.category, x.price From product x, company y Where x.maker=y.name Create View V2 AS Select y.name, x.category, x.price From product x, company y Where x.maker=y.name Select V2.name, V2.price From V1, V2 Where V1.category = V2.category and V1.p = V2.price Select V2.name, V2.price From V1, V2 Where V1.category = V2.category and V1.p = V2.price Bargain view V1: categories with some price<20, and the cheapest price

26 Query Rewrite: Pushing predicates up Create View V1 AS Select x.category, Min(x.price) AS p From product x Where x.price < 20 GroupBy x.category Create View V1 AS Select x.category, Min(x.price) AS p From product x Where x.price < 20 GroupBy x.category Create View V2 AS Select y.name, x.category, x.price From product x, company y Where x.maker=y.name Create View V2 AS Select y.name, x.category, x.price From product x, company y Where x.maker=y.name Select V2.name, V2.price From V1, V2 Where V1.category = V2.category and V1.p = V2.price AND V1.p < 20 Select V2.name, V2.price From V1, V2 Where V1.category = V2.category and V1.p = V2.price AND V1.p < 20 Bargain view V1: categories with some price<20, and the cheapest price

27 Query Rewrite: Pushing predicates up Create View V1 AS Select x.category, Min(x.price) AS p From product x Where x.price < 20 GroupBy x.category Create View V1 AS Select x.category, Min(x.price) AS p From product x Where x.price < 20 GroupBy x.category Create View V2 AS Select y.name, x.category, x.price From product x, company y Where x.maker=y.name AND V1.p < 20 Create View V2 AS Select y.name, x.category, x.price From product x, company y Where x.maker=y.name AND V1.p < 20 Select V2.name, V2.price From V1, V2 Where V1.category = V2.category and V1.p = V2.price AND V1.p < 20 Select V2.name, V2.price From V1, V2 Where V1.category = V2.category and V1.p = V2.price AND V1.p < 20 Bargain view V1: categories with some price<20, and the cheapest price

28 Cost-based Optimizations Unit of Optimization Select-project-join –Push selections down, pull projections up Hence: remains to choose the join order This is the main focus of the cost-based optimizer

29 Join Trees R1 R2 …. Rn Join tree: A join tree represents a plan. An optimizer needs to inspect many (all ?) join trees R3R1R2R4

30 Types of Join Trees Left deep: R3 R1 R5 R2 R4

31 Types of Join Trees Bushy: R3 R1 R2R4 R5

32 Types of Join Trees Right deep: R3 R1 R5 R2R4

33 Problem Given: a query R1 R2 … Rn Assume we have a function cost() that gives us the cost of every join tree Find the best join tree for the query

34 Dynamic Programming Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset In increasing order of set cardinality: –Step 1: for {R1}, {R2}, …, {Rn} –Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn} –… –Step n: for {R1, …, Rn} A subset of {R1, …, Rn} is also called a subquery

35 Dynamic Programming For each subquery Q ⊆ {R1, …, Rn} compute the following: –Size(Q) –A best plan for Q: Plan(Q) –The cost of that plan: Cost(Q)

36 Dynamic Programming Step 1: For each {Ri} do: –Size({Ri}) = B(Ri) –Plan({Ri}) = Ri –Cost({Ri}) = (cost of scanning Ri)

37 Dynamic Programming Step i: For each Q ⊆ {R1, …, Rn} of cardinality i do: –Compute Size(Q) (later…) –For every pair of subqueries Q’, Q’’ s.t. Q = Q’ U Q’’ compute cost(Plan(Q’) Plan(Q’’)) –Cost(Q) = the smallest such cost –Plan(Q) = the corresponding plan

38 Dynamic Programming Return Plan({R1, …, Rn})

39 Dynamic Programming To illustrate, we will make the following simplifications: Cost(P1 P2) = Cost(P1) + Cost(P2) + size(intermediate result(s)) Intermediate results: –If P1 = a join, then the size of the intermediate result is size(P1), otherwise the size is 0 –Similarly for P2 Cost of a scan = 0

40 Dynamic Programming Example: Cost(R5 R7) = 0 (no intermediate results) Cost((R2 R1) R7) = Cost(R2 R1) + Cost(R7) + size(R2 R1) = size(R2 R1)