Presentation is loading. Please wait.

Presentation is loading. Please wait.

Lecture 24: Query Execution and Optimization

Similar presentations


Presentation on theme: "Lecture 24: Query Execution and Optimization"— Presentation transcript:

1 Lecture 24: Query Execution and Optimization
Monday, November 24, 2003

2 Outline Finish query execution (really)
Query optimization: algebraic laws 16.2 Cost-based optimization 16.5, 16.6

3 Example Product(pname, maker), Company(cname, city)
How do we execute this query ? Select Product.pname From Product, Company Where Product.maker=Company.cname and Company.city = “Seattle”

4 Example Product(pname, maker), Company(cname, city) Assume:
Clustered index: Product.pname, Company.cname Unclustered index: Product.maker, Company.city

5 Logical Plan: scity=“Seattle” Product (pname,maker)
maker=cname scity=“Seattle” Product (pname,maker) Company (cname,city)

6 Index-based selection
Physical plan 1: Index-based join Index-based selection cname=maker scity=“Seattle” Company (cname,city) Product (pname,maker)

7 Scan and sort (2a) index scan (2b)
Physical plans 2a and 2b: Merge-join Which one is better ?? maker=cname scity=“Seattle” Product (pname,maker) Company (cname,city) Index- scan Scan and sort (2a) index scan (2b)

8 Index-based selection
Physical plan 1:  T(Product) / V(Product, maker) Index-based join Index-based selection Total cost: T(Company) / V(Company, city)  T(Product) / V(Product, maker) cname=maker scity=“Seattle” Company (cname,city) Product (pname,maker) T(Company) / V(Company, city)

9 Scan and sort (2a) index scan (2b)
Total cost: (2a): 3B(Product) + B(Company) (2b): T(Product) + B(Company) Physical plans 2a and 2b: Merge-join No extra cost (why ?) maker=cname scity=“Seattle” 3B(Product) Product (pname,maker) Company (cname,city) T(Product) Table- scan Scan and sort (2a) index scan (2b) B(Company)

10 Which one is better ?? It depends on the data !!
Plan 1: T(Company)/V(Company,city)  T(Product)/V(Product,maker) Plan 2a: B(Company) + 3B(Product) Plan 2b: B(Company) + T(Product) Which one is better ?? It depends on the data !!

11 Example Case 1: V(Company, city)  T(Company)
T(Company) = 5, B(Company) = M = 100 T(Product) = 100, B(Product) = 1,000 We may assume V(Product, maker)  T(Company) (why ?) Case 1: V(Company, city)  T(Company) Case 2: V(Company, city) << T(Company) V(Company,city) = 2,000 V(Company,city) = 20

12 Which Plan is Best ? Case 1: Case 2:
Plan 1: T(Company)/V(Company,city)  T(Product)/V(Product,maker) Plan 2a: B(Company) + 3B(Product) Plan 2b: B(Company) + T(Product) Case 1: Case 2:

13 Lessons Need to consider several physical plan
even for one, simple logical plan No magic “best” plan: depends on the data In order to make the right choice need to have statistics over the data the B’s, the T’s, the V’s

14 The three components of an optimizer
We need three things in an optimizer: Algebraic laws An optimization algorithm A cost estimator

15 Algebraic Laws Commutative and Associative Laws Distributive Laws
R  S = S  R, R  (S  T) = (R  S)  T R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T R ⋈ S = S ⋈ R, R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T Distributive Laws R ⋈ (S  T) = (R ⋈ S)  (R ⋈ T)

16 Algebraic Laws Laws involving selection:
s C AND C’(R) = s C(s C’(R)) = s C(R) ∩ s C’(R) s C OR C’(R) = s C(R) U s C’(R) s C (R ⋈ S) = s C (R) ⋈ S When C involves only attributes of R s C (R – S) = s C (R) – S s C (R  S) = s C (R)  s C (S) s C (R ∩ S) = s C (R) ∩ S

17 Algebraic Laws Example: R(A, B, C, D), S(E, F, G) s F=3 (R ⋈D=E S) = ?
s A=5 AND G=9 (R ⋈D=E S) = ?

18 Algebraic Laws Laws involving projections
PM(R ⋈ S) = PN(PP(R) ⋈ PQ(S)) Where N, P, Q are appropriate subsets of attributes of M PM(PN(R)) = PM,N(R) Example R(A,B,C,D), S(E, F, G) PA,B,G(R ⋈D=E S) = P ? (P?(R) ⋈ D=E P?(S))

19 Algebraic Laws Laws involving grouping and aggregation:
(A, agg(B)(R)) = A, agg(B)(R) A, agg(B)((R)) = A, agg(B)(R) if agg is “duplicate insensitive” Which of the following are “duplicate insensitive” ? sum, count, avg, min, max A, agg(D)(R(A,B) ⋈B=C S(C,D)) = A, agg(D)(R(A,B) ⋈B=C (C, agg(D)S(C,D))) Why is this true ? Why would we do it ?

20 Heuristic Based Optimizations
Query rewriting based on algebraic laws Result in better queries most of the time Heuristics number 1: Push selections down Heuristics number 2: Sometimes push selections up, then down

21 Predicate Pushdown pname pname s price>100 AND city=“Seattle” maker=name maker=name price>100 city=“Seattle” Product Company Product Company The earlier we process selections, less tuples we need to manipulate higher up in the tree (but may cause us to loose an important ordering of the tuples, if we use indexes).

22 Predicate Pushdown Select y.name, Max(x.price)
From product x, company y Where x.maker = y.name GroupBy y.name Having Max(x.price) > 100 Select y.name, Max(x.price) From product x, company y Where x.maker=y.name and x.price > 100 GroupBy y.name Having Max(x.price) > 100 For each company, find the maximal price of its products. Advantage: the size of the join will be smaller. Requires transformation rules specific to the grouping/aggregation operators. Won’t work if we replace Max by Min.

23 Cost-based Optimizations
Main idea: apply algebraic laws, until estimated cost is minimal Practically: start from partial plans, introduce operators one by one Will see in a few slides Problem: there are too many ways to apply the laws, hence too many (partial) plans

24 Cost-based Optimizations
Approaches: Top-down: the partial plan is a top fragment of the logical plan Bottom up: the partial plan is a bottom fragment of the logical plan

25 Search Strategies Branch-and-bound: Hill climbing:
Remember the cheapest complete plan P seen so far and its cost C Stop generating partial plans whose cost is > C If a cheaper complete plan is found, replace P, C Hill climbing: Remember only the cheapest partial plan seen so far Dynamic programming: Remember the all cheapest partial plans

26 Dynamic Programming Unit of Optimization: select-project-join
Push selections down, pull projections up

27 Join Trees R1 ⋈ R2 ⋈ …. ⋈ Rn Join tree: A plan = a join tree
A partial plan = a subtree of a join tree R3 R1 R2 R4

28 Types of Join Trees Left deep: R4 R2 R5 R3 R1

29 Types of Join Trees Bushy: R3 R2 R4 R1 R5

30 Types of Join Trees Right deep: R3 R1 R5 R2 R4

31 Problem Given: a query R1 ⋈ R2 ⋈ … ⋈ Rn
Assume we have a function cost() that gives us the cost of every join tree Find the best join tree for the query


Download ppt "Lecture 24: Query Execution and Optimization"

Similar presentations


Ads by Google