1 Lecture 24: Query Execution Monday, November 27, 2006.

Slides:



Advertisements
Similar presentations
1 Lecture 23: Query Execution Friday, March 4, 2005.
Advertisements

Lecture 13: Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data.
Query Optimization Goal: Declarative SQL query
Query Optimization. Query Optimization Process (simplified a bit) Parse the SQL query into a logical tree: –identify distinct blocks (corresponding to.
Lecture 14: Query Optimization. This Lecture Query rewriting Cost estimation –We have learned how atomic operations are implemented and their cost –We’ll.
1 Overview of Indexing Chapter 8 – Part II. 1. Introduction to indexing 2. First glimpse at indices and workloads.
Lecture 9 Query Optimization November 24, 2010 Dan Suciu -- CSEP544 Fall
Query Optimization: Transformations May 29 th, 2002.
1 Lecture 22: Query Execution Wednesday, March 2, 2005.
1 Lecture 03: SQL Friday, January 7, Administrivia Have you logged in IISQLSRV yet ? HAVE YOU CHANGED YOUR PASSWORD ? Homework 1 is now posted.
1 Overview of Indexing Chapter 8 – Part II. 1. Introduction to indexing 2. First glimpse at indices and workloads.
1 Lecture 7: Query Execution and Optimization Tuesday, February 20, 2007.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
CSE 444: Lecture 24 Query Execution Monday, March 7, 2005.
Exercises Product ( pname, price, category, maker) Purchase (buyer, seller, store, product) Company (cname, stock price, country) Person( per-name, phone.
IM433-Industrial Data Systems Management Lecture 5: SQL.
CSCE Database Systems Chapter 15: Query Execution 1.
1 Introduction to Database Systems CSE 444 Lecture 20: Query Execution: Relational Algebra May 21, 2008.
Query Optimization March 6 th, Query Optimization Process (simplified a bit) Parse the SQL query into a logical tree: –identify distinct blocks.
Query Optimization Imperative query execution plan: Declarative SQL query Ideally: Want to find best plan. Practically: Avoid worst plans! Goal: Purchase.
1 Lecture 23: Query Execution Wednesday, March 8, 2006.
1 Lecture 25 Friday, November 30, Outline Query execution –Two pass algorithms based on indexes (6.7) Query optimization –From SQL to logical.
CSE 544: Relational Operators, Sorting Wednesday, 5/12/2004.
CSE544 Query Optimization Tuesday-Thursday, February 8 th -10 th, 2011 Dan Suciu , Winter
Lecture 17: Query Execution Tuesday, February 28, 2001.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
More Optimization Exercises. Block Nested Loops Join Suppose there are B buffer pages Cost: M + ceil (M/(B-2))*N where –M is the number of pages of R.
Hash Tables and Query Execution March 1st, Hash Tables Secondary storage hash tables are much like main memory ones Recall basics: –There are n.
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
CSE 544: Lecture 14 Wednesday, 5/15/2002 Optimization, Size Estimation.
CS4432: Database Systems II Query Processing- Part 1 1.
DBMS Internals Execution and Optimization May 10th, 2004.
CPS216: Advanced Database Systems Notes 02:Query Processing (Overview) Shivnath Babu.
Lecture 8: Relational Algebra
Lecture 24: Query Execution and Optimization
Lecture 16: Relational Operators
Indexing and Execution
Database Management Systems (CS 564)
CS222: Principles of Data Management Notes #13 Set operations, Aggregation Instructor: Chen Li.
Introduction to Database Systems
Introduction to Database Systems CSE 444 Lecture 22: Query Optimization November 26-30, 2007.
Introduction to Database Systems CSE 444 Lecture 03: SQL
Introduction to Database Systems CSE 444 Lecture 03: SQL
SQL Introduction Standard language for querying and manipulating data
Lecture 12: SQL Friday, October 20, 2000.
Database Management Systems (CS 564)
Lecture 25: Query Execution
Lectures 6: Introduction to SQL 5
Relational Algebra Friday, 11/14/2003.
Lecture 3 Monday, April 8, 2002.
Lecture 24: Query Execution
Lecture 25: Query Optimization
Lecture 13: Query Execution
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Lecture 23: Query Execution
Lecture 22: Query Execution
CSE 444: Lecture 25 Query Execution
Lecture 22: Query Execution
Monday, 5/13/2002 Hash table indexes, query optimization
Query Optimization March 7th, 2003.
Query Optimization May 16th, 2002
CSE 544: Query Execution Wednesday, 5/12/2004.
Lecture 23: Monday, November 25, 2002.
CSE 544: Optimizations Wednesday, 5/10/2006.
Lecture 26 Monday, December 3, 2001.
Lecture 22: Friday, November 22, 2002.
Lecture 14: SQL Wednesday, October 31, 2001.
Lecture 24: Query Execution
Lecture 20: Query Execution
Lecture 24: Wednesday, November 27, 2002.
Presentation transcript:

1 Lecture 24: Query Execution Monday, November 27, 2006

2 Outline Query optimization: algebraic laws 16.2

3 Example Product(pname, maker), Company(cname, city) How do we execute this query ? Select Product.pname From Product, Company Where Product.maker=Company.cname and Company.city = “Seattle” Select Product.pname From Product, Company Where Product.maker=Company.cname and Company.city = “Seattle”

4 Example Product(pname, maker), Company(cname, city) Assume: Clustered index: Product.pname, Company.cname Unclustered index: Product.maker, Company.city

5  city=“Seattle” Product (pname,maker) Company (cname,city) maker=cname Logical Plan:

6  city=“Seattle” Product (pname,maker) Company (cname,city) cname=maker Physical plan 1: Index-based selection Index-based join

7  city=“Seattle” Product (pname,maker) Company (cname,city) maker=cname Physical plans 2a and 2b: Index- scan Merge-join Scan and sort (2a) index scan (2b) Which one is better ??

8  city=“Seattle” Product (pname,maker) Company (cname,city) cname=maker Physical plan 1: Index-based selection Index-based join T(Company) / V(Company, city)  T(Product) / V(Product, maker) Total cost: T(Company) / V(Company, city)  T(Product) / V(Product, maker) Total cost: T(Company) / V(Company, city)  T(Product) / V(Product, maker)

9  city=“Seattle” Product (pname,maker) Company (cname,city) maker=cname Physical plans 2a and 2b: Table- scan Merge-join Scan and sort (2a) index scan (2b) B(Company) 3B(Product) T(Product) No extra cost (why ?) Total cost: (2a): 3B(Product) + B(Company) (2b): T(Product) + B(Company) Total cost: (2a): 3B(Product) + B(Company) (2b): T(Product) + B(Company)

10 Plan 1: T(Company)/V(Company,city)  T(Product)/V(Product,maker) Plan 2a: B(Company) + 3B(Product) Plan 2b: B(Company) + T(Product) Which one is better ?? It depends on the data !!

11 Example Case 1: V(Company, city)  T(Company) Case 2: V(Company, city) << T(Company) T(Company) = 5,000 B(Company) = 500 M = 100 T(Product) = 100,000 B(Product) = 1,000 We may assume V(Product, maker)  T(Company) (why ?) T(Company) = 5,000 B(Company) = 500 M = 100 T(Product) = 100,000 B(Product) = 1,000 We may assume V(Product, maker)  T(Company) (why ?) V(Company,city) = 2,000 V(Company,city) = 20

12 Which Plan is Best ? Plan 1: T(Company)/V(Company,city)  T(Product)/V(Product,maker) Plan 2a: B(Company) + 3B(Product) Plan 2b: B(Company) + T(Product) Case 1: Case 2:

13 Lessons Need to consider several physical plan –even for one, simple logical plan No magic “best” plan: depends on the data In order to make the right choice –need to have statistics over the data –the B’s, the T’s, the V’s

14 Query Optimzation Have a SQL query Q Create a plan P Find equivalent plans P = P’ = P’’ = … Choose the “cheapest”. HOW ??

15 Logical Query Plan SELECT P.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND P.city=‘seattle’ AND Q.phone > ‘ ’ SELECT P.buyer FROM Purchase P, Person Q WHERE P.buyer=Q.name AND P.city=‘seattle’ AND Q.phone > ‘ ’ Purchase Person Buyer=name City=‘seattle’ phone>’ ’ buyer  In class: find a “better” plan P’ P= Purchasse(buyer, city) Person(name, phone)

16 Logical Query Plan SELECT city, sum(quantity) FROM sales GROUP BY city HAVING sum(quantity) < 100 SELECT city, sum(quantity) FROM sales GROUP BY city HAVING sum(quantity) < 100 sales(product, city, quantity)  city, sum(quantity)→p  p < 100 T1(city,p) T2(city,p) In class: find a “better” plan P’ Q= P=

17 The three components of an optimizer We need three things in an optimizer: Algebraic laws An optimization algorithm A cost estimator

18 Algebraic Laws Commutative and Associative Laws R  S = S  R, R  (S  T) = (R  S)  T R |  | S = S |  | R, R |  | (S |  | T) = (R |  | S) |  | T Distributive Laws R |  | (S  T) = (R |  | S)  (R |  | T)

19 Algebraic Laws Laws involving selection:  C AND C’ (R) =  C (  C’ (R)) =  C (R)   C’ (R)  C OR C’ (R) =  C (R)   C’ (R)  C (R |  | S) =  C (R) |  | S When C involves only attributes of R  C (R – S) =  C (R) – S  C (R  S) =  C (R)   C (S)  C (R |  | S) =  C (R) |  | S

20 Algebraic Laws Example: R(A, B, C, D), S(E, F, G)  F=3 (R |  | D=E S) = ?  A=5 AND G=9 (R |  | D=E S) = ?

21 Algebraic Laws Laws involving projections  M (R |  | S) =  M (  P (R) |  |  Q (S))  M (  N (R)) =  M,N (R) Example R(A,B,C,D), S(E, F, G)  A,B,G (R |  | D=E S) =  ? (  ? (R) |  | D=E  ? (S))

22 Algebraic Laws Laws involving grouping and aggregation:  (  A, agg(B) (R)) =  A, agg(B) (R)  A, agg(B) (  (R)) =  A, agg(B) (R) if agg is “duplicate insensitive” Which of the following are “duplicate insensitive” ? sum, count, avg, min, max  A, agg(D) (R(A,B) |  | B=C S(C,D)) =  A, agg(D) (R(A,B) |  | B=C (  C, agg(D) S(C,D)))

23 Optimizations Based on Semijoins THIS IS ADVANCED STUFF; NOT ON THE FINAL R S =  A1,…,An (R S) Where the schemas are: –Input: R(A1,…An), S(B1,…,Bm) –Output: T(A1,…,An)

24 Optimizations Based on Semijoins Semijoins: a bit of theory (see [AHV]) Given a query: A full reducer for Q is a program: Such that no dangling tuples remain in any relation Q :-  (  (R 1 |x| R 2 |x|... |x| R n )) R i1 := R i1 R j1 R i2 := R i2 R j R ip := R ip R jp R i1 := R i1 R j1 R i2 := R i2 R j R ip := R ip R jp

25 Optimizations Based on Semijoins Example: A full reducer is: Q(A,E) :- R1(A,B) |x| R2(B,C) |x| R3(C,D,E) R2(B,C) := R2(B,C) |x R1(A,B) R3(C,D,E) := R3(C,D,E) |x R2(B,C) R2(B,C) := R2(B,C) |x R3(C,D,E) R1(A,B) := R1(A,B) |x R2(B,C) R2(B,C) := R2(B,C) |x R1(A,B) R3(C,D,E) := R3(C,D,E) |x R2(B,C) R2(B,C) := R2(B,C) |x R3(C,D,E) R1(A,B) := R1(A,B) |x R2(B,C) The new tables have only the tuples necessary to compute Q(E)

26 Optimizations Based on Semijoins Example: Doesn’t have a full reducer (we can reduce forever) Theorem a query has a full reducer iff it is “acyclic” Q(E) :- R1(A,B) |x| R2(B,C) |x| R3(A,C, E)

27 Optimizations Based on Semijoins Semijoins in [Chaudhuri’98] CREATE VIEW DepAvgSal As ( SELECT E.did, Avg(E.Sal) AS avgsal FROM Emp E GROUP BY E.did) SELECT E.eid, E.sal FROM Emp E, Dept D, DepAvgSal V WHERE E.did = D.did AND E.did = V.did AND E.age 100k AND E.sal > V.avgsal CREATE VIEW DepAvgSal As ( SELECT E.did, Avg(E.Sal) AS avgsal FROM Emp E GROUP BY E.did) SELECT E.eid, E.sal FROM Emp E, Dept D, DepAvgSal V WHERE E.did = D.did AND E.did = V.did AND E.age 100k AND E.sal > V.avgsal

28 Optimizations Based on Semijoins First idea: CREATE VIEW LimitedAvgSal As ( SELECT E.did, Avg(E.Sal) AS avgsal FROM Emp E, Dept D WHERE E.did = D.did AND D.buget > 100k GROUP BY E.did) SELECT E.eid, E.sal FROM Emp E, Dept D, LimitedAvgSal V WHERE E.did = D.did AND E.did = V.did AND E.age 100k AND E.sal > V.avgsal CREATE VIEW LimitedAvgSal As ( SELECT E.did, Avg(E.Sal) AS avgsal FROM Emp E, Dept D WHERE E.did = D.did AND D.buget > 100k GROUP BY E.did) SELECT E.eid, E.sal FROM Emp E, Dept D, LimitedAvgSal V WHERE E.did = D.did AND E.did = V.did AND E.age 100k AND E.sal > V.avgsal

29 Optimizations Based on Semijoins Better: full reducer CREATE VIEW PartialResult AS (SELECT E.id, E.sal, E.did FROM Emp E, Dept D WHERE E.did=D.did AND E.age < 30 AND D.budget > 100k) CREATE VIEW Filter AS (SELECT DISTINCT P.did FROM PartialResult P) CREATE VIEW LimitedAvgSal AS (SELECT E.did, Avg(E.Sal) AS avgsal FROM Emp E, Filter F WHERE E.did = F.did GROUP BY E.did) CREATE VIEW PartialResult AS (SELECT E.id, E.sal, E.did FROM Emp E, Dept D WHERE E.did=D.did AND E.age < 30 AND D.budget > 100k) CREATE VIEW Filter AS (SELECT DISTINCT P.did FROM PartialResult P) CREATE VIEW LimitedAvgSal AS (SELECT E.did, Avg(E.Sal) AS avgsal FROM Emp E, Filter F WHERE E.did = F.did GROUP BY E.did)

30 Optimizations Based on Semijoins SELECT P.eid, P.sal FROM PartialResult P, LimitedDepAvgSal V WHERE P.did = V.did AND P.sal > V.avgsal SELECT P.eid, P.sal FROM PartialResult P, LimitedDepAvgSal V WHERE P.did = V.did AND P.sal > V.avgsal