Lecture 23: Monday, November 25, 2002.

Slides:



Advertisements
Similar presentations
1 Lecture 23: Query Execution Friday, March 4, 2005.
Advertisements

CS CS4432: Database Systems II Operator Algorithms Chapter 15.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.
COMP 451/651 Optimizing Performance
Lecture 24: Query Execution Monday, November 20, 2000.
Query Optimization: Transformations May 29 th, 2002.
1 Lecture 22: Query Execution Wednesday, March 2, 2005.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
CSE 444: Lecture 24 Query Execution Monday, March 7, 2005.
CSCE Database Systems Chapter 15: Query Execution 1.
1 Lecture 23: Query Execution Wednesday, March 8, 2006.
1 Lecture 25 Friday, November 30, Outline Query execution –Two pass algorithms based on indexes (6.7) Query optimization –From SQL to logical.
CSE 544: Relational Operators, Sorting Wednesday, 5/12/2004.
CS411 Database Systems Kazuhiro Minami 11: Query Execution.
CS411 Database Systems Kazuhiro Minami 12: Query Optimization.
CS4432: Database Systems II Query Processing- Part 2.
Lecture 17: Query Execution Tuesday, February 28, 2001.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Query Processing Spring 2016.
1 Lecture 23: Query Execution Monday, November 26, 2001.
CS4432: Database Systems II Query Processing- Part 1 1.
Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Query Optimization Spring 2016.
1 Lecture 24: Query Execution Monday, November 27, 2006.
Query Execution Chapter 15 Section 15.1 Presented by Khadke, Suvarna CS 257 (Section II) Id
15.1 – Introduction to physical-Query-plan operators
CS 540 Database Management Systems
CS 440 Database Management Systems
Database Management System
Lecture 24: Query Execution and Optimization
Lecture 16: Relational Operators
Chapter 15 QUERY EXECUTION.
Query Execution Presented by Khadke, Suvarna CS 257
Lecture 2 (cont’d) & Lecture 3: Advanced SQL – Part I
Evaluation of Relational Operations: Other Operations
Introduction to Database Systems
Cse 344 April 25th – Disk i/o.
Introduction to Database Systems CSE 444 Lecture 22: Query Optimization November 26-30, 2007.
Relational Operations
Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016
15.6 Index Based Algorithms
April 27th – Cost Estimation
Lectures 5: Introduction to SQL 4
Lecture 25: Query Execution
Query Execution Presented by Jiten Oswal CS 257 Chapter 15
Relational Algebra Friday, 11/14/2003.
Lecture 24: Query Execution
Lecture 25: Query Optimization
Lecture 13: Query Execution
Query Execution Index Based Algorithms (15.6)
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Lecture 23: Query Execution
Evaluation of Relational Operations: Other Techniques
Lecture 22: Query Execution
CSE 444: Lecture 25 Query Execution
Lecture 22: Query Execution
Monday, 5/13/2002 Hash table indexes, query optimization
Query Optimization March 7th, 2003.
External Sorting Sorting is used in implementing many relational operations Problem: Relations are typically large, do not fit in main memory So cannot.
Lecture 11: B+ Trees and Query Execution
CSE 544: Query Execution Wednesday, 5/12/2004.
Lecture 26 Monday, December 3, 2001.
CPSC-608 Database Systems
Lecture 22: Friday, November 22, 2002.
Evaluation of Relational Operations: Other Techniques
Lecture 24: Query Execution
Lecture 20: Query Execution
Lecture 24: Wednesday, November 27, 2002.
Presentation transcript:

Lecture 23: Monday, November 25, 2002

Outline Query execution: 15.1 – 15.5 Query optimization: algebraic laws 16.2

Indexed Based Algorithms Recall that in a clustered index all tuples with the same value of the key are clustered on as few blocks as possible Note: book uses another term: “clustering index”. Difference is minor… a a a a a a a a a a

Index Based Selection Selection on equality: sa=v(R) Clustered index on a: cost B(R)/V(R,a) Unclustered index on a: cost T(R)/V(R,a)

Index Based Selection B(R) = 2000 T(R) = 100,000 V(R, a) = 20 Example: Table scan: If R is clustered: B(R) = 2,000 I/Os If R is unclustered: T(R) = 100,000 I/Os Index based selection: If index is clustered: B(R)/V(R,a) = 100 If index is unclustered: T(R)/V(R,a) = 5,000 Notice: when V(R,a) is small, then unclustered index is useless cost of sa=v(R) = ?

Index Based Join R S Assume S has an index on the join attribute Iterate over R, for each tuple fetch corresponding tuple(s) from S Assume R is clustered. Cost: If index is clustered: B(R) + T(R)B(S)/V(S,a) If index is unclustered: B(R) + T(R)T(S)/V(S,a)

Index Based Join Assume both R and S have a sorted index (B+ tree) on the join attribute Then perform a merge join called zig-zag join Cost: B(R) + B(S)

Example Product(pname, maker), Company(cname, city) Clustered index: Product.pname, Company.cname Unclustered index: Product.maker, Company.city Select Product.pname From Product, Company Where Product.maker=Company.cname and Company.city = “Seattle”

Logical Plan: maker=cname scity=“Seattle” Product Company

scity=“Seattle” scity=“Seattle” Physical plan 1: Physical plans 2a and 2b: Index-based join Index-based selection Merge-join maker=cname maker=cname scity=“Seattle” scity=“Seattle” Product Company Company Product Index- scan Scan and sort (2a) index scan (2b)

Index-based selection: T(Company) / V(Company, city) Plan 1: Index-based selection: T(Company) / V(Company, city) Index-based join:  T(Product) / V(Product, maker) Plan 2: Table scan and selection on Company: B(Company) Plan 2a: scan and sort: 3B(Product) Plan 2b: index-scan: T(Product) Merge-join: their sum Plan 1: T(Company)/V(Company,city)  T(Product)/V(Product,maker) Plan 2a: B(Company) + 3B(Product) Plan 2b: B(Company) + T(Product)

Example T(Company) = 5,000 B(Company) = 500 M = 100 T(Product) = 100,000 B(Product) = 1,000 V(Company,city) = 2,000 V(Product,maker) = 20,000 Case 1: V(Company, city)  T(Company) V(Product, maker)  T(Product) Case 2: V(Company, city) << T(Company) V(Product, maker)  T(Product) Case 3: V(Company, city) << T(Company) V(Product, maker) << T(Product) V(Company,city) = 20 V(Product,maker) = 20,000 V(Company,city) = 20 V(Product,maker) = 100

Which Plan is Best ? Case 1: Case 2: Case 3: Plan 1: T(Company)/V(Company,city)  T(Product)/V(Product,maker) Plan 2a: B(Company) + 3B(Product) Plan 2b: B(Company) + T(Product) Case 1: Case 2: Case 3:

Optimization Chapter 16 At the hart of the database engine Step 1: convert the SQL query to some logical plan Step 2: find a better logical plan, find an associated physical plan

Converting from SQL to Logical Plans Select a1, …, an From R1, …, Rk Where C Pa1,…,an(s C(R1 R2 … Rk)) Select a1, …, an From R1, …, Rk Where C Group by b1, …, bl Pa1,…,an(g b1, …, bm, aggs (s C(R1 R2 … Rk)))

Converting Nested Queries Select distinct product.name From product Where product.maker in (Select company.name From company where company.city=“Seattle”) Select distinct product.name From product, company Where product.maker = company.name AND company.city=“Seattle”

Converting Nested Queries Select distinct x.name, x.maker From product x Where x.color= “blue” AND x.price >= ALL (Select y.price From product y Where x.maker = y.maker AND y.color=“blue”) How do we convert this one to logical plan ?

Converting Nested Queries Let’s compute the complement first: Select distinct x.name, x.maker From product x Where x.color= “blue” AND x.price < SOME (Select y.price From product y Where x.maker = y.maker AND y.color=“blue”)

Converting Nested Queries This one becomes a SFW query: Select distinct x.name, x.maker From product x, product y Where x.color= “blue” AND x.maker = y.maker AND y.color=“blue” AND x.price < y.price This returns exactly the products we DON’T want, so…

Converting Nested Queries (Select x.name, x.maker From product x Where x.color = “blue”) EXCEPT From product x, product y Where x.color= “blue” AND x.maker = y.maker AND y.color=“blue” AND x.price < y.price)

Optimization: the Logical Query Plan Now we have one logical plan Algebraic laws: foundation for every optimization Two approaches to optimizations: Heuristics: apply laws that seem to result in cheaper plans Cost based: estimate size and cost of intermediate results, search systematically for best plan All modern database optimizers use a cost-based optimizer Why ?

The three components of an optimzer We need three things in an optimizer: Algebraic laws An optimization algorithm A cost estimator

Algebraic Laws Commutative and Associative Laws Distributive Laws R  S = S  R, R  (S  T) = (R  S)  T R ∩ S = S ∩ R, R ∩ (S ∩ T) = (R ∩ S) ∩ T R ⋈ S = S ⋈ R, R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T Distributive Laws R ⋈ (S  T) = (R ⋈ S)  (R ⋈ T)

Algebraic Laws Laws involving selection: s C AND C’(R) = s C(s C’(R)) = s C(R) ∩ s C’(R) s C OR C’(R) = s C(R) U s C’(R) s C (R ⋈ S) = s C (R) ⋈ S When C involves only attributes of R s C (R – S) = s C (R) – S s C (R  S) = s C (R)  s C (S) s C (R ∩ S) = s C (R) ∩ S

Algebraic Laws Example: R(A, B, C, D), S(E, F, G) s F=3 (R ⋈D=E S) = ? s A=5 AND G=9 (R ⋈D=E S) = ?

Algebraic Laws Laws involving projections PM(R ⋈ S) = PN(PP(R) ⋈ PQ(S)) Where N, P, Q are appropriate subsets of attributes of M PM(PN(R)) = PM,N(R) Example R(A,B,C,D), S(E, F, G) PA,B,G(R ⋈ S) = P ? (P?(R) ⋈ P?(S))

Algebraic Laws Laws involving grouping and aggregation: (A, agg(B)(R)) = A, agg(B)(R) A, agg(B)((R)) = A, agg(B)(R) if agg is “duplicate insensitive” Which of the following are “duplicate insensitive” ? sum, count, avg, min, max A, agg(D)(R(A,B) ⋈B=C S(C,D)) = A, agg(D)(R(A,B) ⋈B=C (B, agg(D)S(C,D))) Why is this true ? Why would we do it ?