CSE 544: Lecture 14 Wednesday, 5/15/2002 Optimization, Size Estimation.



Heuristic Based Optimizations
Query rewriting based on algebraic laws
Results in better queries most of the time
Main heuristics:
–Push selections down the tree

Heuristic Based Optimizations
Before: π[pname](σ[price>100 AND city=“Seattle”](Product ⋈[maker=name] Company))
After: π[pname](σ[price>100](Product) ⋈[maker=name] σ[city=“Seattle”](Company))
The earlier we process selections, the fewer tuples we need to manipulate higher up in the tree (but pushing them down may cost us the use of indexes).
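To make the effect of pushing selections down concrete, here is a minimal sketch that evaluates both plans over in-memory lists. The rows and the helper name `join` are made up for illustration; only the predicates (price>100, city=“Seattle”, maker=name) come from the slide.

```python
# Toy relations; all rows are made-up illustration data.
product = [  # (pname, price, maker)
    ("gizmo", 150, "Acme"),
    ("widget", 50, "Acme"),
    ("gadget", 300, "Bolt"),
    ("doohickey", 120, "Cogs"),
]
company = [  # (name, city)
    ("Acme", "Seattle"),
    ("Bolt", "Portland"),
    ("Cogs", "Seattle"),
]

def join(products, companies):
    """Nested-loop join on maker = name (hypothetical helper)."""
    return [(p, c) for p in products for c in companies if p[2] == c[0]]

# Plan 1: join first, then apply the whole selection, then project on pname.
joined = join(product, company)
plan1 = sorted(p[0] for p, c in joined if p[1] > 100 and c[1] == "Seattle")

# Plan 2: push each selection below the join, then project on pname.
expensive = [p for p in product if p[1] > 100]
in_seattle = [c for c in company if c[1] == "Seattle"]
pushed = join(expensive, in_seattle)
plan2 = sorted(p[0] for p, c in pushed)

# Number of tuples each plan has to carry out of the join.
joined_size = len(joined)
pushed_size = len(pushed)
print(plan1, plan2, joined_size, pushed_size)
```

Both plans return the same product names, but the join in the second plan produces fewer tuples on this data.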

Heuristic Based Optimizations
Semi-join based optimizations
R ⋉ S = π[A1,…,An](R ⋈ S)
Where the schemas are:
–Input: R(A1,…,An), S(B1,…,Bm)
–Output: T(A1,…,An)

Heuristic Based Optimizations
Semijoins: motivated by distributed databases:
Product(pid, cid, pname, ...) at site 1
Company(cid, cname, ...) at site 2
Query: σ[price>1000](Product) ⋈[cid=cid] Company
Compute as follows:
T1 = σ[price>1000](Product)   site 1
T2 = π[cid](T1)   site 1
send T2 to site 2 (T2 smaller than T1)
T3 = Company ⋉ T2   site 2 (semijoin)
send T3 to site 1 (T3 smaller than Company)
Answer = T3 ⋈ T1   site 1 (join)
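The two-site protocol above can be simulated in a few lines. This is only a sketch: the rows are invented, "sending" is just passing a Python value, and the tuple layouts (price as the fourth Product column) are assumptions, since the slide elides the remaining attributes.

```python
# Site 1 holds Product, site 2 holds Company (made-up rows).
product = [  # (pid, cid, pname, price); price position is an assumption
    (1, 10, "laptop", 1500),
    (2, 10, "mouse", 20),
    (3, 20, "server", 9000),
    (4, 30, "cable", 5),
]
company = [  # (cid, cname)
    (10, "Acme"), (20, "Bolt"), (30, "Cogs"), (40, "Dyne"),
]

# Site 1: select, then project on cid (a set removes duplicates).
t1 = [p for p in product if p[3] > 1000]
t2 = {p[1] for p in t1}

# "Send" t2 to site 2, where Company is semijoined with it.
t3 = [c for c in company if c[0] in t2]

# "Send" t3 back to site 1 and finish with an ordinary join.
answer = [(p, c) for c in t3 for p in t1 if p[1] == c[0]]

# Reference result: evaluate the query as if both relations were local.
direct = [(p, c) for c in company for p in product
          if p[3] > 1000 and p[1] == c[0]]
print(answer == direct, len(t2), len(t3), len(company))
```

Only `t2` and `t3` cross the network here, and on this data `t3` has half as many rows as Company.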

Heuristic Based Optimizations
Semijoins: a bit of theory (see [AHV])
Given a conjunctive query:
Q :- R1, R2, ..., Rn
A full reducer for Q is a program:
Ri1 := Ri1 ⋉ Rj1
Ri2 := Ri2 ⋉ Rj2
...
Rip := Rip ⋉ Rjp
Such that no dangling tuples remain in any relation

Heuristic Based Optimizations
Example:
Q :- R1(A,B), R2(B,C), R3(C,D)
A full reducer is:
R2(B,C) := R2(B,C), R1(A,B)
R3(C,D) := R3(C,D), R2(B,C)
R2(B,C) := R2(B,C), R3(C,D)
R1(A,B) := R1(A,B), R2(B,C)
Example:
Q :- R1(A,B), R2(B,C), R3(A,C)
Doesn’t have a full reducer (we can reduce forever)
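The four-step full reducer for the chain query R1(A,B), R2(B,C), R3(C,D) can be run directly on small relations. The sketch below assumes each rule is a semijoin on the shared attribute; the helper `semijoin` and all rows are hypothetical.

```python
def semijoin(left, right, left_col, right_col):
    """Keep the tuples of `left` that match some tuple of `right`."""
    keys = {r[right_col] for r in right}
    return [t for t in left if t[left_col] in keys]

# Made-up instances; several tuples are dangling on purpose.
r1 = [(1, "b1"), (2, "b2"), (3, "b9")]            # R1(A,B)
r2 = [("b1", "c1"), ("b2", "c7"), ("b5", "c1")]   # R2(B,C)
r3 = [("c1", "d1"), ("c4", "d2")]                 # R3(C,D)

# The full reducer from the slide, one semijoin per rule.
r2 = semijoin(r2, r1, 0, 1)   # R2 := R2 semijoin R1 (on B)
r3 = semijoin(r3, r2, 0, 1)   # R3 := R3 semijoin R2 (on C)
r2 = semijoin(r2, r3, 1, 0)   # R2 := R2 semijoin R3 (on C)
r1 = semijoin(r1, r2, 1, 0)   # R1 := R1 semijoin R2 (on B)

print(r1, r2, r3)
```

On this data the only result of the full join is (1, b1, c1, d1), and after the four steps each relation holds exactly the tuple that contributes to it.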

Heuristic Based Optimizations
Semijoins in [Chaudhuri’98]
CREATE VIEW DepAvgSal AS (
  SELECT E.did, Avg(E.sal) AS avgsal
  FROM Emp E
  GROUP BY E.did)
SELECT E.eid, E.sal
FROM Emp E, Dept D, DepAvgSal V
WHERE E.did = D.did AND E.did = V.did
  AND E.age < 30 AND D.budget > 100k
  AND E.sal > V.avgsal

Heuristic Based Optimizations
Semijoins in [Chaudhuri’98]
CREATE VIEW PartialResult AS (
  SELECT E.eid, E.sal, E.did
  FROM Emp E, Dept D
  WHERE E.did = D.did AND E.age < 30 AND D.budget > 100k)
CREATE VIEW Filter AS (
  SELECT DISTINCT P.did
  FROM PartialResult P)
CREATE VIEW LimitedAvgSal AS (
  SELECT E.did, Avg(E.sal) AS avgsal
  FROM Emp E, Filter F
  WHERE E.did = F.did
  GROUP BY E.did)

Heuristic Based Optimizations
Semijoins in [Chaudhuri’98]
SELECT P.eid, P.sal
FROM PartialResult P, LimitedAvgSal V
WHERE P.did = V.did AND P.sal > V.avgsal
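One way to convince yourself that the rewritten query is equivalent to the original is to run both on a small database. The sketch below uses Python's built-in sqlite3 module; the table contents are made up, the column types are assumptions, and the slide's shorthand 100k is written as 100000 so that the SQL parses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Emp (eid INTEGER, did INTEGER, sal INTEGER, age INTEGER);
CREATE TABLE Dept (did INTEGER, budget INTEGER);
-- Made-up rows for illustration only.
INSERT INTO Emp VALUES (1, 1, 50, 25), (2, 1, 30, 40), (3, 2, 90, 28),
                       (4, 2, 95, 29), (5, 3, 70, 22), (6, 3, 10, 50);
INSERT INTO Dept VALUES (1, 200000), (2, 500000), (3, 50000);

-- View used by the original query: averages for every department.
CREATE VIEW DepAvgSal AS
  SELECT E.did, AVG(E.sal) AS avgsal FROM Emp E GROUP BY E.did;

-- Views used by the rewritten query: averages only for departments
-- that can still contribute to the answer.
CREATE VIEW PartialResult AS
  SELECT E.eid, E.sal, E.did FROM Emp E, Dept D
  WHERE E.did = D.did AND E.age < 30 AND D.budget > 100000;
CREATE VIEW "Filter" AS            -- quoted to avoid any keyword clash
  SELECT DISTINCT P.did FROM PartialResult P;
CREATE VIEW LimitedAvgSal AS
  SELECT E.did, AVG(E.sal) AS avgsal FROM Emp E, "Filter" F
  WHERE E.did = F.did GROUP BY E.did;
""")

original = conn.execute("""
SELECT E.eid, E.sal FROM Emp E, Dept D, DepAvgSal V
WHERE E.did = D.did AND E.did = V.did
  AND E.age < 30 AND D.budget > 100000 AND E.sal > V.avgsal
ORDER BY E.eid""").fetchall()

rewritten = conn.execute("""
SELECT P.eid, P.sal FROM PartialResult P, LimitedAvgSal V
WHERE P.did = V.did AND P.sal > V.avgsal
ORDER BY P.eid""").fetchall()

print(original, rewritten)
```

The point of the rewriting is that LimitedAvgSal aggregates only the departments listed in Filter, whereas DepAvgSal aggregates all of them.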

Cost-Based Optimization
Main optimization unit:
–set of joins, i.e. single select-from-where block
–Hence: the join reordering problem
Optimization methods:
–Dynamic programming (System R, 1977), for joins: conceptually cleanest
–Rule-based optimizations, for arbitrary queries:
  Volcano → SQL Server
  Starburst → DB2

Join Trees
R1 ⋈ R2 ⋈ … ⋈ Rn
Join tree: [figure: a join tree over R3, R1, R2, R4]
A join tree represents a plan. An optimizer needs to inspect many (all?) join trees

Types of Join Trees
Left deep: the right operand of every join is a base relation
[figure: left-deep join tree over R1, …, R5]

Types of Join Trees
Bushy: both operands of a join may themselves be joins
[figure: bushy join tree over R1, …, R5]

Types of Join Trees
Right deep: the left operand of every join is a base relation
[figure: right-deep join tree over R1, …, R5]

Problem
Given: a query R1 ⋈ R2 ⋈ … ⋈ Rn
Assume we have a function cost() that gives us the cost of every join tree
Find the best join tree for the query
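To get a feel for why inspecting all join trees is expensive, the sketch below counts them. It assumes the order of the two operands of a join matters and that cartesian products are allowed; the function names are hypothetical. The bushy count uses the standard identity that there are Catalan(n-1) binary tree shapes with n leaves, each of which can be labelled in n! ways.

```python
from math import factorial

def left_deep_trees(n):
    """Left-deep trees over n relations: one per ordering of the relations."""
    return factorial(n)

def bushy_trees(n):
    """All binary join trees over n relations (left-deep ones included).

    Catalan(n-1) shapes times n! leaf orders simplifies to (2n-2)!/(n-1)!.
    """
    return factorial(2 * n - 2) // factorial(n - 1)

for n in (2, 3, 4, 5, 10):
    print(n, left_deep_trees(n), bushy_trees(n))
```

Already at five relations there are 1680 trees to compare, which motivates reusing the best plans of subqueries instead of enumerating whole trees.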

Dynamic Programming
Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset
In increasing order of set cardinality:
–Step 1: for {R1}, {R2}, …, {Rn}
–Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn}
–…
–Step n: for {R1, …, Rn}
A subset of {R1, …, Rn} is also called a subquery

Dynamic Programming
For each subquery Q ⊆ {R1, …, Rn} compute the following:
–Size(Q)
–A best plan for Q: Plan(Q)
–The cost of that plan: Cost(Q)

Dynamic Programming
Step 1: For each {Ri} do:
–Size({Ri}) = B(Ri)
–Plan({Ri}) = Ri
–Cost({Ri}) = (cost of scanning Ri)

Dynamic Programming
Step i: For each Q ⊆ {R1, …, Rn} of cardinality i do:
–Compute Size(Q) (later…)
–For every pair of subqueries Q’, Q’’ s.t. Q = Q’ ∪ Q’’ compute cost(Plan(Q’) ⋈ Plan(Q’’))
–Cost(Q) = the smallest such cost
–Plan(Q) = the corresponding plan

Dynamic Programming Return Plan({R1, …, Rn})

Dynamic Programming
To illustrate, we will make the following simplifications:
Cost(P1 ⋈ P2) = Cost(P1) + Cost(P2) + size(intermediate result(s))
Intermediate results:
–If P1 = a join, then the size of the intermediate result is size(P1), otherwise the size is 0
–Similarly for P2
Cost of a scan = 0

Dynamic Programming
Example:
Cost(R5 ⋈ R7) = 0 (no intermediate results)
Cost((R2 ⋈ R1) ⋈ R7) = Cost(R2 ⋈ R1) + Cost(R7) + size(R2 ⋈ R1) = size(R2 ⋈ R1)

Dynamic Programming
Relations: R, S, T, U
Number of tuples: 2000, 5000, 3000, 1000
Size estimation: T(A ⋈ B) = 0.01*T(A)*T(B)

Subquery  Size  Cost  Plan
RS
RT
RU
ST
SU
TU
RST
RSU
RTU
STU
RSTU

Subquery  Size   Cost              Plan
RS        100k   0                 RS
RT        60k    0                 RT
RU        20k    0                 RU
ST        150k   0                 ST
SU        50k    0                 SU
TU        30k    0                 TU
RST       3M     60k               (RT)S
RSU       1M     20k               (RU)S
RTU       0.6M   20k               (RU)T
STU       1.5M   30k               (TU)S
RSTU      30M    60k+50k=110k      (RT)(SU)
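The table can be reproduced with a short dynamic program. This is a sketch under the simplified model of the preceding slides (scans cost 0, a join pays the size of each operand that is itself a join, and every join has selectivity 0.01); the function name `best_plans` and the nested-tuple plan encoding are assumptions. The selectivity is passed as an integer divisor so that all sizes stay exact integers.

```python
from itertools import combinations

def best_plans(card, divisor=100):
    """card maps relation name -> tuple count; 1/divisor is the selectivity."""
    names = sorted(card)
    size, cost, plan = {}, {}, {}
    # Step 1: single relations. A scan costs 0 in the simplified model.
    for r in names:
        q = frozenset([r])
        size[q], cost[q], plan[q] = card[r], 0, r
    # Step k: subqueries of cardinality k, in increasing order of k.
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            q = frozenset(combo)
            # T(A join B) = T(A) * T(B) / divisor, peeling off one relation.
            first = combo[0]
            size[q] = card[first] * size[q - {first}] // divisor
            best = None
            # Try every split of q into two non-empty subqueries.
            for i in range(1, k // 2 + 1):
                for left in combinations(combo, i):
                    q1 = frozenset(left)
                    q2 = q - q1
                    c = cost[q1] + cost[q2]
                    # An operand that is a join is an intermediate result.
                    c += size[q1] if len(q1) > 1 else 0
                    c += size[q2] if len(q2) > 1 else 0
                    if best is None or c < best:
                        best = c
                        plan[q] = (plan[q1], plan[q2])
            cost[q] = best
    return size, cost, plan

size, cost, plan = best_plans({"R": 2000, "S": 5000, "T": 3000, "U": 1000})
full = frozenset("RSTU")
print(size[full], cost[full], plan[full])
```

Because the cost model is symmetric in the two operands, a three-relation plan may come out as S joined with (RT) rather than (RT)S; the costs are the same as in the table.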

Dynamic Programming
Summary: computes optimal plans for subqueries:
–Step 1: {R1}, {R2}, …, {Rn}
–Step 2: {R1, R2}, {R1, R3}, …, {Rn-1, Rn}
–…
–Step n: {R1, …, Rn}
We used naïve size/cost estimations
In practice:
–more realistic size/cost estimations (next)
–heuristics for reducing the search space:
  Restrict to left linear trees
  Restrict to trees “without cartesian product”
–need more than just one plan for each subquery: “interesting orders”

Rule-based Optimizations
Volcano:
–Main idea: let programmers define rewrite rules, based on the algebraic laws
–System searches for “best plan” by applying laws repeatedly
–Need to avoid cycles, etc.
–Join-reordering becomes harder, but can handle other operators too
Starburst:
–Same, but keep larger nodes, corresponding to one select-from-where block
–Apply rewrite rules inter-blocks
–Do dynamic programming inside blocks