CSE 544: Lecture 14 Wednesday, 5/15/2002 Optimization, Size Estimation.

Heuristic Based Optimizations
Query rewriting based on algebraic laws
Results in better queries most of the time
Main heuristics:
– Push selections down the tree

Heuristic Based Optimizations
Original plan: Π pname (σ price>100 AND city=“Seattle” (Product ⋈ maker=name Company))
After pushing the selections down: Π pname (σ price>100 (Product) ⋈ maker=name σ city=“Seattle” (Company))
The earlier we process selections, the fewer tuples we need to manipulate higher up in the tree (but pushing a selection below a join may cost us the use of an index).

Heuristic Based Optimizations
Semi-join based optimizations
R ⋉ S = Π A1,…,An (R ⋈ S)
Where the schemas are:
– Input: R(A1,…,An), S(B1,…,Bm)
– Output: T(A1,…,An)

Heuristic Based Optimizations
Semijoins: motivated by distributed databases:
Product(pid, cid, pname,...) at site 1
Company(cid, cname,...) at site 2
Query: σ price>1000 (Product) ⋈ cid=cid Company
Compute as follows:
– T1 = σ price>1000 (Product) at site 1
– T2 = Π cid (T1) at site 1
– send T2 to site 2 (T2 smaller than T1)
– T3 = Company ⋉ T2 at site 2 (semijoin)
– send T3 to site 1 (T3 smaller than Company)
– Answer = T3 ⋈ T1 at site 1 (join)
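The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the `price` column, and the variable names `t1`..`t3` are made up for the example, and "shipping" between sites is just a comment rather than real network traffic.

```python
# Illustrative data only: the rows and the price column are assumptions.
product = [  # Product(pid, cid, pname, price), stored at site 1
    {"pid": 1, "cid": 10, "pname": "gizmo", "price": 1500},
    {"pid": 2, "cid": 20, "pname": "widget", "price": 200},
    {"pid": 3, "cid": 10, "pname": "gadget", "price": 3000},
]
company = [  # Company(cid, cname), stored at site 2
    {"cid": 10, "cname": "Acme"},
    {"cid": 20, "cname": "Globex"},
    {"cid": 30, "cname": "Initech"},
]

# Site 1: apply the selection, then project onto the join column.
t1 = [p for p in product if p["price"] > 1000]
t2 = {p["cid"] for p in t1}  # only the distinct cid values are shipped

# Site 2: semijoin, keep the Company tuples that match some shipped cid.
t3 = [c for c in company if c["cid"] in t2]

# Site 1: final join of the reduced Company with the selected products.
answer = [{**p, **c} for p in t1 for c in t3 if p["cid"] == c["cid"]]

# Reference result: the same query computed without the semijoin step.
direct = [
    {**p, **c}
    for p in product
    if p["price"] > 1000
    for c in company
    if p["cid"] == c["cid"]
]
```

In this toy instance only one `cid` value and one Company tuple cross the network, while the answer is the same as the direct join.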

Heuristic Based Optimizations
Semijoins: a bit of theory (see [AHV])
Given a conjunctive query:
   Q :- R1, R2, ..., Rn
A full reducer for Q is a program:
   Ri1 := Ri1 ⋉ Rj1
   Ri2 := Ri2 ⋉ Rj2
   ...
   Rip := Rip ⋉ Rjp
such that no dangling tuples remain in any relation

Heuristic Based Optimizations
Example:
   Q :- R1(A,B), R2(B,C), R3(C,D)
A full reducer is:
   R2(B,C) := R2(B,C), R1(A,B)
   R3(C,D) := R3(C,D), R2(B,C)
   R2(B,C) := R2(B,C), R3(C,D)
   R1(A,B) := R1(A,B), R2(B,C)
Example:
   Q :- R1(A,B), R2(B,C), R3(A,C)
Doesn’t have a full reducer (we can reduce forever)
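As a hedged illustration, the four-step full reducer for the chain query R1(A,B), R2(B,C), R3(C,D) can be run on a small instance. The helper `semijoin` and the sample tuples are assumptions made for this sketch; relations are lists of tuples and columns are addressed by position.

```python
def semijoin(r, s, r_col, s_col):
    """Return the tuples of r whose r_col value occurs in column s_col of s."""
    keys = {t[s_col] for t in s}
    return [t for t in r if t[r_col] in keys]


# Hypothetical instance containing some dangling tuples.
r1 = [(1, "b1"), (2, "b2"), (3, "b9")]            # R1(A,B)
r2 = [("b1", "c1"), ("b2", "c7"), ("b5", "c1")]   # R2(B,C)
r3 = [("c1", "d1"), ("c4", "d2")]                 # R3(C,D)

# The full reducer: one forward pass along the chain, then one backward pass.
r2 = semijoin(r2, r1, 0, 1)  # R2 := R2 semijoin R1 (match on B)
r3 = semijoin(r3, r2, 0, 1)  # R3 := R3 semijoin R2 (match on C)
r2 = semijoin(r2, r3, 1, 0)  # R2 := R2 semijoin R3 (match on C)
r1 = semijoin(r1, r2, 1, 0)  # R1 := R1 semijoin R2 (match on B)
```

After the four steps every remaining tuple takes part in the full join, so no dangling tuples are left in this instance.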

Heuristic Based Optimizations
Semijoins in [Chaudhuri’98]
   CREATE VIEW DepAvgSal AS (
      SELECT E.did, Avg(E.sal) AS avgsal
      FROM Emp E
      GROUP BY E.did)

   SELECT E.eid, E.sal
   FROM Emp E, Dept D, DepAvgSal V
   WHERE E.did = D.did AND E.did = V.did
      AND E.age < 30 AND D.budget > 100k
      AND E.sal > V.avgsal

Heuristic Based Optimizations
Semijoins in [Chaudhuri’98]
   CREATE VIEW PartialResult AS (
      SELECT E.eid, E.sal, E.did
      FROM Emp E, Dept D
      WHERE E.did = D.did AND E.age < 30 AND D.budget > 100k)

   CREATE VIEW Filter AS (
      SELECT DISTINCT P.did
      FROM PartialResult P)

   CREATE VIEW LimitedDepAvgSal AS (
      SELECT E.did, Avg(E.sal) AS avgsal
      FROM Emp E, Filter F
      WHERE E.did = F.did
      GROUP BY E.did)

Heuristic Based Optimizations
Semijoins in [Chaudhuri’98]
   SELECT P.eid, P.sal
   FROM PartialResult P, LimitedDepAvgSal V
   WHERE P.did = V.did AND P.sal > V.avgsal

Cost-Based Optimization
Main optimization unit:
– set of joins, i.e. single select-from-where block
– Hence: the join reordering problem
Optimization methods:
– Dynamic programming (System R, 1977), for joins:
   Conceptually cleanest
– Rule-based optimizations, for arbitrary queries:
   Volcano → SQL Server
   Starburst → DB2

Join Trees
R1 ⋈ R2 ⋈ … ⋈ Rn
Join tree: [figure: a join tree with leaves R3, R1, R2, R4]
A join tree represents a plan. An optimizer needs to inspect many (all?) join trees

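Types of Join Trees
Left deep: the right input of every join is a base relation
[figure: left-deep join tree with leaves R3, R1, R5, R2, R4]

Types of Join Trees
Bushy: both inputs of a join may themselves be joins
[figure: bushy join tree with leaves R3, R1, R2, R4, R5]

Types of Join Trees
Right deep: the left input of every join is a base relation
[figure: right-deep join tree with leaves R3, R1, R5, R2, R4]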

Problem
Given: a query R1 ⋈ R2 ⋈ … ⋈ Rn
Assume we have a function cost() that gives us the cost of every join tree
Find the best join tree for the query

Dynamic Programming
Idea: for each subset of {R1, …, Rn}, compute the best plan for that subset
In increasing order of set cardinality:
– Step 1: for {R1}, {R2}, …, {Rn}
– Step 2: for {R1,R2}, {R1,R3}, …, {Rn-1, Rn}
– …
– Step n: for {R1, …, Rn}
A subset of {R1, …, Rn} is also called a subquery

Dynamic Programming
For each subquery Q ⊆ {R1, …, Rn} compute the following:
– Size(Q)
– A best plan for Q: Plan(Q)
– The cost of that plan: Cost(Q)

Dynamic Programming
Step 1: For each {Ri} do:
– Size({Ri}) = B(Ri)
– Plan({Ri}) = Ri
– Cost({Ri}) = (cost of scanning Ri)

Dynamic Programming
Step i: For each Q ⊆ {R1, …, Rn} of cardinality i do:
– Compute Size(Q) (later…)
– For every pair of subqueries Q’, Q’’ s.t. Q = Q’ ∪ Q’’ compute cost(Plan(Q’) ⋈ Plan(Q’’))
– Cost(Q) = the smallest such cost
– Plan(Q) = the corresponding plan

Dynamic Programming
Return Plan({R1, …, Rn})

Dynamic Programming
To illustrate, we will make the following simplifications:
Cost(P1 ⋈ P2) = Cost(P1) + Cost(P2) + size(intermediate result(s))
Intermediate results:
– If P1 = a join, then the size of the intermediate result is size(P1), otherwise the size is 0
– Similarly for P2
Cost of a scan = 0

Dynamic Programming
Example:
Cost(R5 ⋈ R7) = 0 (no intermediate results)
Cost((R2 ⋈ R1) ⋈ R7) = Cost(R2 ⋈ R1) + Cost(R7) + size(R2 ⋈ R1) = size(R2 ⋈ R1)
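The simplified cost model can be written as a short recursive function. This is a sketch under assumptions: a plan is represented as a relation name (a scan) or a pair `(left, right)` (a join), and the size 40,000 used for R2 ⋈ R1 is a made-up number, since no sizes are given at this point.

```python
def is_join(plan):
    """A plan is either a relation name (scan) or a (left, right) pair (join)."""
    return isinstance(plan, tuple)


def cost(plan, size_of):
    """Simplified model: scans are free, and each input that is itself a join
    contributes the size of its (intermediate) result."""
    if not is_join(plan):
        return 0  # cost of a scan = 0
    left, right = plan
    total = cost(left, size_of) + cost(right, size_of)
    for child in (left, right):
        if is_join(child):
            total += size_of(child)
    return total


def size_of(plan):
    # Hypothetical size estimate, assumed only for this example.
    known = {("R2", "R1"): 40_000}
    return known[plan]


cost_two_scans = cost(("R5", "R7"), size_of)           # no intermediate results
cost_nested = cost((("R2", "R1"), "R7"), size_of)      # pays size(R2 join R1)
```

With these assumptions `cost_two_scans` is 0 and `cost_nested` equals the assumed size of R2 ⋈ R1, matching the two example lines.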

Dynamic Programming
Relations: R, S, T, U
Number of tuples: 2000, 5000, 3000, 1000
Size estimation: T(A ⋈ B) = 0.01*T(A)*T(B)

Table to fill in (columns: Subquery, Size, Cost, Plan), with one row for each subquery:
RS, RT, RU, ST, SU, TU, RST, RSU, RTU, STU, RSTU

Subquery   Size    Cost            Plan
RS         100k    0               RS
RT         60k     0               RT
RU         20k     0               RU
ST         150k    0               ST
SU         50k     0               SU
TU         30k     0               TU
RST        3M      60k             (RT)S
RSU        1M      20k             (RU)S
RTU        0.6M    20k             (RU)T
STU        1.5M    30k             (TU)S
RSTU       30M     60k+50k=110k    (RT)(SU)
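The table can be reproduced with a small dynamic-programming sketch. The function name `optimize`, the nested-tuple plan representation, and the integer selectivity denominator (100, standing for the factor 0.01) are choices made for this illustration only. Sizes are in tuples, scans cost 0, and a join pays the size of each input that is itself a join.

```python
from itertools import combinations

CARD = {"R": 2000, "S": 5000, "T": 3000, "U": 1000}
SEL_DENOM = 100  # selectivity 0.01 per join, kept as an integer for exact arithmetic


def optimize(card, sel_denom=SEL_DENOM):
    names = sorted(card)
    size, cost, plan = {}, {}, {}

    # Step 1: single relations.
    for r in names:
        q = frozenset([r])
        size[q], cost[q], plan[q] = card[r], 0, r

    # Step k: all subqueries with k relations, in increasing order of k.
    for k in range(2, len(names) + 1):
        for combo in combinations(names, k):
            q = frozenset(combo)
            est = 1
            for r in combo:
                est *= card[r]
            size[q] = est // sel_denom ** (k - 1)

            best = None
            first, rest = combo[0], combo[1:]
            # Keep `first` on the left side so each split is tried once.
            for m in range(len(rest)):
                for extra in combinations(rest, m):
                    left = frozenset((first,) + extra)
                    right = q - left
                    c = cost[left] + cost[right]
                    if len(left) > 1:
                        c += size[left]   # left input is an intermediate result
                    if len(right) > 1:
                        c += size[right]  # right input is an intermediate result
                    if best is None or c < best[0]:
                        best = (c, left, right)
            cost[q] = best[0]
            plan[q] = (plan[best[1]], plan[best[2]])
    return size, cost, plan


size, cost, plan = optimize(CARD)
```

Because join inputs are treated symmetrically here, a plan may come out as the mirror image of the one in the table (for example S joined with (TU) instead of (TU)S); the cost is the same.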

Dynamic Programming
Summary: computes optimal plans for subqueries:
– Step 1: {R1}, {R2}, …, {Rn}
– Step 2: {R1, R2}, {R1, R3}, …, {Rn-1, Rn}
– …
– Step n: {R1, …, Rn}
We used naïve size/cost estimations
In practice:
– more realistic size/cost estimations (next)
– heuristics for reducing the search space
   Restrict to left linear trees
   Restrict to trees “without cartesian product”
– need more than just one plan for each subquery: “interesting orders”

Rule-based Optimizations
Volcano:
– Main idea: let programmers define rewrite rules, based on the algebraic laws
– System searches for “best plan” by applying laws repeatedly
– Need to avoid cycles, etc.
– Join reordering becomes harder, but can handle other operators too
Starburst:
– Same, but keep larger nodes, corresponding to one select-from-where block
– Apply rewrite rules across blocks
– Do dynamic programming inside blocks