1 Query Optimization. 2 Why Optimize? Given a query of size n and a database of size m, how big can the output of applying the query to the database be?

Slides:



Advertisements
Similar presentations
Examples of Physical Query Plan Alternatives
Advertisements

Overview of Query Evaluation (contd.) Chapter 12 Ramakrishnan and Gehrke (Sections )
Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.
Query Optimization Goal: Declarative SQL query
1 Overview of Query Evaluation Chapter Objectives  Preliminaries:  Core query processing techniques  Catalog  Access paths to data  Index matching.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Query Evaluation. SQL to ERA SQL queries are translated into extended relational algebra. Query evaluation plans are represented as trees of relational.
1 Relational Query Optimization Module 5, Lecture 2.
ICS 421 Spring 2010 Relational Query Optimization Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 9/8/20091Lipyeow.
Spring 2003 ECE569 Lecture 06.1 ECE 569 Database System Engineering Spring 2003 Topic VIII: Query Execution and optimization Yanyong Zhang
1 King Saud University College of Computer & Information Sciences IS 335 Database Management System Lecture 6 Query Processing and Optimization (Practice)
Relational Query Optimization 198:541. Overview of Query Optimization  Plan: Tree of R.A. ops, with choice of alg for each op. Each operator typically.
DB performance tuning using indexes Section 8.5 and Chapters 20 (Raghu)
Query Rewrite: Predicate Pushdown (through grouping) Select bid, Max(age) From Reserves R, Sailors S Where R.sid=S.sid GroupBy bid Having Max(age) > 40.
Relational Query Optimization (this time we really mean it)
Query Optimization Chapter 15. Query Evaluation Catalog Manager Query Optmizer Plan Generator Plan Cost Estimator Query Plan Evaluator Query Parser Query.
Overview of Query Evaluation R&G Chapter 12 Lecture 13.
1 Optimization - Selection. 2 The Selection Operation Table: Reserves(sid, bid, day, agent) A page (block) can hold 100 Reserves tuples There are 1,000.
Query Optimization Example Source: Query Optimization, Y. E. Ioannidis, ACM Computing Surveys, 28(1), March Database Tables: Emp (name, age, sal,
Query Optimization II R&G, Chapters 12, 13, 14 Lecture 9.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Query Optimization Overview Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 1, 2005 Some slide content derived.
CS186 Final Review Query Optimization.
Introduction to Database Systems 1 Join Algorithms Query Processing: Lecture 1.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapter 15.
Query Optimization Overview Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 2, 2004 Some slide content derived.
1 Query Optimization Vishy Poosala Bell Labs. 2 Outline Introduction Necessary Details –Cost Estimation –Result Size Estimation Standard approach for.
1 Optimization. 2 Why Optimize? Given a query of size n and a database of size m, how big can the output of applying the query to the database be? Example:
Overview of Query Optimization v Plan : Tree of R.A. ops, with choice of alg for each op. –Each operator typically implemented using a `pull’ interface:
1 Implementation of Relational Operations: Joins.
Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Overview of Implementing Relational Operators and Query Evaluation
Introduction to Database Systems1 Relational Query Optimization Query Processing: Topic 2.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Implementing Relational Operators and Query Evaluation Chapter 12.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Database systems/COMP4910/Melikyan1 Relational Query Optimization How are SQL queries are translated into relational algebra? How does the optimizer estimates.
Advanced Databases: Lecture 8 Query Optimization (III) 1 Query Optimization Advanced Databases By Dr. Akhtar Ali.
Database Systems/comp4910/spring20031 Evaluation of Relational Operations Why does a DBMS implements several algorithms for each algebra operation? What.
1 Relational Query Optimization Chapter Query Blocks: Units of Optimization  An SQL query is parsed into a collection of query blocks :  An SQL.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Implementing Relational Operators and Query Evaluation Chapter 12.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.
1 Database Systems ( 資料庫系統 ) December 13, 2004 Chapter 15 By Hao-hua Chu ( 朱浩華 )
More Optimization Exercises. Block Nested Loops Join Suppose there are B buffer pages Cost: M + ceil (M/(B-2))*N where –M is the number of pages of R.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
Implementation of Database Systems, Jarek Gryz1 Relational Query Optimization Chapters 12.
Cost Estimation For each plan considered, must estimate cost: –Must estimate cost of each operation in plan tree. Depends on input cardinalities. –Must.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction To Query Optimization and Examples Chpt
Database Applications (15-415) DBMS Internals- Part IX Lecture 20, March 31, 2016 Mohammad Hammoud.
Query Optimization. overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin) DBA,
Query Optimization Kush Kashyap B.Tech -IT.
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Introduction to Query Optimization
Overview of Query Optimization
Introduction to Database Systems
Examples of Physical Query Plan Alternatives
Query Optimization Overview
Relational Query Optimization
Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud.
Query Optimization Overview
Query Optimization.
Relational Query Optimization
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Relational Query Optimization (this time we really mean it)
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Relational Query Optimization
Relational Query Optimization
Presentation transcript:

1 Query Optimization

2 Why Optimize? Given a query of size n and a database of size m, how big can the output of applying the query to the database be? Example: R(A) with 2 rows. One row has value 0. One row has value 1. –How many rows are in R x R? –How many in R x R x R?  Size of output as a function of input: O( ? )

3 Data Complexity Usually, queries are small. Therefore, it is usually assumed that queries are of a fixed size. Use term Data Complexity when we analyze time, assuming that query is constant What is the size of the output in this case?

4 Optimizer Architecture Algebraic Space Method-Structure Space Cost Model Size-Distribution Estimator Planner Rewriter

5 Optimizer Architecture Rewriter: Finds equivalent queries that, perhaps can be computed more efficiently. All such queries are passed on to the Planner. –Examples of Equivalent queries: Join orderings Planner: Examines all possible execution plans and chooses the cheapest one, i.e., fastest one. –Uses other modules to find best plan.

6 Optimizer Architecture (cont.) Algebraic Space: Determines which types of queries will be examined. –Example: Try to avoid Cartesian Products Method-Space Structure: Determines what types of indexes are available and what types of algorithms for algebraic operations can be used. –Example: Which types of join algorithms can be used

7 Optimizer Architecture (cont.) Cost Model: Estimates the cost of execution plans. –Uses Size-Distribution Estimator for this. Size-Distribution Estimator: Estimates size of tables, intermediate results, frequency distribution of attributes and size of indexes.

8 Algebraic Space We consider queries that consist of select, project and join. (Cartesian product is a special case of join.) Such queries can be represented by a tree. Example: emp(name, age, sal, dno) dept(dno, dname, floor, mgr, ano) act(ano, type, balance, bno) bank(bno, bname, address) select name, floor from emp, dept where emp.dno=dept.dno and sal>100K

9 3 Trees  name, floor  dno=dno  sal>100K EMP DEPT  name, floor  dno=dno  sal>100K EMPDEPT  name, floor  dno=dno  sal>100K EMP DEPT  dno, floor  name,sal,dno  dno, name T1T2T3

10 Restriction 1 of Algebraic Space Algebraic space may contain many equivalent queries Important to restrict space – Why? Restriction 1: Only allow queries for which selection and projection: –are processed as early as possible –are processed on the fly Which trees in our example conform to Restriction 1?

11 Performing Selection and Projection "On the Fly" Selection and projection are performed as part of other actions Projection and selection that appear one after another are performed one immediately after another  Projection and Selection do not require writing to the disk Selection is performed while reading relations for the first time Projection is performed while computing answers from previous action

12 Processing Selection/Projection as Early as Possible The three trees differ in the way that selection and projection are performed In T3, there is "maximal pushing of selection and projection" –Rewriter finds such expressions Why is it good to push selection and projection?

13 Restriction 2 of Algebraic Space Since the order of selection and projection is determined, we can write trees only with joins. Restriction 2: Cross products are never formed, unless the query asks for them. Why this restriction? Example: select name, floor, balance from emp, dept, acnt where emp.dno=dept.dno and dept.ano = acnt.ano

14 3 Trees  ano=ano EMPDEPT T1T2T3  dno=dno ACNT  ano=ano, dno=dno EMPACNT  ACNT  dno=dno ACNTDEPT  ano=ano EMP Which trees have cross products?

15 Restriction 3 of Algebraic Space The left relation is called the outer relation in a join and the right relation is the inner relation. (as in terminology of nested loops algorithms) Restriction 3: The inner operand of each join is a database relation, not an intermediate result. Example: select name, floor, balance from emp, dept, acnt, bank where emp.dno=dept.dno and dept.ano=acnt.ano and acnt.bno = bank.bno

16 3 Trees  ano=ano EMPDEPT T1T2T3  dno=dno ACNT  ano=ano EMP DEPT  dno=dno ACNT Which trees follow restriction 3?  bno=bno BANK  bno=bno  ano=ano EMP DEPT  dno=dno ACNT  bno=bno BANK

17 Algebraic Space - Summary We allow plans that 1.Perform selection and projection early and on the fly 2.Do not create cross products 3.Use database relations as inner relations (also called left –deep trees)

18Planner Dynamic programming algorithm to find best plan for performing join of N relations. Intuition: –Find all ways to access a single relation. Estimate costs and choose best access plan(s) –For each pair of relations, consider all ways to compute joins using all access plans from previous step. Choose best plan(s)... –For each i-1 relations joined, find best option to extend to i relations being joined... –Given all plans to compute join of n relations, output the best.

19 Pipelining Joins Consider computing: (Emp  Dept)  Acnt. In principle, we should –compute Emp  Dept, write the result to the disk –then read it from the disk to join it with Acnt When using block and index nested loops join, we can avoid the step of writing to the disk. Do you understand now restriction 3?

20 Pipelining Joins - Example Read block from Emp Find matching Dept tuples using index Find matching Acnt tuples using index Write final output Emp blocks Dept blocks Acnt blocks Output blocks Buffer

21 Reminder: Dynamic Programming To find an optimal plan for joining R 1, R 2, R 3, R 4, choose the best among: –Optimal plan for joining R 2, R 3, R 4 + for reading R 1 + optimal join of R 1 with result of previous joins –Optimal plan for joining R 1, R 3, R 4 + for reading R 2 + optimal join of R 2 with result of previous joins –Optimal plan for joining R 1, R 2, R 4 + for reading R 3 + optimal join of R 3 with result of previous joins –Optimal plan for joining R 1, R 2, R 3 + for reading R 4 + optimal join of R 4 with result of previous joins

22 Not Good Enough Example, suppose we are computing (R(A,B)  S(B,C))  T(B,D) Maybe merge-sort join of R and S is not the most efficient, but the result is sorted on B If T is sorted on B, the performing a sort- merge join of R and S, and then of the result with T, maybe the cheapest total plan

23 Interesting Orders For some joins, such as sort-merge join, the cost is cheaper if relations are ordered Therefore, it is of interest to create plans where attributes that participate in a join are ordered on attributes in joins later on For each interesting order, save the best plan. We save plans for non order if it better than all interesting order costs

24 Example We want to compute the query: select name, mgr from emp, dept where emp.dno=dept.dno and sal>30K and floor = 2 Available Indexes: B+tree index on emp.sal, B+tree index on emp.dno, hashing index on dept.floor Join Methods: Block nested loops, index nested loops and sort-merge In the example, all cost estimations are fictional.

25 Step 1 – Accessing Single Relations Relation Interesting Order PlanCost empemp.dnoAccess through B+tree on emp.dno700 Access through B+tree on emp.sal Sequential scan deptAccess through hashing on dept.floor Sequential scan Which do we save for the next step?

26 Step 2 – Joining 2 Relations Join Method Outer/InnerPlanCost nested loops empt/dno For each emp tuple obtained through B+Tree on emp.sal, scan dept through hashing index on dept.floor to find tuples matching on dno 1800 For each emp tuple obtained through B+Tree on emp.dno and satisfying selection, scan dept through hashing index on dept.floor to find tuples matching on dno 3000

27 Step 2 – Joining 2 Relations Join Method Outer/InnerPlanCost nested loops dept/empFor each dept tuple obtained through hashing index on dept.floor, scan emp through B+Tree on emp.sal to find tuples matching on dno 2500 For each dept tuple obtained through hashing index on dept.floor, scan emp through B+Tree on emp.dno to find tuples satisfying the selection on emp.sal 1500

28 Step 2 – Joining 2 Relations Join Method Outer/ Inner PlanCost sort merge Sort the emp tuples resulting from accessing the B+Tree on emp.sal into L1 Sort the dept tuples resulting from accessing the hashing index on dept.floor into L2 Merge L1 and L Sort the dept tuples resulting from accessing the hashing index on dept.floor into L2 Merge L2 and the emp tuples resulting from accessing the B+Tree on emp.dno and satisfying the selection on emp.sal 2000

29 The Plan Which plan will be chosen?

30 Cost Model Taught In class: estimate time of computing joins Now: Estimating result size

31 Estimating Result Sizes

32 Picking a Query Plan Suppose we want to find the natural join of: Reserves, Sailors, Boats. The 2 options that appear the best are (ignoring the order within a single join): (Sailors  Reserves)  Boats Sailors  (Reserves  Boats) We would like intermediate results to be as small as possible. Which is better?

33 Analyzing Result Sizes In order to answer the question in the previous slide, we must be able to estimate the size of (Sailors  Reserves) and (Reserves  Boats). The DBMS stores statistics about the relations and indexes. They are updated periodically (not every time the underlying relations are modified).

34 Statistics Maintained by DBMS Cardinality: Number of tuples NTuples(R) in each relation R Size: Number of pages NPages(R) in each relation R Index Cardinality: Number of distinct key values NKeys(I) for each index I Index Size: Number of pages INPages(I) in each index I Index Height: Number of non-leaf levels IHeight(I) in each B+ Tree index I Index Range: The minimum ILow(I) and maximum value IHigh(I) for each index I

35 Estimating Result Sizes Consider The maximum number of tuples is the product of the cardinalities of the relations in the FROM clause The WHERE clause is associating a reduction factor with each term Estimated result size is: maximum size times product of reduction factors SELECT attribute-list FROM relation-list WHERE term 1 and... and term n

36 Estimating Reduction Factors column = value: 1/NKeys(I) if there is an index I on column. This assumes a uniform distribution. Otherwise, System R assumes 1/10. column1 = column2: 1/Max(NKeys(I1),NKeys(I2)) if there is an index I1 on column1 and I2 on column2. If only one column has an index, we use it to estimate the value. Otherwise, use 1/10. column > value: (High(I)-value)/(High(I)- Low(I)) if there is an index I on column.

37 Example Cardinality(R) = 1,000 * 100 = 100,000 Cardinality(S) = 500 * 80 = 40,000 NKeys(Index on R.agent) = 100 High(Index on Rating) = 10, Low = 0 SELECT * FROM Reserves R, Sailors S WHERE R.sid = S.sid and S.rating > 3 and R.agent = ‘Joe’

38 Example (cont.) Maximum cardinality: 100,000 * 40,000 Reduction factor of R.sid = S.sid: 1/40,000 Reduction factor of S.rating > 3: (10–3)/(10-0) = 7/10 Reduction factor of R.agent = ‘Joe’: 1/100 Total Estimated size: 700