1 Optimization. 2 Why Optimize? Given a query of size n and a database of size m, how big can the output of applying the query to the database be? Example:

Slides:



Advertisements
Similar presentations
Examples of Physical Query Plan Alternatives
Advertisements

Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.
Copyright © 2011 Ramez Elmasri and Shamkant Navathe Algorithms for SELECT and JOIN Operations (8) Implementing the JOIN Operation: Join (EQUIJOIN, NATURAL.
Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
CS 540 Database Management Systems
1 CSE 480: Database Systems Lecture 22: Query Optimization Reference: Read Chapter 15.6 – 15.8 of the textbook.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Query Evaluation. SQL to ERA SQL queries are translated into extended relational algebra. Query evaluation plans are represented as trees of relational.
1 Relational Query Optimization Module 5, Lecture 2.
Spring 2003 ECE569 Lecture 06.1 ECE 569 Database System Engineering Spring 2003 Topic VIII: Query Execution and optimization Yanyong Zhang
1 Implementation of Relational Operations Module 5, Lecture 1.
DB performance tuning using indexes Section 8.5 and Chapters 20 (Raghu)
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
Relational Query Optimization (this time we really mean it)
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
1 Query Optimization. 2 Why Optimize? Given a query of size n and a database of size m, how big can the output of applying the query to the database be?
Query Optimization Example Source: Query Optimization, Y. E. Ioannidis, ACM Computing Surveys, 28(1), March Database Tables: Emp (name, age, sal,
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
Introduction to Database Systems 1 Join Algorithms Query Processing: Lecture 1.
Query Processing & Optimization
1 Query Optimization Vishy Poosala Bell Labs. 2 Outline Introduction Necessary Details –Cost Estimation –Result Size Estimation Standard approach for.
Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved.
1 Implementation of Relational Operations: Joins.
Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
CSE 6331 © Leonidas Fegaras System R1 System R Optimizer Read the paper (available at the course web page): G. Selinger, M. Astrahan, D. Chamberlin, R.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Query optimization in relational DBs Leveraging the mathematical formal underpinnings of the relational model.
1 ICS 184: Introduction to Data Management Lecture Note 10 SQL as a Query Language (Cont.)
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Department of Computer Science and Engineering, HKUST Slide Query Processing and Optimization Query Processing and Optimization.
Query Processing. Steps in Query Processing Validate and translate the query –Good syntax. –All referenced relations exist. –Translate the SQL to relational.
Computing & Information Sciences Kansas State University Tuesday, 03 Apr 2007CIS 560: Database System Concepts Lecture 29 of 42 Tuesday, 03 April 2007.
Query Execution Section 15.1 Shweta Athalye CS257: Database Systems ID: 118 Section 1.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.
Query Optimization CMPE 226 Database Systems By, Arjun Gangisetty
CSCI 5708: Query Processing II Pusheng Zhang University of Minnesota Feb 5, 2004.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Computing & Information Sciences Kansas State University Wednesday, 08 Nov 2006CIS 560: Database System Concepts Lecture 32 of 42 Monday, 06 November 2006.
More Optimization Exercises. Block Nested Loops Join Suppose there are B buffer pages Cost: M + ceil (M/(B-2))*N where –M is the number of pages of R.
Relational Operator Evaluation. overview Projection Two steps –Remove unwanted attributes –Eliminate any duplicate tuples The expensive part is removing.
CS 540 Database Management Systems
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
Database Applications (15-415) DBMS Internals- Part IX Lecture 20, March 31, 2016 Mohammad Hammoud.
CS 540 Database Management Systems
CS 440 Database Management Systems
Database Management System
Query Optimization Kush Kashyap B.Tech -IT.
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Data Engineering Query Optimization (Cost-based optimization)
Introduction to Query Optimization
Overview of Query Optimization
COST ESTIMATION FOR THE RELATIONAL ALGEBRA OPERATIONS MIT 813 GROUP 15 PRESENTATION.
Chapter 15 QUERY EXECUTION.
Database Management Systems (CS 564)
Introduction to Database Systems
Examples of Physical Query Plan Alternatives
File Processing : Query Processing
Yan Huang - CSCI5330 Database Implementation – Access Methods
Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud.
Lecture 2- Query Processing (continued)
Relational Query Optimization
Advance Database Systems
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Relational Query Optimization (this time we really mean it)
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Relational Query Optimization
Relational Query Optimization
Presentation transcript:

1 Optimization

2 Why Optimize? Given a query of size n and a database of size m, how big can the output of applying the query to the database be? Example: Consider: R(A) with 2 rows. One row has value 0. One row has value 1. How many rows are in R x R? How many in R x R x R?  Size of output as a function of input:

3 Data Complexity Usually, queries are small. Therefore, it is usually assumed that queries are of a fixed size. What is the size of the output in this case?

4 Optimizer Architecture Algebraic Space Method-Structure Space Cost Model Size-Distribution Estimator Planner Rewriter

5 Optimizer Architecture Rewriter: Finds equivalent queries that, perhaps can be computed more efficiently. All such queries are passed on to the Planner. (Example: Removing unnecessary joins, like we did in Datalog) Planner: Examines all possible execution plans and chooses the cheapest one. Uses other modules to find best plan. (We will see this in detail later on)

6 Optimizer Architecture (cont.) Algebraic Space: Determines which types of queries will be examined. (Example: Can specify not to examine Cartesian products) Method-Space Structure: Determines what types of indexes are available and what types of algorithms for algebraic operations can be used. (Example: Which types of join algorithms are available)

7 Optimizer Architecture (cont.) Cost Model: Estimates the cost of execution plans. Uses Size-Distribution Estimator for this. Size-Distribution Estimator: Estimates size of databases, frequency distribution of attributes and size of indexes.

8 Algebraic Space We consider queries that consist of select, project and join. (Cartesian product is a special case of join.) Such queries can be represented by a tree. Example: emp(name, age, sal, dno) dept(dno, dname, floor, mgr, ano) act(ano, type, balance, bno) bank(bno, bname, address) select name, floor from emp, dept where emp.dno=dept.dno and sal>100K

9 3 Trees  name, floor  dno=dno  sal>100K EMP DEPT  name, floor  dno=dno  sal>100K EMPDEPT  name, floor  dno=dno  sal>100K EMP DEPT  dno, floor  name,sal,dno  dno, name T1T2T3

10 Restriction 1 The trees differ in the way selections and projections are handled. Note that T3 does maximal pushing of selections and projections Restriction 1: The algebraic space usually restricts selection to be performed as the relations are processed and projections are performed as the results of the other operators are generated. Which trees in our example conform to Restriction 1?

11 Restriction 2 Since the order of selection and projection is determined, we can write trees only with joins. Restriction 2: Cross products are never formed, unless the query asks for them. Example: select name, floor, balance from emp, dept, acnt where emp.dno=dept.dno and dept.ano = acnt.ano

12 3 Trees  ano=ano EMPDEPT T1T2T3  dno=dno ACNT  ano=ano, dno=dno EMPACNT  ACNT  dno=dno ACNTDEPT  ano=ano EMP Which trees have cross products?

13 Restriction 3 In a join, the left relation is the outer relation in the join and the right relation is the inner relation. (in the nested loops algorithms) Restriction 3: The inner operand of each join is a database relation, not an intermediate result. Example: select name, floor, balance from emp, dept, acnt, bank where emp.dno=dept.dno and dept.ano=acnt.ano and acnt.bno = bank.bno

14 3 Trees  ano=ano EMPDEPT T1T2T3  dno=dno ACNT  ano=ano EMP DEPT  dno=dno ACNT Which trees follow restriction 3?  bno=bno BANK  bno=bno  ano=ano EMP DEPT  dno=dno ACNT  bno=bno BANK

15 Algebraic Space - Summary We allow plans that 1.Perform selection and projection on the fly 2.Do not create cross products 3.Use database relations as inner relations (also called left –deep trees)

16 Planner We use a dynamic programming algorithm Intuition: (joining N relations) –Find all ways to access a single relation. Estimate costs and choose best access plan(s) –For each pair of relations, consider all ways to compute joins using all access plans from previous step. Choose best plan(s)... –For each i-1 relations joined, find best option to extend to i relations being joined... –Given all plans to compute join of n relations, output the best.

17 Pipelining Joins Consider computing: (Emp  Dept)  Acnt. In principle, we should compute Emp  Dept, write the result to the disk. Then read it from the disk to join it with Acnt. When using nested loops join, we can avoid the step of writing to the disk.

18 Pipelining Joins - Example Read block from Emp Find matching Dept tuples using index Find matching Acnt tuples using index Write final output Emp blocks Dept blocks Acnt blocks Output blocks Buffer

19 Interesting Orders For some joins, such as sort-merge join, the ordering of the relations is important. Therefore, it is of interest to create plans where attributes that participate in a join are ordered. For each interesting order, we save the best plan. We save plans for non order if it is the best. We will look at an example. All cost estimations are fictional.

20 Example We want to compute the query: select name, mgr from emp, dept where emp.dno=dept.dno and sal>30K and floor = 2 Indexes: B+tree index on emp.sal, B+tree index on emp.dnp, hashing index on dept.floor. Join Methods: Nested loops and merge sort

21 Step 1 – Accessing Single Relations RelationInteresting Order PlanCost empemp.dnoAccess through B+tree on emp.dno700 Access through B+tree on emp.sal Sequential scan deptAccess through hashing on dept.floor Sequential scan Which do we save for the next step?

22 Step 2 – Joining 2 Relations Join Method Outer/InnerPlanCost nested loops empt/dnpFor each emp tuple obtained through B+Tree on emp.sal, scan dept through hashing index on dept.floor to find tuples matching on dno 1800 For each emp tuple obtained through B+Tree on emp.dno and satisfying selection, scan dept through hashing index on dept.floor to find tuples matching on dno 3000

23 Step 2 – Joining 2 Relations Join Method Outer/InnerPlanCost nested loops dept/empFor each dept tuple obtained through hashing index on dept.floor, scan emp through B+Tree on emp.sal to find tuples matching on dno 2500 For each dept tuple obtained through hashing index on dept.floor, scan emp through B+Tree on emp.dno to find tuples satisfying the selection on emp.sal 1500

24 Step 2 – Joining 2 Relations Join Method Outer/ Inner PlanCost sort merge Sort the emp tuples resulting from accessing the B+Tree on emp.sal into L1 Sort the dept tuples resulting from accessing the hashing index on dept.floor into L2 Merge L1 and L Sort the dept tuples resulting from accessing the hashing index on dept.floor into L2 Merge L2 and the emp tuples resulting from accessing the B+Tree on emp.dno and satisfying the selection on emp.sal 2000

25 The Plan Which plan will be chosen?

26 Cost Model See last slides from previous lecture to estimate size of results See slides from previous lecture to estimate time of computing joins