Query Optimization Heuristic Optimization

Slides:



Advertisements
Similar presentations
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Advertisements

Copyright © 2004 Pearson Education, Inc.. Chapter 15 Algorithms for Query Processing and Optimization.
Copyright © 2004 Ramez Elmasri and Shamkant Navathe Elmasri/Navathe, Fundamentals of Database Systems, Fourth Edition Chapter 15-1 Query Processing and.
Chapter 15 Algorithms for Query Processing and Optimization Copyright © 2004 Pearson Education, Inc.
1 CSE 480: Database Systems Lecture 22: Query Optimization Reference: Read Chapter 15.6 – 15.8 of the textbook.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Query Evaluation. SQL to ERA SQL queries are translated into extended relational algebra. Query evaluation plans are represented as trees of relational.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 19 Algorithms for Query Processing and Optimization.
Query processing and optimization. Advanced DatabasesQuery processing and optimization2 Definitions Query processing –translation of query into low-level.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
ACS-4902 Ron McFadyen Chapter 15 Algorithms for Query Processing and Optimization.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
ACS-4902 Ron McFadyen Chapter 15 Algorithms for Query Processing and Optimization See Sections 15.1, 2, 3, 7.
ICS (072)Query Processing and Optimization 1 Chapter 15 Algorithms for Query Processing and Optimization ICS 424 Advanced Database Systems Dr.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
Chapter 19 Query Processing and Optimization
©Silberschatz, Korth and Sudarshan14.1Database System Concepts 3 rd Edition Chapter 14: Query Optimization Overview Catalog Information for Cost Estimation.
Query Processing Presented by Aung S. Win.
Access Path Selection in a Relational Database Management System Selinger et al.
COMP 5138 Relational Database Management Systems Semester 2, 2007 Lecture 12 Query Processing and Optimization.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Advanced Databases: Lecture 8 Query Optimization (III) 1 Query Optimization Advanced Databases By Dr. Akhtar Ali.
Database Management 9. course. Execution of queries.
A Summary of XISS and Index Fabric Ho Wai Shing. Contents Definition of Terms XISS (Li and Moon, VLDB2001) Numbering Scheme Indices Stored Join Algorithms.
Query Optimization (CB Chapter ) CPSC 356 Database Ellen Walker Hiram College (Includes figures from Database Systems: An Application Oriented.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Department of Computer Science and Engineering, HKUST Slide Query Processing and Optimization Query Processing and Optimization.
Query Optimization Chap. 19. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying where.
Query Processing and Optimization
CS 338Query Evaluation7-1 Query Evaluation Lecture Topics Query interpretation Basic operations Costs of basic operations Examples Textbook Chapter 12.
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
Chapter 15 Algorithms for Query Processing and Optimization Copyright © 2004 Pearson Education, Inc.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution.
Chapter 18 Query Processing and Optimization. Chapter Outline u Introduction. u Using Heuristics in Query Optimization –Query Trees and Query Graphs –Transformation.
Chapter 13: Query Processing
Database Applications (15-415) DBMS Internals- Part IX Lecture 20, March 31, 2016 Mohammad Hammoud.
Query Processing and Query Optimization Database System Implementation CSE 507 Slides adapted from Silberschatz, Korth and Sudarshan Database System Concepts.
CHAPTER 19 Query Optimization. CHAPTER 19 Query Optimization.
Chapter 14: Query Optimization
Database Applications (15-415) DBMS Internals- Part VIII Lecture 17, Oct 30, 2016 Mohammad Hammoud.
slides by: Pavle Morgan & Hui Ma
Query Processing and Optimization, and Database Tuning
Database System Implementation CSE 507
UNIT 11 Query Optimization
Database Management System
CS257 Query Optimization.
Prepared by : Ankit Patel (226)
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
SQL: Advanced Options, Updates and Views Lecturer: Dr Pavle Mogin
Chapter 12: Query Processing
Overview of Query Optimization
Chapter 15 QUERY EXECUTION.
Evaluation of Relational Operations: Other Operations
Examples of Physical Query Plan Alternatives
April 20th – RDBMS Internals
Database Applications (15-415) DBMS Internals- Part VI Lecture 15, Oct 23, 2016 Mohammad Hammoud.
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
Database Applications (15-415) DBMS Internals- Part IX Lecture 21, April 1, 2018 Mohammad Hammoud.
QUERY OPTIMIZATION.
Query Processing CSD305 Advanced Databases.
Chapter 12 Query Processing (1)
Monday, 5/13/2002 Hash table indexes, query optimization
Yan Huang - CSCI5330 Database Implementation – Query Processing
Query Processing.
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Completing the Physical-Query-Plan and Chapter 16 Summary ( )
Relational Query Optimization
Relational Query Optimization
Algorithms for Query Processing and Optimization
Presentation transcript:

Query Optimization Heuristic Optimization SWEN 304 Trimester 2, 2017 slides by: Pavle Morgan & Hui Ma 1

Lect10: Heuristic Query Optimization Outline Query Processing in DBMS Query optimization techniques Heuristic query optimization Translating SQL into relational algebra Reordering query operations Transformation rules for relational algebra operations Readings from the textbook: Chapter 19: Algorithms for Query Processing and Optimization Chapters 17: Disk Storage, Basic File Structures, and Hashing (Sections: 17.2, to 17.8) Chapter 18: Indexing Structures for Files (Sections: 18.1 to 18.5) SWEN 304 Lect10: Heuristic Query Optimization 2 2

Lect10: Heuristic Query Optimization Query Processing in DBMS Users/applications submit queries to the DBMS The DBMS processes queries before evaluating them Recall: DBMS mainly use declarative query languages (such as SQL) Queries can often be evaluated in different ways SQL queries do not determine how to evaluate them SWEN 304 Lect10: Heuristic Query Optimization 3 3

Lect10: Heuristic Query Optimization Query Processing in DBMS SWEN 304 Lect10: Heuristic Query Optimization 4 4

Lect10: Heuristic Query Optimization Query Optimisation Query preparation: Decompose query into query blocks containing Exactly one SELECT and FROM clause At most one WHERE, GROUP BY and HAVING clause Nested queries within a query are identified as separate query blocks Aggregate operators in SQL must be included in the extended algebra SWEN 304 Lect10: Heuristic Query Optimization 5 5

Lect10: Heuristic Query Optimization Heuristic Query Optimization: An Example Student({Lname, Fname, StudId, Major}) Grades({StudId, CrsId, Grade}) Course({Cname, CrsId, Points, Dept}) SQL query: SELECT StudId, Lname FROM Student s, Grades g, Course c WHERE s.StudId=g.StudId AND g.CrsId= c.CrsId AND Grade = ‘A+’ AND Dept = ‘Comp’ AND Major = ‘Math’; Relational Algebra: StudId, LName (σ s.StudId=g.StudId ˄ g.CrsId=c.CrsId ˄ Grade=‘A+’ ˄ Major = ‘Math’ ˄ Dept = ‘Comp’ (Course × (Student × Grades))) SWEN 304 Lect10: Heuristic Query Optimization 6 6

Lect10: Heuristic Query Optimization Query Tree Query tree: A tree data structure that corresponds to a relational algebra expression It represents the input relations of the query as leaf nodes of the tree, and Represents the relational algebra operations as internal nodes The initial query tree for the example SQL query: Lower level nodes, starting from leaves, contain Cartesian product operators These are applied onto relations from the SQL FROM clause After that are (relational) select and join conditions from SQL WHERE clause to upper tree nodes applied Finally is project operator of the SELECT clause attribute list to tree root applied SWEN 304 Lect10: Heuristic Query Optimization 7 7

Lect10: Heuristic Query Optimization Query Tree Example Relational Algebra: StudId, LName (σ s.StudId=g.StudId ˄ g.CrsId=c.CrsId ˄ Grade=‘A+’ ˄ Major= ‘Math’ ˄ Dept = ‘Comp’ (Course × (Student × Grades))) SWEN 304 Lect10: Heuristic Query Optimization 8 8

Lect10: Heuristic Query Optimization Query Tree Exercise Draw query trees for the following relational algebra expressions StudId (σGrade=‘A+’ (Grades)) StudId, LName (σGrade=‘A+’ (Student * Grades)) (StudId,(σ CrsId = ‘M214’ (Grades))) - (StudId,(σ CrsId = ‘C201’ (Grades))) SWEN 304 Lect10: Heuristic Query Optimization 9 9

Lect10: Heuristic Query Optimization Query Execution Plan For each query tree, computation proceeds bottom-up: Child nodes must be executed before their parent nodes But there can exist multiple methods of executing sibling nodes, e.g. Process sequentially Process in parallel A query execution plan consists of a query tree with additional annotation at each node indicating the implementation method for each RA operator The query optimizer determines which query execution plan is optimal, using a variety of algorithms Realistically, we cannot expect to always find the best plan but we expect to consistently find a plan that is good (near-optimal) SWEN 304 Lect10: Heuristic Query Optimization 10 10

Lect10: Heuristic Query Optimization Query Optimisation Query optimizer rewrites the naïve (canonical) query plan into a more efficient evaluation plan Choosing a suitable query execution plan is called Query Optimization and it mainly means making decisions On the order of the execution of basic relational algebra operations and On data access methods SWEN 304 Lect10: Heuristic Query Optimization 11 11

Lect10: Heuristic Query Optimization Query Optimization Techniques Two main query optimization techniques One relies on the heuristic reordering of the relational algebra operations, The other involves systematic estimating the cost of different execution plans, and choosing one with the lowest cost Heuristics vs. cost-based optimization General heuristics allow to improve performance of most queries Costs estimated from statistics allow for a good optimization of each specific query Most DBMS use a hybrid approach between heuristics and cost estimations SWEN 304 Lect10: Heuristic Query Optimization 12 12

Lect10: Heuristic Query Optimization Query Optimization Techniques Enumerating potential query plans: SWEN 304 Lect10: Heuristic Query Optimization 13 13

Lect10: Heuristic Query Optimization Process for heuristics optimization The parser of a high-level query generates an initial internal representation Apply heuristics rules to optimize the internal representation A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query The main heuristic is to apply first the operations that reduce the size of intermediate results E.g., Apply SELECT and PROJECT operations before applying the JOIN or other binary operations SWEN 304 Lect10: Heuristic Query Optimization 14 14

Lect10: Heuristic Query Optimization Using Heuristics in Query Optimization (1) General Transformation Rules for Relational Algebra Operations: 1. Cascade of : A conjunctive selection condition can be broken up into a cascade (sequence) of individual  operations:  c1 ʌ c2 ʌ ... ʌ cn(R) = c1 (c2 (...(cn(R))...) ) 2. Commutativity of : The  operation is commutative: c1 (c2(R)) = c2 (c1(R)) 3. Cascade of : In a cascade (sequence) of  operations, all but the last one can be ignored: List1 (List2 (...(Listn(R))...) ) = List1(R) 4. Commuting  with : If the selection condition c involves only the attributes A1, ..., An in the projection list, the two operations can be commuted: A1, A2, ..., An (c (R)) = c (A1, A2, ..., An (R)) SWEN 304 Lect10: Heuristic Query Optimization 15 15

Lect10: Heuristic Query Optimization Using Heuristics in Query Optimization (2) General Transformation Rules for Relational Algebra Operations (contd.): 5. Commutativity of ( and x ): R1 C R2 = R2 CR1; R1 x R2 = R2 x R1 6. Commuting  with (or x ): If all the attributes in the selection condition c involve only the attributes of one of the relations being joined—say, R1—the two operations can be commuted as follows: c (R1 R2 ) = c (R1) R2 Alternatively, if the selection condition c can be written as (c1 and c2), where condition c1 involves only the attributes of R1 and condition c2 involves only the attributes of R2, the operations commute as follows: c(R1 R2) = c1(R1) c2 (R2) SWEN 304 Lect10: Heuristic Query Optimization 16 16

Lect10: Heuristic Query Optimization Using Heuristics in Query Optimization (3) General Transformation Rules for Relational Algebra Operations (contd.): 7. Commuting  with (or x): Suppose that the projection list is AL = {A1, ..., An, B1, ..., Bm}, where A1, ..., An are attributes of R1 and B1, ..., Bm are attributes of R2. If the join condition c involves only attributes in AL, the two operations can be commuted as follows: AL (R1 C R2) = (A1, ..., An (R1)) C( B1, ..., Bm (R2)) If the join condition C contains additional attributes not in AL, these must be added to the projection list, and a final  operation is needed. AL(R1 CR2) = AL(AL1(R1) CAL2(R2)), where ALi contains the common attributes in Ri and AL and the common attributes of R1 and R2 SWEN 304 Lect10: Heuristic Query Optimization 17 17

Lect10: Heuristic Query Optimization Using Heuristics in Query Optimization (4) General Transformation Rules for Relational Algebra Operations (contd.): 8. Commutativity of set operations: The set operations ∪ and ∩ are commutative but “–” is not. 9. Associativity of , x, ∪, and ∩ : These four operations are individually associative; that is, if  stands for any one of these four operations (throughout the expression), we have (R1  R2)  R3 = R1  (R2  R3) 10. Commuting  with set operations: The  operation commutes with ∪, ∩, and –. If  stands for any one of these three operations, we have c (R1  R2) = (c (R1))  (c (R2)) SWEN 304 Lect10: Heuristic Query Optimization 18 18

Lect10: Heuristic Query Optimization Using Heuristics in Query Optimization (5) General Transformation Rules for Relational Algebra Operations (contd.): 11. The  operation commutes with ∪. AL (R1 ∪ R2) = (AL (R1)) ∪ (AL (R2)) 12. Converting a (, x) sequence into : If the condition c of a  that follows a x corresponds to a join condition, convert the (, x) sequence into a as follows: (C (R1 x R2)) = R1 C R2 SWEN 304 Lect10: Heuristic Query Optimization 19 19

Lect10: Heuristic Query Optimization Using Heuristics in Query Optimization (6) Summary of Heuristics for Algebraic Optimization: The main heuristic is to apply first the operations that reduce the size of intermediate results. Perform select operations as early as possible to reduce the number of tuples and perform project operations as early as possible to reduce the number of attributes. (This is done by moving select and project operations as far down the tree as possible, e.g. rule 6, 7, 10, 11) The select and join operations that are most restrictive should be executed before other similar operations. (This is done by reordering the leaf nodes of the tree among themselves and adjusting the rest of the tree appropriately.) SWEN 304 Lect10: Heuristic Query Optimization 20 20

Lect10: Heuristic Query Optimization An Example Initial Query Tree: Relational algebra query Query tree StudId, LName (σGrade=‘A+’ ˄ Major = ‘Math’ ˄ Dept = ‘Comp’ ˄ s.StudId=g.StudId ˄ g.CrsId=c.CrsId (Course × (Student × Grades))) SWEN 304 Lect10: Heuristic Query Optimization 21 21

Lect10: Heuristic Query Optimization Analysis of the Initial Query Tree According to the structure of the initial query tree, two Cartesian products should be executed first The main heuristic: Apply SELECT and PROJECT operations before applying the JOIN or other binary operations Hence, move select operations down the tree SWEN 304 Lect10: Heuristic Query Optimization 22 22

Lect10: Heuristic Query Optimization Query Tree After Moving Down Selects Move select operations down the tree (Rule 6) SWEN 304 Lect10: Heuristic Query Optimization 23 23

Lect10: Heuristic Query Optimization Query Tree After Introducing Joins Replacing each Cartesian product followed by a select according to a join condition with a join operator (Rule 12) SWEN 304 Lect10: Heuristic Query Optimization 24 24

Lect10: Heuristic Query Optimization Query Tree After Switching Join Nodes Assume there are less Comp courses than Math major students switching Course and Student so that the very restrictive select operation could be applied as early as possible (Rule 9) SWEN 304 Lect10: Heuristic Query Optimization 25 25

Lect10: Heuristic Query Optimization Query Tree after Pushing down Project keeping in intermediate relations only the attributes needed by subsequent operations by applying project () operations as early as possible (Rule 7) SWEN 304 Lect10: Heuristic Query Optimization 26 26

Lect10: Heuristic Query Optimization Effect of Project Operations Performing project operations as early as possible brings an improvement in the efficiency of a query execution tuples get shorter after projection, more of them will fit into a file block of the same size the same number of tuples will be contained in a smaller number of blocks As there will be less blocks to be processed by subsequent operations, the query execution will be faster SWEN 304 Lect10: Heuristic Query Optimization 27 27

Lect10: Heuristic Query Optimization Query Optimisation All transformations can be applied to the canonical evaluation plan However, there is no best operator sequence that is always optimal Efficiency depends on the current data instance, the actual implementation of base operations, access paths and indexes, etc. Idea: assign average costs to operators (nodes) and aggregate costs for each query plan SWEN 304 Lect10: Heuristic Query Optimization 28 28

Lect10: Heuristic Query Optimization Summary Heuristic optimization converts a declarative query to a canonical algebraic query tree that is then gradually transformed using transformation rules The main heuristics is to perform unary relational operations (selection and projection) before binary operations (joins, set theoretic), and aggregate functions with (or without) grouping SWEN 304 Lect10: Heuristic Query Optimization 29 29

Lect10: Heuristic Query Optimization Next Cost-based query optimization Readings: Chapter 19: Algorithms for Query Processing and Optimization Chapters 17: Disk Storage, Basic File Structures, and Hashing (Sections: 17.2, to 17.8) Chapter 18: Indexing Structures for Files (Sections: 17.1 to 17.5) Supposed knowledge: File Organization – COMP261 SWEN 304 Lect10: Heuristic Query Optimization 30 30