ICOM 6005 – Database Management Systems Design Dr. Manuel Rodríguez-Martínez Electrical and Computer Engineering Department Lecture 15 – Query Optimization.

Query Optimization
Read:
– Chapter 12, sec. 12.4
– Chapter 15
– SAC+79 (Selinger et al., 1979)
Purpose:
– Study different algorithms to optimize queries submitted to the DBMS

Introduction
SQL query gets translated into a relational algebra expression
Relational algebra expression is represented as a tree
– This is what the DBMS "understands" how to process
– Expression becomes a plan once we identify access methods for each operator
Relational algebra expression might have equivalent expressions
– Example: R(A,B,C), S(A,D,F)
But each expression might have a different cost
How do we find the cheapest expression?

Relational DBMS Architecture
[Figure: DBMS components: Client, Client API, Query Parser, Query Optimizer, Execution Engine, Relational Operators, File and Access Methods, Buffer Management, Disk Space Management, Concurrency and Recovery, DB]

Query Optimizer
Module in the DBMS in charge of finding the cheapest available plan to execute a query
Building one is not easy!
Optimizer searches for plans and compares them based on cost
Cost can be:
– Resource usage
– Response time
– Power consumption
– Number of I/Os
– Network transmission cost

Query Plans
Query plan specifies the operations to be executed
Tree of operators
– Each operator corresponds to a relational operator
Leaf nodes usually represent base tables
[Figure: example plan trees such as $R \bowtie S$ and $(R \bowtie S) \bowtie T$, plus trees with a projection $\pi_{A,B}$ and a selection $\sigma_{A>2}$]

Executing Query Plans
Plans generated by the optimizer are fed to the execution engine
Plans support the iterator interface
– Open – initialize the operator
– Next – get the next tuple from the operator
– Close – de-allocate the operator's resources
Execution engine invokes each method
Invocation triggers a cascade of calls
– Each operator calls the corresponding methods on its child nodes
– Example: open on a join causes a call to open on the outer table and a call to open on the inner table. Same for next and close.
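The open/next/close cascade can be sketched in a few lines. This is a minimal Python illustration, not the course's engine: the operator classes (`Scan`, `Select`, `NestedLoopsJoin`) and the `execute` driver are hypothetical names, base tables are in-memory lists, and `next` returns `None` at end of input.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple  # a tuple of attribute values

class Scan:
    """Leaf operator over an in-memory list (stands in for a base table)."""
    def __init__(self, rows: List[Row]):
        self.rows, self.pos = rows, 0
    def open(self) -> None:
        self.pos = 0
    def next(self) -> Optional[Row]:
        if self.pos == len(self.rows):
            return None                          # end of input
        self.pos += 1
        return self.rows[self.pos - 1]
    def close(self) -> None:
        pass

class Select:
    def __init__(self, child, pred: Callable[[Row], bool]):
        self.child, self.pred = child, pred
    def open(self) -> None:
        self.child.open()                        # open cascades to the child
    def next(self) -> Optional[Row]:
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None
    def close(self) -> None:
        self.child.close()

class NestedLoopsJoin:
    """Tuple-at-a-time nested loops join; the inner is reopened (rescanned)
    for each outer tuple."""
    def __init__(self, outer, inner, pred: Callable[[Row, Row], bool]):
        self.outer, self.inner, self.pred = outer, inner, pred
    def open(self) -> None:
        self.outer.open()                        # open on the outer table...
        self.inner.open()                        # ...and on the inner table
        self.cur = self.outer.next()
    def next(self) -> Optional[Row]:
        while self.cur is not None:
            row = self.inner.next()
            if row is None:                      # inner exhausted: advance outer
                self.inner.close()
                self.inner.open()
                self.cur = self.outer.next()
            elif self.pred(self.cur, row):
                return self.cur + row
        return None
    def close(self) -> None:
        self.outer.close()
        self.inner.close()

def execute(plan) -> List[Row]:
    """What the execution engine does with the root: open, next until None, close."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z")]
plan = NestedLoopsJoin(Select(Scan(R), lambda r: r[0] > 1), Scan(S),
                       lambda r, s: r[0] == s[0])
print(execute(plan))  # [(2, 'b', 2, 'x'), (3, 'c', 3, 'y'), (3, 'c', 3, 'z')]
```

Calling `open` on the join opens the selection (and through it the scan of R) and the scan of S, exactly the cascade described on the slide; each `next` pulls just enough tuples from the children to produce one output tuple.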

Pipelined vs Materialized Execution
Pipelined
– The output tuple from one operator immediately becomes the input tuple to its parent operator in the tree
Materialized
– The output tuples from one operator must first be stored to disk (in a temporary table)
– Once the operator finishes, its parent operator can access the materialized tuples
Most execution engines use pipelining
– Savings in I/O can be substantial!
Some operators cannot be pipelined
– Sorting, projections with duplicate elimination
Query optimizer must be aware of this issue!
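Python generators give a compact model of the difference. In this sketch (the `scan`, `select`, and `sort` functions and the `trace` list are illustrative assumptions), each generator hands a tuple to its parent as soon as it is produced, which is pipelining; calling `list(...)` on a child plays the role of materializing into a temporary table, which a real DBMS would write to disk. The blocking `sort` must consume its whole input before emitting its first tuple.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]
trace: List[str] = []   # records when each operator touches a tuple

def scan(rows: List[Row]) -> Iterator[Row]:
    for r in rows:
        trace.append(f"scan {r[0]}")
        yield r

def select(child: Iterable[Row], min_key: int) -> Iterator[Row]:
    for r in child:
        if r[0] > min_key:
            trace.append(f"select {r[0]}")
            yield r                               # handed to the parent immediately

def sort(child: Iterable[Row]) -> Iterator[Row]:
    buffered = list(child)                        # blocking: consume the whole input first
    trace.append("sort done")
    yield from sorted(buffered)

R = [(3, "c"), (1, "a"), (2, "b")]

trace.clear()
first = next(select(scan(R), 0))                  # pipelined: one tuple flows through
pipelined_trace = list(trace)

trace.clear()
first_sorted = next(sort(select(scan(R), 0)))    # sort must see every tuple first
blocking_trace = list(trace)

print(first, pipelined_trace)
print(first_sorted, blocking_trace)
```

Getting the first answer from the pipelined chain touches a single tuple, while putting `sort` on top forces every tuple through the scan and selection before anything comes out.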

Generation of Query Plans
Optimizer generates a query plan after a search finds the optimal one
– According to some criteria
Search is a search by construction
– Alternative plans are built and compared
– Cheapest one is kept
Two major algorithms exist
– Dynamic programming (SAC+79)
  » Exhaustive search of the plan space
  » Finds the optimal plan
– Randomized algorithms
  » Random search of the plan space
  » Quickly finds a sub-optimal but good plan
Optimization philosophy: find a good plan quickly, avoid bad ones!

Left-Deep Join vs Bushy Plans
Query optimizers generate two major types of plans
– Left-deep plans
– Bushy plans
Left-deep plans
– Every join has a base table as its inner table
– Used in commercial systems (first in System R)
– Good for dynamic programming
– Good for optimizing resource usage
Bushy plans
– Joins might take intermediate results as input
– Good for randomized search
– Used in research prototypes for distributed databases
– Good for optimizing response time

Left-deep plans
[Figure: left-deep trees $R \bowtie S$, $(R \bowtie S) \bowtie T$, and $((R \bowtie S) \bowtie T) \bowtie U$]
Each join always has a base table as its inner table

Bushy Plans
[Figure: example bushy join trees over relations R, S, T, U, V]

Left-Deep vs Bushy Plans
Bushy plans
– Enable parallelism in operator evaluation
– Operators can execute at different rates
  » Good in distributed environments
– More complicated to build (harder optimizer)
Left-deep plans
– Joins are run in sequence
  » Susceptible to a bottleneck at some operator
– Simpler to build (easier optimizer)
– Good for single-site systems
  » Everything runs on the same machine
Commercial DBMSs use left-deep plans
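One way to see why bushy plans make the optimizer harder is to count join trees. The sketch below makes two simplifying assumptions: cross products are allowed, and the choice of join algorithm at each node is ignored. Under those assumptions there is one left-deep tree per ordering of the n relations (n!), while general binary trees number n! leaf orderings times Catalan(n − 1) tree shapes.

```python
from math import comb, factorial

def left_deep_plans(n: int) -> int:
    """Left-deep join trees over n relations: one per ordering of the relations."""
    return factorial(n)

def bushy_plans(n: int) -> int:
    """All binary join trees over n relations (left-deep ones included):
    n! leaf orderings times Catalan(n - 1) tree shapes."""
    catalan = comb(2 * (n - 1), n - 1) // n
    return factorial(n) * catalan

for n in (2, 4, 6, 10):
    print(n, left_deep_plans(n), bushy_plans(n))
```

For 10 relations this gives 3,628,800 left-deep trees versus 17,643,225,600 trees overall, which is why exhaustive enumeration is usually restricted to left-deep plans.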

Cost of a Plan
The cost of a plan depends on the metric you wish to optimize
Resource usage (CPU + I/O + Network)
– Cost is the sum of the resources used by each operator
Response time
– Cost of the slowest path in the tree
Number of I/Os
– Cost is the sum of the I/Os generated by each operator
Network cost
– Cost is the sum of the cost of moving data between operators
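The same plan can score very differently under these metrics. A minimal Python sketch with made-up per-operator costs on a hypothetical bushy plan: summed cost models resource usage or I/O count, while response time takes the slowest root-to-leaf path, under the stated assumption that sibling subtrees run fully in parallel.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Op:
    name: str
    cost: int                                   # this operator's own cost, children excluded
    children: List["Op"] = field(default_factory=list)

def total_cost(op: Op) -> int:
    """Resource usage / number of I/Os: sum of every operator's cost."""
    return op.cost + sum(total_cost(c) for c in op.children)

def response_time(op: Op) -> int:
    """Response time: cost of the slowest root-to-leaf path, assuming sibling
    subtrees run fully in parallel."""
    return op.cost + max((response_time(c) for c in op.children), default=0)

# Hypothetical bushy plan (R join S) join (T join U) with made-up per-operator costs.
plan = Op("join", 10, [
    Op("join", 20, [Op("scan R", 100), Op("scan S", 50)]),
    Op("join", 30, [Op("scan T", 80), Op("scan U", 60)]),
])
print(total_cost(plan), response_time(plan))  # 350 130
```

The two subtrees overlap in time, so response time (130) is far below total resource usage (350); a left-deep plan running joins in sequence would not get that benefit.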

Organization of an Optimizer
[Figure: SQL Query → Query Parser → Parse Tree → Query Optimizer (Plan Generator, Cost Estimator; uses the Catalog Manager and Catalog) → Query Plan → Query Execution Engine]

Generating Alternatives
Relational equivalences are used by the optimizer to generate alternative expressions that compute the same result
Selection equivalences:
– Cascading selections
– Commutative selections

Equivalence Rules
Projections
– Cascading projections
Joins
– Commutative rule
– Associative rule
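Written out in standard relational algebra notation (as in Ramakrishnan and Gehrke, Ch. 15), the selection, projection, and join equivalences named above are as follows, where each $c_i$ is a condition and each $a_i$ a set of attributes:

```latex
\begin{aligned}
&\text{Cascading selections:}   && \sigma_{c_1 \wedge c_2 \wedge \cdots \wedge c_n}(R) \equiv \sigma_{c_1}(\sigma_{c_2}(\cdots(\sigma_{c_n}(R))\cdots)) \\
&\text{Commutative selections:} && \sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R)) \\
&\text{Cascading projections:}  && \pi_{a_1}(R) \equiv \pi_{a_1}(\pi_{a_2}(\cdots(\pi_{a_n}(R))\cdots)), \quad a_1 \subseteq a_2 \subseteq \cdots \subseteq a_n \\
&\text{Join commutativity:}     && R \bowtie S \equiv S \bowtie R \\
&\text{Join associativity:}     && R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T
\end{aligned}
```

Commutativity and associativity together let the optimizer consider every join order, which is exactly the space the System R search explores.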

Equivalence Rules (2)
Commute selections and projections
Pushing selections
Decomposing selections
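The pushing-selections rule can be checked on toy data. This minimal Python sketch represents relations as lists of dicts, with made-up rows for the schemas R(A,B,C) and S(A,D,F) from the introduction; `select` and `join_on` are illustrative helpers. A predicate that mentions only R's attributes gives the same answer whether it is applied after the join or pushed onto R, but the pushed version feeds fewer tuples into the join.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

def select(rel: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    """sigma: keep the rows that satisfy the predicate."""
    return [r for r in rel if pred(r)]

def join_on(left: List[Row], right: List[Row], attr: str) -> List[Row]:
    """Nested-loops equi-join on an attribute both relations share."""
    return [{**x, **y} for x in left for y in right if x[attr] == y[attr]]

# R(A, B, C) and S(A, D, F), as in the introduction; the rows are made up.
R = [{"A": 1, "B": 5, "C": "x"}, {"A": 1, "B": 1, "C": "y"}, {"A": 3, "B": 7, "C": "z"}]
S = [{"A": 1, "D": 10, "F": "p"}, {"A": 3, "D": 30, "F": "q"}, {"A": 3, "D": 31, "F": "r"}]

def b_gt_2(row: Row) -> bool:
    return row["B"] > 2        # only mentions attributes of R, so it can be pushed

late = select(join_on(R, S, "A"), b_gt_2)     # sigma_{B>2}(R join S)
early = join_on(select(R, b_gt_2), S, "A")    # sigma_{B>2}(R) join S

# Same answer (row order also matches here: both loop over R in the same order)...
print(late == early)
# ...but the join sees 2 tuples of R instead of producing 4 rows to filter.
print(len(join_on(R, S, "A")), len(select(R, b_gt_2)))
```

This is the cost difference the introduction points to: equivalent expressions, different amounts of work.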

System R Optimizer
Based on left-deep plans and dynamic programming
– Most commercial systems use a System R type of optimizer
Cost is based on resource usage
– Cost = CPU cost + I/O cost
– Given a plan P, the cost of P is computed as
  Cost(P) = operatorCost(P.root) + Cost(P.root.leftChild) + Cost(P.root.rightChild)

Estimating Cost of Operators
Key ingredients are selectivity factors (to estimate result sizes) and the cost formulas of the join algorithms
Example:
– R has no index: |R| = 100,000, ||R|| = 5,000
– S has an unclustered index on the join attribute: |S| = 70,000, ||S|| = 2,500
– Which algorithm should be used for $R \bowtie S$? Choose among BNLJ, INLJ, and GHJ with B = 20 buffers
– What is the cost?
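A worked version of this example, as a hedged Python sketch. It assumes |X| is the number of tuples and ||X|| the number of pages (so R holds 20 tuples per page) and uses the usual textbook formulas that count page I/Os and ignore writing the result. BNLJ reads the outer once and scans the inner once per block of B − 2 outer pages. For INLJ it assumes, hypothetically, a hash index costing about 1.2 I/Os per probe plus one I/O per matching tuple (the index is unclustered), with one matching S tuple per R tuple. GHJ costs 2(M + N) per partitioning pass plus M + N to probe, assuming uniform partitions; with B = 20 < √2500 = 50, a second partitioning pass is needed.

```python
from fractions import Fraction

def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b without floating point."""
    return -(-a // b)

def bnlj_cost(outer_pages: int, inner_pages: int, buffers: int) -> int:
    """Block nested loops: read the outer once, scan the inner once per
    block of (B - 2) outer pages."""
    return outer_pages + ceil_div(outer_pages, buffers - 2) * inner_pages

def inlj_cost(outer_pages: int, outer_tuples: int,
              probe_io: Fraction, matches_per_probe: int) -> Fraction:
    """Indexed nested loops: scan the outer, probe the inner's index once per
    outer tuple; an unclustered index costs one I/O per matching tuple."""
    return outer_pages + outer_tuples * (probe_io + matches_per_probe)

def ghj_cost(pages_a: int, pages_b: int, buffers: int) -> int:
    """Grace hash join: each partitioning pass reads and writes both inputs,
    the probe phase reads them once. Partition again while a partition of the
    smaller input does not fit in (B - 2) pages (uniform partitions assumed)."""
    part, passes = min(pages_a, pages_b), 0
    while part > buffers - 2:
        part = ceil_div(part, buffers - 1)      # B - 1 output partitions per pass
        passes += 1
    return (2 * passes + 1) * (pages_a + pages_b)

B = 20
R_TUPLES, R_PAGES, S_PAGES = 100_000, 5_000, 2_500

costs = {
    "BNLJ (R outer)": bnlj_cost(R_PAGES, S_PAGES, B),
    "BNLJ (S outer)": bnlj_cost(S_PAGES, R_PAGES, B),
    # Hypothetical: hash index, ~1.2 I/Os per probe, one matching S tuple per R tuple.
    "INLJ (R outer)": inlj_cost(R_PAGES, R_TUPLES, Fraction(6, 5), 1),
    "GHJ": ghj_cost(R_PAGES, S_PAGES, B),
}
for name, io in costs.items():
    print(f"{name}: {int(io):,} I/Os")
```

Under these assumptions GHJ is cheapest (37,500 I/Os), INLJ comes second (225,000), and BNLJ is far behind (697,500 with S as the outer). The INLJ figure grows quickly if each R tuple matches several S tuples, since every match costs one more I/O through the unclustered index.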

System R Search Algorithm
Idea:
– Build every possible plan and keep track of
  » Cheapest plan (overall)
  » Cheapest plans that bring data in sorted order (called interesting orders)
Dynamic programming (divide-and-conquer)
– To find a plan for an n-way join you
  » First find single-table plans
  » Then find plans for all (n-1)-way joins and find a plan to join the missing table with each (n-1)-way join
  » Plans for smaller joins are saved in a table

System R Search Algorithm (2)
Process: for an n-way join between tables R1, R2, …, Rn:
– Find the access path to access each table
  » Plans to access R1, R2, …, Rn
  » This includes applying selections and projections for each table
– Find the access path to compute 2-way joins
  » 2-way joins for all possible pairs of tables
– Find the access path to compute 3-way joins
  » Add a table to a 2-way join (forms all possible 3-way joins)
– Find the access path to compute 4-way joins
  » Add a table to a 3-way join (forms all possible 4-way joins)
– …
– Find the access path to compute the n-way join
  » Add a table to an (n-1)-way join (forms all possible n-way joins)

System R Search Algorithm (2)
Plan SystemROptimizer(R1, R2, …, Rn) {
    for (int i = 1; i <= n; ++i) {                    // single-table access paths
        optPlan(Ri) = selectPlan(Ri);
    }
    for (int i = 2; i <= n; ++i) {                    // join access paths: 2 tables, then 3, …, then n
        for all S ⊆ {R1, R2, …, Rn} s.t. |S| == i {   // S is the next set to join
            bestPlan = dummy plan with infinite cost;
            for all Rj, Sj s.t. S = Sj ∪ {Rj} {       // Sj and Rj are pieces of S
                P = joinPlan(optPlan(Sj), optPlan(Rj));
                if (cost(P) < cost(bestPlan)) {
                    bestPlan = P;
                }
            }
            optPlan(S) = bestPlan;                    // best plan for S, reused at level i+1
        }
    }
    S = {R1, R2, …, Rn};
    return optPlan(S);
}
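A runnable Python sketch of the same dynamic program, restricted to left-deep plans and ignoring interesting orders. Each plan is a (cost, output pages, text) triple. A single-table access path is assumed to be a full scan costing its page count, and plan cost follows Cost(P) = operatorCost(root) + Cost(left) + Cost(right). The join cost and size model is passed in as a parameter; the `toy_join` model below is hypothetical (cost = pages of both inputs, output = left × right / 100 pages) and only stands in for real estimators.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Tuple

Plan = Tuple[int, int, str]  # (total cost, output pages, plan text)

def system_r_dp(pages: Dict[str, int],
                join: Callable[[int, int], Tuple[int, int]]) -> Plan:
    """Left-deep dynamic programming over subsets of tables.
    join(left_pages, right_pages) returns (operator cost, output pages)."""
    names = sorted(pages)
    # Single-table access paths: assume a full scan costing `pages` I/Os.
    opt: Dict[FrozenSet[str], Plan] = {
        frozenset([r]): (pages[r], pages[r], r) for r in names
    }
    for size in range(2, len(names) + 1):          # 2-way, 3-way, ..., n-way
        for subset in combinations(names, size):
            s = frozenset(subset)
            best = None                             # "dummy plan with infinite cost"
            for rj in subset:                       # split S = Sj ∪ {Rj}
                l_cost, l_pages, l_txt = opt[s - {rj}]
                r_cost, r_pages, r_txt = opt[frozenset([rj])]
                op_cost, out_pages = join(l_pages, r_pages)
                total = op_cost + l_cost + r_cost   # Cost(P) recurrence
                if best is None or total < best[0]:
                    best = (total, out_pages, f"({l_txt} JOIN {r_txt})")
            opt[s] = best                           # reused by larger subsets
    return opt[frozenset(names)]

def toy_join(left_pages: int, right_pages: int) -> Tuple[int, int]:
    """Hypothetical cost model: read both inputs once; output shrinks by 1/100."""
    return left_pages + right_pages, max(1, left_pages * right_pages // 100)

best = system_r_dp({"A": 50, "B": 200, "C": 10}, toy_join)
print(best)  # (525, 10, '((C JOIN A) JOIN B)')
```

Storing one best plan per subset is what makes this dynamic programming: the search examines about 2^n subsets (times n splits each) instead of all n! orders. Tracking interesting orders would mean keeping one best plan per (subset, sort order) pair instead of one per subset.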

Illustration
How does the algorithm work for this case?
Tables
– R: |R| = 100,000, ||R|| = 8,000
– S: |S| = 90,000, ||S|| = 6,000
– T: |T| = 120,000, ||T|| = 10,000
– U: |U| = 80,000, ||U|| = 4,000
All tables are stored in heap files
DBMS has: block nested loops join, hash join, and 25 free buffers
What is the best plan for the query:
– $R \bowtie S \bowtie T \bowtie U$

Illustration (2)
How does the algorithm work for this case?
Tables
– R: |R| = 100,000, ||R|| = 8,000
  » R is stored on a clustered B+tree matching the join attribute with T
– S: |S| = 90,000, ||S|| = 6,000
  » |R| is stored on
– T: |T| = 120,000, ||T|| = 10,000
DBMS has: block nested loops join, indexed nested loops join, hash join, and 3 free buffers
What is the best plan for the query:
– $R \bowtie S \bowtie T$

Illustration (3)
How does the algorithm work for this case?
Tables
– R: |R| = 100,000, ||R|| = 8,000
  » R is stored on a clustered B+tree matching the join attribute with T
– S: |S| = 90,000, ||S|| = 6,000
  » |R| is stored on
– T: |T| = 120,000, ||T|| = 10,000
DBMS has: block nested loops join, indexed nested loops join, hash join, and 25 free buffers
What is the best plan for the query:
– $\sigma_{A>3 \wedge B=\text{'NY'}}(R) \bowtie S \bowtie T$
– If $SF_{A>3} = 0.10$ and $SF_{B=\text{'NY'}} = 0.05$, and A > 3 matches the index on R
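A back-of-the-envelope estimate for the selection on R in this illustration, as a Python sketch. It assumes, as is usual when nothing else is known, that the two predicates are independent, so their selectivity factors multiply. It also assumes that a clustered index scan for A > 3 reads about SF(A > 3) of R's pages, ignoring the index traversal itself, and that |R| counts tuples and ||R|| counts pages.

```python
R_TUPLES, R_PAGES = 100_000, 8_000
SF_A, SF_B = 0.10, 0.05       # SF(A > 3), SF(B = 'NY') from the slide

tuples_per_page = R_TUPLES / R_PAGES          # 12.5 tuples per page
combined_sf = SF_A * SF_B                     # independence assumption
out_tuples = R_TUPLES * combined_sf           # tuples surviving both predicates
out_pages = out_tuples / tuples_per_page      # size of sigma(R) in pages

# Clustered index on A: tuples with A > 3 are stored contiguously, so roughly
# SF_A of R's pages are read; B = 'NY' is then checked on the fly.
index_scan_pages = SF_A * R_PAGES
full_scan_pages = R_PAGES

print(round(out_tuples), round(out_pages), round(index_scan_pages), full_scan_pages)
```

Under these assumptions the selection shrinks R from 8,000 pages to about 40 pages (500 tuples) before it reaches the joins, and the index scan (about 800 pages) is roughly ten times cheaper than a full scan.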

Issues with the System R Optimizer
Algorithm performs an exhaustive search of left-deep plans
Dynamic programming is ill-suited for optimizing response time
– Principle of optimality is not observed
Difficult (but not impossible) to modify for bushy plans
– Search space is huge
– Need pruning techniques to cut down the number of plans stored
Do we need exhaustive search?
– Optimal plan vs. a sub-optimal plan that is good and quick to find
Disaster avoidance: more important to avoid bad plans!!!

Alternative Approaches
Randomized query optimization
– Use randomized algorithms to build and search plans
– Good for bushy plans
Rule-based query optimization
– Use rules to guide the search and better prune the space
– Good for handling special cases and pruning
Parametric query optimization
– Add run-time parameters to capture the actual state of the system
Multiple-query optimization
– Optimizer takes 2 or more queries at a time for optimization
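The slides do not name a specific randomized algorithm. One simple option is iterative improvement, sketched here in Python over left-deep join orders: start from several random orders, try random swaps of two relations, and keep a swap only if the cost drops. The cardinalities in `CARD` and the 1/100 join selectivity behind `toy_cost` are hypothetical.

```python
import random
from fractions import Fraction
from typing import Callable, List, Sequence

def iterative_improvement(relations: Sequence[str],
                          cost: Callable[[List[str]], Fraction],
                          restarts: int = 10, moves: int = 100,
                          seed: int = 0) -> List[str]:
    """Randomized join ordering by iterative improvement: from several random
    starting orders, try random swaps and keep a swap only if it lowers the
    cost. Returns the cheapest order seen, which need not be optimal."""
    rng = random.Random(seed)
    best = list(relations)
    best_cost = cost(best)
    for _ in range(restarts):
        order = list(relations)
        rng.shuffle(order)                           # random starting plan
        cur_cost = cost(order)
        for _ in range(moves):
            i, j = rng.sample(range(len(order)), 2)  # random neighboring plan
            order[i], order[j] = order[j], order[i]
            new_cost = cost(order)
            if new_cost < cur_cost:
                cur_cost = new_cost                  # downhill move accepted
            else:
                order[i], order[j] = order[j], order[i]  # undo the swap
        if cur_cost < best_cost:
            best, best_cost = list(order), cur_cost
    return best

# Hypothetical cardinalities; every join is assumed to have selectivity 1/100.
CARD = {"R": 1000, "S": 10, "T": 100, "U": 1}

def toy_cost(order: List[str]) -> Fraction:
    """Sum of intermediate result sizes of the left-deep plan for `order`
    (the final result size is the same for every order, so it is skipped)."""
    size, total = Fraction(CARD[order[0]]), Fraction(0)
    for rel in order[1:-1]:
        size = size * CARD[rel] / 100
        total += size
    return total

best = iterative_improvement(list(CARD), toy_cost)
print(best, toy_cost(best))
```

Unlike the System R dynamic program, this search keeps no table of sub-plans and can stop at a local minimum. It trades the optimality guarantee for speed, in line with the "find a good plan quickly, avoid bad ones" philosophy. The same loop works for bushy trees if the swap move is replaced by tree-rewriting moves.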