CSE 6331 © Leonidas Fegaras System R1 System R Optimizer Read the paper (available at the course web page): G. Selinger, M. Astrahan, D. Chamberlin, R.

Slides:



Advertisements
Similar presentations
Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.
Advertisements

Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapters 14.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
1 Overview of Query Evaluation Chapter Objectives  Preliminaries:  Core query processing techniques  Catalog  Access paths to data  Index matching.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Query Evaluation. SQL to ERA SQL queries are translated into extended relational algebra. Query evaluation plans are represented as trees of relational.
1 Relational Query Optimization Module 5, Lecture 2.
Query Rewrite: Predicate Pushdown (through grouping) Select bid, Max(age) From Reserves R, Sailors S Where R.sid=S.sid GroupBy bid Having Max(age) > 40.
CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.
Relational Query Optimization (this time we really mean it)
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
David Konopnicki Choosing Access Path ä The basic methods. ä The access paths and when they are available. ä How the optimizer chooses among the.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Query Evaluation Chapter 12.
1 Query Optimization. 2 Why Optimize? Given a query of size n and a database of size m, how big can the output of applying the query to the database be?
Access Path Selection in a RDBMS Shahram Ghandeharizadeh Computer Science Department University of Southern California.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Query Optimization. General Overview Relational model - SQL  Formal & commercial query languages Functional Dependencies Normalization Physical Design.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapter 15.
CSCI 5708: Query Processing I Pusheng Zhang University of Minnesota Feb 3, 2004.
Query Optimization Overview Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 2, 2004 Some slide content derived.
Query Processing & Optimization
1 Optimization. 2 Why Optimize? Given a query of size n and a database of size m, how big can the output of applying the query to the database be? Example:
Access Path Selection in a Relation Database Management System (summarized in section 2)
Overview of Query Optimization v Plan : Tree of R.A. ops, with choice of alg for each op. –Each operator typically implemented using a `pull’ interface:
Relational Database Performance CSCI 6442 Copyright 2013, David C. Roberts, all rights reserved.
Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Overview of Implementing Relational Operators and Query Evaluation
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
Access Path Selection in a Relational Database Management System Selinger et al.
COMP 5138 Relational Database Management Systems Semester 2, 2007 Lecture 12 Query Processing and Optimization.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
1 Overview of Query Evaluation Chapter Overview of Query Evaluation  Plan : Tree of R.A. ops, with choice of alg for each op.  Each operator typically.
Database systems/COMP4910/Melikyan1 Relational Query Optimization How are SQL queries are translated into relational algebra? How does the optimizer estimates.
1 ICS 184: Introduction to Data Management Lecture Note 10 SQL as a Query Language (Cont.)
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Copyright © Curt Hill Query Evaluation Translating a query into action.
12.1Database System Concepts - 6 th Edition Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Join Operation Sorting 、 Other.
Computing & Information Sciences Kansas State University Tuesday, 03 Apr 2007CIS 560: Database System Concepts Lecture 29 of 42 Tuesday, 03 April 2007.
1 Relational Query Optimization Chapter Query Blocks: Units of Optimization  An SQL query is parsed into a collection of query blocks :  An SQL.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Overview of Implementing Relational Operators and Query Evaluation Chapter 12.
Chapter 12 Query Processing. Query Processing n Selection Operation n Sorting n Join Operation n Other Operations n Evaluation of Expressions 2.
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.
Relational Operator Evaluation. Overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g.,
CS 440 Database Management Systems Lecture 5: Query Processing 1.
File Processing : Query Processing 2008, Spring Pusan National University Ki-Joune Li.
Query Processing – Implementing Set Operations and Joins Chap. 19.
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
Cost Estimation For each plan considered, must estimate cost: –Must estimate cost of each operation in plan tree. Depends on input cardinalities. –Must.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
Chapter 13: Query Processing
Query Optimization. overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g., SAP admin) DBA,
Database Management System
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Chapter 12: Query Processing
COST ESTIMATION FOR THE RELATIONAL ALGEBRA OPERATIONS MIT 813 GROUP 15 PRESENTATION.
Access Path Selection in a Relational Database Management System
File Processing : Query Processing
Lecture 2- Query Processing (continued)
Advance Database Systems
Overview of Query Evaluation
Database Administration
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Relational Query Optimization
Relational Query Optimization
Presentation transcript:

CSE 6331 © Leonidas Fegaras System R1 System R Optimizer Read the paper (available at the course web page): G. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price: Access Path Selection in a Relational Database Management System. Proceedings of the ACM-SIGMOD International Conference on Management of Data, Boston, Massachusetts, pages 23-34, May 1979.

CSE 6331 © Leonidas Fegaras System R2 System R Highlights The System R query optimizer is the most widely used currently. It works well for <10 joins. The original prototype (Phase 0, ) was single-user, no locking, no recovery. Plan costing was based on # of tuples accessed only (rather than # of blocks, plus CPU time). Phase 1 ( ) handled memory management, locking, logging: Storage manager: RSS (Research Storage System) –RSI interface provides tuple-at-a-time operators on base relations –RSS scan (operations: OPEN, NEXT, CLOSE), which can be either a segment scan (SARS), where each page is touched once, or an index scan (SARGS) for B + -tree indexes. Query processor: RDS (Research Data System) –Chooses a low cost plan from those provided by RSS –Does query analysis, catalog lookup, authorization, query optimization.

CSE 6331 © Leonidas Fegaras System R3 Access Methods A sargable predicate takes the form: TABLE.COLUMN compare value and may specified on any of the scans to filter the stream records. A SARG is a disjunctive normal form with sargable predicates. Storage structures: Data are stored in a set of logical spaces, called segments, used for the control of physical clustering. They contain user data, indexes, the user catalog, intermediate query results. A segment is a set of pages and may contain tuples from many relations. A relation though does not cross segment boundaries. Each segment has a mapping table that translates logical addresses to physical addresses.

CSE 6331 © Leonidas Fegaras System R4 Query Optimization Exhaustive search of the solution space to select the best evaluation plan from nearly all possible plans It considers both I/O and CPU costs: cost = (page fetches) + W * (RSI calls) where (RSI calls) is the number of tuples processed and W is the weighting factor between I/O and CPU (typically, 0.1 to 0.3) Instead of using heuristic rules, it uses statistical information It takes into account the order of the result tuples of an operator to select the next operator Selectivity: expected fraction of tuples that satisfy a predicate: F(pred) = |  pred (R 1 x…xR n )| / |(R 1 x…xR n )| given the input relations, R 1,…,R n, of the predicate Uses table/index cardinality and # of unique values in an attribute.

CSE 6331 © Leonidas Fegaras System R5 Statistics NCARD(T)cardinality of relation T TCARD(T)size of T in pages P(T)percentage of non-empty pages for T ICARD(I)cardinality of index I (# of distinct keys) NINDEX(I)size of I in pages high/low key values of index I Selectivity of a predicate F(pred) is calculated in terms of the above statistics. If there is an index I on column T.A, then: F(T.A = value) = 1 / ICARD(I) assuming an even distribution of tuples among the index keys. Otherwise (if there is no index), F(T.A = value) = 1 / 10 (why?)

CSE 6331 © Leonidas Fegaras System R6 More on Selectivities F(R.A=S.B) = 1 / max( ICARD(I R ),ICARD(I S )) if there are indexes I R /I S for R.A/S.B = 1 / ICARD(I) if there is an index I for R.A or S.B = 1 / 10 otherwise F(T.A > value) or >=, <, <= = (high key value - value) / (high key value - low key value) = 1/3otherwise F(T.A between value1 and value2) = (value2 - value1) / (high key value - low key value) = 1/4otherwise F(T.A in [value 1,…,value n ]) = n * F(T.A = value) F(T.A in subquery) = F(subquery)

CSE 6331 © Leonidas Fegaras System R7 More on Selectivities F(pred 1 or pred 2 ) = F(pred 1 ) + F(pred 2 ) – F(pred 1 ) * F(pred 2 ) F(pred 1 and pred 2 ) = F(pred 1 ) * F(pred 2 ) F(not pred) = 1 – F(pred) For a query Q: select * from T 1,…, T n where pred The Query Cardinality: QCARD(Q) =  i NCARD(T i ) * F(pred) RSICARD(Q) =  i NCARD(T i ) * F(sargable predicates) (for RSI calls).

CSE 6331 © Leonidas Fegaras System R8 Approach Concept of interesting order: order specified by GROUP-BY or ORDER-BY or requested by a merge-join. Use of dynamic programming to limit the number of alternative plans. Support for equivalent classes for access plans to avoid repeating search (memoization). Use of pruning to reduce the search space. –Prune uninteresting plans (plans that do not deliver interesting order) –Don’t prune operator implementations based on cost, because they may result to an overall cheaper plan –Avoid cross products –Prune join implementations –Prune within a class –Merge equivalent classes.

CSE 6331 © Leonidas Fegaras System R9 Single Relation Queries If it is a group-by or an order-by query, the interesting order is the group-by/order-by attributes. Access Paths: Examine the cheapest access path which produces tuples in each interesting order. Examine the cheapest unordered access path. Cost Formulas: Unique index matching a equal predicate:1+1+W Matching clustered index I of table T: F(preds)*(NINDX(I)+TCARD(T)) + W*RSICARD Matching non-clustered index I of table T: F(preds)*(NINDX(I)+NCARD(T)) + W*RSICARD Non-matching index: use F(preds)=1 Segment scan:TCARD(T)/P(T) + W*RSICARD

CSE 6331 © Leonidas Fegaras System R10 Joins Joins considered: nested loops and merging scans (for equi- joins only). Only left-deep trees are considered. Join order permutations: –For n tables, there are n! permutations –The problem is now finding an order for n tables, t1,…,tn, with minimal cost. Assumption: once the first k relations are joined, the method to join the result to the k+1’th relation is independent of the order of joining the first k relations. –Is it valid? –Is it realistic? –Optimal substructure => dynamic programming.

CSE 6331 © Leonidas Fegaras System R11 Dynamic Programming A join order heuristic is used to prune out cartesian products: We examine join orders t1,…,tn if for each j: either tj has a join predicate with tk, for some k<j or tj does not have a join predicate, for all k>j e.g. query graph: T1 -- T2 – T3, join orders not considered: T1,T3,T2 and T3,T1,T2 Interesting orders: group-by attributes, order-by attributes, and every join attribute (since merging scan needs its inputs joined by the join attributes). Needs to store 2 n solutions, one for each subset of n tables, times the number of applicable interesting orders plus one (for the unorder case). Each solution has a cost, cardinality, and delivered order. For each subset of k relations, we join the composite result with one of the n-k remaining relations and store the result.

CSE 6331 © Leonidas Fegaras System R12 Plan Costing Cardinality of outer relation:N(estimated using selectivities) Cost of joins: Nested loops: cost(outer)+N*cost(inner) Merge scan join:cost(outer)+cost(inner) + sorting-cost

CSE 6331 © Leonidas Fegaras System R13 Example select name, title, sal, dname from emp, dept, job where title = ‘clerk’ and loc = ‘denver’ and emp.dno = dept.dno and emp.job = job.job Interesting orders: dno and job. Access paths for single relations: EMP: index scan using emp.dno, index scan using emp.job, segment scan (pruned) DEPT: index scan using dept.dno, segment scan (pruned) JOB: index scan using job.job, segment scan (cheaper, so not pruned).

CSE 6331 © Leonidas Fegaras System R14 Example (cont.) Two relations: emp*dept: –Nested loop: index scan on emp.dno, index scan on dept.dno, –Nested loop: index scan on emp.job, index scan on dept.dno, –Merge join: index scan on emp.dno, merge emp.dno with dept.dno –Merge join: sort emp.job by dno into L1, merge L1 with dept.dno job*emp –Nested loop: segment scan job, index scan on emp.job –Merge join: segment scan on job, sort by job.job into L2, merge L2 with emp.job –Merge join: index scan on job.job, merge L2 with emp.job emp*job –… dept*emp –…

CSE 6331 © Leonidas Fegaras System R15 Example (cont.) Three relations: (emp*dept)*job –Get (emp*dept) with job order, sort job segment scan by job into L2, merge the two results –… (emp*job)*dept –Get the (emp*job) with job order, sort it by dno into L5, merge L5 with dept.dno scan –…

CSE 6331 © Leonidas Fegaras System R16 Nested Queries Classification: Uncorrelated subquery: single-shot evaluation subquery select name from employee where salary = (select avg(salary) from employee) If the subquery returns a value, replace it with that value: select name from employee where salary = K where K is the average salary. If the subquery returns a set, evaluate the subquery and store the result in a table: select name from employee where dept in (select dno from dept where location = ‘denver’)

CSE 6331 © Leonidas Fegaras System R17 Nested Queries (cont.) Correlated subqueries: when it refers to variables/values obtained from a higher-level query block: select name from employee x where salary > (select salary from employee where num = x.manager) Naïve nested-loop evaluation: for each employee x, evaluate the inner query (very expensive). Better evaluation: select x.name from employee x, employee m where m.num = x.manager and x.salary > m.salary