Access Path Selection in a Relational Database Management System

Presentation transcript:

Access Path Selection in a Relational Database Management System Nimy Alex (301325706, MEng) CMPT 843

Access Path Selection in System R
System R: an experimental database management system based on the relational model of data.
The user does not have to specify which access path to use.
The user does not have to specify the join order.

ACCESS PATH?
The path chosen by the system to retrieve data when an SQL request is executed.

How to find a “good” access path?

CONTENT
  Processing of an SQL Statement
  The Research Storage System
  Costs for Single Relation Access Path
  Access Path Selection
  Nested Queries
  Conclusion

PROCESSING OF SQL STATEMENT
An SQL statement passes through four phases of processing:
  PARSING: checks for correct syntax.
    e.g. SELECT COLUMN_NAME FROM TABLE WHERE <condition 1>
  OPTIMIZATION: collects all the table and column references, checks that they exist in the catalogs, and selects an access path for each table.
  CODE GENERATION: translates the output of the optimizer into executable machine code.
  EXECUTION: the machine code is executed either immediately or later.
During execution, the generated code calls the System R internal storage system to scan the stored relations.

The Research Storage System (RSS)
The storage subsystem of System R. It maintains:
  Physical storage of relations
  Access paths on these relations
  Locking and logging
  Recovery services
Storage layout:
  The columns of a tuple are stored contiguously.
  No tuple spans a page.
  No relation spans a segment.
  Each tuple is tagged with the identification of the relation it belongs to.

The Research Storage System (RSS)
RSS scans access the tuples of a relation.
OPEN, NEXT and CLOSE are the principal commands on a scan.
Types of RSS scan:
  Segment scan
  Index scan

The Research Storage System (RSS)
Types of RSS scan
  Segment scan: finds all tuples of the given relation by scanning all related pages.
  Index scan: indexes are created on relations and stored separately; tuples are located through these indexes.
Both scans may take predicates called search arguments (SARGs).
SARGs are applied to each tuple before it is returned to the RSI caller.
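
The OPEN/NEXT/CLOSE protocol with a search argument can be sketched roughly as follows. This is only an illustration over an in-memory list, not the RSS interface itself: the class `SegmentScan`, the helper `run_scan`, and the representation of a SARG as a Python callable are all assumptions made for the example.

```python
from typing import Callable, Iterator, Optional

Row = dict
Sarg = Callable[[Row], bool]  # hypothetical stand-in for a search argument


class SegmentScan:
    """Toy scan over an in-memory list of tuples with OPEN/NEXT/CLOSE calls."""

    def __init__(self, tuples: list, sarg: Optional[Sarg] = None):
        self._tuples = tuples
        self._sarg = sarg
        self._it: Optional[Iterator[Row]] = None

    def open(self) -> None:
        self._it = iter(self._tuples)

    def next(self) -> Optional[Row]:
        # The SARG is applied inside the scan, so rejected tuples
        # never cross the interface to the caller.
        if self._it is None:
            raise RuntimeError("scan is not open")
        for row in self._it:
            if self._sarg is None or self._sarg(row):
                return row
        return None  # end of scan

    def close(self) -> None:
        self._it = None


def run_scan(scan: SegmentScan) -> list:
    """Drive a scan to completion and collect the returned tuples."""
    scan.open()
    rows = []
    while (row := scan.next()) is not None:
        rows.append(row)
    scan.close()
    return rows
```

Because the predicate is evaluated before a tuple is returned, only qualifying tuples count as calls across the interface, which is why SARGs matter for the cost formula.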

Costs for Single Relation Access Path
COST = PAGE FETCHES + W * (RSI CALLS)
  PAGE FETCHES: I/O operations
  W: adjustable weighting factor between I/O and CPU
  RSI CALLS: the predicted number of tuples returned from the RSS
The number of RSI calls is a good approximation of CPU utilization.
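
As a rough illustration of the formula, the sketch below computes the cost of a few candidate paths and picks the cheapest. The function names and the idea of passing estimates as a dictionary are assumptions for the example; the numbers a caller supplies would come from the optimizer's own estimates.

```python
def access_path_cost(page_fetches, rsi_calls, w):
    """COST = PAGE FETCHES + W * (RSI CALLS)."""
    return page_fetches + w * rsi_calls


def cheapest_path(paths, w):
    """Pick the cheapest path.

    `paths` maps a path name to (page_fetches, rsi_calls); both values
    are estimates supplied by the caller.
    """
    return min(paths, key=lambda name: access_path_cost(*paths[name], w))
```

A larger W makes the choice more sensitive to the number of tuples returned (CPU), a smaller W to the number of pages read (I/O).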

Costs for Single Relation Access Path
Statistics stored in the System R catalogs
For each relation T:
  NCARD(T): the cardinality of relation T
  TCARD(T): the number of pages in the segment that hold tuples of relation T
  P(T): the fraction of data pages in the segment that hold tuples of relation T
    P(T) = TCARD(T) / (number of non-empty pages in the segment)
For each index I on relation T:
  ICARD(I): the number of distinct keys in index I
  NINDX(I): the number of pages in index I

Costs for Single Relation Access Path
SELECTIVITY FACTOR (F): roughly the expected fraction of tuples that satisfy the predicate.
e.g. column = value
  F = 1 / ICARD(column index) if there is an index on the column
  F = 1/10 otherwise (an arbitrary default)
e.g. column BETWEEN value1 AND value2
  F = (value2 - value1) / (high key value - low key value)
  This is the ratio of the BETWEEN value range to the entire key value range.
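
The two selectivity rules above can be written as small functions. This is a sketch, not System R code: the function names are invented for the example, and the fallback of 1/10 for an unindexed column follows the default in the original paper and is exposed as a parameter so it can be changed.

```python
from typing import Optional


def eq_selectivity(icard: Optional[int], default: float = 0.1) -> float:
    """F for `column = value`.

    icard is the number of distinct keys in an index on the column,
    or None when no such index exists.
    """
    if icard:
        return 1.0 / icard
    return default  # arbitrary fallback when no statistics are available


def between_selectivity(value1, value2, low_key, high_key) -> float:
    """F for `column BETWEEN value1 AND value2` on an arithmetic column."""
    return (value2 - value1) / (high_key - low_key)
```

For instance, an index with 50 distinct keys gives F = 1/50 for an equality predicate, and a BETWEEN range covering a tenth of the key range gives F = 0.1.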

Costs for Single Relation Access Path
Query cardinality (QCARD) = (product of the cardinalities of every relation in the FROM list) * (product of the selectivity factors of all boolean factors)
For a single relation, the cheapest access path is found by evaluating the cost of each available access path.
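
A small worked sketch of the QCARD estimate, assuming the cardinalities and selectivity factors are already known; the function name `qcard` and the sample numbers in the note below are made up for illustration.

```python
from math import prod


def qcard(cardinalities, selectivities):
    """Estimated query cardinality.

    cardinalities: cardinality of each relation in the FROM list
    selectivities: selectivity factor F of each boolean factor
    """
    return prod(cardinalities) * prod(selectivities)
```

With two relations of 1000 and 50 tuples and two boolean factors with F = 0.5 and F = 0.25, the estimate is 1000 * 50 * 0.5 * 0.25 = 6250 tuples.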

Access Path Selection
  Outer relation: the relation from which a tuple is retrieved first.
  Inner relation: the relation from which tuples are retrieved, depending on the values obtained in the outer relation tuple.
  Join predicate: a predicate that relates columns of the two tables to be joined.
  Join column: a column referenced in a join predicate.

Access Path Selection (for 2-way Joins)
NESTED LOOPS METHOD
  Scans the outer and inner relations in any order.
  Result: composite tuples formed from outer-relation-tuple/inner-relation-tuple pairs.
MERGING SCANS METHOD
  Both relations are scanned in join column order.
  Applied only to equi-joins.
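
The two join methods can be sketched over in-memory lists of dictionaries. This is a simplified illustration rather than the System R implementation: the function names are invented, and the merge version assumes both inputs are already sorted on their join columns.

```python
def nested_loops_join(outer, inner, outer_col, inner_col):
    """For each outer tuple, scan the whole inner relation for matches."""
    result = []
    for o in outer:
        for i in inner:
            if o[outer_col] == i[inner_col]:
                result.append({**o, **i})
    return result


def merge_scan_join(outer, inner, outer_col, inner_col):
    """Equi-join of two inputs that are already sorted on the join columns."""
    result = []
    j = 0  # start of the current group of matching inner tuples
    n = len(inner)
    for o in outer:
        key = o[outer_col]
        # Skip inner tuples whose key is smaller than the current outer key.
        while j < n and inner[j][inner_col] < key:
            j += 1
        # Emit every inner tuple with a matching key. j itself stays at the
        # start of the group so the next outer tuple with the same key
        # sees the same matches.
        k = j
        while k < n and inner[k][inner_col] == key:
            result.append({**o, **inner[k]})
            k += 1
    return result
```

The merge version never revisits inner tuples that sort before the current outer key, which is where its savings over nested loops come from.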

Access Path Selection (for N-way Joins)
With n relations there are n! possible join orders.
In joining relations t1, t2, t3, ..., tn, only those orderings ti1, ti2, ti3, ..., tin are examined in which, for all j (j = 2, ..., n), either
  (1) tij has at least one join predicate with some relation tik, where k < j, or
  (2) for all k > j, tik has no join predicate with ti1, ti2, ..., or ti(j-1).
Example: with join predicates T1 <-> T2 and T2 <-> T3, the orderings T1 - T3 - T2 and T3 - T1 - T2 are EXCLUDED.
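
The ordering rule can be checked mechanically. The sketch below enumerates all permutations and keeps those that satisfy the two conditions; it is a brute-force illustration (the function name and the representation of join predicates as pairs of relation names are assumptions), not how an optimizer would apply the rule in practice.

```python
from itertools import permutations


def allowed_orderings(relations, join_predicates):
    """Return the join orderings that survive the two-condition rule."""
    linked = {frozenset(p) for p in join_predicates}

    def has_predicate(a, b):
        return frozenset((a, b)) in linked

    def acceptable(order):
        for j in range(1, len(order)):
            earlier, later = order[:j], order[j + 1:]
            # Condition 1: the next relation joins to one already chosen.
            if any(has_predicate(order[j], e) for e in earlier):
                continue
            # Condition 2 fails if some remaining relation does join to the
            # chosen ones: that relation should have come first.
            if any(has_predicate(t, e) for t in later for e in earlier):
                return False
        return True

    return [order for order in permutations(relations) if acceptable(order)]
```

For relations T1, T2, T3 with predicates T1 <-> T2 and T2 <-> T3, four of the six orderings remain; the two that start by pairing T1 with T3 are dropped because they would force a Cartesian product.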

Access Path Selection
HOW TO FIND A “GOOD” PATH?
Build a tree of possible solutions.
STEPS TO FOLLOW
  The search finds the best way to join successively larger subsets of the relations.
  For each set of relations joined, the cardinality of the composite relation is estimated and saved.
  For the unordered join, the cheapest solution is saved.
SOLUTION
  Ordered list of the relations to be joined
  Join method used for each join
  Access plan for each relation
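
The search over subsets can be sketched as a small dynamic program. This is a deliberately simplified version under several assumptions that are not in the slides: it costs every join with a nested-loops style formula (outer cost plus outer cardinality times inner scan cost), it keeps only the single cheapest plan per subset and ignores sort orders, it does not apply the join-ordering restriction, and all input estimates are hypothetical values supplied by the caller.

```python
from itertools import combinations


def best_join_order(relations, scan_cost, card, pair_selectivity):
    """Return (cost, cardinality, order) for the cheapest left-deep join order.

    All inputs are hypothetical estimates supplied by the caller:
      scan_cost[r]      cost of the cheapest single-relation scan of r
      card[r]           cardinality of r after its local predicates
      pair_selectivity  {frozenset({a, b}): F} for each join predicate
    """
    # best[S] holds the cheapest plan found so far for the set of relations S.
    best = {frozenset([r]): (scan_cost[r], card[r], (r,)) for r in relations}

    for size in range(2, len(relations) + 1):
        for subset in combinations(relations, size):
            s = frozenset(subset)
            for inner in subset:
                rest = s - {inner}
                cost, n, order = best[rest]
                # Selectivity of every join predicate linking inner to rest.
                f = 1.0
                for r in rest:
                    f *= pair_selectivity.get(frozenset((r, inner)), 1.0)
                # Simplified cost: outer cost + N * inner scan cost.
                new_cost = cost + n * scan_cost[inner]
                new_card = n * card[inner] * f
                if s not in best or new_cost < best[s][0]:
                    best[s] = (new_cost, new_card, order + (inner,))

    return best[frozenset(relations)]
```

Because the best plan for each subset is reused when building larger subsets, the search examines on the order of 2^n subsets rather than all n! orderings.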

Computation of Costs
The costs of joins are computed from the costs of the scans on each relation.
  C-outer(path1): cost of scanning the outer relation via path1
  N: cardinality of the outer relation tuples that satisfy the applicable predicates
    N = (product of the cardinalities of all relations T of the join so far) * (product of the selectivity factors of all applicable predicates)
  C-inner(path2): cost of scanning the inner relation via path2
C-nested-loop-join(path1, path2) = C-outer(path1) + N * C-inner(path2)
C-merge(path1, path2) = C-outer(path1) + N * C-inner(path2)
The merge formula has the same form as the nested loops formula; the difference lies in C-inner, because a sorted inner relation does not have to be scanned in full for each outer tuple.
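
A minimal numeric sketch of the join cost formula, assuming the three inputs have already been estimated; the function name and the sample values in the note below are hypothetical.

```python
def join_cost(c_outer, n, c_inner):
    """C-outer(path1) + N * C-inner(path2).

    c_outer: cost of scanning the outer relation via the chosen path
    n:       estimated number of qualifying outer tuples
    c_inner: cost of one scan of the inner relation via the chosen path
    """
    return c_outer + n * c_inner
```

With C-outer = 100, N = 500 and C-inner = 4, the join costs 100 + 500 * 4 = 2100. Since C-inner is multiplied by N, a cheaper inner access path usually matters far more than a cheaper outer one.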

Example
Tables (sample values as they appear on the slide):
EMP (NAME, DNO, JOB, SAL)
  NAME: SMITH, JONES, DOE
  DNO: 50, 51
  JOB: 12, 5
  SAL: 8500, 15000, 9500
DEPT (DNO, DNAME, LOC)
  DNO: 50, 51
  DNAME: MFG, BILLING, SHIPPING
  LOC: DENVER, BOULDER
JOB (JOB, TITLE)
  JOB: 12, 5
  TITLE: CLERK, TYPIST, SALES, MECHANIC
Query:
  SELECT NAME, TITLE, SAL, DNAME
  FROM EMP, DEPT, JOB
  WHERE TITLE = 'CLERK'
    AND LOC = 'DENVER'
    AND EMP.DNO = DEPT.DNO
    AND EMP.JOB = JOB.JOB

Nested Queries
Subquery evaluated before the main query, and evaluated only once:
  SELECT NAME FROM EMPLOYEE WHERE SALARY = (SELECT AVG(SALARY) FROM EMPLOYEE)
  SELECT NAME FROM EMPLOYEE WHERE DEPNO IN (SELECT DEPNO FROM DEPARTMENT WHERE LOCATION = 'DENVER')
Correlation subquery, which must be re-evaluated for each candidate tuple from the referenced query block:
  SELECT NAME FROM EMPLOYEE X WHERE SALARY > (SELECT SALARY FROM EMPLOYEE WHERE EMPLOYEE_NUMBER = X.MANAGER)
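
The difference between the two kinds of subquery can be illustrated with plain Python over a list of employee records. The function names and the record layout (dictionaries keyed by column name) are assumptions for the example; the counter exists only to make the number of subquery evaluations visible.

```python
def names_with_average_salary(employees):
    # Uncorrelated subquery: AVG(SALARY) is computed once, up front.
    avg_salary = sum(e["SALARY"] for e in employees) / len(employees)
    return [e["NAME"] for e in employees if e["SALARY"] == avg_salary]


def names_earning_more_than_manager(employees):
    subquery_evaluations = 0
    names = []
    for x in employees:
        # Correlation subquery: it depends on X.MANAGER, so it is
        # re-evaluated for each candidate tuple X.
        subquery_evaluations += 1
        manager_salary = [e["SALARY"] for e in employees
                          if e["EMPLOYEE_NUMBER"] == x["MANAGER"]]
        if manager_salary and x["SALARY"] > manager_salary[0]:
            names.append(x["NAME"])
    return names, subquery_evaluations
```

In the first function the inner result is a constant for the whole query; in the second the inner lookup runs once per employee, so its cost is multiplied by the number of candidate tuples.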

Conclusion
The true optimal path was selected in the majority of cases.
Key contributions: expanded use of statistics, inclusion of CPU utilization in the cost calculations, and a method for determining join order.
More work on validation of the optimizer cost formulas needs to be done, but this preliminary work shows that database management systems can support non-procedural query languages with performance comparable to those supporting the current more procedural languages.

Thank you!