Advanced Databases: Lecture 6 Query Optimization (I) 1 Introduction to query processing + Implementing Relational Algebra Advanced Databases By Dr. Akhtar.

Presentation transcript:

Advanced Databases: Lecture 6 Query Optimization (I) 1 Introduction to query processing + Implementing Relational Algebra Advanced Databases By Dr. Akhtar Ali Query Optimization – I

What is Optimization
Best use of resources.
– Good time management
– Effective allocation of lecturers and labs to course units
Efficient solution to a problem.
– Quick response to a user query
Less costly.
– Solar energy vs. nuclear vs. hydro-electric power
– Minimum I/O, CPU cycles, memory space

Query Optimization
A classical component of a DBMS.
Choosing the best composition of algebraic operators to answer a query.
– A query (e.g. in SQL) may have several alternative representations in algebra.
– The optimizer selects the best possible algebraic representation.
Choosing an efficient and less costly plan to answer a query.
– One that takes less time to compute.
– One with the least cost (in terms of I/Os).
Why Query Optimization?
– To make query evaluation faster.
– To reduce the response time of the query processor.
– To allow the user to write queries without being aware of the physical access mechanisms, and without having to dictate explicitly to the system how the queries should be evaluated.

Recommended Text
Database Management Systems, by R. Ramakrishnan, Chapters 12, 13 (copy provided)
Fundamentals of Database Systems, 3rd Edition, by R. Elmasri and S. B. Navathe, Chapter 18
An Introduction to Database Systems, 7th Edition, by C. J. Date, Chapter 17

Query Processing – the clear view
[Figure: the user/application sends an SQL query to the DBMS query processor, which returns the result of the query.]

Query Processing – the clear view
[Figure: query processing pipeline. The user/application submits an SQL query. Scanning, parsing and validating produce a parse tree. The Translator turns the parse tree into a Relational Algebra query tree. The Logical Optimizer uses transformations to produce an optimized Relational Algebra query tree. The Physical Optimizer uses a cost model and database statistics to produce code to execute the query. The Runtime Database Engine executes that code and returns the result of the query. The Catalog holds meta data and the Database holds data.]

Example database schema
We will use the following schema throughout this lecture:
Sailors(sid:integer, sname:string, rating:integer, age:real)
Reserves(sid:integer, bid:integer, day:date, rname:string)
Consider the following statistics about the relations.
– Each tuple of Reserves is 40 bytes long,
– A data page can hold 100 Reserves tuples,
– The size of the Reserves relation is 1000 pages,
– Each tuple of Sailors is 50 bytes long,
– A data page can hold 80 Sailors tuples, and
– The size of the Sailors relation is 500 pages.

Translating SQL into Relational Algebra
After the SQL query has been parsed and found to be syntactically correct, it is mapped onto a Relational Algebra (RA) expression, usually shown as a query tree (bottom up).
Consider the SQL query:
SELECT S.sname
FROM Reserves R, Sailors S
WHERE R.sid = S.sid AND R.bid = 100 AND S.rating > 5
The same query in RA:
π_{sname}(σ_{bid=100 ∧ rating>5}(Reserves ⋈_{sid=sid} Sailors))
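To make the RA expression concrete, here is a minimal Python sketch that evaluates it bottom up over tiny in-memory relations. The helper names (`select`, `project`, `join`) and the sample tuples are assumptions made for illustration; they are not part of the lecture. The `project` helper keeps duplicates, like SQL without DISTINCT.

```python
def select(rows, predicate):
    """Selection: keep only the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, attrs):
    """Projection: keep only the listed attributes (duplicates are kept)."""
    return [{a: r[a] for a in attrs} for r in rows]

def join(left, right, attr):
    """Equi-join on one shared attribute, done as a simple nested loop."""
    return [{**l, **r} for l in left for r in right if l[attr] == r[attr]]

# Hypothetical sample data following the lecture's schema.
sailors = [
    {"sid": 22, "sname": "Dustin", "rating": 7, "age": 45.0},
    {"sid": 31, "sname": "Lubber", "rating": 8, "age": 55.5},
    {"sid": 58, "sname": "Rusty", "rating": 3, "age": 35.0},
]
reserves = [
    {"sid": 22, "bid": 100, "day": "1998-10-10", "rname": "Joe"},
    {"sid": 58, "bid": 100, "day": "1998-11-12", "rname": "Ann"},
    {"sid": 31, "bid": 101, "day": "1998-11-10", "rname": "Joe"},
]

# Evaluate the query tree bottom up: join, then selection, then projection.
joined = join(reserves, sailors, "sid")
filtered = select(joined, lambda t: t["bid"] == 100 and t["rating"] > 5)
result = project(filtered, ["sname"])
print(result)
```

Only the sailor with sid 22 both reserved boat 100 and has a rating above 5, so the result holds a single sname.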

Implementation of Relational Operators
We will discuss how to implement:
– Selection (σ): selects a subset of rows from a relation.
– Projection (π): picks only the required attributes and removes unwanted attributes from a relation.
– Join (⋈): combines two relations.

Access Paths
There is usually more than one way to retrieve tuples from a relation, if indexes are available and if the query contains selection conditions. The selection condition comes from a select or a join. The alternative ways to retrieve tuples from a relation are called access paths.
An access path is either:
– A file scan (when there is no selection condition or no index can be used).
– An index plus a matching selection condition. For example, attr op value, where op is a comparison operator (e.g. <, >, =) and there is an index available on attr.

Implementing Selection operator
The implementation depends on the available file organizations, that is, whether we have:
– No index available and the physical file for the given relation is unsorted. This is the most expensive case.
– No index, but the file is sorted on some attribute.
– A B+ tree index is available.
– A Hash index is available.
The cost of the selection operator differs in each of these cases, and that is the main thing to know.

Selection Operator – an Example Query
Consider the following query:
SELECT * FROM Reserves WHERE rname = 'Joe'
Assume that 100 tuples qualify for the result of the above query, that is, 100 tuples have rname = 'Joe'.

Selection using no index & no sorting
For a general selection query σ_{R.attr op value}(R), we have to scan the entire file to get the qualifying tuples. Note that op can be <, >, =, <>, etc.
Each tuple is tested to see whether the given condition (R.attr op value) holds. If the condition holds, the tuple is added to the result.
The cost of this approach is M I/Os, where M is the number of pages in R.
For the example query, the cost is 1000 I/Os because there are 1000 pages in the Reserves relation.
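The full file scan can be sketched in Python as a loop over pages that counts one I/O per page read. This is only a simulation under stated assumptions: the relation is modelled as a list of pages held in memory, and `make_reserves` is a hypothetical generator that marks every 1000th tuple with rname 'Joe' so that exactly 100 tuples qualify.

```python
def file_scan_select(pages, predicate):
    """Scan every page, test every tuple, and count one I/O per page read."""
    page_reads = 0
    result = []
    for page in pages:
        page_reads += 1              # reading a page costs one I/O
        for tup in page:
            if predicate(tup):
                result.append(tup)
    return result, page_reads

def make_reserves(num_pages=1000, tuples_per_page=100):
    """Hypothetical Reserves file: 1000 pages of 100 tuples, 100 'Joe' tuples."""
    pages = []
    n = 0
    for _ in range(num_pages):
        page = []
        for _ in range(tuples_per_page):
            rname = "Joe" if n % 1000 == 0 else "Other"
            page.append({"sid": n, "bid": n % 50, "rname": rname})
            n += 1
        pages.append(page)
    return pages

matches, page_reads = file_scan_select(
    make_reserves(), lambda t: t["rname"] == "Joe"
)
print(len(matches), page_reads)
```

The number of page reads equals M regardless of how many tuples qualify, which is why this access path is the most expensive one.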

Binary Search – Divide & Conquer
An algorithm for searching for an element in a sorted array or file.
Algorithm BinarySearch(A, k, low, high):
Input: a sorted array A storing n items in ascending order, a search key k, and integers low and high.
Output: the element of A equal to k if it exists, or the special element NoSuchKey otherwise.
  if low > high then
    return NoSuchKey
  else
    mid = (low + high)/2   /* rounded down to an integer */
    if k = A[mid] then
      return A[mid]
    else if k < A[mid] then
      return BinarySearch(A, k, low, mid – 1)
    else
      return BinarySearch(A, k, mid + 1, high)
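The BinarySearch pseudocode translates almost line by line into Python. The sketch below assumes `None` plays the role of NoSuchKey and uses integer division for the midpoint; the sample array is hypothetical.

```python
NO_SUCH_KEY = None  # assumption: None stands in for the special NoSuchKey element

def binary_search(a, k, low, high):
    """Recursive binary search over the sorted slice a[low..high] (inclusive)."""
    if low > high:
        return NO_SUCH_KEY
    mid = (low + high) // 2          # midpoint, rounded down
    if k == a[mid]:
        return a[mid]
    if k < a[mid]:
        return binary_search(a, k, low, mid - 1)
    return binary_search(a, k, mid + 1, high)

# Hypothetical sorted array of 16 items.
items = [2, 4, 5, 7, 8, 9, 12, 14, 17, 19, 22, 25, 27, 28, 33, 37]
found = binary_search(items, 22, 0, len(items) - 1)
missing = binary_search(items, 23, 0, len(items) - 1)
print(found, missing)
```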

Binary Search – Divide & Conquer …
Suppose that in this array we are searching for item 22.
[Figure: example sorted array used to trace the search.]

Binary Search – Divide & Conquer …
Initially, the number of candidate items is n.
After the first call to BinarySearch, it is at most n/2.
After the second call to BinarySearch, it is at most n/4, i.e. n/2^2.
After the i-th call to BinarySearch, the number of items remaining is at most n/2^i.
Let m be the maximum number of recursive calls performed (m < n). The recursion stops once no candidate items remain, so we can say: n/2^m < 1
In other words: m > log_2 n
Thus: m = ⌊log_2 n⌋ + 1
Hence: the binary search algorithm runs in O(log_2 n) time, i.e. in the order of log_2 n.
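The bound m = ⌊log_2 n⌋ + 1 can be checked empirically. The sketch below counts the calls that actually inspect a midpoint in the worst case. It assumes the midpoint is rounded down, in which case the worst case is a key larger than every item, because the right half is never smaller than the left half. The function name is hypothetical.

```python
import math

def worst_case_probes(n):
    """Count midpoint inspections when the key is larger than all n items."""
    low, high = 0, n - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        low = mid + 1                # key is larger, so always go right
    return probes

sizes = (1, 2, 3, 4, 1000, 100000)
for n in sizes:
    # Compare the measured count with the closed form from the derivation.
    print(n, worst_case_probes(n), math.floor(math.log2(n)) + 1)
```

For n = 1000 this gives 10 inspections, which matches the figure used for the 1000-page Reserves file.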

Selection using sorting but no index
For a general selection query σ_{R.attr op value}(R), if R is physically sorted on R.attr, we use a binary search to locate the first qualifying tuple. We then keep testing the condition on the tuples in every page that is scanned, adding them to the result, until the condition fails to hold.
The cost of this approach is the cost of the binary search plus the number of pages read afterwards:
– The cost of binary search = log_2 M I/Os
– The cost of retrieving tuples = T I/Os, where T is the number of pages scanned to retrieve the qualifying tuples.
For the example query, the cost is computed as follows:
– The binary search cost = log_2 1000 = log 1000 / log 2 ≈ 9.97 ≈ 10 I/Os.
– Since the number of qualifying tuples is 100, 1 page will hold these tuples and scanning that page will cost 1 I/O.
– So the total cost is 10 + 1 = 11 I/Os.
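The cost formula for a sorted file can be written as a small Python function. This is a sketch under the lecture's assumptions: the binary search cost is log_2 M rounded up, and the qualifying tuples are packed into as few pages as possible. The function and parameter names are hypothetical.

```python
import math

def sorted_file_cost(num_pages, qualifying_tuples, tuples_per_page):
    """I/O cost of a selection on a file sorted on the selection attribute."""
    search_cost = math.ceil(math.log2(num_pages))                  # binary search
    scan_cost = math.ceil(qualifying_tuples / tuples_per_page)     # T pages
    return search_cost + scan_cost

# The lecture's example: M = 1000 pages, 100 qualifying tuples, 100 per page.
example_cost = sorted_file_cost(1000, 100, 100)
print(example_cost)

# The search part grows only logarithmically with the file size.
for pages in (1000, 10000, 1000000):
    print(pages, sorted_file_cost(pages, 100, 100))
```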

B+ tree Index
A B+ tree index is a balanced tree in which the internal nodes (the top two levels) direct the search and the leaf nodes contain the data entries.
Searching for a record requires just a traversal from the root to the appropriate leaf node.
The length of the path from the root to a leaf is called the height of the tree (usually 2 or 3).
To search for entry 9*, we follow the leftmost child pointer from the root (as 9 6).
Once at the leaf node, data entries can be found sequentially.
Leaf nodes are inter-connected, which makes the index suitable for range queries.
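The root-to-leaf traversal and the sequential leaf scan can be sketched with a tiny hand-built tree. Everything below is an assumption for illustration: the class names, the key values, and the convention that child i of an internal node holds keys k with keys[i-1] <= k < keys[i]. It is not the tree from the lecture's figure, and it leaves out insertion and node splitting.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, entries):
        self.entries = entries   # sorted data entries
        self.next = None         # link to the next leaf, used by range scans

class Internal:
    def __init__(self, keys, children):
        self.keys = keys         # sorted separator keys
        self.children = children # always len(keys) + 1 children

def find_leaf(root, key):
    """Follow child pointers from the root; return (leaf, pointers_followed)."""
    node, hops = root, 0
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
        hops += 1
    return node, hops

def range_scan(root, lo, hi):
    """Return all entries e with lo <= e <= hi by walking the leaf chain."""
    leaf, _ = find_leaf(root, lo)
    out = []
    while leaf is not None:
        for e in leaf.entries:
            if e > hi:
                return out
            if e >= lo:
                out.append(e)
        leaf = leaf.next
    return out

# Hypothetical tree of height 2: one root, two internal nodes, four leaves.
leaves = [Leaf([2, 3]), Leaf([5, 7]), Leaf([9, 11]), Leaf([14, 16])]
for a, b in zip(leaves, leaves[1:]):
    a.next = b
root = Internal([9], [Internal([5], leaves[:2]), Internal([14], leaves[2:])])

leaf, hops = find_leaf(root, 9)
print(leaf.entries, hops)
print(range_scan(root, 5, 11))
```

The range scan descends the tree once and then only follows leaf links, which is the property that makes the index suitable for range queries.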

Selection using B+ tree index
For a general selection query σ_{R.attr op value}(R), a B+ tree index is the best choice when op is not equality (e.g. a range comparison such as < or >). It is also good for the = operator.
We search the B+ tree to find the first page that contains a qualifying tuple. Assume that the tree index is clustered. We then read all the pages that contain the qualifying tuples.
The cost of this approach is equal to the sum of the following:
– The cost of identifying the starting page = 2 or 3 I/Os. We assume 2 I/Os throughout.
– The cost of retrieving tuples = T I/Os, where T is the number of pages scanned to retrieve the qualifying tuples.
For the example query, the cost is computed as follows:
– Since the number of qualifying tuples is 100, 1 page will hold these tuples and scanning that page will cost 1 I/O.
– So the total cost is 2 + 1 = 3 I/Os.

Hash Index
A function called a hash function is applied to the hash field value (key field) to get the address of the disk page in which the record is stored.
A bucket is a set of records.
The directory is an array of size n (4 in the figure); each element is a pointer to a bucket.
To search for a data entry:
– The hash function is applied to the search field, and the last two bits of its binary form are used to get a number between 0 and 3.
– This number gives the array position from which to get the pointer to the desired bucket.
– To locate a record with key field 5 (binary 101), we look at directory element 01 and follow the pointer to the data page (Bucket B).
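The directory lookup can be sketched in a few lines of Python. The sketch assumes an identity hash function on integer keys and a directory of 4 entries addressed by the last two bits; the bucket contents are hypothetical, chosen only so that each key sits in the bucket its last two bits point to.

```python
def h(key):
    """Assumption: an identity hash function on integer keys."""
    return key

def directory_slot(hash_value, bits=2):
    """Use the last `bits` bits of the hashed value as the directory index."""
    return hash_value & ((1 << bits) - 1)

# Hypothetical buckets; the directory maps slots 00, 01, 10, 11 to A, B, C, D.
buckets = {"A": [4, 12, 32, 16], "B": [1, 5, 21], "C": [10], "D": [15, 7, 19]}
directory = ["A", "B", "C", "D"]

def lookup(key):
    """Return the bucket name the key hashes to and whether the key is there."""
    name = directory[directory_slot(h(key))]
    return name, key in buckets[name]

print(directory_slot(5))   # 5 is binary 101, so the last two bits are 01
print(lookup(5))
```

Because the slot is derived from the key alone, an equality search touches one directory entry and one bucket, but the layout gives no help for range conditions.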

Selection using Hash Index
For a general selection query σ_{R.attr op value}(R), a hash index is the best choice when op is equality (=). It is not good for non-equality operators (e.g. <>).
We retrieve the index page that contains the rids (record identifiers) of the qualifying tuples. Then the pages that contain these tuples are scanned.
The cost of this approach is equal to the sum of the following:
– The cost to retrieve the index page = 1 I/O
– The cost of retrieving tuples = T I/Os, where T is the number of pages scanned to retrieve the qualifying tuples.
– For non-equality operators, T = the number of qualifying tuples.
For the example query, the cost is computed as follows:
– Since the number of qualifying tuples is 100, 1 page will hold these tuples and scanning that page will cost 1 I/O.
– So the total cost is 1 + 1 = 2 I/Os.

Implementation of Selection (summary)
Assuming σ_{R.attr op value}(R):
No index is available on attr and R is not sorted on attr
– Cost = M I/Os, where M is the number of pages in R.
No index is available on attr and R is sorted on attr
– Cost = log_2 M + T I/Os, where T is the number of pages read for retrieving the qualifying tuples.
B+ Tree Index (clustered) is available on attr
– Cost = B + T I/Os, where B is the height of the index (i.e. 2).
Hash Index (clustered) is available on attr
– If attr is not a primary key: Cost = H + T I/Os, where H (i.e. 1) is the I/O required to obtain the rids of the qualifying tuples.
– If attr is a primary key: Cost = (H + 1) * TP I/Os, where TP is the number of the qualifying tuples.
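The four cost formulas can be put side by side for the example query. The sketch below is a plain restatement of the summary under its assumptions (B = 2, H = 1, clustered indexes, log_2 M rounded up); the function names are hypothetical, and the primary-key variant of the hash formula is left out.

```python
import math

def cost_unsorted(M):
    """No index, unsorted file: scan all M pages."""
    return M

def cost_sorted(M, T):
    """No index, sorted file: binary search plus T pages of qualifying tuples."""
    return math.ceil(math.log2(M)) + T

def cost_btree(T, B=2):
    """Clustered B+ tree index of height B plus T pages of qualifying tuples."""
    return B + T

def cost_hash(T, H=1):
    """Clustered hash index: H I/Os for the rids plus T pages of tuples."""
    return H + T

# Example query: Reserves has M = 1000 pages, 100 tuples per page,
# and 100 qualifying tuples, so T = 1 page.
M = 1000
T = math.ceil(100 / 100)
costs = {
    "file scan": cost_unsorted(M),
    "sorted file": cost_sorted(M, T),
    "B+ tree index": cost_btree(T),
    "hash index": cost_hash(T),
}
for path, cost in costs.items():
    print(f"{path}: {cost} I/Os")
```

A physical optimizer compares exactly this kind of estimate across the available access paths and picks the cheapest one.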

Summary of the Lecture
Query Optimization
– What and why
Query Processing
– The various stages through which a query goes
Translation of SQL into Relational Algebra
– Internal representation of the query
Access Paths
– Different paths and ways to get the same data
Implementation of the Selection Operator
– Different ways of evaluating selection using different access paths