Copyright © Curt Hill 2003-2008 Query Evaluation Translating a query into action.



Why?
Relational algebra is the means for accessing the tables
  – Relational calculus must be translated into algebra before it is executed
Each relational operator may be implemented in several ways
No one implementation is always best

What factors are involved?
The organization of the file
  – Existing sort order
The number and types of indexes
The size of the tables
  – Both the cardinality and the number of pages
Memory space available in the buffer pool
The source of much of this information is the system catalog

What is really needed?
The join is the most expensive common operator
  – At its heart is a Cartesian product
  – This needs M by N accesses
This is where most, but not all, optimizations occur
Most of what will be discussed applies more to join evaluation than to the other operators

The M by N problem
Want the Cartesian product of two tables, A and B
  – A contains M pages and B contains N pages
For each page read from A, read all N pages of B to connect them
M × N accesses
We desperately want to avoid this
[Figure: table A (M pages) beside table B (N pages)]
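The access count above can be checked with a small sketch. It assumes a page-at-a-time nested loop with no buffering beyond one page per table; the function name `nested_loops_page_reads` is hypothetical. Note that the total also includes the M reads of A itself, so it comes to M + M × N.

```python
def nested_loops_page_reads(m_pages: int, n_pages: int) -> int:
    """Count page reads for a page-at-a-time Cartesian product of A and B."""
    reads = 0
    for _ in range(m_pages):       # read each page of A once
        reads += 1
        for _ in range(n_pages):   # rescan all of B for that page of A
            reads += 1
    return reads


# 100 pages of A and 50 pages of B: 100 + 100 * 50 reads
print(nested_loops_page_reads(100, 50))  # prints 5100
```

Doubling either table roughly doubles the cost, which is why the techniques that follow try to avoid rescanning B.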

File Techniques
Indexing
  – Use the index to examine only the involved tuples
  – If the key is the only field of interest, the index may suffice without accessing the data
Iteration
  – Examine all the tuples sequentially
Partitioning
  – Group the tuples by sort key to avoid the M by N problem
  – Sorting and hashing do this

Access Paths
A way of retrieving one or more tuples from a relation
Typically two ways
  – Scan the entire file
  – Use an index to obtain the record directly
Every relational operator uses one or two tables, so this is important

Conjunctive Normal Form
A series of comparisons with ANDs connecting them
All comparisons are of the form: Attr Op Value
  – Attr is a field name
  – Op is a comparison such as =, >, etc.
  – Value is a constant
  – Each comparison is called a conjunct
CNF has nothing to do with the table normal forms 1NF, 2NF, 3NF, BCNF, among others

CNF Examples
Courses table
  – Number = 160 AND Dept = 'CS'
Grades table
  – Dept = 'CS' AND Score > 89
Join
  – S.Naid = G.Naid
  – S.Naid becomes a constant while iterating through the S table

Using a Hash Index
Conjunct must use an equality
Whole key must be
  – Indexed by the hash index
  – Used in equality conjuncts
Example from the Grades table:
  – Naid = 2013 AND Dept = 'CS' AND Number = 160

Using a B+Tree index
Any comparison operator may be used
The conjuncts need only match a prefix of the B+Tree key
Example
  – Naid = 2013 AND Dept = 'CS' AND Number = 160
  – Only Dept and Number are actually indexed

Indexing CNF
The form may include conjuncts that are not indexed
Those conjuncts that are indexed are called the primary conjuncts
A form may have two separate sets of conjuncts indexed by different indexes
  – Either set can be retrieved and then the others checked against the result
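The matching rules for hash and B+Tree indexes can be sketched as two small checks. This is only an illustration under assumptions: conjuncts are modeled as `(attr, op, value)` tuples, an index is modeled as a tuple of key attribute names, and the function names are hypothetical.

```python
def hash_index_matches(index_key, conjuncts):
    """A hash index is usable only if every key attribute has an equality conjunct."""
    equality_attrs = {attr for attr, op, _ in conjuncts if op == "="}
    return all(attr in equality_attrs for attr in index_key)


def btree_primary_conjuncts(index_key, conjuncts):
    """Return the conjuncts that cover the longest prefix of the B+Tree key."""
    primary = []
    for attr in index_key:
        matching = [c for c in conjuncts if c[0] == attr]
        if not matching:
            break                  # the prefix ends at the first uncovered attribute
        primary.extend(matching)
    return primary


# The Grades example: Naid = 2013 AND Dept = 'CS' AND Number = 160
cnf = [("Naid", "=", 2013), ("Dept", "=", "CS"), ("Number", "=", 160)]

hash_ok = hash_index_matches(("Naid", "Dept", "Number"), cnf)
primary = btree_primary_conjuncts(("Dept", "Number"), cnf)
residual = [c for c in cnf if c not in primary]
print(hash_ok, primary, residual)
```

With a B+Tree on (Dept, Number), the Dept and Number conjuncts are primary and the Naid conjunct is left over to be checked against each retrieved tuple.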

Selectivity of Access Paths
Number of pages fetched to obtain all records
  – This includes index and data pages
The most selective path accesses the fewest pages
  – This minimizes the retrieval costs
  – Not always predictable

Reduction
Each conjunct reduces the number of tuples that could be included in a query
  – This is the reduction factor
  – Each is a probability
The probability of a conjunction of independent conditions is the product of their probabilities
The conjuncts are often not independent, but the product is still a good estimate
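A minimal sketch of the estimate, assuming the conjuncts are independent. The function names and the sample numbers are illustrative, not taken from the slides.

```python
from math import prod


def combined_reduction(factors):
    """Product of the per-conjunct reduction factors (independence assumed)."""
    return prod(factors)


def estimated_tuples(cardinality, factors):
    """Expected number of tuples that survive all conjuncts."""
    return cardinality * combined_reduction(factors)


# Hypothetical table of 10,000 tuples; one conjunct keeps 10%, another keeps 50%
print(estimated_tuples(10_000, [0.1, 0.5]))
```

If the conjuncts are correlated the true count can differ noticeably, so the result is best treated as an estimate for comparing plans.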

Selection
Use an index
  – If all the fields in the selection criteria are indexed
  – If only some of the selection criteria fields are indexed, retrieve the tuples that match those and then reduce them with the other criteria
Sequential scan
  – The criteria are tested on each tuple
  – Always possible
If both are available
  – Use the cost estimators to determine which is most selective

Selection Example
Suppose:
  – File of 400 pages
  – Criteria X > 300 AND Y = 123
  – Clustered B+Tree index on X
Use the index to find the first tuple with X > 300, then scan sequentially from this location, testing Y = 123 on each tuple
  – This works because the table is sorted by the B+Tree key

Filter Threshold
Suppose there is a choice between scanning sequentially and accessing through a tree index
Each tree access needs several page accesses
If only a few records are to be accessed, the index approach will generate the fewest accesses
If many records are to be accessed, the scan will be best
We can calculate a reduction probability and choose the best approach

Selection Example
Suppose:
  – A page contains 20 records
  – Three page accesses are needed to reach a record via the index
What is the filter threshold?
  – If 1 record in 20 (5%) is wanted, a sequential scan is clearly desirable, since on average every page holds a wanted record
  – Since it takes three accesses to get a record through the index, divide the 5% by 3 to get about 1.67%

Example Continued
The filter threshold is about 1.67% (5% divided by 3)
Estimate the reduction factor using the product of the conjunct probabilities
If the reduction probability is greater than the threshold, do a sequential scan
If the reduction probability is less than the threshold, use the index
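The threshold arithmetic can be written out as a short sketch. It assumes the simple cost model of the example: a scan costs one page access per page of records, and each indexed record costs a fixed number of page accesses. The function names are hypothetical.

```python
def filter_threshold(records_per_page, accesses_per_index_lookup):
    """Selectivity at which the index and the sequential scan cost the same.

    For N records, the scan costs N / records_per_page page accesses and the
    index costs p * N * accesses_per_index_lookup, so the break-even p is
    1 / (records_per_page * accesses_per_index_lookup).
    """
    return 1.0 / (records_per_page * accesses_per_index_lookup)


def choose_access_path(reduction_factor, records_per_page, accesses_per_index_lookup):
    """Pick the cheaper access path under this simple cost model."""
    threshold = filter_threshold(records_per_page, accesses_per_index_lookup)
    return "index" if reduction_factor < threshold else "sequential scan"


# 20 records per page, three page accesses per indexed record
print(filter_threshold(20, 3))           # 1/60, roughly 1.67%
print(choose_access_path(0.01, 20, 3))   # below the threshold
print(choose_access_path(0.05, 20, 3))   # above the threshold
```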

Projection
If the required fields are all in the index, we may not need to access the data at all
  – It does not matter whether the index is clustered or not
If duplicates are to be eliminated
  – Sort or hash the results to do the elimination

Join
Very important
Very common
Very well studied
Many algorithms
Most DBMSs use more than one algorithm
We will consider three

Index Nested Loops Join
Select … From t1 as A, t2 as B … Where A.x = B.y
  – B is indexed on y
Scan A sequentially
Use B's index to find whether each x value exists in B
The quality of the index determines how many accesses are needed
Desirable if we can restrict one of the files before accessing
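A minimal in-memory sketch of the idea, assuming a Python dictionary stands in for the index on B.y. The helper names and the sample rows are hypothetical.

```python
from collections import defaultdict


def build_index(rows, key):
    """Stand-in for an existing index: maps each key value to its rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def index_nested_loops_join(outer_rows, inner_index, outer_key):
    """Scan the outer table once and probe the inner table's index per row."""
    result = []
    for a in outer_rows:                                 # sequential scan of A
        for b in inner_index.get(a[outer_key], []):      # index probe into B
            result.append((a, b))
    return result


a_rows = [{"x": 1, "p": "a1"}, {"x": 2, "p": "a2"}, {"x": 4, "p": "a4"}]
b_rows = [{"y": 1, "q": "b1"}, {"y": 1, "q": "b1b"},
          {"y": 2, "q": "b2"}, {"y": 3, "q": "b3"}]

b_index = build_index(b_rows, "y")
joined = index_nested_loops_join(a_rows, b_index, "x")
print(len(joined))  # prints 3
```

In a real system each probe costs a few page accesses rather than a dictionary lookup, which is why the quality of the index matters.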

Sort Merge Join
AKA Zipper Join
Sort both tables on the join field
Does not require an index
It is much less expensive if one or both files are already sorted on the field
The accesses are M + N once sorted
The sort itself is N log N + M log M

The Sort Merge Join
[Figure: two tables sorted on the join field]
Faculty: 1024 a, 1092 b, 1233 c, 1279 d
Schedule: 1024 r, 1024 s, 1024 t, 1068 u, 1092 v, 1092 w, 1279 x, 1279 y, 1279 z
Only one scan of each is needed.
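The merge step can be sketched in memory as follows. The rows are the ones shown in the figure; treating the four lettered a to d as Faculty and the rest as Schedule is an assumption, as is the function name `sort_merge_join`.

```python
def sort_merge_join(left, right, key=lambda row: row[0]):
    """Sort both inputs on the join key, then merge them in a single pass."""
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1                      # left key has no partner yet, advance left
        elif lk > rk:
            j += 1                      # right key has no partner, advance right
        else:
            # Find the run of right rows that share this key
            j_end = j
            while j_end < len(right) and key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against that run
            while i < len(left) and key(left[i]) == lk:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result


faculty = [(1024, "a"), (1092, "b"), (1233, "c"), (1279, "d")]
schedule = [(1092, "v"), (1068, "u"), (1024, "t"), (1024, "s"), (1024, "r"),
            (1092, "w"), (1279, "x"), (1279, "y"), (1279, "z")]

pairs = sort_merge_join(faculty, schedule)
print(len(pairs))  # prints 8
```

Faculty 1233 and schedule 1068 have no partner and drop out, and neither sorted list is revisited once the merge has passed a key.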

Hash Join
Sorting partitions the records so that all the possible candidates for a join can be considered at once
Hash join also partitions, but uses a hash instead of a sort
Two phases
  – Partitioning
  – Probing

Process
Partition phase
  – Hash file R into the partitioned hash file r
  – Hash file S into the partitioned hash file s, using the same hash function
Probe phase
  – For each of the k partitions
      Read the partition in r
      Hash each tuple into an in-memory hash table using a new hash function
      Read the matching partition in s
      Find matches with the new hash function
      Make joins and flush to output

Which to use?
This is where the system catalog information plays into the process
Given the right data, any of these three can be the best
So we test the possibilities
  – Construct the alternative plans
  – Compute the estimated accesses for each
  – Choose the lowest
Considering the rest of the query is also important

Query Evaluation Plan
A plan is a tree of relational operators
Joins, unions, and intersections have two descendants (relations)
Projection and selection have just one
These can be augmented with access paths

A Query
Consider the following query
Select s.name, g.dept, g.score
From grades g, students s
Where s.naid = g.naid AND g.score > 79 AND g.dept = 'CS'

A Plan
[Figure: plan tree, root at the top]
Project s.name, g.dept, g.score
  Select g.dept = 'CS'
    Select g.score > 79
      Join s.naid = g.naid
        Students
        Grades

Optimization of Plans
We can reorganize the tree to produce a better plan
In this case, push the selections below the join
The join then has far fewer input tuples
This makes the index nested loops join, or any other join, much less expensive
Selections may be applied upon input
  – Thus they do not cause any additional accesses

A Revised Plan
[Figure: plan tree, root at the top]
Project s.name, g.dept, g.score
  Join s.naid = g.naid
    Students
    Select g.score > 79
      Select g.dept = 'CS'
        Grades
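The effect of pushing the selections below the join can be seen in a small sketch. The sample rows and helper names are invented for illustration, and the selection uses g.dept = 'CS' as written in the query. Both plans return the same answer, but the revised plan feeds fewer Grades tuples into the join.

```python
students = [
    {"naid": 1, "name": "Ann"},
    {"naid": 2, "name": "Bob"},
    {"naid": 3, "name": "Cy"},
]
grades = [
    {"naid": 1, "dept": "CS", "score": 91},
    {"naid": 1, "dept": "MATH", "score": 85},
    {"naid": 2, "dept": "CS", "score": 70},
    {"naid": 3, "dept": "CS", "score": 80},
]


def keep(g):
    """The two selections: g.score > 79 AND g.dept = 'CS'."""
    return g["score"] > 79 and g["dept"] == "CS"


def join(ss, gs):
    """Join on s.naid = g.naid (a plain nested loop is enough for the sketch)."""
    return [(s, g) for s in ss for g in gs if s["naid"] == g["naid"]]


def project(pairs):
    """Project s.name, g.dept, g.score."""
    return [(s["name"], g["dept"], g["score"]) for s, g in pairs]


def select_after_join():
    """Original plan: join everything, then select. Returns (rows, Grades tuples joined)."""
    joined = join(students, grades)
    return project([(s, g) for s, g in joined if keep(g)]), len(grades)


def select_before_join():
    """Revised plan: select on Grades first, then join the survivors."""
    filtered = [g for g in grades if keep(g)]
    return project(join(students, filtered)), len(filtered)


rows_after, joined_after = select_after_join()
rows_before, joined_before = select_before_join()
print(rows_after == rows_before, joined_after, joined_before)  # prints True 4 2
```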

Other optimizations
Pushing the selection below (that is, before) the join makes the table smaller and the join easier
A projection also makes a table smaller by eliminating columns
Eliminate any column that does not appear in the rest of the query
This will reduce the number of pages and thus the cost of the join
Projections may be done as the table is read or written
  – Thus they do not add accesses

Another Revised Plan
[Figure: plan tree, root at the top]
Project s.name, g.dept, g.score
  Join s.naid = g.naid
    Project s.name, s.naid
      Students
    Project g.naid, g.dept, g.score
      Select g.score > 79
        Select g.dept = 'CS'
          Grades