
Completing the Physical-Query-Plan

Query compiler so far
– Parsed the query.
– Converted it to an initial logical query plan.
– Improved that logical query plan with transformations.
– Started selecting the physical query plan by enumerating and cost-estimating the options.
– Also focused on the enumeration, cost estimation, and ordering of joins of several relations.

Now what?
Several steps are still needed to turn the logical plan into a complete physical query plan. The principal issues are:
– Selection of algorithms to implement the operations of the query plan.
– Decisions regarding when intermediate results will be materialized and when they will be pipelined.
– Details regarding access methods for stored relations.

Choosing a Selection Method
We’ve talked about:
– The obvious implementation of a σ_C(R) operator, where we access the entire relation R and see which tuples satisfy condition C.
– The possibility that C was of the form "attribute equals constant" and we had an index on that attribute. If so, we can find the tuples that satisfy condition C without looking at all of R.
Consider now the generalization of this problem, where the selection condition is the AND of several conditions.
– Assume at least one condition is of the form A = c or A < c, where A is an attribute with an index and c is a constant.
– We limit our discussion to physical plans that:
1. Retrieve all tuples that satisfy a comparison for which an index exists, using an index-scan physical operator.
2. Consider each tuple selected in (1) to decide whether it satisfies the rest of the selection condition. The physical operator that performs this step is called Filter.
– We’ll also consider the plan that uses no index but reads the entire relation (using the table-scan physical operator) and passes the tuples to the Filter operator.
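The two plan shapes above (index-scan followed by Filter, and table-scan followed by Filter) can be sketched with Python generators. This is a minimal illustration, not a real DBMS operator: the toy relation R and the helpers build_index, table_scan, index_scan, and filter_op are hypothetical, and an in-memory dict stands in for a stored index, so no disk I/O is modeled.

```python
from collections import defaultdict

# Toy relation R(x, y, z); each tuple is a dict from attribute name to value.
R = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 1, "y": 7, "z": 1},
    {"x": 4, "y": 2, "z": 9},
    {"x": 1, "y": 2, "z": 8},
]


def build_index(rel, attr):
    """Hypothetical stand-in for a stored index: attribute value -> matching tuples."""
    index = defaultdict(list)
    for t in rel:
        index[t[attr]].append(t)
    return index


def table_scan(rel):
    """Table-scan: produce every tuple of the relation."""
    yield from rel


def index_scan(index, value):
    """Index-scan for A = c: touch only the tuples with that value."""
    yield from index.get(value, [])


def filter_op(tuples, pred):
    """Filter: pass on only the tuples that satisfy the remaining condition."""
    for t in tuples:
        if pred(t):
            yield t


def full_condition(t):
    return t["x"] == 1 and t["y"] == 2 and t["z"] < 5


def rest_of_condition(t):
    # The full condition minus the term x = 1 that the index already enforced.
    return t["y"] == 2 and t["z"] < 5


idx_x = build_index(R, "x")
via_index = list(filter_op(index_scan(idx_x, 1), rest_of_condition))
via_table_scan = list(filter_op(table_scan(R), full_condition))
print(via_index)                    # [{'x': 1, 'y': 2, 'z': 3}]
print(via_index == via_table_scan)  # True
```

Both plans return the same tuples; the index-based plan simply examines fewer of them, which is where its cost advantage comes from.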

Selection Costs
– The cost of the table-scan algorithm coupled with a filter step is B(R).
– The cost of a plan that picks an equality term such as a = 10 for which an index on attribute a exists, uses index-scan to find the matching tuples, and then filters the retrieved tuples to see if they satisfy the full condition C is:
B(R)/V(R, a) if the index is clustering, and
T(R)/V(R, a) if the index is not clustering.
– The cost of a plan that picks an inequality term such as b < 20 for which an index on attribute b exists, uses index-scan to retrieve the matching tuples, and then filters the retrieved tuples to see if they satisfy the full condition C is (using the usual estimate that an inequality comparison matches about one third of the tuples):
B(R)/3 if the index is clustering, and
T(R)/3 if the index is not clustering.

Example: σ_{x=1 AND y=2 AND z<5}(R)
R(x, y, z) with T(R) = 5000, B(R) = 200, V(R, x) = 100, and V(R, y) = 500. There are indexes on all of x, y, and z, but only the index on z is clustering. Options for implementing this selection:
1. Table-scan followed by filter. Cost is B(R), or 200 disk I/O's.
2. Index-scan to find the tuples with x = 1, then filter for y = 2 and z < 5. Since there are about T(R)/V(R, x) = 50 tuples with x = 1, and the index is not clustering, we require about 50 disk I/O's.
3. Index-scan to find the tuples with y = 2, then filter for x = 1 and z < 5. Cost for using this nonclustering index is about T(R)/V(R, y) = 5000/500, or 10 I/O's.
4. Index-scan to find the tuples with z < 5, then filter for x = 1 and y = 2. Cost is about B(R)/3 ≈ 67 I/O's (since the index on z is clustering).
The least-cost plan is the third!
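The cost comparison in this example can be reproduced with a few lines of Python that apply the selection-cost formulas directly. This is a sketch: the Stats class and the function names are hypothetical, and the formulas are exactly those listed under Selection Costs, including the one-third estimate for inequalities.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    T: int     # T(R): number of tuples
    B: int     # B(R): number of blocks
    V: dict    # V(R, a): number of distinct values of each attribute


def table_scan_cost(s: Stats) -> float:
    # Table-scan followed by Filter reads every block of R.
    return s.B


def index_eq_cost(s: Stats, attr: str, clustering: bool) -> float:
    # Index-scan on attr = c, then Filter for the other conditions.
    return (s.B if clustering else s.T) / s.V[attr]


def index_range_cost(s: Stats, clustering: bool) -> float:
    # Index-scan on attr < c: assume about one third of R matches.
    return (s.B if clustering else s.T) / 3


stats = Stats(T=5000, B=200, V={"x": 100, "y": 500})
plans = {
    "table-scan + filter": table_scan_cost(stats),
    "index on x, filter y, z": index_eq_cost(stats, "x", clustering=False),
    "index on y, filter x, z": index_eq_cost(stats, "y", clustering=False),
    "index on z, filter x, y": index_range_cost(stats, clustering=True),
}
for name, cost in plans.items():
    print(f"{name}: {cost:.0f} disk I/O's")
best = min(plans, key=plans.get)
print("cheapest:", best)  # cheapest: index on y, filter x, z
```

The printed costs are 200, 50, 10, and 67 disk I/O's, matching options 1 to 4.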

Choosing a Join Method
We’ve seen the costs associated with the various join algorithms.
– If we know how many buffers are available to perform the join, we can apply the formulas for sort-joins, hash-joins, and indexed joins, and choose a join method.
What if we don’t know the number of buffers? Then some principles are:
– Call for the one-pass join, hoping that the buffer manager can devote enough buffers to the join, or come close enough that thrashing is not a major cost.
– Or choose a nested-loop join, hoping that the left argument will not have to be divided into too many pieces, so that the join will be reasonably efficient.
– Sort-join is a good choice when one or both arguments are already sorted on their join attribute(s), or when there are two or more joins on the same attribute, such as (R(a, b) ⋈ S(a, c)) ⋈ T(a, d), where sorting R and S on a causes the result of R ⋈ S to be sorted on a, so that it can be used directly in a second sort-join.
– Choose index-join for R(a, b) ⋈ S(b, c) if R is small and there is an index on the join attribute S.b.
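When the number of buffers M is known, the choice reduces to evaluating each feasible method's cost formula and taking the minimum. The sketch below assumes the usual textbook approximations (one-pass: B(R) + B(S); block nested-loop: B(S) + B(S)B(R)/(M − 1) with S the smaller relation; two-pass hash-join: 3(B(R) + B(S))); these formulas are assumptions here, not taken from these slides, and sort-joins and index-joins are omitted. The relation sizes and M = 101 are those of the pipelining example in these slides; join_costs is a hypothetical helper.

```python
import math


def join_costs(b_r: int, b_s: int, m: int) -> dict:
    """Estimated disk I/Os for joining R and S with M buffers (textbook approximations)."""
    small, large = min(b_r, b_s), max(b_r, b_s)
    costs = {}
    # One-pass join: the smaller relation is held in M - 1 buffers,
    # leaving one buffer to read the larger relation block by block.
    if small <= m - 1:
        costs["one-pass"] = b_r + b_s
    # Block nested-loop join: read the smaller relation in chunks of M - 1
    # blocks and scan the larger relation once per chunk.
    chunks = math.ceil(small / (m - 1))
    costs["nested-loop"] = small + chunks * large
    # Two-pass hash-join: read and write both relations into M - 1 buckets,
    # then read them again; each bucket of the smaller relation must fit in
    # M - 1 buffers, so the smaller relation may have at most (M - 1)^2 blocks.
    if small <= (m - 1) ** 2:
        costs["two-pass hash"] = 3 * (b_r + b_s)
    return costs


costs = join_costs(b_r=5000, b_s=10_000, m=101)
best = min(costs, key=costs.get)
print(costs)  # {'nested-loop': 505000, 'two-pass hash': 45000}
print(best)   # two-pass hash
```

The 45,000 figure for the two-pass hash-join is the same one used for R ⋈ S in the pipelining example.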

Pipelining Versus Materialization
Materialization: The naive way to execute a query plan is to order the operations appropriately:
– An operation is not performed until the argument(s) below it have been performed, and
– The result of each operation is stored on disk until it is needed by another operation.
Pipelining: A more subtle, and generally more efficient, way to execute a query plan is to interleave the execution of several operations.
– The tuples produced by one operation are passed directly to the operation that uses them, without ever storing the intermediate tuples on disk.
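The difference between the two strategies can be illustrated with Python generators, loosely following the iterator style of physical operators. In this sketch a Python list stands in for an intermediate result written to disk; the operator names scan, select, and project and the toy relation are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Row = tuple


def scan(rows: Iterable[Row]) -> Iterator[Row]:
    # Table-scan: hand out stored tuples one at a time.
    for row in rows:
        yield row


def select(rows: Iterable[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    for row in rows:
        if pred(row):
            yield row


def project(rows: Iterable[Row], cols: list) -> Iterator[Row]:
    for row in rows:
        yield tuple(row[i] for i in cols)


def first_above_one(row: Row) -> bool:
    return row[0] > 1


R = [(1, "a"), (2, "b"), (3, "c")]

# Materialization: each operator runs to completion and stores its whole
# output (a list stands in for a temporary file) before the next one starts.
temp1 = list(select(scan(R), first_above_one))
temp2 = list(project(temp1, [1]))

# Pipelining: the operators are chained generators, so each tuple moves up
# the plan as soon as it is produced and no intermediate result is stored.
pipelined = project(select(scan(R), first_above_one), [1])

print(temp2)            # [('b',), ('c',)]
print(list(pipelined))  # [('b',), ('c',)]
```

Both produce the same answer; the pipelined version never holds more than one intermediate tuple at a time, which is why it saves the disk I/O of writing and rereading temporary results.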

Pipelining Example (I)
Let us consider physical query plans for the expression (R(w, x) ⋈ S(x, y)) ⋈ U(y, z).
Assumptions:
1. R occupies 5000 blocks; S and U each occupy 10,000 blocks.
2. The intermediate result R ⋈ S occupies k blocks for some k. We can estimate k based on the number of x-values in R and S and the size of (w, x, y) tuples compared to the (w, x) tuples of R and the (x, y) tuples of S. However, we want to see what happens as k varies, so we leave this constant open.
3. Both joins will be implemented as hash-joins, either one-pass or two-pass, depending on k.
4. There are 101 buffers available.

Example (II)

Example (III)
First, consider the join R ⋈ S. Neither relation fits in main memory, so we need a two-pass hash-join.
– If the smaller relation R is partitioned into 100 buckets on the first pass, then each bucket for R occupies 50 blocks.
– If R's buckets have 50 blocks, then the second pass of the hash-join R ⋈ S uses 51 buffers, leaving 50 buffers to use for the join of the result of R ⋈ S with U.
Now suppose that k ≤ 49; that is, the result of R ⋈ S occupies at most 49 blocks.
– Then we can pipeline the result of R ⋈ S into 49 buffers and execute the second join (with U) as a one-pass join.
– The total number of disk I/O's is:
45,000 to perform the two-pass hash-join of R and S (3 × (5000 + 10,000): each block of R and S is read, written to a bucket, and read again), and
10,000 to read U in the one-pass hash-join of (R ⋈ S) ⋈ U.
The total is 55,000 disk I/O's.
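The buffer budget behind this case can be checked mechanically. The sketch below recomputes the bucket size, the buffers left over for the second join, and the 55,000 I/O total; the variable names and the helper can_pipeline_one_pass are hypothetical, and the numbers are the example's assumptions.

```python
import math

M = 101                           # buffers available
B_R, B_S, B_U = 5000, 10_000, 10_000

# First pass of R ⋈ S: hash R into M - 1 = 100 buckets.
buckets = M - 1
bucket_R = math.ceil(B_R / buckets)          # 50 blocks per bucket of R
# Second pass: one R-bucket in memory plus one buffer for reading S.
second_pass_buffers = bucket_R + 1           # 51
left_over = M - second_pass_buffers          # 50 buffers for the join with U


def can_pipeline_one_pass(k: int) -> bool:
    """R ⋈ S (k blocks) must fit in memory, with one buffer left to read U."""
    return k + 1 <= left_over


cost = 3 * (B_R + B_S) + B_U                 # 45,000 + 10,000
print(bucket_R, second_pass_buffers, left_over)              # 50 51 50
print(can_pipeline_one_pass(49), can_pipeline_one_pass(50))  # True False
print(cost)                                                  # 55000
```

The single buffer reserved for reading U is what turns "50 buffers left" into the condition k ≤ 49.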

Example (IV)
Now suppose k > 49, but k < 5000.
– We can still pipeline the result of R ⋈ S, but we need to use another strategy, in which this relation is joined with U in a 50-bucket, two-pass hash-join.
Before we start on R ⋈ S, we hash U into 50 buckets of 200 blocks each. Next, we perform a two-pass hash-join of R and S using 51 buffers as before, but as each tuple of R ⋈ S is generated, we place it in one of the 50 remaining buffers, which are used to help form the 50 buckets for the join of R ⋈ S with U. Finally, we join R ⋈ S with U bucket by bucket. Since k < 5000, the buckets of R ⋈ S will be of size at most 100 blocks, so this join is feasible. The fact that buckets of U are of size 200 blocks is not a problem.
– We are using buckets of R ⋈ S as the build relation and buckets of U as the probe relation in the one-pass joins of buckets.
The number of disk I/O's for this pipelined join is:
a) 20,000 to read U and write its tuples into buckets.
b) 45,000 to perform the two-pass hash-join R ⋈ S.
c) k to write out the buckets of R ⋈ S.
d) k + 10,000 to read the buckets of R ⋈ S and U in the final join.
The total cost is thus 75,000 + 2k.

Example (V)
Last, let us consider what happens when k > 5000. Now we cannot perform a two-pass join in the 50 buffers available if the result of R ⋈ S is pipelined. So:
– Compute R ⋈ S using a two-pass hash-join and materialize the result on disk.
– Join R ⋈ S with U, also using a two-pass hash-join. Note that since B(U) = 10,000, we can perform a two-pass hash-join using 100 buckets regardless of how large k is.
– Technically, U should appear as the left argument of its join if we decide to make U the build relation for the hash-join.
The number of disk I/O's for this plan is:
1. 45,000 for the two-pass join of R and S.
2. k to store R ⋈ S on disk.
3. 30,000 + 3k for the two-pass hash-join of U with R ⋈ S.
The total cost is thus 75,000 + 4k.
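The three cases of this example can be collected into one cost function of k. This is a sketch under the example's assumptions (B(R) = 5000, B(S) = B(U) = 10,000, 101 buffers); the function name plan_cost is hypothetical. The slides do not assign k = 5000 exactly to either of the last two cases, so this sketch files it under the materialized plan.

```python
def plan_cost(k: int) -> int:
    """Disk I/Os for (R ⋈ S) ⋈ U in the example, where k = B(R ⋈ S)."""
    B_R, B_S, B_U = 5000, 10_000, 10_000
    rs_join = 3 * (B_R + B_S)                 # two-pass hash-join of R and S
    if k <= 49:
        # Case (III): pipeline R ⋈ S into memory; the one-pass join reads U once.
        return rs_join + B_U
    if k < 5000:
        # Case (IV): pre-hash U (read + write), pipeline R ⋈ S into 50 buckets
        # (write k), then read both sets of buckets for the final join.
        return 2 * B_U + rs_join + k + (k + B_U)
    # Case (V): materialize R ⋈ S (write k), then a separate two-pass
    # hash-join of U with R ⋈ S, costing 3 * (B(U) + k).
    return rs_join + k + 3 * (B_U + k)


for k in (40, 1000, 6000):
    print(k, plan_cost(k))
# 40 55000
# 1000 77000
# 6000 99000
```

Written this way, the jumps between regimes are easy to see: crossing k = 49 adds about 20,000 I/O's for pre-hashing U, and once k reaches 5000 the materialized plan pays 4k rather than 2k for the intermediate result.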

Example (VI)

Notation for Physical Plans
It can also be SortScan(R, L) if a sort-based join is preferred.
