16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.

Slides:



Advertisements
Similar presentations
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinVinayan Verenkar Computer Science Dept San Jose State University.
Advertisements

CS4432: Database Systems II
1 Lecture 23: Query Execution Friday, March 4, 2005.
Join Processing in Databases Systems with Large Main Memories
15.8 Algorithms using more than two passes Presented By: Seungbeom Ma (ID 125) Professor: Dr. T. Y. Lin Computer Science Department San Jose State University.
Completing the Physical-Query-Plan. Query compiler so far Parsed the query. Converted it to an initial logical query plan. Improved that logical query.
COMP 451/651 Optimizing Performance
Nested-Loop joins “one-and-a-half” pass method, since one relation will be read just once. Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in.
Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R.
Estimating the Cost of Operations We don’t want to execute the query in order to learn the costs. So, we need to estimate the costs. How can we estimate.
Lecture 24: Query Execution Monday, November 20, 2000.
ONE PASS ALGORITHM PRESENTED BY: PRADHYUMAN RAOL ID : 114 Instructor: Dr T.Y. LIN.
1  Simple Nested Loops Join:  Block Nested Loops Join  Index Nested Loops Join  Sort Merge Join  Hash Join  Hybrid Hash Join Evaluation of Relational.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
Estimating the Cost of Operations. From l.q.p. to p.q.p Having parsed a query and transformed it into a logical query plan, we must turn the logical plan.
Cost based transformations Initial logical query plan Two candidates for the best logical query plan.
ONE PASS ALGORITHM PRESENTED BY: PRADHYUMAN RAOL ID : 114 Instructor: Dr T.Y. LIN.
1 Lecture 22: Query Execution Wednesday, March 2, 2005.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
Evaluation of Relational Operations. Relational Operations v We will consider how to implement: – Selection ( ) Selects a subset of rows from relation.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
CSCE Database Systems Chapter 15: Query Execution 1.
Cost based transformations Initial logical query plan Two candidates for the best logical query plan.
Query Execution Optimizing Performance. Resolving an SQL query Since our SQL queries are very high level, the query processor must do a lot of additional.
Estimating the Cost of Operations. Suppose we have parsed a query and transformed it into a logical query plan (lqp) Also suppose all possible transformations.
CS411 Database Systems Kazuhiro Minami 11: Query Execution.
CS 257 Chapter – 15.9 Summary of Query Execution Database Systems: The Complete Book Krishna Vellanki 124.
Multi pass algorithms. Nested-Loop joins Tuple-Based Nested-loop Join Algorithm: FOR each tuple s in S DO FOR each tuple r in R DO IF r and s join to.
CS4432: Database Systems II Query Processing- Part 2.
CPSC 404, Laks V.S. Lakshmanan1 Overview of Query Evaluation Chapter 12 Ramakrishnan & Gehrke (Sections )
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations – Join Chapter 14 Ramakrishnan and Gehrke (Section 14.4)
Lecture 17: Query Execution Tuesday, February 28, 2001.
Implementation of Database Systems, Jarek Gryz1 Evaluation of Relational Operations Chapter 12, Part A.
Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Query Processing Spring 2016.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Evaluation of Relational Operations Chapter 14, Part A (Joins)
1 Lecture 23: Query Execution Monday, November 26, 2001.
CS4432: Database Systems II Query Processing- Part 1 1.
Chapter (6) The Relational Algebra and Relational Calculus Objectives
CS 540 Database Management Systems
CS 440 Database Management Systems
Query Processing Exercise Session 4.
Database Management System
Lecture 27: Size/Cost Estimation
Lecture 16: Relational Operators
Evaluation of Relational Operations
COST ESTIMATION FOR THE RELATIONAL ALGEBRA OPERATIONS MIT 813 GROUP 15 PRESENTATION.
Database Management Systems (CS 564)
The Relational Algebra and Relational Calculus
Sidharth Mishra Dr. T.Y. Lin CS 257 Section 1 MH 222 SJSU - Fall 2016
Query processing and optimization
Instructor: Mohamed Eltabakh
Outline - Query Processing
Lecture 2- Query Processing (continued)
One-Pass Algorithms for Database Operations (15.2)
Lecture 27: Optimizations
Lecture 24: Query Execution
Query Execution Index Based Algorithms (15.6)
Lecture 23: Query Execution
Lecture 28: Size/Cost Estimation, Recovery
CPSC-608 Database Systems
Lecture 22: Query Execution
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Lecture 22: Query Execution
Lecture 11: B+ Trees and Query Execution
CPSC-608 Database Systems
Lecture 27 Wednesday, December 5, 2001.
Lecture 22: Friday, November 22, 2002.
Lecture 24: Query Execution
Lecture 20: Query Execution
Presentation transcript:

16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University

Introduction Possible Physical Plan Estimating Sizes of Intermediate Relations Estimating the Size of a Projection Estimating the Size of a Selection Estimating the Size of a Join Natural Joins With Multiple Join Attributes Joins of Many Relations Estimating Sizes of Other Operations

Physical Plan For each physical plan select An order and grouping for associative-and- commutative operations like joins, unions. An Algorithm for each operator in the logical plan. eg: whether nested loop join or hash join to be used Additional operators that are needed for the physical plan but that were not present explicitly in the logical plan. eg: scanning, sorting The way in which arguments are passed from one operator to the next.

Estimating Sizes of Intermediate Relations Rules for estimating the number of tuples in an intermediate relation: 1.Give accurate estimates 2.Are easy to compute 3.Are logically consistent Objective of estimation is to select best physical plan with least cost.

Estimating the Size of a Projection The projection is different from the other operators, in that the size of the result is computable. Since a projection produces a result tuple for every argument tuple, the only change in the output size is the change in the lengths of the tuples.

Estimating the Size of a Selection(1) Let,,where A is an attribute of R and C is a constant. Then we recommend as an estimate: T(S) =T(R)/V(R,A) The rule above surely holds if all values of attribute A occur equally often in the database.

Estimating the Size of a Selection(2) If,then our estimate for T(s) is: T(S) = T(R)/3 We may use T(S)=T(R)(V(R,a) -1 )/ V(R,a) as an estimate. When the selection condition C is the And of several equalities and inequalities, we can treat the selection as a cascade of simple selections, each of which checks for one of the conditions.

Estimating the Size of a Selection(3) A less simple, but possibly more accurate estimate of the size of is to assume that C1 and of which satisfy C2, we would estimate the number of tuples in S as In explanation, is the fraction of tuples that do not satisfy C1, and is the fraction that do not satisfy C2. The product of these numbers is the fraction of R’s tuples that are not in S, and 1 minus this product is the fraction that are in S.

Estimating the Size of a Join two simplifying assumptions: 1. Containment of Value Sets If R and S are two relations with attribute Y and V(R,Y)<=V(S,Y) then every Y-value of R will be a Y-value of S. 2. Preservation of Value Sets Join a relation R with another relation S with attribute A in R and not in S then all distinct values of A are preserved and not lost.V(S R,A) = V(R,A) Under these assumptions, we estimate T(R S) = T(R)T(S)/max(V(R,Y), V(S, Y))

Natural Joins With Multiple Join Attributes Of the T(R),T(S) pairs of tuples from R and S, the expected number of pairs that match in both y1 and y2 is: T(R)T(S)/max(V(R,y1), V(S,y1)) max(V(R, y2), V(S, y2)) In general, the following rule can be used to estimate the size of a natural join when there are any number of attributes shared between the two relations. The estimate of the size of R S is computed by multiplying T(R) by T(S) and dividing by the largest of V(R,y) and V(S,y) for each attribute y that is common to R and S.

Joins of Many Relations(1) rule for estimating the size of any join Start with the product of the number of tuples in each relation. Then, for each attribute A appearing at least twice, divide by all but the least of V(R,A)’s. We can estimate the number of values that will remain for attribute A after the join. By the preservation-of-value-sets assumption, it is the least of these V(R,A)’s.

Joins of Many Relations(2) Based on the two assumptions-containment and preservation of value sets: No matter how we group and order the terms in a natural join of n relations, the estimation of rules, applied to each join individually, yield the same estimate for the size of the result. Moreover, this estimate is the same that we get if we apply the rule for the join of all n relations as a whole.

Estimating Sizes for Other Operations Union: the average of the sum and the larger. Intersection: approach1: take the average of the extremes, which is the half the smaller. approach2: intersection is an extreme case of the natural join, use the formula T(R S) = T(R)T(S)/max(V(R,Y), V(S, Y))

Estimating Sizes for Other Operations Difference: T(R)-(1/2)*T(S) Duplicate Elimination: take the smaller of (1/2)*T(R) and the product of all the V(R, )’s. Grouping and Aggregation: upper-bound the number of groups by a product of V(R,A)’s, here attribute A ranges over only the grouping attributes of L. An estimate is the smaller of (1/2)*T(R) and this product.

THANK YOU