Lecture 27: Size/Cost Estimation

Slides:



Advertisements
Similar presentations
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinVinayan Verenkar Computer Science Dept San Jose State University.
Advertisements

CS4432: Database Systems II
1 Lecture 23: Query Execution Friday, March 4, 2005.
Relational Operations on Bags Extended Operators of Relational Algebra.
Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R.
Lecture 9 Query Optimization November 24, 2010 Dan Suciu -- CSEP544 Fall
CS 4432query processing - lecture 141 CS4432: Database Systems II Lecture #14 Query Processing – Size Estimation Professor Elke A. Rundensteiner.
Estimating the Cost of Operations We don’t want to execute the query in order to learn the costs. So, we need to estimate the costs. How can we estimate.
Lecture 24: Query Execution Monday, November 20, 2000.
CS CS4432: Database Systems II Query Processing – Size Estimation.
Relational Algebra on Bags A bag is like a set, but an element may appear more than once. –Multiset is another name for “bag.” Example: {1,2,1,3} is a.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
Estimating the Cost of Operations. From l.q.p. to p.q.p Having parsed a query and transformed it into a logical query plan, we must turn the logical plan.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
Relational Operations on Bags Extended Operators of Relational Algebra.
1 Lecture 07: Relational Algebra. 2 Outline Relational Algebra (Section 6.1)
Distributed Databases and Query Processing. Distributed DB’s vs. Parallel DB’s Many autonomous processors that may participate in database operations.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
CS411 Database Systems Kazuhiro Minami 12: Query Optimization.
Query Optimization March 10 th, Very Big Picture A query execution plan is a program. There are many of them. The optimizer is trying to chose a.
CSE 544: Relational Operators, Sorting Wednesday, 5/12/2004.
Chapters 15-16a1 (Slides by Hector Garcia-Molina, Chapters 15 and 16: Query Processing.
Estimating the Cost of Operations. Suppose we have parsed a query and transformed it into a logical query plan (lqp) Also suppose all possible transformations.
Advanced Relational Algebra & SQL (Part1 )
More Relation Operations 2014, Fall Pusan National University Ki-Joune Li.
CS 4432estimation - lecture 161 CS4432: Database Systems II Lecture #16 Query Processing : Estimating Sizes of Results Professor Elke A. Rundensteiner.
1 CS 430 Database Theory Winter 2005 Lecture 5: Relational Algebra.
Lecture 17: Query Execution Tuesday, February 28, 2001.
1 Lecture 15 Monday, May 20, 2002 Size Estimation, XML Processing.
1 Lecture 23: Query Execution Monday, November 26, 2001.
CS4432: Database Systems II Query Processing- Part 1 1.
Tallahassee, Florida, 2016 COP5725 Advanced Database Systems Query Optimization Spring 2016.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Query Optimization.
Module 2: Intro to Relational Model
Query Processing Exercise Session 4.
Database Management System
Lecture 8: Relational Algebra
Lecture 26: Query Optimizations and Cost Estimation
Chapter 2: Intro to Relational Model
Introduction to Database Systems CSE 444 Lecture 22: Query Optimization November 26-30, 2007.
Lecture 26: Query Optimization
More Relational Algebra
Query processing and optimization
Instructor: Mohamed Eltabakh
More Relation Operations
Outline - Query Processing
Lecture 28 Friday, December 7, 2001.
Lecture 33: The Relational Model 2
Lecture 25: Query Execution
Lecture 27: Optimizations
Relational Algebra Friday, 11/14/2003.
Chapter 2: Intro to Relational Model
Query Execution Index Based Algorithms (15.6)
Example of a Relation attributes (or columns) tuples (or rows)
Lecture 23: Query Execution
Lecture 28: Size/Cost Estimation, Recovery
CPSC-608 Database Systems
Chapter 2: Intro to Relational Model
Lecture 22: Query Execution
Lecture 22: Query Execution
Lecture 11: B+ Trees and Query Execution
Outline - Query Processing
CSE 544: Optimizations Wednesday, 5/10/2006.
CPSC-608 Database Systems
Lecture 26: Wednesday, December 4, 2002.
Lecture 27 Wednesday, December 5, 2001.
Lecture 22: Friday, November 22, 2002.
Lecture 24: Query Execution
Lecture 20: Query Execution
Presentation transcript:

Lecture 27: Size/Cost Estimation Monday, December 5, 2005

Types of Joins Eqjoin: R(A,B) |x|B=C S(C,D) = T(A,B,C,D) Natural join: R(A,B) |x| S(B, C) = T(A,B,C) What is the difference from eqjoin ? Theta join: |x|q. E.g. R(A,B) |x|B > CS(C,D) = T(A,B,C,D) Semijoin: R(A,B) |x S(B,C) = T(A,B) Outerjoin R(A,B) S(B,C) = T(A,B,C,D)

Outline Cost estimation: 16.4

Size Estimation The problem: Given an expression E, compute T(E) and V(E, A) This is hard without computing E Will ‘estimate’ them instead

Size Estimation Estimating the size of a projection Easy: T(PL(R)) = T(R) This is because a projection doesn’t eliminate duplicates

Size Estimation Estimating the size of a selection S = sA=c(R) T(S) san be anything from 0 to T(R) – V(R,A) + 1 Estimate: T(S) = T(R)/V(R,A) When V(R,A) is not available, estimate T(S) = T(R)/10 S = sA<c(R) T(S) can be anything from 0 to T(R) Estimate: T(S) = (c - Low(R, A))/(High(R,A) - Low(R,A))T(R) When Low, High unavailable, estimate T(S) = T(R)/3

Size Estimation Estimating the size of a natural join, R ||A S When the set of A values are disjoint, then T(R ||A S) = 0 When A is a key in S and a foreign key in R, then T(R ||A S) = T(R) When A has a unique value, the same in R and S, then T(R ||A S) = T(R) T(S)

Size Estimation Assumptions: Containment of values: if V(R,A) <= V(S,A), then the set of A values of R is included in the set of A values of S Note: this indeed holds when A is a foreign key in R, and a key in S Preservation of values: for any other attribute B, V(R || A S, B) = V(R, B) (or V(S, B))

Size Estimation Assume V(R,A) <= V(S,A) Then each tuple t in R joins some tuple(s) in S How many ? On average T(S)/V(S,A) t will contribute T(S)/V(S,A) tuples in R ||A S Hence T(R ||A S) = T(R) T(S) / V(S,A) In general: T(R ||A S) = T(R) T(S) / max(V(R,A),V(S,A))

Size Estimation Example: T(R) = 10000, T(S) = 20000 V(R,A) = 100, V(S,A) = 200 How large is R || A S ? Answer: T(R ||A S) = 10000 20000/200 = 1M

Size Estimation Joins on more than one attribute: T(R ||A,B S) = T(R) T(S)/(max(V(R,A),V(S,A))*max(V(R,B),V(S,B)))

Histograms Statistics on data maintained by the RDBMS Makes size estimation much more accurate (hence, cost estimations are more accurate)

Histograms Employee(ssn, name, salary, phone) Maintain a histogram on salary: T(Employee) = 25000, but now we know the distribution Salary: 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100k Tuples 200 800 5000 12000 6500 500

Histograms Ranks(rankName, salary) Estimate the size of Employee || Salary Ranks Employee 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100k 200 800 5000 12000 6500 500 Ranks 0..20k 20k..40k 40k..60k 60k..80k 80k..100k > 100k 8 20 40 80 100 2

Histograms Eqwidth Eqdepth 0..20 20..40 40..60 60..80 80..100 2 104 9739 152 3 0..44 44..48 48..50 50..56 55..100 2000