CS 44321 CS4432: Database Systems II Query Processing – Size Estimation.

Slides:



Advertisements
Similar presentations
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinVinayan Verenkar Computer Science Dept San Jose State University.
Advertisements

CS4432: Database Systems II
CS CS4432: Database Systems II Logical Plan Rewriting.
1 Lecture 23: Query Execution Friday, March 4, 2005.
CS CS4432: Database Systems II Operator Algorithms Chapter 15.
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 183 Database Systems II Query Compiler.
Algebraic Laws For the binary operators, we push the selection only if all attributes in the condition C are in R.
Oct 28, 2003Murali Mani Relational Algebra B term 2004: lecture 10, 11.
CS 4432query processing - lecture 141 CS4432: Database Systems II Lecture #14 Query Processing – Size Estimation Professor Elke A. Rundensteiner.
Estimating the Cost of Operations We don’t want to execute the query in order to learn the costs. So, we need to estimate the costs. How can we estimate.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
Estimating the Cost of Operations. From l.q.p. to p.q.p Having parsed a query and transformed it into a logical query plan, we must turn the logical plan.
CS Spring 2002Notes 61 CS 277: Database System Implementation Notes 6: Query Processing Arthur Keller.
CS 4432query processing - lecture 131 CS4432: Database Systems II Lecture #13 Query Processing Professor Elke A. Rundensteiner.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
Nov 18, 2003Murali Mani Relational Algebra B term 2004: lecture 10, 11.
Murali Mani Relational Algebra. Murali Mani What is Relational Algebra? Defines operations (data retrieval) for relational model SQL’s DML (Data Manipulation.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
CS411 Database Systems Kazuhiro Minami 12: Query Optimization.
DBMS 2001Notes 6: Query Compilation1 Principles of Database Management Systems 6: Query Compilation and Optimization Pekka Kilpeläinen (partially based.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
QUERY PROCESSING AND OPTIMIZATION. Overview SQL is a declarative language:  Specifies the outcome of the computation without defining any flow of control.
CPS216: Data-Intensive Computing Systems Introduction to Query Processing Shivnath Babu.
Query Processing Notes. Query Processing The query processor turns user queries and data modification commands into a query plan - a sequence of operations.
CPS216: Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.
Query Compilation Contains slides by Hector Garcia-Molina.
Chapters 15-16a1 (Slides by Hector Garcia-Molina, Chapters 15 and 16: Query Processing.
Estimating the Cost of Operations. Suppose we have parsed a query and transformed it into a logical query plan (lqp) Also suppose all possible transformations.
CS4432: Database Systems II Query Processing- Part 3 1.
Query Optimizer (Chapter ). Optimization Minimizes uses of resources by choosing best set of alternative query access plans considers I/O cost,
CS 4432estimation - lecture 161 CS4432: Database Systems II Lecture #16 Query Processing : Estimating Sizes of Results Professor Elke A. Rundensteiner.
CS4432: Database Systems II Query Processing- Part 2.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Introduction to Query Optimization Chapter 13.
1 CS 430 Database Theory Winter 2005 Lecture 5: Relational Algebra.
Relational Operator Evaluation. Overview Application Programmer (e.g., business analyst, Data architect) Sophisticated Application Programmer (e.g.,
CS 245Notes 61 CS 245: Database System Principles Notes 6: Query Processing Hector Garcia-Molina.
CPSC 404, Laks V.S. Lakshmanan1 Overview of Query Evaluation Chapter 12 Ramakrishnan & Gehrke (Sections )
Lecture 17: Query Execution Tuesday, February 28, 2001.
Query Processing COMP3017 Advanced Databases Nicholas Gibbins
CS4432: Database Systems II Query Processing- Part 1 1.
1 Ullman et al. : Database System Principles Notes 6: Query Processing.
16.4 Estimating the Cost of Operations Project GuidePrepared By Dr. T. Y. LinAkshay Shenoy Computer Science Dept San Jose State University.
1/14/2005Yan Huang - CSCI5330 Database Implementation – Query Optimization Query Optimization.
Advanced Database Systems: DBS CB, 2nd Edition
Query Processing Exercise Session 4.
Dr. Ameria Eldosoky Discrete mathematics
Lecture 27: Size/Cost Estimation
CS 245: Database System Principles
Lecture 26: Query Optimization
Query Processing B.Ramamurthy Chapter 12 11/27/2018 B.Ramamurthy.
CS 245: Database System Principles
Chapters 15 and 16b: Query Optimization
Outline - Query Processing
Lecture 27: Optimizations
Relational Algebra Friday, 11/14/2003.
Query Execution Index Based Algorithms (15.6)
Lecture 28: Size/Cost Estimation, Recovery
Data-Intensive Computing Systems Query Execution (Sort and Join operators) Shivnath Babu.
Lecture 22: Query Execution
C. Faloutsos Query Optimization – part 2
Lecture 11: B+ Trees and Query Execution
Yan Huang - CSCI5330 Database Implementation – Query Processing
Outline - Query Processing
CPSC-608 Database Systems
Lecture 27 Wednesday, December 5, 2001.
Presentation transcript:

CS CS4432: Database Systems II Query Processing – Size Estimation

CS Estimating cost of query plan (1)Estimating size of results Read Section 16.4 (2) Estimating # of IOs

CS Estimating result size Keep statistics for relation R –T(R) : # tuples in R –S(R) : # of bytes in each R tuple –B(R): # of blocks to hold all R tuples –V(R, A) : # distinct values in R for attribute A

CS Example RA: 20 byte string B: 4 byte integer C: 8 byte date D: 5 byte string ABCD cat110a cat120b dog130a dog140c bat150d #-of-tuples T(R) = 5 size-tuple S(R) = 37 V(R,A) = 3V(R,C) = 5 V(R,B) = 1V(R,D) = 4

CS Size estimates for W = R1 x R2 T(W) = S(W) = T(R1)  T(R2) S(R1) + S(R2)

CS S(W) = S(R) T(W) = ? ? <= T(R) Size estimate for W =  A=a  (R)

CS Example RV(R,A)=3 V(R,B)=1 V(R,C)=5 V(R,D)=4 W =  z=val (R) T(W) = ABCD cat110a cat120b dog130a dog140c bat150d T(R) V(R,Z)

CS Assumptions: Values in select expression Z = val are uniformly distributed : -over possible V(R,Z) values. - over domain with DOM(R,Z) values.

CS Example R Alternate assumption V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 ABCD cat110a cat120b dog130a dog140c bat150d W =  z=val (R) T(W) = ?

CS Example R Alternate assumption V(R,A)=3 DOM(R,A)=10 V(R,B)=1 DOM(R,B)=10 V(R,C)=5 DOM(R,C)=10 V(R,D)=4 DOM(R,D)=10 ABCD cat110a cat120b dog130a dog140c bat150d W =  z=val (R) T(W) = T(R) DOM(R,Z)

CS C=val  T(W) = (1/10)1 + (1/10) = (5/10) = 0.5 B=val  T(W)= (1/10) = 0.5 A=val  T(W)= (1/10)2 + (1/10)2 + (1/10)1 = 0.5

CS Selection cardinality SC(R,A) = average # records that satisfy equality condition on R.A T(R) V(R,A) SC(R,A) = T(R) DOM(R,A)

CS What about W =  z  val (R) ? T(W) = ? Solution # 1: T(W) = T(R)/2 Solution # 2: T(W) = T(R)/3

CS Solution # 3: Estimate values in range Example R Z Min=1 V(R,Z)=10 W=  z  15 (R) Max=20 f = = 6 ( fraction of range over complete domain ) T(W) = f  T(R)

CS Size estimate for W = R1 R2 Let x = attributes of R1 y = attributes of R2 X  Y =  Same as R1 x R2 Case 1

CS W = R1 R2 X  Y = A I.e., R1.A = R2.A R1 A B C R2 A D R1.A = R2.A Case 2 Assumption: V(R1,A)  V(R2,A)  Every A value in R1 is in R2 V(R2,A)  V(R1,A)  Every A value in R2 is in R1 “containment of value sets” Sec

CS R1 A B C R2 A D Computing T(W) when V(R1,A)  V(R2,A) Take 1 tuple Match 1 tuple matches with T(R2) tuples... V(R2,A) so T(W) = T(R2)  T(R1) V(R2, A)

CS V(R1,A)  V(R2,A) T(W) = T(R2) T(R1) V(R2,A) V(R2,A)  V(R1,A) T(W) = T(R2) T(R1) V(R1,A) [A is common attribute]

CS T(W) = T(R2) T(R1) max{ V(R1,A), V(R2,A) } In general W = R1 R2

CS with alternate assumption Values uniformly distributed over domain R1 ABC R2 A D This tuple matches T(R2)/DOM(R2,A) so T(W) = T(R2) T(R1) = T(R2) T(R1) DOM(R2, A) DOM(R1, A) Case 2 Assume the same

CS In all cases: S(W) = S(R1) + S(R2) - S(A) size of attribute A

CS Using similar ideas, we can estimate sizes of:  AB (R) ….. Sec  A=a  B=b (R) …. Sec R S with common attribs. A,B,C Sec Union, intersection, diff, …. Sec

CS What Else ? For complex expressions, need intermediate T,S,V results. E.g. W = [  A=a (R1) ] R2 Treat as relation U T(U) = T(R1)/V(R1,A) S(U) = S(R1) Also need V (U, *) !!

CS To estimate Vs E.g., U =  A=a (R1) Say R1 has attribs A,B,C,D V(U, A) = V(U, B) = V(U, C) = V(U, D) =

CS Example R1V(R1,A)=3 V(R1,B)=1 V(R1,C)=5 V(R1,D)=3 U =  A=a (R1) ABCD cat110 cat120 dog13010 dog14030 bat15010 V(U,A) =1 V(U,B) =1 V(U,C) = T(R1) V(R1,A) V(D,U)... somewhere in between

CS Possible Guess U =  A=a (R) V(U,A) = 1 V(U,B)= V(R,B)

CS For Joins U = R1(A,B) R2(A,C) V(U,A) = min { V(R1, A), V(R2, A) } V(U,B) = V(R1, B) V(U,C) = V(R2, C) [called “preservation of value sets” in section ]

CS Example: Z = R1(A,B) R2(B,C) R3(C,D) T(R1) = 1000 V(R1,A)=50 V(R1,B)=100 T(R2) = 2000 V(R2,B)=200 V(R2,C)=300 T(R3) = 3000 V(R3,C)=90 V(R3,D)=500 R1 R2 R3

CS T(U) = 1000  2000 V(U,A) = V(U,B) = 100 V(U,C) = 300 Partial Result: U = R1 R2

CS Z = U R3 T(Z) = 1000  2000  3000 V(Z,A) =  300 V(Z,B) = 100 V(Z,C) = 90 V(Z,D) = 500

CS Summary Estimating size of results is an “art” Don’t forget: Statistics must be kept up to date…