CS4432: Database Systems II
Lecture #16
Query Processing: Estimating Sizes of Results
Professor Elke A. Rundensteiner

Estimating cost of query plan
(1) Estimating size of results
(2) Estimating # of IOs

Estimating result statistics
Keep statistics for relation R:
– T(R): # of tuples in R
– S(R): # of bytes in each R tuple
– B(R): # of blocks to hold all R tuples
– V(R, A): # of distinct values in R for attribute A
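As a rough illustration of how these four statistics might be held together, here is a minimal Python sketch. The class name `RelationStats`, the 4096-byte block size, and the rule used to derive B(R) (whole tuples packed densely into blocks, none spanning a block boundary, S(R) no larger than a block) are assumptions made for this example, not something the slides specify.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RelationStats:
    """Catalog statistics for one relation R (hypothetical container)."""
    T: int                                  # T(R): number of tuples
    S: int                                  # S(R): bytes per tuple
    V: dict = field(default_factory=dict)   # V(R, A): distinct values per attribute

    def B(self, block_size: int = 4096) -> int:
        """B(R): blocks needed to hold all tuples.

        Assumption: tuples are packed densely, never span two blocks,
        and S(R) <= block_size.
        """
        tuples_per_block = block_size // self.S
        return math.ceil(self.T / tuples_per_block)


# Illustrative numbers only.
r = RelationStats(T=1000, S=100, V={"A": 50, "B": 100})
print(r.B())  # 40 tuples fit in a 4096-byte block, so 1000 tuples need 25 blocks
```

A real system would typically store B(R) directly rather than derive it, since blocks are seldom perfectly full.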

Note: for complex expressions, need intermediate T, S, V results.
E.g. W = [σ_{A=a}(R1)] ⋈ R2
Treat σ_{A=a}(R1) as relation U:
T(U) = T(R1) / V(R1,A)
S(U) = S(R1)
Also need V(U, *) !?
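The two formulas for the intermediate relation U can be written as a small helper. This is only a sketch: the function name and the sample numbers are made up for illustration, and the estimate relies on the assumption that the values of A are spread uniformly over the tuples of R1.

```python
def estimate_equality_selection(t_r, s_r, v_r_a):
    """Estimate (T(U), S(U)) for U = sigma_{A=a}(R).

    Assumes each of the V(R, A) distinct values of A occurs in the
    same number of tuples (uniform distribution).
    """
    t_u = t_r / v_r_a   # T(U) = T(R) / V(R, A)
    s_u = s_r           # selection keeps whole tuples, so tuple size is unchanged
    return t_u, s_u


# Illustrative numbers: 1000 tuples, 100 bytes each, 50 distinct A values.
t_u, s_u = estimate_equality_selection(t_r=1000, s_r=100, v_r_a=50)
print(t_u, s_u)  # 20.0 100
```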

To estimate Vs
E.g., U = σ_{A=a}(R1)
Say R1 has attributes A, B, C, D
V(U, A) = ?
V(U, B) = ?
V(U, C) = ?
V(U, D) = ?

Example
R1:
A    B  C   D
cat  1  10
cat  1  20
dog  1  30  10
dog  1  40  30
bat  1  50  10

V(R1,A) = 3, V(R1,B) = 1, V(R1,C) = 5, V(R1,D) = 3

U = σ_{A=dog}(R1)
V(U,A) = 1
V(U,B) = 1
V(U,C) = T(R1) / V(R1,A)
V(U,D) ... somewhere in between

Possible Guess
U = σ_{A=a}(R)
V(U,A) = 1
V(U,B) = V(R,B)
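The guess above can be sketched in Python using the statistics of the R1 example (T(R1) = 5 with the four V values given there). The function name is hypothetical. The optional `cap_at_size` flag is an addition for this sketch, not part of the slide's guess: it limits each estimate to T(U), since a relation cannot have more distinct values than tuples, which is the bound the example uses for V(U,C).

```python
def estimate_v_after_selection(t_r, v_r, attr, cap_at_size=False):
    """Estimate T(U) and V(U, *) for U = sigma_{attr=a}(R).

    v_r maps attribute name -> V(R, attribute).
    Slide's guess: V(U, attr) = 1 and V(U, X) = V(R, X) for every other X.
    cap_at_size=True additionally applies min(V(R, X), T(U)); this cap is
    an assumption added here.
    """
    t_u = t_r / v_r[attr]          # uniformity assumption on attr
    v_u = {}
    for name, v in v_r.items():
        if name == attr:
            v_u[name] = 1          # every tuple in U has attr = a
        elif cap_at_size:
            v_u[name] = min(v, t_u)
        else:
            v_u[name] = v
    return t_u, v_u


stats_r1 = {"A": 3, "B": 1, "C": 5, "D": 3}
t_u, guess = estimate_v_after_selection(5, stats_r1, "A")
print(guess)   # {'A': 1, 'B': 1, 'C': 5, 'D': 3}

_, capped = estimate_v_after_selection(5, stats_r1, "A", cap_at_size=True)
print(capped)  # C and D drop to T(U) = 5/3, roughly 1.67
```

The uncapped guess overshoots for C (5 distinct values in a result of fewer than 2 tuples), which is why the example places V(U,D) "somewhere in between".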

For Joins
U = R1(A,B) ⋈ R2(A,C)
What are the different V(U,*) values?
V(U,A) = min { V(R1,A), V(R2,A) }
V(U,B) = V(R1,B)
V(U,C) = V(R2,C)
Property: “preservation of value sets”
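A minimal sketch of these three rules follows. The function name and the sample statistics are illustrative assumptions, and the sketch also assumes the two relations share only the join attribute.

```python
def join_distinct_values(v_r1, v_r2, join_attr):
    """Estimate V(U, *) for U = R1 natural-join R2 on join_attr.

    v_r1 and v_r2 map attribute name -> number of distinct values.
    Assumes join_attr is the only attribute the two relations share.
    """
    # Join attribute: only values present on both sides can survive.
    v_u = {join_attr: min(v_r1[join_attr], v_r2[join_attr])}
    # Non-join attributes: "preservation of value sets".
    for side in (v_r1, v_r2):
        for name, v in side.items():
            if name != join_attr:
                v_u[name] = v
    return v_u


# Illustrative numbers only.
v_u = join_distinct_values({"A": 50, "B": 100}, {"A": 20, "C": 300}, "A")
print(v_u)  # {'A': 20, 'B': 100, 'C': 300}
```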

Example: Z = R1(A,B) ⋈ R2(B,C) ⋈ R3(C,D)
R1: T(R1) = 1000, V(R1,A) = 50, V(R1,B) = 100
R2: T(R2) = 2000, V(R2,B) = 200, V(R2,C) = 300
R3: T(R3) = 3000, V(R3,C) = 90, V(R3,D) = 500

Partial Result: U = R1 ⋈ R2
T(U) = 1000 × 2000 / 200 = 10,000
V(U,A) = 50
V(U,B) = 100
V(U,C) = 300

Z = U ⋈ R3
T(Z)?
T(Z) = 1000 × 2000 × 3000 / (200 × 300) = 100,000
V(Z,A) = 50
V(Z,B) = 100
V(Z,C) = 90
V(Z,D) = 500
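The size of the three-way join can be checked step by step. The sketch below assumes the common textbook estimate T(R ⋈ S) = T(R) · T(S) / max(V(R,Y), V(S,Y)) for a join on attribute Y. That formula is not spelled out on these slides, so treat it as an assumption; the function name is also made up. The input statistics are the ones given for R1, R2, R3 and the partial result U.

```python
def join_size(t_r, t_s, v_r_y, v_s_y):
    """Estimate T(R join S) on a shared attribute Y.

    Assumed estimate: T(R) * T(S) / max(V(R, Y), V(S, Y)).
    """
    return t_r * t_s / max(v_r_y, v_s_y)


# U = R1 join R2 on B: V(R1,B) = 100, V(R2,B) = 200
t_u = join_size(1000, 2000, 100, 200)

# Z = U join R3 on C: V(U,C) = 300 (carried over from R2), V(R3,C) = 90
t_z = join_size(t_u, 3000, 300, 90)

print(t_u)  # 10000.0
print(t_z)  # 100000.0
```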

More on Estimation
Assuming a uniform distribution is often inaccurate because real data is rarely uniformly distributed.
Histogram:
– a data structure maintained by a DBMS to approximate a data distribution
– divides the range of column values into subranges (buckets)
– assumes the distribution within each bucket is uniform
[Figure: histogram of the number of tuples in R with A value in a given range]
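To make the bucket idea concrete, here is a small sketch of an equi-width histogram and a range estimate that assumes values are uniform inside each bucket. Equal bucket widths are one possible choice made for this example (the slide does not say how the subranges are chosen), and the function names and sample data are made up.

```python
def build_equiwidth_histogram(values, lo, hi, num_buckets):
    """Count values per bucket over [lo, hi), split into equal-width buckets."""
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # Clamp so that a value equal to hi lands in the last bucket.
        idx = min(int((v - lo) / width), num_buckets - 1)
        counts[idx] += 1
    return counts, width


def estimate_range_count(counts, lo, width, a, b):
    """Estimate # of tuples with a <= A < b.

    Each bucket contributes its count scaled by the fraction of the
    bucket that overlaps [a, b): uniformity within the bucket.
    """
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
        total += count * overlap / width
    return total


# Skewed sample data: most A values fall below 10.
a_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 12, 18, 25, 35]
counts, width = build_equiwidth_histogram(a_values, lo=0, hi=40, num_buckets=4)
print(counts)                                            # [10, 2, 1, 1]
print(estimate_range_count(counts, 0, width, 5, 15))     # 6.0
```

For this data the true count for 5 <= A < 15 is 7. The histogram estimate of 6 is much closer than the 3.5 obtained by assuming all 14 tuples are spread uniformly over the whole range [0, 40).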

Summary on Estimation of “Sizes”
– Estimating size of results is an “art”
– Don’t forget: statistics must be kept up to date… (cost?)

Summary
Estimating cost of query plan
– Estimating size of results: done!
– Estimating # of IOs: to do …
Generate and compare plans