Outline - Query Processing

Presentation transcript:

Outline - Query Processing
- Relational algebra level
    - transformations
    - good transformations
- Detailed query plan level
    - estimate costs
    - generate and compare plans

Estimating cost of query plan
(1) Estimating size of results
(2) Estimating # of IOs

Estimating result size
Keep statistics for relation R:
- T(R): # tuples in R
- L(R): # of bytes in each R tuple
- B(R): # of blocks to hold all R tuples
- V(R,A): # distinct values in R for attribute A
- b: block size
- bf(R) (blocking factor): # of tuples in a block, bf(R) = b/L(R)

Example: relation R
Field sizes: A: 20-byte string, B: 4-byte integer, C: 8-byte date, D: 5-byte string

A    B  C   D
cat  1  10  a
cat  1  20  b
dog  1  30  a
dog  1  40  c
bat  1  50  d

T(R) = 5, L(R) = 37
V(R,A) = 3, V(R,B) = 1, V(R,C) = 5, V(R,D) = 4
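The statistics in the example can be recomputed directly from the five tuples. The following is a minimal sketch; the helper names (`tuple_count`, `tuple_length`, `distinct_values`) are made up for illustration and are not from the slides.

```python
# Example relation R from the slide, stored as (A, B, C, D) tuples.
R = [
    ("cat", 1, 10, "a"),
    ("cat", 1, 20, "b"),
    ("dog", 1, 30, "a"),
    ("dog", 1, 40, "c"),
    ("bat", 1, 50, "d"),
]
COLUMNS = ("A", "B", "C", "D")
# Declared field widths in bytes, as given on the slide.
WIDTHS = {"A": 20, "B": 4, "C": 8, "D": 5}


def tuple_count(rel):
    """T(R): number of tuples."""
    return len(rel)


def tuple_length(widths):
    """L(R): bytes per tuple, the sum of the declared field widths."""
    return sum(widths.values())


def distinct_values(rel, attr):
    """V(R, attr): number of distinct values of attr in rel."""
    idx = COLUMNS.index(attr)
    return len({row[idx] for row in rel})


print(tuple_count(R))                               # 5
print(tuple_length(WIDTHS))                         # 37
print([distinct_values(R, a) for a in COLUMNS])     # [3, 1, 5, 4]
```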

Size estimates for W = R x S
T(W) = T(R) * T(S)
L(W) = L(R) + L(S)
bf(W) = b / (L(R) + L(S))
B(W) = T(R)*T(S) / bf(W)
     = T(R)*T(S)*L(S)/b + T(S)*T(R)*L(R)/b
     = T(R)*T(S)/bf(S) + T(S)*T(R)/bf(R)
     = T(R)*B(S) + T(S)*B(R)
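The derivation of B(W) can be checked numerically. This sketch uses exact rational arithmetic and, like the slide, allows fractional block counts (no rounding up). The input numbers are hypothetical and the function names are illustrative only.

```python
from fractions import Fraction


def blocks(t, l, b):
    """B = T / bf, with blocking factor bf = b / L (fractional blocks allowed)."""
    return t / Fraction(b, l)


def cross_product_stats(t_r, l_r, t_s, l_s, b):
    """Return (T(W), L(W), B(W)) for W = R x S."""
    t_w = t_r * t_s
    l_w = l_r + l_s
    b_w = blocks(t_w, l_w, b)
    return t_w, l_w, b_w


# Hypothetical statistics, chosen only to exercise the formulas.
T_R, L_R = 1000, 37
T_S, L_S = 200, 63
BLOCK = 4096

t_w, l_w, b_w = cross_product_stats(T_R, L_R, T_S, L_S, BLOCK)
closed_form = T_R * blocks(T_S, L_S, BLOCK) + T_S * blocks(T_R, L_R, BLOCK)

print(t_w, l_w)            # 200000 100
print(b_w == closed_form)  # True
```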

Size estimate for W = σ_{A=a}(R)
L(W) = L(R)
T(W) = ?

Example: relation R

A    B  C   D
cat  1  10  a
cat  1  20  b
dog  1  30  a
dog  1  40  c
bat  1  50  d

V(R,A) = 3, V(R,B) = 1, V(R,C) = 5, V(R,D) = 4
W = σ_{Z=val}(R)
T(W) = T(R) / V(R,Z)

Selection cardinality
SC(R,A) = average # of records that satisfy an equality condition on R.A
SC(R,A) = T(R) / V(R,A)
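As a quick illustration, the selection cardinality can be computed for each attribute of the five-tuple example relation (T(R) = 5). The function name is an assumption for this sketch.

```python
def selection_cardinality(t_r, v_r_a):
    """SC(R,A) = T(R) / V(R,A): expected matches for an equality test on A."""
    if v_r_a <= 0:
        raise ValueError("V(R,A) must be positive")
    return t_r / v_r_a


# Statistics of the example relation: T(R) = 5 and V(R, .) per attribute.
T_R = 5
V = {"A": 3, "B": 1, "C": 5, "D": 4}

for attr, v in V.items():
    print(attr, selection_cardinality(T_R, v))
```

Attribute B has a single distinct value, so an equality selection on B is estimated to return all five tuples, while C is unique per tuple and yields an estimate of one.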

What about W = σ_{Z ≥ val}(R)?
Solution #1: T(W) = T(R)/2
Solution #2: T(W) = T(R)/3

Solution #3: Estimate values in range
Example: relation R, attribute Z with Min = 1, Max = 20, V(R,Z) = 10
W = σ_{Z ≥ 15}(R)
f = (20 - 15 + 1) / (20 - 1 + 1) = 6/20 (fraction of range)
T(W) = f * T(R)

Equivalently: fV(R,Z) = fraction of distinct values T(W) = [f  V(Z,R)] T(R) = f  T(R) V(Z,R)

X  Y =  Size estimate for W = R1 R2 Let x = attributes of R1 y = attributes of R2 X  Y =  Same as R1 x R2 Case 1

Case 2: W = R1 ⋈ R2, X ∩ Y = A
R1(A, B, C), R2(A, D)
Assumption ("containment of value sets"):
- V(R1,A) ≤ V(R2,A) ⇒ every A value in R1 is in R2
- V(R2,A) ≤ V(R1,A) ⇒ every A value in R2 is in R1

Computing T(W) when V(R1,A) ≤ V(R2,A)
R1(A, B, C), R2(A, D)
Take 1 tuple from R1 and match it against R2:
1 tuple matches with T(R2) / V(R2,A) tuples...
so T(W) = T(R2) * T(R1) / V(R2,A)

V(R1,A)  V(R2,A) T(W) = T(R2) T(R1) [A is common attribute]

In general, for W = R1 ⋈ R2:
T(W) = T(R2) * T(R1) / max{ V(R1,A), V(R2,A) }
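The general join estimate fits in a few lines. This is a sketch under the containment-of-value-sets assumption with a single common attribute A. The statistics below are hypothetical.

```python
def natural_join_size(t_r1, t_r2, v_r1_a, v_r2_a):
    """T(W) = T(R1) * T(R2) / max(V(R1,A), V(R2,A)) for one common attribute A."""
    return t_r1 * t_r2 / max(v_r1_a, v_r2_a)


# Hypothetical statistics: R1 has 1000 tuples and 50 distinct A values,
# R2 has 2000 tuples and 100 distinct A values.
print(natural_join_size(1000, 2000, 50, 100))  # 20000.0

# The estimate is symmetric in its arguments.
print(natural_join_size(2000, 1000, 100, 50))  # 20000.0
```

Dividing by the larger of the two distinct-value counts covers both special cases above, because the relation with more distinct A values is the one whose value set contains the other.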

Size Estimation Summary (1/2)
- SC(R,A), where SC(R,A) = T(R) / V(R,A)
- multiplying probabilities
- probability that a record satisfies none of the θ:
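The formulas for the last two items are not in the transcript. The sketch below assumes the usual textbook forms: conditions θ1..θn are treated as independent, s_i is the estimated size of σ_{θi}(R), a conjunction multiplies the probabilities s_i / T(R), and a disjunction is one minus the probability that a record satisfies none of the θi. Function names and sample numbers are hypothetical.

```python
from fractions import Fraction
from math import prod


def conjunction_size(t_r, sizes):
    """Estimate for theta_1 AND ... AND theta_n: T(R) * prod(s_i / T(R))."""
    return t_r * prod(Fraction(s, t_r) for s in sizes)


def disjunction_size(t_r, sizes):
    """Estimate for theta_1 OR ... OR theta_n: T(R) * (1 - prod(1 - s_i / T(R)))."""
    none_match = prod(1 - Fraction(s, t_r) for s in sizes)
    return t_r * (1 - none_match)


# Hypothetical: T(R) = 1000, two conditions selecting 100 and 200 tuples.
print(conjunction_size(1000, [100, 200]))  # 20
print(disjunction_size(1000, [100, 200]))  # 280
```

Exact fractions are used here only so the printed results are free of floating-point noise.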

Size Estimation Summary (2/2)
- R x S: T(RxS) = T(R) * T(S)
- R ⋈ S:
    - R ∩ S = ∅: T(R) * T(S)
    - R ∩ S key for R: maximum output size is T(S)
    - R ∩ S foreign key for R: T(S)
    - R ∩ S = {A}, neither key of R nor S: T(R) * T(S) / V(S,A) or T(R) * T(S) / V(R,A)

A Note on Histograms
σ_{A=val}(R) = ?
[Figure: histogram; vertical axis: number of tuples in R with A value in given range (10 to 40); horizontal axis: A value (10 to 40)]
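One possible way to answer the question with a histogram is sketched below. The bucket boundaries and counts are hypothetical (the figure's actual bar heights are not recoverable from the transcript), and the sketch assumes an integer-valued attribute whose values are spread uniformly within each bucket.

```python
from bisect import bisect_left

# Hypothetical equi-width histogram over A: buckets 1-10, 11-20, 21-30, 31-40.
LOWEST = 1
UPPER = [10, 20, 30, 40]   # inclusive upper bound of each bucket
COUNTS = [10, 30, 40, 20]  # number of tuples whose A value falls in each bucket


def equality_estimate(val):
    """Estimated size of sigma_{A=val}(R): bucket count / bucket width."""
    if val < LOWEST or val > UPPER[-1]:
        return 0.0
    i = bisect_left(UPPER, val)  # index of the bucket containing val
    low = LOWEST if i == 0 else UPPER[i - 1] + 1
    width = UPPER[i] - low + 1
    return COUNTS[i] / width


print(equality_estimate(25))  # 4.0
print(equality_estimate(5))   # 1.0
```

With these counts a plain uniform estimate would give 100 / 40 = 2.5 tuples for every value, so the histogram changes the answer for both dense and sparse regions.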

Summary
- Estimating the size of results is an “art”
- Don’t forget: statistics must be kept up to date… (cost?)