Lecture 28: Size/Cost Estimation, Recovery


Lecture 28: Size/Cost Estimation, Recovery Monday, December 6, 2004

Outline
  Cost estimation: 16.4
  Recovery using undo logging: 17.2

Size Estimation
  The problem: given an expression E, compute T(E), the number of tuples in E, and V(E, A), the number of distinct values of attribute A in E.
  This is hard without computing E.
  We will estimate them instead.

Size Estimation
  Estimating the size of a projection
  Easy: T(π_L(R)) = T(R)
  This is because a projection doesn't eliminate duplicates.

Size Estimation
  Estimating the size of a selection
  S = σ_{A=c}(R)
    T(S) can be anything from 0 to T(R) - V(R,A) + 1
    Estimate: T(S) = T(R)/V(R,A)
    When V(R,A) is not available, estimate T(S) = T(R)/10
  S = σ_{A<c}(R)
    T(S) can be anything from 0 to T(R)
    Estimate: T(S) = T(R) * (c - Low(R,A))/(High(R,A) - Low(R,A))
    When Low, High are unavailable, estimate T(S) = T(R)/3
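The selection estimates above can be sketched as two small Python helpers. The function names and parameter names are illustrative assumptions, not part of the lecture. The range estimate scales T(R) by the fraction of the [Low, High] interval that lies below c, which assumes values are spread uniformly; clamping that fraction to [0, 1] is an extra safeguard added here.

```python
from typing import Optional


def estimate_eq_selection(t_r: float, v_r_a: Optional[float] = None) -> float:
    """Estimate T(S) for S = sigma_{A=c}(R).

    Uses T(R)/V(R,A), or the default T(R)/10 when V(R,A) is unknown.
    """
    if v_r_a is None or v_r_a <= 0:
        return t_r / 10
    return t_r / v_r_a


def estimate_lt_selection(
    t_r: float,
    c: float,
    low: Optional[float] = None,
    high: Optional[float] = None,
) -> float:
    """Estimate T(S) for S = sigma_{A<c}(R).

    Uses T(R) * (c - Low)/(High - Low), or the default T(R)/3 when
    Low and High are unknown.
    """
    if low is None or high is None or high <= low:
        return t_r / 3
    fraction = (c - low) / (high - low)
    # Assumption added for this sketch: keep the fraction inside [0, 1]
    # so that a constant outside [Low, High] never yields a negative
    # size or a size above T(R).
    fraction = min(1.0, max(0.0, fraction))
    return t_r * fraction
```

For instance, with T(R) = 10000 and V(R,A) = 100 the equality estimate is 100 tuples, and with Low = 0, High = 100, c = 25 the range estimate is 2500 tuples.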

Size Estimation
  Estimating the size of a natural join, R ⋈_A S
  When the sets of A values are disjoint, then T(R ⋈_A S) = 0
  When A is a key in S and a foreign key in R, then T(R ⋈_A S) = T(R)
  When A has a unique value, the same in R and S, then T(R ⋈_A S) = T(R) T(S)

Size Estimation
  Assumptions:
  Containment of values: if V(R,A) <= V(S,A), then the set of A values of R is included in the set of A values of S
    Note: this indeed holds when A is a foreign key in R, and a key in S
  Preservation of values: for any other attribute B, V(R ⋈_A S, B) = V(R, B) (or V(S, B))

Size Estimation
  Assume V(R,A) <= V(S,A)
  Then each tuple t in R joins some tuple(s) in S
    How many? On average T(S)/V(S,A)
    t will contribute T(S)/V(S,A) tuples in R ⋈_A S
  Hence T(R ⋈_A S) = T(R) T(S) / V(S,A)
  In general: T(R ⋈_A S) = T(R) T(S) / max(V(R,A), V(S,A))

Size Estimation
  Example:
  T(R) = 10000, T(S) = 20000
  V(R,A) = 100, V(S,A) = 200
  How large is R ⋈_A S?
  Answer: T(R ⋈_A S) = 10000 * 20000 / 200 = 1M
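The general join formula and the example above can be checked with a few lines of Python. This is a minimal sketch; the function name `estimate_join_size` is an assumption made for illustration.

```python
def estimate_join_size(t_r: float, t_s: float, v_r_a: float, v_s_a: float) -> float:
    """Estimate T(R join_A S) as T(R) * T(S) / max(V(R,A), V(S,A))."""
    return t_r * t_s / max(v_r_a, v_s_a)


# The example from the slide: T(R) = 10000, T(S) = 20000,
# V(R,A) = 100, V(S,A) = 200.
example = estimate_join_size(10_000, 20_000, 100, 200)
print(example)  # 1000000.0
```

The same function reproduces two of the special cases listed earlier. When A is a key in S (so V(S,A) = T(S)) and a foreign key in R, the estimate reduces to T(R). When A has a single value shared by R and S (both V values equal 1), it gives T(R) T(S).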

Size Estimation
  Joins on more than one attribute:
  T(R ⋈_{A,B} S) = T(R) T(S) / (max(V(R,A), V(S,A)) * max(V(R,B), V(S,B)))
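The multi-attribute formula divides by one max(...) factor per join attribute, which can be sketched as below. The function name and the list-of-pairs input are assumptions made for this illustration, and the values for attribute B in the usage line are made up, not taken from the lecture.

```python
from math import prod
from typing import Iterable, Tuple


def estimate_multi_attribute_join(
    t_r: float,
    t_s: float,
    distinct_counts: Iterable[Tuple[float, float]],
) -> float:
    """Estimate the size of a join on several attributes.

    distinct_counts holds one (V(R,X), V(S,X)) pair per join attribute X.
    The estimate is T(R) * T(S) divided by the product of the per-attribute
    maxima.
    """
    denominator = prod(max(v_r, v_s) for v_r, v_s in distinct_counts)
    return t_r * t_s / denominator


# Attribute A uses the numbers from the earlier example; the pair (50, 20)
# for attribute B is a hypothetical value chosen for this sketch.
print(estimate_multi_attribute_join(10_000, 20_000, [(100, 200), (50, 20)]))
```

With a single pair the function falls back to the one-attribute formula, so each extra join attribute can only shrink the estimate.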

Histograms
  Statistics on data maintained by the RDBMS
  They make size estimation much more accurate (hence, cost estimations are more accurate)

Histograms
  Employee(ssn, name, salary, phone)
  Maintain a histogram on salary:
    Salary   0..20k   20k..40k   40k..60k   60k..80k   80k..100k   > 100k
    Tuples   200      800        5000       12000      6500        500
  T(Employee) = 25000, but now we know the distribution
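One way to see what knowing the distribution buys is to estimate a range selection such as salary < c from the histogram. The sketch below is an illustration, not something stated in the lecture: it assumes values are uniform inside each bucket, and it counts the open-ended "> 100k" bucket in full whenever c falls inside it, since there is no upper bound to interpolate against. All names are hypothetical.

```python
import math

# (low, high, tuple count) per bucket, copied from the Employee salary histogram.
SALARY_HISTOGRAM = [
    (0, 20_000, 200),
    (20_000, 40_000, 800),
    (40_000, 60_000, 5000),
    (60_000, 80_000, 12000),
    (80_000, 100_000, 6500),
    (100_000, math.inf, 500),
]


def estimate_less_than(histogram, c):
    """Estimate how many tuples have a value below c."""
    total = 0.0
    for low, high, count in histogram:
        if c >= high:
            # The whole bucket lies below c.
            total += count
        elif c > low:
            if math.isinf(high):
                # Open-ended bucket: no interpolation possible, so count
                # all of it (an upper bound, by assumption of this sketch).
                total += count
            else:
                # Partial bucket: assume values are uniform inside it.
                total += count * (c - low) / (high - low)
    return total


print(estimate_less_than(SALARY_HISTOGRAM, 50_000))  # 3500.0
```

For salary < 50k the histogram gives 200 + 800 + half of 5000 = 3500 tuples, whereas the default T(R)/3 estimate for a range selection would give roughly 8333.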

Histograms
  Ranks(rankName, salary)
  Estimate the size of Employee ⋈_Salary Ranks
    Salary     0..20k   20k..40k   40k..60k   60k..80k   80k..100k   > 100k
    Employee   200      800        5000       12000      6500        500
    Ranks      8        20         40         80         100         2

Histograms
  Eqwidth
    Range    0..20   20..40   40..60   60..80   80..100
    Tuples   2       104      9739     152      3
  Eqdepth
    Range    0..44   44..48   48..50   50..56   56..100
    Tuples   2000    2000     2000     2000     2000
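The difference between the two histogram kinds can be sketched in Python: an equi-width histogram fixes the bucket ranges and counts tuples, while an equi-depth histogram fixes the count per bucket and derives the boundaries. The function names and the rank-based boundary rule are assumptions made for this illustration; real systems may choose boundaries differently.

```python
def equi_width_counts(values, num_buckets, low, high):
    """Count values in num_buckets equally wide buckets covering [low, high].

    Assumes every value lies in [low, high]; a value equal to high is
    placed in the last bucket.
    """
    width = (high - low) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        index = min(int((v - low) / width), num_buckets - 1)
        counts[index] += 1
    return counts


def equi_depth_boundaries(values, num_buckets):
    """Return the num_buckets - 1 inner boundaries of an equi-depth histogram.

    Boundary i is the value at rank i * n / num_buckets in sorted order,
    so each bucket holds roughly n / num_buckets values.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_buckets] for i in range(1, num_buckets)]


# A small skewed data set (made up for this sketch): most values are 50.
skewed = [50] * 8 + [10, 90]
print(equi_width_counts(skewed, 5, 0, 100))   # [1, 0, 8, 0, 1]
print(equi_depth_boundaries(skewed, 5))       # [50, 50, 50, 50]
```

On skewed data the equi-width counts are very uneven, while the equi-depth boundaries crowd around the dense region, which mirrors the narrow middle buckets in the Eqdepth row above.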

Histograms
  Assume: V(Employee, Salary) = 200, V(Ranks, Salary) = 250
  Main property: Employee ⋈_Salary Ranks = Employee1 ⋈_Salary Ranks1' ∪ ... ∪ Employee6 ⋈_Salary Ranks6'
  A tuple t in Employee1 joins with how many tuples in Ranks1'?
  Answer: with T(Ranks1')/T(Ranks) * T(Ranks)/250 = T1'/250
  Then T(Employee ⋈_Salary Ranks) = Σ_{i=1..6} Ti Ti' / 250
    = (200x8 + 800x20 + 5000x40 + 12000x80 + 6500x100 + 500x2)/250 = …
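The bucket-by-bucket sum can be evaluated directly. The sketch below plugs in the Employee and Ranks bucket counts from the histograms and divides by 250, the larger of the two distinct-value counts; the function and constant names are assumptions made for illustration.

```python
# Tuples per salary bucket, in the order
# 0..20k, 20k..40k, 40k..60k, 60k..80k, 80k..100k, > 100k.
EMPLOYEE_COUNTS = [200, 800, 5000, 12000, 6500, 500]
RANKS_COUNTS = [8, 20, 40, 80, 100, 2]


def estimate_histogram_join(counts_r, counts_s, distinct_values):
    """Estimate a join size as sum_i (Ti * Ti') / distinct_values.

    Both histograms must use the same bucket boundaries, so that tuples
    in bucket i of one relation can only join tuples in bucket i of the
    other.
    """
    if len(counts_r) != len(counts_s):
        raise ValueError("histograms must have the same number of buckets")
    return sum(t_r * t_s for t_r, t_s in zip(counts_r, counts_s)) / distinct_values


with_histogram = estimate_histogram_join(EMPLOYEE_COUNTS, RANKS_COUNTS, 250)

# For comparison: the estimate without histograms,
# T(Employee) * T(Ranks) / max(V(Employee, Salary), V(Ranks, Salary)).
without_histogram = sum(EMPLOYEE_COUNTS) * sum(RANKS_COUNTS) / max(200, 250)

print(with_histogram, without_histogram)
```

With these numbers the numerator is 1,828,600, so the histogram-based estimate is about 7314 tuples, compared with 25000 from the formula that ignores the distribution.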