Algebraic Laws. For the binary operators, we push a selection with condition C down to an argument R only if all attributes mentioned in C are attributes of R.

Example: Consider relation schemas R(A,B) and S(B,C) and the expression σ_{(A=1 OR A=3) AND B<C}(R ⋈ S).
1. Split the AND: σ_{A=1 OR A=3}(σ_{B<C}(R ⋈ S))
2. Push σ_{B<C} to S: σ_{A=1 OR A=3}(R ⋈ σ_{B<C}(S))
3. Push σ_{A=1 OR A=3} to R: σ_{A=1 OR A=3}(R) ⋈ σ_{B<C}(S)
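To make the rule concrete, here is a minimal Python sketch of it. The classes `Relation`, `Join`, `Select` and the function `push_selection` are illustrative names, not from any particular system.

```python
# Sketch: push a selection below a natural join when all attributes of the
# condition belong to one argument's schema. Names here are illustrative only.
from dataclasses import dataclass

@dataclass
class Relation:
    name: str
    attrs: frozenset          # schema of the stored relation

@dataclass
class Join:                   # natural join of two subexpressions
    left: object
    right: object

@dataclass
class Select:                 # sigma_C(child); cond_attrs = attributes mentioned in C
    cond_attrs: frozenset
    child: object

def schema(expr):
    """Attributes produced by a (sub)expression; a natural join unions schemas."""
    if isinstance(expr, Relation):
        return expr.attrs
    if isinstance(expr, Join):
        return schema(expr.left) | schema(expr.right)
    if isinstance(expr, Select):
        return schema(expr.child)

def push_selection(expr):
    """Rewrite sigma_C(R join S) into R join sigma_C(S) (or the mirror case)
    when all attributes of C appear in one argument's schema."""
    if isinstance(expr, Select) and isinstance(expr.child, Join):
        left, right = expr.child.left, expr.child.right
        if expr.cond_attrs <= schema(left):
            return Join(Select(expr.cond_attrs, left), right)
        if expr.cond_attrs <= schema(right):
            return Join(left, Select(expr.cond_attrs, right))
    return expr               # rule does not apply; leave the expression as is

# Example from the slide: sigma_{B<C}(R(A,B) join S(B,C)).
# B and C are both attributes of S, so the selection is pushed to S.
R = Relation("R", frozenset({"A", "B"}))
S = Relation("S", frozenset({"B", "C"}))
print(push_selection(Select(frozenset({"B", "C"}), Join(R, S))))
```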

Pushing selections. Usually selections are pushed down the expression tree, but the following example shows that it is sometimes useful to pull a selection up the tree first.
StarsIn(title, year, starName)
Movie(title, year, length, studioName)
CREATE VIEW MoviesOf1996 AS SELECT * FROM Movie WHERE year = 1996;
Query: which stars worked for which studios in 1996?
SELECT starName, studioName FROM MoviesOf1996 NATURAL JOIN StarsIn;

Pull the selection up, then push it down: expanding the view gives σ_{year=1996}(Movie) ⋈ StarsIn. Pulling σ_{year=1996} above the join and then pushing it down into both branches yields σ_{year=1996}(Movie) ⋈ σ_{year=1996}(StarsIn), since year is an attribute of both relations, and the second selection can greatly shrink StarsIn before the join.

Laws for (bag) Projection. A simple law: project out attributes that are not needed later, i.e., keep only the attributes required as input by the operators above, including any join attribute.

Examples for pushing projection. Schema: R(a,b,c), S(c,d,e). For instance, π_{a,e}(R ⋈ S) = π_{a,e}(π_{a,c}(R) ⋈ π_{c,e}(S)): below the join we only need to keep the output attributes a and e plus the join attribute c.

Example: Pushing Projection. Schema: StarsIn(title, year, starName). Query: SELECT starName FROM StarsIn WHERE year = 1996;
Should we transform π_{starName}(σ_{year=1996}(StarsIn)) into π_{starName}(σ_{year=1996}(π_{starName,year}(StarsIn)))? It depends! Is StarsIn stored or computed?

Reasons for not pushing the projection. If StarsIn is stored, then to compute the projection we have to scan the whole relation. If instead StarsIn is pipelined from some previous computation, then we might as well do the projection on the fly. Also, if for example there is an index on year for StarsIn, that index is useless on the projected relation π_{starName,year}(StarsIn), while it is very useful for the selection year = 1996.

Laws for duplicate elimination and grouping. Try to move δ to a position where it can be eliminated altogether, e.g., when δ is applied to: a stored relation with a declared primary key, or a relation that is the result of a γ operation, since grouping creates a relation with no duplicates. γ absorbs δ: δ(γ_L(R)) = γ_L(R). Also: γ_L(R) = γ_L(π_M(R)) — what is M? M is any list of attributes of R that contains at least all the attributes mentioned in L.

Improving logical query plans. Push σ as far down as possible (sometimes pull it up first). Split complex conditions in σ in order to push σ even further. Push π as far down as possible and introduce new early projections (but take care with the exceptions noted above). Combine σ with × to produce θ-joins or equijoins. Choose an order for the joins.

Example of improvement. SELECT title FROM StarsIn, MovieStar WHERE starName = name AND birthdate LIKE '%1960';
Initial plan: π_{title}(σ_{starName=name AND birthdate LIKE '%1960'}(StarsIn × MovieStar)).
Improved plan: π_{title}(StarsIn ⋈_{starName=name} σ_{birthdate LIKE '%1960'}(MovieStar)).

And a better plan still, introducing a projection to filter out useless attributes: π_{title}(StarsIn ⋈_{starName=name} π_{name}(σ_{birthdate LIKE '%1960'}(MovieStar))).

Estimating the Cost of Operations. We don't want to execute the query in order to learn the costs, so we need to estimate them. What is the cost? The number of I/Os needed to manage the intermediate relations. This number is a function of the sizes of the intermediate relations, i.e., the number of their tuples times the number of bytes per tuple. How can we estimate the number of tuples in an intermediate relation? Rules for the estimation formulas: 1. they should give reasonably accurate estimates; 2. they should be easy to compute.

Projection. Bag projection π retains duplicates, so the number of tuples in the result is the same as in the input. Result tuples, however, are usually shorter than the input tuples. The size of a projection is the only one we can compute exactly.
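As a small worked illustration (the attribute widths and tuple count below are made-up numbers, not from the slides), the exact size of a projection follows directly from the widths of the kept attributes:

```python
# Illustrative only: exact size of a bag projection with fixed-width attributes.
attr_bytes = {"title": 100, "year": 4, "starName": 30}   # assumed bytes per attribute
T_StarsIn = 10_000                                        # assumed number of tuples

kept = ["starName", "year"]
tuple_width = sum(attr_bytes[a] for a in kept)            # 34 bytes per result tuple
size_bytes = T_StarsIn * tuple_width                      # projection keeps every tuple
print(size_bytes)                                         # 340000
```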

Selection. Let S = σ_{A=c}(R). We estimate T(S) = T(R)/V(R,A). Let S = σ_{A<c}(R). On average T(S) would be T(R)/2, but the usual rule of thumb is T(R)/3. Let S = σ_{A≠c}(R). An estimate is T(S) = T(R)·(V(R,A)−1)/V(R,A), or simply T(S) = T(R).

Selection... Let S = σ_{C AND D}(R) = σ_C(σ_D(R)), and let U = σ_D(R). First estimate T(U), then use that to estimate T(S). Example: S = σ_{a=10 AND b<20}(R), with T(R) = 10,000 and V(R,a) = 50. Then T(S) = (1/50)·(1/3)·T(R) ≈ 67. Note: watch out for contradictory selections like σ_{a=10 AND a>20}(R), whose result is empty.

Selection... Let S = σ_{C OR D}(R). A simple estimate: T(S) = T(σ_C(R)) + T(σ_D(R)). Problem: this can exceed T(R)! A more accurate estimate: let T(R) = n, m1 = the size of the selection on C, and m2 = the size of the selection on D. Then T(S) = n·(1 − (1 − m1/n)·(1 − m2/n)). Why? Treating the two conditions as independent, a tuple fails both with probability (1 − m1/n)·(1 − m2/n), so it satisfies at least one with the complementary probability. Example: S = σ_{a=10 OR b<20}(R), T(R) = 10,000, V(R,a) = 50. Simple estimate: T(S) ≈ 3533. More accurate: T(S) ≈ 3466.
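These selection estimates are easy to turn into code. A minimal sketch, assuming the statistics T(R) and V(R,A) are available as plain numbers; the function names are illustrative, not from any real optimizer:

```python
# Illustrative selection-size estimators.

def est_eq(T_R, V_R_A):
    """sigma_{A=c}(R): assume values of A are uniformly distributed."""
    return T_R / V_R_A

def est_range(T_R):
    """sigma_{A<c}(R): the rule-of-thumb estimate T(R)/3."""
    return T_R / 3

def est_and(T_R, *selectivities):
    """Conjunction: multiply the selectivity of each condition."""
    est = T_R
    for s in selectivities:
        est *= s
    return est

def est_or(n, m1, m2):
    """Disjunction: n * (1 - (1 - m1/n) * (1 - m2/n)).
    A tuple is in the result unless it fails both (independent) conditions."""
    return n * (1 - (1 - m1 / n) * (1 - m2 / n))

# Numbers from the slides:
T_R, V_R_a = 10_000, 50
print(est_and(T_R, 1 / V_R_a, 1 / 3))   # ~66.7, i.e. about 67 tuples
m1 = est_eq(T_R, V_R_a)                 # sigma_{a=10}: 200
m2 = est_range(T_R)                     # sigma_{b<20}: ~3333
print(m1 + m2)                          # simple OR estimate: ~3533
print(est_or(T_R, m1, m2))              # more accurate: ~3466.7
```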

Natural Join R(X,Y) ⋈ S(Y,Z). Anything could happen! No tuples join: T(R ⋈ S) = 0. Y is the key of S and a foreign key in R (i.e., R.Y references S.Y): then T(R ⋈ S) = T(R). All tuples join, e.g., every R.Y and S.Y value equals the same constant a: then T(R ⋈ S) = T(R)·T(S).

Two Assumptions. Containment of value sets: if V(R,Y) ≤ V(S,Y), then every Y-value in R is assumed to occur as a Y-value in S. When can this happen? For example, when Y is a foreign key in R and a key in S. Preservation of value sets: if A is an attribute of R but not of S, then it is assumed that V(R ⋈ S, A) = V(R,A). This may be violated when there are dangling tuples in R; there is no violation when Y is a foreign key in R and a key in S.

Natural Join size estimation. Let R(X,Y) and S(Y,Z), where Y is a single attribute. What is T(R ⋈ S)? Let r be a tuple of R and s a tuple of S; what is the probability that r and s join? Suppose V(R,Y) ≤ V(S,Y). By containment of value sets, every Y-value of R appears in S, so tuple r is sure to match some tuples of S. But what is the probability that it matches s in particular? It is 1/V(S,Y). Hence T(R ⋈ S) = T(R)·T(S)/V(S,Y) when V(R,Y) ≤ V(S,Y). By symmetric reasoning, when V(S,Y) ≤ V(R,Y) we get T(R ⋈ S) = T(R)·T(S)/V(R,Y). Summarizing, the estimate is T(R ⋈ S) = T(R)·T(S)/max{V(R,Y), V(S,Y)}.

Remember: T(R ⋈ S) = T(R)·T(S)/max{V(R,Y), V(S,Y)}. Example: R(a,b), T(R)=1000, V(R,b)=20; S(b,c), T(S)=2000, V(S,b)=50, V(S,c)=100; U(c,d), T(U)=5000, V(U,c)=500. Estimate the size of R ⋈ S ⋈ U. T(R ⋈ S) = 40,000 and T((R ⋈ S) ⋈ U) = 400,000; T(S ⋈ U) = 20,000 and T(R ⋈ (S ⋈ U)) = 400,000. The equality of the two results is not a coincidence. Note 1: the estimate of the final result should not depend on the evaluation order. Note 2: the intermediate results can have different sizes.
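A minimal sketch of the join estimator, reproducing the three-way example above; the function and variable names are illustrative:

```python
# Illustrative estimator for T(R join S) on a single join attribute Y:
# T(R) * T(S) / max(V(R,Y), V(S,Y)).

def est_join(T_R, T_S, V_R_Y, V_S_Y):
    return T_R * T_S / max(V_R_Y, V_S_Y)

# Statistics from the example: R(a,b), S(b,c), U(c,d).
T_R, V_R_b = 1000, 20
T_S, V_S_b, V_S_c = 2000, 50, 100
T_U, V_U_c = 5000, 500

# (R join S) join U.  By preservation of value sets, V(R join S, c) = V(S, c).
T_RS = est_join(T_R, T_S, V_R_b, V_S_b)            # 40,000
print(est_join(T_RS, T_U, V_S_c, V_U_c))           # 400,000

# R join (S join U).  Similarly, V(S join U, b) = V(S, b).
T_SU = est_join(T_S, T_U, V_S_c, V_U_c)            # 20,000
print(est_join(T_R, T_SU, V_R_b, V_S_b))           # 400,000 again
```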

Natural join with multiple join attributes: R(x,y1,y2) ⋈ S(y1,y2,z). T(R ⋈ S) = T(R)·T(S)/(m1·m2), where m1 = max{V(R,y1), V(S,y1)} and m2 = max{V(R,y2), V(S,y2)}. Why? Let r be a tuple of R and s a tuple of S. The probability that r and s agree on y1 is, by the earlier reasoning, 1/max{V(R,y1), V(S,y1)}; similarly, the probability that they agree on y2 is 1/max{V(R,y2), V(S,y2)}. Assuming agreements on y1 and y2 are independent, we estimate T(R ⋈ S) = T(R)·T(S)/[max{V(R,y1), V(S,y1)}·max{V(R,y2), V(S,y2)}]. Example: T(R)=1000, V(R,b)=20, V(R,c)=100; T(S)=2000, V(S,d)=50, V(S,e)=50; R(a,b,c) ⋈_{R.b=S.d AND R.c=S.e} S(d,e,f). T(R ⋈ S) = (1000·2000)/(50·100) = 400.
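The same idea extends to code by dividing by one max per join attribute. A short sketch (illustrative names) reproducing the 400 from the example:

```python
# Illustrative multi-attribute join estimator:
# T(R) * T(S) / product over join attributes of max(V(R,y), V(S,y)).

def est_join_multi(T_R, T_S, v_pairs):
    """v_pairs is a list of (V(R,y), V(S,y)) pairs, one per join attribute."""
    est = T_R * T_S
    for v_r, v_s in v_pairs:
        est /= max(v_r, v_s)
    return est

# Example from the slide: R(a,b,c) joined with S(d,e,f) on b=d and c=e.
print(est_join_multi(1000, 2000, [(20, 50), (100, 50)]))   # 400.0
```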

Another example (using the earlier statistics): R(a,b), T(R)=1000, V(R,b)=20; S(b,c), T(S)=2000, V(S,b)=50, V(S,c)=100; U(c,d), T(U)=5000, V(U,c)=500. Estimate the size of R ⋈ S ⋈ U. Observe that R ⋈ S ⋈ U = (R × U) ⋈ S, since R and U share no attributes. T(R × U) = 1000·5000 = 5,000,000. Note that the number of b-values in the product is 20 (= V(R,b)) and the number of c-values is 500 (= V(U,c)). T((R × U) ⋈ S) = 5,000,000·2000/(50·500) = 400,000.
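The same arithmetic, written out as a small self-contained snippet (statistics as above, variable names illustrative):

```python
# Illustrative: (R x U) joined with S on both b and c.
T_RU = 1000 * 5000                      # product: T(R) * T(U) = 5,000,000
V_RU_b, V_RU_c = 20, 500                # b-values come from R, c-values from U
T_S, V_S_b, V_S_c = 2000, 50, 100

est = T_RU * T_S / (max(V_RU_b, V_S_b) * max(V_RU_c, V_S_c))
print(est)                              # 400000.0 -- same estimate as before
```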