CS4432: Database Systems II
Logical Plan Rewriting

Query processing
[Figure: the query processing pipeline]
SQL query → parse → parse tree → convert → logical query plan (l.q.p.) → apply laws → "improved" l.q.p. → estimate result sizes (using statistics) → l.q.p. + sizes → consider physical plans → {P1, P2, ...} → estimate costs → {(P1,C1), (P2,C2), ...} → pick best → Pi → execute → answer

Query in SQL → Query Plan in Algebra (logical) → Other Query Plan in Algebra (logical)

Query plan 1 (in relational algebra)
$\pi_{B,D}\,[\sigma_{R.A = \text{``c''} \,\wedge\, S.E = 2 \,\wedge\, R.C = S.C}(R \times S)]$

Query plan 2 (in relational algebra)
$\pi_{B,D}\,[\sigma_{R.A = \text{``c''}}(R) \bowtie \sigma_{S.E = 2}(S)]$, where $\bowtie$ is the natural join on R.C = S.C

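To make the equivalence of the two plans concrete, here is a minimal Python sketch that evaluates both on a tiny instance. The schemas R(A, B, C) and S(C, D, E) and all the sample rows are assumptions made for illustration; they are not given on the slides.

```python
from collections import Counter

# Hypothetical sample data; the schemas R(A, B, C) and S(C, D, E) are assumed.
R = [
    {"A": "c", "B": 1, "C": 10},
    {"A": "c", "B": 2, "C": 20},
    {"A": "d", "B": 3, "C": 10},
]
S = [
    {"C": 10, "D": "x", "E": 2},
    {"C": 20, "D": "y", "E": 3},
    {"C": 10, "D": "z", "E": 2},
]

def plan1(R, S):
    # Cross product first, then one selection over all three conditions,
    # then project onto (B, D). The result is kept as a bag.
    return Counter(
        (r["B"], s["D"])
        for r in R
        for s in S
        if r["A"] == "c" and s["E"] == 2 and r["C"] == s["C"]
    )

def plan2(R, S):
    # Select on each input first, then join the reduced inputs on C,
    # then project onto (B, D).
    r_sel = [r for r in R if r["A"] == "c"]
    s_sel = [s for s in S if s["E"] == 2]
    return Counter(
        (r["B"], s["D"]) for r in r_sel for s in s_sel if r["C"] == s["C"]
    )

result1 = plan1(R, S)
result2 = plan2(R, S)
print(result1 == result2)
```

Both plans return the same bag of (B, D) pairs, but plan 2 compares only the rows that survive the early selections, while plan 1 enumerates the full cross product.
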
Relational algebra optimization
What are transformation rules?
– preserve equivalence
What are good transformations?
– reduce query execution costs

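Rules: Natural join rewriting
$R \bowtie S = S \bowtie R$
$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
Can also write as trees, e.g.:
[Figure: join trees for $(R \bowtie S) \bowtie T$ and $R \bowtie (S \bowtie T)$]
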
Rules: Other binary operators?
$R \bowtie S = S \bowtie R$
$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
What about:
– Cross product?
– Condition join?
– Union?
– Intersection?
– Difference?

Note:
[Figure: join trees over R, S, and T]

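Rules: Natural joins & cross products & union
$R \bowtie S = S \bowtie R$
$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$
$R \times S = S \times R$
$(R \times S) \times T = R \times (S \times T)$
$R \cup S = S \cup R$
$R \cup (S \cup T) = (R \cup S) \cup T$
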
Rules: Selects
$\sigma_{p_1 \wedge p_2}(R) = \sigma_{p_1}[\sigma_{p_2}(R)]$
$\sigma_{p_1 \vee p_2}(R) = [\sigma_{p_1}(R)] \cup [\sigma_{p_2}(R)]$

Bags vs. Sets
R = {a,a,b,b,b,c}
S = {b,b,c,c,d}
What about union $R \cup S$ = ?
– Option 1 (SUM): $R \cup S$ = {a,a,b,b,b,b,b,c,c,c,d}
– Option 2 (MAX): $R \cup S$ = {a,a,b,b,b,c,c,d}

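The two union options can be reproduced with Python's `collections.Counter`, used here as a stand-in for a bag. This is only an illustrative sketch: `Counter` addition adds multiplicities (SUM) and the `|` operator takes the per-element maximum (MAX).

```python
from collections import Counter

R = Counter("aabbbc")   # the bag {a,a,b,b,b,c}
S = Counter("bbccd")    # the bag {b,b,c,c,d}

union_sum = R + S       # Option 1 (SUM): multiplicities add up
union_max = R | S       # Option 2 (MAX): keep the larger multiplicity

print(sorted(union_sum.elements()))
print(sorted(union_max.elements()))
```
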
Which option makes this rule work?
$\sigma_{p_1 \vee p_2}(R) = \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$
Example: R = {a,a,b,b,b,c}; P1 satisfied by a, b; P2 satisfied by b, c
Let us try MAX():
$\sigma_{p_1 \vee p_2}(R)$ = {a,a,b,b,b,c}
$\sigma_{p_1}(R)$ = {a,a,b,b,b}
$\sigma_{p_2}(R)$ = {b,b,b,c}
$\sigma_{p_1}(R) \cup \sigma_{p_2}(R)$ = {a,a,b,b,b,c}

Which option makes this rule work?
$\sigma_{p_1 \vee p_2}(R) = \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$
Example: R = {a,a,b,b,b,c}; P1 satisfied by a, b; P2 satisfied by b, c
What about SUM()?
$\sigma_{p_1 \vee p_2}(R)$ = {a,a,b,b,b,c}
$\sigma_{p_1}(R)$ = {a,a,b,b,b}
$\sigma_{p_2}(R)$ = {b,b,b,c}
$\sigma_{p_1}(R) \cup \sigma_{p_2}(R)$ = {a,a,b,b,b,b,b,b,c}

Which option makes this rule work?
$\sigma_{p_1 \wedge p_2}(R) = \sigma_{p_1}[\sigma_{p_2}(R)]$
Example: R = {a,a,b,b,b,c}; P1 satisfied by a, b; P2 satisfied by b, c
What about MAX versus SUM?

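One way to explore the question is to evaluate both sides on the example bag. The sketch below assumes the usual bag semantics for selection (each qualifying element keeps its full multiplicity); the helper `select` is a hypothetical name. Since neither side contains a union, the MAX versus SUM choice never comes into play for this rule.

```python
from collections import Counter

R = Counter("aabbbc")   # the bag {a,a,b,b,b,c}

def select(bag, pred):
    """Bag selection: keep each qualifying element with its full multiplicity."""
    return Counter({x: n for x, n in bag.items() if pred(x)})

p1 = lambda x: x in "ab"   # P1 is satisfied by a, b
p2 = lambda x: x in "bc"   # P2 is satisfied by b, c

lhs = select(R, lambda x: p1(x) and p2(x))   # sigma_{p1 AND p2}(R)
rhs = select(select(R, p2), p1)              # sigma_{p1}[sigma_{p2}(R)]

print(sorted(lhs.elements()), sorted(rhs.elements()))
```
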
Option 2 (MAX) makes this rule work:
$\sigma_{p_1 \vee p_2}(R) = \sigma_{p_1}(R) \cup \sigma_{p_2}(R)$
Example: R = {a,a,b,b,b,c}; P1 satisfied by a, b; P2 satisfied by b, c
$\sigma_{p_1 \vee p_2}(R)$ = {a,a,b,b,b,c}
$\sigma_{p_1}(R)$ = {a,a,b,b,b}
$\sigma_{p_2}(R)$ = {b,b,b,c}
$\sigma_{p_1}(R) \cup \sigma_{p_2}(R)$ = {a,a,b,b,b,c}

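The same example can be checked mechanically. In this sketch `Counter` again stands in for a bag and `select` is a hypothetical helper; the right-hand side is computed once with MAX union (`|`) and once with SUM union (`+`).

```python
from collections import Counter

R = Counter("aabbbc")   # the bag {a,a,b,b,b,c}

def select(bag, pred):
    """Bag selection: keep each qualifying element with its full multiplicity."""
    return Counter({x: n for x, n in bag.items() if pred(x)})

p1 = lambda x: x in "ab"   # P1 is satisfied by a, b
p2 = lambda x: x in "bc"   # P2 is satisfied by b, c

lhs = select(R, lambda x: p1(x) or p2(x))    # sigma_{p1 OR p2}(R)
rhs_max = select(R, p1) | select(R, p2)      # union with the MAX option
rhs_sum = select(R, p1) + select(R, p2)      # union with the SUM option

print(lhs == rhs_max)   # the rule holds under MAX
print(lhs == rhs_sum)   # the rule fails under SUM: b is counted six times
```
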
Yet another example!
Senators (……)   Reps (……)
T1 = $\pi_{yr,state}$ Senators; T2 = $\pi_{yr,state}$ Reps

T1            T2
Yr  State     Yr  State
97  CA        99  CA
99  CA        99  CA
98  AZ        98  CA

Union? "Sum" option makes more sense!

Executive Decision
-> Use "SUM" option for bag unions
-> CAREFUL! Some rules cannot be used for bags

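Rules: Project
Let: X = set of attributes
Y = set of attributes
XY = X U Y
$\pi_{x}(R) = \pi_{x}[\pi_{xy}(R)]$
Note that the reverse form $\pi_{xy}(R) = \pi_{x}[\pi_{y}(R)]$ does not hold in general: once $\pi_{y}$ has dropped the attributes of X that are not in Y, $\pi_{x}$ cannot recover them.
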
Rules: $\sigma$ and $\bowtie$ combined
Let p = predicate with only R attributes
q = predicate with only S attributes
m = predicate with both R and S attributes
$\sigma_{p}(R \bowtie S) = [\sigma_{p}(R)] \bowtie S$
$\sigma_{q}(R \bowtie S) = R \bowtie [\sigma_{q}(S)]$

Rules: $\sigma$ and $\bowtie$ combined
$\sigma_{p \wedge q}(R \bowtie S) = $ ?
Rule can be derived!

Derivation for rule:
$\sigma_{p \wedge q}(R \bowtie S) = \sigma_{p}[\sigma_{q}(R \bowtie S)] = \sigma_{p}[R \bowtie \sigma_{q}(S)] = [\sigma_{p}(R)] \bowtie [\sigma_{q}(S)]$

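Rules: $\sigma$ and $\bowtie$ combined (continued)
More rules can be derived:
$\sigma_{p \wedge q}(R \bowtie S) = $ ?
$\sigma_{p \wedge q \wedge m}(R \bowtie S) = $ ?
$\sigma_{p \vee q}(R \bowtie S) = $ ?
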
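We did one, do the others on your own:
$\sigma_{p \wedge q}(R \bowtie S) = [\sigma_{p}(R)] \bowtie [\sigma_{q}(S)]$
$\sigma_{p \wedge q \wedge m}(R \bowtie S) = \sigma_{m}[(\sigma_{p} R) \bowtie (\sigma_{q} S)]$
$\sigma_{p \vee q}(R \bowtie S) = [(\sigma_{p} R) \bowtie S] \cup [R \bowtie (\sigma_{q} S)]$
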
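The conjunction and disjunction rules can be sanity-checked on a small instance. The sketch below assumes hypothetical schemas R(A, C) and S(C, E) and uses Python sets, so it checks the rules under set semantics only. With the SUM option for bag union, the disjunction rule would count a joined tuple twice when it satisfies both p and q.

```python
# Hypothetical schemas: R(A, C) and S(C, E), joined on the shared attribute C.
R = {("c", 1), ("c", 2), ("d", 1)}
S = {(1, 2), (2, 5), (3, 2)}

def join(R, S):
    """Natural join of R(A, C) and S(C, E), returning (A, C, E) tuples."""
    return {(a, c, e) for (a, c) in R for (c2, e) in S if c == c2}

def sel(rel, pred):
    """Set selection: keep the tuples that satisfy pred."""
    return {t for t in rel if pred(t)}

p = lambda r: r[0] == "c"   # predicate on R attributes only (A = "c")
q = lambda s: s[1] == 2     # predicate on S attributes only (E = 2)

# p AND q: push each conjunct down to its own input.
lhs_and = {t for t in join(R, S) if t[0] == "c" and t[2] == 2}
rhs_and = join(sel(R, p), sel(S, q))

# p OR q: union of two joins, each with one selection pushed down.
lhs_or = {t for t in join(R, S) if t[0] == "c" or t[2] == 2}
rhs_or = join(sel(R, p), S) | join(R, sel(S, q))

print(lhs_and == rhs_and, lhs_or == rhs_or)
```
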
Rules: $\pi$ and $\sigma$ combined
Let x = subset of R attributes
z = attributes in predicate P (subset of R attributes)
$\pi_{x}[\sigma_{p}(R)] = \pi_{x}\{\sigma_{p}[\pi_{xz}(R)]\}$

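As a quick check of this rule, the sketch below uses a hypothetical relation R(A, B, C) with x = {A} and a predicate on B, so z = {B}. Rows are dicts, projection keeps duplicates (bag semantics), and the helper names are made up for illustration.

```python
from collections import Counter

# Hypothetical relation R(A, B, C); x = {A}, predicate p uses B, so z = {B}.
R = [
    {"A": 1, "B": "cat", "C": 7},
    {"A": 1, "B": "cat", "C": 8},
    {"A": 2, "B": "dog", "C": 9},
]

def project(rows, attrs):
    """Bag projection: keep only attrs, do not eliminate duplicates."""
    return [{a: r[a] for a in attrs} for r in rows]

def select(rows, pred):
    return [r for r in rows if pred(r)]

def as_bag(rows):
    """Multiset of rows, so results can be compared regardless of order."""
    return Counter(tuple(sorted(r.items())) for r in rows)

p = lambda r: r["B"] == "cat"

lhs = project(select(R, p), ["A"])                          # pi_x[sigma_p(R)]
rhs = project(select(project(R, ["A", "B"]), p), ["A"])     # pi_x{sigma_p[pi_xz(R)]}

print(as_bag(lhs) == as_bag(rhs))
```

The inner projection must keep the attributes in z; projecting to x alone before the selection would leave the predicate with nothing to test.
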
Rules: $\pi$ and $\bowtie$ combined
Let x = subset of R attributes
y = subset of S attributes
z = intersection of R, S attributes
$\pi_{xy}(R \bowtie S) = \pi_{xy}\{[\pi_{xz}(R)] \bowtie [\pi_{yz}(S)]\}$

$\pi_{xy}\{\sigma_{p}(R \bowtie S)\} = \pi_{xy}\{\sigma_{p}[\pi_{xz'}(R) \bowtie \pi_{yz'}(S)]\}$
z' = z U {attributes used in P}

Rules: $\sigma$ and $\cup$ combined:
$\sigma_{p}(R \cup S) = \sigma_{p}(R) \cup \sigma_{p}(S)$
$\sigma_{p}(R - S) = \sigma_{p}(R) - S = \sigma_{p}(R) - \sigma_{p}(S)$

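These rules can be tried on the earlier example bags. The sketch assumes SUM for bag union (`Counter` addition) and the usual bag difference, where multiplicities are subtracted and never go below zero (`Counter` subtraction); `select` is a hypothetical helper.

```python
from collections import Counter

R = Counter("aabbbc")   # the bag {a,a,b,b,b,c}
S = Counter("bbccd")    # the bag {b,b,c,c,d}

def select(bag, pred):
    """Bag selection: keep each qualifying element with its full multiplicity."""
    return Counter({x: n for x, n in bag.items() if pred(x)})

p = lambda x: x in "ab"   # predicate satisfied by a, b

# Selection distributes over bag union (SUM option).
union_lhs = select(R + S, p)
union_rhs = select(R, p) + select(S, p)

# Selection over bag difference: all three forms agree.
diff_lhs = select(R - S, p)
diff_mid = select(R, p) - S
diff_rhs = select(R, p) - select(S, p)

print(union_lhs == union_rhs, diff_lhs == diff_mid == diff_rhs)
```
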
Which are “good” transformations?

Conventional wisdom: do projects early
Example: relation R(A,B,C,D,E), predicate P: (A=3) $\wedge$ (B="cat")
$\pi_{E}\{\sigma_{p}(R)\}$ vs. $\pi_{E}\{\sigma_{p}\{\pi_{ABE}(R)\}\}$

What if we have A, B indexes?
[Figure: index lookups for B = "cat" and A = 3]
Intersect pointers to get pointers to matching tuples!
But then it is better to do the projection later!

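The pointer-intersection idea can be sketched as follows. Everything here is an assumption for illustration: the rows, the use of row ids as "pointers", and the dictionary-based indexes built by the hypothetical helper `build_index`.

```python
# Hypothetical relation R(A, B, C, D, E), stored as row id -> tuple.
rows = {
    0: (3, "cat", 1, 1, "e0"),
    1: (3, "dog", 2, 2, "e1"),
    2: (4, "cat", 3, 3, "e2"),
    3: (3, "cat", 4, 4, "e3"),
}

def build_index(rows, col):
    """Map each value in column col to the set of row ids holding it."""
    index = {}
    for rid, t in rows.items():
        index.setdefault(t[col], set()).add(rid)
    return index

index_a = build_index(rows, 0)   # index on A
index_b = build_index(rows, 1)   # index on B

# Intersect the two pointer sets, then fetch only the matching tuples
# and project onto E at the very end.
matching = index_a.get(3, set()) & index_b.get("cat", set())
result = sorted(rows[rid][4] for rid in matching)

print(result)
```

With both indexes available, the selection touches only the matching rows, so an early projection $\pi_{ABE}(R)$ over the whole relation would add a full scan without saving anything.
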
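Which are “good” transformations?
$\sigma_{p_1 \wedge p_2}(R) \rightarrow \sigma_{p_1}[\sigma_{p_2}(R)]$
$\sigma_{p}(R \bowtie S) \rightarrow [\sigma_{p}(R)] \bowtie S$
$R \bowtie S \rightarrow S \bowtie R$
$\pi_{x}[\sigma_{p}(R)] \rightarrow \pi_{x}\{\sigma_{p}[\pi_{xz}(R)]\}$
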
Bottom line:
Some heuristics:
– Early selection is usually good
No transformation is always good
Rule application defines a search space
– Need cost criteria to make decision

In textbook: more transformations
Chapter 16.2: more rewrite rules
Other operations, such as duplicate elimination, etc.