Presentation is loading. Please wait.

Presentation is loading. Please wait.

C. Faloutsos Query Optimization – part 1

Similar presentations


Presentation on theme: "C. Faloutsos Query Optimization – part 1"— Presentation transcript:

1 C. Faloutsos Query Optimization – part 1
Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Query Optimization – part 1

2 General Overview - rel. model
Relational model - SQL Functional Dependencies & Normalization Physical Design Indexing Query optimization Transaction processing C. Faloutsos

3 Overview of a DBMS casual Naïve user user DBA DML parser DML
precomp. DDL parser trans. mgr buffer mgr catalog Data-files C. Faloutsos

4 Overview - detailed Why q-opt? Equivalence of expressions
C. Faloutsos Overview - detailed Why q-opt? Equivalence of expressions Cost estimation Cost of indices Join strategies C. Faloutsos CMU

5 Why Q-opt? SQL: ~declarative good q-opt -> big difference
eg., seq. Scan vs B-tree index, on P=1,000 pages C. Faloutsos

6 Q-opt steps bring query in internal form (eg., parse tree)
… into ‘canonical form’ (syntactic q-opt) generate alt. plans estimate cost; pick best C. Faloutsos

7 Q-opt - example p s TAKES STUDENT select name from STUDENT, TAKES
where c-id=‘415’ and STUDENT.ssn=TAKES.ssn C. Faloutsos

8 Q-opt - example p STUDENT TAKES s p Canonical form s STUDENT TAKES
C. Faloutsos

9 Hash join; merge join; nested loops;
Q-opt - example STUDENT TAKES s p Hash join; merge join; nested loops; Index; seq scan C. Faloutsos

10 Overview - detailed Why q-opt? Equivalence of expressions
C. Faloutsos Overview - detailed Why q-opt? Equivalence of expressions Cost estimation Cost of indices Join strategies C. Faloutsos CMU

11 Equivalence of expressions
A.k.a.: syntactic q-opt in short: perform selections and projections early More details: see transf. rules in text C. Faloutsos

12 Equivalence of expressions
Q: How to prove a transf. rule? A: use RTC, to show that LHS = RHS, eg: C. Faloutsos

13 Equivalence of expressions
C. Faloutsos

14 Equivalence of expressions
C. Faloutsos

15 Equivalence of expressions
Q: how to disprove a rule?? C. Faloutsos

16 Equivalence of expressions
Selections perform them early break a complex predicate, and push simplify a complex predicate (‘X=Y and Y=3’) -> ‘X=3 and Y=3’ C. Faloutsos

17 Equivalence of expressions
Projections perform them early (but carefully…) Smaller tuples Fewer tuples (if duplicates are eliminated) project out all attributes except the ones requested or required (e.g., joining attr.) C. Faloutsos

18 Equivalence of expressions
Joins Commutative , associative Q: n-way join - how many diff. orderings? C. Faloutsos

19 Equivalence of expressions
Joins - Q: n-way join - how many diff. orderings? A: Catalan number ~ 4^n Exhaustive enumeration: too slow. C. Faloutsos

20 Q-opt steps bring query in internal form (eg., parse tree)
… into ‘canonical form’ (syntactic q-opt) generate alt. plans estimate cost; pick best C. Faloutsos

21 Cost estimation Eg., find ssn’s of students with an ‘A’ in 415 (using seq. scanning) How long will a query take? CPU (but: small cost; decreasing; tough to estimate) Disk (mainly, # block transfers) How many tuples will qualify? (what statistics do we need to keep?) C. Faloutsos

22 Cost estimation Statistics: for each relation ‘r’ we keep
Sr #1 #2 #3 #nr Statistics: for each relation ‘r’ we keep nr : # tuples; Sr : size of tuple in bytes C. Faloutsos

23 Cost estimation Statistics: for each relation ‘r’ we keep …
Sr Statistics: for each relation ‘r’ we keep V(A,r): number of distinct values of attr. ‘A’ (recently, histograms, too) #1 #2 #3 #nr C. Faloutsos

24 Derivable statistics fr: blocking factor = max# records/block (=?? )
fr Sr #1 #2 #br fr: blocking factor = max# records/block (=?? ) br: # blocks (=?? ) SC(A,r) = selection cardinality = avg# of records with A=given (=?? ) C. Faloutsos

25 Derivable statistics fr: blocking factor = max# records/block (= B/Sr ; B: block size in bytes) br: # blocks (= nr / fr ) C. Faloutsos

26 Derivable statistics SC(A,r) = selection cardinality = avg# of records with A=given (= nr / V(A,r) ) (assumes uniformity...) – eg: 10,000 students, 10 colleges – how many students in SCS? C. Faloutsos

27 Additional quantities we need:
For index ‘i’: fi: average fanout (~50-100) HTi: # levels of index ‘i’ (~2-3) ~ log(#entries)/log(fi) LBi: # blocks at leaf level HTi C. Faloutsos

28 Statistics Where do we store them? How often do we update them?
C. Faloutsos

29 Q-opt steps bring query in internal form (eg., parse tree)
… into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections joins estimate cost; pick best C. Faloutsos

30 Cost estimation + plan generation
fr Sr #1 #2 #br Selections – eg., select * from TAKES where grade = ‘A’ Plans? C. Faloutsos

31 Cost estimation + plan generation
fr Sr #1 #2 #br Plans? seq. scan binary search (if sorted & consecutive) index search if an index exists C. Faloutsos

32 Cost estimation + plan generation
fr Sr #1 #2 #br seq. scan – cost? br (worst case) br/2 (average, if we search for primary key) C. Faloutsos

33 Cost estimation + plan generation
fr Sr #1 #2 #br binary search – cost? if sorted and consecutive: ~log(br) + SC(A,r)/fr (=blocks spanned by qual. tuples) C. Faloutsos

34 Cost estimation + plan generation
fr Sr #1 #2 #br estimation of selection cardinalities SC(A,r): non-trivial – details later C. Faloutsos

35 Cost estimation + plan generation
fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples ... case#1: primary key case#2: sec. key – clustering index case#3: sec. key – non-clust. index C. Faloutsos

36 Cost estimation + plan generation
fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples .. case#1: primary key – cost: HTi + 1 HTi C. Faloutsos

37 Cost estimation + plan generation
Sr method#3: index – cost? levels of index + blocks w/ qual. tuples #1 fr #2 case#2: sec. key – clustering index HTi + SC(A,r)/fr HTi #br C. Faloutsos

38 Cost estimation + plan generation
fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples ... case#3: sec. key – non-clust. index HTi + SC(A,r) (actually, pessimistic...) C. Faloutsos

39 Cost estimation – arithm. examples
find accounts with branch-name = ‘Perryridge’ account(branch-name, balance, ...) C. Faloutsos

40 Arithm. examples – cont’d
n-account = 10,000 tuples f-account = 20 tuples/block V(balance, account) = 500 distinct values V(branch-name, account) = 50 distinct values for branch-index: fanout fi = 20 C. Faloutsos

41 Arithm. examples Q1: cost of seq. scan? A1: 500 disk accesses
Q2: assume a clustering index on branch-name – cost? C. Faloutsos

42 Cost estimation + plan generation
Sr method#3: index – cost? levels of index + blocks w/ qual. tuples #1 fr #2 case#2: sec. key – clustering index HTi + SC(A,r)/fr HTi #br C. Faloutsos

43 Arithm. examples A2: HTi + SC(branch-name, account)/f-account HTi: 50 values, with index fanout 20 -> HT=2 levels (log(50)/log(20) = 1+) SC(..)= # qual records = nr/V(A,r) = 10,000/50 = 200 tuples spanning 200/20 blocks = 10 blocks C. Faloutsos

44 Arithm. examples A2 final answer: 2+10= 12 block accesses
(vs. 500 block accesses of seq. scan) footnote: in all fairness seq. disk accesses: ~2msec or less random disk accesses: ~10msec C. Faloutsos

45 Q-opt steps bring query in internal form (eg., parse tree)
… into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections joins estimate cost; pick best C. Faloutsos


Download ppt "C. Faloutsos Query Optimization – part 1"

Similar presentations


Ads by Google