Download presentation
Presentation is loading. Please wait.
1
C. Faloutsos Query Optimization – part 1
Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Query Optimization – part 1
2
General Overview - rel. model
Relational model - SQL Functional Dependencies & Normalization Physical Design Indexing Query optimization Transaction processing C. Faloutsos
3
Overview of a DBMS casual Naïve user user DBA DML parser DML
precomp. DDL parser trans. mgr buffer mgr catalog Data-files C. Faloutsos
4
Overview - detailed Why q-opt? Equivalence of expressions
C. Faloutsos Overview - detailed Why q-opt? Equivalence of expressions Cost estimation Cost of indices Join strategies C. Faloutsos CMU
5
Why Q-opt? SQL: ~declarative good q-opt -> big difference
eg., seq. Scan vs B-tree index, on P=1,000 pages C. Faloutsos
6
Q-opt steps bring query in internal form (eg., parse tree)
… into ‘canonical form’ (syntactic q-opt) generate alt. plans estimate cost; pick best C. Faloutsos
7
Q-opt - example p s TAKES STUDENT select name from STUDENT, TAKES
where c-id=‘415’ and STUDENT.ssn=TAKES.ssn C. Faloutsos
8
Q-opt - example p STUDENT TAKES s p Canonical form s STUDENT TAKES
C. Faloutsos
9
Hash join; merge join; nested loops;
Q-opt - example STUDENT TAKES s p Hash join; merge join; nested loops; Index; seq scan C. Faloutsos
10
Overview - detailed Why q-opt? Equivalence of expressions
C. Faloutsos Overview - detailed Why q-opt? Equivalence of expressions Cost estimation Cost of indices Join strategies C. Faloutsos CMU
11
Equivalence of expressions
A.k.a.: syntactic q-opt in short: perform selections and projections early More details: see transf. rules in text C. Faloutsos
12
Equivalence of expressions
Q: How to prove a transf. rule? A: use RTC, to show that LHS = RHS, eg: C. Faloutsos
13
Equivalence of expressions
C. Faloutsos
14
Equivalence of expressions
C. Faloutsos
15
Equivalence of expressions
Q: how to disprove a rule?? C. Faloutsos
16
Equivalence of expressions
Selections perform them early break a complex predicate, and push simplify a complex predicate (‘X=Y and Y=3’) -> ‘X=3 and Y=3’ C. Faloutsos
17
Equivalence of expressions
Projections perform them early (but carefully…) Smaller tuples Fewer tuples (if duplicates are eliminated) project out all attributes except the ones requested or required (e.g., joining attr.) C. Faloutsos
18
Equivalence of expressions
Joins Commutative , associative Q: n-way join - how many diff. orderings? C. Faloutsos
19
Equivalence of expressions
Joins - Q: n-way join - how many diff. orderings? A: Catalan number ~ 4^n Exhaustive enumeration: too slow. C. Faloutsos
20
Q-opt steps bring query in internal form (eg., parse tree)
… into ‘canonical form’ (syntactic q-opt) generate alt. plans estimate cost; pick best C. Faloutsos
21
Cost estimation Eg., find ssn’s of students with an ‘A’ in 415 (using seq. scanning) How long will a query take? CPU (but: small cost; decreasing; tough to estimate) Disk (mainly, # block transfers) How many tuples will qualify? (what statistics do we need to keep?) C. Faloutsos
22
Cost estimation Statistics: for each relation ‘r’ we keep
… Sr #1 #2 #3 #nr Statistics: for each relation ‘r’ we keep nr : # tuples; Sr : size of tuple in bytes C. Faloutsos
23
Cost estimation Statistics: for each relation ‘r’ we keep …
Sr Statistics: for each relation ‘r’ we keep … V(A,r): number of distinct values of attr. ‘A’ (recently, histograms, too) #1 #2 #3 … #nr C. Faloutsos
24
Derivable statistics fr: blocking factor = max# records/block (=?? )
… fr Sr #1 #2 #br fr: blocking factor = max# records/block (=?? ) br: # blocks (=?? ) SC(A,r) = selection cardinality = avg# of records with A=given (=?? ) C. Faloutsos
25
Derivable statistics fr: blocking factor = max# records/block (= B/Sr ; B: block size in bytes) br: # blocks (= nr / fr ) C. Faloutsos
26
Derivable statistics SC(A,r) = selection cardinality = avg# of records with A=given (= nr / V(A,r) ) (assumes uniformity...) – eg: 10,000 students, 10 colleges – how many students in SCS? C. Faloutsos
27
Additional quantities we need:
For index ‘i’: fi: average fanout (~50-100) HTi: # levels of index ‘i’ (~2-3) ~ log(#entries)/log(fi) LBi: # blocks at leaf level HTi C. Faloutsos
28
Statistics Where do we store them? How often do we update them?
C. Faloutsos
29
Q-opt steps bring query in internal form (eg., parse tree)
… into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections joins estimate cost; pick best C. Faloutsos
30
Cost estimation + plan generation
… fr Sr #1 #2 #br Selections – eg., select * from TAKES where grade = ‘A’ Plans? C. Faloutsos
31
Cost estimation + plan generation
… fr Sr #1 #2 #br Plans? seq. scan binary search (if sorted & consecutive) index search if an index exists C. Faloutsos
32
Cost estimation + plan generation
… fr Sr #1 #2 #br seq. scan – cost? br (worst case) br/2 (average, if we search for primary key) C. Faloutsos
33
Cost estimation + plan generation
… fr Sr #1 #2 #br binary search – cost? if sorted and consecutive: ~log(br) + SC(A,r)/fr (=blocks spanned by qual. tuples) C. Faloutsos
34
Cost estimation + plan generation
… fr Sr #1 #2 #br estimation of selection cardinalities SC(A,r): non-trivial – details later C. Faloutsos
35
Cost estimation + plan generation
… fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples ... case#1: primary key case#2: sec. key – clustering index case#3: sec. key – non-clust. index C. Faloutsos
36
Cost estimation + plan generation
… fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples .. case#1: primary key – cost: HTi + 1 HTi C. Faloutsos
37
Cost estimation + plan generation
Sr method#3: index – cost? levels of index + blocks w/ qual. tuples #1 fr #2 case#2: sec. key – clustering index HTi + SC(A,r)/fr … HTi #br C. Faloutsos
38
Cost estimation + plan generation
… fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples ... case#3: sec. key – non-clust. index HTi + SC(A,r) (actually, pessimistic...) C. Faloutsos
39
Cost estimation – arithm. examples
find accounts with branch-name = ‘Perryridge’ account(branch-name, balance, ...) C. Faloutsos
40
Arithm. examples – cont’d
n-account = 10,000 tuples f-account = 20 tuples/block V(balance, account) = 500 distinct values V(branch-name, account) = 50 distinct values for branch-index: fanout fi = 20 C. Faloutsos
41
Arithm. examples Q1: cost of seq. scan? A1: 500 disk accesses
Q2: assume a clustering index on branch-name – cost? C. Faloutsos
42
Cost estimation + plan generation
Sr method#3: index – cost? levels of index + blocks w/ qual. tuples #1 fr #2 case#2: sec. key – clustering index HTi + SC(A,r)/fr … HTi #br C. Faloutsos
43
Arithm. examples A2: HTi + SC(branch-name, account)/f-account HTi: 50 values, with index fanout 20 -> HT=2 levels (log(50)/log(20) = 1+) SC(..)= # qual records = nr/V(A,r) = 10,000/50 = 200 tuples spanning 200/20 blocks = 10 blocks C. Faloutsos
44
Arithm. examples A2 final answer: 2+10= 12 block accesses
(vs. 500 block accesses of seq. scan) footnote: in all fairness seq. disk accesses: ~2msec or less random disk accesses: ~10msec C. Faloutsos
45
Q-opt steps bring query in internal form (eg., parse tree)
… into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections joins estimate cost; pick best C. Faloutsos
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.