C. Faloutsos Query Optimization – part 1

C. Faloutsos Query Optimization – part 1
Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Query Optimization – part 1

General Overview - rel. model
Relational model - SQL Functional Dependencies & Normalization Physical Design Indexing Query optimization Transaction processing C. Faloutsos

Overview of a DBMS casual Naïve user user DBA DML parser DML
precomp. DDL parser trans. mgr buffer mgr catalog Data-files C. Faloutsos

Overview - detailed Why q-opt? Equivalence of expressions
C. Faloutsos Overview - detailed Why q-opt? Equivalence of expressions Cost estimation Cost of indices Join strategies C. Faloutsos CMU

Why Q-opt? SQL: ~declarative good q-opt -> big difference
eg., seq. Scan vs B-tree index, on P=1,000 pages C. Faloutsos

Q-opt steps bring query in internal form (eg., parse tree)
… into ‘canonical form’ (syntactic q-opt) generate alt. plans estimate cost; pick best C. Faloutsos

Q-opt - example p s TAKES STUDENT select name from STUDENT, TAKES
where c-id=‘415’ and STUDENT.ssn=TAKES.ssn C. Faloutsos

Q-opt - example p STUDENT TAKES s p Canonical form s STUDENT TAKES
C. Faloutsos

Hash join; merge join; nested loops;
Q-opt - example STUDENT TAKES s p Hash join; merge join; nested loops; Index; seq scan C. Faloutsos

Overview - detailed Why q-opt? Equivalence of expressions
C. Faloutsos Overview - detailed Why q-opt? Equivalence of expressions Cost estimation Cost of indices Join strategies C. Faloutsos CMU

Equivalence of expressions
A.k.a.: syntactic q-opt in short: perform selections and projections early More details: see transf. rules in text C. Faloutsos

Q: How to prove a transf. rule? A: use RTC, to show that LHS = RHS, eg: C. Faloutsos

C. Faloutsos

Q: how to disprove a rule?? C. Faloutsos

Selections perform them early break a complex predicate, and push simplify a complex predicate (‘X=Y and Y=3’) -> ‘X=3 and Y=3’ C. Faloutsos

Projections perform them early (but carefully…) Smaller tuples Fewer tuples (if duplicates are eliminated) project out all attributes except the ones requested or required (e.g., joining attr.) C. Faloutsos

Joins Commutative , associative Q: n-way join - how many diff. orderings? C. Faloutsos

Joins - Q: n-way join - how many diff. orderings? A: Catalan number ~ 4^n Exhaustive enumeration: too slow. C. Faloutsos

… into ‘canonical form’ (syntactic q-opt) generate alt. plans estimate cost; pick best C. Faloutsos

Cost estimation Eg., find ssn’s of students with an ‘A’ in 415 (using seq. scanning) How long will a query take? CPU (but: small cost; decreasing; tough to estimate) Disk (mainly, # block transfers) How many tuples will qualify? (what statistics do we need to keep?) C. Faloutsos

Cost estimation Statistics: for each relation ‘r’ we keep
… Sr #1 #2 #3 #nr Statistics: for each relation ‘r’ we keep nr : # tuples; Sr : size of tuple in bytes C. Faloutsos

Cost estimation Statistics: for each relation ‘r’ we keep …
Sr Statistics: for each relation ‘r’ we keep … V(A,r): number of distinct values of attr. ‘A’ (recently, histograms, too) #1 #2 #3 … #nr C. Faloutsos

Derivable statistics fr: blocking factor = max# records/block (=?? )
… fr Sr #1 #2 #br fr: blocking factor = max# records/block (=?? ) br: # blocks (=?? ) SC(A,r) = selection cardinality = avg# of records with A=given (=?? ) C. Faloutsos

Derivable statistics fr: blocking factor = max# records/block (= B/Sr ; B: block size in bytes) br: # blocks (= nr / fr ) C. Faloutsos

Derivable statistics SC(A,r) = selection cardinality = avg# of records with A=given (= nr / V(A,r) ) (assumes uniformity...) – eg: 10,000 students, 10 colleges – how many students in SCS? C. Faloutsos

Additional quantities we need:
For index ‘i’: fi: average fanout (~50-100) HTi: # levels of index ‘i’ (~2-3) ~ log(#entries)/log(fi) LBi: # blocks at leaf level HTi C. Faloutsos

Statistics Where do we store them? How often do we update them?
C. Faloutsos

… into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections joins estimate cost; pick best C. Faloutsos

Cost estimation + plan generation
… fr Sr #1 #2 #br Selections – eg., select * from TAKES where grade = ‘A’ Plans? C. Faloutsos

… fr Sr #1 #2 #br Plans? seq. scan binary search (if sorted & consecutive) index search if an index exists C. Faloutsos

… fr Sr #1 #2 #br seq. scan – cost? br (worst case) br/2 (average, if we search for primary key) C. Faloutsos

… fr Sr #1 #2 #br binary search – cost? if sorted and consecutive: ~log(br) + SC(A,r)/fr (=blocks spanned by qual. tuples) C. Faloutsos

… fr Sr #1 #2 #br estimation of selection cardinalities SC(A,r): non-trivial – details later C. Faloutsos

… fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples ... case#1: primary key case#2: sec. key – clustering index case#3: sec. key – non-clust. index C. Faloutsos

… fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples .. case#1: primary key – cost: HTi + 1 HTi C. Faloutsos

Sr method#3: index – cost? levels of index + blocks w/ qual. tuples #1 fr #2 case#2: sec. key – clustering index HTi + SC(A,r)/fr … HTi #br C. Faloutsos

… fr Sr #1 #2 #br method#3: index – cost? levels of index + blocks w/ qual. tuples ... case#3: sec. key – non-clust. index HTi + SC(A,r) (actually, pessimistic...) C. Faloutsos

Cost estimation – arithm. examples
find accounts with branch-name = ‘Perryridge’ account(branch-name, balance, ...) C. Faloutsos

Arithm. examples – cont’d
n-account = 10,000 tuples f-account = 20 tuples/block V(balance, account) = 500 distinct values V(branch-name, account) = 50 distinct values for branch-index: fanout fi = 20 C. Faloutsos

Arithm. examples Q1: cost of seq. scan? A1: 500 disk accesses
Q2: assume a clustering index on branch-name – cost? C. Faloutsos

Sr method#3: index – cost? levels of index + blocks w/ qual. tuples #1 fr #2 case#2: sec. key – clustering index HTi + SC(A,r)/fr … HTi #br C. Faloutsos

Arithm. examples A2: HTi + SC(branch-name, account)/f-account HTi: 50 values, with index fanout 20 -> HT=2 levels (log(50)/log(20) = 1+) SC(..)= # qual records = nr/V(A,r) = 10,000/50 = 200 tuples spanning 200/20 blocks = 10 blocks C. Faloutsos

Arithm. examples A2 final answer: 2+10= 12 block accesses
(vs. 500 block accesses of seq. scan) footnote: in all fairness seq. disk accesses: ~2msec or less random disk accesses: ~10msec C. Faloutsos

… into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections joins estimate cost; pick best C. Faloutsos

C. Faloutsos Query Optimization – part 1

Similar presentations

Presentation on theme: "C. Faloutsos Query Optimization – part 1"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

C. Faloutsos Query Optimization – part 1

Similar presentations

Presentation on theme: "C. Faloutsos Query Optimization – part 1"— Presentation transcript:

Similar presentations

About project

Feedback