C. Faloutsos Query Optimization – part 2

C. Faloutsos Query Optimization – part 2
Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Query Optimization – part 2

General Overview - rel. model
Relational model - SQL Functional Dependencies & Normalization Physical Design Indexing Query optimization Transaction processing C. Faloutsos

Q-opt steps bring query in internal form (eg., parse tree)
… into ‘canonical form’ (syntactic q-opt) generate alt. plans selections (simple; complex predicates) sorting; projections joins estimate cost; pick best C. Faloutsos

Reminder – statistics:
… Sr #1 #2 #3 #nr for each relation ‘r’ we keep nr : # tuples; Sr : size of tuple in bytes V(A,r): number of distinct values of attr. ‘A’ C. Faloutsos

Derivable statistics fr: blocking factor = max# records/block (= B/Sr ; B: block size in bytes) br: # blocks (= nr / fr ) C. Faloutsos

Derivable statistics SC(A,r) = selection cardinality = avg# of records with A=given (= nr / V(A,r) ) (assumes uniformity...) – eg: 10,000 students, 10 colleges – how many students in SCS? C. Faloutsos

Selections we saw simple predicates (A=constant; eg., ‘name=Smith’)
how about more complex predicates, like ‘salary > 10K’ ‘age = 30 and job-code=“analyst” ’ what is their selectivity? C. Faloutsos

Selections – complex predicates
selectivity sel(P) of predicate P : == fraction of tuples that qualify sel(P) = SC(P) / nr C. Faloutsos

eg., assume that V(grade, TAKES)=5 distinct values simple predicate P: A=constant sel(A=constant) = 1/V(A,r) eg., sel(grade=‘B’) = 1/5 (what if V(A,r) is unknown??) grade count A F C. Faloutsos

range query: sel( grade >= ‘C’) sel(A>a) = (Amax – a) / (Amax – Amin) grade count A F C. Faloutsos

Selections - complex predicates
negation: sel( grade != ‘C’) sel( not P) = 1 – sel(P) (Observation: selectivity =~ probability) grade count A F ‘P’ C. Faloutsos

conjunction: sel( grade = ‘C’ and course = ‘415’) sel(P1 and P2) = sel(P1) * sel(P2) INDEPENDENCE ASSUMPTION P1 P2 C. Faloutsos

disjunction: sel( grade = ‘C’ or course = ‘415’) sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1 and P2) = sel(P1) + sel(P2) – sel(P1)*sel(P2) INDEPENDENCE ASSUMPTION, again P1 P2 C. Faloutsos

disjunction: in general sel(P1 or P2 or … Pn) = 1 - (1- sel(P1) ) * (1 - sel(P2) ) * … (1 - sel(Pn)) P1 P2 C. Faloutsos

Selections – summary sel(A=constant) = 1/V(A,r)
sel( A>a) = (Amax – a) / (Amax – Amin) sel(not P) = 1 – sel(P) sel(P1 and P2) = sel(P1) * sel(P2) sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1)*sel(P2) UNIFORMITY and INDEPENDENCE ASSUMPTIONS C. Faloutsos

C. Faloutsos Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections (simple; complex predicates) sorting; projections joins estimate cost; pick best C. Faloutsos CMU

Sorting Assume br blocks of rel. ‘r’, and
C. Faloutsos Sorting Assume br blocks of rel. ‘r’, and only M (<br) buffers in main memory Q1: how to sort (‘external sorting’)? Q2: cost? ... 1 2 M br ‘r’ C. Faloutsos CMU

Sorting Q1: how to sort (‘external sorting’)? A1:
C. Faloutsos Sorting Q1: how to sort (‘external sorting’)? A1: create sorted runs of size M merge ... 1 2 M br ‘r’ C. Faloutsos CMU

Sorting create sorted runs of size M (how many?) merge them (how?) ...
C. Faloutsos Sorting create sorted runs of size M (how many?) merge them (how?) M ... ... C. Faloutsos CMU

Sorting create sorted runs of size M
C. Faloutsos Sorting create sorted runs of size M merge first M-1 runs into a sorted run of (M-1) *M, ... M ….. ... ... C. Faloutsos CMU

Sorting How many steps we need to do? ‘i’, where M*(M-1)^i > br
C. Faloutsos Sorting How many steps we need to do? ‘i’, where M*(M-1)^i > br How many reads/writes per step? br+br M ….. ... ... C. Faloutsos CMU

Sorting In short, excluding the final ‘write’, we need
C. Faloutsos Sorting In short, excluding the final ‘write’, we need ceil(log(br/M) / log(M-1)) * 2 * br + br M ….. ... ... C. Faloutsos CMU

C. Faloutsos Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections (simple; complex predicates) sorting; projections, aggregations joins estimate cost; pick best C. Faloutsos CMU

Projection - dup. elimination
C. Faloutsos Projection - dup. elimination eg., select distinct c-id from TAKES How? Pros and cons? C. Faloutsos CMU

Set operations eg., How? Pros and cons? select * from REGULAR-STUDENT
C. Faloutsos Set operations eg., select * from REGULAR-STUDENT union select * from SPECIAL-STUDENT How? Pros and cons? C. Faloutsos CMU

Aggregations eg., How? select ssn, avg(grade) from TAKES group by ssn
C. Faloutsos Aggregations eg., select ssn, avg(grade) from TAKES group by ssn How? C. Faloutsos CMU

C. Faloutsos Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections, aggregations joins 2-way joins n-way joins estimate cost; pick best C. Faloutsos CMU

2-way joins output size estimation: r JOIN s nr, ns tuples each
case#1: cartesian product (R, S have no common attribute) #of output tuples=?? C. Faloutsos

2-way joins output size estimation: r JOIN s
case#2: r(A,B), s(A,C,D), A is cand. key for ‘r’ #of output tuples=?? <=ns r(A, ...) s(A, ) nr ns C. Faloutsos

2-way joins output size estimation: r JOIN s
case#3: r(A,B), s(A,C,D), A is cand. key for neither (is it possible??) #of output tuples=?? r(A, ...) s(A, ) nr ns C. Faloutsos

2-way joins #of output tuples~ nr * ns/V(A,s) or ns * nr/V(A,r)
(whichever is less) r(A, ...) s(A, ) nr ns C. Faloutsos

C. Faloutsos Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections, aggregations joins 2-way joins - output size estimation; algorithms n-way joins estimate cost; pick best C. Faloutsos CMU

2-way joins algorithm(s) for r JOIN s? nr, ns tuples each r(A, ...)
C. Faloutsos

2-way joins Algorithm #0: (naive) nested loop (SLOW!)
for each tuple tr of r for each tuple ts of s print, if they match r(A, ...) s(A, ) nr ns C. Faloutsos

2-way joins Algorithm #0: why is it bad?
how many disk accesses (‘br’ and ‘bs’ are the number of blocks for ‘r’ and ‘s’)? br + nr*bs r(A, ...) s(A, ) nr ns C. Faloutsos

2-way joins Algorithm #1: Blocked nested-loop join
read in a block of r read in a block of s print matching tuples cost: br + br * bs r(A, ...) s(A, ) nr, br ns records, bs blocks C. Faloutsos

2-way joins Arithmetic example: nr = 10,000 tuples, br = 1,000 blocks
ns = 1,000 tuples, bs = 200 blocks alg#0: 2,001,000 d.a. alg#1: 201,000 d.a. r(A, ...) s(A, ) 10,000 1,000 1,000 records, 200 blocks C. Faloutsos

smallest relation in outer loop
2-way joins Observation1: Algo#1: asymmetric: cost: br + br * bs - reverse roles: cost= bs + bs*br Best choice? smallest relation in outer loop r(A, ...) s(A, ) nr, br ns records, bs blocks C. Faloutsos

2-way joins Observation2 [NOT IN BOOK]:
what if we have ‘k’ buffers available? read in ‘k-1’ blocks of ‘r’ read in a block of ‘s’ print matching tuples r(A, ...) s(A, ) nr, br ns records, bs blocks C. Faloutsos

2-way joins Cost? what if br=k-1?
read in ‘k-1’ blocks of ‘r’ read in a block of ‘s’ print matching tuples Cost? br + br/(k-1) * bs what if br=k-1? what if we assign k-1 blocks to inner?) r(A, ...) s(A, ) nr, br ns records, bs blocks C. Faloutsos

2-way joins Observation3: can we get rid of the ‘br’ term?
cost: br + br * bs A: read the inner relation backwards half of the times! Q: cons? r(A, ...) s(A, ) nr, br ns records, bs blocks C. Faloutsos

2-way joins Other algorithm(s) for r JOIN s? nr, ns tuples each
s(A, ) nr ns C. Faloutsos

2-way joins - other algo’s
sort-merge sort ‘r’; sort ‘s’; merge sorted versions (good, if one or both are already sorted) r(A, ...) s(A, ) nr ns C. Faloutsos

sort-merge - cost: ~ 2* br * log(br) + 2* bs * log(bs) + br + bs needs temporary space (for sorted versions) gives output in sorted order r(A, ...) s(A, ) nr ns C. Faloutsos

use an existing index, or even build one on the fly cost: br + nr * c (c: look-up cost) r(A, ...) s(A, ) ns nr C. Faloutsos

hash join: hash ‘r’ into (0, 1, ..., ‘max’) buckets hash ‘s’ into buckets (same hash function) join each pair of matching buckets r(A, ...) s(A, ) 1 max C. Faloutsos

2-way joins - hash join details
how to join each pair of partitions Hr-i, Hs-i ? A: build another hash table for Hs-i, and probe it with each tuple of Hr-i r(A, ...) Hr-0 Hs-0 s(A, ) 1 max C. Faloutsos

2-way joins - hash join details
what if Hs-i is too large to fit in main-memory? A: recursive partitioning more details (overflows, hybrid hash joins): in book cost of hash join? (under certain assumptions) 3(br + bs) + 2* max C. Faloutsos

C. Faloutsos Q-opt steps bring query in internal form (eg., parse tree) … into ‘canonical form’ (syntactic q-opt) generate alt. plans selections; sorting; projections, aggregations joins 2-way joins - output size estimation; algorithms n-way joins estimate cost; pick best C. Faloutsos CMU

n-way joins r1 JOIN r2 JOIN ... JOIN rn
typically, break problem into 2-way joins C. Faloutsos

Structure of query optimizers:
System R: break query in query blocks simple queries (ie., no joins): look at stats n-way joins: left-deep join trees; ie., only one intermediate result at a time pros: smaller search space; pipelining cons: may miss optimal 2-way joins: NL and sort-merge r1 r2 r3 r4 C. Faloutsos

Structure of query optimizers:
More heuristics by Oracle, Sybase and Starburst (-> DB2) : in book In general: q-opt is very important for large databases. (‘explain select <sql-statement>’ gives plan) C. Faloutsos

… into ‘canonical form’ (syntactic q-opt) generate alt. plans selections (simple; complex predicates) sorting; projections joins estimate cost; pick best C. Faloutsos

C. Faloutsos Query Optimization – part 2

Similar presentations

Presentation on theme: "C. Faloutsos Query Optimization – part 2"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

C. Faloutsos Query Optimization – part 2

Similar presentations

Presentation on theme: "C. Faloutsos Query Optimization – part 2"— Presentation transcript:

Similar presentations

About project

Feedback