1
Query Optimization – part 2
Carnegie Mellon Univ., Dept. of Computer Science
Database Applications
C. Faloutsos
2
General Overview - rel. model
Relational model - SQL
Functional Dependencies & Normalization
Physical Design
Indexing
Query optimization
Transaction processing
3
Q-opt steps
bring query in internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections (simple; complex predicates)
  sorting; projections
  joins
estimate cost; pick best
4
Reminder – statistics:
for each relation ‘r’ we keep
  nr: # tuples
  Sr: size of tuple in bytes
  V(A,r): number of distinct values of attr. ‘A’
[figure: relation ‘r’ drawn as tuples #1, #2, #3, ..., #nr, each of size Sr]
5
Derivable statistics
fr: blocking factor = max # records/block (= B/Sr; B: block size in bytes)
br: # blocks (= nr / fr)
6
Derivable statistics
SC(A,r) = selection cardinality = avg # of records with A=given (= nr / V(A,r))
(assumes uniformity...)
e.g.: 10,000 students, 10 colleges: how many students in SCS?
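The derivable statistics on the last two slides can be computed directly from the stored ones. A minimal sketch in Python, assuming records do not span blocks (hence the rounding, which the slides leave implicit); the function names and the 4,096-byte block size are illustrative and not from the lecture:

```python
import math


def blocking_factor(block_size: int, tuple_size: int) -> int:
    """fr = B / Sr, rounded down (assumption: records do not span blocks)."""
    return block_size // tuple_size


def num_blocks(n_tuples: int, fr: int) -> int:
    """br = nr / fr, rounded up so a partly filled last block is counted."""
    return math.ceil(n_tuples / fr)


def selection_cardinality(n_tuples: int, n_distinct: int) -> float:
    """SC(A,r) = nr / V(A,r), assuming values are uniformly distributed."""
    return n_tuples / n_distinct


# The slide's example: 10,000 students spread over 10 colleges.
students_per_college = selection_cardinality(10_000, 10)

# Hypothetical sizes: 4,096-byte blocks and 100-byte tuples.
fr = blocking_factor(4096, 100)
br = num_blocks(10_000, fr)
```

Under the uniformity assumption the estimate for SCS is 1,000 students; a real college could of course be much larger or smaller than the average.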
7
Selections
we saw simple predicates (A=constant; e.g., ‘name=Smith’)
how about more complex predicates, like
  ‘salary > 10K’
  ‘age = 30 and job-code=“analyst” ’
what is their selectivity?
8
Selections – complex predicates
selectivity sel(P) of predicate P: == fraction of tuples that qualify
sel(P) = SC(P) / nr
9
Selections – complex predicates
e.g., assume that V(grade, TAKES)=5 distinct values
simple predicate P: A=constant
  sel(A=constant) = 1/V(A,r)
  e.g., sel(grade=‘B’) = 1/5
(what if V(A,r) is unknown??)
[figure: histogram of count per grade, A through F]
10
Selections – complex predicates
range query: sel( grade >= ‘C’)
sel(A>a) = (Amax – a) / (Amax – Amin)
11
Selections - complex predicates
negation: sel( grade != ‘C’)
sel(not P) = 1 – sel(P)
(Observation: selectivity =~ probability)
12
Selections – complex predicates
conjunction: sel( grade = ‘C’ and course = ‘415’)
sel(P1 and P2) = sel(P1) * sel(P2)
INDEPENDENCE ASSUMPTION
[figure: Venn diagram of P1 and P2]
13
Selections – complex predicates
disjunction: sel( grade = ‘C’ or course = ‘415’)
sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1 and P2)
              = sel(P1) + sel(P2) – sel(P1)*sel(P2)
INDEPENDENCE ASSUMPTION, again
14
Selections – complex predicates
disjunction: in general
sel(P1 or P2 or … Pn) = 1 - (1 - sel(P1)) * (1 - sel(P2)) * … * (1 - sel(Pn))
15
Selections – summary
sel(A=constant) = 1/V(A,r)
sel(A>a) = (Amax – a) / (Amax – Amin)
sel(not P) = 1 – sel(P)
sel(P1 and P2) = sel(P1) * sel(P2)
sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1)*sel(P2)
UNIFORMITY and INDEPENDENCE ASSUMPTIONS
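The summary rules compose mechanically, which a short sketch can make concrete. The Python below is an illustration under the same uniformity and independence assumptions; the function names are made up here, the clamping in `sel_gt` is an addition for constants outside [Amin, Amax], and V(course, TAKES) = 10 is a hypothetical value:

```python
import math


def sel_eq(n_distinct: int) -> float:
    """sel(A = constant) = 1 / V(A,r)."""
    return 1.0 / n_distinct


def sel_gt(a: float, a_min: float, a_max: float) -> float:
    """sel(A > a) = (Amax - a) / (Amax - Amin), clamped to [0, 1]."""
    fraction = (a_max - a) / (a_max - a_min)
    return min(1.0, max(0.0, fraction))


def sel_not(p: float) -> float:
    """sel(not P) = 1 - sel(P)."""
    return 1.0 - p


def sel_and(*sels: float) -> float:
    """Conjunction: product of the selectivities (independence)."""
    return math.prod(sels)


def sel_or(*sels: float) -> float:
    """Disjunction: 1 - product of (1 - sel(Pi)) (independence)."""
    return 1.0 - math.prod(1.0 - p for p in sels)


grade_is_c = sel_eq(5)        # V(grade, TAKES) = 5, from the slides
course_is_415 = sel_eq(10)    # hypothetical V(course, TAKES) = 10
both = sel_and(grade_is_c, course_is_415)
either = sel_or(grade_is_c, course_is_415)
```

For two predicates, `sel_or` agrees with sel(P1) + sel(P2) – sel(P1)*sel(P2), since 1 – (1 – p1)(1 – p2) expands to the same expression.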
16
Q-opt steps
bring query in internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections (simple; complex predicates)
  sorting; projections
  joins
estimate cost; pick best
17
Sorting
Assume br blocks of rel. ‘r’, and only M (<br) buffers in main memory
Q1: how to sort (‘external sorting’)?
Q2: cost?
[figure: M buffers in main memory (1, 2, ..., M) and the br blocks of ‘r’ on disk]
18
Sorting
Q1: how to sort (‘external sorting’)?
A1:
  create sorted runs of size M
  merge
19
Sorting
create sorted runs of size M (how many?)
merge them (how?)
20
Sorting
create sorted runs of size M
merge first M-1 runs into a sorted run of (M-1)*M, ...
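The two phases (run creation, then (M-1)-way merging) can be simulated in memory. The sketch below is only an illustration: a "block" is modelled as a Python list of records, `external_sort` is a made-up name, and the standard-library `heapq.merge` stands in for the buffer-at-a-time merge a real system would do on disk:

```python
import heapq


def external_sort(blocks, M):
    """Sort records stored in `blocks` using at most M buffers (simulated)."""
    if M < 3:
        raise ValueError("need M >= 3 so that at least 2 runs can be merged")

    # Phase 1: read M blocks at a time and sort them into one run each.
    runs = []
    for i in range(0, len(blocks), M):
        records = [rec for block in blocks[i:i + M] for rec in block]
        runs.append(sorted(records))

    # Phase 2: repeatedly merge groups of M-1 runs (one buffer per input
    # run, one buffer for output) until a single run remains.
    while len(runs) > 1:
        runs = [
            list(heapq.merge(*runs[i:i + M - 1]))
            for i in range(0, len(runs), M - 1)
        ]
    return runs[0] if runs else []


# Six blocks of two records each, sorted with M = 3 buffers.
blocks = [[5, 2], [9, 1], [7, 3], [8, 4], [6, 0], [11, 10]]
result = external_sort(blocks, M=3)
```

With M = 3 the six blocks form two runs of three blocks each, and a single 2-way merge pass finishes the sort.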
21
Sorting
How many steps do we need?
  ‘i’, where M*(M-1)^i > br
How many reads/writes per step?
  br + br
22
Sorting
In short, excluding the final ‘write’, we need
ceil(log(br/M) / log(M-1)) * 2 * br + br
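To get a feel for the formula, it can be evaluated for concrete sizes. This sketch counts merge passes with integer arithmetic rather than floating-point logarithms, which gives the same pass count as ceil(log(br/M) / log(M-1)) while avoiding rounding trouble at exact powers; the function names and the example values br = 1,000 and M = 11 are assumptions for illustration:

```python
import math


def merge_passes(br: int, M: int) -> int:
    """Number of (M-1)-way merge passes after the initial run creation."""
    if M < 3:
        raise ValueError("need M >= 3 so that at least 2 runs can be merged")
    runs = math.ceil(br / M)          # sorted runs of M blocks each
    passes = 0
    while runs > 1:
        runs = math.ceil(runs / (M - 1))
        passes += 1
    return passes


def external_sort_cost(br: int, M: int) -> int:
    """Block transfers, excluding the final write: passes * 2 * br + br."""
    return merge_passes(br, M) * 2 * br + br


passes = merge_passes(1000, 11)       # 91 runs -> 10 runs -> 1 run
cost = external_sort_cost(1000, 11)
```

The trailing `+ br` is what remains of run creation (read and write, 2 * br) once the final write of the last merge pass (br) is left out.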
23
Q-opt steps
bring query in internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections (simple; complex predicates)
  sorting; projections, aggregations
  joins
estimate cost; pick best
24
Projection - dup. elimination
e.g.,
  select distinct c-id
  from TAKES
How? Pros and cons?
25
Set operations
e.g.,
  select * from REGULAR-STUDENT
  union
  select * from SPECIAL-STUDENT
How? Pros and cons?
26
Aggregations
e.g.,
  select ssn, avg(grade)
  from TAKES
  group by ssn
How?
27
Q-opt steps
bring query in internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections; sorting; projections, aggregations
  joins
    2-way joins
    n-way joins
estimate cost; pick best
28
2-way joins
output size estimation: r JOIN s
r and s have nr and ns tuples, respectively
case#1: cartesian product (r, s have no common attribute)
#of output tuples=??
29
2-way joins
output size estimation: r JOIN s
case#2: r(A,B), s(A,C,D), A is cand. key for ‘r’
#of output tuples=??
<= ns
[figure: r(A, ...) with nr tuples joined to s(A, ...) with ns tuples]
30
2-way joins
output size estimation: r JOIN s
case#3: r(A,B), s(A,C,D), A is cand. key for neither (is it possible??)
#of output tuples=??
31
2-way joins
#of output tuples ~ nr * ns/V(A,s) or ns * nr/V(A,r)
(whichever is less)
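The three cases can be collected into a small estimator. This is a sketch with made-up function names; the cartesian-product size nr * ns answers the question left open on the case#1 slide, and the V(A,r), V(A,s) values in the example are hypothetical:

```python
def cartesian_size(nr: int, ns: int) -> int:
    """Case 1: no common attribute, every pair of tuples qualifies."""
    return nr * ns


def key_join_upper_bound(ns: int) -> int:
    """Case 2: A is a candidate key of r, so each s-tuple matches at most
    one r-tuple and the output has at most ns tuples."""
    return ns


def join_size_estimate(nr: int, ns: int, v_a_r: int, v_a_s: int) -> float:
    """Case 3: A is a key of neither relation; take the smaller estimate."""
    return min(nr * ns / v_a_s, ns * nr / v_a_r)


# Hypothetical statistics: nr = 10,000, ns = 1,000, V(A,r) = 100, V(A,s) = 50.
estimate = join_size_estimate(10_000, 1_000, v_a_r=100, v_a_s=50)
```

Taking the smaller of the two estimates amounts to dividing nr * ns by the larger of the two distinct-value counts.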
32
Q-opt steps
bring query in internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections; sorting; projections, aggregations
  joins
    2-way joins - output size estimation; algorithms
    n-way joins
estimate cost; pick best
33
2-way joins
algorithm(s) for r JOIN s?
r and s have nr and ns tuples, respectively
34
2-way joins
Algorithm #0: (naive) nested loop (SLOW!)
  for each tuple tr of r
    for each tuple ts of s
      print, if they match
35
2-way joins
Algorithm #0: why is it bad?
how many disk accesses (‘br’ and ‘bs’ are the number of blocks for ‘r’ and ‘s’)?
br + nr*bs
36
2-way joins
Algorithm #1: Blocked nested-loop join
  read in a block of r
    read in a block of s
      print matching tuples
cost: br + br * bs
[figure: r(A, ...) with nr records in br blocks; s(A, ...) with ns records in bs blocks]
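The blocked nested-loop join can be simulated to check where the cost formula comes from. In this sketch each relation is modelled as a list of blocks and each block as a list of tuples; the function name, the key-position parameters, and the sample data are illustrative assumptions, and "reading" a block simply increments a counter:

```python
def block_nested_loop_join(r_blocks, s_blocks, key_r=0, key_s=0):
    """Join r and s on equal keys; return (output tuples, block reads)."""
    reads = 0
    output = []
    for r_block in r_blocks:            # read in a block of r
        reads += 1
        for s_block in s_blocks:        # read in a block of s
            reads += 1
            for tr in r_block:          # compare tuples within the two blocks
                for ts in s_block:
                    if tr[key_r] == ts[key_s]:
                        output.append(tr + ts)
    return output, reads


# r has 2 blocks, s has 3 blocks.
r_blocks = [[(1, "a"), (2, "b")], [(3, "c")]]
s_blocks = [[(1, "x")], [(3, "y")], [(3, "z")]]
joined, reads = block_nested_loop_join(r_blocks, s_blocks)
```

The counter ends at br + br * bs = 2 + 2 * 3 = 8 reads: each block of r is read once, and all of s is scanned once per block of r.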
37
2-way joins
Arithmetic example:
nr = 10,000 tuples, br = 1,000 blocks
ns = 1,000 tuples, bs = 200 blocks
alg#0: 2,001,000 disk accesses (d.a.)
alg#1: 201,000 d.a.
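The two figures follow directly from the cost formulas for Algorithm #0 and Algorithm #1. A small sketch that reproduces the arithmetic (the function names are made up for illustration):

```python
def naive_nested_loop_cost(br: int, nr: int, bs: int) -> int:
    """Algorithm #0: read r once, and scan all of s for every tuple of r."""
    return br + nr * bs


def block_nested_loop_cost(br: int, bs: int) -> int:
    """Algorithm #1: read r once, and scan all of s for every block of r."""
    return br + br * bs


# The slide's numbers: nr = 10,000, br = 1,000, ns = 1,000, bs = 200.
alg0 = naive_nested_loop_cost(br=1_000, nr=10_000, bs=200)
alg1 = block_nested_loop_cost(br=1_000, bs=200)
```

The roughly tenfold gap is the blocking factor of r: nr / br = 10 tuples per block, so Algorithm #1 scans s ten times less often.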
38
2-way joins
Observation1: Algo#1 is asymmetric:
  cost: br + br * bs
  reverse roles: cost = bs + bs*br
Best choice? smallest relation in outer loop
39
2-way joins
Observation2 [NOT IN BOOK]:
what if we have ‘k’ buffers available?
  read in ‘k-1’ blocks of ‘r’
    read in a block of ‘s’
      print matching tuples
40
2-way joins
  read in ‘k-1’ blocks of ‘r’
    read in a block of ‘s’
      print matching tuples
Cost? br + br/(k-1) * bs
what if br=k-1?
(what if we assign k-1 blocks to inner?)
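The effect of extra buffers, and of which relation goes in the outer loop, can be explored numerically. This sketch rounds br/(k-1) up to whole chunks, which is an assumption the slide leaves implicit; the function name is made up, and the block counts are borrowed from the earlier arithmetic example:

```python
import math


def buffered_nested_loop_cost(b_outer: int, b_inner: int, k: int) -> int:
    """Cost with k buffers: k-1 hold outer blocks, 1 holds an inner block."""
    if k < 2:
        raise ValueError("need at least 2 buffers")
    chunks = math.ceil(b_outer / (k - 1))   # number of scans of the inner
    return b_outer + chunks * b_inner


br, bs = 1_000, 200

two_buffers = buffered_nested_loop_cost(br, bs, k=2)       # same as Algorithm #1
swapped = buffered_nested_loop_cost(bs, br, k=2)           # s in the outer loop
r_fits = buffered_nested_loop_cost(br, bs, k=br + 1)       # br = k-1
```

When the outer relation fits in the k-1 buffers, the inner relation is scanned only once and the cost drops to br + bs.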
41
2-way joins
Observation3: can we get rid of the ‘br’ term?
cost: br + br * bs
A: read the inner relation backwards half of the time!
Q: cons?
42
2-way joins
Other algorithm(s) for r JOIN s?
r and s have nr and ns tuples, respectively
43
2-way joins - other algo’s
sort-merge
  sort ‘r’; sort ‘s’; merge sorted versions
  (good, if one or both are already sorted)
44
2-way joins - other algo’s
sort-merge - cost:
  ~ 2*br*log(br) + 2*bs*log(bs) + br + bs
  needs temporary space (for sorted versions)
  gives output in sorted order
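The merge step is the part that is easy to get wrong when join keys repeat in both inputs. Below is an in-memory sketch of sort-merge join, with the function name, key positions, and sample tuples chosen purely for illustration; a real system would sort externally and merge block by block:

```python
def sort_merge_join(r, s, key_r=0, key_s=0):
    """Equi-join r and s by sorting both on the join key and merging."""
    r_sorted = sorted(r, key=lambda t: t[key_r])
    s_sorted = sorted(s, key=lambda t: t[key_s])
    output = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        a, b = r_sorted[i][key_r], s_sorted[j][key_s]
        if a < b:
            i += 1
        elif a > b:
            j += 1
        else:
            # Find the full group of equal keys on each side ...
            i_end = i
            while i_end < len(r_sorted) and r_sorted[i_end][key_r] == a:
                i_end += 1
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][key_s] == a:
                j_end += 1
            # ... and emit every pairing within the two groups.
            for tr in r_sorted[i:i_end]:
                for ts in s_sorted[j:j_end]:
                    output.append(tr + ts)
            i, j = i_end, j_end
    return output


r = [(3, "c"), (1, "a"), (3, "d")]
s = [(3, "y"), (2, "q"), (1, "x"), (3, "z")]
joined = sort_merge_join(r, s)
```

The output comes out ordered by the join key, which matches the "gives output in sorted order" point on the slide.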
45
2-way joins - other algo’s
use an existing index, or even build one on the fly
cost: br + nr * c (c: look-up cost)
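As a rough illustration of the "build one on the fly" variant, the sketch below uses a Python dictionary as a stand-in for an index on s.A and probes it once per tuple of r. The names are made up, and the look-up cost c = 3 in the example is a hypothetical value:

```python
from collections import defaultdict


def index_join(r, s, key_r=0, key_s=0):
    """Join by probing an index on s (built on the fly) for each r-tuple."""
    index = defaultdict(list)
    for ts in s:
        index[ts[key_s]].append(ts)
    return [tr + ts for tr in r for ts in index.get(tr[key_r], [])]


def index_join_cost(br: int, nr: int, c: int) -> int:
    """Slide formula: scan r once, plus one index look-up per tuple of r."""
    return br + nr * c


r = [(1, "a"), (3, "c")]
s = [(3, "y"), (1, "x"), (3, "z")]
joined = index_join(r, s)
cost = index_join_cost(br=1_000, nr=10_000, c=3)
```

The cost formula does not include building the index; if it is built on the fly, that work has to be added separately.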
46
2-way joins - other algo’s
hash join:
  hash ‘r’ into (0, 1, ..., ‘max’) buckets
  hash ‘s’ into buckets (same hash function)
  join each pair of matching buckets
47
2-way joins - hash join details
how to join each pair of partitions Hr-i, Hs-i?
A: build another hash table for Hs-i, and probe it with each tuple of Hr-i
48
2-way joins - hash join details
what if Hs-i is too large to fit in main-memory?
A: recursive partitioning
more details (overflows, hybrid hash joins): in book
cost of hash join? (under certain assumptions)
3(br + bs) + 2*max
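The partition-then-probe structure from the last three slides can be sketched in memory. This is an illustration only: the function names and sample data are made up, Python's built-in `hash` stands in for the partitioning hash function, a dictionary plays the role of the second hash table built on each Hs-i, and recursive partitioning and overflow handling are left out:

```python
from collections import defaultdict


def hash_join(r, s, n_buckets=4, key_r=0, key_s=0):
    """Partition r and s with the same hash function, then join bucket pairs."""
    r_parts = [[] for _ in range(n_buckets)]
    s_parts = [[] for _ in range(n_buckets)]
    for tr in r:
        r_parts[hash(tr[key_r]) % n_buckets].append(tr)
    for ts in s:
        s_parts[hash(ts[key_s]) % n_buckets].append(ts)

    output = []
    for hr_i, hs_i in zip(r_parts, s_parts):
        # Build an in-memory hash table on Hs-i ...
        table = defaultdict(list)
        for ts in hs_i:
            table[ts[key_s]].append(ts)
        # ... and probe it with each tuple of Hr-i.
        for tr in hr_i:
            for ts in table.get(tr[key_r], []):
                output.append(tr + ts)
    return output


def hash_join_cost(br: int, bs: int, max_bucket: int) -> int:
    """Slide formula (under its stated assumptions): 3(br + bs) + 2*max."""
    return 3 * (br + bs) + 2 * max_bucket


r = [(3, "c"), (1, "a"), (3, "d")]
s = [(3, "y"), (2, "q"), (1, "x"), (3, "z")]
joined = sorted(hash_join(r, s))
```

Tuples with equal keys always land in the same bucket number on both sides, so only matching bucket pairs need to be joined; unlike sort-merge, the output order depends on the hash function, which is why the example sorts the result before inspecting it.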
49
Q-opt steps
bring query in internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections; sorting; projections, aggregations
  joins
    2-way joins - output size estimation; algorithms
    n-way joins
estimate cost; pick best
50
n-way joins
r1 JOIN r2 JOIN ... JOIN rn
typically, break problem into 2-way joins
51
Structure of query optimizers:
System R:
  break query in query blocks
  simple queries (i.e., no joins): look at stats
  n-way joins: left-deep join trees; i.e., only one intermediate result at a time
    pros: smaller search space; pipelining
    cons: may miss optimal
  2-way joins: NL and sort-merge
[figure: left-deep join tree over r1, r2, r3, r4]
52
Structure of query optimizers:
More heuristics by Oracle, Sybase and Starburst (-> DB2): in book
In general: q-opt is very important for large databases.
(‘explain select <sql-statement>’ gives plan)
53
Q-opt steps
bring query in internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections (simple; complex predicates)
  sorting; projections
  joins
estimate cost; pick best