C. Faloutsos Query Optimization – part 2


1 C. Faloutsos Query Optimization – part 2
Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Query Optimization – part 2

2 General Overview - rel. model
Relational model - SQL
Functional Dependencies & Normalization
Physical Design
Indexing
Query optimization
Transaction processing

3 Q-opt steps
bring query into internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections (simple; complex predicates)
  sorting; projections
  joins
estimate cost; pick best

4 Reminder – statistics:
for each relation ‘r’ we keep
  nr : # tuples
  Sr : size of tuple in bytes
  V(A,r): number of distinct values of attr. ‘A’
[figure: relation ‘r’ drawn as tuples #1, #2, #3, ..., #nr, each of size Sr]

5 Derivable statistics
fr: blocking factor = max # records/block (= B/Sr, rounded down; B: block size in bytes)
br: # blocks (= nr / fr, rounded up)

6 Derivable statistics
SC(A,r) = selection cardinality = avg # of records with A = given value (= nr / V(A,r)) (assumes uniformity...)
e.g., 10,000 students, 10 colleges: how many students in SCS? (10,000 / 10 = 1,000, under uniformity)
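The derived statistics on the last two slides can be sketched in a few lines. This is only an illustration: the function names are made up here, and the block and tuple sizes (4096 and 100 bytes) are hypothetical values, not taken from the slides. Only the 10,000 students / 10 colleges example comes from the deck.

```python
def blocking_factor(block_size: int, tuple_size: int) -> int:
    """fr: how many whole records fit in one block (B / Sr, rounded down)."""
    return block_size // tuple_size

def num_blocks(n_tuples: int, fr: int) -> int:
    """br: blocks needed to hold n_tuples records (nr / fr, rounded up)."""
    return (n_tuples + fr - 1) // fr

def selection_cardinality(n_tuples: int, distinct_values: int) -> float:
    """SC(A,r) = nr / V(A,r); assumes the values of A are uniformly distributed."""
    return n_tuples / distinct_values

# Slide example: 10,000 students spread uniformly over 10 colleges.
students_in_scs = selection_cardinality(10_000, 10)

# Hypothetical sizes (not from the slides): 4096-byte blocks, 100-byte tuples.
fr = blocking_factor(4096, 100)
br = num_blocks(10_000, fr)
print(students_in_scs, fr, br)
```

With these assumed sizes, 40 records fit per block and the relation occupies 250 blocks.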

7 Selections
we saw simple predicates (A=constant; e.g., ‘name=Smith’)
how about more complex predicates, like
  ‘salary > 10K’
  ‘age = 30 and job-code=“analyst”’
what is their selectivity?

8 Selections – complex predicates
selectivity sel(P) of predicate P: the fraction of tuples that qualify
sel(P) = SC(P) / nr

9 Selections – complex predicates
e.g., assume that V(grade, TAKES) = 5 distinct values
simple predicate P: A=constant
  sel(A=constant) = 1/V(A,r)
  e.g., sel(grade=‘B’) = 1/5
(what if V(A,r) is unknown??)
[figure: histogram of count per grade, A through F]

10 Selections – complex predicates
range query: sel(grade >= ‘C’)
  sel(A>a) = (Amax – a) / (Amax – Amin)
[figure: histogram of count per grade, A through F]

11 Selections - complex predicates
negation: sel(grade != ‘C’)
  sel(not P) = 1 – sel(P)
(Observation: selectivity =~ probability)
[figure: histogram of count per grade, A through F, with ‘P’ marked]

12 Selections – complex predicates
conjunction: sel(grade = ‘C’ and course = ‘415’)
  sel(P1 and P2) = sel(P1) * sel(P2)
INDEPENDENCE ASSUMPTION
[figure: diagram of P1 and P2]

13 Selections – complex predicates
disjunction: sel(grade = ‘C’ or course = ‘415’)
  sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1 and P2)
                = sel(P1) + sel(P2) – sel(P1)*sel(P2)
INDEPENDENCE ASSUMPTION, again
[figure: diagram of P1 and P2]

14 Selections – complex predicates
disjunction: in general
  sel(P1 or P2 or … Pn) = 1 - (1 - sel(P1)) * (1 - sel(P2)) * … * (1 - sel(Pn))
[figure: diagram of P1 and P2]

15 Selections – summary
sel(A=constant) = 1/V(A,r)
sel(A>a) = (Amax – a) / (Amax – Amin)
sel(not P) = 1 – sel(P)
sel(P1 and P2) = sel(P1) * sel(P2)
sel(P1 or P2) = sel(P1) + sel(P2) – sel(P1)*sel(P2)
UNIFORMITY and INDEPENDENCE ASSUMPTIONS
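The summary formulas translate almost directly into code. The sketch below is illustrative only: the function names are invented here, V(course, TAKES) = 10 and the salary range 0..100 are hypothetical numbers, and every estimate rests on the same uniformity and independence assumptions as the slides.

```python
import math

def sel_eq(distinct_values: int) -> float:
    """sel(A = constant) = 1 / V(A,r), assuming uniformity."""
    return 1.0 / distinct_values

def sel_gt(a: float, a_min: float, a_max: float) -> float:
    """sel(A > a) = (Amax - a) / (Amax - Amin), assuming uniformity."""
    return (a_max - a) / (a_max - a_min)

def sel_not(sel_p: float) -> float:
    """sel(not P) = 1 - sel(P)."""
    return 1.0 - sel_p

def sel_and(*sels: float) -> float:
    """sel(P1 and ... and Pn) = product of sel(Pi), assuming independence."""
    return math.prod(sels)

def sel_or(*sels: float) -> float:
    """sel(P1 or ... or Pn) = 1 - product of (1 - sel(Pi)), assuming independence."""
    return 1.0 - math.prod(1.0 - s for s in sels)

grade_c = sel_eq(5)        # V(grade, TAKES) = 5, as on the slides
course_415 = sel_eq(10)    # hypothetical: V(course, TAKES) = 10
both = sel_and(grade_c, course_415)
either = sel_or(grade_c, course_415)
salary_gt_10 = sel_gt(10, 0, 100)   # hypothetical salary range 0..100 (in K)
print(both, either, salary_gt_10)
```

For two predicates, `sel_or` agrees with the two-term formula sel(P1) + sel(P2) – sel(P1)*sel(P2), since 1 – (1 – p)(1 – q) = p + q – pq.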

16 Q-opt steps
bring query into internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections (simple; complex predicates)
  sorting; projections
  joins
estimate cost; pick best

17 Sorting
Assume br blocks of rel. ‘r’, and only M (< br) buffers in main memory
Q1: how to sort (‘external sorting’)?
Q2: cost?
[figure: M buffers (1, 2, ..., M) next to the br blocks of ‘r’]

18 Sorting
Q1: how to sort (‘external sorting’)?
A1: create sorted runs of size M; merge
[figure: M buffers (1, 2, ..., M) next to the br blocks of ‘r’]

19 Sorting
create sorted runs of size M (how many?)
merge them (how?)
[figure: sorted runs of M blocks each]

20 Sorting
create sorted runs of size M
merge first M-1 runs into a sorted run of (M-1)*M, ...
[figure: runs of M blocks merged, M-1 at a time, into longer runs]
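The two phases can be tried out with an in-memory simulation. This is a sketch under stated assumptions, not the course's implementation: Python lists stand in for disk blocks and runs, the function name and the sample data are invented here, and `heapq.merge` plays the role of the (M-1)-way merge (one buffer is kept for output).

```python
import heapq

def external_sort(blocks, M):
    """Simulate external sorting of `blocks` (each a list of keys) with M buffers.

    Returns the sorted keys and the number of merge passes performed.
    """
    if M < 3:
        raise ValueError("need at least 3 buffers to merge 2 runs at a time")

    # Phase 1: read M blocks at a time, sort them in memory, write out a run.
    runs = []
    for i in range(0, len(blocks), M):
        chunk = [key for block in blocks[i:i + M] for key in block]
        runs.append(sorted(chunk))

    # Phase 2: merge M-1 runs at a time until a single run remains.
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + M - 1]))
                for i in range(0, len(runs), M - 1)]
        passes += 1
    return (runs[0] if runs else []), passes

# Hypothetical input: 7 blocks of 2 keys each, only M = 3 buffers.
blocks = [[9, 4], [7, 1], [8, 2], [6, 3], [5, 0], [11, 10], [13, 12]]
result, passes = external_sort(blocks, M=3)
print(result, passes)
```

Here phase 1 produces 3 runs, and merging 2 runs at a time takes 2 passes to reach a single sorted run.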

21 Sorting
How many steps do we need? The smallest ‘i’ such that M*(M-1)^i >= br
How many reads/writes per step? br + br
[figure: runs of M blocks merged, M-1 at a time, into longer runs]

22 Sorting
In short, excluding the final ‘write’, we need
  ceil(log(br/M) / log(M-1)) * 2 * br + br
[figure: runs of M blocks merged, M-1 at a time, into longer runs]
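The cost formula can be checked numerically. The sketch below counts merge passes with integer arithmetic instead of logarithms, to avoid floating-point rounding at exact powers; it gives the same count as ceil(log(br/M) / log(M-1)). The function names and the sample values (br = 1,000, M = 11) are assumptions made for illustration.

```python
def merge_passes(br: int, M: int) -> int:
    """Number of (M-1)-way merge passes needed after the initial run creation."""
    if M < 3:
        raise ValueError("need at least 3 buffers to merge 2 runs at a time")
    runs = (br + M - 1) // M            # initial sorted runs of M blocks each
    passes = 0
    while runs > 1:
        runs = (runs + M - 2) // (M - 1)  # each pass merges M-1 runs into one
        passes += 1
    return passes

def external_sort_cost(br: int, M: int) -> int:
    """Block transfers, excluding the final write: passes * 2 * br + br."""
    return merge_passes(br, M) * 2 * br + br

# Hypothetical numbers: a 1,000-block relation and 11 buffers.
p = merge_passes(1000, 11)
cost = external_sort_cost(1000, 11)
print(p, cost)
```

The trailing `+ br` accounts for run creation (read and write, 2 * br) minus the final write that the formula leaves out.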

23 Q-opt steps
bring query into internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections (simple; complex predicates)
  sorting; projections, aggregations
  joins
estimate cost; pick best

24 Projection - dup. elimination
e.g., select distinct c-id from TAKES
How? Pros and cons?

25 Set operations
e.g.,
  select * from REGULAR-STUDENT
  union
  select * from SPECIAL-STUDENT
How? Pros and cons?

26 Aggregations
e.g.,
  select ssn, avg(grade)
  from TAKES
  group by ssn
How?

27 Q-opt steps
bring query into internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections; sorting; projections, aggregations
  joins
    2-way joins
    n-way joins
estimate cost; pick best

28 2-way joins
output size estimation: r JOIN s (nr, ns tuples each)
case #1: cartesian product (r, s have no common attribute)
# of output tuples = ??

29 2-way joins
output size estimation: r JOIN s
case #2: r(A,B), s(A,C,D), A is cand. key for ‘r’
# of output tuples = ?? (<= ns)
[figure: r(A, ...) with nr tuples; s(A, ...) with ns tuples]

30 2-way joins
output size estimation: r JOIN s
case #3: r(A,B), s(A,C,D), A is cand. key for neither (is it possible??)
# of output tuples = ??
[figure: r(A, ...) with nr tuples; s(A, ...) with ns tuples]

31 2-way joins
# of output tuples ~ nr * ns/V(A,s) or ns * nr/V(A,r) (whichever is less)
[figure: r(A, ...) with nr tuples; s(A, ...) with ns tuples]
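The three output-size cases can be put side by side in code. This is a minimal sketch: the function names are invented here, and the statistics in the example (V(A,r) = 500, V(A,s) = 100) are hypothetical. Case #1 uses the fact that a cartesian product pairs every tuple of r with every tuple of s.

```python
def cartesian_size(nr: int, ns: int) -> int:
    """Case #1: no common attribute, so every pair of tuples is output."""
    return nr * ns

def key_join_upper_bound(ns: int) -> int:
    """Case #2: A is a candidate key of r, so each s tuple matches at most one r tuple."""
    return ns

def general_join_estimate(nr: int, ns: int, v_a_r: int, v_a_s: int) -> float:
    """Case #3: A is a key of neither; take the smaller of the two estimates."""
    return min(nr * ns / v_a_s, ns * nr / v_a_r)

# Hypothetical statistics: nr = 10,000, ns = 1,000, V(A,r) = 500, V(A,s) = 100.
est = general_join_estimate(10_000, 1_000, v_a_r=500, v_a_s=100)
print(cartesian_size(10_000, 1_000), key_join_upper_bound(1_000), est)
```

With these numbers the two candidate estimates are 100,000 and 20,000, so the smaller one, 20,000, is used.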

32 Q-opt steps
bring query into internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections; sorting; projections, aggregations
  joins
    2-way joins - output size estimation; algorithms
    n-way joins
estimate cost; pick best

33 2-way joins
algorithm(s) for r JOIN s? (nr, ns tuples each)
[figure: r(A, ...) with nr tuples; s(A, ...) with ns tuples]

34 2-way joins
Algorithm #0: (naive) nested loop (SLOW!)
  for each tuple tr of r
    for each tuple ts of s
      print, if they match
[figure: r(A, ...) with nr tuples; s(A, ...) with ns tuples]
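Algorithm #0 is short enough to write out in full. In this sketch relations are plain lists of tuples held in memory, the join attribute is assumed to sit at position 0 of each tuple, and the function name and sample data are made up for illustration; it collects matches in a list rather than printing them.

```python
def nested_loop_join(r, s, attr_r=0, attr_s=0):
    """Naive nested-loop equi-join: compare every tuple of r with every tuple of s."""
    out = []
    for tr in r:                      # outer loop over r
        for ts in s:                  # inner loop rescans all of s for each tr
            if tr[attr_r] == ts[attr_s]:
                out.append(tr + ts)   # concatenate the matching tuples
    return out

# Hypothetical sample data: (A, B) tuples for r and (A, C) tuples for s.
r = [(1, "a"), (2, "b"), (3, "c")]
s = [(2, "x"), (3, "y"), (2, "z")]
joined = nested_loop_join(r, s)
print(joined)
```

The inner loop runs once per tuple of r, which is where the nr * bs term in the disk-access count comes from when s lives on disk.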

35 2-way joins
Algorithm #0: why is it bad?
how many disk accesses (‘br’ and ‘bs’ are the number of blocks for ‘r’ and ‘s’)?
  br + nr*bs
[figure: r(A, ...) with nr tuples; s(A, ...) with ns tuples]

36 2-way joins
Algorithm #1: Blocked nested-loop join
  read in a block of r
    read in a block of s
      print matching tuples
cost: br + br * bs
[figure: r(A, ...) with nr records, br blocks; s(A, ...) with ns records, bs blocks]

37 2-way joins
Arithmetic example:
  nr = 10,000 tuples, br = 1,000 blocks
  ns = 1,000 tuples, bs = 200 blocks
  alg#0: 2,001,000 d.a.
  alg#1: 201,000 d.a.
[figure: r(A, ...) with 10,000 records, 1,000 blocks; s(A, ...) with 1,000 records, 200 blocks]
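The two figures in the arithmetic example follow directly from the cost formulas, as this small check shows. The function names are chosen here for illustration; the numbers are the ones on the slide.

```python
def naive_nl_cost(br: int, nr: int, bs: int) -> int:
    """Algorithm #0: read r once, and scan all of s for every tuple of r."""
    return br + nr * bs

def blocked_nl_cost(b_outer: int, b_inner: int) -> int:
    """Algorithm #1: read the outer once, and scan the inner for every outer block."""
    return b_outer + b_outer * b_inner

nr, br = 10_000, 1_000   # relation r
ns, bs = 1_000, 200      # relation s

alg0 = naive_nl_cost(br, nr, bs)
alg1 = blocked_nl_cost(br, bs)          # r in the outer loop
alg1_swapped = blocked_nl_cost(bs, br)  # s in the outer loop
print(alg0, alg1, alg1_swapped)
```

Putting the smaller relation s in the outer loop gives 200 + 200 * 1,000 = 200,200 disk accesses, slightly fewer than with r outside.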

38 2-way joins
Observation1: Algo#1 is asymmetric:
  cost: br + br * bs
  reverse roles: cost = bs + bs * br
Best choice? smallest relation in outer loop
[figure: r(A, ...) with nr records, br blocks; s(A, ...) with ns records, bs blocks]

39 2-way joins
Observation2 [NOT IN BOOK]: what if we have ‘k’ buffers available?
  read in ‘k-1’ blocks of ‘r’
    read in a block of ‘s’
      print matching tuples
[figure: r(A, ...) with nr records, br blocks; s(A, ...) with ns records, bs blocks]

40 2-way joins
  read in ‘k-1’ blocks of ‘r’
    read in a block of ‘s’
      print matching tuples
Cost? br + br/(k-1) * bs (with br/(k-1) rounded up)
what if br = k-1?
(what if we assign k-1 blocks to inner?)
[figure: r(A, ...) with nr records, br blocks; s(A, ...) with ns records, bs blocks]
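The effect of extra buffers is easy to see by plugging numbers into the formula. The sketch below rounds br/(k-1) up, since a partial last chunk of r still needs a full scan of s; the function name and the buffer counts are hypothetical, while br = 1,000 and bs = 200 reuse the earlier arithmetic example.

```python
def blocked_nl_cost_k(br: int, bs: int, k: int) -> int:
    """Blocked nested loop with k buffers: k-1 hold a chunk of r, 1 holds a block of s."""
    if k < 2:
        raise ValueError("need at least 2 buffers")
    chunks = (br + k - 2) // (k - 1)   # ceil(br / (k-1)) chunks of the outer relation
    return br + chunks * bs            # read r once, scan s once per chunk

br, bs = 1_000, 200
cost_k2 = blocked_nl_cost_k(br, bs, k=2)        # one block of r at a time
cost_k101 = blocked_nl_cost_k(br, bs, k=101)    # hypothetical: 100 blocks of r at a time
cost_k1001 = blocked_nl_cost_k(br, bs, k=1001)  # br = k-1: all of r fits in memory
print(cost_k2, cost_k101, cost_k1001)
```

With k = 2 this reduces to Algorithm #1's br + br * bs, and when br = k-1 the cost drops to br + bs: each relation is read exactly once.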

41 2-way joins
Observation3: can we get rid of the ‘br’ term?
  cost: br + br * bs
A: read the inner relation backwards half of the time!
Q: cons?
[figure: r(A, ...) with nr records, br blocks; s(A, ...) with ns records, bs blocks]

42 2-way joins
Other algorithm(s) for r JOIN s? (nr, ns tuples each)
[figure: r(A, ...) with nr tuples; s(A, ...) with ns tuples]

43 2-way joins - other algo’s
sort-merge
  sort ‘r’; sort ‘s’; merge sorted versions
  (good, if one or both are already sorted)
[figure: r(A, ...) with nr tuples; s(A, ...) with ns tuples]

44 2-way joins - other algo’s
sort-merge - cost: ~ 2 * br * log(br) + 2 * bs * log(bs) + br + bs
  needs temporary space (for sorted versions)
  gives output in sorted order
[figure: r(A, ...) with nr tuples; s(A, ...) with ns tuples]
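A sort-merge join can be sketched in memory as follows. This is an illustration under assumptions, not the deck's code: relations are lists of tuples, the join attribute is at position 0, Python's built-in `sorted` stands in for the external sort, and the function name and data are invented here.

```python
def sort_merge_join(r, s, key_r=0, key_s=0):
    """Equi-join by sorting both inputs on the join attribute and merging them."""
    r = sorted(r, key=lambda t: t[key_r])
    s = sorted(s, key=lambda t: t[key_s])
    out = []
    i = j = 0
    while i < len(r) and j < len(s):
        a, b = r[i][key_r], s[j][key_s]
        if a < b:
            i += 1                      # r is behind: advance r
        elif a > b:
            j += 1                      # s is behind: advance s
        else:
            # Find the group of s tuples sharing this key value.
            j_end = j
            while j_end < len(s) and s[j_end][key_s] == a:
                j_end += 1
            # Pair every r tuple with this key against the whole s group.
            while i < len(r) and r[i][key_r] == a:
                out.extend(r[i] + ts for ts in s[j:j_end])
                i += 1
            j = j_end
    return out

r = [(3, "c"), (1, "a"), (2, "b"), (2, "bb")]
s = [(2, "x"), (3, "y"), (2, "z"), (4, "w")]
merged = sort_merge_join(r, s)
print(merged)
```

The result comes out ordered by the join attribute, which matches the slide's remark that sort-merge gives output in sorted order.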

45 2-way joins - other algo’s
use an existing index, or even build one on the fly
cost: br + nr * c (c: look-up cost)
[figure: r(A, ...) with nr tuples; s(A, ...) with ns tuples]

46 2-way joins - other algo’s
hash join:
  hash ‘r’ into (0, 1, ..., ‘max’) buckets
  hash ‘s’ into buckets (same hash function)
  join each pair of matching buckets
[figure: r(A, ...) and s(A, ...) each partitioned into buckets 0, 1, ..., max]

47 2-way joins - hash join details
how to join each pair of partitions Hr-i, Hs-i?
A: build another hash table for Hs-i, and probe it with each tuple of Hr-i
[figure: partitions Hr-0, ... of r(A, ...) and Hs-0, ... of s(A, ...), buckets 0, 1, ..., max]

48 2-way joins - hash join details
what if Hs-i is too large to fit in main-memory?
A: recursive partitioning
more details (overflows, hybrid hash joins): in book
cost of hash join? (under certain assumptions)
  3(br + bs) + 2 * max
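The partition-then-probe structure of the hash join can be sketched as below. Everything runs in memory, so overflows and recursive partitioning are not modelled; the function names, the bucket count of 4, and the sample data are assumptions for illustration. The cost function simply restates the slide's formula.

```python
from collections import defaultdict

def partition(rel, key, n_buckets):
    """Split a relation into buckets by hashing the join attribute."""
    buckets = [[] for _ in range(n_buckets)]
    for t in rel:
        buckets[hash(t[key]) % n_buckets].append(t)
    return buckets

def hash_join(r, s, key_r=0, key_s=0, n_buckets=4):
    """Partition both inputs with the same hash function, then join bucket pairs."""
    out = []
    pairs = zip(partition(r, key_r, n_buckets), partition(s, key_s, n_buckets))
    for hr_i, hs_i in pairs:
        # Build an in-memory hash table on Hs-i ...
        table = defaultdict(list)
        for ts in hs_i:
            table[ts[key_s]].append(ts)
        # ... and probe it with each tuple of Hr-i.
        for tr in hr_i:
            for ts in table.get(tr[key_r], []):
                out.append(tr + ts)
    return out

def hash_join_cost(br: int, bs: int, max_bucket: int) -> int:
    """Slide formula (under certain assumptions): 3(br + bs) + 2 * max."""
    return 3 * (br + bs) + 2 * max_bucket

r = [(1, "a"), (2, "b"), (2, "bb"), (3, "c")]
s = [(2, "x"), (3, "y"), (2, "z"), (4, "w")]
hashed = sorted(hash_join(r, s))   # sort only to get a stable order for display
print(hashed)
```

Tuples with equal join values always land in the same bucket number, so only matching bucket pairs need to be joined.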

49 Q-opt steps
bring query into internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections; sorting; projections, aggregations
  joins
    2-way joins - output size estimation; algorithms
    n-way joins
estimate cost; pick best

50 n-way joins
r1 JOIN r2 JOIN ... JOIN rn
typically, break problem into 2-way joins

51 Structure of query optimizers:
System R:
  break query in query blocks
  simple queries (i.e., no joins): look at stats
  n-way joins: left-deep join trees; i.e., only one intermediate result at a time
    pros: smaller search space; pipelining
    cons: may miss optimal
  2-way joins: NL and sort-merge
[figure: left-deep join tree over r1, r2, r3, r4]
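To make "left-deep" concrete, the sketch below lists every left-deep join order for a handful of relations. It is only an illustration of the shape of the search space: the function name is invented here, plans are rendered as strings, and no costs are attached. A real optimizer would cost and prune candidates rather than list them all.

```python
from itertools import permutations

def left_deep_plans(relations):
    """Render every left-deep join tree over the given relation names."""
    plans = []
    for order in permutations(relations):
        plan = order[0]
        for rel in order[1:]:
            # The running result is always the left input; a base relation joins on the right.
            plan = f"({plan} JOIN {rel})"
        plans.append(plan)
    return plans

plans = left_deep_plans(["r1", "r2", "r3", "r4"])
print(len(plans))
print(plans[0])
```

For four relations this gives 4! = 24 orderings, each with a single intermediate result at a time; bushy trees such as (r1 JOIN r2) JOIN (r3 JOIN r4) are never generated.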

52 Structure of query optimizers:
More heuristics by Oracle, Sybase and Starburst (-> DB2): in book
In general: q-opt is very important for large databases.
(‘explain select <sql-statement>’ gives plan)

53 Q-opt steps
bring query into internal form (e.g., parse tree)
… into ‘canonical form’ (syntactic q-opt)
generate alt. plans
  selections (simple; complex predicates)
  sorting; projections
  joins
estimate cost; pick best

