Published by Leonard Holmes. Modified over 8 years ago.
1 B+-Trees: Search
If there are n search-key values in the file, the path from the root to a leaf is no longer than $\lceil \log_{\lceil f/2 \rceil}(n) \rceil$ nodes in the worst case, where f is the node fan-out.
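As a rough illustration of the bound above, the sketch below computes the worst-case path length with integer arithmetic. The function name and the sample values are assumptions made for illustration; f is taken to be the maximum number of pointers per node.

```python
import math

def max_search_path_length(n_keys: int, fanout: int) -> int:
    """Worst-case path length: ceil(log base ceil(f/2) of n).

    Assumes every node is at least half full, so each level
    multiplies the number of reachable keys by at least ceil(f/2).
    """
    if n_keys < 1 or fanout < 3:
        raise ValueError("need n_keys >= 1 and fanout >= 3")
    min_children = math.ceil(fanout / 2)
    levels = 0
    reachable = 1
    # Multiply instead of calling math.log to avoid floating-point rounding.
    while reachable < n_keys:
        reachable *= min_children
        levels += 1
    return levels

# Hypothetical file: one million search-key values, fan-out 100.
print(max_search_path_length(1_000_000, 100))
```

With a fan-out of 100, each level narrows the search by a factor of at least 50, which is why even a large file needs only a handful of node accesses.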
2 External Sort-Merge
Sorting phase: sorts $n_B$ pages at a time.
$n_B$ = # of main-memory buffer pages
$n_R = \lceil b/n_B \rceil$ = # of initial sorted runs created on disk
$b$ = # of file blocks (pages) to be sorted
Sorting cost = read b blocks + write b blocks = $2b$
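The sorting phase can be sketched in memory as follows. This is a minimal illustration, not a disk-based implementation: each "block" is modeled as a Python list of records, and the function name and sample data are assumptions.

```python
def create_sorted_runs(blocks, n_b):
    """Sort phase: take n_b blocks at a time, sort their records, emit one run.

    `blocks` is a list of blocks; each block is a list of records.
    Returns a list of sorted runs (each run is a flat sorted list).
    """
    if n_b < 1:
        raise ValueError("n_b must be at least 1")
    runs = []
    for start in range(0, len(blocks), n_b):
        chunk = blocks[start:start + n_b]          # "read" up to n_b blocks
        records = sorted(rec for block in chunk for rec in block)
        runs.append(records)                        # "write" one sorted run
    return runs

# Hypothetical file of b = 5 blocks, sorted with n_B = 2 buffer pages.
blocks = [[9, 2], [7, 1], [5, 8], [3, 4], [6, 0]]
runs = create_sorted_runs(blocks, 2)
print(len(runs), runs)
```

Every block is read once and written once, which matches the sorting cost of 2b block accesses.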
3 External Sort-Merge
Merging phase: the sorted runs are merged during one or more passes.
The degree of merging ($d_M$) is the number of runs that can be merged in each pass: $d_M = \min(n_B - 1, n_R)$
Number of passes: $n_P = \lceil \log_{d_M}(n_R) \rceil$
In each pass, one buffer block is needed to hold one block from each of the runs being merged, and one more block is needed to hold one block of the merged result.
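One merge pass can be sketched with Python's `heapq.merge`, which consumes its inputs lazily, one item at a time from each run. That loosely mirrors holding one buffer block per run. Runs are modeled as in-memory lists, and the function names are assumptions for illustration.

```python
import heapq

def merge_degree(n_b, n_r):
    """d_M = min(n_B - 1, n_R): one buffer page is reserved for output."""
    return min(n_b - 1, n_r)

def merge_pass(runs, d_m):
    """Merge groups of up to d_m sorted runs into longer sorted runs."""
    if d_m < 2:
        raise ValueError("need a merge degree of at least 2")
    return [
        list(heapq.merge(*runs[i:i + d_m]))
        for i in range(0, len(runs), d_m)
    ]

# Hypothetical input: three sorted runs and n_B = 3 buffer pages.
runs = [[1, 4], [2, 3], [0, 9]]
d_m = merge_degree(3, len(runs))
print(merge_pass(runs, d_m))
```

Repeating `merge_pass` until a single run remains gives the full merging phase.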
4 External Sort-Merge
Degree of merging ($d_M$): # of runs that can be merged together in each pass = $\min(n_B - 1, n_R)$
Number of passes: $n_P = \lceil \log_{d_M}(n_R) \rceil$
In our example, $d_M = 4$ (four-way merging): $\min(n_B - 1, n_R) = \min(5 - 1, 205) = 4$
Number of passes: $n_P = \lceil \log_{d_M}(n_R) \rceil = \lceil \log_4(205) \rceil = 4$
First pass:
– 205 initial sorted runs would be merged into 52 sorted runs
Second pass:
– 52 sorted runs would be merged into 13
Third pass:
– 13 sorted runs would be merged into 4
Fourth pass:
– 4 sorted runs would be merged into 1
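The pass-by-pass run counts in this example can be checked with a short simulation. The function name is an assumption; the inputs are the example's values of 205 initial runs and 5 buffer pages.

```python
def runs_after_each_pass(n_r, n_b):
    """Return the number of runs remaining after each merge pass."""
    if n_r <= 1:
        return []                      # already a single run, nothing to merge
    d_m = min(n_b - 1, n_r)            # degree of merging
    if d_m < 2:
        raise ValueError("need at least 3 buffer pages to merge")
    counts = []
    while n_r > 1:
        n_r = (n_r + d_m - 1) // d_m   # integer ceiling of n_r / d_m
        counts.append(n_r)
    return counts

print(runs_after_each_pass(205, 5))   # [52, 13, 4, 1]
```

The length of the returned list is the number of passes, which agrees with the ceiling-of-logarithm formula.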
5 External Sort-Merge: Cost Analysis
The number of disk accesses for initial run creation (sort phase), as well as for each merge pass, is $2b$: every block is read once and written out once.
The initial # of runs is $n_R = \lceil b/n_B \rceil$, and the # of runs decreases by a factor of $d_M$ in each merge pass (which is $n_B - 1$ when $n_R \ge n_B - 1$), so the total # of merge passes is $n_P = \lceil \log_{d_M}(n_R) \rceil$.
In general, the cost of merge-sort is
Cost = sort cost + merge cost
Cost = $2b + 2b \cdot n_P$
Cost = $2b + 2b \cdot \lceil \log_{d_M}(n_R) \rceil = 2b \, (\lceil \log_{d_M}(n_R) \rceil + 1)$
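A small calculator for the total cost, as a sketch under the formulas above. The function name is an assumption, and the sample file size b = 1024 is a hypothetical value chosen only because it yields 205 initial runs with 5 buffer pages.

```python
def external_sort_cost(b, n_b):
    """Total block accesses: 2b for the sort phase plus 2b per merge pass."""
    if b < 1 or n_b < 3:
        raise ValueError("need b >= 1 and at least 3 buffer pages")
    n_r = (b + n_b - 1) // n_b          # initial runs, ceil(b / n_B)
    d_m = min(n_b - 1, n_r)             # degree of merging
    n_p = 0                             # number of merge passes
    while n_r > 1:
        n_r = (n_r + d_m - 1) // d_m    # runs left after one pass
        n_p += 1
    return 2 * b + 2 * b * n_p

# Hypothetical file: b = 1024 blocks, n_B = 5 buffer pages.
print(external_sort_cost(1024, 5))     # 2 * 1024 * (4 + 1) = 10240
```

Counting passes with a loop rather than a floating-point logarithm avoids rounding errors when the number of runs is an exact power of the merge degree.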
6 Catalog Information
Attribute
d: # of distinct values of an attribute
sl (selectivity): the ratio of the # of records satisfying the condition to the total # of records in the file (r)
s (selection cardinality) = sl * r: the average # of records that will satisfy an equality condition on the attribute
For a key attribute: d = r, sl = 1/r, s = 1
For a nonkey attribute: assuming that the d distinct values are uniformly distributed among the records, the estimated sl = 1/d and s = r/d
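These estimates reduce to two divisions. The sketch below assumes the uniform-distribution model described above; the function name and the sample record counts are hypothetical.

```python
def estimate_equality_selection(r, d):
    """Estimate selectivity sl and selection cardinality s for an equality
    condition, assuming the d distinct values are uniformly distributed."""
    if r < 1 or d < 1 or d > r:
        raise ValueError("need 1 <= d <= r")
    sl = 1 / d          # fraction of records expected to match
    s = sl * r          # expected number of matching records
    return sl, s

# Hypothetical file of r = 10,000 records.
print(estimate_equality_selection(10_000, 10_000))  # key attribute: s is 1
print(estimate_equality_selection(10_000, 125))     # nonkey attribute
```

For a key attribute d equals r, so the general formula reproduces the special case sl = 1/r and s = 1.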
7 Using Selectivity and Cost Estimates in Query Optimization
Examples of Cost Functions for SELECT
S1. Linear search (brute force) approach: $C_{S1a} = b$. For an equality condition on a key, $C_{S1b} = b/2$ on average if the record is found; otherwise the cost is $b$.
S2. Binary search: $C_{S2} = \log_2 b + \lceil s/bfr \rceil - 1$. For an equality condition on a unique (key) attribute, $C_{S2} = \log_2 b$.
S3. Using a primary index (S3a) or hash key (S3b) to retrieve a single record: $C_{S3a} = x + 1$, where x is the number of index levels; $C_{S3b} = 1$ for static or linear hashing; $C_{S3b} = 2$ for extendible hashing.
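The cost functions for linear search, binary search, and a primary index can be written out directly. This is a sketch: the function names and sample statistics are assumptions, x is taken to be the number of index levels, and bfr is taken to be the blocking factor (records per block).

```python
import math

def linear_search_cost(b, key_equality=False, found=True):
    """S1: scan all b blocks; about half on average for a key that is found."""
    if key_equality and found:
        return b / 2
    return b

def binary_search_cost(b, s, bfr):
    """S2: log2(b) to locate the first match, then read the blocks
    holding the remaining matching records."""
    return math.log2(b) + math.ceil(s / bfr) - 1

def primary_index_cost(x):
    """S3a: traverse x index levels, then read one data block."""
    return x + 1

# Hypothetical file: b = 2048 blocks, bfr = 5, two-level primary index.
print(linear_search_cost(2048, key_equality=True))
print(binary_search_cost(2048, s=1, bfr=5))
print(primary_index_cost(2))
```

For a key attribute s is 1, so the binary search cost collapses to log2(b), as stated for the unique-attribute case.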
8 Using Selectivity and Cost Estimates in Query Optimization
Examples of Cost Functions for SELECT (contd.)
S4. Using an ordering index to retrieve multiple records: for a comparison condition on a key field with an ordering index, $C_{S4} = x + (b/2)$
S5. Using a clustering index to retrieve multiple records: $C_{S5} = x + \lceil s/bfr \rceil$
S6. Using a secondary (B+-tree) index: for an equality comparison, $C_{S6a} = x + s$; for a comparison condition such as >, <, >=, or <=, $C_{S6b} = x + (b_{I1}/2) + (r/2)$
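To see how these estimates compare, the sketch below evaluates the clustering-index and secondary-index formulas on one set of statistics. The function names and all numbers are hypothetical; b_i1 is assumed to be the number of first-level index blocks.

```python
import math

def clustering_index_cost(x, s, bfr):
    """S5: x index levels plus the contiguous blocks holding s records."""
    return x + math.ceil(s / bfr)

def secondary_index_equality_cost(x, s):
    """S6, equality: each of the s matching records may be on its own block."""
    return x + s

def secondary_index_range_cost(x, b_i1, r):
    """S6, comparison: roughly half the first-level index blocks and
    half the records are assumed to be accessed."""
    return x + b_i1 / 2 + r / 2

# Hypothetical statistics: r = 10,000 records, bfr = 5 (so b = 2,000 blocks),
# x = 2 index levels, s = 80 matching records, b_i1 = 50 index blocks.
print(clustering_index_cost(2, 80, 5))
print(secondary_index_equality_cost(2, 80))
print(secondary_index_range_cost(2, 50, 10_000))
```

With these numbers the range estimate through the secondary index exceeds the 2,000 block accesses of a full linear scan, which illustrates why an optimizer compares estimates instead of always preferring an index.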
9 Using Selectivity and Cost Estimates in Query Optimization
Examples of Cost Functions for SELECT (contd.)
S7. Conjunctive selection: use either S1 or one of the methods S2 to S6. In the latter case, use one condition to retrieve the records, and then check in the memory buffer whether each retrieved record satisfies the remaining conditions in the conjunction.
S8. Conjunctive selection using a composite index: same as S3a, S5, or S6a, depending on the type of index.
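The conjunctive strategy can be sketched as follows. Picking the condition with the lowest estimated cost is an assumption added for illustration (the slide only says to use one condition), the list comprehension merely stands in for an index or file access, and all names and data are hypothetical.

```python
def conjunctive_select(records, conditions, access_costs):
    """Retrieve with one condition, then filter the rest in memory.

    conditions:   dict mapping a condition name to a predicate function
    access_costs: dict mapping a condition name to its estimated access cost
    """
    # Assumption: drive the retrieval with the cheapest access path.
    driver = min(access_costs, key=access_costs.get)
    # Stand-in for retrieving records through that access path.
    candidates = [rec for rec in records if conditions[driver](rec)]
    # Check the remaining conditions on the retrieved records in memory.
    remaining = [pred for name, pred in conditions.items() if name != driver]
    return [rec for rec in candidates if all(pred(rec) for pred in remaining)]

records = [
    {"dno": 5, "salary": 30000},
    {"dno": 5, "salary": 50000},
    {"dno": 4, "salary": 60000},
]
conditions = {
    "dno": lambda rec: rec["dno"] == 5,
    "salary": lambda rec: rec["salary"] > 40000,
}
access_costs = {"dno": 12, "salary": 400}   # hypothetical estimates
print(conjunctive_select(records, conditions, access_costs))
```

Only the driving condition incurs disk accesses; the remaining predicates are evaluated on records already in the buffer, so they add no I/O cost.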