Published by Corey Shepherd. Modified over 9 years ago.
1
Chapter 12 Query Processing (1)
Yonsei University
2nd Semester, 2013
Sanghyun Park
2
Outline
- Overview
- Measures of Query Cost
- Selection Operation
- Sorting
- Join Operation (will be covered in the next file)
- Other Operations (will be covered in the next file)
- Evaluation of Expressions (will be covered in the next file)
3
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
4
Optimization
- A relational algebra expression can be evaluated in many ways
  - Example: $\sigma_{salary<75000}(\Pi_{salary}(instructor))$ is equivalent to $\Pi_{salary}(\sigma_{salary<75000}(instructor))$
- An annotated expression specifying a detailed evaluation strategy is called an evaluation plan
  - Example: (1) can use an index on instructor to find tuples with salary < 75000; (2) can perform a complete relation scan and discard instructors with salary ≥ 75000
- Query optimization: among all equivalent evaluation plans, choose the one with the lowest cost (details in Chapter 13)
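As a small illustration of equivalent expressions, the sketch below evaluates "select then project" and "project then select" over a toy in-memory relation and checks that both give the same answer. The relation contents and the function names are hypothetical, chosen only for this example.

```python
# Hypothetical toy data standing in for the `instructor` relation: (name, salary).
instructor = [("Kim", 60000), ("Lee", 75000), ("Park", 90000), ("Choi", 40000)]


def select_then_project(rel):
    """Pi_salary(sigma_{salary<75000}(rel)): filter tuples first, then keep salary."""
    return {salary for (_name, salary) in rel if salary < 75000}


def project_then_select(rel):
    """sigma_{salary<75000}(Pi_salary(rel)): keep salary first, then filter."""
    salaries = {salary for (_name, salary) in rel}
    return {s for s in salaries if s < 75000}


# Both evaluation orders produce the same set of salaries.
print(select_then_project(instructor) == project_then_select(instructor))
```

The two functions return the same result but do different amounts of intermediate work, which is the kind of difference an optimizer weighs when comparing plans.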
5
Measures of Query Cost (1/2)
- Cost is generally measured as the total elapsed time for answering a query; many factors contribute to the cost (disk accesses, CPU, …)
- Typically disk access is the predominant cost, and is also relatively easy to estimate; it is measured by taking into account
  - Number of seeks × average-seek-cost
  - Number of blocks read × average-block-read-cost
  - Number of blocks written × average-block-write-cost
- For simplicity we just use the number of block transfers from disk as the cost measure
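The three disk-access components can be combined into a single estimate. The following is a minimal sketch; the function name, parameter names, and the sample timings (in microseconds) are assumptions for illustration, not measured values.

```python
def disk_cost(seeks, blocks_read, blocks_written,
              avg_seek_cost, avg_read_cost, avg_write_cost):
    """Estimated disk time: each count multiplied by its average per-unit cost."""
    return (seeks * avg_seek_cost
            + blocks_read * avg_read_cost
            + blocks_written * avg_write_cost)


# Hypothetical timings in microseconds: 4000 per seek, 100 per block read or written.
print(disk_cost(seeks=2, blocks_read=10, blocks_written=0,
                avg_seek_cost=4000, avg_read_cost=100, avg_write_cost=100))
```

The simplified measure used in the rest of these slides corresponds to counting only block transfers.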
6
Measures of Query Cost (2/2)
- Costs depend on the size of the buffer in main memory
  - Having more memory reduces the need for disk access
  - The amount of real memory available to the buffer depends on other concurrent OS processes, and is hard to determine ahead of actual execution
  - We often use worst-case estimates, assuming only the minimum amount of memory needed for the operation is available
- Real systems take CPU cost, the difference between sequential and random I/O, and buffer size into account
- We do not include the cost of writing output to disk in our cost formulas
7
Selection Operation
- In query processing, the file scan is the lowest-level operator to access data
- File scans are search algorithms that locate and retrieve records that satisfy a selection condition
- In relational systems, a file scan allows an entire relation to be read in those cases where the relation is stored in a single, dedicated file
8
Basic Algorithms: Linear Search (A1)
- Scan each file block and test all records to see whether they satisfy the selection condition
- Cost estimate (number of disk blocks scanned) = $b_r$ ($b_r$ denotes the number of blocks containing records from relation r)
- Selections on key attributes have an average cost of $b_r/2$, but still have a worst-case cost of $b_r$
- Linear search can be applied to any file, regardless of
  - Ordering of records in the file
  - Availability of indices
  - Nature of the selection operation
9
Basic Algorithms: Binary Search (A2)
- Applicable if the selection is an equality comparison on the attribute on which the file is ordered
- Cost estimate (number of disk blocks to be scanned)
  - $\lceil \log_2(b_r) \rceil$: cost of locating the first tuple by a binary search on the blocks
  - Plus the number of blocks containing records that satisfy the selection condition
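The A1 and A2 estimates can be compared side by side. This is a sketch of the cost formulas only (it does not search any file); the function names and the sample block count are assumptions, and the binary-search estimate follows the slide literally by adding the matching blocks to the search cost.

```python
import math


def linear_search_cost(b_r, key_equality=False):
    """A1: scan all b_r blocks; an equality selection on a key averages b_r / 2."""
    return b_r / 2 if key_equality else b_r


def binary_search_cost(b_r, matching_blocks):
    """A2: ceil(log2(b_r)) block reads to locate the first tuple,
    plus the number of blocks holding records that satisfy the condition."""
    return math.ceil(math.log2(b_r)) + matching_blocks


# Hypothetical relation occupying 1024 blocks, with matches spread over 3 blocks.
print(linear_search_cost(1024))
print(linear_search_cost(1024, key_equality=True))
print(binary_search_cost(1024, matching_blocks=3))
```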
10
Selection Using Indices (1/2)
- Search algorithms that use an index are referred to as index scans (the selection condition must be on the search key of the index)
- A3 (primary index on candidate key, equality)
  - Retrieve a single record that satisfies the equality condition
  - If a B+-tree is used, the cost is equal to the height of the tree plus one I/O to fetch the record; Cost = $HT_i + 1$
11
Selection Using Indices (2/2)
- A4 (primary index on nonkey, equality)
  - Records will be on consecutive blocks
  - Cost = $HT_i$ + number of blocks containing retrieved records
- A5(a) (secondary index on candidate key, equality)
  - Cost = $HT_i + 1$ (ignoring the cost for bucket access)
- A5(b) (secondary index on nonkey, equality)
  - Cost = $HT_i$ + number of records retrieved (ignoring the cost for bucket access)
  - Each record may be on a different block, so this can be very expensive
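The equality-selection estimates for A3 through A5 differ only in what is added to the index height. The sketch below collects them in one hypothetical helper; the function name, the string labels, and the sample numbers are assumptions for illustration, and bucket access is ignored as on the slide.

```python
def index_scan_cost(height, algorithm, matching_blocks=1, matching_records=1):
    """Block-access estimate for an equality selection through an index.

    height           -- HT_i, the height of the index
    matching_blocks  -- blocks holding matching records (used by A4)
    matching_records -- number of matching records (used by A5b)
    """
    if algorithm in ("A3", "A5a"):   # candidate key: a single record is fetched
        return height + 1
    if algorithm == "A4":            # primary index, nonkey: consecutive blocks
        return height + matching_blocks
    if algorithm == "A5b":           # secondary index, nonkey: one I/O per record
        return height + matching_records
    raise ValueError(f"unknown algorithm label: {algorithm}")


# Hypothetical numbers: index of height 3, 200 matching records packed into 4 blocks.
print(index_scan_cost(3, "A4", matching_blocks=4))
print(index_scan_cost(3, "A5b", matching_records=200))
```

With the same 200 matching records, the primary index touches only the few blocks they occupy, while the secondary index may pay one block access per record.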
12
Selections Involving Comparisons (1/2)
- Can implement selections of the form $\sigma_{A \le V}(r)$ or $\sigma_{A \ge V}(r)$ by using
  - A file scan
  - Or by using indices in the following ways
- A6 (primary index, comparison) (relation is sorted on A)
  - For $\sigma_{A \ge V}(r)$, use the index to find the first tuple ≥ V and scan the relation sequentially from there
  - For $\sigma_{A \le V}(r)$, just scan the relation sequentially till the first tuple > V; do not use the index
13
Selections Involving Comparisons (2/2)
- A7 (secondary index, comparison)
  - For $\sigma_{A \ge V}(r)$, use the index to find the first index entry ≥ V and scan the index sequentially from there, to find pointers to records
  - For $\sigma_{A \le V}(r)$, just scan the leaf pages of the index finding pointers to records, till the first entry > V
  - In either case, retrieve the records that are pointed to
    - Requires an I/O for each record
    - A linear file scan may be cheaper if many records are to be fetched
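The trade-off between a secondary-index scan and a linear file scan can be made concrete with a rough comparison. This sketch assumes one block access per fetched record plus the index height, and ignores the cost of scanning index leaf pages; the function names and sample numbers are hypothetical.

```python
def secondary_index_cost(height, matching_records):
    """Rough A7 estimate: traverse the index, then one I/O per record fetched.
    Assumption: the cost of scanning index leaf pages is ignored."""
    return height + matching_records


def cheaper_access_path(b_r, height, matching_records):
    """Return which of the two access paths has the lower block-access estimate."""
    index_cost = secondary_index_cost(height, matching_records)
    return "linear scan" if b_r < index_cost else "secondary index"


# Hypothetical relation of 1000 blocks with a secondary index of height 3.
print(cheaper_access_path(1000, 3, matching_records=50))
print(cheaper_access_path(1000, 3, matching_records=5000))
```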
14
Sorting
- For relations that fit in memory, techniques like quicksort can be used
- For relations that don’t fit in memory, external sort-merge is a good choice
15
External Sort-Merge (1/3)
- Let M denote memory size (in blocks)
- Create sorted runs
  - Let i be 0 initially
  - Repeatedly do the following till the end of the relation:
    - Read M blocks of the relation into memory
    - Sort the in-memory blocks
    - Write sorted data to run $R_i$; increment i
  - Let the final value of i be N
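The run-creation phase can be sketched in memory, with a list of lists standing in for disk blocks. This is a simulation under stated assumptions, not a disk-based implementation: the function name and the sample data are hypothetical, and each run is returned as one flat sorted list rather than being written out block by block.

```python
def create_sorted_runs(blocks, M):
    """Split `blocks` into groups of M blocks and sort each group into a run.

    blocks -- the relation, as a list of blocks (each block is a list of records)
    M      -- memory size in blocks
    Returns the list of sorted runs R_0 .. R_{N-1}; N is len(result).
    """
    runs = []
    for start in range(0, len(blocks), M):
        in_memory = blocks[start:start + M]                   # read M blocks
        records = sorted(r for block in in_memory for r in block)  # sort in memory
        runs.append(records)                                  # "write" run R_i
    return runs


# Hypothetical relation of 5 blocks with 2 records each, and M = 2.
relation = [[3, 1], [4, 1], [5, 9], [2, 6], [5, 3]]
runs = create_sorted_runs(relation, M=2)
print(len(runs), runs)
```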
16
External Sort-Merge (2/3)
- Merge the runs (N-way merge)
  - We assume (for now) that N < M
  - Use N blocks of memory as buffers for input runs, and 1 block as buffer for output
  - Read the first block of each run into its input buffer
  - Repeatedly do the following until all input buffers are empty:
    - Select the first record (in sort order) among all input buffers
    - Write the record to the output buffer; if the output buffer is full, write it to disk
    - Delete the record from its input buffer; if the input buffer becomes empty, then read the next block (if any) of the run into the input buffer
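The merge steps above can be sketched with one input buffer per run and a single output buffer. This is an in-memory simulation: the function name, the nested-list representation of runs, and the use of a heap to pick the smallest buffered record are assumptions made for illustration, and every block is assumed to be non-empty.

```python
import heapq
from collections import deque


def merge_runs(runs, block_size):
    """N-way merge of sorted runs; each run is a list of blocks (lists of records).
    Returns the merged output as a list of blocks of at most `block_size` records."""
    next_block = [0] * len(runs)          # next unread block of each run
    buffers = [deque() for _ in runs]     # one input buffer per run

    def refill(i):
        """Read the next block (if any) of run i into its input buffer."""
        if next_block[i] < len(runs[i]):
            buffers[i].extend(runs[i][next_block[i]])
            next_block[i] += 1

    heap = []                             # (first record of buffer i, i)
    for i in range(len(runs)):
        refill(i)
        if buffers[i]:
            heap.append((buffers[i][0], i))
    heapq.heapify(heap)

    output_blocks, out_buf = [], []
    while heap:
        record, i = heapq.heappop(heap)   # first record in sort order
        out_buf.append(record)
        if len(out_buf) == block_size:    # output buffer full: "write to disk"
            output_blocks.append(out_buf)
            out_buf = []
        buffers[i].popleft()              # delete the record from its input buffer
        if not buffers[i]:
            refill(i)
        if buffers[i]:
            heapq.heappush(heap, (buffers[i][0], i))
    if out_buf:                           # flush a partially filled last block
        output_blocks.append(out_buf)
    return output_blocks


# Three hypothetical sorted runs, stored as blocks of 2 records.
sorted_runs = [[[1, 1], [3, 4]], [[2, 5], [6, 9]], [[3, 5]]]
merged = merge_runs(sorted_runs, block_size=2)
print(merged)
```

The heap is a convenience for selecting the smallest record; a simple scan over the N buffer heads would implement the same step.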
17
External Sort-Merge (3/3)
- If N ≥ M, several merge passes are required
  - In each pass, contiguous groups of M - 1 runs are merged
  - A pass reduces the number of runs by a factor of M - 1, and creates runs longer by the same factor
  - Repeated passes are performed till all runs have been merged into one
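The number of merge passes follows directly from the reduction factor M - 1. The sketch below counts passes by repeated ceiling division instead of a logarithm, to avoid floating-point rounding; the function name and the sample sizes are assumptions, and the initial run count is taken to be $\lceil b_r / M \rceil$.

```python
def merge_passes(b_r, M):
    """Number of merge passes needed to sort a relation of b_r blocks
    with M blocks of memory, merging M - 1 runs at a time."""
    if M < 3:
        raise ValueError("need at least 3 memory blocks (2 inputs + 1 output)")
    runs = -(-b_r // M)                # ceil(b_r / M) initial sorted runs
    passes = 0
    while runs > 1:
        runs = -(-runs // (M - 1))     # each pass merges groups of M - 1 runs
        passes += 1
    return passes


# Hypothetical sizes: 12 blocks with M = 3, and 1000 blocks with M = 11.
print(merge_passes(12, 3))
print(merge_passes(1000, 11))
```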
18
Example: External Sort-Merge