Presentation is loading. Please wait.

Presentation is loading. Please wait.

Database System Architecture and Implementation Execution Costs 1 Slides Credit: Michael Grossniklaus – Uni-Konstanz.

Similar presentations


Presentation on theme: "Database System Architecture and Implementation Execution Costs 1 Slides Credit: Michael Grossniklaus – Uni-Konstanz."— Presentation transcript:

1 Database System Architecture and Implementation Execution Costs 1 Slides Credit: Michael Grossniklaus – Uni-Konstanz

2 Orientation 2 ExecutorParser Operator EvaluatorOptimizer Files and Index Structures Buffer Manager Disk Space Manager Recovery Manager Transaction Manager Lock Manager SQL InterfaceApplicationsWeb Forms SQL Commands Index and Data Files Catalog Database DBMS Figure Credit: Raghu Ramakrishnan and Johannes Gehrke: “Database Management Systems”, McGraw-Hill, 2003. We are here!

3 Recall Heap Files Heap files provide just enough structure to maintain a collection of records (of a table) The heap file supports sequential ( openScan( ∙ ) ) over the collection No other operations get specific support from heap files 3  SQL query leading to a sequential scan SELECT A, B FROM R SELECT A, B FROM R Slides Credit: Michael Grossniklaus – Uni-Konstanz

4 Systematic File Organization For the above queries, it would definitely be helpful if the SQL query processor could rely on a particular file organization of the records in the file for table R 4  SQL queries calling for systematic file organization SELECT A, BSELECT A, B FROM RFROM R WHERE C > 45ORDER BY C ASC SELECT A, BSELECT A, B FROM RFROM R WHERE C > 45ORDER BY C ASC  Exercise Which organization of records in the file for table R could speed up the evaluation of both queries above? Slides Credit: Michael Grossniklaus – Uni-Konstanz

5 Systematic File Organization For the above queries, it would definitely be helpful if the SQL query processor could rely on a particular file organization of the records in the file for table R 5  SQL queries calling for systematic file organization  Exercise Which organization of records in the file for table R could speed up the evaluation of both queries above? Allocate records of table R in ascending order of attribute C values Place records in neighboring pages (Only include columns A, B, and C in the records) Which organization of records in the file for table R could speed up the evaluation of both queries above? Allocate records of table R in ascending order of attribute C values Place records in neighboring pages (Only include columns A, B, and C in the records) SELECT A, BSELECT A, B FROM RFROM R WHERE C > 45ORDER BY C ASC SELECT A, BSELECT A, B FROM RFROM R WHERE C > 45ORDER BY C ASC Slides Credit: Michael Grossniklaus – Uni-Konstanz

6 Module Overview Three different file organizations 1.files containing randomly ordered records (heap files) 2.files sorted on one or more record fields 3.files hashed on one or more record fields Comparison of file organizations –simple cost model –application of cost model to file operations Introduction to index concept –clustered vs. unclustered indexes –dense vs. sparse indexes 6 Slides Credit: Michael Grossniklaus – Uni-Konstanz

7 Comparison of File Organizations Competition of three file organizations in five disciplines 1.scan: read all records in a give file 2.search with equality test 3.search with range selection (upper or lower bound may be unspecified) 4.insert a given record in the file, respecting the file’s organization 5.delete a record (identified by its rid), maintain the file’s organization 7  SQL queries calling for equality test and range selection support SELECT *FROM R WHERE C = 45WHERE A > 0 AND A < 100SELECT *FROM R Slides Credit: Michael Grossniklaus – Uni-Konstanz

8 Simple Cost Model A cost model is used to analyze the execution time of a given database operations –block I/O operations are typically a major cost factor –CPU time to account for searching inside a page, comparing a record field to selection constant, etc. To estimate the execution time of the five database operation, we introduce a coarse cost model –omits cost of network access –does not consider cache effects –neglects burst I/O –… Cost models play an important role in query optimization 8 Slides Credit: Michael Grossniklaus – Uni-Konstanz

9 Simple Cost Model Some typical values –D ≈ 15 ms –C ≈ H ≈ 0.1 μs 9  Simple cost model parameters ParameterDescription bnumber of pages in the file rnumber of records on a page Dtime to read/write a disk page CCPU time needed to process a record (e.g., compare a field value) HCPU time take to apply a function to a record (e.g., a comparison or hash function) ParameterDescription bnumber of pages in the file rnumber of records on a page Dtime to read/write a disk page CCPU time needed to process a record (e.g., compare a field value) HCPU time take to apply a function to a record (e.g., a comparison or hash function) Slides Credit: Michael Grossniklaus – Uni-Konstanz

10 Back to the Future The hash function determines the page number only, record placement inside a page is not prescribed If a page p is filled to capacity, a chain of overflow pages is maintained to store additional records with h( 〈 … 〉 ) = p To avoid immediate overflowing when a new record is inserted, pages are typically filled to 80% only when a heap file is initially (re)organized into a hash file 10  A simple hash function A hashed file uses a hash function h to map a given record onto a specific page of the file. Example: h uses the lower 3 bits of the first field (of type INTEGER ) of the record to compute the corresponding page number. h( 〈 42, true, ‘dog’ 〉 ) → 2(42 = 101010 2 ) h( 〈 14, true, ‘cat’ 〉 ) → 6(14 = 1110 2 ) h( 〈 26, false, ‘mouse’ 〉 ) → 2(26 = 11010 2 ) A hashed file uses a hash function h to map a given record onto a specific page of the file. Example: h uses the lower 3 bits of the first field (of type INTEGER ) of the record to compute the corresponding page number. h( 〈 42, true, ‘dog’ 〉 ) → 2(42 = 101010 2 ) h( 〈 14, true, ‘cat’ 〉 ) → 6(14 = 1110 2 ) h( 〈 26, false, ‘mouse’ 〉 ) → 2(26 = 11010 2 ) Slides Credit: Michael Grossniklaus – Uni-Konstanz

11 Cost of Scan 11 Slides Credit: Michael Grossniklaus – Uni-Konstanz

12 Hashed File 12  Scanning a hashed file In which order does a scan of a hashed file retrieve its records? Slides Credit: Michael Grossniklaus – Uni-Konstanz

13 Cost of Search with Equality Test 13  Nevertheless, no DBMS will implement binary search for value lookup Why? Slides Credit: Michael Grossniklaus – Uni-Konstanz

14 Cost of Search with Equality Test 14 Slides Credit: Michael Grossniklaus – Uni-Konstanz

15 Cost of Search with Range Selection 15 Slides Credit: Michael Grossniklaus – Uni-Konstanz

16 Cost of Insert 16 Slides Credit: Michael Grossniklaus – Uni-Konstanz

17 Cost of Delete 17 Slides Credit: Michael Grossniklaus – Uni-Konstanz

18 Performance Comparison Performance of range selections for files of increasing size (D = 15 ms, C = 0.1 μs, r = 100, n = 10) 18  Performance graph Figure Credit: Marc H. Scholl, University of Konstanz, Germany

19 Performance Comparison Performance of deletions for files of increasing size (D = 15 ms, C = 0.1 μs, r = 100, n = 1) 19  Performance graph Figure Credit: Marc H. Scholl, University of Konstanz, Germany

20 And the Winner Is… There is no single file organization that responds equally fast to all five operations This is a dilemma because more advanced file organizations can make a real difference in speed (see previous slides) There exist index structures which offer all advantages of a sorted file and support insertions/deletions efficiently (at the cost of a modest space overhead): B+ trees Before discussing B+ trees in detail, the following introduces the index concept in general 20 Slides Credit: Michael Grossniklaus – Uni-Konstanz


Download ppt "Database System Architecture and Implementation Execution Costs 1 Slides Credit: Michael Grossniklaus – Uni-Konstanz."

Similar presentations


Ads by Google