Database System Architecture and Implementation: Execution Costs
Slides Credit: Michael Grossniklaus – Uni-Konstanz

Orientation
[Figure: DBMS architecture. Applications and web forms send SQL commands to the SQL interface. Inside the DBMS: parser, optimizer, operator evaluator, and executor; files and index structures; buffer manager; disk space manager; transaction manager, lock manager, and recovery manager. The database holds the index and data files and the catalog. A "We are here!" marker highlights the layer covered by this module.]
Figure Credit: Raghu Ramakrishnan and Johannes Gehrke: “Database Management Systems”, McGraw-Hill

Recall Heap Files
Heap files provide just enough structure to maintain a collection of records (of a table). The heap file supports a sequential scan (openScan(∙)) over the collection. No other operations get specific support from heap files.
SQL query leading to a sequential scan:
SELECT A, B FROM R
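A minimal Python sketch of a heap file, assuming records are plain tuples and pages are in-memory lists of fixed capacity. HeapFile, insert, and open_scan are hypothetical names (open_scan merely mirrors the slide's openScan(∙)). The only access path is the sequential scan, which is all that SELECT A, B FROM R needs.

```python
from typing import Iterator, List, Tuple

Record = Tuple  # hypothetical: a record is a plain tuple such as (A, B, C)


class HeapFile:
    """Toy heap file: records are kept on fixed-capacity pages in arrival order."""

    def __init__(self, records_per_page: int) -> None:
        self.records_per_page = records_per_page
        self.pages: List[List[Record]] = []

    def insert(self, record: Record) -> None:
        # Append to the last page; start a new page once it is full.
        if not self.pages or len(self.pages[-1]) == self.records_per_page:
            self.pages.append([])
        self.pages[-1].append(record)

    def open_scan(self) -> Iterator[Record]:
        # Sequential scan: every page in file order, every record on each page.
        for page in self.pages:
            yield from page


R = HeapFile(records_per_page=2)
for rec in [(1, "x", 50), (2, "y", 10), (3, "z", 70)]:
    R.insert(rec)

# SELECT A, B FROM R: project each record delivered by the scan.
result = [(a, b) for (a, b, _c) in R.open_scan()]
print(result)  # [(1, 'x'), (2, 'y'), (3, 'z')]
```

Nothing in this structure helps with a predicate or a sort order; any such query still has to read every page.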

Systematic File Organization
For the following queries, it would definitely be helpful if the SQL query processor could rely on a particular organization of the records in the file for table R.
SQL queries calling for systematic file organization:
SELECT A, B FROM R WHERE C > 45
SELECT A, B FROM R ORDER BY C ASC
Exercise: Which organization of records in the file for table R could speed up the evaluation of both queries above?

Systematic File Organization (solution)
Answer to the exercise above:
– Allocate records of table R in ascending order of attribute C values.
– Place records in neighboring pages.
– (Only include columns A, B, and C in the records.)
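To illustrate why this organization helps both queries, here is a sketch assuming an in-memory file of pages sorted on C; build_sorted_file, select_where_c_greater, and order_by_c are hypothetical names. The range predicate becomes "find the first qualifying record, then read on", and ORDER BY C ASC becomes a plain scan.

```python
import bisect
from typing import List, Tuple

Record = Tuple[int, str, int]  # (A, B, C)
Page = List[Record]


def build_sorted_file(records: List[Record], page_size: int) -> List[Page]:
    """Hypothetical helper: sort records on C and pack them into neighboring pages."""
    ordered = sorted(records, key=lambda rec: rec[2])
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]


def scan(pages: List[Page]) -> List[Record]:
    # Reading the pages in order yields the records in ascending C order.
    return [rec for page in pages for rec in page]


def select_where_c_greater(pages: List[Page], bound: int) -> List[Tuple[int, str]]:
    # WHERE C > bound: binary-search for the first qualifying record, then read on.
    # (This in-memory toy flattens the pages first; a DBMS would probe pages on disk.)
    records = scan(pages)
    start = bisect.bisect_right([rec[2] for rec in records], bound)
    return [(a, b) for (a, b, _c) in records[start:]]


def order_by_c(pages: List[Page]) -> List[Tuple[int, str]]:
    # ORDER BY C ASC: the physical file order already is the requested order.
    return [(a, b) for (a, b, _c) in scan(pages)]


R = build_sorted_file([(1, "x", 50), (2, "y", 10), (3, "z", 70), (4, "w", 45)], page_size=2)
print(select_where_c_greater(R, 45))  # [(1, 'x'), (3, 'z')]
print(order_by_c(R))                  # [(2, 'y'), (4, 'w'), (1, 'x'), (3, 'z')]
```

bisect_right is used so that a record with C = 45 is excluded, matching the strict predicate C > 45.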

Module Overview
Three different file organizations:
1. files containing randomly ordered records (heap files)
2. files sorted on one or more record fields
3. files hashed on one or more record fields
Comparison of file organizations
– simple cost model
– application of the cost model to file operations
Introduction to the index concept
– clustered vs. unclustered indexes
– dense vs. sparse indexes

Comparison of File Organizations
Competition of three file organizations in five disciplines:
1. scan: read all records in a given file
2. search with equality test
3. search with range selection (upper or lower bound may be unspecified)
4. insert a given record in the file, respecting the file’s organization
5. delete a record (identified by its rid), maintaining the file’s organization
SQL queries calling for equality test and range selection support:
SELECT * FROM R WHERE C = 45
SELECT * FROM R WHERE A > 0 AND A < 100

Simple Cost Model
A cost model is used to analyze the execution time of a given database operation:
– block I/O operations are typically a major cost factor
– CPU time accounts for searching inside a page, comparing a record field to a selection constant, etc.
To estimate the execution time of the five database operations, we introduce a coarse cost model that
– omits the cost of network access
– does not consider cache effects
– neglects burst I/O
– …
Cost models play an important role in query optimization.

Simple Cost Model
Simple cost model parameters:
Parameter   Description
b           number of pages in the file
r           number of records on a page
D           time to read/write a disk page
C           CPU time needed to process a record (e.g., compare a field value)
H           CPU time taken to apply a function to a record (e.g., a comparison or hash function)
Some typical values: D ≈ 15 ms, C ≈ H ≈ 0.1 μs
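To make the parameters concrete, a sketch that plugs the typical values into the cost of reading a whole file once, b·(D + r·C). The formula and the file size b = 1000 are illustrative assumptions, not taken from the slides; CostParams and full_scan_cost are hypothetical names. The numbers show how strongly the I/O term dominates: 15 s of disk time against 0.01 s of CPU time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    b: int      # number of pages in the file
    r: int      # number of records on a page
    D: float    # seconds to read/write one disk page
    C: float    # seconds of CPU time to process one record
    H: float    # seconds of CPU time to apply a function to a record


def full_scan_cost(p: CostParams) -> float:
    # Read every page (D each) and process every record on it (r * C per page).
    return p.b * (p.D + p.r * p.C)


typical = CostParams(b=1000, r=100, D=15e-3, C=0.1e-6, H=0.1e-6)
print(round(full_scan_cost(typical), 2))  # 15.01
```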

Back to the Future
A simple hash function: a hashed file uses a hash function h to map a given record onto a specific page of the file. Example: h uses the lower 3 bits of the first field (of type INTEGER) of the record to compute the corresponding page number.
h(〈42, true, ‘dog’〉) → 2 (42 = 101010₂)
h(〈14, true, ‘cat’〉) → 6 (14 = 1110₂)
h(〈26, false, ‘mouse’〉) → 2 (26 = 11010₂)
The hash function determines the page number only; record placement inside a page is not prescribed. If a page p is filled to capacity, a chain of overflow pages is maintained to store additional records with h(〈…〉) = p. To avoid immediate overflow when a new record is inserted, pages are typically filled to only 80% when a heap file is initially (re)organized into a hash file.
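The example hash function and the overflow chains can be sketched as follows, assuming 8 primary pages (one per 3-bit value) and a hypothetical page_capacity of one record, chosen so that the collision between 42 and 26 immediately creates an overflow page. HashedFile and its fields are illustrative names; the 80% initial fill is not modeled.

```python
from typing import Dict, List, Tuple

NUM_PAGES = 8  # the lower 3 bits address pages 0..7


def h(record: Tuple) -> int:
    # Page number = lower 3 bits of the first (INTEGER) field.
    return record[0] & 0b111


class HashedFile:
    """Toy hashed file: one primary page per hash value plus an overflow chain."""

    def __init__(self, page_capacity: int) -> None:
        self.page_capacity = page_capacity
        # chains[p] = [primary page, overflow page 1, overflow page 2, ...]
        self.chains: Dict[int, List[List[Tuple]]] = {p: [[]] for p in range(NUM_PAGES)}

    def insert(self, record: Tuple) -> None:
        chain = self.chains[h(record)]
        if len(chain[-1]) == self.page_capacity:
            chain.append([])  # last page of the chain is full: add an overflow page
        chain[-1].append(record)


recs = [(42, True, "dog"), (14, True, "cat"), (26, False, "mouse")]
f = HashedFile(page_capacity=1)
for rec in recs:
    f.insert(rec)
print([h(rec) for rec in recs])  # [2, 6, 2]
print(len(f.chains[2]))          # 2: primary page 2 plus one overflow page
```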

Cost of Scan

Hashed File
Scanning a hashed file: In which order does a scan of a hashed file retrieve its records?
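One way to see the answer, assuming the 3-bit hash function from the earlier example and ignoring overflow pages: a scan reads the pages in physical order 0, 1, ..., 7, so records come back grouped by hash value. That is neither insertion order nor key order. The helper names below are hypothetical.

```python
from typing import List, Tuple


def h(record: Tuple) -> int:
    return record[0] & 0b111  # lower 3 bits of the first field


def hashed_file(records: List[Tuple], num_pages: int = 8) -> List[List[Tuple]]:
    # num_pages must be 8 to match the 3-bit hash; overflow chains ignored for brevity.
    pages: List[List[Tuple]] = [[] for _ in range(num_pages)]
    for rec in records:
        pages[h(rec)].append(rec)
    return pages


def scan(pages: List[List[Tuple]]) -> List[Tuple]:
    # A scan simply reads the pages in physical order.
    return [rec for page in pages for rec in page]


recs = [(14, True, "cat"), (42, True, "dog"), (26, False, "mouse"), (9, True, "owl")]
order = [rec[0] for rec in scan(hashed_file(recs))]
print(order)  # [9, 42, 26, 14]
```

Insertion order was 14, 42, 26, 9 and key order would be 9, 14, 26, 42; the scan follows the hash values 1, 2, 2, 6 instead.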

Cost of Search with Equality Test
Nevertheless, no DBMS will implement binary search for value lookup. Why?
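A hedged sketch of what binary search costs on a sorted file, assuming pages are in-memory lists sorted on the key and a hypothetical file of b = 1024 pages with r = 100 keys each. Every probe touches a different page, so a lookup needs roughly log2 b (at most ⌊log2 b⌋ + 1 = 11 here) random page reads, i.e. up to about 165 ms at D = 15 ms. Tree indexes with a large fan-out need only a handful of I/Os for the same lookup, which is one common answer to the question above.

```python
from typing import List, Tuple


def binary_search_pages(pages: List[List[int]], key: int) -> Tuple[bool, int]:
    """Look up key in a file whose pages are sorted on the key; count page reads."""
    reads = 0
    lo, hi = 0, len(pages) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        page = pages[mid]
        reads += 1  # every probe lands on a different page: one random I/O
        if key < page[0]:
            hi = mid - 1
        elif key > page[-1]:
            lo = mid + 1
        else:
            return key in page, reads  # key range matches: search inside this page
    return False, reads


# Hypothetical file: b = 1024 pages with r = 100 ascending keys each (0 .. 102399)
pages = [list(range(p * 100, p * 100 + 100)) for p in range(1024)]
found, reads = binary_search_pages(pages, 77_777)
max_reads = len(pages).bit_length()  # floor(log2 b) + 1 = 11 for b = 1024
print(found, reads, "page reads; worst case:", max_reads)
```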

Cost of Search with Equality Test

Cost of Search with Range Selection

Cost of Insert

Cost of Delete
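The cost derivations on these slides were not preserved in this transcript. As a stand-in, the sketch below tabulates the standard estimates from Ramakrishnan and Gehrke's textbook cost model, assuming searches on a key field, hashed pages filled to 80% (hence the factor 1.25 for scans) and no overflow chains. Treat the exact formulas as an assumption about what the slides showed; cost_table and matching_pages are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CostParams:
    b: int      # pages in the file
    r: int      # records per page
    D: float    # seconds per page read/write
    C: float    # seconds to process one record
    H: float    # seconds to apply the hash function


def cost_table(p: CostParams, matching_pages: int = 1) -> dict:
    """Textbook estimates in seconds; searches are on a key field."""
    page = p.D + p.r * p.C                     # read one page and process its r records
    full = p.b * page                          # read the whole file
    heap_eq = 0.5 * full                       # on average, scan half the file
    sorted_eq = p.D * math.log2(p.b) + p.C * math.log2(p.r)  # binary search
    hashed_eq = p.H + p.D + 0.5 * p.r * p.C    # hash, read one page, search half of it
    return {
        "heap": {"scan": full, "equality": heap_eq, "range": full,
                 # insert: read last page, add record, write it back
                 "insert": 2 * p.D + p.C, "delete": heap_eq + p.C + p.D},
        "sorted": {"scan": full, "equality": sorted_eq,
                   "range": sorted_eq + matching_pages * page,
                   # read and write back, on average, half of the file: 2 * 0.5 * full
                   "insert": sorted_eq + full, "delete": sorted_eq + full},
        "hashed": {"scan": 1.25 * full, "equality": hashed_eq, "range": 1.25 * full,
                   "insert": hashed_eq + p.C + p.D, "delete": hashed_eq + p.C + p.D},
    }


table = cost_table(CostParams(b=1000, r=100, D=15e-3, C=0.1e-6, H=0.1e-6))
for org, ops in table.items():
    print(org, {op: round(sec, 3) for op, sec in ops.items()})
```

The table reproduces the dilemma discussed below: hashing wins equality search, the sorted file wins range selection, and the heap file wins insertion.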

Performance Comparison
Performance of range selections for files of increasing size (D = 15 ms, C = 0.1 μs, r = 100, n = 10)
[Figure: performance graph]
Figure Credit: Marc H. Scholl, University of Konstanz, Germany

Performance Comparison
Performance of deletions for files of increasing size (D = 15 ms, C = 0.1 μs, r = 100, n = 1)
[Figure: performance graph]
Figure Credit: Marc H. Scholl, University of Konstanz, Germany

And the Winner Is…
There is no single file organization that responds equally fast to all five operations. This is a dilemma, because more advanced file organizations can make a real difference in speed (see the previous slides). There exist index structures that offer all the advantages of a sorted file and support insertions and deletions efficiently (at the cost of a modest space overhead): B+ trees. Before discussing B+ trees in detail, the following introduces the index concept in general.