Implementation of Relational Operators / Estimated Cost: 1. Select 2. Join



Sailors(sid, sname, rating, age)
– Each Sailors tuple is 50 bytes long (fixed-length record format)
– Page size is 4K bytes
– #tuples: 40,000
– All pages are full, unpacked bitmap; 96 bytes are reserved for the slot directory
– How many pages for Sailors?
  o One page can contain at most (4096 - 96)/50 = 80 tuples
  o Sailors occupies 40,000/80 = 500 pages

Cost of Selection Operators

Factors to Consider
– No index
  o unsorted data
  o sorted data
– Index
  o tree index
  o hash-based index

No index, unsorted data
– Suppose R is Sailors
– Best access path: file scan
– I/O cost: 500 pages
– I/O time cost: 500 * time to access each page
– Complexity: O(|R|)
– Notation: |R| is the number of pages in R
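The 500-page figure follows from the Sailors parameters (50-byte tuples, 4K pages with 96 bytes reserved, 40,000 tuples). A minimal sketch of that arithmetic, assuming 4K means 4096 bytes; the function names are illustrative and not from the slides:

```python
import math

def tuples_per_page(page_bytes, reserved_bytes, tuple_bytes):
    """Whole tuples that fit in the usable part of one page."""
    return (page_bytes - reserved_bytes) // tuple_bytes

def pages_needed(num_tuples, per_page):
    """Pages needed when every page is filled to capacity."""
    return math.ceil(num_tuples / per_page)

# Sailors: 4K page (assumed 4096 bytes), 96 bytes reserved, 50-byte tuples
per_page = tuples_per_page(4096, 96, 50)
sailors_pages = pages_needed(40_000, per_page)

# A file scan reads every page once, so its I/O cost equals |R|
file_scan_cost = sailors_pages
print(per_page, sailors_pages, file_scan_cost)
```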

No index, sorted file on R.A
– Suppose R is Sailors, sorted on R.A
– Best access path:
  o Binary search to locate the first tuple with R.A = value
  o Scan the remaining records
– I/O cost: log2(|R|) + cost of scan for remaining tuples (0 ~ |R|)
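As a rough illustration of this formula for the 500-page Sailors file, the sketch below adds the binary-search cost to the number of pages scanned afterwards. Rounding log2 up to a whole page read is an assumption made here, and the function name is hypothetical:

```python
import math

def sorted_file_selection_cost(num_pages, pages_scanned):
    """Estimated page I/Os: binary search, then a scan of the pages that follow."""
    # Assumption: each binary-search probe reads one page, rounded up
    binary_search_ios = math.ceil(math.log2(num_pages))
    return binary_search_ios + pages_scanned

R_PAGES = 500  # |R| for Sailors

# The scan part ranges from 0 pages to |R| pages
best_case = sorted_file_selection_cost(R_PAGES, 0)
worst_case = sorted_file_selection_cost(R_PAGES, R_PAGES)
print(best_case, worst_case)
```

Under these assumptions the estimate lies between 9 and 509 page I/Os, depending on how many tuples qualify.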

Tree Index on R.A
Selection cost = cost of traversing from the root to the leaf + cost of retrieving the pages in the sequence set + cost of retrieving the pages containing the data records.
Need to know
– Format of the data entries in the leaf node
– Clustered or unclustered
– Dense or sparse
[Figure labels: Index entries, Data entries, Data Records, Page (node)]

Alternatives for Data Entry k* in Index
– Alternative 1: Data entry is the actual tuple
  o Tuples are organized according to the search key value
  o Only one Alternative 1 index is allowed per relation
[Figure: tree index (Root) over data records A to L, with the records themselves as the data entries: (F, 10), (G, 15), (H, 27), (I, 33), (J, 40), (K, 55), (L, 97), (B, 20), (C, 33), (D, 51), (E, 63), (A, 40)]

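Alternatives for Data Entry k* in Index
– Alternative 2: Data entry is <k, rid of the data record with search key value k>
  o Ex.: search key value = 50; the rid tells that the record is in page 1, slot 10
[Figure: Root above data entries 10*, 15*, 20*, 27*, 33*, 40*, 51*, 55*, 63*, 97*, which point to data records A to L]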

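Alternatives for Data Entry k* in Index
– Alternative 3: Data entry is <k, list of rids of the data records with search key value k>
  o Ex.: search key value = 50; three record IDs
[Figure: Root above data entries 10*, 15*, 20*, 27*, 33*, 40*, 51*, 55*, 63*, 97*, which point to data records A to L]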

Formats of Data Entries: Pros and Cons
– Data entries are typically much smaller than data records. So Alternatives 2 and 3 are better than Alternative 1 with large data records, especially if search keys are small.
– If more than one index is required on a given file, at most one index can use Alternative 1; the rest must use Alternative 2 or 3.
– Alternative 3 is more compact than Alternative 2, but leads to variable-sized data entries even if search keys are of fixed length.
[Figure labels: Index entries, Data entries, Data Records]

Dense or Sparse
– Dense: at least one data entry per data record
– Sparse: at least one data entry per block/page
Pros and cons:
– Dense: less space-efficient, but great for both equality and range search
– Sparse: more space-efficient, but needs a sequential search within a page
[Figure: DENSE INDEX and SPARSE INDEX side by side; index entries direct the search to the data entries (index file), which point to the data records (data file)]
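The difference between the two lookups can be sketched as follows. This is an illustration only: the page layout, the three-records-per-page capacity, and the function names are assumptions, and the keys are borrowed from the data entries shown in the slides' figures.

```python
from bisect import bisect_right

# A file sorted on the search key, split into pages (hypothetical layout)
pages = [[10, 15, 20], [27, 33, 40], [51, 55, 63], [97]]

# Dense index: one data entry per record, mapping key -> (page, slot)
dense_index = {key: (p, s) for p, page in enumerate(pages) for s, key in enumerate(page)}

# Sparse index: one data entry per page, holding the first key on that page
sparse_index = [page[0] for page in pages]

def dense_lookup(key):
    """The index alone tells us the exact page and slot."""
    return dense_index.get(key)

def sparse_lookup(key):
    """Find the page whose first key is <= key, then search inside that page."""
    p = bisect_right(sparse_index, key) - 1
    if p < 0:
        return None  # key is smaller than every key in the file
    for s, k in enumerate(pages[p]):  # sequential search within the page
        if k == key:
            return (p, s)
    return None

print(dense_lookup(33), sparse_lookup(33))
```

The sparse index holds 4 entries against 10 for the dense one, at the price of the in-page scan, and it only works because the file itself is sorted on the key.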

Dense or Sparse
[Figure: Root above data entries 10*, 15*, 20*, 27*, 33*, 40*, 51*, 55*, 63*, 97*, which point to data records A to L]

Dense or Sparse
[Figure: Root above data records A to L]

Clustered or Unclustered
– Clustered index: the data records are ordered the same as, or close to, the data entries in the index
– A sparse index is always clustered (why?)
– A clustered index does not have to be sparse (why?)
Pros and cons:
– Clustered: high maintenance cost, but great for range search
– Unclustered: low maintenance cost, but high retrieval cost
  o Retrieving one record may need to load one page

Index Classification
– Primary vs. secondary: if the search key contains the primary key, the index is called a primary index.
– Unique index: the search key contains a candidate key.
– Non-unique index: the search key is a non-candidate attribute
  o Multiple records may be associated with the same attribute value

B+ Tree: the Most Widely Used Index for RDBMS
– Supports equality and range searches efficiently
– Balanced tree
– Search/insert/delete at log_F N cost (F = fanout, N = # leaf pages)
  o F => 100
  o The cost of traversing from root to data entries is usually regarded as 4
– Minimum 50% occupancy (except for root)
  o Each node except the root contains d <= m <= 2d entries
  o The root node contains 1 <= m <= 2d entries
  o The parameter d is called the order of the tree
[Figure: index entries (direct search) above the data entries ("sequence set"), which are doubly linked (why?)]
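One way to see why a traversal cost of about 4 is plausible: with a fanout around 100, four levels of index nodes already reach 100^4 = 100,000,000 leaf pages. The sketch below computes the smallest number of levels h with F^h >= N; the function name and the sample values of N are assumptions for illustration.

```python
def levels_needed(num_leaf_pages, fanout):
    """Smallest h such that fanout**h >= num_leaf_pages (integer arithmetic only)."""
    levels = 0
    reachable = 1  # leaf pages reachable from a single node with `levels` levels
    while reachable < num_leaf_pages:
        reachable *= fanout
        levels += 1
    return levels

small = levels_needed(500, 100)          # a 500-leaf-page index
large = levels_needed(100_000_000, 100)  # 10^8 leaf pages
print(small, large)
```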

B+ Tree Example
– Search begins at the root, and key comparisons direct it to a leaf.
– Search for tuples whose search key = 15
  o Follow the left pointer if the desired value is less than the value in the node
  o Otherwise, follow the right pointer
[Figure: B+ tree; labels: Root, 13; leaf data entries 3*, 5*, 7*, 14*, 16*, 19*, 20*, 22*, 24*, 27*, 29*, 33*, 34*, 38*, 39*]
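The search rule can be sketched in a few lines. This is not the tree from the slide: the `Node` class, the two-level tree, and its keys are hypothetical, chosen only to show the left/right decision at each node.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for a leaf

    @property
    def is_leaf(self):
        return not self.children

def search(root, key):
    """Return (found, pages_read) for an equality search on `key`."""
    node, pages_read = root, 1
    while not node.is_leaf:
        # bisect_right counts the node keys <= key, so we go left of a node
        # key when the search key is smaller, and right of it otherwise
        node = node.children[bisect_right(node.keys, key)]
        pages_read += 1
    return key in node.keys, pages_read

# Hypothetical tree: one root with two keys and three leaves
root = Node(keys=[13, 24], children=[
    Node(keys=[3, 5, 7]),
    Node(keys=[14, 16, 19]),
    Node(keys=[24, 27, 29]),
])

print(search(root, 15))  # lands in the middle leaf, which has no 15
print(search(root, 14))
```

Every search reads one page per level, which is why the traversal cost depends only on the height of the tree.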

B+ Tree Example
[Figure: the same B+ tree (Root, 13; leaf data entries 3* to 39*), annotated "dense" and "sparse"]

Dense/Sparse Clustered/Unclustered
[Figure: the same B+ tree (Root, 13; leaf data entries 3* to 39*)]

B+ Tree Index on R.A
– Format of data entry: Alternative 2
– Size of data entry = 20 bytes
– Page size = 4K bytes; 96 bytes are reserved
– Total number of records = 100,000; record size = 40 bytes
– Reduction factor = #matching entries / #total entries = 0.1
Total cost = cost of traversing from the root to the leaf (assume 4 I/Os) + cost of retrieving the pages in the sequence set + cost of retrieving the pages containing the data records

B+ tree, Alternative 2, dense, unclustered
I/O cost of retrieving pages of qualifying data entries
– Matching data entries: 0.1 * 100,000 = 10,000 entries
– #Data entries per page: (4096 - 96)/20 = 200
– Pages of matching data entries = 10,000/200 = 50 pages
I/O cost of retrieving qualifying tuples
– 10,000 pages: since the index is unclustered, the qualifying tuples are not always in the same order as the data entries
– In the worst case, one I/O is needed for each qualifying data entry
Total I/O cost = 4 + 50 + 10,000 = 10,054 pages
[Figure: UNCLUSTERED INDEX; data entries pointing to data records]

B+ tree, Alternative 2, dense, clustered
I/O cost of retrieving pages of qualifying data entries
– Matching data entries: 0.1 * 100,000 = 10,000 entries
– #Data entries per page: (4096 - 96)/20 = 200
– #Pages of matching data entries = 10,000/200 = 50 pages
I/O cost of retrieving qualifying tuples
– #Matching tuples: 10,000
– Since the index is dense and clustered, the qualifying tuples are also clustered
– #Pages: 10,000/100 = 100, due to (4096 - 96)/40 = 100 tuples per page
Total I/O cost = 4 + 50 + 100 = 154 pages
[Figure: CLUSTERED INDEX; data entries pointing to data records]

B+ tree, Alternative 2, sparse (must be clustered)
I/O cost of retrieving qualifying tuples
– #Matching tuples: 0.1 * 100,000 = 10,000
– Since the index is clustered, the qualifying tuples are also clustered
– #Pages: 10,000/100 = 100, due to 100 tuples per page
I/O cost of retrieving pages of qualifying data entries
– Matching data pages: 100 (each data entry links to one page)
– #Data entries per page: (4096 - 96)/20 = 200
– #Pages of matching data entries = 100/200, rounded up = 1 page
Total I/O cost = 4 + 1 + 100 = 105 pages
[Figure: CLUSTERED INDEX, SPARSE INDEX; data entries pointing to data records]
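The three B+ tree estimates differ only in how many data-entry pages and data pages are read. The sketch below reproduces them from the stated parameters, assuming 4K means 4096 bytes, a fixed traversal cost of 4 I/Os, and the worst case of one I/O per qualifying tuple for the unclustered index; all variable names are illustrative.

```python
import math

PAGE_BYTES, RESERVED_BYTES = 4096, 96          # 4K page, 96 bytes reserved
NUM_RECORDS, RECORD_BYTES = 100_000, 40
ENTRY_BYTES = 20                               # Alternative 2 data entry
REDUCTION_FACTOR = 0.1
TRAVERSAL_IOS = 4                              # root-to-leaf cost assumed by the slides

entries_per_page = (PAGE_BYTES - RESERVED_BYTES) // ENTRY_BYTES
records_per_page = (PAGE_BYTES - RESERVED_BYTES) // RECORD_BYTES
matching = round(REDUCTION_FACTOR * NUM_RECORDS)

# Dense index: one data entry per matching record
dense_entry_pages = math.ceil(matching / entries_per_page)
# Clustered file: matching records sit on consecutive pages
clustered_data_pages = math.ceil(matching / records_per_page)
# Sparse index: one data entry per matching data page
sparse_entry_pages = math.ceil(clustered_data_pages / entries_per_page)

costs = {
    # worst case: every qualifying tuple is on a different page
    "dense, unclustered": TRAVERSAL_IOS + dense_entry_pages + matching,
    "dense, clustered": TRAVERSAL_IOS + dense_entry_pages + clustered_data_pages,
    "sparse (clustered)": TRAVERSAL_IOS + sparse_entry_pages + clustered_data_pages,
}
for name, cost in costs.items():
    print(f"{name}: {cost} page I/Os")
```

Clustering, not density, accounts for almost all of the gap: the data-page term drops from 10,000 to 100.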

Hash-based Index
– A hash-based index on A is good for equality search on attribute A
– Usually cannot support range search

Hashed index
[Figure: hash function h2 applied to sal (h2(sal)=00 ... h2(sal)=11) selects a bucket of data entries, which point to the data records in the data file]
Data file: Ashby, 25, 3000; Smith, 44, 3000; Bristow, 30, 2007; Basu, 33, 4003; Cass, 50, 5004; Tracy, 44, 5004; Daniels, 22, 6003; Jones, 40, 6003
I/O costs
1) Cost for retrieving the matching data entries
2) Cost for retrieving the qualifying tuples
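A small sketch of an equality search through a hashed index on sal, using the records from the data file above. The hash function (sal modulo 4), the field names, and the use of list positions as rids are assumptions for illustration; the slides do not define h2.

```python
from collections import defaultdict

# Data records from the data file; the third field is sal
# (the names "name" and "age" for the first two fields are assumed)
records = [
    ("Ashby", 25, 3000), ("Smith", 44, 3000), ("Bristow", 30, 2007),
    ("Basu", 33, 4003), ("Cass", 50, 5004), ("Tracy", 44, 5004),
    ("Daniels", 22, 6003), ("Jones", 40, 6003),
]

NUM_BUCKETS = 4

def h(sal):
    """Hypothetical hash function standing in for h2."""
    return sal % NUM_BUCKETS

# Alternative 2 data entries <sal, rid>, grouped into buckets by h(sal)
buckets = defaultdict(list)
for rid, (_name, _age, sal) in enumerate(records):
    buckets[h(sal)].append((sal, rid))

def equality_search(sal):
    """Step 1: fetch the matching data entries. Step 2: fetch the tuples."""
    entries = [(k, rid) for k, rid in buckets.get(h(sal), []) if k == sal]
    return [records[rid] for _k, rid in entries]

print(equality_search(3000))
```

A range search such as sal > 4000 gets no help from this structure, because neighbouring sal values hash to unrelated buckets.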

I/O cost = cost for retrieving the matching data entries + cost for retrieving the qualifying tuples
Assumptions
– Total number of records: 100,000
– Reduction factor: 0.1
– Cost for searching the matching data entry: 1.2 I/Os
– The index is dense and unclustered
I/O cost of retrieving pages of matching data entries
– Matching data entries: 0.1 * 100,000 = 10,000 entries
– I/O cost = 10,000 * 1.2 = 12,000 I/Os
I/O cost for retrieving the qualifying tuples = 10,000 I/Os
Total cost = 12,000 + 10,000 = 22,000 I/Os
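The hash-index estimate can be checked the same way. The sketch assumes, as stated, 1.2 I/Os to locate each matching data entry and, because the index is unclustered, one I/O per qualifying tuple in the worst case; the names are illustrative.

```python
NUM_RECORDS = 100_000
REDUCTION_FACTOR = 0.1
IOS_PER_ENTRY_LOOKUP = 1.2   # average cost to reach one matching data entry

matching = round(REDUCTION_FACTOR * NUM_RECORDS)

# Each matching data entry is located separately through the hash function
entry_ios = round(matching * IOS_PER_ENTRY_LOOKUP)
# Unclustered: assume every qualifying tuple costs its own page read
tuple_ios = matching

total_ios = entry_ios + tuple_ios
print(entry_ios, tuple_ios, total_ios)
```

At 22,000 I/Os this is more than twice the 10,054 estimated for the dense, unclustered B+ tree on the same workload, which fits the earlier point that hash indexes suit equality searches with few matches rather than selections with a 0.1 reduction factor.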
