Lecture 20: Indexing Structures

Slides:



Advertisements
Similar presentations
Disk Storage, Basic File Structures, and Hashing
Advertisements

Chapter 7 Indexing Structures for Files Copyright © 2004 Ramez Elmasri and Shamkant Navathe.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Hashing and Indexing John Ortiz.
Calculate the record size R in bytes.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Chapter 14 Indexing Structures for Files Copyright © 2004 Ramez Elmasri and Shamkant Navathe.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 18 Indexing Structures for Files.
1 Lecture 8: Data structures for databases II Jose M. Peña
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
COMP 451/651 Indexes Chapter 1.
Index on EmpID PRIMARY INDEX Key Field (No Repeat Values) Ordering field (records are ordered by the field value) SECONDARY – KEY INDEX Key Field (No Repeat.
Database Systems Chapters ITM 354. The Database Design and Implementation Process Phase 1: Requirements Collection and Analysis Phase 2: Conceptual.
1 Indexing Structures for Files. 2 Basic Concepts  Indexing mechanisms used to speed up access to desired data without having to scan entire.
Primary Indexes Dense Indexes
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 18 Indexing Structures for Files.
1 CS 728 Advanced Database Systems Chapter 17 Database File Indexing Techniques, B- Trees, and B + -Trees.
DISK STORAGE INDEX STRUCTURES FOR FILES Lecture 12.
Indexing dww-database System.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 File Organizations and Indexing Chapter 5, 6 of Elmasri “ How index-learning turns no student.
Chapter 14-1 Chapter Outline Types of Single-level Ordered Indexes –Primary Indexes –Clustering Indexes –Secondary Indexes Multilevel Indexes Dynamic Multilevel.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide
1 Index Structures. 2 Chapter : Objectives Types of Single-level Ordered Indexes Primary Indexes Clustering Indexes Secondary Indexes Multilevel Indexes.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Indexing Structures for Files by Pinar Senkul resources: mostly froom Elmasri, Navathe and.
Chapter 9 Disk Storage and Indexing Structures for Files Copyright © 2004 Pearson Education, Inc.
Indexing Structures for Files
1 Chapter 2 Indexing Structures for Files Adapted from the slides of “Fundamentals of Database Systems” (Elmasri et al., 2003)
Nimesh Shah (nimesh.s) , Amit Bhawnani (amit.b)
1 Overview of Database Design Process. Data Storage, Indexing Structures for Files 2.
Chapter- 14- Index structures for files
Indexing Methods. Storage Requirements of Databases Need data to be stored “permanently” or persistently for long periods of time Usually too big to fit.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 13 Disk Storage, Basic File Structures, and Hashing.
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
Chapter 6 Index Structures for Files 1 Indexes as Access Paths 2 Types of Single-level Indexes 2.1Primary Indexes 2.2Clustering Indexes 2.3Secondary Indexes.
Chapter 14 Indexing Structures for Files Copyright © 2004 Ramez Elmasri and Shamkant Navathe.
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 18 Indexing Structures for Files.
Indexing Structures Database System Implementation CSE 507 Some slides adapted from R. Elmasri and S. Navathe, Fundamentals of Database Systems, Sixth.
CS4432: Database Systems II
Chapter 11 Indexing And Hashing (1) Yonsei University 1 st Semester, 2016 Sanghyun Park.
10/3/2017 Chapter 6 Index Structures.
Indexing Structures for Files
Indexing Structures for Files
Chapter Outline Indexes as additional auxiliary access structure
Indexing Structures for Files
Chapter # 14 Indexing Structures for Files
Indexing Structures for Files and Physical Database Design
Indexing and hashing.
CS 728 Advanced Database Systems Chapter 18
Database System Implementation CSE 507
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
11/14/2018.
Chapters 17 & 18 6e, 13 & 14 5e: Design/Storage/Index
File organization and Indexing
Chapter 11: Indexing and Hashing
Disk storage Index structures for files
Advance Database System
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Example 1: Given the following data file:
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Chapter 11 Indexing And Hashing (1)
INDEXING.
Indexing 4/11/2019.
Lecture 20: Indexes Monday, February 27, 2006.
CS4433 Database Systems Indexing.
Indexing Structures for Files
Chapter 11: Indexing and Hashing
Advance Database System
8/31/2019.
Lec 6 Indexing Structures for Files
Presentation transcript:

Lecture 20: Indexing Structures CSE 480: Database Systems Lecture 20: Indexing Structures

Index Mechanism to efficiently locate row(s) of a table without having to scan the entire table Analogous to a book index Index entry

Indexing Field An index is built based on an indexing field Records having a particular value for their indexing field can be quickly located The index can be built on one or more fields Example: index on (CrsCode, Semester) An index contains entries <field value, pointer to record>, ordered by field value The index file occupies considerably less disk blocks than the data file because its entries are much smaller than the records themselves A binary search on the index yields a pointer to the file record

Example of an Index

Types of Indexes Dense vs sparse Primary vs clustering vs secondary Single-level vs multi-level Static vs dynamic

Types of Indexes Dense index Sparse (or nondense) index has an index entry for each record in the data file. Sparse (or nondense) index has index entries for only a subset of the records in the data file

Types of Indexes Primary vs Clustering vs Secondary Index Depends on the indexing and ordering fields Indexing field is the field upon which an index is built Ordering field is the field upon which the data file is sorted Ordering key field is an ordering field that also corresponds to the key of the table Indexing field is ProfName Ordering field is ProfID ProfID is also the ordering key field (because it is a key)

Types of Indexes Primary Index Specified on the ordering key field of an ordered file of records Nondense (sparse) One index entry for each block in the data file The index entry has the key field value for the first record in the block, which is called the block anchor

Primary Index Example

Types of Indexes Clustering Index Specified on the ordering field that is not a key field Nondense (sparse) One index entry for each distinct value of the field; The index entry points to the first data block that contains records with that field value

Clustering Index Example

Types of Indexes Secondary Index Index defined on some non-ordering field of the data file Dense Includes one entry for each record in the data file The index entry points to either a block or a record

Secondary Index Example

Example EMPLOYEE(NAME, SSN, ADDRESS, JOB, SALARY, ... ) Record size R=150 bytes Block size B=512 bytes No of records r = 30000 records Blocking factor Bfr = B  R = 512  150 = 3 records/block Number of blocks needed to store the records b = r/Bfr = 30000/3 = 10000 blocks

Example Suppose we need to perform the following query: SELECT * FROM EMPLOYEE WHERE SSN = ‘1234567890’; Heap file (unsorted): Average cost of linear search (assuming SSN exists): (b/2) = 10000/2 = 5000 block accesses Sequential file (sorted on SSN) Average cost of binary search: log2 b = log2 10000 = 14 block accesses

Example Suppose there is a (sparse) primary index on the SSN field Field size for SSN, V = 10 bytes, Record pointer size PR = 4 bytes Index entry size Ri = (V + PR) = (10 + 4) = 14 bytes Index blocking factor Bfri = B  Ri  = 512  14 = 36 entries/block Number of index blocks bi = b/Bfri = 10000/36 = 278 blocks Binary search on index takes log2 bi = log2 278 = 9 block accesses Total cost = 10 block accesses (1 more to access the record itself)

Example Suppose there is a (dense) secondary index on the SSN field Field size for SSN, V = 10 bytes, Record pointer size PR = 4 bytes Index entry size Ri = (V + PR) = (10 + 4) = 14 bytes Index blocking factor Bfri = B  Ri  = 512  14 = 36 entries/block Number of index blocks bi = r/Bfri = 30000/36 = 834 blocks Binary search needs log2 bi = log2 834 = 10 block accesses Total cost = 11 block accesses (1 more to access the record itself)

Multi-Level Indexes Because a single-level index is an ordered file, we can create an index to the index itself In this case, the original index file is called the first-level index and the index to the index is called the second-level index We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block A multi-level index can be created for any type of first- level index (primary, secondary, clustering) as long as the first-level index has more than one disk block

Example: Two-level Primary Index

Example Suppose there is a 2-level index on the SSN field (assume index at the first level is sparse, i.e., a primary index) Field size for SSN, V = 10 bytes, Record pointer size PR = 4 bytes Index entry size Ri = (V + PR) = (10 + 4) = 14 bytes Index blocking factor Bfri = B / Ri  = 512  14 = 36 entries/block Number of index blocks at 1st level b1 = b/Bfri = 10000/36 = 278 blocks Number of index blocks at 2nd level b2 = b1/Bfri = 278/36 = 8 blocks Binary search needs log2 bi = log2 8 = 3 block accesses Total cost = 3 + 1 + 1 block accesses = 5 block accesses (2nd level) + (1st level) + (data block)

Example Suppose there is a 3-level index on the SSN field (assume index at first level is sparse) Field size for SSN, V = 10 bytes, Record pointer size PR = 4 bytes Index entry size Ri = (V + PR) = (10 + 4) = 14 bytes Index blocking factor Bfri = B  Ri  = 512  14 = 36 entries/block Number of index blocks at 1st level b1 = b/Bfri = 10000/36 = 278 blocks Number of index blocks at 2nd level b2 =  b1/Bfri = 278/36 = 8 blocks Number of index blocks at 3rd level b3 =  b2/Bfri = 8/32 = 1 block Total cost = 1 + 1 + 1 + 1 block accesses = 4 block accesses (3rd level) + (2nd level) + (1st level) + (data block)

Dynamic Multilevel Indexes Insertion and deletion from multilevel indexes can be quite a severe problem One possibility is to use overflow blocks and then do file reorganization periodically A better strategy is to use dynamic multi-level indexes Examples: B-tree and B+-tree (they are called search trees) In B-Tree and B+-Tree, each node corresponds to a disk block Each node is always kept at least half-full

A node in the search tree (B-tree or B+-tree) A Node in a Search Tree A node in the search tree (B-tree or B+-tree) Each node is stored in a disk block Within each node: K1 < K2 < … < Kq-1 For all values X in subtree pointed by Pi: Ki-1 < X < Ki (B-tree) or Ki-1  X < Ki (B+-tree)

Search tree A search tree of order p each node contains at most p – 1 search values

Nodes of a B+-tree Difference between an internal node and a leaf node

Example of a B+-tree Tree node pointer Data/record pointer Sibling pointer

Example: Search for 9

Example of Insertion in a B+-tree Insertion sequence: 8 5 1 7 3 12 9 6 Largest value among those stored in the left subtree ?

Example of Insertion in a B+-tree Insertion sequence: 8 5 1 7 3 12 9 6 ?

Example of Insertion in a B+-tree Insertion sequence: 8 5 1 7 3 12 9 6 ?

Example of Insertion in a B+-tree Insertion sequence: 8 5 1 7 3 12 9 6

Example of Insertion in a B+-tree Insertion sequence: 8 5 1 7 3 12 9 6

Example of Deletion in a B+-tree

Example of Deletion in a B+-tree Deletion sequence: 5 12 9

Underflow in a B+-tree Underflow when the number of entries in a node is below the minimum required Redistribute entries with the left sibling or This changes the search field values at higher levels of the tree Redistribute entries with the right sibling or Merge the 3 nodes into 2 nodes and redistribute the entries

Example of Deletion in a B+-tree Deletion sequence: 5 12 9

Example of Deletion in a B+-tree Final tree (after deletion)