Lecture 20: Indexing Structures

CSE 480: Database Systems

Index
A mechanism to efficiently locate row(s) of a table without having to scan the entire table; analogous to the index of a book.
(Figure: index entry)

Indexing Field
An index is built on an indexing field, so records having a particular value for their indexing field can be located quickly.
The index can be built on one or more fields, e.g., an index on (CrsCode, Semester).
An index contains entries <field value, pointer to record>, ordered by field value.
The index file occupies considerably fewer disk blocks than the data file because its entries are much smaller than the records themselves.
A binary search on the index yields a pointer to the file record.
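A minimal sketch of that lookup, assuming the index is held in memory as a Python list of (field value, pointer) pairs sorted by field value. The (block, slot) pointer format, the sample names, and the function `index_lookup` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical index: (field value, record pointer) pairs sorted by field value.
# A "pointer" here is just a (block number, slot) tuple.
index = [
    ("Adams", (4, 0)),
    ("Baker", (1, 2)),
    ("Chen", (3, 1)),
    ("Diaz", (0, 0)),
]
keys = [value for value, _ in index]  # search keys, in sorted order

def index_lookup(value):
    """Binary-search the index; return the record pointer or None."""
    pos = bisect_left(keys, value)
    if pos < len(keys) and keys[pos] == value:
        return index[pos][1]
    return None

print(index_lookup("Chen"))  # (3, 1)
print(index_lookup("Zed"))   # None
```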

Example of an Index

Types of Indexes
Dense vs. sparse
Primary vs. clustering vs. secondary
Single-level vs. multi-level
Static vs. dynamic

Types of Indexes: Dense vs. Sparse
A dense index has an index entry for each record in the data file.
A sparse (or nondense) index has index entries for only a subset of the records in the data file.
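To make the distinction concrete, here is a small sketch assuming a data file sorted on its key and stored as a list of blocks (each block a list of keys). A sparse index like this one relies on the file being sorted, since each entry stands for a whole block. The helper names are hypothetical.

```python
# Hypothetical data file: records sorted on a key, 3 records per block.
blocks = [[2, 5, 7], [9, 12, 15], [18, 21, 24]]

def dense_index(blocks):
    """One (key, block number) entry for every record."""
    return [(key, b) for b, block in enumerate(blocks) for key in block]

def sparse_index(blocks):
    """One (key, block number) entry per block: the block's first key."""
    return [(block[0], b) for b, block in enumerate(blocks)]

print(len(dense_index(blocks)))  # 9 entries, one per record
print(sparse_index(blocks))      # [(2, 0), (9, 1), (18, 2)]
```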

Types of Indexes: Primary vs. Clustering vs. Secondary Index
The type of index depends on the indexing and ordering fields:
The indexing field is the field upon which an index is built.
The ordering field is the field upon which the data file is sorted.
The ordering key field is an ordering field that is also a key of the table.
Example: if the file is sorted on ProfID and an index is built on ProfName, the indexing field is ProfName and the ordering field is ProfID; ProfID is also the ordering key field because it is a key.
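The naming rule can be written as a small decision function. This is a sketch under the simplifying assumption that the file has a single ordering field; `classify_index`, its parameters, and the `Dept` field are hypothetical.

```python
def classify_index(indexing_field, ordering_field, key_fields):
    """Return the index type implied by the indexing and ordering fields."""
    if indexing_field != ordering_field:
        return "secondary"   # built on a non-ordering field
    if indexing_field in key_fields:
        return "primary"     # built on the ordering key field
    return "clustering"      # built on an ordering field that is not a key

# File ordered on ProfID (a key), index on ProfName:
print(classify_index("ProfName", "ProfID", {"ProfID"}))  # secondary
print(classify_index("ProfID", "ProfID", {"ProfID"}))    # primary
# Hypothetical file ordered on a non-key Dept field, index on Dept:
print(classify_index("Dept", "Dept", {"ProfID"}))        # clustering
```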

Types of Indexes: Primary Index
Specified on the ordering key field of an ordered file of records.
Nondense (sparse): one index entry for each block in the data file.
Each index entry holds the key field value of the first record in the block; this record is called the block anchor.
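A sketch of a primary-index lookup, assuming the data file is a list of blocks sorted on the key and the index holds only the block anchors. Because only anchors are indexed, the search finds the last anchor that is less than or equal to the target key and then scans that single block. The data and names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical data file sorted on the key; the primary index keeps only
# the first key (block anchor) of each block.
blocks = [[2, 5, 7], [9, 12, 15], [18, 21, 24]]
anchors = [block[0] for block in blocks]  # [2, 9, 18]

def primary_lookup(key):
    """Return (block number, slot) of key, or None if absent."""
    # The target block is the one with the largest anchor <= key.
    b = bisect_right(anchors, key) - 1
    if b < 0:
        return None              # key is smaller than every anchor
    block = blocks[b]            # one data-block access
    if key in block:
        return (b, block.index(key))
    return None

print(primary_lookup(12))  # (1, 1)
print(primary_lookup(13))  # None
```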

Primary Index Example

Types of Indexes: Clustering Index
Specified on an ordering field that is not a key field.
Nondense (sparse): one index entry for each distinct value of the field.
Each index entry points to the first data block that contains records with that field value.
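A sketch of a clustering index, assuming a data file sorted on a hypothetical non-key Dept field and stored as blocks of (dept, name) records. A Python dict stands in for the sorted index file: each distinct value maps to the first block containing it, and a lookup reads forward from there while blocks can still hold that value.

```python
# Hypothetical file sorted on Dept (not a key), 3 records per block.
blocks = [
    [(1, "Ann"), (1, "Bob"), (2, "Cy")],
    [(2, "Dee"), (2, "Eve"), (2, "Fay")],
    [(3, "Gus"), (5, "Hal"), (5, "Ida")],
]

def build_clustering_index(blocks):
    """Map each distinct Dept value to the first block that contains it."""
    index = {}
    for b, block in enumerate(blocks):
        for dept, _ in block:
            index.setdefault(dept, b)   # keep only the first block seen
    return index

def clustering_lookup(index, blocks, dept):
    """Return all names with the given Dept, scanning consecutive blocks."""
    if dept not in index:
        return []
    names = []
    for block in blocks[index[dept]:]:
        names.extend(name for d, name in block if d == dept)
        if block[-1][0] != dept:        # sorted file: later blocks can't match
            break
    return names

idx = build_clustering_index(blocks)
print(idx)                                  # {1: 0, 2: 0, 3: 2, 5: 2}
print(clustering_lookup(idx, blocks, 2))    # ['Cy', 'Dee', 'Eve', 'Fay']
```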

Clustering Index Example

Types of Indexes: Secondary Index
Defined on some non-ordering field of the data file.
Dense: includes one entry for each record in the data file.
Each index entry points to either a block or a record.
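A sketch of a dense secondary index, assuming a file sorted on a hypothetical EmpID but indexed on the non-ordering Name field. Since the file order says nothing about Name, every record gets its own entry, and each entry is a record pointer (block number, slot).

```python
from bisect import bisect_left

# Hypothetical file sorted on EmpID; Name is a non-ordering field.
blocks = [
    [(101, "Zoe"), (102, "Amy"), (103, "Max")],
    [(104, "Ben"), (105, "Kim"), (106, "Dan")],
]

# Dense secondary index: one (Name, (block, slot)) entry per record,
# sorted by Name so it can be binary-searched.
sec_index = sorted(
    (name, (b, s))
    for b, block in enumerate(blocks)
    for s, (_, name) in enumerate(block)
)
names = [name for name, _ in sec_index]

def secondary_lookup(name):
    """Binary-search the index, then follow the record pointer."""
    pos = bisect_left(names, name)
    if pos < len(names) and names[pos] == name:
        b, s = sec_index[pos][1]
        return blocks[b][s]
    return None

print(len(sec_index))            # 6, one entry per record
print(secondary_lookup("Kim"))   # (105, 'Kim')
```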

Secondary Index Example

Example: EMPLOYEE(NAME, SSN, ADDRESS, JOB, SALARY, ...)
Record size $R = 150$ bytes; block size $B = 512$ bytes; number of records $r = 30000$.
Blocking factor $\mathit{Bfr} = \lfloor B/R \rfloor = \lfloor 512/150 \rfloor = 3$ records/block.
Number of blocks needed to store the records $b = \lceil r/\mathit{Bfr} \rceil = \lceil 30000/3 \rceil = 10000$ blocks.
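The same arithmetic as a sketch, assuming unspanned records (a record never crosses a block boundary), which is why the blocking factor rounds down. The function names are hypothetical.

```python
import math

def blocking_factor(block_size, record_size):
    """Records (or index entries) that fit in one block, without spanning."""
    return block_size // record_size       # floor(B / R)

def blocks_needed(num_records, bfr):
    """Blocks needed to store num_records at bfr records per block."""
    return math.ceil(num_records / bfr)    # ceil(r / Bfr)

bfr = blocking_factor(512, 150)
b = blocks_needed(30000, bfr)
print(bfr, b)  # 3 10000
```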

Example: Suppose we need to perform the following query:
SELECT * FROM EMPLOYEE WHERE SSN = ‘ ’;
Heap file (unsorted): the average cost of a linear search (assuming the SSN exists) is $b/2 = 10000/2 = 5000$ block accesses.
Sequential file (sorted on SSN): the cost of a binary search is $\lceil \log_2 b \rceil = \lceil \log_2 10000 \rceil = 14$ block accesses.
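The two cost formulas side by side, as a sketch that counts only block accesses and ignores in-memory work. Binary search is charged at least one access, so a one-block file still costs one read; the function names are hypothetical.

```python
import math

def linear_search_cost(b):
    """Average block accesses for a linear scan that finds an existing key."""
    return b / 2

def binary_search_cost(b):
    """Block accesses for a binary search over b sorted blocks."""
    return max(1, math.ceil(math.log2(b)))

b = 10000
print(linear_search_cost(b))   # 5000.0
print(binary_search_cost(b))   # 14
```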

Example: Suppose there is a (sparse) primary index on the SSN field.
Field size for SSN $V = 10$ bytes; record pointer size $P_R = 4$ bytes.
Index entry size $R_i = V + P_R = 10 + 4 = 14$ bytes.
Index blocking factor $\mathit{Bfr}_i = \lfloor B/R_i \rfloor = \lfloor 512/14 \rfloor = 36$ entries/block.
Number of index blocks $b_i = \lceil b/\mathit{Bfr}_i \rceil = \lceil 10000/36 \rceil = 278$ blocks.
Binary search on the index takes $\lceil \log_2 b_i \rceil = \lceil \log_2 278 \rceil = 9$ block accesses.
Total cost = 9 + 1 = 10 block accesses (1 more to access the record itself).
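The single-level index calculation as one sketch: the number of index entries is the only thing that changes between a sparse and a dense index. The function name and parameter order are hypothetical.

```python
import math

def single_level_index_cost(num_entries, block_size, key_size, ptr_size):
    """Return (index blocks, total block accesses) for a single-level index.

    Total = binary search over the index blocks + 1 access to the data block.
    """
    bfr_i = block_size // (key_size + ptr_size)      # entries per index block
    b_i = math.ceil(num_entries / bfr_i)              # index blocks
    search = max(1, math.ceil(math.log2(b_i)))       # binary search on index
    return b_i, search + 1

# Sparse primary index: one entry per data block (b = 10000).
print(single_level_index_cost(10000, 512, 10, 4))  # (278, 10)
```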

Example: Suppose there is a (dense) secondary index on the SSN field.
Field size for SSN $V = 10$ bytes; record pointer size $P_R = 4$ bytes.
Index entry size $R_i = V + P_R = 10 + 4 = 14$ bytes.
Index blocking factor $\mathit{Bfr}_i = \lfloor B/R_i \rfloor = \lfloor 512/14 \rfloor = 36$ entries/block.
Number of index blocks $b_i = \lceil r/\mathit{Bfr}_i \rceil = \lceil 30000/36 \rceil = 834$ blocks.
Binary search needs $\lceil \log_2 b_i \rceil = \lceil \log_2 834 \rceil = 10$ block accesses.
Total cost = 10 + 1 = 11 block accesses (1 more to access the record itself).
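As a sanity check on that estimate, the sketch below builds a hypothetical dense index of 30,000 sorted integer keys (standing in for SSNs) packed 36 entries per block, then counts how many index blocks a block-level binary search reads in the worst case. The keys and function name are made up; only the counting matters.

```python
def blocked_binary_search(index_blocks, key):
    """Binary search over blocks of sorted keys; return (found, blocks_read)."""
    lo, hi = 0, len(index_blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = index_blocks[mid]   # one index-block access
        reads += 1
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads
    return False, reads

# Hypothetical dense index: 30,000 keys, 36 entries per block -> 834 blocks.
keys = list(range(0, 60000, 2))
index_blocks = [keys[i:i + 36] for i in range(0, len(keys), 36)]
worst = max(blocked_binary_search(index_blocks, k)[1] for k in keys)
print(len(index_blocks), worst)  # 834 10
```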

Multi-Level Indexes
Because a single-level index is itself an ordered file, we can create an index to the index. The original index file is then called the first-level index, and the index to the index is called the second-level index.
We can repeat the process, creating a third, fourth, ..., top level until all entries of the top level fit in one disk block.
A multi-level index can be created for any type of first-level index (primary, secondary, clustering) as long as the first-level index occupies more than one disk block.
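The level-building process can be sketched as a loop: each new level has one entry per block of the level below, and building stops when a level fits in one block. This assumes every level uses the same blocking factor; the function name is hypothetical.

```python
import math

def multilevel_index_levels(first_level_entries, bfr_i):
    """Return the number of blocks at each level, from level 1 up to the top."""
    levels = [math.ceil(first_level_entries / bfr_i)]
    while levels[-1] > 1:
        # The next level needs one entry per block of the current level.
        levels.append(math.ceil(levels[-1] / bfr_i))
    return levels

# Sparse first level over b = 10000 data blocks, 36 entries per index block.
print(multilevel_index_levels(10000, 36))  # [278, 8, 1]
```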

Example: Two-level Primary Index

Example: Suppose there is a 2-level index on the SSN field (assume the first-level index is sparse, i.e., a primary index).
Field size for SSN $V = 10$ bytes; record pointer size $P_R = 4$ bytes.
Index entry size $R_i = V + P_R = 10 + 4 = 14$ bytes.
Index blocking factor $\mathit{Bfr}_i = \lfloor B/R_i \rfloor = \lfloor 512/14 \rfloor = 36$ entries/block.
Number of index blocks at the 1st level $b_1 = \lceil b/\mathit{Bfr}_i \rceil = \lceil 10000/36 \rceil = 278$ blocks.
Number of index blocks at the 2nd level $b_2 = \lceil b_1/\mathit{Bfr}_i \rceil = \lceil 278/36 \rceil = 8$ blocks.
Binary search on the 2nd level needs $\lceil \log_2 b_2 \rceil = \log_2 8 = 3$ block accesses.
Total cost = 3 (2nd level) + 1 (1st level) + 1 (data block) = 5 block accesses.
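The cost pattern generalizes: binary-search the top level, then read exactly one block per lower index level, then one data block. A sketch assuming the per-level block counts are already known (first level first); `multilevel_cost` is a hypothetical helper.

```python
import math

def multilevel_cost(level_blocks):
    """Block accesses to find one record through a multi-level index."""
    top = level_blocks[-1]
    top_search = max(1, math.ceil(math.log2(top)))  # binary search on top level
    lower_levels = len(level_blocks) - 1             # one access per level below
    return top_search + lower_levels + 1             # + 1 data block

print(multilevel_cost([278, 8]))     # 3 + 1 + 1 = 5
print(multilevel_cost([278, 8, 1]))  # 1 + 2 + 1 = 4
```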

Example: Suppose there is a 3-level index on the SSN field (assume the first-level index is sparse).
Field size for SSN $V = 10$ bytes; record pointer size $P_R = 4$ bytes.
Index entry size $R_i = V + P_R = 10 + 4 = 14$ bytes.
Index blocking factor $\mathit{Bfr}_i = \lfloor B/R_i \rfloor = \lfloor 512/14 \rfloor = 36$ entries/block.
Number of index blocks at the 1st level $b_1 = \lceil b/\mathit{Bfr}_i \rceil = \lceil 10000/36 \rceil = 278$ blocks.
Number of index blocks at the 2nd level $b_2 = \lceil b_1/\mathit{Bfr}_i \rceil = \lceil 278/36 \rceil = 8$ blocks.
Number of index blocks at the 3rd level $b_3 = \lceil b_2/\mathit{Bfr}_i \rceil = \lceil 8/36 \rceil = 1$ block.
Total cost = 1 (3rd level) + 1 (2nd level) + 1 (1st level) + 1 (data block) = 4 block accesses.
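To see where "one access per level" comes from, this sketch actually builds a sparse multi-level index over hypothetical sorted integer keys (10,000 data blocks of 3 records, 36 entries per index block, as in the example) and descends it from the single top block, counting block reads.

```python
from bisect import bisect_right

BFR, BFR_I = 3, 36   # records per data block, entries per index block

# Hypothetical data file: 30,000 sorted keys in 10,000 blocks.
keys = list(range(30000))
data_blocks = [keys[i:i + BFR] for i in range(0, len(keys), BFR)]

def build_level(anchors):
    """Pack (anchor, child block number) entries into index blocks."""
    entries = [(a, n) for n, a in enumerate(anchors)]
    return [entries[i:i + BFR_I] for i in range(0, len(entries), BFR_I)]

# Level 1 indexes the data blocks; each higher level indexes the one below.
levels = [build_level([blk[0] for blk in data_blocks])]
while len(levels[-1]) > 1:
    levels.append(build_level([blk[0][0] for blk in levels[-1]]))

def lookup(key):
    """Descend from the single top block; return (found, blocks_read)."""
    block_no, reads = 0, 0
    for level in reversed(levels):
        block = level[block_no]                      # one index-block read
        reads += 1
        pos = bisect_right([a for a, _ in block], key) - 1
        block_no = block[max(pos, 0)][1]             # child block to read next
    reads += 1                                       # read the data block
    return key in data_blocks[block_no], reads

print([len(level) for level in levels])  # [278, 8, 1]
print(lookup(12345))                     # (True, 4)
```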

Dynamic Multilevel Indexes
Insertion into and deletion from a multilevel index can be a severe problem, because every level is a physically ordered file.
One possibility is to use overflow blocks and reorganize the file periodically.
A better strategy is to use dynamic multi-level indexes, e.g., the B-tree and the B+-tree (both are search trees).
In a B-tree or B+-tree, each node corresponds to a disk block, and each node (except possibly the root) is always kept at least half-full.

A Node in a Search Tree (B-tree or B+-tree)
Each node is stored in a disk block.
Within each node: $K_1 < K_2 < \dots < K_{q-1}$.
For all values $X$ in the subtree pointed to by $P_i$: $K_{i-1} < X < K_i$ (B-tree) or $K_{i-1} \le X < K_i$ (B+-tree).
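Choosing which pointer $P_i$ to follow inside a node is itself a search over $K_1 < \dots < K_{q-1}$. A sketch assuming a node is a sorted Python list of keys and pointers are numbered from 0; the B+-tree branch uses the rule that $X$ goes to $P_i$ when $K_{i-1} \le X < K_i$, so a value equal to a key goes to the right of it. The function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def bplus_child(keys, x):
    """0-based pointer index to follow in a B+-tree internal node,
    under K_{i-1} <= X < K_i: a value equal to a key goes right."""
    return bisect_right(keys, x)

def btree_step(keys, x):
    """For a B-tree node: ('found', j) if x == keys[j], else ('child', i)
    where the pointer at index i covers K_{i-1} < x < K_i."""
    i = bisect_left(keys, x)
    if i < len(keys) and keys[i] == x:
        return ("found", i)
    return ("child", i)

node = [10, 20, 30]
print(bplus_child(node, 20))   # 2: the subtree holding 20 <= X < 30
print(btree_step(node, 20))    # ('found', 1)
print(btree_step(node, 25))    # ('child', 2)
```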

Search Tree
In a search tree of order p, each node has at most p pointers and contains at most p – 1 search values.

Nodes of a B+-tree
Difference between an internal node and a leaf node

Example of a B+-tree
(Figure legend: tree node pointer, data/record pointer, sibling pointer)

Example: Search for 9
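A sketch of B+-tree search on a small hand-built tree with hypothetical keys: descend from the root using the $K_{i-1} \le X < K_i$ rule until a leaf is reached, then check the leaf. The sibling pointers between leaves let a range query continue leaf to leaf without going back up the tree. The `Node` class and function names are assumptions for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty for leaf nodes
    next_leaf: "Node | None" = None                # sibling pointer (leaves)

    @property
    def is_leaf(self):
        return not self.children

def bplus_search(root, x):
    """Follow tree pointers from the root to a leaf; return (found, nodes read)."""
    node, visited = root, 0
    while True:
        visited += 1                     # each node is one disk block
        if node.is_leaf:
            return x in node.keys, visited
        node = node.children[bisect_right(node.keys, x)]

def leaf_scan(root, lo, hi):
    """Range query: descend to lo's leaf, then follow sibling pointers."""
    node = root
    while not node.is_leaf:
        node = node.children[bisect_right(node.keys, lo)]
    out = []
    while node is not None:
        for k in node.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        node = node.next_leaf
    return out

# Hypothetical keys, chosen for illustration.
leaves = [Node([1, 3]), Node([5, 7]), Node([9, 11]), Node([13, 15])]
for left, right in zip(leaves, leaves[1:]):
    left.next_leaf = right
root = Node([9], [Node([5], leaves[:2]), Node([13], leaves[2:])])

print(bplus_search(root, 9))    # (True, 3)
print(leaf_scan(root, 4, 12))   # [5, 7, 9, 11]
```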

Example of Insertion in a B+-tree
Insertion sequence: Largest value among those stored in the left subtree ?

Insertion sequence: ?

Insertion sequence:

Example of Deletion in a B+-tree

Deletion sequence:

Underflow in a B+-tree
Underflow occurs when the number of entries in a node falls below the minimum required. It can be repaired by:
redistributing entries with the left sibling, or
redistributing entries with the right sibling (redistribution changes the search field values at higher levels of the tree), or
merging the 3 nodes into 2 nodes and redistributing the entries.
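A sketch of the redistribute-or-merge decision for two adjacent leaves, assuming leaves are sorted Python lists and the B+-tree rule $K_{i-1} \le X < K_i$, so the new parent key is the first key of the right leaf. Note that this sketch merges two nodes into one, a common simpler variant, rather than the 3-into-2 merge described above, and it leaves out updating the parent node itself. `fix_leaf_underflow` is a hypothetical name.

```python
def fix_leaf_underflow(left, right, min_keys):
    """Repair underflow between two adjacent B+-tree leaves.

    left and right are sorted key lists for sibling leaves; one of them has
    fewer than min_keys entries. Returns (left, right, separator), where
    separator is the new parent key between them, or (merged, None, None)
    if the two leaves had to be merged.
    """
    total = left + right                 # keys stay in sorted order
    if len(total) >= 2 * min_keys:
        # Redistribute: split evenly so both leaves reach the minimum.
        mid = len(total) // 2
        new_left, new_right = total[:mid], total[mid:]
        return new_left, new_right, new_right[0]   # parent key changes
    # Not enough keys for two legal leaves: merge them into one.
    return total, None, None

print(fix_leaf_underflow([5], [9, 11, 13], 2))  # ([5, 9], [11, 13], 11)
print(fix_leaf_underflow([5], [9, 11], 2))      # ([5, 9, 11], None, None)
```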

Deletion sequence:

Final tree (after deletion)
