Indexing and B+-Trees By Kenneth Cheung CS 157B TR 07:30-08:45 Professor Lee.

Introduction to Indexing
Goal: to make it easier to look up data.
This is done by keeping a sorted, compressed version of the data.
Searching and insertion will be easier.

Factors of Indices
1. Access type
2. Access time
3. Insertion time
4. Deletion time
5. Space overhead

Clustering Index
An index whose search key also defines the sequential order of the file.

Index-sequential files
Files ordered sequentially on a search key.

Index Record
(aka index entry) Holds a search-key value and pointers to the records with that value.

Pointer
Identifies a disk block and an offset within that disk block.

Dense Index
An index record appears for every search-key value. Records with the same search-key value are stored sequentially after the first record.
Faster access time, but higher space overhead.
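A dense index lookup can be sketched as follows. This is only an illustration: the record list, the key names, and the `dense_lookup` helper are made up for the example, and an in-memory list position stands in for a disk pointer.

```python
# Hypothetical file, stored in search-key order (key, amount).
records = [
    ("Brighton", 750),
    ("Downtown", 500),
    ("Downtown", 600),
    ("Mianus", 700),
]

# Dense index: one entry per search-key value, pointing at the FIRST
# record with that value (here a list position stands in for a pointer).
dense_index = {}
for pos, (key, _) in enumerate(records):
    dense_index.setdefault(key, pos)

def dense_lookup(key):
    """Return every record whose search key equals key."""
    pos = dense_index.get(key)
    if pos is None:
        return []                     # no index record, so no such record
    found = []
    # Records with the same key sit right after the first one.
    while pos < len(records) and records[pos][0] == key:
        found.append(records[pos])
        pos += 1
    return found
```

Every key value costs one index entry, which is where the extra space overhead comes from.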

Sparse Index
An index record appears for only some search-key values. To find a record, the system finds the index entry with the largest search-key value that is less than or equal to the given search-key value, then scans forward through the file from the record that entry points to until it finds the desired record.
Lower space overhead, but higher access time.
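The sparse lookup rule can be sketched like this. The block contents and the `sparse_lookup` name are hypothetical, and the sketch assumes one index entry per block and unique keys.

```python
from bisect import bisect_right

# Hypothetical file sorted on its search key and split into blocks.
blocks = [
    [(10, "a"), (20, "b"), (30, "c")],
    [(40, "d"), (50, "e"), (60, "f")],
    [(70, "g"), (80, "h"), (90, "i")],
]

# Sparse index: one entry per block, holding the block's first key.
sparse_keys = [block[0][0] for block in blocks]

def sparse_lookup(key):
    """Return the value stored under key, or None if it is absent."""
    # Largest index entry whose key is <= the search key.
    pos = bisect_right(sparse_keys, key) - 1
    if pos < 0:
        return None                   # key is below every indexed key
    # Scan forward through the file from the indexed position.
    for block in blocks[pos:]:
        for k, value in block:
            if k == key:
                return value
            if k > key:
                return None           # passed where key would have been
    return None
```

The index holds three entries for nine records, but a lookup may have to scan part of a block after following the pointer.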

Larger Databases
Make a sparse index on a clustering index, using 2 levels of indices.
Searching a multilevel index needs fewer block reads than a binary search over a single large index.
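A rough count of block reads shows why. The figures below (an index of 10,000 blocks, 100 index entries per block) are assumptions chosen for illustration, and the model charges one block read per probe or per index level with nothing cached in memory.

```python
import math

def binary_search_reads(index_blocks):
    """Worst-case block reads for binary search over the index blocks."""
    return math.ceil(math.log2(index_blocks))

def multilevel_reads(index_blocks, entries_per_block):
    """Block reads when each outer level is a sparse index over the level below.

    One block is read per level, and levels are added until the top
    level fits in a single block.
    """
    levels = 1
    blocks = index_blocks
    while blocks > 1:
        blocks = math.ceil(blocks / entries_per_block)
        levels += 1
    return levels

print(binary_search_reads(10_000))    # 14
print(multilevel_reads(10_000, 100))  # 3
```

If the outer levels are kept in main memory, the multilevel count drops further.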

Index Update (Insertion)
A. Look up the search key; if the search-key value is not in the index, add an index record for it.
B. If the index record stores pointers to all records with the same search-key value, then add a pointer to the new record to the index record.
C. Otherwise, the index record stores a pointer to only the first record with that search-key value, and the new record is placed after the other records with the same value.

Index Update (Insertion to Sparse Indices)
For sparse indices, if the system makes a new block, then it must add the first search-key value in the new block to the index.
Otherwise, if the new record has the least search-key value in its block, the index entry pointing to the block is updated; if not, the index is unchanged.
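The sparse-index rule can be sketched as below. This assumes one index entry per block and, purely for convenience, models the index as a dict from a hypothetical block id to the key stored in that block's index entry; `update_sparse_index` is an illustrative name.

```python
def update_sparse_index(index, block_id, block_keys):
    """Adjust a sparse index after a record was placed in a block.

    index      -- dict: block id -> search key held in that block's index entry
    block_id   -- the block that received the new record
    block_keys -- the keys in that block AFTER the insertion
    """
    least = min(block_keys)
    if block_id not in index:
        index[block_id] = least       # new block: index its first key
    elif least < index[block_id]:
        index[block_id] = least       # new record is now the block's least key
    # Otherwise the index entry is still correct and nothing changes.

index = {0: 10}
update_sparse_index(index, 0, [10, 15, 20])   # 15 added: no change
update_sparse_index(index, 0, [5, 10, 15])    # 5 added: entry becomes 5
update_sparse_index(index, 1, [30, 40])       # new block: entry 30 added
```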

Deletion
A. Look up the record.
B. If the index is dense and the deleted record was the only one with its search-key value, then delete the index record for that value from the index.
C. If the index record stores pointers to all records with the same search-key value, then the pointer to the deleted record is removed from the index record.

Deletion (cont’d)
D. If the index record stores a pointer to only the first record with the search-key value and that first record is deleted, then the pointer moves to the following record.
E. If the index is sparse and does not contain an index record with the deleted record's search-key value, then the index remains the same.

Deletion (cont’d)
F. If the deleted record was the only record with its search-key value, then the system replaces the corresponding index record with an index record for the next search-key value. If the next search-key value already has an index entry, then the entry is deleted instead of being replaced.

Deletion (cont’d)
G. Otherwise, if the index record for the search-key value points to the record being deleted, the pointer is moved to the next record with the same search-key value.
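The sparse-index cases (E and F) can be sketched as follows. To keep it short, the sketch assumes every search-key value is unique, so case G does not arise, and it models the file and the index as plain sorted lists of keys; `sparse_delete` is an illustrative name.

```python
def sparse_delete(file_keys, index_keys, key):
    """Delete key from the file and repair the sparse index in place.

    file_keys  -- sorted list of the keys stored in the file (unique)
    index_keys -- sorted list of the keys that have an index entry
    """
    pos = file_keys.index(key)
    file_keys.pop(pos)
    if key not in index_keys:
        return                        # case E: index is unaffected
    entry = index_keys.index(key)
    # After the pop, the next key in file order sits at the same position.
    next_key = file_keys[pos] if pos < len(file_keys) else None
    if next_key is None or next_key in index_keys:
        index_keys.pop(entry)         # next value is already indexed: drop entry
    else:
        index_keys[entry] = next_key  # case F: re-point entry at the next value

file_keys = [10, 20, 30, 40, 50, 60]
index_keys = [10, 40]
sparse_delete(file_keys, index_keys, 20)   # index stays [10, 40]
sparse_delete(file_keys, index_keys, 10)   # index becomes [30, 40]
sparse_delete(file_keys, index_keys, 30)   # index becomes [40]
```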

Secondary Indices
A. Secondary indices are dense and point to all records.
B. The file is not stored sequentially in the secondary search-key order, and the search key need not be a candidate key.
C. If a database with multiple indices is updated, then every index must be updated as well.

B+-Trees
An alternative to Binary Search Trees.

Conditions of a B+-Tree
A. Search-key values are K1, K2, ..., Kn-1.
B. Pointers are P1, P2, ..., Pn.
C. Search-key values within a node are kept in sorted order.

Conditions (cont’d)
D. In a leaf, pointer Pi points to a file record with search-key value Ki, or to a bucket of pointers to such records.
E. Each node can hold more than 2 pointers, up to n (a binary tree node has 2).
F. Non-leaf nodes store redundant search-key values, which also appear in the leaves.

Buckets
Buckets are used only if the search key does not form a candidate key and the file is not stored in search-key order.

Leaves
A. Each leaf holds up to n-1 values.
B. Pointer Pn chains the leaf nodes together in search-key order.
C. Non-leaf nodes form a sparse multilevel index on the leaf nodes.

Leaves (cont’d)
D. Non-leaf nodes hold between ceil(n/2) and n pointers.
E. The number of pointers in a node is called the fanout of the node.
F. The root may hold fewer than ceil(n/2) pointers, but it must hold at least 2 unless the tree consists of a single node.
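The occupancy limits can be written out as small helper functions. The function names are made up for this sketch, and the lower bound for leaves, ceil((n-1)/2) values, comes from the standard B+-tree definition rather than from the slides.

```python
import math

def nonleaf_pointer_bounds(n, is_root=False):
    """(min, max) pointers in a non-leaf node of a B+-tree with n pointers per node."""
    low = 2 if is_root else math.ceil(n / 2)
    return low, n

def leaf_value_bounds(n):
    """(min, max) search-key values in a leaf that is not also the root."""
    return math.ceil((n - 1) / 2), n - 1

print(nonleaf_pointer_bounds(5))                # (3, 5)
print(nonleaf_pointer_bounds(5, is_root=True))  # (2, 5)
print(leaf_value_bounds(5))                     # (2, 4)
```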

Queries for finding V
A. To find search-key value V, start at the root.
B. Look for the smallest search-key value greater than V.
C. If such a value Ki is found, follow pointer Pi to another node; if none is found, follow the last pointer in the node.

Queries (cont’d)
D. The process repeats down the tree until it reaches a leaf, where it looks for a search-key value Ki that equals V; pointer Pi then leads to the record or bucket.
E. If no Ki equals V at the leaf, then no such record exists.
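The query steps can be sketched as below. The `Node` class, the `find` function, and the tiny three-leaf tree are assumptions made for the example; in a leaf, `pointers[i]` is taken to be the record for `keys[i]`, and buckets and the leaf chain are left out.

```python
from bisect import bisect_right

class Node:
    """Hypothetical B+-tree node: sorted keys plus child or record pointers."""
    def __init__(self, keys, pointers, is_leaf):
        self.keys = keys
        self.pointers = pointers
        self.is_leaf = is_leaf

def find(root, v):
    """Return the record stored under search-key value v, or None."""
    node = root
    while not node.is_leaf:
        # Index of the smallest key greater than v; if there is none,
        # this is len(keys), which selects the last pointer.
        i = bisect_right(node.keys, v)
        node = node.pointers[i]
    for key, record in zip(node.keys, node.pointers):
        if key == v:
            return record
    return None                       # no key equals v at the leaf

leaf1 = Node([5, 10], ["r5", "r10"], True)
leaf2 = Node([15, 20], ["r15", "r20"], True)
leaf3 = Node([25, 30], ["r25", "r30"], True)
root = Node([15, 25], [leaf1, leaf2, leaf3], False)

print(find(root, 15))   # r15
print(find(root, 12))   # None
```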

B+-Tree Insertion
A. First look up the leaf node where the search-key value would appear.
B. If the search-key value already exists in the leaf node, then add the new record to the file and, if necessary, a pointer to it in the bucket.
C. If the search-key value does not exist, then insert the value into the leaf node, add the new record to the file, and make a new bucket and pointer if necessary.

Insertion (cont’d)
D. If the search-key value is new and there is no room in the leaf node, then split the node.
E. Divide the search-key values between the two leaves in sorted order, so that each leaf has a new greatest or least search-key value.
F. After a split, insert the new node into the parent, and repeat the splitting up the tree whenever a parent becomes too full.
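A leaf split can be sketched as below. This covers only the leaf level and only the keys; the `insert_into_leaf` name is made up, and the choice to keep the first ceil(n/2) values in the original leaf follows the usual textbook convention rather than anything stated on the slides.

```python
import math

def insert_into_leaf(keys, new_key, n):
    """Insert new_key into a leaf of a B+-tree with n pointers per node.

    Returns (left_keys, right_keys, key_for_parent). When the leaf has
    room, right_keys and key_for_parent are None.
    """
    keys = sorted(keys + [new_key])
    if len(keys) <= n - 1:
        return keys, None, None       # fits: no split needed
    mid = math.ceil(n / 2)
    left, right = keys[:mid], keys[mid:]
    # The least key of the new right leaf is what the parent must store.
    return left, right, right[0]

print(insert_into_leaf([10, 20], 15, n=4))      # ([10, 15, 20], None, None)
print(insert_into_leaf([10, 20, 30], 25, n=4))  # ([10, 20], [25, 30], 25)
```

If inserting `key_for_parent` overfills the parent, the same kind of split is repeated one level up.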

B+-Tree Deletion
A. Look up the record and remove it from the file.
B. If no bucket is associated with its search-key value, remove the search-key value from the leaf.
C. If there is a bucket and it becomes empty, remove the search-key value from the leaf.

Deletion (cont’d)
D. If a node is left with too few pointers, transfer its pointers to a sibling node, then delete the node.
E. If transferring the pointers would give the sibling too many pointers, redistribute the pointers between the two nodes instead. In either case, the parent of the two nodes needs to update its pointers and search-key values.
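The choice between merging and redistributing can be sketched for two neighbouring leaves. This is a simplified illustration: it handles only leaf keys, assumes the underfull leaf is the right one of the pair, and borrows a single key when it redistributes; `rebalance_leaves` is a made-up name.

```python
def rebalance_leaves(left, right, n):
    """Repair an underfull right leaf using its left sibling.

    left, right -- sorted key lists of two adjacent leaves
    n           -- pointers per node, so a leaf holds at most n - 1 keys
    Returns (left_keys, right_keys, separator). After a merge,
    right_keys and separator are None and the parent drops one pointer.
    """
    if len(left) + len(right) <= n - 1:
        return left + right, None, None   # everything fits in one leaf: merge
    borrowed = left[-1]
    # Redistribute: move the largest key of the left leaf to the right leaf.
    # The parent's separator becomes the right leaf's new least key.
    return left[:-1], [borrowed] + right, borrowed

print(rebalance_leaves([10, 20], [30], n=4))      # ([10, 20, 30], None, None)
print(rebalance_leaves([10, 20, 25], [30], n=4))  # ([10, 20], [25, 30], 25)
```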

B+-Tree File Organization
A. Leaf nodes store records instead of pointers to records.
B. Insertion and deletion happen the same way.
C. When inserting, the system adds the record to the block if there is enough space; otherwise it splits the block.
D. Any split will propagate upward if necessary.

Bibliography
Silberschatz, Abraham, Henry F. Korth, and S. Sudarshan. Database System Concepts. 5th Ed. Boston: McGraw Hill, 2002. Ch 12.
