1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries, we create indices. –An index is any data structure that takes as input a property (e.g. a value for a specific field), called the search key, and quickly finds all records with that property. –A database may have several indices (based on different keys) Examples: –An index to search for students by id and another index to search by name.
2 Database indices The actual records (and the index) typically do not fit in memory. Secondary (or tertiary) storage must be used. Disk operations are very time consuming, so we would like to limit them. –Since CPU operations are much faster (one disk access equals several million CPU instructions), we would be willing to do any sort of preprocessing that may reduce disk I/O.
3 Database indices Idea: –Expand the BST idea to create a multi-way search tree: Instead of a long, thin tree with at most 2 children per node, create a short, wide tree with many children per node. Each node will then need to have children - 1 keys. Try to maintain the tree balanced and as full as possible. –Finding the correct branch to follow requires several comparisons (CPU operations) and leads to few disk accesses.
4 B-trees The most common data structure used for database indices is the B-tree. A B-tree of order m is an m-way tree where –All leaves are on the same level –All internal nodes except the root have k-1 keys and k children where m/2 k m –The root is either a leaf or has between 2 and m children.
5 B-trees 1951 keys smaller than keys greater than keys greater than 19 and less than 51 B-tree of order 3
6 B-trees: Insert Find the appropriate leaf If has room insert key else // overflow! split Generalization of the search method for a BST See next slide for details on the split operation.
7 B-trees: Insert How to handle an overflow at a leaf. Pick the middle key of the leaf. Split the leaf in two, each part containing half of the elements. If the leaf has a parent p insert to the parent make p the parent of the two pieces check whether the parent overflows if yes, repeat the splitting process for the parent. else // the leaf was the root create a new root and insert to it make it the parent of the two pieces
8 B-trees: Delete Find the element to be deleted. If it is not a leaf, replace it with its immediate successor and delete the successor instead. else delete the element. Check for underflow (too few children) guaranteed to be in a leaf See next slide for more details.
9 B-trees: Delete How to handle an underflow at a node n. Check whether any of the siblings can afford to lose children. If yes, transfer a child move a child from the sibling to n move a key from n's parent to n move a key from the sibling to the parent. else merge n with a sibling move a key from the parent to the merged node check the parent for underflow. The sibling loses a child so it must lose a key and n gains a child so it must gain a key. The parent loses a child so it must lose a key.
10 B-trees B-trees grow when a new root is created as a result of an insert operation. B-trees shrink when the root has only two children and they merge as a result of a delete operation. This will cause the root's only key to move down and the root to become empty.
11 B*-trees Variant of the B-tree. Each node must be at least 2/3 full Overflow is handled mainly by redistributing keys among siblings. If all siblings are full, then 2 nodes are split into 3.