Index tuning-- B+tree
overview Overview of tree-structured index Indexed sequential access method (ISAM) B+tree
Data Structures Most index data structures can be viewed as trees. In general, the root of this tree will always be in main memory, while the leaves will be located on disk. –The performance of a data structure depends on the number of nodes in the average path from the root to the leaf. –Data structure with high fan-out (maximum number of children of an internal node) are thus preferred.
Tree-Structured Indexing Tree-structured indexing techniques support both range searches and equality searches –index file may still be quite large. But we can apply the idea repeatedly! –ISAM: static structure; B+ tree: dynamic, adjusts gracefully under inserts and deletes. Data pages
Indexed sequential access method (ISAM) Data entries vs index entries –Both belong to index file –Data entries: –Index entries: Both in ISAM and B-tree, leaf pages are data entries
Example ISAM Tree Each node can hold 2 entries; no need for `next- leaf-page’ pointers. (Why?)
Comments on ISAM File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages. Index entries: ; they `direct’ search for data entries, which are in leaf pages. Search: Start at root; use key comparisons to go to leaf. Cost=log F N ; F = # entries/index pg, N = # leaf pgs Insert: Find leaf data entry belongs to, and put it there. Delete: Find and remove from leaf; if empty overflow page, de-allocate. * Static tree structure: inserts/deletes affect only primary leaf pages.
After Inserting 23*, 48*, 41*, 42*
... Then Deleting 42*, 51*, 97* * Note If primary leaf page is empty, just leave it empty so that the number of allocation primary pages does not change!
Features of ISAM The primary leaf pages are assumed to be allocated sequentially –Because the number of such pages is known when the tree is created and does not change subsequently under inserts and deletes –So next-page-pointers are not needed
Prons and cons of ISAM Cons:losing sequentiality and balance –Due to long overflow chains –Leading to poor retrieving performance –One solution Keep 20% of each page free when tree was created Prons –No need to lock non-leaf index pages since we never modify them, which is one of important advantages of ISAM over B+tree –Another is: scans over a large range is more efficient thant B+tree
B+-Tree A B+-Tree is a balanced tree whose leaves contain a sequence of key-pointer pairs. Dynamic structure that adjust gracefully for delete and insert
B + Tree: The Most Widely Used Index –Operation (insertion, deletion) keep it Height- balanced. –Searching for a record requires a traversal from root to the leaf Equality query, Insert/delete at log F N cost (F = fanout, N = # leaf pages); –Grow and shrink dynamically. Need `next-leaf-pointer’ to chain up the leaf nodes –To handle cases such as leaf node merging –Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree. –Data entries at leaf are sorted.
Example B+ Tree Each node can hold 4 entries (order = 2) 23 Root
Node structure P 0 K 1 P 1 K 2 P 2 K m P m index entry Non-leaf nodes zLeaf nodes P 0 K 1 P 1 K 2 P 2 K m P m Next leaf node
Searching in B+ Tree Search begins at root, and key comparisons direct it to a leaf (as in ISAM). Search for 5, 15, all data entries >= * Based on the search for 15*, we know it is not in the tree! Root
summarize