B-Tree
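M-ary Search Tree
- Generalization of the binary search tree.
- Maximum branching factor is M, and the number of keys in any node ranges from 0 to M-1.
- Leaves need not be at the same level.
- A complete M-ary tree storing n keys has height $\Theta(\log_M n)$. A find reads one node per level, so its number of disk accesses grows with the height.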
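To make the link between height and disk accesses concrete, here is a minimal sketch of find in an M-ary search tree that counts how many nodes it visits. The `Node` class and the `find` function are hypothetical names chosen for illustration; each visited node stands in for one disk access.

```python
from bisect import bisect_left


class Node:
    """Hypothetical M-ary search tree node: sorted keys, and either no
    children (leaf) or len(keys) + 1 children."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def find(node, target):
    """Return (found, nodes_visited); each visited node models one disk access."""
    visited = 0
    while node is not None:
        visited += 1
        i = bisect_left(node.keys, target)      # binary search inside the node
        if i < len(node.keys) and node.keys[i] == target:
            return True, visited
        # Descend into the i-th subtree, or stop if this node is a leaf.
        node = node.children[i] if node.children else None
    return False, visited


root = Node([10, 20], [Node([3, 7]), Node([12, 15]), Node([25, 30])])
print(find(root, 15))  # (True, 2)
print(find(root, 8))   # (False, 2)
```

Both lookups touch two nodes because the example tree has two levels, regardless of how many keys each node holds.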
B-Tree
- B-Trees evolved to reduce the runtime of find in M-ary trees.
- Each node has up to M-1 keys, and the minimum number of keys per node is bounded as well: at least $\lceil M/2 \rceil - 1$.
- Pick the branching factor M so that each node fills one full page (block) of memory.
- Disk friendly: many keys are stored in each node.
- Data resides on disk, and the tree structure is loaded into memory.
- Internal nodes may contain only keys; leaves contain keys and the actual data.
- More suitable for disk storage because the cost of accessing a node is amortized over the multiple operations performed within that node.
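As a rough illustration of picking M so that a node fills one page, the sketch below solves (M-1)*key_size + M*pointer_size <= page_size for the largest M. The 4096-byte page, 8-byte keys, and 8-byte child pointers are assumed values, and any per-node header is ignored; `max_branching_factor` is a hypothetical helper name.

```python
def max_branching_factor(page_size, key_size, pointer_size):
    """Largest M with (M - 1) keys and M child pointers fitting in one page."""
    return (page_size + key_size) // (key_size + pointer_size)


# Assumed sizes, for illustration only.
PAGE_SIZE, KEY_SIZE, POINTER_SIZE = 4096, 8, 8

M = max_branching_factor(PAGE_SIZE, KEY_SIZE, POINTER_SIZE)
min_keys = -(-M // 2) - 1          # ceil(M / 2) - 1
print(M, M - 1, min_keys)          # 256 255 127
```

Under these assumptions a node holds up to 255 keys and at least 127, so a single page read narrows the search to one of up to 256 subtrees.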
B-Tree Definition
- Every node other than the root holds between d and 2d keys.
- The minimum branching factor is d+1.
- The order of the tree (its maximum branching factor) is 2d+1.
- An internal node has one more child than it has keys.
- All leaf nodes are at the same depth.
- The root has at least two children if it is not a leaf.
B-Tree Definition
- The height of the tree can be reduced by increasing the order of the tree, i.e. its maximum branching factor.
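The rules of the definition can be checked mechanically. The following is a minimal sketch, assuming a hypothetical `Node` class with a sorted key list and a child list; it verifies key counts, child counts, and that all leaves share one depth, but it does not check key ordering.

```python
class Node:
    """Hypothetical B-tree node: keys plus a child list (empty for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def leaf_depths(node, d, is_root=True, depth=0):
    """Check the counting rules below `node`; return the set of leaf depths."""
    n = len(node.keys)
    low = 1 if is_root else d              # the root is exempt from the minimum d
    if not low <= n <= 2 * d:
        raise ValueError("key count out of range")
    if not node.children:
        return {depth}
    if len(node.children) != n + 1:        # one more child than keys
        raise ValueError("wrong number of children")
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, d, False, depth + 1)
    return depths


def is_btree(root, d):
    """True if the counting rules hold and all leaves are at the same depth."""
    try:
        return len(leaf_depths(root, d)) == 1
    except ValueError:
        return False


good = Node([30], [Node([10, 20]), Node([40, 50])])
bad = Node([30], [Node([10]), Node([40, 50])])   # a leaf with fewer than d keys
print(is_btree(good, 2))  # True
print(is_btree(bad, 2))   # False
```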
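Tree Heights
- If a database contains 1,000,000 records, a balanced binary search tree can find an element in about 20 accesses and comparisons ($\log_2 10^6 \approx 20$).
- A B-Tree of order m storing n keys has minimum (best-case) height $h_{\min} = \lceil \log_m (n+1) \rceil - 1$.
- If $d = \lceil m/2 \rceil$ is the minimum number of children a non-root internal node can have, and the root is at height 0, the worst-case height is $h_{\max} = \lfloor \log_d \frac{n+1}{2} \rfloor$. (Here d counts children, unlike the d in the B-Tree definition, which counts keys.)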
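To put numbers on these bounds, the sketch below evaluates the best-case and worst-case heights for 1,000,000 keys using integer arithmetic, which avoids floating-point rounding in the logarithms. The order m = 101 is an assumed value for illustration, and the function names are hypothetical.

```python
def best_case_height(n, m):
    """Smallest h whose full tree of order m holds n keys: m**(h+1) - 1 >= n."""
    h = 0
    while m ** (h + 1) - 1 < n:
        h += 1
    return h


def worst_case_height(n, m):
    """Largest h whose sparsest tree still fits in n keys: 2*d**h - 1 <= n."""
    d = -(-m // 2)                      # ceil(m / 2): minimum children per node
    h = 0
    while 2 * d ** (h + 1) - 1 <= n:
        h += 1
    return h


n = 1_000_000
print(best_case_height(n, 2) + 1)   # 20 levels in a perfectly balanced binary tree
print(best_case_height(n, 101))     # 2
print(worst_case_height(n, 101))    # 3
```

With the root at height 0, even the worst case reads only four nodes for a million keys at this order, compared with about 20 for the binary tree.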
Steps of Insertion in a B-tree
1. Using the SEARCH procedure for M-way trees (described above), find the leaf node to which X should be added.
2. Add X to this node in the appropriate place among the values already there. Being a leaf node, there are no subtrees to worry about.
3. If there are M-1 or fewer values in the node after adding X, then we are finished. If there are M values after adding X, we say the node has overflowed. To repair this, we split the node into three parts (the sizes below assume M is odd):
   Left: the first (M-1)/2 values
   Middle: the middle value (position 1+((M-1)/2))
   Right: the last (M-1)/2 values
   Left and Right become nodes of their own, the left and right children of Middle, which we add in the appropriate place in this node's parent. If the parent overflows in turn, it is split in the same way; if the split node was the root, Middle becomes the new root.
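The insertion steps can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not a definitive implementation: the order M = 5 is an assumed odd value, keys are assumed distinct, and `Node`, `_insert`, and `insert` are hypothetical names.

```python
from bisect import bisect_left, insort

M = 5  # assumed odd order: a node holds at most M - 1 = 4 values


class Node:
    """Hypothetical B-tree node: sorted keys plus a child list (empty for a leaf)."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def _insert(node, x):
    """Insert x below node. Return (middle, right) if node was split, else None."""
    if not node.children:
        insort(node.keys, x)                    # steps 1-2: add X at the leaf
    else:
        i = bisect_left(node.keys, x)
        promoted = _insert(node.children[i], x)
        if promoted:                            # child split: adopt Middle here
            middle, right = promoted
            node.keys.insert(i, middle)
            node.children.insert(i + 1, right)
    if len(node.keys) <= M - 1:                 # step 3: no overflow, finished
        return None
    half = (M - 1) // 2                         # overflow: split into three parts
    middle = node.keys[half]
    right = Node(node.keys[half + 1:], node.children[half + 1:])
    node.keys = node.keys[:half]                # this node becomes Left
    node.children = node.children[:half + 1]
    return middle, right


def insert(root, x):
    """Insert x and return the (possibly new) root."""
    promoted = _insert(root, x)
    if promoted:                                # the root itself split
        middle, right = promoted
        return Node([middle], [root, right])
    return root


root = Node()
for value in [10, 20, 30, 40, 50, 17]:
    root = insert(root, value)
print(root.keys, [child.keys for child in root.children])
# [30] [[10, 17, 20], [40, 50]]
```

Inserting 50 overflows the single leaf and promotes 30 into a new root; inserting 17 afterwards lands in the left leaf without any overflow.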
After Steps 1 and 2: no overflow on inserting 17
After Step 3: overflow on inserting 6
Steps of Deletion in a B-tree
- Search for X, the value to be deleted. If X is in a non-leaf node, replace it with Y, the largest value in its left subtree, and then proceed to delete Y from the node that originally contained it.
- Delete the value from its node directly; if the node does not underflow afterwards, we are finished.
- A non-root node underflows if it has fewer than (M-1)/2 values; the root underflows if it has fewer than 1 value.
- If a node underflows, combine it with its more populous adjacent sibling. A non-leaf node that underflows is treated the same way.
- The root may underflow as well, in which case the root is re-assigned.
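The deletion steps can be sketched as follows. Several details here are assumptions rather than something the slides spell out: the order M = 5, the choice to pull the parent's separating value into the combined node, and the rule that a combined node with more than M-1 values is split again around its middle value. All names (`Node`, `_repair`, `_delete`, `delete`) are hypothetical.

```python
from bisect import bisect_left

M = 5                      # assumed odd order
MIN_KEYS = (M - 1) // 2    # a non-root node underflows below this many values


class Node:
    """Hypothetical B-tree node: sorted keys plus a child list (empty for a leaf)."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def _repair(parent, i):
    """Combine underflowed child i with its more populous adjacent sibling."""
    siblings = [j for j in (i - 1, i + 1) if 0 <= j < len(parent.children)]
    j = max(siblings, key=lambda k: len(parent.children[k].keys))
    lo = min(i, j)                               # index of the separating value
    left, right = parent.children[lo], parent.children[lo + 1]
    keys = left.keys + [parent.keys[lo]] + right.keys
    kids = left.children + right.children
    if len(keys) > M - 1:                        # too many values: split again
        mid = len(keys) // 2
        parent.keys[lo] = keys[mid]
        left.keys, right.keys = keys[:mid], keys[mid + 1:]
        if kids:
            left.children, right.children = kids[:mid + 1], kids[mid + 1:]
    else:                                        # everything fits in one node
        left.keys, left.children = keys, kids
        del parent.keys[lo]
        del parent.children[lo + 1]


def _delete(node, x):
    i = bisect_left(node.keys, x)
    found = i < len(node.keys) and node.keys[i] == x
    if not node.children:                        # leaf: delete directly
        if found:
            del node.keys[i]
        return
    if found:                                    # replace X with Y, then delete Y
        pred = node.children[i]
        while pred.children:
            pred = pred.children[-1]
        x = node.keys[i] = pred.keys[-1]
    _delete(node.children[i], x)
    if len(node.children[i].keys) < MIN_KEYS:    # child underflowed
        _repair(node, i)


def delete(root, x):
    """Delete x and return the (possibly re-assigned) root."""
    _delete(root, x)
    if not root.keys and root.children:          # root underflow: re-assign root
        return root.children[0]
    return root


root = Node([30], [Node([10, 17, 20]), Node([40, 50])])
root = delete(root, 50)
print(root.keys, [c.keys for c in root.children])  # [20] [[10, 17], [30, 40]]
root = delete(root, 10)
print(root.keys, root.children)                    # [17, 20, 30, 40] []
```

The first deletion underflows a leaf and is repaired by combining and re-splitting; the second empties the root, so its only remaining child becomes the new root.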
During deletion, after Step 2 (no underflow): on deleting 50, no underflow of the node containing 45
During deletion, after Step 2 (underflow of a leaf node): on deleting 6, underflow of the node containing 7
During deletion, after Step 3: on deleting 3, underflow of the node containing 2
During deletion, if the root underflows: on deleting 7, the root node underflows and the root is re-assigned
Sequence of Operations
1. Insert 49
2. Delete 66
3. Insert 72
4. Insert 60