B-trees Eduardo Laber David Sotelo
What are B-trees? Balanced search trees designed for secondary storage devices Similar to AVL-trees but better at minimizing disk I/O operations Main data structure used by DBMS to store and retrieve information
What are B-trees? Nodes may have many children (from a few to thousands) Branching factor can be quite large Every B-tree of n keys has height O(log n) In practice, its height is smaller than the height of an AVL-Tree
B-trees and Branching Factor
Definition of B-trees B-tree is a rooted tree containing the following five properties: 1.Every node x has the following attributes: a) x.n, the number of keys stored in node x b)The x.n keys: x.key 1 ≤ x.key 2 ≤... ≤ x.key x.n c)The boolen x.leaf indicating if x is a leaf or an internal node
Definition of B-trees 2.If x is an internal node it contains x.n + 1 pointers x.p 1, x.p 2,..., x.p (x.n + 1) to its children 3.The keys x.key i separate ranges of trees stored in each subtree (x.p i, x.p i+1 ) 4.All leaves have the same depth == tree’s height.
Definition of B-trees 5.Bounds on the number of keys of a node: Let B be a positive integer representing the order of the B-tree. Every node (except the root) must have at least B keys. Every node (except the root) must have at most 2B keys. Root is free to contain between 1 and 2B nodes (why?)
Example of a B-tree
Exercise 1 Enumerate all valid B-trees of order 2 that represent the set {1, 2,..., 8}
Exercise 1 Solution:
The height of a B-tree Theorem : Let h be the height of a B-tree of n keys and order B > 1. Then: h ≤ log B (n+1)/2 Proof: Root contains at least one key. All other nodes contain at least B keys At least one key at depth 0 At least 2B keys at depth 1 At least 2B 2 + B keys at depth 2 At least 2B i + B i-1 + B i B keys at depth i
Proof (continued) ■
Searching a B-tree Similar to searching a binary search tree. Multiway branching decision according to the number of the node’s chidren. Recursive procedure with a time complexity of O(B log B n) for a tree of n keys and order B.
Searching a B-tree B-TREE-SEARCH (x, k) 1 i = 1 2 while i ≤ x.n and k > x.key i do i = i if i ≤ x.n and k == x.key i then return (x, i) 4 if x.leaf then return NIL 5 else DISK-READ(x.p i ) return B-TREE-SEARCH (x.p i, k)
Searching a B-tree J J A B N O H I Q R D E F L M T U K P S C G Search for the key F
Searching a B-tree J J A B N O H I Q R D E F L M T U K P S C G Search for the key F
Searching a B-tree J J A B N O H I Q R D E F L M T U K P S C G Search for the key F
Searching a B-tree J J A B N O H I Q R D E F L M T U K P S C G Search for the key N
Searching a B-tree Lemma: The time complexity of procedure B-TREE-SEARCH is O(B log B n) Proof: Number of recursive calls is equal to tree’s height. The height of a B-tree is O(log B n) Cost between B and 2B iterations per call. Total of O(B log B n) steps. ■
Exercise 2 Suppose that B-TREE-SEARCH is implemented to use binary search rather than linear search within each node. Show that this changes makes the time complexity O(lg n), independently of how B might be chosen as a function of n.
Exercise 2 Solution: By using binary search the number of steps of the algorithm becomes O(lg B log B n). Observe that log B n = lg n / lg B. Therefore O(lg B log B n) = O(lg n).
Linear or Binary B-tree search ? Lemma: If 1 < B < n then lg n ≤ B log B n Proof:
Inserting a key into a B-tree The new key is always inserted into an existing leaf node (why?) Firstly we search for the leaf position at which to insert the new key. If such a node is full we split it. A split operation splits a full node around its median key into two nodes having B keys each. Median key moves up into splitted node’s parent (insertion recursive call).
Split operation Inserting key F into a full node (B = 2) A C E G K M O Q J J
Split operation Node found but already full A C E F G K M O Q J J
Split operation Median key identified A C E F G K M O Q J J
Split operation Splitting the node A C K M O Q E J F G
Inserting a key into a B-tree Insertion can be propagated upward (B = 2) A CK M O Q E J T X F G U W Y Z
Inserting a key into a B-tree Insertion can be propagated upward (B = 2) A CK M N O Q E J T X F G U W Y Z
Inserting a key into a B-tree Insertion can be propagated upward (B = 2) A CK M E J N T X F GU WY ZO Q SPLIT
Inserting a key into a B-tree Insertion can be propagated upward (B = 2) A CK MF G E J N O QY ZU W T X SPLIT
Inserting a key into a B-tree B-TREE-INSERT (x, k, y) 1 i = 1 2 while i ≤ x.n and k < x.key i do i = i x.n = x.n x.key i = k 5 x.p i+1 = y 6 for j = x.n downto i+1 do 7 x.key j = x.key j-1 8 x.p j = x.p j-1 9 end-for 10 DISK-WRITE(x)
Inserting a key into a B-tree B-TREE-INSERT (x, k) 11 if x.n > 2*B then 12 [m, z] = SPLIT (x) 13 if x.parent != NIL then 14 DISK-READ (x.parent) 15 end-if 16 else 17 x.parent = ALLOCATE-NODE() 18 DISK-WRITE (x) 19 root = x.parent 20 end-else 21 B-TREE-INSERT (x.parent, m, z) 22 end-if
Inserting a key into a B-tree SPLIT (x) 1 z = ALLOCATE-NODE() 2 m = FIND-MEDIAN (x) 3 COPY-GREATER-ELEMENTS(x, m, z) 4 DISK-WRITE (z) 5 COPY-SMALLER-ELEMENTS(x, m, x) 6 DISK-WRITE (x) 7return [m, z]
Inserting a key into a B-tree Function B-TREE-INSERT has three arguments: – The node x at which an element of key k should be inserted – The key k to be inserted – A pointer y to the left child of k to be used as one of the pointers of x during insertion process. There is a global variable named root which is a pointer to the root of the B-Tree. Observe that the field x.parent was not defined as an original B-tree attribute, but is considered just to simplify the process. The fields x.leaf should also be updated accordingly.
Inserting a key into a B-tree Lemma: The time complexity of B-TREE-INSERT is O(B log B n) Proof: Recall that B-TREE-SEARCH function is called first and costs O(log n) by using binary search. Then, B-TREE-INSERT starts by visiting a node and proceeds upward. At most one node is visited per level/depth and only visited nodes can be splitted. A most one node is created during the insertion process. Cost for splitting is proportional to 2B. Number of visited nodes is equal to tree’s height and the height of a B-tree is O(log B n). Cost between B and 2B iterations per visited node. Total of O(B log B n) steps. ■
Some questions on insertion Which split operation increases the tree’s height? The split of the root of the tree. How many DISK-READ operations are executed by the insertion algorithm? Every node was read at least twice. Does binary search make sense here? Not exactly. We already pay O(B) to split a node (for finding the median).
Drawbacks of our insertion method Once that the key’s insertion node is found it may be necessary to read its parent node again (due to splitting). DISK-READ/WRITE operations are expensive and would be executed al least twice for each node in the key’s path. It would be necessary to store a nodes’s parent or to use the recursion stack to keep its reference. (Mond and Raz, 1985) provide a solution that spends one DISK-READ/WRITE per visited node (See at CLRS)
Exercise 3 Show the results of inserting the keys E, H, B, A, F, G, C, J, D, I in order into an empty B-tree of order 1.
Exercise 3 Solution: (final configuration) G I FC DHJA B E
Exercise 4 Does a B-tree of order 1 is a good choice for a balanced search tree? What about the expression h ≤ log B (n+1)/2 when B = 1?
Deleting a key from a B-tree Analogous to insertion but a little more complicated. A key can be deleted from any node (not just a leaf) and can affect its parent and its children (insertion operation just affect parents). One must ensure that a node does not get to small during deletion (less than B keys). As a result deleting a node is the most complex operation on B-trees. It will be considered in 4 particular cases.
Deleting a key from a B-tree Case 1: The key is in a leaf node with more than B elements. Procedure: Just remove the key from the node.
Deleting a key from a B-tree Case 1: The key is in a leaf node with more than B elements (B = 2) A C DK MF G E J N O QY ZU W T X
Deleting a key from a B-tree Case 1: The key is in a leaf node with more than B elements (B = 2) A DK MF G E J N O QY ZU W T X
Deleting a key from a B-tree Case 2: The join procedure The key k 1 to be deleted is in a leaf x with exactly B elements. Let y be a node that is an “adjacent brother” of x. Suppose that y has exactly B elements. Procedure: Remove the key k 1. Let k 2 be the key that separates nodes x and y in their parent. Join the the nodes x and y and move the key k 2 from the parent to the new joined node. If the parent of x becomes with B-1 elements and also has an “adjacent brother” with B elements, apply the join procedure recursively for the parent of x (seen as x) and its adjacent brother (seen as y).
Deleting a key from a B-tree Case 2: Delete key Q (B = 2) H I F O QY ZU W K T X...
Deleting a key from a B-tree Case 2: Delete key Q (B = 2) H I F O QY ZU W K T X Node x Node y Parent...
Deleting a key from a B-tree Case 2: Delete key Q (B = 2) H I F OY ZU W K T X Node x Node y Parent...
Deleting a key from a B-tree Case 2: Delete key Q (B = 2) H I F OY ZU W K T X Node x Node y Parent...
Deleting a key from a B-tree Case 2: Delete key Q (B = 2) H I F Y ZO T U W K X... Parent Join
Deleting a key from a B-tree Case 2: Delete key Q (B = 2) H I F Y ZO T U W K X...
Deleting a key from a B-tree Case 3: join and split The key k 1 to be deleted is in a leaf x with exactly B elements. Let y be a node that is an “adjacent brother” of x. Suppose that y has more than B elements. Procedure: Remove the key k 1. Let k 2 be the key that separates nodes x and y in their parent. Join the the nodes x and y and move the key k 2 from the parent to the new joined node z. Find the median key m of z Determine the new nodes x and y by splitting z around m. Insert m into the parent of x and y.
Deleting a key from a B-tree Case 3: Delete key F (B = 2) A C DK MF G E J N O QY ZU W T X
Deleting a key from a B-tree Case 3: Delete key F (B = 2) A C DK MF G E J N O QY ZU W T X
Deleting a key from a B-tree Case 3: Delete key F (B = 2) A C DK MG E J N O QY ZU W T X
Deleting a key from a B-tree Case 3: Delete key F (B = 2) A C DK MG E J N O QY ZU W T X Parent Node x Node y
Deleting a key from a B-tree Case 3: Delete key F (B = 2) A C DK MG E J N O QY ZU W T X Parent Node x Node y
Deleting a key from a B-tree Case 3: Delete key F (B = 2) A C D E GK M J N O QY ZU W T X Join
Deleting a key from a B-tree Case 3: Delete key F (B = 2) A C D E GK M J N O QY ZU W T X Median key
Deleting a key from a B-tree Case 3: Delete key F (B = 2) A CK M D J N O QY ZU W T X Split E G
Deleting a key from a B-tree Case 3: Delete key F (B = 2) A CK M D J N O QY ZU W T X E G
Deleting a key from a B-tree Case 4: internal node The key k 1 to be deleted is in a node x that is not a leaf or a root. Procedure: Let k 2 be the smallest key that is greater than k 1. Let y be the node of k 2, which will be a leaf. Insert key k 2 into x. Remove the key k 1 from x. Solve now the problem of removing k 2 from a leaf y, previously considered.
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QY ZU W T X E G
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QY ZU W T X E G Node x
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QY ZU W T X E G Node x Node y
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QY ZW U X E G Node x Node y
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QY ZW U X E G Node x Node y
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QY ZW U X E G Node x Node y
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QY ZW U X E G Node x Node y
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QW X Y Z U E G Node x Node y
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QW X Y Z U E G Node x Node y
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QW X Y Z U E G Node x Node y
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QW X Y Z U E G Node x Node y
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N O QW X Y Z U E G Node x Node y
Deleting a key from a B-tree Case 4: Delete key T (B = 2) A CK M D J N U O Q W X Y Z E G