1 B-trees. Nodes can have more than 2 children. Each internal node has between k and 2k children and between k-1 and 2k-1 keys. A leaf has between k-1 and 2k-1 keys. The root has at least 2 children. All leaves are at the same distance from the root.
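The rules on this slide can be written as a small checker. This is a minimal sketch, not the lecture's code: the `Node` class, `leaf_depths`, and `is_valid` are names made up for illustration, and the check covers only the size and depth rules (it does not verify that keys are in sorted order).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical node layout: sorted keys, and children (empty for a leaf).
    keys: List[int]
    children: List["Node"] = field(default_factory=list)


def leaf_depths(node, k, is_root=True, depth=0):
    """Return the set of leaf depths below node; raise ValueError on a broken rule."""
    n_keys = len(node.keys)
    low = 1 if is_root else k - 1          # the root is exempt from the lower bound
    if not (low <= n_keys <= 2 * k - 1):
        raise ValueError("wrong number of keys")
    if not node.children:                  # a leaf
        return {depth}
    if len(node.children) != n_keys + 1:   # between k and 2k children follows from this
        raise ValueError("children must be keys + 1")
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, k, False, depth + 1)
    return depths


def is_valid(root, k):
    """True if all size rules hold and all leaves are at the same depth."""
    try:
        return len(leaf_depths(root, k)) == 1
    except ValueError:
        return False


good = Node([10], [Node([3, 5]), Node([12])])
uneven = Node([10], [Node([3, 5]), Node([12], [Node([11]), Node([13])])])
print(is_valid(good, k=2), is_valid(uneven, k=2))
```

The second tree fails because its leaves are at different distances from the root.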
2 2-4 tree and general k. For k = 2, each node has 2, 3, or 4 children. Which is better: k = 2 or k >> 2? Depth: large k is better. Degree: small k is better. Overall:
3 A 4-node. [Figure: a 4-node; the labels < and ≤ mark how each key separates the items of its neighbouring subtrees.]
4 B vs. B+. In a B-tree, items are in every node. In a B+ tree, items are at the leaves; internal nodes hold keys that direct the search. The leaves are (possibly) also maintained in a linked list to allow fast sequential access.
5 A 2-4+ tree
6 The height. The root has at least 2 children, so level 1 has at least 2 nodes. At level 2 we have at least $2k$ nodes. At level 3 we have at least $2k^2$ nodes. At level h we have at least $2k^{h-1}$ nodes.
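Turned around, the level counts above bound the height. A short derivation, where N (a symbol introduced here, not in the slides) is the number of leaves and the leaves sit at level h:

```latex
\[
N \;\ge\; 2k^{h-1}
\quad\Longrightarrow\quad
k^{h-1} \;\le\; \frac{N}{2}
\quad\Longrightarrow\quad
h \;\le\; 1 + \log_k \frac{N}{2}.
\]
```

For example, with $N = 2^{30}$ leaves this gives $h \le 30$ for k = 2 and $h \le 5$ for k = 128.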
7 Red-black trees. n = $2^{30}$ ≈ $10^9$. 30 ≤ height ≤ 60. When the red-black tree resides on disk, a search makes up to 60 disk accesses. A disk access takes about 5 milliseconds ($5 \times 10^{-3}$ sec). A memory access takes about 100 nanoseconds ($10^{-7}$ sec).
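The arithmetic behind this slide, as a rough back-of-the-envelope sketch. The access times are the approximate figures quoted on the slide; the variable names are made up for illustration.

```python
DISK_ACCESS_S = 5e-3       # about 5 milliseconds per disk access (from the slide)
MEMORY_ACCESS_S = 100e-9   # about 100 nanoseconds per memory access (from the slide)
MAX_HEIGHT = 60            # worst-case red-black tree height for n = 2**30

worst_on_disk = MAX_HEIGHT * DISK_ACCESS_S      # roughly 0.3 seconds per search
worst_in_memory = MAX_HEIGHT * MEMORY_ACCESS_S  # roughly 6 microseconds per search
slowdown = DISK_ACCESS_S / MEMORY_ACCESS_S      # roughly 50,000 times slower

print(f"search on disk:   {worst_on_disk:.3f} s")
print(f"search in memory: {worst_in_memory * 1e6:.1f} microseconds")
print(f"disk / memory:    {slowdown:.0f}x")
```

At these rates a single worst-case search on disk costs a sizeable fraction of a second, which is why reducing the number of accesses matters more than reducing comparisons.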
8 B-trees. B-trees are used when the tree resides in secondary storage. k is picked according to the size of a disk block. Since the height is smaller, we do less I/O, and each single access brings in more keys.
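One way to pick k from the block size, as a sketch under stated assumptions: a node holds up to 2k child pointers and 2k-1 keys, and the block, key, and pointer sizes below (4096, 8, and 8 bytes) are hypothetical values chosen for illustration, not figures from the lecture. `choose_k` and `max_height` are made-up helper names.

```python
def choose_k(block_bytes, key_bytes, pointer_bytes):
    """Largest k such that 2k pointers and 2k-1 keys fit in one block."""
    # 2k*P + (2k-1)*K <= B   <=>   k <= (B + K) / (2 * (P + K))
    return (block_bytes + key_bytes) // (2 * (pointer_bytes + key_bytes))


def max_height(num_leaves, k):
    """Largest h with 2 * k**(h-1) <= num_leaves (assumes num_leaves >= 2)."""
    h = 1
    while 2 * k ** h <= num_leaves:
        h += 1
    return h


k = choose_k(block_bytes=4096, key_bytes=8, pointer_bytes=8)
print(k, max_height(2 ** 30, k), max_height(2 ** 30, 2))
```

With these assumed sizes a block-sized node gives a tree only a handful of levels deep, compared with about 30 levels for k = 2.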
9 B-trees. Large-degree B-trees are used to represent very large dictionaries that reside on disk. Smaller-degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties.
10 Node’s structure. A node with j keys is laid out as $a_0\, p_1\, a_1\, p_2\, a_2 \dots p_j\, a_j$, where $a_i$ is a pointer to a subtree and $p_i$ is a key, with $k-1 \le j \le 2k-1$. Searching each node linearly gives total time ≈ $kh \approx k \log_k n$. Maintaining a little red-black tree or an array in each node makes the search take ≈ $h \log_2 k \approx \log_2 n$.
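A sketch of the search with binary search inside each node, which is the "array in each node" option. It assumes the B-tree variant where items are stored in every node; the `Node` class and `search` function are illustrative names, not the lecture's code.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # p_1 .. p_j, sorted
        self.children = children or []    # a_0 .. a_j, empty for a leaf


def search(node, key):
    """Return True if key is stored somewhere below node."""
    while True:
        # Binary search within the node: about log2(j) comparisons.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:
            return False
        # Here keys[i-1] < key < keys[i], so the key can only be under a_i.
        node = node.children[i]


root = Node([10, 20], [Node([3, 5]), Node([12, 15]), Node([25])])
print(search(root, 15), search(root, 11))
```

Replacing `bisect_left` with a left-to-right scan gives the linear variant with total time about $k \log_k n$.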
11 Insert Insert(2,T).
12 Insert Insert(2,T)
13 Insert Insert(4,T)
14 Insert Insert(4,T)
15 Split Insert(4,T)
16 Split Insert(4,T)
17 Split Insert(4,T)
Insert(6,T)
Insert(6,T)
Insert(7,T)
Insert(7,T)
Insert(8,T)
Insert(8,T)
24 Split Insert(8,T)
25 Split Insert(8,T)
26 Split Insert(8,T)
27 Split Insert(8,T)
28 Insert -- definition. Add the new key in its position, say in a node v. (*) If v now has 4 keys, split v into a node u with 2 keys, a node w with 1 key, and a key k that moves up to the parent (or into two nodes with 2 keys each and a key, if v is a leaf); otherwise stop. If v was the root, create a new root r that is the parent of u and w, and stop. Otherwise replace v by u and w as children of p(v), and repeat (*) for v := p(v).
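29 Split. An overflowing node with 2k keys, $a_0\, p_1\, a_1\, p_2\, a_2 \dots p_{2k}\, a_{2k}$, is split into a node with k-1 keys, $a_0\, p_1\, a_1 \dots p_{k-1}\, a_{k-1}$, and a node with k keys, $a_k\, p_{k+1}\, a_{k+1} \dots p_{2k}\, a_{2k}$. $p_k$ is inserted in the parent.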
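30 Split. An overflowing node with 2k keys, $a_0\, p_1\, a_1\, p_2\, a_2 \dots p_{2k}\, a_{2k}$, is split into a node with k-1 keys, $a_0\, p_1\, a_1 \dots p_{k-1}\, a_{k-1}$, and a node with k keys, $a_k\, p_{k+1}\, a_{k+1} \dots p_{2k}\, a_{2k}$. $p_k$ is inserted in the parent. The split takes O(k) time.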
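The split can be sketched on plain lists. This is an illustration only: the function name `split` and the representation of a node as two parallel lists (keys and child pointers) are assumptions, with list index 0 holding $p_1$ and $a_0$ respectively.

```python
def split(keys, children, k):
    """Split an overflowing node with 2k keys (and 2k+1 children, or none for a leaf).

    Returns (left, separator, right), where left and right are
    (keys, children) pairs and separator is p_k, which goes to the parent.
    """
    assert len(keys) == 2 * k
    assert not children or len(children) == 2 * k + 1
    left = (keys[:k - 1], children[:k])   # p_1..p_{k-1} with a_0..a_{k-1}
    separator = keys[k - 1]               # p_k
    right = (keys[k:], children[k:])      # p_{k+1}..p_{2k} with a_k..a_{2k}
    return left, separator, right


left, sep, right = split([10, 20, 30, 40], ["a0", "a1", "a2", "a3", "a4"], k=2)
print(left, sep, right)
```

Each slice copies at most 2k+1 entries, which is where the O(k) cost comes from.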
31 Split. To use external memory efficiently, copy half of the node to a new block rather than splitting the little red-black tree. One can prove that not too many splits occur.
32 Insert (summary). O(log n) time and at most O($\log_k n$) splits; each split takes O(k) time. Can show that the amortized number of splits is O(1) per insert.
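A minimal sketch of insertion with bottom-up splits, instrumented to count splits so the amortized claim can be observed. Assumptions: it uses the B-tree variant with items in every node, recursion in place of parent pointers, and silently ignores duplicate keys. The names `Node`, `Tree`, and `inorder` are illustrative, not from the lecture.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty for a leaf


class Tree:
    def __init__(self, k=2):
        self.k = k
        self.root = Node()
        self.splits = 0                  # counter added for the experiment

    def insert(self, key):
        result = self._insert(self.root, key)
        if result is not None:           # the root itself split: grow a new root
            sep, right = result
            self.root = Node([sep], [self.root, right])

    def _insert(self, node, key):
        """Insert below node; return (separator, new right node) if node split."""
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return None                  # duplicate: nothing to do
        if not node.children:
            node.keys.insert(i, key)
        else:
            result = self._insert(node.children[i], key)
            if result is None:
                return None
            sep, right = result
            node.keys.insert(i, sep)
            node.children.insert(i + 1, right)
        k = self.k
        if len(node.keys) < 2 * k:
            return None                  # no overflow
        self.splits += 1
        sep = node.keys[k - 1]                            # p_k moves up
        right = Node(node.keys[k:], node.children[k:])    # k keys
        node.keys = node.keys[:k - 1]                     # k-1 keys stay
        node.children = node.children[:k]
        return sep, right


def inorder(node):
    """Keys below node in sorted order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out += inorder(child)
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


tree = Tree(k=2)
for x in range(1, 1001):
    tree.insert(x)
print(tree.splits / 1000)   # average number of splits per insert
```

Every split creates one new node and every node holds at least one key, so the total number of splits can never exceed the number of inserted keys.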
33 Delete delete(14,T)
34 Delete delete(14,T)
35 Delete delete(17,T)
36 Delete delete(17,T)
37 Delete delete(16,T)
38 Delete delete(16,T)
39 Borrow delete(16,T)
40 Borrow delete(16,T)
delete(9,T)
delete(9,T)
delete(9,T) Fusion
delete(9,T) Fusion
45 Delete -- definition. Remove the key. If it was not the only key in its node, return. Otherwise remove the node, and let v be the parent that loses a child. (*) If v still has at least two children, stop. If v has one child and v is the root, discard v (its child becomes the new root) and stop. Otherwise (v is not the root), if v has a sibling w of degree 3 or 4, borrow a child from w to v and terminate. Otherwise, fuse v with its sibling into a degree-3 node and repeat (*) with the parent of v.
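The case analysis in step (*) can be traced with a small simulation on node degrees. This is only a sketch of the decision logic for a 2-4 tree, not a full deletion routine: it ignores keys entirely, and the function name `delete_fixup` and the `path` encoding are assumptions made for illustration.

```python
def delete_fixup(path):
    """Trace step (*) up the tree and return the list of actions taken.

    path[0] describes the node v that just lost a child and path[-1] the root.
    Each entry is (degree_before_losing_a_child, degrees_of_adjacent_siblings).
    """
    actions = []
    for level, (degree, sibling_degrees) in enumerate(path):
        degree -= 1                       # v lost a child (removed leaf or fused child)
        is_root = level == len(path) - 1
        if degree >= 2:
            actions.append("done")        # v is still a legal node
            break
        if is_root:
            actions.append("discard root")
            break
        if any(d >= 3 for d in sibling_degrees):
            actions.append("borrow")      # sibling of degree 3 or 4 gives a child
            break
        actions.append("fuse")            # v joins a degree-2 sibling; parent loses a child
    return actions


print(delete_fixup([(2, [2]), (2, [2]), (2, [])]))
print(delete_fixup([(2, [4]), (3, [])]))
```

The first call fuses at two levels and then discards the root, so the tree loses a level; the second stops immediately with a borrow.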