Published by Scott Manning. Modified over 9 years ago.
1
Data Structures, Haim Kaplan and Uri Zwick, November 2012. Lecture 5: B-Trees
2
A 4-node: 3 keys (10, 25, 42), 4-way branch. The four subtrees hold, from left to right: key < 10; 10 < key < 25; 25 < key < 42; 42 < key.
3
An r-node: r−1 keys $k_0, k_1, k_2, \ldots, k_{r-3}, k_{r-2}$; r-way branch with children $c_0, c_1, c_2, \ldots, c_{r-2}, c_{r-1}$.
4
B-Trees (with minimum degree d): Each node holds between d−1 and 2d−1 keys. Each non-leaf node has between d and 2d children. The root is special: it has between 1 and 2d−1 keys and, if it is not a leaf, between 2 and 2d children. All leaves are at the same depth.
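The rules above can be turned into a small validity check. This is only a sketch: the `Node` class and the name `check_btree` are hypothetical, and the check covers key counts, child counts and leaf depth only, not the ordering of keys across subtrees.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted list of keys
        self.children = children or []    # empty list for a leaf

def check_btree(root, d):
    """Return True iff the tree obeys the key-count, child-count and leaf-depth rules."""
    leaf_depths = set()

    def visit(x, depth, is_root):
        lowest = 1 if is_root else d - 1          # the root may hold as little as 1 key
        if not lowest <= len(x.keys) <= 2 * d - 1:
            return False
        if x.keys != sorted(x.keys):
            return False
        if not x.children:                        # leaf: remember its depth
            leaf_depths.add(depth)
            return True
        if len(x.children) != len(x.keys) + 1:    # an r-node has r-1 keys and r children
            return False
        return all(visit(c, depth + 1, False) for c in x.children)

    return visit(root, 0, True) and len(leaf_depths) == 1

# A hand-built 2-4 tree (d = 2) and a broken variant with leaves at two depths.
good = Node([13], [
    Node([5, 10], [Node([1, 3]), Node([6]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])
bad = Node([13], [
    Node([5]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30])]),
])
print(check_btree(good, 2), check_btree(bad, 2))
```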
5
A 2-4 tree: a B-Tree with minimum degree d=2. [Figure: an example 2-4 tree]
6
Node structure: r – the degree. key[0],…,key[r−2] – the keys. item[0],…,item[r−2] – the associated items. child[0],…,child[r−1] – the children. leaf – is the node a leaf? Possibly a different representation for leaves.
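One possible in-memory rendering of this node layout, as a sketch only. The class name `BTreeNode` is an assumption; `degree` and `leaf` are derived from the lists here rather than stored as fields.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class BTreeNode:
    keys: List[Any] = field(default_factory=list)              # key[0..r-2], kept sorted
    items: List[Any] = field(default_factory=list)             # item[i] is associated with key[i]
    children: List["BTreeNode"] = field(default_factory=list)  # child[0..r-1]; empty in a leaf

    @property
    def leaf(self) -> bool:
        return not self.children

    @property
    def degree(self) -> int:
        return len(self.keys) + 1        # r: one more than the number of keys

# The 4-node with keys 10, 25, 42, here as a leaf with placeholder items.
node = BTreeNode(keys=[10, 25, 42], items=["a", "b", "c"])
print(node.degree, node.leaf)
```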
7
The height of B-Trees: At depth 1 we have at least 2 nodes. At depth 2 we have at least 2d nodes. At depth 3 we have at least $2d^2$ nodes. … At depth h we have at least $2d^{h-1}$ nodes.
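Counting keys level by level turns these node counts into a bound on the height, a step the slide leaves implicit. A sketch, assuming the tree holds n keys and has height h, with at least 1 key in the root and at least d−1 keys in every other node:

```latex
n \;\ge\; 1 + (d-1)\sum_{i=1}^{h} 2d^{\,i-1}
  \;=\; 1 + 2(d-1)\,\frac{d^{h}-1}{d-1}
  \;=\; 2d^{h} - 1
\qquad\Longrightarrow\qquad
h \;\le\; \log_d \frac{n+1}{2}
```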
8
Look for k in node x; if it is not there, look for k in the appropriate subtree of node x. Number of nodes accessed – $O(\log_d n)$. Number of operations – $O(d \log_d n)$. Number of operations with binary search inside each node – $O(\log_2 d \cdot \log_d n) = O(\log_2 n)$.
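The search procedure itself did not survive on this slide, so here is a minimal sketch of it, using binary search inside each node. The `Node` class and the function name `search` are assumptions made for illustration.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys
        self.children = children or []    # empty list for a leaf

def search(x, k):
    """Return (node, index) with node.keys[index] == k, or None if k is absent."""
    while True:
        i = bisect_left(x.keys, k)        # look for k in node x (binary search)
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:                # reached a leaf without finding k
            return None
        x = x.children[i]                 # look for k in the subtree that may hold it

# A hand-built 2-4 tree (d = 2) used only for this example.
root = Node([13], [
    Node([5, 10], [Node([1, 3]), Node([6]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])
hit = search(root, 17)
miss = search(root, 12)
```

Each loop iteration touches one node, so the number of iterations is bounded by the height plus one.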
9
B-Trees vs. binary search trees: B-trees are wider and shallower. They access fewer nodes during a search, but may perform more operations.
10
B-Trees – What are they good for?
11
The hardware structure: CPU, cache, RAM, disk. Each memory level is much larger but much slower than the one before it. Information is moved in blocks.
12
A simplified I/O model: CPU, RAM, disk. Each block is of size m. Count both operations and I/O operations.
13
Data structures in the I/O model: Linked lists and search trees behave poorly in the I/O model, because each pointer followed may cause a disk access. B-trees reduce the worst-case # of I/Os: pick d such that a node fits in a block. Each node (struct) is allocated contiguously. It is harder to control the disk blocks containing different nodes.
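To make "pick d such that a node fits in a block" concrete, the sketch below computes the largest d for which a full node (2d−1 keys and items, 2d child pointers) fits in one block. The function name and all byte sizes are hypothetical; real layouts also carry per-node headers, which are ignored here.

```python
def largest_min_degree(block_size, key_size, item_size, pointer_size):
    """Largest d with (2d-1)*(key+item) + 2d*pointer <= block_size."""
    pair = key_size + item_size
    # Rearranged: 2d * (pair + pointer_size) <= block_size + pair
    d = (block_size + pair) // (2 * (pair + pointer_size))
    if d < 2:
        raise ValueError("block too small for a B-tree node with d >= 2")
    return d

# Assumed sizes: 4096-byte block, 8-byte keys, items and pointers.
d = largest_min_degree(block_size=4096, key_size=8, item_size=8, pointer_size=8)
print(d)
```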
14
Look for k in node x; if it is not there, look for k in the appropriate subtree of node x. Number of nodes accessed, i.e., I/Os – $O(\log_d n)$. Number of operations – $O(d \log_d n)$. Number of operations with binary search inside each node – $O(\log_2 d \cdot \log_d n) = O(\log_2 n)$.
15
Red-Black Trees vs. B-Trees: $n = 2^{30} \approx 10^9$. 30 ≤ height of Red-Black tree ≤ 60, so up to 60 pages are read from disk. The height of a B-Tree with d=1000 is only 3. Each B-Tree node resides in a block/page, so only 3 (or 4) pages are read from disk. Disk access: 1 millisecond ($10^{-3}$ sec). Memory access: 100 nanoseconds ($10^{-7}$ sec).
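A rough worked comparison using only the slide's figures, assuming every node visited costs one disk access and ignoring all in-memory work:

```latex
\text{Red-Black tree: } 60 \times 10^{-3}\,\text{s} = 6 \times 10^{-2}\,\text{s},
\qquad
\text{B-tree } (d = 1000)\text{: } 3 \times 10^{-3}\,\text{s},
\qquad
\frac{10^{-3}\,\text{s}}{10^{-7}\,\text{s}} = 10^{4}
```

The last ratio says one disk access costs about as much as ten thousand memory accesses, which is why the extra in-node operations of a B-tree are a good trade.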
16
B-Trees – What are they good for? Large-degree B-trees are used to represent very large disk dictionaries. The minimum degree d is chosen according to the size of a disk block. Smaller-degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties. B-trees with d=2, i.e., 2-4 trees, are very similar to Red-Black trees.
17
Updates to a B-tree
18
Rotate/Steal right and Rotate/Steal left. [Figure: Rotate/Steal right and its inverse, Rotate/Steal left, on keys A and B] Number of I/Os – O(1). Number of operations – O(d).
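A sketch of the two rotations on list-based nodes. The `Node` class and the function names are assumptions; a key moves from one child through the parent's separator key into the adjacent sibling, and inserting at the front of a list is what makes the cost O(d).

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []    # empty list for a leaf

def rotate_right(x, i):
    """Move one key from x.children[i] through x.keys[i] into x.children[i+1]."""
    left, right = x.children[i], x.children[i + 1]
    right.keys.insert(0, x.keys[i])       # separator moves down to the right child
    x.keys[i] = left.keys.pop()           # largest key of the left child moves up
    if left.children:                     # internal nodes also hand over a subtree
        right.children.insert(0, left.children.pop())

def rotate_left(x, i):
    """Move one key from x.children[i+1] through x.keys[i] into x.children[i]."""
    left, right = x.children[i], x.children[i + 1]
    left.keys.append(x.keys[i])           # separator moves down to the left child
    x.keys[i] = right.keys.pop(0)         # smallest key of the right child moves up
    if right.children:
        left.children.append(right.children.pop(0))

# An emptied leaf steals from its right sibling, as in the lecture's delete(T,24) step.
x = Node([22, 28], [Node([20]), Node([]), Node([30, 40, 50])])
rotate_left(x, 1)
print(x.keys, [c.keys for c in x.children])
```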
19
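Split and Join. [Figure: Split divides a node around its middle key B into two nodes, A and C, with d−1 keys each, and moves B up to the parent; Join is the inverse] Number of I/Os – O(1). Number of operations – O(d).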
20
Insert: Insert(T,2). [Figure: the 2-4 tree before key 2 is inserted]
21
Insert: Insert(T,2). [Figure: key 2 has been added to its leaf, which now holds 1 2 3]
22
Insert: Insert(T,4). [Figure: the tree before key 4 is inserted; the target leaf holds 1 2 3]
23
Insert: Insert(T,4). [Figure: key 4 has been added to the leaf, which now holds 1 2 3 4 and overflows]
24
Split: Insert(T,4). [Figure: the overflowing leaf 1 2 3 4 is about to be split]
25
Split: Insert(T,4). [Figure: the overflowing leaf is being split]
26
Split: Insert(T,4). [Figure: after the split the leaves are 1 and 3 4, and key 2 has moved up into the parent, which now holds 2 5 10]
27
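Splitting an overflowing node. [Figure: a node with 2d keys is split into one node with d−1 keys and one with d keys, and the key B between them moves up to the parent]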
28
Another insert: Insert(T,7). [Figure: the tree before key 7 is inserted]
29
Another insert: Insert(T,7). [Figure: key 7 has been added to its leaf, which now holds 6 7]
30
and another insert: Insert(T,8). [Figure: the tree before key 8 is inserted; the target leaf holds 6 7]
31
and another insert: Insert(T,8). [Figure: key 8 has been added to the leaf, which now holds 6 7 8]
32
and the last for today: Insert(T,9). [Figure: key 9 has been added to the leaf, which now holds 6 7 8 9 and overflows]
33
Split: Insert(T,9). [Figure: the overflowing leaf 6 7 8 9 is being split]
34
Split: Insert(T,9). [Figure: the leaves are now 6 and 8 9; key 7 has moved up into the parent, which now holds 2 5 7 10 and overflows]
35
Split: Insert(T,9). [Figure: the overflowing internal node 2 5 7 10 is being split]
36
Split: Insert(T,9). [Figure: the internal nodes are now 2 and 7 10; key 5 has moved up into the root, which now holds 5 13]
37
Insert – Bottom up: Find the insertion point by a downward search. Insert the key in the appropriate place. If the current node is overflowing, split it. If its parent is now overflowing, split it, etc. Disadvantages: need both a downward scan and an upward scan; need to keep parents on a stack; nodes are temporarily overflowing.
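The bottom-up procedure can be sketched as follows, with the parents kept on an explicit stack. The `Node` class and the function name are assumptions, keys are assumed distinct, and the starting tree is reconstructed from the lecture's running example (d = 2).

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []    # empty list for a leaf

def insert_bottom_up(root, d, k):
    """Insert k (assumed not present) and return the possibly new root."""
    path = []                                  # stack of (parent, child index)
    x = root
    while x.children:                          # downward search
        i = bisect_left(x.keys, k)
        path.append((x, i))
        x = x.children[i]
    x.keys.insert(bisect_left(x.keys, k), k)   # the leaf may now overflow
    while len(x.keys) > 2 * d - 1:             # upward scan: x holds 2d keys
        right = Node(x.keys[d:], x.children[d:])      # d keys go to the new node
        up = x.keys[d - 1]                             # this key moves up
        x.keys, x.children = x.keys[:d - 1], x.children[:d]
        if path:
            parent, i = path.pop()
            parent.keys.insert(i, up)
            parent.children.insert(i + 1, right)
            x = parent
        else:                                  # the root itself overflowed
            root = Node([up], [x, right])
            x = root
    return root

root = Node([13], [
    Node([5, 10], [Node([1, 3]), Node([6]), Node([11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30, 40, 50])]),
])
for k in [2, 4, 7, 8, 9]:
    root = insert_bottom_up(root, 2, k)
print(root.keys, [c.keys for c in root.children])
```

With these five insertions the sketch ends with root keys 5 13, matching the final state of the lecture's example.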
38
Insert – Top down: While conducting the search, split full children on the search path before descending to them! When the appropriate leaf is reached, it is not full, so the new key may be added!
39
Split-Root(T). [Figure: the full root T.root is split, and a new root is created above the two halves, each with d−1 keys]
40
Split-Child(x,i). [Figure: the full child x.child[i] is split around its middle key B into two nodes, A and C, with d−1 keys each; B moves up into x at position i]
41
Insert – Top down: While conducting the search, split full children on the search path before descending to them! Number of I/Os – $O(\log_d n)$. Number of operations – $O(d \log_d n)$.
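A minimal sketch of top-down insertion, with the Split-Root and Split-Child steps written out. The class and method names are assumptions, keys are assumed distinct, and `inorder` and `leaf_depths` are helpers added only to inspect the result.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty list for a leaf

class BTree:
    def __init__(self, d):
        self.d = d
        self.root = Node()

    def _split_child(self, x, i):
        """Split the full child x.children[i] (2d-1 keys) around its middle key."""
        d, y = self.d, x.children[i]
        z = Node(y.keys[d:], y.children[d:])                  # right half: d-1 keys
        middle = y.keys[d - 1]
        y.keys, y.children = y.keys[:d - 1], y.children[:d]   # left half: d-1 keys
        x.keys.insert(i, middle)                              # middle key moves up into x
        x.children.insert(i + 1, z)

    def insert(self, k):
        d = self.d
        if len(self.root.keys) == 2 * d - 1:      # Split-Root: the tree grows by one level
            self.root = Node([], [self.root])
            self._split_child(self.root, 0)
        x = self.root
        while x.children:
            i = bisect_left(x.keys, k)
            if len(x.children[i].keys) == 2 * d - 1:   # split before descending
                self._split_child(x, i)
                if k > x.keys[i]:
                    i += 1
            x = x.children[i]
        x.keys.insert(bisect_left(x.keys, k), k)  # the leaf reached is not full

def inorder(x):
    if not x.children:
        return list(x.keys)
    out = []
    for i, c in enumerate(x.children):
        out += inorder(c)
        if i < len(x.keys):
            out.append(x.keys[i])
    return out

def leaf_depths(x, depth=0):
    if not x.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in x.children))

keys = [13, 5, 10, 15, 28, 1, 3, 6, 11, 14, 16, 17, 30, 40, 50, 2, 4, 7, 8, 9]
t = BTree(d=2)
for k in keys:
    t.insert(k)
print(inorder(t.root))
```

Because every node on the path is made non-full before it is entered, no parent stack and no upward scan are needed.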
42
Deletions from B-Trees: delete(T,26). [Figure: the example 2-4 tree before key 26 is deleted]
43
Delete: delete(T,26). [Figure: key 26 has been removed from its leaf, which now holds 24]
44
Delete: delete(T,13). [Figure: key 13 is located in an internal node]
45
Delete (Replace with predecessor): delete(T,13). [Figure: key 13 is replaced by its predecessor 12]
46
Delete: delete(T,13). [Figure: the predecessor 12 has been removed from its leaf, which now holds 11]
47
Delete: delete(T,24). [Figure: key 24 is the only key in its leaf]
48
Delete: delete(T,24). [Figure: key 24 has been removed, leaving its leaf empty]
49
Delete (steal from sibling): delete(T,24). [Figure: the empty leaf steals from its sibling: 28 moves down into the leaf, 30 moves up into the parent, and the sibling keeps 40 50]
50
Rotate/Steal right and Rotate/Steal left. [Figure: Rotate/Steal right and its inverse, Rotate/Steal left, on keys A and B]
51
Delete: delete(T,20). [Figure: key 20 is the only key in its leaf]
52
Delete: delete(T,20). [Figure: key 20 has been removed, leaving its leaf empty]
53
Delete (Join): delete(T,20). [Figure: the empty leaf is joined with its sibling; the joined leaf holds 22 28 and the parent holds 30]
54
Few more..: delete(T,22). [Figure: the tree before key 22 is deleted]
55
Few more..: delete(T,22). [Figure: key 22 has been removed from its leaf, which now holds 28]
56
Few more..: delete(T,28). [Figure: key 28 is the only key in its leaf]
57
Few more..: delete(T,28). [Figure: key 28 has been removed, leaving its leaf empty]
58
Stealing again: delete(T,28). [Figure: 30 moves down into the empty leaf, 40 moves up into the parent, and the sibling keeps 50]
59
Another one: delete(T,30). [Figure: key 30 is the only key in its leaf]
60
Another one: delete(T,30). [Figure: key 30 has been removed, leaving its leaf empty]
61
After Join: delete(T,30). [Figure: the empty leaf has been joined with its sibling into a leaf holding 40 50, leaving their parent with no keys]
62
Now we can steal: delete(T,30). [Figure: the empty internal node before it steals from its sibling]
63
Now we can steal: delete(T,30). [Figure: 15 moves down from the root into the empty node and 12 moves up into the root, which now holds 7 12]
64
More ?: delete(T,40). [Figure: the tree before key 40 is deleted]
65
Delete – Top down: While conducting the search, make sure that each child descended into contains at least d keys. How? Steal or join. Assume, at first, that the item to be deleted is in a leaf. When the item is located, it resides in a leaf containing at least d keys, so it can be removed.
66
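Delete – Top down: While conducting the search, make sure that each child you descend to contains at least d keys. If the child has only d−1 keys and an adjacent sibling has at least d keys: Rotate! (Steal). If the child and its adjacent sibling both have only d−1 keys: Join!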
67
Delete – Top down: What if the item to be deleted is in an internal node? Descend as before from the root until the item to be deleted is located. Keep a pointer to the node containing the item. Carry on descending towards the successor, making sure that nodes contain at least d keys. When the successor is found, delete it from its leaf and use it to replace the item to be deleted.
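A sketch of top-down deletion, under stated assumptions. It is not the single-pointer variant described on the slide: for a key in an internal node it uses the common recursive formulation, replacing the key with its predecessor or successor (whichever neighbouring child has a spare key) or joining the two children when neither does. Class and method names are hypothetical, and keys are assumed distinct.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty list for a leaf

class BTree:
    def __init__(self, d, root):
        self.d, self.root = d, root

    def _join(self, x, i):
        """Merge x.children[i], x.keys[i] and x.children[i+1] into one node."""
        left, right = x.children[i], x.children.pop(i + 1)
        left.keys += [x.keys.pop(i)] + right.keys
        left.children += right.children

    def _fill(self, x, i):
        """Give x.children[i] at least d keys; return its (possibly shifted) index."""
        d, c = self.d, x.children
        if i > 0 and len(c[i - 1].keys) >= d:              # steal from the left sibling
            c[i].keys.insert(0, x.keys[i - 1])
            x.keys[i - 1] = c[i - 1].keys.pop()
            if c[i - 1].children:
                c[i].children.insert(0, c[i - 1].children.pop())
        elif i + 1 < len(c) and len(c[i + 1].keys) >= d:   # steal from the right sibling
            c[i].keys.append(x.keys[i])
            x.keys[i] = c[i + 1].keys.pop(0)
            if c[i + 1].children:
                c[i].children.append(c[i + 1].children.pop(0))
        elif i > 0:                                        # join with the left sibling
            self._join(x, i - 1)
            i -= 1
        else:                                              # join with the right sibling
            self._join(x, i)
        return i

    def _delete(self, x, k):
        d = self.d
        i = bisect_left(x.keys, k)
        if i < len(x.keys) and x.keys[i] == k:
            if not x.children:                             # k is in a leaf: remove it
                x.keys.pop(i)
            elif len(x.children[i].keys) >= d:             # replace k with its predecessor
                y = x.children[i]
                while y.children:
                    y = y.children[-1]
                x.keys[i] = y.keys[-1]
                self._delete(x.children[i], x.keys[i])
            elif len(x.children[i + 1].keys) >= d:         # replace k with its successor
                y = x.children[i + 1]
                while y.children:
                    y = y.children[0]
                x.keys[i] = y.keys[0]
                self._delete(x.children[i + 1], x.keys[i])
            else:                                          # join; k moves down with it
                self._join(x, i)
                self._delete(x.children[i], k)
        elif x.children:
            if len(x.children[i].keys) < d:                # fix the child before descending
                i = self._fill(x, i)
            self._delete(x.children[i], k)

    def delete(self, k):
        self._delete(self.root, k)
        if not self.root.keys and self.root.children:      # root emptied by a join
            self.root = self.root.children[0]

# The lecture's deletion example (d = 2), reconstructed by hand.
t = BTree(2, Node([7, 15], [
    Node([3], [Node([1, 2]), Node([4, 6])]),
    Node([10, 13], [Node([8, 9]), Node([11, 12]), Node([14])]),
    Node([22, 28], [Node([20]), Node([24, 26]), Node([30, 40, 50])]),
]))
for k in [26, 13, 24, 20]:
    t.delete(k)
print(t.root.children[2].keys, [c.keys for c in t.root.children[2].children])
```

Since this variant repairs nodes on the way down, later steps can differ from the bottom-up pictures in the lecture even though the final key set is the same.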
68
Deletions from B-Trees: As always, similar to insertions but slightly more complicated (may need to replace with successor). Deletion is slightly simpler for B+-Trees.
69
B-Trees vs. B+-Trees: In a B-tree, each node contains items and keys. In a B+-tree, leaves contain items and keys, while internal nodes contain only keys to direct the search. Keys in internal nodes are either keys of existing items or keys of items that were deleted. Internal nodes may contain more keys, so overall the # of items we can store increases.