Haim Kaplan and Uri Zwick November 2014

1 Data Structures, Lecture 5: B-Trees
Haim Kaplan and Uri Zwick, November 2014

2 Idealized computation model
[Figure: CPU connected to RAM] Each instruction takes one unit of time. Each memory access takes one unit of time.

3 A more realistic model
[Figure: CPU, cache, RAM, disk] Each level is much larger but much slower. Information is moved in blocks.

4 A simplified I/O model
[Figure: CPU, RAM, disk] Each block is of size B. Count both operations and I/O operations. I/O operations are much more expensive.

5 Data structures in the I/O model
Linked lists and binary search trees behave poorly in the I/O model: each pointer followed may cause a cache miss. We need an alternative to binary search trees that is better suited to the I/O model: B-Trees!

6 A 4-node
3 keys, 4-way branch. [Figure: node with keys 10, 25, 42; branches key < 10, 10 < key < 25, 25 < key < 42, 42 < key]

7 An r-node
r−1 keys, r-way branch. [Figure: node with keys k_0, k_1, k_2, …, k_(r−3), k_(r−2) and children c_0, c_1, c_2, …, c_(r−2), c_(r−1)]

8 B-Trees / (d,2d)-Trees [Bayer-McCreight (1972)]
d is the minimum degree. Each node holds between d−1 and 2d−1 keys (each non-leaf node has between d and 2d children). The root is special: it has between 1 and 2d−1 keys (and between 2 and 2d children, if it is not a leaf). All leaves are at the same depth.

9 A (2,4)-tree
B-Tree with minimal degree d=2. [Figure: example tree; keys shown: 13 15 28 1 3 5 7 11 14 16 17] Insert 10, no problem. Insert 18?

10 Node structure
[Figure: node with keys k_1, k_2, …, k_(r−3), k_(r−2) and children c_0, c_1, c_2, …, c_(r−2), c_(r−1)] Room for 2d−1 keys and 2d child pointers. r: the actual degree. key[0],…,key[r−2]: the keys. item[0],…,item[r−2]: the associated items. child[0],…,child[r−1]: the children. leaf: is the node a leaf? Possibly a different representation for leaves.
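The field list above maps directly onto a small record type. The following is only a sketch in Python: the class name `BTreeNode`, the use of Python lists instead of fixed arrays of size 2d−1 and 2d, and the helper `is_full` are assumptions made for illustration, not part of the lecture.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class BTreeNode:
    """One node of a (d,2d)-tree; the fields mirror the slide's list."""
    leaf: bool = True                                    # is the node a leaf?
    keys: List[Any] = field(default_factory=list)        # key[0..r-2], kept sorted
    items: List[Any] = field(default_factory=list)       # item[0..r-2]
    children: List["BTreeNode"] = field(default_factory=list)  # child[0..r-1]

    @property
    def r(self) -> int:
        """Actual degree: one more than the number of keys."""
        return len(self.keys) + 1

    def is_full(self, d: int) -> bool:
        """A node is full when it holds the maximum of 2d-1 keys."""
        return len(self.keys) == 2 * d - 1

# The 4-node with keys 10, 25, 42 is full when d = 2.
node = BTreeNode(keys=[10, 25, 42], items=["a", "b", "c"])
```

A disk-based implementation would more likely lay a node out as a fixed-size block, so that one node fills exactly one disk block.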

11 The height of B-Trees
At depth 1 there are at least 2 nodes. At depth 2 there are at least 2d nodes. At depth 3 there are at least 2d^2 nodes. … At depth h there are at least 2d^(h−1) nodes.
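The per-depth node counts above yield a height bound. As a sketch of the missing step (assuming the root is at depth 0 and holds at least one key, and every other node holds at least d−1 keys, as in the definition), a tree with n keys whose leaves are at depth h satisfies:

```latex
n \;\ge\; 1 + (d-1)\sum_{i=1}^{h} 2d^{\,i-1}
  \;=\; 1 + 2\,(d^{h}-1)
  \;=\; 2d^{h}-1
\quad\Longrightarrow\quad
h \;\le\; \log_d \frac{n+1}{2}.
```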

12 B-Trees – What are they good for?
Large-degree B-trees are used to represent very large dictionaries stored on disks. The minimum degree d is chosen according to the size of a disk block. Smaller-degree B-trees are used for internal-memory dictionaries to reduce the number of cache misses. B-trees with d=2, i.e., (2,4)-trees, are very similar to Red-Black trees.

13 WAVL Trees vs. B-Trees
n = 2^30 ≈ 10^9. 30 ≤ height of WAVL tree ≤ 60, so up to 60 pages are read from disk. The height of a B-Tree with d = 2^10 = 1024 is only 3. Each B-Tree node resides in a block/page, so only 3 (or 4) pages are read from disk. Disk access ≈ 1 millisecond (10^−3 sec). Memory access ≈ 100 nanoseconds (10^−7 sec).
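The B-tree figure can be checked with integer arithmetic. The sketch below assumes the bound n ≥ 2d^h − 1 for a tree whose leaves are at depth h (root at depth 0), which follows from the per-depth node counts on the height slide. The function name `max_btree_depth` is hypothetical.

```python
def max_btree_depth(n: int, d: int) -> int:
    """Largest h with 2*d**h - 1 <= n: the deepest the leaves can be."""
    h = 0
    while 2 * d ** (h + 1) - 1 <= n:
        h += 1
    return h

n = 2 ** 30          # about 10^9 keys
d = 2 ** 10          # minimum degree 1024
depth = max_btree_depth(n, d)
pages = depth + 1    # a search reads one node (page) per level
print("leaf depth <=", depth, "; pages read per search <=", pages)
```

Under this bound the leaves are at depth at most 2, i.e. the tree has at most three levels, which agrees with the 3 pages quoted on the slide.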

14 Search
Look for k in node x. Look for k in the subtree of node x. Number of I/Os: log_d n. Number of operations: O(d log_d n). Number of operations with binary search: O(log_2 d · log_d n) = O(log_2 n).
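A minimal sketch of this search in Python, using binary search inside each node via the standard-library `bisect_left`. The `Node` class and the example tree are assumptions made for illustration. Each move to a child stands for one block read.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted keys of this node
        self.children = children or []    # empty list means the node is a leaf

def search(x, k):
    """Return (node, index) where k is stored in the subtree of x, or None."""
    while True:
        i = bisect_left(x.keys, k)        # binary search inside node x
        if i < len(x.keys) and x.keys[i] == k:
            return x, i
        if not x.children:                # reached a leaf without finding k
            return None
        x = x.children[i]                 # descend: one I/O in the disk model

# A small (2,4)-tree built by hand for the example.
root = Node([13], [
    Node([5], [Node([1, 3]), Node([7, 11])]),
    Node([15, 28], [Node([14]), Node([16, 17]), Node([30])]),
])
hit = search(root, 16)
miss = search(root, 6)
```

Here `hit` is the leaf holding 16 together with index 0, and `miss` is `None`.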

15 Splitting an overflowing node
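(I.e., a node with 2d keys / 2d+1 children.) [Figure: the overflowing node is split around its middle key B, which moves up into the parent next to A and C; the two resulting nodes hold d and d−1 keys]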

16 Insert: Insert(T,2) [Figure keys: 13 15 28 1 3 6 11 14 16 17]

17 Insert: Insert(T,2) [Figure keys: 13 15 28 6 11 14 16 17]

18 Insert: Insert(T,4) [Figure keys: 13 15 28 6 11 14 16 17]

19 Insert: Insert(T,4) [Figure keys: 13 15 28 6 11 14 16 17]

20 Split: Insert(T,4) [Figure keys: 13 15 28 6 11 14 16 17]

21 Split: Insert(T,4) [Figure keys: 13 15 28 2 1 3 4 6 11 14 16 17]

22 Split: Insert(T,4) [Figure keys: 13 15 28 1 3 4 6 11 14 16 17]

23 Splitting an overflowing node
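(I.e., a node with 2d keys / 2d+1 children.) [Figure: the overflowing node is split around its middle key B, which moves up into the parent next to A and C; the two resulting nodes hold d and d−1 keys]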

24 Splitting an overflowing root
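[Figure: the overflowing root is split around its middle key C, which becomes the only key of the new T.root; the two halves hold d and d−1 keys] Number of I/Os: O(1). Number of operations: O(d).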

25 Another insert: Insert(T,7) [Figure keys: 13 15 28 1 3 4 6 11 14 16 17]

26 Another insert: Insert(T,7) [Figure keys: 13 15 28 1 3 4 6 7 11 14 16 17]

27 and another insert: Insert(T,8) [Figure keys: 13 15 28 1 3 4 6 7 11 14 16 17]

28 and another insert: Insert(T,8) [Figure keys: 13 15 28 1 3 4 11 14 16 17]

29 and the last for today: Insert(T,9) [Figure keys: 13 15 28 1 3 4 11 14 16 17]

30 Split: Insert(T,9) [Figure keys: 13 15 28 7 1 3 4 6 8 9 11 14 16 17]

31 Split: Insert(T,9) [Figure keys: 13 15 28 1 3 4 6 8 9 11 14 16 17]

32 Split: Insert(T,9) [Figure keys: 13 5 2 15 28 1 3 4 6 8 9 11 14 16 17]

33 Split: Insert(T,9) [Figure keys: 5 13 2 15 28 1 3 4 6 8 9 11 14 16 17]

34 Insert – Bottom-up
Find the insertion point by a downward search. Insert the key in the appropriate place. If the current node is overflowing, split it. If its parent is now overflowing, split it, etc. Disadvantages: need both a downward scan and an upward scan; nodes are temporarily overflowing; need to keep parents on a stack. Note: we do not maintain parent pointers. (Why?)
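The bottom-up procedure can be sketched in Python as follows. The classes `Node` and `BTree` and the helper `inorder` are hypothetical names, keys are assumed distinct, and items are omitted. The explicit `stack` plays the role of the missing parent pointers, and a node is split when it temporarily holds 2d keys, leaving d keys on the left and d−1 on the right.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty for a leaf

class BTree:
    def __init__(self, d):
        self.d = d
        self.root = Node()

    def insert(self, k):
        # Downward scan: remember (parent, child index) pairs on a stack.
        stack, x = [], self.root
        while x.children:
            i = bisect_left(x.keys, k)
            stack.append((x, i))
            x = x.children[i]
        x.keys.insert(bisect_left(x.keys, k), k)
        # Upward scan: split while the current node overflows (2d keys).
        d = self.d
        while len(x.keys) == 2 * d:
            mid = x.keys[d]
            right = Node(x.keys[d + 1:], x.children[d + 1:])   # d-1 keys
            x.keys, x.children = x.keys[:d], x.children[:d + 1]  # d keys
            if stack:
                parent, i = stack.pop()
            else:                           # overflowing root: the tree grows
                parent, i = Node([], [x]), 0
                self.root = parent
            parent.keys.insert(i, mid)
            parent.children.insert(i + 1, right)
            x = parent

def inorder(x):
    """Keys of the subtree of x in sorted order (used for checking)."""
    if not x.children:
        return list(x.keys)
    out = []
    for i, c in enumerate(x.children):
        out += inorder(c)
        if i < len(x.keys):
            out.append(x.keys[i])
    return out

t = BTree(2)
for key in range(1, 14):
    t.insert(key)
```

Inserting 1 through 13 in increasing order into a (2,4)-tree ends with a root split, leaving the single key 9 in the root.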

35 Exercise: (d,2d−1)-Trees
Show that essentially the same bottom-up insertion technique also works for (d,2d−1)-Trees. (d,2d)-Trees are better than (d,2d−1)-Trees for at least two reasons: they allow top-down insertions and deletions, and the amortized number of split/fuse operations per insertion/deletion is O(1).

36 Insert – Top-down
While conducting the search, split full children on the search path before descending to them! If the root is full, split it before starting this process. When the appropriate leaf is reached, it is not full, so the new key may be added!

37 Split-Root(T)
[Figure: the full root is split around its middle key C, which becomes the only key of the new T.root; each half keeps d−1 keys] Number of I/Os: O(1). Number of operations: O(d).

38 Split-Child(x,i)
[Figure: the full child x.child[i] is split around its middle key B, which moves up into x at position key[i], next to A and C; each half keeps d−1 keys] Number of I/Os: O(1). Number of operations: O(d).

39 Insert – Top-down
While conducting the search, split full children on the search path before descending to them! Number of I/Os: O(log_d n). Number of operations: O(d log_d n). Amortized number of splits ≤ 1/(d−1). (See bonus material.)
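A sketch of top-down insertion in Python, combining Split-Root and Split-Child in a single downward pass. The names `Node`, `split_child`, `insert` and `inorder` are assumptions made for illustration, keys are assumed distinct, and items are omitted.

```python
from bisect import bisect_left

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty for a leaf

def split_child(x, i, d):
    """Split the full child x.children[i] (2d-1 keys) around its median."""
    y = x.children[i]
    z = Node(y.keys[d:], y.children[d:])   # right half: d-1 keys
    x.keys.insert(i, y.keys[d - 1])        # median moves up to x.keys[i]
    x.children.insert(i + 1, z)
    y.keys, y.children = y.keys[:d - 1], y.children[:d]  # left half: d-1 keys

def insert(root, k, d):
    """Insert k in one downward pass; returns the (possibly new) root."""
    if len(root.keys) == 2 * d - 1:        # Split-Root
        root = Node([], [root])
        split_child(root, 0, d)
    x = root
    while x.children:
        i = bisect_left(x.keys, k)
        if len(x.children[i].keys) == 2 * d - 1:
            split_child(x, i, d)           # split before descending
            if k > x.keys[i]:
                i += 1
        x = x.children[i]
    x.keys.insert(bisect_left(x.keys, k), k)   # the leaf is not full here
    return root

def inorder(x):
    """Keys of the subtree of x in sorted order (used for checking)."""
    if not x.children:
        return list(x.keys)
    out = []
    for i, c in enumerate(x.children):
        out += inorder(c)
        if i < len(x.keys):
            out.append(x.keys[i])
    return out

root = Node()
for key in range(1, 10):
    root = insert(root, key, 2)
```

No stack of parents is needed. The price is that a full node on the search path is split even when the insertion below it would not have overflowed.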

40 Number of splits (Insertions only)
Bonus material. If n items are inserted into an initially empty (d,2d)-tree, then the total number of splits is at most n/(d−1). Amortized number of splits per insert ≤ 1/(d−1).
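One way to see this bound is a counting argument, which is not necessarily the one used in the bonus material. Let N be the number of nodes after the n insertions and s the number of splits. The tree starts with one node and every split creates at least one new node, so s ≤ N − 1. Every non-root node holds at least d−1 keys and the root of a tree with more than one node holds at least one key, hence:

```latex
n \;\ge\; 1 + (N-1)(d-1)
\quad\Longrightarrow\quad
s \;\le\; N-1 \;\le\; \frac{n-1}{d-1} \;<\; \frac{n}{d-1}.
```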

41 Deletions from B-Trees
As always, deletions are similar to insertions, but slightly more complicated. To delete an item in an internal node, replace it by its successor and delete the successor. Deletion is slightly simpler for B+-Trees.

42 B-Trees vs. B+-Trees
In a B-tree each node contains items and keys. In a B+-tree leaves contain items and keys. Internal nodes contain keys to direct the search. Keys in internal nodes are either keys of existing items, or keys of items that were deleted. Internal nodes may contain more keys. When d is large, the extra space needed is negligible. We continue with B-trees.

43 Delete: delete(T,26) [Figure keys: 7 15 3 10 13 22 28 20 24 26 1 2 4 6 14 8 9 11 12]

44 Delete: delete(T,26) [Figure keys: 7 15 3 10 13 22 28 20 24 1 2 4 6 14 8 9 11 12]

45 Delete: delete(T,13) [Figure keys: 7 15 3 10 13 22 28 20 24 1 2 4 6 14 8 9 11 12]

46 Delete (Replace with predecessor): delete(T,13) [Figure keys: 7 15 3 10 12 22 28 20 24 1 2 4 6 14 8 9 11 12]

47 Delete: delete(T,13) [Figure keys: 7 15 3 10 12 22 28 11 20 24 1 2 4 6 14 8 9]

48 Delete: delete(T,24) [Figure keys: 7 15 3 10 12 22 28 11 20 24 1 2 4 6 14 8 9]

49 Delete: delete(T,24) [Figure keys: 7 15 3 10 12 22 28 11 20 1 2 4 6 14 8 9]

50 Delete (borrow from sibling): delete(T,24) [Figure keys: 7 15 3 10 12 22 30 1 2 4 6 8 9 11 14 20 28 40 50]

51 Borrow from left / Borrow from right
[Figure: a key is rotated between two sibling nodes through their parent; labels A, B]

52 Delete: delete(T,20) [Figure keys: 7 15 3 10 12 22 30 1 2 4 6 8 9 11 14 20 28 40 50]

53 Delete: delete(T,20) [Figure keys: 7 15 3 10 12 22 30 1 2 4 6 8 9 11 14 28 40 50]

54 Delete (Fuse): delete(T,20) [Figure keys: 7 15 3 10 12 30 1 2 4 6 8 9 11 14 22 28 40 50]

55 Few more…: delete(T,22) [Figure keys: 7 15 3 10 12 30 1 2 4 6 8 9 11 14 22 28 40 50]

56 Few more…: delete(T,22) [Figure keys: 7 15 3 10 12 30 1 2 4 6 8 9 11 14 28 40 50]

57 Few more…: delete(T,28) [Figure keys: 7 15 3 10 12 30 40 50 11 28 1 2 4 6 14 8 9]

58 Few more…: delete(T,28) [Figure keys: 7 15 3 10 12 30 40 50 11 1 2 4 6 14 8 9]

59 Borrowing again: delete(T,28) [Figure keys: 7 15 3 10 12 40 1 2 4 6 8 9 11 14 30 50]

60 Another one: delete(T,30) [Figure keys: 7 15 3 10 12 40 1 2 4 6 8 9 11 14 30 50]

61 Another one: delete(T,30) [Figure keys: 7 15 3 10 12 40 1 2 4 6 8 9 11 14 50]

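62 Fuse
[Figure: an underflowing node with d−2 keys, the separating parent key B, and a sibling with d−1 keys are merged into a single node; labels A, C, B]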

63 After Fuse: delete(T,30) [Figure keys: 7 15 3 10 12 1 2 4 6 8 9 11 14 40 50]

64 Now we can borrow: delete(T,30) [Figure keys: 7 15 3 10 12 1 2 4 6 8 9 11 14 40 50]

65 Now we can borrow: delete(T,30) [Figure keys: 7 12 3 10 15 40 50 1 2 4 6 11 14 8 9]

66 Delete – Bottom-up
If the item to be deleted is not in a leaf, replace it by its successor and delete the successor. Delete an item from a leaf. If the current node is underflowing, i.e., has fewer than d−1 keys, either borrow an item from a sibling, or fuse with a sibling. Borrowing fixes the problem. Fusing may make the parent underflow.
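The two repair primitives can be sketched in Python as follows. `Node`, `borrow_from_left`, `borrow_from_right` and `fuse` are hypothetical names. Here `x` is the parent and `i` an index into `x.children`. The examples use d = 2, where an underflowing node has no keys at all.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []     # empty for a leaf

def borrow_from_left(x, i):
    """Rotate one key from x.children[i-1] through x into x.children[i]."""
    left, c = x.children[i - 1], x.children[i]
    c.keys.insert(0, x.keys[i - 1])        # separator moves down
    x.keys[i - 1] = left.keys.pop()        # left's largest key moves up
    if left.children:                      # internal nodes also move a subtree
        c.children.insert(0, left.children.pop())

def borrow_from_right(x, i):
    """Rotate one key from x.children[i+1] through x into x.children[i]."""
    c, right = x.children[i], x.children[i + 1]
    c.keys.append(x.keys[i])
    x.keys[i] = right.keys.pop(0)
    if right.children:
        c.children.append(right.children.pop(0))

def fuse(x, i):
    """Merge x.children[i], the separator x.keys[i] and x.children[i+1]."""
    c, right = x.children[i], x.children.pop(i + 1)
    c.keys += [x.keys.pop(i)] + right.keys
    c.children += right.children           # x loses a key and may now underflow

# Borrow: the empty right child takes 22 from the parent, which takes 20.
p = Node([22], [Node([14, 20]), Node([])])
borrow_from_left(p, 1)

# Fuse: neither sibling of the empty middle child can spare a key.
q = Node([15, 30], [Node([14]), Node([]), Node([40])])
fuse(q, 1)
```

After the calls, `p` has key 20 with children [14] and [22], and `q` has key 15 with children [14] and [30, 40]. Since `q` lost a key, a bottom-up delete would next check whether `q` itself underflows.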

67 Borrow from left / Borrow from right
[Figure: a key is rotated between two sibling nodes through their parent; labels A, B]

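68 Fuse
[Figure: an underflowing node with d−2 keys, the separating parent key B, and a sibling with d−1 keys are merged into a single node; labels A, C, B]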

69 Delete – Top-down
Assume, at first, that the item to be deleted is in a leaf. While conducting the search, make sure that each child descended into contains at least d keys. How? Use Borrow or Fuse. When the item is located, it resides in a leaf containing at least d keys, so it can be removed.

70 Delete – Top down
While conducting the search, make sure that each child you descend to contains at least d keys. [Figure: a child with d−1 keys next to a sibling with ≥ d keys: Borrow; a child with d−1 keys next to a sibling with d−1 keys: Fuse]

71 Delete – Top down
What if the item to be deleted is in an internal node? Descend as before from the root until the item to be deleted is located. Keep a pointer to the node containing the item. Carry on descending towards the successor, making sure that nodes contain at least d keys. When the successor is found, delete it from its leaf and use it to replace the item to be deleted.

72 Number of fuse/splits (With bottom-up Insert/Delete)
Bonus material. The number of split and fuse operations in a sequence of m insert and delete operations on an initially empty (d,2d)-tree is O(m). The amortized number of splits/fuses per update is O(1). Φ(2d-node) = 2, Φ(d-node) = 1. With top-down insertions and deletions, the amortized number of splits/fuses may be Ω(log_d n).

