Data Structures Haim Kaplan and Uri Zwick November 2012 Lecture 5 B-Trees.


1 Data Structures Haim Kaplan and Uri Zwick November 2012 Lecture 5 B-Trees

2 A 4-node: 3 keys (10, 25, 42) and a 4-way branch. The four subtrees hold the keys with key < 10, 10 < key < 25, 25 < key < 42, and 42 < key.

3 An r-node: r−1 keys k_0, k_1, k_2, …, k_{r−3}, k_{r−2} and an r-way branch to the children c_0, c_1, c_2, …, c_{r−2}, c_{r−1}.

4 B-Trees (with minimum degree d). Each node holds between d−1 and 2d−1 keys. Each non-leaf node has between d and 2d children. The root is special: it has between 1 and 2d−1 keys, and between 2 and 2d children (if not a leaf). All leaves are at the same depth.

5 A 2-4 tree: a B-Tree with minimum degree d=2. [Figure: example 2-4 tree]

6 Node structure. r – the degree. key[0],…,key[r−2] – the keys. item[0],…,item[r−2] – the associated items. child[0],…,child[r−1] – the children. leaf – is the node a leaf? Possibly a different representation for leaves.
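The node structure and the B-tree conditions from the earlier slide can be sketched in Python as follows. This is only an illustration: the class name `BNode` and the helper `check` are hypothetical, the degree r and the leaf flag are derived from the lists rather than stored, and the checker does not verify that the keys of a child lie between the neighbouring keys of its parent.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class BNode:
    key: List[Any] = field(default_factory=list)      # key[0..r-2]
    item: List[Any] = field(default_factory=list)     # item[0..r-2]
    child: List["BNode"] = field(default_factory=list)  # child[0..r-1], empty for a leaf

    @property
    def leaf(self) -> bool:
        return not self.child

    @property
    def r(self) -> int:
        # The degree: an r-node holds r-1 keys.
        return len(self.key) + 1


def check(x: BNode, d: int, is_root: bool = True) -> int:
    """Return the depth of the leaves below x; raise AssertionError if a condition fails."""
    min_keys = 1 if is_root else d - 1
    assert min_keys <= len(x.key) <= 2 * d - 1, "wrong number of keys"
    assert x.key == sorted(x.key), "keys inside a node must be sorted"
    if x.leaf:
        return 0
    assert len(x.child) == x.r, "an r-node must have r children"
    depths = {check(c, d, is_root=False) for c in x.child}
    assert len(depths) == 1, "all leaves must be at the same depth"
    return depths.pop() + 1
```

For example, `check(BNode(key=[10], child=[BNode(key=[1, 5]), BNode(key=[12])]), d=2)` returns 1, the depth of the leaves.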

7 The height of B-Trees. At depth 1 we have at least 2 nodes. At depth 2 we have at least 2d nodes. At depth 3 we have at least 2d^2 nodes. … At depth h we have at least 2d^(h−1) nodes.
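These per-level counts lead to a bound on the height. A sketch of the arithmetic, assuming n is the number of keys, h is the depth of the leaves, the root holds at least one key, and every other node holds at least d−1 keys:

```latex
n \;\ge\; 1 + (d-1)\sum_{i=1}^{h} 2d^{\,i-1}
  \;=\; 1 + 2(d-1)\cdot\frac{d^{h}-1}{d-1}
  \;=\; 2d^{h} - 1
\quad\Longrightarrow\quad
h \;\le\; \log_d \frac{n+1}{2}.
```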

8 Number of nodes accessed – log_d n. Number of operations – O(d log_d n). Number of ops with binary search – O(log_2 d · log_d n) = O(log_2 n). Look for k in node x. Look for k in the subtree of node x.
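A minimal Python sketch of this search, using binary search inside each node. The `Node` class here is a hypothetical stand-in that follows the field names of the node-structure slide; keys are assumed to be distinct and sorted inside each node.

```python
import bisect


class Node:
    def __init__(self, key, item, child=()):
        self.key = list(key)      # sorted keys
        self.item = list(item)    # item[i] belongs to key[i]
        self.child = list(child)  # empty for a leaf
        self.leaf = not self.child


def search(x, k):
    """Look for k in the subtree of node x; return its item, or None if absent."""
    while True:
        # Look for k in node x: O(log d) comparisons with binary search.
        i = bisect.bisect_left(x.key, k)
        if i < len(x.key) and x.key[i] == k:
            return x.item[i]
        if x.leaf:
            return None
        # k can only be in the i-th subtree; one more node is accessed.
        x = x.child[i]
```

Each loop iteration touches one node, so the number of iterations is the number of nodes accessed.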

9 B-Trees vs binary search trees. Wider and shallower. Access fewer nodes during search, but may take more operations.

10 B-Trees – What are they good for?

11 The hardware structure: CPU, Cache, RAM, Disk. Each memory level is much larger but much slower than the previous one. Information is moved in blocks.

12 A simplified I/O model: CPU, RAM, Disk. Each block is of size m. Count both operations and I/O operations.

13 Data structures in the I/O model. Linked lists and search trees behave poorly in the I/O model: each pointer followed may cause a disk access. B-trees reduce the worst-case number of I/Os: pick d such that a node fits in a block. Each node (struct) is allocated contiguously. It is harder to control which disk blocks contain different nodes.
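As a rough illustration of "pick d such that a node fits in a block", the sketch below computes the largest d for which a full node fits. The function name, the parameter names and the byte sizes are assumptions made for the example; per-node overhead such as a leaf flag or a key count is ignored.

```python
def max_min_degree(block_size, key_size, item_size, pointer_size):
    """Largest d such that a full node fits in one block.

    A full node stores 2d-1 keys, 2d-1 items and 2d child pointers, so we need
    (2d - 1) * (key_size + item_size) + 2d * pointer_size <= block_size.
    """
    pair = key_size + item_size
    # Rearranged: 2d * (pair + pointer_size) <= block_size + pair.
    return (block_size + pair) // (2 * (pair + pointer_size))


# Hypothetical sizes: 4096-byte blocks, 8-byte keys, items and pointers.
print(max_min_degree(4096, 8, 8, 8))  # 85
```

With these assumed sizes a full node with d=85 uses 169·16 + 170·8 = 4064 bytes, while d=86 would need 4112 bytes and no longer fit.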

14 Number of nodes accessed (I/Os) – log_d n. Number of operations – O(d log_d n). Number of ops with binary search – O(log_2 d · log_d n) = O(log_2 n). Look for k in node x. Look for k in the subtree of node x.

15 Red-Black Trees vs. B-Trees. n = 2^30 ≈ 10^9. 30 ≤ height of Red-Black Tree ≤ 60: up to 60 pages read from disk. Height of B-Tree with d=1000 is only 3. Each B-Tree node resides in a block/page: only 3 (or 4) pages read from disk. Disk access ≈ 1 millisecond (10^-3 sec). Memory access ≈ 100 nanoseconds (10^-7 sec).
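The numbers on this slide can be reproduced with a short calculation. This is a sketch under stated assumptions: it counts levels (height plus one), uses the fact that a B-tree of height h holds at least 2d^h − 1 keys, takes the slide's bound of 60 for the Red-Black tree, and charges one disk access per level; the function name is hypothetical.

```python
def max_btree_levels(n, d):
    """Largest number of levels a B-tree with n keys and minimum degree d can have."""
    h = 0
    # A tree of height h+1 needs at least 2*d**(h+1) - 1 keys.
    while 2 * d ** (h + 1) - 1 <= n:
        h += 1
    return h + 1  # levels = height + 1


n = 2 ** 30
rb_max_pages = 2 * (n.bit_length() - 1)   # 2 * log2(n) = 60
bt_max_pages = max_btree_levels(n, 1000)  # 3
DISK_ACCESS_US = 1000                     # about 1 millisecond per disk access

print(rb_max_pages * DISK_ACCESS_US)  # 60000 microseconds
print(bt_max_pages * DISK_ACCESS_US)  # 3000 microseconds
```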

16 B-Trees – What are they good for? Large degree B-trees are used to represent very large disk dictionaries. The minimum degree d is chosen according to the size of a disk block. Smaller degree B-trees are used for internal-memory dictionaries to overcome cache-miss penalties. B-trees with d=2, i.e., 2-4 trees, are very similar to Red-Black trees.

17 Updates to a B-tree

18 Rotate/Steal right and Rotate/Steal left. [Figure: a rotation between two adjacent siblings through the separating key in their parent] Number of I/Os – O(1). Number of operations – O(d).
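A minimal sketch of one rotation in Python. The `Node` class and the function name `steal_right` are hypothetical, and the direction is an assumption: here "right" means that a key travels from the left sibling, through the parent, into the right sibling.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list for a leaf


def steal_right(parent, i):
    """Move one key from child i to child i+1 through parent.keys[i]."""
    left, right = parent.children[i], parent.children[i + 1]
    # The separating key moves down to the front of the right sibling: O(d) shifts.
    right.keys.insert(0, parent.keys[i])
    # The largest key of the left sibling moves up to replace it.
    parent.keys[i] = left.keys.pop()
    # For internal nodes the last subtree of the left sibling moves across too.
    if left.children:
        right.children.insert(0, left.children.pop())
```

Only the parent and the two siblings are touched, which matches the O(1) I/Os stated on the slide; a mirror-image function would handle the other direction.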

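19 Split and Join. [Figure: Split divides a node with 2d−1 keys into two nodes with d−1 keys each, and its middle key B moves up into the parent between A and C; Join is the reverse] Number of I/Os – O(1). Number of operations – O(d).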
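Join can be sketched in a few lines of Python. The `Node` class and the function name are hypothetical; the sketch assumes the two siblings are adjacent children of `parent` and that the merged node does not exceed 2d−1 keys.

```python
class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list for a leaf


def join(parent, i):
    """Merge child i, the separator parent.keys[i] and child i+1 into one node."""
    left = parent.children[i]
    right = parent.children.pop(i + 1)
    # The separating key moves down between the keys of the two siblings.
    left.keys.append(parent.keys.pop(i))
    left.keys.extend(right.keys)
    left.children.extend(right.children)
    return left
```

Joining two siblings with d−1 keys each yields a node with exactly 2d−1 keys, the state that Split starts from.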

20 Insert: Insert(T,2). [Figure: the tree before the insertion]

21 Insert: Insert(T,2). [Figure: 2 is added to its leaf, which now holds 1 2 3]

22 Insert: Insert(T,4). [Figure: the tree before the insertion]

23 Insert: Insert(T,4). [Figure: 4 is added to the same leaf, which now holds 1 2 3 4 and overflows]

24 Split: Insert(T,4). [Figure: the overflowing leaf 1 2 3 4 is about to be split]

25 Split: Insert(T,4). [Figure: the leaf is divided around the key 2]

26 Split: Insert(T,4). [Figure: 2 moves up into the parent, which now holds 2 5 10; the new leaves are 1 and 3 4]

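27 Splitting an overflowing node. [Figure: an overflowing node with 2d keys is divided into a node with d−1 keys and a node with d keys; the key B between them moves up into the parent between A and C]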

28 Another insert: Insert(T,7). [Figure: the tree before the insertion]

29 Another insert: Insert(T,7). [Figure: 7 is added to the leaf holding 6]

30 and another insert: Insert(T,8). [Figure: the tree before the insertion]

31 and another insert: Insert(T,8). [Figure: the leaf now holds 6 7 8]

32 and the last for today: Insert(T,9). [Figure: the leaf now holds 6 7 8 9 and overflows]

33 Split: Insert(T,9). [Figure: the overflowing leaf is divided around the key 7]

34 Split: Insert(T,9). [Figure: 7 moves up into the parent, which now holds 2 5 7 10 and overflows]

35 Split: Insert(T,9). [Figure: the overflowing parent is divided around the key 5]

36 Split: Insert(T,9). [Figure: 5 moves up into the root, next to 13]

37 Insert – Bottom up. Find the insertion point by a downward search. Insert the key in the appropriate place. If the current node is overflowing, split it. If its parent is now overflowing, split it, etc. Disadvantages: need both a downward scan and an upward scan; need to keep parents on a stack; nodes are temporarily overflowing.

38 Insert – Top down. While conducting the search, split full children on the search path before descending to them! When the appropriate leaf is reached, it is not full, so the new key may be added!

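39 Split-Root(T). [Figure: the full node T.root is divided into two nodes with d−1 keys each; its middle key C becomes the only key of a new root]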

40 Split-Child(x,i). [Figure: the full child x.child[i] is divided into two nodes with d−1 keys each; its middle key B moves up into x as key[i], between A and C]

41 Insert – Top down. While conducting the search, split full children on the search path before descending to them! Number of I/Os – O(log_d n). Number of operations – O(d log_d n).
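A minimal Python sketch of top-down insertion, combining Split-Root, Split-Child and the downward search. The class names and the list-based node layout are assumptions made for the example, keys are assumed to be distinct, and associated items are left out for brevity.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # empty list for a leaf


class BTree:
    def __init__(self, d):
        self.d = d          # minimum degree
        self.root = Node()


def split_child(x, i, d):
    """Split the full child x.children[i] (2d-1 keys) around its middle key."""
    y = x.children[i]
    z = Node(y.keys[d:], y.children[d:])   # upper d-1 keys (and d children)
    x.keys.insert(i, y.keys[d - 1])        # the middle key moves up into x
    x.children.insert(i + 1, z)
    y.keys, y.children = y.keys[:d - 1], y.children[:d]


def insert(tree, k):
    d = tree.d
    if len(tree.root.keys) == 2 * d - 1:   # Split-Root: the tree grows by one level
        tree.root = Node([], [tree.root])
        split_child(tree.root, 0, d)
    x = tree.root
    while x.children:
        i = bisect.bisect_left(x.keys, k)
        if len(x.children[i].keys) == 2 * d - 1:
            split_child(x, i, d)           # split before descending
            if k > x.keys[i]:              # k belongs to the new right half
                i += 1
        x = x.children[i]
    bisect.insort(x.keys, k)               # the leaf reached is not full
```

Because every full child is split before the search descends into it, no node ever overflows and no upward pass or stack of parents is needed.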

42 Deletions from B-Trees: delete(T,26). [Figure: the tree before the deletion; 26 is in a leaf]

43 Delete: delete(T,26). [Figure: 26 has been removed from its leaf]

44 Delete: delete(T,13). [Figure: the tree before the deletion; 13 is in an internal node]

45 Delete (Replace with predecessor): delete(T,13). [Figure: 13 is replaced by its predecessor, 12]

46 Delete: delete(T,13). [Figure: the tree after the deletion]

47 Delete: delete(T,24). [Figure: the tree before the deletion]

48 Delete: delete(T,24). [Figure: 24 has been removed from its leaf]

49 Delete (steal from sibling): delete(T,24). [Figure: a key is stolen from a sibling through the parent]

50 Rotate/Steal right and Rotate/Steal left. [Figure: a rotation between two adjacent siblings through the separating key in their parent]

51 Delete: delete(T,20). [Figure: the tree before the deletion]

52 Delete: delete(T,20). [Figure: 20 has been removed from its leaf]

53 Delete (Join): delete(T,20). [Figure: the tree after a join]

54 Few more..: delete(T,22). [Figure: the tree before the deletion]

55 Few more..: delete(T,22). [Figure: the tree after the deletion]

56 Few more..: delete(T,28). [Figure: the tree before the deletion]

57 Few more..: delete(T,28). [Figure: 28 has been removed]

58 Stealing again: delete(T,28). [Figure: the tree after a steal]

59 Another one: delete(T,30). [Figure: the tree before the deletion]

60 Another one: delete(T,30). [Figure: 30 has been removed]

61 After Join: delete(T,30). [Figure: the tree after a join]

62 Now we can steal: delete(T,30). [Figure: the tree before the steal]

63 Now we can steal: delete(T,30). [Figure: the tree after the steal]

64 More? delete(T,40). [Figure: the tree before the deletion]

65 Delete – Top down. While conducting the search, make sure that each child descended into contains at least d keys. How? Steal or join. Assume, at first, that the item to be deleted is in a leaf. When the item is located, it resides in a leaf containing at least d keys, so it can be removed.

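66 Delete – Top down. While conducting the search, make sure that each child you descend to contains at least d keys. If the child has d−1 keys and an adjacent sibling has at least d keys: Rotate! (Steal). If the child and its sibling both have d−1 keys: Join!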

67 Delete – Top down. What if the item to be deleted is in an internal node? Descend as before from the root until the item to be deleted is located. Keep a pointer to the node containing the item. Carry on descending towards the successor, making sure that nodes contain at least d keys. When the successor is found, delete it from its leaf and use it to replace the item to be deleted.

68 Deletions from B-Trees. As always, similar to, but slightly more complicated than, insertions (may need to replace with successor). Deletion is slightly simpler for B+-Trees.

69 B-Trees vs. B+-Trees. In a B-tree each node contains items and keys. In a B+-tree leaves contain items and keys; internal nodes contain keys to direct the search. Keys in internal nodes are either keys of existing items, or keys of items that were deleted. Internal nodes may contain more keys, so overall the number of items we can store increases.

