CSE 373: Data Structures and Algorithms


1 CSE 373: Data Structures and Algorithms
Lecture 25: B-Trees

2 Cycles to access: CPU registers: 1; cache: tens; main memory: hundreds; disk: millions

3 Hard Disks
Large amount of storage but slow access. Identifying a page takes a long time, so it pays to read or write data in pages (i.e. blocks) of 0.5 to 8 KB in size.

4 Algorithm Analysis
Running time of disk-based data structures is measured in terms of computing time (CPU) and number of disk accesses (sequential reads and random reads). Regular main-memory algorithms that work on one data element at a time cannot be "ported" to secondary storage in a straightforward way.

5 Principles Almost all of our data structure is on disk.
Every time we access a node in the tree it amounts to a random disk access. How can we address this problem?

6 M-ary Search Tree
Suppose we devised a search tree with branching factor M: M – 1 keys are needed to decide which branch to take. A complete tree has height Θ(log_M n). Nodes accessed for a search: Θ(log_M n).
So, we’ll try to solve this problem as we did with heaps. Here’s the general idea. We create a search tree with a branching factor of M. Each node has M – 1 keys and we search between them. What’s the runtime? O(log_M n)? That’s a nice thought, and it’s the best case. What about the worst case?

7 B-Trees
Internal nodes store (up to) M – 1 keys. Order property: the subtree between two keys x and y contains leaves with values v such that x ≤ v < y. Note the “≤”. Leaf nodes contain up to L sorted values/records.
Example internal node (M = 7) with keys 3, 7, 12, 21: its five subtrees hold x < 3, 3 ≤ x < 7, 7 ≤ x < 12, 12 ≤ x < 21, and 21 ≤ x.
To address these problems, we’ll use a slightly more structured M-ary tree: B-Trees. As before, each internal node has M – 1 keys. To manage memory problems, we’ll tune the size of a node (or leaf) to the size of a memory unit. Usually, a page or disk block.
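The order property above can be sketched in a few lines. This is only an illustration, assuming an internal node keeps its keys in a sorted Python list; the helper name `child_index` is hypothetical and not from the lecture.

```python
from bisect import bisect_right

def child_index(keys, v):
    """Pick the subtree of an internal node that may contain v.

    keys: sorted list of up to M - 1 keys (hypothetical representation).
    Child i covers keys[i-1] <= v < keys[i], so a value equal to a key
    is routed to the subtree on that key's right (the "<=" in the slide).
    """
    return bisect_right(keys, v)

keys = [3, 7, 12, 21]          # the example node from the slide
for v in (2, 3, 6, 7, 21, 99):
    print(v, "-> child", child_index(keys, v))
```

Binary search inside the node keeps the CPU work per node at about log2 M comparisons, while the disk cost stays at one block read per node visited.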

8 Disk Friendliness
What makes B-trees disk-friendly?
Many keys are stored in a node. Each node is one disk page/block, so all of its keys are brought to memory/cache in one disk access.
Internal nodes contain only keys; only leaf nodes contain keys and actual data. Much of the tree structure can be loaded into memory irrespective of data object size, while the data actually resides on disk.
What is limiting you from increasing the number of keys stored in each node?
Exercise: If disk block is 4000 bytes, key size is 20 bytes, pointer size is 4 bytes, and data/value size is 200 bytes, what should M and L be for our B-Tree?
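One way to work the exercise is sketched below. It assumes an internal node packs M pointers and M – 1 keys into one block, and that every leaf record stores its key next to its 200-byte value; both function names are made up for this sketch. If the 200 bytes already include the key, the leaf count would be 4000 // 200 instead.

```python
def max_fanout(block_bytes, key_bytes, pointer_bytes):
    # Internal node: M pointers and M - 1 keys must fit in one block:
    #   M * pointer + (M - 1) * key <= block
    return (block_bytes + key_bytes) // (pointer_bytes + key_bytes)

def max_leaf_items(block_bytes, key_bytes, value_bytes):
    # Assumption: a leaf record is a key followed by its value.
    return block_bytes // (key_bytes + value_bytes)

M = max_fanout(4000, 20, 4)
L = max_leaf_items(4000, 20, 200)
print(M, L)   # 167 18 under the assumptions above
```

Checking the internal node: 167 pointers and 166 keys take 167 * 4 + 166 * 20 = 3988 bytes, while 168 pointers would need 4012 bytes and no longer fit.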

9 B-Tree Structure Properties
Root (special case): has between 2 and M children (or could be a leaf).
Internal nodes: store up to M – 1 keys; have between ⌈M/2⌉ and M children.
Leaf nodes: where data is stored; contain between ⌈L/2⌉ and L data items.
Nodes are at least ½ full.
The properties of B-Trees (and the trees themselves) are a bit more complex than previous structures we’ve looked at. Here’s a big, gnarly list; we’ll go one step at a time. The maximum branching factor, as we said, is M (tunable for a given tree). The root has between 2 and M children or at most L keys. (L is another parameter.) These restrictions will be different for the root than for other nodes. Leaves are at least ½ full. The tree is perfectly balanced!

10 B-Tree: Example
B-Tree with M = 4 (# pointers in internal node) and L = (# data items in leaf)
Root keys: 12, 44
Internal nodes: [6], [20, 27, 34], [50]
Leaves (all at the same depth): [1, 2, 4], [6, 8, 9, 10], [12, 14, 16, 17, 19], [20, 22, 24], [27, 28, 32], [34, 38, 39, 41], [44, 47, 49], [50, 60, 70]
Each leaf item carries a data object (1, AB..; 2, GH..; 4, XY..), which we’ll ignore in slides.
This is just an example B-tree. Notice that it has 24 entries with a depth of only 2. A BST would be 4 deep. Notice also that the leaves are at the same level in the tree. I’ll use integers as both key and data, but we all know that that could as well be different data at the bottom, right?
Definition for later: “neighbor” is the next sibling to the left or right.

11 B-trees vs. AVL trees
Suppose we have n = 10^9 data items (x = 10^9 ≈ 2^30):
Depth of AVL tree: log_2 10^9 ≈ 30 (worst case about 1.44 log_2 x = log_φ x ≈ 43).
Depth of B-tree with M = 256, L = 256: about log_128 10^9 ≈ 4.3.
Wow!!!
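The depth figures can be checked numerically. The sketch below assumes the usual 1.44 log2 n bound for the worst-case AVL height and a minimum branching factor of M/2 = 128 for the B-tree; the variable names are illustrative only.

```python
import math

n = 10**9
M = 256

avl_balanced = math.log2(n)           # perfectly balanced binary tree
avl_worst = 1.44 * math.log2(n)       # assumed worst-case AVL height bound
btree = math.log(n, M // 2)           # every node at least half full

print(f"AVL (balanced): {avl_balanced:.1f}")
print(f"AVL (worst):    {avl_worst:.1f}")
print(f"B-tree:         {btree:.1f}")
```

If every node visit costs one random disk access, that is roughly 30 to 43 accesses for the AVL tree against 4 or 5 for the B-tree.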

12 Building a B-Tree with Insertions
M = L = 3. The empty B-Tree is a single empty leaf.
Insert(3): [3]
Insert(18): [3, 18]
Insert(14): [3, 14, 18]
Alright, how do we insert and delete? Let’s start with the empty B-Tree. That’s one leaf as the root. Now, we’ll insert 3 and 14. Fine… What about inserting 1. Is there a problem?

13 M = L = 3
Insert(30): the leaf [3, 14, 18, 30] is too big, so it splits into [3, 14] and [18, 30] under a new root with key 18.

14 M = L = 3
Insert(32): root [18]; leaves [3, 14], [18, 30, 32]
Insert(36): the leaf [18, 30, 32, 36] is too big and splits; root [18, 32]; leaves [3, 14], [18, 30], [32, 36]
Insert(15): root [18, 32]; leaves [3, 14, 15], [18, 30], [32, 36]

15 M = L = 3
Insert(16): the leaf [3, 14, 15, 16] is too big and splits into [3, 14] and [15, 16]. The root would then hold keys [15, 18, 32] and four children, which is too many, so the root splits as well.
Result: root [18]; internal [15] with leaves [3, 14], [15, 16]; internal [32] with leaves [18, 30], [32, 36]

16 M = L = 3
Before: root [18]; internal [15] with leaves [3, 14], [15, 16]; internal [32] with leaves [18, 30], [32, 36]
After Insert(12, 40, 45, 38): root [18]; internal [15] with leaves [3, 12, 14], [15, 16]; internal [32, 40] with leaves [18, 30], [32, 36, 38], [40, 45]

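17 Insertion Algorithm: The Overflow Step
M = 5. A node holding K1 K2 K3 K4 K5 is too big. It splits into a node with K1 K2 and a node with K4 K5, and K3 moves up into the parent. Is the parent now too big?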

18 Insertion Algorithm
1. Insert the key in its leaf in sorted order.
2. If the leaf ends up with L+1 items, overflow! Split the leaf into two nodes: ⌊(L+1)/2⌋ smaller keys and ⌈(L+1)/2⌉ larger keys. Add the new child to the parent. If the parent ends up with M+1 children, overflow!
3. If an internal node ends up with M+1 children, overflow! Split the node into two nodes: ⌊(M+1)/2⌋ children with smaller keys and ⌈(M+1)/2⌉ children with larger keys. Add the new child to the parent. If the parent ends up with M+1 children, overflow!
4. If the root ends up with M+1 children, split it in two, and create a new root with two children. This makes the tree deeper!
OK, here’s that process as an algorithm. The new funky symbol is floor; that’s just like regular C++ integer division. Notice that this can propagate all the way up the tree. How often will it do that? Notice that the two new leaves or internal nodes are guaranteed to have enough items (or subtrees). Because even the floor of (L+1)/2 is as big as the ceiling of L/2.
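The insertion steps above can be sketched as follows. This is a minimal in-memory illustration, not the lecture's code: the classes `Leaf` and `Internal`, the helpers `insert`, `_insert` and `dump`, and the choice of the right leaf's smallest item as the separator key are all assumptions of this sketch. The smaller half keeps ⌊(L+1)/2⌋ items (or ⌊(M+1)/2⌋ children).

```python
from bisect import bisect_right, insort

class Leaf:
    def __init__(self, items):
        self.items = items            # sorted values, at most L

class Internal:
    def __init__(self, keys, children):
        self.keys = keys              # len(children) - 1 separator keys
        self.children = children      # at most M subtrees

def insert(root, v, M, L):
    """Insert v and return the (possibly new) root."""
    split = _insert(root, v, M, L)
    if split is None:
        return root
    sep, right = split
    return Internal([sep], [root, right])    # root split: tree gets deeper

def _insert(node, v, M, L):
    """Return None, or (separator, new right sibling) if node split."""
    if isinstance(node, Leaf):
        insort(node.items, v)                # step 1: sorted insert
        if len(node.items) <= L:
            return None
        mid = (L + 1) // 2                   # step 2: leaf overflow
        right = Leaf(node.items[mid:])
        node.items = node.items[:mid]
        return right.items[0], right
    i = bisect_right(node.keys, v)
    split = _insert(node.children[i], v, M, L)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)                 # add the new child to the parent
    node.children.insert(i + 1, right)
    if len(node.children) <= M:
        return None
    mid = (M + 1) // 2                       # step 3: internal overflow
    up = node.keys[mid - 1]                  # key pushed up to the parent
    new_right = Internal(node.keys[mid:], node.children[mid:])
    node.keys = node.keys[:mid - 1]
    node.children = node.children[:mid]
    return up, new_right

def dump(node):
    """Nested (keys, children) view of the tree, for inspection."""
    if isinstance(node, Leaf):
        return node.items
    return (node.keys, [dump(c) for c in node.children])

root = Leaf([])
for v in (3, 18, 14, 30, 32, 36, 15, 16, 12, 40, 45, 38):
    root = insert(root, v, M=3, L=3)
print(dump(root))
```

Run with M = L = 3 on the insertion sequence from the slides, the sketch ends with root key 18, internal nodes [15] and [32, 40], and leaves [3, 12, 14], [15, 16], [18, 30], [32, 36, 38], [40, 45]. Duplicate values are not handled specially here.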

19 And Now for Deletion…
M = L = 3
Before: root [18]; internal [15] with leaves [3, 12, 14], [15, 16]; internal [32, 40] with leaves [18, 30], [32, 36, 38], [40, 45]
After Delete(32): root [18]; internal [15] with leaves [3, 12, 14], [15, 16]; internal [36, 40] with leaves [18, 30], [36, 38], [40, 45]

20 M = L = 3
Delete(15): the leaf [15, 16] becomes [16] and its parent key becomes 16.
Result: root [18]; internal [16] with leaves [3, 12, 14], [16]; internal [36, 40] with leaves [18, 30], [36, 38], [40, 45]
Are we okay? Dang, not half full. Are you using that 14? Can I borrow it?

21 M = L = 3
The leaf [16] borrows 14 from its left neighbor, and the parent key becomes 14.
Result: root [18]; internal [14] with leaves [3, 12], [14, 16]; internal [36, 40] with leaves [18, 30], [36, 38], [40, 45]

22 M = L = 3
Delete(16): the leaf [14, 16] becomes [14], which is not half full.
Result: root [18]; internal [14] with leaves [3, 12], [14]; internal [36, 40] with leaves [18, 30], [36, 38], [40, 45]
Are you using that 12?

23 M = L = 3
The neighbor [3, 12] cannot spare an item, so the two leaves merge into [3, 12, 14]. Their parent is left with a single child and no key, so it underflows.
Result: root [18]; internal [ ] with leaf [3, 12, 14]; internal [36, 40] with leaves [18, 30], [36, 38], [40, 45]
Are you using the node 18/30?

24 M = L = 3
The underflowing internal node borrows the leaf [18, 30] from its right neighbor; 18 moves down from the root and 36 moves up.
Result: root [36]; internal [18] with leaves [3, 12, 14], [18, 30]; internal [40] with leaves [36, 38], [40, 45]

25 M = L = 3
Delete(14): root [36]; internal [18] with leaves [3, 12], [18, 30]; internal [40] with leaves [36, 38], [40, 45]

26 M = L = 3
Delete(18): the leaf [18, 30] becomes [30], which is not half full.
Result: root [36]; internal [18] with leaves [3, 12], [30]; internal [40] with leaves [36, 38], [40, 45]

27 M = L = 3
The neighbor [3, 12] cannot spare an item, so the two leaves merge into [3, 12, 30]. Their parent is left with a single child and underflows.
Result: root [36]; internal [ ] with leaf [3, 12, 30]; internal [40] with leaves [36, 38], [40, 45]

28 M = L = 3
The neighboring internal node [40] has only two children and cannot spare one, so the two internal nodes merge, pulling 36 down from the root.
Result: root [ ] with a single child; internal [36, 40] with leaves [3, 12, 30], [36, 38], [40, 45]

29 M = L = 3
The root has only one child, so that child becomes the new root.
Result: root [36, 40] with leaves [3, 12, 30], [36, 38], [40, 45]

30 Deletion Algorithm: Rotation Step
M = 5. Before: parent key K2; left child K1 (too small); right child K3 K4 K5. After: parent key K3; left child K1 K2; right child K4 K5.
This is left rotation. Similarly, right rotation.

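31 Deletion Algorithm: Merging Step
M = 5. Before: parent key K2; left child K1 (too small); right child K3 K4 (can’t get smaller). After: the two children and K2 merge into one node K1 K2 K3 K4, and the parent loses a key. Is the parent now too small?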

32 Deletion Algorithm
1. Remove the key from its leaf.
2. If the leaf ends up with fewer than ⌈L/2⌉ items, underflow! Try a left rotation. If not, try a right rotation. If not, merge, then check the parent node for underflow.
Alright, that’s deletion. Let’s talk about a few of the details. Why will dumping keys always work? If the neighbors were too low on keys to loan any, they must have exactly ⌈L/2⌉ keys, and we have one fewer. Therefore, putting them together, we get 2⌈L/2⌉ – 1 ≤ L keys, and that’s legal.

33 Deletion Slide Two
3. If an internal node ends up with fewer than ⌈M/2⌉ children, underflow! Try a left rotation. If not, try a right rotation. If not, merge, then check the parent node for underflow.
4. If the root ends up with only one child, make the child the new root of the tree. This reduces the height of the tree!
The same applies here for dumping subtrees as on the previous slide for dumping keys.
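The leaf-level part of the deletion algorithm (steps 1 and 2) can be sketched as below. It is an illustration under several assumptions: the leaves of one parent are plain sorted lists, `keys[j]` separates `leaves[j]` from `leaves[j + 1]`, the minimum leaf size is ⌈L/2⌉, borrowing is tried from the left neighbor before the right one, and the parent has at least two leaves. The function name `delete_in_leaf` is hypothetical. Internal-node underflow (step 3) would follow the same borrow-or-merge pattern one level up.

```python
import math

def delete_in_leaf(keys, leaves, i, v, L):
    """Delete v from leaves[i]; borrow or merge on underflow.

    Modifies keys and leaves in place. Returns True if two leaves were
    merged, meaning the caller must check the parent for underflow.
    """
    min_items = math.ceil(L / 2)
    leaf = leaves[i]
    leaf.remove(v)                      # raises ValueError if v is absent
    if len(leaf) < min_items:           # underflow
        if i > 0 and len(leaves[i - 1]) > min_items:
            # Borrow the largest item of the left neighbor.
            leaf.insert(0, leaves[i - 1].pop())
        elif i + 1 < len(leaves) and len(leaves[i + 1]) > min_items:
            # Borrow the smallest item of the right neighbor.
            leaf.append(leaves[i + 1].pop(0))
            keys[i] = leaves[i + 1][0]
        elif i > 0:
            # Merge into the left neighbor; one separator disappears.
            leaves[i - 1].extend(leaf)
            del leaves[i], keys[i - 1]
            return True
        else:
            # Leftmost leaf: absorb the right neighbor instead.
            leaf.extend(leaves[i + 1])
            del leaves[i + 1], keys[i]
            return True
    if i > 0:
        keys[i - 1] = leaf[0]           # separator tracks the leaf's smallest item
    return False

# Slides 20 to 23 with M = L = 3, looking only at the left internal node.
keys, leaves = [15], [[3, 12, 14], [15, 16]]
print(delete_in_leaf(keys, leaves, 1, 15, L=3), keys, leaves)
print(delete_in_leaf(keys, leaves, 1, 16, L=3), keys, leaves)
```

The first call borrows 14 from the left neighbor; the second finds no neighbor with a spare item, merges the two leaves, and reports that the parent now needs an underflow check.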

