B+-Trees.


1 B+-Trees

2 Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. However, such trees require that all data fit into main memory. When the tree is too large to fit in main memory and has to reside on disk, accessing each node is slow.

3 We will group nodes together so we can read in many at the same time, and use an index to help us determine which group we want to bring in. When we use a system to find other nodes more easily, we term this an index.

4 From Binary to K-ary Idea: allow a node in a tree to have many children. More branching = smaller tree height = fewer nodes retrieved = fewer disk accesses. As branching increases, the depth decreases. A K-ary tree allows K-way branching: each internal node has at most K children. A complete K-ary tree has height that is roughly log_K N instead of log_2 N. If K = 20 and N = 2^20, then log_20 2^20 < 5 (log_2 2^20 = ?). Thus, we can speed up access, as the number of nodes accessed decreases significantly. In a B+ tree we want all leaves to be at the same level. We can do that by varying the branching factor.
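The height comparison above can be checked numerically. The following is a minimal sketch, assuming N = 2^20 items; the helper name `min_levels` is made up for this illustration and simply counts how many levels of k-way branching are needed to reach N.

```python
import math

def min_levels(n_items: int, k: int) -> int:
    """Smallest number of levels h with k**h >= n_items (k-way branching)."""
    levels = 0
    capacity = 1
    while capacity < n_items:
        capacity *= k
        levels += 1
    return levels

n = 2 ** 20
print(min_levels(n, 2))    # 20 levels with binary branching
print(min_levels(n, 20))   # 5 levels with 20-way branching
print(math.log(n, 20))     # about 4.63, i.e. less than 5
```

Each level corresponds to one node read, so under these assumptions the 20-way tree needs roughly a quarter of the node accesses of the binary tree.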

5 K-ary Search Tree A binary search tree has one key to decide which of the two branches to take. A K-ary search tree needs K-1 keys to decide which branch to take: “One more kid than key”. How do we store K child pointers? For a B+ tree, we require that each node is at least ½ full! We don’t want a K-ary search tree to degenerate to a linked list, or even a binary search tree. If we reserve space for all the kids, we don’t want it to be wasted.

6 B+ Tree (K # of kids, L size of leaves)
A B+-tree of order K (K>3) is a K-ary tree with the following properties: The data items are stored ONLY in leaves. The root is either a leaf or has between two and K children. The non-leaf nodes store up to K-1 keys to guide the searching; key i represents the smallest key in subtree i+1. All non-leaf nodes (except the root) have between ⌈K/2⌉ and K children. All leaves are at the same depth and have between ⌈L/2⌉ and L data items, for some L (usually L << K, but we will assume K=L in our examples). Note: the text calls these trees B-trees, but B+ is the more generally used term.
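The structural properties listed above can be turned into a small checker. This is only a sketch: the `Node` class, `require`, and `check` are hypothetical names introduced here, and the sample tree is invented for illustration (K = L = 4).

```python
from math import ceil

class Node:
    """Hypothetical node: a leaf has children == None and holds data keys."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children

def require(condition, message):
    if not condition:
        raise ValueError(message)

def check(node, K, L, is_root=True):
    """Return the distance from node down to its leaves, or raise ValueError."""
    if node.children is None:                      # leaf
        low = 0 if is_root else ceil(L / 2)        # a root leaf may be nearly empty
        require(low <= len(node.keys) <= L, "leaf occupancy")
        return 0
    low = 2 if is_root else ceil(K / 2)            # the root only needs two children
    require(low <= len(node.children) <= K, "child count")
    require(len(node.keys) == len(node.children) - 1, "one more kid than key")
    depths = {check(child, K, L, is_root=False) for child in node.children}
    require(len(depths) == 1, "leaves at different depths")
    return depths.pop() + 1

tree = Node([10, 20], [Node([1, 5]), Node([10, 12, 15]), Node([20, 25])])
print(check(tree, K=4, L=4))  # 1
```

The checker does not verify key ordering; it covers only the occupancy and depth rules.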

7 Keys in Internal Nodes We will adopt the following convention:
key i in an internal node is the smallest key in its i+1 subtree (i.e., the right subtree of key i). I would be even less strict: since internal nodes are “roadsigns”, I would just not bother to update the internal values. Even following this convention, there is no unique B+-tree for the same set of records.
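Under this convention, choosing a branch amounts to counting how many keys are less than or equal to the search key. A minimal sketch, assuming the keys of a node are kept in a sorted Python list (the function name `child_index` is hypothetical):

```python
from bisect import bisect_right

def child_index(keys, target):
    """Index of the subtree to follow: the number of keys <= target."""
    return bisect_right(keys, target)

keys = [10, 20, 30]            # 3 keys guide the choice among 4 children
print(child_index(keys, 5))    # 0
print(child_index(keys, 10))   # 1 (a key equal to key i goes to the right of key i)
print(child_index(keys, 25))   # 2
print(child_index(keys, 30))   # 3
```

This also illustrates the “roadsign” remark: if 10 were deleted from its leaf so that the smallest key there became 12, the stale key 10 would still route a search for 12 to child 1.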

8 B+ Tree Example 1 (Order 5, K=L=5)
Whole records are stored at the leaves (we only show the keys here). Since L=5, each leaf has between 3 and 5 data items (the root can be an exception). Since K=5, each nonleaf node has between 3 and 5 children (the root can be an exception). Requiring nodes to be half full guarantees that the B+ tree does not degenerate into a linked list or a simple binary tree.

9 B+ Tree Example 2 (Order K=L=4)

10 B+ Tree in Practical Usage
Each internal node/leaf is designed to fit into one I/O block of data. An I/O block usually can hold quite a lot of data. This implies that the tree has only a few levels, and only a few disk accesses are needed to accomplish a search, insertion, or deletion. The B+-tree is a popular structure used in commercial databases. To further speed up the search, the first one or two levels of the B+-tree are usually kept in main memory. Wasted space: the disadvantage of the B+-tree is that we must allow room for K-1 keys in every node, but as few as half of them may be in use.
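To see how K and L follow from the block size, here is a rough sketch. All the sizes are assumptions chosen for illustration (4096-byte block, 8-byte keys, 8-byte child pointers, 256-byte records); none of them come from the slides, and `fanout` and `leaf_capacity` are hypothetical helper names.

```python
def fanout(block_bytes, key_bytes, ptr_bytes):
    """Largest K such that K pointers plus K-1 keys fit in one block."""
    return (block_bytes + key_bytes) // (key_bytes + ptr_bytes)

def leaf_capacity(block_bytes, record_bytes):
    """Largest L such that L whole records fit in one block."""
    return block_bytes // record_bytes

K = fanout(4096, key_bytes=8, ptr_bytes=8)
L = leaf_capacity(4096, record_bytes=256)
print(K, L)  # 256 16
```

With these assumed sizes L is much smaller than K, which matches the remark that usually L << K.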

11 Searching Example Suppose that we want to search for the key K (here K is a letter key, not the branching factor). The path traversed is shown in red.
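Since the figure is not reproduced in this transcript, the search can be sketched in code instead. The tree below is a made-up example with letter keys, not the one from the slide, and `Node` and `search` are hypothetical names.

```python
from bisect import bisect_right

class Node:
    """Hypothetical node: a leaf has an empty children list."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

def search(root, target):
    """Return (found, path), where path lists the keys of each node visited."""
    node, path = root, []
    while node.children:                       # internal node: follow one branch
        path.append(node.keys)
        node = node.children[bisect_right(node.keys, target)]
    path.append(node.keys)                     # the leaf we ended in
    return target in node.keys, path

root = Node(["J", "T"], [Node(["A", "C", "F"]),
                         Node(["J", "K", "N"]),
                         Node(["T", "W"])])
print(search(root, "K"))  # (True, [['J', 'T'], ['J', 'K', 'N']])
```

One node is visited per level, so the number of block reads equals the height of the tree plus one.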

12 Insertion (int val) Find the leaf location. Insert into node n. Split if too big (instead of the rotations used in AVL trees); the key count per node is used to maintain the properties of B+-trees. If n is now too big (i.e., contains > L keys), split the node: cut n off from its parent, split n into two pieces nL and nR, identify the key that is to be the parent of nL and nR, and insert that key together with its child pointers into the old parent of n.

13 Inserting “O” into a Non-full Leaf (K=4 L=3)

14 What if we insert T into the tree? The node is too large. How can you fix it?
Q R S T

15 Splitting a Leaf: Inserting T (K=4,L=3)
Unhappy node. Break apart and propagate the smallest key of the right-hand node up to the next higher level.
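The leaf split can be sketched as follows. This assumes the overfull leaf is cut as evenly as possible, with the left piece taking the extra item when the count is odd; the slide's figure may divide it differently. The function name `insert_into_leaf` is hypothetical.

```python
from bisect import insort

def insert_into_leaf(keys, val, L):
    """Insert val into a sorted leaf holding at most L keys.

    Returns (left, right, separator); right and separator are None
    when the leaf did not need to split.
    """
    keys = list(keys)
    insort(keys, val)
    if len(keys) <= L:
        return keys, None, None
    mid = (len(keys) + 1) // 2          # left piece gets the extra item, if any
    left, right = keys[:mid], keys[mid:]
    return left, right, right[0]        # smallest key of the right piece goes up

print(insert_into_leaf(["Q", "R", "S"], "T", L=3))
# (['Q', 'R'], ['S', 'T'], 'S')
```

Note that the separator is copied up, not moved: S stays in the right leaf because data items live only in leaves.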

16 Splitting Example 2 (L=3, K=4)
Unhappy node. Break apart and propagate the smallest key of the right-hand node up to the next higher level. The parent is NOW unhappy, so we do the same thing again: break apart and propagate up.

17 Splitting an Internal Node
To insert a key val into a full internal node x: Cut x off from its parent. Insert val and its left and right child pointers into x, pretending there is space. Now x has K keys. Split x into 2 new internal nodes xL and xR, with xL containing the ⌈K/2⌉ - 1 smallest keys, and xR containing the ⌊K/2⌋ largest keys. Note that the middle key M is not placed in xL or xR. Make M the parent of xL and xR, and insert M together with its child pointers into the old parent of x.
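The split of an internal node can be sketched directly from these counts. The representation (parallel lists of keys and children) and the name `split_internal` are assumptions made for this illustration; the keys and child labels are invented.

```python
from math import ceil

def split_internal(keys, children, K):
    """Split an overfull internal node holding K keys and K+1 children.

    Returns ((xL_keys, xL_children), M, (xR_keys, xR_children)).
    """
    m = ceil(K / 2) - 1                       # number of keys that stay in xL
    x_left = (keys[:m], children[:m + 1])
    middle = keys[m]                          # M moves up; it is in neither half
    x_right = (keys[m + 1:], children[m + 1:])
    return x_left, middle, x_right

keys = ["D", "H", "M", "R"]                   # K = 4 keys after the insertion
children = ["c0", "c1", "c2", "c3", "c4"]
print(split_internal(keys, children, K=4))
# ((['D'], ['c0', 'c1']), 'H', (['M', 'R'], ['c2', 'c3', 'c4']))
```

Unlike a leaf split, the middle key is moved up rather than copied, so the two halves together hold K-1 keys.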

18 Notice the multiple splits
Each node splits apart and propagates a key up. How many splits can there be? How much work is each split?

19 Termination Splitting will continue as long as we encounter full internal nodes. If the split internal node x does not have a parent (i.e., x is the root), then create a new root containing the middle key (J in the example) and its two children.
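The termination rule can be simulated without building a tree. In this sketch the path above a leaf that has just split is modelled only by the number of keys in each ancestor; the function `count_splits` and that simplified model are assumptions made for illustration.

```python
def count_splits(ancestor_key_counts, K):
    """Count internal splits after a leaf split.

    ancestor_key_counts lists the number of keys in each ancestor,
    from the leaf's parent up to the root. Returns
    (internal_splits, new_root_created).
    """
    splits = 0
    for count in ancestor_key_counts:
        if count < K - 1:          # room for one more key: propagation stops here
            return splits, False
        splits += 1                # full node: it splits and passes a key up
    return splits, True            # even the root split, so the tree grows a level

print(count_splits([3, 2], K=4))   # (1, False)
print(count_splits([3, 3], K=4))   # (2, True)
```

The number of splits is bounded by the height of the tree, and a new root is created only when every node on the path was full.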

20 Deletion Find and delete in the leaf. The leaf may then have too few items.
Do the reverse of add (pull down and slap together). BUT, it could be that when you combine neighbor nodes, you get a node that is too large. Then you would have to split it apart. ARGGGG! Better to shift some of the records from a neighbor into the leaf that is too small.
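The choice between shifting and combining can be sketched for leaves. This assumes the neighbor is the right sibling and that a single record is shifted; the name `fix_leaf` and the sample keys are made up for illustration.

```python
from math import ceil

def fix_leaf(small, right_sibling, L):
    """Repair a leaf that has fallen below ceil(L/2) items.

    Returns (left_leaf, right_leaf, new_separator); right_leaf and
    new_separator are None when the two leaves were combined.
    """
    minimum = ceil(L / 2)
    if len(right_sibling) > minimum:
        # Shift: the sibling can spare its smallest record.
        small = small + right_sibling[:1]
        right_sibling = right_sibling[1:]
        return small, right_sibling, right_sibling[0]
    # Combine: (minimum - 1) + minimum items never exceed L.
    return small + right_sibling, None, None

print(fix_leaf([10], [12, 15, 18], L=4))  # ([10, 12], [15, 18], 15)
print(fix_leaf([10], [12, 15], L=4))      # ([10, 12, 15], None, None)
```

After a shift the separator in the parent changes to the sibling's new smallest key; after a combine the separator is removed from the parent, which may in turn leave the parent too small.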

21 What if the item d we remove was part of an index node?
The key we delete can appear in at most one ancestor of x as a key (why?). This key is seen when we search down the tree. Remember it. After deleting d from the leaf x, we can access the ancestor directly and replace d there by the new smallest key in x.
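A small sketch of that replacement step, assuming the search remembered the key lists of the ancestors it passed through. The function `replace_in_ancestor` and the sample path are hypothetical.

```python
def replace_in_ancestor(path, d, new_smallest):
    """Replace the deleted key d in the one ancestor that holds it.

    path is the list of ancestor key lists, from the root down to the
    leaf's parent. Returns True if some ancestor contained d.
    """
    for keys in path:
        if d in keys:
            keys[keys.index(d)] = new_smallest
            return True
    return False                   # d was not used as a roadsign anywhere

path = [[20], [5, 10]]             # root keys, then the parent's keys
leaf = [10, 12, 14]
leaf.remove(10)                    # delete d = 10 from the leaf
print(replace_in_ancestor(path, 10, leaf[0]))  # True
print(path)                        # [[20], [5, 12]]
```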

22 Deletion Example (K=5, L=4): deletion causes no issues
Want to delete 15

23 Again, no problems Want to delete 9

24 Want to delete 10

25 When a node becomes too small, you combine adjacent nodes together.
You pull down the key from the parent and slap the two nodes together. Deletion of 10 leaves the node too small. Pull down the key between the nodes and slap together.

26 Now this node is unhappy.
Pull down 7 and slap together.

27 Tree is in proper form

28 Could combining ever be a problem? K=5,L=4
In this case, the circled node is unhappy, as there must be between 3 and 5 kids (except for the root), but if the unhappy node tries to combine with the left neighbor, there will be six kids (and keys 3, 5, 10, 24, 35). It will be unhappy again, and have to split. The same thing is true if it tries to combine with its right neighbor.

29 We want a local solution
Sometimes students try to move nodes in creative ways. We want something that only looks locally. We want something that always works (is not dependent on the population of the tree).

30 The solution: slide a child over from a sibling.
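Sliding a child between internal nodes can be sketched as a rotation through the parent. The arrangement below is a guess that is merely consistent with the keys listed on slide 28 (left neighbor 3, 5, 10; separator 24; unhappy node 35), since the figure itself is not in this transcript. The function `slide_from_left` and the child labels are hypothetical.

```python
def slide_from_left(left_keys, left_kids, parent_keys, sep, node_keys, node_kids):
    """Move the left sibling's last child to the front of the small node.

    parent_keys[sep] is the key separating the two siblings.
    Assumes the left sibling has more than the minimum number of children.
    """
    node_kids.insert(0, left_kids.pop())       # the child slides over
    node_keys.insert(0, parent_keys[sep])      # old separator comes down
    parent_keys[sep] = left_keys.pop()         # sibling's largest key goes up

K = 5
left_keys, left_kids = [3, 5, 10], ["a", "b", "c", "d"]
node_keys, node_kids = [35], ["e", "f"]        # only 2 kids: unhappy
parent_keys = [24]

print(len(left_kids) + len(node_kids) > K)     # True: combining would overflow
slide_from_left(left_keys, left_kids, parent_keys, 0, node_keys, node_kids)
print(left_keys, parent_keys, node_keys)       # [3, 5] [10] [24, 35]
print(left_kids, node_kids)                    # ['a', 'b', 'c'] ['d', 'e', 'f']
```

Only the two siblings and one key in their parent change, so the fix is local, and both nodes end with three children.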

