Balanced Search Trees 15-211 Fundamental Data Structures and Algorithms Margaret Reid-Miller 3 February 2005.

1 Balanced Search Trees 15-211 Fundamental Data Structures and Algorithms Margaret Reid-Miller 3 February 2005

2 Plan
• Today
  • 2-3-4 trees
  • Red-Black trees
• Reading:
  • For today: Chapters 13.3-4
• Reminder: HW1 due tonight!!! HW2 will be available soon

3 AVL-tree Review

4 AVL-Trees
What is the key restriction on a binary search tree that keeps an AVL tree balanced?
[Figure: example binary search trees labeled "OK" and "not OK"]

5 AVL-Trees
• Height balanced:
  • For each node, the heights of the left and right subtrees differ by at most 1 (a representation invariant).
• What is the mechanism to rebalance an out-of-balance AVL tree after an insert?
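
The height-balance condition above can be checked with a short recursive routine. This is a minimal sketch, assuming a hypothetical `Node` class with `left` and `right` links (none of these names come from the lecture). It recomputes heights on the fly instead of reading a stored height field, and it checks balance only, not the search-tree ordering.

```java
// Sketch only: AvlCheck and Node are illustrative names, not from the lecture.
class AvlCheck {
    static class Node {
        int key;
        Node left, right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the height of the subtree (an empty tree has height 0),
    // or -1 if some node in it violates the height-balance condition.
    static int checkedHeight(Node n) {
        if (n == null) return 0;
        int hl = checkedHeight(n.left);
        if (hl < 0) return -1;
        int hr = checkedHeight(n.right);
        if (hr < 0) return -1;
        if (Math.abs(hl - hr) > 1) return -1;   // subtrees differ by more than 1
        return 1 + Math.max(hl, hr);
    }

    static boolean isHeightBalanced(Node root) {
        return checkedHeight(root) >= 0;
    }
}
```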

6 The single rotation
• Rotate the deepest out-of-balance node. This “pulls” the child up one level.
[Figure: single rotation on nodes X, Y and Z, before and after]
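
A single rotation is only a few pointer updates. The sketch below assumes a bare `Node` with `left` and `right` links; the method names `rotateWithLeftChild` and `rotateWithRightChild` are illustrative, not taken from the slides. A real AVL implementation would also update the stored heights after the rotation.

```java
// Sketch only: class and method names are illustrative, not from the lecture.
class SingleRotation {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    // Left subtree is too deep: pull the left child up one level.
    static Node rotateWithLeftChild(Node parent) {
        Node child = parent.left;
        parent.left = child.right;   // the child's inner subtree changes sides
        child.right = parent;
        return child;                // new root of this subtree
    }

    // Mirror image: pull the right child up one level.
    static Node rotateWithRightChild(Node parent) {
        Node child = parent.right;
        parent.right = child.left;
        child.left = parent;
        return child;
    }
}
```

The caller must store the returned node in the link that used to point at `parent`.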

7 The double rotation
• First rotate around the child node, then around the parent node.
[Figure: double rotation on nodes X, Y1, Y2 and Z]

8 Double rotation cont’d
• The result is to “pull” the grandchild node up two levels.
[Figure: result of the double rotation on nodes X, Y1, Y2 and Z]
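
The two-step description of the double rotation translates directly into two single rotations. The sketch below shows the left-right case (the grandchild sits in the right subtree of the left child); the mirror case is symmetric. All names are illustrative assumptions, and height bookkeeping is omitted.

```java
// Sketch only: class and method names are illustrative, not from the lecture.
class DoubleRotation {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node rotateWithLeftChild(Node parent) {
        Node child = parent.left;
        parent.left = child.right;
        child.right = parent;
        return child;
    }

    static Node rotateWithRightChild(Node parent) {
        Node child = parent.right;
        parent.right = child.left;
        child.left = parent;
        return child;
    }

    // Left-right case: rotate around the child first, then around the parent.
    // The grandchild ends up two levels higher, as the new subtree root.
    static Node doubleWithLeftChild(Node parent) {
        parent.left = rotateWithRightChild(parent.left);
        return rotateWithLeftChild(parent);
    }
}
```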

9 AVL Tree Summary
• Each node maintains a lazy deletion flag and the height of its subtree.
• The height of an AVL tree is at most 45% greater than the minimum.
• Requires at most one single or double rotation to regain balance after an insert.
• Thus, guarantees O(log N) time for search and insert.

10 2-3-4 Trees

11 Balanced 2-3-4 Trees
• Maintain height balance in all subtrees. Depth property.
• But allow nodes in the tree to expand to accommodate inserts.
• In particular, nodes can have 2, 3 or 4 children. Node-size property.
• E.g., a 4-node has 3 keys that split the key range into 4 intervals.

12 2-3-4 tree search
• Search is similar to a binary search.
• E.g., search for B
[Figure: 2-3-4 tree with root G M Q]

13 2-3-4 tree search
• Search is similar to a binary search.
• E.g., search for B
[Figure: the same 2-3-4 tree with root G M Q, next step of the search]
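
Search in a 2-3-4 tree scans the keys of a node to pick one of up to four links, then descends. The following is a rough sketch, assuming a hypothetical array-based `Node` (sorted `keys`, and `children` set to `null` in a leaf) and integer keys as in the lecture's `BUinsert(int key)`; the layout is an assumption, not the course's actual node class.

```java
// Sketch only: the Node layout is an assumption made for illustration.
class TwoThreeFourSearch {
    static class Node {
        int[] keys;        // 1 to 3 keys in increasing order
        Node[] children;   // null in a leaf, otherwise keys.length + 1 links
        Node(int[] keys, Node[] children) {
            this.keys = keys;
            this.children = children;
        }
    }

    static boolean contains(Node n, int key) {
        while (n != null) {
            int i = 0;
            // Find the first key that is not smaller than the search key.
            while (i < n.keys.length && key > n.keys[i]) i++;
            if (i < n.keys.length && key == n.keys[i]) return true;
            // Descend into the interval between keys[i-1] and keys[i].
            n = (n.children == null) ? null : n.children[i];
        }
        return false;
    }
}
```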

14 2-3-4 Tree Insert
• To insert, first search for a leaf node in which to put the key.
• E.g., insert U
[Figure: 2-3-4 tree with root G M Q, before and after inserting U]

15 2-3-4 Tree Insert
• May need to split a node
• E.g., insert T
[Figure: inserting T into the 4-node leaf S U W splits it; U moves up into the parent G Q, leaving the leaves S T and W]

16 2-3-4 Tree Insert
    /* Either returns an empty node or a new root */
    public Node BUinsert(int key) {
        if (isEmptyNode())
            return new Node(key);
        /* Search for leaf to put key into */
        Node subtree = findChild(key); // down which link?
        Node upNode = subtree.BUinsert(key);
        /* upNode is empty, the key at a leaf node, or
         * the result of a 4-node split that needs to be
         * propagated up. */
        if (upNode.isEmptyNode())
            return upNode;
        else
            return addToNode(upNode); // split?
    }

17 Cascading splits
• When inserting a key into a 4-node, the 4-node splits and a key moves up to the parent node.
• This new key may in turn cause the parent to split, moving a key up to the grandparent, and so on up to the root.
• When would this happen?
• Is there a way to avoid these cascading splits?

18 Bottom-up 2-3-4 trees
• This BUinsert is called a bottom-up version of insert, since splits occur as we go back up the tree after the recursive calls.
• Work occurs before and after the recursive calls.

19 Preemptive Split
• Every time we find a 4-node while traveling down a search path, we split the 4-node.
• Note: Two 2-nodes have the same number of children as one 4-node.
• Changes are local to the split node (no cascading).
• Guaranteed to find a 2-node or 3-node at the leaf.
• Splitting a root node creates a new root.
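
The preemptive-split rules can be sketched as an iterative insert that splits every 4-node it meets on the way down. This is an illustration under stated assumptions, not the course's implementation: the list-based `Node`, `splitChild`, `childIndex` and `inOrder` are hypothetical names, and keys are assumed to be distinct integers.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: all names here are illustrative, not from the lecture.
class PreemptiveSplit {
    static class Node {
        List<Integer> keys = new ArrayList<>();   // 0 to 3 keys, sorted
        List<Node> children = new ArrayList<>();  // empty in a leaf
        boolean isFourNode() { return keys.size() == 3; }
        boolean isLeaf() { return children.isEmpty(); }
    }

    // Index of the link to follow for key (also the insert position in a leaf).
    static int childIndex(Node n, int key) {
        int i = 0;
        while (i < n.keys.size() && key > n.keys.get(i)) i++;
        return i;
    }

    // Splits the 4-node at parent.children.get(i); parent must not be a 4-node.
    static void splitChild(Node parent, int i) {
        Node full = parent.children.get(i);
        Node right = new Node();
        right.keys.add(full.keys.remove(2));      // largest key goes right
        int middle = full.keys.remove(1);         // middle key moves up
        if (!full.isLeaf()) {                     // move the two rightmost links
            right.children.add(full.children.remove(2));
            right.children.add(full.children.remove(2));
        }
        parent.keys.add(i, middle);
        parent.children.add(i + 1, right);
    }

    static Node insert(Node root, int key) {
        if (root.isFourNode()) {                  // splitting the root creates a new root
            Node newRoot = new Node();
            newRoot.children.add(root);
            splitChild(newRoot, 0);
            root = newRoot;
        }
        Node n = root;
        while (!n.isLeaf()) {
            int i = childIndex(n, key);
            if (n.children.get(i).isFourNode()) {
                splitChild(n, i);
                i = childIndex(n, key);           // the promoted key may change the link
            }
            n = n.children.get(i);
        }
        n.keys.add(childIndex(n, key), key);      // the leaf is a 2-node or 3-node here
        return root;
    }

    static void inOrder(Node n, List<Integer> out) {
        for (int i = 0; i < n.keys.size(); i++) {
            if (!n.isLeaf()) inOrder(n.children.get(i), out);
            out.add(n.keys.get(i));
        }
        if (!n.isLeaf()) inOrder(n.children.get(n.keys.size()), out);
    }
}
```

Because the parent of a split node was itself split earlier if it was full, the promoted key always fits and nothing propagates back up the tree.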

20 2-3-4 Tree Height
• What is the height of the tree? At most log_2 N + 1
• Why? The maximum depth occurs when every node is a 2-node. Since every leaf has the same depth, the tree is then a complete binary tree, with depth at most log_2 N + 1.
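
The bound can be checked with a short calculation. As a worked sketch, take the worst case described on the slide: every node is a 2-node and all leaves sit on the same level, so the tree is a full binary tree with h levels holding N keys (h and N are the only symbols assumed here).

```latex
N = \sum_{i=0}^{h-1} 2^{i} = 2^{h} - 1
\quad\Longrightarrow\quad
h = \log_2 (N + 1) \le \log_2 (2N) = \log_2 N + 1 \qquad (N \ge 1)
```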

21 Number of splits
• How many splits does an insertion require? At most log_2 N + 1 splits.
• Seems to require less than one split on average when the tree is built from a random permutation. Trees tend to have few 4-nodes.

22 Top-down 2-3-4 trees
• The second method is called top-down, as splits occur on the way down the tree.
• All the work occurs before the recursive calls and no work occurs after the recursive calls.
• Called tail-recursion, which is much more efficient.
• Can AVL trees be made tail recursive?

23 2-3-4 trees
• Advantages:
  • Guaranteed O(log N) time for search and insert.
• Issues:
  • Awkward to maintain three types of nodes.
  • Need to modify the standard search on binary trees.
  • Splits need to move links between nodes.
  • Code has many cases to handle.

24 Red Black Trees

25 Red-Black trees
• A red-black tree is a binary tree representation of a 2-3-4 tree using red and black nodes.
[Figure: 2-3-4 nodes and their equivalent red-black subtrees; a 3-node such as D I can be drawn in either of two ways]

26 Red-black tree properties
A Red-Black tree is a binary search tree where
• Every node is colored either red or black.
  • Note: Every 2-3-4 node corresponds to one black node.
• The root node is black.
• Red nodes always have black parents (and black children).
• Every path from the root to a leaf has the same number of black nodes.
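
These properties can be verified mechanically. The sketch below assumes a hypothetical `Node` with a boolean `red` flag and treats every `null` link as the end of a path; it checks the coloring rules only, not the search-tree ordering of the keys.

```java
// Sketch only: RedBlackCheck and Node are illustrative names, not from the lecture.
class RedBlackCheck {
    static class Node {
        int key;
        boolean red;
        Node left, right;
        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) {
        return n != null && n.red;   // null links count as black
    }

    // Returns the number of black nodes on every path below n,
    // or -1 if a red node has a red child or the path counts differ.
    static int blackHeight(Node n) {
        if (n == null) return 0;
        if (n.red && (isRed(n.left) || isRed(n.right))) return -1;
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) return -1;
        return l + (n.red ? 0 : 1);
    }

    static boolean isRedBlack(Node root) {
        return !isRed(root) && blackHeight(root) >= 0;   // the root must be black
    }
}
```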

27 Red-black tree height
• What is the height of a red-black tree?
• It is at most 2 log_2 N + 2, since it can be at most twice as high as its corresponding 2-3-4 tree, which has height at most log_2 N + 1.
[Figure: example tree with keys 3, 5, 6, 7 and 9]

28 Red-black Tree Search
• Search is the same as for binary search trees.
• Color is irrelevant.
• Search guaranteed to take O(log N) time.
• Search typically occurs more frequently than insert.

29 Red-black Tree Insert
• Simple 4-node test (2 red children?)
• Few splits, as most 4-nodes tend to be near the leaves.
• Some 4-node splits require only changing the color of three nodes.
• Rotations needed only when a 4-node has a 3-node parent.
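
The 4-node test and the color-only split are each a few lines. The sketch below uses illustrative names (`isFourNode`, `split`) and covers only the recoloring case; it does not handle the follow-up work the slide alludes to, namely the rotation needed when the recolored node ends up with a red parent, or recoloring the root black.

```java
// Sketch only: ColorFlip and its method names are illustrative, not from the lecture.
class ColorFlip {
    static class Node {
        int key;
        boolean red;
        Node left, right;
        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    static boolean isRed(Node n) {
        return n != null && n.red;
    }

    // A black node with two red children represents a 4-node.
    static boolean isFourNode(Node n) {
        return n != null && !n.red && isRed(n.left) && isRed(n.right);
    }

    // Split by recoloring three nodes: the middle key "moves up" (turns red)
    // and the two outer keys become separate 2-nodes (turn black).
    static void split(Node n) {
        n.red = true;
        n.left.red = false;
        n.right.red = false;
    }
}
```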

30 Red-black Tree Summary
• Advantages:
  • Guaranteed O(log N) time for search and insert.
  • Little overhead for balancing.
  • Trees are nearly optimal.
  • Top-down implementation can be made tail-recursive, so very efficient.

31 B-Trees

32 B-trees
• A generalization of 2-3-4 trees.
• Used for very large dictionaries where the data are maintained on disks.
• Since disk lookups are very SLOW, want to read as few disk pages as possible. Want really shallow depth trees!

33 B-trees Key Idea
• Make the nodes in the trees have a huge number of links, k-way.
• Typically choose k so that a node fills a disk page.
• As with 2-3-4 trees, not all the nodes have k links. Some may have as few as k/2 links.
• When a node overflows, split the node.

34 B-trees
• Takes O(log_{k/2} N) probes for search and insert.
• Typically about 2-3 probes (disk accesses)
• E.g., for N < 125 million and k = 1000, the height of the tree is less than 3.
• As all searches go through the root node, usually keep the root node in memory.
• Many variants
• Common in many large database systems.
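
The slide's numbers follow from the fact that 500^3 = 125,000,000. The helper below simply evaluates log base k/2 of N in floating point; `probeBound` is a hypothetical name, and the result is the slide's bound rather than an exact count of disk reads for any particular B-tree variant.

```java
// Sketch only: BTreeProbes and probeBound are illustrative names.
class BTreeProbes {
    // Evaluates log_{k/2}(n), the probe bound quoted on the slide,
    // assuming every node has at least k/2 links.
    static double probeBound(long n, int k) {
        return Math.log((double) n) / Math.log(k / 2.0);
    }
}
```

For example, `BTreeProbes.probeBound(125_000_000L, 1000)` is approximately 3, and any smaller N gives a value below 3.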

35 Conclusion
• AVL trees have the disadvantage that insert is not tail recursive.
• 2-3-4 trees are not practical, but are a good way to think about other approaches.
• Red-black trees are very efficient and have guaranteed O(log N) insert and search.
• B-trees have very shallow depth to minimize the number of disk reads needed for huge databases.

