Copyright Curt Hill
Balance in Binary Trees: Impact on Performance
Tree Shape and Performance
A balanced tree has excellent performance, O(log₂ N), for:
–Searches
–Insertions
–Deletions
Only a hash table can beat this performance
–But it has its own issues

What is balance?
The notion is that the two sub-trees of a node are of about the same size
Thus a search eliminates about half of the remaining tree with each examination
Perfect balance:
–For each node in the tree, the sizes of its two sub-trees differ by at most one
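The definition of perfect balance can be checked directly by counting nodes. The following is a minimal sketch, not code from the slides; the `Node` class and the function names are assumptions made for illustration, and the check recounts sub-trees at every node, so it favors clarity over speed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    # Hypothetical node type for illustration only.
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def size(node):
    """Number of nodes in the sub-tree rooted at node."""
    return 0 if node is None else 1 + size(node.left) + size(node.right)


def is_perfectly_balanced(node):
    """True if, at every node, the two sub-tree sizes differ by at most one."""
    if node is None:
        return True
    if abs(size(node.left) - size(node.right)) > 1:
        return False
    return is_perfectly_balanced(node.left) and is_perfectly_balanced(node.right)


balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))
print(is_perfectly_balanced(balanced))  # True
print(is_perfectly_balanced(chain))     # False
```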
Probabilities
What is the likelihood that a randomly built tree will have good performance characteristics?
This is a difficult question
The shape of a tree depends on the order in which the nodes are inserted
Example:
–Consider the integers 1-7 as the items to put in a tree
–There are 7! = 5040 ways to order their input
–7 ways to choose the first, 6 ways to choose the second, etc.
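To see how the entry order determines the shape, the sketch below builds an ordinary (unbalanced) binary search tree from two different orderings of 1-7 and reports the height in nodes. The `Node`, `insert`, `build`, and `height` names are illustrative assumptions, not taken from the slides.

```python
class Node:
    # Hypothetical node type for illustration only.
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain BST insertion with no rebalancing."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root


def build(keys):
    """Insert the keys one at a time, in the order given."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def height(root):
    """Height counted in nodes: the longest search path."""
    return 0 if root is None else 1 + max(height(root.left), height(root.right))


print(height(build([4, 2, 6, 1, 3, 5, 7])))  # 3
print(height(build([1, 2, 3, 4, 5, 6, 7])))  # 7
```

The same seven keys give a longest search path of 3 nodes in one order and 7 nodes in the other.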
What do we want?
A search must look at no more than 3 nodes

Example Continued
There are two really bad ways to build the tree:
–Insert in ascending order or in descending order
–There are only two of these, but several other orders are just as bad
–Consider … or …
Bad in this case means that every node has zero or one child, so the tree is effectively a linked list

What do we not want?
A search may have to look at as many as 7 nodes
[Figures: the chain produced by arrival in ascending order, and a second chain labeled "Equally bad"]

Negative Combinatorics
There are two ways to choose the first item: the smallest or the largest
–Each subsequent item also provides two ways:
–The smallest item remaining
–The largest item remaining
–The final item leaves only one choice
–Therefore 2 * 2 * 2 * 2 * 2 * 2 * 1 = 64 ways to choose an order that builds a list
–This is a 1.27% chance of a list (64 / 5040)
A search of such a list may look at as many as 7 nodes
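The count of 64 can be confirmed by brute force over all 5040 orderings. This is a sketch under one stated shortcut: the shape of an unbalanced search tree is fixed by taking the first key as the root, the smaller later keys (in order) as the left sub-tree, and the larger later keys as the right sub-tree. The function name `bst_height` and the sample ordering 7, 1, 6, 2, 5, 3, 4 are illustrative choices, not from the slides.

```python
from itertools import permutations


def bst_height(order):
    """Height in nodes of the BST built by inserting `order` left to right."""
    if not order:
        return 0
    root, rest = order[0], order[1:]
    smaller = [k for k in rest if k < root]  # keys that end up in the left sub-tree
    larger = [k for k in rest if k > root]   # keys that end up in the right sub-tree
    return 1 + max(bst_height(smaller), bst_height(larger))


# Orderings of 1-7 whose tree is a chain of 7 nodes.
degenerate = sum(1 for p in permutations(range(1, 8)) if bst_height(p) == 7)
print(degenerate)                          # 64
print(round(100 * degenerate / 5040, 2))   # 1.27

# A zig-zag ordering that is neither ascending nor descending is just as bad.
print(bst_height([7, 1, 6, 2, 5, 3, 4]))   # 7
```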
Positive Combinatorics
There is only one way to choose the root: it must be the 4
There are two ways to choose the second: 2 or 6
There are three ways to choose the third
–If 2 was picked: the 6 or either child of 2 (1 or 3)
–If 6 was picked: the 2 or either child of 6 (5 or 7)
It gets exciting after that

Positive Combinatorics (continued)
Sub-cases of the last three choices need to be examined
These do not work well in this kind of presentation
I believe that 80 of the 5040 permutations (about 1.6%) yield a perfectly balanced tree
However, most possibilities fall somewhere in between, with maximum paths between 3 and 7
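The figure of 80 can be checked two ways. Counting: after the root 4, the keys 1-3 must arrive with 2 first (2 orders), the keys 5-7 must arrive with 6 first (2 orders), and the two groups can interleave in C(6,3) = 20 ways, giving 20 * 2 * 2 = 80. Brute force: enumerate all orderings and tally the tree heights. The sketch below does both; `bst_height` is an illustrative helper, not something from the slides.

```python
from collections import Counter
from itertools import permutations
from math import comb


def bst_height(order):
    """Height in nodes of the BST built by inserting `order` left to right."""
    if not order:
        return 0
    root, rest = order[0], order[1:]
    smaller = [k for k in rest if k < root]
    larger = [k for k in rest if k > root]
    return 1 + max(bst_height(smaller), bst_height(larger))


# Tally every ordering of 1-7 by the height of the tree it builds.
by_height = Counter(bst_height(p) for p in permutations(range(1, 8)))

print(by_height[3])                         # 80  (perfectly balanced)
print(comb(6, 3) * 2 * 2)                   # 80  (counting argument)
print(by_height[7])                         # 64  (linked lists)
print(round(100 * by_height[3] / 5040, 1))  # 1.6
```

Printing `sorted(by_height.items())` shows how the remaining orderings are spread over heights 4 to 6.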
Summary
The worst case is a linked list, which is bad
–The worst case is not very likely
The best case is perfectly balanced
–The best case is slightly more likely, but still unlikely
Empirical studies indicate that the average path length of an unbalanced tree is only about 39% longer than that of a perfectly balanced tree
Balancing is hard and slows insertions and deletions
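As background, the 39% figure agrees with a standard analytic result for search trees built from keys in random order. It is offered here as a cross-check, not as the source of the slide's claim. The symbols are assumptions of this note: N is the number of keys, H_N is the N-th harmonic number, and C̄ is the average number of nodes examined in a successful search.

```latex
\bar{C}_{\text{random}}(N) = 2\left(1 + \frac{1}{N}\right) H_N - 3 \;\sim\; 2 \ln N ,
\qquad
\bar{C}_{\text{perfect}}(N) \;\sim\; \log_2 N ,
\qquad
\lim_{N \to \infty} \frac{\bar{C}_{\text{random}}(N)}{\bar{C}_{\text{perfect}}(N)}
  = 2 \ln 2 \approx 1.386
```

A ratio of about 1.39 corresponds to paths that are roughly 39% longer on average.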
When to Balance
In most cases an unbalanced tree will perform quite adequately
Balancing could be considered if the application meets both of the following criteria:
–The data set is large and search performance affects the program
–The number of searches is large compared to the number of insertions and deletions
Perfectly balanced trees
Definition:
–For each node, the numbers of nodes in the left and right sub-trees differ by at most 1
Balancing a tree is a recursive process that involves the nodes from the leaves to the root
Control information that measures the balance is usually stored in each node

Balance Again
Balancing occurs during insertion and deletion, but not during searches
It is somewhat intricate, so perfect balance is seldom used
–The ratio of searches to inserts and deletes must be very high
Is there another definition of balance that gives good performance with less rebalancing?

Height Balanced
Also known as AVL balance
–Named for Adelson-Velsky and Landis
–They developed it and proved its desirability
Definition:
–The tree is balanced if, for each node, the heights of its two sub-trees differ by at most one
It is the height of the tree that determines the worst-case search
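The height-balance definition translates into a single bottom-up pass. The sketch below is illustrative only; `Node`, `avl_height`, and `is_avl` are assumed names. The first sample tree (5 at the root, 2 with children 1 and 4 on the left, 7 on the right) is an assumed shape chosen because its left sub-tree has 3 nodes and its right has 1, so it fails the perfect-balance test while still passing the height test.

```python
class Node:
    # Hypothetical node type for illustration only.
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def avl_height(node):
    """Height in nodes of the sub-tree, or -1 if any node breaks the AVL rule."""
    if node is None:
        return 0
    left_h = avl_height(node.left)
    right_h = avl_height(node.right)
    if left_h < 0 or right_h < 0 or abs(left_h - right_h) > 1:
        return -1  # a violation here or below propagates upward
    return 1 + max(left_h, right_h)


def is_avl(node):
    return avl_height(node) >= 0


tree = Node(5, Node(2, Node(1), Node(4)), Node(7))
chain = Node(1, None, Node(2, None, Node(3)))
print(is_avl(tree))   # True
print(is_avl(chain))  # False
```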
Digression on Search
Consider searching an array
On average a successful search requires about ½N comparisons
The worst case is N comparisons, to find the last item or to show that an item is not found
The average and worst case are quite different
This is not the case for trees

Searching Trees
In a full, perfectly balanced tree (such as the 7-node example), more than half the nodes are leaves at maximum depth
Worst case is three probes, but the average case is only slightly less than three probes
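The contrast between average and worst case can be made concrete with a little arithmetic. This sketch assumes a sequential search of an unsorted array and a full tree with every level filled; the function names are illustrative, and every item is assumed equally likely to be the search target.

```python
def array_probes(n):
    """(average, worst) comparisons for a successful sequential search of n items."""
    costs = range(1, n + 1)  # the item in position i costs i comparisons
    return sum(costs) / n, max(costs)


def full_tree_probes(levels):
    """(average, worst) probes for a successful search of a full binary tree."""
    n = 2 ** levels - 1
    # Level d (counting the root as level 1) holds 2**(d-1) nodes, each costing d probes.
    total = sum(d * 2 ** (d - 1) for d in range(1, levels + 1))
    return total / n, levels


print(array_probes(7))                 # (4.0, 7)
average, worst = full_tree_probes(3)
print(round(average, 2), worst)        # 2.43 3
```

For 7 items the array averages 4 comparisons against a worst case of 7, while the tree averages 17/7 probes against a worst case of 3.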
AVL Trees Again
Adelson-Velsky and Landis proved:
–The worst case of an AVL tree is only 45% worse than perfectly balanced
–Average case: insignificantly different from perfectly balanced
Every perfectly balanced tree is also AVL balanced
An AVL tree needs far fewer rebalancing operations, so it is cheaper to construct
–For the most part rebalancing occurs only when really needed
Construction
Consider the construction of the following tree
Four types of rebalancing operation
–RR single
–LL single
–LR double
–RL double
Add:
After 2 inserts
Still perfectly balanced [Figure: nodes 4 and 5]

Insert
Neither perfect nor AVL, rebalance is needed

Rotate Right
Rebalance is needed: RR single

After Rotate
After rebalance [Figure: nodes 4, 5 and 7]

Insert
No problem

Insert
Unbalanced in the other way: do an LL single

Rebalance
Rebalance complete: not perfect but AVL [Figure: nodes 1, 2, 4, 5 and 7]

Insert
A rebalance is again needed, but a different one

After Rotation
This requires LR double

Insert
This requires RL double

Rotate
This requires RL double

Rotate
Now complete
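The walkthrough above can be replayed with a small AVL insertion routine. This is a sketch, not the author's implementation. The key sequence 4, 5, 7, 2, 1, 3, 6 is an assumption: the keys 4, 5, 7, 2 and 1 are visible in the slides, while 3 and 6 are guesses chosen so that the four rebalancing cases occur in the order shown (RR, LL, LR, RL). All class and function names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # control information: height of this sub-tree in nodes


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_left(node):
    """The right child becomes the root of this sub-tree."""
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    update(node)
    update(pivot)
    return pivot


def rotate_right(node):
    """The left child becomes the root of this sub-tree."""
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    update(node)
    update(pivot)
    return pivot


def insert(node, key, log):
    """Insert a distinct key, rebalance on the way back up, record each case in log."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key, log)
    else:
        node.right = insert(node.right, key, log)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:  # left sub-tree is too tall
        if key < node.left.key:
            log.append((key, "LL single"))
        else:
            log.append((key, "LR double"))
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance < -1:  # right sub-tree is too tall
        if key > node.right.key:
            log.append((key, "RR single"))
        else:
            log.append((key, "RL double"))
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def preorder(node):
    return [] if node is None else [node.key] + preorder(node.left) + preorder(node.right)


log = []
root = None
for key in [4, 5, 7, 2, 1, 3, 6]:  # assumed insertion sequence, see the note above
    root = insert(root, key, log)

print(log)
# [(7, 'RR single'), (1, 'LL single'), (3, 'LR double'), (6, 'RL double')]
print(preorder(root))
# [4, 2, 1, 3, 6, 5, 7]
```

With this sequence the final tree has 4 at the root, 2 and 6 as its children, and 1, 3, 5 and 7 as leaves, so it ends up perfectly balanced as well as AVL balanced.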
The problem of balancing
Implementation requires extra information in each node
–It measures the heights of the node's sub-trees
Even with an AVL tree there is substantial work to be done at insertion and deletion time
Thus the ratio of searches to insertions and deletions needs to be high
–Just not as high as for perfect balance
Synonyms
Fibonacci tree is a related name: it denotes the AVL tree with the fewest nodes for a given height
The fact that heights may differ by one can lead to a strangely asymmetric tree

Is this balanced?
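The link to the Fibonacci numbers comes from asking how few nodes an AVL tree of a given height can have: the sparsest tree of height h has a root, a sparsest sub-tree of height h-1, and a sparsest sub-tree of height h-2. The sketch below tabulates that recurrence and compares it with the Fibonacci numbers; the function names are illustrative assumptions.

```python
def min_avl_nodes(h):
    """Fewest nodes an AVL tree of height h (counted in nodes) can contain."""
    if h <= 1:
        return h  # height 0 is the empty tree, height 1 is a single node
    # Root plus the sparsest sub-trees of heights h-1 and h-2.
    return 1 + min_avl_nodes(h - 1) + min_avl_nodes(h - 2)


def fib(n):
    """Fibonacci numbers with fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


print([min_avl_nodes(h) for h in range(8)])
# [0, 1, 2, 4, 7, 12, 20, 33]
print([fib(h + 2) - 1 for h in range(8)])
# [0, 1, 2, 4, 7, 12, 20, 33]
```

An AVL tree 7 levels deep can therefore hold as few as 33 nodes, whereas a full tree of that height holds 127, which is why such trees look lopsided while still satisfying the height rule.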