Published by Horatio McDowell. Modified over 9 years ago.
Introduction to Computer Science 2
Balanced Binary Search Trees (2) & Extended Binary Trees
Prof. Neeraj Suri, Brahim Ayari
Dependable Embedded Systems & SW Group
© Neeraj Suri, EU-NSF ICT March 2006

ICS-II - 2008 Balanced Binary Search Trees (2) & Extended Binary Trees

Height of AVL Trees
AVL trees are defined by the height difference of their subtrees.
Original goal: the tree should be as "balanced" as possible.
How balanced is an AVL tree? The answer is given by the theorem on the height of an AVL tree.
Theorem: The height $h(T)$ of an AVL tree with $n$ nodes satisfies
$\lfloor \log_2 n \rfloor + 1 \le h(T) \le 1.44 \log_2(n+1)$

Fibonacci Trees
The lower bound $\lfloor \log_2 n \rfloor + 1 \le h(T)$ comes from the minimal height of a balanced binary tree (already shown).
For the proof of the upper bound one needs a special class of AVL trees: Fibonacci trees.
Fibonacci numbers: $F_0 = 0$, $F_1 = 1$, $F_n = F_{n-1} + F_{n-2}$
Definition: Fibonacci trees are constructed as follows:
The empty tree $T_0$ is a Fibonacci tree (height 0).
The tree $T_1$, which contains only one node, is a Fibonacci tree of height 1.
If $T_{h-1}$ and $T_{h-2}$ are Fibonacci trees of heights $h-1$ and $h-2$, and $x$ is a node, then $T_h = (T_{h-1}, x, T_{h-2})$ is a Fibonacci tree of height $h$.
No other trees are Fibonacci trees.
Observe: the number of nodes on the path from the root to the deepest leaf gives the height of the Fibonacci tree.

Fibonacci Trees: number of nodes
Tree | Construction | Nodes $n_h$ | $F_h$
$T_0$ | empty tree | $n_0 = 0$ | $F_0 = 0$
$T_1$ | one node | $n_1 = 1$ | $F_1 = 1$
$T_2$ | $(T_1, x, T_0)$ | $n_2 = 2$ | $F_2 = 1$
$T_3$ | $(T_2, x, T_1)$ | $n_3 = 4$ | $F_3 = 2$
[Figure: drawings of $T_0$ to $T_3$]

Fibonacci Trees: number of nodes (continued)
Tree | Construction | Nodes $n_h$ | $F_h$
$T_4$ | $(T_3, x, T_2)$ | $n_4 = 7$ | $F_4 = 3$
$T_5$ | $(T_4, x, T_3)$ | $n_5 = 12$ | $F_5 = 5$
$T_6$, $T_7$, etc. are constructed analogously.
[Figure: drawings of $T_4$ and $T_5$ with their subtrees]

Fibonacci and AVL Trees
To prove: every Fibonacci tree is an AVL tree.
Proof (by induction over $h$):
Note: $T_h$ is always a tree of height $h$.
$T_0$ and $T_1$ are AVL trees.
If $T_{h-1}$ and $T_{h-2}$ are AVL trees, build $T_h = (T_{h-1}, x, T_{h-2})$ according to the rules.
As $T_{h-1}$ and $T_{h-2}$ are AVL trees, only the balance factor of the root remains to be checked:
$BF(T_h) = | h(T_{h-1}) - h(T_{h-2}) | = | (h-1) - (h-2) | = 1$

Fibonacci and AVL Trees
Special note: for a given Fibonacci tree there is no AVL tree with the same height and fewer nodes.
The construction gives AVL trees with maximal height for their number of nodes.
One can add more nodes while keeping the height, but one cannot remove a node without either reducing the height or violating the AVL criterion.
Fibonacci trees therefore give the maximal height of an AVL tree for a given number of nodes.
Note: the number of nodes $n_h$ in $T_h$ is the $(h+2)$-th Fibonacci number minus 1, i.e., $n_h = F_{h+2} - 1$ (for $h \ge 0$).

Fibonacci and AVL Trees
The following inequality holds for Fibonacci numbers: $F_h \ge \phi^{h-2}$ for $h \ge 2$, with $\phi = \frac{1}{2}(1 + \sqrt{5})$.
Let $n$ be the number of nodes in an AVL tree of height $h$. As $T_h$ contains the minimal number of nodes for that height: $n \ge n_h$.
Insert $n_h = F_{h+2} - 1$: $n \ge n_h = F_{h+2} - 1 \ge \phi^h - 1$, thus $n + 1 \ge \phi^h$.
The number of nodes grows exponentially with the height.
Conversely: $h \le \log_\phi(n+1) = (1 / \log_2 \phi) \log_2(n+1) = 1.44\ldots \log_2(n+1)$
Thus: the search path in an AVL tree is in the worst case 44% longer than in a complete tree.

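The node-count identity and the height bound can be checked numerically. The following is only a sketch: the class name `FibonacciTrees` and its helper methods are illustrative names that do not appear in the slides. It computes $n_h$ from the recurrence $n_h = n_{h-1} + n_{h-2} + 1$ implied by the construction $T_h = (T_{h-1}, x, T_{h-2})$ and compares it with $F_{h+2} - 1$ and with $\log_\phi(n_h + 1)$.

```java
final class FibonacciTrees {
    // k-th Fibonacci number with F_0 = 0, F_1 = 1.
    static long fib(int k) {
        long a = 0, b = 1;
        for (int i = 0; i < k; i++) {
            long t = a + b;
            a = b;
            b = t;
        }
        return a;
    }

    // Number of nodes of the Fibonacci tree T_h:
    // n_0 = 0, n_1 = 1, n_h = n_{h-1} + n_{h-2} + 1 (two subtrees plus the root x).
    static long nodeCount(int h) {
        if (h == 0) return 0;
        long prev = 0, cur = 1;
        for (int i = 2; i <= h; i++) {
            long next = cur + prev + 1;
            prev = cur;
            cur = next;
        }
        return cur;
    }

    // Upper bound on the height of an AVL tree with n nodes: log_phi(n + 1).
    static double heightBound(long n) {
        double phi = (1.0 + Math.sqrt(5.0)) / 2.0;
        return Math.log(n + 1.0) / Math.log(phi);
    }

    public static void main(String[] args) {
        for (int h = 0; h <= 10; h++) {
            long n = nodeCount(h);
            System.out.println("h=" + h + "  n_h=" + n
                    + "  F_(h+2)-1=" + (fib(h + 2) - 1)
                    + "  log_phi(n_h+1)=" + heightBound(n));
        }
    }
}
```

For every $h$ the two node counts agree, and $h$ never exceeds the printed bound, which is the statement $h \le 1.44\ldots \log_2(n+1)$ written with base $\phi$.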
Cost Analysis of AVL Trees
$h \le c \log_2(n+1)$ means: the height of an AVL tree is bounded by $O(\log_2 n)$.
Cost for insertion is in $O(\log_2 n)$: one only needs to consider the path from the root to the insertion point, and rotations have constant cost.
Cost for deletion is in $O(\log_2 n)$: each node on the path from the root to the deleted node requires at most one rotation.
AVL trees are worst-case efficient implementations of binary search trees.
Natural trees need $\Theta(n)$ steps in the worst case.
Calculating the average height is still an open problem.
Empirical results give $h = c + \log_2 n$ for $c \approx 0.2$.

Weight Balanced Binary Search Trees
Treat the "weight difference" of two subtrees as a measure of balance.
Weight = number of nodes in the subtree.
The properties are very similar to those of height balanced binary trees.
Let $T$ be a binary search tree, $T_L$ its left subtree and $n(X)$ the number of nodes in a tree $X$.
Definition: the value $\rho(T) = (n(T_L) + 1) / (n(T) + 1)$ is the root balance of $T$.
Definition: a tree $T$ is $\alpha$-balanced if for every subtree $T'$: $\alpha \le \rho(T') \le 1 - \alpha$

Condition $\alpha \le \rho(T') \le 1 - \alpha$
The set of all $\alpha$-balanced binary trees is called BB($\alpha$) ("bounded balance").
The definition of balance only considers the left subtree, but in a BB($\alpha$) tree every subtree also satisfies $\alpha \le \rho'(T') \le 1 - \alpha$, where $\rho'$ is defined analogously to $\rho$ on the right subtree.
The parameter $\alpha$ defines the "distance" from a complete tree:
$\alpha = 1/2$: only complete trees allowed
$\alpha < 1/2$: relaxed condition
$\alpha = 0$: no structural conditions
$\alpha > 1/2$: makes no sense to consider

Example
$\rho(T) = (n(T_L) + 1) / (n(T) + 1)$
Choose $\alpha = 0.3$; then for every subtree: $\alpha = 0.3 \le \rho \le 1 - \alpha = 0.7$
The tree is in BB($\alpha$) for $\alpha = 0.3$.
Subtree with root | $\rho$
Mars | 3/10 = 0.3
Jupiter | 2/3 = 0.67
Pluto | 3/7 = 0.43
Mercury | 1/3 = 0.33
Uranus | 2/4 = 0.5
[Figure: binary search tree with the nodes Mars, Jupiter, Pluto, Earth, Mercury, Uranus, Neptune, Saturn, Venus]

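The root balances in this example can be recomputed with a few lines of Java. This is a minimal sketch, not the lecture's implementation: the class `WeightBalance` and its methods are hypothetical names, and the tree shape is an assumption reconstructed from the listed $\rho$ values (the figure itself is not available), chosen so that it is a valid search tree over the nine planet names.

```java
final class WeightBalance {
    static final class Node {
        final String key;
        final Node left, right;
        Node(String key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // n(X): number of nodes in the tree X (0 for the empty tree).
    static int size(Node t) {
        return t == null ? 0 : size(t.left) + 1 + size(t.right);
    }

    // Root balance rho(T) = (n(T_L) + 1) / (n(T) + 1).
    static double rootBalance(Node t) {
        return (size(t.left) + 1.0) / (size(t) + 1.0);
    }

    // A tree is alpha-balanced if alpha <= rho(T') <= 1 - alpha for every subtree T'.
    static boolean isBalanced(Node t, double alpha) {
        if (t == null) return true;
        double rho = rootBalance(t);
        return rho >= alpha && rho <= 1.0 - alpha
                && isBalanced(t.left, alpha) && isBalanced(t.right, alpha);
    }

    // Assumed shape, reconstructed from the rho values listed in the example.
    static Node planets() {
        Node jupiter = new Node("Jupiter", new Node("Earth", null, null), null);
        Node mercury = new Node("Mercury", null, new Node("Neptune", null, null));
        Node uranus = new Node("Uranus", new Node("Saturn", null, null),
                new Node("Venus", null, null));
        Node pluto = new Node("Pluto", mercury, uranus);
        return new Node("Mars", jupiter, pluto);
    }

    public static void main(String[] args) {
        Node mars = planets();
        System.out.println("rho(Mars)  = " + rootBalance(mars));
        System.out.println("rho(Pluto) = " + rootBalance(mars.right));
        System.out.println("in BB(0.3): " + isBalanced(mars, 0.3));
    }
}
```

Recomputing `size` at every call is quadratic in the worst case; a real implementation would store the subtree size in each node, which is one answer to the question of how to calculate the root balance efficiently.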
Notes
Already noted: $\rho = 1/2$ holds for complete trees.
A root balance $< 1/2$ means: there are fewer nodes in the left subtree than in the right one.
$\alpha$ limits the root balance symmetrically from both sides.
If the left subtree contains all the nodes, the root balance tends towards 1 with an increasing number of nodes.
Only $\alpha = 0$ allows all "degenerations".
Not every tree (with $n$ nodes) can be transformed into a BB($\alpha$) tree for any $\alpha$.
There is at least one tree in BB($\alpha$) when $0.25 \le \alpha \le 1 - \frac{1}{2}\sqrt{2} \approx 0.292$.

Height of Weight Balanced Trees
Note: when traversing the path from the root to the leaves, one "loses" a number of nodes at every step, depending on $\alpha$.
Consider the path $p = v_1, v_2, \ldots, v_h$.
For the left and right subtrees $T_L$ and $T_R$ of a tree $T$ the BB($\alpha$) condition gives:
$n(T_L) + 1 \le (1 - \alpha)(n(T) + 1)$
$n(T_R) + 1 \le (1 - \alpha)(n(T) + 1)$
Traversal of the path $p$:
$n(v_2) + 1 \le (1 - \alpha)(n(v_1) + 1)$
$n(v_3) + 1 \le (1 - \alpha)(n(v_2) + 1)$
...
$n(v_h) + 1 \le (1 - \alpha)(n(v_{h-1}) + 1)$

Height of Weight Balanced Trees
As $v_1$ is the root and $v_h$ a leaf: $n(T) + 1 = n(v_1) + 1$ and $n(v_h) + 1 = 2$.
Chaining the inequalities along the path:
$2 = n(v_h) + 1 \le (1 - \alpha)^{h-1}(n(v_1) + 1) = (1 - \alpha)^{h-1}(n(T) + 1)$
Apply logarithms on both sides:
$1 \le (h - 1)\log_2(1 - \alpha) + \log_2(n(T) + 1)$
Thus (note: $\log_2(1 - \alpha) < 0$), with $c = -\log_2(1 - \alpha) > 0$:
$h - 1 \le \log_2(n(T) + 1) / c \in O(\log_2 n)$
The height of the tree is logarithmic in the number of nodes.

Operations on Weight Balanced Binary Trees
Search is the same as for AVL trees; the cost is logarithmic.
For insertion/deletion the root balance must be updated along the path from the root to the corresponding position.
If the criterion is violated: rotations, as for AVL trees.
Open issues: Are rotations appropriate for restructuring BB($\alpha$) trees? How does one calculate the root balance efficiently?
The number of rotations on the path to the root is limited: search/insertion/deletion are all in $O(\log_2 n)$.

Position Search in Balanced Binary Search Tree
Comparison: tree implementations vs. linked lists.
Balanced trees allow (almost) all operations in $O(\log_2 n)$.
Linked lists need $O(n)$ for search, insertion, and deletion.
For sequential traversal both perform in $O(n)$.
Should sorted data always be stored in trees? One should not underestimate the implementation costs.
The "last" operation where lists "win" is positional search.
Positional search: find the $k$-th element in a list.
For trees the "list" is the inorder traversal.

The Problem
For lists: traverse $k$ elements in $O(k)$.
For trees: one does not "know" whether to go left or right, and one knows nothing about the number of nodes in the subtrees.
In the worst case all nodes must be visited: $O(n)$.
That can be improved!

Rank of a Node
Definition: the rank of a node is the number of nodes in its left subtree plus 1.
Rank = position of node x in the subtree of which x is the root.
    class BinarySearchTree {
        int K;                    /* key */
        Info info;                /* info */
        int balance;              /* BF, for AVL trees: -1, 0, +1 */
        int rank;                 /* number of nodes in the left subtree plus 1 */
        BinarySearchTree L, R;    /* left and right subtree */
        /* constructor and methods... */
        public BinarySearchTree findPos(int pos) {... }
    }

Algorithm
Pseudo code:
Start at the root.
If pos < rank: search in the left subtree.
If pos > rank: subtract the rank from the position and search in the right subtree.
The search stops when pos = rank.
Correctness: the rank of a node is always its position in the subtree of which it is the root.
Note: when inserting or deleting in a left subtree, the nodes on the path up to the root must update their ranks.

Example
[Figure: search tree of eleven cities, each node annotated with its rank: Lima 5, Bonn 3, Prague 3, Bern 2, Paris 2, Sofia 2, Athens 1, Cairo 1, Oslo 1, Rome 1, Tokyo 1. The inorder positions run from pos = 1 to pos = 11.]
pos = 4 -> Cairo
pos = 9 -> Rome

Java Method
    public BinarySearchTree findPos(int pos) {
        BinarySearchTree root = this;
        // Descend until the node at position pos is found or the subtree is empty.
        while ((root != null) && (pos != root.rank)) {
            if (pos < root.rank) {
                root = root.L;
            } else {
                // Skip the left subtree and the current node.
                pos = pos - root.rank;
                root = root.R;
            }
        }
        return root;
    }
Complexity in a balanced tree: $O(\log_2 n)$

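A self-contained sketch of the positional search on the city example follows. The class `RankedTree` is a hypothetical stand-in for the lecture's `BinarySearchTree` (string keys, no balance factor), and the tree shape is an assumption reconstructed from the ranks in the example; the search loop is the same as in the method above.

```java
final class RankedTree {
    final String key;
    final int rank;               // number of nodes in the left subtree plus 1
    final RankedTree left, right;

    RankedTree(String key, RankedTree left, RankedTree right) {
        this.key = key;
        this.left = left;
        this.right = right;
        // Computed once at construction; fine for a sketch, since the tree is never modified.
        this.rank = size(left) + 1;
    }

    static int size(RankedTree t) {
        return t == null ? 0 : size(t.left) + 1 + size(t.right);
    }

    static RankedTree leaf(String key) {
        return new RankedTree(key, null, null);
    }

    // Returns the node at inorder position pos (1-based), or null if there is none.
    RankedTree findPos(int pos) {
        RankedTree node = this;
        while (node != null && pos != node.rank) {
            if (pos < node.rank) {
                node = node.left;
            } else {
                pos -= node.rank;   // skip the left subtree and the current node
                node = node.right;
            }
        }
        return node;
    }

    // Assumed shape of the example tree (ranks: Lima 5, Bonn 3, Prague 3, ...).
    static RankedTree cities() {
        RankedTree bern = new RankedTree("Bern", leaf("Athens"), null);
        RankedTree bonn = new RankedTree("Bonn", bern, leaf("Cairo"));
        RankedTree paris = new RankedTree("Paris", leaf("Oslo"), null);
        RankedTree sofia = new RankedTree("Sofia", leaf("Rome"), leaf("Tokyo"));
        RankedTree prague = new RankedTree("Prague", paris, sofia);
        return new RankedTree("Lima", bonn, prague);
    }

    public static void main(String[] args) {
        RankedTree root = cities();
        System.out.println("pos = 4 -> " + root.findPos(4).key);
        System.out.println("pos = 9 -> " + root.findPos(9).key);
    }
}
```

For pos = 9 the search goes right at Lima (pos becomes 4), right at Prague (pos becomes 1), left at Sofia, and stops at Rome, whose rank is 1.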
Summary: Balanced Search Trees
Operation | Sequential list | Linked list | Bal. tree with degree
Search | $O(\log_2 n)$ (binary search) | $O(n)$ | $O(\log_2 n)$
Positional search ($k$-th element) | $O(1)$ | $O(k)$ | $O(\log_2 n)$
Insertion | $O(\log_2 n) + O(n)$ | $O(n)$; $O(1)$ at a known position | $O(\log_2 n)$
Deletion | $O(\log_2 n) + O(n)$ | $O(n)$; $O(1)$ at a known position (doubly linked) | $O(\log_2 n)$
Deletion of the $k$-th element | $O(n-k)$ | $O(k)$ | $O(\log_2 n)$
Sequential traversal | $O(n)$ | $O(n)$ | $O(n)$

Extended Binary Trees

Extended binary trees
Replace the NULL pointers with special (external) nodes.
A binary tree to which external nodes have been added is called an extended binary tree.
The data can be stored either in the internal or in the external nodes.
The length of the path to a node reflects the cost of searching for it.

External and internal path length
The cost of a search in an extended binary tree depends on the following parameters:
External path length = the sum of all path lengths from the root to the external nodes $S_i$ ($1 \le i \le n+1$): $Ext_n = \sum_{i=1}^{n+1} depth(S_i)$
Internal path length = the sum of all path lengths from the root to the internal nodes $K_i$ ($1 \le i \le n$): $Int_n = \sum_{i=1}^{n} depth(K_i)$
$Ext_n = Int_n + 2n$ (proof by induction)
Extended binary trees with a minimal external path length also have a minimal internal path length.

Example
External path length: $Ext_n = 3 + 4 + 4 + 2 + 3 + 3 + 3 + 3 = 25$
Internal path length: $Int_n = 0 + 1 + 1 + 2 + 2 + 2 + 3 = 11$
With $n = 7$: $Ext_n = Int_n + 2n = 11 + 14 = 25$
[Figure: extended binary tree with $n = 7$ internal nodes, each node labeled with its depth]

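Both path lengths follow directly from a recursive traversal in which a `null` child plays the role of an external node. The sketch below uses hypothetical names (`PathLengths`, `internal`, `external`), and the tree shape is an assumption: one shape whose node depths match the sums given in the example, since the figure is not available.

```java
final class PathLengths {
    static final class Node {
        final Node left, right;   // a null child stands for an external node
        Node(Node left, Node right) {
            this.left = left;
            this.right = right;
        }
    }

    // Number of internal nodes.
    static int size(Node t) {
        return t == null ? 0 : size(t.left) + 1 + size(t.right);
    }

    // Int: sum of the depths of all internal nodes (the root has depth 0).
    static int internal(Node t, int depth) {
        if (t == null) return 0;
        return depth + internal(t.left, depth + 1) + internal(t.right, depth + 1);
    }

    // Ext: sum of the depths of all external nodes.
    static int external(Node t, int depth) {
        if (t == null) return depth;
        return external(t.left, depth + 1) + external(t.right, depth + 1);
    }

    // Assumed shape: internal depths 0,1,1,2,2,2,3 and external depths 3,4,4,2,3,3,3,3.
    static Node example() {
        Node depth3 = new Node(null, null);
        Node leftLeft = new Node(null, depth3);
        Node left = new Node(leftLeft, null);
        Node right = new Node(new Node(null, null), new Node(null, null));
        return new Node(left, right);
    }

    public static void main(String[] args) {
        Node t = example();
        int n = size(t);
        System.out.println("n = " + n + ", Int = " + internal(t, 0)
                + ", Ext = " + external(t, 0));   // Ext equals Int + 2n
    }
}
```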
Minimal and maximal length
For a given $n$, a balanced tree has the minimal internal path length.
Example: in a complete tree of height $h$ (with $n = 2^h - 1$) the internal path length is $Int_n = \sum_{i=1}^{h-1} i \cdot 2^i$
The internal path length becomes maximal if the tree degenerates to a linear list: $Int_n = \sum_{i=1}^{n-1} i = n(n-1)/2$
Example: $h = 4$, $n = 15$, $Int = 34$, $Ext = 16 \cdot 4 = 64$
For comparison: a list with $n = 15$ nodes has $Int = 105$, $Ext = 105 + 30 = 135$

Weighted binary trees
Often weights $q_i$ are assigned to the external nodes ($1 \le i \le n+1$).
The weighted external path length is defined as $Ext_w = \sum_{i=1}^{n+1} depth(S_i) \cdot q_i$
For weighted binary trees the properties of minimal and maximal path lengths no longer apply.
Determining the minimal weighted external path length is an important practical problem.
[Figure: two extended binary trees with the external weights 3, 8, 15, 25. Balanced tree: $Ext_w = 102$. Linear list: $Ext_w = 88$ (less than 102 although it is a linear list).]

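The two values can be reproduced from the definition. This is a small sketch with a hypothetical helper `extW`; the depth assignments are assumptions chosen to match the stated totals (all four external nodes at depth 2 in the balanced tree, and depths 1, 2, 3, 3 for the weights 25, 15, 8, 3 in the degenerate tree).

```java
final class WeightedPathLength {
    // Ext_w = sum over all external nodes of depth(S_i) * q_i.
    static int extW(int[] depths, int[] weights) {
        if (depths.length != weights.length) {
            throw new IllegalArgumentException("one depth per weight expected");
        }
        int sum = 0;
        for (int i = 0; i < depths.length; i++) {
            sum += depths[i] * weights[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] weights = {3, 8, 15, 25};
        int[] balancedDepths = {2, 2, 2, 2};   // assumed: complete tree with four external nodes
        int[] linearDepths = {3, 3, 2, 1};     // assumed: heaviest weight closest to the root
        System.out.println("balanced: " + extW(balancedDepths, weights)); // 2 * 51 = 102
        System.out.println("linear:   " + extW(linearDepths, weights));   // 9 + 24 + 30 + 25 = 88
    }
}
```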
Application example: optimal codes
To convert a text file efficiently to bit strings, there are two alternatives:
Fixed length coding: each character has the same number of bits (e.g., ASCII).
Variable length coding: some characters are represented using fewer bits than others.
Example for coding with fixed length: 3-bit code for the alphabet A, B, C, D: A = 001, B = 010, C = 011, D = 100
The message ABBAABCDADA is converted to 001010010001001010011100001100001 (length 33 bits).
Using a 2-bit code the same message can be coded with only 22 bits.
To decode the message, group the bits in blocks of 3 (respectively 2) and look up each block in a table that maps codes to characters.

Application example: optimal codes (2)
Idea: more frequently used characters are coded using fewer bits.
Character | Frequency | Coding
A | 5 | 0
B | 3 | 10
C | 1 | 111
D | 2 | 110
Message: ABBAABCDADA
Coding: 01010001011111001100
Length: 20 bits
Variable length coding can reduce the memory space needed for storing the file.
How can this special coding be found, and why is the decoding unique?

Application example: optimal codes (3)
Representation of the frequencies and the coding as a weighted binary tree.
First of all, decoding: given a bit string, use the successive bits to traverse the tree starting from the root. On arriving at an external node, output the character stored there and start again at the root.
Example: 010100010111...
1. Bit = 0: external node, A
2. Bit = 1: from the root to the right
3. Bit = 0: left, external node, B
4. Bit = 1: from the root to the right
5. Bit = 1: right
...
[Figure: code tree with the external nodes A (weight 5), B (3), C (1), D (2); left edges are labeled 0, right edges 1]

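The decoding procedure can be sketched as follows. The class `PrefixDecoder` and its methods are illustrative names, not part of the lecture; the tree is built from the code table A = 0, B = 10, C = 111, D = 110 of the previous slide, and only external nodes carry a character.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PrefixDecoder {
    static final class Node {
        Node zero, one;        // children reached by bit 0 and bit 1
        Character symbol;      // set only for external nodes
    }

    // Builds the code tree from a table character -> bit string.
    static Node buildTree(Map<Character, String> codes) {
        Node root = new Node();
        for (Map.Entry<Character, String> e : codes.entrySet()) {
            Node cur = root;
            for (char bit : e.getValue().toCharArray()) {
                if (bit == '0') {
                    if (cur.zero == null) cur.zero = new Node();
                    cur = cur.zero;
                } else {
                    if (cur.one == null) cur.one = new Node();
                    cur = cur.one;
                }
            }
            cur.symbol = e.getKey();
        }
        return root;
    }

    // Walks the tree bit by bit; at an external node, emits the character and restarts at the root.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node cur = root;
        for (char bit : bits.toCharArray()) {
            cur = (bit == '0') ? cur.zero : cur.one;
            if (cur == null) throw new IllegalArgumentException("not a valid code word");
            if (cur.symbol != null) {
                out.append(cur.symbol);
                cur = root;
            }
        }
        if (cur != root) throw new IllegalArgumentException("bit string ends inside a code word");
        return out.toString();
    }

    // Code table of the example alphabet A, B, C, D.
    static Map<Character, String> exampleCodes() {
        Map<Character, String> codes = new LinkedHashMap<>();
        codes.put('A', "0");
        codes.put('B', "10");
        codes.put('C', "111");
        codes.put('D', "110");
        return codes;
    }

    public static void main(String[] args) {
        Node root = buildTree(exampleCodes());
        System.out.println(decode(root, "01010001011111001100")); // ABBAABCDADA
    }
}
```

Because every character sits in an external node, no code word is a prefix of another, so the walk never has to look ahead or backtrack.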
Correctness condition
Observation: in variable length coding, the code of one character must not be a prefix of the code of any other character.
If the code is represented as an extended binary tree with the characters in the external nodes, then uniqueness is guaranteed (only one character per external node).
If the frequency of the characters in the original text is taken as the weight of the external nodes, then a tree with minimal weighted external path length yields an optimal code.
How is a tree with minimal external path length generated?

Huffman Code
Idea: characters are weighted and sorted according to their frequency.
This also works independently of a particular text, e.g., for English (characters with relative weights):
E 1231, T 959, A 805, O 794, N 719, I 718, S 659, R 603, H 514, L 403, D 365, C 320, U 310, P 229, F 228, M 225, W 203, Y 188, B 162, G 161, V 93, K 52, Q 20, X, J 10, Z 9
A binary tree with minimal external path length is constructed as follows:
Each character is represented by a tree consisting of a single external node with the corresponding weight.
The two trees with the smallest weights are merged into a new tree. The root of the new tree is marked with the sum of the weights of the original roots.
Continue until only one tree remains.

Example 1: Huffman
Alphabet and frequency: E 29, T 10, N 9, I 5, S 4
Step 1: (4, 5, 9, 10, 29): merge 4 + 5, new weight: 9
Step 2: (9, 9, 10, 29): merge 9 + 9, new weight: 18
[Figure: partial trees after steps 1 and 2; left edges are labeled 0, right edges 1]

Example 1: Huffman (2)
Step 3: (18, 10, 29), sorted (10, 18, 29): merge 10 + 18, new weight: 28
Step 4: (28, 29): merge 28 + 29, new weight: 57, finished!
[Figure: partial trees after steps 3 and 4; left edges are labeled 0, right edges 1]

Resulting tree
Coding:
Character | Code | Weight
E | 1 | 29
T | 00 | 10
N | 011 | 9
I | 0101 | 5
S | 0100 | 4
$Ext_w = 112$
Using this coding, the codes are, e.g.:
TENNIS = 00101101101010100
SET = 0100100
NET = 011100
Decoding as described before.
[Figure: resulting Huffman tree with root weight 57 and the external nodes E, T, N, I, S]

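Encoding with this table is a plain lookup per character. The sketch below (hypothetical class `TableEncoder`) reproduces the three example words and recomputes $Ext_w$ as the sum of weight times code length.

```java
import java.util.HashMap;
import java.util.Map;

final class TableEncoder {
    // Code table of the resulting tree: E = 1, T = 00, N = 011, I = 0101, S = 0100.
    static Map<Character, String> codeTable() {
        Map<Character, String> table = new HashMap<>();
        table.put('E', "1");
        table.put('T', "00");
        table.put('N', "011");
        table.put('I', "0101");
        table.put('S', "0100");
        return table;
    }

    // Concatenates the code words of all characters in the text.
    static String encode(String text, Map<Character, String> table) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            String code = table.get(c);
            if (code == null) throw new IllegalArgumentException("no code for " + c);
            bits.append(code);
        }
        return bits.toString();
    }

    // Ext_w = sum of weight * code length, since the code length is the depth of the external node.
    static int extW(Map<Character, Integer> weights, Map<Character, String> table) {
        int sum = 0;
        for (Map.Entry<Character, Integer> e : weights.entrySet()) {
            sum += e.getValue() * table.get(e.getKey()).length();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<Character, String> table = codeTable();
        System.out.println("TENNIS = " + encode("TENNIS", table));
        System.out.println("SET    = " + encode("SET", table));
        System.out.println("NET    = " + encode("NET", table));
    }
}
```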
Some remarks
The resulting tree is not regular.
Regular trees are not always optimal.
Example: for the weights 4, 5, 9, 10, 29 the best nearly complete tree has $Ext_w = 123$.
For the message ABBAABCDADA, 20 bits is optimal (see previous slides).
[Figure: nearly complete tree with the external weights 4, 5, 9, 10, 29]

Example 2: Huffman
Character | p (%) | Code
A | 25 | 00
B | 4 | 1110
C | 13 | 100
D | 7 | 110
E | 35 | 01
F | 11 | 101
G | 2 | 11110
H | 3 | 11111
Average number of bits without Huffman: 3 (because $2^3 = 8$)
Average number of bits using the Huffman code: $\sum_i p_i \cdot l_i = 2.54$, where $l_i$ is the length of the code of character $i$.
There are other "valid" solutions! But the average number of bits is the same for all of them (equal to Huffman).

Analysis
    /* Algorithm Huffman */
    for (int i = 1; i <= n-1; i++) {
        p1 = smallest element in list L
        remove p1 from L
        p2 = smallest element in L
        remove p2 from L
        create node p
        add p1 and p2 as left and right subtrees to p
        weight(p) = weight(p1) + weight(p2)
        insert p into L
    }
The run time behavior depends in particular on the implementation of the list:
Time required to find the node with the smallest weight.
Time required to insert a new node.
"Naive" implementations give $O(n^2)$, "smarter" ones result in $O(n \log_2 n)$.

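One "smarter" choice for the list L is a binary heap, which makes both finding the smallest element and inserting a new node logarithmic. The following sketch uses `java.util.PriorityQueue` for that purpose; the class `HuffmanSketch` and its method names are assumptions for illustration, not the lecture's code.

```java
import java.util.PriorityQueue;

final class HuffmanSketch {
    static final class Node {
        final long weight;
        final Node left, right;   // both null for an external node
        Node(long weight, Node left, Node right) {
            this.weight = weight;
            this.left = left;
            this.right = right;
        }
        boolean isExternal() {
            return left == null && right == null;
        }
    }

    // Repeatedly merges the two lightest trees; n - 1 merges, each O(log n) with a heap.
    static Node build(long[] weights) {
        if (weights.length == 0) throw new IllegalArgumentException("at least one weight expected");
        PriorityQueue<Node> queue =
                new PriorityQueue<>((a, b) -> Long.compare(a.weight, b.weight));
        for (long w : weights) {
            queue.add(new Node(w, null, null));
        }
        while (queue.size() > 1) {
            Node p1 = queue.poll();
            Node p2 = queue.poll();
            queue.add(new Node(p1.weight + p2.weight, p1, p2));
        }
        return queue.poll();
    }

    // Weighted external path length: sum of weight * depth over all external nodes.
    static long extW(Node t, int depth) {
        if (t.isExternal()) return t.weight * depth;
        return extW(t.left, depth + 1) + extW(t.right, depth + 1);
    }

    public static void main(String[] args) {
        // E, T, N, I, S from Example 1.
        System.out.println(extW(build(new long[]{29, 10, 9, 5, 4}), 0));   // 112
        // A, B, C, D from the message ABBAABCDADA.
        System.out.println(extW(build(new long[]{5, 3, 1, 2}), 0));        // 20
    }
}
```

Ties between equal weights may be broken differently from the slides, so the tree shape can differ, but the weighted external path length is the same for every valid Huffman tree.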
Optimality
Observation: the weight of an internal node K in the Huffman tree is the sum of the weights of the external nodes below it; the sum of the weights of all internal nodes of the subtree with root K equals its weighted external path length.
Theorem: a Huffman tree is an extended binary tree with minimal weighted external path length $Ext_w$.
Proof outline (by induction over $n$, the number of characters in the alphabet):
The statement to prove is A(n) = "A Huffman tree with $n$ external nodes has minimal external path length $Ext_w$".
Consider first $n = 2$: prove A(2) = "A Huffman tree with 2 external nodes has minimal external path length".

Optimality (2)
Proof:
$n = 2$: only two characters with weights $q_1$ and $q_2$, which result in a tree with $Ext_w = q_1 + q_2$. This is minimal, because there are no other trees.
Induction hypothesis: for all $i \le n$, A(i) is true.
To prove: A(n+1) is true.
[Figure: tree with root V and subtrees $T_1$ and $T_2$]

Optimality (3)
Proof: consider a Huffman tree T with $n+1$ nodes. This tree has a root V and two subtrees $T_1$ and $T_2$, which have the weights $q_1$ and $q_2$ respectively.
From the construction method we can deduce that the weights $q_i$ of all internal nodes $n_i$ of $T_1$ and $T_2$ satisfy $q_i \le \min(q_1, q_2)$.
Therefore, for these weights $q_i$: $q_1 + q_2 > q_i$.
So if V is replaced by any node in $T_1$ or $T_2$, the resulting tree will have a greater weight.
Replacing nodes within $T_1$ and $T_2$ makes no sense, because $T_1$ and $T_2$ are already optimal (both are trees with $n$ nodes or fewer, and the induction hypothesis holds for them).
So T is an optimal tree with $n+1$ nodes.
[Figure: tree with root V of weight $q_1 + q_2$ and subtrees $T_1$ (weight $q_1$) and $T_2$ (weight $q_2$)]

Huffman Code: Applications
Fax machine

Huffman: Other applications
ZIP coding (at least a similar technique).
In principle: most coding techniques for data reduction without loss (lossless compression).
NOT Huffman: the lossy steps of compression techniques like JPEG, MP3, MPEG, ... (these formats still apply Huffman or similar entropy coding as a final lossless stage).