CS 2604 Intro Data Structures and File Management

Binary Trees

The jargon defined here is used throughout later notes, so students should become familiar with it. The fact that the definition of a binary tree is recursive is worth noting.

A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root, which are disjoint from each other and from the root.

For example:

[Figure: a sample binary tree with levels 0, 1 and 2 marked, labeling the root node, an internal node, an edge, and a leaf node.]

Computer Science Dept Va Tech January 2004 ©2000-2004 McQuain WD General Binary Trees
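Because the definition is recursive, a function over a binary tree can follow it case by case. One case handles the empty tree. The other handles a root with two subtrees. The sketch below illustrates this and is not code from these notes. `Node` and `countAtLevel` are hypothetical names. The sketch assumes C++11 (`nullptr`) and represents an empty tree as a null pointer. It counts the nodes on a given level, with the root on level 0 as in the figure.

```cpp
#include <cassert>

// Hypothetical node type for illustration; an empty tree is a null pointer.
struct Node {
    char  data;
    Node* left;
    Node* right;
    Node(char d, Node* l = nullptr, Node* r = nullptr)
        : data(d), left(l), right(r) {}
};

// Number of nodes on level k (root is level 0). The cases mirror the
// definition: an empty tree has no nodes; otherwise the root is on level 0,
// and the nodes on level k are those on level k - 1 of the two subtrees.
int countAtLevel(const Node* t, int k) {
    if (t == nullptr) return 0;   // empty tree
    if (k == 0) return 1;         // the root itself
    return countAtLevel(t->left, k - 1) + countAtLevel(t->right, k - 1);
}
```

Suppose a root has two children and each child has one child of its own. Then `countAtLevel(root, 1)` and `countAtLevel(root, 2)` are both 2.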
Binary Tree Node Relationships

More jargon… not a big deal.

A binary tree node may have 0, 1 or 2 child nodes.

A path is a sequence of adjacent (via the edges) nodes in the tree.

A subtree of a binary tree is either empty, or consists of a node in that tree and all of its descendant nodes.

[Figure: a tree with nodes a, b, g, d, e, f, h, marking a as the parent node of b and g, b and g as the child nodes of a, a node that is a descendant of both a and g, and the subtree rooted at g.]
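One recursive search can compute both a path from the root to a given node and the descendant relation. This is a sketch under stated assumptions. `Node`, `makeExampleTree`, `pathTo` and `isDescendant` are hypothetical names. Node labels are assumed to be unique. The tree shape is the one implied by the traversal listings on the following slides: a has children b and g, g has children d and e, and d has children f and h.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type; an empty tree is a null pointer.
struct Node {
    char  data;
    Node* left;
    Node* right;
    Node(char d, Node* l = nullptr, Node* r = nullptr)
        : data(d), left(l), right(r) {}
};

// The example tree: a(b, g), g(d, e), d(f, h).
Node* makeExampleTree() {
    return new Node('a',
                    new Node('b'),
                    new Node('g',
                             new Node('d', new Node('f'), new Node('h')),
                             new Node('e')));
}

// If target occurs in the subtree rooted at t, appends the labels on the
// path from t down to it (inclusive) and returns true; otherwise leaves
// path unchanged and returns false.
bool pathTo(const Node* t, char target, std::string& path) {
    if (t == nullptr) return false;
    path.push_back(t->data);                 // tentatively extend the path
    if (t->data == target) return true;
    if (pathTo(t->left, target, path) || pathTo(t->right, target, path))
        return true;
    path.pop_back();                         // target not below t: back out
    return false;
}

// True if node d is a proper descendant of node anc, i.e. anc lies on the
// root-to-d path strictly before d itself.
bool isDescendant(const Node* root, char d, char anc) {
    std::string path;
    if (!pathTo(root, d, path)) return false;
    return path.find(anc) < path.size() - 1;
}
```

With this tree, the path to h is a g d h. The node d is a descendant of both a and g, but b is not a descendant of g.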
Quick Application: Expression Trees

Don't overdo this… the point is to provide some motivation for wanting the tree structure.

A binary tree may be used to represent an algebraic expression:

[Figure: an expression tree whose nodes are the operators *, – and + and the operands x, 5 and y.]

Each subtree represents a part of the entire expression…

If we visit the nodes of the binary tree in the correct order, we will construct the algebraic expression.
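To make the motivation concrete, the sketch below builds a small expression tree. Two traversals then do the useful work:

- An inorder traversal, with parentheses added around each operator subtree, reproduces the infix expression.
- A postorder traversal evaluates it.

The sketch makes several assumptions:

- `ExprNode`, `leaf`, `node`, `toInfix` and `evaluate` are hypothetical names.
- Operands are integer constants rather than the variables x and y in the figure.
- The example expression is made up and is not the one on the slide.

```cpp
#include <cassert>
#include <string>

// Hypothetical expression-tree node: an internal node holds an operator
// ('+', '-', '*', '/'); a leaf (op == 0) holds an integer operand.
struct ExprNode {
    char      op;
    int       value;   // used only when op == 0
    ExprNode* left;
    ExprNode* right;
};

ExprNode* leaf(int v) { return new ExprNode{0, v, nullptr, nullptr}; }
ExprNode* node(char op, ExprNode* l, ExprNode* r) { return new ExprNode{op, 0, l, r}; }

// Inorder traversal (left, node, right), parenthesizing every operator
// subtree, yields the fully parenthesized infix expression.
std::string toInfix(const ExprNode* e) {
    if (e->op == 0) return std::to_string(e->value);
    return "(" + toInfix(e->left) + " " + e->op + " " + toInfix(e->right) + ")";
}

// Postorder traversal (left, right, node): both operands are evaluated
// before the operator at the subtree root is applied.
int evaluate(const ExprNode* e) {
    if (e->op == 0) return e->value;
    int a = evaluate(e->left);
    int b = evaluate(e->right);
    switch (e->op) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        default:  return a / b;   // '/' assumed; no division-by-zero check
    }
}
```

For the tree `node('*', node('+', leaf(3), leaf(5)), node('-', leaf(7), leaf(2)))`, `toInfix` gives `((3 + 5) * (7 - 2))` and `evaluate` gives 40.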
Traversals

The traversal schemes are obviously fundamental, but it shouldn't take much time to illustrate them. At least one should be discussed carefully in order to emphasize the recursive nature and reinforce the students' exposure to recursive algorithms.

A traversal is an algorithm for visiting some or all of the nodes of a binary tree in some defined order.

A traversal that visits every node in a binary tree is called an enumeration.

[Figure: the example tree: a has children b and g; g has children d and e; d has children f and h.]

preorder: visit the node, then the left subtree, then the right subtree
   a b g d f h e

postorder: visit the left subtree, then the right subtree, and then the node
   b f h d e g a

inorder: visit the left subtree, then the node, then the right subtree
   b a f d h g e
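The three schemes differ only in where the "visit" step sits relative to the two recursive calls. Each sketch below appends labels to a string in visit order. The `Node` type, `makeExampleTree` and the traversal function names are hypothetical. The tree is the one in the figure.

```cpp
#include <cassert>
#include <string>

// Hypothetical node type; an empty tree is a null pointer.
struct Node {
    char  data;
    Node* left;
    Node* right;
    Node(char d, Node* l = nullptr, Node* r = nullptr)
        : data(d), left(l), right(r) {}
};

// The example tree: a(b, g), g(d, e), d(f, h).
Node* makeExampleTree() {
    return new Node('a', new Node('b'),
                    new Node('g', new Node('d', new Node('f'), new Node('h')),
                             new Node('e')));
}

void preorder(const Node* t, std::string& out) {
    if (t == nullptr) return;
    out += t->data;            // visit the node
    preorder(t->left, out);    // then the left subtree
    preorder(t->right, out);   // then the right subtree
}

void postorder(const Node* t, std::string& out) {
    if (t == nullptr) return;
    postorder(t->left, out);   // left subtree
    postorder(t->right, out);  // right subtree
    out += t->data;            // then the node
}

void inorder(const Node* t, std::string& out) {
    if (t == nullptr) return;
    inorder(t->left, out);     // left subtree
    out += t->data;            // then the node
    inorder(t->right, out);    // then the right subtree
}
```

On the example tree these produce `abgdfhe`, `bfhdega` and `bafdhge`, matching the listings above.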
Postorder Traversal Details

The representation of the recursive descent here is mine… no claim that it's the best way to present it.

Consider the postorder traversal from a recursive perspective:

postorder: postorder visit (POV) the left subtree, postorder visit the right subtree, then visit the node (no recursion)

[Figure: the same example tree, with nodes a, b, g, d, e, f, h.]

If we start at the root:
   POV sub(b) | POV sub(g) | visit a
where
   POV sub(b) = visit b
   POV sub(g) = POV sub(d) | POV sub(e) | visit g
   POV sub(d) = visit f | visit h | visit d
   POV sub(e) = visit e

so the nodes are visited in the order b f h d e g a.
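A trace like the one above can also be generated mechanically, which helps when walking students through the recursion. The sketch below is an assumption-laden illustration, not the notes' code. `Node`, `makeExampleTree` and `postorderTrace` are hypothetical names. Each call logs "POV sub(x)" on entry, indented two spaces per level of recursion, and logs "visit x" after both subtrees are done.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type; an empty tree is a null pointer.
struct Node {
    char  data;
    Node* left;
    Node* right;
    Node(char d, Node* l = nullptr, Node* r = nullptr)
        : data(d), left(l), right(r) {}
};

// The example tree: a(b, g), g(d, e), d(f, h).
Node* makeExampleTree() {
    return new Node('a', new Node('b'),
                    new Node('g', new Node('d', new Node('f'), new Node('h')),
                             new Node('e')));
}

// Postorder traversal that records its own recursive descent.
void postorderTrace(const Node* t, int depth, std::vector<std::string>& log) {
    if (t == nullptr) return;                        // empty subtree: nothing to do
    std::string indent(2 * depth, ' ');
    log.push_back(indent + "POV sub(" + t->data + ")");
    postorderTrace(t->left, depth + 1, log);         // POV the left subtree
    postorderTrace(t->right, depth + 1, log);        // POV the right subtree
    log.push_back(indent + "visit " + t->data);      // then visit the node
}
```

Printing the log for the example tree shows the nested calls. Reading only the "visit" lines gives b f h d e g a.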
Binary Tree Representation

I chose to present a general binary tree before delving into the BST and other specializations, primarily so I could address some fundamental design issues and add some more emphasis on recursion before getting down to the useful variants. I also use this as the base from which the other variants are derived, although I do not use inheritance to do so in this version of my notes.

WARNING: most of the students entering 2604 don't have any sort of firm grip on recursion. Trace as many recursions as you have time for.

The natural way to think of a binary tree is that it consists of nodes (objects) connected by edges (pointers). This leads to a design employing two classes:
- a binary tree class to encapsulate the tree and its operations
- a binary node class to encapsulate the data elements, pointers and associated operations.

Each should be a template, for generality.

The node class may handle all direct accesses of the pointers and data element, or allow its client (the tree) free access.

The tree class may maintain a sense of a current location (node) and must provide all the high-level functions, such as searching, insertion and deletion.

Many implementations use a struct type for the nodes. The motivation is generally to make the data elements and pointers public and hence to simplify the code. This often comes at the expense of automatic initialization, because such structs are frequently written without a constructor (although a C++ struct may have one).
A Binary Node Class Interface

As usual, I choose to make the data and pointers public in the node type. That should be safe since nodes are only used by an encapsulating container. It also simplifies some of the recursive implementations. I also wrote this to store data by pointers, rather than directly, for the reasons given on the slide. (Pointers are necessary if I need to store members of a polymorphic inheritance hierarchy.)

Here's a possible interface for a binary tree node:

```cpp
#include <cstddef>   // NULL

template <typename T> class BinNodeT {
public:
   T Element;
   BinNodeT<T>* Left;
   BinNodeT<T>* Right;

   BinNodeT();
   BinNodeT(const T& D, BinNodeT<T>* L = NULL, BinNodeT<T>* R = NULL);
   bool isLeaf() const;
   ~BinNodeT();
};
```

The binary tree object can access the node data members directly, which is useful for tree navigation.

The design here leaves the data members public to simplify the implementation of the encapsulating binary tree class. Because of that encapsulation, client code cannot take advantage of this decision.

Storing the data element by pointer provides for dynamically allocated elements and for elements from an inheritance hierarchy. The interface shown here stores Element directly. Converting between the two forms is relatively trivial.
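The slide gives only the declarations. The member definitions below are one plausible way to fill them in, and the sketch repeats the interface so that it compiles on its own. A key assumption is that a node does not own its children: the tree class frees nodes itself (as `ClearHelper` does later), so the destructor is left empty. The default constructor value-initializes `Element`.

```cpp
#include <cassert>
#include <cstddef>

template <typename T> class BinNodeT {
public:
   T Element;
   BinNodeT<T>* Left;
   BinNodeT<T>* Right;

   BinNodeT();
   BinNodeT(const T& D, BinNodeT<T>* L = NULL, BinNodeT<T>* R = NULL);
   bool isLeaf() const;
   ~BinNodeT();
};

// Default constructor: value-initialized element, no children (a leaf).
template <typename T>
BinNodeT<T>::BinNodeT() : Element(), Left(NULL), Right(NULL) {}

// Default arguments appear only in the declaration, not here.
template <typename T>
BinNodeT<T>::BinNodeT(const T& D, BinNodeT<T>* L, BinNodeT<T>* R)
   : Element(D), Left(L), Right(R) {}

// A leaf has no child nodes.
template <typename T>
bool BinNodeT<T>::isLeaf() const {
   return Left == NULL && Right == NULL;
}

// Assumption: the tree, not the node, deletes child nodes, so there is
// nothing to release here.
template <typename T>
BinNodeT<T>::~BinNodeT() {}
```

If the destructor instead deleted `Left` and `Right`, a tree-level clear that also deletes every node would free nodes twice. One of the two designs has to be chosen consistently.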
A Binary Tree Class Interface

I tried to keep this minimal and still illustrate the basic design issues. Shaffer's approach to dealing with data comparisons could certainly be adopted here, but I chose to ignore that issue. Here it's irrelevant aside from an equality comparison. When we get to the BST I'll explicitly require the data type to provide the usual relational operators.

One common question is "why do we have those recursive helper functions?" The answer is that the public functions can't do the job themselves. The recursion needs a pointer to the root of each subtree as a parameter, and clients have no access to node pointers.

I included a Current pointer to support some internal bookkeeping in my first version of the template… at this point it's probably not used for anything but to remember the node located by the Find() function.

Here's a possible interface for a binary tree class. It's not likely to be put to any practical use, just a proof of concept.

```cpp
#include <ostream>
using std::ostream;   // the notes use the unqualified name

template <typename T> class BinaryTreeT {
protected:
   BinNodeT<T>* Root;
   BinNodeT<T>* Current;

   // Recursive "helper" functions: each has a corresponding public function.
   int SizeHelper(BinNodeT<T>* sRoot) const;
   int HeightHelper(BinNodeT<T>* sRoot) const;
   virtual bool InsertHelper(const T& D, BinNodeT<T>* sRoot);
   virtual bool DeleteHelper(const T& D, BinNodeT<T>* sRoot);
   virtual void TreeCopyHelper(BinNodeT<T>* TargetRoot, BinNodeT<T>* SourceRoot);
   virtual T* const FindHelper(const T& toFind, BinNodeT<T>* sRoot);
   virtual void InOrderPrintHelper(BinNodeT<T>* sRoot, ostream& Out, int Level);
   void ClearHelper(BinNodeT<T>* sRoot);

public:
   BinaryTreeT();
   BinaryTreeT(const T& D);
   BinaryTreeT(const BinaryTreeT<T>& Source);
   BinaryTreeT<T>& operator=(const BinaryTreeT<T>& Source);
   // . . . continued . . .
```

Virtual functions are used so that derived classes (such as the BST) can override these operations.
A Binary Tree Class Interface

Another common question is "why is the destructor virtual?" The answer is that it is necessary to guarantee that the correct derived-type destructor is invoked when an object of a derived type is deleted through a pointer to the base type (polymorphic access).

```cpp
   // . . . continued . . .

   // Data insertion/search functions
   virtual bool Insert(const T& D);
   virtual bool Delete(const T& D);
   virtual T* const Find(const T& D);

   // Reporters, a display function, and a clear function
   int Size() const;
   int Height() const;
   virtual void InOrderPrint(ostream& Out);
   void Clear();

   virtual ~BinaryTreeT();
};
```

The interface is somewhat incomplete since it's not really a serious class… as we will see, specialized binary trees are what we really want. Still, there are some useful things we can learn from even an incomplete version…
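Size() and Height() show the public-function/recursive-helper pairing in its simplest form. Below is a minimal sketch, not the notes' implementation, with these assumptions:

- `SizedTree` is a hypothetical cut-down stand-in for `BinaryTreeT`.
- The node type is a trimmed `BinNodeT` with an inline constructor.
- Height is counted in levels, so an empty tree has height 0 and a single node has height 1. The notes do not state their convention.
- The sketch never frees its nodes.

```cpp
#include <cassert>
#include <cstddef>

// Trimmed-down stand-in for the slides' node class.
template <typename T> struct BinNodeT {
   T Element;
   BinNodeT<T>* Left;
   BinNodeT<T>* Right;
   BinNodeT(const T& D, BinNodeT<T>* L = NULL, BinNodeT<T>* R = NULL)
      : Element(D), Left(L), Right(R) {}
};

// Hypothetical class: only the members needed for Size() and Height().
template <typename T> class SizedTree {
protected:
   BinNodeT<T>* Root;

   // Number of nodes in the subtree rooted at sRoot.
   int SizeHelper(BinNodeT<T>* sRoot) const {
      if (sRoot == NULL) return 0;
      return 1 + SizeHelper(sRoot->Left) + SizeHelper(sRoot->Right);
   }

   // Number of levels in the subtree rooted at sRoot (assumed convention).
   int HeightHelper(BinNodeT<T>* sRoot) const {
      if (sRoot == NULL) return 0;
      int hL = HeightHelper(sRoot->Left);
      int hR = HeightHelper(sRoot->Right);
      return 1 + (hL > hR ? hL : hR);
   }

public:
   explicit SizedTree(BinNodeT<T>* R = NULL) : Root(R) {}

   // The public functions start the recursion at Root, which the client
   // cannot name directly.
   int Size() const   { return SizeHelper(Root); }
   int Height() const { return HeightHelper(Root); }
};
```

For the seven-node example tree (a, b, g, d, e, f, h), `Size()` returns 7 and `Height()` returns 4.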
Finding a Data Element

By design, FindHelper() is always called with sRoot pointing to a node that MAY contain the target value. The root could be checked by the public function, but I find that leads to a less clear helper implementation.

The recursive helper function is needed because the recursion requires the root pointer to a subtree… the client can't start the ball rolling since the tree root pointer is protected. An alternative could take a special flag to mean "start at the root" but again that would be less clear.

Obviously, a preorder traversal is used so that we don't descend past the target value and waste steps looking at its subtrees.

The function returns a pointer rather than a reference so that failure can be signaled to the client by returning NULL. The const on the return type is intended to discourage the client from calling delete on the returned pointer. That would seriously compromise the integrity of the container, if it did not simply lead to a runtime error. Strictly speaking, though, C++ ignores a top-level const on a returned pointer (compilers such as GCC can warn that the qualifier is ignored), and delete may still be applied to the pointer. The const is therefore documentation of intent rather than an enforced guarantee.

The long-term idea is that we could override these functions in specialized derived types, such as the BST.

```cpp
// Nonrecursive interface function for the client…
template <typename T>
T* const BinaryTreeT<T>::Find(const T& toFind) {
   if (Root == NULL) return NULL;
   return FindHelper(toFind, Root);
}

// … uses a recursive protected function to do almost all the work.
template <typename T>
T* const BinaryTreeT<T>::FindHelper(const T& toFind, BinNodeT<T>* sRoot) {
   T* Result;
   if (sRoot == NULL) return NULL;

   if (sRoot->Element == toFind) {            // check this node first
      Current = sRoot;
      Result = &(Current->Element);
   }
   else {
      Result = FindHelper(toFind, sRoot->Left);        // then the left subtree
      if (Result == NULL)
         Result = FindHelper(toFind, sRoot->Right);    // then the right subtree
   }
   return Result;
}
```

Why is the work done by a recursive protected function? Which traversal is used here? Why not use a different traversal instead? Why is const used on the return value?
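The claim that the preorder search does not descend past the target can be checked by counting node visits. The standalone sketch below is hypothetical: `FindCounting` and its `visited` counter are not part of the notes' class, and the node type is a trimmed `BinNodeT`. It follows the same order as FindHelper: the node itself, then the left subtree, then the right subtree.

```cpp
#include <cassert>
#include <cstddef>

// Trimmed-down stand-in for the slides' node class.
template <typename T> struct BinNodeT {
   T Element;
   BinNodeT<T>* Left;
   BinNodeT<T>* Right;
   BinNodeT(const T& D, BinNodeT<T>* L = NULL, BinNodeT<T>* R = NULL)
      : Element(D), Left(L), Right(R) {}
};

// Preorder search that also counts how many nodes it examines.
// Returns a pointer to the matching element, or NULL if absent.
template <typename T>
T* FindCounting(const T& toFind, BinNodeT<T>* sRoot, int& visited) {
   if (sRoot == NULL) return NULL;
   ++visited;                                   // examine this node
   if (sRoot->Element == toFind)
      return &sRoot->Element;                   // stop: subtrees not searched
   T* Result = FindCounting(toFind, sRoot->Left, visited);
   if (Result == NULL)
      Result = FindCounting(toFind, sRoot->Right, visited);
   return Result;
}
```

On the example tree, searching for g examines only a, b and g. The nodes below g (d, e, f, h) are never touched. A search for an absent value examines all 7 nodes.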
Clearing the Tree

The logic of destroying the tree nodes should be covered carefully… this is almost always confusing to weaker (and perhaps not so weak) students. The common first reaction is that the helper function isn't doing anything… it's too simple to be doing anything… a quick trace on a tree holding three nodes is probably sufficient to clear that up.

Similar to the class destructor, Clear() causes the deallocation of all the tree nodes and the resetting of Root and Current to indicate an empty tree.

```cpp
template <typename T>
void BinaryTreeT<T>::Clear() {
   ClearHelper(Root);
   Root = Current = NULL;
}

template <typename T>
void BinaryTreeT<T>::ClearHelper(BinNodeT<T>* sRoot) {
   if (sRoot == NULL) return;

   ClearHelper(sRoot->Left);
   ClearHelper(sRoot->Right);
   delete sRoot;
}
```

Which traversal is used here? Why not use a different traversal instead?
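For the three-node trace suggested above, it helps to make the deletions observable. In this sketch, `CountedNode` is a hypothetical node type whose destructor increments a counter, and `ClearHelper` is written as a free function over it. The comments explain why the traversal must be postorder.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical node type that counts its own destructions.
struct CountedNode {
   static int destroyed;
   int Element;
   CountedNode* Left;
   CountedNode* Right;
   CountedNode(int D, CountedNode* L = NULL, CountedNode* R = NULL)
      : Element(D), Left(L), Right(R) {}
   ~CountedNode() { ++destroyed; }
};
int CountedNode::destroyed = 0;

// Postorder deletion: both subtrees are freed before the node whose Left and
// Right pointers are needed to reach them. Deleting sRoot first (preorder)
// would then read sRoot->Left and sRoot->Right from freed memory.
void ClearHelper(CountedNode* sRoot) {
   if (sRoot == NULL) return;   // empty subtree: nothing to free
   ClearHelper(sRoot->Left);
   ClearHelper(sRoot->Right);
   delete sRoot;
}
```

Clearing a four-node tree leaves `CountedNode::destroyed` equal to 4. The caller must still reset its own root pointer to NULL, which is exactly what Clear() does with Root and Current.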
Inorder Printing

A good display function is invaluable in debugging container implementations. This is also another good opportunity to illustrate an application of recursion, and the virtue of an inorder traversal. (Answer to the QTP at the end of this slide: obviously, swap the two recursive calls in the helper function.)

```cpp
template <typename T>
void BinaryTreeT<T>::InOrderPrint(ostream& Out) {
   if (Root == NULL) {
      Out << "tree is empty" << endl;
      return;
   }
   InOrderPrintHelper(Root, Out, 0);
}

template <typename T>
void BinaryTreeT<T>::InOrderPrintHelper(BinNodeT<T>* sRoot, ostream& Out, int Level) {
   if (sRoot == NULL) return;

   InOrderPrintHelper(sRoot->Left, Out, Level + 1);   // left subtree first
   for (int L = 0; L < Level; L++)                   // indent by level
      Out << " ";
   Out << sRoot->Element << endl;                    // then the node
   InOrderPrintHelper(sRoot->Right, Out, Level + 1); // then the right subtree
}
```

[Figure: sample output for a tree whose inorder traversal is 3 1 4 5 2 7 6 8, with each element indented according to its level; the left and right sides of the tree are labeled.]

QTP: Could we reverse the sides of the printed tree?
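Both the printer and the QTP answer can be tried directly on a small tree. This standalone sketch has a few assumptions:

- The node type is a trimmed `BinNodeT`.
- The helpers are free functions rather than members.
- `ReverseInOrderPrintHelper` is a hypothetical name.
- The three-space indent per level is an arbitrary choice.

Output goes to any `std::ostream`, so a `std::ostringstream` can capture it.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Trimmed-down stand-in for the slides' node class.
template <typename T> struct BinNodeT {
   T Element;
   BinNodeT<T>* Left;
   BinNodeT<T>* Right;
   BinNodeT(const T& D, BinNodeT<T>* L = NULL, BinNodeT<T>* R = NULL)
      : Element(D), Left(L), Right(R) {}
};

// Inorder: left subtree, node (indented by level), right subtree.
template <typename T>
void InOrderPrintHelper(BinNodeT<T>* sRoot, std::ostream& Out, int Level) {
   if (sRoot == NULL) return;
   InOrderPrintHelper(sRoot->Left, Out, Level + 1);
   for (int L = 0; L < Level; L++) Out << "   ";
   Out << sRoot->Element << '\n';
   InOrderPrintHelper(sRoot->Right, Out, Level + 1);
}

// QTP answer: the recursive calls swapped, so the right subtree prints first.
template <typename T>
void ReverseInOrderPrintHelper(BinNodeT<T>* sRoot, std::ostream& Out, int Level) {
   if (sRoot == NULL) return;
   ReverseInOrderPrintHelper(sRoot->Right, Out, Level + 1);
   for (int L = 0; L < Level; L++) Out << "   ";
   Out << sRoot->Element << '\n';
   ReverseInOrderPrintHelper(sRoot->Left, Out, Level + 1);
}
```

Take a root 2 with left child 1 and right child 3. The first function prints 1, 2, 3 on successive lines, with 1 and 3 indented one level. The second prints 3, 2, 1.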
Summary of Implementation

The implementation described here is primarily for illustration. The full implementation has been tested, but not thoroughly.

As we will see in the next chapter, general binary trees are not often used in applications. Rather, specialized variants are derived from the notion of a general binary tree, and THOSE are used.

Before proceeding with that idea, we need to establish a few facts regarding binary trees.

Warning: the binary tree classes given in this chapter are intended for instructional purposes. The given implementation contains a number of known flaws, and perhaps some unknown flaws as well. Caveat emptor.
Full and Complete Binary Trees

The two terms defined here are common in graph theory, but unfortunately authors differ in their meaning. It's common to find the definitions reversed in other books.

Here are two important types of binary trees. Note that the definitions, while similar, are logically independent.

Definition: a binary tree T is full if each node is either a leaf or possesses exactly two child nodes.

Definition: a binary tree T with n levels is complete if all levels except possibly the last are completely full, and the nodes on the last level are as far to the left as possible.

[Figure: four example trees, labeled "Full but not complete", "Neither complete nor full", "Complete but not full", and "Full and complete".]
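Each definition translates into a short check.

- Fullness is a local, recursive property: no node may have exactly one child.
- Completeness is easiest to test level by level. In a breadth-first scan, once an empty child position has been seen, no further node may appear. That is another way of saying every level but the last is filled and the last level is packed to the left.

The sketch uses hypothetical names (`Node`, `isFull`, `isComplete`). It also treats the empty tree as both full and complete, which is a convention choice.

```cpp
#include <cassert>
#include <queue>

// Hypothetical node type; an empty tree is a null pointer.
struct Node {
    int   data;
    Node* left;
    Node* right;
    Node(int d, Node* l = nullptr, Node* r = nullptr)
        : data(d), left(l), right(r) {}
};

// Full: every node is a leaf or has exactly two children.
bool isFull(const Node* t) {
    if (t == nullptr) return true;                                  // convention
    if ((t->left == nullptr) != (t->right == nullptr)) return false; // one child
    return isFull(t->left) && isFull(t->right);
}

// Complete: in level order, no node may follow an empty position.
bool isComplete(const Node* t) {
    std::queue<const Node*> q;
    q.push(t);
    bool seenGap = false;
    while (!q.empty()) {
        const Node* n = q.front();
        q.pop();
        if (n == nullptr) { seenGap = true; continue; }
        if (seenGap) return false;    // a node after a hole in the level order
        q.push(n->left);
        q.push(n->right);
    }
    return true;
}
```

The four cases in the figure can be reproduced with trees of two to five nodes, as in the accompanying checks.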
Full Binary Tree Theorem

The results here are used to derive some bounds on balanced BST search.

Theorem: Let $T$ be a nonempty, full binary tree. Then:
(a) If $T$ has $I$ internal nodes, the number of leaves is $L = I + 1$.
(b) If $T$ has $I$ internal nodes, the total number of nodes is $N = 2I + 1$.
(c) If $T$ has a total of $N$ nodes, the number of internal nodes is $I = (N - 1)/2$.
(d) If $T$ has a total of $N$ nodes, the number of leaves is $L = (N + 1)/2$.
(e) If $T$ has $L$ leaves, the total number of nodes is $N = 2L - 1$.
(f) If $T$ has $L$ leaves, the number of internal nodes is $I = L - 1$.

Basically, this theorem says that the number of nodes $N$, the number of leaves $L$, and the number of internal nodes $I$ are related in such a way that if you know any one of them, you can determine the other two.
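A quick numerical check on a concrete full tree can make the theorem feel less abstract before the proof. The sketch tallies internal nodes and leaves in one traversal. `Node`, `Counts` and `tally` are hypothetical names. The test tree is the example tree from the traversal slides, which happens to be full.

```cpp
#include <cassert>

// Hypothetical node type; an empty tree is a null pointer.
struct Node {
    char  data;
    Node* left;
    Node* right;
    Node(char d, Node* l = nullptr, Node* r = nullptr)
        : data(d), left(l), right(r) {}
};

struct Counts { int internal; int leaves; };

// Every node is either a leaf (no children) or an internal node.
Counts tally(const Node* t) {
    if (t == nullptr) return Counts{0, 0};
    if (t->left == nullptr && t->right == nullptr) return Counts{0, 1};
    Counts a = tally(t->left);
    Counts b = tally(t->right);
    return Counts{1 + a.internal + b.internal, a.leaves + b.leaves};
}
```

For the tree a(b, g), g(d, e), d(f, h) there are $I = 3$ internal nodes (a, g, d) and $L = 4$ leaves, so $N = 7$. Every identity (a) through (f) holds with these values.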
Full Binary Tree Theorem Proof

The proof is essentially a straightforward induction, but there is a standard trick that students may not have seen in their discrete math course. It's common in graph theoretic proofs to induct on some measure of the size of the graph, such as the number of internal nodes in a tree. The induction step then performs surgery to reduce a graph of "size" N + 1 to one of "size" N in order to use the inductive assumption. The proofs here should be given some serious attention.

proof of (a): We will use induction on the number of internal nodes, $I$. Let $S$ be the set of all integers $I \ge 0$ such that if $T$ is a nonempty full binary tree with $I$ internal nodes then $T$ has $I + 1$ leaf nodes.

For the base case, if $I = 0$ then the tree must consist only of a root node, having no children because the root is not an internal node. Hence there is 1 leaf node, and so $0 \in S$.

Now suppose that some integer $K \ge 0$ is in $S$. That is, whenever a nonempty full binary tree has $K$ internal nodes it has $K + 1$ leaf nodes. Let $T$ be a full binary tree with $K + 1$ internal nodes. Pick an internal node of $T$ whose child nodes are both leaves (how do we know this is possible?), and delete both its children; call the resulting tree $T'$. Then $T'$ is a nonempty full binary tree, and $T'$ has $K$ internal nodes; so by the inductive assumption, $T'$ must have $K + 1$ leaf nodes. But the number of leaf nodes in $T'$ is one less than the number of leaf nodes in $T$ (deleting the two child nodes turned the former internal node into a leaf). Therefore, $T$ must have $K + 2$ leaf nodes and so $K + 1 \in S$. Hence by Mathematical Induction, $S = [0, \infty)$. QED

The remaining parts are easily derived algebraically from (a).
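The algebra for the remaining parts is short. It uses (a) together with one further fact: every node is either internal or a leaf, so $N = I + L$.

```latex
\begin{aligned}
\text{(b)}\quad & N = I + L = I + (I + 1) = 2I + 1 \\
\text{(c)}\quad & N = 2I + 1 \;\Longrightarrow\; I = \tfrac{N - 1}{2} \\
\text{(d)}\quad & L = I + 1 = \tfrac{N - 1}{2} + 1 = \tfrac{N + 1}{2} \\
\text{(f)}\quad & L = I + 1 \;\Longrightarrow\; I = L - 1 \\
\text{(e)}\quad & N = 2I + 1 = 2(L - 1) + 1 = 2L - 1
\end{aligned}
```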
Limit on the Number of Leaves

This one's important primarily because it provides the basis for the next theorem.

Theorem: Let $T$ be a binary tree with $l$ levels. Then the number of leaves is at most $2^{l-1}$.

proof: We will use strong induction on the number of levels, $l$. Let $S$ be the set of all integers $l \ge 1$ such that if $T$ is a binary tree with $l$ levels then $T$ has at most $2^{l-1}$ leaf nodes.

For the base case, if $l = 1$ then the tree must have one node (the root) and it must have no child nodes. Hence there is 1 leaf node (which is $2^{l-1}$ if $l = 1$), and so $1 \in S$.

Now suppose that for some integer $K \ge 1$, all the integers 1 through $K$ are in $S$. That is, whenever a binary tree has $M$ levels with $M \le K$, it has at most $2^{M-1}$ leaf nodes.

Let $T$ be a binary tree with $K + 1$ levels. Since $K + 1 \ge 2$, the root is not a leaf.

If only one subtree of the root is nonempty, the leaves of $T$ are exactly the leaves of that subtree. That subtree has $K$ levels, so $T$ has at most $2^{K-1} \le 2^K$ leaves.

Otherwise, $T$ consists of a root node and two nonempty subtrees, say $S_1$ and $S_2$. Let $S_1$ and $S_2$ have $M_1$ and $M_2$ levels, respectively. Since $M_1$ and $M_2$ are between 1 and $K$, each is in $S$ by the inductive assumption. Hence, the numbers of leaf nodes in $S_1$ and $S_2$ are at most $2^{M_1-1} \le 2^{K-1}$ and $2^{M_2-1} \le 2^{K-1}$, respectively. Since all the leaves of $T$ must be leaves of $S_1$ or of $S_2$, the number of leaves in $T$ is at most $2^{K-1} + 2^{K-1}$, which is $2^K$.

Therefore, $K + 1$ is in $S$. Hence by Mathematical Induction, $S = [1, \infty)$. QED
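The bound is tight. A tree in which every level is completely filled (often called a perfect binary tree) has exactly $2^{l-1}$ leaves. The sketch below builds such trees and counts their leaves. `Node`, `perfectTree` and `countLeaves` are hypothetical names. The nodes carry no data, and the trees are never freed.

```cpp
#include <cassert>

// Hypothetical data-free node type; an empty tree is a null pointer.
struct Node {
    Node* left;
    Node* right;
    Node(Node* l = nullptr, Node* r = nullptr) : left(l), right(r) {}
};

// Tree with the given number of levels, every level completely filled.
Node* perfectTree(int levels) {
    if (levels <= 0) return nullptr;
    return new Node(perfectTree(levels - 1), perfectTree(levels - 1));
}

// Same case analysis as the proof: a leaf counts 1; otherwise the leaves
// of T are the leaves of its subtrees.
int countLeaves(const Node* t) {
    if (t == nullptr) return 0;
    if (t->left == nullptr && t->right == nullptr) return 1;
    return countLeaves(t->left) + countLeaves(t->right);
}
```

`countLeaves(perfectTree(l))` equals `1 << (l - 1)` for each `l` from 1 to 10. A three-level chain, by contrast, has only 1 leaf.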
Limit on the Number of Levels

This one's important because it provides the theoretical foundation for complexity results relating to search in a binary search tree, and later to the lower bound on the cost of comparison-based sorting.

Theorem: Let $T$ be a binary tree with $l$ levels and $L$ leaves. Then the number of levels is at least $\lceil \log L \rceil + 1$. (Here $\log$ denotes the base-2 logarithm.)

proof: From the previous theorem, if $T$ has $l$ levels then the number of leaves is at most $2^{l-1}$. That is,
$L \le 2^{l-1}$

Taking logarithms of both sides yields:
$\log L \le l - 1$

Since $l - 1$ is an integer that is at least $\log L$, it is also at least the smallest such integer. We may therefore apply the ceiling function to the left side, to obtain:
$\lceil \log L \rceil \le l - 1$
and the final result follows immediately. QED
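The bound $\lceil \log L \rceil + 1$ is easy to compute without floating-point logarithms. The smallest admissible $l$ is the first one with $2^{l-1} \ge L$. This is a minimal sketch; `minLevels` is a hypothetical name, and it requires $L \ge 1$.

```cpp
#include <cassert>

// Smallest number of levels l with 2^(l-1) >= L, i.e. ceil(log2 L) + 1.
// Integer doubling avoids rounding problems with floating-point log2.
// Requires L >= 1.
int minLevels(long long L) {
    int levels = 1;
    long long capacity = 1;       // max leaves with `levels` levels: 2^(levels-1)
    while (capacity < L) {
        capacity *= 2;
        ++levels;
    }
    return levels;
}
```

For example, 5 leaves need at least 4 levels, since 3 levels allow at most 4 leaves. A tree with 1000 leaves needs at least 11 levels.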