CS 2604 Intro Data Structures and File Management


Binary Trees

Instructor notes: The jargon defined here is used throughout later notes, so students should become familiar with it. The fact that the definition of a binary tree is recursive is worth noting.

A binary tree is either empty, or it consists of a node called the root together with two binary trees called the left subtree and the right subtree of the root, which are disjoint from each other and from the root. For example:

[Figure: an example binary tree labeled with the jargon: root node, internal node, edge, leaf node, and levels 0, 1 and 2.]

Computer Science Dept Va Tech January 2004 ©2000-2004 McQuain WD General Binary Trees
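
The recursive definition maps directly onto code: an empty tree can be modelled as a null pointer, and a nonempty tree as a root node holding two (possibly empty) subtrees. The sketch below is illustrative only; the Node struct and the functions countNodes and countLevels are hypothetical names, not part of the classes developed later in these notes. Each function has one case for each branch of the definition.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical node type: an empty tree is represented by nullptr.
struct Node {
    char value;
    Node* left;   // left subtree (possibly empty)
    Node* right;  // right subtree (possibly empty)
};

// Number of nodes: 0 for the empty tree, otherwise the root
// plus the nodes in its two disjoint subtrees.
int countNodes(const Node* t) {
    if (t == nullptr) return 0;
    return 1 + countNodes(t->left) + countNodes(t->right);
}

// Number of levels: 0 for the empty tree; a lone root is one level
// (level 0), and the deeper subtree contributes the levels below it.
int countLevels(const Node* t) {
    if (t == nullptr) return 0;
    return 1 + std::max(countLevels(t->left), countLevels(t->right));
}
```

For the seven-node tree used in the traversal examples later in these notes (root a with children b and g, and so on), countNodes returns 7 and countLevels returns 4.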

Binary Tree Node Relationships

Instructor notes: More jargon… not a big deal.

A binary tree node may have 0, 1 or 2 child nodes. A path is a sequence of adjacent (via the edges) nodes in the tree. A subtree of a binary tree is either empty, or consists of a node in that tree and all of its descendant nodes.

[Figure: a binary tree with nodes a, b, g, d, e, f, h, annotated to show a as the parent node of b and g, b and g as the child nodes of a, a node that is a descendant of a and g, and the subtree rooted at g.]
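
Paths and descendants can be computed recursively. The hypothetical functions pathTo and isDescendant below work on a plain Node struct with char labels; pathTo collects the root-to-target path and backtracks whenever the target is not in the subtree it just explored. This is a sketch for illustration, not part of the notes' tree classes.

```cpp
#include <cassert>
#include <string>

struct Node {
    char value;
    Node* left;
    Node* right;
};

// Appends to 'path' the nodes on the path from t down to the node holding
// 'target'. Returns false (leaving 'path' unchanged) if target is not in t.
bool pathTo(const Node* t, char target, std::string& path) {
    if (t == nullptr) return false;
    path.push_back(t->value);
    if (t->value == target) return true;
    if (pathTo(t->left, target, path) || pathTo(t->right, target, path))
        return true;
    path.pop_back();  // target is not in this subtree: undo
    return false;
}

// x is a descendant of 'ancestor' if x lies in one of ancestor's subtrees.
bool isDescendant(const Node* ancestor, char x) {
    if (ancestor == nullptr) return false;
    std::string unused;
    return pathTo(ancestor->left, x, unused) || pathTo(ancestor->right, x, unused);
}
```

Assuming the tree shape implied by the traversal orders on the Traversals slide, the path to f is a g d f, so f is a descendant of a, g and d.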

Quick Application: Expression Trees

Instructor notes: Don't overdo this… the point is to provide some motivation for wanting the tree structure.

A binary tree may be used to represent an algebraic expression:

[Figure: an expression tree whose nodes include the operators *, – and + and the operands x, 5 and y.]

Each subtree represents a part of the entire expression… If we visit the nodes of the binary tree in the correct order, we will construct the algebraic expression:
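
A sketch of the idea, assuming a hypothetical ExprNode type whose leaves hold numbers or single-letter variables and whose interior nodes hold one of + - * /. The expression (x + 5) * (y - 2) is a made-up example, not the one in the slide's figure. An inorder traversal, with parentheses added around each operator subtree, reconstructs the infix expression; a postorder traversal evaluates it.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression-tree node: interior nodes hold an operator,
// leaves hold a number or a variable name. Children are owned by the node.
struct ExprNode {
    std::string token;
    std::unique_ptr<ExprNode> left, right;
    ExprNode(std::string t, ExprNode* l = nullptr, ExprNode* r = nullptr)
        : token(std::move(t)), left(l), right(r) {}
    bool isLeaf() const { return !left && !right; }
};

// Inorder traversal, parenthesizing every operator subtree.
std::string toInfix(const ExprNode* n) {
    if (n->isLeaf()) return n->token;
    return "(" + toInfix(n->left.get()) + " " + n->token + " " +
           toInfix(n->right.get()) + ")";
}

// Postorder evaluation: evaluate both operands, then apply the operator.
// Leaves are looked up in 'vars' first, otherwise parsed as numbers.
double evaluate(const ExprNode* n, const std::map<std::string, double>& vars) {
    if (n->isLeaf()) {
        auto it = vars.find(n->token);
        return it != vars.end() ? it->second : std::stod(n->token);
    }
    double a = evaluate(n->left.get(), vars);
    double b = evaluate(n->right.get(), vars);
    switch (n->token[0]) {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        default:  return a / b;  // assumes only + - * / appear
    }
}
```

For the example tree, toInfix yields ((x + 5) * (y - 2)), and with x = 3 and y = 4 evaluate returns 16.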

Traversals

Instructor notes: The traversal schemes are obviously fundamental, but it shouldn't take much time to illustrate them. At least one should be discussed carefully in order to emphasize the recursive nature and reinforce the students' exposure to recursive algorithms.

A traversal is an algorithm for visiting some or all of the nodes of a binary tree in some defined order. A traversal that visits every node in a binary tree is called an enumeration.

For the example tree (root a, whose children are b and g; g has children d and e; d has children f and h):

preorder: visit the node, then the left subtree, then the right subtree
   a b g d f h e
postorder: visit the left subtree, then the right subtree, and then the node
   b f h d e g a
inorder: visit the left subtree, then the node, then the right subtree
   b a f d h g e
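
The three traversals differ only in where the "visit" step sits relative to the two recursive calls. The sketch below uses a hypothetical plain Node struct with char elements and appends each visited value to a string; it is an illustration, not code from the notes.

```cpp
#include <cassert>
#include <string>

struct Node {
    char value;
    Node* left;
    Node* right;
};

// Preorder: visit the node, then the left subtree, then the right subtree.
void preorder(const Node* t, std::string& out) {
    if (t == nullptr) return;
    out += t->value;
    preorder(t->left, out);
    preorder(t->right, out);
}

// Inorder: left subtree, then the node, then the right subtree.
void inorder(const Node* t, std::string& out) {
    if (t == nullptr) return;
    inorder(t->left, out);
    out += t->value;
    inorder(t->right, out);
}

// Postorder: left subtree, then the right subtree, then the node.
void postorder(const Node* t, std::string& out) {
    if (t == nullptr) return;
    postorder(t->left, out);
    postorder(t->right, out);
    out += t->value;
}
```

On the example tree, preorder, inorder and postorder produce abgdfhe, bafdhge and bfhdega, matching the sequences listed above.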

Postorder Traversal Details

Instructor notes: The representation of the recursive descent here is mine… no claim that it's the best way to present it.

Consider the postorder traversal from a recursive perspective:

postorder: postorder visit (POV) the left subtree, postorder visit the right subtree, then visit the node (no recursion)

If we start at the root a of the example tree:

POV sub(b) | POV sub(g) | visit a
   POV sub(b): visit b
   POV sub(g): POV sub(d) | POV sub(e) | visit g
      POV sub(d): visit f | visit h | visit d
      POV sub(e): visit e
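
The trace can also be generated mechanically. The hypothetical function tracePostorder below records a "POV sub(x)" entry whenever it starts a postorder visit of the nonempty subtree rooted at x, and a "visit x" entry when it finally visits x, indenting each entry by recursion depth. Unlike the slide's diagram, it also records the call on the root itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct Node {
    char value;
    Node* left;
    Node* right;
};

// Postorder traversal that logs each step of the recursion.
// Indentation (two spaces per level) shows the call depth.
void tracePostorder(const Node* t, int depth, std::vector<std::string>& steps) {
    if (t == nullptr) return;                     // empty subtree: nothing to do
    std::string indent(2 * depth, ' ');
    steps.push_back(indent + "POV sub(" + t->value + ")");
    tracePostorder(t->left, depth + 1, steps);    // left subtree first
    tracePostorder(t->right, depth + 1, steps);   // then right subtree
    steps.push_back(indent + "visit " + t->value); // node visited last
}
```

Reading only the "visit" entries gives b f h d e g a, the postorder listed on the Traversals slide.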

Binary Tree Representation

Instructor notes: I chose to present a general binary tree before delving into BST and other specializations, primarily so I could address some fundamental design issues and add some more emphasis on recursion before getting down to the useful variants. I also use this as the base from which the other variants are derived, although I do not use inheritance to do so in this version of my notes. WARNING: most of the students entering 2604 don't have any sort of firm grip on recursion. Trace as many recursions as you have time for.

The natural way to think of a binary tree is that it consists of nodes (objects) connected by edges (pointers). This leads to a design employing two classes:
- a binary tree class to encapsulate the tree and its operations
- a binary node class to encapsulate the data elements, pointers and associated operations.

Each should be a template, for generality.

The node class may handle all direct accesses of the pointers and data element, or allow its client (the tree) free access.

The tree class may maintain a sense of a current location (node) and must provide all the high-level functions, such as searching, insertion and deletion.

Many implementations use a struct type for the nodes. The motivation is generally to make the data elements and pointers public and hence to simplify the code, at the expense of automatic initialization via a constructor. (In C++ a struct may also declare constructors; the only difference from a class is that its members are public by default.)

A Binary Node Class Interface

Instructor notes: As usual, I choose to make the data and pointers public in the node type. That should be safe since nodes are only used by an encapsulating container. It also simplifies some of the recursive implementations. I also wrote this to store data by pointers, rather than directly, for the reasons given on the slide. (Pointers are necessary if I need to store members of a polymorphic inheritance hierarchy.)

Here's a possible interface for a binary tree node:

   template <typename T> class BinNodeT {
   public:
      T            Element;
      BinNodeT<T>* Left;
      BinNodeT<T>* Right;

      BinNodeT();
      BinNodeT(const T& D, BinNodeT<T>* L = NULL, BinNodeT<T>* R = NULL);

      bool isLeaf() const;

      ~BinNodeT();
   };

The binary tree object can access the node data members directly, which is useful for tree navigation.

The design here leaves the data members public to simplify the implementation of the encapsulating binary tree class; due to that encapsulation there is no concern that client code will be able to take advantage of this decision.

Storing the data element by pointer provides for storing dynamically allocated elements, and elements from an inheritance hierarchy. The interface shown above declares Element as a T, that is, it stores the element directly; converting between the two forms is relatively trivial.
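
The slide shows only declarations. One plausible set of member definitions for the interface as declared (Element stored directly) is sketched below; it is not the author's implementation. In particular, the destructor is assumed not to delete the child nodes, leaving deallocation to the tree class, and the default constructor assumes T is default-constructible.

```cpp
#include <cassert>
#include <cstddef>  // NULL

template <typename T> class BinNodeT {
public:
    T            Element;
    BinNodeT<T>* Left;
    BinNodeT<T>* Right;

    BinNodeT();
    BinNodeT(const T& D, BinNodeT<T>* L = NULL, BinNodeT<T>* R = NULL);
    bool isLeaf() const;
    ~BinNodeT();
};

// Default node: value-initialized element, no children.
template <typename T>
BinNodeT<T>::BinNodeT() : Element(), Left(NULL), Right(NULL) {}

// Node holding a copy of D, with optional child pointers.
template <typename T>
BinNodeT<T>::BinNodeT(const T& D, BinNodeT<T>* L, BinNodeT<T>* R)
    : Element(D), Left(L), Right(R) {}

// A leaf has no child nodes.
template <typename T>
bool BinNodeT<T>::isLeaf() const {
    return Left == NULL && Right == NULL;
}

// Assumption: a node does not own its children; the tree class frees
// them (for example in a postorder clear).
template <typename T>
BinNodeT<T>::~BinNodeT() {}
```

Keeping the destructor trivial avoids double deletion if the tree's clear function already walks the nodes; a design in which each node deletes its own children would be an equally valid alternative, but then the tree must not delete them again.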

A Binary Tree Class Interface

Instructor notes: I tried to keep this minimal and still illustrate the basic design issues. Shaffer's approach to dealing with data comparisons could certainly be adopted here, but I chose to ignore that issue. Here it's irrelevant aside from an equality comparison. When we get to the BST I'll explicitly require the data type to provide the usual relational operators.

One common question is "why do we have those recursive helper functions?" The answer is that the public functions can't do the job because they can't take the node pointers the recursive helpers need as parameters.

I included a Current pointer to support some internal bookkeeping in my first version of the template… at this point it's probably not used for anything but to remember the node located by the Find() function.

Here's a possible interface for a binary tree class. It's not likely to be put to any practical use, just a proof of concept.

   template <typename T> class BinaryTreeT {
   protected:
      BinNodeT<T>* Root;
      BinNodeT<T>* Current;

      int  SizeHelper(BinNodeT<T>* sRoot) const;
      int  HeightHelper(BinNodeT<T>* sRoot) const;
      virtual bool InsertHelper(const T& D, BinNodeT<T>* sRoot);
      virtual bool DeleteHelper(const T& D, BinNodeT<T>* sRoot);
      virtual void TreeCopyHelper(BinNodeT<T>* TargetRoot, BinNodeT<T>* SourceRoot);
      virtual T* const FindHelper(const T& toFind, BinNodeT<T>* sRoot);
      virtual void InOrderPrintHelper(BinNodeT<T>* sRoot, ostream& Out, int Level);
      void ClearHelper(BinNodeT<T>* sRoot);

   public:
      BinaryTreeT();
      BinaryTreeT(const T& D);
      BinaryTreeT(const BinaryTreeT<T>& Source);
      BinaryTreeT<T>& operator=(const BinaryTreeT<T>& Source);
      // . . . continued . . .

Recursive "helper" functions: each has a corresponding public function. Virtual functions are used to encourage subclasses.

A Binary Tree Class Interface

Instructor notes: Another common question is "why is the destructor virtual?"… the answer is that it is necessary in order to guarantee the correct derived-type destructor is invoked in cases where polymorphic access is used.

      // . . . continued . . .
      virtual bool Insert(const T& D);
      virtual bool Delete(const T& D);
      virtual T* const Find(const T& D);

      int  Size() const;
      int  Height() const;
      virtual void InOrderPrint(ostream& Out);

      void Clear();
      virtual ~BinaryTreeT();
   };

Data insertion/search functions. Reporters, a display function, and a clear function.

The interface is somewhat incomplete since it's not really a serious class… as we will see, specialized binary trees are what we really want. Still, there are some useful things we can learn from even an incomplete version…
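
The interface declares SizeHelper() and HeightHelper(), but the slides do not show their bodies. A minimal sketch follows, using a hypothetical trimmed class TreeSketchT that keeps only the Root pointer and these two public/helper pairs, over a stripped-down aggregate version of BinNodeT. It assumes that height means the number of levels (0 for an empty tree), matching the level terminology used later in these notes. The public functions simply start the recursion at Root, which is exactly why the helpers exist.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>  // NULL

// Stripped-down node for this sketch (public aggregate, no constructors).
template <typename T> struct BinNodeT {
    T            Element;
    BinNodeT<T>* Left;
    BinNodeT<T>* Right;
};

// Hypothetical trimmed stand-in for BinaryTreeT; it does not own its nodes.
template <typename T> class TreeSketchT {
protected:
    BinNodeT<T>* Root;

    // Number of nodes in the subtree rooted at sRoot.
    int SizeHelper(BinNodeT<T>* sRoot) const {
        if (sRoot == NULL) return 0;
        return 1 + SizeHelper(sRoot->Left) + SizeHelper(sRoot->Right);
    }

    // Number of levels in the subtree rooted at sRoot (empty tree: 0).
    int HeightHelper(BinNodeT<T>* sRoot) const {
        if (sRoot == NULL) return 0;
        return 1 + std::max(HeightHelper(sRoot->Left), HeightHelper(sRoot->Right));
    }

public:
    explicit TreeSketchT(BinNodeT<T>* r = NULL) : Root(r) {}

    // Nonrecursive public interface: clients cannot supply node pointers,
    // so each call starts the recursion at Root.
    int Size() const { return SizeHelper(Root); }
    int Height() const { return HeightHelper(Root); }
};
```

For a root holding 7 with a left child 5 (which has a left child 3) and a right child 8, Size() returns 4 and Height() returns 3.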

Finding a Data Element

Instructor notes: By design, FindHelper() is always called with sRoot pointing to a node that MAY contain the target value. The root could be checked by the public function, but I find that leads to a less clear helper implementation.

The recursive helper function is needed because the recursion requires the root pointer to a subtree… the client can't start the ball rolling since the tree root pointer is protected. An alternative could take a special flag to mean "start at the root" but again that would be less clear.

Obviously, a preorder traversal is used so that we don't descend past the target value and waste steps looking at its subtrees.

The function returns a pointer rather than a reference so that failure can be signaled to the client by returning NULL. The return is protected by const so that the client cannot call delete on it. That would seriously compromise the integrity of the container, if it did not simply lead to a runtime error. (In standard C++ this protection is weaker than it looks: the const in T* const applies only to the returned pointer value itself, and delete may legally be applied to a const pointer, so the const documents intent rather than enforcing it.)

The long-term idea is that we could override these functions in specialized derived types, such as the BST.

   template <typename T>
   T* const BinaryTreeT<T>::Find(const T& toFind) {
      if (Root == NULL) return NULL;
      return (FindHelper(toFind, Root));
   }

   template <typename T>
   T* const BinaryTreeT<T>::FindHelper(const T& toFind, BinNodeT<T>* sRoot) {
      T* Result;
      if (sRoot == NULL) return NULL;
      if (sRoot->Element == toFind) {
         Current = sRoot;
         Result = &(Current->Element);
      }
      else {
         Result = FindHelper(toFind, sRoot->Left);
         if (Result == NULL)
            Result = FindHelper(toFind, sRoot->Right);
      }
      return Result;
   }

Nonrecursive interface function for client… uses a recursive protected function to do almost all the work. Why?

Which traversal is used here? Why not use a different traversal instead?

Why is const used on the return value?
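
To make the point about traversal choice concrete, the hypothetical functions findPre and findPost below search the example tree from the Traversals slide and count how many nodes they examine. Both are simplified stand-ins for FindHelper (plain struct nodes, char elements, no Current bookkeeping); the postorder version exists only for comparison.

```cpp
#include <cassert>

struct Node {
    char value;
    Node* left;
    Node* right;
};

// Preorder search: check the node before descending into its subtrees,
// so the search stops as soon as the target is found.
const Node* findPre(const Node* n, char target, int& examined) {
    if (n == nullptr) return nullptr;
    ++examined;
    if (n->value == target) return n;
    if (const Node* hit = findPre(n->left, target, examined)) return hit;
    return findPre(n->right, target, examined);
}

// Postorder search: both subtrees are searched before the node is checked,
// so the search wastes steps below the target.
const Node* findPost(const Node* n, char target, int& examined) {
    if (n == nullptr) return nullptr;
    if (const Node* hit = findPost(n->left, target, examined)) return hit;
    if (const Node* hit = findPost(n->right, target, examined)) return hit;
    ++examined;
    return n->value == target ? n : nullptr;
}
```

Searching for g, findPre examines 3 nodes (a, b, g) while findPost examines 6 (b, f, h, d, e, g), because it works through all of g's descendants first.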

Clearing the Tree

Instructor notes: The logic of destroying the tree nodes should be covered carefully… this is almost always confusing to weaker (and perhaps not so weak) students. The common first reaction is that the helper function isn't doing anything… it's too simple to be doing anything… a quick trace on a tree holding three nodes is probably sufficient to clear that up.

Similar to the class destructor, Clear() causes the deallocation of all the tree nodes and the resetting of Root and Current to indicate an empty tree.

   template <typename T>
   void BinaryTreeT<T>::Clear() {
      ClearHelper(Root);
      Root = Current = NULL;
   }

   template <typename T>
   void BinaryTreeT<T>::ClearHelper(BinNodeT<T>* sRoot) {
      if (sRoot == NULL) return;
      ClearHelper(sRoot->Left);
      ClearHelper(sRoot->Right);
      delete sRoot;
   }

Which traversal is used here? Why not use a different traversal instead?
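
The traversal question can be checked mechanically. In the sketch below, CountedNode and clearTree are hypothetical stand-ins for BinNodeT and ClearHelper; the node counts its live instances so we can confirm that a postorder clear frees every node exactly once.

```cpp
#include <cassert>

// Hypothetical node that tracks how many instances are currently alive.
struct CountedNode {
    static int live;
    int value;
    CountedNode* left;
    CountedNode* right;
    CountedNode(int v, CountedNode* l = nullptr, CountedNode* r = nullptr)
        : value(v), left(l), right(r) { ++live; }
    ~CountedNode() { --live; }
};
int CountedNode::live = 0;

// Postorder: both subtrees are freed before the node itself, so the child
// pointers are never read after their node has been deleted. A preorder or
// inorder version would read sRoot->left or sRoot->right from freed memory
// (undefined behavior) unless it first copied those pointers.
void clearTree(CountedNode* sRoot) {
    if (sRoot == nullptr) return;
    clearTree(sRoot->left);
    clearTree(sRoot->right);
    delete sRoot;
}
```

Tracing a three-node tree, as the notes suggest: the live count is 3 after construction and returns to 0 after clearTree(root), after which the caller should reset its root pointer, just as Clear() resets Root and Current.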

Inorder Printing

Instructor notes: A good display function is invaluable in debugging container implementations. This is also another good opportunity to illustrate an application of recursion, and the virtue of an inorder traversal. QTP: obviously, swap the recursive calls in the helper function.

   template <typename T>
   void BinaryTreeT<T>::InOrderPrint(ostream& Out) {
      if (Root == NULL) {
         Out << "tree is empty" << endl;
         return;
      }
      InOrderPrintHelper(Root, Out, 0);
   }

   template <typename T>
   void BinaryTreeT<T>::InOrderPrintHelper(BinNodeT<T>* sRoot, ostream& Out, int Level) {
      if (sRoot == NULL) return;
      InOrderPrintHelper(sRoot->Left, Out, Level + 1);
      for (int L = 0; L < Level; L++)
         Out << " ";
      Out << sRoot->Element << endl;
      InOrderPrintHelper(sRoot->Right, Out, Level + 1);
   }

[Figure: sample output for a tree whose inorder traversal is 3 1 4 5 2 7 6 8, with the left and right sides of the printed tree labeled.]

QTP: Could we reverse the sides of the printed tree?
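
One answer to the QTP, sketched with a hypothetical reversePrint function over plain struct nodes: swapping the two recursive calls gives a right-node-left (reverse inorder) traversal, so the right subtree is printed above each node instead of below it. The three-space indent per level is an arbitrary choice for readability.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

struct Node {
    char value;
    Node* left;
    Node* right;
};

// Sideways display with the sides reversed: right subtree, node, left subtree.
// Each node is indented according to its level (root at level 0).
void reversePrint(const Node* sRoot, std::ostream& out, int level) {
    if (sRoot == nullptr) return;
    reversePrint(sRoot->right, out, level + 1);
    for (int i = 0; i < level; ++i)
        out << "   ";  // three spaces per level (assumed)
    out << sRoot->value << '\n';
    reversePrint(sRoot->left, out, level + 1);
}
```

For the example tree from the Traversals slide, the first line printed is e, indented two levels, and the root a appears unindented between the right and left subtrees.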

Summary of Implementation

Instructor notes: The implementation described here is primarily for illustration. The full implementation has been tested, but not thoroughly.

As we will see in the next chapter, general binary trees are not often used in applications. Rather, specialized variants are derived from the notion of a general binary tree, and THOSE are used. Before proceeding with that idea, we need to establish a few facts regarding binary trees.

Warning: the binary tree classes given in this chapter are intended for instructional purposes. The given implementation contains a number of known flaws, and perhaps some unknown flaws as well. Caveat emptor.

Full and Complete Binary Trees

Instructor notes: The two terms defined here are common in graph theory, but unfortunately authors differ in their meaning. It's common to find the definitions reversed in other books.

Here are two important types of binary trees. Note that the definitions, while similar, are logically independent.

Definition: a binary tree T is full if each node is either a leaf or possesses exactly two child nodes.

Definition: a binary tree T with n levels is complete if all levels except possibly the last are completely full, and the last level has all its nodes to the left side.

[Figure: four example trees, labeled "Full but not complete", "Neither complete nor full", "Complete but not full", and "Full and complete".]
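
Both definitions translate into short checks. The sketch below uses hypothetical functions isFull and isComplete; isComplete uses a level-order (breadth-first) scan with std::queue, one standard way to test the "last level filled from the left" condition, not necessarily the approach the notes have in mind. Treating the empty tree as both full and complete is an assumption.

```cpp
#include <cassert>
#include <queue>

struct Node {
    int value;
    Node* left;
    Node* right;
};

// Full: every node has either zero or two children.
bool isFull(const Node* t) {
    if (t == nullptr) return true;  // empty tree treated as full (assumed)
    if ((t->left == nullptr) != (t->right == nullptr))
        return false;               // exactly one child
    return isFull(t->left) && isFull(t->right);
}

// Complete: scanning level by level, left to right, no node may appear
// after an empty child position has been seen.
bool isComplete(const Node* t) {
    std::queue<const Node*> q;
    q.push(t);
    bool seenGap = false;
    while (!q.empty()) {
        const Node* n = q.front();
        q.pop();
        if (n == nullptr) {
            seenGap = true;
        } else {
            if (seenGap) return false;  // a node after a gap
            q.push(n->left);
            q.push(n->right);
        }
    }
    return true;
}
```

A root whose left child is a leaf and whose right child has two leaf children is full but not complete; a root whose left child has only a left child, plus a leaf on the right, is complete but not full.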

Full Binary Tree Theorem

Instructor notes: The results here are used to derive some bounds on balanced BST search.

Theorem: Let T be a nonempty, full binary tree. Then:
(a) If T has I internal nodes, the number of leaves is L = I + 1.
(b) If T has I internal nodes, the total number of nodes is N = 2I + 1.
(c) If T has a total of N nodes, the number of internal nodes is I = (N - 1)/2.
(d) If T has a total of N nodes, the number of leaves is L = (N + 1)/2.
(e) If T has L leaves, the total number of nodes is N = 2L - 1.
(f) If T has L leaves, the number of internal nodes is I = L - 1.

Basically, this theorem says that the number of nodes N, the number of leaves L, and the number of internal nodes I are related in such a way that if you know any one of them, you can determine the other two.
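
A quick empirical check of the theorem: the hypothetical functions below count internal nodes I and leaves L in a tree, set N = I + L, and test all six relations at once. They hold for any nonempty full tree; a tree containing a node with exactly one child already fails (a).

```cpp
#include <cassert>

struct Node {
    Node* left;
    Node* right;
};

struct Counts {
    int internal = 0;  // I
    int leaves = 0;    // L
};

// Tally internal nodes and leaves (any traversal order works).
void tally(const Node* t, Counts& c) {
    if (t == nullptr) return;
    if (t->left == nullptr && t->right == nullptr) ++c.leaves;
    else ++c.internal;
    tally(t->left, c);
    tally(t->right, c);
}

// Checks relations (a)-(f) of the Full Binary Tree Theorem for one tree.
bool satisfiesFullTreeTheorem(const Node* t) {
    Counts c;
    tally(t, c);
    int I = c.internal, L = c.leaves, N = I + L;
    return L == I + 1 && N == 2 * I + 1 && I == (N - 1) / 2 &&
           L == (N + 1) / 2 && N == 2 * L - 1 && I == L - 1;
}
```

For a root whose left child is a leaf and whose right child has two leaf children, I = 2, L = 3 and N = 5, and all six relations hold.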

Full Binary Tree Theorem Proof

Instructor notes: The proof is essentially a straightforward induction, but there is a standard trick that students may not have seen in their discrete math course. It's common in graph theoretic proofs to induct on some measure of the size of the graph, such as the number of internal nodes in a tree, and perform the induction step by performing surgery to reduce a graph of "size" N + 1 to one of "size" N in order to use the inductive assumption. The proofs here should be given some serious attention.

proof of (a): We will use induction on the number of internal nodes, I. Let S be the set of all integers $I \ge 0$ such that if T is a nonempty full binary tree with I internal nodes then T has I + 1 leaf nodes.

For the base case, if I = 0 then the tree must consist only of a root node, having no children because the tree is full. Hence there is 1 leaf node, and so $0 \in S$.

Now suppose that some integer $K \ge 0$ is in S. That is, whenever a nonempty full binary tree has K internal nodes it has K + 1 leaf nodes. Let T be a full binary tree with K + 1 internal nodes. Pick an internal node of T whose child nodes are both leaves (how do we know this is possible?), and delete both its children; call the resulting tree T'. Then T' is a nonempty full binary tree, and T' has K internal nodes; so by the inductive assumption, T' must have K + 1 leaf nodes. But the number of leaf nodes in T' is one less than the number of leaf nodes in T (deleting the two child nodes removed two leaves and turned the former internal node into a leaf). Therefore, T must have K + 2 leaf nodes and so $K + 1 \in S$. Hence by Mathematical Induction, $S = [0, \infty)$. QED

The remaining parts are easily derived algebraically from (a).
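
For completeness, the algebra behind (b) through (f): every node of T is either internal or a leaf, so N = I + L, and each remaining part follows by substituting (a).

```latex
\begin{aligned}
\text{(b)}\quad & N = I + L = I + (I + 1) = 2I + 1 \\
\text{(c)}\quad & N = 2I + 1 \;\Rightarrow\; I = \tfrac{N - 1}{2} \\
\text{(d)}\quad & L = I + 1 = \tfrac{N - 1}{2} + 1 = \tfrac{N + 1}{2} \\
\text{(e)}\quad & N = 2I + 1 = 2(L - 1) + 1 = 2L - 1 \qquad \text{(using } I = L - 1 \text{ from (a))} \\
\text{(f)}\quad & L = I + 1 \;\Rightarrow\; I = L - 1
\end{aligned}
```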

Limit on the Number of Leaves

Instructor notes: This one's important primarily because it provides the basis for the next theorem.

Theorem: Let T be a binary tree with $l$ levels. Then the number of leaves is at most $2^{l-1}$.

proof: We will use strong induction on the number of levels, $l$. Let S be the set of all integers $l \ge 1$ such that if T is a binary tree with $l$ levels then T has at most $2^{l-1}$ leaf nodes.

For the base case, if $l = 1$ then the tree must have one node (the root) and it must have no child nodes. Hence there is 1 leaf node (which is $2^{l-1}$ if $l = 1$), and so $1 \in S$.

Now suppose that for some integer $K \ge 1$, all the integers 1 through K are in S. That is, whenever a binary tree has M levels with $M \le K$, it has at most $2^{M-1}$ leaf nodes. Let T be a binary tree with $K + 1$ levels. If the root of T has only one nonempty subtree, then every leaf of T is a leaf of that subtree, which has K levels, so T has at most $2^{K-1} \le 2^K$ leaves. Otherwise, T consists of a root node and two nonempty subtrees, say S1 and S2. Let S1 and S2 have M1 and M2 levels, respectively. Since M1 and M2 are between 1 and K, each is in S by the inductive assumption. Hence, the numbers of leaf nodes in S1 and S2 are at most $2^{M_1-1} \le 2^{K-1}$ and $2^{M_2-1} \le 2^{K-1}$, respectively. Since all the leaves of T must be leaves of S1 or of S2, the number of leaves in T is at most $2^{K-1} + 2^{K-1}$, which is $2^K$. Therefore, $K + 1$ is in S. Hence by Mathematical Induction, $S = [1, \infty)$. QED
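
The bound is tight: a tree in which every level is completely filled (often called a perfect binary tree) has exactly $2^{l-1}$ leaves. The hypothetical helpers perfectTree and countLeaves below build such a tree with the given number of levels and count its leaves; the sketch needs C++14 for std::make_unique.

```cpp
#include <cassert>
#include <memory>

struct Node {
    std::unique_ptr<Node> left, right;  // children owned by the node
};

// Build a tree with the given number of levels in which every internal
// node has two children and all leaves lie on the last level.
std::unique_ptr<Node> perfectTree(int levels) {
    if (levels <= 0) return nullptr;
    auto n = std::make_unique<Node>();
    n->left = perfectTree(levels - 1);
    n->right = perfectTree(levels - 1);
    return n;
}

// Count the leaves (nodes with no children).
int countLeaves(const Node* t) {
    if (t == nullptr) return 0;
    if (!t->left && !t->right) return 1;
    return countLeaves(t->left.get()) + countLeaves(t->right.get());
}
```

For example, a perfect tree with 4 levels has 8 leaves, exactly the maximum $2^{4-1}$ allowed by the theorem.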

Limit on the Number of Levels

Instructor notes: This one's important because it provides the theoretical foundation for complexity results relating to search in a binary search tree, and later to the lower bound on the cost of comparison-based sorting.

Theorem: Let T be a nonempty binary tree with $l$ levels and $L$ leaves. Then the number of levels is at least $\lceil \log L \rceil + 1$ (logarithms base 2).

proof: From the previous theorem, if T has $l$ levels then the number of leaves is at most $2^{l-1}$. That is,

   $L \le 2^{l-1}$

Taking logarithms of both sides yields:

   $\log L \le l - 1$

Since $l - 1$ is an integer, we may apply the ceiling function to the left side, to obtain:

   $\lceil \log L \rceil \le l - 1$

and the final result follows immediately. QED
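
As a worked example of the bound, the hypothetical function minLevels below computes $\lceil \log_2 L \rceil + 1$ using integer arithmetic only (repeated doubling), which sidesteps floating-point rounding for large L.

```cpp
#include <cassert>

// Smallest possible number of levels for a binary tree with L >= 1 leaves:
// ceil(log2 L) + 1. Valid for L up to 2^63 (larger values would overflow
// 'capacity').
int minLevels(unsigned long long L) {
    assert(L >= 1);
    int ceilLog = 0;
    unsigned long long capacity = 1;  // 2^ceilLog
    while (capacity < L) {            // find the smallest power of two >= L
        capacity *= 2;
        ++ceilLog;
    }
    return ceilLog + 1;
}
```

For instance, a binary tree with 1,000,000 leaves must have at least 21 levels, since 2^19 < 1,000,000 ≤ 2^20.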