Binary Search Trees

A binary search tree is a binary tree that may be empty. If it is not empty, it has the following properties:
- Every element (including the root) has a key, and all keys are distinct.
- The keys (if any) in the left subtree are smaller than the key in the root.
- The keys (if any) in the right subtree are larger than the key in the root.
- The left and right subtrees are themselves binary search trees.

Examples of binary trees with distinct keys:
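The properties above constrain whole subtrees, not just a node and its two children, so a validity check has to carry the allowed key range down the tree. A minimal sketch, assuming a simplified `Node` struct with `int` keys (a hypothetical stand-in for the class on the next slide; `isBst` and `isBstInRange` are illustrative names):

```cpp
#include <climits>

// Hypothetical stand-in for a tree node, fixed to int keys.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// True if every key in the subtree rooted at t lies strictly between lo and hi
// and both subtrees satisfy the same rule recursively.
// The bounds are long long so that any int key fits strictly inside the initial range.
bool isBstInRange(const Node* t, long long lo, long long hi) {
    if (t == nullptr) return true;                      // an empty tree is a valid BST
    if (t->key <= lo || t->key >= hi) return false;     // key outside the allowed range
    return isBstInRange(t->left, lo, t->key) &&         // left keys must be smaller
           isBstInRange(t->right, t->key, hi);          // right keys must be larger
}

bool isBst(const Node* root) {
    return isBstInRange(root, LLONG_MIN, LLONG_MAX);
}
```

A tree with root 20, left child 10, and 25 as the right child of 10 passes a parent/child comparison at every node but is rejected here, because 25 sits in the left subtree of 20.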
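Binary Search Tree Class

    #include <cstddef>   // NULL

    template <class Etype> class BinarySearchTree;   // forward declaration for the friend below

    template <class Etype>
    class TreeNode {
    protected:
        Etype element;      // key stored in this node
        TreeNode* left;     // root of the left subtree (smaller keys)
        TreeNode* right;    // root of the right subtree (larger keys)
        // Default to a value-initialized key so that Etype need not be convertible from 0.
        TreeNode(const Etype& E = Etype(), TreeNode* L = NULL, TreeNode* R = NULL)
            : element(E), left(L), right(R) { }
        friend int Height(TreeNode* T);
        friend class BinarySearchTree<Etype>;
    };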
Searching a BST

To search for an element with key x:
- Begin at the root.
- If the root is nil, the tree has no elements and the search is unsuccessful.
- Otherwise, compare x with the key in the root:
  - If x equals the key in the root, the search terminates successfully.
  - If x is less than the key in the root, search the left subtree.
  - If x is greater than the key in the root, search the right subtree.
Searching a BST

    // Find routine for a binary search tree (assumes a linked representation of the BST).
    template <class Etype>
    TreeNode<Etype>* BinarySearchTree<Etype>::find(const Etype& X, TreeNode<Etype>* T) const {
        if (T == NULL)
            return NULL;                  // empty subtree: X is not in the tree
        else if (X < T->element)
            return find(X, T->left);      // X can only be in the left subtree
        else if (X > T->element)
            return find(X, T->right);     // X can only be in the right subtree
        else
            return T;                     // X equals the key in this node
    }
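The recursive routine makes one call per level of the search path. The same walk can also be written as a loop. The following is only a sketch of that alternative, assuming a simplified `Node` struct with `int` keys rather than the `TreeNode<Etype>` class (`Node` and `findIterative` are hypothetical names):

```cpp
// Hypothetical stand-in for TreeNode<Etype>, fixed to int keys.
struct Node {
    int key;
    Node* left;
    Node* right;
};

// Walk down from the root, going left or right by comparison,
// until the key is found or an empty subtree is reached.
const Node* findIterative(const Node* t, int x) {
    while (t != nullptr && t->key != x) {
        t = (x < t->key) ? t->left : t->right;
    }
    return t;   // nullptr if x is not in the tree
}
```

Both versions make the same comparisons. The loop simply avoids using call-stack space proportional to the depth of the tree.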
Insertion into a BST

- The key of a new element must be unique (unless duplicates are allowed).
- Search for the key first (i.e., find the key).
- If the key is not found, insert the new element at the point where the search terminated.

Example: insert 80, 16, 25, 4, 37 into the following tree.
Recursive Insert Routine

    // Written as a member, like find, because the TreeNode constructor is
    // accessible only to BinarySearchTree<Etype>.
    template <class Etype>
    void BinarySearchTree<Etype>::Insert(const Etype& x, TreeNode<Etype>*& t) {
        if (t == NULL)
            t = new TreeNode<Etype>(x);   // search ended at an empty link: attach the new node here
        else if (x < t->element)
            Insert(x, t->left);
        else if (x > t->element)
            Insert(x, t->right);
        // else x is in the tree already; do nothing
    }

Handling duplicate keys:
- Keep an extra field in the node indicating the frequency of occurrence.
- If the key is part of a larger record, all records with the same key can be stored in a second data structure (a list or another tree).
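The first strategy for duplicates, a frequency field in each node, needs only one extra branch in the insert routine. A minimal sketch, assuming `int` keys and a hypothetical `CountedNode` struct (`insertCounted` and `countOf` are illustrative names; nodes are never freed in this short example):

```cpp
// Hypothetical node type: a key plus the number of times it was inserted.
struct CountedNode {
    int key;
    int count;
    CountedNode* left;
    CountedNode* right;
};

void insertCounted(CountedNode*& t, int x) {
    if (t == nullptr)
        t = new CountedNode{x, 1, nullptr, nullptr};   // first occurrence of x
    else if (x < t->key)
        insertCounted(t->left, x);
    else if (x > t->key)
        insertCounted(t->right, x);
    else
        ++t->count;   // duplicate: bump the frequency instead of adding a node
}

// Number of times x has been inserted (0 if x is absent).
int countOf(const CountedNode* t, int x) {
    while (t != nullptr && t->key != x)
        t = (x < t->key) ? t->left : t->right;
    return t != nullptr ? t->count : 0;
}
```

The shape of the tree is the same as it would be with distinct keys, so the search cost is unaffected by how often a key repeats.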
Deletion in BST

- Delete a leaf: set the parent's pointer to nil and dispose of the node. Ex.: delete 80.
- Delete a nonleaf node that has only one child: dispose of the node containing the element to be deleted, and let its single child take the place of the disposed node. Ex.: delete 16.
- Delete a nonleaf node that has two children: replace the element by either the largest element in its left subtree or the smallest element in its right subtree, then delete this replacing element from the subtree from which it was taken. Ex.: delete 5 (replace 5 by 4 or by 25).
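The three cases can be combined into one recursive routine. The following is a sketch rather than the course's own implementation: it assumes `int` keys and a simplified `Node` struct, and it always takes the smallest element of the right subtree in the two-children case (`insertKey`, `removeKey`, and `inorder` are hypothetical helper names).

```cpp
#include <vector>

// Hypothetical stand-in for TreeNode<Etype>, fixed to int keys.
struct Node {
    int key;
    Node* left;
    Node* right;
};

void insertKey(Node*& t, int x) {
    if (t == nullptr) t = new Node{x, nullptr, nullptr};
    else if (x < t->key) insertKey(t->left, x);
    else if (x > t->key) insertKey(t->right, x);
    // equal keys are ignored
}

void removeKey(Node*& t, int x) {
    if (t == nullptr) return;                        // key not present
    if (x < t->key) {
        removeKey(t->left, x);
    } else if (x > t->key) {
        removeKey(t->right, x);
    } else if (t->left != nullptr && t->right != nullptr) {
        // Two children: copy in the smallest key of the right subtree,
        // then delete that key from the right subtree.
        Node* m = t->right;
        while (m->left != nullptr) m = m->left;
        t->key = m->key;
        removeKey(t->right, t->key);
    } else {
        // Leaf or single child: the child (possibly null) takes the node's place.
        Node* old = t;
        t = (t->left != nullptr) ? t->left : t->right;
        delete old;
    }
}

// Appends the keys in ascending order; used here to inspect the result.
void inorder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorder(t->left, out);
    out.push_back(t->key);
    inorder(t->right, out);
}
```

Because `t` is a reference to the parent's link, the leaf and single-child cases are handled by the same assignment: for a leaf, the link simply becomes null.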
Deletion in BST

Lazy deletion is practiced when few deletions are expected:
- Mark the node that is to be deleted, but leave it in the tree.
- If a deleted key is reinserted, the overhead of allocating a new cell is avoided.
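Lazy deletion can be sketched with a single boolean flag per node. The version below is an illustration under assumptions: `int` keys and a hypothetical `LazyNode` struct, with `locate`, `lazyInsert`, `lazyRemove`, and `contains` as made-up names. Marked nodes still guide searches, and reinserting a marked key reuses the existing cell.

```cpp
// Hypothetical node type with a deletion mark.
struct LazyNode {
    int key;
    bool deleted;      // true once the key has been lazily removed
    LazyNode* left;
    LazyNode* right;
};

// Finds the node holding x whether or not it is marked as deleted.
LazyNode* locate(LazyNode* t, int x) {
    while (t != nullptr && t->key != x)
        t = (x < t->key) ? t->left : t->right;
    return t;
}

void lazyInsert(LazyNode*& t, int x) {
    if (t == nullptr) t = new LazyNode{x, false, nullptr, nullptr};
    else if (x < t->key) lazyInsert(t->left, x);
    else if (x > t->key) lazyInsert(t->right, x);
    else t->deleted = false;   // key already has a cell: clear the mark, no allocation
}

void lazyRemove(LazyNode* t, int x) {
    LazyNode* n = locate(t, x);
    if (n != nullptr) n->deleted = true;   // mark only; the node stays in the tree
}

bool contains(LazyNode* t, int x) {
    LazyNode* n = locate(t, x);
    return n != nullptr && !n->deleted;
}
```

The trade-off is that marked nodes keep occupying space and lengthening search paths, which is why the technique suits workloads with few deletions.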
BST Analysis

- Computational complexity (the number of comparisons made during a search) is proportional to the number of levels on the search path needed to locate the node being inserted or deleted. The maximum length of this path equals the depth of the search tree.
- Best performance: the tree is perfectly or nearly balanced.
- Worst performance: the elements are inserted in sorted order, so the tree degenerates into a chain and the path length equals the number of nodes in the tree.
- The average path length of a random search tree is expected to be O(log n): each step down a level takes constant time and leaves a subtree of approximately half the size.
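The gap between the best and worst case is easy to see by inserting the same keys in two different orders. A small sketch, assuming `int` keys 1..n and a simplified `Node` struct (`insertKey`, `levels`, `buildSorted`, and `buildBalanced` are hypothetical names; nodes are not freed in this short example):

```cpp
#include <algorithm>

// Hypothetical stand-in for TreeNode<Etype>, fixed to int keys.
struct Node {
    int key;
    Node* left;
    Node* right;
};

void insertKey(Node*& t, int x) {
    if (t == nullptr) t = new Node{x, nullptr, nullptr};
    else if (x < t->key) insertKey(t->left, x);
    else if (x > t->key) insertKey(t->right, x);
}

// Number of levels in the tree (0 for an empty tree).
int levels(const Node* t) {
    return t == nullptr ? 0 : 1 + std::max(levels(t->left), levels(t->right));
}

// Worst case: keys arrive in sorted order, so every node gets only a right child.
Node* buildSorted(int n) {
    Node* root = nullptr;
    for (int k = 1; k <= n; ++k) insertKey(root, k);
    return root;
}

// Best case: insert the middle key of each range first, then recurse on both halves.
void insertMiddleFirst(Node*& root, int lo, int hi) {
    if (lo > hi) return;
    int mid = lo + (hi - lo) / 2;
    insertKey(root, mid);
    insertMiddleFirst(root, lo, mid - 1);
    insertMiddleFirst(root, mid + 1, hi);
}

Node* buildBalanced(int n) {
    Node* root = nullptr;
    insertMiddleFirst(root, 1, n);
    return root;
}
```

For n = 15, `levels(buildSorted(15))` is 15, while `levels(buildBalanced(15))` is 4, which is the difference between a linear and a logarithmic search path.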
Average Case Analysis

To establish the log n bound, we need to prove that the average search/insertion time is O(log n) for any node in a BST:
- We first assume that all binary search trees are equally likely.
- Then we calculate the average internal path length of all possible binary search trees.
- From this, we obtain the average depth of a single node.

The value of the key in the root determines the shape of the tree. Consider the following examples (elements are given in the order in which they were inserted):
- List 1: {38, 25, 30, 45, 47, 8, 2, 40, 120}
- List 2: {25, 38, 30, 45, 47, 8, 2, 40, 120}
- List 3: {47, 38, 25, 30, 45, 8, 2, 40, 120}
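The effect of the root key on the shape can be checked by building the three trees and measuring them. A minimal sketch, assuming a simplified `Node` struct with `int` keys (`build`, `countNodes`, and `internalPathLength` are hypothetical helper names; nodes are not freed here):

```cpp
#include <vector>

// Hypothetical stand-in for TreeNode<Etype>, fixed to int keys.
struct Node {
    int key;
    Node* left;
    Node* right;
};

void insertKey(Node*& t, int x) {
    if (t == nullptr) t = new Node{x, nullptr, nullptr};
    else if (x < t->key) insertKey(t->left, x);
    else if (x > t->key) insertKey(t->right, x);
}

// Inserts the keys in the given order into an initially empty tree.
Node* build(const std::vector<int>& keys) {
    Node* root = nullptr;
    for (int k : keys) insertKey(root, k);
    return root;
}

int countNodes(const Node* t) {
    return t == nullptr ? 0 : 1 + countNodes(t->left) + countNodes(t->right);
}

// Sum of the depths of all nodes, with the root at depth 0.
int internalPathLength(const Node* t, int depth = 0) {
    if (t == nullptr) return 0;
    return depth + internalPathLength(t->left, depth + 1)
                 + internalPathLength(t->right, depth + 1);
}
```

With these helpers, List 1 (root 38) splits the remaining keys 4/4 and has internal path length 16. List 2 (root 25) splits them 2/6 with path length 18, and List 3 (root 47) splits them 7/1 with path length 19.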
Average Case Analysis [continued]

Proof:
1. Let D(n) be the internal path length for some tree T of n nodes.
2. Assume D(1) = 0 (the path length of a one-node tree is zero).

An n-node tree consists of an i-node left subtree and an (n-i-1)-node right subtree, plus a root at depth zero, for 0 <= i < n. By definition, D(i) is the internal path length of the left subtree with respect to its own root. In the main tree, however, all of these i nodes are one level deeper. Likewise for the n-i-1 nodes of the right subtree. The drawing below describes such a tree.

From these facts we obtain the recurrence relation for the path length of one specific binary search tree:

D(n) = D(i) + D(n-i-1) + (i) + (n-i-1) = D(i) + D(n-i-1) + n - 1
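As a quick check of the recurrence, take the tree produced by inserting 38, 25, 30, 45, 47, 8, 2, 40, 120 in that order (List 1 from the earlier slide). Its root 38 has a 4-node left subtree (25 with children 8 and 30, and 2 below 8) and a 4-node right subtree (45 with children 40 and 47, and 120 below 47), so n = 9 and i = 4. Relative to its own root, each subtree has internal path length 0 + 1 + 1 + 2 = 4. Here D(4) denotes the path length of these particular subtrees, not an average:

```latex
D(9) = D(4) + D(4) + (9 - 1) = 4 + 4 + 8 = 16
```

Summing the depths directly gives the same value: one node at depth 0, two at depth 1, four at depth 2, and two at depth 3, for 0 + 2 + 8 + 6 = 16.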
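Average Case Analysis [continued]

Suppose all subtree sizes are equally likely, which holds because any element is equally likely to be the first element inserted. Then each element is the root with probability 1/n. If the smallest element is the root, the left subtree is empty, so D(0) = 0, and D(n-1) is the average internal path length of the right subtree. The other choices of root are handled likewise. Adding the n cases, each weighted by 1/n, gives:

$$
\begin{aligned}
D(n) &= \tfrac{1}{n}\,[D(0) + D(n-1) + (n-1)] \\
     &+ \tfrac{1}{n}\,[D(1) + D(n-2) + (n-1)] \\
     &+ \tfrac{1}{n}\,[D(2) + D(n-3) + (n-1)] \\
     &\;\;\vdots \\
     &+ \tfrac{1}{n}\,[D(n-1) + D(0) + (n-1)] \\
     &= \frac{2}{n}\left[\sum_{j=0}^{n-1} D(j)\right] + n - 1
\end{aligned}
$$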
Average Case Analysis [continued]

The drawings below illustrate three trees represented by the first three of these equations.

Thus, the average internal path length of all binary search trees of size n is given by:

$$D(n) = \frac{2}{n}\left[\sum_{j=0}^{n-1} D(j)\right] + n - 1$$

The factor 2 appears because each D(j) is summed twice: once with j running from 0 to n-1 (the left subtrees) and once from n-1 to 0 (the right subtrees). The drawings below depict the first three trees represented by the summation from n-1 to 0.

Solving this recurrence, we get D(n) = O(n log n) as the internal path length of a tree of n nodes. Dividing by the n nodes, the expected depth of any one node is O(log n). Q.E.D.
Average Case Analysis [continued]

- You cannot assume that all operations on BSTs are O(log n) because, due to deletions, it is not certain that all BSTs are equally likely. If insertions and deletions are alternated Θ(n^2) times, the expected depth of the trees will be Θ(sqrt(n)).
- In Algorithms + Data Structures = Programs (1976), Niklaus Wirth derived analytically that the average path length of a random search tree is 1.386 times the average path length of a perfectly balanced tree; i.e., the random BST requires about 39% more comparisons than a balanced BST.
- Example: balanced tree of 15 nodes: ipl = 34; random tree of 15 nodes: ipl = 1.386 * 34 = 47.12.
- Suggested solution to the balancing problem: rather than always using one or the other of the two deletion algorithms, randomly choose between the smallest element in the right subtree and the largest element in the left subtree when replacing the deleted element. The idea is intuitively correct but has never been proven.
- The balance problem does not show up for small trees; further, if o(n^2) insert/delete pairs are used, the tree seems to gain balance.
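The suggested randomized replacement can be sketched as a small change to an ordinary recursive delete. This is an illustration under assumptions, not a proven balancing scheme: it uses `int` keys, a simplified `Node` struct, and the low bit of a `std::mt19937` draw as the coin flip (`insertKey`, `removeRandomized`, and `inorder` are hypothetical names).

```cpp
#include <random>
#include <vector>

// Hypothetical stand-in for TreeNode<Etype>, fixed to int keys.
struct Node {
    int key;
    Node* left;
    Node* right;
};

void insertKey(Node*& t, int x) {
    if (t == nullptr) t = new Node{x, nullptr, nullptr};
    else if (x < t->key) insertKey(t->left, x);
    else if (x > t->key) insertKey(t->right, x);
}

// Deletes x; when the node has two children, a coin flip decides whether the
// replacement comes from the right subtree (smallest key) or the left (largest key).
void removeRandomized(Node*& t, int x, std::mt19937& rng) {
    if (t == nullptr) return;
    if (x < t->key) {
        removeRandomized(t->left, x, rng);
    } else if (x > t->key) {
        removeRandomized(t->right, x, rng);
    } else if (t->left != nullptr && t->right != nullptr) {
        if (rng() & 1u) {
            Node* m = t->right;                      // smallest key in the right subtree
            while (m->left != nullptr) m = m->left;
            t->key = m->key;
            removeRandomized(t->right, t->key, rng);
        } else {
            Node* m = t->left;                       // largest key in the left subtree
            while (m->right != nullptr) m = m->right;
            t->key = m->key;
            removeRandomized(t->left, t->key, rng);
        }
    } else {
        Node* old = t;                               // leaf or single child
        t = (t->left != nullptr) ? t->left : t->right;
        delete old;
    }
}

// Appends the keys in ascending order; used here to inspect the result.
void inorder(const Node* t, std::vector<int>& out) {
    if (t == nullptr) return;
    inorder(t->left, out);
    out.push_back(t->key);
    inorder(t->right, out);
}
```

Either choice preserves the search-tree ordering. Only the side from which the replacement is taken varies, which is what is meant to remove the bias of always shrinking the same subtree.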