1 prepared from lecture material © 2004 Goodrich & Tamassia COMMONWEALTH OF AUSTRALIA Copyright Regulations 1969 WARNING This material has been reproduced and communicated to you by or on behalf of Monash University pursuant to Part VB of the Copyright Act 1968 (the Act). The material in this communication may be subject to copyright under the Act. Any further reproduction or communication of this material by you may be the subject of copyright protection under the Act. Do not remove this notice.
FIT2004 Algorithms & Data Structures L6-7: Trees Prepared by: Bernd Meyer from lecture materials © 2004 Goodrich & Tamassia February 2007
What is a Tree
In computer science, a tree is an abstract model of a hierarchical structure.
A tree consists of nodes with a parent-child relation.
Applications:
– Organization charts
– File systems
– Programming environments
[Figure: organization chart for Computers"R"Us with nodes Sales, R&D, Manufacturing, Laptops, Desktops, US, International, Europe, Asia, Canada]
Tree Terminology
Root: node without parent (A)
Internal node: node with at least one child (A, B, C, F)
External node (a.k.a. leaf): node without children (E, I, J, K, G, H, D)
Ancestors of a node: parent, grandparent, great-grandparent, etc.
Depth of a node: number of ancestors
Height of a tree: maximum depth of any node (3)
Descendant of a node: child, grandchild, great-grandchild, etc.
Subtree: tree consisting of a node and its descendants
[Figure: example tree with root A and nodes B to K, with one subtree highlighted]
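The definitions of depth and height translate directly into two short recursive functions. The following is a minimal sketch in Python. The `Node` class and the exact shape of the example tree are assumptions, chosen only so that the internal nodes, leaves and height agree with the lists on the slide.

```python
class Node:
    """A tree node that knows its parent and its children (hypothetical helper)."""
    def __init__(self, element, parent=None):
        self.element = element
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

def depth(v):
    """Number of ancestors of v; the root has depth 0."""
    return 0 if v.parent is None else 1 + depth(v.parent)

def height(v):
    """Maximum depth of any node in the subtree rooted at v, measured from v."""
    return 0 if not v.children else 1 + max(height(w) for w in v.children)

# Assumed shape: A has children B, C, D; B has E, F; C has G, H; F has I, J, K.
a = Node("A")
b, c, d = Node("B", a), Node("C", a), Node("D", a)
e, f = Node("E", b), Node("F", b)
g, h = Node("G", c), Node("H", c)
i, j, k = Node("I", f), Node("J", f), Node("K", f)
```

With this shape, `depth(k)` and `height(a)` are both 3, matching the height quoted on the slide.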
Binary Trees
A binary tree is a tree with the following properties:
– Each internal node has at most two children (exactly two for proper binary trees)
– The children of a node are an ordered pair
We call the children of an internal node the left child and the right child.
Alternative recursive definition: a binary tree is either
– a tree consisting of a single node, or
– a tree whose root has an ordered pair of children, each of which is a binary tree
Applications:
– arithmetic expressions
– decision processes
– searching
[Figure: example binary tree with nodes A to I]
Arithmetic Expression Tree
Binary tree associated with an arithmetic expression
– internal nodes: operators
– external nodes: operands
Example: arithmetic expression tree for the expression $(2 \times (a - 1) + (3 \times b))$
[Figure: expression tree with operands 2, a, 1, 3, b at the external nodes]
Decision Tree
Binary tree associated with a decision process
– internal nodes: questions with yes/no answer
– external nodes: decisions
Example: dining decision
[Figure: decision tree. Root question "Want a fast meal?"; Yes leads to "How about coffee?" (Yes: Starbucks, No: Spike's); No leads to "On expense account?" (Yes: Al Forno, No: Café Paragon)]
Properties of Proper Binary Trees
(proper = each node has zero or two children)
Notation:
– n: number of nodes
– e: number of external nodes
– i: number of internal nodes
– h: height
Properties:
– $e = i + 1$
– $n = 2e - 1$
– $h \le i$
– $h \le (n - 1)/2$
– $e \le 2^h$
– $h \ge \log_2 e$
– $h \ge \log_2 (n + 1) - 1$
Preorder Traversal
A traversal visits the nodes of a tree in a systematic manner.
In a preorder traversal, a node is visited before its descendants.
Application: print a structured document
Algorithm preOrder(v)
    visit(v)
    for each child w of v
        preOrder(w)
[Figure: document tree "Make Money Fast!" with sections 1. Motivations (1.1 Greed, 1.2 Avidity), 2. Methods (2.1 Stock Fraud, 2.2 Ponzi Scheme, 2.3 Bank Robbery) and References]
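As a rough illustration of the document-printing application, the sketch below runs a preorder traversal over the "Make Money Fast!" outline. Representing a node as a `(title, children)` tuple and passing the depth to `visit` are assumptions made for this example, not part of the slide's algorithm.

```python
def pre_order(node, visit, depth=0):
    """Visit a node before its descendants, as in preOrder(v)."""
    title, children = node
    visit(title, depth)
    for child in children:
        pre_order(child, visit, depth + 1)

# Each node is a (title, children) tuple; this encoding is an assumption.
doc = ("Make Money Fast!", [
    ("1. Motivations", [("1.1 Greed", []), ("1.2 Avidity", [])]),
    ("2. Methods", [("2.1 Stock Fraud", []),
                    ("2.2 Ponzi Scheme", []),
                    ("2.3 Bank Robbery", [])]),
    ("References", []),
])

lines = []
pre_order(doc, lambda title, depth: lines.append("  " * depth + title))
print("\n".join(lines))
```

Because every heading is emitted before its subsections, the output reads as an indented table of contents.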
Postorder Traversal
In a postorder traversal, a node is visited after its descendants.
Application: compute space used by files in a directory and its subdirectories
Algorithm postOrder(v)
    for each child w of v
        postOrder(w)
    visit(v)
[Figure: directory tree cs16/ containing homeworks/, programs/ and todo.txt (1K), with files h1c.doc (3K), h1nc.doc (2K), DDR.java (10K), Stocks.java (25K), Robot.java (20K)]
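The disk-usage application can be sketched as a postorder traversal in which each call returns the total of its subtree. The `(name, own_size, children)` encoding, the grouping of files into `homeworks/` and `programs/`, and the choice to give directories a size of 0 are all assumptions made for illustration.

```python
def disk_usage(node, report):
    """Return the total size (in K) of node and everything below it."""
    name, own_size, children = node
    total = own_size
    for child in children:
        total += disk_usage(child, report)
    report.append((name, total))  # the "visit" happens after all children
    return total

# Assumed layout; directories themselves are given size 0 here.
cs16 = ("cs16/", 0, [
    ("homeworks/", 0, [("h1c.doc", 3, []), ("h1nc.doc", 2, [])]),
    ("programs/", 0, [("DDR.java", 10, []),
                      ("Stocks.java", 25, []),
                      ("Robot.java", 20, [])]),
    ("todo.txt", 1, []),
])

report = []
total = disk_usage(cs16, report)
```

A directory's total is only known once all of its children have been processed, which is why the visit has to come last.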
Inorder Traversal
In an inorder traversal, a node is visited after its left subtree and before its right subtree.
Application: draw a binary tree
– x(v) = inorder rank of v
– y(v) = depth of v
Algorithm inOrder(v)
    if hasLeft(v)
        inOrder(left(v))
    visit(v)
    if hasRight(v)
        inOrder(right(v))
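The drawing application assigns each node the coordinates (inorder rank, depth). Below is a minimal sketch; the `BNode` class, the `layout` function and the five-node example tree are hypothetical names and data introduced for this example.

```python
class BNode:
    """Binary tree node (hypothetical helper for this example)."""
    def __init__(self, element, left=None, right=None):
        self.element, self.left, self.right = element, left, right

def layout(root):
    """Map each element to (x, y) = (inorder rank, depth)."""
    coords = {}

    def in_order(v, depth):
        if v.left is not None:
            in_order(v.left, depth + 1)
        # Nodes visited so far = inorder rank of v.
        coords[v.element] = (len(coords), depth)
        if v.right is not None:
            in_order(v.right, depth + 1)

    in_order(root, 0)
    return coords

tree = BNode("B", BNode("A"), BNode("D", BNode("C"), BNode("E")))
coords = layout(tree)
```

No two nodes share an x coordinate, and every node lies horizontally between its left and right subtrees, so the drawing has no overlapping nodes.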
Print Arithmetic Expressions
Specialization of an inorder traversal
– print operand or operator when visiting node
– print "(" before traversing left subtree
– print ")" after traversing right subtree
Algorithm printExpression(v)
    if hasLeft(v)
        print("(")
        printExpression(left(v))
    print(v.element())
    if hasRight(v)
        printExpression(right(v))
        print(")")
[Figure: expression tree with operands 2, a, 1, 3, b]
Output: $((2 \times (a - 1)) + (3 \times b))$
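A small Python sketch of this specialization follows. It builds a string rather than printing, and it uses the ASCII operators `*`, `-` and `+`; both choices, as well as the `BNode` class, are assumptions made for the example.

```python
class BNode:
    """Binary tree node (hypothetical helper for this example)."""
    def __init__(self, element, left=None, right=None):
        self.element, self.left, self.right = element, left, right

def expression(v):
    """Fully parenthesized infix string for the subtree rooted at v."""
    text = ""
    if v.left is not None:
        text += "(" + expression(v.left)   # "(" before the left subtree
    text += str(v.element)                  # the visit itself
    if v.right is not None:
        text += expression(v.right) + ")"  # ")" after the right subtree
    return text

tree = BNode("+",
             BNode("*", BNode(2), BNode("-", BNode("a"), BNode(1))),
             BNode("*", BNode(3), BNode("b")))
result = expression(tree)
```

Note that the recursive calls go to `expression` itself, not to a plain inorder traversal; otherwise the parentheses of the inner subexpressions would be lost.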
Evaluate Arithmetic Expressions
Specialization of a postorder traversal
– recursive method returning the value of a subtree
– when visiting an internal node, combine the values of the subtrees
Algorithm evalExpr(v)
    if isExternal(v)
        return v.element()
    else
        x ← evalExpr(leftChild(v))
        y ← evalExpr(rightChild(v))
        ◊ ← operator stored at v
        return x ◊ y
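The evaluation algorithm can be sketched as follows. The `BNode` class, the `OPS` table and the operand values are assumptions; the example tree has the shape of $(2 \times (a - 1)) + (3 \times b)$ with the illustrative values a = 5 and b = 2 substituted.

```python
import operator

# Maps the operator symbol stored at an internal node to a function.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

class BNode:
    """Binary tree node (hypothetical helper for this example)."""
    def __init__(self, element, left=None, right=None):
        self.element, self.left, self.right = element, left, right

def eval_expr(v):
    """Postorder evaluation: both subtrees first, then the operator at v."""
    if v.left is None and v.right is None:  # external node holds an operand
        return v.element
    x = eval_expr(v.left)
    y = eval_expr(v.right)
    return OPS[v.element](x, y)

# (2 * (5 - 1)) + (3 * 2), with assumed operand values.
tree = BNode("+",
             BNode("*", BNode(2), BNode("-", BNode(5), BNode(1))),
             BNode("*", BNode(3), BNode(2)))
value = eval_expr(tree)
```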
Euler Tour Traversal
Generic traversal of a binary tree
Includes as special cases the preorder, postorder and inorder traversals
Walk around the tree and visit each node three times:
– on the left (preorder)
– from below (inorder)
– on the right (postorder)
[Figure: Euler tour around a binary tree, with the three visits to a node marked L (left), B (below) and R (right)]
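One way to see that the three classical traversals are special cases is to pass three callbacks to a single Euler tour, as in the sketch below. The callback names `on_left`, `from_below` and `on_right` and the `BNode` class are assumptions for this example.

```python
class BNode:
    """Binary tree node (hypothetical helper for this example)."""
    def __init__(self, element, left=None, right=None):
        self.element, self.left, self.right = element, left, right

def euler_tour(v, on_left, from_below, on_right):
    """Walk around the tree, calling one hook per visit to each node."""
    if v is None:
        return
    on_left(v)                                         # preorder visit
    euler_tour(v.left, on_left, from_below, on_right)
    from_below(v)                                      # inorder visit
    euler_tour(v.right, on_left, from_below, on_right)
    on_right(v)                                        # postorder visit

pre, ino, post = [], [], []
tree = BNode("B", BNode("A"), BNode("D", BNode("C"), BNode("E")))
euler_tour(tree,
           lambda v: pre.append(v.element),
           lambda v: ino.append(v.element),
           lambda v: post.append(v.element))
```

Supplying a no-op for two of the three hooks reduces the tour to a single preorder, inorder or postorder traversal.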
Linked Structure for Binary Trees
A node is represented by an object storing
– Element
– Parent node
– Left child node
– Right child node
[Figure: a binary tree with elements A, B, C, D, E and its linked representation]
Linked Structure for Trees
A node is represented by an object storing
– Element
– Parent node
– Sequence of children nodes
[Figure: a tree with elements A, B, C, D, E, F and its linked representation]
Tries: Trees for fast string matching (§ 11.3 Goodrich & Tamassia)
Preprocessing Strings
Preprocessing the pattern speeds up pattern matching queries
– After preprocessing the pattern, the KMP algorithm performs pattern matching in time proportional to the text size
If the text is large, immutable and searched often (e.g., works by Shakespeare), we may want to preprocess the text instead of the pattern.
A trie is a compact data structure for representing a set of strings, such as all the words in a text
– A trie supports pattern matching queries in time proportional to the pattern size
Standard Tries
The standard trie for a set of strings S is an ordered tree such that:
– Each node but the root is labeled with a character
– The children of a node are alphabetically ordered
– The paths from the root to the external nodes yield the strings of S
– No word in the trie may be a prefix of another word in the trie! (How can you ensure that?)
Example: standard trie for the set of strings S = { bear, bell, bid, bull, buy, sell, stock, stop }
Analysis of Standard Tries
A standard trie uses O(n) space and supports searches, insertions and deletions in time O(dm), where:
– n is the total size of the strings in S
– m is the size of the string parameter of the operation
– d is the size of the alphabet
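A standard trie can be sketched with nested dictionaries, as below. Two assumptions are worth flagging: the terminator character `END` appended to every word is one common way to guarantee that no stored string is a prefix of another, and a Python `dict` replaces the alphabetically ordered child list. With a dictionary each step takes expected constant time, whereas scanning an ordered list of up to d children per character gives the O(dm) bound stated above.

```python
END = "$"  # assumed terminator; must not occur in any stored word

class Trie:
    """Standard trie; each node is a dict from character to child node."""
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word + END:
            node = node.setdefault(ch, {})  # create the child if missing

    def contains(self, word):
        node = self.root
        for ch in word + END:
            if ch not in node:
                return False
            node = node[ch]
        return True

trie = Trie()
for w in ["bear", "bell", "bid", "bull", "buy", "sell", "stock", "stop"]:
    trie.insert(w)
```

Because of the terminator, `trie.contains("be")` is false even though "be" is a prefix of two stored words.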
Word Matching with a Trie
We insert the words of the text into a trie.
Each leaf stores the occurrences of the associated word in the text.
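The idea of storing occurrences at the leaves can be sketched as follows. The sample text, the regular expression used to split it into words, and the `"$"` key that holds the list of start positions are all assumptions made for this example.

```python
import re

def build_index(text):
    """Insert every word of text into a trie; leaves hold start positions."""
    root = {}
    for match in re.finditer(r"[a-z]+", text.lower()):
        node = root
        for ch in match.group():
            node = node.setdefault(ch, {})
        # The "$" entry acts as the leaf and stores the occurrences.
        node.setdefault("$", []).append(match.start())
    return root

def occurrences(root, word):
    """Start positions of word in the indexed text (empty list if absent)."""
    node = root
    for ch in word:
        if ch not in node:
            return []
        node = node[ch]
    return node.get("$", [])

text = "see a bear? sell stock! see a bull? buy stock!"
index = build_index(text)
```

A query follows at most one trie edge per character of the word, so its cost depends on the word length and not on the length of the text.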
Compressed Tries
A compressed trie has internal nodes of degree at least two.
It is obtained from the standard trie by compressing chains of "redundant" nodes.
Compact Representation
Compact representation of a compressed trie for an array of strings:
– Advantageous as an auxiliary index if the strings are already stored
– Stores at the nodes ranges of indices instead of substrings
– Uses O(s) space, where s is the number of strings in the array
Suffix Trie
The suffix trie of a string X is the compressed trie of all the suffixes of X.
Analysis of Suffix Tries
Compact representation of the suffix trie for a string X of size n from an alphabet of size d
– Supports arbitrary pattern matching queries in X in O(dm) time, where m is the size of the pattern
– Can be constructed in O(n) time (difficult!)
Pattern Matching with Suffix Tries
How would you use a suffix trie for fast pattern matching?
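One possible answer rests on the observation that a pattern occurs in X exactly when it is a prefix of some suffix of X, so a query only has to walk down from the root. The sketch below is deliberately naive: it builds an uncompressed suffix trie by inserting every suffix, which takes quadratic time and space rather than the O(n) construction mentioned earlier. The function names and the example string are assumptions.

```python
def build_suffix_trie(x):
    """Naive, uncompressed suffix trie: insert every suffix of x."""
    root = {}
    for start in range(len(x)):
        node = root
        for ch in x[start:]:
            node = node.setdefault(ch, {})
    return root

def occurs(root, pattern):
    """True if pattern is a substring of the indexed string."""
    node = root
    for ch in pattern:
        if ch not in node:
            return False  # no suffix starts with this prefix
        node = node[ch]
    return True

suffix_trie = build_suffix_trie("minimize")
```

The query cost depends only on the pattern length, which is the property that makes preprocessing the text worthwhile.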
Binary Search Trees
Ordered Dictionaries
Keys are assumed to come from a total order.
New operations:
– first(): first entry in the dictionary ordering
– last(): last entry in the dictionary ordering
– successors(k): iterator of entries with keys greater than or equal to k, in increasing order
– predecessors(k): iterator of entries with keys less than or equal to k, in decreasing order
Binary Search (Revision)
Binary search can perform operation find(k) on a dictionary implemented by means of an array-based sequence, sorted by key
– at each step, the number of candidate items is halved
– terminates after O(log n) steps
Example: find(7)
[Figure: four steps of the search, showing the positions of the low (l), middle (m) and high (h) indices at each step]
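Binary search over a sorted array can be sketched as follows. The variable names mirror the l, m and h markers from the figure; the array contents are an assumption, since the original figure's values did not survive.

```python
def find(keys, k):
    """Return the index of k in the sorted list keys, or None if absent."""
    low, high = 0, len(keys) - 1
    while low <= high:
        mid = (low + high) // 2
        if k == keys[mid]:
            return mid
        if k < keys[mid]:
            high = mid - 1   # discard the upper half
        else:
            low = mid + 1    # discard the lower half
    return None

# Assumed example data.
keys = [1, 3, 4, 5, 7, 8, 9, 11, 14, 16, 18, 19]
position = find(keys, 7)
```

On this array, find(7) inspects the middle elements 8, 4, 5 and then 7, so the search finishes in four steps.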
Search Tables
A search table is a dictionary implemented by means of a sorted sequence
– We store the items of the dictionary in an array-based sequence, sorted by key
Performance:
– find takes O(log n) time, using binary search
– insert takes O(n) time, since in the worst case we have to shift O(n) items to make room for the new item
– remove takes O(n) time, since in the worst case we have to shift O(n) items to compact the items after the removal
A search table is effective only for dictionaries of small size, or for dictionaries on which searches are the most common operations while insertions and removals are rarely performed (e.g., credit card authorizations).
Binary Search Trees
A binary search tree is a binary tree storing keys (or key-value entries) at its internal nodes and satisfying the following property:
– Let u, v, and w be three nodes such that u is in the left subtree of v and w is in the right subtree of v. We have $key(u) \le key(v) \le key(w)$
External nodes do not store items.
Which traversal of the tree enumerates the keys in increasing order?
Search
To search for a key k, we trace a downward path starting at the root.
The next node visited depends on the outcome of the comparison of k with the key of the current node.
If we reach a leaf, the key is not found and we return notFound.
Example: find(4)
Algorithm TreeSearch(k, v)
    if T.isExternal(v)
        return v
    if k < key(v)
        return TreeSearch(k, T.left(v))
    else if k = key(v)
        return v
    else { k > key(v) }
        return TreeSearch(k, T.right(v))
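The search algorithm can be sketched in Python as below. This version simplifies the slide's model in one respect: `None` plays the role of an external node, so an unsuccessful search returns `None` instead of the leaf that was reached. The `Node` class and the example keys are assumptions.

```python
class Node:
    """Internal node of a binary search tree (hypothetical helper)."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def tree_search(k, v):
    """Return the node storing k in the subtree rooted at v, or None."""
    if v is None:                 # external node reached: k is not present
        return None
    if k < v.key:
        return tree_search(k, v.left)
    elif k == v.key:
        return v
    else:                         # k > v.key
        return tree_search(k, v.right)

# Assumed example tree.
root = Node(6, Node(2, Node(1), Node(4)), Node(9, Node(8)))
```

Searching for 4 follows the path 6, 2, 4; searching for 5 follows the same path and then falls off the tree to the right of 4.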
Insertion
To perform operation insert(k, o), we search for key k (using TreeSearch).
Assume k is not already in the tree, and let w be the leaf reached by the search.
We insert k at node w and expand w into an internal node.
[Figure: example insertion, showing the leaf w before and after it is expanded]
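Insertion can be sketched as a search that creates a new node at the point where it falls off the tree. As a simplification, `None` stands in for an external node, so "expanding w" becomes "replacing `None` with a new node". The `Node` class, the `in_order` helper and the example keys are assumptions.

```python
class Node:
    """Internal node of a binary search tree (hypothetical helper)."""
    def __init__(self, key, value=None):
        self.key, self.value = key, value
        self.left = None   # None plays the role of an external node
        self.right = None

def insert(v, k, value=None):
    """Insert k (assumed absent) below v and return the subtree root."""
    if v is None:                       # reached the leaf w: expand it
        return Node(k, value)
    if k < v.key:
        v.left = insert(v.left, k, value)
    else:
        v.right = insert(v.right, k, value)
    return v

def in_order(v):
    """Keys of the subtree rooted at v, listed by an inorder traversal."""
    return [] if v is None else in_order(v.left) + [v.key] + in_order(v.right)

root = None
for key in [6, 2, 9, 1, 4, 8]:   # assumed example keys
    root = insert(root, key)
root = insert(root, 5)
```

Here the search for 5 ends to the right of 4, so the new node becomes the right child of 4.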
Deletion
To perform operation remove(k), we search for key k.
Assume key k is in the tree, and let v be the node storing k.
If node v has a leaf child w, we remove v and w from the tree with operation removeExternal(w), which removes w and its parent.
[Figure: example removal, showing node v and its leaf child w]
Deletion (cont.)
We consider the case where the key k to be removed is stored at a node v whose children are both internal
– we find the internal node w that follows v in an inorder traversal
– we copy key(w) into node v
– we remove node w and its left child z (which must be a leaf) by means of operation removeExternal(z)
Example: remove 3
Why is this correct?
[Figure: the tree before and after the removal, with nodes v, w and z marked]
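Both deletion cases can be sketched in one function. Again `None` stands in for external nodes, so "v has a leaf child" becomes "v has a `None` child", and removeExternal turns into splicing v out of the tree. The `Node` class, the helpers and the example keys are assumptions.

```python
class Node:
    """Internal node of a binary search tree (hypothetical helper)."""
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(v, k):
    if v is None:
        return Node(k)
    if k < v.key:
        v.left = insert(v.left, k)
    else:
        v.right = insert(v.right, k)
    return v

def remove(v, k):
    """Remove k from the subtree rooted at v; return the new subtree root."""
    if v is None:
        return None
    if k < v.key:
        v.left = remove(v.left, k)
    elif k > v.key:
        v.right = remove(v.right, k)
    elif v.left is None:          # v has an external child: splice v out
        return v.right
    elif v.right is None:
        return v.left
    else:                         # both children are internal
        w = v.right
        while w.left is not None: # w follows v in an inorder traversal
            w = w.left
        v.key = w.key             # copy key(w) into v
        v.right = remove(v.right, w.key)  # w has no left child: easy case
    return v

def in_order(v):
    return [] if v is None else in_order(v.left) + [v.key] + in_order(v.right)

root = None
for key in [1, 3, 2, 8, 6, 9, 5]:   # assumed example keys
    root = insert(root, key)
root = remove(root, 3)
```

The node w is the leftmost node of v's right subtree, so it cannot have an internal left child. Its key is the smallest key larger than key(v), which is why copying it into v preserves the search tree ordering.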
Performance
Consider a dictionary with n items implemented by means of a binary search tree of height h
– the space used is O(n)
– methods find, insert and remove take O(h) time
The height h is O(n) in the worst case and O(log n) in the best case.
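The gap between the worst and best case can be made concrete by inserting the same 15 keys in two different orders. In this sketch the height counts only key-bearing nodes (an empty tree has height -1), which is one less than the height under the slide's convention of explicit external nodes. All names and key sequences are assumptions.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, k):
    """Iterative insertion; returns the root of the tree."""
    new = Node(k)
    if root is None:
        return new
    v = root
    while True:
        if k < v.key:
            if v.left is None:
                v.left = new
                return root
            v = v.left
        else:
            if v.right is None:
                v.right = new
                return root
            v = v.right

def height(v):
    """Height over key-bearing nodes only; an empty tree has height -1."""
    return -1 if v is None else 1 + max(height(v.left), height(v.right))

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

chain = build(range(1, 16))  # sorted input: every node has only a right child
bushy = build([8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15])
```

Sorted input produces a chain of height 14 (linear in n), while the second insertion order produces a perfectly balanced tree of height 3 (logarithmic in n).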
Average Binary Tree Depth
What is the depth of an average binary search tree?
– generated by insertion only, all permutations equally likely: $O(\log n)$ (we can show this using full-history recurrence relations)
– generated by insertion and deletion, very long sequences: $\Theta(\sqrt{n})$ (very hard to show)
J. Culberson, The Effects of Updates in Binary Search Trees, 17th Annual ACM Symposium on Theory of Computing.
[Figure: two binary search trees, captioned "randomly generated, 500 inserts" and "after 250,000 insert/delete pairs"]
The tree degenerates. This is not good, as all operations become more expensive.
Solution: self-adjusting trees that maintain their balance using specialized (more expensive) update operations: next lecture.