Trees (Ch. 9.2) Longin Jan Latecki Temple University based on slides by Simon Langley and Shang-Hua Teng
Basic Data Structures - Trees Informal: a tree is a structure that looks like a real tree (up- side-down) Formal: a tree is a connected graph with no cycles.
Trees - Terminology x bem cda root leaf height=2 size=7 Every node must have its value(s) Non-leaf node has subtree(s) Non-root node has a single parent node value subtree nodes
Types of Tree Binary Tree m-ary Trees Each node has at most 2 sub-trees Each node has at most m sub-trees
Binary Search Trees A binary search tree: … is a binary tree. if a node has value N, all values in its left sub-tree are less than or equal to N, and all values in its right sub-tree are greater than N.
This is NOT a binary search tree
This is a binary search tree
Searching a binary search tree search(t, s) { If(s == label(t)) return t; If(t is leaf) return null If(s < label(t)) search(t’s left tree, s) else search(t’s right tree, s)} h Time per level O(1) Total O(h)
Searching a binary search tree search( t, s ) { while(t != null) { if(s == label(t)) return t; if(s < label(t) t = leftSubTree(t); else t = rightSubTree(t); } return null; h Time per level O(1) Total O(h)
Here’s another function that does the same (we search for label s): TreeSearch(t, s) while (t != NULL and s != label[t]) if (s < label[t]) t = left[t]; else t = right[t]; return t;
Insertion in a binary search tree: we need to search before we insert Time complexity ? Insert Insert O(height_of_tree) O(log n) if it is balanced n = size of the tree always insert to a leaf
Insertion insertInOrder(t, s) { if(t is an empty tree) // insert here return a new tree node with value s else if( s < label(t)) t.left = insertInOrder(t.left, s ) else t.right = insertInOrder(t.right, s) return t }
Comparison – Insertion in an ordered list Insert 6 Time complexity? O(n) n = size of the list insertInOrder(list, s) { loop1: search from beginning of list, look for an item >= s loop2: shift remaining list to its right, start from the end of list insert s } 6789
Try it!! Build binary search trees for the following input sequences 7, 4, 2, 6, 1, 3, 5, 7 7, 1, 2, 3, 4, 5, 6, 7 7, 4, 2, 1, 7, 3, 6, 5 1, 2, 3, 4, 5, 6, 7, 8 8, 7, 6, 5, 4, 3, 2, 1
Suppose we have 3GB character data file that we wish to include in an . Suppose file only contains 26 letters {a,…,z}. Suppose each letter in {a,…,z} occurs with frequency f . Suppose we encode each letter by a binary code If we use a fixed length code, we need 5 bits for each character The resulting message length is Can we do better? Data Compression
Data Compression: A Smaller Example Suppose the file only has 6 letters {a,b,c,d,e,f} with frequencies Fixed length 3G= bits Variable length Fixed length Variable length
How to decode? At first it is not obvious how decoding will happen, but this is possible if we use prefix codes
Prefix Codes No encoding of a character can be the prefix of the longer encoding of another character: We could not encode t as 01 and x as since 01 is a prefix of By using a binary tree representation we generate prefix codes with letters as leaves
Decoding prefix codes Follow the tree until it reaches to a leaf, and then repeat! A message can be decoded uniquely!
Prefix codes allow easy decoding Decode: s sa san 0 sane
Some Properties Prefix codes allow easy decoding An optimal code must be a full binary tree (a tree where every internal node has two children) For C leaves there are C-1 internal nodes The number of bits to encode a file is where f(c) is the freq of c, length T (c) is the tree depth of c, which corresponds to the code length of c
Optimal Prefix Coding Problem Given is a set of n letters (c 1,…, c n ) with frequencies (f 1,…, f n ). Construct a full binary tree T to define a prefix code that minimizes the average code length
Greedy Algorithms Many optimization problems can be solved using a greedy approach The basic principle is that local optimal decisions may be used to build an optimal solution But the greedy approach may not always lead to an optimal solution overall for all problems The key is knowing which problems will work with this approach and which will not We study The problem of generating Huffman codes
Greedy algorithms A greedy algorithm always makes the choice that looks best at the moment My everyday examples: Driving in Los Angeles, NY, or Boston for that matter Playing cards Invest on stocks Choose a university The hope: a locally optimal choice will lead to a globally optimal solution For some problems, it works Greedy algorithms tend to be easier to code
David Huffman’s idea A Term paper at MIT Build the tree (code) bottom-up in a greedy fashion Each tree has a weight in its root and symbols as its leaves. We start with a forest of one vertex trees representing the input symbols. We recursively merge two trees whose sum of weights is minimal until we have only one tree.
Building the Encoding Tree