DAST Tirgul 7
Proofs
Most important thing to do in proofs: state exactly what you prove.
We proved in ex5:
A) Any node in an almost complete tree is a root of an almost complete tree.
B) Any node in a max-heap is a root of a max-heap.
Today we will show the correctness proof of max_heapify.
Correctness Proof of max-heapify: ex5 - q 3c
Claim: If max-heapify is called on a node x which is
inside an almost complete tree,
and whose children are roots of max-heaps,
then the result is:
x becomes the root of a max-heap.
Nodes outside of the subtree of x are unchanged.
We will prove this by induction on the height h of the node x. (Reminder: the height of a node is the length, in edges, of the longest path from it to a leaf.)
Basis: h = 0. x is a leaf; a leaf was and stays a valid heap.
Step: assume the claim holds for every height up to h. We now prove it for h+1.
If the value of x is at least as large as the values of its children, nothing happens; x is already the root of a max-heap, and we are done.
Otherwise, the value of x is smaller than the value of one of its children y and z. Let y be the child with the maximum value, so that x is swapped with y. We need to claim two things:
1. z is the root of a valid heap.
2. The recursive heapify works: after the swap and the call to max-heapify on y, y is the root of a valid heap.
We will then deduce that after the swap and max-heapify, x is the root of a valid heap.
Correctness of heapify cont.
z is the root of a valid heap: by our claim's assumptions, z is a child of x and thus the root of a valid heap, and nothing changed in tree(z) during the swap and the call to heapify on y (by our induction assumption).
After the call to heapify, y is the root of a valid heap: using the claim assumptions on x, x's children y, z are roots of valid heaps. By B), y's children are roots of valid heaps. After the swap, this is still true. When we call heapify on y, the conditions of the claim hold. We can use the induction assumption since the height of y is smaller than that of x. Therefore, after heapify on y, y is the root of a valid heap.
x is now the root of a heap because we swapped the original value of x with the maximum of its children, and by the claim assumption, these children were roots of heaps.
Finally, nothing outside the subtree of x changed: heapify on y did not change anything outside tree(y), and the only other thing that changed is the swap, which is inside tree(x).
Correctness of heapify - detailed
z is the root of a valid heap by our claim assumptions: z is a child of x and thus the root of a valid heap, and nothing changed in tree(z) during the swap and during the call to max_heapify on y (by our induction assumption).
After the call to max_heapify, y is the root of a valid heap, for the following reason. From the requirement on x, x's children y and z are roots of valid heaps. Therefore, before the swap, y is the root of a max-heap. By B), y's children are therefore roots of valid max-heaps. After the swap, this is still true. When we call max_heapify on y, the conditions of the claim hold. The height of y is smaller by at least 1 than that of x, and therefore by the induction assumption, after max_heapify on y, y is the root of a max-heap.
We now know that after the swap and max_heapify on y, z and y are roots of max-heaps. To deduce that x is now the root of a max-heap, we just need to show that the value in x is now at least as large as the value at every node in tree(z) and tree(y). This holds because we swapped the original value of x with the maximum of its children, and by the claim assumption, these children were roots of max-heaps and thus each held the maximum of all values in its tree. So now x contains the maximum value in tree(x), and in particular its value is at least as large as the values of its children.
Finally, we need to prove that nothing outside the subtree of x changed. This again can be proved by the same induction on h: max_heapify on y did not change anything outside tree(y), and the only other thing that changed is the swap, which is inside tree(x). QED
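The claim can also be checked on a small example. The following is a minimal sketch, assuming the heap is stored in a 0-indexed Python list where the children of index i are 2i+1 and 2i+2; this array layout and the helper is_max_heap are assumptions made for illustration, not something fixed by the slides.

```python
def max_heapify(a, i):
    """Assumes both child subtrees of index i are max-heaps (the claim's precondition)."""
    n = len(a)
    left, right = 2 * i + 1, 2 * i + 2
    largest = i
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        # Swap x with its maximum child y, then recurse on y (smaller height).
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest)


def is_max_heap(a, i=0):
    """Hypothetical checker: True if the subtree rooted at index i is a max-heap."""
    for child in (2 * i + 1, 2 * i + 2):
        if child < len(a) and (a[child] > a[i] or not is_max_heap(a, child)):
            return False
    return True


# Index 1 violates the heap property, but both of its children root max-heaps.
heap = [20, 1, 15, 8, 7, 9, 3, 2, 4]
before = list(heap)
max_heapify(heap, 1)
print(heap)  # [20, 8, 15, 4, 7, 9, 3, 2, 1]
```

Indices 0, 2, 5 and 6 lie outside the subtree of index 1, and they hold the same values before and after the call, which matches the second part of the claim.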
Induction assumptions
Induction assumptions are crucial. It is very common that the assumptions are simple properties. On the other hand, it is sometimes useful to use a more general assumption in the induction (e.g. in build-heap we demanded that at step j all nodes up to n-j are valid heaps).
What is a Binary Search Tree?
The keys in a binary search tree (BST) are always stored in such a way as to satisfy the search property:
Let x be a node in a BST.
If y is a node in the left subtree of x, then key[y] ≤ key[x].
If y is a node in the right subtree of x, then key[x] < key[y].
Different BSTs can represent the same set of values.
The worst-case running time for most operations on a search tree is proportional to the height of the tree.
The BST can be unbalanced
[Figure: two BSTs containing the same keys (keys shown: 2, 3, 5, 7, 8).]
A BST on 6 nodes with height 2.
A less efficient BST with height 4 that contains the same keys.
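To make the search property and the effect of shape concrete, here is a small sketch that builds two trees on the same keys and compares their heights. The Node class, the helper names, the duplicate key 5, and the exact shapes of the two trees are assumptions chosen to match the captions; the original figure did not survive extraction.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def is_bst(node, low=None, high=None):
    """Check the search property: left subtree keys <= key < right subtree keys.

    low is an exclusive lower bound and high an inclusive upper bound.
    """
    if node is None:
        return True
    if low is not None and not low < node.key:
        return False
    if high is not None and not node.key <= high:
        return False
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)


def height(node):
    """Height in edges; an empty tree is given height -1."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


# Assumed shapes: a balanced tree and a mostly right-leaning one on the same keys.
balanced = Node(5, Node(3, Node(2), Node(5)), Node(7, None, Node(8)))
skewed = Node(2, None, Node(3, None, Node(7, Node(5, Node(5)), Node(8))))

print(height(balanced), height(skewed))  # 2 4
```

Both trees pass is_bst and give the same in-order sequence, so they represent the same keys even though their heights differ.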
BST-SEARCH
TREE-SEARCH(x, k)
    if x = NIL or k = key[x]
        then return x
    if k < key[x]
        then return TREE-SEARCH(left[x], k)
        else return TREE-SEARCH(right[x], k)
Time: O(H), where H is the height of the tree.
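A direct Python transcription of TREE-SEARCH might look as follows. The Node class and the example tree are assumptions for illustration; NIL is represented by None.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right


def tree_search(x, k):
    """Return a node with key k in the subtree rooted at x, or None if absent."""
    if x is None or k == x.key:
        return x
    if k < x.key:
        return tree_search(x.left, k)
    return tree_search(x.right, k)


root = Node(5, Node(3, Node(2), Node(5)), Node(7, None, Node(8)))
found = tree_search(root, 8)
missing = tree_search(root, 6)
print(found.key, missing)  # 8 None
```

Each call descends one level, so the number of calls is at most the height of the tree plus one.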
Successor
TREE-SUCCESSOR(x)
    if right[x] ≠ NIL
        then return TREE-MINIMUM(right[x])
    y ← p[x]
    while y ≠ NIL and x = right[y]
        do x ← y
           y ← p[y]
    return y
Time is O(H), where H is the height of the tree: we either follow a path up the tree or follow a path down the tree.
Predecessor
Note: the structure of a BST allows us to determine the successor of a node without ever comparing keys.
The procedure TREE-PREDECESSOR is symmetric to TREE-SUCCESSOR and also runs in time O(H), where H is the height of the tree.
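The successor and predecessor procedures can be sketched in Python as below. The Node class with a parent pointer and the set_children helper are assumptions for illustration; note that neither procedure compares keys.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def set_children(node, left=None, right=None):
    """Hypothetical helper: attach children and set their parent pointers."""
    node.left, node.right = left, right
    for child in (left, right):
        if child is not None:
            child.parent = node
    return node


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_maximum(x):
    while x.right is not None:
        x = x.right
    return x


def tree_successor(x):
    if x.right is not None:
        return tree_minimum(x.right)
    y = x.parent
    # Climb while we arrive from a right child.
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y


def tree_predecessor(x):
    # Mirror image of tree_successor.
    if x.left is not None:
        return tree_maximum(x.left)
    y = x.parent
    while y is not None and x is y.left:
        x, y = y, y.parent
    return y


n2, n3, n5a, n5b, n7, n8 = (Node(k) for k in (2, 3, 5, 5, 7, 8))
root = set_children(n5a, set_children(n3, n2, n5b), set_children(n7, None, n8))
```

In this example tree, the successor of the duplicate 5 stored under 3 is the root, found purely by walking up parent pointers.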
Applications
We store in the tree complex objects, having a comparison key and an additional value:
Dictionary: Key = a word, Value = synonyms
Yellow pages: Key = name, Value = phone number
Word statistics: Key = word, Value = set of words it appeared after
Order Statistics
The i'th order statistic of a set of n elements is the i'th smallest element. i is sometimes called the rank of the element.
The minimum of a set of elements is the first order statistic (i = 1).
The maximum is the n'th order statistic (i = n).
A median is the "halfway point" of the set (i = floor((n+1)/2) for the lower median).
Order Statistics
How can you find the i'th order statistic in a sorted array? Time complexity?
Find the i'th order statistic in a BST.
Naive solution: find the minimum, and call successor i-1 times.
Worst case: O(n)
Can we do better?
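The naive solution can be sketched as follows: start at the minimum and step to the successor until the i'th element is reached. The Node class, the tree_insert helper used to build the example, and the function name naive_os_find are assumptions for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


def tree_insert(root, key):
    """Hypothetical builder: keys <= the current key go left, larger keys go right."""
    node = Node(key)
    if root is None:
        return node
    cur = root
    while True:
        side = "left" if key <= cur.key else "right"
        nxt = getattr(cur, side)
        if nxt is None:
            setattr(cur, side, node)
            node.parent = cur
            return root
        cur = nxt


def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x


def tree_successor(x):
    if x.right is not None:
        return tree_minimum(x.right)
    y = x.parent
    while y is not None and x is y.right:
        x, y = y, y.parent
    return y


def naive_os_find(root, i):
    """i'th smallest key (1-indexed); assumes 1 <= i <= number of nodes."""
    x = tree_minimum(root)
    for _ in range(i - 1):
        x = tree_successor(x)
    return x.key


root = None
for k in [5, 3, 7, 2, 5, 8]:
    root = tree_insert(root, k)
ranks = [naive_os_find(root, i) for i in range(1, 7)]
print(ranks)  # [2, 3, 5, 5, 7, 8]
```

For i close to n the walk visits almost every node, which is where the O(n) worst case comes from.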
Finding Order Statistics Take One
Design a BST such that each node holds its rank in the BST.
How to find the i'th element? Time: O(h)
What happens if we insert a new element? And delete?...
Figure (nodes labeled key/rank): 5/4, 7/5, 3/2, 5/3, 2/1, 8/6
Finding Order Statistics Take Two
Design a BST such that each node holds the size of the subtree rooted at it.
When searching for the i'th element, each node checks the size stored at its left child.
Did we improve the time of insert and delete compared to take one?
Can you think of a simpler solution?
Figure (nodes labeled key/subtree size): 5/6, 7/2, 3/3, 5/1, 2/1, 8/1
Order statistics pseudo code
Os-find(root, j)
    left_size = 0
    if has_left(root)
        left_size = size(left(root))
    if left_size >= j
        return Os-find(left(root), j)
    else if left_size = j-1
        return value(root)
    else
        return Os-find(right(root), j - left_size - 1)
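A runnable version of Os-find might look like this. It is a sketch under the assumption that each node stores the size of its subtree, computed here once at construction time; the Node class and the size helper are illustrative names, and maintaining sizes under insert and delete is not shown.

```python
def size(node):
    """Size of the subtree rooted at node; an empty subtree has size 0."""
    return 0 if node is None else node.size


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)


def os_find(root, j):
    """Return the j'th smallest key; assumes 1 <= j <= size(root)."""
    left_size = size(root.left)
    if left_size >= j:
        return os_find(root.left, j)
    if left_size == j - 1:
        return root.key
    # Skip the left subtree and the root itself.
    return os_find(root.right, j - left_size - 1)


# Same keys and sizes as the key/size figure: 5/6, 3/3, 7/2, 2/1, 5/1, 8/1.
root = Node(5, Node(3, Node(2), Node(5)), Node(7, None, Node(8)))
ordered = [os_find(root, j) for j in range(1, root.size + 1)]
print(ordered)  # [2, 3, 5, 5, 7, 8]
```

Each call moves one level down, so the running time is proportional to the height of the tree.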
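More things to do with size
Actually, it is enough for each node to hold the size of its left subtree.
Given a and b, how can we find the number of elements x such that a ≤ x ≤ b? Try recursion…
Figure (nodes labeled key/value; the values match the size of the left subtree plus one for the node itself): 5/4, 7/1, 3/2, 5/1, 2/1, 8/1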
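One possible recursive approach to the range question, offered as a sketch rather than the intended answer: count the keys that are at most b, count the keys that are smaller than a, and subtract. The Node class and the function names are assumptions; the code stores full subtree sizes for convenience but only ever reads the size of a left subtree.

```python
def size(node):
    return 0 if node is None else node.size


class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)


def count_le(node, b):
    """Number of keys <= b in the subtree rooted at node."""
    if node is None:
        return 0
    if node.key <= b:
        # The whole left subtree is <= node.key <= b, so it is counted at once.
        return size(node.left) + 1 + count_le(node.right, b)
    return count_le(node.left, b)


def count_lt(node, a):
    """Number of keys < a in the subtree rooted at node."""
    if node is None:
        return 0
    if node.key < a:
        return size(node.left) + 1 + count_lt(node.right, a)
    # Keys in the right subtree are > node.key >= a, so none of them count.
    return count_lt(node.left, a)


def count_in_range(root, a, b):
    """Number of keys x with a <= x <= b; assumes a <= b."""
    return count_le(root, b) - count_lt(root, a)


root = Node(5, Node(3, Node(2), Node(5)), Node(7, None, Node(8)))
print(count_in_range(root, 3, 7))  # 4
```

Both helper functions follow a single root-to-leaf path, so the whole query takes time proportional to the height of the tree.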