CS 332: Algorithms: Review for Midterm
David Luebke, 10/25/2015

Administrivia
- Reminder: Midterm Thursday, Oct 26
  - One 8.5x11 crib sheet allowed
    - Both sides; mechanical reproduction okay
    - You will turn it in with the exam
- Reminder: Exercise 1 due today in class
  - I'll try to provide feedback over the break
- Homework 3
  - I'll try to have this graded and in the CS332 box by tomorrow afternoon

Review of Topics
- Asymptotic notation
- Solving recurrences
- Sorting algorithms
  - Insertion sort
  - Merge sort
  - Heap sort
  - Quick sort
  - Counting sort
  - Radix sort

Review of Topics
- Medians and order statistics
- Structures for dynamic sets
  - Priority queues
  - Binary search trees
  - Red-black trees
  - Skip lists
  - Hash tables

Review of Topics
- Augmenting data structures
  - Order-statistic trees
  - Interval trees

Review: Induction
- Suppose
  - S(k) is true for a fixed constant k (often k = 0)
  - S(n) implies S(n+1) for all n >= k
- Then S(n) is true for all n >= k

Proof by Induction
- Claim: S(n) is true for all n >= k
- Basis:
  - Show the formula is true when n = k
- Inductive hypothesis:
  - Assume the formula is true for an arbitrary n
- Step:
  - Show that the formula is then true for n+1

Induction Example: Gaussian Closed Form
- Prove 1 + 2 + 3 + ... + n = n(n+1)/2
  - Basis: if n = 0, then 0 = 0(0+1)/2
  - Inductive hypothesis: assume 1 + 2 + 3 + ... + n = n(n+1)/2
  - Step (show true for n+1):
    1 + 2 + ... + n + (n+1) = (1 + 2 + ... + n) + (n+1)
                            = n(n+1)/2 + (n+1)
                            = [n(n+1) + 2(n+1)]/2
                            = (n+1)(n+2)/2
                            = (n+1)((n+1) + 1)/2
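The closed form above is easy to sanity-check numerically. A minimal Python sketch (the function name is my own, not from the slides):

```python
def gauss_sum(n):
    # Closed form for 1 + 2 + ... + n
    return n * (n + 1) // 2

# Compare against the literal sum for a range of n
assert all(gauss_sum(n) == sum(range(1, n + 1)) for n in range(100))
```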

Induction Example: Geometric Closed Form
- Prove a^0 + a^1 + ... + a^n = (a^(n+1) - 1)/(a - 1) for all a != 1
  - Basis: show that a^0 = (a^(0+1) - 1)/(a - 1)
    a^0 = 1 = (a^1 - 1)/(a - 1)
  - Inductive hypothesis: assume a^0 + a^1 + ... + a^n = (a^(n+1) - 1)/(a - 1)
  - Step (show true for n+1):
    a^0 + a^1 + ... + a^(n+1) = (a^0 + a^1 + ... + a^n) + a^(n+1)
                              = (a^(n+1) - 1)/(a - 1) + a^(n+1)
                              = (a^(n+1) - 1 + a^(n+2) - a^(n+1))/(a - 1)
                              = (a^(n+2) - 1)/(a - 1)

Review: Analyzing Algorithms
- We are interested in asymptotic analysis:
  - Behavior of algorithms as problem size gets large
  - Constants and low-order terms don't matter

An Example: Insertion Sort

InsertionSort(A, n) {
  for i = 2 to n {
    key = A[i]
    j = i - 1
    while (j > 0) and (A[j] > key) {
      A[j+1] = A[j]
      j = j - 1
    }
    A[j+1] = key
  }
}

The original slides step through this code one animation frame at a time on the array A = [30, 10, 40, 20] (indices 1..4):
- i = 2: key = 10 is shifted past 30, giving [10, 30, 40, 20]
- i = 3: key = 40 is already in place; the while loop never executes
- i = 4: key = 20 is shifted past 40 and 30, giving [10, 20, 30, 40]. Done!
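The pseudocode above translates almost line for line into runnable Python; this sketch uses 0-based indexing (an adaptation, since the slides index from 1):

```python
def insertion_sort(a):
    """Insertion sort, mirroring the slide pseudocode (0-indexed)."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]   # shift larger elements one slot right
            j -= 1
        a[j + 1] = key        # drop key into the opened hole
    return a

assert insertion_sort([30, 10, 40, 20]) == [10, 20, 30, 40]
```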

Insertion Sort: Counting Statement Executions

InsertionSort(A, n) {                     Effort
  for i = 2 to n {                        c1 * n
    key = A[i]                            c2 * (n-1)
    j = i - 1                             c3 * (n-1)
    while (j > 0) and (A[j] > key) {      c4 * T
      A[j+1] = A[j]                       c5 * (T - (n-1))
      j = j - 1                           c6 * (T - (n-1))
    }
    A[j+1] = key                          c7 * (n-1)
  }
}

where T = t_2 + t_3 + ... + t_n, and t_i is the number of while-test evaluations for the i-th for-loop iteration.

Analyzing Insertion Sort
- T(n) = c1*n + c2*(n-1) + c3*(n-1) + c4*T + c5*(T - (n-1)) + c6*(T - (n-1)) + c7*(n-1)
       = c8*T + c9*n + c10
- What can T be?
  - Best case: the inner loop body never executes
    - t_i = 1, so T(n) is a linear function
  - Worst case: the inner loop body executes for all previous elements
    - t_i = i, so T(n) is a quadratic function
  - If T is a quadratic function, which terms in the above equation matter?

Upper Bound Notation
- We say InsertionSort's run time is O(n^2)
  - Properly we should say the run time is in O(n^2)
  - Read O as "Big-O" (you'll also hear it called "order")
- In general, a function f(n) is O(g(n)) if there exist positive constants c and n0 such that f(n) <= c * g(n) for all n >= n0
- Formally:
  O(g(n)) = { f(n) : there exist positive constants c and n0 such that f(n) <= c * g(n) for all n >= n0 }

Big O Fact
- A polynomial of degree k is O(n^k)
- Proof:
  - Suppose f(n) = b_k n^k + b_(k-1) n^(k-1) + ... + b_1 n + b_0
    - Let a_i = |b_i|
  - Then f(n) <= a_k n^k + a_(k-1) n^(k-1) + ... + a_1 n + a_0
              <= n^k (a_k + a_(k-1) + ... + a_0) = c * n^k for all n >= 1

Lower Bound Notation
- We say InsertionSort's run time is Ω(n)
- In general, a function f(n) is Ω(g(n)) if there exist positive constants c and n0 such that 0 <= c * g(n) <= f(n) for all n >= n0

Asymptotic Tight Bound
- A function f(n) is Θ(g(n)) if there exist positive constants c1, c2, and n0 such that c1 * g(n) <= f(n) <= c2 * g(n) for all n >= n0

Other Asymptotic Notations
- A function f(n) is o(g(n)) if for every positive constant c there exists n0 such that f(n) < c * g(n) for all n >= n0
- A function f(n) is ω(g(n)) if for every positive constant c there exists n0 such that c * g(n) < f(n) for all n >= n0
- Intuitively:
  - o() is like <
  - O() is like <=
  - ω() is like >
  - Ω() is like >=
  - Θ() is like =

Review: Recurrences
- Recurrence: an equation that describes a function in terms of its value on smaller inputs

Review: Solving Recurrences
- Substitution method
- Iteration method
- Master method

Review: Substitution Method
- Substitution method:
  - Guess the form of the answer, then use induction to find the constants and show the solution works
  - Example:
    - T(n) = 2T(n/2) + Θ(n)  ⇒  T(n) = Θ(n lg n)
    - T(n) = 2T(⌊n/2⌋) + n   ⇒  ???

Review: Substitution Method
- Substitution method:
  - Guess the form of the answer, then use induction to find the constants and show the solution works
  - Examples:
    - T(n) = 2T(n/2) + Θ(n)  ⇒  T(n) = Θ(n lg n)
    - T(n) = 2T(⌊n/2⌋) + n   ⇒  T(n) = Θ(n lg n)
  - We can show that this holds by induction

Substitution Method
- Our goal: show that T(n) = 2T(⌊n/2⌋) + n = O(n lg n)
- Thus, we need to show that T(n) <= c n lg n with an appropriate choice of c
  - Inductive hypothesis: assume T(⌊n/2⌋) <= c ⌊n/2⌋ lg ⌊n/2⌋
  - Substitute back into the recurrence to show that T(n) <= c n lg n follows when c >= 1 (shown on board)
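The claimed bound can be spot-checked by evaluating the recurrence directly. This sketch assumes a base case T(1) = 0 (the slides leave the base case implicit) and checks T(n) <= c n lg n with c = 1:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # T(n) = 2*T(floor(n/2)) + n, with an assumed base case T(1) = 0
    return 0 if n <= 1 else 2 * T(n // 2) + n

# Verify T(n) <= 1 * n * lg(n) over a range of n
assert all(T(n) <= n * math.log2(n) for n in range(2, 1000))
```

For powers of two the bound is tight: T(2^k) = k * 2^k exactly.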

Review: Iteration Method
- Iteration method:
  - Expand the recurrence k times
  - Work some algebra to express it as a summation
  - Evaluate the summation

Review: Iteration Method
- s(n) = c + s(n-1)
       = c + c + s(n-2)
       = 2c + s(n-2)
       = 2c + c + s(n-3)
       = 3c + s(n-3)
       = ...
       = kc + s(n-k)

Review: Iteration Method
- So far, for n >= k we have
  - s(n) = ck + s(n-k)
- What if k = n?
  - s(n) = cn + s(0) = cn

Review: Iteration Method
- T(n) = 2T(n/2) + c
       = 2(2T(n/2^2) + c) + c
       = 2^2 T(n/2^2) + 2c + c
       = 2^2 (2T(n/2^3) + c) + 3c
       = 2^3 T(n/2^3) + 4c + 3c
       = 2^3 T(n/2^3) + 7c
       = 2^3 (2T(n/2^4) + c) + 7c
       = 2^4 T(n/2^4) + 15c
       = ...
       = 2^k T(n/2^k) + (2^k - 1)c

Review: Iteration Method
- So far, for n >= 2^k we have
  - T(n) = 2^k T(n/2^k) + (2^k - 1)c
- What if k = lg n?
  - T(n) = 2^(lg n) T(n/2^(lg n)) + (2^(lg n) - 1)c
         = n T(n/n) + (n - 1)c
         = n T(1) + (n - 1)c
         = nc + (n - 1)c
         = (2n - 1)c
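The closed form (2n - 1)c can be checked against the recurrence for powers of two. This sketch assumes T(1) = c, which is what the last line of the derivation above uses:

```python
def T(n, c=1):
    # T(n) = 2*T(n/2) + c, with T(1) = c (assumed base case), n a power of two
    return c if n == 1 else 2 * T(n // 2, c) + c

# Closed form: T(n) = (2n - 1)c
assert all(T(2 ** k) == 2 * 2 ** k - 1 for k in range(12))
```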

Review: The Master Theorem
- Given: a divide-and-conquer algorithm
  - An algorithm that divides a problem of size n into a subproblems, each of size n/b
  - Let the cost of each stage (i.e., the work to divide the problem plus combine the solved subproblems) be described by the function f(n)
- Then the Master Theorem gives us a cookbook for the algorithm's running time:

Review: The Master Theorem
- If T(n) = aT(n/b) + f(n), then:
  - T(n) = Θ(n^(log_b a))         if f(n) = O(n^(log_b a - ε)) for some ε > 0
  - T(n) = Θ(n^(log_b a) lg n)    if f(n) = Θ(n^(log_b a))
  - T(n) = Θ(f(n))                if f(n) = Ω(n^(log_b a + ε)) for some ε > 0, and a f(n/b) <= c f(n) for some c < 1 and all sufficiently large n

Review: Merge Sort

MergeSort(A, left, right) {
  if (left < right) {
    mid = floor((left + right) / 2);
    MergeSort(A, left, mid);
    MergeSort(A, mid+1, right);
    Merge(A, left, mid, right);
  }
}

// Merge() takes two sorted subarrays of A and
// merges them into a single sorted subarray of A.
// Merge() takes O(n) time, n = length of A
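A runnable Python sketch of the routine above, with the Merge() step written inline (0-indexed, an adaptation of the slide's 1-indexed pseudocode):

```python
def merge_sort(a, left, right):
    """Sort a[left..right] in place, mirroring the MergeSort pseudocode."""
    if left < right:
        mid = (left + right) // 2
        merge_sort(a, left, mid)
        merge_sort(a, mid + 1, right)
        # Merge the two sorted halves in O(n) time
        merged, i, j = [], left, mid + 1
        while i <= mid and j <= right:
            if a[i] <= a[j]:
                merged.append(a[i]); i += 1
            else:
                merged.append(a[j]); j += 1
        merged.extend(a[i:mid + 1])
        merged.extend(a[j:right + 1])
        a[left:right + 1] = merged

xs = [5, 2, 4, 7, 1, 3, 2, 6]
merge_sort(xs, 0, len(xs) - 1)
assert xs == [1, 2, 2, 3, 4, 5, 6, 7]
```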

Review: Analysis of Merge Sort

MergeSort(A, left, right) {              T(n)
  if (left < right) {                    Θ(1)
    mid = floor((left + right) / 2);     Θ(1)
    MergeSort(A, left, mid);             T(n/2)
    MergeSort(A, mid+1, right);          T(n/2)
    Merge(A, left, mid, right);          Θ(n)
  }
}

- So T(n) = Θ(1) when n = 1, and 2T(n/2) + Θ(n) when n > 1
- Solving this recurrence (how?) gives T(n) = Θ(n lg n)

Review: Heaps
- A heap is a "complete" binary tree, usually represented as an array:
  A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
  (16 at the root with children 14 and 10; 14 has children 8 and 7; 10 has children 9 and 3; 8 has children 2 and 4; 7 has child 1)

Review: Heaps
- To represent a heap as an array:

Parent(i) { return ⌊i/2⌋; }
Left(i)   { return 2*i; }
Right(i)  { return 2*i + 1; }

Review: The Heap Property
- Heaps also satisfy the heap property:
  A[Parent(i)] >= A[i] for all nodes i > 1
  - In other words, the value of a node is at most the value of its parent
  - The largest value is thus stored at the root, A[1]
- Because the heap is a complete binary tree, the height of any node is at most O(lg n)

Review: Heapify()
- Heapify(): maintain the heap property
  - Given: a node i in the heap with children l and r
  - Given: two subtrees rooted at l and r, assumed to be heaps
  - Action: let the value of the parent node "float down" so the subtree at i satisfies the heap property
    - If A[i] < A[l] or A[i] < A[r], swap A[i] with the largest of A[l] and A[r]
    - Recurse on that subtree
  - Running time: O(h), h = height of the heap = O(lg n)

Review: BuildHeap()
- BuildHeap(): build a heap bottom-up by running Heapify() on successive subarrays
  - Walk backwards through the array from ⌊n/2⌋ to 1, calling Heapify() on each node
  - The order of processing guarantees that the children of node i are heaps when i is processed
- Easy to show that the running time is O(n lg n)
- Can be shown to be O(n)
  - Key observation: most subheaps are small

Review: Heapsort()
- Heapsort(): an in-place sorting algorithm:
  - The maximum element is at A[1]
  - Discard it by swapping with the element at A[n]
    - Decrement heap_size[A]
    - A[n] now contains the correct value
  - Restore the heap property at A[1] by calling Heapify()
  - Repeat, always swapping A[1] with A[heap_size(A)]
- Running time: O(n lg n)
  - BuildHeap: O(n); Heapify: n * O(lg n)
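The Heapify/BuildHeap/Heapsort pipeline described above can be sketched in Python. Note the 0-indexed layout (children of i are 2i+1 and 2i+2), which differs from the slides' 1-indexed array:

```python
def heapify(a, i, heap_size):
    # Float a[i] down until the subtree rooted at i is a max-heap
    while True:
        l, r, largest = 2 * i + 1, 2 * i + 2, i
        if l < heap_size and a[l] > a[largest]:
            largest = l
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heapsort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # BuildHeap: O(n)
        heapify(a, i, n)
    for end in range(n - 1, 0, -1):       # repeatedly extract the max
        a[0], a[end] = a[end], a[0]
        heapify(a, 0, end)
    return a

assert heapsort([16, 4, 10, 14, 7, 9, 3, 2, 8, 1]) == [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```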

Review: Priority Queues
- The heap data structure is often used for implementing priority queues
  - A data structure for maintaining a set S of elements, each with an associated value or key
  - Supports the operations Insert(), Maximum(), and ExtractMax()
  - Commonly used for scheduling and event simulation

Priority Queue Operations
- Insert(S, x): inserts the element x into set S
- Maximum(S): returns the element of S with the maximum key
- ExtractMax(S): removes and returns the element of S with the maximum key

Implementing Priority Queues

HeapInsert(A, key)   // what's the running time?
{
  heap_size[A]++;
  i = heap_size[A];
  while (i > 1 AND A[Parent(i)] < key)
  {
    A[i] = A[Parent(i)];
    i = Parent(i);
  }
  A[i] = key;
}

Implementing Priority Queues

HeapMaximum(A)
{
  // This one is really tricky:
  return A[1];
}

Implementing Priority Queues

HeapExtractMax(A)
{
  if (heap_size[A] < 1) { error; }
  max = A[1];
  A[1] = A[heap_size[A]];
  heap_size[A]--;
  Heapify(A, 1);
  return max;
}

Example: Combat Billiards
- Extract the next collision C_i from the queue
- Advance the system to the time T_i of the collision
- Recompute the next collision(s) for the ball(s) involved
- Insert the collision(s) into the queue, using the time of occurrence as the key
- Find the next overall collision C_(i+1) and repeat

Review: Quicksort
- Quicksort pros:
  - Sorts in place
  - Sorts in O(n lg n) in the average case
  - Very efficient in practice
- Quicksort cons:
  - Sorts in O(n^2) in the worst case
  - Naive implementation: worst case on sorted input
  - Even picking a different pivot, some particular input will take O(n^2) time

Review: Quicksort
- Another divide-and-conquer algorithm
  - The array A[p..r] is partitioned into two non-empty subarrays A[p..q] and A[q+1..r]
    - Invariant: all elements in A[p..q] are less than all elements in A[q+1..r]
  - The subarrays are recursively quicksorted
  - No combining step: the two sorted subarrays already form a sorted array

Review: Quicksort Code

Quicksort(A, p, r)
{
  if (p < r)
  {
    q = Partition(A, p, r);
    Quicksort(A, p, q);
    Quicksort(A, q+1, r);
  }
}

Review: Partition Code

Partition(A, p, r)
{
  x = A[p];
  i = p - 1;
  j = r + 1;
  while (TRUE) {
    repeat j--; until A[j] <= x;
    repeat i++; until A[i] >= x;
    if (i < j)
      Swap(A, i, j);
    else
      return j;
  }
}

Partition() runs in O(n) time
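The Hoare-style partition above, together with the Quicksort driver from the previous slide, can be sketched in runnable Python (0-indexed; the repeat/until loops become decrement-then-test while loops):

```python
def partition(a, p, r):
    """Hoare-style partition around pivot x = a[p]; returns the split index q."""
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:       # repeat j--; until a[j] <= x
            j -= 1
        i += 1
        while a[i] < x:       # repeat i++; until a[i] >= x
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j

def quicksort(a, p, r):
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q)    # note: q, not q-1, with this partition scheme
        quicksort(a, q + 1, r)

xs = [5, 3, 8, 1, 9, 2, 7]
quicksort(xs, 0, len(xs) - 1)
assert xs == [1, 2, 3, 5, 7, 8, 9]
```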

Review: Analyzing Quicksort
- What is the worst case for the algorithm?
  - Partition is always unbalanced
- What is the best case for the algorithm?
  - Partition is perfectly balanced
- Which is more likely?
  - The latter, by far, except...
- Will any particular input elicit the worst case?
  - Yes: already-sorted input

Review: Analyzing Quicksort
- In the worst case:
  T(1) = Θ(1)
  T(n) = T(n-1) + Θ(n)
- Works out to T(n) = Θ(n^2)

Review: Analyzing Quicksort
- In the best case:
  T(n) = 2T(n/2) + Θ(n)
- Works out to T(n) = Θ(n lg n)

Review: Analyzing Quicksort
- The average case works out to T(n) = Θ(n lg n)
- Glance over the proof (lecture 6), but you won't have to know the details
- Key idea: analyze the running time based on the expected split caused by Partition()

Review: Improving Quicksort
- The real liability of quicksort is that it runs in O(n^2) on already-sorted input
- The book discusses two solutions:
  - Randomize the input array, OR
  - Pick a random pivot element
- How do these solve the problem?
  - By ensuring that no particular input can be chosen to make quicksort run in O(n^2) time

Sorting Summary
- Insertion sort:
  - Easy to code
  - Fast on small inputs (fewer than ~50 elements)
  - Fast on nearly-sorted inputs
  - O(n^2) worst case
  - O(n^2) average (equally-likely inputs) case
  - O(n^2) reverse-sorted case

Sorting Summary
- Merge sort:
  - Divide-and-conquer:
    - Split the array in half
    - Recursively sort the subarrays
    - Linear-time merge step
  - O(n lg n) worst case
  - Doesn't sort in place

Sorting Summary
- Heap sort:
  - Uses the very useful heap data structure
    - Complete binary tree
    - Heap property: parent key >= children's keys
  - O(n lg n) worst case
  - Sorts in place
  - Fair amount of shuffling memory around

Sorting Summary
- Quick sort:
  - Divide-and-conquer:
    - Partition the array into two subarrays, recursively sort
    - All of the first subarray < all of the second subarray
    - No merge step needed!
  - O(n lg n) average case
  - Fast in practice
  - O(n^2) worst case
    - Naive implementation: worst case on sorted input
    - Address this with randomized quicksort

Review: Comparison Sorts
- Comparison sorts: Ω(n lg n) at best
  - Model the sort with a decision tree
  - Path down the tree = execution trace of the algorithm
  - Leaves of the tree = possible permutations of the input
  - The tree must have n! leaves, so its height is Ω(lg n!) = Ω(n lg n)

Review: Counting Sort
- Counting sort:
  - Assumption: input is in the range 1..k
  - Basic idea:
    - Count the number of elements <= each element i
    - Use that count to place i in its position in the sorted array
  - No comparisons! Runs in time O(n + k)
  - Stable sort
  - Does not sort in place:
    - O(n) array to hold the sorted output
    - O(k) array for scratch storage

Review: Counting Sort

CountingSort(A, B, k)
  for i = 1 to k
    C[i] = 0;
  for j = 1 to n
    C[A[j]] += 1;
  for i = 2 to k
    C[i] = C[i] + C[i-1];
  for j = n downto 1
    B[C[A[j]]] = A[j];
    C[A[j]] -= 1;
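A runnable Python sketch of the pseudocode above (the output array B is 0-indexed here, an adaptation of the slides' 1-indexed arrays):

```python
def counting_sort(A, k):
    """Stable counting sort of A, whose values lie in 1..k as assumed above."""
    n = len(A)
    C = [0] * (k + 1)
    for x in A:                 # C[i] = number of elements equal to i
        C[x] += 1
    for i in range(2, k + 1):   # C[i] = number of elements <= i
        C[i] += C[i - 1]
    B = [0] * n
    for x in reversed(A):       # walk backwards so equal keys keep their order
        B[C[x] - 1] = x
        C[x] -= 1
    return B

assert counting_sort([4, 1, 3, 4, 2, 1], 4) == [1, 1, 2, 3, 4, 4]
```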

Review: Radix Sort
- Radix sort:
  - Assumption: input has d digits, each ranging from 0 to k
  - Basic idea:
    - Sort elements by digit, starting with the least significant
    - Use a stable sort (like counting sort) for each stage
  - Each pass over n numbers with d digits takes time O(n + k), so the total time is O(dn + dk)
    - When d is constant and k = O(n), this takes O(n) time
  - Fast! Stable! Simple!
  - Doesn't sort in place
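A minimal sketch of least-significant-digit radix sort; buckets stand in for the stable counting-sort pass described above (appending to a bucket preserves the order of equal digits, which is what makes each pass stable):

```python
def radix_sort(A, d, base=10):
    """LSD radix sort: d passes of a stable bucket pass, one per digit."""
    for p in range(d):
        buckets = [[] for _ in range(base)]
        for x in A:
            buckets[(x // base ** p) % base].append(x)  # stable within a bucket
        A = [x for b in buckets for x in b]
    return A

assert radix_sort([329, 457, 657, 839, 436, 720, 355], 3) == \
       [329, 355, 436, 457, 657, 720, 839]
```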

Review: Binary Search Trees
- Binary search trees (BSTs) are an important data structure for dynamic sets
- In addition to satellite data, elements have:
  - key: an identifying field inducing a total ordering
  - left: pointer to a left child (may be NULL)
  - right: pointer to a right child (may be NULL)
  - p: pointer to the parent node (NULL for the root)

Review: Binary Search Trees
- BST property: key[left(x)] <= key[x] <= key[right(x)]
- Example: a tree with F at the root and children B and H, where B has children A and D, and H has right child K

Review: Inorder Tree Walk
- An inorder walk prints the set in sorted order:

TreeWalk(x)
  TreeWalk(left[x]);
  print(x);
  TreeWalk(right[x]);

- Easy to show by induction on the BST property
- Preorder tree walk: print the root, then left, then right
- Postorder tree walk: print left, then right, then the root

Review: BST Search

TreeSearch(x, k)
  if (x = NULL or k = key[x])
    return x;
  if (k < key[x])
    return TreeSearch(left[x], k);
  else
    return TreeSearch(right[x], k);

Review: BST Search (Iterative)

IterativeTreeSearch(x, k)
  while (x != NULL and k != key[x])
    if (k < key[x])
      x = left[x];
    else
      x = right[x];
  return x;

Review: BST Insert
- Adds an element x to the tree so that the binary search tree property continues to hold
- The basic algorithm:
  - Like the search procedure above
  - Insert x in place of NULL
  - Use a "trailing pointer" to keep track of where you came from (like inserting into a singly linked list)
- Like search, takes time O(h), h = tree height
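Search and trailing-pointer insert can be sketched together in Python (a minimal Node class of my own; parent pointers and satellite data are omitted for brevity):

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def tree_insert(root, key):
    """Insert key using a trailing pointer, as described above; returns the root."""
    parent, x = None, root
    while x is not None:          # walk down like TreeSearch, remembering parent
        parent = x
        x = x.left if key < x.key else x.right
    node = Node(key)
    if parent is None:            # tree was empty
        return node
    if key < parent.key:
        parent.left = node
    else:
        parent.right = node
    return root

def tree_search(x, key):
    while x is not None and key != x.key:
        x = x.left if key < x.key else x.right
    return x

root = None
for k in [15, 6, 18, 3, 7, 17, 20]:
    root = tree_insert(root, k)
assert tree_search(root, 7) is not None
assert tree_search(root, 13) is None
```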

Review: Sorting With BSTs
- Basic algorithm:
  - Insert the elements of the unsorted array from 1..n
  - Do an inorder tree walk to print them in sorted order
- Running time:
  - Best case: Ω(n lg n) (it's a comparison sort)
  - Worst case: O(n^2)
  - Average case: O(n lg n) (it's a quicksort!)

Review: Sorting With BSTs
- Average case analysis
  - It's a form of quicksort!

for i = 1 to n
  TreeInsert(A[i]);
InorderTreeWalk(root);

(The original slide illustrates this on the input 3, 1, 8, 2, 6, 7, 5: each inserted root partitions the later elements around itself, just as Partition() does around a pivot.)

Review: More BST Operations
- Minimum:
  - Find the leftmost node in the tree
- Successor:
  - x has a right subtree: the successor is the minimum node in the right subtree
  - x has no right subtree: the successor is the first ancestor of x whose left child is also an ancestor of x
    - Intuition: as long as you move left going up the tree, you're visiting smaller nodes
- Predecessor: similar to successor

Review: More BST Operations
- Delete:
  - x has no children:
    - Remove x
  - x has one child:
    - Splice out x
  - x has two children:
    - Swap x with its successor
    - Perform case 1 or 2 to delete it
- Example (on the tree from before, with C added as a child of D): delete K, or H, or B

Review: Red-Black Trees
- Red-black trees:
  - Binary search trees augmented with node color
  - Operations designed to guarantee that the height h = O(lg n)

Red-Black Properties
- The red-black properties:
  1. Every node is either red or black
  2. Every leaf (NULL pointer) is black
     - Note: this means every "real" node has 2 children
  3. If a node is red, both children are black
     - Note: there can't be 2 consecutive red nodes on a path
  4. Every path from a node to a descendant leaf contains the same number of black nodes
  5. The root is always black
- black-height: the number of black nodes on a path to a leaf
  - Lets us prove that an RB tree has height h <= 2 lg(n+1)

Operations on RB Trees
- Since the height is O(lg n), we can show that all BST operations take O(lg n) time
- Problem: BST Insert() and Delete() modify the tree and could destroy the red-black properties
- Solution: restructure the tree in O(lg n) time
  - You should understand the basic approach of these operations
  - Key operation: rotation

RB Trees: Rotation
- Our basic operation for changing tree structure: rotation
  - rightRotate(y) turns a node y with left child x into a node x with right child y; leftRotate(x) is the inverse (the subtrees A, B, C keep their inorder positions)
- Rotation preserves inorder key ordering
- Rotation takes O(1) time (it just swaps pointers)

Review: Skip Lists
- A relatively recent data structure
  - "A probabilistic alternative to balanced trees"
  - A randomized algorithm with the benefits of red-black trees
    - O(lg n) expected search time
    - O(1) time for Min, Max, Succ, Pred
  - Much easier to code than red-black trees
  - Fast!

Review: Skip Lists
- The basic idea:
  - Keep a doubly-linked list of elements
    - Min, max, successor, predecessor: O(1) time
    - Delete is O(1) time; Insert is O(1) + Search time
  - Add each level-i element to level i+1 with probability p (e.g., p = 1/2 or p = 1/4)
- (The slide shows a three-level example over the keys 3, 9, 12, 18, 29, 35, 37: level 1 holds every key, and each higher level holds a random subset of the level below)

Review: Skip List Search
- To search for an element with a given key:
  - Find its location in the top list
    - The top list has O(1) elements with high probability
    - The location in this list defines a range of items in the next list
  - Drop down a level and recurse
- O(1) time per level on average
- O(lg n) levels with high probability
- Total time: O(lg n)

Review: Skip List Insert
- Skip list insert: analysis
  - Do a search for that key
  - Insert the element in the bottom-level list
  - With probability p, recurse to insert it in the next level
  - Expected number of lists = 1 + p + p^2 + ... = 1/(1-p) = O(1) if p is constant
  - Total time = Search + O(1) = O(lg n) expected
- Skip list delete: O(1)

Review: Skip Lists
- O(1) expected time for most operations
- O(lg n) expected time for insert
- O(n^2) worst-case time
  - But it's randomized, so no particular order of insertion evokes the worst-case behavior
- O(n) expected storage requirement
- Easy to code

Review: Hash Tables
- Motivation: symbol tables
  - A compiler uses a symbol table to relate symbols to associated data
    - Symbols: variable names, procedure names, etc.
    - Associated data: memory location, call graph, etc.
  - For a symbol table (also called a dictionary), we care about search, insertion, and deletion
  - We typically don't care about sorted order

Review: Hash Tables
- More formally:
  - Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
    - Insert(T, x)
    - Delete(T, x)
    - Search(T, x)
  - We don't care about sorting the records
- Hash tables support all of the above in O(1) expected time

Review: Direct Addressing
- Suppose:
  - The range of keys is 0..m-1
  - Keys are distinct
- The idea:
  - Use the key itself as the address into the table
  - Set up an array T[0..m-1] in which
    - T[i] = x    if x ∈ T and key[x] = i
    - T[i] = NULL otherwise
  - This is called a direct-address table

Review: Hash Functions
- Next problem: collision
- (The slide shows a universe U of keys, a subset K of actual keys, and a table T[0..m-1]; two distinct keys collide: h(k2) = h(k5))

Review: Resolving Collisions
- How can we solve the problem of collisions?
- Open addressing
  - To insert: if the slot is full, try another slot, and another, until an open slot is found (probing)
  - To search: follow the same sequence of probes as would be used when inserting the element
- Chaining
  - Keep a linked list of elements in each slot
  - Upon collision, just add the new element to the list

Review: Chaining
- Chaining puts elements that hash to the same slot in a linked list:
- (The slide shows keys k1..k8 hashed into table T; colliding keys, e.g. k1 and k4, end up in the same slot's linked list, while empty slots hold NULL)

Review: Analysis of Hash Tables
- Simple uniform hashing: each key is equally likely to be hashed to any slot of the table
- Load factor α = n/m = average number of keys per slot
  - Average cost of an unsuccessful search = O(1 + α)
  - Successful search: O(1 + α/2) = O(1 + α)
  - If n is proportional to m, then α = O(1)
- So the cost of searching is O(1) if we size our table appropriately

Review: Choosing a Hash Function
- Choosing the hash function well is crucial
  - A bad hash function puts all elements in the same slot
  - A good hash function:
    - Should distribute keys uniformly into slots
    - Should not depend on patterns in the data
- We discussed three methods:
  - Division method
  - Multiplication method
  - Universal hashing

Review: The Division Method
- h(k) = k mod m
  - In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
- Elements with adjacent keys hashed to different slots: good
- If keys bear a relation to m: bad
- Upshot: pick table size m = a prime number not too close to a power of 2 (or 10)

119 Review: The Multiplication Method
- For a constant A, 0 < A < 1:
- h(k) = ⌊m (kA − ⌊kA⌋)⌋, where kA − ⌊kA⌋ is the fractional part of kA
- Upshot:
  - Choose m = 2^p
  - Choose A not too close to 0 or 1
  - Knuth: a good choice is A = (√5 − 1)/2
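The formula translates almost verbatim to Python; this sketch uses floating point rather than the fixed-point trick usually paired with m = 2^p, which is fine for illustration:

```python
import math

# Multiplication method: h(k) = floor(m * frac(k * A)),
# with m = 2^p and Knuth's suggested A = (sqrt(5) - 1) / 2.
A = (math.sqrt(5) - 1) / 2

def h_mul(k, m):
    frac = (k * A) % 1.0        # fractional part of k*A
    return int(m * frac)        # scale into 0..m-1

m = 2 ** 10                     # table size a power of two, per the slide
hashes = [h_mul(k, m) for k in range(100)]
```

With the golden-ratio choice of A, even consecutive integer keys scatter across the table instead of landing in consecutive slots.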

120 Review: Universal Hashing
- When attempting to foil a malicious adversary, randomize the algorithm
- Universal hashing: pick a hash function randomly when the algorithm begins (not upon every insert!)
  - Guarantees good performance on average, no matter what keys the adversary chooses
  - Need a family of hash functions to choose from

121 Review: Universal Hashing
- Let ℋ be a (finite) collection of hash functions
  - …that map a given universe U of keys…
  - …into the range {0, 1, …, m − 1}
- ℋ is universal if:
  - for each pair of distinct keys x, y ∈ U, the number of hash functions h ∈ ℋ for which h(x) = h(y) is |ℋ|/m
  - In other words:
    - With a random hash function from ℋ, the chance of a collision between x and y (x ≠ y) is exactly 1/m

122 Review: A Universal Hash Function
- Choose table size m to be prime
- Decompose key x into r+1 bytes, so that x = {x_0, x_1, …, x_r}
  - Only requirement is that the max value of a byte be < m
  - Let a = {a_0, a_1, …, a_r} denote a sequence of r+1 elements chosen randomly from {0, 1, …, m − 1}
  - Define the corresponding hash function h_a ∈ ℋ:
    h_a(x) = (Σ_{i=0..r} a_i x_i) mod m
  - With this definition, ℋ has m^{r+1} members
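One member of this family can be drawn and evaluated as follows. This is a sketch: keys are decomposed into base-m digits (which automatically satisfies "max value of a byte < m"), and the helper names are made up here:

```python
import random

# h_a(x) = (sum of a_i * x_i) mod m, for a random a in {0..m-1}^(r+1).
m = 17          # table size: a prime

def digits(x, m, r):
    # base-m digits x_0..x_r of x (each < m, as the slide requires)
    return [(x // m**i) % m for i in range(r + 1)]

def make_universal_hash(m, r, rng=random):
    a = [rng.randrange(m) for _ in range(r + 1)]   # random member of H
    def h(x):
        return sum(ai * xi for ai, xi in zip(a, digits(x, m, r))) % m
    return h

h = make_universal_hash(m, r=3)
```

Note that `h` is drawn once, when the table is created — not on every insert — matching the slide's warning.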

123 Review: Dynamic Order Statistics
- We've seen algorithms for finding the ith element of an unordered set in O(n) time
- OS-trees: a structure to support finding the ith element of a dynamic set in O(lg n) time
  - Support standard dynamic set operations (Insert(), Delete(), Min(), Max(), Succ(), Pred())
  - Also support these order statistic operations:
    void OS-Select(root, i);
    int OS-Rank(x);

124 Review: Order Statistic Trees
- OS trees augment red-black trees:
  - Associate a size field with each node in the tree
  - x->size records the size of the subtree rooted at x, including x itself
  [Figure: example tree with key:size pairs — M:8 at the root, children C:5 and P:2; under C are A:1 and F:3 (F's children D:1 and H:1); P's right child is Q:1.]
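The size numbers in the figure obey a simple invariant: size(x) = size(left) + size(right) + 1, with the NIL sentinel counting as 0. A minimal Python check (the `Node` class and `size` helper are illustrative scaffolding, not from the slides):

```python
# Each node stores its key and the size of its subtree (itself included).
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)   # invariant from the slide

def size(x):
    return x.size if x is not None else 0   # sentinel NIL has size 0

# The example tree from the figure: M is the root with size 8.
tree = Node("M",
            Node("C", Node("A"), Node("F", Node("D"), Node("H"))),
            Node("P", right=Node("Q")))
```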

125–130 Review: OS-Select
- Example: show OS-Select(root, 5):

    OS-Select(x, i) {
        r = x->left->size + 1;
        if (i == r)
            return x;
        else if (i < r)
            return OS-Select(x->left, i);
        else
            return OS-Select(x->right, i - r);
    }

- Trace on the example tree: at M, i = 5, r = 6 (recurse left); at C, i = 5, r = 2 (recurse right with i = 3); at F, i = 3, r = 2 (recurse right with i = 1); at H, i = 1, r = 1 — return H
- Note: use a sentinel NIL element at the leaves with size = 0 to simplify the code and avoid testing for NULL
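The pseudocode transcribes directly to runnable Python over a minimal node type; the `Node`/`size` helpers are illustrative scaffolding, not part of the slides:

```python
# OS-Select over nodes carrying (key, size, left, right);
# None plays the role of the NIL sentinel with size 0.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)

def size(x):
    return x.size if x is not None else 0

def os_select(x, i):
    r = size(x.left) + 1            # rank of x within its own subtree
    if i == r:
        return x
    elif i < r:
        return os_select(x.left, i)
    else:
        return os_select(x.right, i - r)

# The example tree from the slides (sizes: M=8, C=5, F=3, P=2, ...).
root = Node("M",
            Node("C", Node("A"), Node("F", Node("D"), Node("H"))),
            Node("P", right=Node("Q")))
```

Running `os_select(root, 5)` follows exactly the slide's trace (M → C → F → H) and returns the node with key H.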

131–135 Review: Determining The Rank Of An Element
- Idea: the rank of a right child x is one more than its parent's rank, plus the size of x's left subtree

    OS-Rank(T, x) {
        r = x->left->size + 1;
        y = x;
        while (y != T->root) {
            if (y == y->p->right)
                r = r + y->p->left->size + 1;
            y = y->p;
        }
        return r;
    }

- Example: find the rank of the element with key H — start with r = 1 at H; H is F's right child, so r = 1 + 1 + 1 = 3; F is C's right child, so r = 3 + 1 + 1 = 5; C is M's left child, so r is unchanged; at the root, return r = 5
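OS-Rank also transcribes directly; here nodes carry a parent pointer `p` (the `Node` class is illustrative scaffolding):

```python
# OS-Rank over nodes with a parent pointer p; None is the NIL sentinel.
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right, self.p = key, left, right, None
        for child in (left, right):
            if child is not None:
                child.p = self
        self.size = 1 + size(left) + size(right)

def size(x):
    return x.size if x is not None else 0

def os_rank(root, x):
    r = size(x.left) + 1               # rank of x within its own subtree
    y = x
    while y is not root:
        if y is y.p.right:             # everything in y.p's left subtree,
            r += size(y.p.left) + 1    # plus y.p itself, precedes x
        y = y.p
    return r

# Example tree from the slides; H should have rank 5.
H = Node("H")
root = Node("M",
            Node("C", Node("A"), Node("F", Node("D"), H)),
            Node("P", right=Node("Q")))
```

The walk from H accumulates r = 1 → 3 → 5, matching the slide's example.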

136 Review: Maintaining Subtree Sizes
- So by keeping subtree sizes, order statistic operations can be done in O(lg n) time
- Next: maintain sizes during Insert() and Delete() operations
  - Insert(): increment the size fields of nodes traversed during the search down the tree
  - Delete(): decrement sizes along the path from the deleted node to the root
  - Both: update sizes correctly during rotations

137 Review: Maintaining Subtree Sizes
- Note that a rotation invalidates only the sizes of x and y
- Can recalculate their sizes in constant time
- Thm 15.1: can compute in O(lg n) time any property that depends only on the node, its left child, and its right child
  [Figure: rightRotate(y) / leftRotate(x) exchange x and y; only the size fields of x and y must be recomputed, from their (unchanged) children.]

138 Review: Interval Trees
- The problem: maintain a set of intervals
  - E.g., time intervals for a scheduling program
  [Figure: a set of intervals, including [15,18] and [17,19], drawn on a timeline.]
  - Query: find an interval in the set that overlaps a given query interval
    - [14,16] → [15,18]
    - [16,19] → [15,18] or [17,19]
    - [12,14] → NULL
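The overlap test underlying these queries is one line: closed intervals i and j overlap iff i.low ≤ j.high and j.low ≤ i.high. A quick Python check against the slide's example queries (intervals as `(low, high)` tuples):

```python
# Two closed intervals overlap iff each starts no later than the other ends.
def overlap(i, j):
    return i[0] <= j[1] and j[0] <= i[1]

# The slide's example queries against [15,18] and [17,19]:
hits_15_18 = overlap((14, 16), (15, 18))   # overlaps
hits_17_19 = overlap((16, 19), (17, 19))   # overlaps
miss = overlap((12, 14), (15, 18))         # no overlap
```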

139 Interval Trees
- Following the methodology:
  - Pick the underlying data structure
    - Red-black trees will store intervals, keyed on i->low
  - Decide what additional information to store
    - Store the maximum endpoint in the subtree rooted at i
  - Figure out how to maintain the information
    - Insert: update max on the way down, and during rotations
    - Delete: similar
  - Develop the desired new operations

140 Searching Interval Trees

    IntervalSearch(T, i) {
        x = T->root;
        while (x != NULL && !overlap(i, x->interval)) {
            if (x->left != NULL && x->left->max >= i->low)
                x = x->left;
            else
                x = x->right;
        }
        return x;
    }

- Running time: O(lg n)
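The search loop maps directly to Python. This sketch hand-builds a small tree keyed on low endpoints; the `INode` class and the particular extra intervals are made up for illustration (only [15,18] and [17,19] come from the slide's example):

```python
# IntervalSearch over nodes storing (low, high), max endpoint in
# the subtree, left, right.  Intervals are (low, high) tuples.
class INode:
    def __init__(self, low, high, left=None, right=None):
        self.interval = (low, high)
        self.left, self.right = left, right
        self.max = max(high,
                       left.max if left else float("-inf"),
                       right.max if right else float("-inf"))

def overlap(i, j):
    return i[0] <= j[1] and j[0] <= i[1]

def interval_search(root, i):
    x = root
    while x is not None and not overlap(i, x.interval):
        if x.left is not None and x.left.max >= i[0]:
            x = x.left    # left subtree might contain an overlap
        else:
            x = x.right   # left subtree provably cannot
    return x

# A small tree keyed on low endpoints (structure chosen by hand here).
root = INode(15, 18,
             left=INode(5, 11, left=INode(4, 8), right=INode(7, 10)),
             right=INode(17, 19, right=INode(21, 23)))
```

Note the branch test `x.left.max >= i[0]`: it is exactly the condition analyzed in the correctness argument on the following slides.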

141 Review: Correctness of IntervalSearch()
- Key idea: need to check only one of a node's two children
  - Case 1: search goes right
    - Show that either ∃ an overlap in the right subtree, or there is no overlap at all
  - Case 2: search goes left
    - Show that either ∃ an overlap in the left subtree, or there is no overlap at all

142 Review: Correctness of IntervalSearch()
- Case 1: if the search goes right, ∃ an overlap in the right subtree or there is no overlap in either subtree
  - If ∃ an overlap in the right subtree, we're done
  - Otherwise:
    - x->left = NULL, or x->left->max < i->low (why? that is exactly the branch condition for going right)
    - Thus, no overlap in the left subtree!

    while (x != NULL && !overlap(i, x->interval)) {
        if (x->left != NULL && x->left->max >= i->low)
            x = x->left;
        else
            x = x->right;
    }
    return x;

143 Review: Correctness of IntervalSearch()
- Case 2: if the search goes left, ∃ an overlap in the left subtree or there is no overlap in either subtree
  - If ∃ an overlap in the left subtree, we're done
  - Otherwise:
    - i->low <= x->left->max, by the branch condition
    - x->left->max = y->high for some y in the left subtree
    - Since i and y don't overlap and i->low <= y->high, it must be that i->high < y->low
    - Since the tree is sorted by low endpoints, i->high < any low in the right subtree
    - Thus, no overlap in the right subtree

    while (x != NULL && !overlap(i, x->interval)) {
        if (x->left != NULL && x->left->max >= i->low)
            x = x->left;
        else
            x = x->right;
    }
    return x;

