CS 332: Algorithms
Augmenting Data Structures
David Luebke, 3/19/2016
Administrivia
- Midterm is postponed until Thursday, Oct 26
- Reminder: homework 3 due today
  - In the CS front office
  - Due at 5 PM (but don’t risk being there at 4:59!)
  - Check your e-mail for some clarifications & hints
Review: Hash Tables
- More formally:
  - Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
    - Insert(T, x)
    - Delete(T, x)
    - Search(T, x)
  - Don’t care about sorting the records
- Hash tables support all the above in O(1) expected time
Review: Direct Addressing
- Suppose:
  - The range of keys is 0..m-1
  - Keys are distinct
- The idea:
  - Use key itself as the address into the table
  - Set up an array T[0..m-1] in which
    - T[i] = x if x ∈ T and key[x] = i
    - T[i] = NULL otherwise
  - This is called a direct-address table
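A minimal sketch of a direct-address table, assuming integer keys in 0..m-1 and records stored as (key, data) pairs. The class and method names are illustrative and not from the lecture.

```python
class DirectAddressTable:
    """Direct-address table for records with distinct integer keys in 0..m-1."""

    def __init__(self, m):
        # T[0..m-1]; None plays the role of NULL
        self.slots = [None] * m

    def insert(self, key, data):
        # The key itself is the address: T[key[x]] = x
        self.slots[key] = (key, data)

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        # Returns the stored record, or None if the slot is empty
        return self.slots[key]


table = DirectAddressTable(8)
table.insert(3, "three")
table.insert(5, "five")
table.delete(5)
```

Every operation is a single array access, which is why direct addressing is O(1) in the worst case when the key range is small enough to allocate.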
Review: Hash Functions
- Next problem: collision
[Figure: keys k1..k5 from K (actual keys), a subset of U (universe of keys), are mapped by h into slots 0..m-1 of table T; h(k2) = h(k5) is a collision]
Review: Resolving Collisions
- How can we solve the problem of collisions?
- Open addressing
  - To insert: if slot is full, try another slot, and another, until an open slot is found (probing)
  - To search, follow same sequence of probes as would be used when inserting the element
- Chaining
  - Keep linked list of elements in slots
  - Upon collision, just add new element to list
Review: Chaining
- Chaining puts elements that hash to the same slot in a linked list:
[Figure: table T whose slots hold linked lists of the keys k1..k8, drawn from K (actual keys) within U (universe of keys); empty slots are marked as null]
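Chaining can be sketched as follows. This is an illustration only: it assumes integer keys, uses k mod m as the hash function, and uses Python lists to stand in for the linked lists. All names are hypothetical.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining."""

    def __init__(self, m):
        self.m = m
        # One chain per slot (Python lists standing in for linked lists)
        self.slots = [[] for _ in range(m)]

    def _h(self, key):
        # Assumed hash function for this sketch: the division method
        return key % self.m

    def insert(self, key, data):
        chain = self.slots[self._h(key)]
        for idx, (k, _) in enumerate(chain):
            if k == key:              # key already present: overwrite its data
                chain[idx] = (key, data)
                return
        chain.append((key, data))     # on collision, just add to the chain

    def search(self, key):
        for k, data in self.slots[self._h(key)]:
            if k == key:
                return data
        return None

    def delete(self, key):
        h = self._h(key)
        self.slots[h] = [(k, d) for k, d in self.slots[h] if k != key]


t = ChainedHashTable(7)
t.insert(3, "three")
t.insert(10, "ten")   # 10 mod 7 == 3, so this collides with key 3
```

Search cost is proportional to the length of one chain, which is what the load-factor analysis bounds on average.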
Review: Analysis Of Hash Tables
- Simple uniform hashing: each key in table is equally likely to be hashed to any slot
- Load factor α = n/m = average # keys per slot (n keys stored in m slots)
  - Average cost of unsuccessful search = O(1 + α)
  - Successful search: O(1 + α/2) = O(1 + α)
  - If n is proportional to m, α = O(1)
- So the cost of searching = O(1) if we size our table appropriately
Review: Choosing A Hash Function
- Choosing the hash function well is crucial
  - Bad hash function puts all elements in same slot
  - A good hash function:
    - Should distribute keys uniformly into slots
    - Should not depend on patterns in the data
- We discussed three methods:
  - Division method
  - Multiplication method
  - Universal hashing
Review: The Division Method
- h(k) = k mod m
  - In words: hash k into a table with m slots using the slot given by the remainder of k divided by m
- Elements with adjacent keys hashed to different slots: good
- If keys bear relation to m: bad
- Upshot: pick table size m = prime number not too close to a power of 2 (or 10)
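A small illustration of why keys that bear a relation to m are bad for the division method. The key set and the two table sizes are made up for this example.

```python
def h_division(k, m):
    """Division method: the slot is the remainder of k divided by m."""
    return k % m


# Hypothetical keys that are all multiples of 16
keys = [16, 32, 48, 64, 80]

# m = 16 (a power of 2 related to the keys): every key lands in slot 0
slots_pow2 = [h_division(k, 16) for k in keys]

# m = 13 (a prime): the same keys spread over five different slots
slots_prime = [h_division(k, 13) for k in keys]
```

With m = 16 the hash depends only on the low four bits of each key, which are identical here; the prime table size breaks that pattern.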
Review: The Multiplication Method
- For a constant A, 0 < A < 1:
- $h(k) = \lfloor m (kA - \lfloor kA \rfloor) \rfloor$, where $kA - \lfloor kA \rfloor$ is the fractional part of kA
- Upshot:
  - Choose $m = 2^P$
  - Choose A not too close to 0 or 1
  - Knuth: Good choice for $A = (\sqrt{5} - 1)/2$
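The multiplication method can be sketched in floating point as below, using Knuth's suggested constant. This is an illustration under the assumption of non-negative integer keys; a production version would normally use fixed-point integer arithmetic instead of floats.

```python
import math

# Knuth's suggested constant: (sqrt(5) - 1) / 2, roughly 0.618
A = (math.sqrt(5) - 1) / 2


def h_multiplication(k, m, a=A):
    """Multiplication method: floor(m * fractional part of k*a)."""
    frac = (k * a) % 1.0          # fractional part of k*a, in [0, 1)
    return math.floor(m * frac)


m = 2 ** 14                        # table size chosen as a power of 2
slots = [h_multiplication(k, m) for k in range(1000)]
```

Unlike the division method, the choice of m is not critical here, which is why a power of 2 is acceptable.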
Review: Universal Hashing
- When attempting to foil a malicious adversary, randomize the algorithm
- Universal hashing: pick a hash function randomly when the algorithm begins (not upon every insert!)
  - Guarantees good performance on average, no matter what keys adversary chooses
  - Need a family of hash functions to choose from
Review: Universal Hashing
- Let $\mathcal{H}$ be a (finite) collection of hash functions
  - …that map a given universe U of keys…
  - …into the range {0, 1, …, m - 1}.
- $\mathcal{H}$ is universal if:
  - for each pair of distinct keys $x, y \in U$, the number of hash functions $h \in \mathcal{H}$ for which $h(x) = h(y)$ is $|\mathcal{H}|/m$
  - In other words:
    - With a random hash function from $\mathcal{H}$, the chance of a collision between x and y ($x \ne y$) is exactly 1/m
Review: A Universal Hash Function
- Choose table size m to be prime
- Decompose key x into r+1 bytes, so that $x = \{x_0, x_1, \ldots, x_r\}$
  - Only requirement is that max value of byte < m
  - Let $a = \{a_0, a_1, \ldots, a_r\}$ denote a sequence of r+1 elements chosen randomly from {0, 1, …, m - 1}
  - Define corresponding hash function $h_a \in \mathcal{H}$: $h_a(x) = \sum_{i=0}^{r} a_i x_i \bmod m$
  - With this definition, $\mathcal{H}$ has $m^{r+1}$ members
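A sketch of drawing one function from this family, assuming the standard construction in which h_a(x) is the sum of a_i * x_i taken mod m. The helper names, the choice m = 257 (a prime larger than any byte value), and the 4-byte key split are assumptions made for illustration.

```python
import random


def key_bytes(x, r):
    """Split a non-negative integer key into r+1 bytes, least significant first."""
    return [(x >> (8 * i)) & 0xFF for i in range(r + 1)]


def make_universal_hash(m, r, rng=random):
    """Pick a = (a_0..a_r) at random once, then return h_a and the chosen a."""
    a = [rng.randrange(m) for _ in range(r + 1)]

    def h(x_bytes):
        # h_a(x) = (sum of a_i * x_i) mod m
        return sum(ai * xi for ai, xi in zip(a, x_bytes)) % m

    return h, a


m, r = 257, 3                      # m prime and larger than the max byte value 255
h, a = make_universal_hash(m, r, random.Random(0))
slot = h(key_bytes(0xDEADBEEF, r))
```

The random vector a is chosen once when the table is created, not on every insert, so the same key always hashes to the same slot for the lifetime of the table.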
Augmenting Data Structures
- This course is supposed to be about design and analysis of algorithms
- So far, we’ve only looked at one design technique: divide and conquer
- Next up: augmenting data structures
  - Or, “One good thief is worth ten good scholars”
Dynamic Order Statistics
- We’ve seen algorithms for finding the ith element of an unordered set in O(n) time
- Next, a structure to support finding the ith element of a dynamic set in O(lg n) time
  - What operations do dynamic sets usually support?
  - What structure works well for these?
  - How could we use this structure for order statistics?
  - How might we augment it to support efficient extraction of order statistics?
Order Statistic Trees
- OS Trees augment red-black trees:
  - Associate a size field with each node in the tree
  - x->size records the size of subtree rooted at x, including x itself:
[Figure: OS tree, each node labeled key (size): root M (8) with children C (5) and P (2); C has children A (1) and F (3); F has children D (1) and H (1); P has right child Q (1)]
Selection On OS Trees
[Figure: OS tree with key (size) nodes M (8), C (5), P (2), Q (1), A (1), F (3), D (1), H (1)]
- How can we use this property to select the ith element of the set?
OS-Select
    OS-Select(x, i)
    {
        r = x->left->size + 1;    // rank of x within the subtree rooted at x
        if (i == r)
            return x;
        else if (i < r)
            return OS-Select(x->left, i);
        else
            return OS-Select(x->right, i-r);
    }
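OS-Select Example
- Example: show OS-Select(root, 5):
[Figure: OS tree, each node labeled key (size): root M (8) with children C (5) and P (2); C has children A (1) and F (3); F has children D (1) and H (1); P has right child Q (1)]
    OS-Select(x, i)
    {
        r = x->left->size + 1;
        if (i == r)
            return x;
        else if (i < r)
            return OS-Select(x->left, i);
        else
            return OS-Select(x->right, i-r);
    }
- Trace of the calls:
  - At M: i = 5, r = 6, so go left
  - At C: i = 5, r = 2, so go right with i = 3
  - At F: i = 3, r = 2, so go right with i = 1
  - At H: i = 1, r = 1, so return H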
OS-Select: A Subtlety
    OS-Select(x, i)
    {
        r = x->left->size + 1;
        if (i == r)
            return x;
        else if (i < r)
            return OS-Select(x->left, i);
        else
            return OS-Select(x->right, i-r);
    }
- What happens at the leaves?
- How can we deal elegantly with this?
OS-Select
    OS-Select(x, i)
    {
        r = x->left->size + 1;
        if (i == r)
            return x;
        else if (i < r)
            return OS-Select(x->left, i);
        else
            return OS-Select(x->right, i-r);
    }
- What will be the running time?
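One common way to handle the leaves is a sentinel node whose size is 0, so that x->left->size is always defined. The sketch below assumes that approach and builds the eight-node example tree by hand, without red-black balancing. The names `Node`, `NIL` and `os_select` are illustrative, and the out-of-range check is an addition not present in the pseudocode.

```python
class _Nil:
    """Sentinel standing in for every missing child; its subtree size is 0."""
    size = 0
    key = None


NIL = _Nil()


class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left if left is not None else NIL
        self.right = right if right is not None else NIL
        # size of the subtree rooted here, including this node
        self.size = self.left.size + self.right.size + 1


def os_select(x, i):
    """Return the node holding the i-th smallest key (1-based) under x."""
    if x is NIL:
        raise IndexError("rank out of range")   # added guard, not in the slides
    r = x.left.size + 1                          # rank of x within its own subtree
    if i == r:
        return x
    elif i < r:
        return os_select(x.left, i)
    else:
        return os_select(x.right, i - r)


# The example tree: M(8), C(5), P(2), A(1), F(3), D(1), H(1), Q(1)
root = Node("M",
            Node("C", Node("A"), Node("F", Node("D"), Node("H"))),
            Node("P", None, Node("Q")))
fifth = os_select(root, 5)
```

Each call descends one level, so the running time is proportional to the height of the tree, which is O(lg n) when the underlying tree is a red-black tree.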
Determining The Rank Of An Element
[Figure: OS tree with key (size) nodes M (8), C (5), P (2), Q (1), A (1), F (3), D (1), H (1); successive slides highlight different nodes]
- What is the rank of this element?
- Of this one? Why?
- Of the root? What’s the pattern here?
- What about the rank of this element?
- This one? What’s the pattern here?
OS-Rank
    OS-Rank(T, x)
    {
        r = x->left->size + 1;    // rank of x within its own subtree
        y = x;
        while (y != T->root) {
            if (y == y->p->right)
                r = r + y->p->left->size + 1;    // parent and its left subtree precede x
            y = y->p;
        }
        return r;
    }
- What will be the running time?
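OS-Rank can be sketched as below. This version assumes parent pointers stored in a field `p`, represents missing children as `None` with a small `size()` helper instead of a sentinel, and builds the example tree by hand. All names are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.p = None                 # parent pointer
        self.size = 1
        for child in (left, right):
            if child is not None:
                child.p = self
                self.size += child.size


def size(node):
    """Subtree size, treating a missing child as size 0."""
    return node.size if node is not None else 0


def os_rank(root, x):
    """Return the 1-based rank of node x in the tree rooted at root."""
    r = size(x.left) + 1              # rank of x within its own subtree
    y = x
    while y is not root:
        if y is y.p.right:
            # everything in the parent's left subtree, plus the parent, precedes x
            r += size(y.p.left) + 1
        y = y.p
    return r


# The example tree: M(8), C(5), P(2), A(1), F(3), D(1), H(1), Q(1)
d, h = Node("D"), Node("H")
a, f = Node("A"), Node("F", d, h)
c = Node("C", a, f)
q = Node("Q")
p = Node("P", None, q)
m = Node("M", c, p)
rank_of_h = os_rank(m, h)
```

The loop walks from x up to the root once, so the cost is again bounded by the height of the tree.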
OS-Trees: Maintaining Sizes
- So we’ve shown that with subtree sizes, order statistic operations can be done in O(lg n) time
- Next step: maintain sizes during Insert() and Delete() operations
  - How would we adjust the size fields during insertion on a plain binary search tree?
  - A: increment sizes of nodes traversed during search
  - Why won’t this work on red-black trees?
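Incrementing sizes along the search path can be sketched for a plain (unbalanced) binary search tree as follows. This assumes distinct keys and does no rebalancing; the function name `bst_insert` is illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1                 # a new leaf is a subtree of size 1


def bst_insert(root, key):
    """Insert key into a plain BST, keeping every size field correct."""
    new = Node(key)
    if root is None:
        return new
    x = root
    while True:
        x.size += 1                   # the new node will end up inside x's subtree
        if key < x.key:
            if x.left is None:
                x.left = new
                return root
            x = x.left
        else:
            if x.right is None:
                x.right = new
                return root
            x = x.right


# Inserting in this order reproduces the shape of the example tree
root = None
for k in "MCPAFQDH":
    root = bst_insert(root, k)
```

On a red-black tree the same increments are not enough on their own, because the rotations performed while rebalancing change which nodes lie in which subtree.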
Maintaining Size Through Rotation
- Salient point: rotation invalidates only x and y
- Can recalculate their sizes in constant time
  - Why?
[Figure: rightRotate(y) turns a tree with root y (size 19) and left child x (size 11) into a tree with root x (size 19) and right child y (size 12); leftRotate(x) reverses it]
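A sketch of the constant-time size fix-up during rotation. It omits parent pointers and colors for brevity, so each function returns the new subtree root for the caller to relink. The stub subtrees `a`, `b`, `c` with sizes 6, 4 and 7 are placeholders chosen to be consistent with the sizes 19, 11 and 12 in the figure.

```python
class Node:
    def __init__(self, key, size=1, left=None, right=None):
        self.key, self.size, self.left, self.right = key, size, left, right


def size(n):
    """Subtree size, treating a missing child as size 0."""
    return n.size if n is not None else 0


def right_rotate(y):
    x = y.left
    y.left = x.right                  # x's right subtree moves under y
    x.right = y
    x.size = y.size                   # x now roots the same set of nodes y did
    y.size = size(y.left) + size(y.right) + 1
    return x


def left_rotate(x):
    y = x.right
    x.right = y.left                  # y's left subtree moves under x
    y.left = x
    y.size = x.size
    x.size = size(x.left) + size(x.right) + 1
    return y


# Stub subtrees: single nodes whose size field stands in for a whole subtree
a, b, c = Node("a", 6), Node("b", 4), Node("c", 7)
x = Node("x", 11, a, b)
y = Node("y", 19, x, c)
top = right_rotate(y)                 # x becomes the root with size 19; y drops to size 12
```

Only x and y change which nodes are below them, and each new size is computed from sizes that are already correct, so the fix-up is O(1) per rotation.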
Augmenting Data Structures: Methodology
- Choose underlying data structure
  - E.g., red-black trees
- Determine additional information to maintain
  - E.g., subtree sizes
- Verify that information can be maintained for operations that modify the structure
  - E.g., Insert(), Delete() (don’t forget rotations!)
- Develop new operations
  - E.g., OS-Rank(), OS-Select()
The End
- Up next:
  - Interval trees
  - Review for midterm