Duke CPS

Faster and faster and … search

• Binary search trees
    - average case insert/search/delete = O(log n)
    - worst case = O(n)
• Balanced search trees exist, worst case is good
    - AVL trees, red-black trees, 2-3 trees, or B-trees
    - why not always use balanced trees? When?
• Comparison-based search: Ω(log n)
    - lower bound on how fast we can search
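
The gap between the average and worst case of an unbalanced binary search tree can be seen by measuring tree height for different insertion orders. The following is a minimal sketch, not course code: the names `Node` and `bstHeight` are made up for illustration, the tree never rebalances, and sending duplicate keys to the right is an assumption.

```cpp
#include <algorithm>
#include <vector>

// Node of an unbalanced binary search tree stored in a vector.
// Children are indices into that vector; -1 means "no child".
struct Node {
    int key;
    int left;
    int right;
};

// Insert the keys in the given order (no rebalancing) and return the
// height of the tree: the number of nodes on the longest root-to-leaf path.
int bstHeight(const std::vector<int>& keys) {
    std::vector<Node> nodes;
    nodes.reserve(keys.size());
    int height = 0;
    for (int key : keys) {
        int depth = 1;  // depth of the root
        if (!nodes.empty()) {
            int cur = 0;  // start at the root
            while (true) {
                // Smaller keys go left; equal or larger keys go right (assumption).
                int& next = (key < nodes[cur].key) ? nodes[cur].left
                                                   : nodes[cur].right;
                ++depth;
                if (next == -1) {
                    next = static_cast<int>(nodes.size());  // index of the new node
                    break;
                }
                cur = next;
            }
        }
        nodes.push_back(Node{key, -1, -1});
        height = std::max(height, depth);
    }
    return height;
}
```

Inserting 1, 2, ..., 7 in sorted order gives height 7 (the tree degenerates into a linked list, so search is O(n)), while inserting 4, 2, 6, 1, 3, 5, 7 gives height 3. Balanced trees guarantee the short shape regardless of insertion order.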

Beating lower bounds

• How can we prove the lower bound?
    - Distinguishing between different keys: binary decisions
• How can we beat the lower bound?
    - Why worry? 10^12 items need only about 40 comparisons, very fast!
• Hashing: average case is O(1) for search, independent of the number of elements being searched!
    - Extra credit word tracking was a prelude to hashing
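
The counting idea behind the lower bound is that k yes/no decisions can distinguish at most 2^k outcomes, so telling n keys apart takes at least ⌈log2 n⌉ decisions. The sketch below illustrates this; `decisionsNeeded` and `binarySearchSteps` are hypothetical helper names, and the "sorted array" is the implicit sequence 0, 1, ..., n-1, so that 10^12 items need no memory.

```cpp
#include <cstdint>

// Smallest k with 2^k >= n: the minimum number of yes/no decisions that
// can distinguish n different outcomes. Assumes n <= 2^63 so that the
// doubling below cannot overflow.
int decisionsNeeded(std::uint64_t n) {
    int k = 0;
    std::uint64_t outcomes = 1;  // equals 2^k
    while (outcomes < n) {
        outcomes *= 2;
        ++k;
    }
    return k;
}

// Binary search for target in the implicit sorted sequence 0, 1, ..., n-1.
// Returns the number of loop iterations (one three-way comparison each).
int binarySearchSteps(std::uint64_t n, std::uint64_t target) {
    std::uint64_t lo = 0;
    std::uint64_t hi = n;  // half-open range [lo, hi)
    int steps = 0;
    while (lo < hi) {
        std::uint64_t mid = lo + (hi - lo) / 2;
        ++steps;
        if (mid == target) return steps;
        if (mid < target) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return steps;  // target was not in the sequence
}
```

For n = 10^12, `decisionsNeeded(n)` returns 40 because 2^39 < 10^12 <= 2^40, and `binarySearchSteps(n, 0)` also takes 40 steps. Hashing gets below this bound because it computes a location from the key instead of comparing keys.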

Hashing

• Store keys (strings) in specific linked list
    - hash function determines which list
    - alternative hash functions
• What makes a good hash function?
• Can collisions be avoided?
    - Birthday paradox
    - collision resolution via chaining
    - worst case? average case?

    'a' -> "awe"
    'b' -> "box" -> "bat"
    'c' -> "cow"
    ...
    'z'

    int hash(const string & s)
    {
        return s[0] - 'a';
    }
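
Chaining with the first-letter hash function can be sketched as a small class. This is an illustrative sketch, not the course implementation: the class name `ChainedStringSet` and its member names are hypothetical, and sending strings that do not start with 'a'..'z' to bucket 0 is an assumption added so that every string has a valid bucket.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// A set of strings stored in 26 chains, one chain per first letter.
class ChainedStringSet {
public:
    ChainedStringSet() : buckets_(26) {}

    // Same idea as the slide's hash(): bucket = first letter - 'a'.
    // Assumption: empty strings and other first characters go to bucket 0.
    static std::size_t firstLetterHash(const std::string& s) {
        if (s.empty() || s[0] < 'a' || s[0] > 'z') return 0;
        return static_cast<std::size_t>(s[0] - 'a');
    }

    // Append key to the end of its chain unless it is already present.
    void insert(const std::string& key) {
        if (!contains(key)) buckets_[firstLetterHash(key)].push_back(key);
    }

    // Search only the one chain selected by the hash function.
    bool contains(const std::string& key) const {
        const std::list<std::string>& chain = buckets_[firstLetterHash(key)];
        for (const std::string& k : chain) {
            if (k == key) return true;
        }
        return false;
    }

    // Length of the chain that key hashes to.
    std::size_t chainLength(const std::string& key) const {
        return buckets_[firstLetterHash(key)].size();
    }

private:
    std::vector<std::list<std::string>> buckets_;
};
```

After inserting "awe", "box", "bat", and "cow", the 'b' chain holds two keys, so "box" and "bat" collide. In the worst case every key starts with the same letter, all n keys land in one chain, and search degrades to O(n).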

Chaining/hashing

• We want chains to be short
    - need lots of hash "buckets" and a good hash function

        index of bucket = hash(key) % TABLE_SIZE

    - TABLE_SIZE should be a prime number
• Actual performance depends on load factor
    - load factor = # keys / # buckets
    - aka average chain length (pretty close)
    - load factor, on average, is 1 for "good data"
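
The bucket-index and load-factor formulas above can be sketched as follows. The slides do not say which hash function is used with the larger table, so the polynomial hash `hashString` (multiplier 31) is an assumption, as are the helper names and the choice of 101 as the prime table size.

```cpp
#include <cstddef>
#include <string>

// Assumed table size for illustration; 101 is prime.
const std::size_t TABLE_SIZE = 101;

// Assumption: a simple polynomial string hash that uses every character,
// unlike the first-letter hash. Not taken from the slides.
std::size_t hashString(const std::string& key) {
    std::size_t h = 0;
    for (unsigned char c : key) {
        h = h * 31 + c;  // unsigned arithmetic wraps around on overflow
    }
    return h;
}

// index of bucket = hash(key) % TABLE_SIZE
std::size_t bucketIndex(const std::string& key) {
    return hashString(key) % TABLE_SIZE;
}

// load factor = # keys / # buckets (assumes numBuckets > 0)
double loadFactor(std::size_t numKeys, std::size_t numBuckets) {
    return static_cast<double>(numKeys) / static_cast<double>(numBuckets);
}
```

With this sketch, `bucketIndex("a")` is 97 and `bucketIndex("ab")` is 75, and 202 keys in 101 buckets give a load factor of 2.0, that is, chains of about two keys on average if the hash function spreads the keys evenly.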