LECTURE 34: MAPS & HASH CSC 212 – Data Structures
Entry ADT
- Entry ADT represents searchable data
- Two methods declared in Entry: key() & value()
  - Entry implementations need key & value fields
  - Entry instance holds single key-value pair
- setValue() also included in most implementations
  - Does NOT define setKey()
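The Entry ADT above could be sketched in Java roughly as follows. This is only an illustration: the class name SimpleEntry and the choice to have setValue() return the old value are assumptions, not something the slide specifies.

```java
// Sketch of the Entry ADT: a single key-value pair.
interface Entry<K, V> {
    K key();
    V value();
}

// One possible implementation (hypothetical name) with the two required fields.
class SimpleEntry<K, V> implements Entry<K, V> {
    private final K key;   // final: there is deliberately no setKey()
    private V value;

    SimpleEntry(K key, V value) {
        this.key = key;
        this.value = value;
    }

    public K key() { return key; }

    public V value() { return value; }

    // Included in most implementations; here it returns the value being replaced.
    public V setValue(V newValue) {
        V old = value;
        value = newValue;
        return old;
    }
}
```

Leaving out setKey() matters later: a hash table files each Entry under an index computed from its key, so changing the key in place would leave the Entry in the wrong slot.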
Sequence:Element::Map:___
- Sequence is collection of elements
  - Many implementations possible for this ADT
  - All of them could hold a number of elements
- Collection of Entry objects is defined by a Map
  - Possible to have many implementations of Map
  - Entry objects stored in each of these implementations
[Figure: view of the Map, holding the Entry objects 9 “c”, 11 “xd”, 1 “ab”, -4 “dc”]
[Figure: view of the Sequence used by the Map, with Positions whose elements are the Entry objects 9 “c”, 11 “xd”, 1 “ab”, -4 “dc”]
Lessons from Polly…
1. When searching, key get(s) value
2. Each key is unique & has at most 1 value
3. Failed search is usual case, not exceptional one
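The three lessons can be seen with Java's built-in java.util.HashMap, used here only as a stand-in for whatever Map the course defines. The class name MapLessons and the "no entry" text are made up for this sketch.

```java
class MapLessons {
    // Returns the value for key, or the text "no entry" when the search fails.
    static String lookup(java.util.Map<String, String> map, String key) {
        String value = map.get(key);                 // lesson 1: key gets value
        return (value == null) ? "no entry" : value; // lesson 3: failure is a normal result, not an exception
    }

    public static void main(String[] args) {
        java.util.Map<String, String> map = new java.util.HashMap<>();
        map.put("ab", "first");
        map.put("ab", "second");                 // lesson 2: same key, so the value is replaced
        System.out.println(map.size());          // prints 1
        System.out.println(lookup(map, "ab"));   // prints second
        System.out.println(lookup(map, "zz"));   // prints no entry
    }
}
```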
Map Performance
- In all seriousness, can be matter of life-or-death
  - 911 operators immediately need addresses
  - Google’s search performance in TB/s
- O(log n) time too slow for these uses
- Would love to use arrays
  - Get O(1) time to add, remove, or look up data
  - This HUGE array needs massive RAM purchase
Monster Amounts of RAM
- Java requires ints be used as array indices
- Unfortunately int and RAM have limits
  - Integer.MAX_VALUE = 2,147,483,647
  - Items in Google index = ~8,200,000,000 (2005)
  - Possible phone numbers = 10,000,000,000
- Enabling O(1) array use requires we do more
- As with all life’s problems we turn to hash
Hashing To The Rescue
- Hash function turns key into int from 0 to N-1
  - Result is usable as index for an array
  - Function specific for key type; cannot be reused
- Store the Entry objects in array: a HASH TABLE (great name for shop in Amsterdam, too)
  - Compute index with hash function
  - Entry stored in array at that index
- If O(1) time used computing hash
  - Could need only O(1) time to get Entry
  - Adding & removing in O(1) time, too
Hash Table Example
- Table is array of Entry
- Simple hash function is h(x) = x mod 10,000
  - Key used is x
  - h(x) is Entry’s index
  - Always mod array length
- Not all locations used
  - Holes can appear in array
  - Empty slots left null
[Figure: hash table with slots up to index 9999; a few slots hold the Entry objects “Jay Doe”, “Bob Doe”, “Jill Roe”, “Rhi Smith”]
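A minimal sketch of this example, assuming the keys are phone numbers stored as long values and the values are names. The class name PhoneTable and the phone numbers in the usage note are hypothetical. Collisions are not handled yet: a second key with the same index simply overwrites the first.

```java
class PhoneTable {
    // One key-value pair; the table is an array of these.
    static class Entry {
        final long key;
        final String value;

        Entry(long key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Entry[] table = new Entry[10_000]; // unused slots stay null

    // h(x) = x mod 10,000 (always mod the array length).
    static int h(long x) {
        return (int) Math.floorMod(x, 10_000L);
    }

    // Stores the Entry at index h(key); overwrites whatever was there.
    void put(long key, String value) {
        table[h(key)] = new Entry(key, value);
    }

    // Returns the value for key, or null when the slot is empty or holds another key.
    String get(long key) {
        Entry e = table[h(key)];
        return (e != null && e.key == key) ? e.value : null;
    }
}
```

With this sketch, put(7165551234L, "Jay Doe") lands in slot 1234, and a later put(5855551234L, "Bob Doe") hashes to the same slot and replaces it.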
What Hash Does
- Implement Map with a hash table
- Given a key, easily look up its Entry
  - Always computes same index for that key
- Hash must be computed on each access
  - O(1) efficiency of array utilized
  - But is wasted if hash is slow
- Spreads out Entry objects, ideally
  - Want to use entire hash table
Bad Hash
- h(x) = 0
  - Fast, repeatable, little use of table
- h(x) = random.nextInt()
  - Fast, not repeatable, uses entire table
- h(x) = current index -or- free index
  - Slow, repeatable, uses entire table
- h(x) = x x x x 31 …
  - Moderate, repeatable, but too large
Really Bad Hash
- Using only part of the key
- Inevitably, you will guess wrong
[Figure: a key, with the labels “Portion of key that matters” and “Use this portion of this key”]
Good Hash
- Hash first turns key into int
  - Easy to do for numbers, at least
- For a String, could add value of each character
  - Would hash to same index “spot”, “pots”, “stop”
- Instead use polynomial code, which can be evaluated with Horner’s method:
  $(x_0 \cdot a^{k-1}) + (x_1 \cdot a^{k-2}) + \dots + (x_{k-2} \cdot a^{1}) + x_{k-1}$
- Example: “spot” = $(\text{‘s’} \cdot a^{3}) + (\text{‘p’} \cdot a^{2}) + (\text{‘o’} \cdot a^{1}) + \text{‘t’}$
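To compare the two String hashes above, here is a small sketch. The names StringHashes, sumHash, and polyHash are made up for illustration, and the result is kept in a long, which can still overflow for long keys.

```java
class StringHashes {
    // Additive hash: sums the character values, so character order is ignored.
    static int sumHash(String key) {
        int sum = 0;
        for (int i = 0; i < key.length(); i++) {
            sum += key.charAt(i);
        }
        return sum;
    }

    // Polynomial hash evaluated with Horner's method:
    // ((x_0 * a + x_1) * a + x_2) * a + ... + x_{k-1}
    static long polyHash(String key, int a) {
        long result = 0;
        for (int i = 0; i < key.length(); i++) {
            result = result * a + key.charAt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // The three anagrams share one additive hash but get distinct polynomial hashes.
        for (String word : new String[] {"spot", "pots", "stop"}) {
            System.out.println(word + ": sum = " + sumHash(word)
                    + ", polynomial = " + polyHash(word, 33));
        }
    }
}
```

Horner's method needs only one multiplication and one addition per character, so the hash stays linear in the key's length without computing any powers of a directly.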
Compression
- Hash’s only use is computing array indices
  - Useless if larger than table’s length: no index exists!
  - “spot” = 4,258,502, when a = 33
  - “triskaidekaphobia” = too big for my calculator
- Instead use modulus (%) to compress result:
  result = (result + length) % length
  - Remember that modulus returns the remainder
  - Keeps result within array (just like array-based queue)
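A sketch of compression in Java, with one deliberate substitution: it uses Math.floorMod rather than the slide's (result + length) % length. Java's % keeps the sign of the dividend, so the slide's form still gives a negative index once result drops below -length, which can happen when int arithmetic overflows on a long key. The names Compression, polyHash, and compress, and the table length of 10,000, are assumptions.

```java
class Compression {
    // Polynomial hash in int arithmetic; overflow silently wraps around,
    // so the result may be negative for long keys.
    static int polyHash(String key, int a) {
        int result = 0;
        for (int i = 0; i < key.length(); i++) {
            result = result * a + key.charAt(i);
        }
        return result;
    }

    // Compresses any int, negative or not, into the range 0 .. length-1.
    static int compress(int hash, int length) {
        return Math.floorMod(hash, length);
    }

    public static void main(String[] args) {
        int length = 10_000; // hypothetical table length
        for (String word : new String[] {"spot", "triskaidekaphobia"}) {
            int raw = polyHash(word, 33);
            System.out.println(word + " -> raw " + raw
                    + ", index " + compress(raw, length));
        }
    }
}
```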
Collisions
- Occurs when 2 keys hash to same index
- Ideal hash spreads keys out evenly across table
  - As much as possible this limits collisions
  - Small table size important also, since RAM limited
- Unfortunately, there is no such thing as ideal hash
  - Must handle collisions if you want it to work
  - Ultimately, this could kill our O(1) efficiency buzz
Bucket Arrays
- Make hash table an array of linked list Nodes
  - First node in a linked list aliased by each array location
- Whenever we have collision, we “chain” Entry objects
  - Create new Node that stores the Entry
  - The linked list will have new Node at its front
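One way the bucket array might look in Java, as a sketch only. The class name ChainedHashMap is made up, String keys and values are an assumption, and the built-in String.hashCode() stands in for whatever hash function the table really uses.

```java
class ChainedHashMap {
    // A single key-value pair.
    static class Entry {
        final String key;
        String value;

        Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Linked list node that stores one Entry.
    static class Node {
        final Entry entry;
        final Node next;

        Node(Entry entry, Node next) {
            this.entry = entry;
            this.next = next;
        }
    }

    private final Node[] buckets; // each slot refers to the first Node of a chain, or null

    ChainedHashMap(int capacity) {
        buckets = new Node[capacity];
    }

    // Hash the key, then compress it into 0 .. buckets.length-1.
    private int index(String key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Walks the chain at the key's index; null means the search failed.
    String get(String key) {
        for (Node n = buckets[index(key)]; n != null; n = n.next) {
            if (n.entry.key.equals(key)) {
                return n.entry.value;
            }
        }
        return null;
    }

    // Replaces the value if the key exists; otherwise chains a new Node at the front.
    String put(String key, String value) {
        int i = index(key);
        for (Node n = buckets[i]; n != null; n = n.next) {
            if (n.entry.key.equals(key)) {
                String old = n.entry.value;
                n.entry.value = value;
                return old;
            }
        }
        buckets[i] = new Node(new Entry(key, value), buckets[i]);
        return null;
    }
}
```

Adding at the front keeps the insertion step itself O(1); the scan before it is only there so each key stays unique in the Map.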
Bucket Arrays
- But what if we have really bad hash?
  - Hashes to same index in every situation
  - All Entry objects now found in single linked list
  - O(n) execution times would now be required
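The worst case can be counted directly. This sketch uses the bad hash h(x) = 0 with bare int keys (no Entry, to keep it short) and reports how many Nodes a search visits. The names BadHashDemo and stepsToFind, and the table size of 16, are made up for illustration.

```java
class BadHashDemo {
    // Chain node holding only a key.
    static class Node {
        final int key;
        final Node next;

        Node(int key, Node next) {
            this.key = key;
            this.next = next;
        }
    }

    // Inserts keys 0 .. n-1 using h(x) = 0, then counts the Nodes visited
    // while searching for target.
    static int stepsToFind(int n, int target) {
        Node[] buckets = new Node[16];
        for (int k = 0; k < n; k++) {
            buckets[0] = new Node(k, buckets[0]); // every key lands at index 0, added at the front
        }
        int steps = 0;
        for (Node cur = buckets[0]; cur != null; cur = cur.next) {
            steps++;
            if (cur.key == target) {
                break;
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(stepsToFind(1000, 999)); // prints 1: last key added sits at the front
        System.out.println(stepsToFind(1000, 0));   // prints 1000: first key added is at the very end
    }
}
```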
Before Next Lecture…
- Continue week #12 assignment
  - Due at usual time, whatever that may be
- Read sections – of the book
  - Examine better approaches to handling collisions
- Consider what we should do in following situation: