Hashing & Hash Tables
Sets/Dictionaries Set - Our best efforts to date:
Easy Set Fast way to represent a set if 0-9 are the only possible values: one flag per value (e.g. an array of 10 booleans indexed by value). Could apply to letters A-J via mapping char → int. How could we apply the same strategy to all English words? Aa, Ab, Ac, Ad, Ae, Af, Ag… ???
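A minimal sketch of the "easy set", assuming the set stores one boolean flag per possible value. The names `DigitSet` and `letterToIndex` are hypothetical, and no bounds checking is done: callers must pass 0-9.

```cpp
#include <array>

// Set of digits 0-9: slot v is true iff v is in the set.
// Assumes every value passed in is in 0..9 (no bounds checks).
class DigitSet {
public:
    void insert(int v) { present_[v] = true; }
    void erase(int v) { present_[v] = false; }
    bool contains(int v) const { return present_[v]; }

private:
    std::array<bool, 10> present_{};  // value-initialized: all false (empty set)
};

// Letters 'A'-'J' reuse the same structure via the mapping char -> int.
int letterToIndex(char c) { return c - 'A'; }  // 'A' -> 0, ..., 'J' -> 9
```

Every operation is a single array access, which is why this is fast; the catch is that the array needs one slot per possible value, which is hopeless for all English words.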
Hashing Hash function: maps data onto a fixed-size value
Cryptographic Hashing Desirable traits: – Output is fixed size – Easy to compute – Output varies wildly with a small input change – One-way: infeasible to recover the input from the output
Hash Table Hash table: – Use a hash function to map values into array indexes – Constant time to find the index and check it
Hash Table Functions Desirable qualities – Return a number 0…(tablesize – 1), mapping values into array indexes – Efficiently computable, so finding the index takes constant time – Evenly distribute keys over the table: don't waste space (mapping is onto: every index has 1+ keys) and minimize collisions
Hash Table Functions Split roles – hash function vs. mapping to the table: – Hash function: evenly distribute keys over the space of unsigned ints – Table mapping: index = hash function's result % table size
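The split of roles can be sketched as two separate functions. This is an illustration only: `hashString` (a base-31 string hash) and `tableIndex` are hypothetical names, and any well-distributed hash could stand in for `hashString`.

```cpp
#include <cstddef>
#include <string>

// Role 1, hash function: spread keys over the whole range of unsigned ints.
// Hypothetical example hash; unsigned overflow simply wraps around.
unsigned int hashString(const std::string& key) {
    unsigned int h = 0;
    for (char c : key)
        h = h * 31 + static_cast<unsigned char>(c);
    return h;
}

// Role 2, table mapping: fold any hash value into a valid index 0..tableSize-1.
std::size_t tableIndex(unsigned int hashValue, std::size_t tableSize) {
    return hashValue % tableSize;
}
```

Keeping the roles apart means the same hash function can serve tables of any size (e.g. `tableIndex(hashString("apple"), 11)`); only the `%` step depends on the table.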
Optimal Hash Functions If all keys and the table size are known in advance, can compute an optimal hash… – Rarely the case
Hash Function - Integral For integral types: – Hash(x) = x – Table size should be prime Keys often have a pattern – if the pattern is not relatively prime to the table size, the indexes repeat a pattern too, e.g. even keys in a table of size 10 only ever land on even indexes: {0, 10, 20} {2, 12, 22} {4, 14, 24} {6, 16, 26} {8, 18, 28}
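To see the effect of a prime table size, this sketch counts how many distinct slots a patterned key set occupies under Hash(x) = x. The helper names `indexFor` and `slotsUsed` are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Identity hash, Hash(x) = x, mapped into the table with %.
std::size_t indexFor(unsigned int x, std::size_t tableSize) { return x % tableSize; }

// Count how many distinct table slots the given keys actually use.
std::size_t slotsUsed(const std::vector<unsigned int>& keys, std::size_t tableSize) {
    std::vector<bool> used(tableSize, false);
    std::size_t count = 0;
    for (unsigned int k : keys) {
        std::size_t i = indexFor(k, tableSize);
        if (!used[i]) {
            used[i] = true;
            ++count;
        }
    }
    return count;
}
```

With the 15 even keys 0, 2, …, 28, a table of size 10 uses only its 5 even slots, while a table of size 11 (prime) uses all 11, because 2 and 11 share no common factor.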
Hash Function - String String approach 1 – add up characters: for (int i = 0; i < key.length(); i++) hashVal += key[i]; Problem 1: What if TableSize is 10,000 and all keys are 8 or fewer characters long? Problem 2: What if keys often contain the same characters (“abc”, “bca”, etc.)?
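Both problems show up directly in a runnable version of approach 1. The function name `additiveHash` is hypothetical; characters are read as unsigned so the sum never goes negative.

```cpp
#include <cstddef>
#include <string>

// String approach 1: sum of the character codes.
unsigned int additiveHash(const std::string& key) {
    unsigned int hashVal = 0;
    for (std::size_t i = 0; i < key.length(); i++)
        hashVal += static_cast<unsigned char>(key[i]);
    return hashVal;
}
```

Assuming ASCII text (codes at most 127), a key of 8 or fewer characters sums to at most 8 × 127 = 1016, so a 10,000-slot table would only ever use indexes 0…1016. And because addition ignores order, anagrams such as "abc" and "bca" always collide.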
Hash Function - String String approach 2 – multiply each character by different powers of some number: – "apple": $\mathtt{a}\cdot 31^4 + \mathtt{p}\cdot 31^3 + \mathtt{p}\cdot 31^2 + \mathtt{l}\cdot 31^1 + \mathtt{e}\cdot 31^0$ (each letter stands for its character code) Efficiently do via bit shifting: for (int i = 0; i < key.length(); i++) hashVal = (hashVal << 6) ^ key[i]; – << 6 multiplies by 64, so this version uses powers of 64 rather than 31 – ^ is binary XOR, used in place of +
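Both forms of approach 2 can be written as short loops. The sketch below (function names hypothetical) evaluates the base-31 polynomial with Horner's rule, which needs one multiply per character instead of computing each power separately, and shows the shift/XOR variant next to it.

```cpp
#include <string>

// Base-31 polynomial hash via Horner's rule:
// ((((a*31 + p)*31 + p)*31 + l)*31 + e) == a*31^4 + p*31^3 + p*31^2 + l*31 + e
unsigned int polyHash31(const std::string& key) {
    unsigned int hashVal = 0;
    for (char c : key)
        hashVal = hashVal * 31 + static_cast<unsigned char>(c);  // overflow wraps
    return hashVal;
}

// Bit-shifting variant: << 6 multiplies by 64 (base 64, not 31),
// and ^ (XOR) mixes in each character instead of +.
unsigned int shiftXorHash(const std::string& key) {
    unsigned int hashVal = 0;
    for (char c : key)
        hashVal = (hashVal << 6) ^ static_cast<unsigned char>(c);
    return hashVal;
}
```

Unlike approach 1, character order now matters: "ab" and "ba" hash differently. Note that with a 32-bit unsigned int, `<< 6` pushes early characters out of the top after about 6 characters, so long keys are effectively hashed on their last few characters.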
Collisions Collision: two keys map to the same index: – 12 and
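Probing Linear Probing: if the home slot is taken, the value goes in the next available slot (wrapping around at the end of the table) Issue: – No longer constant-time access: a lookup may have to step through several slots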
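A minimal sketch of linear-probing insertion, assuming non-negative int keys, Hash(x) = x, a fixed table size, and a table that never fills completely. The class name is hypothetical, and duplicate keys are not checked here.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Linear-probing table of non-negative ints, Hash(x) = x.
// Assumes at least one slot is always free (load factor < 1).
class LinearProbingTable {
public:
    explicit LinearProbingTable(std::size_t size) : slots_(size) {}

    // Place x and return the slot index it ended up in.
    std::size_t insert(int x) {
        std::size_t i = static_cast<std::size_t>(x) % slots_.size();  // home slot
        while (slots_[i].has_value())     // occupied: try the next slot
            i = (i + 1) % slots_.size();  // wrap around at the end
        slots_[i] = x;
        return i;
    }

    std::optional<int> at(std::size_t i) const { return slots_[i]; }

private:
    std::vector<std::optional<int>> slots_;  // empty optional = empty slot
};
```

In a size-10 table, inserting 12, 22, 32 fills slots 2, 3, 4: each later key has to walk past the earlier ones, which is exactly where constant-time access is lost.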
Load Factor Load factor λ = number of items / table size – Must be < 1 for linear probing – Performance drops rapidly past 0.5
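For a rough sense of why performance falls off past 0.5: the classic approximations (from Knuth's analysis of linear probing, assuming a uniformly random hash function) for the expected number of probes are:

```latex
\lambda = \frac{\text{number of items}}{\text{table size}}, \qquad
E[\text{probes, successful search}] \approx \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right), \qquad
E[\text{probes, unsuccessful search / insert}] \approx \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)
```

At λ = 0.5 these give about 1.5 and 2.5 probes; at λ = 0.9 they grow to about 5.5 and 50.5.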
Clustering Say we go to put in 3: now slots 2-5 are blocked – anything hashing to 2-6 will fill slot 6, growing the cluster further
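The clustering effect can be checked with a small helper (hypothetical name `probeTarget`) that reports where linear probing would place a key, given which slots are occupied. It assumes at least one slot is free.

```cpp
#include <cstddef>
#include <vector>

// Return the slot linear probing would use for a key whose home slot is `home`.
// Assumes at least one slot is unoccupied, so the loop terminates.
std::size_t probeTarget(const std::vector<bool>& occupied, std::size_t home) {
    std::size_t i = home;
    while (occupied[i])
        i = (i + 1) % occupied.size();
    return i;
}
```

With slots 2-5 occupied in a size-10 table, every home slot from 2 through 6 yields slot 6, while home slot 7 still yields 7. A cluster of length k attracts keys from k + 1 home slots, so big clusters tend to grow faster than small ones (often called primary clustering).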
Finding Probing used again to find keys: Find 32 – yep, it's there Find 42 – nope: probing reached an empty slot, so it must not be in the table
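Searching follows the same probe sequence as insertion. This sketch (hypothetical function `findKey`, Hash(x) = x, non-negative int keys) stops at the first empty slot, because the key would have been placed there or earlier if it existed.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Search a linear-probing table (Hash(x) = x) for x.
bool findKey(const std::vector<std::optional<int>>& slots, int x) {
    std::size_t i = static_cast<std::size_t>(x) % slots.size();  // home slot
    for (std::size_t probes = 0; probes < slots.size(); ++probes) {
        if (!slots[i].has_value()) return false;  // empty slot: x is not present
        if (*slots[i] == x) return true;
        i = (i + 1) % slots.size();
    }
    return false;  // checked every slot of a full table
}
```

Assuming, for illustration, that 12, 22, 32 sit in slots 2, 3, 4 of a size-10 table: finding 32 probes slots 2, 3, 4 and succeeds; finding 42 probes 2, 3, 4, hits empty slot 5, and gives up.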
Deletion Say we delete 22 by simply emptying its slot: Find 32… not there! The search for 32 hits 22's now-empty slot and stops before reaching 32
Tombstone Special value indicating something was there – Search knows to continue past it – Insertion can reuse that slot – But insertion must still continue the search to avoid adding a duplicate
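A minimal sketch of deletion with tombstones, assuming non-negative int keys, Hash(x) = x, a fixed table size, and at least one truly empty slot so searches terminate. `TombstoneTable` and its slot states are hypothetical names for illustration.

```cpp
#include <cstddef>
#include <vector>

// Linear-probing table with tombstones (Deleted) instead of emptying slots.
class TombstoneTable {
public:
    explicit TombstoneTable(std::size_t size)
        : state_(size, State::Empty), keys_(size, 0) {}

    bool contains(int x) const { return findSlot(x) != npos; }

    void insert(int x) {
        if (contains(x)) return;  // full search past tombstones: no duplicates
        std::size_t i = home(x);
        while (state_[i] == State::Full) i = next(i);  // Empty or Deleted is reusable
        state_[i] = State::Full;
        keys_[i] = x;
    }

    void erase(int x) {
        std::size_t i = findSlot(x);
        if (i != npos) state_[i] = State::Deleted;  // leave a tombstone, not Empty
    }

private:
    enum class State { Empty, Full, Deleted };
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    std::size_t home(int x) const { return static_cast<std::size_t>(x) % state_.size(); }
    std::size_t next(std::size_t i) const { return (i + 1) % state_.size(); }

    // Probe from x's home slot, skipping tombstones; stop only at an Empty slot.
    std::size_t findSlot(int x) const {
        std::size_t i = home(x);
        for (std::size_t n = 0; n < state_.size() && state_[i] != State::Empty; ++n) {
            if (state_[i] == State::Full && keys_[i] == x) return i;
            i = next(i);
        }
        return npos;
    }

    std::vector<State> state_;
    std::vector<int> keys_;
};
```

After inserting 12, 22, 32 into a size-10 table and erasing 22, a search for 32 passes over the tombstone in slot 3 and still finds 32. A later insert of 42 reuses that tombstoned slot, but only after confirming 42 is not already stored further along.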