1
1 Hash Tables Gordon College CS212
2
2 Hash Tables
Recall order of magnitude of searches
– Linear search: O(n)
– Binary search: O(log₂ n)
– Balanced binary tree search: O(log₂ n)
– Unbalanced binary tree can degrade to O(n)
3
3 Hash Tables
In some situations faster search is needed
– Solution is to use a hash function
– Value of the key field is given to the hash function
– Location in a hash table is calculated
Like an array, but much better: no need to set aside space for every possible key
4
4 Hash Functions
A hash function is a mapping from key to index
Simple function: take the key mod (%) an integer, here the table size
int h(int i) { return i % maxSize; }
Note: maxSize is the maximum number of locations in the table
5
5 Hash Function Access
Note that we have traded space for speed: faster access comes at the cost of wasted space
– Table must be considerably larger than the number of items anticipated
6
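6 Hash Function Access
Example: 7-digit serial number
Using the serial number directly as an index needs 10 million records*
* Not practical to set aside this much space when in reality you are stocking at most a few thousand records
Why 10 million records? Each of the 7 digits can independently take any of 10 values, so there are 10^7 = 10,000,000 possible serial numbers
(The r-permutation count n!/(n-r)! = 10!/(10-7)! = 604,800 applies only when no digit may repeat, so it is not the right count here)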
7
7 Hash Function (Mapping)
Example: 7-digit serial number
Use only 10000 slots
Hashing (mapping) function: unsigned int Hf(int key)
Hf(1234567) = 1234567 % 10000 = 4567
Equivalently, with integer division: 1234567 / 10000 = 123 (the fractional part of 123.4567 is discarded), and 1234567 - (123 * 10000) = 4567
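The mapping on this slide can be sketched in a few lines. The constant name kNumSlots and the helper HfByDivision are assumptions made for this illustration; only Hf and the 10000-slot table come from the slide, and keys are assumed non-negative.

```cpp
// Assumed table size for this sketch; the slide uses 10000 slots.
const unsigned int kNumSlots = 10000;

// Map a non-negative integer key to a slot index in [0, kNumSlots).
unsigned int Hf(int key) {
    return static_cast<unsigned int>(key) % kNumSlots;
}

// Same result computed the long way: integer division drops the
// fractional part, and the remainder is what is left over.
unsigned int HfByDivision(int key) {
    unsigned int k = static_cast<unsigned int>(key);
    unsigned int quotient = k / kNumSlots;   // 1234567 / 10000 == 123
    return k - quotient * kNumSlots;         // 1234567 - 1230000 == 4567
}
```

Both functions agree for every non-negative key; the second only spells out what the % operator does.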
8
8 Hash Function (Mapping)
Design considerations
– Efficient
– Minimize collisions
– Produce uniformly distributed mappings (helps minimize collisions)
– Must be able to deal with int, char, string, etc. types for keys
– Must be able to associate a hash function with a container
9
9 Function Objects
Can pass a function to a function
Can use function objects
template <typename T>
class functionobject {
public:
    returntype operator() (arguments) const { return returnvalue; }
    …….
};
10
10 Function Objects
Example function class: less than
template <typename T>
class lessThan {
public:
    bool operator() (const T& x, const T& y) const { return x < y; }
};
11
11 Function Objects
Example function class use
template <typename T, typename Compare>
void insertionSort(vector<T>& v, Compare comp)
{
    int i, j, n = v.size();
    T temp;
    …..
}
Called: insertionSort(v, lessThan<T>());   // T is the element type of v
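The slide elides the body of insertionSort, so the following is only one plausible way to write it, not the original code. The greaterThan class is an extra added here to show that swapping the function object changes the ordering without touching the sort.

```cpp
#include <cstddef>
#include <vector>

template <typename T>
class lessThan {
public:
    bool operator()(const T& x, const T& y) const { return x < y; }
};

// Added for this sketch only: the opposite ordering.
template <typename T>
class greaterThan {
public:
    bool operator()(const T& x, const T& y) const { return x > y; }
};

// Insertion sort that asks comp(a, b) whether a must come before b.
template <typename T, typename Compare>
void insertionSort(std::vector<T>& v, Compare comp) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        T temp = v[i];
        std::size_t j = i;
        // Shift elements right until temp's position is found.
        while (j > 0 && comp(temp, v[j - 1])) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = temp;
    }
}
```

Calling insertionSort(v, lessThan<int>()) sorts a vector<int> in ascending order, and insertionSort(v, greaterThan<int>()) sorts it in descending order.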
12
12 Function Objects
Example function class use (as seen with the set container)
template <typename T>
class lessThan {
public:
    bool operator() (const T& x, const T& y) const { return x < y; }
};
// T below stands for the element type of arr
set<T, lessThan<T> > A(arr, arr+arrSize);
for (set<T, lessThan<T> >::iterator ii = A.begin(); ii != A.end(); ii++)
    cout << *ii << " ";
cout << endl;
13
13 Collisions: Hash Function Access Problem
Collisions are possible, depending on the number of slots and the size of the key space being mapped
14
14 Collisions: Hash Function Access Problem
Problem: the same value is returned by h(i) for different values of i
– Called collisions
Simple solution: linear probing
– Linear search begins at the collision location
– Continues until an empty slot is found for insertion
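A minimal sketch of insertion with linear probing follows. It assumes non-negative int keys, uses -1 as an "empty" marker, and wraps around at the end of the table; the names kEmpty and probeInsert are made up for this example.

```cpp
#include <cstddef>
#include <vector>

const int kEmpty = -1;  // assumed sentinel; valid keys are non-negative

// Insert key using linear probing.
// Returns the slot used, or -1 if the table is full.
int probeInsert(std::vector<int>& table, int key) {
    const std::size_t n = table.size();
    if (n == 0) return -1;
    const std::size_t start = static_cast<std::size_t>(key) % n;
    for (std::size_t step = 0; step < n; ++step) {
        const std::size_t slot = (start + step) % n;  // wrap around
        if (table[slot] == kEmpty || table[slot] == key) {
            table[slot] = key;
            return static_cast<int>(slot);
        }
    }
    return -1;  // every slot is occupied by another key
}
```

With a 7-slot table, the keys 10, 17 and 24 all hash to slot 3, so they end up in slots 3, 4 and 5 respectively.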
15
15 Linear Probing
16
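16 Hash Functions
Retrieving a value: linear probe until found
– If an empty slot is encountered, then the value is not in the table
What if deletions are permitted?
– A deleted slot must be marked as "deleted" rather than reset to empty; otherwise a later probe would stop there and wrongly conclude that a value stored further along is not in the table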
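One way to handle deletions is to give each slot a three-way state. The sketch below is an illustration under that assumption, not the course's implementation; the class name ProbingTable and the state names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

enum SlotState { EMPTY, OCCUPIED, DELETED };

struct Slot {
    int key;
    SlotState state;
    Slot() : key(0), state(EMPTY) {}
};

class ProbingTable {
public:
    explicit ProbingTable(std::size_t capacity) : slots_(capacity) {}

    // Returns the slot holding key, or -1 if key is not in the table.
    int find(int key) const {
        const std::size_t n = slots_.size();
        if (n == 0) return -1;
        const std::size_t start = home(key);
        for (std::size_t step = 0; step < n; ++step) {
            const std::size_t i = (start + step) % n;
            if (slots_[i].state == EMPTY) return -1;  // never used: stop
            if (slots_[i].state == OCCUPIED && slots_[i].key == key)
                return static_cast<int>(i);
            // DELETED or a different key: keep probing
        }
        return -1;
    }

    // Returns false only when the table has no free slot.
    bool insert(int key) {
        if (find(key) >= 0) return true;  // already present
        const std::size_t n = slots_.size();
        if (n == 0) return false;
        const std::size_t start = home(key);
        for (std::size_t step = 0; step < n; ++step) {
            Slot& s = slots_[(start + step) % n];
            if (s.state != OCCUPIED) {    // EMPTY or DELETED can be reused
                s.key = key;
                s.state = OCCUPIED;
                return true;
            }
        }
        return false;
    }

    // Marks the slot as DELETED so later probes continue past it.
    bool erase(int key) {
        const int i = find(key);
        if (i < 0) return false;
        slots_[static_cast<std::size_t>(i)].state = DELETED;
        return true;
    }

private:
    std::size_t home(int key) const {
        return static_cast<unsigned int>(key) % slots_.size();
    }
    std::vector<Slot> slots_;
};
```

After inserting 10, 17 and 24 into a 7-slot table and erasing 17, a search for 24 still succeeds because the probe passes over the slot marked DELETED instead of stopping there.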
17
17 Hash Functions
Improved performance strategies:
– Increase table capacity (fewer collisions)
– Use a different collision resolution technique
– Devise a different hash function
Hash table capacity
– Table size should be 1.5 to 2 times the number of items to be stored
– Otherwise the probability of collisions is too high
18
18 Other Collision Strategies
Linear probing can result in primary clustering
Consider: quadratic probing
– Probe sequence from location i is i + 1, i - 1, i + 4, i - 4, i + 9, i - 9, … (taken modulo the table size)
– Secondary clusters can still form
Double hashing
– Use a second hash function to determine the probe sequence
hF(key) --> index
hF2(key) --> step to the next index (second hash function)
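The two probe sequences can be made concrete with small generator functions. The quadratic version follows the sequence on the slide. The double-hashing version uses one common formulation (step = 1 + key % (tableSize - 1)), which is an assumption and may differ from what the slide intends; both function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// First `count` probes after a collision at `home`:
// home+1, home-1, home+4, home-4, ... wrapped to the table size.
// Assumes home < tableSize.
std::vector<std::size_t> quadraticProbes(std::size_t home,
                                         std::size_t tableSize,
                                         std::size_t count) {
    std::vector<std::size_t> probes;
    if (tableSize == 0) return probes;
    for (std::size_t k = 1; probes.size() < count; ++k) {
        const std::size_t offset = (k * k) % tableSize;
        probes.push_back((home + offset) % tableSize);
        if (probes.size() < count)
            probes.push_back((home + tableSize - offset) % tableSize);
    }
    return probes;
}

// First `count` slots visited by double hashing, starting with the
// home slot. The step size comes from a second function of the key,
// so keys that share a home slot usually follow different paths.
// A prime tableSize lets the sequence reach every slot.
std::vector<std::size_t> doubleHashProbes(unsigned int key,
                                          std::size_t tableSize,
                                          std::size_t count) {
    std::vector<std::size_t> probes;
    if (tableSize < 2) return probes;
    std::size_t index = key % tableSize;               // first hash
    const std::size_t step = 1 + key % (tableSize - 1); // second hash
    for (std::size_t j = 0; j < count; ++j) {
        probes.push_back(index);
        index = (index + step) % tableSize;
    }
    return probes;
}
```

In an 11-slot table, keys 25 and 14 both start at slot 3, but double hashing sends them along 3, 9, 4, 10 and 3, 8, 2, 7 respectively, which is what breaks up the clusters.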
19
19 Collision Strategies
Chaining
– Table is a list or vector of head nodes to linked lists
– When an item hashes to a location, it is added to that linked list
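A chained table can be sketched as a vector of std::list buckets. This is a minimal illustration for int keys; the class name ChainedTable and its member names are assumptions, not part of the course code.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

class ChainedTable {
public:
    explicit ChainedTable(std::size_t numBuckets)
        : buckets_(numBuckets == 0 ? 1 : numBuckets) {}

    // Adds key to the chain of its bucket; false if already present.
    bool insert(int key) {
        std::list<int>& chain = buckets_[bucketOf(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end())
            return false;
        chain.push_back(key);
        return true;
    }

    bool contains(int key) const {
        const std::list<int>& chain = buckets_[bucketOf(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Removing a key just unlinks it; no "deleted" marker is needed.
    bool erase(int key) {
        std::list<int>& chain = buckets_[bucketOf(key)];
        std::list<int>::iterator it =
            std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        return true;
    }

    std::size_t chainLength(std::size_t bucket) const {
        return buckets_[bucket].size();
    }

private:
    std::size_t bucketOf(int key) const {
        return static_cast<unsigned int>(key) % buckets_.size();
    }
    std::vector<std::list<int> > buckets_;
};
```

Colliding keys simply share a chain, so the table never fills up, but a lookup costs time proportional to the length of the chain it lands in.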
20
20 Chaining
21
21 Improving the Hash Function
Ideal hash function
– Simple to evaluate (fast)
– Scatters items uniformly throughout the table
Modulo arithmetic alone is not so good for strings
– Possible to manipulate the numeric (ASCII) values of the first and last characters of a name
22
22 Hash Function (basic mapping)
class hFintID {
public:
    unsigned int operator() (int item) const
    { return (unsigned int) item % 10000; }
};
hFintID hf;
hf(12341234) returns 1234
23
23 Hash Function (better)
class hFint {
public:
    unsigned int operator() (int item) const
    {
        unsigned int value = (unsigned int) item;
        value *= value;          // square the key
        value /= 256;            // discard low order 8 bits
                                 // (division performs a shift right)
        return value % 65536;    // keep the next 16 bits
    }
};
Midsquare technique mixes up the digits in the serial number
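A short comparison shows why the midsquare version helps. The sketch repeats both function classes from the slides and assumes unsigned int is at least 32 bits wide; the choice of test keys is an illustration only.

```cpp
// Basic mapping from the earlier slide: keeps the last four digits.
class hFintID {
public:
    unsigned int operator()(int item) const {
        return static_cast<unsigned int>(item) % 10000;
    }
};

// Midsquare mapping: square the key, drop the low 8 bits,
// then keep the next 16 bits.
class hFint {
public:
    unsigned int operator()(int item) const {
        unsigned int value = static_cast<unsigned int>(item);
        value *= value;        // unsigned overflow wraps, which is fine here
        value /= 256;
        return value % 65536;
    }
};
```

Keys such as 10000, 20000 and 30000 all map to 0 under hFintID because they share their last four digits, while hFint spreads them over three different slots.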
24
24 String Hash Functions
class hFstring {
public:
    unsigned int operator() (const string & item) const
    {
        unsigned int prime = 2049982463;
        unsigned int n = 0;   // unsigned, so overflow wraps instead of being undefined
        for (string::size_type i = 0; i < item.length(); i++)
            n = n*8 + (unsigned char) item[i];
        return n % prime;
    }
};
GOAL: random distribution
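The string hash can be tried out on short inputs where the arithmetic is easy to follow by hand. This sketch uses an unsigned accumulator, which is an adjustment made here to avoid signed overflow; the constant is the one given on the slide.

```cpp
#include <string>

class hFstring {
public:
    unsigned int operator()(const std::string& item) const {
        const unsigned int prime = 2049982463u;  // constant from the slide
        unsigned int n = 0;
        for (std::string::size_type i = 0; i < item.length(); ++i) {
            // Shift the running value left by 3 bits, then add the
            // character code. Unsigned arithmetic wraps on overflow.
            n = n * 8 + static_cast<unsigned char>(item[i]);
        }
        return n % prime;
    }
};
```

For "ab" the value is 97 * 8 + 98 = 874, and for "ba" it is 98 * 8 + 97 = 881, so the order of the characters matters. A caller would still reduce the result modulo the table size to get a slot index.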
25
25 Custom Hash Functions
class hfCode {
public:
    unsigned int operator() (const code & item) const
    { return (unsigned int) item.getNum() % NumofSlots; }
};
FILE0000.CHK, FILE0001.CHK, FILE0002.CHK
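To show a custom hash function attached to a container, the sketch below swaps in the standard std::unordered_set (C++11) in place of whatever hash container the course uses. The code class here is a hypothetical stand-in with only a number field, since its real definition is not shown on the slide.

```cpp
#include <cstddef>
#include <unordered_set>

// Hypothetical stand-in for the slide's `code` type.
class code {
public:
    explicit code(int num) : num_(num) {}
    int getNum() const { return num_; }
    // The container needs equality to tell colliding keys apart.
    bool operator==(const code& other) const { return num_ == other.num_; }
private:
    int num_;
};

// Hash function object. std::unordered_set reduces the result to a
// bucket index itself, so no "% NumofSlots" is applied here.
class hfCode {
public:
    std::size_t operator()(const code& item) const {
        return static_cast<unsigned int>(item.getNum());
    }
};

// The hash function is associated with the container as a
// template argument.
typedef std::unordered_set<code, hfCode> CodeSet;
```

A CodeSet then behaves like any other set: inserting code(1) twice stores it once, and count(code(2)) reports whether that item is present.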
26
26 Search Algorithms
Sequential Search
- search O(n) (fairly slow)
+ good when data set size is small and does not have to be sorted
Binary Search (sorted vector)
+ search O(log n) [much faster]
+ low cost when it comes to space
- however, requires data be sorted
- not good when the data set is very dynamic (sorting overhead)
Binary Search Tree
+ search O(log n) when the tree is balanced
+ can scan data in order
- higher cost when it comes to space (various pointers)
Hashing
+ search O(1) on average [fastest]
- higher cost when it comes to space (depends on method)
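The four approaches can be lined up using standard library pieces. This is a sketch with stand-ins: std::set is typically implemented as a balanced search tree, and std::unordered_set (C++11) is a hash table; the wrapper function names are made up for this example.

```cpp
#include <algorithm>
#include <set>
#include <unordered_set>
#include <vector>

// Sequential search: O(n), data need not be sorted.
bool sequentialSearch(const std::vector<int>& v, int key) {
    return std::find(v.begin(), v.end(), key) != v.end();
}

// Binary search: O(log n), but the vector must already be sorted.
bool binarySearchSorted(const std::vector<int>& sorted, int key) {
    return std::binary_search(sorted.begin(), sorted.end(), key);
}

// Tree search: O(log n), and the set can be scanned in order.
bool treeSearch(const std::set<int>& tree, int key) {
    return tree.find(key) != tree.end();
}

// Hash search: O(1) on average, no ordering among the elements.
bool hashSearch(const std::unordered_set<int>& table, int key) {
    return table.find(key) != table.end();
}
```

All four give the same answer for the same data; they differ in how the data must be organized beforehand and in how the cost grows with the number of items.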