Hashing
Hash Tables - Introduction
• A structure that offers fast insertion and searching
• Insertion and searching are almost O(1)
• Hashing - a range of key values is transformed into a range of array index values
Hashing - Introduction
• In a dictionary, if the main key were the array index, searching for and inserting items would be very fast.
• Example: Empdata[1000], an employee database with index = employee number
  - search for the employee with emp. number = 500
  - Answer: Empdata[500]
  - Running Time: O(1)
Hash Tables
• The previous example was easy because the employee number is an integer.
• What if the main key is a word made of letters of the English alphabet (e.g. a last name)?
• How can the main key be mapped to an array index?
Hash Tables
• Sum of Digits Method
  - map the letters a-z to the numbers 1 to 26 (a=1, b=2, c=3, etc.)
  - add up the values of the letters
  - For example, “cats” (c=3, a=1, t=20, s=19): 3+1+20+19 = 43
  - “cats” will be stored using index = 43
Hash Tables
• Problem
  - Too many words share the same index
  - “was”, “tin”, “give”, “tend”, “moan”, “tick” and several other words also add up to 43
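
The sum-of-digits method and its collision problem can be checked with a few lines of Python. This is only a sketch: the function name `letter_sum` is made up for illustration, and it assumes lowercase English letters only.

```python
def letter_sum(word):
    """Sum-of-digits index: a=1, b=2, ..., z=26, all added together."""
    return sum(ord(ch) - ord("a") + 1 for ch in word.lower())


# The words listed on the slide, plus "cats" itself.
words = ["cats", "was", "tin", "give", "tend", "moan", "tick"]
indices = {word: letter_sum(word) for word in words}

# Every one of these words lands on index 43.
print(indices)
```

Because the largest possible sum for a short word is small, many different words are forced onto the same few indices.
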
Hashing
• Another Method (Multiply by Powers)
  - an integer in the decimal system is a sum of its digits multiplied by powers of 10: d3*10^3 + d2*10^2 + d1*10^1 + d0*10^0 for a four-digit number
• Can do the same thing with words
  - Use 27 as the base (26 letters + blank)
Hashing
• “cats” = 3*27^3 + 1*27^2 + 20*27^1 + 19*27^0 = 60,337
• unique index for every word
• main drawback: takes too much space
  - For words of up to 10 letters, the leading term 27^9 alone is over 7,000,000,000,000 (7,000 gigabytes)
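
The multiply-by-powers arithmetic can be reproduced as follows. This is a sketch with a made-up function name (`power_index`); it assumes a=1 ... z=26 with 0 reserved for the blank, as on the slide.

```python
def power_index(word, base=27):
    """Read the word as a base-27 number (a=1 ... z=26, 0 is the blank)."""
    total = 0
    for ch in word.lower():
        # Horner's rule: shift the digits collected so far, then add the new one.
        total = total * base + (ord(ch) - ord("a") + 1)
    return total


print(power_index("cats"))  # 3*27**3 + 1*27**2 + 20*27 + 19 = 60337
print(27 ** 9)              # 7625597484987, the size of the leading term for 10 letters
```
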
Hashing
• While the scheme generates a unique index for every word, it also reserves cells for non-words (aaaaaa, zzzzzzz, aaacccc, etc.)
• Goal: compress the huge range of numbers produced by the multiply-by-powers system into a smaller (reasonably sized) array
Hashing
• Hashing function - the process of converting a number in a large range into a number in a smaller range.
Hashing
• Size of the smaller range
  - twice the size of the data set (2s, where s is the number of items)
  - for 50,000 words, an array of 100,000 elements
Hashing
• Hash Function
  - achieved by using the modulo operator (returns the remainder)
  - for example, 33 mod 10 = 3
  - LargeNumber mod SmallRange
Hashing
• Hugenumber = C0*27^9 + C1*27^8 + C2*27^7 + … + C9*27^0
• arraysize = numberofwords * 2
• arrayindex = Hugenumber mod arraysize
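
One possible way to code the three formulas above is sketched here. The name `hash_word` is hypothetical. Taking the remainder after every letter, rather than once at the end, is a choice made for this sketch: it yields the same index as Hugenumber mod arraysize but keeps the intermediate values small.

```python
def hash_word(word, array_size):
    """Return (base-27 value of word) mod array_size."""
    index = 0
    for ch in word.lower():
        letter = ord(ch) - ord("a") + 1  # a=1 ... z=26
        # Reducing at each step is equivalent to reducing the full number once.
        index = (index * 27 + letter) % array_size
    return index


number_of_words = 50_000
array_size = number_of_words * 2      # twice the data set size, as on the slide

print(hash_word("cats", array_size))  # 60337 is already below 100000, so it is unchanged
print(hash_word("dictionary", array_size))
```
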
Hashing - Collisions
• Hashing still carries the risk of two elements receiving the same index (although it does better than the sum-of-digits method).
• Collision - two elements with the same index after hashing
Collisions
• Two approaches to handling collisions
  - Open Addressing
  - Separate Chaining
• Open Addressing - find the next available free cell in the array
• Separate Chaining - install a linked list at each index
Open Addressing
• Three Types
  - Linear Probing, Quadratic Probing, and Double Hashing
• Linear Probing
  - Finds the next available cell by stepping one cell at a time (x+1, x+2, etc.)
  - leads to clustering
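
Linear probing might look like the sketch below. The class name and methods are illustrative and not taken from either textbook. It assumes non-negative integer keys, uses `key % size` as the hash, and leaves out deletion.

```python
class LinearProbingTable:
    """Open addressing with linear probing (integer keys, no deletion)."""

    def __init__(self, size):
        self.slots = [None] * size

    def insert(self, key):
        size = len(self.slots)
        home = key % size
        for step in range(size):
            probe = (home + step) % size  # x, x+1, x+2, ... wrapping around
            if self.slots[probe] is None or self.slots[probe] == key:
                self.slots[probe] = key
                return probe
        raise RuntimeError("table is full")

    def find(self, key):
        size = len(self.slots)
        home = key % size
        for step in range(size):
            probe = (home + step) % size
            if self.slots[probe] is None:
                return None               # reached an empty cell: key is absent
            if self.slots[probe] == key:
                return probe
        return None


table = LinearProbingTable(10)
print(table.insert(13))  # 3 (home cell)
print(table.insert(23))  # 4 (cell 3 is taken)
print(table.insert(33))  # 5 (cells 3 and 4 are taken)
print(table.insert(4))   # 6 (home cell 4 is inside the cluster)
```

The last insert shows clustering: key 4 did not collide with any key sharing its home cell, yet it still had to walk past the run built up by 13, 23 and 33.
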
Clustering
• Quadratic Probing
  - Finds the next available cell using the squares of the step number as offsets (x+1, x+4, x+9, etc.)
• Double Hashing
  - Hash the key again with a different hash function to find the next free cell
  - 2nd hash: step size
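
The two probe sequences can be compared side by side. This is a sketch under assumptions: the function names are invented, the table size 11 is chosen only because it is prime, and the second hash `5 - (key % 5)` is one commonly used form that never returns 0. The slides do not specify a second hash function.

```python
def quadratic_probes(home, size, count):
    """Cells visited by quadratic probing: home, home+1, home+4, home+9, ..."""
    return [(home + step * step) % size for step in range(count)]


def double_hash_probes(key, size, count):
    """Cells visited by double hashing: the step size comes from a second hash."""
    home = key % size
    step = 5 - (key % 5)  # assumed second hash; always in 1..5, never 0
    return [(home + i * step) % size for i in range(count)]


print(quadratic_probes(3, 11, 4))     # [3, 4, 7, 1]
print(double_hash_probes(25, 11, 4))  # [3, 8, 2, 7]
print(double_hash_probes(36, 11, 4))  # [3, 7, 0, 4]
```

Keys 25 and 36 share the home cell 3 but follow different paths afterwards, which is how double hashing avoids the clusters that a fixed step sequence produces.
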
Separate Chaining
• A linked list is installed at each array index, so that entries whose keys hash to the same index are attached to the same linked list
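
A compact sketch of separate chaining follows. For brevity a Python list stands in for each linked list, and the class name `ChainedTable` is made up. The integer keys are used so that the bucket positions are easy to follow.

```python
class ChainedTable:
    """Separate chaining: one chain of (key, value) pairs per array index."""

    def __init__(self, size):
        # A Python list plays the role of the linked list at each index.
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # key already present: overwrite
                return
        bucket.append((key, value))       # collision or empty chain: append

    def find(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return None


table = ChainedTable(10)
table.insert(13, "first")
table.insert(23, "second")   # same index as 13, so it joins the same chain
print(table.find(23))        # second
print(len(table.buckets[3])) # 2
```
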
Hashing
• Read Chapter 5 (Data Structures and Algorithms in C by Weiss)
• Chapter 7 (in the Goodrich and Tamassia book)
• Hash function implementations are presented in these chapters
Summary Notes
Data Structures and Algorithms
• Conceptual Approach
• Java and C implementations are presented in both books
• For further studies, focus more on the mathematical aspects (look at theorems & propositions)
  - Proving
When to use what
• General Purpose Data Structures
  - arrays, linked lists, trees, and hash tables
  - used to store and retrieve data using key values
  - applications: can be used for storing personnel records, inventories, contact lists, etc.
General Purpose Data Structures
• Arrays
  Best used:
  - when the amount of data is reasonably small
  - when the amount of data is predictable in advance
General Purpose Data Structures
• Linked Lists
  - when the amount of data stored cannot be predicted
  - when data will be frequently inserted and deleted
• Binary Search Trees
  - used when arrays or linked lists are too slow
  - O(log N): insertion, searching, deletion (when the tree stays balanced)
General Purpose Data Structures
• Hash Tables
  - fastest of these structures for insertion and searching by key (almost O(1))
  - used in spell checkers and as symbol tables in compilers
  - may require additional memory for open addressing implementations
Special Purpose Data Structures
• Stacks, Queues (Priority Queues)
• used by a computer program to aid in carrying out some algorithm
• For example, in graph algorithms, stacks and queues were used
• Abstract Data Types
  - implemented by a more fundamental data structure (array, linked list)
  - conceptual aids
Special Purpose Data Structures
• Stacks
  - used when you want to access the last data item inserted (LIFO structure)
  - implemented using an array or a linked list depending on size
• Queues
  - used when you want to access the first data item inserted (FIFO structure)
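
The LIFO and FIFO behaviour can be seen in a short sketch. Here a Python list acts as the stack and `collections.deque` as the queue. These are convenient stand-ins chosen for this example, not the array or linked-list implementations from the textbooks.

```python
from collections import deque

items = ["a", "b", "c"]

# Stack: the last item pushed is the first one removed (LIFO).
stack = []
for item in items:
    stack.append(item)
last_in = stack.pop()
print(last_in)   # c

# Queue: the first item added is the first one removed (FIFO).
queue = deque()
for item in items:
    queue.append(item)
first_in = queue.popleft()
print(first_in)  # a
```
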
Graphs
• Unique data structure
• Directly model real-world situations (maps, flights and airports, etc.)
• structure of the graph reflects the structure of the problem
• main choice is representation: adjacency list or adjacency matrix
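
The two representations can be built from the same edge list, as sketched below. The vertex names A to D and the edges are invented for illustration, and the graph is assumed to be undirected.

```python
# Hypothetical undirected graph: each pair is one edge.
edges = [("A", "B"), ("A", "C"), ("B", "D")]
vertices = sorted({v for edge in edges for v in edge})

# Adjacency list: each vertex maps to the list of its neighbours.
adj_list = {v: [] for v in vertices}
for u, v in edges:
    adj_list[u].append(v)
    adj_list[v].append(u)

# Adjacency matrix: cell [i][j] is 1 when vertices i and j are connected.
position = {v: i for i, v in enumerate(vertices)}
n = len(vertices)
adj_matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    adj_matrix[position[u]][position[v]] = 1
    adj_matrix[position[v]][position[u]] = 1

print(adj_list)
for row in adj_matrix:
    print(row)
```

The list stores only the edges that exist, while the matrix always uses n*n cells but answers "are these two vertices connected?" with a single lookup.
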
Sorting
• For limited data elements (up to entries), insertion sort may be sufficient
• When insertion sort gets bogged down, can use merge sort or quick sort (merge sort, however, requires additional memory)
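
For reference, insertion sort can be written in a few lines. This sketch returns a sorted copy instead of sorting in place, which is a choice made here for easy testing.

```python
def insertion_sort(items):
    """Return a sorted copy of items using insertion sort (O(N^2) comparisons in the worst case)."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one cell to the right to open a gap.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```
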
Sorting - Running Times