Hashing. Hash Tables - Introduction


1 Hashing

2 Hash Tables - Introduction
• A structure that offers fast insertion and searching
• Insertion and searching are almost O(1)
• Hashing: a range of key values is transformed into a range of array index values

3 Hashing - Introduction
• In a dictionary, if the main key were the array index, searching and inserting items would be very fast.
• Example: Empdata[1000], an employee database where index = employee number
  - Search for the employee with employee number 500
  - Answer: Empdata[500]
  - Running time: O(1)
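
A minimal Python sketch of the direct-indexing idea on this slide. The list name `empdata` mirrors the slide's Empdata[1000]; the helper names and the record contents are assumptions made up for illustration.

```python
# Direct addressing: the employee number itself is the array index.
# `empdata`, `insert`, `search` and the record fields are illustrative assumptions.
empdata = [None] * 1000            # one slot per possible employee number


def insert(emp_number, record):
    empdata[emp_number] = record   # O(1): no searching needed


def search(emp_number):
    return empdata[emp_number]     # O(1): a single index operation


insert(500, {"name": "example employee"})
found = search(500)
```

This only works because the key is already a small integer, which is the limitation the next slides address.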

4 Hash Tables
• The previous example was easy because the employee number is an integer.
• What if the main key is a word made of letters of the English alphabet (e.g. a last name)?
• How can the main key be mapped to an array index?

5 Hash Tables
• Sum of Digits Method
  - Map the letters a-z to the numbers 1 to 26 (a=1, b=2, c=3, etc.)
  - Add up the values of the letters
  - For example, "cats" (c=3, a=1, t=20, s=19): 3+1+20+19 = 43
  - "cats" will be stored at index 43

6 Hash Tables
• Problem: too many words map to the same index
  - "was", "tin", "give", "tend", "moan", "tick" and several other words also add up to 43
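
The sum-of-digits method and its collision problem can be sketched in a few lines of Python. The function name `letter_sum` is an assumption, and the sketch assumes lowercase words containing only the letters a-z.

```python
def letter_sum(word):
    """Sum the letter values of a lowercase word, with a=1, b=2, ..., z=26."""
    return sum(ord(ch) - ord("a") + 1 for ch in word)


# The words named on the slides, each mapped to its index.
words = ["cats", "was", "tin", "give", "tend", "moan", "tick"]
indexes = {w: letter_sum(w) for w in words}
distinct_indexes = set(indexes.values())   # every word lands on the same index
```

All seven words produce the same index, so they would all compete for one array cell.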

7 Hashing
• Another Method (Multiply by Powers)
  - An integer in the decimal system is a sum of digits multiplied by powers of 10
  - 7546 = 7x1000 + 5x100 + 4x10 + 6
  - 7546 = 7x10^3 + 5x10^2 + 4x10^1 + 6
• Can do the same thing with words
  - Use 27 as the base (26 letters + blank)

8 Hashing
• "cats" = 3*27^3 + 1*27^2 + 20*27^1 + 19 = 60,337
• Unique index for every word
• Main drawback: takes too much space
  - For words of up to 10 letters, 27^9 alone is over 7,000,000,000,000 (7000 gigabytes)
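
A short Python sketch of the multiply-by-powers scheme, assuming lowercase words and the a=1 ... z=26 mapping from the earlier slide. The names `word_to_number`, `cats_index` and `largest_power` are assumptions.

```python
def word_to_number(word):
    """Treat a lowercase word as a base-27 number (a=1, ..., z=26)."""
    total = 0
    for ch in word:
        # Shift the digits seen so far one place left, then add the new digit.
        total = total * 27 + (ord(ch) - ord("a") + 1)
    return total


cats_index = word_to_number("cats")   # 3*27**3 + 1*27**2 + 20*27 + 19
largest_power = 27 ** 9               # leading power for a 10-letter word
```

Here `cats_index` evaluates to 60337 and `largest_power` to 7625597484987, which is why an array indexed this way is impractically large.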

9 Hashing
• While the scheme generates a unique index for every word, it also reserves space for non-words (aaaaaa, zzzzzzz, aaacccc, etc.)
• Need a way to compress the huge range of numbers from the multiply-by-powers system into a smaller (reasonably sized) array

10 Hashing
• Hashing function: the process of converting a number in a large range into a number in a smaller range.

13 Hashing
• Hashing function: the process of converting a number in a large range into a number in a smaller range.
• Size of the smaller range
  - Twice the size of the data set (2s)
  - For 50,000 words, an array of 100,000 elements

14 Hashing
• Hash Function
  - Implemented with the modulo operator (returns the remainder)
  - For example, 33 mod 10 = 3
  - LargeNumber mod SmallRange

15 Hashing
• Hugenumber = C0*27^9 + C1*27^8 + C2*27^7 + ... + C9*27^0
• arraysize = numberofwords * 2
• arrayindex = Hugenumber mod arraysize
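
The three formulas on this slide can be combined into one small hash function. This is a sketch under the same assumptions as the slides (lowercase words, a=1 ... z=26, base 27); the function names are assumptions, and the variable names follow the slide.

```python
def word_to_number(word):
    """Hugenumber: the word read as a base-27 number (a=1, ..., z=26)."""
    total = 0
    for ch in word:
        total = total * 27 + (ord(ch) - ord("a") + 1)
    return total


def hash_index(word, arraysize):
    """arrayindex = Hugenumber mod arraysize."""
    return word_to_number(word) % arraysize


numberofwords = 50_000
arraysize = numberofwords * 2              # twice the size of the data set
cats_slot = hash_index("cats", arraysize)
long_slot = hash_index("zzzzzzzzzz", arraysize)
```

Whatever the length of the word, the result always falls inside the 100,000-element array.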

16 Hashing - Collisions
• Hashing presents the risk of two elements ending up with the same index (although the risk is lower than with the sum-of-digits method).
• Collision: two elements with the same index after hashing

17 Collisions
• Two approaches to handling collisions
  - Open Addressing
  - Separate Chaining
• Open Addressing: find the next available free cell
• Separate Chaining: install a linked list at each index

18 Open Addressing
• Three types: Linear Probing, Quadratic Probing, and Double Hashing
• Linear Probing
  - Find the next available cell by stepping one cell at a time (x+1, x+2, etc.)
  - Leads to clustering
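
A minimal sketch of linear probing, assuming non-negative integer keys and `key mod size` as the hash function. The class and method names are assumptions, and deletion is left out to keep the example short.

```python
class LinearProbingTable:
    """Open addressing with linear probing (illustrative sketch)."""

    def __init__(self, size):
        self.slots = [None] * size

    def _probe(self, key):
        # Visit x, x+1, x+2, ... wrapping around the end of the array.
        start = key % len(self.slots)
        for step in range(len(self.slots)):
            yield (start + step) % len(self.slots)

    def insert(self, key):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i] == key:
                self.slots[i] = key
                return i
        raise RuntimeError("table is full")

    def find(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return None            # reached an empty cell: key is absent
            if self.slots[i] == key:
                return i
        return None


table = LinearProbingTable(10)
placed = [table.insert(k) for k in (33, 43, 53)]   # all three hash to cell 3
```

The three keys end up in the adjacent cells 3, 4 and 5, a small instance of the clustering the slide mentions.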

19 Clustering
• Quadratic Probing
  - Finds the next available cell using squares as the step sizes (x+1, x+4, x+9, etc.)
• Double Hashing
  - Hash the key again with a different hash function to find the next free cell
  - The 2nd hash gives the step size
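
The two probe sequences can be compared with a short sketch. The function names are assumptions, and the second hash function `5 - (key mod 5)` is only one common textbook choice, not something the slides specify; it is used here because it never returns 0. The table size 23 is an assumed prime, which keeps any step size relatively prime to the table size.

```python
def quadratic_probes(home, table_size, count):
    """Cells tried after `home`: home+1, home+4, home+9, ... (mod table_size)."""
    return [(home + k * k) % table_size for k in range(1, count + 1)]


def double_hash_probes(key, table_size, count):
    """Cells tried after the home cell, stepping by a key-dependent amount."""
    home = key % table_size
    step = 5 - (key % 5)      # assumed second hash; always between 1 and 5
    return [(home + k * step) % table_size for k in range(1, count + 1)]


quad = quadratic_probes(3, 23, 4)
dbl = double_hash_probes(33, 23, 3)
```

With quadratic probing every key that starts at the same cell follows the same path, while with double hashing the step depends on the key itself.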

20 Separate Chaining
• A linked list is installed at each array index, so that entries that hash to the same index are attached to that index's linked list
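
A minimal sketch of separate chaining, assuming non-negative integer keys and `key mod size` as the hash function. The class and method names are assumptions; new keys are placed at the head of the list, which is one simple option among several.

```python
class Node:
    """One link in a chain."""

    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node


class ChainedHashTable:
    """Hash table with a singly linked list at each index (illustrative sketch)."""

    def __init__(self, size):
        self.heads = [None] * size

    def _index(self, key):
        return key % len(self.heads)

    def insert(self, key):
        i = self._index(key)
        self.heads[i] = Node(key, self.heads[i])   # prepend to the chain

    def contains(self, key):
        node = self.heads[self._index(key)]
        while node is not None:
            if node.key == key:
                return True
            node = node.next
        return False

    def chain(self, index):
        """Return the keys stored at `index`, head first."""
        keys, node = [], self.heads[index]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


table = ChainedHashTable(10)
for key in (33, 43, 18, 53):
    table.insert(key)
```

Keys 33, 43 and 53 all hash to index 3 and share one chain, so a collision costs a short list walk instead of a search for another cell.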

21 Hashing
• Read Chapter 5 (Data Structures and Algorithm Analysis in C by Weiss)
• Chapter 7 (in the Goodrich and Tamassia book)
• Hash function implementations are presented in these chapters

22 Summary Notes

23 Data Structures and Algorithms
• Conceptual approach
• Java and C implementations are presented in both books
• For further study, focus more on the mathematical aspects (look at theorems and propositions)
  - Proving

24 When to use what
• General Purpose Data Structures
  - Arrays, linked lists, trees, and hash tables
  - Used to store and retrieve data using key values
  - Applications: storing personnel records, inventories, contact lists, etc.

25 General Purpose Data Structures
• Arrays. Best used:
  - when the amount of data is reasonably small
  - when the amount of data is predictable in advance

26 General Purpose Data Structures
• Linked Lists
  - when the amount of data stored cannot be predicted
  - when data will be frequently inserted and deleted
• Binary Search Trees
  - used when arrays or linked lists are too slow
  - O(log N) insertion, searching, and deletion (provided the tree stays balanced)

27 General Purpose Data Structures
• Hash Tables
  - fastest of these structures for insertion and searching by key (almost O(1))
  - used in spell checkers and as symbol tables in compilers
  - may require additional memory for open addressing implementations

28 Special Purpose Data Structures
• Stacks, Queues (Priority Queues)
• Used by a computer program to aid in carrying out some algorithm
• For example, graph algorithms use stacks and queues
• Abstract Data Types
  - implemented by a more fundamental data structure (array, linked list)
  - conceptual aids

29 Special Purpose Data Structures
• Stacks
  - used when you want to access the last data item inserted (LIFO structure)
  - implemented using an array or a linked list, depending on size
• Queues
  - used when you want to access the first data item inserted (FIFO structure)
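
The LIFO and FIFO behaviour can be shown with Python's built-in containers. This is a sketch rather than the implementation from either textbook: a plain `list` stands in for the array-based stack and `collections.deque` for the queue.

```python
from collections import deque

# Stack: the last item pushed is the first one popped (LIFO).
stack = []
for item in (1, 2, 3):
    stack.append(item)       # push
last_in = stack.pop()        # removes 3, the most recently inserted item

# Queue: the first item inserted is the first one removed (FIFO).
queue = deque()
for item in (1, 2, 3):
    queue.append(item)       # enqueue at the back
first_in = queue.popleft()   # removes 1, the earliest inserted item
```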

30 Graphs
• Unique data structure
• Directly model real-world situations (maps, flights between airports, etc.)
• The structure of the graph reflects the structure of the problem
• Main choice is the representation: adjacency list or adjacency matrix
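
The two representations can be built side by side for a small undirected graph. The vertex names and edges below are made up for illustration.

```python
# Hypothetical undirected graph: vertex D has no edges.
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("A", "C"), ("B", "C")]

# Adjacency list: each vertex maps to a list of its neighbours.
adjacency_list = {v: [] for v in vertices}
for u, v in edges:
    adjacency_list[u].append(v)
    adjacency_list[v].append(u)

# Adjacency matrix: cell [i][j] is 1 when vertices i and j are connected.
position = {v: i for i, v in enumerate(vertices)}
adjacency_matrix = [[0] * len(vertices) for _ in vertices]
for u, v in edges:
    adjacency_matrix[position[u]][position[v]] = 1
    adjacency_matrix[position[v]][position[u]] = 1
```

The list stores only the edges that exist, while the matrix always uses one cell per pair of vertices, which is the trade-off behind the choice of representation.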

31 Sorting
• For a limited number of data elements (up to 1000-1500 entries), insertion sort may be sufficient
• When insertion sort gets bogged down, use merge sort or quick sort (merge sort, however, requires additional memory)
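
For reference, a minimal insertion sort in Python. The function name is an assumption, and the sketch returns a sorted copy instead of sorting in place.

```python
def insertion_sort(items):
    """Return a sorted copy of `items` using insertion sort."""
    result = list(items)
    for i in range(1, len(result)):
        current = result[i]
        j = i - 1
        # Shift larger elements one cell right to open a gap for `current`.
        while j >= 0 and result[j] > current:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = current
    return result


sorted_items = insertion_sort([5, 2, 9, 1, 5])
```

Each element may be shifted past every element before it, so the running time grows quadratically, which is why the slide limits it to small inputs.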

32 Sorting - Running Times

