Published by Donna Burke. Modified over 9 years ago.
1
Hashing 1. Def. Hash Table - an array in which items are inserted according to a key value (i.e. the key value is used to determine the index of the item). Ex. Student records stored in an array where each student is assigned an id no. and that number is used as the index. Are there any problems with this idea? Gaps will develop if students leave, and insertions of new students are limited by the original size of the array. Knowing the student id no. is not convenient. Using the index itself as the key field is not efficient.
2
2. Def. Hash Function - a function used to convert numbers from a large range into numbers in a small range. (The key field is usually the large range and the index of the array is usually the small range.) Ex. Dictionary of 50,000 words. Use the word itself as the key field, but code it numerically to determine a unique location to store the word in the array. Let a = 1, b = 2, c = 3, … z = 26 and let the positions of letters in the word have power-of-ten values: Ex. dab = 4 * 10^2 + 1 * 10^1 + 2 * 10^0 = 412. What size array would be needed to store these 50,000 words, if no word is longer than 10 characters?
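The coding scheme above can be sketched in Java as follows. This is a minimal illustration, not code from the slides: the class name `WordCode` and method name `encode` are hypothetical, and the sketch assumes the word contains only lowercase letters a to z. It returns a `long` because the codes of long words do not fit in an `int`.

```java
// Sketch of the slide's coding scheme: a = 1 ... z = 26, with each letter
// position weighted by a power of ten. Names here are illustrative only.
class WordCode {
    // Returns the numeric code of a lowercase word.
    static long encode(String word) {
        long code = 0;
        for (int i = 0; i < word.length(); i++) {
            int letter = word.charAt(i) - 'a' + 1; // assumes 'a'..'z' only
            code = code * 10 + letter;             // shift earlier letters up one power of ten
        }
        return code;
    }
}
```

Calling `WordCode.encode("dab")` reproduces the worked example: 4 * 100 + 1 * 10 + 2 = 412.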
3
zzzzzzzzzz would have the code 26 * 1,111,111,111 = 28,888,888,886 (too big - bigger than the largest Java int, 2,147,483,647 - no array could be that big). Also, if locations were chosen this way, there would be many, many empty cells. What size array should be used for this dictionary? 100,000 - usually twice as large as the no. of items to allow room for collisions (def. obvious but coming up). A hash function is needed to convert the numeric code to a smaller range.
4
Commonly used hash function: index = largerange % arraysize. Ex. Hash the word gave to find its location in the array dictionary. 7*10^3 + 1*10^2 + 22*10^1 + 5*10^0 = 7325. Ex. Hash the word gaty to find its location in the array dictionary. 7*10^3 + 1*10^2 + 20*10^1 + 25*10^0 = 7325. COLLISION!
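A short Java sketch of index = largerange % arraysize applied to the two example words. The class and method names (`WordHash`, `encode`, `hash`) are hypothetical, and the sketch assumes lowercase words and a positive array size.

```java
// Sketch: numeric word code followed by the modulo hash from the slide.
class WordHash {
    // a = 1 ... z = 26, positions weighted by powers of ten (assumes 'a'..'z' only).
    static long encode(String word) {
        long code = 0;
        for (int i = 0; i < word.length(); i++) {
            code = code * 10 + (word.charAt(i) - 'a' + 1);
        }
        return code;
    }

    // index = largerange % arraysize; the result always lies in 0..arraySize-1.
    static int hash(String word, int arraySize) {
        return (int) (encode(word) % arraySize);
    }
}
```

With an array size of 100,000, both `WordHash.hash("gave", 100000)` and `WordHash.hash("gaty", 100000)` give 7325, which is the collision the slide points out.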
5
3. Def. Collision - a key hashes to a cell that is already occupied. 4. There are 2 methods to resolve collisions: Def. Open addressing - in case of collision, search for or store in some other available cell. Def. Separate chaining - install a linked list at each index of the array and insert all items that hash to an index into the list.
6
5. Types of open addressing: Linear probe method - if a collision occurs at index x, search locations x+1, x+2, etc. Ex. gaty would be stored in location 7326 (if available), otherwise location 7327, or 7328, etc. Note: resolves collisions but primary clusters occur. Quadratic probe method - search x+1^2, x+2^2, x+3^2, etc. (i.e. x+1, x+4, x+9). Note: resolves primary clusters, but secondary clusters occur.
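To compare the two probe orders, the sketch below lists the first few cells each method would try after a collision at index x. It is an illustration under assumptions: the class and method names are hypothetical, and the quadratic version uses the usual offsets 1, 4, 9, ... with wrap-around at the end of the array.

```java
// Sketch: first `count` cells tried after a collision at `start`.
class ProbeSequences {
    // Linear probe: start+1, start+2, start+3, ... (wrapping around).
    static int[] linear(int start, int arraySize, int count) {
        int[] seq = new int[count];
        for (int i = 0; i < count; i++) {
            seq[i] = (start + i + 1) % arraySize;
        }
        return seq;
    }

    // Quadratic probe: start+1, start+4, start+9, ... (wrapping around).
    static int[] quadratic(int start, int arraySize, int count) {
        int[] seq = new int[count];
        for (int i = 0; i < count; i++) {
            long offset = (long) (i + 1) * (i + 1); // square of the probe number
            seq[i] = (int) ((start + offset) % arraySize);
        }
        return seq;
    }
}
```

For a collision at 7325 in a 100,000-cell array, the linear sequence starts 7326, 7327, 7328 and the quadratic sequence starts 7326, 7329, 7334.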
7
Rehashing (also called double hashing) - when a collision occurs, determine the step used to search for an available cell by hashing the key value again with a second function. Ex. step = 5 - key % 5. What steps result? 5, 4, 3, 2, 1. How is this different from the linear & quadratic probe methods? The step is different for different keys. Note: table size must be prime in order to probe all cells, because a prime size shares no factor with any smaller step. (Ex. size=20, step=5, x=0: 0,5,10,15,0,5,10,15,… Try size=19, step=5, x=0: 0,5,10,15,1,6,11,16,2,7,12,17,3,8,13,18,4,9,14.)
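The effect of table size on double hashing can be checked with a small Java sketch. The names `DoubleHashProbe`, `step` and `cellsVisited` are hypothetical, and the sketch assumes non-negative keys. It counts how many distinct cells a probe reaches before it returns to a cell it has already visited.

```java
// Sketch: how many cells does a fixed-step probe reach before it repeats?
class DoubleHashProbe {
    // Second hash function from the slide; yields 5, 4, 3, 2 or 1 for key >= 0.
    static int step(int key) {
        return 5 - key % 5;
    }

    // Counts distinct cells visited from `start`, advancing by `step` with wrap-around.
    static int cellsVisited(int start, int step, int arraySize) {
        boolean[] seen = new boolean[arraySize];
        int count = 0;
        int index = start;
        while (!seen[index]) {          // stop at the first repeated cell
            seen[index] = true;
            count++;
            index = (index + step) % arraySize;
        }
        return count;
    }
}
```

With step 5, `cellsVisited(0, 5, 20)` reaches only 4 cells, while `cellsVisited(0, 5, 19)` reaches all 19, matching the two sequences listed in the note.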
8
Write code to increase a hash value by step. hashVal += step; What do we do if a hash value reaches or exceeds the size of the array? Wrap around: hashVal %= arraySize; What do we do about duplicate key values? They should not be allowed. When the first item with the key is found, the search stops. A second item with the same key would never be found (unless the code is changed). Select a key value that is unique to the item (ex. Social Security no.).
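The step-and-wrap loop can be put in context with a minimal open-addressing table. This is a sketch under assumptions, not the code the slides refer to: the class `ProbingTable` is hypothetical, keys are non-negative ints, and a fixed step is used (a step of 1 gives a linear probe).

```java
// Sketch: open addressing with "hashVal += step; hashVal %= size" wrap-around.
class ProbingTable {
    private final Integer[] cells; // null marks a cell that has never been used
    private final int step;        // fixed probe step; 1 gives a linear probe

    ProbingTable(int arraySize, int step) {
        this.cells = new Integer[arraySize];
        this.step = step;
    }

    // Stores key in the first free cell on its probe path.
    // Returns the index used, or -1 if no free cell was found.
    int insert(int key) {
        int hashVal = key % cells.length;
        for (int probes = 0; probes < cells.length; probes++) {
            if (cells[hashVal] == null) {
                cells[hashVal] = key;
                return hashVal;
            }
            hashVal += step;          // move on by step
            hashVal %= cells.length;  // wrap around past the end of the array
        }
        return -1;
    }

    // Returns the index of the FIRST item with this key, or -1 if absent.
    int find(int key) {
        int hashVal = key % cells.length;
        for (int probes = 0; probes < cells.length; probes++) {
            if (cells[hashVal] == null) return -1;      // empty cell ends the search
            if (cells[hashVal] == key) return hashVal;  // search stops at first match
            hashVal += step;
            hashVal %= cells.length;
        }
        return -1;
    }
}
```

In a 7-cell table with step 1, inserting 5, 12 and 19 places them at indexes 5, 6 and 0 (the last one wraps around). Inserting 12 a second time stores it at index 1, but `find(12)` still returns 6, which is why duplicate keys are effectively lost.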
9
How do we handle deletions? Replace one field by -1 rather than replacing the entire object by null. Often object info may be needed in the future. Ex. Even when an employee leaves, pension & tax info is needed. However, there is another reason in this code. Something undesirable occurs if the object is replaced by null. Demonstrate what and explain why. What method requires this condition and why? while (hashRay[hashVal] != null && hashRay[hashVal].iData != -1)
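One way to explore the deletion question is the sketch below. The names `hashRay`, `hashVal` and `iData` follow the slide; the classes `DataItem` and `DeletionDemo` and the method `deleteWithNull` are hypothetical and exist only for contrast. The sketch assumes linear probing and non-negative keys.

```java
// Sketch: marking a deleted item with -1 versus replacing it by null.
class DataItem {
    int iData; // key; -1 marks a deleted item
    DataItem(int key) { iData = key; }
}

class DeletionDemo {
    final DataItem[] hashRay;

    DeletionDemo(int size) { hashRay = new DataItem[size]; }

    // Skips cells holding live items; reuses null cells and cells marked -1.
    void insert(int key) {
        int hashVal = key % hashRay.length;
        int probes = 0;
        while (hashRay[hashVal] != null && hashRay[hashVal].iData != -1) {
            if (++probes == hashRay.length) throw new IllegalStateException("table full");
            hashVal = (hashVal + 1) % hashRay.length;
        }
        hashRay[hashVal] = new DataItem(key);
    }

    // Probes until the key is found or a null cell ends the search.
    DataItem find(int key) {
        int hashVal = key % hashRay.length;
        for (int probes = 0; probes < hashRay.length && hashRay[hashVal] != null; probes++) {
            if (hashRay[hashVal].iData == key) return hashRay[hashVal];
            hashVal = (hashVal + 1) % hashRay.length;
        }
        return null;
    }

    // Marks the item deleted; the cell stays non-null so later probes pass over it.
    boolean delete(int key) {
        DataItem item = find(key);
        if (item == null) return false;
        item.iData = -1;
        return true;
    }

    // For contrast only: nulling the cell cuts the probe path of later items.
    boolean deleteWithNull(int key) {
        int hashVal = key % hashRay.length;
        for (int probes = 0; probes < hashRay.length && hashRay[hashVal] != null; probes++) {
            if (hashRay[hashVal].iData == key) {
                hashRay[hashVal] = null;
                return true;
            }
            hashVal = (hashVal + 1) % hashRay.length;
        }
        return false;
    }
}
```

In a 7-cell table holding 5, 12 and 19 (cells 5, 6 and 0), `delete(12)` leaves 19 reachable, because the search walks past the marked cell. After `deleteWithNull(12)` on an identical table, `find(19)` returns null: the search stops at the emptied cell 6 and never reaches cell 0.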
10
6. Def. Load factor - the ratio of the no. of items in a hash table to the size of the table (array). The fuller a table is, the worse clustering becomes. Therefore, hash tables should be designed to never become more than 1/2 to 2/3 full when open addressing is used. 7. When separate chaining is used to resolve collisions, is load factor a concern? Less so. n items or more can be placed in a table of size n, so the load factor can be 1 or more (i.e. some locations will hold one or more items in their linked lists).
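The load factor definition reduces to a single division, sketched here in Java. The class and method names are hypothetical, and the maximum load is passed in as a parameter because the slide gives a range (1/2 to 2/3) rather than one value.

```java
// Sketch: load factor = number of items / size of the array.
class LoadFactor {
    static double loadFactor(int itemCount, int arraySize) {
        return (double) itemCount / arraySize; // cast avoids integer division
    }

    // True when the table is fuller than the chosen limit (e.g. 0.5 or 2.0 / 3).
    static boolean tooFull(int itemCount, int arraySize, double maxLoad) {
        return loadFactor(itemCount, arraySize) > maxLoad;
    }
}
```

For the 50,000-word dictionary in a 100,000-cell array the load factor is 0.5. With separate chaining, 150 items in a 100-cell table give a load factor of 1.5.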
11
How do we handle duplicates with separate chaining? Duplicates are allowed and will be stored in the same list. Note: search process slows as list is searched linearly. How do we handle deletions? Deletions can be made from a linked list, if appropriate for the application, without empty cell problems resulting.
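A minimal separate-chaining sketch showing duplicates and deletion. The class `ChainedTable` and its methods are hypothetical, keys are assumed to be non-negative ints, and a hand-written singly linked list stands in for whatever list class an application would use.

```java
// Sketch: separate chaining with one singly linked list per array index.
class ChainedTable {
    private static class Node {
        final int key;
        Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    private final Node[] heads; // heads[i] is the first node of the list at index i

    ChainedTable(int arraySize) { heads = new Node[arraySize]; }

    // Adds the key at the front of its list; duplicates are allowed.
    void insert(int key) {
        int index = key % heads.length;
        heads[index] = new Node(key, heads[index]);
    }

    // Walks the list linearly and counts items with this key.
    int count(int key) {
        int n = 0;
        for (Node cur = heads[key % heads.length]; cur != null; cur = cur.next) {
            if (cur.key == key) n++;
        }
        return n;
    }

    // Unlinks the first node with this key; no marker or empty-cell problem arises.
    boolean delete(int key) {
        int index = key % heads.length;
        Node prev = null;
        for (Node cur = heads[index]; cur != null; prev = cur, cur = cur.next) {
            if (cur.key == key) {
                if (prev == null) heads[index] = cur.next;
                else prev.next = cur.next;
                return true;
            }
        }
        return false;
    }
}
```

In a 5-cell table, inserting 3, 8, 8 and 13 puts all four items in the list at index 3. `count(8)` is 2 before `delete(8)` and 1 after it, and 3 and 13 remain reachable throughout.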
12
8. What is the advantage of a hash table? O(1) complexity on average to search for or insert an item (i.e. constant time regardless of the number of items, as long as collisions are few). 9. Disadvantage? Must know the size of array needed in advance (in Java, arrays cannot be resized - another, bigger array would be needed). This problem is reduced when separate chaining is used. Also, there is no way to access items in order.