Presentation is loading. Please wait.

Presentation is loading. Please wait.

[0][1][2][3][4][5][6][7][8][9] Bing David Ina Abhinav Erik Hyun Jim Fiona Gheeta Chelsea I can easily loop through all the student records by using a.

Similar presentations


Presentation on theme: "[0][1][2][3][4][5][6][7][8][9] Bing David Ina Abhinav Erik Hyun Jim Fiona Gheeta Chelsea I can easily loop through all the student records by using a."— Presentation transcript:

1

2 [0][1][2][3][4][5][6][7][8][9] Bing David Ina Abhinav Erik Hyun Jim Fiona Gheeta Chelsea I can easily loop through all the student records by using a for loop. But if I want to access Jim’s record only, I have to start at 0 and loop through the array until I find it. With a big array this could be rather inefficient. Is there a better way? Sequential access good Arrays Direct access bad  Remember! The array elements just hold references to the objects, not the objects themselves! Consider this array of Student records

3 Sequential access bad  Hash tables Direct access good Bing David Ina Abhinav Erik Hyun Jim Fiona Gheeta Chelsea [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Hashing Function Jim’s student ID no. “6” The student records are stored in an array. The place in the array that a particular student is held is determined by the hashing function. The hashing function takes some value, e.g. a name, or, as here, a student id number, and translates it into an array index. So if we want to find Jim’s record we just give his id number to the hashing function and it tells us where his record is located. We don’t need to search through the records. This is direct access.

4 Collisions What happens if the hashing function gives the same array index for two different students? This happens and it is called a collision. There are a number of ways of dealing with collisions, the details of which you don’t need to know. But what you do need to know is that the performance of hash tables degrades over time because of multiple collisions. Bing David Ina Abhinav Erik Hyun Jim Fiona Gheeta Chelsea [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Hashing Function Hiro’s student ID no. “6” Collision!

5 Collisions [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] Hashing Function Erik’s student ID no. “4” Erik David’s student ID no. “1” David Hyun’s student ID no. “4” Collision! Hyun goes into next available index Hyun If there had already been a lot of records in the array when the collision happened, Hyun may have been pushed a long way down the array. Later, when we try to access Hyun’s record, the hashing function still gives us 4 as the place to find him. But he’s not there! So we have to do a sequential search from index number 4, through the array, to find him. This is the reason that hash table performance degrades over time.

6 The Hashing Algorithm The simplest way to translate the Student ID into an array index is to use the modulo operator (% in Java). The modulo operator returns the remainder of a division operation, for example 11 % 4 = 3. Question: If we have an array of 10 elements, what do we need to mod our Student IDs by to be sure of getting some value from 0 to 10? Answer: 11 Question: Let’s say we have an array of size N. Now what to we need to mod our Student IDs by? Answer: N+1 Random Student ID: Array size: Array index this student will be assigned to using modulo operator: What happens if we don’t have a numerical Student ID to use? Say we only have their name? Well we just convert the string into some numerical value using one of several methods. MD5 is a common method; you give it text, it gives you a 128-bit number. The important thing is that we get an even distribution of entries into the array to minimize collisions. MD5 is also used to verify copies of documents because even if one character has changed during the copying, the number that MD5 returns will be totally different. What happens if we don’t have a numerical Student ID to use? Say we only have their name? Well we just convert the string into some numerical value using one of several methods. MD5 is a common method; you give it text, it gives you a 128-bit number. The important thing is that we get an even distribution of entries into the array to minimize collisions. MD5 is also used to verify copies of documents because even if one character has changed during the copying, the number that MD5 returns will be totally different.


Download ppt "[0][1][2][3][4][5][6][7][8][9] Bing David Ina Abhinav Erik Hyun Jim Fiona Gheeta Chelsea I can easily loop through all the student records by using a."

Similar presentations


Ads by Google