Published by Christine Burke. Modified over 9 years ago.
1
DESIGN & ANALYSIS OF ALGORITHM 02 – HASHING (CONTD.)
Informatics Department, Parahyangan Catholic University
2
ANALOGY
Let's say that you have a drawer full of socks, 20 red socks (all identical) and 12 blue socks, and it is dark in the room. How many socks should you grab to be sure that you have at least one matching pair?
How about 20 red socks, 12 blue socks, and 8 green socks?
How about an unlimited number of red, blue, green, yellow, and purple socks?
3
ANALOGY
In a city of 2 million people, no one has more than 1.5 million hairs on his/her head. Can you show that at least two people in the city have exactly the same number of hairs on their heads?
4
PIGEONHOLE PRINCIPLE
In mathematics, the pigeonhole principle states that if n pigeons are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one pigeon. (Wikipedia)
n = the range of possible keys
m = the size of the hash table
5
COLLISION
When the range of possible keys is larger than the table size, two distinct keys k1 and k2 may be mapped to the same index: h(k1) = h(k2).
This condition is known as a collision, and a resolution strategy is required.
[Figure: the keys yellow, orange, red, green, blue, black, and white being mapped to table slots, labelled "????"]
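The pigeonhole argument can be checked directly. The following is a minimal sketch, assuming a hypothetical toy hash (sum of character codes mod m) and a hypothetical table size of 5; neither comes from the slides. With 7 keys and 5 slots, at least one slot must receive two or more keys, whatever the hash function is.

```python
def toy_hash(key: str, m: int) -> int:
    """Hypothetical hash for illustration: sum of character codes mod m."""
    return sum(ord(ch) for ch in key) % m


def find_collisions(keys, m):
    """Group keys by slot and return only the slots that received 2+ keys."""
    slots = {}
    for key in keys:
        slots.setdefault(toy_hash(key, m), []).append(key)
    return {idx: ks for idx, ks in slots.items() if len(ks) > 1}


colors = ["yellow", "orange", "red", "green", "blue", "black", "white"]
collisions = find_collisions(colors, m=5)  # 7 keys, 5 slots: pigeonhole applies
print(collisions)
```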
6
COLLISION HANDLING: 3 STRATEGIES
- Open addressing
  - Linear probing
  - Quadratic probing
  - Double hashing
- Separate chaining
- Coalesced hashing
7
COLLISION HANDLING: OPEN ADDRESSING
In open addressing, a colliding entry is placed in a new slot in the same table.
[Figure: keys John Smith, Lisa Smith, Kenny Baker, and Jane Smith hashed into a table with slots J (74), K (75), L (76), M (77), N (78). Stored entries: John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234. Where does Kayla Newman go?]
8
COLLISION HANDLING: SEPARATE CHAINING
In separate chaining, colliding entries are stored in a linked list in a separate area outside the table.
[Figure: keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman hashed into slots J (74), K (75), L (76), M (77). Chained entries: John Smith 521-8976, Jane Smith 521-1234, Kenny Baker 418-4165, Lisa Smith 521-5030, Kayla Newman 418-4222.]
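Separate chaining can be sketched in a few lines. This is a minimal illustration, not the slides' implementation: it assumes the hash is the character code of the first letter (as the slot labels J (74), K (75), ... suggest), a hypothetical table size of 128, and Python lists standing in for linked lists.

```python
class ChainedTable:
    """Separate chaining: each slot holds a list (chain) of (key, value) pairs."""

    def __init__(self, m: int = 128):
        self.m = m
        self.slots = [[] for _ in range(m)]

    def _hash(self, key: str) -> int:
        # Assumption: hash on the first letter's character code, e.g. 'J' -> 74.
        return ord(key[0]) % self.m

    def insert(self, key: str, value: str) -> None:
        chain = self.slots[self._hash(key)]
        for pos, (existing, _) in enumerate(chain):
            if existing == key:          # key already present: overwrite
                chain[pos] = (key, value)
                return
        chain.append((key, value))       # new key: append to the end of the chain

    def search(self, key: str):
        for existing, value in self.slots[self._hash(key)]:
            if existing == key:
                return value
        return None


book = ChainedTable()
for name, phone in [("John Smith", "521-8976"), ("Lisa Smith", "521-5030"),
                    ("Kenny Baker", "418-4165"), ("Jane Smith", "521-1234"),
                    ("Kayla Newman", "418-4222")]:
    book.insert(name, phone)

print(book.search("Jane Smith"))  # 521-1234
```

Here John Smith and Jane Smith share slot 74, and Kenny Baker and Kayla Newman share slot 75; the table itself never fills up, the chains just get longer.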
9
COLLISION HANDLING: COALESCED HASHING
Coalesced hashing combines open addressing and separate chaining. It uses linked lists like separate chaining, but the list nodes are stored in empty slots of the same table.
[Figure: keys John Smith, Lisa Smith, Kenny Baker, Jane Smith, and Kayla Newman hashed into a table with slots J (74), K (75), L (76), M (77), N (78). Stored entries: John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234, Kayla Newman / 418-4222.]
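The idea can be sketched as follows. This is an illustration under stated assumptions, not the slides' implementation: the hash is the first letter's character code mod m, the table has a hypothetical 5 slots, and a colliding entry is stored in the highest-indexed free slot (one common convention) and linked to the end of the chain.

```python
class CoalescedTable:
    """Coalesced hashing: chains are threaded through free slots of the same table."""

    def __init__(self, m: int):
        self.m = m
        self.keys = [None] * m
        self.values = [None] * m
        self.next = [None] * m    # index of the next slot in the chain, or None

    def _hash(self, key: str) -> int:
        # Assumption: hash on the first letter's character code.
        return ord(key[0]) % self.m

    def insert(self, key: str, value: str) -> bool:
        idx = self._hash(key)
        if self.keys[idx] is None:              # home slot is free
            self.keys[idx], self.values[idx] = key, value
            return True
        while True:                             # walk the chain to its last node
            if self.keys[idx] == key:           # key already present: overwrite
                self.values[idx] = value
                return True
            if self.next[idx] is None:
                break
            idx = self.next[idx]
        for free in range(self.m - 1, -1, -1):  # highest-indexed free slot
            if self.keys[free] is None:
                self.keys[free], self.values[free] = key, value
                self.next[idx] = free           # link it to the end of the chain
                return True
        return False                            # table is full

    def search(self, key: str):
        idx = self._hash(key)
        while idx is not None and self.keys[idx] is not None:
            if self.keys[idx] == key:
                return self.values[idx]
            idx = self.next[idx]
        return None


table = CoalescedTable(5)   # 'J' -> 4, 'K' -> 0, 'L' -> 1 when m = 5
for name, phone in [("John Smith", "521-8976"), ("Lisa Smith", "521-5030"),
                    ("Kenny Baker", "418-4165"), ("Jane Smith", "521-1234"),
                    ("Kayla Newman", "418-4222")]:
    table.insert(name, phone)

print(table.search("Jane Smith"))  # 521-1234
```

In this run Jane Smith collides with John Smith in slot 4 and is stored in free slot 3, with `next[4] = 3`; a search follows that link instead of probing the table.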
10
PERFORMANCE ANALYSIS
What are the advantages and disadvantages of the three collision handling methods? How do we compare them? What measurements can we use?
- Load factor: what is the average number of elements stored in a slot?
- Probe number: how many slots do we need to examine before finding an empty slot?
11
EXAMPLE :: OPEN ADDRESSING
Load factor is at most 1 (because every slot holds at most 1 element).
Probe number for “Lisa Smith” = ?
Probe number for “Kayla Newman” = ?
[Figure: the open addressing table with slots J (74), K (75), L (76), M (77), N (78). Stored entries: John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234, Kayla Newman / 418-4222.]
12
EXAMPLE :: SEPARATE CHAINING
Load factor ≈ number of probes (because colliding elements are stored in a linked list, the average chain length equals the load factor).
Probe number for “Jane Smith” = ?
Probe number for “Kayla Newman” = ?
What if we insert a new element at the beginning of the list?
[Figure: the separate chaining table with slots J (74), K (75), L (76), M (77). Chained entries: John Smith 521-8976, Jane Smith 521-1234, Kenny Baker 418-4165, Lisa Smith 521-5030, Kayla Newman 418-4222.]
13
EXAMPLE :: COALESCED HASHING
Load factor is at most 1 (because every slot holds at most 1 element).
What is the advantage of this method?
How many slots do we check to insert Kenny Baker?
How many slots do we check to search for Kenny Baker?
[Figure: the coalesced hashing table with slots J (74), K (75), L (76), M (77), N (78). Stored entries: John Smith / 521-8976, Lisa Smith / 521-5030, Kenny Baker / 418-4165, Jane Smith / 521-1234.]
14
OPEN ADDRESSING
In open addressing, a colliding entry is placed in a new slot in the same table (using hash function h(k, i), where i is the probe number).
There are generally 3 techniques to decide the next slot to be filled:
- linear probing
- quadratic probing
- double hashing
The sequence h(k,0), h(k,1), h(k,2), … is called the probe sequence.
15
OPEN ADDRESSING: LINEAR PROBING
Define
    h(k, i) = (h'(k) + i) mod m
where h'(k) is the initial hash function, and i is the probe number for key k.
Example: m = 13
k = 5: h'(k) = 5
k = 18: h'(k) = 5 (collision)
    h(k,1) = (5+1) mod 13 = 6
k = 19: h'(k) = 6 (collision)
    h(k,1) = (6+1) mod 13 = 7
k = 31: h'(k) = 5 (collision)
    h(k,1) = (5+1) mod 13 = 6 (collision)
    h(k,2) = (5+2) mod 13 = 7 (collision)
    h(k,3) = (5+3) mod 13 = 8
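The linear probing example can be replayed in code. This is a minimal sketch, assuming h'(k) = k mod m (consistent with every value in the example, though the slides do not state it) and the usual definition h(k, i) = (h'(k) + i) mod m.

```python
def linear_probe(h_prime: int, i: int, m: int) -> int:
    """h(k, i) = (h'(k) + i) mod m"""
    return (h_prime + i) % m


def insert_linear(table, key):
    """Insert key with linear probing; return the slot used, or None if full."""
    m = len(table)
    h_prime = key % m          # assumption: h'(k) = k mod m
    for i in range(m):
        idx = linear_probe(h_prime, i, m)
        if table[idx] is None:
            table[idx] = key
            return idx
    return None


table = [None] * 13
placed = {k: insert_linear(table, k) for k in (5, 18, 19, 31)}
print(placed)  # {5: 5, 18: 6, 19: 7, 31: 8}
```

Key 31 needs four probes (i = 0 to 3) even though only one other key shares its initial slot; the rest of the delay comes from the cluster that has built up in slots 5 to 7.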
16
OPEN ADDRESSING: LINEAR PROBING
Suffers from primary clustering.
Clusters arise since an empty slot preceded by i non-empty slots gets filled next with probability (i+1)/m.
There are only m distinct probe sequences.
idx   Data
1     A
2     B
3     C
4     D
5     E
6     F
7
8
9
…     …
Every k with h'(k) between 1 and 6 will be placed in slot 7, the first empty slot after the cluster.
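The (i+1)/m claim can be checked for the table above. This is a small sketch that assumes h'(k) is uniformly distributed over the m slots and, as a hypothetical choice, m = 13; the function name is illustrative only.

```python
from fractions import Fraction


def next_fill_probabilities(occupied, m):
    """Probability that each empty slot receives the next key under linear probing.

    Assumes h'(k) is uniform over the m slots. An empty slot is reached from its
    own home position and from every occupied slot in the run directly before it.
    """
    probs = {}
    for slot in range(m):
        if slot in occupied:
            continue
        run = 0                               # length of the occupied run before `slot`
        while run < m - 1 and (slot - 1 - run) % m in occupied:
            run += 1
        probs[slot] = Fraction(run + 1, m)
    return probs


probs = next_fill_probabilities(occupied={1, 2, 3, 4, 5, 6}, m=13)
print(probs[7], probs[8])  # 7/13 1/13
```

Slot 7 is seven times more likely to be filled next than any other empty slot, so the cluster tends to keep growing.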
17
OPEN ADDRESSING: QUADRATIC PROBING
Define
    h(k, i) = (h'(k) + c1*i + c2*i^2) mod m
where h'(k) is the initial hash function, i is the probe number for key k, and c1 and c2 are constants.
18
OPEN ADDRESSING: QUADRATIC PROBING
Example: m = 13, c1 = 2, c2 = 3
k = 5: h'(k) = 5
k = 18: h'(k) = 5 (collision)
    h(k,1) = (5 + 2*1 + 3*1^2) mod 13 = 10
k = 19: h'(k) = 6
k = 31: h'(k) = 5 (collision)
    h(k,1) = (5 + 2*1 + 3*1^2) mod 13 = 10 (collision)
    h(k,2) = (5 + 2*2 + 3*2^2) mod 13 = 8
k = 32: h'(k) = 6 (collision)
    h(k,1) = (6 + 2*1 + 3*1^2) mod 13 = 11
h(k,1) is not necessarily adjacent to h'(k), which avoids the primary clustering problem.
However, keys with the same h'(k) follow the same probe sequence. This leads to a milder form of clustering, called secondary clustering (again, there are only m distinct probe sequences).
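The quadratic probing example can be replayed the same way. This is a minimal sketch, assuming h'(k) = k mod m (which matches the example values) and h(k, i) = (h'(k) + c1*i + c2*i^2) mod m with c1 = 2 and c2 = 3.

```python
def quadratic_probe(h_prime, i, m, c1, c2):
    """h(k, i) = (h'(k) + c1*i + c2*i^2) mod m"""
    return (h_prime + c1 * i + c2 * i * i) % m


def insert_quadratic(table, key, c1=2, c2=3):
    """Insert key with quadratic probing; return the slot used, or None on failure."""
    m = len(table)
    h_prime = key % m    # assumption: h'(k) = k mod m
    for i in range(m):
        idx = quadratic_probe(h_prime, i, m, c1, c2)
        if table[idx] is None:
            table[idx] = key
            return idx
    return None          # the first m probes found no empty slot


table = [None] * 13
placed = {k: insert_quadratic(table, k) for k in (5, 18, 19, 31, 32)}
print(placed)  # {5: 5, 18: 10, 19: 6, 31: 8, 32: 11}
```

Keys 18 and 31 both start at slot 5 and both try slot 10 next, which is the secondary clustering effect: the whole probe sequence is determined by h'(k) alone.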
19
OPEN ADDRESSING: QUADRATIC PROBING
Observe these 2 cases where h'(k) = 5 and h'(k) = 6 (m = 13, c1 = 2, and c2 = 3).
probe #   h(k,i) for h'(k) = 5   h(k,i) for h'(k) = 6
1         10                     11
2         8                      9
3         12                     0
4         9                      10
5         12                     0
6         8                      9
7         10                     11
8         5                      6
9         6                      7
10        0                      1
11        0                      1
12        6                      7
13        5                      6
Note that only slots 0, 5, 6, 8, 9, 10, and 12 can be filled by keys with h'(k) = 5.
Only slots 0, 1, 6, 7, 9, 10, and 11 can be filled by keys with h'(k) = 6.
This suggests that some slots might get filled with higher probability than others.
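The reachable slots listed above can be recomputed from the formula. This is a short sketch, assuming h(k, i) = (h'(k) + c1*i + c2*i^2) mod m with m = 13, c1 = 2, c2 = 3; the function name is illustrative.

```python
def reachable_slots(h_prime, m=13, c1=2, c2=3):
    """Sorted list of slots visited by probes i = 0 .. m-1 of quadratic probing."""
    return sorted({(h_prime + c1 * i + c2 * i * i) % m for i in range(m)})


print(reachable_slots(5))  # [0, 5, 6, 8, 9, 10, 12]
print(reachable_slots(6))  # [0, 1, 6, 7, 9, 10, 11]
```

Only 7 of the 13 slots are reachable from each starting position, so with these constants an insertion can fail even when the table still has empty slots.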
20
OPEN ADDRESSING: QUADRATIC PROBING
The choice of m, c1, and c2 is important.
- For m = 2^n, a good choice is c1 = c2 = 0.5.
- For prime m > 2, most choices of c1 and c2 will make h(k, i) distinct for i in [0, (m-1)/2].
Example: m = 2^4 = 16, c1 = c2 = 0.5, h'(k) = 0
Probe #   h(k,i)   Probe #   h(k,i)
0         0        8         4
1         1        9         13
2         3        10        7
3         6        11        2
4         10       12        14
5         15       13        11
6         5        14        9
7         12       15        8
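With c1 = c2 = 0.5 the offset (i + i^2)/2 is the i-th triangular number, so it can be computed with integers only. The sketch below reproduces the m = 16 table and checks that the 16 probes visit every slot exactly once; the function name is illustrative.

```python
def triangular_probe(h_prime: int, i: int, m: int) -> int:
    """Quadratic probing with c1 = c2 = 1/2: h(k, i) = (h'(k) + (i + i*i)/2) mod m."""
    return (h_prime + (i + i * i) // 2) % m   # i + i*i is always even, so // is exact


sequence = [triangular_probe(0, i, 16) for i in range(16)]
print(sequence)  # [0, 1, 3, 6, 10, 15, 5, 12, 4, 13, 7, 2, 14, 11, 9, 8]
print(sorted(sequence) == list(range(16)))  # True
```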
21
OPEN ADDRESSING: DOUBLE HASHING
Define
    h(k, i) = (h1(k) + i*h2(k)) mod m
where h1(k) is the initial hash function, i is the probe number for key k, and h2(k) is a second hash function different from h1(k).
Two different keys a and b that initially hash to the same location (h1(a) = h1(b)) will typically have different probe sequences, since h2(a) and h2(b) usually differ.
22
OPEN ADDRESSING: DOUBLE HASHING
h2(k) must be relatively prime to m, so that the probe sequence visits every slot.
Examples:
- Let m be a power of 2 and let h2(k) always return an odd number.
- Let m be prime and let h2(k) always return a positive integer less than m.
There are Θ(m^2) distinct probe sequences.
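A small sketch of double hashing with a prime table size follows. The two hash functions are assumptions chosen for illustration (h1(k) = k mod m and h2(k) = 1 + (k mod (m-1)), a common textbook pairing); the slides do not specify them.

```python
def h1(k: int, m: int) -> int:
    return k % m                 # assumption: primary hash


def h2(k: int, m: int) -> int:
    return 1 + (k % (m - 1))     # assumption: always in 1..m-1, so coprime to a prime m


def probe_sequence(k: int, m: int):
    """h(k, i) = (h1(k) + i * h2(k)) mod m for i = 0 .. m-1."""
    return [(h1(k, m) + i * h2(k, m)) % m for i in range(m)]


m = 13
seq_a, seq_b = probe_sequence(5, m), probe_sequence(18, m)
print(seq_a[:4])  # [5, 11, 4, 10]
print(seq_b[:4])  # [5, 12, 6, 0]
```

Keys 5 and 18 collide on their first probe (slot 5) but then step by 6 and 7 respectively, so they part ways immediately. Because each step size is coprime to 13, each sequence still visits all 13 slots.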
23
BASIC HASH TABLE OPERATION
- INSERT(key, value): we have discussed this a lot
- value SEARCH(key): similar to INSERT. When to stop searching?
- DELETE(key): do not delete the value, mark it “deleted” instead (why?)
In separate chaining, all three operations are merely inserting, searching, and deleting in the appropriate linked list.
24
INSERTION IN OPEN ADDRESSING
INSERT(key, value)
    // returns true if key is successfully inserted
    // returns false otherwise
    i = 0
    while (i < m)
        idx = HASHFUNCTION(key, i)
        if (table[idx] is empty or marked as deleted)
            table[idx] = (key, value)
            return true
        else
            i = i + 1
    return false
25
SEARCHING IN OPEN ADDRESSING
SEARCH(key)
    // returns associated value if key is found
    // returns null otherwise
    i = 0
    while (i < m)
        idx = HASHFUNCTION(key, i)
        if (table[idx] is empty)
            return null    // reached an empty slot, so key cannot be in the hash table
        else if (table[idx] not marked as deleted AND table[idx].key == key)
            return table[idx].value    // key found
        else
            i = i + 1    // try the next slot
    return null    // tried all m possible slots and key not found
26
DELETION IN OPEN ADDRESSING
DELETE(key)
    // returns associated value if key is found and successfully deleted
    // returns null otherwise
    i = 0
    while (i < m)
        idx = HASHFUNCTION(key, i)
        if (table[idx] is empty)
            return null    // reached an empty slot, so key cannot be in the hash table
        else if (table[idx] not marked as deleted AND table[idx].key == key)
            temp = table[idx].value
            mark table[idx] as deleted
            return temp
        else
            i = i + 1    // try the next slot
    return null    // tried all m possible slots and key not found
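The three routines can be combined into one runnable sketch. It follows the pseudocode above but makes several assumptions of its own: integer keys, linear probing on h'(k) = k mod m as the probe function, and two sentinel objects (EMPTY and DELETED) to distinguish a never-used slot from a deleted one.

```python
EMPTY = object()     # slot never used
DELETED = object()   # tombstone: slot used to hold an entry


class OpenAddressingTable:
    """Open addressing with linear probing and lazy deletion (tombstones)."""

    def __init__(self, m: int):
        self.m = m
        self.table = [EMPTY] * m

    def _probe(self, key: int, i: int) -> int:
        # Assumption: linear probing on h'(k) = k mod m; any h(k, i) would do.
        return (key % self.m + i) % self.m

    def insert(self, key: int, value) -> bool:
        for i in range(self.m):
            idx = self._probe(key, i)
            if self.table[idx] is EMPTY or self.table[idx] is DELETED:
                self.table[idx] = (key, value)
                return True
        return False                      # all m slots tried: table is full

    def _find(self, key: int):
        """Return the index holding key, or None if it is absent."""
        for i in range(self.m):
            idx = self._probe(key, i)
            entry = self.table[idx]
            if entry is EMPTY:            # never-used slot ends the probe sequence
                return None
            if entry is not DELETED and entry[0] == key:
                return idx
        return None

    def search(self, key: int):
        idx = self._find(key)
        return None if idx is None else self.table[idx][1]

    def delete(self, key: int):
        idx = self._find(key)
        if idx is None:
            return None
        value = self.table[idx][1]
        self.table[idx] = DELETED         # tombstone keeps later entries reachable
        return value


t = OpenAddressingTable(13)
for k in (5, 18, 31):                     # all three have h'(k) = 5
    t.insert(k, f"value-{k}")
t.delete(18)                              # slot 6 becomes a tombstone, not EMPTY
print(t.search(31))                       # value-31
```

If slot 6 were reset to empty instead of being marked as deleted, the search for 31 would stop there and wrongly report the key as missing. Like the pseudocode, `insert` does not check whether the key is already stored further along the probe sequence.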