Design & Analysis of Algorithm Hashing (Contd.)

Informatics Department Parahyangan Catholic University

Analogy
Let's say that you have a drawer full of socks, 20 red socks (all identical) and 12 blue socks, and it is dark in the room. How many socks should you grab to be sure that you have at least one matching pair? How about 20 red socks, 12 blue socks, and 8 green socks? How about an unlimited number of red, blue, green, yellow, and purple socks?

Analogy
In a city of 2 million people, no one has more than 1.5 million hairs on his or her head. Can you show that at least two people in the city have exactly the same number of hairs on their heads?

Pigeonhole Principle
In mathematics, the pigeonhole principle states that if n pigeons are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one pigeon. -- Wikipedia
For hashing:
n = the size of the range of possible keys
m = the size of the hash table

Collision
When the range of possible keys is larger than the table size, two distinct keys k1 and k2 may be mapped to the same index: h(k1) = h(k2).
This condition is known as a collision -> a resolution strategy is required.
Example: a table with the slots yellow, orange, red, green, blue, black, white
White with red stripe -> hashed to “white”
White with black stripe -> hashed to “white”
White with blue stripe -> hashed to “white” ?
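The pigeonhole argument can be checked with a few lines of Python. This is only a sketch: the hash function h(k) = k mod m and the helper name find_collisions are assumptions made for illustration, not something the slides define.

```python
def h(key: int, m: int) -> int:
    """Illustrative hash function (an assumption): key modulo table size."""
    return key % m


def find_collisions(keys, m):
    """Group keys by slot and return only the slots that hold more than one key."""
    slots = {}
    for k in keys:
        slots.setdefault(h(k, m), []).append(k)
    return {idx: ks for idx, ks in slots.items() if len(ks) > 1}


# 14 distinct keys cannot fit into 13 slots without at least one collision.
collisions = find_collisions(range(14), 13)
print(collisions)
```

With 14 keys and 13 slots, at least one slot must appear in the result, whatever hash function is plugged in.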

Collision Handling 3 strategies
Open addressing
    Linear probing
    Quadratic probing
    Double hashing
Separate chaining
Coalesced hashing

Collision Handling open addressing
In open addressing, a colliding entry is placed in a new slot in the same table.
[Figure: a table with slots J (74), K (75), L (76), M (77), N (78) holding John Smith, Jane Smith, Kenny Baker, and Lisa Smith; where should Kayla Newman go?]

Collision Handling Separate Chaining
In separate chaining, colliding entries are stored in a linked list in a different area.
[Figure: a table with slots J (74), K (75), L (76), M (77); John Smith, Jane Smith, Kenny Baker, Kayla Newman, and Lisa Smith are stored in linked lists attached to the slots.]
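A minimal sketch of separate chaining follows. It assumes the hash is the character code of the first letter (as the slot labels J (74), K (75), ... suggest) reduced modulo the table size; the class name, the table size of 5, and the stored values are illustrative only. Python lists stand in for the linked lists.

```python
class ChainedHashTable:
    """Separate chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, m):
        self.m = m
        self.buckets = [[] for _ in range(m)]  # one chain per slot

    def _index(self, key):
        # Assumed hash: character code of the first letter, modulo m.
        return ord(key[0]) % self.m

    def insert(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # otherwise append to the chain

    def search(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self.buckets[self._index(key)]
        for i, (k, v) in enumerate(bucket):
            if k == key:
                del bucket[i]            # a real unlink, no marker needed
                return v
        return None


table = ChainedHashTable(5)
for n, name in enumerate(["John Smith", "Jane Smith", "Kenny Baker",
                          "Lisa Smith", "Kayla Newman"]):
    table.insert(name, n)
print(table.search("Kayla Newman"))
```

All three operations reduce to the corresponding operation on one chain, so deletion can physically remove the entry.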

Collision Handling Coalesced Hashing
Coalesced hashing combines open addressing and separate chaining. It uses linked lists like separate chaining, but the list nodes are stored in empty slots of the same table.
[Figure: a table with slots J (74), K (75), L (76), M (77), N (78) holding John Smith, Jane Smith, Lisa Smith, Kenny Baker, and Kayla Newman, with colliding entries linked to each other inside the table.]
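One possible way to sketch coalesced hashing is shown below. It assumes the same first-letter hash as the figure labels suggest, and it picks empty slots by scanning from the end of the table, which is a common convention but not something the slides specify. Deletion is left out to keep the sketch short.

```python
class CoalescedHashTable:
    """Chains are threaded through the table itself via a 'next' index."""

    def __init__(self, m):
        self.m = m
        self.keys = [None] * m
        self.values = [None] * m
        self.next = [-1] * m       # index of the next entry in the chain, -1 = end
        self.free = m - 1          # cursor that scans for empty slots from the end

    def _index(self, key):
        # Assumed hash: character code of the first letter, modulo m.
        return ord(key[0]) % self.m

    def insert(self, key, value):
        idx = self._index(key)
        if self.keys[idx] is None:             # home slot is empty
            self.keys[idx], self.values[idx] = key, value
            return True
        while True:                            # walk to the end of the chain
            if self.keys[idx] == key:
                self.values[idx] = value
                return True
            if self.next[idx] == -1:
                break
            idx = self.next[idx]
        while self.free >= 0 and self.keys[self.free] is not None:
            self.free -= 1                     # find an empty slot
        if self.free < 0:
            return False                       # table is full
        self.keys[self.free], self.values[self.free] = key, value
        self.next[idx] = self.free             # link the new entry to the chain
        return True

    def search(self, key):
        idx = self._index(key)
        while idx != -1 and self.keys[idx] is not None:
            if self.keys[idx] == key:
                return self.values[idx]
            idx = self.next[idx]
        return None


t = CoalescedHashTable(5)
names = ["John Smith", "Jane Smith", "Kenny Baker", "Lisa Smith", "Kayla Newman"]
results = [t.insert(name, n) for n, name in enumerate(names)]
print(results, t.search("Kayla Newman"))
```

A search only follows links instead of probing slot by slot, while all entries still live in one table.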

Performance Analysis
What are the advantages and disadvantages of the three collision handling methods? How can we compare them? What measurements can we use?
Load factor: what is the average number of elements stored in a slot?
Probe number: how many slots do we need to examine before finding an empty slot?

Example :: open addressing
[Figure: a table with slots J (74), K (75), L (76), M (77), N (78) holding John Smith, Jane Smith, Kenny Baker, Lisa Smith, and Kayla Newman.]
Load factor = 1 (because every slot holds exactly 1 element)
Probe number for “Lisa Smith” = ?
Probe number for “Kayla Newman” = ?

Example :: separate chaining
[Figure: a table with slots J (74), K (75), L (76), M (77); John Smith, Jane Smith, Kenny Baker, Kayla Newman, and Lisa Smith are stored in linked lists attached to the slots.]
What if we insert a new element at the beginning of the list?
Load factor = average number of probes (because collided elements are stored in a linked list)
Probe number for “Jane Smith” = ?
Probe number for “Kayla Newman” = ?

Example :: coalesced hashing
[Figure: a table with slots J (74), K (75), L (76), M (77), N (78) holding John Smith, Jane Smith, Lisa Smith, and Kenny Baker.]
Load factor = 1 (because every slot holds at most 1 element)
What is the advantage of this method?
How many slots must be checked to insert Kenny Baker?
How many slots must be checked to search for Kenny Baker?

Open Addressing
In open addressing, a colliding entry is placed in a new slot in the same table, using a hash function h(k,i), where i is the probe number.
There are generally 3 techniques to decide the next slot to be filled:
    linear probing
    quadratic probing
    double hashing
The sequence h(k,0), h(k,1), h(k,2), … is called the probe sequence.

Open Addressing Linear Probing
Define
    h(k,i) = (h'(k) + i) mod m
where h'(k) is the initial hash function, and i is the probe number for key k.
Example: m = 13
k = 5 -> h'(k) = 5
k = 18 -> h'(k) = 5 (collision)
    h(k,1) = (5+1) mod 13 = 6
k = 19 -> h'(k) = 6 (collision)
    h(k,1) = (6+1) mod 13 = 7
k = 31 -> h'(k) = 5 (collision)
    h(k,1) = (5+1) mod 13 = 6 (collision)
    h(k,2) = (5+2) mod 13 = 7 (collision)
    h(k,3) = (5+3) mod 13 = 8
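The example can be replayed in code. The sketch below assumes h'(k) = k mod m, which matches the values in the example, and uses the linear probe h(k,i) = (h'(k) + i) mod m; the function names are illustrative.

```python
def linear_probe(h0, i, m):
    """Slot examined on probe number i, starting from initial hash h0."""
    return (h0 + i) % m


def insert_linear(table, key):
    """Insert key with linear probing; return the slot used, or None if full."""
    m = len(table)
    h0 = key % m                      # assumed initial hash h'(k) = k mod m
    for i in range(m):
        idx = linear_probe(h0, i, m)
        if table[idx] is None:
            table[idx] = key
            return idx
    return None


table = [None] * 13
placements = {k: insert_linear(table, k) for k in (5, 18, 19, 31)}
print(placements)  # {5: 5, 18: 6, 19: 7, 31: 8}
```

Key 31 needs four probes even though only one other key shares its initial hash value, because key 19 has extended the cluster.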

Open Addressing Linear Probing
Suffers from primary clustering.
Clusters arise since an empty slot preceded by i non-empty slots gets filled next with probability (i+1)/m.
There are only m distinct probe sequences.
idx   Data
1     A
2     B
3     C
4     D
5     E
6     F
7           <- every k with h'(k) between 1 and 6 (as well as h'(k) = 7) will be placed in this slot
8
9
…
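The (i+1)/m claim can be checked with a small calculation. The sketch below assumes that initial hash values are uniformly distributed and that the table has m = 13 slots with slots 1 to 6 occupied; the table size and the function name are assumptions for illustration.

```python
from fractions import Fraction


def next_fill_probabilities(occupied, m):
    """For each empty slot, the probability that the next insertion lands there
    under linear probing with uniformly distributed initial hash values."""
    probs = {}
    for slot in range(m):
        if slot in occupied:
            continue
        run = 0                       # occupied slots immediately before this one
        j = (slot - 1) % m
        while j in occupied:
            run += 1
            j = (j - 1) % m
        probs[slot] = Fraction(run + 1, m)
    return probs


probs = next_fill_probabilities({1, 2, 3, 4, 5, 6}, 13)
print(probs[7], probs[8])  # 7/13 1/13
```

Slot 7 is seven times more likely to be filled next than any other empty slot, so the cluster tends to keep growing.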

Open Addressing Quadratic Probing
Define
    h(k,i) = (h'(k) + c1*i + c2*i^2) mod m
where h'(k) is the initial hash function, i is the probe number for key k, and c1 and c2 are constants.

h(k,1) is not necessarily next to h'(k), which avoids the primary clustering problem.
Example: m = 13, c1 = 2, c2 = 3
k = 5 -> h'(k) = 5
k = 18 -> h'(k) = 5 (collision)
    h(k,1) = (5 + 2*1 + 3*1^2) mod 13 = 10
k = 19 -> h'(k) = 6
k = 31 -> h'(k) = 5 (collision)
    h(k,1) = (5 + 2*1 + 3*1^2) mod 13 = 10 (collision)
    h(k,2) = (5 + 2*2 + 3*2^2) mod 13 = 8
k = 32 -> h'(k) = 6 (collision)
    h(k,1) = (6 + 2*1 + 3*1^2) mod 13 = 11
However, keys with the same h'(k) are re-hashed to the same places. This leads to a milder form of clustering, called secondary clustering (again, there are only m distinct probe sequences).
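The quadratic probing example can be replayed the same way. This sketch assumes h'(k) = k mod m (consistent with the example values) and the probe h(k,i) = (h'(k) + c1*i + c2*i^2) mod m with c1 = 2 and c2 = 3; the function names are illustrative.

```python
def quadratic_probe(h0, i, m, c1, c2):
    """Slot examined on probe number i with quadratic probing."""
    return (h0 + c1 * i + c2 * i * i) % m


def insert_quadratic(table, key, c1=2, c2=3):
    """Insert key with quadratic probing; return the slot used, or None."""
    m = len(table)
    h0 = key % m                      # assumed initial hash h'(k) = k mod m
    for i in range(m):
        idx = quadratic_probe(h0, i, m, c1, c2)
        if table[idx] is None:
            table[idx] = key
            return idx
    return None                       # no free slot found on this probe sequence


table = [None] * 13
placements = {k: insert_quadratic(table, k) for k in (5, 18, 19, 31, 32)}
print(placements)  # {5: 5, 18: 10, 19: 6, 31: 8, 32: 11}
```

Keys 18 and 31 share h'(k) = 5 and therefore follow the same probe sequence 5, 10, 8, ..., which is the secondary clustering effect.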

Observe these 2 cases where h'(k) = 5 and h'(k) = 6 (m = 13, c1 = 2, and c2 = 3).
Note that only slots 0, 5, 6, 8, 9, 10, and 12 can be filled by keys with h'(k) = 5.
Only slots 0, 1, 6, 7, 9, 10, and 11 can be filled by keys with h'(k) = 6.
probe# i   h(k,i) for h'(k) = 5   h(k,i) for h'(k) = 6
0          5                      6
1          10                     11
2          8                      9
3          12                     0
4          9                      10
5          12                     0
6          8                      9
7          10                     11
8          5                      6
9          6                      7
10         0                      1
11         0                      1
12         6                      7
This suggests that some slots might get filled with higher probability than others.
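The sets of reachable slots can be recomputed directly from the probe formula. The helper name reachable_slots is an assumption for illustration.

```python
def reachable_slots(h0, m, c1, c2):
    """All distinct slots visited by h(k,i) = (h0 + c1*i + c2*i^2) mod m, i = 0..m-1."""
    return sorted({(h0 + c1 * i + c2 * i * i) % m for i in range(m)})


slots_5 = reachable_slots(5, 13, 2, 3)
slots_6 = reachable_slots(6, 13, 2, 3)
print(slots_5)  # [0, 5, 6, 8, 9, 10, 12]
print(slots_6)  # [0, 1, 6, 7, 9, 10, 11]
```

Only 7 of the 13 slots are reachable from each starting position, so an insertion can fail even when the table still has empty slots.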

The choice of m, c1, and c2 is important.
For m = 2^n, a good choice is c1 = c2 = 0.5.
For prime m > 2, most choices of c1 and c2 will make h(k,i) distinct for i in [0, (m-1)/2].
Example: m = 2^4 = 16, c1 = c2 = 0.5, h'(k) = 0
Probe #   h(k,i)
0         0
1         1
2         3
3         6
4         10
5         15
6         5
7         12
8         4
9         13
10        7
11        2
12        14
13        11
14        9
15        8
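With c1 = c2 = 0.5 the offset is i/2 + i^2/2 = i(i+1)/2, the i-th triangular number, so the probe can be computed with integer arithmetic. The sketch below checks that for m = 16 the probe sequence visits every slot exactly once; the function name is illustrative.

```python
def triangular_probe(h0, i, m):
    """Quadratic probing with c1 = c2 = 1/2: offset is i*(i+1)/2, always an integer."""
    return (h0 + i * (i + 1) // 2) % m


seq = [triangular_probe(0, i, 16) for i in range(16)]
print(seq)  # [0, 1, 3, 6, 10, 15, 5, 12, 4, 13, 7, 2, 14, 11, 9, 8]
```

Because the sequence is a permutation of 0..15, an insertion with these parameters only fails when the table is completely full.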

Open Addressing Double Hashing
Define
    h(k,i) = (h1(k) + i*h2(k)) mod m
where h1(k) is the initial hash function, i is the probe number for key k, and h2(k) is a hash function different from h1(k).
Two different keys a and b that initially hash to the same location (h1(a) = h1(b)) will have different probe sequences whenever h2(a) ≠ h2(b).

Open Addressing Double Hashing
h2(k) must be relatively prime to m.
Examples:
    Let m be a power of 2 and let h2(k) always return an odd number.
    Let m be prime and let h2(k) always return a positive integer less than m.
There are Θ(m^2) distinct probe sequences.
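A small sketch of double hashing with a prime table size follows. The two hash functions h1(k) = k mod m and h2(k) = 1 + (k mod (m-1)) are a common textbook choice and are assumptions here; the slides do not fix them. With m prime, h2(k) is always between 1 and m-1 and therefore relatively prime to m.

```python
def h1(k, m):
    """Assumed initial hash function."""
    return k % m


def h2(k, m):
    """Assumed second hash function; result is in 1..m-1, never 0."""
    return 1 + (k % (m - 1))


def double_hash_sequence(k, m):
    """Full probe sequence h(k,i) = (h1(k) + i*h2(k)) mod m for i = 0..m-1."""
    return [(h1(k, m) + i * h2(k, m)) % m for i in range(m)]


seq_a = double_hash_sequence(5, 13)
seq_b = double_hash_sequence(18, 13)
print(seq_a[:3], seq_b[:3])  # [5, 11, 4] [5, 12, 6]
```

Keys 5 and 18 start at the same slot but diverge from the first probe onward, and each sequence still visits all 13 slots.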

Basic Hash Table Operation
INSERT(key, value)
    we have discussed this a lot
value SEARCH(key)
    similar to INSERT
DELETE(key)
    do not delete the value, mark it “deleted” instead (why?)
When to stop searching?
In separate chaining, all three operations are merely inserting, searching, and deleting in the appropriate linked list.

Insertion in open addressing
INSERT(key, value)
    // returns true if key is successfully inserted
    // returns false otherwise
    i = 0
    while(i < m)
        idx = HASHFUNCTION(key, i)
        if(table[idx] is empty or marked as deleted)
            table[idx] = (key,value)
            return true
        else
            i = i+1
    return false

SEARCHING in open addressing
SEARCH(key)
    // returns associated value if key is found
    // returns null otherwise
    i = 0
    while(i < m)
        idx = HASHFUNCTION(key, i)
        if(table[idx] is empty)
            return null    //reached an empty slot,
                           //so key cannot be in the hash table
        else if(table[idx] not marked as deleted AND table[idx].key == key)
            return table[idx].value    //key found
        else
            i = i+1    //try the next slot
    return null    //tried all m possible slots and key not found

DELETION in open addressing
DELETE(key)
    // returns associated value if key is found
    // and successfully deleted. returns null otherwise
    i = 0
    while(i < m)
        idx = HASHFUNCTION(key, i)
        if(table[idx] is empty)
            return null    //reached an empty slot,
                           //so key cannot be in the hash table
        else if(table[idx] not marked as deleted AND table[idx].key == key)
            temp = table[idx].value
            mark table[idx] as deleted
            return temp
        else
            i = i+1    //try the next slot
    return null    //tried all m possible slots and key not found
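The three operations can be put together in a short Python sketch. It assumes integer keys, linear probing with h'(k) = k mod m as the probe function, and that the same key is not inserted twice; the class name and the EMPTY and DELETED sentinels are illustrative choices.

```python
EMPTY = object()     # slot has never been used
DELETED = object()   # slot held an entry that was removed


class OpenAddressingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key, i):
        # Assumed probe function: linear probing on h'(k) = k mod m.
        return (key % self.m + i) % self.m

    def insert(self, key, value):
        for i in range(self.m):
            idx = self._probe(key, i)
            if self.slots[idx] is EMPTY or self.slots[idx] is DELETED:
                self.slots[idx] = (key, value)
                return True
        return False                       # all m slots are occupied

    def search(self, key):
        for i in range(self.m):
            slot = self.slots[self._probe(key, i)]
            if slot is EMPTY:
                return None                # key cannot be further along
            if slot is not DELETED and slot[0] == key:
                return slot[1]
        return None

    def delete(self, key):
        for i in range(self.m):
            idx = self._probe(key, i)
            slot = self.slots[idx]
            if slot is EMPTY:
                return None
            if slot is not DELETED and slot[0] == key:
                self.slots[idx] = DELETED  # mark, do not empty the slot
                return slot[1]
        return None


t = OpenAddressingTable(13)
for k, v in [(5, "a"), (18, "b"), (31, "c")]:
    t.insert(k, v)
removed = t.delete(18)
found = t.search(31)
print(removed, found)  # b c
```

Key 31 sits two probes past its home slot, behind key 18. If deleting 18 had reset its slot to empty, the search for 31 would stop there and wrongly report that 31 is missing, which is why the slot is only marked as deleted.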
