Alternative method for dictionary building : Hashing

Chapter 3.5 Alternative method for dictionary building : Hashing

Hashing (= Key Transformation)
Key space (e.g. all possible English words): 27^25
Address space (array index): 20000
Mapping function: a many-to-one function!

Hashing: Mapping Function H
The most effective: A = ORD(Key) MOD n
with n = size of the address space
n preferably a prime number
n certainly not = 2^k
Exotic functions based on bit manipulations are quite common but sometimes statistically ineffective.

The Hashing Function A = ORD(Key) MOD n
e.g. Key = "Key"
ORD(Key) = ORD(K) * 2^0 + ORD(e) * 2^8 + ORD(y) * 2^16
If n were chosen equal to 256, all words beginning with K would have the same address. In fact, only the first character would determine the address!
To be sure all characters play a role, choose n prime!
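The effect of the table size can be checked with a few lines of Python. This is a minimal sketch, not the presenter's code: the names ord_key and hash_address are made up for illustration, the keys are assumed to be 8-bit characters, and 257 is just one example of a prime table size.

```python
def ord_key(key: str) -> int:
    """ORD(Key): first character times 2^0, second times 2^8, third times 2^16, ..."""
    return sum(ord(ch) << (8 * i) for i, ch in enumerate(key))


def hash_address(key: str, n: int) -> int:
    """A = ORD(Key) MOD n."""
    return ord_key(key) % n


words = ["Key", "Kathy", "Kilo"]

# n = 256 = 2^8: every term except the first is a multiple of 256,
# so only the first character survives the MOD.
print([hash_address(w, 256) for w in words])

# n = 257 (prime, chosen only for illustration): all characters contribute.
print([hash_address(w, 257) for w in words])
```

With n = 256 all three words land on ORD(K) = 75; with the prime size they are spread over different addresses.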

Hashing Collisions (two keys map into one address)
Detection: key included in record
Handling: various techniques
- Linked lists with pointers
- Open addressing
  - Linear
  - Quadratic

Linked List with Pointers
(Figure: hashing address space with linked lists)
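Collision handling with linked lists can be sketched as follows. The class and method names (Node, ChainedTable, insert, lookup) and the default table size of 11 are assumptions made for this example; only the idea of one linked list per address, with the key stored in each record so collisions can be detected, comes from the slides.

```python
class Node:
    """One record in a chain; the key is stored so collisions can be detected."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedTable:
    def __init__(self, n=11):  # n: size of the address space, preferably prime
        self.n = n
        self.slots = [None] * n  # each slot is the head of a linked list

    def _address(self, key):
        # A = ORD(Key) MOD n, with ORD built from 8-bit characters
        return sum(ord(ch) << (8 * i) for i, ch in enumerate(key)) % self.n

    def insert(self, key, value):
        a = self._address(key)
        node = self.slots[a]
        while node is not None:  # key already present: overwrite its value
            if node.key == key:
                node.value = value
                return
            node = node.next
        self.slots[a] = Node(key, value, self.slots[a])  # prepend to the chain

    def lookup(self, key):
        node = self.slots[self._address(key)]
        while node is not None:  # walk the chain, comparing stored keys
            if node.key == key:
                return node.value
            node = node.next
        return None


table = ChainedTable()
for name in ["Kathy", "Linda", "Jim", "Bob"]:
    table.insert(name, len(name))
print(table.lookup("Jim"), table.lookup("Nobody"))
```

Two keys that map to the same address simply end up in the same chain, so the table itself never overflows; the cost is the extra pointer per record.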

Open Addressing
Linear: h0 = H(Key), hi = (h0 + i) MOD n
Quadratic: h0 = H(Key), hi = (h0 + i^2) MOD n

(Figure: animation frames showing the table with the keys Kathy, Jim, Bob and Linda while the probes h0, h1 and h2 are tried, first for linear and then for quadratic probing.)
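The two probe sequences can be sketched in Python as below. The helper names (H, probe_sequence, insert), the table size 7 and the single-letter keys are assumptions chosen so that all three keys collide at the same address; they are not taken from the slides.

```python
def H(key, n):
    """h0 = ORD(Key) MOD n, with ORD built from 8-bit characters."""
    return sum(ord(ch) << (8 * i) for i, ch in enumerate(key)) % n


def probe_sequence(h0, n, quadratic=False):
    """Yield h_i = (h0 + i) MOD n, or (h0 + i^2) MOD n when quadratic is set."""
    for i in range(n):
        yield (h0 + (i * i if quadratic else i)) % n


def insert(table, key, quadratic=False):
    """Store key in the first free slot; return (address, number of attempts)."""
    n = len(table)
    probes = probe_sequence(H(key, n), n, quadratic)
    for attempts, a in enumerate(probes, start=1):
        if table[a] is None or table[a] == key:
            table[a] = key
            return a, attempts
    # Quadratic probing does not visit every slot, so this can happen
    # even when the table is not completely full.
    raise RuntimeError("no free slot found")


# "a", "h" and "o" all have h0 = 6 when n = 7.
linear = [None] * 7
print([insert(linear, k) for k in "aho"])

quadratic = [None] * 7
print([insert(quadratic, k, quadratic=True) for k in "aho"])
```

Both variants place the first key at h0 and the second at h1; they differ from the third probe on, where linear probing tries the next neighbour and quadratic probing jumps further away.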

Analysis of Hashing
All possible keys equally likely
Uniformly distributed over range 0..n-1
Number of keys already in table = k
Probability of hitting free place = 1 - k/n
Pi = probability to need i attempts
P1 = (n-k)/n
P2 = (k/n) * [(n-k)/(n-1)]
P3 = (k/n) * [(k-1)/(n-1)] * [(n-k)/(n-2)]
Pi = (k/n) * [(k-1)/(n-1)] * [(k-2)/(n-2)] * ... * [(k-i+2)/(n-i+2)] * [(n-k)/(n-i+1)]
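The product for Pi can be evaluated exactly with rational arithmetic. The function name probe_probability and the sample values n = 10, k = 4 are assumptions for this sketch, which also requires k < n.

```python
from fractions import Fraction


def probe_probability(i, n, k):
    """P_i: probability that exactly i attempts are needed (k of n slots occupied, k < n)."""
    p = Fraction(1)
    for j in range(i - 1):  # the first i-1 attempts all hit occupied slots
        p *= Fraction(k - j, n - j)
    return p * Fraction(n - k, n - (i - 1))  # the i-th attempt hits a free slot


n, k = 10, 4
probabilities = [probe_probability(i, n, k) for i in range(1, k + 2)]
print(probabilities)
print(sum(probabilities))
```

At most k + 1 attempts are ever needed, and the probabilities for i = 1 .. k+1 add up to exactly 1.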

Analysis of Hashing (2)
E = average number of probes
α = k/n
E = [-ln(1 - α)] / α

α    0.1    0.5    0.9    0.99
E    1.05   1.39   2.56   4.66

Hashing
Main advantage: speed
Main disadvantages:
- Size of table FIXED
- Data in random order
- Removal extremely difficult
