E.G.M. Petrakis: Hashing

Data organization in main memory or on disk: sequential, binary trees, …
The location of a key depends on other keys => unnecessary key comparisons to find a key
Question: can a key be found with a single comparison?
Hashing: the location of a record is computed using its key only
Fast for random accesses, slow for range queries

Hash Table
Hash function h(key): transforms keys to array indices
[Figure: a table with index and data columns; h(key) maps a key to an index]

h(key) = key mod 1000
[Figure: a table with columns position, key, record]

Good Hash Functions
1. Uniform: distributes keys evenly over the table
2. Perfect: two records cannot occupy the same location, i.e. key_1 ≠ key_2 => h(key_1) ≠ h(key_2)
3. Order preserving: key_1 < key_2 => h(key_1) < h(key_2)
Such hash functions are difficult to find
Property 2 is the most essential
Most functions are no better than h(key) = key mod m
Hash collision: key_1 ≠ key_2 but h(key_1) = h(key_2)

Collision Resolution
1. Open Addressing (rehashing): compute a new position to store the key in the table (no extra space)
  i. linear probing
  ii. double hashing
2. Separate Chaining: lists of keys mapped to the same position (uses extra space)

Open Addressing
If the position of a key is occupied, compute a new address to store the key (rehashing)
If that position is occupied too, compute a new address, … until an empty position is found
primary hash function: i = h(key)
rehash function: rh(i) = rh(h(key))
hash sequence: (h_0, h_1, h_2, …) = (h(key), rh(h(key)), rh(rh(h(key))), …)
To find a key, follow the same hash sequence

Example
i = h(key) = key mod 100
rh(i) = (i + 1) mod 100
key: 193
i = h(193) = 93
rh(i) = (93 + 1) mod 100 = 94
If position 93 is already occupied, key 193 will occupy position 94
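
The hash sequence can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function names `hash_sequence`, `insert` and `search` are hypothetical, the table is a plain list with `None` marking an empty position, and the key 93 is an assumed earlier insertion that makes position 93 occupied.

```python
def hash_sequence(key, h, rh, m):
    """Yield h(key), rh(h(key)), rh(rh(h(key))), ... for at most m positions."""
    i = h(key)
    for _ in range(m):
        yield i
        i = rh(i)

def insert(table, key, h, rh):
    """Store key at the first empty position of its hash sequence."""
    for i in hash_sequence(key, h, rh, len(table)):
        if table[i] is None or table[i] == key:
            table[i] = key
            return i
    raise RuntimeError("no empty position found")

def search(table, key, h, rh):
    """Follow the same hash sequence; stop at the key or at an empty position."""
    for i in hash_sequence(key, h, rh, len(table)):
        if table[i] is None:
            return None
        if table[i] == key:
            return i
    return None

m = 100
h = lambda key: key % m
rh = lambda i: (i + 1) % m

table = [None] * m
insert(table, 93, h, rh)          # assumed key that already occupies position 93
print(insert(table, 193, h, rh))  # position chosen for key 193
print(search(table, 193, h, rh))  # the search follows the same sequence
```

Passing `h` and `rh` as parameters keeps the sketch independent of any particular rehash function.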

Problem 1: Locate Empty Positions
No empty position can be found because:
i. the table is full
  check the number of empty positions
ii. the rehash function fails to find an empty position although the table is not full
  i = h(key) = key mod 1000
  rh(i) = (i + 200) mod 1000 => checks only 5 positions on a table of 1000 positions
  rh(i) = (i + 1) mod 1000 => checks successive positions
  rh(i) = (i + c) mod m where GCD(c, m) = 1 => checks all positions
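
A short check of how many positions a rehash function of the form rh(i) = (i + c) mod m can reach. The helper name `positions_visited` is hypothetical; the sketch simply follows the sequence until it returns to a position already seen, and compares the count with m / GCD(c, m).

```python
from math import gcd

def positions_visited(start, c, m):
    """Count the distinct positions reached by repeatedly applying (i + c) mod m."""
    seen = set()
    i = start
    while i not in seen:
        seen.add(i)
        i = (i + c) % m
    return len(seen)

m = 1000
for c in (200, 1, 7):
    # The count should equal m // gcd(c, m); it is m exactly when gcd(c, m) == 1.
    print(c, positions_visited(0, c, m), m // gcd(c, m))
```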

Problem 2: Primary Clustering
Different keys that hash into different addresses compete with each other in successive rehashes
i = h(key) = key mod 100
rh(i) = (i + 1) mod 100
keys: 1990, 1991, 1992, 1993, 1994 => 94

Problem 3: Secondary Clustering
Different keys which hash to the same hash value have the same rehash sequence
i = h(key) = key mod 10
rh(i, j) = (i + j) mod 10
i. key 23: h(23) = 3, rh = 4, 6, 9, 3, …
ii. key 13: h(13) = 3, rh = 4, 6, 9, 3, …
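
The two sequences above can be reproduced with a small sketch. It assumes that j is the number of the rehash (1, 2, 3, …) and that each rehash starts from the previous position, which is the reading consistent with the values 4, 6, 9, 3; the function name is hypothetical.

```python
def rehash_sequence(key, m=10, length=4):
    """Positions produced by rh(i, j) = (i + j) mod m for j = 1, 2, ..."""
    i = key % m              # primary hash value
    seq = []
    for j in range(1, length + 1):
        i = (i + j) % m      # the step depends on j only, not on the key
        seq.append(i)
    return seq

print(rehash_sequence(23))
print(rehash_sequence(13))
```

Because the step depends only on j, any two keys with the same primary hash value follow identical sequences.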

Linear Probing
Store the key into the next free position
h_0 = h(key), usually h_0 = key mod m
h_i = (h_{i-1} + 1) mod m, i >= 1
S = {22, 35, 301, 99, 102, 452}

Observation 1
Different insertion sequences => different hash sequences
H(key) = key mod 13
S_1 = {11, 3, 27, 99, 8, 50, 77, 22, 12, 31, 33, 40, 53} => 28 probes
S_2 = {53, 40, 33, 31, 12, 22, 77, 50, 8, 99, 27, 3, 11} => 30 probes
[Figure: number of probes]

Observation 2
Deletions are not easy:
i = h(key) = key mod 10
rh(i) = (i + 1) mod 10
Action: delete(65) and then search(5)
Problem: the search will stop at the position emptied by the deletion and will not find 5
Solution: mark the position as deleted rather than empty
  the marked position can be reused
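
A minimal sketch of the "mark as deleted" solution, assuming integer keys, linear probing and a sentinel object as the deletion mark. The class and method names are hypothetical and not taken from the slides.

```python
EMPTY = None
DELETED = object()   # deletion mark: the position is free but searches must continue

class LinearProbingTable:
    def __init__(self, m):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key):
        """Yield the positions key mod m, +1, +2, ... (at most m of them)."""
        i = key % self.m
        for _ in range(self.m):
            yield i
            i = (i + 1) % self.m

    def insert(self, key):
        first_free = None
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                if first_free is None:
                    first_free = i
                break                      # key cannot be further along
            if slot is DELETED:
                if first_free is None:
                    first_free = i         # remember the marked position for reuse
            elif slot == key:
                return i                   # already present
        if first_free is None:
            raise RuntimeError("table is full")
        self.slots[first_free] = key
        return first_free

    def search(self, key):
        for i in self._probe(key):
            slot = self.slots[i]
            if slot is EMPTY:
                return None                # truly empty: stop
            if slot is not DELETED and slot == key:
                return i
        return None

    def delete(self, key):
        i = self.search(key)
        if i is not None:
            self.slots[i] = DELETED
        return i

t = LinearProbingTable(10)
t.insert(65)          # goes to position 5
t.insert(5)           # position 5 is taken, goes to position 6
t.delete(65)          # position 5 is marked, not emptied
print(t.search(5))    # still found, because the search skips the mark
```

The insert keeps probing past a marked position before reusing it, so that a key already stored further along the sequence is not inserted twice.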

Observation 3
Linear probing tends to create long sequences of occupied positions
  the longer a sequence is, the longer it tends to become
P: probability to use a position in the cluster
[Formula: P in terms of B and m]

Observation 4
Linear probing suffers from both primary and secondary clustering
Solution: double hashing
  uses two hash functions h_1, h_2 and a rehashing function rh

Double Hashing
Two hash functions and a rehashing function
primary hash function: i = h_1(key) = key mod m
secondary hash function: h_2(key)
rehashing function: rh(i, key) = (i + h_2(key)) mod m
h_2(m, key) is some function of m and key
  helps rh in computing random positions in the hash table
  h_2 is computed once for each key!

Example of Double Hashing
i. hash function: h_1(key) = key mod m
  q = (key div m) mod m (h_2(key) is obtained from q)
ii. rehash function: rh(i, key) = (i + h_2(key)) mod m

Example (continued)
A. m = 10, key = 23
  h_1(23) = 3, h_2(23) = 2
  rh(3, 2) = (3 + 2) mod 10 = 5
  rehash sequence: 5, 7, 9, 1, …
B. m = 10, key = 13
  h_1(13) = 3, h_2(13) = 1
  rh(3, 1) = (3 + 1) mod 10 = 4
  rehash sequence: 4, 5, 6, …
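
The two rehash sequences can be reproduced with the sketch below. It assumes h_2(key) = q = (key div m) mod m, which matches h_2(23) = 2 and h_2(13) = 1 in the example; the fallback to a step of 1 when q = 0 is an added assumption to keep the step nonzero, and the function names are hypothetical.

```python
def h1(key, m):
    return key % m

def h2(key, m):
    q = (key // m) % m
    return q if q != 0 else 1    # assumption: never use a step of 0

def rehash_sequence(key, m, length):
    """Positions i + h2, i + 2*h2, ... (mod m), starting from i = h1(key)."""
    i = h1(key, m)
    step = h2(key, m)            # computed once per key
    seq = []
    for _ in range(length):
        i = (i + step) % m
        seq.append(i)
    return seq

print(rehash_sequence(23, 10, 4))
print(rehash_sequence(13, 10, 3))
```

Keys 23 and 13 share the primary position 3 but follow different sequences, which is what removes secondary clustering. Note that with m = 10 the step 2 of key 23 reaches only 5 positions, since GCD(2, 10) = 2; a prime table size avoids this.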

Performance of Open Addressing
Distinguish between successful and unsuccessful search
Assume a series of probes to random positions
  independent events
  load factor: λ = n/m
  λ: probability to probe an occupied position
  each position has the same probability P = 1/m

Unsuccessful Search
The hash sequence is exhausted
  let u be the expected number of probes
  u equals the expected length of the hash sequence
  P(k): probability to search k positions in the hash sequence

independent events
u increases with λ => performance drops as λ increases
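
One standard way to write the derivation, as a sketch under the assumptions stated above: probes are independent and each finds an occupied position with probability λ. As an added simplification, the table is taken to be large enough that the hash sequence is effectively unbounded. A search examines exactly k positions when the first k-1 are occupied and the k-th is empty:

```latex
P(k) = \lambda^{k-1}(1-\lambda), \qquad
u = \sum_{k=1}^{\infty} k\,P(k)
  = (1-\lambda)\sum_{k=1}^{\infty} k\,\lambda^{k-1}
  = \frac{1-\lambda}{(1-\lambda)^{2}}
  = \frac{1}{1-\lambda}
```

For example, λ = 0.5 gives u = 2 and λ = 0.9 gives u = 10.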

Successful Search
The hash sequence is not exhausted
  the number of probes to find a key equals the number of probes s at the time the key was inserted plus 1
  λ was smaller at that time
  consider all values of λ
  s increases with λ
  u: equivalent to unsuccessful search
  approximation
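
A sketch of the averaging step, assuming the random-probing result u(x) = 1/(1-x) for an unsuccessful search at load factor x. It also assumes that u(x) already counts the final probe, the one that lands on the position where the key is stored. Averaging over all load factors from 0 up to the current λ, treated as a continuous variable (the approximation), gives:

```latex
s \approx \frac{1}{\lambda}\int_{0}^{\lambda} u(x)\,dx
  = \frac{1}{\lambda}\int_{0}^{\lambda} \frac{dx}{1-x}
  = \frac{1}{\lambda}\,\ln\frac{1}{1-\lambda}
```

For example, λ = 0.5 gives s ≈ 1.39, which is less than u = 2 at the same load factor.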

Performance
The performance drops as λ increases
  the higher the value of λ is, the higher the probability of collisions
Unsuccessful search is more expensive than successful search
  unsuccessful search exhausts the hash sequence

Experimental Results (probes per search)

                    SUCCESSFUL                    UNSUCCESSFUL
LOAD FACTOR   LINEAR   i + bkey   DOUBLE    LINEAR   i + bkey   DOUBLE
25%             1.17     1.16      1.15       1.39     1.37      1.33
50%             1.50     1.44      1.39       2.50     2.19      2.00
75%             2.50     2.01      1.85       8.50     4.64      4.00
90%             5.50     2.85      2.56      50.50    11.40     10.00
95%            10.50     3.52      3.15     200.50    22.04     20.00
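
The LINEAR and DOUBLE columns agree, to two decimals, with the classical textbook approximations for linear probing and for random (uniform) probing. The sketch below evaluates those formulas for the same load factors. The function names are illustrative, and the i + bkey column is not modelled.

```python
import math

def linear_successful(lam):
    return 0.5 * (1 + 1 / (1 - lam))

def linear_unsuccessful(lam):
    return 0.5 * (1 + 1 / (1 - lam) ** 2)

def double_successful(lam):
    # (1/lam) * ln(1 / (1 - lam))
    return -math.log(1 - lam) / lam

def double_unsuccessful(lam):
    return 1 / (1 - lam)

print("load  lin-succ  dbl-succ  lin-unsucc  dbl-unsucc")
for lam in (0.25, 0.50, 0.75, 0.90, 0.95):
    print(f"{lam:4.0%}  {linear_successful(lam):8.2f}  {double_successful(lam):8.2f}"
          f"  {linear_unsuccessful(lam):10.2f}  {double_unsuccessful(lam):10.2f}")
```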

Performance on Full Table

Separate Chaining
Keys hashing to the same hash value are stored in separate lists
  one list per hash position
  can store more than m records
  easy to implement
  the keys in each list can be ordered
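
A minimal sketch of separate chaining, assuming integer keys and h(key) = key mod m. Python lists stand in for the linked lists of the slides, the chains are kept unordered, and the class name is hypothetical.

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.chains = [[] for _ in range(m)]   # one list per hash position

    def insert(self, key):
        chain = self.chains[key % self.m]
        if key not in chain:
            chain.append(key)

    def search(self, key):
        return key in self.chains[key % self.m]

    def delete(self, key):
        chain = self.chains[key % self.m]
        if key in chain:
            chain.remove(key)                  # no deletion marks are needed
            return True
        return False

t = ChainedHashTable(5)
for key in range(12):                          # more than m = 5 records
    t.insert(key)
print(t.chains)
```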

h(key) = key mod m

Performance of Separate Chaining
Depends on the average chain size
  insertions are independent events
  let P(c,n,m): probability that a position has been selected c times after n insertions on a table of size m
  P(c,n,m): probability that the chain has length c => binomial distribution
  p = 1/m: success case
  q = 1 - p: failure case

$P(c,n,m) = \binom{n}{c}\, p^{c}\, q^{\,n-c}$
=> $P(c,n,m) \approx \frac{1}{c!}\,\lambda^{c} e^{-\lambda}$ (Poisson), with $\lambda = n/m$
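
A quick numerical comparison of the binomial chain-length probability with its Poisson approximation. The values n = m = 1000 are an arbitrary choice for illustration, and the function names are hypothetical.

```python
from math import comb, exp, factorial

def binomial(c, n, m):
    """Probability that a given position is selected c times in n insertions."""
    p = 1 / m
    return comb(n, c) * p**c * (1 - p)**(n - c)

def poisson(c, lam):
    """Poisson approximation with lam = n / m."""
    return lam**c * exp(-lam) / factorial(c)

n, m = 1000, 1000
for c in range(5):
    print(c, round(binomial(c, n, m), 5), round(poisson(c, n / m), 5))
```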

Unsuccessful Search
The entire chain is searched
  the average number of comparisons equals its average length u

Successful Search
Not the whole chain is searched
  the average number of comparisons equals the length s of the chain at the time the key was inserted plus 1
  the performance at the time a key was inserted equals that of unsuccessful search!
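
A sketch of the resulting averages, assuming unordered chains in which each new key is appended at the end of its chain, and counting key comparisons only. With n keys spread over m chains, the average chain length is n/m. The i-th inserted key finds i-1 keys already in the table, so its chain has average length (i-1)/m at that time:

```latex
u = \frac{n}{m} = \lambda, \qquad
s = \frac{1}{n}\sum_{i=1}^{n}\left(1 + \frac{i-1}{m}\right)
  = 1 + \frac{n-1}{2m}
  \approx 1 + \frac{\lambda}{2}
```

Under these assumptions u < s whenever λ < 2.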

Performance
The performance drops with the length of the chains
  worst case: all keys are stored in a single chain
  worst case performance: O(N)
  unsuccessful search performs better than successful search!! WHY?
  no problem with deletions!!

Coalesced Hashing
The hash sequence is implemented as a linked list within the hash table
  no rehash function
  the next hash position is the next available position in the linked list
  extra space for the list
h(key) = key mod 10
keys: 19, 29, 49, 59

h(key) = key mod 10
keys: 14, 29, 34, 28, 42, 39, 84, 38
initialization: avail = 9
[Figure: the table at initialization and after the insertions; it holds the lists of rehashing positions and the list of empty positions]
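
A minimal sketch of coalesced hashing for the keys above. It makes several assumptions that may differ from the figure on the slide: `avail` starts at the last position and only moves downward to the highest free position, rather than following an explicit list of empty positions. A colliding key is linked at the end of the chain that passes through its hash position, and deletions are not handled. The class name is hypothetical.

```python
class CoalescedHashTable:
    def __init__(self, m):
        self.m = m
        self.keys = [None] * m
        self.next = [None] * m    # link to the next position of the chain
        self.avail = m - 1        # candidate for the next free position

    def insert(self, key):
        i = key % self.m
        if self.keys[i] is None:
            self.keys[i] = key
            return i
        # Walk to the end of the chain that passes through position i.
        while True:
            if self.keys[i] == key:
                return i
            if self.next[i] is None:
                break
            i = self.next[i]
        # Move avail down to the highest free position.
        while self.avail >= 0 and self.keys[self.avail] is not None:
            self.avail -= 1
        if self.avail < 0:
            raise RuntimeError("table is full")
        j = self.avail
        self.keys[j] = key
        self.next[i] = j          # link the new position to the chain
        return j

    def search(self, key):
        i = key % self.m
        while i is not None and self.keys[i] is not None:
            if self.keys[i] == key:
                return i
            i = self.next[i]
        return None

t = CoalescedHashTable(10)
for key in (14, 29, 34, 28, 42, 39, 84, 38):
    print(key, "->", t.insert(key))
```

In this sketch the chains of different hash values merge: key 28 hashes to position 8, which already holds 34 from the chain of position 4, so 28 joins that chain.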

Performance of Coalesced Hashing
[Figure: probes per search for unsuccessful search and successful search]