Chapter 13.4: Hash Tables. Steve Ikeoka, ID: 113. CS 257 – Spring 2008.
Topics Covered Previously: Secondary-Storage Hash Tables; Insertion into a Hash Table; Hash-Table Deletion; Efficiency of Hash Table Indexes; Extensible Hash Tables.
Insertion into Extensible Hash Tables: To insert a record with search key K, compute h(K), take the first i bits of this bit sequence, and go to the bucket-array entry indexed by those i bits, which points to a block B. If B has room, put the record in B. Otherwise there are two possibilities, depending on j, the number of bits of h(K) used to determine membership in block B.
Insertion into Extensible Hash Tables (cont’d): If j<i, split block B into two blocks. Distribute the records of B between them based on the (j+1)st bit of h(K). Put j+1 in each block’s “nub” to record the number of bits now used. Adjust the pointers in the bucket array so that entries that pointed to B now point to B or to the new block, depending on their (j+1)st bit. If all records of B go into one of the two blocks, repeat the process with the next higher j on the overfull block.
Insertion into Extensible Hash Tables (cont’d): If j=i, increment i by 1. This doubles the bucket array length to $2^{i+1}$ entries, where i is the value before the increment. If w is a sequence of i bits that indexed the old bucket array, the entries indexed by w0 and w1 in the new array each point to the block that the w entry pointed to. Membership in each block is still determined by the same number of bits as before. Now j<i, so split block B as in the previous case.
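The insertion steps above can be sketched in Python. This is a minimal illustration and not code from the slides: the class names Block and ExtensibleHashTable, the 4-bit hash length, the capacity of 2 records per block, and the choice to store hash values directly as the records are all assumptions made for the example.

```python
class Block:
    def __init__(self, j):
        self.j = j          # "nub": number of hash bits that determine membership
        self.records = []   # hash values stored in this block (illustrative)


class ExtensibleHashTable:
    def __init__(self, bits=4, capacity=2):
        self.bits = bits          # assumed length of h(K) in bits
        self.capacity = capacity  # assumed number of records per block
        self.i = 1                # bits currently used to index the bucket array
        self.buckets = [Block(1), Block(1)]

    def _index(self, h):
        # The first (leftmost) i bits of the hash value.
        return h >> (self.bits - self.i)

    def insert(self, h):
        while True:
            block = self.buckets[self._index(h)]
            if len(block.records) < self.capacity:
                block.records.append(h)
                return
            if block.j == self.bits:
                raise OverflowError("block is full and no hash bits remain")
            if block.j == self.i:
                # Case j = i: double the bucket array, so that entries
                # w0 and w1 both point to the block that w pointed to.
                self.buckets = [b for b in self.buckets for _ in (0, 1)]
                self.i += 1
            # Now j < i: split the block, then retry the insertion.
            self._split(block)

    def _split(self, block):
        block.j += 1                    # both halves now use j+1 bits
        new_block = Block(block.j)
        shift = self.bits - block.j     # position of the (j+1)st bit of h(K)
        old_records = block.records
        block.records = [r for r in old_records if not (r >> shift) & 1]
        new_block.records = [r for r in old_records if (r >> shift) & 1]
        # Entries that pointed to the old block and whose (j+1)st bit is 1
        # are redirected to the new block.
        for idx, b in enumerate(self.buckets):
            if b is block and (idx >> (self.i - block.j)) & 1:
                self.buckets[idx] = new_block
```

The retry loop in insert covers the case where every record lands in the same half after a split: the overfull block is simply split again on the next bit.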
Example: Start with the table in the figure, where i=1. Insert a record with h(K) = 1010. The first bit is 1, so it belongs in the 2nd block. That block is full, so split it. Because j=i=1, first double the bucket array and set i=2, then set j=2 in the two split blocks. 1001 and 1010 go in block 10; the remaining record goes in block 11.
Example (cont’d): Insert records with hash values 0000 and 0111. Both go in the first block, which overflows. Since j=1 < i=2, just split the block without doubling the bucket array. Both blocks now have j=2. 0000 and 0001 stay in the first block; 0111 goes to the new block. The entry for 01 in the bucket array points to the new block.
Example (cont’d): Insert a record with h(K) = 1000. The block for 10 overflows. Since its j=2 equals i, double the bucket array and set i=3. Block 10 splits into blocks for 100 and 101, each with j=3. The other blocks stay at j=2.
Problems with Extensible Hash Tables: Doubling the bucket array is a substantial amount of work. When the bucket array is doubled in size, it may no longer fit in main memory. If the number of records per block is small, a block is likely to split well before the logical time to do so. For example, with 2 records per block, three records that share the same leading 20 bits would require i=20 and a bucket array of about a million entries.
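The "million-bucket" figure is just $2^{20}$. A short calculation makes the size concrete; the helper name bucket_array_entries and the 8-byte pointer size are assumptions chosen only for illustration.

```python
def bucket_array_entries(i):
    """Number of entries in a bucket array indexed by i bits."""
    return 2 ** i


POINTER_BYTES = 8  # assumption for illustration; not stated in the slides

entries = bucket_array_entries(20)
print(entries)                  # 1048576 entries
print(entries * POINTER_BYTES)  # 8388608 bytes under the 8-byte assumption
```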
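Linear Hash Tables: The number of buckets n is chosen so that the average number of records per bucket is a fixed fraction, say 80%, of the number of records that fill one block. Overflow blocks are permitted, but the average number of overflow blocks per bucket will be less than 1. With n buckets, use the $\lceil \log_2 n \rceil$ rightmost bits of the bit sequence produced by the hash function as the bucket number.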
Linear Hash Tables (cont’d): Suppose i bits of the hash value are being used, and let m be the i-bit binary integer $a_1 a_2 \ldots a_i$ formed by those bits. If m<n, bucket m exists, so place the record in that bucket. If n ≤ m < $2^i$, bucket m does not yet exist, so place the record in bucket $m - 2^{i-1}$ (that is, change $a_1$, which must be 1, to 0).
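The bucket-selection rule can be written as a small function. This is a sketch under the assumption that the hash value is available as a non-negative integer; the name bucket_for is hypothetical.

```python
def bucket_for(h, i, n):
    """Return the bucket number for hash value h, using i bits and n buckets."""
    m = h & ((1 << i) - 1)       # the i rightmost bits of h
    if m < n:
        return m                 # bucket m exists
    return m - (1 << (i - 1))    # bucket m does not exist: clear the leading bit


print(bucket_for(0b1010, i=2, n=3))  # 2, since bucket 10 exists
print(bucket_for(0b1011, i=2, n=3))  # 1, since bucket 11 is redirected to 01
```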
Example: Number of buckets n = 2. The hash function h produces 4 bits, of which only 1 bit is used (i=1). Records go in the 1st bucket if their hash value ends in 0 and in the 2nd bucket if it ends in 1. Choose n so that r ≤ 1.7n, where r is the number of records. With 2 records per block, average occupancy then does not exceed 85% of capacity. State shown: i=1, n=2, r=3.
Insertion into Linear Hash Tables: Compute h(K), where K is the key of the record. Use the i bits at the end of h(K) as the bucket number m. If m<n, put the record in bucket m. If m ≥ n, put the record in bucket $m - 2^{i-1}$. If there is no room in the bucket, create an overflow block. After each insert, compare the ratio r/n, where r is the number of records, with the threshold. If the ratio is too high, add the next bucket to the table. If n then exceeds $2^i$, increment i by 1.
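A minimal Python sketch of this procedure follows. It is not from the slides: the class name LinearHashTable is hypothetical, each bucket is modeled as one Python list (so overflow blocks are implicit rather than separate objects), hash values stand in for the records, and the 1.7 threshold is taken from the example.

```python
class LinearHashTable:
    def __init__(self, threshold=1.7):
        self.i = 1                  # number of hash bits in use
        self.n = 2                  # current number of buckets
        self.r = 0                  # current number of records
        self.threshold = threshold  # maximum allowed ratio r/n
        self.buckets = [[], []]     # primary block plus overflow, as one list

    def _bucket_number(self, h):
        m = h & ((1 << self.i) - 1)          # i bits at the end of h(K)
        return m if m < self.n else m - (1 << (self.i - 1))

    def insert(self, h):
        self.buckets[self._bucket_number(h)].append(h)
        self.r += 1
        if self.r > self.threshold * self.n:
            self._add_bucket()

    def _add_bucket(self):
        new = self.n                 # number of the bucket being added
        self.n += 1
        if self.n > (1 << self.i):
            self.i += 1
        self.buckets.append([])
        # The new bucket 1a2...ai takes records from bucket 0a2...ai.
        old = new - (1 << (self.i - 1))
        mask = (1 << self.i) - 1
        records = self.buckets[old]
        self.buckets[old] = [x for x in records if (x & mask) == old]
        self.buckets[new] = [x for x in records if (x & mask) == new]
```

Note that only one bucket is split per added bucket, regardless of which bucket received the new record. That is why overflow blocks can appear in buckets that have not yet been split.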
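Example: Insert a record with h(K)=0101. Place the record in the 2nd bucket, since its hash value ends in 1. The ratio r/n now exceeds the threshold of 1.7. Raise n to 3, so i = $\lceil \log_2 3 \rceil$ = 2. Split bucket 00: keys ending in 00 stay in the 1st bucket, and keys ending in 10 go to the new bucket, 10. State shown: i=2, n=3, r=4.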
Example (cont’d): Insert a record with h(K) = 0001. Place it in bucket 01, which exists. The block is full, so add an overflow block. 5/3 < 1.7, so no new bucket is created. State shown: i=2, n=3, r=5.
Example (cont’d): Insert a record with h(K) = 0111. Bucket 11 does not exist yet, so redirect to bucket 01 (change the leading bit of 11 to 0). The record fits in that bucket’s overflow block. The ratio 6/3 > 1.7, so create a new bucket, 11, and split the records in bucket 01: records ending in 01 stay in bucket 01, and records ending in 11 go to bucket 11. The overflow block can then be deleted.
Example (cont’d): The next insert would exceed the 1.7 ratio, since 7/4 > 1.7. It would raise n to 5, and i would become 3. State shown: i=2, n=4, r=6.
Lookup in a Linear Hash Table: Lookup follows the same procedure as selecting the bucket for insertion. To look up a record with h(K) = 1010: the last two bits are 10, so m = 2. m<n, so bucket 10 exists. A record with h(K) = 1010 is found there, but we need to examine the complete key of that record to be sure it is the one we want. State shown: i=2, n=3.
Lookup in a Linear Hash Table (cont’d): To look up a record with h(K) = 1011: the last two bits are 11, so m = 3. m ≥ n, so bucket 11 does not exist. Redirect to bucket 01 by changing the leading 1 to 0. Bucket 01 has no record with h(K) = 1011, so the desired record is not in the hash table.
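Both lookups can be traced with a short function. This is an illustrative sketch: the function name lookup, the representation of records as (hash value, key) pairs, and the sample keys and bucket contents are assumptions, not data from the slides.

```python
def lookup(buckets, i, n, h, key):
    """Return the (hash, key) record matching key, or None if it is absent."""
    m = h & ((1 << i) - 1)      # the last i bits of h(K)
    if m >= n:
        m -= 1 << (i - 1)       # bucket m does not exist: clear the leading bit
    for record in buckets[m]:
        # A matching hash value is not enough: compare the complete key.
        if record[1] == key:
            return record
    return None


# Hypothetical table with i=2 and n=3; the keys are made up for the example.
buckets = [
    [(0b0000, "a")],
    [(0b1111, "b"), (0b0101, "c")],
    [(0b1010, "d")],
]
print(lookup(buckets, 2, 3, 0b1010, "d"))  # (10, 'd'), found in bucket 10
print(lookup(buckets, 2, 3, 0b1011, "x"))  # None, after redirecting to bucket 01
```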