
1 Data Structures CSCI 132, Spring 2014 Lecture 34 Analyzing Hash Tables.



2 Recall Hash Tables
[Figure: a hash table]
Hash tables use an index function (the hash function) that maps each key to a location in the table; many possible keys can map to the same location. If the table is sparse, then most of the time only one key will go to each location. If two records do get assigned to the same location (a collision), we use a method for reassigning the second record (collision resolution).

3 The C++ Hash Table Specification
const int hash_size = 997;   // a prime number of appropriate size

class Hash_table {
public:
   Hash_table( );
   void clear( );
   Error_code insert(const Record &new_entry);
   Error_code retrieve(const Key &target, Record &found) const;
private:
   Record table[hash_size];
};

4 Implementation of insert( )
Error_code Hash_table::insert(const Record &new_entry)
{
   Error_code result = success;
   int probe_count,   // Counter to be sure that table is not full.
       increment,     // Increment used for quadratic probing.
       probe;         // Position currently probed in the hash table.
   Key null;          // Null key for comparison purposes.
   null.make_blank( );
   probe = hash(new_entry);   // Find location to insert new_entry.
   probe_count = 0;
   increment = 1;

5 insert( ) continued
   while (table[probe] != null                     // Is the location empty?
          && table[probe] != new_entry             // Duplicate key?
          && probe_count < (hash_size + 1) / 2) {  // Has overflow occurred?
      probe_count++;
      probe = (probe + increment) % hash_size;
      increment += 2;   // Prepare increment for next iteration.
   }
   if (table[probe] == null) table[probe] = new_entry;   // Insert new entry.
   else if (table[probe] == new_entry) result = duplicate_error;
   else result = overflow;   // The table is full.
   return result;
}
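The quadratic probing loop can be tried out on its own with a simplified table. The following is only a sketch under stated assumptions, not the course's Hash_table class: keys are plain non-negative ints, -1 marks an empty slot, the hash function is key % table size, and the names IntHashTable, InsertResult, kHashSize and kEmpty are hypothetical.

```cpp
#include <array>
#include <cassert>

// Assumptions for this sketch (not from the lecture): int keys,
// -1 as the "empty" marker, and hash(key) = key % kHashSize.
constexpr int kHashSize = 11;   // small prime, chosen only for illustration
constexpr int kEmpty = -1;

enum class InsertResult { success, duplicate_error, overflow };

struct IntHashTable {
    std::array<int, kHashSize> table;

    IntHashTable() { table.fill(kEmpty); }

    InsertResult insert(int key) {
        assert(key >= 0);
        int probe = key % kHashSize;   // home position
        int probe_count = 0;           // guards against probing forever
        int increment = 1;             // 1, 3, 5, ... gives offsets 1, 4, 9, ...
        while (table[probe] != kEmpty
               && table[probe] != key
               && probe_count < (kHashSize + 1) / 2) {
            ++probe_count;
            probe = (probe + increment) % kHashSize;
            increment += 2;
        }
        if (table[probe] == kEmpty) {
            table[probe] = key;
            return InsertResult::success;
        }
        if (table[probe] == key) return InsertResult::duplicate_error;
        return InsertResult::overflow;
    }
};
```

With this table, the keys 3, 14 and 25 all hash to position 3, so they end up at positions 3, 3 + 1 = 4 and 3 + 4 = 7, following the offsets 0, 1, 4 that quadratic probing visits.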

6 Likelihood of collisions
How many people have to be in a room before the probability that two of them have the same birthday reaches 50%?
P = 1 - (364/365)*(363/365)*(362/365)*...*((365-m+1)/365) > 0.5 when m >= 23
The calculation for the probability of a collision in a table is similar. The table does not have to be very full for the probability of a collision to reach at least 50%.
Therefore: Collisions happen! We must handle them efficiently.
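The birthday calculation can be checked numerically. This is a small sketch, assuming every location is equally likely; the function names collision_probability and smallest_m_over_half are hypothetical and not part of the lecture code.

```cpp
#include <cassert>

// Probability that at least two of m items land in the same one of
// `slots` equally likely locations (slots = 365 is the birthday problem).
double collision_probability(int m, int slots) {
    assert(m >= 0 && slots > 0);
    double all_distinct = 1.0;   // probability that no two items collide
    for (int i = 0; i < m; ++i) {
        if (i >= slots) return 1.0;   // more items than slots: certain collision
        all_distinct *= static_cast<double>(slots - i) / slots;
    }
    return 1.0 - all_distinct;
}

// Smallest m for which the collision probability exceeds 0.5.
int smallest_m_over_half(int slots) {
    int m = 0;
    while (collision_probability(m, slots) <= 0.5) ++m;
    return m;
}
```

Calling smallest_m_over_half(365) returns 23, matching the slide. Passing a table size such as hash_size in place of 365 gives the analogous count for a hash table, under the same uniformity assumption.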

7 Counting Probes
We can analyze the running time of hash tables by counting comparisons. Comparisons take place when "probing" an entry: looking at an entry and comparing its key to a target. The number of probes done depends on how full the table is.
n = number of entries in the table
t = total number of positions in the table (= hash_size)
λ = n/t = load factor
λ = 0 means no entries in the table
λ = 0.5 means the table is 1/2 full
λ <= 1 for a contiguous table without chaining (open addressing)
λ can be greater than 1 if using chaining

8 Number of comparisons for chaining
Unsuccessful searches: If entries are distributed evenly over the table, then the expected number of entries in each chain is n/t = λ. For an unsuccessful search, we must do one probe for each entry in the list, so the average number of probes (or comparisons) is λ.
Successful searches: The average number of comparisons for a sequential search of a list with k items is (k + 1)/2. The node we are looking for is in our list, and the other n-1 nodes are distributed evenly over the table, so the average number of nodes in that list will be:
k = (n-1)/t + 1 ~ n/t + 1 = λ + 1
The average number of comparisons will be (λ + 1 + 1)/2 = λ/2 + 1.
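The chaining estimates are simple enough to evaluate directly. The sketch below just encodes the two formulas from this slide; the function names are hypothetical, and the results are expected averages under the even-distribution assumption, not measurements.

```cpp
#include <cassert>

// Load factor: n entries stored in a table with t positions.
double load_factor(int n, int t) {
    assert(n >= 0 && t > 0);
    return static_cast<double>(n) / t;
}

// Expected comparisons for an unsuccessful search with chaining:
// every entry of an average-length chain is examined.
double chaining_unsuccessful(double lambda) {
    return lambda;
}

// Expected comparisons for a successful search with chaining:
// (k + 1) / 2 with k = lambda + 1, which simplifies to 1 + lambda / 2.
double chaining_successful(double lambda) {
    return 1.0 + lambda / 2.0;
}
```

For example, a table with 40 positions holding 20 entries has a load factor of 0.5, so a successful search is expected to take about 1.25 comparisons and an unsuccessful one about 0.5.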

9 Open addressing (without chaining)
Evenly distributed entries. Number of comparisons (approx):
Random probing:
   Successful case: (1/λ) ln(1/(1 - λ))
   Unsuccessful case: 1/(1 - λ)
Linear probing:
   Successful case: 0.5(1 + 1/(1 - λ))
   Unsuccessful case: 0.5(1 + 1/(1 - λ)^2)
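To get a feel for how quickly these estimates grow as the table fills, the four formulas can be evaluated for a given load factor. This is a sketch only; the function names are hypothetical and the values are the approximations from the slide, valid for 0 < λ < 1.

```cpp
#include <cassert>
#include <cmath>

// Random probing, successful search: (1/lambda) * ln(1 / (1 - lambda)).
double random_successful(double lambda) {
    assert(lambda > 0.0 && lambda < 1.0);
    return (1.0 / lambda) * std::log(1.0 / (1.0 - lambda));
}

// Random probing, unsuccessful search: 1 / (1 - lambda).
double random_unsuccessful(double lambda) {
    assert(lambda >= 0.0 && lambda < 1.0);
    return 1.0 / (1.0 - lambda);
}

// Linear probing, successful search: 0.5 * (1 + 1 / (1 - lambda)).
double linear_successful(double lambda) {
    assert(lambda >= 0.0 && lambda < 1.0);
    return 0.5 * (1.0 + 1.0 / (1.0 - lambda));
}

// Linear probing, unsuccessful search: 0.5 * (1 + 1 / (1 - lambda)^2).
double linear_unsuccessful(double lambda) {
    assert(lambda >= 0.0 && lambda < 1.0);
    const double gap = 1.0 - lambda;
    return 0.5 * (1.0 + 1.0 / (gap * gap));
}
```

At λ = 0.5 the linear probing estimates are 1.5 comparisons for a successful search and 2.5 for an unsuccessful one, while random probing gives 2 for the unsuccessful case. All four values grow without bound as λ approaches 1.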

10 Theoretical and empirical results

11 Hash Tables vs. Other Methods
Speed of retrieval from a hash table does not depend on the total number of entries, but on the ratio of entries to table size (λ). A table of size 40 with 20 entries has the same performance as a table of size 4000 with 2000 entries.
Sequential search: Θ(n)
Binary search: Θ(lg n)
Hash table retrieval: O(1) for small λ.
Read section 9.8 on choosing a method for storage and retrieval of data.

12 Radix sort
Radix sort creates a table of queues. Each queue corresponds to a letter of the alphabet. Sort from the least significant letter to the most significant letter.

13 Implementation of Radix Sort
const int key_size = 5;
const int max_chars = 28;

template <class Record>
void Sortable_list<Record>::radix_sort( )
{
   Record data;
   Queue queues[max_chars];
   for (int position = key_size - 1; position >= 0; position--) {
      // Loop from the least to the most significant position.
      while (remove(0, data) == success) {
         int queue_number = alphabetic_order(data.key_letter(position));
         queues[queue_number].append(data);   // Queue operation.
      }
      rethread(queues);   // Reassemble the list.
   }
}
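The same idea can be sketched with standard library containers instead of the course's Sortable_list and Queue classes. The assumptions here are not from the lecture: keys are lowercase words of exactly kKeySize letters (so 26 queues suffice), and the names radix_sort, kKeySize and kNumQueues are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Assumption: every key is a lowercase word of exactly kKeySize letters.
constexpr std::size_t kKeySize = 3;
constexpr std::size_t kNumQueues = 26;   // one queue per letter 'a'..'z'

void radix_sort(std::vector<std::string>& keys) {
    std::array<std::queue<std::string>, kNumQueues> queues;
    // Loop from the least significant (last) to the most significant (first) letter.
    for (std::size_t position = kKeySize; position-- > 0; ) {
        // Distribute: append each key to the queue for its letter at `position`.
        for (const std::string& key : keys) {
            assert(key.size() == kKeySize);
            assert(key[position] >= 'a' && key[position] <= 'z');
            queues[static_cast<std::size_t>(key[position] - 'a')].push(key);
        }
        // Reassemble: empty the queues in alphabetical order back into the list.
        keys.clear();
        for (auto& q : queues) {
            while (!q.empty()) {
                keys.push_back(q.front());
                q.pop();
            }
        }
    }
}
```

Because each queue preserves the order in which keys arrived, every pass is stable, which is what allows the sort to process the least significant letter first and still end with a fully sorted list.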

