Hash Tables Part II: Using Buckets

CS 261 – Data Structures

Hash Tables, Review
Hash tables are similar to Vectors except…
- Elements can be indexed by values other than integers
- A single position may hold more than one element
Arbitrary values (hash keys) map to integers by means of a hash function
Computing a hash function is usually a two-step process:
1. Transform the value (or key) to an integer
2. Map that integer to a valid hash table index
Example: storing names
- Compute an integer from a name
- Map the integer to an index in a table (i.e., a vector, array, etc.)
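The two-step process can be sketched in C as follows. This is only an illustration: the names stringHash and hashIndex are hypothetical, and the multiply-by-31 scheme is one simple choice among many, not necessarily the one used in the course.

```c
#include <assert.h>

/* Step 1 (hypothetical helper): transform a string key to an integer.
   Assumption: a simple multiply-by-31 accumulation over the characters. */
int stringHash(const char *key) {
    unsigned int h = 0;
    while (*key != '\0') {
        h = 31 * h + (unsigned char) *key;  /* unsigned arithmetic wraps safely */
        key++;
    }
    return (int) (h & 0x7fffffff);          /* mask the top bit: result is non-negative */
}

/* Step 2: map that integer to a valid index in a table of the given size. */
int hashIndex(const char *key, int tablesize) {
    assert(tablesize > 0);
    return stringHash(key) % tablesize;
}
```

For a table with five entries, hashIndex("Angie", 5) gives some index in the range 0 to 4; which one depends entirely on the hash function chosen.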

Hash Tables
Say we’re storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John
The hash function maps each name to a table index:
0: Angie, Robert
1: Linda
2: Joe, Max, John
3: (empty)
4: Abigail, Mark

Hash Tables: Resolving Collisions
There are two general approaches to resolving collisions:
1. Open address hashing: if a spot is full, probe for the next empty spot
2. Chaining (or buckets): keep a linked list at each table entry
Today we will look at option 2

Resolving Collisions: Chaining / Buckets
Maintain a collection at each hash table entry:
Chaining/buckets: maintain a linked list (or other collection-type data structure, such as an AVL tree) at each table entry
0: Robert -> Angie
1: Linda
2: Max -> John -> Joe
3: (empty)
4: Mark -> Abigail

Combining arrays and linked lists …
struct hashTable {
    struct link ** table; // array initialized to null pointers
    int tablesize;
    int count; // count number of elements
    …
};

Hash table init
void hashTableInit (struct hashTable * ht, int size) {
    int i;
    ht->count = 0;
    ht->table = (struct link **) malloc(size * sizeof(struct link *));
    assert(ht->table != 0);
    ht->tablesize = size;
    for (i = 0; i < size; i++)
        ht->table[i] = 0; /* null pointer */
}

Adding a value to a hash table
void add (struct hashTable * ht, EleType newValue) {
    // find correct bucket, add to list
    int indx = abs(hashfun(newValue)) % ht->tablesize;
    struct link * newLink = (struct link *) malloc(…);
    assert(newLink != 0);
    newLink->value = newValue;
    newLink->next = ht->table[indx];
    ht->table[indx] = newLink; /* add to bucket */
    ht->count++;
    // note: next step: reorganize if load factor > 3.0
}

Contains test, remove
Contains: find the correct bucket, then see if the element is there
Remove: slightly more tricky, because you only want to decrement the count if the element is actually in the list
Alternative: instead of keeping a count in the hash table, you can call count on each list. What are the pros and cons of this?
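One possible shape for contains and remove is sketched below. It is not the worksheet solution. It assumes that EleType is int, that hashfun is the identity, and that struct link holds a value and a next pointer; the names hashTableAdd, hashTableContains, hashTableRemove, and bucketIndex are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef int EleType;                       /* assumption: elements are ints */

struct link { EleType value; struct link *next; };   /* assumed link layout */

struct hashTable { struct link **table; int tablesize; int count; };

int hashfun(EleType v) { return (int) v; } /* assumption: trivial hash function */

/* Taking abs after the % avoids calling abs on the most negative int. */
int bucketIndex(struct hashTable *ht, EleType v) {
    return abs(hashfun(v) % ht->tablesize);
}

void hashTableInit(struct hashTable *ht, int size) {
    int i;
    ht->count = 0;
    ht->table = (struct link **) malloc(size * sizeof(struct link *));
    assert(ht->table != 0);
    ht->tablesize = size;
    for (i = 0; i < size; i++) ht->table[i] = 0;
}

void hashTableAdd(struct hashTable *ht, EleType v) {
    int indx = bucketIndex(ht, v);
    struct link *newLink = (struct link *) malloc(sizeof(struct link));
    assert(newLink != 0);
    newLink->value = v;
    newLink->next = ht->table[indx];       /* push onto the front of the bucket */
    ht->table[indx] = newLink;
    ht->count++;
}

/* Contains: find the correct bucket, then run down its list. */
int hashTableContains(struct hashTable *ht, EleType v) {
    struct link *cur = ht->table[bucketIndex(ht, v)];
    for (; cur != 0; cur = cur->next)
        if (cur->value == v) return 1;
    return 0;
}

/* Remove: unlink the first matching element, if any.
   The count is decremented only when something was actually removed. */
void hashTableRemove(struct hashTable *ht, EleType v) {
    struct link **p = &ht->table[bucketIndex(ht, v)];
    while (*p != 0) {
        if ((*p)->value == v) {
            struct link *dead = *p;
            *p = dead->next;               /* bypass the removed link */
            free(dead);
            ht->count--;
            return;
        }
        p = &(*p)->next;
    }
}
```

Walking the list with a pointer to the previous next field means the head of the bucket needs no special case. Removing a value that is not present leaves count unchanged.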

Hash Table Size
Load factor: λ = n / m, where n = # of elements and m = size of table
So, the load factor represents the average number of elements at each table entry
Want the load factor to remain small
Can do the same trick as open address hashing: if the load factor becomes larger than some fixed limit (say, 3.0), then double the table size
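The doubling step might look like the sketch below. It assumes int elements, an identity hashfun, and a struct link with value and next fields; hashTableResize and hashTableAdd are hypothetical names. Existing links are moved into the new table rather than reallocated, because each element's bucket index depends on the table size and so must be recomputed.

```c
#include <assert.h>
#include <stdlib.h>

typedef int EleType;                       /* assumption: elements are ints */

struct link { EleType value; struct link *next; };   /* assumed link layout */

struct hashTable { struct link **table; int tablesize; int count; };

int hashfun(EleType v) { return (int) v; } /* assumption: trivial hash function */

void hashTableInit(struct hashTable *ht, int size) {
    int i;
    ht->count = 0;
    ht->table = (struct link **) malloc(size * sizeof(struct link *));
    assert(ht->table != 0);
    ht->tablesize = size;
    for (i = 0; i < size; i++) ht->table[i] = 0;
}

/* Double the table size and rehash every link into the new table. */
void hashTableResize(struct hashTable *ht) {
    int oldSize = ht->tablesize, newSize = 2 * oldSize, i;
    struct link **newTable = (struct link **) malloc(newSize * sizeof(struct link *));
    assert(newTable != 0);
    for (i = 0; i < newSize; i++) newTable[i] = 0;
    for (i = 0; i < oldSize; i++) {
        struct link *cur = ht->table[i];
        while (cur != 0) {
            struct link *next = cur->next;             /* save before relinking */
            int indx = abs(hashfun(cur->value) % newSize);
            cur->next = newTable[indx];
            newTable[indx] = cur;
            cur = next;
        }
    }
    free(ht->table);
    ht->table = newTable;
    ht->tablesize = newSize;
}

void hashTableAdd(struct hashTable *ht, EleType v) {
    int indx = abs(hashfun(v) % ht->tablesize);
    struct link *newLink = (struct link *) malloc(sizeof(struct link));
    assert(newLink != 0);
    newLink->value = v;
    newLink->next = ht->table[indx];
    ht->table[indx] = newLink;
    ht->count++;
    /* reorganize if the load factor n / m exceeds 3.0 */
    if ((double) ht->count / ht->tablesize > 3.0)
        hashTableResize(ht);
}
```

Starting from a table of size 2, the seventh add pushes the load factor to 3.5 and triggers the first doubling, to size 4.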

Hash Tables: Algorithmic Complexity
Assumptions:
- Time to compute hash function is constant
- Chaining uses a linked list
- Worst case analysis → all values hash to same position
- Best case analysis → hash function uniformly distributes the values (all buckets have the same number of objects in them)
Find element operation:
- Worst case for open addressing → O(n)
- Worst case for chaining → O(n), or O(log n) if an AVL tree is used
- Best case for open addressing → O(1)
- Best case for chaining → O(1)

Hash Tables: Average Case
Assuming that the hash function distributes elements uniformly (a BIG if), the average case for all operations is O(λ), where λ is the load factor
So you want to keep the load factor relatively small
You can do this by resizing the table (doubling the size) if the load factor is larger than some fixed limit, say 10
But that only improves things IF the hash function distributes values uniformly. What happens if the hash value is always zero?
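To make the last question concrete, here is a small experiment that tracks only bucket sizes, not the elements themselves. The functions hashZero, hashIdentity, and longestChain are hypothetical names introduced for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Worst case: every value hashes to zero, so everything collides. */
int hashZero(int v) { (void) v; return 0; }

/* A hash function that spreads the values 0..n-1 evenly over the buckets. */
int hashIdentity(int v) { return v; }

/* Place the values 0..n-1 into m buckets using the given hash function
   and return the length of the longest chain. */
int longestChain(int (*hashfun)(int), int n, int m) {
    int *sizes = (int *) calloc(m, sizeof(int));   /* all bucket sizes start at 0 */
    int i, longest = 0;
    assert(sizes != 0);
    for (i = 0; i < n; i++) {
        int indx = abs(hashfun(i) % m);
        sizes[indx]++;
        if (sizes[indx] > longest) longest = sizes[indx];
    }
    free(sizes);
    return longest;
}
```

With 100 values and 10 buckets, hashIdentity gives a longest chain of 10, which equals the load factor. With hashZero the longest chain is 100 no matter how many buckets there are, so growing the table does not help and a search degrades to a linear scan.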

So when should you use hash tables?
- Your data values must be objects with good hash functions defined (string, Double)
- Or you need to write your own definition of hashCode
- Need to know that the values are uniformly distributed
- If you can’t guarantee that, then a skip list or AVL tree is often faster

Your turn
Now do the worksheet to implement a hash table with buckets
- Run down the linked list for the contains test
- Think about how to do remove
- Keep track of the number of elements
- Resize the table if the load factor is bigger than 3.0
Questions?
