CS 261 – Data Structures Hash Tables Part II: Using Buckets.


1 CS 261 – Data Structures Hash Tables Part II: Using Buckets

2 Hash Tables, Review

Hash tables are similar to Vectors except:
– Elements can be indexed by values other than integers
– A single position may hold more than one element

Arbitrary values (hash keys) map to integers by means of a hash function.

Computing a hash function is usually a two-step process:
1. Transform the value (or key) to an integer
2. Map that integer to a valid hash table index

Example: storing names
– Compute an integer from a name
– Map the integer to an index in a table (i.e., a vector, array, etc.)
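As a concrete illustration of the two-step process, here is a minimal sketch in C, assuming the keys are C strings (the names in the example). The sum-of-characters hash and the helper name hashIndex are illustrative assumptions, not the course's required functions; only the name hashfun is reused from the add code on slide 8.

#include <stdlib.h>   /* abs */

/* Step 1: transform the key (a string) into an integer. */
int hashfun (const char * key) {
    int sum = 0;
    int i;
    for (i = 0; key[i] != '\0'; i++)
        sum += (unsigned char) key[i];   /* add up the character codes */
    return sum;
}

/* Step 2: map that integer to a valid table index. */
int hashIndex (const char * key, int tablesize) {
    return abs(hashfun(key)) % tablesize;
}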

3 Hash Tables

Say we're storing names: Angie, Joe, Abigail, Linda, Mark, Max, Robert, John

The hash function maps each name to a table index:
  0: Angie, Robert
  1: Linda
  2: Joe, Max, John
  3: (empty)
  4: Abigail, Mark

4 Hash Tables: Resolving Collisions

There are two general approaches to resolving collisions:
1. Open address hashing: if a spot is full, probe for the next empty spot
2. Chaining (or buckets): keep a linked list at each table entry

Today we will look at option 2.

5 Resolving Collisions: Chaining / Buckets

Maintain a collection at each hash table entry:
– Chaining/buckets: maintain a linked list (or other collection type data structure, such as an AVL tree) at each table entry

Example, using the names from the previous slide (each bucket is a linked list):
  0: Angie -> Robert
  1: Linda
  2: Joe -> Max -> John
  3: (empty)
  4: Abigail -> Mark

6 Combining arrays and linked lists

…
struct hashTable {
    struct link ** table;   /* array of bucket heads, initialized to null pointers */
    int tablesize;          /* number of buckets */
    int count;              /* number of elements stored */
    …
};
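The struct above refers to struct link, which is not shown on these slides. A minimal sketch of what it might contain, assuming EleType is the element type used in the add routine on slide 8:

struct link {
    EleType value;        /* the stored element */
    struct link * next;   /* next node in this bucket's list */
};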

7 Hash table init

void hashTableInit (struct hashTable * ht, int size) {
    int i;
    ht->count = 0;
    ht->table = (struct link **) malloc(size * sizeof(struct link *));
    assert(ht->table != 0);
    ht->tablesize = size;
    for (i = 0; i < size; i++)
        ht->table[i] = 0;   /* null pointer */
}
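A short usage sketch: the init code relies on malloc and assert, so it needs <stdlib.h> and <assert.h>; the table size of 10 is arbitrary.

#include <stdlib.h>   /* malloc */
#include <assert.h>   /* assert */

int main (void) {
    struct hashTable ht;       /* relies on the struct and init function above */
    hashTableInit(&ht, 10);    /* start with 10 empty buckets */
    return 0;
}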

8 Adding a value to a hash table

void add (struct hashTable * ht, EleType newValue) {
    /* find correct bucket, add to list */
    int indx = abs(hashfun(newValue)) % ht->tablesize;
    struct link * newLink = (struct link *) malloc(sizeof(struct link));
    assert(newLink != 0);
    newLink->value = newValue;
    newLink->next = ht->table[indx];
    ht->table[indx] = newLink;   /* add to bucket */
    ht->count++;
    /* note: next step: reorganize if load factor > 3.0 */
}
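A usage sketch, assuming EleType is char * so the names from the earlier slides can be stored, and that ht was initialized as above:

add(&ht, "Angie");
add(&ht, "Robert");   /* if both names hash to the same index, they share one bucket's list */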

9 Contains test, remove

Contains: find the correct bucket, then see if the element is there.

Remove: slightly more tricky, because you only want to decrement the count if the element is actually in the list.

Alternative: instead of keeping a count in the hash table, call count on each list. What are the pros and cons of this?
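A sketch of the contains test described above, reusing hashfun and the EQ comparison macro that appear on the other slides (the function name contains is an assumption):

int contains (struct hashTable * ht, EleType testValue) {
    int indx = abs(hashfun(testValue)) % ht->tablesize;
    struct link * current;

    /* walk the one bucket this value could be in */
    for (current = ht->table[indx]; current != 0; current = current->next)
        if (EQ(current->value, testValue))
            return 1;   /* found it */
    return 0;           /* not in this bucket, so not in the table */
}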

10 Remove - need to change previous

Since we have only single links, remove is tricky.

Solutions: use double links (too much work), or use a previous pointer. We have seen this before: keep a pointer to the previous node, trailing behind the current node.

prev = 0;
for (current = ht->table[indx]; current != 0; current = current->next) {
    if (EQ(current->value, testValue)) {
        ... remove it (see next slide)
    }
    prev = current;
}

11 Two cases: prev is null or not

prev = 0;
for (current = ht->table[indx]; current != 0; current = current->next) {
    if (EQ(current->value, testValue)) {
        /* remove it */
        if (prev == 0)
            ht->table[indx] = current->next;
        else
            prev->next = current->next;
        free(current);
        ht->count--;
        return;
    }
    prev = current;
}
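Wrapping the loop in a complete function, a sketch (the name hashTableRemove and the reuse of hashfun and EQ follow the pattern of the other slides but are assumptions):

void hashTableRemove (struct hashTable * ht, EleType testValue) {
    int indx = abs(hashfun(testValue)) % ht->tablesize;
    struct link * current;
    struct link * prev = 0;

    for (current = ht->table[indx]; current != 0; current = current->next) {
        if (EQ(current->value, testValue)) {
            if (prev == 0)
                ht->table[indx] = current->next;   /* removing the first node in the bucket */
            else
                prev->next = current->next;        /* splice around the removed node */
            free(current);
            ht->count--;   /* decrement only because something was actually removed */
            return;
        }
        prev = current;
    }
}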

12 Hash Table Size

Load factor: λ = n / m, where n = number of elements and m = size of table.
– So the load factor represents the average number of elements at each table entry.

Want the load factor to remain small.

Can do the same trick as open table hashing: if the load factor becomes larger than some fixed limit (say, 3.0), then double the table size.
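One possible resize, sketched under the same assumptions as the earlier code (the function name hashTableResize is not from the slides): allocate a table twice as large, then move every node into the bucket its value hashes to under the new size.

void hashTableResize (struct hashTable * ht) {
    int oldSize = ht->tablesize;
    struct link ** oldTable = ht->table;
    int i;

    /* allocate and clear a table twice as large */
    ht->tablesize = 2 * oldSize;
    ht->table = (struct link **) malloc(ht->tablesize * sizeof(struct link *));
    assert(ht->table != 0);
    for (i = 0; i < ht->tablesize; i++)
        ht->table[i] = 0;

    /* move every node: its bucket index may change because the size changed */
    for (i = 0; i < oldSize; i++) {
        struct link * current = oldTable[i];
        while (current != 0) {
            struct link * next = current->next;
            int indx = abs(hashfun(current->value)) % ht->tablesize;
            current->next = ht->table[indx];
            ht->table[indx] = current;
            current = next;
        }
    }
    free(oldTable);   /* free only the old array; the nodes were reused */
}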

13 Hash Tables: Algorithmic Complexity

Assumptions:
– Time to compute the hash function is constant
– Chaining uses a linked list
– Worst case analysis: all values hash to the same position
– Best case analysis: the hash function uniformly distributes the values (all buckets have the same number of objects in them)

Find element operation:
– Worst case for open addressing: O(n)
– Worst case for chaining: O(n), or O(log n) if an AVL tree is used at each bucket
– Best case for open addressing: O(1)
– Best case for chaining: O(1)

14 Hash Tables: Average Case

Assuming that the hash function distributes elements uniformly (a BIG if), the average case for all operations is O(n/m), i.e., proportional to the load factor.

So you want to try to keep the load factor relatively small. You can do this by resizing the table (doubling the size) if the load factor is larger than some fixed limit, say 10.

But that only improves things IF the hash function distributes values uniformly. What happens if the hash value is always zero?

15 So when should you use hash tables?

– Your data values must be objects with good hash functions defined (e.g., String, Double), or you need to write your own definition of hashCode.
– You need to know that the values are uniformly distributed.
– If you can't guarantee that, then a skip list or AVL tree is often faster.

16 Your turn

Now do the worksheet to implement a hash table with buckets:
– Run down the linked list for the contains test
– Think about how to do remove
– Keep track of the number of elements
– Resize the table if the load factor is bigger than 3.0

Questions?

