Hash Tables.

Overview of Data Structures
Arrays
    Access: O(1)
    Insertion: O(N) (average case)
    Deletion: O(N) (average case)
Linked lists
    Access: O(N)
    Insertion: O(N) (average case; O(1) at the front)
    Deletion: O(N) (average case; O(1) at the front)
Binary search trees
    Access: O(log N) if balanced
    Insertion: O(log N) if balanced
    Deletion: O(log N) if balanced

BST: Better Performance
Divide and conquer: a balanced BST halves the remaining work at each step. Can we reduce the work by a bigger factor?

Set Representation
To represent a set of keys with no duplicates:
    Use an array and store value k at index k.
    Each slot could just hold 0 (not in set) or 1 (in set), or true/false.
    Problems: memory use (the array is often sparsely filled), and negative integers cannot be used as indexes.
Big-O analysis:
    Access: O(1)
    Insertion: O(1)
    Deletion: O(1)
(Figure: an array with indexes 0 through 7.)
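A minimal sketch of the direct-address set described above, assuming non-negative integer keys below a fixed bound; the class name `DirectAddressSet` is hypothetical. Each operation touches a single slot, which is why all three operations are O(1), while memory grows with the largest possible key rather than with the number of stored keys.

```cpp
#include <cassert>
#include <vector>

// Hypothetical direct-address set: slot k is true exactly when k is in the set.
// Only keys in [0, maxKey) can be stored; negative keys have no slot at all.
class DirectAddressSet {
public:
    explicit DirectAddressSet(int maxKey) : present(maxKey, false) {}

    void add(int k)    { checkKey(k); present[k] = true; }   // O(1)
    void remove(int k) { checkKey(k); present[k] = false; }  // O(1)

    bool contains(int k) const {                              // O(1)
        // Keys outside the array's range are simply "not in the set".
        return k >= 0 && k < static_cast<int>(present.size()) && present[k];
    }

private:
    std::vector<bool> present;  // one flag per possible key, however sparse

    void checkKey(int k) const {
        assert(k >= 0 && k < static_cast<int>(present.size()));
    }
};
```

For example, `DirectAddressSet s(8); s.add(3);` sets slot 3, and `s.contains(-1)` returns false because no slot exists for negative keys.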

Hash Tables
Hash tables overcome the problems of arrays while keeping fast access, insertion, and deletion. They use an array plus a hash function to determine the index of each element.
    hash: to mix randomly; to chop into pieces
    hash function: a mapping that takes a large data value (not necessarily an integer) and reduces it to a smaller piece of data (usually an integer); it maps values to indexes
    hash code: the output of a hash function for a particular input. Example: on the previous slide, the hash function was hashCode(i) = i
    hash table: an array that stores elements via hashing. The array entries are called "buckets"; the previous slide had 8 buckets.

Another Hash Function
We want to allow negative integers:
    hashCode(i) = abs(i) % capacity, where capacity is the number of buckets

    int hashCode(int i) {
        return abs(i) % capacity;
    }

Suppose capacity = 8:
    mySet.add(3);     // index 3
    mySet.add(-4);    // index 4
    mySet.add(10);    // index 2
    mySet.add(115);   // index 3: collides with 3
(Figure: 10 in bucket 2, 3 in bucket 3, -4 in bucket 4; 115 also maps to bucket 3.)
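The slide's hash function can be checked with a small sketch; passing `capacity` as a parameter instead of reading a class member is an assumption made so the function stands alone. With capacity 8 it reproduces the example: 3 maps to 3, -4 to 4, 10 to 2, and 115 to 3, which collides with 3.

```cpp
#include <cassert>
#include <climits>
#include <cstdlib>

// Bucket index for an int key: abs(value) % capacity, as on the slide.
// abs(INT_MIN) cannot be represented as an int (undefined behavior),
// so this sketch rejects that one key.
int hashCode(int value, int capacity) {
    assert(capacity > 0);
    assert(value != INT_MIN);
    return std::abs(value) % capacity;
}
```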

Collisions
collision: a hash function maps two or more values to the same index.
    mySet.add(3); mySet.add(-4); mySet.add(10);
    mySet.add(115);   // collides with 3: both map to index 3
We must have a way of resolving collisions.

Hash Function Example
Use names as the key. Take the second letter of the name, take its letter value (a=1, b=2, etc.), divide by 8, and take the remainder.
Ex: What does "Norbert" hash to? o -> 15, and 15 % 8 = 7
    Mary: a -> 1, 1 % 8 = 1
    Huy: u -> 21, 21 % 8 = 5
    Brandon: r -> 18, 18 % 8 = 2
    Scott: c -> 3, 3 % 8 = 3
    Shyam: h -> 8, 8 % 8 = 0
    Bobby: o -> 15, 15 % 8 = 7 (collision with Norbert)
    Colin: o -> 15, 15 % 8 = 7 (collision)
This is not a one-to-one mapping from keys to hash values.
What's the maximum number of values that this function can hash perfectly (without collisions)?
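The name-hashing rule can be written as a short function; `nameHash` is a hypothetical name, and the sketch assumes every name has at least two characters with a letter in second position. Because the result is a remainder mod 8, only eight distinct codes exist, so at most eight names can be hashed without a collision.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Second-letter hash from the slide: letter value (a=1, b=2, ...) modulo 8.
int nameHash(const std::string& name) {
    assert(name.size() >= 2);
    // Normalize case so 'O' and 'o' get the same value.
    char c = static_cast<char>(std::tolower(static_cast<unsigned char>(name[1])));
    assert(c >= 'a' && c <= 'z');
    int letterValue = c - 'a' + 1;  // a=1, b=2, ..., z=26
    return letterValue % 8;
}
```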

Hash Function
Another example of a hash function maps a string to the sum of the ASCII values of its characters:

    int hashCode(const string& s) {
        int sum = 0;
        for (size_t i = 0; i < s.length(); i++)
            sum += s[i];   // add this character's code
        return sum;
    }

Question: what's the hash code for "BAD"?
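Working the question through: in ASCII, 'B' = 66, 'A' = 65, and 'D' = 68, so "BAD" hashes to 66 + 65 + 68 = 199. The sketch below (the names `sumHash` and `bucketIndex` are illustrative) also exposes a weakness of this function: anagrams such as "BAD" and "DAB" always get the same hash code.

```cpp
#include <cassert>
#include <string>

// Sum-of-character-codes hash from the slide, written with std::string.
int sumHash(const std::string& s) {
    int sum = 0;
    for (char c : s) {
        sum += static_cast<unsigned char>(c);  // character code, e.g. 'A' == 65 in ASCII
    }
    return sum;
}

// Hypothetical helper: fold the hash code into a table of `capacity` buckets.
int bucketIndex(const std::string& s, int capacity) {
    assert(capacity > 0);
    return sumHash(s) % capacity;
}
```

With 10 buckets, "BAD" lands in bucket 199 % 10 = 9.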

Collision Handling
How do we insert an element when another element already occupies the bucket it maps to?

Collision Resolution: Probing
probing: resolve a collision by moving to another index
    linear probing: move to the next available index, wrapping around if necessary
        mySet.add(3);     // index 3
        mySet.add(-4);    // index 4
        mySet.add(10);    // index 2
        mySet.add(115);   // collides with 3; index 4 is also taken, so it goes to index 5
    quadratic probing: a variation that adds successive squares to the index when a collision occurs: index + 1, index + 4, index + 9, ...
Disadvantage of linear probing: clusters (runs of elements at consecutive indexes) slow down lookup.
Resize when the load factor reaches some limit; the load factor of a table with n elements is n / capacity.
(Figure: 10 in bucket 2, 3 in bucket 3, -4 in bucket 4, 115 in bucket 5.)
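A minimal linear-probing sketch, assuming a fixed capacity, no removal, and the `abs(value) % capacity` hash from earlier; `LinearProbingSet`, `slotOf`, and `findSlot` are hypothetical names. Replaying the slide's insertions into 8 slots, 115 hashes to 3, finds slots 3 and 4 occupied, and lands in slot 5.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical open-addressing set of ints using linear probing.
// Fixed capacity, no resizing, no removal (removal would need "deleted"
// markers so that later probes don't stop early at a hole).
class LinearProbingSet {
public:
    explicit LinearProbingSet(int capacity)
        : slots(capacity, 0), used(capacity, false) {
        assert(capacity > 0);
    }

    // Returns false only if the table is full and value is not already present.
    bool add(int value) {
        int i = findSlot(value);
        if (i == -1) return false;
        slots[i] = value;   // harmless overwrite if value was already stored here
        used[i] = true;
        return true;
    }

    // Index where value is stored, or -1 if it is absent.
    int slotOf(int value) const {
        int i = findSlot(value);
        return (i != -1 && used[i]) ? i : -1;
    }

    bool contains(int value) const { return slotOf(value) != -1; }

private:
    std::vector<int> slots;
    std::vector<bool> used;

    // Walks start, start+1, start+2, ... (wrapping around) and returns the first
    // slot that holds value or is empty; -1 if the table is full without a match.
    int findSlot(int value) const {
        int cap = static_cast<int>(slots.size());
        int start = std::abs(value) % cap;
        for (int step = 0; step < cap; ++step) {
            int i = (start + step) % cap;
            if (!used[i] || slots[i] == value) return i;
        }
        return -1;
    }
};
```

Lookup can stop at the first empty slot because, without removals, an element is never stored past a hole in its own probe sequence.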

Collision Resolution: Chaining
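Each bucket holds another data structure, such as a linked list or a balanced binary tree, containing every element that hashes to that index.
    Uses more space
    Every element goes into the bucket its hash code selects, so the table can't run out of indexes
(Figure: buckets that are either NULL or point to a chain of nodes; the values shown are 12, 34, 42, 74, and 14.)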
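The slide notes that a bucket can hold a balanced binary tree instead of a linked list. A minimal sketch of that variant, assuming the standard library's `std::set` (an ordered container with logarithmic lookup, commonly a red-black tree) as the per-bucket structure and the same `abs(value) % capacity` hash; `TreeBucketSet` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdlib>
#include <set>
#include <vector>

// Hypothetical chaining variant: each bucket is a std::set<int>, so a bucket
// holding k elements costs O(log k) per lookup instead of O(k) for a list.
class TreeBucketSet {
public:
    explicit TreeBucketSet(int capacity) : buckets(capacity) {
        assert(capacity > 0);
    }

    void add(int value)    { bucketFor(value).insert(value); }  // std::set ignores duplicates
    void remove(int value) { bucketFor(value).erase(value); }   // no-op if absent
    bool contains(int value) const {
        return buckets[indexFor(value)].count(value) > 0;
    }

private:
    std::vector<std::set<int>> buckets;

    int indexFor(int value) const {
        return std::abs(value) % static_cast<int>(buckets.size());
    }
    std::set<int>& bucketFor(int value) { return buckets[indexFor(value)]; }
};
```

The trade-off matches the slide: the table never runs out of room, and the extra structure per bucket costs more space than a bare array of values.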

Chaining: how to add element
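Make sure you avoid duplicates.
Add the new element to the linked list of its bucket; adding at the front is fastest.
Insertion: O(1) to link the node at the front, plus the cost of the duplicate check, which walks that bucket's chain.
(Figure: the same bucket table as on the previous slide.)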

Implement HashSet Class
Exercise: Represent a set of integers using a hash table. Include the following member functions:
    HashSet()
    add(int value)
    clear()
    contains(int value)
    remove(int value)

HashSet.h

    #ifndef HASHSET_H
    #define HASHSET_H

    #include <cstddef>   // NULL

    // One node in a bucket's linked list (chain)
    struct HashNode {
        int data;
        HashNode* next;

        HashNode(int data = 0, HashNode* next = NULL) {
            this->data = data;
            this->next = next;
        }
    };

HashSet.h

    class HashSet {
    public:
        HashSet();                        // constructs a new empty set
        ~HashSet();                       // frees memory allocated by the set
        void add(int value);              // adds value to the set if not already present
        void clear();                     // removes all elements from the set
        bool contains(int value) const;   // returns true if value is in the set
        void print() const;               // prints the hash table
        void remove(int value);           // removes value from the set, if present
    private:
        HashNode** elements;              // array of HashNode pointers (bucket heads)
        int capacity;                     // number of buckets
        int size;                         // number of elements stored
        int hashCode(int value) const;    // returns the bucket index of value
    };

    #endif

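HashSet.cpp

    #include "HashSet.h"
    #include <cstdlib>    // abs
    #include <iostream>   // cout, endl
    using namespace std;

    HashSet::HashSet() {
        elements = new HashNode*[10]();   // 10 buckets, all initialized to NULL
        capacity = 10;
        size = 0;
    }

    HashSet::~HashSet() {
        // free every node in every chain, then the bucket array itself
        for (int i = 0; i < capacity; i++) {
            HashNode* cur = elements[i];
            while (cur != NULL) {
                HashNode* next = cur->next;
                delete cur;
                cur = next;
            }
        }
        delete[] elements;
    }

    int HashSet::hashCode(int value) const {
        return abs(value) % capacity;
    }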

add Method
Add an element to the hash table. Don't add duplicate elements: compute the hash code, and add the element to the corresponding linked list if it's not already there.
    table.add(52);
(Figure: the bucket table after 52 is added to the chain for its bucket.)

Exercise: add Method
Write the add method for HashSet.

HashSet.cpp

    void HashSet::add(int value) {
        if (!contains(value)) {
            int bucket = hashCode(value);
            // add to the front of the bucket's linked list
            HashNode* newNode = new HashNode(value);
            newNode->next = elements[bucket];
            elements[bucket] = newNode;
            size++;
        }
    }

HashSet.cpp

    void HashSet::print() const {
        for (int i = 0; i < capacity; i++) {
            cout << "[" << i << "]: ";
            HashNode* cur = elements[i];
            while (cur != NULL) {
                cout << " -> " << cur->data;
                cur = cur->next;
            }
            cout << endl;
        }
        cout << "size = " << size << endl;
    }

contains Method
Is a value in the hash table? Returns true or false.
Compute the value's hash code, and look through the linked list at that index.
    table.contains(42);   // true
    table.contains(54);   // false
(Figure: bucket table containing 12, 34, 42, 74, and 14.)

Exercise: contains Method
Write the contains method for HashSet

contains Method

    bool HashSet::contains(int value) const {
        int bucket = hashCode(value);
        HashNode* cur = elements[bucket];
        while (cur != NULL) {
            if (cur->data == value)
                return true;
            cur = cur->next;
        }
        return false;
    }

remove Method

    // If value is in the hash table, remove it.
    void HashSet::remove(int value) {
        if (!contains(value))
            return;
        int bucket = hashCode(value);
        HashNode* temp = NULL;
        if (elements[bucket]->data == value) {
            // element to remove is first in its chain
            temp = elements[bucket];
            elements[bucket] = temp->next;
        } else {
            // element to remove isn't first: stop at the node just before it
            HashNode* cur = elements[bucket];
            while (cur->next->data != value)
                cur = cur->next;
            temp = cur->next;
            cur->next = temp->next;
        }
        delete temp;
        size--;
    }
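The exercise lists clear(), and HashSet.h declares it, but the slides never define it. A standalone sketch follows, assuming the same 10-bucket chained design; it repeats compact versions of the other members so it compiles on its own. Two choices here are not from the slides: remove() walks a pointer to the incoming link (so the first-node and later-node cases share one path), and getSize() is a hypothetical accessor added for testing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

struct HashNode {
    int data;
    HashNode* next;
    HashNode(int data = 0, HashNode* next = NULL) : data(data), next(next) {}
};

class HashSet {
public:
    HashSet() : elements(new HashNode*[10]()), capacity(10), size(0) {}
    ~HashSet() { clear(); delete[] elements; }
    HashSet(const HashSet&) = delete;             // copying would double-free the nodes
    HashSet& operator=(const HashSet&) = delete;

    void add(int value) {
        if (contains(value)) return;
        int bucket = hashCode(value);
        elements[bucket] = new HashNode(value, elements[bucket]);  // push onto front of chain
        size++;
    }

    bool contains(int value) const {
        for (HashNode* cur = elements[hashCode(value)]; cur != NULL; cur = cur->next) {
            if (cur->data == value) return true;
        }
        return false;
    }

    void remove(int value) {
        // `link` points at whichever pointer currently refers to the node being checked.
        HashNode** link = &elements[hashCode(value)];
        while (*link != NULL) {
            if ((*link)->data == value) {
                HashNode* doomed = *link;
                *link = doomed->next;  // unlink, then free
                delete doomed;
                size--;
                return;
            }
            link = &(*link)->next;
        }
    }

    // Frees every node in every chain; the bucket array itself is kept for reuse.
    void clear() {
        for (int i = 0; i < capacity; i++) {
            HashNode* cur = elements[i];
            while (cur != NULL) {
                HashNode* next = cur->next;  // save before deleting cur
                delete cur;
                cur = next;
            }
            elements[i] = NULL;
        }
        size = 0;
    }

    int getSize() const { return size; }  // hypothetical accessor, not in the slides' header

private:
    HashNode** elements;
    int capacity;
    int size;

    int hashCode(int value) const {
        int h = std::abs(value) % capacity;
        assert(h >= 0 && h < capacity);  // fails for INT_MIN, whose abs() overflows
        return h;
    }
};
```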

Rehash
rehash: move to a larger array when the table gets too full
    We can't just copy the old array into the larger array: hash codes change when the capacity changes, so every element must be re-inserted.
    load factor = (# of elements) / (# of buckets)
    It's common to rehash when the load factor is around 0.75.
    Rehashing reduces collisions, so the linked lists are shorter.
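A minimal rehashing sketch, assuming a doubling growth policy and the 0.75 threshold mentioned above; `RehashingSet`, `bucketCount`, and the use of `std::vector` buckets instead of hand-written linked lists are illustrative choices, not the slides' code.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical chained set of ints that doubles its bucket count whenever an
// insertion would push the load factor (elements / buckets) above 0.75.
class RehashingSet {
public:
    explicit RehashingSet(int initialBuckets = 4)
        : buckets(initialBuckets), count(0) {
        assert(initialBuckets > 0);
    }

    void add(int value) {
        if (contains(value)) return;
        // Grow first if this insertion would exceed the load-factor limit.
        if (static_cast<double>(count + 1) / bucketCount() > 0.75) {
            rehash(2 * bucketCount());
        }
        buckets[indexFor(value, bucketCount())].push_back(value);
        ++count;
    }

    bool contains(int value) const {
        for (int x : buckets[indexFor(value, bucketCount())]) {
            if (x == value) return true;
        }
        return false;
    }

    int size() const { return count; }
    int bucketCount() const { return static_cast<int>(buckets.size()); }

private:
    std::vector<std::vector<int>> buckets;
    int count;

    static int indexFor(int value, int numBuckets) {
        return std::abs(value) % numBuckets;
    }

    // Re-inserts every element: its index depends on the bucket count,
    // so the old array cannot simply be copied into the new one.
    void rehash(int newBucketCount) {
        std::vector<std::vector<int>> bigger(newBucketCount);
        for (const std::vector<int>& bucket : buckets) {
            for (int x : bucket) {
                bigger[indexFor(x, newBucketCount)].push_back(x);
            }
        }
        buckets.swap(bigger);
    }
};
```

For example, 12 sits in bucket 0 of a 4-bucket table but in bucket 4 of an 8-bucket table, which is why rehash() recomputes every index instead of copying.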
