CS2420: Lecture 33 Vladimir Kulyukin Computer Science Department Utah State University.

1 CS2420: Lecture 33 Vladimir Kulyukin Computer Science Department Utah State University

2 Outline Hash Tables (Chapter 5)

3 Motivation Recall Big Question 4: –How can I retrieve/search data efficiently? After investigating the balanced binary search trees (AVL, Red-Black), we can ask: –Is it possible to break the log(n) barrier for insertion and deletion?

4 Hash Tables A hash table is a data structure that was invented as an attempt to break the log(N) insertion and deletion barrier of the balanced binary search trees. Conceptually, a hash table is an array of items plus a hash function that maps arbitrary objects to indices of the array. A hash function first extracts a key from a given object and then maps the key into a legal array index. For example, if an object is an employee record, the key could be the employee’s SSN or the employee’s first and last names. Typical keys are numbers and strings.

5 Example: A Hash Table [Figure: an array with cells indexed 0 to 9 that stores the keys “Mark”, “Rachel”, “David”, “Deborah”, and “John”.]

6 Hash Functions [Figure: Object → Key Extraction → Hashing → legal index into an array with cells 0 to 6.]

7 Hash Functions It is impossible to find a hash function that computes distinct indices (two different array cells) for every pair of distinct keys. Why? Because there are infinitely many possible keys, but only finitely many cells in the table. Question: What are we to do? Answer: Look for hash functions that distribute keys evenly among the cells.

8 Three Hashing Problems Choose a hash function: –Simple and fast; –Distributes keys evenly. Choose a table size. Choose a collision resolution strategy (what to do when several keys are mapped to the same index).

9 Choosing a Hash Function If keys are integers, Key Mod TableSize is a sensible strategy. Caveat: Keys should be random and should not share a pattern that interacts badly with TableSize. For example, if TableSize = 10 and all keys end in 0, Key Mod TableSize is not a sensible strategy, because every key maps to cell 0.

10 Choosing a Table Size To avoid uneven key distributions, TableSize is typically chosen to be a prime number. When keys are random integers, Key Mod TableSize works fairly well.
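The effect of the table size can be checked with a short sketch. The helper names distinctCells and keysEndingInZero are assumptions made for this illustration and do not come from the lecture; the keys are assumed to be non-negative.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper (not from the lecture): counts how many distinct
// cells a set of non-negative integer keys occupies under Key Mod TableSize.
std::size_t distinctCells(const std::vector<int>& keys, int tableSize)
{
    std::set<int> cells;
    for (int key : keys) {
        cells.insert(key % tableSize);  // the hash function from the slides
    }
    return cells.size();
}

// Hypothetical helper: the keys 10, 20, ..., 10 * count (all end in 0).
std::vector<int> keysEndingInZero(int count)
{
    std::vector<int> keys;
    for (int i = 1; i <= count; ++i) {
        keys.push_back(10 * i);
    }
    return keys;
}
```

With TableSize = 10, the ten keys 10, 20, ..., 100 all land in a single cell. With the prime TableSize = 11, the same keys occupy ten different cells.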

11 A Hash Function: Example 1

12
int hash(const string& key, int tableSize)
{
    int hashVal = 0;
    // Sum the character codes of the key.
    for (size_t i = 0; i < key.length(); i++) {
        hashVal += key[i];
    }
    return hashVal % tableSize;
}

13 Comments on hash1 (Example 1) Easy to compute and fast. If the TableSize is large, the function may not distribute keys well. Why? Suppose TableSize = 10,007 (a prime) and all keys are ASCII strings of length 8 or smaller. An ASCII character code is at most 127, so hash1’s range is [0, 127*8 = 1016]. Roughly nine tenths of the cells can never be used. This is NOT an acceptable distribution.
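The bound can be verified with a small sketch. The name rawSum is an assumption for this illustration; it uses the same additive scheme but returns the sum before the modulus is taken, so that the range is visible.

```cpp
#include <string>

// Hypothetical helper (not from the lecture): the additive scheme of
// Example 1 without the final modulus.
int rawSum(const std::string& key)
{
    int sum = 0;
    for (char c : key) {
        sum += static_cast<unsigned char>(c);  // character code, never negative
    }
    return sum;
}
```

For a key of eight characters that all have code 127, rawSum returns 1016, which is already smaller than 10,007, so the modulus leaves it unchanged. Because addition ignores order, keys that are rearrangements of the same characters, such as "ab" and "ba", also receive the same value.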

14 Hash Function: Example 2

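15
int hash2(const string &key, int tableSize)
{
    int hashVal = 0;
    // Horner's rule: treat the key as a polynomial evaluated at 37.
    for (size_t j = 0; j < key.length(); j++) {
        hashVal = 37 * hashVal + key[j];
    }
    hashVal %= tableSize;
    // The code expects long keys to overflow hashVal, which can leave it
    // negative; shift the result back into [0, tableSize).
    if (hashVal < 0) {
        hashVal += tableSize;
    }
    return hashVal;
}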

16 Comments On Hash2 Easy to compute. Fast on relatively short keys. Distributes keys fairly well. Potential problems with very long keys, because there will be lots of integer overflows and collisions.
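One way to sidestep the overflow concern is to do the arithmetic in an unsigned type. The following is a sketch of that variant, not the lecture's code; the name hash2Unsigned is an assumption, and tableSize is assumed to be greater than zero.

```cpp
#include <string>

// Hypothetical variant of hash2 (not the lecture's code). Unsigned
// arithmetic wraps around by definition, so the result is never negative
// and no correction step is needed after the modulus.
unsigned int hash2Unsigned(const std::string& key, unsigned int tableSize)
{
    unsigned int hashVal = 0;
    for (char c : key) {
        hashVal = 37 * hashVal + static_cast<unsigned char>(c);
    }
    return hashVal % tableSize;
}
```

Unlike the additive function, this scheme is sensitive to character order: with tableSize = 10,007, "ab" maps to 3687 and "ba" maps to 3723.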

17 Collision Resolution A collision occurs when an element is inserted under a key that hashes to a cell that is already occupied by a different element.

18 Collision Resolution Strategies Separate chaining Open addressing

19 Separate Chaining Separate chaining keeps a list of all elements whose keys hash to the same index. What does it mean? Under separate chaining, a hash table is an array of lists. The term “lists” is used rather loosely in the previous statement. It can be an array of AVL search trees or an array of hash tables. But the linked list remains the most common choice.
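The idea of an array of lists can be sketched for string keys as follows. This is a minimal illustration under stated assumptions, not the lecture's CHashTable: the class name ChainedStringSet and its member functions are hypothetical, the table size is assumed to be greater than zero, and the table is never resized.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical sketch of separate chaining: a vector of linked lists.
class ChainedStringSet
{
public:
    explicit ChainedStringSet(std::size_t tableSize) : m_Lists(tableSize) {}

    // Inserts key unless it is already present; returns true if inserted.
    bool insert(const std::string& key)
    {
        std::list<std::string>& chain = m_Lists[indexOf(key)];
        if (std::find(chain.begin(), chain.end(), key) != chain.end()) {
            return false;
        }
        chain.push_back(key);  // colliding keys simply share the chain
        return true;
    }

    // Searches only the chain that the key hashes to.
    bool contains(const std::string& key) const
    {
        const std::list<std::string>& chain = m_Lists[indexOf(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Number of elements stored in the chain at the given index.
    std::size_t chainLength(std::size_t index) const
    {
        return m_Lists[index].size();
    }

private:
    // Multiply-by-37 scheme in unsigned arithmetic, reduced to a legal index.
    std::size_t indexOf(const std::string& key) const
    {
        std::size_t h = 0;
        for (char c : key) {
            h = 37 * h + static_cast<unsigned char>(c);
        }
        return h % m_Lists.size();
    }

    std::vector<std::list<std::string> > m_Lists;
};
```

With a table of size 1 every key collides, so inserting “Mark” and “John” yields one chain of length 2, and both keys are still found by contains.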

20 Hash Table: Implementation
template <class T>
class CHashTable
{
    …
private:
    vector<list<T> > m_Lists;
    int m_Size;
    …
};

int hash(const string &key) { … }

21 Hash Table: Implementation
class CEmployee
{
private:
    string m_Name;
    double m_Salary;
    …
};

// Key extraction: an employee is hashed by name.
int hash(const CEmployee &x)
{
    return hash(x.GetName());
}

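22 Hash Table: Implementation
template <class T>
int CHashTable<T>::hashIndex(const T& x) const
{
    // Convert the size to int first: mixing int with the unsigned size()
    // would turn a negative hash value into a large positive one.
    const int tableSize = static_cast<int>(m_Lists.size());
    int index = hash(x) % tableSize;
    if ( index < 0 )
        index += tableSize;
    return index;
}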
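The normalization step in hashIndex can be isolated in a free function to see why the negative check matters. The name toIndex is an assumption for this sketch, and tableSize is assumed to be greater than zero.

```cpp
// Hypothetical free-function version of the normalization in hashIndex.
int toIndex(int hashValue, int tableSize)
{
    // In C++ the sign of % follows the dividend, so this may be negative.
    int index = hashValue % tableSize;
    if (index < 0) {
        index += tableSize;  // shift into [0, tableSize)
    }
    return index;
}
```

For example, -7 % 5 evaluates to -2 in C++11 and later, so toIndex(-7, 5) returns 3, which is a legal index.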

