Published by Alexina Boyd. Modified over 8 years ago.
CSCI 210 Data Structures and Algorithms
Prof. Amr Goneid, AUC
Part 5. Dictionaries (2): Hash Tables
Dictionaries (2): Hash Tables
1. Hash Tables as Dictionaries
2. Hashing Process
3. Collision Handling: Open Addressing
4. Collision Handling: Chaining
5. Properties of Hash Functions
6. Template Class Hash Table
7. Performance
1. Hash Tables as Dictionaries
Simple containers such as tables, stacks and queues permit access to elements by position or by order of insertion. A dictionary is a container that permits access by content.
The Dictionary Data Structure
A dictionary data structure should support the following main operations:
- Insert(D, x): insert item x into dictionary D
- Delete(D, x): delete item x from D
- Search(D, k): search for key k in D

Examples:
- Unsorted arrays and linked lists: permit linear search
- Sorted arrays: permit binary search
- Ordered lists: permit linear search
- Binary Search Trees (BST): fast support of all dictionary operations
- Hash tables: fast retrieval by hashing the key directly to a position

There are 3 types of dictionaries:
- Static dictionaries: built once and never changed. They need to support search, but not insertion or deletion. They are better implemented using arrays or hash tables with linear probing.
- Semi-dynamic dictionaries: support insertion and search queries, but not deletion. They can be implemented as arrays, linked lists or hash tables with linear probing.
- Fully dynamic dictionaries: need fast support of all dictionary operations. Binary Search Trees are best. Hash tables are also a good fit, provided we use chaining as the collision resolution mechanism.

In the revision part R3, we present two dictionary data structures that support all basic operations. Both are linear structures and so employ linear search, i.e., O(n). They are suitable for small to medium-sized data.
- The first uses a run-time array to implement an ordered list and is suitable if we know the maximum data size.
- The second uses a linked list and is suitable if we do not know the size of the data to insert.
Hash Tables as Dictionaries
Dictionaries implemented as linear lists search by matching keys; linear search costs O(n) comparisons. Dictionaries implemented as BSTs also search by matching, but the search cost is O(h), where h is the tree height. For balanced trees, this is O(log n).
Some situations require even faster search. This can be achieved with dictionaries based on hash tables. Hash tables are excellent dictionary data structures, particularly if deletion need not be supported.
Hashing applies a function, called a "hash function", to the search key to determine where the item appears in an array (the hash table) without looking at the other items (direct search). Under ideal circumstances the cost of search is constant, i.e., O(1), independent of the number of keys n.
2. Hashing Process
For a hash table of size n: h = hash(key), h = 0, 1, 2, ..., n-1.
The basic hash function converts the key to an integer and takes the value of this integer mod the size of the hash table.
[Figure: hash(key) maps a key in O(1) to slot h of a table with slots 0 to n-1, each slot holding a key and its data.]
Collision
Two keys may hash to the same position. For example, with a table of size 11 and the two keys 55 and 66: 55 % 11 = 0 and 66 % 11 = 0.
Two distinct keys mapped to the same location are called "synonyms", and the situation is called a "collision".
There are different ways to handle collisions. One of them is "open addressing", whose simplest form is "linear probing".
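The remainder hash and the 55/66 collision can be sketched in a few lines of C++. This is only an illustration for non-negative integer keys; the names `hashIndex` and `areSynonyms` are made up here and are not part of the course code.

```cpp
#include <cassert>
#include <cstddef>

// Remainder hashing: map a non-negative integer key to a slot in [0, tableSize-1].
// (hashIndex is an illustrative name, not taken from the slides.)
std::size_t hashIndex(unsigned long key, std::size_t tableSize) {
    assert(tableSize > 0);  // a table with no slots cannot be hashed into
    return key % tableSize;
}

// Two distinct keys are synonyms when they hash to the same slot.
bool areSynonyms(unsigned long a, unsigned long b, std::size_t tableSize) {
    return a != b && hashIndex(a, tableSize) == hashIndex(b, tableSize);
}
```

With a table of size 11, `hashIndex(55, 11)` and `hashIndex(66, 11)` are both 0, so `areSynonyms(55, 66, 11)` is true.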
3. Collision Handling: Open Addressing
In open addressing, we use a simple rule to decide where to put a new item when the desired slot is already occupied: we put it in the next unoccupied cell.
To search for a given item, we go to its intended location and search sequentially from there. If we find an empty cell before we find the item, the item is not in the table.
Example
Consider inserting the following sequence of keys into a hash table of size n = 11: {55, 35, 66, 76, 59, 48, 84, 70}.
Assume a simple hashing function: h = hash(key) = key % n.
Assume the table is initially empty. We may use -1 as the empty symbol.
Example (continued)
Key   h    Result                                           Slot used
55    0    slot is free                                     0
35    2    slot is free                                     2
66    0    collides with 55, put in next available slot     1
76    10   slot is free                                     10
59    4    slot is free                                     4
48    4    collides with 59, put in next available slot     5
84    7    slot is free                                     7
70    4    collides with 59, put in next available slot     6

Table after these insertions (slots 0 to 10, "-" marks an empty slot):
55 66 35 - 59 48 70 84 - - 76

What happens if we have to probe beyond the end of the table? For example, 54 hashes to 10 and collides with 76. So we do a circular search, h = (h+1) % n, which wraps around to slot 0 and continues to the first empty slot (slot 3):
55 66 35 54 59 48 70 84 - - 76
Insertion Algorithm
    bool insert (key, data)
    {
        if (table is not full)
        {
            h = hash(key);              // Hash key to slot h
            while (slot h not empty)
                h = (h+1) % MaxSize;    // Circular search
            insert key and data at slot h;
            return true;
        }
        else return false;
    }
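A runnable sketch of the insertion algorithm, assuming non-negative integer keys stored in a `std::vector<int>` with -1 as the empty symbol. The names `insertKey` and `buildExample` are illustrative, and the full-table check is done by counting probes rather than by tracking the occupancy.

```cpp
#include <cassert>
#include <vector>

const int EMPTY = -1;  // empty-slot symbol, as suggested in the example

// Insert key with linear probing; returns false when every slot is occupied.
bool insertKey(std::vector<int>& table, int key) {
    assert(!table.empty() && key >= 0);
    const int n = static_cast<int>(table.size());
    int h = key % n;                      // home slot
    int probes = 0;
    while (table[h] != EMPTY) {
        if (++probes == n) return false;  // visited all n slots: table is full
        h = (h + 1) % n;                  // circular search
    }
    table[h] = key;
    return true;
}

// Rebuild the table from the worked example (size 11, key 54 inserted last).
std::vector<int> buildExample() {
    std::vector<int> table(11, EMPTY);
    for (int key : {55, 35, 66, 76, 59, 48, 84, 70, 54})
        insertKey(table, key);
    return table;
}
```

`buildExample()` returns the slots 55 66 35 54 59 48 70 84 -1 -1 76, the same layout the example reaches after inserting 54.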
Search Algorithm
Searching for a key in a hash table that uses open addressing faces 3 situations:
- Slot h is empty: the key does not exist.
- There is a match at slot h: the key is found.
- Another key occupies slot h: we do a circular search until one of the above situations occurs, or until we return to the starting point, in which case the key does not exist.

    bool search (key)
    {
        if (table is not empty)
        {
            h = hash(key); start = h;           // Hash key to slot h
            for ( ; ; )
            {
                if (slot h is empty) return false;
                if (there is a match at h) return true;
                h = (h+1) % MaxSize;            // Circular search
                if (h == start) return false;   // Back at the starting point
            }
        }
        else return false;
    }
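The three search situations can be sketched as follows, under the same assumptions as before (non-negative integer keys, -1 as the empty symbol, no deletions). The name `findSlot` is illustrative; it returns the slot index instead of a bool so the caller can see where the key was found.

```cpp
#include <cassert>
#include <vector>

const int EMPTY_SLOT = -1;  // empty-slot symbol (assumed, as in the example)

// Linear-probing search. Returns the slot holding key, or -1 if it is absent.
int findSlot(const std::vector<int>& table, int key) {
    assert(key >= 0);
    if (table.empty()) return -1;
    const int n = static_cast<int>(table.size());
    const int start = key % n;                  // home slot
    int h = start;
    do {
        if (table[h] == EMPTY_SLOT) return -1;  // hole reached: key is not in the table
        if (table[h] == key) return h;          // match
        h = (h + 1) % n;                        // circular search
    } while (h != start);
    return -1;                                  // wrapped around a full table
}
```

On the final table of the worked example, `findSlot` locates 54 in slot 3 after wrapping around from slot 10, and reports 77 as absent once it reaches the empty slot 8.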
4. Collision Handling: Chaining
Chaining is a collision resolution mechanism in which a smaller table is used and each location is associated with a linked list.
Synonyms of the key in a slot are stored in the linked list associated with that slot.
Searching is done by hashing the key to a main slot; if the key is not found there, a linear search is conducted in the associated linked list.
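Example
[Figure: chained hash table built with h = key % 11. Key labels recovered from the figure: 60, 66, 59, 47, 35, 89, 55, 36; the labels 4433, 4567 and 387127 are chain entries fused together in the extraction.]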
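A minimal chaining sketch, assuming non-negative integer keys. It simplifies the scheme described above: instead of a main slot plus an overflow list, every slot is just a `std::list` that holds all keys hashing to it. The class name `ChainedTable` and its members are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Separate chaining: each slot owns a linked list of the keys that hash to it.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t slots) : chains(slots) { assert(slots > 0); }

    void insert(int key) { chains[slotOf(key)].push_back(key); }

    // Linear search inside the one chain the key can be in.
    bool contains(int key) const {
        for (int k : chains[slotOf(key)])
            if (k == key) return true;
        return false;
    }

    // Deletion is simple with chaining: unlink the node from its chain.
    bool remove(int key) {
        std::list<int>& chain = chains[slotOf(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it)
            if (*it == key) { chain.erase(it); return true; }
        return false;
    }

    std::size_t chainLength(std::size_t slot) const { return chains[slot].size(); }

private:
    std::size_t slotOf(int key) const {
        assert(key >= 0);
        return static_cast<std::size_t>(key) % chains.size();
    }
    std::vector<std::list<int>> chains;
};
```

Because removing a key only touches its own chain, this layout supports deletion without disturbing searches for other keys.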
5. Properties of Hash Functions
A hash function is usually specified in two steps:
- Hash code map: h1(key) -> an integer K
- Compression map: h2(K) -> [0, N-1]
i.e., h(key) = h2(h1(key))

A hash function should:
- be simple and fast
- scatter h over the range 0 to MaxSize-1
- not cluster keys in regions of the table. Choosing MaxSize to be a prime number reduces clustering.
The key to efficiency is using a large enough table that contains many holes.
Properties of Hash Functions
There are many hash functions with varying performance. For numeric keys, random hashing is very good. If x is the key, a large integer is obtained as:
K = (α x + β) % m, with α = 25173, β = 13849, m = 65536
The hashed value is then computed as: h = K % MaxSize
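The random hashing formula translates directly into code. The sketch below assumes unsigned 32-bit keys and uses 64-bit intermediate arithmetic so the product α x cannot overflow; the function name `randomHash` is illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Random hashing: K = (alpha * x + beta) % m, then h = K % maxSize,
// with the constants quoted on the slide.
std::uint32_t randomHash(std::uint32_t x, std::uint32_t maxSize) {
    assert(maxSize > 0);
    const std::uint64_t alpha = 25173, beta = 13849, m = 65536;
    const std::uint64_t K = (alpha * x + beta) % m;  // 64-bit math avoids overflow
    return static_cast<std::uint32_t>(K % maxSize);
}
```

For MaxSize = 11 the consecutive keys 0, 1, 2, 3 land in slots 0, 5, 10, 6, so neighbouring keys are scattered rather than placed side by side.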
Properties of Hash Functions
For a string key S consisting of the characters {S_0 S_1 ... S_{L-1}} we may use one of the following:
[Formulas not preserved in this transcript.]

Other Hash Functions
Hash code maps:
- Memory addresses as integers (K)
- Partition the bits of the key into components of fixed length (e.g. 8 or 16 bits) and sum the components
Hash compression maps:
- Divide: h2(K) = K mod N
- Multiply, add and divide (MAD): h2(K) = (aK + b) mod N, with a mod N ≠ 0
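As a sketch of how a hash code map and a compression map compose, the code below sums the 8-bit components (characters) of a string key and then applies MAD compression. The function names and the sample constants a = 3, b = 7 are assumptions for illustration; N is assumed small enough that the intermediate product fits in 64 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hash code map: treat each character as an 8-bit component and sum the components.
std::uint64_t componentSum(const std::string& key) {
    std::uint64_t sum = 0;
    for (unsigned char c : key) sum += c;
    return sum;
}

// Compression map (MAD): h2(K) = (a*K + b) mod N, with a mod N != 0.
std::uint64_t madCompress(std::uint64_t K, std::uint64_t a, std::uint64_t b,
                          std::uint64_t N) {
    assert(N > 0 && a % N != 0);
    // Reduce each operand first so the intermediate values stay small.
    return ((a % N) * (K % N) + (b % N)) % N;
}
```

For example, `componentSum("ab")` is 97 + 98 = 195, and `madCompress(195, 3, 7, 11)` gives 9. A plain component sum ignores character order, so "ab" and "ba" collide; that is one reason order-sensitive hash codes are often preferred for strings.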
6. Template Class HashTable
As an example, we consider a template class hashTable that supports most dictionary functions, but not deletion.
- The table is implemented as a dynamic array.
- We use a simple remainder hashing function.
- Linear probing is used for collision handling.
Template Class HashTable
    // File: hashTable.h
    // Definition of Hash Table Template Class
    #ifndef HASH_TABLE_H
    #define HASH_TABLE_H

    template <class keyType, class dataType>
    class hashTable
    {
      public:
        hashTable(int nelements = 11);      // Constructor
        ~hashTable();                       // Destructor

        // Member Functions
        // Initialize table to all empty slots using an Empty symbol
        void emptyTable(const keyType & );
        // Check if table is empty
        bool tableIsEmpty() const;
        // Check if table is full
        bool tableIsFull() const;
        // Return number of occupied slots
        int occupancy() const;
        // Insert key and data in a slot. Return false if table is full
        bool insert(const keyType &, const dataType & );
        // Search for a key. If found, set location (h) to slot
        bool search(const keyType & );
        // Update the data part of the current slot
        void updateData(const dataType & );
        // Retrieve the data part of the current slot
        void retrieveData(dataType &) const;
        // Traverse whole table
        void traverse() const;

      private:
        // Slot Class
        class slot
        {
          public:
            keyType key;        // Key
            dataType data;      // Data
        }; // end of class slot declaration

        slot *T;                // Pointer to Storage Array
        int h;                  // Index to a slot
        int MaxSize;            // Maximum table size
        int csize;              // No. of occupied slots
        keyType Empty;          // Empty symbol

        // Private Member function
        int hash(const keyType & ) const;   // Hashing Function
    };
    #endif // HASH_TABLE_H
    #include "hashTable.cpp"
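The slides give only the class declaration. One possible way the member functions could be written is sketched below as a condensed, self-contained class. It is not the course's hashTable.cpp: the name `miniHashTable` is made up, storage uses `std::vector` instead of a raw pointer, the empty symbol is passed to the constructor, and keys are assumed to be non-negative integers so that `k % MaxSize` is a valid remainder hash. The sketch also assumes `nelements > 0`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Condensed sketch of the hashTable idea: linear probing, no deletion.
template <class keyType, class dataType>
class miniHashTable {
public:
    miniHashTable(int nelements, const keyType& emptySymbol)
        : T(nelements), h(-1), MaxSize(nelements), csize(0), Empty(emptySymbol) {
        for (slot& s : T) s.key = Empty;   // mark every slot as empty
    }
    bool tableIsEmpty() const { return csize == 0; }
    bool tableIsFull() const { return csize == MaxSize; }
    int occupancy() const { return csize; }

    // Insert key and data; returns false if the table is full.
    bool insert(const keyType& k, const dataType& d) {
        if (tableIsFull()) return false;
        h = hash(k);
        while (!(T[h].key == Empty)) h = (h + 1) % MaxSize;  // circular search
        T[h].key = k;
        T[h].data = d;
        ++csize;
        return true;
    }
    // Search for a key; on success h is left at the matching slot.
    bool search(const keyType& k) {
        if (tableIsEmpty()) return false;
        const int start = hash(k);
        h = start;
        do {
            if (T[h].key == Empty) return false;
            if (T[h].key == k) return true;
            h = (h + 1) % MaxSize;
        } while (h != start);
        return false;
    }
    // Both calls act on the current slot h, so call them after a successful search.
    void retrieveData(dataType& d) const { d = T[h].data; }
    void updateData(const dataType& d) { T[h].data = d; }

private:
    struct slot { keyType key; dataType data; };
    std::vector<slot> T;   // storage array
    int h;                 // index of the current slot
    int MaxSize;           // table size
    int csize;             // number of occupied slots
    keyType Empty;         // empty symbol
    int hash(const keyType& k) const {
        assert(!(k < 0));  // remainder hashing assumes non-negative keys
        return static_cast<int>(k % MaxSize);
    }
};
```

A typical sequence is `miniHashTable<int, std::string> t(11, -1); t.insert(66, "b");` followed by `if (t.search(66)) t.retrieveData(d);`, which mirrors the search-then-retrieve protocol of the declared interface.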
7. Performance: Linear Probing
Although searching in a hash table is supposed to be of complexity O(1), collisions increase the search cost.
Consider a hash table of size m. Let P(n,m) be the probability that no collision happens while n keys are inserted into an initially empty table, i.e. that each key, up to the n-th, finds its home slot free. Then P(1,m) = m/m = 1, P(2,m) = (m/m)((m-1)/m), etc.
Generally:
$P(n,m) = \prod_{k=0}^{n-1} \frac{m-k}{m} = \frac{m!}{(m-n)!\, m^{n}}$
Performance: Linear Probing (Fall 2007)
When m = 100, this probability is about 50% when n = 12 and is almost 0 when n = 30.
[Figure: plot of P(n,m) against n for m = 100.]
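The quoted values can be checked numerically. The sketch below continues the pattern P(1,m) = m/m, P(2,m) = (m/m)((m-1)/m) as a running product of (m-k)/m for k = 0 to n-1, assuming every key is equally likely to hash to any slot; the function name is illustrative.

```cpp
#include <cassert>

// Probability that n keys hash into m slots with no collision at all,
// assuming each key is equally likely to land in any slot.
double noCollisionProbability(int n, int m) {
    assert(m > 0 && n >= 0);
    double p = 1.0;
    for (int k = 0; k < n; ++k) {
        if (k >= m) return 0.0;               // more keys than slots: collision is certain
        p *= static_cast<double>(m - k) / m;  // key k+1 must miss the k occupied slots
    }
    return p;
}
```

For m = 100 the function returns roughly 0.50 at n = 12 and less than 0.01 at n = 30, in line with the figures quoted above.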
Performance: Linear Probing
An important factor is the load factor α = number of keys / MaxSize, i.e. the occupancy.
Let S(α) be the average cost of a successful search for a key, and U(α) that of an unsuccessful search. The problem of deriving these costs was solved by Donald Knuth in 1962.
The solution is
$S(\alpha) \approx \frac{1}{2}(1 + x)$ for successful search
$U(\alpha) \approx \frac{1}{2}(1 + x^{2})$ for unsuccessful search
where $x = 1/(1-\alpha)$ and α is the load factor. The following table shows how the costs are affected by the load factor:

α      66%    75%    90%
S(α)   2      2.5    5.5
U(α)   5      8.5    50.5
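The table entries follow from the two formulas, with x = 1/(1 - α) and the 66% column read as α = 2/3. A small sketch to reproduce them (function names are illustrative):

```cpp
#include <cassert>

// Knuth's approximations for linear probing at load factor alpha (0 <= alpha < 1).
double successfulCost(double alpha) {
    assert(alpha >= 0.0 && alpha < 1.0);
    const double x = 1.0 / (1.0 - alpha);
    return 0.5 * (1.0 + x);       // S(alpha)
}

double unsuccessfulCost(double alpha) {
    assert(alpha >= 0.0 && alpha < 1.0);
    const double x = 1.0 / (1.0 - alpha);
    return 0.5 * (1.0 + x * x);   // U(alpha)
}
```

At α = 0.75, x = 4, so S = 2.5 and U = 8.5; at α = 0.9, x = 10, so S = 5.5 and U = 50.5. The steep growth of U shows why a linear-probing table should not be allowed to get close to full.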
Performance: Double Hashing
In case of collision, a second hashing function is used to hash the key to another position.
Average case analysis (Knuth), with α the load factor:
Successful search: $S(\alpha) = -\ln(1-\alpha)/\alpha$
Unsuccessful search: $U(\alpha) = 1/(1-\alpha)$
Example:

α      0.9     2/3
S(α)   2.55    1.6
U(α)   10      3
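The slide does not say which second hash function is used. A common choice, assumed here purely for illustration, is a key-dependent step size h2(key) = 1 + key % (n - 1) added repeatedly to the home slot h1(key) = key % n. With a prime table size n every step value is coprime to n, so the probe sequence visits all slots.

```cpp
#include <cassert>
#include <vector>

const int FREE = -1;  // empty-slot symbol (assumed)

// Double hashing insert. Returns the slot used, or -1 if no free slot
// was found within n probes. The step formula is an assumed, common choice.
int insertDoubleHash(std::vector<int>& table, int key) {
    const int n = static_cast<int>(table.size());
    assert(n > 1 && key >= 0);
    int h = key % n;                     // first hash: home slot
    const int step = 1 + key % (n - 1);  // second hash: probe step in [1, n-1]
    for (int probes = 0; probes < n; ++probes) {
        if (table[h] == FREE) { table[h] = key; return h; }
        h = (h + step) % n;              // jump by the key-specific step
    }
    return -1;
}
```

In an empty table of size 11, the synonyms 55, 66 and 77 all have home slot 0 but end up in slots 0, 7 and 8, because their steps differ (7 for 66, 8 for 77). Linear probing would have packed them into slots 0, 1 and 2.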
Performance: Chaining
n = total number of keys, Q = number of main slots.
For n >> Q, the average chain length is L = n/Q.
- Best case: T(n) = 1
- Worst case: T(n) = L + 1 = n/Q + 1
- Average case: T(n) = L/2 + 1 = n/(2Q) + 1
[Figure: Q main slots, each followed by a chain of average length L.]
Learn on your own about:
- Hashing functions
- Buckets and chaining
- Double hashing