
1 CSCI 210 Data Structures and Algorithms
Prof. Amr Goneid, AUC
Part 5. Dictionaries (2): Hash Tables

2 Dictionaries (2): Hash Tables
Hash Tables as Dictionaries
Hashing Process
Collision Handling: Open Addressing
Collision Handling: Chaining
Properties of Hash Functions
Template Class Hash Table
Performance

3 1. Hash Tables as Dictionaries
Simple containers such as tables, stacks and queues permit access to elements by position or order of insertion. A dictionary is a form of container that permits access by content.

4 The Dictionary Data Structure
A dictionary DS should support the following main operations (see the interface sketch below):
Insert (D, x): insert item x in dictionary D
Delete (D, x): delete item x from D
Search (D, k): search for key k in D
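
As a minimal illustration only (the names Dictionary, insert, remove and search are assumptions, not the course's actual class), such an interface could be declared in C++ as:

template <class KeyType, class DataType>
class Dictionary
{
public:
    // Insert (D, x): insert item x (a key with its data) in dictionary D
    virtual bool insert(const KeyType& key, const DataType& data) = 0;
    // Delete (D, x): delete the item with the given key from D
    virtual bool remove(const KeyType& key) = 0;
    // Search (D, k): search for key k in D
    virtual bool search(const KeyType& key) const = 0;
    virtual ~Dictionary() {}
};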

5 The Dictionary Data Structure
Examples:
Unsorted arrays and linked lists: permit linear search
Sorted arrays: permit binary search
Ordered lists: permit linear search
Binary search trees (BST): fast support of all dictionary operations
Hash tables: fast retrieval by hashing the key directly to a position

6 The Dictionary Data Structure
There are three types of dictionaries:
Static dictionaries: built once and never changed, so they need to support search but not insertion or deletion. These are best implemented using arrays or hash tables with linear probing.
Semi-dynamic dictionaries: support insertion and search queries, but not deletion. These can be implemented as arrays, linked lists or hash tables with linear probing.

7 The Dictionary Data Structure
Fully dynamic dictionaries: need fast support of all dictionary operations. Binary search trees are best. Hash tables also work well for fully dynamic dictionaries, provided chaining is used as the collision resolution mechanism.

8 The Dictionary Data Structure
In revision part R3, we present two dictionary data structures that support all the basic operations. Both are linear structures and so employ linear search, i.e. O(n); they are suitable for small to medium-sized data.
The first uses a run-time array to implement an ordered list and is suitable when we know the maximum data size.
The second uses a linked list and is suitable when we do not know the size of the data to be inserted.

9 Hash Tables as Dictionaries
Dictionaries implemented as linear lists perform searching through matching; linear search costs O(n) comparisons. Dictionaries implemented as BSTs also search by matching, but the search cost is O(h), where h is the tree height; for balanced trees this is O(log n). Some situations require even faster search. This can be achieved by using dictionaries based on hash tables, which are excellent dictionary data structures, particularly if deletion need not be supported.

10 Hash Tables as Dictionaries
Hashing applies a function to the search key so that we can determine where the item will appear in an array (the hash table) without looking at the other items (direct search). We also do not care about the sorted order of the keys. The function used is called a "hash function". Under ideal circumstances the cost of search is constant, independent of the number (n) of keys, i.e. it is O(1).

11 2. Hashing Process
For a hash table of size (n):
h = hash(key), h = 0, 1, 2, ..., n-1
The basic hash function converts the key to an integer and takes the value of this integer mod the size of the hash table.
(Slide diagram: a key is mapped by hash(key) to a slot h in a table of n slots indexed 0 to n-1, giving O(1) direct access.)
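
A minimal sketch of this basic remainder hash in C++ (MaxSize is an assumed table-size constant, and the key is assumed to be a nonnegative integer):

const int MaxSize = 11;                         // assumed table size

int hashKey(long key)
{
    // The key is already an integer; take it mod the table size.
    return static_cast<int>(key % MaxSize);     // result is in 0 .. MaxSize-1, O(1)
}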

12 Collision
It could happen that two keys hash to the same position, e.g. for a table of size 11 and two keys 55 and 66: 55 % 11 = 0 and 66 % 11 = 0. Two distinct keys mapped to the same location are called "synonyms" and the situation is called a "collision". There are different ways to handle collisions; one of them is open addressing with linear probing.

13 3. Collision Handling: Open Addressing

14 Collision Handling: Open Addressing / Linear Probing
In open addressing we use a simple rule to decide where to put a new item when its intended slot h is already occupied. A popular probe sequence is linear probing: we always put the item in the next unoccupied cell. If slot h is occupied, the next slot to probe is h = (h+1) mod MaxSize. When searching for a given item, we go to its intended location and search sequentially; if we find an empty cell before we find the item, it does not exist anywhere in the table.

15 Example
Consider inserting the following sequence of keys into a hash table of size n = 11: {55, 35, 66, 76, 59, 48, 84, 70}. Assume a simple hashing function h = hash(key) = key % n, and assume the table is initially empty, using -1 as the empty marker: [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1]

16 Example
55 → 0 and 35 → 2. Table: [55, -1, 35, -1, -1, -1, -1, -1, -1, -1, -1]

17 Example
66 → 0 collides with 55. Table: [55, -1, 35, -1, -1, -1, -1, -1, -1, -1, -1]

18 Example
66 → 0, so it is put in the next available slot (slot 1). Table: [55, 66, 35, -1, -1, -1, -1, -1, -1, -1, -1]

19 Example
76 → 10 and 59 → 4. Table: [55, 66, 35, -1, 59, -1, -1, -1, -1, -1, 76]

20 Example
48 → 4 collides with 59. Table: [55, 66, 35, -1, 59, -1, -1, -1, -1, -1, 76]

21 Example
48 → 4, so it is put in the next available slot (slot 5). Table: [55, 66, 35, -1, 59, 48, -1, -1, -1, -1, 76]

22 Example
84 → 7. Table: [55, 66, 35, -1, 59, 48, -1, 84, -1, -1, 76]

23 Example
70 → 4 collides with 59. Table: [55, 66, 35, -1, 59, 48, -1, 84, -1, -1, 76]

24 Example
70 → 4, so it is put in the next available slot (slot 6). Table: [55, 66, 35, -1, 59, 48, 70, 84, -1, -1, 76]

25 Example
What happens if we have to probe beyond the end of the table? For example, 54 → 10 collides with 76. Table: [55, 66, 35, -1, 59, 48, 70, 84, -1, -1, 76]

26 Example
So we do a circular search, h = (h+1) % n, and 54 → 10 ends up in slot 3. Table: [55, 66, 35, 54, 59, 48, 70, 84, -1, -1, 76]
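
The whole sequence of insertions can be reproduced with a short, self-contained C++ sketch (an illustration assuming the same size-11 table and -1 empty marker; not the course's code):

#include <iostream>

int main()
{
    const int n = 11;
    int table[n];
    for (int i = 0; i < n; i++) table[i] = -1;       // initially empty table

    const int keys[] = {55, 35, 66, 76, 59, 48, 84, 70, 54};
    for (int key : keys)
    {
        int h = key % n;                             // hash to the intended slot
        while (table[h] != -1) h = (h + 1) % n;      // linear probing with wrap-around
        table[h] = key;
    }

    for (int i = 0; i < n; i++) std::cout << table[i] << ' ';
    std::cout << '\n';                               // prints: 55 66 35 54 59 48 70 84 -1 -1 76
    return 0;
}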

27 Demo: Linear Probing

28 Insertion Algorithm
bool insert (key, data)
{
  if (table is not full)
  {
    h = hash(key);                 // Hash key to slot h
    while (slot h is not empty)
      h = (h + 1) % MaxSize;       // Circular advance (linear probing)
    insert key and data at slot h;
    return true;
  }
  else return false;
}
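
A possible C++ rendering of this routine (a sketch with assumed globals keyTable, dataTable, emptyKey and occupied; the course's actual class members may differ):

const int  MaxSize = 11;               // assumed table size
const long emptyKey = -1;              // assumed marker for an empty slot
long keyTable[MaxSize];                // assumed to be pre-filled with emptyKey
int  dataTable[MaxSize];
int  occupied = 0;                     // number of occupied slots

bool insert(long key, int data)
{
    if (occupied == MaxSize) return false;        // table is full
    int h = static_cast<int>(key % MaxSize);      // hash key to slot h
    while (keyTable[h] != emptyKey)
        h = (h + 1) % MaxSize;                    // circular advance (linear probing)
    keyTable[h]  = key;                           // insert key and data at slot h
    dataTable[h] = data;
    occupied++;
    return true;
}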

29 Search Algorithm
Searching for a key in a hash table using open addressing faces three situations:
Slot h is empty: the key does not exist.
There is a match at slot h: the key is found.
Another key occupies slot h: we do a circular search until one of the above situations occurs, or we return to the starting point, in which case the key does not exist.

30 Search Algorithm
bool search (key)
{
  if (table is not empty)
  {
    h = hash(key);                 // Hash key to slot h
    start = h;                     // Starting slot
    while (true)
    {
      if (slot h is empty) return false;
      if (there is a match at h) return true;
      h = (h + 1) % MaxSize;       // Circular advance
      if (h == start) return false;
    }
  }
  else return false;
}
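
The matching C++ sketch of the search (same assumed declarations as in the insertion sketch; note that if the table is empty, every slot holds emptyKey, so the first test already returns false):

const int  MaxSize = 11;               // assumed table size
const long emptyKey = -1;              // assumed marker for an empty slot
long keyTable[MaxSize];                // assumed to be filled by the insertion routine

bool search(long key)
{
    int h = static_cast<int>(key % MaxSize);         // hash key to slot h
    int start = h;                                    // remember the starting slot
    while (true)
    {
        if (keyTable[h] == emptyKey) return false;    // empty slot: key does not exist
        if (keyTable[h] == key)      return true;     // match at slot h
        h = (h + 1) % MaxSize;                        // circular advance
        if (h == start) return false;                 // back at the start: key does not exist
    }
}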

31 4. Collision Handling: Chaining
Chaining is another collision resolution mechanism. A smaller table is used in which each location (slot) is associated with a linked list. Synonyms of a key in a slot are stored in the linked list associated with that slot. Searching is done by hashing the key to a main slot and, if the key is not found there, conducting a linear search of the associated linked list.
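
A minimal sketch of chaining using the standard library (assumed names; the course's template class would look different):

#include <list>
#include <vector>

const int Q = 11;                                   // assumed number of main slots
std::vector< std::list<long> > table(Q);            // one linked list (chain) per slot

void insertChained(long key)
{
    table[key % Q].push_back(key);                  // hash to a main slot, append to its chain
}

bool searchChained(long key)
{
    for (long k : table[key % Q])                   // linear search of that slot's chain
        if (k == key) return true;
    return false;
}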

32 Example
h = key % 11. The keys {55, 66, 44, 33, 89, 45, 67, 35, 47, 36, 59, 60, 38, 71, 27} are chained as:
slot 0: 55 → 66 → 44 → 33
slot 1: 89 → 45 → 67
slot 2: 35
slot 3: 47 → 36
slot 4: 59
slot 5: 60 → 38 → 71 → 27

33 5. Properties of Hash Functions
A hash function is usually specified in two steps:
Hash code map: h1(key) → an integer K
Compression map: h2(K) → [0, N-1]
i.e. h(key) = h2(h1(key))

34 Properties of Hash Functions
A hash function should be simple, fast and single-valued. It should scatter h over the range 0 to MaxSize-1, i.e. it should provide a uniform distribution of hash values, and it should not cluster keys in regions of the table. Using a prime number for MaxSize reduces clustering. The key to efficiency is using a table large enough to contain many holes (empty slots).

35 Properties of Hash Functions
There are many hash functions with varying performance. For numeric keys, random hashing is very good: if x is the key, a large integer is obtained as
K = (α x + β) % m, with α = β = m = 65536
The hashed value is then computed as h = K % MaxSize.
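
A sketch of this computation (the values of α and β below are assumed for illustration only; the slide fixes m = 65536, and MaxSize is an assumed table size):

const long m = 65536;                         // modulus from the slide
const long alphaC = 25173, betaC = 13849;     // assumed illustrative constants
const int  MaxSize = 101;                     // assumed table size (a prime)

int randomHash(long x)
{
    long K = (alphaC * x + betaC) % m;        // large intermediate integer K
    return static_cast<int>(K % MaxSize);     // h = K % MaxSize
}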

36 Properties of Hash Functions
For a string key S consisting of characters {S0, S1, ..., SL-1}, we may use one of several functions that combine the characters into an integer hash code.
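
One common choice, shown here only as an assumed example and not necessarily the formula on the original slide, combines the characters with a polynomial (Horner's rule) hash and then compresses the result:

#include <string>

const int MaxSize = 101;                        // assumed table size (a prime)

int hashString(const std::string& S)
{
    const unsigned long b = 31;                 // assumed base of the polynomial
    unsigned long K = 0;
    for (char c : S)
        K = K * b + static_cast<unsigned char>(c);   // K = S0*b^(L-1) + ... + S(L-1) (Horner)
    return static_cast<int>(K % MaxSize);       // compress to a table slot
}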

37 Other Hash Functions
Hash code maps:
Memory addresses as integers (K)
Partition the bits of the key into components of fixed length (e.g. 8 or 16 bits) and sum the components
Hash compression maps:
Divide: h2(K) = K mod N
Multiply, add and divide (MAD): h2(K) = (aK + b) mod N, with a mod N ≠ 0
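
Putting the two-step scheme of slides 33 and 37 together, a tiny sketch (N, a and b are assumed illustrative values satisfying a mod N ≠ 0):

const long N = 97;                     // assumed table size (a prime)
const long a = 33, b = 7;              // assumed constants, a % N != 0

long h1(long key)     { return key; }                  // hash code map (trivial for integer keys)
long h2(long K)       { return (a * K + b) % N; }      // MAD compression map
long hashOf(long key) { return h2(h1(key)); }          // h(key) = h2(h1(key))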

38 6. ADT HashTable
As an example, we consider a HashTable ADT that supports most dictionary functions, but not deletion. The table is implemented as a dynamic array, a simple remainder hashing function is used, and linear probing handles collisions.

39 HashTable ADT Operations
Constructor: construct an empty table
Destructor: destroy the table
MakeTableEmpty: empty the whole table
TableIsEmpty: return true if the table is empty
TableIsFull: return true if the table is full
Occupancy: return the number of occupied slots
Insert: insert key and data in a slot
Search: search for a key
Retrieve: retrieve the data part of the current slot
Update: update the data part of the current slot
Traverse: traverse the whole table
A sketch of a possible class declaration follows.
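
The member names below follow the operation list above, but the template parameters and exact signatures are assumptions, not the course's actual header:

template <class KeyType, class DataType>
class HashTable
{
public:
    HashTable(int maxSize);                 // constructor: construct an empty table
    ~HashTable();                           // destructor: destroy the table
    void MakeTableEmpty();                  // empty the whole table
    bool TableIsEmpty() const;              // true if the table is empty
    bool TableIsFull() const;               // true if the table is full
    int  Occupancy() const;                 // number of occupied slots
    bool Insert(const KeyType& key, const DataType& data);   // insert key and data in a slot
    bool Search(const KeyType& key);        // search for a key (sets the current slot)
    DataType Retrieve() const;              // data part of the current slot
    void Update(const DataType& data);      // update the data part of the current slot
    void Traverse() const;                  // traverse the whole table
private:
    int hash(const KeyType& key) const;     // simple remainder hashing
    KeyType*  keys;                         // table implemented as dynamic arrays
    DataType* items;
    int maxSize, occupied, current;
};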

40 7. Performance: Linear Probing
Although searching in a hash table is supposed to be of complexity O(1), collisions increase the search cost. Consider a hash table of size m. Let P(n,m) be the probability that no collisions happen when n keys are inserted into the table. Then P(1,m) = m/m = 1, P(2,m) = (m/m)·((m-1)/m), etc.

41 Performance: Linear Probing
Generally:
P(n,m) = (m/m) · ((m-1)/m) · ((m-2)/m) · ... · ((m-n+1)/m)

42 Performance: Linear Probing
For m = 100, this probability is about 50% when n = 12 and is almost 0 when n = 30.
(Slide plot: P(n,m) versus n for m = 100.)
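
These numbers can be checked with a few lines of C++ (a sketch, not part of the slides):

#include <iostream>

// Probability that inserting n keys into a table of m slots causes no collision.
double P(int n, int m)
{
    double p = 1.0;
    for (int i = 0; i < n; i++)
        p *= static_cast<double>(m - i) / m;   // the i-th key must avoid the i occupied slots
    return p;
}

int main()
{
    std::cout << P(12, 100) << '\n';           // about 0.50
    std::cout << P(30, 100) << '\n';           // about 0.01
    return 0;
}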

43 Performance: Linear Probing
An important factor is the load factor α = (number of keys) / MaxSize, i.e. the occupancy. Let S(α) be the average cost of a successful search for a key, and U(α) the average cost of an unsuccessful search. The problem of deriving these costs was solved by Donald Knuth in 1962.

44 Performance: Linear Probing
The solution is:
S(α) ≈ (1/2)(1 + x) for successful search
U(α) ≈ (1/2)(1 + x^2) for unsuccessful search
where x = 1/(1 - α) and α is the load factor. The following table shows how the costs grow with the load factor:
α       66%    75%    90%
S(α)    2      2.5    5.5
U(α)    5      8.5    50.5

45 Performance: Double Hashing
In case of collision, a second hashing function is used to hash the key to the next probe position:
h = [h1(key) + h2(key)] mod MaxSize
Average case analysis (Knuth):
Successful search: S(α) ≈ -ln(1 - α)/α
Unsuccessful search: U(α) ≈ 1/(1 - α)
α       2/3    0.9
S(α)    1.6    2.55
U(α)    3      10
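
A hedged sketch of double-hashing insertion (the particular h2 below is an assumed example of a second hash function; the slide only states that it supplies the next probe position):

const int  MaxSize = 11;                       // assumed table size (a prime)
const long emptyKey = -1;                      // assumed marker for an empty slot
long keyTable[MaxSize];                        // assumed pre-filled with emptyKey

int h1(long key) { return static_cast<int>(key % MaxSize); }
int h2(long key) { return 1 + static_cast<int>(key % (MaxSize - 1)); }   // assumed, never 0

bool insertDouble(long key)
{
    int h = h1(key);
    for (int probes = 0; probes < MaxSize; probes++)
    {
        if (keyTable[h] == emptyKey) { keyTable[h] = key; return true; }
        h = (h + h2(key)) % MaxSize;           // advance by h2(key): h = [h1 + h2] mod MaxSize, repeated
    }
    return false;                              // no free slot found
}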

46 Performance: Chaining
Let n = total number of keys and Q = number of main slots. For n >> Q, the average chain length is L = n/Q.
Best case: T(n) = 1
Worst case: T(n) = L + 1 = n/Q + 1
Average case: T(n) = n/(2Q) + 1
For example, with n = 1000 keys and Q = 100 slots, an average search costs about 1000/200 + 1 = 6 probes.
(Slide diagram: a table of Q main slots, each holding a chain of average length L.)

47 Learn on your own about:
Hashing functions
Buckets and chaining
Double hashing

