CSCI 210 Data Structures and Algorithms, Prof. Amr Goneid, AUC. Part 5. Dictionaries(2): Hash Tables.


2 Dictionaries(2): Hash Tables. Contents: 1. Hash Tables as Dictionaries; 2. Hashing Process; 3. Collision Handling: Open Addressing; 4. Collision Handling: Chaining; 5. Properties of Hash Functions; 6. Template Class Hash Table; 7. Performance.

3 1. Hash Tables as Dictionaries. Simple containers such as tables, stacks and queues permit access to elements by position or by order of insertion. A dictionary is a form of container that permits access by content.

4 The Dictionary Data Structure. A dictionary DS should support the following main operations: Insert (D,x): insert item x in dictionary D; Delete (D,x): delete item x from D; Search (D,k): search for key k in D.

5 The Dictionary Data Structure. Examples: unsorted arrays and linked lists permit linear search; sorted arrays permit binary search; ordered lists permit linear search; binary search trees (BSTs) give fast support of all dictionary operations; hash tables give fast retrieval by hashing the key directly to a position.

6 The Dictionary Data Structure. There are 3 types of dictionaries. Static dictionaries: these are built once and never change, so they need to support search but not insertion or deletion. They are best implemented using arrays or hash tables with linear probing. Semi-dynamic dictionaries: these support insertion and search queries, but not deletion. They can be implemented as arrays, linked lists or hash tables with linear probing.

7 The Dictionary Data Structure. Fully dynamic dictionaries: these need fast support of all dictionary operations. Binary search trees are best. Hash tables are also well suited to fully dynamic dictionaries, provided chaining is used as the collision resolution mechanism.

8 The Dictionary Data Structure. In revision part R3, we present two dictionary data structures that support all basic operations. Both are linear structures and so employ linear search, i.e., O(n) per operation. They are suitable for small to medium-sized data. The first uses a run-time array to implement an ordered list and is suitable if we know the maximum data size. The second uses a linked list and is suitable if we do not know the size of the data to insert.

9 Hash Tables as Dictionaries. Dictionaries implemented as linear lists search by matching; linear search costs O(n) comparisons. Dictionaries implemented as BSTs also search by matching, but the search cost is O(h), where h is the tree height; for balanced trees this is O(log n). Some situations require even faster search, which can be achieved with dictionaries based on hash tables. Hash tables are excellent dictionary data structures, particularly if deletion need not be supported.

10 Hash Tables as Dictionaries. Hashing applies a function to the search key to determine where the item will appear in an array (the hash table) without looking at the other items (direct search). We also do not care about the sorted order of the keys. The function used is called a "hash function". Under ideal circumstances the cost of search is constant, independent of the number of keys n, i.e., O(1).

11 2. Hashing Process. For a hash table of size n: h = hash(key), h = 0, 1, 2, ..., n-1. The basic hash function converts the key to an integer and takes the value of this integer mod the size of the hash table. Figure: hash(key) maps a key directly to slot h of a table with slots 0 to n-1, each slot holding a key and its data, at O(1) cost.

12 Collision. Two keys may hash to the same position, e.g., in a table of size 11 the keys 55 and 66 give 55 % 11 = 0 and 66 % 11 = 0. Two distinct keys mapped to the same location are called "synonyms", and the situation is called a "collision". There are different ways to handle collisions. One of them is "open addressing", whose simplest form is "linear probing".

13 3. Collision Handling: Open Addressing

14 Collision Handling: Open Addressing / Linear Probing. In open addressing, we use a simple rule to probe where to put a new item when the desired slot h is already occupied. A popular probe sequence is linear probing: we always put the item in the next unoccupied cell. If slot h is occupied, the next slot to probe is h = (h+1) mod MaxSize. When searching for a given item, we go to its intended location and search sequentially. If we find an empty cell before we find the item, the item does not exist anywhere in the table.

15 Example. Consider inserting the following sequence of keys into a hash table of size n = 11: {55, 35, 66, 76, 59, 48, 84, 70}. Assume a simple hash function h = hash(key) = key % n, and assume the table is initially empty. We may use -1 as the empty symbol.

16 Example: 55 → 0 and 35 → 2. Table (slot: key): 0: 55, 2: 35.

17 Example: 66 → 0 collides with 55. Table (slot: key): 0: 55, 2: 35.

18 Example: 66 → 0, so it is put in the next available slot, 1. Table (slot: key): 0: 55, 1: 66, 2: 35.

19 Example: 76 → 10 and 59 → 4. Table (slot: key): 0: 55, 1: 66, 2: 35, 4: 59, 10: 76.

20 Example: 48 → 4 collides with 59. Table (slot: key): 0: 55, 1: 66, 2: 35, 4: 59, 10: 76.

21 Example: 48 → 4, so it is put in the next available slot, 5. Table (slot: key): 0: 55, 1: 66, 2: 35, 4: 59, 5: 48, 10: 76.

22 Example: 84 → 7. Table (slot: key): 0: 55, 1: 66, 2: 35, 4: 59, 5: 48, 7: 84, 10: 76.

23 Example: 70 → 4 collides with 59. Table (slot: key): 0: 55, 1: 66, 2: 35, 4: 59, 5: 48, 7: 84, 10: 76.

24 Example: 70 → 4; slots 4 and 5 are occupied, so it is put in the next available slot, 6. Table (slot: key): 0: 55, 1: 66, 2: 35, 4: 59, 5: 48, 6: 70, 7: 84, 10: 76.

25 Example: What happens if we have to probe beyond the end of the table? For example, 54 → 10 collides with 76. Table (slot: key): 0: 55, 1: 66, 2: 35, 4: 59, 5: 48, 6: 70, 7: 84, 10: 76.

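26 Example: So we do a circular search, h = (h+1) % n. Slot 10 is occupied, and slots 0, 1 and 2 are also occupied, so 54 is put in slot 3. Table (slot: key): 0: 55, 1: 66, 2: 35, 3: 54, 4: 59, 5: 48, 6: 70, 7: 84, 10: 76.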

27 Insertion Algorithm
bool insert (key, data)
{
    if (table is not full)
    {
        h = hash(key);              // Hash key to slot h
        while (slot h not empty)
            h = (h+1) % MaxSize;    // Circular advance
        insert key and data at slot h;
        return true;
    }
    else return false;
}

28 Search Algorithm. Searching for a key in a hash table using open addressing faces 3 situations: (1) slot h is empty, so the key does not exist; (2) there is a match at slot h, so the key is found; (3) another key occupies slot h, so we continue a circular search until one of the above situations occurs, or until we return to the starting point, in which case the key does not exist.

29 Search Algorithm
bool search (key)
{
    if (table is not empty)
    {
        h = hash(key);              // Hash key to slot h
        start = h;                  // Starting slot
        while (true)
        {
            if (slot h is empty) return false;
            if (there is a match at h) return true;
            h = (h+1) % MaxSize;    // Circular advance
            if (h == start) return false;
        }
    }
    else return false;
}

30 4. Collision Handling: Chaining. Chaining is a collision resolution mechanism in which each location of the table is associated with a linked list, so the table can be smaller than the number of keys. Synonyms of a key in a slot are stored in the linked list associated with that slot. Searching hashes the key to a main slot and, if the key is not found there, conducts a linear search of the associated linked list.

31 Example. Figure: a chained hash table with h = key % 11, holding the keys 60, 66, 59, 47, 35, 89, 55, 44, 33, 45, 67, 38, 71, 27, 36.

32 5. Properties of Hash Functions. A hash function is usually specified in two steps. Hash code map: $h_1(\text{key}) \to$ an integer $K$. Compression map: $h_2(K) \to [0, N-1]$. That is, $h(\text{key}) = h_2(h_1(\text{key}))$.

33 Properties of Hash Functions. A hash function should be simple, fast and single-valued. It should scatter h over the range 0 to MaxSize-1, i.e., provide a uniform distribution of hash values, and it should not cluster keys in regions of the table. Using a prime number for MaxSize reduces clustering. The key to efficiency is a large-enough table that contains many holes.

34 Properties of Hash Functions. There are many hash functions with varying performance. For numeric keys, random hashing is very good: if x is the key, a large integer is obtained as $K = (\alpha x + \beta) \bmod m$, with $\alpha = 25173$, $\beta = 13849$, $m = 65536$. The hashed value is then computed as $h = K \bmod \text{MaxSize}$.

35 Properties of Hash Functions. For a string key S consisting of characters $\{S_0 S_1 \ldots S_{L-1}\}$, we may use one of the following:

36 Other Hash Functions. Hash code maps: memory addresses as integers (K); partitioning the bits of the key into components of fixed length (e.g., 8 or 16 bits) and summing the components. Hash compression maps: Divide: $h_2(K) = K \bmod N$. Multiply, add and divide (MAD): $h_2(K) = (aK + b) \bmod N$, with $a \bmod N \neq 0$.

37 6. ADT HashTable. As an example, we consider a hashTable ADT that supports most dictionary functions, but not deletion. The table is implemented as a dynamic array, uses a simple remainder hashing function, and handles collisions with linear probing.

38 HashTable ADT Operations
Constructor: construct an empty table
Destructor: destroy the table
MakeTableEmpty: empty the whole table
TableIsEmpty: return true if the table is empty
TableIsFull: return true if the table is full
Occupancy: return the number of occupied slots
Insert: insert key and data in a slot
Search: search for a key
Retrieve: retrieve the data part of the current slot
Update: update the data part of the current slot
Traverse: traverse the whole table

39 7. Performance: Linear Probing. Although searching in a hash table is supposed to be O(1), collisions increase the search cost. Consider a hash table of size m, and let P(n,m) be the probability that no collision occurs while inserting n keys into an initially empty table, i.e., that every key, up to and including the nth, lands in an empty slot. Then P(1,m) = m/m = 1, P(2,m) = (m/m)((m-1)/m), etc.

40 Performance: Linear Probing. Generally: $P(n,m) = \prod_{i=0}^{n-1} \frac{m-i}{m} = \frac{m!}{(m-n)!\, m^{n}}$ for $n \le m$, and $P(n,m) = 0$ for $n > m$.

41 Performance: Linear Probing (Fall 2007). For m = 100, this probability is about 50% when n = 12 and almost 0 when n = 30. Plot: P(n,m) versus n for m = 100.

42 Performance: Linear Probing. An important factor is the load factor α = (number of keys) / MaxSize, i.e., the fractional occupancy. Let S(α) be the average cost of a successful search for a key, and U(α) that of an unsuccessful search. The problem of deriving these costs was solved by Donald Knuth in 1962.

43 Performance: Linear Probing. The solution is
$S(\alpha) \approx \frac{1}{2}(1 + x)$ for successful search,
$U(\alpha) \approx \frac{1}{2}(1 + x^2)$ for unsuccessful search,
where $x = 1/(1-\alpha)$ and α is the load factor. The following table shows how the costs are affected by the load factor:
α        2/3    75%    90%
S(α)     2      2.5    5.5
U(α)     5      8.5    50.5

44 Performance: Double Hashing. In case of collision, a second hash function is used to hash the key to the next probe position: $h = [h_1(\text{key}) + h_2(\text{key})] \bmod \text{MaxSize}$; in general, the i-th probe is $[h_1(\text{key}) + i\,h_2(\text{key})] \bmod \text{MaxSize}$. Average-case analysis (Knuth):
Successful search: $S(\alpha) = -\ln(1-\alpha)/\alpha$
Unsuccessful search: $U(\alpha) = 1/(1-\alpha)$
Example:
α        2/3     0.9
S(α)     1.65    2.56
U(α)     3       10

45 Performance: Chaining. Let n be the total number of keys and Q the number of main slots. For n >> Q, the average chain length is L = n/Q. Best case: T(n) = 1. Worst case: T(n) = L + 1 = n/Q + 1. Average case: T(n) = n/(2Q) + 1. Figure: Q main slots, each with a chain of average length L.

46 Learn on your own about: hashing functions; buckets and chaining; double hashing.

