CSCI 210 Data Structures and Algorithms, Prof. Amr Goneid, AUC
Part 5. Dictionaries (2): Hash Tables

Dictionaries (2): Hash Tables
1. Hash Tables as Dictionaries
2. Hashing Process
3. Collision Handling: Open Addressing
4. Collision Handling: Chaining
5. Properties of Hash Functions
6. Template Class Hash Table
7. Performance

1. Hash Tables as Dictionaries
Simple containers such as tables, stacks and queues permit access to elements by position or by order of insertion.
A dictionary is a form of container that permits access by content.

The Dictionary Data Structure
A dictionary data structure should support the following main operations:
- Insert(D, x): insert item x into dictionary D
- Delete(D, x): delete item x from D
- Search(D, k): search for key k in D

The Dictionary Data Structure
Examples:
- Unsorted arrays and linked lists: permit linear search
- Sorted arrays: permit binary search
- Ordered lists: permit linear search
- Binary search trees (BST): fast support of all dictionary operations
- Hash tables: fast retrieval by hashing the key directly to a position

The Dictionary Data Structure
There are three types of dictionaries:
- Static dictionaries: built once and never changed. They need to support search, but not insertion or deletion. They are best implemented using arrays or hash tables with linear probing.
- Semi-dynamic dictionaries: support insertion and search, but not deletion. They can be implemented as arrays, linked lists or hash tables with linear probing.
- Fully dynamic dictionaries: need fast support of all dictionary operations. Binary search trees are best. Hash tables also work well for fully dynamic dictionaries, provided chaining is used as the collision resolution mechanism.

The Dictionary Data Structure
In the revision part R3, we present two dictionary data structures that support all basic operations. Both are linear structures and so employ linear search, i.e. O(n). They are suitable for small to medium-sized data.
- The first uses a run-time array to implement an ordered list and is suitable if we know the maximum data size.
- The second uses a linked list and is suitable if we do not know the size of the data to insert.

Hash Tables as Dictionaries
- Dictionaries implemented as linear lists search by matching. Linear search costs O(n) comparisons.
- Dictionaries implemented as BSTs also search by matching, but the search cost is O(h), where h is the tree height. For balanced trees, this is O(log n).
- Some situations require even faster search. This can be achieved with dictionaries based on hash tables.
- Hash tables are excellent dictionary data structures, particularly if deletion need not be supported.

Hash Tables as Dictionaries
- Hashing applies a function to the search key to determine where the item appears in an array (the hash table) without looking at the other items (direct search).
- The sorting order of the keys does not matter.
- The function used is called a "hash function".
- Under ideal circumstances the cost of search is constant, independent of the number of keys (n), i.e. it is O(1).

2. Hashing Process
For a hash table of size n: h = hash(key), h = 0, 1, 2, ..., n-1
The basic hash function converts the key to an integer and takes the value of this integer mod the size of the hash table.
[Figure: hash(key) maps a key directly to slot h of a table with slots 0 to n-1, each holding a key and its data, in O(1).]
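The remainder hashing just described can be sketched in a few lines of C++. The function name `hashKey` and the choice of an unsigned integer key are assumptions made for illustration; they do not come from the slides.

```cpp
#include <cstddef>

// Remainder hashing: reduce an integer key to a slot index
// in the range 0 .. tableSize-1.
// Assumes tableSize > 0 (hypothetical helper, not the course's code).
std::size_t hashKey(unsigned long key, std::size_t tableSize) {
    return key % tableSize;
}
```

For a table of size 11, `hashKey(35, 11)` gives slot 2 and `hashKey(76, 11)` gives slot 10.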

Collision
Two keys may hash to the same position, e.g. with a table of size 11 and the two keys 55 and 66: 55 % 11 = 0 and 66 % 11 = 0.
Two distinct keys mapped to the same location are called "synonyms", and the situation is called a "collision".
There are different ways to handle collisions. One of them is called "open addressing", of which linear probing is a popular form.

3. Collision Handling: Open Addressing

Collision Handling: Open Addressing / Linear Probing
- In open addressing, a simple rule determines where to put a new item when the desired slot h is already occupied.
- A popular probe sequence is linear probing: the item always goes in the next unoccupied cell.
- If slot h is occupied, the next slot to probe is h = (h+1) mod MaxSize.
- To search for a given item, we go to the intended location and search sequentially. If we find an empty cell before we find the item, the item does not exist anywhere in the table.

Example
Consider inserting the following sequence of keys into a hash table of size n = 11:
{55, 35, 66, 76, 59, 48, 84, 70}
Assume a simple hashing function: h = hash(key) = key % n
Assume the table is initially empty. We may use -1 as the empty symbol.

Example
With h = key % 11, the insertions proceed as follows:
55 -> 0
35 -> 2
66 -> 0, collides with 55, so it is put in the next available slot, 1
76 -> 10
59 -> 4
48 -> 4, collides with 59, so it is put in the next available slot, 5
84 -> 7
70 -> 4, collides with 59, so it is put in the next available slot, 6 (slot 5 already holds 48)
What happens if we have to probe beyond the end of the table? For example, 54 -> 10 collides with 76.
So we do a circular search, h = (h+1) % n: slots 0, 1 and 2 are occupied, so 54 is put in slot 3.
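The worked example can be replayed in code. The following is a minimal C++ sketch, assuming integer keys, the hash function h = key % 11 and -1 as the empty symbol as stated in the example; the names `buildTable`, `kTableSize` and `kEmpty` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kTableSize = 11;  // n = 11, as in the example
constexpr int kEmpty = -1;              // -1 marks an empty slot

// Insert each key at h = key % n, advancing circularly on collision.
// Assumes non-negative keys and fewer keys than slots,
// so a free slot always exists.
std::array<int, kTableSize> buildTable(const std::vector<int>& keys) {
    std::array<int, kTableSize> table;
    table.fill(kEmpty);
    for (int key : keys) {
        std::size_t h = static_cast<std::size_t>(key) % kTableSize;
        while (table[h] != kEmpty) {
            h = (h + 1) % kTableSize;   // circular advance
        }
        table[h] = key;
    }
    return table;
}
```

Calling `buildTable({55, 35, 66, 76, 59, 48, 84, 70, 54})` leaves slots 0 to 10 holding 55, 66, 35, 54, 59, 48, 70, 84, -1, -1, 76.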

Insertion Algorithm
bool insert (key, data)
{
    if (table is not full)
    {
        h = hash(key);                // Hash key to slot h
        while (slot h not empty)
            h = (h+1) % MaxSize;      // Circular advance
        insert key and data at slot h;
        return true;
    }
    else return false;
}

Search Algorithm
Searching for a key in a hash table that uses open addressing faces three situations:
- Slot h is empty: the key does not exist.
- There is a match at slot h: the key is found.
- Another key occupies slot h: we do a circular search until one of the above situations arises, or until we return to the starting point, in which case the key does not exist.

Search Algorithm
bool search (key)
{
    if (table is not empty)
    {
        h = hash(key);                // Hash key to slot h
        start = h;                    // Starting slot
        while (true)
        {
            if (slot h is empty) return false;
            if (there is a match at h) return true;
            h = (h+1) % MaxSize;      // Circular advance
            if (h == start) return false;
        }
    }
    else return false;
}
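The insertion and search pseudocode can be turned into a small runnable class. This is a sketch under stated assumptions, not the course's template class: keys are non-negative `int`s, there is no data part, -1 marks an empty slot, duplicates are not detected, and the names `ProbingTable` and `kEmptySlot` are made up for illustration.

```cpp
#include <cstddef>
#include <vector>

constexpr int kEmptySlot = -1;  // assumed empty marker; keys must be >= 0

// Hypothetical minimal open-addressing table with linear probing.
class ProbingTable {
public:
    explicit ProbingTable(std::size_t maxSize)
        : slots_(maxSize, kEmptySlot), count_(0) {}

    // Returns false when the table is full.
    bool insert(int key) {
        if (count_ == slots_.size()) return false;
        std::size_t h = hash(key);
        while (slots_[h] != kEmptySlot) {
            h = (h + 1) % slots_.size();           // circular advance
        }
        slots_[h] = key;
        ++count_;
        return true;
    }

    bool search(int key) const {
        if (count_ == 0) return false;
        std::size_t h = hash(key);
        const std::size_t start = h;
        do {
            if (slots_[h] == kEmptySlot) return false;  // empty slot: absent
            if (slots_[h] == key) return true;          // match: found
            h = (h + 1) % slots_.size();                // circular advance
        } while (h != start);                           // back at start: absent
        return false;
    }

private:
    std::size_t hash(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }

    std::vector<int> slots_;
    std::size_t count_;
};
```

The `h != start` check matters only when the table is completely full; otherwise an empty slot ends an unsuccessful search first.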

4. Collision Handling: Chaining
- Chaining is a collision resolution mechanism.
- A smaller table is used, in which each location is associated with a linked list.
- Synonyms of the key in a slot are stored in the linked list associated with that slot.
- Searching is done by hashing the key to a main slot; if the key is not found there, a linear search is conducted in the associated linked list.

Example
[Figure: chained hash table with h = key % 11.]
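A chained table can be sketched briefly in C++. As a simplification that differs from the description above, this sketch keeps every key of a slot in that slot's list instead of storing the first key in the main slot itself. The class name `ChainedTable` and the restriction to non-negative `int` keys are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <vector>

// Hypothetical chained hash table; slotCount must be greater than 0.
class ChainedTable {
public:
    explicit ChainedTable(std::size_t slotCount) : chains_(slotCount) {}

    // Synonyms simply join the list of their slot.
    void insert(int key) { chains_[hash(key)].push_back(key); }

    // Hash to the slot, then search its list linearly.
    bool search(int key) const {
        const std::list<int>& chain = chains_[hash(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }

    // Deletion only unlinks one list node, which is why chaining
    // suits fully dynamic dictionaries.
    bool remove(int key) {
        std::list<int>& chain = chains_[hash(key)];
        auto it = std::find(chain.begin(), chain.end(), key);
        if (it == chain.end()) return false;
        chain.erase(it);
        return true;
    }

private:
    std::size_t hash(int key) const {
        return static_cast<std::size_t>(key) % chains_.size();
    }

    std::vector<std::list<int>> chains_;
};
```

With 11 slots, the synonyms 55 and 66 both end up in the list of slot 0, and removing one leaves the other reachable.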

5. Properties of Hash Functions
A hash function is usually specified in two steps:
- Hash code map: h1(key) -> an integer K
- Compression map: h2(K) -> [0, N-1]
i.e. h(key) = h2(h1(key))

Properties of Hash Functions
- A hash function should be simple, fast and single-valued.
- A hash function should scatter h over the range 0 to MaxSize-1, i.e. it should provide a uniform distribution of hash values.
- A hash function should not cluster keys in regions of the table. Choosing a prime number for MaxSize reduces clustering.
- The key to efficiency is using a large enough table that contains many holes.

Properties of Hash Functions
There are many hash functions with varying performance. For numeric keys, random hashing is very good.
If x is the key, then a large integer is obtained as K = (α x + β) % m, where α, β and m are fixed constants.
The hashed value is then computed as h = K % MaxSize.

Properties of Hash Functions
For a string key S consisting of the characters {S_0 S_1 ... S_{L-1}}, we may use one of the following:

Other Hash Functions
Hash code maps:
- Memory addresses as integers (K)
- Partition the bits of the key into components of fixed length (e.g. 8 or 16 bits) and sum the components
Hash compression maps:
- Divide: h2(K) = K mod N
- Multiply, add and divide (MAD): h2(K) = (aK + b) mod N, with a mod N ≠ 0
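The component-sum hash code map and the two compression maps can be illustrated as follows. This is a sketch only: the function names, the 64-bit key width and the 16-bit component size are assumptions, and the MAD version assumes N < 2^32 so that the intermediate product fits in 64 bits.

```cpp
#include <cstdint>

// Hash code map: split a 64-bit key into 16-bit components and sum them.
std::uint64_t hashCodeSum16(std::uint64_t key) {
    std::uint64_t sum = 0;
    while (key != 0) {
        sum += key & 0xFFFFu;   // lowest 16-bit component
        key >>= 16;             // move to the next component
    }
    return sum;
}

// Division compression map: h2(K) = K mod N. Assumes N > 0.
std::uint64_t compressDivide(std::uint64_t K, std::uint64_t N) {
    return K % N;
}

// MAD compression map: h2(K) = (a*K + b) mod N, with a mod N != 0.
// Operands are reduced mod N first; with 0 < N < 2^32 nothing overflows.
std::uint64_t compressMad(std::uint64_t K, std::uint64_t a,
                          std::uint64_t b, std::uint64_t N) {
    return ((a % N) * (K % N) + (b % N)) % N;
}
```

For example, `compressMad(4, 3, 5, 11)` evaluates (3*4 + 5) mod 11 = 6, and composing the two steps as `compressDivide(hashCodeSum16(key), N)` gives h(key) = h2(h1(key)).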

6. ADT HashTable
- As an example, we consider a HashTable ADT that supports most dictionary functions, but not deletion.
- The table is implemented as a dynamic array.
- We use a simple remainder hashing function.
- Linear probing is used for collision handling.

HashTable ADT Operations
- Constructor: construct an empty table
- Destructor: destroy the table
- MakeTableEmpty: empty the whole table
- TableIsEmpty: return true if the table is empty
- TableIsFull: return true if the table is full
- Occupancy: return the number of occupied slots
- Insert: insert key and data in a slot
- Search: search for a key
- Retrieve: retrieve the data part of the current slot
- Update: update the data part of the current slot
- Traverse: traverse the whole table

7. Performance: Linear Probing
Although searching in a hash table is supposed to be of complexity O(1), collisions increase the search cost.
Consider a hash table of size m. Let P(n,m) be the probability that no collision has happened once the n-th key is inserted, i.e. each of the first n keys hashed to a slot that was still free.
Then P(1,m) = m/m = 1, and P(2,m) = (m/m)((m-1)/m), etc.

Performance: Linear Probing
Generally:
$$P(n,m) = \prod_{i=0}^{n-1} \frac{m-i}{m} = \frac{m!}{(m-n)!\, m^{n}}$$

Fall 2007, Prof. Amr Goneid, AUC
Performance: Linear Probing
For m = 100, this probability is about 50% when n = 12 and almost 0 when n = 30.
[Figure: plot of P(n,m) against n for m = 100.]
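The quoted figures can be checked numerically. The sketch below assumes that every key hashes uniformly and independently to one of the m slots, so the no-collision probability is the product of (m - i)/m for i = 0, ..., n-1; the function name is hypothetical.

```cpp
// Probability that n keys hash to n distinct slots of a table of size m,
// assuming uniform, independent hashing (hypothetical helper).
double noCollisionProbability(int n, int m) {
    double p = 1.0;
    for (int i = 0; i < n; ++i) {
        p *= static_cast<double>(m - i) / m;  // the (i+1)-th key finds a free slot
    }
    return p;
}
```

With m = 100, `noCollisionProbability(12, 100)` comes out close to 0.5 and `noCollisionProbability(30, 100)` is below 0.01, in line with the values stated above.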

Performance: Linear Probing
An important factor is the load factor α = (number of keys) / MaxSize, i.e. the fraction of the table that is occupied.
Let S(α) be the average cost of a successful search for a key, and U(α) that of an unsuccessful search.
The problem of deriving these costs was solved by Donald Knuth in 1962.

Performance: Linear Probing
The solution is
$$S(\alpha) \approx \tfrac{1}{2}\,(1 + x) \quad \text{for successful search}$$
$$U(\alpha) \approx \tfrac{1}{2}\,(1 + x^{2}) \quad \text{for unsuccessful search}$$
where $x = 1/(1-\alpha)$ and $\alpha$ is the load factor.
The costs S(α) and U(α) are tabulated for load factors α = 66%, 75% and 90%.
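The two approximations can be evaluated directly to see how quickly the cost grows with the load factor. This is a small sketch with hypothetical function names; it assumes 0 <= α < 1.

```cpp
// Knuth's approximations for linear probing, with x = 1 / (1 - alpha).
// Both functions assume 0 <= alpha < 1.
double linearProbeSuccessful(double alpha) {
    const double x = 1.0 / (1.0 - alpha);
    return 0.5 * (1.0 + x);
}

double linearProbeUnsuccessful(double alpha) {
    const double x = 1.0 / (1.0 - alpha);
    return 0.5 * (1.0 + x * x);
}
```

At α = 0.75, x = 4, so S = 2.5 and U = 8.5 probes. At α = 0.9, x = 10, so S = 5.5 and U = 50.5 probes, which is why linear probing tables are kept well below full.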

Performance: Double Hashing
In case of collision, a second hashing function is used to hash the key to the next probe position:
h = [h1(key) + h2(key)] mod MaxSize
Average case analysis (Knuth), with α the load factor:
- Successful search: $S(\alpha) = -\ln(1-\alpha)/\alpha$
- Unsuccessful search: $U(\alpha) = 1/(1-\alpha)$
Example: U = 3 for α = 2/3 and U = 10 for α = 0.9.
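The double hashing costs can be evaluated in the same way as those for linear probing. This is a sketch with hypothetical function names; it assumes 0 < α < 1, since the successful-search formula divides by α.

```cpp
#include <cmath>

// Average-case costs for double hashing, assuming 0 < alpha < 1.
double doubleHashSuccessful(double alpha) {
    return -std::log(1.0 - alpha) / alpha;   // S = -ln(1 - alpha) / alpha
}

double doubleHashUnsuccessful(double alpha) {
    return 1.0 / (1.0 - alpha);              // U = 1 / (1 - alpha)
}
```

At α = 0.9 an unsuccessful search costs about 10 probes, against roughly 50 under the linear probing approximation at the same load.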

Performance: Chaining
n = total number of keys, Q = number of main slots
For n >> Q, the average chain length is L = n/Q.
- Best case: T(n) = 1
- Worst case: T(n) = L + 1 = n/Q + 1
- Average case: T(n) = n/(2Q) + 1
[Figure: table of Q main slots with chains of length L.]

Learn on your own about:
- Hashing functions
- Buckets and chaining
- Double hashing