Hash Maps
Rem Collier, Room A1.02
School of Computer Science and Informatics, University College Dublin, Ireland

Hash Tables
An array-based approach to implementing the Map ADT.
Key features:
- An array of size N.
- A hash function, denoted h(k), which maps keys to integer values, known as hash values, in the range 0 to N-1:
    Algorithm hashFunction(k):
        return k % N
- A collision handling strategy, which deals with the case where two keys have the same hash value.
There are two main types of collision handling strategy: separate chaining and open addressing.

Separate Chaining
Basic strategy in which entries with the same hash value are "chained" together.
Approach: use an array of Lists. Collisions are resolved by adding the new entry to the end of the associated List.
Example problem: create a hash table of size 13 that uses the hash function
    h(x) = x mod 13
and insert entries with the following keys: 18, 44, 41, 22, 59, 32, 31, 73
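
The example problem can be traced with a few lines of Python. This is only a sketch: Python lists stand in for the List ADT, an empty list stands in for a null cell, and the names N, keys, h and table are chosen here for illustration.

```python
# Sketch of the example: N = 13, h(x) = x mod 13, one chain (list) per cell.
N = 13
keys = [18, 44, 41, 22, 59, 32, 31, 73]

def h(x):
    return x % N

table = [[] for _ in range(N)]   # an empty list stands in for a null cell
for key in keys:
    table[h(key)].append(key)    # a colliding key goes to the end of its chain

# Print only the non-empty cells.
for index, chain in enumerate(table):
    if chain:
        print(index, chain)
```

Keys 18, 44 and 31 all hash to cell 5, so that cell ends up holding a chain of three entries while every other occupied cell holds one.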

Pseudo Code

Algorithm put(k, v):
    h ← hashFunction(k)
    temp ← null
    entry ← new Entry(k, v)
    if (A[h] = null) then
        A[h] ← new List()
        A[h].insertLast(entry)
        size ← size + 1
    else
        P ← find(A[h], k)
        if (P = null) then
            A[h].insertLast(entry)
            size ← size + 1
        else
            e ← A[h].replace(P, entry)
            temp ← e.value()
    return temp

Note that size only grows when a new entry is added; replacing the value of an existing key leaves it unchanged.

Algorithm remove(k):
    h ← hashFunction(k)
    if (A[h] = null) then return null
    P ← find(A[h], k)
    if (P = null) then return null
    e ← A[h].remove(P)
    size ← size - 1
    return e.value()

Algorithm get(k):
    h ← hashFunction(k)
    if (A[h] = null) then return null
    P ← find(A[h], k)
    if (P = null) then return null
    return P.element().value()
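
A runnable sketch of the same three operations, assuming integer keys and the division method. The class name ChainedHashMap and its helpers are illustrative only; Python lists replace the List ADT and positions, so this is one possible rendering rather than the course's implementation.

```python
class ChainedHashMap:
    """Separate-chaining map; each cell is None or a list of [key, value] pairs."""

    def __init__(self, capacity=13):
        self._cells = [None] * capacity
        self._size = 0

    def _hash(self, key):
        return key % len(self._cells)      # division method, integer keys only

    def _find(self, chain, key):
        for entry in chain:                # linear scan of one chain
            if entry[0] == key:
                return entry
        return None

    def put(self, key, value):
        h = self._hash(key)
        if self._cells[h] is None:
            self._cells[h] = []
        entry = self._find(self._cells[h], key)
        if entry is None:                  # new key: append and grow
            self._cells[h].append([key, value])
            self._size += 1
            return None
        old = entry[1]                     # existing key: replace, size unchanged
        entry[1] = value
        return old

    def get(self, key):
        chain = self._cells[self._hash(key)]
        if chain is None:
            return None
        entry = self._find(chain, key)
        return None if entry is None else entry[1]

    def remove(self, key):
        chain = self._cells[self._hash(key)]
        if chain is None:
            return None
        entry = self._find(chain, key)
        if entry is None:
            return None
        chain.remove(entry)
        self._size -= 1
        return entry[1]

    def size(self):
        return self._size


m = ChainedHashMap()
m.put(18, "a")
m.put(44, "b")          # collides with 18 in cell 5
print(m.put(18, "c"))   # replaces the value and returns the old one
print(m.size())
```

As in the pseudocode, put returns the previous value when the key already exists and null (None) otherwise.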

Performance
The performance of get(), put() and remove() depends on the number of collisions:
- Best case: no collisions occur, so each operation runs in O(1) time.
- Worst case: every key has the same hash value, so each operation runs in O(n) time.
Normally, hash table performance is measured as expected running time (see table). In practice, we try to achieve this by choosing a good hash function.

Operation     Expected time
size()        O(1)
isEmpty()     O(1)
get(k)        O(1)
put(k,v)      O(1)
remove(k)     O(1)
keys()        O(n)
values()      O(n)
entries()     O(n)

The last three operations must visit every entry, so they take O(n) time rather than O(1).

Hash Functions
Hash functions convert keys to integer hash values in the range 0 to N-1.
Any data type or object can be a key (e.g. strings, doubles, bank accounts, …).
To handle this, hash functions combine two basic maps:
- Hash code map: assigns an integer value to each key.
- Compression map: converts that integer to an integer in the range 0 to N-1.
The previous example used a compression map known as the division method (% N):
- N should be prime.
- We need to be wary of patterns in the hash codes of the form pN + q, because every such hash code is mapped to the same cell, q.
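
The two-stage structure can be sketched as follows. Python's built-in hash() is used purely as a stand-in hash code map, and the function names are chosen here for illustration. The last lines show why hash codes of the form pN + q are a problem for the division method.

```python
N = 13

def hash_code(key):
    # Hash code map: any key -> integer (built-in hash() as a stand-in).
    return hash(key)

def compress(code, n=N):
    # Compression map (division method): integer -> 0..n-1.
    return code % n

def hash_function(key, n=N):
    # A hash function is the composition of the two maps.
    return compress(hash_code(key), n)

# Hash codes of the form p*N + q all land in cell q.
q = 4
pattern = [p * N + q for p in range(5)]        # 4, 17, 30, 43, 56
cells = [compress(code) for code in pattern]
print(cells)                                   # every code maps to cell 4
```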

The MAD Method
A better compression map is the Multiply, Add and Divide (MAD) method. This method takes the hash code and:
- multiplies it by a constant value, known as the scale factor,
- adds a second constant value, known as the shift, and then
- returns the remainder when this value is divided by N.
For a given hash code, i, this method takes the form:
    (a·i + b) mod N
Constraint: a % N should not equal 0, otherwise every hash code is mapped to the same value, b mod N.
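
A minimal sketch of the MAD compression map in the form given on the slide. The function name mad_compress and the constants a = 3, b = 5 are arbitrary illustrative choices, not values from the course.

```python
def mad_compress(code, a, b, n):
    """Multiply, Add and Divide: (a * code + b) mod n."""
    if a % n == 0:
        # With a % n == 0 the product a * code vanishes mod n.
        raise ValueError("a % n must not be 0")
    return (a * code + b) % n

N = 13
print([mad_compress(i, a=3, b=5, n=N) for i in range(5)])   # [5, 8, 11, 1, 4]

# What the constraint prevents: with a = 26 (a % 13 == 0),
# every hash code collapses to b mod N.
print([(26 * i + 5) % N for i in range(5)])                 # [5, 5, 5, 5, 5]
```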

Hash Code Maps
Primitive data types:
- Integer cast: re-interpret the bits as an integer value, e.g. for a byte, k, use (int) k.
- Component sum: break the bits into integer-sized blocks, cast each block as an integer, and sum the values, e.g. for a long, k, use (int) (k >> 32) + (int) k.
- Polynomial sum: same as component sum, but multiply each term by a power of a constant p, e.g. for a sequence S = c_1 c_2 … c_n, use
    c_1 + c_2·p + c_3·p^2 + … + c_n·p^(n-1)
Objects: use the object's memory address (or adapt one of the above). This has proven to be a simple but effective general solution.
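
The integer cast and component sum for a 64-bit value can be sketched in Python. One assumption to flag: Python integers are unbounded, so 32-bit blocks are emulated with a mask and treated as unsigned, whereas Java's (int) cast yields signed values. The function names are illustrative.

```python
MASK32 = 0xFFFFFFFF   # keeps the low 32 bits

def integer_cast(k):
    # Re-interpret the low 32 bits of k as an (unsigned) integer.
    return k & MASK32

def component_sum(k):
    # Split a 64-bit value into two 32-bit blocks and add them,
    # wrapping around on overflow like 32-bit arithmetic would.
    high = (k >> 32) & MASK32
    low = k & MASK32
    return (high + low) & MASK32

print(component_sum((1 << 32) + 5))        # high block 1, low block 5
print(component_sum(0xFFFFFFFF00000001))   # the sum wraps around
```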

Hash Code Maps & Strings
String = sequence of characters. Character encodings are integer numbers (typically 8 or 16 bit).
Naïve solution: use component sum.
    h("dog") = (int) 'd' + (int) 'o' + (int) 'g' = 100 + 111 + 103 = 314
    h("god") = (int) 'g' + (int) 'o' + (int) 'd' = 103 + 111 + 100 = 314
Both strings get the same hash code, because the component sum ignores the order of the characters.
Better solution: use polynomial sum (p = 3).
    h("god") = 103 + 111*3 + 100*9 = 1,336
    h("dog") = 100 + 111*3 + 103*9 = 1,360
Experimental note: for 50,000 English words, a value of p = 33 results in fewer than 7 collisions.
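
Both string hash codes are short to write down in Python. The ordering of the powers (first character multiplied by p^0) is an assumption, chosen because it matches the slide's totals of 1,336 and 1,360; the function names are illustrative.

```python
def component_sum(text):
    # Sum of the character codes; ignores character order.
    return sum(ord(ch) for ch in text)

def polynomial_sum(text, p):
    # c_1 + c_2*p + c_3*p^2 + ...; character order now matters.
    return sum(ord(ch) * p ** i for i, ch in enumerate(text))

print(component_sum("dog"), component_sum("god"))            # 314 314
print(polynomial_sum("dog", 3), polynomial_sum("god", 3))    # 1360 1336
```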

Separate Chaining vs. Open Addressing
Separate chaining:
- Uses an array of Lists. Collisions result in new entries being added to the end of the corresponding list.
- In theory, offers infinite capacity.
- Drawbacks: uses an auxiliary data structure (List), and in practice the number of collisions increases as the number of entries increases.
Open addressing:
- Does not require an auxiliary data structure.
- Has finite capacity, but supports rehashing.

Linear Probing
Strategy:
- Create an array of entries.
- Use the hash value h(k) as an index into this array.
- A collision occurs when cell h(k) is occupied.
- Resolve the collision by placing the entry in the next (circularly) available array position. This is done by "probing" consecutive positions in the array (e.g. h(k) + 1, h(k) + 2, …).
Let's explore how this works through the following example. Assume a hash table of size 13 that uses linear probing, together with the hash function
    h(x) = x mod 13
Insert entries with the following keys: 18, 44, 41, 22, 59, 32, 31, 73
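
The example can be traced with a short sketch. Only keys are stored (no values) to keep the trace readable, and the names table and insert are illustrative.

```python
N = 13
table = [None] * N       # None marks an empty cell

def insert(key):
    """Place key in the first free cell at or after h(key), wrapping around."""
    i = key % N          # h(x) = x mod 13
    for _ in range(N):   # probe at most N cells
        if table[i] is None:
            table[i] = key
            return i
        i = (i + 1) % N  # next position, circularly
    raise RuntimeError("table is full")

for key in [18, 44, 41, 22, 59, 32, 31, 73]:
    insert(key)

print(table)
# [None, None, 41, None, None, 18, 44, 59, 32, 22, 31, 73, None]
```

Key 31 hashes to cell 5 but has to probe cells 5 through 9 before settling in cell 10, which shows how occupied cells cluster together under linear probing.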

Retrieval with Linear Probing
Consider a hash table, A, that uses linear probing.
get(k):
- We start at cell h(k).
- We probe consecutive locations until one of the following occurs:
    - an item with key k is found, or
    - an empty cell is found, or
    - N cells have been unsuccessfully probed.

Algorithm get(k):
    i ← hashFunction(k)
    p ← 0
    repeat
        c ← A[i]
        if c = null
            return null
        else if c.key() = k
            return c.value()
        else
            i ← (i + 1) mod N
            p ← p + 1
    until p = N
    return null

Removal of Entries
One issue that we still need to resolve is how to remove entries from a linear probing hash table implementation:
- Search is the key operation.
- The current search algorithm terminates when either N entries have been checked, or a "gap" (empty cell) is found.
Problem: if we simply remove entries, they will be replaced by "gaps". These gaps would cause the search algorithm to stop before reaching entries stored further along the probe sequence.
Solution: a special token (object) called AVAILABLE.
- Removed entries are replaced by the AVAILABLE token.
- A modified search algorithm checks whether each probe detects a valid entry or the token, and keeps probing past the token.

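Updates with Linear Probing

Algorithm put(k, v):
    h ← hashFunction(k)
    p ← 0
    available ← -1
    while (p < N) do
        e ← A[h]
        if (e = null) then
            if (available = -1) then
                A[h] ← new Entry(k, v)
            else
                A[available] ← new Entry(k, v)
            size ← size + 1
            return null
        if (e = AVAILABLE) then
            if (available = -1) then
                available ← h
        else if (e.key() = k) then
            temp ← e.value()
            A[h] ← new Entry(k, v)
            return temp
        h ← (h + 1) mod N
        p ← p + 1
    if (available ≠ -1) then
        A[available] ← new Entry(k, v)
        size ← size + 1
    return null

The AVAILABLE token is tested separately so that key() is never called on it, and a slot remembered in available is still used when all N cells have been probed without finding an empty cell.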

Updates with Linear Probing

Algorithm remove(k):
    h ← hashFunction(k)
    p ← 0
    while (p < N) do
        e ← A[h]
        if (e = null) then
            return null
        if (e ≠ AVAILABLE) and (e.key() = k) then
            temp ← e.value()
            A[h] ← AVAILABLE
            size ← size - 1
            return temp
        h ← (h + 1) mod N
        p ← p + 1
    return null
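
The put, get and remove operations with the AVAILABLE token can be combined into one runnable sketch. It assumes integer keys and the division method; the class name LinearProbingMap is illustrative, and unlike the pseudocode it raises an error when the table is completely full instead of returning null.

```python
AVAILABLE = object()     # unique token marking a removed entry

class LinearProbingMap:
    def __init__(self, capacity=13):
        self._cells = [None] * capacity   # None = never used
        self._size = 0

    def _probe(self, key):
        n = len(self._cells)
        start = key % n
        for j in range(n):                # h(k), h(k)+1, ... circularly
            yield (start + j) % n

    def get(self, key):
        for i in self._probe(key):
            e = self._cells[i]
            if e is None:                 # a true gap ends the search
                return None
            if e is not AVAILABLE and e[0] == key:
                return e[1]
        return None

    def put(self, key, value):
        available = -1                    # first reusable slot seen so far
        for i in self._probe(key):
            e = self._cells[i]
            if e is None:
                if available == -1:
                    available = i
                break
            if e is AVAILABLE:
                if available == -1:
                    available = i
            elif e[0] == key:             # existing key: replace the value
                self._cells[i] = (key, value)
                return e[1]
        if available == -1:
            raise RuntimeError("table is full; rehash first")
        self._cells[available] = (key, value)
        self._size += 1
        return None

    def remove(self, key):
        for i in self._probe(key):
            e = self._cells[i]
            if e is None:
                return None
            if e is not AVAILABLE and e[0] == key:
                self._cells[i] = AVAILABLE   # leave a token, not a gap
                self._size -= 1
                return e[1]
        return None


m = LinearProbingMap()
for k, v in [(18, "a"), (44, "b"), (31, "c")]:   # all hash to cell 5
    m.put(k, v)
m.remove(44)             # cell 6 now holds AVAILABLE
print(m.get(31))         # still found, because the search skips the token
```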

Double Hashing
Idea: use a secondary hash function d(k). Probing is not linear, but is based on the following equation, where i = h(k):
    (i + j·d(k)) mod N    for j = 0, 1, …, N - 1
Restrictions:
- The secondary hash function d(k) cannot have zero values.
- The table size N must be a prime to allow probing of all the cells.
Common choice of d(k):
    d(k) = q - (k mod q)    where q < N and q is a prime
The possible values for d(k) are 1, 2, …, q.
Example implementation: N = 13, h(k) = k mod 13, d(k) = 7 - (k mod 7)
Insert keys: 18, 41, 22, 44, 59, 32, 31, 73
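
The example can be traced with a small sketch. As before, only keys are stored, and the names Q, table and insert are illustrative.

```python
N, Q = 13, 7
table = [None] * N

def h(k):
    return k % N             # primary hash: first cell to try

def d(k):
    return Q - (k % Q)       # secondary hash: step size, always in 1..Q

def insert(key):
    for j in range(N):
        i = (h(key) + j * d(key)) % N
        if table[i] is None:
            table[i] = key
            return i
    raise RuntimeError("no free cell found")

for key in [18, 41, 22, 44, 59, 32, 31, 73]:
    insert(key)

print(table)
# [31, None, 41, None, None, 18, 32, 59, 73, 22, 44, None, None]
```

Key 31 probes cells 5, 9 and then 0 (step d(31) = 4), so colliding keys spread out across the table rather than piling up next to each other as in linear probing.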

Performance of Hashing
- In the worst case, searches, insertions and removals on a hash table take O(n) time. This occurs when all the keys inserted into the dictionary collide.
- The load factor α = n/N also affects the performance of a hash table.
- Assuming hash values are like random numbers, the expected number of probes for an (open addressing) insertion is:
    1 / (1 - α)
- The expected running time of all the dictionary ADT operations in a hash table is O(1).
- In practice, hashing is very fast provided the load factor is not close to 100%.
- Java's HashMap rehashes when the load factor exceeds 75% (its default).
Applications of hash tables:
- small databases
- compilers
- browser caches
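
To see how quickly the expected probe count grows, the formula 1 / (1 - α) can be evaluated for a few load factors. The function name expected_probes is illustrative.

```python
def expected_probes(load_factor):
    """Expected probes for an open-addressing insertion: 1 / (1 - alpha)."""
    if not 0 <= load_factor < 1:
        raise ValueError("load factor must be in [0, 1)")
    return 1 / (1 - load_factor)

for alpha in (0.5, 0.75, 0.9):
    print(alpha, round(expected_probes(alpha), 2))
```

At a load factor of 0.5 an insertion is expected to need 2 probes, at 0.75 it needs 4, and at 0.9 it needs 10, which is why tables are rehashed well before they fill up.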

Rehashing
Rehashing is the process of expanding the capacity of a hash table. It is a lot like growing an extendable array (e.g. Vector).
Rehashing is performed when the load factor moves above a certain threshold.
We rehash by:
- creating a new array (more than 2N in size),
- specifying a new compression map (e.g. updating the division method to work with the new size), and
- inserting each entry into the new array.
Given that insertion is O(1), rehashing is an O(N) operation: we have to check each index in the old array.
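
The three rehashing steps can be sketched for a separate-chaining table. Two assumptions are made here: the new size is taken to be the smallest prime above 2N (the slides only ask for more than 2N, and separately recommend a prime N for the division method), and the table is a plain list of chains of (key, value) pairs. All names are illustrative.

```python
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def next_prime(n):
    """Smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate

def rehash(cells):
    """Return a larger table with every entry re-inserted."""
    new_n = next_prime(2 * len(cells))          # step 1: new array, > 2N cells
    new_cells = [[] for _ in range(new_n)]
    for chain in cells:                         # visit every old index
        for key, value in chain:
            # step 2: the division method now uses the new size
            # step 3: insert each entry into the new array
            new_cells[key % new_n].append((key, value))
    return new_cells

old = [[] for _ in range(13)]
for key, value in [(18, "a"), (44, "b"), (31, "c")]:   # all collide in cell 5
    old[key % 13].append((key, value))

new = rehash(old)
print(len(new))                                 # 29
print([i for i, chain in enumerate(new) if chain])
```

The three keys that shared cell 5 in the table of size 13 end up in three different cells of the new table, since their hash values are recomputed with the new size.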