Hashing Techniques.

Slides:



Advertisements
Similar presentations
Hash Tables.
Advertisements

Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree.
CSCE 3400 Data Structures & Algorithm Analysis
Data Structures Using C++ 2E
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
© 2004 Goodrich, Tamassia Hash Tables1  
Dictionaries and Their Implementations Chapter 18 Data Structures and Problem Solving with C++: Walls and Mirrors, Carrano and Henry, © 2013.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Nov 12, 2009IAT 8001 Hash Table Bucket Sort. Nov 12, 2009IAT 8002  An array in which items are not stored consecutively - their place of storage is calculated.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashing CS 3358 Data Structures.
1 Hashing (Walls & Mirrors - end of Chapter 12). 2 I hate quotations. Tell me what you know. – Ralph Waldo Emerson.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
Hash Tables and Associative Containers CS-212 Dick Steflik.
CSC 2300 Data Structures & Algorithms February 27, 2007 Chapter 5. Hashing.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Hash Tables1 Part E Hash Tables  
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (continued) Advanced Implementation of Tables.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (excerpts) Advanced Implementation of Tables CS102 Sections 51 and 52 Marc Smith and.
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
ICS220 – Data Structures and Algorithms Lecture 10 Dr. Ken Cosh.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
1.  We’ll discuss the hash table ADT which supports only a subset of the operations allowed by binary search trees.  The implementation of hash tables.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
© 2006 Pearson Addison-Wesley. All rights reserved13 B-1 Chapter 13 (continued) Advanced Implementation of Tables.
Comp 335 File Structures Hashing.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Hashing Hashing is another method for sorting and searching data.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing as a Dictionary Implementation Chapter 19.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Lecture 12COMPSCI.220.FS.T Symbol Table and Hashing A ( symbol) table is a set of table entries, ( K,V) Each entry contains: –a unique key, K,
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Been-Chian Chien, Wei-Pang Yang, and Wen-Yang Lin 8-1 Chapter 8 Hashing Introduction to Data Structure CHAPTER 8 HASHING 8.1 Symbol Table Abstract Data.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
CHAPTER 8 SEARCHING CSEB324 DATA STRUCTURES & ALGORITHM.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Chapter 13 C Advanced Implementations of Tables – Hash Tables.
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Dictionaries and Their Implementations Chapter 18 Data Structures and Problem Solving with C++: Walls and Mirrors, Frank Carrano, © 2012.
Hash Tables ADT Data Dictionary, with two operations – Insert an item, – Search for (and retrieve) an item How should we implement a data dictionary? –
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Hashing CSE 2011 Winter July 2018.
Advanced Associative Structures
Hash Table.
Hash Table.
Dictionaries and Their Implementations
CSCE 3110 Data Structures & Algorithm Analysis
CS202 - Fundamental Structures of Computer Science II
Advanced Implementation of Tables
Advanced Implementation of Tables
EE 312 Software Design and Implementation I
Data Structures – Week #7
EE 312 Software Design and Implementation I
Chapter 13 Hashing © 2011 Pearson Addison-Wesley. All rights reserved.
Presentation transcript:

Hashing Techniques

Hashing Techniques Several ADTs for storing and retrieving data were discussed – Linear Lists, Binary Trees, BSTs, AVL Trees. An important operation Findkey() has a time complexity: O(n) in Lists, O(n) in Binary Trees, O(log n) in BSTs, O(log n) in AVL trees. Can Findkey() be implemented with a time complexity better than O(log n)? – With Hash Tables it is possible to implement Findkey() with O(1) time complexity.

Hashing Techniques A hash table for a given key type consists of Hash function Array (called table) of size N Hash function maps a key to a location in the table. Hash functions can be of two types: Perfect hash functions Imperfect hash functions

Hashing Techniques Perfect hash functions map a key to the ‘exact address’ in the hash table. The key can be found at the address without additional search. Perfect hash functions are hard to determine and compute. exact address  f (key) Imperfect hash functions map a key to a ‘home address’ in the hash table. The key may not be at the address and finding it may require additional search. Imperfect hash functions are easy to determine and compute. home address  H(key)

Hashing Techniques In the hash table the data elements are scattered randomly throughout the hash table – there is no first, root or last element. Hash tables are suitable for implementing sets but not linear or hierarchical structures.

Initially Hash Table is empty Hash Table: Example Table Size, N = 7. Hash Function: H(key) = key mod 7. 1 2 3 6 empty 4 5 Initially Hash Table is empty …

Hash Table: Example Insertions: keys 374, 1091, 911are inserted. H(374) = 374 mod 7 = 3 H(1091) = 1091 mod 7 = 6 H (911) = 911 mod 7 = 1 1 2 3 6 empty 1091 911 4 5 374

Hash Table: Example Retrievals: keys 374 and 740 are retrieved H(374) = 374 mod 7 = 3 Table address 3 contains the key. H(740) = 740 mod 7 = 5 Table address 5 is empty. 1 2 3 6 empty 1091 911 4 5 374

Hash Table: Example Insertions: key 227 is to be inserted. H(227) = 227 mod 7 = 3 Hash functions tells us store it in home address 3 but there is already a key stored in home address 3. This called a collision. Since table is not empty the collided key 227 must be stored somewhere in the table. Rehashing or Collision Resolution Strategies

Hash Functions Performance of a hash table depends on its hash function. For a data set with a certain type of keys hash function formulated should minimize the number of collisions. The following slides deal with techniques of formulating various types of hash functions.

(i) Digit Selection Keys may be student id. numbers (e.g. 427102345) or social security numbers (e.g. 981-101-0002) Hash function based on digit selection selects a subset of digits from the key e.g. last 3 digits may be selected. 345  H(427102345)

(i) Digit Selection Which digits to select? .. The ones that are random in the data. First 3 digits from the left, in student ids are the same for many students… not a good choice… last 3 digits are random. How many digits to select? Depends on the table size we want. Select 3 digits – table size 1000, 4 digits – table size 10000.

(i) Digit Selection How to select the digits? Last 3 digits from the left can be selected using mod 1000. The middle 3 digits can be selected through: (key / 1000) mod 1000

(ii) Division Hash function based on division is of the following form: H(key) = key mod m 0 <= H(key) < m. These functions lead to a number of collisions for certain values of m. E.g. if m = 25 all keys divisible by 5 map to positions 0, 5, 10, 15 and 20.

(ii) Division Ideally m must have no common factors with the keys … an easy way to ensure this is to chose m a prime number.

(iii) Multiplication A hash function based on multiplication first squares the key and then chooses a certain subset of the digits of the product. Suppose key = 54321. Hash function finds the square: (54321)2 = 2950771041 and then middle 3 digits are chosen i.e. 077.

(iv) Folding A hash function based on folding adds the digits of the key. Suppose key = d1d2d3d4d5. The key can be folded at single digits, as follows: H(key) = d1 + d2 + d3 + d4 + d5. Table size will be 45 since 0 <= H(key) <= 45.

(iv) Folding If a larger table size is needed, the key can be folded at 2 digits, as follows: H(key) = 0d1 + d2d3 + d4d5’ Table size will be 207 since 0 <= H(key) <= 207. Folding can be used with other operations e.g. H(key) = fold (key) mod m

(v) Character Valued Keys Keys can be characters or strings e.g. key = xyz. In such keys the characters are replaced by their binary ASCII codes and the binary number is converted to decimal integer and treated as any other integer key.

Collision Resolution Strategies Also known as rehashing techniques. Strategies determine where to store a collided key in the event of a collision. Strategies can be grouped into the following three categories: Open address methods External or separate chaining Coalesced chaining

Open Address Methods The strategy finds an empty position in the table after the collision and stores the collided key at the empty position. Linear rehashing, quadratic rehashing, random rehashing and double rehashing fall in this category.

Linear Rehashing Also called linear probing. Each table location inspected is referred to as a “probe” Linear rehashing starts a (circular) sequential search through the table until an empty location is found or all the table has been examined. Rehash address (i.e. address of the next location to be inspected) is computed by rehash address = (p + c) mod TableSize p = previous collision address; c = a constant (we take c = 1) If an empty location is found at the rehash address the collided key is stored at the location.

Linear Rehashing Consider the hash table in the state shown. Attempt to insert 227 leads to a collision with 374 … H(227) = 227 mod 7 = 3. According to linear rehashing the collided key 227 will be stored in the circularly next empty location, which is at table address 4. (See the next slide) 1 2 3 6 empty 1091 911 4 5 374

Linear Rehashing 1 2 3 6 4 5 1 2 3 6 4 5 1 2 3 6 4 5 After Insert 227 H(421) = 421 mod 7 = 1 Insert 624 H(624) = 624 mod 7 = 1 Probes Probes Probes 1 2 3 6 empty 1091 911 4 5 374 227 1 2 1 2 3 6 empty 421 1091 911 4 5 374 227 1 2 1 2 3 6 624 421 1091 911 empty 4 5 374 227 1 2 5 One probe is an access into the hash-table. Number of probes required to insert a key indicates the cost of inserting the key.

Linear Rehashing Suppose key 624 is to be retrieved. H(624) = 1. After rehashing and 5 probes the key is found at address 5. Suppose key 631 is to be retrieved. H(631) = 1. After rehashing and 7 probes the ‘empty’ location 0 is found – if the key was present it would have been at 0. Therefore, retrieval algorithm returns ‘not found’ 1 2 3 6 624 421 1091 911 empty 4 5 374 227

Linear Rehashing Suppose one of the keys 421, 374 or 227 are removed – e.g. 374 is removed – address 3 is marked ‘empty’. Now attempt to find key 624 will incorrectly result in a failure – search will stop at address 3. This problem can be solved by having three different status for a table location. 1 2 3 6 624 421 1091 911 empty 4 5 227

Linear Rehashing status = {empty, occupied, deleted} During search for a key the search does not stop at locations with status deleted or occupied. So the key 624 is found at address 5. 624 421 1091 911 empty 1 2 3 6 4 5 deleted 227

Linear Rehashing The performance of a hash table employing linear rehashing as collision resolution strategy is measured in terms of the average number of probes required to insert a set of keys.

External Chaining Also called separate chaining According to this strategy the collided keys are stored in a list associated with the home address. Hash table is an array of lists. Insertions and deletions are easy to implement.

External Chaining 1 2 3 6 4 5 1 2 3 6 4 5 Insertions Key = 374 374 mod 7 = 3 Key = 1091 1091 mod 7 = 6 Key = 911 911 mod 7 = 1 Collisions Key = 227 227 mod 7 = 3 Key = 421 421 mod 7 = 1 Key = 624 624 mod 7 = 1 1 2 3 6 4 5 null 911 374 1091 624 227 421 1 2 3 6 4 5 null 911 374 1091

External Chaining Advantages (compared to Linear Rehashing) Deletions are easily possible. Number of elements can be greater than the table size. Retrieval operations are efficient since hash function is computed only once during retrieval.

Coalesced Chaining Similar to external chaining – collided elements are stored in lists. Similar to linear rehashing – lists are stored within the hash table. Hash table is divided into two regions: an address region – stores normal keys and a cellar – stores collided keys. epla – empty position with largest address.

Coalesced Chaining 1 2 3 4 5 6 1 2 3 4 5 6 H(key) = key mod 5 1 2 3 6 empty epla 4 5 Address Region Cellar Insertions Key = 27 27 mod 5 = 2 Key = 29 29 mod 5 = 4 1 2 3 6 empty 27 epla 4 5 29

Coalesced Chaining 1 2 3 6 4 5 1 2 3 6 4 5 epla 27 32 empty 29 1 2 3 6 epla 27 32 empty 4 5 29 Insertion with Collision Key = 34 34 mod 5 = 4 Stored at epla & linked to link list at address 4 Insertion with Collision Key = 32 32 mod 5 = 2 Stored at epla & linked to link list at address 2 1 2 3 6 34 27 32 empty 4 5 epla 29

Coalesced Chaining 1 2 3 6 4 5 1 2 3 6 4 5 Insertions with Collision Key = 37 37 mod 5 = 2 Stored at epla & linked to link list at address 2 Insertions with Collision Key = 47 47 mod 5 = 2 Stored at epla & linked to link list at address 2 1 2 3 6 34 27 32 epla empty 4 5 37 29 1 2 3 6 34 27 32 47 epla 4 5 37 29