Design & Analysis of Algorithm Hashing (Contd.)

Informatics Department, Parahyangan Catholic University

Analogy
Let's say that you have a drawer full of socks: 20 red socks (all identical) and 12 blue socks, and it is dark in the room. How many socks should you grab to ensure that you have at least one matching pair? How about 20 red socks, 12 blue socks, and 8 green socks? How about an unlimited number of red, blue, green, yellow, and purple socks?

Analogy
In a city of 2 million people, no one has more than 1.5 million hairs on his or her head. Can you show that at least two people in the city have exactly the same number of hairs on their heads?

Pigeonhole Principle
In mathematics, the pigeonhole principle states that if n pigeons are put into m pigeonholes with n > m, then at least one pigeonhole must contain more than one pigeon. (Wikipedia)
For hashing: n = the range of possible keys, m = the size of the hash table.
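To see the principle in hashing terms, here is a minimal sketch that inserts keys one at a time and reports the first pair that shares a slot. The hash function h(k) = k mod m and the name `find_collision` are hypothetical choices for illustration only; keys are assumed to be distinct.

```python
def find_collision(keys, m):
    """Return the first pair of distinct keys that land in the same slot
    under the hypothetical hash function h(k) = k mod m, or None."""
    seen = {}  # slot index -> first key stored there
    for k in keys:
        slot = k % m
        if slot in seen:
            return seen[slot], k
        seen[slot] = k
    return None

# 14 distinct keys (pigeons) but only 13 slots (pigeonholes):
print(find_collision(range(14), 13))  # (0, 13)
# 13 keys can still fit without a collision:
print(find_collision(range(13), 13))  # None
```

With more keys than slots, a collision is guaranteed no matter which hash function is used; k mod m just makes the colliding pair easy to predict.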

Collision
When the range of possible keys is larger than the table size, two distinct keys k1 and k2 may be mapped to the same index, i.e. h(k1) = h(k2). This condition is known as a collision, and a resolution strategy is required.
(Figure: table slots yellow, orange, red, green, blue, black, and white. "White with red stripe", "white with black stripe", and "white with blue stripe" are all hashed to "white". Where should they go?)

Collision Handling
3 strategies:
- Open addressing
  - Linear probing
  - Quadratic probing
  - Double hashing
- Separate chaining
- Coalesced hashing

Collision Handling: Open Addressing
In open addressing, a colliding entry is placed in a new slot in the same table.
(Figure: slots J (74), K (75), L (76), M (77), N (78). John Smith / 521-8976, Jane Smith / 521-1234, Kenny Baker / 418-4165, and Lisa Smith / 521-5030 occupy slots J, K, L, and M; where should Kayla Newman go?)

Collision Handling: Separate Chaining
In separate chaining, colliding entries are stored in a linked list in a separate area.
(Figure: slots J (74), K (75), L (76), M (77); J -> John Smith 521-8976 -> Jane Smith 521-1234; K -> Kenny Baker 418-4165 -> Kayla Newman 418-4222; L -> Lisa Smith 521-5030.)

Collision Handling: Coalesced Hashing
Coalesced hashing combines open addressing and separate chaining. It uses linked lists like separate chaining, but the list nodes are stored in empty slots of the same table.
(Figure: slots J (74) to N (78) holding John Smith / 521-8976, Jane Smith / 521-1234, Lisa Smith / 521-5030, Kenny Baker / 418-4165, and Kayla Newman / 418-4222, with colliding entries linked into chains.)

Performance Analysis
What are the advantages and disadvantages of the three collision handling methods? How do we compare them? What measurements can we use?
- Load factor: the average number of elements stored in a slot
- Probe number: how many slots we need to examine before finding an empty slot

Example :: Open Addressing
(Figure: slots J (74) to N (78) holding John Smith / 521-8976, Jane Smith / 521-1234, Kenny Baker / 418-4165, Lisa Smith / 521-5030, and Kayla Newman / 418-4222, in that order.)
Load factor = 1 (because every slot holds exactly 1 element)
Probe number for "Lisa Smith" = ?
Probe number for "Kayla Newman" = ?
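The example can be replayed with a small linear-probing sketch. Several details are assumptions, not taken from the slides: h'(name) is the letter of the first name (slot labels J (74) to N (78) are the ASCII codes of those letters), the insertion order is John, Jane, Kenny, Lisa, Kayla, and the "probe number" counts every slot examined, including the empty one finally used.

```python
# Hypothetical setup: 5 slots, slot 0 = J (74), ..., slot 4 = N (78),
# and h'(name) = ASCII code of the first letter minus 74.
BASE = ord("J")  # 74
M = 5

def insert_linear(table, name, phone):
    """Insert with linear probing; return how many slots were examined."""
    home = ord(name[0]) - BASE
    for i in range(M):
        idx = (home + i) % M  # h(k, i) = (h'(k) + i) mod m
        if table[idx] is None:
            table[idx] = (name, phone)
            return i + 1
    raise RuntimeError("table is full")

table = [None] * M
probes = {}
for name, phone in [("John Smith", "521-8976"), ("Jane Smith", "521-1234"),
                    ("Kenny Baker", "418-4165"), ("Lisa Smith", "521-5030"),
                    ("Kayla Newman", "418-4222")]:
    probes[name] = insert_linear(table, name, phone)

for idx, entry in enumerate(table):
    print(chr(BASE + idx), entry)
print(probes)
```

Under these assumptions the resulting layout matches the figure, and Kayla Newman has to walk past K, L, and M before reaching N.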

Example :: Separate Chaining
(Figure: slots J (74) to M (77); J -> John Smith 521-8976 -> Jane Smith 521-1234; K -> Kenny Baker 418-4165 -> Kayla Newman 418-4222; L -> Lisa Smith 521-5030.)
What if we insert a new element at the beginning of the list?
Load factor ≈ number of probes (because colliding elements are stored in the same linked list, whose average length is the load factor)
Probe number for "Jane Smith" = ?
Probe number for "Kayla Newman" = ?
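A minimal separate-chaining sketch makes the "insert at the beginning of the list" question concrete. Assumptions: the same hypothetical first-letter hash onto slots J (74) to M (77), an insertion order of John, Jane, Kenny, Kayla, Lisa, and Python lists standing in for linked lists. `build` and `search` are illustrative names.

```python
BASE = ord("J")  # hypothetical: slot 0 = J (74), ..., slot 3 = M (77)
M = 4
ENTRIES = [("John Smith", "521-8976"), ("Jane Smith", "521-1234"),
           ("Kenny Baker", "418-4165"), ("Kayla Newman", "418-4222"),
           ("Lisa Smith", "521-5030")]

def build(entries, at_head):
    """Separate chaining; Python lists stand in for the linked lists."""
    chains = [[] for _ in range(M)]
    for name, phone in entries:
        chain = chains[ord(name[0]) - BASE]  # h'(name): first letter (hypothetical)
        if at_head:
            chain.insert(0, (name, phone))   # O(1) on a real linked list
        else:
            chain.append((name, phone))      # O(1) only with a tail pointer
    return chains

def search(chains, name):
    """Return (phone or None, number of list nodes examined)."""
    chain = chains[ord(name[0]) - BASE]
    for examined, (key, phone) in enumerate(chain, start=1):
        if key == name:
            return phone, examined
    return None, len(chain)

tail = build(ENTRIES, at_head=False)
head = build(ENTRIES, at_head=True)
print("load factor:", len(ENTRIES) / M)  # 5 elements in 4 slots
print(search(tail, "Kayla Newman"), search(head, "Kayla Newman"))
```

Inserting at the head keeps insertion constant-time without a tail pointer and makes the most recently inserted keys the cheapest to find; keys inserted earlier move further down the list.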

Example :: Coalesced Hashing
(Figure: slots J (74) to N (78); John Smith / 521-8976, Jane Smith / 521-1234, Lisa Smith / 521-5030, and Kenny Baker / 418-4165 are stored in the table, with colliding entries linked into chains.)
Load factor = 1 (because every slot holds only 1 element)
What is the advantage of this method?
How many slots must be checked to insert Kenny Baker?
How many slots must be checked to search for Kenny Baker?
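Here is a rough coalesced-hashing sketch. Each slot stores [key, value, next], where next is the index of the following slot in the chain. Assumptions: the hypothetical first-letter hash onto slots J (74) to N (78); an insertion order of John, Jane, Lisa, Kenny, Kayla, inferred from the figures; and a free slot found by scanning forward from the home slot. The classic variant instead takes the highest-numbered empty slot, so treat this as one possible reading.

```python
BASE = ord("J")  # hypothetical: slot 0 = J (74), ..., slot 4 = N (78)
M = 5

def insert_coalesced(table, name, phone):
    """Coalesced hashing insert; each occupied slot is [key, value, next]."""
    home = ord(name[0]) - BASE            # h'(name): first letter (hypothetical)
    if table[home] is None:
        table[home] = [name, phone, None]
        return home
    tail = home                           # walk to the end of the chain at home
    while table[tail][2] is not None:
        tail = table[tail][2]
    idx = home                            # assumption: scan forward for a free slot
    for _ in range(M - 1):
        idx = (idx + 1) % M
        if table[idx] is None:
            table[idx] = [name, phone, None]
            table[tail][2] = idx          # link the new slot onto the chain
            return idx
    raise RuntimeError("table is full")

def search_coalesced(table, name):
    """Follow the chain from the home slot; return (phone or None, slots examined)."""
    idx, examined = ord(name[0]) - BASE, 0
    while idx is not None and table[idx] is not None:
        examined += 1
        key, phone, nxt = table[idx]
        if key == name:
            return phone, examined
        idx = nxt
    return None, examined

table = [None] * M
for name, phone in [("John Smith", "521-8976"), ("Jane Smith", "521-1234"),
                    ("Lisa Smith", "521-5030"), ("Kenny Baker", "418-4165"),
                    ("Kayla Newman", "418-4222")]:
    insert_coalesced(table, name, phone)

print([slot[0] for slot in table])
print(search_coalesced(table, "Kenny Baker"))  # ('418-4165', 2): K, then M
```

In this sketch, inserting Kenny Baker has to look at K, L, and M to find room. A later search follows the link K -> M and skips Lisa Smith in L, which is the advantage over plain linear probing.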

Open Addressing
In open addressing, a colliding entry is placed in a new slot in the same table, chosen by a hash function h(k, i), where i is the probe number.
There are generally 3 techniques for deciding which slot to fill next: linear probing, quadratic probing, and double hashing.
The sequence h(k,0), h(k,1), h(k,2), … is called the probe sequence.

Open Addressing: Linear Probing
Define $h(k, i) = (h'(k) + i) \bmod m$, where h'(k) is the initial hash function and i is the probe number for key k.
Example: m = 13 (the values below use h'(k) = k mod 13)
k = 5: h'(k) = 5
k = 18: h'(k) = 5 (collision), h(k,1) = (5+1) mod 13 = 6
k = 19: h'(k) = 6 (collision), h(k,1) = (6+1) mod 13 = 7
k = 31: h'(k) = 5 (collision), h(k,1) = (5+1) mod 13 = 6 (collision), h(k,2) = (5+2) mod 13 = 7 (collision), h(k,3) = (5+3) mod 13 = 8
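The worked example can be reproduced with a short sketch. It assumes h'(k) = k mod m, which is consistent with every value in the example, and records the slots each key probes before it is placed.

```python
def linear_probe(k, i, m):
    """h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m assumed."""
    return (k % m + i) % m

def insert(table, k):
    """Insert k by linear probing; return the list of slots probed."""
    m = len(table)
    probed = []
    for i in range(m):
        idx = linear_probe(k, i, m)
        probed.append(idx)
        if table[idx] is None:
            table[idx] = k
            return probed
    raise RuntimeError("table is full")

table = [None] * 13
probes = {k: insert(table, k) for k in (5, 18, 19, 31)}
for k, slots in probes.items():
    print(k, slots)
```

Key 31 probes slots 5, 6, 7, and 8, just as in the example.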

Open Addressing: Linear Probing
Linear probing suffers from primary clustering. Clusters arise because an empty slot preceded by i non-empty slots gets filled next with probability (i+1)/m (assuming h'(k) is uniformly distributed). There are only m distinct probe sequences, since the whole sequence is determined by h'(k).
idx   data
1     A
2     B
3     C
4     D
5     E
6     F
7           <- every k with h'(k) between 1 and 6 will be placed in this slot (as will every k with h'(k) = 7)
8
9
…
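The (i+1)/m claim can be checked by counting, for each empty slot, how many home slots h'(k) lead to it under linear probing. This is a sketch assuming h'(k) is uniform over 0..m-1. The table size m = 13 is a hypothetical choice, since the slide's table is truncated.

```python
def fill_probability(occupied, m):
    """Map each empty slot to the probability that the next inserted key lands
    there under linear probing, assuming h'(k) is uniform over 0..m-1."""
    if len(occupied) >= m:
        raise ValueError("need at least one empty slot")
    counts = {}
    for home in range(m):
        idx = home
        while idx in occupied:        # walk past the run of full slots
            idx = (idx + 1) % m
        counts[idx] = counts.get(idx, 0) + 1
    return {slot: c / m for slot, c in counts.items()}

m = 13                                 # hypothetical table size
probs = fill_probability(set(range(1, 7)), m)   # slots 1..6 hold A..F
print(probs[7], 7 / m)                 # slot 7 collects homes 1..7: (6+1)/m
print(probs[8], 1 / m)                 # an isolated empty slot: 1/m
```

Slot 7 is seven times more likely to be filled next than slot 8. Once it is filled, the run grows to seven full slots, so the cluster tends to keep growing.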

Open Addressing: Quadratic Probing
Define $h(k, i) = (h'(k) + c_1 i + c_2 i^2) \bmod m$, where h'(k) is the initial hash function, i is the probe number for key k, and c1 and c2 are constants (c2 ≠ 0).

Open Addressing: Quadratic Probing
h(k,1) is not adjacent to h'(k), which avoids the primary clustering problem.
Example: m = 13, c1 = 2, c2 = 3
k = 5: h'(k) = 5
k = 18: h'(k) = 5 (collision), h(k,1) = (5 + 2·1 + 3·1²) mod 13 = 10
k = 19: h'(k) = 6
k = 31: h'(k) = 5 (collision), h(k,1) = (5 + 2·1 + 3·1²) mod 13 = 10 (collision), h(k,2) = (5 + 2·2 + 3·2²) mod 13 = 8
k = 32: h'(k) = 6 (collision), h(k,1) = (6 + 2·1 + 3·1²) mod 13 = 11
However, keys with the same h'(k) follow the same probe sequence. This leads to a milder form of clustering, called secondary clustering. (Again, there are only m distinct probe sequences.)
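A sketch of the same example with quadratic probing follows, again assuming h'(k) = k mod m. The search is capped at m attempts because a quadratic probe sequence does not necessarily visit every slot, so it can fail even when the table still has room.

```python
def quad_probe(k, i, m, c1, c2):
    """h(k, i) = (h'(k) + c1*i + c2*i^2) mod m, with h'(k) = k mod m assumed."""
    return (k % m + c1 * i + c2 * i * i) % m

def insert(table, k, c1, c2):
    """Insert k by quadratic probing; return the list of slots probed."""
    m = len(table)
    probed = []
    for i in range(m):
        idx = quad_probe(k, i, m, c1, c2)
        probed.append(idx)
        if table[idx] is None:
            table[idx] = k
            return probed
    raise RuntimeError("no free slot found on k's probe sequence")

table = [None] * 13
probes = {k: insert(table, k, 2, 3) for k in (5, 18, 19, 31, 32)}
for k, slots in probes.items():
    print(k, slots)
```

Keys 18 and 31 share h'(k) = 5 and therefore retrace the same slots 5 and 10, which is the secondary clustering described above.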

Open Addressing: Quadratic Probing
Observe these 2 cases, h'(k) = 5 and h'(k) = 6 (m = 13, c1 = 2, c2 = 3):
probe # i    0  1  2  3  4  5  6  7  8  9 10 11 12
h'(k) = 5    5 10  8 12  9 12  8 10  5  6  0  0  6
h'(k) = 6    6 11  9  0 10  0  9 11  6  7  1  1  7
Note that only slots 0, 5, 6, 8, 9, 10, and 12 can be filled by keys with h'(k) = 5, and only slots 0, 1, 6, 7, 9, 10, and 11 can be filled by keys with h'(k) = 6.
This suggests that some slots might be filled with higher probability than others.
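The reachable slots can be computed directly. For integer c1 and c2, h(k, i) mod m repeats with period m in i, so trying i = 0..m-1 is enough to list every slot a given home position can ever reach. `reachable` is an illustrative name.

```python
def reachable(home, m, c1, c2):
    """Slots on the probe sequence (home + c1*i + c2*i^2) mod m, i = 0..m-1."""
    return sorted({(home + c1 * i + c2 * i * i) % m for i in range(m)})

print(reachable(5, 13, 2, 3))  # [0, 5, 6, 8, 9, 10, 12]
print(reachable(6, 13, 2, 3))  # [0, 1, 6, 7, 9, 10, 11]
```

Each home position reaches only 7 of the 13 slots, so an insertion can fail even though 6 slots are still empty.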

Open Addressing: Quadratic Probing
The choice of m, c1, and c2 is important.
For m = 2^n, a good choice is c1 = c2 = 0.5; the probe sequence then visits every slot.
For prime m > 2, most choices of c1 and c2 make h(k, i) distinct for i in [0, (m-1)/2].
Example: m = 2^4 = 16, c1 = c2 = 0.5, h'(k) = 0
probe # i   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
h(k,i)      0  1  3  6 10 15  5 12  4 13  7  2 14 11  9  8
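With c1 = c2 = 0.5, the offset 0.5·i + 0.5·i² equals i(i+1)/2, a triangular number. That lets the sketch below stay in integer arithmetic. It also checks one choice for a prime m (c1 = 0, c2 = 1, m = 13) against the "distinct for i in [0, (m-1)/2]" statement. The function names are illustrative.

```python
def triangular_probes(home, m):
    """Quadratic probing with c1 = c2 = 0.5:
    h(k, i) = (h'(k) + 0.5*i + 0.5*i^2) mod m = (h'(k) + i*(i+1)/2) mod m."""
    return [(home + i * (i + 1) // 2) % m for i in range(m)]

def first_half_distinct(m, c1, c2, home=0):
    """True if (home + c1*i + c2*i^2) mod m is distinct for i = 0..(m-1)//2."""
    slots = [(home + c1 * i + c2 * i * i) % m for i in range((m - 1) // 2 + 1)]
    return len(set(slots)) == len(slots)

seq = triangular_probes(0, 16)
print(seq)                              # [0, 1, 3, 6, 10, 15, 5, 12, 4, 13, 7, 2, 14, 11, 9, 8]
print(sorted(seq) == list(range(16)))   # True: every slot is visited once
print(first_half_distinct(13, 0, 1))    # True for m = 13, c1 = 0, c2 = 1
```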

Open Addressing: Double Hashing
Define $h(k, i) = (h_1(k) + i \cdot h_2(k)) \bmod m$, where h1(k) is the initial hash function, i is the probe number for key k, and h2(k) is a hash function different from h1(k).
Two different keys a and b that initially hash to the same location (h1(a) = h1(b)) will have different probe sequences as long as h2(a) ≠ h2(b).

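Open Addressing: Double Hashing
h2(k) must be relatively prime to m, so that the probe sequence can reach every slot. Examples:
- Let m be a power of 2 and design h2(k) to always return an odd number.
- Let m be prime and design h2(k) to always return a positive integer less than m.
There are Θ(m²) distinct probe sequences, since each pair (h1(k), h2(k)) yields its own sequence.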
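The two design rules can be sketched as follows. The specific secondary hashes are hypothetical examples chosen to satisfy the rules: 1 + (k mod (m-1)) for prime m, and 2·(k mod (m/2)) + 1 for a power-of-2 m. The sketch also assumes h1(k) = k mod m.

```python
from math import gcd

def double_hash_sequence(k, m, h2):
    """Full probe sequence h(k, i) = (h1(k) + i*h2(k)) mod m for i = 0..m-1,
    with h1(k) = k mod m assumed."""
    step = h2(k, m)
    if gcd(step, m) != 1:
        raise ValueError("h2(k) must be relatively prime to m")
    return [(k % m + i * step) % m for i in range(m)]

def h2_prime(k, m):
    """Hypothetical h2 for prime m: always a positive integer less than m."""
    return 1 + k % (m - 1)

def h2_power_of_two(k, m):
    """Hypothetical h2 for m a power of 2: always an odd number less than m."""
    return 2 * (k % (m // 2)) + 1

# 5, 18 and 31 all start at slot 5 (m = 13) but then diverge.
for k in (5, 18, 31):
    print(k, double_hash_sequence(k, 13, h2_prime)[:4])
```

Because the step size is relatively prime to m, each full sequence is a permutation of all m slots, unlike the quadratic example with m = 13, c1 = 2, c2 = 3.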

Basic Hash Table Operations
- INSERT(key, value): we have already discussed this in detail
- value SEARCH(key): similar to INSERT
- DELETE(key): do not delete the value; mark it "deleted" instead (why?)
When should a search stop?
In separate chaining, all three operations are simply inserting, searching, and deleting in the appropriate linked list.
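One way to see why DELETE should only mark a slot: in the sketch below, a truly emptied slot cuts the probe sequence of a key that was placed after a collision. Assumptions: linear probing with h'(k) = k mod 13, and a sentinel object named `DELETED` (a hypothetical name) as the "deleted" mark.

```python
M = 13
DELETED = object()  # tombstone marker for "deleted" slots

def probe(k, i):
    # Linear probing with h'(k) = k mod M (assumed for this sketch).
    return (k % M + i) % M

def insert(table, k):
    for i in range(M):
        idx = probe(k, i)
        if table[idx] is None or table[idx] is DELETED:
            table[idx] = k
            return idx
    raise RuntimeError("table is full")

def search(table, k):
    for i in range(M):
        idx = probe(k, i)
        if table[idx] is None:
            return False            # an empty slot ends the search
        if table[idx] == k:         # a tombstone never equals a key
            return True
    return False

naive, marked = [None] * M, [None] * M
for t in (naive, marked):
    insert(t, 5)    # lands in slot 5
    insert(t, 18)   # collides at slot 5, lands in slot 6

naive[5] = None        # "really" delete key 5
marked[5] = DELETED    # mark key 5 as deleted instead

print(search(naive, 18))   # False: the search stops at the empty slot 5
print(search(marked, 18))  # True: the search skips the tombstone
```

A marked slot is still reusable by INSERT, so deleted space is not wasted. It only stops acting as an "end of probe sequence" signal.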

Insertion in open addressing
INSERT(key, value)
// returns true if key is successfully inserted
// returns false otherwise
i = 0
while(i < m)
    idx = HASHFUNCTION(key, i)
    if(table[idx] is empty or marked as deleted)
        table[idx] = (key, value)
        return true
    else
        i = i+1
return false

Searching in open addressing
SEARCH(key)
// returns the associated value if key is found
// returns null otherwise
i = 0
while(i < m)
    idx = HASHFUNCTION(key, i)
    if(table[idx] is empty)
        return null    // reached an empty slot, so key must not be in the hash table
    else if(table[idx] not marked as deleted AND table[idx].key == key)
        return table[idx].value    // key found
    else
        i = i+1    // try the next slot
return null    // tried all m possible slots and key not found

Deletion in open addressing
DELETE(key)
// returns the associated value if key is found and successfully deleted
// returns null otherwise
i = 0
while(i < m)
    idx = HASHFUNCTION(key, i)
    if(table[idx] is empty)
        return null    // reached an empty slot, so key must not be in the hash table
    else if(table[idx] not marked as deleted AND table[idx].key == key)
        temp = table[idx].value
        mark table[idx] as deleted
        return temp
    else
        i = i+1    // try the next slot
return null    // tried all m possible slots and key not found
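The INSERT, SEARCH, and DELETE pseudocode above can be rendered as a runnable Python sketch. The class name `OpenAddressingTable`, the `_DELETED` sentinel, and passing HASHFUNCTION in as a callable `hash_function(key, i)` are assumptions made for illustration, so any of the probing schemes can be plugged in.

```python
class OpenAddressingTable:
    """Python sketch of the INSERT / SEARCH / DELETE pseudocode."""
    _DELETED = object()  # "marked as deleted" sentinel (tombstone)

    def __init__(self, m, hash_function):
        self.m = m
        self.slots = [None] * m              # None = empty slot
        self.hash_function = hash_function   # hash_function(key, i) -> index

    def insert(self, key, value):
        for i in range(self.m):
            idx = self.hash_function(key, i)
            slot = self.slots[idx]
            if slot is None or slot is self._DELETED:
                self.slots[idx] = (key, value)
                return True
        return False                         # all m slots tried

    def search(self, key):
        for i in range(self.m):
            idx = self.hash_function(key, i)
            slot = self.slots[idx]
            if slot is None:
                return None                  # empty slot: key is not present
            if slot is not self._DELETED and slot[0] == key:
                return slot[1]               # key found
        return None

    def delete(self, key):
        for i in range(self.m):
            idx = self.hash_function(key, i)
            slot = self.slots[idx]
            if slot is None:
                return None
            if slot is not self._DELETED and slot[0] == key:
                self.slots[idx] = self._DELETED   # mark, do not empty
                return slot[1]
        return None

# Usage with linear probing, h'(k) = k mod 13 (an assumption):
table = OpenAddressingTable(13, lambda k, i: (k % 13 + i) % 13)
for key, value in [(5, "a"), (18, "b"), (31, "c")]:
    table.insert(key, value)
print(table.delete(18))  # b
print(table.search(31))  # c  (slot 5, tombstone in slot 6, found in slot 7)
print(table.search(18))  # None
```

Like the pseudocode, `insert` does not check whether the key is already present, so inserting an existing key again can store a duplicate. A fuller version would search first, or keep probing past tombstones until it reaches an empty slot.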