Hashing Text Read Weiss, §5.1 – 5.5 Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision.

Slides:



Advertisements
Similar presentations
1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Advertisements

Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Hash Tables.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CSCE 3400 Data Structures & Algorithm Analysis
Lecture 11 oct 6 Goals: hashing hash functions chaining closed hashing application of hashing.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Log Files. O(n) Data Structure Exercises 16.1.
Hashing Techniques.
Hashing CS 3358 Data Structures.
1 Hashing (Walls & Mirrors - end of Chapter 12). 2 I hate quotations. Tell me what you know. – Ralph Waldo Emerson.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
CS 206 Introduction to Computer Science II 11 / 12 / 2008 Instructor: Michael Eckmann.
Lecture 11 oct 7 Goals: hashing hash functions chaining closed hashing application of hashing.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Dictionaries 4/17/2017 3:23 PM Hash Tables  
Hash Tables. Container of elements where each element has an associated key Each key is mapped to a value that determines the table cell where element.
L. Grewe. Computing hash function for a string Horner’s rule: (( … (a 0 x + a 1 ) x + a 2 ) x + … + a n-2 )x + a n-1 ) int hash( const string & key )
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
DATA STRUCTURES AND ALGORITHMS Lecture Notes 7 Prepared by İnanç TAHRALI.
CHAPTER 09 Compiled by: Dr. Mohammad Omar Alhawarat Sorting & Searching.
IT 60101: Lecture #151 Foundation of Computing Systems Lecture 15 Searching Algorithms.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
TECH Computer Science Dynamic Sets and Searching Analysis Technique  Amortized Analysis // average cost of each operation in the worst case Dynamic Sets.
1 CSE 326: Data Structures: Hash Tables Lecture 12: Monday, Feb 3, 2003.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
Hashing Hashing is another method for sorting and searching data.
Hash Tables - Motivation
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Hashing - 2 Designing Hash Tables Sections 5.3, 5.4, 5.4, 5.6.
WEEK 1 Hashing CE222 Dr. Senem Kumova Metin
Data Structures and Algorithms Hashing First Year M. B. Fayek CUFE 2010.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Hash Tables CSIT 402 Data Structures II. Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
Chapter 5: Hashing Collision Resolution: Open Addressing Extendible Hashing Mark Allen Weiss: Data Structures and Algorithm Analysis in Java Lydia Sinapova,
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Hashing 1 Hashing. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hash Tables © Rick Mercer.  Outline  Discuss what a hash method does  translates a string key into an integer  Discuss a few strategies for implementing.
Hashing Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision handling Separate chaining.
CS 206 Introduction to Computer Science II 04 / 08 / 2009 Instructor: Michael Eckmann.
TOPIC 5 ASSIGNMENT SORTING, HASH TABLES & LINKED LISTS Yerusha Nuh & Ivan Yu.
Prof. Amr Goneid, AUC1 CSCI 210 Data Structures and Algorithms Prof. Amr Goneid AUC Part 5. Dictionaries(2): Hash Tables.
CE 221 Data Structures and Algorithms
Hashing (part 2) CSE 2011 Winter March 2018.
CSCI 210 Data Structures and Algorithms
Hashing CSE 2011 Winter July 2018.
Handling Collisions Open Addressing SNSCT-CSE/16IT201-DS.
Hash tables Hash table: a list of some fixed size, that positions elements according to an algorithm called a hash function … hash function h(element)
Advanced Associative Structures
Data Structures and Algorithms
Data Structures and Algorithms
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Collision Handling Collisions occur when different elements are mapped to the same cell.
Hashing.
17CS1102 DATA STRUCTURES © 2018 KLEF – The contents of this presentation are an intellectual and copyrighted property of KL University. ALL RIGHTS RESERVED.
Lecture-Hashing.
CSE 373: Data Structures and Algorithms
Presentation transcript:

Hashing Text Read Weiss, §5.1 – 5.5 Goal Perform inserts, deletes, and finds in constant average time Topics Hash table, hash function, collisions Collision handling Separate chaining Open addressing: linear probing, quadratic probing, double hashing Rehashing Load factor

Tree Structures Binary Search Trees AVL Trees

Tree Structures insert / delete / find worst average Binary Search Trees Nlog N AVL Trees log N

Goal Develop a structure that will allow user to insert/delete/find records in constant average time –structure will be a table (relatively small) –table completely contained in memory –implemented by an array –capitalizes on ability to access any element of the array in constant time

Hash Function Determines position of key in the array. Assume table (array) size is N Function f(x) maps any key x to an int between 0 and N−1 For example, assume that N=15, that key x is a non-negative integer between 0 and MAX_INT, and hash function f(x) = x % 15. (Hash functions for strings aggregate the character values --- see Weiss §5.2.)

Hash Function Let f(x) = x % 15. Then, if x = f(x) = Storing the keys in the array is straightforward: _ _ 47 _ _ _ _ _ _ _ Thus, delete and find can be done in O(1), and also insert, except…

Hash Function What happens when you try to insert: x = 65 ? x = 65 f(x) = _ _ 47 _ _ _ _ _ _ _ 65(?) This is called a collision.

Handling Collisions Separate Chaining Open Addressing –Linear Probing –Quadratic Probing –Double Hashing

Handling Collisions Separate Chaining

Let each array element be the head of a chain        35 Where would you store: 29, 16, 14, 99, 127 ?

Separate Chaining Let each array element be the head of a chain: Where would you store: 29, 16, 14, 99, 127 ?             New keys go at the front of the relevant chain.

Separate Chaining: Disadvantages Parts of the array might never be used. As chains get longer, search time increases to O(n) in the worst case. Constructing new chain nodes is relatively expensive (still constant time, but the constant is high). Is there a way to use the “unused” space in the array instead of using chains to make more space?

Handling Collisions Linear Probing

Let key x be stored in element f(x)=t of the array (?) What do you do in case of a collision? If the hash table is not full, attempt to store key in the next array element (in this case (t+1)%N, (t+2)%N, (t+3)%N …) until you find an empty slot.

Linear Probing Where do you store 65 ?    attempts Where would you store: 29?

Linear Probing If the hash table is not full, attempt to store key in array elements (t+1)%N, (t+2)%N, …  attempts Where would you store: 16?

Linear Probing If the hash table is not full, attempt to store key in array elements (t+1)%N, (t+2)%N, …  Where would you store: 14?

Linear Probing If the hash table is not full, attempt to store key in array elements (t+1)%N, (t+2)%N, …   attempts Where would you store: 99?

Linear Probing If the hash table is not full, attempt to store key in array elements (t+1)%N, (t+2)%N, …     attempts Where would you store: 127 ?

Linear Probing If the hash table is not full, attempt to store key in array elements (t+1)%N, (t+2)%N, …  attempts

Linear Probing Eliminates need for separate data structures (chains), and the cost of constructing nodes. Leads to problem of clustering. Elements tend to cluster in dense intervals in the array. Search efficiency problem remains. Deletion becomes trickier….     

Deletion problem H=KEY MOD 10 Insert 47, 57, 68, 18, 67 Find 68 Find 10 Delete 47 Find 57

Deletion Problem -- SOLUTION “Lazy” deletion Each cell is in one of 3 possible states: –active –empty –deleted For Find or Delete –only stop search when EMPTY state detected (not DELETED)

Deletion-Aware Algorithms Insert –Cell empty or deletedinsert at H, cell = active –Cell activeH = (H + 1) mod TS Find –cell emptyNOT found –cell deletedH = (H + 1) mod TS –cell activeif key == key in cell -> FOUND else H = (H + 1) mod TS Delete –cell active; key != key in cellH = (H + 1) mod TS –cell active; key == key in cellDELETE; cell=deleted –cell deletedH = (H + 1) mod TS –cell emptyNOT found

Handling Collisions Quadratic Probing

Let key x be stored in element f(x)=t of the array (?) What do you do in case of a collision? If the hash table is not full, attempt to store key in array elements (t+1 2 )%N, (t+2 2 )%N, (t+3 2 )%N … until you find an empty slot.

Quadratic Probing Where do you store 65 ? f(65)=t=     t t+1 t+4 t+9 attempts Where would you store: 29?

Quadratic Probing If the hash table is not full, attempt to store key in array elements (t+1 2 )%N, (t+2 2 )%N …  t+1 t attempts Where would you store: 16?

Quadratic Probing If the hash table is not full, attempt to store key in array elements (t+1 2 )%N, (t+2 2 )%N …  t attempts Where would you store: 14?

Quadratic Probing If the hash table is not full, attempt to store key in array elements (t+1 2 )%N, (t+2 2 )%N …    t+1 t+4 t attempts Where would you store: 99?

Quadratic Probing If the hash table is not full, attempt to store key in array elements (t+1 2 )%N, (t+2 2 )%N …    t t+1 t+4 attempts Where would you store: 127 ?

Quadratic Probing If the hash table is not full, attempt to store key in array elements (t+1 2 )%N, (t+2 2 )%N … Where would you store: 127 ?  t attempts

Quadratic Probing Tends to distribute keys better than linear probing Alleviates problem of clustering Runs the risk of an infinite loop on insertion, unless precautions are taken. E.g., consider inserting the key 16 into a table of size 16, with positions 0, 1, 4 and 9 already occupied. Therefore, table size should be prime.

Handling Collisions Double Hashing

Use a hash function for the decrement value –Hash(key, i) = H 1 (key) – (H 2 (key) * i) Now the decrement is a function of the key –The slots visited by the hash function will vary even if the initial slot was the same –Avoids clustering Theoretically interesting, but in practice slower than quadratic probing, because of the need to evaluate a second hash function.

Double Hashing Let key x be stored in element f(x)=t of the array Array: (?) What do you do in case of a collision? Define a second hash function f 2 (x)=d. Attempt to store key in array elements (t+d)%N, (t+2d)%N, (t+3d)%N … until you find an empty slot.

Double Hashing Typical second hash function f 2 (x)=R − ( x % R ) where R is a prime number, R < N

Double Hashing Where do you store 65 ? f(65)=t=5 Let f 2 (x)= 11 − (x % 11) f 2 (65)=d=1 Note: R=11, N=15 Attempt to store key in array elements (t+d)%N, (t+2d)%N, (t+3d)%N … Array:    t t+1 t+2 attempts

Double Hashing If the hash table is not full, attempt to store key in array elements (t+d)%N, (t+2d)%N … Let f 2 (x)= 11 − (x % 11) f 2 (29)=d=4 Where would you store: 29? Array:  t attempt

Double Hashing If the hash table is not full, attempt to store key in array elements (t+d)%N, (t+2d)%N … Let f 2 (x)= 11 − (x % 11) f 2 (16)=d=6 Where would you store: 16? Array:  t attempt Where would you store: 14?

Double Hashing If the hash table is not full, attempt to store key in array elements (t+d)%N, (t+2d)%N … Let f 2 (x)= 11 − (x % 11) f 2 (14)=d=8 Array:    t+16 t+8 t attempts Where would you store: 99?

Double Hashing If the hash table is not full, attempt to store key in array elements (t+d)%N, (t+2d)%N … Let f 2 (x)= 11 − (x % 11) f 2 (99)=d=11 Array:     t+22 t+11 t t+33 attempts Where would you store: 127 ?

Double Hashing If the hash table is not full, attempt to store key in array elements (t+d)%N, (t+2d)%N … Let f 2 (x)= 11 − (x % 11) f 2 (127)=d=5 Array:    t+10 t t+5 attempts Infinite loop!

REHASHING When the load factor exceeds a threshold, double the table size (smallest prime > 2 * old table size). Rehash each record in the old table into the new table. Expensive: O(N) work done in copying. However, if the threshold is large (e.g., ½), then we need to rehash only once per O(N) insertions, so the cost is “amortized” constant-time.

Factors affecting efficiency –Choice of hash function –Collision resolution strategy –Load Factor Hashing offers excellent performance for insertion and retrieval of data.

Comparison of Hash Table & BST BSTHashTable Average SpeedO(log 2 N)O(1) Find Min/MaxYesNo Items in a rangeYesNo Sorted InputVery BadNo problems Use HashTable if there is any suspicion of SORTED input & NO ordering information is required.