CE 221 Data Structures and Algorithms


CE 221 Data Structures and Algorithms
Chapter 5: Hashing
Text: Read Weiss, §5.1-5.5
Izmir University of Economics

Introduction
- The Search Tree ADT was discussed earlier; now we turn to the Hash Table ADT.
- Hashing is a technique for performing insertions, deletions and finds in constant average time.
- A hash table keeps no ordering information among its elements, so findMin, findMax and printAll (in sorted order) are not supported efficiently.

Tree Structures
Cost of find, insert, delete:
         worst case    average case
BST      N             log N
AVL      log N         log N

Goal
- Develop a structure that allows users to insert, delete and find records in constant average time.
- The structure will be a table (relatively small).
- The table is completely contained in memory.
- It is implemented by an array.
- It capitalizes on the ability to access any element of the array in constant time.

Hash Function
- Determines the position of a key in the array.
- Assume the table (array) size is N (TableSize).
- The hash function hash(x) maps any key x to an int between 0 and N−1.
- For example, assume that N = 15, that key x is a non-negative integer between 0 and MAX_INT, and that hash(x) = x % 15.

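Choosing the Hash Functions (1)
- If keys are integers: hash(key) = key % TableSize.
  - Bad case: if all keys are of the form 10*i and TableSize = 10, every key hashes to cell 0.
  - So it is a good idea to make TableSize a prime.
- If keys are strings, a first attempt: (sum of the ASCII codes of the characters) % TableSize.
  - With TableSize = 10,007 and keys at most 8 characters long, the sum is at most 127*8 = 1,016, so only cells 0 to 1,016 are ever used.
- A second attempt: the base 26+1 = 27 representation of the first 3 letters of the key, (key[0] + 27*key[1] + 729*key[2]) % TableSize.
  - There are 26^3 = 17,576 possible three-letter combinations, but only 2,851 of them occur in English, so a large part of the table is never used.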
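The effect of the table size on integer keys can be checked with a short sketch. The helper name `used_cells` is hypothetical, and 11 is just an assumed prime chosen for comparison with the size 10 from the slide.

```python
def used_cells(keys, table_size):
    """Return the sorted distinct cells the keys map to under key % table_size."""
    return sorted({key % table_size for key in keys})


# All keys are multiples of 10, as in the slide's bad case.
keys = [10 * i for i in range(20)]

cells_size_10 = used_cells(keys, 10)   # every key collides in a single cell
cells_size_11 = used_cells(keys, 11)   # a prime size spreads the same keys out

print(cells_size_10)
print(cells_size_11)
```

With size 10 only cell 0 is ever used; with the prime size 11 the same keys reach every cell of the table.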

Choosing the Hash Functions (2)
- Another hash function involves all characters of the key, treating their codes as the coefficients of a polynomial.
- Compute this polynomial by Horner's Rule.
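A minimal sketch of a polynomial string hash evaluated with Horner's Rule follows. The multiplier 37 is an assumption borrowed from the version in Weiss's textbook, and both function names are hypothetical. The second function evaluates the same polynomial term by term, only to show that the two agree.

```python
BASE = 37  # assumed multiplier; any small odd constant illustrates the idea


def hash_string(key: str, table_size: int) -> int:
    """Polynomial hash over all characters, evaluated with Horner's Rule."""
    hash_val = 0
    for ch in key:
        # Horner step: shift the running value by one power of BASE,
        # then add the code of the next character.
        hash_val = BASE * hash_val + ord(ch)
    return hash_val % table_size


def hash_string_direct(key: str, table_size: int) -> int:
    """Same polynomial, summed term by term (needs explicit powers)."""
    n = len(key)
    return sum(ord(key[n - i - 1]) * BASE ** i for i in range(n)) % table_size


print(hash_string("hashing", 10007) == hash_string_direct("hashing", 10007))
```

Horner's Rule needs one multiplication and one addition per character and never computes a power explicitly. In a language with fixed-width integers the running value may overflow, which this Python sketch does not have to deal with.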

Hash Function
Let hash(x) = x % 15. Then:
x:         25  129   35 2501   47   36
hash(x):   10    9    5   11    2    6
Storing the keys in the array is straightforward:
index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
key:      _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
Thus, delete and find can be done in O(1), and also insert, except…
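The table above can be reproduced with a few lines. This is only a sketch of direct placement with no collision handling; the names `hash_int` and `TABLE_SIZE` are chosen here for illustration.

```python
TABLE_SIZE = 15


def hash_int(x: int) -> int:
    """hash(x) = x % 15, as on the slide."""
    return x % TABLE_SIZE


keys = [25, 129, 35, 2501, 47, 36]
positions = [hash_int(x) for x in keys]

table = [None] * TABLE_SIZE
for x in keys:
    # These six keys all map to different cells, so plain assignment is enough.
    table[hash_int(x)] = x

print(positions)  # [10, 9, 5, 11, 2, 6]
print(table)
```

Plain assignment works only because no two of these keys share a cell. A key such as 65, for which `hash_int(65)` is also 5, would overwrite 35.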

Hash Function
What happens when you try to insert x = 65?
hash(65) = 65 % 15 = 5
index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
key:      _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
65(?) hashes to cell 5, which already holds 35.
If, when an element is inserted, it hashes to the same value as an already inserted element, this is called a collision.

Handling Collisions
- Separate Chaining
- Open Addressing
  - Linear Probing
  - Quadratic Probing
  - Double Hashing

Handling Collisions: Separate Chaining

Separate Chaining
- Keep a list of all elements that hash to the same value.
- New elements can be inserted at the front of the list.
- Example: the keys are the perfect squares x = i^2 and hash(x) = x % 10.
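A minimal sketch of separate chaining for the perfect-squares example. The class name `ChainedHashTable` is hypothetical, and Python lists stand in for linked lists here, so inserting at the front is not constant time as it would be with real list nodes.

```python
class ChainedHashTable:
    """Separate chaining with hash(x) = x % table_size."""

    def __init__(self, table_size=10):
        self.table_size = table_size
        self.lists = [[] for _ in range(table_size)]  # one chain per cell

    def _hash(self, x):
        return x % self.table_size

    def insert(self, x):
        chain = self.lists[self._hash(x)]
        if x not in chain:          # ignore duplicates
            chain.insert(0, x)      # new elements go to the front of the chain

    def find(self, x):
        return x in self.lists[self._hash(x)]

    def remove(self, x):
        chain = self.lists[self._hash(x)]
        if x in chain:
            chain.remove(x)


squares = ChainedHashTable(10)
for i in range(10):
    squares.insert(i * i)           # 0, 1, 4, 9, 16, 25, 36, 49, 64, 81

for cell, chain in enumerate(squares.lists):
    print(cell, chain)
```

Cells 2, 3, 7 and 8 stay empty because no perfect square ends in those digits, while cells 1, 4, 6 and 9 each hold a chain of two keys.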

Performance of Separate Chaining
- Load factor of a hash table: λ = N/M (number of elements in the table / TableSize; on this slide N counts elements and M is the table size).
- So the average length of a list is λ.
- Search time = time to evaluate the hash function + time to traverse the list.
- Unsuccessful search: λ nodes are examined on average.
- Successful search: 1 + ½·(N−1)/M ≈ 1 + λ/2 nodes (the node searched for + half the expected number of other nodes in its list).
- Observation: table size is not important, but load factor is. For separate chaining, make λ ≈ 1.

Separate Chaining: Disadvantages
- Parts of the array might never be used.
- As chains get longer, search time increases to O(N) in the worst case.
- Constructing new chain nodes is relatively expensive (still constant time, but the constant is high).
- Is there a way to use the "unused" space in the array instead of using chains to make more space?

Handling Collisions: Open Addressing

Handling Collisions: Linear Probing
- An alternative to resolving collisions with linked lists is to try alternative cells until an empty cell is found.
- The alternative cells are h_0(x), h_1(x), h_2(x), ... where h_i(x) = (hash(x) + f(i)) % TableSize with f(0) = 0.
- The function f is the collision resolution strategy.
- Because all the data go inside the table, a bigger table is needed than for separate chaining. For linear probing, keep λ < 0.5.

Linear Probing
- In linear probing f(i) = i (a popular choice).
- Let key x be stored in element hash(x) = t of the array.
index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
key:      _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
65(?) hashes to cell 5, which already holds 35.
- What do you do in case of a collision? If the hash table is not full, attempt to store the key in the next array elements, (t+1)%N, (t+2)%N, (t+3)%N, …, until you find an empty slot.

Linear Probing
Where do you store 65?
index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
key:      _    _   47    _    _   35   36   65    _  129   25 2501    _    _    _
Attempts: cell 5 (holds 35), cell 6 (holds 36), cell 7 (empty, so 65 is stored here).
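The three attempts for 65 can be traced with a small sketch of linear probing. The function name `linear_probe_insert` is hypothetical, and returning the list of probed cells is just a convenience for the demonstration.

```python
def linear_probe_insert(table, key):
    """Insert key with linear probing (f(i) = i) and return the cells tried."""
    n = len(table)
    t = key % n
    attempts = []
    for i in range(n):              # at most n probes, so a full table cannot loop forever
        pos = (t + i) % n
        attempts.append(pos)
        if table[pos] is None:
            table[pos] = key
            return attempts
    raise RuntimeError("hash table is full")


table = [None] * 15
for key in [25, 129, 35, 2501, 47, 36]:
    linear_probe_insert(table, key)

attempts_65 = linear_probe_insert(table, 65)
print(attempts_65)  # [5, 6, 7]
```

The key 65 ends up two cells away from its home cell, directly extending the block formed by 35 and 36.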

Linear Probing Performance (1)
- Even if the table is relatively empty, blocks of occupied cells start forming (primary clustering).
- Expected number of probes for insertions and unsuccessful searches: $\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$
- Expected number of probes for successful searches: $\frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$

Linear Probing Performance (2)
- Assumptions: clustering is not a problem, the table is large, and probes are independent of each other.
- Expected number of probes for unsuccessful searches (= expected number of probes until an empty cell is found): $\frac{1}{1-\lambda}$, since the fraction of empty cells is $1-\lambda$.
- Expected number of probes for successful searches (= expected number of probes when the element was inserted = expected number of probes for an unsuccessful search at that time).
- Average cost of an insertion, and hence of a successful search: $I(\lambda) = \frac{1}{\lambda}\int_0^{\lambda}\frac{1}{1-x}\,dx = \frac{1}{\lambda}\ln\frac{1}{1-\lambda}$ (earlier insertions are cheaper).
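To get a feeling for these quantities, the sketch below evaluates the standard textbook formulas for a few load factors: ½(1 + 1/(1−λ)²) and ½(1 + 1/(1−λ)) for linear probing, and 1/(1−λ) and (1/λ)·ln(1/(1−λ)) under the no-clustering assumptions. The function names are chosen here for illustration.

```python
import math


def linear_unsuccessful(lam):
    """Linear probing: expected probes for an insertion or unsuccessful search."""
    return 0.5 * (1 + 1 / (1 - lam) ** 2)


def linear_successful(lam):
    """Linear probing: expected probes for a successful search."""
    return 0.5 * (1 + 1 / (1 - lam))


def random_unsuccessful(lam):
    """No clustering: expected probes until an empty cell is found."""
    return 1 / (1 - lam)


def random_successful(lam):
    """No clustering: insertion cost averaged over load factors 0..lam."""
    return (1 / lam) * math.log(1 / (1 - lam))


for lam in (0.5, 0.75, 0.9):
    print(f"lambda={lam}: "
          f"linear {linear_unsuccessful(lam):.2f}/{linear_successful(lam):.2f}, "
          f"no clustering {random_unsuccessful(lam):.2f}/{random_successful(lam):.2f}")
```

At λ = 0.5 linear probing needs 2.5 probes on average for an insertion, against 2 without clustering. At λ = 0.9 the gap widens to about 50 probes against 10, which is why the load factor is kept low.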

Linear Probing
- Eliminates the need for separate data structures (chains) and the cost of constructing nodes.
- Leads to the problem of clustering: elements tend to cluster in dense intervals in the array.
- The search efficiency problem remains.
- Deletion becomes trickier.

Deletion Problem: Solution
- Standard deletion cannot be performed in a probing hash table, because the cell might have caused a collision to go past it. Emptying it would cut the probe path to the keys stored beyond it.
- “Lazy” deletion: each cell is in one of 3 possible states: active, empty, deleted.
- For Find or Delete, stop the search only when a cell in the EMPTY state is reached (not a DELETED one).

Deletion-Aware Algorithms
(H = current probe position, TS = TableSize)
Insert
  call Find
  if cell = active: already there
  if cell = deleted or empty: insert, cell = active
Find
  cell empty: NOT FOUND
  cell deleted: if key == key -> NOT FOUND, else H = (H + 1) mod TS
  cell active: if key == key -> FOUND, else H = (H + 1) mod TS
Delete
  call Find
  cell active: DELETE; cell = deleted
  cell deleted or empty: NOT FOUND
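The deletion-aware operations can be sketched as follows, assuming linear probing and integer keys. The class name `LazyProbingTable`, the helper `_find_pos` and the state constants are hypothetical names chosen for this illustration.

```python
EMPTY, ACTIVE, DELETED = "empty", "active", "deleted"


class LazyProbingTable:
    """Linear probing with lazy deletion: each cell is empty, active or deleted."""

    def __init__(self, table_size=15):
        self.size = table_size
        self.keys = [None] * table_size
        self.state = [EMPTY] * table_size

    def _find_pos(self, key):
        """Probe until an EMPTY cell or a cell holding key; None if neither exists."""
        h = key % self.size
        for _ in range(self.size):              # bounded, so a full table cannot loop forever
            if self.state[h] == EMPTY or self.keys[h] == key:
                return h
            h = (h + 1) % self.size             # skip cells (active or deleted) with other keys
        return None

    def find(self, key):
        pos = self._find_pos(key)
        return pos is not None and self.state[pos] == ACTIVE

    def insert(self, key):
        pos = self._find_pos(key)
        if pos is None:
            raise RuntimeError("no usable cell reachable")
        if self.state[pos] != ACTIVE:           # EMPTY, or a DELETED cell that held this key
            self.keys[pos] = key
            self.state[pos] = ACTIVE

    def delete(self, key):
        pos = self._find_pos(key)
        if pos is not None and self.state[pos] == ACTIVE:
            self.state[pos] = DELETED           # keep the key so probes still pass through


demo = LazyProbingTable(15)
for k in (35, 36, 65):                          # 65 collides at 5 and 6, lands in 7
    demo.insert(k)
demo.delete(36)
print(demo.find(65), demo.find(36))
```

After 36 is deleted, cell 6 is only marked as deleted, so the search for 65 still walks past it and succeeds. In this sketch a deleted cell is reused only by the key it previously held, so deleted cells accumulate until the table is rebuilt.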

Handling Collisions: Quadratic Probing

Quadratic Probing
- In quadratic probing f(i) = i^2 (a popular choice).
- Let key x be stored in element hash(x) = t of the array.
index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
key:      _    _   47    _    _   35   36    _    _  129   25 2501    _    _    _
65(?) hashes to cell 5, which already holds 35.
- What do you do in case of a collision? If the hash table is not full, attempt to store the key in array elements (t + 1^2) % N, (t + 2^2) % N, (t + 3^2) % N, …, until you find an empty slot.

Quadratic Probing
Where do you store 65? hash(65) = t = 5
index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
key:      _    _   47    _    _   35   36    _    _  129   25 2501    _    _   65
Attempts: t = 5 (holds 35), t+1 = 6 (holds 36), t+4 = 9 (holds 129), t+9 = 14 (empty, so 65 is stored here).
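The same insertion can be traced in code. This is a sketch with a hypothetical function name, `quadratic_probe_insert`; it gives up after N probes rather than looping, since quadratic probing is not guaranteed to reach a free cell.

```python
def quadratic_probe_insert(table, key):
    """Insert key with quadratic probing (f(i) = i*i).

    Returns the list of cells tried, or None if no free cell was found
    within len(table) probes.
    """
    n = len(table)
    t = key % n
    attempts = []
    for i in range(n):
        pos = (t + i * i) % n
        attempts.append(pos)
        if table[pos] is None:
            table[pos] = key
            return attempts
    return None


table = [None] * 15
for key in [25, 129, 35, 2501, 47, 36]:
    quadratic_probe_insert(table, key)

attempts_65 = quadratic_probe_insert(table, 65)
print(attempts_65)  # [5, 6, 9, 14]
```

Compared with linear probing, the probes jump away from the occupied block around cells 5 and 6 instead of extending it.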

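Quadratic Probing
Theorem: If quadratic probing is used and the table size is prime, then a new element can always be inserted if the table is at least half empty.
Proof:
- Let TableSize be an odd prime > 3. We prove that the first $\lceil \text{TableSize}/2 \rceil$ alternative locations (including the initial one) are all distinct.
- Assume two of these locations are the same: $h_i(x) = h_j(x)$ with $i \ne j$ and $0 \le i, j \le \lfloor \text{TableSize}/2 \rfloor$. Then
  $\text{hash}(x) + i^2 \equiv \text{hash}(x) + j^2 \pmod{\text{TableSize}}$
  $i^2 - j^2 \equiv 0 \pmod{\text{TableSize}}$
  $(i - j)(i + j) \equiv 0 \pmod{\text{TableSize}}$
- Since TableSize is prime, it must divide $i - j$ or $i + j$. Since i and j are distinct and both are at most $\lfloor \text{TableSize}/2 \rfloor$, we have $0 < |i - j| < \text{TableSize}$ and $0 < i + j < \text{TableSize}$, so this is not possible.
- It follows that the first $\lceil \text{TableSize}/2 \rceil$ alternative locations are all distinct, and an insertion must succeed if the table is at least half empty.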

Quadratic Probing
- Tends to distribute keys better than linear probing and alleviates the problem of primary clustering.
- There remains the problem of secondary clustering, in which elements that hash to the same position probe the same alternative cells.
- Runs the risk of an infinite loop on insertion unless precautions are taken. E.g., consider inserting the key 16 into a table of size 16, with positions 0, 1, 4 and 9 already occupied: every square is congruent to 0, 1, 4 or 9 modulo 16, so no other cell is ever probed.
- Therefore, the table size should be prime.
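Both the size-16 failure and the benefit of a prime table size can be checked numerically. The helper `probe_cells` is a hypothetical name, and 17 is an assumed prime picked only because it is close to 16.

```python
def probe_cells(start, table_size, num_probes):
    """Distinct cells reached by the first num_probes quadratic probes from start."""
    return sorted({(start + i * i) % table_size for i in range(num_probes)})


# Key 16 in a table of size 16 starts at cell 16 % 16 = 0.
cells_size_16 = probe_cells(16 % 16, 16, 1000)
print(cells_size_16)        # only four cells are ever visited

# Prime size 17: the first ceil(17 / 2) = 9 probes all land on different cells.
cells_size_17 = probe_cells(0, 17, 9)
print(len(cells_size_17))
```

With size 16, a thousand probes never leave the four occupied cells. With the prime size 17, the first 9 probes are guaranteed to be distinct, which is what the theorem on table sizes promises.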

Handling Collisions: Double Hashing

Double Hashing
- f(i) = i*hash2(x) is a popular choice.
- hash2(x) should never evaluate to zero.
- Now the increment is a function of the key.
- The slots visited by the probe sequence will vary even if the initial slot was the same.
- Avoids clustering.
- Theoretically interesting, but in practice slower than quadratic probing, because of the need to evaluate a second hash function.

Double Hashing
- A typical second hash function is hash2(x) = R − (x % R), where R is a prime number with R < N.

Double Hashing
Where do you store 99? hash(99) = t = 9
Let hash2(x) = 11 − (x % 11), so hash2(99) = d = 11. Note: R = 11, N = 15.
Attempt to store the key in array elements (t+d)%N, (t+2d)%N, (t+3d)%N, …
index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
key:     14   16   47    _    _   35   36   65    _  129   25 2501   99    _   29
Attempts: t = 9 (holds 129), (t+11)%15 = 5 (holds 35), (t+22)%15 = 1 (holds 16), (t+33)%15 = 12 (empty, so 99 is stored here).
Where would you store 127?

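Double Hashing
With hash2(x) = 11 − (x % 11): hash(127) = t = 7 and hash2(127) = d = 5.
index:    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
key:     14   16   47    _    _   35   36   65    _  129   25 2501   99    _   29
Attempts: t = 7 (holds 65), (t+5)%15 = 12 (holds 99), (t+10)%15 = 2 (holds 47), (t+15)%15 = 7 again, …
Infinite loop! Because d = 5 divides N = 15, only these three cells are ever probed, and all of them are occupied.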
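Both double-hashing examples (99 and 127) can be replayed with a short sketch. The function names are hypothetical, and capping the search at N probes is an assumption made here so that the failing case returns instead of looping.

```python
N, R = 15, 11   # table size and the prime used by the second hash function


def hash1(x):
    return x % N


def hash2(x):
    return R - (x % R)          # always in 1..R, never zero


def double_hash_probes(x, max_probes):
    """First max_probes cells of the probe sequence (t + i*d) % N."""
    t, d = hash1(x), hash2(x)
    return [(t + i * d) % N for i in range(max_probes)]


def double_hash_insert(table, x):
    """Insert x; return the cells tried, or None if N probes all hit occupied cells."""
    tried = []
    for pos in double_hash_probes(x, N):
        tried.append(pos)
        if table[pos] is None:
            table[pos] = x
            return tried
    return None


table = [14, 16, 47, None, None, 35, 36, 65, None, 129, 25, 2501, None, None, 29]

probes_99 = double_hash_insert(table, 99)
result_127 = double_hash_insert(table, 127)
cells_127 = sorted(set(double_hash_probes(127, N)))

print(probes_99)    # [9, 5, 1, 12]
print(result_127)   # None: the insertion fails
print(cells_127)    # [2, 7, 12]
```

The key 127 fails even though four cells are still free, because its step size 5 shares a factor with the table size 15. A prime table size rules this out, since every step size below N is then coprime to N.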

Rehashing
- If the table gets too full, the operations start taking too long.
- When the load factor exceeds a threshold, double the table size (smallest prime > 2 * old table size).
- Rehash each record in the old table into the new table.
- Expensive: O(N) work is done in copying.
- However, if the threshold is large (e.g., ½), then we need to rehash only once per O(N) insertions, so the cost is “amortized” constant-time.
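A minimal sketch of rehashing, assuming integer keys, linear probing and a table stored as a Python list with `None` for empty cells. The function names and the sample keys are illustrative only.

```python
def next_prime(n):
    """Smallest prime strictly greater than n (trial division, fine for a sketch)."""
    def is_prime(k):
        if k < 2:
            return False
        i = 2
        while i * i <= k:
            if k % i == 0:
                return False
            i += 1
        return True

    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def rehash(old_table):
    """Copy every key into a table of size: smallest prime > 2 * old size."""
    new_size = next_prime(2 * len(old_table))
    new_table = [None] * new_size
    for key in old_table:                       # O(N) scan of the old table
        if key is not None:
            pos = key % new_size                # positions change with the new size
            while new_table[pos] is not None:   # linear probing in the new table
                pos = (pos + 1) % new_size
            new_table[pos] = key
    return new_table


old_table = [6, 15, 23, 24, None, None, 13]     # size 7, load factor 5/7
new_table = rehash(old_table)
print(len(new_table))
print(new_table)
```

The five keys move from a table of size 7 to one of size 17, which brings the load factor from about 0.71 down to about 0.29. Keys cannot simply be copied to the same indices, because the hash value depends on the table size.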

Factors affecting efficiency
- Choice of hash function
- Collision resolution strategy
- Load factor
Hashing offers excellent performance for insertion and retrieval of data.

Comparison of Hash Table & BST
                    BST            HashTable
Average speed       O(log2 N)      O(1)
Find min/max        Yes            No
Items in a range    Yes            No
Sorted input        Very bad       No problems
Use HashTable if there is any suspicion of SORTED input & NO ordering information is required.

Homework Assignments
- Exercises 5.1, 5.2, 5.12, 5.14
- You are requested to study and solve the exercises. Note that these are for you to practice only. You are not to deliver the results to me.