Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.

Hash tables Suppose we decide that the O(log N) average cost of operations on a binary search tree is too slow. Hash tables provide a way to insert, delete, and find in O(1) average time. Why did we even bother with binary search trees?

No free lunch The constant time comes at a cost: hash table elements have no order, so
–visiting items according to an ordering property
–finding the minimum or maximum element
and similar operations are all expensive.

Foundations of hashing Much like binary search trees, we choose some field of a record to serve as a key. A function maps the key to an index: HashFunction(key) → index. The index is used in an array, and the array entries are sometimes referred to as hash buckets.

Hash functions A naïve hash function for a string might build a polynomial from the string's character encodings. Example: treating each character code as a coefficient with radix 128, "Cool" becomes 'C'·128³ + 'o'·128² + 'o'·128 + 'l' = 67·128³ + 111·128² + 111·128 + 108 = 142,342,124.

Hash functions We can index an array by the hash function's value: the entry at that index stores "Cool" plus any other associated information.

Uh-oh For a 4 character string we would need over 268,000,000 entries in the array (128⁴ = 268,435,456). We can reduce the size to something manageable by using the modulo operator: 142,342,124 % 10,000 = 2,124.
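As a sketch, the naïve scheme above can be written out in Python (the function name and the table size 10,000 are illustrative choices, not from the course code):

```python
def naive_hash(key, table_size, radix=128):
    """Treat the string as a base-128 number, then reduce modulo the table size."""
    value = 0
    for i, ch in enumerate(key):
        # each character code is a polynomial coefficient
        value += ord(ch) * radix ** (len(key) - 1 - i)
    return value % table_size

print(naive_hash("Cool", 10_000))  # -> 2124
```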

More about the hash function If we consider the hash function to be a polynomial in a variable X, e.g. for a string A of length n: hash(A) = A[0]·Xⁿ⁻¹ + A[1]·Xⁿ⁻² + … + A[n−1], we can reduce the number of multiplications by computing the hash function incrementally (Horner's rule).

Overflowing the hash function Consider X = 128 as in our previous example for hashing strings, and assume that we are using 64 bit unsigned integers. Since 128 = 2⁷, the leading term for a 10 character string can be as large as 2⁷·(2⁷)⁹ = 2⁷⁰ > 2⁶⁴, so any 10 character string (and most 9 character strings) would overflow a 64 bit unsigned integer.

Resolving hash overflow
hash_value = 0
for i = 0 to length(A) - 1
    hash_value = hash_value * X + encoding(A[i])
Avoids computing X^i explicitly, but the sum can still overflow…

Resolving hash overflow
1. Apply modulo after each operation:
hash_value = 0
for i = 0 to length(A) - 1
    hash_value = (hash_value * X + encoding(A[i])) % TableSize   /* modulo is expensive */
2. Allow overflow. We need to be careful, though, as long polynomials will shift the first characters of the key out of range.
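Option 1 can be sketched in Python; applying the modulo at every step gives the same result as reducing the full polynomial at the end, because modular arithmetic commutes with addition and multiplication (the function name is mine):

```python
def horner_hash(key, table_size, radix=128):
    """Incremental (Horner's rule) hash with modulo applied at each step,
    so the intermediate value never exceeds radix * table_size."""
    h = 0
    for ch in key:
        h = (h * radix + ord(ch)) % table_size  # reduce early to avoid overflow
    return h

# Same answer as reducing the full 142,342,124 polynomial at the end:
print(horner_hash("Cool", 10_000))  # -> 2124
```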

Avoiding overflow

Allowing overflow

Going to extremes… If we simply sum the character encodings, we have effectively set X in our polynomial to the value 1. What are the implications of this?

Collisions Our hash function no longer maps distinct keys to distinct indices. If we choose our hash function carefully, collisions will not happen too often. Nonetheless, we still need to handle them, and we will investigate several ways to do so.

Linear probing Simple idea: when a collision occurs, scan forward (wrapping around the table) for the next empty hash bucket and use that one.

Linear probing analysis The load factor is defined as λ = N / TableSize, where N is the number of occupied buckets. Let us assume that 1. each insertion/access of the hash table is independent of the others (a very naïve assumption) 2. the hash table is large (a reasonable assumption)

Naïve analysis Assuming independence of probes, the average number of buckets examined in a linear probing insertion is 1/(1−λ). Proof sketch: Pr(empty bucket) = 1−λ. If an event occurs with probability p on each independent trial, the expected number of trials until it first occurs is 1/p. So we expect to examine 1/(1−λ) buckets before finding an empty one.
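Under the same independence assumption we can check the 1/(1−λ) figure empirically. This simulation (my own, not from the slides) fills half of a table and probes independently chosen random buckets until an empty one is found:

```python
import random

random.seed(310)  # arbitrary seed, for reproducibility

M = 10_007                                   # table size (a prime, as later slides prefer)
occupied = [True] * (M // 2) + [False] * (M - M // 2)
random.shuffle(occupied)                     # load factor λ ≈ 0.5

trials = 20_000
total = 0
for _ in range(trials):
    probes = 1
    while occupied[random.randrange(M)]:     # independent probes: no clustering effect
        probes += 1
    total += probes

average = total / trials
print(average)  # should land near 1 / (1 - 0.5) = 2
```

Real linear probing does worse than this, because consecutive probes are not independent; that is the primary clustering discussed next.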

Primary clustering Hash insertions and finds are not actually independent: collisions create runs of occupied buckets, and any key hashing anywhere into a run extends it. This results in primary clustering.

Linear probing complexity Given a load factor of λ, the number of cells examined in a linear probing insertion is approximately ½(1 + 1/(1−λ)²). We will accept this without proof.

Analysis of find Unsuccessful find –Same as the cost of an insertion. Successful find –Same as the cost of finding the item at the time it was inserted. –If the bin was unused, only 1 probe is needed. –As more collisions occur, the number of probes increases.

Analysis of successful find when primary clustering is present Need to average the insertion cost over all load factors up to the current one: (1/λ) ∫₀^λ ½(1 + 1/(1−x)²) dx = ½(1 + 1/(1−λ)).
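Written out (with I for the insertion cost and S for the successful-find cost, labels of my own), the averaging step is:

```latex
I(\lambda) \approx \frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)
\qquad
S(\lambda) = \frac{1}{\lambda}\int_0^{\lambda} I(x)\,dx
           = \frac{1}{\lambda}\left[\frac{x}{2} + \frac{1}{2(1-x)}\right]_0^{\lambda}
           = \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)
```

At λ = 0.5 this gives 1.5 probes for a successful find versus 2.5 for an insertion.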

Deletion Cost similar to that of find. We cannot simply clear the bucket. –Why not? Other keys may have probed past this bucket on insertion; emptying it would break their probe chains.

Lazy deletion Instead of clearing an entry, we mark it as deleted. A new insertion may place a new value there and mark it active. Hash bins are in one of three states: unused, active, or deleted.
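A minimal Python sketch of linear probing with the three bucket states; this is illustrative only (not the book's implementation) and assumes the table never completely fills:

```python
UNUSED, ACTIVE, DELETED = 0, 1, 2

class LinearProbingTable:
    """Linear probing with lazy deletion (illustrative sketch)."""
    def __init__(self, size=11):
        self.keys = [None] * size
        self.state = [UNUSED] * size
        self.size = size

    def _find_slot(self, key):
        """Return (index, found). Probing stops at the first UNUSED bucket;
        DELETED buckets are skipped but remembered for reuse."""
        i = hash(key) % self.size
        first_deleted = None
        while self.state[i] != UNUSED:
            if self.state[i] == ACTIVE and self.keys[i] == key:
                return i, True
            if self.state[i] == DELETED and first_deleted is None:
                first_deleted = i              # reusable slot for a later insert
            i = (i + 1) % self.size            # linear probe: next bucket
        return (first_deleted if first_deleted is not None else i), False

    def insert(self, key):
        i, found = self._find_slot(key)
        if not found:
            self.keys[i], self.state[i] = key, ACTIVE

    def contains(self, key):
        return self._find_slot(key)[1]

    def remove(self, key):
        i, found = self._find_slot(key)
        if found:
            self.state[i] = DELETED            # lazy: mark, don't clear
```

Because a DELETED bucket does not stop the probe, keys that originally collided past it remain findable.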

Perhaps we can do better… Linear probing is not bad: –The average number of probes for an insertion (or unsuccessful search) with a hash table 50% loaded is 2.5. –It begins to be problematic as λ approaches 1. –Note that this is independent of the table size. Any algorithm that wishes to reduce this must be inexpensive enough that it is cheaper than the small number of probes typically needed.

Quadratic probing Basic idea: scatter the collisions so they do not group near one another. Suppose hash(key) = H and bin H is used. –Try (H + i²) % TableSize for i = 1, 2, 3, … –Note that linear probing used (H + i) % TableSize for i = 1, 2, 3, … Works best when the table size is a prime number.

Quadratic probing Thm 20.4 – When inserting into a hash table of prime size that is at least half empty using quadratic probing, a new element can always be inserted, and no hash bucket is probed more than once.

Insertion with quadratic probing

What does this buy us? For a hash table that is less than half full, we have removed the primary clustering. Consequently, we are closer to our naïve analysis. On average, when the table is half full, this saves us: –0.5 probes for each insertion –0.1 probes for each successful search In addition, long chains are avoided.

What does this cost us? The squaring operation and the modulo are relatively expensive given that on average we do not save many probes. Fortunately, we can improve this...

Efficient quadratic probing

Efficient quadratic probing Successive square offsets differ by consecutive odd numbers: i² − (i−1)² = 2i − 1, so H_i = H_{i−1} + 2i − 1. The multiplication by 2 can be implemented trivially by a shift. 2i − 1 < M, as we never insert into a table that is more than half full. So H_{i−1} + 2i − 1 is either less than M, or is < 2M and can be adjusted by subtracting M.

More than M/2 entries? Increase the size of the table to the next prime number at least twice the old size. Figure 20.7 (read) shows a prime number generation subroutine that is at most O(N^0.5 log N). This is less than O(N). Copying the table takes O(N) time, and has an amortized cost of O(1) per insertion.

Copying the hash bins We cannot place the items in the same entries. –Why not? The table size has changed, so each key's index changes. Instead we rehash each item to its new position.
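A sketch of the growth-and-rehash step in Python (simple linear probing is used for placement purely to keep the example short; the function names are mine):

```python
def is_prime(n):
    """Trial division: O(sqrt(n)) per test, matching the slide's bound."""
    if n < 2:
        return False
    f = 2
    while f * f <= n:
        if n % f == 0:
            return False
        f += 1
    return True

def next_prime(n):
    while not is_prime(n):
        n += 1
    return n

def rehash(old_table):
    """Grow to the next prime at least double the old size, then
    re-insert every item: positions depend on the new table size."""
    M = next_prime(2 * len(old_table))
    new_table = [None] * M
    for key in old_table:
        if key is not None:
            i = hash(key) % M                 # new index for the new size
            while new_table[i] is not None:   # probe for an empty slot
                i = (i + 1) % M
            new_table[i] = key
    return new_table
```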

Read Read the remainder of the code online and make sure that you understand it. In addition, read the iterator class code.

Complexity of quadratic probing No known analysis Eliminates primary clustering Introduces secondary clustering

Alternatives Double hashing – resolve collisions by probing with a step size given by a second hash function. Separate chaining – place all keys that hash to the same bucket on a linked list.
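A minimal separate-chaining sketch in Python (illustrative names; Python lists stand in for the linked lists):

```python
class ChainedHashTable:
    """Separate chaining: each bucket holds a chain of colliding keys."""
    def __init__(self, size=11):
        self.buckets = [[] for _ in range(size)]

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key):
        chain = self._chain(key)
        if key not in chain:          # O(chain length) scan
            chain.append(key)

    def contains(self, key):
        return key in self._chain(key)

    def remove(self, key):
        chain = self._chain(key)
        if key in chain:
            chain.remove(key)         # true deletion: no lazy marking needed
```

Note that deletion is straightforward here, unlike open addressing, because removing a key from one chain cannot break probe sequences for other keys.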

Applications Content addressable tables Symbol tables Game playing – Caching state Song recognition