Lecture 6 Hashing

Motivating Example
We want to store a list whose elements are integers between 1 and 5. We define an array A of size 5. If the list has element j, then j is stored in A[j-1]; otherwise A[j-1] contains 0. The complexity of the find operation is O(1).
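The array scheme above can be sketched in a few lines of Python. This is only an illustration: the function names build_table and find, and the sample list [2, 5], are assumptions and do not come from the slides.

```python
def build_table(elements, size=5):
    """Return A where A[j-1] == j if j is in the list, else 0."""
    table = [0] * size
    for j in elements:
        table[j - 1] = j
    return table


def find(table, j):
    """O(1) lookup: a single array access, no scanning."""
    return table[j - 1] == j


A = build_table([2, 5])
print(A)           # [0, 2, 0, 0, 5]
print(find(A, 5))  # True
print(find(A, 3))  # False
```

The lookup cost does not depend on how many elements are stored, which is the property hashing tries to keep when the range of possible values is too large for one slot per value.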

Hash table
The objective is to find an element in constant time "on average." If we know the elements belong to 1, 2, ..., U and we are allowed an overall space of U, then this can be done as described before. But U can be very large. The space used for storage is called the "hash table," H.

Assume that the hash table has size M. There is a hash function which maps an element to a value p in 0, ..., M-1, and the element is placed in position p in the hash table. The function is called h (the hash value for j is h(j)). If h(j) = k, then the element is added to H[k]. Suppose we want to store a list of integers; then an example hash function is h(j) = j modulo M.

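Example with M = 5: the list contains 1, 3, 9, ... With the hash function j modulo M, the elements 1, 3, and 9 are placed in positions 1, 3, and 4.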
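As a small sketch of the modulo hash function applied to this example, the snippet below computes the position of each listed element. Only the three elements visible on the slide are used; the function name h follows the slides, and everything else is illustrative.

```python
M = 5  # table size from the slide


def h(j):
    """Example hash function: j modulo M."""
    return j % M


positions = {j: h(j) for j in [1, 3, 9]}
print(positions)  # {1: 1, 3: 3, 9: 4}
```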

We may want to store elements which are not numbers, e.g., names. Then we use a function to convert each element to an integer and hash the integer. Suppose we want to store the string abc. Represent each symbol by its ASCII code and choose a number r; the integer value for abc is $\mathrm{ASCII}(a)\,r^2 + \mathrm{ASCII}(b)\,r + \mathrm{ASCII}(c)$.
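The string-to-integer conversion can be sketched as follows. The value r = 128 and the names string_to_int and hash_string are assumptions chosen for illustration; the slides leave r unspecified. The loop uses Horner's rule, which evaluates the same polynomial without computing powers of r explicitly.

```python
def string_to_int(s, r=128):
    """Treat the character codes of s as digits of a base-r number."""
    value = 0
    for ch in s:
        value = value * r + ord(ch)  # Horner's rule
    return value


def hash_string(s, M, r=128):
    """Convert the string to an integer, then hash the integer modulo M."""
    return string_to_int(s, r) % M


# Same value as ASCII(a)*r^2 + ASCII(b)*r + ASCII(c) with r = 128.
print(string_to_int("abc") == ord("a") * 128**2 + ord("b") * 128 + ord("c"))  # True
print(hash_string("abc", M=5))
```

For long strings the intermediate integer grows large; a common variation reduces modulo M inside the loop, which gives the same final hash value.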

Implementation
Hash tables are arrays. The size of a hash table is normally a prime number. Two different elements may hash to the same value (a collision), so hashing needs collision resolution. Hash functions are chosen so that the hash values are spread over 0, ..., M-1 and there are only a few collisions.

Separate Chaining
Store all the elements mapped to the same position in a linked list. H[k] is the list of all elements mapped to k.
To find an element j, compute h(j). Let h(j) = k. Then search in the linked list H[k].
To insert an element j, compute h(j). Let h(j) = k. Then insert into the linked list H[k].
To delete an element, delete it from its linked list.
(Figure: example table with M = 5.)

Insertion is O(1). The worst-case searching complexity depends on the maximum length of a list H[p]: it is O(q) if q is the maximum length. We are interested in the average searching complexity.
Example with M = 5:
Search for 7: look in position 2 in the array, find it empty, conclude that 7 is not there.
Search for 13: look in position 3 in the array, search the linked list, 13 is not found, conclude that 13 is not there.
Insert 4: add it to the linked list starting with 9.
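A minimal sketch of separate chaining, reproducing the three operations of the example. Two assumptions are made here: Python lists stand in for the linked lists, and the initial contents 1, 3, 9 are chosen to match the example. The class name ChainedHashTable is hypothetical.

```python
class ChainedHashTable:
    def __init__(self, M=5):
        self.M = M
        self.H = [[] for _ in range(M)]  # H[k] is the chain for position k

    def h(self, j):
        return j % self.M

    def insert(self, j):
        # No duplicate check, so this does not scan the chain.
        self.H[self.h(j)].append(j)

    def find(self, j):
        # Cost is proportional to the length of one chain.
        return j in self.H[self.h(j)]

    def delete(self, j):
        chain = self.H[self.h(j)]
        if j in chain:
            chain.remove(j)


t = ChainedHashTable(M=5)
for j in [1, 3, 9]:
    t.insert(j)
print(t.find(7))   # False: H[2] is empty
print(t.find(13))  # False: H[3] holds only 3
t.insert(4)
print(t.H[4])      # [9, 4]
```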

The load factor $\lambda$ is the average size of a list: $\lambda$ = number of elements in the hash table / number of positions in the hash table (M). The average find complexity is $1 + \lambda$. We want $\lambda$ to be approximately 1. To reduce the worst-case complexity, we choose hash functions which distribute the elements evenly across the lists.
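As a quick numerical illustration of the load factor, the snippet below computes it for a chained table with M = 5. The table contents (1, 3, 8, 9 hashed with j modulo 5) and the function name load_factor are assumptions made for this example.

```python
def load_factor(H):
    """Number of stored elements divided by the number of positions."""
    return sum(len(chain) for chain in H) / len(H)


# Chains for the keys 1, 3, 8, 9 with hash function j % 5.
H = [[], [1], [], [3, 8], [9]]
print(load_factor(H))  # 0.8
```

Here the longest chain has length 2 while the average is 0.8, which shows why the worst case depends on how evenly the hash function spreads the elements.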

Open Addressing
Separate chaining requires manipulation of pointers and dynamic memory allocation, which are expensive. Open addressing is an alternative scheme.
To insert a key (element) j, compute h(j) = k. If H[k] is empty, store j in H[k]; otherwise try H[k+1], H[k+2], etc., incrementing modulo the table size. This is called linear probing.

Every position in the hash table contains at most one element. We can always insert a key as long as the table is not full. Finding may be expensive if the table is close to full.
Example: M = 5, and the list contains 1, 3, 9, 8.

The idea is to declare a hash table large enough that it is never full. Initially, all slots are empty. Elements are inserted as described. When an element is deleted, its slot is marked deleted (empty and deleted are different). During the find operation, one looks for element k starting from where it should be (H[h(k)]) and continues until the element is found or an empty slot is found. In the latter case, we conclude that the element is not in the list.

Example with M = 5 (position 0 holds 8, position 1 holds 1, position 2 is empty, position 3 holds 3, position 4 holds 9):
Looking for 13: start from the position which has 3, then look at the one with 9, then the one with 8, then the one with 1, then reach an empty slot and conclude that 13 is not there.
Looking for 8: start from the position which has 3, then look at the one with 9, then the one with 8, and conclude that it is found.
Is there any problem if empty and deleted are not distinguished? Yes: we may conclude that an element is not there even if it is.
Example with M = 5: delete 9, leaving 1, 3, 8. Search for 8: start from the position with 3, go to the next slot, find nothing there, conclude that the slot is empty and thus that 8 is not there!

When we insert an element k, we start from H[h(k)] and move until an empty or deleted slot is found. An element can be inserted as long as the hash table is not full. If hash values are clustered, then finding may be slow even if the hash table is relatively empty.
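The insert, find, and delete rules for linear probing can be sketched as below. This is one possible rendering under stated assumptions: the sentinel objects EMPTY and DELETED, the class name LinearProbingTable, and the hash function k modulo M are illustrative choices, and insert does not check whether the key is already present.

```python
EMPTY = object()    # slot never used
DELETED = object()  # slot used once, element since removed


class LinearProbingTable:
    def __init__(self, M=5):
        self.M = M
        self.H = [EMPTY] * M

    def h(self, k):
        return k % self.M

    def insert(self, k):
        """Store k in the first empty or deleted slot; return its position."""
        pos = self.h(k)
        for _ in range(self.M):
            if self.H[pos] is EMPTY or self.H[pos] is DELETED:
                self.H[pos] = k
                return pos
            pos = (pos + 1) % self.M
        raise RuntimeError("hash table is full")

    def _locate(self, k):
        """Return the position of k, or None. Stops only at an EMPTY slot."""
        pos = self.h(k)
        for _ in range(self.M):
            if self.H[pos] is EMPTY:
                return None
            if self.H[pos] is not DELETED and self.H[pos] == k:
                return pos
            pos = (pos + 1) % self.M
        return None

    def find(self, k):
        return self._locate(k) is not None

    def delete(self, k):
        pos = self._locate(k)
        if pos is not None:
            self.H[pos] = DELETED  # mark, do not reset to EMPTY


t = LinearProbingTable(M=5)
positions = [t.insert(k) for k in [1, 3, 9, 8]]
print(positions)   # [1, 3, 4, 0]
print(t.find(13))  # False
t.delete(9)
print(t.find(8))   # True: the search continues past the deleted slot
```

If delete reset the slot to EMPTY instead, the final search for 8 would stop at position 4 and report that 8 is missing, which is the failure described in the example.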

Quadratic Probing
An alternative to linear probing. To insert key k, try slot h(k). If the slot is full, try slot h(k) + 1, then h(k) + 4, then h(k) + 9, and so on (all modulo M).
Advantage? Not much clustering.
Are we guaranteed to be able to insert as long as the hash table is not full? No. Take M = 3 with the first two positions full, and let h(k) = 0. Then $(h(k) + n^2) \bmod M$ is always 0 or 1. Prove it. Thus we never reach the third position, which is the only empty one.
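The M = 3 example can be checked numerically with a short sketch. This is a finite check rather than the requested proof, and the function name probe_sequence is an assumption. The proof itself follows from n being congruent to 0, 1, or 2 modulo 3, which gives n squared congruent to 0, 1, or 1.

```python
def probe_sequence(start, M, count):
    """First `count` slots visited by quadratic probing from `start`."""
    return [(start + n * n) % M for n in range(count)]


visited = sorted(set(probe_sequence(0, 3, 100)))
print(visited)  # [0, 1]
```

Position 2 never appears, so a key with hash value 0 cannot be inserted even though the table is not full.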

If the size M of the hash table is a prime number, then we can always insert a new element if the table is at most half full.
We want to insert element k, with h(k) = j. Let $n = \lfloor M/2 \rfloor$. If the locations $j, j + 1, j + 4, \ldots, j + n^2$ are all distinct modulo M, then we can insert the element in the hash table. Why? If these locations are distinct and the element still cannot be inserted, then all n + 1 of them are occupied, so the hash table has at least $\lfloor M/2 \rfloor + 1$ elements, which is not possible as the hash table is at most half full.
Now we show that $j, j + 1, j + 4, \ldots, j + n^2$ are distinct modulo M. Suppose not. Then there are p, q with $0 \le p < q \le n$ and $j + p^2 \equiv j + q^2 \pmod{M}$.

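Then $p^2 \equiv q^2 \pmod{M}$, so $(p - q)(p + q) \equiv 0 \pmod{M}$. Then either $p \equiv q \pmod{M}$ or $p + q \equiv 0 \pmod{M}$. Is that right? Yes, since M is prime. But since p and q are distinct and at most M/2, we have $0 < q - p < M$ and $0 < p + q < M$, so neither $p \equiv q \pmod{M}$ nor $p + q \equiv 0 \pmod{M}$ holds. This is a contradiction, so the locations are distinct.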
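The distinctness claim can be spot-checked for small table sizes with the sketch below. It is a sanity check, not a substitute for the argument above. The function name first_probes_distinct is hypothetical, and n is taken as M // 2, which assumes the intended n is the floor of M/2.

```python
def first_probes_distinct(M, j=0):
    """Check that j, j+1, j+4, ..., j+n^2 are distinct modulo M, n = M // 2."""
    n = M // 2
    slots = [(j + p * p) % M for p in range(n + 1)]
    return len(set(slots)) == len(slots)


for M in [5, 7, 11, 13]:
    print(M, first_probes_distinct(M))  # True for each prime

print(8, first_probes_distinct(8))      # False
```

For the composite size 8, the probes at offsets 1 and 9 land on the same slot, which shows that the primality assumption is doing real work in the proof.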

Rehashing
If the hash table is close to full, then a hash table of bigger size is used. The old hash table is copied into the new one, with each element placed according to its hash value for the new size. The old hash table is subsequently deleted. This should be done infrequently.
Reference: Chapter 5 of Weiss.
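A rough sketch of rehashing for a linear probing table is given below. Several choices are assumptions for illustration and are not stated on the slides: empty slots are represented by None, the new size is the smallest prime at least twice the old size, and the helper names next_prime, insert, and rehash are hypothetical.

```python
def next_prime(n):
    """Smallest prime >= n, by trial division (adequate for a sketch)."""
    def is_prime(x):
        if x < 2:
            return False
        i = 2
        while i * i <= x:
            if x % i == 0:
                return False
            i += 1
        return True

    while not is_prime(n):
        n += 1
    return n


def insert(table, k):
    """Linear probing insert; assumes the table has a free slot."""
    M = len(table)
    pos = k % M
    while table[pos] is not None:
        pos = (pos + 1) % M
    table[pos] = k


def rehash(old):
    """Build a larger table and re-insert every element of the old one."""
    new = [None] * next_prime(2 * len(old))
    for k in old:
        if k is not None:
            insert(new, k)  # position is recomputed for the new size
    return new


old = [8, 1, None, 3, 9]
new = rehash(old)
print(len(new))  # 11
print(new)       # [None, 1, None, 3, None, None, None, None, 8, 9, None]
```

Each element is hashed again rather than copied to the same index, because its position depends on the table size. The whole operation touches every element, which is why it should happen infrequently.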