
Lecture: Hashing

At this point we have seen several dynamic data structures for building sets or maps (the key is used to locate the corresponding satellite data). Simple lists give O(n) access time. Binary search trees store values (or key-data pairs) in the nodes of the tree; if the tree is balanced, access operations take O(lg n) time. What if we could do even better? In fact, with hash tables we can perform these operations in O(1) time. Hash tables are implemented using arrays, and the idea is to exploit the power of arrays to access an arbitrary element in O(1) time.

Motivating Example. We want to store a list whose keys are unique integers between 1 and 5. Define an array A of size 5: if the list contains key j, then j is stored in A[j-1]; otherwise A[j-1] contains 0. The find, insert, and delete operations each take O(1) time. This is a direct address table. What is the obvious problem with this design? A sketch of the idea follows.
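Below is a minimal sketch of the direct address table just described, assuming unique integer keys in a small known range (the class and method names are illustrative, not from the slides):

```python
# Direct address table: key j lives at index j-1; 0 marks "absent".
class DirectAddressTable:
    def __init__(self, universe_size):
        self.A = [0] * universe_size

    def insert(self, j):
        self.A[j - 1] = j          # O(1): one array write

    def find(self, j):
        return self.A[j - 1] != 0  # O(1): one array read

    def delete(self, j):
        self.A[j - 1] = 0          # O(1): one array write

t = DirectAddressTable(5)
t.insert(3)
print(t.find(3), t.find(4))        # True False
```

The obvious problem: the array must be as large as the key universe, which is wasteful or impossible when the keys can range over a very large set.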

Hash table. The objective is to find a key in constant time "on average." If we know the keys belong to {1, 2, ..., U} and we are allowed overall space U, this can be done with a direct address table. But U can be very large, and the keys may not be unique. The space used for storage is called the "hash table," H.

If the keys are not unique, then we can simply construct a set of m lists and store the heads of these lists in the direct address table. The time to find an element matching an input key will still be O(1). If each element of the collection has a distinguishing feature (other than its key), and the maximum number of duplicates is n_max, then searching for a specific element is O(n_max). If duplicates are the exception, then n_max is much smaller than n and a direct address table provides good performance. But if n_max approaches n, the time to find a specific element is O(n) and a tree structure is more efficient.

Direct addressing is easily generalized to the case where there is a function h(k) → (1, m) that maps each value of the key k into the range (1, m). In this case, we place the element in T[h(k)] rather than T[k], and we can still search in O(1) time. Assume the hash table H has size M. The function h is the hash function (the hash value of j is h(j)); if h(j) = k, the element is stored in H[k]. If we want to store a list of integers, an example hash function is h(j) = j mod M.

Example: the list contains 1, 3, 9, 8 and M = 5. The direct address approach requires that the function h(k) be a one-to-one mapping from each k to the integers in (1, m). Such a function is a perfect hashing function: it maps each key to a distinct integer within the range and yields a table with O(1) search time. Finding a perfect hashing function is not always possible: here h(j) = j mod 5 sends both 3 and 8 to position 3, as the check below shows. Assume a hash function h(k) that maps most of the keys onto unique integers but maps a small number of keys onto the same integer. If the number of collisions is sufficiently small, then hash tables still work well and give O(1) search times.
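A quick check of the example with h(j) = j mod 5:

```python
# Hash the example keys; 3 and 8 collide at position 3, so j % 5 is
# not a perfect hashing function for this list.
M = 5
for j in [1, 3, 9, 8]:
    print(j, "->", j % M)   # 1->1, 3->3, 9->4, 8->3
```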

We may want to store elements that are not numbers, e.g., names. Then we use a function to convert each element to an integer and hash the integer. Suppose we want to store the string "abc": represent each symbol by its ASCII code, choose a number r, and take the integer value of "abc" to be ASCII(a)·r² + ASCII(b)·r + ASCII(c).
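A sketch of this string-to-integer conversion followed by a mod-M hash; the choices r = 31 and M = 97 are illustrative, not prescribed by the lecture:

```python
# Polynomial string hash: for "abc" this computes
# ord('a')*r^2 + ord('b')*r + ord('c') via Horner's rule, then mods by M.
def string_hash(s, r=31, M=97):
    value = 0
    for ch in s:
        value = value * r + ord(ch)
    return value % M

print(string_hash("abc"))   # 33 (the integer 96354, reduced mod 97)
```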

Ways to handle collisions?

Chaining: chain all collisions in lists attached to the appropriate slot. This allows an unlimited number of collisions to be handled and doesn't require advance knowledge of how many elements the collection contains. The tradeoff is the same as with linked lists: extra space for pointers and sequential traversal of each chain.

Re-hashing: use a second hashing operation when there is a collision, and re-hash until an empty slot in the table is found. The re-hashing function can either be a new function or a re-application of the original one. As long as the functions are applied to a key in the same order, a sought key can always be located.

Linear probing: one of the simplest re-hashing functions is +1 (or -1); i.e., on a collision, look in the neighboring slot of the table. It calculates the new address extremely quickly and is efficient on modern processors because consecutive probes make good use of the cache.

Implementation. Hash tables are arrays. The size of a hash table is normally a prime number. Two different elements may hash to the same value (a collision), so hashing needs collision resolution. Linear probing is subject to a clustering phenomenon: re-hashes from one location occupy a block of slots in the table which "grows" toward slots to which other keys hash. This exacerbates the collision problem, and the number of re-hashes can become large. Hash functions are chosen so that the hash values are spread over 0, ..., M-1 and there are only a few collisions.

Separate Chaining. Store all the elements mapped to the same position in a linked list. Example: with the list 1, 3, 9, 8 and M = 5, keys 3 and 8 are chained together at position 3. H[k] is the list of all elements mapped to k. To find an element j, compute h(j) = k, then search the linked list H[k]. To insert an element j, compute h(j) = k, then insert into the linked list H[k]. To delete an element, delete it from its linked list. A sketch follows.
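A minimal sketch of separate chaining, assuming integer keys and h(j) = j mod M (Python lists stand in for the linked lists):

```python
# Separate chaining: H[k] holds every key whose hash value is k.
class ChainedHashTable:
    def __init__(self, M=5):
        self.M = M
        self.H = [[] for _ in range(M)]

    def _h(self, j):
        return j % self.M

    def insert(self, j):
        self.H[self._h(j)].append(j)    # O(1): append to the chain

    def find(self, j):
        return j in self.H[self._h(j)]  # search only the chain at h(j)

    def delete(self, j):
        chain = self.H[self._h(j)]
        if j in chain:
            chain.remove(j)

t = ChainedHashTable(M=5)
for key in [1, 3, 9, 8]:
    t.insert(key)
print(t.find(8), t.find(13))            # True False (13 hashes to 3 but is not in that chain)
```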

Continuing the example (1, 3, 9, 8 with M = 5): search for 7, look in position 2 of the array, find it empty, and conclude 7 is not there. Search for 13, look in position 3 of the array, search the linked list, fail to find 13, and conclude 13 is not there. Insert 4: add it to the linked list starting with 9. Insertion is O(1). The worst-case search complexity depends on the maximum length of a list H[p]: it is O(q) if q is the maximum length. We are interested in the average search complexity.

The load factor α is the average size of a list: α = (number of elements in the hash table) / (number of positions in the hash table, M). The average complexity to find an item is 1 + α. We want α to be approximately 1. To reduce the worst-case complexity, we choose hash functions that distribute the elements evenly among the lists.

Open Addressing. Separate chaining requires manipulation of pointers and dynamic memory allocation, which are expensive. Open addressing is an alternative scheme. To insert key (element) j: compute h(j) = k; if H[k] is empty, store j in H[k]; otherwise try H[k+1], H[k+2], etc. (incrementing modulo the table size). This is linear probing; a sketch of the insert step follows.
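A sketch of linear-probing insertion, again assuming h(j) = j mod M, with None marking an empty slot:

```python
# Linear probing: on a collision, scan forward (wrapping modulo M)
# until an empty slot is found.
M = 5
H = [None] * M

def insert(j):
    k = j % M
    for step in range(M):              # at most M probes
        slot = (k + step) % M
        if H[slot] is None:
            H[slot] = j
            return slot
    raise RuntimeError("hash table is full")

for key in [1, 3, 9, 8]:
    insert(key)
print(H)   # [8, 1, None, 3, 9]: 8 collides at 3, probes past 9, wraps to 0
```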

Example: inserting the list 1, 3, 9, 8 with M = 5 yields the table 8, 1, -, 3, 9 (8 collides at position 3, probes past 9, and wraps around to position 0). Every position in the hash table contains at most one element. We can always insert a key as long as the table is not full, but finding may be slow if the table is close to full.

The idea is to declare a hash table large enough that it is never full. Initially, all slots are empty. Elements are inserted as described. When an element is deleted, its slot is marked deleted (empty and deleted are different states). During a find operation, one looks for element k starting from where it should be (H[h(k)]) until the element is found or an empty slot is reached. In the latter case, we conclude that the element is not in the table.

Example (table 8, 1, -, 3, 9 with M = 5): looking for 13, start from the position holding 3, then the one holding 9, then the one holding 8, then the one holding 1, reach an empty slot, and conclude 13 is not there. Looking for 8, start from the position holding 3, then the one holding 9, then the one holding 8: found. Any problem if empty and deleted are not distinguished? Yes: we may conclude that an element is absent even when it is present. Delete 9 without marking, leaving the table 8, 1, -, 3, -. Now a search for 8 starts from the position holding 3, goes to the next slot, finds nothing, concludes the slot is empty, and reports that 8 is not there!

When we insert an element k, we start from H[h(k)] and move until an empty or deleted slot is found. An element can be inserted as long as the hash table is not full. If hash values are clustered, then finding may be slow even when the hash table is relatively empty. The sketch below adds a deleted marker (tombstone) to the linear-probing table above.
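A sketch of find and delete with tombstones, extending the earlier linear-probing table (the EMPTY and DELETED sentinels are illustrative names):

```python
# Tombstones: a DELETED slot does not stop a probe chain; an EMPTY one does.
EMPTY, DELETED = None, "DELETED"
M = 5
H = [8, 1, EMPTY, 3, 9]           # state after inserting 1, 3, 9, 8

def find(j):
    k = j % M
    for step in range(M):
        slot = (k + step) % M
        if H[slot] == EMPTY:      # truly empty: j cannot be further along
            return None
        if H[slot] == j:          # DELETED slots are skipped implicitly
            return slot
    return None

def delete(j):
    slot = find(j)
    if slot is not None:
        H[slot] = DELETED         # mark, don't empty: keeps probe chains intact

delete(9)
print(find(8))                    # 0: the tombstone at slot 4 lets the probe continue
```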

Quadratic Probing. An alternative to linear probing. To insert key k, try slot h(k); if that slot is full, try slot h(k) + 1, then h(k) + 4, then h(k) + 9, and so on (the offsets are the squares 1², 2², 3², ..., taken modulo M). Advantage? Not much clustering. Are we guaranteed to be able to insert as long as the hash table is not full? No: take M = 3 with the first two positions full, and let h(k) = 0. Then (h(k) + n²) mod M is always 0 or 1, because n² mod 3 is 0 or 1 for every n (n ≡ 0, 1, 2 mod 3 gives n² ≡ 0, 1, 1). Thus we never reach the third position, which is the only empty one. The check below confirms this.
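A quick check of the M = 3 counterexample:

```python
# With h(k) = 0 and M = 3, the quadratic probe sequence never reaches slot 2.
M = 3
probes = {(0 + i * i) % M for i in range(10)}
print(probes)   # {0, 1}
```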

Hash Table Resize. If the hash table is close to full, a bigger hash table is used. Every element of the old hash table is re-hashed into the new one (positions depend on M, so elements cannot simply be copied across), and the old table is subsequently deleted. Resizing costs time proportional to the number of elements, so it should be done infrequently. A sketch follows.
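A sketch of the resize step for the linear-probing table; growing to 2M+1 here is an illustrative policy (real implementations often pick the next prime):

```python
# Resize: re-hash every key into a larger table, dropping tombstones.
def resize(H, new_M):
    new_H = [None] * new_M
    for j in H:
        if j is None or j == "DELETED":
            continue                   # skip empty slots and tombstones
        slot = j % new_M               # positions change because M changed
        while new_H[slot] is not None:
            slot = (slot + 1) % new_M  # linear probing in the new table
        new_H[slot] = j
    return new_H

H = resize([8, 1, None, 3, 9], 11)
print(H)   # keys 1, 3, 8, 9 land at positions 1, 3, 8, 9 since all are < 11
```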