Hashing Hashing is another method for storing and searching data.

Presentation transcript:

Hashing Hashing is another method for storing and searching data. Hashing makes it easier to add and remove elements from a data structure. The worst-case behavior for locating a key is linear: Θ(n). Java's standard hash table class is java.util.Hashtable.

Hashing Hashing is usually implemented with a data structure called a hash table. A hash table is an effective data structure for storing and retrieving items by key. A hash table is a generalization of an array. A hash table requires a key to access data.

Hashing A hash table uses an array whose length is proportional to the number of keys actually stored, not to the number of possible keys. The array index is computed from the key, rather than the key itself being used as the index. The key is a value that uniquely identifies an item.

Hashing Functions Hashing requires the use of a hashing function. The purpose of the hashing function is to compute the storage slot from the key; that is, it maps key values to array indices. This calculation reduces the large range of possible key values to the much smaller range of array indices that need to be handled.

Hashing Functions If a hashing function groups key values together into a few slots, this is called clustering of the keys. A good hashing function distributes the key values uniformly through the array's index range, so that a key is equally likely to hash into any of the slots. Any hashing function that results in clustering should be changed. java.util.Hashtable relies on the hashCode method of its keys.
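To make clustering concrete, here is a minimal sketch that counts how many distinct slots a set of keys occupies. The class name, the sample keys, and the choice of "remainder of division by the table length" as the hashing function are assumptions made for illustration, not part of the presentation.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: ClusteringDemo and its sample keys are hypothetical.
final class ClusteringDemo {
    // Counts how many distinct slots the keys occupy for a given table length,
    // using the remainder of division as the hashing function.
    static int distinctSlots(int[] keys, int tableLength) {
        Set<Integer> used = new HashSet<>();
        for (int key : keys) {
            used.add(Math.abs(key) % tableLength);
        }
        return used.size();
    }

    public static void main(String[] args) {
        int[] keys = {0, 10, 20, 30, 40, 50, 60, 70, 80, 90};
        // With a table of length 10, every key is a multiple of the length.
        System.out.println(distinctSlots(keys, 10)); // all keys cluster in slot 0
        // With a table of length 11, the same keys spread out.
        System.out.println(distinctSlots(keys, 11)); // ten different slots
    }
}
```

The same keys and the same function behave very differently depending on the table length, which is one reason a clustering hashing function (or table size) should be changed.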

Hashing Functions The division hash function uses the remainder of division: Math.abs(H(k)) % table.length. When using the division hash function, it is best to have a table size that is a prime number of the form 4n + 3. With a poorly chosen table size, the division hash function can result in many collisions.
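The division hash function can be sketched as follows, assuming that H(k) is the key's hashCode value. The class and method names are hypothetical. One detail worth knowing: Math.abs(Integer.MIN_VALUE) is still negative in Java, so this sketch handles that single value separately (mapping it to 0 is an arbitrary choice made here).

```java
// Illustrative only: DivisionHash is a hypothetical name.
final class DivisionHash {
    // Returns a slot index in the range 0 .. tableLength - 1.
    static int slotFor(Object key, int tableLength) {
        int h = key.hashCode();
        // Math.abs overflows for Integer.MIN_VALUE and stays negative,
        // so that one value is treated as 0 (an assumption of this sketch).
        int nonNegative = (h == Integer.MIN_VALUE) ? 0 : Math.abs(h);
        return nonNegative % tableLength;
    }

    public static void main(String[] args) {
        int tableLength = 7; // a prime of the form 4n + 3 (n = 1)
        System.out.println(slotFor(10, tableLength));  // 10 % 7
        System.out.println(slotFor(-10, tableLength)); // same slot as 10
    }
}
```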

Hashing Functions The mid-square hash function converts the key to an integer, then multiplies the key by itself (squares it). The function returns the middle digits of the result. The multiplicative hash function converts the key to an integer and multiplies it by a constant less than one. The function returns the first few digits of the fractional part of the result.
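A rough sketch of both functions is shown below. The class name, the method names, and the digits and constant parameters are assumptions for illustration; keys are assumed to be non-negative integers, digits is assumed to be at most 9, and in practice the result would still be reduced to a valid table index.

```java
// Illustrative only: names and parameters are hypothetical.
final class OtherHashFunctions {
    // Mid-square: square the key and return `digits` digits from the middle.
    static int midSquare(int key, int digits) {
        long squared = (long) key * key;          // long avoids int overflow
        String text = Long.toString(squared);
        if (text.length() <= digits) {
            return (int) squared;                 // nothing to trim
        }
        int start = (text.length() - digits) / 2; // centre the window
        return Integer.parseInt(text.substring(start, start + digits));
    }

    // Multiplicative: multiply by a constant less than one and return the
    // first `digits` digits of the fractional part of the product.
    static int multiplicative(int key, double constant, int digits) {
        double product = key * constant;
        double fraction = product - Math.floor(product);
        return (int) (fraction * Math.pow(10, digits));
    }

    public static void main(String[] args) {
        // 4567 * 4567 = 20857489; the middle four digits are 8574.
        System.out.println(midSquare(4567, 4));
        // 3 * 0.5 = 1.5; the fractional part .5 gives 500 with three digits.
        System.out.println(multiplicative(3, 0.5, 3));
    }
}
```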

Example Table [Figure: the actual keys K (k1 to k5), a subset of the universe of keys U, are mapped by the hashing function to table slots ending at m - 1; H(k1), H(k2), H(k3), and H(k4) are labeled.]

Collisions A collision occurs when the hashing function calculates the same array index for two different keys and the first is already stored at that index. In other words, two keys hash to the same slot.
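A small illustration of a collision, assuming the division hash function with the key's hashCode as H(k) and a table of length 11 (both are assumptions; the class name is hypothetical):

```java
// Illustrative only: CollisionDemo is a hypothetical name.
final class CollisionDemo {
    // Division hash: remainder of the (non-negative) hash code.
    static int slot(Object key, int tableLength) {
        return Math.abs(key.hashCode()) % tableLength;
    }

    public static void main(String[] args) {
        int m = 11;
        // Different integer keys whose remainders happen to match.
        System.out.println(slot(3, m) + " " + slot(14, m));
        // Different strings that share the same hashCode, so they share
        // a slot for every table length.
        System.out.println(slot("Aa", m) + " " + slot("BB", m));
    }
}
```

Collisions cannot be avoided in general, because the universe of keys is larger than the table; the sections that follow describe ways to resolve them.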

Collision Example Table [Figure: the same mapping of actual keys K (k1 to k5) from the universe of keys U to table slots ending at m - 1, with a collision: H(k2) = H(k5).]

Open Addressing Open addressing ensures that all elements are stored directly into the hash table. Every table slot contains either data or null. The problem is that the table can fill up. The good thing is that there are no external storage locations for the table elements.

Open Addressing Open addressing attempts to resolve collisions using various methods.

Linear Probing Linear probing resolves collisions by placing the data into the next open slot in the table. When a collision occurs, the algorithm looks at the following slot (index). If this slot is open, the data is stored in the slot. If this slot is not open, the algorithm looks at the next slot, and so on, until an open slot is found.

Linear Probing It is difficult to delete items from a hash table that uses open addressing. We cannot simply put null into the slot, because a later search would stop at the null and miss items stored beyond it. Instead, place a special Deleted marker into the emptied slot. If H'(k) is the ordinary hash function, the linear probing hash function is: H(k, i) = (H'(k) + i) % m, where i = 0, 1, 2, …, m - 1 and m is the number of elements that can be stored into the table.
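The following is a minimal sketch of an open-addressing table with linear probing and a Deleted marker. It assumes H'(k) is the division hash of the key's hashCode; the class, field, and method names are hypothetical, and the table stores keys only to keep the example short.

```java
// Illustrative only: LinearProbingTable is a hypothetical class.
final class LinearProbingTable {
    // Marker object placed in a slot whose key has been removed.
    private static final Object DELETED = new Object();
    private final Object[] slots;

    LinearProbingTable(int capacity) {
        slots = new Object[capacity];
    }

    // H(k, i) = (H'(k) + i) % m
    private int probe(Object key, int i) {
        int home = Math.abs(key.hashCode()) % slots.length;
        return (home + i) % slots.length;
    }

    // Returns false only when the table is full.
    boolean insert(Object key) {
        if (contains(key)) {
            return true; // already stored
        }
        for (int i = 0; i < slots.length; i++) {
            int s = probe(key, i);
            if (slots[s] == null || slots[s] == DELETED) {
                slots[s] = key; // reuse empty or deleted slots
                return true;
            }
        }
        return false;
    }

    boolean contains(Object key) {
        for (int i = 0; i < slots.length; i++) {
            Object found = slots[probe(key, i)];
            if (found == null) {
                return false; // a never-used slot ends the search
            }
            if (found != DELETED && found.equals(key)) {
                return true;
            }
        }
        return false;
    }

    boolean remove(Object key) {
        for (int i = 0; i < slots.length; i++) {
            int s = probe(key, i);
            if (slots[s] == null) {
                return false;
            }
            if (slots[s] != DELETED && slots[s].equals(key)) {
                slots[s] = DELETED; // keep the probe run intact
                return true;
            }
        }
        return false;
    }
}
```

For example, in a table of capacity 7 the keys 3, 10, and 17 all start at slot 3 and occupy slots 3, 4, and 5. Removing 10 leaves the marker in slot 4, so a later search for 17 still continues past it.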

Linear Probing A problem associated with linear probing is called primary clustering. Primary clustering occurs when many items hash into the same slot or neighboring slots, so that long runs of filled slots build up. This results in increased search times.

Linear Probing Table [Figure: the collision example, H(k2) = H(k5), resolved with linear probing; k5 is shown in its own slot, labeled H(k5), in a table whose last slot is m - 1.]

Double Hashing Double hashing is one of the best methods for dealing with collisions. The slot location is first calculated from the hash function H1(k). If the slot is full, a second hash function H2(k) is calculated and combined with the first, giving H(k, i), to determine a new slot.

Double Hashing Assume that: H1(k) = Math.abs(H(k)) % table.length and H2(k) = 1 + (Math.abs(H(k)) % (table.length - x)), where x is a small value: 1, 2, or 3. Then: H(k, i) = (H1(k) + i * H2(k)) % m
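The probe sequence can be sketched as below, assuming H(k) is the key's hashCode and m is the table length. The class and method names are hypothetical, and the sketch assumes small tables so that i * H2(k) does not overflow an int.

```java
// Illustrative only: DoubleHashing is a hypothetical class.
final class DoubleHashing {
    // H1(k) = Math.abs(H(k)) % m
    static int h1(Object key, int m) {
        return Math.abs(key.hashCode()) % m;
    }

    // H2(k) = 1 + Math.abs(H(k)) % (m - x); never zero, so the probe moves.
    static int h2(Object key, int m, int x) {
        return 1 + Math.abs(key.hashCode()) % (m - x);
    }

    // H(k, i) = (H1(k) + i * H2(k)) % m
    static int probe(Object key, int i, int m, int x) {
        return (h1(key, m) + i * h2(key, m, x)) % m;
    }

    public static void main(String[] args) {
        int m = 11, x = 2;
        // Keys 25 and 14 share the home slot 3 but use different step sizes.
        for (int i = 0; i < 3; i++) {
            System.out.println(probe(25, i, m, x) + " " + probe(14, i, m, x));
        }
    }
}
```

With m = 11 and x = 2, key 25 has step 8 and visits slots 3, 0, 8, while key 14 has step 6 and visits 3, 9, 4. Because the two keys part ways after the first collision, double hashing avoids the long runs that linear probing builds up. When m is prime, every step size is relatively prime to m, so a probe sequence can reach every slot.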

Double Hashing Table [Figure: the collision example, H(k2) = H(k5), resolved with double hashing; k5 is shown in its own slot, labeled H(k5), in a table whose last slot is m - 1.]

External Chaining In external chaining, the hash table contains an array in which each component can hold more than one element of the hash table. Essentially, a multidimensional array or a linked list of elements can exist for each table slot. The typical implementation is that each slot contains a linked list.
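A minimal sketch of the typical implementation, with one linked list per slot, might look like this. The class and method names are hypothetical, the division hash of the key's hashCode is assumed, and only keys are stored to keep the example short.

```java
import java.util.LinkedList;

// Illustrative only: ChainedTable is a hypothetical class.
final class ChainedTable {
    private final LinkedList<Object>[] buckets;

    @SuppressWarnings("unchecked")
    ChainedTable(int size) {
        buckets = new LinkedList[size]; // generic arrays need a raw type
        for (int i = 0; i < size; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Picks the list for the slot the key hashes to.
    private LinkedList<Object> bucketFor(Object key) {
        return buckets[Math.abs(key.hashCode()) % buckets.length];
    }

    void insert(Object key) {
        LinkedList<Object> bucket = bucketFor(key);
        if (!bucket.contains(key)) {
            bucket.add(key); // colliding keys simply share the list
        }
    }

    boolean contains(Object key) {
        return bucketFor(key).contains(key);
    }

    boolean remove(Object key) {
        return bucketFor(key).remove(key); // no Deleted marker is needed
    }
}
```

Unlike open addressing, the table never fills up, and deletion is an ordinary linked-list removal.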

External Chaining Table [Figure: the actual keys K (k1 to k5) from the universe of keys U are mapped to table slots ending at m - 1; H(k1), H(k2), H(k3), H(k4), and H(k5) are labeled.]

Load Factor The load factor is a fraction that represents the number of elements stored in the table divided by the size of the table's array. α = (the number of elements stored in the table) / (the size of the table's array)

Load Factor If open addressing is used, then each table slot holds at most one element, so the load factor can never be greater than 1. If external chaining is used, then each table slot can hold many elements, so the load factor may be greater than 1.

Hashing Analysis The worst case for hashing is the case where every key is hashed into the same slot. Searching then takes Θ(n), linear time. The average time can be much faster.

Average Search Analysis In the formulas below, α is the load factor. Searching with linear probing, for a table that is not near full: ½ (1 + 1 / (1 - α)). For a table that is full or near full: sqrt(n π / 8). Searching with double hashing: (-ln(1 - α)) / α, where ln is the natural logarithm. Searching with chained hashing: 1 + (α / 2). See Figure 11.6 in Main, page 561.
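To get a feel for these formulas, the sketch below evaluates them for a given load factor. The class and method names are hypothetical; the open-addressing formulas are only meaningful for a load factor strictly between 0 and 1.

```java
// Illustrative only: SearchEstimates is a hypothetical class.
final class SearchEstimates {
    // Load factor: elements stored divided by the size of the table's array.
    static double loadFactor(int elements, int tableSize) {
        return (double) elements / tableSize;
    }

    // Linear probing, table not near full: 1/2 * (1 + 1 / (1 - a))
    static double linearProbing(double a) {
        return 0.5 * (1 + 1 / (1 - a));
    }

    // Double hashing: -ln(1 - a) / a
    static double doubleHashing(double a) {
        return -Math.log(1 - a) / a;
    }

    // Chained hashing: 1 + a / 2
    static double chained(double a) {
        return 1 + a / 2;
    }

    public static void main(String[] args) {
        double a = loadFactor(5, 10); // half-full table
        System.out.println(linearProbing(a)); // 1.5
        System.out.println(doubleHashing(a)); // about 1.39
        System.out.println(chained(a));       // 1.25
    }
}
```

At a load factor of 0.5 all three methods examine fewer than two elements on average. The differences grow as the table fills: at 0.9, linear probing gives 5.5 and double hashing about 2.56.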

Coding Example The Search Times program demonstrates linear search, binary search, and hashing. The hashing uses the Hashtable class.

Hashing Java provides the Hashtable class, but it also provides two other hash-based classes. The HashMap class uses a hash table to implement a map (key/value pairs). The HashSet class uses a hash table to implement a set.
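A brief sketch of the three classes in use follows. The wrapper class, its method names, and the sample data are hypothetical; the java.util classes and the methods called on them are standard.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Map;
import java.util.Set;

// Illustrative only: JavaHashClasses and its sample data are hypothetical.
final class JavaHashClasses {
    // HashSet keeps a single copy of each element.
    static int countDistinct(String[] words) {
        Set<String> seen = new HashSet<>();
        for (String word : words) {
            seen.add(word);
        }
        return seen.size();
    }

    // HashMap associates each key with a value.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum); // add 1 to the current count
        }
        return counts;
    }

    // Hashtable is the older, synchronized class; it rejects null keys
    // and null values. get returns null when the key is absent.
    static Integer lookup(String name) {
        Hashtable<String, Integer> ages = new Hashtable<>();
        ages.put("ann", 30);
        ages.put("bob", 25);
        return ages.get(name);
    }
}
```

For new code that does not need synchronization, HashMap is the usual choice over Hashtable.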