Hashing21 Hashing II: The leftovers. hashing22 Hash functions Choice of hash function can be important factor in reducing the likelihood of collisions.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

1 Designing Hash Tables Sections 5.3, 5.4, Designing a hash table 1.Hash function: establishing a key with an indexed location in a hash table.
Hash Tables.
CSCE 3400 Data Structures & Algorithm Analysis
Data Structures Using C++ 2E
File Processing - Indirect Address Translation MVNC1 Hashing Indirect Address Translation Chapter 11.
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
Appendix I Hashing. Chapter Scope Hashing, conceptually Using hashes to solve problems Hash implementations Java Foundations, 3rd Edition, Lewis/DePasquale/Chase21.
Searching Kruse and Ryba Ch and 9.6. Problem: Search We are given a list of records. Each record has an associated key. Give efficient algorithm.
Hashing Techniques.
Data Structures Hash Tables
1 Chapter 9 Maps and Dictionaries. 2 A basic problem We have to store some records and perform the following: add new record add new record delete record.
Sets and Maps Chapter 9. Chapter 9: Sets and Maps2 Chapter Objectives To understand the Java Map and Set interfaces and how to use them To learn about.
FALL 2004CENG 3511 Hashing Reference: Chapters: 11,12.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Tirgul 9 Hash Tables (continued) Reminder Examples.
Introduction to Hashing CS 311 Winter, Dictionary Structure A dictionary structure has the form: (Key, Data) Dictionary structures are organized.
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Tirgul 8 Hash Tables (continued) Reminder Examples.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
Hashing 1. Def. Hash Table an array in which items are inserted according to a key value (i.e. the key value is used to determine the index of the item).
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture8.
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 4 Search Algorithms.
1 Hash table. 2 Objective To learn: Hash function Linear probing Quadratic probing Chained hash table.
1 Hash table. 2 A basic problem We have to store some records and perform the following:  add new record  delete record  search a record by key Find.
Comp 335 File Structures Hashing.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
1 5. Abstract Data Structures & Algorithms 5.2 Static Data Structures.
1 HASHING Course teacher: Moona Kanwal. 2 Hashing Mathematical concept –To define any number as set of numbers in given interval –To cut down part of.
Can’t provide fast insertion/removal and fast lookup at the same time Vectors, Linked Lists, Stack, Queues, Deques 4 Data Structures - CSCI 102 Copyright.
Hashing Hashing is another method for sorting and searching data.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Data Structures and Algorithms Hashing First Year M. B. Fayek CUFE 2010.
Lecture 12COMPSCI.220.FS.T Symbol Table and Hashing A ( symbol) table is a set of table entries, ( K,V) Each entry contains: –a unique key, K,
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Been-Chian Chien, Wei-Pang Yang, and Wen-Yang Lin 8-1 Chapter 8 Hashing Introduction to Data Structure CHAPTER 8 HASHING 8.1 Symbol Table Abstract Data.
ECE 250 Algorithms and Data Structures Douglas Wilhelm Harder, M.Math. LEL Department of Electrical and Computer Engineering University of Waterloo Waterloo,
Chapter 11 Hash Tables © John Urrutia 2014, All Rights Reserved1.
Chapter 11 Hash Anshuman Razdan Div of Computing Studies
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
CS261 Data Structures Hash Tables Open Address Hashing.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
1 CSCD 326 Data Structures I Hashing. 2 Hashing Background Goal: provide a constant time complexity method of searching for stored data The best traditional.
Data Structure & Algorithm Lecture 8 – Hashing JJCAO Most materials are stolen from Prof. Yoram Moses’s course.
Hashing by Rafael Jaffarove CS157b. Motivation  Fast data access  Search  Insertion  Deletion  Ideal seek time is O(1)
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hash Tables Ellen Walker CPSC 201 Data Structures Hiram College.
Sets and Maps Chapter 9. Chapter Objectives  To understand the Java Map and Set interfaces and how to use them  To learn about hash coding and its use.
Data Structures Using C++ 2E
Hashing CSE 2011 Winter July 2018.
Data Structures Using C++ 2E
Search by Hashing.
Quadratic probing Double hashing Removal and open addressing Chaining
Hash Table.
Hash Table.
CS202 - Fundamental Structures of Computer Science II
Data Structures – Week #7
What we learn with pleasure we never forget. Alfred Mercier
Presentation transcript:

hashing21 Hashing II: The leftovers

hashing22 Hash functions Choice of hash function can be important factor in reducing the likelihood of collisions Division hashing: key % CAPACITY –Certain table sizes are more conducive to collision avoidance with this method –A 1970 study suggests that a good size is a prime number of the form 4k+3 –For example, 811 = 202 * 4 + 3)

hashing23 Other hash functions Mid-square hash –multiply key by itself –use some middle digits of the result as hash value Multiplicative hash –multiply key by a floating-point constant that is less than one –use first few digits of fractional part of result as hash value

hashing24 Insertion and linear probing During insertion process, collision may occur In case of collision, the insertion function moves forward through the array until a vacant spot is found; the process is know as linear probing

hashing25 The perils of probing When many keys hash to the same index, elements start to group in clumps near that index; as the table grows, these clumps get larger As the table approaches capacity, these clumps tend to merge into gigantic clusters; hence this process is known as clustering Performance of insertion and search functions degrades when clustering occurs

hashing26 Double hashing A common technique used to reduce clustering is double hashing In double hashing, a second hash function is used to determine where to seek the next vacancy in the array when a collision occurs Rather than using linear probing, double hashing ensures that a few entries are skipped after each collision

hashing27 Double hashing Step 1: hash key and check for collision Step 2: if collision occurs, run key through second hash function; check index result spaces beyond original Example: –1st hash produces space at this index is taken, so run 2nd hash –2nd hash produces 9, so check space at index if not vacant, go to 224, then 233, etc.

hashing28 Considerations for second hash function Value added to index must not exceed valid range of array (0.. CAPACITY-1) Can stay within range by using the following formula to determine next index (where hash2 is the second hash function): index = (index + hash2(key)) % CAPACITY

hashing29 Considerations for second hash function Every array position must be examined - with double hashing, spots could be skipped, returning to start position before every available location has been probed To avoid this, make sure CAPACITY-1 is relatively prime with respect to value returned by hash2 (in other words, 2nd hash value and last array index should have no common factors)

hashing210 Example values for double hashing Both CAPACITY and CAPACITY-2 should be primes -- e.g. 809 and 811 First hash function: return (key % CAPACITY); Second hash function: return (1 + (key % (CAPACITY - 2)));

hashing211 Modifications to dictionary class for double hashing Add a hash2 function as private member Change next_index to return (1 + hash2(key) % CAPACITY)

hashing212 Chained hashing Open-address hashing uses static arrays in which each element contains one entry; when the array is full, can’t add more entries Could use dynamic arrays, but would require resizing to new prime number and rehashing entire table Chained hashing is a more workable alternative

hashing213 Chained hashing Chained hashing uses approach similar to first priority queue we studied: –each array element is a list which can hold several entries –all records that hash to a particular index are placed in the list at that index –a chained hash table can hold many more records than a simple hash table

hashing214 Time analysis of hashing In the worst case, every key gets hashed to the same index -- this makes insertion, deletion and searching linear operations (O(N)) Best case is the same as the linear search algorithm (O(1)), for the same reason Neither of these cases is particularly likely, however

hashing215 Time analysis of hashing Average case is relatively complex, especially if deletions are allowed Three different formulas have been developed for the average number of elements that must be examined for a successful search -- each corresponds to a different version of hashing (open-address with linear probing, open-address with double hashing, and chained hashing)

hashing216 Time analysis of hashing Each formula depends on the number of elements in the table the greater the number of items, the more collisions the more collisions, the longer the average search time

hashing217 Time analysis of hashing A load factor (  ) is used in each formula --  is the ratio of the number of occupied table locations to the size of the table:  = used / CAPACITY

hashing218 Time analysis of hashing For open-address hashing with linear probing, given the following conditions: –hash table is not full –no deletions Average number of table elements examined for a successful search is: 1/2 * (1 + 1/(1 -  ))

hashing219 Time analysis of hashing Example of open-address hashing with linear probing: used = 525 CAPACITY = 713  = 525 / 713, ~.75 therefore, average number of searches is: 1/2 * (1 + 1 / (1 -.75)) = 2.5 meaning about 3 table elements must be examined to complete a successful search

hashing220 Time analysis of hashing Average search time for open addressing with double hashing, given: –table is not full –no deletions Formula: -ln (1 -  ) /  For same values as previous example, result is slightly less than 2

hashing221 Time analysis of hashing For chained hashing, conditions for average search time are different: –each element of a table is the head pointer of a linked list –each list may have several items –  may be greater than one –formula remains valid even with deletions: 1 +  / 2