CSCE 3400 Data Structures & Algorithm Analysis

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Hash Tables CIS 606 Spring 2010.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
Part II Chapter 8 Hashing Introduction Consider we may perform insertion, searching and deletion on a dictionary (symbol table). Array Linked list Tree.
Lecture 11 oct 6 Goals: hashing hash functions chaining closed hashing application of hashing.
Skip List & Hashing CSE, POSTECH.
Data Structures Using C++ 2E
Hashing as a Dictionary Implementation
What we learn with pleasure we never forget. Alfred Mercier Smitha N Pai.
© 2004 Goodrich, Tamassia Hash Tables1  
Hashing Chapters What is Hashing? A technique that determines an index or location for storage of an item in a data structure The hash function.
Using arrays – Example 2: names as keys How do we map strings to integers? One way is to convert each letter to a number, either by mapping them to 0-25.
Hashing Techniques.
Hashing CS 3358 Data Structures.
Lecture 11 March 5 Goals: hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
1 CSE 326: Data Structures Hash Tables Autumn 2007 Lecture 14.
Hash Tables1 Part E Hash Tables  
Hash Tables1 Part E Hash Tables  
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Hash Tables1 Part E Hash Tables  
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Lecture 11 oct 7 Goals: hashing hash functions chaining closed hashing application of hashing.
Hashing General idea: Get a large array
Data Structures Using C++ 2E Chapter 9 Searching and Hashing Algorithms.
Introducing Hashing Chapter 21 Copyright ©2012 by Pearson Education, Inc. All rights reserved.
L. Grewe. Computing hash function for a string Horner’s rule: (( … (a 0 x + a 1 ) x + a 2 ) x + … + a n-2 )x + a n-1 ) int hash( const string & key )
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
1. 2 Problem RT&T is a large phone company, and they want to provide enhanced caller ID capability: –given a phone number, return the caller’s name –phone.
Data Structures and Algorithm Analysis Hashing Lecturer: Jing Liu Homepage:
Hashing Table Professor Sin-Min Lee Department of Computer Science.
Hashing Chapter 20. Hash Table A hash table is a data structure that allows fast find, insert, and delete operations (most of the time). The simplest.
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 4 Search Algorithms.
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Comp 335 File Structures Hashing.
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Hashing Hashing is another method for sorting and searching data.
Hashing as a Dictionary Implementation Chapter 19.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
CS201: Data Structures and Discrete Mathematics I Hash Table.
Lecture 17 April 11, 11 Chapter 5, Hashing dictionary operations general idea of hashing hash functions chaining closed hashing.
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
Hashing 8 April Example Consider a situation where we want to make a list of records for students currently doing the BSU CS degree, with each.
Chapter 5: Hashing Part I - Hash Tables. Hashing  What is Hashing?  Direct Access Tables  Hash Tables 2.
Been-Chian Chien, Wei-Pang Yang, and Wen-Yang Lin 8-1 Chapter 8 Hashing Introduction to Data Structure CHAPTER 8 HASHING 8.1 Symbol Table Abstract Data.
Chapter 10 Hashing. The search time of each algorithm depend on the number n of elements of the collection S of the data. A searching technique called.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Hashing Suppose we want to search for a data item in a huge data record tables How long will it take? – It depends on the data structure – (unsorted) linked.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
CPSC 252 Hashing Page 1 Hashing We have already seen that we can search for a key item in an array using either linear or binary search. It would be better.
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
Data Structures Using C++
1 Hashing by Adlane Habed School of Computer Science University of Windsor May 6, 2005.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
Hashing. Search Given: Distinct keys k 1, k 2, …, k n and collection T of n records of the form (k 1, I 1 ), (k 2, I 2 ), …, (k n, I n ) where I j is.
CMSC 341 Hashing Readings: Chapter 5. Announcements Midterm II on Nov 7 Review out Oct 29 HW 5 due Thursday CMSC 341 Hashing 2.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
Data Structures Using C++ 2E
Data Structures Using C++ 2E
Advanced Associative Structures
Lecture 17 April 11, 11 Chapter 5, Hashing dictionary operations
CSCE 3110 Data Structures & Algorithm Analysis
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
EE 312 Software Design and Implementation I
Collision Handling Collisions occur when different elements are mapped to the same cell.
What we learn with pleasure we never forget. Alfred Mercier
EE 312 Software Design and Implementation I
Presentation transcript:

CSCE 3400 Data Structures & Algorithm Analysis Hashing Reading: Chap.5, Weiss

How to Implement a Dictionary? Sequences ordered unordered Binary Search Trees Hashtables

Hashing Another important and widely useful technique for implementing dictionaries Constant time per operation (on the average) Worst case time proportional to the size of the set for each operation (just like array and chain implementation)

Basic Idea Use hash function to map keys into positions in a hash table Ideally If element e has key k and h is hash function, then e is stored in position h(k) of table To search for e, compute h(k) to locate position. If no element, dictionary does not contain e.

Example Dictionary Student Records Keys are ID numbers (951000 - 952000), no more than 100 students Hash function: h(k) = k-951000 maps ID into distinct table positions 0-1000 array table[1001] hash table ... 1 2 3 1000 buckets

Analysis (Ideal Case) O(b) time to initialize hash table (b number of positions or buckets in hash table) O(1) time to perform insert, remove, search

Ideal Case is Unrealistic Works for implementing dictionaries, but many applications have key ranges that are too large to have 1-1 mapping between buckets and keys! Example: Suppose key can take on values from 0 .. 65,535 (2 byte unsigned int) Expect  1,000 records at any given time Impractical to use hash table with 65,536 slots!

h(k1) =  = h(k2): k1 and k2 have collision at slot  Hash Functions If key range too large, use hash table with fewer buckets and a hash function which maps multiple keys to same bucket: h(k1) =  = h(k2): k1 and k2 have collision at slot  Popular hash functions: hashing by division h(k) = k%D, where D number of buckets in hash table Example: hash table with 11 buckets h(k) = k%11 80  3 (80%11= 3), 40  7, 65  10 58  3 collision!

Collision Resolution Policies Two classes: (1) Open hashing, a.k.a. separate chaining (2) Closed hashing, a.k.a. open addressing Difference has to do with whether collisions are stored outside the table (open hashing) or whether collisions result in storing one of the records at another slot in the table (closed hashing)

Closed Hashing Associated with closed hashing is a rehash strategy: “If we try to place x in bucket h(x) and find it occupied, find alternative location h1(x), h2(x), etc. Try each in order, if none empty table is full,” h(x) is called home bucket Simplest rehash strategy is called linear hashing hi(x) = (h(x) + i) % D In general, our collision resolution strategy is to generate a sequence of hash table slots (probe sequence) that can hold the record; test each slot until find empty one (probing)

Example Linear (Closed) Hashing D=8, keys a,b,c,d have hash values h(a)=3, h(b)=0, h(c)=4, h(d)=3 Where do we insert d? 3 already filled Probe sequence using linear hashing: h1(d) = (h(d)+1)%8 = 4%8 = 4 h2(d) = (h(d)+2)%8 = 5%8 = 5* h3(d) = (h(d)+3)%8 = 6%8 = 6 etc. 7, 0, 1, 2 Wraps around the beginning of the table! b 1 2 3 a c 4 d 5 6 7

Operations Using Linear Hashing Test for membership: findItem Examine h(k), h1(k), h2(k), …, until we find k or an empty bucket or home bucket If no deletions possible, strategy works! What if deletions? If we reach empty bucket, cannot be sure that k is not somewhere else and empty bucket was occupied when k was inserted Need special placeholder deleted, to distinguish bucket that was never used from one that once held a value May need to reorganize table after many deletions

Performance Analysis - Worst Case Initialization: O(b), b# of buckets Insert and search: O(n), n number of elements in table; all n key values have same home bucket No better than linear list for maintaining dictionary!

Improved Collision Resolution Linear probing: hi(x) = (h(x) + i) % D all buckets in table will be candidates for inserting a new record before the probe sequence returns to home position clustering of records, leads to long probing sequences Linear probing with skipping: hi(x) = (h(x) + ic) % D c constant other than 1 records with adjacent home buckets will not follow same probe sequence (Pseudo)Random probing: hi(x) = (h(x) + ri) % D ri is the ith value in a random permutation of numbers from 1 to D-1 insertions and searches use the same sequence of “random” numbers

Example II insert 1052 (h.b. 7) I 1001 1001 1 9537 1. What if next element has home bucket 0?  go to bucket 3 Same for elements with home bucket 1 or 2! Only a record with home position 3 will stay.  p = 4/11 that next record will go to bucket 3 1 9537 h(k) = k%11 2 3016 2 3016 3 3 4 4 5 5 6 6 7 9874 7 9874 8 2009 2. Similarly, records hashing to 7,8,9 will end up in 10 3. Only records hashing to 4 will end up in 4 (p=1/11); same for 5 and 6 8 2009 9 9875 9 9875 10 1052 10 next element in bucket 3 with p = 8/11

Hash Functions - Numerical Values Consider: h(x) = x%16 poor distribution, not very random depends solely on least significant four bits of key Better, mid-square method if keys are integers in range 0,1,…,K , pick integer C such that DC2 about equal to K2, then h(x) = x2/C % D extracts middle r bits of x2, where 2r=D (a base-D digit) better, because most or all of bits of key contribute to result

Hash Function – Strings of Characters Folding Method: int h(String x, int D) { int i, sum; for (sum=0, i=0; i<x.length(); i++) sum+= (int)x.charAt(i); return (sum%D); } sums the ASCII values of the letters in the string ASCII value for “A” =65; sum will be in range 650-900 for 10 upper-case letters; good when D around 100, for example order of chars in string has no effect

Hash Function – Strings of Characters Polynomial hash codes static long hashCode(String key, int D) { int h= key.charAt(0);; for (int i=1, i<key.length(); i++){ h = h*a; h += (int) key.charAt(i); } return h%D; They have found that a = 33, 37, and 41 have less then 7 collisions.

Hash Function – Strings of Characters Much better: Cyclic Shift static long hashCode(String key, int D) { int h=0; for (int i=0, i<key.length(); i++){ h = (h << 4) | ( h >> 27); h += (int) key.charAt(i); } return h%D;

Open Hashing Each bucket in the hash table is the head of a linked list All elements that hash to a particular bucket are placed on that bucket’s linked list Records within a bucket can be ordered in several ways by order of insertion, by key value order, or by frequency of access order

Open Hashing Data Organization ... 1 ... 2 3 4 ... D-1

Analysis Open hashing is most appropriate when the hash table is kept in main memory, implemented with a standard in-memory linked list We hope that number of elements per bucket roughly equal in size, so that the lists will be short If there are n elements in set, then each bucket will have roughly n/D If we can estimate n and choose D to be roughly as large, then the average bucket will have only one or two members

Analysis Cont’d Average time per dictionary operation: D buckets, n elements in dictionary  average n/D elements per bucket insert, search, remove operation take O(1+n/D) time each If we can choose D to be about n, constant time Assuming each element is likely to be hashed to any bucket, running time constant, independent of n

Comparison with Closed Hashing Worst case performance is O(n) for both Number of operations for hashing 23 6 8 10 23 5 12 4 9 19 D=9 h(x) = x % D

Hashing Problem Draw the 11 entry hashtable for hashing the keys 12, 44, 13, 88, 23, 94, 11, 39, 20 using the function (2i+5) mod 11, closed hashing, linear probing Pseudo-code for listing all identifiers in a hashtable in lexicographic order, using open hashing, the hash function h(x) = first character of x. What is the running time?