Hashing


hashing1 Hashing

hashing2 Observation: We can store a set very easily if we can use its keys as array indices. E.g., with an array A indexed by key, SEARCH(A,k) simply returns A[k].
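A minimal Python sketch of this direct-indexing idea, assuming the keys are small non-negative integers below a known bound. The class name `DirectAddressTable` and its method names are illustrative and do not come from the slides.

```python
class DirectAddressTable:
    """Stores values for integer keys in range(universe_size), using keys as indices."""

    def __init__(self, universe_size):
        # One slot per possible key; None marks "key not present".
        self.slots = [None] * universe_size

    def insert(self, key, value=True):
        self.slots[key] = value

    def delete(self, key):
        self.slots[key] = None

    def search(self, key):
        # SEARCH(A, k): return A[k]
        return self.slots[key]


table = DirectAddressTable(10)
table.insert(3, "three")
print(table.search(3))  # three
print(table.search(4))  # None
```

Every operation is a single array access, but the array needs one slot per possible key, whether or not that key is ever stored.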

hashing3 Problem: usually, the number of possible keys is far larger than the number of keys actually stored, or even than available memory. (E.g., strings.) Idea of hashing: use a function h to map keys into a smaller set of indices, say the integers 0..m−1. This function is called a hash function. E.g. h(k) = position of k’s first letter in the alphabet.
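The first-letter example can be sketched in Python as follows. The function name `first_letter_hash` is an illustrative choice, and the sketch assumes every key starts with an ASCII letter. It returns positions 1..26, so a table indexed from 0 would either leave slot 0 unused or subtract one.

```python
def first_letter_hash(key):
    """h(k) = position of k's first letter in the alphabet (A=1, ..., Z=26)."""
    return ord(key[0].upper()) - ord("A") + 1


for name in ["Andy", "Cindy", "Thomas", "Tony"]:
    print(name, first_letter_hash(name))
# Andy 1
# Cindy 3
# Thomas 20
# Tony 20
```

Thomas and Tony land on the same index, which is exactly the kind of collision the following slides deal with.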

hashing4 Problem: Collisions. They are inevitable if there are more possible key values than table slots. [Figure: table T holding the keys Thomas, Andy, Cindy, and Tony.]

hashing5 Two questions:
1. How can we choose the hash function to minimize collisions?
2. What do we do about collisions when they occur?

hashing6 Running times for hashing (Θ assumed):

Operation     Average Case   Worst Case
INSERT        1              n
DELETE        1              n
SEARCH        1              n
MINIMUM       n              n
MAXIMUM       n              n
SUCCESSOR     n              n
PREDECESSOR   n              n

So hashing is useful when worst-case guarantees and ordering are not required.

hashing7 Real-World Facts (shhh!) Hashing is vastly more prevalent than trees for in-memory storage. Examples:
– UNIX shell command cache
– “arrays” in Icon, Awk, Tcl, Perl, etc.
– compiler symbol tables
– filenames on CD-ROM

hashing8 Example: Scripting Language

hashing9 Resolving Collisions Let’s assume for now that our hash function is OK, and deal with the collision resolution problem. Two groups of solutions:
1. Store the colliding key in the hash-table array. (“Closed hashing”)
2. Store it somewhere else. (“Open hashing”)
(Note: CLRS calls #1 “open addressing.”) Let’s look at #2 first.

hashing10 Open Hashing: Collision Resolution by Chaining Put all the keys that hash to the same index onto a linked list. Each T[i] is called a bucket or slot. T: slot 1 → Andy; slot 2 empty; slot 3 → Cindy; slot 20 → Thomas → Tony.
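The chaining scheme might be sketched in Python as below. This is not the code from the slides, which is not preserved in the transcript. The class name `ChainedHashTable`, its method names, and the use of Python lists in place of hand-built linked lists are all assumptions made for brevity.

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""

    def __init__(self, num_slots, hash_function):
        self.m = num_slots
        self.hash = hash_function
        self.buckets = [[] for _ in range(num_slots)]

    def _bucket(self, key):
        # Reduce the hash value modulo m so it is always a valid index.
        return self.buckets[self.hash(key) % self.m]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # new key goes at the end of the chain

    def search(self, key):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                del bucket[i]
                return True
        return False


# First-letter hash from the earlier slide (A=1, ..., Z=26), so 27 slots.
table = ChainedHashTable(27, lambda s: ord(s[0].upper()) - ord("A") + 1)
for name in ["Andy", "Cindy", "Thomas", "Tony"]:
    table.insert(name, len(name))
print(table.buckets[20])  # [('Thomas', 6), ('Tony', 4)]
```

Both colliding keys stay reachable because a search scans only the one chain the key hashes to.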

hashing11 Code for a Chained Hash Table

hashing12 Chained Hash Table (Continued)

hashing13 Analysis of Hashing with Chaining

hashing14 Analysis of Hashing with Chaining (continued) Average Case: Assume h(k) is equally likely to be any slot, regardless of other keys’ hash values. This assumption is called simple uniform hashing. (By the way, we also assume throughout that h takes constant time to compute.)

hashing15

hashing16 Average time for successful search: Assume that INSERT puts new keys at the end of the list. (The result is the same regardless of where INSERT puts the key.) Then the work we do to find a key is the same as what we did to insert it. And that is the same as an unsuccessful search in the table as it was just before the key was inserted. Let’s add up the total time to search for all the keys in the table. (Then we’ll divide by n, the number of keys, to get the average.) We’ll go through the keys in the order they were inserted.
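The summation described here can be worked out as follows. This is a sketch of the standard argument, not the slides' own derivation, which is not preserved in the transcript. It assumes simple uniform hashing, m slots, n keys, and the load factor α = n/m, a symbol introduced here. Inserting the i-th key into a table that already holds i−1 keys costs one hash computation plus a scan of a chain of expected length (i−1)/m.

```latex
\frac{1}{n}\sum_{i=1}^{n}\left(1+\frac{i-1}{m}\right)
  = 1+\frac{1}{nm}\cdot\frac{n(n-1)}{2}
  = 1+\frac{n-1}{2m}
  = 1+\frac{\alpha}{2}-\frac{1}{2m}
  = \Theta(1+\alpha)
```

So as long as α is bounded by a constant, the average successful search takes constant time.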

hashing17

hashing18

hashing19 Growing To grow: whenever the load factor α = n/m reaches some threshold (e.g. 3/4), double the number of slots. This requires rehashing everything, but by the same analysis we did for growing arrays, the amortized time for INSERT will remain Θ(1), average case.
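A rough Python sketch of this growth policy on a chained table of keys. The class name `GrowingChainedTable`, the starting size of 4 slots, and the use of Python's built-in `hash` are assumptions made for illustration.

```python
class GrowingChainedTable:
    """Chained hash set that doubles its slot count when the load factor hits 3/4."""

    MAX_LOAD = 0.75

    def __init__(self, num_slots=4):
        self.buckets = [[] for _ in range(num_slots)]
        self.n = 0  # number of keys stored

    def _index(self, key, num_slots):
        return hash(key) % num_slots

    def insert(self, key):
        bucket = self.buckets[self._index(key, len(self.buckets))]
        if key in bucket:
            return
        bucket.append(key)
        self.n += 1
        if self.n / len(self.buckets) >= self.MAX_LOAD:
            self._grow()

    def _grow(self):
        # Every key must be rehashed because its index depends on the slot count.
        old_keys = [key for bucket in self.buckets for key in bucket]
        new_size = 2 * len(self.buckets)
        self.buckets = [[] for _ in range(new_size)]
        for key in old_keys:
            self.buckets[self._index(key, new_size)].append(key)

    def __contains__(self, key):
        return key in self.buckets[self._index(key, len(self.buckets))]


table = GrowingChainedTable()
for key in range(100):
    table.insert(key)
print(len(table.buckets), table.n / len(table.buckets))
```

Each doubling is expensive, but it happens only after the number of keys has roughly doubled, which is why the cost averages out over the inserts.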

hashing20 Collision Resolution, Idea #2 Store colliders in the hash table array itself (“closed hashing” or “open addressing”). T: slot 1 Andy; slot 2 empty; slot 3 Cindy; slot 20 Tony; slot 21 Thomas. Insert Thomas: slot 20 is already taken by Tony, so Thomas goes into slot 21.

hashing21 Collision Resolution, Idea #2
Advantage:
– No extra storage for lists
Disadvantages:
– Harder to program
– Harder to analyze
– Table can overflow
– Performance is worse

hashing22 When there is a collision, where should the new item go? Many answers. In general, think of the hash function as having two arguments: the key and a probe number saying how many times we’ve tried to find a place for the item. (Code for INSERT and SEARCH is in CLRS, p.238.)
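A Python sketch of INSERT and SEARCH built on such a two-argument hash function. It follows the general shape of the textbook procedures but is not copied from CLRS. The names `probe_insert` and `probe_search` are illustrative, and deletion (which needs special markers) is left out.

```python
def probe_insert(table, key, h):
    """Try slots h(key, 0), h(key, 1), ... until an empty one is found."""
    for i in range(len(table)):
        slot = h(key, i)
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("hash table overflow")


def probe_search(table, key, h):
    """Follow the same probe sequence; an empty slot means the key is absent."""
    for i in range(len(table)):
        slot = h(key, i)
        if table[slot] is None:
            return None
        if table[slot] == key:
            return slot
    return None


m = 7
table = [None] * m
h = lambda key, i: (key + i) % m   # assumed probe function for integer keys

print(probe_insert(table, 10, h))  # 3
print(probe_insert(table, 17, h))  # 4  (slot 3 is taken, so the second probe is used)
print(probe_search(table, 17, h))  # 4
print(probe_search(table, 24, h))  # None
```

Search must retrace the exact sequence insert used, which is why both take the same function h.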

hashing23 Probing Methods Linear probing: if a slot is occupied, just go to the next slot in the table. (Wrap around at the end.) h(k,i) = (h’(k)+i) mod m, where k is the key, i is the probe #, h’ is our original hash function, and m is the # of slots in the table.

hashing24 Closed Hashing Algorithms

hashing25

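hashing26 Example of Linear Probing Table with m = 5 slots: a occupies one of slots 1 or 2, slot 3 holds b, and slot 4 holds c. h(k,i) = (h’(k)+i) mod m. INSERT(d), with h’(d) = 3:
i = 0: h(d,0) = 3 (occupied by b)
i = 1: h(d,1) = 4 (occupied by c)
i = 2: h(d,2) = 0 (free)
Put d in slot 0. Problem: long runs of items tend to build up, slowing down subsequent operations (primary clustering).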
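The example can be replayed in Python. Placing a in slot 1 is an assumption, since the slide text does not preserve its position, and `h_prime` is hard-coded to the one value the slide gives.

```python
m = 5
table = [None, "a", None, "b", "c"]  # position of "a" is assumed


def h_prime(key):
    return {"d": 3}[key]             # h'(d) = 3, as stated on the slide


def h(key, i):
    return (h_prime(key) + i) % m    # linear probing


probes = []
for i in range(m):
    slot = h("d", i)
    probes.append(slot)
    if table[slot] is None:
        table[slot] = "d"
        break

print(probes)  # [3, 4, 0]
print(table)   # ['d', 'a', None, 'b', 'c']
```

After this insert, slots 3, 4, 0, and 1 form one contiguous run (wrapping around), so any later key hashing to 3, 4, 0, or 1 has to walk to the end of it.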

hashing27 Quadratic Probing h(k,i) = (h’(k) + c1·i + c2·i²) mod m, where c1 and c2 are two constants, fixed at “compile-time”. Better than linear probing, but still leads to clustering, because keys with the same value for h’ have the same probe sequence.
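A small Python illustration of that clustering effect, assuming the usual textbook form h(k,i) = (h’(k) + c1·i + c2·i²) mod m. The constants c1 = 1, c2 = 3 and the table size m = 11 are arbitrary choices for the demonstration.

```python
def quadratic_probe_sequence(h_prime_value, m, c1=1, c2=3):
    """Slots visited by quadratic probing, starting from h'(k) = h_prime_value."""
    return [(h_prime_value + c1 * i + c2 * i * i) % m for i in range(m)]


m = 11
# Two different keys whose original hash values happen to coincide at 3.
sequence_for_key_x = quadratic_probe_sequence(3, m)
sequence_for_key_y = quadratic_probe_sequence(3, m)

print(sequence_for_key_x[:4])                    # [3, 7, 6, 0]
print(sequence_for_key_x == sequence_for_key_y)  # True
```

The probe sequence depends on the key only through h’(k), so colliding keys keep competing for the same slots.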

hashing28 Double Hashing Use one hash function to start, and a second to pick the probe sequence: h(k,i) = (h1(k) + i·h2(k)) mod m. h2(k) must be relatively prime to m in order to sweep out all slots. E.g. pick m a power of 2 and make h2(k) always odd.
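The relative-primality requirement can be checked with a short Python sketch. It assumes the usual form h(k,i) = (h1(k) + i·h2(k)) mod m, and the specific values (m = 8, start 3, steps 5 and 4) are arbitrary.

```python
from math import gcd


def double_hash_sequence(h1_value, h2_value, m):
    """Slots visited when the start is h1(k) and the step size is h2(k)."""
    return [(h1_value + i * h2_value) % m for i in range(m)]


m = 8  # a power of 2
odd_step = double_hash_sequence(3, 5, m)   # gcd(5, 8) == 1
even_step = double_hash_sequence(3, 4, m)  # gcd(4, 8) == 4

print(gcd(5, m), sorted(odd_step) == list(range(m)))  # 1 True
print(gcd(4, m), sorted(set(even_step)))              # 4 [3, 7]
```

With an odd step every slot is visited once; with step 4 the sequence only ever touches two slots, so an insert could fail while the table is mostly empty.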

hashing29

hashing30

hashing31

hashing32

hashing33 Choosing a Good Hash Function It should run quickly, and “hash” the keys up: each key should be equally likely to land in any slot. General rules:
– Exploit known facts about the keys
– Try to use all bits of the key

hashing34 Choosing A Good Hash Function (Continued) Although strings are what is most commonly hashed, we’ll assume k is an integer. We can always interpret strings (byte sequences) as numbers in base 256, e.g. “ab” = 97·256 + 98 = 24930.
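A Python sketch of the base-256 interpretation. The function name `string_to_int` and the choice of UTF-8 encoding are assumptions, and the first byte is treated as the most significant digit.

```python
def string_to_int(s):
    """Interpret the bytes of s as the digits of a base-256 number."""
    k = 0
    for byte in s.encode("utf-8"):
        k = k * 256 + byte
    return k


print(string_to_int("ab"))  # 24930, i.e. 97 * 256 + 98
print(string_to_int("ab") == int.from_bytes(b"ab", "big"))  # True
```

Python integers are unbounded, so this works for strings of any length. In a language with fixed-width integers the value would have to be reduced as it is built.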

hashing35 The division method: h(k) = k mod m (m is still the # of slots). Very simple, but m must be chosen carefully.
– E.g. if you’re hashing decimal integers, then m = a power of ten means you’re just taking the low-order digits.
– If you’re hashing strings, then m = 256 means you’re just taking the last character.
It is best to choose m to be a prime far from a power of 2.
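The effect of a poor choice of m can be seen in a short Python sketch. It assumes strings are read as base-256 numbers with the first byte most significant. The helper names and the prime 701 are illustrative choices.

```python
def string_to_int(s):
    """Interpret the bytes of s as a base-256 number (first byte most significant)."""
    k = 0
    for byte in s.encode("utf-8"):
        k = k * 256 + byte
    return k


def division_hash(k, m):
    """The division method: h(k) = k mod m."""
    return k % m


words = ["cat", "bat", "hat"]
with_256 = [division_hash(string_to_int(w), 256) for w in words]
with_prime = [division_hash(string_to_int(w), 701) for w in words]

print(with_256)              # [116, 116, 116], the byte value of the final "t"
print(len(set(with_prime)))  # 3
```

With m = 256 only the last character matters, so all three words collide. With the prime 701 every byte influences the result and these three words get distinct slots.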

hashing36

hashing37 Hash Functions in Practice Almost all hashing is done on strings. Typically, one computes byte-by-byte on the string to get a non-negative integer, then takes it mod m. E.g. (sum of all the bytes) mod m. Problem: anagrams hash to the same value. Other ideas: xor, etc. Hash function in Microsoft Visual C++ class library:
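To illustrate the anagram problem with the byte-sum idea, here is a Python sketch. It is not the Visual C++ library function, whose code is not preserved in this transcript. Both function names, the multiplier 31, and the table size 701 are assumptions chosen for the demonstration.

```python
def byte_sum_hash(s, m):
    """(sum of all the bytes) mod m: ignores the order of the characters."""
    return sum(s.encode("utf-8")) % m


def positional_hash(s, m, multiplier=31):
    """Byte-by-byte hash in which each byte's contribution depends on its position."""
    value = 0
    for byte in s.encode("utf-8"):
        value = (value * multiplier + byte) % m
    return value


m = 701
print(byte_sum_hash("listen", m) == byte_sum_hash("silent", m))      # True
print(positional_hash("listen", m) == positional_hash("silent", m))  # False
```

Multiplying the running value before adding each byte makes the result depend on character order, which the plain sum (and likewise a plain xor of the bytes) does not.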