Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.

Slides:



Advertisements
Similar presentations
Chapter 11. Hash Tables.
Advertisements

Hash Table.
Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley.
Hash Tables.
Hash Tables CIS 606 Spring 2010.
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CSCE 3400 Data Structures & Algorithm Analysis
CS 253: Algorithms Chapter 11 Hashing Credit: Dr. George Bebis.
CS Section 600 CS Section 002 Dr. Angela Guercio Spring 2010.
Data Structures – LECTURE 11 Hash tables
1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.
Hash Tables How well do hash tables support dynamic set operations? Implementations –Direct address –Hash functions Collision resolution methods –Universal.
E.G.M. PetrakisHashing1  Data organization in main memory or disk  sequential, binary trees, …  The location of a key depends on other keys => unnecessary.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
© 2006 Pearson Addison-Wesley. All rights reserved13 A-1 Chapter 13 Hash Tables.
CS 206 Introduction to Computer Science II 11 / 17 / 2008 Instructor: Michael Eckmann.
Hashing COMP171 Fall Hashing 2 Hash table * Support the following operations n Find n Insert n Delete. (deletions may be unnecessary in some applications)
Tirgul 9 Hash Tables (continued) Reminder Examples.
Hashing for dummies 1 For quick search, insert and delete of data from a table. 1 figures and concepts from MITOPENCOURSEWARE
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Tirgul 8 Hash Tables (continued) Reminder Examples.
Lecture 10: Search Structures and Hashing
Hashing General idea: Get a large array
Lecture 6 Hashing. Motivating Example Want to store a list whose elements are integers between 1 and 5 Will define an array of size 5, and if the list.
CS 206 Introduction to Computer Science II 04 / 06 / 2009 Instructor: Michael Eckmann.
Hashtables David Kauchak cs302 Spring Administrative Talk today at lunch Midterm must take it by Friday at 6pm No assignment over the break.
Spring 2015 Lecture 6: Hash Tables
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
Algorithm Course Dr. Aref Rashad February Algorithms Course..... Dr. Aref Rashad Part: 4 Search Algorithms.
Implementing Dictionaries Many applications require a dynamic set that supports dictionary-type operations such as Insert, Delete, and Search. E.g., a.
Data Structures Hash Tables. Hashing Tables l Motivation: symbol tables n A compiler uses a symbol table to relate symbols to associated data u Symbols:
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
Hashing1 Hashing. hashing2 Observation: We can store a set very easily if we can use its keys as array indices: A: e.g. SEARCH(A,k) return A[k]
Hashing Sections 10.2 – 10.3 CS 302 Dr. George Bebis.
Data Structures and Algorithms (AT70. 02) Comp. Sc. and Inf. Mgmt
Hashing Hashing is another method for sorting and searching data.
Searching Given distinct keys k 1, k 2, …, k n and a collection of n records of the form »(k 1,I 1 ), (k 2,I 2 ), …, (k n, I n ) Search Problem - For key.
Chapter 12 Hash Table. ● So far, the best worst-case time for searching is O(log n). ● Hash tables  average search time of O(1).  worst case search.
David Luebke 1 11/26/2015 Hash Tables. David Luebke 2 11/26/2015 Hash Tables ● Motivation: Dictionaries ■ Set of key/value pairs ■ We care about search,
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Hashing Basis Ideas A data structure that allows insertion, deletion and search in O(1) in average. A data structure that allows insertion, deletion and.
Hash Table March COP 3502, UCF 1. Outline Hash Table: – Motivation – Direct Access Table – Hash Table Solutions for Collision Problem: – Open.
COSC 2007 Data Structures II Chapter 13 Advanced Implementation of Tables IV.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
October 5, 2005Copyright © by Erik D. Demaine and Charles E. LeisersonL7.1 Prof. Charles E. Leiserson L ECTURE 8 Hashing II Universal hashing Universality.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
Hashing COMP171. Hashing 2 Hashing … * Again, a (dynamic) set of elements in which we do ‘search’, ‘insert’, and ‘delete’ n Linear ones: lists, stacks,
Hashtables David Kauchak cs302 Spring Administrative Midterm must take it by Friday at 6pm No assignment over the break.
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
Hash table CSC317 We have elements with key and satellite data
CS 332: Algorithms Hash Tables David Luebke /19/2018.
Hashing Alexandra Stefan.
Introduction to Algorithms 6.046J/18.401J
Hashing Alexandra Stefan.
CSE 2331/5331 Topic 8: Hash Tables CSE 2331/5331.
Introduction to Algorithms 6.046J/18.401J
Hash Tables – 2 Comp 122, Spring 2004.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
Hashing Alexandra Stefan.
CH 9.2 : Hash Tables Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and.
CS202 - Fundamental Structures of Computer Science II
Introduction to Algorithms
Hashing Sections 10.2 – 10.3 Lecture 26 CS302 Data Structures
CS 5243: Algorithms Hash Tables.
CS 3343: Analysis of Algorithms
Hash Tables – 2 1.
Presentation transcript:

Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing Prof. Charles E. Leiserson October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.1

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.2 Symbol-table problem record key Operations on S: INSERT (S, x) DELETE (S, x) SEARCH (S, k) How should the data structure S be organized? Other fields containing satellite data Symbol table S holding n records:

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.3 Direct-access table I DEA : Suppose that the keys are drawn from the set U ⊆ {0, 1, …, m–1}, and keys are distinct. Set up an array T[0..m–1]: Then, operations take Θ(1) time. Problem: The range of keys can be large: ‧ 64-bit numbers (which represent 18,446,744,073,709,551,616different keys), ‧ character strings (even larger!). if and otherwise.

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.4 Hash functions universe U of all keys into {0, 1, …, m–1} : Solution: Use a hash function h to map the When a record to be inserted maps to an already occupied slot in T, a collision occurs.

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.5 Resolving collisions by chaining ‧ Link records in the same slot into a list. Worst case: Every key hashes to the same slot. Access time = Θ(n) if |S| =n

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.6 Average-case analysis of chaining We make the assumption of simple uniform hashing: Each key k ∈ S is equally likely to be hashed to any slot of table T, independent of where other keys are hashed. Let n be the number of keys in the table, and let m be the number of slots. Define the load factor of T to be α= n/m =average number of keys per slot.

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.7 Search cost The expected time for an unsuccessful search for a record with a given key is = Θ(1 + α).

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.8 Search cost apply hash function and access slot The expected time for an unsuccessful search for a record with a given key is = Θ(1 + α). search the list

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.9 Search cost The expected time for an unsuccessful search for a record with a given key is = Θ(1 + α). apply hash function and access slot search the list Expected search time = Θ(1) if α= O(1), or equivalently, if n = O(m).

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.10 Search cost The expected time for an unsuccessful search for a record with a given key is = Θ(1 + α). apply hash function and access slot search the list Expected search time = Θ(1) if α= O(1), or equivalently, if n = O(m). A successful search has same asymptotic bound, but a rigorous argument is a little more complicated. (See textbook.)

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.11 Choosing a hash function The assumption of simple uniform hashing is hard to guarantee, but several common techniques tend to work well in practice as long as their deficiencies can be avoided. Desirata: A good hash function should distribute the keys uniformly into the slots of the table. Regularity in the key distribution should not affect this uniformity.

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.12 Division method Assume all keys are integers, and define h(k) = k mod m. Deficiency: Don’t pick an m that has a small divisor d. A preponderance of keys that are congruent modulo d can adversely affect uniformity. Extreme deficiency: If m= 2 r, then the hash doesn’t even depend on all the bits of k: If and then

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.13 Division method (continued) h(k) = k mod m. Pick m to be a prime not too close to a power of 2 or 10 and not otherwise used prominently in the computing environment. Annoyance: Sometimes, making the table size a prime is inconvenient. But, this method is popular, although the next method we’ll see is usually superior.

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.14 Multiplication method Assume that all keys are integers, m= 2 r, and our computer has w-bit words. Define h(k) = (A·k mod 2 w ) rsh (w–r), where rsh is the “bitwise right-shift” operator and A is an odd integer in the range 2 w–1 < A< 2 w. Don’t pick A too close to 2 w–1 or 2 w. Multiplication modulo 2 w is fast compared to division. The rsh operator is fast.

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.15 Multiplication method example has w= 7 -bit words: Modular wheel Suppose that m= 8 = 23 and that our computer

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.16 Resolving collisions by open addressing No storage is used outside of the hash table itself. Insertion systematically probes the table until an empty slot is found. The hash function depends on both the key and probe number: h: U×{0, 1, …, m–1} →{0, 1, …, m–1}. The probe sequence 〈 h(k,0), h(k,1), …, h(k,m–1) 〉 should be a permutation of {0, 1, …, m–1}. The table may fill up, and deletion is difficult (but not impossible).

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.17 Example of open addressing Insert key k= 496: 0.Probe h(496,0) collision Insert key k= 496:

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.18 Example of open addressing 0. Probe h(496,0) 1. Probe h(496,1) Insert key k= 496: collision

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.19 Example of open addressing insertion Insert key k= 496: 0. Probe h(496,0) 1. Probe h(496,1) 2. Probe h(496,2)

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.20 Example of open addressing Search uses the same probe sequence, terminating suc- cessfully if it finds the key and unsuccessfully if it encounters an empty slot. Search for key k= 496: 0. Probe h(496,0) 1. Probe h(496,1) 2. Probe h(496,2)

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.21 Probing strategies Linear probing: Given an ordinary hash function h’(k), linear probing uses the hash function h(k,i) = (h’(k) +i) mod m. This method, though simple, suffers from primary clustering, where long runs of occupied slots build up, increasing the average search time. Moreover, the long runs of occupied slots tend to get longer.

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.22 Probing strategies Double hashing: Given two ordinary hash functions h 1 (k)and h 2 (k), double hashing uses the hash function h(k,i) = (h 1 (k) +i . h 2 (k)) mod m. This method generally produces excellent results, but h 2 (k) must be relatively prime to m. One way is to make m a power of 2 and design h 2 (k) to produce only odd numbers.

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.23 Analysis of open addressing We make the assumption of uniform hashing: Each key is equally likely to have any one of the m! permutations as its probe sequence. Theorem. Given an open-addressed hash table with load factor α= n/m< 1, the expected number of probes in an unsuccessful search is at most 1/(1–α).

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.24 Proof of the theorem Proof. At least one probe is always necessary. With probability n/m, the first probe hits an occupied slot, and a second probe is necessary. With probability (n–1)/(m–1), the second probe hits an occupied slot, and a third probe is necessary. With probability (n–2)/(m–2), the third probe hits an occupied slot, etc. Observe that for

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.25 Proof (continued) Therefore, the expected number of probes is The textbook has a more rigorous proof and an analysis of successful searches.

October 3, 2005 Copyright © by Erik D. Demaine and Charles E. Leiserson L7.26 Implications of the theorem If α is constant, then accessing an open- addressed hash table takes constant time. If the table is half full, then the expected number of probes is 1/(1–0.5) = 2. If the table is 90%full, then the expected number of probes is 1/(1–0.9) = 10.