
Hashing. Amihood Amir, Bar Ilan University, 2014.

Direct Addressing In old days: LD 1,1 LD 2,2 AD 1,2 ST 1,3 Today: C <- A+B

Direct Addressing Compiler keeps track of: C <- A+B A in location 1 B in location 2 C in location 3 How? Tables? Linked lists? Skip lists? AVL trees? B-trees?

Encode? Consider ASCII code of alphabet: (American Standard Code for Information Interchange)

Encode In ASCII code: A to Z are 1000001 to 1011010 (binary).

Encode In ASCII code: A to Z are 1000001 to 1011010. In decimal: 65 to 90. So if we subtract 64 from the ASCII code we get locations 1 to 26.
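The subtraction above can be sketched directly (the helper name `slot` is ours, not from the slides):

```python
# Map capital letters to direct-addressing slots 1..26 by
# subtracting 64 from the ASCII code, as on the slide.
def slot(letter):
    return ord(letter) - 64  # 'A' = 65 -> 1, 'Z' = 90 -> 26
```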

Map In general: Consider a function h: U → {0,…,m}, where U is the universe of keys and m is the number of keys. When we want to access the record of key k ∈ U, just go to location h(k); it points to the location of the record.

Problems: 1. What happens if keys are not consecutive? E.g. parameters: pointer, variable, idnumber. 2. What happens if not all keys are used? E.g. ID numbers of students. 3. What happens if h(k) is not bijective?

Hash Tables: If U is larger than m, h(k) cannot be bijective, so we may encounter collisions: k1 ≠ k2 where h(k1) = h(k2). What do we do then? Solution 1: chaining.

Chaining Chaining puts elements that hash to the same slot in a linked list. [Figure: universe U with actual keys K = {k1,…,k8}; keys that hash to the same slot of table T are chained together in a linked list hanging off that slot.]

Hash operations:
CHAINED-HASH-INSERT(T,x): insert x at the end of list T[h(key[x])].
CHAINED-HASH-SEARCH(T,k): search for an element with key k in list T[h(k)].
CHAINED-HASH-DELETE(T,x): delete x from list T[h(key[x])].
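A minimal Python sketch of these three operations, assuming a table of m chains and a hash function h (the class and method names are ours):

```python
# A chained hash table: one Python list ("chain") per slot.
class ChainedHashTable:
    def __init__(self, m, h=None):
        self.m = m
        self.h = h if h is not None else (lambda k: k % m)
        self.T = [[] for _ in range(m)]

    def insert(self, key, value=None):
        # CHAINED-HASH-INSERT: append the element at the end of T[h(key)].
        self.T[self.h(key)].append((key, value))

    def search(self, key):
        # CHAINED-HASH-SEARCH: scan the chain T[h(key)] for key.
        for k, v in self.T[self.h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # CHAINED-HASH-DELETE: remove the element with this key from its chain.
        chain = self.T[self.h(key)]
        self.T[self.h(key)] = [(k, v) for k, v in chain if k != key]
```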

Time: Depends on the chains. We assume simple uniform hashing: h hashes any key into any slot with equal likelihood.

Load Factor: We will analyze average search time. This depends on how large the hash table is relative to the number of keys. Let n = number of keys, m = hash table size. n/m is the load factor. Write n/m = α.

Average Complexity of Search Notation: the length of list T[j] is n_j; Σ_j n_j = n. Average value of n_j: E(n_j) = n/m = α. Assumptions: h(k) is computed in constant time, and h is a simple uniform hash function.

Searching for a New Key Compute h(k) in time O(1). Get to list T[h(k)] (uniformly distributed), whose length is n_j. Need to search list T[h(k)]. Time for the search: n_j, and E(n_j) = α. Conclude: the average time to search for a new key is Θ(1+α). What is the worst case time?

Searching for an Existing Key Do we gain anything by searching for a key that is already in a list? Note: expected number of elements examined in a successful search for k = expected length of list T[h(k)] when k was inserted + 1 (since we insert elements at the end).

Searching for an Existing Key Let k_1, …, k_n be the keys in order of their insertion. When k_i is inserted, the expected list length is (i−1)/m. So the expected length of a successful search is the average over all such lists:

Calculate:
(1/n) Σ_{i=1..n} (1 + (i−1)/m)
= 1 + (1/(nm)) Σ_{i=1..n} (i−1)
= 1 + (1/(nm)) · n(n−1)/2
= 1 + (n−1)/(2m)
= 1 + α/2 − α/(2n)

Conclude: The average time for searching in a hash table with chaining is Θ(1+α). Note: generally m is chosen as Θ(n), so α = O(1). Thus the average operation on a hash table with chaining takes constant time.

Choosing a Hash Function Assume the keys are natural numbers: {0, 1, 2, …}. A natural mapping of natural numbers to {0, 1, …, m} is to divide by m+1 and take the remainder, i.e. h(k) = k mod (m+1).

The Division Method What are good m's? In the ASCII example: A to Z are 1000001 to 1011010; in decimal, 65 to 90. We subtracted 64 from the ASCII code and got locations 1 to 26. If we choose m = 32 we achieve the same result.

What does it Mean? 32 = 2^5. Dividing by 32 and taking the remainder means taking the 5 LSBs of the binary representation. A is 1000001 → 00001 = 1. Z is 1011010 → 11010 = 26. Is this good? Here, yes. In general?
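A small sketch of the division method with the slide's m = 32 (the name `h_div` is ours); because 32 = 2^5, this keeps only the 5 low bits of the key:

```python
# Division method: h(k) = k mod m. With m = 32, the ASCII codes
# 65..90 ('A'..'Z') land in slots 1..26, as in the earlier example.
def h_div(k, m=32):
    return k % m
```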

Want Even Distribution We want even distribution of the bits. If m = 2^b then only the b least significant bits participate in the hashing. We would like all bits to participate. Solution: use for m a prime number not too close to a power of 2. Always a good idea: check the distribution of real application data.

The Multiplication Method Choose 0 < σ < 1. h(k) = ⌊m · (kσ mod 1)⌋. Meaning: 1. multiply the key by σ and take the fractional part; 2. then multiply by m and take the floor.

Multiplication Method Example Choose σ = 0.618, m = 32 (Knuth recommends σ = (√5−1)/2 ≈ 0.618). k = 2391. h(k): 2391 × 0.618 = 1477.638; fractional part 0.638; 0.638 × 32 = 20.4…; ⌊20.4⌋ = 20. So h(2391) = 20.
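The example can be checked with a short sketch, assuming σ = 0.618 as above (`h_mul` is an illustrative name):

```python
import math

# Multiplication method: h(k) = floor(m * (k*sigma mod 1)).
def h_mul(k, m=32, sigma=0.618):
    frac = (k * sigma) % 1.0      # step 1: fractional part of k*sigma
    return math.floor(m * frac)   # step 2: scale by m, take floor
```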

Why does this involve all bits? Assume: k uses w bits, where w is a computer word, and word operations take constant time. Assume: m = 2^p. Easy implementation of h(k):

Implementing h(k) [Figure: multiply the w-bit key k by ⌊σ·2^w⌋, getting a 2w-bit product; h(k) is the top p bits of the low w-bit word of the product. Discarding the low word amounts to dividing by 2^w, i.e. taking the fractional part of kσ.]
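An integer-only sketch of this picture, assuming w = 64 and p = 5 (so m = 32), with the fixed-point constant ⌊σ·2^w⌋ for σ = (√5−1)/2:

```python
W = 64                                  # word size (assumption)
P = 5                                   # m = 2^P = 32 slots

# A = floor(sigma * 2^W): fixed-point representation of sigma.
A = int(0.6180339887498949 * 2 ** W)

def h_word(k):
    # The low W bits of the 2W-bit product A*k encode the fractional
    # part of k*sigma; its top P bits are h(k). Dropping the low
    # W - P bits is the "division by 2^W" from the slide.
    low_word = (A * k) & ((1 << W) - 1)
    return low_word >> (W - P)
```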

Worst Case Problem: A malicious adversary can make sure all keys hash to the same entry. Usual solution: randomize. But then how do we get to the correct hash table entry upon search?

Universal Hashing We want to make sure that it is not the case that the keys k_1, k_2, …, k_n always hash to the same table entry. How can we ensure that? Use a number of different hash functions and employ one at random.

Universal Hashing A collection H of hash functions from universe U to {0,…,m} is universal if for every pair of keys x, y ∈ U with x ≠ y, the number of hash functions h ∈ H for which h(x) = h(y) is |H|/m. This means: if we choose a function h ∈ H at random, the chance of a collision between x and y, where x ≠ y, is 1/m.

Constructing a Universal Collection of Hash Functions Choose m to be prime. Decompose a key k into r+1 pieces: k = [k_0, k_1, …, k_r], each piece of log m bits, so each k_i has value < m.

Universal Hashing Construction Consider the m^{r+1} sequences of length r+1 over {0,…,m−1}. Each sequence a = [a_0, …, a_r], a_i ∈ {0,…,m−1}, defines a hash function h_a(k) = (Σ_{i=0..r} a_i·k_i) mod m. The universal class is H = ∪_a {h_a}; to hash, choose one h_a from H at random.
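A sketch of this construction, with an illustrative prime m = 13 and r = 3 (all names are ours):

```python
import random

M = 13   # prime table size (assumption for the example)
R = 3    # keys are split into R+1 base-M digits

def digits(k):
    # Decompose k as [k_0, ..., k_R], each digit < M.
    return [(k // M ** i) % M for i in range(R + 1)]

def random_hash():
    # Pick a = [a_0, ..., a_R] uniformly at random;
    # return h_a(k) = (sum of a_i * k_i) mod M.
    a = [random.randrange(M) for _ in range(R + 1)]
    return lambda k: sum(ai * ki for ai, ki in zip(a, digits(k))) % M
```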

This is a Universal Class Need to show that the probability that x ≠ y collide is 1/m. Because x ≠ y, there is some i for which x_i ≠ y_i. Wlog assume i = 0. Since m is prime, for any fixed [a_1,…,a_r] there is exactly one value that a_0 can take that satisfies h_a(x) = h_a(y). Why?

Proof Continuation… h_a(x) − h_a(y) ≡ 0 (mod m) means: a_0(x_0 − y_0) ≡ −Σ_{i=1..r} a_i(x_i − y_i) (mod m). But x_0 − y_0 is non-zero, and for prime m it has a unique multiplicative inverse modulo m. Thus a_0 ≡ −(Σ_{i=1..r} a_i(x_i − y_i)) · (x_0 − y_0)^{−1} (mod m): only one value between 0 and m−1 that a_0 can take.

End of Proof Conclude: each pair x ≠ y may collide once for each choice of [a_1,…,a_r], and there are m^r possible values for [a_1,…,a_r]. However, since there are m^{r+1} different hash functions, the keys x and y collide with probability m^r / m^{r+1} = 1/m.

Open Addressing Solution 2 to the collision problem. We don't want to use chaining (to save the space and complexity of pointers). Where should we put a colliding key? Inside the hash table, in an empty slot. Problem: which empty slot? We assume a probing hash function.

Idea behind Probing A hashing function of the form h : U × {0,…,m} → {0,…,m}. The initial hash value is h(k,0); h(k,i) for i = 1,…,m gives the subsequent values. We try locations h(k,0), h(k,1), etc. until an open slot is found in the hash table. If none is found we report an overflow error.

Average Complexity of Probing Assume uniform hashing, i.e. the probe sequence of a key k is equally likely to be any permutation of {0,…,m}. What is the expected number of probes with load factor α? Note: since this is open-address hashing, α = n/m < 1.

Probing Complexity – Case I Case I: key not in hash table. Key not in table ⇒ every probe, except for the last, is to an occupied slot. Let p_i = Pr{exactly i probes access occupied slots}. Then Σ_{i≥0} p_i = 1, and the expected number of probes is 1 + Σ_{i≥0} i·p_i.

Probing Complexity – Case I Note: for i > n, p_i = 0. Claim: Σ_{i≥0} i·p_i = Σ_{i≥1} q_i, where q_i = Pr{at least i probes access occupied slots}. Why?

Probing Complexity – Case I q2q2 q3q3 q4q4

What is q_i? q_1 = n/m: there are n elements in the table, so the probability that the first slot probed is occupied is n/m. q_2 = (n/m)·((n−1)/(m−1)). In general, q_i = (n/m)·((n−1)/(m−1))⋯((n−i+1)/(m−i+1)) ≤ (n/m)^i = α^i, since (n−j)/(m−j) ≤ n/m when n ≤ m.

Probing Complexity – Case I Conclude: the expected complexity of inserting an element into the hash table (= an unsuccessful search) is 1 + Σ_{i≥1} q_i ≤ 1 + α + α² + α³ + ⋯ = 1/(1−α). Example: if the hash table is half full, the expected number of probes is 2. If it is 90% full, the expected number is 10.
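Plugging the slide's two load factors into the 1/(1−α) bound (the helper name is ours):

```python
def probes_unsuccessful(alpha):
    # Expected number of probes in an unsuccessful search
    # (= insertion) under uniform hashing: 1/(1 - alpha).
    return 1 / (1 - alpha)
```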

Probing Complexity – Case II Case II: key in hash table. As in the chaining case, expected search time for a key in the table = expected insertion time. We have shown: the expected time to insert key i+1 is 1/(1 − i/m) = m/(m−i).

Probing Complexity – Case II Conclude: expected key insertion time, averaged over all n insertions: (1/n) Σ_{i=0..n−1} m/(m−i) = (m/n) Σ_{i=0..n−1} 1/(m−i) = (1/α) Σ_{k=m−n+1..m} 1/k. Need to compute Σ_{k=m−n+1..m} 1/k.

Probing Complexity – Case II In general: what is Σ_{k=a..b} f(k)?

Probing Complexity – Case II Consider any monotonic function f(x) > 0. Approximate Σ_{k=a..b} f(k) by the area under the function.

Probing Complexity – Case II [Figure: the graph of a monotone f(x) over [a−1, b+1], with unit-width rectangles showing that Σ_{k=a..b} f(k) is sandwiched between the integrals of f on the shifted intervals.]

Probing Complexity – Case II Conclude: approximate Σ_{k=a..b} f(k) by ∫ f(x) dx. For monotonically decreasing f: ∫_{a..b+1} f(x) dx ≤ Σ_{k=a..b} f(k) ≤ ∫_{a−1..b} f(x) dx.

Probing Complexity – Case II Conclude: approximate Σ_{k=m−n+1..m} 1/k by ∫_{m−n..m} (1/x) dx = ln m − ln(m−n) = ln(m/(m−n)) = ln(1/(1−α)).

Probing Complexity – Case II Conclude: expected key insertion time ≈ (1/α) ln(1/(1−α)). Example: if the hash table is half full, the expected number of probes is 2 ln 2 ≈ 1.39. If it is 90% full, the expected number is ≈ 2.6.
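The same check for the successful-search bound (1/α)·ln(1/(1−α)):

```python
import math

def probes_successful(alpha):
    # Expected number of probes in a successful search under
    # uniform hashing: (1/alpha) * ln(1/(1 - alpha)).
    return (1 / alpha) * math.log(1 / (1 - alpha))
```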

Types of Probing What functions can be used for probing? 1. Linear 2. Quadratic 3. Double hashing

Linear Probing
j <- h(k)
while T[j] is occupied: j <- j+1
T[j] <- k
Attention: 1. Wrap around when the end of the table is reached. 2. If the table is full, report an overflow error.
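A runnable sketch of this pseudocode, with the wrap-around and overflow cases made explicit (function names are ours; the table stores raw keys, with None marking empty slots):

```python
def lp_insert(T, k, h):
    # Linear probing insert: start at h(k), step forward until an
    # empty slot is found; wrap around; overflow if the table is full.
    m = len(T)
    j = h(k)
    for _ in range(m):
        if T[j] is None:
            T[j] = k
            return j
        j = (j + 1) % m          # wrap around at the end of the table
    raise OverflowError("hash table is full")

def lp_search(T, k, h):
    # Follow the same probe sequence; an empty slot means k is absent.
    m = len(T)
    j = h(k)
    for _ in range(m):
        if T[j] is None:
            return None
        if T[j] == k:
            return j
        j = (j + 1) % m
    return None
```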

Linear Probing - Discussion Pro: easy to implement. Con: clustering. An empty slot following a cluster of length i has probability (i+1)/m to be filled next.

Idea behind Probing A hashing function of the form h : U × {0,…,m} → {0,…,m}. The initial hash function h'(k) gives the first value; if that slot is taken, we try h(k,i) for i = 0,…,m. In linear probing we have: h(k,i) = (h'(k)+i) mod (m+1).

For Uniformity: There are m! different probe paths of length m; ideally we should be able to generate them all. Linear probing generates only m paths, so it is clearly not uniform. The more different paths generated, the better.

Quadratic Probing h(k,i) = (h'(k) + c_1·i + c_2·i²) mod m. Works better, but: 1. still only m different paths; 2. note that if h'(x) = h'(y) then h(x,i) = h(y,i) for all i. This causes what is called secondary clustering.

Double Hashing h(k,i) = (h_1(k) + i·h_2(k)) mod m. To get a permutation, h_2(k) must be relatively prime to m. Possibility: h_1(k) = k mod m, h_2(k) = 1 + (k mod m'). Either m a power of 2 and m' odd, or m and m' both prime with distance 2 between them.

Double Hashing Example h(k,i) = (h_1(k) + i·h_2(k)) mod m, with h_1(k) = k mod 13 and h_2(k) = 1 + (k mod 11). Probing sequence of 14 (h_2 = 4): 14 mod 13 = 1, (1+4) mod 13 = 5, (1+8) mod 13 = 9. Probing sequence of 27 (h_2 = 6): 27 mod 13 = 1, (1+6) mod 13 = 7, (1+12) mod 13 = 0.
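A sketch generating these probe sequences, assuming m = 13 and m' = 11 as in the example (`probe_sequence` is an illustrative name):

```python
def probe_sequence(k, count=3, m=13, m2=11):
    # Double hashing: h(k,i) = (h1(k) + i*h2(k)) mod m,
    # with h1(k) = k mod m and h2(k) = 1 + (k mod m2).
    h1 = k % m
    h2 = 1 + (k % m2)            # never 0, so the sequence always moves
    return [(h1 + i * h2) % m for i in range(count)]

print(probe_sequence(14))  # [1, 5, 9]
```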

Double Hashing Example Advantage: m² different probe sequences are generated, one per (h_1(k), h_2(k)) pair.

Deletions Problem: If an element is simply removed, a later search may conclude a key is not in the table even though it is (the search stops at the emptied slot). Solution: when an element is deleted, mark its slot with a flag. Drawback: many deletions can cause long searches, since flagged slots must still be probed past. Therefore, in a very dynamic setting use chaining.
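A minimal sketch of deletion with a flag ("tombstone") over a linear-probing table (the names `DELETED`, `op_search`, `op_delete` are ours): search treats a flagged slot as occupied and keeps probing, so keys inserted past it are still found.

```python
DELETED = object()   # sentinel flag marking a deleted slot

def op_delete(T, k, h):
    # Mark the slot holding k as DELETED instead of emptying it.
    m = len(T)
    j = h(k)
    for _ in range(m):
        if T[j] is None:
            return False                  # empty slot: k was never here
        if T[j] is not DELETED and T[j] == k:
            T[j] = DELETED
            return True
        j = (j + 1) % m
    return False

def op_search(T, k, h):
    # Probe past DELETED slots; only a truly empty slot stops the search.
    m = len(T)
    j = h(k)
    for _ in range(m):
        if T[j] is None:
            return None
        if T[j] is not DELETED and T[j] == k:
            return j
        j = (j + 1) % m
    return None
```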