Introduction to Algorithms 6.046J/18.401J


Introduction to Algorithms 6.046J/18.401J
LECTURE 8: Hashing II
Universal hashing
Universality theorem
Constructing a set of universal hash functions
Perfect hashing
Prof. Charles E. Leiserson
Copyright © 2001-5 by Erik D. Demaine and Charles E. Leiserson

A weakness of hashing
Problem: For any fixed hash function h, there exists a set of keys that can cause the average access time of a hash table to skyrocket: an adversary can pick all keys from {k ∈ U : h(k) = i} for some slot i.
IDEA: Choose the hash function at random, independently of the keys. Even if an adversary can see your code, he or she cannot find a bad set of keys, since he or she doesn't know exactly which hash function will be chosen.
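The toy sketch below (not part of the original slides) illustrates the point: against a fixed, publicly known hash function an adversary can choose keys that all land in one slot, while a hash function drawn at random after the keys are fixed spreads the same keys out. The names fixed_h and random_h are ad hoc.

```python
import random

m = 8  # number of slots

# A fixed, publicly known hash function: the adversary picks keys from
# {k : fixed_h(k) = 0}, so every key lands in the same chain.
def fixed_h(k):
    return k % m

adversarial_keys = [m * i for i in range(100)]
print(len({fixed_h(k) for k in adversarial_keys}))    # 1 slot used

# Drawing the hash function at random AFTER the keys are chosen (here: a
# uniformly random slot per key) defeats the adversary, who cannot predict
# where any key will land.
random_h = {k: random.randrange(m) for k in adversarial_keys}
print(len({random_h[k] for k in adversarial_keys}))   # typically close to m
```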

Universal hashing
Definition. Let U be a universe of keys, and let H be a finite collection of hash functions, each mapping U to {0, 1, …, m–1}. We say H is universal if for all x, y ∈ U with x ≠ y, we have
|{h ∈ H : h(x) = h(y)}| ≤ |H|/m.
That is, the chance of a collision between x and y is ≤ 1/m if we choose h randomly from H.
[Figure: the set H with its subset {h ∈ H : h(x) = h(y)}, which contains at most |H|/m functions.]
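As a sanity check (not from the slides), the definition can be tested directly by brute force on a tiny family; is_universal is an ad hoc helper, and the example uses the family of all functions from a 3-key universe into m = 2 slots, which is universal.

```python
from itertools import product

def is_universal(family, universe, m):
    """Check the definition directly: every pair of distinct keys x != y
    must collide under at most |H|/m of the functions in the family."""
    bound = len(family) / m
    return all(
        sum(1 for h in family if h(x) == h(y)) <= bound
        for x in universe for y in universe if x != y
    )

universe = [0, 1, 2]
m = 2
# The family of ALL functions universe -> {0, 1}: each distinct pair of keys
# collides under exactly |H|/m = 4 of its 8 functions.
all_functions = [
    (lambda table: (lambda k: table[k]))(dict(zip(universe, values)))
    for values in product(range(m), repeat=len(universe))
]
print(is_universal(all_functions, universe, m))  # True
```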

Universality is good
Theorem. Let h be a hash function chosen (uniformly) at random from a universal set H of hash functions. Suppose h is used to hash n arbitrary keys into the m slots of a table T. Then, for a given key x, we have
E[#collisions with x] < n/m.
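A quick Monte Carlo check of the theorem (not from the slides): draw h at random, hash n keys, and count collisions with a fixed key x. For brevity the random h is a uniformly random slot assignment, i.e. a draw from the (universal) family of all functions; expected_collisions_with_x is an ad hoc name.

```python
import random

def expected_collisions_with_x(n, m, trials=10_000):
    """Estimate E[#collisions with x] when h is drawn uniformly from the
    family of all functions into m slots and n keys (including x) are hashed."""
    keys = list(range(n))          # key 0 plays the role of x
    total = 0
    for _ in range(trials):
        h = {k: random.randrange(m) for k in keys}
        total += sum(1 for k in keys[1:] if h[k] == h[0])
    return total / trials

n, m = 50, 10
print(expected_collisions_with_x(n, m))  # about (n - 1)/m = 4.9 < n/m = 5
```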

Proof of theorem
Proof. Let C_x be the random variable denoting the total number of collisions of keys in T with x, and let c_xy = 1 if h(x) = h(y) and c_xy = 0 otherwise.
Note: E[c_xy] = 1/m and C_x = Σ_{y ∈ T−{x}} c_xy.

Proof (continued)
Take expectation of both sides and expand:
E[C_x] = E[ Σ_{y ∈ T−{x}} c_xy ]
       = Σ_{y ∈ T−{x}} E[c_xy]        (linearity of expectation)
       = Σ_{y ∈ T−{x}} 1/m            (E[c_xy] = 1/m)
       = (n − 1)/m < n/m.             (algebra) ∎

Constructing a set of universal hash functions
Let m be prime. Decompose key k into r + 1 digits, each with value in the set {0, 1, …, m–1}. That is, let k = ⟨k_0, k_1, …, k_r⟩, where 0 ≤ k_i < m.
Randomized strategy: Pick a = ⟨a_0, a_1, …, a_r⟩, where each a_i is chosen randomly from {0, 1, …, m–1}.
Define h_a(k) = ( Σ_{i=0}^{r} a_i k_i ) mod m   (the dot product of a and k, modulo m). REMEMBER THIS!
How big is H = {h_a}? |H| = m^(r+1).
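A minimal sketch of the dot-product family in code (not from the slides; make_dot_product_hash is an ad hoc name). Drawing one h_a means drawing the coefficient vector a at random; evaluating it means decomposing the key into base-m digits and taking the dot product mod m.

```python
import random

def make_dot_product_hash(m, r):
    """Draw h_a from the dot-product family: m is prime, keys are viewed
    as r + 1 base-m digits, and a is a random vector of r + 1 digits."""
    a = [random.randrange(m) for _ in range(r + 1)]
    def h(k):
        digits = []
        for _ in range(r + 1):      # decompose k into digits k_0, ..., k_r
            digits.append(k % m)
            k //= m
        return sum(ai * ki for ai, ki in zip(a, digits)) % m
    return h

# Example: a table with m = 17 slots and keys below 17**3 (so r = 2).
h_a = make_dot_product_hash(m=17, r=2)
print([h_a(k) for k in (0, 1, 42, 4912)])  # four slots in {0, ..., 16}
```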

Universality of dot-product hash functions
Theorem. The set H = {h_a} is universal.
Proof. Let x = ⟨x_0, x_1, …, x_r⟩ and y = ⟨y_0, y_1, …, y_r⟩ be distinct keys. Thus, they differ in at least one digit position, wlog position 0. For how many h_a ∈ H do x and y collide?
We must have h_a(x) = h_a(y), which implies that
Σ_{i=0}^{r} a_i x_i ≡ Σ_{i=0}^{r} a_i y_i (mod m).

Proof (continued)
Equivalently, we have
Σ_{i=0}^{r} a_i (x_i − y_i) ≡ 0 (mod m),
or
a_0 (x_0 − y_0) + Σ_{i=1}^{r} a_i (x_i − y_i) ≡ 0 (mod m),
which implies that
a_0 (x_0 − y_0) ≡ − Σ_{i=1}^{r} a_i (x_i − y_i) (mod m).

Fact from number theory
Theorem. Let m be prime. For any z ∈ Z_m such that z ≠ 0, there exists a unique z^(−1) ∈ Z_m such that z · z^(−1) ≡ 1 (mod m).
Example: m = 7.
  z      : 1 2 3 4 5 6
  z^(−1) : 1 4 5 2 3 6
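The table above can be reproduced in a few lines; Python's built-in pow accepts a negative exponent for modular inverses (Python 3.8+), and for prime m this agrees with pow(z, m − 2, m) by Fermat's little theorem.

```python
m = 7
for z in range(1, m):
    inv = pow(z, -1, m)            # modular inverse; equals pow(z, m - 2, m)
    assert (z * inv) % m == 1
    print(z, inv)                  # pairs (1,1) (2,4) (3,5) (4,2) (5,3) (6,6)
```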

Back to the proof
We have
a_0 (x_0 − y_0) ≡ − Σ_{i=1}^{r} a_i (x_i − y_i) (mod m),
and since x_0 ≠ y_0, an inverse (x_0 − y_0)^(−1) must exist, which implies that
a_0 ≡ ( − Σ_{i=1}^{r} a_i (x_i − y_i) ) · (x_0 − y_0)^(−1) (mod m).
Thus, for any choices of a_1, a_2, …, a_r, exactly one choice of a_0 causes x and y to collide.

Proof (completed)
Q. How many h_a's cause x and y to collide?
A. There are m choices for each of a_1, a_2, …, a_r, but once these are chosen, exactly one choice for a_0 causes x and y to collide, namely
a_0 = ( ( − Σ_{i=1}^{r} a_i (x_i − y_i) ) · (x_0 − y_0)^(−1) ) mod m.
Thus, the number of h_a's that cause x and y to collide is m^r · 1 = m^r = |H|/m. ∎
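The count m^r = |H|/m can also be verified by brute force for a small prime m (a check added here, not from the slides; dot_hash is an ad hoc helper that takes keys already split into digits).

```python
from itertools import product

def dot_hash(a, digits, m):
    """h_a applied to a key given as its base-m digits."""
    return sum(ai * ki for ai, ki in zip(a, digits)) % m

m, r = 5, 2                        # |H| = m**(r + 1) = 125
x = (1, 2, 3)                      # digits x_0, x_1, x_2
y = (4, 2, 3)                      # distinct from x (they differ in position 0)
collisions = sum(
    1 for a in product(range(m), repeat=r + 1)
    if dot_hash(a, x, m) == dot_hash(a, y, m)
)
print(collisions, m ** r)          # both print 25 = |H|/m
```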

Perfect hashing
Given a set of n keys, construct a static hash table of size m = O(n) such that SEARCH takes Θ(1) time in the worst case.
IDEA: Two-level scheme with universal hashing at both levels. No collisions at level 2!
[Figure: a level-1 table T whose nonempty slots store the parameters (m_i, a_i) of a level-2 table S_i holding that slot's keys with no collisions; e.g., keys 14 and 27 both hash to slot 1 of T and are separated in S_1.]
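Below is a sketch of the two-level construction (not the lecture's code). To keep it short it uses the universal family h(k) = ((a·k + b) mod p) mod m from CLRS Section 11.3.3 instead of the dot-product family, since level-2 table sizes need not be prime; build_perfect_table, search, and the 4n storage cutoff are ad hoc names and an illustrative constant. The retry loops rely on the bounds proved on the next slides.

```python
import random

P = (1 << 31) - 1   # a prime larger than any key we store

def random_h(m):
    """Draw h(k) = ((a*k + b) mod p) mod m from the universal family
    of CLRS Section 11.3.3 (p prime, p > all keys)."""
    a = random.randrange(1, P)
    b = random.randrange(P)
    return lambda k: ((a * k + b) % P) % m

def build_perfect_table(keys):
    """Static two-level perfect hashing: level 1 hashes n keys into n slots;
    each slot's keys are rehashed into a table of quadratic size until
    collision-free (expected O(1) retries by the Markov argument)."""
    n = len(keys)
    while True:                             # level 1: m = n
        h1 = random_h(n)
        buckets = [[] for _ in range(n)]
        for k in keys:
            buckets[h1(k)].append(k)
        if sum(len(b) ** 2 for b in buckets) < 4 * n:   # keep storage O(n)
            break
    level2 = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:                         # level 2: no collisions allowed
            h2 = random_h(size) if size else None
            slots = [None] * size
            ok = True
            for k in bucket:
                i = h2(k)
                if slots[i] is not None:
                    ok = False
                    break
                slots[i] = k
            if ok:
                break
        level2.append((h2, slots))
    return h1, level2

def search(table, k):
    """Worst-case O(1): one probe at level 1, one at level 2."""
    h1, level2 = table
    h2, slots = level2[h1(k)]
    return bool(slots) and slots[h2(k)] == k

keys = [40, 37, 22, 26, 14, 27, 86]
table = build_perfect_table(keys)
print(all(search(table, k) for k in keys), search(table, 99))  # True False
```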

Collisions at level 2
Theorem. Let H be a class of universal hash functions for a table of size m = n². Then, if we use a random h ∈ H to hash n keys into the table, the expected number of collisions is at most 1/2.
Proof. By the definition of universality, the probability that 2 given keys in the table collide under h is 1/m = 1/n². Since there are (n choose 2) pairs of keys that can possibly collide, the expected number of collisions is at most
(n choose 2) · 1/n² = (n(n − 1)/2) · 1/n² < 1/2. ∎

No collisions at level 2
Corollary. The probability of no collisions is at least 1/2.
Proof. Markov's inequality says that for any nonnegative random variable X, we have
Pr{X ≥ t} ≤ E[X]/t.
Applying this inequality with t = 1, we find that the probability of 1 or more collisions is at most 1/2.
Thus, just by testing random hash functions in H, we'll quickly find one that works.

Analysis of storage
For the level-1 hash table T, choose m = n, and let n_i be the random variable for the number of keys that hash to slot i in T. By using n_i² slots for the level-2 hash table S_i, the expected total storage required for the two-level scheme is therefore
E[ Θ( n + Σ_{i=0}^{m−1} n_i² ) ] = Θ(n),
since the analysis is identical to the analysis from recitation of the expected running time of bucket sort. (For a probability bound, apply Markov's inequality.)
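As an empirical confirmation (not from the slides; level2_storage is an ad hoc name), the expected value of Σ n_i² when n keys are thrown into m = n slots by a random function is about 2n, so the two-level scheme's total storage is indeed Θ(n).

```python
import random

def level2_storage(n, trials=1000):
    """Estimate E[sum_i n_i**2] when n keys are hashed into m = n slots
    by a uniformly random function."""
    total = 0
    for _ in range(trials):
        counts = [0] * n
        for _ in range(n):
            counts[random.randrange(n)] += 1
        total += sum(c * c for c in counts)
    return total / trials

n = 200
print(level2_storage(n), 2 * n)    # estimate is close to 2n - 1 = 399
```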