Hash Tables:. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Slides:



Advertisements
Similar presentations
1 11. Hash Tables Heejin Park College of Information and Communications Hanyang University.
Advertisements

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. D.4 The Drosophila life cycle Slide number 1 Drosophila life.
Hash Tables Many of the slides are from Prof. Plaisteds resources at University of North Carolina at Chapel Hill.
Hash Tables CIS 606 Spring 2010.
Chapter 13 Control Structures. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display Control Structures Conditional.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter Four Trusses.
1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.
Hash Tables How well do hash tables support dynamic set operations? Implementations –Direct address –Hash functions Collision resolution methods –Universal.
11.Hash Tables Hsu, Lih-Hsing. Computer Theory Lab. Chapter 11P Directed-address tables Direct addressing is a simple technique that works well.
Hashing for dummies 1 For quick search, insert and delete of data from a table. 1 figures and concepts from MITOPENCOURSEWARE
COMP 171 Data Structures and Algorithms Tutorial 10 Hash Tables.
Lecture 10: Search Structures and Hashing
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 4 Image Slides.
Copyright © The McGraw-Hill Companies, Inc
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 2 Image Slides.
Chapter 8 Traffic-Analysis Techniques. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 8-1.
Symbol Tables Symbol tables are used by compilers to keep track of information about variables functions class names type names temporary variables etc.
Design and Analysis of Algorithms Hash Tables Haidong Xue Summer 2012, at GSU.
Data Structures Hash Tables. Hashing Tables l Motivation: symbol tables n A compiler uses a symbol table to relate symbols to associated data u Symbols:
David Luebke 1 10/25/2015 CS 332: Algorithms Skip Lists Hash Tables.
17.16 Synthesis of Thyroid Hormone (TH) Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Slide number: 1.
Data Structures and Algorithms (AT70. 02) Comp. Sc. and Inf. Mgmt
1 Hashing - Introduction Dictionary = a dynamic set that supports the operations INSERT, DELETE, SEARCH Dictionary = a dynamic set that supports the operations.
Introduction to Algorithms 6.046J/18.401J LECTURE7 Hashing I Direct-access tables Resolving collisions by chaining Choosing hash functions Open addressing.
Midterm Midterm is Wednesday next week ! The quiz contains 5 problems = 50 min + 0 min more –Master Theorem/ Examples –Quicksort/ Mergesort –Binary Heaps.
Instructor Neelima Gupta Expected Running Times and Randomized Algorithms Instructor Neelima Gupta
CS6045: Advanced Algorithms Data Structures. Hashing Tables Motivation: symbol tables –A compiler uses a symbol table to relate symbols to associated.
1 Hash Tables Chapter Motivation Many applications require only: –Insert –Search –Delete Examples –Symbol tables –Memory management mechanisms.
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 7.
CSC 413/513: Intro to Algorithms Hash Tables. ● Hash table: ■ Given a table T and a record x, with key (= symbol) and satellite data, we need to support:
Chapter 13 Transportation Demand Analysis. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display
Introduction to Algorithms Second Edition by
Introduction to Algorithms Second Edition by
Chapter 10 Image Slides Title
PowerPoint Presentations
Copyright © The McGraw-Hill Companies, Inc
CSE 2331/5331 Topic 8: Hash Tables CSE 2331/5331.
Chapter 3 Image Slides Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Transparency A.
Introduction to Algorithms Second Edition by
Chapter 3 Image Slides Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Chapter 8 Image Slides Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Chapter R A Review of Basic Concepts and Skills
Introduction to Algorithms Second Edition by
Copyright ©2014 The McGraw-Hill Companies, Inc
Chapter 7 Image Slides Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Introduction to Algorithms Second Edition by
Copyright © 2004 The McGraw-Hill Companies, Inc. All rights reserved.
Assignment Pages: 10 – 12 (Day 1) Questions over Assignment? # 1 – 4
Introduction to Algorithms Second Edition by
Chapter R.2 A Review of Basic Concepts and Skills
Chapter 12 Image Slides Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Introduction to Algorithms Second Edition by
Introduction to Algorithms Second Edition by
Chapter 5 Foundations in Microbiology Fourth Edition
Introduction to Algorithms Second Edition by
Introduction to Algorithms Second Edition by
Title Chapter 22 Image Slides
Chapter 3 Foundations in Microbiology Fourth Edition
Copyright © The McGraw-Hill Companies, Inc
Introduction to Algorithms Second Edition by
CHAPTER 6 SKELETAL SYSTEM
Introduction to Algorithms Second Edition by
Journey to the Cosmic Frontier
Journey to the Cosmic Frontier
Copyright © 2004 The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 24 Image Slides* *See PowerPoint Lecture Outline for a complete ready-made presentation integrating art and lecture notes Copyright © The.
Chapter 3 Introduction to Physical Design of Transportation Facilities.
Introduction to Algorithms Second Edition by
Presentation transcript:

Hash Tables:

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Analysis of hashing with chaining: Given a hash table with m slots and n keys, define load factor = n/m : average number of keys per slot. Assume each key is equally likely to be hashed into any slot: simple uniform hashing (SUH).

Thm: In a hash table in which collisions are resolved by chaining, an unsuccessful search takes expected time Θ(1+ ) under SUH. Proof: Under the assumption of SUH, any un-stored key is equally likely to hash to any of the m slots. The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], which is exactly. Thus, the total time required is Θ(1+ ).

Thm: In a hash table in which collisions are resolved by chaining, a successful search takes time Θ(1+ ), on the average under SUH. Proof: Let the element being searched for equally likely to be any of the n elements stored in the table. The expected time to search successfully for a key. Elements before x in the list were inserted after x was inserted. We want to find the expected number of elements added to xs list after x was added to the list. x....

Let x i denote the ith element into the table, for i =1 to n, and let k i =key[x i ]. Define X ij = I{ h(k i )=h(k j ) }. Under SUH, we have Pr{ h(k i )=h(k j ) } = 1/m = E[X ij ].

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. The multiplication method: 0<A<1

Universal Hashing H={ h: U {0,…,m-1} }, which is a finite collection of hash functions. H is called universal if for each pair of distinct keys k, U, the number of hash functions h H for which h(k)=h( ) is at most |H|/m Define n i = the length of list T[i] Thm: suppose h is randomly selected from H, using chaining to resolve collisions. If k is not in the table, then E[n h(k) ] α. If k is in the table, then E[n h(k) ] 1+α

Proof: –For each pair k and of distinct keys, define X k =I{h(k)=h( )}. –By definition, Pr h {h(k)=h( )} 1/m, and so E[X k ] 1/m. –Define Y k to be the number of keys other than k that hash to the same slot as k, so that

– –If k T, then because k appears in T[h(k)] and the count Y k does not include k, we have n h(k) = Y k + 1 and

Designing a universal class of hash functions: p:prime For any and, define, h a,b :Z p Z m

Theorem: H p,m is universal. Pf: Let k, be two distinct keys in Z p. Given h a,b, Let r=(ak+b) mod p, and s=(a +b) mod p. Then r-s a(k- ) mod p

For any h a,b H p,m, distinct inputs k and map to distinct r and s modulo p. Each possible p(p-1) choices for the pair (a,b) with a0 yields a different resulting pair (r,s) with rs, since we can solve for a and b given r and s: a=((r-s)((k- ) -1 mod p)) mod p b=(r-ak) mod p

There are p(p-1) possible pairs (r,s) with r s, there is a 1-1 correspondence between pairs (a,b) with a0 and (r,s), rs.

For any given pair of inputs k and, if we pick (a,b) uniformly at random from, the resulting pair (r,s) is equally likely to be any pair of distinct values modulo p.

Pr[ k and collide]=Pr r,s [r s mod m ] Given r, the number of s such that s r and sr (mod m) is at most p/m-1((p+m-1)/m)-1 =(p-1)/m s, s+m, s+2m,…., p

Thus, Pr r,s [r s mod m ] ((p-1)/m)/(p-1) =1/m Therefore, for any pair of distinct k, Z p, Pr[h a,b (k)=h a,b ( )] 1/m, so that H p,m is universal.

Open addressing: –There is no list and no element stored outside the table. –Advantage: avoid pointers, potentially yield fewer collisions and faster retrieval.

– –For every k, the probe sequence is a permutation of. –Deletion from an open-address hash table is difficult. –Thus chaining is more common when keys must be deleted.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Linear probing: – ~ an ordinary hash function (auxiliary hash function). –.

Quadratic probing: –,where h is an auxiliary hash function, c 1 and c 2 0 and are constants.

Double hashing: –,where h 1 and h 2 are auxiliary hash functions. – probe sequences; Linear and Quadratic have probe sequences.

Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

Analysis of open-addressing hashing : load factor, with n elements and m slots.

Thm: Given an open-address hash table with load factor, the expected number of probes in an unsuccessful search is at most, assuming uniform hashing.

Pf: –Define the r.v. X to be the number of probes made in an unsuccessful search. –Define A i : the event there is an ith probe and it is to an occupied slot. –Event.

– –The prob. that there is a jth probe and it is to an occupied slot, given that the first j-1 probes were to occupied slots is (n-j+1)/(m-j+1). Why?

–n<m, (n-j)/(m-j) n/m for all 0 j<m. –

Cor: Inserting an element into an open- addressing hash table with load factor α requires at most 1/(1- α) probes on average, assuming uniform hashing.

Thm: Given an open-address hash table with load factor α<1, the expected number of probes in a successful search is at most, assuming uniform hashing and that each key in the table is equally likely to be searched for.

Pf: Suppose we search for a key k. If k was the (i+1)st key inserted into the hash table, the expected number of probes made in a search for k is at most 1/(1-i/m)=m/(m-i).

Averaging over all n keys in the hash table gives us the average number of probes in a successful search:

Perfect Hashing:

Perfect hashing : good for when the keys are static; i.e., once stored, the keys never change, e.g. CD-ROM, the set of reserved word in programming language. Thm : If we store n keys in a hash table of size m=n 2 using a hash function h randomly chosen from a universal class of hash functions, then the probability of there being any collisions < ½.

Proof: Let h be chosen from an universal family. Then each pair collides with probability 1/m, and there are pairs of keys. Let X be a r.v. that counts the number of collisions. When m=n 2,

Thm: If we store n keys in a hash table of size m=n using a hash function h randomly chosen from universal class of hash functions, then, where n j is the number of keys hashing to slot j.

Pf: –It is clear for any nonnegative integer a, –

total number of collisions

Cor: If store n keys in a hash table of size m=n using a hash function h randomly chosen from a universal class of hash functions and we set the size of each secondary hash table to m j =n j 2 for j=0,…,m- 1, then the expected amount of storage required for all secondary hash tables in a perfect hashing scheme is < 2n.

Cor: Same as the above, Pr{total storage 4n} < 1/2 Pf: –By Markovs inequality, Pr{ X t } E[X]/t. –