1
Randomized Algorithms CS648
Lecture 11 Hashing - I
2
Problem Definition
$U = \{1, 2, \ldots, m\}$ is called the universe.
$S \subseteq U$ and $s = |S|$, with $s \ll m$.
Examples: $m =$ [value missing in source], $s = 10^3$.
Aim: Maintain a data structure for storing $S$ that supports the search query "Does $i \in S$?" for any given $i \in U$.
3
Solutions
Solutions with worst case guarantees:
Solution for static $S$: an array storing $S$ in sorted order.
Solution for dynamic $S$: height-balanced search trees (AVL trees, red-black trees, ...).
Time per operation: $O(\log s)$. Space: $O(s)$.
Alternative: Time per operation: $O(1)$. Space: $O(m)$.
Solutions used in practice with no worst case guarantees: hashing.
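The two worst-case options listed above can be sketched as follows. This is a minimal illustration, assuming Python's standard `bisect` module for the sorted-array search. The class names `SortedArraySet` and `DirectAddressSet` are hypothetical, and the second class is only one possible reading of the $O(1)$-time, $O(m)$-space alternative (one flag per universe element).

```python
from bisect import bisect_left


class SortedArraySet:
    """Static S stored in sorted order: O(log s) search, O(s) space."""

    def __init__(self, elements):
        self.items = sorted(set(elements))

    def contains(self, i):
        # Binary search for the leftmost position where i could be inserted.
        pos = bisect_left(self.items, i)
        return pos < len(self.items) and self.items[pos] == i


class DirectAddressSet:
    """One flag per universe element: O(1) search, O(m) space."""

    def __init__(self, elements, m):
        # The universe is assumed to be {1, ..., m}; slot 0 is unused.
        self.present = bytearray(m + 1)
        for i in elements:
            self.present[i] = 1

    def contains(self, i):
        return self.present[i] == 1
```

The second structure answers queries with a single array access, but its size depends on $m$ rather than on $s$, which is the cost hashing tries to avoid.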
4
Hashing
Hash table $T$: an array of size $n$.
Hash function $h : U \to [n]$.
Answering a query "Does $i \in S$?": compute $k \leftarrow h(i)$, then search the list stored at $T[k]$.
Properties of $h$: $h(i)$ is computable in $O(1)$ time, and the space required by $h$ is $O(1)$.
[Figure: the table $T$ with indices $0, 1, \ldots, n-1$; each entry points to a list of the elements of $S$ hashed to it.]
Question: How many bits are needed to encode $h$?
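The query procedure above can be sketched as a small chained hash table. This is an illustration only: the class name `ChainedHashTable` is hypothetical, and the default hash function `i % n` is a placeholder assumed for this sketch.

```python
class ChainedHashTable:
    """Hash table T of size n; T[k] lists the elements of S hashed to k."""

    def __init__(self, elements, n, h=None):
        self.n = n
        # Placeholder hash function (an assumption for this sketch): i mod n.
        self.h = h if h is not None else (lambda i: i % n)
        self.T = [[] for _ in range(n)]
        for i in elements:
            k = self.h(i)
            if i not in self.T[k]:
                self.T[k].append(i)

    def contains(self, i):
        # k <- h(i); search the list stored at T[k].
        return i in self.T[self.h(i)]
```

The cost of `contains` is the length of the list at `T[h(i)]`, which is why the number of collisions governs the search time.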
5
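Collision
Definition: Two elements $i, j \in U$ are said to collide under hash function $h$ if $h(i) = h(j)$.
Worst case time complexity of searching for an item $i$: the number of elements in $S$ colliding with $i$.
A discouraging fact: No hash function can be found which is good for all $S$.
Proof: By the pigeonhole principle, at least $m/n$ elements of $U$ are mapped to a single index in $T$. If $S$ is chosen from among these elements (possible whenever $s \le m/n$), all of $S$ lands in one list and a search takes time proportional to $s$.
[Figure: the table $T$ with indices $0, 1, \ldots, n-1$; one index holds a list of $m/n$ elements.]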
7
Hashing
A very popular heuristic since the 1950s.
Achieves $O(1)$ search time in practice.
Worst case guarantee on search time: $O(s)$.
Question: Can we have a hashing scheme that ensures an $O(1)$ worst case guarantee on search time, $O(s)$ space, and expected $O(s)$ preprocessing time?
The following result answered this in the affirmative:
Michael Fredman, János Komlós, Endre Szemerédi. Storing a Sparse Table with O(1) Worst Case Access Time. Journal of the ACM (Volume 31, Issue 3), 1984.
8
Why does hashing work so well in practice?
9
Why does hashing work so well in practice?
Question: What is the simplest hash function $h : U \to [n]$?
Answer: $h(i) = i \bmod n$.
Hashing works so well in practice because the set $S$ is usually a uniformly random subset of $U$. Let us give a theoretical justification for this claim.
10
Why does hashing work so well in practice?
Let $y_1, y_2, \ldots, y_s$ denote $s$ elements selected uniformly at random from $U$ to form $S$.
Question: What is the expected number of elements colliding with $y_1$?
Answer: Suppose $y_1$ takes the value $i$. Then $P(y_j \text{ collides with } y_1) = {?}$
Hints: How many possible values can $y_j$ take? How many of those values can collide with $i$?
[Figure: the universe $1, 2, \ldots, m$ mapped to the table with indices $0, \ldots, n-1$; the values $i, i+n, i+2n, i+3n, \ldots$ share one index.]
11
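Why does hashing work so well in practice?
Let $y_1, y_2, \ldots, y_s$ denote $s$ elements selected uniformly at random from $U$ to form $S$.
Question: What is the expected number of elements colliding with $y_1$?
Answer: Suppose $y_1$ takes the value $i$. The values which may collide with $i$ under the hash function $h(x) = x \bmod n$ are $i+n, i+2n, i+3n, \ldots$, that is, the values congruent to $i$ modulo $n$. At most $m/n$ of the $m-1$ values that $y_j$ can take are of this form, so
$$P(y_j \text{ collides with } y_1) \le \frac{m/n}{m-1}.$$
By linearity of expectation, the expected number of elements of $S$ colliding with $y_1$ is at most
$$\frac{m/n}{m-1}\,(s-1) = O(1) \quad \text{for } n = \Theta(s).$$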
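The counting argument above can be checked numerically. The sketch below is illustrative, with hypothetical function names: it counts the values of $U$ that share a slot with $i$ under $h(x) = x \bmod n$ and turns that count into the exact expectation, using `fractions.Fraction` to avoid rounding.

```python
from fractions import Fraction


def colliding_values(m, n, i):
    """Number of j in {1, ..., m}, j != i, with j mod n == i mod n."""
    r = i % n
    # Values in {1..m} congruent to r: r, r+n, ... (or n, 2n, ... when r == 0).
    count = (m - r) // n + (1 if r != 0 else 0)
    return count - 1  # exclude i itself


def expected_collisions(m, n, s, i):
    """Exact expected number of elements of S colliding with y_1 = i, when the
    other s - 1 elements form a uniformly random subset of U without i."""
    # Each other element is marginally uniform over the m - 1 remaining values.
    return Fraction(colliding_values(m, n, i) * (s - 1), m - 1)
```

For any choice of $i$, the returned value stays below $\frac{m/n}{m-1}(s-1)$, which is a constant once $n$ is proportional to $s$.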
12
Why does hashing work so well in practice?
Conclusion:
$h(i) = i \bmod n$ works so well because, for a uniformly random subset of $U$, the expected number of collisions at an index of $T$ is $O(1)$.
It is easy to fool this hash function so that it takes $O(s)$ search time (do it as a simple exercise).
This makes us think: "How can we achieve worst case $O(1)$ search time for a given set $S$?"
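One possible answer to the exercise, offered as a sketch rather than as the intended solution: pick $s$ multiples of $n$, which all land in slot 0 under $h(i) = i \bmod n$. The function name `adversarial_set` is hypothetical, and the construction assumes $s \cdot n \le m$.

```python
def adversarial_set(n, s, m):
    """Return s elements of {1, ..., m} that all hash to slot 0 under i mod n."""
    if s * n > m:
        raise ValueError("universe too small for this construction")
    # n, 2n, ..., s*n are distinct and all congruent to 0 modulo n.
    return [k * n for k in range(1, s + 1)]
```

With such a set, the list at index 0 has length $s$, so an unsuccessful search there inspects every element of $S$.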
13
How to achieve worst case O(1) search time
14
Key idea to achieve worst case O(1) search time
Observation: Of course, no single hash function is good for every possible $S$. But we may strive for a hash function which is good for a given $S$.
A promising direction: Find a set of hash functions $H$ such that, for any given $S$, many of them are good. Select a function randomly from $H$ and try it for $S$.
The notion of goodness is captured formally by the universal hash family defined on the following slide.
15
Universal Hash Family
16
Universal Hash Family
Definition: A collection $H$ of hash functions is said to be universal if there exists a constant $c$ such that for any distinct $i, j \in U$,
$$P_{h \in_r H}\big(h(i) = h(j)\big) \le \frac{c}{n}.$$
Fact: The set of all functions from $U$ to $[n]$ is a universal hash family (do it as homework).
Question: Can we use the set of all functions as a universal hash family in real life?
Answer: No. There are $n^m$ possible functions, and the encodings of any two of them must differ in at least one bit. So at least one of them requires $m \log n$ bits to encode, and the space occupied by a randomly chosen hash function is too large.
Question: Does there exist a universal hash family whose hash functions have a compact encoding?
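The space argument above can be written out as a short calculation. The concrete values $m = 2^{32}$ and $n = 2^{10}$ in the second line are illustrative assumptions, not numbers from the lecture.

```latex
% Counting argument: number of functions from U to [n], and the bits needed to tell them apart.
\[
  \bigl|\{\, h : U \to [n] \,\}\bigr| = n^{m}
  \quad\Longrightarrow\quad
  \text{some } h \text{ needs at least }
  \lceil \log_2 n^{m} \rceil = \lceil m \log_2 n \rceil \text{ bits.}
\]
% Illustrative numbers (an assumption): m = 2^{32}, n = 2^{10}.
\[
  m \log_2 n = 2^{32} \cdot 10 \text{ bits} = 5 \cdot 2^{30} \text{ bytes} = 5 \text{ GiB}.
\]
```

Storing such a function costs about as much as a table indexed by the whole universe, which defeats the purpose of hashing.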
17
Universal Hash Family
There indeed exist many $c$-universal hash families whose hash functions have a compact encoding.
Example: Let $h_a : U \to [n]$ be defined as $h_a(i) = ((a i) \bmod p) \bmod n$. Then $H = \{ h_a \mid 1 \le a \le p-1 \}$ is $c$-universal. Here $p$ is assumed to be a prime larger than $m$, as is usual for this family.
This looks complicated. In the next class we shall show that it is very natural and intuitive. For today's lecture, you don't need it.
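For small parameters the universality of this example can be checked by brute force. The sketch below is illustrative only: the function names are hypothetical, $p$ is assumed to be a prime larger than $m$, and the comparison against $c = 2$ in the usage note is an assumption made here, since the slide leaves $c$ unspecified.

```python
from itertools import combinations


def h(a, i, p, n):
    """h_a(i) = ((a * i) mod p) mod n."""
    return ((a * i) % p) % n


def worst_pair_collision_fraction(m, p, n):
    """Over all pairs i != j in {1, ..., m}, the largest fraction of
    multipliers a in {1, ..., p-1} for which h_a(i) == h_a(j)."""
    worst = 0
    for i, j in combinations(range(1, m + 1), 2):
        colliding = sum(1 for a in range(1, p) if h(a, i, p, n) == h(a, j, p, n))
        worst = max(worst, colliding)
    return worst / (p - 1)
```

For instance, `worst_pair_collision_fraction(36, 37, 5)` can be compared with `2 / 5`; the exhaustive loop is only practical for small `m` and `p`.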
18
Static hashing: worst case O(1) search time
19
The Journey
One milestone in our journey: a perfect hash function using a hash table of size $O(s^2)$.
Tools needed: a $c$-universal hash family where $c$ is a small constant, and elementary probability.
20
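Perfect hashing using $O(s^2)$ space
Let $H$ be a universal hash family.
Let $X$ be the number of collisions for $S$ when $h \in_r H$.
Question: What is $E[X]$?
For $i, j \in S$ with $i < j$, define
$$X_{i,j} = \begin{cases} 1 & \text{if } h(i) = h(j) \\ 0 & \text{otherwise.} \end{cases}$$
Then $X = \sum_{i<j,\; i,j \in S} X_{i,j}$, and by linearity of expectation
$$E[X] = \sum_{i<j,\; i,j \in S} E[X_{i,j}] = \sum_{i<j,\; i,j \in S} P[X_{i,j} = 1] \le \sum_{i<j,\; i,j \in S} \frac{c}{n} = \frac{c}{n} \cdot \frac{s(s-1)}{2}.$$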
21
Perfect hashing using $O(s^2)$ space
Let $H$ be a universal hash family.
Let $X$ be the number of collisions for $S$ when $h \in_r H$.
Lemma 1: $E[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$.
Question: How large should $n$ be to achieve no collision?
Question: How large should $n$ be to achieve $E[X] \le \frac{1}{2}$?
Answer: Pick $n = c s^2$. Then $E[X] \le \frac{s(s-1)}{2 s^2} < \frac{1}{2}$.
22
Perfect hashing using $O(s^2)$ space
Let $H$ be a universal hash family.
Let $X$ be the number of collisions for $S$ when $h \in_r H$.
Lemma 1: $E[X] \le \frac{c}{n} \cdot \frac{s(s-1)}{2}$.
Observation: $E[X] \le \frac{1}{2}$ when $n = c s^2$.
Question: What is the probability of no collision when $n = c s^2$?
Answer: "No collision" $\iff$ "$X = 0$". Markov's inequality gives $P(X \ge 1) \le E[X] \le \frac{1}{2}$, so
$$P(\text{no collision}) = P(X = 0) = 1 - P(X \ge 1) \ge 1 - \frac{1}{2} = \frac{1}{2}.$$
23
Perfect hashing using $O(s^2)$ space
Let $H$ be a universal hash family.
Lemma 2: For $n = c s^2$, there will be no collision with probability at least $\frac{1}{2}$.
Algorithm 1: Perfect hashing for $S$
Repeat
  Pick $h \in_r H$;
  $t \leftarrow$ the number of collisions for $S$ under $h$.
Until $t = 0$.
Theorem: A perfect hash function can be computed for $S$ in expected $O(s^2)$ time.
Corollary: There is a hash table for $S$ occupying $O(s^2)$ space with worst case $O(1)$ search time.
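Algorithm 1 can be sketched in a few lines. This is an illustration under stated assumptions, not the lecture's implementation: it draws $h$ from the family $h_a(i) = ((a i) \bmod p) \bmod n$, takes $c = 2$, and assumes the caller supplies a prime `p` larger than every element of $S$. The names `collisions` and `perfect_hash` are hypothetical.

```python
import random


def collisions(S, h, n):
    """Number of colliding pairs of S under h, counted bucket by bucket."""
    counts = [0] * n
    for i in S:
        counts[h(i)] += 1
    return sum(k * (k - 1) // 2 for k in counts)


def perfect_hash(S, p, c=2, rng=random):
    """Repeat: pick h at random; until S has no collision under h.
    Assumes p is a prime larger than every element of S (an assumption)."""
    s = len(S)
    n = max(1, c * s * s)  # table size n = c * s^2
    while True:
        a = rng.randrange(1, p)
        h = lambda i, a=a: ((a * i) % p) % n
        if collisions(S, h, n) == 0:
            table = [None] * n
            for i in S:
                table[h(i)] = i
            return h, table
```

A query for `i` then reduces to the single comparison `table[h(i)] == i`. If each attempt succeeds with probability at least $\frac{1}{2}$, the expected number of iterations of the loop is at most 2.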
24
Hashing with $O(s)$ space and $O(1)$ worst case search time
We have completed almost 90% of our journey. To achieve the goal of $O(s)$ space and worst case $O(1)$ search time, here is the sketch (the details will be given at the beginning of the next class):
Use the same hashing scheme as in Algorithm 1, except with $n = O(s)$.
Of course, there will be collisions. Use an additional level of hash tables to take care of them.
In the next class:
We shall complete our algorithm for hashing with $O(s)$ space and $O(1)$ worst case search time.
We shall present a very natural way to design various universal hash families.
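The two-level idea can be sketched as follows. The slides defer the details to the next class, so everything specific here is an assumption made for illustration: the first-level size $n = s$, the retry threshold $2(1+c)s$ on the sum of squared bucket sizes, second-level tables of size $c\,s_k^2$ for a bucket of size $s_k$, the family $h_a(i) = ((a i) \bmod p) \bmod n$ with $c = 2$, and a caller-supplied prime `p` larger than every key. All function names are hypothetical.

```python
import random


def pick_hash(p, n, rng):
    """Draw h_a(i) = ((a * i) mod p) mod n with a uniform in {1, ..., p-1}."""
    a = rng.randrange(1, p)
    return lambda i: ((a * i) % p) % n


def build_two_level(S, p, c=2, rng=random):
    s = len(S)
    n = max(1, s)  # first-level table of size O(s) (assumed: n = s)
    # Level 1: retry until the squared bucket sizes sum to O(s).
    while True:
        h = pick_hash(p, n, rng)
        buckets = [[] for _ in range(n)]
        for i in S:
            buckets[h(i)].append(i)
        if sum(len(b) ** 2 for b in buckets) <= 2 * (1 + c) * s:
            break
    # Level 2: a collision-free table of size c * s_k^2 for each bucket.
    second = []
    for b in buckets:
        n_k = max(1, c * len(b) ** 2)
        while True:
            g = pick_hash(p, n_k, rng)
            slots = [None] * n_k
            ok = True
            for i in b:
                k = g(i)
                if slots[k] is not None:  # collision: draw a new g
                    ok = False
                    break
                slots[k] = i
            if ok:
                break
        second.append((g, slots))
    return h, second


def contains(structure, i):
    """Two hash evaluations and one comparison per query."""
    h, second = structure
    g, slots = second[h(i)]
    return slots[g(i)] == i
```

Under these assumptions the total number of second-level slots is at most $n + c \cdot 2(1+c)s$, which is linear in $s$ for constant $c$.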