1
Hash Functions for Network Applications (II)
Yaxuan Qi NSLab, RIIT Tsinghua University
2
Outline
Hash functions: Concept and Theory (1~2), Applications (3~4)
Bloom Filters: Applications (3~4)
3
Basic Idea
Packet: header and payload (L3-L4 header).
Forwarding engine: router, firewall, IPS.
Rule set: exact match, prefix match, range match. Action: drop or accept.
A rule set may contain thousands of rules, so how can the table be looked up efficiently?
4
Technique
5
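False Positive
n: number of messages
m: number of Bloom bits
k: number of hash functions
P(y is a false positive) = P(y does not belong to X) * P(all k bits for y are 1) = P(all k bits for y are 1), taking y to be a non-member so that the first factor is 1.
Consider the probability that the k specific bits corresponding to y have all been set (by elements of X). First consider the probability that one specified bit has been set (by X)…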
6
Math (I)
n: number of messages; m: number of Bloom bits; k: number of hash functions
Two potential assumptions:
m: big enough…
kn/m: constant…
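The equations on this slide did not survive extraction. A standard reconstruction is sketched below, assuming the k hash functions are independent and uniform over the m bits. The symbol p (probability that a given bit is still 0) is introduced here for illustration, and treating the k probed bits as independent is itself an approximation.

```latex
\begin{align*}
\Pr[\text{one hash of one message sets a given bit}] &= \frac{1}{m} \\
p = \Pr[\text{a given bit is still 0 after } n \text{ messages}]
  &= \left(1 - \frac{1}{m}\right)^{kn} \approx e^{-kn/m} \\
f = \Pr[\text{all } k \text{ bits for } y \text{ are 1}]
  &\approx (1 - p)^{k} \approx \left(1 - e^{-kn/m}\right)^{k}
\end{align*}
```

The step from $(1 - 1/m)^{kn}$ to $e^{-kn/m}$ relies on m being large, which is where the first assumption on the slide comes in.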
7
In practice
If the number of 0 bits in the array is substantially less than expected, then the probability of a false positive will be higher than the computed quantity f.
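One way to check this in practice is to build a filter and compare the measured fraction of 0 bits against the expected value. The following is a minimal sketch: the helper names, the salted BLAKE2b hashing, and the parameters (n = 1000, m = 8000, k = 6) are illustrative assumptions, not taken from the slides.

```python
import hashlib
import math


def bit_positions(item: str, k: int, m: int) -> list[int]:
    """Derive k positions in [0, m) from salted BLAKE2b digests (an assumed hash choice)."""
    positions = []
    for i in range(k):
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
        ).digest()
        positions.append(int.from_bytes(digest, "little") % m)
    return positions


def build_filter(items: list[str], k: int, m: int) -> bytearray:
    bits = bytearray(m)  # one byte per bit, for clarity rather than compactness
    for item in items:
        for pos in bit_positions(item, k, m):
            bits[pos] = 1
    return bits


def might_contain(bits: bytearray, item: str, k: int) -> bool:
    return all(bits[pos] for pos in bit_positions(item, k, len(bits)))


n, m, k = 1000, 8000, 6
members = [f"msg-{i}" for i in range(n)]
bits = build_filter(members, k, m)

zero_fraction = bits.count(0) / m
expected_zero_fraction = math.exp(-k * n / m)
predicted_f = (1 - expected_zero_fraction) ** k

# Probe with strings that were never inserted and count how many look like members.
probes = [f"other-{i}" for i in range(20000)]
measured_f = sum(might_contain(bits, y, k) for y in probes) / len(probes)

print(f"zero bits: measured {zero_fraction:.3f}, expected {expected_zero_fraction:.3f}")
print(f"false positives: measured {measured_f:.4f}, predicted {predicted_f:.4f}")
```

If the measured fraction of 0 bits falls well below the expected one, the measured false positive rate should come out above the predicted f.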
8
Optimal Number of Hash Functions
Given m and n, choose the k that minimizes f as a function of k.
Two competing forces:
Larger k (from the view of search): more chances to find a 0 bit for an element that is not a match.
Smaller k (from the view of construction): increases the fraction of 0 bits in the array.
9
Math (II)
In practice, k must be an integer, and a smaller, suboptimal k might be preferred since this reduces the number of hash functions that have to be computed.
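The trade-off can be tabulated directly. This is a small sketch using the standard approximation f = (1 - e^{-kn/m})^k and the real-valued optimum k = (m/n) ln 2; the function names and the values m = 8000, n = 1000 are illustrative assumptions.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    """Approximate false positive rate f = (1 - e^{-kn/m})^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


def best_integer_k(m: int, n: int) -> int:
    """Pick the integer neighbour of (m/n) ln 2 that gives the lower f."""
    k_real = (m / n) * math.log(2)
    candidates = {max(1, math.floor(k_real)), max(1, math.ceil(k_real))}
    return min(candidates, key=lambda k: false_positive_rate(m, n, k))


m, n = 8000, 1000
for k in range(1, 9):
    print(k, round(false_positive_rate(m, n, k), 4))
print("best integer k:", best_integer_k(m, n))
```

For m/n = 8 the formula gives roughly 0.024 at k = 4 against roughly 0.022 at k = 6, so dropping two hash computations costs little accuracy.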
10
Optimization: Summary
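Assumption: we have good hash functions that look random.
Given m bits for the filter and n elements, choose the number k of hash functions to minimize false positives.
Let $p = \Pr[\text{a given bit is 0}] \approx e^{-kn/m}$ and $f = \Pr[\text{false positive}] \approx (1 - p)^k$.
As k increases, there are more chances to find a 0 but more 1's in the array.
Conclusion: the optimum is at $k = (m/n)\ln 2$, where $p = 1/2$ and $f = (1/2)^k \approx (0.6185)^{m/n}$.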
11
Partial Bloom Filters
The total number of bits is still m, but the bits are divided equally among the k hash functions.
Each hash function has a range of m/k consecutive bits, which makes it possible to parallelize the array accesses.
Though the probability of a false positive is actually always at least as large with this division, the difference is small...
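The partitioned layout can be sketched as follows. The class name, the salted BLAKE2b hashing, and the parameters are illustrative assumptions; the sketch also assumes k divides m.

```python
import hashlib


class PartitionedBloomFilter:
    """Sketch: k disjoint slices of m // k bits, one slice per hash function."""

    def __init__(self, m: int, k: int):
        self.k = k
        self.slice_size = m // k  # assumes k divides m; leftover bits are unused otherwise
        self.slices = [bytearray(self.slice_size) for _ in range(k)]

    def _offset(self, item: str, i: int) -> int:
        # Hash function i only ever addresses slice i.
        digest = hashlib.blake2b(
            item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
        ).digest()
        return int.from_bytes(digest, "little") % self.slice_size

    def add(self, item: str) -> None:
        for i in range(self.k):
            self.slices[i][self._offset(item, i)] = 1

    def __contains__(self, item: str) -> bool:
        # Each probe touches a different slice, so the k reads could be issued in parallel.
        return all(self.slices[i][self._offset(item, i)] for i in range(self.k))


pbf = PartitionedBloomFilter(m=8000, k=5)
for i in range(1000):
    pbf.add(f"rule-{i}")
print("rule-7" in pbf)
```

Because the slices are disjoint, each one could live in its own memory bank, which is the property that makes the accesses parallelizable.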
12
Counting Bloom Filters: Idea
13
Counting Bloom filters: Implementation
4 bits per counter are enough...
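A counting Bloom filter replaces each bit with a small counter so that elements can be deleted. Below is a minimal sketch with counters capped at 15, the largest 4-bit value. The class name, the hashing scheme, and the policy of leaving saturated counters untouched are assumptions made for illustration.

```python
import hashlib


class CountingBloomFilter:
    MAX_COUNT = 15  # largest value a 4-bit counter can hold

    def __init__(self, m: int, k: int):
        self.m = m
        self.k = k
        # One byte per counter for readability; a real layout would pack two 4-bit counters per byte.
        self.counters = bytearray(m)

    def _positions(self, item: str) -> list[int]:
        positions = []
        for i in range(self.k):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            positions.append(int.from_bytes(digest, "little") % self.m)
        return positions

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            if self.counters[pos] < self.MAX_COUNT:
                self.counters[pos] += 1  # saturate instead of overflowing

    def remove(self, item: str) -> None:
        # Only meaningful for items that were added earlier.
        for pos in self._positions(item):
            if 0 < self.counters[pos] < self.MAX_COUNT:
                self.counters[pos] -= 1  # a saturated counter is left as is

    def __contains__(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("flow-a")
print("flow-a" in cbf)
cbf.remove("flow-a")
print("flow-a" in cbf)
```

Leaving saturated counters alone avoids false negatives, at the cost of a few positions that can no longer be cleared.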
14
Compressed Bloom Filters: Problem
15
Compressed Bloom Filters: Motivation
Insight: a Bloom filter is not just a data structure, it is also a message.
If the Bloom filter is a message, it is worthwhile to compress it, further reducing the traffic of URL exchange.
Compressing bit vectors is easy: arithmetic coding gets close to the entropy.
Can Bloom filters be compressed? A Bloom filter looks like a random string.
16
Compression: Technique
17
Compression: Results
[Table: original vs. compressed Bloom filter at z/n = 8; values not recovered]
At k = m (ln 2) / n, false positives are maximized with a compressed Bloom filter.
The best case without compression is the worst case with compression; compression always helps.
Side benefit: compression uses fewer hash functions, giving a possible speedup (depending on whether the bottleneck is memory or the link).
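The effect can be illustrated numerically. The sketch below assumes that z denotes the compressed size in bits, that the bits are independent with zero-probability p = e^{-kn/m}, and that the coder reaches the entropy exactly. The function names and the pair (m/n = 14, k = 2) are illustrative choices, not values recovered from the slide.

```python
import math


def zero_bit_probability(bits_per_element: float, k: int) -> float:
    """p = e^{-k / (m/n)}: probability that a given bit is still 0."""
    return math.exp(-k / bits_per_element)


def false_positive_rate(bits_per_element: float, k: int) -> float:
    return (1.0 - zero_bit_probability(bits_per_element, k)) ** k


def binary_entropy(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def compressed_bits_per_element(bits_per_element: float, k: int) -> float:
    """Ideal compressed size z/n = (m/n) * H(p), assuming an entropy-optimal coder."""
    return bits_per_element * binary_entropy(zero_bit_probability(bits_per_element, k))


# (m/n, k): a filter near the uncompressed optimum vs. a larger, sparser filter with fewer hashes.
for m_over_n, k in [(8, 6), (14, 2)]:
    z_over_n = compressed_bits_per_element(m_over_n, k)
    f = false_positive_rate(m_over_n, k)
    print(f"m/n={m_over_n}, k={k}: z/n={z_over_n:.3f}, f={f:.4f}")
```

Under these assumptions the sparser filter fits within 8 compressed bits per element while giving a lower false positive rate than the uncompressed filter with m/n = 8, and it needs only two hash functions.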
18
Bloom Filter vs. Perfect Hash
If the set X of n elements is fixed, one can find a perfect hash function for X, plus a fully uniform random hash function.
Then build a table with n entries of j bits each, storing in each element's entry the j-bit value that the random hash function gives for that element.
A non-member then matches with probability exactly $(1/2)^j$, which matches the lower bound on the false positive rate for m = nj bits: $f \ge (1/2)^{m/n}$.
However, any change in the set X would require an expensive recomputation of the perfect hash function.
19
Bloom Filter: Tricks
Union (combining two BFs): with the same m and the same hash functions, just OR the two bit vectors of the original Bloom filters.
Shrinking (halving a big BF): just OR the first and second halves together; the highest-order bit of each hash index can then be masked.
Intersection (estimation)
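The union and shrinking tricks can be sketched on plain bit lists. The function names are hypothetical, and the masking in `probe` assumes the filter length is a power of two.

```python
def union(bits_a: list[int], bits_b: list[int]) -> list[int]:
    """OR two filters built with the same m and the same hash functions."""
    if len(bits_a) != len(bits_b):
        raise ValueError("filters must have the same size m")
    return [a | b for a, b in zip(bits_a, bits_b)]


def halve(bits: list[int]) -> list[int]:
    """OR the first and second halves together (assumes an even length)."""
    half = len(bits) // 2
    return [bits[i] | bits[i + half] for i in range(half)]


def probe(bits: list[int], position: int) -> int:
    """Read a position computed for the original size; the mask drops the high-order bits."""
    return bits[position & (len(bits) - 1)]  # valid when len(bits) is a power of two


a = [1, 0, 0, 0, 0, 0, 1, 0]
b = [0, 1, 0, 0, 0, 0, 0, 0]
merged = union(a, b)
small = halve(merged)
print(merged, small, probe(small, 6))
```

A position that was set in the full-size filter stays set after halving, so no false negatives are introduced; the false positive rate rises because the same bits are packed into half the space.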
20
Applications
21
Questions?