1
Bloom Filters
Kira Radinsky. Slides based on material from Michael Mitzenmacher and Hanoch Levy. Some modifications by Naama Kraus.
2
Bloom Filter Problem: membership testing
Does item X belong to a set S? Assumption: the great majority of items tested will not belong to the given set.
The data structure should be fast (faster than searching through S) and small (smaller than an explicit representation).
The “price”: allow some probability of error. False positive errors are allowed; false negative errors are not.
3
Sample Application: Distributed Web Caches
Proxy servers maintain a local cache to minimize expensive internet requests. The proxy must maintain an efficient lookup method into the cache. The lookup structure must be stored in DRAM for performance, and it must be compact, as DRAM is expensive and is also used for “hot items” storage and more. Pages are usually replaced in the cache using an LRU algorithm.
Summary Cache [Fan, Cao, Almeida, & Broder]: if local caches know each other’s content, try the local caches before going out to the Web. The idea: each cache keeps a summary of the content of each participating cache, and each summary is stored in a Bloom filter.
4
Why Bloom Filters?
Size is very economical and query time is efficient. The percentage of false positives is about 2% for 8 bits per entry.
False positives are possible: the penalty is a wasted cache query, a small cost.
No false negatives: a cache hit is never missed, a big potential gain.
5
Bloom Filters
Start with an m-bit array B, filled with 0s.
Hash each item x_j in S k times. If H_i(x_j) = a, set B[a] = 1.
To check if y is in S, check B at H_i(y). All k values must be 1.
Encoding an attribute a ∈ U: maintain a bit vector V of size m and use k hash functions (h_1..h_k), h_i: U → [1..m].
Encoding: for item x, “turn on” bits V[h_1(x)]..V[h_k(x)].
Lookup: for item y, check bits V[h_1(y)]..V[h_k(y)]. If all equal 1, return “Probably Yes”. Else “Definitely No”.
It is possible to have a false positive: all k values are 1, but y is not in S.
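The encoding and lookup steps above can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own implementation: the class name `BloomFilter`, the method names, and the choice of deriving the k hash functions from SHA-256 with an index prefix are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: an m-bit array and k hash functions."""

    def __init__(self, m, k):
        self.m = m              # number of bits in the array
        self.k = k              # number of hash functions
        self.bits = [0] * m     # start with all bits set to 0

    def _positions(self, item):
        # Assumption: simulate k hash functions by prefixing the item
        # with the hash index i and hashing the result with SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Encoding: turn on the k bits selected by the hash functions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Lookup: "probably yes" only if all k bits are 1,
        # "definitely no" as soon as one of them is 0.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=64, k=3)
for word in ["apple", "banana", "cherry"]:
    bf.add(word)
```

Every inserted item is reported as present (no false negatives), while a query for an item that was never inserted may occasionally return `True` if other items happened to set all of its bits.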
6
Bloom Filter
[Figure: item x is hashed by h_1(x), h_2(x), h_3(x), ..., h_k(x) into the bit vector V_0..V_{m-1}; the selected bits are set to 1.]
7
Bloom Errors
[Figure: items a, b, c, d have been inserted into the bit vector V_0..V_{m-1}; the bits at h_1(x), h_2(x), h_3(x), ..., h_k(x) are all 1.]
x didn’t appear, yet its bits are already set.
8
Computational Factors
Size m/n: bits per item. |U| = n is the number of elements to encode; the bit vector V has size m.
Time k: number of hash functions. Use k hash functions (h_1..h_k), h_i: U → [1..m].
Error f: false positive probability.
9
Error Estimation Assumption: Hash functions are perfectly random
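Probability of a bit being 0 after hashing all elements: $(1 - 1/m)^{kn} \approx e^{-kn/m}$.
Let $p = e^{-kn/m}$; the probability of a false positive is $f = (1 - p)^k = (1 - e^{-kn/m})^k$.
Assuming we are given m and n, the optimal k is $k = (m/n) \ln 2$.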
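A short derivation of the optimal k may help here. It is a sketch under the slide's assumption of perfectly random hash functions, using the standard approximation that a bit stays 0 with probability p = e^{-kn/m} and that a false positive occurs with probability f = (1 - p)^k.

```latex
\begin{align*}
p &= e^{-kn/m} \;\Rightarrow\; k = -\frac{m}{n}\,\ln p, \\
\ln f &= k \ln(1 - p) = -\frac{m}{n}\,\ln p \,\ln(1 - p).
\end{align*}
```

The product $\ln p \,\ln(1 - p)$ is symmetric in $p$ and $1 - p$ and is largest at $p = 1/2$, so $\ln f$ is smallest there. Setting $e^{-kn/m} = 1/2$ gives $k = (m/n)\ln 2$ and $f = (1/2)^k \approx 0.6185^{m/n}$.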
10
Example: m/n = 8. Optimal k = 8 ln 2 ≈ 5.545. Error estimation: the optimal k is ln 2 · (m/n), which gives f = (1/2)^k ≈ 0.6185^{m/n} ≈ 0.021 for m/n = 8.
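The numbers in this example can be checked with a few lines of Python. The sketch below assumes the standard approximation f = (1 - e^{-k/(m/n)})^k for perfectly random hash functions; the function names `false_positive_rate` and `optimal_k` are chosen for illustration only.

```python
import math


def false_positive_rate(bits_per_item, k):
    """Approximate false positive probability f = (1 - e^{-k n / m})^k."""
    return (1.0 - math.exp(-k / bits_per_item)) ** k


def optimal_k(bits_per_item):
    """Real-valued optimum k = (m/n) * ln 2."""
    return bits_per_item * math.log(2)


bits_per_item = 8                      # m/n from the example
k_star = optimal_k(bits_per_item)      # about 5.545
print("optimal k:", round(k_star, 3))

# In practice k must be an integer, so compare the two neighbours.
for k in (5, 6):
    print(k, round(false_positive_rate(bits_per_item, k), 4))
```

Since k must be a whole number in a real filter, the two integers around the optimum are the practical candidates; both give a false positive rate close to 2% at 8 bits per item.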
11
Bloom Filter Tradeoffs
Three factors: m, k and n. Normally, n and m are given, and we select k. More hash functions yield more chances to find a 0 bit for elements not in S. Fewer hash functions increase the fraction of the bits that are 0. Not surprisingly, when k is optimal, the “hit ratio” (the fraction of bits set to 1 in the array) is 0.5.
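The claim about the fraction of set bits can be explored with a small simulation. This is a sketch that models perfectly random hash functions by drawing bit positions from a seeded pseudo-random generator; the function name `fill_ratio` and the values of m, n and k are assumptions picked for the example.

```python
import random


def fill_ratio(m, n, k, seed=0):
    """Insert n items with k random positions each; return fraction of 1 bits."""
    rng = random.Random(seed)          # seeded so the run is repeatable
    bits = [0] * m
    for _ in range(n * k):             # n items, k hash positions per item
        bits[rng.randrange(m)] = 1
    return sum(bits) / m


m, n = 8000, 1000                      # 8 bits per item
for k in (2, 6, 12):
    print(k, round(fill_ratio(m, n, k), 3))
```

With 8 bits per item, k = 6 is the integer closest to the optimum (m/n) ln 2 ≈ 5.5, and the simulated fill ratio lands close to one half. Smaller k leaves most bits at 0, and larger k fills most of the array.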
12
Bloom Filters and Deletions
Cache contents change: items are both inserted and deleted. Insertions are easy: add bits to the Bloom filter. Can Bloom filters handle deletions? Use Counting Bloom Filters to track insertions/deletions.
13
Handling Deletions
Bloom filters can handle insertions, but not deletions. If deleting x_i means resetting 1s to 0s, then deleting x_i will “delete” any x_j that shares one of those bits.
[Figure: items x_i and x_j hashed into the bit array B.]
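The problem can be seen in a tiny hand-built example. The bit positions below are hypothetical hash outputs chosen so that the two items share position 4; they do not come from any real hash function.

```python
# Hypothetical hash outputs: both items map to position 4.
positions = {"xi": [1, 4, 6], "xj": [4, 7, 9]}

m = 12
bits = [0] * m

# Insert both items by setting their bits.
for item in positions:
    for pos in positions[item]:
        bits[pos] = 1


def might_contain(item):
    # Standard lookup: all of the item's bits must be 1.
    return all(bits[pos] for pos in positions[item])


found_before = might_contain("xj")     # True: xj was inserted

# Naive deletion of xi: reset its bits to 0.
for pos in positions["xi"]:
    bits[pos] = 0

xj_still_found = might_contain("xj")   # False: shared bit 4 was cleared
```

After the naive deletion, the lookup for xj fails even though xj was never removed, which is a false negative and breaks the filter's main guarantee.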
14
Counting Bloom Filters
Start with an array B of m counters, all set to 0. Hash each item x_j in S k times. If H_i(x_j) = a, add 1 to B[a]. To delete x_j, decrement the corresponding counters. A corresponding Bloom filter can be obtained by reducing each counter to 0/1.
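A counting Bloom filter along these lines might look as follows in Python. This is a minimal sketch: the class and method names are illustrative, the k hash functions are simulated with index-prefixed SHA-256 (an assumption, not something the slides specify), and counters are plain Python integers rather than the small fixed-width counters a real deployment would use.

```python
import hashlib


class CountingBloomFilter:
    """Sketch of a counting Bloom filter: counters instead of bits."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counts = [0] * m          # one counter per position

    def _positions(self, item):
        # Assumption: k hash functions simulated via index-prefixed SHA-256.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.counts[pos] += 1

    def remove(self, item):
        # Only valid for items that were previously added.
        for pos in self._positions(item):
            if self.counts[pos] > 0:
                self.counts[pos] -= 1

    def might_contain(self, item):
        return all(self.counts[pos] > 0 for pos in self._positions(item))

    def to_bits(self):
        # Reduce each counter to 0/1 to obtain a plain Bloom filter.
        return [1 if c > 0 else 0 for c in self.counts]


cbf = CountingBloomFilter(m=64, k=3)
cbf.add("xi")
cbf.add("xj")
cbf.remove("xi")                       # xj's counters stay positive
```

Because each position remembers how many insertions touched it, removing one item only lowers the shared counters instead of clearing them, so the remaining items are still found.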
15
Variations and Extensions
Bloomier Filter
Distance-Sensitive Bloom Filters
16
Extension: Bloomier Filter
Bloomier filter [Chazelle, Kilian, Rubinfeld, Tal]: a map that associates a value with each element (key). Elements not in the map have a null value. It always returns the correct value for elements in the map. No false negatives: if null is returned, the element is not in the map. False positives: it may return a value for an element that is not in the map.
17
Extension: Distance-Sensitive Bloom Filters
Instead of answering questions of the form “Is y in S?”, we would like to answer questions of the form “Is y close to some x in S?”. That is, is the query close to some element of the set, under some metric and some notion of close. Applications: DNA matching, virus/worm matching, databases. Some initial results [Kirsch, Mitzenmacher]. Hard.