Published by Louisa Waters. Modified over 9 years ago.
1
The Bloom Paradox
Ori Rottenstreich
Joint work with Isaac Keslassy
Technion, Israel
2
Problem Definition
Requirement: A data structure at the user that gives a fast answer to the membership query "is x ∈ S?"
Solutions:
o O(n): Searching in a list
o O(log(n)): Searching in a sorted list
o O(1): But with false positives / negatives
Figure: a user can access the local cache S (cost = 1) or the central memory M with all elements (cost = 10).
3
Two Possible Errors
False Positive: x ∉ S, but the data structure answers x ∈ S. Results in a redundant access to the local cache. Additional cost of 1.
False Negative: x ∈ S, but the data structure answers x ∉ S. Results in an expensive access to the central memory instead of the local cache. Additional cost of 10 - 1 = 9.
4
Bloom Filters (Bloom, 1970)
Initialization: Array of m zero bits.
Insertion: Each of the n elements is hashed k times, and the corresponding bits are set.
Query: Hashing the element, checking that all k bits are set.
False positive rate (probability) of approximately (1 - e^(-kn/m))^k. No false negatives.
Figure: an all-zero bit array, then the bits set by inserted elements and checked by queries (element labels x, y, z, w).
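The three operations on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the class name `BloomFilter`, the helper `_positions`, and the choice of salted SHA-256 to simulate k hash functions are all assumptions made here.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: array of m zero bits

    def _positions(self, item: str) -> list:
        # Assumption: k hash functions are simulated by salting SHA-256 with i.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, item: str) -> None:
        # Insertion: set the k bits the element hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # Query: positive only if all k bits are set (no false negatives).
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=3)
for element in ("x", "y", "z"):
    bf.insert(element)
```

Every inserted element is guaranteed to query positive; an element that was never inserted may still query positive if its k bits happen to be set by others, which is the false positive case.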
5
Bloom Filters are Widely Used
Cache/Memory Framework
Packet Classification
Intrusion Detection
Routing
Accounting
Beyond networking: Spell Checking, DNA Classification
Can be found in
o Google's web browser Chrome
o Google's database system BigTable
o Facebook's distributed storage system Cassandra
o Mellanox's IB Switch System
6
The Bloom Paradox
Sometimes, it is better to disregard the Bloom filter results, and in fact not even to query it, thus making the Bloom filter useless.
7
Outline
Introduction to Bloom Filters
The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
Summary
8
Bloom Paradox Example
Parameters: [values not preserved]
Extreme case without locality: All elements have an equal probability of belonging to the cache.
o Toy example
9
Bloom Paradox Example
Parameters: [values not preserved]
Let B be the set of elements that the Bloom filter indicates are in S.
o In particular, S ⊆ B, since there are no false negatives in the Bloom filter.
Intuition: (figure: the user, the Bloom filter set B, the local cache S with cost = 1, and the central memory M with all elements, cost = 10)
10
Bloom Paradox Example
Parameters: [values not preserved]
Let B be the set of elements that the Bloom filter indicates are in S.
o In particular, S ⊆ B, since there are no false negatives in the Bloom filter.
Surprise: (figure: the Bloom filter set B, the local cache S with cost = 1, and the central memory M with all elements, cost = 10)
11
Bloom Paradox Example
Parameters: [values not preserved]
Let B be the set of elements that the Bloom filter indicates are in S.
o In particular, S ⊆ B, since there are no false negatives in the Bloom filter.
Surprise: The Bloom filter indicates the membership of |B| elements. Only |S| of them are indeed in S.
12
Bloom Paradox Example
When the Bloom filter states that x ∈ S, it is wrong with probability 1 - |S|/|B|.
Average cost if we listen to the Bloom filter: 1 + 10 (1 - |S|/|B|)
Average cost if we don't: 10
When |B| is much larger than |S|, the first cost approaches 11 > 10.
The Bloom filter is useless! Don't listen to the Bloom filter.
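The comparison on this slide can be reproduced numerically. The sketch below uses hypothetical parameters chosen only for illustration (a cache of 10^4 elements, a universe of 10^10 elements, and a false positive rate of 10^-3); they are not taken from the slide, whose own values were not preserved. The function name `expected_costs` is likewise an assumption. The access costs of 1 and 10 are those of the cache example.

```python
def expected_costs(n_cache, n_total, fpr, cache_cost=1.0, memory_cost=10.0):
    """Return (P(wrong | positive), cost if we listen, cost if we ignore).

    Assumes every element is equally likely to be queried (no locality).
    """
    false_positives = (n_total - n_cache) * fpr  # expected elements in B but not in S
    size_b = n_cache + false_positives           # expected |B|
    p_wrong = false_positives / size_b           # 1 - |S|/|B|
    listen = cache_cost + p_wrong * memory_cost  # try the cache, fall back to memory on a miss
    ignore = memory_cost                         # go straight to the central memory
    return p_wrong, listen, ignore


# Hypothetical toy parameters (illustrative, not from the slides).
p_wrong, listen, ignore = expected_costs(n_cache=10**4, n_total=10**10, fpr=1e-3)
print(f"wrong: {p_wrong:.4f}, listen: {listen:.2f}, ignore: {ignore:.2f}")
```

With these numbers a positive answer is wrong about 99.9% of the time, so listening costs about 10.99 on average while ignoring the filter costs 10.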
13
Outline
Introduction to Bloom Filters
The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
Summary
14
Costs of the Two Possible Errors
The cost of a false positive: 1
The cost of a false negative: expressed relative to the cost of a false positive
In the cache example: (10 - 1) / 1 = 9
18
Conditions for the Bloom Paradox
Consider the a priori membership probability of an element, Pr(x ∈ S)
o i.e. before getting the answer of the Bloom filter
Intuition: The Bloom paradox occurs more often when:
o the a priori membership probability is small
o the false positive rate is large
o the cost of a false negative is small (because the Bloom filter, which never returns a false negative, implicitly assumes that this cost is very large)
Theorem 1: The Bloom paradox occurs if and only if [formula not preserved]
Boundaries of the Bloom Paradox: [formulas not preserved]
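The intuition above can be checked with a short expected-cost calculation. This is a sketch under the cost model of the cache example, not a reconstruction of Theorem 1 as stated on the slide: the symbols $P$ (a priori membership probability), $F$ (false positive rate), and $\alpha$ (false negative cost relative to a false positive cost of 1) are notation assumed here.

```latex
% Posterior membership probability after a positive answer (Bayes' rule,
% using the fact that a Bloom filter has no false negatives):
\[
  q \;=\; \Pr(x \in S \mid \text{positive}) \;=\; \frac{P}{P + (1 - P)\,F}.
\]
% Listening to the filter risks a false positive (cost 1) with probability 1 - q;
% ignoring it risks a false negative (cost \alpha) with probability q.
% Ignoring the filter is better exactly when
\[
  (1 - q) \cdot 1 \;>\; q \cdot \alpha
  \;\iff\; (1 - P)\,F \;>\; \alpha\,P
  \;\iff\; P \;<\; \frac{F}{F + \alpha}.
\]
```

The final inequality is more easily satisfied when $P$ is small, $F$ is large, or $\alpha$ is small, in line with the three bullets. In the cache example, $\alpha = 9$.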
19
Bloom Filter Improvements
Theorem 1: The Bloom paradox occurs if and only if [formula not preserved]
Use the formula to improve the Bloom filter
o Only insert into / query the Bloom filter if the formula expects it to be useful
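A selective policy of this kind might look as follows. This is a minimal sketch, assuming the paradox condition takes the form (1 - prior) * fpr > alpha * prior, which follows from comparing the expected cost of a false positive (cost 1) with that of a false negative (cost alpha) after a positive answer. That form, and the function names `paradox_occurs` and `should_use_filter`, are assumptions made here rather than the slide's own formula.

```python
def paradox_occurs(prior: float, fpr: float, alpha: float) -> bool:
    """True if ignoring a positive answer is cheaper on average.

    prior: a priori membership probability of the element
    fpr:   false positive rate of the Bloom filter
    alpha: cost of a false negative relative to a false positive (cost 1)
    Assumed condition: (1 - prior) * fpr > alpha * prior.
    """
    return (1.0 - prior) * fpr > alpha * prior


def should_use_filter(prior: float, fpr: float, alpha: float) -> bool:
    """Insert into / query the filter only for elements where it is expected to help."""
    return not paradox_occurs(prior, fpr, alpha)


# Hypothetical elements: a very unlikely one and a likely one (alpha = 9 as in the cache example).
rare = should_use_filter(prior=1e-6, fpr=1e-3, alpha=9.0)
common = should_use_filter(prior=0.5, fpr=1e-3, alpha=9.0)
```

Here `rare` is False (skip the filter and go straight to the central memory) and `common` is True (the filter's answer is worth acting on).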
21
Outline
Introduction to Bloom Filters
The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
Summary
22
Counting Bloom Filters (CBFs)
Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
The solution: Counting Bloom filters, which store an array of counters instead of bits.
o Insertion: Incrementing counters by one.
o Deletion: Decrementing counters by one.
o Query: Checking that counters are positive.
The same false positive probability.
Require too much memory, e.g. 57 bits per element for a false positive rate of 10^-3.
Figure: counter arrays 000000101000 and 010200101001, before and after inserting y (+1 on each of its counters), with the counters pointed to by x and y marked.
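The counter-based operations listed on this slide can be sketched as follows. It is an illustrative sketch only: the class name `CountingBloomFilter`, the helper `_positions`, and the salted SHA-256 hashing are assumptions made here, and counters are unbounded Python integers rather than the fixed-width counters a real deployment would use.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter with m counters and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m  # counters instead of bits

    def _positions(self, item: str) -> list:
        # Assumption: k hash functions are simulated by salting SHA-256 with i.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1  # insertion: increment by one

    def delete(self, item: str) -> None:
        # Only valid for items that were previously inserted.
        for pos in self._positions(item):
            self.counters[pos] -= 1  # deletion: decrement by one

    def query(self, item: str) -> bool:
        # Query: positive only if all k counters are positive.
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(m=1024, k=3)
cbf.insert("x")
cbf.insert("y")
cbf.delete("y")
```

After the deletion, the counters hold exactly the contribution of x, so x still queries positive without any bit having been reset.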
23
Counting Bloom Filter Query
Query
o Checking that counters are positive.
o Question: Which positive indication is more likely to be correct, y or z?
Figure: counter array 0 3 8 1 0 5 2 0 1 0 1 2, with the counters pointed to by y and z marked.
24
The Bloom Paradox in the Counting Bloom Filter
Theorem 2: Let c_1, ..., c_k denote the values of the counters pointed to by the set of k hash functions. Then, [formula not preserved]
Only the counters product matters!
25
CBF Based Membership Probability
Parameters: n = 3328, m = 28485, k = 6
- Before checking the CBF, a priori membership probability ≈ 0.03
- CBF indicates counters product = 8: a posteriori membership probability ≈ 0.69
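The jump from about 0.03 to about 0.69 can be approximated with a short Bayesian update. The sketch below is not the slide's own formula, which was not preserved. It assumes each counter is approximately Poisson with mean nk/m, so that a counter of value c is c / (nk/m) times more likely when the queried element is in the set than when it is not. The function name `posterior_membership` is hypothetical.

```python
def posterior_membership(prior, counter_product, n, m, k):
    """Approximate Pr(x in S | counter values) from the product of the k counters.

    Assumption: counters are approximately Poisson with mean lam = n * k / m,
    giving a likelihood ratio of counter_product / lam**k.
    """
    lam = n * k / m
    likelihood_ratio = counter_product / lam ** k
    odds = prior / (1.0 - prior) * likelihood_ratio  # posterior odds of membership
    return odds / (1.0 + odds)


# Parameters from the slide; the prior is the rounded value 0.03.
p = posterior_membership(prior=0.03, counter_product=8, n=3328, m=28485, k=6)
print(round(p, 3))
```

With the rounded prior of 0.03 this gives roughly 0.68, close to the ≈ 0.69 on the slide. Under this approximation only the product of the counters enters the result, not their individual values.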
26
Optimal Query Policy
Theorem 3: An optimal decision policy of the counting Bloom filter is to be positive iff [formula not preserved]
Use the formula to improve the Counting Bloom filter
o Only return a positive indication if the counters product is large enough
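A threshold rule of this shape can be sketched as follows. The threshold used here is an assumption derived from a Poisson approximation of the counters and a false negative cost of `alpha` relative to a false positive cost of 1. It is not the formula from Theorem 3, which was not preserved. The function names `product_threshold` and `cbf_decision` are hypothetical.

```python
import math


def product_threshold(prior, alpha, n, m, k):
    """Smallest counters product (exclusive) for which a positive answer pays off.

    Assumption: answer positive iff alpha * Pr(in S | counters) > Pr(not in S | counters),
    with counters approximately Poisson of mean n * k / m.
    """
    lam = n * k / m
    return (1.0 - prior) / (alpha * prior) * lam ** k


def cbf_decision(counters, prior, alpha, n, m, k):
    """Return True (positive) only if the product of the k counters is large enough."""
    return math.prod(counters) > product_threshold(prior, alpha, n, m, k)


# Hypothetical query: a rare element (prior 1e-4) with alpha = 9 and the slide's n, m, k.
weak = cbf_decision([1, 1, 1, 1, 1, 1], prior=1e-4, alpha=9.0, n=3328, m=28485, k=6)
strong = cbf_decision([2, 3, 2, 3, 2, 2], prior=1e-4, alpha=9.0, n=3328, m=28485, k=6)
```

For this rare element the threshold is about 132, so six counters of value 1 are not convincing (`weak` is False) while a product of 144 is (`strong` is True). A plain CBF query would have answered positive in both cases.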
27
Experimental Results
Internet trace (equinix-chicago) with real hash functions.
Counting Bloom filter parameters: n = 2^10, m/n = 30, k = 5, 2^20 queries
28
Concluding Remarks
Discovery of the Bloom paradox
Importance of the a priori membership probability
Using the counters product to estimate the correctness of a positive indication of the CBF
29
Thank You
30
Implementation
Figure panels: Bloom filter Insertion, Query; Selective Bloom filter Insertion; Selective Bloom filter Query