The Bloom Paradox Ori Rottenstreich Joint work with Isaac Keslassy Technion, Israel.


1 The Bloom Paradox
Ori Rottenstreich
Joint work with Isaac Keslassy
Technion, Israel

2 Problem Definition
Requirement: a data structure at the user that quickly answers the membership query "is x in the local cache S?"
Solutions:
o O(n) – searching in a list
o O(log(n)) – searching in a sorted list
o O(1) – but with false positives / negatives
Setting: the user can access the local cache S (cost = 1) or the central memory M, which holds all elements (cost = 10).

3 Two Possible Errors
False positive: x ∉ S, but the data structure answers x ∈ S. This results in a redundant access to the local cache.
→ Additional cost of 1.
False negative: x ∈ S, but the data structure answers x ∉ S. This results in an expensive access to the central memory instead of the local cache.
→ Additional cost of 10 − 1 = 9.

4 Bloom Filters (Bloom, 1970)
Initialization: an array of m zero bits.
Insertion: each of the n elements is hashed k times, and the corresponding bits are set.
Query: hash the element and check that all k corresponding bits are set.
False positive rate (probability) of approximately (1 − e^(−kn/m))^k.
No false negatives.
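
The three operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name, the parameter values, and the use of salted SHA-256 to simulate k hash functions are all assumptions made for the example.

```python
from __future__ import annotations

import hashlib


class BloomFilter:
    """Minimal Bloom filter with m bits and k hash functions (illustrative only)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: array of m zero bits

    def _positions(self, item: str) -> list[int]:
        # Assumption: the k hash functions are simulated by salting SHA-256
        # with the hash index; the slides do not specify a hash family.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, item: str) -> None:
        # Insertion: set the bit at each of the k hashed positions.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # Query: positive only if all k hashed positions are set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=4)
for name in ("x", "y", "z"):
    bf.insert(name)
print(bf.query("x"))  # an inserted element is always reported as a member
```

Because bits are only ever set, an inserted element can never be reported as absent, which is the "no false negatives" property. An element that was never inserted can still be reported as present if other elements happened to set all of its bits.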

5 Bloom Filters are Widely Used
Cache/Memory Framework
Packet Classification
Intrusion Detection
Routing
Accounting
Beyond networking: Spell Checking, DNA Classification
Can be found in
o Google's web browser Chrome
o Google's database system BigTable
o Facebook's distributed storage system Cassandra
o Mellanox's IB Switch System

6 The Bloom Paradox
Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it, thus making the Bloom filter useless.

7 Outline
- Introduction to Bloom Filters
- The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
- Summary

8 Bloom Paradox Example
Parameters:
Extreme case without locality: all elements have an equal probability of belonging to the cache.
o Toy example

9 Bloom Paradox Example
Parameters:
Let B be the set of elements that the Bloom filter indicates are in S.
o In particular, S ⊆ B: no false negatives in the Bloom filter.
→ Intuition:

11 Bloom Paradox Example
Parameters:
Let B be the set of elements that the Bloom filter indicates are in S.
o In particular, S ⊆ B: no false negatives in the Bloom filter.
→ Surprise: the Bloom filter indicates the membership of |B| elements. Only |S| of them are indeed in S.

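12 Bloom Paradox Example
When the Bloom filter states that x ∈ S, it is wrong with probability 1 − |S|/|B|, where B is the set of elements the filter reports as members.
Compare the average cost if we listen to the Bloom filter with the average cost if we don't: listening does not help.
→ The Bloom filter is useless! Don't listen to the Bloom filter.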
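
The numbers of the toy example did not survive in this transcript, so the sketch below reruns the argument with hypothetical values. N, n and the false positive rate are assumptions chosen for illustration only. The access costs of 1 and 10 come from the problem definition. The routing model is also an assumption: a positive answer sends the query to the cache first, and then on to central memory if the cache misses.

```python
# Hypothetical parameters (NOT the slide's values, which are missing):
N = 10**10   # elements in the central memory M
n = 10**4    # elements in the local cache S
fpr = 1e-3   # Bloom filter false positive rate

# Access costs taken from the problem definition slide.
COST_CACHE, COST_MEMORY = 1, 10

# Expected size of B, the set of elements the filter reports as members:
# all n true members plus a fraction fpr of the N - n non-members.
size_B = n + (N - n) * fpr
wrong_given_positive = 1 - n / size_B

# Without locality, every element is in the cache with probability p.
p = n / N

# Listening to the filter: true members cost one cache access, false
# positives cost a wasted cache access plus a memory access, and
# negatives go straight to memory.
cost_listen = (
    p * COST_CACHE
    + (1 - p) * fpr * (COST_CACHE + COST_MEMORY)
    + (1 - p) * (1 - fpr) * COST_MEMORY
)

# Ignoring the filter: always go straight to the central memory.
cost_ignore = COST_MEMORY

print(f"P(wrong | positive) = {wrong_given_positive:.4f}")
print(f"listen: {cost_listen:.6f}  ignore: {cost_ignore}")
```

With these assumed values almost every positive answer is a false positive, and the average cost of listening comes out slightly above the cost of always going to central memory. That is the paradox in miniature.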

13 Outline
- Introduction to Bloom Filters
- The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
- Summary

14 Costs of the Two Possible Errors
The cost of a false positive: 1
The cost of a false negative: in the cache example, 10 − 1 = 9

18 Conditions for the Bloom Paradox
Let Pr(x ∈ S) be the a priori membership probability of x
o i.e. before getting the answer of the Bloom filter
Intuition: the Bloom paradox occurs more often when:
o the a priori membership probability is small
o the false positive rate is large (i.e. the Bloom filter is small)
o the cost of a false negative is small (because the Bloom filter, which has no false negatives, implicitly assumes this cost is very large)
Theorem 1: a necessary and sufficient condition for the Bloom paradox, in terms of these three quantities, together with the boundaries of the Bloom paradox.
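
The exact statement of Theorem 1 is missing from this transcript. The derivation below is a sketch of one condition that follows from the cost model stated earlier (a false positive costs 1, a false negative costs more). The symbols p, f and \alpha are notation introduced here for illustration and are not taken from the slides.

```latex
% Assumed notation (not from the slides):
%   p       a priori membership probability, Pr(x \in S)
%   f       false positive rate of the Bloom filter
%   \alpha  cost of a false negative, with a false positive costing 1
\begin{align*}
\Pr(x \in S \mid \text{positive})
  &= \frac{p}{p + (1-p)\,f}
  && \text{(Bayes' rule, no false negatives)} \\
\mathbb{E}[\text{cost} \mid \text{trust the positive}]
  &= 1 \cdot \Pr(x \notin S \mid \text{positive})
   = \frac{(1-p)\,f}{p + (1-p)\,f} \\
\mathbb{E}[\text{cost} \mid \text{ignore the positive}]
  &= \alpha \cdot \Pr(x \in S \mid \text{positive})
   = \frac{\alpha\,p}{p + (1-p)\,f} \\
\text{ignoring is cheaper}
  &\iff \alpha\,p < (1-p)\,f
   \iff \frac{p}{1-p} < \frac{f}{\alpha}
\end{align*}
```

Under these assumptions the inequality is easier to satisfy when p is small, when f is large, and when \alpha is small. This matches the three intuitions listed on the slide.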

19 Bloom Filter Improvements
Theorem 1 gives the condition under which the Bloom paradox occurs.
Use the formula to improve the Bloom filter
o Only insert into / query the Bloom filter if the formula expects it to be useful
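
A selective query along these lines might look as follows. The theorem's formula is not present in this transcript, so the test used here is an assumption derived from the stated costs: a positive answer is not worth acting on when alpha * p < (1 - p) * f, with p the a priori membership probability, f the false positive rate, and alpha the false negative cost relative to a false positive cost of 1. The function names are hypothetical.

```python
def bloom_paradox(prior: float, fpr: float, alpha: float) -> bool:
    """Return True if a positive answer should be ignored.

    Assumed condition (not quoted from the slides): ignoring a positive
    answer is cheaper when alpha * prior < (1 - prior) * fpr.
    """
    return alpha * prior < (1.0 - prior) * fpr


def selective_lookup(bloom_says_member: bool, prior: float,
                     fpr: float, alpha: float) -> bool:
    """Decide whether to treat the element as a member (try the cache)."""
    if bloom_paradox(prior, fpr, alpha):
        # The filter's answer cannot change the decision for this element,
        # so it need not be consulted (or the element inserted) at all.
        return False
    return bloom_says_member


ALPHA = 9.0  # false negative cost from the cache example (10 - 1)
FPR = 1e-3   # hypothetical false positive rate

print(selective_lookup(True, prior=1e-6, fpr=FPR, alpha=ALPHA))  # False
print(selective_lookup(True, prior=0.5, fpr=FPR, alpha=ALPHA))   # True
```

The prior is passed per element, so elements with a high a priori membership probability still benefit from the filter while the rest bypass it.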

21 Outline
- Introduction to Bloom Filters
- The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
- Summary

22 Counting Bloom Filters (CBFs)
Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
The solution: counting Bloom filters, which store an array of counters instead of bits.
o Insertion: incrementing counters by one.
o Deletion: decrementing counters by one.
o Query: checking that counters are positive.
The same false positive probability.
Require too much memory, e.g. 57 bits per element.
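
The counter-based variant can be sketched the same way as a plain Bloom filter. As before, this is an illustrative assumption rather than the authors' code: the class name, the sizes, and the salted SHA-256 hashing are chosen only for the example.

```python
from __future__ import annotations

import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter with m counters and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m  # counters instead of bits

    def _positions(self, item: str) -> list[int]:
        # Assumption: salted SHA-256 stands in for k hash functions.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item: str) -> None:
        # Only valid for items that were previously inserted; deleting
        # anything else would corrupt the counters.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def query(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def counters_at(self, item: str) -> list[int]:
        # Counter values seen by the item's k hash functions.
        return [self.counters[pos] for pos in self._positions(item)]


cbf = CountingBloomFilter(m=1024, k=4)
cbf.insert("x")
cbf.insert("y")
cbf.delete("x")  # deleting x leaves y's counters positive
print(cbf.query("y"))
```

Decrementing instead of resetting is what keeps the remaining elements intact: a counter shared by x and y drops from 2 to 1 when x is deleted, so y is still found.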

23 Counting Bloom Filter Query
Query
o Checking that counters are positive.
o Question: which is more likely to be correct, y or z?
Example counter array: 0 3 8 1 0 5 2 0 1 0 1 2

24 The Bloom Paradox in the Counting Bloom Filter
Theorem 2: The a posteriori membership probability is determined by the values of the counters pointed to by the set of hash functions.
Only the counters product matters!

25 CBF Based Membership Probability
Parameters: n = 3328, m = 28485, k = 6
- Before checking the CBF, the a priori membership probability ≈ 0.03
- The CBF indicates counters product = 8 → a posteriori membership probability ≈ 0.69
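
The slide's numbers can be approximately reproduced with a short Bayesian calculation. The formula below is an assumption, not the theorem as stated by the authors. It treats each counter as roughly Poisson with mean n*k/m, so observing value c is about c / (n*k/m) times more likely when the element is a member, and it treats the k counters as independent. The function name is hypothetical.

```python
def cbf_posterior(prior: float, counters: list, n: int, m: int, k: int) -> float:
    """Approximate Pr(x in S | counter values) under a Poisson model.

    Assumption: each of the k counters contributes a likelihood ratio of
    c / lam, where lam = n * k / m is the mean counter value, so the
    evidence depends on the counters only through their product.
    """
    lam = n * k / m
    likelihood_ratio = 1.0
    for c in counters:
        likelihood_ratio *= c / lam
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Parameters from the slide; the individual counter values are hypothetical
# and chosen only so that their product is 8.
posterior = cbf_posterior(0.03, [1, 1, 1, 2, 2, 2], n=3328, m=28485, k=6)
print(f"{posterior:.2f}")
```

With the rounded prior of 0.03 this approximation lands close to the slide's value of ≈ 0.69. Any other six counters with product 8, such as 1, 1, 1, 1, 2, 4, give the same result, which is the sense in which only the product matters.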

26 Optimal Query Policy
Theorem 3: An optimal decision policy of the counting Bloom filter is to be positive iff the counters product exceeds a threshold.
Use the formula to improve the counting Bloom filter
o Only return a positive indication if the counters product is large enough
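
A threshold policy of this kind can be sketched as follows. The threshold formula is an assumption derived from a Poisson approximation of the counters and a cost model in which a false positive costs 1 and a false negative costs alpha. It is not the formula from the slide, which is missing from this transcript. The function names and the choice alpha = 1 are hypothetical.

```python
def product_threshold(prior: float, n: int, m: int, k: int, alpha: float) -> float:
    """Smallest counters product worth a positive answer (assumed model).

    Answering positive is cheaper when alpha * posterior_odds > 1, where
    posterior_odds = prior / (1 - prior) * product / lam**k and
    lam = n * k / m. Solving for the product gives the threshold.
    """
    lam = n * k / m
    return lam**k * (1.0 - prior) / (alpha * prior)


def cbf_decide(counters: list, prior: float, n: int, m: int, k: int,
               alpha: float) -> bool:
    """Return a positive indication only if the counters product is large enough."""
    product = 1
    for c in counters:
        product *= c
    return product > product_threshold(prior, n, m, k, alpha)


params = dict(prior=0.03, n=3328, m=28485, k=6, alpha=1.0)
print(cbf_decide([1, 1, 1, 1, 1, 1], **params))  # all positive, product too small
print(cbf_decide([1, 1, 1, 2, 2, 2], **params))  # product 8 clears the threshold
```

The first query would pass the plain "all counters positive" test, yet under these assumptions it is more likely a false positive than a true member, so the policy answers negative. A zero counter always yields a negative answer because the product is then zero.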

27 Experimental Results
Internet trace (equinix-chicago) with real hash functions.
Counting Bloom filter parameters: n = 2^10, m / n = 30, k = 5, 2^20 queries

28 Concluding Remarks
- Discovery of the Bloom paradox
- Importance of the a priori membership probability
- Using the counters product to estimate the correctness of a positive indication of the CBF

29 Thank You

30 Implementation
Bloom filter: Insertion, Query
Selective Bloom filter: Insertion
Selective Bloom filter: Query

