The Bloom Paradox Ori Rottenstreich Joint work with Isaac Keslassy Technion, Israel.



Problem Definition
Requirement: A data structure at the user with a fast answer to the membership query "is x in S?", where S is the local cache.
Solutions:
o O(n): Searching in a list
o O(log(n)): Searching in a sorted list
o O(1): But with false positives / negatives
[Figure: a user, a local cache S (cost = 1) and a central memory M with all elements (cost = 10).]

Two Possible Errors
False Positive: The element is not in S, but the data structure answers that it is. Results in a redundant access to the local cache. → Additional cost of 1.
False Negative: The element is in S, but the data structure answers that it is not. Results in an expensive access to the central memory instead of the local cache. → Additional cost of 10-1=9.

Bloom Filters (Bloom, 1970)
Initialization: Array of m zero bits.
Insertion: Each of the n elements is hashed k times, and the corresponding bits are set.
Query: Hashing the element, checking that all k bits are set.
False positive rate (probability) of [...]. No false negatives.
[Figure: bit array with elements x, y, z and w.]
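The three operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name, the use of salted SHA-256 to obtain k hash values, and the one-byte-per-bit array are all assumptions made for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: m bits and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # initialization: array of m zero bits

    def _positions(self, item: str):
        # Assumption of this sketch: k hash functions are emulated by
        # salting SHA-256 with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        # Insertion: set the k bits the element hashes to.
        for pos in self._positions(item):
            self.bits[pos] = 1

    def query(self, item: str) -> bool:
        # Query: positive only if all k bits are set.
        return all(self.bits[pos] for pos in self._positions(item))


bf = BloomFilter(m=1024, k=4)
for element in ("x", "y", "z"):
    bf.insert(element)
print(bf.query("x"))  # inserted elements always answer True
```

An element that was never inserted may still answer True if all of its bits were set by other elements; that is the false positive the slide refers to.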

Bloom Filters are Widely Used
o Cache/Memory Framework
o Packet Classification
o Intrusion Detection
o Routing
o Accounting
o Beyond networking: Spell Checking, DNA Classification
Can be found in
o Google's web browser Chrome
o Google's database system BigTable
o Facebook's distributed storage system Cassandra
o Mellanox's IB Switch System

The Bloom Paradox
Sometimes, it is better to disregard the Bloom filter results, and in fact not to even query it, thus making the Bloom filter useless.

Outline
• Introduction to Bloom Filters
• The Bloom Paradox
o The Bloom Paradox in Bloom Filters
o Analysis of the Bloom Paradox
o The Bloom Paradox in the Counting Bloom Filter
• Summary

Bloom Paradox Example
Parameters: [...]
Extreme case without locality: All elements with equal probability of belonging to the cache.
o Toy example
[Figure: Bloom filter.]

Bloom Paradox Example
Parameters: [...]
Let B be the set of elements that the Bloom filter indicates are in S.
o In particular, S ⊆ B: no false negatives in the Bloom filter
→ Intuition: [...]
[Figure: user, Bloom filter B, local cache S (cost = 1), central memory M with all elements (cost = 10).]

Bloom Paradox Example
Parameters: [...]
Let B be the set of elements that the Bloom filter indicates are in S.
o In particular, S ⊆ B: no false negatives in the Bloom filter
→ Surprise: [...]
The Bloom filter indicates the membership of [...] elements. Only [...] of them are indeed in S.

Bloom Paradox Example
When the Bloom filter states that x is in S, it is wrong with probability [...]
Average cost if we listen to the Bloom filter: [...]
Average cost if we don't: [...]
→ The Bloom filter is useless! Don't listen to the Bloom filter.
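The comparison on this slide can be reproduced numerically. The sketch below assumes the cache example's costs (1 for a cache access, 10 for a central-memory access, so a false positive costs 1 + 10 in total) and applies Bayes' rule to a filter with no false negatives. The function names and the numeric values of the prior and the false positive rate are hypothetical, chosen only to show the effect.

```python
def posterior_in_cache(prior: float, fpr: float) -> float:
    """P(x in S | Bloom filter answers positive), assuming no false negatives."""
    return prior / (prior + (1.0 - prior) * fpr)


def expected_costs(prior: float, fpr: float,
                   cache_cost: float = 1.0, memory_cost: float = 10.0):
    """Expected cost of serving an element for which the filter is positive."""
    p = posterior_in_cache(prior, fpr)
    # Listen to the filter: access the cache, fall back to memory on a miss.
    listen = cache_cost + (1.0 - p) * memory_cost
    # Ignore the filter: go straight to the central memory.
    ignore = memory_cost
    return listen, ignore


# Hypothetical values: membership is very unlikely a priori.
listen, ignore = expected_costs(prior=1e-6, fpr=1e-3)
print(listen > ignore)  # True: listening to the filter costs more
```

With a large prior (for example 0.5) and a small false positive rate the inequality flips, and listening to the filter is the cheaper choice.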

Costs of the Two Possible Errors
The cost of a false positive: 1
The cost of a false negative: [...]
In the cache example: 10-1=9

Conditions for the Bloom Paradox
Let [...] be the a priori membership probability of x
o i.e. before getting the answer of the Bloom filter
Intuition: The Bloom paradox occurs more often when:
o [...] is small
o [...] is large (i.e. [...] is small)
o [...] is small (because the Bloom filter implicitly assumes [...])
Theorem 1: The Bloom paradox occurs if and only if [...]
Boundaries of the Bloom Paradox (for [...]): If [...] and [...], the Bloom paradox occurs if [...]
[Figure: Bloom filter, local cache, central memory.]
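One way to arrive at a condition of this kind is a short Bayesian argument. The derivation below is a sketch under assumed notation, not the slide's own formula: p is the a priori membership probability, f the false positive rate of the filter, a false positive costs 1 and a false negative costs α (9 in the cache example).

```latex
\begin{align*}
\Pr(x \in S \mid \text{positive})
  &= \frac{p}{p + (1-p)\,f} \\[4pt]
\text{ignoring the filter is cheaper}
  &\iff \alpha \cdot \Pr(x \in S \mid \text{positive})
        < 1 \cdot \Pr(x \notin S \mid \text{positive}) \\
  &\iff \alpha\, p < (1-p)\, f \\
  &\iff p < \frac{f}{\alpha + f}
\end{align*}
```

Under these assumptions the condition depends only on the prior, the false positive rate and the cost ratio, which matches the three quantities the intuition bullets refer to.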

Bloom Filter Improvements
Theorem 1: The Bloom paradox occurs if and only if [...]
Use the formula to improve the Bloom filter
o Only insert / query Bloom filter if the formula expects it to be useful
[Figure: Bloom filter, local cache, central memory.]
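A selective filter of this kind might look like the sketch below. Everything specific here is an assumption: the class name, the per-element prior supplied by the caller, the design-time false positive rate fpr, and the threshold fpr / (alpha + fpr), which comes from a simple Bayesian cost comparison with false positive cost 1 and false negative cost alpha rather than from the slide.

```python
import hashlib


class SelectiveBloomFilter:
    """Bloom filter that skips elements for which it would not be useful."""

    def __init__(self, m: int, k: int, fpr: float, alpha: float) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)
        # Assumed rule: below this prior, a positive answer would be ignored.
        self.threshold = fpr / (alpha + fpr)

    def _positions(self, item: str):
        # Salted SHA-256 stands in for k hash functions (sketch assumption).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str, prior: float) -> bool:
        """Insert only if the prior is high enough; report whether we did."""
        if prior < self.threshold:
            return False
        for pos in self._positions(item):
            self.bits[pos] = 1
        return True

    def query(self, item: str, prior: float) -> bool:
        """Answer negative without touching the bits if the prior is too low."""
        if prior < self.threshold:
            return False
        return all(self.bits[pos] for pos in self._positions(item))


sbf = SelectiveBloomFilter(m=1024, k=4, fpr=0.01, alpha=9.0)
sbf.insert("x", prior=0.5)
print(sbf.query("x", prior=0.5))   # True
print(sbf.query("y", prior=1e-6))  # False: the filter is not consulted
```

Skipping low-prior insertions also leaves fewer bits set, which lowers the false positive rate for the elements that are still queried.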

Counting Bloom Filters (CBFs)
Bloom filters do not support deletions of elements. Simply resetting bits might cause false negatives.
The solution: Counting Bloom filters - Storing an array of counters instead of bits.
o Insertion: Incrementing counters by one.
o Deletion: Decrementing counters by one.
o Query: Checking that counters are positive.
The same false positive probability.
Require too much memory, e.g. 57 bits per element for [...]
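The counter-based variant can be sketched as follows. This is an illustration under assumptions, not the authors' code: the class and method names are hypothetical, counters are unbounded Python integers rather than fixed-width fields, and salted SHA-256 again stands in for k hash functions.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter sketch: m counters and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item: str) -> None:
        # Insertion: increment the k counters by one.
        for pos in self._positions(item):
            self.counters[pos] += 1

    def delete(self, item: str) -> None:
        # Deletion: decrement the k counters by one.
        # The caller must only delete elements that were inserted before.
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def query(self, item: str) -> bool:
        # Query: positive only if all k counters are positive.
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def counter_values(self, item: str):
        """Values of the counters the element hashes to."""
        return [self.counters[pos] for pos in self._positions(item)]


cbf = CountingBloomFilter(m=1024, k=4)
cbf.insert("x")
cbf.insert("y")
cbf.delete("y")
print(cbf.query("x"))  # True: deleting y cannot remove x
```

Unlike resetting bits in a plain Bloom filter, decrementing leaves every counter of a remaining element at one or more, so deletions do not create false negatives.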

Counting Bloom Filter Query
Query
o Checking that counters are positive.
o Question: Which is more likely to be correct? y or z?

The Bloom Paradox in the Counting Bloom Filter
Theorem 2: Let [...] denote the values of the counters pointed to by the set of hash functions. Then, [...]
Only the counters product matters!

CBF Based Membership Probability
Parameters: n=3328, m=28485, k=6
- Before checking CBF, a priori membership probability = [...] ≈ [...]
- CBF indicates counters product=8 → a posteriori membership probability ≈ 0.69
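The dependence on the counters product can be illustrated with a rough model. The sketch below is an approximation under stated assumptions, not the slide's theorem: each counter is treated as an independent Poisson variable with mean nk/m, so that membership multiplies the likelihood of a counter value c by about c·m/(nk). The function name is hypothetical, and the prior of 2^-5 is an assumed value, since the slide's prior is not available.

```python
from math import prod


def cbf_posterior(counters, n: int, m: int, k: int, prior: float) -> float:
    """Approximate P(x in S | counter values) under a Poisson counter model."""
    if any(c == 0 for c in counters):
        return 0.0  # a zero counter rules out membership
    # Likelihood ratio P(counters | in S) / P(counters | not in S),
    # which depends on the counters only through their product.
    likelihood_ratio = prod(counters) * (m / (n * k)) ** k
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


# Slide parameters, with an assumed prior of 2**-5 and counters product 8.
p = cbf_posterior([2, 2, 2, 1, 1, 1], n=3328, m=28485, k=6, prior=2 ** -5)
print(round(p, 3))  # about 0.685
```

Under that assumed prior the model gives about 0.685, close to the a posteriori value of roughly 0.69 quoted on the slide, and any other counter values with product 8 give the same result.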

Optimal Query Policy
Theorem 3: An optimal decision policy of the counting Bloom filter is to be positive iff [...]
Use the formula to improve the Counting Bloom filter
o Only return a positive indication if the counters product is large enough
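A threshold on the counters product can be motivated by a short calculation. The sketch below is not the slide's formula; it assumes a Poisson model for the counters, an a priori membership probability p, a false positive cost of 1, a false negative cost of α, and counter values c_1, ..., c_k for the queried element.

```latex
\begin{align*}
\frac{\Pr(x \in S \mid c_1,\dots,c_k)}{\Pr(x \notin S \mid c_1,\dots,c_k)}
  &\approx \frac{p}{1-p}\left(\frac{m}{nk}\right)^{k}\prod_{i=1}^{k} c_i \\[4pt]
\text{answer positive}
  &\iff \alpha \cdot \Pr(x \in S \mid c_1,\dots,c_k)
        \ge 1 \cdot \Pr(x \notin S \mid c_1,\dots,c_k) \\
  &\iff \prod_{i=1}^{k} c_i \;\ge\; \frac{1-p}{\alpha\, p}\left(\frac{nk}{m}\right)^{k}
\end{align*}
```

In this approximation the decision reduces to comparing the counters product with a constant that depends only on the prior, the cost ratio and the filter parameters.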

Experimental Results
Internet trace (equinix-chicago) with real hash functions.
Counting Bloom filter parameters: n=2^10, m/n=30, k=5, 2^20 queries

Concluding Remarks
o Discovery of the Bloom paradox
o Importance of the a priori membership probability
o Using the counters product to estimate the correctness of a positive indication of the CBF

Thank You

Implementation
[Figure: Bloom filter insertion and query; selective Bloom filter insertion; selective Bloom filter query.]