The Variable-Increment Counting Bloom Filter


The Variable-Increment Counting Bloom Filter
Ori Rottenstreich
Joint work with Yossi Kanizo and Isaac Keslassy
Technion, Israel

Problem Definition
Given a set S (special flows), support membership queries of the form "Is flow x in S?", answered Yes or No.
Requirements for the data structure: space efficient, and fast (insertion, query).

Naïve Solutions
O(n): searching in a list.
O(log(n)): searching in a sorted list.
O(1)?
Tradeoff: we allow false positives with low probability.
Two possible errors:
False positive: x is not in S, but the answer is Yes.
False negative: x is in S, but the answer is No.

Bloom Filters (Bloom, 1970)
Initialization: array of m zero bits.
Insertion: each of the n elements is hashed k times, and the corresponding bits are set.
Query: hash the element and check that all k bits are set.
False positive rate (probability) of approximately (1 - e^(-kn/m))^k. No false negatives.
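The three operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the names m and k, the class name, and the SHA-256 based hash family are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Bloom filter over an array of m bits with k hash functions (m, k are assumed names)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: all bits are zero

    def _positions(self, item: str) -> list[int]:
        # Hypothetical hash family: SHA-256 salted with the hash index.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, item: str) -> None:
        # Set every bit the item hashes to.
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item: str) -> bool:
        # True means "possibly in the set"; False means "definitely not in the set".
        return all(self.bits[p] for p in self._positions(item))


bf = BloomFilter(m=64, k=3)
bf.insert("flow x")
bf.insert("flow y")
print(bf.query("flow x"), bf.query("flow y"))
```

Inserted elements always answer True, which is the "no false negatives" property; an element that was never inserted can still answer True if other elements happen to have set all of its bits.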

Counting Bloom Filters (CBFs)
Bloom filters do not support deletions of elements: simply resetting bits might cause false negatives.
The solution: counting Bloom filters, which store an array of counters instead of bits.
Insertion: increment the counters by one.
Deletion: decrement the counters by one.
Query: check that all the counters are positive.
The false positive probability is the same as for the Bloom filter.
Drawback: CBFs require too much memory, e.g. 57 bits per element at the target false positive rate.
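A counting Bloom filter differs from the plain filter only in what each cell stores. The sketch below assumes unbounded Python integers as counters (a real CBF uses a few bits per counter and must handle overflow) and a hypothetical SHA-256 based hash family; the names m and k are assumptions as well.

```python
import hashlib


class CountingBloomFilter:
    """Counting Bloom filter: m counters, k hash functions (assumed names)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str) -> list[int]:
        # Hypothetical hash family: SHA-256 salted with the hash index.
        positions = []
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.m)
        return positions

    def insert(self, item: str) -> None:
        for p in self._positions(item):
            self.counters[p] += 1

    def delete(self, item: str) -> None:
        # Only delete items that were inserted before; otherwise
        # counters are corrupted and false negatives become possible.
        for p in self._positions(item):
            self.counters[p] -= 1

    def query(self, item: str) -> bool:
        return all(self.counters[p] > 0 for p in self._positions(item))


cbf = CountingBloomFilter(m=64, k=3)
cbf.insert("flow x")
cbf.insert("flow y")
cbf.delete("flow x")
print(cbf.query("flow y"))
```

Deleting "flow x" leaves every counter that "flow y" touched positive, so the remaining element is still found.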

(Counting) Bloom Filters are Widely Used
In networking: packet classification, intrusion detection, routing, accounting.
Beyond networking: spell checking, DNA classification.
Can be found in:
Google's web browser Chrome
Google's database system BigTable
Facebook's distributed storage system Cassandra
Mellanox's IB Switch System

Outline
Introduction to Bloom Filters
The Variable-Increment Counting Bloom Filter
  Intuition for Variable Increments
  The Bh-CBF Scheme
  The VI-CBF Scheme
Experimental Results
Summary

Intuition for Variable Increments
Upon query, we should consider the exact values of the counters and not just their positiveness.
Idea: use variable increments to encode the element identity.

Architecture
Each hash entry contains a pair of counters:
c1, fixed increments → number of elements in the entry (as in the CBF)
c2, variable increments → weighted sum of the elements, with weights from a pre-determined set D
We use two sets of hash functions:
The first set has the entry indices as its range, i.e. it points to the set of entries.
The second set has the indices of D as its range, i.e. it points to the set D.
[Figure: a table of 9 hash entries, each holding a (c1, c2) counter pair.]

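Insertion
At each entry that the inserted element hashes to, the two counters are updated as follows: c1 is incremented by one, and c2 is incremented by a variable increment from the set D, selected by the second set of hash functions.
Example 1: inserting x and z.
[Figure: the 9-entry table before and after the insertions; the c2 increments shown are +8, +4, +13 and +4.]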
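The insertion rule can be sketched as follows. Several things here are assumptions made for illustration: the increment set D = (1, 4, 8, 13) (these four values appear as increments in the slides' figures, but the full set is not stated in the transcript), the number of hash functions K, and the SHA-256 based hash family. The 9-entry table size follows the figure.

```python
import hashlib

D = (1, 4, 8, 13)  # assumed increment set; values taken from the slides' figures
M = 9              # number of hash entries, as in the 9-entry table
K = 2              # assumed number of hash functions in each set


def _hash(tag: str, i: int, item: str, modulus: int) -> int:
    # Hypothetical hash family; the tag separates the two sets of hash functions.
    digest = hashlib.sha256(f"{tag}{i}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def insert(c1: list[int], c2: list[int], item: str) -> None:
    for i in range(K):
        entry = _hash("h", i, item, M)               # first set: which entry
        increment = D[_hash("g", i, item, len(D))]   # second set: which element of D
        c1[entry] += 1          # fixed increment: counts elements in the entry
        c2[entry] += increment  # variable increment: encodes element identity


c1 = [0] * M
c2 = [0] * M
for flow in ("flow x", "flow z"):
    insert(c1, c2, flow)
print(c1)
print(c2)
```

After any sequence of insertions, each c2 value is a sum of exactly c1 elements of D, which is the invariant the query step relies on.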

Query
Query for flow y. We ask whether:
17 can be a sum of 2 elements from the set D including 4
30 can be a sum of 3 elements from the set D including 8
No: 30 cannot be a sum of 3 elements from the set including 8, so y is not in S.
How should we pick the set D of variable increments? We should use Bh sequences!
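The two questions on this slide amount to a small decomposition check. The sketch below answers them by brute force, assuming the increment set is D = (1, 4, 8, 13); that set is an assumption inferred from the increments visible in the figures, and the function name is hypothetical.

```python
from itertools import combinations_with_replacement

D = (1, 4, 8, 13)  # assumed increment set; not stated explicitly in the transcript


def can_contain(c1: int, c2: int, v: int, increments=D) -> bool:
    """Can c2 be written as a sum of exactly c1 elements of `increments`
    (repetitions allowed) such that v is one of them?"""
    return any(
        sum(combo) == c2 and v in combo
        for combo in combinations_with_replacement(increments, c1)
    )


print(can_contain(2, 17, 4))  # 17 = 4 + 13 includes 4
print(can_contain(3, 30, 8))  # with 3 elements, 30 = 4 + 13 + 13 only
```

Under the assumed D, the first check succeeds and the second fails, which matches the "No" answer for flow y on the slide.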

Bh Sequences
Definition 1: Let D = {v_1, v_2, ..., v_l} be a sequence of positive integers. Then D is a Bh sequence iff all the sums v_i1 + v_i2 + ... + v_ih with 1 <= i1 <= i2 <= ... <= ih <= l are distinct.
Example 2: All the sums of h elements of the example set are distinct. Therefore, it is a Bh sequence.
Bh sequences are widely used in error-correcting codes.
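Definition 1 translates directly into a brute-force test. The sketch below is only an illustration of the definition; the function name is hypothetical, and the set (1, 4, 8, 13) is used as an example because those values appear as increments in the slides' figures, not because the transcript states it.

```python
from itertools import combinations_with_replacement


def is_bh_sequence(seq, h: int) -> bool:
    """True iff all sums of h elements of seq (repetitions allowed,
    order ignored) are distinct."""
    sums = [sum(combo) for combo in combinations_with_replacement(seq, h)]
    return len(sums) == len(set(sums))


print(is_bh_sequence((1, 4, 8, 13), 2))
print(is_bh_sequence((1, 4, 8, 13), 3))
print(is_bh_sequence((1, 4, 8, 13), 4))  # 1 + 1 + 13 + 13 = 4 + 8 + 8 + 8 = 28
print(is_bh_sequence((1, 2, 3), 2))      # 1 + 3 = 2 + 2 = 4
```

The check enumerates every multiset of size h, so it is only practical for the small sets and small h used in examples like these.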

The Bh-CBF Scheme: Query
Example 3 (D is a Bh sequence): for each entry that the queried element hashes to, c1 gives the number of elements in the entry, and when c1 is at most h the Bh property makes the decomposition of c2 into that many increments unique.
Query x (increments 1 and 4): the Bh-CBF can determine that x is not in S.
Query y (increments 4 and 8): here c1 = 2 and c2 = 17, and then necessarily 17 = 4 + 13, which includes 4. Since 30 is not a sum of 3 elements of D that includes 8, the Bh-CBF can determine that y is not in S.
Query z (increments 4 and 13): the Bh-CBF cannot exclude that z is in S.


The VI-CBF Scheme: Principles
Two counters in each hash entry use more space. Can we keep only the variable-increment counter?
In the VI-CBF (Variable-Increment Counting Bloom Filter), each hash entry contains only the variable-increment counter.
The counter is updated like the variable-increment counter in the Bh-CBF.
=> We want a variant of Bh sequences in which we do not know h.

The VI-CBF Scheme: Principles
Problem: we do not know the number of elements in each hash entry.
Example 4 (with the sequence D): 30 cannot be a sum of 3 elements from the set including 8. However, 30 can be a sum of 5 elements from the set including 8.
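The ambiguity in Example 4 can be made concrete by listing every element count that is consistent with a counter value. This sketch assumes D = (1, 4, 8, 13), which is inferred from the increments in the figures rather than stated in the transcript, and the function name is hypothetical.

```python
D = (1, 4, 8, 13)  # assumed increment set


def possible_counts(c: int, v: int, increments=D) -> list[int]:
    """All n such that c is a sum of n elements of `increments`
    (repetitions allowed) that includes v."""
    rest = c - v  # what the other n - 1 elements must add up to
    if rest < 0:
        return []
    # reachable[s] = set of element counts that can sum to exactly s
    reachable = [set() for _ in range(rest + 1)]
    reachable[0].add(0)
    for s in range(1, rest + 1):
        for d in increments:
            if d <= s:
                reachable[s] |= {n + 1 for n in reachable[s - d]}
    return sorted(n + 1 for n in reachable[rest])


counts = possible_counts(30, 8)
print(3 in counts, 5 in counts)
```

With c1 available, only the count 3 is tested and the element is rejected; without c1, counts such as 5 (for instance 8 + 1 + 4 + 4 + 13) remain possible, so a Bh sequence alone no longer rules the element out.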

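The VI-CBF Scheme: Principles
In the VI-CBF, the set D of variable increments is not necessarily a Bh sequence.
Example 5: based on either of two counter values, the VI-CBF can deduce that the queried element z is not in S.
[Figure: x and y are inserted with the increments +7, +5, +4, +5, +5, +4; z is queried.]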

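A Simple Option for D: D_L = [L, 2L-1]
For a parameter L, we define the set D_L of size L as D_L = {L, L+1, ..., 2L-1}.
Intuition, by counter value:
0: sum of zero elements
1 to L-1: not possible
L to 2L-1: sum of one element
2L or more: sum of two or more elements
Lemma 1: Let x be an element whose i-th hash function hashes into an entry of value c, and let v be the variable increment of x in that entry. If c - v is negative or lies in [1, L-1], then x is not in S.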
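A single-counter filter built on D_L can be sketched as below. The query rule follows from the structure of D_L (after removing the element's own increment, the remainder must be 0 or at least L) and is offered as an illustration of the idea behind Lemma 1, not as the authors' exact formulation. The values of L, M and K, the class name, and the hash family are assumptions.

```python
import hashlib

L = 4                         # assumed parameter
D_L = tuple(range(L, 2 * L))  # {L, L+1, ..., 2L-1}
M = 16                        # assumed number of hash entries
K = 2                         # assumed number of hash functions in each set


def _hash(tag: str, i: int, item: str, modulus: int) -> int:
    # Hypothetical hash family; the tag separates the two sets of hash functions.
    digest = hashlib.sha256(f"{tag}{i}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def may_contain(c: int, v: int, size: int = L) -> bool:
    """Could a counter of value c include the increment v,
    when every increment lies in [size, 2*size - 1]?"""
    rest = c - v  # what the other elements in the entry must add up to
    return rest == 0 or rest >= size


class VICBF:
    def __init__(self) -> None:
        self.counters = [0] * M

    def _updates(self, item: str) -> list[tuple[int, int]]:
        # (entry, increment) pairs chosen by the two sets of hash functions.
        return [
            (_hash("h", i, item, M), D_L[_hash("g", i, item, L)])
            for i in range(K)
        ]

    def insert(self, item: str) -> None:
        for entry, v in self._updates(item):
            self.counters[entry] += v

    def delete(self, item: str) -> None:
        # Only valid for items that were inserted before.
        for entry, v in self._updates(item):
            self.counters[entry] -= v

    def query(self, item: str) -> bool:
        return all(may_contain(self.counters[e], v) for e, v in self._updates(item))


vi = VICBF()
vi.insert("flow x")
vi.insert("flow y")
print(vi.query("flow x"), vi.query("flow y"))
```

For example, with L = 4 a counter of value 7 cannot contain an element whose increment is 5, because the remainder 2 is neither 0 nor at least 4; a plain CBF would only see a positive counter and could not reject it.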

VI-CBF Outperforms CBF
Theorem 1: While keeping the same bit-per-element ratio, the VI-CBF satisfies the following properties when compared to the CBF:
(i) The VI-CBF obtains a lower false positive rate than the CBF.
(ii) [...]
(iii) The VI-CBF obtains a lower counter overflow probability bound than the classical bound of the CBF.
Cost: limited implementation overhead.


Experimental Results
Internet trace (equinix-chicago) with real hash functions.
The Bh-CBF and the VI-CBF are each run with their own parameter settings.

Concluding Remarks
Encoding the element identity using variable increments.
Considering the exact values of the counters upon query.
Can extend many variants of the counting Bloom filter.
First time Bh sequences are presented in networking applications.

Thank You