The Variable-Increment Counting Bloom Filter
Ori Rottenstreich
Joint work with Yossi Kanizo and Isaac Keslassy
Technion, Israel
Problem Definition
Set S (Special Flows).
Support membership queries of the form "Is flow x in S?", answered Yes or No.
Requirements for the data structure:
Space efficient
Fast (insertion, query)
(Figure: a stream of flows is checked against the set S; each query is answered Yes or No.)
Naïve Solutions
O(n): searching in a list
O(log(n)): searching in a sorted list
O(1)?
Tradeoff: we allow false positives with low probability.
Two possible errors:
False positive: x is not in S, but the answer is Yes.
False negative: x is in S, but the answer is No.
Bloom Filters (Bloom, 1970)
Initialization: array of m zero bits.
Insertion: each of the n elements is hashed k times, and the corresponding bits are set.
Query: hash the element and check that all k bits are set.
False positive rate (probability) of approximately (1 - e^(-kn/m))^k.
No false negatives.
(Figure: elements hashed into the bit array on insertion and on query.)
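The three operations on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the class name, the salted SHA-256 hashing, and the parameter names m (bits), n (elements) and k (hash functions) are assumptions chosen for the example, and the rate function is the standard textbook approximation.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m  # initialization: array of zero bits

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def query(self, item):
        # True may be a false positive; False is always correct.
        return all(self.bits[p] for p in self._positions(item))


def false_positive_rate(m, n, k):
    """Standard approximation (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k
```

An inserted element is always reported as present, because its bits are never cleared. A query on an empty filter always returns False.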
Counting Bloom Filters (CBFs)
Bloom filters do not support deletions of elements: simply resetting bits might cause false negatives.
The solution: counting Bloom filters, which store an array of counters instead of bits.
Insertion: increment the counters by one.
Deletion: decrement the counters by one.
Query: check that all the counters are positive.
The same false positive probability.
They require too much memory, e.g. 57 bits per element for a false positive rate of 10^-3.
(Figure: x and y are inserted; each hashed counter is incremented by one.)
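A counting Bloom filter differs from the plain filter only in storing counters. The sketch below is a hedged illustration: the class name and the salted SHA-256 hashing are assumptions, and counter overflow is not modeled (Python integers are unbounded, whereas a real CBF uses a few bits per counter).

```python
import hashlib


class CountingBloomFilter:
    """Minimal CBF sketch: m counters, k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item):
        # Assumption: k hash functions simulated by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def insert(self, item):
        for p in self._positions(item):
            self.counters[p] += 1

    def delete(self, item):
        # Only valid for items that were inserted earlier.
        for p in self._positions(item):
            self.counters[p] -= 1

    def query(self, item):
        return all(self.counters[p] > 0 for p in self._positions(item))
```

Deleting x leaves every counter that y touched positive, so y is still found. This is the property that resetting bits in a plain Bloom filter cannot guarantee.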
(Counting) Bloom Filters are Widely Used
In networking: packet classification, intrusion detection, routing, accounting.
Beyond networking: spell checking, DNA classification.
Can be found in:
Google's web browser Chrome
Google's database system BigTable
Facebook's distributed storage system Cassandra
Mellanox's IB Switch System
Outline
Introduction to Bloom Filters
The Variable-Increment Counting Bloom Filter
  Intuition for Variable Increments
  The Bh-CBF Scheme
  The VI-CBF Scheme
Experimental Results
Summary
Intuition for Variable Increments
Upon query, we should consider the exact values of the counters, not just whether they are positive.
Idea: use variable increments to encode the element identity.
(Figure: x and y hashed into a counter array with variable increments.)
Architecture
Each hash entry contains a pair of counters:
c1, fixed increments → number of elements in the entry (as in CBF)
c2, variable increments → weighted sum of the elements, with weights from a pre-determined set D
We use two sets of hash functions:
The first set uses hash functions whose range is the set of entry indices, i.e. it points to the set of entries.
The second set uses hash functions whose range is the set of indices into D, i.e. it points to the set D.
(Figure: an array of nine hash entries, each holding a counter pair (c1, c2).)
Insertion
At each entry that an element hashes to, the two counters are updated as follows: c1 is incremented by 1, and c2 is incremented by a variable increment from the set D.
Example 1: (Figure: x and z are inserted into the nine-entry array; each hashed entry gets c1 + 1 and c2 plus an increment such as +8, +4, +13, +4.)
Query
Query: is y in S?
We ask whether
17 can be a sum of 2 elements from the set D including 4,
30 can be a sum of 3 elements from the set D including 8.
No: 30 cannot be such a sum, so y is not in S.
How should we pick the set D of variable increments? We should use B_h sequences!
(Figure: flow y is hashed into the array of (c1, c2) entries.)
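The per-entry test in this query can be written as a brute-force check. The sketch below is an illustration under stated assumptions: the increment set D = (1, 4, 8, 13) is taken from the increments that appear in the figures and is not stated explicitly here, and the function name explains is hypothetical.

```python
from itertools import combinations_with_replacement

# Assumption: increment set suggested by the figures (+1, +4, +8, +13).
D = (1, 4, 8, 13)


def explains(c1, c2, v, increments=D):
    """True if c2 can be a sum of c1 elements of `increments`, one of them v."""
    if c1 < 1:
        return False
    rest = c2 - v  # what the other c1 - 1 elements must add up to
    return any(
        sum(combo) == rest
        for combo in combinations_with_replacement(increments, c1 - 1)
    )
```

With this set, explains(2, 17, 4) holds because 17 = 4 + 13, while explains(3, 30, 8) fails because 22 is not a sum of two increments. A single failing entry is enough to answer No.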
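Bh Sequences
Definition 1: Let D = {v_1, v_2, ..., v_l} be a sequence of positive integers. Then D is a B_h sequence iff all the sums v_i1 + v_i2 + ... + v_ih with 1 <= i1 <= i2 <= ... <= ih <= l are distinct.
Example 2: Take D = {1, 4, 8, 13}, the increments that appear in the query figures. All the sums of 2 elements of D are distinct: 2, 5, 8, 9, 12, 14, 16, 17, 21, 26. Therefore, D is a B_2 sequence.
B_h sequences are widely used in error-correcting codes.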
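The definition can be checked directly by enumerating all sums of h elements with repetition. The function name is_bh_sequence is an assumption for this sketch, and the brute-force enumeration is only practical for small sets.

```python
from itertools import combinations_with_replacement


def is_bh_sequence(values, h):
    """True if all sums of h elements of `values` (repeats allowed) are distinct."""
    sums = [sum(combo) for combo in combinations_with_replacement(values, h)]
    return len(sums) == len(set(sums))
```

For instance, is_bh_sequence([1, 4, 8, 13], 2) holds, whereas [1, 2, 3] is not a B_2 sequence because 1 + 3 = 2 + 2.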
The Bh-CBF Scheme: Query
Example 3: D is a B_h sequence, so in an entry with c1 <= h the pair (c1, c2) determines exactly which increments were added.
If, in some hashed entry, the queried element's increment cannot be among them, the Bh-CBF can determine that the element is not in S.
If the increment is compatible with (c1, c2) in every hashed entry, the Bh-CBF cannot exclude that the element is in S.
(Figure: queries for x, y and z against the array of (c1, c2) entries, testing the increments 1, 4, 8 and 13.)
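Putting the architecture, insertion and query together gives a structure along the following lines. This is a sketch under assumptions, not the authors' implementation: the set D = (1, 4, 8, 13), the salted SHA-256 hashing, and the names BhCBF, _hash and is_sum_of are all chosen for illustration, and the sum test is a plain recursive search rather than a lookup table.

```python
import hashlib
from functools import lru_cache

D = (1, 4, 8, 13)  # assumption: a B_2 sequence used as the increment set


def _hash(tag, i, item, modulus):
    # Assumption: hash functions simulated with salted SHA-256.
    digest = hashlib.sha256(f"{tag}:{i}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


@lru_cache(maxsize=None)
def is_sum_of(total, count):
    """True if total is a sum of exactly `count` elements of D (repeats allowed)."""
    if count == 0:
        return total == 0
    return any(is_sum_of(total - d, count - 1) for d in D if d <= total)


class BhCBF:
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.c1 = [0] * m  # number of elements per entry
        self.c2 = [0] * m  # sum of variable increments per entry

    def _targets(self, item):
        for i in range(self.k):
            entry = _hash("h", i, item, self.m)      # first hash set: entry
            inc = D[_hash("g", i, item, len(D))]     # second hash set: increment
            yield entry, inc

    def insert(self, item):
        for e, v in self._targets(item):
            self.c1[e] += 1
            self.c2[e] += v

    def delete(self, item):
        for e, v in self._targets(item):
            self.c1[e] -= 1
            self.c2[e] -= v

    def query(self, item):
        for e, v in self._targets(item):
            # The remaining c1 - 1 elements must account for c2 - v.
            if self.c1[e] == 0 or not is_sum_of(self.c2[e] - v, self.c1[e] - 1):
                return False
        return True
```

An inserted element always passes the test, since removing its own increment leaves a valid sum of the other elements, so the sketch has no false negatives.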
The VI-CBF Scheme: Principles
Two counters in each hash entry use more space. Can we keep only the variable-increment counter?
In the VI-CBF (Variable-Increment Counting Bloom Filter), each hash entry contains only the variable-increment counter.
The counter is updated like the variable-increment counter c2 in the Bh-CBF.
=> We want a variant of B_h in which we do not know h.
The VI-CBF Scheme: Principles
Problem: we do not know the number of elements in each hash entry.
Example 4 (with the B_h sequence D):
30 cannot be a sum of 3 elements from the set D including 8.
However, 30 can be a sum of 5 elements from the set D including 8.
(Figure: flow y is queried against the array, with increments 4 and 8.)
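When the element count is unknown, the test becomes "can the counter be a sum of any number of increments that includes v?". The sketch below illustrates this with a small reachability table. The function name explains_without_count is hypothetical, and both increment sets are assumptions: (1, 4, 8, 13) is taken from the figures, and (4, 5, 6, 7) is a contrasting set chosen for the example.

```python
def explains_without_count(c2, v, increments):
    """True if c2 is a sum of some number of elements of `increments`, one of them v."""
    rest = c2 - v  # what the other elements, however many, must add up to
    if rest < 0:
        return False
    reachable = [False] * (rest + 1)
    reachable[0] = True  # the empty sum
    for s in range(1, rest + 1):
        reachable[s] = any(d <= s and reachable[s - d] for d in increments)
    return reachable[rest]
```

With the set (1, 4, 8, 13), the value 30 passes for increment 8. Because that set contains 1, every counter of at least v passes, which is why a different kind of increment set is needed once c1 is dropped. With (4, 5, 6, 7), a counter of 9 rules out increment 7 because the remainder 2 is unreachable.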
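The VI-CBF Scheme: Principles
In the VI-CBF, the set D of variable increments is not necessarily a B_h sequence.
Example 5: (Figure: x and y are inserted with increments such as +7, +5 and +4; z is then queried.) Based on either of two of the counters that z hashes to, the VI-CBF can deduce that z is not in S.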
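A Simple Option for D: D_L = [L, 2L-1]
For a given L, we define the set D_L of size L as D_L = {L, L+1, ..., 2L-1}.
Intuition: consider the counter value that remains after subtracting the queried element's increment.
Negative: not possible
0: sum of zero elements
1 to L-1: not possible
L to 2L-1: sum of one element
2L or more: sum of two or more elements
Lemma 1: Let x be an element whose i-th hash function hashes into an entry of value c, with variable increment v. If c < v or 0 < c - v < L, then x is not in S.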
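A VI-CBF over D_L needs one counter per entry and a range test at query time. The sketch below is a hedged illustration rather than the authors' implementation: the class name VICBF, the salted SHA-256 hashing and the parameter names are assumptions. The query rule follows from the structure of D_L: after removing the queried increment v, the remainder must be 0 or at least L.

```python
import hashlib


def _hash(tag, i, item, modulus):
    # Assumption: hash functions simulated with salted SHA-256.
    digest = hashlib.sha256(f"{tag}:{i}:{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % modulus


class VICBF:
    """Sketch of a VI-CBF with increments drawn from D_L = {L, ..., 2L-1}."""

    def __init__(self, m, k, L):
        self.m, self.k, self.L = m, k, L
        self.counters = [0] * m  # one variable-increment counter per entry

    def _targets(self, item):
        for i in range(self.k):
            entry = _hash("h", i, item, self.m)
            inc = self.L + _hash("g", i, item, self.L)  # value in [L, 2L-1]
            yield entry, inc

    def insert(self, item):
        for e, v in self._targets(item):
            self.counters[e] += v

    def delete(self, item):
        for e, v in self._targets(item):
            self.counters[e] -= v

    def query(self, item):
        for e, v in self._targets(item):
            rest = self.counters[e] - v
            # rest must be 0 (item alone) or >= L (at least one other item).
            if rest < 0 or 0 < rest < self.L:
                return False
        return True
```

As in a CBF, an inserted element is never rejected. The extra range test rejects some queries that a plain positivity check would accept.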
VI-CBF Outperforms CBF
Theorem 1: While keeping the same bit-per-element ratio, VI-CBF satisfies the following properties when compared to CBF:
(i) VI-CBF obtains a lower false positive rate than CBF.
(ii)
(iii) VI-CBF obtains a lower counter overflow probability bound than the classical bound of CBF.
Cost: limited implementation overhead.
Experimental Results
Internet trace (equinix-chicago) with real hash functions.
For the Bh-CBF, (with ). For the VI-CBF, and .
Concluding Remarks
Encoding the element identity using variable increments
Considering the exact values of the counters upon query
Can extend many variants of the counting Bloom filter
First time B_h sequences are presented in networking applications
Thank You