Published by Easter Wells. Modified over 9 years ago.
1
A Formal Analysis of Conservative Update Based Approximate Counting. Gil Einziger and Roy Friedman, Technion, Haifa
2
Approximate Counting: We wish to count the number of occurrences of various items from a very large domain. To gain space efficiency, we are willing to tolerate only an "approximate count".
3
Bloom Filters: An array BF of m bits and k hash functions {h_1,…,h_k}, each mapping an object to a position in [0,…,m-1]. Adding an object obj to the Bloom filter is done by computing h_1(obj),…,h_k(obj) and setting the corresponding bits in BF. Checking for set membership for an object cand is done by computing h_1(cand),…,h_k(cand) and verifying that all corresponding bits are set.
Example (m=11, k=3): adding o1 with h_1(o1)=0, h_2(o1)=7, h_3(o1)=5 sets three bits of BF. Checking o1 succeeds (√). Checking o2 with h_1(o2)=0, h_2(o2)=7, h_3(o2)=4 fails (×), because bit 4 is not set.
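The add and membership-check steps can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the class name `BloomFilter` is hypothetical, and lookup tables stand in for real hash functions so that the hash values match the slide's example.

```python
from typing import Callable, List, Sequence


class BloomFilter:
    """Array of m bits plus k hash functions, each mapping an object to [0, m-1]."""

    def __init__(self, m: int, hashes: Sequence[Callable[[str], int]]) -> None:
        self.bits: List[int] = [0] * m
        self.hashes = hashes

    def add(self, obj: str) -> None:
        # Set every bit selected by the k hash functions.
        for h in self.hashes:
            self.bits[h(obj)] = 1

    def contains(self, cand: str) -> bool:
        # Membership holds only if all k selected bits are set.
        return all(self.bits[h(cand)] == 1 for h in self.hashes)


# Assumption: dictionaries replace real hash functions to reproduce
# the slide's values (o1 -> 0, 7, 5 and o2 -> 0, 7, 4).
H1 = {"o1": 0, "o2": 0}
H2 = {"o1": 7, "o2": 7}
H3 = {"o1": 5, "o2": 4}

bf = BloomFilter(11, [H1.__getitem__, H2.__getitem__, H3.__getitem__])
bf.add("o1")
print(bf.contains("o1"))  # True
print(bf.contains("o2"))  # False: bit 4 was never set
```

A real deployment would replace the lookup tables with k independent hash functions. The check can return a false positive, but never a false negative.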
4
Counting Bloom Filters: A vector of counters (instead of bits). A counting Bloom filter supports the operations:
– Increment: increment by 1 all entries that correspond to the results of the k hash functions.
– Decrement: decrement by 1 all entries that correspond to the results of the k hash functions.
– Estimate (instead of get): return the minimal value of all corresponding entries.
Example (m=11, k=3): h_1(o1)=0, h_2(o1)=7, h_3(o1)=5. Increment(o1) raises the three corresponding counters from 3, 6, 8 to 4, 7, 9, so Estimate(o1)=4.
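The three operations can be sketched as plain functions over a list of counters. The function names are hypothetical, and which of the values 3, 6, 8 sits at which of o1's positions is an assumption made only to get a concrete example.

```python
from typing import List, Sequence


def increment(counters: List[int], positions: Sequence[int]) -> None:
    # Add 1 to every counter the item hashes to (each position once).
    for p in set(positions):
        counters[p] += 1


def decrement(counters: List[int], positions: Sequence[int]) -> None:
    # Subtract 1 from every counter the item hashes to (each position once).
    for p in set(positions):
        counters[p] -= 1


def estimate(counters: List[int], positions: Sequence[int]) -> int:
    # The smallest counter is the least inflated by other items.
    return min(counters[p] for p in positions)


# m = 11; o1 hashes to positions 0, 7, 5 as in the slide's example.
# Assumption: the counter values 3, 6, 8 are placed at positions 0, 7, 5.
cbf = [0] * 11
cbf[0], cbf[7], cbf[5] = 3, 6, 8
o1 = (0, 7, 5)

increment(cbf, o1)
print(estimate(cbf, o1))  # 4
decrement(cbf, o1)
print(estimate(cbf, o1))  # 3
```

Because other items may share counters with o1, the estimate can exceed the true count but never falls below it, as long as only inserted items are decremented.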
5
Conservative Update Technique: Give up the ability to Decrement in favor of accuracy/space efficiency.
– During an Increment operation, only update the lowest counters.
Example (SBF-MI, m=11, k=3): h_1(o1)=0, h_2(o1)=7, h_3(o1)=5, and the counters of o1 hold 3, 6 and 8. Increment(o1) only adds to the first entry (3->4).
Empirically shown to improve accuracy, by up to two orders of magnitude for some workloads.
– But not formally understood.
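A minimal sketch of the conservative increment, for comparison with a regular counting Bloom filter: only the counters that currently hold the minimum are raised. The function names and the placement of the values 3, 6, 8 on o1's positions are assumptions for illustration.

```python
from typing import List, Sequence


def conservative_increment(counters: List[int], positions: Sequence[int]) -> None:
    # Find the current minimum among the item's counters ...
    low = min(counters[p] for p in positions)
    # ... and raise only the counters that are equal to it.
    for p in set(positions):
        if counters[p] == low:
            counters[p] += 1


def estimate(counters: List[int], positions: Sequence[int]) -> int:
    # Same estimate as in a counting Bloom filter: the minimal counter.
    return min(counters[p] for p in positions)


# m = 11; o1 hashes to positions 0, 7, 5 as in the slide's example.
# Assumption: the counter values 3, 6, 8 are placed at positions 0, 7, 5.
counters = [0] * 11
counters[0], counters[7], counters[5] = 3, 6, 8
o1 = (0, 7, 5)

conservative_increment(counters, o1)
print(counters[0], counters[7], counters[5])  # 4 6 8
print(estimate(counters, o1))                 # 4
```

The estimate for o1 is the same as after a regular increment, but the counters holding 6 and 8 are left alone, so items that share them are not inflated.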
6
Motivation Applications: – Network measurements and heavy hitters. – Network security: anomaly detection. – Cache admission policy. Additional applications in other fields, e.g. databases and natural language processing.
7
TinyLFU - Cache Admission Policy (PDP 2014): The access distribution of most content is skewed. ▫ Often modeled using Zipf-like functions, power-law, etc. [Figure: frequency vs. rank. A small number of very popular items (for example, ~50% of the weight), followed by a long heavy tail (for example, ~50% of the weight).]
8
Eviction and Admission Policies: [Diagram: a new item arrives at the cache. Eviction policy: "One of you guys should leave…" selects a cache victim. Admission policy: "Is the new item any better than the victim?" The winner stays in the cache.] What is the common answer?
9
Conservative Update - Intuition: Conservative Update allows counting just the head items, with high accuracy, so our cache can make educated admission decisions. [Figure: items split into "desired" and "undesired".]
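A frequency-based admission decision can be sketched as a comparison between the estimated counts of the new item and of the cache victim. This is only an illustration of the idea: the function name `admit`, the strict tie-breaking rule, and the exact-count dictionary standing in for an approximate counter are all assumptions.

```python
from typing import Callable, Dict


def admit(new_item: str, victim: str, estimate: Callable[[str], int]) -> bool:
    # Admit the new item only if it looks more frequent than the victim.
    # Assumption: on a tie the victim stays in the cache.
    return estimate(new_item) > estimate(victim)


# Assumption: an exact dictionary replaces the approximate counting structure.
freq: Dict[str, int] = {"popular": 40, "one_hit_wonder": 1}


def est(item: str) -> int:
    return freq.get(item, 0)


print(admit("one_hit_wonder", "popular", est))  # False
print(admit("popular", "one_hit_wonder", est))  # True
```

In practice `estimate` would be backed by a compact approximate counter, which is where the accuracy on head items matters.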
10
Admission Policy Example: [Plot: hit rate vs. cache size, without admission policy and with a frequency-based admission policy. Annotations: "More memory", "Better cache management".]
11
The Basic Observation: A CBF is exactly like [figure: CBF = LCS]. If we can quantify how many items are inserted into each level of the LCS, we can bound the error.
12
Simple Observations: It is useful to discuss the number of items that are inserted into each level of the LCS. Since all levels are considered the same, the false positive probability of each level is determined only by the number of items inserted into that level. A false positive at a higher level implies a false positive at all lower levels.
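One way to picture the levels is sketched below. It rests on an assumption the slides do not spell out: level i is read as a bit array in which a bit is set whenever the corresponding counter holds at least i. The function name `levels` is hypothetical.

```python
from typing import List


def levels(counters: List[int]) -> List[List[int]]:
    # Assumption: level i has a 1 wherever the counter value is >= i.
    top = max(counters, default=0)
    return [[1 if c >= lvl else 0 for c in counters] for lvl in range(1, top + 1)]


counters = [1, 0, 2, 1]
for lvl, bits in enumerate(levels(counters), start=1):
    print(lvl, bits)
# 1 [1, 0, 1, 1]
# 2 [0, 0, 1, 0]
```

Under this reading every bit set at a level is also set at all lower levels, which matches the observation that a false positive at a higher level implies one at all lower levels.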
13
The Model: Known (constant) distribution. Large enough sample. – We assume that we can make a 'characteristic' histogram. Formally, for every number of appearances, we know how many items are going to appear that many times.
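The 'characteristic' histogram can be illustrated on a toy stream: first count how often each item appears, then count how many items share each frequency. The stream below is made up for the example.

```python
from collections import Counter

# Hypothetical toy stream of item identifiers.
stream = ["a", "b", "a", "c", "a", "b"]

# Occurrences per item: a appears 3 times, b twice, c once.
counts = Counter(stream)

# Histogram: frequency -> number of items that appear exactly that many times.
histogram = Counter(counts.values())

for times in sorted(histogram):
    print(times, histogram[times])
# 1 1
# 2 1
# 3 1
```

The model assumes this histogram is known in advance. Here it is computed from the data only to show what it contains.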
14
Lower Bound: Denote by A[i] the number of items that are actually inserted into level i. By definition: A min/max argument about the lowest level that could have experienced a false positive yields the following:
15
Upper Bound: Derived similarly, by upper-bounding A[i]. Requires a few further assumptions. Technical details are in the paper.
16
Accurate Configuration – Uniform
17
Accurate Configuration – Zipf 1
18
Inaccurate Configuration – Uniform
19
Inaccurate Configuration – Zipf 1
20
Real Trace – Counting TCP packets
21
Summary: A simple analysis of an extensively used approximate counting optimization. First to analyze it for general distributions. Lower and upper bounds in the model. A good indicator on real workloads. An extended version is published as a tech report. Thank You