Ariel Rosenfeld
A counter that ranges from 0 to M requires $\log_2 M$ bits. For large data, $\log_2 M$ is still a lot. Using probability, we can reduce this to about $\log_2 \log_2 M$ bits. ◦ The cost is a small probability of error.
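To make the saving concrete, here is a minimal sketch that compares the two bit counts for a given M. The function names `exact_counter_bits` and `exponent_counter_bits` are illustrative and not from the slides, and the numbers are rough integer bit lengths rather than exact logarithms.

```python
def exact_counter_bits(m: int) -> int:
    """Bits needed to store any integer in 0..m exactly (about log2 m)."""
    return max(1, m.bit_length())


def exponent_counter_bits(m: int) -> int:
    """Bits needed to store only an exponent k with 2**k >= m (about log2 log2 m)."""
    max_exponent = exact_counter_bits(m)  # largest k we would ever need to store
    return max(1, max_exponent.bit_length())


M = 10**9
print(exact_counter_bits(M))     # 30
print(exponent_counter_bits(M))  # 5
```

For a billion events, an exact counter needs 30 bits, while storing only the exponent needs 5.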
Counting a large number of events using a small amount of memory, by incorporating some probability. Introduced by Robert Morris; analyzed in 1982 by Philippe Flajolet.
◦ Gathering statistics on a large number of events
◦ Streaming data frequency
◦ Data compression
◦ Etc.
Because we give up accuracy, we approximate the count by a power of two, $2^k$, and keep only the exponent $k$. That is, if the approximate count is $M = 2^k$, we store only $k$ in binary form, which takes about $\log_2 \log_2 M$ bits. How do we know when to increase $k$?
Generate c pseudo-random bits
◦ c = current value of the counter
If all the bits are 1, increment the counter
◦ What is the probability?
◦ How to check it efficiently?
Simply add the result of the check (1 if all bits are 1, otherwise 0) to the counter.
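The increment rule above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name `MorrisCounter`, the `start` parameter, and the choice of 0 as the initial value are assumptions made for the example.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only an exponent c."""

    def __init__(self, rng=None, start=0):
        self.c = start  # assumed initial value; the slides do not state it
        self.rng = rng or random.Random()

    def increment(self):
        # Draw c pseudo-random bits (none when c == 0).
        bits = self.rng.getrandbits(self.c) if self.c > 0 else 0
        # All c bits are 1 exactly when bits equals 2**c - 1.
        all_ones = (1 << self.c) - 1
        # Add the result of the check (1 or 0) to the counter.
        self.c += int(bits == all_ones)

    def estimate(self):
        # The stored exponent stands for a count of roughly 2**c.
        return 2 ** self.c


counter = MorrisCounter(random.Random(42))
for _ in range(1000):
    counter.increment()
print(counter.c, counter.estimate())
```

Each of the c bits is 1 with probability 1/2, so the check succeeds with probability $2^{-c}$. Comparing against the all-ones mask is one way to test all the bits in a single step.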
What is the probability of an increment?
◦ $2^{-C}$
After $n$ increments (probabilistic explanation in the article):
◦ $E(2^C) = n + 2$
◦ $\mathrm{Var}(2^C) = n(n+1)/2$
◦ Small chance of being "far off".
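The expectation follows from a one-step argument, sketched here. The notation $C_n$ for the counter after $n$ increments is introduced for this derivation, and the initial value $C_0 = 1$ is an assumption: it is the starting value under which the formulas above hold.

```latex
\begin{align*}
E\!\left[2^{C_{n+1}} \mid C_n = c\right]
  &= 2^{-c}\cdot 2^{c+1} + \left(1 - 2^{-c}\right)\cdot 2^{c} \\
  &= 2 + 2^{c} - 1 \;=\; 2^{c} + 1, \\
E\!\left[2^{C_n}\right]
  &= 2^{C_0} + n \;=\; n + 2 \quad \text{when } C_0 = 1 .
\end{align*}
```

Each call therefore adds exactly 1 to $2^C$ on average, which is why $2^C$ tracks the true count up to a constant offset.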
Suppose increment was called 1024 times.
◦ The counter value should be 10, since $2^{10} = 1024$.
◦ The chance of being more than 1 off is ~8%.
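A small Monte Carlo experiment offers one way to check a figure like this. The sketch below is hedged: the function names, the number of trials, the seed, and the initial counter value of 0 are all assumptions. The fraction it prints depends on those choices, so it should only be expected to land in the neighbourhood of the quoted value.

```python
import random


def simulate_counter(n_increments, rng, start=0):
    """Run the probabilistic increment n_increments times; return the exponent."""
    c = start  # assumed initial value
    for _ in range(n_increments):
        # Increment with probability 2**-c.
        if rng.random() < 2.0 ** -c:
            c += 1
    return c


def fraction_far_off(trials=2000, n_increments=1024, target=10, seed=0):
    """Fraction of trials whose final exponent differs from target by more than 1."""
    rng = random.Random(seed)
    far = sum(
        abs(simulate_counter(n_increments, rng) - target) > 1
        for _ in range(trials)
    )
    return far / trials


print(fraction_far_off())
```

Raising `trials` reduces the sampling noise in the estimate, at the cost of a longer run.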