Ariel Rosenfeld.  Counter ranges from 0 to M requiers log 2 M bits.  For large data log 2 M is still a lot.  Using probability to reduce to log 2 log.


Ariel Rosenfeld

• A counter ranging from 0 to M requires about $\log_2 M$ bits.
• For large data, $\log_2 M$ bits is still a lot.
• Using probability, this can be reduced to $\log_2 \log_2 M$ bits.
  ◦ The cost is a small probability of error.

Approximate counting: counting a large number of events with a small amount of memory by incorporating probability into the counting. Proposed by Robert Morris; analyzed in 1982 by Philippe Flajolet.

Applications:
• Gathering statistics on a large number of events
• Frequency counts over streaming data
• Data compression
• Etc.

Because we give up exact accuracy, we approximate the count by a power of two, $2^k$, and keep only the exponent. If the approximate count is $M = 2^k$, we store only $k = \log_2 M$ in binary, which takes about $\log_2 \log_2 M$ bits. How do we know when to increase $k$?
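The sketch below makes the storage savings concrete. It assumes the exponent k is rounded up to the next integer. The function names are hypothetical and do not come from the slides.

```python
import math


def exact_counter_bits(m: int) -> int:
    """Bits needed to store any value in 0..m exactly (about log2 m)."""
    return max(1, m.bit_length())


def exponent_counter_bits(m: int) -> int:
    """Bits needed to store only the exponent k, with 2**k >= m (k rounded up)."""
    k = max(1, math.ceil(math.log2(m)))  # k is about log2(m)
    return k.bit_length()                # about log2(log2(m)) bits


for m in (10**3, 10**6, 10**9, 2**64):
    print(m, exact_counter_bits(m), exponent_counter_bits(m))
```

Take a count of about a billion as an example. Storing it exactly needs 30 bits, while storing only the exponent (k = 30) needs 5 bits.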

• Generate c pseudo-random bits
  ◦ c = current value of the counter
• Check whether all c bits are 1
  ◦ What is the probability?
  ◦ How can this be checked efficiently?
• Simply add the result (1 if all c bits are 1, otherwise 0) to the counter.
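Here is a minimal Python sketch of the increment rule above. Packing the c random bits into one integer answers the efficiency question: all c bits are 1 exactly when that integer equals $2^c - 1$, which happens with probability $2^{-c}$. Three details are assumptions of this sketch and do not appear on the slides: the class name `MorrisCounter`, the starting value c = 1 (the start value under which the formula $E(2^C) = n + 2$ holds), and the estimate $2^c - 2$.

```python
import random


class MorrisCounter:
    """Approximate counter that stores only c, roughly log2 of the count.

    Assumption: c starts at 1, so E(2**c) = n + 2 after n increments
    and 2**c - 2 is used as the estimate of n.
    """

    def __init__(self, rng=None):
        self.c = 1
        self.rng = rng if rng is not None else random.Random()

    def increment(self):
        # Draw c pseudo-random bits packed into a single integer.
        bits = self.rng.getrandbits(self.c)
        # All c bits are 1 exactly when bits == 2**c - 1 (probability 2**-c).
        all_ones = bits == (1 << self.c) - 1
        # "Simply add the result to the counter": 1 if all ones, else 0.
        self.c += int(all_ones)

    def estimate(self):
        return 2 ** self.c - 2


counter = MorrisCounter(random.Random(42))
for _ in range(1024):
    counter.increment()
print(counter.c, counter.estimate())
```

An alternative check draws the bits one at a time and stops at the first 0. That uses at most two random bits on average, whereas `getrandbits` requests all c bits at once.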

• What is the probability of an increment?
  ◦ $2^{-C}$, where C is the current counter value
• After n increments (the probabilistic analysis is in the article):
  ◦ $E(2^C) = n + 2$
  ◦ $\operatorname{Var}(2^C) = n(n + 1)/2$
  ◦ Small chance to be "far off".
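The slide defers the derivation to the article. Here is a short sketch of where the two formulas come from, assuming (not stated on the slide) that the counter starts at $C_0 = 1$. Each step increments C with probability $2^{-C_k}$:

```latex
\begin{aligned}
\mathbb{E}\!\left[2^{C_{k+1}} \mid C_k\right]
  &= \left(1 - 2^{-C_k}\right) 2^{C_k} + 2^{-C_k}\, 2^{C_k + 1} = 2^{C_k} + 1,\\
\mathbb{E}\!\left[2^{C_n}\right] &= 2^{C_0} + n = n + 2,\\
\mathbb{E}\!\left[4^{C_{k+1}} \mid C_k\right]
  &= \left(1 - 2^{-C_k}\right) 4^{C_k} + 2^{-C_k}\, 4^{C_k + 1} = 4^{C_k} + 3 \cdot 2^{C_k},\\
\mathbb{E}\!\left[4^{C_n}\right] &= 4 + 3\sum_{k=0}^{n-1} (k + 2) = 4 + 6n + \tfrac{3n(n-1)}{2},\\
\operatorname{Var}\!\left(2^{C_n}\right) &= \mathbb{E}\!\left[4^{C_n}\right] - (n + 2)^2 = \tfrac{n(n+1)}{2}.
\end{aligned}
```

It follows that $2^C - 2$ is an unbiased estimate of n. Starting from $C_0 = 0$ instead, the same steps give the form often quoted elsewhere: $E(2^C) = n + 1$ and $\operatorname{Var}(2^C) = n(n-1)/2$.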

• Example: increment was called 1024 times.
  ◦ The correct counter value should be 10 ($\log_2 1024 = 10$).
  ◦ The chance of being more than 1 off is about 8%.
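One way to check these numbers without random simulation is to propagate the exact probability distribution of C through all 1024 increments. The sketch below rests on four assumptions:
• C starts at 1, matching $E(2^C) = n + 2$.
• "More than 1 off" means C lies outside {9, 10, 11}.
• C is capped at 64 (a hypothetical parameter `max_c`). Reaching the cap after 1024 increments has negligible probability.
• Arithmetic is in floating point, so results are accurate only up to rounding error.

The function name is also hypothetical.

```python
def counter_distribution(n, c0=1, max_c=64):
    """Return a list whose entry c is P(C = c) after n increments.

    The counter is not allowed to grow past max_c (a truncation
    assumption; the lost probability is negligible for n = 1024).
    """
    dist = [0.0] * (max_c + 1)
    dist[c0] = 1.0
    for _ in range(n):
        new = [0.0] * (max_c + 1)
        for c, p in enumerate(dist):
            if p == 0.0:
                continue
            if c == max_c:
                new[c] += p  # stay at the cap
                continue
            up = 2.0 ** -c  # probability that this increment succeeds
            new[c + 1] += p * up
            new[c] += p * (1.0 - up)
        dist = new
    return dist


n = 1024
dist = counter_distribution(n)
mean_pow2 = sum(p * 2.0 ** c for c, p in enumerate(dist))
var_pow2 = sum(p * 4.0 ** c for c, p in enumerate(dist)) - mean_pow2 ** 2
p_far = sum(p for c, p in enumerate(dist) if abs(c - 10) > 1)
print(f"E(2^C)   = {mean_pow2:.3f}  (theory: {n + 2})")
print(f"Var(2^C) = {var_pow2:.1f}  (theory: {n * (n + 1) / 2})")
print(f"P(|C - 10| > 1) = {p_far:.3f}")
```

The first two printed lines should agree with the formulas on the previous slide. The last line can be compared against the slide's figure of about 8%.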