Chebyshev, Hoeffding, Chernoff
Histogram 1: $X = 2\,\mathrm{Bin}(300, 1/2) - 300$, $E[X] = 0$
Histogram 2: $Y = 2\,\mathrm{Bin}(30, 1/2) - 30$, $E[Y] = 0$
Histogram 3: $Z = 4\,\mathrm{Bin}(10, 1/4) - 10$, $E[Z] = 0$
Histogram 4: $W = 0$, $E[W] = 0$
A natural question: all four variables have mean 0, so the mean cannot tell them apart. Is there a parameter that distinguishes these distributions, i.e., a way to measure their spread?
Variance and Standard Deviation
The variance of X, denoted by Var(X), is the mean squared deviation of X from its expected value $m = E[X]$: $\mathrm{Var}(X) = E[(X - m)^2]$. The standard deviation of X, denoted by SD(X), is the square root of the variance: $\mathrm{SD}(X) = \sqrt{\mathrm{Var}(X)}$.
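A minimal Python sketch of the definition, applied to the four histogram variables. It represents each exact distribution as a dictionary {value: probability} (a representation chosen here for illustration; the helper names `binom_pmf` and `affine` are hypothetical) and computes Var and SD straight from $E[(X-m)^2]$:

```python
import math

def binom_pmf(n, p):
    """Return {k: P(Bin(n, p) = k)} for k = 0..n."""
    return {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def affine(pmf, a, b):
    """Distribution of a*B + b when B has distribution pmf."""
    return {a * k + b: q for k, q in pmf.items()}

def mean(pmf):
    return sum(x * q for x, q in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - m)^2] with m = E[X], straight from the definition."""
    m = mean(pmf)
    return sum((x - m) ** 2 * q for x, q in pmf.items())

def sd(pmf):
    return math.sqrt(variance(pmf))

# The four histogram variables
dists = {
    "X": affine(binom_pmf(300, 0.5), 2, -300),
    "Y": affine(binom_pmf(30, 0.5), 2, -30),
    "Z": affine(binom_pmf(10, 0.25), 4, -10),
    "W": {0: 1.0},
}
for name, pmf in dists.items():
    print(f"{name}: E = {mean(pmf):+.6f}  Var = {variance(pmf):.6f}  SD = {sd(pmf):.4f}")
```

Under these definitions Y and Z both have variance 30 (SD about 5.48) even though their histograms differ, while X has variance 300 and W has variance 0.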
Computational Formula for Variance
$$
\begin{aligned}
E[(X-m)^2] &= E[X^2 - 2mX + m^2] \\
&= E[X^2] - 2m\,E[X] + m^2 \\
&= E[X^2] - 2m^2 + m^2 \\
&= E[X^2] - E[X]^2
\end{aligned}
$$
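To check that the two formulas agree on a variable with nonzero mean, the sketch below uses exact rational arithmetic (Python's `fractions` module) on B ~ Bin(10, 1/4), the binomial underlying Z. The helper names are illustrative only:

```python
from fractions import Fraction
import math

def binom_pmf(n, p):
    """Exact pmf of Bin(n, p) when p is a Fraction."""
    return {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def expect(pmf, f):
    """E[f(X)] for a finite distribution given as {value: probability}."""
    return sum(f(x) * q for x, q in pmf.items())

b = binom_pmf(10, Fraction(1, 4))

m = expect(b, lambda x: x)
by_definition = expect(b, lambda x: (x - m) ** 2)   # E[(X - m)^2]
by_shortcut = expect(b, lambda x: x * x) - m**2     # E[X^2] - E[X]^2
print(m, by_definition, by_shortcut)                # prints: 5/2 15/8 15/8
```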
Properties of Variance and SD
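The sketch below checks, with exact fractions, three standard properties (assumed here to be among those meant under this heading): Var(aX + b) = a²Var(X), SD(aX + b) = |a|SD(X), and Var(B) = np(1 - p) for B ~ Bin(n, p), which follows from adding the variances p(1 - p) of n independent Bernoulli trials. The distribution Bin(10, 1/4) and the coefficients are chosen only for illustration:

```python
from fractions import Fraction
import math

def binom_pmf(n, p):
    return {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def variance(pmf):
    m = sum(x * q for x, q in pmf.items())
    return sum((x - m) ** 2 * q for x, q in pmf.items())

n, p = 10, Fraction(1, 4)
b = binom_pmf(n, p)

# Var(B) equals the sum of n independent Bernoulli(p) variances: n p (1 - p)
assert variance(b) == n * p * (1 - p)

# Var(aB + c) = a^2 Var(B): the shift c drops out, the scale a is squared
a, c = 4, -10
z = {a * k + c: q for k, q in b.items()}
assert variance(z) == a**2 * variance(b)

# SD(aB + c) = |a| SD(B), also for a negative scale
a_neg = -3
w = {a_neg * k + c: q for k, q in b.items()}
assert math.isclose(math.sqrt(variance(w)), abs(a_neg) * math.sqrt(variance(b)))

print(variance(b), variance(z), variance(w))
```

The second check is why Var(Z) = 16 · 15/8 = 30 for the third histogram.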
Markov’s Inequality
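Markov’s inequality is usually stated as follows: for a nonnegative random variable X and any a > 0, P(X ≥ a) ≤ E[X]/a. A minimal sketch comparing this bound with the exact tail of B ~ Bin(30, 1/2), an example chosen here for illustration:

```python
import math

def binom_pmf(n, p):
    return {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def tail(pmf, a):
    """Exact P(X >= a)."""
    return sum(q for x, q in pmf.items() if x >= a)

def markov_bound(mean, a):
    """Markov: P(X >= a) <= E[X] / a, valid for X >= 0 and a > 0."""
    return mean / a

b = binom_pmf(30, 0.5)                      # nonnegative, E[B] = 15
mean_b = sum(k * q for k, q in b.items())
for a in (15, 20, 25, 30):
    print(f"a={a:2d}  exact={tail(b, a):.3e}  Markov bound={markov_bound(mean_b, a):.3f}")
```

Because the bound uses only the mean, it is loose: at a = 30 it gives 0.5, while the exact probability is 2^(-30).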
Chebyshev’s Inequality
Theorem: Let X be a random variable on a sample space S with probability distribution P(X = x), and let $m = E[X]$. Then for any positive real number r:
$$P(|X - m| \ge r) \le \frac{\mathrm{Var}(X)}{r^2}$$
(proof in book)
In words: the probability that X takes a value at least r away from its mean is at most the variance divided by $r^2$.
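A sketch comparing the exact two-sided tail P(|Y - m| ≥ r) with Chebyshev’s bound Var(Y)/r² for Y = 2·Bin(30, 1/2) - 30 from the second histogram (E[Y] = 0, Var(Y) = 30). Exact fractions keep the comparison |x - m| ≥ r free of rounding; the function `chebyshev_check` is a hypothetical helper:

```python
from fractions import Fraction
import math

def binom_pmf(n, p):
    return {k: math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

def chebyshev_check(pmf, r):
    """Return (exact P(|X - m| >= r), Chebyshev bound Var(X) / r^2) as exact fractions."""
    m = sum(x * q for x, q in pmf.items())
    var = sum((x - m) ** 2 * q for x, q in pmf.items())
    exact = sum(q for x, q in pmf.items() if abs(x - m) >= r)
    return exact, var / r**2

# Y = 2*Bin(30, 1/2) - 30 from the second histogram
y = {2 * k - 30: q for k, q in binom_pmf(30, Fraction(1, 2)).items()}
for r in (6, 10, 15):
    exact, bound = chebyshev_check(y, r)
    print(f"r={r:2d}  exact={float(exact):.4f}  Chebyshev bound={float(bound):.4f}")
```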
Example:
Proof
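One standard derivation (the book’s proof may be arranged differently) applies Markov’s inequality, P(V ≥ a) ≤ E[V]/a for V ≥ 0 and a > 0, to the nonnegative variable V = (X - m)² with a = r²:

```latex
\begin{align*}
P\big(|X - m| \ge r\big)
  &= P\big((X - m)^2 \ge r^2\big)          && \text{squaring is increasing on } [0,\infty) \\
  &\le \frac{E\big[(X - m)^2\big]}{r^2}    && \text{Markov's inequality for } (X - m)^2 \ge 0 \\
  &= \frac{\mathrm{Var}(X)}{r^2}           && \text{definition of } \mathrm{Var}(X)
\end{align*}
```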
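Popular (equivalent) form: “k standard deviations”. With $\sigma = \mathrm{SD}(X)$ and $r = k\sigma$ for $k > 0$:
$$P(|X - m| \ge k\sigma) \le \frac{1}{k^2}$$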
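The bound 1/k² cannot be improved in general. A standard extremal example (an illustration added here, not from the slides) puts mass 1/(2k²) on each of -k and k and the rest on 0; it has mean 0 and SD 1, and P(|X - m| ≥ kσ) equals 1/k² exactly for every integer k ≥ 1. The helper names are hypothetical:

```python
from fractions import Fraction

def extremal_pmf(k):
    """Distribution on {-k, 0, k} with P(X = -k) = P(X = k) = 1/(2k^2); needs integer k >= 1."""
    edge = Fraction(1, 2 * k * k)
    return {-k: edge, 0: 1 - 2 * edge, k: edge}

def chebyshev_tightness(k):
    """Return (Var(X), P(|X - m| >= k*SD(X)), 1/k^2) for the extremal pmf."""
    pmf = extremal_pmf(k)
    m = sum(x * q for x, q in pmf.items())               # mean, 0 by symmetry
    var = sum((x - m) ** 2 * q for x, q in pmf.items())  # exactly 1 here
    sd = 1                                               # so SD(X) = sqrt(1) = 1
    tail = sum(q for x, q in pmf.items() if abs(x - m) >= k * sd)
    return var, tail, Fraction(1, k * k)

for k in (1, 2, 3, 5):
    print(k, *chebyshev_tightness(k))
```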
Example
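Hoeffding (1963)
Let $X_1, \dots, X_n$ be independent random variables such that $a_i \le X_i \le b_i$ for $i = 1, \dots, n$, and let $S_n = X_1 + \dots + X_n$. Then for every $t > 0$:
$$P(S_n - E[S_n] \ge t) \le \exp\left(-\frac{2t^2}{\sum_{i=1}^{n} (b_i - a_i)^2}\right)$$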
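X = 2·Bin(300, 1/2) - 300 from the first histogram is a sum of 300 independent ±1 steps, so Hoeffding’s inequality in its standard one-sided form, P(S_n - E[S_n] ≥ t) ≤ exp(-2t² / Σ(b_i - a_i)²), applies with a_i = -1 and b_i = 1 and gives exp(-t²/600). The sketch compares this with the exact tail and with Chebyshev’s bound 300/t²; the helper names are hypothetical:

```python
import math

def hoeffding_bound(t, ranges):
    """One-sided Hoeffding bound exp(-2 t^2 / sum (b_i - a_i)^2) on P(S - E[S] >= t)."""
    return math.exp(-2 * t**2 / sum((b - a) ** 2 for a, b in ranges))

n = 300
ranges = [(-1, 1)] * n  # X is a sum of n independent +-1 steps: a_i = -1, b_i = 1

def exact_tail(t):
    """Exact P(X >= t) for X = 2*Bin(n, 1/2) - n."""
    k_min = math.ceil((t + n) / 2)  # X >= t  <=>  Bin >= (t + n) / 2
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2**n

for t in (30, 45, 60):
    cheb = n / t**2  # Chebyshev bounds P(|X| >= t), hence also P(X >= t)
    print(f"t={t}  exact={exact_tail(t):.2e}  "
          f"Hoeffding={hoeffding_bound(t, ranges):.2e}  Chebyshev={cheb:.2e}")
```

As t grows the gap widens: Hoeffding’s bound decays like exp(-t²/600), Chebyshev’s only like 1/t².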
Chernoff
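A common form of the Chernoff bound (assumed here; the lecture may state a different variant, such as the multiplicative bound for sums of Bernoulli variables) is P(X ≥ a) ≤ min over t > 0 of e^(-ta) E[e^(tX)]. For X = 2·Bin(300, 1/2) - 300, E[e^(tX)] = cosh(t)^300, and for 0 < a < 300 the minimum is attained at t = atanh(a/300). The sketch compares this with Hoeffding’s bound exp(-a²/600) and the exact tail:

```python
import math

n = 300  # X = 2*Bin(n, 1/2) - n, a sum of n independent +-1 steps

def chernoff_bound(a):
    """min over t > 0 of e^{-t a} E[e^{t X}] with E[e^{t X}] = cosh(t)^n; needs 0 < a < n."""
    t = math.atanh(a / n)  # solves n * tanh(t) = a, the minimiser
    return math.exp(n * math.log(math.cosh(t)) - t * a)

def hoeffding_bound(a):
    """exp(-2 a^2 / sum (b_i - a_i)^2) with b_i - a_i = 2, i.e. exp(-a^2 / (2n))."""
    return math.exp(-a**2 / (2 * n))

def exact_tail(a):
    k_min = math.ceil((a + n) / 2)
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2**n

for a in (30, 60, 120):
    print(f"a={a:3d}  exact={exact_tail(a):.2e}  "
          f"Chernoff={chernoff_bound(a):.2e}  Hoeffding={hoeffding_bound(a):.2e}")
```

Because cosh t ≤ e^(t²/2), this Chernoff bound is never larger than the Hoeffding bound for this X; Hoeffding’s inequality can be read as a Chernoff bound in which the moment generating function is replaced by that simpler upper estimate.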
Example