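Descriptive statistics
Experiment → Data → Sample → Statistics
- Sample mean
- Sample variance
- Normalize sample variance by N-1
- Standard deviation of a sum of N independent observations goes as the square root of N (so that of the sample mean goes as 1 over the square root of N)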
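As a minimal sketch of the sample statistics listed on this slide, the following Python uses only the standard library. The data values and variable names are made up for illustration and are not from the lecture.

```python
import math
import statistics

# Hypothetical measurements from an experiment (illustrative values only).
data = [2.1, 2.5, 1.9, 2.3, 2.2]
n = len(data)

# Sample mean: sum of the observations divided by N.
mean = sum(data) / n

# Sample variance normalized by N-1 rather than N.
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
std_dev = math.sqrt(variance)

# The standard library's statistics.variance uses the same N-1 normalization.
assert math.isclose(variance, statistics.variance(data))

print(mean, variance, std_dev)
```

Dividing by N-1 instead of N compensates for the fact that the deviations are measured from the sample mean rather than the true mean.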
Inferential statistics
Model → Estimates of parameters → Inferences → Predictions
Importance of the Gaussian
Why is the Gaussian important?
- Sums of independent observations converge to a Gaussian (Central Limit Theorem)
- A linear combination of jointly Gaussian variables is also Gaussian
- Has maximum entropy for a given variance
- Least-squares becomes maximum likelihood
- Derived variables have known densities
- Sample means and variances of independent samples are independent
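The Central Limit Theorem point can be checked numerically. The sketch below sums independent Uniform(0, 1) draws and compares the result with Gaussian expectations; the number of terms per sum, the number of trials, and the seed are arbitrary choices made for this illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sum_of_uniforms(k):
    """Sum of k independent Uniform(0, 1) draws."""
    return sum(random.random() for _ in range(k))

k = 12            # terms per sum (arbitrary choice)
trials = 20000    # number of sums to draw
sums = [sum_of_uniforms(k) for _ in range(trials)]

# Theory: a sum of k Uniform(0, 1) draws has mean k/2 and variance k/12.
print("mean:", statistics.fmean(sums), "expected:", k / 2)
print("variance:", statistics.variance(sums), "expected:", k / 12)

# For a Gaussian, about 68.3% of the mass lies within one standard deviation.
sigma = (k / 12) ** 0.5
within = sum(abs(s - k / 2) < sigma for s in sums) / trials
print("fraction within 1 sigma:", within)
```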
Derived distributions
- Sample mean is Gaussian
- Sample variance is $\chi^2$ distributed
- Sample mean with unknown variance is Student-t distributed
- This allows us to get confidence intervals for mean and variance
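The logic of confidence intervals
The mean with unknown variance is distributed as Student-t; that is, if the samples $x_i$, $i = 1, \dots, N$, are normally distributed with mean $\mu$, where $\bar{x}$ is the sample mean and $s^2$ is the sample variance, then $(\bar{x} - \mu)/(s/\sqrt{N})$ is distributed as Student-t with $N-1$ degrees of freedom.
Pick $q_1$ and $q_2$ from “tables” so that prob{ $q_1 < (\bar{x} - \mu)/(s/\sqrt{N}) < q_2$ } = 0.99.
Then $\bar{x} - q_2 s/\sqrt{N} < \mu < \bar{x} - q_1 s/\sqrt{N}$, which gives us confidence intervals on where the actual mean can be.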
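A worked instance of this logic might look like the sketch below. The ten sample values are invented for illustration, and the Student-t point 3.250 (two-sided 99%, 9 degrees of freedom) is copied from a standard t-table rather than computed, in the spirit of picking the q values from tables.

```python
import math
import statistics

# Hypothetical sample of N = 10 measurements (illustrative values only).
samples = [9.8, 10.2, 10.1, 9.9, 10.4, 9.7, 10.0, 10.3, 9.6, 10.0]
n = len(samples)

x_bar = statistics.fmean(samples)   # sample mean
s = statistics.stdev(samples)       # sample standard deviation (N-1 normalization)

# Two-sided 99% point of Student-t with N-1 = 9 degrees of freedom,
# taken from a standard t-table. By symmetry q1 = -q2.
q2 = 3.250
q1 = -q2

# Interval: x_bar - q2*s/sqrt(N) < mu < x_bar - q1*s/sqrt(N)
half_width = q2 * s / math.sqrt(n)
lower = x_bar - half_width
upper = x_bar + half_width
print(f"99% confidence interval for the mean: ({lower:.3f}, {upper:.3f})")
```

For a different sample size or confidence level, only the tabulated value of q2 changes.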
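Simulating random arrivals
- Method 1: take a small time step $\Delta t$ and flip a coin with event probability $\lambda \Delta t$, where $\lambda$ is the arrival rate
- Method 2: generate an exponentially distributed random variable to determine the next arrival time (use a transformation of a uniform random variable)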
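Both methods can be sketched in a few lines of Python. The function names, the arrival rate of 2 per unit time, the time horizon, and the step size are assumptions chosen for this illustration; the uniform-to-exponential transformation used in Method 2 is the usual inverse-CDF formula, -ln(U)/rate.

```python
import math
import random

random.seed(1)  # fixed seed so the run is repeatable

def arrivals_by_coin_flips(rate, t_end, dt):
    """Method 1: in each small step dt, an arrival occurs with probability rate*dt."""
    times = []
    steps = int(t_end / dt)
    for i in range(steps):
        if random.random() < rate * dt:
            times.append(i * dt)
    return times

def arrivals_by_exponential_gaps(rate, t_end):
    """Method 2: draw exponential inter-arrival times by transforming a uniform."""
    times = []
    t = 0.0
    while True:
        u = 1.0 - random.random()      # uniform on (0, 1], avoids log(0)
        t += -math.log(u) / rate       # inverse-CDF transform gives Exponential(rate)
        if t > t_end:
            return times
        times.append(t)

rate, t_end = 2.0, 1000.0              # assumed values: 2 arrivals per unit time
m1 = arrivals_by_coin_flips(rate, t_end, dt=0.001)
m2 = arrivals_by_exponential_gaps(rate, t_end)
print(len(m1), len(m2))                # both should be near rate * t_end = 2000
```

Method 1 is only accurate when rate*dt is much smaller than 1, so its cost grows as the step shrinks; Method 2 does one draw per arrival regardless of the time resolution.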
Binomial distribution (Bernoulli trials)
Suppose we flip a fair coin n times. The mean number of heads is n/2, and the standard deviation is $\sqrt{n}/2$. For large n (more than about 30), the distribution, called binomial, approaches normal. Specifically, if x is the number of heads, the normalized variable $z = (x - n/2)/(\sqrt{n}/2)$ is approximately distributed as N(0,1), the normal distribution with mean 0 and variance 1.
This enables us to estimate the probability of events using Bernoulli trials very easily. Example: We flip a coin 100 times and observe 60 heads. What is the probability of that event?
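The example can be worked through numerically as follows. The slide does not say whether "that event" means exactly 60 heads or 60 or more, so this sketch computes both; treating the question as a one-sided tail is an assumption. For 100 fair flips the mean is 50 and the standard deviation is 5, so 60 heads corresponds to z = 2.

```python
import math
from statistics import NormalDist

n, heads = 100, 60          # values from the example
p = 0.5                     # fair coin (the Bernoulli hypothesis)

mean = n * p                        # n/2
sd = math.sqrt(n * p * (1 - p))     # sqrt(n)/2 for a fair coin
z = (heads - mean) / sd             # normalized variable

# Normal approximation to the probability of 60 or more heads.
tail_normal = 1.0 - NormalDist().cdf(z)

# Exact binomial probabilities for comparison.
def binom_pmf(k):
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

exactly_60 = binom_pmf(heads)
tail_exact = sum(binom_pmf(k) for k in range(heads, n + 1))

print(z, tail_normal, exactly_60, tail_exact)
```

The normal tail comes out at roughly 2.3% and the exact binomial tail at roughly 2.8%; the gap is mostly due to approximating a discrete count with a continuous curve.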
Martin Gardner, How not to test a Psychic (Prometheus, 1989), p. 31: report of a claim that a psychic subject made 781 hits out of [...]. That corresponds to z = 17.8 [ z = ... 9.5 E-21 ]. Notice that what we get here is prob{event|hypothesis}, where the hypothesis is that the trials are Bernoulli. What we don’t get is prob{hypothesis|event}.