Probability Distributions and Frequentist Statistics “A single death is a tragedy, a million deaths is a statistic” Joseph Stalin
Can we answer that? N balls in total: M red and N−M blue. 1st draw: $P(R_1|I) = M/N$. 2nd draw: $P(R_2|I) = \,?$
The Red and the Blue
With M red and N−M blue balls (N balls in total), red on the second draw can follow either outcome of the first draw:
$$R_2 = (R_1 + B_1),\, R_2 = R_1, R_2 + B_1, R_2$$
Using the product rule:
$$P(R_2|I) = P(R_1, R_2|I) + P(B_1, R_2|I) = P(R_1|I)\,P(R_2|R_1, I) + P(B_1|I)\,P(R_2|B_1, I)$$
$$= \frac{M}{N}\,\frac{M-1}{N-1} + \frac{N-M}{N}\,\frac{M}{N-1} = \frac{M}{N} = P(R_1|I) = P(R_3|I) \text{ etc.}$$
The outcome of the first draw is a "nuisance" parameter. Marginalize = integrate over all options.
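The result above can be checked with exact arithmetic. This is a minimal sketch, assuming draws without replacement and N of at least 2; the function name `p_red_second` is only illustrative.

```python
from fractions import Fraction

def p_red_second(M: int, N: int) -> Fraction:
    """P(R2|I) without replacement, marginalizing over the first draw."""
    p_r1 = Fraction(M, N)                    # P(R1|I)
    p_b1 = Fraction(N - M, N)                # P(B1|I)
    p_r2_given_r1 = Fraction(M - 1, N - 1)   # one red ball already removed
    p_r2_given_b1 = Fraction(M, N - 1)       # one blue ball already removed
    # Product rule on each branch, then sum over the nuisance outcome.
    return p_r1 * p_r2_given_r1 + p_b1 * p_r2_given_b1

print(p_red_second(3, 10))  # 3/10
```

For every M and N the two branches sum back to M/N, which is why the unknown first draw does not change the probability of red on the second.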
Marginalization
                 RAIN    NO RAIN   Chance of Cloud
CLOUDS           1/6     1/3       1/2
NO CLOUDS        0       1/2       1/2
Chance of Rain   1/6     5/6
Marginalization
Where the $A_i$ represent a set of mutually exclusive and exhaustive possibilities, marginalization, or integrating out of "nuisance parameters", takes the form:
$$P(\theta|D,I) = \sum_i P(\theta, A_i|D,I)$$
In the limit of a continuously variable parameter A (rather than the discrete case above), P becomes a probability density function and the sum becomes an integral:
$$P(\theta|D,I) = \int dA\, P(\theta, A|D,I)$$
This technique is often required in inference. For example, we may be interested in the frequency of a sinusoidal signal in noisy data but not in its amplitude (a nuisance parameter).
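The discrete sum can be sketched on the rain and clouds example. The joint values below are an assumed reading of that slide's table (clouds and rain 1/6, clouds and no rain 1/3, no clouds and rain 0, no clouds and no rain 1/2), and the helper name `marginal` is hypothetical.

```python
from fractions import Fraction

# Joint probabilities P(cloud state, rain state); assumed reading of the table.
joint = {
    ("clouds", "rain"): Fraction(1, 6),
    ("clouds", "no rain"): Fraction(1, 3),
    ("no clouds", "rain"): Fraction(0),
    ("no clouds", "no rain"): Fraction(1, 2),
}

def marginal(joint, axis):
    """Sum the joint over the other variable (axis 0 = clouds, axis 1 = rain)."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], Fraction(0)) + p
    return out

p_rain = marginal(joint, axis=1)    # cloud state is the nuisance parameter here
p_cloud = marginal(joint, axis=0)   # rain state is the nuisance parameter here
print(p_rain, p_cloud)
```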
Probability Distributions
We denote probability distributions over all possible values of a variable x by p(x).
Discrete: $p(x_i) = P(X = x_i)$
Continuous: $f(x) = \lim_{\delta x \to 0} \frac{P(x < X < x + \delta x)}{\delta x}$
Cumulative: $F(x) = P(X \le x)$
Properties of Probability Distributions
The expectation value of a function g(X) is the weighted average:
$$\langle g(X) \rangle = \sum_{\text{all } x} g(x)\,p(x) \ \text{(discrete)}, \qquad \langle g(X) \rangle = \int g(x)\,f(x)\,dx \ \text{(continuous)}$$
For g(X) = X, if it exists, this is the first moment, or mean, of the distribution.
The r-th moment of a random variable X about the origin (x = 0) is:
$$\mu'_r = \langle X^r \rangle = \sum_{\text{all } x} x^r\,p(x) \ \text{(discrete)}, \qquad \mu'_r = \int x^r\,f(x)\,dx \ \text{(continuous)}$$
The mean $\mu = \mu'_1 = \langle X \rangle$ is the 1st moment about the origin.
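Properties of Probability Distributions
The r-th central moment of a random variable X about the mean (origin = μ) is:
$$\mu_r = \langle (X-\mu)^r \rangle = \sum_{\text{all } x} (x-\mu)^r\,p(x) \ \text{(discrete)}, \qquad \mu_r = \int (x-\mu)^r\,f(x)\,dx \ \text{(continuous)}$$
First central moment: $\mu_1 = \langle X - \mu \rangle = 0$
Second central moment:
$$\mathrm{Var}(X) = \sigma_x^2 = \langle (X-\mu)^2 \rangle = \langle X^2 - 2\mu X + \mu^2 \rangle = \langle X^2 \rangle - 2\mu\langle X \rangle + \mu^2 = \langle X^2 \rangle - 2\mu^2 + \mu^2 = \langle X^2 \rangle - \mu^2$$
Therefore the variance is $\sigma_x^2 = \langle X^2 \rangle - \langle X \rangle^2$.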
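The moment definitions and the variance identity can be illustrated on a small discrete distribution. This sketch uses a fair six-sided die, which is an example chosen here and not taken from the slides; `moment` is a hypothetical helper.

```python
from fractions import Fraction

def moment(pmf, r, about=0):
    """r-th moment of a discrete distribution about the given origin."""
    return sum((x - about) ** r * p for x, p in pmf.items())

# Fair six-sided die: p(x) = 1/6 for x = 1..6 (illustrative choice).
die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = moment(die, 1)                       # first moment about the origin
first_central = moment(die, 1, about=mean)  # always zero
var_central = moment(die, 2, about=mean)    # <(X - mu)^2>
var_shortcut = moment(die, 2) - mean ** 2   # <X^2> - <X>^2

print(mean, first_central, var_central, var_shortcut)
```

Both routes to the variance agree exactly because the arithmetic is done with fractions.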
Properties of Probability Distributions
Third central moment: $\mu_3 = \langle (X-\mu)^3 \rangle$ (skewness)
Fourth central moment: $\mu_4 = \langle (X-\mu)^4 \rangle$ (kurtosis)
The median and the mode both provide estimates of central tendency for a distribution, and are in many cases more robust against outliers than the mean.
Example: Mean and Median Filtering
[Figure: an image degraded by salt noise, shown after a mean filter and after a median filter.]
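The difference between the two filters can be sketched in one dimension. The slide's example is a 2-D image; the 1-D signal, the window width of 3, and the helper `sliding_filter` below are simplifying assumptions made for illustration.

```python
from statistics import mean, median

def sliding_filter(signal, width, reducer):
    """Apply reducer to each centered window; edges use a truncated window."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - half): i + half + 1]
        out.append(reducer(window))
    return out

signal = [10] * 11
signal[5] = 255   # a single bright "salt" sample in a flat background

mean_out = sliding_filter(signal, 3, mean)      # outlier is smeared over neighbours
median_out = sliding_filter(signal, 3, median)  # outlier is rejected entirely

print(mean_out)
print(median_out)
```

The mean filter spreads the outlier into every window that contains it, while the median filter ignores it as long as outliers make up less than half of each window.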
The Uniform Distribution
A flat distribution whose constant height is normalized so that the area under the curve equals 1.
[Figure: uniform PDF and cumulative uniform distribution.]
Commonly used as an ignorance prior to express impartiality (a lack of bias) about the value of a quantity over the given interval. Round-off and quantization errors are commonly modeled as uniformly distributed.
The Binomial Distribution
Binomial statistics apply when there are exactly two mutually exclusive outcomes of a trial (labelled "success" and "failure"). The binomial distribution gives the probability of observing k successes in n trials, with the probability of success on a single trial denoted by p (p is assumed fixed for all trials).
[Figure: binomial distributions for fixed n, varying p, and for fixed p, varying n.]
It is among the most useful discrete distribution functions in statistics. The multinomial distribution generalizes it to trials with more than two possible outcomes.
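As a minimal sketch, the standard binomial probability mass function, C(n, k) p^k (1−p)^(n−k), can be evaluated directly. The values n = 10 and p = 0.3 are arbitrary illustrative choices, and `math.comb` assumes Python 3.8 or later.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3   # illustrative values, not from the slides
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                               # normalization check
mean = sum(k * q for k, q in enumerate(pmf))   # should be close to n * p

print(total, mean)
```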
The Negative Binomial Distribution
Closely related to the binomial distribution, the negative binomial distribution applies under the same circumstances, but the variable of interest is the number of trials n needed to obtain k successes (and hence n−k failures), rather than the number of successes in n trials. For Bernoulli trials each with success probability p, it gives the probability of observing k successes and n−k failures, with a success on the last trial:
$$P(n|k,p) = \binom{n-1}{k-1}\,p^k\,(1-p)^{n-k}, \qquad n = k, k+1, \dots$$
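A short sketch of this distribution follows, assuming the convention that n counts the total number of trials needed to reach the k-th success (other references count failures instead). The function name and the values k = 3, p = 0.5 are illustrative only.

```python
from math import comb

def neg_binomial_pmf(n: int, k: int, p: float) -> float:
    """P(the k-th success occurs on trial n), for k >= 1 (assumed convention)."""
    if n < k:
        return 0.0
    # k-1 successes somewhere in the first n-1 trials, then a success on trial n.
    return comb(n - 1, k - 1) * p ** k * (1 - p) ** (n - k)

k, p = 3, 0.5   # illustrative values
trials = range(k, 200)   # the tail beyond 200 trials is negligible here

total = sum(neg_binomial_pmf(n, k, p) for n in trials)
mean_trials = sum(n * neg_binomial_pmf(n, k, p) for n in trials)

print(total, mean_trials)   # mean number of trials should be close to k / p
```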
The Poisson Distribution
Another crucial discrete distribution function, the Poisson distribution expresses the probability of a number of events k (e.g. failures, arrivals, occurrences) occurring in a fixed period of time (or fixed area of space), provided these events occur with a known mean rate λ (events/time) and independently of the time since the previous event. The Poisson distribution is the limiting case of a binomial distribution in which the probability of success p goes to zero while the number of trials n grows such that λ = np remains finite. Examples: photons received from a star in an interval; meteorite impacts over an area; pedestrians crossing at an intersection.
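The limiting behaviour can be sketched numerically by holding λ = np fixed while n grows. The code uses the standard Poisson mass function, e^(−λ) λ^k / k!, with λ taken as the expected count in the interval; λ = 2 and the values of n are arbitrary illustrative choices.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

lam = 2.0   # expected number of events (illustrative)

def max_abs_diff(n):
    """Largest gap between Binomial(n, lam/n) and Poisson(lam) for k = 0..10."""
    p = lam / n
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(11))

diffs = [max_abs_diff(n) for n in (10, 100, 1000)]
print(diffs)   # the gap shrinks as n grows with n * p held fixed
```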
The Normal (Gaussian) Distribution
The Normal or Gaussian distribution is probably the best-known statistical distribution. Given mean μ and standard deviation σ it has the PDF:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
A Gaussian with mean zero and standard deviation one is known as the Standard Normal Distribution. It is a continuous distribution and the limiting case of a binomial when the number of trials (and successes) becomes very large. Its pivotal role in statistics is partly due to the Central Limit Theorem (see later).
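The binomial limit can be illustrated by comparing a binomial with many trials against a Gaussian of matching mean np and variance np(1−p). This is a sketch only; n = 400 and p = 0.5 are assumed values chosen for illustration.

```python
from math import comb, exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * sqrt(2.0 * pi))

def binomial_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 400, 0.5                # illustrative values
mu = n * p                     # binomial mean
sigma = sqrt(n * p * (1 - p))  # binomial standard deviation

# Largest pointwise gap between the binomial and the matching Gaussian.
max_diff = max(abs(binomial_pmf(k, n, p) - normal_pdf(k, mu, sigma))
               for k in range(n + 1))
print(max_diff)
```

The agreement is best near the mean and for p close to 0.5; strongly skewed binomials need larger n before the Gaussian is a good approximation.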
Examples: Gaussian Distributions
[Figure: human IQ distribution.]
The Power Law Distribution
Power law distributions are ubiquitous in science, occurring in diverse phenomena including city sizes, incomes, word frequencies, and earthquake magnitudes. A power law implies that small occurrences are extremely common, whereas large instances are extremely rare. This "law" takes a number of forms (it can be referred to as Zipf's law and sometimes as the Pareto distribution). A simple illustrative power law is $p(x) \propto x^{-k}$.
[Figure: power law PDF on a linear scale and on a log-log scale, for k = 0.5, 1.0, 2.0.]
Example Power Laws from Nature
Physics Example: Cosmic Ray Spectrum
The Exponential Distribution The exponential distribution is a continuous probability distribution with an exponential falloff controlled by the rate parameter λ: larger values of λ entail a more rapid falloff in the distribution. The exponential distribution is used to model times between independent events which happen at a constant average rate (e.g. lifetimes, waiting times).
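The link between exponential waiting times and event counts can be sketched by simulation: if the gaps between events are exponential with rate λ, the number of events in a window of length T should have mean and variance close to λT, as for a Poisson count. The rate, window length, run count, and seed below are arbitrary illustrative choices.

```python
import random

def count_events(rate, duration, rng):
    """Count events in [0, duration) when gaps are exponential with this rate."""
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate)   # waiting time until the next event
        if t >= duration:
            return count
        count += 1

rng = random.Random(0)               # fixed seed so the run is repeatable
rate, duration, runs = 2.0, 5.0, 20000

counts = [count_events(rate, duration, rng) for _ in range(runs)]
mean_count = sum(counts) / runs
var_count = sum((c - mean_count) ** 2 for c in counts) / runs

print(mean_count, var_count)         # both should be near rate * duration
```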
The Gamma Distribution
The gamma distribution is a continuous PDF characterized by two parameters, usually designated the shape parameter k and the scale parameter θ. When k = 1 it coincides with the exponential distribution, and it is also closely related to the Poisson and chi-squared distributions.
Gamma PDF:
$$f(x; k, \theta) = \frac{x^{k-1}\,e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \qquad x > 0$$
where the Gamma function is defined as:
$$\Gamma(k) = \int_0^\infty t^{k-1}\,e^{-t}\,dt$$
The gamma distribution gives a flexible class of PDFs for nonnegative phenomena and is often used in modeling waiting times. It is the conjugate prior for the rate of a Poisson distribution.
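The statement that k = 1 recovers the exponential distribution can be checked numerically. This sketch assumes the shape and scale parameterization, with the scale θ equal to 1/λ, and the standard exponential density λ e^(−λx); the value λ = 1.5 is illustrative.

```python
from math import exp, gamma

def gamma_pdf(x, k, theta):
    """Gamma density with shape k and scale theta, for x > 0."""
    return x ** (k - 1) * exp(-x / theta) / (gamma(k) * theta ** k)

def exponential_pdf(x, lam):
    """Exponential density with rate lam, for x >= 0."""
    return lam * exp(-lam * x)

lam = 1.5   # illustrative rate
points = [0.1, 1.0, 3.0]
gaps = [abs(gamma_pdf(x, 1.0, 1.0 / lam) - exponential_pdf(x, lam)) for x in points]
print(gaps)   # all close to zero
```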
The Beta Distribution
The family of beta probability distributions is defined on the fixed interval [0,1] and parameterized by two positive shape parameters, α and β. In Bayesian statistics it is frequently encountered as a prior for the success probability of a binomial distribution.
Beta PDF:
$$f(x; \alpha, \beta) = \frac{x^{\alpha-1}\,(1-x)^{\beta-1}}{B(\alpha, \beta)}$$
where the Beta function is defined as:
$$B(\alpha, \beta) = \int_0^1 t^{\alpha-1}\,(1-t)^{\beta-1}\,dt = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$$
The family of beta distributions allows for a wide variety of shapes over a fixed interval. If the likelihood function is a binomial, then a beta prior leads to another beta distribution for the posterior. The role of the Beta function can be thought of as a simple normalization that ensures the PDF integrates to 1.
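The conjugate update can be sketched as follows: a Beta(α, β) prior combined with s successes and f failures gives a Beta(α + s, β + f) posterior. The function names, the uniform Beta(1, 1) prior, and the data (7 successes, 3 failures) are illustrative assumptions.

```python
from math import gamma

def beta_pdf(x, a, b):
    """Beta density on (0, 1), normalized by B(a, b)."""
    norm = gamma(a) * gamma(b) / gamma(a + b)
    return x ** (a - 1) * (1 - x) ** (b - 1) / norm

def beta_binomial_update(a, b, successes, failures):
    """Posterior shape parameters after binomial data (conjugate update)."""
    return a + successes, b + failures

prior = (1.0, 1.0)   # Beta(1, 1) is flat on [0, 1]
posterior = beta_binomial_update(*prior, successes=7, failures=3)

# Posterior should be proportional to likelihood * prior at every x,
# so this ratio should be the same constant at each test point.
ratios = [beta_pdf(x, *posterior) / (x ** 7 * (1 - x) ** 3 * beta_pdf(x, *prior))
          for x in (0.2, 0.5, 0.8)]
print(posterior, ratios)
```

The constant ratio is the normalization that the Beta function supplies, so no explicit integration is needed to obtain the posterior.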
Central Limit Theorem: Experimental Demonstration
Central Limit Theorem: A Bayesian Demonstration
Define the propositions:
$X_1$: the first value lies between $x_1$ and $x_1 + dx_1$
$X_2$: the second value lies between $x_2$ and $x_2 + dx_2$
$Y$: the sum lies between $y$ and $y + dy$
$I$: Y is the sum of $X_1$ and $X_2$, with $P(x_1|I) = f_1(x_1)$ and $P(x_2|I) = f_2(x_2)$
Marginalizing over $X_1$ and $X_2$:
$$P(Y|I) = \int dX_1 \int dX_2\, P(Y, X_1, X_2|I) = \int dX_1 \int dX_2\, P(X_1|I)\,P(X_2|I)\,P(Y|X_1, X_2, I)$$
using the product rule and the independence of $X_1$ and $X_2$. Because $y = x_1 + x_2$,
$$P(Y|X_1, X_2, I) = \delta(y - x_1 - x_2)$$
Therefore
$$P(Y|I) = \int dX_1\, f_1(x_1) \int dX_2\, f_2(x_2)\,\delta(y - x_1 - x_2) = \int dX_1\, f_1(x_1)\,f_2(y - x_1)$$
which is a convolution integral.
Central Limit Theorem: Convolution Demonstration
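A convolution demonstration of this kind can be sketched numerically: repeatedly convolving a sampled uniform PDF with itself gives the PDF of a sum of independent uniform variables, which can then be compared with a Gaussian of the same mean and variance. The grid spacing, the choice of uniform variables on [0, 1), and the use of four terms are assumptions made for illustration.

```python
from math import exp, pi, sqrt

def convolve(f, g, dx):
    """Discrete approximation to the convolution integral of two sampled PDFs."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj * dx
    return out

dx = 0.02
uniform = [1.0] * 50   # uniform PDF on [0, 1), one sample per cell of width dx

n_terms = 4
pdf = uniform
for _ in range(n_terms - 1):   # PDF of the sum of n_terms uniform variables
    pdf = convolve(pdf, uniform, dx)

# Each input sample sits at a cell midpoint, so output index i maps to this x.
xs = [(i + 0.5 * n_terms) * dx for i in range(len(pdf))]
area = sum(p * dx for p in pdf)
mean = sum(x * p * dx for x, p in zip(xs, pdf))
var = sum((x - mean) ** 2 * p * dx for x, p in zip(xs, pdf))

# Gaussian with the same mean and variance, for comparison.
gauss = [exp(-(x - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var) for x in xs]
max_diff = max(abs(p - g) for p, g in zip(pdf, gauss))
print(area, mean, var, max_diff)
```

With only four terms the summed PDF is already close to the Gaussian; increasing `n_terms` should tighten the match further.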