1 Bayesian Methods I: Parameter Estimation “A statistician is a person who draws a mathematically precise line from an unwarranted assumption to a foregone conclusion.” Anon

2 The Medical Test
You take a test for a rare debilitating disease, Frequentitis:
False positive rate for the test = 5%
False negative rate for the test = 1%
The incidence of Frequentitis in the population is 0.1%
Data D = you test positive. What are the odds that you have the disease (hypothesis H)?
Bayes' theorem: P(H|D,I) = P(H|I) P(D|H,I) / P(D|I)
The normalization factor P(D|I) ensures Σ_i P(H_i|D,I) = 1. By the sum rule:
P(D|I) = P(H|I) P(D|H,I) + P(H̄|I) P(D|H̄,I), where H̄ is the hypothesis that you do not have the disease.
P(H|D,I) = (0.1% × 99%) / (0.1% × 99% + 99.9% × 5%) = 0.019
The odds are 1.9% (not 95%) that you have the disease!
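A quick numerical check of this calculation (a minimal Python sketch; the variable names are illustrative, the numbers are the slide's):

```python
# Medical-test example: posterior probability of disease given a positive test
p_disease = 0.001             # incidence of Frequentitis: 0.1%
p_pos_given_disease = 0.99    # 1 - false-negative rate (1%)
p_pos_given_healthy = 0.05    # false-positive rate (5%)

# Sum rule: total probability of testing positive
p_pos = p_disease * p_pos_given_disease + (1 - p_disease) * p_pos_given_healthy

# Bayes' theorem: probability of disease given a positive test
p_disease_given_pos = p_disease * p_pos_given_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # ~0.019
```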

3 Two Basic Classes of Inference
1. Model Comparison: which of two or more competing models is the most probable given our present state of knowledge?
Competing models may have free parameters, and models may vary in complexity (some with more free parameters).
Generally, model comparison is not concerned with finding parameter values; free parameters are usually marginalized out in the analysis.
2. Parameter Estimation: given a certain model, what is the probability density function for each of its free parameters?
Suppose model M has free parameters f and A. We wish to find p(f|D,M,I) and p(A|D,M,I).
p(f|D,M,I) is known as the marginal posterior distribution for f.

4 Spectral Line Fitting
Gaussian line profile in noisy data. We are given the model M:
T f_i = T exp( -(ν_i - ν_0)² / (2σ_L²) ), where ν_0 = 37 and σ_L = 2 (channels).
The noise has been independently characterized as Gaussian with σ_n = 1 (in units of the signal).
Estimates of T from theory are uncertain over three orders of magnitude, from 0.1 to 100.
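A short sketch of this setup in Python (the 64-channel grid and the true line strength T = 3 are illustrative assumptions; only ν_0 = 37, σ_L = 2 and σ_n = 1 come from the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

nu = np.arange(1, 65)          # channel numbers (64 channels is an assumption)
nu0, sigma_L = 37.0, 2.0       # line centre and width from the slide
sigma_n = 1.0                  # Gaussian noise level from the slide

def line_profile(nu, nu0=nu0, sigma_L=sigma_L):
    """Unit-amplitude Gaussian line shape f_i."""
    return np.exp(-(nu - nu0) ** 2 / (2.0 * sigma_L ** 2))

T_true = 3.0                                                         # illustrative line strength
d = T_true * line_profile(nu) + rng.normal(0.0, sigma_n, nu.size)    # d_i = T f_i + e_i
```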

5 Parameter Estimation: Spectral Line Fit
Bayes' theorem (prior × likelihood / evidence):
P(T|D,M,I) = P(T|M,I) P(D|M,T,I) / P(D|M,I)
Calculating the likelihood P(D|M,T,I) for Gaussian noise: each datum is d_i = T f_i + e_i, so
P(D|M,T,I) = P(e_1, e_2, ..., e_N | M,T,I) = Π_i P(e_i|M,T,I)
           = Π_i [1/(σ_n √(2π))] exp( -(d_i - T f_i)² / (2σ_n²) )
           = σ_n^(-N) (2π)^(-N/2) exp( -Σ_i (d_i - T f_i)² / (2σ_n²) )
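Continuing the sketch above, the Gaussian log-likelihood as a function of T (an illustration of the formula, evaluated on the simulated data d from the previous block):

```python
def log_likelihood(T, d=d, nu=nu, sigma_n=sigma_n):
    """ln P(D | M, T, I) for independent Gaussian noise."""
    resid = d - T * line_profile(nu)
    return (-0.5 * np.sum(resid ** 2) / sigma_n ** 2
            - d.size * np.log(sigma_n * np.sqrt(2.0 * np.pi)))
```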

6 Parameter Estimation: Spectral Line Fit
What should we use for the prior P(T|M,I)?
For now, let us assume a uniform prior over T_min < T < T_max.
Then the posterior ∝ the likelihood, and its peak is the Maximum Likelihood estimator.
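A minimal grid evaluation of the posterior under the uniform prior, continuing the same sketch (the 1000-point grid is an assumption; the range 0.1 to 100 is from the problem):

```python
T_grid = np.linspace(0.1, 100.0, 1000)                  # prior range given in the problem
logL = np.array([log_likelihood(T) for T in T_grid])

# Uniform prior: posterior proportional to likelihood; normalize numerically on the grid
post_uniform = np.exp(logL - logL.max())
post_uniform /= np.trapz(post_uniform, T_grid)

T_mle = T_grid[np.argmax(post_uniform)]                  # maximum likelihood estimate of T
```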

7 The Choice of Prior
Our choice of prior can have a strong influence on the outcome of a Bayesian analysis.
In our example, we adopted a uniform prior for the unknown line strength over T_min < T < T_max, where T_min = 0.1 and T_max = 100 (given in the problem). Was this the right thing to do?
Implication: we don't know the scale, and a uniform prior heavily weights the upper decade of the range.
In such cases, consider the scale-invariant Jeffreys prior, which assigns equal probability per decade:
P(T|I) = 1 / [ T ln(T_max / T_min) ]
[Figure: the uniform and Jeffreys priors, shown as a PDF and as probability per log interval]
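A small sketch comparing the two priors on the range given in the problem (the decade boundaries used in the comparison are illustrative):

```python
import numpy as np

T_min, T_max = 0.1, 100.0

def uniform_prior(T):
    return np.full_like(T, 1.0 / (T_max - T_min))

def jeffreys_prior(T):
    return 1.0 / (T * np.log(T_max / T_min))

# Probability mass per decade under each prior: uniform piles ~90% into the
# top decade, Jeffreys gives each decade an equal share.
for lo, hi in [(0.1, 1.0), (1.0, 10.0), (10.0, 100.0)]:
    x = np.linspace(lo, hi, 1000)
    p_u = np.trapz(uniform_prior(x), x)
    p_j = np.trapz(jeffreys_prior(x), x)
    print(f"{lo:6.1f}-{hi:6.1f}: uniform {p_u:.3f}, Jeffreys {p_j:.3f}")
```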

8 Varying the Prior: Spectral Line Fit
[Figure: posterior PDF for the line strength T under the uniform and Jeffreys priors]

9 Increasing the Line Strength T
In the case of a stronger line detection, the data place a more powerful constraint on the parameters, so the choice of prior is less critical.
[Figure: posterior PDF for the line strength T under the uniform and Jeffreys priors, for the stronger-line case]

10 Ignorance Priors
How do we select a prior when (we think) we have no clue?
The principle of indifference states that p(A_i|B) = 1/N for N possible states.
Location Parameter: a location measured from some origin, p(X|I) = p(x → x+dx | I).
From a different (arbitrary) origin, p(X'|I) = p(x' → x'+dx' | I), where x' = x + c.
Indifference requires p(X|I) = p(X'|I), so that pdf(x) = pdf(x') = pdf(x + c).
The solution to this is pdf(x) = constant (a uniform prior).

11 Ignorance Priors
Scale Parameter: for example, the half-life of a new radioactive element.
Ignorance of a scale parameter implies the distribution should be invariant whether measured in units t or t' = βt.
Then p(T|I) dT = p(T'|I) dT', i.e. pdf(t) = β pdf(t') = β pdf(βt).
The solution to this is pdf(t) = constant / t (the Jeffreys prior).
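A quick numerical illustration of this scale invariance (the interval and rescaling factor below are arbitrary choices):

```python
import numpy as np

def mass(lo, hi):
    """Probability mass of the unnormalized 1/t prior on [lo, hi]."""
    t = np.linspace(lo, hi, 10000)
    return np.trapz(1.0 / t, t)

a, b, beta = 2.0, 5.0, 7.3                    # arbitrary interval and rescaling factor
print(mass(a, b), mass(beta * a, beta * b))   # both equal ln(b/a): the prior is scale invariant
```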

12 Improper Priors
Suppose we have absolutely no idea of the limits X_min and X_max (a recent physics example: the distance to a GRB).
A uniform prior with an infinite range cannot be normalized. Such priors are known as improper priors.
Improper priors can still be used for parameter estimation problems (like the previous problem), but not for model comparison (Lecture 4), where the normalization of the prior is required to obtain probabilities.

13 Nuisance Parameters
Frequently, we are only interested in a subset of the model parameters. The uninteresting parameters are called nuisance parameters.
Example: we may be interested in the frequency ω of a sinusoidal signal in a noisy dataset, but not in the phase φ or the amplitude a.
We obtain the marginal posterior for ω by marginalization (integration) over the nuisance parameters:
P(ω|D,I) = ∫∫ dφ da P(ω, φ, a|D,I)
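A minimal sketch of numerical marginalization (the toy joint posterior and the grids below are illustrative assumptions, not the slide's sinusoid example):

```python
import numpy as np

# Toy joint posterior over (omega, phi): a Gaussian blob, purely illustrative
omega = np.linspace(0.0, 10.0, 400)
phi = np.linspace(0.0, 2.0 * np.pi, 200)
W, P = np.meshgrid(omega, phi, indexing="ij")
log_post = -0.5 * ((W - 4.0) ** 2 / 0.3 ** 2 + (P - 1.0) ** 2 / 0.5 ** 2)

post = np.exp(log_post - log_post.max())
marginal_omega = np.trapz(post, phi, axis=1)         # integrate out the nuisance parameter phi
marginal_omega /= np.trapz(marginal_omega, omega)    # normalize the marginal posterior for omega
```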

14 Likely Examples of Nuisance Parameters: Spectral Line Fitting
Recall the model M for a Gaussian line profile in noisy data:
T f_i = T exp( -(ν_i - ν_0)² / (2σ_L²) ), where ν_0 = 37 and σ_L = 2 (channels).
The noise has been independently characterized as Gaussian with σ_n = 1 (in units of the signal), and estimates of T from theory are uncertain over three orders of magnitude, from 0.1 to 100.
Here the line centre ν_0, the width σ_L and the noise level σ_n could all be treated as nuisance parameters if only the line strength T is of interest.

15 Maximum Likelihood
Hypothesis H pertains to the PDF for a variable x.
Bayes' theorem: P(X|D,I) = P(X|I) P(D|X,I) / P(D|I)
Assume a uniform prior and ignore the normalization factor. Then we have:
P(X|D,I) ∝ P(D|X,I)
The value x_0 that maximizes the posterior is, in this case, the value that maximizes the likelihood function P(D|X,I), and is referred to as the Maximum Likelihood estimator.

16 Maximum Likelihood and Least Squares
Assuming the noise is Gaussian, then for each datum:
P(D_i|X,I) = [1/(σ_i √(2π))] exp( -(F_i - D_i)² / (2σ_i²) )
where F_i = f(x,i) is our ideal (noiseless) model prediction.
Given a set of data D whose individual points are independent, the likelihood is:
P(D|X,I) = Π_{i=1..N} P(D_i|X,I) ∝ exp( -χ²/2 ), where χ² = Σ_{i=1..N} (F_i - D_i)² / σ_i²
Since the location of a maximum is not affected by a monotonic transformation:
L = ln[ P(X|D,I) ] = constant - χ²/2
Maximum likelihood is therefore obtained by minimizing χ²: we have derived the well-known least-squares optimization result.
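A standalone sketch of this equivalence for a simple one-parameter model (all values below are illustrative; the slope fit is not the slide's example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: a straight line through the origin plus Gaussian noise
x = np.linspace(0.0, 10.0, 50)
sigma = 0.5
d = 2.0 * x + rng.normal(0.0, sigma, x.size)

def chi2(a):
    """Chi-squared of the model F_i = a * x_i."""
    return np.sum((a * x - d) ** 2) / sigma ** 2

def log_likelihood(a):
    """Gaussian log-likelihood: constant - chi^2 / 2."""
    return -0.5 * chi2(a) - x.size * np.log(sigma * np.sqrt(2.0 * np.pi))

a_grid = np.linspace(1.5, 2.5, 1001)
a_ls = a_grid[np.argmin([chi2(a) for a in a_grid])]              # least-squares estimate
a_ml = a_grid[np.argmax([log_likelihood(a) for a in a_grid])]    # maximum-likelihood estimate
print(a_ls, a_ml)   # identical: minimizing chi^2 is maximizing the Gaussian likelihood
```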

