Published by Brendan Poole. Modified over 9 years ago.
1
NASSP Masters 5003F - Computational Astronomy - 2010
Lecture 6
Objective functions for model fitting:
– Sum of squared residuals (=> the ‘method of least squares’).
– Likelihood.
Hypothesis testing.
2
NASSP Masters 5003F - Computational Astronomy - 2009
Model fitting: a reminder of the terminology.
We have data $y_i$ at samples of some independent variable $x_i$.
The model is our estimate of the parent or truth function: what we'd like to find out (but never can, exactly).
Let's express the model $m(x_i)$ as a function of a few parameters $\theta_1, \theta_2, \ldots, \theta_M$, written $\boldsymbol{\theta}$ for short (bold is shorthand for a list).
Finding the ‘best fit’ model then just means finding the best estimates of the $\theta$.
Knowledge of physics informs the choice of $m$ and $\boldsymbol{\theta}$.
3
Naive best fit calculation:
The residuals for a particular model are $y_i - m_i$, where $m_i = m(x_i)$.
To ‘thread the model through the middle of the noise’, we want the magnitudes of all residuals to be small.
A reasonable way (not the only way) to achieve this is to define a sum of squared residuals as our objective function:
$$U_{\mathrm{ls}} = \sum_{i=1}^{N} (y_i - m_i)^2$$
Fitting by minimizing this objective function is called the method of least squares. It is extremely common.
NOTE! This approach IGNORES possible uncertainties in $x$.
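As a minimal sketch of the sum of squared residuals, the Python below evaluates $U_{\mathrm{ls}}$ for two candidate models. The function name, the data values and both models are hypothetical, chosen only for illustration.

```python
def sum_squared_residuals(y, m):
    """Return U_ls = sum_i (y_i - m_i)^2 for data y and model values m."""
    if len(y) != len(m):
        raise ValueError("y and m must have the same length")
    return sum((yi - mi) ** 2 for yi, mi in zip(y, m))


# Hypothetical data, and two candidate models evaluated at the same x_i.
y = [1.1, 1.9, 3.2, 3.9]
good_model = [1.0, 2.0, 3.0, 4.0]   # threads through the noise
poor_model = [2.0, 2.0, 2.0, 2.0]   # ignores the trend in the data

u_good = sum_squared_residuals(y, good_model)
u_poor = sum_squared_residuals(y, poor_model)
print(u_good, u_poor)
```

The model that passes closer to the data gives the smaller $U_{\mathrm{ls}}$, which is what a minimizer would exploit.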
4
But what if the noise is not homogeneous? Some bits clearly have higher $\sigma$ than others.
Answer: weight each squared residual by $1/\sigma_i^2$:
$$U = \chi^2 = \sum_{i=1}^{N} \frac{(y_i - m_i)^2}{\sigma_i^2}$$
This form of $U$ is sometimes called $\chi^2$ (pronounced ‘kai squared’). To use it, we need to know the $\sigma_i$.
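The effect of the $1/\sigma_i^2$ weighting can be sketched as follows. The function name and the numbers are assumptions made for illustration: the third point is far from the model but also very noisy.

```python
def chi_squared(y, m, sigma):
    """Return sum_i ((y_i - m_i) / sigma_i)^2."""
    return sum(((yi - mi) / si) ** 2 for yi, mi, si in zip(y, m, sigma))


# Hypothetical data: the last sample has a much larger sigma than the others.
y = [1.0, 2.0, 10.0]
m = [1.0, 2.0, 4.0]
sigma = [0.1, 0.1, 6.0]

unweighted = sum((yi - mi) ** 2 for yi, mi in zip(y, m))
weighted = chi_squared(y, m, sigma)
print(unweighted, weighted)
```

Unweighted, the noisy point dominates the objective function; weighted, it contributes about as much as a one-sigma deviation should.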
5
Simple example: $m_i = \theta_1 + \theta_2 s_i$
[Figure: the model. Red is $s_i$, green the flat background.]
[Figure: the data $y_i$.]
[Figure: contour map of $U_{\mathrm{ls}}$, with the truth values marked.]
6
An even simpler example:
Last lecture, I noted that there do exist cases in which we can directly invert
$$\nabla_{\theta} U = 0$$
For least squares, this happens if the model is a linear function of the parameters $\theta_i$ (for example, a polynomial in $x$ whose coefficients are the $\theta_i$).
Expansion of $\nabla U$ in this case gives a set of $M$ linear equations in the $M$ parameters, called the normal equations. It is easy to solve these to get the $\theta_i$.
7
Simplest example of all: fitting a straight line.
Called linear regression by the statisticians. There is a huge amount of literature on it.
Normal equations for a line model $m_i = \theta_1 + \theta_2 x_i$ (with the unweighted $U_{\mathrm{ls}}$) turn out to be:
$$\theta_1 N + \theta_2 \sum_i x_i = \sum_i y_i$$
$$\theta_1 \sum_i x_i + \theta_2 \sum_i x_i^2 = \sum_i x_i y_i$$
A polynomial is an easy extension of this.
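A minimal sketch of solving the normal equations for a straight line, assuming the unweighted sum of squared residuals and the model $m_i = \theta_1 + \theta_2 x_i$. The function name `fit_line` and the noise-free test data are hypothetical.

```python
def fit_line(x, y):
    """Solve the 2x2 normal equations for m_i = theta1 + theta2 * x_i."""
    n = len(x)
    sx = sum(x)
    sy = sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # Determinant of the normal-equation matrix [[n, sx], [sx, sxx]].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("x values must not all be identical")
    # Cramer's rule for the two unknowns.
    theta1 = (sxx * sy - sx * sxy) / det
    theta2 = (n * sxy - sx * sy) / det
    return theta1, theta2


# Noise-free data lying on y = 1 + 2x, so the fit should recover (1, 2).
theta1, theta2 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(theta1, theta2)
```

No iterative minimizer is needed here: the two linear equations are solved directly.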
8
$\chi^2$ for Poisson data: possible, but problematic.
Choose the data $y_i$ as estimator for $\sigma_i^2$?
– No: can have zero values in the denominator.
Choose the (evolving) model as estimator for $\sigma_i^2$?
– No: gives a biased result.
Better: the Mighell formula
$$\chi^2_{\gamma} = \sum_{i=1}^{N} \frac{[y_i + \min(y_i, 1) - m_i]^2}{y_i + 1}$$
Unbiased, but no good for goodness-of-fit.
– Use Mighell to fit $\theta$, then the standard $U$ for ‘goodness of fit’ (GOF).
Mighell K J, ApJ 518, 380 (1999)
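A short sketch of the Mighell statistic as it is usually quoted from the 1999 paper, $\sum_i [y_i + \min(y_i, 1) - m_i]^2 / (y_i + 1)$. The function name and the counts are illustrative assumptions.

```python
def mighell_chi_squared(y, m):
    """Mighell's chi-squared-gamma statistic for Poisson counts y and model m."""
    return sum(
        (yi + min(yi, 1) - mi) ** 2 / (yi + 1)
        for yi, mi in zip(y, m)
    )


# Hypothetical counts including a zero, which would break a 1/y_i weighting.
counts = [0, 3, 5]
model = [1.0, 4.0, 6.0]
u_mighell = mighell_chi_squared(counts, model)
print(u_mighell)
```

The denominator $y_i + 1$ is never zero, so bins with zero counts cause no trouble.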
9
Another choice of $U$: likelihood.
Likelihood is best illustrated by Poisson data. Consider a single Poisson random variable $y$: its PDF is
$$p(y|m) = \frac{m^y e^{-m}}{y!}$$
where $m$ here plays the role of the expectation value of $y$.
We're used to thinking of this as a function of just one variable, i.e. $y$;
– but it is really a function of both $y$ and $m$.
10
[Figure: Poisson PDF.]
11
[Figure: Poisson PDF.]
12
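PDF for $y$ vs likelihood for $\theta$.
Probability (a function of $y$, with $\theta$ fixed): $p(y|\theta) = \theta^y e^{-\theta} / y!$
Likelihood (a function of $\theta$, with $y$ fixed): $p(y|\theta) = \theta^y e^{-\theta} / y!$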
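The two readings of the same Poisson expression can be sketched in a few lines. The function name, the fixed values and the grid of trial $\theta$ values are assumptions for illustration.

```python
import math


def poisson(y, theta):
    """theta^y * exp(-theta) / y!  (y a non-negative integer, theta > 0)."""
    return theta ** y * math.exp(-theta) / math.factorial(y)


# Read as a PDF: theta is fixed and y varies. The probabilities sum to 1
# (the sum is truncated at y = 59, where the remaining tail is negligible).
theta_fixed = 3.0
pdf_total = sum(poisson(y, theta_fixed) for y in range(60))

# Read as a likelihood: y is fixed (we measured it) and theta varies.
y_measured = 3
theta_grid = [0.1 * k for k in range(1, 101)]
theta_best = max(theta_grid, key=lambda t: poisson(y_measured, t))
print(pdf_total, theta_best)
```

On this grid the likelihood for a single measured count is largest at $\theta$ equal to that count.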
13
The likelihood function.
Before, we thought: “given $m$, let us apply the PDF to obtain the probability of getting between $y$ and $y+dy$.”
Now we are saying: “well, we know $y$, we just measured it. We don't know $m$. But surely the PDF taken as a function of $m$ indicates the probability density for $m$.”
Problems with this:
– The likelihood function is not necessarily normalized, like a ‘proper’ PDF;
– What assurance do we have that the true PDF for $m$ has this shape?
14
Likelihood continued.
Usually we have many ($N$) samples $y_i$. Can we arrive at a single likelihood for all samples taken together? For independent samples, the joint likelihood is the product
$$p(\mathbf{y}|\boldsymbol{\theta}) = \prod_{i=1}^{N} p(y_i|\boldsymbol{\theta})$$
(Note that we've stopped talking just about Poisson data now: this expression is valid for any form of $p$.)
Sometimes it is easier to deal with the log-likelihood $L$:
$$L = \sum_{i=1}^{N} \ln p(y_i|\boldsymbol{\theta})$$
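As an illustration of the product and sum forms, the sketch below computes the joint log-likelihood for Poisson data, assuming independent samples. The function name and the numbers are hypothetical.

```python
import math


def poisson_log_likelihood(y, m):
    """L = sum_i [ y_i ln m_i - m_i - ln(y_i!) ], requiring every m_i > 0."""
    return sum(
        yi * math.log(mi) - mi - math.lgamma(yi + 1)
        for yi, mi in zip(y, m)
    )


# Hypothetical counts and model expectation values.
counts = [2, 0, 4]
model = [1.5, 0.5, 3.0]

log_like = poisson_log_likelihood(counts, model)

# Cross-check against the logarithm of the explicit product of probabilities.
product = 1.0
for yi, mi in zip(counts, model):
    product *= mi ** yi * math.exp(-mi) / math.factorial(yi)
print(log_like, math.log(product))
```

Summing logarithms avoids the underflow that the explicit product would suffer for large $N$.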
15
Likelihood continued.
To get the best-fit model $m$, we need to maximize the likelihood (or equivalently, the log-likelihood).
If we want an objective function to minimize, it is convenient to choose $-L$.
One can show that for Gaussian data, minimizing $-L$ is equivalent to minimizing the variance-weighted sum of squared residuals ($= \chi^2$) given before.
– Proof left as an exercise!
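Without giving the proof, the equivalence can be checked numerically. This sketch assumes independent Gaussian noise with known $\sigma_i$; the function names, data and both trial models are made up for illustration.

```python
import math


def chi_squared(y, m, sigma):
    """Variance-weighted sum of squared residuals."""
    return sum(((yi - mi) / si) ** 2 for yi, mi, si in zip(y, m, sigma))


def gaussian_neg_log_likelihood(y, m, sigma):
    """-L for independent Gaussian samples with known sigma_i."""
    return sum(
        0.5 * ((yi - mi) / si) ** 2 + math.log(si * math.sqrt(2.0 * math.pi))
        for yi, mi, si in zip(y, m, sigma)
    )


y = [1.2, 1.9, 3.4]
sigma = [0.2, 0.3, 0.5]
model_a = [1.0, 2.0, 3.0]
model_b = [1.5, 2.5, 3.5]

# Changing the model changes -L by exactly half the change in chi squared.
delta_nll = (gaussian_neg_log_likelihood(y, model_a, sigma)
             - gaussian_neg_log_likelihood(y, model_b, sigma))
delta_chi2 = chi_squared(y, model_a, sigma) - chi_squared(y, model_b, sigma)
print(delta_nll, 0.5 * delta_chi2)
```

Because the two objective functions differ only by a factor of 2 and a model-independent constant, they have the same minimum.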
16
Poissonian/likelihood version of slide 3
[Figure: the model. Red is $s_i$, green the flat background.]
[Figure: the data $y_i$.]
[Figure: map of the joint likelihood $L$.]
17
What if there are also errors in $x_i$?
Tricky… Bayes is better in this case.
18
What next?
In fitting a model, we want (amplifying a bit on lecture 4):
1. The best-fit values of the parameters;
2. Then we want to know if these values are good enough! If not, we need to go back to the drawing board and choose a new model.
3. If the model passes, we want uncertainties in the best-fit parameters. (I'll put this off to a later lecture…)
Number 1 is accomplished. √
19
How to tell if our model is correct.
Suppose our model is absolutely accurate. The $U$ value we calculate is, nevertheless, a random variable: each fresh set of data will give rise to a slightly different value of $U$.
In other words, $U$, even in the case of a perfectly accurate model, will have some spread. In fact, like any other random variable, it will have a PDF.
– This PDF is sometimes calculable from first principles (if not, one can do a Monte Carlo to estimate it).
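A Monte Carlo estimate of the spread of $U$ for a perfect model might look like the sketch below. It assumes Gaussian noise, a flat hypothetical truth function and no fitted parameters; the function name and all numbers are illustrative.

```python
import random


def simulate_chi_squared(n_points, sigma, n_trials, seed=0):
    """Draw chi-squared values for a model that exactly equals the truth."""
    rng = random.Random(seed)
    truth = [2.0] * n_points  # hypothetical flat parent function
    samples = []
    for _ in range(n_trials):
        # A fresh noisy data set scattered around the same truth values.
        y = [rng.gauss(t, sigma) for t in truth]
        # The model is the truth itself, so nothing is fitted here.
        u = sum(((yi - t) / sigma) ** 2 for yi, t in zip(y, truth))
        samples.append(u)
    return samples


samples = simulate_chi_squared(n_points=10, sigma=0.5, n_trials=2000)
mean_u = sum(samples) / len(samples)
spread_u = (sum((u - mean_u) ** 2 for u in samples) / len(samples)) ** 0.5
print(mean_u, spread_u)
```

A histogram of `samples` approximates the PDF of $U$ in the perfect-model case: every data set gives a different $U$, scattered around the number of data points.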
20
How to tell if our model is correct.
The procedure is:
– First calculate the PDF for $U$ in the ‘perfect fit’ case;
– From this curve, obtain the value of the PDF at our best-fit value of $U$;
– If $p(U_{\mathrm{best\ fit}})$ is very small, it is unlikely that our model is correct.
– Note that both $\chi^2$ and $-L$ have the property that they cannot be negative (for $-L$ this holds when $p$ is a probability of discrete data, such as Poisson counts, so that $p \le 1$).
– A model which is a less than ideal match to the truth function will always generate $U$ values with a PDF displaced to higher values of $U$.
21
Perfect vs. imperfect $p(U)$:
[Figure: a perfect model gives this shape of PDF; the PDF for an imperfect model is ALWAYS displaced to higher $U$.]
22
Goodness of model continued.
Because plausible $U$s are $\ge 0$, and because an imperfect model always gives higher $U$, we prefer to
– generate the survival function for the perfect model;
– that tells us the probability of a perfect model giving us the measured value of $U$ or higher.
This procedure is called hypothesis testing, because we make the hypothesis:
– “Suppose our model is correct. What sort of $U$ value should we expect to find?”
We'll encounter the technique again next lecture when we turn to enquire if there is any signal at all buried in the noise.
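When the perfect-model PDF is only available as simulated values, the survival function can be estimated as a simple fraction. The sketch below is one way to do that; the function name and the list of simulated $U$ values are made up for illustration.

```python
def empirical_survival(samples, u_measured):
    """Fraction of perfect-model U values at or above the measured U."""
    if not samples:
        raise ValueError("need at least one simulated U value")
    return sum(1 for u in samples if u >= u_measured) / len(samples)


# Hypothetical U values simulated under the 'model is correct' hypothesis.
perfect_model_us = [4.0, 6.5, 8.0, 9.5, 11.0, 12.5, 14.0, 21.0]

p_typical = empirical_survival(perfect_model_us, 9.5)
p_extreme = empirical_survival(perfect_model_us, 30.0)
print(p_typical, p_extreme)
```

A small fraction means a correct model would rarely give a $U$ this large, which counts against the hypothesis. In practice many more simulated values would be needed to resolve small probabilities.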
23
If we use the variance-weighted least-squares $U$ (also known as $\chi^2$), this is easy, because $p(U)$ is known for this:
$$p(U) = \frac{U^{\upsilon/2 - 1} e^{-U/2}}{2^{\upsilon/2}\,\Gamma(\upsilon/2)}$$
where
– $\Gamma$ is the gamma function
– and $\upsilon$ is called the degrees of freedom.
Note: the PDF has a peak at $U \sim \upsilon$.
[Figure: perfect-model $p(U)$s.]
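The $\chi^2$ PDF is easy to evaluate with the standard library. The sketch below uses the standard form $U^{\upsilon/2-1} e^{-U/2} / (2^{\upsilon/2}\Gamma(\upsilon/2))$ and locates the peak on a grid; the function name and the grid are assumptions. For $\upsilon \ge 2$ the exact peak is at $\upsilon - 2$, while the mean is $\upsilon$, which is the sense in which the peak sits at roughly $\upsilon$.

```python
import math


def chi2_pdf(u, dof):
    """Chi-squared PDF with `dof` degrees of freedom, evaluated via logs."""
    if u <= 0:
        return 0.0  # the edge case u == 0 is ignored in this sketch
    half = dof / 2.0
    log_p = ((half - 1.0) * math.log(u) - u / 2.0
             - half * math.log(2.0) - math.lgamma(half))
    return math.exp(log_p)


# Locate the peak for 10 degrees of freedom on a grid with spacing 0.01.
grid = [0.01 * k for k in range(1, 4001)]
mode = max(grid, key=lambda u: chi2_pdf(u, 10))
print(mode)
```

Working with logarithms and `math.lgamma` avoids overflow of the gamma function for large degrees of freedom.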
24
What are degrees of freedom?
The easiest way to illustrate what degrees of freedom are is to try fitting a polynomial of higher and higher order to a set of noisy data.
The more orders we include, the nearer the model will fit the data, and the smaller the sum of squared residuals ($\chi^2$) will be, until…
when $M = N$ (i.e. the number of parameters, polynomial coefficients in this case, equals the number of data points), the model will go through every point exactly. $\chi^2$ will equal 0.
25
Degrees of freedom
Defined as $\upsilon = N - M$: the number of data points minus the number of parameters fitted.
It is sometimes convenient to define a reduced chi squared
$$\chi^2_{\mathrm{reduced}} = \frac{\chi^2}{\upsilon}$$
– The PDF for $\chi^2_{\mathrm{reduced}}$ should of course peak at about 1.
– There is no advantage in using this for minimization rather than the ‘raw’ $\chi^2$.
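A small sketch of the reduced chi squared, assuming the definition $\chi^2 / (N - M)$. The function name and the numbers are hypothetical.

```python
def reduced_chi_squared(chi2, n_points, n_params):
    """Return chi2 / (N - M); requires more data points than parameters."""
    dof = n_points - n_params
    if dof <= 0:
        raise ValueError("need N > M for a positive number of degrees of freedom")
    return chi2 / dof


# Hypothetical fit: 25 data points, a 2-parameter model, chi squared of 23.
chi2_red = reduced_chi_squared(23.0, n_points=25, n_params=2)
print(chi2_red)
```

A value near 1 is what a correct model with correctly estimated $\sigma_i$ would typically give.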
26
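‘Survival function’ for $U$.
Remember the survival function of a PDF is defined as
$$S(U) = \int_U^{\infty} p(U')\,dU'$$
For $\chi^2$ this is
$$S(U) = \frac{\Gamma(\upsilon/2,\, U/2)}{\Gamma(\upsilon/2)}$$
where $\Gamma$ written with 2 arguments like this is called the incomplete gamma function:
$$\Gamma(a, x) = \int_x^{\infty} t^{a-1} e^{-t}\,dt$$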
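The $\chi^2$ survival function can be sketched with the standard library alone. The code below is one possible approach, not necessarily the one used in the course: it sums the power series for the regularized lower incomplete gamma function, $P(a, x) = x^a e^{-x} \sum_{n \ge 0} x^n / \Gamma(a + n + 1)$, and returns $1 - P(\upsilon/2, U/2)$. The function name, tolerance and term limit are assumptions.

```python
import math


def chi2_sf(u, dof, tol=1e-15, max_terms=10000):
    """Survival function of chi squared: probability of a value >= u."""
    if u <= 0:
        return 1.0
    a = dof / 2.0
    x = u / 2.0
    # First series term: x^a * exp(-x) / Gamma(a + 1), computed via logs.
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
    total = term
    n = 0
    while term > tol * total and n < max_terms:
        n += 1
        term *= x / (a + n)  # ratio of successive series terms
        total += term
    # Subtracting from 1 loses precision when the tail probability is tiny.
    return max(0.0, 1.0 - total)


p_value = chi2_sf(2.0, 2)  # for 2 degrees of freedom S(U) = exp(-U / 2)
print(p_value)
```

For one degree of freedom the result can be cross-checked against `math.erfc(math.sqrt(u / 2))`. A library routine for the incomplete gamma function would be the more robust choice for very large $U$.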