NASSP Masters 5003F - Computational Astronomy - 2009. Lecture 4: mostly about model fitting.

Presentation transcript:

Lecture 4: mostly about model fitting. The model is our estimate of the parent function, i.e. what we'd like to find out (but never can, exactly). Let's express the model m(x_i) as a function of a few parameters θ_1, θ_2, …, θ_P. Finding the 'best fit' model then just means finding the best estimates of the θ (bold is shorthand for the list of parameters). Knowledge of the physics informs the choice of the θ.

Naive best-fit calculation: we want to minimize all the deviates |y_i - m_i|. A reasonable single number to minimize is Σ_i (y_i - m_i)². But what if some points have a larger σ than others? Answer: weight by 1/σ_i², just like the best-SNR weighted average, giving U = Σ_i (y_i - m_i)²/σ_i². This is sometimes (a bit deceptively) known as the chi-squared (χ²) formula. Choose the θ which minimizes U.
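A minimal Python sketch of this recipe, using the two-component model of the next slide. The variable names, the simulated s, y and σ values, and the use of scipy.optimize.minimize are my own illustrative choices, not part of the lecture:

import numpy as np
from scipy.optimize import minimize

def chi_squared(theta, s, y, sigma, model):
    """The objective U = sum_i (y_i - m_i)^2 / sigma_i^2."""
    m = model(s, theta)
    return np.sum(((y - m) / sigma) ** 2)

def linear_model(s, theta):
    """Two-parameter model m_i = theta_1 + theta_2 * s_i."""
    return theta[0] + theta[1] * s

# Illustrative data: a known 'parent function' plus Gaussian noise of known sigma.
rng = np.random.default_rng(1)
s = np.linspace(0.0, 1.0, 50)
sigma = np.full_like(s, 0.5)
y = 2.0 + 3.0 * s + rng.normal(0.0, sigma)

# Choose the theta which minimizes U.
result = minimize(chi_squared, x0=[1.0, 1.0], args=(s, y, sigma, linear_model))
print("best-fit theta:", result.x, "  U_min:", result.fun)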

Simple example: m_i = θ_1 + θ_2 s_i. (Figures: the model, with s_i in red and the flat background in green; the data y_i; and a map of U over the (θ_1, θ_2) plane.)
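A sketch of how such a 'map of U' can be produced, assuming the s, y and sigma arrays from the previous sketch; the grid ranges are arbitrary choices:

import numpy as np

def u_map(t1_grid, t2_grid, s, y, sigma):
    """Evaluate U = sum_i (y_i - m_i)^2 / sigma_i^2 on a grid of (theta_1, theta_2)."""
    U = np.empty((len(t1_grid), len(t2_grid)))
    for i, t1 in enumerate(t1_grid):
        for j, t2 in enumerate(t2_grid):
            m = t1 + t2 * s                      # m_i = theta_1 + theta_2 * s_i
            U[i, j] = np.sum(((y - m) / sigma) ** 2)
    return U

# With s, y, sigma as in the previous sketch:
# U = u_map(np.linspace(0, 4, 100), np.linspace(1, 5, 100), s, y, sigma)
# A contour or image plot of U then reproduces the 'map of U' on the slide.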

χ² remarks. This is sometimes known as the 'method of least squares'. We have ignored the possibility that the x_i might also have errors. Concept of degrees of freedom ν: the higher the number P of parameters, the better the model fits the noise, and therefore the lower the average (y_i - m_i)²; so normalize by ν = N - P. The reduced χ², U/(N - P), should be ~1 for a good fit.

How good is the fit? χ² for the parent function has a known probability distribution (the χ² distribution with ν degrees of freedom). The probability of χ² equalling U or higher, P(χ² ≥ U), equals the probability that the data come from the model. But… is U truly distributed as χ²? If in doubt, check with a Monte Carlo!
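A sketch of both checks, reusing result, chi_squared, linear_model, s, y and sigma from the earlier sketches; the number of trials and the random seed are arbitrary:

import numpy as np
from scipy.stats import chi2
from scipy.optimize import minimize

# Tail probability P(chi^2 >= U_min) for nu = N - P degrees of freedom.
nu = len(y) - 2                       # N data points, P = 2 fitted parameters
p_value = chi2.sf(result.fun, df=nu)
print("P(chi^2 >= U_min) =", p_value)

# Monte Carlo check that U really is chi^2-distributed: generate many fake data
# sets from the best-fit model and refit each one.
n_trials = 1000
m_best = linear_model(s, result.x)
rng = np.random.default_rng(2)
u_values = np.empty(n_trials)
for k in range(n_trials):
    y_fake = m_best + rng.normal(0.0, sigma)
    fit = minimize(chi_squared, x0=result.x, args=(s, y_fake, sigma, linear_model))
    u_values[k] = fit.fun
print("Monte Carlo mean of U:", u_values.mean(), " (expect ~", nu, ")")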

χ² for Poisson data. Choose the data y_i as the estimator for σ_i²? Then zero counts put zero values in the denominator. Choose the (evolving) model as the estimator for σ_i²? That gives a biased result. Better: the Mighell formula, which is unbiased but no good for goodness-of-fit. So use Mighell to fit the θ, then the standard U for 'goodness of fit' (GOF).
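A minimal sketch of that statistic as I understand it from Mighell (1999); the slides only name the formula, so the exact form below is an assumption to check against the paper:

import numpy as np

def mighell_chi2(y, m):
    """Mighell-style chi^2 for Poisson counts (assumed form):
    sum_i (y_i + min(y_i, 1) - m_i)^2 / (y_i + 1).
    Finite at y_i = 0 and intended for parameter fitting, not goodness-of-fit."""
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=float)
    return np.sum((y + np.minimum(y, 1.0) - m) ** 2 / (y + 1.0))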

Likelihood. Take, for example, a single datum y which is a parent function f plus Gaussian noise. If m = f, then p(y|m) = (1/(σ√(2π))) exp(-(y - m)²/2σ²). But we can also think of this as the 'probability' of m given y, p(m|y); then there is no worry about the 'if'. This is known as the likelihood of m given y. Comments on p(m|y): it may not be true, hence the Bayesian use of any extra prior information; it is hard to check; however, the flow of information seems right, i.e. from the known (the data) towards the unknown but desired (the model).

An even simpler example: m = θ, with Poissonian noise this time (because it is more interesting, and harder to handle with 'traditional' χ²). Probability: p(y|θ) = θ^y e^(-θ)/y!. Likelihood: p(θ|y) = θ^y e^(-θ)/y! (the same expression, now read as a function of θ).
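A small sketch of this single-datum Poisson likelihood, evaluated on a grid of θ; the observed count and the grid are illustrative:

import numpy as np
from scipy.stats import poisson

# Likelihood of theta given one Poisson datum y: p(theta|y) = theta^y e^(-theta) / y!
y_obs = 7                                         # illustrative count
theta_grid = np.linspace(0.1, 20.0, 400)
likelihood = poisson.pmf(y_obs, mu=theta_grid)

theta_best = theta_grid[np.argmax(likelihood)]
print("maximum-likelihood theta ~", theta_best)   # ~ y_obs, as expected for a Poisson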

Likelihood continued. We can use the likelihood to calculate the best-fit θ. If we have several data values y = [y_1, y_2, …, y_N], we multiply the separate likelihoods together: p(θ|y) = Π_i p(θ|y_i). It's often easier if we take logs and work with L = -ln p(θ|y) = -Σ_i ln p(θ|y_i), so that maximizing the likelihood is the same as minimizing L.
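A short sketch of the same idea for several Poisson counts sharing one mean θ, summing log-likelihoods rather than multiplying likelihoods; the counts and the grid are illustrative:

import numpy as np
from scipy.stats import poisson

# Several Poisson counts with a common mean theta: minimize L = -sum_i ln p(theta | y_i).
y = np.array([4, 7, 5, 9, 6])                     # illustrative counts
theta_grid = np.linspace(0.5, 15.0, 500)

L = np.array([-np.sum(poisson.logpmf(y, mu=t)) for t in theta_grid])
theta_best = theta_grid[np.argmin(L)]
print("ML estimate of theta:", theta_best)        # ~ mean of y, as expected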

Likelihood continued. Back to the 2-parameter model m_i = θ_1 + θ_2 s_i of slide 3, but now with Poissonian noise. Minimize L, just as we minimized U, to get the optimum θ. For Poisson data L = Σ_i [m_i - y_i ln(m_i) + ln(y_i!)], and we can ignore the Σ_i ln(y_i!) term because it doesn't depend on any θ. p(θ_1, θ_2|y) is an example of a joint probability distribution, in this case a bivariate one because there are only 2 parameters.
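A sketch of the joint negative log-likelihood map for this model, with the ln(y_i!) term dropped as described; the simulated data and grid ranges are illustrative assumptions, and the map corresponds to the figure on the next slide:

import numpy as np

def poisson_nll(theta, s, y):
    """L for the model m_i = theta_1 + theta_2 * s_i with Poisson counts y_i,
    dropping the theta-independent sum of ln(y_i!) terms."""
    m = theta[0] + theta[1] * s
    if np.any(m <= 0):                 # a Poisson mean must be positive
        return np.inf
    return np.sum(m - y * np.log(m))

# Illustrative Poisson data from the same two-component parent function.
rng = np.random.default_rng(3)
s = np.linspace(0.0, 1.0, 50)
y = rng.poisson(2.0 + 3.0 * s)

# Map L over a grid of (theta_1, theta_2); its minimum marks the best fit.
t1_grid = np.linspace(0.5, 4.0, 120)
t2_grid = np.linspace(0.5, 6.0, 120)
L = np.array([[poisson_nll((t1, t2), s, y) for t2 in t2_grid] for t1 in t1_grid])
i, j = np.unravel_index(np.argmin(L), L.shape)
print("grid minimum at theta_1 =", t1_grid[i], ", theta_2 =", t2_grid[j])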

Poissonian/likelihood version of slide 3. (Figures: the model, with s_i in red and the flat background in green; the data y_i; and a map of the joint likelihood L.)

Likelihood continued. An interesting fact: maximum likelihood for Gaussian data leads to the U (i.e. 'χ²') expression!
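A one-line check of that fact (my own working, not shown on the slide): for Gaussian noise,

$$
L = -\ln p(\boldsymbol{\theta}\mid\mathbf{y})
  = \sum_i \left[ \ln\bigl(\sigma_i\sqrt{2\pi}\bigr) + \frac{(y_i - m_i)^2}{2\sigma_i^2} \right]
  = \mathrm{const} + \tfrac{1}{2}\,U,
$$

so minimizing L is exactly the same as minimizing U.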