SUPA Advanced Data Analysis Course, Jan 6th – 7th 2009. Advanced Data Analysis for the Physical Sciences. Dr Martin Hendry, Dept of Physics and Astronomy, University of Glasgow.

2. Parameter Estimation and Goodness of Fit – Part One

In the frequentist approach, parameter estimation requires the definition of a lot of mathematical machinery:
1. A random sample of size M, drawn from the underlying pdf.
2. The sampling distribution, derived from the underlying pdf (depends on the underlying pdf, and on M).
3. An estimator – a function of the sample used to estimate parameters of the pdf.
4. A hypothesis test – to decide if the estimator is ‘acceptable’, for the given sample size.

How do we decide what makes an ‘acceptable’ estimator?

Example: measuring the wavelength of a spectral line. True wavelength = $\lambda_0$ (a fixed but unknown parameter). Compute the sampling distribution for the estimator $\hat{\lambda}$, modelling the errors.

1. Low dispersion spectrometer. Unbiased: $E[\hat{\lambda}] = \lambda_0$ – repeating the observation a large number of times, the average estimate equals $\lambda_0$. BUT $\mathrm{Var}(\hat{\lambda})$ is large.

2. High dispersion spectrometer but faulty physicist! (e.g. wrong calibration). Biased: $E[\hat{\lambda}] \neq \lambda_0$, BUT $\mathrm{Var}(\hat{\lambda})$ is small. This is the better choice of estimator (if we can correct the bias).

The Sample Mean. Let $x_1, \dots, x_M$ be a random sample from a pdf with mean $\mu$ and variance $\sigma^2$, and let $\bar{x} = \frac{1}{M}\sum_{i=1}^{M} x_i$ be the sample mean. One can show that $E[\bar{x}] = \mu$, i.e. $\bar{x}$ is an unbiased estimator of $\mu$. But bias is defined formally in terms of an infinite set of randomly chosen samples, each of size M. What can we say with a finite number of samples, each of finite size? One can also show that $\mathrm{Var}(\bar{x}) = \sigma^2 / M$: as the sample size increases, the sample mean is increasingly concentrated near the true mean.
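To make this concrete, here is a minimal numerical sketch (not from the original slides; it assumes numpy) that checks both properties of the sample mean by simulation:

```python
# Draw many samples of size M from a pdf with known mean and variance, and
# verify that the sample mean is unbiased and has variance sigma^2 / M.
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, M, n_samples = 5.0, 2.0, 100, 20000

samples = rng.normal(mu, sigma, size=(n_samples, M))
xbar = samples.mean(axis=1)          # one sample mean per sample

print(f"mean of sample means : {xbar.mean():.4f}  (true mu     = {mu})")
print(f"variance of estimates: {xbar.var():.4f}  (sigma^2 / M = {sigma**2 / M})")
```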

Linear correlation. Given sampled data $\{(x_i, y_i)\}_{i=1}^{N}$, we can estimate the linear correlation between the variables using Pearson’s product moment correlation coefficient:
$$ r = \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - \bar{x})(y_i - \bar{y})}{s_x s_y} $$
where $\bar{x}$, $\bar{y}$ are the sample means and $s_x$, $s_y$ the sample standard deviations. If $(x, y)$ is bivariate normal then $r$ is an estimator of the correlation coefficient $\rho$.

We can also rewrite the formula for $r$ in the slightly simpler forms:
$$ r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} \qquad \text{or} \qquad r = \frac{\sum_i x_i y_i - N\bar{x}\bar{y}}{\sqrt{\left(\sum_i x_i^2 - N\bar{x}^2\right)\left(\sum_i y_i^2 - N\bar{y}^2\right)}} $$
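As an illustrative sketch (assuming numpy; not part of the slides), the explicit formula for $r$ can be checked against numpy's built-in np.corrcoef:

```python
# Pearson's r from the explicit formula, compared with np.corrcoef.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.7 * x + rng.normal(scale=0.5, size=200)   # correlated toy data

# r = sum (x_i - xbar)(y_i - ybar) / sqrt(Sxx * Syy)
dx, dy = x - x.mean(), y - y.mean()
r = np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

print(r, np.corrcoef(x, y)[0, 1])   # the two values agree
```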

Question 4: Estimate $r$ for the sample data shown in the graph. [Scatter plot and multiple-choice options A–D not reproduced in this transcript.]

The Central Limit Theorem. For any pdf with finite variance $\sigma^2$, as $M \to \infty$ the sample mean $\bar{x}$ follows a normal pdf with mean $\mu$ and variance $\sigma^2 / M$. This explains the importance of the normal pdf in statistics. But it is still based on the asymptotic behaviour of an infinite ensemble of samples that we didn’t actually observe! There is no ‘hard and fast’ rule for defining ‘good’ estimators; frequentist probability theory invokes a number of principles – e.g. least squares, maximum likelihood.
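A quick simulation sketch (again assuming numpy; not from the slides) shows the CLT at work for a distinctly non-Gaussian underlying pdf:

```python
# Sample means of a uniform pdf become approximately normal with
# variance sigma^2 / M (uniform on [0,1]: mean 1/2, variance 1/12).
import numpy as np

rng = np.random.default_rng(1)
M, n_samples = 50, 100000
samples = rng.uniform(0.0, 1.0, size=(n_samples, M))
xbar = samples.mean(axis=1)

print(f"mean of xbar: {xbar.mean():.4f}   (expect 0.5)")
print(f"var  of xbar: {xbar.var():.6f} (expect {1/12/M:.6f})")
# A histogram of xbar would look very close to the corresponding Gaussian.
```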

Method of Least Squares
o ‘workhorse’ method for fitting lines and curves to data in the physical sciences
o often encountered (as a ‘black box’?) in elementary courses
o a useful demonstration of underlying statistical principles
o simple illustration: fitting a straight line to (x, y) data

Ordinary Linear Least Squares. Model the data as $y_i = a + b x_i + \epsilon_i$, where each error $\epsilon_i$ is drawn from a pdf with mean zero and common variance $\sigma^2$. Define the sum of squared residuals
$$ S(a, b) = \sum_{i=1}^{N} \left( y_i - a - b x_i \right)^2 . $$
The least-squares estimators $\hat{a}$ and $\hat{b}$ minimise $S$, i.e. satisfy $\partial S / \partial a = 0$ and $\partial S / \partial b = 0$, which gives
$$ \hat{b} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x} . $$

We can show that $E[\hat{a}] = a$ and $E[\hat{b}] = b$, i.e. the LS estimators are unbiased. By choosing the $x_i$ so that $\bar{x} = 0$ (e.g. shifting the origin of $x$), we can make $\hat{a}$ and $\hat{b}$ independent.
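A minimal sketch of the closed-form estimators above (assuming numpy; the toy data are mine), compared against np.polyfit:

```python
# Ordinary least squares for y = a + b*x: closed form vs np.polyfit.
import numpy as np

rng = np.random.default_rng(2)
a_true, b_true, sigma = 1.0, 2.5, 0.3
x = np.linspace(0.0, 1.0, 50)
y = a_true + b_true * x + rng.normal(scale=sigma, size=x.size)

b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
a_hat = y.mean() - b_hat * x.mean()

print(a_hat, b_hat)          # close to (1.0, 2.5)
print(np.polyfit(x, y, 1))   # [b_hat, a_hat], same values
```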

Weighted Linear Least Squares. Define
$$ \chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - a - b x_i}{\sigma_i} \right)^2 , $$
where $\sigma_i$ is the error on the $i$th observation. Again we find Least Squares estimators of $a$ and $b$ satisfying $\partial \chi^2 / \partial a = 0$ and $\partial \chi^2 / \partial b = 0$.

Solving, and writing $S = \sum_i 1/\sigma_i^2$, $S_x = \sum_i x_i/\sigma_i^2$, $S_y = \sum_i y_i/\sigma_i^2$, $S_{xx} = \sum_i x_i^2/\sigma_i^2$, $S_{xy} = \sum_i x_i y_i/\sigma_i^2$ and $\Delta = S\,S_{xx} - S_x^2$, we find
$$ \hat{a} = \frac{S_{xx} S_y - S_x S_{xy}}{\Delta}, \qquad \hat{b} = \frac{S\,S_{xy} - S_x S_y}{\Delta} . $$

Also
$$ \sigma_{\hat{a}}^2 = \frac{S_{xx}}{\Delta}, \qquad \sigma_{\hat{b}}^2 = \frac{S}{\Delta}, \qquad \mathrm{Cov}(\hat{a}, \hat{b}) = -\frac{S_x}{\Delta} . $$
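The formulas above translate directly into code. This sketch (assuming numpy; the function name weighted_line_fit is mine) follows the sums defined above, in the style of Numerical Recipes 15.2:

```python
# Weighted least squares for y = a + b*x with per-point errors sigma_i.
import numpy as np

def weighted_line_fit(x, y, sigma):
    """Return a, b, var_a, var_b from the closed-form weighted LS solution."""
    w = 1.0 / sigma**2
    S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x**2).sum(), (w * x * y).sum()
    Delta = S * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / Delta
    b = (S * Sxy - Sx * Sy) / Delta
    return a, b, Sxx / Delta, S / Delta

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 40)
sigma = rng.uniform(0.1, 0.5, size=x.size)       # heteroscedastic errors
y = 1.0 + 2.5 * x + rng.normal(scale=sigma)
print(weighted_line_fit(x, y, sigma))            # a, b close to (1.0, 2.5)
```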

Extensions and Generalisations
o Errors on both variables? We need to modify the merit function accordingly. This renders the equations non-linear: there is no simple analytic solution! See e.g. Numerical Recipes 15.3

Extensions and Generalisations
o General linear models? e.g. $y(x) = \sum_{k=1}^{M} \theta_k X_k(x)$, where the basis functions $X_k(x)$ are fixed functions of $x$ (‘linear’ refers to the dependence on the parameters $\theta_k$). We can formulate this as a matrix equation and solve for the parameters. See e.g. Numerical Recipes 15.4

Model: $\mathbf{y} = A\,\boldsymbol{\theta} + \boldsymbol{\epsilon}$, where
$\mathbf{y}$ = vector of observations (length N)
$\boldsymbol{\theta}$ = vector of model parameters (length M)
$A$ = design matrix of model basis functions, with elements $A_{ij} = X_j(x_i)$
$\boldsymbol{\epsilon}$ = vector of errors, where we assume each $\epsilon_i$ is drawn from some pdf with mean zero and variance $\sigma_i^2$.

Weighted model: dividing each row by its error, define the vector of weighted observations $b_i = y_i / \sigma_i$ and the weighted design matrix $A_{ij} = X_j(x_i) / \sigma_i$. Then $\mathbf{b} = A\,\boldsymbol{\theta} + \boldsymbol{\epsilon}'$, where the weighted errors have mean zero and unit variance.

We solve for the parameter vector $\boldsymbol{\theta}$ that minimises $\chi^2 = \left| \mathbf{b} - A\,\boldsymbol{\theta} \right|^2$. This has solution
$$ \hat{\boldsymbol{\theta}} = \left( A^{\mathsf T} A \right)^{-1} A^{\mathsf T} \mathbf{b} , $$
and the covariance matrix of the estimated parameters is the $M \times M$ matrix $C = \left( A^{\mathsf T} A \right)^{-1}$.
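A short sketch of this solution (assuming numpy; the basis functions and true parameters are illustrative). In practice np.linalg.lstsq, or the SVD route below, is numerically safer than forming $A^{\mathsf T} A$ explicitly:

```python
# General linear model: build the weighted design matrix and solve the
# normal equations theta = (A^T A)^-1 A^T b.
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 2 * np.pi, 60)
sigma = np.full(x.size, 0.2)
y = 0.5 + 1.5 * np.sin(x) + rng.normal(scale=sigma)   # true theta = (0.5, 1.5)

# Basis functions X_1(x) = 1, X_2(x) = sin(x); rows weighted by 1/sigma_i
A = np.column_stack([np.ones_like(x), np.sin(x)]) / sigma[:, None]
b = y / sigma

theta_hat = np.linalg.solve(A.T @ A, A.T @ b)   # (A^T A)^-1 A^T b
C = np.linalg.inv(A.T @ A)                      # parameter covariance
print(theta_hat, np.sqrt(np.diag(C)))           # estimates and 1-sigma errors
```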

Inverting $A^{\mathsf T} A$ can be hazardous, particularly if $A$ is a sparse matrix and/or close to singular. Some inversion methods will break down: they may give a formal solution, but one that is highly unstable to round-off error in the data. Remedy: solution via Singular Value Decomposition (SVD). From linear algebra theory: any $N \times M$ matrix $A$ (N observations, M parameters) can be decomposed as
$$ A = U\,W\,V^{\mathsf T} , $$
the product of an $N \times M$ column-orthogonal matrix $U$, an $M \times M$ diagonal matrix $W$ with positive or zero elements $w_k$ (the singular values), and the transpose of an $M \times M$ orthogonal matrix $V$.

Let the vectors $U_{(k)}$ denote the columns of $U$ (each one a vector of length N), and let the vectors $V_{(k)}$ denote the columns of $V$ (each one a vector of length M). It can be shown that the solution to the general linear model satisfies
$$ \hat{\boldsymbol{\theta}} = \sum_{k=1}^{M} \left( \frac{U_{(k)} \cdot \mathbf{b}}{w_k} \right) V_{(k)} . $$

Very small values of $w_k$ will amplify any round-off errors in $\mathbf{b}$. Solution: for these very small singular values, set $1/w_k = 0$. This suppresses their noisy contribution to the least-squares solution for the parameters. SVD acts as a noise filter – see Section 5
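A sketch of the SVD recipe above using numpy (the threshold rcond and the near-degenerate test matrix are illustrative choices, not from the slides):

```python
# Least-squares solution via SVD, zeroing out 1/w_k for tiny singular values.
import numpy as np

def svd_solve(A, b, rcond=1e-10):
    """Solve A theta ~ b via SVD, filtering singular values below threshold."""
    U, w, Vt = np.linalg.svd(A, full_matrices=False)     # A = U diag(w) V^T
    w_inv = np.where(w > rcond * w.max(), 1.0 / w, 0.0)  # set 1/w_k = 0
    # theta = sum_k ( (U_(k) . b) / w_k ) V_(k)
    return Vt.T @ (w_inv * (U.T @ b))

# Example with a nearly singular design matrix
A = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-13], [1.0, 1.0 - 1e-13]])
b = np.array([2.0, 2.0, 2.0])
print(svd_solve(A, b))   # stable answer; naive normal equations may not be
```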

Extensions and Generalisations
o Non-linear models? Suppose $y(x) = f(x; \boldsymbol{\theta})$, where $f$ depends non-linearly on the model parameters $\boldsymbol{\theta}$, and the errors are drawn from a pdf with mean zero and variance $\sigma_i^2$. Then
$$ \chi^2 = \sum_{i=1}^{N} \left( \frac{y_i - f(x_i; \boldsymbol{\theta})}{\sigma_i} \right)^2 . $$
But there is no simple analytic method to minimise this sum of squares (e.g. no analytic solutions to $\partial \chi^2 / \partial \theta_k = 0$). Methods of solution often involve a Taylor expansion of $\chi^2$ around the minimum, and solving by gradient descent. See e.g. Numerical Recipes 15.5 and Section 6
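An illustrative sketch of such an iterative minimisation, using scipy.optimize.least_squares (one standard implementation of these gradient-based strategies; the exponential model and data are mine, not from the slides):

```python
# Non-linear weighted least squares for y = amp * exp(-rate * x).
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)
x = np.linspace(0, 5, 80)
sigma = 0.05
y = 2.0 * np.exp(-0.7 * x) + rng.normal(scale=sigma, size=x.size)

def residuals(theta):
    amp, rate = theta
    return (y - amp * np.exp(-rate * x)) / sigma   # weighted residuals

fit = least_squares(residuals, x0=[1.0, 1.0])      # iterative minimisation
print(fit.x)                                       # close to (2.0, 0.7)
```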

Extensions and Generalisations
o Correlated errors? We need to define a covariance matrix $C$ for the errors, and generalise the merit function to $\chi^2 = (\mathbf{y} - A\,\boldsymbol{\theta})^{\mathsf T} C^{-1} (\mathbf{y} - A\,\boldsymbol{\theta})$. See e.g. Gregory, Chapter 10
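A minimal sketch of this generalised chi-squared (assuming numpy; the exponential correlation model for $C$ is an illustrative assumption), which is minimised by $\hat{\boldsymbol{\theta}} = (A^{\mathsf T} C^{-1} A)^{-1} A^{\mathsf T} C^{-1} \mathbf{y}$:

```python
# Linear least squares with correlated errors (generalised least squares).
import numpy as np

rng = np.random.default_rng(6)
N = 50
x = np.linspace(0, 1, N)
A = np.column_stack([np.ones(N), x])              # model y = a + b*x

# Illustrative correlated error covariance (exponential correlation)
C = 0.04 * np.exp(-np.abs(x[:, None] - x[None, :]) / 0.1)
eps = rng.multivariate_normal(np.zeros(N), C)
y = 1.0 + 2.5 * x + eps

Cinv = np.linalg.inv(C)
theta_hat = np.linalg.solve(A.T @ Cinv @ A, A.T @ Cinv @ y)
print(theta_hat)   # should be close to (1.0, 2.5)
```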

Sivia, Chapter 3 gives a very clear discussion of least squares fitting within a Bayesian framework. In particular, it contrasts, for Gaussian residuals:
o known $\sigma$
o unknown $\sigma$ (marginalised over) → Student’s t
See also Section 3

The principle of maximum likelihood. Frequentist approach: a parameter $\theta$ is a fixed (but unknown) constant. From the actual data we can compute the likelihood, $L$ = probability of obtaining the observed data, given the value of the parameter $\theta$. Now define the likelihood function: the (infinite) family of curves generated by regarding $L$ as a function of $\theta$, for the data fixed.

Principle of Maximum Likelihood: a good estimator of $\theta$ maximises $L$ – i.e. satisfies $\partial L / \partial \theta = 0$ and $\partial^2 L / \partial \theta^2 < 0$. We set the parameter equal to the value that makes the actual data sample we did observe – out of all the possible random samples we could have observed – the most likely.
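A toy sketch of the principle (assuming numpy; the Gaussian-mean example is mine): scan $L(\theta)$ over a grid of parameter values for the observed data, and locate the maximum:

```python
# Maximum likelihood for the mean of a Gaussian with known sigma:
# the peak of L(theta) sits at the sample mean.
import numpy as np

rng = np.random.default_rng(7)
sigma = 1.0
data = rng.normal(3.0, sigma, size=25)            # true theta = 3.0

thetas = np.linspace(2.0, 4.0, 2001)
# log L(theta) = const - sum_i (x_i - theta)^2 / (2 sigma^2)
logL = -0.5 * ((data[None, :] - thetas[:, None])**2).sum(axis=1) / sigma**2

print(thetas[np.argmax(logL)], data.mean())       # agree to grid resolution
```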

Aside: the likelihood function has the same definition in Bayesian probability theory, but with a subtle difference in meaning and interpretation – there is no need to invoke the idea of an (infinite) ensemble of different samples.

[The accompanying slides show the likelihood evaluated for the observed data at a sequence of trial parameter values $\theta = \theta_1, \theta_2, \theta_3, \dots$, with the maximum attained at $\theta = \theta_{\mathrm{ML}}$.]

Least squares as maximum likelihood estimators. To see the maximum likelihood method in action, let’s consider again weighted least squares for the simple model $y_i = a + b x_i + \epsilon_i$. Let’s assume the pdf of the errors is a Gaussian,
$$ p(\epsilon_i) = \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left( -\frac{\epsilon_i^2}{2\sigma_i^2} \right) , $$
so that the likelihood is
$$ L(a, b) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{1}{2} \left( \frac{y_i - a - b x_i}{\sigma_i} \right)^2 \right] . $$

Question 5: How can we justify writing the likelihood as a product?
A Because the residuals are all equal to each other
B Because the residuals are all Gaussian
C Because the residuals are all positive
D Because the residuals are all independent

Answer: D. L is a product of 1-D Gaussians because we are assuming the $\epsilon_i$ are independent.

Substituting $\epsilon_i = y_i - a - b x_i$, the ML estimators of $a$ and $b$ satisfy $\partial L / \partial a = 0$ and $\partial L / \partial b = 0$. But maximising $L$ is equivalent to maximising $\ln L$. Here
$$ \ln L = \text{const} - \frac{1}{2} \sum_{i=1}^{N} \left( \frac{y_i - a - b x_i}{\sigma_i} \right)^2 = \text{const} - \tfrac{1}{2}\chi^2 . $$
This is exactly the same sum of squares we defined earlier. So in this case maximising L is exactly equivalent to minimising the sum of squares: i.e. for Gaussian, independent errors, ML and weighted LS estimators are identical.
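A numerical check of this equivalence (assuming numpy and scipy; the data and names are illustrative): minimising $-\ln L$ directly reproduces the closed-form weighted LS estimates:

```python
# For Gaussian, independent errors, ML and weighted LS give the same (a, b).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(8)
x = np.linspace(0, 1, 30)
sigma = rng.uniform(0.1, 0.4, size=x.size)
y = 1.0 + 2.5 * x + rng.normal(scale=sigma)

def neg_log_L(theta):
    a, b = theta
    r = (y - a - b * x) / sigma
    return 0.5 * np.sum(r**2)          # -ln L up to an additive constant

ml = minimize(neg_log_L, x0=[0.0, 0.0]).x

# Closed-form weighted LS (same sums as earlier in the section)
w = 1.0 / sigma**2
S, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
Sxx, Sxy = (w * x**2).sum(), (w * x * y).sum()
Delta = S * Sxx - Sx**2
ls = np.array([(Sxx * Sy - Sx * Sxy) / Delta, (S * Sxy - Sx * Sy) / Delta])

print(ml, ls)                          # identical to numerical precision
```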