Statistics (2): Fitting. Roger Barlow, Manchester University. IDPASC school, Sesimbra, 14th December 2010

Summary: Fitting, Estimation, Consistency, Bias, Efficiency, Invariance, Likelihood, ML Estimation, Goodness of fit, Least Squares, χ², Fitting techniques comparison, Orthogonal Polynomials, Wilks' Theorem, MVB, Comparing families

Slide 3: Fitting and Estimation. A data sample {x1, x2, x3, ...} confronts a theory, the pdf P(x; a) (a may be multidimensional). An estimator â(x1, x2, x3, ...) is a process returning a value for a. A 'good' estimator is: consistent, unbiassed, invariant, efficient. Explanations follow. Introduce (again) the likelihood L(x1, x2, x3, ...; a) = P(x1; a) P(x2; a) P(x3; a) ... These qualities depend on the estimator and on the particular pdf.
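
As an illustration of how the likelihood is used in practice, here is a minimal sketch (not from the lecture), assuming a Gaussian pdf with unknown mean a and known width, and using numpy: ln L is built as the sum of ln P(xi; a) over the sample and scanned over trial values of a.

```python
import numpy as np

def log_likelihood(a, x, sigma=1.0):
    """ln L(a) = sum_i ln P(x_i; a) for a Gaussian pdf of known width sigma."""
    return np.sum(-0.5 * ((x - a) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi)))

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=100)          # the data sample {x1, x2, x3, ...}

a_grid = np.linspace(1.0, 3.0, 201)                   # trial values of the parameter a
lnL = np.array([log_likelihood(a, x) for a in a_grid])
print("a maximising ln L:", a_grid[np.argmax(lnL)])   # close to the sample mean
```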

Slide 4: Consistency. Introduce the expectation value <f> = ∫∫∫... f(x1, x2, x3, ...) L(x1, x2, x3, ...; a) dx1 dx2 dx3 ..., integrating over the space of results but not over a. It is the average you would get from a large number of samples. Analogous to Quantum Mechanics. Consistency requires lim N→∞ <â> = a, i.e. given more and more data, the estimator will tend to the right answer. This is normally quite easy to establish.

Slide 5: Bias. Require <â> = a (even for finite sample sizes). If a bias is known, it can be corrected for. Standard example: estimate the mean and variance of a pdf from a data sample using m = (1/N) Σ xi and V = (1/N) Σ (xi − m)². This tends to underestimate V. Correct by the factor N/(N−1).
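
A quick numerical check of this bias (my sketch, assuming numpy and made-up Gaussian data): the 1/N estimator of V comes out low by the factor (N−1)/N, and multiplying by N/(N−1) removes the bias.

```python
import numpy as np

rng = np.random.default_rng(2)
N, trials = 5, 200_000
samples = rng.normal(0.0, 1.0, size=(trials, N))     # true variance V = 1

m = samples.mean(axis=1, keepdims=True)              # sample mean of each trial
v_naive = ((samples - m) ** 2).mean(axis=1)          # divide by N
v_corrected = v_naive * N / (N - 1)                  # divide by N - 1 instead

print(v_naive.mean())      # ~0.8: biased low by (N-1)/N
print(v_corrected.mean())  # ~1.0: unbiased
```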

Slide 6: Invariance. It is desirable to have a procedure which is transparent to the form of a, i.e. one that need not worry about the difference between estimating a and estimating a function of it (e.g. a² or √a). This is incompatible with unbiassedness: the well known formula (previous slide) is unbiassed for V but biassed for σ.

Slide 7: Efficiency. Minimise <(â − a)²>, the spread of results of your estimator about the true value. Remarkable fact: there is a limit on this (the Minimum Variance Bound, or Cramér-Rao bound); for an unbiassed estimator, V(â) ≥ 1 / <(d ln L/da)²>.

Slide 8: Some examples. Repeated Gaussian measurements of the same quantity: the sample mean has zero bias and variance σ²/N, which coincides with the MVB, so it is an efficient estimator.

Slide 9: More examples. Centre of a top hat function: ½(max + min) is more efficient than the mean. Several Gaussian measurements with different σ: weight each measurement by 1/σ², normalised by the sum of the weights. But don't weight Poisson measurements by their value...
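
A sketch of the inverse-variance weighted mean with illustrative made-up measurements; numpy is assumed, and the error formula is the standard one for combining independent Gaussian measurements.

```python
import numpy as np

x     = np.array([10.2, 9.8, 10.5])    # measurements of the same quantity
sigma = np.array([0.1, 0.2, 0.4])      # their Gaussian errors

w = 1.0 / sigma**2                     # weight each measurement by 1/sigma^2
mean = np.sum(w * x) / np.sum(w)       # normalised by the sum of weights
error = 1.0 / np.sqrt(np.sum(w))       # error on the weighted mean
print(mean, "+/-", error)
```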

Slide 10: Maximum Likelihood. Estimate a by choosing the value which maximises L(x1, x2, x3, ...; a) or, for convenience, ln L = Σ ln P(xi; a). Consistency: yes. Bias-free: no. Invariance: yes. Efficiency: yes, in the large-N limit. This is a technique, but not the only one. Use it by algebra in simple cases or numerically in tougher ones.

Slide 11: Numerical ML. Adjust a to maximise ln L. If you have a form for d ln L/da, that helps a lot. Use MINUIT or ROOT or ..., especially if a is multidimensional.
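
A sketch of a numerical ML fit using scipy.optimize.minimize as a stand-in for MINUIT or ROOT (scipy is my choice, not the lecture's), estimating the lifetime of an exponential decay by minimising −ln L; the data are generated on the spot for illustration.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
t = rng.exponential(scale=2.0, size=1000)    # decay times, true lifetime tau = 2

def neg_log_L(params, t):
    """-ln L for P(t; tau) = (1/tau) exp(-t/tau)."""
    tau = params[0]
    if tau <= 0:
        return np.inf                        # keep the minimiser in the physical region
    return -np.sum(-np.log(tau) - t / tau)

result = minimize(neg_log_L, x0=[1.0], args=(t,), method="Nelder-Mead")
print("fitted tau:", result.x[0])            # close to the sample mean of t
```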

Slide 12: Algebraic ML. Maximising requires Σ d ln P(xi; a)/da = 0. This leads to fractions with no nice solution, unless P is exponential. Given a set of xi, measured yi, and predictions f(xi) subject to Gaussian smearing, maximum likelihood means minimising χ² = Σ (yi − f(xi))² / σi². Classic example: the straight line fit f(x) = mx + c.

Slide 13: The Normal Equations. If f is a linear function of a1, a2, a3, ..., aM, i.e. fi = f(xi) = Σj aj gj(xi), then Maximum Likelihood = Minimum χ², and setting dχ²/daj = 0 gives a set of linear equations with matrix elements Σi gj(xi) gk(xi)/σi². Solve for the coefficients aj by inverting the matrix.
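
A sketch of the normal equations in code, assuming numpy and illustrative data: the basis functions are g0 = 1 and g1 = x (the straight-line case from the previous slide), and the coefficients come from solving the matrix equation rather than inverting it explicitly.

```python
import numpy as np

# Data points with Gaussian errors; fit f(x) = a0*g0(x) + a1*g1(x), with g0 = 1, g1 = x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
sigma = np.full_like(y, 0.2)

G = np.vstack([np.ones_like(x), x]).T     # G[i, j] = g_j(x_i)
W = 1.0 / sigma**2                        # weights 1/sigma_i^2

# Normal equations: (G^T W G) a = G^T W y
A = (G * W[:, None]).T @ G                # matrix of sums g_j g_k / sigma^2
b = (G * W[:, None]).T @ y
a = np.linalg.solve(A, b)
print("c, m =", a)                        # intercept and slope of f(x) = c + m x
```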

Slide 14: Orthogonal Polynomials. A good trick: construct the g(x) functions so that the matrix is diagonal. If fitting a polynomial up to the 5th power (say), one can use 1, x, x², x³, x⁴, x⁵ or 1, x, 2x²−1, 4x³−3x, 8x⁴−8x²+1, 16x⁵−20x³+5x, or whatever. Choose g0 = 1. Choose g1 = x − (Σx)/N, which makes Σ g0 g1 = 0. And so on iteratively: gr(x) = x^r + Σs crs gs(x), with crs = −Σi xi^r gs(xi) / Σi gs(xi)². These polynomials are orthogonal over a specific dataset.
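
A sketch of the iterative construction over a specific dataset, assuming numpy; the function name data_orthogonal_polys is mine.

```python
import numpy as np

def data_orthogonal_polys(x, order):
    """Return arrays g_s(x_i) for s = 0..order, orthogonal over the data points x."""
    gs = [np.ones_like(x)]                            # g_0 = 1
    for r in range(1, order + 1):
        g = x ** r                                    # start from x^r
        for g_s in gs:
            c_rs = -np.sum(x**r * g_s) / np.sum(g_s**2)
            g = g + c_rs * g_s                        # subtract the projection on g_s
        gs.append(g)
    return gs

x = np.linspace(0.0, 1.0, 11)
gs = data_orthogonal_polys(x, 3)
print(np.sum(gs[1] * gs[2]))                          # ~0: orthogonal over this dataset
```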

Slide 15: Fitting histograms. Raw data {x1, x2, x3, ..., xN} are often sorted into bins {n1, n2, n3, ..., nm}. The number of entries in a bin is Poisson distributed. One forms χ² = Σ (ni − fi)²/fi, or replaces the denominator by the observed ni; that last form is sometimes used as a definition of χ², though it is really only an approximation. Fit a function to the histogram by minimising χ².

Slide 16: 4 Techniques. 1) Minimise the naïve χ² (observed ni in the denominator). Computationally easy, as the problem is linear. 2) Minimise the full χ² (predicted fi in the denominator). Slower, as the problem is nonlinear due to the terms in the denominator. 3) Binned Maximum Likelihood: write the Poisson probability for each bin, e^(−fi) fi^ni / ni!, and maximise the sum of the logs. 4) Full maximum likelihood without binning.
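
A sketch contrasting techniques 1 and 3 on a small binned sample, assuming numpy and scipy (my choices) and a Gaussian shape; the naïve χ² here simply drops empty bins, which is part of why it misbehaves for small samples (next slides).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
data = rng.normal(5.0, 1.0, size=100)                    # a small sample
counts, edges = np.histogram(data, bins=20, range=(0.0, 10.0))
centres = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]

def expected(params):
    """Predicted bin contents f_i for a Gaussian of mean mu, width sig, total ntot."""
    mu, sig, ntot = params
    return ntot * width * np.exp(-0.5 * ((centres - mu) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))

def neg_binned_lnL(params):                              # technique 3: Poisson per bin
    if params[1] <= 0 or params[2] <= 0:
        return np.inf
    f = np.maximum(expected(params), 1e-30)              # protect the log
    return -np.sum(counts * np.log(f) - f)               # ln(n_i!) dropped: constant in the fit

def naive_chi2(params):                                  # technique 1: (n - f)^2 / n
    if params[1] <= 0 or params[2] <= 0:
        return np.inf
    f = expected(params)
    ok = counts > 0                                      # empty bins have to be dropped
    return np.sum((counts[ok] - f[ok]) ** 2 / counts[ok])

for objective in (neg_binned_lnL, naive_chi2):
    fit = minimize(objective, x0=[4.5, 1.5, 100.0], method="Nelder-Mead")
    print(objective.__name__, fit.x)
```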

Slide 17: Consumer test. Fit a test function, trying many times with 10,000 events. All methods give the same results.

Slide 18: Histogram fitting (contd). With a small sample (100 events): the simple χ² goes bad due to bins with zeros, and the full χ² is not good because the Poisson distribution is not Gaussian. The two ML methods are OK.

Slide 19: Goodness of fit. Each term in the χ² sum is clearly of order 1. The full treatment, integrating the multi-dimensional Gaussian, gives the χ² distribution P(χ²; N). Its mean is indeed N; the shapes vary with N. If the fit is bad, χ² is large. The tail integral of P(χ²; N) above the observed value is a p-value, often called the 'χ² probability'.
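
For the tail integral, a one-line sketch using scipy.stats (my choice of tool), with illustrative numbers.

```python
from scipy.stats import chi2

chisq_observed, ndof = 25.0, 18
p_value = chi2.sf(chisq_observed, ndof)   # integral of P(chi^2; N) above the observed value
print(p_value)                            # the "chi^2 probability"
```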

Slide 20: Goodness of fit (contd). A large χ² >> N (low p-value) means: the theory is wrong, or the data are wrong, or the errors are wrong, or you are unlucky. A small χ² << N (p-value ~ 1) means: the errors are wrong, or you are lucky. An exact χ² = N means the errors have been calculated from this test, and it says nothing about goodness of fit. If you histogram the p-values from many cases (e.g. kinematic fits), the distribution should be flat. This is obvious if you think about it in the right way.

Slide 21: Nice extra feature. If one (or more) of the parameters in the function has been fitted to the data, this improves the χ² by an amount which corresponds to one fewer data point. Hence the number of 'degrees of freedom' is ND = N − NP.

Slide 22: Likelihood and Goodness of fit. No test available, sorry.

Slide 23: Likelihood and Goodness of fit!!! Take a 'Toy Monte Carlo' which simulates your data many times; fit each toy dataset and find its likelihood. Use this distribution to obtain a p-value for your likelihood. This is not in most books as it is computationally inefficient. But who cares these days?
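
A sketch of this toy Monte Carlo procedure, assuming numpy and a deliberately simple model (Gaussian with known unit width, so the ML fit is just the sample mean); in a real analysis the fit would be whatever your likelihood fit is.

```python
import numpy as np

rng = np.random.default_rng(5)

def fitted_lnL(sample):
    """Fit the Gaussian mean by ML (sigma = 1 known) and return ln L at the maximum."""
    mu_hat = sample.mean()
    return np.sum(-0.5 * (sample - mu_hat) ** 2 - 0.5 * np.log(2 * np.pi))

data = rng.normal(0.0, 1.0, size=50)           # stand-in for the real data
lnL_data = fitted_lnL(data)

# Toy Monte Carlo: simulate the experiment many times under the fitted model
toys = np.array([fitted_lnL(rng.normal(data.mean(), 1.0, size=50)) for _ in range(5000)])
p_value = np.mean(toys < lnL_data)             # fraction of toys with a worse (lower) ln L
print(p_value)
```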

Slide 24: Wilks' Theorem. It is often stated that −2 Δ ln L behaves like a Δχ². Strictly, this relates to the change in likelihood caused by an extra term in the model, and is valid for relative comparisons within families. E.g. fit data to a straight line: the χ² is sort of OK. Fit using a parabola: the χ² improves. If this improvement is >> 1, the parabola is doing a better job; if it is only ~1, there is no reason to use it.
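
A sketch of the comparison within a family, assuming numpy and made-up straight-line data: fit the same points with a straight line and with a parabola and look at the χ² improvement from the extra term.

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(0.0, 1.0, 20)
sigma = 0.1
y = 1.0 + 2.0 * x + rng.normal(0.0, sigma, size=x.size)   # data generated from a straight line

def chi2_of_poly(degree):
    coeffs = np.polyfit(x, y, degree)                      # least-squares polynomial fit
    residuals = y - np.polyval(coeffs, x)
    return np.sum((residuals / sigma) ** 2)

improvement = chi2_of_poly(1) - chi2_of_poly(2)            # gain from the extra x^2 term
print(improvement)   # typically ~1 here, so no case for the parabola
```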

Slide 25: GoF comparisons and likelihood. Wilks' theorem lets you compare the merit of adding a further term to your parametrisation: it provides a yardstick for whether an improved likelihood is significant. It does not report absolute merit, as χ² does. Caution! It is not to be used for adding bumps at arbitrary positions in the data.

Summary: Fitting, Estimation, Consistency, Bias, Efficiency, Invariance, Likelihood, ML Estimation, Goodness of fit, Least Squares, χ², Fitting techniques comparison, Orthogonal Polynomials, Wilks' Theorem, MVB, Comparing families