
χ² and Goodness of Fit & Likelihood for Parameters
Louis Lyons, Imperial College and Oxford
CERN Summer Students, July 2014

Example of χ²: least-squares straight-line fitting
Data: {x_i, y_i ± σ_i}. Theory: y = a + bx.
Statistical issues:
1) Is the data consistent with a straight line? (Goodness of Fit)
2) What are the gradient and intercept, and their errors and correlation? (Parameter Determination)
We deal with issue 2) first.
N.B. 1: The method can be used for other functional forms, e.g.
y = a + b/x + c/x² + …
y = a + b sin θ + c sin 2θ + d sin 3θ + …
y = a exp(−bx)
N.B. 2: Least squares is not the only method.
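A minimal numerical sketch of the fit this slide sets up (the data values and function names are illustrative, not from the lecture): minimising S = Σ((y_i − a − b x_i)/σ_i)² has a closed-form solution, and the covariance matrix follows from inverting ½ ∂²S/∂p_i∂p_j (next slide).

```python
# Sketch: weighted least-squares fit of y = a + b*x (toy data, Gaussian errors).
import numpy as np

def fit_line(x, y, sigma):
    """Minimise S = sum_i ((y_i - a - b*x_i)/sigma_i)^2 analytically."""
    w = 1.0 / sigma**2
    Sw, Sx, Sy = w.sum(), (w * x).sum(), (w * y).sum()
    Sxx, Sxy = (w * x * x).sum(), (w * x * y).sum()
    D = Sw * Sxx - Sx**2
    a = (Sxx * Sy - Sx * Sxy) / D            # intercept
    b = (Sw * Sxy - Sx * Sy) / D             # gradient
    # covariance matrix = inverse of (1/2) * Hessian of S
    var_a, var_b, cov_ab = Sxx / D, Sw / D, -Sx / D
    return a, b, np.sqrt(var_a), np.sqrt(var_b), cov_ab

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
sigma = np.full(5, 0.1)
a, b, err_a, err_b, cov_ab = fit_line(x, y, sigma)   # a = 0.04, b = 1.0 here
S_min = (((y - a - b * x) / sigma)**2).sum()
```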

Minimise S with respect to the parameters a and b, where
S(a, b) = Σ_i (y_i − a − b x_i)² / σ_i².
[Figure: a line through the data points that is not the best straight-line fit.]

Errors on parameter(s)
In the parabolic approximation, σ_μ is given by S(μ_best ± σ_μ) = S_min + 1, or equivalently
σ_μ = 1 / √(½ d²S/dμ²)   (mnemonic).
With more than one parameter, replace S(μ) by S(μ₁, μ₂, μ₃, …); the covariance matrix E is given by
(E⁻¹)_ij = ½ ∂²S/∂μ_i ∂μ_j.
[Figure: the S = S_min + 1 contour in the (μ₁, μ₂) plane, whose extent gives σ₁, σ₂ and their correlation.]
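Continuing the sketch above, the S_min + 1 prescription can be checked by a direct scan. Note this scans the gradient b with the intercept a frozen at its best value, i.e. a conditional error; profiling over a at each b would give the full multi-parameter error.

```python
# Sketch: 1-sigma error on b from the S = S_min + 1 rule (conditional scan).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
sigma = np.full(5, 0.1)
a = 0.04                                   # best-fit intercept from the sketch above

def S(b):
    return (((y - a - b * x) / sigma)**2).sum()

bs = np.linspace(0.9, 1.1, 2001)
Ss = np.array([S(b) for b in bs])
b_best = bs[Ss.argmin()]                   # ~1.0
inside = bs[Ss <= Ss.min() + 1.0]          # region where S <= S_min + 1
sigma_b = 0.5 * (inside[-1] - inside[0])   # half-width ~ 1/sqrt(sum x^2/sigma^2)
```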

Summary of straight-line fitting
Plot the data; check for bad points.
Parameter Determination:
Estimate a and b (and their errors): a and b from the formula, plus errors on a and b.
Compare the calculated values with those estimated from the plot.
Goodness of Fit:
Determine S_min (using the fitted a and b); ν = n − p.
Look up in χ² tables (asymptotically S_min follows a χ² distribution with ν degrees of freedom).
If the probability is too small, IGNORE RESULTS.
If the probability is a "bit" small, scale the errors?
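A one-line version of the "look up in χ² tables" step, using scipy's survival function instead of printed tables; the S_min = 90, ν = 99 numbers are borrowed from the PARADOX slide below.

```python
# Sketch: chi^2 probability of a fit, given S_min and nu = n - p.
from scipy import stats

S_min, nu = 90.0, 99
p_value = stats.chi2.sf(S_min, nu)   # P(chi^2 >= S_min); here ~0.73, a fine fit
```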

Properties of the χ² distribution

Properties of the χ² distribution, contd.
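For reference, the standard properties these slides presumably summarise (they are what the "expected χ² = 99 ± 14" on the PARADOX slide uses):

```latex
f(S;\nu) \;=\; \frac{S^{\nu/2-1}\, e^{-S/2}}{2^{\nu/2}\,\Gamma(\nu/2)},
\qquad \langle S \rangle = \nu, \qquad \operatorname{Var}(S) = 2\nu .
```

For large ν, S is approximately Gaussian with mean ν and width √(2ν); for ν = 99 that gives 99 ± 14.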

Goodness of Fit: methods
χ²: very general, but needs binning, and is not sensitive to the sign of the deviations.
Run test: not sensitive to the magnitude of the deviations.
Kolmogorov-Smirnov.
Aslan-Zech ('energy' test).
Review: Mike Williams, "How good are your fits? Unbinned multivariate goodness-of-fit tests in high energy physics"
Book: D'Agostino and Stephens, "Goodness of Fit Techniques"

Goodness of Fit: Kolmogorov-Smirnov
Compares the cumulative distributions of data and model.
Uses the largest discrepancy between the two distributions.
Model can be analytic or an MC sample.
Uses individual data points.
Not so sensitive to deviations in the tails (so variants of K-S exist).
Not readily extendible to more dimensions.
Distribution-free conversion of the statistic to a p-value; depends on n (but not when free parameters are involved: then it needs MC).
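A sketch of a K-S comparison with scipy; the Gaussian model and the sample size are illustrative only.

```python
# Sketch: Kolmogorov-Smirnov test of data against an analytic model (unit Gaussian).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=200)
D, p = stats.kstest(data, "norm")   # D = largest discrepancy between the CDFs
# Caveat from the slide: if model parameters were fitted to this same data,
# the distribution-free p-value is invalid and must be recalibrated with MC.
```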

Goodness of fit: 'energy' test
Assign +ve charge to data, −ve charge to MC.
Calculate the 'electrostatic energy' E of the charges.
If the distributions agree, E ≈ 0; if they don't overlap, E is large and positive.
Assess the significance of the magnitude of E by MC.
N.B.
1) Works in many dimensions.
2) Needs a metric for each variable (make variances similar?).
3) E ~ Σ q_i q_j f(Δr = |r_i − r_j|), with f = 1/(Δr + ε) or −ln(Δr + ε); performance is insensitive to the choice of small ε.
See Aslan and Zech's paper at:
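A sketch of the energy statistic in one dimension, under stated assumptions: f = −ln(Δr + ε), and charges normalised by sample size (a common convention, not specified on the slide).

```python
# Sketch: Aslan-Zech 'energy' between a data sample and an MC sample (1-D).
import numpy as np

def energy(data, mc, eps=1e-3):
    pts = np.concatenate([data, mc])
    q = np.concatenate([np.full(len(data), 1.0 / len(data)),   # +ve for data
                        np.full(len(mc), -1.0 / len(mc))])     # -ve for MC
    f = -np.log(np.abs(pts[:, None] - pts[None, :]) + eps)     # f(dr) = -ln(dr + eps)
    i, j = np.triu_indices(len(pts), k=1)                      # each pair once
    return (q[i] * q[j] * f[i, j]).sum()

# Significance: compare the observed E with its distribution under random
# permutations of the combined sample (Monte Carlo), as the slide says.
```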

PARADOX
Histogram with 100 bins; fit with 1 parameter p.
S_min: χ² with NDF = 99 (expected χ² = 99 ± 14).
For our data, S_min(p₀) = 90.
Is p₂ acceptable if S(p₂) = 115?
1) YES: 115 for 99 degrees of freedom is a very acceptable χ² probability.
2) NO: σ_p is given by S(p₀ + σ_p) = S_min + 1 = 91; but S(p₂) − S(p₀) = 25, so p₂ is 5σ away from the best value.
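Both answers can be computed directly; the numbers are the slide's, the scipy call is mine.

```python
# Sketch: the two faces of the paradox.
from scipy import stats

p_gof = stats.chi2.sf(115.0, 99)   # ~0.13: S(p2) = 115 is an acceptable chi^2
n_sigma = (115.0 - 90.0)**0.5      # sqrt(Delta S) = 5: p2 is 5 sigma from p0
```

Both are correct, because they answer different questions: goodness of fit asks whether p₂ describes the data acceptably in absolute terms, while parameter determination asks how p₂ compares with the best value p₀.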


Likelihoods for determining parameters
What it is
How it works: resonance
Error estimates
Detailed example: lifetime
Several parameters
Do's and Don'ts with L

Simple example: angular distribution
y(cos θ) = N (1 + β cos²θ)
y_i = N (1 + β cos²θ_i) = probability density of observing θ_i, given β.
L(β) = Π y_i = probability density of observing the data set, given β.
Best estimate of β is the value that maximises L.
Values of β for which L is very small are ruled out.
Precision of the estimate of β comes from the width of the L distribution.
(Information about the parameter β comes from the shape of the experimental distribution of cos θ.)
CRUCIAL to normalise y: N = 1/{2(1 + β/3)}.
[Figure: distributions in cos θ for different β, e.g. β = −1, illustrating which values give a large L.]
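A sketch of this likelihood with the normalisation built in; the toy data are drawn flat (i.e. from β = 0), purely for illustration.

```python
# Sketch: unbinned likelihood for y(cos theta) = N (1 + beta cos^2 theta),
# with N = 1 / (2 (1 + beta/3)) so that y integrates to 1 over -1 < cos theta < 1.
import numpy as np

def neg_log_L(beta, costh):
    N = 1.0 / (2.0 * (1.0 + beta / 3.0))
    return -np.log(N * (1.0 + beta * costh**2)).sum()

rng = np.random.default_rng(2)
costh = rng.uniform(-1.0, 1.0, 500)        # toy data, flat (beta = 0)
betas = np.linspace(-0.9, 3.0, 500)        # need beta > -1 for a positive pdf
nll = np.array([neg_log_L(b, costh) for b in betas])
beta_hat = betas[nll.argmin()]             # ML estimate, ~0 up to fluctuations
```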

How it works: resonance
y(m) ~ (Γ/2) / ((m − M₀)² + (Γ/2)²)
[Figures: the resonance shape as a function of m, varying M₀ and varying Γ.]
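A sketch of the corresponding pdf and log-likelihood, assuming the Breit-Wigner is truncated to a mass window [m_lo, m_hi] so that it normalises; the window edges are illustrative.

```python
# Sketch: normalised (truncated) Breit-Wigner and its log-likelihood.
import numpy as np

def bw_pdf(m, M0, Gamma, m_lo=0.0, m_hi=2.0):
    shape = (Gamma / 2) / ((m - M0)**2 + (Gamma / 2)**2)
    # analytic integral of 'shape' over the window, so the pdf integrates to 1
    norm = (np.arctan((m_hi - M0) / (Gamma / 2))
            - np.arctan((m_lo - M0) / (Gamma / 2)))
    return shape / norm

def lnL(M0, Gamma, m):
    return np.log(bw_pdf(m, M0, Gamma)).sum()

# "Vary M0": scan lnL over M0 at fixed Gamma; "Vary Gamma": scan the other way.
```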

Conventional to consider l = ln(L) = Σ ln(p_i).
If L is Gaussian, l is parabolic.

Maximum likelihood error
Range of likely values of a parameter μ comes from the width of the L or l distributions.
If L(μ) is Gaussian, the following definitions of σ are equivalent:
1) RMS of L(μ)
2) σ = 1/√(−d² ln L / dμ²)   (mnemonic)
3) ln L(μ₀ ± σ) = ln L(μ₀) − 1/2
If L(μ) is non-Gaussian, these are no longer the same.
"Procedure 3) above still gives an interval that contains the true value of the parameter μ with 68% probability."
Errors from 3) are usually asymmetric, and asymmetric errors are messy. So choose the parameter sensibly, e.g. 1/p rather than p; τ or λ.
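A sketch of definition 3) in a genuinely non-Gaussian case: an exponential lifetime fit with few events, where the two halves of the interval come out unequal. The toy truth τ = 2 and the sample size are arbitrary choices.

```python
# Sketch: asymmetric errors from the Delta(ln L) = 1/2 rule for a lifetime tau.
import numpy as np

rng = np.random.default_rng(3)
t = rng.exponential(2.0, size=20)          # 20 toy decay times
taus = np.linspace(0.5, 8.0, 4000)
lnL = -len(t) * np.log(taus) - t.sum() / taus   # sum of ln[(1/tau) exp(-t/tau)]
tau_hat = taus[lnL.argmax()]               # MLE = sample mean of t
inside = taus[lnL >= lnL.max() - 0.5]      # Delta(ln L) <= 1/2 region
err_lo = tau_hat - inside[0]               # lower error
err_hi = inside[-1] - tau_hat              # upper error, larger than err_lo
```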


Comparison of methods:

                 Moments                    Max Likelihood                      Least squares
Easy?            Yes, if …                  Normalisation, maximisation messy   Minimisation
Efficient?       Not very                   Usually best                        Sometimes = Max Like
Input            Separate events            Separate events                     Histogram
Goodness of fit  Messy                      No (unbinned)                       Easy
Constraints      No                         Yes                                 Yes
N dimensions     Easy if …                  Norm., max. messier                 Easy
Weighted events  Easy                       Errors difficult                    Easy
Bgd subtraction  Easy                       Troublesome                         Easy
Error estimate   Observed spread,           (−∂²l/∂p_i∂p_j)⁻¹                   (½ ∂²S/∂p_i∂p_j)⁻¹
                 or analytic
Main feature     Easy                       Best for parameters                 Goodness of Fit

NORMALISATION FOR LIKELIHOOD
∫ P(data | param) d(data) MUST be independent of the parameter.
e.g. lifetime fit to t₁, t₂, …, t_n:
P(t|τ) = (1/τ) e^(−t/τ)   CORRECT
P(t|τ) = e^(−t/τ)   INCORRECT (missing the 1/τ)
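A toy illustration of why the missing 1/τ is fatal rather than cosmetic: without it, the "likelihood" has no maximum at all. Toy numbers only.

```python
# Sketch: correct vs incorrect normalisation in a lifetime likelihood.
import numpy as np

t = np.array([1.2, 0.4, 2.7, 0.9, 1.6])   # toy decay times
taus = np.linspace(0.5, 50.0, 1000)
lnL_wrong = -t.sum() / taus                          # sum of ln exp(-t_i/tau)
lnL_right = -len(t) * np.log(taus) - t.sum() / taus  # includes the 1/tau factors
# lnL_wrong rises monotonically with tau (no maximum);
# lnL_right peaks at tau = mean(t), the sensible answer.
```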

Δln L = −1/2 rule
If L(μ) is Gaussian, the following definitions of σ are equivalent:
1) RMS of L(μ)
2) σ = 1/√(−d² ln L / dμ²)
3) ln L(μ₀ ± σ) = ln L(μ₀) − 1/2
If L(μ) is non-Gaussian, these are no longer the same.
"Procedure 3) above still gives an interval that contains the true value of the parameter μ with 68% probability."
Heinrich: CDF note 6438 (see CDF Statistics Committee web page)
Barlow: Phystat05

COVERAGE
How often does the quoted range for a parameter include the parameter's true value?
N.B. Coverage is a property of the METHOD, not of a particular experimental result.
Coverage can vary with μ.
Study the coverage of different methods for a Poisson parameter μ, from the observation of the number of events n.
[Figure: coverage versus μ, showing the hoped-for nominal value, 100%, and an undercoverage region.]

Practical example of coverage
Poisson counting experiment: observed number of counts n, Poisson parameter μ, with P(n|μ) = e^(−μ) μⁿ/n!
Best estimate of μ is n.
Range for μ given by the Δln L = 0.5 rule. Coverage should be 68%.
What does the coverage C look like as a function of μ? (See the sketch below.)
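A sketch of that coverage study for the Δln L = 0.5 intervals; the grid range, toy counts and seed are arbitrary choices.

```python
# Sketch: coverage of the Delta(ln L) = 0.5 Poisson interval, by toy experiments.
import numpy as np

GRID = np.linspace(1e-3, 60.0, 12000)      # mu grid (avoids mu = 0 exactly)

def interval(n):
    lnL = n * np.log(GRID) - GRID          # ln P(n|mu), up to the ln(n!) constant
    inside = GRID[lnL >= lnL.max() - 0.5]
    return inside[0], inside[-1]

def coverage(mu, n_toys=2000, seed=4):
    rng = np.random.default_rng(seed)
    hits = 0
    for n in rng.poisson(mu, n_toys):
        lo, hi = interval(n)
        hits += lo <= mu <= hi
    return hits / n_toys

# e.g. coverage(0.6) comes out around 1/3, far below the nominal 0.68:
# exactly the UNDERCOVERS message of the next slide.
```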

Coverage: L approach (not frequentist)
P(n|μ) = e^(−μ) μⁿ/n!   (Joel Heinrich, CDF note 6438)
Interval: −2 ln λ < 1, with λ = P(n|μ)/P(n|μ_best).
Result: UNDERCOVERS.

Unbinned L_max and Goodness of Fit?
Find parameters by maximising L.
So is a larger L better than a smaller L?
So does L_max give a measure of Goodness of Fit??
Check with the Monte Carlo distribution of the unbinned L_max.
[Figure: frequency distribution of L_max, annotated "Great? Good? Bad".]

Example: the cos θ pdf (and likelihood) depends only on cos²θ_i.
It is therefore insensitive to the sign of cos θ_i.
So the data can be in very bad agreement with the expected distribution, e.g. all data with cos θ < 0, and L_max does not know about it.
This is an example of a general principle.
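A short demonstration of this sign-blindness, reusing the angular-distribution likelihood from earlier; the toy data are chosen to be maximally "wrong" (all at cos θ < 0).

```python
# Sketch: flipping every sign of cos(theta) changes the data badly,
# but leaves the likelihood (and hence L_max) exactly unchanged.
import numpy as np

costh = np.array([-0.9, -0.5, -0.2, -0.7, -0.1])      # all negative
beta = 1.0
N = 1.0 / (2.0 * (1.0 + beta / 3.0))
lnL = np.log(N * (1.0 + beta * costh**2)).sum()
lnL_flipped = np.log(N * (1.0 + beta * (-costh)**2)).sum()
assert np.isclose(lnL, lnL_flipped)                    # identical
```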

Conclusions re likelihoods
How it works, and how to estimate errors.
The Δ(ln L) = 0.5 rule, and coverage.
Several parameters.
Likelihood does not guarantee coverage.
L_max and Goodness of Fit.
Do the lifetime and coverage problems on the question sheet.

Next (last) time: comparing data with 2 hypotheses
H0 = background only (no New Physics)
H1 = background + signal (exciting New Physics)
Specific example: discovery of the Higgs