χ2 and Goodness of Fit & Likelihood for Parameters

Slides:



Advertisements
Similar presentations
The Maximum Likelihood Method
Advertisements

DO’S AND DONT’S WITH LIKELIHOODS Louis Lyons Oxford (CDF)
Probability and Statistics Basic concepts II (from a physicist point of view) Benoit CLEMENT – Université J. Fourier / LPSC
1 LIMITS Why limits? Methods for upper limits Desirable properties Dealing with systematics Feldman-Cousins Recommendations.
1 χ 2 and Goodness of Fit Louis Lyons IC and Oxford SLAC Lecture 2’, Sept 2008.
1 Do’s and Dont’s with L ikelihoods Louis Lyons Oxford and IC CDF IC, February 2008.
1 χ 2 and Goodness of Fit Louis Lyons Oxford Mexico, November 2006 Lecture 3.
1 Practical Statistics for Physicists LBL January 2008 Louis Lyons Oxford
8. Statistical tests 8.1 Hypotheses K. Desch – Statistical methods of data analysis SS10 Frequent problem: Decision making based on statistical information.
1 Do’s and Dont’s with L ikelihoods Louis Lyons Oxford CDF Mexico, November 2006.
Slide 1 Statistics for HEP Roger Barlow Manchester University Lecture 3: Estimation.
1 Practical Statistics for Physicists Dresden March 2010 Louis Lyons Imperial College and Oxford CDF experiment at FNAL CMS expt at LHC
1 Do’s and Dont’s with L ikelihoods Louis Lyons Oxford CDF CERN, October 2006.
1 Do’s and Dont’s with L ikelihoods Louis Lyons Oxford CDF Manchester, 16 th November 2005.
Chi Square Distribution (c2) and Least Squares Fitting
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
Rao-Cramer-Frechet (RCF) bound of minimum variance (w/o proof) Variance of an estimator of single parameter is limited as: is called “efficient” when the.
1 Statistical Inference Problems in High Energy Physics and Astronomy Louis Lyons Particle Physics, Oxford BIRS Workshop Banff.
G. Cowan Lectures on Statistical Data Analysis 1 Statistical Data Analysis: Lecture 7 1Probability, Bayes’ theorem, random variables, pdfs 2Functions of.
1 Practical Statistics for Physicists Stockholm Lectures Sept 2008 Louis Lyons Imperial College and Oxford CDF experiment at FNAL CMS expt at LHC
Lecture II-2: Probability Review
Physics 114: Lecture 15 Probability Tests & Linear Fitting Dale E. Gary NJIT Physics Department.
880.P20 Winter 2006 Richard Kass 1 Confidence Intervals and Upper Limits Confidence intervals (CI) are related to confidence limits (CL). To calculate.
Probability theory 2 Tron Anders Moger September 13th 2006.
880.P20 Winter 2006 Richard Kass 1 Maximum Likelihood Method (MLM) Does this procedure make sense? The MLM answers this question and provides a method.
G. Cowan 2009 CERN Summer Student Lectures on Statistics1 Introduction to Statistics − Day 4 Lecture 1 Probability Random variables, probability densities,
G. Cowan Lectures on Statistical Data Analysis Lecture 3 page 1 Lecture 3 1 Probability (90 min.) Definition, Bayes’ theorem, probability densities and.
1 Statistical Distribution Fitting Dr. Jason Merrick.
1 Methods of Experimental Particle Physics Alexei Safonov Lecture #23.
G. Cowan Lectures on Statistical Data Analysis Lecture 1 page 1 Lectures on Statistical Data Analysis London Postgraduate Lectures on Particle Physics;
1 Practical Statistics for Physicists CERN Latin American School March 2015 Louis Lyons Imperial College and Oxford CMS expt at LHC
1 Introduction to Statistics − Day 4 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Lecture 2 Brief catalogue of probability.
1 Methods of Experimental Particle Physics Alexei Safonov Lecture #24.
G. Cowan Lectures on Statistical Data Analysis Lecture 8 page 1 Statistical Data Analysis: Lecture 8 1Probability, Bayes’ theorem 2Random variables and.
1 Introduction to Statistics − Day 3 Glen Cowan Lecture 1 Probability Random variables, probability densities, etc. Brief catalogue of probability densities.
G. Cowan Computing and Statistical Data Analysis / Stat 9 1 Computing and Statistical Data Analysis Stat 9: Parameter Estimation, Limits London Postgraduate.
G. Cowan Lectures on Statistical Data Analysis Lecture 5 page 1 Statistical Data Analysis: Lecture 5 1Probability, Bayes’ theorem 2Random variables and.
1 χ 2 and Goodness of Fit & L ikelihood for Parameters Louis Lyons Imperial College and Oxford CERN Summer Students July 2014.
G. Cowan Lectures on Statistical Data Analysis Lecture 10 page 1 Statistical Data Analysis: Lecture 10 1Probability, Bayes’ theorem 2Random variables and.
R. Kass/W03 P416 Lecture 5 l Suppose we are trying to measure the true value of some quantity (x T ). u We make repeated measurements of this quantity.
Richard Kass/F02P416 Lecture 6 1 Lecture 6 Chi Square Distribution (  2 ) and Least Squares Fitting Chi Square Distribution (  2 ) (See Taylor Ch 8,
1 Practical Statistics for Physicists CERN Summer Students July 2014 Louis Lyons Imperial College and Oxford CMS expt at LHC
1 Multivariate Histogram Comparison S.I. BITYUKOV, V.V. SMIRNOVA (IHEP, Protvino), N.V. KRASNIKOV (INR RAS, Moscow) IML LHC Machine Learning WG Meeting.
Louis Lyons Imperial College
Practical Statistics for Physicists
The Maximum Likelihood Method
Physics 114: Lecture 13 Probability Tests & Linear Fitting
Lecture Nine - Twelve Tests of Significance.
Practical Statistics for Physicists
Reduction of Variables in Parameter Inference
BAYES and FREQUENTISM: The Return of an Old Controversy
Likelihoods 1) Introduction 2) Do’s & Dont’s
p-values and Discovery
Likelihoods 1) Introduction . 2) Do’s & Dont’s
The Maximum Likelihood Method
The Maximum Likelihood Method
Lecture 4 1 Probability (90 min.)
Georgi Iskrov, MBA, MPH, PhD Department of Social Medicine
Introduction to Instrumentation Engineering
Chi Square Distribution (c2) and Least Squares Fitting
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Lecture 3 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Lecture 4 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Computing and Statistical Data Analysis / Stat 7
#21 Marginalize vs. Condition Uninteresting Fitted Parameters
Introduction to Statistics − Day 4
CHAPTER – 1.2 UNCERTAINTIES IN MEASUREMENTS.
Lecture 4 1 Probability Definition, Bayes’ theorem, probability densities and their properties, catalogue of pdfs, Monte Carlo 2 Statistical tests general.
Statistics for HEP Roger Barlow Manchester University
CHAPTER – 1.2 UNCERTAINTIES IN MEASUREMENTS.
Presentation transcript:

χ2 and Goodness of Fit & Likelihood for Parameters Louis Lyons Imperial College and Oxford CERN Summer Students July 2015

Example of 2: Least squares straight line fitting y Data = {xi, yi  i} Theory: y= a + bx x Statistical issues: 1) Is data consistent with straight line? (Goodness of Fit) 2) What are the gradient and intercept (and their uncertainties (and correlation))? (Parameter Determination) Will deal with issue 2) first N.B. 1. Method can be used for other functional forms e.g. y = a +b/x +c/x2 +……. y = a + b sin + c sin(2) + d sin(3) +…… y = a exp(-bx) N.B. 2 Least squares is not the only method

Minimise S w.r.t. parameters a and b Not best straight line fit Minimise S w.r.t. parameters a and b

Straight Line Fit N.B. L.S.B.F. passes through (<x>, <y>)

Uncertainties on parameter(s) In parabolic approx,  = 1/1/2 d2S/d2 1 (mneumonic) best  With more than one param, replace S() by S(1, 2, 3, …..), and covariance matrix E is given by 1 ∂2S E-1 = 2∂i∂j S= Smax -1 contour 1 . 22 21 2

Summary of straight line fitting Plot data Bad points Estimate a and b (and uncertainties) a and b from formula Errors on a’ and b Cf calculated values with estimated Determine Smin (using a and b) ν = n – p Look up in χ2 tables If probability too small, IGNORE RESULTS If probability a “bit” small, scale uncertainties? Asymptotically

Summary of straight line fitting Plot data Bad points Estimate a and b (and uncertainties) a and b from formula Errors on a’ and b Parameter Determination Cf calculated values with estimated Determine Smin (using a and b) ν = n – p Goodness of Fit Look up in χ2 tables If probability too small, IGNORE RESULTS If probability a “bit” small, scale uncertainties? If theory is correct; data unbiassed, ~ Gaussian and asymptotic;  is correct; etc, then Smin has 2 distribution

Properties of 2 distribution

Properties of 2 distribution, contd.

Goodness of Fit 2 Very general Needs binning y Not sensitive to sign of deviation Run Test Not sensitive to mag. of devn. x Kolmogorov- Smirnov Aslan-Zech Review: Mike Williams, “How good are your fits? Unbinned multivariate goodness-of-fit tests in high energy physics” http://arxiv.org/pdf/1006.3019.pdf Book: D’Agostino and Stephens, “Goodness of Fit techniques”

Goodness of Fit: Kolmogorov-Smirnov Compares data and model cumulative plots Uses largest discrepancy between dists. Model can be analytic or MC sample Uses individual data points Not so sensitive to deviations in tails (so variants of K-S exist) Not readily extendible to more dimensions Distribution-free conversion to p; depends on n (but not when free parameters involved – needs MC)

Goodness of fit: ‘Energy’ test Assign +ve charge to data ; -ve charge to M.C. Calculate ‘electrostatic energy E’ of charges If distributions agree, E ~ 0 If distributions don’t overlap, E is positive v2 Assess significance of magnitude of E by MC N.B. v1 Works in many dimensions Needs metric for each variable (make variances similar?) E ~ Σ qiqj f(Δr = |ri – rj|) , f = 1/(Δr + ε) or –ln(Δr + ε) Performance insensitive to choice of small ε See Aslan and Zech’s paper at: http://www.ippp.dur.ac.uk/Workshops/02/statistics/program.shtml

PARADOX Histogram with 100 bins Fit with 1 parameter Smin: χ2 with NDF = 99 (Expected χ2 = 99 ± 14) For our data, Smin(p0) = 90 Is p2 acceptable if S(p2) = 115? YES. Very acceptable χ2 probability NO. σp from S(p0 +σp) = Smin +1 = 91 But S(p2) – S(p0) = 25 So p2 is 5σ away from best value

Likelihoods for determining parameters What it is How it works: Resonance Error estimates Detailed example: Lifetime Several Parameters Do’s and Dont’s with L ****

Simple example: Angular distribution y = N (1 +  cos2) yi = N (1 +  cos2i) = probability density of observing i, given  L() =  yi = probability density of observing the data set yi, given  Best estimate of  is that which maximises L Values of  for which L is very small are ruled out Precision of estimate for  comes from width of L distribution (Information about parameter  comes from shape of exptl distribution of cos) CRUCIAL to normalise y N = 1/{2(1 + /3)}  = -1  large L cos  cos  

How it works: Resonance y ~ Γ/2 (m-M0)2 + (Γ/2)2 m m Vary M0 Vary Γ

Conventional to consider l = ln(L) = Σ ln(pi) If L is Gaussian, l is parabolic

Parameter uncertainty with likelihoods Range of likely values of param μ from width of L or l dists. If L(μ) is Gaussian, following definitions of σ are equivalent: 1) RMS of L(µ) 2) 1/√(-d2lnL / dµ2) (Mnemonic) 3) ln(L(μ0±σ) = ln(L(μ0)) -1/2 If L(μ) is non-Gaussian, these are no longer the same “Procedure 3) above still gives interval that contains the true value of parameter μ with 68% probability” Errors from 3) usually asymmetric, and asym errors are messy. So choose param sensibly e.g 1/p rather than p; τ or λ

Moments Max Like Least squares Easy? Yes, if… Normalisation, maximisation messy Minimisation Efficient? Not very Usually best Sometimes = Max Like Input Separate events Histogram Goodness of fit Messy No (unbinned) Easy Constraints No Yes N dimensions Easy if …. Norm, max messier Weighted events Errors difficult Bgd subtraction Troublesome Uncertainty estimates Observed spread, or analytic - ∂2l ∂pi∂pj ∂2S 2∂pi∂pj Main feature Best for params Goodness of Fit

NORMALISATION FOR LIKELIHOOD MUST be independent of m data param e.g. Lifetime fit to t1, t2,………..tn INCORRECT / 1 Missing ) | ( t ­ - = e P t

ΔlnL = -1/2 rule 3) ln(L(μ0±σ) = ln(L(μ0)) -1/2 If L(μ) is Gaussian, following definitions of σ are equivalent: 1) RMS of L(µ) 2) 1/√(-d2L/dµ2) 3) ln(L(μ0±σ) = ln(L(μ0)) -1/2 If L(μ) is non-Gaussian, these are no longer the same “Procedure 3) above still gives interval that contains the true value of parameter μ with 68% probability” Heinrich: CDF note 6438 (see CDF Statistics Committee Web-page) Barlow: Phystat05

How often does quoted range for parameter include param’s true value? COVERAGE How often does quoted range for parameter include param’s true value? N.B. Coverage is a property of METHOD, not of a particular exptl result Coverage can vary with μ Study coverage of different methods for Poisson parameter μ, from observation of number of events n Hope for: 100% Nominal value Undercoverage region

Practical example of Coverage Poisson counting experiment Observed number of counts n Poisson parameter μ P(n|μ) = e-μ μn/n! Best estimate of μ = n Range for μ given by ΔlnL = 0.5 rule. Coverage should be 68%. What does Coverage look like as a function of μ? C ? μ

Coverage : L approach (Not frequentist) P(n,μ) = e-μμn/n! (Joel Heinrich CDF note 6438) -2 lnλ< 1 λ = P(n,μ)/P(n,μbest) UNDERCOVERS

Unbinned Lmax and Goodness of Fit? Find params by maximising L So larger L better than smaller L So Lmax gives Goodness of Fit?? Bad Good? Great? Monte Carlo distribution of unbinned Lmax Frequency Lmax

L = Example cos θ pdf (and likelihood) depends only on cos2θi Insensitive to sign of cosθi So data can be in very bad agreement with expected distribution e.g. all data with cosθ < 0 and Lmax does not know about it. Example of general principle

Conclusions re Likelihoods How it works, and how to estimate errors (ln L) = 0.5 rule and coverage Several Parameters Likelihood does not guarantee coverage Lmax and Goodness of Fit Do lifetime and coverage problems on question sheet

Tomorrow Bayes and Frequentist approaches: What is probability? Bayes theorem Does it make a difference Examples from Physics and from everyday life Relative merits Which do we use?

Last time Comparing data with 2 hypotheses H0 = background only (No New Physics) H1 = background + signal (Exciting New Physics) Specific example: Discovery of Higgs