Statistical Methods for Particle Physics Lecture 4: unfolding

Statistical Methods for Particle Physics Lecture 4: unfolding
https://indico.weizmann.ac.il//conferenceDisplay.py?confId=52
Statistical Inference for Astro and Particle Physics Workshop
Weizmann Institute, Rehovot, March 8-12, 2015
Glen Cowan, Physics Department, Royal Holloway, University of London
g.cowan@rhul.ac.uk
www.pp.rhul.ac.uk/~cowan

Outline for Monday – Thursday (GC = Glen Cowan, KC = Kyle Cranmer)
Monday 9 March — GC: probability, random variables and related quantities. KC: parameter estimation, bias, variance, maximum likelihood.
Tuesday 10 March — KC: building statistical models, nuisance parameters. GC: hypothesis tests 1, p-values, multivariate methods.
Wednesday 11 March — KC: hypothesis tests 2, composite hypotheses, Wilks' and Wald's theorems. GC: asymptotics 1, Asimov data set, sensitivity.
Thursday 12 March — KC: confidence intervals, asymptotics 2. GC: unfolding.

Formulation of the unfolding problem
Consider a random variable y; the goal is to determine its pdf f(y). If a parameterization f(y;θ) is known, find e.g. the ML estimators θ̂. If no parameterization is available, construct a histogram: the “true” histogram has M bins with expectation values μj (or, normalized, probabilities pj = μj/μtot). New goal: construct estimators for the μj (or pj).

Migration
Effect of measurement errors: y = true value, x = observed value; migration of entries between bins means f(y) is ‘smeared out’ and peaks are broadened. Discretize: the data are the observed histogram ni, related to the true histogram through the response matrix Rij = P(observed in bin i | true value in bin j), so that the expected observed contents are νi = Σj Rij μj. Note μ, ν are constants; n is subject to statistical fluctuations.

Efficiency, background
Sometimes an event goes undetected: the efficiency of true bin j is εj = Σi Rij ≤ 1. Sometimes an observed event is due to a background process: βi = expected number of background events in the observed histogram, so νi = Σj Rij μj + βi. For now, assume the βi are known.

The basic ingredients
(Figure: the “true” histogram μ and the corresponding “observed” histogram n.)

Summary of ingredients
‘true’ histogram: μ = (μ1, ..., μM)
probabilities: pj = μj/μtot
expectation values for observed histogram: ν = (ν1, ..., νN)
observed histogram: n = (n1, ..., nN)
response matrix: Rij = P(observed in bin i | true value in bin j)
efficiencies: εj = Σi Rij
expected background: β = (β1, ..., βN)
These are related by: νi = Σj Rij μj + βi
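To make the notation concrete, here is a minimal NumPy sketch of these ingredients and their relation; all numbers (bin contents, response matrix, background) are made-up toy values, not taken from the lecture:

```python
import numpy as np

# Hypothetical 3-bin example.
mu = np.array([100., 150., 80.])     # "true" histogram expectation values
R = np.array([[0.80, 0.10, 0.00],    # R[i, j] = P(observed in bin i | true in bin j)
              [0.15, 0.75, 0.10],
              [0.00, 0.10, 0.85]])
eps = R.sum(axis=0)                  # efficiencies eps_j = sum_i R_ij
beta = np.array([5., 5., 5.])        # expected background per observed bin

# Expectation values of the observed histogram: nu_i = sum_j R_ij mu_j + beta_i
nu = R @ mu + beta

# The observed histogram n fluctuates about nu (independent Poisson)
rng = np.random.default_rng(1)
n = rng.poisson(nu)
print(eps, nu, n)
```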

Maximum likelihood (ML) estimator from inverting the response matrix
Assume R can be inverted: μ = R⁻¹(ν − β). Suppose the data are independent Poisson: ni ~ Poisson(νi). So the log-likelihood is ln L(μ) = Σi (ni ln νi − νi), with ν = Rμ + β. The ML estimator is ν̂ = n, hence μ̂ = R⁻¹(n − β).
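A sketch of the matrix-inversion estimator, using the same kind of toy numbers as above (in practice one solves the linear system rather than explicitly forming R⁻¹):

```python
import numpy as np

# Toy setup (hypothetical numbers): response matrix, background, observed data.
R = np.array([[0.80, 0.10, 0.00],
              [0.15, 0.75, 0.10],
              [0.00, 0.10, 0.85]])
beta = np.array([5., 5., 5.])
n = np.array([92., 131., 81.])               # observed counts

# ML estimate for Poisson data: nu_hat = n, hence mu_hat = R^{-1} (n - beta).
mu_hat = np.linalg.solve(R, n - beta)        # prefer solve() over inverting R
print(mu_hat)
```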

Example with ML solution
(Figure: example of the ML (matrix-inversion) solution.) Catastrophic failure???

What went wrong?
Suppose μ really had a lot of fine structure. Applying R washes this out, but leaves a residual structure in ν. But we don’t have ν, only n. R⁻¹ “thinks” the fluctuations in n are the residual of original fine structure and puts this back into μ̂.

ML solution revisited
For Poisson data the ML estimators are unbiased: E[μ̂j] = μj. Their covariance is Uij = cov[μ̂i, μ̂j] = Σk (R⁻¹)ik (R⁻¹)jk νk, i.e. U = R⁻¹ diag(ν) (R⁻¹)ᵀ. (Recall these statistical errors were huge for the example shown.)
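A sketch of propagating the Poisson fluctuations of n through R⁻¹ to get the covariance of the unfolded estimates (toy numbers; ν is estimated here by the observed n):

```python
import numpy as np

R = np.array([[0.80, 0.10, 0.00],
              [0.15, 0.75, 0.10],
              [0.00, 0.10, 0.85]])
n = np.array([92., 131., 81.])               # observed counts (toy numbers)

# cov[n_i, n_j] = delta_ij nu_i for independent Poisson data; estimate nu by n.
Rinv = np.linalg.inv(R)
U = Rinv @ np.diag(n) @ Rinv.T               # U = R^{-1} diag(nu) (R^{-1})^T
print(np.sqrt(np.diag(U)))                   # statistical errors of mu_hat
print(U)                                     # off-diagonal terms: (anti)correlations
```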

ML solution revisited (2)
The information inequality gives for unbiased estimators the minimum (co)variance bound: the inverse of the Fisher information matrix, which for independent Poisson data is Ikl = −E[∂²ln L/∂μk∂μl] = Σi Rik Ril / νi. Inverting this gives the same as the actual variance! I.e. the ML solution gives the smallest variance among all unbiased estimators, even though this variance was huge. In unfolding one must accept some bias in exchange for a (hopefully large) reduction in variance.

Correction factor method
Estimate μi with a multiplicative correction derived from Monte Carlo: μ̂i = Ci (ni − βi), with Ci = μi(MC)/νi(MC). Often Ci ~ O(1), so the statistical errors are far smaller than for ML. Nonzero bias unless MC = Nature.
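A bin-by-bin correction-factor sketch; the MC truth and MC reconstructed histograms below are hypothetical placeholders for what would come from simulation:

```python
import numpy as np

# Hypothetical MC-derived quantities and toy data.
mu_mc = np.array([100., 150., 80.])          # MC truth per bin
nu_mc = np.array([ 90., 135., 78.])          # MC reconstructed per bin
beta  = np.array([  5.,   5.,  5.])          # expected background (assumed known)
n     = np.array([ 92., 131., 81.])          # observed data

# Bin-by-bin correction factors C_i = mu_i^MC / nu_i^MC, applied to
# background-subtracted data; the bias vanishes only if MC = Nature.
C = mu_mc / nu_mc
mu_hat = C * (n - beta)
stat_err = C * np.sqrt(n)                    # naive Poisson error on n propagated through C
print(mu_hat, stat_err)
```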

Reality check on the statistical errors (example from Bob Cousins)
Suppose for some bin i the unfolded result is based on ni = 100 observed events, i.e. a 10% stat. error. But according to the estimate, only 10 of the 100 events found in the bin belong there; the rest spilled in from outside. How can we have a 10% measurement if it is based on only 10 events that really carry information about the desired parameter?

Discussion of correction factor method
As with all unfolding methods, we get a reduction in statistical error in exchange for a bias; here the bias is difficult to quantify (difficult also for many other unfolding methods). The bias should be small if the bin width is substantially larger than the resolution, so that there is not much bin migration. So if other uncertainties dominate in an analysis, correction factors may provide a quick and simple solution (a “first look”). Still the method has important flaws and it would be best to avoid it.

Regularized unfolding
Rather than taking the exact ML solution, impose smoothness on the estimate: maximize a combination of the log-likelihood and a regularization function S(μ), with the trade-off between goodness of fit and smoothness set by a regularization parameter.

Regularized unfolding (2)
The regularization function S(μ) quantifies the smoothness of the solution; common choices (Tikhonov curvature, entropy) are discussed on the following slides.

Tikhonov regularization
Regularization function based on the mean squared curvature of the solution, S(μ) = −Σi (μi+1 − 2μi + μi−1)², a discrete approximation of −∫(f″(y))² dy. Solution using Singular Value Decomposition (SVD).

SVD implementation of Tikhonov unfolding
A. Hoecker and V. Kartvelishvili, NIM A372 (1996) 469 (TSVDUnfold in ROOT). Minimizes the data χ² plus τ times the squared curvature of the solution, χ²(μ) + τ Σi (μi+1 − 2μi + μi−1)². Numerical implementation using Singular Value Decomposition. Recommendations for setting the regularization parameter τ: transform the variables so the errors are ~ Gauss(0,1); the number of transformed values significantly different from zero gives a prescription for τ; or base the choice of τ on the unfolding of test distributions.
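Below is a minimal sketch of the regularized least-squares problem behind Tikhonov unfolding, solved directly via the normal equations with a second-difference curvature operator. This is not the Hoecker–Kartvelishvili SVD implementation (TSVDUnfold), which solves an equivalent problem via an SVD of a rescaled response matrix; all numbers are toy values:

```python
import numpy as np

def tikhonov_unfold(n, R, beta, tau):
    """Minimise (n - beta - R mu)^T V^-1 (n - beta - R mu) + tau * ||C mu||^2,
    with C the second-difference (curvature) operator, via the normal equations."""
    V_inv = np.diag(1.0 / np.maximum(n, 1.0))    # approximate cov[n] by diag(n)
    m = R.shape[1]                               # number of true bins
    C = np.zeros((m - 2, m))                     # discrete second derivative
    for k in range(m - 2):
        C[k, k:k + 3] = [1.0, -2.0, 1.0]
    A = R.T @ V_inv @ R + tau * (C.T @ C)
    b = R.T @ V_inv @ (n - beta)
    return np.linalg.solve(A, b)

# Toy usage (hypothetical numbers); larger tau gives a smoother solution.
R = np.array([[0.7, 0.2, 0.0, 0.0],
              [0.2, 0.6, 0.2, 0.0],
              [0.0, 0.2, 0.6, 0.2],
              [0.0, 0.0, 0.2, 0.7]])
n = np.array([60., 95., 100., 55.])
beta = np.zeros(4)
print(tikhonov_unfold(n, R, beta, tau=1.0))
```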

SVD example
(Figure: example of SVD-based Tikhonov unfolding.)

Edge effects
Regularized unfolding can lead to “edge effects”. E.g. in Tikhonov regularization with Gaussian data, the solution can be pushed negative near the edge of the distribution (figure). This is important e.g. if New Physics would appear as a longer tail of a distribution.

Regularization function based on entropy
S(μ) = −Σi pi ln pi with pi = μi/μtot, maximal for a flat distribution. Can have a Bayesian motivation (entropy-based prior for μ).

Example of entropy-based unfolding
(Figure.)

Estimating bias and variance
G. Cowan, Statistical Data Analysis, OUP (1998), Ch. 11.

Choosing the regularization parameter
G. Cowan, Statistical Data Analysis, OUP (1998), Ch. 11.

Choosing the regularization parameter (2)
G. Cowan, Statistical Data Analysis, OUP (1998), Ch. 11.

Some examples with Tikhonov regularization
G. Cowan, Statistical Data Analysis, OUP (1998), Ch. 11.

Some examples with entropy regularization
G. Cowan, Statistical Data Analysis, OUP (1998), Ch. 11.

Iterative unfolding (“Bayesian”)
(D’Agostini; see also arXiv:1010.0632.) Goal is to estimate the probabilities pj = μj/μtot. For an initial guess take e.g. a flat distribution or the MC truth; the initial estimators for μ follow from this guess. Update according to the rule μ̂j ← (1/εj) Σi P(true bin j | observed bin i) (ni − βi), where P(true bin j | observed bin i) = Rij μ̂j / Σl Ril μ̂l uses Bayes’ theorem with the current estimate as prior. Continue until the solution is stable, using e.g. a χ² test with the previous iteration.
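A compact sketch of this iterative update, under the assumptions that the background has already been subtracted from n and that a fixed number of iterations is used in place of the χ² stopping criterion; all numbers are toy values:

```python
import numpy as np

def iterative_unfold(n, R, eps, n_iter=4, mu0=None):
    """Iterative ('Bayesian', D'Agostini-style) unfolding sketch.
    R[i, j] = P(observed bin i | true bin j), eps[j] = sum_i R[i, j].
    Each iteration applies Bayes' theorem with the current estimate as prior."""
    m = R.shape[1]
    mu = np.full(m, n.sum() / m) if mu0 is None else mu0.astype(float)
    for _ in range(n_iter):
        nu = R @ mu                              # predicted observed spectrum
        post = R * mu / nu[:, None]              # P(true j | observed i) = R_ij mu_j / nu_i
        mu = (post.T @ n) / eps                  # updated estimators
    return mu

# Toy usage (hypothetical numbers, background already subtracted from n).
R = np.array([[0.80, 0.10, 0.00],
              [0.15, 0.75, 0.10],
              [0.00, 0.10, 0.85]])
eps = R.sum(axis=0)
n = np.array([87., 126., 76.])
print(iterative_unfold(n, R, eps))
```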

Estimating systematic uncertainty
We know that unfolding introduces a bias, but quantifying this (including correlations) can be difficult. Suppose a model predicts a spectrum depending on a parameter θ. A priori suppose e.g. θ ≈ 8 ± 2; more precisely, assign a prior π(θ). Propagate this into a systematic covariance for the unfolded spectrum (typically large positive correlations). Often in practice one doesn’t have π(θ) but rather a few models that differ in spectrum; it is not obvious how to convert this into a meaningful covariance for the unfolded distribution.
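One way to carry out the propagation is by Monte Carlo: sample θ from the prior, re-run the unfolding for each value, and take the covariance of the resulting spectra. The sketch below assumes a user-supplied function `unfold_given_theta` (a hypothetical name) that returns the unfolded spectrum for a given θ; the dummy model in the usage example is purely illustrative:

```python
import numpy as np

def systematic_covariance(unfold_given_theta, theta_mean, theta_sigma,
                          n_toys=500, seed=0):
    """Propagate a prior theta ~ Gauss(theta_mean, theta_sigma) into a
    systematic covariance for the unfolded spectrum by Monte Carlo."""
    rng = np.random.default_rng(seed)
    thetas = rng.normal(theta_mean, theta_sigma, size=n_toys)
    toys = np.array([unfold_given_theta(t) for t in thetas])
    return np.cov(toys, rowvar=False)            # typically large positive correlations

# Dummy model: unfolded spectrum scales weakly with theta (illustrative only).
def dummy_unfold(theta):
    base = np.array([100., 150., 80.])
    return base * (1.0 + 0.02 * (theta - 8.0))

U_sys = systematic_covariance(dummy_unfold, theta_mean=8.0, theta_sigma=2.0)
print(U_sys)
```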

Stat. and sys. errors of unfolded solution
In general the statistical covariance matrix of the unfolded estimators is not diagonal; one needs to report the full matrix Ustat. But unfolding necessarily introduces biases as well, corresponding to a systematic uncertainty (also correlated between bins). This is more difficult to estimate. Suppose, nevertheless, we manage to report both Ustat and Usys. To test a new theory depending on parameters θ, use e.g. χ²(θ) = (μ(θ) − μ̂)ᵀ [Ustat + Usys]⁻¹ (μ(θ) − μ̂). This mixes frequentist and Bayesian elements; the interpretation of the result can be problematic, especially if Usys itself has a large uncertainty.
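A small sketch of this test statistic with combined statistical and systematic covariance; the numbers are toy values, and the fully correlated form chosen for Usys is only an illustration:

```python
import numpy as np

def chi2_vs_model(mu_hat, mu_model, U_stat, U_sys):
    """chi^2 comparing an unfolded spectrum to a model prediction,
    using the combined statistical + systematic covariance."""
    d = mu_model - mu_hat
    return d @ np.linalg.solve(U_stat + U_sys, d)

# Toy numbers (hypothetical):
mu_hat = np.array([102., 148., 83.])
mu_model = np.array([100., 150., 80.])
U_stat = np.diag([25., 36., 16.])
U_sys = 0.02 * np.outer(mu_model, mu_model)   # a fully correlated ~14% systematic (illustrative)
print(chi2_vs_model(mu_hat, mu_model, U_stat, U_sys))
```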

Folding
Suppose a theory predicts f(y) → μ (which may depend on parameters θ). Given the response matrix R and the expected background β, this predicts the expected numbers of observed events: ν = Rμ(θ) + β. From this we can get the likelihood, e.g., for Poisson data, L(θ) = Πi νi^ni e^(−νi)/ni!. Using this we can fit parameters and/or test, e.g., using the likelihood ratio statistic −2 ln λ(θ) = −2 ln [L(θ)/L(θ̂)].
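A minimal folding-based fit, where the parameter-dependent truth model is a made-up placeholder and a simple grid scan stands in for a proper minimizer:

```python
import numpy as np

# Fold the model prediction through the response and fit at the level of
# the observed counts (all numbers are toy values).
R = np.array([[0.80, 0.10, 0.00],
              [0.15, 0.75, 0.10],
              [0.00, 0.10, 0.85]])
beta = np.array([5., 5., 5.])
n = np.array([92., 131., 81.])

def mu_model(theta):
    return theta * np.array([1.0, 1.5, 0.8])     # dummy parameter-dependent truth

def log_like(theta):
    nu = R @ mu_model(theta) + beta              # expected observed counts
    return np.sum(n * np.log(nu) - nu)           # Poisson log-likelihood (constants dropped)

thetas = np.linspace(50., 150., 1001)
ll = np.array([log_like(t) for t in thetas])
theta_hat = thetas[np.argmax(ll)]
q = 2.0 * (ll.max() - ll)                        # -2 ln(lambda) curve for intervals/tests
print(theta_hat)
```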

Versus unfolding
If we have an unfolded spectrum and full statistical and systematic covariance matrices, then to compare this to a model μ we compute a likelihood such as L(μ) ∝ exp[−½ (μ − μ̂)ᵀ (Ustat + Usys)⁻¹ (μ − μ̂)]. Complications arise because one needs an estimate of the systematic bias Usys. If we find a gain in sensitivity from the test using the unfolded distribution, e.g., through a decrease in statistical errors, then we are exploiting information inserted via the regularization (e.g., imposed smoothness).

ML solution again
From the standpoint of testing a theory or estimating its parameters, the ML solution, despite catastrophically large errors, is equivalent to using the uncorrected data (same information content). There is no bias (at least from unfolding), so use e.g. χ²(θ) = (μ(θ) − μ̂)ᵀ Ustat⁻¹ (μ(θ) − μ̂). The estimators of θ should then have close to optimal properties: zero bias, minimum variance. The corresponding estimators from any unfolded solution cannot in general match this. The crucial point is to use the full covariance, not just the diagonal errors.

Summary/discussion
Unfolding can be a minefield and is not necessary if the goal is to compare a measured distribution with a model prediction. Even comparison of the uncorrected distribution with future theories is not a problem, as long as it is reported together with the expected background and the response matrix. In practice there are complications because these ingredients have uncertainties, and they must be reported as well. Unfolding is useful for getting an actual estimate of the distribution we think we’ve measured; one can e.g. compare ATLAS/CMS. A model test using the unfolded distribution should take account of the (correlated) bias introduced by the unfolding procedure.

Extra slides

Some unfolding references
H. Prosper and L. Lyons (eds.), Proceedings of PHYSTAT 2011, CERN-2011-006; many contributions on unfolding from p. 225.
G. Cowan, A Survey of Unfolding Methods for Particle Physics, www.ippp.dur.ac.uk/old/Workshops/02/statistics/proceedings/cowan.pdf
G. Cowan, Statistical Data Analysis, OUP (1998), Chapter 11.
V. Blobel, An Unfolding Method for High Energy Physics Experiments, arXiv:hep-ex/0208022v1.
G. D'Agostini, Improved iterative Bayesian unfolding, arXiv:1010.0632.
G. Choudalakis, Fully Bayesian Unfolding, arXiv:1201.4612.
T. Adye, Unfolding algorithms and tests using RooUnfold, arXiv:1105.1160v1.

Dependence of response matrix on physics model
Often, to a good approximation, the response matrix is independent of the physics model: it is by construction independent of how often the model puts an event in bin j. But it can still depend on the underlying model, e.g. if one model generates jets very forward, these have a different pT resolution than in another model that puts jets mainly in the barrel. This is the same problem regardless of whether one folds or unfolds. One can improve the situation by measuring both the variable of interest (say, pT) and another variable that correlates with the resolution. The response “matrix” then has 4 indices: Rijkl = P(observed in cell ij | true value in cell kl).

Determining the response matrix
(Example from S. Paramesvaran, PhD thesis (BaBar), RHUL, 2010.) Usually the response matrix is determined from MC; some smoothing is often useful. E.g. first make a scatterplot of the measured versus the true variable (figure).
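A sketch of building R from MC (true, reconstructed) pairs by two-dimensional histogramming and normalizing each true-bin column. The exponential truth spectrum and Gaussian smearing are toy assumptions, and the slicing/fitting (smoothing) step described on this and the following slides is not included:

```python
import numpy as np

# Build a response matrix from MC (true, reconstructed) pairs.
rng = np.random.default_rng(2)
y_true = rng.exponential(scale=2.0, size=200_000)          # generated true values (toy)
x_reco = y_true + rng.normal(0.0, 0.3, size=y_true.size)   # toy Gaussian detector smearing

edges = np.linspace(0.0, 6.0, 11)                          # 10 bins for both axes
H, _, _ = np.histogram2d(x_reco, y_true, bins=[edges, edges])
N_true, _ = np.histogram(y_true, bins=edges)               # all generated events per true bin

R = H / np.maximum(N_true, 1)      # R[i, j] = P(reconstructed in bin i | true in bin j)
eps = R.sum(axis=0)                # < 1: events reconstructed outside the range are lost
print(eps)
```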

Parameterizing the response matrix
(S. Paramesvaran, PhD thesis (BaBar), RHUL, 2010.) Then cut the scatterplot into vertical slices and fit each slice with an appropriate function (here, a Student’s t distribution) (figure).

Parameterizing the response matrix (2)
(S. Paramesvaran, PhD thesis (BaBar), RHUL, 2010.) Then fit these parameters as functions of the true variable (figure). They can then be treated as nuisance parameters, and the likelihood function extended to include the constraints on them from knowledge of the detector (e.g., from the fits above or from subsidiary measurements).