Discussion on significance


Discussion on significance ATLAS Statistics Forum CERN/Phone, 2 December, 2009 Glen Cowan Physics Department Royal Holloway, University of London g.cowan@rhul.ac.uk www.pp.rhul.ac.uk/~cowan G. Cowan, RHUL Physics Discussion on significance

p-values
The standard way to quantify the significance of a discovery is to give the p-value of the background-only hypothesis H0:
p = Prob( data equally or more incompatible with H0 | H0 )
This requires a definition of which data values constitute a lesser level of compatibility with H0 relative to the level found with the observed data. Define this so as to get a high probability of rejecting H0 if a particular signal model (or class of models) is true.
Note that actual confidence in whether a real discovery has been made depends also on other factors, e.g., the plausibility of the signal, the degree to which it describes the data, and the reliability of the model used to find the p-value. The p-value is really only a first step!

Significance from p-value
Often one defines the significance Z as the number of standard deviations that a Gaussian variable would fluctuate in one direction to give the same p-value:
$p = 1 - \Phi(Z)$, $Z = \Phi^{-1}(1 - p)$,
where $\Phi$ is the standard Gaussian cumulative distribution.
ROOT functions: TMath::Prob, TMath::NormQuantile.
Z = 5 corresponds to $p = 2.87 \times 10^{-7}$.
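The conversion between p and Z can be sketched in a few lines. This is a minimal illustration using only the Python standard library rather than the ROOT functions named on the slide; the helper names `p_from_z` and `z_from_p` are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_from_z(z: float) -> float:
    """One-sided upper-tail probability p = 1 - Phi(z)."""
    # erfc avoids the loss of precision in 1 - cdf(z) for large z
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def z_from_p(p: float) -> float:
    """Significance Z = Phi^{-1}(1 - p), valid for 0 < p < 1."""
    # By symmetry of the Gaussian, Phi^{-1}(1 - p) = -Phi^{-1}(p)
    return -_STD_NORMAL.inv_cdf(p)


print(f"p for Z = 5: {p_from_z(5.0):.3g}")
print(f"Z for p = 2.87e-7: {z_from_p(2.87e-7):.3f}")
```

Using the symmetry of the Gaussian in `z_from_p` keeps the result accurate for the very small p-values typical of discovery claims.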

Sensitivity (expected significance)
The significance with which one rejects the SM depends on the particular data set obtained.
To characterize the sensitivity of a planned analysis, give the expected (e.g., mean or median) significance assuming a given signal model.
Determining this accurately could in principle require an MC study. It is often sufficient to evaluate it with representative (e.g. "Asimov") data.

Significance for single counting experiment
Suppose we measure n events and expect s signal and b background events: n ~ Poisson(s + b).
Find the p-value of the s = 0 hypothesis. Data values with $n \ge n_{\rm obs}$ constitute lesser compatibility, so
$p = P(n \ge n_{\rm obs} \mid b) = \sum_{n = n_{\rm obs}}^{\infty} \frac{b^n}{n!} e^{-b}$.
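As an illustration of the counting-experiment p-value, the upper Poisson tail can be summed directly. This is a sketch under the assumption that b is known exactly and is of modest size; the function name `poisson_p_value` and the numbers in the usage line are hypothetical.

```python
import math


def poisson_p_value(n_obs: int, b: float) -> float:
    """P(n >= n_obs) for n ~ Poisson(b), with b > 0 of modest size."""
    if n_obs <= 0:
        return 1.0
    # Poisson probability of exactly n_obs events, computed in log space
    term = math.exp(n_obs * math.log(b) - b - math.lgamma(n_obs + 1))
    total = 0.0
    n = n_obs
    # Keep adding terms until they are negligible; past n > b they only shrink
    while term > 1e-17 * total or n <= b:
        total += term
        n += 1
        term *= b / n  # P(n) = P(n - 1) * b / n
    return total


# Hypothetical example: 8 events observed with 2.5 expected from background
print(poisson_p_value(8, 2.5))
```

Summing the upper tail, rather than computing one minus the lower tail, avoids cancellation when the p-value is very small.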

Simple counting experiment with LR
Equivalently, one can write the expectation value of n as
$E[n] = \mu s + b$,
where $\mu$ is a strength parameter (background-only is $\mu = 0$).
To test a value of $\mu$, construct the likelihood ratio
$\lambda(\mu) = L(\mu) / L(\hat{\mu})$,
where $\hat{\mu}$ is the maximum likelihood estimator (MLE), which we constrain to be non-negative:
$\hat{\mu} = \max(0, (n - b)/s)$.
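The likelihood ratio for this model can be written out as a short sketch, assuming n ~ Poisson(μs + b) with s and b known and b > 0. The function names are hypothetical, and the constant term -ln n! is dropped because it cancels in the ratio.

```python
import math


def log_likelihood(mu: float, n: int, s: float, b: float) -> float:
    """ln L(mu) for n ~ Poisson(mu*s + b), without the constant -ln n!."""
    mean = mu * s + b
    return n * math.log(mean) - mean


def mu_hat(n: int, s: float, b: float) -> float:
    """Maximum likelihood estimator of mu, constrained to mu >= 0."""
    return max(0.0, (n - b) / s)


def likelihood_ratio(mu: float, n: int, s: float, b: float) -> float:
    """lambda(mu) = L(mu) / L(mu_hat)."""
    best = log_likelihood(mu_hat(n, s, b), n, s, b)
    return math.exp(log_likelihood(mu, n, s, b) - best)


# Hypothetical numbers: 10 events observed, s = 5, b = 3
print(mu_hat(10, 5.0, 3.0), likelihood_ratio(0.0, 10, 5.0, 3.0))
```

When n is below b the constrained estimator is zero, so the ratio for the background-only hypothesis is exactly one.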

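p-value from LR
Also define
$q_\mu = -2 \ln \lambda(\mu)$.
High values correspond to increasing incompatibility with $\mu$.
For discovery we are testing $\mu = 0$. We find
$q_0 = 2 \left( n \ln(n/b) + b - n \right)$ for $n > b$, and $q_0 = 0$ otherwise.
The p-value is
$p = P(q_0 \ge q_{0,\rm obs} \mid \mu = 0)$.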
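The discovery statistic for the counting experiment can be worked out explicitly. The steps below are a sketch that assumes the Poisson model with mean μs + b and the non-negative estimator for μ; for n > b the fitted mean equals n.

```latex
\begin{align}
L(\mu) &= \frac{(\mu s + b)^n}{n!} \, e^{-(\mu s + b)}, \\
\ln \lambda(0) &= \ln L(0) - \ln L(\hat{\mu})
  = (n \ln b - b) - (n \ln n - n)
  \quad (n > b,\ \hat{\mu} s + b = n), \\
q_0 = -2 \ln \lambda(0) &= 2 \left( n \ln \frac{n}{b} + b - n \right)
  \quad (n > b), \\
q_0 &= 0 \quad (n \le b,\ \hat{\mu} = 0).
\end{align}
```

Because q_0 increases monotonically with n for n > b, ordering the data by q_0 is the same as ordering by n.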

Significance from LR using χ² approx.
For large enough n, we can regard $q_\mu$ as continuous, and find
$p = \int_{q_{\mu,\rm obs}}^{\infty} f(q_\mu \mid \mu) \, dq_\mu$.
Furthermore, for large enough n, the distribution of $q_\mu$ approaches a form related to the chi-square distribution for 1 d.o.f.
Complications arise from the requirement that $\mu$ be non-negative, but the end result is simple. For the test of $\mu = 0$ (discovery), the significance is
$Z = \sqrt{q_0}$.
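The chi-square-based significance for the counting experiment is then a one-line calculation. The sketch below assumes the Poisson model with known b and the closed form of q_0 for that model; the function names and the example numbers are hypothetical.

```python
import math


def q0(n: int, b: float) -> float:
    """Discovery test statistic -2 ln lambda(0) for n ~ Poisson(mu*s + b)."""
    if n <= b:
        return 0.0  # mu_hat is zero, so lambda(0) = 1
    return 2.0 * (n * math.log(n / b) + b - n)


def z_asymptotic(n: int, b: float) -> float:
    """Significance Z = sqrt(q0), valid in the large-sample limit."""
    return math.sqrt(q0(n, b))


# Hypothetical example: 10 events observed with b = 3 expected
print(z_asymptotic(10, 3.0))
```

For small n the result is only approximate, since the large-sample limit is what justifies the square-root formula.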

Sensitivity for simple counting exp.
Find the median significance from the median n, which is approximately s + b when this is sufficiently large.
Or, if using the approximate formula based on chi-square, approximate the median by substituting s + b for n ("Asimov" data):
$Z = \sqrt{2 \left( (s + b) \ln(1 + s/b) - s \right)}$.
For $s \ll b$, expanding the logarithm and keeping terms to $O(s^2)$,
$Z \approx s / \sqrt{b}$.
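The two sensitivity formulas can be compared numerically. This sketch assumes the Asimov substitution n = s + b in the chi-square-based formula; the function names and the example values s = 5, b = 100 are hypothetical and chosen so that s is much smaller than b.

```python
import math


def z_asimov(s: float, b: float) -> float:
    """Median significance estimate from Asimov data, n = s + b."""
    return math.sqrt(2.0 * ((s + b) * math.log1p(s / b) - s))


def z_simple(s: float, b: float) -> float:
    """Leading-order approximation for s << b."""
    return s / math.sqrt(b)


# With s << b the two estimates nearly coincide
print(z_asimov(5.0, 100.0), z_simple(5.0, 100.0))
```

The two expressions drift apart as s/b grows, with s/√b giving the larger value.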

Simple counting exp. with bkg. uncertainty
Suppose b consists of several components, $b = \sum_i b_i$, and that these are not precisely known but estimated from subsidiary measurements $m_i$:
$n \sim \mathrm{Poisson}(\mu s + \sum_i b_i)$, and each $m_i$ is Poisson distributed with a mean determined by $b_i$.
The likelihood function for the full set of measurements is the product of the Poisson probabilities for n and for the $m_i$.

Profile likelihood ratio
To account for the nuisance parameters (systematics), test $\mu$ with the profile likelihood ratio:
$\lambda(\mu) = L(\mu, \hat{\hat{b}}) / L(\hat{\mu}, \hat{b})$.
Double hat: the value of b that maximizes L for the given $\mu$. Single hats: the values that maximize L with respect to both $\mu$ and b.
The important point is that $q_\mu = -2 \ln \lambda(\mu)$ is still related to the chi-square distribution even with nuisance parameters (for a sufficiently large sample), so we retain the simple formula for significance:
$Z = \sqrt{q_0}$.
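A minimal sketch of the profile likelihood ratio for discovery follows, under assumptions that are not taken from the slides: a single background component b, constrained by one subsidiary count m ~ Poisson(τb) with a known scale factor τ. The names `tau`, `log_likelihood` and `q0_profile` are hypothetical. For this model both maximizations have closed forms.

```python
import math


def _xlogy(x: float, y: float) -> float:
    """x * ln(y), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def log_likelihood(mu, b, n, m, s, tau):
    """ln L(mu, b) for n ~ Poisson(mu*s + b), m ~ Poisson(tau*b), constants dropped."""
    main_mean = mu * s + b
    control_mean = tau * b
    return _xlogy(n, main_mean) - main_mean + _xlogy(m, control_mean) - control_mean


def q0_profile(n, m, s, tau):
    """q0 = -2 ln [ L(0, b_double_hat) / L(mu_hat, b_hat) ] with mu_hat >= 0."""
    b_cond = (n + m) / (1.0 + tau)  # maximizes L for fixed mu = 0
    b_hat = m / tau                 # unconstrained fit: control count fixes b
    if n > b_hat:
        mu_hat = (n - b_hat) / s
    else:
        mu_hat, b_hat = 0.0, b_cond  # boundary: fit coincides with mu = 0
    log_ratio = (log_likelihood(0.0, b_cond, n, m, s, tau)
                 - log_likelihood(mu_hat, b_hat, n, m, s, tau))
    return max(0.0, -2.0 * log_ratio)


# Hypothetical numbers: n = 20, m = 5, s = 10, tau = 1
print(math.sqrt(q0_profile(20, 5, 10.0, 1.0)))
```

With several background components or other constraint terms the maximizations would generally need a numerical minimizer instead of these closed forms.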

Examples from recent HN posts
From recent HyperNews posts (Tetiana Hrynova, Xavier Prudent): consider s = 20.4, b = 2.5 ± 1.5. What is the "correct" sensitivity?
First suppose b = 2.5 exactly. Then:
1) Use MC to find the median, assuming s = 20.4, of the significance Z. Best.
2) Use the formula based on the chi-square approximation for the likelihood ratio: $Z = \sqrt{2 \left( (s + b) \ln(1 + s/b) - s \right)}$. Good for s + b > dozen?
3) Use $Z = s / \sqrt{b}$. OK for $s \ll b$, b > dozen? (Here s is much larger than b.)
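The three approaches can be tried on the numbers quoted above, s = 20.4 and b = 2.5 taken as exact. The sketch below is an illustration under stated assumptions, not the procedure used in the posts: the significance in option 1 is taken to be the Z from the exact Poisson p-value, and the toy count, seed and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, median


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw from Poisson(mean) with Knuth's method (fine for modest means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def poisson_p_value(n_obs: int, b: float) -> float:
    """P(n >= n_obs) for n ~ Poisson(b), summing the upper tail."""
    if n_obs <= 0:
        return 1.0
    term = math.exp(n_obs * math.log(b) - b - math.lgamma(n_obs + 1))
    total, n = 0.0, n_obs
    while term > 1e-17 * total or n <= b:
        total += term
        n += 1
        term *= b / n
    return total


def z_exact(n_obs: int, b: float) -> float:
    """Z corresponding to the exact Poisson p-value."""
    p = poisson_p_value(n_obs, b)
    if p >= 1.0:
        return float("-inf")
    if p <= 0.0:
        return float("inf")
    return -NormalDist().inv_cdf(p)


def median_z_mc(s: float, b: float, n_toys: int = 20001, seed: int = 1) -> float:
    """Median Z over toy experiments generated with n ~ Poisson(s + b)."""
    rng = random.Random(seed)
    return median(z_exact(poisson_sample(s + b, rng), b) for _ in range(n_toys))


def z_asimov(s: float, b: float) -> float:
    return math.sqrt(2.0 * ((s + b) * math.log1p(s / b) - s))


def z_simple(s: float, b: float) -> float:
    return s / math.sqrt(b)


s, b = 20.4, 2.5
print("1) MC median Z:", median_z_mc(s, b))
print("2) Asimov Z:   ", z_asimov(s, b))
print("3) s/sqrt(b):  ", z_simple(s, b))
```

Since Z is a monotonic function of n, the MC median is simply the Z for the median n, so the toys mainly serve to illustrate the procedure.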

Examples from recent HN posts (2)
To take into account the uncertainty in the background, one needs to understand the origin of the 2.5 ± 1.5.
Is this e.g. an estimate based on a Poisson measurement? Then use the profile likelihood for the nuisance parameter b.
Or is it a Gaussian prior (truncated at zero) with mean 2.5, σ = 1.5? Then use "Cousins-Highland".
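For the Gaussian-prior case, one way to sketch a Cousins-Highland style calculation is to average the Poisson p-value over the truncated prior for b. The code below is an assumption-laden illustration: the midpoint-rule integration, the grid size, the observed count of 10 and the function names are all hypothetical choices.

```python
import math


def poisson_p_value(n_obs: int, b: float) -> float:
    """P(n >= n_obs) for n ~ Poisson(b) with b > 0, summing the upper tail."""
    if n_obs <= 0:
        return 1.0
    term = math.exp(n_obs * math.log(b) - b - math.lgamma(n_obs + 1))
    total, n = 0.0, n_obs
    while term > 1e-17 * total or n <= b:
        total += term
        n += 1
        term *= b / n
    return total


def averaged_p_value(n_obs: int, b0: float, sigma: float, n_grid: int = 2000) -> float:
    """Average P(n >= n_obs | b) over a Gaussian prior in b truncated at zero."""
    lo = max(0.0, b0 - 8.0 * sigma)
    hi = b0 + 8.0 * sigma
    step = (hi - lo) / n_grid
    num = den = 0.0
    for i in range(n_grid):
        b = lo + (i + 0.5) * step  # midpoint rule, so b is never exactly zero
        weight = math.exp(-0.5 * ((b - b0) / sigma) ** 2)
        num += weight * poisson_p_value(n_obs, b)
        den += weight  # normalizes the truncated prior
    return num / den


# Prior from the slide (mean 2.5, sigma 1.5); the observed count is hypothetical
print(poisson_p_value(10, 2.5), averaged_p_value(10, 2.5, 1.5))
```

Averaging over the prior gives a larger p-value than fixing b = 2.5, reflecting the reduced significance once the background uncertainty is included.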

Provisional conclusions
The key is to view the p-value as the basic quantity of interest; Z is equivalent, and all "magic formulae" are various approximations for Z.
There are also other considerations for discovery (and limits) beyond the p-value, e.g., the degree to which the signal model describes the data, the plausibility of the signal model, the reliability of the model used for the p-value, ...
Also consider e.g. Bayes factors for complementary information.
The StatForum should move towards firm recommendations on which formulae to use where possible, but it cannot investigate every approximation; analysts must take some responsibility here.
A draft note (INT) on discovery significance is attached to the agenda; there will also be a partner note on limits.