Computing and Statistical Data Analysis / Stat 11

Computing and Statistical Data Analysis
Stat 11: Nuisance parameters, Bayes factors
London Postgraduate Lectures on Particle Physics; University of London MSci course PH4515
Glen Cowan, Physics Department, Royal Holloway, University of London
g.cowan@rhul.ac.uk, www.pp.rhul.ac.uk/~cowan
Course web page: www.pp.rhul.ac.uk/~cowan/stat_course.html

Statistical tests for a non-established signal (i.e., a "search")
Consider a parameter μ proportional to the rate of a signal process whose existence is not yet established. Suppose the model for the data includes both μ and a set of nuisance parameters θ. To test hypothetical values of μ, use the profile likelihood ratio

$$\lambda(\mu) = \frac{L(\mu, \hat{\hat{\theta}}(\mu))}{L(\hat{\mu}, \hat{\theta})},$$

where $\hat{\hat{\theta}}(\mu)$ maximizes L for the specified μ (conditional ML estimators), and $\hat{\mu}$, $\hat{\theta}$ maximize L globally.

Test statistic for discovery
Try to reject the background-only (μ = 0) hypothesis using

$$q_0 = \begin{cases} -2\ln\lambda(0), & \hat{\mu} \ge 0, \\ 0, & \hat{\mu} < 0. \end{cases}$$

That is, here we regard positive μ as the relevant alternative, so the critical region is taken to correspond to high values of $q_0$. Note that even though physically μ ≥ 0, we allow $\hat{\mu}$ to be negative. In the large-sample limit its distribution becomes Gaussian, and this will allow us to write down simple expressions for the distributions of our test statistics.

Test statistic for upper limits
For purposes of setting an upper limit on μ, use

$$q_\mu = \begin{cases} -2\ln\lambda(\mu), & \hat{\mu} \le \mu, \\ 0, & \hat{\mu} > \mu. \end{cases}$$

Here we regard the relevant alternative to be low μ, so we define the critical region to correspond to low values of $\hat{\mu}$.

Wald approximation for the profile likelihood ratio
To find p-values, we need $f(q_\mu|\mu)$; for the median significance under the alternative, we need $f(q_\mu|\mu')$. Use the approximation due to Wald (1943),

$$-2\ln\lambda(\mu) = \frac{(\mu - \hat{\mu})^2}{\sigma^2} + O(1/\sqrt{N}),$$

where $\hat{\mu} \sim$ Gaussian(μ′, σ) and N is the sample size.

Noncentral chi-square for −2lnλ(μ)
If we can neglect the O(1/√N) term, −2lnλ(μ) follows a noncentral chi-square distribution for one degree of freedom with noncentrality parameter

$$\Lambda = \frac{(\mu - \mu')^2}{\sigma^2}.$$

As a special case, if μ′ = μ then Λ = 0 and −2lnλ(μ) follows a chi-square distribution for one degree of freedom (Wilks).

Distribution of q0 in the large-sample limit
Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727; EPJC 71 (2011) 1554
Assuming the approximations are valid in the large-sample (asymptotic) limit, we can write down the full distribution of q0 as

$$f(q_0|\mu') = \left(1 - \Phi\!\left(\frac{\mu'}{\sigma}\right)\right)\delta(q_0) + \frac{1}{2}\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{q_0}}\exp\!\left[-\frac{1}{2}\left(\sqrt{q_0} - \frac{\mu'}{\sigma}\right)^2\right].$$

The special case μ′ = 0 is a "half chi-square" distribution:

$$f(q_0|0) = \frac{1}{2}\delta(q_0) + \frac{1}{2}\frac{1}{\sqrt{2\pi}}\frac{1}{\sqrt{q_0}}\,e^{-q_0/2}.$$

In the large-sample limit, f(q0|0) is independent of the nuisance parameters; f(q0|μ′) depends on the nuisance parameters through σ.

Cumulative distribution of q0, significance
Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727; EPJC 71 (2011) 1554
From the pdf, the cumulative distribution of q0 is found to be

$$F(q_0|\mu') = \Phi\!\left(\sqrt{q_0} - \frac{\mu'}{\sigma}\right).$$

The special case μ′ = 0 is

$$F(q_0|0) = \Phi(\sqrt{q_0}).$$

The p-value of the μ = 0 hypothesis is

$$p_0 = 1 - F(q_0|0) = 1 - \Phi(\sqrt{q_0}),$$

and therefore the discovery significance Z is simply

$$Z = \Phi^{-1}(1 - p_0) = \sqrt{q_0}.$$
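
As a quick numerical check, here is a minimal Python sketch (not part of the original slides; the function name is illustrative) that applies these asymptotic formulas to an observed value of q0:

```python
# Convert an observed q0 to a p-value and significance using the
# asymptotic ("half chi-square") result: Z = sqrt(q0), p0 = 1 - Phi(Z).
import numpy as np
from scipy.stats import norm

def discovery_significance(q0_obs):
    """Asymptotic p-value and Z for the mu = 0 hypothesis."""
    Z = np.sqrt(q0_obs)
    p0 = 1.0 - norm.cdf(Z)
    return p0, Z

p0, Z = discovery_significance(25.0)
print(f"p0 = {p0:.3g}, Z = {Z:.2f}")  # q0 = 25 -> Z = 5, p0 ~ 2.9e-7
```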

Example of a p-value
ATLAS, Phys. Lett. B 716 (2012) 1-29 (figure)

Distribution of qμ
Cowan, Cranmer, Gross, Vitells, arXiv:1007.1727; EPJC 71 (2011) 1554
Similar results hold for qμ; in particular, the cumulative distribution under the tested hypothesis is $F(q_\mu|\mu) = \Phi(\sqrt{q_\mu})$, so the p-value is $p_\mu = 1 - \Phi(\sqrt{q_\mu})$.

Example of upper limits
ATLAS, Phys. Lett. B 716 (2012) 1-29 (figure)
Not exactly what we described earlier, as these are "CLs" limits; see, e.g., GDC, arXiv:1307.2487.

Profile likelihood with b uncertain
This is the well-studied "on/off" problem: Cranmer 2005; Cousins, Linnemann, and Tucker 2008; Li and Ma 1983, ...
Measure two Poisson distributed values:
n ~ Poisson(s + b) (primary or "search" measurement)
m ~ Poisson(τb) (control measurement, τ known)
The likelihood function is

$$L(s, b) = \frac{(s+b)^n}{n!}\,e^{-(s+b)}\,\frac{(\tau b)^m}{m!}\,e^{-\tau b}.$$

Use this to construct the profile likelihood ratio (b is the nuisance parameter):

$$\lambda(s) = \frac{L(s, \hat{\hat{b}}(s))}{L(\hat{s}, \hat{b})}.$$

Ingredients for the profile likelihood ratio
To construct the profile likelihood ratio from this we need the estimators

$$\hat{s} = n - \frac{m}{\tau}, \qquad \hat{b} = \frac{m}{\tau},$$

and in particular, to test for discovery (s = 0), the conditional estimator

$$\hat{\hat{b}}(s=0) = \frac{n + m}{1 + \tau}.$$
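
Putting these ingredients together, here is a minimal Python sketch (an illustration, not code from the course) of the discovery test statistic q0 for the on/off problem:

```python
# Profile likelihood ratio test for discovery in the on/off problem:
# n ~ Pois(s+b) (search region), m ~ Pois(tau*b) (control region).
import numpy as np

def q0_on_off(n, m, tau):
    """Discovery test statistic q0 = -2 ln lambda(0)."""
    s_hat = n - m / tau            # unconditional MLEs
    b_hat = m / tau
    if s_hat <= 0:                 # q0 = 0 when the fitted signal is <= 0
        return 0.0
    bb0 = (n + m) / (1.0 + tau)    # conditional MLE of b for s = 0
    # log likelihood up to constants: n ln(s+b) - (s+b) + m ln(tau b) - tau b
    def logL(s, b):
        return n * np.log(s + b) - (s + b) + m * np.log(tau * b) - tau * b
    return 2.0 * (logL(s_hat, b_hat) - logL(0.0, bb0))

Z = np.sqrt(q0_on_off(n=30, m=100, tau=10.0))  # asymptotic Z = sqrt(q0)
print(f"Z = {Z:.2f}")
```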

Tests of asymptotic formulae
Cowan, Cranmer, Gross, Vitells, EPJC 71 (2011) 1554; arXiv:1007.1727 (figures)

Expected (or median) significance / sensitivity
When planning the experiment, we want to quantify how sensitive we are to a potential discovery, e.g., by the median significance assuming some nonzero strength parameter μ′. So for the p-value we need f(q0|0); for the sensitivity we will need f(q0|μ′).

Expected discovery significance for a counting experiment with background uncertainty
I. Discovery sensitivity for a counting experiment with b known:
(a) the simple formula s/√b;
(b) the profile likelihood ratio test with the Asimov data set.
II. Discovery sensitivity with uncertainty σb in b:
(a) the simple formula s/√(b + σb²);
(b) the profile likelihood ratio test with the Asimov data set.

Counting experiment with known background
Count a number of events n ~ Poisson(s + b), where
s = expected number of events from signal,
b = expected number of background events.
To test for discovery of the signal, compute the p-value of the s = 0 hypothesis,

$$p = P(n \ge n_{\mathrm{obs}} \mid b).$$

Usually convert to an equivalent significance,

$$Z = \Phi^{-1}(1 - p),$$

where Φ is the standard Gaussian cumulative distribution; e.g., Z > 5 (a 5σ effect) corresponds to p < 2.9 × 10⁻⁷. To characterize the sensitivity to discovery, give the expected (mean or median) Z under the assumption of a given s.
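
For concreteness, a minimal Python sketch (not from the slides; names are illustrative) of this p-value and its Gaussian equivalent:

```python
# p-value and significance Z for the s = 0 hypothesis in a counting
# experiment with known background b.
from scipy.stats import poisson, norm

def poisson_discovery_p(n_obs, b):
    """p = P(n >= n_obs | b) and equivalent Gaussian significance Z."""
    p = poisson.sf(n_obs - 1, b)   # sf(k) = P(n > k), so use n_obs - 1
    return p, norm.ppf(1.0 - p)    # Z = Phi^-1(1 - p)

p, Z = poisson_discovery_p(n_obs=20, b=10.0)
print(f"p = {p:.3g}, Z = {Z:.2f}")
```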

s/√b for expected discovery significance
For large s + b, n → x ~ Gaussian(μ, σ) with μ = s + b, σ = √(s + b). For an observed value x_obs, the p-value of s = 0 is Prob(x > x_obs | s = 0):

$$p = 1 - \Phi\!\left(\frac{x_{\mathrm{obs}} - b}{\sqrt{b}}\right).$$

The significance for rejecting s = 0 is therefore

$$Z = \Phi^{-1}(1 - p) = \frac{x_{\mathrm{obs}} - b}{\sqrt{b}}.$$

The expected (median) significance assuming signal rate s is

$$\mathrm{med}[Z|s] = \frac{s}{\sqrt{b}}.$$

Better approximation for the significance
The Poisson likelihood for the parameter s is

$$L(s) = \frac{(s+b)^n}{n!}\,e^{-(s+b)}$$

(for now, no nuisance parameters). To test for discovery, use the profile likelihood ratio

$$\lambda(0) = \frac{L(0)}{L(\hat{s})}, \qquad \hat{s} = n - b,$$

so the likelihood ratio statistic for testing s = 0 is

$$q_0 = -2\ln\lambda(0) = 2\left(n\ln\frac{n}{b} + b - n\right) \quad (\hat{s} \ge 0;\ q_0 = 0 \text{ otherwise}).$$

Approximate Poisson significance (continued)
For sufficiently large s + b (use Wilks' theorem),

$$Z = \sqrt{q_0}.$$

To find med[Z|s], let n → s + b (i.e., the Asimov data set):

$$Z_A = \sqrt{2\left((s+b)\ln\!\left(1 + \frac{s}{b}\right) - s\right)}.$$

This reduces to s/√b for s ≪ b.
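
A minimal Python sketch (not from the slides) comparing the Asimov median significance with the simple s/√b formula:

```python
# Median discovery significance from the Asimov data set, Z_A, versus
# the simple approximation s/sqrt(b); they agree only for s << b.
import numpy as np

def Z_asimov(s, b):
    """Z_A = sqrt(2((s+b) ln(1 + s/b) - s)) for known background b."""
    return np.sqrt(2.0 * ((s + b) * np.log(1.0 + s / b) - s))

for s, b in [(2.0, 100.0), (5.0, 5.0), (10.0, 1.0)]:
    print(f"s={s:5.1f} b={b:6.1f}  Z_A={Z_asimov(s, b):5.2f}  "
          f"s/sqrt(b)={s / np.sqrt(b):5.2f}")
```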

n ~ Poisson(s + b): median significance, assuming s, of the hypothesis s = 0
CCGV, EPJC 71 (2011) 1554; arXiv:1007.1727 (figure)
"Exact" values from MC; jumps are due to the discreteness of the data. The Asimov √(q0,A) is a good approximation over a broad range of s, b; s/√b is only good for s ≪ b.

Extending s/√b to the case where b is uncertain
The intuitive explanation of s/√b is that it compares the signal, s, to the standard deviation of n assuming no signal, √b. Now suppose the value of b is uncertain, characterized by a standard deviation σb. A reasonable guess is to replace √b by the quadratic sum of √b and σb, i.e.,

$$Z = \frac{s}{\sqrt{b + \sigma_b^2}}.$$

This has been used to optimize some analyses, e.g., where σb cannot be neglected.

Asymptotic significance
Using the profile likelihood ratio for the on/off problem constructed above, the discovery significance in the asymptotic approximation (Wilks' theorem) is Z = √q0, with

$$q_0 = 2\left(n\ln\frac{n(1+\tau)}{n+m} + m\ln\frac{m(1+\tau)}{\tau(n+m)}\right)$$

(for ŝ ≥ 0; q0 = 0 otherwise). This is essentially the same as the significance formula of Li and Ma (1983).

Asimov approximation for the median significance
To get the median discovery significance, replace n, m by their expectation values assuming the background-plus-signal model:
n → s + b
m → τb
Or use the variance of $\hat{b} = m/\tau$,

$$\sigma_b^2 = V[\hat{b}] = \frac{b}{\tau},$$

to eliminate τ, giving

$$Z_A = \left[2\left((s+b)\ln\frac{(s+b)(b+\sigma_b^2)}{b^2 + (s+b)\sigma_b^2} - \frac{b^2}{\sigma_b^2}\ln\!\left(1 + \frac{\sigma_b^2 s}{b(b+\sigma_b^2)}\right)\right)\right]^{1/2}.$$
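
A minimal Python sketch (not from the slides) of this Asimov significance with background uncertainty:

```python
# Asimov median discovery significance for a counting experiment with
# background b known only to within a standard deviation sigma_b.
import numpy as np

def Z_asimov_sigma_b(s, b, sigma_b):
    """Median Z for n ~ Pois(s+b), with uncertainty sigma_b on b."""
    sb2 = sigma_b**2
    t1 = (s + b) * np.log((s + b) * (b + sb2) / (b**2 + (s + b) * sb2))
    t2 = (b**2 / sb2) * np.log(1.0 + sb2 * s / (b * (b + sb2)))
    return np.sqrt(2.0 * (t1 - t2))

# Compare with the intuitive formula s/sqrt(b + sigma_b^2):
s, b, sigma_b = 10.0, 50.0, 5.0
print(f"Z_A = {Z_asimov_sigma_b(s, b, sigma_b):.2f}, "
      f"s/sqrt(b+sb^2) = {s / np.sqrt(b + sigma_b**2):.2f}")
```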

Limiting cases
Expanding the Asimov formula in powers of s/b and σb²/b (= 1/τ) gives, to leading order,

$$Z_A \approx \frac{s}{\sqrt{b + \sigma_b^2}}.$$

So the "intuitive" formula can be justified as a limiting case of the significance from the profile likelihood ratio test evaluated with the Asimov data set.

Testing the formulae: s = 5 (figure)

Using sensitivity to optimize a cut (figure; see the sketch below)
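
The idea on this slide can be illustrated with a short Python sketch (a hypothetical example with made-up efficiency curves, not the slide's own): scan a cut value and keep the one that maximizes the expected significance.

```python
# Choose the cut value x_cut that maximizes the Asimov significance,
# given hypothetical signal/background distributions in a variable x.
import numpy as np
from scipy.stats import norm

def Z_asimov(s, b):
    return np.sqrt(2.0 * ((s + b) * np.log(1.0 + s / b) - s))

s_tot, b_tot = 20.0, 1000.0                   # hypothetical total yields
cuts = np.linspace(0.0, 5.0, 101)             # candidate cuts: keep x > x_cut
eps_s = norm.sf(cuts, loc=3.0, scale=1.0)     # signal efficiency vs cut
eps_b = norm.sf(cuts, loc=0.0, scale=1.0)     # background efficiency vs cut
Z = Z_asimov(s_tot * eps_s, b_tot * eps_b)    # expected Z at each cut
# (asymptotic Z is unreliable when the cut leaves very few background events)
print(f"best cut: x > {cuts[np.argmax(Z)]:.2f}, Z = {Z.max():.2f}")
```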

Bayesian model selection ('discovery')
The probability of hypothesis H0 relative to its complementary alternative H1 is often given by the posterior odds:

$$\frac{P(H_0|x)}{P(H_1|x)} = B_{01}\,\frac{\pi_0}{\pi_1},$$

where H0 = no Higgs, H1 = Higgs, B01 is the Bayes factor, and π0/π1 are the prior odds. The Bayes factor is regarded as measuring the weight of evidence of the data in support of H0 over H1. Interchangeably use B10 = 1/B01.

Assessing Bayes factors
One can use the Bayes factor much like a p-value (or Z value). There is an "established" scale, analogous to HEP's 5σ rule:

B10          Evidence against H0
1 to 3       Not worth more than a bare mention
3 to 20      Positive
20 to 150    Strong
> 150        Very strong

Kass and Raftery, Bayes Factors, J. Am. Stat. Assoc. 90 (1995) 773.

Rewriting the Bayes factor
Suppose we have models Hi, i = 0, 1, ..., each with a likelihood $L(x|\theta_i, H_i)$ and a prior pdf for its internal parameters, $\pi(\theta_i|H_i)$, so that the full prior is

$$\pi(\theta_i, H_i) = P(H_i)\,\pi(\theta_i|H_i),$$

where P(Hi) is the overall prior probability for Hi. The Bayes factor comparing Hi and Hj can be written as the ratio of posterior to prior odds,

$$B_{ij} = \frac{P(H_i|x)/P(H_j|x)}{P(H_i)/P(H_j)}.$$

Bayes factors independent of P(Hi)
For Bij we need the posterior probabilities marginalized over all of the internal parameters of the models:

$$P(H_i|x) = \int P(H_i, \theta_i|x)\,d\theta_i.$$

Using Bayes' theorem,

$$P(H_i|x) = \frac{P(H_i)\int L(x|\theta_i, H_i)\,\pi(\theta_i|H_i)\,d\theta_i}{P(x)},$$

so the Bayes factor is

$$B_{ij} = \frac{\int L(x|\theta_i, H_i)\,\pi(\theta_i|H_i)\,d\theta_i}{\int L(x|\theta_j, H_j)\,\pi(\theta_j|H_j)\,d\theta_j},$$

a ratio of marginal likelihoods. The prior probabilities pi = P(Hi) cancel.

Numerical determination of Bayes factors
Both the numerator and denominator of Bij are of the form

$$m = \int L(x|\theta)\,\pi(\theta)\,d\theta$$

(a 'marginal likelihood'). There are various ways to compute these, e.g., using sampling of the posterior pdf (which we can do with MCMC):
harmonic mean (and improvements),
importance sampling,
parallel tempering (~thermodynamic integration),
...

Harmonic mean estimator
E.g., consider only one model and write Bayes' theorem as

$$\pi(\theta|x) = \frac{L(x|\theta)\,\pi(\theta)}{m}.$$

π(θ) is normalized to unity, so rearranging and integrating both sides gives

$$\frac{1}{m} = \int \frac{1}{L(x|\theta)}\,\pi(\theta|x)\,d\theta = E\!\left[\frac{1}{L}\,\middle|\,x\right],$$

a posterior expectation. Therefore, sample θ from the posterior via MCMC and estimate m as one over the average of 1/L (the harmonic mean of L).
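
To make this concrete, here is a minimal Python sketch for a toy model (a Gaussian measurement with a Gaussian prior, chosen here for illustration; the exact conjugate posterior stands in for an MCMC sample):

```python
# Harmonic mean estimate of the marginal likelihood m = int L*pi dtheta.
# Toy model: x ~ Gaussian(theta, 1), prior theta ~ Gaussian(0, 2).
import numpy as np
from scipy.stats import norm

x_obs = 1.0
post_sd = 1.0 / np.sqrt(1.0 + 1.0 / 4.0)      # conjugate posterior sd
post_mean = x_obs * post_sd**2                # conjugate posterior mean
theta = norm.rvs(post_mean, post_sd, size=100_000)  # stand-in for MCMC

L = norm.pdf(x_obs, loc=theta, scale=1.0)     # likelihood at each sample
m_hat = 1.0 / np.mean(1.0 / L)                # harmonic mean estimate
# Exact marginal likelihood for this toy: x ~ Gaussian(0, sqrt(1 + 4)).
m_exact = norm.pdf(x_obs, loc=0.0, scale=np.sqrt(5.0))
print(f"harmonic mean: {m_hat:.4f}   exact: {m_exact:.4f}")
```

This toy posterior is well behaved; as the next slide notes, in general the harmonic mean estimator is numerically very unstable.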

Improvements to the harmonic mean estimator
The harmonic mean estimator is numerically very unstable; it formally has infinite variance(!). Gelfand & Dey propose a variant: rearrange Bayes' theorem and multiply both sides by an arbitrary pdf f(θ):

$$\frac{f(\theta)}{m} = \frac{f(\theta)\,\pi(\theta|x)}{L(x|\theta)\,\pi(\theta)}.$$

Integrating over θ,

$$\frac{1}{m} = E\!\left[\frac{f(\theta)}{L(x|\theta)\,\pi(\theta)}\,\middle|\,x\right].$$

Convergence is improved if the tails of f(θ) fall off faster than those of L(x|θ)π(θ). Note the harmonic mean estimator is the special case f(θ) = π(θ).

Importance sampling
Need a pdf f(θ) which we can evaluate at arbitrary θ and also sample with MC. The marginal likelihood can be written

$$m = \int L(x|\theta)\,\pi(\theta)\,d\theta = \int \frac{L(x|\theta)\,\pi(\theta)}{f(\theta)}\,f(\theta)\,d\theta = E_f\!\left[\frac{L(x|\theta)\,\pi(\theta)}{f(\theta)}\right].$$

Best convergence is obtained when f(θ) approximates the shape of L(x|θ)π(θ). Use for f(θ), e.g., a multivariate Gaussian with mean and covariance estimated from the posterior (e.g., with MINUIT).
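
A corresponding Python sketch for the same toy model as above (again an illustration, not code from the course), with a Gaussian proposal roughly matching the posterior:

```python
# Importance-sampling estimate of the marginal likelihood m = E_f[L*pi/f].
# Same toy model: x ~ Gaussian(theta, 1), prior theta ~ Gaussian(0, 2).
import numpy as np
from scipy.stats import norm

x_obs = 1.0
f_mean, f_sd = 0.8, 1.0                        # proposal ~ posterior shape
theta = norm.rvs(f_mean, f_sd, size=100_000)   # sample from f(theta)

L   = norm.pdf(x_obs, loc=theta, scale=1.0)    # likelihood
pri = norm.pdf(theta, loc=0.0, scale=2.0)      # prior
f   = norm.pdf(theta, loc=f_mean, scale=f_sd)  # proposal density
m_hat = np.mean(L * pri / f)                   # E_f[L * pi / f]
print(f"importance sampling: {m_hat:.4f}")     # exact ~ 0.161 for this toy
```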

Bayes factor computation discussion
Also tried the method of parallel tempering; see the note on the course web page. The harmonic mean is OK for a very rough estimate. I had trouble with all of the methods based on posterior sampling. Importance sampling worked best, but may not scale well to higher dimensions. There is much discussion of this problem in the literature.