Statistical Tests and Limits Lecture 2: Limits

Statistical Tests and Limits
Lecture 2: Limits
IN2P3 School of Statistics
Autrans, France, 17–21 May 2010
Glen Cowan
Physics Department, Royal Holloway, University of London
g.cowan@rhul.ac.uk
www.pp.rhul.ac.uk/~cowan
G. Cowan SOS 2010 / Statistical Tests and Limits -- lecture 2

Outline
Lecture 1: General formalism
- Definition and properties of a statistical test
- Significance tests (and goodness-of-fit), p-values
Lecture 2: Setting limits
- Confidence intervals
- Bayesian credible intervals
Lecture 3: Further topics for tests and limits
- More on systematics / nuisance parameters
- Look-elsewhere effect
- CLs
- Bayesian model selection

Interval estimation — introduction
In addition to a 'point estimate' of a parameter we should report an interval reflecting its statistical uncertainty. Desirable properties of such an interval may include:
- communicate objectively the result of the experiment;
- have a given probability of containing the true parameter;
- provide the information needed to draw conclusions about the parameter, possibly incorporating stated prior beliefs.
Often one quotes +/- the estimated standard deviation of the estimator. In some cases, however, this is not adequate: e.g., an estimate near a physical boundary, such as an observed event rate consistent with zero.
We will look at both frequentist and Bayesian intervals.

Frequentist confidence intervals
Consider an estimator θ̂ for a parameter θ and an estimate θ̂_obs. We also need, for all possible θ, the sampling distribution g(θ̂; θ).
Specify upper and lower tail probabilities, e.g., α = 0.05, β = 0.05, then find functions u_α(θ) and v_β(θ) such that
P(θ̂ ≥ u_α(θ)) = α,   P(θ̂ ≤ v_β(θ)) = β.

Confidence interval from the confidence belt
The region between u_α(θ) and v_β(θ) is called the confidence belt. Find the points where the observed estimate θ̂_obs intersects the confidence belt; this gives the confidence interval [a, b].
Confidence level = 1 − α − β = probability for the interval to cover the true value of the parameter (holds for any possible true θ).

Confidence intervals by inverting a test
Confidence intervals for a parameter θ can be found by defining a test of each hypothesized value θ:
- Specify values of the data that are 'disfavoured' by θ (the critical region) such that P(data in critical region) ≤ γ for a prespecified γ, e.g., 0.05 or 0.1.
- If the data are observed in the critical region, reject the value θ.
Now invert the test to define a confidence interval as the set of θ values that would not be rejected in a test of size γ (the confidence level is 1 − γ). The interval will cover the true value of θ with probability ≥ 1 − γ.
This is equivalent to the confidence-belt construction; the confidence belt is the acceptance region of the test.

Relation between confidence interval and p-value
Equivalently, we can consider a significance test for each hypothesized value of θ, resulting in a p-value p_θ. If p_θ < γ, then we reject θ.
The confidence interval at CL = 1 − γ consists of those values of θ that are not rejected. E.g., an upper limit on θ is the greatest value for which p_θ ≥ γ. In practice, find it by setting p_θ = γ and solving for θ.
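The inversion described on this slide can be sketched numerically. This is an illustrative helper (the names `invert_test` and `p_value` are mine, not from the lecture), applied to the textbook case of a single Gaussian measurement with unit standard deviation, where the 95% CL upper limit is x_obs + 1.645:

```python
import math

def invert_test(p_value, lo, hi, gamma=0.05, iters=200):
    # Bisect for the largest theta with p_value(theta) >= gamma,
    # assuming p_value(theta) decreases monotonically in theta.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if p_value(mid) >= gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example: single measurement x_obs of a Gaussian mean theta, sigma = 1.
# The p-value for the "theta too large" direction is
# p_theta = P(X <= x_obs; theta) = Phi(x_obs - theta).
x_obs = 1.3
p = lambda theta: 0.5 * (1.0 + math.erf((x_obs - theta) / math.sqrt(2.0)))
theta_up = invert_test(p, x_obs, x_obs + 10.0)
print(round(theta_up, 3))  # x_obs + 1.645 ≈ 2.945
```

The same bisection works for any monotonic p-value, which is why the "set p_θ = γ and solve" recipe is so generally useful.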

Confidence intervals in practice
The recipe to find the interval [a, b] boils down to solving
P(θ̂ ≥ θ̂_obs; a) = α  →  a is the hypothetical value of θ such that a fraction α of repeated estimates would be at least as high as the one observed;
P(θ̂ ≤ θ̂_obs; b) = β  →  b is the hypothetical value of θ such that a fraction β of repeated estimates would be at least as low as the one observed.

Meaning of a confidence interval

Central vs. one-sided confidence intervals

Intervals from the likelihood function
In the large-sample limit it can be shown for ML estimators that θ̂ follows an n-dimensional Gaussian, centred on the true θ, with covariance V. The log-likelihood is then quadratic near its maximum, and the condition ln L(θ) = ln L_max − Q_γ/2 defines a hyper-ellipsoidal confidence region whose size is set by the constant Q_γ.
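The formulas on this slide did not survive the transcript; a hedged reconstruction of the standard large-sample result, consistent with the recipe on the following slides, is:

```latex
\ln L(\boldsymbol{\theta}) \approx \ln L_{\max}
  - \tfrac{1}{2}\,(\boldsymbol{\theta}-\hat{\boldsymbol{\theta}})^{T}
    V^{-1}(\boldsymbol{\theta}-\hat{\boldsymbol{\theta}}),
\qquad
\text{region:}\quad
\ln L(\boldsymbol{\theta}) \ge \ln L_{\max} - \tfrac{1}{2}\,Q_{\gamma},
\quad Q_{\gamma} = F^{-1}_{\chi^2_n}(1-\gamma)
```

Here F⁻¹ is the quantile of the χ² distribution for n fitted parameters, which gives the stated confidence level in the Gaussian limit.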

Distance between estimated and true θ

Approximate confidence regions from L( ) So the recipe to find the confidence region with CL = 1- is: For finite samples, these are approximate confidence regions. Coverage probability not guaranteed to be equal to 1- ; no simple theorem to say by how far off it will be (use MC). Remember here the interval is random, not the parameter. G. Cowan S0S 2010 / Statistical Tests and Limits -- lecture 2

Example of interval from ln L( ) For n=1 parameter, CL = 0.683, Qg = 1. G. Cowan S0S 2010 / Statistical Tests and Limits -- lecture 2

Setting limits: Poisson data with background
Count n events, e.g., in a fixed time or integrated luminosity, with
- s = expected number of signal events
- b = expected number of background events
so that n ~ Poisson(s + b).
Suppose the number of events found is roughly equal to the expected number of background events, e.g., b = 4.6 and we observe n_obs = 5 events. The evidence for the presence of signal events is not statistically significant, so we set an upper limit on the parameter s, taking into consideration any uncertainty in b.

Upper limit for a Poisson parameter
Find the hypothetical value of s such that there is a given small probability, say γ = 0.05, to find as few events as we did or fewer:
γ = P(n ≤ n_obs; s, b) = Σ_{n=0}^{n_obs} (s + b)^n e^{−(s+b)} / n!
Solve numerically for s = s_up; this gives an upper limit on s at a confidence level of 1 − γ.
Example: suppose b = 0 and we find n_obs = 0. For 1 − γ = 0.95, s_up = −ln γ = −ln 0.05 ≈ 3.00.
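The numerical solution described above can be sketched directly (function names are mine, not from the lecture). The b = 0, n_obs = 0 case reproduces the closed-form −ln(0.05), and the slide's b = 4.6, n_obs = 5 example comes out just below 6:

```python
import math

def poisson_cdf(n, mu):
    # P(N <= n) for N ~ Poisson(mu)
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(n + 1))

def upper_limit(n_obs, b, gamma=0.05):
    # Solve P(N <= n_obs; s + b) = gamma for s by bisection.
    lo, hi = -b, 100.0          # allow s_up < 0 near a physical boundary
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + b) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(upper_limit(0, 0.0), 3))   # b = 0, n_obs = 0:  -ln(0.05) ≈ 2.996
print(round(upper_limit(5, 4.6), 3))   # the b = 4.6, n_obs = 5 example: ≈ 5.91
```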

Calculating Poisson parameter limits
To solve for s_lo and s_up one can exploit the relation between the Poisson cumulative distribution and quantiles of the χ² distribution.
For a low fluctuation of n this can give a negative result for s_up, i.e., the confidence interval is empty.
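The quantile formulas referred to here were rendered as images; a hedged reconstruction of the standard relations, with tail probabilities γ (upper) and α (lower) and F⁻¹ the χ² quantile, is:

```latex
s_{\mathrm{up}} = \tfrac{1}{2}\, F^{-1}_{\chi^2}\!\bigl(1-\gamma;\; 2(n+1)\bigr) - b,
\qquad
s_{\mathrm{lo}} = \tfrac{1}{2}\, F^{-1}_{\chi^2}\!\bigl(\alpha;\; 2n\bigr) - b
```

Subtracting b is what can drive s_up negative when n fluctuates low, as noted above.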

Limits near a physical boundary
Suppose, e.g., b = 2.5 and we observe n = 0. If we choose CL = 0.90, we find from the formula for s_up a negative value, s_up ≈ −0.20.
Physicist: We already knew s ≥ 0 before we started; we can't use a negative upper limit to report the result of an expensive experiment!
Statistician: The interval is designed to cover the true value only 90% of the time — this was clearly not one of those times.
This is not an uncommon dilemma when the limit of a parameter is close to a physical boundary; cf. m_ν² estimated from E² − p².
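For n = 0 the limit has a closed form, so the negative value above is easy to verify; this is a minimal check, not part of the original slide:

```python
import math

# For n = 0 observed, P(N <= 0; s + b) = e^{-(s+b)} = 1 - CL
# gives s_up = -ln(1 - CL) - b in closed form.
b = 2.5
print(round(-math.log(1 - 0.90) - b, 3))  # -0.197: the negative "upper limit"
print(round(-math.log(1 - 0.95) - b, 3))  # 0.496 at 95% CL
```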

Expected limit for s = 0
Physicist: I should have used CL = 0.95 — then s_up = 0.496. Even better: for CL = 0.917923 we get s_up = 10⁻⁴!
Reality check: with b = 2.5, a typical Poisson fluctuation in n is at least √2.5 ≈ 1.6. How can the limit be so low?
Look at the mean (or median) limit for the no-signal hypothesis (s = 0): the sensitivity.
[Figure: distribution of 95% CL limits with b = 2.5, s = 0; mean upper limit = 4.44.]
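The quoted mean limit of 4.44 can be checked deterministically by averaging the limit over the Poisson distribution of n under s = 0; this sketch (helper names are mine) reproduces it to the precision shown:

```python
import math

def pmf(k, mu):
    return math.exp(-mu) * mu**k / math.factorial(k)

def upper_limit(n, b, gamma=0.05):
    # Classical 95% CL upper limit: solve sum_{k<=n} pmf(k, s+b) = gamma.
    lo, hi = -b, 200.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(pmf(k, mid + b) for k in range(n + 1)) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Sensitivity: average the limit over n ~ Poisson(b) for the s = 0 hypothesis.
b = 2.5
mean_limit = sum(pmf(n, b) * upper_limit(n, b) for n in range(30))
print(round(mean_limit, 2))  # ≈ 4.44, as on the slide
```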

Profile likelihood ratio for upper limits
For purposes of setting an upper limit on the strength parameter μ, use the statistic q_μ.
Note that for purposes of setting an upper limit, one does not regard an upward fluctuation of the data as representing incompatibility with the hypothesized μ.
But in contrast to the CSC Higgs combination, here we are letting the estimator for μ go negative (à la Fayard, Andari et al.).
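The definition of q_μ was an image in the transcript; a hedged reconstruction of the standard form, with θ the nuisance parameters, θ̂̂(μ) the conditional ML estimators, and the zero branch implementing "upward fluctuations are not incompatibility", is:

```latex
q_\mu =
\begin{cases}
-2 \ln \lambda(\mu), & \hat{\mu} \le \mu, \\
0, & \hat{\mu} > \mu,
\end{cases}
\qquad
\lambda(\mu) = \frac{L\bigl(\mu,\, \hat{\hat{\boldsymbol{\theta}}}(\mu)\bigr)}
                    {L\bigl(\hat{\mu},\, \hat{\boldsymbol{\theta}}\bigr)}
```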

Alternative test statistic for upper limits
Assume the physical signal model has μ > 0; therefore if the estimator for μ comes out negative, the closest physical model has μ = 0. One could therefore also measure the level of discrepancy between the data and the hypothesized μ with the statistic q̃_μ.
This is in fact the test statistic used in the Higgs CSC combination. Its performance is not identical, but very close, to that of q_μ (previous slide). q_μ is in certain ways simpler and hence preferred.
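Again the formula itself was lost; a hedged reconstruction of the standard definition of q̃_μ, in which the denominator is evaluated at μ = 0 whenever μ̂ goes negative, is:

```latex
\tilde{q}_\mu =
\begin{cases}
-2 \ln \dfrac{L\bigl(\mu,\, \hat{\hat{\boldsymbol{\theta}}}(\mu)\bigr)}
             {L\bigl(0,\, \hat{\hat{\boldsymbol{\theta}}}(0)\bigr)}, & \hat{\mu} < 0, \\[2.5ex]
-2 \ln \dfrac{L\bigl(\mu,\, \hat{\hat{\boldsymbol{\theta}}}(\mu)\bigr)}
             {L\bigl(\hat{\mu},\, \hat{\boldsymbol{\theta}}\bigr)}, & 0 \le \hat{\mu} \le \mu, \\[2.5ex]
0, & \hat{\mu} > \mu
\end{cases}
```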

Relation between test statistics and μ̂
Similarly, q_μ and q̃_μ have a monotonic relation with μ̂, and therefore quantiles of q_μ and q̃_μ can be obtained directly from those of μ̂ (which is Gaussian).
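In the Gaussian (large-sample) approximation the monotonic relation takes an explicit form; a hedged sketch of the standard asymptotic result for the non-zero branch is:

```latex
\hat{\mu} \sim \mathrm{Gauss}(\mu',\, \sigma)
\quad\Longrightarrow\quad
q_\mu = \frac{(\mu - \hat{\mu})^2}{\sigma^2} \quad (\hat{\mu} \le \mu)
```

so a quantile of μ̂ maps directly onto a quantile of q_μ.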

Distribution of q_μ
Similar results hold for q̃_μ.

Distribution of q̃_μ
Similar results hold as for q_μ.

An example
O. Vitells, E. Gross

The Bayesian approach
In Bayesian statistics we need to start with a 'prior pdf' p(θ), which reflects the degree of belief about θ before doing the experiment. Bayes' theorem tells how our beliefs should be updated in light of the data x.
Integrate the posterior pdf p(θ | x) to give an interval with any desired probability content. For, e.g., a Poisson parameter, the 95% CL upper limit follows from requiring 95% of the posterior probability to lie below s_up.
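The formulas on this slide were images; written out in the notation used here (prior p(θ), likelihood L), Bayes' theorem and the upper-limit condition read:

```latex
p(\theta \mid x) = \frac{L(x \mid \theta)\, p(\theta)}
                        {\int L(x \mid \theta')\, p(\theta')\, d\theta'},
\qquad
0.95 = \int_{-\infty}^{\,s_{\mathrm{up}}} p(s \mid n)\, ds
```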

Bayesian prior for a Poisson parameter
Include the knowledge that s ≥ 0 by setting the prior p(s) = 0 for s < 0. Often one tries to reflect 'prior ignorance' with, e.g., a prior that is flat for s ≥ 0.
- This is not normalized, but that is OK as long as L(s) dies off for large s.
- It is not invariant under a change of parameter: if we had instead used a flat prior for, say, the mass of the Higgs boson, this would imply a non-flat prior for the expected number of Higgs events.
- It doesn't really reflect a reasonable degree of belief, but it is often used as a point of reference; or it can be viewed as a recipe for producing an interval whose frequentist properties can be studied (the coverage will depend on the true s).

Bayesian interval with flat prior for s
Solve numerically to find the limit s_up.
- For the special case b = 0, the Bayesian upper limit with a flat prior is numerically the same as the classical case ('coincidence').
- Otherwise the Bayesian limit is everywhere greater than the classical one ('conservative').
- It never goes negative, and it doesn't depend on b if n = 0.
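A simple numerical integration of the posterior illustrates the properties listed above; this is a sketch with a crude Riemann sum (the function name is mine), checking both the b = 0 coincidence with the classical limit and the independence of b when n = 0:

```python
import math

def bayes_upper_limit(n, b, cl=0.95, s_max=40.0, steps=100000):
    # Posterior with flat prior for s >= 0:  p(s|n) ∝ (s+b)^n e^{-(s+b)}
    ds = s_max / steps
    post = [(i * ds + b) ** n * math.exp(-(i * ds + b)) for i in range(steps + 1)]
    total = sum(post)
    cum = 0.0
    for i, p in enumerate(post):
        cum += p
        if cum / total >= cl:          # first s with 95% posterior mass below it
            return i * ds
    return s_max

print(round(bayes_upper_limit(0, 0.0), 2))  # ≈ 3.00: same as the classical limit
print(round(bayes_upper_limit(0, 2.5), 2))  # ≈ 3.00: independent of b when n = 0
```

With n = 0 the posterior is proportional to e^{-s} regardless of b, which is why the limit never goes negative and does not depend on the background.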

Priors from formal rules
Because of difficulties in encoding a vague degree of belief in a prior, one often attempts to derive the prior from formal rules, e.g., to satisfy certain invariance principles or to provide maximum information gain for a certain set of measurements.
Such priors are often called 'objective priors' and form the basis of objective Bayesian statistics. They do not reflect a degree of belief (but might represent possible extreme cases).
In a subjective Bayesian analysis, using objective priors is an important part of the sensitivity analysis. In an objective Bayesian analysis, one can use the intervals in a frequentist way, i.e., regard Bayes' theorem as a recipe to produce an interval with certain coverage properties.
For a review see:

Jeffreys' prior
According to Jeffreys' rule, take the prior proportional to the square root of the determinant of the Fisher information matrix I(θ). One can show that this leads to inference that is invariant under a transformation of parameters.
For a Gaussian mean the Jeffreys prior is constant; for a Poisson mean μ it is proportional to 1/√μ.
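Written out, Jeffreys' rule and the Fisher information are:

```latex
\pi(\boldsymbol{\theta}) \propto \sqrt{\det I(\boldsymbol{\theta})},
\qquad
I_{ij}(\boldsymbol{\theta}) = -\,E\!\left[\frac{\partial^2 \ln L}
      {\partial \theta_i\, \partial \theta_j}\right]
```

For example, for n ~ Poisson(μ) one has ln L = n ln μ − μ + const, so I(μ) = E[n]/μ² = 1/μ, giving π(μ) ∝ 1/√μ as stated above.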

Likelihood ratio limits (Feldman-Cousins)
Define the likelihood ratio for a hypothesized parameter value s:
λ(s) = P(n; s + b) / P(n; ŝ + b),
where ŝ is the ML estimator; note ŝ = max(0, n − b), so λ(s) ≤ 1.
The critical region is defined by low values of the likelihood ratio. The resulting intervals can be one- or two-sided (depending on n).
(Re)discovered for HEP by Feldman and Cousins, Phys. Rev. D 57 (1998) 3873.

More on intervals from the LR test (Feldman-Cousins)
Caveat with coverage: suppose we find n >> b. Usually one then quotes a measurement ŝ ± σ_ŝ. If, however, n isn't large enough to claim discovery, one sets a limit on s.
FC pointed out that if this decision is made based on n, then the actual coverage probability of the interval can be less than the stated confidence level ('flip-flopping'). FC intervals remove this, providing a smooth transition from one- to two-sided intervals, depending on n.
But suppose FC gives, e.g., 0.1 < s < 5 at 90% CL; the p-value of s = 0 is still substantial. Is part of the upper limit 'wasted'?

Properties of upper limits
Example: take b = 5.0, 1 − γ = 0.95.
[Figures: upper limit s_up vs. n; mean upper limit vs. s.]

Upper limit versus b
Feldman & Cousins, PRD 57 (1998) 3873
If n = 0 is observed, should the upper limit depend on b?
- Classical: yes
- Bayesian: no
- FC: yes

Coverage probability of intervals
Because of the discreteness of Poisson data, the probability for the interval to include the true value is in general greater than the confidence level ('over-coverage').
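The over-coverage is easy to exhibit numerically: for a true s, the classical 95% CL upper limit covers whenever the observed n yields a limit at or above s, so the coverage is a sum of Poisson probabilities. A sketch (helper names are mine), here for b = 0:

```python
import math

def pmf(k, mu):
    return math.exp(-mu) * mu**k / math.factorial(k)

def upper_limit(n, b, gamma=0.05):
    # Classical upper limit: solve sum_{k<=n} pmf(k, s+b) = gamma by bisection.
    lo, hi = -b, 200.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(pmf(k, mid + b) for k in range(n + 1)) > gamma:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coverage(s, b, gamma=0.05, n_max=40):
    # P(s <= s_up(n)) with n ~ Poisson(s + b): sum the probability of all
    # outcomes n whose limit would have covered this true s.
    limits = [upper_limit(n, b, gamma) for n in range(n_max)]
    return sum(pmf(n, s + b) for n in range(n_max) if limits[n] >= s)

for s in (0.5, 1.0, 3.0, 5.0):
    print(s, round(coverage(s, 0.0), 3))  # always >= 0.95, often well above
```

The coverage equals 0.95 only at isolated values of s and jumps above it elsewhere, which is the over-coverage described on the slide.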

Wrapping up lecture 2
In the large-sample limit and away from physical boundaries, +/- one standard deviation is all you need for 68% CL.
Frequentist confidence intervals:
- Complicated! A random interval that contains the true parameter with fixed probability.
- Can be obtained by inversion of a test; freedom is left as to the choice of test.
- The log-likelihood can be used to determine approximate confidence intervals (or regions).
Bayesian intervals:
- Conceptually easy — just integrate the posterior pdf.
- Requires a choice of prior.

Extra slides