Statistical Tests and Limits. Lecture 1: General Formalism. IN2P3 School of Statistics, Autrans, France. Glen Cowan, SOS 2010.


Statistical Tests and Limits
Lecture 1: general formalism
IN2P3 School of Statistics, Autrans, France, 17-21 May 2010
Glen Cowan, Physics Department, Royal Holloway, University of London

Outline
Lecture 1: General formalism
  Definition and properties of a statistical test
  Significance tests (and goodness-of-fit), p-values
Lecture 2: Setting limits
  Confidence intervals
  Bayesian credible intervals
Lecture 3: Further topics for tests and limits
  More on systematics / nuisance parameters
  Look-elsewhere effect
  CLs
  Bayesian model selection

Hypotheses
A hypothesis H specifies the probability for the data, i.e., the outcome of the observation, here symbolically: x. x could be uni-/multivariate, continuous or discrete. E.g. write x ~ f(x|H). Possible values of x form the sample space S (or "data space"). Simple (or "point") hypothesis: f(x|H) completely specified. Composite hypothesis: H contains unspecified parameter(s). The probability for x given H is also called the likelihood of the hypothesis, written L(x|H).

Definition of a test
Consider e.g. a simple hypothesis H₀ and alternative H₁. A test of H₀ is defined by specifying a critical region W of the data space such that there is no more than some (small) probability α, assuming H₀ is correct, to observe the data there, i.e.,

  P(x ∈ W | H₀) ≤ α

If x is observed in the critical region, reject H₀. α is called the size or significance level of the test. The critical region is also called the "rejection" region; its complement is the acceptance region.

Frequentist statistics: general philosophy
In frequentist statistics, probabilities are associated only with the data, i.e., outcomes of repeatable observations. Probability = limiting frequency. Probabilities such as P(Higgs boson exists), P(0.117 < αs < 0.121), etc. are either 0 or 1, but we don't know which. The tools of frequentist statistics tell us what to expect, under the assumption of certain probabilities, about hypothetical repeated observations. The preferred theories (models, hypotheses, ...) are those for which our observations would be considered 'usual'.

Bayesian statistics: general philosophy
In Bayesian statistics, the interpretation of probability is extended to degree of belief (subjective probability). Use this for hypotheses; Bayes' theorem gives

  P(H|x) = P(x|H) π(H) / Σᵢ P(x|Hᵢ) π(Hᵢ)

where P(H|x) is the posterior probability (i.e., after seeing the data), π(H) is the prior probability (before seeing the data), P(x|H) is the probability of the data assuming hypothesis H (the likelihood), and the normalization involves a sum over all possible hypotheses. Bayesian methods can provide a more natural treatment of non-repeatable phenomena: systematic uncertainties, the probability that the Higgs boson exists, ... There is no golden rule for priors ("if-then" character of Bayes' thm.).

Rejecting a hypothesis
Note that rejecting H₀ is not necessarily equivalent to the statement that we believe it is false and H₁ true. In frequentist statistics we only associate probability with outcomes of repeatable observations (the data). In Bayesian statistics, the probability of the hypothesis (degree of belief) would be found using Bayes' theorem:

  P(H|x) ∝ P(x|H) π(H)

which depends on the prior probability π(H). What makes a frequentist test useful is that we can compute the probability to accept/reject a hypothesis assuming that it is true, or assuming some alternative is true.

Type-I, Type-II errors
Rejecting the hypothesis H₀ when it is true is a Type-I error. The maximum probability for this is the size of the test: P(x ∈ W | H₀) ≤ α. But we might also accept H₀ when it is false and an alternative H₁ is true. This is called a Type-II error, and occurs with probability

  P(x ∈ S − W | H₁) = β

One minus this is called the power of the test with respect to the alternative H₁: power = 1 − β.

Statistical test in a particle physics context
Suppose the result of a measurement for an individual event is a collection of numbers: x₁ = number of muons, x₂ = mean p_T of jets, x₃ = missing energy, ...; x follows some n-dimensional joint pdf, which depends on the type of event produced, i.e., on which reaction took place. For each reaction we consider we will have a hypothesis for the pdf of x, e.g., f(x|H₀), f(x|H₁), etc. Often call H₀ the background hypothesis (e.g. SM events); H₁, H₂, ... are possible signal hypotheses.

A simulated SUSY event in ATLAS
[Event display: a simulated pp collision with high-p_T muons, high-p_T jets of hadrons, and missing transverse energy.]

Background events
This event from Standard Model ttbar production also has high-p_T jets and muons, and some missing transverse energy → it can easily mimic a SUSY event.

Selecting events
Suppose we have a data sample with two kinds of events, corresponding to hypotheses H₀ and H₁, and we want to select those of type H₀. Each event is a point in x-space. What 'decision boundary' should we use to accept/reject events as belonging to event type H₀? Perhaps select events with 'cuts' on the individual variables, defining a rectangular acceptance region.

Other ways to select events
Or maybe use some other sort of decision boundary, e.g., a linear or nonlinear boundary in x-space. How can we do this in an 'optimal' way?

Test statistics
Construct a 'test statistic' of lower dimension (e.g. a scalar) t(x₁, ..., xₙ). The goal is to compactify the data without losing the ability to discriminate between hypotheses. We can work out the pdfs g(t|H₀) and g(t|H₁). The decision boundary is now a single 'cut' on t. This effectively divides the sample space into two regions, where we accept or reject H₀, and thus defines a statistical test.

Significance level and power
Probability to reject H₀ if it is true (Type-I error):

  α = P(reject H₀ | H₀)   (significance level)

Probability to accept H₀ if H₁ is true (Type-II error):

  β = P(accept H₀ | H₁)   (1 − power)

Signal/background efficiency
Probability to reject the background hypothesis for a background event (background efficiency):

  ε_b = P(t < t_cut | b)

Probability to accept a signal event as signal (signal efficiency):

  ε_s = P(t < t_cut | s)

Purity of event selection
Suppose only one background type b; the overall fractions of signal and background events are π_s and π_b (prior probabilities). Suppose we select events with t < t_cut. What is the 'purity' of our selected sample? Here purity means the probability to be signal given that the event was accepted. Using Bayes' theorem we find:

  P(s | t < t_cut) = ε_s π_s / (ε_s π_s + ε_b π_b)

So the purity depends on the prior probabilities as well as on the signal and background efficiencies.
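A quick numerical sketch of this formula; the efficiencies and prior fractions below are made-up numbers, purely for illustration:

```python
def purity(eff_s, eff_b, pi_s, pi_b):
    """P(signal | accepted) = eff_s*pi_s / (eff_s*pi_s + eff_b*pi_b),
    from Bayes' theorem."""
    return eff_s * pi_s / (eff_s * pi_s + eff_b * pi_b)

# Hypothetical values: 90% signal efficiency, 1% background efficiency,
# but signal makes up only 0.1% of all events.
p = purity(0.9, 0.01, 0.001, 0.999)  # about 0.08: still background-dominated
```

Even a tight selection can give low purity if the signal's prior fraction is tiny, which is the point of the slide.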

Constructing a test statistic
How can we select events in an 'optimal' way? The Neyman-Pearson lemma states: to get the highest ε_s for a given ε_b (highest power for a given significance level), choose the acceptance region such that

  f(x|H₁) / f(x|H₀) > c

where c is a constant which determines ε_s. Equivalently, the optimal scalar test statistic is

  t(x) = f(x|H₁) / f(x|H₀)

N.B. any monotonic function of this leads to the same test.
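A minimal sketch of the lemma's content, assuming two Gaussian densities of equal width (my own illustrative choice, not from the slides): the likelihood ratio is then monotone in x, so a cut on the ratio is equivalent to a simple cut on x.

```python
import math

def gauss_pdf(x, mu, sigma):
    """Gaussian probability density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def lr(x, mu0=0.0, mu1=1.0, sigma=1.0):
    """Optimal test statistic t(x) = f(x|H1) / f(x|H0)."""
    return gauss_pdf(x, mu1, sigma) / gauss_pdf(x, mu0, sigma)

# The ratio increases monotonically with x, so {lr(x) > c} is just {x > x_cut}.
```

For less trivial densities the likelihood-ratio contour is a genuinely nonlinear boundary in x-space.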

Proof of the Neyman-Pearson lemma
We want to determine the critical region W that maximizes the power

  1 − β = ∫_W P(x|H₁) dx

subject to the constraint

  ∫_W P(x|H₀) dx = α

First, include in W all points where P(x|H₀) = 0, as they contribute nothing to the size but potentially increase the power.

Proof of the Neyman-Pearson lemma (2)
For P(x|H₀) ≠ 0 we can write the power as

  1 − β = ∫_W [P(x|H₁)/P(x|H₀)] P(x|H₀) dx

The ratio of 1 − β to α is therefore

  (1 − β)/α = ∫_W [P(x|H₁)/P(x|H₀)] P(x|H₀) dx / ∫_W P(x|H₀) dx

which is the average of the likelihood ratio P(x|H₁)/P(x|H₀) over the critical region W, assuming H₀. (1 − β)/α is thus maximized if W contains the part of the sample space with the largest values of the likelihood ratio.

Purity vs. efficiency: optimal trade-off
Consider selecting n events: expected numbers s from signal, b from background → n ~ Poisson(s + b). Suppose b is known and the goal is to estimate s with minimum relative statistical error. Take as estimator ŝ = n − b. The variance of a Poisson variable equals its mean, therefore

  V[ŝ] = V[n] = s + b  →  σ_ŝ / s = √(s + b) / s

So we should maximize s / √(s + b), equivalent to maximizing the product of signal efficiency × purity.

Two distinct event selection problems
In some cases, the event types in question are both known to exist. Example: separation of different particle types (electron vs. muon); use the selected sample for further study. In other cases, the null hypothesis H₀ means "Standard Model" events, and the alternative H₁ means "events of a type whose existence is not yet established" (to do so is the goal of the analysis). There are many subtle issues here, mainly related to the heavy burden of proof required to establish the presence of a new phenomenon. Typically one requires the p-value of the background-only hypothesis below ~2.9 × 10⁻⁷ (a 5 sigma effect) to claim discovery of "New Physics".

Using a test for discovery
[Figure: pdf f(y) for signal and background, normalized to unity; histogram N(y), normalized to the expected number of events, showing a possible excess above y_cut in the search region.]
Discovery = the number of events found in the search region y > y_cut is incompatible with the background-only hypothesis. The p-value of the background-only hypothesis can depend crucially on the distribution f(y|b) in the "search region".

Multivariate methods
Many new (and some old) methods for determining the test:
  Fisher discriminant
  Neural networks
  Kernel density methods
  Support Vector Machines
  Decision trees
    Boosting
    Bagging
New software for HEP, e.g., TMVA (Höcker, Stelzer, Tegenfeldt, Voss, Voss; physics/). More on this in the lectures by Kegl, Feindt, Hirshbuehl, Coadou. For the rest of these lectures, I will focus on other aspects of tests, e.g., discovery significance and exclusion limits.

Testing significance / goodness-of-fit
Suppose a hypothesis H predicts the pdf f(x|H) for a set of observations x = (x₁, ..., xₙ). We observe a single point in this space: x_obs. What can we say about the validity of H in light of the data? Decide what part of the data space represents less compatibility with H than does the point x_obs. (This choice is not unique!)

p-values
Express the level of agreement between data and hypothesis by giving the p-value for H: p = probability, under assumption of H, to observe data with equal or lesser compatibility with H relative to the data we got. This is not the probability that H is true! In frequentist statistics we don't talk about P(H) (unless H represents a repeatable observation). In Bayesian statistics we do; use Bayes' theorem to obtain

  P(H|x) ∝ P(x|H) π(H)

where π(H) is the prior probability for H. For now stick with the frequentist approach; the result is a p-value, regrettably easy to misinterpret as P(H).

p-value example: testing whether a coin is 'fair'
The probability to observe n heads in N coin tosses is binomial:

  P(n; N, p) = [N! / (n!(N−n)!)] pⁿ (1−p)^(N−n)

Hypothesis H: the coin is fair (p = 0.5). Suppose we toss the coin N = 20 times and get n = 17 heads. The region of data space with equal or lesser compatibility with H relative to n = 17 is: n = 17, 18, 19, 20, 0, 1, 2, 3. Adding up the probabilities for these values gives p = 0.0026, i.e., p = 0.0026 is the probability of obtaining such a bizarre result (or more so) 'by chance', under the assumption of H.
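The sum above can be checked directly; this sketch just adds the binomial probabilities over the stated region:

```python
import math

def binom_pmf(n, N, p=0.5):
    """Binomial probability of n heads in N tosses."""
    return math.comb(N, n) * p ** n * (1.0 - p) ** (N - n)

# Region of equal or lesser compatibility with the fair-coin hypothesis:
region = list(range(17, 21)) + list(range(0, 4))  # 17..20 and 0..3 heads
p_value = sum(binom_pmf(n, 20) for n in region)    # about 0.0026
```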

The significance of an observed signal
Suppose we observe n events; these can consist of: n_b events from known processes (background) and n_s events from a new process (signal). If n_s, n_b are Poisson r.v.s with means s, b, then n = n_s + n_b is also Poisson, with mean s + b:

  P(n; s, b) = (s + b)ⁿ e^−(s+b) / n!

Suppose b = 0.5, and we observe n_obs = 5. Should we claim evidence for a new discovery? Give the p-value for the hypothesis s = 0:

  p = P(n ≥ 5; b = 0.5) = 1.7 × 10⁻⁴
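The p-value quoted above can be reproduced in a few lines (a sketch using only the standard library):

```python
import math

def pois_pmf(n, mu):
    """Poisson probability of observing n with mean mu."""
    return math.exp(-mu) * mu ** n / math.factorial(n)

b, n_obs = 0.5, 5
# p-value of s = 0: probability of a fluctuation at least as large as observed
p_value = 1.0 - sum(pois_pmf(n, b) for n in range(n_obs))  # about 1.7e-4
```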

Significance from p-value
Often define the significance Z as the number of standard deviations that a Gaussian variable would fluctuate in one direction to give the same p-value:

  p = 1 − Φ(Z),  Z = Φ⁻¹(1 − p)

In ROOT: p = 1 - TMath::Freq(Z), Z = TMath::NormQuantile(1 - p).
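The ROOT calls can be mimicked in plain Python; here Φ⁻¹ is obtained by simple bisection (a sketch standing in for TMath::NormQuantile):

```python
import math

def phi(x):
    """Standard normal CDF, the counterpart of TMath::Freq."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def significance(p):
    """Z = Phi^{-1}(1 - p), found by bisection on [-10, 10]."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if phi(mid) < 1.0 - p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

z = significance(1.7e-4)  # about 3.6
```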

The significance of a peak
Suppose we measure a value x for each event and find: [histogram with an apparent two-bin peak over a smooth background]. Each bin (observed) is a Poisson r.v., with means given by the dashed lines. In the two bins with the peak, 11 entries are found with b = 3.2 expected. The p-value for the s = 0 hypothesis is:

  p = P(n ≥ 11; b = 3.2) = 5.0 × 10⁻⁴
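Same computation as in the counting example, now with b = 3.2 and 11 observed entries (sketch):

```python
import math

def pois_pmf(n, mu):
    """Poisson probability of observing n with mean mu."""
    return math.exp(-mu) * mu ** n / math.factorial(n)

b, n_obs = 3.2, 11
p_value = 1.0 - sum(pois_pmf(n, b) for n in range(n_obs))  # about 5.0e-4
```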

The significance of a peak (2)
But... did we know where to look for the peak? → give P(n ≥ 11) in any 2 adjacent bins. Is the observed width consistent with the expected x resolution? → take an x window several times the expected resolution. How many bins × distributions have we looked at? → look at a thousand of them and you'll find a 10⁻³ effect. Did we adjust the cuts to 'enhance' the peak? → freeze the cuts, repeat the analysis with new data. How about the bins to the sides of the peak... (too low!) Should we publish????

When to publish
HEP folklore is to claim discovery when p = 2.9 × 10⁻⁷, corresponding to a significance Z = 5. This is very subjective and really should depend on the prior probability of the phenomenon in question, e.g.:

  phenomenon         reasonable p-value for discovery
  D⁰-D⁰bar mixing    ~0.05
  Higgs              ~10⁻⁷ (?)
  Life on Mars       ~10⁻¹⁰
  Astrology          ~10⁻²⁰

One should also consider the degree to which the data are compatible with the new phenomenon, not only the level of disagreement with the null hypothesis; the p-value is only the first step!

Prototype search analysis
Search for a signal in a region of phase space; the result is a histogram of some variable x giving numbers n = (n₁, ..., n_N). Assume the n_i are Poisson distributed with expectation values

  E[n_i] = μ s_i + b_i

where s_i and b_i are the expected signal and background in bin i, and μ is the signal strength parameter (μ = 0: background only). [Expected Performance of the ATLAS Experiment: Detector, Trigger and Physics, arXiv: , CERN-OPEN]

Prototype analysis (II)
Often we also have a subsidiary measurement that constrains some of the background and/or shape parameters: m = (m₁, ..., m_M). Assume the m_i are Poisson distributed with expectation values u_i(θ), where θ = (θ_s, θ_b, b_tot) are nuisance parameters. The likelihood function is

  L(μ, θ) = ∏_j [(μ s_j + b_j)^{n_j} / n_j!] e^{−(μ s_j + b_j)} × ∏_k [u_k^{m_k} / m_k!] e^{−u_k}

The profile likelihood ratio
Base the significance test on the profile likelihood ratio:

  λ(μ) = L(μ, θ̂̂(μ)) / L(μ̂, θ̂)

where θ̂̂(μ) maximizes L for the specified μ (conditional MLE), and μ̂, θ̂ maximize L (unconditional MLEs). The likelihood ratio gives the optimum test between two point hypotheses (Neyman-Pearson lemma). It should be near-optimal in the present analysis with variable μ and nuisance parameters θ.

Test statistic for discovery
Try to reject the background-only (μ = 0) hypothesis using

  q₀ = −2 ln λ(0)  if μ̂ ≥ 0,  q₀ = 0 otherwise

i.e. only regard an upward fluctuation of the data as evidence against the background-only hypothesis.

p-value for discovery
Large q₀ means increasing incompatibility between the data and the hypothesis; therefore the p-value for an observed q₀,obs is

  p₀ = ∫_{q₀,obs}^∞ f(q₀|0) dq₀

(we will get a formula for this later). From the p-value get the equivalent significance Z = Φ⁻¹(1 − p₀).

Expected (or median) significance / sensitivity
When planning the experiment, we want to quantify how sensitive we are to a potential discovery, e.g., by the median significance assuming some nonzero strength parameter μ′. So for the p-value we need f(q₀|0); for the sensitivity we will need f(q₀|μ′).

Wald approximation for the profile likelihood ratio
To find p-values we need f(q₀|0); for the median significance under the alternative we need f(q₀|μ′). Use the approximation due to Wald (1943):

  −2 ln λ(μ) = (μ − μ̂)² / σ² + O(1/√N)

where μ̂ ~ Gaussian(μ′, σ) and N is the sample size.

Noncentral chi-square for −2 ln λ(μ)
If we can neglect the O(1/√N) term, −2 ln λ(μ) follows a noncentral chi-square distribution for one degree of freedom with noncentrality parameter

  Λ = (μ − μ′)² / σ²

As a special case, if μ′ = μ then Λ = 0 and −2 ln λ(μ) follows a chi-square distribution for one degree of freedom (Wilks).

Distribution of q₀
Assuming the Wald approximation, we can write down the full distribution of q₀ as

  f(q₀|μ′) = [1 − Φ(μ′/σ)] δ(q₀) + (1/2) (1/√(2π q₀)) exp[−(√q₀ − μ′/σ)²/2]

The special case μ′ = 0 is a "half chi-square" distribution:

  f(q₀|0) = (1/2) δ(q₀) + (1/2) (1/√(2π q₀)) e^{−q₀/2}

Cumulative distribution of q₀, significance
From the pdf, the cumulative distribution of q₀ is found to be

  F(q₀|μ′) = Φ(√q₀ − μ′/σ)

The special case μ′ = 0 is F(q₀|0) = Φ(√q₀). The p-value of the μ = 0 hypothesis is p₀ = 1 − F(q₀|0), and therefore the discovery significance Z is simply

  Z = Φ⁻¹(1 − p₀) = √q₀
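The Z = √q₀ relation can be checked with a toy: under the Wald approximation with μ′ = 0, μ̂/σ is standard Gaussian and q₀ = (μ̂/σ)² when μ̂ > 0, else 0. A sketch of the half chi-square result:

```python
import math, random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(1)
q0 = []
for _ in range(200000):
    z = random.gauss(0.0, 1.0)          # plays the role of muhat/sigma
    q0.append(z * z if z > 0.0 else 0.0)

q0_obs = 4.0
p_mc = sum(q > q0_obs for q in q0) / len(q0)   # Monte Carlo p-value
p_asym = 1.0 - phi(math.sqrt(q0_obs))          # asymptotic formula: Z = 2
```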

The Asimov data set
To estimate the median value of −2 ln λ(μ), consider a special data set where all statistical fluctuations are suppressed and n_i, m_i are replaced by their expectation values (the "Asimov" data set):

  n_i → μ′ s_i + b_i,   m_i → u_i(θ)

The Asimov value of −2 ln λ(μ) gives the noncentrality parameter Λ, or equivalently σ.

Relation between test statistics
[Slide equations not transcribed.]

Higgs search with profile likelihood
Combination of Higgs boson search channels (ATLAS). Expected Performance of the ATLAS Experiment: Detector, Trigger and Physics, arXiv: , CERN-OPEN. Standard Model Higgs channels considered (more to be used later):
  H → γγ
  H → WW(*) → eνμν
  H → ZZ(*) → 4l (l = e, μ)
  H → τ⁺τ⁻ → ll, lh
Used the profile likelihood method for systematic uncertainties: background rates, signal & background shapes.

An example: ATLAS Higgs search
(ATLAS Collab., CERN-OPEN)
[Figure not transcribed.]

Cumulative distributions of q₀
To validate to the 5σ level, one needs the distribution out to q₀ = 25, i.e., around 10⁸ simulated experiments. Will do this if we really see something like a discovery.

Combined discovery significance
Discovery significance (in colour) vs. L, m_H. The approximations used here are not always accurate for L < 2 fb⁻¹, but in most cases conservative.

Discovery significance for n ~ Poisson(s + b)
Consider again the case where we observe n events, modelled as following a Poisson distribution with mean s + b (assume b is known).
1) For an observed n, what is the significance Z₀ with which we would reject the s = 0 hypothesis?
2) What is the expected (or more precisely, median) Z₀ if the true value of the signal rate is s?

Gaussian approximation for Poisson significance
For large s + b, n → x ~ Gaussian(μ, σ), with μ = s + b, σ = √(s + b). For an observed value x_obs, the p-value of s = 0 is Prob(x > x_obs | s = 0):

  p = 1 − Φ((x_obs − b)/√b)

The significance for rejecting s = 0 is therefore

  Z = (x_obs − b)/√b

The expected (median) significance assuming signal rate s is

  Z = s/√b

Better approximation for Poisson significance
The likelihood function for the parameter s is

  L(s) = (s + b)ⁿ e^{−(s+b)} / n!

or equivalently the log-likelihood is

  ln L(s) = n ln(s + b) − (s + b) + const.

Find the maximum by setting ∂lnL/∂s = 0; this gives the estimator for s:

  ŝ = n − b

Approximate Poisson significance (continued)
The likelihood ratio statistic for testing s = 0 is

  q₀ = −2 ln [L(0)/L(ŝ)] = 2 (n ln(n/b) − (n − b))   for ŝ ≥ 0, else 0

For sufficiently large s + b (using Wilks' theorem), Z₀ = √q₀. To find median[Z₀|s], let n → s + b:

  median[Z₀|s] = √( 2 ( (s + b) ln(1 + s/b) − s ) )

This reduces to s/√b for s << b.
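The two approximations can be compared directly (sketch):

```python
import math

def z_naive(s, b):
    """Simple estimate s / sqrt(b)."""
    return s / math.sqrt(b)

def z_asimov(s, b):
    """Median significance from the likelihood-ratio statistic."""
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))

# Agreement for s << b; for small b the naive formula overstates Z.
z1, z2 = z_naive(5.0, 100.0), z_asimov(5.0, 100.0)
z3, z4 = z_naive(5.0, 1.0), z_asimov(5.0, 1.0)
```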

Wrapping up lecture 1
General framework of a statistical test: divide the data space into two regions; depending on where the data are then observed, accept or reject the hypothesis. Properties: significance level (rate of Type-I error) and power (one minus rate of Type-II error). Significance tests (also for goodness-of-fit): p-value = probability to see a level of incompatibility between data and hypothesis equal to or greater than the level found with the actual data.

Extra slides

Pearson's χ² statistic
Test statistic for comparing observed data n_i (independent) to predicted mean values ν_i:

  χ² = Σᵢ (n_i − ν_i)² / V[n_i]

For n_i ~ Poisson(ν_i) we have V[n_i] = ν_i, so this becomes

  χ² = Σᵢ (n_i − ν_i)² / ν_i   (Pearson's χ² statistic)

χ² = sum of squares of the deviations of the ith measurement from the ith prediction, using σᵢ = √νᵢ as the 'yardstick' for the comparison.

Pearson's χ² test
If the n_i are Gaussian with means ν_i and std. devs. σ_i, i.e., n_i ~ N(ν_i, σ_i²), then Pearson's χ² will follow the χ² pdf (here for χ² = z):

  f(z; N) = z^{N/2−1} e^{−z/2} / (2^{N/2} Γ(N/2))

If the n_i are Poisson with ν_i >> 1 (in practice OK for ν_i > 5), then the Poisson dist. becomes Gaussian and therefore Pearson's χ² statistic here as well follows the χ² pdf. The χ² value obtained from the data then gives the p-value:

  p = ∫_{χ²}^∞ f(z; N) dz
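For even N the p-value integral has a closed form via the chi-square/Poisson identity, which is enough for a standard-library sketch:

```python
import math

def chi2_sf(x, ndof):
    """Upper-tail chi-square probability P(z > x) for even ndof,
    using P(chi2_N > x) = P(Poisson(x/2) <= N/2 - 1)."""
    assert ndof % 2 == 0
    return sum(math.exp(-x / 2.0) * (x / 2.0) ** k / math.factorial(k)
               for k in range(ndof // 2))

p = chi2_sf(29.8, 20)  # about 0.073
```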

The 'χ² per degree of freedom'
Recall that for the chi-square pdf for N degrees of freedom, E[z] = N. This makes sense: if the hypothesized ν_i are right, the rms deviation of n_i from ν_i is σ_i, so each term in the sum contributes ~1. One often sees χ²/N reported as a measure of goodness-of-fit. But... it is better to give χ² and N separately. Consider, e.g., the same χ²/N at N = 10 versus N = 100: for N large, even a χ² per dof only a bit greater than one can imply a small p-value, i.e., poor goodness-of-fit.
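This point is easy to demonstrate numerically: the same χ²/N at different N gives very different p-values (sketch; chi2_sf uses the even-N chi-square/Poisson identity):

```python
import math

def chi2_sf(x, ndof):
    """Chi-square upper-tail probability, valid for even ndof."""
    return sum(math.exp(-x / 2.0) * (x / 2.0) ** k / math.factorial(k)
               for k in range(ndof // 2))

# chi2 / N = 1.5 in both cases:
p_small = chi2_sf(15.0, 10)    # moderate p-value, fit looks acceptable
p_large = chi2_sf(150.0, 100)  # tiny p-value, fit is poor
```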

Pearson's χ² with multinomial data
If n_tot = Σᵢ n_i is fixed, then we might model n_i ~ binomial(n_tot, p_i) with p_i = ν_i / n_tot, i.e., (n₁, ..., n_N) ~ multinomial. In this case we can take Pearson's χ² statistic to be

  χ² = Σᵢ (n_i − n_tot p_i)² / (n_tot p_i)

If all n_tot p_i >> 1, then this will follow the chi-square pdf for N − 1 degrees of freedom.

Example of a χ² test
[Histogram of data compared with the predicted curve.] This gives χ² = 29.8 for N = 20 dof. Now we need to find the p-value, but... many bins have few (or no) entries, so here we do not expect χ² to follow the chi-square pdf.

Using MC to find the distribution of the χ² statistic
The Pearson χ² statistic still reflects the level of agreement between data and prediction, i.e., it is still a 'valid' test statistic. To find its sampling distribution, simulate the data with a Monte Carlo program. Here the data sample was simulated 10⁶ times. The fraction of times we find χ² > 29.8 gives the p-value: p = 0.11. If we had used the chi-square pdf we would find p = 0.073.
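The toy-MC idea can be sketched in a few lines; the bin expectations below are made up (the slide's actual histogram is not reproduced here), and a simple Knuth generator provides the Poisson sampling:

```python
import math, random

def rand_poisson(mu):
    """Knuth's Poisson generator; fine for the small means used here."""
    limit, k, prod = math.exp(-mu), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= limit:
            return k
        k += 1

def pearson_chi2(n, nu):
    """Pearson chi-square for observed counts n and predictions nu."""
    return sum((ni - vi) ** 2 / vi for ni, vi in zip(n, nu))

random.seed(2)
# Hypothetical predictions with several sparsely populated bins:
nu = [8.0, 5.0, 3.0, 2.0, 1.0, 0.8, 0.5, 0.3, 0.2, 0.1]
chi2_obs = 18.0   # illustrative observed value
n_toys = 20000
p_mc = sum(pearson_chi2([rand_poisson(v) for v in nu], nu) >= chi2_obs
           for _ in range(n_toys)) / n_toys
```

With sparsely populated bins the toy distribution is heavier-tailed than the chi-square pdf, which is exactly why the slide's MC p-value (0.11) exceeds the naive chi-square one.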