Practical Statistics for LHC Physicists: Bayesian Inference
Harrison B. Prosper, Florida State University
CERN Academic Training Lectures, 9 April 2015


Confidence Intervals – Recap
[Figure: the confidence-interval construction, relating the sample space to the parameter space via lower and upper limits l and u for the parameter s.]
Any questions about this figure?

Outline  Frequentist Hypothesis Tests Continued…  Bayesian Inference 3

Hypothesis Tests
In order to perform a realistic hypothesis test, we first need to rid ourselves of nuisance parameters. Here are the two primary ways:
1. Use a likelihood averaged over the nuisance parameters.
2. Use the profile likelihood.

Example 1: W±W±jj Production (ATLAS)
Recall that for B = 3.0 events (ignoring the uncertainty) and an observed count of D = 12 events, we found a significance Z = 3.8 (sometimes called the Z-value).

Example 1: W±W±jj Production (ATLAS)
Method 1: We eliminate b from the problem by averaging the likelihood over it*:

L(s) = P(12 | s) = ∫ P(12 | s, b) π(b) db,

where L(s) = P(12 | s) is the average likelihood.

(*R. D. Cousins and V. L. Highland, Nucl. Instrum. Meth. A320, 331–335 (1992))

Exercise 10: Show this.

Example 1: W±W±jj Production (ATLAS)
Background: B = 3.0 ± 0.6 events; D is the observed count. The p-value computed from the average likelihood is equivalent to 3.5σ, which may be compared with the 3.8σ obtained earlier (with the background uncertainty ignored).

Exercise 11: Verify this calculation.
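The 3.5σ figure can be checked numerically. A minimal sketch (not the slide's own code), assuming the background uncertainty is encoded as a gamma density with shape Q = (B/δB)² and rate k = Q/B, so that the b-averaged Poisson likelihood is a negative binomial:

```python
# Averaged-likelihood (Cousins-Highland style) significance for
# D = 12 observed events with background B = 3.0 +/- 0.6.
from scipy import stats

B, dB, D = 3.0, 0.6, 12
Q = (B / dB) ** 2          # gamma shape = 25 (assumed model)
k = Q / B                  # gamma rate
p_succ = k / (k + 1.0)     # negative-binomial success probability
# p-value = P(n >= D | s = 0), averaged over the background density
pvalue = stats.nbinom.sf(D - 1, Q, p_succ)
Z = stats.norm.isf(pvalue)  # convert tail probability to the n-sigma scale
print(f"p-value = {pvalue:.2e}, Z = {Z:.2f}")  # equivalent to ~3.5 sigma
```

Averaging Poisson(n | b) over b ~ Gamma(shape Q, rate k) gives a negative binomial for n, which is why no explicit integral is needed here.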

An Aside on s/√b
The quantity s/√b is often used as a rough measure of significance on the "n-σ" scale, but it should be used with caution. In our example, s ≈ 12 − 3.0 = 9.0 events, so according to this measure the ATLAS W±W±jj result is a 9.0/√3 ≈ 5.2σ effect, which is to be compared with 3.8σ! Beware of s/√b!
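The two numbers can be reproduced in a few lines; the exact Z here is the Poisson tail probability with the background uncertainty ignored, as in the 3.8σ quoted earlier:

```python
# Rough s/sqrt(b) measure vs. the exact Poisson tail significance
# for D = 12 observed events and b = 3.0 expected background events.
import math
from scipy import stats

D, b = 12, 3.0
s = D - b
naive = s / math.sqrt(b)             # the rough "n-sigma" measure, ~5.2
pvalue = stats.poisson.sf(D - 1, b)  # P(n >= 12 | b = 3.0)
exact = stats.norm.isf(pvalue)       # ~3.8
print(f"s/sqrt(b) = {naive:.1f}, exact Z = {exact:.1f}")
```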

The Profile Likelihood Revisited
Recall that the profile likelihood is just the likelihood with all nuisance parameters replaced by their conditional maximum likelihood estimates (CMLE). In our example,

L_p(s) = P(D | s, b̂(s)),

where b̂(s) maximizes the likelihood for each fixed s. We also saw that the quantity

t(s) = −2 ln [ L_p(s) / L_p(ŝ) ]

can be used to compute approximate confidence intervals.

The Profile Likelihood Revisited
t(s) can also be used to test hypotheses, in particular s = 0. Wilks' theorem, applied to our example, states that for large samples the density of the signal estimate ŝ will be approximately Gaussian if s is the true expected signal count. Then

t(s) = −2 ln [ L_p(s) / L_p(ŝ) ]

will be distributed approximately as a χ² density with one degree of freedom, that is, as a density that is independent of all the parameters of the problem!

The Profile Likelihood Revisited
Since we now know the analytical form of the probability density of t, we can calculate the observed p-value = P[t(0) ≥ t_obs(0)], given the observed value t_obs(0) of t(0), for the s = 0 hypothesis. Then, if we find that the p-value < α, the significance of our test, we reject the s = 0 hypothesis. Furthermore, Z = √t_obs(0), so we can skip the p-value calculation!

Example 1: W±W±jj Production (ATLAS)
Background: B = 3.0 ± 0.6 events. For this example, t_obs(0) ≈ 12.6; therefore, Z = 3.6.

Exercise 12: Verify this calculation.
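A sketch of this check, assuming the joint likelihood Poisson(D | s + b) · Poisson(Q | k b) with Q = (B/δB)² and k = Q/B (the same parametrization, B = Q/k, used in the Bayesian treatment below):

```python
# Profile-likelihood significance for D = 12 with the background
# constrained by an effective auxiliary count Q scaled by k.
import math

D, B, dB = 12, 3.0, 0.6
Q = (B / dB) ** 2            # 25.0
k = Q / B

def lnL(s, b):
    # log-likelihood, dropping s- and b-independent constants
    return D * math.log(s + b) - (s + b) + Q * math.log(k * b) - k * b

s_hat, b_hat = D - Q / k, Q / k      # unconditional MLEs: 9.0 and 3.0
b_hat0 = (D + Q) / (1.0 + k)         # conditional MLE of b at s = 0
t0 = 2.0 * (lnL(s_hat, b_hat) - lnL(0.0, b_hat0))
Z = math.sqrt(t0)
print(f"t_obs(0) = {t0:.1f}, Z = {Z:.1f}")   # ~12.6 and ~3.6
```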

Example 4: Higgs to γγ (CMS)

Example 4: Higgs to γγ (CMS)
This example mimics part of the CMS Run 1 H → γγ data. We simulate 20,000 background di-photon masses and 200 signal masses. The signal is chosen to be a Gaussian bump with a mean of 125 GeV and a standard deviation of 1.5 GeV.
[Figure: the background model and the signal model.]
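A sketch of how such a toy dataset might be generated. The mass window [100, 180] GeV, the falling-exponential background shape, and its decay constant are assumptions for illustration only (the slide does not specify the background model):

```python
# Toy H -> gamma gamma dataset: 20,000 background + 200 signal masses.
import numpy as np

rng = np.random.default_rng(42)
lo, hi = 100.0, 180.0          # assumed mass window (GeV)
tau = 50.0                     # assumed background decay constant (GeV)

# background: truncated falling exponential via inverse-CDF sampling
u = rng.uniform(size=20000)
bkg = lo - tau * np.log(1.0 - u * (1.0 - np.exp(-(hi - lo) / tau)))

# signal: Gaussian bump, mean 125 GeV, standard deviation 1.5 GeV
sig = rng.normal(125.0, 1.5, size=200)

masses = np.concatenate([bkg, sig])
```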

Example 4: Higgs to γγ (CMS)
Fitting using Minuit (via RooFit) yields:
[Figure: the fitted signal-plus-background model and fit results.]

Bayesian Inference

Bayesian Inference – 1
Definition: A method is Bayesian if
1. it is based on the degree-of-belief interpretation of probability, and
2. it uses Bayes' theorem for all inferences:

p(θ, ω | D) = p(D | θ, ω) π(θ, ω) / p(D).

D: observed data
θ: parameter of interest
ω: nuisance parameters
π: prior density

Bayesian Inference – 2
Nuisance parameters are removed by marginalization,

p(θ | D) = ∫ p(θ, ω | D) dω,

in contrast to profiling, which can be viewed as marginalization with the data-dependent prior π(ω | θ) = δ(ω − ω̂(θ)).

Bayesian Inference – 3
Bayes' theorem can be used to compute the probability of a model. First compute the posterior density:

p(θ_H, ω, H | D) = p(D | θ_H, ω, H) π(θ_H, ω, H) / p(D).

D: observed data
H: model or hypothesis
θ_H: parameters of model H
ω: nuisance parameters
π: prior density

Bayesian Inference – 4
1. Factorize the priors: π(θ_H, ω, H) = π(θ_H, ω | H) π(H).
2. Then, for each model H, compute the function

p(D | H) = ∫ p(D | θ_H, ω, H) π(θ_H, ω | H) dθ_H dω.

3. Then, compute the probability of each model H:

p(H | D) = p(D | H) π(H) / Σ_H′ p(D | H′) π(H′).

Bayesian Inference – 5
In order to compute p(H | D), however, two things are needed:
1. proper priors over the parameter spaces;
2. the priors π(H).
In practice, we compute the Bayes factor,

B₁₀ = p(D | H₁) / p(D | H₀),

which is the ratio in the first bracket of the posterior odds:

p(H₁ | D) / p(H₀ | D) = [ p(D | H₁) / p(D | H₀) ] [ π(H₁) / π(H₀) ].

Example 1: W±W±jj Production (ATLAS)

Example 1: W±W±jj Production (ATLAS)
Step 1: Construct a probability model for the observations,

p(D, Q | s, b) = Poisson(D | s + b) Poisson(Q | k b),

and insert the data
D = 12 events,
B = 3.0 ± 0.6 background events, where B = Q/k and δB = √Q/k,
to arrive at the likelihood.

Example 1: W±W±jj Production (ATLAS)
Step 2: Write down Bayes' theorem,

p(s, b | D) = p(D | s, b) π(b | s) π(s) / p(D),

and specify the prior. It is often convenient first to compute the marginal likelihood by integrating over the nuisance parameters:

p(D | s) = ∫ p(D | s, b) π(b | s) db.

Example 1: W±W±jj Production (ATLAS)
The Prior: What do π(b | s) and π(s) represent? They encode what we know, or assume, about the expected background and signal in the absence of new observations. We shall assume that s and b are non-negative and finite. After a century of argument, the consensus today is that there is no unique way to represent such vague information.

Example 1: W±W±jj Production (ATLAS)
For simplicity, we shall take π(b | s) = 1*. We may now eliminate b from the problem,

L(s) = P(12 | s, H₁) = ∫ P(12 | s, b) P(Q | k b) db,

which, of course, is exactly the same function we found earlier! H₁ represents the background + signal hypothesis.

*Luc Demortier, Supriya Jain, Harrison B. Prosper, "Reference priors for high energy physics," Phys. Rev. D 82, 034002 (2010).

Example 1: W±W±jj Production (ATLAS)
L(s) = P(12 | s, H₁) is the marginal likelihood for the expected signal s. Here we compare the marginal and profile likelihoods; for this problem they are found to be almost identical. But this happy outcome does not always occur!
[Figure: the marginal and profile likelihoods overlaid.]
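The comparison can be sketched numerically; this assumes the two-Poisson model Poisson(D | s + b) · Poisson(Q | k b) and a flat prior for b:

```python
# Compare the b-marginalized likelihood (flat prior on b) with the
# profile likelihood, each normalized to its maximum.
import numpy as np
from scipy import stats, integrate

D, B, dB = 12, 3.0, 0.6
Q = (B / dB) ** 2
k = Q / B

def joint(s, b):
    # Poisson(D | s + b) * Poisson(Q | k b); b may be an array
    return stats.poisson.pmf(D, s + b) * stats.poisson.pmf(Q, k * b)

s_grid = np.linspace(0.0, 25.0, 101)
b_grid = np.linspace(1e-3, 20.0, 2000)
marginal = np.array([integrate.quad(lambda b: joint(s, b), 0.0, 30.0)[0]
                     for s in s_grid])                  # integrate over b
profile = np.array([joint(s, b_grid).max() for s in s_grid])  # maximize over b
marginal /= marginal.max()
profile /= profile.max()
# the two normalized curves lie nearly on top of each other
```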

Example 1: W±W±jj Production (ATLAS)
Given the likelihood, we can compute the posterior density

p(s | D, H₁) = P(D | s, H₁) π(s | H₁) / p(D | H₁),

where

p(D | H₁) = ∫ P(D | s, H₁) π(s | H₁) ds.

Example 1: W±W±jj Production (ATLAS)
Assuming a flat prior for the signal, π(s | H₁) = 1, the posterior density is given by

p(s | D, H₁) = P(D | s, H₁) / ∫ P(D | s, H₁) ds.

The posterior density of the parameter (or parameters) of interest is the complete answer to the inference problem and should be made available. Better still, publish the likelihood and the prior.

Exercise 13: Derive an expression for p(s | D, H₁) assuming a gamma prior, Gamma(qs, U + 1), for π(s | H₁).

Example 1: W±W±jj Production (ATLAS)
By solving

∫ₗᵘ p(s | D, H₁) ds = 0.683

for the interval [l, u], we obtain [6.3, 13.5]. Since this is a Bayesian calculation, this statement means: the probability (that is, the degree of belief) that s lies in [6.3, 13.5] is 0.683.
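A sketch of the interval calculation, assuming a central 68.3% interval, flat priors for s and b, and the two-Poisson model for the data:

```python
# Central 68.3% credible interval for s from the flat-prior posterior.
import numpy as np
from scipy import stats, integrate

D, B, dB = 12, 3.0, 0.6
Q = (B / dB) ** 2
k = Q / B

def L(s):
    # marginal likelihood: integral over b of Poisson(D|s+b) Poisson(Q|k b)
    return integrate.quad(lambda b: stats.poisson.pmf(D, s + b) *
                                    stats.poisson.pmf(Q, k * b), 0.0, 30.0)[0]

s = np.linspace(0.0, 30.0, 301)
post = np.array([L(si) for si in s])
post /= integrate.trapezoid(post, s)                   # normalize posterior
cdf = integrate.cumulative_trapezoid(post, s, initial=0.0)
lo = np.interp(0.5 * (1.0 - 0.683), cdf, s)            # 15.85th percentile
hi = np.interp(1.0 - 0.5 * (1.0 - 0.683), cdf, s)      # 84.15th percentile
print(f"68.3% credible interval: [{lo:.1f}, {hi:.1f}]")
```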

Example 1: W±W±jj Production (ATLAS)
As noted, the number p(D | H) can be used to perform a hypothesis test. But to do so, we need to specify a proper prior for the signal, that is, a prior π(s | H₁) that integrates to one. The simplest such prior is a δ-function, e.g. π(s | H₁) = δ(s − 9), which yields

p(D | H₁) = p(D | 9, H₁) = 1.13 × 10⁻².

Example 1: W±W±jj Production (ATLAS)
From p(D | H₁) = 1.13 × 10⁻² and p(D | H₀) = 2.23 × 10⁻⁵, we conclude that the odds in favor of the hypothesis s = 9 have increased by a factor of 506 relative to whatever prior odds you started with. It is useful to convert this Bayes factor into a (signed) measure akin to the "n-sigma",

Z = sign(ln B₁₀) √(2 |ln B₁₀|)

(Sezen Sekmen, HBP).

Exercise 14: Verify this number.
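A sketch of the conversion, taking Z = sign(ln B10) √(2 |ln B10|) as the signed "n-sigma"-like measure:

```python
# Bayes factor from the two marginal likelihoods and its signed
# "n-sigma"-like transform.
import math

pD_H1 = 1.13e-2    # p(D | H1), signal fixed at s = 9
pD_H0 = 2.23e-5    # p(D | H0), background only
B10 = pD_H1 / pD_H0               # ~506
lnB = math.log(B10)
Z = math.copysign(math.sqrt(2.0 * abs(lnB)), lnB)
print(f"B10 = {B10:.0f}, Z = {Z:.2f}")
```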

Example 4: Higgs to γγ (CMS)
Here is a plot of Z vs. m_H as we scan through different hypotheses about the expected signal s. For simplicity, the signal width and background parameters have been fixed to their maximum likelihood estimates.
[Figure: Z vs. m_H.]

Summary – 1
Probability
Two main interpretations:
1. degree of belief;
2. relative frequency.

Likelihood Function
The main ingredient in any full-scale statistical analysis.

Frequentist Principle
Construct statements such that a fraction f ≥ C.L. of them will be true over a specified ensemble of statements.

Summary – 2
Frequentist Approach
1. Use the likelihood function only.
2. Eliminate nuisance parameters by profiling.
3. Decide on a fixed threshold α for rejection and reject the null if the p-value < α, but do so only if rejecting the null makes scientific sense, e.g. the probability of the alternative is judged to be high enough.

Bayesian Approach
1. Model all uncertainty using probabilities and use Bayes' theorem to make all inferences.
2. Eliminate nuisance parameters through marginalization.

The End
"Have the courage to use your own understanding!" – Immanuel Kant