
Statistics for the LHC
Lecture 4: Bayesian methods and further topics
Academic Training Lectures, CERN, 14–17 June 2010
Glen Cowan, Physics Department, Royal Holloway, University of London
indico.cern.ch/conferenceDisplay.py?confId=77830

Outline
Lecture 1: Introduction and basic formalism — probability, statistical tests, parameter estimation.
Lecture 2: Discovery — quantifying discovery significance and sensitivity; systematic uncertainties (nuisance parameters).
Lecture 3: Exclusion limits — frequentist and Bayesian intervals/limits.
Lecture 4: Further topics — more on Bayesian methods / model selection; the look-elsewhere effect.

The Bayesian approach to limits
In Bayesian statistics one starts with a prior pdf π(θ), which reflects the degree of belief about θ before doing the experiment. Bayes' theorem tells how our beliefs should be updated in light of the data x:

    p(θ|x) = L(x|θ) π(θ) / ∫ L(x|θ′) π(θ′) dθ′

Integrate the posterior pdf p(θ|x) to give an interval with any desired probability content. For example, for n ~ Poisson(s+b), the 95% CL upper limit s_up on s is found from

    0.95 = ∫ p(s|n) ds,  integrated from 0 up to s_up.
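As an illustration, the upper-limit integral for n ~ Poisson(s+b) with a flat prior on s ≥ 0 can be evaluated with a simple numerical sketch; the grid size, the s_max cutoff, and the function name are illustrative choices, not part of the lecture:

```python
import math

def posterior_upper_limit(n, b, cl=0.95, s_max=50.0, steps=200000):
    """Bayesian upper limit on s for n ~ Poisson(s+b), with a flat prior
    pi(s) = const for s >= 0.  Midpoint-rule integration of the (unnormalized)
    posterior; s_max must be large enough that L(s) has died off."""
    def like(s):
        mu = s + b
        return math.exp(-mu) * mu ** n / math.factorial(n)
    ds = s_max / steps
    weights = [like((i + 0.5) * ds) for i in range(steps)]
    total = sum(weights)
    cum = 0.0
    for i, w in enumerate(weights):
        cum += w
        if cum / total >= cl:     # first s where the posterior CDF reaches cl
            return (i + 1) * ds
    return s_max
```

For n = 0 and b = 0 the posterior is just e^(-s), so the 95% limit is -ln(0.05) ≈ 3.0, and as stated on a later slide, the limit decreases as the assumed b grows for fixed n.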

Bayesian prior for Poisson parameter
Include the knowledge that s ≥ 0 by setting the prior π(s) = 0 for s < 0. One could try to reflect 'prior ignorance' with, e.g., a flat prior:

    π(s) = constant for s ≥ 0.

This is not normalizable, but that is OK as long as L(s) dies off for large s. It is not invariant under a change of parameter: if we had instead used a flat prior for, say, the mass of the Higgs boson, this would imply a non-flat prior for the expected number of Higgs events. It does not really reflect a reasonable degree of belief, but it is often used as a point of reference; alternatively, one can view it as a recipe for producing an interval whose frequentist properties can be studied (the coverage will depend on the true s).

Bayesian interval with flat prior for s
Solve numerically to find the limit s_up. For the special case b = 0, the Bayesian upper limit with a flat prior is numerically the same as the classical limit (a 'coincidence'). Otherwise the Bayesian limit is everywhere greater than the classical one ('conservative'). It never goes negative, and it does not depend on b if n = 0.
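The b = 0 'coincidence' can be checked numerically. The sketch below (function names are mine) compares the classical limit, obtained by solving P(n or fewer counts; s_up) = α, with the flat-prior Bayesian limit:

```python
import math

def pois_cdf(n, mu):
    """P(k <= n) for k ~ Poisson(mu)."""
    return sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(n + 1))

def classical_upper_limit(n, alpha=0.05):
    """Frequentist upper limit: solve P(k <= n; s_up) = alpha by bisection."""
    lo, hi = 0.0, 200.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if pois_cdf(n, mid) > alpha:   # the CDF decreases with mu
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def bayes_upper_limit_b0(n, cl=0.95, s_max=60.0, steps=240000):
    """Bayesian upper limit for b = 0, flat prior: the normalized posterior is
    the Gamma(n+1) density e^(-s) s^n / n!; integrate until reaching cl."""
    ds = s_max / steps
    cum = 0.0
    for i in range(steps):
        s = (i + 0.5) * ds
        cum += math.exp(-s) * s ** n / math.factorial(n) * ds
        if cum >= cl:
            return (i + 1) * ds
    return s_max
```

The agreement follows from the identity relating the incomplete Gamma integral to the Poisson CDF.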

Priors from formal rules
Because of the difficulty of encoding a vague degree of belief in a prior, one often attempts to derive the prior from formal rules, e.g., to satisfy certain invariance principles or to provide maximum information gain for a certain set of measurements. Such priors are often called 'objective priors' and form the basis of objective Bayesian statistics. They do not reflect a degree of belief (but might represent possible extreme cases). Even in a subjective Bayesian analysis, using objective priors can be an important part of the sensitivity analysis.

Priors from formal rules (cont.)
In objective Bayesian analysis one can use the intervals in a frequentist way, i.e., regard Bayes' theorem as a recipe to produce an interval with certain coverage properties. Formal priors have not been widely used in HEP, but there is recent interest in this direction; see e.g. L. Demortier, S. Jain and H. Prosper, Reference priors for high energy physics, arXiv (Feb 2010).

Jeffreys' prior
According to Jeffreys' rule, take the prior as

    π(θ) ∝ √(det I(θ)),

where I(θ) is the Fisher information matrix,

    I_ij(θ) = −E[ ∂² ln L(x|θ) / ∂θ_i ∂θ_j ].

One can show that this leads to inference that is invariant under a transformation of parameters. For a Gaussian mean, the Jeffreys prior is constant; for a Poisson mean μ it is proportional to 1/√μ.

Jeffreys' prior for Poisson mean
Suppose n ~ Poisson(μ). To find the Jeffreys prior for μ, write

    ln L(n|μ) = −μ + n ln μ − ln n!,

so that

    I(μ) = −E[ ∂² ln L / ∂μ² ] = E[n] / μ² = 1/μ,

and therefore

    π(μ) ∝ √I(μ) = 1/√μ.

So, e.g., for μ = s + b, this means the prior π(s) ∝ 1/√(s + b), which depends on b. But this is not designed as a degree of belief about s.
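The Fisher-information result I(μ) = 1/μ can be checked by a small Monte Carlo, using the equivalent form I(μ) = E[(∂ ln L/∂μ)²]; the Poisson sampler and sample size are illustrative choices:

```python
import math, random

def fisher_info_poisson(mu, n_samples=200000, seed=1):
    """Monte Carlo estimate of I(mu) = E[(d lnL/dmu)^2] for n ~ Poisson(mu);
    analytically I(mu) = 1/mu, giving the Jeffreys prior 1/sqrt(mu)."""
    rng = random.Random(seed)

    def sample_poisson(mu):
        # Knuth's multiplication method (fine for small mu)
        limit, k, p = math.exp(-mu), 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                return k
            k += 1

    total = 0.0
    for _ in range(n_samples):
        n = sample_poisson(mu)
        score = n / mu - 1.0          # d/dmu of (-mu + n ln mu)
        total += score * score
    return total / n_samples
```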

Bayesian limits with uncertainty on b
The uncertainty on b goes into the prior, e.g.,

    π(s, b) = π_s(s) π_b(b).

Put this into Bayes' theorem,

    p(s, b | n) ∝ L(n | s, b) π_s(s) π_b(b),

marginalize over b,

    p(s | n) = ∫ p(s, b | n) db,

then use p(s|n) to find intervals for s with any desired probability content. The framework for the treatment of nuisance parameters is well defined; the choice of prior can still be problematic, but often less so than finding a 'non-informative' prior for a parameter of interest.
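A grid-based sketch of this marginalization, here with a truncated-Gaussian prior for b (one possible, debatable choice; the function name and grid sizes are mine):

```python
import math

def upper_limit_marginal_b(n, b_hat, sigma_b, cl=0.95,
                           s_max=30.0, ns=600, nb=200):
    """Bayesian upper limit on s for n ~ Poisson(s+b): flat prior on s >= 0,
    truncated-Gaussian prior on b.  Marginalize over b on a grid, then
    integrate the marginal posterior p(s|n) up to the desired content."""
    b_max = b_hat + 6.0 * sigma_b
    ds, db = s_max / ns, b_max / nb
    post = []
    for i in range(ns):
        s = (i + 0.5) * ds
        acc = 0.0
        for j in range(nb):
            b = (j + 0.5) * db        # midpoints keep b > 0
            mu = s + b
            prior_b = math.exp(-0.5 * ((b - b_hat) / sigma_b) ** 2)
            acc += math.exp(-mu) * mu ** n * prior_b
        post.append(acc)
    total = sum(post)
    cum = 0.0
    for i, p in enumerate(post):
        cum += p
        if cum / total >= cl:
            return (i + 1) * ds
    return s_max
```

Note that for n = 0 the likelihood factorizes as e^(-s) e^(-b), so the b integral drops out and the limit does not depend on b, as stated earlier.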


Comment on priors
Suppose we measure n ~ Poisson(s+b), and the goal is to make an inference about s. Suppose b is not known exactly but we have an estimate b̂ with uncertainty σ_b. For a Bayesian analysis, the first reflex may be to write down a Gaussian prior for b,

    π(b) ∝ exp[ −(b − b̂)² / 2σ_b² ].

But a Gaussian can be problematic: e.g. b ≥ 0, so one needs to truncate and renormalize, and the tails fall off very quickly, which may not reflect the true uncertainty.

Gamma prior for b
What is in fact our prior information about b? It may be that we estimated b using a separate measurement (e.g., a background control sample) with

    m ~ Poisson(τb)    (τ = scale factor, here assumed known).

Having made the control measurement we can use Bayes' theorem to get the probability for b given m,

    π(b|m) ∝ P(m|b) π_0(b).

If we take the 'original' prior π_0(b) to be constant for b ≥ 0, then the posterior π(b|m), which becomes the subsequent prior when we measure n and infer s, is a Gamma distribution with

    mean = (m + 1)/τ,    standard deviation = √(m + 1)/τ.
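The quoted Gamma mean and standard deviation can be verified by integrating the posterior numerically; this sketch assumes only the flat original prior stated above:

```python
import math

def gamma_posterior_moments(m, tau, steps=200000):
    """Posterior for b after observing m ~ Poisson(tau*b), flat prior for
    b >= 0: pi(b|m) is proportional to (tau*b)^m exp(-tau*b).  Returns
    (mean, std) by midpoint-rule integration; these should match
    (m+1)/tau and sqrt(m+1)/tau."""
    b_max = (m + 1) / tau + 10.0 * math.sqrt(m + 1) / tau
    db = b_max / steps
    s0 = s1 = s2 = 0.0
    for i in range(steps):
        b = (i + 0.5) * db
        w = (tau * b) ** m * math.exp(-tau * b)
        s0 += w
        s1 += w * b
        s2 += w * b * b
    mean = s1 / s0
    var = s2 / s0 - mean * mean
    return mean, math.sqrt(var)
```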

Gamma distribution (figure: the Gamma pdf)

Frequentist approach to the same problem
In the frequentist approach we would regard both variables,

    n ~ Poisson(s+b),    m ~ Poisson(τb),

as constituting the data, and thus the full likelihood function is

    L(s, b) = [(s+b)^n / n!] e^(−(s+b)) × [(τb)^m / m!] e^(−τb).

Use this to construct a test of s with, e.g., the profile likelihood ratio, in which b is replaced by its conditional ML estimator for the value of s being tested. Note that here the likelihood refers to both n and m, whereas the likelihood used in the Bayesian calculation only modeled n.
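A sketch of the profile-likelihood construction for this two-count model. The closed-form unconditional MLEs b̂ = m/τ and ŝ = n − m/τ follow from setting the derivatives of ln L to zero; the conditional MLE of b is found here by a one-dimensional search, and q0 below is the statistic for the discovery test of s = 0 (function names are mine):

```python
import math

def nll(s, b, n, m, tau):
    """-ln L for n ~ Poisson(s+b), m ~ Poisson(tau*b), dropping ln n!, ln m!."""
    out = (s + b) + tau * b
    if n > 0:
        out -= n * math.log(s + b)
    if m > 0:
        out -= m * math.log(tau * b)
    return out

def profile_b(s, n, m, tau):
    """Conditional MLE of b for fixed s, by golden-section search
    (nll is convex in b on (0, inf))."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 1e-9, 1e4
    for _ in range(200):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if nll(s, c, n, m, tau) < nll(s, d, n, m, tau):
            hi = d
        else:
            lo = c
    return 0.5 * (lo + hi)

def q0(n, m, tau):
    """Discovery statistic 2 ln [ L(s_hat, b_hat) / L(0, b_hat_hat(0)) ],
    with unconditional MLEs b_hat = m/tau, s_hat = max(n - b_hat, 0)."""
    b_hat = m / tau
    s_hat = max(n - b_hat, 0.0)
    return 2.0 * (nll(0.0, profile_b(0.0, n, m, tau), n, m, tau)
                  - nll(s_hat, b_hat, n, m, tau))
```

When the observed n equals the estimated background (n = m/τ), ŝ = 0 and q0 vanishes; larger excesses give larger q0.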

Bayesian model selection ('discovery')
The probability of a hypothesis H0 (e.g., no Higgs) relative to an alternative H1 (Higgs) is often given by the posterior odds:

    P(H0|x) / P(H1|x) = [ P(x|H0) / P(x|H1) ] × [ π(H0) / π(H1) ]
                      = B01 × prior odds,

where B01 = P(x|H0)/P(x|H1) is the Bayes factor. The Bayes factor is regarded as measuring the weight of evidence of the data in support of H0 over H1. One uses interchangeably B10 = 1/B01.

Assessing Bayes factors
One can use the Bayes factor much like a p-value (or Z value). There is an 'established' scale, analogous to our 5σ rule:

    B10           Evidence against H0
    1 to 3        Not worth more than a bare mention
    3 to 20       Positive
    20 to 150     Strong
    > 150         Very strong

Kass and Raftery, Bayes Factors, J. Am. Stat. Assoc. 90 (1995) 773. Will this be adopted in HEP?

Rewriting the Bayes factor
Suppose we have models H_i, i = 0, 1, ..., each with a likelihood L(x|θ_i, H_i) and a prior pdf π(θ_i|H_i) for its internal parameters θ_i, so that the full prior is

    π(θ_i, H_i) = P(H_i) π(θ_i|H_i),

where P(H_i) is the overall prior probability for H_i. The Bayes factor comparing H_i and H_j can then be written

    B_ij = ∫ L(x|θ_i, H_i) π(θ_i|H_i) dθ_i / ∫ L(x|θ_j, H_j) π(θ_j|H_j) dθ_j.

Bayes factors independent of P(H_i)
For B_ij we need the posterior probabilities marginalized over all of the internal parameters of the models:

    P(H_i|x) ∝ P(H_i) ∫ L(x|θ_i, H_i) π(θ_i|H_i) dθ_i,

by Bayes' theorem. Therefore the Bayes factor is

    B_ij = [ P(H_i|x) / P(H_j|x) ] / [ P(H_i) / P(H_j) ]
         = ∫ L(x|θ_i, H_i) π(θ_i|H_i) dθ_i / ∫ L(x|θ_j, H_j) π(θ_j|H_j) dθ_j.

The prior probabilities P(H_i) cancel: the Bayes factor is the ratio of marginal likelihoods.
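For a concrete toy case the marginal-likelihood ratio can be computed directly. Here H1 is n ~ Poisson(s+b) with a flat prior on s over an arbitrary range (my choice), and H0 is n ~ Poisson(b) with no free parameters; note that B10 depends directly on the chosen prior range, a well-known caveat of Bayes factors:

```python
import math

def pois(n, mu):
    return math.exp(-mu) * mu ** n / math.factorial(n)

def bayes_factor_10(n, b, s_prior_max=50.0, steps=100000):
    """B10 = m1/m0: marginal likelihood of H1 (flat prior on 0 <= s <=
    s_prior_max) over the likelihood of H0 (s = 0, no internal parameters)."""
    ds = s_prior_max / steps
    # marginal likelihood under H1: average of the likelihood over the prior
    m1 = sum(pois(n, b + (i + 0.5) * ds) for i in range(steps)) * ds / s_prior_max
    return m1 / pois(n, b)
```

On the Kass–Raftery scale above, a large excess over the background gives "very strong" evidence (B10 > 150), while n compatible with b gives B10 < 1, i.e., evidence mildly favoring H0.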

Numerical determination of Bayes factors
Both the numerator and denominator of B_ij are of the form of a 'marginal likelihood',

    m = ∫ L(x|θ) π(θ) dθ.

These can be very challenging to compute. Methods include:
  - harmonic mean (and improvements);
  - importance sampling;
  - parallel tempering (~ thermodynamic integration);
  - nested sampling;
  - ...
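Of the listed methods, importance sampling is the easiest to sketch: draw θ_k from a proposal density q and average the weights L(θ_k)π(θ_k)/q(θ_k). All densities here are user-supplied in log form (the interface is my choice):

```python
import math, random

def marginal_likelihood_is(loglike, logprior, proposal_sample, proposal_logpdf,
                           n_draws=50000, seed=42):
    """Importance-sampling estimate of m = integral of L(x|theta) pi(theta):
    m ~ (1/N) sum_k L(theta_k) pi(theta_k) / q(theta_k),  theta_k ~ q.
    Log densities are used for numerical stability."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n_draws):
        th = proposal_sample(rng)
        logs.append(loglike(th) + logprior(th) - proposal_logpdf(th))
    # log-sum-exp to average the weights stably
    mx = max(logs)
    return math.exp(mx) * sum(math.exp(l - mx) for l in logs) / n_draws
```

A quick check: for a Gaussian likelihood N(0; θ, 1) with prior θ ~ N(0, 1), the marginal likelihood is analytic, 1/√(4π) ≈ 0.282, and the estimator reproduces it with a N(0, 1.5) proposal.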

The Look-Elsewhere Effect
(Eilam Gross and Ofer Vitells, arXiv preprint)
Suppose a model for a mass distribution allows for a peak at a mass m with amplitude μ. The data show a bump at a mass m0. How consistent is this with the no-bump (μ = 0) hypothesis?

p-value for fixed mass
First, suppose the mass m0 of the peak was specified a priori. Test the consistency of the bump with the no-signal (μ = 0) hypothesis with, e.g., the likelihood ratio

    t_fix = −2 ln [ L(0) / L(μ̂) ],

where 'fix' indicates that the mass of the peak is fixed to m0. The resulting p-value gives the probability to find a value of t_fix at least as great as the one observed at the specific mass m0.

p-value for floating mass
But suppose we did not know where in the distribution to expect a peak. What we want is the probability to find a peak at least as significant as the one observed anywhere in the distribution. Include the mass as an adjustable parameter in the fit and test the significance of the peak using

    t_float = −2 ln [ L(0) / L(μ̂, m̂) ].

(Note that m does not appear in the μ = 0 model.)

Distributions of t_fix and t_float
For a sufficiently large data sample, t_fix follows a chi-square distribution for 1 degree of freedom (Wilks' theorem). For t_float there are two adjustable parameters, μ and m, and naively Wilks' theorem would say that t_float follows a chi-square distribution for 2 d.o.f. In fact Wilks' theorem does not hold in the floating-mass case, because one of the parameters (m) is not defined in the μ = 0 model, so obtaining the distribution of t_float is more difficult.

Trials factor
We would like to be able to relate the p-values for the fixed and floating mass analyses (at least approximately). Gross and Vitells argue that the 'trials factor' can be approximated by

    trials factor ≈ 1 + √(π/2) ⟨N⟩ Z_fix,

where ⟨N⟩ is the average number of local maxima of L in the fit range and Z_fix is the significance for the fixed-mass case. So we can either carry out the full floating-mass analysis (e.g., use MC to get the p-value), or do the fixed-mass analysis and apply a correction factor.
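The correction can be packaged as a small helper (hypothetical name), converting a local (fixed-mass) significance into an approximate global p-value using the Gross–Vitells trials-factor approximation as quoted in the lecture:

```python
import math

def global_p_value(z_fix, n_maxima_avg):
    """Approximate global (floating-mass) p-value from the fixed-mass
    significance Z_fix: p_global ~ trials * p_local, with
    trials ~ 1 + sqrt(pi/2) * <N> * Z_fix.
    p_local is the one-sided upper tail of a standard Gaussian."""
    p_local = 0.5 * math.erfc(z_fix / math.sqrt(2.0))
    trials = 1.0 + math.sqrt(math.pi / 2.0) * n_maxima_avg * z_fix
    return min(1.0, trials * p_local)
```

For example, a local 4σ effect with ⟨N⟩ = 3 expected local maxima is diluted from p_local ≈ 3.2e-5 to a global p-value roughly sixteen times larger.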

Wrapping up lecture 4
Bayesian methods for limits:
  - difficult issues surrounding the choice of prior;
  - marginalize over nuisance parameters (e.g. with MCMC).
Bayesian model selection:
  - Bayes factors = posterior odds if we assume prior odds of 1
    = ratio of marginalized likelihoods;
  - can be very difficult to compute.
Look-elsewhere effect:
  - correct with a trials factor, or e.g. use the floating-mass analysis.

Extra slides

MCMC basics: Metropolis-Hastings algorithm
Goal: given an n-dimensional pdf p(θ), generate a sequence of points θ_1, θ_2, θ_3, ...
1) Start at some point θ_0.
2) Generate a proposal point θ from a proposal density q(θ; θ_0), e.g., a Gaussian centred about θ_0.
3) Form the Hastings test ratio

       α = min[ 1, p(θ) q(θ_0; θ) / ( p(θ_0) q(θ; θ_0) ) ].

4) Generate u ~ Uniform[0, 1].
5) If u ≤ α, move to the proposed point (θ_1 = θ); else repeat the old point (θ_1 = θ_0).
6) Iterate.

Metropolis-Hastings (continued)
This rule produces a correlated sequence of points (note how each new point depends on the previous one). For our purposes this correlation is not fatal, but it makes the statistical errors larger than naive. The proposal density can be (almost) anything, but it should be chosen so as to minimize the autocorrelation. Often the proposal density is taken to be symmetric, q(θ; θ_0) = q(θ_0; θ); the test ratio is then (Metropolis-Hastings)

    α = min[ 1, p(θ) / p(θ_0) ].

I.e., if the proposed step is to a point of higher p(θ), take it; if not, take the step only with probability p(θ)/p(θ_0). If the proposed step is rejected, hop in place.
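With a symmetric Gaussian proposal the algorithm reduces to the Metropolis rule p(θ′)/p(θ). A minimal one-dimensional sketch, using log densities for numerical stability (function and parameter names are mine):

```python
import math, random

def metropolis_hastings(log_p, theta0, n_steps=20000, step_size=1.0, seed=7):
    """Metropolis sampler with a symmetric Gaussian proposal, so the
    Hastings ratio reduces to p(theta')/p(theta).  log_p is the log of the
    target pdf, known up to an additive constant; returns the chain."""
    rng = random.Random(seed)
    chain = [theta0]
    theta, lp = theta0, log_p(theta0)
    for _ in range(n_steps):
        prop = theta + rng.gauss(0.0, step_size)
        lp_prop = log_p(prop)
        # accept with probability min(1, p(prop)/p(theta))
        if lp_prop >= lp or rng.random() < math.exp(lp_prop - lp):
            theta, lp = prop, lp_prop
        # if rejected, "hop in place": the old point is repeated
        chain.append(theta)
    return chain
```

For a standard-normal target (log_p = −θ²/2, constants dropped), the chain's mean and variance after discarding a burn-in period approach 0 and 1, illustrating the caveats on the next slide.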

Metropolis-Hastings caveats
Actually one can only prove that the sequence of points follows the desired pdf in the limit where the chain runs forever. There may be a 'burn-in' period where the sequence does not initially follow p(θ). Unfortunately there are few useful theorems to tell us when the sequence has converged. Look at the trace plots and the autocorrelation; check the result with a different proposal density; and if you think it has converged, try starting from a different point and see if the result is similar.