Frequentist versus Bayesian

Glen Cowan, Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester. The Bayesian approach. In Bayesian statistics we can associate a probability with a hypothesis, e.g., a parameter value θ. Interpret the probability of θ as a 'degree of belief' (subjective). We need to start with a 'prior pdf' π(θ), which reflects our degree of belief about θ before doing the experiment. The experiment gives data x → likelihood function L(x|θ). Bayes' theorem tells us how our beliefs should be updated in light of the data x:

p(θ|x) = L(x|θ) π(θ) / ∫ L(x|θ') π(θ') dθ'

The posterior pdf p(θ|x) contains all our knowledge about θ.
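A minimal numerical sketch of this update, with a hypothetical Gaussian likelihood and Gaussian prior (none of these numbers come from the slides):

```python
import numpy as np

# Bayes' theorem on a grid: posterior ∝ likelihood × prior, then normalize.
# Toy model (all numbers hypothetical): one measurement x with resolution
# sigma, Gaussian prior pi(theta) centred on 0 with width 2.
theta = np.linspace(-5.0, 5.0, 1001)
dtheta = theta[1] - theta[0]
prior = np.exp(-0.5 * (theta / 2.0) ** 2)                 # pi(theta)
x, sigma = 1.2, 0.5
likelihood = np.exp(-0.5 * ((x - theta) / sigma) ** 2)    # L(x|theta)

posterior = likelihood * prior
posterior /= posterior.sum() * dtheta                     # normalize to unit area

mean = (theta * posterior).sum() * dtheta                 # a point estimate
print(f"posterior mean = {mean:.3f}")
```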

Glen Cowan, Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester. Case #4: Bayesian method. We need to associate prior probabilities with θ0 and θ1, e.g., a prior for θ1 based on a previous measurement, and a prior for θ0 that reflects 'prior ignorance' and is in any case much broader than the likelihood. Putting this into Bayes' theorem gives: posterior ∝ likelihood × prior.

Glen Cowan, Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester. Bayesian method (continued). The ability to marginalize over nuisance parameters is an important feature of Bayesian statistics. We then integrate (marginalize) p(θ0, θ1 | x) to find p(θ0 | x):

p(θ0 | x) = ∫ p(θ0, θ1 | x) dθ1

In this example the integral can be done in closed form (rare).
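A sketch of the same marginalization done numerically, for a toy model in which the data constrain the sum θ0 + θ1 and a previous measurement constrains θ1 (all numbers hypothetical):

```python
import numpy as np

# Marginalize p(theta0, theta1 | x) over the nuisance parameter theta1
# by summing the 2-d grid posterior along the theta1 axis.
theta0 = np.linspace(-3.0, 7.0, 401)
theta1 = np.linspace(-3.0, 7.0, 401)
T0, T1 = np.meshgrid(theta0, theta1, indexing="ij")

x, sigma = 2.0, 1.0
likelihood = np.exp(-0.5 * ((x - (T0 + T1)) / sigma) ** 2)  # L(x|theta0,theta1)
prior_t1 = np.exp(-0.5 * ((T1 - 1.0) / 0.5) ** 2)           # previous measurement of theta1
# A flat prior for theta0 contributes only a constant factor.

posterior = likelihood * prior_t1
d0, d1 = theta0[1] - theta0[0], theta1[1] - theta1[0]
p_theta0 = posterior.sum(axis=1) * d1                       # integrate out theta1
p_theta0 /= p_theta0.sum() * d0                             # normalize
print(f"marginal posterior peaks at theta0 = {theta0[np.argmax(p_theta0)]:.2f}")
```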

Bayesian Statistics at work: The Troublesome Extraction of the angle α. Stéphane T'Jampens, LAPP (CNRS/IN2P3 & Université de Savoie). J. Charles, A. Höcker, H. Lacker, F.R. Le Diberder, S. T'Jampens, hep-ph

Digression: Statistics. Statistics tries to answer a wide variety of questions → two main, very different frameworks. Frequentist: probability is about the data (randomness of measurements), given the model: P(data|model). Hypothesis testing: given a model, assess the consistency of the data with a particular parameter value → 1−CL curve (by varying the parameter value). [Applies only to repeatable events (sampling theory).] Bayesian: probability is about the model (degree of belief), given the data: P(model|data) ∝ Likelihood(data; model) × Prior(model). References: D.R. Cox, Principles of Statistical Inference, CUP (2006); W.T. Eadie et al., Statistical Methods in Experimental Physics, NHP (1971).

Bayesian Statistics in 1 slide. Bayesian: probability is about the model (degree of belief), given the data. The Bayesian approach is based on the use of inverse probability (the 'posterior'), via Bayes' rule: P(model|data) ∝ Likelihood(data; model) × Prior(model). From Cox, Principles of Statistical Inference (2006): "it treats information derived from data ('likelihood') as on exactly equal footing with probabilities derived from vague and unspecified sources ('prior'). The assumption that all aspects of uncertainties are directly comparable is often unacceptable." Also: "nothing guarantees that my uncertainty assessment is any good for you; I'm just expressing an opinion (degree of belief). To convince you that it's a good uncertainty assessment, I need to show that the statistical model I created makes good predictions in situations where we know what the truth is, and the process of calibrating predictions against reality is inherently frequentist." (e.g., MC simulations)
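A sketch of that frequentist calibration step: generate many toy experiments with a known truth, build a Bayesian credible interval for each, and count how often the interval covers the truth (toy Gaussian model; all numbers hypothetical):

```python
import numpy as np

# Coverage check of a 68% Bayesian credible interval by Monte Carlo.
# With a flat prior and Gaussian likelihood, the posterior for mu given a
# single measurement x is N(x, sigma), so the central 68% credible
# interval is x ± sigma; its frequentist coverage should be near 0.68.
rng = np.random.default_rng(1)
mu_true, sigma, n_toys = 3.0, 1.0, 100_000

x = rng.normal(mu_true, sigma, size=n_toys)   # one measurement per toy
covered = np.abs(x - mu_true) < sigma         # does the interval cover mu_true?
print(f"observed coverage: {covered.mean():.3f}")
```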

Uniform prior: a model of ignorance? A central problem: specifying a prior distribution for a parameter about which nothing is known → a flat prior. Problems: it is not reparametrization invariant (metric dependent): uniform in θ is not uniform in z = cos θ. It favors large values too much: the prior probability for the range 0.1 to 1 is 10 times less than for the range 1 to 10. Flat priors in several dimensions may produce clearly unacceptable answers. In simple problems, appropriate* flat priors yield essentially the same answer as non-Bayesian sampling theory. However, in other situations, particularly those involving more than two parameters, ignorance priors lead to different and entirely unacceptable answers. *(Uniform prior for a scalar location parameter, Jeffreys' prior for a scalar scale parameter.) [Cox, Principles of Statistical Inference (2006)]
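A quick Monte Carlo illustration of the reparametrization problem (hypothetical setup, not from the slides): sample θ uniformly and look at the implied distribution of z = cos θ.

```python
import numpy as np

# A prior flat in theta on [0, pi] is NOT flat in z = cos(theta):
# the implied density is 1 / (pi * sqrt(1 - z^2)), which piles up at z = ±1.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, np.pi, size=100_000)  # "ignorance" about theta
z = np.cos(theta)                              # the same ignorance, new metric

density, edges = np.histogram(z, bins=10, range=(-1.0, 1.0), density=True)
print(np.round(density, 2))   # far from the flat value 0.5
```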

Uniform Prior in Multidimensional Parameter Space. Hypersphere (6D space): one knows nothing about the individual Cartesian coordinates x, y, z, … What do we know about the radius r = √(x² + y² + …)? One has achieved the remarkable feat of learning something about the radius of the hypersphere, whereas one knew nothing about the Cartesian coordinates and without making any experiment.
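A Monte Carlo sketch of this effect in 6 dimensions (hypothetical coordinate ranges): flat priors on the Cartesian coordinates concentrate the radius far from zero.

```python
import numpy as np

# Flat priors on six Cartesian coordinates imply a sharply structured prior
# on r = sqrt(x1^2 + ... + x6^2): the radial density grows like r^5,
# so almost no prior mass sits at small radii.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(100_000, 6))  # flat in each coordinate
r = np.linalg.norm(pts, axis=1)

print(f"mean r = {r.mean():.2f}, std = {r.std():.2f}")
print(f"P(r < 0.5) = {(r < 0.5).mean():.4f}")    # tiny: we 'know' r is not small
```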

Isospin Analysis: B → hh. J. Charles et al., hep-ph/; Gronau/London (1990). Parametrization labels: MA = Modulus & Argument; RI = Real & Imaginary. [Slide shows posterior plots comparing the two; plot label: improper posterior.]

Isospin Analysis: removing information from B0 → π0π0. No model-independent constraint on α can be inferred in this case. Information is extracted on α, which is introduced by the priors (where else?).

Conclusion. Statistics is not a science, it is mathematics (Nature will not decide for us). [You will not learn it in physics books → go to the professional literature!] There have been many attempts to define an 'ignorance' prior that would 'let the data speak by themselves', but none is convincing: priors are informative. Quite generally, a prior that gives results that are reasonable from various viewpoints for a single parameter will have unappealing features if applied independently to many parameters. In a multiparameter space, Bayesian credible intervals generally under-cover. If the problem has some invariance properties, then the prior should have the corresponding structure. → The specification of priors is fraught with pitfalls (especially in high dimensions). Examine the consequences of your assumptions (metric, priors, etc.). Check for robustness: vary your assumptions. Exploring the frequentist properties of the result should be strongly encouraged. See the PHYSTAT conferences.

α[ππ]: B-factories status, LP07.

Isospin analysis: reminder. Neglecting EW penguins, the amplitudes of the SU(2)-related B → ππ modes are:

√2 A+0 = √2 A(Bu → π+π0) = e^(-iα) (T+- + T00),   √2 Ā+0 = e^(+iα) (T+- + T00)
A+- = A(Bd → π+π-) = e^(-iα) T+- + P+-,   Ā+- = e^(+iα) T+- + P+-
√2 A00 = √2 A(Bd → π0π0) = e^(-iα) T00 - P+-,   √2 Ā00 = e^(+iα) T00 - P+-

so |A+0| = |Ā+0|, and the relative phase between the A and Ā triangles is ΔΦ = 2α (ΔΦ = 2α_eff in the presence of penguins). SU(2) triangular relation: A+0 = A+-/√2 + A00 (and likewise for Ā). The same applies to B → ρρ decays dominated by the longitudinally polarized ρ (CP-even final state). The measurements enter as: B+-, C+- → |A+-|, |Ā+-|; B00, C00 → |A00|, |Ā00|; S+- → sin(2α_eff) → 2-fold α_eff in [0, π]; S00 → relative phase between A00 and Ā00. Closing the SU(2) triangles → 8-fold α.

Isospin analysis: reminder (continued). sin(2α_eff) from B → (π/ρ)+(π/ρ)- → 2 solutions for α_eff in [0, π]. Δα = α - α_eff from the SU(2) B/Bbar triangles → 1, 2 or 4 solutions for Δα (depending on triangle closure) → 2, 4 or 8 solutions for α = α_eff + Δα. [Diagram in the original slide, for B/Bbar → ππ and ρρ: measuring C00 but no S00 gives a 4-fold Δα; neither C00 nor S00 a 2-fold Δα; both C00 and S00 a 1-fold Δα ('plateau' or peak, depending on the ratios A00/A+0 and A+-/√2/A+0).]

Developments in Bayesian Priors. Roger Barlow, Manchester IoP meeting, November 16th 2005.

Plan: probability (frequentist, Bayesian); Bayes' theorem and priors; prior pitfalls (1): Le Diberder; prior pitfalls (2): Heinrich; Jeffreys' prior and Fisher information; reference priors: Demortier.

Probability. Probability as the limit of frequency: P(A) = lim N_A / N_total. This is the usual definition taught to students. It makes sense, and it works well most of the time, but not always.
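A one-minute simulation of this limiting frequency, for a fair coin (hypothetical example, not from the slides):

```python
import numpy as np

# P(A) as the limit of N_A / N_total: running frequency of heads.
rng = np.random.default_rng(0)
flips = rng.integers(0, 2, size=1_000_000)               # 1 = heads
running = np.cumsum(flips) / np.arange(1, flips.size + 1)
for n in (10, 1_000, 1_000_000):
    print(f"N = {n:>9}: N_A/N = {running[n - 1]:.4f}")   # -> 0.5
```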

Frequentist probability. Statements a frequentist cannot make: "It will probably rain tomorrow." "M_t = 174.3 ± 5.1 GeV means the top quark mass lies between 169.2 and 179.4 GeV, with 68% probability." What a frequentist can say: "The statement 'It will rain tomorrow' is probably true." "M_t = 174.3 ± 5.1 GeV means: the top quark mass lies between 169.2 and 179.4 GeV, at 68% confidence."

Bayesian Probability. P(A) expresses my belief that A is true. Limits: 0 (impossible) and 1 (certain). Calibrated against clear-cut instances (coins, dice, urns).

Frequentist versus Bayesian? Two sorts of probability, totally different. (Bayesian probability is also known as inverse probability.) Rivals? Religious differences? Particle physicists tend to be frequentists; cosmologists tend to be Bayesians. No: they are two different tools for practitioners. It is important to be aware of the limits and pitfalls of both, and always to be aware which one you are using.

Bayes' Theorem (1763). P(A|B) P(B) = P(A and B) = P(B|A) P(A), so

P(A|B) = P(B|A) P(A) / P(B)

Frequentist use, e.g., a Čerenkov counter: P(π | signal) = P(signal | π) P(π) / P(signal). Bayesian use: P(theory | data) = P(data | theory) P(theory) / P(data).
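A worked version of the Čerenkov-counter example with made-up numbers (the slide gives only the formula):

```python
# Bayes' theorem for particle identification. All rates are hypothetical:
# suppose 90% of tracks are pions, the counter fires for 95% of pions
# and (mistakenly) for 5% of kaons.
p_pi = 0.90                  # prior P(pi)
p_k = 1.0 - p_pi             # prior P(K)
p_sig_pi = 0.95              # P(signal | pi)
p_sig_k = 0.05               # P(signal | K)

p_signal = p_sig_pi * p_pi + p_sig_k * p_k           # total P(signal)
p_pi_given_sig = p_sig_pi * p_pi / p_signal          # P(pi | signal)
print(f"P(pi | signal) = {p_pi_given_sig:.4f}")      # ≈ 0.994
```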

Bayesian Prior. P(theory) is the prior: it expresses the prior belief that the theory is true. It can be a function of a parameter: P(M_top), P(M_H), P(α, β, γ). Bayes' theorem describes the way prior belief is modified by experimental data. But what do you take as the initial prior?

Uniform Prior. General usage: choose P(a) uniform in a (the principle of insufficient reason). This is often 'improper': ∫P(a) da = ∞, though the posterior P(a|x) comes out sensible. BUT! If P(a) is uniform, then P(a²), P(ln a), P(√a), … are not. Insufficient reason is not valid (unless a is 'most fundamental', whatever that means). Statisticians handle this by checking results for 'robustness' under different priors.
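A sketch of that robustness check for a toy Gaussian measurement (all numbers hypothetical): redo the same posterior under priors flat in a, in ln a, and in √a, and compare the answers.

```python
import numpy as np

# Robustness check: the same likelihood combined with different "ignorance"
# priors. If the answers differ materially, the data are not dominating.
a = np.linspace(0.01, 10.0, 2000)
da = a[1] - a[0]
x, sigma = 3.0, 1.0
likelihood = np.exp(-0.5 * ((x - a) / sigma) ** 2)

priors = {
    "flat in a": np.ones_like(a),
    "flat in ln a": 1.0 / a,              # uniform in ln a
    "flat in sqrt(a)": 1.0 / np.sqrt(a),  # uniform in sqrt(a)
}
for name, prior in priors.items():
    post = likelihood * prior
    post /= post.sum() * da               # normalize on the grid
    print(f"{name:>15}: posterior mean = {(a * post).sum() * da:.3f}")
```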