Frequentist versus Bayesian. Glen Cowan, Statistics in HEP, IoP Half Day Meeting, 16 November 2005, Manchester.

1 Frequentist versus Bayesian

3 The Bayesian approach
In Bayesian statistics we can associate a probability with a hypothesis, e.g., a parameter value θ. Interpret the probability of θ as ‘degree of belief’ (subjective). We need to start with a ‘prior pdf’ π(θ), which reflects the degree of belief about θ before doing the experiment. Our experiment has data x → likelihood function L(x|θ). Bayes’ theorem tells how our beliefs should be updated in light of the data x:
$$p(\theta|x) = \frac{L(x|\theta)\,\pi(\theta)}{\int L(x|\theta')\,\pi(\theta')\,d\theta'}$$
The posterior pdf p(θ|x) contains all our knowledge about θ.
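The update described on this slide can be sketched numerically on a grid. This is only an illustration: the Gaussian likelihood, the observed value `x_obs`, the resolution `sigma`, the flat prior and the helper name `posterior_on_grid` are all assumptions made for the example, not part of the talk.

```python
import math

def posterior_on_grid(thetas, prior, likelihood):
    """Return normalised posterior values p(theta|x) on an evenly spaced grid."""
    step = thetas[1] - thetas[0]
    unnorm = [likelihood(t) * prior(t) for t in thetas]
    norm = sum(unnorm) * step  # Riemann-sum approximation of the normalising integral
    return [u / norm for u in unnorm]

# Hypothetical experiment: a single measurement with Gaussian resolution.
x_obs, sigma = 1.2, 0.5
likelihood = lambda t: math.exp(-0.5 * ((x_obs - t) / sigma) ** 2)
prior = lambda t: 1.0  # flat prior over the grid range (an assumption)

thetas = [-3.0 + 0.01 * i for i in range(801)]  # grid from -3 to 5
post = posterior_on_grid(thetas, prior, likelihood)

step = thetas[1] - thetas[0]
post_mean = sum(t * p for t, p in zip(thetas, post)) * step
print(f"posterior mean = {post_mean:.3f}")
```

Swapping in a different `prior` function shows directly how the posterior depends on that choice.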

4 Case #4: Bayesian method
We need to associate prior probabilities with θ0 and θ1, e.g., [prior pdfs not preserved in the transcript; their annotations read ‘based on previous measurement’ and ‘reflects prior ignorance, in any case much broader than’ (truncated)]. Putting this into Bayes’ theorem gives: posterior ∝ likelihood × prior.

5 Bayesian method (continued)
We then integrate (marginalize) p(θ0, θ1 | x) to find p(θ0 | x):
$$p(\theta_0|x) = \int p(\theta_0,\theta_1|x)\,d\theta_1$$
In this example we can do the integral (rare). We find [result not preserved in the transcript]. The ability to marginalize over nuisance parameters is an important feature of Bayesian statistics.
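When the integral cannot be done in closed form, marginalization can be approximated numerically. The sketch below uses a toy joint posterior (a correlated bivariate Gaussian with unit variances, chosen purely for illustration and not taken from the talk) and sums over the nuisance parameter on a grid.

```python
import math

def joint(theta0, theta1, rho=0.8):
    """Unnormalised toy joint posterior: correlated bivariate Gaussian (hypothetical)."""
    q = (theta0 ** 2 - 2.0 * rho * theta0 * theta1 + theta1 ** 2) / (1.0 - rho ** 2)
    return math.exp(-0.5 * q)

step = 0.05
grid = [-6.0 + step * i for i in range(241)]  # -6 to 6 in both parameters

# Marginalise: p(theta0|x) is proportional to the sum over theta1 of the joint posterior.
marginal = [sum(joint(t0, t1) for t1 in grid) * step for t0 in grid]
norm = sum(marginal) * step
marginal = [m / norm for m in marginal]

mean0 = sum(t * m for t, m in zip(grid, marginal)) * step
var0 = sum((t - mean0) ** 2 * m for t, m in zip(grid, marginal)) * step
print(f"marginal mean = {mean0:.3f}, variance = {var0:.3f}")
```

For this toy model the marginal of θ0 is a unit Gaussian whatever the correlation, which gives a simple check on the numerical integration.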

6 Bayesian Statistics at work: The Troublesome Extraction of the angle α. Stéphane T’JAMPENS, LAPP (CNRS/IN2P3 & Université de Savoie). J. Charles, A. Höcker, H. Lacker, F.R. Le Diberder, S. T’Jampens, hep-ph/0607246

7 Digression: Statistics
Statistics tries answering a wide variety of questions → two main (different!) frameworks:
Frequentist: probability about the data (randomness of measurements), given the model: P(data|model) [only repeatable events (Sampling Theory)]. Hypothesis testing: given a model, assess the consistency of the data with a particular parameter value → 1-CL curve (by varying the parameter value).
Bayesian: probability about the model (degree of belief), given the data: P(model|data) ∝ Likelihood(data,model) × Prior(model)
D.R. Cox, Principles of Statistical Inference, CUP (2006); W.T. Eadie et al., Statistical Methods in Experimental Physics, NHP (1971); www.phystat.org

8 Bayesian Statistics in 1 slide
The Bayesian approach is based on the use of inverse probability (“posterior”). Bayesian: probability about the model (degree of belief), given the data. Bayes’ rule: P(model|data) ∝ Likelihood(data;model) × Prior(model)
→ “it treats information derived from data (“likelihood”) as on exactly equal footing with probabilities derived from vague and unspecified sources (“prior”). The assumption that all aspects of uncertainties are directly comparable is often unacceptable.”
→ “nothing guarantees that my uncertainty assessment is any good for you - I'm just expressing an opinion (degree of belief). To convince you that it's a good uncertainty assessment, I need to show that the statistical model I created makes good predictions in situations where we know what the truth is, and the process of calibrating predictions against reality is inherently frequentist.” (e.g., MC simulations)
Cox – Principles of Statistical Inference (2006)

9 Uniform prior: model of ignorance?
A central problem: specifying a prior distribution for a parameter about which nothing is known → flat prior.
Problems:
Not re-parametrization invariant (metric dependent): uniform in θ is not uniform in z = cos θ.
Favors large values too much [the prior probability for the range 0.1 to 1 is 10 times less than for 1 to 10].
Flat priors in several dimensions may produce clearly unacceptable answers.
In simple problems, appropriate* flat priors yield essentially the same answer as non-Bayesian sampling theory. However, in other situations, particularly those involving more than two parameters, ignorance priors lead to different and entirely unacceptable answers.
* (uniform prior for scalar location parameter, Jeffreys’ prior for scalar scale parameter).
Cox – Principles of Statistical Inference (2006)
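The first two problems can be checked with a few lines of code. The sketch below draws θ uniformly on [0, π] (the range is an assumption for the example), transforms to z = cos θ, and compares the probability found near |z| = 1 with what a prior flat in z would give. It also evaluates the decade ratio quoted on the slide for a flat prior on a positive parameter.

```python
import math
import random

random.seed(1)
n = 200_000

# theta uniform on [0, pi]; transform each draw to z = cos(theta)
z = [math.cos(random.uniform(0.0, math.pi)) for _ in range(n)]

frac_edge = sum(1 for v in z if abs(v) > 0.9) / n  # observed P(|z| > 0.9)
flat_in_z = 0.1                                    # P(|z| > 0.9) if the prior were flat in z
exact = 2.0 * math.acos(0.9) / math.pi             # analytic value for a prior flat in theta

# Flat prior on a positive parameter: weight of [0.1, 1] relative to [1, 10]
ratio = (1.0 - 0.1) / (10.0 - 1.0)

print(f"P(|z|>0.9): sampled {frac_edge:.3f}, analytic {exact:.3f}, flat in z {flat_in_z}")
print(f"weight of [0.1, 1] relative to [1, 10]: {ratio:.2f}")
```

A prior that is flat in θ piles up probability near z = ±1, so "knowing nothing" about θ is a definite statement about cos θ.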

10 Uniform Prior in Multidimensional Parameter Space
Hypersphere: one knows nothing about the individual Cartesian coordinates x, y, z, … What do we know about the radius r = √(x² + y² + …)? One has achieved the remarkable feat of learning something about the radius of the hypersphere, whereas one knew nothing about the Cartesian coordinates, and without making any experiment. [Figure: 6D space]
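A quick Monte Carlo makes the point concrete. As an assumption for this sketch, each of six coordinates gets a flat prior on [-1, 1]; the induced distribution of the radius is then far from flat, and small radii are almost excluded before any data are taken.

```python
import math
import random

random.seed(2)
dim, n = 6, 100_000

radii = []
for _ in range(n):
    # independent flat priors on every Cartesian coordinate
    point = [random.uniform(-1.0, 1.0) for _ in range(dim)]
    radii.append(math.sqrt(sum(c * c for c in point)))

mean_r = sum(radii) / n
std_r = math.sqrt(sum((r - mean_r) ** 2 for r in radii) / n)
mean_r2 = sum(r * r for r in radii) / n
frac_small = sum(1 for r in radii if r < 0.5) / n

print(f"mean r = {mean_r:.3f}, std r = {std_r:.3f}")
print(f"mean r^2 = {mean_r2:.3f}, prior fraction with r < 0.5 = {frac_small:.4f}")
```

The expected value of r² is dim/3 for this setup, so the "ignorance" prior on the coordinates is quite informative about r.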

11 Isospin Analysis: B→hh (J. Charles et al., hep-ph/0607246; Gronau/London (1990)). MA: Modulus & Argument; RI: Real & Imaginary. Improper posterior.

12 Isospin Analysis: removing information from B0 → π0π0
→ No model-independent constraint on α can be inferred in this case.
→ Information is extracted on α, which is introduced by the priors (where else?)

13 Conclusion
Statistics is not a science, it is mathematics (Nature will not decide for us). [You will not learn it in physics books → go to the professional literature!]
Many attempts have been made to define an “ignorance” prior to “let the data speak by themselves”, but none is convincing. Priors are informative.
Quite generally, a prior that gives results that are reasonable from various viewpoints for a single parameter will have unappealing features if applied independently to many parameters.
In a multiparameter space, credible Bayesian intervals generally under-cover.
If the problem has some invariance properties, then the prior should have the corresponding structure.
→ Specification of priors is fraught with pitfalls (especially in high dimensions).
Examine the consequences of your assumptions (metric, priors, etc.). Check for robustness: vary your assumptions. Exploring the frequentist properties of the result should be strongly encouraged.
PHYSTAT Conferences: http://www.phystat.org
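One way to explore the frequentist properties of a Bayesian result is a coverage calculation. The sketch below is a one-parameter toy, not the multiparameter case the slide warns about: it assumes a Poisson counting experiment without background, a flat prior on the mean, and a 90% credible upper limit. The function names and the test value `s_true = 3.0` are hypothetical.

```python
import math

def poisson_pmf(n, s):
    return math.exp(-s) * s ** n / math.factorial(n)

def posterior_cdf(s, n):
    """CDF of the flat-prior posterior for a Poisson mean after observing n events."""
    return 1.0 - sum(poisson_pmf(k, s) for k in range(n + 1))

def upper_limit(n, cred=0.90):
    """Credible upper limit, found by bisection on the posterior CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid, n) < cred:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def coverage(s_true, cred=0.90, n_max=60):
    """Probability, over repeated experiments, that the limit lies at or above s_true."""
    return sum(poisson_pmf(n, s_true)
               for n in range(n_max + 1)
               if upper_limit(n, cred) >= s_true)

s_true = 3.0
print(f"coverage at s_true = {s_true}: {coverage(s_true):.4f}")
```

Scanning `s_true` over a range shows how the actual coverage compares with the nominal credibility level for a given prior.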

26 α[ππ ] : B-factories status LP07

27 Isospin analysis: reminder
Neglecting EW penguins, the amplitudes of the SU(2)-related B → ππ modes are:
$\sqrt{2}\,A^{+0} = \sqrt{2}\,A(B_u \to \pi^+\pi^0) = e^{-i\alpha}(T^{+-} + T^{00})$, $\sqrt{2}\,\bar{A}^{+0} = e^{+i\alpha}(T^{+-} + T^{00})$
$A^{+-} = A(B_d \to \pi^+\pi^-) = e^{-i\alpha}T^{+-} + P^{+-}$, $\bar{A}^{+-} = e^{+i\alpha}T^{+-} + P^{+-}$
$\sqrt{2}\,A^{00} = \sqrt{2}\,A(B_d \to \pi^0\pi^0) = e^{-i\alpha}T^{00} - P^{+-}$, $\sqrt{2}\,\bar{A}^{00} = e^{+i\alpha}T^{00} - P^{+-}$
SU(2) triangular relation: $A^{+0} = A^{+-}/\sqrt{2} + A^{00}$
Same for B → ρρ decays, dominated by longitudinally polarized ρ (CP-even final state).
B+0 → |A+0| = |Ā+0|
B+-, C+- → |A+-|, |Ā+-|
S+- → sin(2α_eff) → 2-fold α_eff in [0,π]
B00, C00 → |A00|, |Ā00|
Closing the SU(2) triangle → 8-fold α
S00 → relative phase between A00 and Ā00
[Figure: B and B̄ isospin triangles in the (Re, Im) plane with sides A+0, A+-/√2 and A00; ΔΦ = 2α between A+0 and Ā+0, ΔΦ = 2α_eff between A+- and Ā+-]
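The amplitude relations on this slide can be checked numerically. In the sketch below the values of α, the tree amplitudes `T_pm`, `T_00` and the penguin `P_pm` are arbitrary illustrative numbers, not fitted results. The code builds the B and B̄ amplitudes from the parametrization, then verifies that both triangles close and that the phase between the two charged-mode amplitudes is 2α.

```python
import cmath
import math

alpha = math.radians(100.0)       # hypothetical weak phase
T_pm = 1.0 * cmath.exp(0.3j)      # hypothetical tree amplitudes
T_00 = 0.4 * cmath.exp(-0.7j)
P_pm = 0.3 * cmath.exp(1.1j)      # hypothetical penguin amplitude

def amplitudes(sign):
    """sign = -1 for B (weak phase e^{-i alpha}), +1 for Bbar (e^{+i alpha})."""
    weak = cmath.exp(sign * 1j * alpha)
    a_p0 = weak * (T_pm + T_00) / math.sqrt(2)
    a_pm = weak * T_pm + P_pm
    a_00 = (weak * T_00 - P_pm) / math.sqrt(2)
    return a_p0, a_pm, a_00

A_p0, A_pm, A_00 = amplitudes(-1)
Ab_p0, Ab_pm, Ab_00 = amplitudes(+1)

# SU(2) triangle: A+0 = A+-/sqrt(2) + A00, for B and for Bbar
closure_B = abs(A_p0 - (A_pm / math.sqrt(2) + A_00))
closure_Bbar = abs(Ab_p0 - (Ab_pm / math.sqrt(2) + Ab_00))

# Phase differences between Bbar and B amplitudes, folded into [0, 2 pi)
two_alpha = cmath.phase(Ab_p0 / A_p0) % (2 * math.pi)
two_alpha_eff = cmath.phase(Ab_pm / A_pm) % (2 * math.pi)

print(f"triangle closure: {closure_B:.2e}, {closure_Bbar:.2e}")
print(f"2*alpha = {math.degrees(two_alpha):.1f} deg, "
      f"2*alpha_eff = {math.degrees(two_alpha_eff):.1f} deg")
```

With a non-zero penguin term the two phases differ, which is the shift Δα = α − α_eff that the isospin analysis is meant to constrain.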

28 Isospin analysis: reminder
sin(2α_eff) from B → (π/ρ)+(π/ρ)- → 2 solutions for α_eff in [0,π].
Δα = α − α_eff from the SU(2) B/B̄ triangles → 1, 2 or 4 solutions for Δα (depending on triangle closure) → 2, 4 or 8 solutions for α = α_eff + Δα.
[Figure: B and B̄ triangles for the ππ and ρρ cases, with axes A00/A+0 and A+-/√2/A+0. Panel labels: C00 but no S00; no C00/S00; C00 and S00; 4-fold Δα; 2-fold Δα; 1-fold Δα (‘plateau’); 1-fold Δα (peak)]

29 Developments in Bayesian Priors. Roger Barlow, Manchester IoP meeting, November 16th 2005

30 Plan
Probability
–Frequentist
–Bayesian
Bayes Theorem
–Priors
Prior pitfalls (1): Le Diberder
Prior pitfalls (2): Heinrich
Jeffreys’ Prior
–Fisher Information
Reference Priors: Demortier

31 Probability
Probability as limit of frequency: P(A) = lim N_A / N_total as N_total → ∞. The usual definition taught to students. Makes sense. Works well most of the time, but not all.

32 Frequentist probability
“It will probably rain tomorrow.” → “The statement ‘It will rain tomorrow.’ is probably true.”
“Mt = 174.3 ± 5.1 GeV means the top quark mass lies between 169.2 and 179.4, with 68% probability.” → “Mt = 174.3 ± 5.1 GeV means: the top quark mass lies between 169.2 and 179.4, at 68% confidence.”

33 Bayesian Probability
P(A) expresses my belief that A is true. Limits: 0 (impossible) and 1 (certain). Calibrated off clear-cut instances (coins, dice, urns).

34 Frequentist versus Bayesian?
Two sorts of probability, totally different. (Bayesian probability is also known as Inverse Probability.)
Rivals? Religious differences? Particle physicists tend to be frequentists; cosmologists tend to be Bayesians.
No. Two different tools for practitioners.
Important to: be aware of the limits and pitfalls of both; always be aware which you’re using.

35 Bayes Theorem (1763)
P(A|B) P(B) = P(A and B) = P(B|A) P(A)
P(A|B) = P(B|A) P(A) / P(B)
Frequentist use, e.g. Čerenkov counter: P(π | signal) = P(signal | π) P(π) / P(signal)
Bayesian use: P(theory | data) = P(data | theory) P(theory) / P(data)
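The frequentist use of Bayes' theorem can be illustrated with a particle-identification toy. All numbers below are invented for the example: a Čerenkov counter is assumed to fire for 95% of pions and 5% of kaons, in a beam assumed to be 80% pions and 20% kaons. Every probability here is a frequency that could in principle be measured.

```python
def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical counter response and beam composition
p_signal_given_pi = 0.95
p_signal_given_k = 0.05
p_pi, p_k = 0.80, 0.20

# Law of total probability for the denominator
p_signal = p_signal_given_pi * p_pi + p_signal_given_k * p_k

p_pi_given_signal = bayes(p_signal_given_pi, p_pi, p_signal)
print(f"P(signal) = {p_signal:.3f}")
print(f"P(pi | signal) = {p_pi_given_signal:.4f}")
```

The Bayesian use has the same arithmetic, but P(theory) is then a degree of belief rather than a measurable frequency.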

36 Bayesian Prior
P(theory) is the Prior. It expresses the prior belief that the theory is true. It can be a function of a parameter: P(M_top), P(M_H), P(α,β,γ). Bayes’ Theorem describes the way prior belief is modified by experimental data. But what do you take as the initial prior?

37 Uniform Prior
General usage: choose P(a) uniform in a (principle of insufficient reason). Often ‘improper’: ∫P(a) da = ∞, though the posterior P(a|x) comes out sensible.
BUT! If P(a) is uniform, P(a²), P(ln a), P(√a), … are not. Insufficient reason is not valid (unless a is ‘most fundamental’, whatever that means).
Statisticians handle this: check results for ‘robustness’ under different priors.
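A robustness check of this kind can be sketched as follows. The setup is an assumption for illustration: a Poisson counting experiment with `n_obs = 3` observed events and mean a, analysed with three priors that are uniform in a, in √a and in ln a respectively. The posterior mean is computed by midpoint integration on a grid.

```python
import math

def posterior_mean(n_obs, prior, a_max=40.0, h=0.002):
    """Posterior mean of a Poisson mean a, by midpoint integration on (0, a_max)."""
    num = den = 0.0
    for i in range(int(a_max / h)):
        a = (i + 0.5) * h
        weight = a ** n_obs * math.exp(-a) * prior(a)  # likelihood times prior
        num += a * weight
        den += weight
    return num / den

# Priors written as densities in a (normalisation is irrelevant here)
priors = {
    "uniform in a": lambda a: 1.0,
    "uniform in sqrt(a)": lambda a: 1.0 / math.sqrt(a),
    "uniform in ln(a)": lambda a: 1.0 / a,
}

n_obs = 3  # hypothetical observed count
means = {name: posterior_mean(n_obs, p) for name, p in priors.items()}
for name, m in means.items():
    print(f"{name:20s} posterior mean = {m:.3f}")
```

The spread between the three answers is a direct measure of how much the result depends on the choice of metric for the "uniform" prior.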

