Developments in Bayesian Priors
Roger Barlow
Manchester IoP meeting, November 16th 2005
Slide 2: Plan
- Probability
  - Frequentist
  - Bayesian
- Bayes' Theorem
  - Priors
- Prior pitfalls (1): Le Diberder
- Prior pitfalls (2): Heinrich
- Jeffreys' Prior
  - Fisher Information
- Reference Priors: Demortier
Slide 3: Probability
- Probability as the limit of frequency: P(A) = lim N_A / N_total (as N_total → ∞)
- The usual definition taught to students
- Makes sense, and works well most of the time; but not always
Slide 4: Frequentist probability
Statements like these are not valid for a frequentist:
- "It will probably rain tomorrow."
- "M_t = 174.3 ± 5.1 GeV means the top quark mass lies between 169.2 and 179.4 GeV, with 68% probability."
A frequentist can instead say:
- "The statement 'It will rain tomorrow' is probably true."
- "M_t = 174.3 ± 5.1 GeV means: the top quark mass lies between 169.2 and 179.4 GeV, at 68% confidence."
Slide 5: Bayesian Probability
- P(A) expresses my belief that A is true
- Limits: 0 (impossible) and 1 (certain)
- Calibrated off clear-cut instances (coins, dice, urns)
Slide 6: Frequentist versus Bayesian?
- Two sorts of probability, totally different. (Bayesian probability is also known as Inverse Probability.)
- Rivals? Religious differences? Particle physicists tend to be frequentists; cosmologists tend to be Bayesians.
- No: they are two different tools for practitioners.
- It is important to be aware of the limits and pitfalls of both, and always to know which you are using.
Slide 7: Bayes' Theorem (1763)
P(A|B) P(B) = P(A and B) = P(B|A) P(A)
so P(A|B) = P(B|A) P(A) / P(B)
- Frequentist use, e.g. a Čerenkov counter: P(π | signal) = P(signal | π) P(π) / P(signal)
- Bayesian use: P(theory | data) = P(data | theory) P(theory) / P(data)
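As a numerical illustration of the frequentist use, a minimal sketch in Python; the detection probabilities and beam fractions below are invented for illustration, not taken from the talk.

```python
# Bayes' theorem for Cherenkov particle ID: P(pi | signal).
# All numbers here are illustrative assumptions, not from the talk.
p_signal_given_pi = 0.95      # probability the counter fires for a pion
p_signal_given_k = 0.05       # probability it fires for a kaon
p_pi, p_k = 0.90, 0.10        # assumed prior particle fractions in the beam

# Law of total probability gives the denominator P(signal)
p_signal = p_signal_given_pi * p_pi + p_signal_given_k * p_k

# Bayes' theorem
p_pi_given_signal = p_signal_given_pi * p_pi / p_signal
print(f"P(pi | signal) = {p_pi_given_signal:.3f}")   # about 0.994
```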
Slide 8: Bayesian Prior
- P(theory) is the prior: it expresses the prior belief that the theory is true
- Can be a function of a parameter: P(M_top), P(M_H), P(α,β,γ)
- Bayes' theorem describes the way prior belief is modified by experimental data
- But what do you take as the initial prior?
Slide 9: Uniform Prior
- General usage: choose P(a) uniform in a (the principle of insufficient reason)
- Often 'improper': ∫P(a) da = ∞, though the posterior P(a|x) comes out sensible
- BUT! If P(a) is uniform, then P(a²), P(ln a), P(√a)... are not
- Insufficient reason is not valid (unless a is 'most fundamental', whatever that means)
- Statisticians handle this by checking results for 'robustness' under different priors
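The non-invariance is easy to demonstrate numerically; a minimal sketch, where the transformation a → a² is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(0.0, 1.0, size=1_000_000)   # P(a) uniform on [0, 1]

# The implied density of a**2 is p(y) = 1/(2*sqrt(y)): strongly peaked at 0,
# so 'insufficient reason' in a is a definite opinion about a**2.
counts, _ = np.histogram(a**2, bins=10, range=(0.0, 1.0))
print(counts / counts.sum())   # first bin holds ~32% of the probability
```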
Slide 10: Example – Le Diberder
A sad story: fitting the CKM angle α from B decays
- 6 observables
- 3 amplitudes, hence 6 unknown parameters (magnitudes and phases)
- α is the fundamentally interesting one
Slide 11: Results
[Frequentist and Bayesian fit results shown as plots]
- One phase set to zero
- Uniform priors in the other two phases and the 3 magnitudes
Slide 12: More Results
[Two further Bayesian fits shown as plots]
- Bayesian: parametrise the Tree and Penguin amplitudes
- Bayesian: the 3 amplitudes as 3 real parts and 3 imaginary parts
Slide 13: Interpretation
- B shows the same (mis)behaviour
- Removing all experimental information gives a similar P(α): the shape comes from the prior, not the data
- The curse of high dimensions is at work: uniformity in x, y, z makes P(r) peak at large r
- This result is not robust under changes of prior
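The geometric point is easy to check by Monte Carlo; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
xyz = rng.uniform(-1.0, 1.0, size=(1_000_000, 3))   # flat priors in x, y, z
r = np.linalg.norm(xyz, axis=1)

# Within the unit sphere the implied density of r grows like r**2,
# so 'flat' priors on the components push r towards large values.
counts, _ = np.histogram(r[r < 1.0], bins=5, range=(0.0, 1.0))
print(counts / counts.sum())   # rises steeply: ~ (0.008, 0.056, 0.152, ...)
```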
Slide 14: Example – Heinrich
- The CDF statistics group looked at estimating a signal cross section S in the presence of background and efficiency: N = εS + b
- Efficiency and background come from separate calibration experiments (sidebands or MC); the scaling factors κ, ω are known
- Everything is done with Bayesian methods, uniform priors, and the Poisson statistics formula
- The calibration experiments use uniform priors for ε and for b, yielding posteriors P(ε), P(b) that are used for S:
  P(N|S) = (1/N!) ∫∫ e^−(εS+b) (εS+b)^N P(ε) P(b) dε db
- Check coverage: all fine
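A sketch of the machinery on this slide, for a single channel. The Gamma distributions standing in for the calibration posteriors P(ε) and P(b) are assumptions, matched to the 25 ± 10% and 0.75 ± 0.25 numbers quoted on the next slide, not Heinrich's actual posteriors:

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(0)
# Illustrative stand-ins for the calibration posteriors P(eps) and P(b):
# Gamma with mean 0.25, sd 0.10, and Gamma with mean 0.75, sd 0.25.
eps = rng.gamma(shape=6.25, scale=0.04, size=2_000)
b = rng.gamma(shape=9.0, scale=1.0 / 12.0, size=2_000)

S_grid = np.linspace(0.0, 80.0, 401)
dS = S_grid[1] - S_grid[0]
mu = eps * S_grid[:, None] + b          # Poisson means, shape (S values, samples)

def upper_limit(n_obs, cl=0.90):
    """Bayesian upper limit on S: eps and b integrated out, uniform prior in S."""
    like = poisson.pmf(n_obs, mu).mean(axis=1)   # the P(N|S) integral, by MC
    post = like / (like.sum() * dS)              # posterior for S
    cdf = np.cumsum(post) * dS
    return S_grid[np.searchsorted(cdf, cl)]

print("90% upper limit for N = 4:", upper_limit(4))
```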
Slide 15: But it all goes pear-shaped...
- If the particle decays in several channels (H→γγ, H→τ⁺τ⁻, H→bb̄), each channel has its own b and ε: in total 2N+1 parameters and 2N+1 experiments
- Heavy undercoverage! E.g. with 4 channels, all ε = 25 ± 10% and b = 0.75 ± 0.25: for S ≈ 10 the quoted '90% upper limit' lies above the true S in only 80% of cases
[Plot: coverage of the quoted 90% limit as a function of S, for S between 10 and 20]
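Coverage can be checked by brute force, continuing the sketch above: throw pseudo-experiments at a fixed true S and count how often the quoted limit lies above it. This single-channel skeleton shows only the mechanics; the heavy undercoverage appears when the same test is run on the multi-channel problem:

```python
def coverage(S_true, n_toys=500, eps_true=0.25, b_true=0.75):
    """Fraction of pseudo-experiments whose 90% upper limit covers S_true."""
    n = rng.poisson(eps_true * S_true + b_true, size=n_toys)
    limits = np.array([upper_limit(n_i) for n_i in n])
    return (limits >= S_true).mean()

# Exact coverage would give >= 0.90; the 4-channel version of this test
# is what yields roughly 0.80 at S ~ 10 with uniform priors.
print(coverage(10.0))
```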
Slide 16: The curse strikes again
- A uniform prior in one ε: fine
- Uniform priors in ε_1, ε_2 ... ε_N: an ε^(N−1) prior in the total ε
- That is a prejudice in favour of high efficiency, so the signal size is downgraded
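One way to see the ε^(N−1) behaviour numerically, taking the sum of the channel efficiencies as the 'total ε' (an assumed interpretation of the slide):

```python
import numpy as np

rng = np.random.default_rng(0)
# Four channels with independent flat priors on [0, 1]
eps_total = rng.uniform(0.0, 1.0, size=(1_000_000, 4)).sum(axis=1)

# For totals below 1 the implied density is eps**3 / 3! (Irwin-Hall),
# i.e. eps**(N-1): low total efficiency is heavily disfavoured.
counts, _ = np.histogram(eps_total[eps_total < 1.0], bins=5, range=(0.0, 1.0))
print(counts / counts.sum())   # rises like the fourth power of the bin edge
```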
Slide 17: Happy ending
- The effect is avoided by using Jeffreys' priors instead of uniform priors for ε and b
- These are not uniform, but behave like 1/ε and 1/b
- Not entirely realistic, but interesting
- A uniform prior in S is not a problem; but maybe one should consider 1/√S?
- Coverage (a very frequentist concept) is a useful tool for Bayesians
Slide 18: Fisher Information
- An informative experiment is one for which a measurement of x will give precise information about the parameter a
- Quantify: I(a) = −⟨∂² ln P(x,a)/∂a²⟩, minus the expected second derivative (the curvature)
- P(x,a) describes everything: at fixed a, P(x)|_a is the pdf of x; at fixed x, P(a)|_x is the likelihood L(a)
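As a numerical cross-check of the definition, a sketch for a Gaussian of known width, where the exact answer is I(μ) = 1/σ²:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, mu_true = 2.0, 5.0
x = rng.normal(mu_true, sigma, size=200_000)

def log_pdf(mu):
    # Per-event log likelihood, up to a mu-independent constant
    return -0.5 * ((x - mu) / sigma) ** 2

# Estimate I(mu) = -<d^2 ln P / d mu^2> with a central finite difference
h = 1e-3
d2 = (log_pdf(mu_true + h) - 2.0 * log_pdf(mu_true) + log_pdf(mu_true - h)) / h**2
print(-d2.mean(), "vs exact", 1.0 / sigma**2)   # both ~0.25
```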
Slide 19: Jeffreys' Prior
- A prior may be uniform in a, but if I(a) depends on a it is still not 'flat': special values of a give better measurements
- Transform a → a′ such that I(a′) is constant, then choose a prior uniform in a′
- Location parameter: a uniform prior is OK
- Scale parameter: a′ is ln a, so the prior is 1/a
- Poisson mean: the prior is 1/√a
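The Poisson entry follows from a short standard derivation (not spelled out on the slide):

```latex
% Jeffreys prior for a Poisson mean a
\[
  \ln P(n \mid a) = -a + n \ln a - \ln n! ,
  \qquad
  \frac{\partial^2 \ln P}{\partial a^2} = -\frac{n}{a^2}
\]
\[
  I(a) = -\left\langle \frac{\partial^2 \ln P}{\partial a^2} \right\rangle
       = \frac{\langle n \rangle}{a^2} = \frac{1}{a}
  \quad\Longrightarrow\quad
  p(a) \propto \sqrt{I(a)} = \frac{1}{\sqrt{a}}
\]
```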
Slide 20: Objective Prior?
- Jeffreys called this an 'objective' prior, as opposed to 'subjective' straight guesswork, but not everyone was convinced
- For statisticians 'flat prior' means the Jeffreys prior; for physicists it means a uniform prior
- The prior now depends on the likelihood: your 'prior belief' P(M_H) (or whatever) depends on the analysis
- Equivalent to a prior proportional to √I(a)
Slide 21: Reference Priors (Demortier): 4 steps
1) Intrinsic discrepancy between two PDFs:
   δ{P_1(z), P_2(z)} = Min{ ∫P_1(z) ln(P_1(z)/P_2(z)) dz , ∫P_2(z) ln(P_2(z)/P_1(z)) dz }
- A sensible measure of difference: δ = 0 iff P_1(z) and P_2(z) are the same, otherwise positive
- Invariant under all transformations of z
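The definition is straightforward to evaluate on a grid; a minimal sketch for two unit-width Gaussians one unit apart, where each KL integral equals 1/2:

```python
import numpy as np
from scipy.stats import norm

z = np.linspace(-10.0, 10.0, 4001)
dz = z[1] - z[0]
p1 = norm.pdf(z, loc=0.0, scale=1.0)
p2 = norm.pdf(z, loc=1.0, scale=1.0)

def kl(p, q):
    """Kullback-Leibler divergence of two gridded densities, in nats."""
    return np.sum(p * np.log(p / q)) * dz

# Intrinsic discrepancy: the smaller of the two KL directions
delta = min(kl(p1, p2), kl(p2, p1))
print(delta)   # 0.5 = (mu1 - mu2)^2 / 2 for equal-width Gaussians
```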
Slide 22: Reference Priors (2)
2) Expected intrinsic information:
- Measurement M: x is sampled from p(x|a); the parameter a has a prior p(a)
- Joint distribution p(x,a) = p(x|a) p(a); marginal distribution p(x) = ∫p(x|a) p(a) da
- I(p(a), M) = δ{p(x,a), p(x)p(a)} is the expected intrinsic (Shannon) information from measurement M about the parameter a
- It depends on (i) the x–a relationship and (ii) the breadth of p(a)
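A sketch of I(p(a), M) for a toy measurement: ten Bernoulli trials with success probability a and a flat prior on a coarse grid (all choices here are illustrative):

```python
import numpy as np
from scipy.stats import binom

a = np.linspace(0.01, 0.99, 99)          # parameter grid
prior = np.full(a.size, 1.0 / a.size)    # flat (discretised) prior p(a)

n = 10                                   # measurement M: 10 Bernoulli trials
x = np.arange(n + 1)
p_x_given_a = binom.pmf(x[:, None], n, a[None, :])   # p(x|a), shape (x, a)

joint = p_x_given_a * prior                          # p(x, a)
product = joint.sum(axis=1, keepdims=True) * prior   # p(x) p(a)

kl = lambda p, q: np.sum(p * np.log(p / q))
info = min(kl(joint, product), kl(product, joint))   # delta{p(x,a), p(x)p(a)}
print(info, "nats expected from the measurement")
```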
Slide 23: Reference Priors (3)
3) Missing information:
- Measurement M_k: k samples of x; enough measurements fix a completely
- The limit as k → ∞ of I(p(a), M_k) is the difference between the knowledge encapsulated in the prior p(a) and complete knowledge of a
- Hence it is the missing information given p(a)
Slide 24: Reference Priors (4)
4) Family of priors P (e.g. Fourier series, polynomials, histograms), with p(a) ∈ P
- Ignorance principle: choose the least informative (dumbest) prior in the family, the one for which the missing information lim(k→∞) I(p(a), M_k) is largest
- There are technical difficulties in taking the k → ∞ limit and in integrating over an infinite range of a
Slide 25: Family of Priors (Google)
[Images not reproduced]
Slide 26: Reference Priors
- Do not represent subjective belief; in fact the opposite (like a jury selection): they allow the most input to come from the data
- A formal consensus that practitioners can use to arrive at a sensible posterior
- Depend on the measurement p(x|a), cf. Jeffreys; also require the family P of possible priors
- May be improper, but this doesn't matter (they do not represent belief)
- For 1 parameter (if the measurement is asymptotically Gaussian, which the CLT usually secures) they give the Jeffreys prior
- But they can also (unlike Jeffreys) work for several parameters
Slide 27: Summary
- Probability
  - Frequentist
  - Bayesian
- Bayes' Theorem
  - Priors
- Prior pitfalls (1): Le Diberder
- Prior pitfalls (2): Heinrich
- Jeffreys' Prior
  - Fisher Information
- Reference Priors: Demortier