What is Probability? Bayes and Frequentism

Presentation transcript:

What is Probability? Bayes and Frequentism Louis Lyons Imperial College and Oxford (CMS experiment at CERN’s LHC) Islamabad Aug 2017

Topics
Will discuss mainly in context of PARAMETER ESTIMATION. Also important for GOODNESS of FIT and HYPOTHESIS TESTING.
Misconceptions about Probability
Different views of Probability
Bayes versus Frequentism
Conditional Probability and Bayes’ Theorem
Bayesian priors
P(A;B) ≠ P(B;A)
Examples: Peasant and Dog
Frequentist approach
Meaning of confidence intervals for parameters
Comparison summary of Bayes and Frequentist approaches
Conclusion

Misconceptions
Conversation with my barber about the lottery:
LL: Would you choose numbers 1,2,3,4,5,6?
Barber: No, that’s very unlikely to win.
LL: Lots of people think that, but any specific sequence is equally likely, so choose that to avoid having to share the prize.
FACT: The set of numbers 1,2,3,4,5,6 is the most popular, so even if that wins, you don’t get a large sum.
CONCLUSION: My barber was right! Don’t choose 1,2,3,4,5,6.

Misconceptions
What is probability that LL will meet Director of National Centre for Physics somewhere in 2018? 3%
What is probability that LL will meet Director of National Centre for Physics here in 2018? 10%
These are INCONSISTENT: meeting him here implies meeting him somewhere, so the second probability cannot exceed the first.

PROBABILITY
MATHEMATICAL: Formal. Based on Axioms.
FREQUENTIST: Ratio of frequencies as n → infinity. Repeated “identical” trials. Not applicable to single event or physical constant.
BAYESIAN: Degree of belief. Can be applied to single event or physical constant (even though these have unique truth). Varies from person to person ***. Quantified by “fair bet”.
LEGAL: “We have heard the expert Statistician, but he told us only the mathematical probability, while we want the true probability.”

Example of ‘analysing data’
Measure height of everyone in this room.
a) Is data consistent with bell-shaped curve (Gaussian or normal distribution)? Goodness of Fit (χ², etc.)
b) What are the best values for the position and the width of the bell-shaped curve? (e.g. 1.71m and 0.16m) Parameter Determination (Likelihood, χ², Bayes, Frequentist, ...)
c) Does data fit better to one Gaussian, or to 2 Gaussians? Hypothesis Testing (p-values, L-ratios, Bayes, ...)

BAYES and FREQUENTISM: The Return of an Old Controversy Louis Lyons Imperial College and Oxford University

It is possible to spend a lifetime analysing data without realising that there are two very different fundamental approaches to statistics: Bayesianism and Frequentism.

How can textbooks not even mention Bayes / Frequentism? For the simplest case (Gaussian, with no constraint on the true parameter value), the range quoted for the parameter at some probability is the same for both Bayes and Frequentist (but with different interpretations). BUT in other situations (non-Gaussian, excluded ranges for params, allow for systematics, ...), B and F can and do differ. See Bob Cousins, “Why isn’t every physicist a Bayesian?”, Am. J. Phys. 63 (1995) 398.

We need to make a statement about Parameters, Given Data The basic difference between the two: Bayesian : Probability (parameter, given data) (an anathema to a Frequentist!) Frequentist : Probability (data, given parameter) (a likelihood function)

CONDITIONAL PROBABILITY
P(A|B) = Prob of A, given that B has occurred
P[A+B] (probability of both A and B) = N(A+B)/Ntot = {N(A+B)/N(B)} x {N(B)/Ntot} = P(A|B) x P(B)
(Venn diagram: Ntot events in total, of which N(A) have A and N(B) have B.)
If A and B are independent, P(A|B) = P(A), so P[A+B] = P(A) x P(B)
e.g. P[rainy + Sunday] = P(rainy) x 1/7 (independent)
BUT: P[rainy + December] ≠ P(rainy) x 1/12 (not independent)
P[Ee large + Eν large] ≠ P(Ee large) x P(Eν large) (not independent)
P[A+B] = P(A|B) x P(B) = P(B|A) x P(A), so P(A|B) = P(B|A) x P(A) / P(B) ***** Bayes’ Theorem
N.B. Usually P(A|B) ≠ P(B|A). Examples later.
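
The counting argument on this slide can be checked with a minimal sketch. The counts below are made up purely for illustration (they are not from the talk); the snippet confirms that both factorisations of the joint probability agree and that P(A|B) generally differs from P(B|A).

```python
from fractions import Fraction

# Hypothetical counts, chosen only for illustration.
n_tot = 1000      # all events
n_a = 200         # events where A occurs
n_b = 50          # events where B occurs
n_ab = 40         # events where both A and B occur

p_a = Fraction(n_a, n_tot)
p_b = Fraction(n_b, n_tot)
p_ab = Fraction(n_ab, n_tot)

p_a_given_b = Fraction(n_ab, n_b)   # N(A and B) / N(B)
p_b_given_a = Fraction(n_ab, n_a)   # N(A and B) / N(A)

# Both factorisations give the same joint probability.
assert p_a_given_b * p_b == p_ab == p_b_given_a * p_a

# Bayes' theorem recovers P(A|B) from P(B|A).
bayes_a_given_b = p_b_given_a * p_a / p_b

print(float(p_a_given_b), float(p_b_given_a))
```

Exact fractions are used so that the equalities hold without floating-point tolerance.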

Bayesian versus Classical
P(A and B) = P(A;B) x P(B) = P(B;A) x P(A)
{If A and B are independent, P(A;B) = P(A), so P(A and B) = P(A) x P(B)}
e.g. A = event contains t quark, B = event contains W boson
or A = I am in Islamabad, B = I am giving a lecture
P(A;B) = P(B;A) x P(A) / P(B)
Completely uncontroversial, provided….

Bayesian
Bayes’ Theorem: p(param | data) ∝ p(data | param) * p(param)
(posterior ∝ likelihood * prior)
Problems:
p(param): has particular value; “Degree of belief”
Prior: what functional form?
Coverage

p(parameter): has specific value; “Degree of Belief”
Credible interval
Prior: What functional form?
Uninformative prior: flat? In which variable? e.g. m, m², ln m, ...?
Even more problematic with more params
Unimportant if “data overshadows prior”
Important for limits
Subjective or Objective prior?

Data overshadows prior: mass of Z boson (from LEP)

Prior: even more important for UPPER LIMITS

Mass-squared of neutrino Prior = zero in unphysical region

Bayesian posterior → intervals
(Figure: posterior probability versus parameter value, with panels for an upper limit, a lower limit, a central interval and the shortest interval.)

Upper Limits from Poisson data
Expect b = 3.0, observe n events (Ilya Narsky, FNAL CLW 2000)
Upper Limits important for excluding models
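
As one illustration of how such a limit can be obtained, the sketch below computes a Bayesian 90% upper limit on a signal s, assuming a flat prior for s ≥ 0 and a precisely known background b. This is only one of several possible methods and is an assumption here, not necessarily the one shown on the slide; the function names are hypothetical.

```python
import math

def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total

def bayes_upper_limit(n, b, cl=0.90):
    """Upper limit on signal s for a flat prior on s >= 0 (an assumption)."""
    # Posterior tail: P(s > s_up | n) = cdf(n; s_up + b) / cdf(n; b)
    def tail(s):
        return poisson_cdf(n, s + b) / poisson_cdf(n, b)
    lo, hi = 0.0, 50.0 + 10.0 * n
    for _ in range(100):          # bisection; tail(s) decreases with s
        mid = 0.5 * (lo + hi)
        if tail(mid) > 1.0 - cl:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in range(0, 7):
    print(n, round(bayes_upper_limit(n, b=3.0), 2))
```

With this prior the limit for n = 0 is ln 10 whatever the background, which is one of the properties that distinguishes it from frequentist constructions.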

P (Data;Theory) ≠ P (Theory;Data)
HIGGS SEARCH at CERN
Is data consistent with Standard Model? or with Standard Model + Higgs?
End of Sept 2000: Data not very consistent with S.M.
Prob (Data ; S.M.) < 1%: valid frequentist statement
Turned by the press into: Prob (S.M. ; Data) < 1% and therefore Prob (Higgs ; Data) > 99%
i.e. “It is almost certain that the Higgs has been seen”
THIS IS WRONG!

P (Data;Theory) ≠ P (Theory;Data)
Theory = male or female
Data = pregnant or not pregnant
P (pregnant ; female) ~ 3%
but P (female ; pregnant) >>> 3%

Example 1: Is coin fair?
Toss coin: 5 consecutive tails
What is P(unbiased; data)? i.e. p = ½
Depends on Prior(p)
If village priest: prior ~ δ(p = 1/2)
If stranger in pub: prior ~ 1 for 0 < p < 1
(also needs cost function)
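
A rough sketch of how the answer depends on the prior, taking p to be the probability of tails (an assumption about the parametrisation). For a flat prior the posterior after 5 tails is proportional to p^5. Because any continuous prior gives zero probability to p = 1/2 exactly, the code instead reports the probability that p lies in a hypothetical "nearly fair" window from 0.45 to 0.55; that window is an illustrative choice, not part of the talk.

```python
def posterior_prob_in_window(n_tails, lo, hi):
    """Flat prior on p in (0, 1); data = n_tails tails in n_tails tosses.

    Posterior density is (n+1) * p**n, so its integral from lo to hi
    is hi**(n+1) - lo**(n+1).
    """
    k = n_tails + 1
    return hi**k - lo**k

prior_prob = 0.55 - 0.45                       # flat prior mass in the window
post_prob = posterior_prob_in_window(5, 0.45, 0.55)

print(f"stranger in pub: prior {prior_prob:.3f} -> posterior {post_prob:.3f}")
# A delta-function prior at p = 1/2 (the village priest) is unchanged by any data.
print("village priest: prior 1.000 -> posterior 1.000")
```

The same data therefore lowers the belief in a fair coin sharply for the flat prior and not at all for the delta-function prior.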

Example 2: Particle Identification
Try to separate π’s and protons
probability (p tag; real p) = 0.95
probability (π tag; real p) = 0.05
probability (p tag; real π) = 0.10
probability (π tag; real π) = 0.90
Particle gives proton tag. What is it?
Depends on prior = fraction of protons
If proton beam, very likely
If general secondary particles, more even
If pure π beam, ~ 0
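
The dependence on the prior can be made concrete with Bayes’ theorem. In the sketch below the tagging probabilities are the ones quoted on the slide, while the three proton fractions are illustrative assumptions, and the beam is assumed to contain only protons and pions.

```python
def prob_proton_given_ptag(proton_fraction):
    """P(real p | p tag) via Bayes' theorem, using the tagging rates above."""
    p_tag_given_p = 0.95     # P(p tag; real p)
    p_tag_given_pi = 0.10    # P(p tag; real pi)
    num = p_tag_given_p * proton_fraction
    den = num + p_tag_given_pi * (1.0 - proton_fraction)
    return num / den

# Prior proton fractions below are illustrative assumptions.
for label, frac in [("proton beam", 0.99), ("secondaries", 0.10), ("pion beam", 0.0001)]:
    print(f"{label:12s} prior={frac:<7} P(p | p tag)={prob_proton_given_ptag(frac):.3f}")
```

The same proton tag corresponds to a near-certain proton, a roughly even chance, or an almost certain pion, depending only on the prior fraction.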

Peasant and Dog
Dog d has 50% probability of being within 100 m of Peasant p.
Is it true that Peasant p has 50% probability of being within 100 m of Dog d?
(Diagram: Peasant p and Dog d at positions along x, with rivers at x = 0 and x = 1 km.)

Given that: a) Dog d has 50% probability of being within 100 m of Peasant p, is it true that: b) Peasant p has 50% probability of being within 100 m of Dog d?
Additional information:
Rivers at zero & 1 km. Peasant cannot cross them.
Dog can swim across river. Statement a) still true.
If dog at −101 m, Peasant cannot be within 100 m of dog. Statement b) untrue.

Classical Approach
Neyman “confidence interval” avoids pdf for μ
Uses only P(x; μ)
Confidence interval μ_l to μ_u: P(μ_l to μ_u contains μ_true) = α, true for any μ_true
Varying intervals from ensemble of experiments; μ_true fixed
Gives range of μ for which observed value x was “likely” (α)
Contrast Bayes: Degree of belief = α that μ_true is in μ_l to μ_u

Classical (Neyman) Confidence Intervals
Uses only P(data|theory)
μ ≥ 0
No prior for μ

90% Classical interval for Gaussian, σ = 1, μ ≥ 0
e.g. m²(νe), length of small object
Other methods have different behaviour at negative x

μ_l ≤ μ ≤ μ_u at 90% confidence
Frequentist: μ_l and μ_u known, but random; μ unknown, but fixed. Probability statement about μ_l and μ_u.
Bayesian: μ_l and μ_u known, and fixed; μ unknown, and random. Probability/credible statement about μ.
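
The frequentist reading (fixed parameter, random interval ends) can be illustrated with a small simulation. This is a sketch under simple assumptions: a single Gaussian measurement with known σ, no physical boundary on μ, and a 90% central interval x ± 1.645σ; the function name and the chosen true values are hypothetical.

```python
import random

def coverage(mu_true, sigma=1.0, n_experiments=20000, seed=1):
    """Fraction of experiments whose 90% central interval contains mu_true."""
    z90 = 1.6449  # two-sided 90% point of the unit Gaussian
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        x = rng.gauss(mu_true, sigma)          # one measurement
        mu_l, mu_u = x - z90 * sigma, x + z90 * sigma
        if mu_l <= mu_true <= mu_u:            # the interval is random, mu is fixed
            hits += 1
    return hits / n_experiments

for mu in (0.0, 2.5, 10.0):
    print(mu, coverage(mu))
```

The fraction of intervals containing the true value should come out close to 0.90 for every choice of the true value, which is what coverage means.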

Frequentism: Specific example
Particle decays exponentially: dn/dt = (1/τ) exp(-t/τ)
Observe 1 decay at time t1: L(τ) = (1/τ) exp(-t1/τ)
Construct 68% central interval: for given τ, t lies between t = 0.17τ and t = 1.8τ
(Figure: dn/dt versus t, with the central 68% region and the observed t1 marked.)
68% conf. int. for τ from t1/1.8 → t1/0.17
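
The factors 0.17 and 1.8 follow from the exponential distribution, and the short sketch below reproduces them. It assumes a central interval with equal tail probabilities of 16% on each side; the function name is hypothetical.

```python
import math

def central_interval_for_tau(t1, cl=0.68):
    """Neyman central interval for tau from one observed decay time t1."""
    tail = (1.0 - cl) / 2.0
    # For fixed tau, P(t < k * tau) = 1 - exp(-k); solve for the two k values.
    k_lo = -math.log(1.0 - tail)   # lower edge of the accepted range of t / tau
    k_hi = -math.log(tail)         # upper edge of the accepted range of t / tau
    # t1 lies in [k_lo * tau, k_hi * tau]  <=>  tau lies in [t1 / k_hi, t1 / k_lo]
    return k_lo, k_hi, t1 / k_hi, t1 / k_lo

k_lo, k_hi, tau_min, tau_max = central_interval_for_tau(t1=1.0)
print(f"t range: {k_lo:.2f} tau to {k_hi:.2f} tau")
print(f"tau range: t1/{k_hi:.2f} to t1/{k_lo:.2f}")
```

Inverting the accepted range of t for each τ gives the range of τ for which the observed t1 was “likely”, which is the Neyman construction in one dimension.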

Bayesian versus Frequentism

Aspect | Bayesian | Frequentist
Basis of method | Bayes’ Theorem → Posterior probability distribution | Uses pdf for data, for fixed parameters
Meaning of probability | Degree of belief | Frequentist definition
Prob of parameters? | Yes | Anathema
Needs prior? | Yes | No
Choice of interval? | Yes | Yes (except F+C)
Data considered | Only data you have | ….+ other possible data
Likelihood principle? | Yes | No

Bayesian versus Frequentism

Aspect | Bayesian | Frequentist
Ensemble of experiment | No | Yes (but often not explicit)
Final statement | Posterior probability distribution | Parameter values → Data is likely
Unphysical/empty ranges | Excluded by prior | Can occur
Systematics | Integrate over prior | Extend dimensionality of frequentist construction
Coverage | Unimportant | Built-in
Decision making | Yes (uses cost function) | Not useful

Bayesianism versus Frequentism “Bayesians address the question everyone is interested in, by using assumptions no-one believes” “Frequentists use impeccable logic to deal with an issue of no interest to anyone”

Approach used at LHC
Recommended to use both Frequentist and Bayesian approaches
If agree, that’s good
If disagree, see whether it is just because of different approaches

Final Probability Story
(From David Hand, ‘The Improbability Principle: Why coincidences, miracles and rare events happen all the time’)
Someone enters 2 lotteries each week, choosing different sets of numbers for each.
One week she has the two sets of winning numbers, BUT she has them the wrong way round.
Probability of a ticket winning one lottery ~ 10^-7
Probability of 2 tickets winning 2 lotteries in same week ~ 10^-14
But after considering:
enormous number of lottery tickets bought
number of weeks lottery operates
number of lotteries around world
probability of someone having 2 winning tickets somewhere in world during last 30 years is very much larger.

Particle Physics Resources (Astrophysics/Cosmology has its own too)
Books by Particle Physicists: Barlow, Behnke, Cowan, James, Lista, Lyons, Roe, ...
PDG: Sections on Probability, Statistics and Monte Carlo simulation.
PHYSTAT meetings: CERN and FNAL 2000 for U.L.; PhyStat-nu, Japan and FNAL 2016; PhyStat-DM in 2018?
Statistics Committees: Collider expts: BaBar, CDF, ATLAS, CMS
RooStats: e.g. Lyons + Moneta at CERN (2016) and at IPMU (2017). “Too easy to use”

Conclusions
Do your homework: Before re-inventing the wheel, try to see if Statisticians have already found a solution to your statistics analysis problem.
Don’t use your own square wheel if a circular one already exists.
Try to achieve consensus.
Good luck.