Frequentistic approaches in physics: Fisher, Neyman-Pearson and beyond. Alessandro Palma, Dottorato in Fisica XXII ciclo, Corso di Probabilità e Incertezza di Misura.


Frequentistic approaches in physics: Fisher, Neyman-Pearson and beyond Alessandro Palma Dottorato in Fisica XXII ciclo Corso di Probabilità e Incertezza di Misura Prof. D’Agostini

2 Outline
● Fisher's statistics and p-values
● Neyman-Pearson's statistics
● Fisher vs NP: differences and criticisms
● Common misunderstandings
● "Taking the best from both": trying to merge the two approaches

3 Fisher's statistics (1)
● We have a single hypothesis H0
● Consider a test statistic T distributed with a known pdf f(T|H0) under H0
● Compute T on the data → get the observed value Tobs
● Define the p-value as Prob(T ≥ Tobs | H0)
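The recipe above can be sketched numerically with toy experiments. The Gaussian model, sample size and observed value below are illustrative assumptions, not taken from the slides:

```python
import random

random.seed(42)

def p_value(t_obs, simulate_t_under_h0, n_toys=100_000):
    """Monte Carlo p-value: the fraction of toy experiments generated
    under H0 whose test statistic is at least as extreme as the
    observed one, i.e. an estimate of Prob(T >= Tobs | H0)."""
    exceed = sum(1 for _ in range(n_toys) if simulate_t_under_h0() >= t_obs)
    return exceed / n_toys

def t_under_h0(n=10):
    # Toy model (an assumption for illustration): T is the mean of
    # 10 unit-Gaussian measurements, and H0 says the true mean is 0.
    return sum(random.gauss(0.0, 1.0) for _ in range(n)) / n

t_obs = 0.8  # hypothetical observed value of the test statistic
p = p_value(t_obs, t_under_h0)
print(f"p-value = {p:.4f}")
```

For this toy, T is Gaussian with standard deviation 1/√10 ≈ 0.316 under H0, so the Monte Carlo estimate should land near the analytic tail probability of about 0.006.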

4 Fisher's statistics (2)
● According to Fisher, the p-value is evidence for/against a theory
● if it is very small, "either something very rare has happened, or H0 is false"
● What typically happens?
– one gets a p-value of 1% on the data
– the theory is rejected, saying that "the probability that H0 is true is 1%"

5 Neyman-Pearson's statistics (1)
● We deal with (at least) TWO different hypotheses, i.e. H0 and H1
● The test statistic T has different pdfs f(T|H0) and f(T|H1)
● Two tail integrals are defined, α and β
[figure: the two pdfs f(T|H0) and f(T|H1), with the tail areas α and β]

6 Neyman-Pearson's statistics (2)
● α is the probability of rejecting H0 when it is true (the type-I error rate)
● the value of α is chosen a priori, e.g. α = 5%
● a value Tcut is chosen such that Prob(T ≥ Tcut | H0) = α
● if T(data) > Tcut, one says that H0 is rejected at the 1 − α (i.e. 95%) confidence level
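The NP recipe can be sketched as follows. The Gaussian toy model, the value of α and the data value are illustrative assumptions; Tcut is estimated as a Monte Carlo quantile of the null distribution:

```python
import random

random.seed(1)

ALPHA = 0.05  # chosen a priori, as the Neyman-Pearson recipe requires

def t_under_h0(n=10):
    # Illustrative toy model (an assumption): T is the mean of
    # 10 unit-Gaussian measurements, and H0 says the true mean is 0.
    return sum(random.gauss(0.0, 1.0) for _ in range(n)) / n

# Estimate Tcut as the (1 - alpha) quantile of T's distribution under H0,
# so that Prob(T >= Tcut | H0) = alpha.
toys = sorted(t_under_h0() for _ in range(100_000))
t_cut = toys[int((1 - ALPHA) * len(toys))]

def np_test(t_data):
    """Reject H0 at the 1 - alpha confidence level iff T(data) > Tcut."""
    return t_data > t_cut

print(f"Tcut = {t_cut:.3f}")
print("reject H0 for T(data) = 0.8:", np_test(0.8))
```

Note that the outcome is binary: whether T(data) lands just past Tcut or far beyond it, the claim is the same rejection at level α, which is exactly the non-evidential behaviour criticized later in the slides.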

7 Differences between Fisher and NP
● Number of hypotheses
– Fisher deals with only 1 hypothesis
– NP needs at least 2
● Evidence against a theory:
– for Fisher, a very small p-value seems to reject much more strongly than one just below the threshold
– in NP, the exact p-value does NOT matter; it only matters that p < α, and rejection is always claimed at the α level

8 Some criticisms of Fisher (1)
● The p-value is Prob(T ≥ Tobs | H0): we are taking into account results which are very rare under H0, so...
– "a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred" (Jeffreys, 1961)
– the p-value is strongly biased AGAINST the theory we are testing

9 Some criticisms of Fisher (2)
● A p-value of 1% does NOT state that the probability of the hypothesis H0 is 1%!
– Probability of results ≠ probability of hypotheses (Bayes' theorem)
– It can be seen using simulations that a p-value of 5% can result from data where H0 is true in 20–50% of cases!
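A simulation in this spirit can be sketched as follows. The 50/50 prior between the hypotheses, the H1 effect size and the single-Gaussian-measurement model are all assumptions chosen for illustration:

```python
import math
import random

random.seed(7)

def one_sided_p(x):
    # One-sided p-value for observing x or larger when x ~ N(0, 1) under H0.
    return 0.5 * math.erfc(x / math.sqrt(2))

n_exp = 200_000
near_005 = 0   # experiments whose p-value came out close to 5%
from_h0 = 0    # ... of which H0 was actually true
for _ in range(n_exp):
    h0_true = random.random() < 0.5       # 50/50 prior: an assumption
    mu = 0.0 if h0_true else 1.0          # hypothetical H1 effect size
    x = random.gauss(mu, 1.0)
    if 0.04 <= one_sided_p(x) <= 0.06:    # "p-value close to 5%"
        near_005 += 1
        from_h0 += h0_true

frac = from_h0 / near_005
print(f"fraction of H0-true experiments among those with p ~ 5%: {frac:.2f}")
```

With these assumed settings the fraction comes out around one quarter, well inside the 20–50% range quoted on the slide: a p-value near 5% is far weaker evidence against H0 than the number 0.05 suggests.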

10 Fisher in frontier physics: the LEP-SM pulls
● "Would Fisher accept the SM?"
● Fisher would not be satisfied: low p-values from tensions in the EW data (the pull is > 1σ (2σ) in 7 (1) cases)
● the SM is nonetheless "the" below-TeV theory of particle physics

11 Fisher in frontier physics: the cosmological constant
● All data point in the same (Λ > 0, Ω_Λ ≈ 0.7) direction
● Fisher is satisfied, but…
● …there is no physical idea of WHAT Λ is (i.e. it is not explained as QFT vacuum energy)
● DOUBT: is the cosmological constant a sensible issue? [PDG2006]

12 Fisher in frontier physics: Dark Matter search
● A 6.3σ signal: Fisher (and the HEP community) would be greatly happy
● There is strong evidence for Dark Matter in cosmology: gravitation, BBN
● WHY "EVIDENCE"?! WHY NOT A WIDELY ACCLAIMED DISCOVERY?!
● Dark Matter particles not discovered yet → what about the Higgs mechanism?
● Need confirmation by other experiments? → what about W, Z in 1983?
"The χ2 test on the (2–6) keV residual rate […] disfavours the hypothesis of unmodulated behaviour giving a probability of 7·10⁻⁴ (χ2/d.o.f. = 71/37)." [Riv. N. Cim. 26 n.1 (2003) 1-73]
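The two numbers quoted here live on the same scale: a p-value can be translated into a one-sided Gaussian significance ("number of σ") and vice versa, using the standard HEP convention. A minimal sketch with the Python standard library:

```python
from statistics import NormalDist

nd = NormalDist()  # standard Gaussian

def p_to_sigma(p):
    """One-sided p-value -> Gaussian significance in units of sigma."""
    return nd.inv_cdf(1.0 - p)

def sigma_to_p(z):
    """Gaussian significance -> one-sided p-value."""
    return 1.0 - nd.cdf(z)

# The chi^2 p-value of 7e-4 quoted above corresponds to about 3.2 sigma:
print(f"p = 7e-4  ->  {p_to_sigma(7e-4):.2f} sigma")
# while a 6.3 sigma signal corresponds to a tiny p-value:
print(f"6.3 sigma ->  p = {sigma_to_p(6.3):.2e}")
```

This makes the slide's tension concrete: by Fisher's own yardstick both numbers are far past any conventional α, yet the community treats them very differently.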

13 Fisher in frontier physics: conclusions
● SM: the HEP community hugely trusts the SM at the below-TeV scale, even with small p-values in precision data
● Cosmological constant: physicists quite believe in Λ, even if there is NO idea about its physical meaning
● Dark Matter: physicists do NOT trust the DAMA data, even if the p-value strongly points towards a DM discovery
…in frontier physics, Fisher's p-values and community beliefs seem NOT to be correlated!

14 Some criticisms of Neyman-Pearson
● NP statistics is non-evidential:
– the level at which H0 is excluded does NOT depend on how rare the observed data are according to H0

15 Misunderstanding: p is NOT α!
● A frequent mistake in statistics textbooks
● What is usually done is the following "mix" of Fisher's and NP's approaches:
– build the NP distributions given H0 and H1, with α fixed (e.g. α = 5%)
– compute the p-value on the data
– if p ≤ α, reject H0, quoting the p-value itself as the confidence level

16 Take the best from NP and Fisher...
● Fisher's statistics has a number of shortcomings
– it is a tail integral; it gives no info on H0
● NP's statistics has little sensitivity to the data
– the rejection level is decided a priori
● Can we combine both and get a sensible frequentistic approach?
– evidential (based on the data)
– giving info on the "probability of hypotheses"

17 Proposal for unifying Fisher and NP (Jeffreys)
1. Take the case of 2 hypotheses H0 and H1 (NP)
2. Use the ratio of likelihoods, NOT p-values ("debugged" Fisher), to address the evidence from the data: B = P(data|H0) / P(data|H1)
3. Reject H0 if B ≤ 1, accept otherwise
4. Claim the following "objective" probabilities of the hypotheses: P(H0|data) = B/(1+B), P(H1|data) = 1/(1+B)
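Jeffreys' recipe can be sketched for the simplest possible case. The two simple Gaussian hypotheses (means 0 and 1, unit width) and the data values are illustrative assumptions; the slide's H0 and H1 are generic:

```python
import math

def gaussian_likelihood(x, mu, sigma=1.0):
    """Gaussian pdf evaluated at x, for one measurement."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def bayes_factor(data, mu0=0.0, mu1=1.0):
    """B = P(data|H0) / P(data|H1) for two simple Gaussian hypotheses
    (an illustrative choice of H0: mu=0 and H1: mu=1)."""
    l0 = math.prod(gaussian_likelihood(x, mu0) for x in data)
    l1 = math.prod(gaussian_likelihood(x, mu1) for x in data)
    return l0 / l1

def jeffreys_probabilities(b):
    """The slide's 'objective' probabilities: P(H0|data) = B/(1+B),
    P(H1|data) = 1/(1+B), i.e. uniform priors of 1/2 on each hypothesis."""
    return b / (1 + b), 1 / (1 + b)

data = [0.9, 1.3, 0.7, 1.1]  # hypothetical measurements
b = bayes_factor(data)
p0, p1 = jeffreys_probabilities(b)
print(f"B = {b:.3f}, P(H0|data) = {p0:.3f}, P(H1|data) = {p1:.3f}")
```

For these hypothetical data B < 1, so step 3 would reject H0; but, as the next slide argues, reporting P(H0|data) and P(H1|data) is more informative than the binary accept/reject statement.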

18 Observations on Jeffreys' approach
● It is exactly a Bayesian approach with uniform priors f0(H0) = f0(H1) = 1/2
● it is IN PRINCIPLE Bayesian, since it talks of probabilities of hypotheses
● "Objective" refers to the (arbitrary) choice of a uniform prior. But it is not sensible to force the scientist into that choice!
● "Reject H0 if B ≤ 1" is a pointless statement; it suffices to report the probabilities of the hypotheses

19 Conclusions & (my) open issues
● Both Fisher's and NP's approaches are regarded as unsatisfactory within the frequentist community
● Jeffreys' solution ends up in a Bayesian-like statement
● The physics community has beliefs which hugely contradict Fisher's approach (SM, cosmological constant, DAMA)
● Frequentist positions seem weak…
● General issue: do we really need to test at least 2 hypotheses?

20 References
● J. O. Berger, "Could Fisher, Jeffreys and Neyman have agreed on testing?", Statistical Science, vol. 18, n. 1, 1-32 (2003)
● R. Hubbard, M. J. Bayarri, "Confusion over measures of evidence (p's) versus errors (α's) in classical statistical testing", The American Statistician, vol. 57, n. 3 (2003)
● H. Jeffreys, "Fisher and inverse probability", International Statistical Review, vol. 42, 1-3 (1974)
● [PDG2006]: W.-M. Yao et al., J. Phys. G 33, 1 (2006)
● R. Bernabei et al., "Dark Matter search", Riv. N. Cim. 26, n. 1, 1-73 (2003)