Credibility of confidence intervals Dean Karlen / Carleton University Advanced Statistical Techniques in Particle Physics Durham, March 2002.

Presentation transcript:

Credibility of confidence intervals Dean Karlen / Carleton University Advanced Statistical Techniques in Particle Physics Durham, March 2002

Slide 2: Classical confidence intervals
Classical confidence intervals are well defined, following Neyman’s construction. [figure]

Slide 3: Classical confidence intervals
Select a portion of the pdfs (with content equal to the confidence level), for example the 68% central region. [figure]

Slide 5: Classical confidence intervals
This gives the following confidence belt. [figure]

Slide 6: Classical confidence intervals
The (frequentist) probability for the random interval to contain the true parameter is the confidence level. [figure label: confidence interval]
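The frequentist statement above can be checked with a small simulation. The following is a minimal sketch, assuming a Gaussian measurement with known resolution and no physical boundary; the function names, the seed and the trial count are illustrative choices, not taken from the slides. It repeats the experiment many times at a fixed true value and counts how often the 68% central interval contains it.

```python
import random


def central_interval(x, sigma=1.0):
    """68% central interval for a Gaussian measurement x with known sigma."""
    return (x - sigma, x + sigma)


def coverage(mu_true, sigma=1.0, n_trials=20000, seed=1):
    """Fraction of repeated experiments whose interval contains mu_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        x = rng.gauss(mu_true, sigma)  # one simulated measurement
        lo, hi = central_interval(x, sigma)
        if lo <= mu_true <= hi:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # The coverage should be close to 0.68 whatever the true value is.
    for mu in (0.0, 2.5, 10.0):
        print(mu, round(coverage(mu), 3))
```

The relative frequency is a property of the procedure over the ensemble of experiments; it says nothing by itself about how much to believe any single interval.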

Slide 7: Problems with confidence intervals
Misinterpretation is common, by the general public and scientists alike…
- Incorrect: the confidence level states a “degree of belief” that the true value of the parameter is within the stated interval
- Correct: the confidence level states the “relative frequency” with which the random interval contains the true parameter value
The popular press gets it wrong more often than not: “The probability that the Standard Model can explain the data is less than 1%.”

Slide 8: Problems with confidence intervals
People are justifiably concerned and confused when confidence intervals
- are empty; or
- reduce in size when the background estimate increases (especially when n = 0); or
- turn out to be smaller for the poorer of two experiments; or
- exclude parameters for which an experiment is insensitive
These are the “confidence interval pathologies”.
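The first two pathologies can be reproduced in a few lines. This is a minimal sketch, assuming the simple classical construction for a counting experiment in which the 90% upper limit on the signal is the limit on the total Poisson mean minus the expected background b; the function name is hypothetical. With n = 0 observed, the condition exp(-(s + b)) = 0.10 gives s_up = ln(10) - b.

```python
import math


def upper_limit_n0(b, cl=0.90):
    """Classical upper limit on the signal when zero events are observed.

    Solves exp(-(s + b)) = 1 - cl for s. Returns None when the solution
    is negative, i.e. when the confidence interval is empty.
    """
    s_up = -math.log(1.0 - cl) - b
    return s_up if s_up >= 0.0 else None


if __name__ == "__main__":
    # The limit tightens as the background estimate grows, then vanishes.
    for b in (0.0, 0.5, 1.0, 2.0, 3.0):
        print(b, upper_limit_n0(b))
```

An experiment with a larger expected background therefore quotes the tighter limit, and beyond b = ln(10) the interval is empty.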

Slide 9: Source of confusion
The two definitions of probability in common use go by the same name:
- relative frequency: probability
- degree of belief: probability
Both definitions have merit. The situation would be clearer if there were different names for the two concepts, but a proposal to introduce new names is way too radical. Instead, treat this as an education problem: make it better known that two definitions exist.

Slide 10: A recent published example…
4 events selected, background estimate is 0.34 ± 0.05. [figure annotations: frequency; degree of belief]

Slide 11: And an unpublished one… [figure]

Slide 12: Problems with confidence intervals
Even those who understand the distinction find the “confidence interval pathologies” unsettling. Much effort has been devoted to defining approaches that reduce the frequency of their occurrence. These cases are unsettling for the same reason: the degree of belief that these particular intervals contain the true value of the parameter is significantly less than the confidence level. Furthermore, there is no standard method for quantifying the pathology.

Slide 13: Problems with confidence intervals
The confidence interval alone is not enough to
- define an interval with stated coverage; and
- express a degree of belief that the parameter is contained in the interval
Feldman and Cousins recommend that experiments provide a second quantity: sensitivity
- defined as the average limit for the experiment
- the consumer’s degree of belief would be reduced if the observed limit is far superior to the average limit

Slide 14: Problems with sensitivity
Sensitivity is not enough: more information is needed to compare it with the observed limit.
- The variance of the limit from an ensemble of experiments?
- Use (sensitivity − observed limit)/σ?
This is not a good indicator that the interval is “pathological”.

Slide 15: Problems with sensitivity
Example: m(ν_τ) analysis
- τ → 3-prong events contribute with different weight depending on:
  - the mass resolution for the event
  - the nearness of the event to the m(ν_τ) = 0 boundary
- ALEPH observes one clean event very near the boundary, so the limit is much better than average
Any reason to reduce the degree of belief that the true mass is in the stated interval? NO!

Slide 16: Proposal
When quoting a confidence interval for a frontier experiment, also quote its credibility.
- Evaluate the degree of belief that the true parameter is contained in the stated interval
- Use Bayes’ theorem with a reasonable prior (recommended: flat in the physically allowed region)
- Call this the “credibility”
- Report the credibility (and the prior) in the journal paper
If the credibility is much less than the confidence level, the consumer would be warned that the interval may be “pathological”.

Slide 17: Example: Gaussian with boundary
- x is an unbiased estimator for the parameter
- the parameter physically cannot be negative
Experiment A, Experiment B: [definitions not preserved]
Assume: [equation not preserved]

Slide 18: Example: 90% C.L. upper limit
Standard confidence belts. [figure panels: A, B]

Slide 19: Example: 90% C.L. upper limit
Consider 3 measurements. [figure panels: A, B]

exp.   x      interval
A1     0.5    [0, 1.78]
A2     -2.0   empty
B      -3.0   [0, 0.85]

Slide 20: Example: 90% C.L. upper limit
Calculate the credibility of the intervals:
- prior: [equation not preserved]
- Bayes’ theorem: [equation not preserved]
- Credibility: [equation not preserved]
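The three equations on this slide did not survive in the transcript. A hedged reconstruction in standard notation is given below, following the proposal's recommendation of a prior that is flat in the physically allowed region. The symbols μ (parameter), σ (resolution), L (likelihood) and Φ (standard normal cumulative distribution) are assumptions and need not match the author's notation. The closed form in the last line holds only for a Gaussian likelihood and an interval with μ₁ ≥ 0.

```latex
% Assumed notation: \mu = parameter, \sigma = resolution, \Phi = standard normal CDF.
\begin{align}
\pi(\mu) &\propto
  \begin{cases} 1, & \mu \ge 0 \\ 0, & \mu < 0 \end{cases} \\
p(\mu \mid x) &=
  \frac{L(x \mid \mu)\,\pi(\mu)}{\int L(x \mid \mu')\,\pi(\mu')\,d\mu'} \\
\text{credibility of } [\mu_1, \mu_2] &=
  \int_{\mu_1}^{\mu_2} p(\mu \mid x)\,d\mu
  = \frac{\Phi\!\left(\frac{\mu_2 - x}{\sigma}\right)
        - \Phi\!\left(\frac{\mu_1 - x}{\sigma}\right)}
         {\Phi\!\left(\frac{x}{\sigma}\right)}
\end{align}
```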

Slide 21: Example: 90% C.L. upper limit

exp.   x      interval     credibility
A1     0.5    [0, 1.78]    0.86
A2     -2.0   empty        0
B      -3.0   [0, 0.85]    0.37

[figure panels: A1, A2, B]

Slide 22: Example: 90% C.L. unified interval

exp.   x      unified interval   credibility
A1     0.5    [0, 2.14]          0.93
A2     -2.0   [0, 0.40]          0.64
B      -3.0   [0, 3.30]          0.89

[figure panels: A1, A2, B]
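The credibilities in the two tables above can be reproduced to within rounding with a short calculation. This is a minimal sketch under stated assumptions: a Gaussian likelihood, a prior flat for non-negative parameter values, and resolutions of 1 for experiment A and 3 for experiment B. Those two resolutions are inferred from the tabulated intervals, since the slide defining the experiments is not preserved. The function names are hypothetical, and the unified interval endpoints are taken from the table rather than constructed.

```python
import math

SIGMA_A = 1.0  # assumed: inferred from the tabulated intervals
SIGMA_B = 3.0  # assumed: inferred from the tabulated intervals
Z_90 = 1.2816  # one-sided 90% quantile of the standard normal


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def upper_limit(x, sigma):
    """Standard 90% C.L. upper-limit interval; None if it is empty."""
    up = x + Z_90 * sigma
    return (0.0, up) if up > 0.0 else None


def gaussian_credibility(interval, x, sigma):
    """Posterior probability of the interval for a prior flat on mu >= 0."""
    if interval is None:
        return 0.0  # an empty interval contains no posterior probability
    lo, hi = interval
    numerator = phi((hi - x) / sigma) - phi((lo - x) / sigma)
    normalization = 1.0 - phi((0.0 - x) / sigma)  # posterior mass on mu >= 0
    return numerator / normalization


if __name__ == "__main__":
    for name, x, sigma in (("A1", 0.5, SIGMA_A), ("A2", -2.0, SIGMA_A),
                           ("B", -3.0, SIGMA_B)):
        interval = upper_limit(x, sigma)
        print(name, interval, round(gaussian_credibility(interval, x, sigma), 2))
```

The same `gaussian_credibility` function applies to the unified intervals: pass the tabulated endpoints, for example `gaussian_credibility((0.0, 0.40), -2.0, SIGMA_A)` for A2.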

Slide 23: Example: Counting experiment
Observe n events, mean background b.
Likelihood: [equation not preserved]
prior: [equation not preserved]

Example: b = 3

n    90% upper limit   cred.   90% unified      cred.
0    empty             0.00    [0, 1.08]        0.66
1    [0, 0.89]         0.50    [0, 1.88]        0.78
3    [0, 3.68]         0.85    [0, 4.42]        0.90
6    [0, 7.53]         0.90    [0.15, 8.47]     [not preserved]
10   [0, 12.41]        0.90    [2.63, 13.50]    0.91
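The upper-limit columns of the table above can be reproduced with the sketch below. It assumes a Poisson likelihood for n with mean s + b and a prior flat in the signal s for s ≥ 0, since the slide's own formulas are not preserved in the transcript. The function names are hypothetical. The posterior integral uses the identity that the integral of e^(-u) u^n / n! from t to infinity equals the Poisson probability of observing at most n events for mean t.

```python
import math


def poisson_cdf(n, mean):
    """P(N <= n) for a Poisson distribution with the given mean."""
    term = math.exp(-mean)
    total = term
    for k in range(1, n + 1):
        term *= mean / k
        total += term
    return total


def classical_upper_limit(n, b, cl=0.90):
    """Classical upper-limit interval on the signal; None if it is empty.

    Solves P(N <= n | s + b) = 1 - cl for the total mean by bisection,
    then subtracts the expected background b.
    """
    target = 1.0 - cl
    lo, hi = 0.0, 100.0  # bracket on the total mean; ample for small n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n, mid) > target:
            lo = mid
        else:
            hi = mid
    s_up = 0.5 * (lo + hi) - b
    return (0.0, s_up) if s_up > 0.0 else None


def poisson_credibility(interval, n, b):
    """Posterior probability of the signal interval, flat prior on s >= 0."""
    if interval is None:
        return 0.0
    s_lo, s_hi = interval
    normalization = poisson_cdf(n, b)  # posterior mass on s >= 0
    return (poisson_cdf(n, s_lo + b) - poisson_cdf(n, s_hi + b)) / normalization


if __name__ == "__main__":
    b = 3.0
    for n in (0, 1, 3, 6, 10):
        interval = classical_upper_limit(n, b)
        print(n, interval, round(poisson_credibility(interval, n, b), 2))
```

Passing a tabulated unified interval to `poisson_credibility`, for example `poisson_credibility((0.0, 1.08), 0, 3.0)`, gives the corresponding entry in the last column.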

Slide 24: Key benefit of the proposal
Without the proposal: experiments can report an overly small (pathological) interval without informing the consumer of the potential problem.
With the proposal: the consumer can distinguish credible from incredible intervals.

Slide 25: Other benefits of the proposal
- Education: two different probabilities are calculated, which brings the distinction between coverage and credibility to the attention of physicists
- Empty confidence intervals are assigned no credibility
- Experiments with no observed events will be rewarded for reducing their background (previously penalized)
- Intervals that are “too small” (or that exclude parameters beyond the sensitivity) are assigned small credibility
- Better-than-average limits are not assigned small credibility if they are due to the existence of rare, high-precision events (m(ν_τ))

Slide 26: Other benefits of the proposal
The Bayesian concept is applied in a way that may be easy to accept even by devout frequentists:
- the choice of a uniform prior appears to work well
- does not “mix” Bayesian and frequentist methods
- does not modify coverage
Experimenters will naturally choose frequentist methods that are less likely to result in a poor degree of belief. “Do you want to risk getting an incredible limit?”

Slide 27: Summary
Confidence intervals
- are well defined, but
- are frequently misinterpreted
- can suffer from pathological problems when physical boundaries are present
Propose that experiments quote credibility:
- quantifies possible pathology
- a reminder of the two definitions of probability
- encourages the use of methods for confidence interval construction that avoid pathologies