Checking For Prior-Data Conflict

Checking For Prior-Data Conflict, by Michael Evans and Hadas Moshonov, University of Toronto

Introduction

Statistical analyses are based on inputs from the analyst. In a Bayesian context these are:
- assumptions about the sampling model (S, {fθ : θ ∈ Ω}, μ)
- the choice of a prior (Ω, π, ν)
If either of these is in "error" then subsequent inferences about model components are at least suspect, so checking the validity of these components is a necessary step in a statistical analysis.

Two types of error

(1) The sampling model is in error if the observed data s0 ∈ S is surprising for each fθ.
(2) A prior-data conflict exists when the prior places most of its mass on θ values for which s0 is surprising.

Several papers have considered model checking in this context:
- Guttman (1967)
- Box (1980)
- Rubin (1984)
- Gelman, Meng and Stern (1996)
but these do not really distinguish between the two types of error. The errors should be assessed separately.

Why?

If the sampling model is wrong it must be modified (ignoring practical versus statistical significance). If a prior-data conflict exists, it may be possible to ignore it when the sampling model is viewed as correct and the amount of data is large enough. So it is sometimes possible to correct for the effects of prior-data conflict simply by increasing the amount of data.

First check for failure of the sampling model:
- there are many frequentist methods for this
- for Bayesian methods, see Bayarri and Berger (2000)
If the sampling model is in error there is no point in checking for prior-data conflict.

Notation

How do we assess whether or not a prior-data conflict exists? The prior predictive measure M for s has density, wrt μ, given by
m(s) = ∫_Ω fθ(s) π(θ) ν(dθ).
If Π is proper then M is a probability measure. The posterior probability measure Π(· | s0) has density, wrt ν,
π(θ | s0) = π(θ) fθ(s0) / m(s0).
For T : S → 𝒯, denote the marginal densities of T by fθT, wrt a support measure λ on 𝒯. This leads to the marginal prior predictive density of T, wrt λ,
mT(t) = ∫_Ω fθT(t) π(θ) ν(dθ),
while the posterior predictive distribution of T has density, wrt λ,
mT(t | s0) = ∫_Ω fθT(t) π(θ | s0) ν(dθ).
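As a concrete illustration of these definitions (not part of the original slides), here is a minimal Python sketch that approximates m(s) and π(θ | s0) by numerical integration over a grid; the particular model and prior (a N(θ, 1) sampling model with a N(0, 4) prior) are made-up choices.

```python
import numpy as np
from scipy import stats

# Hypothetical setup: s | theta ~ N(theta, 1), prior theta ~ N(0, 4).
# Any one-dimensional model/prior pair could be substituted.
theta_grid = np.linspace(-20.0, 20.0, 4001)   # grid over Omega
d_theta = theta_grid[1] - theta_grid[0]
prior = stats.norm.pdf(theta_grid, loc=0.0, scale=2.0)

def prior_predictive(s):
    """m(s) = integral over Omega of f_theta(s) pi(theta) d(theta)."""
    like = stats.norm.pdf(s, loc=theta_grid, scale=1.0)
    return np.sum(like * prior) * d_theta

def posterior(s0):
    """pi(theta | s0) = pi(theta) f_theta(s0) / m(s0), on the grid."""
    like = stats.norm.pdf(s0, loc=theta_grid, scale=1.0)
    return prior * like / prior_predictive(s0)

s0 = 1.3
print(prior_predictive(s0))              # marginal density of the observed data
print(np.sum(posterior(s0)) * d_theta)   # integrates to 1 (sanity check)
```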

Prior-data Conflict: Sufficiency

Basic idea: a prior-data conflict exists whenever the data provide little or no support to those values of θ where the prior places its mass. So we compare the effective support of the prior with the region where the likelihood is high. How?

If the observed likelihood L(· | s0) = fθ(s0), viewed as a function of θ, is a surprising value under M, then this would seem to indicate that a prior-data conflict exists. The likelihood is equivalent to a minimal sufficient statistic T, so we compare T(s0) to MT. It is appropriate to restrict attention to T:

Theorem 1. Suppose T is a sufficient statistic for the model {fθ : θ ∈ Ω} for data s. Then the conditional prior predictive distribution of the data s given T is independent of the prior π.

Theorem 2. If L(· | s0) is nonzero only on a θ-region where π places no mass, then T(s0) is an impossible outcome for MT.

Why compare T(s0) with MT rather than MT(· | s0)?

Example: Location-normal

Suppose s = (s1, ..., sn) is a sample from a N(θ, 1) distribution with θ ∈ R1, with a N(θ0, σ0²) prior on θ. For T(s) = s̄, MT is the N(θ0, σ0² + 1/n) distribution, and the prior predictive P-value is

(1) 2(1 − Φ(|s̄0 − θ0| / (σ0² + 1/n)^(1/2))).

The prior predictive results in standardization by (σ0² + 1/n)^(1/2) rather than by σ0. If the true value of θ is θ*, then as n → ∞ (1) converges almost surely to 2(1 − Φ(|θ* − θ0| / σ0)). When σ0 → ∞, the standardized discrepancy in (1) converges to 0, so the P-value converges to 1 and no conflict can be found (a diffuse prior simply indicates that all values are equally likely). As σ0 → 0, so that we have a very precise prior, (1) will definitely find evidence of a prior-data conflict for large n unless θ0 is the true value.
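A minimal Python sketch of the check in (1), not from the original slides: the values of θ0, σ0 and the simulated data are made up, chosen so that the prior conflicts with the data.

```python
import numpy as np
from scipy.stats import norm

def prior_predictive_pvalue(xbar0, n, theta0, sigma0):
    """P-value (1): compare T(s0) = xbar with M_T = N(theta0, sigma0^2 + 1/n)."""
    z = abs(xbar0 - theta0) / np.sqrt(sigma0**2 + 1.0 / n)
    return 2 * (1 - norm.cdf(z))

rng = np.random.default_rng(0)
theta_true, theta0, sigma0, n = 3.0, 0.0, 0.5, 100
x = rng.normal(theta_true, 1.0, size=n)
# Tiny P-value: the prior N(0, 0.25) sits far from data centered near 3.
print(prior_predictive_pvalue(x.mean(), n, theta0, sigma0))
```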

Posterior predictive P-value (2)

The corresponding posterior predictive P-value compares s̄0 with its posterior predictive distribution N(μn, σn² + 1/n), where μn = (θ0/σ0² + n s̄0)/(1/σ0² + n) and σn² = (1/σ0² + n)^(−1) are the posterior mean and variance:

(2) 2(1 − Φ(|s̄0 − μn| / (σn² + 1/n)^(1/2))).

As n → ∞, (2) converges almost surely to 1, irrespective of the true value of θ. So if we were to use the posterior predictive we would never conclude that a prior-data conflict exists.
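For contrast, a self-contained sketch of the check in (2), under the same made-up conflicting setup as the previous sketch (prior N(0, 0.25), data mean near 3); the conjugate-posterior formulas are the standard ones, not taken from the slides.

```python
import numpy as np
from scipy.stats import norm

def posterior_predictive_pvalue(xbar0, n, theta0, sigma0):
    """P-value (2): compare xbar0 with its posterior predictive distribution.

    Conjugate posterior: theta | s0 ~ N(mu_n, sigma_n^2); the posterior
    predictive of a new sample mean is then N(mu_n, sigma_n^2 + 1/n).
    """
    prec = 1.0 / sigma0**2 + n
    mu_n = (theta0 / sigma0**2 + n * xbar0) / prec
    z = abs(xbar0 - mu_n) / np.sqrt(1.0 / prec + 1.0 / n)
    return 2 * (1 - norm.cdf(z))

# The check raises no alarm (P-value ~ 0.41 here) despite the conflicting
# prior, and the P-value tends to 1 as n grows.
print(posterior_predictive_pvalue(3.0, 100, 0.0, 0.5))
```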

Prior-data Conflict: Ancillarity

We could compare the observed value of U(T(s)) with its marginal prior predictive distribution for any function U. For certain choices of U this will clearly not be appropriate: e.g., if U is ancillary then the marginal prior predictive distribution of U does not depend on the prior. Moreover, T(s0) may be a surprising value simply because U(T(s0)) is surprising for some ancillary U, and we want to avoid this. So we remove the variation in the prior predictive distribution of T that is associated with U by conditioning on U.

We avoid the necessity of conditioning on an ancillary whenever we have a complete minimal sufficient statistic T (Basu's theorem).

If U1(T) and U2(T) are ancillary and U1 = h ∘ U2 for some h, then condition on U2(T(s0)). Condition on maximal ancillaries whenever possible. There is no general method for obtaining maximal ancillaries; see Lehmann and Scholz (1992). The lack of a unique maximal ancillary can cause problems in frequentist approaches to inference, but it is not a problem here.

Example: Mixtures (Cox and Hinkley, 1974)

The response x is either from a N(θ, σ1²) or a N(θ, σ2²) distribution, where θ ∈ R1 is unknown and σ1², σ2² are both known and unequal. The particular instrument used is chosen according to c ~ Bernoulli(p), where p is known. Then (c, x) is minimal sufficient and c is ancillary. When c = i we would use the conditional prior predictive of x given c = i, namely
m(x | c = i) = ∫_Ω φσi(x − θ) π(θ) ν(dθ),
where φσ denotes the N(0, σ²) density, to check for prior-data conflict. Generally, if (x, u) is minimal sufficient for a model with x | u ~ fθ(· | u) and u ~ h a maximal ancillary, use the conditional prior predictive
m(x | u) = ∫_Ω fθ(x | u) π(θ) ν(dθ).
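A hypothetical sketch of this conditional check, with made-up parameter values: under a N(θ0, σ0²) prior, the prior predictive of x given c = i is N(θ0, σ0² + σi²), so the same observation can signal conflict under the precise instrument yet none under the noisy one.

```python
import numpy as np
from scipy.stats import norm

def mixture_conflict_pvalue(x0, c0, theta0, sigma0, sigmas):
    """Compare x0 with M(. | c = c0) = N(theta0, sigma0^2 + sigmas[c0]^2).

    Conditioning on the ancillary c removes variation in the prior
    predictive that carries no information about the prior.
    """
    sd = np.sqrt(sigma0**2 + sigmas[c0]**2)
    z = abs(x0 - theta0) / sd
    return 2 * (1 - norm.cdf(z))

sigmas = {0: 1.0, 1: 10.0}   # two instruments with known, unequal variances
# x0 = 6 conflicts with the prior when the precise instrument was used
# (c = 0, tiny P-value) but not when the noisy one was (c = 1, ~0.55).
print(mixture_conflict_pvalue(6.0, 0, theta0=0.0, sigma0=1.0, sigmas=sigmas))
print(mixture_conflict_pvalue(6.0, 1, theta0=0.0, sigma0=1.0, sigmas=sigmas))
```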

Noninformative Priors

Various definitions are available for expressing what it means for a prior to be noninformative:
- Kass and Wasserman (1996)
- Bernardo (1979)
- Berger and Bernardo (1992)

A somewhat different requirement for noninformativity arises from considerations about the existence of prior-data conflict. For if a prior is such that we would never conclude that a prior-data conflict exists, no matter what data are obtained, then it seems reasonable to say that such a prior is at least a candidate for being called noninformative. So we consider the absence of the possibility of any prior-data conflict as a necessary characteristic of noninformativity, rather than as a characterization of this concept.

Diagnostics for Ignoring Prior-data Conflict

Suppose we have found evidence of a prior-data conflict. What to do next? Use a different prior; but how to choose such a prior? In some circumstances the answer would be to collect more data, for with a sufficient amount of data the effect of the prior on our inferences is immaterial. But in some circumstances this is not possible. So, in a given context, we would like to know if we have enough data to ignore the prior-data conflict and, if not, how much more data we need. Intuitively, if the inferences about θ that we are interested in making are not strongly dependent on the prior, then we might feel that we can ignore any prior-data conflict.

We need some quantitative assessment of this, and there are several possibilities. When there is a prior that is noninformative, we can compare the posterior inferences obtained via the two priors. If these inferences do not differ by an amount that is considered of practical importance, then it seems reasonable to ignore the prior-data conflict. The amount of difference that matters will depend on the particular application. In general we would compute the divergence between the posteriors under the informative and noninformative priors. However, we need to state a cut-off value for the divergence, below which we would not view the difference between the distributions as material. It is not always obvious how to obtain a noninformative prior; in these circumstances we can simply select another prior that seems reasonable for the problem and compare inferences.
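One possible quantitative version of this comparison, sketched here for the location-normal example (the near-diffuse prior and any cut-off value are illustrative choices, not prescribed by the slides): compute the Kullback-Leibler divergence between the normal posteriors obtained under the informative and the near-diffuse prior.

```python
import numpy as np

def normal_posterior(xbar0, n, theta0, sigma0):
    """Conjugate N(mu_n, sigma_n^2) posterior for theta, location-normal model."""
    prec = 1.0 / sigma0**2 + n
    return (theta0 / sigma0**2 + n * xbar0) / prec, 1.0 / prec

def kl_normal(mu1, var1, mu2, var2):
    """KL( N(mu1, var1) || N(mu2, var2) ), in nats."""
    return 0.5 * (np.log(var2 / var1) + (var1 + (mu1 - mu2) ** 2) / var2 - 1.0)

xbar0, n = 3.0, 100
mu_i, var_i = normal_posterior(xbar0, n, theta0=0.0, sigma0=0.5)    # informative prior
mu_d, var_d = normal_posterior(xbar0, n, theta0=0.0, sigma0=100.0)  # near-diffuse prior
# Compare the divergence against a context-dependent cut-off below which
# the difference between the posteriors is judged immaterial.
print(kl_normal(mu_i, var_i, mu_d, var_d))
```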

Factoring the Joint Distribution

The joint distribution of (θ, s) is given by the measure Pθ × Π. A minimal sufficient statistic T leads to the factorization P(· | T) × PθT × Π. Here P(· | T) depends only on the choice of the sampling model {Pθ : θ ∈ Ω}, so we compare the observed data s0 against P(· | T) to assess the sampling model. PθT × Π can be written as MT × Π(· | T), and we compare the observed value T(s0) with the distribution MT to check for prior-data conflict. For a maximal ancillary U we can factor MT × Π(· | T) as PU × MT(· | U) × Π(· | T). PU also depends only on the choice of the sampling model {Pθ : θ ∈ Ω}, so we can also compare the observed value U(s0) with PU to check the sampling model. When U is not independent of T, we instead compare T(s0) with MT(· | U) to check for prior-data conflict. Inferences about θ use the posterior distribution Π(· | T).