PSY 626: Bayesian Statistics for Psychological Science


PSY 626: Bayesian Statistics for Psychological Science
Bayes Factors
Greg Francis
Fall 2018, Purdue University

Hypothesis testing
Suppose the null is true and check to see if a rare event has occurred; e.g., does our random sample produce a t value that is in the tails of the null sampling distribution? If a rare event occurred, reject the null hypothesis.

Hypothesis testing
But what is the alternative? Typically: "anything goes." But that seems kind of unreasonable. Maybe the "rare event" would be even less common if the null were not true!

Bayes Theorem
Conditional probabilities:
P(H | D) = P(D | H) P(H) / P(D)

Ratio
The ratio of posteriors conveniently cancels out P(D):
P(H1 | D) / P(H2 | D) = [P(D | H1) / P(D | H2)] × [P(H1) / P(H2)]
Posterior odds = Bayes Factor × Prior odds
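A quick numerical sketch of this relationship in R (the numbers are hypothetical, chosen only for illustration):

prior_odds <- 1        # hypothetical: both models equally plausible a priori
BF10 <- 4              # hypothetical: data four times as likely under H1 as under H0
posterior_odds <- BF10 * prior_odds   # = 4: H1 is four times as plausible after the data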

Bayesian Model Selection
It's not really about hypotheses, but hypotheses suggest models. The Bayes Factor is often presented as BF12 = P(D | M1) / P(D | M2); you could also compute BF21 = 1/BF12. As before:
Posterior odds = Bayes Factor × Prior odds

Bayes Factor
Evidence for the alternative hypothesis (or the null) is computed with the Bayes Factor (BF).
BF > 1 indicates that the data are evidence for the alternative, compared to the null.
BF < 1 indicates that the data are evidence for the null, compared to the alternative.

Bayes Factor
When BF10 = 2, the data are twice as likely under H1 as under H0. When BF01 = 2, the data are twice as likely under H0 as under H1.
These interpretations do not require you to believe that one model is better than the other: you can still have priors that favor one model, regardless of the Bayes Factor, and you would want to make important decisions based on the posterior. Still, if you consider both models to be plausible, then the priors should not be so different from each other.

Rules of thumb
Evidence for the alternative hypothesis (or the null) is computed with the Bayes Factor (BF).

BF10           Interpretation
< 0.01         Decisive evidence for null
0.01 to 0.1    Strong evidence for null
0.1 to 0.3     Substantial evidence for null
0.3 to 1       Anecdotal evidence for null
1 to 3         Anecdotal evidence for alternative
3 to 10        Substantial evidence for alternative
10 to 100      Strong evidence for alternative
> 100          Decisive evidence for alternative

Similar to AIC
For a two-sample t-test, the null hypothesis (reduced model) is that a score i from group s (1 or 2) is defined as
X_is = mu + epsilon_is
with the same mean mu for each group s.

AIC
For a two-sample t-test, the alternative hypothesis (full model) is that a score i from group s (1 or 2) is defined as
X_is = mu_s + epsilon_is
with a different mean mu_s for each group s.

AIC
AIC and its variants are a way of comparing model structures: one mean or two means? AIC always uses maximum likelihood estimates of the parameters. Bayesian approaches identify a posterior distribution of parameter values; we should use that information!
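As a concrete sketch of the "one mean or two means" comparison, AIC can be computed for both model structures in R. This reuses the SLdata example (Leniency by SmileType) from the next slide; smaller AIC is better:

# Reduced model: one grand mean for everyone
m0 <- lm(Leniency ~ 1, data = SLdata)
# Full model: a separate mean for each smile type
m1 <- lm(Leniency ~ 0 + SmileType, data = SLdata)
AIC(m0, m1)   # compares maximum-likelihood fit with a penalty for extra parameters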

Models of what?
We have been building models of trial-level scores:

library(brms)

# Model without intercept (more natural)
model2 = brm(Leniency ~ 0 + SmileType, data = SLdata,
             iter = 2000, warmup = 200, chains = 3)
print(summary(model2))

# GrandMean is assumed to have been computed earlier in the script
GrandSE = 10
stanvars <- stanvar(GrandMean, name = 'GrandMean') + stanvar(GrandSE, name = 'GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
model6 = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata,
             iter = 2000, warmup = 200, chains = 3, thin = 2,
             prior = prs, stanvars = stanvars)
print(summary(model6))

Models of what?
We have been building models of trial-level scores, but that is not the only option. In traditional hypothesis testing, we care more about effect sizes (the signal-to-noise ratio) than about individual scores. Of course, the effect size is derived from the individual scores, but in many cases it is enough to model a "sufficient" statistic rather than the individual scores: Cohen's d, the t-statistic, the p-value, or the correlation r.

Models of means It’s not really going to be practical, but let’s consider a case where we assume that the population variance is known (and equals 1) and we want to compare null and alternative hypotheses of fixed values

Models of means
The likelihood of any given observed mean value is derived from the sampling distribution. Suppose n = 100 (one sample); then the sample mean has standard error sigma / sqrt(n) = 1/10.

Models of means
The likelihood of any given observed mean value is derived from the sampling distribution. Suppose n = 100 (one sample) and we observe a sample mean close to the null value; then the data are more likely under the null than under the alternative.

Models of means
The likelihood of any given observed mean value is derived from the sampling distribution. Suppose n = 100 (one sample) and we observe a sample mean closer to the alternative value; then the data are more likely under the alternative than under the null.


Bayes Factor
The ratio of the likelihood for the data under the null compared to the alternative, BF01 = P(D | H0) / P(D | H1), or the other way around, BF10 = P(D | H1) / P(D | H0). Suppose we observe a sample mean for which the data are more likely under the alternative than under the null; then BF10 > 1.
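A minimal R sketch of this comparison under the slide's setup (known sigma = 1, n = 100). The point hypotheses and the observed mean are hypothetical stand-ins for the values shown on the original slides:

n <- 100
se <- 1 / sqrt(n)          # standard error of the sample mean (sigma = 1)
xbar <- 0.25               # hypothetical observed sample mean
like0 <- dnorm(xbar, mean = 0,   sd = se)   # likelihood under H0: mu = 0
like1 <- dnorm(xbar, mean = 0.3, sd = se)   # likelihood under H1: mu = 0.3
BF10 <- like1 / like0      # here > 1: the data favor the alternative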

Decision depends on alternative
The likelihood of any given observed mean value is derived from the sampling distribution. Suppose n = 100 (one sample) and we observe the same sample mean, but the alternative specifies a more extreme value; now the data are more likely under the null than under the alternative.

Decision depends on alternative
The likelihood of any given observed mean value is derived from the sampling distribution. Suppose n = 100 (one sample).


Decision depends on alternative
For a fixed sample mean, evidence for the alternative only occurs for alternative population mean values in a given range. For big alternative values, the observed sample mean is less likely than for a null population value; the sample mean may be unlikely for both models.
[Figure from Rouder et al. (2009): evidence for null vs. evidence for alternative as a function of the mean of the alternative]

Models of means
Typically, we do not hypothesize a specific value for the alternative, but a range of plausible values.

Likelihoods
For the null, we compute the likelihood in the same way. Suppose n = 100 (one sample).

Likelihoods
For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values. Suppose n = 100 (one sample).




Average Likelihood
For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values. Suppose n = 100 (one sample):
P(D | H1) = integral of f(D | mu) * pi(mu) dmu
where pi(mu) is the prior for the value of mu and f(D | mu) is the likelihood for a given value of mu (from the sampling distribution).

Bayes Factor
The ratio of the likelihood for the null compared to the (average) likelihood for the alternative:
BF01 = P(D | H0) / P(D | H1)
Or the other way around, BF10 = P(D | H1) / P(D | H0).
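In R, the averaging is a one-dimensional numerical integral. A minimal sketch, again with known sigma = 1 and n = 100; the observed mean and the prior SD of 0.15 (echoing the slides' figures) are hypothetical choices:

n <- 100; se <- 1 / sqrt(n)
xbar <- 0.1                               # hypothetical observed sample mean
pD_H0 <- dnorm(xbar, mean = 0, sd = se)   # likelihood under the null spike
# Average likelihood under H1: integrate likelihood x prior over mu
integrand <- function(mu) dnorm(xbar, mean = mu, sd = se) * dnorm(mu, 0, 0.15)
pD_H1 <- integrate(integrand, lower = -Inf, upper = Inf)$value
BF01 <- pD_H0 / pD_H1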

Uncertainty
The prior standard deviation for mu establishes a range of plausible values for mu.
[Figure: a narrow (less flexible) prior vs. a broad (more flexible) prior]

Uncertainty
With a very narrow prior, you may not fit the data.
[Figure: less flexible vs. more flexible prior]

Uncertainty
With a very broad prior, you will fit well for some values of mu and poorly for other values of mu.
[Figure: less flexible vs. more flexible prior]

Uncertainty
Uncertainty in the prior functions similarly to the penalty for parameters in AIC.
[Figure: less flexible vs. more flexible prior]

Penalty
Averaging acts like a penalty for extra parameters.
[Figure from Rouder et al. (2009): evidence for null vs. evidence for alternative as a function of the width of the alternative prior]
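The penalty is easy to see by sweeping the prior width in the sketch above: as the prior on mu broadens, the averaged likelihood spreads over implausible values and BF01 moves toward the null (the specific numbers are hypothetical):

n <- 100; se <- 1 / sqrt(n); xbar <- 0.1   # hypothetical observed mean
pD_H0 <- dnorm(xbar, mean = 0, sd = se)
for (priorSD in c(0.05, 0.15, 0.5, 2)) {
  integrand <- function(mu) dnorm(xbar, mean = mu, sd = se) * dnorm(mu, 0, priorSD)
  pD_H1 <- integrate(integrand, lower = -Inf, upper = Inf)$value
  cat("prior SD =", priorSD, "  BF01 =", round(pD_H0 / pD_H1, 2), "\n")
}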

Models of effect size
Consider the case of a two-sample t-test. We often care about the standardized effect size
delta = (mu_1 - mu_2) / sigma
which we can estimate from data as
d = (M_1 - M_2) / s
where M_1 and M_2 are the sample means and s is the pooled standard deviation.
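A minimal R sketch of the estimate for two equal-sized groups (x1 and x2 are hypothetical vectors of scores):

x1 <- rnorm(15, mean = 0.5)   # hypothetical group 1 scores
x2 <- rnorm(15, mean = 0.0)   # hypothetical group 2 scores
sp <- sqrt((var(x1) + var(x2)) / 2)   # pooled SD (equal n)
d  <- (mean(x1) - mean(x2)) / sp      # estimated standardized effect size
t  <- d * sqrt(15 / 2)                # the matching t statistic: t = d * sqrt(n/2)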

Models of effect size
If we were doing traditional hypothesis testing, we would compare a null model
H0: mu_1 = mu_2
against an alternative
H1: mu_1 ≠ mu_2
Equivalent statements can be made using the standardized effect size (H0: delta = 0 versus H1: delta ≠ 0), as long as the standard deviation is not zero.

Priors on effect size
For the null, the prior is (again) a spike at zero: all of the prior mass sits on delta = 0.

JZS Priors on effect size
For the alternative, a good choice is a Cauchy distribution (a t-distribution with df = 1), following Rouder et al. (2009). "JZS" stands for Jeffreys, Zellner, and Siow.

JZS Priors on effect size
It is a good choice because the integration for the alternative hypothesis can be done numerically. For the one-sample case (Rouder et al., 2009):
BF01 = (1 + t^2/v)^{-(v+1)/2} / ∫_0^∞ (1 + N g)^{-1/2} [1 + t^2 / ((1 + N g) v)]^{-(v+1)/2} (2π)^{-1/2} g^{-3/2} e^{-1/(2g)} dg
t is the t-value you use in a hypothesis test (from the data); v is the degrees of freedom (from the data); N is the sample size. This might not look easy, but it is simple to calculate with a computer.
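A direct numerical translation of that equation into R, for the one-sample case with scale r = 1 (matching the displayed formula); this is a sketch, not the BayesFactor package's implementation:

jzsBF01 <- function(t, N) {
  v <- N - 1   # degrees of freedom
  # Numerator: likelihood under the null (spike at delta = 0)
  num <- (1 + t^2 / v)^(-(v + 1) / 2)
  # Denominator: likelihood averaged over the JZS prior on delta
  integrand <- function(g) {
    (1 + N * g)^(-1/2) *
      (1 + t^2 / ((1 + N * g) * v))^(-(v + 1) / 2) *
      (2 * pi)^(-1/2) * g^(-3/2) * exp(-1 / (2 * g))
  }
  den <- integrate(integrand, lower = 0, upper = Inf)$value
  num / den
}
jzsBF01(t = 2.2, N = 15)   # BF01; take 1/BF01 for BF10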

Variations of JZS Priors
The scale parameter "r" sets the width of the Cauchy prior. Bigger values make for a broader prior: more flexibility, but more penalty!

Variations of JZS Priors
Medium: r = sqrt(2)/2
Wide: r = 1
Ultrawide: r = sqrt(2)
(These are the names and values used by the BayesFactor package.)
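In the BayesFactor package these named scales can be passed directly, and rscale also accepts a number; this reuses the same t and sample sizes as the worked example below:

library(BayesFactor)
ttest.tstat(t = 2.2, n1 = 15, n2 = 15, rscale = "medium",    simple = TRUE)
ttest.tstat(t = 2.2, n1 = 15, n2 = 15, rscale = "ultrawide", simple = TRUE)
ttest.tstat(t = 2.2, n1 = 15, n2 = 15, rscale = sqrt(2)/2,   simple = TRUE)  # same as "medium"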

How do we use it?
Super easy.
Rouder's web site: http://pcl.missouri.edu/bayesfactor
In R: library(BayesFactor)


How do we use it?
library(BayesFactor)
ttest.tstat(t = 2.2, n1 = 15, n2 = 15, simple = TRUE)
     B10
1.993006
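If you have the raw scores rather than a summary t statistic, ttestBF from the same package works directly (x1 and x2 here stand for hypothetical vectors of scores for the two groups):

ttestBF(x = x1, y = x2)   # reports BF10, with rscale = "medium" by default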

What does it mean? Guidelines:

BF          Evidence
1 – 3       Anecdotal
3 – 10      Substantial
10 – 30     Strong
30 – 100    Very strong
> 100       Decisive

Conclusions
JZS Bayes Factors are easy to calculate, and the results are pretty easy to understand. The setup is a bit arbitrary: why not other priors? How do you pick the scale factor? The criteria for interpretation are also arbitrary. Still, this is a fairly painless introduction to Bayesian methods.