PSY 626: Bayesian Statistics for Psychological Science

Presentation transcript:

Comparing analyses
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2018, Purdue University
2/22/2019

Facial feedback
Is your emotional state influenced by your facial muscles?
If I ask you to smile, you may report feeling happier.
But this could just be because you guess I want you to report feeling happier,
or because intentional smiling is associated with feeling happier.
You can ask people to use smiling muscles without them realizing it.

Facial feedback
Within-subjects design (n=21).
Judge “happiness” in a piece of abstract art while:
  Holding a pen in your teeth (smiling)
  Holding a pen in your lips (frowning/pouting)
  No pen
11 trials for each condition.
Different art on each trial.

Data
The HappinessRating is a number between 0 (no happiness) and 100 (lots of happiness).
The facial feedback hypothesis suggests that the mean HappinessRating values should be larger when the pen is held in the teeth and lower when the pen is held in the lips.
File FacialFeedback.csv contains all the data.
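A quick way to check that a file has the layout described above is to count trials per participant and condition. The sketch below builds a synthetic stand-in instead of reading FacialFeedback.csv: the column names follow the model formulas used later, the condition labels are inferred from the brms coefficient names, and the ratings are random placeholders, not the study's data.

```r
# Synthetic stand-in for FacialFeedback.csv (assumed layout; ratings are random placeholders)
set.seed(1)
conditions <- c("NoPen", "PenInLips", "PenInTeeth")
FFsim <- expand.grid(Trial = 1:11, Condition = conditions, Participant = 1:21)
FFsim$HappinessRating <- round(runif(nrow(FFsim), min = 0, max = 100))

# 21 participants x 3 conditions x 11 trials
n_obs <- nrow(FFsim)

# Every participant should contribute 11 trials to every condition
trial_counts <- table(FFsim$Participant, FFsim$Condition)

# Mean rating per condition (the quantity the hypothesis is about)
condition_means <- aggregate(HappinessRating ~ Condition, data = FFsim, FUN = mean)
print(n_obs)
print(condition_means)
```

With the real file, the same `table()` and `aggregate()` calls could be applied to `FFdata` directly.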

Models
Consider three models:

    # load data file
    FFdata <- read.csv(file="FacialFeedback.csv", header=TRUE, stringsAsFactors=TRUE)
    # By default, participant numbers are treated as _numbers_. Need to correct that.
    FFdata$ParticipantFactor = factor(FFdata$Participant)
    # load the brms library
    library(brms)
    # null model
    model1 = brm(HappinessRating ~ 1, data = FFdata)
    # without random effect on participant
    model2 = brm(HappinessRating ~ 0 + Condition, data = FFdata)
    # with random effect on participant (for shrinkage)
    model3 = brm(HappinessRating ~ 0 + Condition + (Condition | Participant), data = FFdata)
    # compare models
    # (brms spells this argument `weights`, as on the later slides)
    print(model_weights(model1, model2, model3, weight="loo"))

    > print(model_weights(model1, model2, model3, weight="loo"))
          model1       model2       model3
    3.026681e-05 8.068823e-02 9.192815e-01

Model comparison
What does this mean?
If you wanted to predict future data, the random effects model is your best choice among these models.
That’s largely because there seem to be differences between participants, and only this model considers them.
Maybe we should consider other models.
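For intuition about where such weights can come from: information-criterion weights of the Akaike type rescale each model's criterion relative to the best model and then normalize. The sketch below uses made-up LOOIC values and a hypothetical helper named `akaike_weights`; it illustrates the idea and is not necessarily the exact computation brms performs.

```r
# Akaike-type weights from information-criterion values (lower IC = better).
# 'akaike_weights' is a hypothetical helper; the IC values below are made up.
akaike_weights <- function(ic) {
  delta <- ic - min(ic)      # difference from the best model
  rel <- exp(-0.5 * delta)   # relative likelihood of each model
  rel / sum(rel)             # normalize so the weights sum to 1
}

looic <- c(null = 6120.4, condition = 6104.6, random = 6099.7)
w <- akaike_weights(looic)
print(round(w, 4))
```

A difference of only a few points on the criterion scale is enough to move most of the weight onto one model.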

Various null models
We usually think of there being one null.
But oftentimes there are many such models.

    # Different null model: no effect of condition, but different values across participants
    model4 = brm(HappinessRating ~ ParticipantFactor, data = FFdata)
    # Random effects null model: no effect of condition, but different values across
    # participants that are related to each other
    model5 = brm(HappinessRating ~ (1 | ParticipantFactor), data = FFdata)
    # compare models
    print(model_weights(model1, model2, model3, model4, model5, weight="loo"))

    > model_weights(model1, model2, model3, model4, model5, weight="loo")
          model1       model2       model3       model4       model5
    2.480789e-02 5.444721e-02 6.586768e-01 2.620574e-01 1.074924e-05

Evidential support
What does the Facial feedback hypothesis actually predict?
Is your emotional state influenced by your facial muscles?
-> Differences in happiness ratings across the conditions

    > print(model3)
     Family: gaussian
      Links: mu = identity; sigma = identity
    Formula: HappinessRating ~ 0 + Condition + (Condition | Participant)
       Data: FFdata (Number of observations: 693)
    Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
             total post-warmup samples = 4000

    Group-Level Effects:
    ~Participant (Number of levels: 21)
                                                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sd(Intercept)                                  11.01      2.20     7.34    15.83       2114 1.00
    sd(ConditionPenInLips)                          8.59      3.19     2.33    15.05       1152 1.00
    sd(ConditionPenInTeeth)                         7.19      3.00     1.18    13.15        791 1.00
    cor(Intercept,ConditionPenInLips)               0.03      0.32    -0.53     0.67       2304 1.00
    cor(Intercept,ConditionPenInTeeth)             -0.46      0.30    -0.88     0.29       2435 1.00
    cor(ConditionPenInLips,ConditionPenInTeeth)     0.07      0.40    -0.78     0.73       1472 1.00

    Population-Level Effects:
                        Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    ConditionNoPen         43.69      2.76    38.17    49.20       1174 1.01
    ConditionPenInLips     45.88      3.37    39.29    52.46       1540 1.01
    ConditionPenInTeeth    47.44      2.54    42.29    52.42       1700 1.00

    Family Specific Parameters:
                        Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sigma                  19.02      0.55    17.98    20.11       4000 1.00

Evidential support
What does the Facial feedback hypothesis actually predict?
Use of “smiling” facial muscles should lead to higher happiness ratings than “pouting” facial muscles.
-> Higher ratings for teeth than for lips conditions

    > print(model3)
    (group-level and family-specific output as on the first Evidential support slide)

    Population-Level Effects:
                        Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    ConditionNoPen         43.69      2.76    38.17    49.20       1174 1.01
    ConditionPenInLips     45.88      3.37    39.29    52.46       1540 1.01
    ConditionPenInTeeth    47.44      2.54    42.29    52.42       1700 1.00

Evidential support
What does the Facial feedback hypothesis actually predict?
“Smiling” facial muscles lead to more happiness, “pouting” facial muscles lead to less happiness.
-> Order of means: teeth > none > lips

    > print(model3)
    (group-level and family-specific output as on the first Evidential support slide)

    Population-Level Effects:
                        Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    ConditionNoPen         43.69      2.76    38.17    49.20       1174 1.01
    ConditionPenInLips     45.88      3.37    39.29    52.46       1540 1.01
    ConditionPenInTeeth    47.44      2.54    42.29    52.42       1700 1.00
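One way to quantify support for the ordering teeth > none > lips is the proportion of posterior draws that satisfy it. With a fitted model the draws would come from something like `posterior_samples(model3)`; in the sketch below, independent normal draws built from the reported estimates and errors stand in for them. That stand-in ignores posterior correlations, so the printed proportion is only illustrative, and `prob_ordered` is a hypothetical helper name.

```r
# Proportion of draws in which teeth > none > lips.
# 'prob_ordered' is a hypothetical helper; 'draws' needs one column per condition mean.
prob_ordered <- function(draws) {
  mean(draws$teeth > draws$none & draws$none > draws$lips)
}

# Stand-in draws: independent normals using the Estimate and Est.Error values
# of the population-level effects (an assumption; not the real joint posterior).
set.seed(626)
n_draws <- 4000
draws <- data.frame(
  none  = rnorm(n_draws, mean = 43.69, sd = 2.76),
  lips  = rnorm(n_draws, mean = 45.88, sd = 3.37),
  teeth = rnorm(n_draws, mean = 47.44, sd = 2.54)
)
p_order <- prob_ordered(draws)
print(p_order)
```

Because the estimated mean for the lips condition lies above the no-pen mean, only a minority of draws can be expected to follow the full predicted order.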

Ordered means
You should build the model that best represents the theoretical claim.
Consider a model for ordered effects: teeth > none > lips.
First attempt (failure):

    priors = c(prior(normal(0, 10), class = "b", coef="ConditionPenInLips", ub=0),
               prior(normal(0, 10), class = "b", coef="ConditionPenInTeeth", lb=0))

    > Error: Argument 'coef' may not be specified when using boundaries.

More generally, “hard” boundaries are dangerous: they can lead to model convergence problems.
Better to use “soft” boundaries via priors.

Ordered means
You should build the model that best represents the theoretical claim.
Consider a model for ordered effects: teeth > none > lips.
Intercept (None), bias negative for Lips, bias positive for Teeth.

    priors = c(prior(normal(-5, 5), class = "b", coef="ConditionPenInLips"),
               prior(normal(5, 5), class = "b", coef="ConditionPenInTeeth"))
    model6 = brm(HappinessRating ~ 1 + Condition + (1 + Condition | Participant),
                 data = FFdata, prior=priors)

Versus a model without ordered effects:

    > model_weights(model3, model6, weights="loo")
       model3    model6
    0.3262669 0.6737331
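To see how “soft” these boundaries are, one can compute how much prior probability each normal prior places on the predicted sign of its coefficient. This is base R only, and the variable names are illustrative.

```r
# Prior mass on the predicted side of zero for the two "soft boundary" priors
p_lips_negative  <- pnorm(0, mean = -5, sd = 5)      # P(b_lips < 0) under normal(-5, 5)
p_teeth_positive <- 1 - pnorm(0, mean = 5, sd = 5)   # P(b_teeth > 0) under normal(5, 5)

# Same calculation for a prior centered on zero with the same spread
p_symmetric <- pnorm(0, mean = 0, sd = 5)

print(c(lips = p_lips_negative, teeth = p_teeth_positive, symmetric = p_symmetric))
```

Each ordered prior puts about 84% of its mass on the predicted direction, compared with 50% for a prior centered on zero, so the ordering is encouraged but not enforced.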

Ordered means
You should build the model that best represents the theoretical claim.
We set different priors, so this might not be a fair comparison.

    priors = c(prior(normal(0, 5), class = "b", coef="ConditionPenInLips"),
               prior(normal(0, 5), class = "b", coef="ConditionPenInTeeth"))
    model7 = brm(HappinessRating ~ 1 + Condition + (1 + Condition | Participant),
                 data = FFdata, prior=priors)

    > model_weights(model6, model7, weights="loo")
       model6    model7
    0.5610197 0.4389803

Not much difference.

Model comparison
To do much more, we need better priors.
Which means we need to better understand the precise predictions of the Facial Feedback hypothesis.
Perhaps the theory does make more precise predictions.
But oftentimes, psychological theories make only vague predictions.

BayesFactor Package
It is less flexible, but it provides a nice interface for common comparisons.

    # load data file
    FFdata <- read.csv(file="FacialFeedback.csv", header=TRUE, stringsAsFactors=TRUE)
    # By default, participant numbers are treated as _numbers_. Need to correct that.
    FFdata$ParticipantFactor = factor(FFdata$Participant)
    # load the BayesFactor library
    library(BayesFactor)
    bf = anovaBF(HappinessRating ~ Condition + ParticipantFactor, data = FFdata,
                 whichRandom="ParticipantFactor")

    > bf
    Bayes factor analysis
    --------------
    [1] Condition + ParticipantFactor : 0.1314427 ±0.95%

    Against denominator:
      HappinessRating ~ ParticipantFactor
    ---
    Bayes factor type: BFlinearModel, JZS

Favors the null.

BRMS vs BayesFactor
We can set up similar comparisons in BRMS.

    # Different null model: no effect of condition, but different values across participants
    model4 = brm(HappinessRating ~ ParticipantFactor, data = FFdata, save_all_pars = TRUE)
    model5b = brm(HappinessRating ~ Condition + (1 | ParticipantFactor), data = FFdata,
                  save_all_pars = TRUE)

    > model_weights(model4, model5b, weights="waic")
       model4   model5b
    0.2545269 0.7454731

Favors an effect!
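Putting the two results on a common odds-like footing makes the disagreement concrete. The numbers below are copied from the BayesFactor output and from the WAIC weights above; treating a ratio of WAIC weights as an evidence ratio is an informal convention and not a Bayes factor, and the variable names are illustrative.

```r
# BayesFactor result: effect model vs. participant-only model
bf_effect_vs_null <- 0.1314427
bf_null_vs_effect <- 1 / bf_effect_vs_null   # same comparison, inverted

# brms WAIC weights for the analogous pair of models
w <- c(model4 = 0.2545269, model5b = 0.7454731)
weight_ratio_effect <- w[["model5b"]] / w[["model4"]]

print(round(c(BF_null = bf_null_vs_effect, WAIC_ratio_effect = weight_ratio_effect), 2))
```

The Bayes factor favors the null by roughly 7.6 to 1, while the WAIC weights favor the effect model by roughly 2.9 to 1.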

Why the differences?
Same model structure: Condition + random effect of participant vs. participants only.
Differences:
  Priors
  BF vs. WAIC/loo
Let’s look at details.

Bayes Factor
Extract parameter estimates:

    > chains = posterior(bf, iterations = 1000)
      0 %
    |----|----|----|----|----|----|----|----|----|----|
    **************************************************|
    > summary(chains)

    Iterations = 1:1000
    Thinning interval = 1
    Number of chains = 1
    Sample size per chain = 1000

    1. Empirical mean and standard deviation for each variable,
       plus standard error of the mean:

                              Mean      SD Naive SE Time-series SE
    mu                     45.6104  2.4044 0.076032       0.076032
    Condition-NoPen        -1.8950  1.0304 0.032585       0.032585
    Condition-PenInLips     0.1665  1.0324 0.032649       0.032649
    Condition-PenInTeeth    1.7285  1.0157 0.032118       0.032118
    ParticipantFactor-1     2.0154  3.8732 0.122481       0.116345
    ParticipantFactor-2    -3.8689  3.8867 0.122908       0.117546
    ParticipantFactor-3    -6.5852  3.7818 0.119590       0.119590
    ParticipantFactor-4     7.1901  3.8973 0.123244       0.109853
    ParticipantFactor-5     5.8068  3.9683 0.125487       0.109294
    ParticipantFactor-6    -5.2414  3.8057 0.120346       0.120346
    ParticipantFactor-7    14.4266  3.9816 0.125908       0.125908
    ParticipantFactor-8     7.3839  3.8769 0.122597       0.122597
    ParticipantFactor-9     4.7411  3.8236 0.120912       0.120912
    ParticipantFactor-10   -6.0632  3.9646 0.125371       0.125371
    ParticipantFactor-11   12.4585  3.9078 0.123576       0.136742
    ParticipantFactor-12    3.0253  3.9878 0.126106       0.126106
    ParticipantFactor-13   -9.3336  3.8855 0.122871       0.122871
    ParticipantFactor-14    6.6023  3.9474 0.124829       0.124829
    ParticipantFactor-15    2.4411  3.9526 0.124991       0.117939
    ParticipantFactor-16   -2.4974  3.9522 0.124978       0.124978
    ParticipantFactor-17   -0.9160  4.0851 0.129181       0.114247
    ParticipantFactor-18  -27.8137  3.9291 0.124249       0.123889
    ParticipantFactor-19   -1.5412  3.6467 0.115318       0.115318
    ParticipantFactor-20    7.2656  4.0284 0.127389       0.127389
    ParticipantFactor-21   -7.7216  3.7769 0.119436       0.119436
    sig2                  380.7215 21.0074 0.664311       0.703139
    g_Condition             0.2786  1.0220 0.032320       0.032320
    g_ParticipantFactor     0.3078  0.1106 0.003497       0.004191

NoPen=43.719 Lips=45.7805 Teeth=47.3425

BRMS
Extract parameter estimates:

    > model5b
     Family: gaussian
      Links: mu = identity; sigma = identity
    Formula: HappinessRating ~ Condition + (1 | ParticipantFactor)
       Data: FFdata (Number of observations: 693)
    Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
             total post-warmup samples = 4000

    Group-Level Effects:
    ~ParticipantFactor (Number of levels: 21)
                        Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sd(Intercept)          10.18      1.97     7.11    14.84       1018 1.00

    Population-Level Effects:
                        Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    Intercept              43.65      2.55    38.64    48.70       1064 1.00
    ConditionPenInLips      2.20      1.83    -1.49     5.75       4000 1.00
    ConditionPenInTeeth     3.77      1.81     0.13     7.13       4000 1.00

    Family Specific Parameters:
                        Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
    sigma                  19.59      0.53    18.59    20.66       4000 1.00

NoPen=43.65 Lips=45.85 Teeth=47.42
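Because model5b is parameterized as an intercept plus condition offsets, the condition means quoted on this slide are sums of the population-level estimates. A small arithmetic sketch using the posterior means from the summary (variable names are illustrative):

```r
# Population-level estimates copied from the model5b summary
intercept <- 43.65   # NoPen is the reference level
b_lips    <- 2.20    # ConditionPenInLips
b_teeth   <- 3.77    # ConditionPenInTeeth

condition_means <- c(
  NoPen = intercept,
  Lips  = intercept + b_lips,
  Teeth = intercept + b_teeth
)
print(condition_means)
```

The same additions applied draw by draw, rather than to the posterior means, give full posterior distributions for each condition mean.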

Posteriors: BayesFactor
Summarize:

    plot(chains)

Posteriors: BRMS
Summarize:

    plot(model5b)

Posterior details
Posterior distributions for BF.
Have to look at the chains format to extract what you want.

    # Create posterior for Teeth mean (column 1 is mu, column 4 is Condition-PenInTeeth)
    TeethBFPosterior = chains[,1] + chains[,4]
    dev.new()
    plot(density(TeethBFPosterior))

Posterior details
Posterior distributions for brms (model5b).

    post <- posterior_samples(model5b)
    TeethBRMSPosterior = post$b_Intercept + post$b_ConditionPenInTeeth
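Once a derived posterior such as TeethBRMSPosterior is available as a numeric vector, it can be summarized by its mean and a central 95% interval. The sketch below uses simulated draws as a stand-in, since no fitted model is available here; `interval_summary` is a hypothetical helper name and the simulated values are not the real posterior.

```r
# Mean and central interval of a vector of posterior draws.
# 'interval_summary' is a hypothetical helper name.
interval_summary <- function(draws, prob = 0.95) {
  tail_prob <- (1 - prob) / 2
  bounds <- quantile(draws, probs = c(tail_prob, 1 - tail_prob), names = FALSE)
  c(mean = mean(draws), lower = bounds[1], upper = bounds[2])
}

# Stand-in for TeethBRMSPosterior (simulated; not the real posterior)
set.seed(42)
fake_teeth_posterior <- rnorm(4000, mean = 47.4, sd = 2.5)
teeth_summary <- interval_summary(fake_teeth_posterior)
print(round(teeth_summary, 2))
```

With the real draws, the call would simply be `interval_summary(TeethBRMSPosterior)`.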

Together

    > plot(density(TeethBRMSPosterior))
    > lines(density(TeethBFPosterior))

Almost identical!

Lips

None condition

Bayes Factor from BRMS
BRMS can compute a Bayes Factor from two models.

    BFbrms = bayes_factor(model4, model5b)

    > print(BFbrms)
    The estimated Bayes factor in favor of x1 over x2 is equal to: 6.939161e+31

Huge BF in favor of model4 (null model).

BF vs. loo
A Bayes Factor makes sense when you have informative priors that are used to specify a pretty precise model.
The default priors used by brms are the opposite of informative priors (by design).
This makes them a poor choice for computing Bayes Factors.
Small differences can get blown up in a ratio.
Which approach is right?
It depends on what you want to do with your model.
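The sensitivity of Bayes factors to vague priors can be seen in a textbook toy case that is unrelated to how brms or BayesFactor compute theirs: an estimate with a known standard error, a point null (effect = 0), and an alternative that puts a normal(0, tau) prior on the effect. The estimate and standard error below are borrowed from the ConditionPenInTeeth row of the model5b summary purely to set the scale, and `bf_null` is a hypothetical helper name.

```r
# Bayes factor for H0: effect = 0 versus H1: effect ~ normal(0, tau),
# given an estimate that is normally distributed around the true effect.
# 'bf_null' is a hypothetical helper for this toy case only.
bf_null <- function(estimate, se, tau) {
  marginal_h0 <- dnorm(estimate, mean = 0, sd = se)
  marginal_h1 <- dnorm(estimate, mean = 0, sd = sqrt(se^2 + tau^2))
  marginal_h0 / marginal_h1
}

tau_values <- c(5, 10, 100, 1000)   # increasingly vague priors on the effect
bf01 <- bf_null(estimate = 3.77, se = 1.81, tau = tau_values)
print(data.frame(tau = tau_values, BF_null = round(bf01, 2)))
```

The data are identical in every row, yet the Bayes factor in favor of the null keeps growing as the prior on the effect becomes vaguer, moving from below 1 to well above 1.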

What do you want/have?
Are you testing well-specified models with informative priors?
Do you believe the true model is among the ones you are testing?
  Maybe go for the Bayes Factor.
Are you hoping to predict future data and avoid overfitting?
Do you not believe that you can identify the true model?
  Maybe go for WAIC or loo.
Personally, I think WAIC or loo is more appropriate for most situations in the social sciences.

Posteriors
Both brms and the BayesFactor library provide fairly reasonable default priors.
Both provide posterior distributions.
Both produce pretty similar estimates of population parameters.
They differ in details, and those details can sometimes matter:
  Ease of use
  Interpretation

Conclusions
Figure out what you want to do for your analysis.
That includes knowing your audience.
Some people will understand BayesFactors (or think they do), and that can help you communicate your findings.