Bayesian style dependent tests


1 Bayesian style dependent tests
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2018, Purdue University
11/29/2018

2 Weapon priming
Anderson, Benjamin, & Bartholow (1998): 32 subjects read an aggression-related word (injure, shatter) out loud as fast as possible
After presentation of a weapon word (shotgun, grenade)
After presentation of a neutral word (rabbit, fish)
There were additional trials with non-aggression-related words being read (we ignore them)
Our data are the mean response times for word reading, based on whether the first (prime) word is a weapon word or a neutral word
Follow along by downloading “WeaponPrime.csv” and “WeaponPrime1.R” from the class web site

3 Ignore dependencies?
WPdata <- read.csv(file="WeaponPrime.csv", header=TRUE, stringsAsFactors=TRUE)
traditional1 <- t.test(Time ~ Prime, data=WPdata, var.equal=TRUE)
print(traditional1)

Two Sample t-test
data: Time by Prime
t = , df = 62, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates: mean in group Neutral mean in group Weapon

4 Standard analysis
# Run traditional dependent t-test
traditional2 <- t.test(Time ~ Prime, data=WPdata, paired=TRUE)
print(traditional2)

Note: t-value and p-value change, but sample means do not. Difference is:
Note: confidence interval of difference changes a lot

Paired t-test
data: Time by Prime
t = , df = 31, p-value =
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
sample estimates: mean of the differences
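Why do the t- and p-values change while the means stay put? A paired t-test is exactly a one-sample t-test on the per-subject differences, so only the variability of the differences matters. A minimal sketch with simulated data (not the WeaponPrime data) that shows the equivalence:

```r
# Simulated paired RTs: each "weapon" score is built from the same
# subject's "neutral" score, creating the within-subject dependency.
set.seed(1)
neutral <- rnorm(32, mean = 540, sd = 50)           # neutral-prime RTs
weapon  <- neutral - rnorm(32, mean = 10, sd = 20)  # correlated weapon-prime RTs

paired  <- t.test(weapon, neutral, paired = TRUE)   # paired two-sample test
onesamp <- t.test(weapon - neutral, mu = 0)         # one-sample test on differences

# The two tests produce identical t statistics and p-values
c(paired = unname(paired$statistic), onesamp = unname(onesamp$statistic))
```

The sample means themselves never enter differently; only the standard error changes, which is why the paired confidence interval can shrink so dramatically when subjects vary a lot overall.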

5 Regression analysis
# Compare to traditional independent linear regression
print(summary(lm(Time ~ Prime, data=WPdata)))

Prints out a table of estimates of Intercept (Neutral prime) and deviation for Weapon prime (difference of means)

Call: lm(formula = Time ~ Prime, data = WPdata)
Residuals: Min 1Q Median 3Q Max
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
PrimeWeapon
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: on 62 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 62 DF, p-value:

6 Regression analysis
# Compare to traditional dependent linear regression
library(lme4)
print(lmer(Time ~ Prime + (1 | Subject), data = WPdata))

Prints out a table of estimates of Intercept (Neutral prime) and deviation for Weapon prime (difference of means)

Linear mixed model fit by REML ['lmerMod']
Formula: Time ~ Prime + (1 | Subject)
Data: WPdata
REML criterion at convergence:
Random effects:
Groups Name Std.Dev.
Subject (Intercept) 50.46
Residual
Number of obs: 64, groups: Subject, 32
Fixed Effects: (Intercept) PrimeWeapon

7 Bayesian variation of t-test
Treat data as independent
library(brms)
model1 = brm(Time ~ Prime, data = WPdata, iter = 2000, warmup = 200, chains = 3, thin = 2)

Family: gaussian
Links: mu = identity; sigma = identity
Formula: Time ~ Prime
Data: WPdata (Number of observations: 64)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Population-Level Effects: Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept
PrimeWeapon
Family Specific Parameters: sigma

For comparison, the lm() fit from before:
Call: lm(formula = Time ~ Prime, data = WPdata)
Residuals: Min 1Q Median 3Q Max
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) <2e-16 ***
PrimeWeapon
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: on 62 degrees of freedom
Multiple R-squared: , Adjusted R-squared:
F-statistic: on 1 and 62 DF, p-value:

8 Bayesian variation of t-test
Treat data as dependent (different intercept for each subject)
model2 = brm(Time ~ Prime + (1 | Subject), data = WPdata, iter = 2000, warmup = 200, chains = 3, thin = 2)

Family: gaussian
Links: mu = identity; sigma = identity
Formula: Time ~ Prime + (1 | Subject)
Data: WPdata (Number of observations: 64)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Group-Level Effects: ~Subject (Number of levels: 32)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)
Population-Level Effects: Intercept PrimeWeapon
Family Specific Parameters: sigma

For comparison, the lmer() fit from before:
Linear mixed model fit by REML ['lmerMod']
Formula: Time ~ Prime + (1 | Subject)
Data: WPdata
REML criterion at convergence:
Random effects:
Groups Name Std.Dev.
Subject (Intercept) 50.46
Residual
Number of obs: 64, groups: Subject, 32
Fixed Effects: (Intercept) PrimeWeapon

9 Model comparison
Does the independent or dependent model better fit the data?
> loo(model1, model2)
LOOIC SE
model1
model2
model1 - model2
Warning messages:
1: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
3: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
4: Found 16 observations with a pareto_k > 0.7 in model 'model2'. With this many problematic observations, it may be more appropriate to use 'kfold' with argument 'K = 10' to perform 10-fold cross-validation rather than LOO.

10 Model comparison
Does the independent or dependent model better fit the data? Should be no surprise!
kfold(model1, model2)
……
KFOLDIC SE
model1
model2
model1 - model2
There were 50 or more warnings (use warnings() to see the first 50)

11 Null model
No effect of prime, different intercepts for different subjects
model3 = brm(Time ~ 1 + (1 | Subject), data = WPdata, iter = 2000, warmup = 200, chains = 3, thin = 2)
> summary(model3)

Family: gaussian
Links: mu = identity; sigma = identity
Formula: Time ~ 1 + (1 | Subject)
Data: WPdata (Number of observations: 64)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Group-Level Effects: ~Subject (Number of levels: 32)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)
Population-Level Effects: Intercept
Family Specific Parameters: sigma

12 Model comparison
Does the null (model3) or two-mean (model2) model better fit the data?
Try:
loo(model2, model3)
kfold(model2, model3)
……
KFOLDIC SE
model2
model3
model2 - model3

13 Dependent ANOVA
ADHD Treatment effects
Children (n=24) given different dosages of the drug MPH (methylphenidate) over different weeks
Measured ability to delay impulsive responses (wait long enough to press a key to get a “star”): the Delay of Gratification task

14 Dependent ANOVA
Standard dependent ANOVA

15 Regression analysis
# Compare to traditional dependent linear regression
library(lme4)
print(summary(lmer(CorrectResponses ~ Dosage + (1 | SubjectID), data = ATdata)))

Prints out a table of estimates of Intercept (D0 dosage) and deviation for other Dosages
There is some debate about degrees of freedom and computing p-values

Linear mixed model fit by REML ['lmerMod']
Formula: CorrectResponses ~ Dosage + (1 | SubjectID)
Data: ATdata
REML criterion at convergence: 658.3
Scaled residuals: Min 1Q Median 3Q Max
Random effects:
Groups Name Variance Std.Dev.
SubjectID (Intercept)
Residual
Number of obs: 96, groups: SubjectID, 24
Fixed effects: Estimate Std. Error t value
(Intercept)
DosageD15
DosageD30
DosageD60
Correlation of Fixed Effects: (Intr) DsgD15 DsgD30
DosageD15
DosageD30
DosageD60

16 Bayesian regression
model1 = brm(CorrectResponses ~ Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2)
# print out summary of model
print(summary(model1))

Family: gaussian
Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ Dosage + (1 | SubjectID)
Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Group-Level Effects: ~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)
Population-Level Effects: Intercept DosageD15 DosageD30 DosageD60
Family Specific Parameters: sigma
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

For comparison, the lmer() fit from before:
Linear mixed model fit by REML ['lmerMod']
Formula: CorrectResponses ~ Dosage + (1 | SubjectID)
Data: ATdata
REML criterion at convergence: 658.3
Scaled residuals: Min 1Q Median 3Q Max
Random effects:
Groups Name Variance Std.Dev.
SubjectID (Intercept)
Residual
Number of obs: 96, groups: SubjectID, 24
Fixed effects: Estimate Std. Error t value
(Intercept)
DosageD15
DosageD30
DosageD60
Correlation of Fixed Effects: (Intr) DsgD15 DsgD30
DosageD15
DosageD30
DosageD60

17 Bayesian regression
Check chains
plot(model1)

18 Bayesian regression
Plot marginals
dev.new()
plot(marginal_effects(model1), points = TRUE)

19 Bayesian regression
Consider a null model (no differences between means)
model2 = brm(CorrectResponses ~ 1 + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2)

Family: gaussian
Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ 1 + (1 | SubjectID)
Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Group-Level Effects: ~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)
Population-Level Effects: Intercept
Family Specific Parameters: sigma
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

20 Bayesian regression
Compare null (model2) and full (model1) models
> loo(model1, model2, reloo=TRUE)
No problematic observations found. Returning the original 'loo' object.
2 problematic observation(s) found. The model will be refit 2 times.
Fitting model 1 out of 2 (leaving out observation 2)
Start sampling
Gradient evaluation took 2.7e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.27 seconds. Adjust your expectations accordingly!
Elapsed Time: seconds (Warm-up) seconds (Sampling) seconds (Total)
……..
LOOIC SE
model1
model2
model1 - model2

> loo(model1, model2)
LOOIC SE
model1
model2
model1 - model2
Warning message:
Found 2 observations with a pareto_k > 0.7 in model 'model2'. It is recommended to set 'reloo = TRUE' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 2 times to compute the ELPDs for the problematic observations directly.

21 Standard ANOVA
Contrasts are t-tests (or F-tests, if you like)

22 Bayesian Regression
Contrasts: Easiest to look at posteriors
Easy for comparisons to D0 (Intercept)
> post <- posterior_samples(model1)
>
> # What is the probability that mean for D15 is larger than mean for D0?
> cat("Probability CorrectResponses mean D15 is larger than mean D0 = ", length(post$b_DosageD15[post$b_DosageD15 > 0])/length(post$b_DosageD15) )
Probability CorrectResponses mean D15 is larger than mean D0 =
> cat("Probability CorrectResponses mean D30 is larger than mean D0 = ", length(post$b_DosageD30[post$b_DosageD30 > 0])/length(post$b_DosageD30) )
Probability CorrectResponses mean D30 is larger than mean D0 =
> cat("Probability CorrectResponses mean D60 is larger than mean D0 = ", length(post$b_DosageD60[post$b_DosageD60 > 0])/length(post$b_DosageD60) )
Probability CorrectResponses mean D60 is larger than mean D0 =
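The length(x[x > 0])/length(x) pattern above is just the proportion of posterior draws above zero, and mean() on a logical vector computes the same thing more compactly. A minimal sketch with a simulated stand-in for one column of draws (the real draws come from posterior_samples(model1)):

```r
# Simulated stand-in for post$b_DosageD15: 2700 posterior draws
set.seed(2)
draws <- rnorm(2700, mean = 4, sd = 1.5)

p_long  <- length(draws[draws > 0]) / length(draws)  # slide's idiom
p_short <- mean(draws > 0)                           # equivalent, shorter
```

Both give the fraction of draws exceeding zero, which is the posterior probability that the D15 mean exceeds the D0 mean under the model.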

23 Bayesian Regression
Contrasts: Easiest to look at posteriors
A bit complicated for other comparisons
# What is the probability that mean for D60 is larger than mean for D15?
meanD15 <- post$b_Intercept + post$b_DosageD15
meanD60 <- post$b_Intercept + post$b_DosageD60
cat("Probability CorrectResponses mean D60 is larger than mean D15 = ", sum(meanD60 > meanD15)/length(meanD60) )
Probability CorrectResponses mean D60 is larger than mean D15 =

meanD30 <- post$b_Intercept + post$b_DosageD30
> cat("Probability CorrectResponses mean D60 is larger than mean D30 = ", sum(meanD60 > meanD30)/length(meanD60) )
Probability CorrectResponses mean D60 is larger than mean D30 =
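Adding the intercept to both sides is harmless but unnecessary: since each draw's intercept appears on both sides of the comparison, it cancels, so comparing the slope draws directly gives the same probability. A sketch with simulated stand-ins for the posterior draws:

```r
# Simulated stand-ins for posterior draws (real ones come from
# posterior_samples(model1))
set.seed(3)
intercept <- rnorm(2700, mean = 39, sd = 2)
b15 <- rnorm(2700, mean = 4, sd = 1.5)
b60 <- rnorm(2700, mean = 6, sd = 1.5)

p_via_means  <- mean((intercept + b60) > (intercept + b15))  # slide's approach
p_via_slopes <- mean(b60 > b15)                              # intercept cancels
```

For the real model, that means mean(post$b_DosageD60 > post$b_DosageD15) answers the same question in one line; building the cell means explicitly only becomes necessary for contrasts that are nonlinear in the coefficients.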

24 Setting priors
It is not always obvious what can be set
brms has a nice function get_prior()
> get_prior(CorrectResponses ~ Dosage + (1 | SubjectID), data = ATdata)
prior class coef group resp dpar nlpar bound
1 b
2 b DosageD15
3 b DosageD30
4 b DosageD60
5 student_t(3, 39, 10) Intercept
6 student_t(3, 0, 10) sd
7 sd SubjectID
8 sd Intercept SubjectID
9 student_t(3, 0, 10) sigma
>

25 Setting priors
Let’s set an informative (bad) prior on the slopes
model3 = brm(CorrectResponses ~ Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(10, 1), class = "b")) )

Family: gaussian
Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ Dosage + (1 | SubjectID)
Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Group-Level Effects: ~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)
Population-Level Effects: Intercept DosageD15 DosageD30 DosageD60
Family Specific Parameters: sigma

26 Setting priors
Let’s set a crazy prior on just one slope
model4 = brm(CorrectResponses ~ Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2, prior = c( prior(normal(20, 1), class = "b", coef="DosageD60")) )

Family: gaussian
Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ Dosage + (1 | SubjectID)
Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Group-Level Effects: ~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)
Population-Level Effects: Intercept DosageD15 DosageD30 DosageD60
Family Specific Parameters: sigma

27 Bayesian regression
Compare all our models
Different priors are different models!
> loo(model1, model2, model3, model4, reloo=TRUE)
No problematic observations found. Returning the original 'loo' object.
2 problematic observation(s) found. The model will be refit 2 times.
Fitting model 1 out of 2 (leaving out observation 2)
Start sampling
Gradient evaluation took 9.5e-05 seconds
1000 transitions using 10 leapfrog steps per transition would take 0.95 seconds. Adjust your expectations accordingly!
Elapsed Time: seconds (Warm-up) seconds (Sampling) seconds (Total)
……..
LOOIC SE
model1
model2
model3
model4
model1 - model2
model1 - model3
model1 - model4
model2 - model3
model2 - model4
model3 - model4

> loo(model1, model2, model3, model4)
LOOIC SE
model1
model2
model3
model4
model1 - model2
model1 - model3
model1 - model4
model2 - model3
model2 - model4
model3 - model4
Warning messages:
1: Found 2 observations with a pareto_k > 0.7 in model 'model2'. It is recommended to set 'reloo = TRUE' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 2 times to compute the ELPDs for the problematic observations directly.
2: Found 1 observations with a pareto_k > 0.7 in model 'model3'. It is recommended to set 'reloo = TRUE' in order to calculate the ELPD without the assumption that these observations are negligible. This will refit the model 1 times to compute the ELPDs for the problematic observations directly.

28 Regression
We may be doing something fundamentally wrong here
We are treating dosage as a categorical variable, but it is really a quantitative variable
Maybe we should really be doing a proper regression instead of coercing it into an ANOVA design
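A hypothetical sketch of the recoding step: turning the categorical Dosage factor into a numeric variable. The level labels ("D0", "D15", "D30", "D60") and column names here are my assumptions about how ATdata is coded, inferred from the coefficient names in the output above.

```r
# Strip the leading "D" from each factor level to recover the dosage in mg
Dosage <- factor(c("D0", "D15", "D30", "D60"))  # stand-in for ATdata$Dosage
DosageNumber <- as.numeric(sub("^D", "", as.character(Dosage)))
DosageNumber  # 0 15 30 60
```

Applied to the real data frame, something like ATdata$DosageNumber <- as.numeric(sub("^D", "", as.character(ATdata$Dosage))) would produce the DosageNumber variable used on the next slides.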

29 Frequentist regression
library(lme4)
print(lmer(CorrectResponses ~ DosageNumber + (1 | SubjectID), data = ATdata))

Linear mixed model fit by REML ['lmerMod']
Formula: CorrectResponses ~ DosageNumber + (1 | SubjectID)
Data: ATdata
REML criterion at convergence:
Random effects:
Groups Name Std.Dev.
SubjectID (Intercept) 9.451
Residual
Number of obs: 96, groups: SubjectID, 24
Fixed Effects: (Intercept) DosageNumber

30 Bayesian regression
library(brms)
model5 = brm(CorrectResponses ~ DosageNumber + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2)

Family: gaussian
Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ DosageNumber + (1 | SubjectID)
Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Group-Level Effects: ~SubjectID (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)
Population-Level Effects: Intercept DosageNumber
Family Specific Parameters: sigma

For comparison, the lmer() fit:
Linear mixed model fit by REML ['lmerMod']
Formula: CorrectResponses ~ DosageNumber + (1 | SubjectID)
Data: ATdata
REML criterion at convergence:
Random effects:
Groups Name Std.Dev.
SubjectID (Intercept) 9.451
Residual
Number of obs: 96, groups: SubjectID, 24
Fixed Effects: (Intercept) DosageNumber

31 Bayesian regression
library(brms)
# Re-run the ANOVA-style model so it can be compared to model5
model1 = brm(CorrectResponses ~ Dosage + (1 | SubjectID), data = ATdata, iter = 2000, warmup = 200, chains = 3, thin = 2)

32 Bayesian ANOVA vs. regression
loo(model1, model5, reloo=TRUE)
I re-ran model1, so the LOO value is a bit different from what we had previously
I am not sure this kind of comparison is appropriate because we are technically using different independent variables (it feels the same, though)
Hardly any difference between the two models
Which makes sense given the data
LOOIC SE
model1
model5
model1 - model5

33 Activity
Find the possible priors for the linear regression model
What would be reasonable priors?
Implement them and run the model
Discuss
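One possible sketch of an answer, to get started. The prior scales below are my assumptions, not values from the class; the Intercept and sd/sigma choices echo the get_prior() defaults shown on slide 24. This is not run here, since it needs brms and the ATdata data frame.

```r
library(brms)

# Weakly-informative priors for the dosage-as-number model:
# the slope prior says a 1 mg change shifts performance by roughly 0 +/- 5
my_priors <- c(prior(normal(0, 5), class = "b"),
               prior(student_t(3, 39, 10), class = "Intercept"),
               prior(student_t(3, 0, 10), class = "sd"),
               prior(student_t(3, 0, 10), class = "sigma"))

# model6 <- brm(CorrectResponses ~ DosageNumber + (1 | SubjectID),
#               data = ATdata, prior = my_priors,
#               iter = 2000, warmup = 200, chains = 3, thin = 2)
```

After fitting, summary(model6) and a loo() comparison against the default-prior model would show how much the prior choices matter for this data set.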

34 Conclusions
Dependent tests: pretty straightforward once you get the notation straight
Which is really just the notation of regression
Something similar to contrasts is done by looking at the posteriors
No messy hypothesis testing

