Published by Charla Long
1
PSY 626: Bayesian Statistics for Psychological Science
Bayesian Shrinkage
Greg Francis
Fall 2018, Purdue University
12/25/2018
2
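Shrinkage
In Lecture 5, we noted that shrinking sample means toward a fixed value improved model fits for new data.
There is an error in the equations there: the numerator should be the variance, not the standard deviation.
Shrink to the average with the Morris-Efron estimator. For k groups, with sample mean $\bar{X}_i$, variance $s_i^2$, and size $n_i$ for group i, and grand mean $\bar{X} = \frac{1}{k}\sum_{j=1}^{k}\bar{X}_j$:
$$\hat{\mu}_i = \bar{X} + \left(1 - \frac{(k-3)\, s_i^2/n_i}{\sum_{j=1}^{k}\left(\bar{X}_j - \bar{X}\right)^2}\right)\left(\bar{X}_i - \bar{X}\right)$$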
3
Morris-Efron shrinkage
Using the Smiles and Leniency data set (independent means ANOVA):

# load data file
SLdata <- read.csv(file="SmilesLeniency.csv", header=TRUE, stringsAsFactors=TRUE)
# Morris-Efron shrinkage
# Get each sample mean, sample size, and variance
Means  <- aggregate(Leniency ~ SmileType, FUN=mean, data=SLdata)
counts <- aggregate(Leniency ~ SmileType, FUN=length, data=SLdata)
Vars   <- aggregate(Leniency ~ SmileType, FUN=var, data=SLdata)
k <- length(Means$Leniency)   # number of groups (4)
GrandMean <- mean(Means$Leniency)
newMeans <- (1 - (k - 3) * Vars$Leniency / counts$Leniency /
               sum((Means$Leniency - GrandMean)^2)) *
            (Means$Leniency - GrandMean) + GrandMean
par(bg="lightblue")
range <- c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
plot(Means$SmileType, Means$Leniency, main="Morris-Efron", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
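The slide code applies the formula inline. A reusable version can be sketched as a small base R function. This is a hypothetical helper (morris_efron is not from the lecture or from any package), and it adds a positive-part clamp, a common refinement that keeps the shrink factor from dropping below zero when the sample means are very close together; the lecture code does not clamp.

```r
# Hypothetical helper (not from the lecture or a package): Morris-Efron
# shrinkage of k group means toward their grand mean.
# means       : vector of k sample means
# var_of_mean : squared standard error s_i^2 / n_i (scalar or length-k vector)
# positive_part = TRUE clamps the shrink factor at 0 (a common refinement)
morris_efron <- function(means, var_of_mean, positive_part = TRUE) {
  k <- length(means)
  stopifnot(k >= 4)                      # the (k - 3) factor needs k >= 4
  grand <- mean(means)
  spread <- sum((means - grand)^2)       # between-group sum of squares
  shrink <- 1 - (k - 3) * var_of_mean / spread
  if (positive_part) shrink <- pmax(shrink, 0)
  grand + shrink * (means - grand)
}

# Toy numbers, not the Smiles and Leniency data
toy_means <- c(4, 5, 6, 7)
shrunk <- morris_efron(toy_means, var_of_mean = 0.25)
print(shrunk)
```

With these toy values the shrink factor is 1 - 0.25/5 = 0.95, so each mean moves 5% of the way toward the grand mean of 5.5.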
4
Bayesian ANOVA
Using the Smiles and Leniency data set (independent means ANOVA), we used linear regression to produce the Bayesian equivalent of an ANOVA. It is a little hard to interpret: the Intercept is the mean of the reference level (FALSE), and each SmileType coefficient is a difference from that mean.
(brms summary for model1, Formula: Leniency ~ SmileType; 136 observations; 3 chains, iter = 2000, warmup = 200, thin = 1, 5400 post-warmup samples. Population-level effects: Intercept, SmileTypeFelt, SmileTypeMiserable, SmileTypeNeutral; family-specific parameter: sigma. Numeric estimates not recoverable.)

library(brms)
# Model as intercept and slopes
model1 <- brm(Leniency ~ SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3)
print(summary(model1))
5
Bayesian ANOVA
Using the Smiles and Leniency data set (independent means ANOVA), each group mean is recovered by adding its slope to the intercept:

# compute means of posteriors
# (posterior_samples() is deprecated in recent brms releases; as_draws_df() replaces it)
post <- posterior_samples(model1)
newMeans <- c(mean(post$b_Intercept),
              mean(post$b_Intercept + post$b_SmileTypeFelt),
              mean(post$b_Intercept + post$b_SmileTypeMiserable),
              mean(post$b_Intercept + post$b_SmileTypeNeutral))
range <- c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
dev.new()
plot(Means$SmileType, Means$Leniency, main="Model 1", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
6
Equivalent model
Using the Smiles and Leniency data set (independent means ANOVA), remove the intercept so that each coefficient is a group mean (easier to interpret).
(brms summary for model2, Formula: Leniency ~ 0 + SmileType; same data and sampler settings as model1. Population-level effects: SmileTypeFALSE, SmileTypeFelt, SmileTypeMiserable, SmileTypeNeutral; family-specific parameter: sigma. Numeric estimates not recoverable.)

# Model without intercept (more natural)
model2 <- brm(Leniency ~ 0 + SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3)
print(summary(model2))
7
Bayesian ANOVA
Using the Smiles and Leniency data set (independent means ANOVA), with the intercept removed (easier to interpret):

# compute means of posteriors
post <- posterior_samples(model2)
newMeans <- c(mean(post$b_SmileTypeFALSE), mean(post$b_SmileTypeFelt),
              mean(post$b_SmileTypeMiserable), mean(post$b_SmileTypeNeutral))
range <- c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
plot(Means$SmileType, Means$Leniency, main="Model 2", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
8
Bayesian Shrinkage
We get something very much like shrinkage by using a prior. Let's set up a prior that pulls values toward the grand mean: each group mean gets a Normal(GrandMean, 1) prior, and stanvar() passes the R value of GrandMean into the generated Stan code so the prior can use it.
(brms summary for model3, Formula: Leniency ~ 0 + SmileType; same data and sampler settings as model2. Population-level effects: SmileTypeFALSE, SmileTypeFelt, SmileTypeMiserable, SmileTypeNeutral; family-specific parameter: sigma. Numeric estimates not recoverable.)

# Model without intercept (more natural) and shrink to grand mean
stanvars <- stanvar(GrandMean, name='GrandMean')
prs <- c(prior(normal(GrandMean, 1), class = "b"))
model3 <- brm(Leniency ~ 0 + SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3,
              prior = prs, stanvars = stanvars)
print(summary(model3))
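Why a prior acts like shrinkage can be seen in the simplest conjugate case. The sketch below assumes the data SD sigma is known and the prior on a group mean is normal; brms estimates sigma jointly and samples the posterior, so this only approximates what model3 does. The posterior mean is a precision-weighted average of the sample mean and the prior mean. The function name and all numbers are hypothetical (n = 34 echoes 136 observations split over four equal groups; the rest are made up).

```r
# Conjugate normal-normal sketch (assumption: sigma known, normal prior).
posterior_mean <- function(xbar, n, sigma, prior_mean, prior_sd) {
  data_precision  <- n / sigma^2      # 1 / (squared standard error of xbar)
  prior_precision <- 1 / prior_sd^2
  (data_precision * xbar + prior_precision * prior_mean) /
    (data_precision + prior_precision)
}

# Hypothetical values: sample mean 8, n = 34, sigma = 2, prior Normal(5, 1)
pm <- posterior_mean(xbar = 8, n = 34, sigma = 2, prior_mean = 5, prior_sd = 1)
print(pm)
```

Here the data carry precision 8.5 and the prior carries precision 1, so the estimate moves from 8 to about 7.68, toward the prior mean of 5.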
9
Bayesian ANOVA
Using the Smiles and Leniency data set (independent means ANOVA), the intercept-free model with a prior centered on the grand mean (model3):

# compute means of posteriors
post <- posterior_samples(model3)
newMeans <- c(mean(post$b_SmileTypeFALSE), mean(post$b_SmileTypeFelt),
              mean(post$b_SmileTypeMiserable), mean(post$b_SmileTypeNeutral))
range <- c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
dev.new()
plot(Means$SmileType, Means$Leniency, main="Model 3", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
10
Model comparison
Shrinkage should lead to better prediction.
A small effect, but helpful. Shrinkage never really hurts prediction; is the same true of the effect of priors?
> model_weights(model2, model3, weights="loo")
(output: one weight each for model2 and model3; values not recoverable)
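To read the model_weights() output, it helps to know roughly how information-criterion weights are formed. As far as we understand, weights = "loo" in brms gives Akaike-style weights based on the LOO criterion, so each weight is proportional to exp(elpd_loo); check the brms documentation for your version before relying on this. The function below is a hypothetical sketch, and the elpd values are made up, not results from model2 or model3.

```r
# Hypothetical sketch: Akaike-style weights from LOO expected log predictive
# density. Since looic = -2 * elpd_loo, exp(-looic / 2) = exp(elpd_loo).
loo_weights <- function(elpd) {
  rel <- exp(elpd - max(elpd))   # subtracting the max avoids underflow
  rel / sum(rel)
}

# Toy elpd_loo values, not results from the models above
w <- loo_weights(c(model2 = -300, model3 = -299))
print(round(w, 3))
```

A one-unit elpd advantage already gives the better model about 73% of the weight, which is why a "small effect" in prediction can still produce lopsided weights.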
11
Bayesian Shrinkage
Larger standard deviation (prior SD = 5)

stanvars <- stanvar(GrandMean, name='GrandMean')
prs <- c(prior(normal(GrandMean, 5), class = "b"))
model4 <- brm(Leniency ~ 0 + SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3,
              prior = prs, stanvars = stanvars)
print(summary(model4))
# compute means of posteriors
post <- posterior_samples(model4)
newMeans <- c(mean(post$b_SmileTypeFALSE), mean(post$b_SmileTypeFelt),
              mean(post$b_SmileTypeMiserable), mean(post$b_SmileTypeNeutral))
range <- c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
dev.new()
plot(Means$SmileType, Means$Leniency, main="Model 4", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
12
Bayesian Shrinkage
Smaller standard deviation (prior SD = 0.1)

stanvars <- stanvar(GrandMean, name='GrandMean')
prs <- c(prior(normal(GrandMean, 0.1), class = "b"))
model5 <- brm(Leniency ~ 0 + SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3,
              prior = prs, stanvars = stanvars)
print(summary(model5))
# compute means of posteriors
post <- posterior_samples(model5)
newMeans <- c(mean(post$b_SmileTypeFALSE), mean(post$b_SmileTypeFelt),
              mean(post$b_SmileTypeMiserable), mean(post$b_SmileTypeNeutral))
range <- c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
dev.new()
plot(Means$SmileType, Means$Leniency, main="Model 5", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
13
Model comparison
Unlike Morris-Efron shrinkage, the prior can hurt if it is too constraining. This makes intuitive sense.
A standard deviation of 1 (model3) helps, but a standard deviation of 0.1 (model5) hurts.
A standard deviation of 5 (model4) helps some.
> model_weights(model2, model3, model4, model5, weights="loo")
(output: one weight each for model2, model3, model4, and model5; values not recoverable)
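The same normal-normal approximation shows why a very tight prior can hurt. The posterior mean puts weight prior_sd^2 / (prior_sd^2 + se^2) on the sample mean and the rest on the prior mean, so with a prior SD of 0.1 the estimate is mostly the grand mean, regardless of what the group's data say. The sketch treats sigma as known and uses a hypothetical standard error (sigma = 2, n = 34); data_weight is an illustrative name, not a brms function.

```r
# Share of the posterior mean that comes from the sample mean (the rest comes
# from the prior mean), under a normal prior with SD prior_sd and a sample
# mean with standard error se (normal-normal approximation, sigma known).
data_weight <- function(prior_sd, se) prior_sd^2 / (prior_sd^2 + se^2)

se <- 2 / sqrt(34)             # hypothetical: sigma = 2, n = 34 per group
prior_sds <- c(0.1, 1, 5)      # the prior SDs of model5, model3, model4
w <- data_weight(prior_sds, se)
print(round(w, 3))
```

Under these assumptions, SD 0.1 keeps under 10% of the data's information, SD 1 keeps most of it, and SD 5 barely shrinks at all, matching the pattern of "hurts", "helps", and "helps some".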
14
Ad hoc approach
Going too big for the standard deviation is not going to hurt (but it might not help as much as it could).
Try using a standard deviation estimated from the data itself: the standard error of the mean, $s_{\bar{X}} = \sqrt{s^2/n}$, with $s^2$ averaged across groups.

GrandSE <- sqrt(mean(Vars$Leniency) / min(counts$Leniency))  # pooled across groups (all have the same sample size)
stanvars <- stanvar(GrandMean, name='GrandMean') + stanvar(GrandSE, name='GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
model6 <- brm(Leniency ~ 0 + SmileType, data = SLdata, iter = 2000, warmup = 200, chains = 3,
              prior = prs, stanvars = stanvars)
print(summary(model6))
# compute means of posteriors
post <- posterior_samples(model6)
newMeans <- c(mean(post$b_SmileTypeFALSE), mean(post$b_SmileTypeFelt),
              mean(post$b_SmileTypeMiserable), mean(post$b_SmileTypeNeutral))
range <- c(min(c(Means$Leniency, newMeans)), max(c(Means$Leniency, newMeans)))
dev.new()
plot(Means$SmileType, Means$Leniency, main="Model 6", ylim=range)
points(Means$SmileType, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
15
Model comparison
Model6 does a good job here compared to the no-shrinkage model, and it does a good job compared to the other models. There are no guarantees, though.
> model_weights(model2, model6, weights="loo")
(output: one weight each for model2 and model6; values not recoverable)
> model_weights(model2, model3, model4, model5, model6, weights="loo")
(output: one weight each for model2 through model6; values not recoverable)
16
ADHD Treatment
24 children, each tested at four dosages of a drug (D0, D15, D30, D60)
Measure scores on a Delay of Gratification task (bigger is better)
Dependent means ANOVA
17
Shrinkage for subjects
Our data have quite a lot of variability across subjects:

# load data file
ATdata <- read.csv(file="ADHDTreatment.csv", header=TRUE, stringsAsFactors=TRUE)
# Pull out individual subjects and plot
plot(ATdata$DosageNumber, ATdata$CorrectResponses)
for (i in 1:24) {
  thisLabel <- paste("Subject", toString(i), sep="")
  thisSet <- subset(ATdata, ATdata$SubjectID == thisLabel)
  lines(thisSet$DosageNumber, thisSet$CorrectResponses, col=i)
}
18
Shrinkage for subjects
We may care about the dosage effect for individual subjects.
Knowledge of other subjects should inform how we interpret the scores of a given subject.
Subject7 has the highest scores in the data set, but maybe we should suspect they are overestimates.
Subject4 has the lowest scores in the data set, but maybe we should suspect they are underestimates.
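Letting other subjects inform one subject's estimate is partial pooling. In a sketch that treats the between-subject SD tau and the within-subject SD sigma as known (brms estimates them, as sd(Intercept) and sigma), each subject's average is pulled toward the grand mean by the factor tau^2 / (tau^2 + sigma^2 / n). The data below are simulated rather than the ADHD scores, there is no dosage effect in the simulation, and the values of tau, sigma, and the population mean are made up.

```r
set.seed(1)
n_subjects <- 24
n_dosages  <- 4
tau   <- 5     # hypothetical SD of true subject means
sigma <- 8     # hypothetical SD of scores around a subject's true mean
true_means <- rnorm(n_subjects, mean = 40, sd = tau)
# rows = subjects, columns = dosages (no dosage effect in this toy simulation)
scores <- matrix(rnorm(n_subjects * n_dosages, mean = rep(true_means, n_dosages), sd = sigma),
                 nrow = n_subjects)

raw_means <- rowMeans(scores)
grand <- mean(raw_means)
se2 <- sigma^2 / n_dosages             # squared standard error of a subject mean
w <- tau^2 / (tau^2 + se2)             # shrink factor, between 0 and 1
pooled <- grand + w * (raw_means - grand)

top <- which.max(raw_means)            # the "Subject7" of this simulation
print(c(raw = raw_means[top], pooled = pooled[top]))
```

The highest-scoring subject keeps its rank, but its estimate moves toward the group, which is the sense in which its raw scores are "suspected" overestimates.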
19
Simple model
Using a fixed offset for each subject. With no intercept, the four Dosage coefficients are the dosage means for Subject1 (the reference subject), and each SubjectID coefficient is that subject's offset from Subject1.
(brms summary for model0, Formula: CorrectResponses ~ 0 + Dosage + SubjectID; 96 observations; 3 chains, iter = 2000, warmup = 200, thin = 2, 2700 post-warmup samples. Population-level effects: DosageD0, DosageD15, DosageD30, DosageD60, and offsets for Subject2 through Subject24; family-specific parameter: sigma. Numeric estimates not recoverable.)

model0 <- brm(CorrectResponses ~ 0 + Dosage + SubjectID, data = ATdata,
              iter = 2000, warmup = 200, chains = 3, thin = 2)
print(summary(model0))
20
Subject level
From the posterior, we can pull out the intercept of each subject and see the effect for each subject (the mean of the resulting posterior distribution).

dev.new()
# Look at effect on each subject
post <- posterior_samples(model0)
plot(ATdata$DosageNumber, ATdata$CorrectResponses, main="Model 0")
xLabels <- c(0, 15, 30, 60)
for (i in 1:24) {
  thisLabel <- paste("b_SubjectIDSubject", toString(i), sep="")
  if (i == 1) {
    # Subject1 is the reference level, so the Dosage coefficients are its means
    SubjectMeanEstimates <- post[, c("b_DosageD0", "b_DosageD15", "b_DosageD30", "b_DosageD60")]
  } else {
    SubjectMeanEstimates <- post[, c("b_DosageD0", "b_DosageD15", "b_DosageD30", "b_DosageD60")] + post[, thisLabel]
  }
  newMeans <- c(mean(SubjectMeanEstimates$b_DosageD0), mean(SubjectMeanEstimates$b_DosageD15),
                mean(SubjectMeanEstimates$b_DosageD30), mean(SubjectMeanEstimates$b_DosageD60))
  lines(xLabels, newMeans, col=i, lty=i)
}
21
Bayesian Shrinkage
We want to put a prior on the subject intercepts.
By default, brm uses a Student t distribution. Student_t parameters are (degrees of freedom, location, scale); think of it roughly as (df, mean, sd).
> get_prior(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata)
              prior class      coef     group
                        b
                        b  DosageD0
                        b DosageD15
                        b DosageD30
                        b DosageD60
student_t(3, 0, 10)    sd
                       sd           SubjectID
                       sd Intercept SubjectID
student_t(3, 0, 10) sigma
(The resp, dpar, nlpar, and bound columns are empty.)
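The "(df, mean, sd)" reading is an approximation: for df > 2, a Student t with scale s has standard deviation s * sqrt(df / (df - 2)), and small df puts much more mass in the tails. Because sd parameters must be positive, brms also restricts this prior to positive values, so it acts as a half-t. The base R sketch below compares the default student_t(3, 0, 10) with the student_t(30, 0, 1) used later for model2; t_sd is an illustrative helper name.

```r
# For df > 2, a Student t with scale s has SD s * sqrt(df / (df - 2))
t_sd <- function(df, scale) {
  stopifnot(df > 2)
  scale * sqrt(df / (df - 2))
}

default_sd <- t_sd(3, 10)    # brms default student_t(3, 0, 10)
tight_sd   <- t_sd(30, 1)    # student_t(30, 0, 1), used later for model2
print(c(default = default_sd, tight = tight_sd))

# Two-sided probability of landing more than 3 scale units from the center
tail_df3  <- 2 * pt(-3, df = 3)
tail_df30 <- 2 * pt(-3, df = 30)
print(c(df3 = tail_df3, df30 = tail_df30))
```

With df = 3 the SD is about 1.7 times the scale and extreme values are roughly ten times more likely than with df = 30, which is why raising df and lowering the scale both tighten the prior.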
22
Standard model
Using default priors for the subject intercepts: each subject gets a random intercept, (1 | SubjectID), whose spread across subjects, sd(Intercept), is estimated from the data.
(brms summary for model1, Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID); 96 observations; 3 chains, iter = 2000, warmup = 200, thin = 2, 2700 post-warmup samples. Group-level effects: sd(Intercept) for SubjectID (24 levels). Population-level effects: DosageD0, DosageD15, DosageD30, DosageD60; family-specific parameter: sigma. Numeric estimates not recoverable.)

model1 <- brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata,
              iter = 2000, warmup = 200, chains = 3, thin = 2)
print(summary(model1))
23
Impact of prior
From the posterior, we can pull out the intercept of each subject and see the effect for each subject (the mean of the resulting posterior distribution).

dev.new()
# Look at effect on each subject
post <- posterior_samples(model1)
plot(ATdata$DosageNumber, ATdata$CorrectResponses, main="Model 1")
xLabels <- c(0, 15, 30, 60)
for (i in 1:24) {
  thisLabel <- paste("r_SubjectID[Subject", toString(i), ",Intercept]", sep="")
  SubjectMeanEstimates <- post[, c("b_DosageD0", "b_DosageD15", "b_DosageD30", "b_DosageD60")] + post[, thisLabel]
  newMeans <- c(mean(SubjectMeanEstimates$b_DosageD0), mean(SubjectMeanEstimates$b_DosageD15),
                mean(SubjectMeanEstimates$b_DosageD30), mean(SubjectMeanEstimates$b_DosageD60))
  lines(xLabels, newMeans, col=i, lty=i)
}
24
Modeling
In some sense, this is just modeling.
Extreme data are interpreted partly as noise, and the model predicts that testing an extreme subject again will produce something more "normal".
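The prediction that an extreme subject will look more "normal" on retest can be checked by simulation. The sketch assumes true subject means come from Normal(0, tau), that each test adds independent Normal(0, se) noise, and that tau and se are known; these are hypothetical values, not estimates from the ADHD data. In each replication it picks the subject with the highest first score and compares predicting the retest with the raw score versus the shrunk score.

```r
set.seed(2)
reps <- 2000
n_subjects <- 24
tau <- 1    # hypothetical SD of true subject means (population mean 0)
se  <- 1    # hypothetical measurement noise SD for each test
w <- tau^2 / (tau^2 + se^2)   # shrink factor toward the population mean 0
first <- retest <- err_raw <- err_shrunk <- numeric(reps)
for (r in seq_len(reps)) {
  true_mean <- rnorm(n_subjects, 0, tau)
  test1 <- true_mean + rnorm(n_subjects, 0, se)
  test2 <- true_mean + rnorm(n_subjects, 0, se)   # an independent retest
  k <- which.max(test1)                            # most extreme subject on test 1
  first[r]      <- test1[k]
  retest[r]     <- test2[k]
  err_raw[r]    <- (test2[k] - test1[k])^2         # predict retest = raw score
  err_shrunk[r] <- (test2[k] - w * test1[k])^2     # predict retest = shrunk score
}
print(c(mean_first = mean(first), mean_retest = mean(retest),
        mse_raw = mean(err_raw), mse_shrunk = mean(err_shrunk)))
```

The retest of the most extreme subject regresses toward the population, and the shrunk prediction has the smaller average squared error.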
25
Compare models
Favors the model with shrinkage.
> model_weights(model0, model1, weights="loo")
(output: one weight each for model0 and model1; values not recoverable)
26
More shrinkage! Adjust the t distribution
Bigger df (less heavy tails)
Smaller scale (roughly the standard deviation)
Not much effect on the population means
(Two brms summaries were shown side by side for models with Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID) and the same sampler settings as model1: sd(Intercept) for SubjectID, DosageD0 through DosageD60, and sigma. Numeric estimates not recoverable.)

prs <- c(prior(student_t(30, 0, 1), class = "sd"))
model2 <- brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata,
              iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs)
print(summary(model2))
27
Impact of prior
From the posterior, we can pull out the intercept of each subject and see the effect for each subject (the mean of the resulting posterior distribution).

dev.new()
# Look at effect on each subject
post <- posterior_samples(model2)
plot(ATdata$DosageNumber, ATdata$CorrectResponses, main="Model 2")
xLabels <- c(0, 15, 30, 60)
for (i in 1:24) {
  thisLabel <- paste("r_SubjectID[Subject", toString(i), ",Intercept]", sep="")
  SubjectMeanEstimates <- post[, c("b_DosageD0", "b_DosageD15", "b_DosageD30", "b_DosageD60")] + post[, thisLabel]
  newMeans <- c(mean(SubjectMeanEstimates$b_DosageD0), mean(SubjectMeanEstimates$b_DosageD15),
                mean(SubjectMeanEstimates$b_DosageD30), mean(SubjectMeanEstimates$b_DosageD60))
  lines(xLabels, newMeans, col=i, lty=i)
}
28
Less shrinkage! Adjust the t distribution
Standard df (3)
Bigger scale (20, roughly the standard deviation)
Not much effect on the population means
(Two brms summaries were shown side by side for models with Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID) and the same sampler settings as model1: sd(Intercept) for SubjectID, DosageD0 through DosageD60, and sigma. Numeric estimates not recoverable.)

prs <- c(prior(student_t(3, 0, 20), class = "sd"))
model3 <- brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata,
              iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs)
print(summary(model3))
29
Impact of prior
From the posterior, we can pull out the intercept of each subject and see the effect for each subject (the mean of the resulting posterior distribution). Not much difference.

dev.new()
# Look at effect on each subject
post <- posterior_samples(model3)
plot(ATdata$DosageNumber, ATdata$CorrectResponses, main="Model 3")
xLabels <- c(0, 15, 30, 60)
for (i in 1:24) {
  thisLabel <- paste("r_SubjectID[Subject", toString(i), ",Intercept]", sep="")
  SubjectMeanEstimates <- post[, c("b_DosageD0", "b_DosageD15", "b_DosageD30", "b_DosageD60")] + post[, thisLabel]
  newMeans <- c(mean(SubjectMeanEstimates$b_DosageD0), mean(SubjectMeanEstimates$b_DosageD15),
                mean(SubjectMeanEstimates$b_DosageD30), mean(SubjectMeanEstimates$b_DosageD60))
  lines(xLabels, newMeans, col=i, lty=i)
}
30
Compare models
Favors the model with default shrinkage.
The bigger scale is almost as good. I was never able to beat the default prior.
> model_weights(model0, model1, model2, model3, weights="loo")
(output: one weight each for model0 through model3; values not recoverable)
31
Morris-Efron shrinkage
There is no need to ignore the population (dosage) means; they can be shrunk too:

ATdata <- read.csv(file="ADHDTreatment.csv", header=TRUE, stringsAsFactors=TRUE)
# Morris-Efron shrinkage
# Get each sample mean, sample size, and variance
Means  <- aggregate(CorrectResponses ~ Dosage, FUN=mean, data=ATdata)
counts <- aggregate(CorrectResponses ~ Dosage, FUN=length, data=ATdata)
Vars   <- aggregate(CorrectResponses ~ Dosage, FUN=var, data=ATdata)
GrandMean <- sum(Means$CorrectResponses) / 4
newMeans <- (1 - (4 - 3) * Vars$CorrectResponses / counts$CorrectResponses /
               sum((Means$CorrectResponses - GrandMean)^2)) *
            (Means$CorrectResponses - GrandMean) + GrandMean
par(bg="lightblue")
range <- c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
plot(Means$Dosage, Means$CorrectResponses, main="Morris-Efron", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
32
Bayesian model
Population-level effects
(brms summary for model4, Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID); 96 observations; 3 chains, iter = 2000, warmup = 200, thin = 2, 2700 post-warmup samples. Group-level effects: sd(Intercept) for SubjectID (24 levels). Population-level effects: DosageD0, DosageD15, DosageD30, DosageD60; family-specific parameter: sigma. Numeric estimates not recoverable.)

# This model has different means for different dosages and a (random) intercept for each subject
model4 <- brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata,
              iter = 2000, warmup = 200, chains = 3, thin = 2)
print(summary(model4))
33
Bayesian ANOVA
Plot the estimated means

# compute means of posteriors
post <- posterior_samples(model4)
newMeans <- c(mean(post$b_DosageD0), mean(post$b_DosageD15),
              mean(post$b_DosageD30), mean(post$b_DosageD60))
range <- c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
plot(Means$Dosage, Means$CorrectResponses, main="Model 4", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
34
Shrink it all!
The ADHD Treatment data set has two grouping factors:
Dosage
Subject
Both can be shrunk by imposing a prior on each one. We define a shrinkage prior for the dosage means using the data:

Means  <- aggregate(CorrectResponses ~ Dosage, FUN=mean, data=ATdata)
counts <- aggregate(CorrectResponses ~ Dosage, FUN=length, data=ATdata)
Vars   <- aggregate(CorrectResponses ~ Dosage, FUN=var, data=ATdata)
GrandMean <- sum(Means$CorrectResponses) / length(Means$CorrectResponses)
GrandSE <- sqrt(mean(Vars$CorrectResponses) / min(counts$CorrectResponses))  # pooled across groups (all have the same sample size)
stanvars <- stanvar(GrandMean, name='GrandMean') + stanvar(GrandSE, name='GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
35
Bayesian Shrinkage
We get something very much like shrinkage by using a prior. Let's set up a prior that pulls values toward the grand mean, using the standard deviation of the sample means as the prior SD: $s_{\bar{X}} = \mathrm{sd}\left(\bar{X}_{D0}, \bar{X}_{D15}, \bar{X}_{D30}, \bar{X}_{D60}\right)$.
(brms summary for model5, Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID); sampler settings as for model4: sd(Intercept) for SubjectID, DosageD0 through DosageD60, and sigma. Numeric estimates not recoverable.)

GrandSE <- sd(Means$CorrectResponses)
stanvars <- stanvar(GrandMean, name='GrandMean') + stanvar(GrandSE, name='GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
model5 <- brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata,
              iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs, stanvars = stanvars)
print(summary(model5))
36
Bayesian ANOVA
Effect of the prior

# compute means of posteriors
post <- posterior_samples(model5)
newMeans <- c(mean(post$b_DosageD0), mean(post$b_DosageD15),
              mean(post$b_DosageD30), mean(post$b_DosageD60))
range <- c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
dev.new()
plot(Means$Dosage, Means$CorrectResponses, main="Model 5", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
37
Model comparison
Shrinkage should lead to better prediction.
A small effect, but helpful. What about other standard deviations?
print(model_weights(model4, model5, weights="loo"))
(output: one weight each for model4 and model5; values not recoverable)
38
Bayesian Shrinkage
We get something very much like shrinkage by using a prior. Let's set up a prior that pulls values toward the grand mean, this time with a large standard deviation (10).
(brms summary for model6, Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID); sampler settings as for model4: sd(Intercept) for SubjectID, DosageD0 through DosageD60, and sigma. Numeric estimates not recoverable.)

GrandSE <- 10
stanvars <- stanvar(GrandMean, name='GrandMean') + stanvar(GrandSE, name='GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
model6 <- brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata,
              iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs, stanvars = stanvars)
print(summary(model6))
39
Bayesian ANOVA
Effect of a prior with a large SD

# compute means of posteriors
post <- posterior_samples(model6)
newMeans <- c(mean(post$b_DosageD0), mean(post$b_DosageD15),
              mean(post$b_DosageD30), mean(post$b_DosageD60))
range <- c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
dev.new()
plot(Means$Dosage, Means$CorrectResponses, main="Model 6", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
40
Bayesian Shrinkage
We get something very much like shrinkage by using a prior. Let's set up a prior that pulls values toward the grand mean, this time with a small standard deviation (0.1).
(brms summary for model7, Formula: CorrectResponses ~ 0 + Dosage + (1 | SubjectID); sampler settings as for model4: sd(Intercept) for SubjectID, DosageD0 through DosageD60, and sigma. Numeric estimates not recoverable.)

GrandSE <- 0.1
stanvars <- stanvar(GrandMean, name='GrandMean') + stanvar(GrandSE, name='GrandSE')
prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
model7 <- brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID), data = ATdata,
              iter = 2000, warmup = 200, chains = 3, thin = 2, prior = prs, stanvars = stanvars)
print(summary(model7))
41
Bayesian ANOVA
Effect of a prior with a small SD

# compute means of posteriors
post <- posterior_samples(model7)
newMeans <- c(mean(post$b_DosageD0), mean(post$b_DosageD15),
              mean(post$b_DosageD30), mean(post$b_DosageD60))
range <- c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
dev.new()
plot(Means$Dosage, Means$CorrectResponses, main="Model 7", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
42
Bayesian Shrinkage
We can treat condition (dosage) like subjects, as a group-level effect.
Use the default brm prior.
(brms summary for model8, Formula: CorrectResponses ~ 0 + (1 | Dosage) + (1 | SubjectID); 96 observations; 3 chains, iter = 2000, warmup = 200, thin = 2, 2700 post-warmup samples. Group-level effects: sd(Intercept) for Dosage (4 levels) and sd(Intercept) for SubjectID (24 levels); family-specific parameter: sigma. Numeric estimates not recoverable.)

model8 <- brm(CorrectResponses ~ 0 + (1 | Dosage) + (1 | SubjectID), data = ATdata,
              iter = 2000, warmup = 200, chains = 3, thin = 2)
print(summary(model8))
43
Bayesian ANOVA
Effect of the default prior: it seems to pull the dosage means toward zero.
Because the formula has no intercept (0 + ...), the dosage effects are deviations from zero, so they are shrunk toward 0 rather than toward the grand mean. We need to reconfigure the model, for example by keeping a population-level intercept so that the dosage effects shrink toward it.

# compute means of posteriors
post <- posterior_samples(model8)
newMeans <- c()
xLabels <- c(0, 15, 30, 60)
for (i in seq_along(xLabels)) {
  thisLabel <- paste("r_Dosage[D", toString(xLabels[i]), ",Intercept]", sep="")
  DosageMeanEstimates <- post[, thisLabel]
  newMeans <- c(newMeans, mean(DosageMeanEstimates))
}
range <- c(min(c(Means$CorrectResponses, newMeans)), max(c(Means$CorrectResponses, newMeans)))
dev.new()
plot(Means$Dosage, Means$CorrectResponses, main="Model 8", ylim=range)
points(Means$Dosage, newMeans, pch=19)
abline(h=GrandMean, col="red", lwd=3, lty=2)
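Why the center of the prior matters can be seen with the same normal-prior approximation used for the Smiles data. This sketch uses a fixed prior SD, whereas model8 estimates the spread of the dosage effects (sd(Intercept)) from the data, so it illustrates the direction of the pull rather than model8's actual output. The dosage means, standard error, and prior SD are hypothetical, and shrink_toward is an illustrative name.

```r
# Normal-prior shrinkage of each mean toward prior_mean (sigma treated as known)
shrink_toward <- function(xbar, se, prior_mean, prior_sd) {
  w <- prior_sd^2 / (prior_sd^2 + se^2)   # weight on the sample mean
  prior_mean + w * (xbar - prior_mean)
}

toy_means <- c(D0 = 39, D15 = 42, D30 = 45, D60 = 47)   # hypothetical dosage means
to_zero  <- shrink_toward(toy_means, se = 2, prior_mean = 0, prior_sd = 2)
to_grand <- shrink_toward(toy_means, se = 2, prior_mean = mean(toy_means), prior_sd = 2)
print(rbind(raw = toy_means, to_zero = to_zero, to_grand = to_grand))
```

With the same amount of shrinkage, centering at zero drags every mean far below the data, while centering at the grand mean only compresses the differences between dosages.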
44
Model comparison
Shrinkage should lead to better prediction.
It does, unless the prior is too constraining (model7) or the prior pulls toward something far from the actual values (model8).
It tends to be a small effect, with bigger effects for more means, as we saw in Lecture 5.
> print(model_weights(model4, model5, model6, model7, model8, weights="loo"))
(output: one weight each for model4 through model8; values not recoverable)
45
Conclusions
Shrinkage is a side effect of placing a prior across a group of parameters.
It helps prediction, but it needs an "appropriate" prior: not too tight.
For subject-level effects, the brm defaults seem pretty good for the cases I have tested.