Let's continue to do a Bayesian analysis
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Purdue University, Fall 2016
Visual Search
- A classic experiment in perception/attention involves visual search: respond as quickly as possible whether an image contains a target (a green circle) or not.
- Vary the number of distractors: 4, 16, 32, 64.
- Vary the type of distractors: feature (different color) or conjunctive (different color or shape).
Visual Search
Typical results: for conjunctive distractors, response time increases with the number of distractors.
Linear model
- Suppose you want to model the search time on the Conjunctive search trials, when the target is Absent, as a linear equation.
- Let's do it for a single participant.
- We are basically going through Section 4.4 of the text, but using a new data set.
- Download the files from the class web site and follow along in class.
- We built our model using the map function.
MAP estimates
Maximum a posteriori (MAP) model fit. Formula:

  RT_ms ~ dnorm(mu, sigma)
  mu <- a + b * NumberDistractors
  a ~ dnorm(1000, 500)
  b ~ dnorm(0, 100)
  sigma ~ dunif(0, 500)

MAP values: a, b, sigma
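The slide shows only the formula and estimate labels; a minimal sketch of the map() call that produces such a fit, assuming the rethinking package is loaded and the single-participant data frame is named VSdata2 (the name used later in the deck):

  library(rethinking)

  # Quadratic-approximation (MAP) fit of the linear search-time model;
  # map() was renamed quap() in newer versions of rethinking
  VSmodel <- map(
    alist(
      RT_ms ~ dnorm(mu, sigma),          # likelihood
      mu <- a + b * NumberDistractors,   # linear model for the mean
      a ~ dnorm(1000, 500),              # prior: intercept (ms)
      b ~ dnorm(0, 100),                 # prior: slope (ms per distractor)
      sigma ~ dunif(0, 500)              # prior: residual SD
    ),
    data = VSdata2
  )
  precis(VSmodel)   # summarize the MAP values for a, b, sigma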
Posterior
You can estimate the posterior for a function by making draws from the posterior:

  numVariableLines = 10000
  post <- extract.samples(VSmodel, n=numVariableLines)

You can ask all kinds of questions about predictions and so forth by just using probability. For example, what is the posterior distribution of the predicted mean value for 35 distractors?

  mu_at_35 <- post$a + post$b * 35

[Figure: distribution of the 10,000 samples of mu_at_35]
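To look at that distribution yourself, a minimal sketch (the dens() call is an assumption; the slide shows only the resulting plot):

  dens(mu_at_35)                 # smoothed density of the posterior of mu at 35 distractors
  mean(mu_at_35); sd(mu_at_35)   # center and spread of those samples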
Posterior
What is the 89% highest posterior density interval of mu at NumberDistractors=35?

  HPDI(mu_at_35, prob=0.89)

Why 89%? Because it is prime. Why 95% for a CI?

  CI95 = (2111.8, 2450.9)
  HPDI(mu_at_35, prob=0.95)

Why is the CI broader than the HPDI?
HPDI vs CI
HPDI95 = (2128.9, 2428.4)
CI95 = (2111.8, 2450.9)

Pretty similar, so why bother? They have different interpretations:
- The HPDI is the smallest set of values of mean RT_ms for NumberDistractors=35 that has a 95% posterior probability (if the model is valid, the priors are appropriate, and so forth).
- The CI is a set of values produced by a process that, across repeated samples, includes the true mean RT_ms for NumberDistractors=35 95% of the time (if the model is valid, and so forth).
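Both intervals can be computed from the same posterior samples; a minimal sketch, assuming the slide's "CI" is the central percentile interval that rethinking's PI() returns:

  HPDI(mu_at_35, prob=0.95)   # narrowest interval containing 95% of the posterior mass
  PI(mu_at_35, prob=0.95)     # central 95% interval (2.5th to 97.5th percentile)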
HPDI vs CI
The HPDI is a description of the posterior distribution. The CI is a description of the sample and an algorithm that connects it (probabilistically) to the true mean.

What is the probability that the mean of RT_ms for NumberDistractors=35 is greater than 2400 ms?
- Treat the posterior as a normal distribution with mean(mu_at_35) and sd(mu_at_35), and compute the area greater than 2400.
- Or compute directly from the posterior samples:

  length(mu_at_35[mu_at_35 > 2400]) / length(mu_at_35)
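A minimal sketch of both calculations (the numeric answers from the original slide are not preserved here):

  # Normal approximation to the posterior of mu at 35 distractors
  1 - pnorm(2400, mean=mean(mu_at_35), sd=sd(mu_at_35))

  # Direct count from the posterior samples; mean() of a logical vector
  # is an equivalent, shorter form of length(...)/length(...)
  mean(mu_at_35 > 2400)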
HPDI vs CI
- You cannot do this with a CI, because the CI is not a summary of the posterior distribution.
- Instead, the limits of the CI are the values output by a process that, for 95% of random samples, will produce limits that contain the true mean.
- If that sounds kind of silly, then you are following along just fine.
- In practice, the limits of a CI may be similar to the limits of an HPDI, but in principle they could hardly be more different.
- And the limit values are sometimes not similar at all; it depends on the priors.
Prediction uncertainty
Our linear model uses a and b to predict the mean RT_ms for any given NumberDistractors value. There is uncertainty in this prediction, and we should represent it for each value of NumberDistractors.

  NumberDistractors.seq <- seq(from=1, to=65, by=1)

Generates a vector [1, 2, 3, ..., 65].

  mu <- link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq))

Provides a posterior distribution of the mean predicted RT_ms for each value in the vector (a great big 2D matrix).

  mu.mean <- apply(mu, 2, mean)
  mu.HPDI <- apply(mu, 2, HPDI, prob=0.89)

A shortcut way of applying the function "mean" or "HPDI" to the columns (dimension 2) of a matrix.
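A quick sanity check on that matrix (a sketch; note that link() draws 1,000 posterior samples by default unless n= is specified):

  dim(mu)   # rows = posterior samples, columns = the 65 NumberDistractors values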
Prediction uncertainty
Plot the raw data:

  plot(RT_ms ~ NumberDistractors, data=VSdata2)

Plot the MAP line:

  lines(NumberDistractors.seq, mu.mean)

For all practical purposes, this is the same as plotting the regression line from the estimated coefficients, but it is estimated from the sampled posterior distribution for different NumberDistractors values.

Plot a shaded region for the 89% HPDI:

  shade(mu.HPDI, NumberDistractors.seq)
Prediction uncertainty
[Figure] A nice summary of predicting average RT_ms values.
Predicting individual points
We have been predicting mean RT_ms values for each NumberDistractors value. Our model (the same MAP fit) is:

  RT_ms ~ dnorm(mu, sigma)
  mu <- a + b * NumberDistractors

If we try to predict any given RT_ms (not just a mean), we have to consider that we are sampling that value from a population with a standard deviation of sigma. We need to consider all of the uncertainty; see the sketch below.
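Roughly what sim() (next slide) does internally; a minimal hand-rolled sketch for a single NumberDistractors value, reusing the post samples drawn earlier:

  # For each posterior draw (a, b, sigma), simulate one RT from
  # Normal(a + b*35, sigma): parameter uncertainty plus sampling noise
  sim_at_35 <- rnorm(nrow(post), mean=post$a + post$b * 35, sd=post$sigma)
  PI(sim_at_35, prob=0.89)   # middle 89% of simulated individual RTs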
Predicting individual points
The model can just as easily generate individual simulated samples as means:

  sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq))

We can identify the "middle" 89% of such simulated samples for each NumberDistractors value (PI is a function from the rethinking library):

  RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)

And plot everything:

  dev.new()
  plot(RT_ms ~ NumberDistractors, data=VSdata2)
  lines(NumberDistractors.seq, mu.mean)    # MAP line for means
  shade(mu.HPDI, NumberDistractors.seq)    # HPDI for means
  shade(RT_ms.PI, NumberDistractors.seq)   # PI for individual values
Predicting individual points
This kind of comparison is useful for checking whether the model makes sense. Here, everything looks fine to me.
Bayesian vs. Linear regression
For the best fitting line, we get nearly the same result from the Bayesian MAP approach as from typical linear regression. Are the predictions the same?
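One way to see the agreement is to fit the same line by ordinary least squares and compare coefficients; a minimal sketch (the lm() comparison itself is not on the slide):

  fit.lm <- lm(RT_ms ~ NumberDistractors, data=VSdata2)
  coef(fit.lm)    # OLS intercept and slope
  coef(VSmodel)   # MAP estimates of a, b, and sigma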
Bayesian vs. Linear regression
What is the probability of observing a random trial with RT_ms < 1000 for NumberDistractors=15?

MAP:
  length(sim.RT_ms[, 15][sim.RT_ms[, 15] < 1000]) / length(sim.RT_ms[, 15])
  0.104
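Column 15 of sim.RT_ms corresponds to NumberDistractors=15 because NumberDistractors.seq runs from 1 to 65 in steps of 1. A shorter, equivalent form of the same calculation:

  mean(sim.RT_ms[, 15] < 1000)   # proportion of simulated trials faster than 1000 ms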
Bayesian vs. Linear regression
What is the probability of observing a random trial with RT_ms < 1000 for NumberDistractors=15?

Linear regression:
  Mean = a + b * 15
  RT_ms ~ N(Mean, 348.8)
  P(RT_ms < 1000) = 0.0967

Why is it more likely to get this "rare" event in the Bayesian model?
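A minimal sketch of that frequentist calculation, assuming the fit.lm from the earlier comparison sketch (348.8 would then be its residual standard error):

  mu15 <- predict(fit.lm, newdata=data.frame(NumberDistractors=15))
  pnorm(1000, mean=mu15, sd=summary(fit.lm)$sigma)   # ~0.0967 for the slide's data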
Bayesian vs. Linear regression
The difference is in the representation of uncertainty:
- Typical linear regression uses the best fitting straight line and then estimates RT_ms from that model. Any other choice would be worse (in terms of reducing error for the observed data).
- The Bayesian MAP fit also has a best fitting straight line, but its prediction comes from a posterior distribution over many different straight-line models (with different parameters). It estimates RT_ms from the full posterior distribution rather than just the "best fitting" model.
- There is almost always uncertainty about the model, so there is uncertainty about the values of RT_ms beyond the standard deviation in the regression equation.
- Predictions from typical linear regression ignore the uncertainty about the model; they tend to be overly optimistic.
Visual Search
Typical results (recap): for conjunctive distractors, response time increases with the number of distractors.
Visual Search
Previously we fit a model to the Target absent condition. We can easily extend it to include the Target present condition:

  VSdata2 <- subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$DistractorType=="Conjunction")

Define a dummy variable with value 0 if the target is absent and 1 if the target is present:

  VSdata2$TargetIsPresent <- ifelse(VSdata2$Target=="Present", 1, 0)
Visual Search
Define the model:

  VSmodel <- map(
    alist(
      RT_ms ~ dnorm(mu, sigma),
      mu <- a + (b*TargetIsPresent + (1-TargetIsPresent)*b2) * NumberDistractors,
      a ~ dnorm(1000, 500),
      b ~ dnorm(0, 100),
      b2 ~ dnorm(0, 100),
      sigma ~ dunif(0, 2000)
    ),
    data=VSdata2
  )

Note: parameter b is the slope when the target is present and b2 is the slope when the target is absent. Both conditions share the same model standard deviation and intercept.
Model results
Maximum a posteriori (MAP) model fit. Formula:

  RT_ms ~ dnorm(mu, sigma)
  mu <- a + (b * TargetIsPresent + (1 - TargetIsPresent) * b2) * NumberDistractors
  a ~ dnorm(1000, 500)
  b ~ dnorm(0, 100)
  b2 ~ dnorm(0, 100)
  sigma ~ dunif(0, 2000)

[Table: MAP values for a, b, b2, sigma and the log-likelihood, alongside the MAP values (a, b, sigma) from the earlier Target-absent-only model for comparison]
Model results

  print(precis(VSmodel, corr=TRUE))

[Table: Mean, StdDev, 5.5%, and 94.5% columns for a, b, b2, and sigma, followed by the correlation matrix among a, b, b2, and sigma]
Best fitting lines

  plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
  points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
  abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b"], col=col.alpha("red", 1.0))
  abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b2"], col=col.alpha("green", 1.0))

  numVariableLines = 10000
  numVariableLinesToPlot = 20
  post <- extract.samples(VSmodel, n=numVariableLines)
  for(i in 1:numVariableLinesToPlot){
    abline(a=post$a[i], b=post$b[i], col=col.alpha("red", 0.3), lty=5)
    abline(a=post$a[i], b=post$b2[i], col=col.alpha("green", 0.3), lty=5)
  }
HPDI (Target absent)

  # Plot HPDI for Target absent
  dev.new()
  plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)

  # Define a sequence of NumberDistractors values at which to compute predictions
  NumberDistractors.seq <- seq(from=1, to=65, by=1)

  # Use link to compute mu for each posterior sample and each value in NumberDistractors.seq
  mu_absent <- link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq, TargetIsPresent=0))
  mu_absent.mean <- apply(mu_absent, 2, mean)
  mu_absent.HPDI <- apply(mu_absent, 2, HPDI, prob=0.89)

  # Plot the MAP line (same as the abline drawn previously from the coefficients)
  lines(NumberDistractors.seq, mu_absent.mean)
  shade(mu_absent.HPDI, NumberDistractors.seq, col=col.alpha("green", 0.3))
HPDI (Target present)

  # Plot HPDI for Target present
  points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)

  # Use link to compute mu for each posterior sample and each value in NumberDistractors.seq
  mu_present <- link(VSmodel, data=data.frame(NumberDistractors=NumberDistractors.seq, TargetIsPresent=1))
  mu_present.mean <- apply(mu_present, 2, mean)
  mu_present.HPDI <- apply(mu_present, 2, HPDI, prob=0.89)

  # Plot the MAP line
  lines(NumberDistractors.seq, mu_present.mean)
  shade(mu_present.HPDI, NumberDistractors.seq, col=col.alpha("red", 0.3))
Prediction intervals (target absent)

  # Prediction interval for RT_ms raw scores, Target absent
  # Generate many sample RT_ms scores for NumberDistractors.seq using the model
  sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq, TargetIsPresent=0))

  # Identify the limits of the middle 89% of sampled values for each NumberDistractors
  # (PI is a function from the rethinking library)
  RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)

  # Plot
  dev.new()
  plot(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Absent"), pch=1)
  lines(NumberDistractors.seq, mu_absent.mean)                          # MAP line for means
  shade(mu_absent.HPDI, NumberDistractors.seq)                          # shaded HPDI for estimates of means
  shade(RT_ms.PI, NumberDistractors.seq, col=col.alpha("green", 0.3))   # shaded prediction interval for simulated RT_ms values
Prediction intervals (target present)

  # Generate many sample RT_ms scores for NumberDistractors.seq using the model
  sim.RT_ms <- sim(VSmodel, data=list(NumberDistractors=NumberDistractors.seq, TargetIsPresent=1))

  # Identify the limits of the middle 89% of sampled values for each NumberDistractors
  # (PI is a function from the rethinking library)
  RT_ms.PI <- apply(sim.RT_ms, 2, PI, prob=0.89)

  # Plot
  points(RT_ms ~ NumberDistractors, data=subset(VSdata2, VSdata2$Target=="Present"), pch=15)
  lines(NumberDistractors.seq, mu_present.mean)                       # MAP line for means
  shade(mu_present.HPDI, NumberDistractors.seq)                       # shaded HPDI for estimates of means
  shade(RT_ms.PI, NumberDistractors.seq, col=col.alpha("red", 0.3))   # shaded prediction interval for simulated RT_ms values
Conclusions
- HPDI vs. CI.
- Predictions should consider uncertainty about the model.
- Bayesian analysis allows you to do this in a way that cannot be done with typical linear regression.
- Extending the model to consider different slopes for different conditions is straightforward.