Let's do a Bayesian analysis
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016, Purdue University
Visual Search
- A classic experiment in perception/attention involves visual search
- Respond as quickly as possible whether an image contains a target (a green circle) or not
- Log in with ID=GregTest-145, password=
Visual Search
- Vary number of distractors: 4, 16, 32, 64
- Vary type of distractors: feature (different color), conjunctive (different color or shape)
- Measure reaction time: time between onset of the image and the participant's response
- 5 trials for each of 16 conditions: 4 distractor counts x 2 target conditions (present or absent) x 2 distractor types (feature or conjunctive) = 80 trials
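As a sketch of this design (not from the original slides), the 16 conditions can be enumerated in R. The variable names match the data file used later; the "Present" and "Feature" labels are assumed counterparts of the "Absent" and "Conjunction" values that appear in the analysis code:

# Enumerate the 4 x 2 x 2 = 16 conditions
design <- expand.grid(
  NumberDistractors = c(4, 16, 32, 64),
  Target = c("Present", "Absent"),
  DistractorType = c("Feature", "Conjunction")
)
nrow(design)       # 16 conditions
nrow(design) * 5   # 5 trials each = 80 trials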
Visual Search
- Typical results: for conjunctive distractors, response time increases with the number of distractors
Linear model
- Suppose you want to model the search time on the Conjunctive search trials when the target is Absent as a linear equation
- Let's do it for a single participant
- We are basically going through Section 4.4 of the text, but using a new data set
- Download files from the class web site and follow along in class
Read in data

rm(list=ls(all=TRUE))   # clear all variables
- Clears old variables out of R's memory

graphics.off()
- Removes old graphics from the display

VSdata <- read.csv(file="VisualSearch.csv", header=TRUE, stringsAsFactors=FALSE)
- Reads in the contents of the file VisualSearch.csv
- Uses the contents of the first row as a header, and creates variables by the names of those headers (no spaces in the header names!)

VSdata2 <- subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$Target=="Absent" & VSdata$DistractorType=="Conjunction")
- Creates a new variable that contains just the data for a particular participant
- Only includes data from the Target absent trials
- Only includes data for the Conjunction distractors

Check the contents of VSdata2 in the R console
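A few standard base-R calls will do that check (a sketch, not from the slides):

str(VSdata2)    # variable names and types
head(VSdata2)   # first few rows
nrow(VSdata2)   # should be 20: 5 trials at each of 4 distractor counts
table(VSdata2$NumberDistractors)   # trial counts per condition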
Define the model

library(rethinking)
- This loads the R library created by the textbook author
- It has some nice functions for plotting and displaying tables

VSmodel <- map(
  alist(
    RT_ms ~ dnorm(mu, sigma),
    mu <- a + b*NumberDistractors,
    a ~ dnorm(500, 10),
    b ~ dnorm(0, 100),
    sigma ~ dunif(0, 500)
  ),
  data=VSdata2
)
- Defines the linear model
- Assigns prior distributions to each parameter (a, b, sigma)
Prior distributions

a ~ dnorm(500, 10)
- We are saying that we expect the intercept (RT_ms with 0 distractors) to be around 500
- This is a claim about the population parameter, not about the data, per se

b ~ dnorm(0, 100)
- We are saying that we expect the slope to be around 0, but with a lot of possible other choices

sigma ~ dunif(0, 500)
- We are saying that sigma could be anything between 0 (a standard deviation cannot be negative!) and 500
- It cannot be outside this range!
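One way to see what these priors actually claim is to plot their densities; a quick base-R sketch (not from the slides; the plotted ranges are arbitrary):

curve(dnorm(x, mean=500, sd=10), from=400, to=600, xlab="a", ylab="density")     # tight prior on the intercept
curve(dnorm(x, mean=0, sd=100), from=-400, to=400, xlab="b", ylab="density")     # vague prior on the slope
curve(dunif(x, min=0, max=500), from=-50, to=550, xlab="sigma", ylab="density")  # flat prior on sigma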
Maximum a Posteriori
- Here, defining the model basically does all the work: the map() function takes the priors and the data and computes posterior distributions
- The main output of the map() calculation is the set of a, b, and sigma values that have the highest probabilities
- Often this is actually probability density rather than probability
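In symbols, map() looks for the peak of the posterior density, which by Bayes' rule is proportional to the likelihood times the priors (here x_i is the number of distractors on trial i):

\[
(\hat{a}, \hat{b}, \hat{\sigma}) = \arg\max_{a,\, b,\, \sigma}\; p(a, b, \sigma \mid \text{data})
\propto \left[ \prod_i \mathcal{N}\!\big(RT_i \mid a + b\, x_i,\ \sigma\big) \right] p(a)\, p(b)\, p(\sigma)
\]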
Posterior Distribution
[Figures (three slides in the original): posterior distributions of the fitted parameters]
- Remember, these are distributions of the population parameters!
Posterior Distributions
- It is more complicated because we have three parameters with a joint posterior probability density function
- Finding the peak of a multidimensional surface is not a trivial problem
- The map() function allows you to specify a couple of different optimization methods
- It sometimes produces errors that correspond to trouble finding the peak, e.g.:
  Caution, model may not have converged.
  Code 1: Maximum iterations reached.
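One common remedy is to supply explicit start values, which map() accepts; a hedged sketch (the method argument is an assumption about map()'s pass-through to optim(), not something shown in the slides):

VSmodel <- map(
  alist(
    RT_ms ~ dnorm(mu, sigma),
    mu <- a + b*NumberDistractors,
    a ~ dnorm(500, 10),
    b ~ dnorm(0, 100),
    sigma ~ dunif(0, 500)
  ),
  data=VSdata2,
  start=list(a=500, b=0, sigma=100),   # explicit start values for the optimizer
  method="Nelder-Mead"                 # assumption: optimization method passed through to optim()
)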
MAP estimates

summaryTable <- summary(VSmodel)
print(VSmodel)
- Prints out a table of Maximum A Posteriori estimates of a, b, and sigma
MAP estimates

Maximum a posteriori (MAP) model fit

Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(500, 10)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)

MAP values:
a  b  sigma

Log-likelihood:
MAP estimates (2nd run)

Maximum a posteriori (MAP) model fit

Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(500, 10)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)

MAP values:
a  b  sigma

Log-likelihood:
MAP estimate
- Red line is the MAP "best fitting" straight line

plot(RT_ms ~ NumberDistractors, data=VSdata2)
abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b"], col=col.alpha("red", 1.0))
MAP estimate
- Grey lines are samples of the population parameters from the posterior distribution

numVariableLines <- 2000
numVariableLinesToPlot <- 20
post <- extract.samples(VSmodel, n=numVariableLines)
for(i in 1:numVariableLinesToPlot){
  abline(a=post$a[i], b=post$b[i], col=col.alpha("black", 0.3))
}
Model checking
- Now is a good time to see if the model makes any sense
- Do the priors have the influence we expected?
- Does the model behave reasonably?
MAP estimate
- I see several issues:
1) The lines for the sampled population parameters all converge; they all have a common intercept value
2) The "traditional" best-fit line is rather different. Intercept! (a=832.94, b=41.38)
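For comparison, the "traditional" values presumably come from ordinary least squares; a one-line sketch (not from the slides):

lmFit <- lm(RT_ms ~ NumberDistractors, data=VSdata2)
coef(lmFit)   # compare with a=832.94, b=41.38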
Prior
- We used a ~ dnorm(500, 10), which means the intercept has to be pretty close to the value 500 (we got 501.7) regardless of the data
- Suppose we use a ~ dnorm(500, 100). Now we get:
  MAP values:
  a  b  sigma
Prior
- What about a ~ dnorm(500, 500)? Now we get:
  MAP values:
  a  b  sigma
- With a ~ dnorm(500, 1000):
  Error in map(alist(RT_ms ~ dnorm(mu, sigma), mu <- a + b * NumberDistractors, :
    non-finite finite-difference value [3]
  Start values for parameters may be too far from MAP.
  Try better priors or use explicit start values.
  If you sampled random start values, just trying again may work.
  Start values used in this attempt:
  a =
  b =
  sigma =
MAP Estimate
- Now it looks pretty good (and closely matches the traditional fit)
- Check the other priors. What happens when you use:
  sigma ~ dunif(0, 100)
  sigma ~ dunif(0, 1000)
- Can you change the prior for b to "break" the model?
  b ~ dnorm(0, 100)
(A sketch of one such re-fit follows.)
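As a sketch of one of these exercises (not from the original slides), re-fit with the looser intercept prior and one of the suggested sigma priors, then summarize with precis(), the rethinking package's summary function:

VSmodel2 <- map(
  alist(
    RT_ms ~ dnorm(mu, sigma),
    mu <- a + b*NumberDistractors,
    a ~ dnorm(500, 100),     # looser prior on the intercept
    b ~ dnorm(0, 100),
    sigma ~ dunif(0, 1000)   # one of the suggested alternatives
  ),
  data=VSdata2
)
precis(VSmodel2)   # MAP estimates with standard deviations and intervals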
Why bother?
- Once the model makes sense, it gives us an answer pretty close to standard regression:
  MAP: a=832.7, b=41.4, sigma=327.5
  Standard: a=832.9, b=41.4
- The difference is in the question being asked
Standard linear regression
- What parameter values a, b minimize the prediction error?
- This is necessarily just a pair of numbers a, b
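As an equation, least squares picks the single pair (a, b) that minimizes the squared prediction error:

\[
(\hat{a}, \hat{b}) = \arg\min_{a,\, b} \sum_i \big( RT_i - (a + b\, x_i) \big)^2
\]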
MAP estimates
- What parameter values a, b maximize the posterior distribution?
- The maximal values are just a pair of numbers a, b
- But the posterior contains a lot more information:
  Uncertainty about the estimates
  Uncertainty about the prediction
- Compare to confidence intervals
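For example, that extra information can be summarized with highest posterior density intervals over the samples; a sketch using rethinking's HPDI() (0.89 is the package's conventional interval level):

postSamples <- extract.samples(VSmodel, n=10000)
HPDI(postSamples$a, prob=0.89)   # interval for the intercept
HPDI(postSamples$b, prob=0.89)   # interval for the slope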
Posterior
- You can ask all kinds of questions about predictions and so forth by just using probability
- For example, what is the posterior distribution of the predicted mean value for 35 distractors?

mu_at_35 <- post$a + post$b * 35
dev.new()   # make a new plot window
dens(mu_at_35, col=rangi2, lwd=2, xlab="mu|NumDistract=35")
Posterior
- You can ask all kinds of questions about predictions and so forth by just using probability
- What is the probability that the mean RT_ms at 35 distractors is <= 2200?

length(mu_at_35[mu_at_35 <= 2200]) / length(mu_at_35)
[1]

- Note, this estimate is computed by considering "all" possible values of a, b, sigma
- Most of those values are close to the "traditional" values, but some are rather different
- This variability can matter!
Posterior
- What is the probability that predicted RT_ms is greater than 3000 for 42 distractors?
- What is the probability that a is more than 900?
- What is the probability that b is less than 35?
- Explore: what happens when you set
  numVariableLines=100
  numVariableLines=10000
- What happens to these probability estimates when you change the priors?
(A sketch of these computations follows.)
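A sketch of how these could be computed from the posterior samples (not from the original slides; note the first question is about a predicted observation, so it needs sigma as well as mu):

mu_at_42 <- post$a + post$b * 42
RT_at_42 <- rnorm(length(mu_at_42), mean=mu_at_42, sd=post$sigma)   # simulate predicted RTs
mean(RT_at_42 > 3000)   # P(predicted RT > 3000 | 42 distractors)
mean(post$a > 900)      # P(a > 900)
mean(post$b < 35)       # P(b < 35)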
Conclusions
- That wasn't so bad!
- It does take more care than a standard linear regression analysis
- You get a lot more out of it!