Let’s do a Bayesian analysis


Let's do a Bayesian analysis
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016, Purdue University

Visual Search
A classic experiment in perception/attention involves visual search: respond as quickly as possible whether an image contains a target (a green circle) or not.
https://coglab.cengage.com/?labname=visual_search
Log in with ID=GregTest-145, password=12345678

Visual Search
A classic experiment in perception/attention involves visual search: respond as quickly as possible whether an image contains a target (a green circle) or not.
Vary the number of distractors: 4, 16, 32, 64.
Vary the type of distractors: feature (different color) or conjunctive (different color or shape).
Measure reaction time: the time between the onset of the image and the participant's response.
5 trials for each of 16 conditions: 4 numbers of distractors x 2 target conditions (present or absent) x 2 distractor types (feature or conjunctive) = 80 trials.

Visual Search
Typical results: for conjunctive distractors, response time increases with the number of distractors.

Linear model
Suppose you want to model the search time on the Conjunctive search trials when the target is Absent as a linear equation. Let's do it for a single participant. We are basically going through Section 4.4 of the text, but using a new data set. Download the files from the class web site and follow along in class.

Read in data
rm(list=ls(all=TRUE))   # clear all variables from R's memory
graphics.off()          # remove old graphics from the display
VSdata <- read.csv(file="VisualSearch.csv", header=TRUE, stringsAsFactors=FALSE)
Reads in the contents of the file VisualSearch.csv. Uses the contents of the first row as a header, and creates variables by the names of those headers (no spaces in the header names!).
VSdata2 <- subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$Target=="Absent" & VSdata$DistractorType=="Conjunction")
Creates a new variable that contains just the data for a particular participant: only the Target-absent trials, and only the Conjunction distractors.
Check the contents of VSdata2 in the R console.
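A minimal way to check the subset in the console (a sketch; the exact output depends on your data file, but with 5 trials per set size there should be 20 rows):
head(VSdata2)    # show the first few rows of the subset
str(VSdata2)     # variable names and types
nrow(VSdata2)    # number of trials in the subset (expect 20: 4 set sizes x 5 trials)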

Define the model
library(rethinking)
This loads up the R library created by the textbook author. It has some nice functions for plotting and displaying tables.
VSmodel <- map(
  alist(
    RT_ms ~ dnorm(mu, sigma),
    mu <- a + b*NumberDistractors,
    a ~ dnorm(500, 10),
    b ~ dnorm(0, 100),
    sigma ~ dunif(0, 500)
  ),
  data=VSdata2 )
Defines the linear model and assigns prior distributions to each parameter (a, b, sigma).

Prior distributions
a ~ dnorm(500, 10)
We are saying that we expect the intercept (RT_ms with 0 distractors) to be around 500. This is a claim about the population parameter, not about the data, per se.
b ~ dnorm(0, 100)
We are saying that we expect the slope to be around 0, but with a lot of possible other choices.
sigma ~ dunif(0, 500)
We are saying that sigma could be anything between 0 (a standard deviation cannot be negative!) and 500. It cannot be outside this range!
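If you want to see what these priors actually claim, you can plot them directly. A minimal sketch using base R (the plotting ranges are my own choices):
curve(dnorm(x, 500, 10), from=400, to=600, xlab="a (intercept)", ylab="prior density")
curve(dnorm(x, 0, 100), from=-400, to=400, xlab="b (slope)", ylab="prior density")
curve(dunif(x, 0, 500), from=-50, to=550, xlab="sigma", ylab="prior density")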

Maximum a Posteriori
Here, defining the model basically does all the work: the map() function takes the priors and the data, and computes the posterior distributions. The main output of the map() calculation is the set of a, b, and sigma values that have the highest probability. Often this is actually probability density rather than probability.

Posterior Distribution
Remember, these are distributions of the population parameters!


Posterior Distributions
It is more complicated because we have three parameters with a joint posterior probability density function. Finding the peak of a multidimensional surface is not a trivial problem. The map() function allows you to specify a couple of different methods. It sometimes produces errors that correspond to trouble finding the peak, such as:
Caution, model may not have converged. Code 1: Maximum iterations reached.
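If the fit fails to converge, one workaround is to give map() explicit starting values for the search. A sketch, using the rethinking map() start argument with values near the prior means (my own choices, not from the slides):
VSmodel <- map(
  alist(
    RT_ms ~ dnorm(mu, sigma),
    mu <- a + b*NumberDistractors,
    a ~ dnorm(500, 10),
    b ~ dnorm(0, 100),
    sigma ~ dunif(0, 500)
  ),
  data=VSdata2,
  start=list(a=500, b=0, sigma=250) )  # start the optimizer near the middle of the priors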

MAP estimates
summaryTable <- summary(VSmodel)
print(VSmodel)
Prints out a table of the Maximum A Posteriori estimates of a, b, and sigma.

MAP estimates
Maximum a posteriori (MAP) model fit

Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(500, 10)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)

MAP values:
        a         b     sigma
500.88486  48.29027 355.39830

Log-likelihood: -147.8

MAP estimates (2nd run)
Maximum a posteriori (MAP) model fit

Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(500, 10)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)

MAP values:
        a         b     sigma
501.66287  48.48289 388.33640

Log-likelihood: -147.61

The estimates differ slightly from the first run because map() samples random start values for the search, so the optimizer does not always stop at exactly the same place.

MAP estimate
The red line is the MAP "best fitting" straight line.
plot(RT_ms ~ NumberDistractors, data=VSdata2)
abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b"], col=col.alpha("red",1.0))

MAP estimate
The grey lines are samples of the population parameters from the posterior distribution.
numVariableLines = 2000
numVariableLinesToPlot = 20
post <- extract.samples(VSmodel, n=numVariableLines)
for(i in 1:numVariableLinesToPlot){
  abline(a=post$a[i], b=post$b[i], col=col.alpha("black", 0.3))
}

MAP estimate
Now is a good time to see if the model makes any sense (model checking). Do the priors have the influence we expected? Does the model behave reasonably?

MAP estimate
I see several issues. 1) The lines for the sampled population parameters all converge: they all have a common intercept value. 2) The "traditional" best-fit line is rather different, especially the intercept: a = 832.94, b = 41.38.

Prior
We used a ~ dnorm(500, 10), which means the intercept has to be pretty close to the value 500 (we got 501.7) regardless of the data. Suppose we use a ~ dnorm(500, 100) instead. Now we get:
MAP values:
        a         b     sigma
624.42893  45.84785 357.47658
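Refitting with the wider prior just means repeating the map() call with the new line for a (a sketch; everything else is unchanged from the model defined above):
VSmodel <- map(
  alist(
    RT_ms ~ dnorm(mu, sigma),
    mu <- a + b*NumberDistractors,
    a ~ dnorm(500, 100),   # wider prior on the intercept
    b ~ dnorm(0, 100),
    sigma ~ dunif(0, 500)
  ),
  data=VSdata2 )
print(VSmodel)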

Prior
What about a ~ dnorm(500, 500)? Now we get:
MAP values:
        a         b     sigma
818.79695  41.66933 327.17288

With a ~ dnorm(500, 1000), we get:
MAP values:
        a         b     sigma
832.69534  41.37111 327.51426

Sometimes the fit fails instead, with an error like:
Error in map(alist(RT_ms ~ dnorm(mu, sigma), mu <- a + b * NumberDistractors, :
non-finite finite-difference value [3]
Start values for parameters may be too far from MAP.
Try better priors or use explicit start values.
If you sampled random start values, just trying again may work.
Start values used in this attempt:
a = 1619.54309994221
b = -58.0206822367026
sigma = 318.546253605746

MAP Estimate
Now it looks pretty good (and closely matches the traditional fit). Check the other priors: what happens when you use sigma ~ dunif(0, 100) or sigma ~ dunif(0, 1000)? Can you change the prior for b (currently b ~ dnorm(0, 100)) to "break" the model?

Why bother?
Once the model makes sense, it gives us an answer pretty close to standard regression:
MAP:      a = 832.7, b = 41.4, sigma = 327.5
Standard: a = 832.9, b = 41.4
The difference is in the question being asked.

Standard linear regression
What parameter values a and b minimize the prediction error? The answer is necessarily just a pair of numbers, a and b.
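For comparison, the "standard" values quoted above can be obtained with ordinary least-squares regression. A minimal sketch with base R, assuming the same VSdata2 data frame:
VSlm <- lm(RT_ms ~ NumberDistractors, data=VSdata2)   # least-squares fit
coef(VSlm)                        # intercept (a) and slope (b)
abline(VSlm, col="blue", lty=2)   # add the least-squares line to an existing plot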

MAP estimates
What parameter values a and b maximize the posterior distribution? The maximal values are just a pair of numbers, a and b, but the posterior contains a lot more information: uncertainty about the estimates and uncertainty about the predictions. Compare to confidence intervals.
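One way to see that extra information is to summarize the posterior. A minimal sketch; precis() and HPDI() are functions in the rethinking package loaded earlier, and post is the sample data frame extracted above:
precis(VSmodel)            # MAP estimates with standard deviations and interval summaries
HPDI(post$b, prob=0.95)    # 95% highest posterior density interval for the slope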

Posterior
You can ask all kinds of questions about predictions and so forth by just using probability. For example, what is the posterior distribution of the predicted mean value for 35 distractors?
mu_at_35 <- post$a + post$b*35
dev.new()   # make a new plot window
dens(mu_at_35, col=rangi2, lwd=2, xlab="mu|NumDistract=35")

Posterior
You can ask all kinds of questions about predictions and so forth by just using probability. What is the probability that the mean RT_ms at 35 distractors is less than 2200?
length(mu_at_35[mu_at_35 <= 2200])/length(mu_at_35)
[1] 0.1425
Note, this estimate is computed by considering "all" possible values of a, b, and sigma. Most of those values are close to the "traditional" values, but some are rather different. This variability can matter!

Posterior
What is the probability that the predicted RT_ms is greater than 3000 for 42 distractors? What is the probability that a is more than 900? What is the probability that b is less than 35?
Explore: What happens when you set numVariableLines=100 or numVariableLines=10000? What happens to these probability estimates when you change the priors?
(One way to answer the probability questions from the posterior samples is sketched below.)
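A minimal sketch, assuming the post data frame produced by extract.samples() above. Note that a predicted RT (as opposed to the predicted mean) also involves sigma, so here it is simulated with rnorm():
mu_at_42 <- post$a + post$b*42                                       # posterior of the mean RT at 42 distractors
RT_at_42 <- rnorm(length(mu_at_42), mean=mu_at_42, sd=post$sigma)    # simulated individual RTs
mean(RT_at_42 > 3000)   # P(predicted RT > 3000 ms at 42 distractors)
mean(post$a > 900)      # P(a > 900)
mean(post$b < 35)       # P(b < 35)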

Conclusions
That wasn't so bad. It does take more care than a standard linear regression analysis, but you get a lot more out of it!