PSY 626: Bayesian Statistics for Psychological Science (presentation transcript)

1 About priors
Greg Francis, PSY 626: Bayesian Statistics for Psychological Science, Fall 2018, Purdue University (12/7/2018)

2 Levels of priors
Flat prior: you know nothing; often improper
Super-vague but proper: N(0, )
Weakly informative prior: N(0, 10)
Generic weakly informative prior: N(0, 1)
Specific informative prior: N(0.4, 0.2)

3 How to pick a prior
Three main issues to consider:
Technical/computational: some priors cause problems with Stan or MCMC methods, so sometimes you pick a prior to avoid these problems.
Model definition: you have to define a prior but have little knowledge to guide you; default priors fill the gap of knowledge.
Model definition: your model specifies a narrow range of parameter values; use informative priors.

4 Technical issues
Using a flat prior tends to give parameter estimates similar to frequentist approaches (e.g., linear regression): the mean of the posterior distribution matches the regression estimates.
But a flat prior can introduce instability in Stan/MCMC as the sampler searches around parameter space.
There are similar technical problems with hard limits on priors: instead of Uniform(0, 1), use N(0.5, 0.5).
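The soft replacement can be checked numerically. The sketch below (Python, purely for illustration; the course models themselves are fit with brms in R) computes how much probability mass N(0.5, 0.5) keeps on the original (0, 1) range, using only the standard library:

```python
import math

def normal_cdf(x, mu, sd):
    """Normal CDF computed from the error function (no SciPy needed)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

# Probability mass that the soft prior N(0.5, 0.5) keeps on (0, 1),
# the support of the hard Uniform(0, 1) prior it replaces:
mass_inside = normal_cdf(1.0, 0.5, 0.5) - normal_cdf(0.0, 0.5, 0.5)
print(round(mass_inside, 3))  # 0.683
```

About 68% of the prior mass stays on the unit interval, but the density never drops to exactly zero at the boundaries, so the sampler faces no hard wall in parameter space.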

5 Technical issues
If using non-informative/flat priors, it is appropriate to justify them as providing computational benefits. Something like: "Although we felt parameter A could be anywhere between 0 and 1, to promote model convergence we set its prior to be N(0.5, 0.5)."

6 Weakly informative prior
You may not know what range of values corresponds to a parameter, but you may want to provide constraints anyhow.
Shrinkage (often called regularization) offers benefits even when you do not know the best value to shrink to (the mean of the prior).
More generally, you may not know what values are appropriate, but you probably know some values that are inappropriate:
Ridiculous values for a mean or standard deviation in a reaction time experiment (too short or too long)
Family income for children in an inner-city school

7 Weakly informative prior
The overall goal is to rule out unreasonable parameter values; there is less concern over having the prior specify the true parameter value.
For example, reaction times are rarely shorter than 150 milliseconds, even for the simplest of tasks (shorter RTs are often interpreted as "anticipatory" responses rather than reactions).
N(150, 1000) is about the same as N(0, 1000).

8 Weakly informative prior
Err on the side of broadness: N(150, 2000). A broader prior allows for more robustness in model fitting.
In contrast, if your reaction time task yields RT values around 1500 ms, the N(150, 1000) prior is going to have a hard time finding the posterior distribution because everything is in the tails.
A broader prior has a cost: loss of precision if the data are "typical", giving a broader posterior distribution.
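The tail problem can be quantified with a quick calculation (Python sketch for illustration; the numbers, not the language, are the point). Under N(150, 1000), an RT near 1500 ms sits 1.35 prior standard deviations out; the broader N(150, 2000) puts roughly three times as much prior mass at or beyond that value:

```python
import math

def normal_sf(x, mu, sd):
    """Upper-tail probability P(X > x) for a normal distribution."""
    return 0.5 * (1.0 - math.erf((x - mu) / (sd * math.sqrt(2.0))))

tail_narrow = normal_sf(1500, 150, 1000)  # prior mass beyond 1500 ms under N(150, 1000)
tail_broad = normal_sf(1500, 150, 2000)   # same mass under the broader N(150, 2000)
print(round(tail_narrow, 3), round(tail_broad, 3))  # about 0.089 and 0.250
```

So the broad prior treats data around 1500 ms as plausible rather than extreme, at the cost of spreading mass over values the narrow prior would have ruled out.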

9 Remember the posterior
On the other hand, the goal of a Bayesian analysis is to identify the posterior distribution of a parameter.
Do not be too obsessed with just the mean of the posterior distribution.
A narrower posterior distribution is surely better than a broad distribution.

10 Remember the likelihood
The likelihood characterizes how your data are generated:
Normal distribution
Ex-Gaussian distribution
Logit model
The description of the data-generation process is as important as specifying the priors for the likelihood parameters. The likelihood needs to be justified just as much as (perhaps more than) the priors.
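As a concrete instance of a non-normal likelihood choice, the ex-Gaussian is a Gaussian component plus an exponential tail, which captures the right skew typical of RT distributions. A minimal simulation sketch (Python; the parameter values are made up for illustration, not taken from any data set in the lecture):

```python
import random

random.seed(0)

def ex_gaussian(mu, sigma, tau):
    """Draw one ex-Gaussian sample: a Gaussian component plus an exponential tail."""
    return random.gauss(mu, sigma) + random.expovariate(1.0 / tau)

# Hypothetical RT-like parameters (ms); the mean of an ex-Gaussian is mu + tau
rts = [ex_gaussian(400, 40, 150) for _ in range(100_000)]
mean_rt = sum(rts) / len(rts)
print(round(mean_rt))  # close to 550 (= mu + tau)
```

If the data are skewed like this, a normal likelihood misdescribes the generating process no matter how carefully its priors are chosen.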

11 Complications
Depending on the model/likelihood, setting broad/uninformative priors can actually be very constraining.
Especially for complex models, an effort to use broad priors can lead to weird effects, where lots of prior "weight" is put on rare parameter values.
These are subtle effects: my advice is to get expert advice (which is often reflected in default settings).

12 Informative priors
You get the most benefit from a Bayesian analysis by using informative priors: more precise models are easier to test because they make more precise predictions.
Informative priors are not the norm in Bayesian analyses, but you should constantly look for the opportunity to use them. They can come from many different sources.

13 Subjective/Objective priors
If you start to read the history/literature of Bayesian methods, you will see discussions about what a prior is.
Objective priors: expressions of models, principles, scientific consensus, or computational technicalities.
Subjective priors: the scientist's "belief".
Personally, I find the "belief" approach to priors to be problematic. As a scientist, I hardly care about another researcher's beliefs, nor would I directly include my beliefs in my analysis.

14 Subjective priors
Moreover, there seem to be fundamental problems with priors as beliefs.
The basic idea is that a prior expresses your belief about some parameter value; after gathering and analyzing data, the posterior describes how your belief should be modified.

15 Subjective priors
But this interpretation only makes sense if your prior is a reasonable characterization of what you (should) believe. How could that happen?
Maybe you have been doing Bayesian analyses since birth? (Unlikely.)
Maybe you use some non-Bayesian method for establishing beliefs? If it works well, then why not continue doing that instead of Bayesian analysis? If it does not work well, then this is a poor starting point for your Bayesian analysis.

16 Virtues
Gelman & Hennig (2016) argue that there is no one way to think about priors (and data analysis more generally). They recommend thinking about "virtues" for data analysis.
Instead of objectivity, think about: transparency, consensus, impartiality, correspondence to observable reality.
Instead of subjectivity, think about: multiple perspectives, context dependence.

17 Transparency
Clear and unambiguous definitions of concepts: challenging for many models in the social sciences, but some attempt is better than nothing (a weakly informative prior is better than a non-informative prior).
Open planning and following agreed protocols.
Full communication of reasoning and procedures, spelling out (potentially unverifiable) assumptions and potential limitations.
Basically, be honest about what you have done and why.

18 Consensus
Accounting for relevant knowledge and existing related work. For example, Zelano et al. (2016) reported that memory improved for items studied while inhaling compared to items studied while exhaling (d = 0.86); this is much larger than the best-known mnemonic strategy (d = 0.49).
Following generally accepted rules where possible and reasonable: do not make up a new measure or analysis method for your investigation; if you have to make up something new, you need to validate it.
Provision of rationales for consensus and unification: debates need some means of reaching conclusions. Scientists should be able to agree on what kinds of results would make them change their minds; if not, then you are probably not having a scientific debate.

19 Impartiality
Thorough consideration of relevant and potentially competing theories and points of view.
Thorough consideration and, if possible, removal of potential biases: factors that may jeopardize consensus and the intended interpretation of results.
Openness to criticism and exchange.
In terms of priors, you need to be open to the possibility that other scientists could come up with reasonable but different priors that produce different conclusions.

20 Correspondence to reality
Clear connection of concepts and models to observables. This is definitely about priors; they are not just beliefs!
Clear conditions for reproduction, testing, and falsification. This is more about experimental design.

21 Multiple perspectives and Context Dependence
Recognition of dependence on specific contexts and aims: reality and facts are only accessible through individual personal experiences, and different people have different skill sets and resources.
Honest acknowledgment of the researcher's position, goals, experiences, and subjective point of view: different information and different viewpoints can be valuable.
Know your own limitations.

22 Investigation of Stability
Consequences of alternative decisions and assumptions that could have been made in the analysis: other models that could have been considered, other comparisons that could be made. Different priors may change your conclusions.
Variability and reproducibility of conclusions on new data: new data may change your conclusions.
Respect the uncertainty in your data and in your model analysis.

23 Using the virtues
Every prior should have some kind of justification (a sentence) explaining why it is being used. Gelman & Hennig (2016) give an example.

24 Informative prior?
Rule of thumb: compare the standard deviation of the posterior to the standard deviation of the prior. If the posterior standard deviation is more than 0.1 times the prior standard deviation, then the prior distribution is "informative", and you should double-check that the prior makes sense for your situation.
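The rule of thumb is easy to mechanize. This sketch (Python; the function name is mine, not from the lecture) encodes the check and applies it to the two ratios worked out on the slides that follow:

```python
def prior_is_informative(posterior_sd, prior_sd, threshold=0.1):
    """Rule of thumb: the prior counts as informative when the posterior sd
    exceeds `threshold` times the prior sd."""
    return (posterior_sd / prior_sd) > threshold

# Smiles/leniency intercept: posterior sd 0.27 against prior sd 10
print(prior_is_informative(0.27, 10))  # False (ratio 0.027)
# ADHD DosageD60 slope: posterior sd 0.99 against prior sd 1
print(prior_is_informative(0.99, 1))   # True (ratio 0.99)
```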

25 Informative prior?
For the smiles and leniency data set, we ran a model with some priors on the slopes:

model2 = brm(Leniency ~ SmileType, data = SLdata,
             iter = 2000, warmup = 200, chains = 3, thin = 2,
             prior = c(prior(normal(0, 10), class = "Intercept"),
                       prior(normal(0, 10), class = "b"),
                       prior(cauchy(0, 5), class = "sigma")))
# Intercept is first level read in (False SmileType)

Summary output (estimates not shown):
Family: gaussian
Links: mu = identity; sigma = identity
Formula: Leniency ~ SmileType
Data: SLdata (Number of observations: 136)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Population-Level Effects (Estimate, Est.Error, l-95% CI, u-95% CI, Eff.Sample, Rhat): Intercept, SmileTypeFelt, SmileTypeMiserable, SmileTypeNeutral
Family Specific Parameters: sigma
Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

26 Informative prior?
We can pull out the posterior information:

> post <- posterior_samples(model2)
> FalseLeniency <- post$b_Intercept
> sd(FalseLeniency)
> plot(density(FalseLeniency))

The prior had sd = 10. The ratio of posterior sd to prior sd is 0.27/10 = 0.027, so we did not use an informative prior.

27 Informative prior?
In Lecture 10, we ran a model with a ("bad") prior for a slope, using the ADHD data set (D60 condition):

model4 = brm(CorrectResponses ~ Dosage + (1 | SubjectID), data = ATdata,
             iter = 2000, warmup = 200, chains = 3, thin = 2,
             prior = c(prior(normal(20, 1), class = "b", coef = "DosageD60")))
print(summary(model4))

Summary output (estimates not shown):
Family: gaussian
Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ Dosage + (1 | SubjectID)
Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2; total post-warmup samples = 2700
Group-Level Effects: ~SubjectID (Number of levels: 24): sd(Intercept)
Population-Level Effects (Estimate, Est.Error, l-95% CI, u-95% CI, Eff.Sample, Rhat): Intercept and the Dosage coefficients
Family Specific Parameters: sigma

28 Informative prior?
Looking at the posteriors:

> post <- posterior_samples(model4)
> D60 <- post$b_DosageD60
> sd(D60)
> plot(density(D60))

The prior had sd = 1. The ratio of posterior sd to prior sd is 0.99/1 = 0.99: an informative prior! Is it "bad"?

29 Theory
Informative priors often come from theory, especially from theories that have defined mechanisms. Mechanisms are fundamental to science.
It is one thing to say that reaction times increase with set size in a visual search task. This theory allows you to identify terms of a model that can be estimated from data (e.g., look for a positive effect of set size).
It is something else to say that reaction times increase with set size because of a serial process that moves attention from element to element. This theory allows you to predict a roughly 2:1 slope ratio for target-absent compared to target-present trials.
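The 2:1 prediction follows from serial self-terminating search: on target-present trials attention stops, on average, halfway through the display ((n + 1)/2 items), while on target-absent trials every one of the n items must be inspected. A small Monte Carlo sketch (Python; the per-item time and base RT are made-up values, since only the slope ratio matters):

```python
import random

random.seed(1)
TIME_PER_ITEM = 50.0  # hypothetical ms spent per inspected item
BASE_RT = 300.0       # hypothetical non-search time (ms)

def trial_rt(set_size, target_present):
    """Serial self-terminating search: stop at the target, or after all items."""
    if target_present:
        inspected = random.randint(1, set_size)  # target found at a random position
    else:
        inspected = set_size
    return BASE_RT + TIME_PER_ITEM * inspected

def mean_rt(set_size, target_present, n_trials=20_000):
    return sum(trial_rt(set_size, target_present) for _ in range(n_trials)) / n_trials

# Search slopes (ms per item) estimated from set sizes 4 and 12
slope_present = (mean_rt(12, True) - mean_rt(4, True)) / 8
slope_absent = (mean_rt(12, False) - mean_rt(4, False)) / 8
print(round(slope_absent / slope_present, 2))  # close to 2
```

A mechanism like this turns into an informative prior: the slope-ratio parameter can be given a prior concentrated near 2 rather than left vague.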

30 Not just empirical results
Consider superconductivity: discovered in 1911, it plays an important role in fMRI.
How do we know superconductivity works the same in Lausanne and West Lafayette, Indiana? Mountains? Lake versus river? Brick buildings? French versus English? 7T versus 3T?
It is not just that superconductivity worked before! Every new environment is different.

31 Mechanisms
There is a theory of superconductivity that describes the mechanisms that produce it: the Meissner effect (1930s) and Cooper pairs in quantum mechanics (1950s).
This theory predicts when superconductivity works and when it does not. That is how engineering works.
It is not perfect: high-temperature superconductivity remains unexplained. That is where science is being done.
We know/believe fMRI will work in both Lausanne and West Lafayette because we understand the mechanisms that determine when superconductivity will (and will not) happen.

32 Getting to mechanisms
If we want to have successful/robust science, our long-term goal is the identification of mechanisms. We might not get there in our lifetime:
Exploratory work
Confirmatory work
Proposing theories
Testing theories
It is not just successful prediction from a statistical model. That can be valuable, but it is not enough.

33 Plague
Paul-Louis Simond (1898) discovered that the plague was transmitted by fleas on rats. Once a mechanism is identified, it suggests what to do: to reduce occurrence of the plague, reduce the number of rats and contact with rats.
Kill rats
Keep dogs and cats
Seal food containers
Set rat traps
Avoid rats
Don't bother with quarantining the family of an infected person.
These are not precise predictions about the magnitude of the benefit, but they should all work to some extent.

34 Mechanisms in social sciences
Psychology faces challenges because there are very few proposed mechanisms. Even when something seems to be a strong effect, we cannot judge when it will apply and when it will not.
Neuroscience and medicine have some hope because scientists naturally seek out mechanisms based on biology, but there are other problems with sample sizes and costs of investigations.
Keep in mind that the long-term goal is to identify mechanisms, and plan studies and analyses accordingly.

35 Conclusions
There are various types of priors. If you are in the social sciences, you get priors from:
Defaults, to help model convergence/estimation
Other literature (the range of plausible values)
Theory
There are no simple procedures for producing good priors. Be transparent and be honest.

