PSY 626: Bayesian Statistics for Psychological Science


About priors
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science, Fall 2018
Purdue University
12/7/2018

Levels of priors
- Flat prior: you know nothing; often improper
- Super-vague but proper: N(0, 1000000)
- Weakly informative prior: N(0, 10)
- Generic weakly informative prior: N(0, 1)
- Specific informative prior: N(0.4, 0.2)
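One way to get intuition for these levels is to translate each prior's standard deviation into the width of its central 95% interval. A base-R sketch (the labels match the slide; the computation is illustrative):

```r
# Standard deviations for each "level" of prior on the slide
prior_sds <- c(super_vague = 1000000, weakly_informative = 10,
               generic = 1, specific = 0.2)

# Width of the central 95% interval of a normal prior: 2 * 1.96 * sd
widths <- 2 * qnorm(0.975) * prior_sds
round(widths, 2)
# The "generic" N(0, 1) prior says the parameter is almost surely
# within about +/- 2; the super-vague prior says almost nothing.
```

The same arithmetic explains why a super-vague prior is effectively flat over any region the data could plausibly point to.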

How to pick a prior
There are three main issues to consider:
- Technical/computational: some priors cause problems for Stan or other MCMC methods, so sometimes you pick a prior to avoid those problems
- Model definition: you have to define a prior, but you have little knowledge to guide you; default priors fill the gap in knowledge
- Model definition: your model specifies a narrow range of parameter values; this calls for informative priors

Technical issues
- Using a flat prior tends to give parameter estimates similar to frequentist approaches (e.g., linear regression): the mean of the posterior distribution matches the regression estimates
- But a flat prior can introduce instability in Stan/MCMC as it searches around parameter space
- There are similar technical problems with hard limits on priors: instead of Uniform(0, 1), use N(0.5, 0.5)
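As a quick check on the suggested replacement, N(0.5, 0.5) still places most of its mass inside (0, 1), but its tails are smooth rather than dropping to zero at the old hard boundaries, which is what helps the sampler:

```r
# Probability mass that the N(0.5, 0.5) prior assigns to the interval (0, 1)
mass_inside <- pnorm(1, mean = 0.5, sd = 0.5) - pnorm(0, mean = 0.5, sd = 0.5)
mass_inside  # about 0.68

# Unlike Uniform(0, 1), the density just outside the interval is
# small but nonzero, so there is no hard edge for MCMC to hit
dnorm(1.2, mean = 0.5, sd = 0.5)
```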

Technical issues
If you use non-informative/flat priors, it is appropriate to justify them as providing computational benefits. Something like: "Although we felt parameter A could be anywhere between 0 and 1, to promote model convergence we set its prior to be N(0.5, 0.5)."

Weakly informative prior
- You may not know what range of values is plausible for a parameter, but you may want to provide constraints anyhow
- Shrinkage (often called regularization) offers benefits even when you do not know the best value to shrink to (the mean of the prior)
- More generally, you may not know which values are appropriate, but you probably know some values that are inappropriate:
  - Ridiculous values for a mean or standard deviation in a reaction time experiment (too short or too long)
  - Ridiculous values for family income for children in an inner-city school

Weakly informative prior
- The overall goal is to rule out unreasonable parameter values
- There is less concern about having the prior specify the true parameter value
- For example, reaction times are rarely shorter than 150 milliseconds, even for the simplest of tasks (shorter RTs are often interpreted as "anticipatory" responses rather than reactions)
- N(150, 1000) is about the same as N(0, 1000)
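To see why N(150, 1000) and N(0, 1000) are nearly interchangeable, a quick base-R check: with a standard deviation of 1000, shifting the mean by 150 ms barely moves the density anywhere.

```r
# Compare two broad priors that differ only in their mean
x    <- seq(-4000, 4000, by = 1)
d0   <- dnorm(x, mean = 0,   sd = 1000)  # N(0, 1000)
d150 <- dnorm(x, mean = 150, sd = 1000)  # N(150, 1000)

# Largest pointwise difference, relative to the peak density
max_diff <- max(abs(d0 - d150))
max_diff / max(d0)  # about 0.09: the two priors are nearly identical
```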

Weakly informative prior
- Err on the side of broadness: N(150, 2000)
- A broader prior allows for more robustness in model fitting
- In contrast, if your reaction time task produces RTs around 1500 ms, the N(150, 1000) prior will have a hard time finding the posterior distribution because everything is in the tails
- A broader prior has a cost: loss of precision if the data are "typical" (a broader posterior distribution)

Remember the posterior
- On the other hand, the goal of a Bayesian analysis is to identify the posterior distribution of a parameter
- Do not be too obsessed with just the mean of the posterior distribution
- A narrower posterior distribution is surely better than a broad one

Remember the likelihood
- The likelihood characterizes how your data are generated: a normal distribution, an ex-Gaussian distribution, a logit model, ...
- The description of the data-generation process is as important as specifying the priors for the likelihood parameters
- The likelihood needs to be justified at least as much as (perhaps more than) the priors
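As a sketch of one such likelihood choice: an ex-Gaussian (the sum of a normal and an independent exponential component) is a common generative model for reaction times, because it produces the right-skewed tail typical of RT data. The parameter values below are hypothetical, not from any data set in the lecture.

```r
# Simulate reaction times from an ex-Gaussian distribution:
# a normal component (mu, sigma) plus an exponential component (mean tau)
set.seed(1)
mu <- 400; sigma <- 50; tau <- 100
rt <- rnorm(1e5, mean = mu, sd = sigma) + rexp(1e5, rate = 1 / tau)

mean(rt)  # close to mu + tau
# The exponential component produces the long right tail;
# a plain normal likelihood cannot capture that skew
```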

Complications
- Depending on the model/likelihood, setting broad/uninformative priors can actually be very constraining
- Especially for complex models, an effort to use broad priors can lead to weird effects, where lots of prior "weight" is put on rare parameter values
- These are subtle effects: my advice is to get expert advice (which is often reflected in default settings)

Informative priors
- You get the most benefit from a Bayesian analysis by using informative priors
- More precise models are easier to test because they make more precise predictions
- Informative priors are not the norm in Bayesian analyses, but you should constantly look for opportunities to use them
- They can come from many different sources

Subjective/Objective priors
- If you read the history/literature of Bayesian methods, you will see debates about what a prior is
- Objective priors: expressions of models, principles, scientific consensus, or computational technicalities
- Subjective priors: the scientist's "belief"
- Personally, I find the "belief" approach to priors problematic: as a scientist, I hardly care about another researcher's beliefs, nor would I directly include my own beliefs in my analysis

Subjective priors
- Moreover, there seem to be fundamental problems with priors as beliefs
- The basic idea is that a prior expresses your belief about some parameter value
- After gathering and analyzing data, the posterior describes how your belief should be modified

Subjective priors
- But this interpretation only makes sense if your prior is a reasonable characterization of what you (should) believe. How could that happen?
- Maybe you have been doing Bayesian analyses since birth? (unlikely)
- Maybe you use some non-Bayesian method for establishing beliefs?
  - If it works well, then why not continue doing that instead of Bayesian analysis?
  - If it does not work well, then it is a poor starting point for your Bayesian analysis

Virtues
Gelman & Hennig (2016) argue that there is no one way to think about priors (and data analysis more generally). They recommend thinking about "virtues" for data analysis.
Instead of objectivity, think about:
- transparency
- consensus
- impartiality
- correspondence to observable reality
Instead of subjectivity, think about:
- multiple perspectives
- context dependence

Transparency
- Clear and unambiguous definitions of concepts
  - Challenging for many models in the social sciences
  - Some attempt is better than nothing: a weakly informative prior is better than a non-informative prior
- Open planning and following agreed protocols
- Full communication of reasoning and procedures, spelling out (potentially unverifiable) assumptions and potential limitations
- Basically, be honest about what you have done and why

Consensus
- Accounting for relevant knowledge and existing related work
  - For example, Zelano et al. (2016) reported that memory improved for items studied while inhaling compared to items studied while exhaling (d = 0.86); this is much larger than the best known mnemonic strategy (d = 0.49)
- Following generally accepted rules where possible and reasonable
  - Do not make up a new measure or analysis method for your investigation; if you have to make up something new, you need to validate it
- Provision of rationales for consensus and unification
  - Debates need some means of reaching conclusions: scientists should be able to agree on what kinds of results would make them change their minds
  - If not, then you are probably not having a scientific debate

Impartiality
- Thorough consideration of relevant and potentially competing theories and points of view
- Thorough consideration and, if possible, removal of potential biases: factors that may jeopardize consensus and the intended interpretation of results
- Openness to criticism and exchange
- In terms of priors, you need to be open to the possibility that other scientists could come up with reasonable but different priors that produce different conclusions

Correspondence to reality
- Clear connection of concepts and models to observables
  - This is definitely about priors; they are not just beliefs!
- Clear conditions for reproduction, testing, and falsification
  - This is more about experimental design

Multiple perspectives and context dependence
- Recognition of dependence on specific contexts and aims
  - Reality and facts are only accessible through individual personal experiences
  - Different people have different skill sets and resources
- Honest acknowledgment of the researcher's position, goals, experiences, and subjective point of view
  - Different information and different viewpoints can be valuable
  - Know your own limitations

Investigation of stability
- Consequences of alternative decisions and assumptions that could have been made in the analysis
  - Other models that could have been considered, other comparisons that could have been made
  - Different priors may change your conclusions
- Variability and reproducibility of conclusions on new data
  - New data may change your conclusions
- Respect the uncertainty in your data and in your model analysis

Using the virtues
Every prior should have some kind of justification (a sentence) explaining why it is being used. Gelman & Hennig (2016) give an example:

Informative prior?
- Rule of thumb: compare the standard deviation of the posterior to the standard deviation of the prior
- If the posterior standard deviation is more than 0.1 times the prior standard deviation, then the prior distribution is "informative"
- In that case, you should double-check that the prior makes sense for your situation
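The rule of thumb is a one-line computation. Here it is in base R, using the numbers from the smiles-and-leniency example below (posterior sd of about 0.28 against a prior sd of 10):

```r
# Rule of thumb: the prior counts as "informative" when
# sd(posterior) / sd(prior) exceeds 0.1
prior_sd     <- 10
posterior_sd <- 0.28   # e.g., sd() of the posterior samples for a parameter

ratio <- posterior_sd / prior_sd
ratio              # 0.028
ratio > 0.1        # FALSE: the prior was not informative here
```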

Informative prior?
For the smiles-and-leniency data set, we ran a model with priors on the slopes:

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: Leniency ~ SmileType
   Data: SLdata (Number of observations: 136)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2;
         total post-warmup samples = 2700

Population-Level Effects:
                   Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept              5.36      0.28     4.83     5.91       2298 1.00
SmileTypeFelt         -0.45      0.40    -1.24     0.33       2173 1.00
SmileTypeMiserable    -0.44      0.39    -1.19     0.32       2376 1.00
SmileTypeNeutral      -1.24      0.39    -2.03    -0.48       2427 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     1.64      0.10     1.46     1.86       2487 1.00

Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

# Intercept is the first level read in (False SmileType)
model2 = brm(Leniency ~ SmileType, data = SLdata,
             iter = 2000, warmup = 200, chains = 3, thin = 2,
             prior = c(prior(normal(0, 10), class = "Intercept"),
                       prior(normal(0, 10), class = "b"),
                       prior(cauchy(0, 5), class = "sigma")))

Informative prior?
- We can pull out the posterior information
- The prior had sd = 10
- The ratio of posterior sd to prior sd is 0.28/10 = 0.028
- We did not use an informative prior

> post <- posterior_samples(model2)
> FalseLeniency <- post$b_Intercept
> sd(FalseLeniency)
[1] 0.2765206
> plot(density(FalseLeniency))

Informative prior?
In Lecture 10, we ran a model with a ("bad") prior for a slope (ADHD data set: D60 condition):

 Family: gaussian
  Links: mu = identity; sigma = identity
Formula: CorrectResponses ~ Dosage + (1 | SubjectID)
   Data: ATdata (Number of observations: 96)
Samples: 3 chains, each with iter = 2000; warmup = 200; thin = 2;
         total post-warmup samples = 2700

Group-Level Effects:
~SubjectID (Number of levels: 24)
              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sd(Intercept)     9.30      1.69     6.51    13.13       1369 1.00

Population-Level Effects:
          Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
Intercept    33.18      2.30    28.53    37.62       1612 1.00
DosageD15     6.28      2.10     2.20    10.58       2591 1.00
DosageD30    10.93      2.12     6.91    15.13       2429 1.00
DosageD60    17.61      1.00    15.68    19.56       2318 1.00

Family Specific Parameters:
      Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
sigma     8.07      0.75     6.78     9.66       1929 1.00

model4 = brm(CorrectResponses ~ Dosage + (1 | SubjectID), data = ATdata,
             iter = 2000, warmup = 200, chains = 3, thin = 2,
             prior = c(prior(normal(20, 1), class = "b", coef = "DosageD60")))
print(summary(model4))

Informative prior?
- Looking at the posteriors: the prior has sd = 1
- The ratio of posterior sd to prior sd is 0.99/1 = 0.99
- An informative prior! Is it "bad"?

> post <- posterior_samples(model4)
> D60 <- post$b_DosageD60
> sd(D60)
[1] 0.9978903
> plot(density(D60))

Theory
- Informative priors often come from theory, especially from theories that propose mechanisms
- Mechanisms are fundamental to science
- It is one thing to say that reaction times increase with set size in a visual search task
  - This theory allows you to identify terms of a model that can be estimated from data (e.g., look for a positive effect of set size)
- It is something else to say that reaction times increase with set size because of a serial process that moves attention from element to element
  - This theory allows you to predict a roughly 2:1 slope ratio for target-absent compared to target-present trials

Not just empirical results
- Consider superconductivity: discovered in 1911, it plays an important role in fMRI
- How do we know superconductivity works the same in Lausanne and in West Lafayette, Indiana? Mountains? Lake versus river? Brick buildings? French versus English? 7T versus 3T?
- It is not just that superconductivity worked before: every new environment is different

Mechanisms
- There is a theory of superconductivity that describes the mechanisms that produce it
  - The Meissner effect (1930s)
  - Cooper pairs in quantum mechanics (1950s)
- This theory predicts when superconductivity works and when it does not; that is how engineering works
- It is not perfect: high-temperature superconductivity remains unexplained, and that is where science is being done
- We know/believe fMRI will work in both Lausanne and West Lafayette because we understand the mechanisms that determine when superconductivity will (and will not) happen

Getting to mechanisms
- If we want successful/robust science, our long-term goal is the identification of mechanisms
- We might not get there in our lifetime:
  - Exploratory work
  - Confirmatory work
  - Proposing theories
  - Testing theories
- It is not just successful prediction from a statistical model; that can be valuable, but it is not enough

Plague
- Paul-Louis Simond (1898) discovered that the plague was transmitted by fleas on rats
- Once a mechanism is identified, it suggests what to do
- To reduce occurrence of the plague, reduce the number of rats and contact with rats:
  - Kill rats
  - Keep dogs and cats
  - Seal food containers
  - Set rat traps
  - Avoid rats
  - Don't bother with quarantining the family of an infected person
- These are not precise predictions about the magnitude of the benefit, but they should all work to some extent

Mechanisms in social sciences
- Psychology faces challenges because very few mechanisms have been proposed
- Even when something seems to be a strong effect, we cannot judge when it will apply and when it will not
- Neuroscience and medicine have some hope because scientists naturally seek out mechanisms based on biology, but there are other problems with sample sizes and costs of investigations
- Keep in mind that the long-term goal is to identify mechanisms, and plan studies and analyses accordingly

Conclusions
- There are various types of priors
- If you are in the social sciences, you get priors from:
  - Defaults that help model convergence/estimation
  - Other literature (the range of plausible values)
  - Theory
- There are no simple procedures for producing good priors
- Be transparent and be honest