Comparing Bayesian and Frequentist Inference for Decision-Making

Presentation transcript:

Comparing Bayesian and Frequentist Inference for Decision-Making
Presentation at SREE, March 2, 2017
Jesse Chandler · Mariel Finucane · Ignacio Martinez · Alexandra Resch · Jeffrey Terziev

Motivation for the study
We presented results from a small pilot of an ed tech product. The evaluation was added after the pilot began and was underpowered.
Goals of our technical assistance: help the district learn whether its initiatives are effective, and build district capacity for evidence use and generation.
Our summary: promising results, but inconclusive. There were strong reactions to calling the results promising given the lack of statistical significance. It became clear that audiences default to p<0.05, which made us reevaluate our own defaults.

Can schools generate useful evidence?
Observations: Schools need to make decisions whether or not an effectiveness study is planned or feasible. Operational decisions take priority over evaluation. Even when a study is desired, resource constraints affect its design. Study findings are often not useful to decision-makers.
What can districts learn from everyday decisions? Is there a better way to present information to decision-makers?

This study: Do people make different decisions?
Using an online platform, we showed a convenience sample information about two hypothetical school district decisions, each a choice between two software products. In both cases there was some evidence that the new software is more effective, but participants were told that switching takes time and money. There is no "correct" answer: it depends on how you value the costs and benefits and on your risk tolerance.
For each scenario, we asked participants: What would you decide to do? How confident do you feel about your choice?

Some caveats
We are comparing one particular way of presenting frequentist results (null hypothesis testing using the defaults common in program evaluation in education) with one particular way of presenting Bayesian results (the posterior probability that the treatment is effective). The appropriate methods and the ideal presentation of results will vary with the application at hand. We test this with a convenience sample.

Why we chose a Bayesian comparison
We thought it could produce inferences better aligned with how decision-makers think: P(truth|data) rather than P(data|truth). Findings can be phrased as probabilistic statements (e.g., there is an 80% chance the intervention has a positive effect on student achievement). There are other possible benefits, but they are not examined here.
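The contrast between P(data|truth) and P(truth|data) can be written out explicitly. The notation below is an assumption added for illustration and does not come from the slides: θ is the true difference in achievement between the two products, y is the pilot data, and T is a test statistic with observed value t_obs.

```latex
% Frequentist p-value: a statement about the data, assuming no effect.
p = P\bigl(|T| \ge |t_{\mathrm{obs}}| \,\bigm|\, \theta = 0\bigr)

% Bayesian posterior probability: a statement about the effect, given the data.
P(\theta > 0 \mid y) = \int_{0}^{\infty} p(\theta \mid y)\, d\theta,
\qquad
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{\int p(y \mid \theta')\, p(\theta')\, d\theta'}
```

The second quantity is the kind of statement quoted in the example above ("an 80% chance the intervention has a positive effect"); it requires a prior p(θ), which the p-value does not.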

The scenario
You are a curriculum coordinator for a school district and need to decide whether to stick with a current technology or switch to a new one. The district conducted a pilot in 10 classrooms, randomly assigning 5 classrooms to each product. The products cost the same, but switching carries some transition costs.
Each condition sees a different presentation of the results and is then asked: Based on the data, your recommendation is to: use [existing product]; use [new product]; or collect more data before deciding which software to use.

Randomized crossover design
Within subjects: each participant sees both the Bayesian and the frequentist version, in randomized order. Which version is paired with the math scenario and which with the reading scenario is also randomized.
Between subjects: text only vs. text + graph.
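The design above can be sketched as a small assignment routine. This is a minimal illustration, not the procedure the authors used: the function name, the condition labels, and the use of simple independent coin flips (rather than, say, blocked randomization) are all assumptions.

```python
import random


def assign_conditions(participant_ids, seed=0):
    """Hypothetical sketch of the crossover assignment described above.

    Within subjects: every participant sees both framings, in random order,
    with the math/reading scenarios randomly paired to the framings.
    Between subjects: each participant sees either text only or text + graph.
    """
    rng = random.Random(seed)
    assignments = {}
    for pid in participant_ids:
        # Random order of the two framings.
        framings = ["bayesian", "frequentist"]
        rng.shuffle(framings)
        # Random order of the two scenarios, drawn independently, so the
        # framing-to-scenario pairing is random as well.
        subjects = ["math", "reading"]
        rng.shuffle(subjects)
        assignments[pid] = {
            "display": rng.choice(["text_only", "text_plus_graph"]),
            "scenarios": list(zip(framings, subjects)),
        }
    return assignments


if __name__ == "__main__":
    for pid, plan in assign_conditions(range(4), seed=1).items():
        print(pid, plan)
```

Each entry in `scenarios` is a (framing, subject) pair in presentation order, so an analysis can condition on the first scenario only or model order effects.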

All conditions see average scores
“Your data specialist tells you that on average, the students in the classrooms that used the new MathCoach software scored 10.31 points higher on the year-end tests than the students in the classrooms that used MathTech.”

Interpretation differs by condition
Standard frequentist: The 95% confidence interval of the difference in test scores between the two groups of classrooms… includes 0, so they cannot reject the hypothesis that the interventions have the same effect.
Bayesian: There is a 77% chance that the new technology improves achievement, and a 23% chance that the new software decreases achievement.
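To see how the same estimate can yield both statements, here is a minimal sketch under stated assumptions. It uses a normal approximation and a flat prior, so the posterior for the effect is centered on the observed difference with the standard error as its spread. The 10.31-point difference is from the scenario; the standard error of 14.0 is hypothetical, chosen only so the output resembles the numbers shown to participants. The slides do not report the model or the standard error actually used.

```python
from statistics import NormalDist


def summarize(diff, se, level=0.95):
    """Frequentist and Bayesian summaries of one estimated difference.

    Assumes a normal sampling distribution and, for the Bayesian summary,
    a flat prior, so the posterior is Normal(diff, se).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    ci_low, ci_high = diff - z * se, diff + z * se
    # Posterior probability that the true effect is negative, then positive.
    p_negative = NormalDist(mu=diff, sigma=se).cdf(0.0)
    return {
        "ci_low": ci_low,
        "ci_high": ci_high,
        "rejects_null": not (ci_low <= 0.0 <= ci_high),
        "p_positive": 1.0 - p_negative,
        "p_negative": p_negative,
    }


if __name__ == "__main__":
    # diff is from the scenario; se=14.0 is a hypothetical value.
    result = summarize(diff=10.31, se=14.0)
    print(f"95% CI: ({result['ci_low']:.2f}, {result['ci_high']:.2f})")
    print(f"Reject null of no difference: {result['rejects_null']}")
    print(f"P(effect > 0 | data): {result['p_positive']:.2f}")
```

With these inputs the interval spans zero, so the null is not rejected, while the posterior probability of a positive effect is about 0.77. A pilot with only 10 classrooms would more properly call for a t distribution or a hierarchical model; the normal approximation is used here only to keep the sketch short.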

The standard treatment

The Bayesian treatment

The sample
Convenience sample of 280 participants gathered through Amazon Mechanical Turk. Samples drawn from MTurk are relatively young and well educated (for an overview see Chandler & Shapiro, 2016).
Our sample: 56% male; mean age 37 years (SD = 12); 48% have at least a college degree.
We asked four factual questions about the scenarios and excluded 11 participants who had more than 2 incorrect answers.

Bayesian results are actionable

Graphs are actionable

Bayesian results increase confidence

Bayesian results are easier to understand
Showing results from the first scenario only. The second scenario is complicated by a practice effect but produces substantially the same interpretation.