Introducing Statistical Inference with Randomization Tests Allan Rossman Cal Poly – San Luis Obispo

2 Outline
- 2 × 2 tables
  - Activity/example 1: Dolphin therapy?
  - Activity/example 2: Murderous nurse?
- Quantitative response
  - Activity/example 3: Sleep deprivation?
  - Activity/example 4: Age discrimination?
  - Activity/example 5: Memory study?
- Extensions, reflections, further reading

3 Example 1: Dolphin therapy?
Subjects who suffer from mild to moderate depression were flown to Honduras and randomly assigned to one of two treatments: dolphin therapy or control
- Dolphin therapy: 10 of 15 subjects improved
- Control: 3 of 15 subjects improved
Is dolphin therapy more effective than control?
Core question of inference:
- Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

4 Example 1 (cont.)
Standard approach: calculate a test statistic and p-value from an approximate sampling distribution (z, chi-square)
- But technical conditions do not hold
- But this would be approximate anyway
- But how does this relate to what “significance” means?
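For comparison with the simulation approach, here is a minimal Python sketch of the standard pooled two-proportion z-test this slide mentions. It assumes 10 of 15 dolphin subjects and 3 of 15 control subjects improved, the split implied by 30 subjects, 13 improvers, and the .467 difference in improvement proportions; the function name `two_proportion_z` is illustrative.

```python
import math


def two_proportion_z(success1, n1, success2, n2):
    """One-sided pooled z-test of H0: p1 = p2 against Ha: p1 > p2."""
    p1, p2 = success1 / n1, success2 / n2
    # Pooled proportion: under H0 both groups share one success rate
    pooled = (success1 + success2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail standard normal area: P(Z >= z) = erfc(z / sqrt(2)) / 2
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Assumed counts: 10 of 15 improved with dolphins, 3 of 15 in the control group
z, p = two_proportion_z(10, 15, 3, 15)
print(f"z = {z:.2f}, approximate one-sided p-value = {p:.4f}")
```

Under these assumptions the normal approximation gives z ≈ 2.58 and a one-sided p-value of roughly .005, noticeably smaller than the exact value of about .0127, which illustrates why an approximate sampling distribution can mislead with groups this small.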

5 Example 1 (cont.)
Alternative: simulate the random assignment process many times, and see how often such an extreme result occurs
- Assume no treatment effect (null model)
- Re-randomize the 30 subjects to two groups (using cards)
  - Assuming 13 improvers and 17 non-improvers regardless of group
- Determine the number of improvers in the dolphin group
  - Or, equivalently, the difference in improvement proportions
- Repeat a large number of times (turn to computer)
- Ask whether the observed result is in the tail of the distribution
  - Indicating that we saw a result that would be surprising under the null model
  - Providing evidence that dolphin therapy is more effective
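The card-shuffling procedure above can be sketched in Python as follows. This is an illustrative sketch, not the applet used in the talk; it assumes the 10-of-15 vs. 3-of-15 split implied by the totals, and `simulate_dolphin` is a hypothetical name.

```python
import random


def simulate_dolphin(reps=10_000, seed=1):
    """Re-randomize 30 subjects (13 improvers, 17 non-improvers) into two groups of 15
    under the null model; return the fraction of re-randomizations in which the
    dolphin group receives at least the observed 10 improvers."""
    rng = random.Random(seed)
    cards = [1] * 13 + [0] * 17  # 1 = improver, 0 = non-improver; fixed under the null
    observed = 10                # assumed improvers actually seen in the dolphin group
    extreme = 0
    for _ in range(reps):
        rng.shuffle(cards)
        dolphin_improvers = sum(cards[:15])  # first 15 cards form the dolphin group
        if dolphin_improvers >= observed:
            extreme += 1
    return extreme / reps


p_hat = simulate_dolphin()
print(f"Estimated p-value: {p_hat:.4f}")
```

With 10,000 repetitions the estimate should land near the exact value of about .0127; because this is a simulation, the third decimal place varies with the seed.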

6 Example 1 (cont.)
(Dolphin study)

7 Example 1 (cont.)
Conclusion: The experimental result is statistically significant
- What does that mean; what is the logic behind it?
The experimental result is very unlikely to occur by chance alone
- A difference in success proportions at least as large as .467 (in favor of the dolphin group) would happen in less than 2% of all possible random assignments if dolphin therapy were not effective

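8 Example 1 (cont.)
Exact randomization distribution
- Hypergeometric distribution
- Fisher’s Exact Test
- p-value = $P(X \ge 10) = \sum_{k=10}^{13} \binom{13}{k}\binom{17}{15-k} \Big/ \binom{30}{15} = 1{,}964{,}384 / 155{,}117{,}520 \approx .0127$, where $X$ is the number of improvers among the 15 subjects assigned to the dolphin group (observed: 10)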
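A sketch of the exact calculation, summing hypergeometric probabilities with Python's `math.comb`. The default arguments assume the same 10-of-15 vs. 3-of-15 split; `fisher_upper_tail` is an illustrative name.

```python
from math import comb


def fisher_upper_tail(total=30, improvers=13, group_size=15, observed=10):
    """Return (numerator, denominator) of P(X >= observed), where X is the number of
    improvers landing in a group of `group_size` when `improvers` of `total` subjects
    would improve regardless of treatment (hypergeometric upper tail)."""
    numerator = sum(
        comb(improvers, k) * comb(total - improvers, group_size - k)
        for k in range(observed, min(improvers, group_size) + 1)
    )
    return numerator, comb(total, group_size)


num, den = fisher_upper_tail()
print(num, den, round(num / den, 4))  # 1964384 155117520 0.0127
```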

9 Example 2: Murderous Nurse?
Murder trial: U.S. vs. Kristin Gilbert
- Accused of giving patients fatal doses of a heart stimulant
- Data presented for 18 months of 8-hour shifts
- Relative risk of a death on shifts Gilbert worked vs. shifts she did not work: 6.34

10 Example 2 (cont.)
Structurally the same as the previous example, but with one crucial difference
- No random assignment to groups
  - Observational study
  - Allows many potential explanations other than “random chance”
  - Confounding variables
    - Perhaps she worked in the intensive care unit or on the night shift
- Is statistical significance still relevant?
  - Yes, to see whether “random chance” can plausibly be ruled out as an explanation
  - Some statisticians disagree

11 Example 2 (cont.)
Simulation results
- p-value: less than 1 in a billion
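The slide reports a simulated p-value. As a swapped-in alternative, the exact hypergeometric tail can be computed directly. This sketch assumes the counts commonly reported for the Gilbert case (a death occurred on 40 of the 257 shifts Gilbert worked and on 34 of the 1,384 shifts she did not); these counts are not in the transcript, but they reproduce the relative risk of 6.34.

```python
from fractions import Fraction
from math import comb

# ASSUMED counts (not in the transcript): shifts on which at least one death occurred
gilbert_shifts, gilbert_death_shifts = 257, 40
other_shifts, other_death_shifts = 1384, 34

relative_risk = (gilbert_death_shifts / gilbert_shifts) / (other_death_shifts / other_shifts)

total_shifts = gilbert_shifts + other_shifts                    # 1641
total_death_shifts = gilbert_death_shifts + other_death_shifts  # 74

# Null model: the 74 death shifts fall at random among all 1641 shifts, so the number
# landing among Gilbert's 257 shifts is hypergeometric. Sum its exact upper tail.
upper = min(total_death_shifts, gilbert_shifts)
numerator = sum(
    comb(total_death_shifts, k) * comb(total_shifts - total_death_shifts, gilbert_shifts - k)
    for k in range(gilbert_death_shifts, upper + 1)
)
p_value = Fraction(numerator, comb(total_shifts, gilbert_shifts))

print(f"relative risk = {relative_risk:.2f}")  # 6.34
print(f"exact p-value = {float(p_value):.2e}")
```

Under these assumed counts the exact tail is below one in a billion, consistent with the simulation result. As the next slide stresses, this rules out chance as an explanation, not confounding.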

12 Example 2 (cont.)
Incredibly unlikely to observe such a difference/ratio by chance alone if there were no difference between the groups
- But this does not prove, or perhaps even strongly suggest, guilt
- Observational study
  - Allows many potential explanations other than “random chance”
  - Confounding variables
    - Perhaps she worked in the intensive care unit or on the night shift

13 Example 3: Lingering sleep deprivation?
Does sleep deprivation have harmful effects on cognitive functioning three days later?
- 21 subjects; random assignment
Core question of inference:
- Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

14 Example 3 (cont.)
Could calculate a test statistic and p-value from an approximate “sampling” distribution (if conditions are met)

15 Example 3 (cont.)
Simulate the randomization process many times under the null model, and see how often such an extreme result (difference in group means) occurs
Start with a tactile simulation using index cards
- Write each “score” on a card
- Shuffle the cards
- Randomly deal out 11 for the deprived group, 10 for the unrestricted group
- Calculate the difference in group means
- Repeat many times
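A minimal Python sketch of the index-card simulation. The sleep study's individual scores are not in the transcript, so the usage line runs on hypothetical scores. The `stat` parameter is an assumption of this sketch: passing `statistics.median` analyzes medians instead of means.

```python
import random
from statistics import mean


def shuffle_test(group1, group2, stat=mean, reps=10_000, seed=1):
    """Estimate the one-sided p-value for stat(group2) - stat(group1).

    Null model: each subject's score would be the same in either group, so pool the
    "cards", deal them back into groups of the original sizes, and count how often the
    shuffled difference is at least as large as the observed one."""
    rng = random.Random(seed)
    cards = list(group1) + list(group2)
    n1 = len(group1)
    observed = stat(group2) - stat(group1)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(cards)
        shuffled_diff = stat(cards[n1:]) - stat(cards[:n1])
        if shuffled_diff >= observed - 1e-9:  # small tolerance for float rounding
            extreme += 1
    return observed, extreme / reps


# Hypothetical scores (group1 plays the deprived group, group2 the unrestricted group)
observed, p = shuffle_test([1, 2, 3], [10, 11, 12])
print(observed, p)
```

For the real study you would pass the 11 deprived scores as `group1` and the 10 unrestricted scores as `group2`, exactly mirroring the deal of 11 and 10 cards.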

16 Example 3 (cont.)
Then use technology to simulate this randomization process
Applet: (Randomization Tests)

17 Example 3 (cont.)
Conclusion: Fairly strong evidence that sleep deprivation produces lower improvements, on average, even three days later
- Justification: Experimental results as extreme as those in the actual study would be quite unlikely to occur by chance alone if sleep deprivation had no effect
Easy to analyze medians instead

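18 Example 3 (cont.)
Exact randomization distribution
- Exact p-value: 2533/352,716 ≈ .0072, where 352,716 = $\binom{21}{11}$ is the number of ways to assign 11 of the 21 subjects to the deprived group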
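Instead of sampling shuffles, the exact randomization distribution enumerates every possible assignment. A sketch using `itertools.combinations`; the function names and the tiny example data are hypothetical. With 352,716 assignments for the 21-subject study the loop is feasible, but takes a while in pure Python.

```python
from itertools import combinations
from math import comb


def group_mean(values):
    return sum(values) / len(values)


def exact_randomization_p(group1, group2, stat=group_mean):
    """Exact one-sided p-value for stat(group2) - stat(group1): try every possible
    choice of which subjects form group1 and count splits at least as extreme.
    Returns (number of extreme splits, total number of splits)."""
    scores = list(group1) + list(group2)
    n, n1 = len(scores), len(group1)
    observed = stat(group2) - stat(group1)
    extreme = 0
    for chosen in combinations(range(n), n1):
        chosen_set = set(chosen)
        g1 = [scores[i] for i in chosen]
        g2 = [scores[i] for i in range(n) if i not in chosen_set]
        if stat(g2) - stat(g1) >= observed - 1e-9:  # tolerance for float rounding
            extreme += 1
    return extreme, comb(n, n1)


print(comb(21, 11))  # 352716 possible assignments in the sleep study
result = exact_randomization_p([1, 2, 3], [10, 11, 12])  # hypothetical data
print(result)  # (1, 20)
```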

19 Example 4: Age discrimination?
Martin vs. Westvaco (Statistics in Action)
- Employee ages: 25, 33, 35, 38, 48, 55, 55, 55, 56, 64
- Ages of the three fired employees: 55, 55, 64 (average age 58)
- Robert Martin: 55 years old
Do the data provide evidence that the firing process was not “random”?
- How unlikely is it that a “random” firing process would produce such a large average age?

20 Example 4 (cont.)
Exact permutation distribution
- Exact p-value: 6/120 = .05, where 120 = $\binom{10}{3}$ is the number of ways to choose 3 of the 10 employees to fire
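The Westvaco enumeration is small enough to list completely. This sketch assumes the three fired employees were aged 55, 55, and 64; the bold marking was lost from the transcript, and these are the ages consistent with the 6/120 result. Because the number fired is fixed, comparing total ages is equivalent to comparing average ages.

```python
from itertools import combinations

ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]
fired = [55, 55, 64]  # ASSUMED fired employees (bold marking lost in the transcript)
observed_total = sum(fired)  # 174, i.e. an average age of 58

# combinations() works by position, so the three employees aged 55 count separately
samples = list(combinations(ages, len(fired)))
extreme = sum(1 for s in samples if sum(s) >= observed_total)
print(extreme, len(samples), extreme / len(samples))  # 6 120 0.05
```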

21 Example 5: Memorizing letters
You will be given a string of 30 letters
- Memorize as many as you can in 20 seconds (in order)
Design questions
- What kind of study is this?
- What kind of randomness was used in this study?
- What are the variables, and what kind are they?
Analysis questions
- Do boxplots suggest a significant difference?
- Simulate a randomization test, and interpret the results

22 Extensions
Matched pairs design
- Randomize within pairs (e.g., by flipping a coin)
Comparing more than 2 groups
- Alternative to chi-square, ANOVA
- Same use of randomization
  - Somewhat harder to define a test statistic
Regression/correlation
- Randomize/permute one of the variables
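Two of these extensions can be sketched briefly: an exact sign-flip test for matched pairs (the coin flip that decides which pair member gets which treatment only changes the sign of the within-pair difference) and a permutation test for correlation that shuffles one variable. Function names and the example data are hypothetical.

```python
import random
from itertools import product


def paired_sign_flip_p(differences):
    """Exact one-sided p-value for the mean within-pair difference, enumerating
    all 2**n sign patterns (practical only for small n)."""
    n = len(differences)
    observed = sum(differences) / n
    patterns = list(product([1, -1], repeat=n))
    extreme = sum(
        1 for signs in patterns
        if sum(s * d for s, d in zip(signs, differences)) / n >= observed - 1e-9
    )
    return extreme / len(patterns)


def correlation(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_corr_p(x, y, reps=5_000, seed=1):
    """Estimate the one-sided p-value for a positive correlation by permuting y,
    which breaks any x-y association while keeping each variable's values."""
    rng = random.Random(seed)
    observed = correlation(x, y)
    y_perm = list(y)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(y_perm)
        if correlation(x, y_perm) >= observed - 1e-9:
            extreme += 1
    return observed, extreme / reps


# Hypothetical data
p_pairs = paired_sign_flip_p([1, 2, 3, 4, 5])  # 1/32: only the all-positive pattern
r, p_corr = permutation_corr_p(list(range(1, 9)), list(range(1, 9)))
print(p_pairs, r, p_corr)
```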

23 Reflections
You can do this at the beginning of the course
- Then repeat for new scenarios with more richness
- Spiraling could lead to deeper conceptual understanding
Emphasizes the scope of conclusions that can be drawn from randomized experiments vs. observational studies
Makes clear that “inference” goes beyond the data in hand
Very powerful, easily generalized
- Flexibility in choice of test statistic (e.g., medians, odds ratio)
- Generalizes to more than two groups
Takes advantage of modern computing power
- Does not require assumptions of normality

24 Fisher on randomization tests
“The statistician does not carry out this very simple and very tedious process, but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method.” – R.A. Fisher (1936)

25 Ptolemaic curriculum?
“Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.” – George Cobb (TISE, 2007)

26 Further reading
- Ernst (2005), Statistical Science
- Scheaffer and Tabor (2008), Mathematics Teacher
- Rossman (2008), Statistics Education Research Journal
- Statistics: A Guide to the Unknown (ed. R. Peck)
- NSF-funded project:

27 More information
- Please feel free to contact me
- Thanks very much!