Published by Claude Wade. Modified over 9 years ago.
1
Introducing Statistical Inference with Randomization Tests
Allan Rossman
Cal Poly – San Luis Obispo
arossman@calpoly.edu
2
Outline
2 × 2 tables
  Activity/example 1: Dolphin therapy?
  Activity/example 2: Murderous nurse?
Quantitative response
  Activity/example 3: Sleep deprivation?
  Activity/example 4: Age discrimination?
  Activity/example 5: Memory study?
Extensions, reflections, further reading
3
Example 1: Dolphin therapy?
Subjects who suffer from mild to moderate depression were flown to Honduras and randomly assigned to a treatment
Is dolphin therapy more effective than control?
Core question of inference: Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?
4
Example 1 (cont.)
Standard approach: Could calculate test statistic, p-value from approximate sampling distribution (z, chi-square)
  But technical conditions do not hold
  But this would be approximate anyway
  But how does this relate to what “significance” means?
5
Example 1 (cont.)
Alternative: Simulate random assignment process many times, see how often such an extreme result occurs
Assume no treatment effect (null model)
Re-randomize 30 subjects to two groups (using cards)
  Assuming 13 improvers, 17 non-improvers regardless
Determine number of improvers in dolphin group
  Or, equivalently, difference in improvement proportions
Repeat large number of times (turn to computer)
Ask whether observed result is in tail of distribution
  Indicating saw a surprising result under null model
  Providing evidence that dolphin therapy is more effective
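The card-shuffling procedure on this slide can be sketched in a few lines of Python. This is only an illustrative sketch, not the applet's code. The totals (30 subjects, 13 improvers, 17 non-improvers) come from the slide; the group size of 15 and the observed count of 10 improvers in the dolphin group are assumptions, chosen because they reproduce the difference of .467 reported later in the talk. The function name and parameters are hypothetical.

```python
import random

def simulate_dolphin(n_reps=10_000, seed=0):
    """Estimate how often re-randomization gives a result at least as extreme as observed."""
    rng = random.Random(seed)
    # One "card" per subject: 1 = improver, 0 = non-improver (totals from the slide).
    cards = [1] * 13 + [0] * 17
    group_size = 15   # ASSUMPTION: 15 subjects in the dolphin group
    observed = 10     # ASSUMPTION: 10 improvers observed in the dolphin group
    extreme = 0
    for _ in range(n_reps):
        rng.shuffle(cards)                       # re-randomize subjects to groups
        improvers_in_dolphin = sum(cards[:group_size])
        if improvers_in_dolphin >= observed:     # at least as extreme as observed
            extreme += 1
    return extreme / n_reps                      # approximate p-value

print(simulate_dolphin())
```

Counting improvers in the dolphin group is equivalent to using the difference in improvement proportions, because the group sizes and the total number of improvers are fixed under the null model.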
6
Example 1 (cont.)
www.rossmanchance.com/applets (Dolphin study)
7
Example 1 (cont.)
Conclusion: Experimental result is statistically significant
What does that mean; what is the logic behind that?
Experimental result very unlikely to occur by chance alone
A difference in success proportions at least as large as .467 (in favor of dolphin group) would happen in less than 2% of all possible random assignments if dolphin therapy were not effective
8
Example 1 (cont.)
Exact randomization distribution
Hypergeometric distribution
Fisher’s Exact Test
p-value = .0127
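The exact p-value can be checked by summing hypergeometric probabilities directly. The sketch below assumes the same counts as before: 30 subjects and 13 improvers (from the slides), plus a dolphin group of 15 with 10 observed improvers (assumed, since the data table is not reproduced here). The function name is hypothetical.

```python
from math import comb

def fisher_one_sided(n_total, n_success, n_group, observed):
    """P(X >= observed) when X is hypergeometric: successes landing in one group."""
    upper = min(n_success, n_group)
    numerator = sum(
        comb(n_success, k) * comb(n_total - n_success, n_group - k)
        for k in range(observed, upper + 1)
    )
    return numerator / comb(n_total, n_group)

# ASSUMPTION: group size 15 and 10 observed improvers (not stated on the slide).
p = fisher_one_sided(n_total=30, n_success=13, n_group=15, observed=10)
print(round(p, 4))  # 0.0127
```

Under these assumed counts the result agrees with the .0127 shown on the slide, and the simulated p-value should settle near it as the number of repetitions grows.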
9
Example 2: Murderous Nurse?
Murder trial: U.S. vs. Kristin Gilbert
Accused of giving patients fatal dose of heart stimulant
Data presented for 18 months of 8-hour shifts
Relative risk: 6.34
10
Example 2 (cont.)
Structurally the same as previous example, but with one crucial difference
No random assignment to groups
Observational study
  Allows many potential explanations other than “random chance”
  Confounding variables
  Perhaps she worked intensive care unit or night shift
Is statistical significance still relevant?
  Yes, to see if “random chance” can plausibly be ruled out as an explanation
  Some statisticians disagree
11
Example 2 (cont.)
Simulation results
p-value: less than 1 in a billion
12
Example 2 (cont.)
Incredibly unlikely to observe such a difference/ratio by chance alone, if there were no difference between the groups
But this does not prove, or perhaps even strongly suggest, guilt
Observational study
  Allows many potential explanations other than “random chance”
  Confounding variables
  Perhaps she worked intensive care unit or night shift
13
Example 3: Lingering sleep deprivation?
Does sleep deprivation have harmful effects on cognitive functioning three days later?
21 subjects; random assignment
Core question of inference: Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?
14
Example 3 (cont.)
Could calculate test statistic, p-value from approximate “sampling” distribution (if conditions are met)
15
Example 3 (cont.)
Simulate randomization process many times under null model, see how often such an extreme result (difference in group means) occurs
Start with tactile simulation using index cards
  Write each “score” on a card
  Shuffle the cards
  Randomly deal out 11 for deprived group, 10 for unrestricted group
  Calculate difference in group means
  Repeat many times
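The index-card simulation translates almost line for line into code. The following is a minimal sketch, assuming a one-sided test in which larger values of (unrestricted minus deprived) count as more extreme. The function name, its parameters, and the scores are hypothetical placeholders: the study's 21 scores are not reproduced in this transcript, so small made-up groups stand in for them.

```python
import random
from statistics import mean, median

def randomization_test(deprived, unrestricted, stat=mean, n_reps=10_000, seed=0):
    """Return (observed difference, approximate one-sided p-value)."""
    rng = random.Random(seed)
    n_dep = len(deprived)
    observed = stat(unrestricted) - stat(deprived)
    cards = list(deprived) + list(unrestricted)   # write each score on a card
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(cards)                        # shuffle the cards
        # deal the first n_dep cards to "deprived", the rest to "unrestricted"
        diff = stat(cards[n_dep:]) - stat(cards[:n_dep])
        if diff >= observed - 1e-9:               # tolerance for float rounding
            hits += 1
    return observed, hits / n_reps

# HYPOTHETICAL placeholder scores, NOT the study's data.
deprived = [1, 2, 3, 4]
unrestricted = [10, 11, 12, 13]
observed, p_value = randomization_test(deprived, unrestricted)
print(observed, p_value)

# Analyzing medians instead only requires swapping the statistic.
print(randomization_test(deprived, unrestricted, stat=median))
```

With the real data, the two lists would hold the 11 deprived and 10 unrestricted scores, and nothing else would need to change.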
16
Example 3 (cont.)
Then use technology to simulate this randomization process
Applet: www.rossmanchance.com/applets/ (Randomization Tests)
17
Example 3 (cont.)
Conclusion: Fairly strong evidence that sleep deprivation produces lower improvements, on average, even three days later
Justification: Experimental results as extreme as those in the actual study would be quite unlikely to occur by chance alone, if there were no effect of the sleep deprivation
Easy to analyze medians instead
18
Example 3 (cont.)
Exact randomization distribution
Exact p-value: 2533/352,716 = .0072
19
Example 4: Age discrimination?
Martin vs. Westvaco (Statistics in Action)
Employee ages: 25, 33, 35, 38, 48, 55, 55, 55, 56, 64
Fired employee ages in bold: 25, 33, 35, 38, 48, 55, 55, 55, 56, 64
Robert Martin: 55 years old
Do the data provide evidence that the firing process was not “random”?
How unlikely is it that a “random” firing process would produce such a large average age?
20
Example 4 (cont.)
Exact permutation distribution
Exact p-value: 6/120 = .05
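With only 10 employees, the permutation distribution is small enough to enumerate completely. The sketch below lists every way of choosing 3 of the 10 ages and counts those with an average at least as large as the observed one. The ages are from the slide. The bold marking of the fired employees did not survive in this transcript, so the fired ages of 55, 55 and 64 are an assumption; it is the choice that reproduces the slide's 6/120.

```python
from itertools import combinations

ages = [25, 33, 35, 38, 48, 55, 55, 55, 56, 64]   # from the slide
fired = [55, 55, 64]   # ASSUMPTION: bold marking lost; consistent with 6/120

# With a fixed number fired, comparing sums is equivalent to comparing averages.
observed_sum = sum(fired)

# combinations() treats equal ages as distinct employees, giving C(10, 3) subsets.
subsets = list(combinations(ages, len(fired)))
as_extreme = sum(1 for s in subsets if sum(s) >= observed_sum)
p_value = as_extreme / len(subsets)

print(as_extreme, len(subsets), p_value)  # 6 120 0.05
```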
21
Example 5: Memorizing letters
You will be given a string of 30 letters
Memorize as many as you can in 20 seconds (in order)
Design questions
  What kind of study is this?
  What kind of randomness was used in this study?
  What are the variables, and what kind are they?
Analysis questions
  Do boxplots suggest a significant difference?
  Simulate a randomization test, interpret the results
22
Extensions
Matched pairs design
  Randomize within pairs (e.g., by flipping coin)
Comparing more than 2 groups
  Alternative to chi-square, ANOVA
  Same use of randomization
  Somewhat harder to define test statistic
Regression/correlation
  Randomize/permute one of the variables
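Two of these extensions can be sketched briefly. For matched pairs, flipping a coin within each pair amounts to randomly flipping the sign of each paired difference. For regression/correlation, one variable is shuffled while the other stays fixed. Both functions below are hypothetical illustrations with one-sided alternatives assumed. The association test uses the sum of products as its statistic, which orders shuffles the same way as the correlation coefficient because shuffling leaves each variable's mean and spread unchanged.

```python
import random

def paired_sign_flip_test(differences, n_reps=10_000, seed=0):
    """Matched pairs: randomly swap treatments within each pair (flip sign)."""
    rng = random.Random(seed)
    observed = sum(differences)
    hits = 0
    for _ in range(n_reps):
        flipped = sum(d if rng.random() < 0.5 else -d for d in differences)
        if flipped >= observed:
            hits += 1
    return hits / n_reps

def permutation_association_test(x, y, n_reps=10_000, seed=0):
    """Correlation/regression: permute y while holding x fixed."""
    rng = random.Random(seed)
    observed = sum(a * b for a, b in zip(x, y))
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(y_shuffled)
        if sum(a * b for a, b in zip(x, y_shuffled)) >= observed:
            hits += 1
    return hits / n_reps

# HYPOTHETICAL data for illustration only.
print(paired_sign_flip_test([1, 2, 3, 4, 5, 6, 7, 8]))
print(permutation_association_test([1, 2, 3, 4, 5, 6, 7, 8],
                                   [1, 2, 3, 4, 5, 6, 7, 8]))
```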
23
Reflections
You can do this at the beginning of the course
  Then repeat for new scenarios with more richness
  Spiraling could lead to deeper conceptual understanding
Emphasizes scope of conclusions to be drawn from randomized experiments vs. observational studies
Makes clear that “inference” goes beyond data in hand
Very powerful, easily generalized
  Flexibility in choice of test statistic (e.g., medians, odds ratio)
  Generalize to more than two groups
Takes advantage of modern computing power
Does not require assumptions of normality
24
Fisher on randomization tests
“The statistician does not carry out this very simple and very tedious process, but his conclusions have no justification beyond the fact that they agree with those which could have been arrived at by this elementary method.”
– R.A. Fisher (1936)
25
Ptolemaic curriculum?
“Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.”
– George Cobb (TISE, 2007)
26
Further reading
Ernst (2005), Statistical Science
Scheaffer and Tabor (2008), Mathematics Teacher
Rossman (2008), Statistics Education Research Journal
Statistics: A Guide to the Unknown (ed. R. Peck)
NSF-funded project: http://statweb.calpoly.edu/csi/
27
More information
Please feel free to contact me: arossman@calpoly.edu
Thanks very much!