Using Simulation to Introduce Concepts of Statistical Inference Allan Rossman Cal Poly – San Luis Obispo

Advertisement
I will present a longer, more interactive version of this in a workshop on Saturday afternoon in the Lincoln room
- Lunch provided!
- Thanks to John Wiley and Sons
Rossman Northwest Two-Year College Math Conf

Outline
- Who are you?
- Overview, motivation
- Four examples
- Advantages/merits
- Implementation suggestions
- Assessment suggestions
- Resources
- Q&A

Who are you?
How many years have you been teaching?
- < 1 year
- 1-3 years
- 4-8 years
- 8-15 years
- > 15 years

Who are you?
How many years have you been teaching statistics?
- Never
- 1-3 years
- 4-8 years
- 8-15 years
- > 15 years

Who are you?
What is your background in statistics?
- No formal background
- A course or two
- Several courses but no degree
- Undergraduate degree in statistics
- Graduate degree in statistics
- Other

Who are you?
Have you used simulation in teaching statistics?
- Never
- A bit, to demonstrate probability ideas
- Somewhat, to demonstrate sampling distributions
- A great deal, as an inference tool as well as for pedagogical demonstrations

Motivation
“Ptolemy’s cosmology was needlessly complicated, because he put the earth at the center of his system, instead of putting the sun at the center. Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, at the center of our curriculum, instead of putting the core logic of inference at the center.” – George Cobb (TISE, 2007)

Example 1: Helper/hinderer?
Sixteen pre-verbal infants were shown two videos of a toy trying to climb a hill
- One where a “helper” toy pushes the original toy up
- One where a “hinderer” toy pushes the toy back down
Infants were then presented with the two toys from the videos
- Researchers noted which toy the infant chose to play with
r-Hinderer.html

Example 1: Helper/hinderer?
Data: 14 of the 16 infants chose the “helper” toy
Two possible explanations
- Infants choose randomly, no genuine preference, researchers just got lucky
- Infants have a genuine preference for the helper toy
Core question of inference:
- Is such an extreme result unlikely to occur by chance (random choice) alone …
- … if there were no genuine preference (null model)?

Analysis options
Could use the normal approximation to the binomial, but sample size is too small for CLT
Could use a binomial probability calculation
We prefer a simulation approach
- To illustrate “how often would we get a result like this just by random chance?”
- Starting with tactile simulation
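As a worked sketch of the binomial probability calculation, let X be the number of the 16 infants who choose the helper toy under the null model of random choice, so each infant picks the helper with probability 1/2. The symbol X is introduced here only for illustration.

```latex
P(X \ge 14) = \sum_{k=14}^{16} \binom{16}{k} \left(\tfrac{1}{2}\right)^{16}
            = \frac{120 + 16 + 1}{65536}
            = \frac{137}{65536} \approx 0.0021
```

A simulation with many repetitions should produce an empirical p-value close to this exact value.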

Strategy
Students flip a fair coin 16 times
- Heads represents choosing the helper toy, tails the hinderer toy; count the number of heads
- Under the null model of no genuine preference
Repeat several times, combine results
- See how surprising it is to get 14 or more heads even with “such a small sample size”
- Approximate (empirical) p-value
Turn to applet for large number of repetitions: (One Proportion)
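The coin-flipping strategy can be sketched in a few lines of Python. This is a minimal illustration, not the applet's implementation; the function names, the default of 10,000 repetitions, and the seed are assumptions chosen for the example.

```python
import random


def simulate_heads_counts(n_tosses=16, n_reps=10_000, p_heads=0.5, seed=None):
    """Simulate n_reps sets of coin tosses; return the number of heads in each set."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_heads for _ in range(n_tosses))
        for _ in range(n_reps)
    ]


def empirical_p_value(counts, observed):
    """Proportion of simulated counts that are at least as large as the observed count."""
    return sum(c >= observed for c in counts) / len(counts)


if __name__ == "__main__":
    # Null model: each infant chooses at random, like a fair coin toss.
    counts = simulate_heads_counts(seed=1)
    print("Approximate p-value for 14 or more heads:", empirical_p_value(counts, observed=14))
```

Each entry in `counts` plays the role of one student's 16 tosses, and the empirical p-value is the fraction of those repetitions that gave 14 or more heads.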

Results
- Pretty unlikely to obtain 14 or more heads in 16 tosses of a fair coin, so …
- Pretty strong evidence that pre-verbal infants do have a genuine preference for helper toy and were not just choosing at random

Follow-up activity
Facial prototyping
- Who is on the left – Bob or Tim?

Follow-up activity
Facial prototyping
- Does our sample result provide convincing evidence that people have a genuine tendency to assign the name Tim to the face on the left?
- How can we use simulation to investigate this question?
- What conclusion would you draw?
- Explain reasoning process behind conclusion

Example 2: Dolphin therapy?
Subjects who suffer from mild to moderate depression were flown to Honduras, randomly assigned to a treatment
Is dolphin therapy more effective than control?
Core question of inference:
- Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

Some approaches
Could calculate test statistic, p-value from approximate sampling distribution (z, chi-square)
- But it’s approximate
- But conditions might not hold
- But how does this relate to what “significance” means?
Could conduct Fisher’s Exact Test
- But there’s a lot of mathematical start-up required
- But that’s still not closely tied to what “significance” means
  Even though this is a randomization test

Alternative approach
Simulate random assignment process many times, see how often such an extreme result occurs
- 30 index cards representing 30 subjects
- Assume no treatment effect (null model)
  13 improver cards, 17 non-improver cards
- Re-randomize 30 subjects to two groups of 15 and 15
- Determine number of improvers in dolphin group
  Or, equivalently, difference in improvement proportions
- Repeat large number of times (turn to computer)
- Ask whether observed result is in tail of distribution
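The card-shuffling procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names and seed are invented for the example, and the observed count of 10 improvers in the dolphin group is an assumption taken from the "10 or more successes" wording on the later assessment slide, since this slide does not state the observed result.

```python
import random


def rerandomize_improvers(n_improvers=13, n_non_improvers=17, group_size=15,
                          n_reps=10_000, seed=None):
    """Shuffle the cards and deal them into two groups, many times.

    Returns the number of improver cards landing in the first (dolphin) group
    for each repetition, under the null model of no treatment effect.
    """
    rng = random.Random(seed)
    cards = [1] * n_improvers + [0] * n_non_improvers  # 1 = improver, 0 = non-improver
    results = []
    for _ in range(n_reps):
        rng.shuffle(cards)                       # re-randomize the 30 subjects
        results.append(sum(cards[:group_size]))  # improvers dealt to the dolphin group
    return results


def tail_proportion(results, observed):
    """Proportion of repetitions with at least as many improvers as observed."""
    return sum(r >= observed for r in results) / len(results)


if __name__ == "__main__":
    results = rerandomize_improvers(seed=2)
    # observed=10 is an assumed value (see the note above), not stated on this slide.
    print("Approximate p-value:", tail_proportion(results, observed=10))
```

Counting improvers in the dolphin group is equivalent to using the difference in improvement proportions, because the total number of improver cards is fixed at 13.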

Analysis (Two Proportions)

Conclusion
Experimental result is statistically significant
- And what is the logic behind that?
Observed result very unlikely to occur by chance (random assignment) alone (if dolphin therapy was not effective)
Providing evidence that dolphin therapy is more effective

Example 3: Lingering sleep deprivation?
Does sleep deprivation have harmful effects on cognitive functioning three days later?
- 21 subjects; random assignment
Core question of inference:
- Is such an extreme difference unlikely to occur by chance (random assignment) alone (if there were no treatment effect)?

One approach
Calculate test statistic, p-value from approximate sampling distribution

Another approach
Simulate randomization process many times under null model, see how often such an extreme result (difference in group means) occurs
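A randomization test on the difference in group means might look like the sketch below. The function name is an assumption, and the scores in the demonstration are made-up numbers used only to show the call; they are not the study's data, which this slide does not list.

```python
import random


def randomization_test_means(group_a, group_b, n_reps=10_000, seed=None):
    """One-sided randomization test for mean(group_a) - mean(group_b).

    Returns the observed difference and the proportion of re-randomizations
    whose difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # reassign responses to groups at random (null model)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            hits += 1
    return observed, hits / n_reps


if __name__ == "__main__":
    # Hypothetical improvement scores for 21 subjects (illustration only).
    unrestricted = [25, 14, 12, 21, 18, 30, 9, 16, 20, 11, 23]
    deprived = [10, 4, 15, 7, -3, 12, 6, 19, 2, 8]
    diff, p = randomization_test_means(unrestricted, deprived, seed=3)
    print("Observed difference in means:", diff)
    print("Approximate p-value:", p)
```

The same function works for other statistics, such as a difference in medians, by swapping out the two lines that compute the means.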

Example 4: Draft lottery

Closer look
r =

Familiar refrain
How often would such an extreme result (r 0.226) occur by chance alone from a fair, random lottery?
Simulate!
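One way to sketch this simulation in Python is to shuffle the draft numbers against the birthdays and record the correlation each time. The function names are invented for the example, 366 birthdates is an assumption about the lottery's size, and "as extreme" is taken here to mean a correlation of magnitude 0.226 or more in either direction, which is also an assumption because the slide text does not show the sign.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_fair_lottery_rs(n_days=366, n_reps=1_000, seed=None):
    """Correlations between birthday number and draft number under a fair lottery."""
    rng = random.Random(seed)
    days = list(range(1, n_days + 1))
    draft_numbers = days[:]
    rs = []
    for _ in range(n_reps):
        rng.shuffle(draft_numbers)  # a fair lottery assigns draft numbers at random
        rs.append(pearson_r(days, draft_numbers))
    return rs


def proportion_as_extreme(rs, observed_abs_r):
    """Proportion of simulated correlations at least as far from 0 as observed."""
    return sum(abs(r) >= observed_abs_r for r in rs) / len(rs)


if __name__ == "__main__":
    rs = simulate_fair_lottery_rs(seed=4)
    print("Proportion as extreme as 0.226:", proportion_as_extreme(rs, 0.226))
```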

Simulation result
Such an extreme result would virtually never occur from a fair, random lottery
Overwhelming evidence that the lottery used was not random

Advantages
You can do this from beginning of course!
Emphasizes entire process of conducting statistical investigations to answer real research questions
- From data collection to inference in one day
- As opposed to disconnected blocks of data analysis, then data collection, then probability, then statistical inference
Leads to deeper understanding of concepts such as statistical significance, p-value, confidence
Very powerful, easily generalized tool
- Flexibility in choice of test statistic (e.g. medians, odds ratio)
- Generalize to more than two groups

Implementation suggestions
Begin every example/activity with fundamental questions about the study/data
- Observational units?
- Variables?
- Types (cat/quant) and roles (expl/resp) of variables
- Observational study or experiment?
- Random sampling?
- Random assignment?

Implementation suggestions
Emphasize four pillars of inference
- Is there a significant effect/difference?
- How large is it?
- To what population can you generalize?
- Can you draw a cause/effect conclusion?
Notice that the last two questions highlight the distinction between random sampling and random assignment

Implementation suggestions
What about normal-based methods: why?
Do not ignore them!
- A common shape often arises for empirical randomization/sampling distributions
  Duh!
- Students will see t-tests in other courses, research literature
- Process of standardization has inherent value
- Gain intuition through formulas

Implementation suggestions
What about normal-based methods: how?
- Introduce after students have gained experience with randomization-based methods
- As a prediction of how simulation results would turn out
- Focus on standard deviation of statistic (standard error)

Implementation suggestions
What about interval estimation?
Two possible simulation-based approaches
- Invert test
  Test “all” possible values of parameter, see which do not put observed result in tail
  Easy enough (but tedious) with one-proportion situation (sliders), but not as obvious how to do this with comparing two proportions
- Estimate +/- margin-of-error
  Could estimate margin-of-error with simulated randomization distribution
  Rough confidence interval as statistic ± 2×(SD of statistic)
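Both interval approaches can be sketched in Python for the one-proportion situation. The function names, the grid of candidate values in steps of 0.01, and the 0.025 cutoff in each tail are assumptions made for this illustration. To keep the inversion deterministic, the sketch computes exact binomial tail probabilities rather than simulating each candidate value, which swaps in a calculation where the slide describes simulation with sliders.

```python
import random
from math import comb
from statistics import pstdev


def binomial_tails(k, n, p):
    """Return (P(X <= k), P(X >= k)) for X ~ Binomial(n, p)."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return sum(pmf[: k + 1]), sum(pmf[k:])


def plausible_values(k, n, tail_cutoff=0.025, step=0.01):
    """Invert the test: keep candidate values of p that do not put k in either tail."""
    keep = []
    for j in range(1, round(1 / step)):
        p = j * step
        lower_tail, upper_tail = binomial_tails(k, n, p)
        if lower_tail > tail_cutoff and upper_tail > tail_cutoff:
            keep.append(round(p, 10))
    return keep


def simulated_sd_of_proportion(n, p=0.5, n_reps=10_000, seed=None):
    """SD of the simulated sample proportion when the true probability is p."""
    rng = random.Random(seed)
    props = [sum(rng.random() < p for _ in range(n)) / n for _ in range(n_reps)]
    return pstdev(props)


def two_sd_interval(statistic, sd):
    """Rough interval: statistic plus or minus two standard deviations."""
    return statistic - 2 * sd, statistic + 2 * sd


if __name__ == "__main__":
    values = plausible_values(14, 16)  # helper/hinderer data: 14 of 16
    print("Plausible values run from", values[0], "to", values[-1])
    sd = simulated_sd_of_proportion(16, seed=5)
    print("Rough 2-SD interval:", two_sd_interval(14 / 16, sd))
```

With a sample this small, the 2-SD interval can extend past 1, which is one reason to treat it only as a rough approximation.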

Implementation suggestions
Can we introduce simulation-based inference (SBI) gradually? Yes!
One class period:
- Use helper/hinderer activity to introduce concepts of statistical significance, p-value, could this have happened by random chance alone
Two class periods:
- Also use dolphin therapy activity to introduce inference for comparing two groups (chance = random assignment)
Three class periods:
- Also use sleep deprivation activity prior to two-sample t-tests (for quantitative response)
Four class periods:
- Also use draft lottery activity (two quantitative variables)

Assessment suggestions
Quick assessment of understanding of class activity
- What did the cards represent?
- What did shuffling and dealing the cards represent?
- What implicit assumption about the two groups did the shuffling of cards represent?
- What observational units were represented by the dots on the dotplot?
- Why did we count the number of repetitions with 10 or more “successes” (that is, why 10 and why “or more”)?

Assessment suggestions
Conceptual understanding of logic of inference
- Interpret p-value in context: Probability of observed data, or more extreme, under randomness hypothesis, if null model is true
- Summarize conclusion in context, and explain reasoning process
- Apply to new studies, new scenarios
  Define null model, design simulation, draw conclusion
  More complicated scenarios (e.g., compare 3 groups), new statistics (e.g., relative risk)

Assessment suggestions
Multiple-choice example (not simulation-based)
Suppose two studies each find that 30% of women sampled dream in color, compared to 20% of men. Study A sampled 100 people of each sex, whereas Study B sampled 40 people of each sex. Which study would provide stronger evidence that there is a genuine difference between men and women on this issue?
A. Study A
B. Study B
C. The strength of evidence would be the same for these two studies
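Although this item is not simulation-based, an instructor could check the reasoning behind it with the same re-randomization idea used for two proportions. The sketch below is one possible way to do that; the function name, the one-sided comparison, and the repetition count are assumptions. The counts follow from the percentages in the item: 30 of 100 versus 20 of 100 for Study A, and 12 of 40 versus 8 of 40 for Study B.

```python
import random


def rerandomization_p_value(successes_1, n_1, successes_2, n_2,
                            n_reps=5_000, seed=None):
    """One-sided re-randomization p-value for comparing two proportions.

    Pools all responses, deals them back into groups of the original sizes,
    and counts how often group 1 gets at least as many successes as observed.
    """
    rng = random.Random(seed)
    total_successes = successes_1 + successes_2
    cards = [1] * total_successes + [0] * (n_1 + n_2 - total_successes)
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(cards)
        if sum(cards[:n_1]) >= successes_1:
            hits += 1
    return hits / n_reps


if __name__ == "__main__":
    p_a = rerandomization_p_value(30, 100, 20, 100, seed=6)  # Study A
    p_b = rerandomization_p_value(12, 40, 8, 40, seed=6)     # Study B
    print("Study A approximate p-value:", p_a)
    print("Study B approximate p-value:", p_b)
```

The same difference in sample proportions gives a smaller p-value with the larger sample sizes, which is the idea the item is meant to assess.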

Assessment suggestions
Free response example (simulation-based)
In a recent study, researchers presented young children (aged 5 to 8 years) with a choice between two toy characters who were offering stickers. One character was described as mean, and the other was described as nice. The mean character offered two stickers, and the nice character offered one sticker. Researchers wanted to investigate whether children would tend to select the nice character over the mean character, despite receiving fewer stickers. They found that 16 of the 20 children in the study selected the nice character.

Assessment suggestions
Free response example (simulation-based)
Describe (in words) the null model/hypothesis in this study.
Suppose that you were to conduct a simulation analysis of this study to investigate whether the observed result provides strong evidence that children genuinely prefer the nice toy with one sticker over the mean toy with two stickers. Indicate what you would enter for the following three inputs:
- Probability of heads: _____
- Number of tosses: _____
- Number of repetitions: _____

Assessment suggestions
Free response example (simulation-based)
One of the following graphs was produced from a correct simulation analysis. The other two were produced from incorrect simulation analyses. Circle the correct one.
Which of the following is closest to the p-value for this study?
- 5.0, .50, .05, .005

Assessment suggestions
Free response example (simulation-based)
Write an interpretation of this p-value in the context of this study (probability of what, assuming what?).
Summarize your conclusion from this research study and simulation analysis.

Resources
- Simulation-based inference blog:
- ISI applets:
- StatKey app: lock5stat.com/statkey

Thanks!
Want to learn more?
- Workshop (with lunch) on Saturday afternoon in Lincoln room, thanks to John Wiley and Sons