John Holcomb - Cleveland State University Beth Chance, Allan Rossman, Emily Tietjen - Cal Poly State University George Cobb - Mount Holyoke College

Slides:



Advertisements
Similar presentations
Implementation and Order of Topics at Hope College.
Advertisements

An Active Approach to Statistical Inference using Randomization Methods Todd Swanson & Jill VanderStoep Hope College Holland, Michigan.
Concepts of Statistical Inference: A Randomization-Based Curriculum Allan Rossman, Beth Chance, John Holcomb Cal Poly – San Luis Obispo, Cleveland State.
THE INTRODUCTORY STATISTICS COURSE: A SABER TOOTH CURRICULUM? George W. Cobb Mount Holyoke College USCOTS Columbus, OH 5/20/05.
Lecture 3 Outline: Thurs, Sept 11 Chapters Probability model for 2-group randomized experiment Randomization test p-value Probability model for.
Statistical Issues in Research Planning and Evaluation
Using Simulations to Teach Statistical Inference Beth Chance, Allan Rossman (Cal Poly) ICTCM
Stat 301 – Day 17 Tests of Significance. Last Time – Sampling cont. Different types of sampling and nonsampling errors  Can only judge sampling bias.
Stat 217 – Day 3 Topic 3: Drawing Conclusions. Last Time – Drawing Conclusions Issue #1: Do I believe the sample I have is representative of the population.
CHAPTER 11 Inference for Distributions of Categorical Data
Teaching the BIG IDEAS of Statistics in the Introductory Course Christine Franklin University of Georgia Department of Statistics.
Stat 512 Day 9: Confidence Intervals (Ch 5) Open Stat 512 Java Applets page.
Stat 512 – Lecture 12 Two sample comparisons (Ch. 7) Experiments revisited.
Stat 512 – Lecture 13 Chi-Square Analysis (Ch. 8).
© 2013 Pearson Education, Inc. Active Learning Lecture Slides For use with Classroom Response Systems Introductory Statistics: Exploring the World through.
Common Core State Standards for Mathematics Making Inferences and Justifying Conclusions S-IC Math.S-IC.5. Use data from a randomized experiment to compare.
Robert delMas (Univ. of Minnesota, USA) Ann Ooms (Kingston College, UK) Joan Garfield (Univ. of Minnesota, USA) Beth Chance (Cal Poly State Univ., USA)
COURSE: JUST 3900 INTRODUCTORY STATISTICS FOR CRIMINAL JUSTICE Instructor: Dr. John J. Kerbs, Associate Professor Joint Ph.D. in Social Work and Sociology.
Simulation and Resampling Methods in Introductory Statistics Michael Sullivan Joliet Junior College
Using Simulation Methods to Introduce Inference Kari Lock Morgan Duke University In collaboration with Robin Lock, Patti Frazer Lock, Eric Lock, Dennis.
Assessing Student Learning about Statistical Inference Beth Chance – Cal Poly, San Luis Obispo, USA John Holcomb – Cleveland State University, USA Allan.
Aiming to Improve Students' Statistical Reasoning: An Introduction to AIMS Materials Bob delMas, Joan Garfield, and Andy Zieffler University of Minnesota.
CAUSE Webinar: Introducing Math Majors to Statistics Allan Rossman and Beth Chance Cal Poly – San Luis Obispo April 8, 2008.
Using Lock5 Statistics: Unlocking the Power of Data
Chapter 8 Hypothesis Testing I. Chapter Outline  An Overview of Hypothesis Testing  The Five-Step Model for Hypothesis Testing  One-Tailed and Two-Tailed.
Significance Tests: THE BASICS Could it happen by chance alone?
Using Activity- and Web-Based Materials in Post-Calculus Probability and Statistics Courses Allan Rossman (and Beth Chance) Cal Poly – San Luis Obispo.
Introducing Statistical Inference with Randomization Tests Allan Rossman Cal Poly – San Luis Obispo
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Unit 5: Hypothesis Testing.
Confidence intervals are one of the two most common types of statistical inference. Use a confidence interval when your goal is to estimate a population.
Day 3: Sampling Distributions. CCSS.Math.Content.HSS-IC.A.1 Understand statistics as a process for making inferences about population parameters based.
+ Chi Square Test Homogeneity or Independence( Association)
Economics 173 Business Statistics Lecture 4 Fall, 2001 Professor J. Petry
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.1.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.1.
AP STATISTICS LESSON 10 – 2 DAY 2 MORE DETAIL: STATING HYPOTHESES.
Not in FPP Bayesian Statistics. The Frequentist paradigm Defines probability as a long-run frequency independent, identical trials Looks at parameters.
PANEL: Rethinking the First Statistics Course for Math Majors Joint Statistical Meetings, 8/11/04 Allan Rossman Beth Chance Cal Poly – San Luis Obispo.
CHAPTER 15: Tests of Significance The Basics ESSENTIAL STATISTICS Second Edition David S. Moore, William I. Notz, and Michael A. Fligner Lecture Presentation.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Lecture PowerPoint Slides Basic Practice of Statistics 7 th Edition.
Chapter 8 Hypothesis Testing I. Significant Differences  Hypothesis testing is designed to detect significant differences: differences that did not occur.
AP Statistics Section 11.1 B More on Significance Tests.
+ Using StatCrunch to Teach Statistics Using Resampling Techniques Webster West Texas A&M University.
Early Inference: Using Randomization to Introduce Hypothesis Tests Kari Lock, Harvard University Eric Lock, UNC Chapel Hill Dennis Lock, Iowa State Joint.
Inferential Statistics Inferential statistics allow us to infer the characteristic(s) of a population from sample data Slightly different terms and symbols.
+ Chapter 5 Overview 5.1 Introducing Probability 5.2 Combining Events 5.3 Conditional Probability 5.4 Counting Methods 1.
Section 10.2: Tests of Significance Hypothesis Testing Null and Alternative Hypothesis P-value Statistically Significant.
Synthesis and Review 2/20/12 Hypothesis Tests: the big picture Randomization distributions Connecting intervals and tests Review of major topics Open Q+A.
Learning Objectives After this section, you should be able to: The Practice of Statistics, 5 th Edition1 DESCRIBE the shape, center, and spread of the.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 11 Inference for Distributions of Categorical.
Uncertainty and confidence Although the sample mean,, is a unique number for any particular sample, if you pick a different sample you will probably get.
Today: Hypothesis testing p-value Example: Paul the Octopus In 2008, Paul the Octopus predicted 8 World Cup games, and predicted them all correctly Is.
CHAPTER 15: Tests of Significance The Basics ESSENTIAL STATISTICS Second Edition David S. Moore, William I. Notz, and Michael A. Fligner Lecture Presentation.
+ The Practice of Statistics, 4 th edition – For AP* STARNES, YATES, MOORE Chapter 9: Testing a Claim Section 9.2 Tests About a Population Proportion.
Teaching Introductory Statistics with Simulation-Based Inference Allan Rossman and Beth Chance Cal Poly – San Luis Obispo
Using Simulation to Introduce Concepts of Statistical Inference Allan Rossman Cal Poly – San Luis Obispo
+ Chapter 9 Testing a Claim 9.1Significance Tests: The Basics 9.2Tests about a Population Proportion 9.3Tests about a Population Mean.
Simulation-based inference beyond the introductory course Beth Chance Department of Statistics Cal Poly – San Luis Obispo
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 9 Testing a Claim 9.1 Significance Tests:
Simulation Based Inference for Learning
INTRODUCTORY STATISTICS FOR CRIMINAL JUSTICE
What Is a Test of Significance?
Unit 5: Hypothesis Testing
Chapter 8: Inference for Proportions
Two-sided p-values (1.4) and Theory-based approaches (1.5)
Section 10.2: Tests of Significance
Stat 217 – Day 23 Lab 5.
Significance Tests: The Basics
Some Key Ingredients for Inferential Statistics
Presentation transcript:

John Holcomb - Cleveland State University Beth Chance, Allan Rossman, Emily Tietjen - Cal Poly State University George Cobb - Mount Holyoke College Introducing Concepts of Statistical Inference Via Randomization Tests

Introduction 2005 USCOTS Cobb proposed the idea of a new introductory curriculum based on randomization methods. Cobb (2007) “Our curriculum is needlessly complicated because we put the normal distribution, as an approximate sampling distribution for the mean, instead of putting the core logic of inference at the center.” Why? “Tyranny of the computable.” How????

Curriculum Goal Use technology simulations to lead students to develop an understanding of the concepts of statistical significance and p-values. Focus on statistical process. Repeated exposure throughout course.

Learning Process Research study and data. Tactile simulation and class discussion of results. Simulation using a tailored applet. Empirical p-value and discussion. Conclusion in context.

Classroom Activities (Example 1): Naughty or Nice? Hamlin, Wynn, and Bloom (Nature, 2007) Videos available at Do the experimental results (14 out of 16) provide convincing evidence that infants have a genuine preference for the helper toy rather than the result occurring by chance alone? Inference for a binomial proportion.

How likely such an extreme result would be under the null model of no preference? Begin with each student flipping a coin 16 times. Combine results from class with a dotplot. Move to applet to simulate tossing 16 coins for 1000 repetitions. Determine the proportion of repetitions14 or more flips were heads. Randomization approach

Example 2: Sleep Deprivation? Stickgold, James, & Hobson, (Nature Neuroscience, 2000) 21 subjects randomly assigned to one of two groups: sleep deprived group and unrestricted sleep. Both groups then allowed as much sleep as wanted on the following two nights. All subjects re-tested on the third day.

Example 2: Sleep Deprivation? Randomized experiment. Compared mean improvement in response time to a visual stimulus on a computer screen. Unrestricted sleep group: ms. Sleep deprived group: 3.90 ms. Inference for the difference of two means from independent samples.

Randomization approach How likely such an extreme result (Diff=15.92 ms) would be under the null model of no treatment effect? 21 improvement scores written on index cards. Randomly “deal” cards to two groups and record difference of group means. Combine results of the class with a dotplot.

Randomization approach Use an applet to simulate randomization process 1000 times and explore the frequency of a difference in means of or more extreme by chance alone.

Research Goal Make evidence-based curriculum decisions. Implement and collect data from small-scale classroom experiments. Identify issues that create difficulties for student understanding. Formulate questions about the most pedagogically effective way to implement this approach.

Curriculum Design Issue #1: First Activity Should the first example that students encounter be one where the result is statistically significant or one where the result is not significant at all?

Advantages Students may find it easier to judge when an observed result is surprising under a null model. Starting with an insignificant result may reinforce students’ natural inclinations to regard a p-value as the probability the null model is true.

Classroom Experiment Four sections of introductory statistics at Cal Poly. Half the students told 9 of the 16 infants chose the helper toy (non-significant result group). Half the students told 14 of the 16 infants chose the helper toy (significant result group). Students given the activity and told to work in pairs.

Classroom Experiment Two instructors: one randomized across sections and the other randomized by individuals. Follow-up Quiz questions the next class period.

When I conducted the simulation using 1,000,000 repetitions, I obtained a proportion (empirical p-value) of.402 (.002). Based on this result, which assumes the null model of no genuine preference, the actual result obtained by the researchers,9 of 16 (14 of 16), choosing the helper is a) impossibleb) very surprising c) somewhat surprisingd) not at all surprising Question 1 9 of of 16

Results 60.6% (n = 71) students in the “9 of 16” group answered correctly. 77.5% (n = 71) students in the “14 of 16” group answered correctly. Two-sided p-value is approximately Interpretation: students find it easier to interpret a surprising outcome than a non- surprising one.

Question 2 Fill in the blanks in the following sentence to interpret this proportion from part (l). This proportion says that in about (1) % of ___(2)___, the researchers would get __(3)___ who choose the helper toy, assuming that ________(4)______.

This proportion says that in about (1) % of ___(2)___, the researchers would get __(3)___ who choose the helper toy, assuming that ______(4)______. (1) 40.2% vs. 0.2%: Non-significant group performed better, but may be an artifact of the difficulty of expressing as a percentage. Results 9 of 1614 of 16

Results This proportion says that in about (1) % of ___(2)___, the researchers would get __(3)___ who choose the helper toy, assuming that ______(4)______. (2) Both groups about 50% correctly answering 1,000,000 repetitions.

Results This proportion says that in about (1) % of ___(2)___, the researchers would get __(3)___ who choose the helper toy, assuming that ______(4)______. (3) “14 of 16” group performed better (54.2% vs. 38.9%, p-value =.066) in citing the observed result or more extreme.

Results This proportion says that in about (1) % of ___(2)___, the researchers would get __(3)___ who choose the helper toy, assuming that ______(4)______. (4) Both groups about 76% correctly answering “assuming no preference.”

Interpretation Although students seem to equally understand the null model, they did differ slightly in realizing what the simulation told them.

Question 3 Based on your answer to (1) and (2), which of the following would you consider the most appropriate conclusion from this study? (choose one) (a) These 16 infants have no genuine preference and therefore there’s no reason to doubt that the researchers’ result is different from.5 just by random chance. (b) The researchers’ results would be very surprising if there was no genuine preference for the helper and therefore I believe there is a preference. (c) There is a large (small) chance that there is a genuine preference for the helper.

Results Approximately 77% in each group answered the correct answer: (a) for the “9 of 16” group and (b) for the “14 of 16.”

Curriculum Design Issue #2: Tactile Simulations We have suggested beginning each simulation with a tactile version before turning to technology, but does the tactile aspect really add value to the students’ learning experience?

Should we do tactile first? Potential advantages: Students are in a better position to understand what the technology simulation is doing if they have first performed a tactile simulation themselves. We use applets that mirror the hands-on activity as closely as possible so the technology is not a “black box.” Potential disadvantage: tactile simulations take valuable class time.

Classroom Experiment Randomly assigned 43 students to two treatment groups. Class topic was investigating the sampling distribution of a single proportion.

Classroom Experiment Tactile group (20 students) students each determined the sample proportion of orange candies among 25 actual Reese’s Pieces. Students created a class dotplot of sample proportions. Then turned to applet for simulation of many samples of size 25.

Classroom Experiment Non-tactile group (23) immediately moved to simulation with applet without working with Reese’s Pieces and creating pooled dotplot.

Results Students in both groups given a quiz the next day. Five questions that involved a single sample proportion. Independent and blinded statistics instructor graded the quizzes. No statistically significant differences were found.

Results Interesting outcome was that both groups seemed to finish the activity in about the same amount of time. May suggest the tactile aspect does not take more time and does not hinder learning.

Results In a follow-up questionnaire to a different class, approximately 50% of the students indicated the tactile component of the activity was helpful.

Curriculum Design Issue #3 Should the first activity that students encounter focus on inference for a single proportion, as in the “Naughty or Nice” example, or on a comparison of two groups, as in the “Sleep Deprivation” example?

Curriculum Design Issue #4 In the case of a simulation involving a single proportion, as with the “Naughty or Nice” example, how should the tactile simulation be conducted? 16 students each tossing one coin Each student tosses 1 coin 16 times Each student tosses 16 coins

Curriculum Design Issue #5 In the case of a randomization test for a 2×2 table, what statistic should the students calculate in their simulations? Difference in conditional proportions of success. Ratio of conditional proportions (Relative risk). Number of successes in group A.

Curriculum Design Issue #6 How much of the work should the technology do automatically (calculating empirical p-values)? Push a button (e.g., based on two-way table). Specify observed result to count beyond.

Curriculum Design Issue #7 Should the type of randomness used in the simulation always reflect the role of randomness used in the actual data collection process? Randomizing (assignment) when data arises from experiments. Bootstrapping and/or sampling from finite populations when data arises from samples. Always randomizing group membership.

Summary Feasibility of random assignment in classroom experiments. Focused and relevant research questions. Direct link between research and classroom practice. Assessment instruments described in Holcomb, Chance, Rossman, and Cobb (2010, Proceedings).

References Cobb, George W. (2007). The Introductory Statistics Course: A Ptolemaic Curriculum? Technology Innovations in Statistics Education, 1(1), Holcomb, J., Chance, B., Rossman, A., & Cobb, G., (2010). Assessing student learning about statistical inference, Proceedings of the 8 th International Conference on Teaching Statistics. Hamlin, J. K., Wynn, K., & Bloom, P. (2007). Social evaluation by preverbal infants. Nature, 450, Stickgold, R., James, L., & Hobson, J.A. (2000). Visual discrimination learning requires post-training sleep. Nature Neuroscience, 2, Thanks to National Science Foundation DUE/CCLI #