What Students Learn (and Don’t Learn) about Inferential Reasoning in Introductory Statistics Courses 2014 Joint Statistical Meetings (JSM) Boston, MA Sharon.


What Students Learn (and Don’t Learn) about Inferential Reasoning in Introductory Statistics Courses
2014 Joint Statistical Meetings (JSM), Boston, MA
Sharon Lane-Getaz, St. Olaf College, Northfield, MN

Objective
What does statistics education research report about the correct conceptions, difficulties, and misconceptions people have with inferential reasoning? How might this help statistical consultants dealing with clients?
- Background: To assess the impact of methods of teaching inference, an instrument was developed to assess 14 known misconceptions and difficulties; items were added to assess correct conceptions.
- Measurement: Reasoning about P-values and Statistical Significance (RPASS) scale. Reliability in this study is Cronbach’s alpha = .76 (37 items).
- Study: Compare Pretest and Posttest proportions of students answering each item correctly on a scatterplot (canoe plot).
- Discussion: Emphasize what students generally learn and what problems tend to persist.
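For readers unfamiliar with the reliability figure quoted above, here is a minimal sketch of how Cronbach’s alpha is computed from a students-by-items score matrix. The function name `cronbach_alpha` and the toy data are illustrative assumptions, not the RPASS data or the author’s code.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-student rows of item scores.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(scores[0])  # number of items
    # Sample variance of each item across students
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each student's total score
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical toy data: 4 students, 3 items scored 0/1.
# Every student answers all items the same way, so the items are
# perfectly consistent with one another.
toy = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
alpha = cronbach_alpha(toy)
```

With perfectly consistent items the toy example gives an alpha of 1; real instruments, such as the 37-item scale described here, typically fall below that.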

Subjects and Setting
- Subjects (N = 138) from two introductory-level statistics courses aimed at the social sciences (n1 = 78) and natural sciences (n2 = 60).
- 138 of 167 enrolled students completed the Pretest and Posttest and consented to participate (83% response).
- 94 females, 43 males, 1 no response.
- 34 first-years, 56 sophomores, 30 juniors, 18 seniors.
- Setting: Small liberal arts college (3000 students) in the upper Midwest US, in a small town of “cows, colleges and contentment”.
- Time: Spring semester

Broad range of results with two courses combined: RPASS-9 Pretest and Posttest Totals

Pre- and Posttest Totals Gains by Course

Aggregate Results for Both Courses (N = 138)
- 70% of 37 RPASS-9 Posttest items correct, on average.
- Five more Posttest items correct, on average:
  - RPASS-9 Posttest (Mean = 26.1, SD = 5.1)
  - RPASS-9 Pretest (Mean = 21.0, SD = 4.2)
What did students learn, by item, and what did they not learn?

Item-Level Analysis (Canoe Plot)
Canoe plot of item-level changes in proportion correct:
- Scatterplot of Pretest to Posttest proportions by item.
- A 95% confidence band along p_posttest = p_pretest differentiates items with a significant difference in proportions answering correctly from items with nonsignificant differences (Posttest – Pretest).
- Wilson adjusted margins of error maintain a 95% nominal rate (Agresti & Caffo, 2000).
- No family-wise correction; intended for descriptive purposes.
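The band logic can be sketched roughly as follows. This is an assumption-laden illustration, not the author’s procedure: it uses the adjusted interval for a difference of two proportions described by Agresti & Caffo (2000), which adds one success and one failure to each sample, and it treats the Pretest and Posttest proportions as if they came from independent samples. The function names and the counts in the usage lines are hypothetical.

```python
import math

Z95 = 1.959963984540054  # 0.975 quantile of the standard normal


def agresti_caffo_diff_ci(x_pre, x_post, n, z=Z95):
    """Adjusted CI for (posttest - pretest) proportion correct.

    Adds one success and one failure to each sample, so two successes
    and two failures in total. Assumes independent samples of size n.
    """
    p_pre = (x_pre + 1) / (n + 2)
    p_post = (x_post + 1) / (n + 2)
    diff = p_post - p_pre
    se = math.sqrt(p_pre * (1 - p_pre) / (n + 2)
                   + p_post * (1 - p_post) / (n + 2))
    return diff - z * se, diff + z * se


def classify_item(x_pre, x_post, n):
    """Place an item above, within, or below the band around no change."""
    lower, upper = agresti_caffo_diff_ci(x_pre, x_post, n)
    if lower > 0:
        return "above"   # posttest proportion significantly higher
    if upper < 0:
        return "below"   # posttest proportion significantly lower
    return "within"      # no significant change


# Hypothetical counts of correct answers out of N = 138 students
improved = classify_item(50, 100, 138)
unchanged = classify_item(70, 72, 138)
```

Because the same students took both tests, a paired analysis would be a natural alternative; the independent-samples form is used here only because it matches the cited interval.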

Proportion Correct Responses by RPASS-9 Item
Pretest on x, Posttest on y (37 items, N = 138)
23 items above the 95% confidence band, 13 within, and 1 below

Improved 14 Correct Conceptions of the 23 Items “Above the Band”
Improved Statistical Literacy:
- Recognize textbook definitions of p-value (1-1, 6-1)
- Link p-value to sampling variation (2-1)
- Understand p-value as a rareness measure (3a-2)
Improved Inferential Reasoning:
- Assess significance graphically (3b-1)
- Reason about variation (3c-2)
- Assess impact of alternative hypothesis on p-value (1-3, 4b-1)
- Differentiate small p-values, Type I and II errors (6-2, 6-7)
- Reason about sample size impact on p-value (6-4)
- Reason about strength of evidence vs. p-value (2-2, 4a-1, 6-3)
Note: (5) green items indicate proportion correct (p_c) < .50 on Pretest.

Improved (Suppressed) 9 Misconceptions of the 23 Items “Above the Band”
State conclusions within confines of scope of inference:
- Need random sample to generalize sample to population (5-4)
- Need random assignment to draw causal conclusion (4a-3)
Interpret what a P-value is NOT:
- Always small or always desired to be a low value (3a-3, 3b-3)
- Probability the Null Hypothesis is false or true (5-1, 5-2)
- Alpha or significance level (4a-1)
Interpret that a small P-value does NOT mean:
- Chance caused results observed (2-4)
- Provides definitive, contrapositive proof (3a-1)
Note: (3) red items indicate proportion correct (p_c) < .50 on Pretest.

No Improvement: “Within the Band” Correct Conceptions (C)
- Reason about variation in boxplot depiction (3c-1) C
- Making correct rejection decision (4b-3) C
- Recognize an informal definition of p-value (1-2) C
- Recognize p-value as a conditional probability (2-3) C
- Use Confidence Intervals for statistical significance (2-5) C
- Differentiate p-values from effects (4a-2) C
- Interpret large p-value (4b-2) C
- Consider impact of sample size on p-values (4b-4, 6-4) C
Note: green indicates proportion correct (p_c) < .50 on Pretest.

No Improvement: “Within the Band” Misconceptions (M) or Multiple Choice Items
- Belief increased replications = increased sample size (4b-6) M
- Belief p-values always low or desired to be low (3b-2) M
- Differentiate statistical vs. practical significance (4b-5, 6-5) C/M
- Check conditions before making an inference (6-6) C/M
Note: red indicates proportion correct (p_c) < .50 on Pretest.

The One Item “Below the Band”: Unlearning, Guessing, Confusion?
Responses for one item suggest better reasoning on the Pretest than on the Posttest (just below the 95% confidence band):
- When asked to choose the correct direction to shade the p-value in the sampling distribution of means (3b-4), students tend to select shading “to the right,” even though the alternative hypothesis suggests that one should shade the larger left tail.

Remind clients of caveats and limitations of the statistical inference process.
The P-value is an integrated part of the larger statistical process:
- Logic of inference (how we interpret results) depends on sample size, relates to effect size and importance, and depends on whether conditions were met.
- Scope of inference (what we can conclude) depends on randomness in the study design, that is, how the data were gathered.
A confidence interval (CI) estimates population parameters or true effects, given the sample we observed, and:
- Provides information complementary to what p-values provide alone (bounds for the effect).
- Can assess statistical significance. For example, point out whether the null hypothesis value is in the interval or not. Is zero in the interval? Is the interval all positive or all negative?
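The interval check described above can be written as a small helper. This is only a sketch: the function name `interval_vs_null` and the example bounds are hypothetical, and the correspondence with a two-sided test at the matching level holds when the interval and the test are built from the same estimate and standard error.

```python
def interval_vs_null(lower, upper, null_value=0.0):
    """Report where a confidence interval sits relative to the null value."""
    if lower > null_value:
        return "above"     # entire interval above the null: significant, positive
    if upper < null_value:
        return "below"     # entire interval below the null: significant, negative
    return "contains"      # null value is plausible: not significant


# Hypothetical 95% intervals for a difference in proportions
all_positive = interval_vs_null(0.25, 0.47)
includes_zero = interval_vs_null(-0.10, 0.13)
```

Unlike a bare p-value, the bounds also show how large or small the effect could plausibly be, which is the complementary information the slide refers to.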

A Surprise Aside
Students in a randomization-based curriculum learn more on average, but ironically show no improvement on 5 items associated with the randomization distribution:
- How one- or two-tailed test relates to p-value (4b-2) M
- Correct rejection decision (4b-3) C
- Impact of sample size on significance (4b-4) M
- Significance vs. practical importance (4b-5)
- Impact of increasing sample size vs. replications (4b-6) M

References
Agresti, A., & Caffo, B. (2000). Simple and Effective Confidence Intervals for Proportions and Differences of Proportions Result from Adding Two Successes and Two Failures. The American Statistician, 54(4), 280–288.
Chance, B. L., & Rossman, A. J. (2006). Investigating Statistical Concepts, Applications, and Methods. Belmont, CA: Brooks/Cole – Thomson Learning.
Cobb, G. (2007). The Introductory Statistics Course: A Ptolemaic Curriculum? Technology Innovations in Statistics Education, 1(1).
Cumming, G., & Finch, S. (2005). Inference by eye: Confidence intervals and how to read pictures of data. American Psychologist, 60(2).
delMas, R. C., Garfield, J. B., Ooms, A., & Chance, B. (2007). Assessing Students’ Conceptual Understanding after a First Course in Statistics. Statistics Education Research Journal [online], 6(2).
Lane-Getaz, S. J. (2013). Development of a Reliable Measure of Students’ Inferential Reasoning Ability. Statistics Education Research Journal (SERJ), 12(1).
Lane-Getaz, S. J. (2007). Toward the Development and Validation of the Reasoning about P-values and Statistical Significance Scale. In B. Phillips & L. Weldon (Eds.), Proceedings of the ISI / IASE Satellite Conference on Assessing Student Learning in Statistics, Voorburg, The Netherlands: ISI. http:// Getaz.pdf
Utts, J. (2003). What Educated Citizens Should Know about Statistics and Probability. The American Statistician, 57(2).

Contact Information & Slides
Sharon Lane-Getaz
On sabbatical this coming year and would love to collaborate with YOU to administer the RPASS at your institution! Let’s talk!
These JSM-2014 presentation slides will be available from:
The differences in proportions by item appear in the Appendix of this presentation. Please see the proceedings for more!

Table 1. Proportion Correct on RPASS-9 Posttest Item Exceeds Pretest Proportion Correct (12 of 23 items)
RPASS-9 item | Concept or difficulty assessed | p2 - p1
 | Selects a textbook definition of a p-value given multiple choices. | .41
3b-1 | Uses a density curve and an observed value to estimate if the observed value (or more extreme) is statistically significant |
 | Reasons smaller p-value, stronger the evidence of a difference or effect. | .36
4a-1 | Confuses p-value with significance level |
 | Recognizes p-value in terms of variation in a sampling distribution |
 | Understands magnitude of p-value depends if test is one- or two-sided. | .30
4a-2 | Reasons greater evidence of a difference or effect, smaller the p-value |
 | Understands stronger evidence of difference or effect, smaller p-value. | .23
3c-2 | Employs graphical reasoning about variation |
 | Understands a small p-value suggests results are statistically significant |
 | Believes the p-value is the probability observed results are due to chance or caused by chance, if the null is true. | .22
3a-1 | Believes statistics provide definitive proof; misuses the deterministic Boolean logic of contrapositive proof. | .19
Note. (a) Items associated with sampling or randomization distribution. (b) Requests explanation of reasoning.

Table 1 (contd.). Proportion Correct on RPASS-9 Posttest Exceeds Pretest Proportion Correct (11 of 23 items)
RPASS-9 item | Concept or difficulty assessed | p2 - p1
4b-1 | Interprets a p-value for a one-tailed hypothesis |
 | Misinterprets a p-value as the probability the null hypothesis is false |
 | Believes p-value is the probability that the alternative hypothesis is true |
 | Understands stronger evidence of difference or effect, smaller p-value |
 | Reasons about impact of a small sample size on statistical significance. | .16
3a-2 | Understands the p-value as a rareness measure. | .14
4a-3 | Believes causal conclusion can be drawn from small p-values regardless of study design |
 | Recognizes a formal textbook definition of the p-value without context. | .13
3b-3 | Believes p-value is always a low number (or always desired to be low). | .13
3a-3 | Belief p-values are always a low value or are always desired to be a low value |
 | Differentiates between concepts of Type I and Type II error. | .12
Note. (a) Items associated with sampling or randomization distribution. (b) Requests explanation of reasoning.

Table 2. Equal Proportion of Students Answer RPASS-9 Item Correctly on Posttest and Pretest (13 items)
RPASS-9 item | Concept or difficulty assessed | p2 - p1
 | Understands small p-value does not mean practical importance. | .08
3b-2 | Believes p-value is always a low number (or desired to be low). | .07
4b-4 | Relationship between sample size and p-value |
 | Understands p-value is conditioned on the null hypothesis being true |
 | Confidence intervals can assess statistical significance, much like p-values are used when hypothesis testing | .06
4b-5 | Differentiates statistical and practical significance | .03
4b-2 | Difficulty with one versus two-tailed p-value | .01
3c-1 | Employs graphical reasoning about variation | 0
4b-3 | Understands the rejection decision |
 | Confuses if statistical significance refers to a sample or a population |
4b-6 | Understands impact of increasing number of replications in a simulation versus the impact of increasing the sample size |
 | Understands to conduct a significance test, conditions must be met |
 | Recognizes an informal description of the p-value embedded in context |
Note. (a) Items associated with sampling or randomization distribution. (b) Requests explanation of reasoning.