On teaching statistical inference: What do p values (not) mean? Bruce Blaine, PhD, PStat® Department of Mathematical and Computing Sciences St. John Fisher.

Presentation transcript:

On teaching statistical inference: What do p values (not) mean?
Bruce Blaine, PhD, PStat®
Department of Mathematical and Computing Sciences
St. John Fisher College

Limitations of NHST
The misapplication of null hypothesis significance testing (NHST) procedures for statistical inference is well known.
o NHST procedures do not address what researchers most want to know.
o NHST procedures test a (nil) null hypothesis, which is rarely true and therefore uninformative to reject.
o NHST procedures deliver a conditional probability, p(D|H0), which is commonly misinterpreted.
o NHST procedures do not test research hypotheses.
o NHST procedures do not quantify effect size.

Misinterpretations of p values
Two misinterpretations of p values from NHST procedures are common in the social sciences (cf. Kline, 2004):
1. Magnitude fallacy: p values are misunderstood as an effect size statistic, as if a smaller p indicated proportionally stronger evidence for the treatment effect.
   “…the effect was marginally significant, p=.07”
   “…the effect was highly (or extremely) significant, p<.001”
2. Validity fallacy: p(D|H0) is misunderstood as p(H1|D).
   “…the treatment improved the outcome, p<.05”
   “…the treatment had no effect on the outcome, p>.05”
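The validity fallacy can be made concrete with Bayes' rule. The following sketch is not from the original slides: the function name, the treatment of "significant at alpha" as the observed data, and the input values (power, and the share of tested hypotheses for which H0 is true) are all illustrative assumptions. It suggests that the probability of H0 given a significant result generally differs from the significance level itself.

```python
def prob_null_given_significant(alpha, power, prior_null):
    """P(H0 true | result significant), by Bayes' rule.

    alpha      : P(significant | H0 true), the significance level
    power      : P(significant | H1 true)
    prior_null : assumed share of tested hypotheses for which H0 is true
    """
    false_positive = alpha * prior_null
    true_positive = power * (1.0 - prior_null)
    return false_positive / (false_positive + true_positive)


# Hypothetical inputs, chosen only for illustration.
for prior_null in (0.5, 0.9):
    posterior = prob_null_given_significant(alpha=0.05, power=0.5,
                                            prior_null=prior_null)
    print(f"prior P(H0) = {prior_null:.1f} -> "
          f"P(H0 | significant) = {posterior:.3f}")
```

Under these assumed inputs the same alpha of .05 corresponds to quite different posterior probabilities for H0, so p(D|H0) cannot simply be read as p(H0|D) or as 1 minus p(H1|D).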

Classroom exercise 1: Addressing the magnitude fallacy
1. In Excel (using the Analysis ToolPak add-in), have students enter the data from a hypothetical experiment in Table 1.
2. Provide, or have them create, the table in Table 2.
3. Have students run an independent-samples t test (assume equal variances).
4. Copy and paste the treatment and control data to increase the ns by 5, repeating the t test each time.
5. Fill in the table with values from the analyses.
Table 1 and Table 2 (not reproduced in this transcript).

Classroom exercise 1: Results
This exercise should point out that p values decrease across the 3 experiments even though the treatment has the same effect in each. Why? Students should come to appreciate that larger samples are associated with smaller estimated standard errors. For a constant mean difference (which doesn't change in this exercise), a smaller standard error produces larger t values and therefore smaller p values.
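The same pattern can be reproduced outside Excel. The sketch below uses only the Python standard library; the data values are made up for illustration (the original Table 1 is not available), and the p value is obtained by numerically integrating the t density with Simpson's rule rather than by whatever routine Excel uses. It runs a pooled-variance t test on the original data and on two and three pasted copies of it.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p value: 1 - 2 * (area under the t density from 0 to |t|)."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)


def pooled_t_test(x, y):
    """Independent-samples t test assuming equal variances."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    df = nx + ny - 2
    se = math.sqrt(ss / df * (1 / nx + 1 / ny))
    t = (mx - my) / se
    return mx - my, se, t, two_sided_p(t, df)


# Hypothetical data (NOT the slide's Table 1): mean difference of 1 point.
treatment = [6, 7, 8, 9, 10]
control = [5, 6, 7, 8, 9]

results = []
for copies in (1, 2, 3):  # n per group = 5, 10, 15
    diff, se, t, p = pooled_t_test(treatment * copies, control * copies)
    results.append((5 * copies, diff, se, t, p))
    print(f"n={5 * copies:2d}  diff={diff:.2f}  SE={se:.3f}  t={t:.3f}  p={p:.3f}")
```

With these assumed numbers the mean difference stays at 1 in every run while the standard error shrinks and the p value falls, which is the point of the exercise.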

Classroom exercise 2: Addressing the validity fallacy
Imagine 3 studies that compare students with high Facebook time (Treatment, or T) and low Facebook time (Control, or C) on GPA, with descriptive statistics from the studies in a table (not reproduced in this transcript):
1. Have students observe (via hand-calculated t tests or 95% confidence intervals) that none of the 3 studies would reject H0 at p<.05.
2. In Excel (using the Meta Easy add-in), have students enter the data from the 3 hypothetical studies and generate a meta-analysis of the effect of Facebook time on GPA.

Classroom exercise 2: Results
The exercise should point out that although none of the 3 studies is statistically significant (defined as p<.05), when their data are combined the Facebook effect on GPA is significant. Students should notice that the 95% CI estimate of the Facebook effect on GPA (the FE diamond) does not include 0.
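A rough sketch of the pooling step, assuming an inverse-variance fixed-effect model and normal-approximation 95% intervals (the Meta Easy add-in may compute things differently). The three mean differences and standard errors are invented for illustration, since the slide's table is not available; negative values mean lower GPA in the high-Facebook group.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def ci95(estimate, se):
    """Normal-approximation 95% confidence interval."""
    return estimate - Z95 * se, estimate + Z95 * se


def fixed_effect_pool(estimates, ses):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical GPA differences (T minus C) and standard errors for 3 studies.
diffs = [-0.25, -0.15, -0.20]
ses = [0.15, 0.10, 0.12]

study_cis = [ci95(d, s) for d, s in zip(diffs, ses)]
for i, (d, (lo, hi)) in enumerate(zip(diffs, study_cis), start=1):
    print(f"Study {i}: diff={d:+.2f}  95% CI=({lo:+.3f}, {hi:+.3f})")

pooled, pooled_se = fixed_effect_pool(diffs, ses)
pooled_ci = ci95(pooled, pooled_se)
print(f"Pooled:  diff={pooled:+.3f}  95% CI=({pooled_ci[0]:+.3f}, {pooled_ci[1]:+.3f})")
```

With these assumed inputs every single-study interval straddles 0, yet the pooled interval does not, because combining studies shrinks the standard error of the estimate.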

Summary lessons
These exercises allow data to teach students where p values come from and how to properly interpret them.
o Exercise 1 shows that although p values are influenced by both the mean difference and the sample size, they cannot be trusted to quantify the mean difference alone.
o Exercise 2 shows that evidence from “nonsignificant” studies, when taken as evidence against H1, can be misleading. Genuine treatment effects may be obscured in studies with small samples, high variability, or both.

On teaching statistical inference: more estimation, less NHST
o Typical social science statistics textbooks and curricula are overdependent upon NHST methods for statistical inference.
o These exercises can be part of a larger effort to teach more estimation methods in basic statistics courses, including confidence intervals, effect size statistics, and meta-analysis.
o Estimation methods are more intuitive, because they speak to research, rather than null, hypotheses.
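As one possible illustration of an effect size statistic, the sketch below computes Cohen's d (mean difference divided by the pooled standard deviation). The slides do not name a specific effect size measure, so the choice of Cohen's d is an assumption, and the data are made up for illustration.

```python
import math


def cohens_d(x, y):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    s_pooled = math.sqrt(ss / (nx + ny - 2))
    return (mx - my) / s_pooled


# Hypothetical data, not taken from the slides.
treatment = [6, 7, 8, 9, 10]
control = [5, 6, 7, 8, 9]

d_small = cohens_d(treatment, control)          # n = 5 per group
d_large = cohens_d(treatment * 3, control * 3)  # same data pasted 3 times
print(f"d with n=5 per group:  {d_small:.3f}")
print(f"d with n=15 per group: {d_large:.3f}")
```

Unlike a p value, d changes only slightly when the same data are tripled (the small shift comes from the degrees-of-freedom correction in the pooled standard deviation), so it describes the size of the effect rather than the size of the sample.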