Interpreting Null Hypothesis Significance Tests (NHSTs)
Joe Stevens, Ph.D.
Educational Psychology
120 Simpson Hall
Presentation available at:
© Stevens, 2002
Which study is “stronger”?
Study A: t(398) = 2.30, p = .022
Study B: t(88) = 2.30, p = .024
Examples inspired by Rosenthal & Gaito (1963).
Answer: Study B
ω² for Study A = .0106
ω² for Study B = .0455
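The two ω² values can be reproduced from the reported t statistics and degrees of freedom. The following is a minimal sketch, assuming the common conversion ω² = (t² − 1) / (t² + df + 1) for a two-group comparison; the function name `omega_squared_from_t` is illustrative and does not come from the presentation.

```python
def omega_squared_from_t(t: float, df: int) -> float:
    """Estimate omega-squared from an independent-samples t statistic.

    Assumes the conversion omega^2 = (t^2 - 1) / (t^2 + df + 1),
    where df = n1 + n2 - 2.
    """
    t_sq = t * t
    return (t_sq - 1.0) / (t_sq + df + 1.0)


# Same t value, very different sample sizes.
study_a = omega_squared_from_t(2.30, 398)
study_b = omega_squared_from_t(2.30, 88)
print(f"Study A: omega^2 = {study_a:.4f}")  # 0.0106
print(f"Study B: omega^2 = {study_b:.4f}")  # 0.0455
```

With t held constant, the smaller study must have the larger effect, because less of the test statistic is attributable to sample size.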
Which study is “stronger”?
Study P: F(2, 697) = 5.30, p =
Study Q: F(2, 97) = 5.30, p = .00654
Answer: Study Q
η² for Study P = .0150
η² for Study Q = .0985
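The values reported for Studies P and Q are consistent with η² recovered from the F ratio and its degrees of freedom. A minimal sketch, assuming the standard identity η² = (df_effect × F) / (df_effect × F + df_error); the function name `eta_squared_from_f` is hypothetical.

```python
def eta_squared_from_f(f: float, df_effect: int, df_error: int) -> float:
    """Recover eta-squared from an F ratio.

    Uses eta^2 = (df_effect * F) / (df_effect * F + df_error),
    which follows from F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    effect_term = df_effect * f
    return effect_term / (effect_term + df_error)


# Same F value, very different error degrees of freedom.
study_p = eta_squared_from_f(5.30, 2, 697)
study_q = eta_squared_from_f(5.30, 2, 97)
print(f"Study P: eta^2 = {study_p:.4f}")  # 0.0150
print(f"Study Q: eta^2 = {study_q:.4f}")  # 0.0985
```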
Which is “significant”/important?
Study T: F = 63.62, p <
Study U: F = 5.40, p = .0486
η² for Study T = .01, N = 6,300
η² for Study U = .40, N = 10
Study T is a very small effect; Study U is a much larger effect.
“Significance”/importance cannot be determined statistically.
Significance Test Results = Effect Size × Size of Study
t = [r / √(1 − r²)] × √df
Significance Test Results = Effect Size × Size of Study
t = [(M₁ − M₂) / S] × [1 / √(1/n₁ + 1/n₂)]
Significance Test Results = Effect Size × Size of Study
t = d × (√df / 2)
Significance Test Results = Effect Size × Size of Study
F = [r² / (1 − r²)] × df
Significance Test Results = Effect Size × Size of Study
F = [η² / (1 − η²)] × (df error / df means)
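The decomposition "test statistic = effect size × size of study" can be illustrated by holding an effect size fixed and varying only the degrees of freedom. This is a sketch under two assumed identities: t = d × √df / 2 (two groups of equal size) and F = η² / (1 − η²) × (df_error / df_effect). The function names and the example effect sizes (η² = .05, d = 0.30) are illustrative and are not taken from the presentation.

```python
import math


def t_from_d(d: float, df: int) -> float:
    """t = d * sqrt(df) / 2 (assumes two groups of equal size)."""
    return d * math.sqrt(df) / 2.0


def f_from_eta_squared(eta_sq: float, df_effect: int, df_error: int) -> float:
    """F = eta^2 / (1 - eta^2) * (df_error / df_effect)."""
    return eta_sq / (1.0 - eta_sq) * (df_error / df_effect)


# Hold the effect size fixed and vary only the size of the study.
for df_error in (97, 697):
    f_value = f_from_eta_squared(0.05, 2, df_error)
    print(f"eta^2 = .05, df_error = {df_error}: F = {f_value:.2f}")

for df in (88, 398):
    print(f"d = 0.30, df = {df}: t = {t_from_d(0.30, df):.2f}")
```

In both loops the effect size never changes, yet the test statistic grows with the degrees of freedom, which is why a test statistic alone cannot indicate how strong a relationship is.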
Interpreting Statistical Results
NHST results (t, F, p, etc.) answer the question: Is there a relationship between variables (yes/no)?
Effect size (d, g) and strength-of-association measures (r², ω², η²) answer the question: How strong is the relationship?
NHST results should always be accompanied by an estimate of effect size; report both.
Estimating statistical power is also advisable, especially when NHST results are not statistically significant.
Confidence intervals can also aid proper interpretation of NHST results.
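As one way to act on the power recommendation, the sketch below approximates the power of a two-sided, two-sample comparison of means. It is a rough illustration only: it assumes equal group sizes and substitutes a normal approximation for the noncentral t distribution, so it slightly overstates power in small samples. The function name `approx_power_two_sample` and the example values (d = 0.5, 64 per group) are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def approx_power_two_sample(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample test of means.

    Normal approximation: the test statistic is treated as
    Normal(d * sqrt(n / 2), 1) under the alternative hypothesis.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    shift = d * math.sqrt(n_per_group / 2.0)
    # Probability of landing in either rejection region.
    upper_tail = std_normal.cdf(shift - z_crit)
    lower_tail = std_normal.cdf(-shift - z_crit)
    return upper_tail + lower_tail


# A medium effect (d = 0.5) with 64 participants per group.
power = approx_power_two_sample(0.5, 64)
print(f"Approximate power: {power:.2f}")
```

When the effect size is set to zero, the function returns alpha, which is a quick sanity check that the rejection regions are defined correctly.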
Bibliography
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997–
Harlow, L. L., Mulaik, S. A., & Steiger, J. H. (Eds.). (1997). What if there were no significance tests? Hillsdale, NJ: Erlbaum.
Rosenthal, R., & Gaito, J. (1963). The interpretation of levels of significance by psychological researchers. Journal of Psychology, 55,
Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research (2nd ed.). New York: McGraw-Hill.
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25(2), 26–30.
Wilkinson, L., & Task Force on Statistical Inference (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54(8), 594–604. [available at: