Effect size reporting reveals the weakness Fisher believed inherent in the Neyman-Pearson approach to statistical analysis

Michael T. Bradley & A. Luke MacNeill, Department of Psychology, University of New Brunswick
Andrew Brand, Institute of Psychiatry, King's College London

Abstract

Effect sizes provide concrete support for Fisher's arguments against Neyman and Pearson's approach to statistical analysis. Neyman and Pearson approached statistics from an applied perspective and sampled from production lines. They could specify the probability that some batch was abnormal and then estimate the power of detecting that abnormality. They accepted p < .05 as the threshold for a Type I error, established the .10 level as the willingness to make a Type II error, and suggested that these two specifications belonged in scientific research. Fisher felt both specifications implied a precision defensible in routine production but not in scientific investigation. He worried that estimates based on a single experiment or a small set of experiments would be unstable. For Fisher, accepting a value at .05 only means that new or manipulated data do not fit a "model" of a null hypothesis with a specified mean and variance. The question for Fisher is "What is the correct model for such a result?" Neyman and Pearson, on slim data, act as if the model was specified before testing began. The recommended reporting of effect sizes compounds problems with the Neyman-Pearson approach, since the model is accepted as correct and effect size estimates are then considered accurate without considering distribution factors, the number of potential attempts to test a hypothesis, or "file drawer" effects.

Background

Fisher felt inferential tests were the most primitive forms of measurement. In his view, p is an imprecise estimate that indicates, from an individual study, whether anything worth pursuing is present. If so, then superior design, measurement, and data analysis strategies can be pursued. Neyman and Pearson (N-P) approached statistics from an applied perspective. They sampled from production lines and could specify (1) the probability that some batch was abnormal, and (2) the power of detecting that abnormality. N-P suggested that this measure of precision could be obtained with inferential tests. Researchers either accept a null hypothesis (H0) or reject it in favor of an alternative hypothesis (HA). A Type I error (α) is the false rejection of H0, and a Type II error (β) is the false acceptance of H0. N-P accepted .05 as the threshold for a Type I error and established the .10 level as the willingness to make a Type II error. They suggested that these two specifications belonged in scientific research. Fisher felt both specifications implied a precision defensible in routine production but not in scientific investigation. He worried that estimates based on a single experiment or a small set of experiments would be unstable. For Fisher, accepting a value at .05 only means that the data do not fit a "model" of a null hypothesis with a specified mean and variance. The question for Fisher is "What is the correct model for such a result?" Neyman and Pearson, on slim data, act as if the model was specified before testing began.
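To make the α/β trade-off in the N-P framework concrete, here is a minimal sketch (in Python with SciPy; the effect size, group sizes, and the normal approximation are illustrative assumptions, not values from the poster) showing how power, and hence β, follows from α, the effect size, and N:

```python
from scipy.stats import norm

def approx_power(d, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample test of effect size d."""
    z_crit = norm.ppf(1 - alpha / 2)       # rejection criterion under H0
    shift = d * (n_per_group / 2) ** 0.5   # where the HA distribution is centred
    # Area of the HA distribution in the rejection region; the far (lower)
    # tail is negligible for positive d and is ignored here.
    return 1 - norm.cdf(z_crit - shift)

# Illustrative effect size and sample sizes (assumptions):
for n in (20, 50, 100, 200):
    power = approx_power(d=0.5, n_per_group=n)
    print(f"n per group = {n:3d}: power = {power:.2f}, beta = {1 - power:.2f}")
```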
Effect Sizes and NHST

The use of effect sizes is incompatible with the Neyman-Pearson approach. NHST is not a precise measurement technique, whereas effect size estimates are meant to be point estimates. Effect sizes are scaled by the standard deviation, which indexes the typical deviation of scores from the mean. Theoretically, adding more participants to a study will not systematically change the standard deviation, so it will not change the effect size. NHST, in contrast, is based on the standard error. Adding more participants to a study will decrease the standard error, which will increase power and the chances of significance.

Confidence Intervals

Confidence intervals are based on an inferential approach and clearly belong to the NHST family. Like NHST, confidence intervals are calculated from the standard error and therefore depend on N. Since Ns can vary, this is analogous to using an elastic band as a ruler. Significance comes and goes, and worse, the only available effect size calculations come from misestimates of mean differences, variability, or a combination of both. An additional problem is that confidence intervals are often huge and exceed the magnitude of the effect size they are meant to bracket. If replication involves achieving a similar effect size, then it is difficult to achieve confirmatory results. At the same time, if replication involves fitting into the confidence interval, it is difficult to fail to replicate. This contradiction illustrates further issues with the N-P approach.
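The asymmetry between the two quantities is easy to verify by simulation. The sketch below (in Python with NumPy; the population parameters and sample sizes are illustrative assumptions) shows the sample standard deviation settling near its population value as N grows, while the standard error, and with it the 95% CI width, keeps shrinking:

```python
import numpy as np

rng = np.random.default_rng(42)
true_mean, true_sd = 0.3, 1.0  # hypothetical population parameters (assumption)

for n in (20, 80, 320, 1280):
    sample = rng.normal(true_mean, true_sd, size=n)
    sd = sample.std(ddof=1)        # basis of effect size estimates (e.g., Cohen's d)
    se = sd / np.sqrt(n)           # basis of NHST and confidence intervals
    ci_half = 1.96 * se            # approximate 95% CI half-width
    print(f"N={n:4d}  SD={sd:.3f}  SE={se:.4f}  95% CI half-width={ci_half:.4f}")
```

The SD column hovers around 1.0 at every N, while the SE and CI columns shrink roughly with the square root of N: the "ruler" stretches or contracts with sample size even though the quantity being measured does not.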
Calculating an effect size only after a significant NHST can result in an overestimation of the effect size in a given research area. If studies are underpowered (i.e., there is a substantial chance that a researcher will miss a real effect), then only the effect sizes from studies that improbably achieve significance will be considered and published. Even when studies are appropriately powerful (90% in N-P terms), significance tests exclude 10% of effect size estimates from availability. In many areas of research, significance testing would appear to truncate the potential family of effect size estimates. If a hallmark of science is accuracy of measurement, then it is precluded in the N-P model.
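This truncation can be demonstrated directly. The following sketch (in Python with NumPy and SciPy; the true effect, group size, and number of studies are illustrative assumptions) runs many underpowered two-group studies, "publishes" only those reaching p < .05, and compares the mean published effect size with the true one:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n_per_group, n_studies = 0.2, 25, 10_000  # deliberately underpowered (assumption)

published = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_group)
    treated = rng.normal(true_d, 1.0, n_per_group)
    _t, p_value = stats.ttest_ind(treated, control)
    pooled_sd = np.sqrt((treated.var(ddof=1) + control.var(ddof=1)) / 2)
    d = (treated.mean() - control.mean()) / pooled_sd  # Cohen's d for this study
    if p_value < 0.05:             # only "significant" results leave the file drawer
        published.append(d)

print(f"true d = {true_d}; significant studies: {len(published)}/{n_studies}")
print(f"mean published d = {np.mean(published):.3f}")  # markedly inflated above true d
```

Because only the studies whose sample effects happened to be large enough to clear the significance criterion survive, the mean published estimate lands far above the true effect of 0.2, which is the overestimation mechanism described above.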
[Figure 1: overlapping H0 and HA sampling distributions, with the criterion dividing "Do Not Reject H0" from "Reject H0" and the α, β, and power regions marked.]
Figure 1. Even when a series of studies is appropriately powerful (90%), significance tests exclude 10% of effect size estimates from availability (β).

[Figure 2: confidence intervals for a control group, a failed replication, and a significant manipulation.]
Figure 2. If replication involves achieving a significant difference between groups, then it is difficult to achieve confirmatory results. If replication involves fitting the new value into a confidence interval, it is difficult to fail to replicate.

Conclusion

The APA compounds data analysis problems by endorsing the N-P approach and recommending the reporting of Null Hypothesis Significance Tests (NHST) in conjunction with effect size estimates, confidence intervals (CIs), and a clear description of the problem area. One cannot go wrong with a clear description of a problem area. As we have seen, however, the other recommendations are incompatible with one another. Fisher did not live to see the ultimate missteps in testing based upon the N-P approach, but if he could have predicted how influential N-P were to become, he might well have moved from the mild fury that characterized his attacks to apoplectic fits.

