Null Hypothesis Testing: Nickerson (2000), Psychological Methods
Common misconceptions: p(D|H) = p(H|D). Bayes' theorem shows how the two are actually related: we must also consider the prior probability of the hypothesis. Compare the probability that someone has a disease given a positive test result, p(H|D), with the probability of a positive test result given that they have the disease, p(D|H). If the disease is rare, chances are good that they do not have it even though the result is positive. https://www.ted.com/talks/peter_donnelly_shows_how_stats_fool_juries
Picture of the problem: Zika in Florida. (Figure labels: Do not have Zika; Have Zika.) If the lab score is > 80, the test for Zika is positive. The test has a 90 percent true positive rate and a 10 percent false positive rate, so p(positive|Zika) = .90, yet p(Zika|positive) < .50. If we slide the cutoff score down toward 75, p(positive|Zika) will approach 1 and p(Zika|positive) will diminish.
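A minimal sketch of the Zika arithmetic, under stated assumptions. The slide gives the 90 percent true positive rate, the 10 percent false positive rate, and the cutoffs 80 and 75. The prevalence of 5 percent, the normal shape of the score distributions, and their standard deviation are hypothetical values chosen only so that the cutoff of 80 reproduces the slide's rates.

```python
from statistics import NormalDist


def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """Bayes' theorem: p(Zika | positive) from p(positive | Zika) and the prior."""
    true_positives = sensitivity * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


PREVALENCE = 0.05  # hypothetical base rate, not given on the slide
SCORE_SD = 5.0     # hypothetical spread of lab scores

# Place two normal score distributions so that a cutoff of 80 gives
# sensitivity .90 and false positive rate .10, as on the slide.
z90 = NormalDist().inv_cdf(0.90)
no_zika = NormalDist(mu=80 - z90 * SCORE_SD, sigma=SCORE_SD)
zika = NormalDist(mu=80 + z90 * SCORE_SD, sigma=SCORE_SD)


def rates_at(cutoff):
    """Return (sensitivity, false positive rate) when scores > cutoff are positive."""
    sensitivity = 1.0 - zika.cdf(cutoff)
    false_positive_rate = 1.0 - no_zika.cdf(cutoff)
    return sensitivity, false_positive_rate


for cutoff in (80, 75):
    sens, fpr = rates_at(cutoff)
    post = posterior_given_positive(PREVALENCE, sens, fpr)
    print(f"cutoff {cutoff}: p(pos|Zika) = {sens:.3f}, p(Zika|pos) = {post:.3f}")
```

With these assumed numbers, lowering the cutoff raises p(positive|Zika) while p(Zika|positive) falls, because the extra positives come mostly from the much larger group without Zika.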
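Nickerson 1: Belief that p is the probability that the null is true and 1 - p is the probability that the alternative is true. In fact, p is a statement about the data, p(D|H0): the probability of data at least this extreme given that the null is true. Getting from there to p(H0|D) requires Bayes' theorem and a prior probability for the hypothesis.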
Nickerson 2: Belief that rejection of the null establishes the truth of a theory that predicts it to be false. Example: predicting that the sun will (not) rise. This is affirming the consequent: typically many things can result in the same outcome.
Nickerson 3: Belief that a small p means the results are replicable. What does replication mean? It is often ambiguous. A significant result again? Power: small studies are often chosen for publication because of their significant results. Unless the population effect size is large, further significant results are unlikely (Howell 8.7). The Aronson et al. (1998) stereotype threat study had a significant result, but power calculations show its power to be .52, so there is about a 50/50 chance of finding a significant result in a replication.
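A rough sketch of what a power of about .52 implies for replication. It uses a normal approximation to a two-group comparison. The effect size d = 0.5 and n = 32 per group are hypothetical values chosen because they give power near .52; they are not the figures from Aronson et al. (1998).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test (normal approximation)."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    noncentrality = d * (n_per_group / 2.0) ** 0.5
    upper_tail = STD_NORMAL.cdf(noncentrality - z_crit)
    lower_tail = STD_NORMAL.cdf(-noncentrality - z_crit)
    return upper_tail + lower_tail


# Hypothetical design: medium effect, small groups.
power = approx_power(d=0.5, n_per_group=32)
print(f"power of one study: {power:.2f}")

# Chance that an original study AND an independent replication are both significant.
print(f"both significant:   {power * power:.2f}")
```

If the true effect is of this size, each study is close to a coin flip, so a significant original result says little about whether the next one will also be significant.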
Nickerson 4: Confusion between statistical and practical (or clinical) significance. A small effect size can yield a small p value if the sample size is large. Present effect sizes along with your significance tests.
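One way to see this is to hold a tiny effect fixed and change only the sample size. The sketch below assumes a two-group z test with an observed standardized difference of d = 0.05; both that value and the two sample sizes are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(d, n_per_group):
    """Two-sided p value for an observed standardized mean difference d (z test)."""
    z = d * (n_per_group / 2.0) ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


TINY_EFFECT = 0.05  # hypothetical: one twentieth of a standard deviation

p_small_n = two_sided_p(TINY_EFFECT, n_per_group=100)
p_large_n = two_sided_p(TINY_EFFECT, n_per_group=10_000)

print(f"d = {TINY_EFFECT}, n = 100 per group:    p = {p_small_n:.4f}")
print(f"d = {TINY_EFFECT}, n = 10000 per group:  p = {p_large_n:.4f}")
```

The effect size is the same in both lines; only the p value changes. This is why the effect size needs to be reported alongside the test.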
Nickerson 5: Belief that alpha is the probability that, if one has rejected the null, one has made a Type I error. Alpha is the probability of rejecting the null when it is true, so a Type I error is possible only if the effect is nil (ES = 0) in the population. That is very unlikely unless the data are random numbers. ‘Nature abhors a vacuum.’ Google attributes this quote to Aristotle.
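The gap between alpha and the probability of a Type I error given rejection can be made concrete with Bayes' theorem. The sketch below is illustrative only: the proportions of true nulls and the power of .50 are hypothetical inputs, not values from Nickerson.

```python
def p_type_one_given_rejection(prior_null_true, alpha=0.05, power=0.50):
    """p(null is true | null was rejected), by Bayes' theorem."""
    false_rejections = alpha * prior_null_true
    correct_rejections = power * (1.0 - prior_null_true)
    return false_rejections / (false_rejections + correct_rejections)


# Hypothetical proportions of tested null hypotheses that are exactly true.
for prior in (0.0, 0.5, 0.9):
    prob = p_type_one_given_rejection(prior)
    print(f"p(null true) = {prior:.1f} -> p(Type I error | rejected) = {prob:.3f}")
```

When the nil hypothesis is never exactly true, the probability of a Type I error after rejection is 0. In general it depends on the prior and on power, and it equals alpha only by coincidence.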
Nickerson 6: Belief that failing to reject the null hypothesis is equivalent to demonstrating it to be true. This is a common misunderstanding, for example in vote counting in meta-analysis. The kernel of truth is that sometimes you have to make a decision, e.g., which strain to plant or whether to implement a new training program. One good reason to avoid accepting the null is that it is really easy to get nonsignificant results: use a really small sample, choose poor measures, or provide a weak treatment. Alpha makes it hard to get a good result by incompetence; beta makes it easy.
Nickerson 7: All-or-none decisions; arbitrary boundary. There is a difference between a best estimate and a decision to be made on the basis of the data. Provide a point estimate, but also provide a confidence interval to communicate the uncertainty that remains after data collection.
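A small sketch of reporting a point estimate together with its confidence interval. The scores are simulated, hypothetical data, and the interval uses the large-sample normal approximation (an assumption; with small samples a t-based interval would be the usual choice).

```python
import random
from statistics import NormalDist, mean, stdev


def mean_with_ci(sample, confidence=0.95):
    """Return (point estimate, lower bound, upper bound) for the mean."""
    n = len(sample)
    estimate = mean(sample)
    standard_error = stdev(sample) / n ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return estimate, estimate - z * standard_error, estimate + z * standard_error


# Hypothetical data: 200 simulated scores (seeded so the run is repeatable).
rng = random.Random(0)
scores = [rng.gauss(100, 15) for _ in range(200)]

estimate, lower, upper = mean_with_ci(scores)
print(f"mean = {estimate:.1f}, 95% CI [{lower:.1f}, {upper:.1f}]")
```

The point estimate answers "what is our best guess?"; the interval shows how much uncertainty remains, without forcing an all-or-none verdict at an arbitrary boundary.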
Nickerson 8: Use non-nil hypotheses; be specific about alternatives; use a range null; report p values; report precision of measurement; report confidence intervals; report effect sizes. Two main questions: What is the point prediction or estimate? What is the magnitude of the effect?
ASA statement What is a p-value? Informally, a p-value is the probability under a specified statistical model that a statistical summary of the data (for example, the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.
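The definition can be illustrated with an exact permutation test, one of several ways to compute a p-value. Here the "specified statistical model" is that group labels are exchangeable, and the "statistical summary" is the absolute difference in group means. The two small groups of scores are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Share of all relabelings whose mean difference is at least as extreme as observed."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    as_extreme = 0
    relabelings = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        difference = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point ties being missed.
        if difference >= observed - 1e-12:
            as_extreme += 1
        relabelings += 1
    return as_extreme / relabelings


# Hypothetical scores for two compared groups.
treatment = [12, 15, 14, 16, 18]
control = [10, 11, 13, 9, 12]

p_value = permutation_p_value(treatment, control)
print(f"permutation p-value: {p_value:.4f}")
```

The result is a statement about how unusual the observed difference would be if labels were arbitrary. It is not the probability that the null model is true.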
1. P-values can indicate how incompatible the data are with a specified statistical model. 2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold. 4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result. 6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis. No single index should substitute for scientific reasoning.