Common Statistical Errors (and how to avoid them)
Conflict of Interest Disclosure: I have no potential conflict of interest to report.
A quick tour of common statistical errors: advice to help your submission pass statistical review
Pop quiz: p values
Can you spot the error?
Bill Gates walks into a room…
Error 1 DATA STATISTICIAN P VALUES
Biostatisticians: Consultation, Collaboration, Partnership. Consultation: the statistician works on a single paper or protocol as a once-off. Collaboration: the statistician works with you on all of your projects over many years. Partnership: the statistician is a key part of the research team and develops scientific ideas.
Statisticians can also help with: thinking through the scientific question; experimental design; data collection; data quality assurance.
Inference: is something there?
Estimation: How big is it? Mean, median, proportion.
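One reason the choice of estimate matters: a single extreme value can move the mean a great deal while barely moving the median. A minimal sketch with made-up incomes (every number below is hypothetical and chosen only for illustration):

```python
from statistics import mean, median

# Hypothetical annual incomes (in dollars) for five people in a room.
incomes = [40_000, 45_000, 50_000, 55_000, 60_000]

# The same room after one hypothetical person with an extreme income joins.
with_outlier = incomes + [1_000_000_000]

print("mean before:  ", mean(incomes))
print("median before:", median(incomes))
print("mean after:   ", round(mean(with_outlier)))
print("median after: ", median(with_outlier))
```

The mean jumps by several orders of magnitude while the median shifts only slightly, so which summary is "the" estimate depends on the question being asked.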
Trial Results: 10% reduction in breast cancer incidence, P=0.07
Error 2: Messi beats me 5 – 1, P = 0.08 by binomial test
Error 2: I could play for Barcelona!
Inference 101: State a null hypothesis. Get your data, calculate p value. If p<5%, reject null hypothesis. If p ≥5%, don’t reject null hypothesis.
Inference 101: Don’t accept the null hypothesis. In a court case: guilty or not guilty. In a statistical test: reject or don’t reject.
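The Inference 101 decision rule can be sketched in a few lines. The function name `decide` and the default threshold argument `alpha` are hypothetical, introduced only for this illustration; the point is that the two possible outputs are "reject" and "do not reject", never "accept".

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the reject / don't-reject rule at significance level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject null hypothesis"
    # A large p value is absence of evidence, not evidence of absence,
    # so the null hypothesis is never "accepted".
    return "do not reject null hypothesis"


print(decide(0.03))
print(decide(0.07))
```

With p = 0.07, as in the trial result and the Messi example, the rule says only that the null hypothesis was not rejected; it does not say the drug has no effect or that the two players are equally good.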
Error 2: Barcelona still hasn’t called.
Which is bigger?
Error 3: “At randomization, there was a statistically significant difference in age between the drug and placebo group (p=0.04).” “Erectile function decreased in older men during the two-year follow-up period (p<0.0001).”
Error 3: Don’t run a statistical test if you already know the answer
Error 4: Erk3, ECAD, P21, P53, Cadherin, IL-6, IL-12 and Jak had no association with outcome (p>0.2 for all); Ki67 was a predictor of recurrence (p=0.03). We recommend that Ki67 be measured to determine eligibility for adjuvant chemotherapy.
Error 4: Multiple testing. Looked at 9 different biomarkers. Roughly a 37% chance of at least one marker with p<0.05 by chance alone (1 − 0.95^9, if the tests are independent and no marker is truly associated with outcome). 1 significant p-value is not grounds to change practice.
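The arithmetic behind the multiple-testing warning can be sketched as follows. It assumes the tests are independent and that no marker is truly associated with outcome; these are simplifying assumptions not stated in the talk, and correlated biomarkers would change the number. The function name is hypothetical. For nine tests it gives a bit more than a one-in-three chance, in line with the figure on the slide.

```python
def chance_of_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Probability that at least one of n_tests independent tests of true
    null hypotheses gives p < alpha."""
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    return 1.0 - (1.0 - alpha) ** n_tests


for n in (1, 9, 20):
    print(f"{n:>2} tests: {chance_of_false_positive(n):.2f}")
```

The probability grows quickly with the number of tests, which is why a single p=0.03 among nine markers is weak evidence on its own.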
Error 4: Every single p value tests a hypothesis. Think carefully about every scientific question you want to ask.
Error 5: RESULTS: Compared with a BMI of 18.5 to 21.9 kg/m2 at age 18 years, the hazard ratio for premature death was 2.79 (CI, 2.04 to 3.81) for a BMI of 30 kg/m2 or greater. CONCLUSION: Moderately higher adiposity at age 18 years is associated with increased premature death in younger and middle-aged U.S. women.
Error 5: A RESULT IS NOT A CONCLUSION!
Biostatistics Biology Math
Error 5: OLD CONCLUSION: Moderately higher adiposity at age 18 years is associated with increased premature death in younger and middle-aged U.S. women. NEW CONCLUSION: Public health interventions should relay the risks of premature death among overweight women to encourage teen girls to avoid obesity.
Error 6: Mean gestational time was 36.345 weeks in the experimental group compared to 36.229 weeks in controls (p=0.6945).
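If the problem in this example is read as excess precision (three decimals of a week is about a minute and a half of gestation, and four decimals on a p value of 0.69 add nothing), a reporting helper might look like the sketch below. The rounding convention used here (one decimal for weeks, two decimals for p values, three below 0.01, and a floor of "<0.001") is an assumption for illustration, not a rule taken from the talk, and the function names are hypothetical.

```python
def format_p(p: float) -> str:
    """Format a p value with modest precision (assumed convention)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if p < 0.001:
        return "p<0.001"
    if p < 0.01:
        return f"p={p:.3f}"
    return f"p={p:.2f}"


def format_weeks(weeks: float) -> str:
    """Report gestational time to one decimal place (assumed convention)."""
    return f"{weeks:.1f} weeks"


print(
    f"Mean gestational time was {format_weeks(36.345)} in the experimental "
    f"group compared to {format_weeks(36.229)} in controls ({format_p(0.6945)})."
)
```

The rewritten sentence reports 36.3 weeks versus 36.2 weeks with p=0.69, which carries the same information in a form a reader can actually use.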
Statistical Code
Simple Statistical Errors. Melissa Assel, Research Biostatistician, Memorial Sloan Kettering Cancer Center