Significance testing Introduction to Intervention Epidemiology Tunis, 3 November 2014 Dr. Ibrahim Saied Epidemiologist, Ministry of Health and Population - Egypt isi.health.eg@gmail.com
Objectives
By the end of this presentation, participants should have sound concepts about:
- Null hypothesis
- P value
- Significance testing
- Confidence interval
Association vs. significance test
[Diagram: a problem with two candidate factors, Factor A and Factor B. Step 1, calculation or estimation of association (values shown: RR = 1, RR = 5). Step 2, significance testing (value shown: P = 0.03). Analogy: from engagement to marriage.]
The aim of a statistical test
To reach a scientific decision (“yes” or “no”) on a difference (or effect), on a probabilistic basis, from observed data.
Example: which group is taller, female or male? A test moves the answer from subjective to objective.
Null Hypothesis
Null vs. alternative hypotheses
- Null hypothesis (H0): “There is no difference (no effect).” RR = 1 ; OR = 1
- Alternative hypothesis (H1): “There is a difference (effect).” It indicates that the null hypothesis is not true. RR ≠ 1 ; OR ≠ 1
→ The intention of a significance test is to reject the null hypothesis!
Even in law The accused person has to be considered innocent until proved otherwise.
Null hypothesis These squares have no white elephants
Alternative hypothesis These squares have white elephants
Right decisions and types of errors
- Truth: H0 is true. Decision "H0 is not rejected": correct decision. Decision "H0 is rejected": Type I error (α error).
- Truth: H0 is false. Decision "H0 is not rejected": Type II error (β error). Decision "H0 is rejected": correct decision (power).
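A small simulation can make the Type I error concrete. The sketch below is an illustration, not part of the original slides: it assumes a fair coin (so H0 is true by construction), a two-sided z-test based on the normal approximation, and a 5 % significance level. The function names, sample sizes and seed are arbitrary choices.

```python
import math
import random


def z_test_rejects(heads, n, z_crit=1.96):
    """Two-sided z-test of H0: P(heads) = 0.5, using the normal approximation."""
    z = (heads - 0.5 * n) / math.sqrt(0.25 * n)
    return abs(z) > z_crit


def type_one_error_rate(n_trials=2000, n_tosses=100, seed=1):
    """Share of simulated experiments that reject H0 although H0 is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(n_trials):
        heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
        if z_test_rejects(heads, n_tosses):
            rejections += 1
    return rejections / n_trials


rate = type_one_error_rate()
print(f"Rejected a true H0 in {rate:.1%} of simulated experiments")
```

The printed rate should land near the chosen 5 % level. It will not be exactly 5 % because the number of heads is discrete and the simulation is finite.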
P-value
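P-value
The probability, if the null hypothesis is true (chance alone is at work), of getting a result at least as extreme as the one we observed. The smaller the p-value, the less our result resembles what chance alone would produce, and the more it helps to reject the null hypothesis.
[Figure: similarity between chance and results, illustrated with 25 %, 15 %, 4 % and 3 %.]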
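To see what a p-value measures, it can be computed by hand for a simple case. The sketch below assumes a hypothetical observation of 60 events in 100 trials when H0 says the probability of an event is 0.5, and uses an exact two-sided binomial test. The function name and the numbers are illustrative and do not come from the slides.

```python
from math import comb


def binomial_two_sided_p(successes, n, p_null=0.5):
    """Exact two-sided p-value: total probability, under H0, of every
    outcome that is no more likely than the observed one."""

    def pmf(k):
        return comb(n, k) * p_null**k * (1 - p_null) ** (n - k)

    observed = pmf(successes)
    # The small tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-9))


p_value = binomial_two_sided_p(60, 100)
print(f"p = {p_value:.3f}")
```

The result is the share of all possible outcomes, weighted by their probability under H0, that are at least as far from expectation as the observed one.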
How to interpret the P value
Statistical expression: the conventional cut-off is 0.05 (5 %).
- Below the cut-off (e.g. 0.01, 0.04): significant
- Above the cut-off (e.g. 0.06, 0.15): not significant
Significance Tests
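Some types of significance tests
According to type of variables:
- Variable 1 categorical, Variable 2 categorical: Chi-square test
- Variable 1 categorical, Variable 2 numerical: t-test
- Variable 1 numerical, Variable 2 numerical: Correlation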
Test decision >> Scenario 1 Short distance between expected and observed: p value = 0.17, so do not reject the null hypothesis.
Test decision >> Scenario 2 Long distance between expected and observed: p value = 0.03, so reject the null hypothesis.
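The "distance" between expected and observed values can be made explicit with the Pearson chi-square statistic. The sketch below uses two hypothetical 2x2 tables (the counts are invented for illustration) and compares each statistic with 3.84, the usual critical value for 1 degree of freedom at the 0.05 level.

```python
def pearson_chi_square(table):
    """Pearson chi-square statistic for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if H0 (no association) were true.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical tables (rows: exposed / unexposed; columns: ill / well).
short_distance = [[12, 38], [10, 40]]
long_distance = [[25, 25], [10, 40]]

CRITICAL_1DF_005 = 3.84  # chi-square critical value, 1 df, alpha = 0.05

for name, table in [("short", short_distance), ("long", long_distance)]:
    statistic = pearson_chi_square(table)
    decision = "reject H0" if statistic > CRITICAL_1DF_005 else "do not reject H0"
    print(f"{name} distance: chi-square = {statistic:.2f} -> {decision}")
```

A statistic close to zero means the observed counts sit near the counts expected under H0; a large statistic means they are far apart.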
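How to read test results?
For each test, read the test value (Chi =, t = or r =), the P value, and the significance.
[Table: columns Test, Value, P Value, Sig.; the only entry shown is 0.032.]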
Example from SPSS
Example from SPSS (Cont’d) Chi-Square = 0.487, p = .485. When reading this table we are interested in the results of the "Pearson Chi-Square" row. We can see here that χ²(1) = 0.487, p = .485. This tells us that there is no statistically significant association between Gender and Preferred Learning Medium; that is, the data show no significant difference between Males and Females in their preference for online learning versus books.
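The reported p-value can be checked from the statistic itself. The sketch below relies on a standard identity for 1 degree of freedom (a chi-square variable with 1 df is a squared standard normal); the function name is an illustrative choice, and SPSS itself is not involved.

```python
import math


def chi_square_p_value_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # With 1 df the statistic is a squared standard normal, so
    # P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


p_value = chi_square_p_value_1df(0.487)
print(f"chi-square(1) = 0.487 -> p = {p_value:.3f}")
```

This reproduces the p = .485 quoted from the "Pearson Chi-Square" row.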
Confidence interval
Confidence interval Range of values, on the basis of the sample data, in which the population value (or true value) may lie. Example: a 95% CI comes from a procedure that, over repeated samples, captures the true value 95% of the time.
Confidence Interval
A confidence interval represents the range of effects that are compatible with the data.
A CI provides:
- Precision of the point estimate
- Direction of the effect (risk factor, protective factor)
- Magnitude of the measured effect
How reliable is the information (parameters) one obtains from the data?
Wide confidence interval [Figure: small sample; the range holding 95% of values around the sample value is wide relative to the parameter.]
Narrow confidence interval [Figure: large sample; the range holding 95% of values around the sample value is narrow relative to the parameter.]
Confidence interval (CI) Frequently used formulation: If the data collection and analysis could be replicated many times, the CI should include within it the true value of the measure 95% of the time.
Structure of CIs
(lower limit ; upper limit) = Point estimate ± “deviation”
The “deviation” depends on:
- Sample size
- Level of confidence
- Variability of data
The “deviation” comprises the standard error.
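The structure "point estimate ± deviation" can be sketched for a simple proportion. The example below assumes a Wald (normal approximation) interval, which is one of several possible methods and is not specified in the slides; the counts are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_ci(events, n, confidence=0.95):
    """Wald interval for a proportion: point estimate +/- z * standard error."""
    estimate = events / n
    standard_error = math.sqrt(estimate * (1 - estimate) / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    deviation = z * standard_error
    return estimate - deviation, estimate + deviation


# Same observed proportion (30%), different sample sizes and confidence levels.
for events, n, confidence in [(30, 100, 0.95), (300, 1000, 0.95), (30, 100, 0.99)]:
    low, high = proportion_ci(events, n, confidence)
    print(f"n = {n}, {confidence:.0%} CI: ({low:.3f} ; {high:.3f})")
```

Running it shows the interval narrowing as the sample grows and widening as the confidence level rises, while the point estimate stays the same.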
Confidence interval and precision
As sample size increases, the precision of the estimates increases:
- Sample of 25: RR = 4 (2.2 - 16)
- Sample of 85: RR = 4.5 (3.5 - 7)
High data variability → large CI
High confidence level → large CI
CIs and statistical significance
- If the null hypothesis (RR = 1) is included within the CI → not significant. Example: RR = 2 (0.8 - 4); the interval covers 0.8, 0.9, 1, 2, 3, 4.
- If the null hypothesis (RR = 1) is not included within the CI → significant. Example: RR = 2 (1.8 - 6); the interval covers 1.8, 1.9, 2, 3, 4, 5, 6.
where significance level = (1 - confidence level)
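The rule above amounts to a one-line check. The sketch below applies it to the two intervals quoted on the slide; the function name is an illustrative choice.

```python
def ci_excludes_null(lower, upper, null_value=1.0):
    """True when the null value lies outside the interval (significant)."""
    return not (lower <= null_value <= upper)


# The two intervals quoted on the slide, both with RR = 2.
for lower, upper in [(0.8, 4), (1.8, 6)]:
    verdict = "significant" if ci_excludes_null(lower, upper) else "not significant"
    print(f"RR = 2 ({lower} - {upper}): {verdict}")
```

For ratio measures such as RR and OR the null value is 1; for a difference in means or risks it would be 0, which is why `null_value` is a parameter.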
Example: Chlordiazepoxide use and risk of congenital heart disease
- Cases: 4 with Chlo. use, 386 with no Chlo. use
- Controls: 4 with Chlo. use, 1250 with no Chlo. use
OR = (4 x 1250) / (4 x 386) = 3.2
p = 0.08 (not significant)
From Rothman K
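The odds ratio in this example can be recomputed, and a confidence interval added, in a few lines. The sketch below takes the exposed-control count of 4 from the slide's OR formula and assumes a Woolf (log-based) 95% interval; that method is an assumption made here, not something the slide states, and with cells as small as 4 it is only a rough approximation.

```python
import math
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, confidence=0.95):
    """Odds ratio (a*d)/(b*c) with a Woolf (log-based) confidence interval.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from the slide: 4 and 386 among cases, 4 and 1250 among controls.
odds_ratio, lower, upper = odds_ratio_ci(4, 386, 4, 1250)
print(f"OR = {odds_ratio:.1f}, 95% CI ({lower:.1f} ; {upper:.1f})")
```

The resulting interval is very wide and includes 1, so the data are compatible both with no effect and with a large increase in risk.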
So chlordiazepoxide use is safe? ”The confidence interval includes 1 so the association is not significant”
Summary
- Problem
- Association: RR = 5
- Null hypothesis: RR = 1
- Test of significance: P = 0.03
- Confidence interval: RR = 3 (2 - 8)
Conclusions
- Not all associations are found to be statistically significant.
- Significance testing evaluates only the role of chance as an alternative explanation of an observed difference or effect.
- Confidence intervals are more informative than p-values.
Recommendations
- Finding the proper test is an important step.
- Use confidence intervals to describe your results! (More than one dimension)
- Report p-values precisely! Say “0.002”, not just “less than 0.05”.
- Revise and check your results.
- Interpret with caution associations that achieve statistical significance!
- Always look at the raw data (2x2 table). How many cases can be explained by the exposure?
Suggested reading
- KJ Rothman, S Greenland, TL Lash. Modern Epidemiology. Lippincott Williams & Wilkins, Philadelphia, PA, 2008.
- SN Goodman, R Royall. Evidence and Scientific Research. AJPH 78, 1568, 1988.
- SN Goodman. Toward Evidence-Based Medical Statistics. 1: The P Value Fallacy. Ann Intern Med. 130, 995, 1999.
- C Poole. Low P-Values or Narrow Confidence Intervals: Which are more Durable? Epidemiology 12, 291, 2001.