Assessing the Evidence: Statistical Inference Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research
Objectives of Session Understand idea of inference. Confidence interval approach. Significance testing. Briefly, some simple tests
Statistical Inference The aim is to draw conclusions (INFER) from the specific (sample) to the more general (population). Are differences between groups chance occurrences or do they represent statistically significant results (i.e. real differences)?
Extrapolating from the sample to population Illustrations Ian Christie, Orthopaedic & Trauma Surgery, Copyright 2002 University of Dundee
Two approaches: confidence intervals and hypothesis testing Confidence Intervals Random variability means that there is statistical (random) variation around any summary statistic: Mean, proportion, difference between means, etc
Confidence intervals Uncertainty expressed as a Confidence Interval: Summary statistic ± constant × standard error, e.g. for a 95% CI the constant = 1.96, from the Normal distribution
Confidence intervals For a percentage the standard error is given by: $se = \sqrt{p(100 - p)/n}$ So for p = 35%, se = 4.8%, where n = 100
Confidence intervals Consider a prevalence of 35% for the uptake of statins for secondary prevention of MI from one practice: p = 35%, 95% CI = 35% ± 1.96 × 4.8% = 25.6% to 44.4%
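The percentage calculation above can be sketched in a few lines of Python (standard library only). The function name `proportion_ci` is illustrative and not part of the course material; it assumes the Normal approximation with the constant 1.96 quoted earlier.

```python
import math

def proportion_ci(p_percent, n, z=1.96):
    """Return (se, lower, upper) for a percentage, all on the percent scale.

    Assumes the Normal approximation: se = sqrt(p * (100 - p) / n).
    """
    se = math.sqrt(p_percent * (100 - p_percent) / n)
    return se, p_percent - z * se, p_percent + z * se

# Statin uptake example: p = 35%, n = 100
se, low, high = proportion_ci(35, 100)
print(f"se = {se:.2f}%, 95% CI = {low:.1f}% to {high:.1f}%")
```

Working from the unrounded standard error (about 4.77%) gives limits a fraction narrower than the 25.6% to 44.4% quoted above, which rounds the standard error to 4.8% before multiplying by 1.96.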
Confidence intervals For a mean the standard error is given by: $se = s/\sqrt{n}$ where s is the standard deviation of the distribution and n is the sample size
Confidence intervals Consider a mean cholesterol measurement of 5.4 mmol/l for a group of 100 patients with type 2 diabetes and standard deviation s = 1.1 mmol/l: $\bar{x}$ = 5.4, 95% CI = 5.4 ± 1.96 × 1.1/√100 = 5.2 to 5.6 mmol/l
Confidence intervals Confidence intervals give an estimate of the precision of a summary statistic: a narrow interval is precise, a wide interval is imprecise. Major determinant of precision is sample size
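To see how sample size drives precision, the cholesterol example can be recomputed for different group sizes. This is a minimal sketch: `mean_ci` is an illustrative name, and the sample sizes 25 and 400 are hypothetical (only n = 100 comes from the example above).

```python
import math

def mean_ci(mean, sd, n, z=1.96):
    """95% CI for a mean using se = sd / sqrt(n) (Normal approximation)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

# Same mean (5.4 mmol/l) and SD (1.1 mmol/l), different sample sizes
for n in (25, 100, 400):
    low, high = mean_ci(5.4, 1.1, n)
    print(f"n = {n:3d}: 95% CI = {low:.2f} to {high:.2f} (width {high - low:.2f})")
```

Because the standard error depends on √n, quadrupling the sample size halves the width of the interval.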
Confidence intervals Warning! Confidence intervals are usually interpreted in a Bayesian way even though a frequentist method was used to estimate them. The probability of the true value lying within the confidence interval is NOT, repeat NOT, 95%. Bayesian confidence intervals are called CREDIBLE INTERVALS, and the probability of the true value lying in the credible interval IS 95%
Frequentist Confidence Interval means that with repeat samples… A 95% confidence interval means that about 1 in 20 of the intervals constructed from repeat samples would fail to include the true proportion. [Figure: study sample with its 95% CI, followed by repeat samples … ad infinitum]
To put it another way the 95% Confidence Interval is… …one of many that could be constructed, with the assurance that 95% of the time the true value of the parameter would be included
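The repeat-sampling idea can be illustrated with a small simulation. This is a sketch under stated assumptions, not part of the original material: the true proportion (0.35), sample size (100), number of repeats and random seed are all hypothetical choices, and the interval is the simple Normal-approximation interval used above.

```python
import math
import random

def wald_ci(successes, n, z=1.96):
    """Normal-approximation 95% CI for a proportion (on the 0-1 scale)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

def coverage(true_p=0.35, n=100, repeats=2000, seed=1):
    """Fraction of repeat-sample intervals that include the true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        # Draw one sample of n patients, each 'positive' with probability true_p
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_ci(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / repeats

print(f"Proportion of intervals containing the true value: {coverage():.3f}")
```

The printed fraction should be close to, though not exactly, 0.95: the statement is about the long-run behaviour of the procedure, not about any single interval.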
Statistical Inference: Hypothesis testing Are differences between groups chance occurrences or do they represent statistically significant results (i.e. real differences)? The process of inference starts from a neutral position: the Null Hypothesis
Statistical Inference: Hypothesis testing The null hypothesis (H0) is usually set to ‘there is no difference’. Collect data and carry out hypothesis tests. Reject, or fail to reject, the null hypothesis
Legal analogy: hypothesis testing Legal trial: defendant assumed innocent until proved guilty. Hypothesis test: null hypothesis assumes no difference between groups
Legal analogy: hypothesis testing Legal trial: examine evidence. Hypothesis test: calculate test statistic based on evidence from sample data
Legal analogy: hypothesis testing Legal trial: 1. Accept evidence proves guilt 2. Evidence does not prove guilt (‘not proven’). Hypothesis test: 1. Accept significant difference between groups 2. Insufficient evidence to reject H0
Legal analogy: hypothesis testing No statistical significance is not the same as no difference
Statistical Inference: Hypothesis testing The test statistic generally consists of: (Summary statistic − H0 value) / Standard error of summary, e.g. test that the mean is different from zero: t = (Mean − 0) / Se(Mean)
Statistical Inference: Hypothesis testing The test statistic is then compared with tabulated values of a distribution (e.g. Normal distribution, t-distribution). Assuming the null hypothesis is true, what is the probability of obtaining the actual observed value of the test statistic, t? How likely is the value of t to have occurred by chance alone?
Statistical Inference: Hypothesis testing Assuming the null hypothesis is true, what is the probability of obtaining the actual observed or greater value of the test statistic, t? Using the distribution of t, which is similar to a Normal distribution, this probability can be obtained in figure as p = %
Statistical Inference: Hypothesis testing If probability of the occurrence of the observed value < 5% or p < 0.05 then this is unlikely to be a chance finding Result is declared statistically significant Fortunately most statistical software (e.g. SPSS) will carry out the test you request and give p-values (SPSS labels as ‘Sig’)
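The general recipe (summary statistic minus H0 value, divided by the standard error, then converted to a p-value) can be sketched as follows. Two assumptions are made here that are not in the original: the Normal distribution is used in place of the t-distribution (reasonable only for large samples), and the null value of 5.0 mmol/l is a hypothetical figure chosen for illustration. The mean 5.4 and standard error 0.11 come from the cholesterol example.

```python
from statistics import NormalDist

def z_test(estimate, null_value, se):
    """Return (z, two-sided p) for H0: true value == null_value.

    Uses the Normal approximation to the t-distribution.
    """
    z = (estimate - null_value) / se
    # Probability of a value at least as extreme as |z| in either tail
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Cholesterol example: mean 5.4, se = 1.1 / sqrt(100) = 0.11
# H0 value of 5.0 is hypothetical
z, p = z_test(5.4, 5.0, 0.11)
verdict = "statistically significant" if p < 0.05 else "not statistically significant"
print(f"z = {z:.2f}, p = {p:.4f}: {verdict} at the 5% level")
```

Statistical software performs the same steps but uses the exact t-distribution with the appropriate degrees of freedom.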
Two group hypothesis testing We will consider three common tests: 1. t-test for difference between two means 2. Chi-squared test (χ²) for difference between two proportions 3. Logrank test for difference between two groups' median survival. All are easily carried out in SPSS
Are practices with access to community hospitals further away on average from general hospitals?
No access to CH: n = 17, Mean = 8.68 km, SD = km, Se(mean) = 2.89
Access to CH: n = 10, Mean = km, SD = 5.68 km, Se(mean) = 1.79
Example t-test $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{1/N_1 + 1/N_2}}$ where 1 and 2 refer to the two groups, N is the number in each group, X bar refers to the mean and $s_p$ is the pooled standard deviation
Example t-test t = / = With 25 degrees of freedom (17 + 10 − 2), from t-tables p = and so the difference of is highly statistically significant
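A minimal sketch of the pooled two-sample t statistic, computed from summary statistics. The function name `pooled_t` is illustrative, and the means and standard deviations below are hypothetical: they are not the values from the community hospital example. Only the group sizes (17 and 10) are taken from it. The critical value 2.060 is the standard two-sided 5% point of the t-distribution with 25 degrees of freedom.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, df) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var) * math.sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se_diff, df

# Hypothetical summary statistics (NOT the values from the example above)
t, df = pooled_t(17, 10.0, 4.0, 10, 6.0, 3.0)

T_CRIT_25 = 2.060  # two-sided 5% critical value for 25 degrees of freedom
print(f"t = {t:.2f} with {df} degrees of freedom")
print("significant at 5%" if abs(t) > T_CRIT_25 else "not significant at 5%")
```

Comparing |t| with a tabulated critical value mirrors the use of t-tables; software would report the exact p-value instead.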
Consider a recent RCT Rimonabant vs. placebo to reduce body weight in obese people (BMI > 30 kg/m²). Rimonabant (20 mg daily) inhibits the effects of cannabinoid agonists, which in turn affects energy balance. Mean reduction in body weight at one year was 6.6 kg vs. 1.8 kg (rimonabant vs. placebo). Difference was 4.7 kg (95% CI 4.1, 5.4). By end of year 2 mean weight was back to start!
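A reported 95% confidence interval carries enough information to recover an approximate standard error and p-value. The sketch below applies this to the reported difference of 4.7 kg (95% CI 4.1, 5.4). It assumes the interval was built as estimate ± 1.96 × se on a Normal scale, which the trial report may not have done exactly; the function name `p_from_ci` is illustrative.

```python
import math

def p_from_ci(estimate, low, high, z_crit=1.96):
    """Approximate (se, z, two-sided p) for H0: difference = 0 from a 95% CI."""
    se = (high - low) / (2 * z_crit)  # CI width is 2 * 1.96 * se
    z = estimate / se
    # erfc keeps precision for very small p-values
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

se, z, p = p_from_ci(4.7, 4.1, 5.4)
print(f"se = {se:.2f} kg, z = {z:.1f}, p = {p:.1e}")
```

An interval that lies well away from zero, as here, corresponds to a very small p-value; an interval that includes zero corresponds to p > 0.05.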
Are practices with access to community hospitals more likely to have training status?
                      Community hospital: No   Community hospital: Yes
No training status    12 (71%)                 4 (40%)
Training status       5 (29%)                  6 (60%)
Are practices with access to community hospitals more likely to have training status? Is the difference in proportions, 60% − 29% = 31%, well within the realms of chance or a statistically significant finding? Null hypothesis: Difference = 0. Use chi-squared (χ²) test for significance of difference
Pearson Chi-Squared Test
                      Comm. Hosp.: No   Comm. Hosp.: Yes   Total
No training status    a                 b                  a+b
Training status       c                 d                  c+d
Total                 a+c               b+d                N
Pearson Chi-Squared Test $\chi^2 = \dfrac{N(|ad - bc|)^2}{(a+b)(c+d)(a+c)(b+d)}$ where N = a+b+c+d and |ad − bc| means take the positive value of the calculation
Pearson Chi-Squared Test χ² = 2.44 with 1 degree of freedom, where df = (no. rows − 1) × (no. columns − 1). P = which is not statistically significant
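The chi-squared calculation for the training status table can be checked with a short Python sketch. It uses the uncorrected Pearson statistic for a 2×2 table, which reproduces the 2.44 quoted above. The p-value uses the fact that, with 1 degree of freedom, chi-squared is the square of a standard Normal variable; the function name `chi_squared_2x2` is illustrative.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction) and p-value, 1 df."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = numerator / denominator
    # For 1 df: P(chi-squared > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# a = 12, b = 4 (no training status); c = 5, d = 6 (training status)
chi2, p = chi_squared_2x2(12, 4, 5, 6)
verdict = "statistically significant" if p < 0.05 else "not statistically significant"
print(f"chi-squared = {chi2:.2f}, p = {p:.3f}: {verdict} at the 5% level")
```

The resulting p-value is above 0.05, in line with the conclusion that the 31% difference is not statistically significant in a sample of only 27 practices.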
More complicated analyses Introduced simple two-group tests. Results of more complicated analyses are expressed in the same way: summary statistic and 95% confidence interval. Usually p-value is also stated but often implicit from the confidence interval. Beware spurious significance e.g. p = (3 d.p. are enough). ‘Importance’ refers to size of difference. An ‘important’ result can be statistically non-significant
Sacred 5% level 5% level is arbitrary. Practical choice before computer era to make tables easier to construct. Are p = and p = different? In past researchers tended to only present p-values. Now emphasis is on size of effect and 95% CI. Unfortunately, Editors still influenced by p-values, leading to publication bias
Summary Do not get carried away by p-values. Interpretation requires knowledge of area to put into context, but also understanding of what the tests do. A p-value close to 5% is approaching significance and may suggest it is worth investigating. The size of the effect may be more clinically or scientifically important