Statistical Inference for more than two groups Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.


1 Statistical Inference for more than two groups Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research

2 Tests to be covered: Chi-squared test; One-way ANOVA; Logrank test

3 Significance testing: general overview 1. Define the null and alternative hypotheses for the study. 2. Acquire data. 3. Calculate the value of the test statistic. 4. Compare the value of the test statistic to values from a known probability distribution. 5. Interpret the p-value and draw a conclusion.

4

5 Categorical data > 2 groups Unordered categories (nominal): Chi-squared test for association. Ordered categories (ordinal): Chi-squared test for trend.

6 Example Does the proportion of mothers developing pre-eclampsia vary by parity (birth order)?

7 Contingency table (r x c)
Pre-eclampsia   Birth order
                1st            2nd           3rd          4th
No              1170 (79.4%)   278 (84.8%)   83 (86.5%)   86 (92.5%)
Yes             304 (20.6%)    50 (15.2%)    13 (13.5%)   7 (7.5%)

8 Null hypotheses 1. No association between pre-eclampsia and birth order. 2. No trend in pre-eclampsia with parity.

9 Test of association Test of linear trend
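A minimal sketch of the test of association, computed directly from the counts in the slide 7 table rather than in SPSS. The function names are illustrative, and the closed-form tail probability used here is valid only for exactly 3 degrees of freedom.

```python
import math

# Observed counts from the pre-eclampsia table (columns: 1st to 4th birth).
observed = [
    [1170, 278, 83, 86],  # no pre-eclampsia
    [304, 50, 13, 7],     # pre-eclampsia
]


def chi_squared_association(table):
    """Return (Pearson chi-squared statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Expected count under the null hypothesis of no association.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def chi2_upper_tail_3df(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stat, df = chi_squared_association(observed)
p_value = chi2_upper_tail_3df(stat)
print(f"chi-squared = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

Run as written, the statistic should come out at about 15.42 on 3 df, in line with the value quoted in the conclusions slide.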

10 Conclusions 1. Statistically significant association between pre-eclampsia and birth order (χ² = 15.42, p = 0.001). 2. Significant linear trend in incidence of pre-eclampsia with parity (χ² = 15.03, p < 0.001). 3. Note 3 degrees of freedom for the association test and 1 df for the test for trend.
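The slides do not say which trend statistic SPSS reported. The sketch below assumes the linear-by-linear association form, (N - 1) r², with equally spaced scores 1 to 4 for birth order and 0/1 for pre-eclampsia; both the scoring and the function name are assumptions made for illustration.

```python
import math

# Counts from the pre-eclampsia table, by birth order (1st to 4th).
birth_order_scores = [1, 2, 3, 4]  # assumed equally spaced scores
no_counts = [1170, 278, 83, 86]
yes_counts = [304, 50, 13, 7]


def linear_by_linear(scores, no_counts, yes_counts):
    """Return (N - 1) * r^2, where r is the correlation between score and outcome."""
    n = sum(no_counts) + sum(yes_counts)
    col_totals = [a + b for a, b in zip(no_counts, yes_counts)]
    sum_x = sum(s * c for s, c in zip(scores, col_totals))
    sum_x2 = sum(s * s * c for s, c in zip(scores, col_totals))
    sum_y = sum(yes_counts)  # outcome coded 1 = yes, 0 = no, so sum(y) == sum(y^2)
    sum_xy = sum(s * c for s, c in zip(scores, yes_counts))
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    return (n - 1) * sxy ** 2 / (sxx * syy)


trend_stat = linear_by_linear(birth_order_scores, no_counts, yes_counts)
# Upper-tail probability of a chi-squared variable with 1 df.
trend_p = math.erfc(math.sqrt(trend_stat / 2))
print(f"trend chi-squared = {trend_stat:.2f}, df = 1, p = {trend_p:.5f}")
```

Under these assumptions the statistic comes out at about 15.03, which agrees with the value quoted for the test for trend.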

11 Contingency table (r x c)
Pre-eclampsia   Birth order
                1st            2nd           3rd          4th
No              1170 (79.4%)   278 (84.8%)   83 (86.5%)   86 (92.5%)
Yes             304 (20.6%)    50 (15.2%)    13 (13.5%)   7 (7.5%)

12 Contingency tables (r x c) 1. Tables can be any size. For example, SIMD deciles by parity would be a 10 x 4 table. 2. But with very large tables, tests of association are difficult to interpret. 3. Crosstabulations in SPSS can give odds ratios as an option when the row or column variable has two categories.

13 Numerical data > 2 groups Compare means from several groups. Single global test of difference in means. Also test for linear trend. 1-way analysis of variance (ANOVA).

14 Extend t-test to >2 groups, i.e. Analysis of Variance (ANOVA). Consider scores for contribution to energy intake from fat, milk and alcohol groups. Does the mean score differ across the three categories of intake groups? Koh ET, Owen WL. Introduction to Nutrition and Health Research. Kluwer, Boston, 2000.

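15 One-Way ANOVA of scores Contributor to energy intake: Fat, Milk, Alcohol. Fat: n = 6, mean = 4.22. The other two groups (Milk, Alcohol): n = 6 each, with means 0.167 and 2.01 (group-to-mean pairing lost in the transcript).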

16 One-Way ANOVA of Scores The null hypothesis (H0) is ‘there are no differences in mean score across the three groups’. Use SPSS One-Way ANOVA to carry out this test.

17 Assumptions of 1-Way ANOVA 1. Standard deviations are similar. 2. Test variable (scores) is approximately Normally distributed. If assumptions are not met, use the non-parametric equivalent, the Kruskal-Wallis test.
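As a rough illustration of the non-parametric alternative, the sketch below computes the Kruskal-Wallis H statistic from ranks. The data are hypothetical, the function name is illustrative, and no tie correction is applied (tied values simply share their average rank).

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    pooled = sorted(value for group in groups for value in group)
    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n_total = len(pooled)
    rank_term = sum(
        sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups
    )
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)


# Hypothetical scores for three groups (not the presentation's data).
example_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(example_groups)
print(f"H = {h_stat:.2f} on {len(example_groups) - 1} df")
```

H is referred to a chi-squared distribution with (number of groups - 1) degrees of freedom; with very small samples that approximation is rough.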

18 Results of ANOVA ANOVA partitions variation into within-group and between-group components. This results in an F-statistic, which is compared with values in F-tables: F = 108.6, with 2 and 15 df, p < 0.001.
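The partition into between-group and within-group variation can be sketched as follows. The raw scores are not given in the slides, so the values below are hypothetical and are not expected to reproduce F = 108.6; only the degrees of freedom (2 and 15, from three groups of six) carry over.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores, six per group (illustrative only).
fat = [4.1, 4.5, 3.9, 4.4, 4.0, 4.3]
milk = [2.2, 1.8, 2.1, 1.9, 2.3, 1.7]
alcohol = [0.1, 0.3, 0.0, 0.2, 0.4, 0.1]

f_stat, df_b, df_w = one_way_anova([fat, milk, alcohol])
print(f"F = {f_stat:.1f} with {df_b} and {df_w} df")
```

The F-statistic is then compared with the F distribution on (df_between, df_within) degrees of freedom to obtain the p-value.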

19 Results of ANOVA The groups differ significantly, and it is clear the Fat group contributes most to energy score, with a mean of 4.22. Further pairwise comparisons can be made (3 possible) using a multiple comparisons test, e.g. Bonferroni.
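A small sketch of why there are 3 possible comparisons and what a Bonferroni adjustment does to them. The unadjusted p-values are hypothetical placeholders, and the 0.05 significance level is an assumed convention, not something stated in the slides.

```python
from itertools import combinations

groups = ["Fat", "Milk", "Alcohol"]
pairs = list(combinations(groups, 2))  # every unordered pair of groups


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical unadjusted p-values, one per pair (illustrative only).
raw_p = [0.01, 0.04, 0.5]
adjusted_p = bonferroni_adjust(raw_p)

alpha = 0.05  # assumed overall significance level
for pair, p_adj in zip(pairs, adjusted_p):
    verdict = "significant" if p_adj < alpha else "not significant"
    print(pair, round(p_adj, 3), verdict)
```

With k groups there are k(k - 1)/2 pairs, so the adjustment becomes more severe as the number of groups grows.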

20 Example 2 Does income vary by highest level of education achieved?

21 Null hypothesis and alternative H0: no difference in mean income by education level achieved. H1: mean income varies with education level achieved.

22 Assumptions of 1-Way ANOVA 1. Standard deviations or variances are similar. 2. Test variable (income) is approximately Normally distributed. If assumptions are not met, use the non-parametric equivalent, the Kruskal-Wallis test.

23

24

25 Table of Mean income for each level of educational achievement

26

27

28

29 Analysis of Variance Table F-test gives P < 0.001, showing a significant difference in mean income between levels of education.

30 Table of each pairwise comparison. Note lower income for ‘did not complete school’ compared with all other groups. All p-values are adjusted for multiple comparisons.

31 Summary of ANOVA ANOVA is useful when there are a number of groups with a continuous summary in each. SPSS does all pairwise group comparisons adjusted for multiple testing. Note that ANOVA is just a form of linear regression (see later).

32 Extending Kaplan-Meier and logrank test in SPSS You need to specify: Survival time: time from surgery (tfsurg). Status: Dead = 1, censored = 0 (dead). Factor: Duke’s stage at baseline (A, B, C, D, Unknown). Select Compare Factor and logrank. Optionally select plot of survival.

33 Implementing Logrank test in SPSS

34 Select Options to obtain plot and median survival. Select Compare Factor to obtain logrank test. Select linear trend for this test.

35 Overall Comparisons
                          Chi-Square   df   Sig.
Log Rank (Mantel-Cox)     80.534       1    .000
The vector of trend weights is -2, -1, 0, 1, 2. This is the default. The test for trend in survival across Duke’s stage is highly significant (p < 0.001).
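To show how trend weights such as -2, -1, 0, 1, 2 enter the calculation, here is a minimal sketch of a logrank test for trend on 1 df. It follows the standard observed-minus-expected construction and is not claimed to match SPSS's internal implementation; the function name and the data are hypothetical.

```python
def logrank_trend(times, events, groups, weights):
    """Chi-squared statistic (1 df) for a trend in survival across ordered groups.

    times:   follow-up time per subject
    events:  1 = died, 0 = censored
    groups:  group index per subject (0 .. len(weights) - 1)
    weights: trend weight for each group
    """
    k = len(weights)
    score = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [0] * k
        deaths = [0] * k
        for time, event, g in zip(times, events, groups):
            if time >= t:
                at_risk[g] += 1
                if time == t and event == 1:
                    deaths[g] += 1
        n = sum(at_risk)
        d = sum(deaths)
        # Weighted sum of observed minus expected deaths at this event time.
        for j in range(k):
            score += weights[j] * (deaths[j] - d * at_risk[j] / n)
        if n > 1:
            # Hypergeometric variance of the weighted sum at this event time.
            mean_w = sum(w * a for w, a in zip(weights, at_risk)) / n
            mean_w2 = sum(w * w * a for w, a in zip(weights, at_risk)) / n
            variance += d * (n - d) / (n - 1) * (mean_w2 - mean_w ** 2)
    return score ** 2 / variance


# Hypothetical survival times in days for five ordered groups (two subjects each).
times = [900, 1000, 700, 800, 500, 600, 200, 300, 250, 400]
events = [0, 1, 1, 0, 1, 1, 1, 1, 1, 0]
groups = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
trend_chi2 = logrank_trend(times, events, groups, weights=[-2, -1, 0, 1, 2])
print(f"logrank trend chi-squared = {trend_chi2:.2f} on 1 df")
```

The sketch assumes at least two groups contain subjects at risk at some event time; otherwise the variance is zero and the statistic is undefined.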

36 Interpret SPSS output Note the logrank statistic, degrees of freedom and statistical significance (p-value). Note in which direction survival is worst or best and back up visual information from the Kaplan-Meier plot with median survival and 95% confidence intervals from the output. Finally, interpret the results!

37 Interpret test result in relation to median survival
Duke’s stage   Median survival (days)   Mean survival (days)
A              2770                     1978
B              1749                     1866
C              1120                     1304
D              375                      646
Unknown        581                      1297
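For readers who want to see where a median survival figure comes from, the sketch below builds a Kaplan-Meier curve and reads off the first time at which estimated survival falls to 0.5 or below. The data and function names are hypothetical, and confidence intervals are left out.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for time in times if time >= t)
        deaths = sum(1 for time, e in zip(times, events) if time == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up times in days; event 1 = died, 0 = censored.
example_times = [1, 2, 3, 4, 5]
example_events = [1, 1, 0, 1, 1]
km_curve = kaplan_meier(example_times, example_events)
km_median = median_survival(km_curve)
print(km_curve, km_median)
```

If the curve never drops to 0.5 (for example with heavy censoring), the median is not estimable and the function returns None.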

38 Output from Kaplan-Meier in SPSS Note that SPSS gives three possible tests: Logrank, Tarone-Ware and Breslow. In general, logrank gives greater weight to later events compared to the other two tests. If all are similar, quote the logrank test. If results differ, quote more than one test result.

39 Editing SPSS output Note that everything in the SPSS output window can be copied and pasted into Word and PowerPoint. Double-clicking on plots also allows editing of the plot, such as changing axes, colours, fonts, etc.

40 Diabetic patients LDL data Try carrying out extended Crosstabulations and ANOVA where appropriate in the LDL data, e.g. APOE genotype.

41 Colorectal cancer patients: survival following surgery Try carrying out Kaplan-Meier plots and logrank tests for other factors such as WHO Functional Performance, smoking, etc.

42 Extending tests to more than 2 groups: Summary Define H0 and H1. Choose the appropriate test according to type of variables. Interpret output carefully.

43

