Statistical Inference for more than two groups Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.

Tests to be covered
Chi-squared test
One-way ANOVA
Logrank test

Significance testing – general overview
1. Define the null and alternative hypotheses under study
2. Acquire data
3. Calculate the value of the test statistic
4. Compare the value of the test statistic to values from a known probability distribution
5. Interpret the p-value and draw a conclusion

Categorical data > 2 groups
Unordered categories (nominal): chi-squared test for association
Ordered categories (ordinal): chi-squared test for trend

Example Does the proportion of mothers developing pre-eclampsia vary by parity (birth order)?

Contingency table (r x c): pre-eclampsia by birth order

Pre-eclampsia   1st            2nd           3rd          4th
No              1170 (79.4%)   278 (84.8%)   83 (86.5%)   86 (92.5%)
Yes             304 (20.6%)    50 (15.2%)    13 (13.5%)   7 (7.5%)

Null Hypotheses
1. Null hypothesis: No association between pre-eclampsia and birth order
2. Null hypothesis: There is no trend in pre-eclampsia with parity

Test of association Test of linear trend

Conclusions
1. Strong evidence of an association between pre-eclampsia and birth order (χ² = 15.42, p = 0.001)
2. Significant linear trend in incidence of pre-eclampsia with parity (χ² = 15.03, p < 0.001)
3. Note 3 degrees of freedom for the association test and 1 df for the test for trend
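As a rough check on the two statistics quoted in the conclusions, the sketch below recomputes them from the counts in the contingency table using only the Python standard library. It assumes the trend test is the linear-by-linear association statistic, (N - 1) r², with birth order scored 1 to 4. That choice of statistic and scoring is an assumption, not something the slides state. The function names are illustrative.

```python
import math

# Counts from the contingency table (birth order 1st to 4th).
no_pe = [1170, 278, 83, 86]
yes_pe = [304, 50, 13, 7]


def chi2_association(rows):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_tot = [sum(r) for r in rows]
    col_tot = [sum(c) for c in zip(*rows)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(col_tot) - 1)
    return stat, df


def chi2_linear_trend(yes, no, scores):
    """Linear-by-linear association statistic, (N - 1) * r^2, on 1 df."""
    n_col = [y + x for y, x in zip(yes, no)]
    n = sum(n_col)
    sx = sum(s * c for s, c in zip(scores, n_col))
    sxx = sum(s * s * c for s, c in zip(scores, n_col)) - sx ** 2 / n
    sy = sum(yes)
    syy = sy - sy ** 2 / n  # outcome is coded 0/1, so sum(y^2) equals sum(y)
    sxy = sum(s * y for s, y in zip(scores, yes)) - sx * sy / n
    return (n - 1) * sxy ** 2 / (sxx * syy)


assoc, df = chi2_association([no_pe, yes_pe])
trend = chi2_linear_trend(yes_pe, no_pe, scores=[1, 2, 3, 4])
# Upper-tail p-value of a chi-squared variable; this closed form holds for 1 df only.
p_trend = math.erfc(math.sqrt(trend / 2))
print(round(assoc, 2), df, round(trend, 2), p_trend)
```

Under these assumptions the association statistic has (2 - 1) x (4 - 1) = 3 degrees of freedom and the trend statistic has 1, as noted in point 3.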

Contingency Tables (r x c)
1. Tables can be any size. For example, SIMD deciles by parity would be a 10 x 4 table
2. But with very large tables, tests of association are difficult to interpret
3. Crosstabulations in SPSS can give odds ratios as an option when the row or column variable has two categories

Numerical data > 2 groups
Compare means from several groups
Single global test of difference in means
Also test for linear trend
1-way analysis of variance (ANOVA)

Extend t-test to >2 groups, i.e. Analysis of Variance (ANOVA)
Consider scores for contribution to energy intake from the fat, milk and alcohol groups
Does the mean score differ across the three categories of intake?
Koh ET, Owen WL. Introduction to Nutrition and Health Research. Kluwer, Boston, 2000

One-Way ANOVA of scores: contributor to energy intake
Three groups (Fat, Milk, Alcohol), n = 6 in each
Mean scores: 4.22 (Fat), with 2.01 and 0.167 in the other two groups

One-Way ANOVA of Scores
The null hypothesis (H0) is ‘there are no differences in mean score across the three groups’
Use SPSS One-Way ANOVA to carry out this test

Assumptions of 1-Way ANOVA
1. Standard deviations are similar
2. Test variable (scores) is approximately Normally distributed
If assumptions are not met, use the non-parametric equivalent, the Kruskal-Wallis test
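For orientation, the Kruskal-Wallis statistic replaces the observations by their ranks in the pooled sample. The sketch below is a minimal standard-library version using midranks for tied values and omitting the tie correction that statistical packages usually apply. The data are hypothetical and are not the energy-intake scores.

```python
def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic (no tie correction) and its degrees of freedom."""
    pooled = sorted(x for g in groups for x in g)
    # Assign each distinct value the average of the ranks it occupies (midrank).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j
    n = len(pooled)
    rank_term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    return h, len(groups) - 1


# Hypothetical scores for three groups (illustration only).
h_stat, h_df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(h_stat, h_df)
```

H is referred to a chi-squared distribution with (number of groups - 1) degrees of freedom, which is an approximation that works best when the groups are not very small.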

Results of ANOVA
ANOVA partitions variation into within-group and between-group components
Results in an F-statistic, which is compared with values in F-tables
F = 108.6, with 2 and 15 df, p < 0.001
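The partition into between-group and within-group variation can be sketched in a few lines of Python. The raw scores behind F = 108.6 are not given in the slides, so the data below are hypothetical. They are chosen only to have three groups of six, which gives the same 2 and 15 degrees of freedom.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three groups of six observations (not the study data).
example = [
    [1, 2, 3, 4, 5, 6],
    [3, 4, 5, 6, 7, 8],
    [5, 6, 7, 8, 9, 10],
]
f_stat, df_b, df_w = one_way_anova(example)
print(f_stat, df_b, df_w)
```

The F-statistic is the ratio of the between-group mean square to the within-group mean square, so a large value indicates that the group means differ by more than the within-group scatter would explain.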

Results of ANOVA
The groups differ significantly, and the Fat group clearly contributes most to the energy score, with a mean of 4.22
Further pairwise comparisons can be made (3 possible) using a multiple comparisons test, e.g. Bonferroni
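With three groups there are 3 x 2 / 2 = 3 possible pairs, and the Bonferroni method multiplies each unadjusted p-value by the number of comparisons, capping the result at 1. A minimal sketch follows. The unadjusted p-values are hypothetical placeholders, not results from the study.

```python
from itertools import combinations

groups = ["Fat", "Milk", "Alcohol"]
pairs = list(combinations(groups, 2))  # all unordered pairs of groups
n_comparisons = len(pairs)

# Hypothetical unadjusted p-values for each pairwise comparison (illustration only).
raw_p = {
    ("Fat", "Milk"): 0.004,
    ("Fat", "Alcohol"): 0.0002,
    ("Milk", "Alcohol"): 0.03,
}

# Bonferroni adjustment: multiply by the number of comparisons, cap at 1.
adjusted_p = {pair: min(1.0, p * n_comparisons) for pair, p in raw_p.items()}

for pair in pairs:
    print(pair, raw_p[pair], adjusted_p[pair])
```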

Example 2 Does income vary by highest level of education achieved?

Null Hypothesis and alternative
H0: no difference in mean income by education level achieved
H1: mean income varies with education level achieved

Assumptions of 1-Way ANOVA
1. Standard deviations or variances are similar
2. Test variable (income) is approximately Normally distributed
If assumptions are not met, use the non-parametric equivalent, the Kruskal-Wallis test

Table of Mean income for each level of educational achievement

Analysis of Variance Table F-test gives P < showing a significant difference in mean income between levels of education

Table of each pairwise comparison. Note lower income for ‘did not complete school’ compared with all other groups. All p-values are adjusted for multiple comparisons

Summary of ANOVA
ANOVA is useful when there are a number of groups with a continuous summary in each
SPSS does all pairwise group comparisons adjusted for multiple testing
Note that ANOVA is just a form of linear regression – see later

Extending Kaplan-Meier and logrank test in SPSS
You need to specify:
Survival time – time from surgery (tfsurg)
Status – Dead = 1, censored = 0 (dead)
Factor – Dukes’ stage at baseline (A, B, C, D, Unknown)
Select compare factor and logrank
Optionally select plot of survival
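To make the inputs concrete, here is a minimal standard-library sketch of the Kaplan-Meier estimate for a single group. The variable names tfsurg and dead mirror the SPSS variables named above. The values are hypothetical, and the median is taken as the first time the estimated survival falls to 0.5 or below, which is one common convention.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t (deaths and censored).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up for one stage group: days from surgery, 1 = dead, 0 = censored.
tfsurg = [5, 8, 8, 12, 15]
dead = [1, 1, 0, 1, 0]
km_curve = kaplan_meier(tfsurg, dead)
print(km_curve, median_survival(km_curve))
```

Running the same function separately for each stage gives one curve per group, which is what the survival plot displays.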

Implementing Logrank test in SPSS

Select options to obtain plot and median survival
Select Compare Factor to obtain logrank test
Select linear trend for this test

Overall Comparisons (SPSS output table with columns Chi-Square, df and Sig.; row: Log Rank (Mantel-Cox))
The vector of trend weights is -2, -1, 0, 1, 2. This is the default.
The test for trend in survival across Dukes’ stage is highly significant
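The role of the trend weights can be seen in a small sketch of a logrank test for trend. At each death time it compares the weighted observed deaths with their expectation under the null hypothesis, and it gives a chi-squared statistic on 1 df. This is a textbook-style construction written for illustration and may not match SPSS output exactly. The data are hypothetical, with three groups weighted -1, 0, 1. Five ordered stages would use -2, -1, 0, 1, 2 in the same way.

```python
def logrank_trend(times, events, groups, weights):
    """Logrank test for trend: chi-squared statistic on 1 df.

    groups holds a group index (0 .. k-1) per subject; weights holds one trend weight per group.
    """
    k = len(weights)
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    score = 0.0     # sum over death times of weighted (observed - expected) deaths
    variance = 0.0  # variance of that score under the null hypothesis
    for t in death_times:
        at_risk = [0] * k
        deaths = [0] * k
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:
                at_risk[gi] += 1
                if ti == t and ei == 1:
                    deaths[gi] += 1
        n = sum(at_risk)
        d = sum(deaths)
        mean_w = sum(w * r for w, r in zip(weights, at_risk)) / n
        mean_w2 = sum(w * w * r for w, r in zip(weights, at_risk)) / n
        score += sum(w * dj for w, dj in zip(weights, deaths)) - d * mean_w
        if n > 1:
            variance += d * (n - d) / (n - 1) * (mean_w2 - mean_w ** 2)
    return score ** 2 / variance


# Hypothetical data: two subjects per group, all deaths observed.
ex_times = [1, 2, 3, 4, 5, 6]
ex_events = [1, 1, 1, 1, 1, 1]
ex_groups = [0, 0, 1, 1, 2, 2]
trend_stat = logrank_trend(ex_times, ex_events, ex_groups, weights=[-1, 0, 1])
print(trend_stat)
```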

Interpret SPSS output
Note the logrank statistic, degrees of freedom and statistical significance (p-value).
Note in which direction survival is worst or best and back up visual information from the Kaplan-Meier plot with median survival and 95% confidence intervals from the output.
Finally, interpret the results!

Table: median survival (days) and mean survival (days) by Dukes’ stage (A, B, C, D, Unknown)
Interpret test result in relation to median survival

Output from Kaplan-Meier in SPSS
Note that SPSS gives three possible tests: Logrank, Tarone-Ware and Breslow
In general, logrank gives greater weight to later events compared to the other two tests.
If all are similar, quote the logrank test.
If the results differ, quote more than one test result

Editing SPSS output
Note that everything in the SPSS output window can be copied and pasted into Word and PowerPoint.
Double-clicking on plots also allows editing of the plot, such as changing axes, colours, fonts, etc.

Diabetic patients LDL data
Try carrying out extended Crosstabulations and ANOVA where appropriate in the LDL data…
E.g. APOE genotype

Colorectal cancer patients: survival following surgery
Try carrying out Kaplan-Meier plots and logrank tests for other factors such as WHO Functional Performance, smoking, etc…

Extending test to more than 2 groups: Summary
Define H0 and H1
Choose the appropriate test according to type of variables
Interpret output carefully