Université d’Ottawa / University of Ottawa 2001 Bio 4118 Applied Biostatistics L12.1 Lecture 12: Generalized Linear Models (GLM)

1 Lecture 12: Generalized Linear Models (GLM). What are they? When do we use them? The full model. The ANCOVA model. The common regression model. The extra sum of squares principle. Assumptions.

2 What are General(ized) Linear Models? GLMs are models of the form $\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{e}$. Here $\mathbf{Y}$ is a vector of dependent variables, $\mathbf{b}$ is a vector of estimated coefficients, $\mathbf{X}$ is a matrix of independent variables and $\mathbf{e}$ is a vector of error terms. Examples: simple linear regression, multiple regression, analysis of variance (ANOVA), analysis of covariance (ANCOVA) and multivariate models.
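The sketch below shows what fitting $\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{e}$ involves. Ordinary least squares estimates $\mathbf{b}$ by solving the normal equations $(\mathbf{X}^T\mathbf{X})\mathbf{b} = \mathbf{X}^T\mathbf{Y}$. It is plain Python, and the helper `ols` and the toy data are hypothetical. It is meant to illustrate the model form, not to replace a statistics package, which would typically use a numerically safer decomposition.

```python
def ols(X, y):
    """Least-squares estimate of b in y = X b + e via the normal equations.

    X is a list of rows (one per observation); include a column of 1s for
    the intercept. Returns (b, ss_residual).
    """
    p = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    b = [A[i][p] / A[i][i] for i in range(p)]
    fitted = [sum(bj * xj for bj, xj in zip(b, row)) for row in X]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return b, ss_res

# Toy data: intercept column plus one predictor.
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
y = [1, 2, 2, 4]
b, ss = ols(X, y)
print(b, ss)
```

Any model in the list above can be written this way by choosing the columns of X. Use a column of 1s for the intercept, enter continuous covariates as they are, and add dummy (0/1) columns for the levels of a categorical variable.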

3 Some GLM procedures. [Table of GLM procedures.] *Either categorical or treated as a categorical variable.

4 When do we use ANCOVA? We use it to compare the relationship between a dependent variable (Y) and an independent variable ($X_1$) across different levels of one or more categorical variables ($X_2$). An example is the relationship between body mass (Y) and body size ($X_1$) for different taxonomic groups (birds and mammals, $X_2$). [Figure: body mass vs. body size.]

5 When do we use ANCOVA? In doing such comparisons, we assume that the qualitative form of the model is the same for all levels of the categorical variables; otherwise, one is comparing apples and oranges! [Figure: Y vs. $X_1$ for level 1 and level 2 of $X_2$; panels "Qualitatively similar models" and "Qualitatively different models".]

6 When do we use ANCOVA? ANCOVA is used to compare linear models, although ANCOVA-like extensions have been developed for nonlinear models. [Figure: Y vs. $X_1$ for level 1 and level 2 of $X_2$; panels "Linear models" and "Nonlinear models".]

7 The simple regression model. The regression model is $Y_i = \alpha + \beta X_i + \varepsilon_i$. So all simple regression models are described by two parameters: the intercept ($\alpha$) and the slope ($\beta$). [Figure: observed ($Y_i$) and expected values at $X_i$, with residual $\varepsilon_i$; slope $\beta = \Delta Y / \Delta X$; intercept $\alpha$.]

8 Simple GLMs. Two linear models may differ in one of two ways. Both intercepts ($\alpha$) and slopes ($\beta$) may differ. Or the intercepts may differ while the slopes stay the same (ANCOVA model). [Figure panels: "Different $\alpha$ and $\beta$"; "Different $\alpha$, same $\beta$".]

9 Simple GLMs. Two linear models may also differ in slopes ($\beta$) but share the same intercept ($\alpha$). Or they may have the same slopes and intercepts (common regression model). [Figure panels: "Same $\alpha$, different $\beta$"; "Same $\alpha$, same $\beta$".]

10 Fitting GLMs. Fitting proceeds hierarchically, starting with the most complex model. Evaluate the significance of a term by fitting two models: one with the term in, the other with it removed. Then test for the change in model fit ($\Delta$MF) associated with removing that term. If $\Delta$MF is small, delete the term; if it is large, retain it. [Figure: Model A (term in) vs. Model B (term out).]

11 Model fitting: evaluating the significance of model terms. Fit the higher order model (hom), including all possible terms; retain $SS_{residual}$ and $MS_{residual}$. Fit the reduced model (rm) and retain its $SS_{residual}$. Test for significance of the removed term by computing $F = \dfrac{(SS_{residual}^{rm} - SS_{residual}^{hom}) / (df_{residual}^{rm} - df_{residual}^{hom})}{MS_{residual}^{hom}}$. If $p \ge \alpha$, delete the term; if $p < \alpha$, retain it.
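The F test above can be sketched in plain Python. To get a p value without external libraries, the sketch computes the upper tail of the F distribution from the regularized incomplete beta function, using a continued-fraction evaluation in the style of Numerical Recipes. The helper names and the example sums of squares are hypothetical. In a real analysis, the residual SS and df would come from the two fitted models.

```python
import math

def _nz(v, tiny=1e-300):
    """Avoid division by (near) zero inside the continued fraction."""
    return v if abs(v) > tiny else tiny

def _betacf(a, b, x, max_iter=500, eps=3e-14):
    """Continued fraction for I_x(a, b) (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d, c = 1.0 / _nz(1.0 + aa * d), _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d, c = 1.0 / _nz(1.0 + aa * d), _nz(1.0 + aa / c)
        h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_sf(f, df1, df2):
    """Upper tail P(F > f) of the central F distribution."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def extra_ss_test(ss_rm, df_rm, ss_hom, df_hom):
    """Extra sum of squares F test for the terms dropped from the higher order model."""
    df_num = df_rm - df_hom
    ms_res_hom = ss_hom / df_hom
    F = ((ss_rm - ss_hom) / df_num) / ms_res_hom
    return F, f_sf(F, df_num, df_hom)

# Hypothetical values: reduced model SS = 120 on 40 df; higher order model SS = 100 on 38 df.
F, p = extra_ss_test(ss_rm=120.0, df_rm=40, ss_hom=100.0, df_hom=38)
decision = "retain term" if p < 0.05 else "delete term"
print(F, p, decision)
```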

12 The full model with 2 independent variables. The full model is $Y_{ij} = \mu + \alpha_i + \beta_i (X_{ij} - \bar{X}) + \varepsilon_{ij}$. Here $X_{ij}$ is the value of the covariate $X_1$ for observation $j$ in level $i$ of the categorical variable $X_2$, and $\bar{X}$ is its overall mean. $\beta_i$ is the slope of the regression of Y on $X_1$ (the covariate) estimated for level $i$ of $X_2$. $\alpha_i$ is the difference between the mean of level $i$ of $X_2$ and the overall mean. [Figure: regression lines for level 1 and level 2 of variable $X_2$.]

13 The full model: null hypotheses. For the full model with 2 independent variables, there are 3 null hypotheses. [Figure: level 1 and level 2 of variable $X_2$.]

15 Assumptions for full model hypothesis testing. Residuals are independent and normally distributed. Residual variance is equal for all values of X and independent of the value of the categorical variable (homoscedasticity). There is no error in the independent variables. The relationship between Y and the covariate is linear.

16 Procedure. Fit the full model and test for differences among slopes. If H02 is rejected, run separate regressions for each level of the categorical variable(s). If H02 is accepted, proceed to fit the ANCOVA model. [Figure: Y vs. $X_1$ for level 1 and level 2 of variable $X_2$. If H02 is accepted, fit the ANCOVA model; if it is rejected, fit separate regressions.]

17 The ANCOVA model with 2 independent variables. The ANCOVA model is $Y_{ij} = \mu + \alpha_i + \beta (X_{ij} - \bar{X}) + \varepsilon_{ij}$. $\beta$ is the slope of the regression of Y on $X_1$ (the covariate) pooled over levels of the categorical variable $X_2$. $\alpha_i$ is the difference between the mean of level $i$ of $X_2$ and the overall mean. [Figure: regression lines for level 1 and level 2 of variable $X_2$.]

18 The ANCOVA model: null hypotheses. For the ANCOVA model with 2 independent variables, there are 2 null hypotheses. [Figure: level 1 and level 2 of variable $X_2$.]

20 Assumptions for hypothesis testing in the ANCOVA model. Residuals are independent and normally distributed. Residual variance is equal for all values of X and independent of the value of the categorical variable (homoscedasticity). There is no error in the independent variables. The relationship between Y and the covariate is linear. The slope of the regression of Y on $X_1$ (the covariate) is the same for all levels of the categorical variable $X_2$ (this is not an assumption of the full model!).

21 Procedure. Fit the ANCOVA model and test for differences among intercepts. If H01 is rejected, do multiple comparisons to see which intercepts differ (if $X_2$ has more than 2 levels). If H01 is accepted, proceed to fit the common regression model. [Figure: Y vs. $X_1$ for level 1 and level 2 of variable $X_2$. If H01 is accepted, fit the common regression model; if it is rejected, do multiple comparisons.]

22 The common regression model with 2 independent variables. The model is $Y_{ij} = \mu + \beta (X_{ij} - \bar{X}) + \varepsilon_{ij}$. $\beta$ is the slope of the regression of Y on $X_1$ pooled over levels of the categorical variable $X_2$. $\mu$ is the pooled intercept, and $\bar{X}$ is the pooled average of $X_1$. [Figure: level 1 and level 2 of variable $X_2$ fitted by a single line.]

23 The common regression model: null hypotheses. For the common regression model, there are 2 null hypotheses. [Figure: level 1 and level 2 of variable $X_2$.]

24 Assumptions for hypothesis testing in the common regression model. Residuals are independent and normally distributed. Residual variance is equal for all values of X. There is no error in the independent variable. The relationship between Y and X is linear.

25 Example 1: effects of sex and age on sturgeon size at The Pas. [Figure: males and females.]

26 Analysis. Log(fork length) (LFKL) is the dependent variable, log(age) (LAGE) is the covariate, and sex (SEX$) is the categorical variable (2 levels). Q1: is the slope of the regression of LFKL on LAGE the same for both sexes? [Figure: females and males.]

27 Effects of sex and age on size of sturgeon at The Pas

28 Analysis. Conclusion 1: the slope of the regression of LFKL on LAGE is the same for both sexes (accept H03), since p(SEX$*LAGE) > .05. Q2: is the intercept the same for males and females? [Figure: females and males.]

29 Effects of sex and age on size of sturgeon at The Pas (ANCOVA model)

30 Analysis. Conclusion 2: the intercept is the same for males and females. H02 is accepted since p(SEX$) > 0.05, implying that the best model is the common regression model. Note that the reduction in fit (R²) from the full model to the ANCOVA model is negligible (.697 to .696). This indicates that deleting the SEX$*LAGE term has a negligible impact on model fit. [Figure: females and males.]

31 Effects of sex and age on size of sturgeon at The Pas (common regression)

32 Example 2: effect of location and age on sturgeon size

33 Analysis. Log(fork length) (LFKL) is the dependent variable, log(age) (LAGE) is the covariate, and location (LOCATION$) is the categorical variable (2 levels). Q: is the slope of the regression of LFKL on LAGE the same at both locations? [Figure: LFKL for Nelson River and Lake of the Woods.]

34 Effect of location and age on sturgeon size

35 Analysis. Conclusion: the slope of the regression of LFKL on LAGE differs between the two locations (reject H03), since p(LOCATION$*LAGE) < .05. So we should fit separate regressions for each location. [Figure: LFKL for Nelson River and Lake of the Woods.]

36 What do you do if there are more than 2 levels of the categorical variable? Follow the above procedure. If H03 (same slopes) is rejected, do pairwise contrasts of individual slopes. If H03 is accepted but H02 (same intercepts) is rejected, do pairwise comparisons of intercepts. Always control for the experiment-wise Type I error rate. [Figure: Y vs. X.]
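The slide does not prescribe a particular correction. Two common choices for keeping the experiment-wise Type I error rate near $\alpha$ across all $m = k(k - 1)/2$ pairwise contrasts among k levels are Bonferroni ($\alpha / m$) and Šidák ($1 - (1 - \alpha)^{1/m}$). A minimal sketch follows; the helper name is hypothetical.

```python
from itertools import combinations

def pairwise_alpha(levels, alpha=0.05, method="bonferroni"):
    """All pairs of levels and the per-comparison alpha for the chosen correction."""
    pairs = list(combinations(levels, 2))
    m = len(pairs)
    if method == "bonferroni":
        per_comparison = alpha / m
    elif method == "sidak":
        per_comparison = 1 - (1 - alpha) ** (1 / m)
    else:
        raise ValueError(f"unknown method: {method}")
    return pairs, per_comparison

pairs, per = pairwise_alpha(["acid", "circumneutral", "basic"])
print(pairs, per)
```

Šidák is slightly less conservative than Bonferroni; both assume nothing about the correlation among the contrasts beyond what their derivations require.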

37 What do you do if the biological hypothesis implies one-tailed null(s)? Follow the above procedure. If H03 (same slopes) is rejected, do one-tailed pairwise contrasts of individual slopes. If H03 is accepted but H02 (same intercepts) is rejected, do one-tailed pairwise comparisons of intercepts. [Figure: Y vs. X.]
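For a contrast tested with a symmetric statistic such as t (or an F on 1 numerator df, where $F = t^2$), a one-tailed p value can be obtained from the two-tailed one. This is valid only if the direction was predicted before looking at the data. The helper below is hypothetical.

```python
def one_tailed_p(p_two_tailed, estimate, predicted_sign):
    """One-tailed p from a two-tailed p for a symmetric test statistic.

    predicted_sign is +1 if the hypothesis predicts a positive difference,
    -1 if it predicts a negative one.
    """
    if predicted_sign not in (1, -1):
        raise ValueError("predicted_sign must be +1 or -1")
    half = p_two_tailed / 2.0
    in_predicted_direction = estimate * predicted_sign > 0
    return half if in_predicted_direction else 1.0 - half

# Hypothetical slope difference of 0.5 with two-tailed p = 0.08, predicted positive.
print(one_tailed_p(0.08, 0.5, +1))
```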

38 Power analysis in GLM. In any GLM, hypotheses are tested by means of an F-test. Remember: the appropriate $SS_{error}$ and $df_{error}$ depend on the type of analysis and the hypothesis under investigation. Knowing F, we can compute R², the proportion of the total variance in Y explained by the factor (source) under consideration.
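This small sketch shows the relation between F and R² for the tested source: $R^2 = F\nu_1 / (F\nu_1 + \nu_2)$, where $\nu_1$ and $\nu_2$ are the source and error df. Suppose the error term is the residual of a model containing only that source (as in simple regression). Then this is the total R². When other terms are in the model, it is instead the source's share of the variance those other terms leave unexplained. The function names are hypothetical.

```python
def r2_from_F(F, df_factor, df_error):
    """R^2 for the tested source implied by its F statistic: F*u / (F*u + v)."""
    return F * df_factor / (F * df_factor + df_error)

def F_from_r2(r2, df_factor, df_error):
    """Inverse relation: F = (R^2 / u) / ((1 - R^2) / v)."""
    return (r2 / df_factor) / ((1.0 - r2) / df_error)

# Hypothetical simple regression with F = 10 on 1 and 20 df.
print(r2_from_F(10.0, 1, 20))
```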

39 Partial and total R². The total R² ($R^2_{YB}$) is the proportion of variance in Y accounted for (explained) by a set of independent variables B. The partial R² ($R^2_{YA,B} - R^2_{YA}$) is the proportion of variance in Y accounted for by B when the variance accounted for by another set A is removed. [Figure: proportion of variance accounted for by both A and B ($R^2_{YA,B}$); by A only ($R^2_{YA}$, total R²); by B independent of A ($R^2_{YA,B} - R^2_{YA}$, partial R²).]

40 Partial and total R². The total R² ($R^2_{YB}$) for set B equals the partial R² ($R^2_{YA,B} - R^2_{YA}$) for set B under either of two conditions: (1) the total R² for A ($R^2_{YA}$) is zero, or (2) A and B are independent, in which case $R^2_{YA,B} = R^2_{YA} + R^2_{YB}$. [Figure: diagrams of Y, A and B comparing the proportion of variance accounted for by B ($R^2_{YB}$, total R²) with the proportion independent of A ($R^2_{YA,B} - R^2_{YA}$, partial R²).]

41 Partial and total R². In simple linear regression and single-factor ANOVA, there is only one independent variable X (either continuous or categorical). In these cases, set B includes only the variable X. The total R² ($R^2_{YB}$) then equals $R^2_{YX}$, and the partial and total R² are the same. [Figure: growth rate (cm/day) vs. water temperature (°C).]

42 Partial and total R². In ANCOVA and multiple-factor ANOVA, there are several independent variables $X_1, X_2, \ldots$ (either continuous or categorical), so set B includes several variables. In this case, the total and partial R² may be very different. [Figure: growth rate (cm/day) vs. water temperature (°C) at pH = 6.5 and pH = 4.5.]

43 Example: partial and total R² in ANCOVA. Two independent variables: $X_1$ (continuous) and $X_2$ (categorical). [Figure: Y vs. $X_1$ for $X_2 = L_1$ and $X_2 = L_2$.]

44 Defining effect size in GLM. The effect size, denoted $f^2$, is the ratio of the factor (source) $R^2_{factor}$ to 1 minus the appropriate error $R^2_{error}$: $f^2 = R^2_{factor} / (1 - R^2_{error})$. Note: both $R^2_{factor}$ and $R^2_{error}$ depend on the null hypothesis under investigation.

45 Effects of sex and age on size of sturgeon at The Pas (common regression)

46 Defining effect size in GLM: case 1. Case 1: a set B is related to Y, and the total R² ($R^2_{YB}$) is determined. The error variance proportion is then $1 - R^2_{YB}$, so $f^2 = R^2_{YB} / (1 - R^2_{YB})$, and $H_0: R^2_{YB} = 0$. Example: the effect of age on sturgeon size at The Pas, with B = {LAGE}.

47 Effects of sex and age on size of sturgeon at The Pas

48 Effects of sex and age on size of sturgeon at The Pas (ANCOVA model)

49 Defining effect size in GLM: case 2. Case 2: the proportion of variance of Y due to B over and above that due to A is determined ($R^2_{YA,B} - R^2_{YA}$). The error variance proportion is then $1 - R^2_{YA,B}$, so $f^2 = (R^2_{YA,B} - R^2_{YA}) / (1 - R^2_{YA,B})$, and $H_0: R^2_{YA,B} - R^2_{YA} = 0$. Example: the effect of SEX$*LAGE on sturgeon size at The Pas, with B = {SEX$*LAGE} and A,B = {SEX$, LAGE, SEX$*LAGE}.
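Both cases can be written as one-line helpers; the names below are hypothetical. Case 2 can be applied to the R² values reported for the sturgeon at The Pas: .697 for the full model and .696 for the ANCOVA model, so A = {SEX$, LAGE}. This gives the sample effect size of the SEX$*LAGE term. Because those R² values are rounded to three decimals, the result is only approximate.

```python
def f2_case1(r2_yb):
    """Case 1: f^2 = R2_YB / (1 - R2_YB)."""
    return r2_yb / (1.0 - r2_yb)

def f2_case2(r2_yab, r2_ya):
    """Case 2: f^2 = (R2_YA,B - R2_YA) / (1 - R2_YA,B)."""
    return (r2_yab - r2_ya) / (1.0 - r2_yab)

# Sturgeon at The Pas: R^2 = .697 with SEX$*LAGE (full model), .696 without it (ANCOVA model).
f2_interaction = f2_case2(0.697, 0.696)
print(round(f2_interaction, 4))  # about 0.0033: a very small effect
```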

50 Determining power. Once $f^2$ has been determined, either a priori (as an alternative hypothesis) or a posteriori (the observed effect size), calculate the noncentral F parameter $\lambda$. Knowing $\lambda$ and the factor (source) ($\nu_1$) and error ($\nu_2$) degrees of freedom, we can determine power from the appropriate tables for a given $\alpha$. [Figure: power chart ($1 - \beta$) for $\nu_1 = 2$ at $\alpha = .05$ and $\alpha = .01$, with curves for decreasing $\nu_2$.]
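The tables can be replaced by a direct computation. The sketch below assumes Cohen's convention for the noncentrality parameter, $\lambda = f^2(\nu_1 + \nu_2 + 1)$, since the formula itself is not preserved on the slide. It first finds the central F critical value for the chosen $\alpha$. It then evaluates the upper tail of the noncentral F as a Poisson-weighted mixture of incomplete beta functions. The code is plain Python with hypothetical helper names and is intended for moderate $\lambda$.

```python
import math

def _nz(v, tiny=1e-300):
    """Avoid division by (near) zero inside the continued fraction."""
    return v if abs(v) > tiny else tiny

def _betacf(a, b, x, max_iter=500, eps=3e-14):
    """Continued fraction for I_x(a, b) (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nz(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d, c = 1.0 / _nz(1.0 + aa * d), _nz(1.0 + aa / c)
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d, c = 1.0 / _nz(1.0 + aa * d), _nz(1.0 + aa / c)
        h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h

def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def f_sf(f, df1, df2):
    """Upper tail P(F > f) of the central F distribution."""
    if f <= 0.0:
        return 1.0
    return betainc_reg(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def f_critical(alpha, df1, df2):
    """Critical value F_alpha(df1, df2), found by bisection on f_sf."""
    lo, hi = 0.0, 1.0
    while f_sf(hi, df1, df2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_sf(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def noncentral_f_sf(f, df1, df2, lam, tol=1e-12, max_terms=10000):
    """P(F' > f) for a noncentral F, as a Poisson(lam / 2) mixture of beta tails."""
    y = df1 * f / (df1 * f + df2)
    half = lam / 2.0
    weight = math.exp(-half)  # Poisson probability of j = 0
    total, mass = 0.0, 0.0
    for j in range(max_terms):
        total += weight * (1.0 - betainc_reg(df1 / 2.0 + j, df2 / 2.0, y))
        mass += weight
        if 1.0 - mass < tol:
            break
        weight *= half / (j + 1)
    return total

def glm_power(f2, df1, df2, alpha=0.05):
    """Power of the GLM F test, assuming lam = f2 * (df1 + df2 + 1) (Cohen)."""
    lam = f2 * (df1 + df2 + 1)
    return noncentral_f_sf(f_critical(alpha, df1, df2), df1, df2, lam)

print(round(glm_power(0.14, 2, 29), 3))
```

The call uses the bass example values ($f^2 = 0.14$, $\nu_1 = 2$, $\nu_2 = 29$). With $f^2 = 0$ the function returns $\alpha$ itself, which is a quick sanity check.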

51 Example: effect of pH and nutrient levels on growth rate of bass. The sample consists of 35 lakes at 3 pH levels: acid, circumneutral and basic. For each lake, an estimate of growth rate is obtained (e.g. from a size-age regression). What is the probability of detecting a true effect size as large as the sample effect size for pH*N once the effects of N and pH have been controlled for, given $\alpha = .05$?

52 Example: effect of pH and nutrient levels on growth rate of bass. The sample effect size $f^2$ for pH, once the effects of N and pH*N have been controlled for, is 0.14. Source (pH) df: $\nu_1 = 2$; error df: $\nu_2 = 35 - 2 - 2 - 1 - 1 = 29$. Use tables of $\lambda$ based on R² to get power (NOT the same tables as for ANOVA).
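Assume Cohen's convention $\lambda = f^2(\nu_1 + \nu_2 + 1)$, since the formula is not preserved on the slides. The quantities needed to enter the tables then work out as follows:

```latex
\nu_1 = 2, \qquad \nu_2 = 35 - 2 - 2 - 1 - 1 = 29
\lambda = f^2(\nu_1 + \nu_2 + 1) = 0.14 \times (2 + 29 + 1) = 0.14 \times 32 = 4.48
```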

