Université d’Ottawa / University of Ottawa 2001 Bio 4118 Applied Biostatistics L12.1 Lecture 12: Generalized Linear Models (GLM) What are they? When do.

Presentation transcript:

Université d’Ottawa / University of Ottawa 2001 Bio 4118 Applied Biostatistics L12.1 Lecture 12: Generalized Linear Models (GLM)
What are they?
When do we use them?
The full model
The ANCOVA model
The common regression model
The extra sum of squares principle
Assumptions

L12.2 What are General(ized) Linear Models
GLMs are models of the form $Y = Xb + e$, with Y, a vector of dependent variables; b, a vector of estimated coefficients; X, a vector of independent variables; and e, a vector of error terms.
They include: multivariate models, simple linear regression, multiple regression, analysis of variance (ANOVA), and analysis of covariance (ANCOVA).

L12.3 Some GLM procedures
[Table not recovered.]
*either categorical or treated as a categorical variable

L12.4 When do we use ANCOVA?
To compare the relationship between a dependent (Y) and an independent ($X_1$) variable for different levels of one or more categorical variables ($X_2$), e.g. the relationship between body mass (Y) and body size ($X_1$) for different taxonomic groups (birds & mammals, $X_2$).
[Figure: body mass against body size.]

L12.5 When do we use ANCOVA?
In doing comparisons, we assume that the qualitative form of the model is the same for all levels of the categorical variables...
...otherwise, one is comparing apples and oranges!
[Figure: Y against $X_1$ for level 1 and level 2 of $X_2$. Panels: qualitatively similar models; qualitatively different models.]

L12.6 When do we use ANCOVA?
ANCOVA is used to compare linear models...
...although ANCOVA-like extensions have been developed for nonlinear models.
[Figure: Y against $X_1$ for level 1 and level 2 of $X_2$. Panels: nonlinear models; linear models.]

L12.7 The simple regression model
The regression model is: $Y_i = \alpha + \beta X_i + \varepsilon_i$.
So, all simple regression models are described by 2 parameters, the intercept ($\alpha$) and slope ($\beta$).
[Figure: observed and expected Y against X, showing the intercept $\alpha$, the slope $\beta = \Delta Y / \Delta X$, and the residual $\varepsilon_i$ for the observation ($X_i$, $Y_i$).]
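
As a minimal sketch of how the two parameters of the simple regression model can be estimated, the following uses ordinary least squares in plain Python. The function name `fit_simple_regression` and the data are made up for illustration; they do not come from the lecture.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept (alpha) and slope (beta) for y = alpha + beta*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of X and sum of cross-products of X and Y about their means.
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    sp_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sp_xy / ss_xx             # slope: change in Y per unit change in X
    alpha = y_bar - beta * x_bar     # intercept: expected Y at X = 0
    residuals = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
    return alpha, beta, residuals


# Made-up illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha, beta, residuals = fit_simple_regression(x, y)
```

The residuals returned here are the observed minus expected values of Y; with an intercept in the model they sum to zero up to rounding.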

L12.8 Simple GLMs
Two linear models may differ as follows:
differences in both intercepts ($\alpha$) and slopes ($\beta$)
different intercepts but the same slopes (ANCOVA model)
[Figure: Y against $X_1$. Panels: different $\alpha$ and $\beta$; different $\alpha$, same $\beta$.]

L12.9 Simple GLMs
Two linear models may also differ as follows:
different slopes ($\beta$) but the same intercepts ($\alpha$)
same slopes and intercepts (common regression model)
[Figure: Y against $X_1$. Panels: same $\alpha$, different $\beta$; same $\alpha$, same $\beta$.]

L12.10 Fitting GLMs
Proceeds in hierarchical fashion, fitting the most complex model first.
Evaluate the significance of a term by fitting two models: one with the term in, the other with it removed.
Test for the change in model fit ($\Delta$MF) associated with removal of the term in question.
[Figure: Model A (term in) and Model B (term out) are compared by $\Delta$MF. Delete term if $\Delta$ is small; retain term if $\Delta$ is large.]

L12.11 Model fitting: evaluating the significance of model terms
Fit the higher order model (hom) including all possible terms; retain SS residual and MS residual.
Fit the reduced model (rm); retain SS residual.
Test for significance of the removed term by computing: $F = \frac{(SS_{residual,rm} - SS_{residual,hom}) / (df_{residual,rm} - df_{residual,hom})}{MS_{residual,hom}}$
[Figure: the higher order model and the reduced model are compared by F. Delete term if p > $\alpha$; retain term if p $\le$ $\alpha$.]
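
The comparison of a higher order model with a reduced model can be sketched as follows. This assumes the usual extra sum of squares form of the test (the change in residual SS per degree of freedom, divided by the residual MS of the higher order model). The function name `extra_ss_f` and the numbers are hypothetical.

```python
def extra_ss_f(ss_res_reduced, df_res_reduced, ss_res_full, df_res_full):
    """F statistic for the term(s) removed from the higher order model.

    Numerator: increase in residual SS per degree of freedom gained by
    dropping the term. Denominator: residual MS of the higher order model.
    """
    df_term = df_res_reduced - df_res_full
    ms_term = (ss_res_reduced - ss_res_full) / df_term
    ms_res_full = ss_res_full / df_res_full
    return ms_term / ms_res_full, df_term, df_res_full


# Hypothetical numbers for illustration only (not from the lecture).
f_stat, df_num, df_den = extra_ss_f(
    ss_res_reduced=12.0, df_res_reduced=21, ss_res_full=10.0, df_res_full=20
)
```

The returned F would then be compared with the F distribution on `df_num` and `df_den` degrees of freedom to obtain p.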

L12.12 The full model with 2 independent variables
The full model is: [equation not recovered]
$\beta_i$ is the slope of the regression of Y on $X_1$ (the covariate) estimated for level i of the categorical variable $X_2$.
$\alpha_i$ is the difference between the mean of each level i of the categorical variable $X_2$ and the overall mean.
[Figure: regression lines for level 1 and level 2 of variable $X_2$.]

L12.13 The full model: null hypotheses
For the full model with 2 independent variables, there are 3 null hypotheses: [hypotheses not recovered]
[Figure: regression lines for level 1 and level 2 of variable $X_2$.]

L12.15 Assumptions for full model hypothesis testing
Residuals are independent and normally distributed.
Residual variance is equal for all values of X and independent of the value of the categorical variable (homoscedasticity).
No error in independent variables.
Relationship between Y and covariate is linear.

L12.16 Procedure
Fit full model; test for differences among slopes.
If $H_{02}$ rejected, run separate regressions for each level of categorical variable(s).
If $H_{02}$ accepted, proceed to fit ANCOVA model.
[Figure: Y against $X_1$ for level 1 and level 2 of variable $X_2$. $H_{02}$ accepted: ANCOVA; $H_{02}$ rejected: separate regressions.]
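
A minimal sketch of the test for differences among slopes, assuming one continuous covariate and one categorical variable. The residual SS of the full model (a separate slope per level) is compared with that of the ANCOVA model (one pooled slope) using closed-form sums of squares. The helper names and the data are made up for illustration.

```python
def sums(x, y):
    """Return (SS_xx, SP_xy, SS_yy) about the means of x and y."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    ss_xx = sum((a - xb) ** 2 for a in x)
    sp_xy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    ss_yy = sum((b - yb) ** 2 for b in y)
    return ss_xx, sp_xy, ss_yy


def slope_homogeneity_f(groups):
    """groups: list of (x, y) pairs, one per level of the categorical variable."""
    parts = [sums(x, y) for x, y in groups]
    n_total = sum(len(x) for x, _ in groups)
    k = len(groups)
    # Full model: separate intercept and slope in each level.
    ss_full = sum(syy - sxy ** 2 / sxx for sxx, sxy, syy in parts)
    df_full = n_total - 2 * k
    # ANCOVA model: separate intercepts, one pooled within-level slope.
    sxx_w = sum(p[0] for p in parts)
    sxy_w = sum(p[1] for p in parts)
    syy_w = sum(p[2] for p in parts)
    ss_ancova = syy_w - sxy_w ** 2 / sxx_w
    df_ancova = n_total - k - 1
    df_term = df_ancova - df_full
    f_stat = ((ss_ancova - ss_full) / df_term) / (ss_full / df_full)
    return f_stat, df_term, df_full


# Made-up data with clearly different slopes in the two levels.
level_1 = ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
level_2 = ([1.0, 2.0, 3.0, 4.0], [2.0, 4.1, 5.9, 8.2])
f_stat, df_num, df_den = slope_homogeneity_f([level_1, level_2])
```

A large F here (relative to the F distribution on `df_num` and `df_den` degrees of freedom) would point to separate regressions; a small F would point to the ANCOVA model.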

L12.17 The ANCOVA model with 2 independent variables
The ANCOVA model is: [equation not recovered]
$\beta$ is the slope of the regression of Y on $X_1$ (the covariate) pooled over levels of the categorical variable $X_2$.
$\alpha_i$ is the difference between the mean of each level i of the categorical variable $X_2$ and the overall mean.
[Figure: regression lines for level 1 and level 2 of variable $X_2$.]

L12.18 The ANCOVA model: null hypotheses
For the ANCOVA model with 2 independent variables, there are 2 null hypotheses: [hypotheses not recovered]
[Figure: regression lines for level 1 and level 2 of variable $X_2$.]

L12.20 Assumptions for hypothesis testing in ANCOVA model
Residuals are independent and normally distributed.
Residual variance is equal for all values of X and independent of the value of the categorical variable (homoscedasticity).
No error in independent variables.
Relationship between Y and covariate is linear.
The slope of the regression of Y on $X_1$ (the covariate) is the same for all levels of the categorical variable $X_2$ (not an assumption for full model!).

L12.21 Procedure
Fit ANCOVA model; test for differences among intercepts.
If $H_{01}$ rejected, do multiple comparisons to see which intercepts differ (if there are more than 2 levels for $X_2$).
If $H_{01}$ accepted, proceed to fit common regression model.
[Figure: Y against $X_1$ for level 1 and level 2 of variable $X_2$. $H_{01}$ accepted: common regression; $H_{01}$ rejected: multiple comparisons.]
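
The test for differences among intercepts can be sketched in the same spirit, assuming one covariate and one categorical variable. The residual SS of the ANCOVA model (separate intercepts, pooled slope) is compared with that of the common regression model (one intercept, one slope). Helper names and data are hypothetical.

```python
def sums(x, y):
    """Return (SS_xx, SP_xy, SS_yy) about the means of x and y."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    ss_xx = sum((a - xb) ** 2 for a in x)
    sp_xy = sum((a - xb) * (b - yb) for a, b in zip(x, y))
    ss_yy = sum((b - yb) ** 2 for b in y)
    return ss_xx, sp_xy, ss_yy


def intercept_f(groups):
    """groups: list of (x, y) pairs, one per level of the categorical variable."""
    parts = [sums(x, y) for x, y in groups]
    n_total = sum(len(x) for x, _ in groups)
    k = len(groups)
    # ANCOVA model: separate intercepts, one pooled within-level slope.
    sxx_w = sum(p[0] for p in parts)
    sxy_w = sum(p[1] for p in parts)
    syy_w = sum(p[2] for p in parts)
    ss_ancova = syy_w - sxy_w ** 2 / sxx_w
    df_ancova = n_total - k - 1
    # Common regression model: all observations pooled, levels ignored.
    all_x = [a for x, _ in groups for a in x]
    all_y = [b for _, y in groups for b in y]
    sxx_t, sxy_t, syy_t = sums(all_x, all_y)
    ss_common = syy_t - sxy_t ** 2 / sxx_t
    df_common = n_total - 2
    df_term = df_common - df_ancova
    f_stat = ((ss_common - ss_ancova) / df_term) / (ss_ancova / df_ancova)
    return f_stat, df_term, df_ancova


# Made-up data: roughly parallel lines with different elevations.
level_1 = ([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
level_2 = ([1.0, 2.0, 3.0, 4.0], [3.0, 4.1, 4.9, 6.2])
f_stat, df_num, df_den = intercept_f([level_1, level_2])
```

With more than 2 levels, a large F only says that some intercepts differ; the multiple comparisons mentioned on the slide would still be needed to say which.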

L12.22 The common regression model with 2 independent variables
The model is: [equation not recovered]
$\beta$ is the slope of the regression of Y on $X_1$ pooled over levels of the categorical variable $X_2$.
$\alpha$ is the pooled intercept.
$\bar{X}_1$ is the pooled average of $X_1$.
[Figure: common regression line for level 1 and level 2 of variable $X_2$.]

L12.23 The common regression model: null hypotheses
For the common regression model, there are 2 null hypotheses: [hypotheses not recovered]
[Figure: common regression line for level 1 and level 2 of variable $X_2$.]

L12.24 Assumptions for hypothesis testing in common regression model
Residuals are independent and normally distributed.
Residual variance is equal for all values of X.
No error in independent variable.
Relationship between Y and X is linear.

L12.25 Example 1: effects of sex and age on sturgeon size at The Pas
[Figure: males and females.]

L12.26 Analysis
Log(forklength) (LFKL) is the dependent variable; log(age) (LAGE) is the covariate, and sex (SEX$) is the categorical variable (2 levels).
Q1: is the slope of the regression of LFKL on LAGE the same for both sexes?
[Figure: females and males.]

L12.27 Effects of sex and age on size of sturgeon at The Pas

L12.28 Analysis
Conclusion 1: slope of regression of LFKL on LAGE is the same for both sexes (accept $H_{03}$) since p(SEX$*LAGE) > .05.
Q2: is the intercept the same for both males and females?
[Figure: females and males.]

L12.29 Effects of sex and age on size of sturgeon at The Pas (ANCOVA model)

L12.30 Analysis
Conclusion 2: intercept is the same for both males and females. $H_{02}$ is accepted since p(SEX$) > .05, implying that the best model is the common regression model.
Note that the reduction in fit ($R^2$) from the full model to the ANCOVA model is negligible (.697 to .696), indicating that deleting a model term has a negligible impact on model fit.
[Figure: females and males.]

L12.31 Effects of sex and age on size of sturgeon at The Pas (common regression)

L12.32 Example 2: Effect of location and age on sturgeon size
[Figure: LFKL.]

L12.33 Analysis
Log(forklength) (LFKL) is the dependent variable; log(age) (LAGE) is the covariate, and location (LOCATION$) is the categorical variable (2 levels).
Q: is the slope of the regression of LFKL on LAGE the same at both locations?
[Figure: LFKL for Nelson River and Lake of the Woods.]

L12.34 Effect of location and age on sturgeon size

L12.35 Analysis
Conclusion: slope of regression of LFKL on LAGE is different at the two locations (reject $H_{03}$) since p(LOCATION$*LAGE) < .05.
So, we should fit individual regressions for each location.
[Figure: LFKL for Nelson River and Lake of the Woods.]

L12.36 What do you do if?
More than 2 levels of categorical variable?
Follow the above procedure, but if $H_{03}$ (same slope) is rejected, do pairwise contrasts of individual slopes.
If $H_{03}$ is accepted but $H_{02}$ (same intercepts) is rejected, do pairwise comparisons of intercepts.
Always control for experiment-wise Type I error rate.
[Figure: Y against X.]

L12.37 What do you do if?
Biological hypothesis implies one-tailed null(s)?
Follow the above procedure, but if $H_{03}$ (same slope) is rejected, do one-tailed pairwise contrasts of individual slopes.
If $H_{03}$ is accepted but $H_{02}$ (same intercepts) is rejected, do one-tailed pairwise comparisons of intercepts.
[Figure: Y against X.]

L12.38 Power analysis in GLM
In any GLM, hypotheses are tested by means of an F-test.
Remember: the appropriate SS error and df error depend on the type of analysis and the hypothesis under investigation.
Knowing F, we can compute $R^2$, the proportion of the total variance in Y explained by the factor (source) under consideration.

L12.39 Partial and total $R^2$
The total $R^2$ ($R^2_{YB}$) is the proportion of variance in Y accounted for (explained) by a set of independent variables B.
The partial $R^2$ ($R^2_{YA,B} - R^2_{YA}$) is the proportion of variance in Y accounted for by B when the variance accounted for by another set A is removed.
[Figure: proportion of variance accounted for by both A and B ($R^2_{YA,B}$); proportion accounted for by A only ($R^2_{YA}$, total $R^2$); proportion accounted for by B independent of A ($R^2_{YA,B} - R^2_{YA}$, partial $R^2$).]
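
The partial $R^2$ defined above is a simple difference, sketched below. The function name `partial_r2` is made up; the values .697 and .696 are the $R^2$ of the full and ANCOVA models quoted in the sturgeon example of this lecture, and the remaining numbers are hypothetical.

```python
def partial_r2(r2_y_ab, r2_y_a):
    """Proportion of variance in Y accounted for by B over and above A."""
    return r2_y_ab - r2_y_a


# Sturgeon example from this lecture: full model (SEX$, LAGE, SEX$*LAGE)
# has R2 = .697 and the ANCOVA model (SEX$, LAGE) has R2 = .696,
# so B = {SEX$*LAGE} adds about .001.
partial_interaction = partial_r2(0.697, 0.696)

# Hypothetical independent sets A and B: R2_YA,B = R2_YA + R2_YB,
# so the partial R2 of B equals its total R2.
r2_y_a, r2_y_b = 0.30, 0.25
partial_b = partial_r2(r2_y_a + r2_y_b, r2_y_a)
```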

L12.40 Partial and total $R^2$
The total $R^2$ ($R^2_{YB}$) for set B equals the partial $R^2$ ($R^2_{YA,B} - R^2_{YA}$) for set B if either (1) the total $R^2$ for A ($R^2_{YA}$) is zero; or (2) A and B are independent (in which case $R^2_{YA,B} = R^2_{YA} + R^2_{YB}$).
[Figure: the proportion of variance accounted for by B ($R^2_{YB}$, total $R^2$) and the proportion independent of A ($R^2_{YA,B} - R^2_{YA}$, partial $R^2$) are equal iff one of the two conditions holds.]

L12.41 Partial and total $R^2$
In simple linear regression and single-factor ANOVA, there is only one independent variable X (either continuous or categorical).
In these cases, set B includes only one variable X, so total $R^2$ ($R^2_{YB}$) = total $R^2$ ($R^2_{YX}$), and the partial and total $R^2$ are the same.
[Figure: growth rate (cm/day), Y, against water temperature (°C), X.]

L12.42 Partial and total $R^2$
In ANCOVA and multiple-factor ANOVA, there are several independent variables $X_1$, $X_2$, ... (either continuous or categorical), so set B includes several variables.
In this case, the total and partial $R^2$ may be very different.
[Figure: growth rate (cm/day), Y, against water temperature (°C), $X_1$, for pH = 6.5 and pH = 4.5.]

L12.43 Example: Partial and total $R^2$ in ANCOVA
Two independent variables: $X_1$ (continuous) and $X_2$ (categorical).
[Figure: Y against $X_1$ for $X_2 = L_1$ and $X_2 = L_2$.]

L12.44 Defining effect size in GLM
The effect size, denoted $f^2$, is given by the ratio of the factor (source) $R^2_{factor}$ to 1 minus the appropriate error $R^2_{error}$: $f^2 = R^2_{factor} / (1 - R^2_{error})$.
Note: both $R^2_{factor}$ and $R^2_{error}$ depend on the null hypothesis under investigation.

L12.45 Effects of sex and age on size of sturgeon at The Pas (common regression)

L12.46 Defining effect size in GLM: case 1
Case 1: a set B is related to Y, and the total $R^2$ ($R^2_{YB}$) is determined. The error variance proportion is then $1 - R^2_{YB}$.
$H_0$: $R^2_{YB} = 0$
Example: effect of age on sturgeon size at The Pas; B = {LAGE}

L12.47 Effects of sex and age on size of sturgeon at The Pas

L12.48 Effects of sex and age on size of sturgeon at The Pas (ANCOVA model)

L12.49 Defining effect size in GLM: case 2
Case 2: the proportion of variance of Y due to B over and above that due to A is determined ($R^2_{YA,B} - R^2_{YA}$). The error variance proportion is then $1 - R^2_{YA,B}$.
$H_0$: $R^2_{YA,B} - R^2_{YA} = 0$
Example: effect of SEX$*LAGE on sturgeon size at The Pas; B = {SEX$*LAGE}; A,B = {SEX$, LAGE, SEX$*LAGE}
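
The two cases can be sketched as small functions, assuming the effect size is the factor $R^2$ divided by 1 minus the error $R^2$ as described in this lecture. The function names are hypothetical. The values .697 and .696 are the full-model and ANCOVA-model $R^2$ quoted for the sturgeon example; the case 1 value of 0.5 is made up.

```python
def effect_size_f2(r2_factor, r2_error):
    """Effect size f^2 = R2_factor / (1 - R2_error)."""
    if not 0.0 <= r2_error < 1.0:
        raise ValueError("r2_error must lie in [0, 1)")
    return r2_factor / (1.0 - r2_error)


def f2_case1(r2_y_b):
    """Case 1: set B alone; factor and error R2 are both the total R2 of B."""
    return effect_size_f2(r2_y_b, r2_y_b)


def f2_case2(r2_y_ab, r2_y_a):
    """Case 2: B over and above A; factor R2 is the partial R2 of B."""
    return effect_size_f2(r2_y_ab - r2_y_a, r2_y_ab)


# Case 2 with the sturgeon R2 values quoted in this lecture:
# B = {SEX$*LAGE}, A = {SEX$, LAGE}.
f2_interaction = f2_case2(0.697, 0.696)

# Case 1 with a hypothetical total R2 of 0.5.
f2_hypothetical = f2_case1(0.5)
```

With these inputs the interaction effect size is about 0.0033, which matches the slide's remark that dropping the term has a negligible impact on fit.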

L12.50 Determining power
Once $f^2$ has been determined, either a priori (as an alternate hypothesis) or a posteriori (the observed effect size), calculate the noncentrality parameter of the non-central F distribution.
Knowing the noncentrality parameter and the factor (source) ($\nu_1$) and error ($\nu_2$) degrees of freedom, we can determine power from appropriate tables for a given $\alpha$.
[Figure: power ($1 - \beta$) curves for $\nu_1 = 2$ at $\alpha = .05$ and $\alpha = .01$, with decreasing $\nu_2$.]

L12.51 Example: effect of pH and nutrient levels on growth rate of bass
Sample of 35 lakes.
3 pH levels: acid, circumneutral, basic.
For each lake, an estimate of growth rate is obtained (e.g. from size-age regression).
What is the probability of detecting a true effect size as large as the sample effect size for pH*N once effects of N and pH have been controlled for, given $\alpha$ = .05?

L12.52 Example: effect of pH and nutrient levels on growth rate of bass
Sample effect size $f^2$ for pH once effects of N and pH*N have been controlled for = 0.14.
Source (pH) df = $\nu_1$ = 2; error df = $\nu_2$ = 29.
Use tables of the noncentrality parameter based on $R^2$ to get power (NOT the same tables as for ANOVA).
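
The slide's own formula for the noncentrality parameter is not recoverable from the transcript. As a hedged sketch, the following assumes the convention from Cohen's (1988) power tables for multiple regression, in which the parameter is the effect size multiplied by (source df + error df + 1); the lecture may have used a different convention. The function name `noncentrality` is made up; the inputs are the values given in the example above.

```python
def noncentrality(f2, df_source, df_error):
    """Noncentrality parameter under the assumed convention
    lambda = f2 * (df_source + df_error + 1)."""
    return f2 * (df_source + df_error + 1)


# Values from the bass example: f2 = 0.14, source df = 2, error df = 29.
lam = noncentrality(0.14, 2, 29)
```

Under this assumption the value is 0.14 × 32 = 4.48, which would then be looked up in power tables for the given degrees of freedom and $\alpha$.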