General Linear Models; Generalized Linear Models
Hal Whitehead
BIOL4062/5062
Transformations
Analysis of Covariance
General Linear Models
Generalized Linear Models
Non-Linear Models
Common Transformations
Logarithmic: X′ = log(X)
–Most common; morphometrics, allometry
Square-root: X′ = √X
–Counts, Poisson distributed
–X′ = √(X + 0.5) if counts include zeros
Arcsine-square-root: X′ = arcsin(√X)
–Proportions (or percentages / 100)
Box-Cox
–General transformation
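These transformations can be applied directly in MATLAB; the sketch below uses made-up vectors (x, counts, p) purely to show the arithmetic, not data from the lecture.

```matlab
% Minimal sketch of the transformations above (x, counts and p are made-up examples)
x      = [2.3 5.1 7.8 12.4];        % positive measurements (e.g. morphometric data)
counts = [0 3 1 7 2];               % Poisson-type counts, including zeros
p      = [0.10 0.45 0.80 0.95];     % proportions in [0, 1]

xLog  = log(x);                     % logarithmic transform
cSqrt = sqrt(counts + 0.5);         % square-root; +0.5 because the counts include zeros
pAsin = asin(sqrt(p));              % arcsine-square-root for proportions
```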
Regression and ANOVA
Multiple regression: Y = β0 + β1·X1 + β2·X2 + β3·X3 + … + Error
{X's are continuous variables}
ANOVA: Y = γ0 + γ1(Z1) + γ2(Z2) + γ3(Z3) + … + Error
{Z's are categorical variables, defining groups}
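As a rough MATLAB illustration (Statistics Toolbox assumed, data simulated here rather than taken from the course), both model forms can be fitted with fitlm:

```matlab
% Minimal sketch: multiple regression and one-way ANOVA via fitlm (simulated data)
n  = 40;
X1 = randn(n,1); X2 = randn(n,1);                % continuous predictors
Z1 = categorical(randi(3, n, 1));                % grouping factor with three levels
Y  = 1 + 0.5*X1 - 0.3*X2 + randn(n,1);

tbl    = table(Y, X1, X2, Z1);
mReg   = fitlm(tbl, 'Y ~ X1 + X2');              % multiple regression (continuous X's)
mAnova = fitlm(tbl, 'Y ~ Z1');                   % one-way ANOVA (categorical Z)
anova(mAnova)                                    % ANOVA table for the grouping factor
```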
Analysis of Covariance (mixture of ANOVA and regression)
Y = β0 + β1·X1 + β2·X2 + … + γ1(Z1) + γ2(Z2) + … + Error
{X's are continuous variables}
{Z's are categorical variables, defining groups}
Important assumption – parallelism: the β's are the same for all groups
Estimate β's and γ's using least squares
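A minimal MATLAB sketch of an analysis of covariance, again with simulated data; fitting the extra X*group model is one way to check the parallelism assumption:

```matlab
% Minimal ANCOVA sketch (simulated data): one covariate plus one grouping factor
n     = 60;
X     = randn(n,1);                              % continuous covariate
group = categorical(randi(2, n, 1));             % two groups
Y     = 2 + 0.8*X + (double(group) - 1)*0.5 + randn(n,1);

tbl      = table(Y, X, group);
ancova   = fitlm(tbl, 'Y ~ X + group');          % parallel-slopes (ANCOVA) model
parallel = fitlm(tbl, 'Y ~ X * group');          % adds an X:group term
anova(parallel)                                  % a significant X:group term would contradict parallelism
```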
Analysis of Covariance
Data:
–Catch rates of sperm whales (per whaling day) from the logbooks of Yankee whalers off the Galapagos Islands, 1830-1850
Questions:
–Was there a significant change in catch rate over this period?
–Was there a significant seasonal pattern?
Analysis of Covariance
Model: Catch(m, t) = β0 + β1·t + γ(m) + Error
t = 1830-1850 [continuous]
m = Jan-Feb, Mar-Apr, …, Nov-Dec
Analysis of Covariance
Model: Catch(m, t) = β0 + β1·t + γ(m) + Error
Parameter estimates:
β0 = 4.528 [constant]
β1 = -0.002 [change/yr]
γ(Jan-Feb) = 0.016
γ(Mar-Apr) = 0.013
γ(May-Jun) = -0.038
γ(Jul-Aug) = -0.020
γ(Sep-Oct) = 0.000
γ(Nov-Dec) = 0.000
Analysis of Covariance
Model: Catch(m, t) = β0 + β1·t + γ(m) + Error
Analysis of Variance Table:
Source   SS      df   MS      F-ratio   P
YEAR     0.014    1   0.014   3.653     0.061
MONTH    0.034    5   0.007   1.782     0.131
Error    0.220   57   0.004
Analysis of Covariance Durbin-Watson D Statistic: 1.923 First Order Autocorrelation: 0.034
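For reference, the Durbin-Watson statistic and the first-order autocorrelation can be computed directly from a model's residuals; this is a generic sketch, not the analysis behind the numbers above.

```matlab
% Generic sketch: Durbin-Watson statistic and lag-1 autocorrelation of residuals
r  = randn(60,1);                    % stand-in residuals; in practice use mdl.Residuals.Raw from a fitlm fit
dw = sum(diff(r).^2) / sum(r.^2);    % Durbin-Watson; values near 2 suggest little serial correlation
r1 = corr(r(1:end-1), r(2:end));     % first-order autocorrelation of the residuals
```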
General Linear Model: Analysis of Covariance plus Interactions
Y = β0 + β1·X1 + β2·X2 + … + γ1(Z1) + γ2(Z2) + … + β12·X1·X2 + … + γ12(Z1, Z2) + … + α12(Z1)·X1 + … + Error
{X's are continuous variables}
{Z's are categorical variables, defining groups}
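In MATLAB's formula notation, interactions between continuous and categorical predictors can be written directly; a minimal sketch with simulated data:

```matlab
% Minimal sketch (simulated data): general linear model with interaction terms
n  = 80;
X1 = randn(n,1); X2 = randn(n,1);
Z1 = categorical(randi(2, n, 1));
Y  = 1 + X1 + 0.5*X2 + 0.3*X1.*X2 + randn(n,1);

tbl = table(Y, X1, X2, Z1);
mdl = fitlm(tbl, 'Y ~ X1*X2 + Z1*X1');   % X1*X2 expands to X1 + X2 + X1:X2; Z1*X1 adds a group-specific slope
```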
Characteristics of General Linear Models
The response Y has a normal distribution with mean vector μ and variance σ².
A coefficient vector (b = [β's, γ's, α's]) defines a linear combination of the predictors (X's).
The model equates the two: μ = X·b
General Linear Models
Coefficients (β's, γ's, α's) and the fit of the model (σ² or r²) are estimated using least squares
Subsets of predictor variables may be selected using stepwise methods, etc.
Beware:
–Collinearity
–Empty or nearly-empty cells (combinations of categorical variables with few units)
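Stepwise selection of this kind is available in MATLAB as stepwiselm; a minimal sketch with simulated data and entry/removal P thresholds of 0.15:

```matlab
% Minimal sketch (simulated data): stepwise selection of predictors with stepwiselm
n  = 80;
X1 = randn(n,1); X2 = randn(n,1);
Z1 = categorical(randi(2, n, 1));
Y  = 1 + X1 + randn(n,1);
tbl = table(Y, X1, X2, Z1);

% Forward from the constant-only model, allowing up to two-way interactions
mdl = stepwiselm(tbl, 'Y ~ 1', 'Upper', 'interactions', ...
                 'PEnter', 0.15, 'PRemove', 0.15);
```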
General Linear Model
Data:
–Movements of sperm whales (displacement per 12 hr) off the Galapagos Islands, with year, clan, and shit rate
Questions:
–Are movements of sperm whales affected by year, clan, shit rate, or combinations of these?
General Linear Model
Potential X variables:
Year (categorical: 1987 and 1989)
Clan (categorical: ‘Plus-one’ and ‘Regular’)
Shit-rate (continuous, arcsine-square-root transformed)
Year*Clan
Year*Shit-rate
Clan*Shit-rate
General Linear Model
X variables selected by stepwise selection (P-to-enter = 0.15, P-to-remove = 0.15):
Term             Backward   Forward
Year             –          –
Clan             ✓          –
Shit-rate        –          –
Year*Clan        ✓          –
Year*Shit-rate   –          –
Clan*Shit-rate   –          ✓
General Linear Model
Backward: Y = c + Clan + Year*Clan
Forward: Y = c + Shit-rate*Clan
General Linear Model
Why two “best models”?
Backward: Y = c + Clan + Year*Clan
Forward: Y = c + Shit-rate*Clan
[Plots of the two fitted models, 1987 and 1989]
General Linear Model
Which is “best”?
Backward: Y = c + Clan + Year*Clan (r² = 0.264, 2 d.f.)
Forward: Y = c + Shit-rate*Clan (r² = 0.347, 1 d.f.)
[Plots of the two fitted models, 1987 and 1989]
General Linear Models
The response Y has a normal distribution with mean vector μ and variance σ².
A coefficient vector (b = [β's, γ's, α's]) defines a linear combination of the predictors (X's).
The model equates the two: μ = X·b
Generalized Linear Models
The response Y has a distribution that may be normal, binomial, Poisson, gamma, or inverse Gaussian, with parameters including a mean μ.
A coefficient vector (b = [β's, γ's, α's]) defines a linear combination of the predictors (X's).
A link function f defines the link between the two: f(μ) = X·b
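A minimal MATLAB sketch of a generalized linear model: a Poisson response with a log link, fitted with fitglm on simulated counts (a generic example, not the course data):

```matlab
% Minimal sketch (simulated counts): Poisson GLM with a log link via fitglm
n   = 50;
x   = randn(n,1);
y   = poissrnd(exp(0.5 + 0.8*x));      % counts whose log-mean is linear in x
tbl = table(x, y);

mdl = fitglm(tbl, 'y ~ x', 'Distribution', 'poisson', 'Link', 'log');
mdl.Deviance                           % deviance of the fitted model
```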
Generalized Linear Models
Examine assumptions using residuals
Examine fit using “deviance”:
–a generalization of the residual sum of squares
–twice the difference between the log-likelihoods of the model in question and the full model
–fits of different models can be compared
–related to AIC
Generalized Linear Models: can fit non-linear relationships using ‘link functions’ and can consider non-normal errors
MATLAB: glmdemo
Proportion of sexually-mature animals at different weights
MATLAB: glmdemo
Two problems with linear regression:
1) predicted probabilities > 1
2) the relationship is clearly non-linear
MATLAB: glmdemo
Polynomial regression is better, but also:
1) probabilities > 1
2) the inflections are not real
MATLAB: glmdemo
Instead fit a “logistic regression” using a generalized linear model and a binomial distribution
MATLAB: glmdemo
Y = 1/(1 + e^(β0 + β1·X))
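A minimal sketch of such a logistic fit in MATLAB with glmfit/glmval; the weights, group sizes, and coefficients below are invented for illustration:

```matlab
% Minimal sketch (invented data): logistic regression fitted as a binomial GLM
w = (50:10:200)';                                % weights at which animals were examined
n = 25*ones(size(w));                            % number examined at each weight
y = binornd(n, 1./(1 + exp(4 - 0.04*w)));        % number sexually mature at each weight

b    = glmfit(w, [y n], 'binomial', 'link', 'logit');   % fit the logistic model
phat = glmval(b, w, 'logit');                            % predicted proportion mature at each weight
```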
Compare two generalized linear models
MATLAB: glmdemo
Y = 1/(1 + e^(β0 + β1·X))
Y = 1/(1 + e^(β0 + β1·X + β2·X²))
Difference in deviance = 0.70; P = 0.40
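A deviance comparison like this one can be made by fitting the two nested models and referring the drop in deviance to a chi-squared distribution; a generic sketch with simulated data (not the glmdemo data):

```matlab
% Generic sketch (simulated data): compare nested binomial GLMs by their difference in deviance
x   = (1:50)';
nS  = 20*ones(50,1);                             % binomial sample size at each x
y   = binornd(nS, 1./(1 + exp(3 - 0.12*x)));
xsq = x.^2;
tbl = table(x, xsq, y);

m1 = fitglm(tbl, 'y ~ x',       'Distribution', 'binomial', 'BinomialSize', nS);
m2 = fitglm(tbl, 'y ~ x + xsq', 'Distribution', 'binomial', 'BinomialSize', nS);

devDiff = m1.Deviance - m2.Deviance;             % drop in deviance from the quadratic term
dfDiff  = m1.DFE - m2.DFE;                       % difference in residual degrees of freedom
p       = 1 - chi2cdf(devDiff, dfDiff);          % small p favours the larger (quadratic) model
```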
Examine assumptions using residuals
MATLAB: glmdemo
Making predictions:
MATLAB: glmdemo
Non-linear models, e.g.:
Y = c + exp(β0 + β1·X) + E
Y = β0 + β1·X·[X > XK] + E
More general than generalized linear models
But harder to fit:
–iterative process
–may not converge
–non-unique solution
–harder to compare
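A minimal sketch of fitting the first of these non-linear models with nlinfit; the data and starting values are invented, and the fit is iterative and sensitive to those starting values:

```matlab
% Minimal sketch (invented data): fit Y = c + exp(b0 + b1*X) + E by iterative least squares
x = linspace(0, 5, 40)';
y = 1 + exp(0.2 + 0.5*x) + 0.5*randn(size(x));

modelfun = @(b, x) b(1) + exp(b(2) + b(3)*x);    % b = [c, b0, b1]
beta0    = [0 0 0.1];                            % starting values; convergence depends on these
bhat     = nlinfit(x, y, modelfun, beta0);       % may fail to converge or find a local minimum
```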
Summary: Methods with One Dependent Variable (in order of increasing complexity)
Simple Linear Regression
One-way ANOVA
Multiple Linear Regression
Multi-way ANOVA
Analysis of Covariance
General Linear Model
Generalized Linear Model
Non-Linear Model