General Linear Models; Generalized Linear Models Hal Whitehead BIOL4062/5062.


– Transformations
– Analysis of Covariance
– General Linear Models
– Generalized Linear Models
– Non-Linear Models

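Common Transformations
– Logarithmic: $X' = \log(X)$. The most common; used in morphometrics and allometry.
– Square root: $X' = \sqrt{X}$. For counts (Poisson distributed); use $X' = \sqrt{X + 0.5}$ if the counts include zeros.
– Arcsine-square-root: $X' = \arcsin(\sqrt{X})$. For proportions (or percentages divided by 100).
– Box-Cox: a general family of transformations, $X' = (X^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$ and $X' = \log(X)$ for $\lambda = 0$.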
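A minimal Python sketch of these transformations, using only the standard library. The function names are hypothetical; the Box-Cox λ is supplied by hand here, although in practice it is usually chosen by maximum likelihood.

```python
import math

def log_transform(x):
    """Logarithmic transform X' = log(X); requires X > 0."""
    return math.log(x)

def sqrt_transform(count, has_zeros=False):
    """Square-root transform for counts; adds 0.5 first when the counts include zeros."""
    return math.sqrt(count + 0.5) if has_zeros else math.sqrt(count)

def arcsine_sqrt(p):
    """Arcsine-square-root transform for a proportion 0 <= p <= 1 (divide percentages by 100 first)."""
    return math.asin(math.sqrt(p))

def box_cox(x, lam):
    """Box-Cox transform (X**lam - 1) / lam, which becomes log(X) when lam == 0; requires X > 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

print(round(arcsine_sqrt(0.5), 4))  # asin(sqrt(0.5)) = pi/4 → 0.7854
```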

Regression and ANOVA
Multiple regression: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \text{Error}$, where the X’s are continuous variables.
ANOVA: $Y = \gamma_0 + \gamma_1(Z_1) + \gamma_2(Z_2) + \gamma_3(Z_3) + \dots + \text{Error}$, where the Z’s are categorical variables defining groups.
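The ANOVA form is itself a regression once each categorical Z is replaced by 0/1 indicator ("dummy") columns. The Python sketch below is a minimal illustration under stated assumptions: it uses reference coding with the first level as the reference group (one common convention, not necessarily the one used in these analyses), it solves the normal equations directly (adequate for tiny, well-conditioned examples; QR-based solvers are more stable in practice), and the helper names and data are hypothetical. With this coding the intercept estimates the reference-group mean and each γ the difference between a group mean and that reference mean.

```python
def solve(a, b):
    """Solve the square linear system a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def dummy_code(labels, levels):
    """Intercept column plus one 0/1 indicator per level; levels[0] is the reference group."""
    return [[1.0] + [1.0 if lab == lev else 0.0 for lev in levels[1:]] for lab in labels]

# Hypothetical data: three groups with two observations each (group means 3, 6 and 10)
groups = ["A", "A", "B", "B", "C", "C"]
y = [2.0, 4.0, 5.0, 7.0, 9.0, 11.0]
coef = least_squares(dummy_code(groups, ["A", "B", "C"]), y)
print([round(c, 6) for c in coef])  # → [3.0, 3.0, 7.0]
```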

Analysis of Covariance (a mixture of ANOVA and regression)
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \gamma_1(Z_1) + \gamma_2(Z_2) + \dots + \text{Error}$
– The X’s are continuous variables; the Z’s are categorical variables defining groups.
– Important assumption, parallelism: the β’s are the same for all groups.
– The β’s and γ’s are estimated by least squares.
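A minimal Python sketch of analysis of covariance as a single least-squares fit. Parallelism is built into the design: there is one slope column for X shared by all groups, and the group indicators only shift the intercept. The helper names are hypothetical, and the data are made up, generated exactly from two parallel lines, so the fit should recover the generating coefficients.

```python
def solve(a, b):
    """Solve the square linear system a·x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def least_squares(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def ancova_design(x, groups, levels):
    """Rows [1, X, Z_2, ..., Z_k]: one common slope for X plus a 0/1 indicator per non-reference group."""
    return [[1.0, xi] + [1.0 if g == lev else 0.0 for lev in levels[1:]]
            for xi, g in zip(x, groups)]

# Hypothetical data generated exactly as Y = 1 + 2*X + 3*[group B], i.e. two parallel lines
x = [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0]
groups = ["A"] * 4 + ["B"] * 4
y = [1.0 + 2.0 * xi + (3.0 if g == "B" else 0.0) for xi, g in zip(x, groups)]
b0, b1, gamma_b = least_squares(ancova_design(x, groups, ["A", "B"]), y)
print(round(b0, 6), round(b1, 6), round(gamma_b, 6))  # → 1.0 2.0 3.0
```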

Analysis of Covariance
Data:
– Catch rates of sperm whales (per whaling day) off the Galapagos Islands, from the logbooks of Yankee whalers
Questions:
– Was there a significant change in catch rate over this period?
– Was there a significant seasonal pattern?

Analysis of Covariance
Model: $\text{Catch}(m,t) = \beta_0 + \beta_1 \cdot t + \gamma(m) + \text{Error}$
– t = year [continuous]
– m = two-month period: Jan-Feb, Mar-Apr, …, Nov-Dec [categorical]

Analysis of Covariance
Model: $\text{Catch}(m,t) = \beta_0 + \beta_1 \cdot t + \gamma(m) + \text{Error}$
Parameter estimates:
– $\beta_0$ = [constant]
– $\beta_1$ = [change/yr]
– γ(Jan-Feb), γ(Mar-Apr), γ(May-Jun), γ(Jul-Aug), γ(Sep-Oct) = [one estimate per period]
– γ(Nov-Dec) = 0.000 (reference period)
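In these estimates γ(Nov-Dec) is 0.000, which is what one expects if Nov-Dec is the reference period: with a constant $\beta_0$ in the model, one γ must be fixed (here at zero) for the parameters to be identifiable, and the other γ’s are then differences from Nov-Dec. A minimal Python sketch of how one row of the design matrix could be built under that coding; the function name is hypothetical, and t is simply whatever year value is supplied.

```python
PERIODS = ["Jan-Feb", "Mar-Apr", "May-Jun", "Jul-Aug", "Sep-Oct", "Nov-Dec"]
REFERENCE = "Nov-Dec"  # its gamma is fixed at 0, so it gets no indicator column

def catch_design_row(period, t):
    """Predictor row [1, t, I(Jan-Feb), ..., I(Sep-Oct)] for Catch(m, t) = b0 + b1*t + gamma(m)."""
    if period not in PERIODS:
        raise ValueError(f"unknown period: {period}")
    indicators = [1.0 if period == p else 0.0 for p in PERIODS if p != REFERENCE]
    return [1.0, float(t)] + indicators

print(catch_design_row("Mar-Apr", 3))  # → [1.0, 3.0, 0.0, 1.0, 0.0, 0.0, 0.0]
print(catch_design_row("Nov-Dec", 3))  # → [1.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Six periods therefore contribute five indicator columns, which is also the d.f. for the MONTH term.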

Analysis of Covariance
Model: $\text{Catch}(m,t) = \beta_0 + \beta_1 \cdot t + \gamma(m) + \text{Error}$
Analysis of Variance Table, with columns Source, SS, df, MS, F-ratio, P and rows YEAR, MONTH, Error
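Each line of an analysis of variance table like this can be obtained by comparing the full model with the model lacking that term. The minimal Python sketch below computes SS, df, MS and the F-ratio from two residual sums of squares; the function name and numbers are hypothetical. It gives partial (last-entered) sums of squares; sequential sums of squares, which depend on the order of the terms, can differ when predictors are correlated, and the slides do not say which kind was used. The P column would come from the upper tail of an F distribution with (df, error df) degrees of freedom, for example via `scipy.stats.f.sf`, left out here to keep the sketch dependency-free.

```python
def anova_term(rss_without, df_without, rss_full, df_full):
    """SS, df, MS and F-ratio for one term, comparing the full model with the model lacking it.

    rss_* are residual sums of squares; df_* are residual degrees of freedom.
    """
    ss = rss_without - rss_full      # extra sum of squares explained by the term
    df = df_without - df_full        # number of parameters the term adds
    ms = ss / df
    ms_error = rss_full / df_full    # Error mean square of the full model
    return ss, df, ms, ms / ms_error

# Hypothetical numbers: dropping the term raises the RSS from 100 (19 d.f.) to 120 (20 d.f.)
ss, df, ms, f_ratio = anova_term(120.0, 20, 100.0, 19)
print(ss, df, ms, round(f_ratio, 3))
```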

Analysis of Covariance
Durbin-Watson D statistic:
First-order autocorrelation: 0.034
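The Durbin-Watson statistic checks whether residuals taken in time order are serially correlated. A minimal Python sketch with hypothetical residuals: D ranges from 0 to 4, and since $D = 2(1 - r_1) - (e_1^2 + e_n^2)/\sum_t e_t^2$, values near 2 indicate little first-order autocorrelation $r_1$. The lag-1 autocorrelation below assumes residuals with mean close to zero, as least-squares residuals from a model with a constant are; the function names are hypothetical.

```python
def durbin_watson(resid):
    """Durbin-Watson D = sum (e_t - e_{t-1})^2 / sum e_t^2, residuals in time order."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

def lag1_autocorrelation(resid):
    """First-order autocorrelation r1 = sum e_t * e_{t-1} / sum e_t^2 (assumes mean ~0)."""
    num = sum(resid[t] * resid[t - 1] for t in range(1, len(resid)))
    return num / sum(e * e for e in resid)

# Hypothetical residuals in time order
resid = [0.5, -0.2, 0.1, 0.4, -0.6, 0.3, -0.1, -0.4]
d, r1 = durbin_watson(resid), lag1_autocorrelation(resid)
print(round(d, 3), round(r1, 3), round(2 * (1 - r1), 3))
```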

General Linear Model: Analysis of Covariance plus Interactions
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \gamma_1(Z_1) + \gamma_2(Z_2) + \dots + \beta_{12} X_1 X_2 + \dots + \gamma_{12}(Z_1, Z_2) + \dots + \alpha_{12}(Z_1) X_1 + \dots + \text{Error}$
– The X’s are continuous variables; the Z’s are categorical variables defining groups.
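A minimal Python sketch of one row of the design matrix for the terms written out above, assuming (for illustration only) that each categorical variable has two levels coded 0/1, so each γ and the Z₁ × Z₂ interaction needs a single column; factors with more levels need one column per non-reference level, and products of those columns for interactions. The $\alpha_{12}(Z_1) X_1$ column lets the slope on $X_1$ differ between the two levels of $Z_1$, which relaxes the parallelism assumption of analysis of covariance. The function name is hypothetical.

```python
def glm_design_row(x1, x2, z1, z2):
    """Predictor row for two continuous X's and two two-level factors coded 0/1.

    Column order matches the coefficients:
    [b0, b1, b2, g1, g2, b12 (X1*X2), g12 (Z1 x Z2), a12 (Z1 * X1)]
    """
    return [1.0, x1, x2, float(z1), float(z2), x1 * x2, float(z1 * z2), z1 * x1]

print(glm_design_row(2.0, 3.0, 1, 0))  # → [1.0, 2.0, 3.0, 1.0, 0.0, 6.0, 0.0, 2.0]
```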

Characteristics of General Linear Models
– The response Y has a normal distribution with mean vector μ and variance σ² (SD σ).
– A coefficient vector (b = [β’s, γ’s, α’s]) defines a linear combination of the predictors (X’s).
– The model equates the two: μ = X·b

General Linear Models
– The coefficients (β’s, γ’s, α’s) and the fit of the model (σ² or r²) are estimated by least squares.
– Subsets of predictor variables may be selected using stepwise methods, etc.
– Beware of:
  – Collinearity
  – Empty or nearly empty cells (combinations of categorical variables with few units)
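A minimal Python sketch of two quantities relevant here: r² of a least-squares fit, and the variance inflation factor (VIF), a common collinearity diagnostic that is not mentioned on the slides. In general $\text{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing $X_j$ on the other predictors; with exactly two predictors this reduces to $1/(1 - r^2)$ for their correlation r, which is the only case the sketch covers. Function names and data are hypothetical.

```python
def r_squared(y, fitted):
    """r^2 = 1 - RSS/TSS for a least-squares fit that includes a constant."""
    mean_y = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - rss / tss

def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5

def vif_two_predictors(x1, x2):
    """Variance inflation factor 1 / (1 - r^2) for a model with exactly two predictors."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical, moderately correlated predictors
print(round(vif_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 6]), 3))
```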

General Linear Model
Data:
– Movements of sperm whales (displacement per 12 hr) off the Galapagos Islands, with year, clan, and shit rate
Questions:
– Are movements of sperm whales affected by year, clan, shit rate, or combinations of them?

General Linear Model
Potential X variables:
– Year (categorical: 1987 and 1989)
– Clan (categorical: ‘Plus-one’ and ‘Regular’)
– Shit-rate (continuous, arcsine-square-root transformed)
– Year*Clan
– Year*Shit-rate
– Clan*Shit-rate

General Linear Model
X variables selected by stepwise selection (P-to-enter = 0.15; P-to-remove = 0.15), run both backward and forward, from the candidates: Year, Clan, Shit-rate, Year*Clan, Year*Shit-rate, Clan*Shit-rate

General Linear Model
– Backward: Y = c + Clan + Year*Clan
– Forward: Y = c + Shit-rate*Clan

General Linear Model
Why two “best models”?
– Backward: Y = c + Clan + Year*Clan
– Forward: Y = c + Shit-rate*Clan

General Linear Model
Which is “best”?
– Backward: Y = c + Clan + Year*Clan; r² = , d.f. =
– Forward: Y = c + Shit-rate*Clan; r² = , d.f. =
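One common way to weigh r² against d.f. when comparing models with different numbers of terms, not given on the slides and not necessarily the author's criterion, is the adjusted r², which penalizes each extra coefficient. A minimal Python sketch with hypothetical values, not the values from this analysis:

```python
def adjusted_r_squared(r2, n, df_error):
    """Adjusted r^2 = 1 - (1 - r^2) * (n - 1) / df_error.

    n is the number of observations and df_error the residual degrees of freedom
    (n minus the number of fitted coefficients, including the constant c).
    """
    return 1.0 - (1.0 - r2) * (n - 1) / df_error

# Hypothetical values for two competing models fitted to the same 30 observations
small = adjusted_r_squared(0.40, n=30, df_error=27)
large = adjusted_r_squared(0.42, n=30, df_error=24)
print(round(small, 3), round(large, 3))  # the higher raw r^2 loses after adjustment
```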

General Linear Models
– The response Y has a normal distribution with mean vector μ and variance σ² (SD σ).
– A coefficient vector (b = [β’s, γ’s, α’s]) defines a linear combination of the predictors (X’s).
– The model equates the two: μ = X·b

Generalized Linear Models
– The response Y has a distribution that may be normal, binomial, Poisson, gamma, or inverse Gaussian, with parameters including a mean μ.
– A coefficient vector (b = [β’s, γ’s, α’s]) defines a linear combination of the predictors (X’s).
– A link function f defines the link between the two: f(μ) = X·b
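Each distribution listed has a default (canonical) link. A minimal Python sketch of link and inverse-link pairs for those defaults; the dictionary and function names are hypothetical, and the canonical links for the gamma and inverse Gaussian are shown up to sign and scale, as statistical software usually implements them. Fitting works on the linear-predictor scale X·b, and the inverse link maps back to the mean μ.

```python
import math

# (link f, inverse link f^-1) pairs: f maps the mean mu to the linear predictor X·b
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),                  # normal
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),            # binomial
    "log": (math.log, math.exp),                                    # Poisson
    "inverse": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),        # gamma
    "inverse_squared": (lambda mu: 1.0 / mu ** 2,
                        lambda eta: 1.0 / math.sqrt(eta)),          # inverse Gaussian
}

def mean_from_linear_predictor(link, eta):
    """Apply the inverse link: mu = f^-1(X·b)."""
    return LINKS[link][1](eta)

print(mean_from_linear_predictor("logit", 0.0))  # → 0.5
```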

Generalized Linear Models
– Examine assumptions using residuals.
– Examine fit using the “deviance”:
  – a generalization of the residual sum of squares
  – twice the difference between the log-likelihoods of the full (saturated) model and the model in question
  – fits of different models can be compared
  – related to AIC
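For binomial counts (y successes out of n), the saturated model fits each observed proportion y/n exactly, so the deviance $D = 2(\log L_{\text{saturated}} - \log L_{\text{model}})$ takes the form coded below. AIC is $-2 \log L + 2k$ for a model with k parameters; for models fitted to the same data, $-2 \log L_{\text{model}} = D - 2 \log L_{\text{saturated}}$, so AIC differences equal deviance differences plus the 2k penalty. A minimal Python sketch; the function names are hypothetical.

```python
import math

def _xlogy(x, y):
    """x * log(y) with the convention 0 * log(anything) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def binomial_deviance(successes, trials, fitted_p):
    """D = 2 * sum[ y*log(y/(n*p)) + (n-y)*log((n-y)/(n - n*p)) ] for binomial counts, 0 < p < 1."""
    d = 0.0
    for y, n, p in zip(successes, trials, fitted_p):
        d += _xlogy(y, y / (n * p)) + _xlogy(n - y, (n - y) / (n - n * p))
    return 2.0 * d

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

# Hypothetical example: 5 of 10 observed where the model predicts p = 0.25
print(round(binomial_deviance([5], [10], [0.25]), 4))
```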

Generalized Linear Models can fit non-linear relationships using ‘link functions’ and can consider non-normal errors.
MATLAB: glmdemo

Proportion of sexually mature animals at different weights
MATLAB: glmdemo

Two problems with linear regression:
1) predicted probabilities < 0 and > 1
2) the relationship is clearly non-linear
MATLAB: glmdemo

Polynomial regression is better, but also has problems:
1) predicted probabilities < 0 and > 1
2) its inflections are not real
MATLAB: glmdemo

Instead, fit a “logistic regression” using a generalized linear model with a binomial distribution:
$Y = 1/(1 + e^{-(\beta_0 + \beta_1 X)})$
MATLAB: glmdemo
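A minimal Python sketch of fitting this logistic regression to binomial counts by Newton-Raphson, which for the canonical logit link is the same as iteratively reweighted least squares (Fisher scoring). It is written with the negative exponent, $p = 1/(1 + e^{-(\beta_0 + \beta_1 X)})$. The function name and data are hypothetical, not the glmdemo data, and a production fit would also check convergence and guard against complete separation, where the estimates diverge.

```python
import math

def fit_logistic(x, successes, trials, iterations=25):
    """Fit p = 1 / (1 + exp(-(b0 + b1*x))) to binomial counts by Newton-Raphson (= IRLS for logit)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for xi, yi, ni in zip(x, successes, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p          # contribution to the score (gradient of log L)
            w = ni * p * (1.0 - p)       # contribution to the Fisher information
            g0 += resid
            g1 += resid * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        # Newton step b <- b + I^{-1} g, using the closed-form inverse of the 2x2 information matrix
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    return b0, b1

# Hypothetical data: number mature out of 10 animals in each of five weight classes
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
mature = [1, 3, 5, 7, 9]
b0, b1 = fit_logistic(weights, mature, [10] * 5)
print(round(b0, 4), round(b1, 4))
```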

Compare two generalized linear models:
$Y = 1/(1 + e^{-(\beta_0 + \beta_1 X)})$
$Y = 1/(1 + e^{-(\beta_0 + \beta_1 X + \beta_2 X^2)})$
Difference in deviance = 0.70; P = 0.40
MATLAB: glmdemo
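For binomial models the scale is fixed, so the drop in deviance between two nested models is referred to a chi-square distribution with d.f. equal to the number of added parameters; adding the $X^2$ term adds one. A minimal Python sketch of the 1-d.f. tail probability, using the identity $P(Z^2 > x) = \operatorname{erfc}(\sqrt{x/2})$ for a standard normal Z, which reproduces the P = 0.40 above. The function name is hypothetical.

```python
import math

def chi2_sf_1df(x):
    """P(X > x) for a chi-square variable with 1 d.f., via P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

# Difference in deviance between the linear and quadratic logistic models, on 1 d.f.
print(round(chi2_sf_1df(0.70), 2))  # → 0.4
```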

Examine assumptions using residuals.
MATLAB: glmdemo
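Two residual types are commonly examined for a binomial generalized linear model: Pearson residuals, scaled by the binomial standard deviation, and deviance residuals, whose squares sum to the model deviance. A minimal Python sketch; the function names and example values are hypothetical. Plotted against fitted values or X, neither should show a systematic trend if the model is adequate.

```python
import math

def pearson_residual(y, n, p):
    """(y - n*p) / sqrt(n*p*(1-p)) for a binomial count y out of n with fitted probability p."""
    return (y - n * p) / math.sqrt(n * p * (1.0 - p))

def deviance_residual(y, n, p):
    """sign(y - n*p) * sqrt(d_i), where d_i is this observation's contribution to the binomial deviance."""
    d = 0.0
    if y > 0:
        d += y * math.log(y / (n * p))
    if y < n:
        d += (n - y) * math.log((n - y) / (n - n * p))
    return math.copysign(math.sqrt(max(2.0 * d, 0.0)), y - n * p)

print(round(pearson_residual(5, 10, 0.25), 3), round(deviance_residual(5, 10, 0.25), 3))
```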

Making predictions
MATLAB: glmdemo
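Once the logistic model is fitted, predictions come from the inverse link, and the model can also be inverted to find the X (here, weight) at which a given proportion is expected to be mature. A minimal Python sketch; the coefficients are hypothetical, not the glmdemo estimates, and the function names are illustrative.

```python
import math

def predict_probability(b0, b1, x):
    """Predicted probability 1 / (1 + exp(-(b0 + b1*x))) from a fitted logistic regression."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def x_at_probability(b0, b1, p):
    """Invert the model: the X at which the predicted probability equals p (b1 must be non-zero)."""
    return (math.log(p / (1.0 - p)) - b0) / b1

# Hypothetical coefficients
b0, b1 = -4.0, 0.1
print(predict_probability(b0, b1, 40.0))  # → 0.5
print(x_at_probability(b0, b1, 0.5))      # weight at which half are predicted mature
```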

Non-linear models, e.g.:
$Y = c + \exp(\beta_0 + \beta_1 X) + E$
$Y = \beta_0 + \beta_1 X \cdot [X > X_K] + E$
– More general than generalized linear models
– But harder to fit:
  – iterative process
  – may not converge
  – non-unique solutions
  – harder to compare
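For the threshold model, a fixed $X_K$ makes the model linear in $\beta_0$ and $\beta_1$, so one simple strategy (swapped in here for a general iterative optimizer) is a grid search over candidate thresholds, fitting each by least squares and keeping the smallest residual sum of squares. A minimal Python sketch with hypothetical names and made-up data; it also shows the non-uniqueness mentioned above, since every $X_K$ in [4, 5) gives exactly the same fit to these data.

```python
def fit_threshold_model(x, y):
    """Fit Y = b0 + b1 * X * [X > X_K] by grid search over X_K (candidates: observed X values).

    For each fixed X_K the model is linear, so b0 and b1 come from simple least squares on
    u = X * [X > X_K]; the candidate with the smallest residual sum of squares is kept.
    Returns (rss, X_K, b0, b1).
    """
    best = None
    for xk in sorted(set(x)):
        u = [xi if xi > xk else 0.0 for xi in x]
        mu, my = sum(u) / len(u), sum(y) / len(y)
        suu = sum((ui - mu) ** 2 for ui in u)
        if suu == 0.0:  # no observations above X_K: b1 is not identifiable
            continue
        b1 = sum((ui - mu) * (yi - my) for ui, yi in zip(u, y)) / suu
        b0 = my - b1 * mu
        rss = sum((yi - b0 - b1 * ui) ** 2 for ui, yi in zip(u, y))
        if best is None or rss < best[0]:
            best = (rss, xk, b0, b1)
    return best

# Hypothetical noise-free data: Y = 2 below the threshold, Y = 2 + 0.5*X above it
x = [float(i) for i in range(10)]
y = [2.0 + (0.5 * xi if xi > 4.5 else 0.0) for xi in x]
rss, xk, b0, b1 = fit_threshold_model(x, y)
print(xk, round(b0, 6), round(b1, 6))
```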

Summary: Methods with One Dependent Variable, in order of increasing complexity
– Simple Linear Regression
– One-way ANOVA
– Multiple Linear Regression
– Multi-way ANOVA
– Analysis of Covariance
– General Linear Model
– Generalized Linear Model
– Non-Linear Model