Introduction to Generalized Linear Model (GLM)
Man Li, Research Fellow, International Food Policy Research Institute

Technical Training for Modeling Scenarios for Low Emission Development Strategies, September 9–20, 2013

What is GLM?
In statistics, the GLM is a flexible generalization of ordinary linear (OL) regression that allows the response variable Y to follow a distribution other than the normal distribution. The GLM generalizes linear regression by relating the linear model to Y through a link function: $E(Y) = \mu = g^{-1}(X\beta)$, where $g$ is the link function such that $g(\mu) = X\beta$.
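A small sketch, not taken from the slides: base R's `family` objects store the link $g$ as `linkfun` and its inverse $g^{-1}$ as `linkinv`, so the round trip $\mu \to X\beta \to \mu$ can be checked directly. The mean values used here are arbitrary illustrations, not data.

```r
# Base R family objects carry a link function g and its inverse g^{-1}.
logit_family  <- binomial()   # default (canonical) link: logit
normal_family <- gaussian()   # default (canonical) link: identity

mu  <- c(0.1, 0.5, 0.9)               # illustrative means E(Y), not real data
eta <- logit_family$linkfun(mu)       # g(mu) = log(mu / (1 - mu)), i.e. the linear predictor X beta
mu_back <- logit_family$linkinv(eta)  # g^{-1}(X beta) maps back to E(Y)

print(eta)                        # log-odds; exactly 0 when mu = 0.5
print(mu_back)                    # recovers mu
print(normal_family$linkfun(mu))  # identity link: g(mu) = mu
```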

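Common distributions with typical uses and canonical link functions

| Distribution | Support of distribution | Typical uses | Link name | Link function | Mean function |
|---|---|---|---|---|---|
| Normal | Real: $(-\infty, +\infty)$ | Linear-response data | Identity | $X\beta = \mu$ | $\mu = X\beta$ |
| Bernoulli | Integer: $\{0, 1\}$ | Outcome of a single yes/no occurrence | Logit | $X\beta = \log\left(\frac{\mu}{1-\mu}\right)$ | $\mu = \frac{\exp(X\beta)}{1 + \exp(X\beta)}$ |
| Binomial | Integer: $\{0, \dots, N\}$ | Count of "yes" occurrences out of $N$ yes/no occurrences | Logit (as for Bernoulli) | as above | as above |
| Categorical | $K$-vector of integers: $\{0, 1\}$ | Outcome of a single $K$-way occurrence | Similar, but a bit more complicated | | |
| Multinomial | $K$-vector of integers: $\{0, \dots, N\}$ | Count of occurrences of types $1, \dots, K$ out of $N$ total $K$-way occurrences | As for Categorical | | |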
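Two rows of the table can be checked in R, as a sketch. It uses the built-in `mtcars` data (an illustrative choice, not from the slides): a normal GLM with identity link reproduces ordinary least squares, and a logit fit to individual 0/1 outcomes (Bernoulli) gives the same coefficients as a logit fit to grouped yes/no counts (Binomial).

```r
data(mtcars)

# Normal distribution + identity link: glm() reproduces ordinary least squares.
ols        <- lm(mpg ~ wt, data = mtcars)
glm_normal <- glm(mpg ~ wt, family = gaussian(link = "identity"), data = mtcars)
print(cbind(lm = coef(ols), glm = coef(glm_normal)))

# Bernoulli: one 0/1 outcome per car (am = 1 for manual transmission).
bern <- glm(am ~ factor(cyl), family = binomial(link = "logit"), data = mtcars)

# Binomial: the same information as "yes" counts out of N cars per cylinder group.
yes <- tapply(mtcars$am, mtcars$cyl, sum)     # number of am == 1 in each group
n   <- tapply(mtcars$am, mtcars$cyl, length)  # group sizes N
grouped <- data.frame(cyl = factor(names(yes)),
                      yes = as.vector(yes),
                      no  = as.vector(n - yes))
binom <- glm(cbind(yes, no) ~ cyl, family = binomial(link = "logit"), data = grouped)

print(cbind(bernoulli = coef(bern), binomial = coef(binom)))
```

The grouped form is convenient when only tabulated counts are available; both forms maximize the same likelihood up to a constant.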

Logit Regression for Binary Responses
Example: Survival and gender in the Donner party, an observational study
In 1846 the Donner families left Springfield, Illinois, for California by covered wagon. When they reached Fort Bridger, Wyoming, in July, the Donner party decided to attempt a new and untested route to the Sacramento Valley. Having reached its full size of 87 people and 20 wagons, the party was delayed in the difficult crossing of the Wasatch Range and again in the crossing of the desert west of the Great Salt Lake. The group became stranded in the eastern Sierra Nevada mountains when hit by heavy snows in late October. By the time the last survivor was rescued on 21 April 1847, 40 of the 87 members had died from famine and exposure to extreme cold.

Example: Donner Party Deaths
These data were used to study the theory that females are better able to withstand harsh conditions than males.

| AGE | SEX | STATUS |
|---|---|---|
| 23 | MALE | DIED |
| 40 | FEMALE | SURVIVED |
| 40 | MALE | SURVIVED |
| 30 | MALE | DIED |
| 28 | MALE | DIED |
| 40 | MALE | DIED |
| 45 | FEMALE | DIED |
| 62 | MALE | DIED |
| 65 | MALE | DIED |
| 45 | FEMALE | DIED |
| 25 | FEMALE | DIED |
| 28 | MALE | SURVIVED |
| 28 | MALE | DIED |
| 23 | MALE | DIED |
| 22 | FEMALE | SURVIVED |
| 23 | FEMALE | SURVIVED |
| 28 | MALE | SURVIVED |
| 15 | FEMALE | SURVIVED |
| 47 | FEMALE | DIED |
| 57 | MALE | DIED |
| 20 | FEMALE | SURVIVED |
| … | … | … |

Ages and sexes of the adults (15 years and older) in the party
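As a first descriptive look, the sketch below enters only the 21 rows listed above by hand (the full adult data set is longer, so the counts are for this excerpt only) and cross-tabulates status by sex.

```r
# The 21 rows shown on the slide, entered by hand: an excerpt, not the full data set.
donner <- data.frame(
  age    = c(23, 40, 40, 30, 28, 40, 45, 62, 65, 45, 25, 28, 28, 23, 22, 23, 28, 15, 47, 57, 20),
  sex    = c("MALE", "FEMALE", "MALE", "MALE", "MALE", "MALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE",
             "MALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE"),
  status = c("DIED", "SURVIVED", "SURVIVED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED",
             "SURVIVED", "DIED", "DIED", "SURVIVED", "SURVIVED", "SURVIVED", "SURVIVED", "DIED", "DIED", "SURVIVED")
)

tab <- table(donner$sex, donner$status)   # counts of DIED / SURVIVED within each sex
print(tab)
print(prop.table(tab, margin = 1))        # row proportions: share surviving within each sex
```

A raw cross-tabulation ignores age, which is why the slides go on to model survival as a function of both age and sex.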

Example: Donner Party Deaths
Question: For a given age, were women more likely to survive than men?
If linear model:
– $E(Y_i \mid X_i) = X_i\beta$ (observations i.i.d.)
– Y = 1 if survived, 0 if died
– X = (age, sex)

Ordinary Linear Regression
Fitting model: $\hat{Y} = \ldots - 0.013\,\text{age} + \ldots\, I[\text{sex} = \text{female}]$

Ordinary Linear Regression with Interaction Term
Fitting model: $\hat{Y} = \ldots - 0.006\,\text{age} + \ldots\, I[\text{sex} = \text{female}] - 0.025\,\text{age} \cdot I[\text{sex} = \text{female}]$
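A sketch of the two ordinary linear (linear probability) fits above, run on the 21 excerpt rows from the data slide only. Because the full data set is larger, the coefficients will not match the slides. The indicator name `female` is an assumption for illustration.

```r
# Excerpt of the Donner data (21 rows from the data slide), entered by hand.
donner <- data.frame(
  age    = c(23, 40, 40, 30, 28, 40, 45, 62, 65, 45, 25, 28, 28, 23, 22, 23, 28, 15, 47, 57, 20),
  sex    = c("MALE", "FEMALE", "MALE", "MALE", "MALE", "MALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE",
             "MALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE"),
  status = c("DIED", "SURVIVED", "SURVIVED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED",
             "SURVIVED", "DIED", "DIED", "SURVIVED", "SURVIVED", "SURVIVED", "SURVIVED", "DIED", "DIED", "SURVIVED")
)
donner$survival <- as.integer(donner$status == "SURVIVED")  # Y = 1 if survived, 0 if died
donner$female   <- as.integer(donner$sex == "FEMALE")       # hypothetical name for I[sex = female]

# Additive model: Y = b0 + b1*age + b2*I[female]
lpm_add <- lm(survival ~ age + female, data = donner)
# Interaction model: adds b3*age*I[female], so the age slope can differ by sex
lpm_int <- lm(survival ~ age + female + age:female, data = donner)

print(coef(lpm_add))
print(coef(lpm_int))
print(range(fitted(lpm_int)))  # nothing in lm() keeps these fitted "probabilities" inside [0, 1]
```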

Logit Regression
Model:
– $Y_i \mid X_i \sim \text{Bin}(1, \pi_i)$ (independent)
– $g(\pi_i) = \log\left(\pi_i / (1 - \pi_i)\right) = X_i\beta$
– Y = 1 if survived, 0 if died
– X = (age, sex)
– Null model: log odds of survival $= \beta_0 + \beta_1\,\text{age} + \beta_2\, I[\text{sex} = \text{female}]$
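A sketch of fitting the null (H0) logit model with `glm()` and `family = binomial`. It again uses only the 21 excerpt rows, so the estimates are illustrative and will differ from the full-data results; `female` is a hypothetical indicator name.

```r
# Excerpt of the Donner data (21 rows from the data slide), entered by hand.
donner <- data.frame(
  age    = c(23, 40, 40, 30, 28, 40, 45, 62, 65, 45, 25, 28, 28, 23, 22, 23, 28, 15, 47, 57, 20),
  sex    = c("MALE", "FEMALE", "MALE", "MALE", "MALE", "MALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE",
             "MALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE"),
  status = c("DIED", "SURVIVED", "SURVIVED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED",
             "SURVIVED", "DIED", "DIED", "SURVIVED", "SURVIVED", "SURVIVED", "SURVIVED", "DIED", "DIED", "SURVIVED")
)
donner$survival <- as.integer(donner$status == "SURVIVED")
donner$female   <- as.integer(donner$sex == "FEMALE")

# log(pi / (1 - pi)) = b0 + b1*age + b2*I[female], with Y ~ Bin(1, pi)
fit_h0 <- glm(survival ~ age + female, family = binomial(link = "logit"), data = donner)

print(summary(fit_h0)$coefficients)  # estimates, standard errors, Wald z values, p-values
print(exp(coef(fit_h0)))             # coefficients on the odds-ratio scale
```

Unlike the linear fits, every fitted value here is a probability strictly between 0 and 1, because it passes through the inverse logit.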

Possible problems
– The logit may not be a straight-line function of age
  – Test a quadratic age term separately for males and females (Wald test): X = (age, agesq)
– The age slopes may not be the same for males and females
  – Test the significance of an interaction term (Wald test): X = (age, sex, age*sex)
  – Alternative to the Wald test: likelihood ratio test
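A sketch of both checks, on the 21 excerpt rows only (so the p-values say nothing about the full data). The Wald test is the z test that `summary()` reports for a single coefficient; the likelihood ratio test compares the nested fits with `anova(..., test = "LRT")`.

```r
# Excerpt of the Donner data (21 rows from the data slide), entered by hand.
donner <- data.frame(
  age    = c(23, 40, 40, 30, 28, 40, 45, 62, 65, 45, 25, 28, 28, 23, 22, 23, 28, 15, 47, 57, 20),
  sex    = c("MALE", "FEMALE", "MALE", "MALE", "MALE", "MALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE",
             "MALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE"),
  status = c("DIED", "SURVIVED", "SURVIVED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED",
             "SURVIVED", "DIED", "DIED", "SURVIVED", "SURVIVED", "SURVIVED", "SURVIVED", "DIED", "DIED", "SURVIVED")
)
donner$survival <- as.integer(donner$status == "SURVIVED")
donner$female   <- as.integer(donner$sex == "FEMALE")

# Problem 1: is the logit linear in age? Fit age + age^2 separately by sex;
# the Wald test is the z test on the I(age^2) coefficient.
for (g in c("MALE", "FEMALE")) {
  fit_q <- glm(survival ~ age + I(age^2), family = binomial, data = donner[donner$sex == g, ])
  cat(g, "Wald p-value for age^2:", summary(fit_q)$coefficients["I(age^2)", "Pr(>|z|)"], "\n")
}

# Problem 2: do the age slopes differ by sex? Add the age:female interaction.
fit_h0 <- glm(survival ~ age + female, family = binomial, data = donner)
fit_h1 <- glm(survival ~ age + female + age:female, family = binomial, data = donner)

wald_p <- summary(fit_h1)$coefficients["age:female", "Pr(>|z|)"]  # Wald test
lrt    <- anova(fit_h0, fit_h1, test = "LRT")                       # likelihood ratio test
print(wald_p)
print(lrt)
```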

Exercise
– Open the R program code located at ftp://ftp.cgiar.org/ifpri/leds2013sep/GLM/GLM_code.R
– Load the data set named "donner"
– Define the indicator variables "survival" and "sex"
– Draw a scatterplot of survival vs. age by gender
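A possible sketch of these steps. The course's `donner` data file is not reproduced here, so the block enters the 21 excerpt rows from the data slide by hand; the indicator names `survival` and `female`, the colors, and the jitter are illustrative choices and may differ from the course script.

```r
# Stand-in for loading "donner": the 21 excerpt rows from the data slide.
donner <- data.frame(
  age    = c(23, 40, 40, 30, 28, 40, 45, 62, 65, 45, 25, 28, 28, 23, 22, 23, 28, 15, 47, 57, 20),
  sex    = c("MALE", "FEMALE", "MALE", "MALE", "MALE", "MALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE",
             "MALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE"),
  status = c("DIED", "SURVIVED", "SURVIVED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED",
             "SURVIVED", "DIED", "DIED", "SURVIVED", "SURVIVED", "SURVIVED", "SURVIVED", "DIED", "DIED", "SURVIVED")
)

# Indicator variables
donner$survival <- as.integer(donner$status == "SURVIVED")  # 1 = survived, 0 = died
donner$female   <- as.integer(donner$sex == "FEMALE")       # 1 = female, 0 = male

# Scatterplot of survival vs. age, colored by gender; a little vertical jitter
# keeps overlapping 0/1 points visible.
set.seed(1)
plot(donner$age, jitter(donner$survival, amount = 0.03),
     col = ifelse(donner$female == 1, "red", "blue"), pch = 19,
     xlab = "Age", ylab = "Survival (1 = survived, 0 = died)",
     main = "Donner party excerpt: survival vs. age by gender")
legend("right", legend = c("Female", "Male"), col = c("red", "blue"), pch = 19)
```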

Exercise
– Estimate the null model; examine the sign and the p-value of the age and sex variables
– Test for the quadratic term of age by gender group
– Test for the interaction of sex and age
– Draw two fitting plots: the null model and the model with the interaction term
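A sketch of the two fitting plots: fitted survival probability against age, one curve per sex, for the null model and for the interaction model. It runs on the 21 excerpt rows only, so the curves are illustrative; the helper `draw_fit()` is a hypothetical name, not part of the course code.

```r
# Excerpt of the Donner data (21 rows from the data slide), entered by hand.
donner <- data.frame(
  age    = c(23, 40, 40, 30, 28, 40, 45, 62, 65, 45, 25, 28, 28, 23, 22, 23, 28, 15, 47, 57, 20),
  sex    = c("MALE", "FEMALE", "MALE", "MALE", "MALE", "MALE", "FEMALE", "MALE", "MALE", "FEMALE", "FEMALE",
             "MALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE", "FEMALE", "MALE", "FEMALE"),
  status = c("DIED", "SURVIVED", "SURVIVED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED", "DIED",
             "SURVIVED", "DIED", "DIED", "SURVIVED", "SURVIVED", "SURVIVED", "SURVIVED", "DIED", "DIED", "SURVIVED")
)
donner$survival <- as.integer(donner$status == "SURVIVED")
donner$female   <- as.integer(donner$sex == "FEMALE")

fit_h0 <- glm(survival ~ age + female, family = binomial, data = donner)  # null model
fit_h1 <- glm(survival ~ age * female, family = binomial, data = donner)  # adds age:female

# Hypothetical helper: plot the data and the fitted curves for women (red) and men (blue).
draw_fit <- function(fit, title) {
  ages <- 15:65
  p_f <- predict(fit, newdata = data.frame(age = ages, female = 1), type = "response")
  p_m <- predict(fit, newdata = data.frame(age = ages, female = 0), type = "response")
  plot(donner$age, donner$survival, pch = 19, ylim = c(0, 1),
       col = ifelse(donner$female == 1, "red", "blue"),
       xlab = "Age", ylab = "Pr(survival)", main = title)
  lines(ages, p_f, col = "red")
  lines(ages, p_m, col = "blue")
  invisible(cbind(female = p_f, male = p_m))  # return the fitted curves for inspection
}

op <- par(mfrow = c(1, 2))  # two panels side by side
curves_h0 <- draw_fit(fit_h0, "Null model")
curves_h1 <- draw_fit(fit_h1, "Model with age x sex interaction")
par(op)
```

In the null model the two curves are horizontal shifts of one another on the logit scale; the interaction model lets them have different slopes.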

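What do the results look like?
$H_0$ model: log odds of survival $= \ldots\,\text{age} + 1.597\, I[\text{sex} = \text{female}]$
$H_1$ model: log odds of survival $= \ldots\,\text{age} + 6.928\, I[\text{sex} = \text{female}] - 0.025\,\text{age} \cdot I[\text{sex} = \text{female}]$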
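The female coefficients can be read as odds ratios. In the $H_0$ model, a woman's estimated odds of survival are $e^{1.597} \approx 4.94$ times those of a man of the same age. In the $H_1$ model, taking the printed coefficients at face value, the female vs. male log odds ratio depends on age as $6.928 - 0.025\,\text{age}$. A small sketch of that arithmetic (the helper name is hypothetical):

```r
# Coefficients copied from the results slide; intercepts and age terms are not needed here.
b_female_h0  <- 1.597
or_female_h0 <- exp(b_female_h0)   # odds ratio, women vs. men, at any fixed age
print(round(or_female_h0, 2))

# H1: the female-vs-male log odds ratio varies with age (hypothetical helper name)
female_log_or_h1 <- function(age) 6.928 - 0.025 * age
print(exp(female_log_or_h1(c(20, 40, 60))))  # odds ratios at ages 20, 40, 60
```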

Logit Regression for Multiple Responses
$Y_i \mid X_i \sim \text{Mult}(m_i;\ \pi_{1i}, \pi_{2i}, \ldots, \pi_{Ki})$, $\sum_k \pi_{ki} = 1$
$Y \in \{1, 2, \ldots, K\}$ (K-category response)
There are K-1 logit models:
$\log(\pi_{1i} / \pi_{Ki}) = X_i\beta_1$
$\log(\pi_{2i} / \pi_{Ki}) = X_i\beta_2$
…
$\log(\pi_{K-1,i} / \pi_{Ki}) = X_i\beta_{K-1}$
Note: $\beta_K$ is normalized to 0.
Rewriting the probabilities:
$\Pr(Y_i = 1) = \exp(X_i\beta_1) / \sum_k \exp(X_i\beta_k)$
$\Pr(Y_i = 2) = \exp(X_i\beta_2) / \sum_k \exp(X_i\beta_k)$
…
$\Pr(Y_i = K-1) = \exp(X_i\beta_{K-1}) / \sum_k \exp(X_i\beta_k)$
$\Pr(Y_i = K) = \exp(X_i\beta_K) / \sum_k \exp(X_i\beta_k) = 1 / \sum_k \exp(X_i\beta_k)$
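A sketch of the probability formulas above for K = 3, with category K as the baseline ($\beta_K = 0$). The covariate vector and coefficients are hypothetical numbers chosen only for illustration; the last line checks that the log ratios $\log(\pi_k / \pi_K)$ recover the linear predictors $X\beta_k$.

```r
# Multinomial logit probabilities with beta_K fixed at 0 (baseline = last category).
multinomial_probs <- function(x, B) {
  # x: covariate vector (including intercept); B: p x (K-1) matrix of beta_1, ..., beta_{K-1}
  eta <- c(drop(x %*% B), 0)   # linear predictors X beta_k, with X beta_K = 0
  exp(eta) / sum(exp(eta))     # Pr(Y = k) = exp(X beta_k) / sum_j exp(X beta_j)
}

x <- c(1, 2.0)                           # hypothetical: intercept and one covariate
B <- cbind(c(0.5, -0.2), c(-1.0, 0.3))   # hypothetical beta_1 and beta_2 (K = 3)

p <- multinomial_probs(x, B)
print(p)                       # three probabilities summing to 1
print(log(p[1:2] / p[3]))      # equals x %*% B, i.e. the K-1 logit equations
```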

R Code: multinom() function
library(nnet)
count.matrix <- cbind(Y1, Y2, …, YK)
fit <- multinom(count.matrix ~ X1 + X2 + …, data = …, Hess = TRUE)
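A runnable sketch of the count-matrix form of `multinom()`, using small hypothetical grouped counts (not real data) for a 3-category response observed at four values of one covariate `x`. Note that `multinom()` treats the first category (here `Y1`) as the baseline, whereas the equations above normalize $\beta_K = 0$ for the last one; the fitted probabilities do not depend on that choice. `trace = FALSE` only silences the optimizer's progress messages.

```r
library(nnet)  # recommended package shipped with standard R installations

# Hypothetical grouped data: counts of each response category at each value of x
counts <- data.frame(
  x  = c(0, 1, 2, 3),
  Y1 = c(10, 8, 5, 3),
  Y2 = c(5, 6, 7, 8),
  Y3 = c(2, 4, 8, 12)
)

# A matrix response is interpreted as counts for each of the K classes
fit <- multinom(cbind(Y1, Y2, Y3) ~ x, data = counts, Hess = TRUE, trace = FALSE)

print(summary(fit))  # coefficients and standard errors (from the Hessian) for Y2, Y3 vs. Y1
probs <- predict(fit, newdata = counts, type = "probs")
print(probs)         # fitted category probabilities; each row sums to 1
```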

Some Extensions
– Conditional logit: $X_{ik}$ is specific to alternative (choice) $k$, but $\beta$ does not vary across choices, i.e., the index is $X_{ik}\beta$
– Nested logit: can be decomposed into two standard logits
– Mixed logit: integrals of standard logit probabilities over a density of the parameters $\beta$
See Train (2003), Discrete Choice Methods with Simulation, for more discussion.
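A sketch of the conditional and mixed logit ideas under stated assumptions: three alternatives with hypothetical attributes, one shared $\beta$ for the conditional logit, and, for the mixed logit, independent normal draws of $\beta$ (the normal mixing density and its parameters are assumptions). The mixed-logit probability is approximated by averaging conditional-logit probabilities over the draws, which is the simulation approach Train (2003) discusses.

```r
# Hypothetical attributes: rows = 3 alternatives, columns = 2 attributes
X <- rbind(c(1.0, 0.5),
           c(0.5, 1.0),
           c(0.0, 0.0))

# Conditional logit: one beta shared across alternatives; index X_k beta per alternative
clogit_probs <- function(X, beta) {
  v <- drop(X %*% beta)
  exp(v - max(v)) / sum(exp(v - max(v)))  # subtract max(v) for numerical stability
}
p_conditional <- clogit_probs(X, beta = c(0.8, -0.3))

# Mixed logit: average conditional-logit probabilities over draws of beta
# (assumed mixing density: independent normals with means 0.8, -0.3 and sd 0.5)
set.seed(1)
R <- 5000
draws <- cbind(rnorm(R, mean = 0.8, sd = 0.5), rnorm(R, mean = -0.3, sd = 0.5))
p_mixed <- rowMeans(apply(draws, 1, function(b) clogit_probs(X, b)))

print(p_conditional)
print(p_mixed)
```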