Chapter 17.1 Poisson Regression
Classic Poisson example: number of deaths by horse kick, for each of 16 corps in the Prussian army, from 1875 to 1894.
Did the risk of death show a trend across years for the Guard corps?
1. Construct Model – Graphical
1. Construct Model - Formal
Write the general linear model: deaths = β_0 + β_year·year + ε
The general linear model is inappropriate for count data:
– Variance likely increases with the mean
– Fitted values may be negative
– Errors tend not to be normally distributed
– Zeros are difficult to handle with transformations (e.g. log(0) is undefined)
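A small R illustration of the first two objections. The Poisson means and the declining counts below are arbitrary values chosen for illustration, not data from the horse-kick example.

```r
# Illustration only: arbitrary Poisson means, not the horse-kick data
set.seed(42)
means <- c(0.5, 2, 8)
sample_var <- sapply(means, function(m) var(rpois(10000, lambda = m)))
rbind(mean = means, variance = round(sample_var, 2))  # variance tracks the mean

# A straight line fitted to hypothetical declining counts
x <- 1:10
y <- c(6, 5, 3, 2, 1, 0, 0, 0, 0, 0)
lin_fit <- lm(y ~ x)
fitted(lin_fit)[9:10]   # negative "counts" predicted at the right-hand end
```

The simulated variances grow with the mean (non-constant variance), and the least-squares line predicts impossible negative counts for the last two years.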
1. Construct Model - Formal
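Written out, the Poisson regression fitted by glm1 (family = poisson, log link) can be sketched as:

```latex
\text{deaths}_i \sim \text{Poisson}(\mu_i), \qquad
\log(\mu_i) = \beta_0 + \beta_{\text{year}}\,\text{year}_i, \qquad
\operatorname{E}(\text{deaths}_i) = \operatorname{Var}(\text{deaths}_i) = \mu_i
```

The log link keeps every fitted mean positive, and the Poisson variance function lets the variance grow with the mean, which addresses the first two objections to the general linear model.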
2. Execute analysis & 3. Evaluate model
glm1 <- glm(deaths ~ year, family = poisson(link = log), data = horsekick)
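Because of the log link, the fitted values of a model like glm1 are the exponentiated linear predictor. The sketch below checks this on simulated counts; the data frame sim is a hypothetical stand-in for horsekick, since the real counts are not listed here.

```r
# Hypothetical stand-in for the horsekick data: 20 years of simulated counts
set.seed(1)
sim <- data.frame(year = 75:94, deaths = rpois(20, lambda = 0.8))

fit <- glm(deaths ~ year, family = poisson(link = log), data = sim)

# Linear predictor on the log scale: b0 + b_year * year
eta <- coef(fit)[1] + coef(fit)[2] * sim$year

# Back-transforming with exp() reproduces the fitted mean counts
all.equal(unname(fitted(fit)), unname(exp(eta)))   # TRUE

# Multiplicative change in expected deaths per one-year increase
exp(coef(fit)[2])
```

On the original scale, β_year therefore acts multiplicatively: exp(β_year) is the ratio of expected deaths in one year to the previous year.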
4. State the population and whether the sample is representative.
5. Decide on the mode of inference. Is hypothesis testing appropriate?
6. State the H_A / H_0 pair and the tolerance for Type I error.
   Statistic: ΔG (change in deviance)
   Distribution: χ² (chi-square)
7. ANODEV. Calculate the change in fit (ΔG) due to the explanatory variables.
– The F-statistic is not used for models with non-normal errors.
– Instead we assess the improvement in fit by analysis of deviance (ANODEV).
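For reference, the standard Poisson deviance and the ANODEV statistic, as a short sketch (y_i is the death count in year i, μ̂_i its fitted mean, and y_i log(y_i/μ̂_i) is taken as 0 when y_i = 0):

```latex
G = 2\sum_{i=1}^{n}\left[\,y_i\log\frac{y_i}{\hat\mu_i} - \left(y_i - \hat\mu_i\right)\right]

\Delta G = G_{\text{intercept only}} - G_{\text{year}}
\;\overset{\text{approx.}}{\sim}\; \chi^2_{1}
\quad\text{under } H_0\colon \beta_{\text{year}} = 0
```

With an intercept and a log link, Σ(y_i − μ̂_i) = 0, so only the first term contributes to G.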
> anova(glm1, test="Chisq")
Analysis of Deviance Table

Model: poisson, link: log

Response: deaths

Terms added sequentially (first to last)

     Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL                    19
year  1    0.611        18              0.4343
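The year row of the ANODEV table can be reproduced by hand: ΔG is the drop in deviance from the intercept-only model to the model with year, and its p-value is the upper tail of χ² with 1 df. A sketch on simulated counts (hypothetical data, not the Guard corps series):

```r
# Hypothetical simulated counts (not the horsekick data)
set.seed(2)
sim <- data.frame(year = 75:94, deaths = rpois(20, lambda = 0.8))

fit0 <- glm(deaths ~ 1,    family = poisson(link = log), data = sim)
fit1 <- glm(deaths ~ year, family = poisson(link = log), data = sim)

# Change in fit: deviance of the intercept-only model minus the year model
delta_G <- deviance(fit0) - deviance(fit1)
df_diff <- df.residual(fit0) - df.residual(fit1)   # 1 for a single term
p_value <- pchisq(delta_G, df = df_diff, lower.tail = FALSE)

# Compare with the 'year' row of the sequential ANODEV table
tab <- anova(fit1, test = "Chisq")
c(delta_G = delta_G, p = p_value)
tab["year", c("Deviance", "Pr(>Chi)")]
```

Because the Poisson dispersion is fixed at 1, no scale parameter needs to be estimated, which is why a χ² reference distribution replaces the F-distribution.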
8. Assess table in view of evaluation of residuals.
– Residuals acceptable.
9. Decide on H_A / H_0.
– Do not reject H_0 (no support for H_A): there was no apparent trend in deaths by horse kick over two decades (ΔG = 0.611, p = 0.4343).
10. Analysis of parameters of biological interest.
– β_year was not significant, so report the mean deaths per year: 16 deaths / 20 years = 0.8 deaths/year.
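When β_year is dropped, the intercept-only Poisson model (glm0 in the R script) estimates exp(β_0) as the sample mean count, so reporting 0.8 deaths/year is equivalent to reporting exp(β̂_0). A check on hypothetical yearly counts (illustration only, not the Guard corps data):

```r
# Hypothetical yearly counts (illustration only, not the Guard corps data)
deaths <- c(0, 2, 0, 1, 0, 0, 1, 3, 0, 1)

glm0 <- glm(deaths ~ 1, family = poisson(link = log))  # intercept only

# The back-transformed intercept equals the mean count per year
exp(coef(glm0))   # same value as mean(deaths)
mean(deaths)      # 8 deaths / 10 years = 0.8
```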
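library(pscl)    # provides the prussian horse-kick data
library(Hmisc)   # provides Lag()

head(prussian)   # inspect the data
horsekick <- subset(prussian, corp == "G")   # Guard corps only
names(horsekick) <- c("deaths", "year", "corps")

# Intercept-only model and model with a year trend
glm0 <- glm(deaths ~ 1, family = poisson(link = log), data = horsekick)   # intercept only
glm1 <- glm(deaths ~ year, family = poisson(link = log), data = horsekick)

# Residual checks: residuals vs fitted, and residuals vs lagged residuals
plot(glm1, which = 1, add.smooth = FALSE, pch = 16)
plot(glm1$residuals, Lag(glm1$residuals), xlab = "Residuals", ylab = "Lagged residuals", pch = 16)

# Data with fitted curves (year is coded 75 to 94)
plot(deaths ~ year, data = horsekick, pch = 16, axes = FALSE, xlab = "Year", ylab = "Deaths (Guard corps)")
axis(1, at = 75:94, labels = 1875:1894)
axis(2, at = 0:3)
box()
lines(horsekick$year, fitted(glm1))            # with regression term
lines(horsekick$year, fitted(glm0), lty = 2)   # intercept only

# Analysis of deviance (ANODEV)
anova(glm1, test = "Chisq")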
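If Hmisc is not available, a lagged copy of the residuals can be built in base R: Hmisc's Lag(x) shifts a numeric vector one position later and pads with NA. The helper lag1() below is hypothetical, and the residual values are illustrative; the lag-1 correlation is a rough one-number summary of the lag plot, not a formal test of serial dependence.

```r
# Hypothetical base-R helper: shift a vector one position later, padding with NA
lag1 <- function(x) c(NA, head(x, -1))

r <- c(0.5, -0.2, 0.1, -0.4, 0.3)   # illustrative residuals
lag1(r)                             # c(NA, 0.5, -0.2, 0.1, -0.4)

# Lag-1 correlation of residuals, dropping the incomplete first pair
cor(r, lag1(r), use = "complete.obs")
```

With the fitted model, r would be replaced by the residuals of glm1.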