Published by James Webster. Modified over 8 years ago.
1
Generalized Linear Models (GLM)
Based on: Takuya Kubo (2012), "データ解析のための統計モデリング入門" (Introduction to Statistical Modeling for Data Analysis), Iwanami Shoten.
2
Generalized Linear Models

Linear Model
    response variable ~ intercept + slope * explanatory variable
    lm(y ~ x + f + ...)
    lm(y ~ x + f - 1)    (no intercept)

Example (the standard example from the R help page for lm()):

    require(graphics)

    ## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
    ## Page 9: Plant Weight Data.
    ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
    trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
    group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
    weight <- c(ctl, trt)
    lm.D9 <- lm(weight ~ group)
    lm.D90 <- lm(weight ~ group - 1)   # omitting intercept

    anova(lm.D9)
    summary(lm.D90)

    opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0))
    plot(lm.D9, las = 1)   # Residuals, Fitted, ...
    par(opar)              # restore the previous graphics settings
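The two lm() calls above fit the same model under different parameterizations. The following is a minimal sketch, reusing the plant weight data from this slide, of how the coefficients relate. The names with_int, no_int and group_means are introduced here for illustration only.

```r
# Plant weight data (Dobson 1990, p. 9), as on the slide
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group  <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)

# With intercept: (Intercept) = control mean, groupTrt = treatment minus control
with_int <- coef(lm(weight ~ group))

# Without intercept: one coefficient per group, each equal to that group's mean
no_int <- coef(lm(weight ~ group - 1))

group_means <- tapply(weight, group, mean)

print(with_int)
print(no_int)
print(group_means)
```

Dropping the intercept changes how the coefficients are read, not the fitted values: the control mean plus the groupTrt coefficient of the first model equals the groupTrt coefficient of the second.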
3
Generalized Linear Models

Linear Model
    response variable ~ intercept + slope * explanatory variable
    lm(y ~ x + f + ...)
    lm(y ~ x + f - 1)    (no intercept)

Generalized Linear Model
    probability model & link function
    link function(mean of response variable) ~ intercept + slope * explanatory variable
    glm(y ~ x, data = d, family = poisson)
4
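Poisson Model (count data: number of occurrences)

Poisson Model
    λ: mean number of occurrences per unit time

Link functions
    Identity link:          λ = intercept + slope * explanatory variable
    Log link (canonical):   log(λ) = intercept + slope * explanatory variable
    The canonical link is the most natural link function. With the log link, effects on λ are multiplicative.

    glm(y ~ x, data = d, family = poisson(link = "log"))

The canonical link function is the default.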
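The multiplicative effect of the log link can be checked numerically. This is a sketch on simulated data; the sample size, the seed and the "true" coefficients 0.5 and 0.3 are assumptions made for illustration and do not come from the slides.

```r
set.seed(1)

# Simulated count data (hypothetical parameters, for illustration only)
x <- runif(200, 0, 5)
y <- rpois(200, lambda = exp(0.5 + 0.3 * x))

fit <- glm(y ~ x, family = poisson(link = "log"))

# Predicted means at x = 1, 2, 3
mu <- predict(fit, newdata = data.frame(x = c(1, 2, 3)), type = "response")

# Ratio of successive predicted means
ratio <- mu[-1] / mu[-length(mu)]

# Under the log link, a one-unit increase in x multiplies the mean by exp(slope)
multiplier <- exp(coef(fit)["x"])

print(ratio)
print(multiplier)
```

Both ratios equal exp(slope): under the log link each unit of x scales the mean by a constant factor instead of adding a constant amount.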
5
Poisson Model (p49) (count data: number of occurrences)

Poisson model for the number of seeds per plant, regressed on plant size and fertilizer treatment. The parameters are estimated by maximizing the log-likelihood.

    glm(y ~ x + f, data = d, family = poisson)

    # page 42: plant data
    d <- read.csv("data3a.csv")
    d$f <- factor(d$f)   # pch indexing below needs a factor (read.csv returns character columns in R >= 4.0)
    d$y   # number of seeds
    d$x   # plant size (height)
    d$f   # fertilizer treatment (control vs. treated)
    plot(d$x, d$y, pch = c(21, 19)[d$f])

    # model, p58
    fit.all <- glm(y ~ x + f, data = d, family = poisson)
    print(fit.all)
    logLik(fit.all)

    # data with a fitted curve
    plot(d$x, d$y, pch = c(21, 19)[d$f])
    xx <- seq(min(d$x), max(d$x), length.out = 100)
    # intercept and slope for x are hard-coded here; compare with print(fit.all)
    lines(xx, exp(1.263 + 0.0801 * xx), lwd = 2)
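The quantity that glm() maximizes for a Poisson model is the log-likelihood log L = sum over i of (y_i * log(λ_i) - λ_i - log(y_i!)). The sketch below recomputes it by hand and compares it with logLik(). Because data3a.csv is not included here, it uses simulated data; d_sim and its generating parameters are hypothetical stand-ins.

```r
set.seed(42)

# Simulated stand-in for the plant data (hypothetical values)
n <- 100
d_sim <- data.frame(
  x = rnorm(n, mean = 10, sd = 1),
  f = factor(rep(c("C", "T"), each = n / 2))
)
d_sim$y <- rpois(n, lambda = exp(1.3 + 0.08 * d_sim$x))

fit <- glm(y ~ x + f, data = d_sim, family = poisson)

# Fitted means lambda_i on the response scale
lambda_hat <- fitted(fit)

# Poisson log-likelihood computed term by term
ll_manual <- sum(d_sim$y * log(lambda_hat) - lambda_hat - lfactorial(d_sim$y))

# Value reported by R
ll_builtin <- as.numeric(logLik(fit))

print(c(manual = ll_manual, builtin = ll_builtin))
```

The same number can be obtained with sum(dpois(d_sim$y, lambda_hat, log = TRUE)).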
7
Other Generalized Linear Models (chap6 p114)

    Probability distribution          Random number generation   family in glm()        Standard link function
    (discrete)   Binomial             rbinom()                   binomial               logit
                 Poisson              rpois()                    poisson                log
                 Negative binomial    rnbinom()                  (glm.nb() function)    log
    (continuous) Gamma                rgamma()                   Gamma                  log, inverse
                 Normal               rnorm()                    gaussian               identity
8
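Generalized Linear Models

Generalized Linear Model
    glm(y ~ x, data = d, family = poisson)

Family (modelled probability distribution)
    binomial(link = "logit")                          binomial distribution (number of occurrences in a fixed number of trials)
    gaussian(link = "identity")                       normal distribution
    Gamma(link = "inverse")                           gamma distribution (positive values only)
    inverse.gaussian(link = "1/mu^2")                 inverse Gaussian distribution
    poisson(link = "log")                             Poisson distribution (number of occurrences in a fixed time)
    quasi(link = "identity", variance = "constant")   normal distribution (heterogeneous variance)
    quasibinomial(link = "logit")                     binomial distribution (heterogeneous variance)
    quasipoisson(link = "log")                        Poisson distribution (heterogeneous variance)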
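Each family listed above is an R object that carries its link function and the inverse of that link. The short sketch below inspects a few of them. The variable names are introduced here for illustration; linkfun, linkinv and link are components of R's family objects.

```r
# Family objects with their default (canonical) links
pois  <- poisson()
binom <- binomial()
gam   <- Gamma()

# Name of the default link for each family
default_links <- c(
  poisson  = pois$link,
  binomial = binom$link,
  Gamma    = gam$link,
  gaussian = gaussian()$link
)
print(default_links)

# linkfun maps the mean to the linear predictor; linkinv maps it back
eta_pois <- pois$linkfun(2)     # log(2)
mu_pois  <- pois$linkinv(0)     # exp(0)
p_half   <- binom$linkinv(0)    # inverse logit of 0

print(c(eta_pois = eta_pois, mu_pois = mu_pois, p_half = p_half))
```

A non-default link is requested by name, for example Gamma(link = "log").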
9
Binomial Logistic Model (p118) (number of occurrences in a given number of trials)

Binomial model for the number of surviving plants out of 8 observations, regressed on plant size and fertilizer treatment. The parameters are estimated by maximizing the log-likelihood.

    glm(cbind(y, N - y) ~ x + f, data = d, family = binomial)

    # page 117: plant data
    d <- read.csv("data4a.csv")
    d$f <- factor(d$f)   # pch indexing below needs a factor (read.csv returns character columns in R >= 4.0)
    d$N   # number of trials
    d$y   # number of surviving plants
    d$x   # plant size
    d$f   # fertilizer treatment (control vs. treated)
    plot(d$x, d$y, pch = c(21, 19)[d$f])

    # model, p122
    fit.all <- glm(cbind(y, N - y) ~ x + f, data = d, family = binomial)
    print(fit.all)
    logLik(fit.all)
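In the binomial model the response is given as cbind(successes, failures), and the logit link ties the linear predictor to the survival probability. The sketch below fits the same model form to simulated data, since data4a.csv is not included here. d_sim and its generating parameters are hypothetical.

```r
set.seed(7)

# Simulated stand-in for the survival data (hypothetical values)
n <- 100
d_sim <- data.frame(
  N = 8,
  x = runif(n, 7, 12),
  f = factor(rep(c("C", "T"), each = n / 2))
)
p_true  <- plogis(-10 + 1 * d_sim$x + 1 * (d_sim$f == "T"))
d_sim$y <- rbinom(n, size = d_sim$N, prob = p_true)

# Response: (number survived, number not survived)
fit <- glm(cbind(y, N - y) ~ x + f, data = d_sim, family = binomial)

# Linear predictor and fitted survival probability
eta   <- predict(fit, type = "link")
p_hat <- predict(fit, type = "response")

# The logistic function (inverse logit) maps eta to a probability in (0, 1)
p_from_eta <- plogis(eta)

print(coef(fit))
print(range(p_hat))
```

The fitted probabilities always stay between 0 and 1, which an identity link on the raw proportion y/N would not guarantee.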
10
Offset Term (p131) (avoids dividing the counts)

Count data may come from several zones with different areas or different populations. Instead of dividing each count by its area, model the density (occurrences per unit area) inside a Poisson model: mean count = A * density, so log(mean count) = intercept + slope * x + log(A), where log(A) enters as an offset with its coefficient fixed at 1.

    glm(y ~ x, offset = log(A), data = d, family = poisson)

    # page 133: plant data
    d <- read.csv("data4b.csv")
    d$y   # number of plants in lot i
    d$x   # brightness at lot i
    d$A   # area of lot i
    plot(d$A, d$y)

    # model, p131
    fit <- glm(y ~ x, offset = log(A), data = d, family = poisson)
    print(fit)
    logLik(fit)
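An offset can be passed either through the offset argument of glm() or as an offset() term in the formula. The sketch below, on simulated data with hypothetical parameters (data4b.csv is not included here), checks that both forms give the same fit and recovers the density per unit area from the coefficients.

```r
set.seed(3)

# Simulated stand-in for the lot data (hypothetical values)
n <- 100
d_sim <- data.frame(
  x = runif(n, 0, 1),      # brightness
  A = runif(n, 3.5, 20)    # area of each lot
)
# Mean count = area * density, with density = exp(0.3 + 1.0 * x)
d_sim$y <- rpois(n, lambda = d_sim$A * exp(0.3 + 1.0 * d_sim$x))

# Offset given as an argument
fit_arg <- glm(y ~ x, offset = log(A), data = d_sim, family = poisson)

# Offset given as a formula term
fit_formula <- glm(y ~ x + offset(log(A)), data = d_sim, family = poisson)

# Estimated density per unit area at x = 0.5 (no area term needed)
b <- coef(fit_arg)
density_hat <- exp(b["(Intercept)"] + b["x"] * 0.5)

print(rbind(argument = coef(fit_arg), formula = coef(fit_formula)))
print(density_hat)
```

The formula form is often more convenient with predict(), because the offset is then evaluated from newdata along with the other terms.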
11
Gamma Distribution Model (p138)

Gamma distribution (continuous, positive data)
    s: shape parameter, r: rate parameter, theta = 1/r: scale parameter
    Waiting time until the s-th occurrence of random events that occur at rate r (the average interval between occurrences is 1/r).
    Mean: s/r
    Variance: s/r^2
    dgamma(y, shape, rate)

Example: flower weight y of a plant (continuous, positive)
    Log link applied to the linear predictor: log(mean weight) = intercept + slope * log(x)
    glm(y ~ log(x), data = d, family = Gamma(link = "log"))
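The mean s/r and variance s/r^2 of the gamma distribution can be checked by numerical integration of dgamma(). This is a minimal sketch; the values s = 3 and r = 2 are arbitrary choices for illustration.

```r
# Arbitrary shape and rate (illustrative values)
s <- 3
r <- 2

# Total probability: the density should integrate to 1
total <- integrate(function(y) dgamma(y, shape = s, rate = r), 0, Inf)$value

# Mean: integral of y * density
mean_num <- integrate(function(y) y * dgamma(y, shape = s, rate = r), 0, Inf)$value

# Variance: integral of (y - mean)^2 * density
var_num <- integrate(function(y) (y - s / r)^2 * dgamma(y, shape = s, rate = r),
                     0, Inf)$value

print(c(total = total, mean = mean_num, variance = var_num))
print(c(mean_formula = s / r, variance_formula = s / r^2))
```

Since variance / mean^2 = 1/s, the coefficient of variation depends only on the shape parameter.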
12
Gamma Distribution Model (p138)

Gamma distribution (continuous, positive data)
    glm(y ~ log(x), data = d, family = Gamma(link = "log"))

    # A Gamma example, from McCullagh & Nelder (1989, pp. 300-2)
    # Note: family = Gamma without a link argument uses the default inverse link
    clotting <- data.frame(
        u    = c(5,10,15,20,30,40,60,80,100),
        lot1 = c(118,58,42,35,27,25,21,19,18),
        lot2 = c(69,35,26,21,18,16,13,12,12))
    summary(glm(lot1 ~ log(u), data = clotting, family = Gamma))
    summary(glm(lot2 ~ log(u), data = clotting, family = Gamma))

Output for lot1:

    Call:
    glm(formula = lot1 ~ log(u), family = Gamma, data = clotting)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -0.04008  -0.03756  -0.02637   0.02905   0.08641

    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept) -0.0165544  0.0009275  -17.85 4.28e-07 ***
    log(u)       0.0153431  0.0004150   36.98 2.75e-09 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

    (Dispersion parameter for Gamma family taken to be 0.002446059)

        Null deviance: 3.51283  on 8  degrees of freedom
    Residual deviance: 0.01673  on 7  degrees of freedom
    AIC: 37.99