Presentation is loading. Please wait.

Presentation is loading. Please wait.

一般化線形モデル( GLM ) generalized linear Models データ解析のための統計モデリング入門 久保拓也(2012) 岩波書店.

Similar presentations


Presentation on theme: "一般化線形モデル( GLM ) generalized linear Models データ解析のための統計モデリング入門 久保拓也(2012) 岩波書店."— Presentation transcript:

1 一般化線形モデル( GLM ) generalized linear Models データ解析のための統計モデリング入門 久保拓也(2012) 岩波書店

2 Generalized Linear Models Linear Model  response variable ~ intercept + slope * explanatory variable  lm(y~ x + f ・・・ ) , lm(y~x + f -1) (no intercept) require(graphics) ## Annette Dobson (1990) "An Introduction to Generalized Linear Models". ## Page 9: Plant Weight Data. ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) group <- gl(2,10,20, labels=c("Ctl","Trt")) weight <- c(ctl, trt) lm.D9 <- lm(weight ~ group) lm.D90 <- lm(weight ~ group - 1) # omitting intercept anova(lm.D9) summary(lm.D90) opar <- par(mfrow = c(2,2), oma = c(0, 0, 1.1, 0)) plot(lm.D9, las = 1) # Residuals, Fitted,... Par(opar) ### less simple examples in "See Also" above

3 Generalized Linear Models Linear Model  response variable ~ intercept + slope * explanatory variable  lm(y~ x + f ・・・ ) , lm(y~x + f -1) (no intercept)  Generalized Linear Model  Model &Link function ~ intercept + slope * explanatory variable  glm(y ~ x, data = d, family = poisson)

4 Poisson Model ( counting data of occurrence)  Poisson Model  λ : mean occurrence in unit time  Identity link  Log link(canonical)  正準リンク関数:最も自然なリンク関数:乗法効 果) Link function ~ intercept + slope * explanatory variable  glm(y ~ x, data = d, family = poisson(link=“log”))  Canonical link function is set as default

5 Poisson Model (p49) ( counting data of occurrence)  Poisson Model for number of seeds of a plant, regressed on plant size and nutrification (p49)  Maximize log-likelihood glm(y ~ x + f, data = d, family = poisson) #page 42 plant data d <- read.csv("data3a.csv") d$y # number of seeds d$x # plant size (hight) d$f # nutrification (treat-control) plot(d$x, d$y, pch =c(21, 19)[d$f]) # model p58 fit.all <- glm(y ~ x + f, data=d, family=poisson) print(fit.all) logLik(fit.all) plot(d$x, d$y, pch =c(21, 19)[d$f]) xx <- seq(min(d$x), max(d$x), length =100) lines(xx,exp(1.263 + 0.0801 * xx), lwd=2)

6 Poisson Model (p49) ( counting data of occurrence)  Poisson Model for number of seeds of a plant, regressed on plant size and nutrification (p49)  Maximize log-likelihood #page 42 plant data d <- read.csv("data3a.csv") d$y # number of seeds d$x # plant size (hight) d$f # nutrification (treat-control) plot(d$x, d$y, pch =c(21, 19)[d$f]) # model p58 fit.all <- glm(y ~ x + f, data=d, family=poisson) print(fit.all) logLik(fit.all) plot(d$x, d$y, pch =c(21, 19)[d$f]) xx <- seq(min(d$x), max(d$x), length =100) lines(xx,exp(1.263 + 0.0801 * xx), lwd=2)

7 Other Generalized Linear Models (chap6 p114) ProbabilityRandom numbers generation Family in glm() Standard link function (discrete)Binomialrbinom()binomiallogit Poissonrpois()poissonlog Negative Binomial rnbinom()(glb.nb() function) log (continuous)Gammargamma()gammalog, inverse Normalrnorm()gaussianidentity

8 Generalized Linear Models  Generalized Linear Model glm(y ~ x, data = d, family = poisson)  Family ( Modelled Probability Distribution)  binomial(link = “logit“) 2 項分布(規定試行中の発生数)  gaussian(link = “identity”) 正規分布  Gamma(link = “inverse”) ガンマ分布(正のみ)  inverse.gaussian(link = “1/mu^2”) 逆ガウス分布  poisson(link = “log”) ポアソン分布(一定時間中の発生回 数)  quasi(link = “identity”, variance = “constant”) 正規分布(不均 一)  quasibinomial(link = “logit”) 2 項分布(分散不均一)  quasipoisson(link = “log”) ポアソン分布(分散不均一)

9 Binomial Logistic Model (p118) ( occurrence number in given trials)  Binomial Model for the number of survived plant in 8 obserbations, regressed on plant size and nutrification (p118)  Maximize log-likelihood glm(cbind(y,N-y) ~ x + f, data = d, family = binomial) #page 117 plant data d <- read.csv("data4a.csv") d$N # number of trials d$y # number of survived plant d$x # plant size d$f # nutrification (treat-control) plot(d$x, d$y, pch =c(21, 19)[d$f]) # model p122 fit.all <- glm(cbind(y, N-y) ~ x + f, data=d, family=binomial) print(fit.all) logLik(fit.all)

10 Offset Term(p131) ( avoid a division calculation)  Count data for several zones having different area, or different population  One way is define a density (occurrence in unit area) and apply Poisson model glm(y ~ x, offset =log(A), data = d, family = poisson) #page 133 plant data d <- read.csv("data4b.csv") d$y # number of plants in lot i d$x # brightness at lot I d$A # area of lot i plot(d$A, d$y) # model p131 fit<- glm(y ~ x, offset = log(A), data=d, family=poisson) print(fit) logLik(fit)

11 Gamma Distribution Model (p138)  Gamma Distribution (continuous positive data)  s: shape parameter, r: rate parameter, theta=1/r: scale parameter  time length before s times occurrence of random events with occurrence rate of r. (average occurrence interval is )  Average : Variance:  dgamma(y, shape, rate)  Weight of flower of a plant y (continuous, positive)  average weight  Loglink function of linear estimator  glm(y ~ log(x), data = d, family = gamma(link="log"))

12 Gamma Distribution Model (p138)  Gamma Distribution (continuous positive data)  glm(y ~ log(x), data = d, family = gamma(link="log") # A Gamma example, from McCullagh & Nelder (1989, pp. 300-2) clotting <- data.frame( u = c(5,10,15,20,30,40,60,80,100), lot1 = c(118,58,42,35,27,25,21,19,18), lot2 = c(69,35,26,21,18,16,13,12,12)) summary(glm(lot1 ~ log(u), data=clotting, family=Gamma)) summary(glm(lot2 ~ log(u), data=clotting, family=Gamma)) Call:glm(formula = lot1 ~ log(u), family = Gamma, data = clotting) Deviance Residuals: Min 1Q Median 3Q Max -0.04008 -0.03756 -0.02637 0.02905 0.08641 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.0165544 0.0009275 -17.85 4.28e-07 *** log(u) 0.0153431 0.0004150 36.98 2.75e-09 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for Gamma family taken to be 0.002446059) Null deviance: 3.51283 on 8 degrees of freedom Residual deviance: 0.01673 on 7 degrees of freedom AIC: 37.99


Download ppt "一般化線形モデル( GLM ) generalized linear Models データ解析のための統計モデリング入門 久保拓也(2012) 岩波書店."

Similar presentations


Ads by Google