1 Fighting for fame, scrambling for fortune, where is the end? Great wealth and glorious honor, no more than a night dream. Lasting pleasure, worry-free forever, who can attain?
2 Categorical Data Analysis Chapter 4: Introduction to Generalized Linear Models
3 3-Way Tables
                     Response
Clinic  Treatment  Success  Failure
1       A          18       12
        B                   8
2       A          2        8
        B          8        32
Total   A          20       20
        B                   40
4 Marginal vs. Conditional. Marginal independence: marginal odds ratio = 1. Conditional independence: conditional odds ratios = 1. The observed effect of X on Y might simply reflect effects of other covariates on both X and Y.
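The distinction between marginal and conditional odds ratios can be sketched numerically. The counts below are hypothetical (they are not the clinic data from the previous slide), and the helper name `odds_ratio` is an assumption made for illustration. Each stratum has a conditional odds ratio of 1, yet collapsing over the strata gives a marginal odds ratio different from 1.

```python
def odds_ratio(table):
    """Odds ratio of a 2x2 table given as [[n11, n12], [n21, n22]]."""
    (n11, n12), (n21, n22) = table
    return (n11 * n22) / (n12 * n21)

# Hypothetical counts: rows = treatment A, B; columns = success, failure.
strata = {
    "stratum 1": [[30, 20], [15, 10]],
    "stratum 2": [[5, 20], [10, 40]],
}

# Conditional odds ratios: one per stratum (level of the third variable).
conditional = {name: odds_ratio(t) for name, t in strata.items()}

# Marginal table: add the strata cell by cell, then take its odds ratio.
marginal_table = [
    [sum(t[i][j] for t in strata.values()) for j in range(2)]
    for i in range(2)
]
marginal = odds_ratio(marginal_table)

print(conditional)     # conditional odds ratios, one per stratum
print(marginal_table)  # collapsed 2x2 table
print(marginal)        # marginal odds ratio
```

Here X and Y are conditionally independent given the stratum but not marginally independent, which is the situation the slide warns about.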
5 Generalized Linear Models (GLM). 3 components: Random component (Y1, Y2, …, Yn): n independent observations from a distribution in the exponential family (not necessarily i.i.d.): for i = 1, 2, …, n, where a, b, c are all positive functions; e.g. Poisson, binomial, exponential, normal
6 Systematic component: the right-hand side of the model equation; often a linear combination of explanatory variables: $\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$, or in matrix format $X'\beta$
7 Link component: the link between $\mu = E(Y)$ and $X'\beta$. In a model equation $g(\mu) = X'\beta$, $g(\cdot)$ is called the link function, a monotonic differentiable function –Most common link: canonical link, e.g. for normal, binomial, Poisson
8 Canonical Link. Normal data: identity link. Binary data: logit link. Count data: log link.
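As a minimal sketch, the three canonical links named on this slide and their inverses can be written as plain functions. The function names and the `CANONICAL` lookup table are assumptions made for illustration, not part of any particular library.

```python
import math

# Link functions g(mu): map the mean to the linear predictor scale.
def identity(mu):
    return mu

def logit(mu):
    return math.log(mu / (1.0 - mu))

def log_link(mu):
    return math.log(mu)

# Inverse links: map a linear predictor eta back to the mean.
def inv_identity(eta):
    return eta

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def inv_log(eta):
    return math.exp(eta)

# Random component -> (canonical link, inverse link)
CANONICAL = {
    "normal": (identity, inv_identity),
    "binomial": (logit, inv_logit),
    "poisson": (log_link, inv_log),
}

link, inverse = CANONICAL["binomial"]
eta = link(0.25)          # log odds for probability 0.25
print(eta, inverse(eta))  # the inverse link recovers the probability
```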
9 Deviance. Deviance measures the loss of information from data reduction. Saturated model: the most general model, which fits each observation –Could be each subject’s response or –Could be each cell frequency
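One common way to make this concrete is to define the deviance as twice the gap between the saturated log-likelihood and the log-likelihood of the model under study. The sketch below assumes Poisson responses; the function names and the small data set are hypothetical.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood of counts y under means mu."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = -mi - math.lgamma(yi + 1)
        if yi > 0:  # y * log(mu) is taken as 0 when y == 0
            term += yi * math.log(mi)
        total += term
    return total

def poisson_deviance(y, mu):
    """2 * (saturated log-likelihood - model log-likelihood)."""
    saturated = poisson_loglik(y, y)  # saturated model: fitted mean = y
    return 2.0 * (saturated - poisson_loglik(y, mu))

# Hypothetical counts, compared with an intercept-only fit (common mean).
y = [2, 0, 5, 3]
mean = sum(y) / len(y)
print(poisson_deviance(y, [mean] * len(y)))  # information lost by the reduction
print(poisson_deviance(y, y))                # saturated model loses nothing
```

A deviance of zero means the model reproduces every observation; larger values indicate more information lost relative to the saturated model.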
10 GLMs for Binary Data. Linear probability model (identity link); logistic regression model (logit link); probit model (probit link)
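The three models differ only in how the linear predictor is mapped to a probability. The following sketch uses hypothetical coefficients `beta0` and `beta1` (not fitted values) to show the three mean functions and one practical difference: the identity link can produce values outside [0, 1].

```python
import math
from statistics import NormalDist

def linear_probability(beta0, beta1, x):
    """Identity link: pi(x) = beta0 + beta1 * x (may leave [0, 1])."""
    return beta0 + beta1 * x

def logistic_probability(beta0, beta1, x):
    """Logit link: pi(x) = exp(eta) / (1 + exp(eta))."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def probit_probability(beta0, beta1, x):
    """Probit link: pi(x) = Phi(eta), the standard normal CDF."""
    return NormalDist().cdf(beta0 + beta1 * x)

# Hypothetical coefficients, chosen only to illustrate the shapes.
for x in (0, 2, 4, 5):
    print(
        x,
        linear_probability(0.1, 0.2, x),
        logistic_probability(-2.0, 0.5, x),
        probit_probability(-1.2, 0.3, x),
    )
```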
11 Example: Snoring vs. Heart Disease. Table columns: Snoring (score), Observed P(H.D.), Linear fit, Logit fit, Probit fit. Table rows: Never (0), Occasionally (2), Almost daily (4), Daily (5). Note: The fits will change if relative spacings between scores change.
12 Example: 2x2 Tables. Binary covariate X and response Y. Logit link GLM: $\text{logit}[P(Y = 1 \mid X = x)] = \beta_0 + \beta_1 x$
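For a 2x2 table the logit model with one binary covariate is saturated, so its maximum likelihood estimates can be read off the sample proportions: the intercept is the sample logit at x = 0 and the slope is the sample log odds ratio. A minimal sketch with hypothetical counts (the function name `fit_logit_2x2` is an assumption):

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def fit_logit_2x2(table):
    """ML estimates for logit[P(Y=1 | x)] = beta0 + beta1 * x.

    table = [[n11, n10], [n01, n00]]: rows are x = 1, x = 0;
    columns are y = 1, y = 0.
    """
    (n11, n10), (n01, n00) = table
    beta0 = logit(n01 / (n01 + n00))          # sample logit when x = 0
    beta1 = logit(n11 / (n11 + n10)) - beta0  # difference of sample logits
    return beta0, beta1

# Hypothetical counts.
table = [[30, 70], [10, 90]]
beta0, beta1 = fit_logit_2x2(table)
log_odds_ratio = math.log((30 * 90) / (70 * 10))
print(beta0, beta1, log_odds_ratio)  # beta1 equals the log odds ratio
```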
13 GLMs for Count Data. Poisson loglinear model. Count data: counts of events that occur over time, space, or the like, e.g. the # of car accidents for each of a random sample of 100 drivers in 2005. Rate data: count/(time, space, or the like), e.g. the car accident rates of a random sample of 100 drivers in 2005 (Sec , p. 385)
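Rate data are often handled in a Poisson loglinear model by adding the log of the exposure (time, space, or the like) to the linear predictor as an offset, so that the model for the rate stays loglinear. The sketch below assumes that approach; the coefficients and exposures are hypothetical.

```python
import math

def expected_rate(beta0, beta1, x):
    """Model for the rate: log(mu / t) = beta0 + beta1 * x."""
    return math.exp(beta0 + beta1 * x)

def expected_count(beta0, beta1, x, exposure):
    """Same model for the count: log(mu) = log(t) + beta0 + beta1 * x."""
    return math.exp(math.log(exposure) + beta0 + beta1 * x)

# Hypothetical coefficients and exposures (e.g. years of driving).
beta0, beta1 = -2.0, 0.5
for exposure in (1.0, 3.0):
    count = expected_count(beta0, beta1, 1.0, exposure)
    print(exposure, count, count / exposure)  # the rate does not depend on exposure
```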
14 Example: Horseshoe Crabs. Data: Table 4.3 (p. 127). Y = # of satellites, X = carapace width. 1. Poisson loglinear model 2. Poisson GLM with identity link
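A Poisson loglinear model with one covariate can be fitted by Fisher scoring in a few lines. The sketch below is one possible implementation, not the method used for the slide's results, and the widths and satellite counts are hypothetical stand-ins for Table 4.3. Under the log link each extra unit of width multiplies the mean by exp(beta1); under the identity link it would add beta1 to the mean instead.

```python
import math

def fit_poisson_loglinear(x, y, iterations=50, tol=1e-10):
    """Fisher scoring for log(mu_i) = beta0 + beta1 * x_i."""
    beta0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    beta1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(beta0 + beta1 * xi) for xi in x]
        # Score vector
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix [[i00, i01], [i01, i11]]
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Step = inverse information times score (2x2 solve)
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        beta0 += step0
        beta1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return beta0, beta1

# Hypothetical data: carapace widths and satellite counts.
widths = [22.0, 23.5, 24.5, 25.5, 26.5, 27.5, 28.5, 29.5]
satellites = [1, 1, 2, 3, 3, 4, 6, 5]
beta0, beta1 = fit_poisson_loglinear(widths, satellites)
print(beta0, beta1, math.exp(beta1))  # exp(beta1): multiplicative effect per unit width
```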
15 Inference for GLMs. Goodness of fit: –Measure: deviance –Test: likelihood-ratio (LR) tests. Model comparison: LR tests. Residuals: Pearson and standardized
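For two nested models, the LR statistic is the difference of their deviances, referred to a chi-square distribution with degrees of freedom equal to the number of extra parameters. The sketch below covers only the one-extra-parameter case (1 df), where the tail probability has a closed form, and computes Pearson residuals for a Poisson model. Standardized residuals also need leverages and are not shown. The function names and all numbers are hypothetical.

```python
import math

def lr_statistic(deviance_simple, deviance_complex):
    """LR statistic for nested models: difference of the two deviances."""
    return deviance_simple - deviance_complex

def chi2_sf_df1(stat):
    """P(chi-square with 1 df > stat), valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(stat / 2.0))

def pearson_residuals_poisson(y, mu):
    """Pearson residuals (y - mu) / sqrt(mu) for a Poisson model."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]

# Hypothetical deviances of two nested models differing by one parameter.
stat = lr_statistic(12.3, 7.1)
print(stat, chi2_sf_df1(stat))

# Hypothetical counts and fitted means.
print(pearson_residuals_poisson([4, 9], [4.0, 4.0]))
```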
16 Example: Insecticide vs Beetles. Table columns: dosage, # of beetles exposed, # of dead beetles ………
17 Types of Models for Statistical Analysis
Random component  Link               Systematic component  Model                 Chapter
Normal            Identity           Continuous            Regression
Normal            Identity           Categorical           ANOVA
Normal            Identity           Mixed                 ANCOVA
Binomial          Logit              Mixed                 Logistic regression   5, 6
Poisson           Log                Mixed                 Loglinear             8
Multinomial       Generalized logit  Mixed                 Multinomial response  7