© Department of Statistics 2012 STATS 330 Lecture 31: Slide 1 Stats 330: Lecture 31
Slide 2: Example 1
A (German) company publishing a women's magazine surveys its readers aged 18-49, receiving 941 responses. The questions asked were:
  Are you a regular reader? (Yes/No)
  Your age (18-29, 30-39, 40-49)
  Your education level (L1, L2, L3, L4)
Slide 3: Data
The data frame reader.df has columns RegularReader, Age, Education and Freqs, with one row per combination of the three factors (24 rows in all).
Slide 4: Cross-tabs
> my.table=xtabs(Freqs~RegularReader+Age+Education, data=reader.df)
> my.table
The output is four RegularReader (Yes, No) by Age tables, one for each of Education = L1, L2, L3 and L4.
Slide 5: Fitting models
> model1 = glm(Freqs~Age*Education*RegularReader, family=poisson, data=reader.df)
> anova(model1, test="Chi")
Analysis of Deviance Table
Term                          Pr(>Chi)
NULL
Age                                     **
Education                     < 2.2e-16 ***
RegularReader                 < 2.2e-16 ***
Age:Education                      e-12 ***
Age:RegularReader                  e-05 ***
Education:RegularReader            e-07 ***
Age:Education:RegularReader
All three two-way interactions are significant but the three-way interaction is not, which suggests the homogeneous association model.
Slide 6: AIC
> AIC(glm(Freqs~Age+Education+RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age+Education*RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*Education+RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*RegularReader+Education, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*Education+Age*RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*Education+Education*RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*RegularReader+Education*RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~(Age+Education+RegularReader)^2, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*Education*RegularReader, family=poisson, data=reader.df))
[1]
Slide 7: Step
Call: glm(formula = Freqs ~ Age + Education + RegularReader + Age:Education + Age:RegularReader + Education:RegularReader, family = poisson, data = reader.df)
Coefficients:
  (Intercept)  Age30-39  Age40-49
  EducationL2  EducationL3  EducationL4
  RegularReaderNo
  Age30-39:EducationL2  Age40-49:EducationL2
  Age30-39:EducationL3  Age40-49:EducationL3
  Age30-39:EducationL4  Age40-49:EducationL4
  Age30-39:RegularReaderNo  Age40-49:RegularReaderNo
  EducationL2:RegularReaderNo  EducationL3:RegularReaderNo  EducationL4:RegularReaderNo
Degrees of Freedom: 23 Total (i.e. Null); 6 Residual
Null Deviance: Residual Deviance: AIC:
All methods agree: homogeneous association model!
Slide 8: Odds ratios: saturated model
Under the saturated model, each Education level has its own RegularReader by Age table and its own odds ratio, OR = (odds Yes for 18-29)/(odds Yes for 40-49).
  L1 table: OR = 4*87/(38*12) = 0.763
  L2 table: OR =
  L3 table: OR =
  L4 table: OR =
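The L1 arithmetic can be checked directly in R. This is a minimal sketch that uses only the four counts quoted in the slide's formula; the helper name odds.ratio is hypothetical (not part of the lecture code), and the assignment of 38 and 12 to particular cells is an assumption that does not affect the ratio.

```r
# Hypothetical helper: odds ratio (a/b)/(c/d) from four cell counts
odds.ratio = function(a, b, c, d) (a * d) / (b * c)

# Counts from the slide's formula 4*87/(38*12).
# Assumed layout: Yes/No counts are 4 and 38 for 18-29, 12 and 87 for 40-49.
or.L1 = odds.ratio(4, 38, 12, 87)
round(or.L1, 3)   # 0.763
```

An odds ratio below 1 here means the odds of being a regular reader are lower for 18-29 than for 40-49 within the L1 table.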
Slide 9: Odds ratios: saturated model
Interaction coefficients involving RegularReader (Estimate, Std Error):
  Age30-39:RegularReaderNo
  Age40-49:RegularReaderNo
  EducationL2:RegularReaderNo
  EducationL3:RegularReaderNo
  EducationL4:RegularReaderNo
  Age30-39:EducationL2:RegularReaderNo
  Age40-49:EducationL2:RegularReaderNo
  Age30-39:EducationL3:RegularReaderNo
  Age40-49:EducationL3:RegularReaderNo
  Age30-39:EducationL4:RegularReaderNo
  Age40-49:EducationL4:RegularReaderNo
The L1 odds ratio is the exponential of the Age40-49:RegularReaderNo estimate; for L2, L3 and L4, add the matching three-way estimate before exponentiating.
> exp( )
[1]
> exp( )
[1]
> exp( )
[1]
> exp( )
[1]
Slide 10: Odds ratios: homogeneous association
From fitting the homogeneous association model:
                           Estimate  Std Error
  Age30-39:RegularReaderNo
  Age40-49:RegularReaderNo   0.5995     0.2013
Common estimate of (odds RR Yes for 18-29)/(odds RR Yes for 40-49):
> exp(0.5995)
[1] 1.821208
The odds of being a regular reader are 1.8 times higher for age group 18-29 than for 40-49, at every education level.
CI is exp(0.5995 + c(-1,1)*1.96*0.2013) = (1.23, 2.70)
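The interval calculation can be reproduced from the two numbers quoted on the slide. This is a sketch only: the variable names est, se, or.hat and or.ci are illustrative, and 1.96 is the usual normal quantile for an approximate 95% interval.

```r
est = 0.5995   # Age40-49:RegularReaderNo estimate quoted on the slide
se  = 0.2013   # standard error quoted on the slide

or.hat = exp(est)                          # common odds ratio
or.ci  = exp(est + c(-1, 1) * 1.96 * se)   # approximate 95% interval
round(c(or.hat, or.ci), 2)                 # 1.82 1.23 2.70
```

The interval is built on the log scale, where the estimate is approximately normal, and then exponentiated, so it is not symmetric about 1.82.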
Slide 11: Example 2: Hair-eye colour
592 students at the University of Delaware classified by sex, eye colour and hair colour.
Factors and levels:
  Sex: Male, Female
  Eye colour: Brown, Blue, Hazel, Green
  Hair colour: Black, Brown, Red, Blond
Slide 12: Data
In the form of an array, HairEyeColor:
, , Sex = Male
       Eye
Hair    Brown Blue Hazel Green
  Black    32   11    10     3
  Brown    53   50    25    15
  Red      10   10     7     7
  Blond     3   30     5     8
, , Sex = Female
       Eye
Hair    Brown Blue Hazel Green
  Black    36    9     5     2
  Brown    66   34    29    14
  Red      16    7     7     7
  Blond     4   64     5     8
Slide 13: Convert to data frame
> HEC.df = as.data.frame(HairEyeColor)
> HEC.df
   Hair   Eye  Sex Freq
1 Black Brown Male   32
2 Brown Brown Male   53
3   Red Brown Male   10
4 Blond Brown Male    3
5 Black  Blue Male   11
6 Brown  Blue Male   50
7   Red  Blue Male   10
8 Blond  Blue Male   30
9 Black Hazel Male   10
More lines (32 in all)
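A quick sanity check on the conversion may be useful before fitting anything. The sketch below relies only on the built-in HairEyeColor array in base R.

```r
# HairEyeColor is a built-in 4 x 4 x 2 table (Hair x Eye x Sex)
HEC.df = as.data.frame(HairEyeColor)

nrow(HEC.df)        # 32 rows, one per Hair/Eye/Sex combination
names(HEC.df)       # "Hair" "Eye" "Sex" "Freq"
sum(HEC.df$Freq)    # 592 students in total
```

Note that as.data.frame names the count column Freq, whereas the magazine data frame used Freqs, so the model formulas differ accordingly.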
Slide 14: Anova
> model1 = glm(Freq~Hair*Eye*Sex, family=poisson, data=HEC.df)
> anova(model1, test="Chi")
Analysis of Deviance Table
Term            Pr(>Chi)
NULL
Hair            < 2e-16 ***
Eye             < 2e-16 ***
Sex
Hair:Eye        < 2e-16 ***
Hair:Sex                *
Eye:Sex
Hair:Eye:Sex
Slide 15: Step
> step(model1, formula(model1), direction = "back")
Call: glm(formula = Freq ~ Hair + Eye + Sex + Hair:Eye + Hair:Sex, family = poisson, data = HEC.df)
Coefficients:
  (Intercept)  HairBrown  HairRed  HairBlond
  EyeBlue  EyeHazel  EyeGreen  SexFemale
  HairBrown:EyeBlue  HairRed:EyeBlue  HairBlond:EyeBlue
  HairBrown:EyeHazel  HairRed:EyeHazel  HairBlond:EyeHazel
  HairBrown:EyeGreen  HairRed:EyeGreen  HairBlond:EyeGreen
  HairBrown:SexFemale  HairRed:SexFemale  HairBlond:SexFemale
Degrees of Freedom: 31 Total (i.e. Null); 12 Residual
Null Deviance: Residual Deviance: AIC:
The selected model has no Eye:Sex term, which suggests that eye colour and sex are independent, given hair colour.
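One way to look at the step result is to fit the selected model and the homogeneous association model side by side and test the Eye:Sex term that separates them. This is a sketch using the built-in HairEyeColor data; the object names cond.indep and homog are illustrative and do not appear in the lecture.

```r
HEC.df = as.data.frame(HairEyeColor)

# Model chosen by step: Eye and Sex conditionally independent given Hair
cond.indep = glm(Freq ~ Hair*Eye + Hair*Sex, family=poisson, data=HEC.df)

# Homogeneous association model: all three two-way interactions
homog = glm(Freq ~ (Hair + Eye + Sex)^2, family=poisson, data=HEC.df)

df.residual(cond.indep)   # 12, as in the step output
df.residual(homog)        # 9

# Deviance test for the extra Eye:Sex term (3 degrees of freedom)
anova(cond.indep, homog, test="Chi")
```

The two models differ only by Eye:Sex, so this comparison is the same test as the Eye:Sex line of the sequential analysis of deviance table.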
Slide 16: AIC – fancy code
# One formula per candidate model, from complete independence to saturated
formula.list = vector(length=9, mode="list")
formula.list[[1]] = Freq ~ Hair + Eye + Sex
formula.list[[2]] = Freq ~ Hair + Eye * Sex
formula.list[[3]] = Freq ~ Hair * Eye + Sex
formula.list[[4]] = Freq ~ Hair * Sex + Eye
formula.list[[5]] = Freq ~ Hair*Eye + Hair*Sex
formula.list[[6]] = Freq ~ Hair*Sex + Eye*Sex
formula.list[[7]] = Freq ~ Hair*Eye + Sex*Eye
formula.list[[8]] = Freq ~ (Hair + Eye + Sex)^2
formula.list[[9]] = Freq ~ Hair*Eye*Sex
# Fit each model in turn and store its AIC
AIC.vec = numeric(9)
for(i in 1:9){
  model = glm(formula.list[[i]], family=poisson, data=HEC.df)
  AIC.vec[i] = AIC(model)
}
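The same comparison can be written without an explicit loop. The version below is an alternative sketch, not the lecture's code: it builds the list with list(), uses sapply to fit each model, and labels each AIC with its deparsed formula.

```r
HEC.df = as.data.frame(HairEyeColor)

# The same nine candidate models as in the loop version
formula.list = list(
  Freq ~ Hair + Eye + Sex,
  Freq ~ Hair + Eye * Sex,
  Freq ~ Hair * Eye + Sex,
  Freq ~ Hair * Sex + Eye,
  Freq ~ Hair*Eye + Hair*Sex,
  Freq ~ Hair*Sex + Eye*Sex,
  Freq ~ Hair*Eye + Sex*Eye,
  Freq ~ (Hair + Eye + Sex)^2,
  Freq ~ Hair*Eye*Sex
)

# Fit each Poisson log-linear model and extract its AIC
AIC.vec = sapply(formula.list,
                 function(f) AIC(glm(f, family=poisson, data=HEC.df)))
names(AIC.vec) = sapply(formula.list, deparse)

sort(AIC.vec)   # smallest AIC first
```

Sorting the named vector puts the preferred model at the top, which saves scanning a nine-row table by eye.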
Slide 17: AICs for different models
> data.frame(as.character(formula.list), AIC.vec)
  as.character.formula.list.      AIC.vec
1 Freq ~ Hair + Eye + Sex
2 Freq ~ Hair + Eye * Sex
3 Freq ~ Hair * Eye + Sex
4 Freq ~ Hair * Sex + Eye
5 Freq ~ Hair * Eye + Hair * Sex
6 Freq ~ Hair * Sex + Eye * Sex
7 Freq ~ Hair * Eye + Sex * Eye
8 Freq ~ (Hair + Eye + Sex)^2
9 Freq ~ Hair * Eye * Sex
Confirms that eye colour and sex are independent, given hair colour (model 5, the model also chosen by step).
Slide 18: Marginal independence?
Are eye colour and sex independent, ignoring hair colour (that is, in all hair colours combined)?
> model4 = glm(formula = Freq ~ Eye*Sex, family = poisson, data = HEC.df)
> anova(model4, test="Chi")
Analysis of Deviance Table
Term      Pr(>Chi)
NULL
Eye       <2e-16 ***
Sex
Eye:Sex
There is no evidence of an Eye:Sex interaction, hence eye colour and sex are unconditionally independent as well.
Slide 19: Marginal odds ratios
> xtabs(Freq~Eye+Sex, data=HEC.df)
       Sex
Eye     Male Female
  Brown   98    122
  Blue   101    114
  Hazel   47     46
  Green   33     31
Unconditional ORs:
  (Brown/Blue)male / (Brown/Blue)female is 98*114/(101*122) = 0.907
  (Brown/Hazel)male / (Brown/Hazel)female is 98*46/(47*122) = 0.786
  (Brown/Green)male / (Brown/Green)female is 98*31/(33*122) = 0.755
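The three marginal odds ratios can be computed from the collapsed table rather than by hand. This is a sketch using the built-in data; the names eye.sex, ratio and marginal.OR are illustrative.

```r
HEC.df = as.data.frame(HairEyeColor)

# Collapse over hair colour: a 4 x 2 table of Eye by Sex
eye.sex = xtabs(Freq ~ Eye + Sex, data=HEC.df)

# Male/Female ratio within each eye colour
ratio = eye.sex[, "Male"] / eye.sex[, "Female"]

# (Brown/other)male / (Brown/other)female equals
# the Brown ratio divided by the ratio for the other eye colour
others = c("Blue", "Hazel", "Green")
marginal.OR = unname(ratio["Brown"]) / ratio[others]
round(marginal.OR, 3)   # Blue 0.907, Hazel 0.786, Green 0.755
```

All three ratios are fairly close to 1, which is consistent with the lack of evidence for an Eye:Sex interaction.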
Slide 20: Same calculation from summary
> summary(glm(Freq ~ Eye*Sex, family=poisson, data=HEC.df))
Coefficients (Estimate, Std. Error, z value, Pr(>|z|)):
Term                  Pr(>|z|)
(Intercept)           < 2e-16 ***
EyeBlue
EyeHazel                 e-05 ***
EyeGreen                 e-08 ***
SexFemale
EyeBlue:SexFemale
EyeHazel:SexFemale
EyeGreen:SexFemale
Confidence interval for the OR (Brown/Blue)male / (Brown/Blue)female, from the EyeBlue:SexFemale estimate and its standard error:
> exp( )
[1]
> exp( c(-1,1)*1.96* )
[1]
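The estimate and standard error can be pulled from the fitted model instead of being retyped. This is a sketch with the built-in data, assuming the default factor coding so that the relevant coefficient is named EyeBlue:SexFemale; the names coefs, est, se and or.ci are illustrative.

```r
HEC.df = as.data.frame(HairEyeColor)
model4 = glm(Freq ~ Eye*Sex, family=poisson, data=HEC.df)

# Coefficient table: rows are terms, columns include Estimate and Std. Error
coefs = summary(model4)$coefficients
est = coefs["EyeBlue:SexFemale", "Estimate"]
se  = coefs["EyeBlue:SexFemale", "Std. Error"]

exp(est)                                  # equals 98*114/(101*122)
or.ci = exp(est + c(-1, 1) * 1.96 * se)   # approximate 95% interval
or.ci
```

The interval contains 1, in line with the non-significant Eye:Sex term in the analysis of deviance.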