© Department of Statistics 2012 STATS 330 Lecture 31: Slide 1 Stats 330: Lecture 31.


Presentation transcript:


Slide 2: Example 1

A (German) company publishing a women's magazine surveys its readers aged 18-49, receiving 941 responses. The questions asked were:

- Are you a regular reader? (Yes/No)
- Your age (18-29, 30-39, 40-49)
- Your education level (L1, L2, L3, L4)

Slide 3: Data

[Listing of the data frame: columns RegularReader, Age, Education, Freqs; 24 rows in all, one for each combination of the 2 x 3 x 4 factor levels. Row 2 reads RegularReader = Yes, Education = L1, Freqs = 4.]

Slide 4: Cross-tabs

> my.table = xtabs(Freqs~RegularReader+Age+Education, data=reader.df)
> my.table

[Output: four 2 x 3 tables of RegularReader (Yes, No) by Age, one for each of Education = L1, L2, L3, L4.]

Slide 5: Fitting models

> model1 = glm(Freqs~Age*Education*RegularReader, family=poisson, data=reader.df)
> anova(model1, test="Chi")
Analysis of Deviance Table

                             Df  Deviance  Resid. Df  Resid. Dev  Pr(>Chi)
NULL
Age                                                               **
Education                                                         < 2.2e-16 ***
RegularReader                                                     < 2.2e-16 ***
Age:Education                                                     e-12 ***
Age:RegularReader                                                 e-05 ***
Education:RegularReader                                           e-07 ***
Age:Education:RegularReader

Every main effect and two-factor interaction is flagged as significant, but the three-factor interaction is not. This suggests the homogeneous association model.
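For reference, the two models being compared can be written in log-linear form. The notation below is not from the slides: A, E and R abbreviate Age, Education and RegularReader, and \mu_{ijk} is the expected count in cell (i, j, k). The homogeneous association model is the saturated model with the three-factor term dropped.

```latex
% Saturated model (Age*Education*RegularReader)
\log \mu_{ijk} = \lambda + \lambda^{A}_{i} + \lambda^{E}_{j} + \lambda^{R}_{k}
               + \lambda^{AE}_{ij} + \lambda^{AR}_{ik} + \lambda^{ER}_{jk}
               + \lambda^{AER}_{ijk}

% Homogeneous association model ((Age+Education+RegularReader)^2)
\log \mu_{ijk} = \lambda + \lambda^{A}_{i} + \lambda^{E}_{j} + \lambda^{R}_{k}
               + \lambda^{AE}_{ij} + \lambda^{AR}_{ik} + \lambda^{ER}_{jk}
```

Without the three-factor term, the odds ratio between any two factors is the same at every level of the third.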

Slide 6: AIC

> AIC(glm(Freqs~Age+Education+RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age+Education*RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*Education+RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*RegularReader+Education, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*Education+Age*RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*Education+Education*RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*RegularReader+Education*RegularReader, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~(Age+Education+RegularReader)^2, family=poisson, data=reader.df))
[1]
> AIC(glm(Freqs~Age*Education*RegularReader, family=poisson, data=reader.df))
[1]

Slide 7: Step

Call: glm(formula = Freqs ~ Age + Education + RegularReader + Age:Education +
    Age:RegularReader + Education:RegularReader, family = poisson, data = reader.df)

Coefficients:
(Intercept)
Age30-39
Age40-49
EducationL2
EducationL3
EducationL4
RegularReaderNo
Age30-39:EducationL2
Age40-49:EducationL2
Age30-39:EducationL3
Age40-49:EducationL3
Age30-39:EducationL4
Age40-49:EducationL4
Age30-39:RegularReaderNo
Age40-49:RegularReaderNo
EducationL2:RegularReaderNo
EducationL3:RegularReaderNo
EducationL4:RegularReaderNo

Degrees of Freedom: 23 Total (i.e. Null); 6 Residual
Null Deviance:    Residual Deviance:    AIC:

All methods agree: homogeneous association model!

Slide 8: Odds ratios: saturated model

[Output: the Education = L1 table of RegularReader (Yes, No) by Age.]

Under the saturated model the odds ratio is computed separately within each education table. Here OR = (odds of Yes for 18-29)/(odds of Yes for 40-49):

L1 table: OR = 4*87/(38*12) = 0.763
L2 table: OR =
L3 table: OR =
L4 table: OR =
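The L1 calculation can be reproduced with a small helper. This is only a sketch: `odds_ratio` is a hypothetical function name, and the slide does not show which of 38 and 12 is the "No, 18-29" count and which is the "Yes, 40-49" count. The assignment below is an assumption, and the odds ratio is the same either way because both sit in the denominator.

```r
# Odds ratio for a 2 x 2 table laid out as
#           18-29   40-49
#   Yes       a       b
#   No        c       d
# OR = (a/c) / (b/d) = a*d / (b*c)
odds_ratio <- function(a, b, c, d) (a * d) / (b * c)

# Counts from the slide's expression 4*87/(38*12).
# ASSUMPTION: b = 12 and c = 38; swapping them gives the same OR.
or_L1 <- odds_ratio(a = 4, b = 12, c = 38, d = 87)
print(round(or_L1, 3))
```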

Slide 9: Odds ratios: saturated model

                                       Estimate  Std Error
Age30-39:RegularReaderNo
Age40-49:RegularReaderNo
EducationL2:RegularReaderNo
EducationL3:RegularReaderNo
EducationL4:RegularReaderNo
Age30-39:EducationL2:RegularReaderNo
Age40-49:EducationL2:RegularReaderNo
Age30-39:EducationL3:RegularReaderNo
Age40-49:EducationL3:RegularReaderNo
Age30-39:EducationL4:RegularReaderNo
Age40-49:EducationL4:RegularReaderNo

> exp( )
[1]
> exp( )
[1]
> exp( )
[1]
> exp( )
[1]

Slide 10: Odds ratios: homogeneous association

From fitting the homogeneous association model:

                            Estimate  Std Error
Age30-39:RegularReaderNo
Age40-49:RegularReaderNo

Common estimate of (odds RR Yes for 18-29)/(odds RR Yes for 40-49):

> exp(0.5995)
[1] 1.821208

The odds of being a regular reader are about 1.8 times as high for age group 18-29 as for age group 40-49.

The CI is exp(0.5995 + c(-1,1)*1.96*0.2013), roughly (1.23, 2.70).
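The point estimate and Wald interval can be checked directly. The values 0.5995 and 0.2013 are taken from the slide; reading them as the estimate and standard error of the Age40-49:RegularReaderNo coefficient is an inference from the context, since the table entries themselves are not shown. The odds ratio of No for 40-49 versus 18-29 equals the odds ratio of Yes for 18-29 versus 40-49, which is why that coefficient gives the quantity of interest.

```r
# Log odds ratio and standard error quoted on the slide
# (ASSUMED to be the Age40-49:RegularReaderNo row of the coefficient table).
est <- 0.5995
se  <- 0.2013

# Common odds ratio: odds of Yes for 18-29 relative to 40-49
or <- exp(est)

# Approximate 95% Wald interval, built on the log scale and then exponentiated
ci <- exp(est + c(-1, 1) * 1.96 * se)

print(round(or, 3))
print(round(ci, 3))
```

The interval excludes 1, which agrees with the significance of the Age:RegularReader term in the deviance table.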

Slide 11: Example 2: Hair-eye colour

592 students at the University of Delaware were classified by sex, eye colour and hair colour.

Factors and levels:

- Sex: Male, Female
- Eye colour: Brown, Blue, Hazel, Green
- Hair colour: Black, Brown, Red, Blond

Slide 12: Data

In the form of an array, HairEyeColor:

, , Sex = Male

       Eye
Hair    Brown Blue Hazel Green
  Black    32   11    10     3
  Brown    53   50    25    15
  Red      10   10     7     7
  Blond     3   30     5     8

, , Sex = Female

       Eye
Hair    Brown Blue Hazel Green
  Black    36    9     5     2
  Brown    66   34    29    14
  Red      16    7     7     7
  Blond     4   64     5     8

Slide 13: Convert to data frame

> HEC.df = as.data.frame(HairEyeColor)
> HEC.df
   Hair   Eye  Sex Freq
1 Black Brown Male   32
2 Brown Brown Male   53
3   Red Brown Male   10
4 Blond Brown Male    3
5 Black  Blue Male   11
6 Brown  Blue Male   50
7   Red  Blue Male   10
8 Blond  Blue Male   30
9 Black Hazel Male   10

More lines (32 in all)

Slide 14: Anova

> model1 = glm(Freq~Hair*Eye*Sex, family=poisson, data=HEC.df)
> anova(model1, test="Chi")
Analysis of Deviance Table

              Df  Deviance  Resid. Df  Resid. Dev  Pr(>Chi)
NULL
Hair                                               < 2e-16 ***
Eye                                                < 2e-16 ***
Sex
Hair:Eye                                           < 2e-16 ***
Hair:Sex                                           *
Eye:Sex
Hair:Eye:Sex

Slide 15: Step

> step(model1, formula(model1), direction = "backward")

Call: glm(formula = Freq ~ Hair + Eye + Sex + Hair:Eye + Hair:Sex,
    family = poisson, data = HEC.df)

Coefficients:
(Intercept)
HairBrown
HairRed
HairBlond
EyeBlue
EyeHazel
EyeGreen
SexFemale
HairBrown:EyeBlue
HairRed:EyeBlue
HairBlond:EyeBlue
HairBrown:EyeHazel
HairRed:EyeHazel
HairBlond:EyeHazel
HairBrown:EyeGreen
HairRed:EyeGreen
HairBlond:EyeGreen
HairBrown:SexFemale
HairRed:SexFemale
HairBlond:SexFemale

Degrees of Freedom: 31 Total (i.e. Null); 12 Residual
Null Deviance:    Residual Deviance:    AIC:

The selected model has no Eye:Sex term, which suggests that eye colour and sex are independent given hair colour.
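One way to check the step() result is a direct deviance comparison between the selected conditional independence model and the homogeneous association model, which differ only by the Eye:Sex term. This is a sketch using R's built-in HairEyeColor array; the object names `cond.indep`, `homog` and `cmp` are illustrative and do not appear in the slides.

```r
HEC.df <- as.data.frame(HairEyeColor)

# Model chosen by step(): Eye and Sex independent within each hair colour
cond.indep <- glm(Freq ~ Hair*Eye + Hair*Sex, family = poisson, data = HEC.df)

# Homogeneous association model: all three two-factor interactions
homog <- glm(Freq ~ (Hair + Eye + Sex)^2, family = poisson, data = HEC.df)

# Likelihood-ratio (deviance) test for the extra Eye:Sex term
cmp <- anova(cond.indep, homog, test = "Chisq")
print(cmp)
```

A large p-value in the second row means that adding Eye:Sex does not improve the fit appreciably, which is the sense in which the smaller model is preferred.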

Slide 16: AIC: fancy code

formula.list = vector(length=9, mode="list")
formula.list[[1]] = Freq ~ Hair + Eye + Sex
formula.list[[2]] = Freq ~ Hair + Eye * Sex
formula.list[[3]] = Freq ~ Hair * Eye + Sex
formula.list[[4]] = Freq ~ Hair * Sex + Eye
formula.list[[5]] = Freq ~ Hair*Eye + Hair*Sex
formula.list[[6]] = Freq ~ Hair*Sex + Eye*Sex
formula.list[[7]] = Freq ~ Hair*Eye + Sex*Eye
formula.list[[8]] = Freq ~ (Hair + Eye + Sex)^2
formula.list[[9]] = Freq ~ Hair*Eye*Sex

AIC.vec = numeric(9)
for(i in 1:9){
  model = glm(formula.list[[i]], family=poisson, data=HEC.df)
  AIC.vec[i] = AIC(model)
}

Slide 17: AICs for different models

> data.frame(as.character(formula.list), AIC.vec)
  as.character.formula.list.        AIC.vec
1 Freq ~ Hair + Eye + Sex
2 Freq ~ Hair + Eye * Sex
3 Freq ~ Hair * Eye + Sex
4 Freq ~ Hair * Sex + Eye
5 Freq ~ Hair * Eye + Hair * Sex
6 Freq ~ Hair * Sex + Eye * Sex
7 Freq ~ Hair * Eye + Sex * Eye
8 Freq ~ (Hair + Eye + Sex)^2
9 Freq ~ Hair * Eye * Sex

Confirms that eye colour and sex are independent, given hair colour.

Slide 18: Marginal independence?

Are eye colour and sex independent, ignoring hair colour (that is, in all hair colours combined)?

> model4 = glm(formula = Freq ~ Eye*Sex, family = poisson, data = HEC.df)
> anova(model4, test="Chi")
Analysis of Deviance Table

         Df  Deviance  Resid. Df  Resid. Dev  Pr(>Chi)
NULL
Eye                                           <2e-16 ***
Sex
Eye:Sex

There is no evidence of an Eye:Sex interaction, so eye colour and sex appear to be unconditionally independent as well.
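Fitting Eye*Sex to the full 32-row data frame tests the same hypothesis as first collapsing the table over hair colour, because the Eye:Sex deviance depends only on the Eye-by-Sex totals. The sketch below checks this numerically with the built-in HairEyeColor data; `ES.df`, `full` and `collapsed` are illustrative names not used in the slides.

```r
HEC.df <- as.data.frame(HairEyeColor)

# As on the slide: Eye*Sex fitted to all 32 cells
full <- anova(glm(Freq ~ Eye*Sex, family = poisson, data = HEC.df),
              test = "Chisq")

# Alternative: collapse over hair colour first (8 cells), then fit
ES.df <- as.data.frame(xtabs(Freq ~ Eye + Sex, data = HEC.df))
collapsed <- anova(glm(Freq ~ Eye*Sex, family = poisson, data = ES.df),
                   test = "Chisq")

# The deviance attributed to Eye:Sex is the same in both fits
dev_full <- full["Eye:Sex", "Deviance"]
dev_coll <- collapsed["Eye:Sex", "Deviance"]
print(c(full = dev_full, collapsed = dev_coll))
```

The residual deviances differ between the two fits (the collapsed model is saturated), but the test of the interaction itself does not.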

Slide 19: Marginal odds ratios

> xtabs(Freq~Eye+Sex, data=HEC.df)
       Sex
Eye     Male Female
  Brown   98    122
  Blue   101    114
  Hazel   47     46
  Green   33     31

Unconditional ORs:

(Brown/Blue)male / (Brown/Blue)female is 98*114/(101*122) = 0.907
(Brown/Hazel)male / (Brown/Hazel)female is 98*46/(47*122) = 0.786
(Brown/Green)male / (Brown/Green)female is 98*31/(33*122) = 0.755
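The three marginal odds ratios can be computed from the collapsed table in one pass. A minimal sketch using the built-in HairEyeColor data; `ES` and `marginal.or` are illustrative names.

```r
# Eye-by-Sex table, summed over hair colour
ES <- xtabs(Freq ~ Eye + Sex, data = as.data.frame(HairEyeColor))

# (Brown/other)male / (Brown/other)female for each non-brown eye colour
others <- c("Blue", "Hazel", "Green")
marginal.or <- sapply(others, function(e) {
  (ES["Brown", "Male"] / ES[e, "Male"]) /
    (ES["Brown", "Female"] / ES[e, "Female"])
})

print(round(marginal.or, 3))
```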

Slide 20: Same calculation from summary

> summary(glm(Freq ~ Eye*Sex, family=poisson, data=HEC.df))

Coefficients:
                     Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)                                         < 2e-16 ***
EyeBlue
EyeHazel                                            e-05 ***
EyeGreen                                            e-08 ***
SexFemale
EyeBlue:SexFemale
EyeHazel:SexFemale
EyeGreen:SexFemale

Confidence interval for the OR (Brown/Blue)male / (Brown/Blue)female:

> exp( )
[1]
> exp( c(-1,1)*1.96* )
[1]
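The slide's interval can be rebuilt by pulling the interaction estimate and its standard error out of the coefficient table. This is a sketch using the built-in HairEyeColor data; it assumes, from the treatment-contrast coding with Brown and Male as baselines, that the relevant row is EyeBlue:SexFemale, whose exponential equals the (Brown/Blue)male / (Brown/Blue)female odds ratio.

```r
HEC.df <- as.data.frame(HairEyeColor)
fit <- glm(Freq ~ Eye*Sex, family = poisson, data = HEC.df)

# Coefficient table: one row per term, columns Estimate, Std. Error, ...
coefs <- summary(fit)$coefficients
est <- coefs["EyeBlue:SexFemale", "Estimate"]
se  <- coefs["EyeBlue:SexFemale", "Std. Error"]

# Odds ratio and approximate 95% Wald interval
or <- exp(est)
ci <- exp(est + c(-1, 1) * 1.96 * se)

print(round(c(OR = or, lower = ci[1], upper = ci[2]), 3))
```

The estimate matches the hand calculation 98*114/(101*122), and the standard error matches the usual 2 x 2 formula, the square root of the sum of the reciprocals of the four cell counts.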