© Department of Statistics 2012 STATS 330 Lecture 28: Slide 1 Stats 330: Lecture 28.

Slides:



Advertisements
Similar presentations
LOGLINEAR MODELS FOR INDEPENDENCE AND INTERACTION IN THREE-WAY TABLES
Advertisements

Topic 12: Multiple Linear Regression
Three or more categorical variables
© Department of Statistics 2012 STATS 330 Lecture 32: Slide 1 Stats 330: Lecture 32.
CHI-SQUARE(X2) DISTRIBUTION
Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/
Three-dimensional tables (Please Read Chapter 3).
© Department of Statistics 2012 STATS 330 Lecture 27: Slide 1 Stats 330: Lecture 27.
Logistic Regression Example: Horseshoe Crab Data
Loglinear Models for Independence and Interaction in Three-way Tables Veronica Estrada Robert Lagier.
Logistic Regression.
Part I – MULTIVARIATE ANALYSIS
Lecture 24: Thurs. Dec. 4 Extra sum of squares F-tests (10.3) R-squared statistic (10.4.1) Residual plots (11.2) Influential observations (11.3,
Log-linear and logistic models
Nemours Biomedical Research Statistics April 23, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility.
Linear statistical models 2008 Count data, contingency tables and log-linear models Expected frequency: Log-linear models are linear models of the log.
Linear statistical models 2009 Count data  Contingency tables and log-linear models  Poisson regression.
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Quantitative Business Methods for Decision Making Estimation and Testing of Hypotheses.
Logistic Regression Logistic Regression - Dichotomous Response variable and numeric and/or categorical explanatory variable(s) –Goal: Model the probability.
Logistic Regression and Generalized Linear Models:
© Department of Statistics 2012 STATS 330 Lecture 18 Slide 1 Stats 330: Lecture 18.
9/14/ Lecture 61 STATS 330: Lecture 6. 9/14/ Lecture 62 Inference for the Regression model Aim of today’s lecture: To discuss how we assess.
Lecture 15: Logistic Regression: Inference and link functions BMTRY 701 Biostatistical Methods II.
Lecture 6 Generalized Linear Models Olivier MISSA, Advanced Research Skills.
© Department of Statistics 2012 STATS 330 Lecture 26: Slide 1 Stats 330: Lecture 26.
© Department of Statistics 2012 STATS 330 Lecture 25: Slide 1 Stats 330: Lecture 25.
F73DB3 CATEGORICAL DATA ANALYSIS Workbook Contents page Preface Aims Summary Content/structure/syllabus plus other information Background – computing (R)
Repeated Measures  The term repeated measures refers to data sets with multiple measurements of a response variable on the same experimental unit or subject.
Statistics 1: tests and linear models. How to get started? Exploring data graphically: Scatterplot HistogramBoxplot.
Discrete Multivariate Analysis Analysis of Multivariate Categorical Data.
1 STA 617 – Chp9 Loglinear/Logit Models Loglinear / Logit Models  Chapter 5-7 logistic regression: GLM with logit link binomial / multinomial  Chapter.
1 1 Slide IS 310 – Business Statistics IS 310 Business Statistics CSU Long Beach.
Multiple Regression Petter Mostad Review: Simple linear regression We define a model where are independent (normally distributed) with equal.
Psych 5500/6500 Other ANOVA’s Fall, Factorial Designs Factorial Designs have one dependent variable and more than one independent variable (i.e.
© Department of Statistics 2012 STATS 330 Lecture 31: Slide 1 Stats 330: Lecture 31.
© Department of Statistics 2012 STATS 330 Lecture 20: Slide 1 Stats 330: Lecture 20.
STA 286 week 131 Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression.
© Department of Statistics 2012 STATS 330 Lecture 23: Slide 1 Stats 330: Lecture 23.
Multiple Regression. Simple Regression in detail Y i = β o + β 1 x i + ε i Where Y => Dependent variable X => Independent variable β o => Model parameter.
Logistic regression. Recall the simple linear regression model: y =  0 +  1 x +  where we are trying to predict a continuous dependent variable y from.
A preliminary exploration into the Binomial Logistic Regression Models in R and their potential application Andrew Trant PPS Arctic - Labrador Highlands.
Applied Statistics Week 4 Exercise 3 Tick bites and suspicion of Borrelia Mihaela Frincu
© Department of Statistics 2012 STATS 330 Lecture 30: Slide 1 Stats 330: Lecture 30.
Count Data. HT Cleopatra VII & Marcus Antony C c Aa.
© Department of Statistics 2012 STATS 330 Lecture 19: Slide 1 Stats 330: Lecture 19.
12/22/ lecture 171 STATS 330: Lecture /22/ lecture 172 Factors  In the models discussed so far, all explanatory variables have been.
© Department of Statistics 2012 STATS 330 Lecture 22: Slide 1 Stats 330: Lecture 22.
Logistic Regression. Example: Survival of Titanic passengers  We want to know if the probability of survival is higher among children  Outcome (y) =
Chapter 17.1 Poisson Regression Classic Poisson Example Number of deaths by horse kick, for each of 16 corps in the Prussian army, from 1875 to 1894.
© Department of Statistics 2012 STATS 330 Lecture 24: Slide 1 Stats 330: Lecture 24.
Université d’Ottawa / University of Ottawa 2001 Bio 4118 Applied Biostatistics L14.1 Lecture 14: Contingency tables and log-linear models Appropriate questions.
1 1 Slide © 2008 Thomson South-Western. All Rights Reserved Chapter 12 Tests of Goodness of Fit and Independence n Goodness of Fit Test: A Multinomial.
1 Statistical Analysis Professor Lynne Stokes Department of Statistical Science Lecture #1 Chi-square Contingency Table Test.
Categorical Data Analysis
Logistic Regression and Odds Ratios Psych DeShon.
1 G Lect 10M Contrasting coefficients: a review ANOVA and Regression software Interactions of categorical predictors Type I, II, and III sums of.
Log-linear Models Please read Chapter Two. We are interested in relationships between variables White VictimBlack Victim White Prisoner151 (151/160=0.94)
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 10: Comparing Models.
Logistic Regression Binary response variable Y (1 – Success, 0 – Failure) Continuous, Categorical independent Variables –Similar to Multiple Regression.
Transforming the data Modified from:
Statistical Analysis Professor Lynne Stokes
Categorical Data Aims Loglinear models Categorical data
Review for Exam 2 Some important themes from Chapters 6-9
Test of Independence in 3 Variables
STAT 312 Introduction Z-Tests and Confidence Intervals for a
Tutorial 4 For the Seat Belt Data, the Death Penalty Data, and the University Admission Data, (1). Identify the response variable and the explanatory.
Multiple Testing Tukey’s Multiple comparison procedure
Lecture 46 Section 14.5 Wed, Apr 13, 2005
Presentation transcript:

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 1 Stats 330: Lecture 28

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 2 Plan of the day In today’s lecture we apply Poisson regression to the analysis of contingency tables having 3 or 4 dimensions. Topics –Types of independence –Connection between independence and interactions for 3 and 4 dimensional tables –Hierarchical models –Graphical models –Examples Reference: Coursebook, sections 5.3, 5.3.1

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 3 Example: the Florida murder data 326 convicted murderers in Florida, Classified by –Death penalty (n/y) –Victims race (black/white) –Defendants race (black/white)

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 4 Contingency table Victim's race BlackWhite Defendant'sDeath Penalty RaceYesNoYesNo Black White

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 5 Getting the data into R murder.df<-data.frame(expand.grid( defendant=c("b","w"),dp = c("y","n"), victim=c("b","w")), counts=c(13,1,195,19,23,39,105,265)) Note use of function expand.grid expand.grid(defendant=c("b","w"), dp = c("y","n"),victim=c("b","w")) defendant dp victim 1 b y b 2 w y b 3 b n b 4 w n b 5 b y w 6 w y w 7 b n w 8 w n w

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 6 The data frame defendant dp victim counts 1 b y b 13 2 w y b 1 3 b n b w n b 19 5 b y w 23 6 w y w 39 7 b n w w n w 265

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 7 Questions Is death penalty independent of race? What is the role of victim’s race? What does “independent” mean when we have 3 factors?

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 8 Types of independence Suppose we have 3 criteria (factors) A, B and C In a multinomial sampling context, let  ijk = Pr(A=i, B=j, C=k) Various forms of independence may be of interest. These can be expressed in terms of the probabilities  ijk

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 9 Marginal probabilities

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 10 Marginal probabilities (ii)

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 11 All three factors independent This can be expressed as

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 12 One factor independent of the other two This can be expressed as

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 13 Two factors conditionally independent, given a third This can be expressed as

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 14 Parameterising tables of Poisson means with main effects and interactions Recall (see Slides 22 and 23 of lecture 19) that in ordinary 3-way ANOVA we split up the table of means into main effects and interactions: m ijk  i   j   k   ij   jk   ik  ijk We can do exactly the same thing with the logs of the Poission means in a 3-way table: Log( m  jk   i   j   k   ij   jk   ik  ijk

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 15 Parameterising tables of probabilities with main effects and interactions Corresponding multinomial probabilities are given by the model Log(   jk /     i   j   k   ij   jk   ik  ijk

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 16 Parameterising tables with main effects and interactions (cont) The interactions are related to the different forms of independence: –if all the interactions are zero, then the 3 factors are mutually independent –If the ABC and AB interactions are zero, then A and B are independent, given C –If the ABC, AB and AC interactions are zero, then A is independent of B and C Using the connection between multinomial and Poisson sampling, we can test for the various types of independence by fitting a Poisson regression with A, B and C as explanatory variables, and testing for interactions.

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 17 Summary of independence models All 3 factors independent in multinomial model –Equivalent to all interactions zero in Poisson model –Poisson Model is count ~ A + B + C A independent of B and C –Equivalent to all interactions between A and the others zero –Poisson Model is count ~ A + B + C + B:C or count ~ A + B*C

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 18 Summary of independence models (cont) Factors A and B conditionally independent given C –Equivalent to all interactions containing both A and B zero –Poisson Model is count ~ A + B + C + A:C + B:C –Equivalent to count ~ A*C + B*C

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 19 Analysis strategy: Florida data We will fit some models to this data and investigate the pattern of independence/ dependence between the factors. We fit a maximal model, and then investigate suitable submodels.

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 20 The analysis > maximal.glm<-glm(counts~victim*defendent*dp, family=poisson, data=murder.df) > anova(maximal.glm, test="Chisq") Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL victim e-15 defendent dp e-98 victim:defendent e-57 victim:dp e-04 defendent:dp victim:defendent:dp e > No interactions between defendants race and death penalty - Seems that model victim*dp + victim*defendant is appropriate “given the victim’s race, defendants race and death penalty are independent” ie the conditional independence model. Model also selected by stepwise, other anovas.

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 21 Testing if the model is adequate > submodel.glm<-glm(counts~victim*defendant + victim*dp, family=poisson, data=murder.df) > summary(submodel.glm) Deviance Residuals: Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) < 2e-16 *** victimw defendantw < 2e-16 *** dpn < 2e-16 *** victimw:defendantw < 2e-16 *** victimw:dpn ** ---

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 22 Testing if the model is adequate Null deviance: on 7 degrees of freedom Residual deviance: on 2 degrees of freedom AIC: Number of Fisher Scoring iterations: 4 > 1-pchisq(1.9216,2) [1] Model seems OK Some hint cell 8 not well fitted

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 23 Example: the Copenhagen housing data 317 apartment residents in Copenhagen were surveyed on their housing. 3 variables were measured: –sat: satisfaction with housing (Low, medium,high) –cont: amount of contact with other residents (Low,high) –infl: influence on management decisions –(Low, medium,high)

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 24 Copenhagen housing data sat infl cont count Low Low Low 61 Medium Low Low 23 High Low Low 17 Low Medium Low 43 Medium Medium Low 35 High Medium Low 40 Low High Low 26 Medium High Low 18 High High Low 54 9 more lines…

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 25 Copenhagen housing data cont LowHigh Low6178 satMedium2346 High1743 cont LowHigh Low4348 satMedium3545 High4086 cont LowHigh Low2615 satMedium1825 High5462 infl = Lowinfl = Med infl = High

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 26 The analysis > housing.glm<-glm(count~sat*infl*cont, family=poisson, data=housing.df) > anova(housing.glm, test]”Chisq”) Df Deviance Resid. Df Resid. Dev P(>|Chi|) NULL sat e-06 infl e-05 cont e-06 sat:infl e-15 sat:cont infl:cont sat:infl:cont e This time, only the 3-factor interaction is insignificant. This is the “homogeneous association model”

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 27 Homogeneous association model Association between two factors measured by sets of odds ratios

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 28 Copenhagen housing OR’s cont LowHigh Low ** satMedium * 1.56 High * 1.97 cont LowHigh Low ** satMedium * 1.15 High * 1.92 cont LowHigh Low ** satMedium * 2.40 High * 1.99 infl = Lowinfl = Med infl = High *25/(18*15) 26*62/(54*15)

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 29 Homogeneous association model (2) For the homogeneous association model, the conditional odds ratios for A and B (ie using the conditional distributions of A and B given C=k) do not depend on k. That is, the pattern of association between A and B is the same for all levels of C. Common value of the conditional AB Log OR’s are estimated by the AB interactions

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 30 Estimating conditional OR for housing data Estimated Sat-cont conditional log ORs are estimated with Sat-cont interactions in homogeneous association model > homogen.glm<-glm(count~sat*infl*cont-sat:infl:cont, family=poisson, data=housing.df) > summary(homogen.glm) Coefficients: Estimate Std. Error z value Pr(>|z|) satMedium:contHigh * satHigh:contHigh ***

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 31 Estimating conditional OR for housing data (2) For med/High, est for log OR is , std error is CI for OR is exp( /- 1.96* ) i.e. (1.034, 2.220) Estimated OR for high/high is exp(0.6496) = > c(-1,1)*1.96* [1] > exp( c(-1,1)*1.96*0.1948) [1] > exp(0.4157) [1] > exp(0.6496) [1]

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 32 4 dimensional tables Similar results apply for 4-dimensional tables For example, models for 4 factors A, B, C and D –A, B, C and D all independent: A + B + C +D –A, B independent of C and D: A*B + C*D –D conditionally independent of C, given A and B: A*B*C + A*B*D

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 33 Hierarchical models We will assume that all models are hierarchical: if the model includes an interaction with factors A 1, … A k, then it includes all main effects and interactions that can be formed from A 1, … A k We can represent hierarchical models by listing these “maximal interactions”

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 34 Examples 2 factor model A + B + A:B is hierarchical –Hierarchical notation: [AB] 3 factor model A + B + C + A:B is hierarchical –Hierarchical notation: [AB][C] 3 factor model A + B + C + A:B + A:C is hierarchical –Hierarchical notation: [AB][AC]

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 35 Graphical Models A way of visualising independence patterns: A subset of hierarchical models Each factor represented by the vertex of an “association” graph Two vertices connected by edges if they have a non-zero interaction Then –A vertex not connected to any other vertex is independent of the other vertices –two vertices not directly connected are conditionally independent, given the connecting vertices

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 36 Examples 2 factor model A + B + A:B or [AB] 3 factor model A + B + C +AB or [AB][C] C independent of A and B 3 factor model A + B + C + A:B + A:C +B:C or [AB][AC][BC] AB A BC A BC

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 37 More Examples 3 factor model A + B + C + AB + AC [AB][AC] 4 factor model A + B + C + D + A:B + A:C +B:C [AB][AC][BC][D] A B D A BC C

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 38 Another Example 4 factor model A + B + C + D + A:B + B:C + A:D [AB][BC][AD] C and D are conditionally independent given A and B A B D C

© Department of Statistics 2012 STATS 330 Lecture 28: Slide 39 Another Example 4 factor model A + B + C + D + A:B:C + A:B:D [ABC][ABD] C and D are conditionally independent given A and B A B D C