LOGLINEAR MODELS FOR INDEPENDENCE AND INTERACTION IN THREE-WAY TABLES

LOGLINEAR MODELS FOR INDEPENDENCE AND INTERACTION IN THREE-WAY TABLES BY ENI SUMARMININGSIH, SSI, MM

Table Structure for Three Dimensions

When all variables are categorical, a multidimensional contingency table displays the data. We illustrate the ideas using the three-variable case. Denote the variables by X, Y, and Z. We display the distribution of the X-Y cell counts at different levels of Z using cross sections of the three-way contingency table, called partial tables.

The two-way contingency table obtained by combining the partial tables is called the X-Y marginal table (this table ignores Z).

Death Penalty Example

Defendant's Race   Victim's Race   Death Penalty: Yes   No    Percentage Yes
White              White          19                   132   12.6
White              Black          0                    9     0.0
Black              White          11                   52    17.5
Black              Black          6                    97    5.8

Marginal table (Defendant's Race by Death Penalty)

Defendant's Race   Yes   No    Total
White              19    141   160
Black              17    149   166
Total              36    290   326
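The marginal defendant-penalty odds ratio can be checked directly from the marginal table above; a minimal sketch (the dictionary layout is just for illustration):

```python
# Marginal defendant's-race x death-penalty odds ratio from the marginal
# table: (19 x 149) / (141 x 17).
white = {"yes": 19, "no": 141}
black = {"yes": 17, "no": 149}

odds_ratio = (white["yes"] * black["no"]) / (white["no"] * black["yes"])
print(round(odds_ratio, 2))  # 1.18
```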

Partial and Marginal Odds Ratios

A partial odds ratio describes the association when the third variable is controlled. A marginal odds ratio describes the association when the third variable is ignored (i.e., when we sum the counts over the levels of the third variable to obtain a marginal two-way table).

Odds ratios for the penalty-defendant (P-D), penalty-victim (P-V), and defendant-victim (D-V) associations:

Association        P-D    P-V    D-V
Marginal           1.18   2.71   25.99
Partial, Level 1   0.67   2.80   22.04
Partial, Level 2   0.79   3.29   25.90

Types of Independence

A three-way I×J×K cross-classification of response variables X, Y, and Z has several potential types of independence. We assume a multinomial distribution with cell probabilities {π_ijk}, where Σ_i Σ_j Σ_k π_ijk = 1. The models also apply to Poisson sampling with means {μ_ijk}. The three variables are mutually independent when

π_ijk = π_i++ π_+j+ π_++k  for all i, j, k.   (8.5)

Similarly, X could be jointly independent of Y and Z, or Z could be jointly independent of X and Y. Mutual independence (8.5) implies joint independence of any one variable from the others. X and Y are conditionally independent given Z when independence holds in each partial table within which Z is fixed. That is, if π_ij|k = P(X = i, Y = j | Z = k), then

π_ij|k = π_i+|k π_+j|k  for all i, j, k.

Homogeneous Association and Three-Factor Interaction

Marginal vs Conditional Independence

Partial association can be quite different from marginal association. For further illustration, we now see that conditional independence of X and Y given Z does not imply marginal independence of X and Y. The joint probabilities in Table 5.5 show a hypothetical relationship among three variables for new graduates of a university.

Table 5.5 Joint Probabilities

Major                    Gender   Income: Low   High
Liberal Arts             Female   0.18          0.12
                         Male     0.12          0.08
Science or Engineering   Female   0.02          0.08
                         Male     0.08          0.32
Total                             0.40          0.60

The association between Y = income at first job (low, high) and X = gender (female, male) at the two levels of Z = major discipline (liberal arts, science or engineering) is described by the odds ratios

θ_lib = (0.18 × 0.08) / (0.12 × 0.12) = 1.0
θ_sci = (0.02 × 0.32) / (0.08 × 0.08) = 1.0

Income and gender are conditionally independent, given major.

Marginal Probabilities of Y and X

Gender   Income: Low           High
Female   0.18 + 0.02 = 0.20    0.12 + 0.08 = 0.20
Male     0.12 + 0.08 = 0.20    0.08 + 0.32 = 0.40
Total    0.40                  0.60

The odds ratio for the marginal (income, gender) table is

θ = (0.20 × 0.40) / (0.20 × 0.20) = 2

The variables are not independent when we ignore major.
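The conditional-vs-marginal contrast can be verified numerically from the Table 5.5 probabilities; a minimal sketch (the dictionary names are just for illustration):

```python
# pi[major][gender] = (P(low income), P(high income)), from Table 5.5.
pi = {
    "liberal_arts": {"female": (0.18, 0.12), "male": (0.12, 0.08)},
    "science_eng":  {"female": (0.02, 0.08), "male": (0.08, 0.32)},
}

def odds_ratio(table):
    f_low, f_high = table["female"]
    m_low, m_high = table["male"]
    return (f_low * m_high) / (f_high * m_low)

# Conditional on major: both odds ratios equal 1, so income and gender are
# conditionally independent given major.
print(odds_ratio(pi["liberal_arts"]))  # ~1.0
print(odds_ratio(pi["science_eng"]))   # ~1.0

# Marginal table: sum over major; the odds ratio becomes 2.
marginal = {
    g: tuple(pi["liberal_arts"][g][i] + pi["science_eng"][g][i] for i in (0, 1))
    for g in ("female", "male")
}
print(odds_ratio(marginal))  # ~2.0
```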

Suppose Y is jointly independent of X and Z, so π_ijk = π_i+k π_+j+. Then

π_ij|k = π_ijk / π_++k = π_i+k π_+j+ / π_++k

and summing both sides over i we obtain

π_+j|k = π_++k π_+j+ / π_++k = π_+j+

Therefore

π_ij|k = (π_i+k / π_++k) π_+j+ = π_i+|k π_+j|k

So X and Y are also conditionally independent. In summary, mutual independence of the variables implies that Y is jointly independent of X and Z, which itself implies that X and Y are conditionally independent. Suppose Y is jointly independent of X and Z, that is, π_ijk = π_i+k π_+j+. Summing over k on both sides, we obtain π_ij+ = π_i++ π_+j+. Thus, X and Y also exhibit marginal independence.

So, joint independence of Y from X and Z (or of X from Y and Z) implies that X and Y are both marginally and conditionally independent. Since mutual independence of X, Y, and Z implies that Y is jointly independent of X and Z, mutual independence also implies that X and Y are both marginally and conditionally independent. However, when we know only that X and Y are conditionally independent,

π_ijk = π_i+k π_+jk / π_++k

Summing over k on both sides, we obtain

π_ij+ = Σ_k π_i+k π_+jk / π_++k

All three terms in the summand involve k, so this expression does not simplify to π_i++ π_+j+, the condition for marginal independence.

A model that permits all three pairs to be conditionally dependent is

log μ_ijk = λ + λ_i^X + λ_j^Y + λ_k^Z + λ_ij^XY + λ_ik^XZ + λ_jk^YZ    (8.11)

Model (8.11) is called the loglinear model of homogeneous association, or of no three-factor interaction.

Loglinear Models for Three Dimensions: Hierarchical Loglinear Models

Let {μ_ijk} denote expected frequencies. Suppose all μ_ijk > 0, and let η_ijk = log μ_ijk. A dot in a subscript denotes the average with respect to that index; for instance, η̄_·jk = Σ_i η_ijk / I. We set

λ = η̄_···
λ_i^X = η̄_i·· − η̄_···,  λ_j^Y = η̄_·j· − η̄_···,  λ_k^Z = η̄_··k − η̄_···
λ_ij^XY = η̄_ij· − η̄_i·· − η̄_·j· + η̄_···

λ_ik^XZ = η̄_i·k − η̄_i·· − η̄_··k + η̄_···
λ_jk^YZ = η̄_·jk − η̄_·j· − η̄_··k + η̄_···
λ_ijk^XYZ = η_ijk − η̄_ij· − η̄_i·k − η̄_·jk + η̄_i·· + η̄_·j· + η̄_··k − η̄_···

The sum of the parameters over any index equals zero. That is,

Σ_i λ_i^X = Σ_j λ_j^Y = Σ_k λ_k^Z = Σ_i λ_ij^XY = Σ_j λ_ij^XY = ⋯ = Σ_k λ_ijk^XYZ = 0
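The centred-average definitions above can be checked numerically; a minimal NumPy sketch, using an arbitrary positive 2×2×2 table of expected frequencies (the numbers are made up for illustration):

```python
import numpy as np

# Arbitrary positive "expected frequencies" mu_ijk for illustration.
mu = np.array([[[20., 10.], [5., 15.]],
               [[8., 12.], [6., 24.]]])
eta = np.log(mu)  # eta_ijk = log mu_ijk

lam = eta.mean()                     # grand mean, lambda
lam_X = eta.mean(axis=(1, 2)) - lam  # lambda_i^X = mean over j,k minus lambda
lam_Y = eta.mean(axis=(0, 2)) - lam
lam_Z = eta.mean(axis=(0, 1)) - lam
# lambda_ij^XY = eta-bar_ij. - eta-bar_i.. - eta-bar_.j. + eta-bar_...
lam_XY = (eta.mean(axis=2) - eta.mean(axis=(1, 2))[:, None]
          - eta.mean(axis=(0, 2))[None, :] + lam)

# Zero-sum constraints: each parameter sums to zero over any of its indices.
print(lam_X.sum(), lam_XY.sum(axis=0), lam_XY.sum(axis=1))
```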

The general loglinear model for a three-way table is

log μ_ijk = λ + λ_i^X + λ_j^Y + λ_k^Z + λ_ij^XY + λ_ik^XZ + λ_jk^YZ + λ_ijk^XYZ    (8.12)

This model has as many parameters as observations and describes all possible positive {μ_ijk}. Setting certain parameters equal to zero in (8.12) yields the models introduced previously. Table 8.2 lists some of these models. To ease referring to models, Table 8.2 assigns each model a symbol that lists the highest-order term(s) for each variable.

Interpreting Model Parameters

Interpretations of loglinear model parameters use their highest-order terms. For instance, interpretations for model (8.11) use the two-factor terms to describe conditional odds ratios. At a fixed level k of Z, the conditional association between X and Y uses (I − 1)(J − 1) odds ratios, such as the local odds ratios {θ_ij(k)}. Similarly, (I − 1)(K − 1) odds ratios {θ_i(j)k} describe the XZ conditional association, and (J − 1)(K − 1) odds ratios {θ_(i)jk} describe the YZ conditional association.

Loglinear models have characterizations in terms of constraints on conditional odds ratios. For instance, conditional independence of X and Y is equivalent to θ_ij(k) = 1 for i = 1, ..., I − 1, j = 1, ..., J − 1, k = 1, ..., K. Substituting model (8.11), model (XY, XZ, YZ), into log θ_ij(k) yields

log θ_ij(k) = λ_ij^XY + λ_{i+1,j+1}^XY − λ_{i,j+1}^XY − λ_{i+1,j}^XY

which does not depend on k. Any model without the three-factor interaction term therefore has a homogeneous association for each pair of variables.
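One way to see this homogeneity is to build a table directly from loglinear parameters that include two-factor terms but no three-factor term, and confirm that the conditional X-Y odds ratio does not change with k. A minimal sketch with made-up parameter values, using ±1 (sum-to-zero) coding:

```python
import math

# Two-factor effects in sum-to-zero (+1/-1) coding; values are arbitrary.
# Main effects and the intercept are omitted since they cancel in every
# odds ratio.
s = [1, -1]
lam_XY, lam_XZ, lam_YZ = 0.5, 0.3, -0.2

# mu_ijk = exp(lam_XY*s_i*s_j + lam_XZ*s_i*s_k + lam_YZ*s_j*s_k):
# no three-factor term appears.
mu = [[[math.exp(lam_XY*s[i]*s[j] + lam_XZ*s[i]*s[k] + lam_YZ*s[j]*s[k])
        for k in range(2)] for j in range(2)] for i in range(2)]

def xy_odds_ratio(k):
    # Conditional X-Y odds ratio at level k of Z.
    return (mu[0][0][k] * mu[1][1][k]) / (mu[0][1][k] * mu[1][0][k])

# The XZ and YZ terms cancel, leaving exp(4*lam_XY) at every level of Z.
print(xy_odds_ratio(0), xy_odds_ratio(1))  # both ~exp(2) = 7.389...
```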

For 2×2×2 tables

Alcohol, Cigarette, and Marijuana Use Example

Table 8.3 refers to a 1992 survey by the Wright State University School of Medicine and the United Health Services in Dayton, Ohio. The survey asked 2276 students in their final year of high school in a nonurban area near Dayton whether they had ever used alcohol, cigarettes, or marijuana. Denote the variables in this 2×2×2 table by A for alcohol use, C for cigarette use, and M for marijuana use.

Table 8.5 illustrates model association patterns by presenting estimated conditional and marginal odds ratios. For example, the entry 1.0 for the AC conditional association under model (AM, CM), the model of AC conditional independence, is the common value of the AC fitted odds ratios at the two levels of M.

The entry 2.7 for the AC marginal association under this model is the odds ratio for the marginal AC fitted table. Table 8.5 shows that estimated conditional odds ratios equal 1.0 for each pairwise term not appearing in a model, such as the AC association in model (AM, CM). For that model, the estimated marginal AC odds ratio differs from 1.0, since conditional independence does not imply marginal independence. Model (AC, AM, CM) permits all pairwise associations but maintains homogeneous odds ratios between two variables at each level of the third. The AC fitted conditional odds ratios for this model equal 7.8. One can calculate this odds ratio using the model's fitted values at either level of M, or from (8.14) using exp(λ̂_11^AC + λ̂_22^AC − λ̂_12^AC − λ̂_21^AC).
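Model (AC, AM, CM) can be fitted by iterative proportional fitting, a standard method for hierarchical loglinear models. A sketch using the widely reproduced counts for this survey (they sum to the 2276 students stated above; Table 8.3 itself is not shown here, so treat the numbers as an assumption):

```python
import numpy as np

# Counts n[a, c, m] (index 0 = yes, 1 = no) for alcohol, cigarette, and
# marijuana use. Assumed values for Table 8.3 (total 2276).
n = np.array([[[911., 538.], [44., 456.]],
              [[3., 43.], [2., 279.]]])

# Iterative proportional fitting for model (AC, AM, CM): cycle through the
# three two-way margins until the fitted table matches all of them.
mu = np.ones_like(n)
for _ in range(100):
    mu = mu * (n.sum(axis=2) / mu.sum(axis=2))[:, :, None]  # AC margin
    mu = mu * (n.sum(axis=1) / mu.sum(axis=1))[:, None, :]  # AM margin
    mu = mu * (n.sum(axis=0) / mu.sum(axis=0))[None, :, :]  # CM margin

def ac_odds_ratio(m):
    # Conditional A-C odds ratio at level m of M.
    return (mu[0, 0, m] * mu[1, 1, m]) / (mu[0, 1, m] * mu[1, 0, m])

# Homogeneous association: the same value (about 7.8) at both levels of M.
print(round(ac_odds_ratio(0), 2), round(ac_odds_ratio(1), 2))
```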

INFERENCE FOR LOGLINEAR MODELS

Chi-Squared Goodness-of-Fit Tests

As usual, X² and G² test whether a model holds by comparing cell fitted values to observed counts:

X² = Σ_i Σ_j Σ_k (n_ijk − μ̂_ijk)² / μ̂_ijk
G² = 2 Σ_i Σ_j Σ_k n_ijk log(n_ijk / μ̂_ijk)

where n_ijk is the observed frequency and μ̂_ijk is the fitted (estimated expected) frequency. Here df equals the number of cell counts minus the number of model parameters. For the student survey (Table 8.3), Table 8.6 shows results of testing the fit of several loglinear models.
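For the mutual independence model (A, C, M), the fitted values have the closed form μ̂_ijk = n p̂_i+ p̂_+j p̂_k, so X² and G² can be computed directly. A sketch using the same assumed Table 8.3 counts as above:

```python
import math

# Assumed Table 8.3 counts n[a][c][m] (0 = yes, 1 = no), total 2276.
n = [[[911., 538.], [44., 456.]],
     [[3., 43.], [2., 279.]]]
total = sum(n[i][j][k] for i in range(2) for j in range(2) for k in range(2))

# One-way marginal proportions for A, C, and M.
pA = [sum(n[i][j][k] for j in range(2) for k in range(2)) / total for i in range(2)]
pC = [sum(n[i][j][k] for i in range(2) for k in range(2)) / total for j in range(2)]
pM = [sum(n[i][j][k] for i in range(2) for j in range(2)) / total for k in range(2)]

# X^2 and G^2 against the mutual independence fitted values
# mu_ijk = total * pA_i * pC_j * pM_k (df = 8 - 4 = 4 here).
X2 = G2 = 0.0
for i in range(2):
    for j in range(2):
        for k in range(2):
            mu = total * pA[i] * pC[j] * pM[k]
            X2 += (n[i][j][k] - mu) ** 2 / mu
            G2 += 2 * n[i][j][k] * math.log(n[i][j][k] / mu)

print(round(X2, 1), round(G2, 1))  # both very large: mutual independence fits poorly
```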

Models that lack any association term fit poorly. The model (AC, AM, CM), which has all pairwise associations, fits well (P = 0.54). It is also suggested by other criteria, such as minimizing

AIC = −2(maximized log likelihood − number of parameters in model)

or, equivalently, minimizing [G² − 2(df)].