Loglinear Models for Independence and Interaction in Three-way Tables
Veronica Estrada, Robert Lagier



Quick Review from Agresti, 4.3
Poisson loglinear models are based on a Poisson distribution for the counts Y and employ the log link function:
– $\log \mu_Y = \alpha + \beta x$
– $\mu_Y = \exp(\alpha + \beta x)$
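The log link can be illustrated with a minimal sketch. The coefficient values and the function name `poisson_loglinear_mean` are hypothetical, chosen only to show that a one-unit increase in x multiplies the expected count by exp(β).

```python
import math

def poisson_loglinear_mean(alpha, beta, x):
    """Expected count under the log link: mu = exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

# Hypothetical coefficients, chosen only for illustration.
alpha, beta = 0.5, 0.2

mu_at_1 = poisson_loglinear_mean(alpha, beta, 1.0)
mu_at_2 = poisson_loglinear_mean(alpha, beta, 2.0)

# A one-unit increase in x multiplies the expected count by exp(beta).
ratio = mu_at_2 / mu_at_1
print(round(ratio, 6), round(math.exp(beta), 6))
```

The two printed values agree, which is the multiplicative interpretation of β on the count scale.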

Value of Loglinear Models?
Used to model cell counts in contingency tables where at least 2 variables are response variables
Specify how expected cell counts depend on levels of categorical variables
Allow for analysis of association and interaction patterns among variables

Models for Two-way Tables
Independence Model
– $\mu_{ij} = \mu \alpha_i \beta_j$
– $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y$
– where $\lambda_i^X$ is the row effect and $\lambda_j^Y$ is the column effect
– odds for the column response are independent of the row
Saturated (Dependence) Model
– $\log \mu_{ij} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_{ij}^{XY}$
– where the $\lambda_{ij}^{XY}$ are association terms that represent interactions between X and Y
– odds for the column response depend on the row
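As a rough sketch of the two-way independence model, the fitted counts can be computed from the margins as (row total × column total) / n. The 2 × 2 counts below are hypothetical, and the helper names are illustrative only.

```python
def independence_fit(table):
    """Fitted counts under independence: mu_ij = n_i+ * n_+j / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def odds_ratio_2x2(t):
    """Cross-product ratio of a 2 x 2 table."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

# Hypothetical observed counts (rows = X, columns = Y).
counts = [[30, 20], [10, 40]]
fitted = independence_fit(counts)

print(fitted)                   # fitted counts under independence
print(odds_ratio_2x2(counts))   # observed odds ratio
print(odds_ratio_2x2(fitted))   # odds ratio implied by the independence model
```

The fitted table has an odds ratio of 1, which is what "odds for the column response are independent of the row" means; the saturated model would instead reproduce the observed counts and their odds ratio.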

Loglinear Models for Three-way (I x J x K) Tables
Describe independence and association patterns
Assume a multinomial distribution of cell counts with cell probabilities $\{\pi_{ijk}\}$
Also apply to Poisson sampling with means $\{\mu_{ijk}\}$

Types of Independence for Cell Probabilities in I x J x K Tables
Mutual Independence
Joint Independence
Conditional Independence
Marginal Independence

Mutual Independence
$\pi_{ijk} = \pi_{i++} \pi_{+j+} \pi_{++k}$ for all i, j, k
Loglinear Model for Expected Frequencies
– $\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z$
Interpretation:
– X, Y, and Z are mutually independent
– No association between any pair of variables
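A minimal sketch of the mutual independence fit, assuming observed counts stored as a nested list `n[i][j][k]`: each fitted count is the product of the three one-way margins divided by n². The counts are hypothetical.

```python
def mutual_independence_fit(n):
    """Fitted counts for model (X, Y, Z): n_i++ * n_+j+ * n_++k / n^2."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    cells = [(i, j, k) for i in range(I) for j in range(J) for k in range(K)]
    total = sum(n[i][j][k] for i, j, k in cells)
    x = [sum(n[i][j][k] for j in range(J) for k in range(K)) for i in range(I)]
    y = [sum(n[i][j][k] for i in range(I) for k in range(K)) for j in range(J)]
    z = [sum(n[i][j][k] for i in range(I) for j in range(J)) for k in range(K)]
    return [[[x[i] * y[j] * z[k] / total ** 2 for k in range(K)]
             for j in range(J)] for i in range(I)]

# Hypothetical 2 x 2 x 2 counts, indexed counts[i][j][k].
counts = [[[10, 20], [30, 40]], [[5, 15], [25, 55]]]
fitted = mutual_independence_fit(counts)

# The fit reproduces the one-way margins of the data, e.g. the X margin.
x_margin_fitted = [sum(sum(row) for row in fitted[i]) for i in range(2)]
print(fitted[0][0][0], x_margin_fitted)
```

Only the single-variable totals are preserved by this model; every two-way association in the fitted table is null.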

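Joint Independence
X jointly independent of Y and Z:
– $\pi_{ijk} = \pi_{i++} \pi_{+jk}$ for all i, j, k
Loglinear Model for Expected Frequencies
– $\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{jk}^{YZ}$
Interpretation:
– X is independent of the combined Y and Z classification
– Partial association between variables Y and Z
3 Joint Independence Models: one for each choice of the variable that is jointly independent of the other two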
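For reference, the three joint independence models can be written side by side. The notation $n_{ijk}$ for observed counts, $n$ for the total, and $\hat\mu_{ijk}$ for fitted values is an assumption added here and does not appear on the slides; the closed-form fitted values are the standard ones for these models.

```latex
\begin{align*}
(X, YZ):\quad \log \mu_{ijk} &= \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{jk}^{YZ},
  & \hat\mu_{ijk} &= \frac{n_{i++}\, n_{+jk}}{n} \\
(Y, XZ):\quad \log \mu_{ijk} &= \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ik}^{XZ},
  & \hat\mu_{ijk} &= \frac{n_{+j+}\, n_{i+k}}{n} \\
(Z, XY):\quad \log \mu_{ijk} &= \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY},
  & \hat\mu_{ijk} &= \frac{n_{++k}\, n_{ij+}}{n}
\end{align*}
```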

Conditional Independence
X and Y conditionally independent given Z:
– $\pi_{ijk} = \pi_{i+k} \pi_{+jk} / \pi_{++k}$ for all i, j, k
Loglinear Model for Expected Frequencies
– $\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$
Interpretation:
– X and Y are independent given Z
– Partial association between X and Z, and between Y and Z
3 Conditional Independence Models: one for each choice of the conditioning variable
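The conditional independence model also has closed-form fitted values, $n_{i+k} n_{+jk} / n_{++k}$. The sketch below uses hypothetical counts in a nested list `n[i][j][k]` and assumes every level of Z has a positive total; it then checks that the fitted XY odds ratio equals 1 within each level of Z.

```python
def conditional_independence_fit(n):
    """Fitted counts for model (XZ, YZ): n_i+k * n_+jk / n_++k."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    xz = [[sum(n[i][j][k] for j in range(J)) for k in range(K)] for i in range(I)]
    yz = [[sum(n[i][j][k] for i in range(I)) for k in range(K)] for j in range(J)]
    z = [sum(xz[i][k] for i in range(I)) for k in range(K)]
    return [[[xz[i][k] * yz[j][k] / z[k] for k in range(K)]
             for j in range(J)] for i in range(I)]

def conditional_odds_ratio(m, k):
    """XY odds ratio in the partial table at level k of Z (2 x 2 x K case)."""
    return (m[0][0][k] * m[1][1][k]) / (m[0][1][k] * m[1][0][k])

# Hypothetical 2 x 2 x 2 counts, indexed counts[i][j][k].
counts = [[[10, 20], [30, 40]], [[5, 15], [25, 55]]]
fitted = conditional_independence_fit(counts)

for k in range(2):
    print(k, conditional_odds_ratio(counts, k), conditional_odds_ratio(fitted, k))
```

The observed conditional odds ratios differ from 1, while the fitted ones are 1 at every level of Z, which is the defining property of this model.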

Marginal Independence
X and Y marginally independent, ignoring Z:
– $\pi_{ij+} = \pi_{i++} \pi_{+j+}$ for all i, j
Interpretation:
– X and Y are independent in the two-way table that has been collapsed over the levels of Z
– Variables may have a different strength of marginal association than conditional (partial) association, as in Simpson's Paradox
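To make the contrast between partial and marginal association concrete, here is a small constructed example. The counts are hypothetical and were chosen so that X and Y are conditionally independent at each level of Z while the collapsed table shows an association.

```python
def odds_ratio(t):
    """Cross-product ratio of a 2 x 2 table."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

def partial_table(n, k):
    """XY table at a fixed level k of Z."""
    return [[n[i][j][k] for j in range(2)] for i in range(2)]

def marginal_table(n):
    """XY table collapsed (summed) over the levels of Z."""
    return [[sum(n[i][j]) for j in range(2)] for i in range(2)]

# Hypothetical counts indexed counts[i][j][k]; not data from the slides.
counts = [[[8, 1], [2, 4]], [[4, 2], [1, 8]]]

conditional = [odds_ratio(partial_table(counts, k)) for k in range(2)]
marginal = odds_ratio(marginal_table(counts))
print(conditional, marginal)
```

Both partial tables have an odds ratio of 1, but collapsing over Z gives an odds ratio of 2.25, so conclusions drawn from the marginal table alone would be misleading here.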

Partial v. Marginal Tables

Relationships Among Types of XY Independence

Homogeneous Association Model
Loglinear Model for Expected Frequencies
– $\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ}$
Interpretation:
– Homogeneous association: identical conditional odds ratios between any two variables over the levels of the third variable
– $\theta_{ij(1)} = \theta_{ij(2)} = \dots = \theta_{ij(K)}$ for all i and j
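Unlike the independence models above, the homogeneous association model has no closed-form fitted values. One common way to obtain them is iterative proportional fitting (IPF), sketched below; this is a substitute illustration, not the method used in the slides (which fit the model with SAS). The counts are hypothetical, and the sketch assumes all two-way margins are positive.

```python
def fit_homogeneous_association(n, iterations=100):
    """IPF for model (XY, XZ, YZ): rescale to match each two-way margin in turn."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    m = [[[1.0] * K for _ in range(J)] for _ in range(I)]
    for _ in range(iterations):
        for i in range(I):          # match the XY margin
            for j in range(J):
                scale = sum(n[i][j]) / sum(m[i][j])
                for k in range(K):
                    m[i][j][k] *= scale
        for i in range(I):          # match the XZ margin
            for k in range(K):
                scale = (sum(n[i][j][k] for j in range(J))
                         / sum(m[i][j][k] for j in range(J)))
                for j in range(J):
                    m[i][j][k] *= scale
        for j in range(J):          # match the YZ margin
            for k in range(K):
                scale = (sum(n[i][j][k] for i in range(I))
                         / sum(m[i][j][k] for i in range(I)))
                for i in range(I):
                    m[i][j][k] *= scale
    return m

def conditional_odds_ratio(m, k):
    """XY odds ratio at level k of Z (2 x 2 x K case)."""
    return (m[0][0][k] * m[1][1][k]) / (m[0][1][k] * m[1][0][k])

# Hypothetical 2 x 2 x 2 counts, indexed counts[i][j][k].
counts = [[[10, 20], [30, 40]], [[5, 15], [25, 55]]]
fitted = fit_homogeneous_association(counts)
theta = [conditional_odds_ratio(fitted, k) for k in range(2)]
print(theta)
```

The fitted conditional XY odds ratios are equal across the levels of Z, even though the observed ones differ, and the fitted table matches all three observed two-way margins.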

Saturated Model
Loglinear Model for Expected Frequencies
– $\log \mu_{ijk} = \lambda + \lambda_i^X + \lambda_j^Y + \lambda_k^Z + \lambda_{ij}^{XY} + \lambda_{ik}^{XZ} + \lambda_{jk}^{YZ} + \lambda_{ijk}^{XYZ}$
Interpretation:
– Each pair of variables may be conditionally dependent
– Odds ratios for any pair of variables may vary over levels of the third variable
– Perfect fit to observed data
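The saturated model fits perfectly because it uses as many parameters as there are cells, leaving zero residual degrees of freedom. As a supplementary summary (not taken from the slides), the residual degrees of freedom for the models discussed so far in an I × J × K table are:

```latex
\begin{array}{ll}
\text{Model} & \text{Residual df} \\ \hline
(X, Y, Z)    & IJK - I - J - K + 2 \\
(X, YZ)      & (I - 1)(JK - 1) \\
(XZ, YZ)     & K(I - 1)(J - 1) \\
(XY, XZ, YZ) & (I - 1)(J - 1)(K - 1) \\
(XYZ)        & 0
\end{array}
```

For a 2 × 2 × 2 table these are 4, 3, 2, 1, and 0 respectively.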

Inference for Loglinear Models
Interpretation of loglinear model parameters is at the level of the highest-order terms
$\chi^2$ or $G^2$ goodness-of-fit tests can be used to select the best-fitting model
Parameter estimates for the association terms correspond to log odds ratios
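The two goodness-of-fit statistics compare observed counts with a model's fitted counts. The sketch below computes both for flattened lists of cells; the numbers are hypothetical, and the fitted values are simply those of an independence model for a 2 × 2 table.

```python
import math

def pearson_chi2(observed, fitted):
    """Pearson statistic: sum over cells of (n - mu)^2 / mu."""
    return sum((o - f) ** 2 / f for o, f in zip(observed, fitted))

def deviance_g2(observed, fitted):
    """Likelihood-ratio statistic: 2 * sum of n * log(n / mu); empty cells add 0."""
    return 2 * sum(o * math.log(o / f) for o, f in zip(observed, fitted) if o > 0)

# Hypothetical observed counts and fitted counts from some model,
# listed cell by cell in the same order.
observed = [30, 20, 10, 40]
fitted = [20.0, 30.0, 20.0, 30.0]

x2 = pearson_chi2(observed, fitted)
g2 = deviance_g2(observed, fitted)
print(round(x2, 4), round(g2, 4))
```

Each statistic is referred to a chi-squared distribution with the model's residual degrees of freedom; large values indicate lack of fit.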

Example: Alcohol, Cigarette, and Marijuana Data
[Table: counts by Alcohol Use (Yes/No) and Cigarette Use (Yes/No), with columns Marijuana Use: Yes and Marijuana Use: No]
Source: Data courtesy of Harry Khamis, Wright State University

SAS Code
data drugs;
  input a c m count;
  cards;
;
/* (A, C, M): mutual independence */
proc genmod; class a c m;
  model count = a c m / dist=poi link=log obstats;
run;
/* (A, CM): A jointly independent of C and M */
proc genmod; class a c m;
  model count = a c m c*m / dist=poi link=log obstats;
run;
/* (C, AM): C jointly independent of A and M */
proc genmod; class a c m;
  model count = a c m a*m / dist=poi link=log obstats;
run;
/* (AC, M): M jointly independent of A and C */
proc genmod; class a c m;
  model count = a c m a*c / dist=poi link=log obstats;
run;
/* (AC, AM): C and M conditionally independent given A */
proc genmod; class a c m;
  model count = a c m a*c a*m / dist=poi link=log obstats;
run;
/* (AC, CM): A and M conditionally independent given C */
proc genmod; class a c m;
  model count = a c m a*c c*m / dist=poi link=log obstats;
run;
/* (AC, AM, CM): homogeneous association */
proc genmod; class a c m;
  model count = a c m a*c a*m c*m / dist=poi link=log obstats;
run;
/* (ACM): saturated model */
proc genmod; class a c m;
  model count = a c m a*c a*m c*m a*c*m / dist=poi link=log obstats;
run;

Fitted Values for Loglinear Models
[Table: fitted values by Alcohol Use, Cigarette Use, and Marijuana Use under loglinear models (A, C, M), (AC, M), (AM, CM), (AC, AM, CM), and (ACM)]
A, alcohol use; C, cigarette use; M, marijuana use.

Estimated Odds Ratios for Loglinear Models
[Table: conditional association (AC, AM, CM) and marginal association (AC, AM, CM) odds ratios for models (A, C, M), (AC, M), (AM, CM), (AC, AM, CM), and (ACM)]

Computation of the Odds Ratio
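For a 2 × 2 × 2 table, the conditional and marginal XY odds ratios can be written in terms of fitted values as follows. This is the standard definition, stated here in the notation of the earlier slides rather than copied from this slide; the last line holds for any model without the three-factor term.

```latex
\begin{align*}
\text{Conditional (at level } k \text{ of } Z\text{):}\quad
  \theta_{XY(k)} &= \frac{\mu_{11k}\,\mu_{22k}}{\mu_{12k}\,\mu_{21k}} \\
\text{Marginal (collapsed over } Z\text{):}\quad
  \theta_{XY} &= \frac{\mu_{11+}\,\mu_{22+}}{\mu_{12+}\,\mu_{21+}} \\
\text{With } \lambda_{ijk}^{XYZ} = 0\text{:}\quad
  \log \theta_{XY(k)} &= \lambda_{11}^{XY} + \lambda_{22}^{XY} - \lambda_{12}^{XY} - \lambda_{21}^{XY}
\end{align*}
```

The right-hand side of the last line does not depend on k, which is why the conditional odds ratios are homogeneous under model (XY, XZ, YZ).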

Model (AC, AM, CM) permits all pairwise associations but maintains homogeneous odds ratios between two variables at each level of the third. The previous table shows that the estimated odds ratios depend strongly on the model, so they should be interpreted only for a model that fits the data well.

Conditional independence and marginal independence do not imply one another.
Conditional independence is compatible with marginal independence
Conditional independence is also compatible with marginal dependence
Marginal independence does not imply conditional independence
Marginal dependence does not imply conditional dependence
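A small constructed example shows that marginal independence does not imply conditional independence. The counts below are hypothetical, chosen so that the collapsed XY table is exactly independent while each partial table shows a strong association, in opposite directions.

```python
def cross_product_ratio(t):
    """Odds ratio of a 2 x 2 table."""
    return (t[0][0] * t[1][1]) / (t[0][1] * t[1][0])

# Hypothetical counts indexed counts[i][j][k]; not data from the slides.
counts = [[[2, 1], [1, 2]], [[1, 2], [2, 1]]]

# XY partial tables at each level of Z, and the table collapsed over Z.
partials = [[[counts[i][j][k] for j in range(2)] for i in range(2)]
            for k in range(2)]
collapsed = [[sum(counts[i][j]) for j in range(2)] for i in range(2)]

conditional = [cross_product_ratio(t) for t in partials]
marginal = cross_product_ratio(collapsed)
print(conditional, marginal)
```

The conditional odds ratios are 4 and 0.25, yet the marginal odds ratio is 1, so independence in the collapsed table says nothing about the partial tables.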