Introduction to log-linear models


Analysis of count data: introduction to log-linear models. Saturday, February 02, 2019. Log-linear analysis = analysis on a logarithmic scale.

Logarithmic scale. Natural logarithm: if y = ln x, then x = exp[y]. x changes exponentially with a linear change in y; y is measured on the log scale.

Logarithmic scale. If ln x = a, then x = exp(a). If ln x = az and z is discrete, then a one-unit change in z multiplies x by exp(a). If ln x = az and z is continuous, then the change in x associated with an infinitesimally small change in z is $dx/dz = a \exp(az) = a x$.
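A small numerical check of these statements, as a sketch: the slope a = 0.4291 is borrowed from the leaving-home example later in the slides, and the function name `x_of` is illustrative.

```python
import math

a = 0.4291  # illustrative slope on the log scale (assumed value)

def x_of(z):
    """x implied by ln x = a * z."""
    return math.exp(a * z)

# A one-unit step in z multiplies x by exp(a), whatever the starting z.
ratio_0_to_1 = x_of(1) / x_of(0)
ratio_3_to_4 = x_of(4) / x_of(3)

# For continuous z the slope is dx/dz = a * x; compare with a central difference.
z0, h = 2.0, 1e-6
numeric_slope = (x_of(z0 + h) - x_of(z0 - h)) / (2 * h)
analytic_slope = a * x_of(z0)

print(round(ratio_0_to_1, 4), round(ratio_3_to_4, 4))
print(round(numeric_slope, 4), round(analytic_slope, 4))
```

The two ratios agree, which is what "a linear change on the log scale is a proportional change on the original scale" means in practice.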

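Logarithmic scale and logit scale.
(First-order) difference in ln is ln of ratio: $\ln a - \ln b = \ln(a/b)$.
Second-order difference: $(\ln a - \ln b) - (\ln c - \ln d) = \ln\frac{a/b}{c/d} = \ln\frac{ad}{bc}$, the ln odds ratio (ln OR).
If ln OR = 1.2 and ln a = -ln b = -ln c = ln d, then ln a = 1.2/4 = 0.3.
a/b is an odds, and ln(odds) = logit: if y = f(x) and y = ln(a/b), then y is measured on the logit scale.
The four terms ±(ln OR)/4 are the interaction parameters under effect coding.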
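As a sketch of first- and second-order differences on the log scale, the lines below use the leaving-home counts that appear later in these slides (135, 74, 143, 178); the variable names are illustrative.

```python
import math

# Leaving-home counts from the slides: timing (early/late) by sex.
early_f, early_m = 135, 74
late_f, late_m = 143, 178

# First-order difference of logs = log of a ratio (ln odds of late vs early).
logit_f = math.log(late_f) - math.log(early_f)   # ln(143/135)
logit_m = math.log(late_m) - math.log(early_m)   # ln(178/74)

# Second-order difference = ln odds ratio.
log_or = logit_m - logit_f
odds_ratio = (late_m * early_f) / (early_m * late_f)

print(round(logit_f, 5), round(log_or, 4), round(odds_ratio, 4))
```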

Log-linear analysis is also known as:
contingency-table analysis;
categorical data analysis;
discrete multivariate analysis (Bishop, Fienberg and Holland, 1975);
analysis of cross-classified data;
multivariate analysis of qualitative data (Goodman, 1978);
count data analysis.

Log-linear model: fit a model to a table of counts / frequencies. Two data sets:
Survey: political attitudes of British electors.
Survey: leaving parental home in the Netherlands.

Survey: political attitudes of British electors. Source: Payne, C. (1977) The log-linear model for contingency tables. In: C. O'Muircheartaigh and C. Payne eds. The analysis of survey data. Vol 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106] (from Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974).

Survey: leaving parental home in the Netherlands

Counts are generated by a Poisson process → Poisson distribution

The Poisson probability model. Let N be a random variable representing the number of events during a unit interval and let n be a realisation of N (COUNT). N is a Poisson r.v. following a Poisson distribution with parameter λ: $\Pr\{N = n\} = \frac{\lambda^n \exp(-\lambda)}{n!}$. The parameter λ is the expected number of events per unit time interval: λ = E[N].

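Likelihood function. Probability mass function: $\Pr\{N = n\} = \frac{\lambda^n \exp(-\lambda)}{n!}$. Log-likelihood function: $\ln L(\lambda) = n \ln \lambda - \lambda - \ln n!$ → likelihood equations to determine the ‘best’ value of the parameter λ.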

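Likelihood equations. Setting the derivative of the log-likelihood to zero: $\frac{d \ln L}{d\lambda} = \frac{n}{\lambda} - 1 = 0$. Hence: $\hat{\lambda} = n$. For the Poisson distribution the variance equals the mean. Hence: Var(N) = λ.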
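With several independent counts that share one parameter λ, the same likelihood equation gives the sample mean. The sketch below checks this by a crude grid search over the four leaving-home cell counts from these slides; treating the four cells as sharing one λ is the assumption of the null model, and the grid bounds are arbitrary.

```python
import math

counts = [135, 74, 143, 178]  # the four leaving-home cell counts

def log_likelihood(lam, data):
    """Poisson log-likelihood: sum of n*ln(lam) - lam - ln(n!)."""
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1) for n in data)

# Crude grid search over candidate values of lambda (100.0 to 170.0, step 0.1).
grid = [100 + 0.1 * k for k in range(701)]
lam_hat = max(grid, key=lambda lam: log_likelihood(lam, counts))
mean_count = sum(counts) / len(counts)

print(round(lam_hat, 1), mean_count)
```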

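Log-linear model. Let i represent an individual with characteristics $x_i$. The probability of observing $n_i$ events during a unit interval, given that the expected number of events is $\lambda_i$: $\Pr\{N_i = n_i\} = \frac{\lambda_i^{n_i} \exp(-\lambda_i)}{n_i!}$, with $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$ or $\ln \lambda_i = \beta_0 + \beta_1 x_i$ (log-linear model).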

The log-linear model The objective of log-linear analysis is to determine if the distribution of counts among the cells of a table can be explained by a simpler, underlying structure. Log-linear models specify different structures in terms of the cross-classified variables (rows, columns and layers of the table).

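Log-linear models for two-way tables. Saturated log-linear model: $\ln \lambda_{ij} = u + u_i^A + u_j^B + u_{ij}^{AB}$
u: overall effect (level)
$u_i^A$, $u_j^B$: main effects (marginal freq.)
$u_{ij}^{AB}$: interaction effect
In case of a 2 x 2 table: 4 observations, 9 parameters (1 + 2 + 2 + 4) → normalisation constraints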

Survey: leaving parental home in the Netherlands Research question: do females leave home earlier than males?

Leaving home: descriptive statistics. Counts, percentages, odds of leaving home early rather than late, reference category.

Leaving home. Log-linear models for two-way tables: 4 models.
Model 1: null model or overall effect model. All categories are equiprobable (an observation is equally likely to fall into any cell): $\ln \lambda_{ij} = u$ for all i and j, where u is the overall effect.
Estimate: u = 4.887 (s.e. 0.0434); exp(4.887) = 132.5 = 530/4.
$\lambda_{ij}$ is the expected count (frequency) in cell (ij): category i of variable A (row) and category j of variable B (column).

Leaving home. Standard error of the overall effect, where $\lambda_{ij}$ is a cell frequency generated by a Poisson process and $\mathrm{Var}[aX] = a^2 \mathrm{Var}[X]$, where a is a constant (e.g. Fingleton, 1984, p. 29).
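One way to arrive at the reported standard error of 0.0434 is sketched below. It combines Var[aX] = a^2 Var[X] with a first-order (delta-method) approximation for the variance of a logarithm; the delta-method step is an assumption of this sketch, since the slide's own derivation is not preserved in the transcript.

```python
import math

total = 135 + 74 + 143 + 178      # 530 observations
cells = 4

lam_hat = total / cells            # 132.5, so ln(lam_hat) is about 4.887
# Var[total] = total for a Poisson count; Var[aX] = a^2 Var[X] with a = 1/cells.
var_lam_hat = total / cells**2
# Delta method (assumed here): Var[ln X] is roughly Var[X] / E[X]^2.
se_log_lam = math.sqrt(var_lam_hat) / lam_hat

print(round(math.log(lam_hat), 3), round(se_log_lam, 4))
```

The result simplifies to 1/sqrt(530), the reciprocal square root of the total count.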

Leaving home. Log-linear models for two-way tables.
Model 2: B null model (GLIM). Categories of variable B (sex) are equiprobable within levels of variable A (age; time): $\ln \lambda_{ij} = u + u_i^A$ for all j.

GLIM estimate   s.e.      Parameter        Exp(parameter)   Prediction
4.649           0.06914   Overall effect   104.5
0.0000                    TIME(1)          1.000            104.5
0.4291          0.08886   TIME(2)          1.536            160.5

104.5 = 209/2; 1.536 = [321/2]/104.5.

Leaving home. Log-linear models for two-way tables.
Model 2: B null model (SPSS). Categories of variable B (sex) are equiprobable within levels of variable A (time): $\ln \lambda_{ij} = u + u_i^A$ for all j.

SPSS estimate   s.e.     Parameter        Exp(parameter)
5.0783          0.0558   Overall effect   160.5
-0.4291         0.0889   TIME(1)          0.6511
0.0000                   TIME(2)
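The GLIM and SPSS estimates describe the same fitted counts; they differ only in which category of TIME is the reference. The sketch below recomputes both sets from the marginal totals 209 and 321. The dictionary layout is illustrative, and the standard errors use the usual large-sample formula (square root of the sum of reciprocal counts), which is an assumption of this sketch.

```python
import math

early_total, late_total = 209, 321               # 135 + 74 and 143 + 178
pred_early, pred_late = early_total / 2, late_total / 2   # 104.5 and 160.5

# GLIM-style coding: the first category (early) is the reference.
glim = {
    "overall": math.log(pred_early),
    "TIME(2)": math.log(pred_late / pred_early),
}
# SPSS-style coding: the last category (late) is the reference.
spss = {
    "overall": math.log(pred_late),
    "TIME(1)": math.log(pred_early / pred_late),
}

# Large-sample standard errors from reciprocal counts.
se_overall_glim = math.sqrt(1 / early_total)
se_overall_spss = math.sqrt(1 / late_total)
se_time = math.sqrt(1 / early_total + 1 / late_total)

print({k: round(v, 4) for k, v in glim.items()})
print({k: round(v, 4) for k, v in spss.items()})
print(round(se_overall_glim, 4), round(se_overall_spss, 4), round(se_time, 4))
```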

SPSS. Model: Poisson. Design: Constant + TIMING.

GENLOG timing sex
  /MODEL=POISSON
  /PRINT FREQ ESTIM CORR COV
  /PLOT NONE
  /CRITERIA =CIN(95) ITERATE(20) CONVERGE(.001) DELTA(0)
  /DESIGN timing
  /SAVE PRED .

Table information
Factor    Value     Observed count (%)   Expected count (%)
TIMING    Early
 SEX      Females   135.00 (25.47)       104.50 (19.72)
 SEX      Males      74.00 (13.96)       104.50 (19.72)
TIMING    Late
 SEX      Females   143.00 (26.98)       160.50 (30.28)
 SEX      Males     178.00 (33.58)       160.50 (30.28)

Parameter estimates (asymptotic 95% CI)
Parameter   Estimate   SE      Lower   Upper
1           5.0783     .0558   4.97    5.19
2           -.4291     .0889   -.60    -.25
3           .0000      .       .       .

SPSS. Model: Poisson. Design: Constant + SEX + TIMING.

GENLOG timing sex
  /MODEL=POISSON
  /PRINT FREQ ESTIM CORR COV
  /CRITERIA =CIN(95) ITERATE(20) CONVERGE(.001) DELTA(0)
  /DESIGN sex timing
  /SAVE PRED .

Table information
Factor    Value     Observed count (%)   Expected count (%)
TIMING    Early
 SEX      Females   135.00 (25.47)       109.63 (20.68)
 SEX      Males      74.00 (13.96)        99.37 (18.75)
TIMING    Late
 SEX      Females   143.00 (26.98)       168.37 (31.77)
 SEX      Males     178.00 (33.58)       152.63 (28.80)

Parameter estimates (asymptotic 95% CI)
Parameter   Label          Estimate   SE      Lower   Upper
1           Constant       5.0280     .0721   4.89    5.17
2           [SEX = 1]      .0982      .0870   -.07    .27
3           [SEX = 2]      .0000      .       .       .
4           [TIMING = 1]   -.4291     .0889   -.60    -.25
5           [TIMING = 2]   .0000      .       .       .

Leaving home. Log-linear models for two-way tables.
Model 3: independence model (unsaturated model), GLIM. Categories of variable B (sex) are not equiprobable, but the probability is independent of levels of variable A (age; time): $\ln \lambda_{ij} = u + u_i^A + u_j^B$.

GLIM estimate   s.e.     Parameter        Exp(parameter)
4.697           0.0806   Overall effect   109.62
0.4291          0.0889   TIME(2)          1.536
-0.09819        0.0870   SEX(2)           0.906

LOG-LINEAR MODEL: predictions (unsaturated model)
Females leaving home early: 109.62
Females leaving home late: 109.62 * 1.536 = 168.37
Males leaving home early: 109.62 * 0.906 = 99.37
Males leaving home late: 109.62 * 1.536 * 0.906 = 152.63
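These predictions can be reproduced from the marginal totals alone. A minimal sketch, assuming the usual closed form for the independence model (row total times column total divided by the grand total); the dictionary keys are illustrative labels.

```python
observed = {
    ("early", "female"): 135, ("early", "male"): 74,
    ("late", "female"): 143, ("late", "male"): 178,
}
n = sum(observed.values())

# Marginal totals by timing (rows) and by sex (columns).
row_tot, col_tot = {}, {}
for (timing, sex), count in observed.items():
    row_tot[timing] = row_tot.get(timing, 0) + count
    col_tot[sex] = col_tot.get(sex, 0) + count

# Expected counts under independence.
expected = {(t, s): row_tot[t] * col_tot[s] / n for (t, s) in observed}

# The same predictions in multiplicative form, relative to (early, female).
base = expected[("early", "female")]
time_factor = row_tot["late"] / row_tot["early"]    # exp(TIME(2))
sex_factor = col_tot["male"] / col_tot["female"]    # exp(SEX(2))

for cell, value in expected.items():
    print(cell, round(value, 2))
print(round(base * time_factor * sex_factor, 2))
```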

Leaving home: independence model (SPSS).

Parameter   Estimate   SE      Label
1           5.0280     .0721   Overall effect
2           -.4291     .0889   Time(1)
3           .0000      .       Time(2)
4           .0982      .0870   Sex(1)
5           .0000      .       Sex(2)

Leaving home. Log-linear models for two-way tables.
Model 4: saturated model (GLIM). The values of categories of variable B (sex) depend on levels of variable A (age; time).

GLIM estimate   s.e.      Parameter         Interpretation
4.905           0.08607   Overall effect    ln 135
0.05757         0.1200    TIME(2)           ln 143 - ln 135 (ln odds)
-0.6012         0.1446    SEX(2)            ln 74 - ln 135 (ln odds)
0.8201          0.1831    TIME(2).SEX(2)    ln odds ratio

Log-linear model parameters and odds and odds ratios (leaving home).
Dummy-variable coding. Reference categories: early / female.
Interaction effect: ln odds ratio = ln [(178 * 135)/(143 * 74)] = 0.8201
Main effects: ln odds(reference category)
Time effect: ln odds(females) = ln 143/135 = ln 1.059 = 0.05757
Sex effect: ln odds(early) = ln 74/135 = ln 0.5481 = -0.6012
Overall effect: ln frequency(early, female) = ln 135 = 4.9053
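The saturated-model parameters under dummy coding are simple functions of the four observed counts. The sketch below recomputes them together with large-sample standard errors (square root of the sum of reciprocal counts, an assumption of this sketch that agrees with the GLIM output shown in the slides); variable names are illustrative.

```python
import math

n_ef, n_em = 135, 74     # early: females, males
n_lf, n_lm = 143, 178    # late: females, males

overall = math.log(n_ef)                               # ln frequency(early, female)
time2 = math.log(n_lf / n_ef)                          # ln odds late vs early, females
sex2 = math.log(n_em / n_ef)                           # ln odds male vs female, early
interaction = math.log((n_lm * n_ef) / (n_lf * n_em))  # ln odds ratio

se_overall = math.sqrt(1 / n_ef)
se_time2 = math.sqrt(1 / n_ef + 1 / n_lf)
se_sex2 = math.sqrt(1 / n_ef + 1 / n_em)
se_interaction = math.sqrt(1 / n_ef + 1 / n_em + 1 / n_lf + 1 / n_lm)

print(round(overall, 4), round(time2, 5), round(sex2, 4), round(interaction, 4))
print(round(se_overall, 5), round(se_time2, 4), round(se_sex2, 4), round(se_interaction, 4))
```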

Leaving home: saturated model (SPSS).

Parameter   Estimate   SE      Label
1           5.1846     .0748   Overall effect
2           -.8738     .1379   Time(1)
3           .0000      .       Time(2)
4           -.2183     .1121   Sex(1)
5           .0000      .       Sex(2)
6           .8164      .1827   Time(1) * Sex(1)
7           .0000      .       Time(1) * Sex(2)
8           .0000      .       Time(2) * Sex(1)
9           .0000      .       Time(2) * Sex(2)

Leaving home. LOG-LINEAR MODEL: predictions. Expected frequencies.

Cell      Label   Observed   Model 1   Model 2   Model 3   Model 4   Model 5
Fem_<20   F11     135        132.50    104.50    139.00    109.63    135.00
Mal_<20   F12     74         132.50    104.50    126.00    99.37     74.00
Fem_>20   F21     143        132.50    160.50    139.00    168.37    143.00
Mal_>20   F22     178        132.50    160.50    126.00    152.63    178.00

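Relation between the log-linear model and the Poisson regression model: $\ln \lambda_{ij} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$, where $x_1$ and $x_2$ are dummy variables (0 if category i or j = 1, and 1 if i or j = 2) and the interaction variable is $x_3 = x_1 x_2$.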
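Written as a regression on dummy variables, the saturated model reproduces the observed leaving-home counts. The sketch below plugs in the GLIM estimates quoted in the slides; the function name `predict` and the coefficient names are illustrative.

```python
import math

# GLIM estimates for the saturated leaving-home model (from the slides).
b0, b_time, b_sex, b_inter = 4.9053, 0.05757, -0.6012, 0.8201

def predict(x_time, x_sex):
    """Expected count; x_time = 1 for late, x_sex = 1 for male, else 0."""
    linear = b0 + b_time * x_time + b_sex * x_sex + b_inter * x_time * x_sex
    return math.exp(linear)

for x_time, x_sex in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x_time, x_sex, round(predict(x_time, x_sex), 1))
```

Up to rounding of the coefficients, the four predictions are the observed counts 135, 143, 74 and 178.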

Log-linear model: fit a model to a table of frequencies. Data: survey of political attitudes of British electors. Source: Payne, C. (1977) The log-linear model for contingency tables. In: C. O'Muircheartaigh and C. Payne eds. The analysis of survey data. Vol 2: Model fitting, Wiley, New York, pp. 105-144 [data p. 106] (from Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974).

The classical approach. Geometric means (Birch, 1963). Effect coding (the mean is the reference category). Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:220-233.

Political attitudes: the basic model (all quantities are means of ln counts).
Overall effect: 22.98/4 = 5.7456
Effect of party:
Conservative: 11.49/2 - 5.7456 = 0.0018
Labour: 11.49/2 - 5.7456 = -0.0018
Effect of gender:
Male: 11.44/2 - 5.7456 = -0.0229
Female: 11.54/2 - 5.7456 = 0.0229
Interaction effects (gender-party interaction effect):
Male conservative: 5.6312 - 5.7456 - 0.0018 + 0.0229 = -0.0933
Female conservative: 5.8636 - 5.7456 - 0.0018 - 0.0229 = 0.0933
Male labour: 5.8141 - 5.7456 + 0.0018 + 0.0229 = 0.0933
Female labour: 5.6733 - 5.7456 + 0.0018 - 0.0229 = -0.0933
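The effect-coded parameters above are deviations of mean ln counts from the overall mean. The sketch below recomputes them from the four observed counts (279, 352, 335, 291); the helper names are illustrative.

```python
import math

counts = {
    ("conservative", "male"): 279, ("conservative", "female"): 352,
    ("labour", "male"): 335, ("labour", "female"): 291,
}
logs = {cell: math.log(n) for cell, n in counts.items()}

# Overall effect: mean of the ln counts (ln of the geometric mean).
overall = sum(logs.values()) / len(logs)

def party_effect(party):
    """Mean ln count for one party minus the overall mean."""
    values = [v for (p, _), v in logs.items() if p == party]
    return sum(values) / len(values) - overall

def gender_effect(gender):
    """Mean ln count for one gender minus the overall mean."""
    values = [v for (_, g), v in logs.items() if g == gender]
    return sum(values) / len(values) - overall

def interaction(party, gender):
    """What is left of the cell's ln count after overall and main effects."""
    return logs[(party, gender)] - overall - party_effect(party) - gender_effect(gender)

print(round(overall, 4), round(math.exp(overall), 2))
print(round(party_effect("conservative"), 4), round(gender_effect("male"), 4))
print(round(interaction("conservative", "male"), 4))
```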

Political attitudes: the basic model. Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:220-233. Coding: effect coding. Parameters are subject to constraints (normalisation constraints): the effects of each variable sum to zero over its categories. Only first-order contrasts can be estimated:

Political attitudes The basic model (GLIM) Estimate S.E.

Log-linear model parameters and odds and odds ratios (political attitudes).
Dummy-variable coding. Reference categories: conservative / male.
Interaction effect: ln odds ratio
Main effects: ln odds(reference category)
Party effect: ln odds(males) = ln 335/279 = ln 1.201 = 0.1829
Gender effect: ln odds(conservatives) = ln 352/279 = ln 1.262 = 0.2324
Overall effect: ln frequency(conservatives, males) = ln 279 = 5.6312

Log-linear model parameters and odds and odds ratios. Recall the translation from odds to probabilities, for when you want to predict probabilities or proportions instead of odds: $p = \frac{\text{odds}}{1 + \text{odds}}$.
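A minimal sketch of the translation from odds (or logits) to probabilities, using the odds of voting Labour among males from the slides (335/279); the function names are illustrative.

```python
import math

def odds_to_probability(odds):
    """p = odds / (1 + odds)."""
    return odds / (1 + odds)

def logit_to_probability(logit):
    """Same translation, starting from ln(odds)."""
    return 1 / (1 + math.exp(-logit))

odds_labour_males = 335 / 279
p_labour_males = odds_to_probability(odds_labour_males)

print(round(odds_labour_males, 4), round(p_labour_males, 4))
print(round(logit_to_probability(math.log(odds_labour_males)), 4))
```

The probability equals the proportion 335/(335 + 279) of males voting Labour.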

Log-linear model parameters and odds and odds ratios.
Effect coding: +1: labour / female; -1: conservative / male.
Interaction effect (dummy coding): ln odds ratio = -0.3732.
Translation from dummy-variable coding to effect coding (Alba, 1987): (ln odds ratio)/4, with the sign of the cell.

Cell                  Sign   Parameter (effect coding)
Male conservative     +1     +(-0.3732/4) = -0.0933
Female conservative   -1     -(-0.3732/4) = 0.0933
Male labour           -1     -(-0.3732/4) = 0.0933
Female labour         +1     +(-0.3732/4) = -0.0933

Translation from effect coding to dummy-variable coding: WEIGHTED SUM
(+1)(-0.0933) + (-1)(0.0933) + (-1)(0.0933) + (+1)(-0.0933) = -0.3732

Log-linear model parameters and odds and odds ratios.
Effect coding: +1: labour / female; -1: conservative / male.
Main effects (dummy coding): ln odds(reference category). Gender effect: ln odds(conservatives) = ln 352/279 = ln 1.262 = 0.2324.
Translation to effect coding: (ln odds)/2 + (ln odds ratio)/4, with the sign of the category.

Category   Sign   Parameter (effect coding)
Female     +1     +(0.2324/2 - 0.0933) = 0.0229
Male       -1     -(0.2324/2 - 0.0933) = -0.0229

Translation to dummy coding: WEIGHTED SUM
(+1)(0.0229 + 0.0933) [female conservative] + (-1)(-0.0229 - 0.0933) [male conservative] = 0.2324

Log-linear model parameters and odds and odds ratios.
Effect coding: +1: labour / female; -1: conservative / male.
Main effects (dummy coding): ln odds(reference category). Party effect: ln odds(males) = ln 335/279 = ln 1.201 = 0.1829.
Translation to effect coding: (ln odds)/2 + (ln odds ratio)/4, with the sign of the category.

Category       Sign   Parameter (effect coding)
Conservative   -1     -(0.1829/2 - 0.0933) = 0.00185
Labour         +1     +(0.1829/2 - 0.0933) = -0.00185

Translation to dummy coding: WEIGHTED SUM
(-1)(0.00185 - 0.0933) [conservative male] + (+1)(-0.00185 + 0.0933) [labour male] = 0.1829

Log-linear model parameters and odds and odds ratios.
Effect coding: +1: labour / female; -1: conservative / male.
Overall effect (dummy coding): ln frequency(conservatives, males) = ln 279 = 5.6312.
Translation to effect coding (the corrections are built from (ln odds)/2 and (ln odds ratio)/4 terms):
Conservatives, males: 5.6312 - 0.00185 + 0.0229 + 0.0933 = 5.7456
Translation to dummy coding: WEIGHTED SUM
(+1)[5.7456 + 0.00185 - 0.0229 - 0.0933] = 5.6312 (conservative, male)
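The translations between the two codings can be collected in a few lines. This is a sketch for the 2 x 2 political-attitudes table only; the variable names are illustrative, and the formulas are the (ln odds)/2 and (ln odds ratio)/4 rules from the slides.

```python
import math

# Dummy-coded parameters (reference: conservative / male), from the observed counts.
d_overall = math.log(279)
d_party = math.log(335 / 279)                  # labour vs conservative, among males
d_gender = math.log(352 / 279)                 # female vs male, among conservatives
d_inter = math.log((279 * 291) / (352 * 335))  # ln odds ratio

# Dummy -> effect coding (+1: labour / female, -1: conservative / male).
e_inter = d_inter / 4                          # value for female labour and male conservative
e_gender_female = d_gender / 2 + d_inter / 4
e_party_labour = d_party / 2 + d_inter / 4
e_overall = d_overall + d_party / 2 + d_gender / 2 + d_inter / 4

# Effect -> dummy coding: signed (weighted) sums of the effect parameters.
back_inter = 4 * e_inter
back_gender = 2 * (e_gender_female - e_inter)
back_party = 2 * (e_party_labour - e_inter)
back_overall = e_overall - e_party_labour - e_gender_female + e_inter

print(round(e_overall, 4), round(e_party_labour, 5), round(e_gender_female, 4), round(e_inter, 4))
print(round(back_overall, 4), round(back_party, 4), round(back_gender, 4), round(back_inter, 4))
```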

Political attitudes The basic model (SPSS)

Political attitudes: the basic model (1).
$\ln \lambda_{11}$ = 5.7456 + 0.0018 - 0.0229 - 0.0933 = 5.6312
$\ln \lambda_{12}$ = 5.7456 + 0.0018 + 0.0229 + 0.0933 = 5.8636
$\ln \lambda_{21}$ = 5.7456 - 0.0018 - 0.0229 + 0.0933 = 5.8142
$\ln \lambda_{22}$ = 5.7456 - 0.0018 + 0.0229 - 0.0933 = 5.6734

The design-matrix approach

Design matrix: unsaturated log-linear model. Number of parameters exceeds number of equations → need for additional equations. X'X is singular, so $(X'X)^{-1}$ does not exist → identify linear dependencies.
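The linear dependencies can be made visible with a rank computation. The sketch below assumes a 2 x 2 table under the independence model with one column per parameter (overall, two row effects, two column effects); the matrix layout and the helper name `rank` are illustrative, not taken from the slides.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix by Gaussian elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    pivots = 0
    for col in range(len(rows[0])):
        # Find a row at or below the current pivot position with a non-zero entry.
        pivot_row = next((r for r in range(pivots, len(rows)) if rows[r][col] != 0), None)
        if pivot_row is None:
            continue  # this column is a combination of earlier columns
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        for r in range(len(rows)):
            if r != pivots and rows[r][col] != 0:
                factor = rows[r][col] / rows[pivots][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots

# Columns: overall, A1, A2, B1, B2; rows: cells (1,1), (1,2), (2,1), (2,2).
x_full = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 0, 1],
]
# Dummy coding drops the reference columns A1 and B1.
x_dummy = [[row[0], row[2], row[4]] for row in x_full]

rank_full = rank(x_full)     # smaller than the 5 columns: X'X is singular
rank_dummy = rank(x_dummy)   # equals the 3 columns: parameters are identified
print(rank_full, rank_dummy)
```

Dropping the reference columns is one choice of "additional equations"; effect coding (sum-to-zero constraints) is another and yields the same rank.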

Design matrix unsaturated log-linear model (additional eq.) Coding!

3 unknowns → 3 equations, where $\hat{\lambda}_{ij}$ is the frequency predicted by the model

Political attitudes

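Political attitudes: predictions of the unsaturated (independence) model.
Males voting conservative: 314.17 * 1.0040 * 0.9772 = 308.23
Males voting labour: 314.17 * [1/1.0040] * 0.9772 = 305.78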
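The three factors in these products can be recovered from the marginal totals of the political-attitudes table. A sketch, assuming effect coding so that the overall factor is the geometric mean of the four expected counts; variable names are illustrative.

```python
import math

# Marginal totals from the observed counts 279, 352, 335, 291.
conservative, labour = 279 + 352, 335 + 291    # 631 and 626
male, female = 279 + 335, 352 + 291            # 614 and 643
n = conservative + labour                      # 1257

# Overall factor: geometric mean of the expected counts under independence.
overall_factor = math.sqrt(conservative * labour * male * female) / n
party_factor = math.sqrt(conservative / labour)   # conservative; labour is the reciprocal
gender_factor = math.sqrt(male / female)          # male; female is the reciprocal

male_conservative = overall_factor * party_factor * gender_factor
male_labour = overall_factor / party_factor * gender_factor

print(round(overall_factor, 2), round(party_factor, 4), round(gender_factor, 4))
print(round(male_conservative, 2), round(male_labour, 2))
```

The small difference between 308.22 here and 308.23 in the slide comes from multiplying rounded factors.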

Design matrix Saturated log-linear model

Political attitudes: predictions of the saturated model.
exp[5.7456 + 0.0018 - 0.0229 - 0.0933] = exp[5.6312] = 279
exp[5.7456 - 0.0018 - 0.0229 + 0.0933] = 335

Political attitudes

Design matrix: other restrictions on parameters saturated log-linear model (SPSS)

Political attitudes

Political attitudes: parameter estimates under two choices of reference category (REF: females labour; REF: males conservative). Odds shown: 335/279 and 352/291.

Political attitudes

Political attitudes. Prediction of counts or frequencies:

A. Effect coding
279 = 312.80 * 0.97736 * 1.00185 * 0.91092
352 = 312.80 * 1.02316 * 1.00185 * 1.09779
335 = 312.80 * 0.97736 * 0.99815 * 1.09779
291 = 312.80 * 1.02316 * 0.99815 * 0.91092

B. Contrast coding: GLIM
291 = 279 * 1.2616 * 1.2007 * 0.6885 (females voting labour)
279 = 279 * 1 * 1 * 1 (males voting conservative = ref. cat.)
352 = 279 * 1.2616 * 1 * 1 (females voting conservative)
335 = 279 * 1 * 1.2007 * 1 (males voting labour)

C. Contrast coding: SPSS (SPSS adds 0.5 to observed values)
279.5 = 291.5 * 1.15096 * 1.20925 * 0.68894 (males voting conservative)
352.5 = 291.5 * 1 * 1.20925 * 1 (females voting conservative)
291.5 = 291.5 * 1 * 1 * 1 (females voting labour = ref. cat.)
335.5 = 291.5 * 1.15096 * 1 * 1 (males voting labour)

The Poisson regression model

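Political attitudes: the Poisson probability model. $\Pr\{N_{ij} = n_{ij}\} = \frac{\lambda_{ij}^{n_{ij}} \exp(-\lambda_{ij})}{n_{ij}!}$ with $\ln \lambda_{ij} = u + u_i^A + u_j^B + u_{ij}^{AB}$.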

Model: Poisson. Design: Constant + DESTIN + ORIGIN.

Parameter   Estimate   SE      Label
1           16.0698    .0002   Overall
2           -.0122     .0002   Destin 1
3           .1594      .0002   Destin 2
4           .5115      .0002   Destin 3
5           .0000      .       Destin 4
6           .0235      .0002   Origin 1
7           .1871      .0002   Origin 2
8           .5051      .0002   Origin 3
9           .0000      .       Origin 4

Hybrid log-linear models. Hybrid log-linear models contain unconventional effect parameters. Interaction effects are restricted in a certain way → restrictions on interaction parameters.

Restrictions on effect parameters.
Some parameter values are fixed: e.g. offset (biproportional adjustment); e.g. quasi-independence model ($\lambda_{ij} = 0$ for i = j).
The relation between some parameter values is fixed: e.g. normalisation restrictions (coding); e.g. hybrid log-linear models.

Examples of hybrid log-linear models. Diagonals parameter model 1: (main) diagonal effect, with $c_k = 1$ for i ≠ j and $c_k = c$ for i = j (diagonal). Off-diagonal elements are independent and diagonal elements are changed by a common factor.

Diagonals parameter model 2: each diagonal element has a separate effect parameter, with $c_k = 1$ for i ≠ j and $c_k = c_i$ for i = j (diagonal). Diagonal elements are predicted perfectly by the model.
Diagonals parameter model 3: the diagonal and each minor diagonal have a unique effect parameter, with k indicating the diagonal: k = R + i - j, where R is the number of rows (or columns). There are 2R-1 values of $c_k$. Application: APC models.

Sufficient statistics. Predicted marginal totals should equal the observed marginal totals (the sufficient statistics). With $S_k$ the set of (i,j) combinations with the same value of $c_k$, the predicted cell frequencies should satisfy $\sum_{(i,j) \in S_k} \hat{\lambda}_{ij} = \sum_{(i,j) \in S_k} n_{ij}$.

Algorithms for hybrid log-linear models:
Generalized iterative scaling algorithm by Darroch and Ratcliff (1972).
Iterative proportional fitting (IPF) applied to the unfolded table.
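To illustrate the idea behind IPF, the sketch below fits the plain independence model to the leaving-home table by alternately scaling rows and columns to the observed margins. This is classic IPF on an ordinary two-way table, not the unfolded-table variant for hybrid models, and the function name `ipf_independence` is illustrative.

```python
observed = [[135, 74], [143, 178]]  # rows: early, late; columns: female, male

def ipf_independence(table, n_iter=10):
    """Iterative proportional fitting of row and column margins."""
    row_targets = [sum(row) for row in table]
    col_targets = [sum(col) for col in zip(*table)]
    # Start from a table of ones (no association).
    fitted = [[1.0] * len(col_targets) for _ in row_targets]
    for _ in range(n_iter):
        # Scale each row to its observed total.
        for i, row in enumerate(fitted):
            row_sum = sum(row)
            fitted[i] = [v * row_targets[i] / row_sum for v in row]
        # Scale each column to its observed total.
        for j in range(len(col_targets)):
            col_sum = sum(fitted[i][j] for i in range(len(fitted)))
            for i in range(len(fitted)):
                fitted[i][j] *= col_targets[j] / col_sum
    return fitted

fitted = ipf_independence(observed)
for row in fitted:
    print([round(v, 2) for v in row])
```

For the independence model the procedure converges after the first cycle and reproduces the expected counts 109.63, 99.37, 168.37 and 152.63 reported earlier.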