1
Introduction to log-linear models
Saturday, February 02, 2019
Analysis of count data: Introduction to log-linear models
Log-linear analysis = analysis on a logarithmic scale
2
Logarithmic scale
Natural logarithm: if y = ln x, then x = exp[y].
x changes exponentially with a linear change in y.
y is measured on the log scale.
3
Logarithmic scale
If ln x = a, then x = exp(a).
If ln x = az and z is discrete, then a one-unit change in z multiplies x by exp(a).
If ln x = az and z is continuous, then the change in x associated with an infinitesimally small change in z is $dx/dz = a \exp(az) = a x$.
4
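Logarithmic scale and logit scale
(First-order) difference in ln is the ln of a ratio: $\ln a - \ln b = \ln(a/b)$
Second-order difference is the ln of an odds ratio: $(\ln a - \ln b) - (\ln c - \ln d) = \ln\frac{ad}{bc} = \ln \mathrm{OR}$
If ln OR = 1.2 and ln a = -ln b = -ln c = ln d, then ln a = ln d = 0.3 and ln b = ln c = -0.3 (each term is one quarter of ln OR).
ln(odds) = logit: if y = f(x) and y = ln(a/b), then y is measured on the logit scale.
Keywords: odds, odds ratio, coding.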
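The link between differences of logarithms and the odds ratio can be checked numerically. The following is a minimal sketch; the function name `log_odds_ratio` and the cell labels a, b, c, d are illustrative and not taken from the slides. In the symmetric case ln a = -ln b = -ln c = ln d with ln OR = 1.2, each term must equal 0.3 in absolute value, because ln OR = 4 ln a.

```python
import math

def log_odds_ratio(a, b, c, d):
    """Second-order difference of logs for a 2 x 2 table [[a, b], [c, d]]."""
    logit_row1 = math.log(a) - math.log(b)   # first-order difference = ln(a/b)
    logit_row2 = math.log(c) - math.log(d)   # first-order difference = ln(c/d)
    return logit_row1 - logit_row2           # = ln[(a*d)/(b*c)]

# Symmetric case: ln a = -ln b = -ln c = ln d = 0.3
a = d = math.exp(0.3)
b = c = math.exp(-0.3)
ln_or = log_odds_ratio(a, b, c, d)
print(round(ln_or, 4))   # 1.2
```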
5
Log-linear analysis
Also known as:
Contingency-table analysis
Categorical data analysis
Discrete multivariate analysis (Bishop, Fienberg and Holland, 1975)
Analysis of cross-classified data
Multivariate analysis of qualitative data (Goodman, 1978)
Count data analysis
6
Log-linear model: fit a model to a table of counts / frequencies
Two data sets:
Survey: political attitudes of British electors
Survey: leaving parental home in the Netherlands
7
Survey: political attitudes of British electors
Source: Payne, C. (1977) The log-linear model for contingency tables. In: C. O'Muircheartaigh and C. Payne eds. The analysis of survey data. Vol 2: Model fitting, Wiley, New York, pp [data p. 106]. (From Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974.)
8
Survey: leaving parental home in the Netherlands
9
Counts are generated by a Poisson process: Poisson distribution
10
The Poisson probability model
Let N be a random variable representing the number of events during a unit interval and let n be a realisation of N (COUNT). N is a Poisson r.v. following a Poisson distribution with parameter $\lambda$:
$\Pr\{N = n\} = \frac{\lambda^n \exp(-\lambda)}{n!}$
The parameter $\lambda$ is the expected number of events per unit time interval: $\lambda = E[N]$
12
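Likelihood function
Probability mass function: $\Pr\{N = n\} = \frac{\lambda^n \exp(-\lambda)}{n!}$
Log-likelihood function: $\ln L(\lambda) = n \ln\lambda - \lambda - \ln n!$
Likelihood equations to determine the ‘best’ value of the parameter $\lambda$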
13
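Likelihood equations
$\frac{\partial \ln L}{\partial \lambda} = \frac{n}{\lambda} - 1 = 0$
Hence: $\hat{\lambda} = n$
For the Poisson distribution: Var(N) = $\lambda$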
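The likelihood equation can be checked numerically for a sample of several counts, where its solution is the sample mean. This is a sketch under stated assumptions: the counts are hypothetical and `poisson_log_likelihood` is an illustrative helper, not something defined on the slides.

```python
import math

def poisson_log_likelihood(lam, counts):
    """Sum of ln Pr{N = n} = n ln(lam) - lam - ln(n!) over the observed counts."""
    return sum(n * math.log(lam) - lam - math.lgamma(n + 1) for n in counts)

counts = [3, 0, 2, 4, 1]              # hypothetical counts, not from the slides
lam_hat = sum(counts) / len(counts)   # solution of the likelihood equation

# The log-likelihood at lam_hat is larger than at nearby parameter values.
best = poisson_log_likelihood(lam_hat, counts)
nearby = [poisson_log_likelihood(lam_hat + step, counts)
          for step in (-0.5, -0.1, 0.1, 0.5)]
print(lam_hat, all(best > value for value in nearby))
```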
14
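Log-linear model
Let i represent an individual with characteristics $x_i$.
The probability of observing $n_i$ events during a unit interval, given that the expected number of events is $\lambda_i$:
$\Pr\{N_i = n_i\} = \frac{\lambda_i^{n_i} \exp(-\lambda_i)}{n_i!}$
with $\lambda_i = \exp(\beta_0 + \beta_1 x_i)$, or $\ln \lambda_i = \beta_0 + \beta_1 x_i$: the log-linear model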
15
The log-linear model The objective of log-linear analysis is to determine if the distribution of counts among the cells of a table can be explained by a simpler, underlying structure. Log-linear models specify different structures in terms of the cross-classified variables (rows, columns and layers of the table).
16
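Log-linear models for two-way tables
Saturated log-linear model: $\ln \lambda_{ij} = u + u_i^A + u_j^B + u_{ij}^{AB}$
$u$: overall effect (level)
$u_i^A$, $u_j^B$: main effects (marginal freq.)
$u_{ij}^{AB}$: interaction effect
In case of a 2 x 2 table: 4 observations and 9 parameters (1 + 2 + 2 + 4), hence the need for normalisation constraints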
17
Survey: leaving parental home in the Netherlands
Research question: do females leave home earlier than males?
18
Descriptive statistics
Leaving home
Tables: counts; percentages; odds of leaving home early rather than late (with reference category)
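The descriptive tables can be rebuilt from the cell counts. In this sketch the counts 135, 74 and 143 are the ones that appear on later slides; the fourth count, 178, is an assumption inferred from the total of 530 and the reported 33.58%. The variable names are illustrative.

```python
# Counts by timing and sex. 135, 74 and 143 appear on later slides;
# 178 is inferred from the total of 530 (an assumption of this sketch).
counts = {
    ("early", "female"): 135, ("early", "male"): 74,
    ("late", "female"): 143, ("late", "male"): 178,
}
total = sum(counts.values())
percentages = {cell: round(100 * n / total, 2) for cell, n in counts.items()}

# Odds of leaving home early rather than late, by sex
odds_female = counts[("early", "female")] / counts[("late", "female")]
odds_male = counts[("early", "male")] / counts[("late", "male")]
odds_ratio = odds_female / odds_male

print(total, percentages)
print(round(odds_female, 4), round(odds_male, 4), round(odds_ratio, 4))
```

The percentages agree with the observed percentages in the SPSS output (25.47, 13.96, 26.98 and 33.58).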
19
Log-linear models for two-way tables: 4 models
Leaving home
Model 1: Null model or overall effect model
All categories are equiprobable (an observation is equally likely to fall into any cell): $\ln \lambda_{ij} = u$ for all i and j
Estimate of the overall effect: 4.887; Exp(4.887) = 132.5 = 530/4
$\lambda_{ij}$ is the expected count (frequency) in cell (ij): category i of variable A (row) and category j of variable B (column)
20
Leaving home
Where $\lambda_{ij}$ is a cell frequency generated by a Poisson process and $\mathrm{Var}[aX] = a^2 \mathrm{Var}[X]$, where a is a constant (e.g. Fingleton, 1984, p. 29)
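The null-model estimate and a standard error for it can be reproduced as follows. This is only a sketch: the standard error uses the rule Var[aX] = a^2 Var[X] together with a first-order (delta-method) approximation for the logarithm, which is an assumption of the sketch rather than a derivation shown on the slides.

```python
import math

total = 530                       # total number of respondents in the 2 x 2 table
cells = 4
lam_hat = total / cells           # expected count per cell under the null model
u_hat = math.log(lam_hat)         # overall effect on the log scale

# Var[lam_hat] = Var[sum of counts] / 16 = (4 * lam) / 16, using Var[aX] = a^2 Var[X]
var_lam_hat = cells * lam_hat / cells**2
# Delta method (assumption): Var[ln lam_hat] is approximately Var[lam_hat] / lam_hat^2
se_u_hat = math.sqrt(var_lam_hat) / lam_hat

print(round(lam_hat, 1), round(u_hat, 3), round(se_u_hat, 4))
```

Under these assumptions the standard error simplifies to 1/sqrt(530).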
21
Log-linear models for two-way tables
Leaving home
Model 2: B null model (GLIM)
Categories of variable B (sex) are equiprobable within levels of variable A (age; time): $\ln \lambda_{ij} = u + u_i^A$ for all j
Table columns: Parameter, GLIM estimate, s.e., Exp(parameter), Prediction
Rows: Overall effect, TIME(1), TIME(2)
Exp(overall effect) = 209/2; Exp(TIME(2)) = [321/2]/104.5
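The two predictions 209/2 and [321/2]/104.5 can be turned into parameter values on the log scale. A minimal sketch, assuming the first timing category is the reference cell (as the expression [321/2]/104.5 suggests); variable names are illustrative.

```python
import math

early_total, late_total = 209, 321       # marginal totals of timing (sum to 530)

# Within each timing category the two sexes get the same expected count.
expected_early = early_total / 2         # 104.5
expected_late = late_total / 2           # 160.5

overall_effect = math.log(expected_early)                # reference: TIME(1)
time2_effect = math.log(expected_late / expected_early)  # contrast TIME(2) vs TIME(1)

print(round(overall_effect, 4), round(time2_effect, 4))
```

As percentages of 530 the expected counts are 19.72 and 30.28, the values listed in the SPSS output for the design Constant + TIMING.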
22
Log-linear models for two-way tables
Leaving home
Model 2: B null model (SPSS)
Categories of variable B (sex) are equiprobable within levels of variable A (time): $\ln \lambda_{ij} = u + u_i^A$ for all j
Table columns: Parameter, SPSS estimate, s.e., Exp(parameter)
Rows: Overall effect, TIME(1), TIME(2)
23
SPSS syntax:
GENLOG timing sex
  /MODEL=POISSON
  /PRINT FREQ ESTIM CORR COV
  /PLOT NONE
  /CRITERIA=CIN(95) ITERATE(20) CONVERGE(.001) DELTA(0)
  /DESIGN timing
  /SAVE PRED.

SPSS output:
Model: Poisson
Design: Constant + TIMING

Factor                       Observed %   Expected %
TIMING Early, SEX Females    ( 25.47)     ( 19.72)
TIMING Early, SEX Males      ( 13.96)     ( 19.72)
TIMING Late,  SEX Females    ( 26.98)     ( 30.28)
TIMING Late,  SEX Males      ( 33.58)     ( 30.28)

Parameter Estimates (columns: Parameter, Estimate, SE, Asymptotic 95% CI Lower and Upper)
24
SPSS syntax:
GENLOG timing sex
  /MODEL=POISSON
  /PRINT FREQ ESTIM CORR COV
  /CRITERIA=CIN(95) ITERATE(20) CONVERGE(.001) DELTA(0)
  /DESIGN sex timing
  /SAVE PRED.

SPSS output:
Model: Poisson
Design: Constant + SEX + TIMING

Table Information
Factor                       Observed %   Expected %
TIMING Early, SEX Females    ( 25.47)     ( 20.68)
TIMING Early, SEX Males      ( 13.96)     ( 18.75)
TIMING Late,  SEX Females    ( 26.98)     ( 31.77)
TIMING Late,  SEX Males      ( 33.58)     ( 28.80)

Parameter Estimates (columns: Parameter, Estimate, SE, Asymptotic 95% CI Lower and Upper)
Parameters: Constant, [SEX = 1], [SEX = 2], [TIMING = 1], [TIMING = 2]
25
Log-linear models for two-way tables
Leaving home
Model 3: independence model (unsaturated model)
Categories of variable B (sex) are not equiprobable, but the probability is independent of levels of variable A (age; time): $\ln \lambda_{ij} = u + u_i^A + u_j^B$
Table columns (GLIM): Parameter, estimate, s.e., Exp(parameter)
Rows: Overall effect, TIME(2), SEX(2)
26
LOG-LINEAR MODEL: predictions (unsaturated model)
Females leaving home early: 109.62
Females leaving home late: 109.62 * exp[TIME(2)]
Males leaving home early: 109.62 * exp[SEX(2)] = 99.37
Males leaving home late: 109.62 * exp[TIME(2)] * exp[SEX(2)]
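Under independence the expected count in a cell is (row total * column total) / grand total, which gives the same predictions as the product of exponentiated parameters. A sketch, assuming sex totals of 278 and 252 (inferred from the cell counts, not printed on the slides):

```python
rows = {"early": 209, "late": 321}        # timing totals
cols = {"female": 278, "male": 252}       # sex totals (inferred, see lead-in)
total = 530

# Expected count under independence: row total * column total / grand total
expected = {(t, s): rows[t] * cols[s] / total for t in rows for s in cols}
for cell, value in expected.items():
    print(cell, round(value, 2))
```

The result for males leaving home early matches the 99.37 on the slide; the value for females leaving home early agrees with 109.62 up to rounding.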
27
Leaving home (SPSS)
Parameter   Estimate   SE
1           5.0280     .0721    Overall effect
Remaining parameters: Time(1), Time(2), Sex(1), Sex(2)
28
Log-linear models for two-way tables
Leaving home
Model 4: saturated model
The values of categories of variable B (sex) depend on levels of variable A (age; time): $\ln \lambda_{ij} = u + u_i^A + u_j^B + u_{ij}^{AB}$
Table columns (GLIM): parameter, estimate, s.e.
Overall effect: ln 135
TIME(2): ln 143 - ln 135 (ln odds)
SEX(2): ln 74 - ln 135 (ln odds)
TIME(2).SEX(2): ln odds ratio
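With dummy coding and the cell (early, female) as reference, the saturated-model parameters are simple functions of the observed counts. The sketch below assumes a fourth count of 178 (inferred from the total of 530); variable names are illustrative.

```python
import math

# Cell counts: 135, 74 and 143 are on the slides; 178 is inferred from the total of 530.
n_early_f, n_early_m, n_late_f, n_late_m = 135, 74, 143, 178

overall = math.log(n_early_f)                       # ln frequency of the reference cell
time2 = math.log(n_late_f) - math.log(n_early_f)    # ln odds late vs early, females
sex2 = math.log(n_early_m) - math.log(n_early_f)    # ln odds male vs female, early leavers
interaction = math.log(n_late_m * n_early_f / (n_late_f * n_early_m))  # ln odds ratio

# The four parameters reproduce the fourth observed count exactly.
fitted_late_m = math.exp(overall + time2 + sex2 + interaction)
print(round(overall, 4), round(time2, 4), round(sex2, 4), round(interaction, 4))
print(round(fitted_late_m, 6))
```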
29
Log-linear model parameters and odds and odds ratios
Dummy-variable coding. Reference categories: early / female
Interaction effect: ln odds ratio
Main effects: ln odds(reference category)
Time effect: ln odds(females) = ln 143/135 = ln 1.0593 = 0.0576
Sex effect: ln odds(early) = ln 74/135 = ln 0.5481 = -0.6012
Overall effect: ln frequency
ln frequency(early, female) = ln 135 = 4.9053
30
Leaving home (SPSS)
Parameter   Estimate   SE
1           5.1846     .0748    Overall effect
Remaining parameters: Time(1), Time(2), Sex(1), Sex(2), Time(1) * Sex(1), Time(1) * Sex(2), Time(2) * Sex(1), Time(2) * Sex(2)
31
LOG-LINEAR MODEL: predictions
Leaving home: expected frequencies
Table columns: Observed, Model 1, Model 2, Model 3, Model 4, Model 5
Rows: Fem_<20, Mal_<20, Fem_>20, Mal_>20
32
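Relation log-linear model and Poisson regression model
$\ln \lambda_{ij} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
where $x_1$ and $x_2$ are dummy variables (0 if categ. i or j = 1 and 1 if i or j = 2) and the interaction variable is $x_1 x_2$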
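Written as a Poisson regression, each cell of the 2 x 2 table becomes one row of a design matrix with a constant, two dummy variables and their product. The sketch below only illustrates that structure; the coefficient values in `beta` are hypothetical and the names `design` and `predict` are not from the slides.

```python
import math

# One row per cell: constant, x1 (dummy for i), x2 (dummy for j), x1 * x2 (interaction).
cells = [(0, 0), (1, 0), (0, 1), (1, 1)]            # (x1, x2) for the four cells
design = [[1, x1, x2, x1 * x2] for x1, x2 in cells]

# Hypothetical coefficients b0..b3, for illustration only.
beta = [5.0, 0.2, -0.4, 0.6]

def predict(design_row, coefficients):
    """Expected count exp(x'b) for one cell."""
    linear_predictor = sum(x * b for x, b in zip(design_row, coefficients))
    return math.exp(linear_predictor)

expected = [predict(row, beta) for row in design]
print(design)
print([round(value, 2) for value in expected])
```

The interaction coefficient only enters the cell where both dummies equal 1, which is why it is interpreted as a ln odds ratio.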
33
Log-linear model: fit a model to a table of frequencies
Data: survey of political attitudes of British electors
Source: Payne, C. (1977) The log-linear model for contingency tables. In: C. O'Muircheartaigh and C. Payne eds. The analysis of survey data. Vol 2: Model fitting, Wiley, New York, pp [data p. 106]. (From Butler and Stokes, ‘Political change in Britain’, Macmillan, 2nd edition, 1974.)
34
The classical approach
Geometric means (Birch, 1963)
Effect coding (mean is ref. cat.)
Birch, M.W. (1963) ‘Maximum likelihood in three-way contingency tables’, J. Royal Stat. Soc. (B), 25:
35
The basic model
Political attitudes
Overall effect: 22.98/4 = 5.7456
Effect of party (half the sum of the ln counts in the category, minus the overall effect):
  Conservative: sum of ln counts 11.49
  Labour: sum of ln counts 11.49
Effect of gender:
  Male: sum of ln counts 11.44
  Female: sum of ln counts 11.54
Interaction effects (gender-party interaction): male conservative, female conservative, male labour, female labour
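The effect-coded parameters of the classical approach are deviations of mean ln counts (geometric means on the count scale) from the overall mean. A minimal sketch using the four counts that appear on the later slides (279, 352, 335, 291); the helper names are illustrative.

```python
import math

# Observed counts (party, gender), as used on the later slides.
counts = {
    ("conservative", "male"): 279, ("conservative", "female"): 352,
    ("labour", "male"): 335, ("labour", "female"): 291,
}
log_counts = {cell: math.log(n) for cell, n in counts.items()}

overall = sum(log_counts.values()) / 4   # mean of the four ln counts

def party_effect(party):
    """Mean ln count within the party category minus the overall effect."""
    return sum(v for (p, _), v in log_counts.items() if p == party) / 2 - overall

def gender_effect(gender):
    """Mean ln count within the gender category minus the overall effect."""
    return sum(v for (_, g), v in log_counts.items() if g == gender) / 2 - overall

def interaction_effect(party, gender):
    """What remains of the ln count after overall and main effects."""
    return (log_counts[(party, gender)] - overall
            - party_effect(party) - gender_effect(gender))

print(round(overall, 4))
print(round(party_effect("conservative"), 4), round(gender_effect("female"), 4))
print(round(interaction_effect("conservative", "female"), 4))
```

Working at full precision avoids the rounding visible in the sums 11.49, 11.44 and 11.54.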
36
Parameters are subject to constraints: normalisation constraints
Political attitudes: the basic model (Birch, 1963)
Coding: effect coding
Normalisation constraints: $\sum_i u_i^A = \sum_j u_j^B = 0$ and $\sum_i u_{ij}^{AB} = \sum_j u_{ij}^{AB} = 0$
Only first-order contrasts can be estimated:
37
Political attitudes The basic model (GLIM) Estimate S.E.
38
Log-linear model parameters and odds and odds ratios
Dummy-variable coding. Reference categories: conservative / male
Interaction effect: ln odds ratio
Main effects: ln odds(reference category)
Party effect: ln odds(males) = ln 335/279 = ln 1.2007 = 0.1829
Gender effect: ln odds(conservatives) = ln 352/279 = ln 1.2616 = 0.2324
Overall effect: ln frequency
ln frequency(conservatives, males) = ln 279 = 5.6312
39
Log-linear model parameters and odds and odds ratios
Recall: translation from odds to probabilities, if you want to predict probabilities or proportions instead of odds:
$p = \frac{\text{odds}}{1 + \text{odds}}$
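The translation from odds to probabilities is p = odds / (1 + odds). A short sketch; the function name is illustrative, and the example uses the odds of female rather than male among conservatives (352/279).

```python
def odds_to_probability(odds):
    """Convert odds p / (1 - p) back to the probability p."""
    return odds / (1.0 + odds)

# Odds of female rather than male among conservatives: 352 / 279
p_female = odds_to_probability(352 / 279)
print(round(p_female, 4))   # 0.5578, i.e. 352 / (352 + 279)
```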
40
Log-linear model parameters and odds and odds ratios
Effect coding: +1: labour / female; -1: conservative / male
Interaction effect (dummy coding): ln odds ratio = ln[(279 * 291)/(352 * 335)] = -0.3732
Translation between dummy-variable coding and effect coding (Alba, 1987): sign * (ln odds ratio)/4
  Male conservative: (+1)(-0.3732/4) = -0.0933
  Female conservative: (-1)(-0.3732/4) = 0.0933
  Male labour: (-1)(-0.3732/4) = 0.0933
  Female labour: (+1)(-0.3732/4) = -0.0933
Translation between effect coding and dummy-variable coding: WEIGHTED SUM
  (+1)(-0.0933) + (-1)(0.0933) + (-1)(0.0933) + (+1)(-0.0933) = -0.3732
41
Log-linear model parameters and odds and odds ratios
Effect coding: +1: labour / female; -1: conservative / male
Main effects: ln odds(reference category)
Gender effect (dummy coding): ln odds(conservatives) = ln 352/279 = ln 1.2616 = 0.2324
Translation to effect coding: sign * [(ln odds)/2 + (ln odds ratio)/4]
  Female: 0.2324/2 - 0.3732/4 = 0.0229
  Male: -(0.2324/2 - 0.3732/4) = -0.0229
Translation to dummy coding: WEIGHTED SUM
  (+1)(female conservative) + (-1)(male conservative) = 0.2324
42
Log-linear model parameters and odds and odds ratios
Effect coding: +1: labour / female; -1: conservative / male
Main effects: ln odds(reference category)
Party effect (dummy coding): ln odds(males) = ln 335/279 = ln 1.2007 = 0.1829
Translation to effect coding: sign * [(ln odds)/2 + (ln odds ratio)/4]
  Conservative: -(0.1829/2 - 0.3732/4) = 0.0018
  Labour: 0.1829/2 - 0.3732/4 = -0.0018
Translation to dummy coding: WEIGHTED SUM
  (-1)(conservative male) + (+1)(labour male) = 0.1829
43
Log-linear model parameters and odds and odds ratios
Effect coding: +1: labour / female; -1: conservative / male
Overall effect (dummy coding): ln frequency(conservatives, males) = ln 279 = 5.6312
Translation to effect coding: ln frequency + (ln odds)/2 + (ln odds)/2 + (ln odds ratio)/4
  5.6312 + 0.1829/2 + 0.2324/2 - 0.3732/4 = 5.7456
Translation to dummy coding: WEIGHTED SUM
  (+1)[overall + conservative + male + conservative male] = 5.6312
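The translation from dummy coding to effect coding can be verified at full precision for all four parameters at once. This is a sketch of the (ln odds)/2 and (ln odds ratio)/4 rules for the 2 x 2 table, with illustrative variable names; the suffix `_d` marks dummy-coded and `_e` effect-coded values.

```python
import math

# Dummy-coded parameters (reference: conservative, male)
overall_d = math.log(279)
party_d = math.log(335 / 279)                    # ln odds labour vs conservative, males
gender_d = math.log(352 / 279)                   # ln odds female vs male, conservatives
inter_d = math.log(291 * 279 / (352 * 335))      # ln odds ratio

# Effect-coded parameters for the +1 categories (labour, female)
inter_e = inter_d / 4
party_e = party_d / 2 + inter_d / 4
gender_e = gender_d / 2 + inter_d / 4
overall_e = overall_d + party_d / 2 + gender_d / 2 + inter_d / 4

print(round(overall_e, 4), round(party_e, 4), round(gender_e, 4), round(inter_e, 4))
```

The overall effect obtained this way equals the mean of the four ln counts, 22.98/4 = 5.7456.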
44
Political attitudes The basic model (SPSS)
45
The basic model (1) Political attitudes
ln 11 = = ln 12 = = ln 21 = = ln 22 = =
46
The design-matrix approach
47
Design matrix unsaturated log-linear model
Number of parameters exceeds number of equations: need for additional equations
X’X is singular, so (X’X)^-1 does not exist: identify linear dependencies
48
Design matrix unsaturated log-linear model
(additional eq.) Coding!
49
3 unknowns 3 equations where is the frequency predicted by the model
50
Political attitudes
51
Political attitudes 314.17*1.0040*0.9772 = 308.23
314.17*[1/1.0040]* =
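The product 314.17 * 1.0040 * 0.9772 can be reproduced from the marginal totals, assuming it is the independence-model prediction for conservative males under effect coding (an interpretation of the slide, not stated on it). The marginal totals are computed from the four counts 279, 352, 335 and 291.

```python
import math

party_totals = {"conservative": 279 + 352, "labour": 335 + 291}
gender_totals = {"male": 279 + 335, "female": 352 + 291}
total = sum(party_totals.values())

# Expected counts under independence
expected = {(p, g): party_totals[p] * gender_totals[g] / total
            for p in party_totals for g in gender_totals}

# Effect coding on the multiplicative scale: geometric mean and deviations from it
overall = math.exp(sum(math.log(v) for v in expected.values()) / 4)
party_factor = math.sqrt(party_totals["conservative"] / party_totals["labour"])
gender_factor = math.sqrt(gender_totals["male"] / gender_totals["female"])

male_conservative = overall * party_factor * gender_factor
print(round(overall, 2), round(party_factor, 4), round(gender_factor, 4))
print(round(male_conservative, 2), round(expected[("conservative", "male")], 2))
```

Small differences in the last digit relative to the slide come from multiplying rounded factors.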
52
Design matrix Saturated log-linear model
53
Political attitudes exp[ ] = exp[5.6312] = 279 exp[ ] = 335
54
Political attitudes
55
Design matrix: other restrictions on parameters saturated log-linear model
(SPSS)
56
Political attitudes
57
Political attitudes REF: females labour REF: males conservative
335/279 352/291 REF: females labour REF: males conservative
58
Political attitudes
59
Prediction of counts or frequencies:
Political attitudes
A. Effect coding
  279 = * * *
  352 = * * *
  335 = * * *
  291 = * * *
B. Contrast coding: GLIM
  291 = 279 * * * (females voting labour)
  279 = 279 * * * (males voting conservative = ref.cat)
  352 = 279 * * * (females voting conservative)
  335 = 279 * * * (males voting labour)
C. Contrast coding: SPSS (SPSS adds 0.5 to observed values)
  279.5 = * * *
  352.5 = * * * 1
  291.5 = * * * 1 (females voting labour = ref.cat)
  335.5 = * * * 1
60
The Poisson regression model
61
The Poisson probability model
Political attitudes The Poisson probability model with
65
Design: Constant + DESTIN + ORIGIN
Model: Poisson Design: Constant + DESTIN + ORIGIN Parameter Estimate SE Overall Destin 1 Destin 2 Destin 3 Destin 4 Origin 1 Origin 2 Origin 3 Origin 4
66
Hybrid log-linear models
Hybrid log-linear models contain unconventional effect parameters: interaction effects are restricted in a certain way (restrictions on interaction parameters).
67
Restrictions on effect parameters
Some parameter values are fixed
  e.g. offset (biproportional adjustment)
  e.g. quasi-independence model (ij = 0 for i=j)
Relation between some parameter values is fixed
  e.g. normalisation restrictions (coding)
  e.g. hybrid log-linear models
68
Examples of hybrid log-linear models
Diagonals parameter model 1: (main) diagonal effect
With $c_k = 1$ for $i \neq j$ and $c_k = c$ for $i = j$ (diagonal)
Off-diagonal elements are independent and diagonal elements are changed by a common factor.
69
Diagonals parameter model 2: each diagonal element has a separate effect parameter
$c_k = 1$ for $i \neq j$ and $c_k = c_i$ for $i = j$ (diagonal)
Diagonal elements are predicted perfectly by the model
Diagonals parameter model 3: the diagonal and each minor diagonal have a unique effect parameter
With k indicating the diagonal: k = R + i - j, where R is the number of rows (or columns). There are 2R-1 values of $c_k$.
Application: APC models
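The indexing k = R + i - j can be made concrete by printing the index for every cell of a small square table. A minimal sketch with R = 4; the function name `diagonal_index` is illustrative.

```python
def diagonal_index(R):
    """Matrix of k = R + i - j for i, j = 1..R (each diagonal gets its own k)."""
    return [[R + i - j for j in range(1, R + 1)] for i in range(1, R + 1)]

R = 4
k = diagonal_index(R)
for row in k:
    print(row)

distinct = sorted({value for row in k for value in row})
print(len(distinct))   # 2R - 1 = 7 distinct diagonals
```

The main diagonal always has k = R, and the values run from 1 to 2R-1.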
70
Sufficient statistics
Predicted marginal totals should satisfy the sufficient statistics
Model: with $S_k$ the set of (i,j)-combinations with the same value of $c_k$.
Predicted cell frequencies should satisfy: or with
71
Algorithms for hybrid log-linear models
Generalized iterative scaling algorithm by Darroch and Ratcliff (1972)
Iterative proportional fitting (IPF) applied to unfolded table
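As an illustration of iterative proportional fitting for a hybrid model, the sketch below fits diagonals parameter model 1 by cycling over three sets of sufficient statistics: row totals, column totals, and the totals of the diagonal and off-diagonal cells. It is a simplified stand-in, not the generalized iterative scaling algorithm and not necessarily the "unfolded table" procedure meant on the slide; the 3 x 3 table is hypothetical.

```python
def fit_diagonal_model(table, iterations=1000):
    """IPF for independence plus one common main-diagonal parameter."""
    size = len(table)
    cells = [(i, j) for i in range(size) for j in range(size)]
    # Each partition of the cells corresponds to one set of sufficient statistics.
    partitions = [
        [[(i, j) for j in range(size)] for i in range(size)],      # rows
        [[(i, j) for i in range(size)] for j in range(size)],      # columns
        [[c for c in cells if c[0] == c[1]],                       # S_k: diagonal
         [c for c in cells if c[0] != c[1]]],                      # S_k: off-diagonal
    ]
    fitted = {cell: 1.0 for cell in cells}
    for _ in range(iterations):
        for partition in partitions:
            for group in partition:
                observed = sum(table[i][j] for i, j in group)
                predicted = sum(fitted[cell] for cell in group)
                for cell in group:
                    fitted[cell] *= observed / predicted   # proportional adjustment
    return fitted

# Hypothetical 3 x 3 table (not from the slides)
table = [[50, 10, 5], [12, 40, 8], [6, 9, 30]]
fitted = fit_diagonal_model(table)
row_sums = [sum(fitted[(i, j)] for j in range(3)) for i in range(3)]
diagonal_sum = sum(fitted[(i, i)] for i in range(3))
print([round(v, 3) for v in row_sums], round(diagonal_sum, 3))
```

At convergence the fitted row totals, column totals and diagonal total equal the observed ones, while the off-diagonal cells keep the independence structure.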