(Hierarchical) Log-Linear Models Friday 18 th March 2011.

(Hierarchical) Log-Linear Models Friday 18 th March 2011

Hierarchical log-linear models These are models which are applied to multi- way cross-tabulations, and hence categorical data They focus on the presence or absence of relationships between the variables defining the cross-tabulation More sophisticated models can also take into account the form of relationship that exists between two variables, but we will not consider these models in this module…

A standard form of notation for (hierarchical) log- linear models labels each variable with a letter, and places the effects of/relationships between these variables within square brackets. Suppose, for example, the topic of interest is intergenerational social class mobility. If parental class is labelled ‘P’ and a child’s own social class is labelled ‘O’, then, within a model: [ P ] would indicate the inclusion of the parental class variable, [ PO ] would indicate a relationship between parental class and child’s own class.

A bivariate analysis Bivariate (hierarchical) log-linear models are of limited interest, but for illustrative purposes, there are two models of a two-way cross- tabulation: [ P ] [ O ], the ‘independence model’, which indicates that the two variables are unrelated. [ PO ], an example of a ‘saturated model’, wherein all of the variables are related to each other simultaneously (i.e. in this simplest form of saturated model, the two variables are related).

‘Goodness (or ‘badness’)-of-fit The model [ PO ] is consistent with any observed relationship in a cross- tabulation, and hence, by definition, fits the observed data perfectly. It is therefore said to have a ‘goodness-of- fit’ value of 0. (Note that measures of ‘goodness-of-fit’ typically measure badness of fit!)

Turning to the independence model, the ‘goodness-of-fit’ of [ P ] [ O ] can be viewed as equivalent to the chi-square statistic, as this summarises the evidence of a relationship, and hence the evidence that the (null) hypothesis of independence, i.e. the independence model, is incorrect. In fact, it is the likelihood ratio chi-square statistic from SPSS output for a cross-tabulation which is relevant here. A chi-square test is thus, in effect, a comparison (and choice) between two possible models of a two-way cross-tabulation.

A multivariate analysis Suppose that one was interested in whether the extent of social mobility was changing over time (i.e. between birth cohorts). Then we would need to include in any model a third variable, i.e. birth cohort, represented by ‘C’.

A wider choice of models… For a three-way cross-tabulation, there are a greater number of possible hierarchical models of the cross-tabulation: The ‘independence model’ [ P ] [ O ] [ C ], The ‘saturated model’ [ POC ], which indicates that the relationship between parental class and child’s own class varies according to birth cohort, and…

…various other models in between these: [ PO ] [ C ] [ PC ] [ O ] [ OC ] [ P ] [ PO ] [ PC ] [ PO ] [ OC ] [ PC ] [ OC ] [ PO ] [ PC ] [ OC ]

How does one know which model is best? Each model has a chi-square-like ‘goodness-of- fit’ measure, often referred to as the model’s deviance, which can be used to test whether the observed data is significantly different from what one would expect to have seen given that model. In other words, to quantify how likely it is that the difference(s) between the observed data and the model’s predictions would have occurred simply as a consequence of sampling error.

The difference between the deviance values for two models can be used, in a similar way, to test whether the more complex of the two models fits significantly better. In other words, does the additional element of the model improve the model’s fit more than can reasonably be attributed to sampling error? So, ideally, the ‘best model’ fits the data in absolute terms, but also does not fit the data substantially less well than any more complex model does. [Note that the ‘saturated model’ fits by definition, and has a value of 0 for the deviance measure.]

…back to the example! If the (null) hypothesis of interest is that the extent of social mobility is not changing over time (i.e. between birth cohorts), then the most complex model corresponding to this is as follows: [ PO ] [ PC ] [ OC ] The question now becomes, does this fit better than the model that specifies change over time, namely: [ POC ]

Where does the deviance measure come from? The deviance of a model is calculated as: -2 log likelihood where ‘likelihood’ refers to the likelihood of the specified model having produced the observed data. However, it behaves much like a conventional chi-square statistic.

What about degrees of freedom? Each model deviance value has an associated number of degrees of freedom, relating to the various relationships between variables that are not included in the model. Hence the ‘saturated model’ has zero degrees of freedom. If the three variables, P, O and C had a, b and c categories, then the ‘independence model’ would have (a x b x c) – (a + b + c) + 2 degrees of freedom, e.g. 4 degrees of freedom if all the variables had two categories each.

Degrees of freedom for interactions If two variables with interact, e.g. [ PO ], then this interaction term within a model (assuming the variables had a and b categories respectively) would have: (a-1) x (b-1) degrees of freedom, i.e. the same number of degrees of freedom as the chi-square statistic for a two-way cross-tabulation with those numbers of rows and columns.

(Hierarchical) Log-Linear Models Friday 18 th March 2011.

Similar presentations

Presentation on theme: "(Hierarchical) Log-Linear Models Friday 18 th March 2011."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

(Hierarchical) Log-Linear Models Friday 18 th March 2011.

Similar presentations

Presentation on theme: "(Hierarchical) Log-Linear Models Friday 18 th March 2011."— Presentation transcript:

Similar presentations

About project

Feedback