(Hierarchical) Log-Linear Models Friday 18th March 2011

Hierarchical log-linear models

These are models applied to multi-way cross-tabulations, and hence to categorical data. They focus on the presence or absence of relationships between the variables defining the cross-tabulation. More sophisticated models can also take into account the form of the relationship that exists between two variables, but we will not consider those models in this module…

A standard form of notation for (hierarchical) log-linear models labels each variable with a letter, and places the effects of/relationships between these variables within square brackets. Suppose, for example, that the topic of interest is intergenerational social class mobility. If parental class is labelled ‘P’ and a child’s own social class is labelled ‘O’, then, within a model:

[P] would indicate the inclusion of the parental class variable;

[PO] would indicate a relationship between parental class and child’s own class.

A bivariate analysis

Bivariate (hierarchical) log-linear models are of limited interest, but for illustrative purposes, there are two models of a two-way cross-tabulation:

[P] [O], the ‘independence model’, which indicates that the two variables are unrelated;

[PO], an example of a ‘saturated model’, wherein all of the variables are related to each other simultaneously (i.e. in this simplest form of saturated model, the two variables are related).
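
As an illustrative aside (not from the original slides, which use SPSS): a log-linear model can be fitted as a Poisson regression on the cell counts of the cross-tabulation. Below is a minimal sketch in Python using the statsmodels library; the class labels and counts are invented purely for illustration.

```python
# A minimal sketch: the labels and counts below are invented. Log-linear
# models can be fitted as Poisson regressions on the cell counts.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Flattened 2x2 cross-tabulation: one row per cell, with its observed count.
cells = pd.DataFrame({
    'P': ['manual', 'manual', 'non-manual', 'non-manual'],  # parental class
    'O': ['manual', 'non-manual', 'manual', 'non-manual'],  # own class
    'count': [150, 60, 40, 110],                            # invented counts
})

# Independence model [P] [O]: main effects only, no P:O interaction.
indep = smf.glm('count ~ P + O', data=cells,
                family=sm.families.Poisson()).fit()
print(indep.deviance, indep.df_resid)  # 'badness' of fit, and its df

# Saturated model [PO]: includes the P:O interaction, so it reproduces
# the observed counts exactly and its deviance is (essentially) zero.
sat = smf.glm('count ~ P * O', data=cells,
              family=sm.families.Poisson()).fit()
print(sat.deviance)  # ~0
```

The saturated model reproduces the observed counts exactly, which is why its deviance comes out as (essentially) zero; this is the ‘goodness-of-fit value of 0’ discussed on the next slide.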

‘Goodness’ (or ‘badness’) of fit

The model [PO] is consistent with any observed relationship in a cross-tabulation, and hence, by definition, fits the observed data perfectly. It is therefore said to have a ‘goodness-of-fit’ value of 0. (Note that measures of ‘goodness-of-fit’ typically measure badness of fit!)

Turning to the independence model, the ‘goodness-of-fit’ of [P] [O] can be viewed as equivalent to the chi-square statistic, since this summarises the evidence of a relationship, and hence the evidence that the (null) hypothesis of independence, i.e. the independence model, is incorrect. In fact, it is the likelihood ratio chi-square statistic in the SPSS output for a cross-tabulation that is relevant here. A chi-square test is thus, in effect, a comparison (and choice) between two possible models of a two-way cross-tabulation.
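
For readers working outside SPSS, the likelihood ratio chi-square can also be obtained in Python with scipy; a sketch, reusing the invented counts from above:

```python
# correction=False switches off the Yates continuity correction (applied by
# default for 2x2 tables) so the statistic matches the model deviance exactly.
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[150, 60],
                  [40, 110]])

# lambda_="log-likelihood" requests the likelihood ratio chi-square
# (G-squared) rather than the default Pearson chi-square.
g2, p, dof, expected = chi2_contingency(table, correction=False,
                                        lambda_="log-likelihood")
print(g2, p, dof)  # g2 equals the deviance of the independence model [P] [O]
```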

A multivariate analysis

Suppose that one were interested in whether the extent of social mobility was changing over time (i.e. between birth cohorts). Then we would need to include in any model a third variable, i.e. birth cohort, represented by ‘C’.

A wider choice of models…

For a three-way cross-tabulation, there is a greater number of possible hierarchical models of the cross-tabulation:

The ‘independence model’ [P] [O] [C];

The ‘saturated model’ [POC], which indicates that the relationship between parental class and child’s own class varies according to birth cohort; and…

…various other models in between these:

[PO] [C]
[PC] [O]
[OC] [P]
[PO] [PC]
[PO] [OC]
[PC] [OC]
[PO] [PC] [OC]
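
Continuing the illustrative Python aside, each of these models can be fitted in the same way, as a Poisson regression including the corresponding interaction terms (again with invented counts, here for a 2×2×2 table):

```python
# A sketch with invented 2x2x2 counts: fit each hierarchical model of a
# three-way table as a Poisson regression and print its deviance and df.
from itertools import product

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rows = list(product(['manual', 'non-manual'],   # P: parental class
                    ['manual', 'non-manual'],   # O: own class
                    ['1950s', '1970s']))        # C: birth cohort
cells3 = pd.DataFrame(rows, columns=['P', 'O', 'C'])
cells3['count'] = [75, 80, 30, 35, 20, 25, 55, 60]  # invented counts

models = {
    '[P] [O] [C]':    'count ~ P + O + C',
    '[PO] [C]':       'count ~ P*O + C',
    '[PC] [O]':       'count ~ P*C + O',
    '[OC] [P]':       'count ~ O*C + P',
    '[PO] [PC]':      'count ~ P*O + P*C',
    '[PO] [OC]':      'count ~ P*O + O*C',
    '[PC] [OC]':      'count ~ P*C + O*C',
    '[PO] [PC] [OC]': 'count ~ P*O + P*C + O*C',
    '[POC]':          'count ~ P*O*C',
}
for name, formula in models.items():
    fit = smf.glm(formula, data=cells3, family=sm.families.Poisson()).fit()
    print(f'{name:16s} deviance = {fit.deviance:8.3f}  df = {fit.df_resid}')
```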

How does one know which model is best?

Each model has a chi-square-like ‘goodness-of-fit’ measure, often referred to as the model’s deviance, which can be used to test whether the observed data are significantly different from what one would expect to have seen given that model. In other words, it quantifies how likely it is that the difference(s) between the observed data and the model’s predictions would have occurred simply as a consequence of sampling error.
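
In the running Python sketch, this absolute test of fit compares a model’s deviance against a chi-square distribution on its residual degrees of freedom (cells3 and the imports are as defined in the previous snippet):

```python
# Continuing the sketch: an absolute test of fit for the model [PO] [PC] [OC].
from scipy.stats import chi2

no_3way = smf.glm('count ~ P*O + P*C + O*C', data=cells3,
                  family=sm.families.Poisson()).fit()
p_value = chi2.sf(no_3way.deviance, no_3way.df_resid)
print(p_value)  # a small p-value indicates a significant lack of fit
```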

The difference between the deviance values for two nested models can be used, in a similar way, to test whether the more complex of the two fits significantly better. In other words, does the additional element of the model improve its fit by more than can reasonably be attributed to sampling error? So, ideally, the ‘best model’ fits the data in absolute terms, but also does not fit the data significantly less well than any more complex model does. [Note that the ‘saturated model’ fits by definition, and has a value of 0 for the deviance measure.]

…back to the example!

If the (null) hypothesis of interest is that the extent of social mobility is not changing over time (i.e. between birth cohorts), then the most complex model corresponding to this is as follows:

[PO] [PC] [OC]

The question now becomes: does this fit significantly worse than the model that allows the extent of mobility to vary over time, namely:

[POC]
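
In the running Python sketch, this comparison looks as follows; because the saturated model has a deviance of 0 and zero residual degrees of freedom, comparing the two models reduces to the absolute test of fit of [PO] [PC] [OC] shown earlier:

```python
# Continuing the sketch: compare [PO] [PC] [OC] with the saturated model
# [POC] via the difference in deviances (and in degrees of freedom).
saturated = smf.glm('count ~ P*O*C', data=cells3,
                    family=sm.families.Poisson()).fit()

diff_dev = no_3way.deviance - saturated.deviance  # saturated deviance is ~0
diff_df = no_3way.df_resid - saturated.df_resid   # saturated df is 0
p_value = chi2.sf(diff_dev, diff_df)
print(p_value)  # a small p-value favours [POC]: mobility changes over time
```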

Where does the deviance measure come from?

The deviance of a model is calculated as −2 log λ, where λ is the ratio of the likelihood of the specified model having produced the observed data to the corresponding likelihood for the saturated model (which is why the saturated model has a deviance of 0). However, it behaves much like a conventional chi-square statistic.
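
Equivalently, the deviance can be computed directly from the observed (O) and model-expected (E) cell counts as G² = 2 Σ O ln(O/E). A quick numerical check, using the invented 2×2 counts from earlier:

```python
# The deviance of the independence model equals G^2 = 2 * sum(O * ln(O / E)).
import numpy as np

observed = np.array([[150, 60],
                     [40, 110]], dtype=float)
# Expected counts under independence: row total x column total / grand total.
expected = np.outer(observed.sum(axis=1),
                    observed.sum(axis=0)) / observed.sum()

g2 = 2 * np.sum(observed * np.log(observed / expected))
print(g2)  # matches the likelihood ratio chi-square computed above
```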

What about degrees of freedom?

Each model’s deviance value has an associated number of degrees of freedom, relating to the various relationships between variables that are not included in the model. Hence the ‘saturated model’ has zero degrees of freedom. If the three variables P, O and C have a, b and c categories respectively, then the ‘independence model’ has (a × b × c) − (a + b + c) + 2 degrees of freedom, e.g. 4 degrees of freedom if all the variables have two categories each.

Degrees of freedom for interactions

If two variables interact, e.g. [PO], then this interaction term within a model (assuming the variables have a and b categories respectively) has (a − 1) × (b − 1) degrees of freedom, i.e. the same number of degrees of freedom as the chi-square statistic for a two-way cross-tabulation with those numbers of rows and columns.
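
As a final illustrative check of the two degrees-of-freedom formulas:

```python
# Checking both degrees-of-freedom formulas with a = b = c = 2 categories.
a, b, c = 2, 2, 2

indep_df = (a * b * c) - (a + b + c) + 2  # independence model [P] [O] [C]
two_way_df = (a - 1) * (b - 1)            # a two-way term such as [PO]
print(indep_df, two_way_df)               # prints: 4 1
```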