University of Warwick, Department of Sociology, 2014/15
SO 201: SSAASS (Surveys and Statistics) (Richard Lampard)
Logistic Regression II / (Hierarchical) Log-Linear Models I

‘Multiple’ Logistic Regression

Model 1: log odds = C + (0.546 x SEX)
Model 2: log odds = C + (0.461 x SEX) + (-0.099 x AGE)

For B1 = 0.546, p = 0.000 < 0.05
For B1 = 0.461, p = 0.000 < 0.05
For B2 = -0.099, p = 0.000 < 0.05

Exp(B1) = Exp(0.546) = 1.73
Exp(B1) = Exp(0.461) = 1.59
Exp(B2) = Exp(-0.099) = 0.905

The odds ratio comparing men with women and controlling for age is 1.59, less than the original value of 1.73. Thus some, but not all, of the gender difference in having (any) teeth can be accounted for in terms of age. The odds ratio of 0.905 for age indicates that the odds of having any teeth decrease by about 9.5% for each extra year of age (since 1 - 0.905 = 0.095 = 9.5%). In other words, the odds of having no teeth increase by over 10% for each extra year of age (since 1/0.905 = 1.105).
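These conversions are easy to reproduce. A minimal Python sketch (not part of the original slides), using only the coefficients quoted above:

```python
import math

# Coefficients (B values) from the slide
b_sex, b_age = 0.461, -0.099

or_sex = math.exp(b_sex)  # odds ratio, men vs women, controlling for age (~1.59)
or_age = math.exp(b_age)  # multiplicative change in the odds per extra year (~0.906)

# A fall in the odds of having any teeth...
print((1 - or_age) * 100)      # ~9.4; the slide's 9.5% uses the rounded 0.905
# ...is equivalently a rise in the odds of having no teeth
print((1 / or_age - 1) * 100)  # ~10.4; the slide's 10.5% uses the rounded 0.905
```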

Adding Father’s Class

                      B      p     Exp(B)
Sex                 .471   .000    1.602
Age                -.097   .000     .908
Father’s Class             .000
  ‘None’ vs IV/V    .504   .007    1.656
  I/II vs IV/V     1.374   .000    3.950
  III NM vs IV/V   1.432   .000    4.187
  III M vs IV/V     .463   .008    1.588
Constant           6.132

Adding Own Class

                      B      p     Exp(B)
Father’s class             .002
  ‘None’ vs IV/V    .342   .075    1.407
  I/II vs IV/V      .957   .000    2.603
  III NM vs IV/V    .974   .019    2.648
  III M vs IV/V     .315   .079    1.370
Own class                  .000
  ‘None’ vs IV/V    .591   .052    1.805
  I/II vs IV/V     1.474   .000    4.366
  III NM vs IV/V   1.189   .000    3.284
  III M vs IV/V     .416   .003    1.515
Constant           5.736

Model Fit

           -2 Log Likelihood   Cox & Snell R Square   Nagelkerke R Square
Model 1         4275.6                 .010                  .017
Model 2         2852.0                 .266                  .446
Model 3         2810.0                 .273                  .457
Model 4         2667.8                 .294                  .493
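The two R-square columns can be reproduced from -2 log likelihood values. A sketch of the standard formulas, assuming the null model's -2LL (4323.7, implied by the 48.1 change reported on the next slide) and a hypothetical sample size, since n is not shown on the slide:

```python
import math

def pseudo_r2(null_m2ll, model_m2ll, n):
    """Cox & Snell and Nagelkerke R-squared from -2 log likelihood values."""
    cox_snell = 1 - math.exp(-(null_m2ll - model_m2ll) / n)
    max_cs = 1 - math.exp(-null_m2ll / n)  # ceiling of Cox & Snell R-squared
    return cox_snell, cox_snell / max_cs   # Nagelkerke rescales to a 0-1 range

# Null -2LL = 4275.6 + 48.1 = 4323.7; n = 4800 is a hypothetical sample
# size that happens to reproduce Model 1's figures
print(pseudo_r2(4323.7, 4275.6, 4800))    # ~(0.010, 0.017)
```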

Model Fit Change

                       LR chi-square   d.f.   p-value
(Model 0 to) Model 1        48.1         1     0.000
Model 1 to Model 2        1423.6         1     0.000
Model 2 to Model 3          42.0         4     0.000
Model 3 to Model 4         142.2         4     0.000
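The p-values in this table (reported by SPSS as .000) can be checked against the chi-square distribution. A small verification sketch using scipy, not part of the original materials:

```python
from scipy.stats import chi2

# (label, change in -2 log likelihood, change in d.f.) from the table above
comparisons = [("(Model 0 to) Model 1", 48.1, 1),
               ("Model 1 to Model 2", 1423.6, 1),
               ("Model 2 to Model 3", 42.0, 4),
               ("Model 3 to Model 4", 142.2, 4)]

for label, lr, df in comparisons:
    p = chi2.sf(lr, df)  # upper-tail area of the chi-square distribution
    print(f"{label}: LR chi-square = {lr}, d.f. = {df}, p = {p:.3g}")
```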

Hierarchical log-linear models

These are models which are applied to multi-way cross-tabulations, and hence to categorical data. They focus on the presence or absence of relationships between the variables defining the cross-tabulation. More sophisticated models can also take account of the form of the relationship that exists between two variables, but we will not consider these models in this module.

A standard form of notation for (hierarchical) log-linear models labels each variable with a letter and places the effects of, and relationships between, these variables within square brackets. Suppose, for example, that the topic of interest is intergenerational social class mobility. If parental class is labelled ‘P’ and a child’s own social class is labelled ‘O’, then, within a model:

[ P ] would indicate the inclusion of the parental class variable;
[ PO ] would indicate a relationship between parental class and child’s own class.

A bivariate analysis

Bivariate (hierarchical) log-linear models are of limited interest, but for illustrative purposes, there are two possible models of a two-way cross-tabulation:

[ P ] [ O ], the ‘independence model’, which indicates that the two variables are unrelated;
[ PO ], an example of a ‘saturated model’, wherein all of the variables are related to each other simultaneously (i.e. in this simplest form of saturated model, the two variables are related).

‘Goodness’ (or ‘badness’) of fit

The model [ PO ] is consistent with any observed relationship in a cross-tabulation, and hence, by definition, fits the observed data perfectly. It is therefore said to have a ‘goodness-of-fit’ value of 0. (Note that measures of ‘goodness-of-fit’ typically measure badness of fit!)

Turning to the independence model, the ‘goodness-of-fit’ of [ P ] [ O ] can be viewed as equivalent to the chi-square statistic, as this summarises the evidence of a relationship, and hence the evidence that the (null) hypothesis of independence, i.e. the independence model, is incorrect. In fact, it is the likelihood ratio chi-square statistic from SPSS output for a cross-tabulation which is relevant here.

A chi-square test is thus, in effect, a comparison (and choice) between two possible models of a two-way cross-tabulation.
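As a concrete illustration (not from the slides), the likelihood ratio chi-square that SPSS reports for a cross-tabulation is G-squared = 2 * sum of O * ln(O/E). A minimal Python sketch, applied to a hypothetical table:

```python
import numpy as np

def likelihood_ratio_chi2(observed):
    """G-squared (likelihood ratio chi-square) for a two-way table."""
    observed = np.asarray(observed, dtype=float)
    expected = (observed.sum(axis=1, keepdims=True)
                @ observed.sum(axis=0, keepdims=True)) / observed.sum()
    nonzero = observed > 0  # empty cells contribute nothing to the sum
    g2 = 2 * (observed[nonzero] * np.log(observed[nonzero] / expected[nonzero])).sum()
    df = (observed.shape[0] - 1) * (observed.shape[1] - 1)
    return g2, df

print(likelihood_ratio_chi2([[30, 10], [20, 40]]))  # hypothetical 2x2 table
```

For comparison, scipy.stats.chi2_contingency(table, correction=False, lambda_="log-likelihood") should return the same statistic.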

A multivariate analysis

Suppose that one was interested in whether the extent of social mobility was changing over time (i.e. between birth cohorts). Then we would need to include in any model a third variable, i.e. birth cohort, represented by ‘C’.

A wider choice of models…

For a three-way cross-tabulation, there is a larger number of possible hierarchical models of the cross-tabulation, including:

The ‘independence model’ [ P ] [ O ] [ C ];
The ‘saturated model’ [ POC ], which indicates that the relationship between parental class and child’s own class varies according to birth cohort; and…

…various other models in between these:

[ PO ] [ C ]
[ PC ] [ O ]
[ OC ] [ P ]
[ PO ] [ PC ]
[ PO ] [ OC ]
[ PC ] [ OC ]
[ PO ] [ PC ] [ OC ]
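Although the module uses SPSS, any of these models can also be fitted as a Poisson regression on the cell counts. A sketch using statsmodels, with an entirely hypothetical 2x2x2 table (the category labels and counts below are invented for illustration):

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical flattened 2x2x2 cross-tabulation: parental class (P),
# own class (O) and birth cohort (C), with an observed count per cell
cells = pd.DataFrame({
    "P": ["nm", "nm", "nm", "nm", "m", "m", "m", "m"],
    "O": ["nm", "nm", "m", "m", "nm", "nm", "m", "m"],
    "C": ["old", "new", "old", "new", "old", "new", "old", "new"],
    "n": [120, 95, 60, 75, 50, 55, 140, 160],
})

# [ PO ] [ PC ] [ OC ]: all two-way relationships, no three-way term
model = smf.glm("n ~ P*O + P*C + O*C", data=cells,
                family=sm.families.Poisson()).fit()
print(model.deviance, model.df_resid)  # deviance and its d.f. (1 for a 2x2x2 table)
```

Replacing the formula with "n ~ P*O*C" fits the saturated model, whose deviance is 0 by definition.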

How does one know which model is best?

Each model has a chi-square-like ‘goodness-of-fit’ measure, often referred to as the model’s deviance, which can be used to test whether the observed data are significantly different from what one would expect to have seen given that model. In other words, it quantifies how likely it is that the difference(s) between the observed data and the model’s predictions would have occurred simply as a consequence of sampling error.

The difference between the deviance values for two models can be used, in a similar way, to test whether the more complex of the two models fits significantly better. In other words, does the additional element of the model improve the model’s fit more than can reasonably be attributed to sampling error? So, ideally, the ‘best model’ fits the data in absolute terms, but also does not fit the data substantially less well than any more complex model does. [Note that the ‘saturated model’ fits by definition, and has a value of 0 for the deviance measure.]
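This comparison of deviances is itself a chi-square test on the difference in degrees of freedom. A small sketch, with hypothetical deviance values:

```python
from scipy.stats import chi2

def compare_models(dev_simpler, df_simpler, dev_complex, df_complex):
    """Test whether the more complex of two nested models fits better."""
    change = dev_simpler - dev_complex  # improvement in deviance
    df = df_simpler - df_complex        # degrees of freedom given up
    return change, df, chi2.sf(change, df)

# e.g. a hypothetical [ PO ] [ PC ] [ OC ] model (deviance 3.9 on 1 d.f.)
# against the saturated model [ POC ] (deviance 0 on 0 d.f., by definition)
print(compare_models(3.9, 1, 0.0, 0))   # p is about 0.048
```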

…back to the example!

If the (null) hypothesis of interest is that the extent of social mobility is not changing over time (i.e. between birth cohorts), then the most complex model corresponding to this is as follows:

[ PO ] [ PC ] [ OC ]

The question now becomes: does the model that specifies change over time, namely [ POC ], fit significantly better than this?

Where does the deviance measure come from?

The deviance of a model is calculated as:

deviance = -2 x log(likelihood)

where ‘likelihood’ refers to the likelihood of the specified model having produced the observed data. However, the deviance behaves much like a conventional chi-square statistic.

What about degrees of freedom?

Each model deviance value has an associated number of degrees of freedom, relating to the various relationships between variables that are not included in the model. Hence the ‘saturated model’ has zero degrees of freedom.

If the three variables P, O and C have a, b and c categories respectively, then the ‘independence model’ has (a x b x c) - (a + b + c) + 2 degrees of freedom, e.g. 4 degrees of freedom if all the variables have two categories each.

Degrees of freedom for interactions

If two variables interact, e.g. [ PO ], then this interaction term within a model (assuming the variables have a and b categories respectively) would have (a-1) x (b-1) degrees of freedom, i.e. the same number of degrees of freedom as the chi-square statistic for a two-way cross-tabulation with those numbers of rows and columns.
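Both degrees-of-freedom formulas above are easy to check in code; a minimal sketch, not part of the original slides:

```python
def independence_df(a, b, c):
    """d.f. of the independence model [P][O][C] for an a x b x c table."""
    return a * b * c - (a + b + c) + 2

def interaction_df(a, b):
    """d.f. of a two-way interaction term such as [PO]."""
    return (a - 1) * (b - 1)

print(independence_df(2, 2, 2))   # 4, as stated on the previous slide
print(interaction_df(2, 2))       # 1
```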

Examples

Do gender (sex) differences in attitude towards the domestic division of labour vary according to social class?

Did British society become more open in the late 19th/early 20th century (judged in terms of the inter-mixing of class backgrounds between brides and grooms marrying in different years)?

The answer is ‘Yes’ in each case!

The model [ CSA ] fits significantly better than the model [ CS ] [ CA ] [ SA ] (p = 0.046).
The model [ BGY ] fits significantly better than the model [ BG ] [ BY ] [ GY ] (p < 0.001).