University of Warwick, Department of Sociology, 2014/15 SO 201: SSAASS (Surveys and Statistics) (Richard Lampard) Logistic Regression III/ (Hierarchical)


University of Warwick, Department of Sociology, 2014/15 SO 201: SSAASS (Surveys and Statistics) (Richard Lampard) Logistic Regression III/ (Hierarchical) Log-Linear Models II (Week 9)

… the story so far… In the session last Wednesday we identified a number of factors that affected the odds of being an owner-occupier (in the Coventry area in the mid-1980s!). Living with a partner, age, highest Goldthorpe class, and household income all improved the fit of the model significantly.

Changes in model fit. Table columns: change in model chi-square, d.f., p, and deviance (-2LL). Rows, adding in turn: Partner, Age, Class, Income.

The ‘changing’ effect of a partner. Exp(B) (odds ratio) for having a partner, by model: Partner; Age; Age + Class; Age + Class + Income: 6.35
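As a reminder of how Exp(B) and B relate, the following is a minimal sketch rather than anything taken from the handout. It assumes that the 6.35 quoted above is the odds ratio for having a partner in the fullest model. The baseline odds of 0.5 are purely hypothetical and are there only to show how an odds ratio rescales odds and, from there, a probability.

```python
import math

# Odds ratio for having a partner (assumed to be the fullest model's Exp(B)).
odds_ratio = 6.35

# B is the natural log of the odds ratio; Exp(B) takes us back again.
b = math.log(odds_ratio)

def apply_odds_ratio(baseline_odds, ratio):
    """Multiply baseline odds by an odds ratio and also return the implied probability."""
    new_odds = baseline_odds * ratio
    return new_odds, new_odds / (1.0 + new_odds)

# HYPOTHETICAL baseline: odds of 0.5 (1 owner to every 2 non-owners) without a partner.
odds_with_partner, prob_with_partner = apply_odds_ratio(0.5, odds_ratio)

print(f"B = {b:.3f}")
print(f"odds with partner = {odds_with_partner:.3f}, probability = {prob_with_partner:.3f}")
```

The point of the sketch is that an odds ratio multiplies the odds, not the probability, so the same Exp(B) implies different changes in probability at different baselines.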

But I don’t remember those odds ratios! When one adds the variables in a series of blocks, the sample used throughout excludes the cases with missing values for class and income. So the ‘starting point’ effect of having a partner is lower (i.e. below the 14.4 seen before). But some of the effect of a partner is still ‘explained’ by age and by income (and some was ‘suppressed’ before class was added!)

Is there an interaction between the effects of living with a partner and age? Looking at the hard copy handout, the B for such an interaction is statistically significant (p < 0.001). Note that the three effects, i.e. of age, of living with a partner, and of their interaction, need to be considered together.

What do the Bs for main effects mean when an interaction is included? The B for age (p < 0.001) means that where someone lives without a partner there is a significant age effect. But if we change the point of reference (reference category) to living with a partner, the ‘revised’ B for age (p = 0.143) means that for someone in this situation the (log) odds of owner occupation do not increase significantly with rising age…

Is a linear model OK? Apart from in the model that includes only living with a partner, the Hosmer and Lemeshow test does not indicate that it is problematic to assume that the explanatory variables have a linear impact on the log odds of owner occupation. (Hosmer and Lemeshow Test table: Chi-square, d.f., Sig.)

Looking at the hard copy handout… We can see that quite a lot of the evidence of a class effect (as indicated by the Wald statistic) disappears when income is included … and the effects for the bottom three classes compared to the top class (i.e. the relevant odds ratios/Exp(B)’s) get smaller (closer to 1!) too…

The income effect! While this is clearly significant overall (p < 0.001), the Bs and Exp(B)s are difficult to interpret (and mostly relate to non-significant comparisons). This is because the reference category (i.e. the first category, indicating the lowest income range) is small and has an effect which doesn’t fit in with the broader trend.

Using a different reference category As the hard copy handout shows, if we use the tenth income category as the point of reference (this can be achieved by changing ‘1’ to ‘10’ in the relevant command pasted to a syntax window), then the categories of income lower than this largely have lower (log) odds of owner occupation, and higher incomes largely have higher (log) odds of owner occupation!
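What changing the reference category does to the Bs can be shown with simple arithmetic. In the sketch below the coefficient values are hypothetical (only four of the income categories are shown, and none of the numbers come from the handout), and `rebase` is a made-up helper name. Each B relative to the new reference is the old B minus the old B of the new reference category.

```python
import math

def rebase(coefs, new_ref):
    """Re-express category Bs relative to a different reference category."""
    shift = coefs[new_ref]
    return {cat: b - shift for cat, b in coefs.items()}

# HYPOTHETICAL Bs relative to income category 1 (whose own B is 0 by definition).
# Category 1 is given oddly high odds here, mimicking a reference category
# that does not fit the broader trend.
b_vs_cat1 = {1: 0.0, 5: -0.6, 10: -0.2, 15: 0.9}

b_vs_cat10 = rebase(b_vs_cat1, 10)
odds_ratios_vs_cat10 = {cat: math.exp(b) for cat, b in b_vs_cat10.items()}

for cat in sorted(b_vs_cat10):
    print(cat, round(b_vs_cat10[cat], 2), round(odds_ratios_vs_cat10[cat], 2))
```

Differences between any two categories are the same under either coding, which is why the overall significance of income is unaffected; only the individual comparisons and their significance tests change.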

Collapsing the income variable But a lot of the income comparisons are still not statistically significant, in part because some of the categories are quite small… Using fewer categories would lose information, but would make presenting the model easier… What happens if we collapse income into four broad ranges? (Up to 50, 50-99, 100-159, and 160+ pounds per week)
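Collapsing into broad ranges is a simple recode. The sketch below is illustrative only: it works from a weekly income in pounds rather than from the survey's original income categories, it assumes that exactly 50 falls into the second band, and it assumes the third band runs from 100 to 159 (inferred from the neighbouring bands). The names `BAND_STARTS`, `BAND_LABELS` and `income_band` are hypothetical.

```python
from bisect import bisect_right

# Lower bounds of the second, third and fourth bands (pounds per week).
# ASSUMPTION: the third band is 100-159, inferred from the bands either side.
BAND_STARTS = [50, 100, 160]
BAND_LABELS = ["Up to 50", "50-99", "100-159", "160+"]

def income_band(pounds_per_week):
    """Return the broad income band for a weekly income in pounds."""
    return BAND_LABELS[bisect_right(BAND_STARTS, pounds_per_week)]

for income in (30, 50, 99, 100, 159, 160, 400):
    print(income, "->", income_band(income))
```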

Not much… The Wald statistic only drops by 10.0 (i.e. 44.4 minus 34.4) for a reduction of 14 degrees of freedom. And, more importantly, the gain in model fit from adding (i.e. including) the ‘whole’ income variable is only 13.6 for 14 degrees of freedom (p = 0.477).
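The p-value attached to a change in model fit comes from the upper tail of a chi-square distribution with the stated degrees of freedom. The function below is a self-contained sketch using the standard finite-sum expressions for integer degrees of freedom (statistical packages do this for you); the name `chi2_sf` is just a label chosen here.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for k in range((df - 1) // 2):
        total += term
        term *= x / (2 * k + 3)
    return tail + total

# Gain in fit from the 'whole' income variable: 13.6 on 14 d.f.
p_income = chi2_sf(13.6, 14)
print(f"p = {p_income:.3f}")  # about 0.48; the slide's 0.477 reflects the unrounded chi-square
```

A p-value this large means the extra 14 parameters do not improve the fit significantly, so the collapsed income variable loses little.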

Similarly… If class is collapsed into I-IV, V-VI and VII, the improvement in fit through adding the ‘whole’ class variable is only 0.3 for 4 d.f. (p = 0.992). If age is collapsed into ranges of 20-29, 30-39, 40-49, and 50-60, the improvement in fit through adding the ‘whole’ age variable is only 1.3 for 1 d.f. (p = 0.263).

What about fitting a log-linear model to the five variables? Now that we have five categorical variables, we could use a log-linear model to establish which are related to which, and to what degree of complexity… As the hard copy handout shows, this indicates the preferred, ‘best’ model is: [AC] [PC] [TC] [CI] [TPI] [PAI] [TA] A= Age; C= Class; P=Partner; I=Income; T=Tenure
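In a hierarchical log-linear model, each bracketed term implies all of the lower-order terms nested within it, so [TPI] also brings in [TP], [TI], [PI], [T], [P] and [I]. The sketch below expands the ‘best’ model's bracketed terms into the full list of effects it contains. The function name `implied_terms` is hypothetical, and the letters within each term are sorted alphabetically so that, for example, [TPI] appears as IPT.

```python
from itertools import combinations

def implied_terms(generators):
    """List every effect implied by a hierarchical model's bracketed terms."""
    terms = set()
    for gen in generators:
        variables = sorted(set(gen))
        # Every non-empty subset of a term's variables is also in the model.
        for size in range(1, len(variables) + 1):
            for combo in combinations(variables, size):
                terms.add("".join(combo))
    return sorted(terms, key=lambda t: (len(t), t))

# A = Age; C = Class; P = Partner; I = Income; T = Tenure
best_model = ["AC", "PC", "TC", "CI", "TPI", "PAI", "TA"]
terms = implied_terms(best_model)

for order in (1, 2, 3):
    print(order, [t for t in terms if len(t) == order])
```

Run on the model above, this shows that every pair of the five variables is associated, while the only three-way terms are tenure by partner by income and partner by age by income. There is no tenure by partner by age term.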

Uh-oh! This suggests that there may be more of a case for an interaction between the effects of living with a partner and income than between the effects of living with a partner and age… Adding this second interaction indeed improves the logistic regression model fit significantly by 23.0 for 3 d.f. (p < 0.001)

Nevertheless… While the hard copy handout suggests that including the second interaction renders the other statistically non-significant, it still looks substantively plausible (and detail has been lost from the age variable…) The interaction between the income effect and the living with a partner effect is most easily interpreted with reference to the relevant three-way cross-tabulation (see the hard copy handout).