Interpreting and using heterogeneous choice & generalized ordered logit models Richard Williams Department of Sociology University of Notre Dame July 2006.



The gologit/gologit2 model
• The gologit (generalized ordered logit) model can be written as
  \[ P(Y_i > j) = \frac{\exp(\alpha_j + X_i\beta_j)}{1 + \exp(\alpha_j + X_i\beta_j)}, \quad j = 1, 2, \ldots, M-1 \]
  where M is the number of categories of the ordinal dependent variable.
• The unconstrained model gives results similar to running a series of binary logistic regressions: first category 1 versus all others, then categories 1 & 2 versus all others, then 1, 2 & 3 versus all others, etc.
• The unconstrained model estimates as many parameters as mlogit does, and tends to yield very similar fits.

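• The much better known ordered logit (ologit) model is a special case of the gologit model, where the betas are the same for each j:
  \[ P(Y_i > j) = \frac{\exp(\alpha_j + X_i\beta)}{1 + \exp(\alpha_j + X_i\beta)}, \quad j = 1, 2, \ldots, M-1 \]
  (NOTE: ologit actually reports cut points, which equal the negatives of the alphas used here.)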

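• The partial proportional odds model is another special case: some but not all betas are the same across values of j. For example, in the following the betas for X1 and X2 are constrained but the betas for X3 are not:
  \[ P(Y_i > j) = \frac{\exp(\alpha_j + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_{3j})}{1 + \exp(\alpha_j + X_{1i}\beta_1 + X_{2i}\beta_2 + X_{3i}\beta_{3j})}, \quad j = 1, 2, \ldots, M-1 \]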
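As a rough illustration of how the three variants relate, the sketch below turns the M-1 cumulative probabilities P(Y > j) into category probabilities. The function name gologit_probs, the two-variable setup and all coefficient values are made up for illustration; they are not estimates from any data set and this is not how gologit2 itself is implemented.

```python
import math

def logistic(z):
    """Inverse logit: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def gologit_probs(x, alphas, betas):
    """Category probabilities for one case under a gologit-style model.

    x      : list of covariate values
    alphas : M-1 intercepts, one per cumulative split j
    betas  : M-1 coefficient lists, one per cumulative split j
    """
    # P(Y > j) for j = 1, ..., M-1
    exceed = [logistic(a + sum(b_k * x_k for b_k, x_k in zip(b, x)))
              for a, b in zip(alphas, betas)]
    # P(Y = j) = P(Y > j-1) - P(Y > j), with P(Y > 0) = 1 and P(Y > M) = 0
    bounds = [1.0] + exceed + [0.0]
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]

# Hypothetical values: 4 outcome categories, 2 covariates
x = [1.0, 0.5]
alphas = [1.0, 0.0, -1.5]

# ologit as a special case: the same betas for every j
ologit_betas = [[0.4, -0.2]] * 3
# partial proportional odds: first beta constrained, second free to vary with j
ppo_betas = [[0.4, -0.2], [0.4, 0.1], [0.4, 0.6]]

p_ologit = gologit_probs(x, alphas, ologit_betas)
p_ppo = gologit_probs(x, alphas, ppo_betas)
print(p_ologit)
print(p_ppo)
```

The only difference between the two calls is whether the coefficient rows are identical across j; the fully unconstrained model would let every column vary.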

Key advantages of gologit2
• Can estimate models that are less restrictive than ologit (whose assumptions are often violated)
• Can estimate models (i.e. partial proportional odds) that are more parsimonious than non-ordinal alternatives, such as mlogit
• HOWEVER, there are also several potential concerns users may not be aware of or have not thought about

Concern I: Unconstrained model does not require ordinality
• As Clogg & Shihadeh (1994) point out, the totally unconstrained model arguably isn't even ordinal
• You can rearrange the categories, and the fit may be hardly affected
• If a totally unconstrained model is the only one that fits, it may make more sense to use mlogit
• Gologit is mostly useful when you get a non-trivial number of constraints

Concern II: Estimated probabilities can go negative
• Unlike other categorical models, estimated probabilities can be negative.
• This was addressed by McCullagh & Nelder, Generalized Linear Models, 2nd edition, 1989, p. 155: “The usefulness of non-parallel regression models is limited to some extent by the fact that the lines must eventually intersect. Negative fitted values are then unavoidable for some values of x, though perhaps not in the observed range. If such intersections occur in a sufficiently remote region of the x-space, this flaw in the model need not be serious.”

• Probabilities might go negative in unlikely or impossible X ranges, e.g. when years of education is negative or hourly wages are > $5 million. But it could also happen with more plausible sets of values.
• Multiple tests with tens of thousands of cases typically resulted in only 0 to 3 negative predicted probabilities.
• Seems most problematic with small samples, complicated models, and analyses where the data are being spread very thin. Such analyses might be troublesome regardless; gologit2 could help expose problems that might otherwise be overlooked.
• Can also get negative predicted probabilities when measurement of the outcome isn't actually ordinal

• gologit2 now checks to see if any in-sample predicted probabilities are negative. It is still possible that plausible values not in-sample could produce negative predicted probabilities.
• You may want to use some other method if there are a non-trivial number of negative predicted probabilities and you are otherwise confident in your models and data.
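A small numeric sketch may make the McCullagh & Nelder point concrete. With three categories and a single x, two non-parallel lines must cross somewhere, and beyond the crossing the middle category gets a negative "probability". The coefficients and the sample of x values below are invented for illustration, and the counting step only mimics the idea of an in-sample check; it is not gologit2's actual code.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def category_probs(x, alphas, betas):
    """Category probabilities from P(Y > j) = logistic(alpha_j + beta_j * x)."""
    exceed = [logistic(a + b * x) for a, b in zip(alphas, betas)]
    bounds = [1.0] + exceed + [0.0]
    return [bounds[j] - bounds[j + 1] for j in range(len(bounds) - 1)]

# Hypothetical non-parallel model: the two lines 1 + 0.2x and -1 + 0.6x cross at x = 5
alphas = [1.0, -1.0]
betas = [0.2, 0.6]

in_range = category_probs(2.0, alphas, betas)       # before the crossing
out_of_range = category_probs(10.0, alphas, betas)  # beyond the crossing

# Count cases in a made-up sample with at least one negative predicted probability
sample_x = [0.0, 1.0, 2.0, 3.0, 4.0, 6.0, 10.0]
n_negative = sum(1 for x in sample_x if min(category_probs(x, alphas, betas)) < 0)

print(in_range)
print(out_of_range)
print(n_negative)
```

Whether this matters in practice depends on whether the crossing point lies inside the range of x values you actually observe or care about.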

Concern III: How do you interpret the results?
• One rationale for ordinal regression models is that there is an underlying, continuous y* that reflects the dependent variable we are interested in.
• y* is unobserved, however. Instead, we observe y, which is basically a collapsed/grouped version of the unobserved y*.
  - High Income, Moderate Income and Low Income are a collapsed version of a continuous Income variable
  - Some ranges of attitudes can be collapsed into a 5-category scale ranging from Strongly Disagree to Strongly Agree
• As individuals cross thresholds (aka cut-points) on y*, their value on the observed y changes

• Question: What does the gologit model mean for the behavior we are modeling? Does it mean the slopes of the latent regression are functions of the left-hand-side variable, i.e. that there is some sort of interaction effect between x and y? For example:
  y* = beta1'x + e if y = 1
  y* = beta2'x + e if y = 2

• Further, does the whole idea of an underlying y* go out the window once you allow a single non-proportional effect? If so, how do you interpret the model?
  - In an ordered logit (ologit) model, you only have one predicted value for y*
  - But in a gologit model, once you have a single non-parallel effect, you have M-1 linear predictions (similar to mlogit)

Interpretation 1: gologit as non-linear probability model
• As Long & Freese (2006, p. 187) point out, “The ordinal regression model can also be developed as a nonlinear probability model without appealing to the idea of a latent variable.”
• Ergo, the simplest thing may just be to interpret gologit as a non-linear probability model that lets you estimate the determinants & probability of each outcome occurring. Forget about the idea of a y*.
• Other interpretations, however, can preserve or modify the idea of an underlying y*

Interpretation 2: State-dependent reporting bias - gologit as measurement model
• As noted, the idea behind y* is that there is an unobserved continuous variable that gets collapsed into the limited number of categories for the observed variable y.
• HOWEVER, respondents have to decide how that collapsing should be done, e.g. they have to decide whether their feelings cross the threshold between “agree” and “strongly agree,” whether their health is “good” or “very good,” etc.

• Respondents do NOT necessarily use the same frame of reference when answering, e.g. the elderly may use a different frame of reference than the young do when assessing their health
• Other factors can also cause respondents to employ different thresholds when describing things
  - Some groups may be more modest in describing their wealth, IQ or other characteristics

• In these cases the underlying latent variable may be the same for all groups, but the thresholds/cut points used may vary.
  - Example: an estimated gender effect could reflect differences in measurement across genders rather than a real gender effect on the outcome of interest.
• Lindeboom & van Doorslaer (2004) note that this has been referred to as state-dependent reporting bias, scale of reference bias, response category cut-point shift, reporting heterogeneity & differential item functioning.

• If the difference in thresholds is constant (index shift), proportional odds will still hold
  - EX: Women's cutpoints are all a half point higher than the corresponding male cutpoints
  - ologit could be used in such cases
• If the difference is not constant (cut point shift), proportional odds will be violated
  - EX: Men and women might have the same thresholds at lower levels of pain but have different thresholds for higher levels
  - A gologit/partial proportional odds model can capture this
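The index shift versus cut point shift distinction can be checked with a few lines of arithmetic. The sketch below assumes a latent y* = xb + e with logistic errors, so that P(Y > j) = logistic(xb - cutpoint_j), and compares the female-minus-male gap in log-odds at each cutpoint. All cutpoint values, the value of xb and the function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_odds_exceed(xb, cut):
    """Log-odds of P(Y > j) when y* = xb + e (logistic e) and the threshold is cut."""
    p = logistic(xb - cut)
    return math.log(p / (1.0 - p))

def female_minus_male_gap(xb, male_cuts, female_cuts):
    """Apparent 'gender effect' on the log-odds at each cutpoint, same latent xb."""
    return [round(log_odds_exceed(xb, f) - log_odds_exceed(xb, m), 6)
            for m, f in zip(male_cuts, female_cuts)]

xb = 0.3                                   # same latent position for both groups
male_cuts = [-1.0, 0.0, 1.5]

# Index shift: every female cutpoint is half a point higher
index_shift_cuts = [c + 0.5 for c in male_cuts]
# Cut point shift: only the highest threshold differs
cut_point_shift_cuts = [-1.0, 0.0, 2.5]

gap_index = female_minus_male_gap(xb, male_cuts, index_shift_cuts)
gap_cut = female_minus_male_gap(xb, male_cuts, cut_point_shift_cuts)
print(gap_index)  # the same gap at every cutpoint: one gender coefficient suffices
print(gap_cut)    # the gap varies with j: the gender coefficient must vary too
```

In the first case a single gender coefficient reproduces the pattern, so ologit is adequate; in the second the coefficient has to differ across j, which is what a partial proportional odds model allows.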

• If you are confident that some apparent effects reflect differences in measurement rather than real differences in effects, then
  - Cutpoints (and their determinants) are substantively interesting, rather than just “nuisance” parameters
  - The idea of an underlying y* is preserved (determinants of y* are the same for all, but cutpoints differ across individuals and groups)
  - You should change the way predicted values are computed, i.e. you should just drop the measurement parameters when computing predictions (I think!)

• Key advantage: This could greatly improve cross-group comparisons, getting rid of artifactual differences caused by differences in measurement.
• Key concern: Can you really be sure the coefficients reflect measurement and not real effects, or some combination of real & measurement effects?

• Theory may help: if your model strongly claims the effect of gender should be zero, then any observed effect of gender can be attributed to measurement differences.
• But regardless of what your theory says, you may at least want to acknowledge the possibility that apparent effects could be “real” or just measurement artifacts.

Interpretation 3: The outcome is multi-dimensional
• A variable that is ordinal in some respects may not be ordinal, or may be ordered differently, in others. E.g. variables could be ordered either by direction (Strongly Disagree to Strongly Agree) or by intensity (Indifferent to Feel Strongly)

• Suppose women tend to take less extreme political positions than men. Using the first (directional) coding, an ordinal model might not work very well, whereas it could work well with the second (intensity) coding.
• But suppose that for every other independent variable the directional coding works fine in an ordinal model.

• Our choices in the past have been either to (a) run ordered logit, with the model really not appropriate for the gender variable, or (b) run multinomial logit, giving up the parsimony of the ordinal model just because one variable doesn't work with it.
• With gologit models, we have option (c): constrain the variables where it works to meet the parallel lines assumption, while freeing up other variables (e.g. gender) from that constraint.

• This interpretation suggests that there may actually be multiple y*'s that give rise to a single observed y
• NOTE: This is very similar to the rationale for the multidimensional stereotype logit model estimated by slogit.

Interpretation 4: The effect of x on y does depend on the value of y
• There are actually many situations where the effect of x on y is going to vary across the range of y.
  - EX: A 1-unit increase in x produces a 5% increase in y
  - So, if y = $10,000, the increase will be $500. But if y = $100,000, the increase will be $5,000.

• If we were using OLS, we might address this issue by transforming y, e.g. taking its log, so that the effect of x was linear and the same across all values of the transformed y.
• But with ordinal methods, we can't easily transform an unobserved latent variable; so with gologit we allow the effect of x to vary across values of y.
• This suggests that there is an underlying y*; but because we can't observe or transform it, we have to allow the regression coefficients to vary across values of y instead.
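The 5% example can be worked through explicitly. The derivation below is only an illustration of the log-transformation argument; the baseline value y_0 is a symbol introduced here, not something from the slides.

```latex
% A 1-unit increase in x multiplies y by 1.05:
y = y_0 \,(1.05)^{x}
% Effect on the raw scale depends on the current level of y:
\Delta y = 0.05\,y
  \quad\Rightarrow\quad
  \Delta y = \$500 \text{ at } y = \$10{,}000,
  \qquad
  \Delta y = \$5{,}000 \text{ at } y = \$100{,}000
% After taking logs the effect of x is linear and constant:
\ln y = \ln y_0 + x \ln(1.05) \approx \ln y_0 + 0.0488\,x
```

On the log scale a single slope (about 0.0488) describes the effect everywhere; with an unobserved y* no such transformation is available, which is the motivation for letting the coefficients vary across values of y.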

• Substantive example, from Boes & Winkelmann (2004): Completely missing so far is any evidence whether the magnitude of the income effect depends on a person's happiness: is it possible that the effect of income on happiness is different in different parts of the outcome distribution? Could it be that “money cannot buy happiness, but buy-off unhappiness” as a proverb says? And if so, how can such distributional effects be quantified?

One last methodological note on using gologit2
• Despite its name, gologit2 actually supports 5 link functions: logit, probit, log-log, complementary log-log, & Cauchit. Each of these has a somewhat different distribution, differing, for example, in how heavy the tails are and how likely it is you will get extreme values.
• Changing the link function may change whether or not a variable meets the parallel lines assumption.
• Ergo, before turning to more complicated models and interpretations, you may want to try out different link functions to see if one of them makes it more likely that the parallel lines assumption will hold.
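To get a feel for how the five distributions differ, the sketch below evaluates the usual textbook forms of the corresponding CDFs at the same points. Treat the mapping of names to formulas as an assumption: it follows common convention, and whether gologit2 parameterizes log-log and complementary log-log exactly this way should be checked against its help file. The values are also compared at the same z without rescaling for the different variances of the distributions.

```python
import math

def logit_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def probit_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cloglog_cdf(z):
    # Complementary log-log (assumed form): asymmetric
    return 1.0 - math.exp(-math.exp(z))

def loglog_cdf(z):
    # Log-log (assumed form): mirror image of cloglog
    return math.exp(-math.exp(-z))

def cauchit_cdf(z):
    # Standard Cauchy CDF: very heavy tails
    return 0.5 + math.atan(z) / math.pi

LINKS = {
    "logit": logit_cdf,
    "probit": probit_cdf,
    "cloglog": cloglog_cdf,
    "loglog": loglog_cdf,
    "cauchit": cauchit_cdf,
}

# Probability mass below z = -3 (a crude look at the left tail) and the value at z = 0
left_tail = {name: cdf(-3.0) for name, cdf in LINKS.items()}
at_zero = {name: cdf(0.0) for name, cdf in LINKS.items()}

for name in LINKS:
    print(f"{name:8s} F(-3) = {left_tail[name]:.6f}   F(0) = {at_zero[name]:.3f}")
```

The symmetric links (logit, probit, Cauchit) pass through 0.5 at z = 0 while the two log-log variants do not, and the Cauchit puts far more mass in the tail than the probit.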

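An Alternative to gologit: Heterogeneous Choice (aka Location-Scale) Models
• Heterogeneous choice (aka location-scale) models can be generalized for use with either ordinal or binary dependent variables. They can be estimated in Stata by using Williams' oglm program. (Also see handout p. 3.) For a binary outcome,
  \[ \Pr(y_i = 1) = g\left(\frac{x_i\beta}{\exp(z_i\gamma)}\right) = g\left(\frac{x_i\beta}{\sigma_i}\right) \]
  where g is the link function (e.g. logit) and \sigma_i = \exp(z_i\gamma).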

• The logit & ordered logit models assume sigma is the same for all individuals
• Allison (1999) argues that sigma often differs across groups (e.g. women have more heterogeneous career patterns). Unlike in OLS, failure to account for this results in biased parameter estimates.
• Williams (2006) shows that Allison's proposed solution for dealing with across-group differences is actually a special case of the heterogeneous choice model, and can be estimated (and improved upon) by using oglm.
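A short sketch of why differences in sigma matter: in a binary heterogeneous choice model with a logit link, Pr(y = 1) = logistic(x*beta / sigma) with sigma = exp(z*gamma), so two groups with identical beta but different sigma show different apparent slopes on the log-odds scale. The function names and all parameter values below are hypothetical; this is not oglm's implementation.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def het_choice_prob(xb, z, gamma):
    """Pr(y = 1) = logistic(xb / sigma), with sigma = exp(z * gamma)."""
    sigma = math.exp(z * gamma)
    return logistic(xb / sigma)

def apparent_slope(beta0, beta1, z, gamma):
    """Change in log-odds when x goes from 0 to 1, for a group with variance covariate z."""
    def log_odds(x):
        p = het_choice_prob(beta0 + beta1 * x, z, gamma)
        return math.log(p / (1.0 - p))
    return log_odds(1.0) - log_odds(0.0)

# Hypothetical parameters: same beta0, beta1 in both groups, only sigma differs
beta0, beta1, gamma = -0.5, 1.0, 0.5

slope_group0 = apparent_slope(beta0, beta1, z=0, gamma=gamma)  # sigma = 1
slope_group1 = apparent_slope(beta0, beta1, z=1, gamma=gamma)  # sigma = exp(0.5)

print(slope_group0)
print(slope_group1)
```

A standard logit fitted separately to each group would recover roughly beta1/sigma rather than beta1, so a naive comparison of the two slopes would suggest a group difference in effects even though the underlying beta is the same.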

• Heterogeneous choice models may also provide an attractive alternative to gologit models
  - Model fits, predicted values and ultimate substantive conclusions are sometimes similar
  - Heterogeneous choice models are more widely known and may be easier to justify and explain, both methodologically & theoretically

Example:
• (Adapted from Long & Freese, 2006. Data from the 1977 & 1989 General Social Survey)
• Respondents are asked to evaluate the following statement: “A working mother can establish just as warm and secure a relationship with her child as a mother who does not work.”
  - 1 = Strongly Disagree (SD)
  - 2 = Disagree (D)
  - 3 = Agree (A)
  - 4 = Strongly Agree (SA)

• Explanatory variables are
  - yr89 (survey year; 0 = 1977, 1 = 1989)
  - male (0 = female, 1 = male)
  - white (0 = nonwhite, 1 = white)
  - age (measured in years)
  - ed (years of education)
  - prst (occupational prestige scale)

• See handout pages 2-3 for Stata output
• For ologit, chi-square is with 6 d.f. Both gologit2 ( with 10 d.f.) and oglm ( with 8 d.f.) fit much better. The BIC test picks oglm as the best-fitting model.
• The corresponding predicted probabilities from oglm and gologit all correlate at .99 or higher.

• The marginal effects (handout p. 4) show that the heterogeneous choice and gologit models agree (unlike ologit) that the main reason attitudes became more favorable across time was that people shifted from extremely negative positions to more moderate positions
  - NOTE: In Stata, marginal effects for multiple-outcome models are easily estimated and formatted for output by using Williams's mfx2 program in conjunction with programs like estout and outreg2.
• oglm & gologit also agree that it isn't so much that men were extremely negative in their attitudes; it is more a matter of them being less likely than women to be extremely supportive.

• In the oglm printout, the negative coefficients in the variance equation for yr89 and male show that there was less variability in attitudes in 1989 than in 1977, and that men were less variable in their attitudes than women. This is substantively interesting and relatively easy to explain.

• Empirically, you'd be hard pressed to choose between oglm and gologit in this case
• Theoretical issues or simply ease and clarity of presentation might lead you to prefer oglm
  - However, see Williams (2006) and Keele & Park (2006) for potential problems and pitfalls with the heterogeneous choice model
• Of course, in other cases gologit models may be clearly preferable

For more information, see: