Ordinal Logistic Regression “Good, better, best; never let it rest till your good is better and your better is best” (Anonymous)

Ordinal Logistic Regression
Also known as the “ordinal logit,” “ordered polytomous logit,” “constrained cumulative logit,” “proportional odds,” “parallel regression,” or “grouped continuous” model
Generalization of binary logistic regression to an ordinal DV
  • When applied to a dichotomous DV, it is identical to binary logistic regression

Ordinal Variables
Three or more ordered categories
Sometimes called “ordered categorical” or “ordered polytomous” variables

Ordinal DVs
Job satisfaction:
  • very dissatisfied, somewhat dissatisfied, neutral, somewhat satisfied, or very satisfied
Severity of child abuse injury:
  • none, mild, moderate, or severe
Willingness to foster children with emotional or behavioral problems:
  • least acceptable, willing to discuss, or most acceptable

Single (Dichotomous) IV Example
DV = satisfaction with foster care agencies
  • (1) dissatisfied; (2) neither satisfied nor dissatisfied; (3) satisfied
IV = agencies provided sufficient information about the role of foster care workers
  • 0 (no) or 1 (yes)
N = 300 foster mothers

Single (Dichotomous) IV Example (cont’d)
Are foster mothers who report that they were provided sufficient information about the role of foster care workers more satisfied with their foster care agencies?

Crosstabulation
Table 4.1
Relationship between information and satisfaction is statistically significant [χ²(2, N = 300) = 23.52, p < .001]

Cumulative Probability
Ordinal logistic regression focuses on cumulative probabilities of the DV, and on odds and ORs based on cumulative probabilities.
  • By cumulative probability we mean the probability that the DV is less than or equal to a particular value (e.g., 1, 2, or 3 in our example).

Cumulative Probabilities
Dissatisfied
  • Insufficient Info: .2857
  • Sufficient Info: .1151
Dissatisfied or neutral
  • Insufficient Info: .5590 (.2857 + .2733)
  • Sufficient Info: .2878 (.1151 + .1727)
Dissatisfied, neutral, or satisfied
  • Insufficient Info: 1.00 (.2857 + .2733 + .4410)
  • Sufficient Info: 1.00 (.1151 + .1727 + .7122)

Cumulative Odds
Probability that the DV is less than or equal to a particular value is compared to (divided by) the probability that it is greater than that value
  • Reverse of what you do in binary and multinomial logistic regression
  • Probability that the DV is 1 (dissatisfied) vs. the probability that it is either 2 or 3 (neutral or satisfied); probability that the DV is 1 or 2 (dissatisfied or neutral) vs. the probability that it is 3 (satisfied)

Cumulative Odds & Odds Ratios
Odds of being dissatisfied (vs. neutral or satisfied)
  • Insufficient Info: .4000 (.2857 / [1 − .2857])
  • Sufficient Info: .1301 (.1151 / [1 − .1151])
  • OR = .33 (.1301 / .4000) (−67%)
Odds of being dissatisfied or neutral (vs. satisfied)
  • Insufficient Info: 1.2676 (.5590 / [1 − .5590])
  • Sufficient Info: .4041 (.2878 / [1 − .2878])
  • OR = .32 (.4041 / 1.2676) (−68%)
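
The arithmetic on this slide can be retraced with a short script. This is a minimal sketch that starts from the cumulative probabilities reported in the slides; the dictionary keys and the function name are illustrative choices, not part of the original material.

```python
# Cumulative probabilities P(DV <= j) as reported in the slides.
cum_prob = {
    "insufficient": {"<=1": 0.2857, "<=2": 0.5590},
    "sufficient": {"<=1": 0.1151, "<=2": 0.2878},
}


def cumulative_odds(p):
    """Odds of being at or below a category: P(DV <= j) / P(DV > j)."""
    return p / (1.0 - p)


# Cumulative odds for each group and each cut point.
odds = {
    group: {cut: cumulative_odds(p) for cut, p in cuts.items()}
    for group, cuts in cum_prob.items()
}

# Odds ratio: sufficient information relative to insufficient information.
odds_ratio = {
    cut: odds["sufficient"][cut] / odds["insufficient"][cut]
    for cut in ("<=1", "<=2")
}

for cut in ("<=1", "<=2"):
    print(
        cut,
        round(odds["insufficient"][cut], 4),
        round(odds["sufficient"][cut], 4),
        round(odds_ratio[cut], 2),
    )
```

The two odds ratios are close but not identical because they come straight from the observed table; the ordinal model later replaces them with a single common estimate.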

Question & Answer
Are foster mothers who report that they were provided sufficient information about the role of foster care workers more satisfied with their foster care agencies?
The odds of being dissatisfied (vs. being neutral or satisfied) are .33 times as large (67% smaller) for mothers who received sufficient information. The odds of being dissatisfied or neutral (vs. being satisfied) are .32 times as large (68% smaller) for mothers who received sufficient information.

Ordinal Logistic Regression
Set of binary logistic regression models estimated simultaneously (like multinomial logistic regression)
  • Number of non-redundant binary logistic regression equations equals the number of categories of the DV minus one
Focus on cumulative probabilities and odds, and ORs are computed from cumulative odds (unlike multinomial logistic regression)

Threshold
Suppose our three-point variable is a rough measure of an underlying continuous satisfaction variable.
A population threshold (symbolized by τ, the Greek letter tau) is the point on this continuous variable at which a person’s level of satisfaction goes from one value to another on the ordinal measure of satisfaction.
  • e.g., the first threshold (τ1) would be the point at which the level of satisfaction goes from dissatisfied to neutral (i.e., 1 to 2), and the second threshold (τ2) would be the point at which the level of satisfaction goes from neutral to satisfied (i.e., 2 to 3).

Threshold (cont’d)
The number of thresholds is always one fewer than the number of values of the DV.
Usually thresholds are of little interest except in the calculation of estimated values.
Thresholds typically are used in place of the intercept to express the ordinal logistic regression model.
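
The threshold idea can be illustrated by cutting a latent score into ordered categories. This is only a sketch: the threshold values below are hypothetical, chosen for illustration rather than taken from any table in the slides, and the function name is an assumption. Treating a score exactly at a threshold as belonging to the higher category is also an arbitrary choice.

```python
from bisect import bisect_right


def ordinal_category(latent_score, thresholds):
    """Map a latent score to a category 1..K given K-1 ascending thresholds."""
    # bisect_right counts how many thresholds the score has reached or passed.
    return bisect_right(thresholds, latent_score) + 1


# Hypothetical thresholds (tau_1, tau_2) for a three-category DV.
thresholds = [-1.0, 0.5]
labels = {1: "dissatisfied", 2: "neutral", 3: "satisfied"}

for score in (-2.0, 0.0, 1.2):
    print(score, labels[ordinal_category(score, thresholds)])
```

Two thresholds produce three categories, which matches the rule that the number of thresholds is one fewer than the number of values of the DV.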

Estimated Cumulative Logits
L (Dissatisfied vs. Neutral/Satisfied) = τ1 − BX
L (Dissatisfied/Neutral vs. Satisfied) = τ2 − BX
Table 4.2
L (Dissatisfied vs. Neutral/Satisfied) = τ1 − 1.139X
L (Dissatisfied/Neutral vs. Satisfied) = .235 − 1.139X

Estimated Cumulative Logits (cont’d)
Each equation has a different threshold (e.g., τ1 and τ2)
One common slope (B).
  • It is assumed that the effect of the IVs is the same for different values of the DV (“parallel regression” assumption)
Slope is multiplied by a value of the IV and subtracted from, not added to, the threshold.
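
The structure described here (a separate threshold per equation, one shared slope, and a minus sign in front of the slope) can be written compactly. The notation below is a sketch: the category index j and the number of categories K are symbols introduced for illustration.

```latex
\operatorname{logit}\bigl[P(Y \le j \mid X)\bigr]
  = \ln\frac{P(Y \le j \mid X)}{P(Y > j \mid X)}
  = \tau_j - BX,
  \qquad j = 1, \dots, K - 1 .
```

For the three-category satisfaction DV, K = 3, so there are two equations with thresholds τ1 and τ2 and a single B.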

Statistical Significance
Table 4.2
  • H0: β(Info) = 0. Reject.

Estimated Cumulative Logits (X = 1)
L (Dissatisfied vs. Neutral/Satisfied) = τ1 − (1.139)(1)
L (Dissatisfied/Neutral vs. Satisfied) = .235 − (1.139)(1) = −.904

Effect of Information on Satisfaction (Cumulative Logits)

Cumulative Logits to Cumulative Odds (X = 1)
Odds (Dissatisfied vs. Neutral/Satisfied) = e^L = .129
Odds (Dissatisfied/Neutral vs. Satisfied) = e^(−.904) = .405

Effect of Information on Satisfaction (Cumulative Odds)

Cumulative Logits to Cumulative Probabilities (X = 1) (cont’d)
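
The chain from cumulative logit to cumulative odds to cumulative probability can be sketched for the second equation, whose threshold (.235) and slope (1.139) are both reported from Table 4.2. The function and constant names are illustrative assumptions; only the two coefficients come from the slides.

```python
import math

TAU_2 = 0.235   # threshold, dissatisfied/neutral vs. satisfied (Table 4.2)
B_INFO = 1.139  # common slope for the information IV (Table 4.2)


def cumulative_logit(tau, b, x):
    """Cumulative logit: the slope term is subtracted from the threshold."""
    return tau - b * x


def logit_to_odds(logit):
    """Cumulative odds are the exponentiated cumulative logit."""
    return math.exp(logit)


def odds_to_probability(odds):
    """Cumulative probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)


for x in (0, 1):
    logit = cumulative_logit(TAU_2, B_INFO, x)
    odds = logit_to_odds(logit)
    prob = odds_to_probability(odds)
    print(f"X={x}: logit={logit:.3f}, odds={odds:.3f}, P(DV<=2)={prob:.3f}")
```

The estimated probabilities land close to the observed cumulative proportions of being dissatisfied or neutral (.5590 and .2878), which is a useful sanity check on the sign convention.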

Effect of Information on Satisfaction (Cumulative Probabilities)

Odds Ratio
Reverse the sign of the slope and exponentiate it.
  • e.g., OR equals .32, calculated as e^(−1.139)
In contrast to binary logistic regression, in which odds are calculated as a ratio of probabilities for higher to lower values of the DV (odds of 1 vs. 0), in ordinal logistic regression the odds compare lower to higher values of the DV.

Odds Ratio (cont’d)
SPSS reports the exponentiated slope (e^(1.139) = 3.123); the sign of the slope is not reversed before it is exponentiated.
Reverse the sign first to obtain the OR for the cumulative odds (e^(−1.139) = .320).
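
The difference between the two exponentiated values can be checked directly. This is a small sketch using the slope of 1.139 reported from Table 4.2; the variable names are illustrative.

```python
import math

B_INFO = 1.139  # slope for the information IV (Table 4.2)

# Exponentiating the slope as printed, without reversing its sign.
exp_b_as_reported = math.exp(B_INFO)

# Reversing the sign first gives the OR for the cumulative odds
# (odds of being at or below a category).
or_cumulative = math.exp(-B_INFO)

# Percentage change in the cumulative odds for X = 1 vs. X = 0.
percent_change = (or_cumulative - 1.0) * 100.0

print(round(exp_b_as_reported, 2), round(or_cumulative, 3), round(percent_change))
```

The two values are reciprocals of each other, so either can be recovered from the other once the direction of the comparison is clear.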

Question & Answer
Are foster mothers who report that they were provided sufficient information about the role of foster care workers more satisfied with their foster care agencies?
The odds of being dissatisfied (vs. neutral or satisfied) are .32 times as large (68% smaller) for mothers who received sufficient information. Similarly, the odds of being dissatisfied or neutral (vs. satisfied) are .32 times as large (68% smaller) for mothers who received sufficient information.

Single (Quantitative) IV Example
DV = satisfaction with foster care agencies
  • (1) dissatisfied; (2) neither satisfied nor dissatisfied; (3) satisfied
IV = available time to foster (Available Time Scale); higher scores indicate more time to foster
  • Converted to z-scores
N = 300 foster mothers

Single (Quantitative) IV Example (cont’d)
Are foster mothers with more time to foster more satisfied with their foster care agencies?

Statistical Significance
Table 4.3
  • H0: β(zTime) = 0. Reject.

Odds Ratio
OR equals .76 (e^(−.281))
  • For a one standard-deviation increase in available time, the odds of being dissatisfied (vs. neutral or satisfied) are multiplied by .76 (a 24% decrease). Similarly, for a one standard-deviation increase in available time, the odds of being dissatisfied or neutral (vs. satisfied) are multiplied by .76 (a 24% decrease).

Figures zATS.xls

Estimated Cumulative Logits
L (Dissatisfied vs. Neutral/Satisfied) = τ1 − BX
L (Dissatisfied/Neutral vs. Satisfied) = τ2 − BX
Table 4.3
L (Dissatisfied vs. Neutral/Satisfied) = τ1 − .281X
L (Dissatisfied/Neutral vs. Satisfied) = τ2 − .281X

Effect of Time on Satisfaction (Cumulative Logits)

Effect of Time on Satisfaction (Cumulative Odds)

Effect of Time on Satisfaction (Cumulative Probabilities)

Question & Answer
Are foster mothers with more time to foster more satisfied with their foster care agencies?
For a one standard-deviation increase in available time, the odds of being dissatisfied (vs. neutral or satisfied) are multiplied by .76 (a 24% decrease). Similarly, for a one standard-deviation increase in available time, the odds of being dissatisfied or neutral (vs. satisfied) are multiplied by .76 (a 24% decrease).

Multiple IV Example
DV = satisfaction with foster care agencies
  • (1) dissatisfied; (2) neither satisfied nor dissatisfied; (3) satisfied
IV = available time to foster (Available Time Scale); higher scores indicate more time to foster
  • Converted to z-scores
IV = agencies provided sufficient information about the role of foster care workers
  • 0 (no) or 1 (yes)
N = 300 foster mothers

Multiple IV Example (cont’d)
Are foster mothers who receive sufficient information about the role of foster care workers more satisfied with their foster care agencies, controlling for available time to foster?

Statistical Significance
Table 4.4
  • H0: β(Info) = β(zTime) = 0. Reject.
Table 4.5
  • H0: β(Info) = 0. Reject.
  • H0: β(zTime) = 0. Reject.
Table 4.6
  • H0: β(Info) = 0. Reject.
  • H0: β(zTime) = 0. Reject.

Odds Ratio: Information
OR equals .33 (e^(−1.116))
  • The odds of being dissatisfied (vs. neutral or satisfied) are .33 times as large (67% smaller) for mothers who received sufficient information, when controlling for available time to foster. Similarly, the odds of being dissatisfied or neutral (vs. satisfied) are .33 times as large (67% smaller) for mothers who received sufficient information, when controlling for time.

Odds Ratio: Time
OR equals .77 (e^(−.260))
  • For a one standard-deviation increase in available time, the odds of being dissatisfied (vs. neutral or satisfied) are multiplied by .77 (a 23% decrease), when controlling for information. Similarly, for a one standard-deviation increase in available time, the odds of being dissatisfied or neutral (vs. satisfied) are multiplied by .77 (a 23% decrease), when controlling for information.

Estimated Cumulative Logits
Table 4.6
L (Dissatisfied vs. Neutral/Satisfied) = τ1 − [(1.116)(X_Info) + (.260)(X_zTime)]
L (Dissatisfied/Neutral vs. Satisfied) = .222 − [(1.116)(X_Info) + (.260)(X_zTime)]
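
With two IVs the same conversion applies, with both slope terms subtracted from the threshold. The sketch below uses the second equation, whose threshold (.222) and slopes (1.116 and .260) are reported from Table 4.6. The function names and the grid of z-scores are illustrative assumptions.

```python
import math

TAU_2 = 0.222    # threshold, dissatisfied/neutral vs. satisfied (Table 4.6)
B_INFO = 1.116   # slope for information (0 = no, 1 = yes)
B_ZTIME = 0.260  # slope for available time, in z-score units


def cumulative_odds(tau, info, ztime):
    """Estimated odds of being at or below the cut point."""
    logit = tau - (B_INFO * info + B_ZTIME * ztime)
    return math.exp(logit)


def cumulative_probability(tau, info, ztime):
    """Estimated probability of being at or below the cut point."""
    odds = cumulative_odds(tau, info, ztime)
    return odds / (1.0 + odds)


# Illustrative grid: information no/yes, time at -1, 0, and +1 SD.
for info in (0, 1):
    for ztime in (-1, 0, 1):
        odds = cumulative_odds(TAU_2, info, ztime)
        prob = cumulative_probability(TAU_2, info, ztime)
        print(f"info={info}, zTime={ztime:+d}: odds={odds:.3f}, P(DV<=2)={prob:.3f}")
```

Holding time fixed, the ratio of odds for info = 1 vs. info = 0 is the same at every value of time, and likewise for a one-unit change in time at either level of information. That constancy is what a model with no interaction term implies.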

Estimated Odds as a Function of Available Time and Information See Table 4.7

Estimated Probabilities as a Function of Available Time and Information See Table 4.9

Question & Answer
Are foster mothers who receive sufficient information about the role of foster care workers more satisfied with their foster care agencies, controlling for available time to foster?
The odds of being dissatisfied (vs. neutral or satisfied) are .33 times as large (67% smaller) for mothers who received sufficient information, when controlling for available time to foster. Similarly, the odds of being dissatisfied or neutral (vs. satisfied) are .33 times as large (67% smaller) for mothers who received sufficient information, when controlling for time.

Assumptions Necessary for Testing Hypotheses
Assumptions discussed in GZLM lecture
Effect of the IVs is the same for all values of the DV (“parallel lines assumption”)
  • L (Dissatisfied vs. Neutral/Satisfied) = τ1 − (B_Info X_Info + B_zTime X_zTime)
  • L (Dissatisfied/Neutral vs. Satisfied) = τ2 − (B_Info X_Info + B_zTime X_zTime)
Ordinal logistic regression assumes that B_Info is the same for both equations, and B_zTime is the same for both equations
See Table 4.10

Model Evaluation
Create a set of binary DVs from the polytomous DV
  • recode Satisfaction (1=1) (2=0) (3=0) into SatisfactionLessThan2.
  • recode Satisfaction (1=1) (2=1) (3=0) into SatisfactionLessThan3.
Run separate binary logistic regressions
Use binary logistic regression methods to detect outliers and influential observations
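
For readers working outside SPSS, the same two binary DVs can be built in a few lines. This is a sketch only: the satisfaction scores below are made-up values for illustration, and the variable names simply mirror the ones on the slide.

```python
# Hypothetical satisfaction scores (1 = dissatisfied, 2 = neutral, 3 = satisfied).
satisfaction = [1, 2, 3, 3, 2, 1, 3]

# 1 if the DV is below 2 (dissatisfied), else 0.
satisfaction_less_than_2 = [1 if s < 2 else 0 for s in satisfaction]

# 1 if the DV is below 3 (dissatisfied or neutral), else 0.
satisfaction_less_than_3 = [1 if s < 3 else 0 for s in satisfaction]

print(satisfaction_less_than_2)
print(satisfaction_less_than_3)
```

Each binary DV corresponds to one of the cumulative splits, so a three-category DV yields two of them, and each can then be used as the outcome in a separate binary logistic regression.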

Model Evaluation (cont’d)
Index plots
  • Leverage values
  • Standardized or unstandardized deviance residuals
  • Cook’s D
Graph and compare observed and estimated counts

Analogs of R²
None in standard use, and each may give different results
Typically much smaller than R² values in linear regression
Difficult to interpret

Multicollinearity
SPSS GZLM doesn’t compute multicollinearity statistics
Use SPSS linear regression
Problematic levels
  • Tolerance < .10 or
  • VIF > 10
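
Tolerance and VIF depend only on the IVs, which is why a linear regression routine can supply them. With exactly two IVs, the R² from regressing one on the other equals their squared correlation, so both statistics can be sketched by hand. The data below are hypothetical and the function names are illustrative. With more than two IVs, each predictor would need to be regressed on all the others instead.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def tolerance_and_vif(x, y):
    """Two-predictor case: tolerance = 1 - r^2, VIF = 1 / tolerance."""
    r = pearson_r(x, y)
    tolerance = 1.0 - r * r
    return tolerance, 1.0 / tolerance


# Hypothetical values for the two IVs (information, z-scored time).
info = [0, 1, 0, 1, 1, 0, 1, 0]
ztime = [-1.2, 0.3, -0.5, 1.1, 0.8, 0.1, -0.4, -0.2]

tolerance, vif = tolerance_and_vif(info, ztime)
print(round(tolerance, 3), round(vif, 3))
print("problematic" if tolerance < 0.10 or vif > 10 else "acceptable")
```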

Additional Topics
Polytomous IVs
Curvilinear relationships
Interactions

Additional Regression Models for Polytomous DVs
Ordinal probit regression
  • Substantive results essentially indistinguishable from ordinal logistic regression
  • Choice between this and ordinal logistic regression largely one of convenience and discipline-specific convention
  • Many researchers prefer ordinal logistic regression because it provides odds ratios whereas ordinal probit regression does not, and ordinal logistic regression comes with a wider variety of fit statistics

Additional Regression Models for Polytomous DVs (cont’d)
Adjacent-category logistic model
  • Compares each value of the DV to the next higher value
Continuation-ratio logistic model
  • Compares each value of the DV to all lower values
Generalized ordered logit model
  • Relaxes the parallel lines assumption

Additional Regression Models for Polytomous DVs (cont’d)
Complementary log-log link (also known as clog-log)
  • Useful when higher categories are more probable
Negative log-log link
  • Useful when lower categories are more probable
Cauchit link
  • Useful when the DV has a number of extreme values
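
The links named on these slides differ in how they map a linear predictor to a cumulative probability. The sketch below writes out the inverse of each link using its standard textbook definition; the function names are illustrative and are not tied to any particular software.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: logistic CDF."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse of the probit link: standard normal CDF."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))


def inv_cloglog(eta):
    """Inverse of the complementary log-log link, ln(-ln(1 - p))."""
    return 1.0 - math.exp(-math.exp(eta))


def inv_negloglog(eta):
    """Inverse of the negative log-log link, -ln(-ln(p))."""
    return math.exp(-math.exp(-eta))


def inv_cauchit(eta):
    """Inverse of the cauchit link: standard Cauchy CDF."""
    return 0.5 + math.atan(eta) / math.pi


for eta in (-2.0, 0.0, 2.0):
    print(
        eta,
        round(inv_logit(eta), 3),
        round(inv_probit(eta), 3),
        round(inv_cloglog(eta), 3),
        round(inv_negloglog(eta), 3),
        round(inv_cauchit(eta), 3),
    )
```

Logit, probit, and cauchit are symmetric around a probability of .5 at a linear predictor of zero, whereas the two log-log links are asymmetric. The cauchit curve also approaches 0 and 1 more slowly than the others, which is consistent with its use when extreme values are common.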