Unit 4c: Taxonomies of Logistic Regression Models © Andrew Ho, Harvard Graduate School of Education, Unit 4c – Slide 1



Unit 4c – Slide 2: Course Roadmap
Today's topics: Building the Logistic Regression Model; Dichotomous Predictors; Interactions; Post-Hoc GLH Tests
Multiple Regression Analysis (MRA)
• Do your residuals meet the required assumptions? Test for residual normality. Use influence statistics to detect atypical datapoints.
• If your residuals are not independent, replace OLS by GLS regression analysis. Use individual growth modeling. Specify a multi-level model.
• If time is a predictor, you need discrete-time survival analysis…
• If your outcome is categorical, you need to use… binomial logistic regression analysis (dichotomous outcome) or multinomial logistic regression analysis (polytomous outcome). (Today's Topic Area)
• If you have more predictors than you can deal with, create taxonomies of fitted models and compare them. Form composites of the indicators of any common construct. Conduct a Principal Components Analysis. Use Cluster Analysis. Use Factor Analysis: EFA or CFA?
• If your outcome vs. predictor relationship is non-linear, use non-linear regression analysis. Transform the outcome or predictor.

Unit 4c – Slide 3: The Bivariate Distribution of HOME on HUBSAL
RQ: In 1976, were married Canadian women who had children at home and husbands with higher salaries more likely to work at home rather than joining the labor force (when compared to their married peers with no children at home and husbands who earn less)?

Unit 4c – Slide 4: The Bivariate Distribution of HOME on CHILD
Scatterplots don't work very well with dichotomous outcomes and dichotomous predictors. Instead, try a 2x2 table with the "tabulate" command. Note that (1,1) is in the lower right for tables but in the upper right for scatterplots.
• Specifies conditional percentages by rows (and joint probabilities by cells):
• Given that there is a child present, the sample probability of being a homemaker is 86.58%.
• Given that there is no child present, the sample probability of being a homemaker is 35.29%.
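The row-conditional and cell (joint) percentages described above can be sketched in a few lines of Python. This is only an illustration: the counts below are hypothetical and are not the counts behind the slide's table, and the names `counts`, `row_percentages`, and `cell_percentages` are made up for this sketch.

```python
# HYPOTHETICAL counts for illustration only; NOT the counts behind the slide's table.
# Outer key: CHILD (0 = no child, 1 = child present).
# Inner key: HOME (0 = in labor force, 1 = homemaker).
counts = {
    0: {0: 30, 1: 20},
    1: {0: 10, 1: 40},
}

def row_percentages(table):
    """Percentages conditional on each row: P(HOME = h | CHILD = c)."""
    result = {}
    for child, row in table.items():
        row_total = sum(row.values())
        result[child] = {home: 100.0 * n / row_total for home, n in row.items()}
    return result

def cell_percentages(table):
    """Joint percentages, one per cell: P(CHILD = c, HOME = h)."""
    grand_total = sum(n for row in table.values() for n in row.values())
    return {
        child: {home: 100.0 * n / grand_total for home, n in row.items()}
        for child, row in table.items()
    }

row_pct = row_percentages(counts)
cell_pct = cell_percentages(counts)
for child in (0, 1):
    print(
        f"CHILD={child}: P(HOME=1 | CHILD) = {row_pct[child][1]:.2f}%, "
        f"joint P(CHILD, HOME=1) = {cell_pct[child][1]:.2f}%"
    )
```

The row percentages are the ones that answer "given a child is present, how likely is HOME = 1?"; the cell percentages divide by the grand total instead.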

Unit 4c – Slide 5: Sample Probabilities, Odds, Log-Odds, and Odds Ratios
Table: Are Children Present in the Home? by Sample Probability Homemaker, Sample Log-Odds (Logit), Sample Difference in Log-Odds, Sample Odds Ratio, Sample Log-Odds Ratio.
No Child: sample probability 35.29%
Children: sample probability 86.58%
I recommend understanding the logit scale (nonlinear in probability): -2 is around 10%, -1 is around 25%, 0 is exactly 50%, 1 is around 75%, and 2 is around 90%.
Moving from No Child (0) to Children (1) raises the log-odds by 2.47.
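As a rough check on the quantities named on this slide, the two sample probabilities (35.29% and 86.58%) can be converted to odds, log-odds, and an odds ratio directly. The sketch below uses only those two figures from the slide; the function and variable names are illustrative choices, not part of the course code.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Probability corresponding to a log-odds value x."""
    return 1.0 / (1.0 + math.exp(-x))

# Sample probabilities of being a homemaker, taken from the slide.
p_no_child = 0.3529
p_children = 0.8658

odds_no_child = p_no_child / (1.0 - p_no_child)
odds_children = p_children / (1.0 - p_children)

logit_no_child = logit(p_no_child)                # roughly -0.61
logit_children = logit(p_children)                # roughly 1.864
log_odds_diff = logit_children - logit_no_child   # roughly 2.47
odds_ratio = odds_children / odds_no_child        # roughly 11.8

print(f"odds: {odds_no_child:.3f} vs {odds_children:.3f}")
print(f"logits: {logit_no_child:.3f} vs {logit_children:.3f}")
print(f"difference in log-odds: {log_odds_diff:.2f}")
print(f"odds ratio: {odds_ratio:.2f}, log odds ratio: {math.log(odds_ratio):.2f}")

# The logit scale is nonlinear in probability: equal steps in log-odds
# correspond to unequal steps in probability.
for x in (-2, -1, 0, 1, 2):
    print(f"logit {x:+d} -> probability {inv_logit(x):.3f}")
```

The log of the odds ratio and the difference in log-odds are the same number, which is why the slide lists them side by side.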

Unit 4c – Slide 6: Modeling a Dichotomous Outcome on a Dichotomous Predictor
Table: Are Children Present in the Home? by Sample Probability Homemaker, Sample Log-Odds (Logit), Sample Difference in Log-Odds.
No Child: sample probability 35.29%
Children: sample probability 86.58%, sample log-odds 1.864
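With a single dichotomous predictor, the logistic model logit P(HOME=1) = b0 + b1*CHILD has one parameter per group, so its maximum-likelihood fit reproduces the sample log-odds: b0 is the logit for CHILD = 0 and b1 is the difference in logits. The sketch below illustrates this with a small hand-written Newton-Raphson fit. The grouped counts are hypothetical (not the slide's sample), and `fit_logit_one_dummy` is a name invented here; in practice a statistics package would do the fitting.

```python
import math

# HYPOTHETICAL grouped data, not the slide's sample.
# Key: CHILD value; value: (number of women, number who are homemakers).
groups = {0: (50, 20), 1: (50, 40)}

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def fit_logit_one_dummy(groups, iterations=25):
    """Newton-Raphson ML fit of logit P(HOME=1) = b0 + b1*CHILD on grouped data."""
    n0, y0 = groups[0]
    n1, y1 = groups[1]
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p0 = inv_logit(b0)         # fitted probability when CHILD = 0
        p1 = inv_logit(b0 + b1)    # fitted probability when CHILD = 1
        # Score vector: gradient of the log-likelihood.
        g0 = (y0 - n0 * p0) + (y1 - n1 * p1)
        g1 = y1 - n1 * p1
        # Binomial weights n*p*(1-p) for each group.
        w0 = n0 * p0 * (1.0 - p0)
        w1 = n1 * p1 * (1.0 - p1)
        # Information matrix is [[w0 + w1, w1], [w1, w1]]; its determinant is w0*w1.
        det = w0 * w1
        b0 += (w1 * g0 - w1 * g1) / det
        b1 += (-w1 * g0 + (w0 + w1) * g1) / det
    return b0, b1

b0_hat, b1_hat = fit_logit_one_dummy(groups)

# Sample logits computed directly from the group proportions.
sample_logit_0 = math.log((20 / 50) / (30 / 50))
sample_logit_1 = math.log((40 / 50) / (10 / 50))

print(f"b0 = {b0_hat:.4f}  (sample logit, CHILD=0: {sample_logit_0:.4f})")
print(f"b1 = {b1_hat:.4f}  (difference in sample logits: {sample_logit_1 - sample_logit_0:.4f})")
```

Applied to the slide's own figures, the same logic implies an intercept equal to the No Child log-odds and a slope equal to the 2.47 difference in log-odds.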

Unit 4c – Slide 7: Building the logistic regression model
• Our old friend eststo:
• Begin with the baseline model: no predictors, constant only (Model 1).
• Add main effects separately (Models 2 and 3), together (Model 4), and then an interaction (Model 5).
• At each step, save the "deviance" (-2*loglikelihood).
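Saving the deviance at each step matters because the drop in deviance between two nested models is the likelihood-ratio chi-square statistic. The sketch below works this through for the first step of such a taxonomy (constant-only model versus a model with one dichotomous predictor). The counts are hypothetical, the helper names are invented for this illustration, and the p-value formula used is specific to 1 degree of freedom.

```python
import math

# HYPOTHETICAL grouped data, not the slide's sample.
# Key: CHILD value; value: (number of women, number who are homemakers).
groups = {0: (50, 20), 1: (50, 40)}

def bernoulli_loglik(n, y, p):
    """Log-likelihood of y successes in n independent 0/1 outcomes with probability p."""
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square with 1 df (valid for 1 df only)."""
    return math.erfc(math.sqrt(x / 2.0))

# Model 1: constant only, so every woman gets the overall sample proportion.
n_total = sum(n for n, _ in groups.values())
y_total = sum(y for _, y in groups.values())
loglik_m1 = bernoulli_loglik(n_total, y_total, y_total / n_total)

# Model 2: one dichotomous predictor, so each group gets its own sample proportion.
loglik_m2 = sum(bernoulli_loglik(n, y, y / n) for n, y in groups.values())

deviance_m1 = -2.0 * loglik_m1
deviance_m2 = -2.0 * loglik_m2

# The drop in deviance is the likelihood-ratio chi-square; df = 1 added parameter.
lr_chi2 = deviance_m1 - deviance_m2
p_value = chi2_sf_df1(lr_chi2)

print(f"deviance, Model 1: {deviance_m1:.3f}")
print(f"deviance, Model 2: {deviance_m2:.3f}")
print(f"LR chi-square (1 df): {lr_chi2:.3f}, p = {p_value:.5f}")
```

For comparisons that add more than one parameter, the same deviance difference applies but the reference distribution has correspondingly more degrees of freedom, which this 1-df helper does not cover.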

Unit 4c – Slide 8: Interpretation of Main Effects

Unit 4c – Slide 9: Interpretation of Fit Statistics

Unit 4c – Slide 10: Graphical Representation of Model 4
[Figure: fitted curves labeled No Children and Children]
• It is always good practice to plot fitted curves only over the range of the data whose relationships they describe.
• This is particularly important when graphing logistic regression models on the probability metric, where the relationships are clearly nonlinear.
• See today's code for details. Label curves.
• How do we interpret the varying gap? As an interaction?
• No! There is no interaction in Model 4.
• The scale is not what it seems. This is actually a linear model in the log-odds.
• The distance is just as large at the extremes as it is in the center; it just doesn't seem that way because we are plotting on the probability metric.
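The point about the varying gap can be checked numerically. In a main-effects model, the gap between the two curves is constant on the log-odds scale and changes only after converting to probabilities. The coefficients below are hypothetical placeholders, not the fitted Model 4 estimates, and treating HUBSAL as measured in thousands of dollars is an assumption made for this sketch.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

# HYPOTHETICAL main-effects coefficients (NOT the fitted Model 4 estimates):
# logit P(HOME=1) = B0 + B_CHILD*CHILD + B_HUBSAL*HUBSAL, with no interaction term.
B0, B_CHILD, B_HUBSAL = -1.5, 2.0, 0.05

def fitted_logit(child, hubsal):
    return B0 + B_CHILD * child + B_HUBSAL * hubsal

logit_gaps = []
prob_gaps = []
for hubsal in (1, 15, 30, 45):  # assumed to be in thousands of dollars
    l0 = fitted_logit(0, hubsal)
    l1 = fitted_logit(1, hubsal)
    logit_gaps.append(l1 - l0)
    prob_gaps.append(inv_logit(l1) - inv_logit(l0))
    print(
        f"HUBSAL={hubsal:2d}: gap in log-odds = {l1 - l0:.2f}, "
        f"gap in probability = {inv_logit(l1) - inv_logit(l0):.3f}"
    )
```

The log-odds gap equals the CHILD coefficient at every value of HUBSAL, while the probability gap shrinks as either curve approaches 0 or 1. That shrinkage is a property of the scale, not evidence of an interaction.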

Unit 4c – Slide 11: Contrasting Graphical Representations of Model 4
[Figure: two panels, each with curves labeled No Children and Children]

Unit 4c – Slide 12: Interpretation of Model 5
[Figure: curves labeled No Children and Children]

Unit 4c – Slide 13: Contrasting Graphical Representations of Model 5
[Figure: two panels, each with curves labeled No Children and Children]

Unit 4c – Slide 14: Post-Hoc GLH Tests: Gaps Between Conditional Logistic Curves
[Figure: two panels, each with curves labeled No Children and Children]

Unit 4c – Slide 15: Post-Hoc GLH Tests: Conditional Slopes
[Figure: two panels, each with curves labeled No Children and Children]
Are these "slopes" 0 in the population?

Unit 4c – Slide 16: Even more "general" GLH Tests: Any two points.
[Figure: two panels, each with curves labeled No Children and Children]
Does a wife with 1+ child and a low-income husband ($1K) have the same population probability of being a homemaker as a wife with no children but a wealthier husband ($35K)?
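A comparison of any two points, like a test of a conditional slope, is a test of a linear combination of coefficients. The sketch below shows one way to set that up as a Wald test. Everything numeric here is hypothetical: the coefficient vector, the covariance matrix (kept diagonal for brevity, whereas real output has nonzero covariances that must be included), the coefficient ordering, and the assumption that HUBSAL is in thousands of dollars. The function name `wald_test` is also invented for this illustration.

```python
import math

# Assumed Model 5 form:
# logit P(HOME=1) = b0 + b1*CHILD + b2*HUBSAL + b3*CHILD*HUBSAL
# HYPOTHETICAL estimates, ordered (constant, CHILD, HUBSAL, CHILD x HUBSAL).
b = [-1.0, 2.5, 0.04, -0.02]

# HYPOTHETICAL covariance matrix of the estimates, diagonal only for brevity.
V = [
    [0.30, 0.0, 0.0, 0.0],
    [0.0, 0.40, 0.0, 0.0],
    [0.0, 0.0, 0.0009, 0.0],
    [0.0, 0.0, 0.0, 0.0016],
]

def design_row(child, hubsal):
    """Predictor values for one point, in the same order as b."""
    return [1.0, float(child), float(hubsal), float(child * hubsal)]

def wald_test(c, b, V):
    """Wald test of H0: c'b = 0. Returns (estimate, standard error, z, two-sided p)."""
    estimate = sum(ci * bi for ci, bi in zip(c, b))
    variance = sum(c[i] * V[i][j] * c[j] for i in range(len(c)) for j in range(len(c)))
    se = math.sqrt(variance)
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return estimate, se, z, p

# Any two points: (CHILD=1, HUBSAL=1) versus (CHILD=0, HUBSAL=35).
point_a = design_row(1, 1)
point_b = design_row(0, 35)
contrast = [xa - xb for xa, xb in zip(point_a, point_b)]  # [0, 1, -34, 1]
two_point = wald_test(contrast, b, V)

# Conditional slopes on the log-odds scale.
slope_no_children = wald_test([0.0, 0.0, 1.0, 0.0], b, V)  # b2
slope_children = wald_test([0.0, 0.0, 1.0, 1.0], b, V)     # b2 + b3

for label, (est, se, z, p) in [
    ("two-point contrast", two_point),
    ("slope, no children", slope_no_children),
    ("slope, children", slope_children),
]:
    print(f"{label}: estimate = {est:.3f}, se = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Because the logit is a one-to-one function of probability, testing whether the two points have equal log-odds is equivalent to testing whether they have equal probability.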

Unit 4c – Slide 17: Revisiting Model Fit and Error Variance