Unit 4b: Fitting the Logistic Model to Data
© Andrew Ho, Harvard Graduate School of Education
Unit 4b – Slide 1

Unit 4b – Slide 2: Course Roadmap: Unit 4b
Multiple Regression Analysis (MRA)
- Do your residuals meet the required assumptions? Test for residual normality. Use influence statistics to detect atypical data points.
- If your residuals are not independent, replace OLS by GLS regression analysis, use individual growth modeling, or specify a multilevel model.
- If time is a predictor, you need discrete-time survival analysis.
- If your outcome is categorical, you need to use binomial logistic regression analysis (dichotomous outcome) or multinomial logistic regression analysis (polytomous outcome). (Today's topic area)
- If you have more predictors than you can deal with, create taxonomies of fitted models and compare them, form composites of the indicators of any common construct, conduct a principal components analysis, use factor analysis (EFA or CFA?), or use cluster analysis.
- If your outcome vs. predictor relationship is non-linear, transform the outcome or predictor, or use non-linear regression analysis.

Unit 4b – Slide 3: The Bivariate Distribution of HOME on HUBSAL
RQ: In 1976, were married Canadian women who had children at home and husbands with higher salaries more likely to work at home, rather than join the labor force, than their married peers with no children at home and husbands who earned less?

Unit 4b – Slide 4: The Logistic Regression Model
This will be our statistical model for relating a categorical outcome to predictors. We will fit it to data using nonlinear regression analysis.
We consider the nonlinear logistic regression model for representing the hypothesized population relationship between the dichotomous outcome, HOME, and predictors:
$$p(\text{HOME}=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\text{HUBSAL})}}$$
- The outcome being modeled is the underlying probability that the value of the outcome HOME equals 1.
- Parameter β1 determines the slope of the curve, but is not equal to it (in fact, the slope is different at every point on the curve).
- Parameter β0 determines the intercept of the curve, but is not equal to it.
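To make the last two points concrete, here is a minimal Python sketch of the logistic curve and its slope. The parameter values b0 = -2.0 and b1 = 0.5 are hypothetical, chosen only for illustration, and are not estimates from the HOME/HUBSAL data. The slope at any x is b1·p·(1 - p), so it changes along the curve and peaks at b1/4 where p = 0.5; likewise, the curve's height at x = 0 is 1/(1 + e^(-b0)), not b0 itself.

```python
import math

def logistic_prob(x, b0, b1):
    """P(Y = 1 | x) under the logistic model 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def logistic_slope(x, b0, b1):
    """Slope of the curve at x: dP/dx = b1 * p * (1 - p)."""
    p = logistic_prob(x, b0, b1)
    return b1 * p * (1.0 - p)

# Hypothetical parameters, for illustration only.
b0, b1 = -2.0, 0.5
print("height at x = 0:", round(logistic_prob(0, b0, b1), 3))  # not equal to b0
for x in (0, 2, 4, 6, 8):
    # The slope differs at every x, and never exceeds b1 / 4.
    print(x, round(logistic_prob(x, b0, b1), 3), round(logistic_slope(x, b0, b1), 3))
```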

Unit 4b – Slide 5: Building the Logistic Regression Model: The Unconditional Model
- Recall from multilevel modeling that we estimate parameters by maximizing the likelihood of the data ("maximum likelihood").
- Because the likelihood is a product of many probabilities, each less than 1, it is a vanishingly small number. We therefore maximize its logarithm, the sum of the log-probabilities. Each log-probability is negative, so the sum is negative too, and maximizing it means making a negative number as close to zero as possible.
- Later, we'll use the difference in -2 × log-likelihood (the deviance) between two models in a statistical test to compare them.
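A minimal sketch of these ideas for the unconditional model (no predictors), assuming a small hypothetical set of 0/1 outcomes rather than the course data. In that model, the maximum-likelihood estimate of P(Y = 1) is simply the sample proportion, and the fitted intercept is its log-odds. The final loop prints the log-likelihood under a few other constant probabilities, each of which should be lower (more negative) than the maximum.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log-likelihood: the sum over cases of log P(observed outcome)."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi) for yi, pi in zip(y, p))

# Hypothetical 0/1 outcomes, not the course data.
y = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

p_hat = sum(y) / len(y)                    # ML estimate of P(Y = 1): the sample proportion
b0_hat = math.log(p_hat / (1.0 - p_hat))   # fitted intercept on the logit scale
ll0 = log_likelihood(y, [p_hat] * len(y))  # negative, but as close to 0 as any constant p allows
deviance0 = -2.0 * ll0

print(p_hat, round(b0_hat, 3), round(ll0, 3), round(deviance0, 3))
for p in (0.5, 0.6, 0.8):
    print(p, round(log_likelihood(y, [p] * len(y)), 3))  # each is below ll0
```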

Unit 4b – Slide 6: Building the Logistic Regression Model
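Once a predictor is added, the maximum-likelihood estimates generally have no closed form, so software finds them iteratively. Below is a minimal sketch of one common algorithm, Newton-Raphson, for a single predictor. It is an illustration under stated assumptions, not necessarily the routine the course software uses, and the data are hypothetical. If some predictor value perfectly separates the 0s from the 1s, no finite estimates exist and this loop will not converge.

```python
import math

def fit_logistic(x, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood fit of logit P(Y = 1) = b0 + b1*x via Newton-Raphson."""
    b0, b1 = 0.0, 0.0  # start at p = 0.5 for every case
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score: gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # information: negative Hessian of the log-likelihood
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step = inverse(information) @ score, with the 2x2 inverse written out.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: a binary predictor, 3 of 10 ones when x = 0 and 8 of 10 when x = 1.
x = [0] * 10 + [1] * 10
y = [1] * 3 + [0] * 7 + [1] * 8 + [0] * 2
b0, b1 = fit_logistic(x, y)
print(round(b0, 4), round(b1, 4))
```

With a single binary predictor, the ML fit reproduces the two group proportions exactly, so b0 should equal ln(3/7) ≈ -0.847 and b0 + b1 should equal ln(4) ≈ 1.386, which makes the sketch easy to check.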

Unit 4b – Slide 7: Graphical Interpretation of the Logistic Regression Model
Comparing local polynomial, linear, and logistic fits to the data.

Unit 4b – Slide 8: The Likelihood Ratio Chi-Square
- Our baseline model, with no predictors, has log-likelihood LL₀ and deviance = -2 × LL₀.
- Our 1-predictor model has log-likelihood LL₁. The log-likelihood of the data is less negative (the data are more likely) given the model parameter estimates. Deviance = -2 × LL₁.
- The deviance has dropped. Adding a predictor can never increase the deviance, so it will always drop (or, at worst, stay the same).
- The likelihood ratio chi-square is this drop in deviance, (-2 × LL₀) - (-2 × LL₁), with degrees of freedom equal to the number of predictors added.
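The likelihood ratio test can be sketched in a few lines of Python. This assumes the two models differ by one predictor, so the statistic is referred to a chi-square distribution with 1 degree of freedom, whose upper-tail probability equals erfc(√(χ²/2)). The counts in the example are hypothetical. With several added predictors you would need a general chi-square survival function (for example, SciPy's `scipy.stats.chi2.sf`) instead.

```python
import math

def lr_chi_square(ll_baseline, ll_model):
    """Likelihood ratio chi-square: the drop in deviance, where deviance = -2 * log-likelihood."""
    return (-2.0 * ll_baseline) - (-2.0 * ll_model)

def chi2_sf_df1(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical example: 20 cases with 11 ones overall; a binary predictor splits them into
# a group with 3 ones out of 10 and a group with 8 ones out of 10.
ll_baseline = 11 * math.log(11 / 20) + 9 * math.log(9 / 20)   # unconditional model
ll_model = (3 * math.log(0.3) + 7 * math.log(0.7)
            + 8 * math.log(0.8) + 2 * math.log(0.2))          # fitted group proportions
stat = lr_chi_square(ll_baseline, ll_model)
print(round(stat, 2), round(chi2_sf_df1(stat), 4))
```

Here the deviance drops by about 5.30, which exceeds the 1-df critical value of 3.84, so the predictor would be judged to improve fit at the .05 level.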

Unit 4b – Slide 9

Unit 4b – Slide 10: Interpreting Model Results Graphically, Formulaically
Husband's income (1976 Canadian dollars) | Estimated probability that the wife is a homemaker
$10,000 | 64%
$20,000 | 80%
$30,000 | 90%
$40,000 | 95%
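This table can be reproduced, up to rounding, from two numbers reported later in this unit: fitted odds of 1.77 at $10,000 (Slide 19) and a logit slope of .081 per $1,000 (Slide 22). A minimal sketch, assuming the model is linear in husband's income on the log-odds scale; it should print 64%, 80%, 90%, and 95%.

```python
import math

SLOPE_PER_1000 = 0.081  # fitted logit slope per $1,000 of husband's income (Slide 22)
ODDS_AT_10K = 1.77      # fitted odds of being a homemaker at $10,000 (Slide 19)

def fitted_odds(income):
    """Fitted odds at `income` (dollars), anchored at the $10,000 value."""
    return ODDS_AT_10K * math.exp(SLOPE_PER_1000 * (income - 10_000) / 1000)

def fitted_prob(income):
    """Convert the fitted odds to a fitted probability."""
    odds = fitted_odds(income)
    return odds / (1.0 + odds)

for income in (10_000, 20_000, 30_000, 40_000):
    print(f"${income:,}: {fitted_prob(income):.0%}")
```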

Unit 4b – Slide 11: Interpreting Logistic Model Parameter Estimates – Interpreting Sign

Unit 4b – Slide 12: Probability, Odds, and Log-Odds: Formulaically
Object | Is it an Easter egg? (0 = no; 1 = yes) | Probability of picking an Easter egg at random, p | Odds of picking an Easter egg (vs. not an Easter egg), p/(1 - p) | Log-odds of picking an Easter egg (vs. not an Easter egg), log(p/(1 - p))
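A minimal sketch of the three quantities on this slide, computed from 0/1 indicators. The basket of objects below is hypothetical. Note that p = 0 or p = 1 would make the log-odds or the odds undefined.

```python
import math

def prob_odds_logodds(indicators):
    """Probability, odds, and log-odds of drawing a 1 at random from 0/1 indicators."""
    p = sum(indicators) / len(indicators)
    odds = p / (1.0 - p)            # undefined if p == 1
    return p, odds, math.log(odds)  # log undefined if p == 0

# Hypothetical basket: 1 = Easter egg, 0 = not an Easter egg.
basket = [1, 0, 0, 1, 0, 0, 1, 0]
p, odds, log_odds = prob_odds_logodds(basket)
print(p, round(odds, 3), round(log_odds, 3))
```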

Unit 4b – Slide 13: Probability, Odds, and Log-Odds: By Range
One issue with probabilities is that their admissible values are restricted to the interval from 0 to 1. This was one of our clues that a linear model would be inappropriate. The logit transformation stretches the probability scale, facilitating a linear relationship.
Quantity | Formula | Theoretical minimum | Theoretical maximum
Probability | p | 0 | 1
Odds | p/(1 - p) | 0 | +∞
Log(odds), or "logit" | log(p/(1 - p)) | -∞ | +∞
Notice that a log-odds transformation of a probability leads to a scale with an unrestricted range.

Unit 4b – Slide 14: From Probabilities to Odds
Percentage | Probability | Odds
10% | 0.10 | 1/9 (0.11)
25% | 0.25 | 1/3 (0.33)
50% | 0.50 | 1/1 (1)
75% | 0.75 | 3/1 (3)
90% | 0.90 | 9/1 (9)

Unit 4b – Slide 15: From Probabilities to Log-Odds (Logits)
Percentage | Probability | Logit
10% | 0.10 | -2.20
25% | 0.25 | -1.10
50% | 0.50 | 0.00
75% | 0.75 | 1.10
90% | 0.90 | 2.20
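Both conversion tables can be generated with a short script, assuming natural logarithms (consistent with the odds ratio of e^.81 ≈ 2.25 used later in this unit). Exact fractions reproduce the odds column, and the output shows that logits of complementary probabilities are mirror images (for example, ±2.20 for 10% and 90%).

```python
import math
from fractions import Fraction

def conversion_row(pct):
    """Return (probability, exact odds, natural-log odds) for a percentage."""
    p = Fraction(pct, 100)
    odds = p / (1 - p)
    return p, odds, math.log(float(odds))

for pct in (10, 25, 50, 75, 90):
    p, odds, logit = conversion_row(pct)
    print(f"{pct}%  p = {float(p):.2f}  odds = {odds} ({float(odds):.2f})  logit = {logit:+.2f}")
```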

Unit 4b – Slide 16: The Logistic Function as the Inverse of the Logit Function
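Solving the logit equation for p shows why the logistic function is the logit's inverse. Writing z for the linear predictor (in this unit's model, z = β0 + β1·HUBSAL):

```latex
\ln\!\left(\frac{p}{1-p}\right) = z
\;\Rightarrow\; \frac{p}{1-p} = e^{z}
\;\Rightarrow\; p = e^{z}(1-p)
\;\Rightarrow\; p\,(1+e^{z}) = e^{z}
\;\Rightarrow\; p = \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}}
```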

Unit 4b – Slide 17: General Relationship; Our Model

Unit 4b – Slide 18: Interpreting Coefficients in Terms of Logits (Log-Odds)

Unit 4b – Slide 19: Interpreting Model Results in Terms of Odds
- When the husband earns $10K/year, the fitted odds that the woman is a homemaker are 1.77 to 1.
- When the husband earns $10K/year, for every woman in the workforce, we estimate that 1.77 are homemakers.
- When the husband earns $10K/year, the estimated probability that the woman is a homemaker is 1.77 times the estimated probability that the woman works outside the home.
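The three statements describe the same fact in three ways, as a minimal sketch shows. The only input is the fitted odds of 1.77 at $10K from this slide; the helper names are illustrative.

```python
def odds_to_prob(odds):
    """Convert odds (e.g. homemakers per woman in the workforce) to a probability."""
    return odds / (1.0 + odds)

def prob_to_odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

odds_10k = 1.77                   # fitted odds at $10K/year, from this slide
p_home = odds_to_prob(odds_10k)   # about 0.64: roughly 177 homemakers per 277 such women
p_work = 1.0 - p_home             # about 0.36: roughly 100 per 277 work outside the home
print(round(p_home, 3), round(p_work, 3), round(p_home / p_work, 2))
```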

Unit 4b – Slide 20: Interpreting Model Results in Terms of Odds Ratios
Husband's income (1976 Canadian dollars) | Estimated probability that the wife is a homemaker | Estimated odds that the wife is a homemaker | Estimated odds ratio
$10,000 | 64% | 1.77 |
$20,000 | 80% | 3.99 | 2.25
$30,000 | 90% | | 2.25
$40,000 | 95% | 20.15 | 2.25
We can calculate the ratio of odds at regular intervals. How many times greater are the odds that a wife is a homemaker when the husband's salary is $20,000 rather than $10,000? This odds ratio is 3.99/1.77 = 2.25. This is not a typo: successive odds ratios are constant!
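A minimal sketch of why the odds ratios are constant, assuming the slope of .081 per $1,000 reported on Slide 22: each $10,000 step multiplies the odds by e^(10 × .081) = e^.81, wherever the step starts. The small gap between e^.81 ≈ 2.248 and 3.99/1.77 ≈ 2.254 comes from rounding in the reported numbers; both round to 2.25.

```python
import math

SLOPE_PER_1000 = 0.081  # fitted logit slope per $1,000 of husband's income (Slide 22)

def odds_ratio(delta_dollars, slope_per_1000=SLOPE_PER_1000):
    """Odds ratio for two incomes differing by `delta_dollars` under a logit-linear model."""
    return math.exp(slope_per_1000 * delta_dollars / 1000)

print(round(odds_ratio(10_000), 2))  # odds ratio for each $10,000 step
print(round(3.99 / 1.77, 2))         # the same ratio from the slide's rounded odds
# Ratios compound multiplicatively: three $10,000 steps equal one $30,000 step.
print(round(odds_ratio(10_000) ** 3, 2), round(odds_ratio(30_000), 2))
```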

Unit 4b – Slide 21: From Log-Odds to Odds Ratios
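The step from log-odds to odds ratios can be written out directly. If the log-odds are linear in x, two values of x that differ by Δ differ by β1Δ in log-odds, and exponentiating turns that difference into a ratio of odds:

```latex
\ln\big(\text{odds}(x+\Delta)\big) - \ln\big(\text{odds}(x)\big)
  = \big[\beta_0 + \beta_1 (x+\Delta)\big] - \big[\beta_0 + \beta_1 x\big]
  = \beta_1 \Delta
\quad\Rightarrow\quad
\frac{\text{odds}(x+\Delta)}{\text{odds}(x)} = e^{\beta_1 \Delta}
```

With β1 = .081 per $1,000 and a salary difference of $10,000 (Δ = 10, in thousands), the odds ratio is e^.81 ≈ 2.25, the same at every income level.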

Unit 4b – Slide 22: Four Ways to Interpret Slope Coefficients in a Logistic Regression Model
- Pick prototypical odds: estimated odds of being a homemaker across prototypical husband's income levels (see the table on Slide 20).
- Log-odds/logits: two women whose husbands' 1976 salaries differ by $1,000 differ by .081 in their fitted log-odds of being a homemaker.
- Pick prototypical probabilities: estimated probabilities of being a homemaker across prototypical husband's income levels (see the table on Slide 20).