Logistic Regression: Regression with a Binary Dependent Variable.

Chapter 6 Logistic Regression: Regression with a Binary Dependent Variable Copyright © 2010 Pearson Education, Inc., publishing as Prentice-Hall.

LEARNING OBJECTIVES
Upon completing this chapter, you should be able to do the following:
- State the circumstances under which logistic regression should be used instead of multiple regression.
- Identify the types of dependent and independent variables used in the application of logistic regression.
- Describe the method used to transform binary measures into the likelihood and probability measures used in logistic regression.

LEARNING OBJECTIVES continued...
Upon completing this chapter, you should be able to do the following:
- Interpret the results of a logistic regression analysis and assess predictive accuracy, with comparisons to both multiple regression and discriminant analysis.
- Understand the strengths and weaknesses of logistic regression compared to discriminant analysis and multiple regression.

Logistic Regression Defined
Logistic regression is a specialized form of regression that is designed to predict and explain a binary (two-group) categorical variable rather than a metric dependent measure. Its variate is similar to that of regular regression and is made up of metric independent variables. It is less affected than discriminant analysis when the basic assumptions, particularly normality of the independent variables, are not met.

Logistic Regression May Be Preferred...
When the dependent variable has only two groups, logistic regression may be preferred for two reasons:
- Discriminant analysis assumes multivariate normality and equal variance-covariance matrices across groups, and these assumptions are often not met. Logistic regression does not face these strict assumptions and is much more robust when they are not met, making its application appropriate in many situations.
- Even if the assumptions are met, some researchers prefer logistic regression because it is similar to multiple regression. It has straightforward statistical tests, similar approaches to incorporating metric and nonmetric variables and nonlinear effects, and a wide range of diagnostics.

Logistic Regression Decision Process
Stage 1: Objectives of Logistic Regression
Stage 2: Research Design for Logistic Regression
Stage 3: Assumptions of Logistic Regression
Stage 4: Estimation of the Logistic Regression Model and Assessing Overall Fit
Stage 5: Interpretation of the Results
Stage 6: Validation of the Results

Stage 1: Objectives of Logistic Regression
Logistic regression is best suited to address two research objectives:
- Identifying the independent variables that impact group membership in the dependent variable.
- Establishing a classification system based on the logistic model for determining group membership.

Stage 2: Research Design for Logistic Regression
- The binary nature of the dependent variable (0 or 1) means the error term has a binomial distribution instead of a normal distribution, which invalidates all testing based on the assumption of normality.
- The variance of the dichotomous variable is not constant, creating instances of heteroscedasticity as well.
- Neither of the above violations can be remedied through transformations of the dependent or independent variables. Logistic regression was developed specifically to deal with these issues.
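
To illustrate the non-constant variance point: a 0/1 variable with event probability p has variance p(1 - p), so its variance depends on p. A minimal sketch (the function name is illustrative and not taken from the slides):

```python
def bernoulli_variance(p: float) -> float:
    """Variance of a 0/1 outcome whose event probability is p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    return p * (1.0 - p)


# The variance peaks at p = 0.5 and shrinks toward 0 as p nears 0 or 1.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  variance = {bernoulli_variance(p):.2f}")
```

Because the predicted probability differs from case to case, the error variance differs too, which is the heteroscedasticity the slide refers to.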

Stage 3: Assumptions of Logistic Regression
- The advantages of logistic regression are primarily the result of the general lack of assumptions.
- Logistic regression does not require any specific distributional form for the independent variables.
- Homoscedasticity is not required; heteroscedasticity does not invalidate the model.
- Linear relationships between the dependent and independent variables are not required (the model is instead linear in the logit).

Stage 4: Estimation of the Logistic Regression Model and Assessing Overall Fit
- Transforming the dependent variable
- Estimating the coefficients
  - Transforming a probability into odds and logit values
  - Model estimation
- Assessing the goodness of fit

Estimating the Coefficients
Two basic steps:
1. Transforming a probability into odds and logit values
2. Model estimation using a maximum likelihood approach, not least squares as in multiple regression
The estimation process maximizes the likelihood that an event will occur, the event being that a respondent is assigned to one group versus another.

Transforming a Probability into Odds and Logit Values
- The logistic transformation has two basic steps:
  - Restating a probability as odds, and
  - Calculating the logit values.
- Instead of using ordinary least squares to estimate the model, the maximum likelihood method is used.
- The basic measure of how well the maximum likelihood estimation procedure fits is the likelihood value.
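
The two transformation steps can be sketched as follows. The function names are illustrative; the formulas are the standard ones: odds = p / (1 - p) and logit = ln(odds).

```python
import math


def prob_to_odds(p: float) -> float:
    """Step 1: restate a probability as odds, p / (1 - p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1.0 - p)


def odds_to_logit(odds: float) -> float:
    """Step 2: the logit is the natural log of the odds."""
    return math.log(odds)


def logit_to_prob(logit: float) -> float:
    """Inverse transformation: map a logit back to a probability."""
    return 1.0 / (1.0 + math.exp(-logit))


for p in (0.2, 0.5, 0.8):
    odds = prob_to_odds(p)
    logit = odds_to_logit(odds)
    print(f"p = {p:.1f}  odds = {odds:.2f}  logit = {logit:+.3f}")
```

Probabilities are bounded by 0 and 1, odds by 0 and infinity, and logits are unbounded in both directions, which is why the logit can serve as the dependent value of a linear variate.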

Model Estimation Fit: Between-Model Comparisons...
Comparisons of the likelihood values follow three steps:
1. Estimate a null model, which acts as the "baseline" for making comparisons of improvement in model fit.
2. Estimate the proposed model, the model containing the independent variables to be included in the logistic regression.
3. Assess the -2LL difference between the two models.
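
The three steps can be sketched with a single predictor. Everything here is an assumption made for illustration: the data are invented, the fitting routine is a plain Newton-Raphson maximum likelihood loop rather than any particular package's procedure, and the p-value shortcut is valid only for a chi-square with 1 degree of freedom.

```python
import math


def fit_logit(x, y, iterations=25):
    """Maximum likelihood fit of logit(p) = b0 + b1 * x by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix (negative Hessian), 2 x 2.
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system directly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def neg2ll(y, p):
    """-2 times the log likelihood of 0/1 outcomes y given probabilities p."""
    return -2.0 * sum(
        yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi) for yi, pi in zip(y, p)
    )


# Invented data: the outcome becomes more likely as x grows, with overlap.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

# Step 1: null model (intercept only) predicts the base rate for everyone.
base_rate = sum(y) / len(y)
neg2ll_null = neg2ll(y, [base_rate] * len(y))

# Step 2: proposed model with the independent variable.
b0, b1 = fit_logit(x, y)
p_hat = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
neg2ll_model = neg2ll(y, p_hat)

# Step 3: the -2LL difference, tested as a chi-square with 1 df
# (one added coefficient). This erfc identity holds for 1 df only.
difference = neg2ll_null - neg2ll_model
p_value = math.erfc(math.sqrt(difference / 2.0))

print(f"-2LL null = {neg2ll_null:.3f}, -2LL proposed = {neg2ll_model:.3f}")
print(f"difference = {difference:.3f}, p-value (1 df) = {p_value:.4f}")
```

A smaller -2LL means better fit, so a significant reduction from the null model indicates that the independent variable improves the model.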

Comparison to Multiple Regression...
Correspondence of Primary Elements of Model Fit

Multiple Regression              Logistic Regression
Total sum of squares             -2LL of base model
Error sum of squares             -2LL of proposed model
Regression sum of squares        Difference of -2LL for base and proposed models
F test of model fit              Chi-square test of -2LL difference
Coefficient of determination     "Pseudo" R^2 measures
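
As one example of a "pseudo" R^2 measure, the proportional reduction in -2LL mirrors the way the coefficient of determination compares regression and total sums of squares. The sketch below uses that form (equivalent to McFadden's pseudo R^2); the slides do not say which pseudo R^2 measure they intend, and the input values are invented.

```python
def pseudo_r2(neg2ll_null: float, neg2ll_model: float) -> float:
    """Proportional reduction in -2LL from the base model to the proposed model."""
    if neg2ll_null <= 0:
        raise ValueError("-2LL of the base model must be positive")
    return (neg2ll_null - neg2ll_model) / neg2ll_null


# Invented -2LL values, for illustration only.
print(f"pseudo R^2 = {pseudo_r2(100.0, 60.0):.2f}")
```

A value of 0 means the proposed model fits no better than the base model, and values approach 1 as the proposed model's -2LL approaches 0.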

Stage 5: Interpretation of the Results
- Testing for significance of the coefficients, based on the Wald statistic
- Interpreting the coefficients
  - Directionality of the relationship
  - Magnitude of the relationship of metric independent variables
  - Interpreting nonmetric independent variables
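
A minimal sketch of the Wald test for a single coefficient, assuming the common form in which the squared ratio of the coefficient to its standard error is compared with a chi-square distribution with 1 degree of freedom. The coefficient and standard error are invented, and the function names are illustrative.

```python
import math


def wald_statistic(coefficient: float, std_error: float) -> float:
    """Wald statistic: the squared ratio of a coefficient to its standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return (coefficient / std_error) ** 2


def chi_square_1df_p_value(statistic: float) -> float:
    """Upper-tail p-value of a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(statistic / 2.0))


# Invented estimate and standard error, for illustration only.
w = wald_statistic(0.9, 0.4)
print(f"Wald = {w:.3f}, p-value = {chi_square_1df_p_value(w):.4f}")
```

A Wald value above about 3.84 corresponds to significance at the .05 level for 1 degree of freedom.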

Directionality of the Relationship
- A positive relationship means an increase in the independent variable is associated with an increase in the predicted probability, and vice versa. But the direction of the relationship is reflected differently for the original and exponentiated logistic coefficients.
- Original coefficient signs indicate the direction of the relationship.
- Exponentiated coefficients are interpreted differently since they are the antilogarithms of the original coefficients and cannot take negative values. Thus, exponentiated coefficients above 1.0 represent a positive relationship and values less than 1.0 represent a negative relationship.

Magnitude of the Relationship...
The magnitude of the effect of metric independent variables is interpreted differently for original and exponentiated logistic coefficients:
- Original logistic coefficients are less useful in determining the magnitude of the relationship since they reflect the change in the logit (logged odds) value.
- Exponentiated coefficients directly reflect the magnitude of the change in the odds value. But their impact is multiplicative, and a coefficient of 1.0 denotes no change (1.0 times the odds = no change).
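
The multiplicative effect can be checked numerically. In this sketch the intercept and coefficient are invented, and the model is assumed to have the single-predictor form logit = b0 + b1 * x.

```python
import math


def odds(b0: float, b1: float, x: float) -> float:
    """Odds of the event at x under logit = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)


# Invented coefficients, for illustration only.
b0, b1 = -3.0, 0.5

# A one-unit increase in x adds b1 to the logit and therefore
# multiplies the odds by exp(b1), whatever the starting value of x.
for x in (2, 3, 4):
    ratio = odds(b0, b1, x + 1) / odds(b0, b1, x)
    print(f"x = {x} -> {x + 1}: odds multiplied by {ratio:.4f}")

print(f"exp(b1) = {math.exp(b1):.4f}")
```

The change in the logit is additive (always b1), while the change in the odds is a constant multiple, which is why the exponentiated coefficient is the easier one to read for magnitude.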

Rules of Thumb 6-1: Logistic Regression
- Logistic regression is the preferred method for two-group (binary) dependent variables due to its robustness, ease of interpretation, and diagnostics.
- Sample size considerations for logistic regression are primarily focused on the size of each group, which should have 10 times the number of estimated model coefficients (the number of variables).
- Sample size should be met in both the analysis and holdout samples.
- Model significance tests are made with a chi-square test on the differences in the log likelihood values (-2LL) between two models.

Rules of Thumb 6-1 continued... Logistic Regression
- Coefficients are expressed in two forms, original and exponentiated, to assist in interpretation.
- Interpretation of the coefficients for direction and magnitude is:
  - Direction can be directly assessed in the original coefficients (positive or negative signs) or indirectly in the exponentiated coefficients (less than 1 are negative, greater than 1 are positive).
  - Magnitude is best assessed by the exponentiated coefficient, with the percentage change in the odds shown by: Percentage change = (Exponentiated Coefficient - 1.0) * 100
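
The direction and magnitude rules can be sketched as a small helper. The function name and the example coefficients are illustrative, not from the slides.

```python
import math


def describe_coefficient(b: float):
    """Return (exponentiated coefficient, percentage change in odds, direction)."""
    exp_b = math.exp(b)
    pct_change = (exp_b - 1.0) * 100.0
    if b > 0:
        direction = "positive"
    elif b < 0:
        direction = "negative"
    else:
        direction = "none"
    return exp_b, pct_change, direction


# Invented original coefficients, for illustration only.
for b in (-0.5, 0.0, 0.4):
    exp_b, pct, direction = describe_coefficient(b)
    print(f"b = {b:+.1f}  exp(b) = {exp_b:.3f}  change = {pct:+.1f}%  ({direction})")
```

The sign of the original coefficient and the position of the exponentiated coefficient relative to 1.0 always agree, so either can be used to read direction.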

Stage 6: Validation of the Results
- Involves ensuring both the internal and external validity of the results.
- The most common form of estimating external validity is creation of a holdout or validation sample and calculating the hit ratio.
- A second approach is cross-validation, typically achieved with a jackknife or "leave-one-out" process of calculating the hit ratio.
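
A minimal sketch of the hit ratio on a holdout sample, that is, the proportion of cases classified into the correct group. The 0.5 classification cutoff is an assumption, and the holdout outcomes and predicted probabilities are invented.

```python
def hit_ratio(actual, predicted_prob, cutoff=0.5):
    """Share of cases whose predicted group matches the actual group."""
    if not actual or len(actual) != len(predicted_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        1
        for y, p in zip(actual, predicted_prob)
        if (1 if p >= cutoff else 0) == y
    )
    return hits / len(actual)


# Invented holdout sample: actual groups and model-predicted probabilities.
holdout_actual = [1, 0, 1, 1, 0, 0, 1, 0]
holdout_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]

print(f"hit ratio = {hit_ratio(holdout_actual, holdout_prob):.2f}")
```

In a leave-one-out process the same calculation applies, except that each case's predicted probability comes from a model estimated without that case.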