An Introduction to Logistic Regression


An Introduction to Logistic Regression for Categorical Dependent Variables (GV917)

What do we do when the dependent variable in a regression is a dummy variable? Suppose we have the dummy variable turnout:
1 – the survey respondent turns out to vote
0 – the respondent does not vote
One thing we could do is simply run an ordinary least squares regression.

The Data (N = 30)
Turnout: 1 = yes, 0 = no
Interest in the Election: 1 = not at all interested, 2 = not very interested, 3 = fairly interested, 4 = very interested

The first few cases:
Turnout   Interest
1.00      4.00
.00       1.00
1.00      3.00
1.00      2.00
.00       2.00
1.00      1.00

OLS Regression of Turnout on Interest – The Linear Probability Model

The Residuals of the OLS Turnout Regression

What’s Wrong?
- Predicted probabilities that exceed 1.0, which makes no sense.
- The t and F test statistics are not valid because the sampling distribution of the residuals does not meet the required assumptions (heteroscedasticity).
- We can correct for the heteroscedasticity, but a better option is to use a logistic regression model.

Some Preliminaries Needed for Logistic Regression: Odds Ratios
These are defined as the probability of an event occurring divided by the probability of it not occurring. Thus if p is the probability of an event:

Odds = p / (1 - p)

For example, in the 2005 British Election Study Face-to-Face survey 48.2 per cent of the sample were men and 51.8 per cent were women. Thus the odds of being a man were 0.482 / 0.518 = 0.93, and the odds of being a woman were 0.518 / 0.482 = 1.07.

Note that if the odds ratio were 1.00 it would mean that women were equally likely to appear in the survey as men.
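As a quick check on this arithmetic, here is a minimal Python sketch (the proportions are those quoted above; the variable names are just for illustration):

p_man = 0.482      # proportion of men in the 2005 BES face-to-face sample
p_woman = 0.518    # proportion of women

odds_man = p_man / (1 - p_man)        # 0.482 / 0.518 ≈ 0.93
odds_woman = p_woman / (1 - p_woman)  # 0.518 / 0.482 ≈ 1.07

print(round(odds_man, 2), round(odds_woman, 2))  # 0.93 1.07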

Log Odds
The natural logarithm of a number is the power to which we must raise e (2.718) to give the number in question. So the natural logarithm of 100 is 4.605, because 100 = e^4.605. This can be written 100 = exp(4.605). Similarly, the anti-log of 4.605 is 100, because e^4.605 = 100.

In the 2005 BES study 70.5 per cent of men and 72.9 per cent of women voted. The odds of men voting were 0.705/0.295 = 2.39, and the log odds were ln(2.39) = 0.8712. The odds of women voting were 0.729/0.271 = 2.69, and the log odds were ln(2.69) = 0.9896.

Note that ln(1.0) = 0, so that when the odds ratio is 1.0 the log odds ratio is zero.
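The same calculations in a short Python sketch, using the standard library's math module (figures as quoted above):

import math

odds_men_voting = 0.705 / 0.295      # ≈ 2.39
odds_women_voting = 0.729 / 0.271    # ≈ 2.69

log_odds_men = math.log(odds_men_voting)      # ≈ 0.8712
log_odds_women = math.log(odds_women_voting)  # ≈ 0.9896

# the anti-log (exp) recovers the odds, and ln(1.0) is exactly 0
print(round(math.exp(log_odds_men), 2), math.log(1.0))  # 2.39  0.0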

Why Use Logarithms? They have three advantages:
1. Odds vary from 0 to ∞, whereas log odds vary from -∞ to +∞ and are centered on 0. Odds less than 1 have negative log odds, and odds greater than 1 have positive log odds. This accords better with the natural number system, which runs from -∞ to +∞.
2. Multiplying any two numbers together is equivalent to adding their logs. Logs therefore make it possible to convert multiplicative models into additive models, a useful property in the case of logistic regression, which is a non-linear multiplicative model when not expressed in logs.
3. A useful statistic for evaluating the fit of models is -2*log likelihood (also known as the deviance). The model has to be expressed in logarithms for this to work.

Logistic Regression

ln[ p(y) / (1 - p(y)) ] = a + bXi

where p(y) is the predicted probability of being a voter and 1 - p(y) is the predicted probability of not being a voter.

If we express this in terms of anti-logs, or odds ratios, then:

p(y) / (1 - p(y)) = exp(a + bXi)

and

p(y) = exp(a + bXi) / (1 + exp(a + bXi))
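This transformation is easy to write as a small function. The sketch below is generic Python, not output from any particular package, and the names a, b and x are placeholders:

import math

def predicted_probability(a, b, x):
    """p(y) = exp(a + b*x) / (1 + exp(a + b*x)) -- always between 0 and 1."""
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

print(predicted_probability(0.0, 1.0, -10))  # close to 0
print(predicted_probability(0.0, 1.0, 10))   # close to 1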

The Logistic Function
The logistic function can never be greater than one, so there are no impossible probabilities. It also corrects for the problems with the test statistics.

Estimating a Logistic Regression
In OLS regression the least squares solution can be defined analytically – there are equations, called the normal equations, which we use to find the values of a and b. In logistic regression there are no such equations. The solutions are derived iteratively – by a process of trial and error.

Doing this involves identifying a likelihood function. A likelihood is a measure of how typical a sample is of a given population. For example, we can calculate how typical the ages of the students in this class are in comparison with students in the university as a whole.

Applied to our regression problem, we are working out how likely individuals are to be voters given their level of interest in the election and given values for the a and b coefficients. We ‘try out’ different values of a and b, and maximum likelihood estimation identifies the values which are most likely to reproduce the distribution of voters and non-voters we see in the sample, given their levels of interest in the election.
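The ‘trial and error’ idea can be illustrated with a crude grid search over candidate values of a and b, keeping the pair with the highest log likelihood. The tiny dataset below is invented purely for illustration (it is not the N = 30 sample used in these slides), and real software uses smarter iterative algorithms:

import math

# Invented illustration data: (interest score, turnout), NOT the N = 30 sample above
data = [(4, 1), (1, 0), (3, 1), (2, 1), (2, 0), (1, 1),
        (4, 1), (1, 0), (3, 0), (2, 1)]

def log_likelihood(a, b):
    """Log of the probability of observing exactly these voters and non-voters."""
    total = 0.0
    for x, y in data:
        p = math.exp(a + b * x) / (1 + math.exp(a + b * x))
        total += math.log(p) if y == 1 else math.log(1 - p)
    return total

# 'Try out' a grid of candidate (a, b) values and keep the most likely pair
candidates = [(a / 10, b / 10) for a in range(-50, 51) for b in range(-50, 51)]
a_hat, b_hat = max(candidates, key=lambda ab: log_likelihood(*ab))
print(a_hat, b_hat, log_likelihood(a_hat, b_hat))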

A Note on Maximum Likelihood
Define the probability of getting a head when tossing a fair coin as p(H) = 0.5, so that the probability of getting a tail is 1 - p(H) = 0.5. The probability of two heads followed by a tail is therefore:

P(H, H, T) = (0.5)(0.5)(0.5) = 0.125

We can get this outcome in 3 different ways (the tail can be first, second or third), so the probability of getting 2 heads and a tail without worrying about the sequence is 0.125(3) = 0.375.

But suppose we did not know the value of p(H). We could ‘try out’ different values and see how well they fitted an experiment consisting of repeated sets of three coin tosses. For example, if we thought p(H) = 0.4, then two heads and a tail would give (0.4)(0.4)(0.6)(3) = 0.288. If we thought it was 0.3 we would get (0.3)(0.3)(0.7)(3) = 0.189.

Maximum Likelihood in General
More generally we can write a likelihood function for this exercise:

LF = π [p² × (1 - p)]

where p is the probability of getting a head and π is the number of ways this sequence can occur (here 3). The value of p that maximises this function is p = 2/3, the observed proportion of heads, making this the maximum likelihood estimate of p given the sequence two heads and a tail.
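A quick numerical check (a throwaway Python loop, not part of the original analysis) shows where this likelihood peaks:

for p in (0.3, 0.4, 0.5, 2/3, 0.7):
    lf = 3 * p**2 * (1 - p)           # likelihood of two heads and a tail, in any order
    print(round(p, 3), round(lf, 3))  # peaks at p = 2/3 with a value of about 0.444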

Explaining Variance
In OLS regression we defined the following expression:

Σ(Yi - Ȳ)² = Σ(Ŷi - Ȳ)² + Σ(Yi - Ŷi)²

or: Total Variation = Explained Variation + Residual Variation

In logistic regression, measures of the deviance replace these sums of squares as the building blocks of measures of fit and statistical tests.

Deviance
Deviance measures are built from maximum likelihoods calculated using different models. For example, suppose we fit a model with no slope coefficient (b) but an intercept coefficient (a). We can call this model zero, because it has no predictors. We then fit a second model, called model one, which has both a slope and an intercept. We can form the ratio of the maximum likelihoods of these models:

Likelihood ratio = (maximum likelihood of model zero) / (maximum likelihood of model one)

Expressed in logs this becomes:

Log likelihood ratio = ln(maximum likelihood of model zero) - ln(maximum likelihood of model one)

Note that taking the log of the (likelihood ratio)² is the same as 2 × (log likelihood ratio). The deviance is defined as -2 × (log likelihood ratio).
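In code the deviance is just a difference of log likelihoods on the -2 scale. The two log-likelihood values below are placeholders rather than the ones from the SPSS run:

ll_model_zero = -20.19   # hypothetical log likelihood of the intercept-only model
ll_model_one = -12.66    # hypothetical log likelihood of the model with a predictor

log_likelihood_ratio = ll_model_zero - ll_model_one   # ln(L0) - ln(L1) = ln(L0 / L1)
deviance = -2 * log_likelihood_ratio                  # -2 * ln(L0 / L1), always >= 0
print(deviance)   # the bigger this is, the bigger the improvement over model zero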

What does this mean? The maximum likelihood of model zero is analogous to the total variation in OLS, and the maximum likelihood of model one is analogous to the explained variation. If the maximum likelihoods of models zero and one were the same, the likelihood ratio would be 1 and the log likelihood ratio would be 0. This would mean that model one was no better than model zero in accounting for turnout, so the deviance captures how much we improve things by taking into account interest in the election. The bigger the deviance, the bigger the improvement.

SPSS Output from the Logistic Regression of Turnout

The Meaning of the Omnibus Test
SPSS starts by fitting what it calls Block 0, which is the model containing the constant term and no predictor variables. It then proceeds to Block 1, which fits the model and gives us another estimate of the likelihood function. These two can then be compared, and the table shows a chi-square test of the improvement in the model achieved by adding interest in the election to the equation. This chi-square statistic is significant at the 0.001 level. In a multiple logistic regression this table tells us how much all of the predictor variables together improve things compared with model zero. Here we have significantly improved on the baseline model by adding the variable interest to the equation.

The Model Summary Table
The -2 log likelihood statistic for our two-variable model appears in the table, but it is only really meaningful for comparing different models. The Cox and Snell and Nagelkerke R-square statistics are different ways of approximating the percentage of variance explained (the R square of multiple regression). The Cox and Snell statistic is problematic because it has a maximum value of about 0.75. The Nagelkerke R square corrects this and has a maximum value of 1.0, so it is often the preferred measure.
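For reference, both statistics can be computed directly from the two log likelihoods and the sample size using the standard formulas; the log-likelihood values below are again placeholders, not taken from the SPSS output:

import math

n = 30
ll0 = -20.19   # hypothetical log likelihood of the intercept-only model
ll1 = -12.66   # hypothetical log likelihood of the fitted model

cox_snell = 1 - math.exp((2 / n) * (ll0 - ll1))   # 1 - (L0 / L1)**(2/n)
max_cox_snell = 1 - math.exp((2 / n) * ll0)       # the upper bound Cox and Snell can reach
nagelkerke = cox_snell / max_cox_snell            # rescaled so the maximum is 1.0

print(round(cox_snell, 3), round(nagelkerke, 3))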

The Classification Table
The classification table tells us the extent to which the model correctly predicts the actual turnout, so it is another goodness-of-fit measure. The main diagonal, from top left to bottom right, contains the cases predicted correctly (23), whereas the off-diagonal cells contain the cases predicted incorrectly (7). So overall 76.7 per cent of the cases are predicted correctly.
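The percentage correct is just the main diagonal divided by the total number of cases. In the sketch below only the totals (23 correct, 7 incorrect, 30 cases) come from the slide; the individual cell counts are invented for illustration:

# Rows = observed (0 = did not vote, 1 = voted), columns = predicted.
table = [[5, 4],     # hypothetical split of the non-voters
         [3, 18]]    # hypothetical split of the voters

correct = table[0][0] + table[1][1]        # the main diagonal
total = sum(sum(row) for row in table)
print(round(100 * correct / total, 1))     # 76.7 per cent predicted correctly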

Interpreting the Coefficients
The column on the left gives the coefficients in the logistic regression model. It means that a unit change in the level of interest in the election increases the log odds of voting by 1.742. The standard error appears in the next column (0.697) and the Wald statistic in the third column. The latter is the ratio of the coefficient to its standard error, squared (6.251), and as we can see it is significant at the 0.012 level. Finally, Exp(B) is the anti-log of the B column, so that e^1.742 = 5.708. This is the effect on the odds of voting of an increase in the level of interest in the election by one unit. Since odds ratios are rather easier to understand than log odds ratios, the effects are often reported using these coefficients.
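Both the Wald statistic and Exp(B) can be reproduced from the reported coefficient and standard error (small rounding differences are expected):

import math

b = 1.742    # coefficient for interest in the election
se = 0.697   # its standard error

wald = (b / se) ** 2    # ≈ 6.25; SPSS reports 6.251
exp_b = math.exp(b)     # ≈ 5.71; the multiplicative effect on the odds of voting

print(round(wald, 2), round(exp_b, 3))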

Making Sense of the Coefficients

ln[ p(y) / (1 - p(y)) ] = -2.582 + 1.742Xi

so that

p(y) = exp(-2.582 + 1.742Xi) / (1 + exp(-2.582 + 1.742Xi))

Translating into Probabilities
Suppose a person scores 4 on the interest in the election variable (they are very interested). Then according to the model the probability that they will vote is:

P(y) = exp(-2.582 + 1.742(4)) / (1 + exp(-2.582 + 1.742(4)))
     = exp(4.386) / (1 + exp(4.386)) = 0.99

If they are not at all interested and score 1, then:

P(y) = exp(-0.84) / (1 + exp(-0.84)) = 0.30

Consequently a change from being not at all interested to being very interested increases the probability of voting by 0.99 - 0.30 = 0.69.
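These predicted probabilities can be reproduced for every level of interest with a few lines of Python, plugging in the fitted coefficients:

import math

a, b = -2.582, 1.742   # fitted intercept and slope from the output above

for interest in (1, 2, 3, 4):
    z = a + b * interest
    p = math.exp(z) / (1 + math.exp(z))
    print(interest, round(p, 2))   # 1 -> 0.30, 2 -> 0.71, 3 -> 0.93, 4 -> 0.99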

Probabilities
Level of Interest    Probability of Voting
1                    0.30
2                    0.71
3                    0.93
4                    0.99

Conclusions
- Logistic regression allows us to model relationships when the dependent variable is a dummy variable.
- It can be extended to multinomial logistic regression, in which the dependent variable has several categories – this produces several sets of coefficients.
- The results are more reliable than if we had just used ordinary least squares regression.