Chapter 8 – Logistic Regression

Chapter 8 – Logistic Regression
Data Mining for Business Intelligence
Shmueli, Patel & Bruce
© Galit Shmueli and Peter Bruce 2008

Logistic Regression
- Extends the idea of linear regression to the situation where the outcome variable is categorical
- Widely used, particularly where a structured model is useful to explain (= profiling) or to predict
- We focus on binary classification, i.e., Y = 0 or Y = 1

Why Not Linear Regression?
- Technically, you can run linear regression with a 0/1 response variable and obtain output
- But the resulting model will not make sense. For instance:
  - Predictions will mostly not be 0 or 1, and can even fall outside the [0, 1] range (see the sketch below)
  - Coefficient interpretation will not make sense for a binary outcome
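
To see the problem concretely, here is a minimal Python sketch (synthetic data; all names are illustrative, not from the textbook's example) of fitting ordinary least squares to a 0/1 response:

```python
import numpy as np

rng = np.random.default_rng(1)
income = rng.uniform(10, 200, 200)                               # synthetic predictor
accept = (income + rng.normal(0, 30, 200) > 110).astype(float)   # synthetic 0/1 outcome

# Ordinary least-squares fit of the 0/1 response on income
slope, intercept = np.polyfit(income, accept, 1)
preds = intercept + slope * income
print(preds.min(), preds.max())   # typically < 0 and > 1: not valid probabilities
```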

The Logit
- Goal: find a function of the predictor variables that relates them to a 0/1 outcome
- Instead of Y as the outcome variable (as in linear regression), we use a function of Y called the logit
- The logit can be modeled as a linear function of the predictors
- The logit can be mapped back to a probability, which, in turn, can be mapped to a class

Step 1: Logistic Response Function
- p = probability of belonging to class 1
- We need to relate p to the predictors with a function that guarantees 0 ≤ p ≤ 1
- The standard linear function p = b_0 + b_1 x_1 + ... + b_q x_q (q = number of predictors) does not: it can produce values below 0 or above 1

The Fix: Use the Logistic Response Function

p = 1 / (1 + e^{-(b_0 + b_1 x_1 + ... + b_q x_q)})   (equation 8.2 in the textbook)
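
A minimal Python sketch of this function (the function name is ours, not the textbook's): whatever value the linear predictor takes, the result stays strictly between 0 and 1.

```python
import numpy as np

def logistic(linear_pred):
    """Map a linear predictor value b0 + b1*x1 + ... + bq*xq to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-linear_pred))

print(logistic(-10), logistic(0), logistic(10))   # ~0.000045, 0.5, ~0.999955
```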

Step 2: The Odds
- The odds of an event with probability p are defined as: Odds = p / (1 - p)   (eq. 8.3)
- Conversely, given the odds of an event, the probability of the event can be computed by: p = Odds / (1 + Odds)   (eq. 8.4)

We can also relate the odds directly to the predictors: Odds = e^{b_0 + b_1 x_1 + ... + b_q x_q}   (eq. 8.5)
To get this result, substitute equation 8.2 into equation 8.4 and solve for Odds.

Step 3: Take the Log of Both Sides
This gives us the logit: log(Odds) = b_0 + b_1 x_1 + ... + b_q x_q   (eq. 8.6)

Logit, cont.
- So the logit is a linear function of the predictors x_1, x_2, ..., x_q
- It takes values from -infinity to +infinity
- The sketch below reviews the relationship between logit, odds, and probability
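
The round trip among probability, odds, and logit can be written out directly. A small Python sketch (function names are illustrative):

```python
import numpy as np

def odds_from_p(p):       return p / (1 - p)             # eq. 8.3
def p_from_odds(odds):    return odds / (1 + odds)       # eq. 8.4
def logit_from_p(p):      return np.log(odds_from_p(p))  # eq. 8.6
def p_from_logit(logit):  return 1 / (1 + np.exp(-logit))

p = 0.8
print(odds_from_p(p))        # 4.0  (odds of 4 to 1)
print(logit_from_p(p))       # ~1.386
print(p_from_logit(1.386))   # ~0.8, back where we started
```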

Example

Personal Loan Offer
- Outcome variable: accept bank loan (0/1)
- Predictors: demographic information, and information about the customer's relationship with the bank

Data Preprocessing
- Partition: 60% training, 40% validation
- Create 0/1 dummy variables for categorical predictors

Single Predictor Model
- Modeling loan acceptance on Income (x)
- Fitted coefficients (more on estimation later): b_0 = -6.3525, b_1 = 0.0392
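
Plugging these fitted values into the logistic response function gives the estimated acceptance probability at any income level; a small Python sketch, assuming income is measured as in the textbook's dataset:

```python
import numpy as np

b0, b1 = -6.3525, 0.0392
income = 100
p = 1 / (1 + np.exp(-(b0 + b1 * income)))
print(round(p, 3))   # ~0.081: low, but higher than at smaller incomes
```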

Seeing the Relationship

Last Step: Classify
- The model produces an estimated probability of being a "1"
- Convert to a classification by establishing a cutoff level
- If the estimated probability exceeds the cutoff, classify as "1"

Ways to Determine the Cutoff
- 0.50 is a popular initial choice
- Additional considerations (see Chapter 4):
  - Maximize classification accuracy (a simple search is sketched below)
  - Maximize sensitivity (subject to a minimum level of specificity)
  - Minimize false positives (subject to a maximum false negative rate)
  - Minimize expected cost of misclassification (requires specifying costs)
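
As a sketch of the first option, scan candidate cutoffs on the validation data and keep the most accurate one (the probabilities and actual labels below are placeholders, not output from the loan model):

```python
import numpy as np

probs   = np.array([0.9, 0.8, 0.6, 0.4, 0.3, 0.1])   # placeholder estimated P(Y = 1)
actuals = np.array([1, 1, 0, 1, 0, 0])               # placeholder true classes

# Try cutoffs from 0.05 to 0.95 and keep the one with the highest accuracy
best_acc, best_cutoff = max(
    (np.mean((probs > c).astype(int) == actuals), c)
    for c in np.arange(0.05, 1.0, 0.05)
)
print(best_acc, best_cutoff)
```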

Example, cont.
- Estimates of the b's are derived through an iterative process called maximum likelihood estimation
- Let's now include all 12 predictors in the model
- XLMiner's output gives coefficients for the logit, as well as odds for the individual terms
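
Outside XLMiner, the same fit can be sketched in Python with statsmodels, whose fit() runs the iterative maximum likelihood estimation; X and y are assumed to hold the 12 predictors and the 0/1 acceptance outcome:

```python
import numpy as np
import statsmodels.api as sm

model = sm.Logit(y, sm.add_constant(X)).fit()   # iterative MLE
print(model.params)           # coefficients for the logit
print(np.exp(model.params))   # odds multipliers for the individual terms
```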

Estimated Equation for Logit (Equation 8.10)

Equation for Odds (Equation 8.11)

Converting to Probability

Interpreting Odds and Probability
- For predictive classification, we typically use the probability together with a cutoff value
- For explanatory purposes, the odds have a useful interpretation: if we increase x_1 by one unit, holding x_2, ..., x_q constant, then e^{b_1} is the factor by which the odds of belonging to class 1 are multiplied
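
For example, with the single-predictor fit above (b_1 = 0.0392 for Income), a one-unit income increase multiplies the odds of acceptance by e^{0.0392}:

```python
import numpy as np

b1 = 0.0392
print(np.exp(b1))        # ~1.04: each extra unit of income raises the odds by ~4%
print(np.exp(10 * b1))   # ~1.48: a 10-unit increase raises the odds by ~48%
```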

Loan Example: Evaluating Classification Performance
- Performance measures: confusion matrix and % of misclassifications
- More useful in this example: lift

Multicollinearity
- Problem: as in linear regression, if one predictor is a linear combination of other predictor(s), model estimation will fail
- Note that in such a case we have at least one redundant predictor
- Solution: remove extreme redundancies, either by dropping predictors via variable selection (see below) or by data reduction methods such as PCA
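
A quick diagnostic (a workflow suggestion of ours, not a step from the textbook): if the predictor matrix is rank-deficient, at least one predictor is an exact linear combination of the others.

```python
import numpy as np

# X is assumed to be the predictor DataFrame; a rank smaller than the number
# of columns means at least one column is redundant.
print(np.linalg.matrix_rank(X.values), X.shape[1])
```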

Variable Selection
- This is the same issue as in linear regression
- The number of correlated predictors can grow when we create derived variables such as interaction terms (e.g., Income × Family) to capture more complex relationships
- Problem: overly complex models run the risk of overfitting
- Solution: reduce the number of variables via automated selection of variable subsets, as with linear regression (a code sketch follows)
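
The book works with automated subset selection; a common code-level stand-in (explicitly not the textbook's method) is L1-regularized logistic regression, which drives the coefficients of weak predictors to exactly zero. X and y are assumed as before:

```python
from sklearn.linear_model import LogisticRegression

# Smaller C means a stronger penalty and fewer surviving predictors
lasso_logit = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
lasso_logit.fit(X, y)
kept = [name for name, b in zip(X.columns, lasso_logit.coef_[0]) if b != 0]
print(kept)   # predictors that survive the penalty
```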

Goodness of Fit
- How well does the model fit the data?
- Popular measures: deviance (= the "std. dev. estimate" in XLMiner) and multiple R²
- Goodness-of-fit measures are used in explanatory analysis (profiling), not much in classification, where we look at classification accuracy on the validation data
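
With the statsmodels fit sketched earlier, deviance is simply -2 times the maximized log-likelihood (smaller is better):

```python
# `model` is the fitted statsmodels Logit result from the earlier sketch
deviance = -2 * model.llf
print(deviance)
```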

P-values for Predictors
- Test the null hypothesis that a coefficient equals 0
- Useful for reviewing whether to include a variable in the model
- Key in profiling tasks, but less important in predictive classification

Complete Example: Predicting Delayed Flights (DC to NY)

Variables
- Outcome: delayed or not delayed
- Predictors:
  - Day of week
  - Departure time
  - Origin (DCA, IAD, BWI)
  - Destination (LGA, JFK, EWR)
  - Carrier
  - Weather (1 = bad weather)

Delayed Flights: Summary (table not reproduced)
- Bold numbers: delayed flights on Mon–Wed
- Regular numbers: Thu–Sun

Data Preprocessing
- Create binary dummies for the categorical variables
- Partition 60%/40% into training/validation
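
A Python sketch of both steps (the file name and column names are assumptions about the flight data's layout, not verified against the textbook's file):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("FlightDelays.csv")   # hypothetical file name

# Binary dummies for the categorical predictors
dummies = pd.get_dummies(
    df[["DAY_WEEK", "ORIGIN", "DEST", "CARRIER"]].astype(str), drop_first=True
)
X = pd.concat([dummies, df[["Weather"]]], axis=1)
y = (df["Flight Status"] == "delayed").astype(int)   # hypothetical label column

# 60%/40% training/validation partition
train_X, valid_X, train_y, valid_y = train_test_split(
    X, y, train_size=0.6, random_state=1
)
```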

The Fitted Model (not all 28 variables shown)

Model Output (Validation Data)

Lift Chart

After Variable Selection (Model with 7 Predictors)

7-Predictor Model
Note that Weather is unknown at the time of prediction: using it requires a weather forecast, or the predictor must be dropped

Summary
- Logistic regression is similar to linear regression, except that it is used with a categorical response
- It can be used for explanatory tasks (= profiling) or predictive tasks (= classification)
- The predictors are related to the response Y via a nonlinear function called the logit
- As in linear regression, reducing the number of predictors can be done via variable selection
- Logistic regression can be generalized to more than two classes (not in XLMiner)