Classification Methods


Classification Methods (STT592-002: Intro. to Statistical Learning, Chapter 04, part 01) Disclaimer: This presentation is modified from IOM 530: Intro. to Statistical Learning. "Some of the figures in this presentation are taken from 'An Introduction to Statistical Learning, with applications in R' (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani."

Outline: Response Variable is Qualitative
Recall Chapter 3: the response variable Y is quantitative/numerical, and we saw how to handle a categorical predictor X. In Chapter 4, the response variable Y is qualitative/categorical (e.g., gender); this is classification.

Overview: Classification Methods
Chap 4: Logistic regression; LDA/QDA; KNN
Chap 7: Generalized Additive Models
Chap 8: Trees, Random Forests, Boosting
Chap 9: Support Vector Machines (SVM)

Outline: Response Variable is Qualitative
Cases: Orange Juice Brand Preference; Credit Card Default Data
Why Not Linear Regression?
Simple Logistic Regression: the logistic function, interpreting the coefficients, making predictions, adding qualitative predictors
Multiple Logistic Regression

Case 1: OJ data (introduced briefly)
library(MASS); library(ISLR)
data(OJ); head(OJ)
?OJ  ## to find more details about the OJ data

Case 1: Brand Preference for Orange Juice
We would like to predict which orange juice customers prefer to buy: Citrus Hill (CH) or Minute Maid (MM).
The Y (Purchase) variable is categorical: 0 or 1.
The X (LoyalCH) variable is a numerical value (between 0 and 1) that measures how loyal a customer is to Citrus Hill (CH) orange juice.
Can we use linear regression when Y is categorical?

Why not Linear Regression?
When Y only takes on values of 0 and 1, why is standard linear regression inappropriate?
First we need to code Purchase as 0/1; in practice, there is no particular reason the coding needs to be this way.
[Figure: scatter plot of Purchase (0/1) vs. LoyalCH with a fitted regression line.]
How do we interpret fitted values greater than 1? How do we interpret fitted values of Y below 0?

Problems
The regression line β0 + β1X can take on any value between negative and positive infinity.
In the orange juice classification problem, Y can only take on two possible values: 0 or 1.
Therefore the regression line almost always predicts the wrong value for Y in classification problems.

Solution: Use the Logistic Function
Instead of trying to predict Y directly, let's predict P(Y = 1), i.e., the probability that a customer buys Citrus Hill (CH) juice.
Thus we model P(Y = 1) using a function that gives outputs between 0 and 1: we can use logistic regression! (In the accompanying figure, X is centered.)

Logistic Regression Model
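The model formula on this slide was an image; a reconstruction from ISLR Chapter 4, using the definitions above:

$$p(X) = P(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

Equivalently, in log-odds (logit) form:

$$\log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X$$

The left-hand quantity is the log odds; it is linear in X, while p(X) itself is not.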

Logistic Regression
Logistic regression is very similar to linear regression: we compute b0 and b1 to estimate β0 and β1.
We face similar questions as in linear regression, e.g., is β1 equal to 0? How sure are we about our estimates of β0 and β1?
[Figure: fitted logistic curve of P(Purchase) vs. LoyalCH, with MM at the bottom and CH at the top.] If LoyalCH is about 0.6, then Pr(CH) ≈ 0.7.
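A minimal sketch of fitting this curve to the OJ data in R. The indicator I(Purchase == "CH") is our construction (Purchase is a factor), and the value ≈0.7 is read off the slide's figure; the exact fitted value depends on the data:

library(ISLR)
data(OJ)
# model P(Purchase = CH) as a function of loyalty to CH
glm.oj <- glm(I(Purchase == "CH") ~ LoyalCH, data = OJ, family = binomial)
# predicted probability of choosing CH at LoyalCH = 0.6
predict(glm.oj, newdata = data.frame(LoyalCH = 0.6), type = "response")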

Case 2: Credit Card Default Data
Goal: predict which customers are likely to default.
Possible X variables: annual income; monthly credit card balance.
The Y variable (Default) is categorical: Yes or No.
How do we check the relationship between Y and X?

The Default Dataset
library(MASS); library(ISLR)
data(Default)
head(Default)

Case 2: Default Dataset (Fig 4.1)
[Figure 4.1: the Default data; income vs. balance colored by default status, replicated by the code on the next slides.]

Default Dataset: replicate Fig 4.1
library(MASS); library(ISLR)
data(Default); head(Default); attach(Default)
# note: the axis limits come from the first plot() call (defaulters only)
plot(balance[default=="Yes"], income[default=="Yes"], pch="+", col="darkorange")
points(balance[default=="No"], income[default=="No"], pch=21, col="lightblue")

Default Dataset: replicate Fig 4.1 (boxplots)
library(MASS); library(ISLR)
data(Default); head(Default); attach(Default)
par(mfrow=c(1,2))
# default is a factor, so plot(factor, numeric) draws side-by-side boxplots
plot(default, balance, col=c("lightblue", "darkorange"), xlab="Default", ylab="Balance")
plot(default, income, col=c("lightblue", "darkorange"), xlab="Default", ylab="Income")

Default Dataset: Be careful if you switch the order…
library(MASS); library(ISLR)
data(Default); head(Default); attach(Default)
par(mfrow=c(1,2))
# switching the argument order changes the plot type and the axis meaning,
# so the axis labels must be updated accordingly
plot(balance, default, col=c("lightblue", "darkorange"), xlab="Balance", ylab="Default")
plot(income, default, col=c("lightblue", "darkorange"), xlab="Income", ylab="Default")

Review: Why not Linear Regression?
If we fit a linear regression to the Default data, then for very low balances we predict a negative probability, and for very high balances we predict a probability above 1!
When Balance < 500, the predicted Pr(default) is negative!
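A minimal sketch of seeing this in R with a linear probability model (the balance values 100 and 10000 are illustrative choices, not from the slide, and 10000 lies beyond the observed data):

library(ISLR)
data(Default)
# linear regression on the 0/1 default indicator
lm.fit <- lm(I(default == "Yes") ~ balance, data = Default)
# the fitted "probability" at a low balance is negative;
# extrapolating to a very large balance gives a value above 1
predict(lm.fit, newdata = data.frame(balance = c(100, 10000)))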

Logistic Function on Default Data
Now the predicted probability of default is close to, but never below, zero for low balances, and close to, but never above, one for high balances.

Case 2: Default Dataset
library(MASS); library(ISLR)
attach(Default); head(Default)
# Logistic Regression
glm.fit=glm(default~balance,family=binomial)
summary(glm.fit)

The Default Dataset: Extra note
library(MASS); library(ISLR); attach(Default); head(Default)
glm.fit=glm(default~balance,family=binomial)
summary(glm.fit)
# likelihood-ratio test of the fitted model against the null (intercept-only) model:
with(glm.fit, null.deviance - deviance)   # test statistic
with(glm.fit, df.null - df.residual)      # degrees of freedom
with(glm.fit, pchisq(null.deviance - deviance, df.null - df.residual, lower.tail = FALSE))  # p-value
logLik(glm.fit)
Suppose we have a statistical model of some data. Let L be the maximum value of the likelihood function for the model, and let k be the number of estimated parameters. Then AIC = 2k − 2 ln(L). Given a set of candidate models for the data, the preferred model is the one with the minimum AIC value.

Interpreting β1
We see that β̂1 = 0.0055; this indicates that an increase in balance is associated with an increase in the probability of default. To be precise, a one-unit increase in balance is associated with an increase in the log odds of default by 0.0055 units.
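To make the log-odds statement concrete: since

$$\log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X,$$

a one-dollar increase in balance multiplies the odds of default by $e^{\hat\beta_1} = e^{0.0055} \approx 1.0055$.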

Interpreting β1
Interpreting what β1 means is not easy with logistic regression, simply because we are predicting P(Y) and not Y itself.
If β1 = 0, there is no relationship between Pr(Y|X) and X. In logistic regression, increasing X by one unit changes the log odds by β1.
If β1 > 0, then as X gets larger, the probability that Y = 1 increases.
If β1 < 0, then as X gets larger, the probability that Y = 1 decreases.
But how much larger or smaller the probability gets depends on where we are on the curve, since the effect on the probability scale is not constant.

Are the coefficients significant?
We perform a hypothesis test to check whether β0 and β1 are significantly different from zero. We use a z-test instead of a t-test, but that doesn't change the way we interpret the p-value. Here the p-value for balance is very small and b1 is positive, so we are confident that as balance increases, the probability of default increases as well.
Why a z-test statistic:
(1) https://stats.stackexchange.com/questions/60074/wald-test-for-logistic-regression
(2) https://www.quora.com/In-Stata-and-R-output-why-is-z-test-other-than-t-test-used-in-logistic-regression-to-assess-the-significance-of-the-individual-variables
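For reference, the z-statistic that summary(glm.fit) reports for each coefficient is the Wald statistic

$$z = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)},$$

which is compared against a standard normal distribution to obtain the p-value.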

Making Predictions
Suppose an individual has a balance of X = $1000. What is their probability of default?
The predicted probability of default for an individual with a balance of $1000 is less than 1%. For a balance of $2000, the probability is much higher: 0.586 (58.6%).
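A minimal sketch of obtaining these predictions in R from the fit above (the stated values come from the fit on the Default data):

glm.fit <- glm(default ~ balance, family = binomial, data = Default)
# type = "response" returns probabilities rather than log odds
predict(glm.fit, newdata = data.frame(balance = c(1000, 2000)), type = "response")
# approximately 0.006 and 0.586, matching the slide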

Logistic Regression for the Default Dataset
library(MASS); library(ISLR)
attach(Default); head(Default)
# Logistic regression with a qualitative predictor
glm.fit=glm(default~student,family=binomial)
summary(glm.fit)$coef
The estimated intercept is typically not of interest; its main purpose is to adjust the average fitted probability to the proportion of ones in the data.

Qualitative Predictors in Logistic Regression
We can predict whether an individual defaults by checking whether she is a student or not. Thus we use a qualitative variable Student, coded as Student = 1, Non-student = 0.
b1 is positive: this indicates that students tend to have higher default probabilities than non-students.
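With a single dummy predictor, the two fitted group probabilities follow directly from the model (ISLR reports roughly 0.0431 for students vs. 0.0292 for non-students):

$$\Pr(\text{default} \mid \text{student}) = \frac{e^{\hat\beta_0 + \hat\beta_1}}{1 + e^{\hat\beta_0 + \hat\beta_1}}, \qquad \Pr(\text{default} \mid \text{non-student}) = \frac{e^{\hat\beta_0}}{1 + e^{\hat\beta_0}}.$$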

Multiple Logistic Regression
We can fit a multiple logistic regression just like a regular multiple regression.
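The model formula on this slide was an image; a reconstruction from ISLR, with p predictors:

$$\log\!\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p, \qquad p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}.$$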

Example: Default Dataset for multiple logistic regression
# Logistic Regression on student, balance and income
glm.fit=glm(default~student+balance+income,family=binomial)
summary(glm.fit)
coef(glm.fit)
summary(glm.fit)$coef

Multiple Logistic Regression: Default Data
Predict Default using:
Balance (quantitative)
Income (quantitative)
Student (qualitative)

Predictions
A student with a credit card balance of $1,500 and an income of $40,000 has an estimated probability of default of about 0.058; ISLR reports that a non-student with the same balance and income has a higher estimated probability, about 0.105.
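A minimal sketch of computing this prediction in R from the multiple logistic fit above (the stated values are ISLR's):

glm.fit <- glm(default ~ student + balance + income, family = binomial, data = Default)
# predicted default probability for a student vs. a non-student
# with the same balance ($1,500) and income ($40,000)
predict(glm.fit,
        newdata = data.frame(student = c("Yes", "No"), balance = 1500, income = 40000),
        type = "response")
# approximately 0.058 (student) and 0.105 (non-student)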

An Apparent Contradiction!
The coefficient of student is positive in the simple logistic regression but negative in the multiple logistic regression.

Students (Orange) vs. Non-students (Blue)
The left panel provides a graphical illustration of this apparent paradox. The orange and blue solid lines show average default rates for students and non-students, respectively, as a function of credit card balance.
The negative coefficient for student in the multiple logistic regression indicates that, for fixed values of balance and income, a student is less likely to default than a non-student.

Summary: To whom should credit be offered?
A student is riskier than a non-student if no information about credit card balance is available.
However, a student is less risky than a non-student with the same credit card balance!

Logistic Regression with More Than 2 Response Classes
The two-class logistic regression model can be extended to the multiple-class case in several ways. In practice, these extensions tend not to be used all that often; instead, discriminant analysis (next section) is popular for multiple-class classification.