CLASSIFICATION: LOGISTIC REGRESSION
Instructor: Dr. Chun Yu
School of Statistics
Jiangxi University of Finance and Economics
Fall 2015
Dependent Variable Y
In many regression applications the dependent variable y is categorical. We can describe it in binary fashion, so that any outcome is either a success or a failure (arbitrarily defined). For example, when tossing a coin we get either heads (success) or tails (failure):
  y = 1, if the outcome is a success
  y = 0, if the outcome is a failure
A single outcome follows a Bernoulli distribution; if the experiment is repeated n times, the total number of successes follows a binomial distribution.
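A quick illustration in R (not on the slide): each toss is a 0/1 Bernoulli draw, and the count of successes over n tosses is binomial.

> set.seed(1)                                 # for reproducibility
> y <- rbinom(n = 10, size = 1, prob = 0.5)   # ten 0/1 outcomes (one coin toss each)
> sum(y)                                      # successes in n trials: Binomial(10, 0.5)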
Logistic Regression Model
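With P = Pr(y = 1 | x1, ..., xk), the standard form of the model (in this lecture k = 3: HO, MS, TI) is

$$
P = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k}},
\qquad
\log\frac{P}{1 - P} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k .
$$

The second expression shows that the log-odds of success are linear in the predictors.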
Estimating P
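The coefficients are estimated by maximum likelihood: each observation contributes a Bernoulli factor, giving

$$
L(\beta) = \prod_{i=1}^{n} P_i^{\,y_i}\,(1 - P_i)^{\,1 - y_i},
\qquad
P_i = \frac{e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}}}{1 + e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}}},
$$

which is maximized numerically (R's glm() uses iteratively reweighted least squares). Plugging the estimates $\hat{\beta}$ into the model formula gives the estimated probability $\hat{P}$ for any predictor values.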
Example

  Home Owner   Marital Status   Taxable Income   Defaulted Borrower
  Yes          Single           125k             No
  No           Married          100k             No
  No           Single           70k              No
  Yes          Married          120k             No
  No           Divorced         95k              Yes
  No           Married          60k              No
  Yes          Divorced         220k             No
  No           Single           85k              Yes
  No           Married          75k              No
  No           Single           90k              Yes
Inputted data
  HO = 1 if "Yes";                HO = 0 if "No"
  MS = 1 if "Single or Divorced"; MS = 0 if "Married"
  Y  = 1 if "No";                 Y  = 0 if "Yes"

  HO   MS   TI    Y
  1    1    125   1
  0    0    100   1
  0    1    70    1
  1    0    120   1
  0    1    95    0
  0    0    60    1
  1    1    220   1
  0    1    85    0
  0    0    75    1
  0    1    90    0
R Results for Logistic Regression
> HO = c(1,0,0,1,0,0,1,0,0,0)
> MS = c(1,0,1,0,1,0,1,1,0,1)
> TI = c(125,100,70,120,95,60,220,85,75,90)
> y = c(1,1,1,1,0,1,1,0,1,0)
> mylogit <- glm(y ~ HO + MS + TI, family = "binomial")
> mylogit$coef
(Intercept)          HO          MS          TI
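Base R provides standard accessors for a fitted glm object; a short sketch using mylogit from above (not shown on the slide):

> summary(mylogit)                     # coefficients, standard errors, deviance
> fitted(mylogit)                      # fitted P(y = 1) for the ten training rows
> predict(mylogit, type = "response")  # the same fitted probabilities via predict()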
Prediction
  New borrower: HO = 0, MS = 1, TI = 100. P = ?

  Home Owner   Marital Status   Taxable Income   Defaulted Borrower
  No           Single           100k             ?

  The predicted P is on the order of e-16, i.e. essentially 0. Classification? Since P = P(Y = 1) and Y = 1 codes "Defaulted = No", the borrower is classified as Y = 0: Defaulted Borrower = "Yes".
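This prediction can be reproduced with predict(); a minimal sketch, where newborrower is a hypothetical object name:

> newborrower <- data.frame(HO = 0, MS = 1, TI = 100)          # hypothetical name
> predict(mylogit, newdata = newborrower, type = "response")   # predicted P(Y = 1)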
Classification Error Rate
> ## Prediction on training data and test data
> ## correct classification rate
> ## (abs() so that only predictions matching y almost exactly count as correct)
> sum(abs(newdata3$PredictedProb - trainData$y) < 0.0001) / nrow(trainData)
[1] 1
> sum(abs(newdata3$PredictedProb - testData$y) < 0.0001) / nrow(testData)
[1] 1
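The slides use trainData and testData without showing how they were built; a minimal sketch, assuming a random 70/30 split of the ten coded rows and a 0.5 probability threshold (the seed and split ratio are assumptions):

> loan <- data.frame(HO, MS, TI, y)
> set.seed(123)                                # assumed seed
> idx <- sample(nrow(loan), 0.7 * nrow(loan))  # assumed 70/30 split
> trainData <- loan[idx, ]
> testData  <- loan[-idx, ]
> fit <- glm(y ~ HO + MS + TI, data = trainData, family = "binomial")
> predClass <- as.numeric(predict(fit, newdata = testData, type = "response") > 0.5)
> mean(predClass == testData$y)                # correct classification rate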
Decision Tree Classification
> library(party)   # ctree() lives in the party package
> myFormula <- y ~ HO + MS + TI
> loan_ctree <- ctree(myFormula, data = trainData)
> # check the prediction
> testPred <- predict(loan_ctree, newdata = testData)
> table(testPred, testData$y)
testPred 0 1
The off-diagonal cells of this confusion matrix count the misclassifications.
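The misclassification rate is the off-diagonal share of the confusion matrix; a sketch using the objects above:

> plot(loan_ctree)                  # visualize the fitted tree
> tab <- table(testPred, testData$y)
> 1 - sum(diag(tab)) / sum(tab)     # misclassification rate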
Thank You!