1
Reducing Loan Risk Using Data Analytics
Thank you for taking the time to view our video on Reducing Loan Risk using Data Analytics. Our group included Kenji, Venky and Tim. Project Group 5: Kenji Mori, Venketesh Subramony, Timothy Larson
2
Business Understanding
Data Source: LendingClub Loan Data, 2007 through 2017 Q3
11 Public Sets of Data, Each Slightly Different
Mining Problem Definition: Predict Charge Offs, Maximize Profit
Let's examine the Business Understanding aspect of our project. Our group chose to look at lending data from the Lending Club to see if we could solve the business problem of maximizing profit, improving over the existing Lending Club application process. After a loan is approved <click>, Lending Club makes a profit provided that the loan applicant repays both the loan's principal and interest. Less money is made when the borrower defaults before repaying the entire loan <click>. Our group classified these loans as charged off. Loan data from 2007 through the third quarter of 2017 was available from the website, broken into 11 different sets. We found that each set of data was a little different, which complicated analysis. We wanted to see if we could find predictive models that might improve Lending Club's profit, and also see if we could find interesting relationships in our data sets. If we were to continue data mining iterations, these relationships would be the basis for increasingly accurate predictive models. If we could make the loan process less risky by predicting more of the loans that might be charged off, we could maximize the profit of Lending Club.
3
Data Understanding
Loans had already been evaluated by LendingClub
Original Data Set (target completed loan process)
Explored 151 variables
In 2015, 14% of Loans were Charged Off - $245 Million Loss
Maximize Profit
Understanding our Data: There was a great deal to figure out before we began to create a model. <click> First, we wanted to note that we were looking at loans that had already been through Lending Club's risk mitigation model, so our results depended on their model pre-screening the loans and assigning interest rates based on the loan applicant's submitted application. <click> It made sense to us to only look at loans that were made before 2015, because the later sets contained a large number of loans with no final status; in those sets we would only have been able to evaluate the short term loans. Doing so would have biased our model by not being representative of the entire set of loans for each year. <click> There were 151 variables in the data sets we were looking at in this time period. In looking at the 2015 data we found <click> that 14% of the loans had been charged off, worth $245 million, so if we could find ways of avoiding these losses, <click> we hoped to maximize profit.
4
Data Preparation
Aggregated Data Sets into one JMP file
Took Subset -> Reduction for JMP 54,000
Target Variable: 0 or 1 for charge offs
Reduced Contributing Variables from 151 to 24 based on analysis
Partitioned into Training and Validation sets
Data Preparation: For this step, we spent a considerable amount of time examining missing values and outliers, and combining multiple categorical values into only the relevant values. To make analysis easier with our analytical tool, we combined all data sets into one JMP file. Manipulating this data set was sluggish due to the large number of records, so we chose a random subset of approximately 54 thousand records. We collapsed 9 different loan statuses into just 2 values, 0 or 1, for our target variable of charge offs. This involved removing records that did not match these statuses. We examined histograms and correlation charts, did bivariate analysis, and finally looked at the logworth values to reduce our independent variables to 24 from the original 151. After setting up a 60% training and 40% validation split, we were left with the number of records shown in the slide <click><pause>
             Total    Fully Paid  Charged Off
Training     32,291   26,707      5,584
Validation   21,544   17,765      3,779
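The preparation steps described above were done in JMP; the following is only a minimal Python sketch of the same idea, under stated assumptions. The status labels in STATUS_TO_TARGET, the field name loan_status, and the helper names are illustrative and do not come from the presentation, which only says that nine statuses were reduced to two.

```python
import random

# Assumed status labels; the presentation only says nine statuses were
# reduced to two, so this mapping is illustrative.
STATUS_TO_TARGET = {"Fully Paid": 0, "Charged Off": 1}


def make_target(records):
    """Keep only loans with a final status and attach a 0/1 charge-off flag."""
    kept = []
    for rec in records:
        target = STATUS_TO_TARGET.get(rec["loan_status"])
        if target is not None:  # drop statuses without a final outcome
            kept.append({**rec, "charged_off": target})
    return kept


def split_train_validation(records, train_fraction=0.6, seed=0):
    """Shuffle the records and split them into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With a 60% training fraction, a call such as split_train_validation(make_target(loans)) gives roughly the 60/40 proportions reported on the slide, although the exact records chosen depend on the random seed.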
5
Modeling
Logistic Regression
Classification Tree
Neural Networks
Failed Loan $6,100 (average)
Good Loan $2,500 (average)
Model                 Sensitivity  Specificity  Precision  Accuracy  AUC
Logistic              0.061        0.987        0.508      0.825     0.7175
Classification Tree   0.053                     0.456      0.823     0.6857
Neural Networks       0.051        0.992        0.564      0.827     0.7200
Modeling: We evaluated three models: Logistic Regression, Classification Trees, and Neural Networks. Each predictive model produced a confusion matrix showing how often we predicted that a loan would be charged off versus whether it was actually charged off. Here are a few important observations. <click> First, sensitivity corresponded to the percentage of bad loans that were correctly rejected. Specificity referred to the percentage of good loans that were correctly accepted. To evaluate based on profit, we looked at the average cost in 2014 of a failed loan and compared it to the profit of a good loan. <click> The cost of a failed loan was about $6,100. <click> And the profit of a good loan was about $2,500. In observing our results, the most accurate and useful model appears to be the Neural Networks model.
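As a rough illustration of how the four confusion-matrix measures on this slide are defined, here is a minimal Python sketch that treats "charged off" as the positive class. The function name and the example counts are hypothetical; they are not the counts behind the slide's table.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the slide's measures with 'charged off' as the positive class.

    tp: charged-off loans correctly flagged    fp: good loans wrongly flagged
    tn: good loans correctly accepted          fn: charged-off loans missed
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of bad loans correctly rejected
        "specificity": tn / (tn + fp),  # share of good loans correctly accepted
        "precision": tp / (tp + fp),    # share of flagged loans that were bad
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = classification_metrics(tp=30, fp=20, tn=880, fn=70)
```

Low sensitivity combined with high specificity, the pattern in all three rows of the table, means a model flags few loans, so most charged-off loans are still accepted.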
6
Evaluation - How Did We Do? - Profit
Model                   Estimated Total ($)   Estimated Improvement ($)
No Secondary Model           318,273,893.37
Logistic Model               318,721,608.29                  446,236.02
Tree Model                   315,081,469.73               -3,192,423.63
Neural Network Model         318,738,124.02                  498,833.98
Generally break even on short term loans
Much better profit on long term loans
Results not final: as more loans mature, this reflects our conservative estimate of profit
Neural Network is the best performing
Evaluating Results: How did we do? <click> Based on profit, we evaluated the models we had created. The best profit was associated with the Neural Network model, which would have saved about $500 thousand. The Logistic model did fairly well, with an estimated savings of about $445 thousand. The tree model actually lost about $3 million. Important observations from our modeling results included: <click> Our models generally break even on short term loans. <pause> Our models found a much better profit on long term loans. Note that the results are not final: the numbers on the slide are a conservative estimate of final profit, and they will change as more loans mature. The Neural Network is verified as the best performing model based on profit.
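The presentation does not spell out how the profit figures were calculated, so the following Python sketch is only a simplified assumption: every correctly rejected bad loan avoids the average loss, and every wrongly rejected good loan gives up the average profit. The two dollar averages are taken from the Modeling slide; the function names are hypothetical.

```python
AVG_LOSS_FAILED_LOAN = 6100.0  # average cost of a failed loan (Modeling slide)
AVG_PROFIT_GOOD_LOAN = 2500.0  # average profit of a good loan (Modeling slide)


def profit_improvement(true_positives, false_positives):
    """Estimated gain from rejecting every loan the secondary model flags.

    Simplifying assumption: each rejected bad loan avoids the average loss,
    and each rejected good loan forgoes the average profit.
    """
    avoided_loss = true_positives * AVG_LOSS_FAILED_LOAN
    forgone_profit = false_positives * AVG_PROFIT_GOOD_LOAN
    return avoided_loss - forgone_profit


def break_even_precision():
    """Precision at which rejecting flagged loans neither gains nor loses."""
    return AVG_PROFIT_GOOD_LOAN / (AVG_PROFIT_GOOD_LOAN + AVG_LOSS_FAILED_LOAN)
```

Under these assumed averages, a model only adds profit when more than roughly 29% of the loans it rejects would really have been charged off, which is one way to read why precision matters as much as accuracy here.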
7
Evaluation - How Did We Do? - Patterns
Term of Loan
Interest rates
Number of trades opened in last 24 months
Debt to Income ratio
Evaluating Patterns: Although the Logistic Regression numbers were not as accurate as those of the Neural Networks model, the associated logworths bring some interesting insights into the loan data. On this slide you see the amount each independent variable contributed to the Logistic Regression model. <click> Unlike the "black box" aspect of the Neural Network model, here you can get an idea of which variables contributed to the predictive model, and to what extent. Looking at the top ranked variables <click> we see that longer terms had a higher chance of being charged off, but also had higher profit. Interest rates correlated with charge offs, and this would be expected, since loan applicants would be paying more based on the risk of their credit application. The number of trades opened in the last 24 months and the debt to income ratio similarly correlated with a higher number of charge offs, and the extent can be seen in the logworth graph on the slide. Although not shown on this slide, the sign of each term's coefficient in the logistic regression equation shows whether the contribution is negative or positive.
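For readers unfamiliar with the logworth values mentioned above: JMP reports LogWorth as the negative base-10 logarithm of a term's p-value, so larger values mean stronger evidence that the variable matters. The Python sketch below shows that conversion and a simple ranking. The p-values in the example are made up for illustration and are not the values from the slide's graph.

```python
import math


def logworth(p_value):
    """Convert a p-value to LogWorth, defined as -log10(p-value)."""
    return -math.log10(p_value)


def rank_by_logworth(p_values):
    """Return (variable, logworth) pairs, strongest contributor first."""
    scored = [(name, logworth(p)) for name, p in p_values.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical p-values, for illustration only.
ranking = rank_by_logworth({"term": 1e-40, "int_rate": 1e-25, "purpose": 0.4})
```

A LogWorth of 2 corresponds to a p-value of 0.01, which is a common cutoff for calling a term significant.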
8
Evaluation - Surprises
Less Useful Variables
Variable                   Type        Description
chargeoff_within_12_mths   Discrete    The number of charge-offs within 12 months
num_accts_ever_120_pd      Continuous  The number of accounts ever 120 or more days past due
open_acc                               The number of open credit lines in the borrower's credit file
pub_rec_bankruptcies                   The number of public record bankruptcies
purpose                    Nominal     A category provided by the borrower for the loan request
revol_bal                              The total credit revolving balance
tax_liens                              The number of tax liens
Some surprises were found when we finished up this project. When we reduced the data set from 151 to about 25 variables, we were surprised that a number of these variables, shown here <click>, ended up NOT being useful in our predictive models. For example, we believed the purpose of the loan might have some correlation to charge offs, but based on our analysis, it ended up not being statistically significant.
9
Future directions to consider…
Consider treating short term loans differently from long term loans
Predict payment % instead of charge-offs
Evaluate making models more conservative by raising threshold for rejecting charge-offs
Consider ensemble models combining via union, intersection, or a majority rule
Evaluate model with other public loan sources and other LendingClub data sets
Data Mining is an iterative process, and we successfully developed a model that might possibly improve future Lending Club loan decisions. We also wanted to look at what lessons we had learned and what we might do if this project had been funded for a second data mining cycle. <click> First, we would consider treating short term loans differently from long term loans, since these loans behaved very differently, with very different cost functions impacting profit. <click> Next, we would want to consider predicting payment % rather than charge-offs, since most charge offs had some % of money paid back; hence not all charge offs were complete write offs. <click> Next, we would want to look at ways to make the models more conservative by raising the threshold for rejecting charge offs. <click> Next, we would want to consider ensemble models that could be built by combining our existing models and possibly others. We would look at unions, intersections, or perhaps a majority rule combination of our solution sets. <click> Finally, we could evaluate our predictive models against other public data sets, as well as start looking at the more recent loan data from Lending Club, to see if we continued to have an improvement over their existing loans.
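Two of the future directions above, raising the rejection threshold and combining models by union, intersection, or majority rule, can be sketched in a few lines of Python. This is an illustration of the ideas only, not something the group built; the function names and the 0/1 flag convention (1 means predicted charge-off) are assumptions.

```python
def flag_charge_offs(probabilities, threshold=0.5):
    """Flag a loan (1) when its predicted charge-off probability reaches the threshold."""
    return [int(p >= threshold) for p in probabilities]


def combine_predictions(model_flags, rule="majority"):
    """Combine per-model 0/1 flags for the same loans into a single flag list.

    model_flags: one list of flags per model, all of equal length.
    rule: 'union' flags a loan if any model does, 'intersection' only if all
    models do, and 'majority' if more than half of the models do.
    """
    combined = []
    for votes in zip(*model_flags):
        if rule == "union":
            combined.append(int(any(votes)))
        elif rule == "intersection":
            combined.append(int(all(votes)))
        elif rule == "majority":
            combined.append(int(sum(votes) * 2 > len(votes)))
        else:
            raise ValueError(f"unknown rule: {rule}")
    return combined
```

A union rejects the most loans and an intersection the fewest, so the choice of rule trades avoided losses against forgone profit on good loans in much the same way that moving the threshold does.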
10
Thank you for taking the time to watch our presentation.