Loan Default Model Saed Sayad www.ismartsoft.com
Data Mining Steps: 1. Problem Definition 2. Data Preparation 3. Data Exploration 4. Modeling 5. Evaluation 6. Deployment
1. Problem Definition Build a loan default prediction model for small businesses, using historical data to assess the likelihood of default by an obligor.
Data Mining Team: Modeler, Analyst, DBA, Domain Expert
2. Data Preparation No. of Cases: 35,500; No. of Defaults: 2,500 (7%); Number of Variables: 25; Total balance for all cases: $554,000,000; Total balance for defaults: $58,000,000 (10.4%)
3. Data Exploration Univariate Analysis: Frequency, Average, Min, Max, ...; Bar, Line, Pie, ... Charts. Bivariate Analysis: Correlation, Z test, ...; Combination Charts.
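The univariate and bivariate measures listed above can be sketched in a few lines of Python. This is a minimal illustration only: the six sample loans, the `loan_type` field, and the helper name `pearson` are hypothetical and do not come from the 35,500-case data set.

```python
from collections import Counter
from statistics import mean

# Hypothetical sample: (months_in_business, loan_type, defaulted)
loans = [
    (6, "term", 1),
    (12, "line", 1),
    (24, "term", 0),
    (36, "line", 0),
    (60, "term", 0),
    (120, "term", 0),
]

months = [m for m, _, _ in loans]
defaults = [d for _, _, d in loans]

# Univariate: summary statistics for a numeric variable,
# frequency counts for a categorical one.
summary = {"mean": mean(months), "min": min(months), "max": max(months)}
type_counts = Counter(t for _, t, _ in loans)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Bivariate: correlation between months in business and the default flag.
corr = pearson(months, defaults)
print(summary, dict(type_counts), round(corr, 3))
```

In this made-up sample the correlation is negative, meaning that longer-established businesses default less often.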
Data Exploration - Univariate (chart: Months in Business)
Data Exploration - Bivariate (chart: Months in Business and Default; y-axis: Default %)
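A chart of default percentage against months in business is typically built by grouping cases into bands and computing the default rate in each. The sketch below assumes hypothetical band edges and ten made-up records; the bands used for the original chart are not known.

```python
# Hypothetical records: (months_in_business, defaulted)
records = [
    (3, 1), (8, 1), (10, 0),
    (18, 1), (20, 0), (30, 0),
    (40, 0), (50, 1), (70, 0), (90, 0),
]

# Hypothetical band edges in months: [0, 12), [12, 36), [36, inf)
bands = [(0, 12), (12, 36), (36, float("inf"))]


def default_rate_by_band(records, bands):
    """Return {band: default percentage} for each months-in-business band."""
    rates = {}
    for low, high in bands:
        flags = [d for m, d in records if low <= m < high]
        # An empty band has no defined rate.
        rates[(low, high)] = 100.0 * sum(flags) / len(flags) if flags else None
    return rates


rates = default_rate_by_band(records, bands)
for band, rate in rates.items():
    print(band, rate)
```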
4. Modeling Classification: Bayesian, Decision Tree, Logistic Regression, SVM. Regression: Linear Regression, Robust Regression, Neural Network. Clustering: Hierarchical, K-Means. Association: Apriori.
Modeling - Classification (Logistic Regression): Default (Y or N) = f(DELQ, Age, Type)
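One way to estimate such a function f is to fit a logistic regression by gradient descent on the log-loss. The sketch below is an assumption-laden illustration, not the tool used for the original model: the training rows, the numeric encoding of DELQ, Age and Type, and the learning rate and epoch count are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Probability of default; weights[0] is the bias term."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, labels, lr=0.05, epochs=5000):
    """Fit weights by batch gradient descent on the average log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grads[0] += err
            for j, v in enumerate(x):
                grads[j + 1] += err * v
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grads)]
    return weights


# Hypothetical encoding: (DELQ = past delinquencies, Age = years in business,
# Type = 0/1 loan type flag), with label 1 = default.
rows = [(0, 10, 0), (0, 8, 1), (1, 6, 0), (1, 2, 1),
        (2, 5, 0), (2, 3, 1), (3, 1, 0), (3, 4, 1)]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

weights = fit_logistic(rows, labels)
p_low = predict_proba(weights, (0, 10, 0))   # no delinquencies, older business
p_high = predict_proba(weights, (3, 1, 1))   # several delinquencies, young business
print(round(p_low, 3), round(p_high, 3))
```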
Logistic Regression Model (chart: Default, from 0 to 1, against Months in Business; curves: Linear Model, Logistic Model)
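The contrast between the two curves is that a linear model can produce values outside the 0 to 1 range, while the logistic model passes the same linear score through a sigmoid and always yields a valid probability. A minimal sketch, using hypothetical coefficients chosen only to show the effect:

```python
import math

# Hypothetical coefficients: default becomes less likely
# as months in business increase.
INTERCEPT = 1.5
SLOPE = -0.05


def linear_model(months):
    """Raw linear score; not bounded to [0, 1]."""
    return INTERCEPT + SLOPE * months


def logistic_model(months):
    """Sigmoid of the linear score; always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-linear_model(months)))


for months in (0, 20, 60, 120):
    print(months, round(linear_model(months), 2), round(logistic_model(months), 3))
```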
5. Evaluation Stats: Variables Contribution, Mean Square Error, Confusion Matrix. Charts: K-S Chart, Lift Chart, Gain Chart.
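Two of these measures are easy to compute directly from predicted scores. The sketch below assumes the usual definitions: mean square error as the average squared gap between the 0/1 outcome and the score, and the K-S statistic as the largest gap between the cumulative score distributions of defaults and non-defaults. The eight labels and scores are made up.

```python
def mean_square_error(actual, scores):
    """Average squared difference between 0/1 outcomes and predicted scores."""
    return sum((a - s) ** 2 for a, s in zip(actual, scores)) / len(actual)


def ks_statistic(actual, scores):
    """Largest gap between the cumulative score distributions of the two classes."""
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    best = 0.0
    for threshold in sorted(set(scores)):
        pos = sum(1 for a, s in zip(actual, scores) if a == 1 and s <= threshold)
        neg = sum(1 for a, s in zip(actual, scores) if a == 0 and s <= threshold)
        best = max(best, abs(pos / n_pos - neg / n_neg))
    return best


# Hypothetical outcomes (1 = default) and model scores.
actual = [1, 1, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

mse = mean_square_error(actual, scores)
ks = ks_statistic(actual, scores)
print(round(mse, 3), round(ks, 3))
```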
Evaluation – Variables Contribution
Evaluation - Confusion Matrix (columns: Positive Cases, Negative Cases; rows: Predicted Positive, Predicted Negative; total cases: 8,167). Cell values that survive: 264 (3%) and 313 (4%).
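A confusion matrix of this shape can be tallied from actual and predicted labels as sketched below. The ten labels are hypothetical and are not the 8,167 cases from the evaluation set.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count true/false positives and negatives (1 = default, 0 = no default)."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(1, 1)],  # positive case, predicted positive
        "FN": counts[(1, 0)],  # positive case, predicted negative
        "FP": counts[(0, 1)],  # negative case, predicted positive
        "TN": counts[(0, 0)],  # negative case, predicted negative
    }


# Hypothetical labels.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

cm = confusion_matrix(actual, predicted)
total = sum(cm.values())
percentages = {cell: 100.0 * n / total for cell, n in cm.items()}
print(cm, percentages)
```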
Evaluation – Gain Chart (x-axis: Population %, y-axis: Default %; axis marks: 10%, 50%, 100%; annotated values: 58% and 10%)
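The points on a gain chart come from ranking cases by model score and asking what share of all defaults falls in the top slice of the population. A minimal sketch, assuming twenty made-up cases and a hypothetical helper `cumulative_gain`:

```python
def cumulative_gain(actual, scores, fractions=(0.1, 0.5, 1.0)):
    """Share of all defaults captured in the top fraction of cases ranked by score."""
    ranked = [label for _, label in
              sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)]
    total_defaults = sum(ranked)
    gains = {}
    for frac in fractions:
        cutoff = round(frac * len(ranked))
        gains[frac] = sum(ranked[:cutoff]) / total_defaults
    return gains


# Hypothetical data: 20 cases with distinct scores from 1.00 down to 0.05,
# and 4 defaults sitting at ranks 1, 2, 5 and 13.
scores = [i / 20 for i in range(20, 0, -1)]
actual = [0] * 20
for rank in (0, 1, 4, 12):
    actual[rank] = 1

gains = cumulative_gain(actual, scores)
print(gains)
```

A random ranking would capture roughly the same share of defaults as the share of population examined, which is the diagonal baseline on the chart.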
Return On Investment Total Number of Loans = 8,167; Total Number of Defaults = 560; Total Balance for Defaults = $12,281,589. Top 10% Random: Number of Defaults = 56, Total Balance = $1,230,000. Top 10% Model: Number of Defaults = 305, Total Balance = $7,655,772. % ROI
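The figures above allow a quick comparison of the model's top 10% against a random 10%. The sketch below only computes lift and capture ratios from the stated numbers; it does not attempt to reproduce the ROI figure, whose formula is not given.

```python
# Figures taken from the slide.
total_defaults = 560
total_default_balance = 12_281_589
random_defaults, random_balance = 56, 1_230_000
model_defaults, model_balance = 305, 7_655_772

# How many times better the model's top 10% is than a random 10%.
default_lift = model_defaults / random_defaults
balance_lift = model_balance / random_balance

# Share of all defaults (by count and by balance) found in the model's top 10%.
defaults_captured = model_defaults / total_defaults
balance_captured = model_balance / total_default_balance

print(f"lift by count:   {default_lift:.2f}x")
print(f"lift by balance: {balance_lift:.2f}x")
print(f"defaults captured: {defaults_captured:.1%}")
print(f"balance captured:  {balance_captured:.1%}")
```

On these numbers the model's top decile holds about 5.4 times as many defaults, and about 6.2 times as much defaulted balance, as a random decile.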
6. Deployment SQL: Batch Scoring. HTML: Web-based Scoring.
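Batch scoring in SQL usually means writing the fitted logistic formula as a SQL expression and running it over the loan table. The sketch below shows the idea with Python's built-in `sqlite3` module; the table `loans`, its columns, and the coefficients are hypothetical, and the actual database used for deployment is not stated.

```python
import math
import sqlite3

# Hypothetical coefficients for illustration only.
INTERCEPT, B_DELQ, B_AGE = -1.0, 0.8, -0.3

conn = sqlite3.connect(":memory:")
# Not every SQLite build ships EXP, so register one from Python.
conn.create_function("EXP", 1, math.exp)

# Hypothetical schema and rows.
conn.execute("CREATE TABLE loans (loan_id INTEGER PRIMARY KEY, delq REAL, age REAL)")
conn.executemany("INSERT INTO loans VALUES (?, ?, ?)", [(1, 0, 10), (2, 3, 1)])

# The logistic model written as a single SQL expression.
SCORING_SQL = """
SELECT loan_id,
       1.0 / (1.0 + EXP(-(? + ? * delq + ? * age))) AS default_probability
FROM loans
ORDER BY loan_id
"""

scored = conn.execute(SCORING_SQL, (INTERCEPT, B_DELQ, B_AGE)).fetchall()
conn.close()

for loan_id, probability in scored:
    print(loan_id, round(probability, 3))
```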
Questions?