Download presentation
Presentation is loading. Please wait.
Published byMeredith Gwendolyn Nash Modified over 8 years ago
1
Cristián Bravo R. (1), Lyn C. Thomas (2), and Richard Weber (1) (1) Department of Industrial Engineering(2) Centre for Risk Management Universidad de Chile University of Southampton {cbravo,rweber}@dii.uchile.cl L.Thomas@soton.ac.uk
2
Cooperation between the University of Chile and the University of Southampton since 2008. Support:
4
Consumer credit scoring is based on the principle that historical patterns describe present events. For any given loan: Predictive Objective Variables Variable PAST Previous to Granting (Socio-demographic, debt…) FUTURE Post-Granting (Arrears, repayment…)
5
For past information, a lot of work has been done: Evolution of debt. Socio-economic profiling. Context-specific variables. For future information… Default: One (or more) installment was in arrears for more than 90 days in the first year of the loan. What about all the other information?
6
Is it possible to enrich the objective variable using more of the available future information? In practice, predictors are divided between two types: Variables that describe capacity of repayment. ▪ These variables describe how much money a customer has. ▪ If a customer cannot pay because of lack of capacity is colloquially referred to as a “Can’t pay”. Variables that describe willingness to repay. ▪ These variables describe how prone a customer is to pay back the loan. ▪ Customers with problems in this respect are usually referred to as a “Won’t pay”.
7
This notion is not used in credit scoring because the “class” of the applicant is not known in advance. Assuming that: All customers are rational. The company is rational, and does not know the class of the customer. A loan is desirable. We attempt to model the loan granting process.
13
If the customers and the company are rational, then the following has to be satisfied: Borrowers of class C & P want to participate. Borrowers of class W want to participate. The lender wants to participate.
15
To obtain the clusters, the semi-supervised data mining technique called “Constrained Clustering” can be used. The method creates the minimum variance cluster that simultaneously satisfy the constraints. We would like that “most” cases satisfy the constraints, since there is a large variance. ▪ The constraints define “profiles” of customers, which may have variance, or even be outliers. The latter must be filtered. We used the CCF algorithm (Bravo and Weber, 2011) Constrained clustering with Outlier Filtering.
16
Initialize (Random) Assign to Closest Cluster (K-Means) Move cluster if violates constraints Eliminate if infeasible Repeat until convergence
17
The proposed methodology was tested using a database of approx. 25.000 loans to Small and Medium Enterprises (SME) during the years 2000 – 2007. The predictors included socio-demographic variables, debt, sales, age, etc. The variables that describe the evolution of the loan once granted are renegotiations, amount, value of the collateral, requested amount, term, etc. Roughly half the loans are secured. Parameters for the model are obtained using metrics from the institution (repayment rates, “haircuts”, etc…)
19
K Means Clustering Constrained Clustering
20
We compared three models: Neural Networks (ANN), Multinomial Logistic Regression (MLR) and Binary Logistic Regression (LR). LR model was trained with class Defaulter, and used as a benchmark. MLR and ANN were trained as a three class model (P, C & W). Contrast in: Parameters obtained. Classification results (AUC).
21
For the AUC (test set): There is a statistically significant difference of approximately 5% between the classification capacity of the binary model versus the multinomial model. There is no statistically significant difference between the multinomial models.
22
Difference is in differentiating defaulters!
23
LOGISTIC REGRESSION NEURAL NETWORKS
24
An experimental method to construct a scorecard using the risk of being a defaulter for willingness or capacity was presented. It takes advantage of information not used in classifying loans. It is based on simple assumptions on rationality. The method provides interesting insights about the customers. Future information allows to profile defaulters. Patterns are enriched, since some variables that were not useful now are.
25
Results on a test database show the usefulness of the approach. There are gains in classification. ▪ Potential for reducing costs. The clustering method allows us to refine behavior. ▪ Potential for better origination policies. Future work: How to better model behaviour? Which future variables are more descriptiv3e? Which past predictors describe best the new situation?
26
Cristián Bravo R. (1), Lyn C. Thomas (2), and Richard Weber (1) (1) Department of Industrial Engineering(2) Centre for Risk Management Universidad de Chile University of Southampton {cbravo,rweber}@dii.uchile.cl L.Thomas@soton.ac.uk Thank you!! Questions?
28
VariableCluster 1Cluster 2 Amount434.336366.078 Term14,7012,80 Grace period (month)2,222,11 Grace period (day)61,5257,97 Installment Val.36.30835.415 Collateral Val.803.119720.013 Rate2,222,19 Dif. In Rate-0,63-0,36 Prepay req.1,000,00 Prepaid0,470,00 Renegotiated0,530,15 Term Req.15,8613,86 Installment Req.46.64747.277 Amount Paid34.43836.709 Avg. Arrear69,78118,93 Amount Req.491.797426.709
29
VariableClass CClass W Amount530.238338.118 Term15,2712,82 Grace period (month)2,212,13 Grace period (day)61,4158,46 Installment Val.43.82932.656 Collateral Val.1.576.543435.231 Rate2,252,14 Dif. In Rate-0,62-0,40 Prepay req.0,850,18 Prepaid0,440,07 Renegotiated0,500,21 Term Req.16,6513,81 Installment Req.55.78843.707 Amount Paid42.20633.455 Avg. Arrear73,14111,41 Amount Req.614.072388.277
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.