Published by Austin Ramsey. Modified over 9 years ago.
1
Developing Intelligent Systems for Credit Scoring Using Machine Learning Techniques
Bart Baesens
Public Defence, September 24th, 2003
PhD Committee
- J. Vanthienen (promotor, K.U.Leuven)
- J. Vandenbulcke (K.U.Leuven)
- M. Verhelst
- M. Vandebroek
- J. Crook (Univ. Edinburgh)
- L. Thomas (Univ. Southampton)
2
Overview
KDD / Credit Scoring / Accuracy / Comprehensibility / Survival Analysis / Conclusions
- Knowledge Discovery in Data
- The Credit Scoring Classification Problem
- Developing Accurate Credit Scoring Systems
- Developing Comprehensible Credit Scoring Systems
- Survival Analysis for Credit Scoring
- Conclusions
3
Knowledge Discovery in Data
- The data avalanche problem
  - finance, marketing, medicine, engineering
- Knowledge Discovery in Data (KDD) aims at learning patterns from data using advanced algorithms
- KDD steps
  - Data preprocessing
  - Data mining
  - Post processing
- Machine learning provides a multitude of induction algorithms aimed at learning patterns from data
4
The Credit Scoring Classification Problem
Credit scoring is a technique that helps organizations decide whether or not to grant credit to customers who apply for a loan. The aim is to develop classification models based on the repayment behavior of past applicants. These models summarize all available information about an applicant in a score P(applicant is good payer | age, marital status, savings amount, …). If this score is above a predetermined threshold, credit is granted; otherwise, credit is denied.
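The score-and-threshold decision described above can be sketched in a few lines. This is only an illustration, assuming a logistic model for the score; the feature names, coefficients, bias, and the 0.6 threshold are all hypothetical and do not come from the presentation.

```python
import math

def score_applicant(features, weights, bias):
    """Estimate P(good payer | features) with a logistic model (an assumed model form)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def decide(score, threshold=0.5):
    """Grant credit when the score is above the threshold, deny otherwise."""
    return "grant" if score > threshold else "deny"

# Hypothetical coefficients, chosen only for illustration.
WEIGHTS = {"age": 0.02, "married": 0.3, "savings_thousands": 0.15}
BIAS = -1.0

applicant = {"age": 35, "married": 1, "savings_thousands": 4}
p_good = score_applicant(applicant, WEIGHTS, BIAS)
decision = decide(p_good, threshold=0.6)
print(f"P(good) = {p_good:.3f}, decision = {decision}")
```

Raising the threshold makes the lender more conservative: fewer bad payers are accepted, at the cost of rejecting more good ones.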
5
Developing Accurate Credit Scoring Systems
- Credit scoring systems should be able to accurately distinguish good applicants from bad applicants.
- The problem is usually tackled using classification techniques
  - E.g., logistic regression, discriminant analysis, decision trees, Bayesian networks, neural networks, support vector machines, k-nearest neighbor, …
- Benchmarking study
[Figure: example decision tree with tests Income > $50,000, Job > 3 Years, and High Debt, Yes/No branches, and leaves Good Risk and Bad Risk]
6
Developing Accurate Credit Scoring Systems (contd.)
- Experimental setup
  - 8 real-life credit scoring data sets
  - Various cut-off setting schemes
  - Classification accuracy + Area under Receiver Operating Characteristic Curve
  - McNemar test + DeLong, DeLong and Clarke-Pearson test
- Conclusions
  - Flat maximum effect
  - Non-linear classifiers perform consistently well; however, simple linear classifiers also give good performance
  - Only a handful of techniques were clearly inferior
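Two of the evaluation tools named in the experimental setup can be sketched with the standard library alone. The AUC is computed here as the fraction of (good, bad) pairs ranked correctly, and McNemar's test uses the continuity-corrected chi-square form. The toy labels and predictions are invented for illustration, and the DeLong, DeLong and Clarke-Pearson test is not covered by this sketch.

```python
import math

def auc(labels, scores):
    """Area under the ROC curve as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mcnemar(y_true, pred_a, pred_b):
    """McNemar's test with continuity correction; returns (statistic, p-value)."""
    # b: classifier A right and B wrong; c: A wrong and B right.
    b = sum(1 for y, a, c_ in zip(y_true, pred_a, pred_b) if a == y and c_ != y)
    c = sum(1 for y, a, c_ in zip(y_true, pred_a, pred_b) if a != y and c_ == y)
    if b + c == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Toy data, invented for illustration only.
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
stat, p_value = mcnemar([1] * 10, [1] * 10, [0] * 8 + [1] * 2)
print(toy_auc, stat, round(p_value, 4))
```

Accuracy depends on the chosen cut-off, whereas the AUC summarizes ranking quality over all cut-offs, which is one reason to report both.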
7
Developing Comprehensible Credit Scoring Systems
- Ideally, a credit scoring system should be easy to understand and implement.
  - “What is needed, clearly, is a redirection of credit scoring research efforts toward development of explanatory models of credit performance and the isolation of variables bearing an explanatory relationship to credit performance” (Capon, 1982)
  - Legally and ethically justified (e.g. Equal Credit Opportunity Act in the US)
- Trade-off between accuracy and comprehensibility (Occam’s Razor)
  - Pluralitas non est ponenda sine necessitate
    William of Occam (ca )
8
Developing Comprehensible Credit Scoring Systems (contd.)
- Neural network rule extraction
- Rule representation formalisms
  - Propositional rule
    If purpose=cash and Savings Account ≤ 50€ Then Applicant=bad
  - Oblique rule
    If 0.84Income Savings Account ≤ 1000€ Then Applicant=bad
  - M-of-N rules
    If {at least/exactly/at most} M of the N conditions (C1,C2,..,CN) are satisfied Then Applicant=bad
  - Descriptive fuzzy rules
    If percentage of financial burden is large Then Applicant=bad
  - Approximate fuzzy rules
    If term is trapezoidal( ) Then Applicant=bad
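To make the rule formalisms concrete, the sketch below evaluates a propositional rule, an M-of-N rule, and a trapezoidal fuzzy membership function. The propositional rule is the one from the slide; the M-of-N conditions, the applicant record, and the trapezoid corner points (12, 24, 36, 48) are hypothetical, since the slide does not give them. The trapezoid assumes a < b <= c < d.

```python
def propositional_rule(applicant):
    """If purpose=cash and Savings Account <= 50 EUR Then Applicant=bad."""
    return applicant["purpose"] == "cash" and applicant["savings"] <= 50

def m_of_n(conditions, applicant, m, mode="at least"):
    """Fire when at least / exactly / at most m of the given conditions hold."""
    hits = sum(1 for condition in conditions if condition(applicant))
    if mode == "at least":
        return hits >= m
    if mode == "exactly":
        return hits == m
    if mode == "at most":
        return hits <= m
    raise ValueError(f"unknown mode: {mode}")

def trapezoidal(x, a, b, c, d):
    """Membership degree: 0 outside (a, d), 1 on [b, c], linear on the two slopes."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical applicant and conditions, for illustration only.
applicant = {"purpose": "cash", "savings": 30, "income": 1200, "term": 18}
conditions = [
    lambda a: a["savings"] <= 50,
    lambda a: a["income"] < 1000,
    lambda a: a["purpose"] == "cash",
]

is_bad_propositional = propositional_rule(applicant)
is_bad_m_of_n = m_of_n(conditions, applicant, m=2)
term_membership = trapezoidal(applicant["term"], 12, 24, 36, 48)
print(is_bad_propositional, is_bad_m_of_n, term_membership)
```

A crisp rule returns true or false, whereas the fuzzy membership returns a degree between 0 and 1 that an approximate fuzzy rule would then combine with other antecedents.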
9
Developing Comprehensible Credit Scoring Systems (contd.)
10
Developing Comprehensible Credit Scoring Systems (contd.)
11
Developing Comprehensible Credit Scoring Systems (contd.)
12
Survival Analysis for Credit Scoring
- Predict when customers default
  - Implications for profit scoring and debt provisioning
- Censored data
- Statistical models for survival analysis
  - E.g. Kaplan-Meier, parametric models, proportional hazards
- Drawbacks
  - Linear relationships
  - No interaction effects
  - Proportional hazards assumption
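As an illustration of how censored data enter a survival estimate, here is a minimal Kaplan-Meier sketch. A loan that has not defaulted by the end of its observation period is censored: it counts in the risk set up to its last observed month but never as a default. The five loan records are invented for illustration.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct default time.

    durations: months until default or until observation stopped.
    observed:  True if the loan defaulted, False if it was censored.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Loans still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Defaults that happen exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve

# Invented loan histories: (months observed, defaulted?).
durations = [3, 5, 5, 8, 12]
observed = [True, True, False, True, False]
km_curve = kaplan_meier(durations, observed)
for month, s in km_curve:
    print(f"month {month}: S(t) = {s:.2f}")
```

The estimator uses no applicant characteristics at all, which is why regression-style models such as proportional hazards are needed once the curve should depend on the applicant.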
13
Survival Analysis for Credit Scoring (contd.)
- Neural networks for survival analysis
- Requirements
  - Monotonically decreasing survival curve
  - Scalable
  - Censoring
- Empirically tested for predicting default and early repayment
- Comparisons with proportional hazards models
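One common way to meet the monotonicity and censoring requirements is a discrete-time hazard formulation, sketched below. It is not necessarily the architecture used in the thesis. The assumption is that a network emits one raw output per time interval; a sigmoid turns each output into a conditional default probability, and the survival curve is the running product of the complements, so it can never increase. The raw outputs and hazards used here are made up.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def survival_curve(logits):
    """Turn one raw output per time interval into a non-increasing survival curve."""
    curve, survival = [], 1.0
    for z in logits:
        hazard = sigmoid(z)        # conditional default probability for this interval
        survival *= 1.0 - hazard   # S(t_k) = S(t_{k-1}) * (1 - h_k)
        curve.append(survival)
    return curve

def log_likelihood(hazards, last_interval, defaulted):
    """Log-likelihood of one loan observed up to interval index last_interval (0-based).

    A defaulted loan survived every earlier interval and defaulted in the last one;
    a censored loan is assumed to have survived the last interval as well.
    """
    survived = sum(math.log(1.0 - h) for h in hazards[:last_interval])
    last = hazards[last_interval]
    return survived + (math.log(last) if defaulted else math.log(1.0 - last))

# Made-up raw outputs standing in for a trained network.
curve = survival_curve([0.0, 0.0, 2.0, -1.0])
print([round(s, 3) for s in curve])
```

Because censored loans contribute only survival terms to the likelihood, they can be used for training without pretending to know when, or whether, they default.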
14
Conclusions
- Developing accurate credit scoring systems
  - Flat maximum effect
  - Superiority of non-linear classifiers
  - Satisfactory performance of linear classifiers
- Developing comprehensible credit scoring systems
  - Neural network rule extraction
  - Decision tables
  - Fuzzy rule extraction
- Neural network survival analysis
15
Future Research
- Indirect Credit Scoring
- Knowledge Fusion
- Behavioral Credit Scoring
- Extensions to other Contexts and Problem Domains