Developing Intelligent Systems for Credit Scoring Using Machine Learning Techniques
Bart Baesens
Public Defence, September 24th, 2003
PhD Committee: J. Vanthienen (promotor, K.U.Leuven), J. Vandenbulcke (K.U.Leuven), M. Verhelst, M. Vandebroek, J. Crook (Univ. Edinburgh), L. Thomas (Univ. Southampton)

Overview
- Knowledge Discovery in Data
- The Credit Scoring Classification Problem
- Developing Accurate Credit Scoring Systems
- Developing Comprehensible Credit Scoring Systems
- Survival Analysis for Credit Scoring
- Conclusions

Knowledge Discovery in Data
- The data avalanche problem: ever-growing volumes of data in finance, marketing, medicine and engineering
- Knowledge Discovery in Data (KDD) aims at learning patterns from data using advanced algorithms
- KDD steps: data preprocessing, data mining, post-processing
- Machine learning provides a multitude of induction algorithms aimed at learning patterns from data
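
The three KDD steps can be illustrated with a deliberately tiny Python sketch. Everything in it is an assumption made for illustration: the toy records, the attribute names, and the choice of a one-split decision stump as the data mining step. The talk does not prescribe any of these.

```python
# Toy illustration of the three KDD steps: preprocessing, data mining, post-processing.
# The records, attribute names and the decision-stump learner are all hypothetical.

def preprocess(rows):
    """Drop records with a missing income and min-max scale income to [0, 1]."""
    clean = [r for r in rows if r["income"] is not None]
    lo = min(r["income"] for r in clean)
    span = (max(r["income"] for r in clean) - lo) or 1.0
    scaled = [{**r, "income_scaled": (r["income"] - lo) / span} for r in clean]
    return scaled, lo, span

def mine(rows):
    """Learn the scaled-income threshold that misclassifies the fewest records.

    Predicts 'good' when income_scaled >= threshold (a one-split decision stump).
    """
    best_threshold, best_errors = None, None
    for cand in sorted({r["income_scaled"] for r in rows}):
        errors = sum((r["income_scaled"] >= cand) != r["good"] for r in rows)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = cand, errors
    return best_threshold

def postprocess(threshold, lo, span):
    """Map the threshold back to original units and express it as a readable rule."""
    return f"If income >= {lo + threshold * span:.0f} Then Applicant=good"

data = [
    {"income": 20000, "good": False},
    {"income": 35000, "good": False},
    {"income": None, "good": True},   # missing value, removed during preprocessing
    {"income": 55000, "good": True},
    {"income": 80000, "good": True},
]
rows, lo, span = preprocess(data)
print(postprocess(mine(rows), lo, span))
```

On this toy data the script prints the rule "If income >= 55000 Then Applicant=good". The post-processing step is what turns a learned number into something a credit officer can read.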

The Credit Scoring Classification Problem
Credit scoring is a technique that helps organizations decide whether to grant credit to customers who apply for a loan. The aim is to develop classification models based on the repayment behavior of past applicants. These models summarize all available information about an applicant in a score: P(applicant is a good payer | age, marital status, savings amount, …). If this score is above a predetermined threshold, credit is granted; otherwise, credit is denied.
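
The score-and-threshold decision can be sketched as follows. This assumes, purely for illustration, that the score comes from a logistic regression model (one of the techniques listed on the next slide). The coefficients, the attribute encoding and the default threshold of 0.5 are hypothetical and were not estimated from any real data.

```python
import math

# Hypothetical logistic regression scorecard. In practice the coefficients would be
# estimated from the repayment behavior of past applicants.
INTERCEPT = -1.5
WEIGHTS = {"age": 0.03, "married": 0.4, "savings_k": 0.2}  # savings in thousands

def p_good(applicant):
    """Estimated P(applicant is a good payer | characteristics) via the logistic function."""
    z = INTERCEPT + sum(w * applicant[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def decide(applicant, threshold=0.5):
    """Grant credit if and only if the score is above the predetermined threshold."""
    return "grant" if p_good(applicant) > threshold else "deny"

applicant = {"age": 40, "married": 1, "savings_k": 5.0}
print(round(p_good(applicant), 3), decide(applicant))
```

Moving the threshold trades off accepted bad payers against rejected good payers. Raising it to 0.8 would deny this applicant, whose score is about 0.75.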

Developing Accurate Credit Scoring Systems
- Credit scoring systems should accurately distinguish good applicants from bad applicants.
- The problem is usually tackled using classification techniques, e.g., logistic regression, discriminant analysis, decision trees, Bayesian networks, neural networks, support vector machines, k-nearest neighbor, …
- Benchmarking study
[Figure: example decision tree with the tests Income > $50,000, Job > 3 Years and High Debt, and the leaves Good Risk and Bad Risk]
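
To make one of the listed techniques concrete, here is a minimal k-nearest neighbor classifier. It is a sketch under stated assumptions: the training applicants, the two attributes (income in k€, years in current job) and k = 3 are hypothetical, and the inputs are left unscaled for brevity. In a real benchmark they would normally be standardized first so that income does not dominate the distance.

```python
import math
from collections import Counter

# Hypothetical training applicants: (income in k EUR, years in current job) -> label.
TRAIN = [
    ((25.0, 1.0), "bad"),
    ((30.0, 0.5), "bad"),
    ((28.0, 2.0), "bad"),
    ((60.0, 5.0), "good"),
    ((75.0, 8.0), "good"),
    ((55.0, 4.0), "good"),
]

def knn_classify(x, train=TRAIN, k=3):
    """Label x by majority vote among its k nearest training applicants (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(x, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((58.0, 3.0)), knn_classify((27.0, 1.5)))
```

An odd k avoids tied votes in this two-class setting. `math.dist` requires Python 3.8 or later.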

Developing Accurate Credit Scoring Systems (contd.)
Experimental setup:
- 8 real-life credit scoring data sets
- Various cut-off setting schemes
- Performance measures: classification accuracy and area under the receiver operating characteristic curve (AUC)
- Significance tests: McNemar test (for accuracy) and the DeLong, DeLong and Clarke-Pearson test (for AUC)
Conclusions:
- Flat maximum effect: the best-performing techniques differ only marginally in performance
- Non-linear classifiers perform consistently well; however, simple linear classifiers also give good performance
- Only a handful of techniques were clearly inferior
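
The two performance measures and the McNemar test can be sketched in plain Python. This is an illustrative sketch on made-up labels and scores, with a hypothetical cut-off of 0.5. The DeLong, DeLong and Clarke-Pearson test for comparing correlated AUCs is more involved and is not shown.

```python
# Labels: 1 = good payer, 0 = bad payer. Classifier A and B are hypothetical.

def accuracy(y_true, y_pred):
    """Fraction of applicants classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Area under the ROC curve via its Mann-Whitney interpretation:
    P(score of a random good > score of a random bad), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mcnemar_chi2(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar statistic on the discordant pairs.
    Compare with a chi-square distribution with 1 df (3.84 at the 5% level)."""
    only_a = sum(a == t and b != t for t, a, b in zip(y_true, pred_a, pred_b))
    only_b = sum(a != t and b == t for t, a, b in zip(y_true, pred_a, pred_b))
    if only_a + only_b == 0:
        return 0.0
    return max(0, abs(only_a - only_b) - 1) ** 2 / (only_a + only_b)

y = [1, 1, 1, 0, 0, 1, 0, 1]
scores_a = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.55]
pred_a = [1 if s > 0.5 else 0 for s in scores_a]  # cut-off of 0.5 (an assumption)
pred_b = [1] * len(y)                             # classifier B accepts everyone

print(accuracy(y, pred_a), round(auc(y, scores_a), 3), mcnemar_chi2(y, pred_a, pred_b))
```

Accuracy depends on the chosen cut-off, whereas AUC summarizes performance over all cut-offs. This is why the study reports both and uses a separate test for each.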

Developing Comprehensible Credit Scoring Systems
- Ideally, a credit scoring system should be easy to understand and implement.
- “What is needed, clearly, is a redirection of credit scoring research efforts toward development of explanatory models of credit performance and the isolation of variables bearing an explanatory relationship to credit performance” (Capon, 1982)
- Credit decisions must be legally and ethically justifiable (e.g., the Equal Credit Opportunity Act in the US)
- Trade-off between accuracy and comprehensibility (Occam’s Razor)
“Pluralitas non est ponenda sine necessitate” (Plurality should not be posited without necessity), William of Occam (ca. 1285-1349)

Developing Comprehensible Credit Scoring Systems (contd.)
Neural network rule extraction
Rule representation formalisms:
- Propositional rule: If purpose = cash and Savings Account ≤ 50€ Then Applicant = bad
- Oblique rule: If 0.84·Income + 0.32·Savings Account ≤ 1000€ Then Applicant = bad
- M-of-N rule: If {at least/exactly/at most} M of the N conditions (C1, C2, …, CN) are satisfied Then Applicant = bad
- Descriptive fuzzy rule: If percentage of financial burden is large Then Applicant = bad
- Approximate fuzzy rule: If term is trapezoidal(19.2, 31.9, 70.2, 81.4) Then Applicant = bad
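
The following sketch shows how the example rules above would be evaluated for a single applicant. It does not show how such rules are extracted from a trained neural network. The function names, the assumption that amounts are plain euro values, and the string keywords for the M-of-N variants are illustrative choices, not part of the original work. The trapezoidal membership function uses the usual piecewise-linear definition.

```python
def propositional_rule(purpose, savings):
    """If purpose = cash and Savings Account <= 50 EUR Then Applicant = bad."""
    return purpose == "cash" and savings <= 50

def oblique_rule(income, savings):
    """If 0.84*Income + 0.32*Savings Account <= 1000 EUR Then Applicant = bad."""
    return 0.84 * income + 0.32 * savings <= 1000

def m_of_n_rule(conditions, m, mode="at least"):
    """Fire if {at least/exactly/at most} m of the given boolean conditions hold."""
    satisfied = sum(bool(c) for c in conditions)
    if mode == "at least":
        return satisfied >= m
    if mode == "exactly":
        return satisfied == m
    if mode == "at most":
        return satisfied <= m
    raise ValueError(f"unknown mode: {mode}")

def trapezoidal(x, a, b, c, d):
    """Membership degree of x: 0 outside (a, d), 1 on [b, c], linear on the slopes."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Degree to which "term is trapezoidal(19.2, 31.9, 70.2, 81.4)" holds for a 36-month term.
print(trapezoidal(36, 19.2, 31.9, 70.2, 81.4))
```

The crisp rules return True or False, whereas the fuzzy rule returns a degree in [0, 1]. Fuzzy rules trade some simplicity for a smoother description of attributes such as loan term.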

Developing Comprehensible Credit Scoring Systems (contd.)

Survival Analysis for Credit Scoring
- Predict when customers default, not only whether they default
- Implications for profit scoring and debt provisioning
- Censored data: for many customers, no default has been observed by the end of the observation period
- Statistical models for survival analysis, e.g., Kaplan-Meier, parametric models, proportional hazards
- Drawbacks: linear relationships, no interaction effects, proportional hazards assumption
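
The Kaplan-Meier estimator shows how censored customers still contribute information. They count as "at risk" until they leave observation, but they never count as defaults. The sketch below uses hypothetical durations in months. The estimator uses no applicant characteristics, which is why regression-type models such as proportional hazards are needed for scoring individual customers.

```python
def kaplan_meier(durations, defaulted):
    """Kaplan-Meier estimate of S(t) = P(no default by time t).

    durations: months each customer was observed.
    defaulted: True if a default was observed, False if the customer was censored
               (e.g. still repaying when observation ended).
    Returns (time, survival) pairs at each observed default time.
    """
    event_times = sorted({t for t, d in zip(durations, defaulted) if d})
    survival, curve = 1.0, []
    for t in event_times:
        at_risk = sum(1 for u in durations if u >= t)  # still observed just before t
        defaults = sum(1 for u, d in zip(durations, defaulted) if u == t and d)
        survival *= 1.0 - defaults / at_risk
        curve.append((t, survival))
    return curve

durations = [3, 5, 5, 8, 10, 12]
defaulted = [True, False, True, True, False, False]
print(kaplan_meier(durations, defaulted))
```

The customer censored at month 5 is still in the risk set at month 5 (censoring is treated as happening just after defaults at the same time, the usual convention), but drops out afterwards.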

Survival Analysis for Credit Scoring (contd.)
Neural networks for survival analysis
Requirements:
- Monotonically decreasing survival curve
- Scalability
- Ability to handle censored data
Empirically tested for predicting default and early repayment, with comparisons against proportional hazards models
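
One common way to satisfy the monotonicity and censoring requirements is for the network to output one hazard per discrete time interval. The survival curve is then built as a running product. The sketch below assumes that design. It is not necessarily the architecture used in the thesis. The hazard logits stand in for hypothetical network outputs, and the network itself is omitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def survival_curve(hazard_logits):
    """Turn per-interval hazard logits (e.g. network outputs) into S(t_k) = prod_{j<=k} (1 - h_j)."""
    s, curve = 1.0, []
    for z in hazard_logits:
        h = sigmoid(z)  # hazard in (0, 1): P(default in interval | no default so far)
        s *= 1.0 - h    # each factor lies in (0, 1), so S can only decrease
        curve.append(s)
    return curve

def censored_log_likelihood(hazard_logits, last_interval, defaulted):
    """Log-likelihood for one customer observed up to and including last_interval.

    Intervals before last_interval contribute log(1 - h). The last interval contributes
    log(h) if a default was observed there, and log(1 - h) if the customer was censored.
    """
    ll = sum(math.log(1.0 - sigmoid(z)) for z in hazard_logits[:last_interval])
    h_last = sigmoid(hazard_logits[last_interval])
    return ll + (math.log(h_last) if defaulted else math.log(1.0 - h_last))

print(survival_curve([-2.0, -1.5, -1.0, -0.5]))
```

Because monotonicity follows from the product form rather than from constraints on the weights, any network that emits hazards in (0, 1) yields a valid, non-increasing survival curve.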

Conclusions
Developing accurate credit scoring systems:
- Flat maximum effect
- Superiority of non-linear classifiers
- Satisfactory performance of linear classifiers
Developing comprehensible credit scoring systems:
- Neural network rule extraction
- Decision tables
- Fuzzy rule extraction
Neural network survival analysis

Future Research
- Indirect credit scoring
- Knowledge fusion
- Behavioral credit scoring
- Extensions to other contexts and problem domains