Assessment of Model Development Techniques and Evaluation Methods for Binary Classification in the Credit Industry. DSI Conference. Jennifer Lewis Priestley.


Assessment of Model Development Techniques and Evaluation Methods for Binary Classification in the Credit Industry. DSI Conference, November 24, 2003. Jennifer Lewis Priestley and Satish Nargundkar.

Paper Research Questions
This paper addresses the following two research questions:
1. Does model development technique improve classification accuracy?
2. How will model selection vary based upon the evaluation method used?

Discussion Outline
Discussion of Modeling Techniques
Discussion of Model Evaluation Methods: Global Classification Rate, Loss Function, K-S Test, ROC Curves
Empirical Example

Model Development Techniques
Modeling plays an increasingly important role in CRM strategies across the customer lifecycle (product planning, customer acquisition, customer management, collections/recovery):
Target Marketing: Response Models, Risk Models
Customer Behavioral Models: Usage Models, Attrition Models, Activation Models
Collections: Recovery Models
Other Models: Segmentation Models, Bankruptcy Models, Fraud Models

Model Development Techniques
Given that even minimal improvements in model classification accuracy can translate into significant savings or incremental revenue, an entire literature exists on the comparison of model development techniques (e.g., Atiya, 2001; Reichert et al., 1983; West, 2000; Vellido et al., 1993; Zhang et al., 1999).
Statistical Techniques: Linear Discriminant Analysis, Logistic Analysis, Multiple Regression Analysis
Non-Statistical Techniques: Neural Networks, Cluster Analysis, Decision Trees

Model Evaluation Methods But, developing the model is really only half the problem. How do you then determine which model is “best”?

Model Evaluation Methods
In the context of binary classification (one of the most common objectives in CRM modeling), one of four outcomes is possible:
1. True positive
2. False positive
3. True negative
4. False negative

             True Good   True Bad
Pred. Good   TP          FP
Pred. Bad    FN          TN
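These four outcomes can be tallied directly from paired actual and predicted labels. A minimal sketch (the function name, and treating "good" as the positive class, are illustrative choices, not from the paper):

```python
def confusion_counts(actual, predicted, positive="good"):
    """Tally the four binary-classification outcomes.

    'positive' names the class treated as the positive label;
    here "good" accounts are illustratively taken as positive.
    """
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1   # true positive: predicted good, truly good
            else:
                fp += 1   # false positive: predicted good, truly bad
        else:
            if a == positive:
                fn += 1   # false negative: predicted bad, truly good
            else:
                tn += 1   # true negative: predicted bad, truly bad
    return tp, fp, tn, fn
```

For example, `confusion_counts(["good", "good", "bad"], ["good", "bad", "bad"])` returns `(1, 0, 1, 1)`.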

Model Evaluation Methods
If all of these outcomes, specifically the errors, have the same associated costs, then a simple global classification rate is a highly appropriate evaluation method: the proportion of all cases, good and bad, that the model classifies correctly. In the slide's example, 750 of 1,000 applicants were classified correctly: Classification Rate = 75% (750/1000).
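The rate itself is just correct classifications over all cases. A sketch reproducing the 75% figure (the split of the 750 correct cases into cells below is hypothetical; only the 75%-of-1,000 total is from the slide):

```python
def global_classification_rate(tp, fp, tn, fn):
    # Correct classifications (TP + TN) over all cases.
    return (tp + tn) / (tp + fp + tn + fn)

# Hypothetical cell counts that reproduce the slide's 75% rate:
# 600 goods and 150 bads classified correctly out of 1,000 cases.
rate = global_classification_rate(tp=600, fp=100, tn=150, fn=150)
print(rate)  # 0.75
```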

Model Evaluation Methods
The global classification method is the most commonly used (Bernardi and Zhang, 1999), but it fails when the costs of the misclassification errors differ (Type I vs. Type II errors):
Model 1 results: Global Classification Rate = 75%; False Positive Rate = 5%; False Negative Rate = 20%
Model 2 results: Global Classification Rate = 80%; False Positive Rate = 15%; False Negative Rate = 5%
What if the cost of a false positive were great and the cost of a false negative negligible? What if it were the other way around?

If the misclassification error costs are understood with some certainty, a loss function can be used to select the best model:

Loss = π₀f₀c₀ + π₁f₁c₁

where πᵢ is the prior probability that an element comes from class i, fᵢ is the probability that an element from class i will be misclassified, and cᵢ is the cost associated with that misclassification error.
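A sketch of this loss function, applied to the Model 1 and Model 2 error rates quoted two slides back (the equal priors and the 10:1 cost ratios below are hypothetical, chosen only to show how the ranking flips):

```python
def expected_loss(pi0, f0, c0, pi1, f1, c1):
    # Loss = pi0*f0*c0 + pi1*f1*c1
    return pi0 * f0 * c0 + pi1 * f1 * c1

# Error rates from the slide; equal priors assumed for illustration.
m1_fp, m1_fn = 0.05, 0.20   # Model 1
m2_fp, m2_fn = 0.15, 0.05   # Model 2

# If a false positive costs 10x a false negative, Model 1 wins:
print(expected_loss(0.5, m1_fp, 10, 0.5, m1_fn, 1))   # ~0.35
print(expected_loss(0.5, m2_fp, 10, 0.5, m2_fn, 1))   # ~0.775

# If a false negative costs 10x a false positive, Model 2 wins:
print(expected_loss(0.5, m1_fp, 1, 0.5, m1_fn, 10))   # ~1.025
print(expected_loss(0.5, m2_fp, 1, 0.5, m2_fn, 10))   # ~0.325
```

The same two models trade places depending on the cost assumptions, which is exactly the weakness of the global classification rate.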

Model Evaluation Methods
An evaluation method that uses the same conceptual foundation as the global classification rate is the Kolmogorov-Smirnov (K-S) Test: the greatest separation between the cumulative score distributions of the two classes. In the example, the greatest separation occurs at a cutoff score of .65.

Model Evaluation Methods What if you don’t have ANY information regarding misclassification error costs…or…the costs are in the eye of the beholder?

Model Evaluation Methods
The area under the ROC (Receiver Operating Characteristic) Curve, θ, accounts for all possible outcomes (Swets et al., 2000; Thomas et al., 2002; Hanley and McNeil, 1982, 1983). The curve plots Sensitivity (the True Positive rate) against 1-Specificity (the False Positive rate). θ = .5 indicates a model with no discriminating power, θ = 1 a perfect model, and real models typically fall in .5 < θ < 1.
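θ also equals the probability that a randomly chosen "good" outscores a randomly chosen "bad" (Hanley and McNeil, 1982), which gives a direct, if O(n²), way to compute it without tracing the curve:

```python
def auc_theta(good_scores, bad_scores):
    # Count the good/bad pairs where the good outscores the bad,
    # giving ties half credit; theta = .5 is chance, 1 is perfect.
    wins = 0.0
    for g in good_scores:
        for b in bad_scores:
            if g > b:
                wins += 1.0
            elif g == b:
                wins += 0.5
    return wins / (len(good_scores) * len(bad_scores))
```

For example, `auc_theta([0.9, 0.8], [0.1, 0.8])` gives 0.875: three of the four pairs are correctly ordered and one is tied.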

Empirical Example
So, given this background, the guiding questions of our research were:
1. Does model development technique impact prediction accuracy?
2. How will model selection vary with the evaluation method used?

Empirical Example
We elected to evaluate these questions using a large data set from a pool of car loan applicants. The data set included:
14,042 US applicants for car loans between June 1, 1998 and June 30. Of these applicants, 9,442 were considered to have been "good" and 4,600 were considered to be "bad" as of December 31, 1999.
65 variables, split into two groups:
Transaction variables (miles on the vehicle, selling price, age of vehicle, etc.)
Applicant variables (bankruptcies, balances on other loans, number of revolving trades, etc.)

Empirical Example
The LDA and Logistic models were developed using SAS 8.2, while the Neural Network models were developed using Backpack® 4.0. Because there are no accepted guidelines for the number of hidden nodes in Neural Network development (Zhang et al., 1999; Chen and Huang, 2003), we tested a range of hidden nodes from 5 to 50.

Empirical Example
Feed-Forward Back-Propagation Neural Networks: input values pass from the Input Layer through the Hidden Layer to the Output Layer. At each node, a Combination Function combines all incoming inputs into a single value, usually as a weighted summation (Σ), and a Transfer Function calculates the node's output value from the result of the combination function.
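A single node of such a network can be sketched in a few lines (the logistic sigmoid as transfer function is a common choice we assume here; Backpack's internals may differ):

```python
import math

def neuron_output(inputs, weights, bias):
    # Combination function: weighted summation of all inputs.
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Transfer function: logistic sigmoid maps the sum into (0, 1).
    return 1.0 / (1.0 + math.exp(-s))

# With zero weights and bias the sigmoid sits at its midpoint:
print(neuron_output([1.0, 0.5], [0.0, 0.0], 0.0))  # 0.5
```

A full feed-forward network simply chains these nodes layer by layer; back-propagation adjusts the weights to reduce classification error on the training data.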

Empirical Example - Results

Technique            Class Rate  Class Rate  Class Rate   Theta    K-S Test
                     "Goods"     "Bads"      "Global"
LDA                  73.91%      43.40%      59.74%       68.98%   19%
Logistic             70.54%      59.64%      69.45%       68.00%   24%
NN-5 Hidden Nodes    63.50%      56.50%      58.88%       63.59%   38%
NN-10 Hidden Nodes   75.40%      44.50%      55.07%       64.46%   11%
NN-15 Hidden Nodes   60.10%      62.10%      61.40%       65.89%   24%
NN-20 Hidden Nodes   62.70%      59.00%      60.29%       65.27%   24%
NN-25 Hidden Nodes   76.60%      41.90%      53.78%       63.55%   16%
NN-30 Hidden Nodes   52.70%      68.50%      63.13%       65.74%   22%
NN-35 Hidden Nodes   60.30%      59.00%      59.46%       63.30%   22%
NN-40 Hidden Nodes   62.40%      58.30%      59.71%       64.47%   17%
NN-45 Hidden Nodes   54.10%      65.20%      61.40%       64.50%   31%
NN-50 Hidden Nodes   53.20%      68.50%      63.27%       65.15%   37%

Conclusions
What were we able to demonstrate?
1. The "best" model depends upon the evaluation method selected;
2. The appropriate evaluation method depends upon situational and data context;
3. No multivariate technique is "best" under all circumstances.