1 2 Statistical methods for scorecard development
2.1 Methodologies used in credit granting
Judgmental evaluation
 – 5C's: character, capital, collateral, capacity, condition
Statistical methods
 – Discriminant analysis / linear regression
 – Logistic regression
 – Classification trees
 – Nearest neighbour methods
Operational Research methods
 – Linear programming
 – Goal programming
Heuristic methods
 – Neural network algorithms
 – Support vector machines
 – Genetic algorithms

2 2.1 Approach to scorecard development in all non-judgmental methodologies
Take a subset of previous applicants as a sample.
For each applicant in the sample, classify the subsequent credit history as
 – acceptable (good)
 – not acceptable (bad: miss 3 consecutive months of payments)
 – "indeterminate": these are ignored in the subsequent analysis
Need to divide the set of possible answers to the application form questions, A, into two:
 – A_B: answers given by those who were bad
 – A_G: answers given by those who were good
Accept those who gave answers in A_G; reject those in A_B.

3 [Figure: two classifiers fitted to the same applicant data]
A straight-line rule of the form age + a(income) = b is not perfect, but has only two parameters.
A more flexible boundary is a better classifier, but has lots more parameters.

4 2.2

5 2.3 Linear regression: Fisher discriminant approach
If X is the random vector of answers to the application form characteristics,
m_G = E(X|G); m_B = E(X|B); S = E{(X - m_G)(X - m_G)^T} is the common covariance matrix for the G (and B) populations.
Discriminant analysis asks what linear combination of the X_i best separates goods from bads.
Let Y = w_1 X_1 + ... + w_p X_p, so w^T m_G = E(Y|G); w^T m_B = E(Y|B); w^T S w = Var{Y}.
Choose w to maximise (distance between the means of Y)^2 / Var{Y} = (w^T(m_G - m_B))^2 / w^T S w.
This is maximised by the linear discriminant function w^T = (m_G - m_B)^T S^{-1}, i.e. w = S^{-1}(m_G - m_B) (same as the Bayes rule).
The midpoint between the two means is c = 0.5 w^T(m_G + m_B). Classify as good if w^T X ≥ c.
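A minimal sketch of this calculation in Python (numpy assumed; the input arrays X_good and X_bad of sampled application characteristics are illustrative, not part of the slides):

import numpy as np

def fisher_discriminant(X_good, X_bad):
    # X_good: (n_G, p) array of characteristics for goods; X_bad: (n_B, p) for bads
    m_G, m_B = X_good.mean(axis=0), X_bad.mean(axis=0)
    # pooled (common) covariance matrix S for the two populations
    n_G, n_B = len(X_good), len(X_bad)
    S = ((n_G - 1) * np.cov(X_good, rowvar=False)
         + (n_B - 1) * np.cov(X_bad, rowvar=False)) / (n_G + n_B - 2)
    w = np.linalg.solve(S, m_G - m_B)      # w = S^{-1}(m_G - m_B)
    c = 0.5 * w @ (m_G + m_B)              # midpoint cut-off between the two means
    return w, c

# classify applicant x as good if w @ x >= c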

6 2.4 Linear regression: regression line on probability
Discriminant analysis (LDF) is equivalent to linear regression when there are only two classification groups.
p_i = E{Y_i} = w_1 X_1 + ... + w_p X_p, where Y_i = 1 if the ith applicant is good and 0 if bad.
Again this turns out to be solved (up to scaling) by the same weights as the linear discriminant function above.
Since this is a regression, one can use the least squares approach, which means the coefficients w are calculated by an analytic expression.
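The same idea as a least-squares sketch (again only indicative: X is a matrix of applicant characteristics and y the 0/1 good/bad indicator, neither taken from the slides):

import numpy as np

def least_squares_scorecard(X, y):
    # fit p_i = w_0 + w_1 x_1 + ... + w_p x_p to the 0/1 indicator y by ordinary least squares
    X1 = np.column_stack([np.ones(len(X)), X])   # add an intercept column
    w, *_ = np.linalg.lstsq(X1, y, rcond=None)   # analytic least-squares solution
    return w                                     # score(x) = w[0] + w[1:] @ x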

7 2.5 Statistical tests one can use
Since this is multivariate regression, one can use its tests:
R^2: how much of the variation in the p_i is explained by w.x. This is the strength of the relationship.
Wilks' Λ (likelihood ratio test): tests whether m_B = m_G when the variances are the same.
t-test: checks whether the coefficient of a variable is non-zero (so whether the variable should be in the scorecard).
D^2: the sample Mahalanobis distance, the measure in the Fisher approach, (difference between the means)^2 / (variance of the populations). It has an F-distribution.
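For example, D^2 can be computed directly from the group means and pooled covariance of the earlier sketch (a minimal illustration, not part of the slides):

import numpy as np

def mahalanobis_D2(m_G, m_B, S):
    # D^2 = (m_G - m_B)^T S^{-1} (m_G - m_B), squared distance between the group means
    d = m_G - m_B
    return d @ np.linalg.solve(S, d)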

8 2.6

9 2.7

10 Health warning for regression approaches
· No underlying model, and so no a priori justification of the variables and weights
· Collinearity between application variables leads to unstable coefficients
· Qualitative variables (postcode, residential status) need to be translated into quantitative variables
For an N-valued qualitative variable use any of:
· N-1 dummy binary variables
· a location model with N discriminant functions
· modify the variable: for a variable with r attributes (values), if g_i is the number of goods in attribute i and b_i the number of bads, let the value of attribute i be w_i = ln[(g_i / Σ_j g_j) / (b_i / Σ_j b_j)] (sometimes called weights of evidence)
This approach of categorising variables is also used for quantitative variables, because of their lack of linearity.
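A short weights-of-evidence sketch in Python (pandas assumed; the data frame and column names are purely illustrative):

import numpy as np
import pandas as pd

def weights_of_evidence(df, attribute, target):
    # target column is 1 for good, 0 for bad; attribute is the qualitative variable
    goods = df[df[target] == 1].groupby(attribute).size()
    bads = df[df[target] == 0].groupby(attribute).size()
    # w_i = ln( share of goods in attribute i / share of bads in attribute i );
    # attributes with no goods or no bads would need smoothing before taking logs
    return np.log((goods / goods.sum()) / (bads / bads.sum()))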

11 Default risk with age

12 All variables are categorical
Since risk is not linear in the continuous variables, make these variables categorical as well.
So age splits into bands: are you 18-21; 22-28; 29-36; 37-59; 60+?
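For instance, the age bands above could be produced with pandas (the band edges come from the slide; the data frame and column name are assumed for illustration):

import pandas as pd

# coarse-classify age into the bands 18-21, 22-28, 29-36, 37-59, 60+
edges = [18, 22, 29, 37, 60, 120]
labels = ["18-21", "22-28", "29-36", "37-59", "60+"]
df["age_band"] = pd.cut(df["age"], bins=edges, labels=labels, right=False)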

13 2.8 Methods which group rather than score
Methods like classification trees, expert systems and neural nets end up with "scorecards" that classify applicants into groups rather than give a scorecard which adds up the score for each answer.
The main approach is the classification tree. It was developed at the same time in statistics and computer science, so it is also called the recursive partitioning algorithm.
Split A, the set of answers, into two subsets, depending on the answer to one question, so that the two subsets are very different.
Take each subset and repeat the process until one decides to stop.
Each terminal node is classified as in A_G or A_B.
A classification tree depends on
 – Splitting rule: how to choose the best daughter subsets
 – Stopping rule: when one decides this is a terminal node
 – Assigning rule: which category for each terminal node
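A minimal recursive-partitioning sketch using scikit-learn's tree implementation (the slides do not prescribe a library; X, y and the 1% stopping threshold are assumptions here):

from sklearn.tree import DecisionTreeClassifier

# grow a tree on the application characteristics X and the good/bad labels y,
# refusing to create terminal nodes holding less than 1% of the sample
tree = DecisionTreeClassifier(criterion="gini",
                              min_samples_leaf=max(1, int(0.01 * len(X))),
                              random_state=0)
tree.fit(X, y)
predicted = tree.predict(X)   # class assigned by the terminal node each applicant falls into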

14 Classification tree: credit risk example

15 Rules in classification trees
Assigning rule
 – Normally assign to the class which is the largest in that node. Sometimes, if D is the default cost and L the lost profit, assign to good if the good/bad ratio > D/L.
Stopping rule
 – Stop either if the subset is too small (say <1% of the population)
 – or if the difference between the daughter subsets is too small (under the splitting rule)
 – Really it is a stopping and pruning rule, as one always has to cut back some of the nodes. Do this by using a second sample (not used in building the tree).
Splitting rules
 – Kolmogorov-Smirnov, impurity indices (basic, Gini), chi-square

16 Splitting rules: Kolmogorov-Smirnov
Residential status:  Owner  Tenant  With parents
No. of goods:        1020   400     80
No. of bads:         180    200     120
Good:bad odds:       5.6:1  2:1     0.67:1
Think of the daughter nodes as L (left) and R (right). p(L|B) is the proportion of the bads in the original set who fall in the left daughter (p(L|G) similarly).
Kolmogorov-Smirnov: maximise |p(L|B) - p(L|G)|
L = parents; R = owner+tenant: p(L|B) = 120/500; p(L|G) = 80/1500; KS = |(120/500) - (80/1500)| = .187
L = parents+tenant; R = owner: p(L|B) = 320/500; p(L|G) = 480/1500; KS = |(320/500) - (480/1500)| = .32
Choose the parents+tenant; owner split.
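The KS calculation on this table, as a quick check in Python (the counts are the ones implied by the proportions and odds on the slide):

# goods and bads by residential status
goods = {"owner": 1020, "tenant": 400, "parents": 80}
bads  = {"owner": 180,  "tenant": 200, "parents": 120}
G, B = sum(goods.values()), sum(bads.values())            # 1500 goods, 500 bads

def ks(left):
    # |p(L|B) - p(L|G)| for the categories put in the left daughter
    return abs(sum(bads[c] for c in left) / B - sum(goods[c] for c in left) / G)

print(round(ks({"parents"}), 3))             # 0.187 -> parents | owner+tenant
print(round(ks({"parents", "tenant"}), 3))   # 0.32  -> parents+tenant | owner (chosen)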

17 Basic impurity index
i(v) is the impurity of node v, so I = i(v) - p(L)i(L) - p(R)i(R) is the decrease in impurity. Want to maximise this (or minimise p(L)i(L) + p(R)i(R)). Here i(v) = min(p(G|v), p(B|v)).
L = parents; R = owner+tenant: i(v) = .25, p(L) = .1, i(L) = .4, p(R) = .9, i(R) = .21, I = .02
L = parents+tenant; R = owner: i(v) = .25, p(L) = .4, i(L) = .4, p(R) = .6, i(R) = .15, I = 0
Choose parents; owner+tenant. (N.B. I = 0 because the same group is in the minority in v, L and R.)
Gini index
i(v) = p(G|v)p(B|v), so maximise G = p(G|v)p(B|v) - p(L)p(G|L)p(B|L) - p(R)p(G|R)p(B|R)
L = parents; R = owner+tenant: i(v) = .1875, p(L) = .1, i(L) = .24, p(R) = .9, i(R) = .166, G = .014
L = parents+tenant; R = owner: i(v) = .1875, p(L) = .4, i(L) = .24, p(R) = .6, i(R) = .1275, G = .015
Choose parents+tenant; owner.
Chi-square (look for large values)
If n(L), n(R) are the numbers in the L and R subsets, Chi = n(L)n(R)(p(G|L) - p(G|R))^2 / (n(L) + n(R))
L = parents; R = owner+tenant: n(L) = 200, p(G|L) = .4, n(R) = 1800, p(G|R) = .789, Chi = 27.2
L = parents+tenant; R = owner: n(L) = 800, p(G|L) = .6, n(R) = 1200, p(G|R) = .85, Chi = 30.0
Choose parents+tenant; owner.
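The other three criteria run on the same table (continuing from the goods/bads dictionaries and totals in the previous sketch; values rounded):

def split_criteria(left):
    g_l = sum(goods[c] for c in left); b_l = sum(bads[c] for c in left)
    g_r, b_r = G - g_l, B - b_l
    n_l, n_r, n = g_l + b_l, g_r + b_r, G + B
    pG, pG_l, pG_r = G / n, g_l / n_l, g_r / n_r
    # decrease in basic impurity, decrease in Gini impurity, chi-square statistic
    basic = min(pG, 1 - pG) - (n_l / n) * min(pG_l, 1 - pG_l) - (n_r / n) * min(pG_r, 1 - pG_r)
    gini = pG * (1 - pG) - (n_l / n) * pG_l * (1 - pG_l) - (n_r / n) * pG_r * (1 - pG_r)
    chi = n_l * n_r * (pG_l - pG_r) ** 2 / n
    return basic, gini, chi

print(split_criteria({"parents"}))             # about (0.020, 0.014, 27.2)
print(split_criteria({"parents", "tenant"}))   # about (0.000, 0.015, 30.0)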

18 2.9