2 Statistical methods for scorecard development

2.1 Methodologies used in credit granting
Judgmental evaluation
- the 5 Cs: character, capital, collateral, capacity, condition
Statistical methods
- Discriminant analysis / linear regression
- Logistic regression
- Classification trees
- Nearest neighbour methods
Operational Research methods
- Linear programming
- Goal programming
Heuristic methods
- Neural network algorithms
- Support vector machines
- Genetic algorithms
2.1 Approach to scorecard development in all non-judgmental methodologies
Take a subset of previous applicants as a sample.
For each applicant in the sample, classify the subsequent credit history as
- acceptable (good)
- not acceptable (bad: missed 3 consecutive months of payments)
- "indeterminate", ignored in the subsequent analysis
Need to divide the set of possible answers to the application form questions, A, into two:
- A_B: answers given by those who were bad
- A_G: answers given by those who were good
Accept those who gave answers in A_G; reject those in A_B.
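A minimal sketch of this labelling step, assuming each applicant's history is summarised as a monthly count of consecutive missed payments; the 3-month rule follows the slide, while treating 1-2 missed months as indeterminate is an assumption of the sketch:

```python
# A minimal sketch of the good/bad/indeterminate labelling of the sample.
def classify_history(months_in_arrears):
    """months_in_arrears: consecutive-missed-payment count for each month on the books."""
    worst = max(months_in_arrears, default=0)
    if worst >= 3:
        return "bad"            # missed 3 consecutive months of payments
    elif worst == 0:
        return "good"           # acceptable history
    else:
        return "indeterminate"  # ignored in the subsequent analysis (assumed band)
```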
Figure: applicants plotted by age and income. A linear boundary, age + a(income) = b, is not a perfect classifier but has only two parameters; a more complex boundary is a better classifier but has lots more parameters.
2.3 Linear regression: Fisher discriminant approach
If X is the random vector of answers to the application form characteristics, let
m_G = Exp(X|G); m_B = Exp(X|B); S = Exp{(X - m_G)(X - m_G)^T}, the covariance matrix for the G (and B) populations.
Discriminant analysis asks which linear combination of the X_i best separates the goods from the bads.
Let Y = w_1 X_1 + ... + w_p X_p = w^T X.
So w^T m_G = Exp(Y|G); w^T m_B = Exp(Y|B); w^T S w = Var{Y}.
Choose w to maximise
(distance between the means of Y)^2 / Var{Y} = (w^T (m_G - m_B))^2 / (w^T S w).
This is maximised by the linear discriminant function w^T = (m_G - m_B)^T S^-1, i.e. w = S^-1 (m_G - m_B) (the same as the Bayes rule).
The midpoint between the two means is c = 0.5 w^T (m_G + m_B). Classify as good if w^T X > c.
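A minimal NumPy sketch of this calculation, assuming a hypothetical sample matrix X (one row per applicant, one column per characteristic) and a 0/1 good/bad label y; the function name is ours, not part of the slides:

```python
# Fisher discriminant sketch: w = S^-1 (m_G - m_B) and cutoff c.
import numpy as np

def fisher_discriminant(X, y):
    goods, bads = X[y == 1], X[y == 0]
    m_G, m_B = goods.mean(axis=0), bads.mean(axis=0)
    # Pooled covariance matrix S, assumed common to the G and B populations
    S = np.cov(np.vstack([goods - m_G, bads - m_B]), rowvar=False)
    w = np.linalg.solve(S, m_G - m_B)      # w = S^-1 (m_G - m_B)
    c = 0.5 * w @ (m_G + m_B)              # midpoint between the two means
    return w, c

# Classify a new applicant x as good if w @ x > c.
```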
2.4 Linear regression: regression line on probability
Discriminant analysis (LDF) is equivalent to linear regression when there are only two classification groups.
p_i = Exp{Y_i} = w_1 X_1 + ... + w_p X_p, where Y_i = 1 if the i-th applicant is good and 0 if bad.
Again it turns out this is solved by a w proportional to S^-1 (m_G - m_B), the same discriminant direction as before.
Since this is a regression, one can use the least squares approach, which means the coefficients w are given by an analytic expression (the normal equations).
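A matching least-squares sketch under the same hypothetical X and y; the intercept handling and function name are illustrative:

```python
# Ordinary least squares fit of the 0/1 good/bad indicator on the characteristics.
import numpy as np

def lsq_scorecard(X, y):
    A = np.hstack([np.ones((X.shape[0], 1)), X])     # add an intercept column
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)   # normal-equations solution
    return coeffs                                     # [w0, w1, ..., wp]

# The slope part coeffs[1:] points in the same direction as the Fisher
# weights S^-1 (m_G - m_B), up to a scale factor.
```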
2.5 Statistical tests one can use
Since this is a multivariate regression, one can use its standard tests:
- R^2: how much of the variation in the p_i is explained by w.x. This measures the strength of the relationship.
- Wilks' lambda (likelihood ratio test): tests whether m_B = m_G when the covariance matrices are the same.
- t-test: checks whether the coefficient of a variable is non-zero (so whether the variable should be in the scorecard).
- D^2: the sample Mahalanobis distance, the measure in the Fisher approach, (difference between the means)^2 / (variance of the populations). After rescaling it has an F-distribution.
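For example, D^2 can be computed directly from the same hypothetical X and y used above; this is only a sketch of the distance itself, not of the full F-test:

```python
# Sample Mahalanobis distance D^2 = (m_G - m_B)^T S^-1 (m_G - m_B).
import numpy as np

def mahalanobis_d2(X, y):
    goods, bads = X[y == 1], X[y == 0]
    m_G, m_B = goods.mean(axis=0), bads.mean(axis=0)
    S = np.cov(np.vstack([goods - m_G, bads - m_B]), rowvar=False)
    diff = m_G - m_B
    return diff @ np.linalg.solve(S, diff)
```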
Health warning for regression approaches
- No underlying model, and so no a priori justification of the variables and weights.
- Collinearity between application variables leads to unstable coefficients.
- Qualitative variables (postcode, residential status) need to be translated into quantitative variables.
For an N-valued qualitative variable use any of:
- N-1 dummy binary variables
- a location model with N discriminant functions
- replace the variable's r attributes (values) by numbers: if g_i is the number of goods in attribute i and b_i the number of bads, let the value w_i of attribute i be w_i = ln((g_i/g)/(b_i/b)), where g and b are the total numbers of goods and bads (sometimes called weights of evidence).
This approach of categorising variables is also used for quantitative variables, because their relationship with risk is not linear.
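A minimal weights-of-evidence sketch, using the residential-status counts from the classification-tree slides below; the helper name is ours:

```python
# Weights of evidence w_i = ln((g_i/g) / (b_i/b)) for each attribute of a variable.
import math

def weights_of_evidence(goods, bads):
    """goods, bads: dicts mapping attribute -> number of goods / bads."""
    g, b = sum(goods.values()), sum(bads.values())
    return {a: math.log((goods[a] / g) / (bads[a] / b)) for a in goods}

goods = {"owner": 1020, "tenant": 400, "with parents": 80}
bads  = {"owner": 180,  "tenant": 200, "with parents": 120}
print(weights_of_evidence(goods, bads))
# owner ~ +0.64, tenant ~ -0.41, with parents ~ -1.50
```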
Figure: default risk plotted against age (the relationship is not linear).
All variables are categorical
Since risk is not linear in the continuous variables, make these variables categorical as well.
So age splits into bands: are you 18-21, 22-28, 29-36, 37-59, or 60+?
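A minimal sketch of this banding with pandas.cut; the band edges follow the slide, the example ages are made up:

```python
# Turn continuous age into the categorical bands used on the slide.
import pandas as pd

ages = pd.Series([19, 25, 33, 45, 67])
bands = pd.cut(ages, bins=[17, 21, 28, 36, 59, 120],
               labels=["18-21", "22-28", "29-36", "37-59", "60+"])
print(bands.tolist())   # ['18-21', '22-28', '29-36', '37-59', '60+']
```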
Methods which group rather than score
Methods like classification trees, expert systems and neural nets end up with "scorecards" that classify applicants into groups, rather than a scorecard which adds up a score for each answer.
The main approach is the classification tree. It was developed at the same time in statistics and in computer science, so it is also called the recursive partitioning algorithm.
- Split A, the set of answers, into two subsets, depending on the answer to one question, so that the two subsets are very different.
- Take each subset and repeat the process until one decides to stop.
- Each terminal node is classified as in A_G or A_B.
A classification tree depends on
- Splitting rule: how to choose the best daughter subsets
- Stopping rule: when one decides a node is terminal
- Assigning rule: which category each terminal node gets
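In practice such a tree can be grown with an off-the-shelf recursive-partitioning routine; a minimal scikit-learn sketch, with made-up data and illustrative parameter choices:

```python
# Recursive partitioning sketch using scikit-learn's DecisionTreeClassifier.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# X: one row per past applicant (characteristics), y: 1 = good, 0 = bad
X = np.random.rand(1000, 5)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

tree = DecisionTreeClassifier(
    criterion="gini",      # splitting rule (Gini index, see the later slide)
    min_samples_leaf=10,   # stopping rule: do not create tiny subsets
    max_depth=4,           # stopping rule: limit the number of splits
)
tree.fit(X, y)
# Assigning rule: predict() labels each terminal node by its majority class
print(tree.predict(X[:5]))
```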
Figure: classification tree, credit risk example.
Rules in classification trees
Assigning rule
- Normally assign a node to the class which is the largest in that node. Sometimes, if D is the default cost and L the lost profit, assign to good if the good:bad ratio > D/L.
Stopping rule
- Stop either if the subset is too small (say < 1% of the population)
- or if the difference between the daughter subsets is too small (under the splitting rule).
Really it is a stopping and pruning rule, as one always has to cut back some of the nodes. Do this by using a second sample (not used in building the tree).
Splitting rules
- KS, impurity index, chi-square
Splitting rules: Kolmogorov-Smirnov
Think of the daughters as L (left) and R (right). p(L|B) is the proportion of the bads in the original set who are in the left daughter (p(L|G) similarly).

Residential status   Owner   Tenant   With parents
No. of goods         1020    400      80
No. of bads          180     200      120
Good:bad odds        5.6:1   2:1      0.67:1

Maximise KS = |p(L|B) - p(L|G)|
L = parents; R = owner + tenant: p(L|B) = 120/500; p(L|G) = 80/1500; KS = |(120/500) - (80/1500)| = .187
L = parents + tenant; R = owner: p(L|B) = 320/500; p(L|G) = 480/1500; KS = |(320/500) - (480/1500)| = .32
Choose the parents + tenant vs owner split.
Basic impurity index
i(v) is the impurity of node v, so I = i(v) - p(L)i(L) - p(R)i(R) is the decrease in impurity. Want to maximise this (or minimise p(L)i(L) + p(R)i(R)).
i(v) = min(p(G|v), p(B|v))
L = parents; R = owner + tenant: i(v) = .25, p(L) = .1, i(L) = .4, p(R) = .9, i(R) = .21, I = .02
L = parents + tenant; R = owner: i(v) = .25, p(L) = .4, i(L) = .4, p(R) = .6, i(R) = .15, I = 0
Choose the parents vs owner + tenant split. (N.B. I = 0 because the same class is in the minority in v, L and R.)

Gini index
i(v) = p(G|v)p(B|v), so maximise G = p(G|v)p(B|v) - p(L)p(G|L)p(B|L) - p(R)p(G|R)p(B|R)
L = parents; R = owner + tenant: i(v) = .1875, p(L) = .1, i(L) = .24, p(R) = .9, i(R) = .166, G = .014
L = parents + tenant; R = owner: i(v) = .1875, p(L) = .4, i(L) = .24, p(R) = .6, i(R) = .1275, G = .015
Choose the parents + tenant vs owner split.

Chi-square (look for large values)
If n(L), n(R) are the numbers in the L and R subsets,
Chi = n(L)n(R)(p(G|L) - p(G|R))^2 / (n(L) + n(R))
L = parents; R = owner + tenant: n(L) = 200, p(G|L) = .4, n(R) = 1800, p(G|R) = .789, Chi = 27.2
L = parents + tenant; R = owner: n(L) = 800, p(G|L) = .6, n(R) = 1200, p(G|R) = .85, Chi = 30
Choose the parents + tenant vs owner split.
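A short sketch reproducing the splitting-rule arithmetic of the last two slides (KS, basic impurity, Gini, chi-square) from the residential-status counts; the function name is ours:

```python
# Splitting-rule measures for a two-way split, from the good/bad counts in
# each daughter: L has (gL, bL), R has (gR, bR).
def split_measures(gL, bL, gR, bR):
    G, B, n = gL + gR, bL + bR, gL + bL + gR + bR
    nL, nR = gL + bL, gR + bR
    ks   = abs(bL / B - gL / G)                         # Kolmogorov-Smirnov
    imp  = lambda g, b: min(g, b) / (g + b)             # basic impurity
    gini = lambda g, b: (g / (g + b)) * (b / (g + b))   # Gini impurity
    dI   = imp(G, B) - nL / n * imp(gL, bL) - nR / n * imp(gR, bR)
    dG   = gini(G, B) - nL / n * gini(gL, bL) - nR / n * gini(gR, bR)
    chi  = nL * nR * (gL / nL - gR / nR) ** 2 / (nL + nR)
    return ks, dI, dG, chi

# L = with parents, R = owner + tenant
print(split_measures(80, 120, 1420, 380))   # ~ (0.187, 0.02, 0.014, 27.2)
# L = with parents + tenant, R = owner
print(split_measures(480, 320, 1020, 180))  # ~ (0.32, 0.0, 0.015, 30.0)
```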