Advanced statistical methods for credit risk modeling in practice

Name: Advanced statistical methods for credit risk modeling in practice
Uploaded: 2017-12-28T20:23:33+00:00
Duration: PTM15S10
Channel: James Bryant
Description: Advanced statistical methods for credit risk modeling in practice

Advanced statistical methods for credit risk modeling in practice
Kádár Ferenc, OTP Bank, Analysis and Modeling Department

Modeling what? And why?
Types of risk in a bank: market risk, operational risk, credit risk.
How can we measure credit risk? Expected loss of lending: Risk cost = PD·LGD·EAD
Risk parameters:
  PD – Probability of Default, in (0, 1)
  LGD – Loss Given Default, in (0, 1]
  EAD – Exposure at Default, in (0, ∞), generally limited
Default: 90+ day delinquency in 1 year; coded 0 or 1 (good client – bad client).
„Risk is not measurable (outside of casinos or the minds of people who call themselves ’risk experts’)”
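
As a small worked example of the expected-loss formula, the sketch below multiplies the three risk parameters for a single loan. The function name `risk_cost` and the numbers in the usage line are illustrative assumptions, not values from the presentation; the range checks simply mirror the intervals listed on the slide.

```python
def risk_cost(prob_default, lgd, ead):
    """Expected loss of one exposure: Risk cost = PD * LGD * EAD.

    The range checks follow the intervals on the slide:
    PD in (0, 1), LGD in (0, 1], EAD in (0, inf).
    """
    if not 0.0 < prob_default < 1.0:
        raise ValueError("PD must lie in (0, 1)")
    if not 0.0 < lgd <= 1.0:
        raise ValueError("LGD must lie in (0, 1]")
    if ead <= 0.0:
        raise ValueError("EAD must be positive")
    return prob_default * lgd * ead


# Hypothetical loan: 2% PD, 45% LGD, exposure of 1,000,000
expected_loss = risk_cost(0.02, 0.45, 1_000_000)
```

With these made-up inputs the expected loss is about 9,000 in the currency of the exposure.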

PD modeling (scorecards)
Scorecard types:
  Early default (Fraud) scorecards
  Behaviour, Collection scorecards
  Application scorecards for OTP clients
  Application scorecard for walk-in clients
Data used:
  socio-demographic and financial status, info on employment
  historical client data
  application data + behavior data of the client, network data
  application data + behavior data of the credit

CRISP-DM methodology
CRoss Industry Standard Process for Data Mining

Datasets used
Timeline labels: Modeling dataset, Validation dataset, Stability dataset; Observation period (one year in general); Now.
MODELING dataset is divided as follows:
  Training dataset
  Testing dataset – out-of-sample validation
  Filtered dataset
VALIDATION dataset serves as out-of-time validation („More recent database”).
STABILITY dataset is used for checking stability („Most recent database”).
„Time series analysis is similar to sending troops after the battle”
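
One way to picture the split is the sketch below: records are cut by date into modeling, validation (out-of-time) and stability sets, and the modeling set is then split at random into training and testing (out-of-sample) parts. The record layout (`obs_date` key), the cut-off dates and the 30% test share are assumptions made for illustration; the "Filtered dataset" is not covered because the slides do not say how it is built.

```python
import random
from datetime import date


def split_datasets(records, validation_start, stability_start,
                   test_share=0.3, seed=42):
    """Split records (dicts with a hypothetical 'obs_date' key) by time,
    then split the modeling part randomly into training and testing."""
    modeling = [r for r in records if r["obs_date"] < validation_start]
    validation = [r for r in records
                  if validation_start <= r["obs_date"] < stability_start]
    stability = [r for r in records if r["obs_date"] >= stability_start]

    # Out-of-sample split inside the modeling window
    shuffled = modeling[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_share)
    return {
        "training": shuffled[n_test:],
        "testing": shuffled[:n_test],
        "validation": validation,   # out-of-time, "more recent"
        "stability": stability,     # "most recent"
    }


# Toy data: one record per month, January 2015 to June 2016
records = [{"obs_date": date(2015, m, 1)} for m in range(1, 13)]
records += [{"obs_date": date(2016, m, 1)} for m in range(1, 7)]
parts = split_datasets(records, date(2015, 11, 1), date(2016, 3, 1))
```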

Data preparation
Missing value handling
Data transformation
Outliers
Correlated variables
Is our modeling method sensitive to them?

Variable transformation
Categories
Weight of Evidence: $\mathrm{WoE} = \ln\left(\frac{\text{Goods}\,\%}{\text{Bads}\,\%}\right)$

Grouping in SAS Enterprise Miner

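Information value
Per category: $\mathrm{WoE} = \ln(G/B)$ and $\mathrm{IV} = (G - B)\,\ln(G/B)$, where $G$ and $B$ are the category's shares of all goods (non-defaults) and all bads (defaults). The IV of the variable is the sum of the per-category terms.

Variable X | Default | Non-default | Total  | Default (%) B(ads) | Non-default (%) G(oods) | Total (%) | WoE    | IV
=No        | 8096    | 175821      | 183917 | 58.41%             | 88.30%                  | 86.36%    | 0.4133 | 0.1236
=Yes       | 5765    | 23286       | 29051  | 41.59%             | 11.70%                  | 13.64%    |        | 0.3793
Total      | 13861   | 199107      | 212968 |                    |                         | 100.00%   |        | 0.5029

Information Value | Predictive Power
< 0.02            | Useless for prediction
0.02 to 0.1       | Weak predictor
0.1 to 0.3        | Medium predictor
0.3 to 0.5        | Strong predictor
> 0.5             | Suspicious or too good to be true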
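
The WoE and IV figures for Variable X can be recomputed from the default and non-default counts. The sketch below is a minimal illustration; the function name `woe_iv` and the dictionary layout are assumptions, only the counts come from the slide.

```python
import math


def woe_iv(counts):
    """counts maps category -> (defaults, non_defaults).

    Returns ({category: (woe, iv_term)}, total_iv) using
    WoE = ln(G/B) and IV term = (G - B) * ln(G/B), where G and B are the
    category's shares of all non-defaults and all defaults.
    """
    total_bad = sum(bad for bad, _ in counts.values())
    total_good = sum(good for _, good in counts.values())
    rows = {}
    for category, (bad, good) in counts.items():
        b = bad / total_bad
        g = good / total_good
        woe = math.log(g / b)
        rows[category] = (woe, (g - b) * woe)
    total_iv = sum(term for _, term in rows.values())
    return rows, total_iv


# Counts for Variable X as shown on the slide
rows, total_iv = woe_iv({"No": (8096, 175821), "Yes": (5765, 23286)})
```

The recomputed values agree with the slide to four decimals (WoE of 0.4133 for "No", IV terms of 0.1236 and 0.3793, total 0.5029), which puts this variable just above the "suspicious or too good to be true" threshold.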

Variable selection
Variable strength
Correlation matrix

Loss functions
$X$ – the vector space of all inputs,
$y$ – the space of binary targets (the margin-based losses below assume labels in $\{-1, +1\}$),
$f: X \to \mathbb{R}$ – the estimator of $y$.
We seek to minimize the empirical risk: $I[f] = \frac{1}{n} \sum_{i=1}^{n} L(f(X_i), y_i)$, where $L$ is the loss function.
Hinge loss: $L(f(x), y) = |1 - y f(x)|_{+}$
Square loss: $L(f(x), y) = (1 - y f(x))^2$ (penalizes outliers heavily)
Huber loss: quadratic for $|x| < r$ and linear for $|x| > r$
Logistic loss: $L(f(x), y) = \log(e^{-y f(x)} + 1)$
…
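
The loss functions listed above can be written out directly. This is a minimal sketch assuming labels in {-1, +1}; the Huber loss uses the common textbook form (0.5·x² inside the radius, r·(|x| − 0.5·r) outside), which is an assumption because the slide only states "quadratic" and "linear". All function names are illustrative.

```python
import math


def hinge_loss(fx, y):
    """|1 - y*f(x)|_+ : zero once the margin y*f(x) reaches 1."""
    return max(0.0, 1.0 - y * fx)


def square_loss(fx, y):
    """(1 - y*f(x))^2 : grows quadratically, so outliers dominate."""
    return (1.0 - y * fx) ** 2


def logistic_loss(fx, y):
    """log(exp(-y*f(x)) + 1)."""
    return math.log(math.exp(-y * fx) + 1.0)


def huber_loss(residual, r=1.0):
    """Quadratic for |residual| <= r, linear beyond (assumed standard form)."""
    a = abs(residual)
    return 0.5 * a * a if a <= r else r * (a - 0.5 * r)


def empirical_risk(f, xs, ys, loss):
    """I[f] = (1/n) * sum of loss(f(x_i), y_i)."""
    return sum(loss(f(x), y) for x, y in zip(xs, ys)) / len(xs)


# Toy check with the identity function as estimator
risk = empirical_risk(lambda x: x, [2.0, -1.0], [1, 1], hinge_loss)
```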

Logistic Regression
We look for the probability of default in the form
$\mathrm{logit}(p) = \log\frac{p}{1-p} = \beta_0 + \sum_i \beta_i X_i$
or, equivalently,
$p = \frac{1}{1 + e^{-(\beta_0 + \sum_i \beta_i X_i)}}$
Advanced log-loss: $\sum_{i=1}^{n} \beta_i^2 + C \cdot \log\left(e^{-y(\beta_0 + \sum_i \beta_i x_i)} + 1\right)$
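
The two equivalent forms and the penalized log-loss can be checked numerically. In this sketch the coefficient values are made up, the labels are assumed to be in {-1, +1}, and the logistic term is summed over all observations, which is how the single-observation expression on the slide would be applied to a dataset (an assumption about the intended usage).

```python
import math


def logit(p):
    """log(p / (1 - p))"""
    return math.log(p / (1.0 - p))


def linear_score(beta0, betas, x):
    """beta_0 + sum_i beta_i * x_i"""
    return beta0 + sum(b * v for b, v in zip(betas, x))


def predict_pd(beta0, betas, x):
    """p = 1 / (1 + exp(-(beta_0 + sum_i beta_i * x_i)))"""
    return 1.0 / (1.0 + math.exp(-linear_score(beta0, betas, x)))


def penalized_log_loss(beta0, betas, xs, ys, C=1.0):
    """sum_i beta_i^2 + C * sum over observations of log(exp(-y*score) + 1).

    The intercept beta_0 is left out of the penalty, as in the slide's sum.
    """
    penalty = sum(b * b for b in betas)
    data_term = sum(
        math.log(math.exp(-y * linear_score(beta0, betas, x)) + 1.0)
        for x, y in zip(xs, ys)
    )
    return penalty + C * data_term


# Hypothetical coefficients: logit(p) should return the linear score
p = predict_pd(-2.0, [0.5], [1.0])
round_trip = logit(p)
```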

LogReg Formula Example
1/(1+exp(-(
    nvl(miota_ugyfel,0) * ( )
  + nvl(tx_kozv_terh_db_avg,0) * ( )
  + (case when nvl(eletkor,40) <= then
          when nvl(eletkor,40) >= then
          else 0 end)
  + (case when nvl(cs_hazib_mind_db_avg,0) > 0 then
      (case when (nvl(te_egy_lak_foly_avg,116000) > and nvl(te_egy_lak_foly_avg,116000) <= 0) then
            when (nvl(te_egy_lak_foly_avg,116000) > and nvl(te_egy_lak_foly_avg,116000) <= ) then
            when nvl(te_egy_lak_foly_avg,116000) > then
  ...
Annotations on the slide: $\beta_0$, $\beta_1$, $\beta_2$

Decision Tree
Classic methods are simple and easily interpretable.
Logistic regression is sensitive to missing values, outliers and correlated variables; decision trees are not.
We generally combine the two methods, most often using trees of depth one (decision stumps).
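
To make "decision stump" concrete, the sketch below finds the single threshold on one variable that minimizes the weighted Gini impurity of the two resulting groups. The impurity criterion, the function names and the toy data are assumptions for illustration; the slides do not say which split criterion is used or how the stumps are combined with the logistic regression.

```python
def gini_impurity(labels):
    """Gini impurity 2*p*(1-p) of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def fit_stump(values, labels):
    """Return (threshold, weighted_impurity) of the best split
    'value <= threshold' versus 'value > threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini_impurity(left)
                 + len(right) * gini_impurity(right)) / n
        if score < best_score:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_score = score
    return best_threshold, best_score


# Toy data: younger clients default (1), older clients do not (0)
threshold, impurity = fit_stump([20, 25, 30, 45, 50, 60], [1, 1, 1, 0, 0, 0])
```

A stump like this turns a raw variable into two groups, which is one plausible way to feed outlier-robust inputs into a regression.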

Model evaluation – performance
AUC (Area Under Curve) $= P(X_P > X_N)$
$\mathrm{Gini} = 2 \cdot \mathrm{AUC} - 1$
$\mathrm{Lift}(h) = \frac{DR_{x<h}}{DR}$
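
The three measures can be computed by hand on a small sample. The sketch below reads AUC literally as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, counting ties as one half (a common convention, assumed here), and computes Lift(h) as the default rate among cases with score below h divided by the overall default rate. Names and toy numbers are illustrative.

```python
def auc(scores_pos, scores_neg):
    """P(X_P > X_N) estimated over all positive/negative pairs."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half (assumed convention)
    return wins / (len(scores_pos) * len(scores_neg))


def gini(auc_value):
    """Gini = 2 * AUC - 1"""
    return 2.0 * auc_value - 1.0


def lift(scores, defaults, h):
    """Default rate among cases with score < h, relative to the overall rate."""
    selected = [d for s, d in zip(scores, defaults) if s < h]
    if not selected:
        raise ValueError("no cases with score below h")
    overall_rate = sum(defaults) / len(defaults)
    return (sum(selected) / len(selected)) / overall_rate


auc_value = auc([0.9, 0.8, 0.4], [0.5, 0.3])
gini_value = gini(auc_value)
lift_value = lift([300, 400, 500, 600], [1, 1, 0, 0], h=450)
```

The pairwise loop is quadratic in the sample size, so it is only meant for illustration; rank-based formulas give the same AUC on large portfolios.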

Model evaluation – distribution
Evaluate the distribution:
  Stability index
  Hosmer-Lemeshow test
  Binomial test

Stability index (PSI)
Pool | PD distr. on modeling data (A) | PD distr. on stability data (B) | Pool limit low | Pool limit high
1    | 10.00%                         | 20.05%                          | 0.00%          | 0.63%
2    |                                | 12.80%                          |                | 0.92%
3    |                                | 9.37%                           |                | 1.15%
4    |                                | 7.30%                           |                | 1.40%
5    |                                | 6.75%                           |                | 1.73%
6    |                                | 6.91%                           |                | 2.38%
7    |                                | 8.36%                           |                | 4.37%
8    |                                | 9.44%                           |                | 10.75%
9    |                                | 9.04%                           |                | 35.76%
10   |                                | 9.97%                           |                | 100.00%
Per-pool term: (A-B)*LN(A/B)
Stability index: 0.1141

PSI Value         | Inference
Less than 0.1     | Insignificant change
0.1 – 0.25        | Some minor change
Greater than 0.25 | Major shift in population
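
The stability index can be recomputed from the pool shares. The sketch below sums (A − B)·ln(A/B) over the ten pools, using the stability-data shares from the slide. It assumes that the modeling-data share is 10% in every pool (only the first pool's value is legible in the transcript); under that assumption the result reproduces the slide's 0.1141 to within rounding. Function names are illustrative.

```python
import math


def psi(dist_a, dist_b):
    """Stability index: sum over pools of (A - B) * ln(A / B)."""
    return sum((a - b) * math.log(a / b) for a, b in zip(dist_a, dist_b))


def psi_inference(value):
    """Map a PSI value to the interpretation bands on the slide."""
    if value < 0.1:
        return "Insignificant change"
    if value <= 0.25:
        return "Some minor change"
    return "Major shift in population"


# Assumption: equal 10% pools on the modeling data
modeling_share = [0.10] * 10
# Stability-data shares per pool, from the slide
stability_share = [0.2005, 0.1280, 0.0937, 0.0730, 0.0675,
                   0.0691, 0.0836, 0.0944, 0.0904, 0.0997]

stability_index = psi(modeling_share, stability_share)
verdict = psi_inference(stability_index)
```

Most of the index comes from the first pool, where the stability data holds about twice the share seen at modeling time.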

Classic vs. Ensemble methods
Keep a balance between power, stability, interpretability and simplicity.
„Less is more, and usually more effective”
What we do is not „high-frequency trading”: it takes months to reveal whether our estimates are good or bad.
We should avoid black boxes.
We have to understand our models – the knowledge of business experts is essential.

Model error, Model risk
The model itself also entails risk!
Predicting the „Crisis” yet Blowing Up
Sources of model risk:
  Data errors
  Parameter uncertainty
  Misuse of the model
  …
„We go from reality to models not from models to reality”

Propagation of error
Quantification of parameter uncertainty: confidence intervals
$I_\alpha = \beta_i \pm z_{1-\alpha/2}\,\sigma_i$
A large number ($n$) of random numbers $x$ is simulated from a uniform distribution between 0 and 1, $X \sim U(0,1)$.
For each $k \in \{1..n\}$, a full set of $\beta_i$ estimators for the logistic regression is simulated through the inverse of their respective distribution functions, $\beta_i^k = F_i^{-1}(x_k)$.
The entire portfolio is scored with each set of estimators.
For a product $f = A \cdot B$ (as in Risk cost = PD·LGD·EAD): $\sigma_f^2 \approx f^2\left[\left(\frac{\sigma_A}{A}\right)^2 + \left(\frac{\sigma_B}{B}\right)^2 + 2\,\frac{\mathrm{cov}_{AB}}{AB}\right]$
„Understand model error before you use a model”
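
The simulation described above can be sketched with the standard library alone. The code assumes that each estimator is normally distributed around its fitted value with its standard error, and that the estimators are independent, so one uniform number is drawn per coefficient; both are simplifying assumptions, and the coefficients, standard errors and three-client portfolio are made up. The last function applies the first-order variance formula for a product of two quantities.

```python
import math
import random
from statistics import NormalDist, mean

# Hypothetical fitted values: beta_0 (intercept), beta_1, beta_2
BETAS = [-3.0, 0.8, -0.5]
SIGMAS = [0.10, 0.05, 0.04]          # hypothetical standard errors
PORTFOLIO = [(0.2, 1.5), (1.0, 0.3), (2.5, 0.0)]  # toy clients (x_1, x_2)


def confidence_interval(beta, sigma, alpha=0.05):
    """I_alpha = beta +/- z_{1-alpha/2} * sigma"""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return beta - z * sigma, beta + z * sigma


def pd_score(betas, x):
    """Logistic PD for one client."""
    z = betas[0] + sum(b * v for b, v in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def simulate_portfolio_pd(n=2000, seed=0):
    """Score the whole portfolio with n simulated coefficient sets."""
    rng = random.Random(seed)
    results = []
    for _ in range(n):
        # x ~ U(0,1), beta = F^{-1}(x); guard against an exact 0.0 draw
        draw = [NormalDist(b, s).inv_cdf(max(rng.random(), 1e-12))
                for b, s in zip(BETAS, SIGMAS)]
        results.append(mean(pd_score(draw, x) for x in PORTFOLIO))
    return results


def product_variance(a, b, sigma_a, sigma_b, cov_ab=0.0):
    """sigma_f^2 ~ f^2 * [(s_A/A)^2 + (s_B/B)^2 + 2*cov_AB/(A*B)], f = A*B"""
    f = a * b
    return f * f * ((sigma_a / a) ** 2 + (sigma_b / b) ** 2
                    + 2.0 * cov_ab / (a * b))


simulated = simulate_portfolio_pd()
baseline = mean(pd_score(BETAS, x) for x in PORTFOLIO)
```

The spread of `simulated` around `baseline` shows how much of the portfolio PD is driven by parameter uncertainty alone. If the estimators are correlated, the independent draws above would have to be replaced by draws from their joint distribution.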

Software
Programming language
Graphical
Open source
Commercial

Modeling Streams
SAS Enterprise Miner
IBM SPSS Modeler

Modeling with Python
more alternative models
training the models
evaluation

Thank you for your attention!
Thanks to…
  My colleagues in OTP Bank
  Benczúr András, SZTAKI
  Gáspár Csaba, BME dmlab
  Nassim Nicholas Taleb (quotes from Antifragile and Silent Risk)
  … and many more
