Published by Morris Campbell. Modified over 7 years ago
1
Machine learning techniques for credit risk modeling in practice
Balaton Attila, OTP Bank, Analysis and Modeling Department
2
“Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data…” (Source: Wikipedia)
3
New, complex databases require new modeling tools
- Data sources: internal databases, GIRINFO, utility companies, retailers, social networks
- A complex, large database to be analyzed: a wide-ranging dataset about customer behavior
- Machine learning: recognizes connections between different variables
- Outcome: powerful models, better Gini
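The slide names a better Gini as the payoff. In credit scoring the Gini coefficient is commonly derived from the area under the ROC curve as Gini = 2·AUC − 1. The minimal sketch below computes AUC by comparing every bad case with every good case (ties count as half); the function names and toy scores are illustrative assumptions, not OTP Bank's validation code.

```python
def auc(scores, labels):
    """ROC AUC: probability that a random bad (label 1) outscores a random good (label 0)."""
    bads = [s for s, y in zip(scores, labels) if y == 1]
    goods = [s for s, y in zip(scores, labels) if y == 0]
    if not bads or not goods:
        raise ValueError("need at least one bad and one good case")
    wins = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(bads) * len(goods))


def gini(scores, labels):
    """Gini coefficient (accuracy ratio) of a score, via Gini = 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0


# Hypothetical toy data: a higher score means a higher predicted default risk.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0]
print(round(gini(scores, labels), 4))  # 0.7778
```

Here 8 of the 9 bad/good pairs are ordered correctly, so AUC = 8/9 and Gini = 7/9.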
4
Why Machine Learning: to mine new, large, complex datasets
[Figure: the actual phenomenon compared with fits by traditional stats and by machine learning, with panels marked BAD and GOOD]
- Description. Traditional stats: fit a predetermined (linear, quadratic, logarithmic) function to the data. Machine learning: ML algorithms do not use a predetermined function, so they can build a model that closely fits the data.
- Self-learning. Traditional stats: not available. Machine learning: possible to some extent (variable weights can be changed automatically); regular expert supervision is needed.
- Dataset and complexity. Traditional stats: adequate for well-structured databases, but can't handle complex, poorly structured datasets. Machine learning: works well with small or poorly structured datasets; recognizes complex patterns.
- Interpretation of results. Traditional stats: easy to interpret the results and the effect of explanatory variables. Machine learning: model interpretation requires expertise.
- Hardware capacity. Traditional stats: less computationally intensive. Machine learning: demands more computational power.
5
Need for interpretability
[Figure: required interpretability (completely understandable, well comprehensible, “black box”) against forecast timeframe (sec, min, hour, day, week, month, year)]
6
New risk models are a sub-project of the Bank's Digital Strategy
Roadmap, Jun. 2016 to Dec. 2017 (timeline milestones: Jun. 2016, Dec. 2016, Jun. 2017, Dec. 2017):
- OBR application models: internal development of a new scorecard using AMM techniques
- OBRU, OBR application models
- OTP HU application models
- Regular trainings about the alternative techniques
- Internal development of at least two scorecards with AMM
- Trainings for subsidiaries about AMM
- AMM becomes part of everyday practice (“folklore”) in model development
- Subsidiaries test AMM techniques
- External teams support the validation process
- Establishing a Python server
- Involving the OTP HU Big Data environment
- Weblog and detailed transactional data for fraud prevention
7
Machine Learning in practice
Machine learning techniques cannot replace the whole “classic” model development lifecycle. From this point on, model development is a data mining task, in which one has to follow the general CRISP-DM methodology. Its steps are:
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment
When I first came across something that could be called data mining, I naturally skipped the first three seemingly boring steps and went straight at the data, if I remember correctly with a neural network, or whatever I had at hand. The result was something like the 42 in The Hitchhiker's Guide to the Galaxy: something came out, but it was not of much use. Every step of the process needs to be given the appropriate time and energy!
8
We tested the following Machine Learning techniques…
- Random forest: an “average” of many random trees. It did not perform well for scorecards.
- Support vector machine: the goal is to find the hyperplane with optimal separating power. The extended version with kernel functions had capacity issues, so it was used only as a supporting algorithm combined with regression.
- Neural network: cannot be used for real-time decision making.
- Boosting techniques: supervised repetition of “weak classifiers” (decision trees, regression, …) leads to a strong classifier. Main types of boosting:
  - AdaBoost: in every round, decreases the weights of correctly classified elements and increases the weights of misclassified ones
  - LogitBoost: a special type of AdaBoost where the loss function is logistic
  - Gradient Boosting Trees: a special type of LogitBoost; the loss function is decreased along the gradient
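To make the hyperplane idea concrete, here is a minimal linear SVM trained with Pegasos-style subgradient steps on the regularized hinge loss λ/2·‖w‖² + mean(max(0, 1 − y·w·x)). It is a sketch under simplifying assumptions (linear kernel only, bias folded into w and regularized with it, hypothetical 2-D data); it does not reproduce the kernel variant the team tried.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on lam/2 * ||w||^2 + mean hinge loss.

    A constant 1.0 is appended to each input so the last weight acts as a bias
    (regularized together with w, a simplification). Labels must be -1 or +1.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(data, y):  # fixed cyclic order instead of random sampling
            t += 1
            eta = 1.0 / (lam * t)      # Pegasos step size
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # gradient step of the L2 term
            if margin < 1.0:
                # hinge loss is active: move w towards classifying x with margin
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical, linearly separable toy data
X = [(2, 2), (3, 1), (1, 3), (2.5, 2.5), (-2, -2), (-3, -1), (-1, -3), (-2.5, -2.5)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # [1, 1, 1, 1, -1, -1, -1, -1]
```

Because the toy points are separable, every training point ends up on the correct side of the learned hyperplane.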
9
Random Forest
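A random forest averages many trees, each grown on a bootstrap sample of the rows and a random subset of the features. To keep the sketch short, each “tree” below is a single split (a stump), a simplification of real random forests that grow deep trees; the data and parameter values are hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, labels, features):
    """Best single split among `features`, as (feature, threshold, left_value, right_value)."""
    best = None
    for f in features:
        for thr in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= thr]
            right = [lab for r, lab in zip(rows, labels) if r[f] > thr]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # no possible split: predict the overall mean
        m = mean(labels)
        return (features[0], float("inf"), m, m)
    return best[1:]


def fit_random_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]      # bootstrap sample of rows
        feats = rng.sample(range(p), n_features)        # random subset of features
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_proba(forest, row):
    """The 'average' of the individual trees' predictions."""
    return mean(lv if row[f] <= thr else rv for f, thr, lv, rv in forest)


# Hypothetical data: two informative features, label 1 ("bad") when x > 5
rows = [(x, 10 - x) for x in range(10)]
labels = [1 if x > 5 else 0 for x in range(10)]
forest = fit_random_forest(rows, labels)
print(predict_proba(forest, (9, 1)), predict_proba(forest, (0, 10)))
```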
10
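AdaBoost
Base classifiers: $C_1, C_2, \dots, C_T$. In step $m$, find the best $C_m$ in a predefined class using the weights $w_i$.
Error rate: $\varepsilon_m = \frac{\sum_{i=1}^{n} w_i \, I\bigl(C_m(x_i) \ne y_i\bigr)}{\sum_{i=1}^{n} w_i}$
Importance of a classifier: $\alpha_m = \frac{1}{2} \ln \frac{1 - \varepsilon_m}{\varepsilon_m}$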
11
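AdaBoost
Weight update: $w_i \leftarrow \frac{w_i \exp\bigl(-\alpha_m \, y_i \, C_m(x_i)\bigr)}{Z_m}$, where $y_i, C_m(x_i) \in \{-1, +1\}$ and $Z_m$ normalizes the weights to sum to 1
Classification: $C^*(x) = \operatorname{sign}\Bigl(\sum_{m=1}^{T} \alpha_m C_m(x)\Bigr)$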
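The AdaBoost loop can be sketched in plain Python with decision stumps as the predefined class of base classifiers $C_m$, labels in {−1, +1}, and the standard error rate, importance, weight update and weighted-vote classification. The toy data is hypothetical: the positive class sits at both ends of the line, so no single stump separates it, but three boosting rounds do.

```python
import math


def fit_weighted_stump(X, y, w):
    """Stump h(x) = polarity * (1 if x[f] > thr else -1) with minimal weighted error."""
    best = None
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        for thr in [values[0] - 1.0] + values:  # first threshold puts all points on one side
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if polarity * (1 if xi[f] > thr else -1) != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, polarity)
    return best


def stump_predict(f, thr, polarity, x):
    return polarity * (1 if x[f] > thr else -1)


def adaboost(X, y, T=10):
    n = len(X)
    w = [1.0 / n] * n                              # uniform starting weights
    ensemble = []
    for _ in range(T):
        err, f, thr, pol = fit_weighted_stump(X, y, w)
        eps = min(max(err / sum(w), 1e-10), 1 - 1e-10)  # error rate, clipped to avoid log(0)
        alpha = 0.5 * math.log((1 - eps) / eps)          # importance of the classifier
        ensemble.append((alpha, f, thr, pol))
        # misclassified points get e^{+alpha}, correct ones e^{-alpha}
        w = [wi * math.exp(-alpha * yi * stump_predict(f, thr, pol, xi))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)                                 # normalizer Z_m
        w = [wi / z for wi in w]
    return ensemble


def classify(ensemble, x):
    score = sum(alpha * stump_predict(f, thr, pol, x) for alpha, f, thr, pol in ensemble)
    return 1 if score >= 0 else -1


X = [(1,), (2,), (3,), (4,), (5,), (6,)]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost(X, y, T=3)
print([classify(ensemble, x) for x in X])  # [1, 1, -1, -1, 1, 1]
```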
12
AdaBoost example from “A Tutorial on Boosting” by Yoav Freund and Rob Schapire
13
AdaBoost example
14
AdaBoost example
Compute ε, α and the weights of the instances in Round 2
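With the standard AdaBoost formulas (weighted error ε, importance α = ½ ln((1 − ε)/ε), weights multiplied by e^{α} when misclassified and by e^{−α} otherwise, then normalized), the Round 2 quantities can be computed by hand. The sketch assumes 10 equally weighted points with 3 misclassified in Round 1 and, in Round 2, 3 misclassified points that were correct in Round 1; these counts are our assumption, not read off the tutorial's figures.

```python
import math

n = 10
w = [1.0 / n] * n                    # Round 1: uniform weights
misclassified = {0, 1, 2}            # hypothetical: the Round 1 stump gets 3 points wrong

eps1 = sum(w[i] for i in misclassified)           # weighted error rate
alpha1 = 0.5 * math.log((1 - eps1) / eps1)        # importance of the classifier

# Weight update: e^{+alpha} if wrong, e^{-alpha} if right, then normalize
w2 = [wi * math.exp(alpha1 if i in misclassified else -alpha1) for i, wi in enumerate(w)]
z = sum(w2)
w2 = [wi / z for wi in w2]

print(round(eps1, 2), round(alpha1, 2))   # 0.3 0.42
print(round(w2[0], 4), round(w2[9], 4))   # 0.1667 0.0714  (i.e. 1/6 and 1/14)

# Hypothetical Round 2: the new stump misclassifies three points Round 1 got right
eps2 = sum(w2[i] for i in (7, 8, 9))
alpha2 = 0.5 * math.log((1 - eps2) / eps2)
print(round(eps2, 2), round(alpha2, 2))   # 0.21 0.65
```

After normalization each misclassified point's weight is its old weight divided by 2ε, and each correct point's weight is divided by 2(1 − ε), which is why the weights move from 0.1 to 1/6 and 1/14.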
15
AdaBoost example
16
AdaBoost example
17
Main AdaBoost idea and a new idea
Main AdaBoost idea:
- “Shortcomings” are identified by high-weight data points.
- The new model (e.g. a stump) is fit irrespective of the previous predictions.
A new idea:
- In the next iteration, learn just the residual of the present model $F(x)$: fit a model to $(x_1, y_1 - F(x_1)), (x_2, y_2 - F(x_2)), \dots, (x_n, y_n - F(x_n))$.
- Regression, no longer classification!
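The new idea can be sketched in a few lines: start from a constant model, repeatedly fit a least-squares regression stump to the current residuals $y_i - F(x_i)$, and add it to $F$. The learning rate (shrinkage) and the toy data are assumptions not on the slide; with least-squares stumps and a learning rate between 0 and 1, no round can increase the training squared error.

```python
def fit_stump_ls(xs, rs):
    """1-D regression stump minimizing squared error to rs (assumes distinct x values)."""
    pts = sorted(zip(xs, rs))
    best = None
    for k in range(1, len(pts)):
        left = [r for _, r in pts[:k]]
        right = [r for _, r in pts[k:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, (pts[k - 1][0] + pts[k][0]) / 2, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def residual_boosting(xs, ys, rounds=50, learning_rate=0.5):
    base = sum(ys) / len(ys)                 # F_0: a constant model
    stumps = []

    def F(x):
        return base + learning_rate * sum(h(x) for h in stumps)

    for _ in range(rounds):
        residuals = [y - F(x) for x, y in zip(xs, ys)]   # what the present model misses
        stumps.append(fit_stump_ls(xs, residuals))       # regression, not classification
    return F


def mse(F, xs, ys):
    return sum((y - F(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 3.0, 3.1, 5.0, 5.2]
F = residual_boosting(xs, ys)
print(mse(lambda x: sum(ys) / len(ys), xs, ys), mse(F, xs, ys))
```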
18
LogitBoost: The additive logistic regression model
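Logistic regression learns a linear combination of classifiers for the “log odds ratio”: $F(x) = \log \frac{p(x)}{1 - p(x)}$. The logit transformation guarantees that for any $F(x)$, $p(x)$ is a probability in $(0, 1)$; inverting, we get $p(x) = \frac{1}{1 + e^{-F(x)}}$. Functions of the real label $y$ and of $p(x)$ (through the gap $y - p(x)$) then define the instance weights and working responses of the next boosting step.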
19
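LogitBoost algorithm (labels $y_i \in \{0, 1\}$)
Step 1: Initialization
- committee function: $F(x) = 0$
- initial probabilities: $p(x_i) = \frac{1}{2}$
Step 2: LogitBoost iterations, for $m = 1, 2, \dots, M$ repeat:
A. Fitting the weak learner:
1. Compute the working response and weights for $i = 1, \dots, n$: $z_i = \frac{y_i - p(x_i)}{p(x_i)\,(1 - p(x_i))}$, $w_i = p(x_i)\,(1 - p(x_i))$
2. Fit a stump $f_m$ by weighted regression of $z_i$ on $x_i$ with weights $w_i$
Optimization is no longer for the error rate but for the (root) mean squared error (RMSE).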
20
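LogitBoost algorithm
B. Updating and classifier output
- Update $F(x) \leftarrow F(x) + f_m(x)$ and $p(x) = \frac{1}{1 + e^{-F(x)}}$
- Output: predict class 1 if $F(x) > 0$ (equivalently $p(x) > \frac{1}{2}$), or use $p(x)$ directly as a score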
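Steps 1 and 2 can be sketched in plain Python for a single feature, with labels $y_i \in \{0, 1\}$, $p(x) = 1/(1 + e^{-F(x)})$, working response $z_i = (y_i - p_i)/(p_i(1 - p_i))$, weights $w_i = p_i(1 - p_i)$ and the update $F \leftarrow F + f_m$. Friedman, Hastie and Tibshirani's original LogitBoost defines $F$ as half the log odds and therefore adds $\frac{1}{2} f_m$; the version here uses the full log odds, which only rescales $F$. The clipping bound `z_max`, the floor on the weights and the toy data are our assumptions.

```python
import math


def fit_weighted_stump(xs, zs, ws):
    """1-D regression stump minimizing sum_i w_i * (z_i - f(x_i))^2 (assumes distinct x)."""
    pts = sorted(zip(xs, zs, ws))
    best = None
    for k in range(1, len(pts)):
        left, right = pts[:k], pts[k:]
        lw = sum(w for _, _, w in left)
        rw = sum(w for _, _, w in right)
        lm = sum(w * z for _, z, w in left) / lw     # weighted mean in each leaf
        rm = sum(w * z for _, z, w in right) / rw
        sse = (sum(w * (z - lm) ** 2 for _, z, w in left)
               + sum(w * (z - rm) ** 2 for _, z, w in right))
        if best is None or sse < best[0]:
            best = (sse, (pts[k - 1][0] + pts[k][0]) / 2, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def logitboost(xs, ys, rounds=20, z_max=4.0):
    """ys are 0/1 labels; returns the committee function F and p(x) = 1/(1+e^{-F(x)})."""
    stumps = []

    def F(x):
        return sum(f(x) for f in stumps)             # F starts at 0, so p starts at 1/2

    def p(x):
        return 1.0 / (1.0 + math.exp(-F(x)))

    for _ in range(rounds):
        probs = [p(x) for x in xs]
        ws = [max(q * (1 - q), 1e-10) for q in probs]        # instance weights
        zs = [max(-z_max, min(z_max, (y - q) / w))           # clipped working response
              for y, q, w in zip(ys, probs, ws)]
        stumps.append(fit_weighted_stump(xs, zs, ws))        # F <- F + f_m
    return F, p


F, p = logitboost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(p(1), p(6))
```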
21
Regression Tree, Regression Stump
Regression stump example (figure): split on TIME_FROM_FIRST_SEND <= 25.5, with a true branch and a false branch
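A regression stump is a regression tree of depth one: one split, with each leaf predicting the mean target of its training cases. The sketch below grows a one-feature regression tree to a given depth; the feature values (imagined as TIME_FROM_FIRST_SEND) and the 0/1 targets are invented toy numbers, and nothing here describes how the bank's model chose its threshold.

```python
def build_tree(xs, ys, max_depth):
    """Regression tree on one feature; max_depth=1 gives a regression stump.

    Internal nodes are (threshold, left_subtree, right_subtree); leaves are means.
    """
    leaf_value = sum(ys) / len(ys)
    if max_depth == 0 or len(set(xs)) < 2:
        return leaf_value                              # leaf: predict the mean target
    values = sorted(set(xs))
    best = None
    for a, b in zip(values, values[1:]):
        thr = (a + b) / 2                              # candidate split between distinct values
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, thr)
    thr = best[1]
    lx = [(x, y) for x, y in zip(xs, ys) if x <= thr]
    rx = [(x, y) for x, y in zip(xs, ys) if x > thr]
    return (thr,
            build_tree([x for x, _ in lx], [y for _, y in lx], max_depth - 1),
            build_tree([x for x, _ in rx], [y for _, y in rx], max_depth - 1))


def predict(tree, x):
    while isinstance(tree, tuple):                     # walk down until a leaf
        thr, left, right = tree
        tree = left if x <= thr else right
    return tree


times = [5, 12, 18, 25, 26, 33, 47, 60]   # hypothetical TIME_FROM_FIRST_SEND values
targets = [1, 1, 0, 1, 0, 0, 0, 0]         # hypothetical 0/1 outcomes
print(build_tree(times, targets, 1))       # (25.5, 0.75, 0.0)
```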
22
Gradient Boosting
As in LogitBoost, build an iterative prediction $y^*_m$: fit a simple regressor $h(x)$ (e.g. a stump or a shallow tree) to the residuals $y - y^*_m$ and update $y^*_{m+1} = y^*_m + h(x)$.
New idea: optimize by gradient descent. If we minimize the squared error to the true values $y$, averaged over the training data, the derivative of $(y^* - y)^2$ can be computed and is proportional to the error $y^* - y$:
$$-\frac{\partial}{\partial F_j(x_i)} \Bigl(y_i - \sum_k F_k(x_i)\Bigr)^2 = 2\Bigl(y_i - \sum_k F_k(x_i)\Bigr) = 2 \cdot \text{residual}$$
Well… this is just LogitBoost without the logistic loss function.
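The derivative claim is easy to check numerically: a central finite difference of the squared loss with respect to the prediction should equal −2 times the residual, so the negative gradient is 2·residual. The step size `h` and the sample values are arbitrary choices for illustration.

```python
def squared_loss(y, f):
    return (y - f) ** 2


def negative_gradient_fd(loss, y, f, h=1e-6):
    """Central finite-difference estimate of -dL/df at the prediction f."""
    return -(loss(y, f + h) - loss(y, f - h)) / (2 * h)


for y, f in [(3.0, 1.25), (0.0, 2.0), (-1.5, -1.0)]:
    residual = y - f
    # the two printed gradient values should agree up to rounding
    print(y, f, negative_gradient_fd(squared_loss, y, f), 2 * residual)
```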
23
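Other loss functions
- Squared loss is overly sensitive to outliers.
- Absolute loss $L(y; F) = |y - F(x)|$ is more robust to outliers, but it is not differentiable at $y = F(x)$.
- Huber loss:
$$L(y; F) = \begin{cases} \frac{1}{2}(y - F)^2 & \text{if } |y - F| \le \delta \\ \delta\left(|y - F| - \frac{\delta}{2}\right) & \text{if } |y - F| > \delta \end{cases}$$
Its negative gradient is
$$-g(x_i) = -\frac{\partial L(y_i; F(x_i))}{\partial F(x_i)} = \begin{cases} y_i - F(x_i) & \text{if } |y_i - F(x_i)| \le \delta \\ \delta \, \operatorname{sign}\bigl(y_i - F(x_i)\bigr) & \text{if } |y_i - F(x_i)| > \delta \end{cases}$$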
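The Huber loss and its negative gradient (the pseudo-residuals a gradient boosting round would fit) translate directly into code. δ = 1.0 and the sample targets are assumed values; the point is that a large residual yields a pseudo-residual of at most δ in size, whereas under squared loss the pseudo-residual grows with the residual itself.

```python
def huber_loss(y, f, delta=1.0):
    r = y - f
    if abs(r) <= delta:
        return 0.5 * r * r                    # quadratic near the fit
    return delta * (abs(r) - delta / 2)       # linear in the tails


def huber_negative_gradient(y, f, delta=1.0):
    r = y - f
    if abs(r) <= delta:
        return r                              # same as squared loss (with the 1/2 factor)
    return delta if r > 0 else -delta         # delta * sign(r): bounded pull from outliers


ys = [0.2, -0.5, 8.0]   # hypothetical targets; current prediction F(x) = 0 everywhere
print([huber_negative_gradient(y, 0.0) for y in ys])   # [0.2, -0.5, 1.0]
```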
24
Two main Gradient Boosting Loss Functions
- Deviance (as in logistic regression), with $p(x) = \frac{1}{1 + e^{-F(x)}}$
- Exponential (as in AdaBoost)
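Both losses and their negative gradients can be written per observation. The sketch assumes labels y ∈ {0, 1} for the deviance, with p(x) = 1/(1 + e^{−F(x)}) as on the slide, and y ∈ {−1, +1} for the exponential loss; it shows why the exponential loss reacts much more strongly to a badly misclassified point.

```python
import math


def deviance(y01, F):
    """Binomial deviance (log loss) for y in {0, 1}, with p = 1 / (1 + e^{-F})."""
    p = 1.0 / (1.0 + math.exp(-F))
    return -(y01 * math.log(p) + (1 - y01) * math.log(1 - p))


def deviance_negative_gradient(y01, F):
    return y01 - 1.0 / (1.0 + math.exp(-F))    # y - p(x), always between -1 and 1


def exponential_loss(ypm, F):
    """AdaBoost's exponential loss for y in {-1, +1}."""
    return math.exp(-ypm * F)


def exponential_negative_gradient(ypm, F):
    return ypm * math.exp(-ypm * F)


# A positive case that the current model scores badly (F = -5):
print(deviance_negative_gradient(1, -5.0))      # bounded by 1
print(exponential_negative_gradient(1, -5.0))   # e^5, grows exponentially
```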
25
Python server
OTP Python server (2016Q2):
- Red Hat Linux with Anaconda
- Jupyter notebook
- More effective than localhost environments
- Background mode
- Web application
26
Application credit fraud prevention
27
SAS Fraud Solution – Logical system architecture
28
System architecture and working logic of OTPHU’s fraud system
Hybrid approach: automated business rules, anomaly detection, predictive modeling, entity matching, social network analysis
- Inputs: application data, historical data, behavioural data
- Processing: entity matching, network building, scenario scores, fraud scoring
- Alert generation process, manual investigation, and the resulting rejection or approval
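The entity matching and network building steps can be illustrated generically: applications that share an attribute value (here a phone number or an address) are linked, and connected groups form networks that can be flagged for investigation. This is a hypothetical sketch using union-find; the attribute names, the data and the size threshold are assumptions, and it does not describe how the SAS Fraud Solution implements these steps.

```python
from collections import defaultdict


def build_networks(applications, keys=("phone", "address")):
    """Group applications sharing any matching attribute value (union-find)."""
    parent = {app_id: app_id for app_id in applications}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]     # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_seen = {}
    for app_id, attrs in applications.items():
        for key in keys:
            value = attrs.get(key)
            if value is None:
                continue
            if (key, value) in first_seen:
                union(first_seen[(key, value)], app_id)   # entity match -> network edge
            else:
                first_seen[(key, value)] = app_id

    groups = defaultdict(list)
    for app_id in applications:
        groups[find(app_id)].append(app_id)
    return sorted(sorted(g) for g in groups.values())


apps = {  # hypothetical applications
    "A1": {"phone": "111", "address": "Main St 1"},
    "A2": {"phone": "222", "address": "Main St 1"},
    "A3": {"phone": "222", "address": "Oak St 5"},
    "A4": {"phone": "333", "address": "Elm St 9"},
}
networks = build_networks(apps)
suspicious = [g for g in networks if len(g) >= 3]   # hypothetical size threshold
print(networks)    # [['A1', 'A2', 'A3'], ['A4']]
print(suspicious)  # [['A1', 'A2', 'A3']]
```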
29
Investigation
- List of suspicious applications
- Scenario analysis
- Network analysis
30
Thank you for your attention!