Machine learning techniques for credit risk modeling in practice

Presentation transcript:

Machine learning techniques for credit risk modeling in practice. Balaton Attila, OTP Bank Analysis and Modeling Department, 2017.02.23.

"Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). Evolved from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores the study and construction of algorithms that can learn from and make predictions on data..."* (* Wikipedia)

New, complex databases needed new modeling tools
- Data sources: internal databases, GIRINFO, utility companies, retailers, social networks
- Together these form a complex, large database to be analyzed: a wide-ranging dataset about customer behavior
- Machine learning recognizes connections between the different variables, yielding powerful models and a better Gini

Why Machine Learning: to mine new, large complex datasets

[Figure: four sketches of fits to the actual phenomenon, rated from BAD to GOOD, contrasting traditional statistics with machine learning]

- Description: Traditional stats fit a predetermined (linear, quadratic, logarithmic) function to the data. ML algorithms do not use a predetermined function, so they can build a model that closely fits the data.
- Self-learning: Not available in traditional stats; regular expert supervision is needed. In ML, self-learning is possible to some extent (variable weights can be adjusted automatically).
- Dataset and complexity: Traditional stats are adequate for well-structured databases but cannot handle complex, poorly structured datasets. ML works well with large or poorly structured datasets and recognizes complex patterns.
- Interpretation of results: With traditional stats, the results and the effect of the explanatory variables are easy to interpret. ML model interpretation requires expertise.
- Hardware capacity: Traditional stats are less computationally intensive; ML demands more computational power.

The need for interpretability

[Figure: interpretability ranging from "completely understandable" through "well comprehensible" to "black box", plotted against the forecast timeframe (sec, min, hour, day, week, month, year)]

New risk models are a sub-project of the Bank's Digital Strategy. Roadmap milestones (Jun. 2016, Dec. 2016, Jun. 2017, Dec. 2017):
- OBR application models: internal development of a new scorecard using AMM techniques
- OBRU and OBR application models, OTP HU application models: regular trainings about the alternative techniques, internal development of at least two scorecards with AMM
- Trainings for subsidiaries about AMM: AMM becomes part of the folklore during model development, subsidiaries test AMM techniques, external teams support the validation process
- Establishing a Python server, involving the OTP HU Big Data environment: weblog and detailed transactional data for fraud prevention

Machine Learning in practice
Machine Learning techniques cannot replace the whole "classic" model development lifecycle. From this point on, model development is a data mining task that should follow the general CRISP-DM methodology. Its steps are:
- Business understanding
- Data understanding
- Data preparation
- Modeling
- Evaluation
- Deployment
When I first encountered something that could be called data mining, I naturally skipped over the first three seemingly boring steps and went straight at the data, if I remember correctly with a neural network, or whatever I had at hand. The result was something like the "42" in The Hitchhiker's Guide to the Galaxy: something came out, but it was not really usable. Appropriate time and energy must be devoted to every step of the process!

We tested the following Machine Learning techniques...
- Random forest: the "average" of many random trees. Did not perform well for scorecards.
- Support Vector Machine: the goal is to find the hyperplane with optimal separating power. The extended version with kernel functions had capacity issues, so it was used only as a supporting algorithm combined with regression.
- Neural network: cannot be used for real-time decision making.
- Boosting techniques: supervised repetition of "weak classifiers" (decision trees, regressions, ...) leads to a strong classifier. Main types of boosting:
  - AdaBoost: underweights well-classified and overweights misclassified elements in every round
  - LogitBoost: a special type of AdaBoost where the loss function is logistic
  - Gradient Boosting Trees: a special type of LogitBoost where the loss function is decreased along the gradient
(A hedged comparison sketch follows below.)
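As a rough illustration of this comparison (not OTP's actual modeling pipeline), the sketch below trains the techniques named above on a synthetic, imbalanced "credit" dataset and scores each by Gini (2*AUC - 1); every dataset parameter and model setting here is an illustrative assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import (RandomForestClassifier, AdaBoostClassifier,
                              GradientBoostingClassifier)
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for a credit dataset: ~10% "bad" rate (assumption).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)

models = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm_rbf": SVC(kernel="rbf", probability=True, random_state=0),
    "adaboost": AdaBoostClassifier(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=200,
                                                    random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    p = model.predict_proba(X_test)[:, 1]
    gini = 2 * roc_auc_score(y_test, p) - 1  # Gini = 2*AUC - 1
    print(f"{name}: Gini = {gini:.3f}")
```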

Random Forest

AdaBoost
Base classifiers: $C_1, C_2, \ldots, C_T$. In step $m$, find the best $C_m$ in a predefined class using weights $w_i$.
Error rate: $\epsilon_m = \sum_{i=1}^{n} w_i \, I\bigl(C_m(x_i) \ne y_i\bigr)$
Importance of a classifier: $\alpha_m = \frac{1}{2} \ln \frac{1 - \epsilon_m}{\epsilon_m}$

AdaBoost
Weight update: $w_i \leftarrow \frac{w_i \exp\bigl(-\alpha_m \, y_i \, C_m(x_i)\bigr)}{Z_m}$, where $Z_m$ normalizes the weights so they sum to 1.
Classification: $C^*(x) = \operatorname{sign}\Bigl(\sum_{m=1}^{T} \alpha_m \, C_m(x)\Bigr)$
(A from-scratch sketch of these rules follows below.)
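A minimal from-scratch sketch of the update rules above, using decision stumps as the weak classifiers; this is our own illustrative reconstruction, not code from the talk.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """AdaBoost sketch; y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # uniform initial weights
    stumps, alphas = [], []
    for m in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        eps = np.sum(w * (pred != y))          # weighted error rate
        if eps == 0 or eps >= 0.5:
            break
        alpha = 0.5 * np.log((1 - eps) / eps)  # importance of this classifier
        w *= np.exp(-alpha * y * pred)         # up-weight misclassified points
        w /= w.sum()                           # renormalize (Z_m)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    score = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(score)

# Toy usage on synthetic data (labels mapped from {0,1} to {-1,+1}):
X, y = make_classification(n_samples=500, random_state=0)
y = 2 * y - 1
stumps, alphas = adaboost_fit(X, y)
print("train accuracy:", np.mean(adaboost_predict(X, stumps, alphas) == y))
```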

AdaBoost example (from "A Tutorial on Boosting" by Yoav Freund and Rob Schapire)

[Figure sequence: the tutorial's worked example over several boosting rounds; exercise: compute ε, α and the weight of the instances in Round 2]

Main AdaBoost idea and a new idea
- In AdaBoost, "shortcomings" are identified by high-weight data points, but the new model (e.g. a stump) is fit irrespective of the previous predictions.
- New idea: in the next iteration, learn just the residual of the present model $F(x)$: fit a model to $(x_1,\, y_1 - F(x_1)),\ (x_2,\, y_2 - F(x_2)),\ \ldots,\ (x_n,\, y_n - F(x_n))$.
- This is regression, no longer classification! (A minimal sketch of this residual-fitting loop follows below.)
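A minimal sketch of the residual-fitting loop on synthetic one-dimensional data; the learning rate and the stump learner are assumptions for illustration, not details from the talk.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=200)

F = np.zeros(200)                      # current model output F(x), starts at 0
for m in range(100):
    residual = y - F                   # what the current model still gets wrong
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    F += 0.1 * stump.predict(X)        # small learning rate for stability
print("MSE after boosting:", np.mean((y - F) ** 2))
```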

LogitBoost: the additive logistic regression model
Logistic regression learns a linear combination of classifiers for the "log odds-ratio": $F(x) = \frac{1}{2} \log \frac{p(x)}{1 - p(x)}$.
The logit transformation guarantees that for any $F(x)$, $p(x)$ is a probability in $[0, 1]$; inverting, we get $p(x) = \frac{1}{1 + e^{-2F(x)}}$.
A function of the real label, $y - p(x)$, will drive the instance weighting.

LogitBoost Algorithm
Step 1: Initialization. Committee function: $F_0(x) = 0$; initial probabilities: $p(x_i) = \tfrac{1}{2}$.
Step 2: LogitBoost iterations. For $m = 1, 2, \ldots, M$ repeat:
A. Fitting the weak learner:
1. Compute the working response and weights for $i = 1, \ldots, n$: $z_i = \frac{y_i - p(x_i)}{p(x_i)\,(1 - p(x_i))}$, $w_i = p(x_i)\,(1 - p(x_i))$
2. Fit a stump $f_m$ by weighted regression. Optimization is no longer for the error rate but for the (root) mean squared error (RMSE).

LogitBoost Algorithm (continued)
B. Updating and classifier output: $F_m(x) = F_{m-1}(x) + \tfrac{1}{2} f_m(x)$, $p(x) = \frac{1}{1 + e^{-2 F_m(x)}}$; the final classifier is $\operatorname{sign}\bigl(F_M(x)\bigr)$. (A from-scratch sketch follows below.)
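A from-scratch sketch of the iterations above for labels y in {0, 1}, following Friedman, Hastie and Tibshirani's LogitBoost; the clipping of the weights is a numerical safeguard we added, not part of the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def logitboost_fit(X, y, n_rounds=50):
    """LogitBoost sketch; y must be in {0, 1}."""
    n = len(y)
    F = np.zeros(n)                              # committee function, F_0(x) = 0
    p = np.full(n, 0.5)                          # initial probabilities
    stumps = []
    for m in range(n_rounds):
        w = np.clip(p * (1.0 - p), 1e-10, None)  # instance weights w_i
        z = (y - p) / w                          # working response z_i
        stump = DecisionTreeRegressor(max_depth=1)
        stump.fit(X, z, sample_weight=w)         # weighted (RMSE) regression
        F += 0.5 * stump.predict(X)              # committee update F_m
        p = 1.0 / (1.0 + np.exp(-2.0 * F))       # inverse logit, p in (0, 1)
        stumps.append(stump)
    # Classify a new x by the sign of 0.5 * sum of the stump outputs.
    return stumps
```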

Regression Tree, Regression Stump
Regression stump example:
  if TIME_FROM_FIRST_SEND <= 25.5 (true):  predict 0.14627427718697633
  else (false):                            predict -0.14621897522944
(The sketch below shows how such a stump can be inspected in scikit-learn.)
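Such a stump can be reproduced and inspected with scikit-learn; the synthetic data below merely mimics the slide's example, so the feature values and noise level are assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 60, size=(500, 1))     # stand-in for TIME_FROM_FIRST_SEND
z = np.where(X[:, 0] <= 25.5, 0.146, -0.146) + rng.normal(0, 0.01, 500)

stump = DecisionTreeRegressor(max_depth=1).fit(X, z)
print("split threshold:", stump.tree_.threshold[0])    # ~25.5
print("value if true:  ", stump.tree_.value[1][0][0])  # left child, ~0.146
print("value if false: ", stump.tree_.value[2][0][0])  # right child, ~-0.146
```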

Gradient Boosting
As in LogitBoost, the prediction is built iteratively: $y^*_{m+1} = y^*_m + h(x)$, where $h(x)$ is a simple regressor fit to the residuals, e.g. a stump or a shallow tree.
New idea: optimize by gradient descent. If we minimize the mean squared error to the true values $y$ averaged over the training data, the derivative of $(y^* - y)^2$ can be computed and is proportional to the error $y^* - y$:
$$-\frac{\partial}{\partial F_j(x_i)} \Bigl( y_i - \sum_j F_j(x_i) \Bigr)^2 = 2 \Bigl( y_i - \sum_j F_j(x_i) \Bigr) = 2 \cdot \text{residual}$$
Well... this is just LogitBoost without the logistic loss function.

Other loss functions
Squared loss is overly sensitive to outliers. Absolute loss is more robust to outliers but is not differentiable at zero:
$$L(y; F) = |y - F(x)|$$
Huber loss combines the two:
$$L(y; F) = \begin{cases} \tfrac{1}{2} (y - F)^2 & \text{if } |y - F| \le \delta \\ \delta \bigl( |y - F| - \delta/2 \bigr) & \text{if } |y - F| > \delta \end{cases}$$
Its negative gradient is
$$-g(x_i) = -\frac{\partial L\bigl(y; F(x_i)\bigr)}{\partial F(x_i)} = \begin{cases} y - F(x_i) & \text{if } |y - F(x_i)| \le \delta \\ \delta \operatorname{sign}\bigl(y - F(x_i)\bigr) & \text{if } |y - F(x_i)| > \delta \end{cases}$$
(The sketch below writes these out as code.)
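The three negative gradients above written out as plain functions; delta is the Huber threshold and the function names are ours, for illustration only.

```python
import numpy as np

def neg_gradient_squared(y, F):
    return y - F                       # the residual itself

def neg_gradient_absolute(y, F):
    return np.sign(y - F)              # bounded, hence robust to outliers

def neg_gradient_huber(y, F, delta=1.0):
    r = y - F
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))
```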

Two main Gradient Boosting loss functions
- Deviance (as in logistic regression), with predicted probability $p(x) = \frac{1}{1 + e^{-F(x)}}$
- Exponential (as in AdaBoost)
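In scikit-learn releases contemporary with this talk, these two losses are selected via the loss parameter of GradientBoostingClassifier (in later releases "deviance" was renamed "log_loss").

```python
from sklearn.ensemble import GradientBoostingClassifier

gbt_deviance = GradientBoostingClassifier(loss="deviance")        # logistic loss
gbt_exponential = GradientBoostingClassifier(loss="exponential")  # AdaBoost-like loss
```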

Python server
OTP Python server: Red Hat Linux with Anaconda (2016Q2), Jupyter notebook.
More effective than localhost environments: background mode, web application.

Credit application fraud prevention

SAS Fraud Solution – Logical system architecture

System architecture and working logic of OTP HU's fraud system
Hybrid approach: automated business rules, anomaly detection, predictive modeling, entity matching, social network analysis.
Working logic: application data, historical data and behavioural data feed entity matching and network building; scenario scores and fraud scoring feed the alert generation process; alerts go to manual investigation, which ends in approval or rejection.

Investigation
- List of suspicious applications
- Scenario analysis
- Network analysis

Thank you for your attention!