Piecewise Logistic Regression: An application in credit scoring

Piecewise Logistic Regression: An application in credit scoring
By Raymond Anderson, Standard Bank of South Africa
Presented at the Credit Scoring and Control IV conference, Edinburgh, 26-28 August 2015
10/09/2018

Aircraft analogy
Variable selection is like landing a plane in a cross-wind:
- Airplane: always adjusting to ensure a safe landing
- Regression: always adjusting to ensure the best fit
Standard WOE regression is like a fixed-wing aircraft. Piecewise WOE regression is more like a bird, adjusting each wing independently as required, for greater manoeuvrability.

Definitions
- "With respect to a number of discrete intervals, sets, or pieces <piecewise continuous functions>" [Merriam-Webster]
- "Denoting that a function has a specified property, such as smoothness or continuity, on each of a finite number of pieces into which its domain is divided" [dictionary.reference.com]
- "In mathematics, a piecewise-defined function (also called a piecewise or hybrid function) is a function which is defined by multiple sub-functions, each applying to a certain interval of the main function's domain" [Wikipedia]
- "Piecewise regression, also known as segmented or 'broken-stick' regression, is a method in regression analysis in which the independent variable is partitioned into intervals and a separate line segment is fit to each interval" [Wikipedia]
Piecewise methods are mostly associated with linear regression; few or no references were found for logistic regression.
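The definitions above can be made concrete with a small sketch of a piecewise-defined function: a different sub-function on each interval of the domain (the pieces below are an illustration, not an example from the talk).

```python
# A piecewise-defined function: each sub-function applies on its own
# interval of the domain.
def f(x):
    if x < 0:
        return -x         # one piece on (-inf, 0)
    elif x <= 1:
        return x ** 2     # another piece on [0, 1]
    else:
        return 2 * x - 1  # a third piece on (1, inf)

print(f(-3), f(0.5), f(2))  # one value from each piece
```

Piecewise regression applies the same idea to a fitted line: a separate segment per interval of the independent variable.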

Piece Assignment
Split into high- and low-risk pieces, and treat discontinuities (e.g. at 0 or 100%) separately:
- High vs. low risk, to address different ends of the risk spectrum
- Discontinuities where the data is conflicted (e.g. what do 0 and 100 really mean?)
A maximum of 4 pieces was used, usually only 2 or 3.
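A minimal sketch of piece assignment, assuming a single knot on the WOE scale and indicator variables for the conflicted points; the function names and the knot at 0 are assumptions for illustration, not the talk's exact recipe:

```python
def assign_pieces(woe, knot=0.0):
    """Split one WOE-transformed characteristic at a knot so the low-
    and high-risk pieces can each receive their own beta coefficient."""
    low = [min(w, knot) for w in woe]    # varies only below the knot
    high = [max(w, knot) for w in woe]   # varies only above the knot
    return low, high

def discontinuity_flag(raw, special):
    """Indicator for a conflicted point (e.g. 0% or 100% utilisation),
    modelled as its own variable rather than forced onto a piece."""
    return [1 if r == special else 0 for r in raw]

low, high = assign_pieces([-1.2, 0.0, 0.8])
```

Note that the two pieces sum back to the original WOE, so the base case is recovered whenever the regression assigns both pieces the same coefficient.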

Low-income loan portfolio
A high-default portfolio:
- Loan volumes peaked in H2 2011 and H1 2012
- With reduced risk appetite, loan volumes fell heavily
- The policy-decline cut-off on gross income was increased from $100 to $300 p.m.
- Loans for terms of less than 12 months were curtailed
- The through-the-door profile is now much lower risk

Bought performance!
- Reject performance was "bought", i.e. no reject inference was done
- The through-the-door population is now better than the development accepts
- The old score was implemented during the out-of-time period

Types
Transformation type:
- Base case, using one variable per characteristic;
- Piecewise, using multiple user-defined variables per characteristic;
- Dummy, using multiple variables per characteristic, one per coarse class.
Regression type:
- Positive/negative, the normal stepwise logistic results;
- Positive only, which removed any variables with negative beta coefficients; and
- Limited, where the number of variables is limited to further avoid overfitting.
Time:
- Development, second half of 2012;
- First out-of-time, first half of 2013; and
- Second out-of-time, second half of 2013.
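The three transformation types can be sketched on a toy coarse-classed characteristic; the class labels and WOE values below are assumptions for illustration only:

```python
# Toy "Time with Bank" characteristic, already coarse-classed,
# with an assumed WOE value per class.
classes = ["<2y", "2-7y", "7y+", "<2y"]
woe_map = {"<2y": -0.6, "2-7y": 0.1, "7y+": 0.5}

# Base case: one WOE variable per characteristic.
base = [woe_map[c] for c in classes]

# Dummy: one indicator variable per coarse class.
dummy = {cls: [1 if c == cls else 0 for c in classes] for cls in woe_map}

# Piecewise: the WOE split into user-defined pieces (here at a knot of 0),
# so the low- and high-risk ranges each get their own beta coefficient.
low = [min(w, 0.0) for w in base]
high = [max(w, 0.0) for w in base]
```

The three sit on a spectrum of complexity: one variable (base), one per piece (piecewise), one per class (dummy).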

Gini results
- Gini reduces over time, due to risk homogenisation of the portfolio and scorecard deterioration.
- It reduces marginally as steps are taken to ensure model robustness.
- Piecewise almost always performs better (the exception is Dev/Stepped/Dummy).
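For reference, the Gini coefficient used in scorecard comparisons is 2·AUC − 1; a minimal pairwise implementation (good = 0, bad = 1, higher score = riskier) is:

```python
def gini(y_true, score):
    """Gini = 2*AUC - 1, via all good/bad score pairs.
    Illustrative O(n_good * n_bad) version; real monitoring code
    would use a sort-based O(n log n) computation."""
    goods = [s for y, s in zip(y_true, score) if y == 0]
    bads = [s for y, s in zip(y_true, score) if y == 1]
    wins = sum((b > g) + 0.5 * (b == g) for b in bads for g in goods)
    auc = wins / (len(bads) * len(goods))
    return 2.0 * auc - 1.0
```

Perfect separation gives a Gini of 1.0, and a random score gives roughly 0, which is why small relative lifts (0.9% on development, up to 4.4% out-of-time) are meaningful.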

Power Progression
- Used the 25 variables that appeared in any of the "Stepped" models
- The starting point was Piecewise, then moved towards the extreme ends
- Characteristics were modified in alphabetical (near-random) order
- The Gini coefficient shows a curvilinear relationship
- Gini peaks in the middle, around Piecewise, and then deteriorates marginally

Why?
Hypothesis: piecewise better reflects the non-linear relationships within the data.
Base Case vs. Piecewise:
- Provided many common-sense insights
- Emphasis on "Time with Bank" for up to 7 years (higher beta); "Age of customer < 45" and "Time at Employment < 7 years" not penalised, but higher values rewarded
- Lack of negative bureau information rewarded, but penalties for recent credit appetite (enquiries, new loans) and lack of credit experience (number of profiles)
Dummy WOE vs. Piecewise:
- The dummy model focused on the tails of the distribution
- Insufficient focus on the mid-range
- The model was too simple. Einstein: "Everything must be as simple as possible… but not simpler."

Model complexity
- Base case: one variable per characteristic; dummy: one per attribute; piecewise varies.
- As variable rationalisation progresses, complexity reduces.
- Attributes decrease, variables increase, and characteristics vary.

Caveats
- Development was done on a unique dataset, rich in bad accounts, over a period that provided multiple out-of-time samples.
- The process of eliminating unstable characteristics by referring to out-of-time samples is not standard, which may raise expectations for out-of-time benefits.
- Where reject inference is done, the inference results will heavily influence the results in the higher-risk spectrum.
- Extra care needs to be taken where there are low good or bad volumes in any of the risk buckets.
- There may be some issues with monitoring and validation, as processes/calculations may need to be modified.
- Benefits may reduce if staging is done, or variables are forced into the model.

Conclusion
Piecewise benefits:
- Better represents non-linear relationships, without being too simple
- Marginally more predictive on development, but more robust out of time: a relative 0.9% lift on development, and up to 4.4% out-of-time
- Model interpretation provided greater portfolio insights
- Potentially easier implementation
- Less need for multiple models where segmentation is done on risk
Caveats:
- Unusual situation, with many bads and multiple out-of-time samples
- High out-of-time results were raised by eliminating unstable characteristics
- For application scoring, reject inference will have a greater effect on the high-risk region
- Validation and monitoring may be affected: a) marginally more characteristics to monitor; b) some calculations may need reviewing (e.g. characteristic contribution)

QUESTIONS?