Piecewise Logistic Regression: An application in credit scoring

Piecewise Logistic Regression: An application in credit scoring
By Raymond Anderson, Standard Bank of South Africa
Presented at the Credit Scoring and Control IV conference, Edinburgh, 26-28 August 2015
10/09/2018

Aircraft analogy
Variable selection is like landing a plane in a cross-wind:
- Airplane: always adjusting to ensure a safe landing
- Regression: always adjusting to ensure the best fit
Standard WOE regression is like a fixed-wing aircraft. Piecewise WOE regression is more like a bird, adjusting each wing independently as required, for greater manoeuvrability.

Definitions
- "With respect to a number of discrete intervals, sets, or pieces <piecewise continuous functions>" [Merriam-Webster]
- "Denoting that a function has a specified property, such as smoothness or continuity, on each of a finite number of pieces into which its domain is divided" [dictionary.reference.com]
- "In mathematics, a piecewise-defined function (also called a piecewise or hybrid function) is a function which is defined by multiple sub-functions, each applying to a certain interval of the main function's domain" [Wikipedia]
- "Piecewise regression, also known as segmented or 'broken-stick' regression, is a method in regression analysis in which the independent variable is partitioned into intervals and a separate line segment is fit to each interval" [Wikipedia]
Piecewise methods are mostly associated with linear regression; few or no references were found for logistic regression.
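The definitions above can be made concrete with a small sketch of a piecewise-defined function: a different sub-function on each interval of the domain (the pieces below are an illustration, not an example from the talk).

```python
# A piecewise-defined function: each sub-function applies on its own
# interval of the domain.
def f(x):
    if x < 0:
        return -x         # one piece on (-inf, 0)
    elif x <= 1:
        return x ** 2     # another piece on [0, 1]
    else:
        return 2 * x - 1  # a third piece on (1, inf)

print(f(-3), f(0.5), f(2))  # one value from each piece
```

Piecewise regression applies the same idea to a fitted line: a separate segment per interval of the independent variable.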

Piece Assignment
Split into high- and low-risk pieces, and treat discontinuities (e.g. at 0 or 100%) separately:
- High vs. low risk, to address different ends of the risk spectrum
- Discontinuities where the data is conflicted (e.g. what do 0 and 100 really mean?)
A maximum of 4 pieces was used, usually only 2 or 3.
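A minimal sketch of piece assignment, assuming a single knot on the WOE scale and indicator variables for the conflicted points; the function names and the knot at 0 are assumptions for illustration, not the talk's exact recipe:

```python
def assign_pieces(woe, knot=0.0):
    """Split one WOE-transformed characteristic at a knot so the low-
    and high-risk pieces can each receive their own beta coefficient."""
    low = [min(w, knot) for w in woe]    # varies only below the knot
    high = [max(w, knot) for w in woe]   # varies only above the knot
    return low, high

def discontinuity_flag(raw, special):
    """Indicator for a conflicted point (e.g. 0% or 100% utilisation),
    modelled as its own variable rather than forced onto a piece."""
    return [1 if r == special else 0 for r in raw]

low, high = assign_pieces([-1.2, 0.0, 0.8])
```

Note that the two pieces sum back to the original WOE, so the base case is recovered whenever the regression assigns both pieces the same coefficient.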

Low-income loan portfolio
A high-default portfolio:
- Loan volumes peaked in H2 2011 and H1 2012
- With reduced risk appetite, loan volumes fell heavily
- The policy-decline cut-off on gross income was increased from $100 to $300 p.m.
- Loans for terms of less than 12 months were curtailed
- The through-the-door profile is now much lower risk

Bought performance!
- Reject performance was "bought", i.e. no reject inference was done
- The through-the-door population is now better than the development accepts
- The old score was implemented during the out-of-time period

Types
Transformation type:
- Base case, using one variable per characteristic;
- Piecewise, using multiple user-defined variables per characteristic;
- Dummy, using multiple variables per characteristic, one per coarse class.
Regression type:
- Positive/negative, the normal stepwise logistic results;
- Positive only, which removed any variables with negative beta coefficients; and
- Limited, where the number of variables is limited to further avoid overfitting.
Time:
- Development, second half of 2012;
- First out-of-time, first half of 2013; and
- Second out-of-time, second half of 2013.
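The three transformation types can be sketched on a toy coarse-classed characteristic; the class labels and WOE values below are assumptions for illustration only:

```python
# Toy "Time with Bank" characteristic, already coarse-classed,
# with an assumed WOE value per class.
classes = ["<2y", "2-7y", "7y+", "<2y"]
woe_map = {"<2y": -0.6, "2-7y": 0.1, "7y+": 0.5}

# Base case: one WOE variable per characteristic.
base = [woe_map[c] for c in classes]

# Dummy: one indicator variable per coarse class.
dummy = {cls: [1 if c == cls else 0 for c in classes] for cls in woe_map}

# Piecewise: the WOE split into user-defined pieces (here at a knot of 0),
# so the low- and high-risk ranges each get their own beta coefficient.
low = [min(w, 0.0) for w in base]
high = [max(w, 0.0) for w in base]
```

The three sit on a spectrum of complexity: one variable (base), one per piece (piecewise), one per class (dummy).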

Gini results
- Gini reduces over time, due to risk homogenisation of the portfolio and scorecard deterioration.
- It reduces marginally as steps are taken to ensure model robustness.
- Piecewise almost always performs better (the exception is Dev/Stepped/Dummy).
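For reference, the Gini coefficient used in scorecard comparisons is 2·AUC − 1; a minimal pairwise implementation (good = 0, bad = 1, higher score = riskier) is:

```python
def gini(y_true, score):
    """Gini = 2*AUC - 1, via all good/bad score pairs.
    Illustrative O(n_good * n_bad) version; real monitoring code
    would use a sort-based O(n log n) computation."""
    goods = [s for y, s in zip(y_true, score) if y == 0]
    bads = [s for y, s in zip(y_true, score) if y == 1]
    wins = sum((b > g) + 0.5 * (b == g) for b in bads for g in goods)
    auc = wins / (len(bads) * len(goods))
    return 2.0 * auc - 1.0
```

Perfect separation gives a Gini of 1.0, and a random score gives roughly 0, which is why small relative lifts (0.9% on development, up to 4.4% out-of-time) are meaningful.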

Power Progression
- Used the 25 variables that appeared in any of the "Stepped" models
- The starting point was Piecewise, then moved towards the extreme ends
- Characteristics were modified in alphabetical (near-random) order
- The Gini coefficient shows a curvilinear relationship
- Gini peaks in the middle, around Piecewise, and then deteriorates marginally

Why?
Hypothesis: piecewise better reflects the non-linear relationships within the data.
Base Case vs. Piecewise:
- Provided many common-sense insights
- Emphasis on "Time with Bank" for up to 7 years (higher beta); "Age of customer < 45" and "Time at Employment < 7 years" not penalised, but higher values rewarded
- Lack of negative bureau information rewarded, but penalties for recent credit appetite (enquiries, new loans) and lack of credit experience (number of profiles)
Dummy WOE vs. Piecewise:
- The dummy model focused on the tails of the distribution
- Insufficient focus on the mid-range
- The model was too simple. Einstein: "Everything must be as simple as possible… but not simpler."

Model complexity
- Base case: one variable per characteristic; dummy: one per attribute; piecewise varies.
- As variable rationalisation progresses, complexity reduces.
- Attributes decrease, variables increase, and characteristics vary.

Caveats
- Development was done on a unique dataset, rich in bad accounts, over a period that provided multiple out-of-time samples.
- The process of eliminating unstable characteristics by referring to out-of-time samples is not standard, which may raise expectations for out-of-time benefits.
- Where reject inference is done, the inference results will heavily influence the results in the higher-risk spectrum.
- Extra care needs to be taken where there are low good or bad volumes in any of the risk buckets.
- There may be some issues with monitoring and validation, as processes/calculations may need to be modified.
- Benefits may reduce if staging is done, or variables are forced into the model.

Conclusion
Piecewise benefits:
- Better represents non-linear relationships, without being too simple
- Marginally more predictive on development, but more robust out of time: a relative 0.9% lift on development, and up to 4.4% out-of-time
- Model interpretation provided greater portfolio insights
- Potentially easier implementation
- Less need for multiple models where segmentation is done on risk
Caveats:
- Unusual situation, with many bads and multiple out-of-time samples
- High out-of-time results were raised by eliminating unstable characteristics
- For application scoring, reject inference will have a greater effect on the high-risk region
- Validation and monitoring may be affected: a) marginally more characteristics to monitor; b) some calculations may need reviewing (e.g. characteristic contribution)

QUESTIONS?