CHURN PREDICTION IN THE MOBILE TELECOMMUNICATIONS INDUSTRY An application of Survival Analysis in Data Mining L.J.S.M. Alberts, 29-09-2006.



OVERVIEW
– Introduction
– Research questions
– Operational churn definition
– Data
– Survival Analysis
– Predictive churn models
– Tests and results
– Conclusions and recommendations
– Questions

INTRODUCTION Mobile telecommunications industry: The market has changed from rapid growth to a state of saturation and fierce competition. The focus has shifted from building a large customer base to keeping customers ‘in house’. Acquiring new customers is more expensive than retaining existing ones.

INTRODUCTION Churn: The term churn is used for the loss of a customer.
Churn prevention:
– Acquiring more loyal customers initially
– Identifying the customers most likely to churn → predictive churn modelling

INTRODUCTION Predictive churn modelling
Applied in the fields of:
– Banking
– Mobile telecommunications
– Life insurance
– Etcetera
Common model choices:
– Neural networks
– Decision trees
– Support vector machines

INTRODUCTION Predictive churn modelling: These models are trained on snapshots of churned and non-churned customers. Disadvantage: the time aspect that these problems often involve is neglected. How can this time aspect be incorporated? → Survival analysis

INTRODUCTION Prepaid versus postpaid
Vodafone is interested in churn of prepaid customers.
– Prepaid: not bound by a contract → pay per call. As a consequence: irregular usage.
– Prepaid: no registration required. As a consequence: passing on of sim-cards and loss of information.

INTRODUCTION Prepaid versus postpaid
– Prepaid: the actual churn date is in most cases difficult to assess. As a consequence: a churn definition is required.

RESEARCH QUESTIONS Is it possible to make a prepaid churn model based on the theory of survival analysis? What is a proper, practical and measurable prepaid churn definition? How well do survival models perform in comparison to the ‘established’ predictive models? Do survival models have an added value compared to the ‘established’ predictive models?

RESEARCH QUESTIONS To answer the 2nd and 3rd sub-questions, a second predictive model is considered → decision tree. The two models are compared directly in ‘Tests and results’.

OPERATIONAL CHURN DEFINITION The definition should indicate, as early as possible, when a customer has permanently stopped using the sim-card. It is necessary because the proposed models are supervised models → they require a labeled dataset for training. It is based on the number of successive months with zero usage.

OPERATIONAL CHURN DEFINITION The definition consists of two parameters, α and β, where
– α = a fixed value
– β = the maximum number of successive months with zero usage
The sum α + β is used as a threshold.

OPERATIONAL CHURN DEFINITION Example: α = 3, β = 2

OPERATIONAL CHURN DEFINITION Two variations are examined:
– Churn definition 1: α = 2
– Churn definition 2: α = 3
Customers with β ≥ 5 are left out → outliers.
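One possible reading of this definition can be sketched in Python. The sketch assumes that β is tracked as the longest completed run of zero-usage months seen so far in a customer's history, and that churn is flagged in the first month in which the current zero-usage run reaches α + β. Both choices, the function name `churn_month` and the usage figures are assumptions for illustration; the slides do not spell them out, and the outlier rule (β ≥ 5) is not applied here.

```python
def churn_month(usage, alpha):
    """Return the index of the month in which churn is flagged, or None.

    usage: monthly usage amounts for one customer (0 means no usage).
    alpha: the fixed part of the threshold.
    Assumption: beta is the longest *completed* zero-usage run so far.
    """
    beta = 0  # longest completed run of zero-usage months seen so far
    run = 0   # length of the current zero-usage run
    for month, amount in enumerate(usage):
        if amount == 0:
            run += 1
            if run >= alpha + beta:  # assumed rule: threshold is reached
                return month
        else:
            beta = max(beta, run)
            run = 0
    return None


# Hypothetical usage history: one isolated zero month, then a long silence.
usage = [5, 0, 3, 4, 0, 0, 0, 0, 0, 2]
flag_def1 = churn_month(usage, alpha=2)  # churn definition 1
flag_def2 = churn_month(usage, alpha=3)  # churn definition 2
still_active = churn_month([5, 0, 3], alpha=2)
```

With the larger α, the same customer is flagged one month later, which is the trade-off between the two variations: a larger threshold gives fewer false alarms but a later signal.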

DATA The database was provided by Vodafone. The data is already aggregated per month and contains only usage and billing information. Derived variables capture customer behaviour in a better way, for example:
– recharge this month yes/no → time since last recharge
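The derived variable mentioned on this slide could be computed along the following lines. This is a minimal sketch: the function name, the use of `None` for months before the first observed recharge, and the example flags are assumptions, not details taken from the Vodafone data.

```python
def months_since_last_recharge(recharged):
    """Turn a monthly yes/no recharge flag into 'time since last recharge'.

    recharged: list of booleans, one per month, oldest first.
    Returns a list of the same length; entries are None until the first
    recharge is observed (assumed convention).
    """
    result = []
    since = None
    for flag in recharged:
        if flag:
            since = 0
        elif since is not None:
            since += 1
        result.append(since)
    return result


# Hypothetical customer: recharges in the 2nd and 5th month.
recharge_flags = [False, True, False, False, True, False]
derived = months_since_last_recharge(recharge_flags)
```

A counter like this carries information across months, which a per-month yes/no flag cannot do on its own.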

SURVIVAL ANALYSIS Survival analysis is a collection of statistical methods which model time-to-event data. The time until the event occurs is of interest. In our case the event is churn.

SURVIVAL ANALYSIS Survival function S(t): with T = event time, f(t) = density function and F(t) = cumulative distribution function, $S(t) = P(T > t) = 1 - F(t) = \int_t^{\infty} f(u)\,du$. The survival at time t is the probability that a subject will survive to that point in time.

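SURVIVAL ANALYSIS Hazard rate function h(t): $h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$. The hazard (rate) at time t describes the frequency of the occurrence of the event in “events per unit of time” → instantaneous. In discrete time it is the probability that the event occurs in the current interval, given that the event has not already occurred.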
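On a monthly time scale the hazard and the survival function can be estimated directly from counts. The sketch below assumes a discrete-time setting in which the hazard of month t is the number of churners in that month divided by the number of customers still at risk, and the survival is the running product of (1 − hazard). The counts are made up for illustration and censoring is ignored.

```python
def discrete_hazard_and_survival(at_risk, churned):
    """Estimate monthly hazard and survival from simple counts.

    at_risk: customers still active at the start of each month.
    churned: customers who churn during that month.
    """
    hazards = [d / n for n, d in zip(at_risk, churned)]
    survival = []
    s = 1.0
    for h in hazards:
        s *= 1.0 - h  # survive this month, given survival so far
        survival.append(s)
    return hazards, survival


# Hypothetical cohort of 100 customers followed for three months.
at_risk = [100, 90, 72]
churned = [10, 18, 18]
hazards, survival = discrete_hazard_and_survival(at_risk, churned)
```

In this toy cohort the hazard rises from month to month while the survival curve can only fall, which matches the two definitions above.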

SURVIVAL ANALYSIS [Figure: timeline starting at the commitment date; time scale = month; marked point at 15 months after the commitment date.]

SURVIVAL ANALYSIS How can these functions be adapted to an individual? → Survival regression models, which can be used to examine the influence of explanatory variables on the event time:
– Accelerated failure time models
– Cox model (proportional hazards model)

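SURVIVAL MODEL Cox model: $h_i(t) = h_0(t)\,\exp(\beta^{\top} X_i)$, where
– $h_i(t)$: hazard for individual i at time t
– $h_0(t)$: baseline hazard, the ‘average’ hazard curve
– $\exp(\beta^{\top} X_i)$: regression part, the influence of the variables $X_i$ on the baseline hazard (β here denotes the regression coefficients, not the churn-definition parameter)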

SURVIVAL MODEL Cox model
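The structure of the Cox model, a baseline hazard multiplied by the exponential of a linear regression part, can be made concrete with a few lines of Python. The coefficients, covariates and baseline values below are made-up numbers, and `cox_hazard` is a hypothetical helper, not part of any library.

```python
import math


def cox_hazard(baseline_hazard, coefficients, covariates):
    """Hazard of one individual: baseline hazard times exp(linear predictor)."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return baseline_hazard * math.exp(linear_predictor)


coefficients = [-0.4, 0.3]  # hypothetical regression coefficients
customer_a = [2.0, 1.0]     # hypothetical covariate values
customer_b = [0.0, 1.0]

# The baseline hazard changes over time, but the ratio between two
# individuals with fixed covariates stays the same (proportional hazards).
ratio_month_1 = (cox_hazard(0.02, coefficients, customer_a)
                 / cox_hazard(0.02, coefficients, customer_b))
ratio_month_9 = (cox_hazard(0.07, coefficients, customer_a)
                 / cox_hazard(0.07, coefficients, customer_b))
```

Both ratios equal exp(−0.8): with fixed covariates, the only thing that moves over time is the baseline hazard.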

SURVIVAL MODEL Cox model: Drawback: the hazard changes over time only through the baseline hazard, not through the variables, which are fixed in time. We want to include time-dependent covariates → variables that vary over time, e.g. the number of SMS messages per month.

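SURVIVAL MODEL Extended Cox model: This is possible with the extended Cox model, in which the covariates may depend on time: $h_i(t) = h_0(t)\,\exp(\beta^{\top} X_i(t))$, with regression coefficients β.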

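SURVIVAL MODEL Extended Cox model: Now we can compute the hazard for time t, but what we actually want is a forecast: by the time it is available, the data from this month is already outdated. Lagging of variables is therefore required, i.e. the covariates enter the model with a time lag, e.g. $X_i(t-1)$ in place of $X_i(t)$.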
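Lagging a time-dependent covariate amounts to shifting its monthly series so that the value used for month t was observed in an earlier month. The sketch below assumes a lag of one month and uses `None` where no earlier value exists; the lag actually used in the study is not given in the transcript, and the SMS counts are invented.

```python
def lag_covariate(monthly_values, lag=1):
    """Pair each month t with the value observed in month t - lag.

    Months without an earlier observation get None (assumed convention).
    """
    lagged = []
    for t in range(len(monthly_values)):
        source_month = t - lag
        lagged.append(monthly_values[source_month] if source_month >= 0 else None)
    return lagged


# Hypothetical number of SMS messages per month for one customer.
sms_per_month = [12, 9, 4, 0]
lagged_sms = lag_covariate(sms_per_month, lag=1)
```

A model fitted on the lagged series can score month t using only data that was already available before that month started.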

SURVIVAL MODEL Principal component regression: Principal component analysis (PCA) reduces the dimensionality of the dataset while retaining as much as possible of the variation present in it. The variables are transformed into new ones → principal components.

SURVIVAL MODEL Principal component regression

SURVIVAL MODEL Principal component regression: use the principal components as variables in the model.
First reason:
– It reduces collinearity.
– Collinearity causes inaccurate estimates of the regression coefficients.

SURVIVAL MODEL Principal component regression
Second reason:
– It reduces dimensionality.
– The first 20 components are chosen.
– This is a safe choice, because the principal components with the largest variances are not necessarily the best predictors.
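The transformation step behind principal component regression can be sketched with NumPy. The sketch assumes that the variables are standardized first and that the components are obtained from a singular value decomposition; the synthetic data, the choice k = 2 and the function name are illustrative assumptions (the slides keep the first 20 components of the real dataset).

```python
import numpy as np


def principal_components(X, k):
    """Return the first k principal component scores of X.

    X: array of shape (n_samples, n_variables).
    Assumption: variables are standardized before the decomposition.
    """
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    Z = (X - X.mean(axis=0)) / std
    # Rows of Vt are the principal directions, ordered by variance.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:k]
    scores = Z @ components.T
    explained_variance = (s ** 2) / (Z.shape[0] - 1)
    return scores, components, explained_variance[:k]


# Synthetic data: two almost collinear variables plus an independent one.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base + 0.01 * rng.normal(size=(200, 1)),
    base + 0.01 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 1)),
])
scores, components, explained_variance = principal_components(X, k=2)
```

The columns of `scores` are mutually uncorrelated, so using them in place of the original variables in the regression part removes the collinearity between the first two inputs.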

SURVIVAL MODEL Extended Cox model: Survival models are not designed to be predictive models. How do we decide whether a customer has churned? → Scoring method: a threshold applied to the hazard is used to indicate churn.
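The scoring method reduces to a comparison per customer. In the sketch below the hazards, the customer identifiers and the threshold value are invented, and treating a hazard equal to the threshold as churn is an assumption.

```python
def flag_churners(hazard_by_customer, threshold):
    """Label a customer as churned when the predicted hazard reaches the threshold."""
    return {customer: hazard >= threshold
            for customer, hazard in hazard_by_customer.items()}


# Hypothetical predicted hazards for the coming month.
predicted_hazards = {"A": 0.02, "B": 0.31, "C": 0.12}
churn_flags = flag_churners(predicted_hazards, threshold=0.10)
```

Lowering the threshold flags more customers, which raises sensitivity at the cost of specificity, so the threshold has to be tuned on labeled data.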

SURVIVAL MODEL Example

SURVIVAL MODEL Example

DECISION TREE Used for comparison with the performance of the extended Cox model. Classification and regression trees:
– Classification trees → predict a categorical outcome.
– Regression trees → predict a continuous outcome.

DECISION TREE Recursive partitioning: an iterative process of splitting the data up into (in this case) two partitions.
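A single step of recursive partitioning can be sketched as a search for the best binary split on one variable. The sketch assumes Gini impurity as the splitting criterion, which the slides do not name, and uses a made-up variable and labels. A full tree would apply the same search again to each of the two partitions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Find the threshold on one variable that minimizes weighted Gini impurity.

    Returns (threshold, weighted_impurity), or None if no split is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical variable: months since last recharge; label 1 = churned.
months_since_recharge = [0, 1, 2, 6, 7, 8]
churned = [0, 0, 0, 1, 1, 1]
split = best_split(months_since_recharge, churned)
```

In this toy example the split at 4 months separates the two classes perfectly, so both partitions are pure and the recursion would stop there.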

DECISION TREE Optimal tree size: Overfitting → the tree captures artefacts and noise present in the dataset, and predictive power is lost.
Solution:
– prepruning
– postpruning

DECISION TREE Optimal tree size: 10-fold cross-validation. The training set is split into 10 subsets. Each of the 10 subsets is left out in turn:
– train on the other subsets
– test on the one left out
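The splitting scheme of k-fold cross-validation can be written down in a few lines. This sketch only produces the index sets; assigning samples to folds in round-robin order (rather than at random) is a simplifying assumption, and the function name is hypothetical.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 25 hypothetical training samples, 10 folds.
splits = list(k_fold_indices(25, k=10))
```

For each candidate tree size, the tree would be fitted on `train` and evaluated on `test` in every fold, and the size with the best average test performance would be kept.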

DECISION TREE Optimal tree size

DECISION TREE Oversampling: alter the proportion of the outcomes in the training set by increasing the proportion of the less frequent outcome (churn). Why? Otherwise the tree is not sensitive enough. The proportion is changed to 1/3 churn and 2/3 non-churn.
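One way to reach the 1/3 versus 2/3 proportion is to duplicate randomly chosen churn cases until the target share is met. Whether the study duplicated churners or dropped non-churners is not stated in the transcript, so the duplication approach, the function name and the class counts below are assumptions.

```python
import random


def oversample_minority(labels, target_fraction=1 / 3, seed=0):
    """Return sample indices in which churners (label 1) make up target_fraction.

    Churn cases are duplicated at random (with replacement); non-churn
    cases are all kept once.
    """
    rng = random.Random(seed)
    churn_idx = [i for i, y in enumerate(labels) if y == 1]
    other_idx = [i for i, y in enumerate(labels) if y == 0]
    if not churn_idx:
        return other_idx
    # Solve c / (c + n) = target_fraction for the churn count c.
    target_churn = round(target_fraction * len(other_idx) / (1 - target_fraction))
    extra = max(0, target_churn - len(churn_idx))
    resampled = churn_idx + [rng.choice(churn_idx) for _ in range(extra)]
    return resampled + other_idx


# Hypothetical training set: 10 churners, 90 non-churners.
labels = [1] * 10 + [0] * 90
indices = oversample_minority(labels)
churn_share = sum(labels[i] for i in indices) / len(indices)
```

Only the training set should be rebalanced in this way; the test set keeps its natural proportions so that the reported sensitivity and specificity remain meaningful.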

DECISION TREE Churn definition 1

DECISION TREE Churn definition 2

TESTS AND RESULTS Tests: Goal: gain insight into the performance of the extended Cox model. The same test set is used for the extended Cox model and the decision tree, so a direct comparison is possible.

TESTS AND RESULTS Tests: Dataset: customers – training set: customers – test set: 5000 customers
The test set consists of:
– 1313 churned customers
– 3403 non-churned customers
– 284 outliers
All months of history are offered.

TESTS AND RESULTS Results

TESTS AND RESULTS Results

TESTS AND RESULTS Results: The extended Cox model gives satisfactory results, with both a high sensitivity and a high specificity. However, the decision tree performs even better. The time aspect incorporated by the extended Cox model does not provide an advantage over the decision tree in this particular problem.
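Sensitivity and specificity, the two measures used in this comparison, follow directly from the counts of correct and incorrect predictions. The labels below are a toy example and not the study's results.

```python
def sensitivity_specificity(actual, predicted):
    """Compute sensitivity and specificity for 0/1 labels (1 = churned)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of churners that were detected
    specificity = tn / (tn + fp)  # share of non-churners left unflagged
    return sensitivity, specificity


# Toy example: 4 churners and 6 non-churners.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sensitivity, specificity = sensitivity_specificity(actual, predicted)
```

Reporting both numbers matters here because churners are the minority class: a model that never predicts churn would have perfect specificity and zero sensitivity.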

TESTS AND RESULTS Results: The results must be put in perspective → they depend on the churn definition. There is already a difference between churn definitions 1 and 2, so a new and different churn definition is likely to yield different results. Is the churn definition too simple? → Size of the decision trees.

CONCLUSIONS AND RECOMMENDATIONS Conclusions: What is a proper, practical and measurable prepaid churn definition?
– Extensive examination of the customer behaviour.
– The churn definition is consistent and intuitive.
– It allows for a large range of customer behaviours.
– For longer periods of zero usage the definition becomes less reliable.

CONCLUSIONS AND RECOMMENDATIONS Conclusions: How well do survival models perform in comparison to the established predictive models?
– Survival model = extended Cox model.
– ‘Established’ predictive model = decision tree.
– High sensitivity and specificity.
– However, not better than the decision tree.

CONCLUSIONS AND RECOMMENDATIONS Conclusions: Do survival models have an added value compared to the established predictive models?
– They model the time aspect through the baseline hazard.
– They can handle censored data.
– Stratification → customer groups.
– With only time-independent variables → prediction at a future time is possible.

CONCLUSIONS AND RECOMMENDATIONS Conclusions: Is it possible to make a prepaid churn model based on the theory of survival analysis? Yes! We have shown that it gives results with both a high sensitivity and a high specificity. In this particular prepaid problem, however, it offers no benefit over the decision tree.

CONCLUSIONS AND RECOMMENDATIONS Recommendations:
– Better churn definition. Based on reliable data. Switching of sim-cards.
– Neural networks for survival data → can handle nonlinear relationships.
– Other scoring methods.

QUESTIONS