Boosting Customer Response Rate to Service Offers Using Data Mining


1 Boosting Customer Response Rate to Service Offers Using Data Mining
Hi everyone, today we will talk about our project on Boosting Customer Response Rate to Service Offers Using Data Mining. BIT Applied Business Intelligence and Analytics - Spring 2018 Group 1: Matthew Schumaecker & Xioyuan Zhao

2 Project Summary Business Understanding Data Understanding
Data Preparation Initial Modeling Final Model Results Recommendations We will discuss each step of the project, including the business goals established and questions raised, the data understanding and processing, the modeling, and our discoveries and their implications. The project follows the CRISP-DM process. As the authors of our textbook point out, a distinctive feature of data mining and the CRISP-DM process is that it is not necessarily linear: you do not always proceed from one phase to the next listed phase. We therefore emphasized an iterative cycle of modeling, evaluation, refining the data preparation, re-modeling and re-testing to produce an optimized model and solutions to our business problems.

3 Business Understanding
XYMS (fictional) Auto Insurance Company Business Understanding Business Problem: Low positive response rate to offers Goals and Benefits: Boosting the positive response rate to >25% Business Questions: What kind of customers? Who would respond yes? How to predict? How to boost high-value customers' response rate? Pilot Offer Responses XYMS, Inc. is a (fictional) national auto insurance company that provides personal and corporate automobile coverage to over one million Americans. It has decided to partner with another underwriter to expand its offerings into the term life insurance market. A direct marketing offer was made to a pilot group of 9,134 policy owners. Reviewing the statistics on the pilot responses shows that the company received a very low positive response rate, only 14.32%, and that the positive rates for offers 3 and 4 are almost nonexistent. We therefore aim to help the company by leveraging the resulting dataset to create a predictive model that narrows the scope of future marketing efforts to a population with a greater likelihood of responding to the offers. This will allow the company to save on marketing expense and to target the demographic most likely to be interested in these offers more aggressively. Specifically, the following business questions will be studied: What kind of customers does the company have, and who would be interested in these offers? Will a new customer respond yes or no to the offer? Who are the more profitable customers? How can we improve highly profitable customers' response rate to offers?
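The gap between the pilot result and the business goal can be made concrete with a little arithmetic. The pilot size (9,134) and the 14.32% rate come from the slide; the positive-response count of 1,308 is a hypothetical figure chosen to be consistent with that rate, not a number stated in the transcript.

```python
# Baseline response rate from the pilot offer vs. the stated >25% goal.
pilot_customers = 9134
positive_responses = 1308  # hypothetical count, consistent with the slide's 14.32%

response_rate = positive_responses / pilot_customers
goal = 0.25

print(f"Pilot response rate: {response_rate:.2%}")  # ~14.32%
print(f"Gap to goal: {goal - response_rate:.2%}")
```

The gap of roughly 10.7 percentage points is what the targeting model is meant to close.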

4 Selected Variable Dictionary Demo
DATA Understanding Selected variable dictionary (types for State, Coverage and Location Code follow the nominal group they were listed under):

Variable | Type | Description
Customer | Nominal | Primary ID
State | Nominal | State of residence
Customer Lifetime Value | Continuous | Expected profit over the lifetime of a particular customer
Response | Nominal (TARGET VARIABLE) | Whether or not the customer responded to the offer
Coverage | Nominal | Type of auto coverage
Education | Ordinal | Level of highest education
Location Code | Nominal | Urban, suburban or rural location of customer
Months Since Policy Inception | Continuous | Length of time the customer has had the policy
Number of Open Complaints | Continuous | How many complaints are open

Data Source: IBM Watson Analytics. Attributes: Demographic Features (5); Employment Status (1); Education Level (1); Current Vehicle, Policy and Claim Features (13); Offer Type (1); Customer Lifetime Value (1). Target: Customer's Response to the Offer (Y/N). The dataset for this project is a sample provided by IBM Watson Analytics. It has 24 attributes, including demographic information, employment and education information, information about the existing insurance policy and claims, the type of offer made to the customer, and a metric called "Customer Lifetime Value". The target variable is a binary outcome: the customer's response to the offer.

5 Initial DATA Preparation
Customer Distribution by State Initial DATA Preparation Coverage Distributions in the State of California Data Consolidation: data selected; data dictionary generated. Data Cleaning: no missing values; variable distributions visualized; outliers detected for Customer Lifetime Value, Monthly Premium Auto, Number of Open Complaints, Number of Policies and Total Claim Amount; variable frequency analysis summarized. Data Transformation: dummy variables created. Data Reduction: training and validation subsets made. Initial Modeling. After obtaining the data, a series of data preparation steps was conducted. This dataset is well organized, with 9,134 instances and 21 attributes at our disposal. A variable dictionary was made, and each variable was classified as continuous, nominal, ordinal or date. The dataset was checked to ensure there are no empty fields or invalid values. Descriptive statistics were visualized as histograms with means, medians and standard deviations to determine the nature of each variable's distribution and the presence of statistical outliers. The following variables were found to contain a significant number of statistical outliers: Customer Lifetime Value, Monthly Premium Auto, Number of Open Complaints, Number of Policies and Total Claim Amount. For the initial modeling, the outliers were kept in case they contain information that would otherwise be eliminated. Frequency analysis on the categorical variables was also conducted: about 63% of the customers are from Oregon and California; 60% of customers have basic coverage and only 1% have premium coverage.
About 60% of customers went to college; 62% of the customers are employed; the gender distribution is roughly equal. For Customer Lifetime Value, Monthly Premium Auto, Number of Open Complaints, Number of Policies and Total Claim Amount, the percentage of customers with values below the average is higher than the percentage above it, implying right-skewed distributions (a long tail of high values pulls the mean above the median). We used holdback and cross-validation as our primary validation techniques, with 80% of the data used to train the model and the remaining 20% reserved for validation.
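The two mechanical steps described above — screening continuous variables for outliers and making an 80/20 holdback split — can be sketched in plain Python. This is illustrative, not the authors' JMP workflow; the outlier rule shown is Tukey's 1.5×IQR fence, and the toy claim values are made up.

```python
import random

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fence rule)."""
    xs = sorted(values)
    n = len(xs)
    q1, q3 = xs[n // 4], xs[(3 * n) // 4]  # simple index-based quartiles
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

def holdback_split(rows, train_frac=0.8, seed=42):
    """Random holdback split, as used here for training vs. validation."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

# Toy 'Total Claim Amount' values with one obvious outlier.
claims = [200, 250, 300, 310, 320, 350, 400, 5000]
print(iqr_outliers(claims))  # [5000]

train, valid = holdback_split(list(range(100)))
print(len(train), len(valid))  # 80 20
```

In practice the split would be applied to customer records and the outlier screen to each flagged variable in turn.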

6 Initial DATA EXPLORATION
Clustering Benefit: customer groupings revealed. Concern: groupings are too general. Customer Cluster Diagram Our initial clustering model revealed customer groups and traits; for example, group 1 has the longest time since the last claim but the shortest time since policy inception, and group 1 customers are mostly in rural areas, mostly employed, and mostly have only basic coverage. However, the grouping covers the entire customer pool, so it is insufficient to tell the company where to target its offers.
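The transcript does not say which clustering algorithm was used, so as a neutral illustration here is a minimal k-means sketch: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points. The toy "customers" and their two features are invented for the example.

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: nearest-centroid assignment, then centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:  # keep a centroid in place if its cluster emptied
                centroids[i] = tuple(sum(d) / len(cl) for d in zip(*cl))
    return centroids, clusters

# Toy customers as (months since inception, monthly premium): two clear groups.
pts = [(5, 60), (6, 62), (7, 61), (80, 120), (82, 118), (81, 121)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

With real customer data, the per-cluster means of each variable are what get compared in a cluster-mean chart like the one on the slide.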

7 Initial Modeling: Logistic Regression
Benefit: determined the variables with the most information: Offer Type, Employment Status, Location Area Code, Sales Channel. Concern: lack of fit.

Fit | Training | Validation
R² | 0.23 | 0.21
RMSE | 0.32 | 0.33

In our logistic regression analysis, the variables that provided the most information in log-worth terms were Offer Type, Employment Status, Location Area Code and Sales Channel. However, the global regression model produced an R² of only 0.23, and stepwise logistic regression did not show a significant change in goodness of fit. This suggests that logistic regression on the available data provides only a weak prediction of customer response.
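To show what the fitted model actually does — turn a customer's features into a probability of responding — here is a minimal logistic regression trained by plain gradient descent on the log-loss. This is a from-scratch sketch, not JMP's fitting procedure, and the two binary toy features (retired, bought via agent) and their labels are invented.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Stochastic gradient descent on the log-loss; returns weights and bias."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss w.r.t. the logit
            for j in range(n_feat):
                w[j] -= lr * err * xi[j]
            b -= lr * err
    return w, b

# Toy features: (is_retired, bought_via_agent) -> responded to offer.
X = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
y = [1, 1, 0, 0, 1, 0]
w, b = fit_logistic(X, y)
p_yes = sigmoid(w[0] * 1 + w[1] * 1 + b)  # a retired customer who used an agent
print(round(p_yes, 2))
```

The log-worth screening the slide mentions ranks predictors by −log10 of their p-values; the sketch only covers the scoring side of the model.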

8 Initial Modeling: Decision Tree (258 splits)
Benefit: a predictive model with high accuracy, precision, sensitivity and specificity. Concern: overfitting.

Fit | Training | Validation
R² | 0.91 | 0.76
RMSE | 0.09 | 0.20

Our initial decision tree model, using cross-validation, yielded very promising results. For the training set the accuracy is 0.98, with precision 0.98, sensitivity 0.91 and high specificity; the corresponding validation values are 0.94, 0.76, 0.90 and 0.95. The AUCs for the yes and no responses are both high, very close to 1, with 0.98 for the validation set. The lift curve for the Yes response also indicates an advantage over random guessing. However, the tree showed a significant drop in fitness from the training set to the validation set (R² of 0.91 versus 0.76), so we are concerned about overfitting.
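The four measures quoted above all come from a 2×2 confusion matrix, and it is easy to show how each is computed. The counts below are hypothetical, chosen only to make the arithmetic visible; they are not the project's actual confusion matrix.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, sensitivity (recall) and specificity
    from a 2x2 confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy":    (tp + tn) / total,    # all correct / all cases
        "precision":   tp / (tp + fp),       # predicted-yes that were yes
        "sensitivity": tp / (tp + fn),       # true-positive rate
        "specificity": tn / (tn + fp),       # true-negative rate
    }

# Hypothetical counts for a validation set of 1,000 offers.
m = classification_metrics(tp=120, fp=30, fn=30, tn=820)
print(m)  # accuracy 0.94, precision 0.80, sensitivity 0.80, specificity ~0.965
```

Because yes-responses are rare here (about 14% in the pilot), specificity and accuracy can stay high even when sensitivity slips, which is why all four measures are tracked.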

9 Further DATA Preparation
Stratify Customers → Re-code Variables → Remove Outliers → Clustering → Reducing Variables (Data Preparation: Initial Model → Final Model). Following the CRISP-DM iterative process, a series of data-processing steps was conducted to further prepare the data for modeling. An effort was made to stratify customers based on Customer Lifetime Value (CLTV), to determine whether there is any difference in sales success between the lowest and highest CLTV quartiles. This was done by transforming the continuous CLTV variable into an ordinal variable grouped by quartiles. Besides CLTV, all the continuous variables were also categorized into ordinal variables and appended to our variable list, to see whether they could provide additional predictive power or help produce a simpler decision tree. In addition, we had kept the outliers in the analysis thus far; in the steps that follow, we use the subset of the data excluding records with outliers and rerun the models to determine the effect of the outliers on the predictive power of our models.
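The quartile re-coding described above — turning continuous CLTV into an ordinal variable — can be sketched directly. The cut points use simple sorted-index quartiles (an assumption; JMP's binning may use a different quantile rule), and the CLTV values are invented.

```python
def quartile_bins(values):
    """Recode a continuous variable into ordinal quartile labels Q1..Q4,
    as done here for Customer Lifetime Value (CLTV)."""
    xs = sorted(values)
    n = len(xs)
    cuts = [xs[n // 4], xs[n // 2], xs[(3 * n) // 4]]  # index-based quartile cuts
    labels = []
    for v in values:
        if v <= cuts[0]:
            labels.append("Q1")
        elif v <= cuts[1]:
            labels.append("Q2")
        elif v <= cuts[2]:
            labels.append("Q3")
        else:
            labels.append("Q4")
    return labels

cltv = [2000, 3500, 5000, 8000, 12000, 20000, 45000, 90000]
print(quartile_bins(cltv))  # ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q3', 'Q4']
```

The Q1 and Q4 labels are what the stratified comparison of lowest- versus highest-value customers is run on.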

10 Further data Exploration
Selected Cluster Mean Chart Demo (charts shown: All Customers Clustering; Customers Who Responded Yes to Offer Type 1; High-Profit Customers Who Responded No to Offers; scale from Highest Value to Lowest Value). Based on the stratified groups, a final clustering was conducted within different groups of customers: customers who responded yes to each type of offer, customers who responded no to each type of offer, higher-profit customers who responded yes, and higher-profit customers who responded no. The results were compared with the clustering over all customers. The comparisons show that everyone who responded yes to offers 1 and 2 has no open complaints, so customers who have been dissatisfied with the company in the past are unlikely to respond positively to new offers. These customers are in the lower customer-value range, mostly have only basic coverage, pay a lower monthly premium, and have stayed longer with their policy. More of them buy from a local branch, and they have a lower total claim amount. The customers who responded yes to offer type 2 are more likely to be male and in the higher income range; many live in rural areas, and they are less likely to be divorced. The customers who responded yes to offer type 1 tend to live in suburban areas. High-value customers who responded yes to offers are mostly married and have held their policy for a longer period of time. While the high-value customers who responded no prefer to buy their policy online, the ones who responded yes mostly bought by phone. Based on the information revealed by the initial logistic regression and this final clustering, a further predictor selection was conducted: variables that show little difference between clusters, such as education, were removed in sequence to see whether a more optimized model could be achieved.

11 Final Modeling: Logistic Regression
Fit Details (R² = 0.29)

Measure | Training | Validation
Accuracy | 0.86 | 0.88
Precision | 0.98 | 0.99
Sensitivity | 0.87 | —
Specificity | 0.68 | 0.82

The logistic regression based on the stratified customer-value groups showed some improvement (R² of 0.29 compared to 0.22), but we were still unsatisfied with the weak fit.

12 Final Modeling: Decision Tree (168 splits)
Fit Details: the slide compared four trees — the initial tree (258 splits), the tree with re-coded variables (164 splits), the tree without outliers (168 splits), and the tree with variable reduction (173 splits) — on training and validation Entropy R², Generalized R², Mean −Log p, RMSE, Mean Abs Dev, accuracy, precision, sensitivity and specificity. In order to produce a simpler tree with fewer splits, three efforts were made: 1. Remodel the tree using the re-coded variables transformed from our continuous variables, to see whether a simpler tree could be achieved. 2. Remodel the tree on a subset of the data with all records containing outliers removed. 3. Remove, in sequence, the variables that show little difference between clusters (such as education), producing a number of trees to see whether a more optimized tree could be produced. All of the remodeled trees improved in simplicity (number of splits). However, the tree built on the re-coded variables showed reduced fitness and lower accuracy, precision, sensitivity and specificity. No improvement was achieved with the trees built on different combinations of reduced predictors; the best of them used 173 splits and showed lower fitness. For the tree produced without outliers, a subset of 5,912 records was used, of which 4,693 were selected for training and 1,219 for validation. Removing the outliers helped yield a simpler tree with 168 splits. Its R², accuracy, sensitivity, specificity and precision showed some decrease from the initial tree, but the decrease is not significant, and this tree is the most optimized among the simplified trees.

13 Decision Tree (168 splits)
Further analysis of the positive-response side of the tree shows that the first split is on employment, and the retired group has a high positive response rate. Continuing along the splits: if the customer is retired and has a medium-to-large car, they are more likely to respond positively; and if they also bought the policy from a local agent and have held the policy for less than 6 years and 4 months, they have a 95% probability of responding positively to the offer. Alternatively, a retired but not single customer with a medium-to-large car who did not buy the policy from an agent, has a total claim amount below a certain dollar threshold, and pays less than 72 dollars in monthly auto premium has a 100% probability of saying yes to the offer.
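A fitted tree branch is just a chain of if-tests, and the first path described above can be transcribed as one. The thresholds (6 years 4 months ≈ 76 months) and the 95% probability come from the transcript; the category strings ("Retired", "Medsize", "Agent") are assumed spellings, and every path not described on the slide falls through to None rather than being invented.

```python
def p_positive_response(employment, car_size, channel, months_since_inception):
    """One branch of the final tree, transcribed from the slide narrative.
    Returns the leaf probability for the described path, else None."""
    if employment != "Retired":
        return None  # other branches of the tree are not covered by this sketch
    if car_size not in ("Medsize", "Large"):
        return None
    # Retired customer with a medium-to-large car who bought from a local agent
    # and has held the policy for under 6 years and 4 months (76 months).
    if channel == "Agent" and months_since_inception < 76:
        return 0.95
    return None

print(p_positive_response("Retired", "Medsize", "Agent", 48))  # 0.95
```

Encoding leaf rules this way is also how the recommendation to "implement the tree model" would be deployed against the customer base.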

14 Final Modeling: k-NN For the k-NN model, JMP created models for k = 1 to 21, and the resulting confusion matrix was best for k = 5. This model had the lowest number of misclassifications (433) out of 7,373 rows in the training set, yielding an accuracy of 94.1%. As with the other predictive models, we used a 20% holdback for validation: the training set had 7,373 rows and the validation set contained the remaining 1,761 rows. Because we tested 21 predictors, the analysis is done in 21-dimensional Euclidean space and is therefore impossible to represent visually; the best way to evaluate a k-NN model is by its confusion matrix and the resulting measures.
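The k = 5 classifier the slide describes reduces to a short function: find the five training points nearest the query in Euclidean distance and take a majority vote of their labels. The sketch below uses invented two-dimensional toy data rather than the 21 real predictors.

```python
import math
from collections import Counter

def knn_predict(train, query, k=5):
    """Classify `query` by majority vote among its k nearest training
    points (Euclidean distance), as in the k = 5 model described above."""
    neighbors = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy training data: (features, responded?) pairs forming two groups.
train = [
    ((1.0, 1.0), "Yes"), ((1.2, 0.9), "Yes"), ((0.9, 1.1), "Yes"),
    ((5.0, 5.0), "No"),  ((5.2, 4.8), "No"),  ((4.9, 5.1), "No"),
    ((1.1, 1.2), "Yes"),
]
print(knn_predict(train, (1.0, 1.1)))  # Yes
```

With real data the features would need scaling to comparable ranges first, since Euclidean distance lets large-valued variables dominate the vote.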

15 Results
Receiver Operating Characteristic (ROC) curves and confusion matrices were compared across the three models:

Measure | Logistic Regression | Decision Tree | k-NN
AUC (No) | 0.82 | 0.99 | —
R² | 0.21 | 0.63 | N/A
RMSE | 0.33 | 0.20 | —
Accuracy | 0.88 | 0.94 | —
Precision | 0.99 | 0.70 | 0.93
Sensitivity | — | 0.90 | —
Specificity | 0.82 | — | 0.71

Based on the model comparison report, the best model is the decision tree, with the highest R² (0.63), the lowest RMSE (0.20), the lowest log likelihood (0.06) and the lowest mean absolute deviation. The decision tree also has the highest AUC (0.99), the highest accuracy (0.94), a high sensitivity (0.90) and the highest specificity (0.94), which is comparable to that of the other models. The lift chart also reveals that the decision tree curve has the highest lift.

16 Results & Recommendations
What kind of customers? Generally: well educated; unemployed and retired. Specifically: 10 groups with different traits. Who would respond yes? Generally: customers with higher satisfaction who have stayed longer. How to predict? A tree predictive model. Boosting high-value customers' response rate? High-profit customers who bought their policy on the phone are more likely to say yes.
Tip 1: Implement the tree model. Tip 2: Tailor offers to suit customer groups. Tip 3: Hand out offers at branch offices. Tip 4: Reach out to high-value customers by phone. Long-term tips: improve customer service and customer retention.
We have proposed a number of models that could be implemented to predict whether a given customer will respond positively to the life insurance offer. Given its simplicity and goodness of fit, we recommend the tree model without outliers. Using this model, XYMS, Inc. should be able to predict the positive response rate to the life insurance offer with 94% accuracy (based on validation data). Moreover, if a customer is retired but not single, has a medium-to-large car, did not buy the policy from an agent, has a total claim amount below a certain dollar threshold, and pays less than 72 dollars in monthly auto premium, he or she has a 100% probability of saying yes to the offer. Based on our findings, the following recommendations are made to help the company reach its goals of boosting the positive response rate to offers and planning successful future promotions: 1. The company should first use the tree model to identify the customers in its current base who are likely to accept the offer and those who are not. This will help maximize current sales revenue and avoid spending significant resources on uninterested customers. 2. The company should customize offers to suit the characteristics of each customer group. It should also run focus groups among those who are less likely to accept the offer (as predicted by this model) to determine whether alternative products would be more attractive. 3. The company may also consider increasing offers at branch offices, as more of the people who responded yes bought their auto policy from a local branch. 4. The company can also reach out to highly profitable customers by phone to maximize their likelihood of responding positively to the offer. These recommendations should help the company target its marketing efforts so as to maximize revenue and minimize cost. We also strongly recommend that the company invest in improving customer service and customer retention: customers who are satisfied with service and who remain longer on their auto policy are more likely to accept new products offered by the company.

17 References
Berger, P. D., & Nasr, N. I. (1998). Customer lifetime value: Marketing models and applications. Journal of Interactive Marketing, 12(1), 17–30.
Klimberg, R., & McCullough, B. (2013). Fundamentals of Predictive Analytics with JMP, Second Edition. SAS Institute Inc.
Song, Y., & Lu, Y. (2015). Decision tree methods: Applications for classification and prediction. Shanghai Archives of Psychiatry, 27(2), 130–135.
Thank you for taking the time to view our presentation. We hope that you enjoyed this data story presented for XYMS Auto Insurance Company, and we look forward to discussing it with you.

