Evaluating Insurance Marketing Strategies through Data Mining
Spring 2016 – BIT 5534 Group Project
Group 3: Thomas George, Shailesh Mittal, Lonnie Elrod Jr.
Read the title. Read the group members’ names.
Business Objective
Business Strategy: The company’s business strategy is to offer a wide range of insurance policies, mostly with higher-end premiums that in turn return high benefits.
Business Problem: The company needs to identify the correct demographic of potential customers who can afford a higher-premium offering, using a dataset of employee salaries from the city of San Francisco, CA.
For this project, we examine the business-related data mining activities necessary for a successful launch of a start-up insurance company named ‘LST-Insurances Inc.’. The company offers a wide range of insurance policies, most with higher premiums that in turn return high benefits, concentrated mainly in the health, retirement, and related areas. Before devising the marketing strategy for these high-premium offerings, the company needs strategic information about the salary range of the targeted groups, specifically which group or groups can afford such a product. Therefore, the company’s basic aim is to obtain a detailed analysis of the salaries of different employee groups and to identify their existing benefits, such as health and retirement. As the start-up will be based in San Francisco, CA, a city-wide salary dataset was obtained and subjected to an intensive data mining analysis. The ultimate goal of the data mining effort was to identify one or more classification models capable of accurately identifying employee groups whose total benefits received exceed the lower threshold of $100,000.
Additional Problems to Explore
Should the firm’s new insurance plans be marketed to people with previously held or current benefit plans?
Can a different approach be adopted to sell insurance plans based on the nature of the work one performs?
Can we define customized schemes for different employee groups?
Which group should be targeted first to maximize the success of the firm’s entry into the market?
Can we identify the premium policy price or prices?
While analyzing this business problem, we identified additional questions that should be explored before any definitive solution is declared. These include, but are not limited to, the following. Read each question…
Analytical Goal
Bridge the gap between current action and business strategy
Identify the target audience
Identify the predictive model of the premium to be offered to the target audience
The analytical goal of any business problem is to move the business from its current landscape or configuration to one where expected profits are obtained by implementing a change in its business strategy. In our case, since the company is a start-up with no existing frame of reference, we must determine which city employee groups should be targeted with our products. Since we are essentially bridging the gap from no strategy at all to a comprehensive marketing strategy, we must rely on a significant data mining exercise to identify the correct target audience.
Data Description
[Chart: average total compensation by organization group code]

Organization Group Code  Group Name                                 Average Total Compensation  Population Size
1                        Public Protection                          $175,965.20                 5,966
6                        General Administration & Finance           $151,676.10                 1,609
4                        Community Health                           $150,442.80                 3,860
2                        Public Works, Transportation & Commerce    $141,729.70                 6,995
5                        Culture & Recreation                       $127,497.50                 653
3                        Human Welfare & Neighbourhood Development  $126,125.60                 1,245

Group 1 (Public Protection) departments:
Sub No.  Department Name                     Department Code
1        Superior Court                      CRT
2        District Attorney                   DAT
3        Department of Emergency Management  ECD
4        Fire Department                     FIR
5        Juvenile Probation                  JUV
6        Police                              POL
7        Sheriff                             SHF

After careful consideration of various datasets with the business objective in mind, we chose a dataset released by the city government of San Francisco, California, obtained from the open data source ‘Kaggle.com’ and covering fiscal years 2013-2015, for the data mining activities. Because job-market and salary trends for the employee groups fluctuate from year to year, only the data for fiscal year 2015 are considered. The dataset provides details about the various employee groups and their associated salary components. The full dataset has a population size of 247,737 with 11 variables (see Appendix 1). Choosing only the 2015 subset decreases the population size to 20,329. Also, to support the initial salary threshold assumption, each record with a salary less than $100,000 is removed.
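The preparation steps on this slide can be sketched in Python. The sketch keeps only fiscal year 2015, drops records with a salary below $100,000, and then summarizes average total compensation and population size for each organization group. This is a minimal illustration, not the team's actual workflow. The record layout and field names (`year`, `org_group_code`, `salaries`, `total_compensation`) are hypothetical, and the records are toy values.

```python
from collections import defaultdict

# Hypothetical records: the field names and values are illustrative
# assumptions, not the actual columns or figures of the Kaggle dataset.
records = [
    {"year": 2015, "org_group_code": 1, "salaries": 120_000.0, "total_compensation": 180_000.0},
    {"year": 2015, "org_group_code": 1, "salaries": 95_000.0, "total_compensation": 130_000.0},
    {"year": 2014, "org_group_code": 6, "salaries": 140_000.0, "total_compensation": 190_000.0},
    {"year": 2015, "org_group_code": 6, "salaries": 110_000.0, "total_compensation": 150_000.0},
    {"year": 2015, "org_group_code": 1, "salaries": 130_000.0, "total_compensation": 170_000.0},
]


def prepare(rows, year=2015, min_salary=100_000.0):
    """Keep rows from the chosen fiscal year whose salary is at least min_salary."""
    return [r for r in rows if r["year"] == year and r["salaries"] >= min_salary]


def group_summary(rows):
    """Map each organization group code to (average total compensation, population size)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in rows:
        totals[r["org_group_code"]] += r["total_compensation"]
        counts[r["org_group_code"]] += 1
    return {code: (totals[code] / counts[code], counts[code]) for code in totals}


subset = prepare(records)
summary = group_summary(subset)
# List groups from highest to lowest average total compensation.
for code, (avg, size) in sorted(summary.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"group {code}: average ${avg:,.2f}, population {size}")
```

With these toy records, three rows pass the filter. Group 1 averages $175,000.00 over two employees, and group 6 averages $150,000.00 over one employee.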
Linear Regression Model
With total benefits as the target variable and four independent variables (department code, salaries, overtime, and other salaries), a linear regression model is developed (see Appendix 3). The model shows acceptable values for R-square and variance.
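A regression of this shape can be sketched with the standard library alone. The sketch makes several assumptions. The categorical department code enters the model as 0/1 dummy variables with one baseline department. The fit is ordinary least squares through the normal equations, and the result is reported as R-square. The department codes come from the Group 1 table, but every numeric value is invented. The helper names (`solve`, `design_row`, `fit_ols`) are hypothetical, and the sketch does not reproduce the Appendix 3 results.

```python
# Hypothetical rows: (department code, salaries, overtime, other salaries, total benefits),
# all amounts in thousands of dollars. The values are invented for illustration only.
rows = [
    ("FIR", 120.0, 30.0, 5.0, 44.1),
    ("FIR", 135.0, 10.0, 8.0, 46.2),
    ("FIR", 110.0, 45.0, 2.0, 41.6),
    ("POL", 125.0, 20.0, 10.0, 43.9),
    ("POL", 140.0, 5.0, 3.0, 43.4),
    ("POL", 115.0, 35.0, 7.0, 41.5),
    ("ECD", 105.0, 15.0, 4.0, 35.0),
    ("ECD", 130.0, 25.0, 6.0, 42.7),
]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def design_row(dept, salaries, overtime, other, levels):
    """Intercept, one 0/1 dummy per non-baseline department, then the numeric predictors."""
    return [1.0] + [1.0 if dept == lvl else 0.0 for lvl in levels[1:]] + [salaries, overtime, other]


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


levels = sorted({r[0] for r in rows})  # ["ECD", "FIR", "POL"]; "ECD" is the baseline
X = [design_row(d, s, o, oth, levels) for d, s, o, oth, _ in rows]
y = [r[4] for r in rows]
beta = fit_ols(X, y)
predicted = [sum(b * x for b, x in zip(beta, row)) for row in X]
print("coefficients:", [round(b, 3) for b in beta])
print("R-square:", round(r_squared(y, predicted), 3))
```

The normal equations keep this example dependency-free. For a dataset of the real size, a library least-squares routine (for example, one based on QR decomposition) would be the more numerically robust choice.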
Linear Regression Model (continued)
The parameter estimates show that the various department codes are significant (p-values below 0.05), which implies that any department can be targeted for an insurance premium at this level. The model predicts the total benefits of each employee. If an employee’s predicted total benefits are greater than the existing total benefits, that employee can be targeted for purchasing a new premium policy from LST-Insurances Inc. However, further analysis shows that another data mining model, discussed later, identifies the target group better.
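The targeting rule on this slide reduces to a per-employee comparison between predicted and existing total benefits. The following is a minimal sketch. It assumes the predictions come from a model such as the linear regression described on the previous slide, although here they are simply hard-coded. The employee IDs, field names, and amounts are hypothetical.

```python
def select_targets(employees, predicted_benefits):
    """Return the IDs of employees whose predicted total benefits exceed their current total benefits."""
    return [
        emp["id"]
        for emp, predicted in zip(employees, predicted_benefits)
        if predicted > emp["total_benefits"]
    ]


# Hypothetical employees and model predictions (thousands of dollars).
employees = [
    {"id": "E1", "dept": "FIR", "total_benefits": 40.0},
    {"id": "E2", "dept": "POL", "total_benefits": 45.0},
    {"id": "E3", "dept": "ECD", "total_benefits": 38.0},
]
predicted = [43.5, 44.0, 38.0]

print(select_targets(employees, predicted))  # ['E1']
```

The comparison is strict, so an employee whose prediction exactly matches their current benefits (E3 here) is not selected.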
Box Plot Interpretation
To identify which specific departments should be targeted, box plots of each predicted benefit of interest are created by department code (the box plots were generated in Tableau). The departments with the highest mean value for a given policy can be identified as the top target for that premium. A comprehensive package option can be offered as a trade-off to departments that are not the top value in any category. Based on these box plots, department FIR should be targeted for retirement and health/dental benefits, while department ECD should be targeted for other-benefits policies. Under this presumption, the remaining departments can be targeted based on the trade-off between these benefits and the total benefits. Therefore, the salary group to be targeted consists of employees whose predicted benefits are higher than their existing benefits, because these employees expect more benefits for their salary range.
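Choosing the department with the highest mean for each benefit category can be sketched as a group-by-and-rank. The same ranking also gives the next two favourable departments used for the leaf report. The field names and figures are hypothetical. They are chosen only so that the ranking resembles the slide's reading (FIR first for retirement and health/dental, ECD first for other benefits) and are not the project's data. A box plot's centre line usually marks the median, so note that this sketch follows the slide in ranking by mean.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical predicted benefits per employee (thousands of dollars); the values
# are invented so that the ranking mirrors the pattern described on the slide.
rows = [
    {"dept": "FIR", "retirement": 30.0, "health_dental": 16.0, "other_benefits": 9.0},
    {"dept": "FIR", "retirement": 32.0, "health_dental": 15.0, "other_benefits": 8.0},
    {"dept": "ECD", "retirement": 24.0, "health_dental": 13.0, "other_benefits": 12.0},
    {"dept": "ECD", "retirement": 22.0, "health_dental": 14.0, "other_benefits": 11.0},
    {"dept": "POL", "retirement": 28.0, "health_dental": 14.5, "other_benefits": 10.0},
    {"dept": "POL", "retirement": 27.0, "health_dental": 14.0, "other_benefits": 9.5},
]


def rank_departments(rows, benefit):
    """Return (department, mean benefit) pairs sorted from highest to lowest mean."""
    by_dept = defaultdict(list)
    for r in rows:
        by_dept[r["dept"]].append(r[benefit])
    means = {dept: mean(values) for dept, values in by_dept.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)


for benefit in ("retirement", "health_dental", "other_benefits"):
    ranking = rank_departments(rows, benefit)
    top, runners_up = ranking[0][0], [d for d, _ in ranking[1:3]]
    print(f"{benefit}: top target {top}, next two {runners_up}")
```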
Leaf Report
To better understand the relationship between the independent and target variables, decision trees (using a bootstrap model) are generated. For our problem, we generated four decision trees, one for each of the variables retirement benefits, health/dental benefits, other benefits, and total benefits, against our target insurance premiums. A leaf report is generated for all four cases to identify the predicted target parameter for each group identified from the box plots. From the box plots shown previously, we can see which are the next two most favourable groups in each case, based on their mean value. Once these groups are identified, the leaf report for each individual policy provides the salary range for those groups and the predicted value of the respective policy benefits.
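The following sketch illustrates the idea behind a bootstrap (bagged) decision tree and a leaf report in a much simplified form. It fits one-split regression trees ("stumps") on salary to bootstrap resamples and averages their predictions. It then prints a leaf report giving the salary range, count, and predicted benefit of each leaf of a stump fitted to the full data. This is a swapped-in simplification, not the team's model. A full bootstrap forest would typically grow deeper trees over several predictors and may also sample candidate predictors at each split. The function names, the 25-tree default, and all data values are hypothetical.

```python
import random
from statistics import mean

# Hypothetical data: salaries and one predicted policy benefit, in thousands of dollars.
salaries = [105.0, 110.0, 118.0, 125.0, 150.0, 160.0, 172.0, 185.0]
benefits = [30.0, 31.0, 33.0, 32.0, 45.0, 47.0, 46.0, 48.0]


def fit_stump(xs, ys):
    """Fit a one-split regression tree: return (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split is possible between equal salaries
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, left_mean, right_mean)
    if best is None:  # every salary identical: a single leaf
        return float("inf"), mean(ys), mean(ys)
    return best[1], best[2], best[3]


def bootstrap_stumps(xs, ys, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict(stumps, x):
    """Average the leaf values the bootstrap stumps assign to salary x."""
    return mean(left if x <= t else right for t, left, right in stumps)


def leaf_report(xs, ys):
    """Salary range, observation count and predicted benefit for each leaf of one stump."""
    t, left_mean, right_mean = fit_stump(xs, ys)
    leaves = [
        ([x for x in xs if x <= t], left_mean),
        ([x for x in xs if x > t], right_mean),
    ]
    return [
        {"salary_range": (min(m), max(m)), "count": len(m), "predicted_benefit": v}
        for m, v in leaves
        if m
    ]


for leaf in leaf_report(salaries, benefits):
    print(leaf)
stumps = bootstrap_stumps(salaries, benefits)
print("bagged prediction at 115k:", round(predict(stumps, 115.0), 2))
print("bagged prediction at 180k:", round(predict(stumps, 180.0), 2))
```

On this toy data, the full-data stump splits at a salary of 137.5 ($k). Its leaf report shows a $105k to $125k leaf predicting 31.5 and a $150k to $185k leaf predicting 46.5.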
Model Comparison
To identify the best model for prediction, three models are compared: linear regression, the bootstrap model, and neural networks. The results show that the decision tree bootstrap model is the best predictor of the three, based on R-square and other fit statistics. Hence, this model should be used for solving our business problem and for similar prediction activities on other datasets.
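A comparison by R-square can be sketched as follows: score each model's predictions against the same actual values and keep the highest. The model names follow the slide, but the numbers are invented placeholders, not the project's results. In practice, such scores are more trustworthy when computed on held-out (validation) records rather than on the training data.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def best_model(y, predictions_by_model):
    """Score each model's predictions by R-square and return (best name, all scores)."""
    scores = {name: r_squared(y, preds) for name, preds in predictions_by_model.items()}
    return max(scores, key=scores.get), scores


# Hypothetical actual benefits and each model's predictions (thousands of dollars).
actual = [40.0, 42.0, 45.0, 50.0]
predictions = {
    "linear regression": [41.0, 43.0, 44.0, 48.0],
    "bootstrap model": [40.5, 42.0, 45.5, 49.5],
    "neural network": [38.0, 44.0, 46.0, 47.0],
}

winner, scores = best_model(actual, predictions)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: R-square = {score:.3f}")
print("best:", winner)
```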
Conclusion
Identifying an appropriate strategy for a business plan when opening a new firm is always tedious work. In this business problem, we explored various data mining activities to guide the firm through the decision-making process. The steps we followed, from data preparation to the creation of various data mining models, form a logical path that any project team can follow when facing a business problem such as identifying potential customers. The results of this analysis can help the firm identify the appropriate target group and carry out a properly targeted marketing strategy on these groups to obtain the expected results.
Future
A price for the high-premium products was not identified in this project. This could be done by adopting a dataset that provides inputs on existing insurance policies, while taking into account the results obtained from the business problem explored here. Also, a more elaborate study could be performed on the other groups (2 and 6) to formulate a comprehensive business plan.