Published by Loraine Phillips. Modified over 9 years ago.
1
Analytical Model Development & Implementation: Experience from the Field
Bhavani Raskutti
2
Topics to be covered
- Model development & implementation process
- Case Study 1: Corporate Customer Modelling at Telcos
- Case Study 2: Sales Opportunities for wholesalers
- Take-Home Points
3
Model Development & Implementation Process
Business Problem, leading to a solution enabling business to make strategic & operational decisions
Model Development (iterative; DAP is about 90% of the effort)
- APD: Analytical Problem Definition
- DAP: Data Acquisition & Preparation (produces the Data Matrix)
- MM: Mathematical Modelling (Algorithms)
- MV: Model Validation
- P: Presentation (insights via GUI; decision-making by users)
- D: Deployment (automation, training, documentation, IT support)
5
Business Problem
- Large drops in margins & revenue in corporate customer base
- Partial churn of some corporate customers to other telcos
- Lack of understanding of customer’s needs
Verbatim from client’s presentation to stakeholders:
Project will target revenue improvement opportunities with an indicative $15 million in sales by:
- undertaking a rapid analysis of Customer data from core systems, including front of house, customer satisfaction and marketing
- for customers with a spend greater than $100k, excluding state and local government
- outcomes are to be validated using artificial intelligence tools and rigorous methodology by …
Using data analysis, increase revenue from corporate customers whose spend is > $100k
6
1. Analytical Problem Definition
Using data analysis, increase revenue from corporate customers whose spend is > $100k
Increase revenue from corporate customers by
- Win-back (database look-up)?
- Churn reduction?
- Up-sell/cross-sell to an existing customer?
Customer data
- Relationship with customer
  – Customer satisfaction survey data
  – Service assurance data (customer complaints)
- Demographic information about business customer
  – Industry segment information
  – Number of sites
- Revenue from customer
  – Quarterly revenue from different products
Create models to predict up-sell based on revenue data
7
2. Data Acquisition & Processing
Using revenue data, create models to predict customers likely to take up a specific product
Population:
- Customers in a segment who currently do not have the product being modelled
Target or positive case definition:
- Customers in the segment who take up the product within a time period
Predictors for modelling
8
Population and Target Definition
- Let $r_{i,P}$ be the revenue from a customer on product P in billing period i
- Population in period i includes all customers with $r_{i-1,P} = 0$
- Target (product take-up) in period i iff $r_{i-1,P} = 0$ and $r_{i,P} > TU_{MIN}$
  – $TU_{MIN} > 0$ is the minimum take-up amount determined by the business
Timeline (figure):
- TRAIN: customers with $r_{i-1,P} = 0$; predictors from period i-1, labels from period i
- PREDICT: customers with $r_{i,P} = 0$; predictors from period i, labels for period i+1
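The population and target rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the customer records and the TU_MIN value are hypothetical, not taken from the project.

```python
def label_take_up(prev_rev, curr_rev, tu_min):
    """Label one customer for product P in period i.

    Returns None when the customer is outside the population
    (non-zero revenue on P in period i-1), otherwise True/False
    for take-up (revenue in period i above tu_min).
    """
    if prev_rev != 0:
        return None  # already has the product, so not in the population
    return curr_rev > tu_min


# Hypothetical data: customer -> (r_{i-1,P}, r_{i,P})
customers = {"A": (0.0, 500.0), "B": (0.0, 20.0), "C": (300.0, 900.0)}
TU_MIN = 100.0  # assumed value; the business determines the real threshold

labels = {c: label_take_up(prev, curr, TU_MIN)
          for c, (prev, curr) in customers.items()}
print(labels)
```

Customer B is in the population but spends below the threshold, so it counts as a negative example rather than a take-up.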
9
Low take-up rates: not enough targets
Average number of take-ups for any product in any period is small
- Large businesses
  – Less than 20 take-ups in a period for 70 of the 100+ products
  – Less than 10 take-ups for 45 products
- Medium businesses
  – Less than 20 take-ups for 71 products
  – Less than 10 take-ups for 60 products
Reasons
- “Niche” products
- Saturated products
10
Low take-up rates (cont’d)
Aggregate data over multiple billing periods k
- Product take-up in periods i to i+k-1 iff $r_{i-j,P} = 0$ for $j = 1..k$ and $\sum_{j=0}^{k-1} r_{i+j,P} > k \cdot TU_{MIN}$
Impact of data aggregation (figure: minimum take-ups (n) for modelling)
- k=2 is useful
Timeline for k=2 (figure):
- TRAIN: predictors from periods i-3, i-2; labels from periods i-1, i; target: $r_{i-j,P} = 0$, $j = 0..1$
- PREDICT: predictors from periods i-1, i; labels for periods i+1, i+2; predict if $r_{i+j,P}$ = 0 or 1, $j = 1..2$
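A minimal Python sketch of the aggregated target follows. It assumes the take-up condition compares the summed revenue over the k label periods with k times TU_MIN; the summation sign does not survive in the slide text, so this reading is an assumption. The function name and data are hypothetical.

```python
def aggregated_take_up(revenue, i, k, tu_min):
    """Aggregated take-up label for one customer on product P.

    revenue: list of revenues on product P, indexed by billing period.
    Returns None if the customer had revenue on P in any of the k
    periods before i (outside the population), else True/False
    depending on whether revenue summed over periods i..i+k-1
    exceeds k * tu_min.
    """
    if i - k < 0 or i + k > len(revenue):
        raise ValueError("not enough billing periods around i")
    if any(revenue[i - j] != 0 for j in range(1, k + 1)):
        return None
    return sum(revenue[i + j] for j in range(k)) > k * tu_min


# Hypothetical revenue history for periods 0..4
history = [0.0, 0.0, 0.0, 150.0, 100.0]
print(aggregated_take_up(history, i=3, k=2, tu_min=100.0))
```

With k=2 a customer who ramps up slowly over two periods can still count as a take-up, which is one way aggregation increases the number of targets.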
11
Low take-up rates (cont’d)
Use of time interleaving
- Aggregate data with k=2
- Generate 3 sets of data moved forward by a period
- Concatenate the 3 sets to get 3 times as much training data as for data aggregation with k=2
Impact of time interleaving
- Time interleaving enormously enhances modellability
Timeline (figure):
- TRAIN set 1: predictors from periods i-5, i-4; labels from periods i-3, i-2
- TRAIN set 2: predictors from periods i-4, i-3; labels from periods i-2, i-1
- TRAIN set 3: predictors from periods i-3, i-2; labels from periods i-1, i
- PREDICT: predictors from periods i-1, i; labels for periods i+1, i+2
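The interleaved windows can be generated mechanically. The sketch below (the function name is hypothetical) returns the predictor and label periods of each training set, expressed relative to the current period i, and reproduces the three TRAIN windows from the timeline.

```python
def interleaved_windows(i, k=2, n_sets=3):
    """Return [(predictor_periods, label_periods), ...] for time interleaving.

    Each set uses k label periods preceded by k predictor periods.
    Successive sets are moved forward by one billing period, and the
    last set has its labels ending in period i.
    """
    sets = []
    for shift in range(n_sets - 1, -1, -1):
        end = i - shift  # last label period of this set
        label_periods = list(range(end - k + 1, end + 1))
        predictor_periods = list(range(end - 2 * k + 1, end - k + 1))
        sets.append((predictor_periods, label_periods))
    return sets


# Periods relative to the current period i = 0
for predictors, labels in interleaved_windows(0):
    print("predictors", predictors, "labels", labels)
```

Concatenating the rows built from each window gives roughly three times the training data of a single k=2 window, at the cost of overlapping label periods between sets.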
12
Predictors for Modelling
Revenue predictors used
- $r_{i-3,Q}$: revenue for all products Q in billing period i-3
- Change in revenue from period i-3 to i-2, $r_{i-3,Q} - r_{i-2,Q}$
- Projected revenue for period i-1, $2r_{i-3,Q} - r_{i-2,Q}$
All revenue predictors used both as raw values, and normalised by total customer revenue
Binary predictors indicating churn/take-up in period i-2
All continuous predictors converted to binary using 10 equisize bins
- Overcomes the negative impact of large variance in revenues
- Allows generation of non-linear models using linear techniques
Timeline (figure): predictors from periods i-3, i-2; labels from periods i-1, i; TRAIN target: $r_{i-j,P} = 0$, $j = 0..1$
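A rough Python sketch of the predictor construction is given below, under two assumptions that are not confirmed by the slide. First, change and projection use the usual "later minus earlier" convention (linear extrapolation), whereas the slide writes the two periods in the opposite order. Second, "equisize" is read as equal-frequency binning, so each bin holds about the same number of customers. All names are hypothetical.

```python
def revenue_features(r_earlier, r_later):
    """Raw, change and projected revenue for one product over two periods."""
    change = r_later - r_earlier
    projected = r_later + change  # linear extrapolation to the next period
    return {"raw": r_earlier, "change": change, "projected": projected}


def equal_frequency_bins(values, n_bins=10):
    """Assign each value a bin index so bins hold roughly equal counts.

    Ties are split by rank, so equal values can land in adjacent bins.
    """
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def one_hot(bin_index, n_bins=10):
    """Turn a bin index into n_bins binary predictors."""
    return [1 if b == bin_index else 0 for b in range(n_bins)]


features = revenue_features(100.0, 150.0)
revenues = [float(v) for v in range(20, 0, -1)]  # hypothetical revenues
bins = equal_frequency_bins(revenues)
encoded = [one_hot(b) for b in bins]
print(features, bins)
```

Because each bin becomes its own binary predictor, a linear model can assign an independent weight to each revenue band, which is how binning yields a non-linear response from a linear technique.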
13
3. Mathematical Modelling
Imbalance in class sizes
- Large businesses
  – 51 products with < 0.5% take-up on average
  – 25 products with < 0.1% take-up
- Medium businesses
  – 74 products with < 0.5% take-up on average
  – 54 products with < 0.1% take-up
Maximisation of total take-up revenue
- Identifying new high value customers is a priority
- Extent of variance
  – Take-up amounts range from $TU_{MIN}$ to over a million dollars
  – Take-up amounts are not correlated with total revenue in previous billing periods
14
Imbalance in class sizes
Use of Support Vector Machines (SVMs) instead of decision trees, neural nets or logistic regression
- Based on Vapnik’s statistical learning theory
- Maximises the margin of separation between two classes
Two different SVM implementations
- $SVM_{std}$: equal weight to all training examples
- $SVM_{bal}$: class dependent weights so all take-ups have a higher weight than all non-take-ups
  – $m_+$ and $m_-$: number of +ve and -ve examples
  – $C_+$ and $C_-$: weight of +ve and -ve examples
15
Identifying high value take-up
$SVM_{val}$: SVM with different weights for different positive (take-up) training examples
- All take-up examples have a higher weight than all the non-take-up examples (as for $SVM_{bal}$)
- Each take-up training example has a weight proportional to the amount of take-up
  – $m_+$ and $m_-$: number of +ve and -ve examples
  – $C_-$: weight of -ve examples
  – $TU(i)$: take-up amount of the i-th +ve example
  – $C_+(i)$: weight of the i-th +ve example
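The weighting formulas themselves do not survive in the slide text, so the following Python sketch shows only one plausible scheme that satisfies the two stated properties. It assumes the balanced weight is C_- scaled by m_-/m_+, and that the value-based weight scales the balanced weight by TU(i)/TU_MIN. These formulas and all names are assumptions, not the author's definitions.

```python
def balanced_weights(labels, c_neg=1.0):
    """Class-dependent weights: positives share the total weight of negatives."""
    m_pos = sum(1 for y in labels if y)
    m_neg = len(labels) - m_pos
    c_pos = c_neg * m_neg / m_pos  # assumed balancing rule
    return [c_pos if y else c_neg for y in labels]


def value_weights(labels, take_up, tu_min, c_neg=1.0):
    """Per-example weights proportional to the take-up amount.

    Every take-up amount exceeds tu_min by definition, so each positive
    weight is above the balanced weight, which in turn exceeds c_neg
    whenever negatives outnumber positives.
    """
    base = balanced_weights(labels, c_neg)
    return [w * tu / tu_min if y else w
            for w, y, tu in zip(base, labels, take_up)]


# Hypothetical training set: 2 take-ups among 10 customers
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
take_up = [200.0, 0, 0, 0, 1000.0, 0, 0, 0, 0, 0]
bal = balanced_weights(labels)
val = value_weights(labels, take_up, tu_min=100.0)
print(bal, val)
```

Per-example weights of this kind can be passed to SVM implementations that accept a sample weight for each training example.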
16
4. Model Validation
Model assessment
- Two tests for assessing quality of models (~4,000 models)
  – 10-fold cross validation tests to determine the best of the 3 SVMs
  – Tests in production setting to evaluate time interleaving
- All tests on 30 product take-up prediction problems in 4 segments
- Performance measures on unseen test set
  – Area under receiver operating characteristic curve (AUC): measures quality of sorting; decision threshold independent metric
  – Value weighted AUC (VAUC): indicates potential revenue from the sorting
$SVM_{val}$ with time interleaved data is used for generating models
- $SVM_{val}$ significantly more accurate than the other two
- Time interleaving produces more stable models
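Both measures can be computed by comparing every take-up with every non-take-up. In the Python sketch below, AUC is the standard pairwise ranking statistic. The VAUC definition is an assumption, since the slides give none: each pair is weighted by the take-up amount of its positive example, so ranking a large take-up correctly counts for more.

```python
def pair_score(pos_score, neg_score):
    """1 if the positive is ranked above the negative, 0.5 on a tie, else 0."""
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0


def auc(scores, labels):
    """Fraction of (take-up, non-take-up) pairs ranked in the right order."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    total = sum(pair_score(p, n) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def value_weighted_auc(scores, labels, values):
    """Pairwise AUC with each pair weighted by the positive's take-up value."""
    neg = [s for s, y in zip(scores, labels) if not y]
    num = den = 0.0
    for s, y, v in zip(scores, labels, values):
        if y:
            num += v * sum(pair_score(s, n) for n in neg)
            den += v * len(neg)
    return num / den


# Hypothetical test set
scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
values = [1000.0, 0.0, 100.0, 0.0]
print(auc(scores, labels), value_weighted_auc(scores, labels, values))
```

In this example the model misranks only the small take-up, so the value-weighted measure is higher than the plain AUC.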
17
Model Validation by Business
Predictive models identify more sales opportunities than were identified manually
- 3 times as many in large businesses segment
- 5 times as many in medium businesses segment
Results for 2 different regions in medium businesses
- Region 1: Predictions for just 5 products generated 9 new opportunities with an increase in revenue of ~A$400K
- Region 2: Predictions identified opportunities that were already being processed by sales consultants
Predictive modelling spreads the techniques of good sales teams across the whole organisation
18
5. Presentation
Output automatically generated as an Excel spreadsheet
One customer list per segment with:
- Take-up likelihood for all modelled products
- Last quarter revenue for all products
19
6. Deployment
- Implementation in Matlab & C with output in Excel
- Automatic quarterly updates of model after consolidated revenue figures are available
- Models for ~50 products for each of the 4 business segments
- Output delivered to business analytics group
  – Different cut-offs for different products/regions
  – Superimposition of other data for filtering/sorting
- Use of output by sales consultants for renegotiating contracts with customers
20
Project Timeline
- Initial approach to data availability for pilot: 12 weeks
- Data to pilot: 6 weeks
- Model validation by business: 12 weeks
- Pilot deployment (5 products, 1 segment): 6 weeks
- Acceptance by business teams: over 9 months
- Final deployment: 4 weeks
In operation for more than 8 years!!
21
Key Success Factors
Willingness of stakeholders to try non-standard solutions
Innovative solution: paper published in KDD 2005
- Target definition using multiple overlapping time periods to boost the number of rare events for modelling
- Use of support vector machines for customer analytics
Being lazy
- Scope change from 4 to 50 products
- Scope change from 2 to 4 segments
- Development of ~200 predictive models in one shot
- No stale models in production
Working with business analysts to instigate change:
- Product-centric modelling to customer-centric product packaging
23
Case Study: Wholesale Sales
Process stages: APD, DAP, MM, MV, P, D
- Sales demand
- Similar products @ similar outlets have similar demand to sales relationship
- Anomaly may be due to lack of stock
Business problem: Increase wholesale sales into major retailers
Analytical problem definition
- Quantify demand
- Define normalised sell-rate
- Define a long term in-stock measure
- Define products & outlets that are similar
Data
- Weekly SOH & sales for each store & SKU
- SKU master
- Store master
Modelling
- Simple univariate regression in SQL
- Perform comparisons & find anomalies with stock issues
Validation
- Absolute error
- Validate with retail
Presentation
- Self-serve report for each sales rep
- Presents list of products with sales opportunities
- Click thru’ to detailed graphs
24
Case Study: Wholesale Sales (Cont’d)
Sell rate vs Consumer Demand plot (figure: sell rate and in-stock % against demand, for retailers R1 and R2)
- Each point is a store
- R1 & R2 are comparable retailers
- Values for the same product
Possible reasons for difference
- Competing product at R2
- Pricing at R2 vs R1
- Lack of stock at R2
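The comparison behind the plot can be sketched as a univariate least-squares fit of sell rate on demand for one retailer, followed by a check of which stores at the comparable retailer fall well below that line. The project did this in SQL; Python is used here only for illustration, and the store data, tolerance and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_shortfalls(stores, slope, intercept, tolerance):
    """Names of stores whose sell rate is below the fitted line by more than tolerance."""
    return [name for name, demand, sell_rate in stores
            if slope * demand + intercept - sell_rate > tolerance]


# Hypothetical data for one product: retailer R1 sets the reference line
r1_demand = [10.0, 20.0, 30.0, 40.0]
r1_sell_rate = [5.0, 10.0, 15.0, 20.0]
slope, intercept = fit_line(r1_demand, r1_sell_rate)

# Stores at comparable retailer R2: (name, demand, sell rate)
r2_stores = [("s1", 20.0, 9.5), ("s2", 30.0, 8.0)]
candidates = flag_shortfalls(r2_stores, slope, intercept, tolerance=2.0)
print(slope, intercept, candidates)
```

A flagged store is only a candidate: as the slide notes, the gap may come from a competing product or pricing rather than lack of stock, so the in-stock measure still needs to be checked.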
25
Case Study: Wholesale Sales (Cont’d)
Process stages: APD, DAP, MM, MV, P, D
- Sales demand
- Similar products @ similar outlets have similar demand to sales relationship
- Anomaly may be due to lack of stock
Business problem: Increase wholesale sales into major retailers
Analytical problem definition
- Quantify demand
- Define normalised sell-rate
- Define a long term in-stock measure
- Define products & outlets that are similar
Data
- Weekly SOH & sales for each store & SKU
- SKU master
- Store master
Modelling
- Simple univariate regression in SQL
- Perform comparisons & find anomalies with stock issues
Validation
- Absolute error
- Validate with retail
Presentation
- Self-serve report for each sales rep
- Presents list of products with sales opportunities
- Click thru’ to detailed graphs
Deployment
- SQL & Cognos
- Automatic weekly updates
- Training by corporate training team
- Support from IT helpdesk
27
Take-home points
Data acquisition & processing phase forms 80-90% of any analytics project
Business users are tool agnostic
- R, SAS, Matlab, SPSS, … for statistical analysis
- Tableau, Cognos, Excel, VB, … for presentation
Business adoption of analytics driven by
- Utility of application
- Validation of results by using real-life cases
- Ease of decision-making from insights
- Ability to explain insights