Lecture 10 MARK2039 Summer 2006 George Brown College Wednesday 9-12.


1 Lecture 10 MARK2039 Summer 2006 George Brown College Wednesday 9-12

2 Assignment 8: Geocoding example
Example:
–A retailer has the following information:
    Name and address of its customers
    Address of its stores
    Stats Can information
–As a marketer, how would you intelligently use this information?
    Get postal codes of customers and stores
    Get geocodes (latitude and longitude of each postal code)
    Calculate the distance between each customer and the nearest store
    Create a trading area around each store to determine the relevant customers for that store
    Identify the best stores and compare their demographics with those of the remaining stores
    Use the above learning to promote non-performing stores whose customer demographics resemble those of the best stores
    Use the above information to determine where to open or perhaps close stores
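The distance and trading-area steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's method: it assumes the haversine great-circle formula for distance, and the store names, coordinates, and radius are all made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used by the haversine formula


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_store(customer, stores):
    """Return (store_id, distance_km) for the store closest to the customer."""
    return min(
        ((store_id, haversine_km(*customer, *coords)) for store_id, coords in stores.items()),
        key=lambda pair: pair[1],
    )


def trading_area(customers, store_coords, radius_km):
    """Return the ids of customers located within radius_km of the store."""
    return [
        cust_id
        for cust_id, coords in customers.items()
        if haversine_km(*coords, *store_coords) <= radius_km
    ]


# Hypothetical geocodes (latitude, longitude) standing in for postal-code centroids.
stores = {"downtown": (43.65, -79.38), "north": (43.77, -79.41)}
customers = {"c1": (43.66, -79.39), "c2": (43.76, -79.40)}

for cust_id, coords in customers.items():
    store_id, dist = nearest_store(coords, stores)
    print(f"{cust_id}: nearest store {store_id} at {dist:.1f} km")

print(trading_area(customers, stores["downtown"], radius_km=5.0))
```

Once each customer is assigned to a store, the demographic comparison of best versus remaining stores is a matter of grouping customers by their assigned store.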

3 Assignment 8
Why do we look at correlation analysis as our first statistical exercise in the data mining process?
–It allows us to initially use statistics as a prescreening tool for eliminating variables from the data mining exercise.

4 Assignment 8
Give me an example of a correlation table of 5 variables where two variables are significant and three variables are not significant. Provide correlation values that support your results.

5 Recapping from last week
Geocoding
–What are the key things to think of? Look at the answer from two slides ago. Geocoding gives us numbers to calculate the distance between two postal codes.
More material on correlation analysis
How do EDA reports tie into the correlation analysis?
–They are trend-like reports which demonstrate why a given variable has a strong relationship with the objective function.
How should we present the final results of a model? How is the above derived?
–From the partial R² of each variable divided by the total R² of the equation.

6 Notion of Lift
What is lift: the performance of a group relative to the performance of the benchmark, i.e. (targeted rate / benchmark rate) x 100.
Examples:

Type of Activity                      Untargeted/Benchmark   Targeted/Challenger   Lift
Acquisition Campaign Response Rate    1%                     2%                    200
Retention Campaign Churn Rate         15%                    25%                   166
Credit Card Loss Rate                 5%                     8%                    160
Product Affinity Rate                 10%                    30%                   300

The targeted group represents those names as determined by a data mining tool such as a predictive model.

7 Notion of Lift
Examples of cases where lift is below 100:

Type of Activity                      Untargeted/Benchmark   Targeted/Challenger   Lift
Acquisition Campaign Response Rate    1%                     .5%                   50
Retention Campaign Churn Rate         15%                    10%                   66
Credit Card Loss Rate                 5%                     2%                    40
Product Affinity Rate                 10%                    6%                    60
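The lift figures in the two tables can be reproduced with a one-line calculation. The sketch below assumes lift is the targeted rate divided by the benchmark rate, times 100; the function name is illustrative. Note that the slides appear to truncate rather than round (166 and 66 rather than 167 and 67).

```python
import math


def lift_index(benchmark_rate, targeted_rate):
    """Lift of the targeted group relative to the benchmark, indexed to 100."""
    return 100 * targeted_rate / benchmark_rate


# (activity, benchmark rate, targeted rate) taken from the lift tables above.
examples = [
    ("Acquisition response rate", 0.01, 0.02),
    ("Retention churn rate", 0.15, 0.25),
    ("Credit card loss rate", 0.05, 0.08),
    ("Product affinity rate", 0.10, 0.30),
    ("Acquisition response rate (below benchmark)", 0.01, 0.005),
]

for activity, benchmark, targeted in examples:
    print(f"{activity}: lift {lift_index(benchmark, targeted):.1f}")
```

A lift above 100 means the targeted group shows more of the behaviour than the benchmark; a lift below 100 means it shows less.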

8 Validating the Model: Example of a Gains Chart
Listed below are the hard numbers that might comprise a lift curve.
–Revenue per order is $60.
–Cost of 1 mail piece is $.855.
–Benefits of modelling are the foregone promotion costs from promoting fewer names to achieve a given # of orders at a higher response rate.

% of List (Ranked    Validation      Cum. Resp.   Cum. % of    Cum.    Interval   Benefits
by Model Score)      Mail Quantity   Rate         all Resp.    Lift    ROI
0-10%                20,000          3.50%        23.33%       233     145%       $22799
10-20%               40,000          3.00%        40%          200     75%        $34200
20-30%               60,000          2.75%        55%          183     58%        $42750
30-40%               80,000          2.50%        67%          167     23%        $45600
40-50%               100,000         2.25%        75%          150     -12.2%     $42750
...
90-100%              200,000         1.50%        100%         100     -58%       $0

How might this be plotted? In class we saw this as a straight, decreasing line when plotting interval response rate against the deciles. If we plot the cum. % of responders, the shape is a parabola-type curve, with a larger parabola representing a better model. Likewise, a steeper slope when plotting interval response rate against the deciles represents a stronger model.

9 Validating the Model: Calculating the metrics on the gains charts
Cum. % of responders in top 10%:
–Total responders: 200,000 x 1.5% = 3000
–# of responders in top 10%: 20,000 x 3.5% = 700
–Cum. % in top 10%: 700/3000 = 23%
Cum. lift in top 10%:
–Average response rate: 1.5%
–Cum. response rate in top 10%: 3.5%
–Cum. lift: 3.5%/1.5% x 100 = 233
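The same two metrics can be checked programmatically. This is a small sketch using the figures from the gains chart (200,000 names, 1.5% overall response rate); the function and constant names are illustrative.

```python
TOTAL_MAILED = 200_000   # full validation list, from the gains chart
OVERALL_RATE = 0.015     # average response rate across the whole list


def cumulative_metrics(cum_mailed, cum_rate, total_mailed=TOTAL_MAILED, overall_rate=OVERALL_RATE):
    """Return (cum. % of all responders, cum. lift) at a given mailing depth."""
    total_responders = total_mailed * overall_rate
    cum_responders = cum_mailed * cum_rate
    pct_of_responders = 100 * cum_responders / total_responders
    cum_lift = 100 * cum_rate / overall_rate
    return pct_of_responders, cum_lift


# Top 10% of the list: 20,000 names responding at 3.5%.
pct, lift = cumulative_metrics(20_000, 0.035)
print(f"Cum. % of responders: {pct:.2f}%, cum. lift: {lift:.0f}")
```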

10 Calculating the metrics on the gains charts
Interval ROI in 10%-20%:
–# of persons mailed: 20,000
–# of responders in 10%-20%: (40% - 23.33%) x 3000 = 500
–Costs: .855 x 20,000 = 17100
–Net revenue: (500 x 60) - 17100 = 12900
–ROI: 12900/17100 = 75%
Calculating the Benefits column at 30%:
–Mailing costs to achieve 1650 responders without modelling: ((.0275 x 60,000)/.015) x .855 = 94050
–Mailing costs with modelling: 60,000 x .855 = 51300
–Benefits: 94050 - 51300 = $42750
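The two calculations above translate directly into code. The sketch below follows the slide's arithmetic (revenue of $60 per order, $.855 per mail piece); the function names are illustrative rather than taken from the course material.

```python
def interval_roi(interval_mailed, interval_responders, revenue_per_order, cost_per_piece):
    """Net revenue of one interval divided by its mailing cost."""
    cost = interval_mailed * cost_per_piece
    net_revenue = interval_responders * revenue_per_order - cost
    return net_revenue / cost


def modelling_benefit(cum_mailed, cum_rate, overall_rate, cost_per_piece):
    """Mailing cost saved by reaching the same responders with fewer names."""
    responders = cum_mailed * cum_rate
    # Names that would need to be mailed at the average rate to get as many responders.
    names_without_model = responders / overall_rate
    return (names_without_model - cum_mailed) * cost_per_piece


# Interval 10%-20%: 20,000 names mailed, 500 responders.
print(f"Interval ROI: {interval_roi(20_000, 500, 60, 0.855):.0%}")

# Benefits at a 30% depth: 60,000 names at a cumulative 2.75% response rate.
print(f"Benefits: ${modelling_benefit(60_000, 0.0275, 0.015, 0.855):,.0f}")
```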

11 Gains Chart Examples
Assume a mail cost of $1.00 per piece and a revenue per order of $50.00. Please fill in the blanks for the first 4 rows.

Cum. # of Names   Cum. Response   Interval     Interval   Benefits   Interval
Mailed            Rate            Resp. Rate   Lift                  ROI
10,000            2.50%           2.5%         250        $15,000    25%
20,000            2.25%           2.0%         200        $25,000    0
30,000            2.10%           1.8%         180        $33,000    -10%
40,000            1.80%           0.9%         90         $32,000    -55%
....
100,000           1%

Interval resp. rate for the first row: 10,000 x 0.025 = 250 responders = 2.5%. Each later interval rate is the change in cumulative responders divided by the 10,000 names added in that interval.
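One way to fill in a table like this is to derive each interval figure from the change in the cumulative figures. The sketch below assumes the slide's inputs ($1.00 per piece, $50.00 per order, 1% overall response rate); the function name and row layout are illustrative.

```python
def gains_table(cum_mailed, cum_rates, overall_rate, cost_per_piece, revenue_per_order):
    """Derive interval response rate, lift, benefits and ROI from cumulative inputs."""
    rows = []
    prev_mailed, prev_responders = 0, 0.0
    for mailed, rate in zip(cum_mailed, cum_rates):
        cum_responders = mailed * rate
        interval_mailed = mailed - prev_mailed
        interval_responders = cum_responders - prev_responders
        interval_rate = interval_responders / interval_mailed
        interval_cost = interval_mailed * cost_per_piece
        rows.append({
            "cum_mailed": mailed,
            "interval_rate": interval_rate,
            "interval_lift": 100 * interval_rate / overall_rate,
            # Cost of the extra names needed at the average rate for the same responders.
            "benefits": (cum_responders / overall_rate - mailed) * cost_per_piece,
            "interval_roi": (interval_responders * revenue_per_order - interval_cost) / interval_cost,
        })
        prev_mailed, prev_responders = mailed, cum_responders
    return rows


rows = gains_table(
    cum_mailed=[10_000, 20_000, 30_000, 40_000],
    cum_rates=[0.025, 0.0225, 0.021, 0.018],
    overall_rate=0.01,
    cost_per_piece=1.00,
    revenue_per_order=50.00,
)
for row in rows:
    print(
        f"{row['cum_mailed']:>7,}  rate {row['interval_rate']:.1%}  "
        f"lift {row['interval_lift']:.0f}  benefits ${row['benefits']:,.0f}  "
        f"ROI {row['interval_roi']:.0%}"
    )
```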

12 Lift Curve with Zero Model Effectiveness
What does this look like if we plot it on a lift curve?
–A straight line rather than a parabola if we plot cum. % of responders.
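To see why the curve becomes a straight line, compare the cumulative capture of a model with no rank ordering against a model that does rank order. In the sketch below, the first five ranked interval rates are implied by the earlier gains chart; the last five are made up so that the overall rate stays at 1.5%.

```python
def cumulative_capture(interval_rates):
    """Cum. % of all responders captured after each equally sized interval."""
    total = sum(interval_rates)
    running, curve = 0.0, []
    for rate in interval_rates:
        running += rate
        curve.append(100 * running / total)
    return curve


# Zero effectiveness: every decile responds at the overall rate.
flat = cumulative_capture([0.015] * 10)

# Ranked model: deciles 1-5 follow the gains chart, deciles 6-10 are hypothetical.
ranked = cumulative_capture(
    [0.035, 0.025, 0.0225, 0.0175, 0.0125, 0.010, 0.009, 0.008, 0.006, 0.0045]
)

for decile, (f, r) in enumerate(zip(flat, ranked), start=1):
    print(f"decile {decile:>2}: no model {f:5.1f}%   ranked model {r:5.1f}%")
```

With no model, mailing 10% of the list captures 10% of responders, 20% captures 20%, and so on, which plots as a diagonal line. The ranked model bows above that line.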

13 Gains Chart Examples
What is the best model?
–Model 1
What is the worst model?
–Model 4
What are the Model 3 results telling you?
–We have some rank ordering all the way down to 70,000 names and then the model flattens out. We may need a strategy here for this bottom segment.

14 Gains Chart Examples
In each response model case, answer the following questions:
Where would your cutoff be with a budget of $80,000 and a cost per piece of $2.00?
–40,000 names
Where would your cutoff be if you needed to attain a forecasted order quantity of 350?
–Between 10,000 and 20,000 names for models 1 and 2, between 20,000 and 30,000 for model 3, and between 30,000 and 40,000 for model 4
Where would your optimum cutoff be, presuming that neither budget nor forecasted order quantities were constraints?
–50,000 for models 1 and 2, and 60,000 for model 3
–It does not matter for model 4
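The three cutoff rules can be expressed as small functions over a cumulative gains table. The model charts from the slide are not reproduced here, so the gains table and the $100 revenue per order below are hypothetical; only the $80,000 budget, the $2.00 cost per piece, and the 350-order target come from the slide.

```python
# Hypothetical cumulative gains: (cum. names mailed, cum. orders), ranked by model score.
HYPOTHETICAL_GAINS = [(10_000, 300), (20_000, 520), (30_000, 690), (40_000, 800), (50_000, 860)]


def budget_cutoff(budget, cost_per_piece):
    """Largest number of names the budget can pay for."""
    return int(budget // cost_per_piece)


def quantity_cutoff(gains, target_orders):
    """First mailing depth at which cumulative orders reach the target, else None."""
    for cum_mailed, cum_orders in gains:
        if cum_orders >= target_orders:
            return cum_mailed
    return None


def profit_cutoff(gains, cost_per_piece, revenue_per_order):
    """Deepest interval that still covers its own mailing cost.

    Assumes intervals are ranked so that response declines with depth.
    """
    cutoff, prev_mailed, prev_orders = 0, 0, 0
    for cum_mailed, cum_orders in gains:
        interval_cost = (cum_mailed - prev_mailed) * cost_per_piece
        interval_revenue = (cum_orders - prev_orders) * revenue_per_order
        if interval_revenue < interval_cost:
            break
        cutoff, prev_mailed, prev_orders = cum_mailed, cum_mailed, cum_orders
    return cutoff


print(budget_cutoff(80_000, 2.00))
print(quantity_cutoff(HYPOTHETICAL_GAINS, 350))
print(profit_cutoff(HYPOTHETICAL_GAINS, 2.00, 100.00))
```

Because the table is stepped in 10,000-name intervals, `quantity_cutoff` returns the upper bound of the interval in which the target is reached, which matches the "between 10,000 and 20,000" style of answer on the slide.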

15 Gains Chart Examples
Calculate the following:
–Interval names mailed
–Cum. response rate
Assuming a cost per name of $1.50 and revenue per responder of $75, calculate the interval ROI and the modelling benefits for each interval.

16 Tracking of Models
Two models are used in two campaigns. In campaign A, the overall response rate is 3.5%, which is above the breakeven response rate of 2%. In campaign B, the overall response rate is 1.2%, which is below the breakeven response rate of 2%. Yet the model in campaign B is more effective. Explain why.
–The model is rank ordering names quite well for campaign B (1.2% overall), while the better campaign overall (3.5%) exhibits no rank ordering of response rate between deciles.

17 CHAID
"CHAID" is an acronym for Chi-square Automatic Interaction Detection
Produces a decision-tree-like report
–Branches and nodes
Non-parametric approach
–Output of the routine is a segment or group as opposed to a score
Uses chi-square statistics to determine statistically significant breaks
Conceptual interpretation: (Observed-Expected)/Expected; the chi-square statistic itself sums (Observed-Expected)²/Expected over all cells
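As a rough illustration of the chi-square idea behind a CHAID split, the sketch below computes the Pearson chi-square statistic for a table of segments by response outcome. It is not a CHAID implementation (no category merging or significance adjustment), and the counts are made up.

```python
def chi_square(observed):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if segment and response were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    return statistic


# Hypothetical candidate split: rows are segments, columns are (responders, non-responders).
candidate_split = [
    [30, 970],   # segment A: 3.0% response
    [10, 990],   # segment B: 1.0% response
]

statistic = chi_square(candidate_split)
CRITICAL_VALUE_1DF_5PCT = 3.841  # chi-square critical value, 1 degree of freedom, 5% level
print(f"chi-square = {statistic:.3f}, significant: {statistic > CRITICAL_VALUE_1DF_5PCT}")
```

A split whose statistic exceeds the critical value separates segments with genuinely different response behaviour; a split that does not would be a candidate for leaving the segments combined.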

18 CHAID
What criteria determine the end nodes?

