Slide 1: Data Mining Research
David L. Olson, University of Nebraska
Slide 2: Data Mining Research
Business applications:
- Credit scoring
- Customer classification
- Fraud detection
- Human resource management
Algorithms
Database related:
- Data warehouse products claim internal data mining
Text mining
Data mining process
Slide 3: Personal Research (with others)
Business applications:
- Introduction to Business Data Mining, with Yong Shi [2006]
- Qing Cao: RFM
Algorithms:
- Advanced Data Mining Techniques, with Dursun Delen [2008]
- Moshkovich & Mechitov: ordinal scales in trees
- Data set balancing
Database related:
- Encyclopedia
Text mining:
- Web log ethics
Data mining process:
- Ton Stam, Dursun Delen
Slide 4: RFM
With Qing Cao, Ching Gu, Donhee Lee
- Recency: time since the customer's last purchase
- Frequency: number of purchases the customer made over the time frame
- Monetary: average purchase amount (or total)
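The three RFM measures can be computed directly from a transaction log. A minimal sketch in pure Python, using hypothetical customer IDs, dates, and amounts; the `cutoff` date stands in for the end of the observation window:

```python
from collections import defaultdict
from datetime import date

# Hypothetical transaction log: (customer_id, order_date, order_amount).
orders = [
    ("c1", date(2002, 1, 5), 30.0),
    ("c1", date(2002, 6, 1), 50.0),
    ("c2", date(2001, 3, 10), 20.0),
]
cutoff = date(2002, 12, 31)  # end of the observation window

def rfm(orders, cutoff):
    """Return {customer_id: (recency_days, frequency, avg_monetary)}."""
    by_customer = defaultdict(list)
    for cust, day, amount in orders:
        by_customer[cust].append((day, amount))
    scores = {}
    for cust, rows in by_customer.items():
        last = max(day for day, _ in rows)       # most recent purchase
        recency = (cutoff - last).days           # days since last purchase
        frequency = len(rows)                    # number of purchases
        monetary = sum(a for _, a in rows) / frequency  # average amount
        scores[cust] = (recency, frequency, monetary)
    return scores
```

Monetary here is the average purchase amount; swapping the division for a plain sum gives the total-amount variant the slide mentions.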
Slide 5: Variants
- F and M are highly correlated: Bult & Wansbeek [1995], Marketing Science
- Value = M/R: Yang [2004], Journal of Targeting, Measurement and Analysis for Marketing
Slide 6: Limitations
Other attributes may be important:
- Product variation
- Customer age
- Customer income
- Customer lifestyle
Still, RFM is widely used; it works well when the response rate is high.
Slides 7-8: Data
- Nebraska food products firm (meat retailer)
- 64,180 individual purchase orders (by mail)
- 10,000 individual customers
- 11 Oct 1998 to 3 Oct 2003
- Fields: order date, order amount (price), whether a promotion was involved
Slide 9: Treatment
- Used 5,000 observations (through the end of 2002) to build the model
- Used another 5,000 observations (2003) for testing
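The build/test split is by calendar date rather than a random draw. A sketch with hypothetical records of the form (order_date, responded):

```python
from datetime import date

# Hypothetical records: everything through 2002 builds the model,
# 2003 records test it, mirroring the slide's treatment.
records = [
    (date(2001, 5, 1), 1),
    (date(2002, 11, 3), 0),
    (date(2003, 2, 7), 1),
]
split_date = date(2002, 12, 31)

build = [r for r in records if r[0] <= split_date]
test = [r for r in records if r[0] > split_date]
```

A date-based split keeps the test set strictly in the future of the training data, which is the realistic setting for a response model.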
Slide 10: Correlations

|                 | R          | F         | M         |
|-----------------|------------|-----------|-----------|
| R               | 1.0        |           |           |
| F               | -0.371     | 1.0       |           |
| M               | -0.278     | 0.749     | 1.0       |
| Response 2003   | 0.209 ***  | 0.135 *** | 0.133 *** |
| Response 2003 $ | -0.100 *** | 0.534 *** | 0.751 *** |

Significance: * 0.01; ** 0.05; *** 0.001
Slide 11: Data

| Factor | Min | Max   | Group 1 | Group 2  | Group 3   | Group 4    | Group 5 |
|--------|-----|-------|---------|----------|-----------|------------|---------|
| R      | 1   | 1542  | 1233+   | 925-1232 | 617-924   | 309-616    | 1-308   |
| Count  |     |       | 548     | 209      | 297       | 464        | 3482    |
| F      | 1   | 56    | 1-11    | 12-22    | 23-33     | 34-44      | 45+     |
| Count  |     |       | 4503    | 431      | 49        | 13         | 4       |
| M      | 20  | 15199 | 0-3040  | 3041-6060 | 6061-9080 | 9081-12100 | 12100+ |
| Count  |     |       | 4900    | 81       | 17        | 1          | 1       |
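The group boundaries can be applied as simple threshold rules. A sketch for the R dimension only, using the cut points from the slide (group 1 = least recent purchasers, group 5 = most recent):

```python
# Threshold rules for the R dimension, using the slide's cut points.
def r_group(recency_days):
    if recency_days >= 1233:
        return 1
    if recency_days >= 925:
        return 2
    if recency_days >= 617:
        return 3
    if recency_days >= 309:
        return 4
    return 5  # 1-308 days: the large, most recent bucket (3,482 customers)
```

The F and M dimensions follow the same pattern with their own cut points.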
Slide 12: Count by RFM Cell

| RF | R range  | F range | M1   | M2 | M3 | M4 | M5 |
|----|----------|---------|------|----|----|----|----|
| 55 | 1-308    | 45+     | 1    | 2  | 1  | 0  | 0  |
| 54 | 1-308    | 34-44   | 3    | 6  | 4  | 0  | 0  |
| 53 | 1-308    | 23-33   | 22   | 23 | 4  | 0  | 0  |
| 52 | 1-308    | 12-22   | 355  | 36 | 5  | 1  | 1  |
| 51 | 1-308    | 1-11    | 3003 | 12 | 3  | 0  | 0  |
| 45 | 309-616  | 45+     | 0    | 0  | 0  | 0  | 0  |
| 44 | 309-616  | 34-44   | 0    | 0  | 0  | 0  | 0  |
| 43 | 309-616  | 23-33   | 0    | 0  | 0  | 0  | 0  |
| 42 | 309-616  | 12-22   | 29   | 1  | 0  | 0  | 0  |
| 41 | 309-616  | 1-11    | 433  | 1  | 0  | 0  | 0  |
| 35 | 617-924  | 45+     | 0    | 0  | 0  | 0  | 0  |
| 34 | 617-924  | 34-44   | 0    | 0  | 0  | 0  | 0  |
| 33 | 617-924  | 23-33   | 0    | 0  | 0  | 0  | 0  |
| 32 | 617-924  | 12-22   | 3    | 0  | 0  | 0  | 0  |
| 31 | 617-924  | 1-11    | 294  | 0  | 0  | 0  | 0  |
| 25 | 925-1232 | 45+     | 0    | 0  | 0  | 0  | 0  |
| 24 | 925-1232 | 34-44   | 0    | 0  | 0  | 0  | 0  |
| 23 | 925-1232 | 23-33   | 0    | 0  | 0  | 0  | 0  |
| 22 | 925-1232 | 12-22   | 0    | 0  | 0  | 0  | 0  |
| 21 | 925-1232 | 1-11    | 209  | 0  | 0  | 0  | 0  |
| 15 | 1233+    | 45+     | 0    | 0  | 0  | 0  | 0  |
| 14 | 1233+    | 34-44   | 0    | 0  | 0  | 0  | 0  |
| 13 | 1233+    | 23-33   | 0    | 0  | 0  | 0  | 0  |
| 12 | 1233+    | 12-22   | 0    | 0  | 0  | 0  | 0  |
| 11 | 1233+    | 1-11    | 548  | 0  | 0  | 0  | 0  |
Slide 13: Basic Model Coincidence Matrix

Correct rate 0.6076

|         | Actual 0 | Actual 1 | Totals |
|---------|----------|----------|--------|
| Model 0 | 872      | 1949     | 2821   |
| Model 1 | 13       | 2166     | 2179   |
| Totals  | 885      | 4115     | 5000   |
Slide 14: Balance Cells
- Adjusted the boundaries of the 5 x 5 x 5 matrix
- Could not get all cells to equal the average of 8
  - Lumpy (due to ties)
  - Ranged from 4 to 11
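Quantile cut points are one way to adjust boundaries so each bin holds roughly the same count, the idea behind the rebalanced grid; ties can still leave bins uneven, as the slide notes. A sketch, assuming a plain list of values:

```python
# Quantile cut points: split sorted values into five bins of roughly
# equal count. With many tied values the bins stay "lumpy", since
# identical values cannot straddle a boundary.
def quintile_bounds(values):
    s = sorted(values)
    n = len(s)
    # Lower boundary of bins 2..5 (bin 1 starts at the minimum).
    return [s[i * n // 5] for i in range(1, 5)]
```

Applied independently to R, F, and M, these boundaries define the balanced 5 x 5 x 5 grid.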
Slide 15: Balanced Cell Densities

Correct rate 0.8380

| RF | R range | F range | Densities (M1-M5)  |
|----|---------|---------|--------------------|
| 55 | 1-22    | 9+      | 43, 41, 42         |
| 54 | 1-22    | 6-8     | 43, 44, 43, 44     |
| 53 | 1-22    | 4-5     | 57, 64, 63, 61, 63 |
| 52 | 1-22    | 3       | 30, 31, 35, 34     |
| 51 | 1-22    | 1-2     | 26, 25, 21, 24     |
| 45 | 23-48   | 9+      | 41                 |
| 44 | 23-48   | 6-8     | 34, 38, 36, 38, 37 |
| 43 | 23-48   | 4-5     | 58, 56, 57         |
| 42 | 23-48   | 3       | 40, 36, 37, 38, 37 |
| 41 | 23-48   | 1-2     | 19, 20, 18, 22, 20 |
| 35 | 49-151  | 9+      | 63, 64, 62, 63     |
| 34 | 49-151  | 6-8     | 49, 50             |
| 33 | 49-151  | 4-5     | 43, 44, 45, 44     |
| 32 | 49-151  | 3       | 29, 21, 28, 27, 28 |
| 31 | 49-151  | 1-2     | 19, 23, 19, 18, 21 |
| 25 | 152-672 | 9+      | 38                 |
| 24 | 152-672 | 6-8     | 50, 49, 51, 50     |
| 23 | 152-672 | 4-5     | 51                 |
| 22 | 152-672 | 3       | 32, 33, 31, 32, 33 |
| 21 | 152-672 | 1-2     | 29, 25, 33, 26, 30 |
| 15 | 673+    | 9+      | 16, 15             |
| 14 | 673+    | 6-8     | 15, 16, 15, 16, 15 |
| 13 | 673+    | 4-5     | 30                 |
| 12 | 673+    | 3       | 59, 70, 63, 64     |
| 11 | 673+    | 1-2     | 67, 73, 93, 76, 69 |

(Rows listing fewer than five densities had merged cells in the original slide.)
Slide 16: Alternatives
LIFT:
- Sort groups by best response
- Apply the marketing budget to the most profitable groups until the budget runs out
- Lift is the gain obtained above par (random selection)
Value function (Yang, 2004):
- Drop F (correlated with M)
- Use the ratio M/R
Logistic regression
Decision tree
Neural network
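The LIFT procedure above can be sketched directly: sort segments by response rate, then compare the cumulative share of responders captured against the share of customers contacted (random selection = par = 1.0). Segment data here is hypothetical:

```python
# Lift sketch: segments are (n_customers, n_responders) pairs.
def cumulative_lift(segments):
    ordered = sorted(segments, key=lambda s: s[1] / s[0], reverse=True)
    total_n = sum(n for n, _ in ordered)
    total_r = sum(r for _, r in ordered)
    lifts, cum_n, cum_r = [], 0, 0
    for n, r in ordered:
        cum_n += n
        cum_r += r
        # Lift = responder share captured / customer share contacted.
        lifts.append((cum_r / total_r) / (cum_n / total_n))
    return lifts
```

Spending the budget down the sorted list concentrates contacts where lift is highest; by construction the final entry is always 1.0 (contacting everyone is par).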
Slide 17: LIFT, Equal Groups (chart)
Slide 18: V Value by Cell

| Cell   | Min V  | n    | Actual 0 | Actual 1 | Response % | Avg $  |
|--------|--------|------|----------|----------|------------|--------|
| 1      | 0      | 424  | 3        | 421      | 0.993      | 94.42  |
| 2      | 0.0464 | 286  | 2        | 284      | 0.993      | 101.26 |
| 3      | 0.108  | 281  | 1        | 280      | 0.996      | 107.18 |
| 4      | 0.195  | 352  | 0        | 352      | 1.000      | 108.25 |
| 5      | 0.376  | 303  | 2        | 301      | 0.993      | 119.99 |
| 6      | 0.72   | 285  | 9        | 276      | 0.968      | 136.13 |
| 7      | 1.25   | 292  | 57       | 235      | 0.805      | 127.05 |
| 8      | 1.95   | 319  | 120      | 199      | 0.624      | 98.31  |
| 9      | 2.73   | 293  | 101      | 192      | 0.655      | 101.01 |
| 10     | 3.74   | 229  | 102      | 127      | 0.555      | 101.69 |
| 11     | 4.97   | 231  | 87       | 144      | 0.623      | 102.33 |
| 12     | 6.52   | 254  | 86       | 168      | 0.661      | 107.12 |
| 13     | 8.62   | 218  | 75       | 143      | 0.656      | 119.83 |
| 14     | 11.08  | 216  | 71       | 145      | 0.671      | 119.99 |
| 15     | 14.34  | 191  | 49       | 142      | 0.743      | 122.55 |
| 16     | 18.35  | 207  | 46       | 161      | 0.778      | 157.82 |
| 17     | 24.15  | 166  | 30       | 136      | 0.819      | 159.17 |
| 18     | 32.87  | 148  | 17       | 131      | 0.885      | 220.18 |
| 19     | 48.2   | 175  | 16       | 159      | 0.909      | 284.74 |
| 20     | 92     | 130  | 11       | 119      | 0.915      | 424.69 |
| Totals |        | 5000 | 885      | 4115     | 0.823      | 131.33 |
Slide 19: V Model Lift (chart)
Slide 20: Models

Regression: -0.4775 + 0.00853 R + 0.1675 F + 0.00213 M
Test data: correct 0.8230

Decision tree:
- IF R ≤ 82:
  - IF R ≤ 32 → YES (1567 right, 198 wrong)
  - ELSE (R > 32):
    - IF F ≤ 3:
      - IF M ≤ 296 → NO (285 right, 91 wrong)
      - ELSE (M > 296) → YES (28 right, 9 wrong)
    - ELSE (F > 3) → YES (729 right, 110 wrong)
- ELSE (R > 82) → YES (2391 right, 3 wrong)
Test data: correct 0.8678

Neural network
Test data: correct 0.8674
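The fitted coefficients can be applied to score a new customer. A sketch assuming the slide's regression is the logistic regression named on the comparisons slide, so the linear score passes through a sigmoid; R, F, and M must be on whatever scale the model was fit with:

```python
import math

# Coefficients copied from the slide; assumed to be on the logit scale.
B0, BR, BF, BM = -0.4775, 0.00853, 0.1675, 0.00213

def response_probability(r, f, m):
    """Predicted response probability for one customer."""
    z = B0 + BR * r + BF * f + BM * m
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid: map score to (0, 1)
```

Thresholding the probability (e.g., at 0.5) turns the score into the 0/1 prediction used in the coincidence matrices.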
Slide 21: Comparisons

| Model                  | Test accuracy       | Benefits             | Drawbacks              |
|------------------------|---------------------|----------------------|------------------------|
| RFM                    | 0.6076              | Simplest data        | Uneven cell densities  |
| Degenerate (all 1)     | 0.8230              |                      |                        |
| Balanced cell sizes    | 0.7156              | Better statistically | More data manipulation |
| Balanced cell sizes $  | 0.8380              | Better statistically |                        |
| Value function         | 0.8180              | Condense to one IV   | Less information       |
| Logistic regression    | 0.8230 (degenerate) | Additional IVs       | Formula hard to apply  |
| Decision tree          | 0.8678              | Easy to interpret    |                        |
| Neural network         | 0.8674              | Fit nonlinear data   | Hard to apply model    |