Data Mining Research David L. Olson University of Nebraska.


1 Data Mining Research David L. Olson University of Nebraska

2 Data Mining Research
   Business Applications
   – Credit scoring
   – Customer classification
   – Fraud detection
   – Human resource management
   Algorithms
   Database related
   – Data warehouse products claim internal data mining
   Text mining
   Data Mining Process

3 Personal (with others)
   Business Applications
   – Introduction to Business Data Mining, with Yong Shi [2006]
   – Qing Cao – RFM
   Algorithms
   – Advanced Data Mining Techniques, with Dursun Delen [2008]
   – Moshkovich & Mechitov – ordinal scales in trees
   – Data set balancing
   Database related
   – Encyclopedia
   Text mining
   – Web log ethics
   Data Mining Process
   – Ton Stam, Dursun Delen

4 RFM (with Qing Cao, Ching Gu, Donhee Lee)
   Recency – time since the customer made the last purchase
   Frequency – number of purchases this customer made over the time frame
   Monetary – average purchase amount (or total)
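As a sketch, the three RFM measures above can be computed per customer from an order log; the column names here (`customer_id`, `order_date`, `amount`) are illustrative, not the talk's actual field names.

```python
import pandas as pd

def rfm_table(orders: pd.DataFrame, as_of: pd.Timestamp) -> pd.DataFrame:
    """One row per customer: Recency (days since last purchase),
    Frequency (number of orders), Monetary (average order amount)."""
    g = orders.groupby("customer_id")
    return pd.DataFrame({
        "R": (as_of - g["order_date"].max()).dt.days,  # days since last order
        "F": g.size(),                                 # order count
        "M": g["amount"].mean(),                       # average spend
    })

# Toy order log (invented values)
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "order_date": pd.to_datetime(["2003-01-10", "2003-09-01", "2002-05-05"]),
    "amount": [40.0, 60.0, 25.0],
})
print(rfm_table(orders, pd.Timestamp("2003-10-03")))
```

Using the mean for M matches the "average purchase amount" reading; swapping `.mean()` for `.sum()` gives the total-spend variant.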

5 Variants
   F & M highly correlated – Bult & Wansbeek [1995], Journal of Marketing Science
   Value = M/R – Yang (2004), Journal of Targeting, Measurement and Analysis for Marketing

6 Limitations
   Other attributes may be important:
   – Product variation
   – Customer age
   – Customer income
   – Customer lifestyle
   Still, RFM is widely used – works well if the response rate is high

7 Data
   Meat retailer in Nebraska
   64,180 purchase orders (mail)
   10,000 individual customers
   Oct 11, 1998 to Oct 3, 2003
   Fields: order date, order amount, presence of promotion

8 Data
   Nebraska food products firm
   64,180 individual purchase orders (by mail)
   10,000 individual customers
   11 Oct 1998 to 3 Oct 2003
   Data:
   – Order date
   – Order amount (price)
   – Whether or not a promotion was involved

9 Treatment
   Used 5,000 observations to build the model – through the end of 2002
   Used another 5,000 for testing – 2003
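The temporal split described above can be sketched in a few lines; the dates and column names are invented for illustration, not the study's actual data.

```python
import pandas as pd

# Toy data standing in for the 10,000 customer records
data = pd.DataFrame({
    "order_date": pd.to_datetime(["2001-03-01", "2002-12-30", "2003-02-14"]),
    "response": [1, 0, 1],
})

# Build on everything through 2002, hold out 2003 for testing
train = data[data["order_date"] <= "2002-12-31"]
test = data[data["order_date"] > "2002-12-31"]
print(len(train), len(test))
```

Splitting by time rather than at random avoids leaking future behavior into the model, which matters for a response-prediction task like this one.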

10 Correlations
                     R           F           M
   R                 1.0
   F                -0.371       1.0
   M                -0.278       0.749       1.0
   Response 2003     0.209 ***   0.135 ***   0.133 ***
   Response 2003 $  -0.100 ***   0.534 ***   0.751 ***
   * 0.01 significance; ** 0.05 significance; *** 0.001 significance
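A correlation matrix like the one above comes straight from pandas; the numbers below are toy values, not the study's data, chosen only to show the same qualitative pattern (R negatively related to F, F and M strongly positive).

```python
import pandas as pd

# Invented R/F/M values for five customers
df = pd.DataFrame({
    "R": [10, 200, 35, 400, 5],
    "F": [9, 1, 6, 2, 8],
    "M": [120.0, 20.0, 90.0, 35.0, 150.0],
})
corr = df.corr()  # pairwise Pearson correlations
print(corr.round(3))
```

The high F–M correlation is exactly what motivates the value-function variant that drops F.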

11 Data
   Factor  Min  Max    Group 1   Group 2    Group 3    Group 4     Group 5
   R       1    1542   1233+     925-1232   617-924    309-616     1-308
   Count               548       209        297        464         3482
   F       1    56     1-11      12-22      23-33      34-44       45+
   Count               4503      431        49         13          4
   M       20   15199  0-3040    3041-6060  6061-9080  9081-12100  12100+
   Count               4900      81         17         1           1
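The five groups above are equal-width bins, which is why the counts are so uneven; a sketch with pandas `cut`, using the slide's R boundaries on toy recency values:

```python
import pandas as pd

# Toy recency values (days since last purchase)
r_days = pd.Series([15, 300, 310, 620, 930, 1250, 1500])

# Equal-width boundaries from the slide; group 5 = most recent
groups = pd.cut(r_days,
                bins=[0, 308, 616, 924, 1232, float("inf")],
                labels=[5, 4, 3, 2, 1])
print(groups.value_counts().sort_index())
```

On the real data almost 70% of customers fall into the most-recent R group, which is what motivates the balanced-cell treatment later.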

12 Count by RFM Cell
   RF  R range     F range   M1    M2   M3   M4   M5
   55  R 1-308     F 45+     1     2    1    0    0
   54              F 34-44   3     6    4    0    0
   53              F 23-33   22    23   4    0    0
   52              F 12-22   355   36   5    1    1
   51              F 1-11    3003  12   3    0    0
   45  R 309-616   F 45+     0     0    0    0    0
   44              F 34-44   0     0    0    0    0
   43              F 23-33   0     0    0    0    0
   42              F 12-22   29    1    0    0    0
   41              F 1-11    433   1    0    0    0
   35  R 617-924   F 45+     0     0    0    0    0
   34              F 34-44   0     0    0    0    0
   33              F 23-33   0     0    0    0    0
   32              F 12-22   3     0    0    0    0
   31              F 1-11    294   0    0    0    0
   25  R 925-1232  F 45+     0     0    0    0    0
   24              F 34-44   0     0    0    0    0
   23              F 23-33   0     0    0    0    0
   22              F 12-22   0     0    0    0    0
   21              F 1-11    209   0    0    0    0
   15  R 1233+     F 45+     0     0    0    0    0
   14              F 34-44   0     0    0    0    0
   13              F 23-33   0     0    0    0    0
   12              F 12-22   0     0    0    0    0
   11              F 1-11    548   0    0    0    0

13 Basic Model Coincidence Matrix    Correct: 0.6076
             Actual 0   Actual 1   Totals
   Model 0   872        1949       2821
   Model 1   13         2166       2179
   Totals    885        4115       5000
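Accuracy from a coincidence (confusion) matrix is just the diagonal over the total; using the slide's own numbers:

```python
# Cell counts from the coincidence matrix above
matrix = {("model0", "actual0"): 872, ("model0", "actual1"): 1949,
          ("model1", "actual0"): 13,  ("model1", "actual1"): 2166}

# Correct predictions sit on the diagonal
correct = matrix[("model0", "actual0")] + matrix[("model1", "actual1")]
total = sum(matrix.values())
print(correct / total)  # 0.6076
```

Note how asymmetric the errors are: the basic RFM model misses 1,949 actual responders but almost never flags a non-responder.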

14 Balance Cells
   Adjusted boundaries of the 5 x 5 x 5 matrix
   Can't get all cells to equal the average of 8
   – Lumpy (due to ties)
   – Ranged from 4 to 11
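Quantile-based boundaries are the usual way to balance cell counts, and ties in the data produce exactly the "lumpy" groups the slide mentions; a toy sketch:

```python
import pandas as pd

# Toy frequency values with heavy ties at F = 1
f = pd.Series([1, 1, 1, 1, 2, 2, 3, 5, 8, 20])

# Quantile bins aim for equal counts; tied boundary values
# collapse bins, so the groups come out uneven
quintiles = pd.qcut(f, q=5, duplicates="drop")
print(quintiles.value_counts())
```

With `duplicates="drop"` a tied boundary simply merges two bins rather than raising an error, so five requested quintiles can come back as four uneven groups.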

15 Balanced Cell Densities    Correct: 0.8380
   RF  R range     F range  M1  M2  M3  M4  M5
   55  R 1-22      F 9+     43  41  41  41  42
   54              F 6-8    43  44  43  43  44
   53              F 4-5    57  64  63  61  63
   52              F = 3    30  31  31  35  34
   51              F 1-2    26  25  25  21  24
   45  R 23-48     F 9+     41  41  41  41  41
   44              F 6-8    34  38  36  38  37
   43              F 4-5    58  56  56  56  57
   42              F = 3    40  36  37  38  37
   41              F 1-2    19  20  18  22  20
   35  R 49-151    F 9+     63  64  62  62  63
   34              F 6-8    49  50  50  50  50
   33              F 4-5    43  43  44  45  44
   32              F = 3    29  21  28  27  28
   31              F 1-2    19  23  19  18  21
   25  R 152-672   F 9+     38  38  38  38  38
   24              F 6-8    50  49  51  50  50
   23              F 4-5    51  51  51  51  51
   22              F = 3    32  33  31  32  33
   21              F 1-2    29  25  33  26  30
   15  R 673+      F 9+     16  16  15  15  15
   14              F 6-8    15  16  15  16  15
   13              F 4-5    30  30  30  30  30
   12              F = 3    59  70  63  63  64
   11              F 1-2    67  73  93  76  69

16 Alternatives
   LIFT
   – Sort groups by best response
   – Apply the marketing budget to the most profitable groups (until the budget runs out)
   – Lift is the gain obtained above par (random selection)
   VALUE FUNCTION (Yang, 2004)
   – Drop F (correlated with M)
   – Use the ratio M/R
   Logistic Regression
   Decision Tree
   Neural Network
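The lift idea above can be sketched as cumulative response captured versus the random-targeting baseline; the segment counts below are invented for illustration.

```python
# (customers, responders) per segment, sorted best response rate first
segments = [(500, 450), (500, 300), (500, 100), (500, 50)]

total_n = sum(n for n, _ in segments)
total_resp = sum(r for _, r in segments)

lifts = []
cum_n = cum_r = 0
for n, r in segments:
    cum_n += n
    cum_r += r
    depth = cum_n / total_n        # fraction of customers contacted
    gain = cum_r / total_resp      # fraction of responders captured
    lifts.append(gain / depth)     # lift = gain above random (par = 1.0)

print([round(x, 2) for x in lifts])
```

Contacting the best quarter here captures half the responders (lift 2.0); by the time everyone is contacted, lift falls back to 1.0, the random baseline.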

17 LIFT Equal Groups

18 V Value by Cell
   Cell    Min     n     0    1     %      Avg $
   1       0       424   3    421   0.993  94.42
   2       0.0464  286   2    284   0.993  101.26
   3       0.108   281   1    280   0.996  107.18
   4       0.195   352   0    352   1.000  108.25
   5       0.376   303   2    301   0.993  119.99
   6       0.72    285   9    276   0.968  136.13
   7       1.25    292   57   235   0.805  127.05
   8       1.95    319   120  199   0.624  98.31
   9       2.73    293   101  192   0.655  101.01
   10      3.74    229   102  127   0.555  101.69
   11      4.972   231   87   144   0.623  102.33
   12      6.524   254   86   168   0.661  107.12
   13      8.6     218   75   143   0.656  119.83
   14      11.08   216   71   145   0.671  119.99
   15      14.34   191   49   142   0.743  122.55
   16      18.35   207   46   161   0.778  157.82
   17      24.15   166   30   136   0.819  159.17
   18      32.87   148   17   131   0.885  220.18
   19      48.2    175   16   159   0.909  284.74
   20      92      130   11   119   0.915  424.69
   Totals          5000  885  4115  0.823  131.33
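The score behind this table is Yang's value function V = M/R; a minimal sketch with invented customers, scoring and ranking them the way the 20 cells above are formed:

```python
def value_score(monetary: float, recency_days: float) -> float:
    """Yang (2004) value function V = M/R: higher means a valuable,
    recent spender. Assumes recency_days >= 1."""
    return monetary / recency_days

# Toy (M, R) pairs for three customers
customers = [(120.0, 30), (45.0, 400), (300.0, 15)]

# Ascending scores: low-value to high-value customers
scores = sorted(value_score(m, r) for m, r in customers)
print(scores)
```

Binning these scores into equal-sized groups yields cells like the 20 above, where response rate climbs steadily with V.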

19 V Model Lift

20 Models
   Regression: -0.4775 + 0.00853 R + 0.1675 F + 0.00213 M
     Test data: correct 0.8230
   Decision tree:
     IF R ≤ 82:
       IF R ≤ 32: YES (1567 right, 198 wrong)
       ELSE (32 < R ≤ 82):
         IF F ≤ 3 AND M ≤ 296: NO (285 right, 91 wrong)
         IF F ≤ 3 AND M > 296: YES (28 right, 9 wrong)
         IF F > 3: YES (729 right, 110 wrong)
     ELSE (R > 82): YES (2391 right, 3 wrong)
     Test data: correct 0.8678
   Neural network
     Test data: correct 0.8674
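The fitted regression above can be applied directly as a scoring rule; the coefficients are the slide's, but the 0.5 cutoff below is an assumption for illustration, not stated on the slide.

```python
def regression_score(r: float, f: float, m: float) -> float:
    """Linear response score with the slide's fitted coefficients."""
    return -0.4775 + 0.00853 * r + 0.1675 * f + 0.00213 * m

def predict(r: float, f: float, m: float) -> int:
    """Classify as responder (1) if the score clears an assumed 0.5 cutoff."""
    return 1 if regression_score(r, f, m) >= 0.5 else 0

# Toy customers (invented values)
print(predict(r=30, f=10, m=100))  # frequent, recent spender
print(predict(r=5, f=1, m=20))    # one-time small purchase
```

The comparison slide's "formula hard to apply" drawback refers to exactly this step: a score and cutoff are less transparent to marketing staff than the decision tree's if-then rules.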

21 Comparisons
   Model                  Test Accuracy        Benefits              Drawbacks
   RFM                    0.6076               Simplest data         Uneven cell densities
   Degenerate (all 1)     0.8230
   Balanced cell sizes    0.7156               Better statistically  More data manipulation
   Balanced cell sizes $  0.8380               Better statistically
   Value function         0.8180               Condense to one IV    Less information
   Logistic regression    0.8230 (degenerate)  Additional IVs        Formula hard to apply
   Decision tree          0.8678               Easy to interpret
   Neural network         0.8674               Fit nonlinear data    Hard to apply model

