MKT 700 Business Intelligence and Decision Models Algorithms and Customer Profiling (1)


1 MKT 700 Business Intelligence and Decision Models Algorithms and Customer Profiling (1)

2 Classification and Prediction

3 Classification: Unsupervised Learning

4 Prediction: Supervised Learning

5 SPSS Direct Marketing

                        Classification                       Predictive
Unsupervised Learning   RFM; Cluster analysis;               NA
                        Postal Code Responses
Supervised Learning     Customer Profiling                   Propensity to buy

6 SPSS Analysis

                        Classification                       Predictive
Unsupervised Learning   Hierarchical Cluster;                NA
                        Two-Step Cluster; K-Means Cluster
Supervised Learning     Classification Trees                 Linear Regression;
                        (CHAID, CART)                        Logistic Regression;
                                                             Artificial Neural Nets

7 Major Algorithms

                        Classification                       Predictive
Unsupervised Learning   Euclidean Distance;                  NA
                        Log Likelihood
Supervised Learning     Chi-square Statistics;               Log Likelihood;
                        Log Likelihood;                      F-Statistics (ANOVA)
                        GINI Impurity Index;
                        F-Statistics (ANOVA)

Nominal variables: Chi-square, Log Likelihood. Continuous variables: F-Statistics, Log Likelihood.

8 Euclidean Distance

9 Euclidean Distance for Continuous Variables
Pythagorean distance: d = √(a² + b²)
Euclidean space: d = √(a² + b² + c²)
Euclidean distance: d = [Σ(dᵢ)²]^½
(Used in cluster analysis with continuous variables.)
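The formula above can be sketched in a few lines of Python; the sample points here are illustrative, not from the slides:

```python
import math

def euclidean_distance(p, q):
    """Euclidean distance: square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# 2-D (Pythagorean) case: sides a=3, b=4 give d=5
print(euclidean_distance((0, 0), (3, 4)))        # 5.0
# 3-D (Euclidean space) case: a=1, b=2, c=2 give d=3
print(euclidean_distance((0, 0, 0), (1, 2, 2)))  # 3.0
```

The same function works for any number of dimensions, which is what cluster analysis needs when each observation has many continuous variables.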

10 Pearson’s Chi-Square

11 Contingency Table

        North   South   East   West   Total
Yes       68      75     57     79     279
No        32      45     33     31     141
Total    100     120     90    110     420

12 Observed and Theoretical Frequencies

        North     South     East      West      Total
Yes     68 (66)   75 (80)   57 (60)   79 (73)   279 (66%)
No      32 (34)   45 (40)   33 (30)   31 (37)   141 (34%)
Total   100       120        90       110       420

(Theoretical/expected frequencies in parentheses.)

13 Chi-Square

Cell   fo    fe    fo - fe   (fo - fe)²   (fo - fe)²/fe
1,1    68    66       2          4          .0606
1,2    75    80      -5         25          .3125
1,3    57    60      -3          9          .1500
1,4    79    73       6         36          .4932
2,1    32    34      -2          4          .1176
2,2    45    40       5         25          .6250
2,3    33    30       3          9          .3000
2,4    31    37      -6         36          .9730

X² = 3.032

14 Statistical Inference
DF = (4 columns - 1) x (2 rows - 1) = 3
Observed X² = 3.032; critical values for DF = 3: 6.251 at p = .10, 7.815 at p = .05
Since 3.032 < 6.251, the relationship between region and response is not statistically significant.
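The computation in slides 11-14 can be reproduced in plain Python. Note that the slides round each expected frequency to a whole number before summing, which is how they arrive at X² = 3.032 (unrounded expected frequencies give a slightly smaller statistic, roughly 2.76):

```python
observed = [[68, 75, 57, 79],   # Yes: North, South, East, West
            [32, 45, 33, 31]]   # No

row_tot = [sum(r) for r in observed]        # [279, 141]
col_tot = [sum(c) for c in zip(*observed)]  # [100, 120, 90, 110]
n = sum(row_tot)                            # 420

# Expected frequencies, rounded to integers as on the slide
expected = [[round(rt * ct / n) for ct in col_tot] for rt in row_tot]

chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(observed, expected)
           for o, e in zip(orow, erow))
df = (len(col_tot) - 1) * (len(row_tot) - 1)

print(round(chi2, 3), df)   # 3.032 3  -- below the .05 critical value of 7.815
```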

15 Log Likelihood Chi-Square

16 Log Likelihood
Based on probability distributions rather than contingency (frequency) tables.
Applicable to both categorical and continuous variables, unlike chi-square, which requires variables to be discretized.

17 Contingency Table (Observed Frequencies)

        Cluster 1   Cluster 2   Total
Male       10          30         40

18 Contingency Table (Expected Frequencies)

        Cluster 1   Cluster 2   Total
Male    10 (20)     30 (20)      40

(Observed frequencies with expected frequencies in parentheses.)

19 Chi-Square

Cell   fo    fe    fo - fe   (fo - fe)²   (fo - fe)²/fe
1,1    10    20     -10         100          5.00
1,2    30    20      10         100          5.00

X² = 10.00
p < 0.05; DF = 1; critical value = 3.84

20 Log Likelihood Distance & Probability

                 Cluster 1        Cluster 2
O                10               30
E                20               20
O/E              10/20 = .50      30/20 = 1.50
Ln(O/E)          -.693            .405
O * Ln(O/E)      -6.93            12.164

2∑O*Ln(O/E) = 2 x (-6.93 + 12.164) = 10.46
p < 0.05; critical value = 3.84
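The log likelihood (G) statistic on this slide can be computed directly from the observed and expected counts:

```python
import math

observed = [10, 30]   # Male counts in Cluster 1, Cluster 2
expected = [20, 20]

# Log likelihood ratio statistic: G = 2 * sum(O * ln(O/E))
g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

print(round(g, 2))   # 10.46 -- exceeds the critical value 3.84 (DF = 1, p = .05)
```

Here G (10.46) is close to the Pearson chi-square on the previous slide (10.00); both statistics test the same independence hypothesis and converge for large samples.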

21 Variance, ANOVA, and F Statistics

22 F-Statistics For metric or continuous variables Compares explained (in the model) and unexplained variances (errors)

23 Variance

VALUE   MEAN   SQUARED DIFFERENCE
 20     43.6   557
 34     43.6    92.16
 34     43.6    92.16
 38     43.6    31.36
 38     43.6    31.36
 40     43.6    12.96
 41     43.6     6.76
 41     43.6     6.76
 41     43.6     6.76
 42     43.6     2.56
 43     43.6     0.36
 47     43.6    11.56
 47     43.6    11.56
 48     43.6    19.36
 49     43.6    29.16
 49     43.6    29.16
 55     43.6   130
 55     43.6   130
 55     43.6   130
 55     43.6   130

COUNT = 20    SS = 1461    (SS is the Sum of Squares)
DF = 19       VAR = SS/DF = 76.88
MEAN = 43.6   SD = √VAR = 8.768
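The column of squared differences above can be verified with Python's standard library (values transcribed from the slide; `statistics.variance` uses the same N-1 denominator as the slide's SS/DF):

```python
import math
import statistics

values = [20, 34, 34, 38, 38, 40, 41, 41, 41, 42,
          43, 47, 47, 48, 49, 49, 55, 55, 55, 55]

mean = statistics.mean(values)              # 43.6
ss = sum((x - mean) ** 2 for x in values)   # sum of squared deviations
var = ss / (len(values) - 1)                # SS / DF, with DF = N - 1 = 19
sd = math.sqrt(var)

print(mean, round(ss, 1), round(var, 2), round(sd, 3))
# 43.6 1460.8 76.88 8.768  (the slide rounds SS up to 1461)
```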

24 ANOVA
Two groups: t-test
Three or more groups: Are errors (discrepancies between observations and the overall mean) explained by group membership or by some other (random) effect?

25 Oneway ANOVA

Grand mean = 5.042

      Group 1   Group 2   Group 3
         6         8         3
         5         9         2
         4         7         1
         5         8         3
         4         9         2
         6         7         1
         5         8         3
         4         9         2

Group means:  4.875     8.125     2.125

Squared deviations from the grand mean, (X - 5.042)², sum to Total SS = 158.958.
Squared deviations from each group mean sum to SS Within = 14.625.

26 MSS(Between)/MSS(Within)

           Within Groups    Between Groups   Total Errors
SS         14.625           144.333          14.625 + 144.333 = 158.958
DF         24 - 3 = 21      3 - 1 = 2        24 - 1 = 23
Mean SS    0.696            72.167           6.911

F = Between Groups Mean SS / Within Groups Mean SS = 72.167 / 0.696 = 103.624
p-value < .05

27 ONEWAY (Excel or SPSS)

Anova: Single Factor

SUMMARY
Groups     Count   Sum   Average   Variance
Group 1      8      39    4.875     0.696
Group 2      8      65    8.125     0.696
Group 3      8      17    2.125     0.696

ANOVA
Source of Variation      SS       df     MS        F         P-value     F crit
Between Groups         144.333     2   72.167    103.624   1.318E-11     3.467
Within Groups           14.625    21    0.696
Total                  158.958    23
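The Excel/SPSS output above can be reproduced in plain Python from the group data on slide 25:

```python
g1 = [6, 5, 4, 5, 4, 6, 5, 4]
g2 = [8, 9, 7, 8, 9, 7, 8, 9]
g3 = [3, 2, 1, 3, 2, 1, 3, 2]
groups = [g1, g2, g3]

n = sum(len(g) for g in groups)               # 24 observations
grand_mean = sum(sum(g) for g in groups) / n  # 5.042

# Between-groups SS: group sizes times squared distance of group mean from grand mean
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
# Within-groups SS: squared deviations of each value from its own group mean
ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1                  # 2
df_within = n - len(groups)                   # 21

f = (ss_between / df_between) / (ss_within / df_within)

print(round(ss_between, 3), round(ss_within, 3), round(f, 3))
# 144.333 14.625 103.624
```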

28 Profiling

29 Customer Profiling: Documenting or Describing Who is likely to buy or not respond? Who is likely to buy what product or service? Who is in danger of lapsing?

30 CHAID or CART

CHAID (Chi-Square Automatic Interaction Detector)
- Based on Chi-Square
- All variables discretized
- Dependent variable: nominal

CART (Classification and Regression Tree)
- Variables can be discrete or continuous
- Based on GINI or F-Test
- Dependent variable: nominal or continuous

31 Use of Decision Trees
Classify observations from a target binary or nominal variable → Segmentation
Predictive response analysis from a target numerical variable → Behaviour
Decision support rules → Processing

32 Decision Tree

33 Example: dmdata.sav
Underlying theory: X²

34 CHAID Algorithm
Selecting variables, example: Regions (4), Gender (3, including Missing), Age (6, including Missing)
For each variable, collapse categories to maximize the chi-square test of independence.
Ex: Region (N, S, E, W, *) → (WSE, N*)
Select the most significant variable.
Go to the next branch ... and the next level.
Stop growing if estimated X² < theoretical X².

35 CART (Nominal Target)
Nominal targets: GINI (impurity reduction) or entropy
Gini Index = 1 - ∑pᵢ² (squared probabilities of node membership)
Gini = 0 when targets are perfectly classified.
Example: P(Bus) = 0.4, P(Car) = 0.3, P(Train) = 0.3
Gini = 1 - (0.4² + 0.3² + 0.3²) = 0.660
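The Gini index is a one-liner; the example reproduces the slide's Bus/Car/Train case:

```python
def gini(probs):
    """Gini impurity: 1 minus the sum of squared class probabilities."""
    return 1 - sum(p ** 2 for p in probs)

# Slide example: P(Bus) = 0.4, P(Car) = 0.3, P(Train) = 0.3
print(round(gini([0.4, 0.3, 0.3]), 3))   # 0.66
# A perfectly pure node (all observations in one class) has Gini = 0
print(gini([1.0, 0.0, 0.0]))             # 0.0
```

CART evaluates candidate splits by the reduction in Gini impurity from parent node to weighted child nodes, and picks the split with the largest reduction.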

36 CART (Metric Target) Continuous Variables: Variance Reduction (F-test)

37 Comparative Advantages (from Wikipedia)
- Simple to understand and interpret
- Requires little data preparation
- Able to handle both numerical and categorical data
- Uses a white-box model easily explained by Boolean logic
- Possible to validate a model using statistical tests
- Robust


