MKT 700 Business Intelligence and Decision Models Week 8: Algorithms and Customer Profiling (1)
Classification and Prediction
Classification Unsupervised Learning
Predicting Supervised Learning
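SPSS Direct Marketing (columns: Classification, Predictive)
Unsupervised Learning, Classification: RFM; Cluster analysis; Postal Code Responses
Unsupervised Learning, Predictive: NA
Supervised Learning, Classification: Customer Profiling
Supervised Learning, Predictive: Propensity to buy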
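SPSS Analysis (columns: Classification, Predictive)
Unsupervised Learning, Classification: Hierarchical Cluster; Two-Step Cluster; K-Means Cluster
Unsupervised Learning, Predictive: NA
Supervised Learning, Classification: Classification Trees (CHAID, CART)
Supervised Learning, Predictive: Linear Regression; Logistic Regression; Artificial Neural Nets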
Major Algorithms (columns: Classification, Predictive)
Unsupervised Learning, Classification: Euclidean Distance; Log Likelihood
Unsupervised Learning, Predictive: NA
Supervised Learning: Chi-square Statistics; Log Likelihood; GINI Impurity Index; F-Statistics (ANOVA); Log Likelihood; F-Statistics (ANOVA); Nominal: Chi-square, Log Likelihood; Continuous: F-Statistics, Log Likelihood
Euclidean Distance
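Euclidean Distance for Continuous Variables
Pythagorean distance (two dimensions): $d = \sqrt{a^2 + b^2}$
Euclidean space (three dimensions): $d = \sqrt{a^2 + b^2 + c^2}$
Euclidean distance (general case): $d = \left[\sum_i (d_i)^2\right]^{1/2}$, where $d_i$ is the difference on dimension $i$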
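The distance formulas above can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's implementation; the function name `euclidean_distance` and the sample points are assumptions made for the example.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    # d_i is the difference on dimension i; sum the squares, then take the root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Hypothetical points, chosen so the arithmetic is easy to check by hand.
print(euclidean_distance((0, 0), (3, 4)))         # two dimensions: 5.0
print(euclidean_distance((1, 2, 3), (4, 6, 15)))  # three dimensions: 13.0
```

The same function covers the two-dimensional, three-dimensional, and general cases, because the sum simply runs over however many variables describe each customer.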
Pearson’s Chi-Square
Contingency Table. Columns: North, South, East, West, Tot. Rows: Yes, No, Tot.
Observed and Theoretical Frequencies. Columns: North, South, East, West, Tot. Rows: Yes, %, No, %, Tot.
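Chi-Square
Columns: Obs. (row, column); $f_o$; $f_e$; $f_o - f_e$; $(f_o - f_e)^2 / f_e$
Observed $f_o$ by cell: 1,1 = 68; 1,2 = 75; 1,3 = 57; 1,4 = 79; 2,1 = 32; 2,2 = 45; 2,3 = 33; 2,4 = (not shown)
$\chi^2 = \sum (f_o - f_e)^2 / f_e = 3.032$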
Statistical Inference DF: (4 columns - 1) × (2 rows - 1) = 3
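As a rough sketch of the Pearson chi-square steps (expected frequencies, the cell-by-cell terms, and the degrees of freedom), the following Python may help. The counts are hypothetical and are NOT the slide's data, whose last cell is not available; the function name `pearson_chi_square` is likewise an assumption for illustration.

```python
def pearson_chi_square(observed):
    """Return (chi_square, degrees_of_freedom, expected) for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected (theoretical) frequency: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_square = sum(
        (fo - fe) ** 2 / fe
        for obs_row, exp_row in zip(observed, expected)
        for fo, fe in zip(obs_row, exp_row)
    )
    # DF = (columns - 1) * (rows - 1)
    df = (len(col_totals) - 1) * (len(row_totals) - 1)
    return chi_square, df, expected

# Hypothetical counts: rows are Yes/No, columns are North/South/East/West.
table = [[60, 70, 50, 80],
         [40, 30, 50, 20]]
chi_square, df, expected = pearson_chi_square(table)
print(round(chi_square, 3), df)
```

With 3 degrees of freedom the critical value at p = 0.05 is 7.815, so a statistic above that value suggests the response rate differs by region.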
Log Likelihood Chi-Square
Log Likelihood: Based on probability distributions rather than contingency (frequency) tables. Applicable to both categorical and continuous variables, unlike chi-square, which requires continuous variables to be discretized.
Contingency Table (Observed Frequencies). Columns: Cluster 1, Cluster 2, Total. Male: 10, 30, 40
Contingency Table (Expected Frequencies). Columns: Cluster 1, Cluster 2, Total. Rows: Male
Chi-Square
Columns: Obs. (row, column); $f_o$; $f_e$; $f_o - f_e$; $(f_o - f_e)^2 / f_e$
Cell 1,1: $f_o$ = 10
$\chi^2$ significant at p < 0.05 (DF = 1; critical value = 3.84)
Log Likelihood Distance & Probability
Columns (Male row, Cluster 1 and Cluster 2): O; E; O/E; Ln(O/E); O × Ln(O/E)
Cluster 1: O/E = 10/20; Cluster 2: E = 20
Statistic: $2 \sum O \ln(O/E)$; p < 0.05; critical value = 3.84
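The log likelihood statistic 2∑O·Ln(O/E) can be sketched as follows. The Male counts (O = 10 and 30, E = 20 and 20) come from the slides; the second row is hypothetical, invented only so that the table is complete, and the function name `log_likelihood_chi_square` is an assumption.

```python
import math

def log_likelihood_chi_square(observed, expected):
    """G = 2 * sum(O * ln(O / E)) over all cells; cells with O = 0 contribute 0."""
    return 2 * sum(
        o * math.log(o / e)
        for o, e in zip(observed, expected)
        if o > 0
    )

# Cells in order: Male/Cluster 1, Male/Cluster 2, then a HYPOTHETICAL second row
# (30, 10) chosen so that both clusters total 40 and every expected count is 20.
observed = [10, 30, 30, 10]
expected = [20, 20, 20, 20]
g = log_likelihood_chi_square(observed, expected)
print(round(g, 2))
print(g > 3.84)  # compare with the critical value for DF = 1 at p = 0.05
```

Like Pearson's chi-square, the result is compared with the chi-square critical value (3.84 for DF = 1), so the two statistics usually lead to the same decision on tables like this one.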
Variance, ANOVA, and F Statistics
F-Statistics: For metric or continuous variables. Compares explained (in the model) and unexplained (error) variances.
Variance
Columns: VALUE; MEAN; DIFFERENCE; SQUARED
COUNT = 20; MEAN = 43.6; SS = 1461; DF = 19; VAR = 76.88; SD = 8.768
SS is Sum of Squares; DF = N - 1; VAR = SS/DF; SD = √VAR
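The chain SS, DF, VAR, SD can be traced step by step in a short Python sketch. The sample values below are hypothetical (the slide's 20 observations are not reproduced here), and the helper name `describe` is an assumption.

```python
import math

def describe(values):
    """Return mean, sum of squares, degrees of freedom, variance and standard deviation."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # SS: sum of squared differences from the mean
    df = n - 1                                 # DF = N - 1
    var = ss / df                              # VAR = SS / DF
    sd = math.sqrt(var)                        # SD = square root of VAR
    return {"mean": mean, "ss": ss, "df": df, "var": var, "sd": sd}

# Hypothetical sample of eight values.
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)
```

Dividing by N - 1 rather than N gives the sample variance, which matches the DF = N - 1 convention on the slide.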
ANOVA
Two groups: t-test
Three or more groups: ANOVA
Are errors (discrepancies between observations and the overall mean) explained by group membership or by some other (random) effect?
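Oneway ANOVA. Table of observations for Group 1, Group 2, and Group 3, showing the grand mean, deviations from the grand mean (X - Mean), the group means, deviations from the group means (X - Mean), SS Within, and Total SS.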
F = MSS(Between)/MSS(Within)
Within Groups: DF = 24 - 3 = 21; Mean SS = 0.696
Between Groups: DF = 3 - 1 = 2
Total Errors: DF = 24 - 1 = 23
p-value < .05
ONEWAY (Excel or SPSS). Anova: Single Factor
SUMMARY. Columns: Groups; Count; Sum; Average; Variance. Rows: one per group (three groups).
ANOVA. Columns: Source of Variation; SS; df; MS; F; P-value; F crit. Rows: Between Groups; Within Groups; Total.
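To show where the numbers in such an ANOVA table come from, here is a minimal one-way ANOVA sketch in plain Python. The three groups of eight observations are hypothetical, chosen only to reproduce the degrees of freedom on the slides (2, 21, and 23); the function name `one_way_anova` is an assumption, and no p-value is computed.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares and the F ratio."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(g) / len(g) for g in groups]
    # Between groups: each group mean versus the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within groups: each observation versus its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between, "df_within": df_within, "df_total": n - 1,
        "ms_between": ms_between, "ms_within": ms_within,
        "f": ms_between / ms_within,  # F = MSS(Between) / MSS(Within)
    }

# Hypothetical data: three groups of eight observations (24 in total).
groups = [
    [4, 5, 6, 5, 4, 6, 5, 5],
    [7, 8, 9, 8, 7, 9, 8, 8],
    [10, 11, 12, 11, 10, 12, 11, 11],
]
result = one_way_anova(groups)
print(result)
```

With these made-up groups the means are far apart relative to the spread inside each group, so the F ratio is large (roughly 126); Excel or SPSS would then report the matching p-value and critical F.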
Profiling
Customer Profiling: Documenting or Describing
Who is likely to buy or not respond?
Who is likely to buy what product or service?
Who is in danger of lapsing?
Profiling/Decision Tree
SPSS Direct Marketing: Customer Profiling; Postal Code responses
SPSS Analysis: Classification; Decision Tree
CHAID (Chi-square Automatic Interaction Detector)
CART (Classification and Regression Trees)