MKT 700 Business Intelligence and Decision Models
Algorithms and Customer Profiling (1)
Classification and Prediction
Classification: unsupervised learning
Prediction: supervised learning
SPSS Direct Marketing
- Unsupervised learning: RFM, cluster analysis, postal code responses
- Supervised learning: customer profiling, propensity to buy
SPSS Analysis
- Unsupervised learning: Hierarchical Cluster, Two-Step Cluster, K-Means Cluster
- Supervised learning: classification trees (CHAID, CART), linear regression, logistic regression, artificial neural nets
Major Algorithms
- Unsupervised learning: Euclidean distance, log likelihood
- Supervised learning: chi-square statistics, log likelihood, GINI impurity index, F-statistics (ANOVA)
  - Nominal targets: chi-square, log likelihood
  - Continuous targets: F-statistics (ANOVA), log likelihood
Euclidean Distance
Euclidean Distance for Continuous Variables
- Pythagorean distance: d = √(a² + b²)
- Euclidean space (3 dimensions): d = √(a² + b² + c²)
- Euclidean distance (general): d = [Σᵢ dᵢ²]^½, where dᵢ is the difference on the i-th variable (used in cluster analysis with continuous variables)
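As a concrete sketch in Python (NumPy), with made-up customer attributes (the variables are illustrative, not from the course data):

```python
import numpy as np

def euclidean_distance(x, y):
    """d = sqrt(sum of squared differences across all variables)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

# Two hypothetical customers: (age, income in $000, tenure in years)
a = [35, 62.0, 4]
b = [41, 58.5, 7]
print(euclidean_distance(a, b))  # ~7.57
```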
Pearson’s Chi-Square
Contingency Table (Observed Frequencies)

|       | North | South | East | West | Total |
| Yes   |  68   |  75   |  57  |  79  |  279  |
| No    |  32   |  45   |  33  |  …   |   …   |
| Total | 100   | 120   |  90  |  …   |   …   |
Observed and Theoretical (Expected) Frequencies
- Expected cell frequency: fₑ = (row total × column total) / grand total
- Compare the observed Yes/No percentages by region with the percentages expected under independence.
Chi-Square Computation
For each cell, compute fₒ, fₑ, fₒ − fₑ, and (fₒ − fₑ)² / fₑ.
Observed frequencies: (1,1) 68, (1,2) 75, (1,3) 57, (1,4) 79, (2,1) 32, (2,2) 45, (2,3) 33, (2,4) …
X² = Σ (fₒ − fₑ)² / fₑ = 3.032
Statistical Inference
DF = (4 columns − 1) × (2 rows − 1) = 3
Critical value (α = .05, DF = 3) = 7.815; since X² = 3.032 < 7.815, the regional difference in response is not statistically significant.
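The same test can be run in Python with scipy. The No/West cell did not survive in the slide, so the value below is a placeholder assumption, and the computed X² will not match 3.032 exactly:

```python
from scipy import stats

# Observed frequencies from the slides; 30 is a placeholder for the
# lost No/West count, not the slide's actual value.
observed = [[68, 75, 57, 79],
            [32, 45, 33, 30]]

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(chi2, p, dof)                 # dof = (4 - 1) * (2 - 1) = 3
print(stats.chi2.ppf(0.95, df=3))   # critical value at alpha = .05 -> 7.815
```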
Log Likelihood Chi-Square
Log Likelihood
- Based on probability distributions rather than contingency (frequency) tables.
- Applicable to both categorical and continuous variables, unlike chi-square, which requires discretized (categorical) data.
Contingency Table (Observed Frequencies)

|      | Cluster 1 | Cluster 2 | Total |
| Male |    10     |    30     |  40   |
Contingency Table (Expected Frequencies)

|      | Cluster 1 | Cluster 2 | Total |
| Male |    20     |    20     |  40   |
Chi-Square Computation

| Cell | fₒ | fₑ | fₒ − fₑ | (fₒ − fₑ)² / fₑ |
| 1,1  | 10 | 20 |  −10    | 5.00            |
| 1,2  | 30 | 20 |   10    | 5.00            |

X² = 10.00; DF = 1; critical value = 3.84; p < 0.05
Log Likelihood Distance & Probability

G² = 2 Σ O·ln(O/E)
- Cluster 1: O = 10, E = 20, O/E = 10/20 = 0.5, ln(0.5) = −0.693, O·ln(O/E) = −6.93
- Cluster 2: O = 30, E = 20, O/E = 30/20 = 1.5, ln(1.5) = 0.405, O·ln(O/E) = 12.16
- G² = 2 × (−6.93 + 12.16) = 10.46

p < 0.05; DF = 1; critical value = 3.84
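A quick numeric check of G² in Python (NumPy); scipy.stats.power_divergence(O, E, lambda_="log-likelihood") returns the same statistic:

```python
import numpy as np

O = np.array([10.0, 30.0])   # observed: Male in Cluster 1 / Cluster 2
E = np.array([20.0, 20.0])   # expected under independence

G2 = 2 * np.sum(O * np.log(O / E))
print(round(G2, 2))  # 10.46 > 3.84 (critical value at DF = 1): significant
```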
Variance, ANOVA, and F Statistics
F-Statistics
- For metric (continuous) variables.
- Compares explained variance (accounted for by the model) with unexplained variance (error).
Variance (example: N = 20 observations, mean = 43.6)
- SS (sum of squared deviations from the mean) = 1461
- DF = N − 1 = 19
- VAR = SS / DF = 1461 / 19 ≈ 76.89
- SD = √VAR ≈ 8.77
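The same computation in Python (NumPy); the slide's 20 raw values were lost, so random stand-in data is used here:

```python
import numpy as np

x = np.random.default_rng(1).normal(43.6, 8.8, size=20)  # stand-in data

ss = np.sum((x - x.mean()) ** 2)  # SS: sum of squared deviations from the mean
df = len(x) - 1                   # DF = N - 1 = 19
var = ss / df                     # VAR = SS / DF
sd = np.sqrt(var)                 # SD = sqrt(VAR)
print(ss, df, var, sd)            # var matches np.var(x, ddof=1)
```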
ANOVA
- Two groups: t-test.
- Three or more groups: are errors (discrepancies between observations and the overall mean) explained by group membership or by some other (random) effect?
Oneway ANOVA
- Compute the grand mean and the means of Group 1, Group 2, and Group 3.
- Squared deviations from the grand mean, (X − grand mean)², sum to the Total SS.
- Squared deviations from the group means, (X − group mean)², sum to the Within-groups SS.
- Between-groups SS = Total SS − Within-groups SS.
F = Mean SS (Between) / Mean SS (Within)

| Source         | SS | DF          | Mean SS |
| Between groups | …  | 3 − 1 = 2   |   …     |
| Within groups  | …  | 24 − 3 = 21 | 0.696   |
| Total          | …  | 24 − 1 = 23 |         |

p-value < .05
ONEWAY (Excel or SPSS)

Anova: Single Factor

SUMMARY
| Groups  | Count | Sum | Average | Variance |
| Group 1 |   …   |  …  |    …    |    …     |
| Group 2 |   …   |  …  |    …    |    …     |
| Group 3 |   …   |  …  |    …    |    …     |

ANOVA
| Source of Variation | SS | df | MS | F | P-value | F crit |
| Between Groups      | …  | …  | …  | … |    …    |   …    |
| Within Groups       | …  | …  | …  |   |         |        |
| Total               | …  | …  |    |   |         |        |
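An equivalent one-way ANOVA in Python (scipy), with three hypothetical groups of 8 observations each to match the slide's degrees of freedom (between = 2, within = 21):

```python
from scipy import stats

# Hypothetical data: 3 groups x 8 observations = 24 cases
g1 = [4.1, 3.8, 4.6, 5.0, 4.3, 3.9, 4.4, 4.7]
g2 = [5.2, 4.9, 5.6, 5.1, 4.8, 5.5, 5.0, 5.3]
g3 = [6.0, 5.7, 6.3, 5.9, 6.1, 5.6, 6.2, 5.8]

f, p = stats.f_oneway(g1, g2, g3)   # F = Mean SS(between) / Mean SS(within)
print(f, p)                         # p < .05 -> group means differ
```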
Profiling
Customer Profiling: Documenting or Describing
- Who is likely to buy, or to respond (or not)?
- Who is likely to buy which product or service?
- Who is in danger of lapsing?
CHAID or CART

CHAID (Chi-Square Automatic Interaction Detector)
- Based on chi-square
- All variables discretized
- Dependent variable: nominal

CART (Classification and Regression Tree)
- Variables can be discrete or continuous
- Based on GINI or F-test
- Dependent variable: nominal or continuous
Use of Decision Trees
- Classify observations on a binary or nominal target variable: segmentation
- Predict responses on a numerical target variable: behaviour
- Derive decision support rules: processing
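To make this concrete, a minimal CART-style tree in Python (scikit-learn here purely as an illustration; the course tools are SPSS-based, and the data and response rule below are invented):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 200
age = rng.integers(18, 75, n)
income = rng.normal(55, 15, n)        # income in $000 (made up)
respond = (age > 40) & (income > 50)  # invented response rule
respond ^= rng.random(n) < 0.1        # plus 10% noise

X = np.column_stack([age, income])
tree = DecisionTreeClassifier(criterion="gini", max_depth=2).fit(X, respond)
print(export_text(tree, feature_names=["age", "income"]))  # profiling rules
```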
Decision Tree
Example: dmdata.sav (underlying theory: X²)
CHAID Algorithm: Selecting Variables
- Example: Region (4 categories), Gender (3, including missing), Age (6, including missing).
- For each variable, collapse categories to maximize the chi-square test of independence, e.g. Region (N, S, E, W, *) → (WSE, N*).
- Select the most significant variable.
- Go to the next branch … and the next level.
- Stop growing if estimated X² < theoretical X².
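A rough sketch of the category-collapsing step in Python (scipy/pandas). This illustrates the idea, not SPSS's exact CHAID implementation; the counts reuse the earlier region example, with the lost cell replaced by a placeholder:

```python
from itertools import combinations
from scipy.stats import chi2_contingency
import pandas as pd

def most_mergeable_pair(table):
    """Find the pair of categories (rows) whose 2 x k sub-table is LEAST
    significant, i.e. the pair CHAID would collapse into one category."""
    best_pair, best_p = None, -1.0
    for a, b in combinations(table.index, 2):
        _, p, _, _ = chi2_contingency(table.loc[[a, b]])
        if p > best_p:
            best_pair, best_p = (a, b), p
    return best_pair, best_p

# Region x Response counts (placeholder 30 for the lost No/West cell)
table = pd.DataFrame({"Yes": [68, 75, 57, 79], "No": [32, 45, 33, 30]},
                     index=["N", "S", "E", "W"])
print(most_mergeable_pair(table))  # merge repeatedly while p > threshold
```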
CART (Nominal Target)
- Nominal targets: GINI (impurity reduction) or entropy.
- Based on squared probabilities of node membership; Gini = 0 when targets are perfectly classified.
- Gini index = 1 − Σ pᵢ²
- Example: P(Bus) = 0.4, P(Car) = 0.3, P(Train) = 0.3 → Gini = 1 − (0.4² + 0.3² + 0.3²) = 1 − 0.34 = 0.66
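The Gini formula as a two-line Python function, reproducing the slide's example:

```python
def gini(probs):
    """Gini impurity: 1 - sum of squared class probabilities."""
    return 1 - sum(p * p for p in probs)

print(gini([0.4, 0.3, 0.3]))  # ~0.66 (Bus / Car / Train example)
print(gini([1.0, 0.0, 0.0]))  # 0.0   (perfectly classified node)
```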
CART (Metric Target)
- Continuous target variables: split on variance reduction (F-test).
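A small sketch of variance reduction for a candidate split (Python/NumPy; the spend figures and the binary split variable are invented):

```python
import numpy as np

def variance_reduction(y, mask):
    """Drop in variance from splitting y into y[mask] and y[~mask],
    weighting each child's variance by its share of the cases."""
    y = np.asarray(y, dtype=float)
    left, right = y[mask], y[~mask]
    w_l, w_r = len(left) / len(y), len(right) / len(y)
    return np.var(y) - (w_l * np.var(left) + w_r * np.var(right))

spend = np.array([12.0, 15.0, 11.0, 14.0, 40.0, 44.0, 38.0, 42.0])
is_member = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=bool)
print(variance_reduction(spend, is_member))  # large reduction -> good split
```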
Comparative Advantages (from Wikipedia)
- Simple to understand and interpret
- Requires little data preparation
- Able to handle both numerical and categorical data
- Uses a white-box model easily explained by Boolean logic
- Possible to validate a model using statistical tests
- Robust