MKT 700 Business Intelligence and Decision Models Week 8: Algorithms and Customer Profiling (1)


Classification and Prediction

Classification Unsupervised Learning

Predicting Supervised Learning

SPSS Direct Marketing
Unsupervised Learning. Classification: RFM, Cluster analysis, Postal Code Responses. Predictive: NA.
Supervised Learning. Classification: Customer Profiling. Predictive: Propensity to buy.

SPSS Analysis
Unsupervised Learning. Classification: Hierarchical Cluster, Two-Step Cluster, K-Means Cluster. Predictive: NA.
Supervised Learning. Classification: Classification Trees (CHAID, CART). Predictive: Linear Regression, Logistic Regression, Artificial Neural Nets.

Major Algorithms
Unsupervised Learning. Classification: Euclidean Distance, Log Likelihood. Predictive: NA.
Supervised Learning. Classification: Chi-square Statistics, Log Likelihood, GINI Impurity Index, F-Statistics (ANOVA). Predictive: Log Likelihood, F-Statistics (ANOVA).
Nominal: Chi-square, Log Likelihood
Continuous: F-Statistics, Log Likelihood

Euclidean Distance

Euclidean Distance for Continuous Variables
Pythagorean distance → $\sqrt{d^2} = \sqrt{a^2 + b^2}$
Euclidean space → $\sqrt{d^2} = \sqrt{a^2 + b^2 + c^2}$
Euclidean distance → $d = \left[\sum_i (d_i)^2\right]^{1/2}$
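The distance formula above can be sketched in a few lines of Python. This is a minimal illustration, not SPSS's implementation; the function name `euclidean_distance` and the sample points are assumptions made for the example.

```python
import math

def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Two dimensions (Pythagorean case): the 3-4-5 right triangle.
print(euclidean_distance((0, 0), (3, 4)))        # 5.0
# Three dimensions: the same rule with one more squared term.
print(euclidean_distance((0, 0, 0), (2, 3, 6)))  # 7.0
```

In cluster analysis the variables are usually standardized first, so that no single variable dominates the sum because of its scale.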

Pearson’s Chi-Square

Contingency Table
Columns: North, South, East, West, Total
Rows: Yes, No, Total

Observed and Theoretical Frequencies
Columns: North, South, East, West, Total
Rows: Yes (%), No (%), Total

Chi-Square: Obs., $f_o$, $f_e$, $f_o - f_e$, $(f_o - f_e)^2 / f_e$
Observed frequencies $f_o$ by cell: (1,1) 68; (1,2) 75; (1,3) 57; (1,4) 79; (2,1) 32; (2,2) 45; (2,3) 33; (2,4) not preserved
$X^2 = 3.032$
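The column layout above (observed, expected, difference, squared difference over expected) can be sketched as follows. The counts in `table` are hypothetical, chosen only for illustration, because the slide's full table is not recoverable; `pearson_chi_square` is likewise an assumed name.

```python
def pearson_chi_square(observed):
    """Sum of (f_o - f_e)^2 / f_e over every cell of a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            # Expected frequency under independence of rows and columns.
            f_e = row_totals[i] * col_totals[j] / n
            chi2 += (f_o - f_e) ** 2 / f_e
    return chi2

# Hypothetical Yes/No responses for North, South, East, West.
table = [[60, 70, 50, 80],
         [40, 30, 50, 20]]
print(round(pearson_chi_square(table), 3))
```

Each expected frequency is the row total times the column total divided by the grand total, which is what the "theoretical frequencies" slide tabulates.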

Statistical Inference
DF: (4 columns − 1) × (2 rows − 1) = 3
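As a rough check on the inference step, the degrees of freedom and the p-value for the reported $X^2 = 3.032$ can be computed with the standard library alone. The closed-form tail probability used here holds only for 3 degrees of freedom; the function names are assumptions for this sketch.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """DF of a contingency table: (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def chi2_upper_tail_df3(x):
    """P(chi-square with 3 DF > x), using the closed form valid for 3 DF."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

df = degrees_of_freedom(2, 4)           # 2 rows (Yes/No), 4 regions
p_value = chi2_upper_tail_df3(3.032)    # X^2 reported on the slide
print(df, round(p_value, 3))
print(p_value < 0.05)                   # is the association significant at 5%?
```

With 3 degrees of freedom the 5% critical value is about 7.815, so a statistic of 3.032 falls well short of significance.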

Log Likelihood Chi-Square

Log Likelihood
Based on probability distributions rather than contingency (frequency) tables.
Applicable to both categorical and continuous variables, unlike chi-square, which requires continuous variables to be discretized.

Contingency Table (Observed Frequencies)
Columns: Cluster 1, Cluster 2, Total
Male: 10, 30, 40

Contingency Table (Expected Frequencies)
Columns: Cluster 1, Cluster 2, Total
Rows: Male

Chi-Square: Obs., $f_o$, $f_e$, $f_o - f_e$, $(f_o - f_e)^2 / f_e$
Cell (1,1): $f_o = 10$
$X^2$ significant at p < 0.05 (DF = 1; critical value = 3.84)

Log Likelihood Distance & Probability
Columns: Cluster 1, Cluster 2
Rows (Male): O, E, O/E, ln(O/E), O × ln(O/E)
Statistic: $2 \sum O \ln(O/E)$
Male, Cluster 1: O/E = 10/20; the expected frequency is 20 in both Male cells.
p < 0.05; critical value = 3.84
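The statistic $2 \sum O \ln(O/E)$, often called the likelihood-ratio or G statistic, can be sketched as below. The Male cells (10 and 30 observed, 20 expected) come from the slides; the second row of counts is a hypothetical assumption added so the table is complete, and `g_statistic` is an assumed name.

```python
import math

def g_statistic(observed, expected):
    """Likelihood-ratio chi-square: 2 * sum of O * ln(O / E) over all cells."""
    # Cells with O == 0 contribute nothing (the limit of O * ln(O) is 0).
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)

# Cells flattened row by row: Male C1, Male C2, then a hypothetical second row.
observed = [10, 30, 30, 10]
expected = [20, 20, 20, 20]
g = g_statistic(observed, expected)
print(round(g, 2))
print(g > 3.84)   # compare with the 5% critical value for DF = 1
```

Like Pearson's chi-square, the result is compared against the chi-square distribution with (rows − 1) × (columns − 1) degrees of freedom.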

Variance, ANOVA, and F Statistics

F-Statistics
For metric or continuous variables.
Compares explained (in the model) and unexplained (error) variances.

Variance
Columns: VALUE, MEAN, DIFFERENCE, SQUARED
COUNT = 20; MEAN = 43.6; SS = 1461; DF = 19; VAR = 76.88; SD = 8.768
SS is the Sum of Squares; DF = N − 1; VAR = SS/DF; SD = √VAR
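The chain SS → DF → VAR → SD can be sketched step by step. The eight values in `data` are hypothetical (the slide's 20 observations are not recoverable), and `variance_steps` is an assumed name.

```python
import math

def variance_steps(values):
    """Return the intermediate quantities used to build a sample variance."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)   # sum of squared differences
    df = n - 1                                  # degrees of freedom
    var = ss / df
    return {"mean": mean, "ss": ss, "df": df, "var": var, "sd": math.sqrt(var)}

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations
steps = variance_steps(data)
print(steps)

# Slide figures: SS = 1461 and DF = 19 give VAR = SS / DF.
print(round(1461 / 19, 2), round(math.sqrt(1461 / 19), 3))
```

Recomputing from the rounded SS of 1461 lands within a few hundredths of the slide's VAR and SD, which suggests the slide used an unrounded sum of squares.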

ANOVA
Two groups: t-test.
Three or more group comparisons: Are errors (discrepancies between observations and the overall mean) explained by group membership or by some other (random) effect?

Oneway ANOVA
Columns: Group 1, Group 2, Group 3
Grand mean: (X − Mean) taken from the grand mean gives Total SS.
Group means: (X − Mean) taken from each group mean gives SS Within.

F = MSS(Between)/MSS(Within)
Within groups: DF = 24 − 3 = 21
Between groups: DF = 3 − 1 = 2
Total errors: DF = 24 − 1 = 23
Within Groups Mean SS = 0.696
p-value < .05

ONEWAY (Excel or SPSS)
Anova: Single Factor
SUMMARY. Columns: Groups, Count, Sum, Average, Variance. Rows: Group 1, Group 2, Group 3.
ANOVA. Columns: Source of Variation, SS, df, MS, F, P-value, F crit. Rows: Between Groups, Within Groups, Total.
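The quantities in the ANOVA table (SS, df, MS, F) can be sketched without a statistics package. The three groups below are hypothetical, and `one_way_anova` is an assumed name; Excel and SPSS additionally report the p-value and critical F, which this sketch leaves out.

```python
def one_way_anova(groups):
    """Return SS, DF and F for a one-way (single-factor) ANOVA."""
    values = [x for g in groups for x in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    # Between groups: group means against the grand mean, weighted by size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within groups: each observation against its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_between": ss_between, "ss_within": ss_within,
            "df_between": df_between, "df_within": df_within, "f": f}

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]   # hypothetical data
result = one_way_anova(groups)
print(result)
```

The between and within sums of squares add up to the total sum of squares, and their degrees of freedom add up to N − 1.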

Profiling

Customer Profiling: Documenting or Describing
Who is likely to buy or not respond?
Who is likely to buy what product or service?
Who is in danger of lapsing?

Profiling/Decision Tree
SPSS Direct Marketing → Customer Profiling, Postal Code responses
SPSS Analysis → Classification → Decision Tree
CHAID (Chi-square Automatic Interaction Detection)
CART (Classification and Regression Trees)
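CHAID chooses splits with the chi-square statistic, while CART-style classification trees typically use the Gini impurity index listed among the major algorithms. A minimal sketch of scoring one candidate split by Gini impurity follows; the labels and function names are hypothetical, and this does not reproduce SPSS's tree procedure.

```python
from collections import Counter

def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    """Impurity of a binary split, weighting each child by its share of cases."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Hypothetical responses before and after splitting customers on one predictor.
parent = ["buy", "no", "buy", "no"]
left, right = ["buy", "buy"], ["no", "no"]
print(gini_impurity(parent))       # 0.5: evenly mixed node
print(weighted_gini(left, right))  # 0.0: both children are pure
```

A tree builder would evaluate every candidate split this way and keep the one with the largest drop in impurity.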