THE BEGINNING.

Learning Objective: Be prepared to use the Auto Numeric node in IBM SPSS Modeler. Having used the node, identify the best model. Be prepared to interpret the outputs of CHAID, Regression, Linear, KNN (K Nearest Neighbors) Algorithm, and C&R (Classification and Regression) Tree.

Productivity Drivers
Creation of Value Through Exchange
Competition (Falling Prices)

Mass Marketing
Customize, target low incidence, high value customers
Mass Customization

DATABASE
Marketer Pushes Product (Queries, Mail Merge)
Customer Pulls Product (Web App)
1 to 1

Segmentation & Targeting
Identifying Segments
Measuring Market Segment Value: Predicting Consumer Response (Models)
Cluster Analysis
Gain Scores
Lifetime Value Analysis

Non-Statistical
Statistical

Table Design
Relationship Editor
--Joins: 1 to 1, 1 to ∞, ∞ to ∞
Queries
--Select Query (Bring data from one or more tables into virtual table.)
● Sort
● And / Or Logic
--Inner / Outer Join
RFM (current customers)
Market Basket Analysis (Web, Directed Web, Apriori)

Data Attributes
Data Types: Nominal, Ordinal, Interval, Ratio
Data Attributes: Central Tendency, Spread

Relationship Tests
Correlation: Pearson (Interval/Ratio), Spearman (Ordinal)

Difference Tests
T-Test (Nominal/Interval or Ratio)
Mann-Whitney U (Nominal/Ordinal)
Chi Square (Nominal/Nominal or just Nominal)

Comprehensive Models
CHAID
Regression (simple & multiple)
IBM Modeler Autonumeric (compares multiple models)

KNN Algorithm (K Nearest Neighbors)

This procedure plots all observations in multidimensional space, here in three dimensions: volatile acidity, alcohol, and Award. Each wine is a point in this space. Placing the cursor on a dot identifies the wine: the chart shows the observation number and the value of its dependent variable. In this case, wine 1002 has a quality value of 5. Values are represented by the darkness of the dot, as indicated by the key on the left of the chart.
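
The idea behind the KNN plot can be sketched in a few lines of Python. This is only an illustration, not what IBM SPSS Modeler runs internally: the wine rows are invented, the function names are hypothetical, and the sketch assumes min-max scaling, Euclidean distance, and a prediction equal to the mean quality of the k nearest wines.

```python
import math

# Hypothetical wines: (volatile_acidity, alcohol, award) -> quality.
# These rows are invented for illustration; they are not the slide's data.
WINES = [
    ((0.30, 12.5, 1), 7),
    ((0.35, 12.0, 1), 7),
    ((0.60, 9.5, 0), 5),
    ((0.65, 9.8, 0), 5),
    ((0.50, 10.5, 0), 6),
    ((0.45, 11.0, 1), 6),
]


def make_scaler(rows):
    """Build a min-max scaler so each dimension contributes comparably to distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(row):
        return tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )

    return scale


def knn_predict(train, query, k=3):
    """Return (mean quality of the k nearest wines, their individual qualities)."""
    scale = make_scaler([features for features, _ in train])
    q = scale(query)
    # Sort the training wines by Euclidean distance to the query point.
    nearest = sorted(train, key=lambda item: math.dist(scale(item[0]), q))[:k]
    neighbour_values = [quality for _, quality in nearest]
    return sum(neighbour_values) / k, neighbour_values


prediction, neighbours = knn_predict(WINES, (0.62, 9.6, 0), k=3)
print(prediction, neighbours)
```

A wine that sits close to low-quality wines in the plotted space receives a low predicted quality, which is what the shading of neighboring dots conveys visually.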

CHAID Output in Viewer

C&R Tree uses the Gini impurity measure (where CHAID uses chi square) to split off branches that differentiate groups with different patterns of response. It always splits into two branches. Improvement measures the gain in purity (that is, the decrease in impurity) following the split.
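
A small worked example may make the Improvement figure more concrete. The sketch below assumes a categorical target and the standard Gini formula (one minus the sum of squared class proportions); the responses and function names are hypothetical, and the exact value reported by Modeler may be computed with additional options not shown here.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, larger when classes are mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def improvement(parent, left, right):
    """Drop in impurity from splitting parent into two branches."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical responses, invented for illustration.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print(gini(parent), improvement(parent, left, right))
```

The tree-growing step tries candidate two-way splits and favors the one with the largest improvement.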

The Regression node functions the same way as regression does in SPSS. It uses the Enter method, i.e., it enters all of the predictors you list into the equation together rather than selecting among them. The outputs are interpreted in exactly the same way as they would be in SPSS.

Regression Prediction Calculator

The Linear node is a regression node that works with transformed data. In other words, the data is modified in certain ways (e.g., trimming outliers) before running the regression. While the printouts are somewhat different from those for the Regression node, they should be interpreted in the same way. The R-squared value is given in the Accuracy pane; here, 36.5% of the variance is explained by this model. The coefficients and p-values are shown when you put the cursor over one of the lines. For nominal variables, a coefficient is given for every value except the baseline value. Here Award = 1 is the baseline, so no coefficient is given for that value. For Award = 0, multiply the given coefficient by an X value of 1.
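
As a reminder of what "variance explained" means, R-squared can be computed from actual and predicted values as one minus the ratio of the residual sum of squares to the total sum of squares. The numbers below are invented for illustration, and the function name is hypothetical; the figure shown in the Accuracy pane may be adjusted in ways this sketch does not reproduce.

```python
def r_squared(actual, predicted):
    """Share of the variance in actual that the predictions account for."""
    mean_y = sum(actual) / len(actual)
    ss_total = sum((y - mean_y) ** 2 for y in actual)
    ss_residual = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical quality scores and model predictions.
actual = [5, 6, 7, 5, 6]
predicted = [5.5, 6.0, 6.5, 5.5, 5.5]

print(f"{r_squared(actual, predicted):.1%} of the variance is explained")
```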

Nominal & Ordinal Variables in Linear Node

If you have a nominal or ordinal variable in the Linear node, a coefficient will be reported for all values of the variable except one, the baseline value. Suppose you have a nominal ethnicity variable with the following values: 1 Hispanic, 2 Black, 3 Asian, 4 Caucasian, 5 Native American. The Linear node would report 4 coefficients, e.g., for values 1, 2, 3, and 4. At least one value will not be given a coefficient, e.g., 5. The value or values without a coefficient are the baseline. To calculate the predicted value when ethnicity is Asian, use the coefficient for 3 Asian (with an X value of 1) along with the other terms. If the ethnicity is Native American, calculate the predicted value from all coefficients except ethnicity to get the predicted value for this baseline group.

We have a coefficient for all of the ethnic groups except group 5, Native Americans. To get the predicted Native American value, just use Intercept, Alcohol, and Volatileacidity.
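
The baseline rule can be written out as a small prediction calculator. Every number below is a made-up placeholder rather than a coefficient from the Linear node output, and the names are hypothetical; the point is only that a baseline category adds nothing to the prediction, while any other category adds its coefficient times 1.

```python
# Hypothetical coefficients, invented for illustration only.
INTERCEPT = 2.0
ALCOHOL_COEF = 0.35
VOLATILE_ACIDITY_COEF = -1.2

# One coefficient per ethnicity value except the baseline (5 Native American).
ETHNICITY_COEFS = {1: 0.10, 2: -0.05, 3: 0.20, 4: 0.15}


def predict(alcohol, volatile_acidity, ethnicity):
    """Predicted value = intercept + numeric terms + category coefficient (if any)."""
    value = INTERCEPT
    value += ALCOHOL_COEF * alcohol
    value += VOLATILE_ACIDITY_COEF * volatile_acidity
    # The baseline group has no coefficient, so it contributes 0.
    value += ETHNICITY_COEFS.get(ethnicity, 0.0) * 1
    return value


print(predict(10, 0.5, ethnicity=3))  # Asian: includes the coefficient for value 3
print(predict(10, 0.5, ethnicity=5))  # Native American: baseline, no ethnicity term
```

The difference between the two predictions equals the coefficient for value 3, which is how a category coefficient should be read: the shift relative to the baseline group.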

THE END