Published by Daniella Cox. Modified over 8 years ago.
1
Business Intelligence and Decision Modeling
Week 9: Customer Profiling
Decision Trees (Part 2): CHAID, CRT
2
CHAID or CART
CHAID: Chi-Square Automatic Interaction Detector
  Based on chi-square
  All variables discretized
  Dependent variable: nominal
CART: Classification and Regression Tree
  Variables can be discrete or continuous
  Based on Gini or F-test
  Dependent variable: nominal or continuous
3
Use of Decision Trees
Classify observations from a target binary or nominal variable: Segmentation
Predictive response analysis from a target numerical variable: Behaviour
Decision support rules: Processing
4
Decision Tree
5
Example: dmdata.sav
Underlying theory: χ²
6
CHAID Algorithm
Selecting variables
  Example: Region (4), Gender (3, including Missing), Age (6, including Missing)
For each variable, collapse categories to maximize the chi-square test of independence
  Ex: Region (N, S, E, W, *) → (WSE, N*)
Select the most significant variable
Go to next branch … and next level
Stop growing if … estimated χ² < theoretical χ²
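The category-collapsing step can be illustrated with a minimal Python sketch. It computes the Pearson chi-square statistic for every pair of Region categories against a binary response and picks the pair that differs least, which is the natural candidate to merge. The counts are invented for illustration and are not taken from dmdata.sav; the names chi_square, counts and closest_pair are hypothetical. The threshold 3.841 is the chi-square critical value for 1 degree of freedom at the 0.05 level. A full CHAID implementation repeats this merge step and applies further adjustments that are not shown here.

```python
from itertools import combinations


def chi_square(rows):
    """Pearson chi-square statistic for a contingency table given as rows of counts."""
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(col_totals)
    stat = 0.0
    for row in rows:
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical (responders, non-responders) counts per region; illustrative only.
counts = {
    "N": (35, 65),
    "S": (20, 80),
    "E": (12, 88),
    "W": (13, 87),
}

# Chi-square of each 2x2 sub-table (one pair of regions vs. the response).
pair_stats = {
    pair: chi_square([counts[pair[0]], counts[pair[1]]])
    for pair in combinations(counts, 2)
}

# The pair with the smallest statistic has the most similar response profile.
closest_pair = min(pair_stats, key=pair_stats.get)

# Critical value of chi-square with 1 degree of freedom at alpha = 0.05.
CRITICAL_DF1_05 = 3.841

# Merge only if the two categories are not significantly different.
should_merge = pair_stats[closest_pair] < CRITICAL_DF1_05

print(closest_pair, round(pair_stats[closest_pair], 3), should_merge)
```

With these made-up counts, E and W respond almost identically and would be merged first, while N and S differ enough to stay separate at this step.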
7
CART (Nominal Target)
Nominal targets: Gini (impurity reduction or entropy)
Squared probability of node membership
Gini = 0 when targets are perfectly classified.
Gini index = 1 - ∑ p_i^2
Example
  Prob: Bus = 0.4, Car = 0.3, Train = 0.3
  Gini = 1 - (0.4^2 + 0.3^2 + 0.3^2) = 0.660
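The Gini arithmetic on this slide can be checked with a short Python sketch. The function name gini_index is a hypothetical helper, not part of any particular package; it takes the class probabilities in a node and returns one minus the sum of their squares.

```python
def gini_index(probs):
    """Gini impurity of a node: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in probs)


# Slide example: Bus = 0.4, Car = 0.3, Train = 0.3
gini_mixed = gini_index([0.4, 0.3, 0.3])

# A perfectly classified node contains a single class.
gini_pure = gini_index([1.0, 0.0, 0.0])

print(round(gini_mixed, 3), gini_pure)
```

CART prefers the split whose child nodes have the lowest weighted Gini, that is, the largest reduction in impurity relative to the parent node.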
8
CART (Metric Target)
Continuous variables: variance reduction (F-test)
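For a continuous target, the variance-reduction criterion can be sketched as follows. This is an illustration under simple assumptions: it uses population variance, the target values are invented, and it shows only the reduction itself, not the associated F-test. The names variance and variance_reduction are hypothetical helpers.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def variance_reduction(parent, left, right):
    """Parent variance minus the size-weighted variance of the two child nodes."""
    n = len(parent)
    weighted_child = (
        len(left) / n * variance(left) + len(right) / n * variance(right)
    )
    return variance(parent) - weighted_child


# Hypothetical target values (for example, spend per customer) in one node.
parent = [10, 12, 11, 30, 32, 31]

# A split that separates low from high values removes almost all variance.
good_split = variance_reduction(parent, [10, 12, 11], [30, 32, 31])

# A split that mixes low and high values removes much less.
poor_split = variance_reduction(parent, [10, 30, 11], [12, 32, 31])

print(round(good_split, 3), round(poor_split, 3))
```

The tree keeps the candidate split with the largest reduction, so the first split would be chosen over the second.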
9
Comparative Advantages (from Wikipedia)
Simple to understand and interpret
Requires little data preparation
Able to handle both numerical and categorical data
Uses a white-box model easily explained by Boolean logic
Possible to validate a model using statistical tests
Robust