Lecture Notes 4 Pruning Zhangxi Lin ISQS

Name: Lecture Notes 4 Pruning Zhangxi Lin ISQS
Uploaded: 2017-10-09T22:10:35+00:00
Duration: PTM15S33
Channel: Marilynn Dorsey
Description: Lecture Notes 4 Pruning Zhangxi Lin ISQS

Lecture Notes 4: Pruning
Zhangxi Lin, ISQS 7342-001, Texas Tech University
Note: Most slides are from Decision Tree Modeling by SAS

Objectives
- Understand how the CHAID and CART algorithms, and other variations, finalize a decision tree by pruning
  - Pre-pruning vs. post-pruning (top-down vs. bottom-up)
  - Cross validation
- Understand the use of tree modeling parameters
  - Prior probabilities
  - Decision weights
  - Kass adjustment
- Examine the performance of different tree modeling configurations with SAS Enterprise Miner 5.2
- Know how Proc ARBORETUM is used

Chapter 3: Pruning
3.1 Pruning
3.2 Pruning for Profit
3.3 Pruning for Profit Using Cross Validation (Optional)
3.4 Compare Various Tree Settings and Performance

Maximal Tree
A maximal classification tree gives 100% accuracy on the training data and has no residual variability.

Overfitting
[Figure: fit on Training Data vs. New Data]
A maximal tree is the result of overfitting.

Underfitting
[Figure: fit on Training Data vs. New Data]
A small tree with only a few branches may underfit the data.

The “Right-Sized” Tree
Top-Down Stopping Rules (Pre-Pruning)
- Node size
- Tree depth
- Statistical significance
Bottom-Up Selection Criteria (Post-Pruning)
- Accuracy
- Profit
- Class-probability trees
- Least squares

Top-Down Pruning
[Figure: tree diagram with numeric annotations at each split and node]

Depth Multiplier
[Figure: tree whose nodes carry the depth multipliers 1, 3, 6, 12, 36, 24, 48]
Depth adjustment = p-value x depth multiplier
Example: depth multiplier 48 = 2 x 2 x 4 x 3
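The depth adjustment can be sketched in a few lines of Python. This is a minimal illustration, assuming the factors in 48 = 2 x 2 x 4 x 3 are the branch counts of the splits on the path from the root to the node being tested. The function names, the sample p-value, and the cap at 1 are illustrative assumptions, not SAS behavior confirmed by the slides.

```python
from math import prod

def depth_multiplier(branch_counts):
    """Product of the number of branches at each split above the node."""
    return prod(branch_counts)

def depth_adjusted_p(p_value, branch_counts):
    """Raw p-value times the depth multiplier (capped at 1 as an assumption)."""
    return min(1.0, p_value * depth_multiplier(branch_counts))

path = [2, 2, 4, 3]   # branch counts from the slide's example
raw_p = 0.001         # hypothetical p-value of a candidate split

multiplier = depth_multiplier(path)
adjusted = depth_adjusted_p(raw_p, path)
print(multiplier, adjusted)
```

The deeper the node, the larger the multiplier, so a split far from the root needs a much smaller raw p-value to pass the same significance level.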

Tree Node Defaults and Options
Property groups: Splitting Rule, Node, Split Search, Subtree, P-Value Adjustment

Top-Down Pruning Options
- The default maximum depth in the Decision Tree node is 6. The value can be changed with the Maximum Depth option.
- The Split Size option specifies the smallest number of training observations that a node must have to be considered for splitting. Valid values are between 2 and [upper bound missing in the source].
- The liberal significance level of .2 (logworth = 0.7) is the default. It can be changed with the Significance Level option.
- By default, the depth multiplier is applied. It can be turned off with the Split Adjustment option in the P-Value Adjustment properties.
- A further adjustment for the Number of Inputs available at a node for splitting can be used. This option is available in the P-Value Adjustment properties. It is not activated by default (Inputs=No).
- To specify pre-pruning only, set the SubTree option to Largest.
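The link between the significance level of .2 and logworth = 0.7 follows from the definition logworth = -log10(p-value). A small Python sketch of that conversion follows. The helper names are illustrative, and the comparison rule is a plain restatement of "p-value at or below the significance level", not SAS code.

```python
from math import log10

def logworth(p_value):
    """Logworth is the negative base-10 logarithm of the p-value."""
    return -log10(p_value)

def passes_significance(p_value, alpha=0.2):
    """A split passes when its logworth reaches the threshold logworth."""
    return logworth(p_value) >= logworth(alpha)

default_threshold = logworth(0.2)
print(round(default_threshold, 2))
```

A larger logworth means a more significant split, which is why a liberal level such as .2 corresponds to a low threshold.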

Bottom-Up Pruning
[Figure: Performance plotted against the number of Leaves, for Training Data and for Generalization]

Top-down vs. Bottom-up
Top-down pruning is usually faster but less effective than bottom-up pruning.
Breiman and Friedman, in their criticism of the FACT tree algorithm (Loh and Vanichsetakul 1988):
"Each stopping rule was tested on hundreds of simulated data sets with different structures. Each new stopping rule failed on some data set. It was not until a very large tree was built and then pruned, using cross-validation to govern the degree of pruning, that we observed something that worked consistently."

Model Selection Criteria
(Entries are training/validation values.)
Leaves:    5, 4, 3, 2, 1
Accuracy:  .90/.88, .89/.91, .88/.91, .59/.64
Profit:    .51/.43, .51/.40, .49/.44, .04/.1
ASE:       .17/.15, .18/.14, .19/.16, .20/.16, .48/.46

Bottom-up Selection Criteria
- The default tree selection criterion is Decision. The final tree will be selected based upon profit or loss if a decision matrix has been specified.
- The Lift criterion of Assessment Measure enables the user to restrict assessment to a specified proportion of the data. By default, Assessment Fraction is set to 0.25.

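Effect of Prior Probabilities: Confusion Matrix
[Table: confusion matrix of Actual Class by Decision/Action, with entries corrected for the prior probabilities]
The correction uses two quantities for each class i (their symbols were lost in extraction): the proportion of class i in the original population and the proportion of class i in the sample.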
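One common way to apply such a correction is to reweight each actual-class row of the confusion matrix by the ratio of the population proportion to the sample proportion. The Python sketch below assumes that form of correction. The slide's exact formula did not survive, so this is an illustration, not the slide's method. The priors (0.02, 0.98) come from the INSURANCE demonstration, while the sample proportions and the counts are hypothetical.

```python
def prior_weights(priors, sample_props):
    """Weight for class i = population proportion / sample proportion."""
    return {c: priors[c] / sample_props[c] for c in priors}

def correct_confusion(counts, priors, sample_props):
    """counts[actual][decision] -> proportions on the population scale."""
    w = prior_weights(priors, sample_props)
    weighted = {a: {d: n * w[a] for d, n in row.items()}
                for a, row in counts.items()}
    total = sum(v for row in weighted.values() for v in row.values())
    return {a: {d: v / total for d, v in row.items()}
            for a, row in weighted.items()}

priors = {1: 0.02, 0: 0.98}          # from the INSURANCE demonstration
sample_props = {1: 0.5, 0: 0.5}      # hypothetical oversampled training data
counts = {1: {1: 400, 0: 100},       # hypothetical counts: counts[actual][decision]
          0: {1: 150, 0: 350}}

corrected = correct_confusion(counts, priors, sample_props)
for actual, row in corrected.items():
    print(actual, {d: round(v, 3) for d, v in row.items()})
```

After the correction, the actual-class rows sum to the prior probabilities instead of the sample proportions, so accuracy and profit computed from the matrix refer to the original population.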

Tree Accuracy
[Figure: tree with leaves t1, t2, t3]
Tree accuracy is the accuracy of each leaf weighted by the size of the leaf.

Maximize Accuracy
Leaf  Data  1:    0:   tot:  Class:
1     Tr    85%   15%  42%   1
1     Va    83%   17%  40%   1
2     Tr    8.6%  91%  58%   0
2     Va    3.4%  97%  60%   0
Training Accuracy = (.42)(.85) + (.58)(.91) = .88
Validation Accuracy = (.40)(.83) + (.60)(.97) = .91
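The two accuracy figures above are size-weighted averages of the leaf accuracies. A short Python check of that arithmetic, using only the shares and percentages from the slide (the function name is illustrative):

```python
def tree_accuracy(leaves):
    """leaves: (share of cases in the leaf, accuracy within the leaf) pairs."""
    return sum(share * accuracy for share, accuracy in leaves)

training = [(0.42, 0.85), (0.58, 0.91)]
validation = [(0.40, 0.83), (0.60, 0.97)]

train_acc = tree_accuracy(training)
valid_acc = tree_accuracy(validation)
print(round(train_acc, 2), round(valid_acc, 2))
```

The accuracy within a leaf is the share of the class the leaf predicts, so the first leaf contributes its class 1 percentage and the second its class 0 percentage.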

Profit Matrix
[Table: profit matrix of Actual Class by Decision]
Bayes Rule: Decision 1 if [formula missing in the source]

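Maximize Profit
Leaf  Data  1:    0:   tot:  P1:    Class:
1     Tr    85%   15%  42%   1.18   1
1     Va    83%   18%  40%   1.11   1
2     Tr    8.6%  91%  58%   -.78   0
2     Va    3.4%  97%  60%   -.91   0
P1 is the expected profit of decision 1 in the leaf. P0, the expected profit of decision 0, is 0 in both leaves, so leaf 2 is assigned decision 0.
Training Profit = (.42)(1.18) + (.58)(0) = .50
Validation Profit = (.40)(1.11) + (.60)(0) = .44
[Figure: profit matrix, actual vs. predicted; the entries 1.56 and 1 are legible]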

Chapter 3: Pruning
3.2 Pruning for Profit

Demonstration – Pruning for Profit
Data set: INSURANCE
Parameters
- Prior probabilities: (0.02, 0.98)
- Decision weights: $150, -$3
Purposes
- To get familiar with defining prior probabilities for the target variable (recall how this is done in SAS EM 4.3)
- To view the results of the tree node
- To understand how parameters defined in the tree node panel affect the results
Note: Interactive tree growing is not working at this moment.

Cross Validation
     Train  Validate
1)   BCDE   A
2)   ACDE   B
3)   ABDE   C
4)   ABCE   D
5)   ABCD   E
Why cross validation?
- When the holdout set is small, performance measures can be unreliable.
How
1) Build a CHAID-type tree using the p-value associated with the chi-square or F statistic as a forward stopping rule.
2) Use v-fold cross validation: split the data into several equal sets, use each set in turn for validation while training on the rest, then average the results.
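The fold layout in the table can be generated mechanically. The following is a minimal Python sketch of v-fold splitting and averaging. The function names and the `fit` and `score` callables are placeholders for whatever model and performance measure are used. None of it is SAS code.

```python
def k_fold_splits(items, k):
    """Yield (train, validate) lists; each fold is held out exactly once."""
    folds = [items[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, held_out

def cross_validated_score(items, k, fit, score):
    """Average the validation score over the k folds."""
    scores = [score(fit(train), validate)
              for train, validate in k_fold_splits(items, k)]
    return sum(scores) / len(scores)

blocks = ["A", "B", "C", "D", "E"]
splits = [("".join(train), "".join(validate))
          for train, validate in k_fold_splits(blocks, 5)]
print(splits)
```

Every observation is used for validation once and for training in the other folds, which is what makes the averaged measure more stable than a single small holdout set.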

CV Program Summary
CV is most efficiently performed using the ARBORETUM procedure and SAS code. The procedure uses the p-value setting.
PREPARE DATA FOR CV
DO LOOP: vary p-value settings for tree
    NESTED DO LOOP: 10x CV for each p-value
    END
END
SELECT BEST P-VALUE SETTING
FIT FINAL MODEL
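The program outline translates into a nested loop. Below is a Python sketch of that control flow only. The `fit` and `score` callables stand in for the ARBORETUM fit and the assessment measure, and the toy versions at the bottom are deliberately trivial placeholders so that the loop can run. They are assumptions, not part of the lecture's SAS program.

```python
def select_p_value(data, p_values, fit, score, k=10):
    """Choose the p-value setting with the best mean k-fold validation score."""
    folds = [data[i::k] for i in range(k)]          # PREPARE DATA FOR CV
    mean_scores = {}
    for p in p_values:                              # DO LOOP over p-value settings
        fold_scores = []
        for i in range(k):                          # NESTED DO LOOP: k-fold CV
            train = [row for j, fold in enumerate(folds) if j != i for row in fold]
            fold_scores.append(score(fit(train, p), folds[i]))
        mean_scores[p] = sum(fold_scores) / k
    best = max(mean_scores, key=mean_scores.get)    # SELECT BEST P-VALUE SETTING
    return best, fit(data, best), mean_scores       # FIT FINAL MODEL on all data

# Toy stand-ins (hypothetical): the "model" only records its settings, and the
# score simply prefers p-values near 0.05.
def toy_fit(train, p):
    return {"p": p, "n": len(train)}

def toy_score(model, validate):
    return -abs(model["p"] - 0.05)

best, final_model, mean_scores = select_p_value(
    list(range(100)), [0.01, 0.05, 0.2], toy_fit, toy_score)
print(best, final_model)
```

The final model is refit on all of the data because cross validation is used here only to choose the setting, not to produce the tree itself.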


Demonstration – Cross Validation
Data set: INS_SMALL
SAS Code: ex3.2.sas
Parameters: p-value = 0.052
Purposes:
- How a SAS-generated graph is displayed in the web browser
- How to use PROC ARBOR
- How to customize the tree node

Configure the tree node
Parameters (Proc ARBOR):
- Maximum Branch=4 (MAXBRANCH=4)
- Split Size=80 (SPLITSIZE=80)
- Leaf Size (LEAFSIZE=40)
- Exhaustive=0 (EXHAUST=0)
- Method=Largest (SUBTREE=largest)
- Minimum Categorical Size (MINCATSIZE=15)
- Time of Kass Adjustment=after (PADJUST=chaidafter)

Class Probability Tree
Profit ASE
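For a class probability tree, ASE measures how far the leaf proportions are from the 0/1 outcomes. The Python sketch below assumes that the squared error is summed over both class indicators for each case. That convention is an assumption, adopted because it reproduces the training ASE values of .20 (two leaves) and .48 (one leaf) listed under Model Selection Criteria when applied to the leaf proportions from the Maximize Accuracy slide.

```python
def leaf_ase(p1):
    """Per-case squared error in a leaf that predicts class 1 with probability p1."""
    # Error on the class 1 indicator: cases that are 1 miss by (1 - p1),
    # cases that are 0 miss by p1. This simplifies to p1 * (1 - p1).
    err_class1 = p1 * (1 - p1) ** 2 + (1 - p1) * p1 ** 2
    # The class 0 indicator contributes the same amount (assumed convention).
    return 2 * err_class1

def tree_ase(leaves):
    """leaves: (share of cases, class 1 proportion) pairs."""
    return sum(share * leaf_ase(p1) for share, p1 in leaves)

two_leaves = [(0.42, 0.85), (0.58, 0.086)]
root_p1 = sum(share * p1 for share, p1 in two_leaves)
one_leaf = [(1.0, root_p1)]

print(round(tree_ase(two_leaves), 2), round(tree_ase(one_leaf), 2))
```

Under this convention the ASE of a leaf equals its Gini impurity, which is why pruning on ASE favors leaves with proportions close to 0 or 1 even when the predicted class does not change.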

Least Squares Pruning (for regression trees)
Binary Target

What is a regression tree?
In a linear regression model, when the data has many features that interact in complicated, nonlinear ways, assembling a single global model can be very difficult. An alternative approach to nonlinear regression is to partition the space into smaller regions where the interactions are more manageable. The subdivisions can be partitioned further (recursive partitioning) until the resulting chunks of the space are simple enough that each can be fit with a simple model. The global model then has two parts: the recursive partition, i.e. the regression tree, and a simple model for each cell of the partition.
There are two kinds of predictive trees: regression trees and classification trees (or class probability trees).
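As a concrete illustration of "a partition plus a simple model per cell", here is a minimal Python sketch of a one-split regression tree (a stump) on a single input. It assumes the simplest cell model, the cell mean, and chooses the split by least squares. The data and all names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total SSE) of the least-squares binary split on x."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

def fit_stump(xs, ys):
    """Partition at the best threshold and predict the mean of each cell."""
    threshold, _ = best_split(xs, ys)
    left = mean([y for x, y in zip(xs, ys) if x <= threshold])
    right = mean([y for x, y in zip(xs, ys) if x > threshold])
    return lambda x: left if x <= threshold else right

xs = [1, 2, 3, 10, 11, 12]             # hypothetical input values
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]    # hypothetical target values
predict = fit_stump(xs, ys)
print(best_split(xs, ys)[0], predict(2), predict(11))
```

A full regression tree applies the same split search recursively inside each cell, and least squares pruning then removes the splits that do not reduce squared error on validation data.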

CART-Like Class Probability Tree Settings

Chapter 3: Pruning
3.4 Compare Various Tree Settings and Performance

Demonstration – Tree Settings Comparison
Data set: CUSTOMERS (used as test data)
Purposes:
- How to use a test data set from another data source node
- Compare the performance of the cross validation tree model and the model using partitioned data
- Compare the typical decision tree model with the CHAID and CART models

Models Diagram for the case in Chapter 1

CV Tree vs. CART-Like Class Probability Tree
[Figure annotations: Better; Worse (overfitting?); $200 more]

Models

CHAID-like

CART-like

CART-like Class Probability

CHAID-like + Validation Data

Decision Tree

CART-Like

CHAID-Like

CART-Like Class Probability

CHAID-Like + Validation Data

Questions
- Why is the model called “CART-like” or “CHAID-like”?
- How do the settings match the features of the CHAID algorithm or the CART algorithm?
- Try fitting a tree using the entropy criterion used in machine learning (e.g. C4.5/5.0) tree algorithms. How does it perform?
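As a starting point for the last question, the entropy criterion can be compared with the Gini index used by CART on the same candidate split. This is a minimal Python sketch with hypothetical class proportions. It shows only how the two impurity measures score a split, not how SAS Enterprise Miner implements them.

```python
from math import log2

def gini(proportions):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    return 1.0 - sum(p * p for p in proportions)

def entropy(proportions):
    """Entropy in bits; classes with zero proportion contribute nothing."""
    return -sum(p * log2(p) for p in proportions if p > 0)

def split_gain(parent, children, impurity):
    """Impurity reduction; children are (share of parent cases, proportions)."""
    return impurity(parent) - sum(share * impurity(props)
                                  for share, props in children)

parent = [0.5, 0.5]                                   # hypothetical node
children = [(0.5, [0.9, 0.1]), (0.5, [0.1, 0.9])]     # hypothetical split

gini_gain = split_gain(parent, children, gini)
entropy_gain = split_gain(parent, children, entropy)
print(round(gini_gain, 3), round(entropy_gain, 3))
```

Both measures are zero for a pure node and largest for an even class mix, so they often rank candidate splits similarly. Differences in the fitted trees tend to come from splits where one branch is small and nearly pure.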
