
1 Lecture Notes 6: Ensembles of Trees. Zhangxi Lin, ISQS 7342-001, Texas Tech University. Note: most slides are from Decision Tree Modeling by SAS.

2 Chapter 5: Ensembles of Trees 5.1 Forests 5.2 Bagged Tree Models

3 Chapter 5: Ensembles of Trees 5.1 Forests 5.2 Bagged Tree Models

4 Instability [Figure; labels: Accuracy = 80%, one reversal.]

5 The Challenge. Decision trees are unstable models: small changes in the training data can cause large changes in the topology of the tree, even though the overall performance of the tree remains stable (Breiman et al. 1984). The instability results from the large number of univariate splits considered and from the fragmentation of the data. At each split there are typically several splits, on the same or on different inputs, that give similar performance (competitor splits), so a small change in the data can easily result in a different split being chosen. This in turn produces different subsets in the child nodes, where the changes in the data are even larger, and the changes cascade down the tree.

6 Competitor Splits [Figure: logworth of candidate splits plotted over the input range (min to max) for inputs X1 and X2.]

7 Overcoming the Instability. Perturb and combine (P & C) methods generate multiple models by manipulating the distribution of the data or altering the construction method, and then averaging the results (Breiman 1998). Any unstable modeling method can be used, but trees are most often chosen because of their speed and flexibility. Some perturbation methods (the first three are sketched below):
–resample
–subsample
–add noise
–adaptively reweight
–randomly choose from the competitor splits
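As a concrete illustration (not from the original notes), here is a minimal NumPy sketch of the first three perturbation methods; the data array X and all variable names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))   # hypothetical training inputs, 6 cases
n = len(X)

# Resample: a bootstrap sample of size n, drawn with replacement
X_resampled = X[rng.integers(0, n, size=n)]

# Subsample: a random 50% of the cases, drawn without replacement
X_subsampled = X[rng.choice(n, size=n // 2, replace=False)]

# Add noise: jitter each input value by a small random amount
X_noisy = X + rng.normal(scale=0.1, size=X.shape)
```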

8 Perturb

9 Ensemble Model. An ensemble model is a combination of multiple models. The combinations can be formed by (each rule is sketched below):
–voting on the classifications
–weighted voting, where some models have more weight
–averaging (weighted or unweighted) the predicted values
The attractiveness of P & C methods is their improved performance due to variance reduction: if the base models have low bias and high variance, then averaging decreases the variance. Combining stable models, however, can negatively affect performance. The reasons why adaptive P & C methods work go beyond simple variance reduction and are a topic of current research. Graphical explanations show that ensembles of trees have decision boundaries of much finer resolution.
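A minimal sketch of the three combination rules, using invented posterior probabilities from three hypothetical base models (none of these numbers come from the notes):

```python
import numpy as np

# Posterior probabilities of the target class from three base models,
# for four scored cases (values invented for illustration)
posteriors = np.array([[0.9, 0.2, 0.6, 0.4],
                       [0.7, 0.4, 0.3, 0.6],
                       [0.8, 0.3, 0.7, 0.3]])
votes = (posteriors >= 0.5).astype(int)   # each model's 0/1 classification

# Voting on the classifications (plurality vote)
majority = (votes.mean(axis=0) >= 0.5).astype(int)

# Weighted voting, where some models have more weight
weights = np.array([0.5, 0.2, 0.3])
weighted = (weights @ votes >= 0.5).astype(int)

# Unweighted averaging of the predicted values
averaged = posteriors.mean(axis=0)
```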

10 Combine [Figure: decision boundaries of three trees T1, T2, T3 and their combination; ave(T1, T2, T3) = Truth.]

11 Bagging (freq = number of times each case appears in bootstrap sample k)

case      1    2    3    4    5    6
k=1 freq  1    0    2    0    2    1
k=2 freq  0    1    0    2    1    …
k=3 freq  3    1    0    2    0    …
k=4 freq  1    2    0    1    …
…

12 Bagging (bootstrap aggregation). The original P & C method (Breiman 1996). Steps (a sketch follows below):
–Draw B bootstrap samples. A bootstrap sample is a random sample of size n drawn with replacement from the empirical distribution of a sample of size n.
–Build a tree on each bootstrap sample.
–Vote or average. For classification problems, take the mean of the posterior probabilities or the plurality vote of the predicted class; for regression problems, take the mean of the predicted values.
For example, Breiman (1996) used 50 bootstrap replicates for classification and 25 for regression and for averaging the posterior probabilities; Bauer and Kohavi (1999) used 25 replicates for both voting and averaging.
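A minimal scikit-learn sketch of these steps, assuming a binary target; the dataset is a synthetic stand-in and all names are mine, not from the SAS program:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X, y = make_classification(n_samples=500, random_state=1)  # stand-in data
n, B = len(X), 50                     # 50 replicates, as in Breiman (1996)

posteriors = np.zeros((B, n))
for b in range(B):
    idx = rng.integers(0, n, size=n)             # bootstrap sample of size n
    tree = DecisionTreeClassifier(random_state=b).fit(X[idx], y[idx])
    posteriors[b] = tree.predict_proba(X)[:, 1]  # score the original data

# Combine: mean of the posterior probabilities (a plurality vote of the
# predicted classes would also work for classification)
bagged_class = (posteriors.mean(axis=0) >= 0.5).astype(int)
```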

13 Arc-x4 (ARC: Adaptive Resampling and Combining)

case      1     2     3     4     5     6
k=1 freq  1     1     1     1     1     1
    m     1     0     1     0     0     0
k=2 freq  1.5   .75   1.5   .75   .75   .75
    m     1     0     2     1     0     0
k=3 freq  .5    .25   4.25  .5    .25   .25
    m     2     0     3     1     0     1
k=4 freq  .97   .06   4.69  .11   .06   .11
…

(freq = expected sampling frequency of each case for tree k; m = number of times the case has been misclassified by the trees built so far)
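The frequencies in the table follow Breiman's arc-x4 reweighting rule, in which case i is resampled with probability proportional to 1 + m(i)^4. A small sketch (the function name is mine) that reproduces one row:

```python
import numpy as np

def arc_x4_freq(m):
    """Expected sampling frequencies under arc-x4: case i is drawn with
    probability proportional to 1 + m[i]**4, where m[i] counts how often
    case i has been misclassified by the trees built so far."""
    w = 1.0 + np.asarray(m, dtype=float) ** 4
    return len(w) * w / w.sum()

# The m row after the second tree yields the k=3 frequency row above
print(np.round(arc_x4_freq([1, 0, 2, 1, 0, 0]), 2))
# [0.5  0.25 4.25 0.5  0.25 0.25]
```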

14 Single, Bagged, and Boosted Tree

15 Chapter 5: Ensembles of Trees 5.1 Forests 5.2 Bagged Tree Models

16 Bagged Tree Models. The current version of SAS Enterprise Miner does not support bagging or boosting directly, but they can be implemented in a Code node using SAS programming and the ARBORETUM procedure. The program overfits 25 trees, each to a different 50% random sample of the training data; validation data is not used to tune the models. The training, validation, and test sets are scored from within PROC ARBORETUM, the model predictions are averaged, and profit is calculated for each set.
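The PROC ARBORETUM code itself is not shown in the notes, so the sketch below translates the scheme into Python under stated assumptions: 25 deliberately overfit scikit-learn trees, each grown on a 50% sample drawn without replacement, with the posteriors averaged; all names are hypothetical. Profit would then be computed from the averaged predictions for each of the training, validation, and test sets using the project's profit matrix (not shown here).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagged_predictions(X_train, y_train, X_score,
                       n_trees=25, frac=0.5, seed=0):
    """Average the posteriors of n_trees deliberately overfit trees, each
    grown on a different 50% random sample of the training data, with no
    validation-based tuning."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.zeros((n_trees, len(X_score)))
    for b in range(n_trees):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        tree = DecisionTreeClassifier(max_depth=None,  # let each tree overfit
                                      random_state=b)
        tree.fit(X_train[idx], y_train[idx])
        preds[b] = tree.predict_proba(X_score)[:, 1]
    return preds.mean(axis=0)    # averaged model predictions
```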

