Published by Adrian Patrick. Modified over 9 years ago.
1
BOF Trees Visualization, Zagreb, June 12, 2004
“BOF” Trees Diagram as a Visual Way to Improve Interpretability of Tree Ensembles
Vesna Luzar-Stiffler, Ph.D., University Computing Centre and CAIR Research Centre, Zagreb, Croatia
Charles Stiffler, Ph.D., CAIR Research Centre, Zagreb, Croatia
vluzar@srce.hr, charles.stiffler@cair-center.hr
2
Outline
  Introduction/Background
  Trees
  Ensemble Trees
  Visualization Tools
  Simulation Results
  Web Survey Results
  Conclusions/Recommendations
3
Introduction / Background
Classification / Decision Trees
  Data mining (statistical learning) method for classification
  Invented twice:
    Statistical community: Breiman, Friedman et al. (1984)
    Machine learning community: Quinlan (1986)
  Many positive features: interpretability, ability to handle data of mixed type and missing values, robustness to outliers, etc.
  Disadvantages: unstable vis-à-vis seemingly minor data perturbations; low predictive power
4
Introduction / Background
Possible improvements: ensembles
  Bagging, i.e., bootstrapping trees (Breiman, 1996)
  Boosting, e.g., AdaBoost (Freund & Schapire, 1997)
  Random Forests (Breiman, 2001)
  Stacking, randomized trees, etc.
Advantage: improved prediction
Disadvantage: loss of interpretability (“black box”)
5
Classification Tree
Let $\hat{f}(x)$ be the classification tree prediction at input $x$ obtained from the full “training” data $Z = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$.
6
Bagging Classification Tree
Let $\hat{f}^{*b}(x)$ be the classification tree prediction at input $x$ obtained from the bootstrap sample $Z^{*b}$, $b = 1, 2, \ldots, B$.
Bagging estimate: $\hat{f}_{\mathrm{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}^{*b}(x)$
[Figure: bootstrap trees 1, 2, …, B]
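The bagging idea on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the base learner is a one-split tree (stump) on a single scalar predictor, the ensemble prediction is a majority vote over the bootstrap trees, and the names `fit_stump` and `bagged_classifier` are hypothetical.

```python
import random
from collections import Counter

def fit_stump(data):
    """Fit a one-split tree (stump) on (x, y) pairs with scalar x and 0/1 labels."""
    best = None
    xs = sorted({x for x, _ in data})
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in data if x <= t]
        right = [y for x, y in data if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    if best is None:
        # All x values identical: predict the majority class everywhere.
        label = Counter(y for _, y in data).most_common(1)[0][0]
        return lambda x: label
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def bagged_classifier(data, B=25, seed=0):
    """Fit B stumps on bootstrap samples and combine them by majority vote."""
    rng = random.Random(seed)
    trees = []
    for _ in range(B):
        # Bootstrap sample Z*b: draw N cases with replacement from Z.
        sample = [rng.choice(data) for _ in data]
        trees.append(fit_stump(sample))

    def predict(x):
        votes = Counter(tree(x) for tree in trees)
        return votes.most_common(1)[0][0]

    return predict
```

Calling `bagged_classifier(Z)` on a list `Z` of `(x, y)` pairs returns a function that predicts the class for a new `x`. A real application would grow full trees on all predictors; the stump keeps the sketch short.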
7
Visualization tools
Graphs based on the predictor “importances” $(B \times p)$ matrix $F$ ($p$ = number of predictors)
For bagged trees, we take the average over the $B$ trees: $\bar{F}_j = \frac{1}{B} \sum_{b=1}^{B} F_{bj}$, $j = 1, \ldots, p$
  Diagram 1: importance mean bar chart
  Diagram 2 (“BOF Clusters”): cluster means chart (NEW)
  Diagram 3 (“BOF MDPREF”): multidimensional preference bi-plot (NEW)
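As a small worked example of the averaging step behind Diagram 1, the sketch below computes the column means of an importance matrix with one row per bootstrap tree and one column per predictor. The numbers in `F` are made up for illustration, and `mean_importance` is a hypothetical helper name.

```python
def mean_importance(F):
    """Column means of a B x p importance matrix F (one row per bootstrap tree)."""
    B = len(F)
    p = len(F[0])
    return [sum(row[j] for row in F) / B for j in range(p)]

# Hypothetical importances for B = 3 trees and p = 3 predictors.
F = [
    [1.0, 0.5, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]
f_bar = mean_importance(F)  # one mean importance per predictor
```

The resulting vector `f_bar` is what a mean importance bar chart would display, one bar per predictor.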
8
Visualization tools
Graphs based on the proximity $(n \times n)$ matrix $P$ ($n$ = number of cases)
  Diagram 4 (“Proximity Clusters”): cluster means chart (Breiman, 2002)
  Diagram 5 (“Proximity MDS”): multidimensional scaling plot of “similar” cases (Breiman, 2002)
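The slide does not define the proximity matrix. A common definition, used with random forests, is the fraction of trees in which two cases fall into the same terminal node; the sketch below assumes that definition. The input layout `leaf_ids[b][i]` (terminal-node id of case i in tree b) and the function name are assumptions made for this example.

```python
def proximity_matrix(leaf_ids):
    """Proximity of cases i and j = share of trees placing them in the same leaf.

    leaf_ids[b][i] is the terminal-node id of case i in tree b.
    """
    B = len(leaf_ids)
    n = len(leaf_ids[0])
    counts = [[0] * n for _ in range(n)]
    for leaves in leaf_ids:
        for i in range(n):
            for j in range(n):
                if leaves[i] == leaves[j]:
                    counts[i][j] += 1
    # Divide the co-occurrence counts by the number of trees.
    return [[counts[i][j] / B for j in range(n)] for i in range(n)]
```

A dissimilarity such as `1 - P[i][j]` could then be passed to a clustering or multidimensional scaling routine to obtain plots in the spirit of Diagrams 4 and 5.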
9
Simulation experiments
S1: Generate a sample of size $n = 30$, two classes, and $p = 5$ variables ($x_1$ to $x_5$), with a standard normal distribution and pair-wise correlation 0.95. The responses are generated according to $\Pr(Y = 1 \mid x_1 \le 0.5) = 0.2$, $\Pr(Y = 1 \mid x_1 > 0.5) = 0.8$.
S2: Generate a sample of size $n = 30$, two classes, and $p = 5$ variables ($x_1$ to $x_5$), with a standard normal distribution and pair-wise correlation 0.95 between $x_1$ and $x_2$, and 0 among other predictors. The responses are generated according to $\Pr(Y = 1 \mid x_1 \le 0.5) = 0.2$, $\Pr(Y = 1 \mid x_1 > 0.5) = 0.8$.
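One way to generate data matching S1 and S2 is sketched below. The slides do not say how the correlated predictors were produced; this sketch assumes a shared-factor construction, where each correlated variable is sqrt(rho) times a common standard normal plus sqrt(1 - rho) times independent noise, which gives unit variance and pair-wise correlation rho. The function name `simulate` and its parameters are hypothetical.

```python
import math
import random

def simulate(n=30, p=5, rho=0.95, n_correlated=None, seed=0):
    """Draw n cases with p standard normal predictors and a binary response.

    The first n_correlated predictors share pair-wise correlation rho
    (all p of them if n_correlated is None); the rest are independent.
    """
    rng = random.Random(seed)
    k = p if n_correlated is None else n_correlated
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor for the correlated block
        row = []
        for j in range(p):
            noise = rng.gauss(0.0, 1.0)
            if j < k:
                row.append(math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * noise)
            else:
                row.append(noise)
        # Pr(Y=1 | x1 <= 0.5) = 0.2 and Pr(Y=1 | x1 > 0.5) = 0.8
        prob = 0.8 if row[0] > 0.5 else 0.2
        y.append(1 if rng.random() < prob else 0)
        X.append(row)
    return X, y

X1, y1 = simulate()                # S1: all five predictors pair-wise correlated
X2, y2 = simulate(n_correlated=2)  # S2: only x1 and x2 correlated
```

With only 30 cases, sample correlations will vary noticeably around the target values from one seed to the next.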
10
Diagram 1, mean importance (panels: S1, S2)
11
Diagram 2, “BOF Clusters” (panels: S1, S2)
12
Diagram 3, “BOF MDPREF” (panels: S1, S2)
13
Diagram 4, “Proximity Clusters” (panels: S1, S2)
14
Web Survey data
ICT infrastructure/usage in Croatian primary and secondary schools
  25,000+ teachers (cases)
  200+ variables
  Response: “classroom use of a computer by educators” (yes/no)
Partition
  50% training
  25% validation
  25% test
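The 50/25/25 partition can be illustrated as follows. The slide does not state how cases were assigned to the three sets; this sketch assumes a simple random split, and `partition` is a hypothetical helper name.

```python
import random

def partition(cases, seed=0):
    """Randomly split cases into 50% training, 25% validation, 25% test."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n // 2
    n_valid = n // 4
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test
```

When the number of cases is not divisible by four, the leftover cases end up in the test set.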
15
Initial tree (before bagging)
16
Diagram 1, “Mean importance”
17
Diagram 2, “BOF Clusters”
18
Diagram 3, “BOF MDPREF”
19
Bootstrap tree 11
20
Bootstrap tree 22
21
Bootstrap tree 12
22
Clustering trees
23
Diagram 5, “Proximity MDS”
24
Conclusions / Recommendations
  Software packages exist for trees.
  Some software packages exist for tree ensembles.
  Some visualization tools exist (old and new).
  The problem is that they are not “interfaced” (integrated).