Insight – extracting conceptually appealing information from data.
Exposition – displaying the decision tree results in a form that communicates insight and informs policy and planning.
Tell a story.
Conceptual model: operationalize the conceptual model; develop the story in context; identify key relationships and the story plot; create testable hypotheses.
Top-down decision tree creation: select branches that conform to the model, even when they have lower logworth than other branches; include non-significant branches that reflect the conceptual model; test hypotheses.
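The top-down selection rule above can be sketched in a few lines. This is a minimal illustration, not a tool's actual procedure: it assumes the common definition of logworth as -log10 of a split's p-value, and the candidate branches, their p-values, and the `fits_model` flag are all hypothetical.

```python
import math

def logworth(p_value: float) -> float:
    """Logworth of a split, assumed here to be -log10(p-value)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    return -math.log10(p_value)

# Hypothetical candidate branches: (name, p-value, fits_model).
candidates = [
    ("region", 0.0001, False),  # strongest statistically, not in the conceptual model
    ("tenure", 0.004, True),    # weaker, but conforms to the conceptual model
    ("channel", 0.20, True),    # non-significant, but conforms to the model
]

def choose_split(candidates):
    """Prefer branches that conform to the conceptual model, even at lower logworth.

    Falls back to the full candidate list if no branch conforms.
    """
    conforming = [c for c in candidates if c[2]]
    pool = conforming or candidates
    return max(pool, key=lambda c: logworth(c[1]))

chosen = choose_split(candidates)
```

Here `chosen` is the "tenure" branch, although "region" has the higher logworth; a purely data-driven search would have picked "region" instead.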
Based on the underlying conceptual model: Ishikawa diagram (fishbone diagram); determine likely relevant dimensions from data; test hypotheses.
Exposition needs interpretation. Prediction does not need to tell a story; it needs to predict future values accurately, reproducibly, and reliably.
Sample design to gain knowledge of the environment. Data efficacy and operational measures – data that relates to known or likely factors predicting the target. True measures.
The challenge – identifying strong predictors; matching predictors with range; combinations of predictors. Approach – Bonferroni adjustment and validation or cross-validation.
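The two safeguards named above can be sketched as follows. This is an illustrative outline only: it assumes the plain Bonferroni correction (multiply each p-value by the number of candidates tested, capped at 1) and a simple interleaved k-fold partition. The function names and the example p-values are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Plain Bonferroni correction: multiply each p-value by the number of tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k interleaved folds over n rows."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# Hypothetical p-values for three candidate predictors.
adjusted = bonferroni_adjust([0.01, 0.04, 0.5])

# Each row appears in exactly one test fold.
splits = list(k_fold_indices(6, 3))
```

With three candidates, a raw p-value of 0.04 becomes 0.12 after adjustment, so a predictor that looked significant on its own no longer clears a 0.05 threshold. The folds then give an independent check of whether the surviving predictors hold up on data not used to pick them.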
Stand-in variables. Create composites (principal components, factor scores, or reduction measures). More data. Best fit is the right size, but what is the right size?
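As a rough illustration of a composite, the sketch below builds an equal-weight average of standardized variables. This is a simple reduction measure, deliberately swapped in for principal components or factor scores, which would instead weight the variables by estimated loadings. The function names and the input columns are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a column to mean 0 and (population) standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - mu) / sigma for v in values]

def composite(columns):
    """Equal-weight composite: the row-wise mean of the standardized columns."""
    standardized = [z_scores(col) for col in columns]
    return [mean(row) for row in zip(*standardized)]

# Two hypothetical stand-in variables measured on very different scales.
scores = composite([[1, 2, 3], [10, 20, 30]])
```

Standardizing first keeps the larger-scale variable from dominating the composite, so a single score can stand in for several related measures as one candidate split variable.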
Multiway splits: use as many partitions as distinct values. Binary splits: divide values into two subsets; need to find the optimal partitioning.
In theory, multiway splits are no more flexible than binary splits. Multiway splits often give more interpretable trees because split variables tend to be used fewer times. Many prefer binary splits because an exhaustive search is more feasible.
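To make the search cost concrete, here is a minimal sketch of an exhaustive binary-split search on one ordered predictor. It assumes 0/1 class labels and uses Gini impurity as the split criterion, which is one common choice and not necessarily the one intended above. The helper that counts binary partitions of an unordered (nominal) variable uses the standard count 2^(L-1) - 1 for L levels. All names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)

def best_binary_split(xs, ys):
    """Try every midpoint between sorted distinct values; return (threshold, impurity)."""
    distinct = sorted(set(xs))
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Weighted impurity of the two child nodes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[1]:
            best = (t, score)
    return best

def n_binary_partitions_nominal(levels):
    """Number of distinct two-subset partitions of an unordered variable."""
    return 2 ** (levels - 1) - 1

threshold, impurity = best_binary_split([1, 2, 3, 4], [0, 0, 1, 1])
```

An ordered predictor with L distinct values has only L - 1 candidate binary splits, and even a nominal one has a count that is easy to enumerate for small L. The number of possible multiway partitions grows much faster, which is why an exhaustive search is more practical for binary splits.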