Lecture Notes 5: Auxiliary Uses of Trees
Zhangxi Lin, ISQS 7342-001, Texas Tech University
Note: Most slides are from Decision Tree Modeling by SAS.


Chapter 4: Auxiliary Uses of Trees
4.1 Auxiliary Uses of Trees
4.2 Selecting Important Inputs
4.3 Collapsing Levels
4.4 Auxiliary Uses of Trees
4.5 Interactive Training

Auxiliary Uses of Trees
- Data exploration
- Selecting important inputs
- Collapsing levels
- Discretizing interval inputs
- Interaction detection
- Regression imputation


Data Exploration
- Initial data analysis (IDA) gives a preliminary impression of the predictive power of the data.
- Decision trees are well suited for exploratory data analysis. It is often useful to build several tree varieties using different settings.
- Trees are used as auxiliary tools in building more familiar models such as logistic regression.
- Standard regression models are constrained to be linear and additive (on the link scale). They require more data preparation steps, such as missing value imputation and dummy-coding nominal variables, and their statistical and computational efficiency can be badly degraded by the presence of many irrelevant and redundant input variables.
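
As a rough illustration of this kind of exploratory tree outside Enterprise Miner, the sketch below uses Python with scikit-learn. The file name insurance.csv and the target column TARGET are hypothetical placeholders for the course's INSURANCE data; note that scikit-learn trees, unlike the trees described above, need numeric inputs and (in older versions) complete data, so nominal inputs are dummy-coded and numeric gaps are median-filled first.

    # Minimal sketch: a shallow decision tree as an initial-data-analysis tool.
    # "insurance.csv" and the column name "TARGET" are hypothetical placeholders.
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    df = pd.read_csv("insurance.csv")
    df_filled = df.fillna(df.median(numeric_only=True))   # older scikit-learn rejects NaNs
    X = pd.get_dummies(df_filled.drop(columns="TARGET"))  # and needs numeric inputs
    y = df["TARGET"]

    # A shallow tree gives a quick impression of the predictive power of the data.
    eda_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

    # Rank inputs by impurity-based importance to see what drives the early splits.
    importance = pd.Series(eda_tree.feature_importances_, index=X.columns)
    print(importance.sort_values(ascending=False).head(10))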

IDA/EDA
- Interpretability
- No strict assumptions concerning the functional form of the model
- Resistant to the curse of dimensionality
- Robust to outliers in the input space
- No need to create dummy variables for nominal inputs
- Missing values do not need to be imputed
- Computationally fast (usually)

Modifying the Input Space – the Use of Trees
Dimension reduction:
- Input subset selection
- Collapsing levels of nominal inputs
Dimension enhancement:
- Discretizing interval inputs
- Stratified modeling

Input Selection [diagram: the full set of inputs is passed to a Tree node, which selects an input subset that feeds a Neural Network node]


Demonstration
Data set: INSURANCE
Parameters:
- Partition: 70%, 30%
- Decision Tree (2): # of Surrogate rules = 1
Purpose: observe the effect of surrogate splits on the output variables
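
For completeness, the 70% / 30% partition from this demonstration can be sketched in the same scikit-learn setting, reusing X and y from the exploration sketch above. Only the partition step is shown: scikit-learn trees do not implement surrogate splits, so the "# of Surrogate = 1" setting has no direct counterpart there.

    # Sketch of the 70% / 30% data partition (reusing X and y defined earlier).
    from sklearn.model_selection import train_test_split

    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=0)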

The Diagram

Explore the variable selection feature
In the partial output, only those inputs with Training Importance values greater than 0.05 will be set to Input in the subsequent Neural Network node. The Nodes column indicates the number of times a variable was used to split a node.
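
A rough scikit-learn analogue of this variable-selection step is sketched below, reusing the partition from the previous sketch. The 0.05 cutoff is carried over from the slide for illustration only; scikit-learn's impurity importances are normalized to sum to one, so they are not on the same scale as Enterprise Miner's Training Importance.

    # Sketch: keep only inputs whose tree importance exceeds a cutoff, then pass
    # that reduced input set to a neural network (the role of the Neural Network node).
    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    sel_tree = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X_train, y_train)
    importance = pd.Series(sel_tree.feature_importances_, index=X_train.columns)
    selected = importance[importance > 0.05].index.tolist()   # cutoff taken from the slide
    print("Selected inputs:", selected)

    nnet = make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0))
    nnet.fit(X_train[selected], y_train)
    print("Validation accuracy:", nnet.score(X_valid[selected], y_valid))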

The Output Variables without Using Surrogate Splits

Variable Importance table with # Surrogate Rules = 1
The number of selected inputs is now 15. The Surrogates column indicates the number of times that a variable was used as a surrogate. Notice that Money Market Balance is the fifth most important variable in the tree model, yet it is never used as a primary splitter (NODES = 0).

The Output Variables Using Surrogate Splits

Maximum depth = 10, # of surrogate rules = 0
In this example, the increase in depth resulted in one additional input (11 in total).

Splitting criterion = Gini, Assessment Measure = Average Square Error
Compared to the 11 inputs selected by the classification tree, the class probability tree selects 23. The variable Credit Score will be rejected because it has a training importance value less than 0.05.

Collapsing Levels [diagram: one nominal input with levels a–f; a one-level-deep multiway split collapses the levels into a few groups, yielding one new input]

Collapsing Levels
Nominal inputs can be included in regression models by using binary indicators (dummy variables) for each level. This practice can cause a disastrous increase in dimension when the nominal inputs have many levels (for example, zip codes).
Trees are effective tools for collapsing levels of nominal inputs. If a tree is fitted using only a single nominal input, the leaves represent subsets of the levels. Any type of tree will suffice, but a depth-one tree with a multiway split is easier to interpret visually.
Collapsing levels based on subject-matter considerations is usually better than any data-driven approach. However, there are many situations where the knowledge about potentially important inputs is lacking.


Collapsing Levels
A decision tree is used to convert a nominal variable with too many values into a few groups of values:
- Choose the right variable and deselect the others
- Depth = 1
- Allow multiway splits
- No post-pruning
Use the resulting group as a new input that replaces the original input in the next modeling node, such as Neural Network or Regression (see the sketch below):
- Set the original variable to "Don't use"
- Set the new variable to "Use"
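
A hedged scikit-learn sketch of the same procedure follows, continuing with the df and y objects defined earlier and using the BRANCH variable from the demonstration that comes next. Because scikit-learn trees only make binary splits, BRANCH is first target-encoded by its level-wise mean response (essentially how CART orders nominal levels for a binary target), and a tree with a small number of leaves stands in for the one-level multiway split.

    # Sketch: collapse the many levels of BRANCH into a few groups with a tree.
    # Assumes a 0/1 target y; the leaf id becomes the new, collapsed input.
    from sklearn.tree import DecisionTreeClassifier

    level_rate = y.groupby(df["BRANCH"]).mean()                 # mean response per branch
    encoded = df["BRANCH"].map(level_rate).to_frame("branch_rate")

    grouper = DecisionTreeClassifier(max_leaf_nodes=4, random_state=0).fit(encoded, y)
    df["BRANCH_GROUP"] = grouper.apply(encoded)                 # leaf id = collapsed level

    # Inspect which branches fall in each group; downstream models then use
    # BRANCH_GROUP ("Use") and drop BRANCH ("Don't use").
    print(df.groupby("BRANCH_GROUP")["BRANCH"].unique())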

Demonstration
Data set: INSURANCE
Parameters:
- Set use of all input variables except BRANCH to No
- Maximum number of branches from a node = the number of bank branch levels (19)
- Set the maximum depth to 1
- Use a large p-value: set the significance level to 1.0
- Bonferroni Adjustment = No
- SubTree Method = Largest
- Output Variable Selection = No and Leaf Role = Input
Purpose: show how a tree is used to collapse the levels of a categorical variable

Diagram and Results
Decision Tree (2) generates four branches. Make sure the variable BRANCH is rejected for the Regression node; the variable _NODE_ will be used instead.

Tree Application – Discretizing Interval Inputs [figure: Y plotted against an interval input X; the tree cuts X into bins, inflating the dimension (6 df)]
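
A small scikit-learn sketch of this discretization step: fit a shallow tree on a single interval input and reuse its split thresholds as bin boundaries. The column name AGE is hypothetical, and df and y are reused from the earlier sketches.

    # Sketch: discretize an interval input with a tree; the split thresholds become
    # the cut points. "AGE" is a hypothetical column name, not from the course data.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    binner = DecisionTreeClassifier(max_leaf_nodes=6, random_state=0)
    binner.fit(df[["AGE"]], y)

    # Internal nodes split on feature 0 (the only input); leaf nodes are marked -2.
    cuts = np.sort(binner.tree_.threshold[binner.tree_.feature == 0])
    df["AGE_BIN"] = np.digitize(df["AGE"], cuts)                # new categorical input
    print("Cut points:", cuts)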

Interaction Detection [figure: two panels contrasting an obvious interaction with a subtle interaction]

Interaction Detection
- Trees do not produce anything as simple as an ANOVA table. It is usually quite difficult to determine the strengths of interactions just by looking at a tree diagram.
- Trees might be better described as automatic interaction accommodators.
- Crossover (qualitative) interactions (effect reversals) are somewhat easier to detect by scanning splits on the same input in different regions of the tree.
- Magnitude (quantitative) interactions can be considerably more difficult to detect.
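
The small simulated example below (Python with scikit-learn, not the course data) shows the crossover case: the effect of one input reverses with the level of another, and the printed tree exposes this as splits on the same input in both branches with the predicted class flipping.

    # Sketch: a pure crossover interaction (XOR) on simulated data.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    x1 = rng.integers(0, 2, 2000)
    x2 = rng.integers(0, 2, 2000)
    y_sim = x1 ^ x2                        # the effect of x2 reverses with x1

    xor_tree = DecisionTreeClassifier(max_depth=2).fit(np.column_stack([x1, x2]), y_sim)
    print(export_text(xor_tree, feature_names=["x1", "x2"]))
    # In the printout, the second-level splits are on the same input in both branches,
    # but the majority class flips, which is the signature of an effect reversal.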

Regression Imputation [diagram: inputs x1, x2, x3 with missing values (?) to be filled in]
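
A minimal sketch of tree-based imputation in the same scikit-learn setting: train a regression tree on the complete cases to predict the input that has missing values from the other inputs, then fill in the gaps. The column name INCOME is hypothetical, and the remaining predictors are assumed to be complete.

    # Sketch: impute a numeric input ("INCOME", hypothetical) with a regression tree
    # trained on the complete cases; assumes the other predictors contain no NaNs.
    from sklearn.tree import DecisionTreeRegressor

    missing = df["INCOME"].isna()
    predictors = X.drop(columns=["INCOME"], errors="ignore")    # everything except INCOME

    imputer = DecisionTreeRegressor(max_depth=5, random_state=0)
    imputer.fit(predictors[~missing], df.loc[~missing, "INCOME"])
    df.loc[missing, "INCOME"] = imputer.predict(predictors[missing])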


Demonstration
Data sets: INSURANCE, CUSTOMERS
Purposes: explore auxiliary uses of trees
- Imputation
- Variable selection
- Collapsed levels

Diagram

Model Configuration

Results from the Variable Selection Tree Node
a. Configure the variable selection Tree node to fit a class-probability-like tree (Splitting Rule Criterion = Gini, Maximum Depth = 8, Subtree Assessment Measure = Average Square Error).
b. Run the Tree node and view the Variable Importance list in the Tree Results Output window or in the Variables window in the Interactive Tree Desktop Application view.
Seventeen inputs are selected by the tree.


Interactive Training
- Force and remove inputs
- Define split values
- Manually prune branches and leaves
Demonstration: INSURANCE

Diagram

Manual Split Input Selection

Results from the First Split

The Outcome of the Second-Level Split

Interactive Tree Training – Manual Pruning

Assessment Plot