Classification and Regression Trees

Presentation transcript:

Classification and Regression Trees Classification and regression trees (CART) is a non-parametric technique that produces either classification or regression trees, depending on whether the dependent variable is categorical or numeric, respectively.

Terminology A decision tree consists of nodes and leaves, with each leaf denoting a class. Classes (e.g. tall or short) are the outputs of the tree. Attributes (e.g. gender and height) are the set of features that describe the data. The input data consist of values of the different attributes. Using these attribute values, the decision tree generates a class as the output for each input record. A small illustrative encoding of this terminology is sketched below.
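As an illustration only (not from the slides; the 180 cm and 170 cm thresholds, the nested-dictionary representation and the classify helper are arbitrary assumptions), the tall/short example could be encoded like this:

```python
# A tiny decision tree for the tall/short terminology example,
# represented as nested dictionaries. Thresholds are made up for illustration.
tree = {
    "attribute": "gender",
    "branches": {
        "male":   {"attribute": "height", "threshold": 180,
                   "below": "short", "above": "tall"},
        "female": {"attribute": "height", "threshold": 170,
                   "below": "short", "above": "tall"},
    },
}

def classify(record, node=tree):
    """Follow attribute tests from the root down to a leaf and return its class."""
    if isinstance(node, str):                      # a leaf holds the class label
        return node
    if "branches" in node:                         # categorical attribute test
        return classify(record, node["branches"][record[node["attribute"]]])
    key = "below" if record[node["attribute"]] < node["threshold"] else "above"
    return classify(record, node[key])

print(classify({"gender": "male", "height": 185}))   # -> tall
```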

Advantages CART can cope with any data structure or type. No distributional assumptions are required, and it is invariant under transformations of the variables. No assumption of homogeneity is needed. The explanatory variables can be a mixture of categorical, interval and continuous. The classification has a simple form and can be applied in a GIS. CART uses conditional information effectively, is robust with respect to outliers, and gives an estimate of the misclassification rate. It is especially good for high-dimensional and large data sets, and can produce useful results using only a few important variables.

Disadvantages CART does not use combinations of variables. The tree can be deceptive: if a variable is not included, it may be because it was "masked" by another (similar to regression). Tree structures may be unstable, since a change in the sample may give a different tree. The tree is optimal at each split but may not be globally optimal. An important weakness: it is not based on a probabilistic model, so there is no confidence interval for the predictions.

Features of CART Splits are binary and based on only one variable at a time. The decisions in the process are: selection of the splits (thresholds); deciding when a node is a terminal node (i.e. not to split it any further); and assigning a class to each terminal node.

CART Algorithm Define the problem and important variables. Select a splitting criterion (e.g. likelihood). Initialization: create a tree with one node containing all the training data. Splitting: find the best question for splitting each terminal node, and split the one terminal node that results in the greatest increase in the likelihood. Stopping: if each leaf node contains data samples from the same class, or some pre-set threshold is not satisfied, stop; otherwise, continue splitting. Pruning: use an independent test set or cross-validation to prune the tree. A minimal sketch of the growing step is given below.
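The following is a minimal sketch (not the slides' own code) of the greedy growing loop, using the Gini index from the later slides as the splitting criterion; the names grow and best_split and the max_depth / min_leaf stopping thresholds are illustrative assumptions.

```python
# Greedy CART-style growing: binary splits on a single variable at a time,
# chosen to minimise the weighted Gini impurity of the two children.
import numpy as np

def gini(y):
    """Gini impurity 1 - sum_j p_j^2 of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Try every variable and threshold; return the split that most reduces
    the weighted child impurity, or None if no split improves on the parent."""
    best, parent = None, gini(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or (~left).all():
                continue
            w = left.mean()
            child = w * gini(y[left]) + (1 - w) * gini(y[~left])
            if best is None or child < best[0]:
                best = (child, j, t)
    if best is None or best[0] >= parent:
        return None
    return best[1], best[2]

def grow(X, y, depth=0, max_depth=5, min_leaf=5):
    """Recursively grow the tree; leaves store the majority class."""
    values, counts = np.unique(y, return_counts=True)
    majority = values[np.argmax(counts)]
    if depth == max_depth or len(y) < 2 * min_leaf or len(values) == 1:
        return majority                              # terminal node
    split = best_split(X, y)
    if split is None:
        return majority
    j, t = split
    left = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left": grow(X[left], y[left], depth + 1, max_depth, min_leaf),
            "right": grow(X[~left], y[~left], depth + 1, max_depth, min_leaf)}

# Toy usage: one continuous variable, two classes.
X = np.array([[170.0], [185.0], [160.0], [178.0]])
y = np.array(["short", "tall", "short", "tall"])
print(grow(X, y, min_leaf=1))   # {'feature': 0, 'threshold': 170.0, 'left': 'short', 'right': 'tall'}
```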

Impurity of a Node We need a measure of the impurity of a node to help decide how to split a node, or which node to split. The measure should be at a maximum when a node is equally divided amongst all classes, and the impurity should be zero if the node is all one class.

Measures of Impurity The three usual measures are the misclassification rate; information, or entropy; and the Gini index. In practice the first is not used, for the following reasons: situations can occur where no split improves the misclassification rate, and the misclassification rate can be equal for two splits when one of them is clearly better for the next step.

Problems with Misclassification Rate I [Diagram: a node with two possible splits, the resulting child nodes containing 40 of A, 60 of A, 60 of B and 40 of B.] Neither split improves the misclassification rate, but together they give perfect classification!
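A standard numerical illustration of the second problem above (equal misclassification rates where one split is clearly better) comes from the CART literature; the counts below are an assumption rather than this slide's own figure, but they match the worked table later in the deck.

```python
# Two candidate splits of a node containing 400 of A and 400 of B.
# Split 1 -> children (300 A, 100 B) and (100 A, 300 B);
# split 2 -> children (200 A, 400 B) and (200 A, 0 B).
def misclassified(children):
    # errors if every child predicts its majority class
    return sum(min(a, b) for a, b in children)

def weighted_gini(children):
    total = sum(a + b for a, b in children)
    g = 0.0
    for a, b in children:
        n = a + b
        g += (n / total) * (1.0 - (a / n) ** 2 - (b / n) ** 2)
    return g

split1 = [(300, 100), (100, 300)]
split2 = [(200, 400), (200, 0)]
print(misclassified(split1), misclassified(split2))   # 200 200: same error rate
print(weighted_gini(split1), weighted_gini(split2))   # 0.375 vs 0.3333: Gini prefers split 2
```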

Information If a node has a proportion pj of each of the classes, then the information, or entropy, is i(p) = − Σj pj log pj, where 0 log 0 = 0 and p = (p1, p2, …, pn).

Gini Index This is the most widely used measure of impurity (at least by CART). The Gini index is i(p) = 1 − Σj pj².
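As an illustrative sketch (not part of the slides; the function names are arbitrary), the three impurity measures can be computed directly from a node's class proportions:

```python
# The three impurity measures for a node, given its vector of class proportions.
import numpy as np

def misclassification_rate(p):
    # 1 minus the proportion of the majority class
    return 1.0 - np.max(p)

def entropy(p):
    # -sum_j p_j log2 p_j, with the convention 0 log 0 = 0
    p = np.asarray(p, dtype=float)
    nonzero = p[p > 0]
    return -np.sum(nonzero * np.log2(nonzero))

def gini_index(p):
    # 1 - sum_j p_j^2
    return 1.0 - np.sum(np.asarray(p, dtype=float) ** 2)

p = [0.5, 0.5]                        # a node equally divided between two classes
print(misclassification_rate(p))      # 0.5
print(entropy(p))                     # 1.0 bit
print(gini_index(p))                  # 0.5
```

All three are zero for a pure node and largest when the classes are equally represented.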

Tree Impurity We define the impurity of a tree to be the sum, over all terminal nodes, of the impurity of a node multiplied by the proportion of cases that reach that node of the tree. Example i) Impurity of a tree with one single node, with both A and B having 400 cases, using the Gini index: the proportions of the two classes are 0.5, therefore the Gini index is 1 − (0.5)² − (0.5)² = 0.5.

Tree Impurity Calculations Number of cases: A = 400, B = 400. Proportions: pA = 0.5, pB = 0.5; pA² = 0.25, pB² = 0.25. Gini index = 1 − pA² − pB² = 0.5.

For a node with 300 of A and 100 of B: pA = 0.75, pB = 0.25; pA² = 0.5625, pB² = 0.0625; Gini index = 0.375; contribution to the tree = 0.1875. For a node with 200 of A and 400 of B: pA = 0.33, pB = 0.67; pA² = 0.1111, pB² = 0.4444; Gini index = 0.4444; contribution to the tree = 0.3333. Each contribution is the node's Gini index multiplied by the proportion of the 800 cases that reach the node (400/800 and 600/800 respectively).
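As a quick check on these numbers (a sketch; the 800-case total comes from the 400 A + 400 B root above):

```python
# Node Gini impurity and its contribution to the tree impurity,
# i.e. the node impurity times the proportion of all cases reaching that node.
def gini_from_counts(n_a, n_b):
    total = n_a + n_b
    p_a, p_b = n_a / total, n_b / total
    return 1.0 - p_a ** 2 - p_b ** 2

N = 800                                               # 400 of A + 400 of B overall
for n_a, n_b in [(400, 400), (300, 100), (200, 400)]:
    g = gini_from_counts(n_a, n_b)
    print(n_a, n_b, round(g, 4), round(g * (n_a + n_b) / N, 4))
# (400, 400): Gini 0.5
# (300, 100): Gini 0.375,  contribution 0.1875
# (200, 400): Gini 0.4444, contribution 0.3333
```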

Selection of Splits We select the split that most decreases the Gini index. This is done over all possible places for a split and all possible variables to split on. We keep splitting until the terminal nodes have very few cases or are all pure. This is an unsatisfactory answer to the question of when to stop growing the tree, but it was realized that the best approach is to grow a larger tree than required and then to prune it!

Pruning the Tree As I said earlier, it has been found that the best method of arriving at a suitable size for the tree is to grow an overly complex one and then to prune it back. The pruning is based on the misclassification rate. However, the misclassification rate on the training data will always drop (or at least not increase) with every split. This does not mean, however, that the error rate on test data will improve.

(Figure source: CART by Breiman et al.)

Pruning the Tree The solution to this problem is cross-validation. One version of the method carries out a 10-fold cross-validation: the data are divided at random into 10 subsets of equal size, the tree is grown leaving out one of the subsets, and the performance is assessed on the subset left out of the growing. This is done for each of the 10 subsets, and the average performance is then assessed.
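The slides describe cross-validation-based pruning; a hedged sketch of the same idea, assuming scikit-learn is available and using its cost-complexity pruning (rather than the exact procedure from CART), might look like this:

```python
# Grow a large tree, then pick the pruning level (ccp_alpha) whose pruned tree
# has the best 10-fold cross-validated accuracy.
import numpy as np
from sklearn.datasets import load_iris                 # stand-in data set
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate pruning levels for the fully grown tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# 10-fold cross-validated accuracy of each pruned tree.
scores = [cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                          X, y, cv=10).mean()
          for a in path.ccp_alphas]

best_alpha = path.ccp_alphas[int(np.argmax(scores))]
print(best_alpha)
```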

Results for MRT Model Input for the MRT model can be simulation results.