Classification and Regression Trees (CART)
An Introduction
Acknowledgements: Some slides and content are borrowed from Mike Coan (USGS), who taught a CART workshop at UW several years ago.
CART – What is it?
Classification and Regression Trees (CARTs) are used to classify imagery using spectral and ancillary data of differing types.
- Non-parametric (independent of the statistical distribution of the training data)
- Can use continuous and non-continuous predictor variables
- Can model continuous (regression trees) or categorical (classification trees) target variables
- Statistically selects the most useful data from a set of spectral and ancillary data
- Generates classification rules that can be interpreted (by humans) and evaluated
- Is computationally rapid and can provide high quality classification results
What does a decision tree model look like?
D-tree output syntax (See5; the numbers in parentheses are the training cases reaching that leaf and, after the slash, how many of them are misclassified):

elev <= 1622:
:...asp <= 2: 81 (62/1)
:   asp > 2:
:   :...asp <= 9: 41 (12/1)
:       asp > 9: 81 (15)
elev > 1622:
:...slp > 10: 41 (34)
    slp <= 10:
    :...pidx > 64: 41 (15)
        pidx <= 64:
        :...slp <= 1: 81 (37/13)
            slp > 1:
            :...elev > 1885: 41 (42/3)
                elev <= 1885:
                :...asp <= 12:
                    :...slp <= 9: 41 (75/24)
                    :   slp > 9: 81 (2)

Comparable pseudo-code syntax:

if elev <= 1622:
    if asp <= 2 then landcover = 81
    otherwise:
        if asp <= 9 then landcover = 41
        otherwise landcover = 81
otherwise (elev > 1622):
    if slp > 10 then landcover = 41
    otherwise:
        if pidx > 64 then landcover = 41
        otherwise ... (and so on)
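For readers who want to reproduce a rule listing like the one above without See5, here is a minimal sketch using scikit-learn's CART implementation (an analogous tool, not the software used in these slides). The predictor names (elev, slp, asp, pidx) match the example, but the training table is a made-up placeholder.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["elev", "slp", "asp", "pidx"]

# X: one row per training point, columns in feature_names order (placeholder values)
X = np.array([[1577, 15,  7, 66],
              [1499, 19,  5, 50],
              [1890, 12,  3, 40],
              [1920,  8, 11, 70]])
# y: land-cover code for each training point (placeholder values)
y = np.array([81, 81, 41, 41])

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=feature_names))   # indented rule listing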
General description of the algorithm
Decision trees find logical ways to divide multi-dimensional feature space.
[Figure: training points in a two-dimensional feature space (axes x and y) divided by splits]
CART procedure
1. Collect and process inputs
   - Training data
   - Predictor variables (spectral + ancillary data)
2. Build and run the CART model
   - Tree building
   - Cross-validation
   - Pruning
   - Boosting and bagging
3. Translate to spatial domain
Required/Optional Inputs
Training data / Validation data (the same data set, split by the user or by the software)
Spectral data
- Satellite imagery
- Aerial photography
- Derived spectral inputs (e.g., NDVI, texture)
- Temporal inputs
Ancillary data
- Terrain data, particularly useful for land cover classes
- Any other continuous or nominal (categorical) data that might be predictive (e.g., soil types, geology, climate, etc.)
Example of inputs for a CART
Sample data for mapping in Southern Utah:
- Reflectance (TM bands 1, 2, 3, 4, 5, 7 for leaf-on, leaf-off, and spring dates)
- Tasseled Cap (wetness, greenness, brightness for the same dates)
- Topographic derivatives (aspect, DEM, position index, slope)
- Thermal (TM band 6 for each date)
- Date mosaics (how each date mosaic was assembled)
- Training data
CART Training Data
CART algorithms are data hungry and require lots (hundreds to thousands) of training pixels (points).
- Some are used to “train” the model (build the tree).
- Some are reserved (either in a separate file or using cross-validation internally) to validate the model.
- The number of training pixels should be roughly equal for each class, because CART is sensitive to the size of training sets.
- Training data should represent all types and the variation within types, just as for any supervised classification.
Training Data – point locations corresponding to particular pixels
Unbiased? Should they be?
Extracting Training Variables
CART software often operates in a non-spatial domain, on text files that contain the (x, y) coordinates of each training site with the attributes of all predictor variables in the same row of a table. Use GIS or image-processing software to extract the appropriate values for each training “point”, as in the sketch below.
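A minimal sketch of that extraction step in Python, assuming the rasterio library, a hypothetical multiband predictor raster (predictors_stack.tif), and made-up point coordinates; the Erdas workflow shown on the next slide accomplishes the same thing.

import csv
import rasterio

# (x, y, land-cover label) for each training point; coordinates are made up
points = [(431250.0, 4172310.0, 81),
          (432480.0, 4171800.0, 41)]

with rasterio.open("predictors_stack.tif") as src:          # hypothetical predictor stack
    coords = [(x, y) for x, y, _ in points]
    with open("training.data", "w", newline="") as out:
        writer = csv.writer(out)
        # src.sample() yields the band values at each coordinate, in point order
        for (x, y, label), values in zip(points, src.sample(coords)):
            writer.writerow(list(values) + [label])          # predictors first, class last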
Extract the Spectral Values of the Training Points: Manage Data > Convert pixels to ASCII (Erdas 2013)
Example of input text file
The order of the variables in the *.data file must match the order of the variables in the *.names file!

Example *.names file:

lc.                      | to be classified
elev:  continuous.
slp:   continuous.
asp:   discrete 17.
pidx:  continuous.
lform: 0,1,2,3,4,5,6.
lc:    0,41,52,81.

Example *.data file (one training point per row, variables in the same order):

1577,15,7,66,4,81
1499,19,5,50,4,81
1485,20,1,0,1,81
1507,10,10,50,4,81
1534,10,1,50,4,81
1548,1,1,50,4,81
1562,0,1,50,1,81
1542,13,17,33,1,81

Discrete data can be categorical or not, but can only take certain values.
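If the modeling is done outside See5, the same table can be loaded with the column order declared in the *.names file. A minimal sketch with pandas, assuming the *.data rows above are saved as training.data:

import pandas as pd

# Column order must match the *.names file
columns = ["elev", "slp", "asp", "pidx", "lform", "lc"]
data = pd.read_csv("training.data", header=None, names=columns)

X = data[["elev", "slp", "asp", "pidx", "lform"]]   # predictors (discrete codes kept as integers)
y = data["lc"]                                       # land-cover class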
How does CART build a tree?
Decision trees can be univariate or multivariate.
- Univariate trees have splits based on a single variable (parallel to the feature-space axes).
- Multivariate trees have splits based on more than one predictor simultaneously (not parallel to the feature-space axes).
Most remote sensing implementations use univariate trees.
[Figures: univariate splitting (axis-parallel boundaries) vs. multivariate splitting (oblique boundaries)]
How does CART build a tree (cont.)?
CARTs use binary recursive splitting: at each node in the tree, the remaining data (from the training points) are split into two groups that have maximum dissimilarity.
- At each node, the predictor variable and threshold are chosen to either maximize dissimilarity or minimize similarity between the two child nodes (different statistical ways of measuring the same thing).
- Thresholds are chosen based on the data distributions of the training data (see the sketch below).
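As a concrete illustration of threshold selection, here is a minimal sketch that scores every candidate threshold on a single variable by the weighted Gini impurity of the two resulting groups and keeps the best one. The variable names and toy values echo the earlier tree example; See5 itself uses its own (information-based) splitting criterion rather than Gini.

import numpy as np

def gini(labels):
    # Gini impurity of a set of class labels (0 = pure node)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(x, y):
    # Exhaustively search split points on one predictor, keep the purest split
    best_t, best_score = None, np.inf
    for t in np.unique(x)[:-1]:                       # candidate thresholds
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

elev = np.array([1499, 1534, 1577, 1650, 1885, 1920])   # toy elevations
lc   = np.array([  81,   81,   81,   41,   41,   41])   # toy land-cover codes
print(best_threshold(elev, lc))                          # (1577, 0.0): a perfect split here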
Shows spectral confusion: some spectral clusters contain multiple informational classes, while other informational classes are split among several spectral classes.
Output decision tree translated to spatial domain = map!
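Translating the tree to the spatial domain amounts to running the fitted model over every pixel of the predictor stack. A minimal sketch in Python, assuming rasterio, a hypothetical stack file (predictors_stack.tif) whose bands match the model's predictors, and a placeholder fit so the sketch stands alone:

import numpy as np
import rasterio
from sklearn.tree import DecisionTreeClassifier

with rasterio.open("predictors_stack.tif") as src:   # hypothetical predictor stack
    stack = src.read()                                # array of shape (bands, rows, cols)
    profile = src.profile

bands, rows, cols = stack.shape
pixels = stack.reshape(bands, -1).T                   # one row of predictors per pixel

# In practice `tree` is the model fitted on the real training table;
# a placeholder fit on random data keeps this sketch self-contained.
rng = np.random.default_rng(0)
tree = DecisionTreeClassifier().fit(rng.normal(size=(50, bands)),
                                    rng.choice([41, 81], size=50))

classes = tree.predict(pixels).reshape(rows, cols)    # class code for every pixel

profile.update(count=1, dtype="int16")
with rasterio.open("landcover_map.tif", "w", **profile) as dst:
    dst.write(classes.astype("int16"), 1)             # the map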
Regression tree – continuous output classes
For estimating continuous variables such as percent canopy cover or height, regression trees use different statistical methods to choose thresholds than classification trees use for predicting categories (e.g., reducing the variance within nodes rather than class impurity). A sketch follows.
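A minimal sketch of a regression tree using scikit-learn's DecisionTreeRegressor, with made-up NDVI and canopy-cover values; each leaf predicts a value of the continuous target rather than a class.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up training data: NDVI as the single predictor, percent canopy cover as the target
ndvi  = np.array([[0.12], [0.25], [0.40], [0.55], [0.68], [0.80]])
cover = np.array([  5.0,   15.0,   35.0,   60.0,   80.0,   95.0])

reg = DecisionTreeRegressor(min_samples_leaf=2).fit(ndvi, cover)
print(reg.predict([[0.50]]))    # predicted canopy cover for NDVI = 0.50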
Alternatives for estimating a continuous variable
Physically based models (e.g., Li and Strahler 1992)
- Too complex to be inverted. (Li and Strahler modeled forest canopy closure using BRDF, but many factors besides BRDF are important and are not included in the model, so the model cannot be reliably inverted to estimate canopy closure from measured reflectance. Inversion means retrieving the model's input parameters from its outputs.)
Spectral mixture models
- The end-members (green vegetation, non-photosynthetic vegetation, soil, etc.) are not directly interpretable as the target variables (for example, vegetation fraction might not be related to a target variable such as vegetation height).
- The assumptions on spectral mixing may not be valid.
Alternatives (cont.)
Empirical models: the results are directly interpretable as the target variables.
- Linear regression: cannot approximate non-linear relationships.
- Regression trees: can approximate complex non-linear relationships.
- Neural nets
Regression tree cartoon
The dependent variable could be canopy closure and the independent variable NDVI, but there may be different relationships on different background soils, for example.
Model validation
There is more than one way to evaluate decision tree quality during tree building:
1) Create a file containing training points withheld from developing the decision tree, to be used exclusively for validation.
2) Use the cross-validation option, which is more realistic than the above and uses all of the training data sequentially (none permanently withheld).
Cross validation
1. Divide the training samples into n equal-sized subsets.
2. Develop a tree model using (n-1) subsets of training points and evaluate the model using the remaining subset.
3. Repeat Step 2 n times, each time using a different subset for evaluation.
4. Average the results of these n tests (see the sketch below).
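A minimal sketch of the same n-fold procedure with scikit-learn (See5 offers this option internally); the predictor table and labels here are random stand-ins.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))             # stand-in predictor table
y = rng.integers(0, 2, size=200)          # stand-in class labels

# 10-fold cross-validation: fit on 9 subsets, score on the held-out subset, repeat, average
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print(scores.mean(), scores.std())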
Pruning the d-tree model
A d-tree model can be overfit: CART can create a “branch” for every single training pixel, essentially modeling the noise in the training data, which is neither realistic nor practical. Two options to control overfitting (sketched below):
- Stop-splitting method: specify the minimum number of pixels or the minimum variance in a node below which the node will no longer be split. Larger values increase the severity of pruning.
- Pruning: build the entire tree and then remove branches that don’t contribute much to accuracy (considered better than stop splitting).
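In scikit-learn terms, stop splitting maps to minimum node-size parameters and pruning maps to cost-complexity pruning; See5 uses its own (pessimistic/error-based) pruning, so the sketch below, on random stand-in data, is an analogy rather than the slides' exact procedure.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                                    # stand-in predictors
y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)  # stand-in labels

# Stop splitting: refuse to split nodes containing fewer than 20 training pixels
stop_tree = DecisionTreeClassifier(min_samples_split=20).fit(X, y)

# Pruning: grow the full tree, then cut back weak branches via cost-complexity pruning
path = DecisionTreeClassifier().cost_complexity_pruning_path(X, y)
pruned = DecisionTreeClassifier(ccp_alpha=path.ccp_alphas[-2]).fit(X, y)

print(stop_tree.get_n_leaves(), pruned.get_n_leaves())           # both smaller than an unconstrained tree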
Pruning Decision Trees
Pruning algorithms
- Reduced error pruning (REP) (Quinlan 1987): uses an independent data set and prunes the tree based on its error in predicting that set.
- Pessimistic error pruning (PEP) (Quinlan 1987): does not require an independent pruning set.
- Error based pruning (EBP) (Quinlan 1993): an improved version of pessimistic error pruning.
- Critical value pruning (CVP) (Mingers 1989).
- Cost complexity pruning (CCP) (Breiman et al. 1984): uses a pruning data set and considers the size of the tree.
Improving trees: Boosting and Bagging
Two strategies have been developed for producing optimal trees:
- Boosting: develops new classification trees based on the results of previous classification trees.
- Bagging: uses subsets of the training data to develop new classification trees.
(Bauer & Kohavi 1999, as cited by Lawrence et al. 2004)
Boosting
Why? It often improves accuracy by 5% or more!
1. Develop an initial tree model using all training pixels and classify them.
2. Assign higher weights to the misclassified pixels and then create a new tree, forcing the algorithm to focus on the difficult training data.
3. Repeat iteratively.
4. All of the developed d-tree models are used to classify new sample points, with the final prediction a weighted vote of the predictions of those models (see the sketch below).
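A minimal sketch of this reweighting scheme using AdaBoost over shallow trees in scikit-learn (See5's boosting is a closely related variant); the data are random stand-ins.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                  # stand-in predictors
y = (X[:, 0] * X[:, 1] > 0).astype(int)        # stand-in labels with a non-linear boundary

# Each round reweights the misclassified samples and fits a new tree;
# the final prediction is a weighted vote of all of the trees.
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                             n_estimators=10).fit(X, y)
print(boosted.score(X, y))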
Boosting (example): the original training set, with equal weights for all training samples.
First round…
Second round…
Third round…
Combine results…
Bagging
Bagging uses random (bootstrap) subsets of the training data to produce many trees; the final classification is based on the agreement (vote) among all of the trees. A sketch follows.
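A minimal sketch of bagging with scikit-learn, again on random stand-in data; each tree sees a random subset of the training data and the predictions are combined by voting (a random forest, mentioned later under software, adds random feature subsetting on top of this).

import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))                  # stand-in predictors
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # stand-in labels

bagged = BaggingClassifier(DecisionTreeClassifier(),
                           n_estimators=25,    # number of bootstrap trees
                           max_samples=0.66    # fraction of the training data per tree
                           ).fit(X, y)
print(bagged.score(X, y))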
How do decision trees stack up against other methods?
- Multivariate trees don’t appear to perform any better than univariate trees and are harder to interpret (Pal and Mather 2003).
- D-trees perform similarly to maximum likelihood classifications given the same data, but d-trees can handle disparate data types and maximum likelihood can’t (Friedl and Brodley 1997).
- D-trees are not recommended for high-dimensionality data (e.g., hyperspectral).
CART Software
CART is a fairly common statistical method, and there are many software packages that can be used:
- S-Plus
- R (free statistical software): random forest, etc.
- See5 (categorical output) and Cubist (continuous output), by Ross Quinlan, RuleQuest, Australia
Converting decision trees to the spatial domain requires some custom interface to a particular geospatial software package:
- An Erdas interface was developed at the USGS, but only for an older version
- ENVI has decision tree capability
- R code, etc.