CHAPTER 29
Classification and Regression Trees
Dean L. Urban
From: McCune, B. & J. B. Grace. Analysis of Ecological Communities. MjM Software Design, Gleneden Beach, Oregon.
Tables, Figures, and Equations
Table. A matrix matching statistical techniques to various applications that require group classification or discrimination. Applications are discussed in the Introduction, coded here as groups defined on species composition (SPP) or environmental variables (ENV). Techniques are discriminant analysis (DA), group-contrast Mantel test (GC-Mantel), multivariate analysis of variance (MANOVA), nonparametric MANOVA (NPMANOVA), multi-response permutation procedures (MRPP), indicator species analysis (ISA), classification and regression trees (CART), generalized linear models (GLM), and generalized additive models (GAM).

Application                     Appropriate Techniques
Exploratory data analysis:
  1a. Do SPP groups differ?     CART, DA, GC-Mantel, MANOVA, NPMANOVA, MRPP
  1b. On which ENV variable(s)? CART, DA, partial GC-Mantel
  2a. Do ENV groups differ?     ISA, CART, GC-Mantel, MRPP
  2b. On which SPP?             ISA, CART, partial GC-Mantel
  3a. Do habitats differ?       DA, CART, MANOVA, NPMANOVA, MRPP, logistic regression, GLM, GAM, etc.
  3b. On which variable(s)?     CART, DA, partial GC-Mantel, logistic regression, etc.
Predict group membership:
  1c. on SPP                    ISA (with some modification)
  2c. on ENV                    CART, DA, (multinomial) logistic regression
  3c. habitat variables         CART, DA, logistic regression
Table. Indicator Species Analysis for the seven forest types identified via hierarchical clustering. Indicator values (IV) are percentage of perfect fidelity. Indicator values were tested for statistical significance based on 1000 permutations (**, p < 0.001; *, p < 0.005). Sequence = order of groups in data, Identifier = group identifier, Avg = average IV, Max = maximum IV, MaxGrp = group with highest IV.
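As a rough illustration of how an indicator value and its permutation test might be computed, here is a minimal Python sketch. It assumes the commonly used Dufrêne and Legendre formulation (relative abundance times relative frequency, scaled to 100); the caption does not state the formula, so treat that choice as an assumption. The function names and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def indicator_values(abundances, groups):
    """Return {group: IV} for a single species.

    ASSUMPTION: IV = 100 * relative abundance * relative frequency,
    one common formulation; the source caption does not give the formula.
    """
    by_group = defaultdict(list)
    for a, g in zip(abundances, groups):
        by_group[g].append(a)

    # Mean abundance of the species within each group.
    means = {g: sum(v) / len(v) for g, v in by_group.items()}
    total = sum(means.values())

    iv = {}
    for g, v in by_group.items():
        rel_abundance = means[g] / total if total > 0 else 0.0
        rel_frequency = sum(1 for a in v if a > 0) / len(v)
        iv[g] = 100.0 * rel_abundance * rel_frequency
    return iv

def permutation_p_value(abundances, groups, n_perm=1000, seed=0):
    """Share of label shuffles whose maximum IV reaches the observed maximum."""
    rng = random.Random(seed)
    observed = max(indicator_values(abundances, groups).values())
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max(indicator_values(abundances, shuffled).values()) >= observed:
            hits += 1
    # Count the observed arrangement itself, so p is never exactly zero.
    return (1 + hits) / (1 + n_perm)

# Hypothetical toy data: one species, six sample units, two groups.
abund = [5, 3, 4, 0, 0, 0]
labels = ["A", "A", "A", "B", "B", "B"]
iv = indicator_values(abund, labels)
p = permutation_p_value(abund, labels)
print(iv, p)
```

With 1000 permutations the smallest attainable p-value under this convention is 1/1001, which is why a threshold such as p < 0.001 can only just be met.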
Figure (Upper): Classification tree for 7 forest types on 15 environmental variables (function rpart, complexity parameter (cp) = , minsplit = 10, split = information).
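To make the "split = information" and "minsplit" settings concrete, the following Python sketch searches one numeric variable for the threshold that maximizes information gain (reduction in entropy), and declines to split a node holding fewer than minsplit samples. It is an illustration of the idea only, not rpart's implementation, which handles many variables, priors, and surrogate splits. The variable name `elevation` and the forest-type labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of the class labels in a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels, minsplit=10):
    """Return (threshold, gain) for the best binary split on one variable.

    Returns None if the node has fewer than `minsplit` samples or if no
    threshold separates the values.
    """
    n = len(labels)
    if n < minsplit:
        return None
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    parent = entropy(labels)
    best = None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between tied values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = (parent
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if best is None or gain > best[1]:
            best = ((lo + hi) / 2, gain)
    return best

# Hypothetical data: ten plots, two forest types separated by elevation.
elevation = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
forest = ["oak"] * 5 + ["fir"] * 5
split = best_split(elevation, forest, minsplit=10)
print(split)
```

A full tree applies this search recursively to each child node and across all predictors; the complexity parameter then controls how much improvement a split must deliver to be kept.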
Figure (Lower): Pruned classification tree, simplified by stopping the tree at the number of nodes corresponding to the point where the pruning curve crosses the minimum (1 S.E.) line (Fig. 29.2).
Table. Misclassification table for the 7 forest types, based on a pruned CART model with 11 nodes (Fig. 29.3). Rows are actual forest types, columns are predicted forest types. Row totals are indexed as number correct/number misclassified. Total misclassification rate based on jack-knifing is 39/98 (39.8%).
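The layout described in this caption (actual types in rows, predicted types in columns, row totals as correct/misclassified) can be tallied as in the Python sketch below. It only counts predictions that are already in hand; producing jackknifed predictions from a fitted tree is outside its scope. The function name and the toy labels are hypothetical.

```python
from collections import Counter

def misclassification_table(actual, predicted):
    """Cross-tabulate actual (rows) against predicted (columns) classes.

    Returns (table, row_totals, rate), where row_totals[cls] is the pair
    (number correct, number misclassified) and rate is the overall
    misclassification rate.
    """
    classes = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    table = {a: {p: counts[(a, p)] for p in classes} for a in classes}
    row_totals = {}
    for a in classes:
        correct = table[a][a]
        wrong = sum(table[a].values()) - correct
        row_totals[a] = (correct, wrong)
    n_wrong = sum(w for _, w in row_totals.values())
    return table, row_totals, n_wrong / len(actual)

# Hypothetical toy example with two classes.
actual = ["A", "A", "A", "B", "B"]
predicted = ["A", "A", "B", "B", "A"]
table, row_totals, rate = misclassification_table(actual, predicted)
print(table, row_totals, rate)

# Arithmetic check of the rate quoted in the caption: 39 of 98 samples.
caption_rate = round(100 * 39 / 98, 1)
print(caption_rate)
```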
Figure. Cost-complexity pruning curve for the classification tree in Figure . Error bars are estimated from 10 cross-validation subsets of the samples. The horizontal line is one standard error above the minimum error rate. “Inf” = infinite. Relative error is calculated by cross-validation.
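The one-standard-error rule behind the horizontal line can be sketched in Python as follows: find the minimum cross-validated error, add its standard error, and take the smallest tree whose error falls at or below that line. The row layout (cp, number of splits, cross-validated error, standard error) loosely mirrors the complexity table rpart reports, but the numbers here are invented for illustration and are not the values behind the figure.

```python
def one_se_choice(cp_table):
    """Pick the simplest tree within one standard error of the minimum.

    cp_table: list of (cp, n_splits, xerror, xstd) tuples, one per
    candidate tree size.
    """
    best = min(cp_table, key=lambda row: row[2])
    threshold = best[2] + best[3]  # the horizontal "1 S.E." line
    candidates = [row for row in cp_table if row[2] <= threshold]
    return min(candidates, key=lambda row: row[1])

# HYPOTHETICAL values, for illustration only.
cp_table = [
    (0.30, 0, 1.00, 0.05),
    (0.10, 1, 0.80, 0.06),
    (0.05, 3, 0.62, 0.06),
    (0.02, 6, 0.58, 0.06),
    (0.01, 10, 0.60, 0.06),
]
chosen = one_se_choice(cp_table)
print(chosen)
```

In this toy table the minimum error belongs to the tree with 6 splits, but the tree with 3 splits lies below the one-standard-error line and is preferred as the simpler model.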