Biophysical Gradient Modeling
Management Needs
Decision Support Tools
– Baseline Information: vegetation characteristics, forest stand structure, fuel loads
– Predictive Mapping: vegetation maps, fuels maps
Research Questions
– What are the different vegetation types in the Sky Island systems of the Chihuahuan Desert Borderlands?
– Are the local- and landscape-scale abundance and distribution patterns of vegetation related to variation in the biophysical environment and the spectral characteristics of the vegetation?
– Can those species-environment relationships be used in a predictive manner to map vegetation across the landscape?
Species-Environment Relationships in SW North America Niering and Lowe (1984)
Sky Island Forests Sierra Madre Oriental and Occidental Post-Pleistocene refugia High vascular plant diversity and endemism
Integrated Approach
Merge extensive field sampling with image classification of vegetation/fuel characteristics and biophysical gradient modeling.
Photo: Davis Mountain alterna (Lampropeltis alterna)
Vegetation Sampling
600 permanent plots
– Systematic sampling grid
– Captured topographic variability
– Circular, fixed-area plots
Tree attributes
– Species ID, DBH, height, live crown height, spatial location
Topographic Data: Digital Elevation Model
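Analysis
– Species data for each plot (basal area / density)
– Species importance value (IV) = sum of relative basal area (Rel BA) + relative density (Rel Dens)
– Cluster analysis of species IV gives the vegetation types
– CART relates the vegetation types to the topographic data for each plot (species-environment relationships)
– ENVI Decision Tree applies the CART rules to produce vegetation and fuels maps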
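The importance value on this slide can be sketched in a few lines. This is a minimal illustration, assuming basal area is computed from DBH as the area of a circle and that both relative terms are expressed as percentages of the plot total; the function name `importance_values` and the example plot are hypothetical, not data from the study.

```python
import math
from collections import defaultdict

def importance_values(trees):
    """Return {species: IV} for one plot.

    trees: list of (species, dbh_cm) tuples.
    IV = relative basal area (%) + relative density (%), so IVs sum to 200.
    """
    basal_area = defaultdict(float)
    stem_count = defaultdict(int)
    for species, dbh_cm in trees:
        basal_area[species] += math.pi * (dbh_cm / 2.0) ** 2  # cm^2
        stem_count[species] += 1
    total_ba = sum(basal_area.values())
    total_stems = sum(stem_count.values())
    return {
        sp: 100.0 * basal_area[sp] / total_ba
            + 100.0 * stem_count[sp] / total_stems
        for sp in basal_area
    }

# Hypothetical plot with four trees (illustrative values only).
plot = [
    ("pinyon pine", 20.0),
    ("pinyon pine", 10.0),
    ("gray oak", 10.0),
    ("alligator juniper", 10.0),
]
iv = importance_values(plot)
for species, value in sorted(iv.items(), key=lambda kv: -kv[1]):
    print(f"{species}: {value:.1f}")
```

The per-plot IV vectors are the kind of input a cluster analysis would then group into vegetation types.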
9 Dominant Forest Types
– Pinyon Pine Forest
– Oak-Pinyon-Juniper Forest
– Alligator Juniper Forest
– Gray Oak Forest
– Emory Oak Forest
– Cypress-Fir Forest
– Ponderosa-SW White Pine Forest
– Gallery Forest
– Graves Oak Forest
Figure labels:
– Dry sites: high solar radiation, upper topographic positions
– Mesic sites: low solar radiation, valley bottoms
– Tolerant species / good competitors
– Elevation: high / low
CART Basics: How do you parse these data into homogeneous groups?
Classification
Given a collection of records, each record contains a set of attributes; one of the attributes is the class. Find a model for the class attribute as a function of the values of the other attributes.
Development of CART
Leo Breiman pioneered tree-based methods of classification that later became part of machine learning, also known as data mining. He wrote CART: Classification and Regression Trees with Jerome Friedman and Richard Olshen, and also developed Random Forests.
Classification and Regression Trees
A supervised learning algorithm that recursively partitions heterogeneous data into successively more homogeneous subsets using binary splits.
– Non-parametric and non-linear
– Can handle numerical or categorical data
– Results are easy to interpret
– Output can be fed directly into ENVI Decision Tree to classify your image
Steps for Producing a CART Model
1. Determine the vegetation/fuel types using field-generated data or prior knowledge of the site.
2. Extract spectral and landform metric data from imagery and DEMs.
3. Inspect the training data and check for an extremely unbalanced dataset.
4. Grow the CART model to its full size and prune it using the 1-SE rule.
5. Use 10-fold cross-validation and bootstrapping to validate the model accuracy using misclassification % and the Kappa statistic.
6. Code the maps using ENVI Decision Tree and visually assess the “look” of the map.
7. Validate the maps in the field to produce misclassification % and the Kappa statistic.
Impurity of a Node
We need a measure of the impurity of a node to help decide how to split a node, or which node to split.
– The measure should be at a maximum when a node is equally divided among all classes.
– The impurity should be zero if the node is all one class.
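One measure with both properties is the Gini index, 1 minus the sum of squared class proportions in the node. The sketch below is only an illustration of those two properties; the function name `gini` and the class labels are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini index of a node: 1 - sum over classes of (class proportion)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A node containing a single class has zero impurity.
pure = gini(["oak"] * 10)

# Impurity is largest when the node is equally divided among the classes.
even_two = gini(["oak"] * 5 + ["pine"] * 5)                      # 0.5
even_three = gini(["oak"] * 4 + ["pine"] * 4 + ["juniper"] * 4)  # about 0.667

# A mostly-pure node sits in between.
skewed = gini(["oak"] * 9 + ["pine"])                            # about 0.18

print(pure, even_two, round(even_three, 3), round(skewed, 3))
```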
Measures of Impurity
– Misclassification Rate
– Gini Index
In practice the first is not used to choose splits, for the following reasons:
– Situations can occur where no split improves the misclassification rate
– The misclassification rate can be equal when one option is clearly better for the next step
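The second reason can be seen with a small worked example. The class counts below are hypothetical: a parent node with 400 cases in each of two classes and two candidate splits. Both splits give the same misclassification rate, but only the Gini index rewards the split that produces a pure child node.

```python
def gini_from_counts(counts):
    """Gini index of a node given its per-class case counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def misclass_from_counts(counts):
    """Misclassification rate when the node predicts its majority class."""
    n = sum(counts)
    return 1.0 - max(counts) / n

def weighted_impurity(impurity, children):
    """Average child impurity, weighted by the share of cases in each child."""
    total = sum(sum(child) for child in children)
    return sum(sum(child) / total * impurity(child) for child in children)

# Hypothetical splits of a parent node holding (400, 400) cases.
split_a = [(300, 100), (100, 300)]   # two mixed children
split_b = [(200, 400), (200, 0)]     # one mixed child, one pure child

mis_a = weighted_impurity(misclass_from_counts, split_a)   # 0.25
mis_b = weighted_impurity(misclass_from_counts, split_b)   # 0.25
gini_a = weighted_impurity(gini_from_counts, split_a)      # 0.375
gini_b = weighted_impurity(gini_from_counts, split_b)      # about 0.333

print(mis_a, mis_b, gini_a, round(gini_b, 3))
```

Split B leaves one node that needs no further work, which is the sense in which it is "clearly better for the next step".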
Visual Example
Selection of Splits
We select the split that most decreases the Gini index. The search covers every variable that could be split on and every possible split point for that variable. We keep splitting until the terminal nodes have very few cases or are all pure. This is an unsatisfactory answer to the question of when to stop growing the tree, but it was realized that the best approach is to grow a larger tree than required and then to prune it!
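The exhaustive search can be sketched as follows. This is a simplified illustration for numeric variables only, assuming candidate thresholds are midpoints between consecutive observed values; the function names, the two predictors (elevation and a solar radiation index), and the six records are hypothetical. Because the parent impurity is fixed, minimizing the weighted Gini of the two children is the same as maximizing the decrease.

```python
from collections import Counter

def gini(labels):
    """Gini index of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Try every variable and every midpoint threshold.

    Returns (variable_index, threshold, weighted_child_gini) for the best
    binary split, or None if no variable has two distinct values.
    """
    n = len(rows)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(row[j] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2.0
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if best is None or score < best[2]:
                best = (j, threshold, score)
    return best

# Hypothetical training records: (elevation in m, solar radiation index).
rows = [(1600, 0.9), (1700, 0.4), (1800, 0.7),
        (2000, 0.8), (2100, 0.3), (2200, 0.6)]
labels = ["pinyon", "pinyon", "pinyon",
          "ponderosa", "ponderosa", "ponderosa"]

variable, threshold, score = best_split(rows, labels)
print(variable, threshold, score)
```

A full tree would apply `best_split` recursively to the left and right subsets until the nodes are pure or very small.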
Pruning the Tree I
The best method of arriving at a suitable size for the tree is to grow an overly complex one and then prune it back. The pruning is based on the misclassification rate. However, the error rate on the training data will always drop (or at least not increase) with every split. This does not mean that the error rate on test data will improve.
Pruning the Tree II
The solution to this problem is cross-validation. One version of the method carries out a 10-fold cross-validation: the data are divided at random into 10 subsets of equal size, the tree is grown with one subset left out, and its performance is assessed on the subset that was left out. This is done for each of the 10 subsets, and the average performance is then assessed.
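The 10-fold procedure can be sketched independently of the tree itself. In this minimal illustration the names `k_fold_indices` and `cross_validate` are hypothetical, the model is passed in as `fit` and `predict` callables, and a majority-class predictor stands in for the tree so the example stays short; it assumes at least as many records as folds.

```python
import random
from collections import Counter

def k_fold_indices(n, k=10, seed=0):
    """Shuffle the record indices and deal them into k near-equal subsets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(rows, labels, fit, predict, k=10, seed=0):
    """Average misclassification rate over k held-out subsets."""
    errors = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train_rows = [r for i, r in enumerate(rows) if i not in held]
        train_labels = [y for i, y in enumerate(labels) if i not in held]
        model = fit(train_rows, train_labels)       # grown without the subset
        wrong = sum(predict(model, rows[i]) != labels[i] for i in held_out)
        errors.append(wrong / len(held_out))        # assessed on the subset
    return sum(errors) / len(errors)

# Stand-in model (an assumption for brevity): always predict the majority class.
def fit_majority(train_rows, train_labels):
    return Counter(train_labels).most_common(1)[0][0]

def predict_majority(model, row):
    return model

# Hypothetical data: 100 records, 70 of class "a" and 30 of class "b".
rows = [(i,) for i in range(100)]
labels = ["a"] * 70 + ["b"] * 30

cv_error = cross_validate(rows, labels, fit_majority, predict_majority)
print(round(cv_error, 3))
```

Replacing the stand-in with a function that grows and prunes a tree gives the cross-validated error used to choose the tree size.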
Advantages and Disadvantages
Advantages
– Handles data with any structure
– Robust to outliers
– Machine learning: little input from analyst
– Final results can be summarized in logical if-then conditions
Disadvantages
– Knowing when to stop splitting
– Does not use combinations of variables
– Computations are complex in determining best split conditions
…back to Mapping fuels and Vegetation in the Chihuahuan Desert Borderlands
Misclassification = 29.1% Kappa = 0.57
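Both figures are derived from a confusion matrix of mapped versus reference classes. The sketch below shows the arithmetic on a hypothetical two-class matrix; the counts are illustrative only and are not the study's validation data, and the function name is an assumption.

```python
def misclassification_and_kappa(matrix):
    """Return (misclassification rate, Cohen's kappa) for a square
    confusion matrix where matrix[i][j] counts cases of reference
    class i that were assigned to class j."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the row and column totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return 1.0 - observed, kappa

# Hypothetical confusion matrix (NOT the study's results).
confusion = [
    [40, 10],
    [5, 45],
]
error_rate, kappa = misclassification_and_kappa(confusion)
print(f"Misclassification = {100 * error_rate:.1f}%  Kappa = {kappa:.2f}")
```

Kappa discounts the agreement expected by chance, which is why it is reported alongside the raw misclassification percentage.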
Once map is generated… perform field validation