The Nationwide Forest Imputation Study (NaFIS): Challenges, results and recommendations from the western United States. Matt Gregory (1), Emilie Grossmann (2), Janet Ohmann (3), Heather Roberts (1). (1) Forest Ecosystems and Society, Oregon State University; (2) Institute for Natural Resources, Oregon State University; (3) PNW Research Station, USDA Forest Service
The Genesis of NaFIS. Loose affiliation of researchers from USFS and universities, split into eastern and western teams. Core objective: assess the utility of nearest-neighbors mapping as a basis for nationwide resource estimation; landscape scenario/ecosystem modeling; forest threats assessment and forest health monitoring. Develop tools and software to aid mapping. Funding from FHTET, WWETAC and FIA.
Why Use Nearest Neighbor Techniques in Forest Mapping? Spatially explicit forest assessments for simulation modeling, e.g. studies that require tree lists for growth and yield modeling or multivariate maps for habitat capability modeling. Small area estimation for national level inventories. Role of forest inventories is expanding from answering only “How much?” to also answering “Where?” (McRoberts and Tomppo, 2007).
NaFIS Pilot Areas
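NaFIS West Pilot Areas: Oregon (7), Montana (19), Colorado (28). [Table with one column per pilot area and rows for plot count, year range, total area (mi. ha) and forest percentage estimate (marked *); cell values were not preserved apart from the year fragments 2001 and 2007.]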
Plot Database: NaFIS data and methodology concepts. [Figure: FIA annual plot data supply the response variables (species matrix, Y matrix), with columns ID, Species 1 BA, Species 2 BA, Species 3 BA. Geospatial datasets (topography, climate, Landsat TM) supply the environmental variables (covariates, X matrix), with columns ID, ANNPRE, TM4, DEM.]
Design Choices for Nearest Neighbors Mapping. (1) Distance metric used to determine neighbor plots: Euclidean (EUC), CCorA (MSN), CCA (GNN), Random Forest NN (RFNN). (2) Number of neighbors (k) used in prediction. (3) With k>1, weighting of neighbor distances: none, inverse distance, inverse squared distance.
Distance metrics – Euclidean (k=1). [Figure: plots from the environment matrix (X) and species matrix (Y), labeled by plot number, shown in geographic space and in gradient/feature space with axes X1 and X2.]
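The Euclidean case with k=1 can be sketched in a few lines. This is only an illustration, assuming covariates have already been standardized; the plot IDs, covariate values, species values and function names below are hypothetical and do not come from the NaFIS software.

```python
import math

def euclidean_nearest_plot(pixel_x, plots_x):
    """Return the ID of the reference plot closest to a pixel in feature space.

    pixel_x: list of covariate values for one map pixel
    plots_x: dict mapping plot ID -> list of covariate values (same order)
    """
    return min(plots_x, key=lambda pid: math.dist(pixel_x, plots_x[pid]))

def impute(pixel_x, plots_x, plots_y):
    """With k=1 the pixel inherits the species (Y) row of its nearest plot."""
    return plots_y[euclidean_nearest_plot(pixel_x, plots_x)]

# Hypothetical standardized covariates (X matrix) and basal areas (Y matrix).
plots_x = {1: [0.2, 1.1], 2: [1.5, -0.3], 3: [-0.8, 0.4]}
plots_y = {1: {"PSME": 30.0}, 2: {"PIPO": 12.5}, 3: {"TSHE": 22.0}}

result = impute([1.2, 0.0], plots_x, plots_y)
print(result)
```

Because the whole Y row is carried over from a single plot, every attribute of the pixel comes from one real observation.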
Distance metrics – MSN, GNN (k=1). [Figure: direct ordination (CCorA for MSN, CCA for GNN) of the environment matrix (X) and species matrix (Y); plots, labeled by plot number, shown in geographic space and in gradient/feature space with axes LC 1 and LC 2.]
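For MSN and GNN, the neighbor search happens in the space of ordination axes rather than in the raw covariates. The sketch below assumes the ordination has already been fitted elsewhere and only shows the distance step. The coefficient matrix and axis weights are hypothetical, and weighting the axes (for example by eigenvalue) is an assumption, not something stated on the slide.

```python
import math

def project(x, coefficients):
    """Map a covariate vector onto ordination axes (one coefficient row per axis)."""
    return [sum(c * v for c, v in zip(row, x)) for row in coefficients]

def ordination_distance(x_a, x_b, coefficients, axis_weights):
    """Weighted Euclidean distance between two covariate vectors in axis space."""
    a = project(x_a, coefficients)
    b = project(x_b, coefficients)
    return math.sqrt(sum(w * (p - q) ** 2 for w, p, q in zip(axis_weights, a, b)))

# Hypothetical output of a fitted ordination: 2 axes over 3 covariates.
coefficients = [[0.5, -0.2, 0.1], [0.0, 0.3, 0.4]]
axis_weights = [0.6, 0.2]  # assumed weights, so that earlier axes count more

d = ordination_distance([1.0, 2.0, 0.0], [0.0, 2.0, 1.0], coefficients, axis_weights)
print(round(d, 4))
```

Once distances are defined this way, neighbor selection proceeds exactly as in the Euclidean case.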
Distance metrics – RFNN (k=1). [Figure labels: geographic space; gradient/feature space (marked "?"); random forest trees; environment matrix (X); species matrix (Y).]
Distance metrics – RFNN (k=1). [Figure: simple classification tree for dominant species. Split variables: August maximum temp, summer mean temp, season temperature difference (thresholds not preserved); elevation < 1244; elevation < 1625; TM Band 5 < 24. Leaf labels: PSME, TSHE, PSME, THPL, ABAM, TSME, PSME, PIPO. Annotations: high elevation (> 1244), high August temperature (> 23.24°C), high reflectance in TM Band 5 (> 24).]
Distance metrics – RFNN (k=1). Random Forest: a “forest” of classification trees. Each tree is built from a random subset of plots and variables. Distance = number of trees minus number of times a plot was picked.
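The distance rule above can be written down directly once each observation has a terminal-node (leaf) ID per tree. The sketch reads "picked" as "the plot lands in the same leaf as the pixel", which is an interpretation rather than something the slide spells out. The leaf IDs, plot IDs and function names are hypothetical, and fitting the forest itself is left out.

```python
def rf_distance(leaves_a, leaves_b):
    """Number of trees minus the number of trees where both share a leaf.

    leaves_a, leaves_b: terminal-node IDs for two observations, one per tree.
    """
    if len(leaves_a) != len(leaves_b):
        raise ValueError("both observations need one leaf ID per tree")
    shared = sum(1 for a, b in zip(leaves_a, leaves_b) if a == b)
    return len(leaves_a) - shared

def rf_nearest_plot(pixel_leaves, plot_leaves):
    """Return the plot ID with the smallest random forest distance to the pixel."""
    return min(plot_leaves, key=lambda pid: rf_distance(pixel_leaves, plot_leaves[pid]))

# Hypothetical leaf IDs from a 5-tree forest.
plot_leaves = {101: [3, 7, 2, 9, 4], 102: [3, 1, 2, 9, 8], 103: [5, 1, 6, 0, 8]}
pixel_leaves = [3, 1, 2, 0, 8]

nearest = rf_nearest_plot(pixel_leaves, plot_leaves)
print(nearest, rf_distance(pixel_leaves, plot_leaves[nearest]))
```

A distance of zero means the pixel and the plot fall in the same leaf of every tree.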
Values of k. [Figure: k=5 neighbors shown in geographic space and in gradient/feature space (Axis 1, Axis 2); the prediction is the (weighted) average value of the attribute.]
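For k>1, the prediction step can be sketched as a weighted mean over the k nearest plots, using the three weighting options listed under the design choices. The function name, the small `eps` guard and the example values are assumptions made for illustration only.

```python
def knn_predict(distances, values, k, weighting="none"):
    """Predict an attribute as the (weighted) mean of the k nearest plots.

    distances: dict plot ID -> distance to the pixel in feature space
    values:    dict plot ID -> attribute value observed on that plot
    weighting: "none", "inverse" (1/d) or "inverse_squared" (1/d^2)
    """
    nearest = sorted(distances, key=distances.get)[:k]
    eps = 1e-9  # assumed guard against division by zero for exact matches
    if weighting == "none":
        weights = [1.0] * len(nearest)
    elif weighting == "inverse":
        weights = [1.0 / (distances[p] + eps) for p in nearest]
    elif weighting == "inverse_squared":
        weights = [1.0 / (distances[p] ** 2 + eps) for p in nearest]
    else:
        raise ValueError(f"unknown weighting: {weighting}")
    return sum(w * values[p] for w, p in zip(weights, nearest)) / sum(weights)

# Hypothetical distances and basal areas for four candidate plots.
distances = {"a": 1.0, "b": 2.0, "c": 4.0, "d": 8.0}
basal_area = {"a": 40.0, "b": 20.0, "c": 10.0, "d": 0.0}

unweighted = knn_predict(distances, basal_area, k=3)
inverse = knn_predict(distances, basal_area, k=3, weighting="inverse")
print(unweighted, inverse)
```

Inverse weighting pulls the prediction toward the closest plot, while the unweighted mean treats all k neighbors equally.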
Nearest Neighbor Map Examples. [Figure panels: color composite of Landsat TM 4|5|3; quad. mean diameter of trees >= 3cm (low to high); basal area of trees >= 100cm (low to high); presence of Thuja plicata (absent/present).]
Map Assessment Protocols. McRoberts (2009): tailored for nearest neighbors mapping; homoscedasticity, RMSE, bias, outlier determination, mapped extrapolations, reference set distribution in feature space, maintenance of covariance. Grossmann et al. (2009): community composition dissimilarity metrics (Bray-Curtis, binomial); diversity measures (Shannon-Weaver, beta); determination of unrealistic species assemblages. Riemann et al. (2010): diagnostics tailored for any continuous geospatial data; useful across many spatial scales.
Accuracy Assessment. Local (plot/pixel) scale: normalized RMSE; categorical kappa statistics; individual species kappa statistics; dissimilarity metrics; species richness; unlikely species co-occurrence. Regional (whole map) scale: area comparison of design-based (plots) vs. model-based (map) estimates.
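Two of the local-scale measures can be sketched briefly. The slides do not say how RMSE is normalized, so dividing by the observed mean is an assumption here. The kappa function is the standard Cohen's kappa, and the example values are hypothetical.

```python
import math
from collections import Counter

def normalized_rmse(observed, predicted):
    """RMSE divided by the observed mean (this normalization is an assumption)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)

def cohens_kappa(observed, predicted):
    """Agreement between observed and predicted classes, corrected for chance.

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(observed)
    p_agree = sum(o == p for o, p in zip(observed, predicted)) / n
    obs_counts, pred_counts = Counter(observed), Counter(predicted)
    p_chance = sum(obs_counts[c] * pred_counts[c] for c in obs_counts) / (n * n)
    return (p_agree - p_chance) / (1 - p_chance)

# Hypothetical plot observations vs. values predicted at the plot locations.
nrmse = normalized_rmse([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0])
kappa = cohens_kappa(["DF", "DF", "PP", "PP"], ["DF", "PP", "PP", "PP"])
print(round(nrmse, 4), kappa)
```

Normalizing RMSE makes attributes with different units, such as basal area and volume, comparable on one chart.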
Accuracy Assessment – Distance metric (from Oregon models with k=1 neighbor). Normalized RMSE: BAA_GE_3 = basal area per hectare of trees >= 2.5 cm; BAA_GE_100 = basal area per hectare of trees >= 100 cm; QMDA_GE_3 = quadratic mean diameter of trees >= 2.5 cm; QMDA_GE_13 = quadratic mean diameter of trees >= 12.5 cm; VPH_GE_3 = volume per hectare of trees >= 2.5 cm. Forest type kappa statistics: FOR_TYPE_AN = forest type as determined by FIA; FOR_TYPE_GR = forest type group as determined by FIA.
Accuracy Assessment – Distance metric (from Oregon models with k=1 neighbor). [Figure panels: species presence-absence kappa for five most common species; species richness; Bray-Curtis dissimilarity; binomial dissimilarity.]
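As a rough illustration of the community-level measures, the sketch below computes Bray-Curtis dissimilarity and species richness for one plot, comparing observed and predicted abundances. The species values are hypothetical, and the binomial dissimilarity is not covered.

```python
def bray_curtis(observed, predicted):
    """Bray-Curtis dissimilarity: sum of |a - b| over sum of (a + b), in [0, 1]."""
    species = set(observed) | set(predicted)
    diff = sum(abs(observed.get(s, 0.0) - predicted.get(s, 0.0)) for s in species)
    total = sum(observed.values()) + sum(predicted.values())
    return diff / total if total else 0.0

def richness(abundances):
    """Number of species with a positive abundance."""
    return sum(1 for v in abundances.values() if v > 0)

# Hypothetical basal area by species on one plot (observed) and its prediction.
observed = {"PSME": 30.0, "TSHE": 10.0}
predicted = {"PSME": 20.0, "THPL": 10.0}

bc = bray_curtis(observed, predicted)
print(round(bc, 4), richness(observed), richness(predicted))
```

A value of 0 means identical composition and 1 means no shared species. Richness alone cannot show that the species differ, which is why both measures are reported.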
Accuracy Assessment – Distance metric (from Oregon models with k=1 neighbor). [Figure: area comparison of design-based (plots) vs. model-based (map) estimates.]
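The regional comparison can be sketched as two area tallies by class. The plot-based estimate below assumes an equal-probability sample in which each plot represents the same share of the total area. That is a simplification of a real design-based estimator. The class labels, pixel size and counts are hypothetical.

```python
from collections import Counter

def plot_based_area(plot_classes, total_area_ha):
    """Area per class, assuming each plot represents an equal share of the area."""
    counts = Counter(plot_classes)
    n = len(plot_classes)
    return {c: total_area_ha * k / n for c, k in counts.items()}

def map_based_area(pixel_classes, pixel_area_ha):
    """Area per class from counting mapped pixels."""
    return {c: k * pixel_area_ha for c, k in Counter(pixel_classes).items()}

# Hypothetical data: 4 plots and 10,000 pixels of 0.09 ha (30 m x 30 m).
plots = ["low", "low", "high", "high"]
pixels = ["low"] * 6000 + ["high"] * 4000

design = plot_based_area(plots, total_area_ha=900.0)
model = map_based_area(pixels, pixel_area_ha=0.09)
difference = {c: model.get(c, 0.0) - design.get(c, 0.0) for c in design}
print(design, model, difference)
```

Large differences for a class suggest the map is shifting area between classes relative to what the plots support.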
Spatial pattern – Distance metric. [Figure: maps for EUC, MSN, GNN and RFNN of quad. mean diameter of trees >= 3cm (low to high), basal area per ha. of trees >= 100cm (low to high), and Thuja plicata presence (absent/present).]
Accuracy Assessment – Values of k (from Oregon RFNN models). Normalized RMSE: BAA_GE_3 = basal area per hectare of trees >= 2.5 cm; BAA_GE_100 = basal area per hectare of trees >= 100 cm; QMDA_GE_3 = quadratic mean diameter of trees >= 2.5 cm; QMDA_GE_13 = quadratic mean diameter of trees >= 12.5 cm; VPH_GE_3 = volume per hectare of trees >= 2.5 cm. Forest type kappa statistics: FOR_TYPE_AN = forest type as determined by FIA; FOR_TYPE_GR = forest type group as determined by FIA.
Accuracy Assessment – Values of k (from Oregon RFNN models). [Figure panels: species presence-absence kappa for five most common species; species richness; Bray-Curtis dissimilarity; binomial dissimilarity.]
Accuracy Assessment – Values of k (from Oregon RFNN models). [Figure: area comparison of design-based (plots) vs. model-based (map) estimates.]
Accuracy Assessment – Values of k (from Oregon RFNN models). [Figure panels: errors of species omission; errors of species commission; areal extent of common species.]
Spatial pattern – Values of k (from Oregon RFNN models). [Figure: percent overlap of unlikely co-occurring species, Tsuga heterophylla and Pinus ponderosa, mapped for k = 1, k = 5, k = 10 and k = 20. Legend: nonforest; both species absent; Tsuga heterophylla; Pinus ponderosa; both species present.]
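The overlap statistic can be sketched as a simple count over forest pixels. The pixel representation (a dict of predicted abundance by species, with `None` marking nonforest) and the example values are assumptions for illustration.

```python
def percent_overlap(pixels, species_a, species_b):
    """Percent of forest pixels in which both species are predicted present."""
    forest = [p for p in pixels if p is not None]  # None marks nonforest
    both = sum(
        1 for p in forest
        if p.get(species_a, 0) > 0 and p.get(species_b, 0) > 0
    )
    return 100.0 * both / len(forest) if forest else 0.0

# Hypothetical predicted basal area by species for five pixels.
pixels = [
    {"TSHE": 12.0},
    {"PIPO": 8.0},
    {"TSHE": 3.0, "PIPO": 1.0},  # unlikely combination
    None,                        # nonforest
    {},                          # forest, both species absent
]

overlap = percent_overlap(pixels, "TSHE", "PIPO")
print(overlap)
```

Averaging species abundances over more neighbors makes it more likely that a pixel receives a small nonzero amount of both species, so this percentage can be expected to rise with k.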
Spatial pattern – Values of k. [Figure: maps for k = 1, k = 5, k = 10 and k = 20 of quad. mean diameter of trees >= 3cm (low to high), basal area per ha. of trees >= 100cm (low to high), and Thuja plicata presence (absent/present).]
Key Findings - Accuracy Assessment. Accuracy varied little across distance metrics, although RFNN was slightly better with categorical variables (such as forest type or forest type group). Accuracy varied substantially across values of k: RMSE and forest type kappa improve with higher k; area distributions and species community metrics degrade with higher k. New assessment protocols will help guide users on appropriate uses of nearest neighbors maps.
The “k conundrum”. The need for structural attribute accuracy must be weighed against the need for reasonable forest community composition. Possible approaches: (1) two-step modeling, where candidate neighbors must come from appropriate composition classes (McRoberts, 2009); (2) hierarchical nearest neighbor modeling, i.e. iterative neighbor finding based on spatial patterning grains.
NaFIS implementation challenges. Consistency/currency of plot data (greatly eased with FIA annual design). Mapping nonforest areas (some preliminary products have been developed). Currency of mapped information: how best to account for disturbance. Incorporating emerging science into a production mapping environment.
For more information. NaFIS products and software. NaFIS west final report: track me down for PDF. NaFIS collaborators (in alphabetical order): Jerry Beatty (WWETAC), Ken Brewer (formerly RSAC), Mark Finco (RSAC), Andy Finley (MSU), Matt Gregory (OSU), Emilie Grossmann (OSU), Ron McRoberts (NRS), Janet Ohmann (PNWRS), Brian Roberts (MSU), Heather Roberts (OSU), Frank Sapio (FHTET), and Eric Smith (FHTET).
Accuracy Assessment – MRLC Regions (from RFNN models with k=1 neighbor). Normalized RMSE: BAA_GE_3 = basal area per hectare of trees >= 2.5 cm; BAA_GE_100 = basal area per hectare of trees >= 100 cm; QMDA_GE_3 = quadratic mean diameter of trees >= 2.5 cm; QMDA_GE_13 = quadratic mean diameter of trees >= 12.5 cm; VPH_GE_3 = volume per hectare of trees >= 2.5 cm. Forest type kappa statistics: FOR_TYPE_AN = forest type as determined by FIA; FOR_TYPE_GR = forest type group as determined by FIA.
Accuracy Assessment – MRLC Regions (from RFNN models with k=1 neighbor). [Figure panels: species presence-absence kappa for five most common species; species richness; Bray-Curtis distance; binary distance.]