
All for one or One for All? Mapping many species individually vs. simultaneously with random forest.
Emilie Henderson, Janet Ohmann, Matthew Gregory, Heather Roberts and Harold Zald
August 10, 2012
Ecological Society of America Annual Meeting, Portland, Oregon

Species Distribution Modeling
Has been around for a long time, and has exploded over the last decade.
“With the rise of new powerful statistical techniques and GIS tools, the development of predictive habitat distribution models has rapidly increased in ecology.” – Guisan and Zimmermann 2000
– Generalized Linear/Additive Models
– Neural networks
– Bayesian models
– Ordination
– Classification methods
Web of Knowledge: ‘species distribution’
– 2000–2001: 556 articles
– 2011–2012: 1,389 articles

SDM Uses (from Guisan and Thuiller 2005)

Strategies for community-level modeling
– ‘assemble first, predict later’
– ‘predict first, assemble later’
– ‘assemble and predict together’
-- Ferrier & Guisan 2006
Objective: Compare two strategies for community-level predictive mapping.

[Study area map: You Are Here]
Plot Data
– Forest Inventory and Analysis Annual Plots: 1948 plots
Techniques – Random Forest Based (Breiman 2001, Cutler et al. 2007)
– Binary prediction (R package: randomForest, Liaw & Wiener 2002)
– Continuous prediction
– Nearest Neighbor Imputation (R package: yaImpute, Crookston & Finley 2008)
Spatial Data Layers
– Climate (from PRISM climate data)
– Soil Parent Material (from SSURGO/Soil Resources Inventory)
– Topography (from National Elevation Dataset)
– Spectral reflectance (Landsat)

Binary prediction: # True / # Trees = 4/6 ≈ 0.67
For RF Regression, the predicted value for a pixel is the average of the predictions from all trees (one terminal node per tree).
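The two aggregation rules on this slide can be illustrated with a minimal Python sketch. It is not the authors' workflow (the slides cite the R package randomForest); the function names, the example votes, and the basal-area values are made up for illustration.

```python
from statistics import mean

def vote_fraction(tree_votes):
    """Fraction of trees that vote True (species present) for one pixel."""
    return sum(1 for vote in tree_votes if vote) / len(tree_votes)

def regression_prediction(tree_predictions):
    """Average of the per-tree predictions for one pixel."""
    return mean(tree_predictions)

# Six hypothetical trees, four of which vote "present" (the slide's 4/6 case).
votes = [True, True, True, True, False, False]
fraction = vote_fraction(votes)

# Six hypothetical per-tree basal-area predictions (made-up numbers).
basal_area = regression_prediction([12.0, 8.0, 10.0, 15.0, 0.0, 9.0])

print(round(fraction, 2), basal_area)
```

With these inputs the vote fraction is 4/6 and the regression prediction is the plain average of the six tree values.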

Random forest -- Nearest-Neighbor imputation
Imputation = Filling in missing values from existing values.

Methods: k-NN (“Assemble and Predict Together”)
(1) Place plots within feature space
(2) Place new pixel within feature space
(3) Find nearest-neighbor plot within feature space
(4) Impute nearest neighbor’s Plot ID # to pixel
[Diagram: feature space (axes: Elevation, Rainfall) and geographic space (study area)]
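The four k-NN steps on this slide can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes Euclidean distance on two rescaled features (elevation, rainfall), and the plot IDs and coordinates are made up. The function name `nearest_plots` is hypothetical.

```python
import math

def nearest_plots(pixel, plots, k=1):
    """Return the IDs of the k plots closest to the pixel in feature space."""
    ranked = sorted(plots, key=lambda plot_id: math.dist(pixel, plots[plot_id]))
    return ranked[:k]

# (1) Place plots within feature space: plot ID -> (elevation, rainfall),
#     both rescaled to 0..1 (made-up values).
plots = {1: (0.10, 0.80), 2: (0.45, 0.50), 3: (0.90, 0.20)}

# (2) Place the new pixel within the same feature space.
pixel = (0.50, 0.40)

# (3) Find the nearest-neighbor plot, then (4) impute its plot ID to the pixel.
imputed_id = nearest_plots(pixel, plots, k=1)[0]
print(imputed_id)
```

Because the pixel receives a plot ID rather than a single value, every attribute measured on that plot (all species at once) comes along with it.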

Methods: GNN (Ohmann and Gregory 2002)
(1) Conduct gradient analysis of plot data
(2) Calculate axis scores of pixel from mapped data layers
(3) Find nearest-neighbor plot in gradient space
(4) Impute nearest neighbor’s Plot ID # to pixel
[Diagram: gradient space (CCA Axis 1, e.g., Rainfall, local topography; CCA Axis 2, e.g., Temperature, Elevation) and geographic space (study area)]

Methods: Random Forest Nearest Neighbor Imputation
[Diagram: Random Forest space and geographic space (study area)]

[Figure: random forest nearest-neighbor diagram; numeric plot labels omitted]
Nearest Neighbor Plot: #3
Second Nearest Neighbor: #5
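One way to rank neighbors in random forest space is to count how often a pixel and a plot land in the same terminal node. The sketch below uses the distance described for yaImpute's randomForest method (one minus the proportion of trees in which the two share a terminal node). It is not the package's code, and the node IDs are invented so that plots #3 and #5 come out first and second, mirroring the slide.

```python
def rf_distance(pixel_nodes, plot_nodes):
    """1 - fraction of trees in which pixel and plot share a terminal node."""
    shared = sum(1 for a, b in zip(pixel_nodes, plot_nodes) if a == b)
    return 1.0 - shared / len(pixel_nodes)

def rank_neighbors(pixel_nodes, plots_nodes):
    """Plot IDs ordered from nearest to farthest in random forest space."""
    return sorted(plots_nodes,
                  key=lambda plot_id: rf_distance(pixel_nodes, plots_nodes[plot_id]))

# Terminal node reached in each of four hypothetical trees (made-up IDs).
pixel_nodes = [7, 2, 5, 9]
plots_nodes = {
    3: [7, 2, 5, 1],  # shares a node with the pixel in 3 of 4 trees
    5: [7, 2, 4, 1],  # shares a node in 2 of 4 trees
    8: [6, 3, 4, 1],  # never shares a node
}

ranking = rank_neighbors(pixel_nodes, plots_nodes)
print(ranking)
```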

Strategies for community-level modeling
‘assemble first, predict later’
‘predict first, assemble later’
– Random forest – classification (binary prediction)
– Random forest – regression (continuous prediction)
‘assemble and predict together’
– Random forest – imputation (continuous prediction)
-- Ferrier & Guisan 2006

Dimensions of Map Accuracy
Single-species metrics
– Range: presence/absence
– Abundance: How much basal area?
– Is the distribution of values predicted realistic?
Community-level metrics
– Diversity
– Composition

Sensitivity: True Positives / (True Positives + False Negatives)
Specificity: True Negatives / (True Negatives + False Positives)
True Skill Statistic (TSS): Sensitivity + Specificity - 1
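The three definitions above translate directly into code. The sketch below is illustrative; the function name and the confusion-matrix counts are made up.

```python
def true_skill_statistic(tp, fn, tn, fp):
    """Return (sensitivity, specificity, TSS) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true presences predicted present
    specificity = tn / (tn + fp)  # share of true absences predicted absent
    return sensitivity, specificity, sensitivity + specificity - 1

# Hypothetical counts for one species over 150 validation plots.
sens, spec, tss = true_skill_statistic(tp=40, fn=10, tn=90, fp=10)
print(round(sens, 2), round(spec, 2), round(tss, 2))
```

TSS ranges from -1 to 1; a map that does no better than chance scores near 0.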

Root Mean Square Difference: 17.72 18.46

Root Mean Square Difference: 21.34 18.73
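For reference, root mean square difference between observed and predicted values can be computed as in the sketch below. The basal-area numbers are made up and are unrelated to the values reported on these slides.

```python
import math

def rmsd(observed, predicted):
    """Root mean square difference between paired observed/predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical basal areas at four plots (made-up numbers).
observed = [0.0, 10.0, 20.0, 30.0]
predicted = [2.0, 8.0, 24.0, 26.0]
value = rmsd(observed, predicted)
print(round(value, 3))
```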

Single Species Models
Range
– Random Forest – Binary: best
– Random Forest – Nearest Neighbor: acceptable
– Random Forest – Continuous: fail
Abundance (Basal Area)
– RMSD
  Random Forest – Continuous: best
  Random Forest – Nearest Neighbor: acceptable
  Random Forest – Binary: NA
– Empirical Cumulative Distribution Functions (predicted value distributions)
  Random Forest – Nearest Neighbor: best
  Random Forest – Continuous: fail
  Random Forest – Binary: NA

Diversity: Species Richness and Evenness
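The slide does not say which evenness index was used. As an illustration only, the sketch below computes species richness and Pielou's evenness (Shannon diversity divided by the log of richness), which is an assumption here. The abundances are made up.

```python
import math

def richness(abundances):
    """Number of species with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)

def pielou_evenness(abundances):
    """Pielou's J = H' / ln(S); an assumed index, not confirmed by the slides."""
    present = [a for a in abundances if a > 0]
    if len(present) < 2:
        # Evenness is undefined for fewer than two species;
        # returning 0.0 is an arbitrary convention for this sketch.
        return 0.0
    total = sum(present)
    shannon = -sum((a / total) * math.log(a / total) for a in present)
    return shannon / math.log(len(present))

# Hypothetical basal areas for four species at one plot (one species absent).
plot = [5.0, 5.0, 0.0, 5.0]
print(richness(plot), round(pielou_evenness(plot), 2))
```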

Beta Diversity

[Figure: 5 x 5 window of pixel values]
1 1 3 5 6
1 1 3 5 4
2 2 3 5 4
2 2 3 4 4
2 2 3 4 4
Average Alpha Diversity for Blue Pixel: 3.04
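Reading the 25 digits on this slide as a 5 x 5 window of per-pixel values (an assumption about the figure), the reported average can be checked with a short calculation:

```python
# The 25 digits from the slide, read as five rows of a 5 x 5 pixel window.
window = ["11356", "11354", "22354", "22344", "22344"]

values = [int(ch) for row in window for ch in row]
average = sum(values) / len(values)
print(sum(values), len(values), average)  # 76 25 3.04
```

The digits sum to 76 over 25 pixels, which reproduces the 3.04 shown for the blue pixel.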

[Figure: 5 x 5 window of pixel values]
1 1 3 5 6
1 1 3 5 4
2 2 3 5 4
2 2 3 4 4
2 2 3 4 4

Results – Composition What is the Bray-Curtis distance between our observed and predicted communities?
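Bray-Curtis distance is the summed absolute difference in abundance across species, divided by the summed total abundance of both communities. A minimal sketch follows; the species labels and basal areas are hypothetical.

```python
def bray_curtis(observed, predicted):
    """Bray-Curtis distance between two {species: abundance} dictionaries."""
    species = set(observed) | set(predicted)
    numerator = sum(abs(observed.get(s, 0.0) - predicted.get(s, 0.0)) for s in species)
    denominator = sum(observed.get(s, 0.0) + predicted.get(s, 0.0) for s in species)
    # Two empty communities are treated as identical in this sketch.
    return numerator / denominator if denominator > 0 else 0.0

# Hypothetical basal areas; "sp_a" etc. are placeholder species labels.
observed = {"sp_a": 30.0, "sp_b": 10.0}
predicted = {"sp_a": 20.0, "sp_b": 10.0, "sp_c": 10.0}
distance = bray_curtis(observed, predicted)
print(distance)  # 0.25
```

A distance of 0 means identical composition and abundances; 1 means the two communities share no species. Note that a predicted species that is absent from the observed plot ("sp_c" here) adds to the distance.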

Discussion
Species absences are an important dimension of composition
– Disturbance?
– Succession?
– Competition/Facilitation?
– Dispersal limitations?
Community assembly rules can be used to help refine mapped species lists (e.g., Guisan and Rahbek, 2011).
But… imputation avoids the pitfalls and complications of re-assembling communities after mapping, because the communities are never taken apart.

Conclusions
Practical Considerations:
– Models of individual species may be
  Strongest in one dimension
  Useful for understanding species’ ecology
  The best option for some types of available data (e.g., presence-only data from museum specimens)
– Nearest Neighbor mapping is a useful tool for building multipurpose maps.
  Ranges and abundances
  Composition
  Diversity

Acknowledgements
– Nationwide Forest Imputation Study
– Landscape Ecology, Modeling, Mapping and Analysis team in Corvallis
