Variance of Similar Neighbors compared to Random Imputation Nearest Neighbor Conference August 28-30, 2006 Kenneth B. Pierce Jr and Janet L. Ohmann Forestry.

Presentation transcript:

Variance of Similar Neighbors compared to Random Imputation Nearest Neighbor Conference August 28-30, 2006 Kenneth B. Pierce Jr and Janet L. Ohmann Forestry Sciences Lab, PNW Research Station, Corvallis

Project Objectives
Map fuels and vegetation using Gradient Nearest Neighbor (GNN) imputation
Produce maps of plot-level tree attributes as complete coverages
Provide a high degree of analytical flexibility for end-users
Provide robust accuracy assessment
California Sierra (Mediterranean); Coastal Oregon (Maritime); Eastern Washington (Temperate steppe)

Presentation Objectives
Give a brief overview of Gradient Nearest Neighbor (GNN) imputation as a technique
Describe the use of imputation for mapping natural variability
Describe the use of imputation for mapping sampling sufficiency
Examine the variability among nearest neighbors in gradient space versus a random set of neighbors
Examine the change in variability when restricting plot selection to those well represented in gradient space

Major Steps in GNN Imputation mapping:
1) Assembling Data
2) Statistical Modeling (CCA)
3) Imputation/Map Creation
4) Accuracy Assessment
5) Applications and Risk Assessment

Statistical Modeling: Canonical Correspondence Analysis
Multivariate statistical method
– yields a weight for each spatial variable that reflects its relationship with the multiple response variables
Modeling Variables: used as model Y's
– Structure models (BAC, BAH, STPH, CWD)
– Species models
Mapping Variables: retained with plot-map link

Neighbors in Gradient Space
Direct gradient analysis allows assignment of a multi-dimensional location to each predicted pixel
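The idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the GNN software: the function names, the axis weights, and the use of plain Euclidean distance are all assumptions made for the example. Each location is projected onto gradient axes as a weighted sum of its environmental values, and the sample plots are then ranked by distance to the target pixel.

```python
import math

def gradient_location(env_values, axis_weights):
    """Project environmental values onto gradient axes.

    Each axis score is a weighted sum of the environmental values.
    The weights used below are hypothetical, not fitted CCA output.
    """
    return [sum(w * v for w, v in zip(weights, env_values))
            for weights in axis_weights]

def nearest_plots(target, plot_locations, k):
    """Return (distance, plot_id) pairs for the k plots closest to target."""
    distances = [(math.dist(target, loc), plot_id)
                 for plot_id, loc in plot_locations.items()]
    return sorted(distances)[:k]

# Hypothetical weights for two axes over (elevation, precipitation).
axis_weights = [[0.5, 0.0], [0.0, 1.0]]

# Hypothetical sample plots, each placed in gradient space.
plots = {
    "p1": gradient_location([2.0, 1.0], axis_weights),
    "p2": gradient_location([4.0, 3.0], axis_weights),
    "p3": gradient_location([10.0, 8.0], axis_weights),
}

# Target pixel and its two closest plots in gradient space.
pixel = gradient_location([2.0, 2.0], axis_weights)
neighbors = nearest_plots(pixel, plots, k=2)
```

Here `neighbors` lists plot p1 first and p2 second; the distant plot p3 is dropped.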

A Pixel in Plotland (example 0.5 * elevation * precip)

A Pixel in Plotland Sample plot locations in gradient space (example 0.5 * elevation * precip)

A Pixel in Plotland Target Location in Gradient Space Sample plot locations in gradient space (example 0.5 * elevation * precip)

A Pixel in Plotland Five closest neighbors (example 0.5 * elevation * precip)

A Pixel in Plotland Twenty closest neighbors (example 0.5 * elevation * precip)

A Pixel in Plotland Interplot Distances (example 0.5 * elevation * precip)

How far is far in gradient space?

Major Steps in GNN mapping:
1) Data Preparation/Screening
2) Statistical Modeling
3) Imputation/Map Creation
4) Accuracy Assessment
5) Applications and Risk Assessment

Imputing/Assigning plot IDs
Nearest neighbor (single neighbor, retains covariance, MSN-like)
Summary statistic of multiple neighbors (single value, kNN-like)
Etc. (many other variations are possible)
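The two assignment styles named on this slide can be contrasted with a small sketch. The plot table, attribute values, and function names are hypothetical. Copying the single nearest plot keeps all of that plot's attributes together, while averaging over several neighbors returns one summary value per attribute.

```python
from statistics import mean

def impute_single(neighbor_ids, plot_table):
    """Single-neighbor imputation: copy the closest plot's whole record,
    so the relationships among its attributes are preserved."""
    return dict(plot_table[neighbor_ids[0]])

def impute_mean(neighbor_ids, plot_table, attribute, k):
    """Multi-neighbor imputation: average one attribute over the k nearest plots."""
    return mean(plot_table[pid][attribute] for pid in neighbor_ids[:k])

# Hypothetical plot records (values invented for illustration).
plot_table = {
    "p1": {"BA": 30.0, "STPH": 400.0},
    "p2": {"BA": 40.0, "STPH": 500.0},
    "p3": {"BA": 50.0, "STPH": 900.0},
}

# Neighbor IDs for one pixel, ordered from nearest to farthest.
ranked = ["p2", "p1", "p3"]

single = impute_single(ranked, plot_table)
averaged = impute_mean(ranked, plot_table, "BA", k=3)
```

In this toy case `single` is the full record of plot p2, and `averaged` is the mean basal area of the three neighbors.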

Sources of Uncertainty (For Ecological Detectives, Hilborn & Mangel 1997)
Process Uncertainty/Natural Variability
– Uncontrollable (often unmeasurable)
   Natural disturbances
   Demographic stochasticity
   Anthropogenic disturbances
Sampling Uncertainty
– Not entirely uncontrollable
   Limited sampling
   Spatial averaging
   Temporal sample variation

Accuracy assessments: "obsessive transparency"
Map integral (Value of Map)
– Confusion matrices/Kappa (local)
– Correlation statistics (local)
– Regional histograms (regional)
Map explicit (Map of Values)
– Confidence maps (Process)
– Support (Sampling)

Overview of maps
Vegetation map
– the predicted value
Neighbor Count map
– a measure of sampling sufficiency for a specific ecological location
Natural Variability map
– the variability in response at the most similar locations

Natural variability maps
Variability maps are created by calculating the variance for the 5 nearest neighbors at each location (a value other than 5 could certainly be used)
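A minimal sketch of the calculation described on this slide, assuming each pixel already carries an ordered list of its nearest plot IDs. The grid layout, plot values, and function name are invented for illustration, and the use of the sample variance (n - 1 denominator) is an assumption, since the slide does not say which estimator was used.

```python
from statistics import variance

def variability_grid(neighbor_id_grid, plot_values, k=5):
    """For each pixel, compute the variance of an attribute over its k nearest plots.

    Uses the sample variance (n - 1 denominator); this choice is an assumption.
    """
    return [[variance([plot_values[pid] for pid in ids[:k]]) for ids in row]
            for row in neighbor_id_grid]

# Hypothetical basal-area values for ten plots.
plot_values = {
    "a": 10.0, "b": 20.0, "c": 30.0, "d": 40.0, "e": 50.0,
    "f": 30.0, "g": 30.0, "h": 30.0, "i": 30.0, "j": 30.0,
}

# A 1 x 2 grid: each pixel holds the IDs of its five nearest plots.
neighbor_id_grid = [[["a", "b", "c", "d", "e"], ["f", "g", "h", "i", "j"]]]

var_grid = variability_grid(neighbor_id_grid, plot_values)
```

The first pixel, whose neighbors differ widely, gets a high value; the second, whose neighbors are identical, gets zero.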

Sampling sufficiency maps
Centile thresholds are selected from the histogram of interplot distances
Gradient distance grids are retained for the 20 nearest neighbors during imputation
The 20 distance grids are compared to the threshold values and a count grid is created where a value of 20 indicates 20 plots were within the threshold value
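The three steps on this slide can be sketched as follows. The function names, the simple rank-based centile rule, and the use of "less than or equal to" for the comparison are assumptions. For brevity the example keeps 3 distance grids rather than 20.

```python
def centile_threshold(interplot_distances, centile):
    """Distance at or below which roughly `centile` percent of interplot
    distances fall (simple rank-based rule, chosen for illustration)."""
    ordered = sorted(interplot_distances)
    rank = (centile * len(ordered) + 99) // 100  # integer ceiling
    return ordered[max(0, rank - 1)]

def count_grid(distance_grids, threshold):
    """Per pixel, count how many neighbor distance grids are within the threshold."""
    rows, cols = len(distance_grids[0]), len(distance_grids[0][0])
    return [[sum(1 for grid in distance_grids if grid[r][c] <= threshold)
             for c in range(cols)]
            for r in range(rows)]

# Hypothetical interplot distances in gradient space.
interplot = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
threshold = centile_threshold(interplot, 20)

# Three 1 x 2 distance grids (1st, 2nd, and 3rd nearest neighbor).
distance_grids = [[[0.5, 3.0]], [[1.5, 4.0]], [[2.5, 5.0]]]
counts = count_grid(distance_grids, threshold)
```

A pixel whose count equals the number of retained grids is well supported by nearby plots; a count of zero marks a location in gradient space that is poorly sampled.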

4m Aerial Photo

Expected value: Basal Area (m²/ha), legend range 0 to 61

10th Quantile Threshold map: neighbors out of 20 within the threshold distance (legend range 0 to 20)

20th Quantile Threshold map: neighbors out of 20 within the threshold distance (legend range 0 to 20)

50th Quantile Threshold map: neighbors out of 20 within the threshold distance (legend range 0 to 20)

Natural Variability: standard deviation of 5 nearest neighbors for BA (m²/ha)

“Premise of Imputation”
Theorem
– Places similar in X-values should be similar in Y-values.
Postulate
– The 5 plots most similar to a location in X-values should have reduced variance in Y-values compared to 5 random plots

Methods
1. Create 1000 random spatial locations
2. Sample the plot IDs from the 5 nearest neighbors and the 10th and 20th centile sufficiency grids
3. Select an attribute and query the plot data with the five nearest-neighbor IDs
4. Calculate the variance for the five nearest neighbors at each of the 1000 sample points
5. Plot the density of the variance values (black line)

Figure legend: data; random sets of 5 values

Methods continued
Create 1000 sets of 5 random plots and repeat the variance calculation and density plot (open circles)
Subset the random locations and plot data sets into groups based on their sufficiency scores: 0, 5, >15 [# of 20 nearest neighbors w/in the threshold value]
Plot densities by subgroups
Create and plot random sets from appropriate subgroups
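The core comparison in the Methods slides, variance among imputed neighbors versus variance among random sets of plots, can be sketched as below. The data are synthetic and the function names are hypothetical. Drawing random sets without replacement and using the sample variance are assumptions, and the density plotting step is omitted.

```python
import random
from statistics import mean, variance

def neighbor_variances(neighbor_sets, plot_values):
    """Variance of an attribute within each set of nearest-neighbor plot IDs."""
    return [variance([plot_values[pid] for pid in ids]) for ids in neighbor_sets]

def random_variances(plot_values, n_sets, set_size, seed=0):
    """Variance within each of n_sets randomly drawn sets of plots.

    Plots are drawn without replacement within a set (an assumption).
    """
    rng = random.Random(seed)
    plot_ids = sorted(plot_values)
    return [variance([plot_values[pid] for pid in rng.sample(plot_ids, set_size)])
            for _ in range(n_sets)]

# Synthetic plot data: 100 plots whose attribute value equals the plot ID,
# so plots with adjacent IDs stand in for neighbors in gradient space.
plot_values = {i: float(i) for i in range(100)}

# Synthetic "imputed" neighbor sets: five plots with adjacent IDs.
neighbor_sets = [list(range(start, start + 5)) for start in range(0, 95, 5)]

nn_var = neighbor_variances(neighbor_sets, plot_values)
rand_var = random_variances(plot_values, n_sets=1000, set_size=5)

# The premise predicts lower variance among similar neighbors.
premise_holds = mean(nn_var) < mean(rand_var)
```

With real data, the two lists of variances would be passed to a density plot, and the same functions could be rerun on each sufficiency subgroup.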

Figure legend: data; random sets of 5 values; bootstrap set; all imputed; Neighbors >=15; Neighbors >5; Neighbors <15; Neighbors <=5

Figure legend: bootstrap set; all imputed; Neighbors >=15; Neighbors >5; Neighbors <15; Neighbors <=5

Is this a general result?
Sorta.

Conclusions