Slide Number 1 of 31 Properties of a kNN tree-list imputation strategy for prediction of diameter densities from lidar Jacob L Strunk

Presentation transcript:

Slide Number 1 of 31 Properties of a kNN tree-list imputation strategy for prediction of diameter densities from lidar Jacob L Strunk Nov 15, 2013

Slide Number 2 of 31 Note: “Diameter density” in this context refers to the probability density function – the proportion of trees in each diameter class (dcl) [Figure: p(d) plotted against dcl (cm)]
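As a minimal sketch of the definition on this slide, the proportion of trees per diameter class can be computed from a tree list with a histogram. The function name, the class width, and the example diameters below are illustrative assumptions, not values from the study.

```python
import numpy as np

def diameter_density(dbh_cm, class_width=5.0, max_dbh=100.0, weights=None):
    """Return class edges and the proportion of trees in each diameter class.

    Hypothetical helper: trees larger than max_dbh are ignored, and
    `weights` (e.g. per-tree expansion factors) are optional.
    """
    edges = np.arange(0.0, max_dbh + class_width, class_width)
    counts, _ = np.histogram(dbh_cm, bins=edges, weights=weights)
    total = counts.sum()
    if total == 0:
        return edges, np.zeros(len(counts), dtype=float)
    return edges, counts / total  # proportions sum to 1

# Made-up tree list (diameters in cm), 10 cm classes up to 40 cm
dbh = np.array([12.0, 14.5, 22.0, 23.5, 31.0])
edges, p = diameter_density(dbh, class_width=10.0, max_dbh=40.0)
print(p)
```

Passing per-tree expansion factors as `weights` turns the same call into a density weighted by trees per unit area.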

Slide Number 3 of 31 Please! Share your critiques It will help the manuscript

Slide Number 4 of 31 Overview Conclusion Context kNN Tree List – some background Study objectives Indices of diameter density prediction performance Results Conclusion Revisited

Slide Number 5 of 31 Conclusion kNN diameter density estimation with LiDAR was comparable with or superior (in precision) to a post-stratification approach with 1600 variable radius plots – Equivalent: Stratum, Tract – Superior: Plot, Stand Mahalanobis with k=3, lidar P30 and P90 metrics worked well Stratification did not help – may be due to sample size (~200)

Slide Number 6 of 31 Aside: Brief Survey 1. Who uses diameter distributions in day-to-day work? 2. For distribution users: Inventory type? – Stand, stratum, 2-stage, lidar … 3. Approach? – Parametric, non-parametric 4. Sensitivity to noise in distribution? – Very, not very, what noise 5. What measure of reliability do you use for diameter information? – Index of fit, p-value, none, CIs for bins, other [Figure: p(d) plotted against dcl (cm)]

Slide Number 7 of 31 Study Context Lidar approaches can support many applications in forest inventory and monitoring But - Diameter densities are required for forestry applications - Lidar literature (on diameters) is unclear on performance Problems: – Performance measures: p-values & indices* – No comparisons with traditional approaches – No asymptotic properties *I am OK with indices, but the suggested indices may not be enough [Figure: lidar x versus field-derived y]

Slide Number 8 of 31 kNN – a flexible solution Multivariate Conceptually simple Works well with some response variables Realistic answers (can’t over-extrapolate) Can impute a tree list directly (kNN TL) – No need for theoretical distribution

Slide Number 9 of 31 kNN weaknesses Error statistics often not provided Sampling inference not well described in literature People don’t understand limitations in results Can’t extrapolate Imputed values may be noisier than using the mean… Usually poorer performance than OLS (NLS)

Slide Number 10 of 31 kNN TL Imputation Impute: Substitute for a missing value 1. Measure X everywhere (U) 2. Measure Y on a sample (s) 3. Find distance from s to U in X space – height, cover, etc. 4. Donate y from sample to nearest (X space) neighbors – Bring distance-weighted tree list [Figure: auxiliary data for a forest (e.g.), plot color = x values]
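The four imputation steps can be sketched as follows. This is an illustration only: the slide does not say how the tree list is distance-weighted, so inverse-distance weights are an assumption here, as are the function name, the field names `dbh_cm` and `trees_per_ha`, the raw (unscaled) Euclidean distance, and all numbers.

```python
import numpy as np

def impute_tree_list(x_target, X_ref, ref_trees, k=3):
    """Donate the tree lists of the k nearest reference plots to one target.

    Hypothetical sketch: plain Euclidean distance in X space and
    inverse-distance weights (an assumption, not the study's method).
    """
    d = np.sqrt(((X_ref - x_target) ** 2).sum(axis=1))  # step 3: distance in X space
    nn = np.argsort(d)[:k]                              # k nearest reference plots
    w = 1.0 / (d[nn] + 1e-9)                            # assumed weighting scheme
    w /= w.sum()
    # Step 4: concatenate donor tree lists, scaling expansion factors by weight
    dbh = np.concatenate([ref_trees[i]["dbh_cm"] for i in nn])
    expansion = np.concatenate(
        [wi * ref_trees[i]["trees_per_ha"] for i, wi in zip(nn, w)]
    )
    return dbh, expansion, nn

# Made-up reference plots: X = (lidar height, lidar cover), y = tree list
X_ref = np.array([[10.0, 0.3], [20.0, 0.6], [30.0, 0.9]])
ref_trees = [
    {"dbh_cm": np.array([8.0, 11.0]), "trees_per_ha": np.array([100.0, 100.0])},
    {"dbh_cm": np.array([18.0, 24.0]), "trees_per_ha": np.array([100.0, 100.0])},
    {"dbh_cm": np.array([30.0, 41.0]), "trees_per_ha": np.array([100.0, 100.0])},
]
dbh, expansion, nn = impute_tree_list(np.array([19.0, 0.6]), X_ref, ref_trees, k=2)
print(nn, dbh, expansion)
```

Because the weights sum to one, the imputed list carries a weighted average of the donors' trees per hectare, and its diameters are always diameters that were actually measured on a plot.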

Slide Number 11 of 31 kNN Components k (number of neighbors imputed) Distance metric (Euc., Mah., MSN, RF) Explanatory variables – Age, Lidar height, lidar cover, FWOF (modeled) Response variables (only for MSN and RF) – Vol, BA, Ht, Dens., subgroups (> 5 in., > …) Stratification – dominant species group (5) – Hardwood, Lobl. Pine, Longl. Pine, Slash P.,

Slide Number 12 of 31 Distance Metrics yaImpute documentation: “Euclidean distance is computed in a normalized X space.” “Mahalanobis distance is computed in its namesakes space.” “MSN distance is computed in a projected canonical space.” “randomForest distance is one minus the proportion of randomForest trees where a target observation is in the same terminal node as a reference observation” Note on “normalized”: I assume this means shifted and rescaled.
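To make the first two definitions concrete, here is a sketch of both distances in NumPy. It follows the slide's reading of "normalized" as shifted and rescaled (centered, divided by the standard deviation). It is not yaImpute's implementation; the function names and data are hypothetical.

```python
import numpy as np

def normalized_euclidean(x, X_ref):
    """Euclidean distance after centering and scaling each X column."""
    mu = X_ref.mean(axis=0)
    sd = X_ref.std(axis=0, ddof=1)
    z_ref = (X_ref - mu) / sd
    z = (x - mu) / sd
    return np.sqrt(((z_ref - z) ** 2).sum(axis=1))

def mahalanobis(x, X_ref):
    """Mahalanobis distance using the covariance of the reference X data."""
    cov_inv = np.linalg.inv(np.cov(X_ref, rowvar=False))
    diff = X_ref - x
    # Quadratic form diff_i' * cov_inv * diff_i for every reference row i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

# Made-up lidar metrics: (P30 height, P90 height) for five reference plots
X_ref = np.array([[4.0, 12.0], [6.0, 18.0], [9.0, 21.0], [11.0, 30.0], [15.0, 33.0]])
x_target = np.array([8.0, 20.0])
print(normalized_euclidean(x_target, X_ref))
print(mahalanobis(x_target, X_ref))
```

The two distances coincide when the predictors are uncorrelated. With correlated predictors, such as two height percentiles, Mahalanobis distance down-weights the shared direction.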

Slide Number 13 of 31 Study Objectives Enable relative, absolute, comparative inference for diameter density prediction Contrast kNN and TIS performances Evaluate kNN strategies for diameter density prediction TIS “Traditional” inventory system

Slide Number 14 of 31 “Enable relative, absolute, comparative inference” I will argue that we have already settled on some excellent measures of performance: – Coefficient of determination (R²) – Root mean square error (RMSE) – Standard error (sample-based estimator of sd of estimator) Very convenient for inference Straightforward to translate to diameter densities…
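For reference, the three familiar measures can be written for an ordinary scalar response as below. These are the textbook definitions, not the diameter-density indices proposed later in the talk; the function names and numbers are illustrative assumptions.

```python
import numpy as np

def r_squared(obs, pred):
    """Coefficient of determination: 1 - SSE / SST."""
    sse = ((obs - pred) ** 2).sum()
    sst = ((obs - obs.mean()) ** 2).sum()
    return 1.0 - sse / sst

def rmse(obs, pred):
    """Root mean square error of the predictions."""
    return np.sqrt(((obs - pred) ** 2).mean())

def standard_error_of_mean(y):
    """Sample-based estimate of the sd of the sample mean."""
    return y.std(ddof=1) / np.sqrt(len(y))

# Made-up observed and predicted values
obs = np.array([1.0, 2.0, 3.0, 4.0])
pred = np.array([2.0, 2.0, 3.0, 3.0])
print(r_squared(obs, pred), rmse(obs, pred), standard_error_of_mean(obs))
```

R² is unitless (relative inference), RMSE is in the units of the response (absolute inference), and the standard error shrinks with sample size, which is why it suits comparisons between designs.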

Slide Number 15 of 31 Indices – Residual Computation Computed with Leave One Out (LOO) cross-validation LOO cross-validation 1. Omit one plot 2. Fit model 3. Predict omitted plot 4. Compute error metric (observed vs predicted) 5. Repeat for the remaining n-1 plots After LOO cross-validation 1. Compute indices from vector of residuals
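The LOO loop can be sketched as below. For brevity the example predicts a single scalar per plot with an unweighted kNN mean, whereas the study compares observed and predicted diameter densities. The function name, the predictor, and the data are assumptions made for illustration.

```python
import numpy as np

def loo_residuals(X, y, k=3):
    """Leave-one-out residuals for a simple unweighted kNN mean predictor."""
    n = len(y)
    resid = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                        # 1. omit one plot
        d = np.sqrt(((X[keep] - X[i]) ** 2).sum(axis=1))
        nn = np.argsort(d)[:k]                          # 2. "fit": find neighbors
        pred = y[keep][nn].mean()                       # 3. predict omitted plot
        resid[i] = y[i] - pred                          # 4. observed minus predicted
    return resid                                        # 5. loop covers all n plots

# Made-up data: one lidar metric per plot and one response per plot
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
resid = loo_residuals(X, y, k=2)
print(resid)
print(np.sqrt((resid ** 2).mean()))  # an RMSE-style summary of the residual vector
```

The end plots show the largest residuals because their neighbors all lie on one side, a small-scale view of the "can't extrapolate" property of kNN.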

Slide Number 16 of 31 Proposed Indices – index I Similar to coefficient of determination – Relative inference Variability around population density Variability of predictions around observed densities

Slide Number 17 of 31 Proposed Indices – index K Similar to model RMSE – absolute (and comparative) inference

Slide Number 18 of 31 Proposed Indices – index k_n Similar to standard error (estimated sd of estimator) – comparative inference

Slide Number 19 of 31 Why these indices Index I – Intuitive inference: how much variation did we explain – Doesn’t work well when comparing 2 designs… Index K – An absolute measure of prediction performance that can be used to compare models from different sampling designs Index k_n – Look at asymptotic estimation properties with different designs and modeling strategies

Slide Number 20 of 31 Study Area Savannah River Site – South Carolina – 200 k acres & wall to wall lidar – ~200 FR plots (40 trees / plot on average) – 1600 VR plots (10 trees / plot on average)

Slide Number 21 of 31 FR Design 200 fixed radius 1/10th or 1/5th acre plots Distributed across size and species groups Survey-grade GPS positioning

Slide Number 22 of 31 Traditional Inventory System (TIS) “Traditional” – i.e. a fairly common approach Design: ~200K acres of forest on Savannah River Site 1607 variable radius plots, ~gridded Post-stratification on field measurements – Height – Cover – Dominant Species Group -> 63 strata Stands (~30 acres each) Serves as baseline or reference approach – Lots of people familiar with its performance

Slide Number 23 of 31 Results 1. Compare kNN with TIS – Plot – Stratum – Stand – Tract 2. kNN components – k & distance metric – predictors – responses – stratification

Slide Number 24 of 31 Results: Point/Plot kNN performance >> TIS performance – Reasonable result – kNN can vary with lidar height & cover metrics – Single density within a stratum for TIS K = Quasi RMSE (smaller is better)

Slide Number 25 of 31 Results Stratum: Setup 63 Strata 200 FR plots ~ 3 FR plots / stratum Stratum-level kNN performance: Single Stratum

Slide Number 26 of 31 Results Stand: Setup Stands 200 FR plots ~ 0 FR plots / stand No asymptotic properties Stand-level kNN performance: Stands w/in Single Stratum

Slide Number 27 of 31 TIS vs kNN Stratum-level performance (63 TIS strata) *Stand*-level performance (7000+ stands) Tract performances (k_n) were equivalent for kNN and TIS k_n = Quasi standard error (smaller is better) K = Quasi RMSE (smaller is better) [Figure: K for kNN and TIS]

Slide Number 28 of 31 Tract Equivalent performance for kNN and TIS – k_n TIS: 0.12 – k_n kNN: 0.10

Slide Number 29 of 31 kNN strategy Components

Slide Number 30 of 31 New Index Index I – Similar to coefficient of determination (R²) – Closer to 1.0 is better

Slide Number 31 of 31 kNN: k & distance metric

Slide Number 32 of 31 kNN: Predictors Best Performing Worst Performing

Slide Number 33 of 31 kNN: Responses Best Performing Worst Performing

Slide Number 34 of 31 kNN: Stratification Large n Small n

Slide Number 35 of 31 Conclusion - Revisited kNN diameter density estimation with LiDAR is comparable with or superior (in precision) to a post-stratified approach with variable radius plots – Equivalent: Stratum, Tract – Superior: Plot, Stand Mahalanobis with k=3, lidar P30 and P90 metrics worked well Stratification did not help – may be due to sample size (~200)

Slide Number 36 of 31 Thank you! Any questions? Comments? Suggestions? I am planning to submit a manuscript in December