Published by Mitchell McCarthy. Modified over 9 years ago.
1
Feature Selection for Tree Species Identification in Very High Resolution Satellite Images
Matthieu Molinier and Heikki Astola
VTT Technical Research Centre of Finland
matthieu.molinier@vtt.fi, heikki.astola@vtt.fi
IGARSS 2011, Vancouver, 28.7.2011
2
17/11/2015
Introduction
NewForest: Renewal of Forest Resource Mapping
- A 1.5-year study (2009-2010) funded by the Finnish Funding Agency for Technology and Innovation (TEKES), with Finnish forest companies and research organizations (VTT and the University of Eastern Finland, UEF)
Study motivation
- Improve methods for operative forest inventory from remote sensing data
- Species-wise estimates (e.g. stem volume) are not accurate enough for their cost (accuracy vs. cost trade-off)
3
NewForest approach in forest variable estimation
- Modelling based on satellite image pixel reflectances and contextual features
- Individual tree crown (ITC) detection and crown width estimation
- Segmentation estimates
- Combining the data to predict the total amount and size variation by species
=> Refined, more accurate species-wise estimates
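The slides do not describe how individual tree crowns are detected. One common approach, shown here only as an illustration and not necessarily the one used in NewForest, is to smooth the panchromatic band and keep local intensity maxima as treetop candidates. The function names, the window sizes and the `min_value` threshold below are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_smooth(image, size=3):
    """Mean filter with edge padding (size must be odd)."""
    pad = size // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    return sliding_window_view(padded, (size, size)).mean(axis=(2, 3))


def detect_treetops(pan, window=5, smooth_size=3, min_value=0.0):
    """Return (row, col) positions of local maxima in the smoothed PAN band.

    Hypothetical sketch: a pixel is a treetop candidate if it equals the maximum
    of its window x window neighbourhood and exceeds min_value.
    """
    smoothed = box_smooth(pan, smooth_size)
    pad = window // 2
    padded = np.pad(smoothed, pad, mode="constant", constant_values=-np.inf)
    local_max = sliding_window_view(padded, (window, window)).max(axis=(2, 3))
    peaks = (smoothed == local_max) & (smoothed > min_value)
    return [tuple(int(v) for v in rc) for rc in np.argwhere(peaks)]


def synthetic_crowns(size=21, centers=((5, 5), (15, 15))):
    """Hypothetical test image: a 3 x 3 bright blob per crown, brightest at its center."""
    img = np.zeros((size, size))
    for r, c in centers:
        img[r - 1:r + 2, c - 1:c + 2] = 0.5
        img[r, c] = 1.0
    return img


print(detect_treetops(synthetic_crowns()))  # expected: [(5, 5), (15, 15)]
```

The smoothing step suppresses single-pixel noise; the window size roughly sets the minimum distance between two detected treetops, so in practice it would be tied to the expected crown size.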
4
Study site
Karttula / Kuopio, Central Finland (62.9007º N, 27.2392º E)
- GeoEye image, 26.6.2009, bands R, G, B, NIR
- 10.5 km x 11.5 km, 3% cloud cover
- Mixed forest, spruce-dominated: 25% pine, 45% spruce, 30% deciduous (mainly birch)
5
Optical image data pre-processing
- Rectification to a geographic coordinate system (WGS84, UTM zone 35 North)
- Geocoding corrected using a digital elevation model (DEM from airborne laser scanning): mean correction 2.65 m, maximum 20 m
- Calibration to top-of-atmosphere (TOA) reflectances using the band-specific calibration coefficients
- Atmospheric correction to surface reflectances using the SMAC4 radiative transfer code
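The TOA calibration step is usually a two-stage conversion: digital numbers (DN) to at-sensor radiance, then radiance to reflectance with rho = pi * L * d^2 / (ESUN * cos(theta_s)). A minimal sketch, assuming a linear calibration `L = gain * DN + offset`; the function names and the coefficient values in the usage lines are placeholders, not GeoEye's published calibration values.

```python
import math


def dn_to_radiance(dn, gain, offset=0.0):
    """At-sensor spectral radiance from a digital number (assumed linear calibration)."""
    return gain * dn + offset


def toa_reflectance(radiance, esun, sun_elevation_deg, earth_sun_distance_au):
    """Top-of-atmosphere reflectance: pi * L * d^2 / (ESUN * cos(solar zenith)).

    radiance: band radiance (W m-2 sr-1 um-1)
    esun: band-averaged exo-atmospheric solar irradiance (W m-2 um-1)
    earth_sun_distance_au: Earth-Sun distance on the acquisition date (AU)
    """
    solar_zenith = math.radians(90.0 - sun_elevation_deg)
    return (math.pi * radiance * earth_sun_distance_au ** 2
            / (esun * math.cos(solar_zenith)))


# Usage with hypothetical placeholder coefficients (illustration only).
rad = dn_to_radiance(dn=420, gain=0.2, offset=0.0)
rho = toa_reflectance(rad, esun=1500.0, sun_elevation_deg=50.0, earth_sun_distance_au=1.016)
print(f"TOA reflectance: {rho:.3f}")
```

Atmospheric correction (SMAC4 on the slide) would then map these TOA values to surface reflectances; that step depends on aerosol and gas inputs and is not sketched here.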
6
Ground reference data
Training data: from 222 field plots
- 212 field plots within the GeoEye image area (2009)
- 10 additional zero-stem-volume plots extracted visually
- Tree species classification: training data from 20 pure-species field plots
Testing data: from 178 field plots (mixed species)
- 178 field plots acquired in 2009, with limited spatial distribution (several plots per forest stand)
In total: 1164 ground objects mapped (276 pines, 277 spruces, 347 deciduous, 264 non-trees)
(Figure: GeoEye image, 10.5 km x 11.5 km)
7
Input for feature selection: 35 + 4 features
SPECTRAL (5), set A
- R, G, B, NIR, PAN: mean intensity within a 1.5 m radius around tree candidates (TC)
CONTEXTUAL (9), set B
- From PAN, within a 7.5 m radius around TC: mean, mean / median, skewness, kurtosis, contrast
- pm1: mean of brightest pixels; ps1: std of brightest pixels
- pm2: mean of darkest pixels; ps2: std of darkest pixels
SEGMENT-WISE (21), set C
- From PAN, 3 segment sizes: 50 m², 85 m², 125 m²
- 7 statistics per segment size (7 x 3 = 21): mean, mean / median, skewness, kurtosis, std (standard deviation), pmean (partial mean), pstd (partial standard deviation)
Probe variables (4)
- Random vectors or random permutations of a feature vector: probe_gauss1, probe_gauss2, probe_shuffle1, probe_shuffle2
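The spectral and contextual features are statistics of pixel values inside circular windows centered on each tree candidate. A minimal sketch under several assumptions not stated in the slides: all bands share a 0.5 m pixel grid (multispectral bands resampled or pan-sharpened to PAN), "brightest" and "darkest" pixels mean the top and bottom 10 % of the window (`frac`), contrast is (max - min) / (max + min), and kurtosis is excess kurtosis. The helper names are hypothetical.

```python
import numpy as np


def disk_values(image, row, col, radius_px):
    """Values of the pixels whose centers lie within radius_px of (row, col)."""
    rows, cols = np.ogrid[:image.shape[0], :image.shape[1]]
    mask = (rows - row) ** 2 + (cols - col) ** 2 <= radius_px ** 2
    return image[mask].astype(float)


def spectral_features(bands, row, col, radius_m=1.5, pixel_m=0.5):
    """Set A: mean of each band (e.g. R, G, B, NIR, PAN) around the tree candidate."""
    return [float(disk_values(b, row, col, radius_m / pixel_m).mean()) for b in bands]


def contextual_features(pan, row, col, radius_m=7.5, pixel_m=0.5, frac=0.1):
    """Set B: statistics of the PAN band in the larger window (assumed definitions)."""
    v = disk_values(pan, row, col, radius_m / pixel_m)
    mean, std, median = v.mean(), v.std(), np.median(v)
    z = (v - mean) / std if std > 0 else np.zeros_like(v)
    vmax, vmin = v.max(), v.min()
    ranked = np.sort(v)
    n = max(1, int(round(frac * v.size)))  # size of the "brightest"/"darkest" sets
    bright, dark = ranked[-n:], ranked[:n]
    return {
        "mean": float(mean),
        "mean_median": float(mean / median) if median != 0 else float("nan"),
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4) - 3.0),  # excess kurtosis (assumed convention)
        "contrast": float((vmax - vmin) / (vmax + vmin)) if vmax + vmin != 0 else 0.0,
        "pm1": float(bright.mean()), "ps1": float(bright.std()),
        "pm2": float(dark.mean()), "ps2": float(dark.std()),
    }
```

With 0.5 m pixels, the 1.5 m and 7.5 m radii correspond to 3 and 15 pixels. The segment-wise features (set C) would apply similar statistics to image segments instead of disks.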
8
Class definitions and training scheme
Tree classes: 1 pine, 2 spruce, 3 deciduous
Non-tree classes: 4 shadow, 5 open area / sunlit, 6 bare ground, 7 green vegetation
Data split (stratified sampling to preserve class proportions):
- WHOLE DATASET: 1164 samples (900 trees, 264 non-trees)
- 2/3 MODEL DESIGN (773), 1/3 TESTING (391)
- MODEL DESIGN split again into 2/3 TRAINING (512, model building) and 1/3 VAL (261, ranking)
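The nested 2/3 : 1/3 split can be reproduced with per-class random sampling. A minimal sketch; the function names, the seed and the rounding rule are assumptions, so the resulting sizes only approximate the 773 / 391 and 512 / 261 figures on the slide.

```python
import numpy as np


def stratified_split(labels, frac_first, seed=0):
    """Split sample indices so each class keeps about frac_first of its samples in part 1."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    first, second = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_first = int(round(frac_first * idx.size))
        first.extend(idx[:n_first].tolist())
        second.extend(idx[n_first:].tolist())
    return np.array(sorted(first), dtype=int), np.array(sorted(second), dtype=int)


def nested_split(labels, seed=0):
    """WHOLE -> MODEL DESIGN (2/3) + TEST (1/3); MODEL DESIGN -> TRAIN (2/3) + VAL (1/3)."""
    labels = np.asarray(labels)
    design, test = stratified_split(labels, 2 / 3, seed)
    tr, va = stratified_split(labels[design], 2 / 3, seed + 1)
    return design[tr], design[va], test


# Hypothetical labels: tree counts from the slides; the 264 non-tree samples are
# lumped into class 4 because their split over classes 4-7 is not given.
labels = np.repeat([1, 2, 3, 4], [276, 277, 347, 264])
train_idx, val_idx, test_idx = nested_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting each class separately is what keeps the class proportions equal across TRAINING, VAL and TESTING.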
9
Feature selection preparation (Guyon et al., 2003)
- Feature normalization to the range [0, 1]
- Visual screening of scatter plots of the 35 real features: no obvious correlations, very few outlier samples
- Variable ranking: each feature is assessed on its own with the simplest classifier (a single threshold), one class (+) vs. all others (-). 4 scores:
  - Fisher criterion F, scaled to [0, 1]
  - R²: squared Pearson correlation coefficient between the single feature and the +/- labels
  - AUC: area under the ROC (Receiver Operating Characteristic) curve
  - sum of the previous scores (FR2AUC)
- All scores were computed for every class, then averaged to rank the variables over all 7 classes and over the tree classes only (1, 2, 3).
- No single feature significantly and consistently outperformed the others.
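The single-feature scores can be computed directly from their definitions. A minimal sketch under stated assumptions: labels are one-vs-all per class; AUC is computed in its Mann-Whitney form and made orientation-free with max(AUC, 1 - AUC), since a single threshold can point either way; Fisher scores are min-max scaled across features within each class; and the probes are two Gaussian columns plus shuffled copies of features 0 and 1 (which features were permuted is not stated). All function names are hypothetical.

```python
import numpy as np


def minmax(X):
    """Scale every column to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / np.where(hi > lo, hi - lo, 1.0)


def add_probes(X, rng):
    """Append 2 Gaussian probes and 2 shuffled copies of (assumed) features 0 and 1."""
    gauss = rng.standard_normal((X.shape[0], 2))
    shuffled = np.column_stack([rng.permutation(X[:, 0]), rng.permutation(X[:, 1])])
    return np.hstack([X, minmax(gauss), shuffled])


def fisher(x, pos):
    a, b = x[pos], x[~pos]
    denom = a.var() + b.var()
    return (a.mean() - b.mean()) ** 2 / denom if denom > 0 else 0.0


def r2(x, pos):
    if x.std() == 0:
        return 0.0
    return float(np.corrcoef(x, np.where(pos, 1.0, -1.0))[0, 1] ** 2)


def auc(x, pos):
    a, b = x[pos], x[~pos]
    wins = (a[:, None] > b[None, :]).mean() + 0.5 * (a[:, None] == b[None, :]).mean()
    return max(wins, 1.0 - wins)


def rank_features(X, y, classes):
    """Average FR2AUC over one-vs-all problems; returns (order best-first, scores)."""
    n_feat = X.shape[1]
    scores = np.zeros(n_feat)
    for cls in classes:
        pos = y == cls
        f = np.array([fisher(X[:, j], pos) for j in range(n_feat)])
        span = f.max() - f.min()
        f = (f - f.min()) / span if span > 0 else np.zeros_like(f)
        r = np.array([r2(X[:, j], pos) for j in range(n_feat)])
        a = np.array([auc(X[:, j], pos) for j in range(n_feat)])
        scores += f + r + a
    scores /= len(classes)
    return np.argsort(-scores), scores


# Hypothetical demo: feature 0 carries the class, feature 1 is pure noise.
rng = np.random.default_rng(0)
y_demo = np.repeat([1, 2, 3], 20)
X_demo = minmax(np.column_stack([y_demo + 0.1 * rng.standard_normal(60), rng.random(60)]))
order, scores = rank_features(add_probes(X_demo, rng), y_demo, classes=[1, 2, 3])
print(order)
```

A real feature that ranks at or below the probes carries no more class information than random noise, which is the point of including probe variables in the ranking.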
10
Feature selection and image classification
- Score: classification accuracy on the validation set VAL (261 samples)
- Sequential Forward Selection (SFS) with three classification methods:
  - Linear Discriminant Analysis (LDA)
  - Quadratic LDA
  - k-nearest neighbor (kNN) classifier, k in [2, 9]; feature selection and the choice of k are done at the same time
- Finding the best minimal feature subset by brute force:
  - take the 10 best features from SFS
  - retrain the best model on the whole modelling dataset (TRAIN + VAL) and test it on the independent TEST set
  - brute force is tractable here with simple classifiers (2^10 - 1 = 1023 non-empty subsets of 10 features)
  - it overcomes the sub-optimality of SFS
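The two-stage search (greedy SFS, then exhaustive search over the SFS shortlist) can be sketched with a plain kNN classifier. This is a minimal illustration, not the authors' code: only kNN is shown (LDA variants omitted), ties in the kNN vote go to the lowest label, and the function names and synthetic data are hypothetical.

```python
import itertools
import numpy as np


def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest (Euclidean) training samples; ties -> lowest label."""
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])


def val_accuracy(data, features, k):
    X_tr, y_tr, X_va, y_va = data
    cols = list(features)
    return float(np.mean(knn_predict(X_tr[:, cols], y_tr, X_va[:, cols], k) == y_va))


def sfs(data, n_select, ks=range(2, 10)):
    """Greedy forward selection; k is re-chosen at every step together with the feature."""
    selected, remaining, history = [], list(range(data[0].shape[1])), []
    while remaining and len(selected) < n_select:
        acc, feat, k = max((val_accuracy(data, selected + [j], kk), j, kk)
                           for j in remaining for kk in ks)
        selected.append(feat)
        remaining.remove(feat)
        history.append((acc, k))
    return selected, history


def brute_force(data, candidates, ks=range(2, 10)):
    """Exhaustive search over all non-empty subsets of the candidate features.

    Subsets are visited smallest first and only a strictly better accuracy replaces
    the current best, so ties resolve to the smallest subset.
    """
    best = (-1.0, (), None)
    for size in range(1, len(candidates) + 1):
        for subset in itertools.combinations(candidates, size):
            for k in ks:
                acc = val_accuracy(data, subset, k)
                if acc > best[0]:
                    best = (acc, subset, k)
    return best


rng = np.random.default_rng(1)


def synthetic(n):
    """Hypothetical data: feature 0 carries the class (1 or 2), features 1-2 are noise."""
    y = rng.integers(1, 3, n)
    return np.column_stack([y + 0.05 * rng.standard_normal(n), rng.random(n), rng.random(n)]), y


X_tr, y_tr = synthetic(60)
X_va, y_va = synthetic(30)
data = (X_tr, y_tr, X_va, y_va)
selected, history = sfs(data, n_select=2)
best_acc, best_subset, best_k = brute_force(data, candidates=selected)
print(selected, best_acc, best_subset, best_k)
```

With the 10-feature shortlist from the slide and k in [2, 9], the exhaustive stage costs 1023 x 8 = 8184 validation runs, which stays cheap for kNN. The winning subset and k would then be retrained on TRAIN + VAL and scored once on TEST.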
11
6-10 features are enough
- Spectral features performed best; segment-wise features are not suited to a mixed-species study
- Overall classification accuracy on the tree classes was over 80%
- Probe variables were selected among the first features more often with LDA than with kNN: the linear classifier is too simple
- Quadratic LDA was overfitting
- kNN with k = 5 gave the best overall performance and the smallest gap between training and validation error => lower risk of overfitting
12
Example of tree species classification map
- Legend: pine 76 %, spruce 76 %, deciduous 88 %, non-forest
- Pan-sharpened GeoEye image extract of 1 km x 1 km
- Individual tree crown classification with a 5-NN classifier trained with pure-species training data
- Non-forest mask generated with k-means clustering + cluster labeling
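The slide only states that the non-forest mask came from k-means clustering followed by cluster labeling. A minimal sketch of that idea with plain Lloyd iterations; the number of clusters and the `is_forest` rule used to label clusters (NIR brighter than red) are hypothetical stand-ins for what may well have been an analyst's manual decision.

```python
import numpy as np


def kmeans(X, n_clusters, n_iter=100, seed=0):
    """Lloyd's algorithm: returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Empty clusters keep their previous center.
        new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                else centers[c] for c in range(n_clusters)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def non_forest_mask(image, n_clusters, is_forest, seed=0):
    """True where a pixel's cluster is labeled non-forest by the is_forest(center) rule."""
    h, w, bands = image.shape
    labels, centers = kmeans(image.reshape(-1, bands).astype(float), n_clusters, seed=seed)
    forest = np.array([is_forest(c) for c in centers])
    return ~forest[labels].reshape(h, w)


# Hypothetical 2-band (red, NIR) scene: forest on the left, open area on the right.
scene = np.zeros((10, 10, 2))
scene[:, :5] = [0.05, 0.40]
scene[:, 5:] = [0.30, 0.20]
mask = non_forest_mask(scene, n_clusters=2, is_forest=lambda c: c[1] > c[0])
print(mask.astype(int))
```

Labeling clusters rather than pixels means the rule (or the analyst) only has to inspect a handful of cluster centers.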
13
Predicted species-wise stem numbers vs. field plot data
- Predicted stem number per species plotted against test data (178 test plots)
- Systematic under-estimation of the predicted stem number for the spruce and deciduous classes
- Noise partly due to the small collecting radius (r = 8 m) of the test data, and to geolocation differences between satellite and ground data
[Figure: predicted vs. true number of stems per field plot (stems/ha) in three panels: spruce (Nspruce), pine (Npine) and broadleaved/deciduous (Ndecid). Fitted lines shown across the panels: y = 0.98x + 137.1 (R² = 0.24), y = 0.33x + 239.8, y = 0.56x + 21.0 (R² = 0.54), y = 0.85x + 45.0 (R² = 0.34).]
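Each fitted line on this slide is presumably an ordinary least-squares regression of predicted on true stem numbers, with R² as the coefficient of determination. A minimal sketch of both, plus a conversion from a plot count to stems/ha for the r = 8 m plots, assuming counts were scaled by plot area (the slide does not say how). The function names are hypothetical.

```python
import math
import numpy as np


def stems_per_ha(count, radius_m=8.0):
    """Convert a stem count in a circular plot of the given radius to stems/ha."""
    return count * 10_000.0 / (math.pi * radius_m ** 2)


def fit_line(true_values, predicted_values):
    """Least-squares line predicted = slope * true + intercept, and its R^2."""
    x = np.asarray(true_values, dtype=float)
    y = np.asarray(predicted_values, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return float(slope), float(intercept), float(r2)


print(round(stems_per_ha(1), 1))
```

With r = 8 m the plot covers about 201 m², so counting one tree more or less shifts the estimate by roughly 50 stems/ha. This coarse granularity is one concrete reason the small collecting radius adds noise to the comparison.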
14
Conclusions
- The methodology could detect individual treetops, identify their species and determine species proportions in mixed forest.
- Feature ranking and feature selection were performed on a set of 35 features for tree species classification.
- Several classifiers (each a model combining a feature subset and a classification method) were built. The best turned out to be 5-NN with a subset of 6 features, mostly spectral. Segment-wise features could be discarded.
- The tree species proportion accuracy was good (1.4% to 3.5%), but the correlation of stem numbers per species was not as good as expected.
Future work
- Model selection with more elaborate classifiers (e.g. SVMs)
- Embedding feature selection into a cross-validation scheme
- Improving stem number estimation with adaptive filtering
- Validating tree crown width estimation with ground data
15
Thank you
matthieu.molinier@vtt.fi, heikki.astola@vtt.fi