1
Nonparametric Weighted Feature Extraction (NWFE) and Its Kernel-based Version (KNWFE)

Bor-Chen Kuo
Graduate School of Educational Measurement and Statistics, National Taichung University, Taiwan, R.O.C.
kbc@mail.ntctc.edu.tw

Cheng-Hsuan Li
Institute of Electrical Control Engineering, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.
Graduate School of Educational Measurement and Statistics, National Taichung University, Taiwan, R.O.C.
ChengHsuanLi@gmail.com
2
Outline
- Hyperspectral image data and some applications
- The influence of increasing dimensionality
- The Hughes phenomenon
- Feature selection and feature extraction
- Nonparametric Weighted Feature Extraction (NWFE)
- Kernel method
- Kernel Nonparametric Weighted Feature Extraction (KNWFE)
- The classification results of the Washington DC image
- Conclusions
3
Hyperspectral Image Data Representation (figure: Image Space, Spectral Space, and Feature Space representations of a sample)
4
Application I (Data source: GIS Research Center, Feng Chia University)
5
Application II
6
Application III
7
Application IV: In urban areas, applications include understanding large regions, detecting areas of change, and simple population estimation (a trial was conducted in the Paris area). Other applications include the Food Agency's twice-yearly rice and grain crop surveys and large-scale environmental disasters.
8
The Power of Increasing Dimensionality (figure: the same data viewed in pairs of the features x1, x2, x3)
9
The Hughes Phenomenon (1)
10
The Hughes Phenomenon (2)
11
A System for Hyperspectral Data Classification (flowchart with components: Hyperspectral Data Collection; Data Adjustment — calibration, adjustment for the atmosphere, the solar curve, goniometric effects, etc.; Label Training Samples — indirect method using pre-gathered spectra and observations from the ground, direct method using observations of the ground; Determine Quantitative Class Descriptions; Clustering; Feature Selection; Class-Conditional Feature Extraction; Classifier; Probability Map; Results Map)
12
Feature selection: select l out of the p measurements (x1, ..., xp).
Feature extraction: map the p measurements to l new features (f1, f2, ...).
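To make the distinction concrete, a minimal sketch (the chosen indices and the projection matrix are arbitrary illustrations, not taken from the slides): selection keeps l of the original columns unchanged, while extraction builds l new features that each mix all p measurements.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))   # 100 samples, p = 6 measurements

# Feature selection: keep l = 2 of the original measurements (columns).
selected = [0, 3]               # hypothetical choice of measurements
X_sel = X[:, selected]          # shape (100, 2); values are unchanged

# Feature extraction: map all p measurements to l = 2 new features y = A^T x.
A = rng.normal(size=(6, 2))     # p x l transformation matrix (here random)
X_ext = X @ A                   # shape (100, 2); every new feature mixes all p inputs

print(X_sel.shape, X_ext.shape)
```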
13
Difference Between Feature Selection and Feature Extraction
14
Feature Extraction vs. Feature Selection

              Advantage                                   Disadvantage
  Selection   cut in measurements; easy interpretation    expensive; often approximative
  Extraction  cheap; can be nonlinear                     need all measurements; criterion sub-optimal
15
Feature Extraction and Classification Process
Feature extraction:
1. Compute the scatter matrices S_b and S_w.
2. Regularize the within-class scatter matrix S_w.
3. Eigenvalue decomposition.
The transformed training data and transformed testing data are then passed to the classifier to produce the classification result.
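A minimal sketch of this process in Python/NumPy, assuming the classical (LDA-style) scatter matrices, a shrinkage-toward-the-diagonal regularizer for S_w (the slides do not specify which regularizer is used), and a 1NN classifier on the transformed data.

```python
import numpy as np

def scatter_matrices(X, y):
    """Classical between-class (S_b) and within-class (S_w) scatter matrices."""
    y = np.asarray(y)
    p = X.shape[1]
    mean_all = X.mean(axis=0)
    S_b, S_w = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        d = (Xc.mean(axis=0) - mean_all)[:, None]
        S_b += len(Xc) * d @ d.T
        S_w += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    return S_b, S_w

def extract(X_train, y_train, X_test, n_features, alpha=0.5):
    """Regularize S_w, eigendecompose S_w^{-1} S_b, and project the data."""
    S_b, S_w = scatter_matrices(X_train, y_train)
    S_w = alpha * S_w + (1 - alpha) * np.diag(np.diag(S_w))   # one common regularizer
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1]
    A = eigvecs[:, order[:n_features]].real                   # p x l transformation matrix
    return X_train @ A, X_test @ A

def nn_classify(Z_train, y_train, Z_test):
    """1NN classifier applied to the transformed data."""
    y_train = np.asarray(y_train)
    d = np.linalg.norm(Z_test[:, None, :] - Z_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]
```

The same skeleton applies to LDA, NWFE, and (after replacing the scatter matrices by their kernel versions) KNWFE; only the definition of S_b and S_w changes.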
16
Principal Component Analysis
Principal component analysis (PCA, 1901) finds directions in the data
- which retain as much variation as possible,
- which make the projected data uncorrelated,
- which minimise the squared reconstruction error.
(figure: data in R^k projected into R^l)
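A short sketch of PCA along these lines (synthetic data, our own example): center the data, take the leading eigenvectors of the sample covariance, and check the retained variance, the decorrelation of the projected features, and the reconstruction error.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated synthetic data

Xc = X - X.mean(axis=0)                    # center the data
C = Xc.T @ Xc / (len(X) - 1)               # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
W = eigvecs[:, ::-1][:, :2]                # top-2 principal directions

Z = Xc @ W                                 # projected data
X_rec = Z @ W.T                            # reconstruction from 2 components

print("retained variance ratio:", eigvals[::-1][:2].sum() / eigvals.sum())
print("off-diagonal covariance of Z:", np.cov(Z.T)[0, 1])          # ~0: uncorrelated
print("mean squared reconstruction error:", np.mean((Xc - X_rec) ** 2))
```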
17
Classification Using PCA (figure: scatter plots of the projected data)
18
What is the measure of separability?
The purpose of feature extraction is to mitigate the effect of the Hughes phenomenon. The method tries to find a transformation matrix A such that the class separability of the transformed data (Y = A^T X) is maximized in a lower-dimensional space. What is the measure of separability? Usually the trace of S_w^{-1} S_b is used as the separability.
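A minimal sketch of this criterion, assuming the classical definitions of the scatter matrices (the slide leaves the matrices inside the trace implicit): pulling two classes further apart increases the measured separability.

```python
import numpy as np

def separability(X, y):
    """tr(S_w^{-1} S_b) with the classical scatter matrices."""
    y = np.asarray(y)
    p = X.shape[1]
    m = X.mean(axis=0)
    S_b, S_w = np.zeros((p, p)), np.zeros((p, p))
    for c in np.unique(y):
        Xc = X[y == c]
        d = (Xc.mean(axis=0) - m)[:, None]
        S_b += len(Xc) * d @ d.T
        S_w += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    return np.trace(np.linalg.solve(S_w, S_b))

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(2.0, 1.0, (50, 4))])
y = np.repeat([0, 1], 50)

X_far = X.copy()
X_far[y == 1] += 3.0                                 # pull the second class further away
print(separability(X, y), separability(X_far, y))    # the second value is larger
```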
19
Linear Discriminant Analysis Feature Extraction (LDA or DAFE)
20
LDA (DAFE)
S_b is the between-class distance. S_w is the within-class distance. The weights of the between- and within-class distances are the same.
Disadvantages:
1. Only useful for normally distributed data.
2. Only L-1 features can be extracted.
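The second limitation follows from the rank of S_b: it is built from L class-mean differences whose weighted sum is zero, so its rank is at most L-1 and the eigenproblem yields at most L-1 useful features. A small numerical check (synthetic data, our own example):

```python
import numpy as np

rng = np.random.default_rng(3)
L, p, n = 3, 10, 40                                  # L classes in a 10-dimensional space
X = np.vstack([rng.normal(c, 1.0, (n, p)) for c in range(L)])
y = np.repeat(np.arange(L), n)

m = X.mean(axis=0)
S_b = np.zeros((p, p))
for c in range(L):
    d = (X[y == c].mean(axis=0) - m)[:, None]
    S_b += n * d @ d.T                               # between-class scatter

print(np.linalg.matrix_rank(S_b))                    # prints 2, i.e. at most L - 1
```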
21
Difference Between Feature Selection and Feature Extraction
22
Nonparametric Weighted Feature Extraction (NWFE)
26
(figure: Large Weight vs. Small Weight)
27
Nonparametric Weighted Feature Extraction (NWFE)
30
(figure: Large Weight vs. Small Weight)
31
Nonparametric Weighted Feature Extraction (NWFE)
32
NWFE focuses on these vectors and puts different weights on them.
33
Nonparametric Weighted Feature Extraction (NWFE; Kuo & Landgrebe, 2002, 2004)
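The NWFE equations are not reproduced on these slides, so the following is a hedged sketch of the weighted scatter matrices as we read them from Kuo and Landgrebe (2004): every sample gets a weighted local mean in each class, with weights inversely proportional to distance, and samples whose local mean lies close by receive large scatter weights. The exact normalization and the distance exponent may differ from the authors' formulation.

```python
import numpy as np

def nwfe_scatter(X, y, eps=1e-12):
    """Hedged sketch of the NWFE between-class (Sb) and within-class (Sw) scatter matrices."""
    y = np.asarray(y)
    classes = np.unique(y)
    p = X.shape[1]
    Sb, Sw = np.zeros((p, p)), np.zeros((p, p))
    for i in classes:
        Xi = X[y == i]
        ni = len(Xi)
        Pi = ni / len(X)                                   # class prior
        for j in classes:
            Xj = X[y == j]
            dist = np.linalg.norm(Xi[:, None, :] - Xj[None, :, :], axis=2)
            if i == j:
                np.fill_diagonal(dist, np.inf)             # exclude the sample itself
            # Weighted local means: weights inversely proportional to distance.
            w = 1.0 / (dist + eps)
            w /= w.sum(axis=1, keepdims=True)
            M = w @ Xj                                     # M_j(x_k) for every x_k in class i
            diff = Xi - M
            # Scatter weights: samples whose local mean in class j is close get large weight.
            lam = 1.0 / (np.linalg.norm(diff, axis=1) + eps)
            lam /= lam.sum()
            S = (diff * (lam / ni)[:, None]).T @ diff      # sum_k (lam_k / n_i)(x_k - M)(x_k - M)^T
            if i == j:
                Sw += Pi * S
            else:
                Sb += Pi * S
    Sw = 0.5 * Sw + 0.5 * np.diag(np.diag(Sw))             # diagonal shrinkage of Sw
    return Sb, Sw
```

The transformation matrix is then obtained exactly as in the earlier pipeline sketch, from the leading eigenvectors of S_w^{-1} S_b.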
34
The Performance of NWFE
In the Kuo and Landgrebe paper, the performances of NWFE, LDA, aPAC-LDR, and NDA are compared. NWFE performs better than the others.
Reference: Bor-Chen Kuo and David A. Landgrebe, "Nonparametric weighted feature extraction for classification," IEEE Trans. Geosci. Remote Sens., vol. 42, no. 5, pp. 1096-1105, May 2004.
35
The Kernel Trick
- Use a feature mapping to embed the samples from the original space into a feature space H, a Hilbert space of higher dimensionality.
- In H, the patterns can be discovered as linear relations.
- We can compute the inner product of samples in the feature space directly from the original data items using a kernel function κ (not the feature mapping).
- Assume that a sample in H can be represented by a dual form, a combination of the training samples.
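As a small sketch of why this works (our own example, not one from the slides): for the homogeneous degree-2 polynomial kernel κ(x, z) = (x·z)², an explicit feature map into the space of degree-2 monomials gives exactly the same inner product, but the kernel never has to form that map.

```python
import numpy as np

def phi(x):
    """Explicit feature map for kappa(x, z) = (x . z)^2 in two dimensions."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

def kappa(x, z):
    """The same inner product computed directly in the original space."""
    return (x @ z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
print(phi(x) @ phi(z), kappa(x, z))   # both print 1.0: <phi(x), phi(z)> = kappa(x, z)
```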
36
The Kernel Trick
37
Characterization of Kernels
A function which is either continuous or has a finite domain can be decomposed into a feature map into a Hilbert space H applied to both its arguments, followed by the evaluation of the inner product in H, if and only if it satisfies the finitely positive semi-definite property.
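A quick numerical illustration of the finitely positive semi-definite property (a sketch using the RBF kernel on random points; the kernel choice and the data are ours): the Gram matrix built from any finite set of inputs has no negative eigenvalues, up to rounding.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 5))

# RBF Gram matrix K[m, n] = exp(-||x_m - x_n||^2 / (2 sigma^2)), sigma = 1.
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K = np.exp(-sq / 2.0)

print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True: the Gram matrix is PSD
```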
38
Some Widely Used Kernel Functions
- Linear kernel
- Polynomial kernel
- RBF (Gaussian) kernel
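The kernel formulas themselves are not reproduced above; the following sketch uses the usual conventions (an offset of 1 and degree d for the polynomial kernel, width σ for the RBF kernel), which may differ in detail from the presenters' definitions.

```python
import numpy as np

def linear_kernel(X, Z):
    return X @ Z.T                                    # kappa(x, z) = x . z

def polynomial_kernel(X, Z, degree=2):
    return (X @ Z.T + 1.0) ** degree                  # kappa(x, z) = (x . z + 1)^d

def rbf_kernel(X, Z, sigma=1.0):
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2 * sigma ** 2))             # kappa(x, z) = exp(-||x - z||^2 / 2 sigma^2)
```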
39
PCA & KPCA (figure: PCA vs. KPCA)
40
Kernel-based Feature Extraction and Classification Process
The training data and testing data are first mapped into the feature space H with an implicit feature map. Feature extraction in H:
1. Compute the scatter matrices S_b and S_w in H.
2. Regularize the within-class scatter matrix S_w.
3. Eigenvalue decomposition.
The transformed training data and transformed testing data are then passed to the classifier to produce the classification result.
41
Kernel Nonparametric Weighted Feature Extraction (KNWFE)
42
Problems
Problem I: How to express the scatter matrices of KNWFE in terms of the kernel matrix?
Problem II: How to handle the singularity of the kernel matrix?
Problem III: How to project the samples into the transformed space?
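The authors' answers are in the algorithm steps on the following slides (equations not reproduced here); the sketch below shows only the generic kernel-method devices such solutions are usually built on — a ridge term to remove the singularity of the Gram matrix, and projection of a new sample through its kernel values against the training samples, since each extracted direction lives in H as a combination Σ_n α_n φ(x_n). The function names and the ridge value are our own illustrative choices, not the KNWFE formulas.

```python
import numpy as np

def regularized_gram(K, ridge=1e-6):
    """Problem II (generic fix): make a singular/ill-conditioned Gram matrix invertible."""
    return K + ridge * np.eye(K.shape[0])

def project(kernel, X_train, alpha, X_new):
    """Problem III (generic form): project new samples onto directions given in dual form.

    alpha has shape (n_train, n_features); column f holds the dual coefficients of the
    f-th extracted direction v_f = sum_n alpha[n, f] * phi(x_n), so the projection of a
    new sample x is sum_n alpha[n, f] * kernel(x_n, x).
    """
    K_new = kernel(X_new, X_train)          # shape (n_new, n_train)
    return K_new @ alpha                    # shape (n_new, n_features)
```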
43
KNWFE Algorithm, steps 1-6 (equations not reproduced). Problem I is solved in step 1.
44
KNWFE Algorithm, steps 7-11 (equations not reproduced). Problem II is solved in step 7.
45
KNWFE Algorithm, steps 12-13 (equations not reproduced). Step 12 computes the dual form; Problem III is solved here.
46
Dataset
- Washington DC
- The dimensionality of this hyperspectral image is 191.
- The number of classes is 7.
- There are two kinds of training data sets: one with 40 training samples in every class, and the other with 100 training samples.
47
Experimental Design
- Every 20th band, beginning from the first one, is selected for the 10-band case.
- The parameter of the RBF kernel is the mean of the variances of the training samples in every band.
- Feature extraction methods: NWFE; KNWFE with the linear kernel (Linear K), the polynomial kernel of degree 1 (Poly K-1), degree 2 (Poly K-2), and degree 3 (Poly K-3), and the RBF kernel (RBF K).
- Classifiers: Quadratic Bayes Normal Classifier (qdc), 1NN classifier, Parzen classifier.
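A small sketch of these two settings, assuming the training spectra are held in an (n_samples, 191) array; the array contents and variable names here are placeholders, not the DC Mall data.

```python
import numpy as np

rng = np.random.default_rng(5)
X_train = rng.normal(size=(280, 191))     # placeholder for the 191-band training spectra

# Every 20th band, beginning from the first one, gives the 10-band case.
band_idx = np.arange(0, 191, 20)          # 0-based indices 0, 20, ..., 180 -> 10 bands
X_10bands = X_train[:, band_idx]
print(X_10bands.shape[1])                 # 10

# RBF kernel parameter: the mean of the variances of the training samples in every band
# (whether it enters the kernel as sigma or sigma^2 is not stated on the slide).
rbf_param = X_train.var(axis=0).mean()
```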
48
The Classification Results of the Real Dataset: mean of accuracies using 1-9 features for N_i = 40 and N_i = 100 (DC Mall, Quadratic Bayes Normal Classifier)
49
The Classification Results of the Real Dataset: mean of accuracies using 1-9 features for N_i = 40 and N_i = 100 (DC Mall, 1NN Classifier)
50
The Classification Results of the Real Dataset: mean of accuracies using 1-9 features for N_i = 40 and N_i = 100 (DC Mall, Parzen Classifier)
51
The Classification Results of the Washington DC Mall Image (figures: Thematic Map, NWFE, KNWFE)
52
The Classification Results of the Washington DC Mall Image (figures: Thematic Map, NWFE, KNWFE)
53
The Classification Results of the Washington DC Mall Image (figures: Thematic Map, NWFE, KNWFE)
54
The Classification Results of the Washington DC Mall Image (figures: Thematic Map, NWFE, KNWFE)
55
The Classification Results of the Washington DC Mall Image (figures: Thematic Map, NWFE, KNWFE)
56
Experimental Results
- The performances of all three classifiers with KNWFE features are better than those with NWFE features.
- The polynomial kernel with degree 2 outperforms the other kernel functions in the 1NN and Parzen cases.
- Among the three classifiers, the quadratic Bayes normal classifier has the best performance. The best classification accuracy is 0.9, obtained by the qdc classifier with 9 features extracted by KNWFE with the linear kernel in the case of N_i = 100.
- Comparing the figures from NWFE and KNWFE, one sees that the performance of KNWFE is better than that of NWFE in almost all classes.
57
Conclusion
- We proposed a new kernel-based nonparametric weighted feature extraction method.
- We have analyzed and compared NWFE and KNWFE both theoretically and experimentally.
- From a theoretical point of view, NWFE is a special case of KNWFE with the linear kernel, and the results on a real hyperspectral image show that the average classification accuracy of KNWFE is better than that of NWFE.
- We can state that, in our case study, using KNWFE is more beneficial and yields better results than NWFE.
58
Thanks for Your Attention! Questions?