1
Why reduce the number of features?
Having D features, we want to reduce their number to n, where n << D.
Benefits:
- Lower computational complexity
- Improved classification performance
Danger:
- Possible loss of information
2
Basic approaches to DR
Feature extraction: a transform t : R^D → R^n creates a new feature space; the features lose their original meaning.
Feature selection: selection of a subset of the original features.
3
Principal Component Transform (Karhunen-Loeve)
PCT belongs to feature extraction; t is a rotation y = Tx, where T is the matrix of eigenvectors of the original covariance matrix C_x.
PCT creates D new uncorrelated features y, with C_y = T' C_x T.
The n features with the highest variances are kept.
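A minimal NumPy sketch of the transform described above (the function name, shapes, and random test data are illustrative, not part of the slides):

```python
import numpy as np

def pct(X, n):
    """Principal Component Transform: project D-dimensional samples onto the
    n eigenvectors of the covariance matrix with the largest eigenvalues.
    X has shape (N, D); the result has shape (N, n). A minimal sketch."""
    Xc = X - X.mean(axis=0)                   # centre the data
    C_x = np.cov(Xc, rowvar=False)            # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C_x)    # eigh: C_x is symmetric
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    T = eigvecs[:, order[:n]]                 # keep n principal directions
    return Xc @ T                             # rotated, decorrelated features y

# Example: reduce 10 features to 3 uncorrelated components
X = np.random.rand(500, 10)
Y = pct(X, 3)
print(np.round(np.cov(Y, rowvar=False), 3))   # ~diagonal: components are uncorrelated
```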
4
Principal Component Transform
5
Applications of the PCT
- "Optimal" data representation, compaction of the energy
- Visualization and compression of multimodal images
6
PCT of multispectral images
Satellite image bands: B, G, R, nIR, IR, thermal IR
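As a rough sketch of this application, the six bands could be reduced to a few principal-component images as follows (using scikit-learn's PCA in place of the hand-written transform; the image shape and random pixel data are hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 6-band satellite image (B, G, R, nIR, IR, thermal IR);
# random values stand in for real pixel data.
H, W, bands = 256, 256, 6
image = np.random.rand(H, W, bands)

X = image.reshape(-1, bands)                   # one row per pixel, one column per band
pca = PCA(n_components=3)
pcs = pca.fit_transform(X).reshape(H, W, 3)    # first 3 principal-component images

print(pca.explained_variance_ratio_)           # energy compacted into the leading components
```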
7
Why is PCT bad for classification purposes? PCT evaluates the contribution of individual features solely by their variance, which may differ from their discrimination power.
8
Why is PCT bad for classification purposes?
9
Separability problem
Dimensionality reduction methods for classification (here the two-class problem) must consider the discrimination power of the individual features. The goal is to maximize the "distance" between the classes.
10
An Example
3 classes, 3D feature space, reduction to 2D
(Figure: two projections, one with high discriminability and one with low discriminability)
11
DR via feature selection
Two things are needed:
- A discriminability measure (Mahalanobis distance, Bhattacharyya distance), e.g.
  MD_12 = (m_1 - m_2)(C_1 + C_2)^{-1}(m_1 - m_2)'
- A selection strategy
Feature selection is an optimization problem.
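A minimal sketch of the Mahalanobis measure above, assuming two sample matrices X1 and X2 with one row per sample (function and variable names are illustrative):

```python
import numpy as np

def mahalanobis_distance(X1, X2):
    """Two-class Mahalanobis discriminability measure
    MD_12 = (m_1 - m_2)(C_1 + C_2)^{-1}(m_1 - m_2)'.
    X1 and X2 hold the samples of the two classes, one row per sample."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # atleast_2d keeps the measure usable for a single-feature subset as well
    C1 = np.atleast_2d(np.cov(X1, rowvar=False))
    C2 = np.atleast_2d(np.cov(X2, rowvar=False))
    d = m1 - m2
    return float(d @ np.linalg.inv(C1 + C2) @ d)
```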
12
Feature selection strategies
Optimal:
- full search, complexity D! / ((D-n)! n!)
- branch & bound
Sub-optimal:
- direct selection (optimal if the features are not correlated)
- sequential selection (SFS, SBS)
- generalized sequential selection (SFS(k), Plus k minus m, floating search)
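The sequential forward selection (SFS) strategy from the list above can be sketched as a greedy loop. This sketch takes the discriminability measure as a parameter (e.g. the mahalanobis_distance function sketched earlier); all names are illustrative, not from the slides:

```python
def sequential_forward_selection(X1, X2, n, criterion):
    """Sub-optimal SFS strategy: greedily add the feature that maximizes
    the given two-class discriminability criterion on the selected subset.
    X1 and X2 hold the samples of the two classes, one row per sample."""
    D = X1.shape[1]
    selected = []
    while len(selected) < n:
        remaining = [f for f in range(D) if f not in selected]
        # score every candidate subset "selected + [f]" and keep the best feature
        best = max(remaining,
                   key=lambda f: criterion(X1[:, selected + [f]],
                                           X2[:, selected + [f]]))
        selected.append(best)
    return selected

# e.g.: selected = sequential_forward_selection(X1, X2, 5, mahalanobis_distance)
```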
13
A priori knowledge in feature selection The above discriminability measures (MD, BD) require normally distributed classes. They are misleading and inapplicable otherwise.
14
A priori knowledge in feature selection
The above discriminability measures (MD, BD) require normally distributed classes. They are misleading and inapplicable otherwise.
Crucial questions in practical applications:
- Can the class-conditional distributions be assumed to be normal?
- What happens if this assumption is wrong?
15
A two-class example
(Figure: Class 1 and Class 2 distributions; in one case x2 is selected, in the other x1 is selected)
16
Conclusion
- PCT is optimal for representation of "one-class" data (visualization, compression, etc.).
- PCT should not be used for classification purposes.
- Use feature selection methods based on a proper discriminability measure.
- If you still use PCT before classification, be aware of possible errors.