1
Why reduce the number of features?
Having D features, we want to reduce their number to n, where n << D.
Benefits:
- Lower computational complexity
- Improved classification performance
Danger:
- Possible loss of information
2
Basic approaches to DR
Feature extraction: a transform t : R^D → R^n creates a new feature space; the features lose their original meaning.
Feature selection: selection of a subset of the original features.
3
Principal Component Transform (Karhunen-Loeve)
PCT belongs to feature extraction; t is a rotation y = Tx, where T is the matrix of eigenvectors of the original covariance matrix C_x.
PCT creates D new uncorrelated features y, with C_y = T' C_x T.
The n features with the highest variances are kept.
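A minimal NumPy sketch of the transform described above (the function name, shapes, and random test data are illustrative, not part of the slides):

```python
import numpy as np

def pct(X, n):
    """Principal Component Transform: project D-dimensional samples onto the
    n eigenvectors of the covariance matrix with the largest eigenvalues.
    X has shape (N, D); the result has shape (N, n). A minimal sketch."""
    Xc = X - X.mean(axis=0)                   # centre the data
    C_x = np.cov(Xc, rowvar=False)            # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C_x)    # eigh: C_x is symmetric
    order = np.argsort(eigvals)[::-1]         # sort by decreasing variance
    T = eigvecs[:, order[:n]]                 # keep n principal directions
    return Xc @ T                             # rotated, decorrelated features y

# Example: reduce 10 features to 3 uncorrelated components
X = np.random.rand(500, 10)
Y = pct(X, 3)
print(np.round(np.cov(Y, rowvar=False), 3))   # ~diagonal: components are uncorrelated
```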
4
Principal Component Transform
5
Applications of the PCT
- "Optimal" data representation, compaction of the energy
- Visualization and compression of multimodal images
6
PCT of multispectral images
Satellite image bands: B, G, R, nIR, IR, thermal IR
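As a rough sketch of this application, the six bands could be reduced to a few principal-component images as follows (using scikit-learn's PCA in place of the hand-written transform; the image shape and random pixel data are hypothetical):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical 6-band satellite image (B, G, R, nIR, IR, thermal IR);
# random values stand in for real pixel data.
H, W, bands = 256, 256, 6
image = np.random.rand(H, W, bands)

X = image.reshape(-1, bands)                   # one row per pixel, one column per band
pca = PCA(n_components=3)
pcs = pca.fit_transform(X).reshape(H, W, 3)    # first 3 principal-component images

print(pca.explained_variance_ratio_)           # energy compacted into the leading components
```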
7
Why is PCT bad for classification purposes? PCT evaluates the contribution of individual features solely by their variance, which may differ from their discrimination power.
8
Why is PCT bad for classification purposes?
9
Separability problem
Dimensionality reduction methods for classification (here the two-class problem) must consider the discrimination power of the individual features. The goal is to maximize the "distance" between the classes.
10
An Example
3 classes, 3D feature space, reduction to 2D
(Figure: two projections, one with high discriminability and one with low discriminability)
11
DR via feature selection
Two things are needed:
- A discriminability measure (Mahalanobis distance, Bhattacharyya distance), e.g.
  MD_12 = (m_1 - m_2)(C_1 + C_2)^{-1}(m_1 - m_2)'
- A selection strategy
Feature selection is an optimization problem.
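A minimal sketch of the Mahalanobis measure above, assuming two sample matrices X1 and X2 with one row per sample (function and variable names are illustrative):

```python
import numpy as np

def mahalanobis_distance(X1, X2):
    """Two-class Mahalanobis discriminability measure
    MD_12 = (m_1 - m_2)(C_1 + C_2)^{-1}(m_1 - m_2)'.
    X1 and X2 hold the samples of the two classes, one row per sample."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # atleast_2d keeps the measure usable for a single-feature subset as well
    C1 = np.atleast_2d(np.cov(X1, rowvar=False))
    C2 = np.atleast_2d(np.cov(X2, rowvar=False))
    d = m1 - m2
    return float(d @ np.linalg.inv(C1 + C2) @ d)
```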
12
Feature selection strategies
Optimal:
- full search, complexity D! / ((D-n)! n!)
- branch & bound
Sub-optimal:
- direct selection (optimal if the features are not correlated)
- sequential selection (SFS, SBS)
- generalized sequential selection (SFS(k), Plus k minus m, floating search)
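The sequential forward selection (SFS) strategy from the list above can be sketched as a greedy loop. This sketch takes the discriminability measure as a parameter (e.g. the mahalanobis_distance function sketched earlier); all names are illustrative, not from the slides:

```python
def sequential_forward_selection(X1, X2, n, criterion):
    """Sub-optimal SFS strategy: greedily add the feature that maximizes
    the given two-class discriminability criterion on the selected subset.
    X1 and X2 hold the samples of the two classes, one row per sample."""
    D = X1.shape[1]
    selected = []
    while len(selected) < n:
        remaining = [f for f in range(D) if f not in selected]
        # score every candidate subset "selected + [f]" and keep the best feature
        best = max(remaining,
                   key=lambda f: criterion(X1[:, selected + [f]],
                                           X2[:, selected + [f]]))
        selected.append(best)
    return selected

# e.g.: selected = sequential_forward_selection(X1, X2, 5, mahalanobis_distance)
```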
13
A priori knowledge in feature selection The above discriminability measures (MD, BD) require normally distributed classes. They are misleading and inapplicable otherwise.
14
A priori knowledge in feature selection
The above discriminability measures (MD, BD) require normally distributed classes. They are misleading and inapplicable otherwise.
Crucial questions in practical applications:
- Can the class-conditional distributions be assumed to be normal?
- What happens if this assumption is wrong?
15
A two-class example
(Figure: Class 1 and Class 2 distributions; in one case x2 is selected, in the other x1 is selected)
16
Conclusion
- PCT is optimal for representation of "one-class" data (visualization, compression, etc.).
- PCT should not be used for classification purposes.
- Use feature selection methods based on a proper discriminability measure.
- If you still use PCT before classification, be aware of possible errors.