Unsupervised band selection for multispectral images using information theory J.M. Sotoca, F. Pla, A. C. Klaren Dpto. Lenguajes y Sistemas Informáticos. Universidad Jaume I. Spain
Multispectral images A multispectral image can be specified as a 3D space: two spatial coordinates plus a wavelength coordinate. Each band records a specific portion of the electromagnetic spectrum. Each band provides greater insight into the composition of the different image areas. Higher-resolution imagery makes features easier to distinguish, but also requires greater data storage.
Fruit quality assessment The spectral information is of major importance for performing visual tasks. Fruit quality assessment: defects of the skin; amount of certain quality components like acids and sugar; detection of chemical compounds; fruit ripeness.
Aims of unsupervised learning in multispectral images Goal: reduction of the feature space of the data sets without requiring labelled data. In most real-life situations, we do not have accurate knowledge about the discriminant regions or classes contained in certain image bands. There are different approaches to the problem of feature selection via unsupervised learning: Find a subset of features that best covers “natural” groupings (clusters) using clustering techniques: EM clustering, k-means, … Select the image bands that provide the highest amplitude discrimination (image contrast); the contrast can be evaluated assuming that each class is defined as a homogeneous region. Search for redundant information that explains high-order correlations.
Probability density distribution
Let us consider an ensemble of image bands A1,…,An, and let h(a1,…,an) count the co-occurrences of each joint grey-level event (a1,…,an) over the ensemble. We define the joint probability distribution of the different events as

P(a1,…,an) = h(a1,…,an) / (M·N)

where the normalizing factor M·N (M columns and N rows) is the number of pixels of the image. Then, the joint entropy H(A1,…,An) can be expressed as

H(A1,…,An) = − Σ P(a1,…,an) log P(a1,…,an)

where the sum runs over all joint grey-level events (a1,…,an).
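Under the definitions above, the joint probability is just the normalised joint grey-level histogram, and the joint entropy follows directly. A minimal sketch in Python (the NumPy representation and the uniform quantisation to a fixed number of grey levels are assumptions, not part of the original formulation):

```python
import numpy as np

def joint_entropy(bands, n_levels=16):
    """Joint entropy H(A1,...,An) of an ensemble of image bands.

    P(a1,...,an) = h(a1,...,an) / (M*N), where h counts co-occurrences
    of grey-level tuples and M*N is the number of pixels.
    """
    joint = np.zeros(np.asarray(bands[0]).size, dtype=np.int64)
    for band in bands:
        b = np.asarray(band, dtype=float).ravel()
        lo, hi = b.min(), b.max()
        # Quantise each band to n_levels codes so the joint histogram
        # stays tractable as more bands are added.
        if hi > lo:
            codes = np.minimum(((b - lo) / (hi - lo) * n_levels).astype(np.int64),
                               n_levels - 1)
        else:
            codes = np.zeros(b.size, dtype=np.int64)
        # Encode the tuple (a1,...,an) as a single joint symbol.
        joint = joint * n_levels + codes
    counts = np.bincount(joint)
    p = counts[counts > 0] / joint.size          # P(a1,...,an)
    return float(-np.sum(p * np.log2(p)))        # H = -sum P log P
```

Note that adding a band can only refine the joint histogram, so the joint entropy is non-decreasing in the number of bands.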
Mutual Information
Mutual information H(A:B) is a measure of the dependence between two random variables:

H(A:B) = H(A) + H(B) − H(A,B)

For the case of three images:

H(A:B:C) = H(A) + H(B) + H(C) − H(A,B) − H(A,C) − H(B,C) + H(A,B,C)

One general expression is defined as:

H(A1:…:An) = − Σ_{k=1}^{n} (−1)^k Σ H(Ai1,…,Aik)

where the inner sum over H(Ai1,…,Aik) runs over all possible combinations {i1,…,ik} ⊆ {1,…,n}. Notice that H(A1:…:An) is symmetric under any permutation of A1,…,An.
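For two bands, H(A:B) can be estimated directly from the 2-D joint grey-level histogram. A sketch in Python (the uniform quantisation and the helper names are assumptions):

```python
import numpy as np

def quantise(band, n_levels=16):
    """Uniformly quantise a band's grey levels to n_levels codes."""
    b = np.asarray(band, dtype=float).ravel()
    lo, hi = b.min(), b.max()
    if hi == lo:
        return np.zeros(b.size, dtype=np.int64)
    return np.minimum(((b - lo) / (hi - lo) * n_levels).astype(np.int64),
                      n_levels - 1)

def mutual_information(a, b, n_levels=16):
    """H(A:B) = H(A) + H(B) - H(A,B), estimated from the 2-D histogram."""
    ca, cb = quantise(a, n_levels), quantise(b, n_levels)
    p_ab = np.bincount(ca * n_levels + cb,
                       minlength=n_levels * n_levels) / ca.size
    p_ab = p_ab.reshape(n_levels, n_levels)
    p_a, p_b = p_ab.sum(axis=1), p_ab.sum(axis=0)   # marginals

    def H(p):
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    return H(p_a) + H(p_b) - H(p_ab.ravel())
```

H(A:B) vanishes when one band carries no information about the other and equals H(A) when B duplicates A.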
Connections of information in three images
Relationships among entropies can be visualised by entropy Venn diagrams. The ternary entropy diagram for A, B, C has the following entries: H(A|B,C), H(B|A,C), H(C|A,B) are the conditional entropies; H(A:B|C), H(B:C|A), H(A:C|B) are the conditional mutual informations; H(A:B:C) is the mutual information. For example, H(A:B|C) and H(A|B,C) are defined as:

H(A:B|C) = H(A,C) + H(B,C) − H(C) − H(A,B,C)
H(A|B,C) = H(A,B,C) − H(B,C)

Entropy Venn diagrams for spectral images with wavelengths 450, 540 and 580 nanometers.
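All seven diagram entries can be written in terms of joint entropies, and together they must tile H(A,B,C) exactly. A small numerical check (Python; the toy random bands standing in for the 450/540/580 nm images are assumptions):

```python
import numpy as np

def H(*bands, n_levels=8):
    """Joint entropy of any number of bands via a joint histogram."""
    joint = np.zeros(np.asarray(bands[0]).size, dtype=np.int64)
    for band in bands:
        b = np.asarray(band, dtype=float).ravel()
        lo, hi = b.min(), b.max()
        codes = (np.minimum(((b - lo) / (hi - lo) * n_levels).astype(np.int64),
                            n_levels - 1)
                 if hi > lo else np.zeros(b.size, dtype=np.int64))
        joint = joint * n_levels + codes
    p = np.bincount(joint)
    p = p[p > 0] / joint.size
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(1)
A = rng.integers(0, 8, (16, 16)).astype(float)
B = A + rng.integers(0, 4, (16, 16))          # correlated with A
C = rng.integers(0, 8, (16, 16)).astype(float)

# Conditional entropies, e.g. H(A|B,C) = H(A,B,C) - H(B,C):
hA_BC = H(A, B, C) - H(B, C)
hB_AC = H(A, B, C) - H(A, C)
hC_AB = H(A, B, C) - H(A, B)
# Conditional mutual informations, e.g. H(A:B|C):
iAB_C = H(A, C) + H(B, C) - H(C) - H(A, B, C)
iBC_A = H(A, B) + H(A, C) - H(A) - H(A, B, C)
iAC_B = H(A, B) + H(B, C) - H(B) - H(A, B, C)
# Mutual information of the three bands:
iABC = H(A) + H(B) + H(C) - H(A, B) - H(A, C) - H(B, C) + H(A, B, C)

# The seven regions tile the whole diagram:
total = hA_BC + hB_AC + hC_AB + iAB_C + iBC_A + iAC_B + iABC
assert abs(total - H(A, B, C)) < 1e-9
```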
Estimating discriminant information An open question is how to define the dependent information among a subset of image bands. In the band selection problem, we look for a subset of bands that: contains as much information as possible with respect to the whole multispectral image; represents information that is as discriminant as possible. In image registration, the aim is to maximize the dependent information in order to establish the correspondence between multimodal images. Band selection in multispectral images poses the opposite problem.
Estimating discriminant information One way could be to measure the mutual information among the image bands. Problems: The computational complexity depends on the estimation of the joint entropies. The number of joint entropies for the different combinations of feature subsets grows exponentially with the dimension. The relation between negative values of the mutual information and the conditional information for an ensemble of features is not clear.
Dependent Information Regions
A classical measure of the dependent information can be defined as the sum of the pairwise mutual informations:

Σ_{i<j} H(Ai:Aj)

The measure of dependence corresponds to the shaded area in the Venn diagram (the darker central area is counted three times).
Dependent Information Regions
A possible alternative is to measure the region of dependent information by subtracting the independent information from the total information contained in a set of image bands:

I(A1:…:An) = H(A1,…,An) − Σ_{k=1}^{n} H(Ai1 | Ai2,…,Ain)

where, in each term of the sum, Ai1 = Ak and Ai2,…,Ain are the complementary variables of Ai1. This criterion will hereafter be called the Minimization of the Dependent Information (MDI).
Sum of Conditional Entropies vs. Joint Entropy Behaviour of the sum of conditional entropies vs. the joint entropy with respect to the number of features. The features are added by minimizing the MDI criterion through a sequential forward selection (SFS) scheme.
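The MDI measure and the SFS loop above can be sketched as follows (Python; the quantisation level, the tie-breaking, and seeding the selection with the highest-entropy band are assumptions, not details given on the slides):

```python
import numpy as np

def joint_entropy(bands, n_levels=8):
    """Joint entropy H(A1,...,An) from the joint grey-level histogram."""
    joint = np.zeros(np.asarray(bands[0]).size, dtype=np.int64)
    for band in bands:
        b = np.asarray(band, dtype=float).ravel()
        lo, hi = b.min(), b.max()
        codes = (np.minimum(((b - lo) / (hi - lo) * n_levels).astype(np.int64),
                            n_levels - 1)
                 if hi > lo else np.zeros(b.size, dtype=np.int64))
        joint = joint * n_levels + codes
    p = np.bincount(joint)
    p = p[p > 0] / joint.size
    return float(-np.sum(p * np.log2(p)))

def dependent_information(bands, n_levels=8):
    """MDI measure: joint entropy minus the sum of the conditional
    entropies H(Ai | rest), i.e. total minus independent information.
    For two bands this reduces to the mutual information H(A:B)."""
    h_all = joint_entropy(bands, n_levels)
    rests = [bands[:i] + bands[i + 1:] for i in range(len(bands))]
    cond = [h_all - joint_entropy(r, n_levels) for r in rests]  # H(Ai|rest)
    return h_all - sum(cond)

def sfs_mdi(all_bands, n_select, n_levels=8):
    """Sequential forward selection: start from the band with the highest
    entropy, then greedily add the band whose inclusion keeps the
    dependent information of the selected subset lowest."""
    selected = [int(np.argmax([joint_entropy([b], n_levels)
                               for b in all_bands]))]
    while len(selected) < n_select:
        rest_idx = [i for i in range(len(all_bands)) if i not in selected]
        scores = [dependent_information([all_bands[j] for j in selected] +
                                        [all_bands[i]], n_levels)
                  for i in rest_idx]
        selected.append(rest_idx[int(np.argmin(scores))])
    return selected
```

A near-duplicate band adds little independent information, so its inclusion raises the dependent information and the greedy step avoids it.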
Hyperspectral data
(Figure: orange surface defect examples: scratch, trip, insect, overripe.)
The spectral range extended from 400 to 720 nanometers in the visible, with a spectral resolution of 10 nanometers for each band. Image database with nineteen multispectral orange images with four different types of defects on their surface. For the experimental comparison, a set of 21684 pixels from different regions of the oranges was labelled into five classes (one for the healthy orange skin and four for the different typologies of defects).
Supervised evaluation criterion ReliefF algorithm (Kononenko, 1994): searches for feature weights using an iterative approach, optimizing a criterion function. In each iteration, a sample x is randomly selected from the data set, and its nearest neighbour of the same class, phit, and of a different class, pmis, are calculated. ReliefF is an extension of the original Relief algorithm to multiclass data sets.
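The weight update described above can be sketched compactly (Python; the L1 distance, range normalisation, iteration count, and prior-weighted miss term are standard ReliefF choices, shown here as assumptions):

```python
import numpy as np

def relieff_weights(X, y, n_iter=200, seed=0):
    """ReliefF feature weights (single nearest hit/miss per class).

    Each iteration picks a random sample x, penalises features on which x
    differs from its nearest hit p_hit in the same class, and rewards
    features on which it differs from its nearest miss p_mis in every
    other class, weighted by the class priors.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    classes = np.unique(y)
    prior = {c: float(np.mean(y == c)) for c in classes}
    span = X.max(axis=0) - X.min(axis=0)        # normalise feature ranges
    span[span == 0] = 1.0
    for _ in range(n_iter):
        i = rng.integers(n)
        x, c = X[i], y[i]
        dist = np.abs((X - x) / span).sum(axis=1)
        dist[i] = np.inf                         # exclude x itself
        hit = np.flatnonzero(y == c)
        p_hit = X[hit[np.argmin(dist[hit])]]
        w -= np.abs(x - p_hit) / span / n_iter
        for c2 in classes:
            if c2 == c:
                continue
            miss = np.flatnonzero(y == c2)
            p_mis = X[miss[np.argmin(dist[miss])]]
            w += (prior[c2] / (1.0 - prior[c])) * np.abs(x - p_mis) / span / n_iter
    return w
```

Discriminant features end up with large positive weights because their nearest-miss differences dominate their nearest-hit differences.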
Performance comparison between MDI and ReliefF Accuracy of the Nearest Neighbour rule (NN) using the selected bands obtained with ReliefF and MDI.
New experiments between MDI and ReliefF New labelled data set with 135540 pixels (94875 pixels for training and 40665 pixels for test). Accuracy of the Nearest Neighbour rule (NN) using the selected bands obtained with ReliefF, MDI[training], MDI[1 orange] and MDI[9 oranges].
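The accuracies on this and the previous slide come from the 1-NN rule restricted to the selected bands. A minimal evaluation sketch (Python; the function and variable names are illustrative):

```python
import numpy as np

def nn_accuracy(X_train, y_train, X_test, y_test, band_idx):
    """Accuracy of the 1-NN rule using only the selected bands (columns)."""
    A = np.asarray(X_train, dtype=float)[:, band_idx]
    B = np.asarray(X_test, dtype=float)[:, band_idx]
    y_train = np.asarray(y_train)
    correct = 0
    for x, t in zip(B, np.asarray(y_test)):
        # Nearest training pixel in the selected-band feature space.
        j = int(np.argmin(((A - x) ** 2).sum(axis=1)))
        correct += int(y_train[j] == t)
    return correct / len(B)
```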
Image pixel labelling using the NN rule
(Figure panels: 1 band, 4 bands, 12 bands, 33 bands.)
Labelling from the classification of a multispectral image with the NN rule for different numbers of image bands chosen with MDI[9 oranges].
Concluding Remarks Different properties can be measured from multispectral images to learn about their relationships and information content. This work proposes the MDI criterion to estimate the dependent information among image bands. The criterion looks for sets of spectral bands with minimum interdependence and high information content. The principal advantage of this technique is its unsupervised nature, while providing good performance with respect to classical supervised filter feature selection approaches such as ReliefF.