Slide 1 of 32 - Feature extraction: a practical approach
March 2006, Alon Slapak
Outline: Feature selection, Data synthesis, Separability, Feature extraction guidelines, Cluster, Example, Bibliography
Slide 2 of 32 - Geometric structure of a cluster
[Figure: a cluster of red circles ("Females") in the height-weight plane, enclosed by an ellipse; a sample point is marked at (152, 51.5) and the semi-axes are labeled √λ1 and √λ2.]
Each of the red circles represents an object pattern. The coordinates of the circle are the features (e.g. height and weight).
The ellipse enclosing the major part of the cluster represents the distribution of the patterns.
The center of the ellipse is the mean of the pattern distribution.
The lengths of the principal axes of the ellipse are proportional to twice the square roots of the eigenvalues of the covariance matrix of the pattern distribution (λ1, λ2).
The principal axes of the ellipse coincide with the eigenvectors of the covariance matrix of the pattern distribution.
Slide 3 of 32 - Typical dimension: the mean
The center of the ellipse is the mean of the pattern distribution. It can be estimated by
μ = (1/N) Σ_{k=1}^{N} x_k
where x_k is the k-th pattern in the cluster and N is the number of patterns in the cluster.
The mean is sometimes referred to as the centroid of the cluster.
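As a minimal MATLAB sketch (the matrix X below is a toy example, not data from the slides), the centroid of a cluster stored one pattern per column can be estimated with:

X  = [170 165 180; 75 70 85];   % toy cluster: three 2-D patterns, one per column
mu = mean(X, 2);                % centroid: average over the pattern index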
Slide 4 of 32 - Typical dimension: the scatter matrix S
The scatter matrix S of a cluster is defined as
S = Σ_{k=1}^{N} (x_k − μ)(x_k − μ)^T
where x_k is the k-th pattern in the cluster, N is the number of patterns in the cluster, and μ is the mean of all the patterns in the cluster.
Divided by N, the scatter matrix is the biased (maximum-likelihood) estimate of the covariance matrix of the cluster.
[Figure: two clusters in the height-weight plane, one with larger scatter and one with smaller scatter.]
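Continuing the toy sketch above (same matrix X, patterns in columns), the scatter matrix can be computed as:

Xc = X - repmat(mean(X,2), 1, size(X,2));  % subtract the centroid from every pattern
S  = Xc * Xc';                             % S = sum_k (x_k - mu)(x_k - mu)'
% S/N, with N = size(X,2), is the biased covariance estimate mentioned above.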
Slide 5 of 32 - Data synthesis
To test and debug pattern recognition algorithms, it is customary to use synthesized data. The synthesized data may be drawn from an arbitrary distribution, but most of the literature assumes a normal distribution. Still, it is not infrequent to come across applications involving other pattern distributions.
Slide 6 of 32 - Example: synthesizing two clusters

clear all
N1 = 150; N2 = 150;                    % patterns per cluster
E1 = [150 15; 120 20];                 % covariance of cluster 1 (values as on the slide)
E2 = [100 10; 70 30];                  % covariance of cluster 2
M1 = [170, 75]';  M2 = [160, 50]';     % cluster means
[P1,A1] = eig(E1);                     % eigenvectors P, eigenvalues A
[P2,A2] = eig(E2);
y1 = randn(2,N1);  y2 = randn(2,N2);   % unit-variance normal samples
for i = 1:N1, x1(:,i) = P1*sqrt(A1)*y1(:,i) + M1; end;
for i = 1:N2, x2(:,i) = P2*sqrt(A2)*y2(:,i) + M2; end;
figure;
plot(x1(1,:), x1(2,:), '.', x2(1,:), x2(2,:), 'or');
axis([120 220 30 120]);
xlabel('height [cm]'); ylabel('weight [kg]');

The transform x = P √A y + M shapes unit-variance noise y so that cov(x) = P A P^T, i.e. the desired covariance. (Strictly, a covariance matrix must be symmetric; E1 and E2 are kept exactly as they appear on the original slide.)
Slide 7 of 32 - Exercise
Try to synthesize two classes sharing a common centroid at (3, 5). One class is aligned with the horizontal axis and is ~40 units long and ~8 units wide. The other class is inclined 45° from the horizontal and is ~50 units long and ~4 units wide.
[Figure: the two target clusters plotted on axes running from -50 to 50.]
Hint: use E = PAP⁻¹ to build the covariance matrix.
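One possible solution sketch, assuming "length" and "width" are read as roughly four standard deviations (±2σ), and reusing the synthesis scheme of the previous slide:

N  = 200;
M  = [3 5]';                              % common centroid
E1 = diag([(40/4)^2 (8/4)^2]);            % horizontal class: diagonal covariance
th = pi/4;                                % 45 degrees
P  = [cos(th) -sin(th); sin(th) cos(th)]; % rotation matrix (orthogonal, so P' = P^-1)
A  = diag([(50/4)^2 (4/4)^2]);
E2 = P*A*P';                              % E = P*A*P^-1, the hinted construction
[P1,A1] = eig(E1); x1 = P1*sqrt(A1)*randn(2,N) + repmat(M,1,N);
[P2,A2] = eig(E2); x2 = P2*sqrt(A2)*randn(2,N) + repmat(M,1,N);
figure; plot(x1(1,:),x1(2,:),'.', x2(1,:),x2(2,:),'or'); axis equal;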
Slide 8 of 32 - Separability
Q: How does the feature extraction method affect pattern recognition performance?
[Figure: the two height-weight clusters from the previous slides.]
Slide 9 of 32 - Separability
A: More than the classification algorithm does. If the patterns produced by the feature extraction method form non-separable clusters, no classification algorithm can do the job.
[Diagram: feature extraction method → classification algorithm]
Slide 10 of 32 - Separability
In fact, the separability achieved by the feature extraction method sets the upper limit on pattern recognition performance.
Slide 11 of 32 - Separability
Q: How can one assess the separability achieved by the feature extraction method?
[Figure: three example cluster configurations, labeled "Good?", "Bad?" and "Hopeless?"]
Slide 12 of 32 - Separability
A: Several separability criteria exist; most involve scatter matrices.
[Figure: two clusters in the height-weight plane, one with larger scatter and one with smaller scatter.]
Slide 13 of 32 - Separability
Define the within-class scatter matrix as
S_W = Σ_{i=1}^{C} P_i · (1/N_i) Σ_{k=1}^{N_i} (x_k^i − μ_i)(x_k^i − μ_i)^T
where x_k^i is the k-th pattern in the i-th cluster, N_i is the number of patterns in the i-th cluster, μ_i is the mean of all the patterns in the i-th cluster, C is the number of clusters, and P_i is the a priori probability of the i-th cluster, which may be estimated by P_i ≈ N_i / (N_1 + … + N_C).
Define the between-class scatter matrix as
S_B = Σ_{i=1}^{C} P_i (μ_i − μ)(μ_i − μ)^T
where μ is the mean of all the patterns in all the clusters,
and the total-class scatter matrix as
S_T = S_W + S_B.
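A hedged MATLAB sketch of these definitions (the cell-array layout and variable names are my own choices; it reuses the two synthesized clusters x1 and x2 from the earlier example):

X = {x1, x2};                             % one cell per cluster, patterns in columns
C = numel(X); d = size(X{1},1);
Ntot = 0; for i = 1:C, Ntot = Ntot + size(X{i},2); end
Mu = zeros(d,1);                          % overall mean of all patterns
for i = 1:C, Mu = Mu + sum(X{i},2); end
Mu = Mu / Ntot;
Sw = zeros(d); Sb = zeros(d);
for i = 1:C
    Ni  = size(X{i},2);
    Pi  = Ni/Ntot;                        % a priori probability estimate
    Mui = mean(X{i},2);
    Xc  = X{i} - repmat(Mui,1,Ni);
    Sw  = Sw + Pi*(Xc*Xc')/Ni;            % within-class scatter
    Sb  = Sb + Pi*(Mui-Mu)*(Mui-Mu)';     % between-class scatter
end
St = Sw + Sb;                             % total-class scatter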
Slide 14 of 32 - The separability function J
The literature offers several functions that represent the separability of clusters by a scalar, for example
J_1 = tr(S_W⁻¹ S_B),  J_2 = ln(|S_T| / |S_W|),  J_3 = tr(S_B) / tr(S_W)
(the exact criteria shown on the original slide are not recoverable; these are standard examples). Pay attention to the fact that J is a scalar. This is of utmost importance, because a scalar function is needed for the optimization that comes later (we will, of course, try to maximize J).
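For instance, the first criterion above can be computed from the sketch on the previous slide in one line:

J = trace(Sw \ Sb);   % J1 = tr(Sw^-1 * Sb); larger J means better separability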
Slide 15 of 32 - Feature extraction guidelines
Q: What counts as a reasonable feature?
Example: candidate features of a student (for gender classification): number of eyes, hair color, wearing glasses or not, hair length, shoe size, height, weight.
The slide rates them: some are useless because they carry no information on gender (number of eyes) or correlate very poorly with it; some are effective but hard to measure; height and weight are effective and simple.
Slide 16 of 32 - Feature extraction guidelines
A reasonable feature has:
- correlation with the classification feature,
- ease of measurement,
- a contribution that maximizes the separability function.
"When we have two or more classes, feature extraction consists of choosing those features which are most effective for preserving class separability" (Fukunaga, p. 441).
Slide 17 of 32 - Feature extraction guidelines
Q: How can one determine which features follow these guidelines?
A1: There is an extensive literature on features for specific applications (e.g. symmetry for face recognition).
A2: A widespread approach is to emulate the human mechanism (e.g. treating all the pixels as features in face recognition).
Slide 18 of 32 - Example: handwriting recognition
[Figure: 20 examples of handwritten digits.]
Since almost everyone can recognize each of the digits, it is reasonable to assume that treating all the pixels as features will assure a successful pattern recognition process.
Slide 19 of 32 - Feature selection
For 28×28-pixel handwritten digits, the feature space dimension is 784.
For face recognition, we may have about 200×200 pixels, i.e. a feature space dimension of 40,000.
For voice recognition, we may have about 1 s × 16 kHz, i.e. a feature space dimension of 16,000.
Slide 20 of 32 - Feature selection
Feature selection is needed because of:
- the complexity of most classification algorithms, which is O(n²),
- learning time,
- the "curse of dimensionality".
Slide 21 of 32 - Feature selection
"Feature selection, also known as subset selection or variable selection, is a process commonly used in machine learning, wherein a subset of the features available from the data are selected for application of a learning algorithm. Feature selection is necessary either because it is computationally infeasible to use all available features, or because of problems of estimation when limited data samples (but a large number of features) are present. The latter problem is related to the so-called curse of dimensionality." (Wikipedia)
[Diagram: object → feature extraction → high-dimension pattern → feature selection → low-dimension pattern]
Slide 22 of 32 - Feature selection
Q: How can one identify the irrelevant and the redundant features?
A: Several options:
1. Heuristics. In most applications the relevant information resides in the low frequency range (e.g. images), so it is logical to reduce dimensionality by keeping the first coefficients of the Fourier/DCT transform, as sketched below.
2. Optimization approach (KLT, FLD). We may select a subset of the features (or linear combinations of the features) that best contributes to the separability of the clusters.
3. Grouping. By grouping features, one can represent every set of features by a small set of features.
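A minimal sketch of heuristic 1 (the variable img is an assumed 28×28 grayscale array; the full example on the later slides does exactly this):

D = 50;                              % number of low-frequency coefficients to keep
x = reshape(double(img), 28*28, 1);  % flatten the image into a 784x1 pattern
X = dct(x);                          % 1-D DCT of the flattened image
pattern = X(1:D);                    % reduced pattern: first 50 DCT coefficients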
Slide 23 of 32 - Feature selection: example
The separability of these two classes is pretty good. But do we really need two features?
[Figure: male and female clusters in the height-weight plane.]
Slide 24 of 32 - Feature selection: example
No! The same separability can be achieved by projecting the patterns onto the blue axis, i.e. only a one-dimensional feature space is needed.
Slide 25 of 32 - Feature selection
But how can we find this "blue axis", the feature subspace onto which we should project the patterns?
Please refer to Lec7.pdf (PCA and the curse of dimensionality) and Lec8.pdf (Fisher linear discriminant) of Prof. Olga Veksler's course:
http://www.csd.uwo.ca/faculty/olga/Courses//CS434a_541a//index.html
Slide 26 of 32 - Example: handwriting recognition
The following example applies FLD to handwriting recognition. The characters are taken from the MNIST database (http://yann.lecun.com/exdb/mnist/). The example demonstrates the separability of two classes while reducing the dimensionality from 784 (28×28 pixels) to 2.
Slide 27 of 32 - Example: handwriting recognition

% The img0 and img1 files contain N 28x28 arrays holding grayscale
% images of the handwritten characters.
clear all;
load img0
load img1
N1 = length(img0);
N2 = length(img1);
D  = 50;   % low-pass filtering: number of DCT coefficients kept
Dd = 2;    % desired pattern dimension
%-----------------------------------------------------
% Test set synthesis
%-----------------------------------------------------
for i = 1:N1,
    x = reshape(squeeze(img0(i,:,:)), 28*28, 1); % 28x28 array -> 784x1 pattern
    X = dct(x);                                  % DCT transform
    l1(:,i) = X(1:D);                            % keep the first D coefficients
end;
for i = 1:N2,
    x = reshape(squeeze(img1(i,:,:)), 28*28, 1);
    X = dct(x);
    l2(:,i) = X(1:D);
end;

A DCT transform is recommended for mostly black-and-white images: the large number of identical values (0 and 255) damages the FLD computation, and the DCT alleviates this phenomenon. Taking only the first 50 DCT coefficients decreases dimensionality (most of the energy resides in the low frequency range), but it is not essential for this example.
Slide 28 of 32 - Example: handwriting recognition

%--------------------------------------------------------------------------
% Compute Sb and Sw
%--------------------------------------------------------------------------
Mu1 = mean(l1')';                  % mean of cluster 1
Mu2 = mean(l2')';                  % mean of cluster 2
Mu  = (Mu1*N1 + Mu2*N2)/(N1+N2);   % total mean of all patterns
Sw = zeros(D);                     % within-class scatter matrix
for i = 1:N1, Sw = Sw + (l1(:,i)-Mu1)*(l1(:,i)-Mu1)'; end;
for i = 1:N2, Sw = Sw + (l2(:,i)-Mu2)*(l2(:,i)-Mu2)'; end;
Sb = N1*(Mu1-Mu)*(Mu1-Mu)' + N2*(Mu2-Mu)*(Mu2-Mu)';  % between-class scatter matrix
Slide 29 of 32 - Example: handwriting recognition

%--------------------------------------------------------------------------
% Compute V
% [W,D] = EIG(A,B) produces a diagonal matrix D of generalized
% eigenvalues and a full matrix W whose columns are the
% corresponding eigenvectors, so that A*W = B*W*D.
%--------------------------------------------------------------------------
[W,D] = eig(Sb,Sw);       % solve the generalized eigenvalue problem
Lambda = diag(D);
[Lam,p] = sort(Lambda);   % sort the eigenvalues (ascending)
V = [];                   % build V from the eigenvectors corresponding
for i = 1:Dd,             % to the biggest eigenvalues
    V = [V W(:,p(end+1-i))];
end;

The projection matrix V is constructed so that its columns are the eigenvectors corresponding to the largest eigenvalues.
Slide 30 of 32 - Example: handwriting recognition

%--------------------------------------------------------------------------
% Project the initial patterns onto the reduced space
%--------------------------------------------------------------------------
for i = 1:N1, r1(:,i) = V' * l1(:,i); end;
for i = 1:N2, r2(:,i) = V' * l2(:,i); end;
figure;
plot(r1(1,:), r1(2,:), '.', r2(1,:), r2(2,:), 'or');   % plot the 2-D clusters
xlabel('feature1'); ylabel('feature2');
Slide 31 of 32 - Example: handwriting recognition
[Figure: the two clusters after projection onto the 2-D reduced space.]
It is easy to see that the separability of the clusters is perfect even in one dimension, which is not surprising because there are only two clusters.
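As a quick check of the one-dimension claim, one can project onto the first FLD axis only (a sketch reusing the variables of the preceding slides):

r1_1d = V(:,1)' * l1;   % scalar feature for every pattern of class 0
r2_1d = V(:,1)' * l2;   % scalar feature for every pattern of class 1
figure;
plot(r1_1d, zeros(1,N1), '.', r2_1d, zeros(1,N2), 'or');  % 1-D separability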
Slide 32 of 32 - Bibliography
Prof. Olga Veksler's pattern recognition course: http://www.csd.uwo.ca/faculty/olga/Courses//CS434a_541a//index.html
Handwriting database (MNIST): http://yann.lecun.com/exdb/mnist/