1 Modification of Correlation Kernels in SVM, KPCA and KCCA in Texture Classification
Yo Horikawa
Kagawa University, Japan
2
・ Support vector machine (SVM)
・ Kernel principal component analysis (kPCA)
・ Kernel canonical correlation analysis (kCCA)
with modified versions of correlation kernels → invariant texture classification.
We compare the performance of the modified correlation kernels across the three kernel methods.
3 Support vector machine (SVM)
Sample data: $x_i$ ($1 \le i \le n$), belonging to class $c_i \in \{-1, 1\}$.
SVM learns a discriminant function for test data $x$:
$d(x) = \mathrm{sgn}\left(\sum_{i=1}^{n'} \alpha_i c_i k(x, x_{si}) + b\right)$
$\alpha_i$ and $b$ are obtained by solving a quadratic programming problem.
Kernel function: the inner product of nonlinear maps $\varphi(x)$: $k(x_i, x_j) = \varphi(x_i) \cdot \varphi(x_j)$.
Support vectors $x_{si}$ ($1 \le i \le n' \le n$): a subset of the sample data.
Feature extraction is done implicitly in SVM through the kernel function and the support vectors, as in the sketch below.
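As a concrete illustration, here is a minimal sketch of an SVM with a user-supplied kernel via a precomputed Gram matrix. scikit-learn and all names here are assumptions for illustration, not part of the slides; only the soft-margin value C = 100 follows the experiment slide.

```python
# Minimal sketch: SVM with a precomputed custom kernel (scikit-learn assumed).
# `corr_kernel` is a placeholder slot for the correlation kernels defined later.
import numpy as np
from sklearn.svm import SVC

def gram_matrix(A, B, kernel):
    """Gram matrix: evaluate kernel(a, b) for every pair of rows of A and B."""
    return np.array([[kernel(a, b) for b in B] for a in A])

def corr_kernel(a, b):
    return float(np.dot(a, b))   # stand-in: linear kernel k(x_i, x_j) = x_i . x_j

rng = np.random.default_rng(0)
X_train = rng.standard_normal((20, 100))       # toy sample data x_i
y_train = np.where(np.arange(20) < 10, -1, 1)  # classes c_i in {-1, 1}
X_test = rng.standard_normal((5, 100))

svm = SVC(kernel="precomputed", C=100)         # soft margin C = 100 as on the slides
svm.fit(gram_matrix(X_train, X_train, corr_kernel), y_train)
pred = svm.predict(gram_matrix(X_test, X_train, corr_kernel))
```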
4 Kernel principal component analysis (kPCA)
Principal components for the nonlinear map $\varphi(x_i)$ are obtained through the eigenproblem:
$\Phi v = \lambda v$ ($\Phi$: kernel matrix, $\Phi_{ij} = \varphi(x_i) \cdot \varphi(x_j) = k(x_i, x_j)$)
Let $v_r = (v_{r1}, \ldots, v_{rn})^T$ ($1 \le r \le R \le n$) be the eigenvectors in non-increasing order of the corresponding non-zero eigenvalues $\lambda_r$, normalized so that $\lambda_r v_r^T v_r = 1$.
The $r$th principal component $u_r$ of a new data point $x$ is
$u_r = \sum_{i=1}^{n} v_{ri}\, \varphi(x_i) \cdot \varphi(x) = \sum_{i=1}^{n} v_{ri}\, k(x_i, x)$
Classification methods, e.g. the nearest-neighbor method, can then be applied in the principal component space $(u_1, \ldots, u_R)$.
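A minimal numerical sketch of this projection, assuming NumPy; kernel-matrix centering is omitted here, as it is on the slide, and the function names are illustrative.

```python
# Minimal sketch of kPCA with a precomputed kernel matrix K (no centering).
import numpy as np

def kpca_fit(K, R):
    """Eigendecompose the n x n kernel matrix K and return the R leading
    eigenvectors, scaled so that lambda_r * v_r^T v_r = 1."""
    lam, V = np.linalg.eigh(K)               # ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]           # non-increasing order
    keep = lam > 1e-10                       # non-zero eigenvalues only
    lam, V = lam[keep][:R], V[:, keep][:, :R]
    return V / np.sqrt(lam)                  # columns rescaled to lambda v^T v = 1

def kpca_project(k_new, V):
    """u_r = sum_i v_ri k(x_i, x); k_new holds the entries k(x_i, x)."""
    return k_new @ V
```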
5 Kernel canonical correlation analysis (kCCA)
Pairs of feature vectors of sample objects: $(x_i, y_i)$ ($1 \le i \le n$)
kCCA finds projections (canonical variates) $(u, v)$ that yield the maximum correlation between $\varphi(x)$ and $\theta(y)$:
$(u, v) = (w_\varphi \cdot \varphi(x),\ w_\theta \cdot \theta(y))$, $\quad w_\varphi = \sum_{i=1}^{n} f_i \varphi(x_i)$, $\; w_\theta = \sum_{i=1}^{n} g_i \theta(y_i)$
where $f = (f_1, \ldots, f_n)^T$ and $g = (g_1, \ldots, g_n)^T$ are the eigenvectors of the regularized generalized eigenvalue problem
$\begin{pmatrix} O & \Phi\Theta \\ \Theta\Phi & O \end{pmatrix} \begin{pmatrix} f \\ g \end{pmatrix} = \lambda \begin{pmatrix} (\Phi + \gamma_x I)^2 & O \\ O & (\Theta + \gamma_y I)^2 \end{pmatrix} \begin{pmatrix} f \\ g \end{pmatrix}$
with $\Phi_{ij} = \varphi(x_i) \cdot \varphi(x_j)$, $\Theta_{ij} = \theta(y_i) \cdot \theta(y_j)$, and $I$ the $n \times n$ identity matrix ($\gamma_x, \gamma_y$: regularization constants).
6 Application of kCCA to classification problems
Use an indicator vector as the second feature vector $y = (y_1, \ldots, y_{n_c})$ corresponding to $x$:
$y_c = 1$ if $x$ belongs to class $c$, $y_c = 0$ otherwise ($n_c$: the number of classes).
The mapping $\theta$ of $y$ is not used (i.e., $\theta$ is the identity).
A total of $n_c - 1$ eigenvectors $f_r = (f_{r1}, \ldots, f_{rn})$ ($1 \le r \le n_c - 1$) corresponding to non-zero eigenvalues are obtained.
The canonical variates $u_r$ ($1 \le r \le n_c - 1$) for a new object $(x, ?)$ are calculated by
$u_r = \sum_{i=1}^{n} f_{ri}\, \varphi(x_i) \cdot \varphi(x) = \sum_{i=1}^{n} f_{ri}\, k(x_i, x)$
Classification methods can then be applied in the canonical variate space $(u_1, \ldots, u_{n_c-1})$, as sketched below.
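A sketch of kCCA used as a classifier, solving the block generalized eigenproblem of the previous slide with class-indicator vectors; NumPy/SciPy assumed. The regularization value γ = 0.1 follows the experiment slide, and the remaining names are illustrative.

```python
# Sketch of regularized kCCA for classification. K is the kernel matrix on x;
# Y holds one-hot class-indicator rows (theta = identity, so Theta = Y Y^T).
import numpy as np
from scipy.linalg import eigh

def kcca_fit(K, Y, gamma=0.1):
    n, n_c = K.shape[0], Y.shape[1]
    Theta = Y @ Y.T                          # linear kernel on indicator vectors
    I, Z = np.eye(n), np.zeros((n, n))
    A = np.block([[Z, K @ Theta], [Theta @ K, Z]])
    B = np.block([[(K + gamma * I) @ (K + gamma * I), Z],
                  [Z, (Theta + gamma * I) @ (Theta + gamma * I)]])
    lam, W = eigh(A, B)                      # ascending generalized eigenvalues
    return W[:n, ::-1][:, :n_c - 1]          # f-part of the top n_c - 1 pairs

def kcca_project(k_new, F):
    """Canonical variates u_r = sum_i f_ri k(x_i, x) for new data."""
    return k_new @ F

# Usage: nearest-neighbor classification in the canonical variate space,
# with U_train = kcca_project(K, F) and U_test = kcca_project(K_test_train, F).
```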
7 Correlation kernel
The $k$th-order autocorrelation of data $x_i(t)$:
$r_{x_i}(t_1, t_2, \ldots, t_{k-1}) = \int x_i(t)\, x_i(t+t_1) \cdots x_i(t+t_{k-1})\, dt$
The inner product of $r_{x_i}$ and $r_{x_j}$ is calculated with the $k$th power of the (2nd-order) cross-correlation function:
$r_{x_i} \cdot r_{x_j} = \int \{cc_{x_i,x_j}(t_1)\}^k\, dt_1, \qquad cc_{x_i,x_j}(t_1) = \int x_i(t)\, x_j(t+t_1)\, dt$
Explicit computation of the autocorrelations is thus avoided, so high-order autocorrelations become tractable at practical computational cost.
・ Linear correlation kernel: $K(x_i, x_j) = r_{x_i} \cdot r_{x_j}$
・ Gaussian correlation kernel: $K(x_i, x_j) = \exp(-\mu \|r_{x_i} - r_{x_j}\|^2) = \exp(-\mu (r_{x_i} \cdot r_{x_i} + r_{x_j} \cdot r_{x_j} - 2\, r_{x_i} \cdot r_{x_j}))$
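A quick numeric check of the inner-product identity for $k = 2$, using circular lags (an assumption that makes the discrete identity exact); toy 1-D signals, NumPy assumed.

```python
# Verify: <r_xi, r_xj> (explicit 2nd-order autocorrelations) equals the sum of
# squared cross-correlations, the shortcut the correlation kernel relies on.
import numpy as np

rng = np.random.default_rng(0)
T = 64
xi, xj = rng.standard_normal(T), rng.standard_normal(T)

def autocorr2(x):
    """Explicit 2nd-order autocorrelation r_x(t1) with circular shifts."""
    return np.array([np.dot(x, np.roll(x, -t1)) for t1 in range(T)])

def crosscorr(a, b):
    """Circular cross-correlation cc_{a,b}(t1)."""
    return np.array([np.dot(a, np.roll(b, -t1)) for t1 in range(T)])

lhs = np.dot(autocorr2(xi), autocorr2(xj))    # explicit inner product
rhs = np.sum(crosscorr(xi, xj) ** 2)          # k-th power shortcut (k = 2)
assert np.allclose(lhs, rhs)
```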
8 Calculation of correlation kernels $r_{x_i} \cdot r_{x_j}$ for 2-dimensional image data $x(l, m)$ ($1 \le l \le L$, $1 \le m \le M$):
・ Calculate the cross-correlations between $x_i(l, m)$ and $x_j(l, m)$:
$cc_{x_i,x_j}(l_1, m_1) = \sum_{l=1}^{L-l_1} \sum_{m=1}^{M-m_1} x_i(l, m)\, x_j(l+l_1, m+m_1)\,/\,(LM)$ $\quad (0 \le l_1 \le L_1-1,\ 0 \le m_1 \le M_1-1)$
・ Sum the $k$th power of the cross-correlations over the lags:
$r_{x_i} \cdot r_{x_j} = \sum_{l_1=0}^{L_1-1} \sum_{m_1=0}^{M_1-1} \{cc_{x_i,x_j}(l_1, m_1)\}^k\,/\,(L_1 M_1)$
[Diagram: an $L_1 \times M_1$ window of lags; the cross-correlation of $x_i(l, m)$ and $x_j(l+l_1, m+m_1)$ is raised to the $k$th power and summed over the lags.]
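A direct, loop-based sketch of this computation, assuming NumPy and lags starting at 0; an FFT-based correlation would be faster but less transparent.

```python
# 2-D correlation kernel: truncated-lag cross-correlation, k-th power, average.
import numpy as np

def cross_corr_2d(xi, xj, L1=10, M1=10):
    """cc(l1, m1) = sum_{l,m} xi[l, m] * xj[l+l1, m+m1] / (L*M)."""
    L, M = xi.shape
    cc = np.empty((L1, M1))
    for l1 in range(L1):
        for m1 in range(M1):
            cc[l1, m1] = np.sum(xi[:L - l1, :M - m1] *
                                xj[l1:, m1:]) / (L * M)
    return cc

def correlation_kernel(xi, xj, k=3, L1=10, M1=10):
    """r_xi . r_xj = mean over the lags of cc(l1, m1)^k."""
    cc = cross_corr_2d(xi, xj, L1, M1)
    return np.sum(cc ** k) / (L1 * M1)
```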
9 Problem of correlation kernels
As the order $k$ of the correlation kernel increases, generalization ability and robustness are lost:
$r_{x_i} \cdot r_{x_j} = \sum_{t_1} \{cc_{x_i,x_j}(t_1)\}^k \to \delta_{ij} \quad (k \to \infty)$
For test data $x$ ($\ne x_i$), $r_{x_i} \cdot r_x \approx 0$.
In kCCA, $\Phi = I$ and $\Theta$ is a block matrix, so the eigenvectors take the form $f = (p_1, \ldots, p_1, p_2, \ldots, p_2, \ldots, p_C, \ldots, p_C)$ ($f_i = p_c$ if $x_i \in$ class $c$).
For sample data, the canonical variates lie on a line through the origin corresponding to their class: $u_{x_i} = (r_{x_i} \cdot r_{x_i})\, p_c$ ($p_c = (p_{c,1}, \ldots, p_{c,C-1})$) if $x_i \in$ class $c$.
For test data: $u_x \approx 0$. The toy run below illustrates the degeneration.
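A toy run illustrating the degeneration: with the Gram matrix normalized to unit diagonal, the off-diagonal entries shrink toward 0 as $k$ grows, i.e., the matrix approaches the identity. All data and constants here are illustrative.

```python
# The correlation-kernel Gram matrix drifts toward the identity as k grows.
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 200))             # four toy 1-D "textures"

def corr_gram(X, k):
    n = len(X)
    G = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            cc = np.correlate(X[i], X[j], mode="full") / len(X[i])
            G[i, j] = np.sum(cc ** k)
    d = np.diag(G)
    return G / np.sqrt(np.outer(d, d))        # normalize the diagonal to 1

for k in (2, 4, 10):
    off = corr_gram(X, k) - np.eye(4)
    print(k, np.abs(off).max())               # off-diagonal mass shrinks with k
```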
10 Fig. A. Scatter diagrams of the canonical variates $(u_1, u_2)$ and $(u_3, u_1)$ for the Test 1 data of texture images from the Brodatz album in kCCA. Plotted are squares (■) for D4, crosses (×) for D84, circles (●) for D5 and triangles (Δ) for D92. (a) linear kernel (i); (b) Gaussian kernel (ii); (c) 2nd-order correlation kernel (iii); (d) 3rd-order correlation kernel (iii); (e) 4th-order correlation kernel (iii); (f) 10th-order correlation kernel (iii). Most of the test data fall at $u \approx 0$.
11 Modification of correlation kernels
・ The $k$th root of the $k$th-order correlation kernel in the limit $k \to \infty$ is related to the max norm, i.e., the $L_p$ norm $\|x\|_p = \{\sum_i |x_i|^p\}^{1/p}$ in the limit $p \to \infty$ (written out below). The max norm corresponds to the peak response of a matched filter, which maximizes the SNR, and is therefore expected to be robust. The correlation kernel can thus be modified by taking its $k$th root, with its sign taken into account.
・ A difference between the even- and odd-order correlations is that the odd-order autocorrelations are blind to sinusoidal signals and to random signals with symmetric distributions. This is because a sign change of the original data ($x \to -x$) changes the signs of the odd-order autocorrelations but not of the even-order ones. In the correlation kernel this appears as the parity of the power of the cross-correlations, so the absolute values of the cross-correlations can be used instead.
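The limit behind the first bullet, written out (a standard fact about $L_p$ norms, applied here to the lag-wise cross-correlations):

```latex
% The L_p -> max-norm limit that motivates the kth-root modification:
\[
  \lim_{k \to \infty}
  \Bigl( \sum_{l_1, m_1} \bigl| cc_{x_i, x_j}(l_1, m_1) \bigr|^{k} \Bigr)^{1/k}
  = \max_{l_1, m_1} \bigl| cc_{x_i, x_j}(l_1, m_1) \bigr|,
\]
% i.e., the kth root of the (absolute) kth-order correlation kernel tends to
% the peak of the cross-correlation -- the matched-filter response.
```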
12 Proposed modified correlation kernels (sketched below)
・ $L_p$ norm kernel (P): $\mathrm{sgn}\!\left(\sum_{l_1,m_1} \{cc_{x_i,x_j}(l_1, m_1)\}^k\right) \left|\sum_{l_1,m_1} \{cc_{x_i,x_j}(l_1, m_1)\}^k\right|^{1/k}$
・ Absolute kernel (A): $\sum_{l_1,m_1} |cc_{x_i,x_j}(l_1, m_1)|^k$
・ Absolute $L_p$ norm kernel (AP): $\left|\sum_{l_1,m_1} \{cc_{x_i,x_j}(l_1, m_1)\}^k\right|^{1/k}$
・ Absolute $L_p$ norm absolute kernel (APA): $\left\{\sum_{l_1,m_1} |cc_{x_i,x_j}(l_1, m_1)|^k\right\}^{1/k}$
・ Max norm kernel (Max): $\max_{l_1,m_1} cc_{x_i,x_j}(l_1, m_1)$
・ Max norm absolute kernel (MaxA): $\max_{l_1,m_1} |cc_{x_i,x_j}(l_1, m_1)|$
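A self-contained sketch of the six modified kernels, assuming NumPy; the placement of the sign factor in the P kernel follows my reading of the slide's "taking account of its sign".

```python
# The six modified correlation kernels of this slide, over truncated lags.
import numpy as np

def cross_corr_2d(xi, xj, L1=10, M1=10):
    """Truncated-lag cross-correlation cc(l1, m1), as on the earlier slide."""
    L, M = xi.shape
    return np.array([[np.sum(xi[:L - l1, :M - m1] * xj[l1:, m1:]) / (L * M)
                      for m1 in range(M1)] for l1 in range(L1)])

def modified_kernel(xi, xj, k=3, variant="P", L1=10, M1=10):
    cc = cross_corr_2d(xi, xj, L1, M1)
    if variant == "P":                        # L_p norm kernel
        s = np.sum(cc ** k)
        return np.sign(s) * np.abs(s) ** (1.0 / k)
    if variant == "A":                        # absolute kernel
        return np.sum(np.abs(cc) ** k)
    if variant == "AP":                       # absolute L_p norm kernel
        return np.abs(np.sum(cc ** k)) ** (1.0 / k)
    if variant == "APA":                      # absolute L_p norm absolute kernel
        return np.sum(np.abs(cc) ** k) ** (1.0 / k)
    if variant == "Max":                      # max norm kernel
        return cc.max()
    if variant == "MaxA":                     # max norm absolute kernel
        return np.abs(cc).max()
    raise ValueError(variant)
```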
13 Classification experiment
4-class classification problems with SVM, kPCA and kCCA.
Original images: 512×512 pixels (256 gray levels) from the VisTex database and the Brodatz album.
Sample and test images: 50×50 pixels, cut from the original images at random positions, with scaling, rotation and Gaussian noise applied (100 images each).
Fig. 1. Texture images. Table 1. Sample and test sets.
14 Kernel functions $K(x_i, x_j)$
Linear kernel: $x_i \cdot x_j$
Gaussian kernel: $\exp(-\mu \|x_i - x_j\|^2)$
Correlation kernels: $r_{x_i} \cdot r_{x_j}$ (C2-10)
Modified correlation kernels: P2-10, A3-7, AP3-7, APA3-7, Max, MaxA
Range of correlation lags: $L_1 = M_1 = 10$ (in 50×50-pixel images)
The simple nearest-neighbor classifier is used for classification in the principal component space $(u_1, \ldots, u_R)$ in kPCA and in the canonical variate space $(u_1, \ldots, u_{C-1})$ in kCCA.
Parameter values are chosen empirically (soft margin: $C = 100$; regularization: $\gamma_x = \gamma_y = 0.1$). A pipeline sketch follows below.
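How one experimental condition could be wired together: kPCA with a correlation kernel followed by the 1-nearest-neighbor classifier. This reuses correlation_kernel, kpca_fit and kpca_project from the earlier sketches; the random arrays stand in for the 50×50 texture patches and are not the VisTex/Brodatz data.

```python
# One condition end to end: correlation-kernel kPCA + 1-NN in PC space.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def gram(A, B, k=3):
    return np.array([[correlation_kernel(a, b, k=k) for b in B] for a in A])

# toy stand-ins for the 50x50-pixel texture patches and their class labels
rng = np.random.default_rng(2)
train = rng.standard_normal((40, 50, 50)); y_train = np.repeat(np.arange(4), 10)
test = rng.standard_normal((8, 50, 50));   y_test = np.repeat(np.arange(4), 2)

K = gram(train, train)
V = kpca_fit(K, R=10)
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(kpca_project(K, V), y_train)
ccr = knn.score(kpca_project(gram(test, train), V), y_test)  # CCR on the test set
```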
15 Fig. 2. Correct classification rates (CCR, %) in SVM.
16 Fig. 3. Correct classification rates (CCR, %) in kPCA.
17 Fig. 4. Correct classification rates (CCR, %) in kCCA.
18 Comparison of the performance
Correct classification rates (CCRs) of the plain correlation kernels (C2-10) are low at odd and at higher orders. With the modifications, the $L_p$ norm kernels (P2-10) give high CCRs even at higher orders, and the absolute kernels (A3-7) give high CCRs at odd orders. Their combinations (AP3-7, APA3-7) and the max norm kernels (Max, MaxA) also perform well.
Table 2. Highest correct classification rates.
19 Summary
Modified versions of the correlation kernels are proposed.
・ Applying the $L_p$ norm and the max norm improves the poor generalization ability of the higher-order correlation kernels.
・ Using the absolute values of the cross-correlations remedies the inferior performance of the odd-order correlation kernels relative to the even orders, which stems from their blindness to sinusoidal and symmetrically distributed signals.
SVM, kPCA and kCCA with the modified correlation kernels show good performance in the texture classification experiments.