An Automatic Method for Selecting the Parameter of the RBF Kernel Function to Support Vector Machines
Cheng-Hsuan Li 1,2, Chin-Teng Lin 1, Bor-Chen Kuo 2, Hui-Shan Chu 2
1 Institute of Electrical Control Engineering, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.
2 Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C.
Motivation
Many papers in TGARS use k-fold cross-validation to find a proper parameter for the RBF kernel function, for example:
B. Waske, S. van der Linden, J. A. Benediktsson, A. Rabe, and P. Hostert, "Sensitivity of support vector machines to random feature selection in classification of hyperspectral data," IEEE Trans. Geosci. Remote Sens., vol. 48, no. 7, Jul. 2010.
J. Chen, C. Wang, and R. Wang, "Using stacked generalization to combine SVMs in magnitude and shape feature spaces for classification of hyperspectral data," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 7, Jul. 2009.
L. Bruzzone and M. Marconcini, "Toward the automatic updating of land-cover maps by a domain-adaptation SVM classifier and a circular validation strategy," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 4, Apr. 2009.
T. V. Bandos, L. Bruzzone, and G. Camps-Valls, "Classification of hyperspectral images with regularized linear discriminant analysis," IEEE Trans. Geosci. Remote Sens., vol. 47, no. 3, Mar. 2009.
G. Camps-Valls, L. Gomez-Chova, J. Munoz-Mari, J. L. Rojo-Alvarez, and M. Martinez-Ramon, "Kernel-based framework for multitemporal and multisource remote sensing data classification and change detection," IEEE Trans. Geosci. Remote Sens., vol. 46, no. 6, Jun. 2008.
Nevertheless, cross-validation is time consuming.
Kernel Method
Samples in the same class should be mapped into the same area; samples in different classes should be mapped into different areas.
The Properties of the RBF Kernel
In the feature space determined by the RBF kernel, every sample has norm one, and the inner product of any two mapped samples is positive. Hence, the samples are mapped onto the surface of a hypersphere.
The Properties of the RBF Kernel Hence, the cosine values, i.e., the values of the RBF kernel function, indicate the similarities between samples.
The Properties of the RBF Kernel
If the cosine value of two samples is close to 1, they are more similar in the feature space; if it is close to 0, they are more dissimilar in the feature space.
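The following is a minimal numerical sketch (illustrative, not from the slides) of these two properties, assuming the standard Gaussian RBF kernel $K(x, y) = \exp(-\|x - y\|^{2} / 2\sigma^{2})$; the data and the value of $\sigma$ are arbitrary choices for the demonstration.

```python
# Minimal sketch: check that K(x, x) = 1 (unit norm in the feature space)
# and that K(x, y) always lies in (0, 1], so it acts as a cosine similarity.
import numpy as np

def rbf_kernel(x, y, sigma):
    """Gaussian RBF kernel value exp(-||x - y||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
x, y = rng.normal(size=5), rng.normal(size=5)
sigma = 1.0  # arbitrary choice for the demonstration

print(rbf_kernel(x, x, sigma))   # exactly 1.0: every mapped sample has unit norm
k = rbf_kernel(x, y, sigma)
print(k, 0.0 < k <= 1.0)         # the kernel value is the cosine of the mapped pair
```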
The Ideal Distribution
For example, suppose there are three classes, and let $x_i^{(j)}$ denote the $i$-th sample in class $j$, $j = 1, 2, 3$. We want to find a feature space in which the distribution of the samples looks like the following figures.
The Ideal Distribution
The values of the RBF kernel function should be close to 1 if the samples are in the same class, i.e., $K(x_i^{(p)}, x_j^{(p)}) \approx 1$. The values of the RBF kernel function should be close to 0 if the samples are in different classes, i.e., $K(x_i^{(p)}, x_j^{(q)}) \approx 0$ for $p \neq q$.
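As a concrete illustration of this target (my own example, with hypothetical labels), the ideal Gram matrix is simply the same-class indicator matrix: 1 on within-class pairs and 0 on between-class pairs.

```python
# Hypothetical labels for three classes; the ideal Gram matrix described on the
# slide is the same-class indicator matrix.
import numpy as np

y = np.array([1, 1, 1, 2, 2, 3, 3])                  # illustrative class labels
K_ideal = (y[:, None] == y[None, :]).astype(float)   # 1 within class, 0 between classes
print(K_ideal)
```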
An Example of Varying the Parameter
[Figure series: as the RBF parameter $\sigma$ varies from small to large, vectors in the original space and the corresponding vectors in the feature space.]
Proposed Criteria
The mean of the RBF kernel values over pairs of samples in the same class should be close to 1, i.e.,
$$w(\sigma) = \frac{\sum_{p=1}^{L} \sum_{i=1}^{n_p} \sum_{j=1}^{n_p} K\big(x_i^{(p)}, x_j^{(p)}\big)}{\sum_{p=1}^{L} n_p^{2}} \approx 1,$$
where $L$ is the number of classes and $n_p$ is the number of samples in class $p$.
Proposed Criteria
The mean of the RBF kernel values over pairs of samples in different classes should be close to 0, i.e.,
$$b(\sigma) = \frac{\sum_{p \neq q} \sum_{i=1}^{n_p} \sum_{j=1}^{n_q} K\big(x_i^{(p)}, x_j^{(q)}\big)}{\sum_{p \neq q} n_p n_q} \approx 0.$$
Proposed Criteria
We want to find a parameter $\sigma$ such that $w(\sigma) \approx 1$ and $b(\sigma) \approx 0$. This means that $1 - w(\sigma) \approx 0$ and $b(\sigma) \approx 0$. The proposed criterion is therefore defined as
$$J(\sigma) = \big(1 - w(\sigma)\big) + b(\sigma),$$
and the best parameter is the minimizer of $J(\sigma)$; a computational sketch follows.
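Below is a short Python sketch of this criterion as reconstructed above: $w(\sigma)$ and $b(\sigma)$ are pooled means of the Gram matrix over within-class and between-class pairs, and $J(\sigma) = (1 - w(\sigma)) + b(\sigma)$. The function names and the pooled normalization are my assumptions; the paper's exact weighting may differ.

```python
import numpy as np

def rbf_gram(X, sigma):
    """Gram matrix K[i, j] = exp(-||X[i] - X[j]||^2 / (2 * sigma^2))."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def criterion_J(X, y, sigma):
    """J(sigma) = (1 - w(sigma)) + b(sigma); smaller is better."""
    K = rbf_gram(X, sigma)
    same = (y[:, None] == y[None, :])   # mask of within-class sample pairs
    w = K[same].mean()                  # mean kernel value over same-class pairs
    b = K[~same].mean()                 # mean kernel value over different-class pairs
    return (1.0 - w) + b
```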
$J(\sigma)$ versus $\sigma$ (RBF kernel)
[Figure: $J(\sigma)$ obtained from the Indian Pine Site dataset.]
$\sigma$ versus Accuracies (RBF kernel)
[Figure: overall and kappa accuracies as functions of $\sigma$.]
Observations
$J(\sigma)$ has a global minimum over $\sigma$. From the figure "$J(\sigma)$ versus $\sigma$," one can see that the values near the minimum occur for $\sigma$ in the range (3500, 4000). From the figure "$\sigma$ versus accuracies," one can see that the values near the highest overall or kappa accuracy also occur for $\sigma$ in the range (3500, 4500). The $\sigma$ obtained by our proposed method is therefore close to the $\sigma$ that achieves the highest accuracy, so the minimizer of $J(\sigma)$ is a good estimator of the best $\sigma$.
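The comparison behind these two figures can be reproduced with a sketch like the one below (my own, not the authors' script): evaluate the reconstructed criterion $J(\sigma)$ and 5-fold cross-validated SVM accuracy over the same grid of $\sigma$ values. The grid and the conversion $\gamma = 1/(2\sigma^{2})$ to scikit-learn's parameterization are assumptions about the setup.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def j_and_cv_accuracy(X, y, sigma):
    """Return (J(sigma), mean 5-fold CV accuracy) for one candidate sigma."""
    gamma = 1.0 / (2.0 * sigma ** 2)    # scikit-learn uses exp(-gamma * ||x - y||^2)
    K = rbf_kernel(X, gamma=gamma)
    same = (y[:, None] == y[None, :])
    J = (1.0 - K[same].mean()) + K[~same].mean()
    acc = cross_val_score(SVC(kernel="rbf", gamma=gamma), X, y, cv=5).mean()
    return J, acc

# Illustrative grid around the range reported on the slides (3500-4500):
# results = [j_and_cv_accuracy(X, y, s) for s in np.linspace(1000, 6000, 26)]
```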
How to Find the Best $\sigma$
$w(\sigma)$ and $b(\sigma)$ are differentiable with respect to $\sigma$, so $J(\sigma)$ is also differentiable with respect to $\sigma$. The gradient descent method is used to find the optimizer.
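A gradient-descent sketch under the same reconstruction is shown below; it uses the exact derivative $\partial K/\partial\sigma = K \cdot \|x - y\|^{2}/\sigma^{3}$, while the starting point, learning rate, and iteration count are illustrative assumptions rather than the authors' settings.

```python
import numpy as np

def find_sigma(X, y, sigma0, lr=0.1, n_iter=200):
    """Minimize J(sigma) = (1 - w(sigma)) + b(sigma) by plain gradient descent."""
    sq = np.sum(X ** 2, axis=1)
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)  # squared distances
    same = (y[:, None] == y[None, :])
    sigma = float(sigma0)
    for _ in range(n_iter):
        K = np.exp(-D2 / (2.0 * sigma ** 2))
        dK = K * D2 / sigma ** 3                       # elementwise dK/dsigma
        grad = -dK[same].mean() + dK[~same].mean()     # dJ/dsigma = -dw/dsigma + db/dsigma
        sigma -= lr * grad
    return sigma
```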
Experiment Data Sets
Indian Pine Site image: 220 bands, 9 classes. Categories: Soybeans-min, Soybeans-notill, Soybeans-clean, Corn-min, Corn-notill, Grass/Pasture, Grass/Tree, Hay-windrowed, Woods.
Washington, DC Mall image: 191 bands, 7 classes. Categories: Roof, Road, Path, Grass, Tree, Water, Shadow.
Experiment Results
Table 1. Overall and Kappa Accuracies on the Indian Pine Site Dataset.
[Table: CPU time (sec), overall accuracy, and overall kappa accuracy for CV and OP at each training-set size $n_i$, starting from $n_i$ = 20.]
$n_i$: the number of samples in class $i$. OP: our proposed method. CV: 5-fold cross-validation over a set of candidate parameters.
Experiment Results
Table 2. Overall and Kappa Accuracies on the Washington, DC Mall Dataset.
[Table: CPU time (sec), overall accuracy, and overall kappa accuracy for CV and OP at each training-set size $n_i$, starting from $n_i$ = 20.]
$n_i$: the number of samples in class $i$. OP: our proposed method. CV: 5-fold cross-validation over a set of candidate parameters.
Conclusion
In this paper, an automatic method for selecting the parameter of the RBF kernel was proposed. We experimentally compared it with k-fold cross-validation. The experimental results on two hyperspectral images show that the computational cost of the proposed method is about nine times lower. In future work, we will apply the proposed method to other kernel-based algorithms and develop similar procedures for other kernel functions.
Thanks for your attention Cheng-Hsuan Li Chin-Teng Lin Bor-Chen Kuo Hui-Shan Chu
Proposed Criteria
Note that every RBF kernel value lies in $(0, 1]$, and that, for distinct samples, $K \to 0$ as $\sigma \to 0$ and $K \to 1$ as $\sigma \to \infty$. Hence $1 - w(\sigma)$ and $b(\sigma)$ are both nonnegative, and neither a very small nor a very large $\sigma$ makes both of them small, so $J(\sigma)$ attains its minimum at an intermediate $\sigma$.
Experimental Results
Table 3. Overall and Kappa Accuracies on UCI Datasets (Ionosphere, Monks, Pima, Liver, WDBC, Iris).
[Table: CPU time (sec), overall accuracy, and overall kappa accuracy for CV and OP on each dataset.]
How to Find the Best $\sigma$
The required derivative of the RBF kernel is $\frac{\partial}{\partial \sigma} K(x, y) = K(x, y)\,\frac{\|x - y\|^{2}}{\sigma^{3}}$, from which $\frac{dJ}{d\sigma}$ is obtained by averaging over the same within-class and between-class pairs as in $w(\sigma)$ and $b(\sigma)$.