Introduction to Using Libsvm-2.6
quietsea@bbs.hit.edu.cn
Libsvm-2.6 features
Supports multi-class classification
Different SVM formulations
Cross-validation for model selection
Probability estimates
Weighted SVM for unbalanced data
Both C++ and Java sources
Version 2.8 was released on April Fools' Day, 2005
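Libsvm-2.6 program structure
Kernel class
Solver class: uses the generalized SMO and SVMLight algorithms to solve the quadratic programming problem
Multi-class classification is handled with the one-against-one approach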
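The one-against-one approach trains one binary classifier per pair of classes, k(k-1)/2 in total for k classes, and lets them vote. The following is a minimal sketch of the voting step only, not Libsvm's own code; `one_against_one_predict`, `pairwise_decide` and the toy `nearest_center` rule are hypothetical names standing in for trained binary SVMs.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(labels, pairwise_decide, x):
    """Predict a label by majority vote over all k*(k-1)/2 class pairs.

    pairwise_decide(a, b, x) is a hypothetical callable that returns
    either a or b, the winner of the binary classifier for that pair.
    """
    votes = Counter()
    for a, b in combinations(labels, 2):
        votes[pairwise_decide(a, b, x)] += 1
    # Ties go to the label that appears first in `labels`.
    return max(labels, key=lambda lab: votes[lab])


# Toy stand-in for trained binary SVMs: pick the class whose center is closer.
centers = {"a": 0.0, "b": 5.0, "c": 10.0}


def nearest_center(a, b, x):
    return a if abs(x - centers[a]) <= abs(x - centers[b]) else b


labels = list(centers)
print(one_against_one_predict(labels, nearest_center, 6.2))
```

With three classes there are three pairwise decisions; for the sample 6.2 two of them pick "b", so "b" wins the vote.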
Format of training and testing data file
<label> <index1>:<value1> <index2>:<value2> ...
+1 1:0.708333 2:1 3:1 4:-0.320755 5:-0.105023 6:-1 7:1
-1 1:0.583333 2:-1 3:0.333333 4:-0.603774 5:1 6:-1 7:1
+1 1:0.166667 2:1 3:-1 4:-0.433962 5:-0.383562 6:-1 7:-1
-1 1:0.458333 2:1 3:1 4:-0.358491 5:-0.374429 6:-1 7:-1
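To make the data format concrete, here is a small sketch that reads one line of such a file into a label and a dictionary of features. `parse_libsvm_line` is a hypothetical helper written for illustration, not part of Libsvm.

```python
def parse_libsvm_line(line):
    """Split '<label> <index>:<value> ...' into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])  # float() accepts both '+1' and '-1'
    features = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


label, features = parse_libsvm_line(
    "+1 1:0.708333 2:1 3:1 4:-0.320755 5:-0.105023 6:-1 7:1"
)
print(label, features[4])
```

Storing the features in a dictionary keyed by index keeps the sketch compatible with lines that list only some of the indices.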
Data scaling
Scaling prevents attributes in greater numeric ranges from dominating those in smaller numeric ranges.
Usually each attribute is scaled to [0,1] or [-1,+1].
svmscale -l -1 -u 1 -s range train.1 > train.1.scale
svmscale -r range test.1 > test.1.scale
The -s option saves the scaling parameters of the training set to the file range; the -r option restores them, so the test set is scaled with the same factors.
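The idea behind scaling the test set with the training ranges can be sketched as follows. This is an illustrative linear min-max scaling under simple assumptions, not a reimplementation of svmscale; `fit_ranges` and `scale_row` are hypothetical names, and the handling of sparse (omitted) values is not reproduced.

```python
def fit_ranges(rows):
    """Record (min, max) per feature index from the training rows."""
    ranges = {}
    for row in rows:
        for idx, val in row.items():
            lo, hi = ranges.get(idx, (val, val))
            ranges[idx] = (min(lo, val), max(hi, val))
    return ranges


def scale_row(row, ranges, lower=-1.0, upper=1.0):
    """Map each value linearly so the training range becomes [lower, upper]."""
    scaled = {}
    for idx, val in row.items():
        if idx not in ranges:
            continue  # feature never seen in training (choice made by this sketch)
        lo, hi = ranges[idx]
        if hi == lo:
            continue  # constant feature, dropped (choice made by this sketch)
        scaled[idx] = lower + (upper - lower) * (val - lo) / (hi - lo)
    return scaled


train = [{1: 0.0, 2: 10.0}, {1: 4.0, 2: 30.0}]
test = {1: 2.0, 2: 40.0}

ranges = fit_ranges(train)          # computed from the training set only
print([scale_row(r, ranges) for r in train])
print(scale_row(test, ranges))      # reuses the training ranges
```

Because the test row is scaled with the training ranges, a test value can land outside [-1, +1], as feature 2 does here.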
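Svmtrain
One-class: a hyperplane is placed so that it separates the dataset from the origin with maximal margin. The regularization parameter nu, in (0,1), is user-defined and bounds the fraction of the data that may be left outside the description as outliers.
nu-SVR: nu regression. It introduces a parameter nu that lets epsilon be computed automatically. If q is the number of error samples, then nu >= q/l, so nu is an upper bound on the fraction of error samples among all l samples; if p is the number of support vectors, then nu <= p/l, so nu is a lower bound on the fraction of support vectors. First choose the parameters nu and C, then solve the optimization problem.
Shrinking: whether to use shrinking during optimization. For bounded support vectors (BSVs, the SVs with ai = C), ai does not change during the iterations; if these points are found and fixed at C, the size of the QP can be reduced.
Probability estimate: whether to train SVC and SVR models that produce probability outputs.
-wi: weighting parameter for unbalanced samples.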
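The two bounds on nu stated for nu-SVR, q/l <= nu <= p/l, can be written as a one-line check. This is only an illustration of the inequality; `nu_is_consistent` is a hypothetical helper and the counts below are made-up numbers, not results from Libsvm.

```python
def nu_is_consistent(nu, n_errors, n_support_vectors, n_samples):
    """Return True when q/l <= nu <= p/l holds.

    q = n_errors, p = n_support_vectors, l = n_samples.
    """
    return n_errors / n_samples <= nu <= n_support_vectors / n_samples


# Hypothetical counts: 100 samples, 40 error samples, 60 support vectors.
print(nu_is_consistent(0.5, 40, 60, 100))  # 0.4 <= 0.5 <= 0.6
print(nu_is_consistent(0.3, 40, 60, 100))  # 0.3 is below q/l = 0.4
```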
Output of training C-SVM
optimization finished, #iter = 219
nu = 0.431030: nu-SVM is a somewhat equivalent form of C-SVM where C is replaced by nu.
obj = -100.877286: optimal objective value of the dual problem.
rho = 0.424632: bias term of the decision function.
nSV = 132, nBSV = 107: number of support vectors and number of bounded support vectors.
Total nSV = 132
Model file
svm_type c_svc
kernel_type rbf
gamma 0.0769231
nr_class 2: number of classes. For regression and one-class models, this number is 2.
total_sv 132
rho 0.424632
label 1 -1
nr_sv 64 68: number of support vectors for each class.
SV
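The header of the model file is a list of "key value ..." lines that ends at the SV marker, so it can be read with a few lines of code. The sketch below assumes exactly the layout shown on this slide (without the explanatory comments); `read_model_header` is a hypothetical helper, not a Libsvm function.

```python
def read_model_header(lines):
    """Collect 'key value ...' lines until the 'SV' marker is reached."""
    header = {}
    for line in lines:
        line = line.strip()
        if line == "SV":
            break  # the support vectors themselves follow this marker
        if not line:
            continue
        key, *values = line.split()
        header[key] = values
    return header


model_text = """svm_type c_svc
kernel_type rbf
gamma 0.0769231
nr_class 2
total_sv 132
rho 0.424632
label 1 -1
nr_sv 64 68
SV
"""

header = read_model_header(model_text.splitlines())
per_class = [int(n) for n in header["nr_sv"]]
print(per_class, sum(per_class) == int(header["total_sv"][0]))
```

The per-class counts in nr_sv add up to total_sv, which is a quick sanity check on a model file.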
Two tools for model selection
easy.py: does everything automatically, from data scaling to parameter selection
grid.py: uses grid search to find the best model parameters
Output files of grid.py
-out: the search log, listing each parameter value tried and the accuracy obtained
-png: contour plot of the search
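Grid search simply tries every combination of candidate parameters and keeps the one with the best accuracy. The sketch below is not grid.py itself; `grid_search` and `toy_score` are hypothetical, the `score` callable stands in for a cross-validation run, and the exponent ranges are illustrative assumptions.

```python
import math


def grid_search(score, log2c_values, log2g_values):
    """Try every (C, gamma) pair on an exponential grid and keep the best.

    score(c, gamma) is a hypothetical callable that returns the
    cross-validation accuracy obtained with those parameters.
    """
    best = None
    for log2c in log2c_values:
        for log2g in log2g_values:
            c, gamma = 2.0 ** log2c, 2.0 ** log2g
            acc = score(c, gamma)
            if best is None or acc > best[0]:
                best = (acc, c, gamma)
    return best


def toy_score(c, gamma):
    # Made-up smooth surface that peaks at C = 2^3 and gamma = 2^-5.
    return -((math.log2(c) - 3) ** 2) - (math.log2(gamma) + 5) ** 2


acc, c, gamma = grid_search(toy_score, range(-5, 16, 2), range(-15, 4, 2))
print(c, gamma)
```

Searching over exponents rather than raw values covers several orders of magnitude with few grid points.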
Proposed procedure
Transform the data to the Libsvm format.
Conduct simple scaling on the data.
Consider the RBF kernel.
Use cross-validation to find the best model parameters.
Use the best parameters to train on the whole training set.
Test.
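The cross-validation step of the procedure can be sketched as follows: the training set is split into k folds, each fold is held out once, and the accuracy is measured on the held-out samples. This is a simplified illustration, not Libsvm's implementation; `k_fold_indices`, `cross_validation_accuracy` and the toy `majority_fit` trainer are hypothetical.

```python
from collections import Counter


def k_fold_indices(n_samples, k):
    """Yield (train, valid) index lists, holding out each fold once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


def cross_validation_accuracy(xs, ys, k, fit):
    """fit(train_xs, train_ys) is a hypothetical trainer returning predict(x)."""
    correct = 0
    for train, valid in k_fold_indices(len(xs), k):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(predict(xs[i]) == ys[i] for i in valid)
    return correct / len(xs)


def majority_fit(train_xs, train_ys):
    # Toy stand-in for SVM training: always predict the most common label.
    majority = Counter(train_ys).most_common(1)[0][0]
    return lambda x: majority


xs = list(range(10))
ys = [1] * 8 + [-1] * 2
print(cross_validation_accuracy(xs, ys, 5, majority_fit))
```

Every sample is used for validation exactly once, so the result is an accuracy over the whole training set.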
Experiments
Original sets with default parameters: Accuracy = 9.7561%
Scaled sets with default parameters: Accuracy = 87.8049%
Scaled sets with parameter selection: Accuracy = 95.123%
Using an automatic script: Accuracy = 95.122%
Remark
Python 2.3 is recommended.
Gnuplot version 3.7.3 is recommended; version 3.7.1 has a bug.
References
A practical guide to support vector machines classification
LIBSVM: a Library for Support Vector Machines
FAQ and Readme in Libsvm-2.6
http://www.csie.ntu.edu.tw/~cjlin/