Topic 7 Support Vector Machine for Classification
Outline
– Linear Maximal Margin Classifier for Linearly Separable Data
– Linear Soft Margin Classifier for Overlapping Classes
– The Nonlinear Classifier
Linear Maximal Margin Classifier for Linearly Separable Data
Goal: seek an optimal separating hyperplane.
– That is, among all the hyperplanes that minimize the training error (empirical risk), find the one with the largest margin. A classifier with a larger margin tends to have better generalization performance; a classifier with a smaller margin tends to have a higher expected risk.
Canonical hyperplane
1. Minimize the training error.
2. Maximize the margin: maximizing the margin is equivalent to minimizing w^T w.
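Written out, the canonical-hyperplane training problem for separable data is the standard quadratic program (restated here for reference; the factor 1/2 does not change the minimizer):

minimize over w, b:   (1/2) * w^T w
subject to:           y_i * (w^T x_i + b) >= 1,   i = 1, ..., l

Since the margin of the canonical hyperplane is 2/||w||, maximizing the margin and minimizing w^T w are the same problem.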
Rosenblatt's Algorithm
Pattern = [1 1; 1 2; 2 -1; 2 0; -1 2; -2 1; -1 -1; -2 -2]
Target = [1; 1; 1; 1; -1; -1; -1; -1]
R = max_i ||x_i|| = sqrt(8), so R^2 = 8
K = Gram matrix, K(i,j) = x_i' * x_j:
K = [ 2  3  1  2  1 -1 -2 -4
      3  5  0  2  3  0 -3 -6
      1  0  5  4 -4 -5 -1 -2
      2  2  4  4 -2 -4 -2 -4
      1  3 -4 -2  5  4 -1 -2
     -1  0 -5 -4  4  5  1  2
     -2 -3 -1 -2 -1  1  2  4
     -4 -6 -2 -4 -2  2  4  8 ]
Dual update rule: whenever y_i * (sum_j α_j*y_j*K(j,i) + b) <= 0, set α_i = α_i + 1 and b = b + y_i*R^2.
1st iteration: α = [0 0 0 0 0 0 0 0], b = 0, R^2 = 8
x1 = [1 1], y1 = 1:  1*[0 + 0] = 0, not > 0 → α = [1 0 0 0 0 0 0 0], b = 0 + 8 = 8
x2 = [1 2], y2 = 1:  1*[1*3 + 8] = 11 > 0
x3 = [2 -1], y3 = 1:  1*[1*1 + 8] = 9 > 0
x4 = [2 0], y4 = 1:  1*[1*2 + 8] = 10 > 0
x5 = [-1 2], y5 = -1:  (-1)*[1*1 + 8] = -9, not > 0 → α = [1 0 0 0 1 0 0 0], b = 8 - 8 = 0
x6 = [-2 1], y6 = -1:  (-1)*[1*(-1) + (-1)*4 + 0] = 5 > 0
x7 = [-1 -1], y7 = -1:  (-1)*[1*(-2) + (-1)*(-1) + 0] = 1 > 0
x8 = [-2 -2], y8 = -1:  (-1)*[1*(-4) + (-1)*(-2) + 0] = 2 > 0
2nd iteration: α = [1 0 0 0 1 0 0 0], b = 0
x1 = [1 1], y1 = 1:  1*[1*2 + (-1)*1 + 0] = 1 > 0
x2 = [1 2], y2 = 1:  1*[1*3 + (-1)*3 + 0] = 0, not > 0 → α = [1 1 0 0 1 0 0 0], b = 0 + 8 = 8
x3 = [2 -1], y3 = 1:  1*[1*1 + 1*0 + (-1)*(-4) + 8] = 13 > 0
x4 = [2 0], y4 = 1:  1*[1*2 + 1*2 + (-1)*(-2) + 8] = 14 > 0
x5 = [-1 2], y5 = -1:  (-1)*[1*1 + 1*3 + (-1)*5 + 8] = -7, not > 0 → α = [1 1 0 0 2 0 0 0], b = 8 - 8 = 0
x6 = [-2 1], y6 = -1:  (-1)*[1*(-1) + 1*0 + (-2)*4 + 0] = 9 > 0
x7 = [-1 -1], y7 = -1:  (-1)*[1*(-2) + 1*(-3) + (-2)*(-1) + 0] = 3 > 0
x8 = [-2 -2], y8 = -1:  (-1)*[1*(-4) + 1*(-6) + (-2)*(-2) + 0] = 6 > 0
3rd iteration: α = [1 1 0 0 2 0 0 0], b = 0
x1 = [1 1], y1 = 1:  1*[1*2 + 1*3 + (-2)*1 + 0] = 3 > 0
x2 = [1 2], y2 = 1:  1*[1*3 + 1*5 + (-2)*3 + 0] = 2 > 0
x3 = [2 -1], y3 = 1:  1*[1*1 + 1*0 + (-2)*(-4) + 0] = 9 > 0
x4 = [2 0], y4 = 1:  1*[1*2 + 1*2 + (-2)*(-2) + 0] = 8 > 0
x5 = [-1 2], y5 = -1:  (-1)*[1*1 + 1*3 + (-2)*5 + 0] = 6 > 0
x6 = [-2 1], y6 = -1:  (-1)*[1*(-1) + 1*0 + (-2)*4 + 0] = 9 > 0
x7 = [-1 -1], y7 = -1:  (-1)*[1*(-2) + 1*(-3) + (-2)*(-1) + 0] = 3 > 0
x8 = [-2 -2], y8 = -1:  (-1)*[1*(-4) + 1*(-6) + (-2)*(-2) + 0] = 6 > 0
No mistakes in a full pass, so the algorithm has converged.
f(x) = sum_i α_i*y_i*k(x_i,x) + b = 1*(1*x1 + 1*x2) + 1*(1*x1 + 2*x2) + 2*(-1)*(-1*x1 + 2*x2) + 0 = 4*x1 - x2
(Here x1 and x2 denote the two components of the input x; y_i*f(x_i) reproduces the values 3, 2, 9, 8, 6, 9, 3, 6 checked in the 3rd iteration.)
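The trace above can be reproduced with a short NumPy script; this is a sketch of the dual-form (kernel) perceptron update used in the example, with illustrative variable names:

import numpy as np

# Training data from the worked example
X = np.array([[1, 1], [1, 2], [2, -1], [2, 0],
              [-1, 2], [-2, 1], [-1, -1], [-2, -2]], dtype=float)
y = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)

K = X @ X.T                          # linear-kernel Gram matrix, K[i, j] = x_i . x_j
R2 = np.max(np.sum(X**2, axis=1))    # R^2 = max_i ||x_i||^2 = 8

alpha = np.zeros(len(y))
b = 0.0
changed = True
while changed:                       # one pass per iteration, stop when no mistakes
    changed = False
    for i in range(len(y)):
        if y[i] * (np.dot(alpha * y, K[:, i]) + b) <= 0:   # mistake on x_i
            alpha[i] += 1
            b += y[i] * R2
            changed = True

print(alpha, b)                      # expected: [1 1 0 0 2 0 0 0], b = 0
w = (alpha * y) @ X                  # explicit weight vector for the linear kernel
print(w)                             # expected: [4, -1]  ->  f(x) = 4*x1 - x2

With the linear kernel the explicit weight vector is w = sum_i α_i*y_i*x_i = [4, -1], which matches f(x) = 4*x1 - x2.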
Linear Soft Margin Classifier for Overlapping Classes
Soft margin
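For reference, the soft-margin training problem adds a slack variable per pattern (standard formulation; C is the same penalty parameter used in the SMO example below):

minimize over w, b, ξ:   (1/2) * w^T w + C * sum_i ξ_i
subject to:              y_i * (w^T x_i + b) >= 1 - ξ_i,   ξ_i >= 0,   i = 1, ..., l

In the dual, the only change from the separable case is the box constraint 0 <= α_i <= C on each multiplier, which is why the SMO updates below clip the multipliers to [0, C].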
2-Parameter Sequential Minimal Optimization (SMO) Algorithm
At every step, SMO chooses two Lagrange multipliers to jointly optimize, finds the optimal values for these two multipliers, and updates the SVM to reflect the new optimal values.
Heuristic for choosing which multipliers to optimize:
– The first multiplier belongs to the pattern with the largest current prediction error.
– The second multiplier belongs to the pattern with the smallest current prediction error.
Step 1. Choose two multipliers α1 and α2 (associated with patterns x1, x2 and errors E1 = f(x1) - y1, E2 = f(x2) - y2).
Step 2. Define bounds [U, V] for α2:
– If y1 ≠ y2: U = max(0, α2 - α1), V = min(C, C + α2 - α1).
– If y1 = y2: U = max(0, α1 + α2 - C), V = min(C, α1 + α2).
Step 3. Update α2: with η = k(1,1) + k(2,2) - 2*k(1,2), set α2_new = α2 + y2*(E1 - E2)/η, then clip α2_new to [U, V].
Step 4. Update α1: α1_new = α1 + y1*y2*(α2 - α2_new,clipped).
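A sketch of one such pair update in NumPy; the helper name smo_pair_update and its signature are illustrative, not from the slides:

import numpy as np

def smo_pair_update(i1, i2, alpha, b, X, y, C):
    """One 2-multiplier SMO step, following Steps 1-4 above (linear kernel, eta > 0 assumed)."""
    K = X @ X.T                                   # Gram matrix
    f = (alpha * y) @ K + b                       # current outputs f(x_j) for all patterns
    E1, E2 = f[i1] - y[i1], f[i2] - y[i2]         # prediction errors

    # Step 2: bounds on alpha2
    if y[i1] != y[i2]:
        U = max(0.0, alpha[i2] - alpha[i1])
        V = min(C, C + alpha[i2] - alpha[i1])
    else:
        U = max(0.0, alpha[i1] + alpha[i2] - C)
        V = min(C, alpha[i1] + alpha[i2])

    # Step 3: unconstrained optimum along the constraint line, then clip to [U, V]
    eta = K[i1, i1] + K[i2, i2] - 2 * K[i1, i2]
    a2 = alpha[i2] + y[i2] * (E1 - E2) / eta
    a2 = np.clip(a2, U, V)

    # Step 4: keep y1*alpha1 + y2*alpha2 constant
    a1 = alpha[i1] + y[i1] * y[i2] * (alpha[i2] - a2)

    alpha = alpha.copy()
    alpha[i1], alpha[i2] = a1, a2
    return alpha

The bias is not updated inside this sketch; the worked example below recomputes it from pattern 1 as b = y1 - sum_j α_j*y_j*K(j,1).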
Pattern = [1 1; 1 2; 1 0; 2 -1; 2 0; -1 2; -2 1; 0 0; -1 -1; -2 -2]
Target = [1; 1; -1; 1; 1; -1; -1; 1; -1; -1]
C = 0.8
K = 10x10 Gram matrix, K(i,j) = x_i' * x_j
1st iteration
α = [0.8 0 0 0.8 0 0.3 0.8 0 0 0]'
b = 1 - (0.8*1*2 + 0.8*1*1 + 0.3*(-1)*1 + 0.8*(-1)*(-1)) = 1 - 2.9 = -1.9    (b = y1 - sum_j α_j*y_j*K(j,1))
f(x) = sum_j α_j*y_j*k(x_j,x) + b = 0.8*(1*x1 + 1*x2) + 0.8*(2*x1 - 1*x2) + (-1)*(-0.3*x1 + 0.6*x2) + (-1)*(-1.6*x1 + 0.8*x2) - 1.9 = 4.3*x1 - 1.4*x2 - 1.9
F(x) - Y = [0  -1.4  3.4  7.1  5.7  -8.0  -10.9  -2.9  -3.8  -6.7]'
Largest error e1 = 7.1 (pattern 4); smallest error e2 = -10.9 (pattern 7), so α4 and α7 are optimized.
U = 0, V = 0.8    (y4 ≠ y7: U = max(0, α7 - α4) = 0, V = min(C, C + α7 - α4) = 0.8)
η = k(4,4) + k(7,7) - 2*k(4,7) = 5 + 5 - 2*(-5) = 20
α2_new = 0.8 + ((-1)*(7.1 - (-10.9))/20) = -0.1
α2_new,clipped = 0
α1_new = 0.8 + (1)*(-1)*(0.8 - 0) = 0
α = [0.8 0 0 0 0 0.3 0 0 0 0]'
b = 1 - (0.8*1*2 + 0.3*(-1)*1) = 1 - 1.3 = -0.3
f(x) = 0.8*(1*x1 + 1*x2) + (-1)*(-0.3*x1 + 0.6*x2) - 0.3 = 1.1*x1 + 0.2*x2 - 0.3
2nd iteration
F(x) - Y = [0  0.2  1.8  0.7  0.9  0  -1.3  -1.3  -0.6  -1.9]'
α = [0.8 0 0 0 0 0.3 0 0 0 0]'
Largest error e1 = 1.8 (pattern 3); smallest error e2 = -1.9 (pattern 10), so α3 and α10 are optimized.
U = 0, V = 0    (y3 = y10: U = max(0, α3 + α10 - C) = 0, V = min(C, α3 + α10) = 0)
η = k(3,3) + k(10,10) - 2*k(3,10) = 1 + 8 - 2*(-2) = 13
α2_new = 0 + ((-1)*(1.8 - (-1.9))/13) = -0.28
α2_new,clipped = 0
α1_new = 0
α = [0.8 0 0 0 0 0.3 0 0 0 0]'   (unchanged)
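The pair-selection heuristic and the first update above can be checked numerically with the smo_pair_update sketch (indices in the code are 0-based, whereas the slides count patterns from 1):

import numpy as np

# Overlapping-classes data and the initial state given on the slides
X = np.array([[1, 1], [1, 2], [1, 0], [2, -1], [2, 0],
              [-1, 2], [-2, 1], [0, 0], [-1, -1], [-2, -2]], dtype=float)
y = np.array([1, 1, -1, 1, 1, -1, -1, 1, -1, -1], dtype=float)
C = 0.8
alpha = np.array([0.8, 0, 0, 0.8, 0, 0.3, 0.8, 0, 0, 0])
b = -1.9

E = (alpha * y) @ (X @ X.T) + b - y               # prediction errors F(x) - Y
i1, i2 = int(np.argmax(E)), int(np.argmin(E))     # largest / smallest error heuristic
print(i1, i2)                                     # 3, 6 -> patterns 4 and 7
print(smo_pair_update(i1, i2, alpha, b, X, y, C)) # alpha4 and alpha7 drop to 0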
Trained by Rosenblatt's Algorithm
Let α1*y1 + α2*y2 = R (this quantity is preserved by the joint update).
Case 1: y1 = 1, y2 = 1 (α1 >= 0, α2 >= 0, α1 + α2 = R >= 0)
– If C < R < 2C: U = R - C, V = C.
– If 0 < R <= C: U = 0, V = R.
(Figure: the box [0, C] x [0, C] in the (α1, α2) plane with the lines α1 + α2 = R drawn for R = 0, R = C, R = 2C.)
Case 2: y1 = -1, y2 = 1 (R = -α1 + α2)
– If -C < R < 0: U = 0, V = C + R.
– If 0 <= R < C: U = R, V = C.
(Figure: the lines α2 = α1 + R drawn for R = -C, R = 0, R = C inside the box.)
Case 3: y1 = -1, y2 = -1 (-α1 - α2 = R <= 0)
– If -2C < R < -C: U = -R - C, V = C.
– If -C <= R < 0: U = 0, V = -R.
(Figure: the lines α1 + α2 = -R drawn for R = 0, R = -C, R = -2C inside the box.)
Case 4: y1 = 1, y2 = -1 (R = α1 - α2)
– If 0 <= R < C: U = 0, V = C - R.
– If -C < R < 0: U = -R, V = C.
(Figure: the lines α2 = α1 - R drawn for R = C, R = 0, R = -C inside the box.)
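The four cases above all reduce to the two formulas of Step 2; a compact helper for the bounds (the function name is illustrative):

def alpha2_bounds(alpha1, alpha2, y1, y2, C):
    """Bounds [U, V] keeping both multipliers in [0, C] while preserving
    y1*alpha1 + y2*alpha2 (covers Cases 1-4 above)."""
    if y1 != y2:                        # Cases 2 and 4: alpha2 - alpha1 is constant
        return max(0.0, alpha2 - alpha1), min(C, C + alpha2 - alpha1)
    else:                               # Cases 1 and 3: alpha1 + alpha2 is constant
        return max(0.0, alpha1 + alpha2 - C), min(C, alpha1 + alpha2)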
The Nonlinear Classifier
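For the nonlinear classifier, the only change to the dual machinery above is that the dot product x_i' * x_j in the Gram matrix is replaced by a kernel function k(x_i, x_j); the decision function keeps the same form f(x) = sum_i α_i*y_i*k(x_i, x) + b. A minimal NumPy sketch, assuming a Gaussian (RBF) kernel; the kernel choice and the parameter gamma are illustrative:

import numpy as np

def rbf_kernel(X1, X2, gamma=0.5):
    """Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    d2 = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2 * X1 @ X2.T)
    return np.exp(-gamma * d2)

def decision_function(x_new, X, y, alpha, b, gamma=0.5):
    """f(x) = sum_i alpha_i * y_i * k(x_i, x) + b, the dual-form decision function."""
    k = rbf_kernel(X, np.atleast_2d(np.asarray(x_new, dtype=float)), gamma)[:, 0]
    return float(np.dot(alpha * y, k) + b)

Training (Rosenblatt's algorithm or SMO) is unchanged; only the entries of K are computed with the kernel instead of the dot product.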