
1 Support Vector Machines Classification

2 Fundamental Problems in Learning
Classification problems (supervised learning):
- Test the classifier on fresh data to evaluate success.
- Classification results are objective.
- Methods: decision trees, neural networks, support vector machines, k-nearest neighbor, Naive Bayes, etc.
Feature selection:
- Too many features can degrade generalization performance (the curse of dimensionality).
- Occam's razor: the simplest model is the best.

3 Bankruptcy Prediction
Binary classification of firms: solvent vs. bankrupt. The data are financial indicators from middle-market capitalization firms in the Benelux. Of 422 firms in total, 74 went bankrupt and 348 remained solvent. The explanatory inputs to the model are 40 financial indicators, such as liquidity, profitability, and solvency measurements.
T. Van Gestel, B. Baesens, J. A. K. Suykens, M. Espinoza, D. Baestaens, J. Vanthienen and B. De Moor, "Bankruptcy Prediction with Least Squares Support Vector Machine Classifiers", International Conference on Computational Intelligence for Financial Engineering, 2003.

4 Binary Classification Problem: Linearly Separable Case
Which separating plane is the best? [Figure: solvent firms (A+) and bankrupt firms (A-) as two point clouds, with several candidate separating lines drawn between them.]

5 Perceptron
Linear threshold unit (LTU) with inputs $x_1, \dots, x_n$, weights $w_0, w_1, \dots, w_n$, and a fixed bias input $x_0 = 1$:
$$o(\mathbf{x}) = g\!\left(\sum_{i=0}^{n} w_i x_i\right) = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$$
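This unit is small enough to state directly in code. A minimal sketch (NumPy; the example weights and input are made up, not from the slides):

```python
import numpy as np

def ltu(w, x):
    """Linear threshold unit: returns 1 if sum_i w_i * x_i > 0, else -1.
    Both w and x include the bias position (x[0] must be 1)."""
    return 1 if np.dot(w, x) > 0 else -1

# Hypothetical example: bias weight w0 = -0.5, w1 = w2 = 1.0
w = np.array([-0.5, 1.0, 1.0])
x = np.array([1.0, 0.3, 0.4])   # x[0] = 1 is the fixed bias input
print(ltu(w, x))                # -0.5 + 0.3 + 0.4 = 0.2 > 0, so prints 1
```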

6 Possibilities for function g
Sign function: $\operatorname{sign}(x) = +1$ if $x > 0$, and $-1$ if $x \le 0$.
Step function: $\operatorname{step}(x) = 1$ if $x > \text{threshold}$, and $0$ if $x \le \text{threshold}$ (threshold $= 0$ in the figure above).
Sigmoid (logistic) function: $\operatorname{sigmoid}(x) = 1/(1 + e^{-x})$.
Adding an extra input with activation $x_0 = 1$ and weight $w_0 = -T$ (called the bias weight) is equivalent to having a threshold at $T$. This way we can always assume a threshold of 0.
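For concreteness, the three candidates for $g$ can be written out as follows (a minimal sketch; the default threshold of 0 matches the slide's convention):

```python
import numpy as np

def sign(x):
    return 1.0 if x > 0 else -1.0           # +1 above zero, -1 otherwise

def step(x, threshold=0.0):
    return 1.0 if x > threshold else 0.0    # hard 0/1 threshold

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))         # smooth squashing into (0, 1)
```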

7 Using a Bias Weight to Standardize the Threshold
Thresholding at $T$ is equivalent to thresholding at 0 once we add a constant input $x_0 = 1$ with bias weight $w_0 = -T$:
$$w_1 x_1 + w_2 x_2 < T \iff w_1 x_1 + w_2 x_2 - T < 0$$
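This equivalence is easy to check numerically. A small sketch with made-up weights and threshold:

```python
# Made-up weights, threshold, and input
w1, w2, T = 0.4, 0.7, 0.5
x1, x2 = 1.0, 0.2

lhs = (w1 * x1 + w2 * x2) < T                 # test against threshold T
rhs = (-T * 1.0 + w1 * x1 + w2 * x2) < 0.0    # bias input x0 = 1 with weight -T
assert lhs == rhs                             # the two tests always agree
```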

8 Perceptron Learning Rule
The perceptron learning rule updates each weight by $w_i \leftarrow w_i + \eta\,(t - o)\,x_i$, where $t$ is the target label, $o$ is the current output, and $\eta$ is the learning rate; nothing changes when the prediction is correct.
[Figure: a worked trace of the rule on the training points $(\mathbf{x}, t) = ([2,1], -1)$, $([-1,-1], 1)$, and $([1,1], 1)$, showing the weight vector (e.g. $\mathbf{w} = [0.2, -0.2, -0.2]$, giving the decision boundary $x_2 = 0.2\,x_1 - 0.5$, and later $\mathbf{w} = [-0.2, -0.4, -0.2]$ and the region $-0.5 x_1 + 0.3 x_2 + 0.45 > 0 \Rightarrow o = 1$) shifting after each misclassified point.]
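A single application of the rule is compact in code (a sketch; the learning rate value is an assumption, not taken from the slide):

```python
import numpy as np

def perceptron_update(w, x, t, eta=0.1):
    """One learning step: w <- w + eta * (t - o) * x, with o = sign(w . x).
    When the prediction is already correct (t == o), w is unchanged."""
    o = 1 if np.dot(w, x) > 0 else -1
    return w + eta * (t - o) * x
```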

9 The Perceptron Algorithm (Rosenblatt, 1956)
Given a linearly separable training set $S = \{(\mathbf{x}_i, y_i)\}_{i=1}^{\ell}$ and a learning rate $\eta \in \mathbb{R}^+$, initialize the weight vector and bias to $\mathbf{w}_0 = \mathbf{0}$, $b_0 = 0$, set $k = 0$, and let $R = \max_{1 \le i \le \ell} \|\mathbf{x}_i\|$.

10 The Perceptron Algorithm (Primal Form)
Repeat:
  for $i = 1$ to $\ell$:
    if $y_i(\langle \mathbf{w}_k, \mathbf{x}_i \rangle + b_k) \le 0$ then
      $\mathbf{w}_{k+1} = \mathbf{w}_k + \eta\, y_i \mathbf{x}_i$
      $b_{k+1} = b_k + \eta\, y_i R^2$
      $k = k + 1$
until no mistakes are made within the for loop.
Return $(\mathbf{w}_k, b_k)$. What is $k$? It is the total number of mistakes, i.e. the number of updates performed.
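A sketch of the primal form in Python, mirroring the pseudocode above; the toy data set at the bottom is made up for illustration:

```python
import numpy as np

def perceptron_primal(X, y, eta=1.0, max_epochs=1000):
    """Primal perceptron. X: (l, n) array of points; y: labels in {-1, +1}.
    Returns (w, b, k), where k counts the mistakes (= number of updates)."""
    l, n = X.shape
    w, b, k = np.zeros(n), 0.0, 0
    R = np.max(np.linalg.norm(X, axis=1))
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(l):
            if y[i] * (np.dot(w, X[i]) + b) <= 0:   # point i is misclassified
                w += eta * y[i] * X[i]
                b += eta * y[i] * R ** 2
                k += 1
                mistakes += 1
        if mistakes == 0:                           # one clean pass over S: done
            return w, b, k
    return w, b, k

# Made-up, linearly separable toy data
X = np.array([[2.0, 2.0], [1.5, 2.5], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
w, b, k = perceptron_primal(X, y)
print(w, b, k)
```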


12 The Perceptron Algorithm
(STOP in Finite Steps) Theorem (Novikoff). Let $S$ be a non-trivial training set with $R = \max_{1 \le i \le \ell} \|\mathbf{x}_i\|$. Suppose that there exist a vector $\mathbf{w}_{\mathrm{opt}}$ with $\|\mathbf{w}_{\mathrm{opt}}\| = 1$ and a bias $b_{\mathrm{opt}}$ such that $y_i(\langle \mathbf{w}_{\mathrm{opt}}, \mathbf{x}_i \rangle + b_{\mathrm{opt}}) \ge \gamma$ for all $1 \le i \le \ell$. Then the number of mistakes made by the on-line perceptron algorithm on $S$ is at most $(2R/\gamma)^2$.
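The bound can be checked numerically on the toy example from the primal sketch above, using an assumed unit-norm separator (illustrative only):

```python
import numpy as np

# Continuing the toy example: w_opt = [1, 1]/sqrt(2) with b_opt = 0 is an
# assumed unit-norm separator for (X, y); gamma is its margin on the data.
w_opt = np.array([1.0, 1.0]) / np.sqrt(2.0)
gamma = np.min(y * (X @ w_opt))               # smallest functional margin
R = np.max(np.linalg.norm(X, axis=1))
print(k, "<=", (2 * R / gamma) ** 2)          # Novikoff: k <= (2R/gamma)^2
```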

13 Proof of Finite Termination
Proof: Let $\hat{\mathbf{w}} = (\mathbf{w}^\top, b/R)^\top$ and $\hat{\mathbf{x}} = (\mathbf{x}^\top, R)^\top$ denote the augmented weight and input vectors. The algorithm starts with the augmented weight vector $\hat{\mathbf{w}}_0 = \mathbf{0}$ and updates it at each mistake. Let $\hat{\mathbf{w}}_{k-1}$ be the augmented weight vector prior to the $k$-th mistake. The $k$-th update is performed when $y_i \langle \hat{\mathbf{w}}_{k-1}, \hat{\mathbf{x}}_i \rangle \le 0$, where $(\mathbf{x}_i, y_i)$ is the point incorrectly classified by $\hat{\mathbf{w}}_{k-1}$.

14 Update Rule of Perceptron
The $k$-th update is $\hat{\mathbf{w}}_k = \hat{\mathbf{w}}_{k-1} + \eta\, y_i \hat{\mathbf{x}}_i$. Taking the inner product with $\hat{\mathbf{w}}_{\mathrm{opt}}$ gives
$$\langle \hat{\mathbf{w}}_k, \hat{\mathbf{w}}_{\mathrm{opt}} \rangle = \langle \hat{\mathbf{w}}_{k-1}, \hat{\mathbf{w}}_{\mathrm{opt}} \rangle + \eta\, y_i \langle \hat{\mathbf{x}}_i, \hat{\mathbf{w}}_{\mathrm{opt}} \rangle \ge \langle \hat{\mathbf{w}}_{k-1}, \hat{\mathbf{w}}_{\mathrm{opt}} \rangle + \eta\gamma,$$
so by induction $\langle \hat{\mathbf{w}}_k, \hat{\mathbf{w}}_{\mathrm{opt}} \rangle \ge k\eta\gamma$. Similarly,
$$\|\hat{\mathbf{w}}_k\|^2 = \|\hat{\mathbf{w}}_{k-1}\|^2 + 2\eta\, y_i \langle \hat{\mathbf{w}}_{k-1}, \hat{\mathbf{x}}_i \rangle + \eta^2 \|\hat{\mathbf{x}}_i\|^2 \le \|\hat{\mathbf{w}}_{k-1}\|^2 + 2\eta^2 R^2,$$
since the middle term is non-positive at a mistake and $\|\hat{\mathbf{x}}_i\|^2 \le 2R^2$; by induction, $\|\hat{\mathbf{w}}_k\|^2 \le 2k\eta^2 R^2$.

15 Update Rule of Perceptron
Combining the two bounds, $k\eta\gamma \le \langle \hat{\mathbf{w}}_k, \hat{\mathbf{w}}_{\mathrm{opt}} \rangle \le \|\hat{\mathbf{w}}_k\|\,\|\hat{\mathbf{w}}_{\mathrm{opt}}\| \le \sqrt{2k}\,\eta R\,\|\hat{\mathbf{w}}_{\mathrm{opt}}\|$. For a non-trivial training set we may take $b_{\mathrm{opt}} \le R$, so $\|\hat{\mathbf{w}}_{\mathrm{opt}}\|^2 = \|\mathbf{w}_{\mathrm{opt}}\|^2 + b_{\mathrm{opt}}^2/R^2 \le 2$. Hence $k\gamma \le 2R\sqrt{k}$, i.e. $k \le (2R/\gamma)^2$. $\square$

16 The Perceptron Algorithm (Dual Form)
Given a linearly separable training set $S$, initialize $\boldsymbol{\alpha} = \mathbf{0}$, $b = 0$, and let $R = \max_{1 \le i \le \ell} \|\mathbf{x}_i\|$.
Repeat:
  for $i = 1$ to $\ell$:
    if $y_i\left(\sum_{j=1}^{\ell} \alpha_j y_j \langle \mathbf{x}_j, \mathbf{x}_i \rangle + b\right) \le 0$ then
      $\alpha_i = \alpha_i + 1$
      $b = b + y_i R^2$
until no mistakes are made within the for loop.
Return $(\boldsymbol{\alpha}, b)$.
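A matching sketch of the dual form; the data enter only through inner products, which the precomputed Gram matrix makes explicit:

```python
import numpy as np

def perceptron_dual(X, y, max_epochs=1000):
    """Dual perceptron. Keeps one counter alpha_i per training point;
    the data appear only through the Gram matrix of inner products."""
    l = X.shape[0]
    alpha, b = np.zeros(l), 0.0
    R = np.max(np.linalg.norm(X, axis=1))
    G = X @ X.T                                  # G[i, j] = <x_i, x_j>
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(l):
            # Decision value: sum_j alpha_j * y_j * <x_j, x_i> + b
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1                    # one more mistake on point i
                b += y[i] * R ** 2
                mistakes += 1
        if mistakes == 0:
            return alpha, b
    return alpha, b
```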

17 What Do We Get from the Dual Form Perceptron Algorithm?
The number of updates equals $\sum_{i=1}^{\ell} \alpha_i = \|\boldsymbol{\alpha}\|_1 \le (2R/\gamma)^2$. $\alpha_i > 0$ implies that the training point $\mathbf{x}_i$ was misclassified at least once during training. $\alpha_i = 0$ implies that removing the training point $\mathbf{x}_i$ will not affect the final result. The training data appear in the algorithm only through the entries of the Gram matrix, defined by $G_{ij} = \langle \mathbf{x}_i, \mathbf{x}_j \rangle$.
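These observations can be verified directly on the toy data from the earlier sketches:

```python
alpha, b = perceptron_dual(X, y)
print("total updates:", alpha.sum())          # equals the number of mistakes made
print("never misclassified:", np.where(alpha == 0)[0])
print("Gram matrix:\n", X @ X.T)              # all the algorithm ever sees of X
```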

