Slide 1: This whole paper is about...
1) Objective: we want to minimize the number of misclassified examples on the training data (the Minimum Classification Error objective, MCE).
2) Write down the objective as a differentiable function.
3) Perform gradient descent on it.
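A compact restatement of the three steps (here Lambda collects all trainable parameters, 1{...} is the indicator function, and l(x_n; Lambda) is the smooth per-example loss the paper constructs):

\[
\underbrace{\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}\{x_n\ \text{misclassified}\}}_{\text{(1) MCE objective}}
\;\approx\;
\underbrace{L(\Lambda)=\frac{1}{N}\sum_{n=1}^{N}\ell(x_n;\Lambda)}_{\text{(2) differentiable surrogate}},
\qquad
\underbrace{\Lambda \leftarrow \Lambda-\epsilon\,\nabla_{\Lambda}L(\Lambda)}_{\text{(3) gradient descent}}
\]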
Slide 2: Some notation
* means transpose.
g_i(x; lambda_i) := discriminant for class i, where i is one of M classes.
x := one of N observations.
lambda_i = [w_i, w_0i] := trainable parameters for the i-th discriminant.
y = [x 1] (the observation augmented with a constant 1), so a linear discriminant is g_i(x) = lambda_i* y = w_i* x + w_0i.
Classification rule: assign x to the class k whose discriminant g_k(x) is largest.
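A minimal numpy sketch of this notation (the function names and the linear-discriminant form are mine, for illustration):

import numpy as np

def discriminants(x, lambdas):
    """g_i(x) = lambda_i* y, where y = [x 1] is the augmented observation.
    lambdas: (M, K+1) array, one row [w_i, w_0i] per class."""
    y = np.append(x, 1.0)            # y = [x 1]
    return lambdas @ y               # vector of g_1(x), ..., g_M(x)

def classify(x, lambdas):
    """Classification rule: pick the class whose discriminant is largest."""
    return int(np.argmax(discriminants(x, lambdas)))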
Slide 3: Other objectives besides MCE
Notation assumes two classes (M = 2), but the argument is true for M > 2. Such as:
- Minimum squared error (with target margins b_i > 0)
- Perceptron
- Selective squared distance
- Some others (e.g. SVM, large-margin perceptron)
Slide 4: Other objectives besides MCE
Notation assumes two classes (M = 2), but the argument is true for M > 2. Linearly separable case:
- Minimum squared error (target margins b_i > 0): converges to MCE (Ho-Kashyap procedure)
- Perceptron: converges to MCE
- Selective squared distance: converges to MCE
- Some others (e.g. SVM, large-margin perceptron): converge to MCE
Slide 5: Other objectives besides MCE
Notation assumes two classes (M = 2), but the argument is true for M > 2. Non-linearly separable case:
- Minimum squared error (target margins b_i > 0): converges, but not to MCE in general
- Perceptron: does not converge
- Selective squared distance: does not converge
- Some others (e.g. SVM, large-margin perceptron): converge, but not to MCE in general
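For reference, the textbook two-class forms of the first two criteria above (not the slides' own formulas): a is the weight vector, the y_i are the augmented samples with class-2 samples negated, b_i > 0 are the target margins, and * denotes transpose as on slide 2.

\[
J_{\mathrm{MSE}}(a)=\sum_{i=1}^{N}\bigl(a^{*}y_i-b_i\bigr)^{2},
\qquad
J_{\mathrm{perceptron}}(a)=\sum_{y_i\ \mathrm{misclassified}}\bigl(-a^{*}y_i\bigr)
\]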
Slide 6: Paper's key contribution
Encode the classification decision rule in a differentiable function that is (approximately) 0 when x is classified correctly and 1 when x is classified incorrectly (a smooth version of the per-example 0-1 loss function).
Slide 7: Paper's key contribution
Encode the classification decision rule in a differentiable function that is (approximately) 0 when x is classified correctly and 1 when x is classified incorrectly (a smooth version of the per-example 0-1 loss function). Then we can optimize by gradient descent to directly minimize this function and, with a small extra step, the MCE on the training dataset.
Slide 8: Paper's key contribution
Encode the classification decision rule in three steps (see the sketch below):
1) Define a differentiable misclassification measure for class k on example x.
2) Convert the misclassification measure into a value near 0 when x is correctly classified and near 1 when it is misclassified.
3) Combine the per-class values from step 2 into a single 0-1-style loss function.
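A minimal numpy sketch of the three steps. The exact functional forms used here (a smooth maximum over the competing discriminants, the sigmoid constants) are illustrative choices, not necessarily the paper's own:

import numpy as np

def misclassification_measure(g, k, eta=4.0):
    """Step 1: d_k(x) compares the true class's discriminant g_k with a smooth
    maximum of the competing discriminants. d_k < 0 roughly means 'correctly
    classified', d_k > 0 'misclassified'. (Illustrative smooth-max variant.)"""
    others = np.delete(g, k)
    return -g[k] + np.log(np.mean(np.exp(eta * others))) / eta

def smooth_01_loss(d, gamma=1.0, theta=0.0):
    """Step 2: squash d_k through a sigmoid, so the value is near 0 when the
    example is correctly classified and near 1 when it is misclassified."""
    return 1.0 / (1.0 + np.exp(-(gamma * d + theta)))

def example_loss(g, true_class):
    """Step 3: the per-example loss; only the measure built from the example's
    true class contributes, giving a single 0-1-style loss per example."""
    return smooth_01_loss(misclassification_measure(g, true_class))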
Slide 9: From example 0-1 loss to MCE on the training set
Empirical average cost: the average, over the N training examples, of the per-example smooth 0-1 loss (each example contributes the loss built from its true class's misclassification measure). This average approximates the fraction of misclassified training examples, i.e. the MCE objective.
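Continuing the sketch above, the empirical average cost over a training set (discriminants() is the linear example from slide 2; any parametric discriminant would do):

def empirical_cost(X, labels, lambdas):
    """Average per-example smooth 0-1 loss over the N training examples;
    driving this down by gradient descent drives down the number of
    misclassified training examples."""
    losses = [example_loss(discriminants(x, lambdas), k)
              for x, k in zip(X, labels)]
    return float(np.mean(losses))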
Slide 10: Application: MCE multi-layer perceptron
M outputs (one per class), K inputs. Traditional objective: minimize the squared error between the network outputs and the target labels. Instead, minimize the empirical average cost from the previous slide (slide 9).
Slide 11: Application: MCE multi-layer perceptron
Traditional objective: minimize the squared error between the network outputs and the targets. Instead, minimize the empirical MCE cost from slide 9. The non-linearity on the output nodes is removed; error back-propagation on the internal nodes remains exactly the same. Results? Crazy good on the Iris data (3 classes).
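A hedged sketch of what the MCE MLP objective looks like, reusing example_loss from the earlier sketch (single hidden layer with sigmoid units and linear outputs as the slide describes; the parameter names and shapes are my assumptions):

import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """One hidden layer with sigmoid units; *linear* output nodes (the output
    non-linearity is removed). The M outputs play the role of g_1(x)...g_M(x)."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))
    return W2 @ h + b2

def mce_mlp_objective(X, labels, W1, b1, W2, b2):
    """Instead of the traditional squared error between outputs and targets,
    average the smooth 0-1 loss from slide 9 over the training set. Error
    back-propagation through the hidden layer is unchanged; only the error
    signal injected at the output nodes differs."""
    losses = [example_loss(mlp_forward(x, W1, b1, W2, b2), k)
              for x, k in zip(X, labels)]
    return float(np.mean(losses))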
Slide 12: More results
MCE also beats the perceptron and minimum-squared-error classifiers on the Iris task, and on a 2-class problem with each class generated from a mixture of 2 Gaussians. It also beats improved variants of dynamic time warping on an isolated-word classification task (vocabulary: the letters b, c, d, e, g, p, t, v, z).
Slide 13: Future work idea
Hack quicknet to do MCE training of MLPs for tandem models.
Slide 14: The end
Slide 15: Min Squared Error
Slide 16: Discriminative Learning for Minimum Error Classification
Biing-Hwang Juang, Shigeru Katagiri. Presented by Arthur Kantor.