Published by Doreen Eaton. Modified over 9 years ago.
1
Non-Bayes classifiers. Linear discriminants, neural networks.
2
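Discriminant functions (1) Bayes classification rule: assign $x$ to class $\omega_1$ if $P(\omega_1|x) - P(\omega_2|x) > 0$, otherwise to class $\omega_2$. Instead, we might try to find a function $g(x)$ directly, with $g(x) > 0$ for class $\omega_1$ and $g(x) < 0$ for class $\omega_2$. $g(x)$ is called the discriminant function; $g(x) = 0$ is the decision surface.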
3
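Discriminant functions (2) [Figure: two plots of Class 1 and Class 2 samples, each separated by a decision surface.] Linear discriminant function: $g(x) = w^T x + w_0$. The decision surface $w^T x + w_0 = 0$ is a hyperplane.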
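A minimal sketch of a linear discriminant, assuming the usual form $g(x) = w^T x + w_0$ with class 1 assigned when $g(x) > 0$. The function names and the example weights are hypothetical and chosen only for illustration.

```python
def linear_discriminant(w, w0, x):
    """Return g(x) = w^T x + w0 for weight vector w, bias w0 and sample x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0


def classify(w, w0, x):
    """Assign class 1 if g(x) > 0, otherwise class 2 (assumed convention)."""
    return 1 if linear_discriminant(w, w0, x) > 0 else 2


# Hypothetical 2-D example: the decision surface is the line x1 + x2 - 1 = 0.
w, w0 = [1.0, 1.0], -1.0
labels = [classify(w, w0, x) for x in [(2.0, 2.0), (0.0, 0.0)]]
on_surface = linear_discriminant(w, w0, (0.5, 0.5))
```

Here `(2.0, 2.0)` falls on the positive side of the hyperplane and `(0.0, 0.0)` on the negative side, while `(0.5, 0.5)` lies on the decision surface itself.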
4
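Linear discriminant – perceptron cost function Replace $x \leftarrow [x^T, 1]^T$ and $w \leftarrow [w^T, w_0]^T$. Thus now the decision function is $g(x) = w^T x$ and the decision surface is $w^T x = 0$. Perceptron cost function: $J(w) = \sum_{x \in Y} \delta_x w^T x$, where $Y$ is the set of samples misclassified by $w$, $\delta_x = -1$ if $x \in \omega_1$ and $\delta_x = +1$ if $x \in \omega_2$.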
5
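Linear discriminant – perceptron cost function Perceptron cost function: $J(w) = \sum_{x \in Y} \delta_x w^T x$, with $Y$ the set of misclassified samples and $\delta_x = -1$ for class 1, $+1$ for class 2. [Figure: Class 1 and Class 2 samples with a decision surface.] The value of $J(w)$ is proportional to the sum of distances of all misclassified samples to the decision surface. If the discriminant function separates the classes perfectly, then $J(w) = 0$. Otherwise $J(w) > 0$, and we want to minimize it. $J(w)$ is continuous and piecewise linear, so we might try to use the gradient descent algorithm.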
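The cost described above can be sketched as follows. This assumes the standard perceptron cost $J(w) = \sum_{x \in Y} \delta_x w^T x$ over the misclassified set $Y$, with $\delta_x = -1$ for class 1 and $+1$ for class 2, and samples already augmented with a trailing 1. Treating samples that lie exactly on the surface as misclassified is an illustrative choice, and all names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def perceptron_cost(w, samples):
    """Perceptron cost for augmented samples.

    samples: list of (x, cls) with x ending in a constant 1 and cls in {1, 2}.
    Only misclassified samples contribute, each with a non-negative term.
    """
    cost = 0.0
    for x, cls in samples:
        g = dot(w, x)
        misclassified = (cls == 1 and g <= 0) or (cls == 2 and g >= 0)
        if misclassified:
            delta = -1.0 if cls == 1 else 1.0
            cost += delta * g
    return cost


samples = [((2.0, 2.0, 1.0), 1), ((0.0, 0.0, 1.0), 2)]
cost_good = perceptron_cost([1.0, 1.0, -1.0], samples)   # separates both samples
cost_bad = perceptron_cost([-1.0, -1.0, 1.0], samples)   # misclassifies both samples
```

With the separating weights the cost is zero; with the sign-flipped weights both samples contribute a positive term.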
6
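Linear discriminant – Perceptron algorithm Gradient descent: $w(t+1) = w(t) - \rho_t \left.\frac{\partial J(w)}{\partial w}\right|_{w = w(t)}$. At points where $J(w)$ is differentiable, $\frac{\partial J(w)}{\partial w} = \sum_{x \in Y} \delta_x x$, where $Y$ is the set of misclassified samples and $\delta_x = -1$ for class 1, $+1$ for class 2. Thus $w(t+1) = w(t) - \rho_t \sum_{x \in Y} \delta_x x$. The perceptron algorithm converges when the classes are linearly separable, with some conditions on $\rho_t$.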
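A minimal sketch of the batch perceptron update, assuming the update rule $w(t+1) = w(t) - \rho \sum_{x \in Y} \delta_x x$ with a constant step $\rho$. The data set, the step size and the iteration cap are hypothetical; samples are augmented with a trailing 1.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def train_perceptron(samples, rho=0.5, max_iters=1000):
    """Batch perceptron on augmented samples (x, cls), cls in {1, 2}.

    Stops as soon as no sample is misclassified, or after max_iters passes.
    """
    dim = len(samples[0][0])
    w = [0.0] * dim
    for _ in range(max_iters):
        grad = [0.0] * dim
        any_misclassified = False
        for x, cls in samples:
            g = dot(w, x)
            if cls == 1 and g <= 0:
                delta = -1.0
            elif cls == 2 and g >= 0:
                delta = 1.0
            else:
                continue  # correctly classified, contributes nothing
            any_misclassified = True
            grad = [gi + delta * xi for gi, xi in zip(grad, x)]
        if not any_misclassified:
            break
        w = [wi - rho * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical linearly separable data set.
samples = [
    ((2.0, 2.0, 1.0), 1), ((3.0, 1.0, 1.0), 1),
    ((0.0, 0.0, 1.0), 2), ((-1.0, 0.5, 1.0), 2),
]
w = train_perceptron(samples)
predicted = [1 if dot(w, x) > 0 else 2 for x, _ in samples]
```

On non-separable data this loop would simply run until `max_iters`, which is why the cap is there.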
7
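Sum of error squares estimation Want to find a discriminant function $g(x) = w^T x$ whose output is similar to the desired output. Let $y_i = y(x_i) = \pm 1$ denote the desired output: 1 for one class and –1 for the other. Use the sum of error squares as the similarity criterion: $J(w) = \sum_{i=1}^{N} (y_i - w^T x_i)^2$, $\hat{w} = \arg\min_w J(w)$.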
8
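Sum of error squares estimation Minimize mean square error $\frac{1}{N} \sum_{i=1}^{N} (y_i - w^T x_i)^2$: setting the gradient with respect to $w$ to zero gives $\sum_{i=1}^{N} x_i (y_i - x_i^T \hat{w}) = 0$. Thus $\hat{w} = \left(\sum_{i=1}^{N} x_i x_i^T\right)^{-1} \sum_{i=1}^{N} x_i y_i$.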
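The least squares weights can be sketched by solving the normal equations $\left(\sum_i x_i x_i^T\right) \hat{w} = \sum_i x_i y_i$, assuming the criterion is the sum of squared differences between $w^T x_i$ and targets $y_i = \pm 1$. The small Gaussian elimination helper and the data set are hypothetical and included only to keep the example self-contained.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def least_squares_weights(xs, ys):
    """Return w solving (sum_i x_i x_i^T) w = sum_i x_i y_i."""
    dim = len(xs[0])
    R = [[sum(x[a] * x[b] for x in xs) for b in range(dim)] for a in range(dim)]
    p = [sum(x[a] * y for x, y in zip(xs, ys)) for a in range(dim)]
    return solve(R, p)


# Hypothetical augmented samples; +1 for one class, -1 for the other.
xs = [(2.0, 2.0, 1.0), (3.0, 1.0, 1.0), (0.0, 0.0, 1.0), (-1.0, 0.5, 1.0)]
ys = [1.0, 1.0, -1.0, -1.0]
w = least_squares_weights(xs, ys)
outputs = [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]
```

Unlike the perceptron algorithm, this yields a solution whether or not the classes are linearly separable, provided the matrix $\sum_i x_i x_i^T$ is invertible.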
9
Neurons
10
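Artificial neuron. [Figure: inputs $x_1, \dots, x_l$ with weights $w_1, \dots, w_l$ feeding a summation followed by the function $f$.] The above figure represents an artificial neuron calculating: $y = f\left(\sum_{i=1}^{l} w_i x_i + w_0\right)$.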
11
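Artificial neuron. Threshold functions $f$: the step function, $f(v) = 1$ for $v > 0$ and $f(v) = 0$ for $v < 0$, and the logistic function, $f(v) = 1/(1 + e^{-v})$. [Figure: plots of the step function and the logistic function, both ranging from 0 to 1.]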
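A minimal sketch of a single artificial neuron with the two threshold functions named above, assuming the neuron computes $f$ applied to a weighted sum of its inputs plus a bias. The value of the step function at exactly 0 and the example weights (which realize a logical AND) are illustrative assumptions.

```python
import math


def step(v):
    """Step function: 1 for v > 0, otherwise 0 (value at v = 0 is a convention)."""
    return 1.0 if v > 0 else 0.0


def logistic(v):
    """Logistic function 1 / (1 + exp(-v)), a smooth version of the step."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron(weights, bias, inputs, f):
    """Output of one neuron: f(sum_i w_i * x_i + bias)."""
    return f(sum(w * x for w, x in zip(weights, inputs)) + bias)


# Hypothetical weights: with the step function this neuron acts as a logical AND.
and_outputs = [neuron([1.0, 1.0], -1.5, [a, b], step)
               for a in (0.0, 1.0) for b in (0.0, 1.0)]
soft_output = neuron([1.0, 1.0], -1.5, [1.0, 1.0], logistic)
```

The logistic neuron gives a value strictly between 0 and 1 for the same input, which is what makes it differentiable and usable with gradient-based training.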
12
Combining artificial neurons Multilayer perceptron with 3 layers.
14
Discriminating ability of multilayer perceptron Since a 3-layer perceptron can approximate any smooth function, it can approximate $F(x) = P(\omega_1|x) - P(\omega_2|x)$, the optimal discriminant function of two classes.
15
Training of multilayer perceptron [Figure: neurons of layer $r-1$ connected to neurons of layer $r$, each applying $f$.]
16
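Training and cost function Desired network output: $y(i)$ for training vector $x(i)$. Trained network output: $\hat{y}(i)$. Cost function for one training sample: $E(i) = \frac{1}{2} \sum_{m} (\hat{y}_m(i) - y_m(i))^2$. Total cost function: $J = \sum_{i=1}^{N} E(i)$. Goal of the training: find values of the weights $w_{jk}^r$ which minimize the cost function.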
17
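Gradient descent Denote: $w_j^r = [w_{j0}^r, w_{j1}^r, \dots, w_{jk_{r-1}}^r]^T$, the weights of neuron $j$ in layer $r$, where $k_{r-1}$ is the number of neurons in layer $r-1$. Gradient descent: $w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \frac{\partial J}{\partial w_j^r}$. Since $J = \sum_{i=1}^{N} E(i)$, where $E(i)$ is the cost for training sample $i$, we might want to update weights after processing each training sample separately: $w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \frac{\partial E(i)}{\partial w_j^r}$.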
18
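Gradient descent Chain rule for differentiating composite functions: $\frac{\partial E(i)}{\partial w_j^r} = \frac{\partial E(i)}{\partial v_j^r(i)} \frac{\partial v_j^r(i)}{\partial w_j^r} = \delta_j^r(i) \, y^{r-1}(i)$, where $v_j^r(i) = (w_j^r)^T y^{r-1}(i)$ is the input to the function $f$ of neuron $j$ in layer $r$ and $y^{r-1}(i)$ is the output vector of layer $r-1$, extended with a constant 1 for the bias weight. Denote: $\delta_j^r(i) = \frac{\partial E(i)}{\partial v_j^r(i)}$.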
19
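Backpropagation If $r = L$, then $\delta_j^L(i) = (\hat{y}_j(i) - y_j(i)) \, f'(v_j^L(i))$. If $r < L$, then $\delta_j^r(i) = \left(\sum_k \delta_k^{r+1}(i) \, w_{kj}^{r+1}\right) f'(v_j^r(i))$. Here $v_j^r(i)$ is the input to $f$ in neuron $j$ of layer $r$, and $\hat{y}(i)$, $y(i)$ are the actual and desired network outputs.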
20
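Backpropagation algorithm Initialization: initialize all weights with random values. Forward computations: for each training vector $x(i)$ compute all $v_j^r(i)$ and $y_j^r(i) = f(v_j^r(i))$. Backward computations: compute $\delta_j^L(i)$, then for each $i$, $j$ and $r = L, L-1, \dots, 2$ compute $\delta_j^{r-1}(i)$. Update weights: $w_j^r(\text{new}) = w_j^r(\text{old}) - \mu \sum_{i} \delta_j^r(i) \, y^{r-1}(i)$.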
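The four steps above can be sketched for a small network as follows. This is an illustration under stated assumptions, not the presenter's implementation: it assumes the logistic function for $f$, a squared-error cost of one half the sum of squared output errors per sample, and weight updates after each training sample. The layer sizes, the learning parameter `mu`, the OR data set and all function names are hypothetical.

```python
import math
import random


def f(v):
    """Logistic threshold function (assumed choice of f)."""
    return 1.0 / (1.0 + math.exp(-v))


def f_prime(v):
    """Derivative of the logistic function."""
    s = f(v)
    return s * (1.0 - s)


def init_weights(sizes, rng):
    """Random weights; weights[r][j] belongs to neuron j, last entry is the bias."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(sizes[r] + 1)]
             for _ in range(sizes[r + 1])]
            for r in range(len(sizes) - 1)]


def forward(weights, x):
    """Forward computations: return inputs v and outputs y of every layer."""
    vs, ys = [], [list(x)]
    for layer in weights:
        y_prev = ys[-1] + [1.0]  # constant 1 multiplies the bias weight
        v = [sum(w * y for w, y in zip(wj, y_prev)) for wj in layer]
        vs.append(v)
        ys.append([f(vj) for vj in v])
    return vs, ys


def sample_cost(weights, x, target):
    """Cost for one sample: half the sum of squared output errors."""
    _, ys = forward(weights, x)
    return 0.5 * sum((o - t) ** 2 for o, t in zip(ys[-1], target))


def backward(weights, x, target):
    """Backward computations: gradient of the sample cost w.r.t. every weight."""
    vs, ys = forward(weights, x)
    last = len(weights) - 1
    deltas = [None] * len(weights)
    deltas[last] = [(o - t) * f_prime(v)
                    for o, t, v in zip(ys[-1], target, vs[-1])]
    for r in range(last - 1, -1, -1):
        deltas[r] = [
            sum(deltas[r + 1][k] * weights[r + 1][k][j]
                for k in range(len(weights[r + 1]))) * f_prime(vs[r][j])
            for j in range(len(weights[r]))
        ]
    return [[[d * y for y in ys[r] + [1.0]] for d in deltas[r]]
            for r in range(len(weights))]


def train(weights, data, mu=0.5, epochs=1000):
    """Update weights in place after each training sample."""
    for _ in range(epochs):
        for x, target in data:
            grads = backward(weights, x, target)
            for r, layer in enumerate(weights):
                for j, wj in enumerate(layer):
                    for k in range(len(wj)):
                        wj[k] -= mu * grads[r][j][k]
    return weights


# Hypothetical training set: logical OR of two inputs.
data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
        ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
weights = init_weights([2, 2, 1], random.Random(0))
cost_before = sum(sample_cost(weights, x, t) for x, t in data)
train(weights, data)
cost_after = sum(sample_cost(weights, x, t) for x, t in data)
```

A common sanity check for code like this is to compare the gradient returned by `backward` with a finite-difference estimate of `sample_cost`; the two should agree to several decimal places.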
21
MLP issues What is the best network configuration? How to choose a proper learning parameter $\mu$? When should training be stopped? Should we choose another threshold function $f$ or cost function $J$?