Slide 1: Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology
Mark Hasegawa-Johnson (jhasegaw@uiuc.edu), University of Illinois at Urbana-Champaign, USA
Slide 2: Lecture 4: Hyperplanes, Perceptrons, and Kernel-Based Classifiers
Definition: Hyperplane Classifier
Minimum Classification Error Training Methods
–Empirical risk
–Differentiable estimates of the 0-1 loss function
–Error backpropagation
Kernel Methods
–Nonparametric expression of a hyperplane
–Mathematical properties of a dot product
–Kernel-based classifier
–The implied high-dimensional space
–Error backpropagation for a kernel-based classifier
Useful Kernels
–Polynomial kernel
–RBF kernel
Slide 3: Classifier Terminology
Slide 4: Hyperplane Classifier
Class boundary ("separatrix"): the plane wᵀx = b, with normal vector w, at distance b from the origin (x = 0) when w has unit norm.
[Figure: scatter of training points on either side of the separating hyperplane.]
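A minimal sketch of this decision rule in code, assuming the sign convention h(x) = +1 on the side the normal vector points toward (the function name is illustrative):

```python
import numpy as np

def hyperplane_classify(w, b, x):
    """Hyperplane classifier: +1 on the side of the plane w.T x = b
    that the normal vector w points toward, -1 on the other side."""
    return 1 if w @ x > b else -1

# Example: normal vector (1, 1), offset b = 1.
w, b = np.array([1.0, 1.0]), 1.0
print(hyperplane_classify(w, b, np.array([2.0, 2.0])))    # +1
print(hyperplane_classify(w, b, np.array([-1.0, -1.0])))  # -1
```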
Slide 5: Loss, Risk, and Empirical Risk
Slide 6: Empirical Risk with 0-1 Loss Function = Error Rate on Training Data
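A short numerical check of this identity, assuming ±1 labels: averaging the 0-1 loss over the training set gives exactly the training error rate.

```python
import numpy as np

def zero_one_loss(y_true, y_pred):
    # 0-1 loss: 1 for a mistake, 0 for a correct label.
    return (y_true != y_pred).astype(float)

y_true = np.array([+1, +1, -1, -1, +1])
y_pred = np.array([+1, -1, -1, +1, +1])

empirical_risk = zero_one_loss(y_true, y_pred).mean()
error_rate = np.mean(y_true != y_pred)
print(empirical_risk, error_rate)  # both 0.4: two mistakes out of five
```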
Slide 7: Differentiable Approximations of the 0-1 Loss Function: Hinge Loss
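A sketch of the hinge loss, one differentiable (piecewise-linear) surrogate for the 0-1 loss, assuming labels y ∈ {−1, +1} and discriminant g(x) = wᵀx − b:

```python
import numpy as np

def hinge_loss(y, g):
    """Hinge loss max(0, 1 - y*g): zero only when the example is
    classified correctly with margin at least 1; upper-bounds 0-1 loss."""
    return np.maximum(0.0, 1.0 - y * g)

g = np.linspace(-2, 2, 5)    # discriminant values
print(hinge_loss(+1, g))     # [3. 2. 1. 0. 0.]: falls to 0 once y*g >= 1
```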
Slide 9: Differentiable Empirical Risks
Slide 10: Error Backpropagation: Hyperplane Classifier with Sigmoidal Loss
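One way this training step can look in code: a sketch assuming the sigmoidal loss σ(−y(wᵀx − b)) with labels y ∈ {−1, +1}; the learning-rate parameter lr is an illustrative choice.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sgd_step(w, b, x, y, lr=0.1):
    """One backpropagation step for the loss sigmoid(-y * (w @ x - b))."""
    s = sigmoid(-y * (w @ x - b))   # the loss itself
    ds = s * (1.0 - s)              # sigmoid derivative at the same point
    grad_w = -y * ds * x            # d(loss)/dw by the chain rule
    grad_b = y * ds                 # d(loss)/db
    return w - lr * grad_w, b - lr * grad_b
```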
Slide 11: Sigmoidal Classifier = Hyperplane Classifier with Fuzzy Boundaries
[Figure: two point clouds with a soft boundary; class confidence shades gradually from "more red" through "less red" and "less blue" to "more blue" across the boundary.]
Slide 12: Error Backpropagation: Sigmoidal Classifier with Absolute Loss
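By comparison, a sketch of the gradient when the sigmoid moves into the classifier, h(x) = σ(wᵀx − b), and the loss is the absolute error |y − h(x)|; targets y ∈ {0, 1} are an assumption here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def abs_loss_grad(w, b, x, y):
    """Gradient of |y - h(x)| for h(x) = sigmoid(w @ x - b), y in {0, 1}."""
    h = sigmoid(w @ x - b)
    dh = h * (1.0 - h)                # sigmoid derivative
    sign = np.sign(h - y)             # d|h - y|/dh
    return sign * dh * x, -sign * dh  # (d/dw, d/db)
```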
Slide 13: Sigmoidal Classifier: Signal Flow Diagram
[Figure: inputs x₁, x₂, x₃ are scaled by connection weights w₁, w₂, w₃ and summed to form the sigmoid input g(x); the sigmoid's output is the hypothesis h(x).]
Slide 14: Multilayer Perceptron
[Figure: the input h₀(x) ≡ x passes through connection weights and biases b₁₁–b₁₃ to the first-layer sigmoid inputs g₁(x) and sigmoid outputs h₁(x), then through a second layer of connection weights and bias b₂₁ to the sigmoid input g₂(x), producing the hypothesis h₂(x).]
Slide 15: Multilayer Perceptron: Classification Equations
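A forward-pass sketch following the slide's notation (h₀(x) ≡ x, sigmoid inputs gₖ, sigmoid outputs hₖ); the layer sizes and random weights below are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, W1, b1, W2, b2):
    """Two-layer perceptron: h0 = x, g1 = W1 h0 + b1, h1 = sigmoid(g1),
    g2 = W2 h1 + b2, h2 = sigmoid(g2)."""
    h1 = sigmoid(W1 @ x + b1)
    h2 = sigmoid(W2 @ h1 + b2)
    return h1, h2

rng = np.random.default_rng(0)
x = rng.standard_normal(3)      # three inputs, as in the diagram
W1, b1 = rng.standard_normal((3, 3)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((1, 3)), rng.standard_normal(1)
h1, h2 = mlp_forward(x, W1, b1, W2, b2)
print(h2)                       # hypothesis h2(x), a value in (0, 1)
```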
Slide 16: Error Backpropagation for a Multilayer Perceptron
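A backpropagation sketch for that two-layer network, assuming squared loss ½(y − h₂(x))²: the output-layer error is propagated back through the hidden layer by the chain rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_backprop(x, y, W1, b1, W2, b2):
    """Gradients of 0.5 * (y - h2)**2 for the two-layer MLP above."""
    h1 = sigmoid(W1 @ x + b1)
    h2 = sigmoid(W2 @ h1 + b2)
    delta2 = (h2 - y) * h2 * (1 - h2)         # output-layer error
    delta1 = (W2.T @ delta2) * h1 * (1 - h1)  # backpropagated to layer 1
    return (np.outer(delta1, x), delta1,      # dL/dW1, dL/db1
            np.outer(delta2, h1), delta2)     # dL/dW2, dL/db2
```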
Slide 17: Classification Power of a One-Layer Perceptron
Slide 18: Classification Power of a Two-Layer Perceptron
Slide 19: Classification Power of a Three-Layer Perceptron
Slide 20: Output of a Multilayer Perceptron Is an Approximation of the Posterior Probability
Slide 21: Kernel-Based Classifiers
Slide 22: Representation of a Hyperplane in Terms of Arbitrary Vectors
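A numerical sketch of this nonparametric representation: any normal vector w can be written as a weighted sum of arbitrary spanning vectors zₙ, so the discriminant wᵀx becomes a weighted sum of the dot products zₙᵀx (the variable names Z and a are illustrative).

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(3)        # an arbitrary normal vector
Z = rng.standard_normal((4, 3))   # four arbitrary vectors spanning R^3

# Solve w = Z.T @ a for the weights a (least squares; exact when Z spans R^3).
a, *_ = np.linalg.lstsq(Z.T, w, rcond=None)

x = rng.standard_normal(3)
print(w @ x, a @ (Z @ x))         # identical: w.T x = sum_n a_n (z_n.T x)
```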
Slide 23: Kernel-Based Classifier
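A sketch of the resulting classifier, which replaces each dot product zₙᵀx with a kernel evaluation K(zₙ, x); the linear kernel below is a placeholder showing that the plain hyperplane classifier is recovered as a special case.

```python
import numpy as np

def kernel_classify(a, Z, b, x, kernel):
    """Kernel-based classifier: h(x) = sign(sum_n a_n K(z_n, x) - b)."""
    k = np.array([kernel(z, x) for z in Z])  # kernel evaluations K(z_n, x)
    return np.sign(a @ k - b)

def linear(u, v):
    return u @ v   # K(u, v) = u.T v recovers the hyperplane classifier

rng = np.random.default_rng(2)
a, Z = rng.standard_normal(4), rng.standard_normal((4, 3))
print(kernel_classify(a, Z, 0.0, rng.standard_normal(3), linear))
```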
Slide 24: Error Backpropagation for a Kernel-Based Classifier
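A sketch of the corresponding gradient step, assuming a sigmoid output h(x) = σ(Σₙ aₙK(zₙ, x) − b) and squared loss; only the weights aₙ and offset b need gradients, since the kernel values are fixed once x is given.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def kernel_backprop(a, b, k, y):
    """Gradients of 0.5 * (y - h)**2 with h = sigmoid(a @ k - b),
    where k[n] = K(z_n, x) is the vector of kernel evaluations."""
    h = sigmoid(a @ k - b)
    delta = (h - y) * h * (1 - h)
    return delta * k, -delta       # (dL/da, dL/db)
```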
Slide 25: The Implied High-Dimensional Space
Slide 26: Some Useful Kernels
Slide 27: Polynomial Kernel
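A sketch of the (inhomogeneous) polynomial kernel; the degree parameter d is the order of the resulting polynomial separatrix discussed on the next slide.

```python
import numpy as np

def poly_kernel(u, v, d=2):
    """Polynomial kernel K(u, v) = (u.T v + 1)**d: equivalent to a dot
    product in the space of all monomials of the inputs up to order d."""
    return (u @ v + 1.0) ** d

u, v = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(poly_kernel(u, v, d=2))   # (1*3 + 2*0.5 + 1)^2 = 25.0
```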
Slide 28: Polynomial Kernel: The Separatrix (Boundary Between Two Classes) Is a Polynomial Surface
Slide 29: Classification Boundaries Available from a Polynomial Kernel (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
Slide 30: The Implied Higher-Dimensional Space Has a Dimension on the Order of K^d
Slide 31: The Radial Basis Function (RBF) Kernel
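A sketch of the RBF kernel, assuming the parameterization K(u, v) = exp(−γ‖u − v‖²) with γ as on the following slides.

```python
import numpy as np

def rbf_kernel(u, v, gamma=1.0):
    """RBF kernel exp(-gamma * ||u - v||^2): near 1 for nearby points,
    near 0 for distant ones; gamma sets the length scale."""
    return np.exp(-gamma * np.sum((u - v) ** 2))

u, v = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(rbf_kernel(u, v, gamma=0.5))   # exp(-0.5 * 2) ~= 0.3679
```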
Slide 32: An RBF Classifier Can Represent Any Classifier Boundary
Slide 33: An RBF Classifier Can Represent Any Classifier Boundary (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)
In these figures, C was adjusted, not γ, but a similar effect can be achieved by setting N << M and adjusting γ.
[Figure: two panels; one shows more training-corpus errors with a smoother boundary, the other fewer training-corpus errors with a wigglier boundary.]
Slide 34: If N < M, Gamma (γ) Can Adjust Boundary Smoothness
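An illustrative sketch of that effect under assumed conditions (1-D synthetic data, N = 5 kernel centers for M = 50 training points, weights fit by regularized least squares): a small γ gives a smooth discriminant with more training errors, a large γ a wigglier one with fewer.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, 50)                  # M = 50 training points
y = np.sign(np.sin(3 * X) + 0.3 * rng.standard_normal(50))
centers = np.linspace(-1, 1, 5)             # N = 5 < M kernel centers

def fit_discriminant(gamma, reg=1e-3):
    # Design matrix of RBF features, one column per center.
    Phi = np.exp(-gamma * (X[:, None] - centers[None, :]) ** 2)
    # Regularized least squares for the weights a.
    return np.linalg.solve(Phi.T @ Phi + reg * np.eye(5), Phi.T @ y)

for gamma in (0.5, 50.0):                   # smooth vs. wiggly boundary
    a = fit_discriminant(gamma)
    Phi = np.exp(-gamma * (X[:, None] - centers[None, :]) ** 2)
    err = np.mean(np.sign(Phi @ a) != y)
    print(f"gamma={gamma}: training error {err:.2f}")
```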
Slide 35: Summary
Classifier definitions
–Classifier = a function from x into y
–Loss = the cost of a mistake
–Risk = the expected loss
–Empirical risk = the average loss on training data
Multilayer perceptrons
–A sigmoidal classifier is similar to a hyperplane classifier with a sigmoidal loss function
–Train using error backpropagation
–With two hidden layers, an MLP can model any boundary (the MLP is a "universal approximator")
–MLP output is an estimate of p(y|x)
Kernel classifiers
–Equivalent to: (1) project x into φ(x), (2) apply a hyperplane classifier
–Polynomial kernel: the separatrix is a polynomial surface of order d
–RBF kernel: the separatrix can be any surface (the RBF classifier is also a "universal approximator")
–RBF kernel: if N < M, γ can adjust the "wiggliness" of the separatrix