1 Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology. Mark Hasegawa-Johnson, jhasegaw@uiuc.edu, University of Illinois at Urbana-Champaign, USA

2 Lecture 4: Hyperplanes, Perceptrons, and Kernel-Based Classifiers
Definition: Hyperplane Classifier
Minimum Classification Error Training Methods
– Empirical risk
– Differentiable estimates of the 0-1 loss function
– Error backpropagation
Kernel Methods
– Nonparametric expression of a hyperplane
– Mathematical properties of a dot product
– Kernel-based classifier
– The implied high-dimensional space
– Error backpropagation for a kernel-based classifier
Useful kernels
– Polynomial kernel
– RBF kernel

3 Classifier Terminology

4 Hyperplane Classifier. [Figure: scatter of two point classes separated by the class boundary ("separatrix"), the plane w^T x = b; w is the normal vector, and the plane lies at distance b from the origin (x = 0).]
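The slide's formulas beyond the boundary equation were not transcribed; as a minimal sketch of the decision rule (names are illustrative), a hyperplane classifier labels x by which side of the plane w^T x = b it falls on:

```python
import numpy as np

def hyperplane_classify(x, w, b):
    """Label x by the side of the separatrix w.x = b it falls on.

    w is the normal vector of the plane; b sets its distance from
    the origin. Returns +1 or -1.
    """
    return 1 if np.dot(w, x) > b else -1
```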

5 Loss, Risk, and Empirical Risk

6 Empirical Risk with 0-1 Loss Function = Error Rate on Training Data
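As a minimal sketch of this identity (function names are illustrative, not from the lecture): the 0-1 loss charges 1 for each misclassified token and 0 otherwise, so the empirical risk, the average loss over the training set, is exactly the training error rate.

```python
import numpy as np

def empirical_risk_01(classify, X, y):
    """Average 0-1 loss over training pairs = training error rate."""
    losses = [0.0 if classify(x) == label else 1.0
              for x, label in zip(X, y)]
    return np.mean(losses)
```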

7 Differentiable Approximations of the 0-1 Loss Function: Hinge Loss
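A sketch of the hinge loss, the standard piecewise-linear upper bound on the 0-1 loss; here y is in {-1, +1} and g(x) = w^T x - b is the pre-threshold classifier output:

```python
def hinge_loss(g_x, y):
    """Hinge loss: zero when y * g(x) >= 1, linear otherwise.

    Upper-bounds the 0-1 loss and is differentiable everywhere
    except at the hinge point y * g(x) = 1.
    """
    return max(0.0, 1.0 - y * g_x)
```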

9 Differentiable Empirical Risks

10 Error Backpropagation: Hyperplane Classifier with Sigmoidal Loss

11 Sigmoidal Classifier = Hyperplane Classifier with Fuzzy Boundaries. [Figure: scatter of points; instead of a hard boundary, the output shades smoothly from "more red" through "less red" and "less blue" to "more blue" across the separatrix.]

12 Error Backpropagation: Sigmoidal Classifier with Absolute Loss
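The slide's update equations were not transcribed; the following is a sketch, assuming the hyperplane form g(x) = w^T x - b used earlier, of one error-backpropagation step for the sigmoidal classifier h(x) = sigmoid(g(x)) under absolute loss:

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def backprop_step(w, b, x, y, lr=0.1):
    """One gradient step for h(x) = sigmoid(w.x - b) under absolute
    loss |h(x) - y|, with target y in {0, 1}. A sketch only; the
    slide's exact update was not transcribed.
    """
    h = sigmoid(np.dot(w, x) - b)
    # Chain rule: d|h - y|/dg = sign(h - y) * h * (1 - h)
    dg = np.sign(h - y) * h * (1.0 - h)
    w_new = w - lr * dg * x        # dg/dw = x
    b_new = b - lr * dg * (-1.0)   # dg/db = -1
    return w_new, b_new
```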

13 Sigmoidal Classifier: Signal Flow Diagram. [Figure: inputs x1, x2, x3 are scaled by connection weights w1, w2, w3 and summed (+) to form the sigmoid input g(x); the sigmoid's output is the hypothesis h(x).]

14 Multilayer Perceptron. [Figure: the input h0(x) ≡ x (components x1, x2, x3) feeds a hidden layer through connection weights w1 and biases b11, b12, b13, producing the sigmoid inputs g1(x) and sigmoid outputs h1(x); a second layer of connection weights w2 and bias b21 forms the sigmoid input g2(x) and the final hypothesis h2(x).]

15 Multilayer Perceptron: Classification Equations

16 Error Backpropagation for a Multilayer Perceptron
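A sketch of one backpropagation step for a two-layer perceptron in the slide's g/h notation; squared error is assumed here for concreteness, since the slide's equations were not transcribed:

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def mlp_backprop_step(W1, b1, W2, b2, x, y, lr=0.1):
    """One backprop step for h2(x) = sigmoid(W2 @ h1(x) - b2),
    h1(x) = sigmoid(W1 @ x - b1), under squared error (an
    assumption; the lecture's loss was not transcribed).
    """
    # Forward pass, following the slide's g/h notation
    g1 = W1 @ x - b1
    h1 = sigmoid(g1)
    g2 = W2 @ h1 - b2
    h2 = sigmoid(g2)
    # Backward pass: propagate d(loss)/d(g) layer by layer
    d2 = (h2 - y) * h2 * (1 - h2)        # delta at the output layer
    d1 = (W2.T @ d2) * h1 * (1 - h1)     # delta at the hidden layer
    # Gradient steps (note dg/db = -1 with g = W h - b)
    W2 -= lr * np.outer(d2, h1); b2 += lr * d2
    W1 -= lr * np.outer(d1, x);  b1 += lr * d1
    return W1, b1, W2, b2
```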

17 Classification Power of a One-Layer Perceptron

18 Classification Power of a Two-Layer Perceptron

19 Classification Power of a Three-Layer Perceptron

20 Output of Multilayer Perceptron is an Approximation of Posterior Probability

21 Kernel-Based Classifiers

22 Representation of Hyperplane in terms of Arbitrary Vectors

23 Kernel-based Classifier
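A sketch of the nonparametric form of the classifier: rather than storing w explicitly, the hyperplane is expressed through stored vectors x_1, ..., x_N and weights alpha_1, ..., alpha_N, so classification needs only kernel evaluations (names are illustrative):

```python
def kernel_classify(x, anchors, alphas, b, kernel):
    """Kernel-based classifier: sign( sum_i alpha_i K(x_i, x) - b ).

    'anchors' are the stored vectors x_i; 'kernel' is any valid
    kernel function K(x_i, x).
    """
    g = sum(a * kernel(xi, x) for a, xi in zip(alphas, anchors)) - b
    return 1 if g > 0 else -1
```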

24 Error Backpropagation for a Kernel- Based Classifier

25 The Implied High-Dimensional Space
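As a worked check of the idea, here for a 2-dimensional input and the order-2 polynomial kernel discussed later in the lecture (the choice of kernel and dimension is an assumption for illustration), the kernel equals an explicit dot product in a higher-dimensional space of monomials:

```python
import numpy as np

def phi(x):
    """Explicit feature map for (1 + x.z)^2 with 2-D input:
    all monomials of degree <= 2, scaled so dot products match."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# Kernel trick: K(x, z) = phi(x) . phi(z) without forming phi explicitly
assert np.isclose(np.dot(phi(x), phi(z)), (1 + np.dot(x, z)) ** 2)
```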

26 Some Useful Kernels

27 Polynomial Kernel
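A minimal sketch of the (inhomogeneous) polynomial kernel of order d:

```python
import numpy as np

def polynomial_kernel(x, z, d=2):
    """K(x, z) = (1 + x.z)^d: with this kernel the separatrix is a
    degree-d polynomial surface in the original input space."""
    return (1.0 + np.dot(x, z)) ** d
```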

28 Polynomial Kernel: Separatrix (Boundary Between Two Classes) is a Polynomial Surface

29 Classification Boundaries Available from a Polynomial Kernel (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)

30 Implied Higher-Dimensional Space has a Dimension of K^d

31 The Radial Basis Function (RBF) Kernel
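A minimal sketch of the RBF kernel; γ controls the width of the basis functions:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2).

    Larger gamma -> narrower basis functions -> wigglier separatrix.
    """
    return np.exp(-gamma * np.sum((x - z) ** 2))
```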

32 RBF Classifier Can Represent Any Classifier Boundary

33 RBF Classifier Can Represent Any Classifier Boundary (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004). In these figures, C was adjusted, not γ, but a similar effect can be achieved by setting N << M and adjusting γ. [Figure: one panel shows more training corpus errors with a smoother boundary; the other shows fewer training corpus errors with a wigglier boundary.]

34 If N<M, Gamma can Adjust Boundary Smoothness
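A rough way to see this effect empirically, as a sketch assuming scikit-learn's SVC (the lecture does not prescribe a toolkit, and SVC selects its own support vectors, so this only approximates the N << M setting on the slide):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # toy circular classes

for gamma in (0.1, 1.0, 10.0, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    err = 1.0 - clf.score(X, y)
    print(f"gamma={gamma:6.1f}  training error={err:.3f}")
    # Small gamma -> smoother boundary, more training errors;
    # large gamma -> wigglier boundary, fewer training errors.
```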

35 Summary
Classifier definitions
– Classifier = a function from x into y
– Loss = the cost of a mistake
– Risk = the expected loss
– Empirical Risk = the average loss on training data
Multilayer Perceptrons
– Sigmoidal classifier is similar to a hyperplane classifier with a sigmoidal loss function
– Train using error backpropagation
– With two hidden layers, can model any boundary (the MLP is a "universal approximator")
– MLP output is an estimate of p(y|x)
Kernel Classifiers
– Equivalent to: (1) project into Φ(x), (2) apply a hyperplane classifier
– Polynomial kernel: separatrix is a polynomial surface of order d
– RBF kernel: separatrix can be any surface (RBF is also a "universal approximator")
– RBF kernel: if N<M, γ can adjust the "wiggliness" of the separatrix

