1 Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology. Mark Hasegawa-Johnson, jhasegaw@uiuc.edu, University of Illinois at Urbana-Champaign, USA

2 Lecture 4: Hyperplanes, Perceptrons, and Kernel-Based Classifiers
Definition: Hyperplane Classifier
Minimum Classification Error Training Methods
– Empirical risk
– Differentiable estimates of the 0-1 loss function
– Error backpropagation
Kernel Methods
– Nonparametric expression of a hyperplane
– Mathematical properties of a dot product
– Kernel-based classifier
– The implied high-dimensional space
– Error backpropagation for a kernel-based classifier
Useful kernels
– Polynomial kernel
– RBF kernel

3 Classifier Terminology

4 Hyperplane Classifier. [Figure: scatter of two point classes separated by the class boundary ("separatrix"), the plane w^T x = b; w is the normal vector, and the plane lies at distance b from the origin (x = 0).]
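The slide's formulas beyond the boundary equation were not transcribed; as a minimal sketch of the decision rule (names are illustrative), a hyperplane classifier labels x by which side of the plane w^T x = b it falls on:

```python
import numpy as np

def hyperplane_classify(x, w, b):
    """Label x by the side of the separatrix w.x = b it falls on.

    w is the normal vector of the plane; b sets its distance from
    the origin. Returns +1 or -1.
    """
    return 1 if np.dot(w, x) > b else -1
```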

5 Loss, Risk, and Empirical Risk

6 Empirical Risk with 0-1 Loss Function = Error Rate on Training Data
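As a minimal sketch of this identity (function names are illustrative, not from the lecture): the 0-1 loss charges 1 for each misclassified token and 0 otherwise, so the empirical risk, the average loss over the training set, is exactly the training error rate.

```python
import numpy as np

def empirical_risk_01(classify, X, y):
    """Average 0-1 loss over training pairs = training error rate."""
    losses = [0.0 if classify(x) == label else 1.0
              for x, label in zip(X, y)]
    return np.mean(losses)
```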

7 Differentiable Approximations of the 0-1 Loss Function: Hinge Loss
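A sketch of the hinge loss, the standard piecewise-linear upper bound on the 0-1 loss; here y is in {-1, +1} and g(x) = w^T x - b is the pre-threshold classifier output:

```python
def hinge_loss(g_x, y):
    """Hinge loss: zero when y * g(x) >= 1, linear otherwise.

    Upper-bounds the 0-1 loss and is differentiable everywhere
    except at the hinge point y * g(x) = 1.
    """
    return max(0.0, 1.0 - y * g_x)
```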

9 Differentiable Empirical Risks

10 Error Backpropagation: Hyperplane Classifier with Sigmoidal Loss

11 Sigmoidal Classifier = Hyperplane Classifier with Fuzzy Boundaries. [Figure: scatter of points; instead of a hard boundary, the output shades smoothly from "more red" through "less red" and "less blue" to "more blue" across the separatrix.]

12 Error Backpropagation: Sigmoidal Classifier with Absolute Loss
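The slide's update equations were not transcribed; the following is a sketch, assuming the hyperplane form g(x) = w^T x - b used earlier, of one error-backpropagation step for the sigmoidal classifier h(x) = sigmoid(g(x)) under absolute loss:

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def backprop_step(w, b, x, y, lr=0.1):
    """One gradient step for h(x) = sigmoid(w.x - b) under absolute
    loss |h(x) - y|, with target y in {0, 1}. A sketch only; the
    slide's exact update was not transcribed.
    """
    h = sigmoid(np.dot(w, x) - b)
    # Chain rule: d|h - y|/dg = sign(h - y) * h * (1 - h)
    dg = np.sign(h - y) * h * (1.0 - h)
    w_new = w - lr * dg * x        # dg/dw = x
    b_new = b - lr * dg * (-1.0)   # dg/db = -1
    return w_new, b_new
```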

13 Sigmoidal Classifier: Signal Flow Diagram. [Figure: inputs x1, x2, x3 are scaled by connection weights w1, w2, w3 and summed (+) to form the sigmoid input g(x); the sigmoid's output is the hypothesis h(x).]

14 Multilayer Perceptron. [Figure: the input h0(x) ≡ x (components x1, x2, x3) feeds a hidden layer through connection weights w1 and biases b11, b12, b13, producing the sigmoid inputs g1(x) and sigmoid outputs h1(x); a second layer of connection weights w2 and bias b21 forms the sigmoid input g2(x) and the final hypothesis h2(x).]

15 Multilayer Perceptron: Classification Equations

16 Error Backpropagation for a Multilayer Perceptron
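A sketch of one backpropagation step for a two-layer perceptron in the slide's g/h notation; squared error is assumed here for concreteness, since the slide's equations were not transcribed:

```python
import numpy as np

def sigmoid(g):
    return 1.0 / (1.0 + np.exp(-g))

def mlp_backprop_step(W1, b1, W2, b2, x, y, lr=0.1):
    """One backprop step for h2(x) = sigmoid(W2 @ h1(x) - b2),
    h1(x) = sigmoid(W1 @ x - b1), under squared error (an
    assumption; the lecture's loss was not transcribed).
    """
    # Forward pass, following the slide's g/h notation
    g1 = W1 @ x - b1
    h1 = sigmoid(g1)
    g2 = W2 @ h1 - b2
    h2 = sigmoid(g2)
    # Backward pass: propagate d(loss)/d(g) layer by layer
    d2 = (h2 - y) * h2 * (1 - h2)        # delta at the output layer
    d1 = (W2.T @ d2) * h1 * (1 - h1)     # delta at the hidden layer
    # Gradient steps (note dg/db = -1 with g = W h - b)
    W2 -= lr * np.outer(d2, h1); b2 += lr * d2
    W1 -= lr * np.outer(d1, x);  b1 += lr * d1
    return W1, b1, W2, b2
```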

17 Classification Power of a One-Layer Perceptron

18 Classification Power of a Two-Layer Perceptron

19 Classification Power of a Three-Layer Perceptron

20 Output of Multilayer Perceptron is an Approximation of Posterior Probability

21 Kernel-Based Classifiers

22 Representation of Hyperplane in terms of Arbitrary Vectors

23 Kernel-based Classifier
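A sketch of the nonparametric form of the classifier: rather than storing w explicitly, the hyperplane is expressed through stored vectors x_1, ..., x_N and weights alpha_1, ..., alpha_N, so classification needs only kernel evaluations (names are illustrative):

```python
def kernel_classify(x, anchors, alphas, b, kernel):
    """Kernel-based classifier: sign( sum_i alpha_i K(x_i, x) - b ).

    'anchors' are the stored vectors x_i; 'kernel' is any valid
    kernel function K(x_i, x).
    """
    g = sum(a * kernel(xi, x) for a, xi in zip(alphas, anchors)) - b
    return 1 if g > 0 else -1
```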

24 Error Backpropagation for a Kernel- Based Classifier

25 The Implied High-Dimensional Space
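As a worked check of the idea, here for a 2-dimensional input and the order-2 polynomial kernel discussed later in the lecture (the choice of kernel and dimension is an assumption for illustration), the kernel equals an explicit dot product in a higher-dimensional space of monomials:

```python
import numpy as np

def phi(x):
    """Explicit feature map for (1 + x.z)^2 with 2-D input:
    all monomials of degree <= 2, scaled so dot products match."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
# Kernel trick: K(x, z) = phi(x) . phi(z) without forming phi explicitly
assert np.isclose(np.dot(phi(x), phi(z)), (1 + np.dot(x, z)) ** 2)
```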

26 Some Useful Kernels

27 Polynomial Kernel
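A minimal sketch of the (inhomogeneous) polynomial kernel of order d:

```python
import numpy as np

def polynomial_kernel(x, z, d=2):
    """K(x, z) = (1 + x.z)^d: with this kernel the separatrix is a
    degree-d polynomial surface in the original input space."""
    return (1.0 + np.dot(x, z)) ** d
```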

28 Polynomial Kernel: Separatrix (Boundary Between Two Classes) is a Polynomial Surface

29 Classification Boundaries Available from a Polynomial Kernel (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004)

30 Implied Higher-Dimensional Space has a Dimension of K^d

31 The Radial Basis Function (RBF) Kernel
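A minimal sketch of the RBF kernel; γ controls the width of the basis functions:

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2).

    Larger gamma -> narrower basis functions -> wigglier separatrix.
    """
    return np.exp(-gamma * np.sum((x - z) ** 2))
```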

32 RBF Classifier Can Represent Any Classifier Boundary

33 RBF Classifier Can Represent Any Classifier Boundary (Hastie, Rosset, Tibshirani, and Zhu, NIPS 2004). In these figures, C was adjusted, not γ, but a similar effect can be achieved by setting N << M and adjusting γ. [Figure: one panel shows more training corpus errors with a smoother boundary; the other shows fewer training corpus errors with a wigglier boundary.]

34 If N<M, Gamma can Adjust Boundary Smoothness
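A rough way to see this effect empirically, as a sketch assuming scikit-learn's SVC (the lecture does not prescribe a toolkit, and SVC selects its own support vectors, so this only approximates the N << M setting on the slide):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # toy circular classes

for gamma in (0.1, 1.0, 10.0, 100.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    err = 1.0 - clf.score(X, y)
    print(f"gamma={gamma:6.1f}  training error={err:.3f}")
    # Small gamma -> smoother boundary, more training errors;
    # large gamma -> wigglier boundary, fewer training errors.
```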

35 Summary
Classifier definitions
– Classifier = a function from x into y
– Loss = the cost of a mistake
– Risk = the expected loss
– Empirical Risk = the average loss on training data
Multilayer Perceptrons
– Sigmoidal classifier is similar to a hyperplane classifier with a sigmoidal loss function
– Train using error backpropagation
– With two hidden layers, can model any boundary (the MLP is a "universal approximator")
– MLP output is an estimate of p(y|x)
Kernel Classifiers
– Equivalent to: (1) project into Φ(x), (2) apply a hyperplane classifier
– Polynomial kernel: separatrix is a polynomial surface of order d
– RBF kernel: separatrix can be any surface (RBF is also a "universal approximator")
– RBF kernel: if N<M, γ can adjust the "wiggliness" of the separatrix

