CHAPTER 8: DISCRIMINATIVE CLASSIFIERS, HIDDEN MARKOV MODELS
Generative vs. Discriminative
The Perceptron Model
Example: Spam
Binary Decision Rule
Online Perceptron Training
Perceptron Training Illustration
Properties of Perceptrons
Issues with Perceptrons
Reasoning over Time
- Often, we want to reason about a sequence of observations
  - Speech recognition
  - Robot localization
  - User attention
- Need to introduce time into our models
- Basic approach: hidden Markov models (HMMs)
- More general: dynamic Bayes’ nets
Markov Models
Conditional Independence
Weather Example
Mini-Forward Algorithm
Example
Stationary Distributions
- If we simulate the chain long enough, what happens?
  - Uncertainty accumulates
  - Eventually, we have no idea what the state is!
- Stationary distributions:
  - For most chains, the distribution we end up in is independent of the initial distribution
  - Called the stationary distribution of the chain
- Usually, can only predict a short time out
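The convergence claim above can be checked numerically. The sketch below repeatedly applies the one-step update to a two-state weather chain from two different starting points. The transition probabilities are illustrative assumptions, not values from the slides, and the names `T`, `step`, and `simulate` are hypothetical.

```python
# Hypothetical two-state weather chain. The transition numbers are
# illustrative assumptions, not values taken from the slides.
STATES = ["sun", "rain"]
T = {
    "sun": {"sun": 0.9, "rain": 0.1},
    "rain": {"sun": 0.3, "rain": 0.7},
}


def step(dist):
    """One step of the chain: P(X_{t+1}=s2) = sum_s1 P(X_t=s1) * T[s1][s2]."""
    return {s2: sum(dist[s1] * T[s1][s2] for s1 in STATES) for s2 in STATES}


def simulate(dist, n_steps):
    """Push a distribution over states forward n_steps time steps."""
    for _ in range(n_steps):
        dist = step(dist)
    return dist


# Two opposite initial distributions end up in (numerically) the same place.
from_sun = simulate({"sun": 1.0, "rain": 0.0}, 100)
from_rain = simulate({"sun": 0.0, "rain": 1.0}, 100)
print(from_sun)
print(from_rain)
```

For this particular chain, both runs approach P(sun) = 0.75 and P(rain) = 0.25, which is the distribution that `step` leaves unchanged.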
Example: Web Link Analysis
Mini-Viterbi Algorithm
Hidden Markov Models
Example
Conditional Independence
HMM Applications
Forward Algorithm
Viterbi Algorithm
Viterbi Example
Viterbi Properties
- Designed for computing the most likely hidden state sequence given a sequence of observations in hidden Markov models
- Two passes: forward to compute the best-path (max) probabilities, then backward to reconstruct the maximizing sequence
- What’s the time complexity? O(d^2 n), for d states and n observations
- Why is this exciting? Brute-force search would have to score d^n candidate sequences
- There are many extensions to the basic Viterbi algorithm, developed for other models with similar local structure: syntactic parsing, for instance
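A minimal sketch of the two passes, assuming discrete emissions and dictionary-based probability tables. The umbrella-style numbers are illustrative assumptions, not values from the slides, and the function and parameter names are hypothetical. The nested loops over states inside the loop over observations are where the O(d^2 n) cost comes from.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (best_path, best_log_prob) for an HMM with discrete emissions."""

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Forward pass: best[t][s] is the log-probability of the best path
    # that ends in state s at time t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log(trans_p[r][s]))
            scores[s] = best[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Backward pass: follow back-pointers from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Illustrative (assumed) model: hidden weather, observed umbrella use.
states = ["rain", "sun"]
start_p = {"rain": 0.5, "sun": 0.5}
trans_p = {"rain": {"rain": 0.7, "sun": 0.3}, "sun": {"rain": 0.3, "sun": 0.7}}
emit_p = {
    "rain": {"umbrella": 0.9, "none": 0.1},
    "sun": {"umbrella": 0.2, "none": 0.8},
}

path, log_prob = viterbi(["umbrella", "umbrella", "none"], states, start_p, trans_p, emit_p)
print(path, math.exp(log_prob))
```

Working in log space avoids numerical underflow on long observation sequences. That is a common design choice rather than something the slides prescribe.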
Speech in an Hour
HMMs for Speech
HMMs for Continuous Obs.?
- Before: discrete, finite set of observations
- Now: spectral feature vectors are real-valued!
- Solution 1: discretization
- Solution 2: continuous emission models
  - Gaussians
  - Multivariate Gaussians
  - Mixtures of multivariate Gaussians
- A state is progressively:
  - Context-independent subphone (~3 per phone)
  - Context-dependent phone (= triphones)
  - State-tying of CD phone
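As a rough illustration of Solution 2, the sketch below scores a real-valued feature vector under a single Gaussian and under a mixture of Gaussians. It assumes diagonal covariances to keep the code short, which is a simplification the slides do not specify. The function names are hypothetical.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at feature vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_logpdf(x, weights, means, variances):
    """Log density of a mixture of diagonal Gaussians, computed via log-sum-exp."""
    logs = [
        math.log(w) + diag_gaussian_logpdf(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


# Assumed toy numbers: a 2-dimensional feature vector and a 2-component mixture.
x = [0.5, -1.0]
score = gmm_logpdf(
    x,
    weights=[0.6, 0.4],
    means=[[0.0, 0.0], [1.0, -1.0]],
    variances=[[1.0, 1.0], [0.5, 0.5]],
)
print(score)
```

In an HMM, a value like `score` would play the role of the log emission probability for one state, replacing the table lookup used with discrete observations.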
ASR Lexicon: Markov Models
Viterbi with 2 Words + Unif. LM
Conclusion
- Perceptron
  - A discriminative model, an alternative to generative models like Naïve Bayes
  - Simple classification rule, based on a weight vector
  - Simple online learning algorithm, guaranteed to converge if the training set is separable
- Hidden Markov Models
  - A special kind of Bayesian network designed for reasoning about sequences of hidden states
  - Polynomial-time inference for the most likely state sequence (Viterbi) and marginalization (Forward-Backward)
  - Many applications
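The perceptron's classification rule and online update summarized above can be sketched as follows. This is a minimal binary version, assuming labels in {+1, -1} and a constant bias feature. The toy data set and the names `predict` and `train_perceptron` are hypothetical.

```python
def predict(weights, features):
    """Binary decision rule: +1 if w . f(x) > 0, else -1."""
    activation = sum(w * f for w, f in zip(weights, features))
    return 1 if activation > 0 else -1


def train_perceptron(examples, n_features, max_epochs=100):
    """Online training: on each mistake, add label * features to the weights."""
    weights = [0.0] * n_features
    for _ in range(max_epochs):
        mistakes = 0
        for features, label in examples:
            if predict(weights, features) != label:
                weights = [w + label * f for w, f in zip(weights, features)]
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: training has converged
            break
    return weights


# Assumed toy data: [bias, x1, x2] with label +1 when x1 > x2 (linearly separable).
examples = [
    ([1.0, 2.0, 0.0], 1),
    ([1.0, 0.0, 2.0], -1),
    ([1.0, 3.0, 1.0], 1),
    ([1.0, 1.0, 3.0], -1),
]
weights = train_perceptron(examples, n_features=3)
print(weights)
```

The `max_epochs` cap is a safeguard for non-separable data, where the update loop would otherwise never reach a mistake-free pass.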