CHAPTER 15, SECTIONS 3–4: Hidden Markov Models
Terminology
It gets big!
Conditional independence
P(Toothache, Cavity, Catch)
If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
  P(+catch | +toothache, +cavity) = P(+catch | +cavity)
The same independence holds if I don't have a cavity:
  P(+catch | +toothache, -cavity) = P(+catch | -cavity)
Catch is conditionally independent of Toothache given Cavity:
  P(Catch | Toothache, Cavity) = P(Catch | Cavity)
Equivalent statements:
  P(Toothache | Catch, Cavity) = P(Toothache | Cavity)
  P(Toothache, Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity)
Each one can be derived from the others easily.
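The conditional independence above can be checked numerically. A minimal sketch, with illustrative CPT numbers (assumptions, not values from the slides): build the joint from P(Cavity), P(Toothache | Cavity), and P(Catch | Cavity), then verify that conditioning on Toothache does not change P(Catch | Cavity).

```python
# Sketch: verify Catch is conditionally independent of Toothache given Cavity.
# All probability numbers below are assumed for illustration.
from itertools import product

p_cavity = {True: 0.2, False: 0.8}   # P(Cavity)
p_tooth = {True: 0.6, False: 0.1}    # P(+toothache | Cavity)
p_catch = {True: 0.9, False: 0.2}    # P(+catch | Cavity)

# Joint built from the factorization P(t, cav, cat) = P(cav) P(t|cav) P(cat|cav)
joint = {}
for t, cav, cat in product([True, False], repeat=3):
    pt = p_tooth[cav] if t else 1 - p_tooth[cav]
    pc = p_catch[cav] if cat else 1 - p_catch[cav]
    joint[(t, cav, cat)] = p_cavity[cav] * pt * pc

def cond(catch, given):
    """P(Catch = catch | given), where given maps variable index -> value
    (index 0 = Toothache, 1 = Cavity, 2 = Catch)."""
    match = lambda k: all(k[i] == v for i, v in given.items())
    num = sum(p for k, p in joint.items() if match(k) and k[2] == catch)
    den = sum(p for k, p in joint.items() if match(k))
    return num / den

# P(+catch | +toothache, +cavity) equals P(+catch | +cavity): both are 0.9 here
print(cond(True, {0: True, 1: True}))
print(cond(True, {1: True}))
```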
Probability Recap
Reasoning over Time or Space
Often, we want to reason about a sequence of observations:
  Speech recognition
  Robot localization
  User attention
  Medical monitoring
Need to introduce time (or space) into our models.
Markov Models Recap
Example: Markov Chain
Mini-Forward Algorithm
Example Run of Mini-Forward Algorithm
From initial observations of sun:
From initial observations of rain:
Example Run of Mini-Forward Algorithm
From yet another initial distribution P(X_1):
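The mini-forward update can be sketched in a few lines. The transition probabilities below are assumed for illustration; the point the example runs make is that different initial distributions P(X_1) converge to the same stationary distribution.

```python
# Mini-forward algorithm sketch for a two-state sun/rain Markov chain.
# Transition numbers are assumptions for illustration, not from the slides.
T = {'sun': {'sun': 0.9, 'rain': 0.1},
     'rain': {'sun': 0.3, 'rain': 0.7}}

def mini_forward(p0, steps):
    """Push the state distribution forward:
    P(X_{t+1} = s2) = sum over s1 of P(s2 | s1) P(X_t = s1)."""
    p = dict(p0)
    for _ in range(steps):
        p = {s2: sum(T[s1][s2] * p[s1] for s1 in p) for s2 in T}
    return p

# Different initial distributions reach the same limit (here sun = 0.75)
print(mini_forward({'sun': 1.0, 'rain': 0.0}, 50))
print(mini_forward({'sun': 0.0, 'rain': 1.0}, 50))
```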
Hidden Markov Models
Markov chains are not so useful for most agents:
  Eventually you don't know anything anymore
  Need observations to update your beliefs
Hidden Markov models (HMMs):
  Underlying Markov chain over states S
  You observe outputs (effects) at each time step
As a Bayes' net:
Example
Hidden Markov Models
HMM Computations
Given parameters and evidence E_1:n = e_1:n, inference problems include:
  Filtering: find P(X_t | e_1:t) for all t
  Smoothing: find P(X_t | e_1:n) for all t
  Most probable explanation: find x*_1:n = argmax over x_1:n of P(x_1:n | e_1:n)
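As a sketch of filtering, the forward update alternates a prediction step (transition model) with an evidence update (observation model), then renormalizes. The rain/umbrella-style model numbers below are assumptions for illustration.

```python
# Filtering sketch: compute P(X_t | e_1:t) by the forward recursion.
# Model numbers are assumed for illustration.
trans = {'rain': {'rain': 0.7, 'sun': 0.3},
         'sun':  {'rain': 0.3, 'sun': 0.7}}
emit = {'rain': {'umbrella': 0.9, 'no_umbrella': 0.1},
        'sun':  {'umbrella': 0.2, 'no_umbrella': 0.8}}

def filter_hmm(prior, evidence):
    belief = dict(prior)
    for e in evidence:
        # Predict: push belief through the transition model
        pred = {s2: sum(trans[s1][s2] * belief[s1] for s1 in belief)
                for s2 in trans}
        # Update: weight by the observation likelihood, then normalize
        unnorm = {s: emit[s][e] * pred[s] for s in pred}
        z = sum(unnorm.values())
        belief = {s: p / z for s, p in unnorm.items()}
    return belief

print(filter_hmm({'rain': 0.5, 'sun': 0.5}, ['umbrella', 'umbrella']))
```

Each pass through the loop costs O(m^2) for m states, so filtering a sequence of length t is linear in t.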
Real HMM Examples Speech recognition HMMs: Observations are acoustic signals (continuous valued) States are specific positions in specific words (so, tens of thousands)
Real HMM Examples Machine translation HMMs: Observations are words (tens of thousands) States are translation options
Real HMM Examples Robot tracking: Observations are range readings (continuous) States are positions on a map (continuous)
Conditional Independence
HMMs have two important independence properties:
  Markov hidden process: the future depends on the past only via the present
  Current observation is independent of all else given the current state
Quiz: does this mean that observations are independent given no evidence?
HMM Notation
HMM Problem 1: Evaluation
Consider the problem where we have a number of HMMs (that is, a set of (π, A, B) triples) describing different systems, and a sequence of observations. We may want to know which HMM most probably generated the given sequence.
Solution: Forward Algorithm
HMM Problem 2: Decoding
Finding the most probable sequence of hidden states given some observations, i.e., the hidden states that generated the observed output. In many cases we are interested in the hidden states of the model, since they represent something of value that is not directly observable.
Solution: Viterbi Algorithm
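A minimal Viterbi sketch, using seaweed-style observations. The probabilities are assumptions for illustration, not the tutorial's exact values.

```python
# Viterbi sketch: most probable hidden weather sequence given seaweed observations.
# All model numbers are assumed for illustration.
states = ['sunny', 'cloudy', 'rainy']
start = {'sunny': 0.6, 'cloudy': 0.25, 'rainy': 0.15}
trans = {'sunny': {'sunny': 0.5, 'cloudy': 0.375, 'rainy': 0.125},
         'cloudy': {'sunny': 0.25, 'cloudy': 0.125, 'rainy': 0.625},
         'rainy': {'sunny': 0.25, 'cloudy': 0.375, 'rainy': 0.375}}
emit = {'sunny': {'dry': 0.6, 'damp': 0.3, 'soggy': 0.1},
        'cloudy': {'dry': 0.25, 'damp': 0.5, 'soggy': 0.25},
        'rainy': {'dry': 0.05, 'damp': 0.35, 'soggy': 0.6}}

def viterbi(obs):
    # delta[s] = probability of the best path ending in state s
    delta = {s: start[s] * emit[s][obs[0]] for s in states}
    back = []  # one backpointer table per time step
    for o in obs[1:]:
        prev = {s2: max(states, key=lambda s1: delta[s1] * trans[s1][s2])
                for s2 in states}
        delta = {s2: delta[prev[s2]] * trans[prev[s2]][s2] * emit[s2][o]
                 for s2 in states}
        back.append(prev)
    # Trace the best final state back through the stored pointers
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for prev in reversed(back):
        path.insert(0, prev[path[0]])
    return path, delta[last]

path, p = viterbi(['dry', 'damp', 'soggy'])
print(path, p)  # ['sunny', 'cloudy', 'rainy'] with these numbers
```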
HMM Problem 3: Learning
Generating an HMM from a sequence of observations.
Solution: Forward-Backward (Baum-Welch) Algorithm
Exhaustive Search Solution
Sequence of observations of seaweed state: Dry, Damp, Soggy
Exhaustive Search Solution
Pr(dry,damp,soggy | HMM) is the sum, over all 3^3 = 27 hidden state sequences, of the joint probability of the observations and that sequence:
  Pr(dry,damp,soggy, sunny,sunny,sunny) +
  Pr(dry,damp,soggy, sunny,sunny,cloudy) +
  Pr(dry,damp,soggy, sunny,sunny,rainy) +
  ... +
  Pr(dry,damp,soggy, rainy,rainy,rainy)
A better solution: dynamic programming We can calculate the probability of reaching an intermediate state in the trellis as the sum of all possible paths to that state.
A better solution: dynamic programming
α_t(j) = Pr(observation at time t | hidden state is j) x Pr(all paths to state j at time t)
A better solution: dynamic programming
The sum of these final partial probabilities is the sum of all possible paths through the trellis.
Exhaustive search: O(m^T), exponential in the sequence length T (for m hidden states)
Dynamic programming: O(T m^2), linear in T
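The two approaches can be compared directly: enumerating all m^T hidden sequences and running the forward (dynamic programming) recursion give the same observation likelihood, but the recursion touches only on the order of T x m^2 terms. Model numbers below are assumed for illustration.

```python
# Sketch: exhaustive enumeration vs. the forward algorithm's DP recursion.
# Both compute Pr(observations | HMM); the model numbers are assumed.
from itertools import product

states = ['sunny', 'rainy']
start = {'sunny': 0.6, 'rainy': 0.4}
trans = {'sunny': {'sunny': 0.8, 'rainy': 0.2},
         'rainy': {'sunny': 0.4, 'rainy': 0.6}}
emit = {'sunny': {'dry': 0.7, 'damp': 0.2, 'soggy': 0.1},
        'rainy': {'dry': 0.1, 'damp': 0.4, 'soggy': 0.5}}

def exhaustive(obs):
    """Sum the joint over all |S|^T hidden sequences (exponential in T)."""
    total = 0.0
    for seq in product(states, repeat=len(obs)):
        p = start[seq[0]] * emit[seq[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= trans[seq[t-1]][seq[t]] * emit[seq[t]][obs[t]]
        total += p
    return total

def forward(obs):
    """alpha_t(j) = Pr(e_1..e_t, X_t = j), built left to right (linear in T)."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s2: emit[s2][o] * sum(alpha[s1] * trans[s1][s2] for s1 in states)
                 for s2 in states}
    return sum(alpha.values())

obs = ['dry', 'damp', 'soggy']
print(exhaustive(obs), forward(obs))  # the two values agree
```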
References
CSE473: Introduction to Artificial Intelligence
Hidden Markov Models Tutorial, Models/html_dev/main.html