PatReco: Hidden Markov Models
Alexandros Potamianos
Dept. of ECE, Technical University of Crete
Fall 2004-2005
Markov Models: Definition
- Markov chains are Bayesian networks that model sequences of events (states)
- Sequential events are dependent
- Two non-sequential events are conditionally independent given the intermediate events (MM-1)
Markov chains
[Diagram: state sequences q0, q1, q2, q3, q4, ... drawn with the dependency arrows of an MM-0, MM-1, MM-2, and MM-3 chain, respectively]
Markov Chains
MM-0: P(q_1, q_2, ..., q_N) = ∏_{n=1..N} P(q_n)
MM-1: P(q_1, q_2, ..., q_N) = ∏_{n=1..N} P(q_n | q_{n-1})
MM-2: P(q_1, q_2, ..., q_N) = ∏_{n=1..N} P(q_n | q_{n-1}, q_{n-2})
MM-3: P(q_1, q_2, ..., q_N) = ∏_{n=1..N} P(q_n | q_{n-1}, q_{n-2}, q_{n-3})
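As a concrete illustration of the MM-1 factorization above, here is a minimal Python sketch; the 3-state prior and transition matrix are hypothetical values chosen only for illustration, not taken from the lecture:

```python
import numpy as np

# Hypothetical 3-state MM-1 chain (illustrative values).
prior = np.array([0.5, 0.3, 0.2])        # P(q_0)
trans = np.array([[0.7, 0.2, 0.1],       # trans[i, j] = P(q_n = j | q_{n-1} = i)
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])

def mm1_sequence_prob(states):
    """P(q_0, q_1, ..., q_N) = P(q_0) * prod_{n>=1} P(q_n | q_{n-1})."""
    p = prior[states[0]]
    for prev, curr in zip(states[:-1], states[1:]):
        p *= trans[prev, curr]
    return p

print(mm1_sequence_prob([0, 1, 1, 2]))
```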
Hidden Markov Models
- Hidden Markov models describe sequences of events and corresponding sequences of observations
- The events form a Markov chain (MM-1)
- The observations are conditionally independent given the sequence of events
- Each observation is directly connected to a single event (and is conditionally independent of the rest of the events in the network)
Hidden Markov Models
[Diagram: HMM-1 graphical model; state chain q0, q1, q2, q3, q4, ... with one observation o0, o1, o2, o3, o4, ... attached to each state]
P(o_0, o_1, ..., o_N, q_0, q_1, ..., q_N) = ∏_{n=0..N} P(q_n | q_{n-1}) P(o_n | q_n)
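To make the HMM-1 joint probability concrete, here is a minimal Python sketch for a 3-state HMM with 1-D Gaussian observation densities; all parameter values are hypothetical, chosen only for illustration:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 3-state HMM with 1-D Gaussian observations (illustrative values).
prior = np.array([0.5, 0.3, 0.2])                 # P(q_0)
trans = np.array([[0.7, 0.2, 0.1],                # trans[i, j] = P(q_n = j | q_{n-1} = i)
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
means = np.array([-1.0, 0.0, 2.0])                # Gaussian mean per state
stds  = np.array([ 0.5, 1.0, 0.8])                # Gaussian std dev per state

def hmm_joint_prob(states, obs):
    """P(o, q) = P(q_0) P(o_0|q_0) * prod_{n>=1} P(q_n|q_{n-1}) P(o_n|q_n)."""
    p = prior[states[0]] * norm.pdf(obs[0], means[states[0]], stds[states[0]])
    for n in range(1, len(states)):
        p *= trans[states[n-1], states[n]] * norm.pdf(obs[n], means[states[n]], stds[states[n]])
    return p

print(hmm_joint_prob([0, 1, 2], [-0.8, 0.3, 1.9]))
```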
Parameter Estimation
The parameters that have to be estimated are:
- the a priori probabilities P(q_0)
- the transition probabilities P(q_n | q_{n-1})
- the observation probabilities P(o_n | q_n)
For example, if there are 3 types of events and continuous 1-D observations that follow a Gaussian distribution, there are 18 parameters to estimate:
- 3 a priori probabilities
- a 3x3 transition probability matrix (9 parameters)
- 3 means and 3 variances (observation probabilities)
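The parameter count in this example can be checked directly; the short sketch below simply reproduces the arithmetic for the hypothetical 3-state, 1-D Gaussian case described above:

```python
# Counting the free parameters of a 3-state HMM with 1-D Gaussian observations.
n_states = 3
priors      = n_states            # P(q_0), one value per state
transitions = n_states * n_states # 3x3 matrix of P(q_n | q_{n-1})
emissions   = 2 * n_states        # one mean and one variance per state
print(priors + transitions + emissions)  # 3 + 9 + 6 = 18
```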
Parameter Estimation
- If both the sequence of events and the sequence of observations are fully observable, then maximum likelihood (ML) estimation is used
- Usually the sequence of events q_0, q_1, ..., q_N is not observable, in which case EM is used
- The EM algorithm for HMMs is the Baum-Welch or forward-backward algorithm
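Baum-Welch itself is beyond the scope of these slides, but its E-step rests on the forward (and backward) recursions. Below is a minimal sketch of the forward pass only, in Python, using the same kind of hypothetical 3-state Gaussian parameters as the earlier sketch (the values are assumptions, not from the lecture):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 3-state HMM with 1-D Gaussian observations (illustrative values).
prior = np.array([0.5, 0.3, 0.2])
trans = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
means = np.array([-1.0, 0.0, 2.0])
stds  = np.array([0.5, 1.0, 0.8])

def forward(obs):
    """Forward pass: alpha[n, i] = P(o_0..o_n, q_n = i).
    Summing the last row gives the likelihood P(O), the quantity that
    Baum-Welch (EM) increases at every iteration."""
    N = len(obs)
    alpha = np.zeros((N, len(prior)))
    alpha[0] = prior * norm.pdf(obs[0], means, stds)
    for n in range(1, N):
        alpha[n] = (alpha[n-1] @ trans) * norm.pdf(obs[n], means, stds)
    return alpha

obs = np.array([-0.9, 0.2, 1.8, 2.1])
print(forward(obs)[-1].sum())   # P(O) under the hypothetical model
```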
Inference/Decoding
- The main inference problem for HMMs is known as the decoding problem: given a sequence of observations, find the best sequence of states:
  q* = argmax_q P(q | O) = argmax_q P(q, O)
- An efficient decoding algorithm is the Viterbi algorithm
Viterbi algorithm
max_q P(q, O) = max_q P(o_0, o_1, ..., o_N, q_0, q_1, ..., q_N)
= max_q ∏_{n=0..N} P(q_n | q_{n-1}) P(o_n | q_n)
= max_{q_N} { P(o_N | q_N) max_{q_{N-1}} { P(q_N | q_{N-1}) P(o_{N-1} | q_{N-1}) ... max_{q_2} { P(q_3 | q_2) P(o_2 | q_2) max_{q_1} { P(q_2 | q_1) P(o_1 | q_1) max_{q_0} { P(q_1 | q_0) P(o_0 | q_0) P(q_0) } } } ... } }
Viterbi algorithm
[Trellis diagram: states 1, 2, 3, 4, ..., K (vertical) against time (horizontal)]
At each node keep only the best (most probable) path among all the paths passing through that node
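The trellis pruning idea corresponds to the following minimal Viterbi sketch in Python; the 3-state Gaussian parameters are again hypothetical, illustrative values rather than anything from the lecture:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical 3-state HMM with 1-D Gaussian observations (illustrative values).
prior = np.array([0.5, 0.3, 0.2])
trans = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.3, 0.5]])
means = np.array([-1.0, 0.0, 2.0])
stds  = np.array([0.5, 1.0, 0.8])

def viterbi(obs):
    """Return the most probable state sequence argmax_q P(q, O).
    delta[n, j] keeps only the best path ending in state j at time n;
    psi[n, j] remembers which predecessor achieved it (for backtracking)."""
    N, K = len(obs), len(prior)
    delta = np.zeros((N, K))
    psi = np.zeros((N, K), dtype=int)
    delta[0] = prior * norm.pdf(obs[0], means, stds)
    for n in range(1, N):
        scores = delta[n-1][:, None] * trans      # scores[i, j] = delta[n-1, i] * P(j | i)
        psi[n] = scores.argmax(axis=0)
        delta[n] = scores.max(axis=0) * norm.pdf(obs[n], means, stds)
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for n in range(N - 1, 0, -1):
        path.append(int(psi[n, path[-1]]))
    return path[::-1]

print(viterbi(np.array([-0.9, 0.2, 1.8, 2.1])))
```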
Deep Thoughts
- HMM-0 (an HMM with an MM-0 event chain) is the Bayes classifier!
- MMs and HMMs are poor models, but they are simple and computationally efficient
- How do you fix this? (dependent observations?)
Some Applications
- Speech Recognition
- Optical Character Recognition
- Part-of-Speech Tagging
- …
Conclusions
- HMMs and MMs are useful modeling tools for dependent sequences of events (states or classes)
- Efficient algorithms exist for training HMM parameters (Baum-Welch) and for decoding the most probable sequence of states given an observation sequence (Viterbi)
- HMMs have many applications