Hidden Markov Models (HMM) Rabiner’s Paper

1 Hidden Markov Models (HMM) Rabiner’s Paper
Markoviana Reading Group, Computer Eng. & Science Dept., Arizona State University (Fatih Gelgi, Feb 2005)

2 Stationary and Non-stationary
Stationary process: its statistical properties do not vary with time.
Non-stationary process: its signal properties vary over time.

3 HMM Example - Casino Coin
Two states: Fair (F) and Unfair (U).
State transition probabilities: P(F→F) = 0.9, P(F→U) = 0.1, P(U→F) = 0.2, P(U→U) = 0.8.
Symbol emission probabilities (observation symbols H, T): Fair: P(H) = 0.5, P(T) = 0.5; Unfair: P(H) = 0.3, P(T) = 0.7.
Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State sequence:       FFFFFFUUUFFFFFFUUUUUUUFFFFFF
Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated?
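A minimal NumPy sketch of this model, usable with the algorithm sketches on the later slides. The initial state distribution is not given on the slide, so a uniform π is assumed; the H/T emission order is read from the slide's table.

```python
import numpy as np

# Casino-coin HMM from the slide: state 0 = Fair, state 1 = Unfair.
A = np.array([[0.9, 0.1],    # Fair   -> Fair, Unfair
              [0.2, 0.8]])   # Unfair -> Fair, Unfair
B = np.array([[0.5, 0.5],    # Fair:   P(H), P(T)
              [0.3, 0.7]])   # Unfair: P(H), P(T)
pi = np.array([0.5, 0.5])    # initial distribution (assumed uniform; not on the slide)

# The slide's observation sequence, encoded as symbol indices (H = 0, T = 1).
obs = np.array([0 if c == 'H' else 1 for c in "HTHHTTHHHTHTHTHHTHHHHHHTHTHH"])
```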

4 Properties of an HMM
First-order Markov process: $q_t$ depends only on $q_{t-1}$.
Time is discrete.

5 Elements of an HMM
$N$, the number of states: $S_1, S_2, \dots, S_N$
$M$, the number of observation symbols: $O_1, O_2, \dots, O_M$
$\lambda$, the probability distributions: $A$ (state transitions), $B$ (symbol emissions), $\pi$ (initial states)

6 HMM Basic Problems
1. Given an observation sequence $O = O_1 O_2 O_3 \cdots O_T$ and $\lambda$, find $P(O \mid \lambda)$: Forward algorithm / Backward algorithm.
2. Given $O = O_1 O_2 \cdots O_T$ and $\lambda$, find the most likely state sequence $Q = q_1 q_2 \cdots q_T$: Viterbi algorithm.
3. Given $O = O_1 O_2 \cdots O_T$, re-estimate $\lambda$ so that $P(O \mid \lambda)$ is higher than it is now: Baum-Welch re-estimation.

7 Forward Algorithm Illustration
$\alpha_t(i)$ is the probability of observing the partial sequence $O_1 O_2 \cdots O_t$ and being in state $S_i$ at time $t$.

8 Forward Algorithm Illustration (cont’d)
$\alpha_t(i)$ is the probability of observing the partial sequence $O_1 O_2 \cdots O_t$ and being in state $S_i$ at time $t$. The illustration is a trellis with one row per state $S_1, \dots, S_N$ and one column per observation $O_1, \dots, O_T$. The first column is initialized as $\alpha_1(j) = \pi_j\, b_j(O_1)$; each later column is filled from the previous one, e.g. $\alpha_2(j) = \left[\sum_i \alpha_1(i)\, a_{ij}\right] b_j(O_2)$, and so on. The total of the last column gives the solution $P(O \mid \lambda)$.

9 Forward Algorithm
Definition: $\alpha_t(i) = P(O_1 O_2 \cdots O_t,\ q_t = S_i \mid \lambda)$ is the probability of observing the partial sequence $O_1 O_2 \cdots O_t$ and being in state $S_i$ at time $t$.
Initialization: $\alpha_1(i) = \pi_i\, b_i(O_1)$, $1 \le i \le N$
Induction: $\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1})$
Problem 1 answer: $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$
Complexity: $O(N^2 T)$
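As a concrete illustration, a minimal NumPy sketch of the forward pass (unscaled; see the scaling slides for the numerically safe version):

```python
import numpy as np

def forward(obs, A, B, pi):
    """Forward algorithm: returns the T x N table of alpha_t(i) and P(O|lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                  # initialization: pi_i * b_i(O_1)
    for t in range(1, T):                         # induction: O(N^2) work per step
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha, alpha[-1].sum()                 # P(O|lambda) = sum_i alpha_T(i)
```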

10 Backward Algorithm Illustration
$\beta_t(i)$ is the probability of observing the partial sequence $O_{t+1} O_{t+2} \cdots O_T$ given state $S_i$ at time $t$.

11 Backward Algorithm
Definition: $\beta_t(i) = P(O_{t+1} O_{t+2} \cdots O_T \mid q_t = S_i, \lambda)$ is the probability of observing the partial sequence $O_{t+1} O_{t+2} \cdots O_T$ given state $S_i$ at time $t$.
Initialization: $\beta_T(i) = 1$
Induction: $\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$
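A matching NumPy sketch of the backward pass, in the same style as the forward sketch above:

```python
import numpy as np

def backward(obs, A, B):
    """Backward algorithm: returns the T x N table of beta_t(i)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                      # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                      # induction, backwards in time
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
    return beta
```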

12 Q2: Optimality Criterion 1
Maximize the expected number of correct individual states.
Definition: $\gamma_t(i) = P(q_t = S_i \mid O, \lambda) = \dfrac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)}$ is the probability of being in state $S_i$ at time $t$ given the observation sequence $O$ and the model $\lambda$.
Problem 2 answer: $q_t^{*} = \arg\max_{1 \le i \le N} \gamma_t(i)$ for each $t$.
Problem: if some $a_{ij} = 0$, the optimal state sequence may not even be a valid state sequence.
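A sketch of this criterion, reusing the `forward` and `backward` functions sketched on the earlier slides (both are names introduced here, not from the slides):

```python
import numpy as np

def posterior_decode(obs, A, B, pi):
    """Pick q_t* = argmax_i gamma_t(i) independently at each t.
    As noted above, the result may not be a valid path if some a_ij = 0."""
    alpha, _ = forward(obs, A, B, pi)
    beta = backward(obs, A, B)
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)  # normalize: each gamma_t sums to 1
    return gamma.argmax(axis=1)
```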

13 Q2: Optimality Criterion 2
Find the single best state sequence (path), i.e. maximize $P(Q \mid O, \lambda)$.
Definition: $\delta_t(i) = \max_{q_1, \dots, q_{t-1}} P(q_1 \cdots q_{t-1},\ q_t = S_i,\ O_1 \cdots O_t \mid \lambda)$ is the highest probability of a state path accounting for the partial observation sequence $O_1 O_2 \cdots O_t$ and ending in state $S_i$.

14 Viterbi Algorithm
The major difference from the forward algorithm: maximization instead of summation.

15 Viterbi Algorithm Illustration
$\delta_t(i)$ is the highest probability of a state path accounting for the partial observation sequence $O_1 O_2 \cdots O_t$ and ending in state $S_i$. The trellis has one row per state $S_1, \dots, S_N$ and one column per observation $O_1, \dots, O_T$. The first column is initialized as $\delta_1(j) = \pi_j\, b_j(O_1)$; each later column is filled as $\delta_2(j) = \left[\max_i \delta_1(i)\, a_{ij}\right] b_j(O_2)$, and so on. The maximum of the last column indicates where the traceback starts.
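A minimal NumPy sketch of Viterbi in the same style as the forward sketch: the recursion is identical except that max replaces the sum, plus back-pointers for the traceback.

```python
import numpy as np

def viterbi(obs, A, B, pi):
    """Viterbi algorithm: most likely state path and its probability."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)          # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1, :, None] * A     # scores[i, j] = delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()              # traceback starts at max of last column
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()
```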

16 Relations with DBN
Forward function: $\alpha_{t+1}(j) = \left[\sum_i \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1})$
Backward function: $\beta_t(i) = \sum_j a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)$, with $\beta_T(i) = 1$
Viterbi algorithm: $\delta_{t+1}(j) = \left[\max_i \delta_t(i)\, a_{ij}\right] b_j(O_{t+1})$

17 Some more definitions
$\gamma_t(i) = P(q_t = S_i \mid O, \lambda)$ is the probability of being in state $S_i$ at time $t$.
$\xi_t(i,j) = P(q_t = S_i,\ q_{t+1} = S_j \mid O, \lambda) = \dfrac{\alpha_t(i)\, a_{ij}\, b_j(O_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}$ is the probability of being in state $S_i$ at time $t$ and state $S_j$ at time $t+1$.
Note that $\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i,j)$.
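A sketch of both quantities computed from the forward/backward tables of the earlier sketches (`gamma_xi` is a name introduced here):

```python
import numpy as np

def gamma_xi(obs, A, B, alpha, beta):
    """Compute gamma_t(i) and xi_t(i,j) from the forward/backward tables."""
    T, N = len(obs), A.shape[0]
    prob = alpha[-1].sum()                     # P(O|lambda)
    gamma = alpha * beta / prob                # shape (T, N)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi[t, i, j] = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O|lambda)
        xi[t] = alpha[t, :, None] * A * B[:, obs[t + 1]] * beta[t + 1] / prob
    return gamma, xi
```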

18 Baum-Welch Re-estimation
Expectation-Maximization algorithm.
Expectation: $\sum_{t=1}^{T-1} \gamma_t(i)$ is the expected number of transitions out of $S_i$; $\sum_{t=1}^{T-1} \xi_t(i,j)$ is the expected number of transitions from $S_i$ to $S_j$.

19 Baum-Welch Re-estimation (cont'd)
Maximization:
$\bar{\pi}_i = \gamma_1(i)$
$\bar{a}_{ij} = \dfrac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}$
$\bar{b}_j(k) = \dfrac{\sum_{t=1,\ O_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}$
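A sketch of this maximization step using the `gamma_xi` output from the previous slide's sketch; `obs` is assumed to be an integer array of symbol indices and `M` the number of symbols:

```python
import numpy as np

def baum_welch_update(obs, gamma, xi, M):
    """One M-step: re-estimate (pi, A, B) from the expected counts."""
    N = gamma.shape[1]
    pi_new = gamma[0]                                         # pi_i = gamma_1(i)
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]  # transitions / visits
    B_new = np.zeros((N, M))
    for k in range(M):
        B_new[:, k] = gamma[obs == k].sum(axis=0)             # steps where O_t = v_k
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new
```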

20 Notes on the Re-estimation
If the model does not change, it has reached a local maximum; depending on the model, many local maxima can exist.
Re-estimated probabilities will sum to 1.

21 Implementation issues
Scaling
Multiple observation sequences
Initial parameter estimation
Missing data
Choice of model size and type

22 Scaling
$\alpha_t(i)$ calculation: each $\alpha_t(i)$ is a product of many probabilities, so it heads exponentially to zero and underflows for long sequences.
Recursion to calculate the scaled variables: $c_t = \dfrac{1}{\sum_{i=1}^{N} \alpha_t(i)}$, $\hat{\alpha}_t(i) = c_t\, \alpha_t(i)$.

23 Scaling (cont'd)
$\beta_t(i)$ calculation: scale with the same factors, $\hat{\beta}_t(i) = c_t\, \beta_t(i)$.
Desired condition: $\sum_{i=1}^{N} \hat{\alpha}_t(i) = 1$.
Note that $\sum_{i=1}^{N} \hat{\beta}_t(i) = 1$ is not true!

24 Scaling (cont'd)
The scale factors cancel in the re-estimation formulas, so the updates are unchanged. The sequence probability follows from the scale factors: $\log P(O \mid \lambda) = -\sum_{t=1}^{T} \log c_t$ (kept as a log to avoid underflow).
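A scaled version of the earlier forward sketch, following the scheme on these slides:

```python
import numpy as np

def forward_scaled(obs, A, B, pi):
    """Scaled forward pass: normalizes each column of alphas to sum to 1
    and recovers log P(O|lambda) from the scale factors."""
    T, N = len(obs), len(pi)
    alpha_hat = np.zeros((T, N))
    c = np.zeros(T)                               # scale factors c_t
    alpha_hat[0] = pi * B[:, obs[0]]
    c[0] = 1.0 / alpha_hat[0].sum()
    alpha_hat[0] *= c[0]
    for t in range(1, T):
        alpha_hat[t] = (alpha_hat[t - 1] @ A) * B[:, obs[t]]
        c[t] = 1.0 / alpha_hat[t].sum()
        alpha_hat[t] *= c[t]
    return alpha_hat, c, -np.log(c).sum()         # log P(O|lambda) = -sum_t log c_t
```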

25 Maximum log-likelihood
Viterbi can instead be run entirely in log space, which also avoids underflow:
Initialization: $\phi_1(i) = \log \pi_i + \log b_i(O_1)$
Recursion: $\phi_t(j) = \max_{1 \le i \le N} \left[\phi_{t-1}(i) + \log a_{ij}\right] + \log b_j(O_t)$
Termination: $\log P^{*} = \max_{1 \le i \le N} \phi_T(i)$
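The log-space variant of the earlier Viterbi sketch; products become sums, so long sequences cannot underflow:

```python
import numpy as np

def viterbi_log(obs, A, B, pi):
    """Viterbi in log space: returns the best path and its log-probability."""
    with np.errstate(divide='ignore'):            # log(0) = -inf is harmless under max
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), len(pi)
    phi = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    phi[0] = logpi + logB[:, obs[0]]              # initialization
    for t in range(1, T):                         # recursion
        scores = phi[t - 1, :, None] + logA
        psi[t] = scores.argmax(axis=0)
        phi[t] = scores.max(axis=0) + logB[:, obs[t]]
    path = np.zeros(T, dtype=int)                 # termination and traceback
    path[-1] = phi[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, phi[-1].max()
```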

26 Multiple observation sequences
Problem with re-estimation: a single sequence rarely provides enough observations of every state. With multiple sequences, accumulate the expected counts (the numerators and denominators of the re-estimation formulas) over all sequences before normalizing.
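A sketch of one pooled re-estimation step over several sequences, reusing the `forward`, `backward`, and `gamma_xi` sketches from the earlier slides (all names introduced in this document, not from the slides):

```python
import numpy as np

def baum_welch_multi(sequences, A, B, pi, M):
    """One update: pool expected counts across K sequences, then normalize."""
    N = A.shape[0]
    pi_num = np.zeros(N)
    A_num, A_den = np.zeros((N, N)), np.zeros(N)
    B_num, B_den = np.zeros((N, M)), np.zeros(N)
    for obs in sequences:
        alpha, _ = forward(obs, A, B, pi)
        beta = backward(obs, A, B)
        gamma, xi = gamma_xi(obs, A, B, alpha, beta)
        pi_num += gamma[0]
        A_num += xi.sum(axis=0)
        A_den += gamma[:-1].sum(axis=0)
        for k in range(M):
            B_num[:, k] += gamma[obs == k].sum(axis=0)
        B_den += gamma.sum(axis=0)
    return (pi_num / len(sequences),
            A_num / A_den[:, None],
            B_num / B_den[:, None])
```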

27 Initial estimates of parameters
For  and A, Random or uniform is sufficient For B (discrete symbol prb.), Good initial estimate is needed Markoviana Reading Group Fatih Gelgi – Feb, 2005

28 Insufficient training data
Solutions:
Increase the size of the training data
Reduce the size of the model
Interpolate parameters using another model

29 References
L. Rabiner. 'A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.' Proceedings of the IEEE, 1989.
S. Russell, P. Norvig. 'Probabilistic Reasoning over Time.' AI: A Modern Approach, Ch. 15, 2002 (draft).
V. Borkar, K. Deshmukh, S. Sarawagi. 'Automatic Segmentation of Text into Structured Records.' ACM SIGMOD, 2001.
T. Scheffer, C. Decomain, S. Wrobel. 'Active Hidden Markov Models for Information Extraction.' Proceedings of the International Symposium on Intelligent Data Analysis, 2001.
S. Ray, M. Craven. 'Representing Sentence Structure in Hidden Markov Models for Information Extraction.' Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.

