1
Combined Lecture
CS621: Artificial Intelligence (lecture 25)
CS626/449: Speech-NLP-Web / Topics in AI (lecture 26)
Pushpak Bhattacharyya, Computer Science and Engineering Department, IIT Bombay
Forward-Backward probability; Viterbi Algorithm
2
Another Example: a colored ball choosing example

Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30

Probability of transition to another urn after picking a ball:

        U1    U2    U3
U1      0.1   0.4   0.5
U2      0.6   0.2   0.2
U3      0.3   0.4   0.3
3
Example (contd.)

Transition probabilities:
        U1    U2    U3
U1      0.1   0.4   0.5
U2      0.6   0.2   0.2
U3      0.3   0.4   0.3

Emission (ball colour) probabilities:
        R     G     B
U1      0.3   0.5   0.2
U2      0.1   0.4   0.5
U3      0.6   0.1   0.3

Given the observation RRGGBRGR, what is the state (urn) sequence? Not so easily computable.
4
Example (contd.)
Here:
– S = {U1, U2, U3}
– V = {R, G, B}
For the observation:
– O = {o_1 … o_n}
And the state sequence:
– Q = {q_1 … q_n}

A = (transition probabilities)
        U1    U2    U3
U1      0.1   0.4   0.5
U2      0.6   0.2   0.2
U3      0.3   0.4   0.3

B = (emission probabilities)
        R     G     B
U1      0.3   0.5   0.2
U2      0.1   0.4   0.5
U3      0.6   0.1   0.3

π is the initial state probability distribution.
5
Hidden Markov Models
6
Model Definition
– Set of states: S, where |S| = N
– Output alphabet: V
– Transition probabilities: A = {a_ij}
– Emission probabilities: B = {b_j(o_k)}
– Initial state probabilities: π
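As a concrete instance of this definition, a minimal sketch (Python/NumPy) encoding the urn example's parameters A and B from the tables above; the uniform initial distribution π is an assumption, since the slides do not specify it.

```python
import numpy as np

states = ["U1", "U2", "U3"]          # S, with N = 3
symbols = ["R", "G", "B"]            # output alphabet V

# Transition probabilities A = {a_ij}, taken from the table above
A = np.array([[0.1, 0.4, 0.5],
              [0.6, 0.2, 0.2],
              [0.3, 0.4, 0.3]])

# Emission probabilities B = {b_j(o_k)}, taken from the table above
B = np.array([[0.3, 0.5, 0.2],
              [0.1, 0.4, 0.5],
              [0.6, 0.1, 0.3]])

# Initial state probabilities pi -- assumed uniform here (not given on the slides)
pi = np.array([1/3, 1/3, 1/3])

# Each row of A and B must be a probability distribution
assert np.allclose(A.sum(axis=1), 1.0) and np.allclose(B.sum(axis=1), 1.0)
```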
7
Markov Processes: Properties
– Limited Horizon: given the previous n states, the state at time t is independent of all earlier states.
  P(X_t = i | X_{t-1}, X_{t-2}, …, X_0) = P(X_t = i | X_{t-1}, X_{t-2}, …, X_{t-n})
– Time invariance:
  P(X_t = i | X_{t-1} = j) = P(X_1 = i | X_0 = j) = P(X_n = i | X_{n-1} = j)
8
Three Basic Problems of HMM
1. Given an observation sequence O = {o_1 … o_T}: efficiently estimate P(O | λ).
2. Given an observation sequence O = {o_1 … o_T}: find the best state sequence Q = {q_1 … q_T}, i.e. maximize P(Q | O, λ).
3. Adjust the model parameters λ = (A, B, π) to maximize P(O | λ), i.e. re-estimate λ.
9
Three basic problems (contd.)
Problem 1: Likelihood of a sequence
– Forward Procedure
– Backward Procedure
Problem 2: Best state sequence
– Viterbi Algorithm
Problem 3: Re-estimation
– Baum-Welch (Forward-Backward Algorithm)
10
Problem 2
Given an observation sequence O = {o_1 … o_T}, find the "best" Q = {q_1 … q_T}, i.e. maximize P(Q | O, λ).
Solutions:
1. Best state individually likely at a position i
2. Best state given all the previously observed states and observations: Viterbi Algorithm
11
Example
Output observed: aabb. What state sequence is most probable?
Since the state sequence cannot be predicted with certainty, the machine is given the qualification "hidden".
Note: Σ P(outgoing arcs) = 1 for every state.
12
Probabilities for different possible sequences (reading "aabb", starting in state 1):
ε:    1 → 1.0
a:    1,1 → 0.4;   1,2 → 0.15
aa:   1,1,1 → 0.16;   1,1,2 → 0.06;   1,2,1 → 0.0375;   1,2,2 → 0.0225
aab:  1,1,1,1 → 0.016;   1,1,1,2 → 0.056;   1,1,2,1 → 0.018;   1,1,2,2 → 0.018;   …and so on
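A minimal brute-force sketch of this enumeration in Python. The arc probabilities below are reverse-engineered from the numbers in the tree (e.g. 0.16 = 0.4 × 0.4, 0.0375 = 0.15 × 0.25); the original state diagram is not reproduced here, so treat them as an assumption.

```python
from itertools import product

# Arc probabilities P[(from_state, symbol, to_state)], inferred from the tree above
# (hypothetical reconstruction; the original state diagram is not shown here).
P = {
    (1, 'a', 1): 0.4,  (1, 'a', 2): 0.15, (1, 'b', 1): 0.1,  (1, 'b', 2): 0.35,
    (2, 'a', 1): 0.25, (2, 'a', 2): 0.15, (2, 'b', 1): 0.3,  (2, 'b', 2): 0.3,
}

output = "aabb"

def seq_prob(states, symbols):
    """Joint probability of emitting `symbols` along the state sequence `states`."""
    p = 1.0
    for (s, w, s_next) in zip(states, symbols, states[1:]):
        p *= P[(s, w, s_next)]
    return p

# Brute force: try every state sequence of length 5 that starts in state 1.
candidates = [(1,) + rest for rest in product((1, 2), repeat=len(output))]
scored = sorted(((seq_prob(q, output), q) for q in candidates), reverse=True)

for p, q in scored[:4]:
    print(q, p)          # the most probable state sequences and their probabilities

# This enumeration grows as 2^T -- the motivation for the Viterbi algorithm later.
```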
13
Viterbi for higher order HMM
If P(s_i | s_{i-1}, s_{i-2}) (an order-2 HMM), then the Markovian assumption takes effect only after two levels (generalizing to order n: after n levels).
14
Forward and Backward Probability Calculation
15
A Simple HMM
Two states q and r, with q the start state; each arc is labelled with the symbol it emits and the arc probability (arc assignments consistent with the forward probabilities of "bbba" computed below):
– q → q: a: 0.4, b: 0.2
– q → r: a: 0.3, b: 0.1
– r → q: a: 0.2, b: 0.1
– r → r: a: 0.2, b: 0.5
16
Forward or α-probabilities
Let α_i(t) be the probability of producing w_{1,t-1} while ending up in state s_i:
α_i(t) = P(w_{1,t-1}, S_t = s_i),  t > 1
17
Initial condition on α_i(t):
α_i(1) = 1.0 if i = 1, and 0 otherwise.
18
Probability of the observation using α_i(t):
P(w_{1,n}) = Σ_{i=1}^{σ} P(w_{1,n}, S_{n+1} = s_i) = Σ_{i=1}^{σ} α_i(n+1)
where σ is the total number of states.
19
Recursive expression for α:
α_j(t+1) = P(w_{1,t}, S_{t+1} = s_j)
         = Σ_{i=1}^{σ} P(w_{1,t}, S_t = s_i, S_{t+1} = s_j)
         = Σ_{i=1}^{σ} P(w_{1,t-1}, S_t = s_i) P(w_t, S_{t+1} = s_j | w_{1,t-1}, S_t = s_i)
         = Σ_{i=1}^{σ} P(w_{1,t-1}, S_t = s_i) P(w_t, S_{t+1} = s_j | S_t = s_i)     (Markov assumption)
         = Σ_{i=1}^{σ} α_i(t) P(w_t, S_{t+1} = s_j | S_t = s_i)
20
The forward probabilities of "bbba"

Time tick t:    1      2      3      4       5
Input so far:   ε      b      bb     bbb     bbba
α_q(t):         1.0    0.2    0.05   0.017   0.0148
α_r(t):         0.0    0.1    0.07   0.04    0.0131
P(w, t):        1.0    0.3    0.12   0.057   0.0279
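A minimal sketch of the forward recursion in Python, using the arc probabilities read off the simple q/r HMM above (an inferred reconstruction, so treat the numbers as an assumption); with these numbers the recursion reproduces the α values in the table.

```python
# Forward (alpha) recursion for the two-state q/r HMM above.
# arc[i][w][j] = P(w_t = w, S_{t+1} = s_j | S_t = s_i)
arc = {
    'q': {'a': {'q': 0.4, 'r': 0.3}, 'b': {'q': 0.2, 'r': 0.1}},
    'r': {'a': {'q': 0.2, 'r': 0.2}, 'b': {'q': 0.1, 'r': 0.5}},
}
states = ['q', 'r']

def forward(word, start='q'):
    # alpha_i(1): probability 1.0 in the start state, 0 elsewhere
    alpha = {s: (1.0 if s == start else 0.0) for s in states}
    print(1, 'ε', alpha, sum(alpha.values()))
    for t, w in enumerate(word, start=2):
        # alpha_j(t+1) = sum_i alpha_i(t) * P(w_t, S_{t+1}=s_j | S_t=s_i)
        alpha = {j: sum(alpha[i] * arc[i][w][j] for i in states) for j in states}
        print(t, word[:t-1], alpha, sum(alpha.values()))
    return sum(alpha.values())   # P(w_{1,n}) = sum_i alpha_i(n+1)

print(forward("bbba"))           # ≈ 0.0279, the last column of the table
```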
21
Backward or β-probabilities
Let β_i(t) be the probability of seeing w_{t,n}, given that the state of the HMM at time t is s_i:
β_i(t) = P(w_{t,n} | S_t = s_i)
22
Probability of the observation using β:
P(w_{1,n}) = β_1(1)
23
Recursive expression for β:
β_i(t-1) = P(w_{t-1,n} | S_{t-1} = s_i)
         = Σ_{j=1}^{σ} P(w_{t-1,n}, S_t = s_j | S_{t-1} = s_i)
         = Σ_{j=1}^{σ} P(w_{t-1}, S_t = s_j | S_{t-1} = s_i) P(w_{t,n} | w_{t-1}, S_t = s_j, S_{t-1} = s_i)
         = Σ_{j=1}^{σ} P(w_{t-1}, S_t = s_j | S_{t-1} = s_i) P(w_{t,n} | S_t = s_j)     (consequence of the Markov assumption)
         = Σ_{j=1}^{σ} P(w_{t-1}, S_t = s_j | S_{t-1} = s_i) β_j(t)
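A matching sketch of the backward recursion on the same (inferred) q/r arc probabilities; it gives β_q(1) = P("bbba") ≈ 0.0279, agreeing with the forward pass.

```python
# Backward (beta) recursion for the same two-state q/r HMM.
arc = {  # arc[i][w][j] = P(w, S_{t+1}=s_j | S_t=s_i); same inferred numbers as before
    'q': {'a': {'q': 0.4, 'r': 0.3}, 'b': {'q': 0.2, 'r': 0.1}},
    'r': {'a': {'q': 0.2, 'r': 0.2}, 'b': {'q': 0.1, 'r': 0.5}},
}
states = ['q', 'r']

def backward(word):
    beta = {s: 1.0 for s in states}      # beta_i(n+1) = 1: nothing left to emit
    for w in reversed(word):             # beta_i(t-1) = sum_j P(w_{t-1}, s_j | s_i) * beta_j(t)
        beta = {i: sum(arc[i][w][j] * beta[j] for j in states) for i in states}
    return beta

print(backward("bbba")['q'])   # P(w_{1,n}) = beta_1(1) ≈ 0.0279, matching the forward result
```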
24
Forward Procedure Forward Step:
25
Forward Procedure
26
Backward Procedure
28
Forward-Backward Procedure
Benefit: order N²T, as compared to 2T·N^T for the naive computation.
Only the Forward or the Backward procedure is needed for Problem 1.
29
Problem 2
Given an observation sequence O = {o_1 … o_T}, find the "best" Q = {q_1 … q_T}, i.e. maximize P(Q | O, λ).
Solutions:
1. Best state individually likely at a position i
2. Best state given all the previously observed states and observations: Viterbi Algorithm
30
Viterbi Algorithm
Define δ_j(t) = max over S_1 … S_{t-1} of P(w_{1,t-1}, S_1, …, S_{t-1}, S_t = s_j), i.e. the probability of the state sequence ending in s_j which has the best joint probability with the output so far.
By induction, we have
δ_j(t+1) = max_i [ δ_i(t) · P(w_t, S_{t+1} = s_j | S_t = s_i) ]
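A minimal Viterbi sketch on the same (inferred) q/r HMM: the forward recursion with the sum replaced by a max, plus back-pointers to recover the best state sequence.

```python
# Viterbi for the two-state q/r HMM: like the forward pass, but max instead of sum.
arc = {  # arc[i][w][j] = P(w, S_{t+1}=s_j | S_t=s_i); same inferred numbers as before
    'q': {'a': {'q': 0.4, 'r': 0.3}, 'b': {'q': 0.2, 'r': 0.1}},
    'r': {'a': {'q': 0.2, 'r': 0.2}, 'b': {'q': 0.1, 'r': 0.5}},
}
states = ['q', 'r']

def viterbi(word, start='q'):
    delta = {s: (1.0 if s == start else 0.0) for s in states}   # delta_j(1)
    back = []                                                    # back-pointers, one dict per step
    for w in word:
        # delta_j(t+1) = max_i delta_i(t) * P(w_t, j | i); also remember the argmax i
        step = {j: max((delta[i] * arc[i][w][j], i) for i in states) for j in states}
        delta = {j: step[j][0] for j in states}
        back.append({j: step[j][1] for j in states})
    # pick the best final state, then follow back-pointers to recover the whole sequence
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return list(reversed(path)), delta[last]

print(viterbi("bbba"))   # best state sequence for "bbba" and its joint probability
```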
31
Viterbi Algorithm
33
Problem 3
How to adjust the model parameters λ = (A, B, π) so as to maximize P(O | λ), i.e. re-estimate λ.
Solution:
– Re-estimate (iteratively update and improve) the HMM parameters A, B, π using the Baum-Welch algorithm.
34
Baum-Welch Algorithm
Define ξ_t(i, j), the probability of being in state S_i at time t and in state S_j at time t+1, given the observation sequence and the model.
Putting in the forward and backward variables, ξ_t(i, j) is expressed in terms of α_i(t), the arc probability from S_i to S_j on w_t, and β_j(t+1), normalized by P(w_{1,n}).
35
Baum-Welch algorithm
36
Define γ_t(i) = Σ_j ξ_t(i, j).
Then Σ_t γ_t(i) is the expected number of transitions from S_i,
and Σ_t ξ_t(i, j) is the expected number of transitions from S_i to S_j.
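A compact sketch of one Baum-Welch re-estimation step for the arc-emission HMM used above, combining the forward and backward variables. The training word and arc probabilities are the same inferred numbers as before, so this is an illustrative sketch rather than the slides' exact derivation.

```python
from collections import defaultdict

arc = {  # arc[i][w][j] = P(w, S_{t+1}=s_j | S_t=s_i); inferred numbers used as the starting model
    'q': {'a': {'q': 0.4, 'r': 0.3}, 'b': {'q': 0.2, 'r': 0.1}},
    'r': {'a': {'q': 0.2, 'r': 0.2}, 'b': {'q': 0.1, 'r': 0.5}},
}
states, start, word = ['q', 'r'], 'q', "bbba"

# Forward: list index k holds alpha_i(k+1) = P(w_1..w_k, S_{k+1} = s_i)
alpha = [{i: (1.0 if i == start else 0.0) for i in states}]
for w in word:
    alpha.append({j: sum(alpha[-1][i] * arc[i][w][j] for i in states) for j in states})

# Backward: list index k holds beta_i(k+1) = P(w_{k+1}..w_n | S_{k+1} = s_i)
beta = [{i: 1.0 for i in states}]
for w in reversed(word):
    beta.insert(0, {i: sum(arc[i][w][j] * beta[0][j] for j in states) for i in states})

total = sum(alpha[-1].values())        # P(w_1..w_n)

# xi_t(i, j) = alpha_i(t) * P(w_t, j | i) * beta_j(t+1) / P(w);  gamma_t(i) = sum_j xi_t(i, j)
count = defaultdict(float)             # expected number of times arc (i, w, j) is used
gamma = defaultdict(float)             # expected number of transitions out of i
for t, w in enumerate(word):
    for i in states:
        for j in states:
            xi = alpha[t][i] * arc[i][w][j] * beta[t + 1][j] / total
            count[(i, w, j)] += xi
            gamma[i] += xi

# Re-estimated arc probabilities: expected uses of each arc / expected transitions from i
new_arc = {i: {w: {j: count[(i, w, j)] / gamma[i] for j in states}
               for w in ('a', 'b')} for i in states}
print(new_arc)
```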
38
Baum-Welch Algorithm
Baum et al. proved that these re-estimation equations lead to a model that is as good as, or better than, the previous one.