
1 Combined Lecture CS621: Artificial Intelligence (lecture 25), CS626/449: Speech-NLP-Web/Topics-in-AI (lecture 26). Pushpak Bhattacharyya, Computer Science and Engineering Department, IIT Bombay. Forward and Backward probability; Viterbi Algorithm

2 Another Example: a coloured-ball-choosing example
Urn 1: # of Red = 30, # of Green = 50, # of Blue = 20
Urn 2: # of Red = 10, # of Green = 40, # of Blue = 50
Urn 3: # of Red = 60, # of Green = 10, # of Blue = 30
Probability of transition to another urn after picking a ball:
       U1    U2    U3
U1     0.1   0.4   0.5
U2     0.6   0.2   0.2
U3     0.3   0.4   0.3

3 Example (contd.)
Transition probabilities:
       U1    U2    U3
U1     0.1   0.4   0.5
U2     0.6   0.2   0.2
U3     0.3   0.4   0.3
Observation (colour) probabilities:
       R     G     B
U1     0.3   0.5   0.2
U2     0.1   0.4   0.5
U3     0.6   0.1   0.3
Given observation: RRGGBRGR
State sequence: ?? Not so easily computable.

4 Example (contd.)
Here:
– S = {U1, U2, U3}
– V = {R, G, B}
– For observation: O = {o_1 … o_n}
– And state sequence: Q = {q_1 … q_n}
– π is the initial probability of the states.
A =
       U1    U2    U3
U1     0.1   0.4   0.5
U2     0.6   0.2   0.2
U3     0.3   0.4   0.3
B =
       R     G     B
U1     0.3   0.5   0.2
U2     0.1   0.4   0.5
U3     0.6   0.1   0.3

5 Hidden Markov Models

6 Model Definition
– Set of states: S, where |S| = N
– Output alphabet: V
– Transition probabilities: A = {a_ij}
– Emission probabilities: B = {b_j(o_k)}
– Initial state probabilities: π
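As a concrete instance of these five elements, the matrices of slide 4 can be written down directly. The sketch below is not from the original deck; the variable names and the choice of starting in U1 (for π, which the slides do not give numerically) are assumptions made only for illustration.

```python
# The urn HMM of slides 2-4 written as the five elements (S, V, A, B, pi).
S  = ["U1", "U2", "U3"]        # set of states (urns), |S| = N = 3
V  = ["R", "G", "B"]           # output alphabet (ball colours)
A  = [[0.1, 0.4, 0.5],         # a_ij = P(move to urn j | currently at urn i)
      [0.6, 0.2, 0.2],
      [0.3, 0.4, 0.3]]
B  = [[0.3, 0.5, 0.2],         # b_j(o_k) = P(draw colour o_k | urn j)
      [0.1, 0.4, 0.5],
      [0.6, 0.1, 0.3]]
pi = [1.0, 0.0, 0.0]           # initial state probabilities (start in U1: assumed)
```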

7 Markov Processes
Properties:
– Limited Horizon: given the previous n states, a state is independent of all earlier states.
P(X_t = i | X_{t-1}, X_{t-2}, …, X_0) = P(X_t = i | X_{t-1}, X_{t-2}, …, X_{t-n})
– Time invariance: the transition probabilities do not depend on t.
P(X_t = i | X_{t-1} = j) = P(X_1 = i | X_0 = j) = P(X_n = i | X_{n-1} = j)

8 Three Basic Problems of HMM
1. Given observation sequence O = {o_1 … o_T}: efficiently estimate P(O | λ).
2. Given observation sequence O = {o_1 … o_T}: get the best Q = {q_1 … q_T}, i.e. maximize P(Q | O, λ).
3. How to adjust λ to best maximize P(O | λ): re-estimate λ.

9 Three basic problems (contd.)
Problem 1: Likelihood of a sequence
– Forward Procedure
– Backward Procedure
Problem 2: Best state sequence
– Viterbi Algorithm
Problem 3: Re-estimation
– Baum-Welch (Forward-Backward) Algorithm

10 Problem 2
Given observation sequence O = {o_1 … o_T}, get the "best" Q = {q_1 … q_T}, i.e. maximize P(Q | O, λ).
Solutions:
1. Best state individually likely at a position i
2. Best state given all the previously observed states and observations → Viterbi Algorithm

11 Example
Output observed: aabb
What state sequence is most probable? Since the state sequence cannot be predicted with certainty, the machine is called "hidden".
Note: Σ P(outgoing arcs) = 1 for every state.

12 Probabilities for different possible sequences
1
1,1: 0.4        1,2: 0.15
1,1,1: 0.16     1,1,2: 0.06     1,2,1: 0.0375   1,2,2: 0.0225
1,1,1,1: 0.016  1,1,1,2: 0.056  1,1,2,1: 0.018  1,1,2,2: 0.018  …and so on
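The tree above is built by brute force: every partial state sequence is extended in all possible ways and its probability multiplied out. The machine behind this particular tree appears only as a figure in the original deck, so the sketch below runs the same brute-force enumeration on the urn HMM of slide 4 instead (start in U1 assumed); it also makes concrete why this direct approach costs on the order of 2T·N^T operations (see slide 28).

```python
from itertools import product

# Brute-force P(O, Q) for every state sequence Q on the urn HMM of slide 4.
# Starting in U1 is an assumption; the machine behind the tree above is a figure.
states = ["U1", "U2", "U3"]
A  = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
      "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
      "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}
B  = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},
      "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
      "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}
pi = {"U1": 1.0, "U2": 0.0, "U3": 0.0}

O = "RRGG"                                  # a short prefix of RRGGBRGR
for Q in product(states, repeat=len(O)):    # N^T sequences, each of cost O(T)
    p = pi[Q[0]] * B[Q[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[Q[t-1]][Q[t]] * B[Q[t]][O[t]]
    print(Q, round(p, 6))
```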

13 Viterbi for higher-order HMM
If P(s_i | s_{i-1}, s_{i-2}) (an order-2 HMM), then the Markovian assumption takes effect only after two levels (generalizing to order n: after n levels).

14 Forward and Backward Probability Calculation

15 A Simple HMM
Two states, q and r; each arc carries the symbol emitted and its probability:
– q → q: a: 0.4, b: 0.2
– q → r: a: 0.3, b: 0.1
– r → q: a: 0.2, b: 0.1
– r → r: a: 0.2, b: 0.5

16 Forward or α-probabilities
Let α_i(t) be the probability of producing w_{1,t-1} while ending up in state s_i:
α_i(t) = P(w_{1,t-1}, S_t = s_i), t > 1

17 Initial condition on α_i(t)
α_i(1) = 1.0 if i = 1 (the start state), 0 otherwise

18 Probability of the observation using α_i(t)
P(w_{1,n}) = Σ_{i=1}^{σ} P(w_{1,n}, S_{n+1} = s_i) = Σ_{i=1}^{σ} α_i(n+1)
where σ is the total number of states.

19 Recursive expression for α
α_j(t+1) = P(w_{1,t}, S_{t+1} = s_j)
= Σ_{i=1}^{σ} P(w_{1,t}, S_t = s_i, S_{t+1} = s_j)
= Σ_{i=1}^{σ} P(w_{1,t-1}, S_t = s_i) · P(w_t, S_{t+1} = s_j | w_{1,t-1}, S_t = s_i)
= Σ_{i=1}^{σ} P(w_{1,t-1}, S_t = s_i) · P(w_t, S_{t+1} = s_j | S_t = s_i)   (Markov assumption)
= Σ_{i=1}^{σ} α_i(t) · P(w_t, S_{t+1} = s_j | S_t = s_i)

20 The forward probabilities of "bbba"
Time tick     1     2     3      4       5
Input so far  ε     b     bb     bbb     bbba
α_q           1.0   0.2   0.05   0.017   0.0148
α_r           0.0   0.1   0.07   0.04    0.0131
P(w, t)       1.0   0.3   0.12   0.057   0.0279
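The columns of this table can be reproduced with the α-recursion of slide 19. The sketch below is not from the deck; it assumes the arc probabilities reconstructed on slide 15, with q as the start state.

```python
# alpha-recursion on the two-state HMM of slide 15 (reconstructed arc labels).
# arc[(i, j)][w] = P(w, S_{t+1} = j | S_t = i): emit w while moving from i to j.
arc = {('q', 'q'): {'a': 0.4, 'b': 0.2},
       ('q', 'r'): {'a': 0.3, 'b': 0.1},
       ('r', 'q'): {'a': 0.2, 'b': 0.1},
       ('r', 'r'): {'a': 0.2, 'b': 0.5}}
states = ['q', 'r']

def forward(observation, start='q'):
    # alpha[s] = P(w_1 .. w_{t-1}, S_t = s); all mass starts on the start state.
    def report(a):
        print({s: round(p, 4) for s, p in a.items()}, 'P =', round(sum(a.values()), 4))
    alpha = {s: 1.0 if s == start else 0.0 for s in states}
    report(alpha)
    for w in observation:
        alpha = {j: sum(alpha[i] * arc[i, j][w] for i in states) for j in states}
        report(alpha)

forward('bbba')   # prints the alpha columns and the P(w, t) row of the table
```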

21 Backward or β-probabilities
Let β_i(t) be the probability of seeing w_{t,n}, given that the state of the HMM at time t is s_i:
β_i(t) = P(w_{t,n} | S_t = s_i)

22 Probability of the observation using β
P(w_{1,n}) = β_1(1)

23 Recursive expression for β
β_i(t-1) = P(w_{t-1,n} | S_{t-1} = s_i)
= Σ_{j=1}^{σ} P(w_{t-1,n}, S_t = s_j | S_{t-1} = s_i)
= Σ_{j=1}^{σ} P(w_{t-1}, S_t = s_j | S_{t-1} = s_i) · P(w_{t,n} | w_{t-1}, S_t = s_j, S_{t-1} = s_i)
= Σ_{j=1}^{σ} P(w_{t-1}, S_t = s_j | S_{t-1} = s_i) · P(w_{t,n} | S_t = s_j)   (consequence of the Markov assumption)
= Σ_{j=1}^{σ} P(w_{t-1}, S_t = s_j | S_{t-1} = s_i) · β_j(t)
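As a quick check of P(w_{1,n}) = β_1(1), the same recursion can be run right-to-left on the reconstructed HMM of slide 15. The sketch below is ours, not the deck's; it returns 0.0279 for "bbba", matching the forward total on slide 20.

```python
# beta-recursion on the two-state HMM of slide 15 (reconstructed arc labels).
arc = {('q', 'q'): {'a': 0.4, 'b': 0.2},
       ('q', 'r'): {'a': 0.3, 'b': 0.1},
       ('r', 'q'): {'a': 0.2, 'b': 0.1},
       ('r', 'r'): {'a': 0.2, 'b': 0.5}}
states = ['q', 'r']

def backward(observation):
    # beta[s] = P(w_t .. w_n | S_t = s); at t = n+1 nothing remains to be emitted.
    beta = {s: 1.0 for s in states}
    for w in reversed(observation):
        beta = {i: sum(arc[i, j][w] * beta[j] for j in states) for i in states}
    return beta

print(round(backward('bbba')['q'], 4))   # 0.0279 = P(w_1..n), since q is the start state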

24 Forward Procedure Forward Step:

25 Forward Procedure

26 Backward Procedure


28 Forward-Backward Procedure
Benefit: order N²T, compared to 2T·N^T for the direct (brute-force) computation.
Only the forward or the backward procedure is needed for Problem 1.

29 Problem 2
Given observation sequence O = {o_1 … o_T}, get the "best" Q = {q_1 … q_T}, i.e. maximize P(Q | O, λ).
Solutions:
1. Best state individually likely at a position i
2. Best state given all the previously observed states and observations → Viterbi Algorithm

30 Viterbi Algorithm
Define δ_t(i) = max over q_1 … q_{t-1} of P(q_1 … q_{t-1}, q_t = s_i, o_1 … o_t | λ), i.e. the probability of the state sequence which has the best joint probability so far.
By induction, we have δ_{t+1}(j) = [max_i δ_t(i) · a_ij] · b_j(o_{t+1}).

31 Viterbi Algorithm
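The Viterbi slide itself is an image in this transcript, so the sketch below is only an illustration of the δ/backpointer recursion defined on slide 30, run on the reconstructed two-state HMM of slide 15 (names and the start state are our assumptions).

```python
# Viterbi on the arc-emission HMM of slide 15 (reconstructed probabilities).
arc = {('q', 'q'): {'a': 0.4, 'b': 0.2},
       ('q', 'r'): {'a': 0.3, 'b': 0.1},
       ('r', 'q'): {'a': 0.2, 'b': 0.1},
       ('r', 'r'): {'a': 0.2, 'b': 0.5}}
states = ['q', 'r']

def viterbi(observation, start='q'):
    # delta[s] = best joint probability of any state/emission path ending in s;
    # back[t][s] remembers which predecessor state achieved that maximum.
    delta = {s: 1.0 if s == start else 0.0 for s in states}
    back = []
    for w in observation:
        step = {j: max((delta[i] * arc[i, j][w], i) for i in states) for j in states}
        back.append({j: step[j][1] for j in states})
        delta = {j: step[j][0] for j in states}
    best = max(states, key=lambda s: delta[s])   # best final state
    path = [best]
    for ptr in reversed(back):                   # follow back-pointers to t = 1
        path.append(ptr[path[-1]])
    return delta[best], list(reversed(path))

print(viterbi('bbba'))   # (best joint probability, most probable state sequence)
```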


33 Problem 3
How to adjust λ to best maximize P(O | λ): re-estimate λ.
Solution:
– Re-estimate (iteratively update and improve) the HMM parameters A, B, π using the Baum-Welch algorithm.

34 Baum-Welch Algorithm
Define ξ_t(i, j) = P(q_t = S_i, q_{t+1} = S_j | O, λ), the probability of being in state S_i at time t and in S_j at time t+1, given the observation and the model.
Putting in the forward and backward variables:
ξ_t(i, j) = α_t(i) · a_ij · b_j(o_{t+1}) · β_{t+1}(j) / P(O | λ)

35 Baum-Welch algorithm

36 Define γ_t(i) = Σ_j ξ_t(i, j).
Then Σ_{t=1}^{T-1} γ_t(i) = expected number of transitions from S_i,
and Σ_{t=1}^{T-1} ξ_t(i, j) = expected number of transitions from S_i to S_j.

37 Re-estimation formulas (ratios of the expected counts above):
π̂_i = γ_1(i)
â_ij = Σ_{t=1}^{T-1} ξ_t(i, j) / Σ_{t=1}^{T-1} γ_t(i)
b̂_j(k) = Σ_{t: o_t = v_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
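A compact sketch of one such re-estimation pass is given below, written against the urn HMM of slide 4 and the observation RRGGBRGR of slide 3. It implements the standard ξ/γ counts and the ratios above in the state-emission formulation of slides 6 and 34-37 (not the arc-emission notation of slides 15-23); the start distribution (all mass on U1) and the use of numpy are assumptions for illustration only.

```python
import numpy as np

# One Baum-Welch re-estimation pass on the urn HMM of slide 4 for the
# observation RRGGBRGR of slide 3 (start in U1 is an assumption).
A  = np.array([[0.1, 0.4, 0.5], [0.6, 0.2, 0.2], [0.3, 0.4, 0.3]])
B  = np.array([[0.3, 0.5, 0.2], [0.1, 0.4, 0.5], [0.6, 0.1, 0.3]])
pi = np.array([1.0, 0.0, 0.0])
O  = np.array(["RGB".index(c) for c in "RRGGBRGR"])
N, T = len(pi), len(O)

# forward (alpha) and backward (beta) variables, state-emission formulation
alpha = np.zeros((T, N)); beta = np.ones((T, N))
alpha[0] = pi * B[:, O[0]]
for t in range(1, T):
    alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
for t in range(T - 2, -1, -1):
    beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
P_O = alpha[-1].sum()                                   # P(O | lambda)

# xi[t, i, j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda); gamma[t, i] = P(q_t = S_i | O, lambda)
xi = np.array([np.outer(alpha[t], B[:, O[t+1]] * beta[t+1]) * A / P_O
               for t in range(T - 1)])
gamma = alpha * beta / P_O

# re-estimated parameters: ratios of expected counts
A_new  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
B_new  = np.stack([gamma[O == k].sum(axis=0) for k in range(3)], axis=1)
B_new /= gamma.sum(axis=0)[:, None]
pi_new = gamma[0]
print(A_new.round(3), B_new.round(3), pi_new.round(3), sep="\n")
```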

38 Baum-Welch Algorithm
Baum et al. have proved that the above re-estimation equations lead to a model that is as good as, or better than, the previous one, i.e. P(O | λ̂) ≥ P(O | λ).

