
1 Hidden Markov Models (HMMs) – probabilistic models for learning patterns in sequences (e.g. DNA, speech, weather, cards...) (2nd-order model)

2 an observable Markov model
– directly get the sequence of states
– p(s_1, s_2, ..., s_n | λ) = p(s_1) · ∏_{i=2..n} p(s_i | s_{i-1})
– (why I don't like the Urn example in the book)
Hidden Markov model
– only observe the sequence of symbols generated by the states
– for each state, there is a probability distribution over a finite set of symbols (emission probabilities)
– example: think of a soda machine
observations: messages on the display ("insert 20 cents more"), output a can, give change
states: the coins inserted so far add up to N cents...
state transitions are determined by the coins input
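A minimal sketch of the observable case in Python – the state names, numbers, and variable names here are illustrative, not from the slides:

def markov_sequence_prob(states, pi, A):
    # p(s_1, ..., s_n | lambda) = p(s_1) * prod_{i=2..n} p(s_i | s_{i-1})
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev][cur]
    return p

# e.g. two states (0 = rainy, 1 = sunny), made-up numbers:
pi = [0.6, 0.4]                      # initial state probabilities
A = [[0.7, 0.3],
     [0.4, 0.6]]                     # A[i][j] = p(next state j | current state i)
print(markov_sequence_prob([0, 0, 1], pi, A))   # 0.6 * 0.7 * 0.3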

3 tasks
1) given a sequence, compute the probability it came from one of a set of models (e.g. most likely phoneme) – classification: pick the model with the highest p(O | λ)
2) infer the most likely sequence of states underlying a sequence of symbols: find Q* such that Q* = argmax_Q p(Q | O, λ)
3) train the HMM by learning the parameters (transition and emission probabilities) from a set of examples: given sequences X = {X^k}, find λ* such that λ* = argmax_λ ∏_k p(X^k | λ)
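A sketch of task 1 as code – it assumes a likelihood routine such as the forward_backward function given on a later slide, and the phoneme models named here are hypothetical placeholders:

def classify(O, S, models):
    # models: dict mapping a label to its parameters (pi, A, B);
    # return the label whose HMM assigns the observations the highest p(O | lambda)
    return max(models, key=lambda name: forward_backward(O, S, *models[name]))

# e.g. best = classify(observations, states, {"/a/": (pi_a, A_a, B_a), "/o/": (pi_o, A_o, B_o)})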

4 given an observation sequence O = o_1 ... o_T
– if we also knew the state sequence Q = q_1 .. q_T, then we could easily calculate p(O | Q, λ)
– joint probability: p(O, Q | λ) = p(q_1) · ∏_{i=2..T} p(q_i | q_{i-1}) · ∏_{i=1..T} p(o_i | q_i)
– could calculate by marginalization: p(O | λ) = Σ_Q p(O, Q | λ) – intractable, have to sum over all possible sequences Q
– the forward-backward algorithm is a recursive procedure that solves this efficiently (via dynamic programming)
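A quick sketch of the joint probability for a known state sequence, reusing the illustrative pi and A from above plus a made-up emission matrix B (all names and numbers are hypothetical):

def joint_prob(O, Q, pi, A, B):
    # p(O, Q | lambda) = p(q_1) * prod_{i=2..T} p(q_i | q_{i-1}) * prod_{i=1..T} p(o_i | q_i)
    p = pi[Q[0]] * B[Q[0]][O[0]]
    for t in range(1, len(O)):
        p *= A[Q[t-1]][Q[t]] * B[Q[t]][O[t]]
    return p

B = [[0.9, 0.1],
     [0.2, 0.8]]                     # B[i][o] = p(symbol o | state i)
print(joint_prob([0, 1, 1], [0, 0, 1], pi, A, B))

# marginalizing directly, p(O | lambda) = sum over all Q of joint_prob(...),
# would require N**T terms – hence the forward-backward recursion below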

5 Forward variable (Lecture Notes for E Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)):
α_t(i) ≡ p(o_1 .. o_t, q_t = S_i | λ) – the probability of observing the prefix o_1 .. o_t and ending in state S_i
Initialization: α_1(i) = π_i b_i(o_1)
Recursion: α_{t+1}(j) = [Σ_i α_t(i) a_ij] b_j(o_{t+1})

6 Backward variable:
β_t(i) ≡ p(o_{t+1} .. o_T | q_t = S_i, λ) – the probability of observing the suffix o_{t+1} .. o_T given that we are in state S_i at time t
Initialization: β_T(i) = 1
Recursion: β_t(i) = Σ_j a_ij b_j(o_{t+1}) β_{t+1}(j)

7 Forward-backward algorithm, O(N²T)
forward pass: for each time step t = 1..T, calculate α_t(i) by summing over all predecessor states j
reverse pass: for each time step t = T..1, calculate β_t(i) by summing over all successor states j

8 the forward-backward algorithm in Python:

def forward_backward(O, S, pi, A, B):
    # returns p(O | pi, A, B); O is the observation sequence (symbols encoded as indices),
    # S the states, pi the initial distribution, A[k][j] = p(state j | state k),
    # B[j][o] = p(symbol o | state j)
    T, N = len(O), len(S)
    # forward pass: alpha[t][j] = p(o_1..o_t, q_t = S_j | lambda)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):
        alpha[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][k] * A[k][j] for k in range(N)) * B[j][O[t]]
    # beta is not needed for the output, but is often computed for other purposes
    beta = [[0.0] * N for _ in range(T)]
    for i in range(N):
        beta[T-1][i] = 1.0
    for t in range(T-2, -1, -1):
        for j in range(N):
            beta[t][j] = sum(A[j][k] * B[k][O[t+1]] * beta[t+1][k] for k in range(N))
    return sum(alpha[T-1])
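Toy usage with a hypothetical 2-state, 2-symbol model (all numbers made up):

S = [0, 1]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]          # A[k][j] = p(j | k)
B = [[0.9, 0.1], [0.2, 0.8]]          # B[j][o] = p(o | j)
print(forward_backward([0, 1, 1], S, pi, A, B))   # p(O | lambda)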

9 Finding the State Sequence
γ_t(i) ≡ p(q_t = S_i | O, λ) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j)
Choose the state that has the highest probability, for each time step: q_t* = argmax_i γ_t(i)
No! – the individually most likely states need not form a feasible sequence (it may contain a zero-probability transition); to get the single best state sequence, use Viterbi's algorithm (next slide)
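A sketch of this per-step (posterior) decoding, assuming alpha and beta are the tables computed inside the forward-backward code above (a variant that returns them is assumed):

def posterior_decode(alpha, beta):
    # pick q_t* = argmax_i gamma_t(i); the normalizing denominator does not change the argmax
    T, N = len(alpha), len(alpha[0])
    path = []
    for t in range(T):
        gamma = [alpha[t][i] * beta[t][i] for i in range(N)]
        path.append(max(range(N), key=lambda i: gamma[i]))
    return path   # note: this path need not be feasible as a whole – see the Viterbi slide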

10 Viterbi's Algorithm
δ_t(i) ≡ max_{q_1 q_2 ∙∙∙ q_{t-1}} p(q_1 q_2 ∙∙∙ q_{t-1}, q_t = S_i, O_1 ∙∙∙ O_t | λ)
Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0
Recursion: δ_t(j) = max_i δ_{t-1}(i) a_ij · b_j(O_t), ψ_t(j) = argmax_i δ_{t-1}(i) a_ij
– note: I think the book has the wrong formula for ψ_t(j)
Termination: p* = max_i δ_T(i), q_T* = argmax_i δ_T(i)
Path backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, ..., 1

11 Viterbi's algorithm in Python:

def viterbi(O, S, pi, A, B):
    # returns the most likely state sequence q_1*..q_T* for observations O
    T, N = len(O), len(S)
    delta = [[0.0] * N for _ in range(T)]   # delta[t][j]: best score of a path ending in state j at time t
    psi = [[0] * N for _ in range(T)]       # psi[t][j]: best predecessor of state j at time t
    for i in range(N):
        delta[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):
        for j in range(N):
            scores = [delta[t-1][k] * A[k][j] for k in range(N)]
            best = max(range(N), key=lambda k: scores[k])
            psi[t][j] = best
            delta[t][j] = scores[best] * B[j][O[t]]
    # traceback, extract the sequence of states
    p_star = max(delta[T-1])                # probability of the best path (not returned here)
    q = [0] * T
    q[T-1] = max(range(N), key=lambda i: delta[T-1][i])
    for t in range(T-2, -1, -1):
        q[t] = psi[t+1][q[t+1]]
    return q
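Continuing the toy model from the forward-backward slide (same made-up S, pi, A, B):

# q_star = viterbi([0, 1, 1], S, pi, A, B)   # single most likely state sequence
# unlike per-step posterior decoding, this maximizes the probability of the whole path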

12 Learning
learn the model parameters – transition probabilities a_ij and emission probabilities b_j – with the highest likelihood for a given set of training examples
define ξ_t(i,j) as the probability of being in S_i at time t and in S_j at time t+1, given the sequence of observations O:
ξ_t(i,j) ≡ p(q_t = S_i, q_{t+1} = S_j | O, λ) = α_t(i) a_ij b_j(O_{t+1}) β_{t+1}(j) / p(O | λ)
define latent variables z_t^j and z_t^{ij} as indicators of which state (and which state transition) a sequence passes through at each time step
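A sketch of computing γ and ξ (the E-step quantities), assuming alpha and beta are the forward/backward tables from the earlier code:

def e_step(O, alpha, beta, A, B):
    # gamma[t][i] = p(q_t = S_i | O, lambda); xi[t][i][j] = p(q_t = S_i, q_{t+1} = S_j | O, lambda)
    T, N = len(O), len(A)
    p_O = sum(alpha[T-1])                    # p(O | lambda)
    gamma = [[alpha[t][i] * beta[t][i] / p_O for i in range(N)] for t in range(T)]
    xi = [[[alpha[t][i] * A[i][j] * B[j][O[t+1]] * beta[t+1][j] / p_O
            for j in range(N)] for i in range(N)] for t in range(T-1)]
    return gamma, xi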

13 Baum-Welch (EM)
recall, γ_t(i) = α_t(i) β_t(i) / p(O | λ) – the probability of being in state i at time t
the expected number of transitions from S_i to S_j is Σ_t ξ_t(i,j), and the expected number of visits to S_i is Σ_t γ_t(i); the E-step computes these expectations with the current parameters, and the M-step re-estimates π, a_ij and b_j from them
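A minimal sketch of one Baum-Welch iteration for a single training sequence, using the e_step sketch above (in practice the counts are summed over all training sequences; observation symbols are assumed to be encoded as integers 0..M-1):

def baum_welch_step(O, pi, A, B, alpha, beta):
    # one EM iteration: E-step gives gamma/xi, M-step re-estimates pi, A, B
    gamma, xi = e_step(O, alpha, beta, A, B)
    T, N, M = len(O), len(A), len(B[0])
    new_pi = gamma[0][:]                                   # pi_i <- gamma_1(i)
    new_A = [[sum(xi[t][i][j] for t in range(T-1)) /
              sum(gamma[t][i] for t in range(T-1))
              for j in range(N)] for i in range(N)]        # expected transitions i->j / expected visits to i
    new_B = [[sum(gamma[t][j] for t in range(T) if O[t] == m) /
              sum(gamma[t][j] for t in range(T))
              for m in range(M)] for j in range(N)]        # expected emissions of symbol m from state j
    return new_pi, new_A, new_B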

