1
Hidden Markov Models (HMMs)
–probabilistic models for learning patterns in sequences (e.g. DNA, speech, weather, cards...)
–(2nd-order model)
2
an observable Markov model
–directly get the sequence of states
–p(s_1, s_2, ..., s_n | λ) = p(s_1) · ∏_{i=2..n} p(s_i | s_{i-1})
–(why I don’t like the Urn example in the book)
Hidden Markov model
–only observe the sequence of symbols generated by the states
–for each state, there is a probability distribution over a finite set of symbols (emission probabilities)
–example: think of a soda machine (a toy parameter sketch follows below)
observations: messages on the display (“insert 20 cents more”), output a can, give change
states: the coins inserted so far add up to N cents...
state transitions are determined by the coins put in
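The following is a minimal Python sketch (not from the slides) of how an HMM is written down as parameters: a hypothetical two-state "weather" model emitting three symbols, chosen only for illustration. The names states, symbols, pi, A, B and all the numbers are my own assumptions.

import numpy as np

# Toy HMM specification (illustrative numbers, not from the lecture):
# two hidden states, three observable symbols.
states  = ["rainy", "sunny"]          # hidden states
symbols = ["walk", "shop", "clean"]   # observable symbols

pi = np.array([0.6, 0.4])             # initial state probabilities p(q_1 = S_i)
A  = np.array([[0.7, 0.3],            # A[i, j] = p(q_{t+1} = S_j | q_t = S_i)
               [0.4, 0.6]])
B  = np.array([[0.1, 0.4, 0.5],       # B[i, k] = p(o_t = symbols[k] | q_t = S_i)
               [0.6, 0.3, 0.1]])

# In an observable Markov model we would see the state sequence directly;
# in the hidden model we only see symbols drawn from B at each step.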
3
tasks
1) given a sequence, compute the probability that it came from one of a set of models (e.g. most likely phoneme) – classification: evaluate p(O | λ)
2) infer the most likely sequence of states underlying a sequence of symbols – find Q* such that Q* = argmax_Q p(Q | O, λ)
3) train the HMM by learning the parameters (transition and emission probabilities) from a set of examples – given sequences X, find λ* such that λ* = argmax_λ p(X | λ)
4
given an observation sequence O = o_1...o_T
–if we also knew the state sequence Q = q_1...q_T, then we could easily calculate p(O | Q, λ)
–joint probability: p(O, Q | λ) = p(q_1) · ∏_{i=2..T} p(q_i | q_{i-1}) · ∏_{i=1..T} p(o_i | q_i)
–could calculate p(O | λ) by marginalization: p(O | λ) = Σ_Q p(O, Q | λ)
intractable: we would have to sum over all N^T possible state sequences Q (see the brute-force sketch below)
–the forward-backward algorithm is a recursive procedure that solves this efficiently (via dynamic programming)
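To make the intractability concrete, here is a brute-force sketch of my own (assuming the toy pi, A, B arrays from the earlier sketch, and an observation sequence obs given as symbol indices): it literally sums p(O, Q | λ) over every state sequence Q, which is exponential in T.

import itertools
import numpy as np

def likelihood_brute_force(obs, pi, A, B):
    """Naive p(O | lambda): sum the joint p(O, Q | lambda) over all N^T state sequences."""
    N = len(pi)
    total = 0.0
    for Q in itertools.product(range(N), repeat=len(obs)):   # every possible state sequence
        p = pi[Q[0]] * B[Q[0], obs[0]]                        # p(q_1) * p(o_1 | q_1)
        for t in range(1, len(obs)):
            p *= A[Q[t-1], Q[t]] * B[Q[t], obs[t]]            # p(q_t | q_{t-1}) * p(o_t | q_t)
        total += p
    return total

# e.g. likelihood_brute_force([0, 2, 1], pi, A, B) sums over 2^3 = 8 sequences here,
# but 10 states and T = 100 would already mean 10^100 terms.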
5
Forward variable:
α_t(i) ≡ p(o_1...o_t, q_t = S_i | λ)
–α_t(i) is the probability of observing the prefix o_1..o_t and ending in state S_i at time t
–computed recursively: α_1(i) = π_i b_i(o_1), α_{t+1}(j) = [Σ_i α_t(i) a_ij] · b_j(o_{t+1})
(notation follows Alpaydın, Introduction to Machine Learning 2e)
6
Backward variable:
β_t(i) ≡ p(o_{t+1}...o_T | q_t = S_i, λ)
–β_t(i) is the probability of observing the suffix o_{t+1}..o_T, given that the model is in state S_i at time t
–computed recursively (backwards): β_T(i) = 1, β_t(i) = Σ_j a_ij b_j(o_{t+1}) β_{t+1}(j)
7
Forward-backward algorithm, O(N²T)
–forward pass: for each time step t = 1..T, calculate α_t(i) by summing over all predecessor states j
–backward pass: for each time step t = T..1, calculate β_t(i) by summing over all successor states j
8
function ForwardBackward(O, S, π, A, B): returns p(O | π, A, B)
    for each state s_i do
        α_1(i) ← π_i · B_i(O_1)
    end for
    for t ← 2, 3, ..., T do
        for each state s_j do
            α_t(j) ← Σ_k (α_{t-1}(k) · A_kj · B_j(O_t))
        end for
    end for
    // β is not needed to return p(O | λ), but is often computed for other purposes (e.g. learning)
    for each state s_i do
        β_T(i) ← 1
    end for
    for t ← T-1, ..., 1 do
        for each state s_j do
            β_t(j) ← Σ_k (A_jk · B_k(O_{t+1}) · β_{t+1}(k))
        end for
    end for
    return Σ_i α_T(i)
end function
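A runnable numpy version of the pseudocode above, as a sketch only (the function name forward_backward and the alpha[t, i], beta[t, i] array layout are my choices; pi, A, B, obs as in the earlier toy sketch):

import numpy as np

def forward_backward(obs, pi, A, B):
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    beta  = np.zeros((T, N))

    alpha[0] = pi * B[:, obs[0]]                      # alpha_1(i) = pi_i * B_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]    # sum over predecessor states k

    beta[T-1] = 1.0                                   # beta_T(i) = 1
    for t in range(T-2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])    # sum over successor states k

    return alpha, beta, alpha[T-1].sum()              # last value is p(O | lambda)

For long sequences the products underflow; practical implementations rescale alpha and beta at each step or work in log space, which is omitted here for clarity.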
9
Finding the State Sequence
γ_t(i) ≡ p(q_t = S_i | O, λ) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j)
Choose the state that has the highest probability, for each time step: q_t* = argmax_i γ_t(i)
No! – the individually most likely states may not form a feasible path (a chosen transition can have zero probability), so we find the single best path with the Viterbi algorithm instead
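A short snippet of my own, reusing the forward_backward sketch above and an index-coded obs, that computes γ_t(i) and the per-step argmax the slide warns against:

alpha, beta, likelihood = forward_backward(obs, pi, A, B)
gamma = alpha * beta / likelihood     # gamma[t, i] = p(q_t = S_i | O, lambda); each row sums to 1
q_star = gamma.argmax(axis=1)         # q_t* = argmax_i gamma_t(i) -- nothing forces consecutive
                                      # choices to be linked by a non-zero transition probability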
10
Viterbi’s Algorithm
δ_t(i) ≡ max_{q_1 q_2 ∙∙∙ q_{t-1}} p(q_1 q_2 ∙∙∙ q_{t-1}, q_t = S_i, O_1 ∙∙∙ O_t | λ)
Initialization: δ_1(i) = π_i b_i(O_1), ψ_1(i) = 0
Recursion: δ_t(j) = max_i δ_{t-1}(i) a_ij · b_j(O_t), ψ_t(j) = argmax_i δ_{t-1}(i) a_ij
–note: I think the book has the wrong formula for ψ_t(j)
Termination: p* = max_i δ_T(i), q_T* = argmax_i δ_T(i)
Path backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, ..., 1
11
function VITERBI(O, S, π, A, B): returns state sequence q_1*..q_T*
    for each state s_i do
        δ_1(i) ← π_i · B_i(O_1)
        ψ_1(i) ← 0
    end for
    for t ← 2, 3, ..., T do
        for each state s_j do
            δ_t(j) ← max_k (δ_{t-1}(k) · A_kj · B_j(O_t))
            ψ_t(j) ← argmax_k (δ_{t-1}(k) · A_kj · B_j(O_t))
        end for
    end for
    // traceback: extract the most likely sequence of states
    p* ← max_i δ_T(i)
    q_T* ← argmax_i δ_T(i)
    for t ← T-1, T-2, ..., 1 do
        q_t* ← ψ_{t+1}(q_{t+1}*)
    end for
    return q_1*..q_T*
end function
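A runnable numpy sketch of the Viterbi pseudocode above (the name viterbi and the delta[t, j], psi[t, j] array layout are my choices; pi, A, B, obs as before):

import numpy as np

def viterbi(obs, pi, A, B):
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))
    psi   = np.zeros((T, N), dtype=int)

    delta[0] = pi * B[:, obs[0]]                      # delta_1(i) = pi_i * B_i(O_1), psi_1(i) = 0
    for t in range(1, T):
        scores   = delta[t-1][:, None] * A            # scores[k, j] = delta_{t-1}(k) * A_kj
        psi[t]   = scores.argmax(axis=0)              # best predecessor k for each state j
        delta[t] = scores.max(axis=0) * B[:, obs[t]]

    q = np.zeros(T, dtype=int)                        # traceback
    q[T-1] = delta[T-1].argmax()                      # q_T* = argmax_i delta_T(i)
    for t in range(T-2, -1, -1):
        q[t] = psi[t+1, q[t+1]]                       # q_t* = psi_{t+1}(q_{t+1}*)
    return q, delta[T-1].max()                        # state sequence and p*

As with the forward pass, a practical version works with log probabilities (products become sums) to avoid underflow.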
12
Learning
–learn the model parameters (transition probabilities a_ij and emission probabilities b_j(m)) with the highest likelihood for a given set of training sequences
–define ξ_t(i,j) ≡ p(q_t = S_i, q_{t+1} = S_j | O, λ), the probability of being in S_i at time t and in S_j at time t+1, given the observation sequence O
–define latent indicator variables z_t^i and z_t^{ij}: z_t^i = 1 if the sequence is in state S_i at time t, and z_t^{ij} = 1 if it moves from S_i at time t to S_j at time t+1
13
Baum-Welch (EM)
–recall γ_t(i) = α_t(i) β_t(i) / Σ_j α_t(j) β_t(j), the probability of being in state S_i at time t given O
–E-step: the expected values of the indicators are E[z_t^i] = γ_t(i) and E[z_t^{ij}] = ξ_t(i,j), where
ξ_t(i,j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_k Σ_l α_t(k) a_kl b_l(o_{t+1}) β_{t+1}(l)
–M-step: re-estimate the parameters from these expected counts, e.g. the expected number of transitions from S_i to S_j divided by the expected number of visits to S_i:
â_ij = Σ_{t=1..T-1} ξ_t(i,j) / Σ_{t=1..T-1} γ_t(i)
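A sketch of one Baum-Welch iteration for a single observation sequence, reusing the forward_backward function above (the name baum_welch_step, the xi array layout, and the absence of rescaling are my simplifications):

import numpy as np

def baum_welch_step(obs, pi, A, B):
    N, T = len(pi), len(obs)
    alpha, beta, likelihood = forward_backward(obs, pi, A, B)

    # E-step: gamma_t(i) = E[z_t^i], xi_t(i,j) = E[z_t^{ij}]
    gamma = alpha * beta / likelihood
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = (alpha[t][:, None] * A * B[:, obs[t+1]] * beta[t+1]) / likelihood

    # M-step: re-estimate parameters from the expected counts
    new_pi = gamma[0]                                            # expected occupancy of S_i at t = 1
    new_A  = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]    # expected i->j transitions / visits to S_i
    new_B  = np.zeros_like(B)
    obs = np.asarray(obs)
    for k in range(B.shape[1]):                                  # expected emissions of symbol k from S_j
        new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B

Iterating this step until the likelihood stops improving is the EM training procedure; with several training sequences, the expected counts are summed over all sequences before normalizing.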