
1 HMM (I) LING 570 Fei Xia Week 7: 11/5-11/7/07

2 HMM
Definition and properties of HMM
– Two types of HMM
Three basic questions in HMM

3 Definition of HMM

4 Hidden Markov Models
There are N states s_1, …, s_N in an HMM, and the states are connected. The output symbols are produced by the states or the edges of the HMM. An observation O = (o_1, …, o_T) is a sequence of output symbols. Given an observation, we want to recover the hidden state sequence.
An example: POS tagging
– States are POS tags
– Output symbols are words
– Given an observation (i.e., a sentence), we want to discover the tag sequence.

5 Same observation, different state sequences
time/N flies/V like/P an/DT arrow/N
time/N flies/N like/V an/DT arrow/N

6 Two types of HMMs
State-emission HMM (Moore machine):
– The output symbol is produced by the states: by the from-state, or by the to-state.
Arc-emission HMM (Mealy machine):
– The output symbol is produced by the edges, i.e., by the (from-state, to-state) pairs.

7 PFA recap

8 Formal definition of PFA
A PFA is a tuple (Q, Σ, δ, I, F, P):
– Q: a finite set of N states
– Σ: a finite set of input symbols
– δ ⊆ Q × Σ × Q: the transition relation between states
– I: Q → R+ (initial-state probabilities)
– F: Q → R+ (final-state probabilities)
– P: δ → R+ (transition probabilities)

9 Constraints on the functions:
Σ_q I(q) = 1
For each state q: F(q) + Σ_{a, q'} P(q, a, q') = 1
Probability of a string x:
P(x) = Σ over all paths that read x of I(first state) * (product of the transition probabilities along the path) * F(last state)

10 An example of PFA
States: q0 (F(q0)=0) and q1 (F(q1)=0.2); I(q0)=1.0, I(q1)=0.0
Transitions: q0 –a:1.0→ q1, q1 –b:0.8→ q1
P(ab^n) = I(q0) * P(q0, ab^n, q1) * F(q1) = 1.0 * 1.0 * 0.8^n * 0.2
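To make the example concrete, here is a small Python sketch that walks a string through this particular PFA, multiplying the initial, transition, and final probabilities; the dictionary layout and the function name are illustrative assumptions, not part of the slides.

# The example PFA from this slide: I, F, and the two transitions.
init = {'q0': 1.0, 'q1': 0.0}
final = {'q0': 0.0, 'q1': 0.2}
trans = {('q0', 'a'): ('q1', 1.0), ('q1', 'b'): ('q1', 0.8)}

def pfa_string_prob(s):
    # Follow the transitions on each symbol of s, multiplying the
    # initial, transition, and final probabilities.  This works here
    # because the example PFA has at most one arc per (state, symbol).
    total = 0.0
    for start in init:
        p, q = init[start], start
        for ch in s:
            if (q, ch) not in trans:
                p = 0.0
                break
            q, tp = trans[(q, ch)]
            p *= tp
        total += p * final[q]
    return total

# pfa_string_prob('a' + 'b' * 3)  ==  1.0 * 1.0 * 0.8**3 * 0.2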

11 Arc-emission HMM

12 Definition of arc-emission HMM
An HMM is a tuple (S, Σ, π, A, B):
– A set of states S = {s_1, s_2, …, s_N}
– A set of output symbols Σ = {w_1, …, w_M}
– Initial state probabilities π = {π_i}
– Transition prob: A = {a_ij}
– Emission prob: B = {b_ijk}

13 Constraints in an arc-emission HMM
Σ_i π_i = 1
For each state s_i: Σ_j a_ij = 1
For each state pair (s_i, s_j): Σ_k b_ijk = 1
For any integer n and any HMM: Σ over all output sequences O of length n of P(O) = 1

14 An example: HMM structure
(Diagram: states s_1, s_2, …, s_N, with symbols such as w_1, …, w_5 emitted on the arcs between them.)
Same kinds of parameters, but the emission probabilities depend on both states: P(w_k | s_i, s_j)
=> # of parameters: O(N^2 M + N^2)

15 A path in an arc-emission HMM
(Diagram: X_1 → X_2 → … → X_n → X_{n+1}, with output o_t emitted on the arc from X_t to X_{t+1}.)
State sequence: X_1,n+1
Output sequence: O_1,n

16 PFA vs. Arc-emission HMM
A PFA is a tuple (Q, Σ, δ, I, F, P):
– Q: a finite set of N states
– Σ: a finite set of input symbols
– δ ⊆ Q × Σ × Q: the transition relation between states
– I: Q → R+ (initial-state probabilities)
– F: Q → R+ (final-state probabilities)
– P: δ → R+ (transition probabilities)
An HMM is a tuple (S, Σ, π, A, B):
– A set of states S = {s_1, s_2, …, s_N}
– A set of output symbols Σ = {w_1, …, w_M}
– Initial state probabilities π = {π_i}
– Transition prob: A = {a_ij}
– Emission prob: B = {b_ijk}

17 State-emission HMM

18 Definition of state-emission HMM
An HMM is a tuple (S, Σ, π, A, B):
– A set of states S = {s_1, s_2, …, s_N}
– A set of output symbols Σ = {w_1, …, w_M}
– Initial state probabilities π = {π_i}
– Transition prob: A = {a_ij}
– Emission prob: B = {b_jk}
We use s_i and w_k to refer to what is in an HMM structure. We use X_i and O_i to refer to what is in a particular HMM path and its output.
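As a rough sketch of how this tuple might be held in a program (relevant to Hw7): the class and attribute names below are illustrative assumptions, not the assignment spec, and the probabilities are stored as nested dicts keyed by state and symbol strings.

class HMM:
    # State-emission HMM: (S, Sigma, pi, A, B)
    def __init__(self, states, symbols, pi, trans, emit):
        self.states = states      # S = {s_1, ..., s_N}
        self.symbols = symbols    # Sigma = {w_1, ..., w_M}
        self.pi = pi              # pi[s_i]         = initial prob of s_i
        self.trans = trans        # trans[s_i][s_j] = a_ij
        self.emit = emit          # emit[s_j][w_k]  = b_jk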

19 Constraints in a state-emission HMM
Σ_i π_i = 1
For each state s_i: Σ_j a_ij = 1
For each state s_j: Σ_k b_jk = 1
For any integer n and any HMM: Σ over all output sequences O of length n of P(O) = 1

20 An example: the HMM structure
(Diagram: states s_1, s_2, …, s_N, each emitting symbols such as w_1, …, w_5.)
Two kinds of parameters:
– Transition probability: P(s_j | s_i)
– Emission probability: P(w_k | s_i)
=> # of parameters: O(NM + N^2)

21 Output symbols are generated by the from-states
(Diagram: X_1 → X_2 → … → X_n, where X_t emits o_t.)
State sequence: X_1,n
Output sequence: O_1,n

22 Output symbols are generated by the to-states
(Diagram: X_1 → X_2 → … → X_{n+1}, where X_{t+1} emits o_t.)
State sequence: X_1,n+1
Output sequence: O_1,n

23 A path in a state-emission HMM
Output symbols produced by the from-states: X_1 → X_2 → … → X_n, with X_t emitting o_t.
Output symbols produced by the to-states: X_1 → X_2 → … → X_{n+1}, with X_{t+1} emitting o_t.

24 Arc-emission vs. state-emission
(Diagrams comparing the two paths: in the arc-emission HMM, o_t is emitted on the arc from X_t to X_{t+1}; in the state-emission HMM, o_t is emitted by a state.)

25 Properties of HMM
Markov assumption (limited horizon): P(X_{t+1} = s_j | X_1, …, X_t) = P(X_{t+1} = s_j | X_t)
Stationary distribution (time invariance): the probabilities do not change over time: P(X_{t+1} = s_j | X_t = s_i) is the same for every t.
The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don't know which state sequences generate a particular output.

26 Are the two types of HMMs equivalent?
For each state-emission HMM_1, there is an arc-emission HMM_2 such that for any sequence O, P(O | HMM_1) = P(O | HMM_2). The reverse is also true.
How to prove that?

27 Applications of HMM
N-gram POS tagging
– Bigram tagger: o_i is a word, and s_i is a POS tag.
Other tagging problems:
– Word segmentation
– Chunking
– NE tagging
– Punctuation prediction
– …
Other applications: ASR, …

28 Three HMM questions

29 Three fundamental questions for HMMs
– Training an HMM: given a set of observation sequences, learn its distribution, i.e., learn the transition and emission probabilities.
– HMM as a parser: find the best state sequence for a given observation.
– HMM as an LM: compute the probability of a given observation.

30 Training an HMM: estimating the probabilities
Supervised learning:
– The state sequences in the training data are known
– ML estimation (sketched below)
Unsupervised learning:
– The state sequences in the training data are unknown
– Forward-backward algorithm
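For the supervised case, ML estimation amounts to relative-frequency counting over the tagged training data. A minimal sketch, assuming the training data is a list of (word, tag) sequences; the function and variable names are illustrative and smoothing is omitted:

from collections import defaultdict

def mle_estimate(tagged_sents):
    # tagged_sents: a list of sentences, each a list of (word, tag) pairs
    # returns pi, A, B as nested dicts of relative frequencies
    init_c = defaultdict(float)
    trans_c = defaultdict(lambda: defaultdict(float))
    emit_c = defaultdict(lambda: defaultdict(float))
    for sent in tagged_sents:
        tags = [t for _, t in sent]
        init_c[tags[0]] += 1                    # count of the first tag
        for w, t in sent:
            emit_c[t][w] += 1                   # count(tag, word)
        for t1, t2 in zip(tags, tags[1:]):
            trans_c[t1][t2] += 1                # count(tag, next tag)
    def norm(counts):
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}
    pi = norm(init_c)
    A = {t: norm(row) for t, row in trans_c.items()}
    B = {t: norm(row) for t, row in emit_c.items()}
    return pi, A, B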

31 HMM as a parser

32 HMM as a parser: finding the best state sequence
Given the observation O_1,T = o_1 … o_T, find the state sequence X_1,T+1 = X_1 … X_{T+1} that maximizes P(X_1,T+1 | O_1,T).
=> Viterbi algorithm
(Diagram: X_1 → X_2 → … → X_T → X_{T+1}, with outputs o_1, o_2, …, o_T.)

33 “time flies like an arrow”
\init
BOS 1.0
\transition
BOS N 0.5
BOS DT 0.4
BOS V 0.1
DT N 1.0
N N 0.2
N V 0.7
N P 0.1
V DT 0.4
V N 0.4
V P 0.1
V V 0.1
P DT 0.6
P N 0.4
\emission
N time 0.1
V time 0.1
N flies 0.1
V flies 0.2
V like 0.2
P like 0.1
DT an 0.3
N arrow 0.1
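One way the input above might be read into memory, as a sketch for Hw7 rather than the official spec; the dict layout (pi[state], A[from][to], B[state][symbol]) and the function name are assumptions:

def read_hmm(filename):
    # Parse a file with \init, \transition, and \emission sections into
    #   pi[state] = prob, A[from_state][to_state] = prob, B[state][symbol] = prob
    pi, A, B = {}, {}, {}
    section = None
    with open(filename) as f:
        for line in f:
            fields = line.split()
            if not fields:
                continue
            if fields[0].startswith('\\'):
                section = fields[0]              # \init, \transition, or \emission
            elif section == '\\init':
                state, prob = fields
                pi[state] = float(prob)
            elif section == '\\transition':
                s1, s2, prob = fields
                A.setdefault(s1, {})[s2] = float(prob)
            elif section == '\\emission':
                state, symbol, prob = fields
                B.setdefault(state, {})[symbol] = float(prob)
    return pi, A, B

# e.g. pi, A, B = read_hmm('time_flies.hmm')    # hypothetical file name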

34 Finding all the paths: building the trellis
(Trellis diagram over “time flies like an arrow”: starting from BOS, each word has a column of candidate states N, V, P, DT.)

35 Finding all the paths (cont.)
(Trellis continued from the previous slide.)

36 Viterbi algorithm
δ_j(t): the probability of the best path that produces O_1,t-1 while ending up in state s_j
Initialization: δ_j(1) = π_j
Induction: δ_j(t+1) = max_i [ δ_i(t) * a_ij * b_jk ], where w_k = o_t
=> Modify it to allow ε-emission

37 Proof of the recursive function

38 Viterbi algorithm: calculating δ_j(t)
# N is the number of states in the HMM structure
# observ is the observation O, and leng is the length of observ
initialize viterbi[0..leng][0..N-1] to 0
for each state j:
    viterbi[0][j] = π[j]
    back-pointer[0][j] = -1          # dummy
for (t = 0; t < leng; t++):
    for (j = 0; j < N; j++):
        k = observ[t]                # the symbol at time t
        viterbi[t+1][j] = max_i viterbi[t][i] * a_ij * b_jk
        back-pointer[t+1][j] = argmax_i viterbi[t][i] * a_ij * b_jk

39 Viterbi algorithm: retrieving the best path
# find the best path
best_final_state = argmax_j viterbi[leng][j]
# start with the last state in the sequence
j = best_final_state
push(arr, j)
for (t = leng; t > 0; t--):
    i = back-pointer[t][j]
    push(arr, i)
    j = i
return reverse(arr)
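Putting the two slides together, a runnable Python sketch of the same algorithm. It assumes the HMM is stored as dicts pi[state], A[from_state][to_state], and B[state][symbol] (the to-state emits the symbol, as in the pseudocode), and that at least one state sequence has non-zero probability; the names are illustrative, not part of the homework spec.

def viterbi(pi, A, B, observ):
    # pi[state], A[from_state][to_state], B[state][symbol];
    # the to-state emits the symbol, matching slides 38-39.
    states = sorted(set(pi) | set(A) | set(B) | {j for row in A.values() for j in row})
    # delta[t][j]: prob of the best path producing observ[0..t-1] and ending in j
    delta = [{j: pi.get(j, 0.0) for j in states}]
    backptr = [{j: None for j in states}]
    for t, sym in enumerate(observ):
        delta.append({})
        backptr.append({})
        for j in states:
            best_i, best_p = None, 0.0
            for i in states:
                p = delta[t][i] * A.get(i, {}).get(j, 0.0) * B.get(j, {}).get(sym, 0.0)
                if p > best_p:
                    best_i, best_p = i, p
            delta[t + 1][j] = best_p
            backptr[t + 1][j] = best_i
    # retrieve the best path by following the back-pointers
    T = len(observ)
    last = max(states, key=lambda s: delta[T][s])
    path = [last]
    for t in range(T, 0, -1):
        path.append(backptr[t][path[-1]])    # assumes a non-zero-prob path exists
    path.reverse()
    return path, delta[T][last]

# e.g. pi, A, B = read_hmm('time_flies.hmm')   # sketch from slide 33
#      tags, prob = viterbi(pi, A, B, 'time flies like an arrow'.split())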

40 Hw7 and Hw8
Hw7: write an HMM “class”:
– Read HMM input file
– Output HMM
Hw8: implement the algorithms for two HMM tasks:
– HMM as parser: Viterbi algorithm
– HMM as LM: the probability of an observation

41 Implementation issue: storing HMM
Approach #1: use the strings as keys
– π_i: pi{state_str}
– a_ij: a{from_state_str}{to_state_str}
– b_jk: b{state_str}{symbol}
Approach #2: map strings to integer indices
– state2idx{state_str} = state_idx
– symbol2idx{symbol_str} = symbol_idx
– π_i: pi[state_idx] = prob
– a_ij: a[from_state_idx][to_state_idx] = prob
– b_jk: b[state_idx][symbol_idx] = prob
– idx2state[state_idx] = state_str
– idx2symbol[symbol_idx] = symbol_str
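A small Python sketch of Approach #2 (the variable names mirror the slide; everything else, such as assigning the indices lazily, is an assumption):

state2idx, idx2state = {}, []
symbol2idx, idx2symbol = {}, []

def state_index(s):
    # assign the next free index the first time a state string is seen
    if s not in state2idx:
        state2idx[s] = len(idx2state)
        idx2state.append(s)
    return state2idx[s]

def symbol_index(w):
    # same lazy numbering for output symbols
    if w not in symbol2idx:
        symbol2idx[w] = len(idx2symbol)
        idx2symbol.append(w)
    return symbol2idx[w]

# once every state/symbol has been registered:
# N, M = len(idx2state), len(idx2symbol)
# pi = [0.0] * N
# a  = [[0.0] * N for _ in range(N)]    # a[i][j] = prob
# b  = [[0.0] * M for _ in range(N)]    # b[j][k] = prob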

42 Storing HMM: sparse matrix
Dense 2-D arrays:
– a_ij: a[i][j] = prob
– b_jk: b[j][k] = prob
Sparse storage (keep only the non-zero entries, by row or by column):
– a_ij: a[i] = “j1 p1 j2 p2 …” or a[j] = “i1 p1 i2 p2 …”
– b_jk: b[j] = “k1 p1 k2 p2 …” or b[k] = “j1 p1 j2 p2 …”
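In a language with hash tables, the same sparse idea can be kept as a dict of dicts instead of a packed string; a brief sketch (an implementation assumption, not something from the slide):

# keep only the non-zero entries, row by row
a = {}                               # a[i] is {j: prob} for the arcs leaving state i
b = {}                               # b[j] is {k: prob} for the symbols emitted by state j
a.setdefault(0, {})[1] = 0.5
b.setdefault(1, {})[3] = 0.1

# looping over just the non-zero transitions out of state i:
# for j, prob in a.get(i, {}).items():
#     ...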

43 Other implementation issues
– Index starts from 0 in programming, but often starts from 1 in algorithms.
– The sum of log probs is used in practice to replace the product of probs.
– Check the constraints and print a warning if the constraints are not met.
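A tiny illustration of the second point, replacing a product of probabilities with a sum of log probabilities (base 10 is an arbitrary choice here; any base works as long as it is used consistently):

import math

probs = [0.5, 0.1, 0.3]
product = 1.0
logsum = 0.0
for p in probs:
    product *= p                     # can underflow for long sequences
    logsum += math.log10(p)          # stays in a safe numeric range

# the two agree: product == 10 ** logsum (up to floating-point error)
assert abs(product - 10 ** logsum) < 1e-12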

44 HMM as LM

45 HMM as an LM: computing P(o_1, …, o_T)
1st try:
– enumerate all possible state sequences X
– add up the probabilities of all the paths: P(O) = Σ_X P(X, O)
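What the 1st try means in code, as a sketch under the same dict layout assumed earlier (pi, A, B keyed by strings, with the to-state emitting the symbol). It is exponential in the sentence length, which is exactly why the forward algorithm on the next slides is needed:

from itertools import product

def brute_force_prob(pi, A, B, observ):
    # P(O) = sum over all state sequences X_1 .. X_{T+1} of
    #        pi(X_1) * prod_t a(X_t, X_{t+1}) * b(X_{t+1}, o_t)
    states = sorted(set(pi) | set(A) | set(B) | {j for row in A.values() for j in row})
    total = 0.0
    for path in product(states, repeat=len(observ) + 1):
        p = pi.get(path[0], 0.0)
        for t, sym in enumerate(observ):
            p *= A.get(path[t], {}).get(path[t + 1], 0.0)
            p *= B.get(path[t + 1], {}).get(sym, 0.0)
        total += p
    return total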

46 Forward probabilities
Forward probability: the probability of producing O_1,t-1 while ending up in state s_i:
α_i(t) = P(O_1,t-1, X_t = s_i)

47 Calculating forward probability
Initialization: α_i(1) = π_i
Induction: α_j(t+1) = Σ_i α_i(t) * a_ij * b_jk, where w_k = o_t
Then P(O_1,T) = Σ_i α_i(T+1)
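A runnable sketch of the forward computation, mirroring the Viterbi sketch above with max replaced by sum (same assumed dict layout; names are illustrative):

def forward_prob(pi, A, B, observ):
    # alpha[j] = P(o_1 .. o_{t-1}, X_t = j), updated in place as t grows
    states = sorted(set(pi) | set(A) | set(B) | {j for row in A.values() for j in row})
    alpha = {j: pi.get(j, 0.0) for j in states}          # alpha_j(1) = pi_j
    for sym in observ:
        alpha = {j: sum(alpha[i] * A.get(i, {}).get(j, 0.0) * B.get(j, {}).get(sym, 0.0)
                        for i in states)
                 for j in states}                        # alpha_j(t+1)
    return sum(alpha.values())                           # P(O) = sum_i alpha_i(T+1)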

49 Summary
Definition: hidden states, output symbols
Properties: Markov assumption
Applications: POS tagging, etc.
Three basic questions in HMM
– Find the probability of an observation: forward probability
– Find the best state sequence: Viterbi algorithm
– Estimate the probabilities: MLE
Bigram POS tagger: decoding with the Viterbi algorithm

