Forward-backward algorithm LING 572 Fei Xia 02/23/06
Outline
–Forward and backward probability
–Expected counts and update formulae
–Relation to EM
HMM
An HMM is a tuple (S, Σ, π, A, B):
–A set of states S = {s_1, s_2, …, s_N}.
–A set of output symbols Σ = {w_1, …, w_M}.
–Initial state probabilities π = {π_i}.
–State transition probabilities A = {a_ij}.
–Symbol emission probabilities B = {b_ijk}.
State sequence: X_1 … X_{T+1}
Output sequence: o_1 … o_T
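As a concrete point of reference, here is a minimal sketch of how such an arc-emission HMM might be represented with NumPy arrays. The names (pi, A, B, N, M, O), the toy sizes, and the shape conventions are assumptions for illustration, not part of the original slides.

```python
import numpy as np

# Assumed array layout for an arc-emission HMM (not from the slides):
#   pi[i]      = initial probability of state i                  (shape N)
#   A[i, j]    = a_ij, transition probability i -> j             (shape N x N)
#   B[i, j, k] = b_ijk, prob of emitting symbol k on arc i -> j  (shape N x N x M)
N, M = 2, 3                                  # 2 states, 3 output symbols (toy sizes)
rng = np.random.default_rng(0)

pi = rng.dirichlet(np.ones(N))               # sums to 1
A = rng.dirichlet(np.ones(N), size=N)        # each row sums to 1
B = rng.dirichlet(np.ones(M), size=(N, N))   # each (i, j) slice sums to 1 over k

# An observation sequence o_1 ... o_T, encoded as symbol indices 0..M-1.
O = np.array([0, 2, 1, 0])
```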
Constraints
–Σ_i π_i = 1
–For each state s_i: Σ_j a_ij = 1
–For each arc (s_i, s_j): Σ_k b_ijk = 1
Decoding
Given the observation O_{1,T} = o_1 … o_T, find the state sequence X_{1,T+1} = X_1 … X_{T+1} that maximizes P(X_{1,T+1} | O_{1,T}).
Solution: the Viterbi algorithm.
[Figure: trellis of states X_1, X_2, …, X_{T+1} emitting o_1, o_2, …, o_T]
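The slides only name the Viterbi algorithm; the following is a rough sketch of how it could look for this arc-emission parameterization. The function name viterbi, the array layout (pi, A, B as above), and the 0-based time indexing are assumptions.

```python
import numpy as np

def viterbi(pi, A, B, O):
    """Most likely state sequence X_1 ... X_{T+1} for an arc-emission HMM.

    pi: (N,) initial probs; A: (N, N) transitions; B: (N, N, M) arc emissions;
    O: length-T sequence of symbol indices.  (Assumed layout, not from the slides.)
    """
    T, N = len(O), len(pi)
    delta = np.zeros((T + 1, N))            # delta[t, j] = best score of a path ending in j
    back = np.zeros((T + 1, N), dtype=int)  # backpointers
    delta[0] = pi                           # base case: delta_i(1) = pi_i
    for t in range(T):
        # score[i, j] = delta_i(t) * a_ij * b_{ij o_t}
        score = delta[t][:, None] * A * B[:, :, O[t]]
        delta[t + 1] = score.max(axis=0)
        back[t + 1] = score.argmax(axis=0)
    # Trace back the best path of T+1 states.
    path = [int(delta[T].argmax())]
    for t in range(T, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```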
Notation
–A sentence: O_{1,T} = o_1 … o_T, where T is the sentence length.
–The state sequence: X_{1,T+1} = X_1 … X_{T+1}.
–t: time t, ranging from 1 to T+1.
–X_t: the state at time t.
–i, j: states s_i, s_j.
–k: word w_k in the vocabulary.
Forward and backward probabilities
Forward probability
The probability of producing O_{1,t-1} while ending up in state s_i at time t:
α_i(t) = P(o_1 … o_{t-1}, X_t = s_i)
Calculating forward probability
Initialization: α_i(1) = π_i
Induction: α_j(t+1) = Σ_i α_i(t) a_ij b_{ij o_t},  for t = 1, …, T
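A minimal sketch of the forward pass, assuming the array layout introduced above (pi, A, B, symbol indices O); with 0-based indexing, alpha[t, i] here stands for α_i(t+1) in the slides' 1-based notation.

```python
import numpy as np

def forward(pi, A, B, O):
    """Forward probabilities for an arc-emission HMM (assumed array layout).

    Returns alpha with shape (T+1, N), where alpha[t, i] corresponds to
    alpha_i(t+1) = P(o_1 ... o_t, X_{t+1} = i) in 1-based notation.
    """
    T, N = len(O), len(pi)
    alpha = np.zeros((T + 1, N))
    alpha[0] = pi                                  # initialization: alpha_i(1) = pi_i
    for t in range(T):
        # induction: alpha_j(t+1) = sum_i alpha_i(t) * a_ij * b_{ij o_t}
        alpha[t + 1] = alpha[t] @ (A * B[:, :, O[t]])
    return alpha
```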
Backward probability
The probability of producing the sequence O_{t,T}, given that at time t we are in state s_i:
β_i(t) = P(o_t … o_T | X_t = s_i)
Calculating backward probability
Initialization: β_i(T+1) = 1
Induction: β_i(t) = Σ_j a_ij b_{ij o_t} β_j(t+1),  for t = T, …, 1
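A matching sketch of the backward pass under the same assumed layout; beta[t, i] stands for β_i(t+1) in 1-based notation.

```python
import numpy as np

def backward(A, B, O):
    """Backward probabilities for an arc-emission HMM (assumed array layout).

    Returns beta with shape (T+1, N), where beta[t, i] corresponds to
    beta_i(t+1) = P(o_{t+1} ... o_T | X_{t+1} = i) in 1-based notation.
    """
    T, N = len(O), A.shape[0]
    beta = np.zeros((T + 1, N))
    beta[T] = 1.0                                  # initialization: beta_i(T+1) = 1
    for t in range(T - 1, -1, -1):
        # induction: beta_i(t) = sum_j a_ij * b_{ij o_t} * beta_j(t+1)
        beta[t] = (A * B[:, :, O[t]]) @ beta[t + 1]
    return beta
```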
Calculating the prob of the observation
P(O_{1,T}) = Σ_i α_i(T+1) = Σ_i π_i β_i(1) = Σ_i α_i(t) β_i(t),  for any t in 1, …, T+1
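A small sanity check, assuming the forward() and backward() helpers sketched above: all three expressions should give the same P(O).

```python
import numpy as np

def observation_prob(pi, A, B, O):
    """P(O) from the forward pass, checked against the equivalent forms.

    Assumes the forward() and backward() helpers sketched above.
    """
    alpha = forward(pi, A, B, O)
    beta = backward(A, B, O)
    p_forward = alpha[-1].sum()            # sum_i alpha_i(T+1)
    p_backward = (pi * beta[0]).sum()      # sum_i pi_i * beta_i(1)
    p_any_t = (alpha[1] * beta[1]).sum()   # sum_i alpha_i(t) beta_i(t), here t = 2
    assert np.allclose([p_backward, p_any_t], p_forward)
    return p_forward
```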
Estimating parameters
The probability of traversing the arc from s_i to s_j at time t, given O (denoted p_t(i, j) in M&S):
p_t(i, j) = P(X_t = i, X_{t+1} = j | O) = α_i(t) a_ij b_{ij o_t} β_j(t+1) / P(O)
The probability of being in state i at time t, given O:
γ_i(t) = P(X_t = i | O) = α_i(t) β_i(t) / P(O) = Σ_j p_t(i, j)  (for 1 ≤ t ≤ T)
Expected counts
Sum over the time index:
–Expected # of transitions from state i to j in O: Σ_{t=1..T} p_t(i, j)
–Expected # of transitions from state i in O: Σ_{t=1..T} γ_i(t) = Σ_{t=1..T} Σ_j p_t(i, j)
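A sketch of the expected counts, again assuming the forward() and backward() helpers above; with 0-based indexing, p[t, i, j] stands for p_{t+1}(i, j) and gamma[t, i] for γ_i(t+1).

```python
import numpy as np

def expected_counts(pi, A, B, O):
    """p_t(i, j), gamma_i(t), and the summed expected counts for one sequence O.

    Assumes the forward() and backward() helpers sketched above.
    """
    T = len(O)
    alpha, beta = forward(pi, A, B, O), backward(A, B, O)
    p_obs = alpha[-1].sum()                                   # P(O)

    # p[t, i, j] = alpha_i(t) * a_ij * b_{ij o_t} * beta_j(t+1) / P(O), t = 1..T
    p = np.stack([alpha[t][:, None] * A * B[:, :, O[t]] * beta[t + 1][None, :]
                  for t in range(T)]) / p_obs
    gamma = alpha * beta / p_obs                              # prob of being in state i at each time

    exp_i_to_j = p.sum(axis=0)           # expected # of transitions i -> j in O
    exp_from_i = gamma[:T].sum(axis=0)   # expected # of transitions out of i in O
    return p, gamma, exp_i_to_j, exp_from_i
```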
Update parameters
–π̂_i: the expected proportion of times in state s_i at time 1.
–â_ij: the expected # of transitions from s_i to s_j, divided by the expected # of transitions from s_i.
–b̂_ijk: the expected # of transitions from s_i to s_j with w_k observed, divided by the expected # of transitions from s_i to s_j.
Final formulae
–π̂_i = γ_i(1)
–â_ij = Σ_{t=1..T} p_t(i, j) / Σ_{t=1..T} γ_i(t)
–b̂_ijk = Σ_{t: o_t = w_k} p_t(i, j) / Σ_{t=1..T} p_t(i, j)
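A sketch of one re-estimation step built on the expected_counts() helper above; it handles a single training sequence and does no smoothing, so zero expected counts would need extra care in practice.

```python
import numpy as np

def update_parameters(pi, A, B, O):
    """One re-estimation step from the expected counts (the final formulae above).

    Assumes the expected_counts() helper sketched above; single-sequence case.
    """
    M = B.shape[2]
    p, gamma, exp_i_to_j, exp_from_i = expected_counts(pi, A, B, O)

    new_pi = gamma[0]                                # pi_i = gamma_i(1)
    new_A = exp_i_to_j / exp_from_i[:, None]         # a_ij = exp(i -> j) / exp(from i)
    # b_ijk = expected # of i -> j transitions with o_t = k, over all i -> j transitions
    new_B = np.zeros_like(B)
    for k in range(M):
        new_B[:, :, k] = p[np.asarray(O) == k].sum(axis=0)
    new_B /= exp_i_to_j[:, :, None]
    return new_pi, new_A, new_B
```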
Emission probabilities
Arc-emission HMM: b_ijk = P(o_t = w_k | X_t = s_i, X_{t+1} = s_j), i.e., the symbol is emitted on the arc from s_i to s_j.
The inner loop of the forward-backward algorithm
Given an input sequence O_{1,T} and the current parameters (π, A, B):
1. Calculate the forward probabilities. Base case: α_i(1) = π_i. Recursive case: α_j(t+1) = Σ_i α_i(t) a_ij b_{ij o_t}.
2. Calculate the backward probabilities. Base case: β_i(T+1) = 1. Recursive case: β_i(t) = Σ_j a_ij b_{ij o_t} β_j(t+1).
3. Calculate the expected counts from p_t(i, j) and γ_i(t).
4. Update the parameters π_i, a_ij, b_ijk using the final formulae.
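Putting the four steps together, assuming the helpers sketched above. For clarity this driver simply reuses those helpers, so α and β end up being recomputed; a real implementation would share them across steps.

```python
import numpy as np

def inner_loop(pi, A, B, O):
    """One pass of the inner loop over a training sequence O.

    Returns the re-estimated parameters and log P(O) under the *old* parameters.
    Assumes the forward() and update_parameters() helpers sketched above.
    """
    log_prob = np.log(forward(pi, A, B, O)[-1].sum())        # steps 1-2 (via the helpers)
    new_pi, new_A, new_B = update_parameters(pi, A, B, O)    # steps 3-4
    return new_pi, new_A, new_B, log_prob
```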
Relation to EM
HMM is a PM (Product of Multinomials) model
The forward-backward algorithm is a special case of the EM algorithm for PM models.
–X (observed data): each data point is an observation sequence O_{1,T}.
–Y (hidden data): the state sequence X_{1,T+1}.
–Θ (parameters): a_ij, b_ijk, π_i.
Relation to EM (cont)
Iterations
–Each iteration provides values for all the parameters.
–The new model always improves the likelihood of the training data: P(O | Θ̂) ≥ P(O | Θ).
–The algorithm is not guaranteed to reach a global maximum.
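A toy training loop, assuming the inner_loop() driver and the toy pi, A, B, O from the first sketch; EM guarantees that the log-likelihoods are non-decreasing across iterations, which the assertion checks up to rounding.

```python
import numpy as np

# Run a few EM iterations on the toy data and watch the log-likelihood climb.
prev = -np.inf
for it in range(20):
    pi, A, B, log_prob = inner_loop(pi, A, B, O)
    assert log_prob >= prev - 1e-10      # monotone improvement (up to rounding error)
    prev = log_prob
    print(it, log_prob)
```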
Summary
A way of estimating the parameters of an HMM:
–Define forward and backward probabilities, which can be calculated efficiently with dynamic programming (DP).
–Given an initial parameter setting, we re-estimate the parameters at each iteration.
–The forward-backward algorithm is a special case of the EM algorithm for PM models.
Additional slides
Definitions so far
–The prob of producing O_{1,t-1} and ending at state s_i at time t: α_i(t) = P(o_1 … o_{t-1}, X_t = s_i)
–The prob of producing the sequence O_{t,T}, given that at time t we are at state s_i: β_i(t) = P(o_t … o_T | X_t = s_i)
–The prob of being at state i at time t, given O: γ_i(t) = α_i(t) β_i(t) / P(O)
Emission probabilities
–Arc-emission HMM: b_ijk = P(o_t = w_k | X_t = s_i, X_{t+1} = s_j)
–State-emission HMM: b_ik = P(o_t = w_k | X_t = s_i)