Hidden Markov Models Yves Moreau Katholieke Universiteit Leuven
Regular expressions
- Alignment:
    ACA---ATG
    TCAACTATC
    ACAC--AGC
    AGA---ATC
    ACCG--ATC
- Regular expression: [AT][CG][AC][ACGT]*A[TG][GC]
- Problem: the regular expression does not distinguish between
    Exceptional: TGCTAGG
    Consensus:   ACACATC
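The problem stated on this slide can be checked directly. The sketch below uses Python's standard `re` module with the pattern copied from the slide; the ungapped sequence strings are simply the alignment rows with the gap characters removed.

```python
import re

# Pattern copied from the slide.
pattern = re.compile(r"[AT][CG][AC][ACGT]*A[TG][GC]")

# Alignment rows with the gaps removed, plus the two sequences named on the slide.
sequences = [
    "ACAATG", "TCAACTATC", "ACACAGC", "AGAATC", "ACCGATC",
    "ACACATC",  # consensus
    "TGCTAGG",  # exceptional
]

# fullmatch() only answers yes or no: it cannot rank the sequences.
matches = {s: bool(pattern.fullmatch(s)) for s in sequences}
for s, ok in matches.items():
    print(s, ok)
```

Every sequence matches, including the exceptional one, which is why a scoring model is needed instead of a yes/no pattern.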
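Hidden Markov Models
- Figure labels: sequence score, transition probabilities, emission probabilities
- Emission probabilities (one row per state, in the order they appear on the slide):
    A 0.8  C 0    G 0    T 0.2
    A 0    C 0.8  G 0.2  T 0
    A 0.8  C 0.2  G 0    T 0
    A 1    C 0    G 0    T 0
    A 0    C 0    G 0.2  T 0.8
    A 0    C 0.8  G 0.2  T 0
    A 0.2  C 0.4  G 0.2  T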
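Log odds
- Use logarithms for scaling and normalize by a random model
- Log odds for a sequence S of length L: $\log \frac{P(S)}{0.25^{L}} = \log P(S) - L \log 0.25$
- Log odds of the emissions (one row per state, same order as the emission probabilities):
    A: 1.16   T: -0.22
    C: 1.16   G: -0.22
    A: 1.16   C: -0.22
    A: 1.39
    G: -0.22  T: 1.16
    C: 1.16   G: -0.22
    A: -0.22  C: 0.47  G: -0.22  T: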
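The log-odds values on this slide can be reproduced from the emission probabilities of the Hidden Markov Models slide. The sketch below assumes a uniform random model (0.25 per nucleotide) and the natural logarithm, which is what the listed numbers are consistent with. The `T` entry of the last state is truncated in the slide and is left out here.

```python
import math

BACKGROUND = 0.25  # assumed uniform random model over A, C, G, T

# Non-zero emission probabilities per state, in slide order.
emissions = [
    {"A": 0.8, "T": 0.2},
    {"C": 0.8, "G": 0.2},
    {"A": 0.8, "C": 0.2},
    {"A": 1.0},
    {"G": 0.2, "T": 0.8},
    {"C": 0.8, "G": 0.2},
    {"A": 0.2, "C": 0.4, "G": 0.2},  # T is truncated in the slide, so it is omitted
]


def log_odds(p, background=BACKGROUND):
    """Natural-log odds of probability p against the background model."""
    return math.log(p / background)


table = [{sym: round(log_odds(p), 2) for sym, p in state.items()} for state in emissions]
for row in table:
    print(row)
```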
Log odds
    Sequence                   Log odds
    ACAC--ATC (consensus)       6.7
    ACA---ATG                   4.9
    TCAACTATC                   3.0
    ACAC--AGC                   5.3
    AGA---ATC                   4.9
    ACCG--ATC                   4.6
    TGCT--AGG (exceptional)    -0.97
Markov chain
- Probabilistic model of a DNA sequence
- Sequence: $x = (x_1, \ldots, x_L)$
- Transition probabilities: $a_{st} = P(x_i = t \mid x_{i-1} = s)$
- Example of a Markov chain: states A, C, G, T
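Markov property
- Probability of a sequence through Bayes' rule: $P(x) = P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$
- Markov property: $P(x_i \mid x_{i-1}, \ldots, x_1) = P(x_i \mid x_{i-1})$, hence $P(x) = P(x_1) \prod_{i=2}^{L} P(x_i \mid x_{i-1})$
- "The future is a function only of the present, not of the past"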
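As a minimal sketch of the factorization implied by the Markov property, the code below computes the log-probability of a DNA sequence under a first-order Markov chain. The initial and transition probabilities are hypothetical values chosen for illustration; the slides do not give any.

```python
import math

SYMBOLS = "ACGT"

# Hypothetical parameters: uniform first symbol, and a chain that
# prefers to repeat the previous symbol (0.4) over switching (0.2 each).
initial = {s: 0.25 for s in SYMBOLS}
transition = {s: {t: (0.4 if s == t else 0.2) for t in SYMBOLS} for s in SYMBOLS}


def markov_log_prob(x):
    """log P(x) = log P(x_1) + sum_i log P(x_i | x_{i-1})."""
    logp = math.log(initial[x[0]])
    for prev, cur in zip(x, x[1:]):
        logp += math.log(transition[prev][cur])
    return logp


print(markov_log_prob("ACACATC"))
```

Working with sums of logarithms rather than products of probabilities keeps the computation stable for long sequences.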
Beginning and end of a sequence
- The computation of the probability is not homogeneous: the first symbol is treated differently from the others
- The length distribution is not modeled: P(length = L) is unspecified
- Solution: model the beginning and the end of the sequence explicitly
- The probability of observing a sequence of a given length then decreases with the length of the sequence
- Figure: Markov chain with states A, C, G, T
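The last point can be illustrated with a small calculation. The sketch assumes, purely for illustration, that every symbol state moves to the end state with the same probability `tau` (a hypothetical parameter, not a value from the slides). Under that assumption the length follows a geometric distribution.

```python
def length_probability(length, end_prob):
    """P(length = L) when each state ends the sequence with probability end_prob."""
    # L - 1 times "continue", then once "end".
    return (1 - end_prob) ** (length - 1) * end_prob


tau = 0.1  # hypothetical end probability
probs = [length_probability(L, tau) for L in range(1, 6)]
print(probs)
```

Each additional symbol multiplies the probability by `1 - tau`, so longer sequences are always less probable than shorter ones.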
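Hidden Markov Model
- In a hidden Markov model, we observe the symbol sequence x but we want to reconstruct the hidden state sequence (path $\pi$)
- Transition probabilities: $a_{kl} = P(\pi_i = l \mid \pi_{i-1} = k)$ (begin: $a_{0l}$, end: $a_{k0}$)
- Emission probabilities: $e_k(b) = P(x_i = b \mid \pi_i = k)$
- Joint probability of the sequence $x_1, \ldots, x_L$ and the path $\pi$: $P(x, \pi) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$, with $\pi_{L+1} = 0$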
Casino (I) – problem setup
- The casino mostly uses a fair die but sometimes switches to a loaded die
- We observe the outcomes x of the successive throws but want to know when the die was fair or loaded (path $\pi$)
- Fair die:   1: 1/6  2: 1/6  3: 1/6  4: 1/6  5: 1/6  6: 1/6
- Loaded die: 1: 1/10  2: 1/10  3: 1/10  4: 1/10  5: 1/10  6: 1/
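To make the casino setup concrete, the sketch below evaluates the joint probability of a roll sequence and a given state path. Several values are assumptions: the start and transition probabilities are not given in the slides, the probability of a six on the loaded die is truncated there and is set to 1/2 so that the distribution sums to 1, and no explicit end state is modeled.

```python
import math

STATES = ("F", "L")  # fair, loaded
# Assumed values: the slides give no start or transition probabilities.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    # 1/2 for a six is assumed so that the loaded die sums to 1.
    "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2},
}


def joint_log_prob(x, path):
    """log P(x, path): start, then alternating emission and transition terms."""
    logp = math.log(START[path[0]]) + math.log(EMIT[path[0]][x[0]])
    for i in range(1, len(x)):
        logp += math.log(TRANS[path[i - 1]][path[i]])
        logp += math.log(EMIT[path[i]][x[i]])
    return logp


rolls = [6, 6]
for path in ("FF", "FL", "LF", "LL"):
    print(path, math.exp(joint_log_prob(rolls, path)))
```

Comparing the four paths for two sixes shows how the same observations receive very different joint probabilities depending on the hidden path.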
Estimation of the sequence and state probabilities
The Viterbi algorithm
- We look for the most probable path $\pi^* = \arg\max_{\pi} P(x, \pi)$
- This problem can be tackled by dynamic programming
- Let us define $v_k(i)$ as the probability of the most probable path that ends in state k with the emission of symbol $x_i$
- Then we can compute this probability recursively as $v_l(i+1) = e_l(x_{i+1}) \max_k \left( v_k(i)\, a_{kl} \right)$
The Viterbi algorithm
- The Viterbi algorithm grows the best path dynamically
- Initial condition: the sequence is in the begin state, $v_0(0) = 1$ and $v_k(0) = 0$ for $k > 0$
- Traceback pointers to follow the best path (= decoding)
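The dynamic programming and traceback steps can be sketched for the casino example as follows. This is a minimal illustration working in log space, under stated assumptions: the start and transition probabilities are not from the slides, the loaded-die probability of a six is taken as 1/2 so that the distribution sums to 1, and the end state is not modeled.

```python
import math

STATES = ("F", "L")  # fair, loaded
# Assumed values: the slides give no start or transition probabilities.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    # 1/2 for a six is assumed so that the loaded die sums to 1.
    "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2},
}


def viterbi(x):
    """Return (most probable path, its log-probability) for the rolls x."""
    # v[i][k]: log-probability of the best path ending in state k at position i.
    v = [{k: math.log(START[k]) + math.log(EMIT[k][x[0]]) for k in STATES}]
    pointers = []  # pointers[i][l]: best predecessor of state l at position i + 1
    for sym in x[1:]:
        prev, row, back = v[-1], {}, {}
        for l in STATES:
            best = max(STATES, key=lambda k: prev[k] + math.log(TRANS[k][l]))
            row[l] = prev[best] + math.log(TRANS[best][l]) + math.log(EMIT[l][sym])
            back[l] = best
        v.append(row)
        pointers.append(back)
    # Traceback from the best final state.
    last = max(STATES, key=lambda k: v[-1][k])
    path = [last]
    for back in reversed(pointers):
        path.append(back[path[-1]])
    return "".join(reversed(path)), v[-1][last]


rolls = [1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 2, 3]
best_path, best_logp = viterbi(rolls)
print(best_path, best_logp)
```

Using logarithms turns the products of the recursion into sums, which also avoids underflow on long roll sequences.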
Casino (II) - Viterbi
The forward algorithm
- The forward algorithm lets us compute the probability P(x) of a sequence w.r.t. an HMM
- This is important for the computation of posterior probabilities and the comparison of HMMs
- The sum over all paths (exponentially many), $P(x) = \sum_{\pi} P(x, \pi)$, can be computed by dynamic programming
- Let us define $f_k(i)$ as the probability of the sequence up to $x_i$ for the paths that end in state k with the emission of symbol $x_i$: $f_k(i) = P(x_1, \ldots, x_i, \pi_i = k)$
- Then we can compute this probability recursively as $f_l(i+1) = e_l(x_{i+1}) \sum_k f_k(i)\, a_{kl}$
The forward algorithm
- The forward algorithm grows the total probability dynamically from the beginning to the end of the sequence
- Initial condition: the sequence is in the begin state, $f_0(0) = 1$ and $f_k(0) = 0$ for $k > 0$
- End: all states converge to the end state, $P(x) = \sum_k f_k(L)\, a_{k0}$
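A minimal sketch of the forward algorithm for the casino example follows. It is the same recursion as Viterbi with the maximum replaced by a sum. The start and transition probabilities are assumptions, as is the value 1/2 for a six on the loaded die. The end state is not modeled here, so P(x) is taken as the plain sum of the final forward values.

```python
STATES = ("F", "L")  # fair, loaded
# Assumed values: the slides give no start or transition probabilities.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    # 1/2 for a six is assumed so that the loaded die sums to 1.
    "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2},
}


def forward(x):
    """Return the forward table f, where f[i][k] = P(x_1..x_i, state_i = k)."""
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        prev = f[-1]
        f.append({l: EMIT[l][sym] * sum(prev[k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    return f


def sequence_probability(x):
    """P(x), summed over all paths (no explicit end state in this sketch)."""
    return sum(forward(x)[-1].values())


print(sequence_probability([6, 6, 6, 1, 2]))
```

This unscaled version is only suitable for short sequences; for long ones the products underflow.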
The backward algorithm
- The backward algorithm lets us compute the probability of the complete sequence together with the condition that symbol $x_i$ is emitted from state k: $P(x, \pi_i = k) = P(x_1, \ldots, x_i, \pi_i = k)\, P(x_{i+1}, \ldots, x_L \mid \pi_i = k)$
- This is important to compute the probability of a given state at symbol $x_i$
- $P(x_1, \ldots, x_i, \pi_i = k)$ is computed by the forward algorithm: it is $f_k(i)$
- Let us define $b_k(i)$ as the probability of the rest of the sequence for the paths that pass through state k at symbol $x_i$: $b_k(i) = P(x_{i+1}, \ldots, x_L \mid \pi_i = k)$
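The backward algorithm
- The backward algorithm grows the probability $b_k(i)$ dynamically backwards (from end to beginning): $b_k(i) = \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1)$
- Border condition: start in the end state, $b_k(L) = a_{k0}$
- Once both forward and backward probabilities are available, we can compute the posterior probability of the state: $P(\pi_i = k \mid x) = \frac{f_k(i)\, b_k(i)}{P(x)}$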
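The combination of forward and backward values into posterior state probabilities can be sketched as below for the casino example. The start and transition probabilities and the value 1/2 for a loaded six are assumptions not given in the slides. Since no end state is modeled, the backward recursion starts from 1 at the last position.

```python
STATES = ("F", "L")  # fair, loaded
# Assumed values: the slides give no start or transition probabilities.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    # 1/2 for a six is assumed so that the loaded die sums to 1.
    "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2},
}


def forward(x):
    f = [{k: START[k] * EMIT[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        prev = f[-1]
        f.append({l: EMIT[l][sym] * sum(prev[k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    return f


def backward(x):
    # Border condition without an end state: b_k(L) = 1.
    b = [{k: 1.0 for k in STATES}]
    for sym in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(TRANS[k][l] * EMIT[l][sym] * nxt[l] for l in STATES)
                     for k in STATES})
    return b


def posterior(x):
    """Return a list of dicts: P(state_i = k | x) for every position i."""
    f, b = forward(x), backward(x)
    px = sum(f[-1].values())
    return [{k: f[i][k] * b[i][k] / px for k in STATES} for i in range(len(x))]


rolls = [1, 2, 3, 4, 5, 6, 6, 6, 6, 6, 6, 6, 6, 2, 3]
post = posterior(rolls)
# Posterior decoding: pick the most probable state at each position.
decoded = "".join(max(STATES, key=lambda k: p[k]) for p in post)
print(decoded)
print([round(p["F"], 3) for p in post])
```

The list of `P(F | x)` values is the quantity one would plot to see where the die was probably fair.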
Posterior decoding
- Instead of using the most probable path for decoding (Viterbi), we can use the path of the most probable states: $\hat{\pi}_i = \arg\max_k P(\pi_i = k \mid x)$
- The path $\hat{\pi}$ can be "illegal" ($P(\hat{\pi} \mid x) = 0$)
- This approach can also be used when we are interested in a function g(k) of the state (e.g., labeling): $G(i \mid x) = \sum_k P(\pi_i = k \mid x)\, g(k)$
Casino (III) – posterior decoding
- Figure: posterior probability of the state "fair" w.r.t. the die throws
Casino (IV) – posterior decoding
- New situation: $P(\pi_{i+1} = \text{FAIR} \mid \pi_i = \text{FAIR}) = 0.99$
- Viterbi decoding cannot detect the cheating from 1000 throws, while posterior decoding can
- Fair die:   1: 1/6  2: 1/6  3: 1/6  4: 1/6  5: 1/6  6: 1/6
- Loaded die: 1: 1/10  2: 1/10  3: 1/10  4: 1/10  5: 1/10  6: 1/
Parameter estimation for HMMs
Choice of the architecture
- For the parameter estimation, we assume that the architecture of the HMM is known
- The choice of architecture is an essential design choice
- Duration modeling
- "Silent states" for gaps
Parameter estimation with known paths
- HMM with parameters $\theta$ (transition and emission probabilities)
- Training set D of N sequences $x^1, \ldots, x^N$
- The score of the model is the likelihood of the parameters given the training data: $\log P(x^1, \ldots, x^N \mid \theta) = \sum_{j=1}^{N} \log P(x^j \mid \theta)$
Parameter estimation with known paths
- If the state paths are known, the parameters are estimated through counts (how often a transition is used, how often a symbol is produced by a given state)
- Use of "pseudocounts" if necessary:
    $A_{kl}$ = number of transitions from k to l in the training set + pseudocount $r_{kl}$
    $E_k(b)$ = number of emissions of b from k in the training set + pseudocount $r_k(b)$
- The estimates are the normalized counts: $a_{kl} = A_{kl} / \sum_{l'} A_{kl'}$ and $e_k(b) = E_k(b) / \sum_{b'} E_k(b')$
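Counting with pseudocounts can be sketched as follows. The function name `estimate_from_paths`, the single shared pseudocount value, and the toy training data are all illustrative assumptions; the slides allow a separate pseudocount per transition and per emission.

```python
def estimate_from_paths(sequences, paths, states, alphabet, pseudocount=1.0):
    """Estimate transition and emission probabilities from labeled sequences."""
    # Start every count at the pseudocount (one shared value in this sketch).
    A = {k: {l: pseudocount for l in states} for k in states}
    E = {k: {b: pseudocount for b in alphabet} for k in states}
    for x, path in zip(sequences, paths):
        for i, (sym, k) in enumerate(zip(x, path)):
            E[k][sym] += 1                 # symbol sym emitted from state k
            if i + 1 < len(path):
                A[k][path[i + 1]] += 1     # transition k -> next state
    # Normalize each row of counts into probabilities.
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e


# Toy labeled data (made up): rolls with the die used for each roll.
rolls = [[1, 6, 6]]
labels = ["FLL"]
a, e = estimate_from_paths(rolls, labels, states="FL", alphabet=range(1, 7))
print(a)
print(e)
```

With so little data the pseudocounts dominate, which is the intended effect: they keep unseen transitions and emissions from getting probability zero.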
Parameter estimation with unknown paths: Viterbi training
- Strategy: iterative method
    - Suppose that the parameters are known and find the best path
    - Use Viterbi decoding to estimate the parameters
    - Iterate until convergence
- Viterbi training does not maximize the likelihood of the parameters
- Viterbi training converges exactly in a finite number of steps
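Parameter estimation with unknown paths: Baum-Welch training
- Strategy: parallel to Viterbi training, but we use the expected values of the transition and emission counts (instead of using only the best path)
- For the transitions: $A_{kl} = \sum_j \frac{1}{P(x^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(x^j_{i+1})\, b_l^j(i+1)$
- For the emissions: $E_k(b) = \sum_j \frac{1}{P(x^j)} \sum_{\{i \mid x^j_i = b\}} f_k^j(i)\, b_k^j(i)$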
Parameter estimation with unknown paths: Baum-Welch training
- Initialization: choose arbitrary model parameters
- Recursion:
    - Set all transition and emission variables A and E to their pseudocounts
    - For all sequences j = 1,...,N:
        - Compute $f_k(i)$ for sequence j with the forward algorithm
        - Compute $b_k(i)$ for sequence j with the backward algorithm
        - Add the contributions of sequence j to A and E
    - Compute the new model parameters $a_{kl} = A_{kl} / \sum_{l'} A_{kl'}$ and $e_k(b) = E_k(b) / \sum_{b'} E_k(b')$
    - Compute the log-likelihood of the model
- End: stop when the log-likelihood changes by less than some threshold or when the maximum number of iterations is exceeded
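The loop described above can be sketched for the two-state casino model as follows. This is an illustration under several assumptions, not a reference implementation: the roll data and initial parameters are made up, the start distribution is kept fixed, no end state is modeled, one small shared pseudocount is used, and the forward and backward tables are unscaled, which is only safe for short sequences.

```python
import math

STATES = ("F", "L")
ALPHABET = (1, 2, 3, 4, 5, 6)


def forward(x, start, trans, emit):
    f = [{k: start[k] * emit[k][x[0]] for k in STATES}]
    for sym in x[1:]:
        prev = f[-1]
        f.append({l: emit[l][sym] * sum(prev[k] * trans[k][l] for k in STATES)
                  for l in STATES})
    return f


def backward(x, trans, emit):
    b = [{k: 1.0 for k in STATES}]
    for sym in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(trans[k][l] * emit[l][sym] * nxt[l] for l in STATES)
                     for k in STATES})
    return b


def baum_welch_step(seqs, start, trans, emit, pseudo=1e-3):
    """One re-estimation step; returns new parameters and the old log-likelihood."""
    A = {k: {l: pseudo for l in STATES} for k in STATES}
    E = {k: {s: pseudo for s in ALPHABET} for k in STATES}
    loglik = 0.0
    for x in seqs:
        f, b = forward(x, start, trans, emit), backward(x, trans, emit)
        px = sum(f[-1].values())
        loglik += math.log(px)
        for i, sym in enumerate(x):
            for k in STATES:
                E[k][sym] += f[i][k] * b[i][k] / px  # expected emission count
                if i + 1 < len(x):
                    for l in STATES:  # expected transition count
                        A[k][l] += (f[i][k] * trans[k][l] * emit[l][x[i + 1]]
                                    * b[i + 1][l] / px)
    new_trans = {k: {l: A[k][l] / sum(A[k].values()) for l in STATES} for k in STATES}
    new_emit = {k: {s: E[k][s] / sum(E[k].values()) for s in ALPHABET} for k in STATES}
    return new_trans, new_emit, loglik


def baum_welch(seqs, start, trans, emit, max_iter=50, tol=1e-4):
    history = []
    for _ in range(max_iter):
        trans, emit, loglik = baum_welch_step(seqs, start, trans, emit)
        history.append(loglik)
        if len(history) > 1 and abs(history[-1] - history[-2]) < tol:
            break
    return trans, emit, history


# Made-up rolls and arbitrary initial parameters, for illustration only.
seqs = [[1, 3, 6, 6, 6, 2, 6, 6, 4, 5, 1, 2, 3, 6, 6, 6, 6, 5, 6, 1],
        [2, 4, 5, 1, 3, 6, 2, 4, 6, 6, 6, 6, 3, 6, 6, 1, 2, 5, 4, 3]]
start = {"F": 0.5, "L": 0.5}
trans0 = {"F": {"F": 0.8, "L": 0.2}, "L": {"F": 0.3, "L": 0.7}}
emit0 = {"F": {s: 1 / 6 for s in ALPHABET},
         "L": {**{s: 0.12 for s in range(1, 6)}, 6: 0.4}}
trans, emit, history = baum_welch(seqs, start, trans0, emit0)
print(trans)
print(history[-1])
```

With this little data the estimates are not expected to be reliable; the sketch only shows how the expected counts, the normalization, and the stopping rule fit together.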
Casino (V) – Baum-Welch training
- Figure: the original model next to two models estimated by Baum-Welch training, one from 300 throws and one from a second set of throws
- Original model, fair die:   1: 1/6  2: 1/6  3: 1/6  4: 1/6  5: 1/6  6: 1/6
- Original model, loaded die: 1: 1/10  2: 1/10  3: 1/10  4: 1/10  5: 1/10  6: 1/
Numerical stability
- Many expressions contain products of many probabilities
- This causes underflow when we compute these expressions
- For Viterbi, this can be solved by working with logarithms
- For the forward and backward algorithms, we can work with an approximation to the logarithm or with rescaled variables
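One way to keep the forward algorithm stable is sketched below. Note that it uses the exact log-sum-exp trick, which is a swapped-in alternative to the approximate logarithm and the rescaled variables mentioned on the slide. The casino parameters are the same assumed values as elsewhere (start and transition probabilities not from the slides, loaded six set to 1/2), and all probabilities must be strictly positive for `math.log` to work.

```python
import math

STATES = ("F", "L")  # fair, loaded
# Assumed values: the slides give no start or transition probabilities.
START = {"F": 0.5, "L": 0.5}
TRANS = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
EMIT = {
    "F": {r: 1 / 6 for r in range(1, 7)},
    # 1/2 for a six is assumed so that the loaded die sums to 1.
    "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2},
}


def logsumexp(values):
    """log(sum(exp(v))) computed without leaving log space."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log(x):
    """log P(x) via the forward recursion on log-probabilities."""
    f = {k: math.log(START[k]) + math.log(EMIT[k][x[0]]) for k in STATES}
    for sym in x[1:]:
        f = {l: math.log(EMIT[l][sym])
                + logsumexp([f[k] + math.log(TRANS[k][l]) for k in STATES])
             for l in STATES}
    return logsumexp(list(f.values()))


long_rolls = [1, 2, 3, 4, 5, 6] * 1000
print(forward_log(long_rolls))
```

For 6000 rolls the probability itself is far below the smallest positive double, so a direct product would underflow to zero, while the log-space value stays finite.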
Summary
- Hidden Markov Models
- Computation of sequence and state probabilities
    - Viterbi computation of the best state path
    - The forward algorithm for the computation of the probability of a sequence
    - The backward algorithm for the computation of state probabilities
- Parameter estimation for HMMs
    - Parameter estimation with known paths
    - Parameter estimation with unknown paths
        - Viterbi training
        - Baum-Welch training