CSE182-L10 HMM applications
Probability of being in specific states
What is the probability that we were in state k at step i?
$\Pr[\pi_i = k \mid x] = \dfrac{\Pr[\text{all paths that passed through state } k \text{ at step } i \text{, and emitted } x]}{\Pr[\text{all paths that emitted } x]}$
The Forward Algorithm
Recall v[i,j]: the probability of the most likely path the automaton chose in emitting $x_1 \ldots x_i$ and ending up in state j.
Define f[i,j]: the probability that the automaton started from state 1, emitted $x_1 \ldots x_i$, and ended up in state j.
What is the difference?
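Most Likely Path versus Probability of Arrival
There are multiple paths from state 1 to state j along which the automaton can output $x_1 \ldots x_i$.
In computing the Viterbi path, we choose the most likely one: $v(i,j) = \max_{\pi} \Pr[x_1 \ldots x_i, \pi]$
The probability of emitting $x_1 \ldots x_i$ and ending up in state j sums over all of them: $F(i,j) = \sum_{\pi} \Pr[x_1 \ldots x_i, \pi]$
In both expressions, π ranges over the paths that start in state 1 and are in state j at step i.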
The Forward Algorithm
Recall that
$v(i,j) = \max_{l \in Q} \{ v(i-1,l)\, A[l,j] \} \cdot e_j(x_i)$
Instead,
$F(i,j) = \sum_{l \in Q} \big( F(i-1,l)\, A[l,j] \big) \cdot e_j(x_i)$
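The two recurrences differ only in how the predecessor states are combined, which a short Python sketch can make concrete. The two-state HMM below (`INIT`, `A`, `E`) is a hypothetical toy model, not taken from the slides, and the start distribution `INIT` is an assumption standing in for the transitions out of state 1.

```python
def fill_table(x, A, e, init, combine):
    """Fill table[i][j] for the prefix x[:i+1] ending in state j.

    combine=max gives the Viterbi table v; combine=sum gives the forward table F.
    x must be non-empty.
    """
    states = range(len(A))
    # First symbol: start distribution times emission probability.
    table = [[init[j] * e[j][x[0]] for j in states]]
    for symbol in x[1:]:
        prev = table[-1]
        table.append([
            combine(prev[l] * A[l][j] for l in states) * e[j][symbol]
            for j in states
        ])
    return table


# Hypothetical toy model (all numbers are illustrative assumptions).
INIT = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.2, 0.8]]
E = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.10, "C": 0.40, "G": 0.40, "T": 0.10},
]

v = fill_table("CG", A, E, INIT, max)   # most likely path ending in each state
F = fill_table("CG", A, E, INIT, sum)   # all paths ending in each state
```

Because a sum of non-negative terms is never smaller than their maximum, every entry of `F` is at least as large as the matching entry of `v`.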
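The Backward Algorithm
Define b[i,j]: the probability that the automaton, starting from state j at step i, emitted $x_{i+1} \ldots x_n$ and ended up in the final state.
(Figure: the emitted sequence split at step i into $x_1 \ldots x_i$ and $x_{i+1} \ldots x_n$.)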
Forward Backward Scoring
$F(i,j) = \sum_{l \in Q} \big( F(i-1,l)\, A[l,j] \big) \cdot e_j(x_i)$
$B(i,j) = \sum_{l \in Q} A[j,l]\, e_l(x_{i+1})\, B(i+1,l)$
$\Pr[x, \pi_i = k] = F(i,k)\, B(i,k)$
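A minimal Python sketch of forward-backward scoring follows. It assumes a hypothetical two-state toy HMM (`INIT`, `A`, `E` are not from the slides), a start distribution in place of an explicit start state, and no explicit final state, so the backward table is initialised to 1 at the last position.

```python
def forward(x, A, e, init):
    """F[i][j]: probability of emitting x[:i+1] and being in state j."""
    states = range(len(A))
    F = [[init[j] * e[j][x[0]] for j in states]]
    for symbol in x[1:]:
        prev = F[-1]
        F.append([sum(prev[l] * A[l][j] for l in states) * e[j][symbol]
                  for j in states])
    return F


def backward(x, A, e):
    """B[i][j]: probability of emitting x[i+1:] given state j at position i."""
    states = range(len(A))
    n = len(x)
    # Assumption: no explicit end state, so B is 1 at the last position.
    B = [[1.0 for _ in states] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        for j in states:
            B[i][j] = sum(A[j][l] * e[l][x[i + 1]] * B[i + 1][l] for l in states)
    return B


def posterior(x, A, e, init):
    """Pr[state k at position i | x] = F[i][k] * B[i][k] / Pr[x]."""
    F, B = forward(x, A, e, init), backward(x, A, e)
    total = sum(F[-1])  # Pr[x], summed over the possible last states
    return [[F[i][k] * B[i][k] / total for k in range(len(A))]
            for i in range(len(x))]


# Hypothetical toy model (all numbers are illustrative assumptions).
INIT = [0.5, 0.5]
A = [[0.9, 0.1],
     [0.2, 0.8]]
E = [
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.10, "C": 0.40, "G": 0.40, "T": 0.10},
]

post = posterior("CGCA", A, E, INIT)
```

Each row of `post` is a distribution over states at one position, so it should sum to 1. That is a handy sanity check on an implementation.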
Application of HMMs
How do we modify this to handle indels?

     1    2    3    4    5    6    7    8
A  0.9  0.4  0.3  0.6  0.1  0.0  0.2  1.0
C  0.0  0.2  0.7  0.0  0.3  0.0  0.0  0.0
G  0.1  0.2  0.0  0.0  0.3  1.0  0.3  0.0
T  0.0  0.2  0.0  0.4  0.3  0.0  0.5  0.0
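Without indels, the profile on this slide can be read as a chain of eight match states, where column j gives the emission probabilities of state j. The sketch below scores a sequence under that reading. The function name `profile_probability` and the test sequence AACAGGTA are illustrative assumptions; the numbers are the ones on the slide.

```python
# Emission probabilities per profile column, copied from the slide.
PROFILE = {
    "A": [0.9, 0.4, 0.3, 0.6, 0.1, 0.0, 0.2, 1.0],
    "C": [0.0, 0.2, 0.7, 0.0, 0.3, 0.0, 0.0, 0.0],
    "G": [0.1, 0.2, 0.0, 0.0, 0.3, 1.0, 0.3, 0.0],
    "T": [0.0, 0.2, 0.0, 0.4, 0.3, 0.0, 0.5, 0.0],
}
PROFILE_LENGTH = 8


def profile_probability(seq):
    """Probability of seq when base i must be emitted by column i (no indels)."""
    if len(seq) != PROFILE_LENGTH:
        raise ValueError("an ungapped profile only scores sequences of length 8")
    p = 1.0
    for column, base in enumerate(seq):
        p *= PROFILE[base][column]
    return p


p_fit = profile_probability("AACAGGTA")    # every base has nonzero probability
p_zero = profile_probability("ACACTGTA")   # C in column 4 has probability 0
```

A single base that is impossible in its column drives the whole product to zero, and sequences of any other length cannot be scored at all. Both limitations are what insertion and deletion states are meant to relax.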
Applications of the HMM paradigm
Modifying Profile HMMs to handle indels:
States Ii: insertion states
States Di: deletion states
(Figure: the 8-column A/C/G/T profile from the previous slide.)
Profile HMMs
An assignment of states implies insertion, match, or deletion.
Example: ACACTGTA
(Figure: the 8-column A/C/G/T profile from the earlier slides, together with the symbol sequence C A A C T G T A.)
Viterbi Algorithm revisited
Define $v^M_j(i)$ as the log-likelihood score of the best path for matching $x_1 \ldots x_i$ to the profile HMM, ending with $x_i$ emitted by the state $M_j$.
$v^I_j(i)$ and $v^D_j(i)$ are defined similarly.
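Viterbi Equations for Profile HMMs
$v^M_j(i) = \log e_{M_j}(x_i) + \max \begin{cases} v^M_{j-1}(i-1) + \log A[M_{j-1}, M_j] \\ v^I_{j-1}(i-1) + \log A[I_{j-1}, M_j] \\ v^D_{j-1}(i-1) + \log A[D_{j-1}, M_j] \end{cases}$
$v^I_j(i) = \log e_{I_j}(x_i) + \max \begin{cases} v^M_j(i-1) + \log A[M_j, I_j] \\ v^I_j(i-1) + \log A[I_j, I_j] \\ v^D_j(i-1) + \log A[D_j, I_j] \end{cases}$
An insertion state $I_j$ is entered from the states of the same column j, so its transition terms are indexed by j rather than j-1.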
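The slide gives the match and insertion cases only. For completeness, here is a sketch of the deletion case, assuming the usual profile-HMM convention that $D_j$ emits no symbol and is entered from column j-1. Because nothing is emitted, there is no emission term and the index i does not advance.

```latex
v^D_j(i) = \max
\begin{cases}
  v^M_{j-1}(i) + \log A[M_{j-1}, D_j] \\
  v^I_{j-1}(i) + \log A[I_{j-1}, D_j] \\
  v^D_{j-1}(i) + \log A[D_{j-1}, D_j]
\end{cases}
```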
Compositional Signals
CpG islands:
In genomic sequence, the CG dinucleotide is rarely seen.
The C in a CG dinucleotide tends to be methylated, and a methylated C subsequently tends to mutate to T.
In regions around a gene, methylation is suppressed, and therefore CG is more common.
CpG islands: islands of CG on the genome.
How can you detect CpG islands?
An HMM for Genomic regions
Node A emits A with probability 1, and every other base with probability 0.
The start and end nodes do not emit any symbol.
All outgoing edges from a node are equiprobable, except for the ones coming out of C.
(Figure: nodes start, A, C, G, T, end; edge labels shown: .25, 0.1, 0.4, .25.)
An HMM for CpG islands
Node A emits A with probability 1, and every other base with probability 0.
The start and end nodes do not emit any symbol.
All outgoing edges from a node are equiprobable, except for the ones coming out of C.
(Figure: nodes start, A, C, G, T, end; edge labels shown: 0.25, 0.25, 0.25.)
HMM for detecting CpG Islands
(Figure: two copies of the previous model, one for each of the state sets A and B, each with nodes start, A, C, G, T, end; edge labels shown: 0.1, 0.4.)
In the best parse of a genomic sequence, each base is assigned a state from set A or set B.
Any substring whose consecutive bases are all assigned states from B can be described as a CpG island.
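The parse described here can be sketched as Viterbi decoding over eight states, one per base in each of the two sets. In this Python sketch, '+' stands for set B (island) and '-' for set A (background). Several details are assumptions, not taken from the slides: the switching probability `P_SWITCH`, the uniform start, the omission of an end state, and the mapping of the labels 0.25, 0.4, 0.1, 0.25 onto the edges from C to A, C, G, T in the background model.

```python
import math

REGIONS = "+-"     # '+' = island (set B), '-' = background (set A)
P_SWITCH = 0.01    # hypothetical probability of changing region between bases


def within(region, prev_base, base):
    """Base-to-base transition probability inside one region."""
    if region == "-" and prev_base == "C":
        # Assumed reading of the background figure: C -> G is rare.
        return {"A": 0.25, "C": 0.4, "G": 0.1, "T": 0.25}[base]
    return 0.25    # all other edges are equiprobable


def log_trans(src_region, prev_base, dst_region, base):
    region_factor = 1 - P_SWITCH if src_region == dst_region else P_SWITCH
    return math.log(region_factor * within(dst_region, prev_base, base))


def decode(seq):
    """Return one '+'/'-' label per base along the most likely state path."""
    if not seq:
        return ""
    # Each state emits only its own base, so at every position just two
    # states are possible: (+, base) and (-, base).
    score = {r: math.log(1 / 8) for r in REGIONS}   # uniform start over 8 states
    back = []
    for prev_base, base in zip(seq, seq[1:]):
        new_score, pointers = {}, {}
        for r in REGIONS:
            best = max(REGIONS,
                       key=lambda p: score[p] + log_trans(p, prev_base, r, base))
            new_score[r] = score[best] + log_trans(best, prev_base, r, base)
            pointers[r] = best
        score = new_score
        back.append(pointers)
    # Trace back from the best final region.
    path = [max(REGIONS, key=lambda r: score[r])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


island_labels = decode("CGCGCGCG")
background_labels = decode("CCCCCCCC")
```

With these assumed numbers, a run of CG dinucleotides is cheaper to explain inside the island model, while a run of C is cheaper in the background model. Maximal runs of '+' in the output are the candidate CpG islands.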
HMM: Summary
HMMs are a natural technique for modeling many biological domains.
They can capture position-dependent as well as compositional properties.
HMMs have been very useful in an important bioinformatics application: gene finding.