Hidden Markov models
Sushmita Roy
BMI/CS
Oct 16th, 2014
Key concepts
– What are hidden Markov models (HMMs)? States, emission characters, parameters
– Three important questions in HMMs and algorithms to solve them:
  – Probability of a sequence of observations: Forward algorithm
  – Most likely path (sequence of states): Viterbi algorithm
  – Parameter estimation: Baum-Welch/Forward-backward algorithm
Revisiting the CpG question
Given a sequence x_1..x_T, we can use two Markov chains to decide whether x_1..x_T is a CpG island or not.
What do we do if we are asked to "find" the CpG islands in the genome? We have to search for these "islands" in a "sea" of non-islands.
A simple HMM for identifying CpG islands
Eight states: A+, C+, G+, T+ (CpG island) and A-, C-, G-, T- (background).
Note that there is no longer a one-to-one correspondence between states and observed symbols.
An HMM for an occasionally dishonest casino

  Fair:   1: 1/6,  2: 1/6,  3: 1/6,  4: 1/6,  5: 1/6,  6: 1/6
  Loaded: 1: 1/10, 2: 1/10, 3: 1/10, 4: 1/10, 5: 1/10, 6: 1/2

What is hidden? Which die is rolled.
What is observed? The number (1-6) on the die.
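The generative process behind this casino can be sketched in Python. The emission probabilities are the ones above; the transition (die-switching) probabilities are illustrative assumptions, since the slide does not give them:

```python
import random

# Occasionally dishonest casino HMM.
# Emission probabilities are from the slide; the transition
# probabilities below are illustrative assumptions.
EMIT = {
    "Fair":   {r: 1 / 6 for r in "123456"},
    "Loaded": {**{r: 1 / 10 for r in "12345"}, "6": 1 / 2},
}
TRANS = {  # assumed switching probabilities
    "Fair":   {"Fair": 0.95, "Loaded": 0.05},
    "Loaded": {"Fair": 0.10, "Loaded": 0.90},
}

def sample(length, start="Fair", seed=0):
    """Generate (hidden state sequence, observed rolls) from the HMM."""
    rng = random.Random(seed)
    states, rolls = [], []
    state = start
    for _ in range(length):
        states.append(state)
        # Emit a symbol from the current state's emission distribution
        roll = rng.choices(list(EMIT[state]),
                           weights=list(EMIT[state].values()))[0]
        rolls.append(roll)
        # Probabilistically transition to the next state
        state = rng.choices(list(TRANS[state]),
                            weights=list(TRANS[state].values()))[0]
    return states, rolls

states, rolls = sample(20)
print("".join(rolls))                  # observed: what the gambler sees
print("".join(s[0] for s in states))   # hidden: F/L, which die was used
```

The gambler only sees the first line of output; inferring the second line from the first is exactly what the algorithms below address.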
What does an HMM do?
Enables us to model observed sequences of characters generated by a hidden dynamic system.
The system can exist in a fixed number of "hidden" states.
The system probabilistically transitions between states, and at each state it emits a symbol/character.
Formally defining an HMM
– States
– Emission alphabet
– Parameters:
  – State transition probabilities: probabilistic transitions from the state at time t to the state at time t+1
  – Emission probabilities: for probabilistically emitting symbols from a state
Notation
– States with emissions are numbered 1 to K (0: begin state, N: end state)
– x_t: observed character at position t
– x = x_1..x_T: observed sequence
– π = π_1..π_T: hidden state sequence, or path
– a_kl: transition probability from state k to state l
– e_k(b): emission probability, the probability of emitting symbol b from state k
An example HMM
[Figure: a four-state HMM with begin and end states. The four emission tables are (A 0.4, C 0.1, G 0.2, T 0.3), (A 0.1, C 0.4, G 0.4, T 0.1), (A 0.2, C 0.3, G 0.3, T 0.2), and (A 0.4, C 0.1, G 0.1, T 0.4). Callouts mark two example parameters: the probability of emitting character A in state 2, and the probability of a transition from state 1 to another state.]
Path notation
[Figure: the example four-state HMM again, annotated with a path through its states for an observed sequence.]
Three important questions in HMMs
– How likely is an HMM to have generated a given sequence? Forward algorithm
– What is the most likely "path" (sequence of states) for generating a sequence of observations? Viterbi algorithm
– How can we learn an HMM from a set of sequences? Forward-backward or Baum-Welch (an EM algorithm)
How likely is a given sequence from an HMM?
The joint probability of an observed sequence x and a path π factors as:

  P(x, π) = a_{0 π_1} ∏_{t=1..T} e_{π_t}(x_t) a_{π_t π_{t+1}}

Here a_{0 π_1} is the initial transition from the begin state, e_{π_t}(x_t) is the probability of emitting symbol x_t, and a_{π_t π_{t+1}} is the state transition between consecutive time points.
How likely is a given sequence from an HMM?
But we don't know what the path is, so we need to sum over all paths. The probability over all paths is:

  P(x) = Σ_π P(x, π)
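These two quantities can be computed directly, by brute force, for a small HMM. The sketch below uses a two-state HMM whose parameters (initial, transition, and emission probabilities) are illustrative assumptions, not the lecture's exact example; end-state transitions are taken as 1 for simplicity:

```python
from itertools import product

# Toy 2-state HMM; the parameters are illustrative assumptions.
STATES = ["1", "2"]
INIT  = {"1": 0.5, "2": 0.5}                                    # a_{0k}
TRANS = {"1": {"1": 0.8, "2": 0.2}, "2": {"1": 0.4, "2": 0.6}}  # a_{kl}
EMIT  = {"1": {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
         "2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}         # e_k(b)

def joint(x, path):
    """P(x, pi) = a_{0,pi_1} * prod_t e_{pi_t}(x_t) * a_{pi_{t-1},pi_t}."""
    p = INIT[path[0]] * EMIT[path[0]][x[0]]
    for t in range(1, len(x)):
        p *= TRANS[path[t - 1]][path[t]] * EMIT[path[t]][x[t]]
    return p

def prob(x):
    """P(x) = sum of P(x, pi) over all K^T paths -- exponential in T,
    for illustration only."""
    return sum(joint(x, pi) for pi in product(STATES, repeat=len(x)))

print(prob("TAGA"))
```

This brute-force sum visits every path explicitly, which is exactly the exponential blow-up the Forward algorithm avoids.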
Example
Consider a candidate CpG island: CGCGC
Given our CpG-island HMM, some possible paths consistent with this sequence are:
  C+G+C+G+C+
  C-G-C-G-C-
  C-G+C-G+C-
Number of paths
[Figure: a two-state HMM with begin and end states; emission tables (A 0.4, C 0.1, G 0.1, T 0.4) and (A 0.4, C 0.1, G 0.2, T 0.3).]
For a sequence of length T, how many possible paths through this two-state HMM are there? 2^T.
The Forward algorithm enables us to compute the probability of a sequence by efficiently summing over all possible paths.
How likely is a given sequence: Forward algorithm
Define f_k(t) as the probability of observing x_1..x_t and ending in state k at time t:

  f_k(t) = P(x_1..x_t, π_t = k)

This can be written recursively as follows:

  f_l(t) = e_l(x_t) Σ_k f_k(t-1) a_kl
Steps of the Forward algorithm
Initialization: f_0(0) = 1, f_k(0) = 0 for k > 0 (0 denotes the "begin" state)
Recursion: for t = 1 to T:

  f_l(t) = e_l(x_t) Σ_k f_k(t-1) a_kl

Termination:

  P(x) = Σ_k f_k(T) a_kN

where N denotes the end state.
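The steps above can be sketched directly in Python. The two-state parameters are illustrative assumptions (not the lecture's example), and end-state transitions a_kN are taken as 1 for simplicity:

```python
# Toy 2-state HMM; the parameters are illustrative assumptions.
STATES = ["1", "2"]
INIT  = {"1": 0.5, "2": 0.5}                                    # a_{0k}
TRANS = {"1": {"1": 0.8, "2": 0.2}, "2": {"1": 0.4, "2": 0.6}}  # a_{kl}
EMIT  = {"1": {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
         "2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}         # e_k(b)

def forward(x):
    """P(x) by the Forward algorithm: O(K^2 T) instead of O(K^T)."""
    # Initialization: f_k(1) = a_{0k} e_k(x_1)
    f = {k: INIT[k] * EMIT[k][x[0]] for k in STATES}
    # Recursion: f_l(t) = e_l(x_t) * sum_k f_k(t-1) a_{kl}
    for c in x[1:]:
        f = {l: EMIT[l][c] * sum(f[k] * TRANS[k][l] for k in STATES)
             for l in STATES}
    # Termination: sum over the final state (end-state transitions
    # are taken as 1 here)
    return sum(f.values())

print(forward("TAGA"))
```

Note that the recursion keeps only the previous column of f values, so memory is O(K) even though the full dynamic-programming table has K x T entries.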
Forward algorithm example
[Figure: the example four-state HMM with begin and end states and emission tables (A 0.4, C 0.1, G 0.2, T 0.3), (A 0.1, C 0.4, G 0.4, T 0.1), (A 0.4, C 0.1, G 0.1, T 0.4), (A 0.2, C 0.3, G 0.3, T 0.2).]
What is the probability of sequence TAGA?
In class exercise
Table for TAGA
[Table: forward values f_k(t) for the sequence TAGA, with one row per state and columns t = 1..4. The entry for the end state is also P(x); it does not require f_1(4) and f_2(4).]
Three important questions in HMMs
– How likely is an HMM to have generated a given sequence? Forward algorithm
– What is the most likely "path" (sequence of states) for generating a sequence of observations? Viterbi algorithm
– How can we learn an HMM from a set of sequences? Forward-backward or Baum-Welch (an EM algorithm)
Viterbi algorithm
The Viterbi algorithm gives an efficient way to find the most likely sequence of states.
Consider the dishonest casino example:
– Given a sequence of dice rolls, can you infer when the casino was using the loaded versus the fair die?
– The Viterbi algorithm gives this answer.
Viterbi is very similar to the Forward algorithm, except that instead of summing we maximize.
Notation for Viterbi
– v_k(t): probability of the most likely path for x_1..x_t ending in state k
– ptr_t(k): pointer to the state that gave the maximizing transition
– π*: the most probable path for the sequence x_1..x_T
Steps of the Viterbi algorithm
Initialization: v_0(0) = 1, v_k(0) = 0 for k > 0
Recursion: for t = 1 to T:

  v_l(t) = e_l(x_t) max_k v_k(t-1) a_kl
  ptr_t(l) = argmax_k v_k(t-1) a_kl

Termination: the probability associated with the most likely path is

  P(x, π*) = max_k v_k(T) a_kN
Traceback in Viterbi

  π*_T = argmax_k v_k(T) a_kN
  for t = T down to 2: π*_{t-1} = ptr_t(π*_t)
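The recursion, termination, and traceback can be sketched together. As before, the two-state parameters are illustrative assumptions (not the lecture's example), and end-state transitions are taken as 1:

```python
# Toy 2-state HMM; the parameters are illustrative assumptions.
STATES = ["1", "2"]
INIT  = {"1": 0.5, "2": 0.5}                                    # a_{0k}
TRANS = {"1": {"1": 0.8, "2": 0.2}, "2": {"1": 0.4, "2": 0.6}}  # a_{kl}
EMIT  = {"1": {"A": 0.4, "C": 0.1, "G": 0.2, "T": 0.3},
         "2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}         # e_k(b)

def viterbi(x):
    """Most likely path: like Forward, but max replaces sum, and
    back-pointers record which state gave the maximizing transition."""
    # Initialization (t = 1): v_k(1) = a_{0k} e_k(x_1)
    v = {k: INIT[k] * EMIT[k][x[0]] for k in STATES}
    ptrs = []
    # Recursion: v_l(t) = e_l(x_t) max_k v_k(t-1) a_{kl}
    for c in x[1:]:
        ptr, v_new = {}, {}
        for l in STATES:
            best = max(STATES, key=lambda k: v[k] * TRANS[k][l])
            ptr[l] = best                       # ptr_t(l) = argmax_k ...
            v_new[l] = EMIT[l][c] * v[best] * TRANS[best][l]
        ptrs.append(ptr)
        v = v_new
    # Termination and traceback
    last = max(STATES, key=lambda k: v[k])
    path = [last]
    for ptr in reversed(ptrs):
        path.append(ptr[path[-1]])
    return v[last], "".join(reversed(path))

prob, path = viterbi("TAG")
print(path, prob)
```

Only the previous column of v values is needed during the recursion, but the back-pointers for every position must be kept so the traceback can recover the full path.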
Viterbi algorithm example
[Figure: the example four-state HMM with begin and end states and emission tables (A 0.4, C 0.1, G 0.2, T 0.3), (A 0.1, C 0.4, G 0.4, T 0.1), (A 0.4, C 0.1, G 0.1, T 0.4), (A 0.2, C 0.3, G 0.3, T 0.2).]
What is the most likely path for TAG?
In class exercise
Viterbi computations for TAG
[Table: Viterbi values v_k(t) and pointers ptr_t(k) for the sequence TAG, with one row per state and columns t = 1..3; the traceback path is highlighted.]
Using an HMM to detect CpG islands
– Recall the 8-state HMM for CpG islands
– Apply the Viterbi algorithm to a DNA sequence on this HMM
– Contiguous assignments of '+' states will correspond to CpG islands
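Once Viterbi has produced a state path, reading off the islands is a matter of finding the contiguous '+' runs. A minimal sketch, assuming the path has been collapsed to one '+' or '-' character per position (the function name `cpg_islands` is a hypothetical helper, not from the lecture):

```python
import re

def cpg_islands(path):
    """Given a Viterbi state path over the 8-state CpG HMM, collapsed to
    '+' (A+/C+/G+/T+ states) or '-' (background states) per position,
    return the contiguous '+' stretches as 0-based half-open intervals."""
    return [(m.start(), m.end()) for m in re.finditer(r"\++", path)]

print(cpg_islands("---++++---+++--"))  # → [(3, 7), (10, 13)]
```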
Summary
– Hidden Markov models are extensions of Markov chains, enabling us to model and segment sequence data
– HMMs are defined by a set of states and emission characters, transition probabilities, and emission probabilities
– We have examined two questions for HMMs:
  – Computing the probability of a sequence of observed characters given an HMM (Forward algorithm)
  – Computing the most likely sequence of states (or path) for a sequence of observed characters (Viterbi algorithm)