
1 Lecture 8: Hidden Markov Models (HMMs) Prepared by Michael Gutkin and Shlomi Haba. Originally presented at Yaakov Stein’s DSPCSP Seminar, spring 2002. Modified by Benny Chor, using also some slides of Nir Friedman (Hebrew Univ.), for the Computational Genomics Course, Tel-Aviv Univ., Dec. 2002.

2 Outline
Discrete Markov Models
Hidden Markov Models
Three major questions:
Q1. Computing the probability of a given observation. A1. The Forward-Backward (Baum-Welch) DP algorithm.
Q2. Computing the most probable sequence of states, given an observation. A2. The Viterbi DP algorithm.
Q3. Given an observation, learning the best model. A3. Expectation Maximization (EM): a heuristic.

3 Markov Models A discrete (finite) system: N distinct states. The system begins (at time t=1) in some initial state. At each time step (t=1,2,…) the system moves from the current state to the next state (possibly the same as the current state) according to the transition probabilities associated with the current state. This kind of system is called a Discrete Markov Model.

4 Discrete Markov Model Example: a Discrete Markov Model with 5 states. Each a_ij represents the probability of moving from state i to state j. The a_ij are given in a matrix A = {a_ij}. The probability of starting in a given state i is π_i; the vector π holds these start probabilities.
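To make the definitions concrete, here is a minimal sketch (not from the slides) of sampling a state sequence from a discrete Markov model given a transition matrix A and a start vector π. The 3-state weather labels and the numeric values are illustrative assumptions only.

```python
import numpy as np

# Hypothetical 3-state weather model (Rainy=0, Cloudy=1, Sunny=2);
# the numbers are assumptions, not the values used on the slides.
A  = np.array([[0.4, 0.3, 0.3],    # a_ij: probability of moving from state i to state j
               [0.2, 0.6, 0.2],
               [0.1, 0.1, 0.8]])
pi = np.array([1/3, 1/3, 1/3])     # pi_i: probability of starting in state i

def sample_chain(A, pi, T, rng=np.random.default_rng(0)):
    """Sample a length-T state sequence from a discrete Markov model."""
    states = [rng.choice(len(pi), p=pi)]          # initial state ~ pi
    for _ in range(T - 1):
        states.append(rng.choice(len(pi), p=A[states[-1]]))   # next state ~ row of A
    return states

print(sample_chain(A, pi, T=7))    # a list of 7 state indices
```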

5 Types of Models Ergodic model: strongly connected – there is a directed path with positive transition probabilities from each state i to each state j (but the transition graph is not necessarily a complete directed graph).

6 Types of Models (cont.) Left-to-Right (LR) model: the index of the state is non-decreasing with time.

7 Discrete Markov Model – Example States – Rainy:1, Cloudy:2, Sunny:3. The transition matrix A is given on the slide. Problem – given that the weather on day 1 (t=1) is sunny (3), what is the probability of the observation sequence O shown on the slide?

8 Discrete Markov Model – Example (cont.) The answer is the product of the transition probabilities along the observed sequence of states: P(O|M) = a_{O_1,O_2} · a_{O_2,O_3} · … · a_{O_{T-1},O_T}, with O_1 = 3 (sunny) given. The numeric value is worked out on the slide.
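As a hedged illustration of this computation (the slide’s actual matrix and observation sequence are only in the figure), the following sketch multiplies the transition probabilities along a fully observed state sequence; the matrix and the sequence below are hypothetical.

```python
import numpy as np

A = np.array([[0.4, 0.3, 0.3],      # illustrative transition matrix;
              [0.2, 0.6, 0.2],      # the slide's actual matrix is in the figure
              [0.1, 0.1, 0.8]])

def sequence_probability(A, states, p_start=1.0):
    """P(O|M) for a fully observed state sequence, given the first state's probability."""
    p = p_start
    for i, j in zip(states, states[1:]):
        p *= A[i, j]                # multiply the transition probability for each step
    return p

# Weather over 5 days, starting from Sunny (index 2) with probability 1:
O = [2, 2, 0, 0, 1]                 # hypothetical observation, not the slide's
print(sequence_probability(A, O))   # 0.8 * 0.1 * 0.4 * 0.3 = 0.0096
```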

9 Hidden Markov Models (probabilistic finite state automata) Often we face scenarios where states cannot be directly observed; we need an extension: Hidden Markov Models. (The slide shows a 4-state left-to-right model with transitions a_11, a_12, a_22, a_23, a_33, a_34, a_44 and outputs b_11, …, b_14 over the observed phenomenon.) The a_ij are state transition probabilities; the b_ik are observation (output) probabilities. b_11 + b_12 + b_13 + b_14 = 1, b_21 + b_22 + b_23 + b_24 = 1, etc.
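A minimal, hedged way to hold these parameters in code: a transition matrix A, an output matrix B whose rows sum to 1 as stated above, and a start vector π. The 2-state, 4-symbol model below is an assumption for illustration only.

```python
import numpy as np

# Illustrative 2-state HMM emitting one of 4 observation symbols
# (the numbers are assumptions, not from the slides):
A  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
B  = np.array([[0.7, 0.1, 0.1, 0.1],   # b_ik: P(symbol k | state i)
               [0.1, 0.2, 0.3, 0.4]])
pi = np.array([0.5, 0.5])

# Each row of A, each row of B, and pi itself must be a probability distribution:
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)   # b_i1 + b_i2 + b_i3 + b_i4 = 1 for every state i
assert np.isclose(pi.sum(), 1.0)
```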

10 Example: Dishonest Casino The casino occasionally switches between a fair die and a loaded die, while the observer sees only the sequence of rolls. Actually, what is hidden in this model?

11 Biological Example: CpG islands In the human genome, CpG dinucleotides are relatively rare. CpG pairs undergo a process called methylation that modifies the C nucleotide. A methylated C can (with relatively high probability) mutate to a T. Promoter regions are CpG rich; these regions are not methylated, and thus mutate less often. They are called CpG islands.

12 CpG Islands We construct two Markov chains: one for CpG-rich regions and one for CpG-poor regions. Using observations from about 60K nucleotides, we estimate the two models, + and -.
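A hedged sketch of how the two chains can be estimated: count dinucleotide transitions separately in CpG-rich and CpG-poor regions and normalize each row. The training strings below are tiny placeholders, not the ~60K-nucleotide data behind the slide.

```python
import numpy as np

BASES = "ACGT"
IDX = {b: i for i, b in enumerate(BASES)}

def estimate_chain(seqs, pseudocount=1.0):
    """Maximum-likelihood transition matrix of a first-order Markov chain over A,C,G,T."""
    counts = np.full((4, 4), pseudocount)        # Laplace smoothing avoids empty rows
    for s in seqs:
        for x, y in zip(s, s[1:]):
            counts[IDX[x], IDX[y]] += 1          # count each observed dinucleotide x -> y
    return counts / counts.sum(axis=1, keepdims=True)

# Placeholder training data (the real + and - models use labeled genomic regions):
plus_model  = estimate_chain(["CGCGGCGCCGCGAACGCG"])   # CpG-rich regions
minus_model = estimate_chain(["ATTATGTCATAATGCATT"])   # CpG-poor regions
print(plus_model[IDX["C"], IDX["G"]], minus_model[IDX["C"], IDX["G"]])   # P(G | C) in each model
```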

13 HMMs – Question I Given an observation sequence O = (O_1 O_2 O_3 … O_T) and a model M = {A, B, π}, how do we efficiently compute P(O|M), the probability that the given model M produces the observation O in a run of length T? This probability can be viewed as a measure of the quality of the model M; viewed this way, it enables discrimination/selection among alternative models.

14 HMM – Question II (Harder) Given an observation sequence O = (O_1 O_2 O_3 … O_T) and a model M = {A, B, π}, how do we efficiently compute the most probable sequence(s) of states Q? That is, the sequence of states Q = (Q_1 Q_2 Q_3 … Q_T) which maximizes P(O,Q|M), the probability that the given model M goes through the specific sequence of states Q and produces the given observation O. Recall that given a model M, a sequence of observations O, and a sequence of states Q, we can efficiently compute P(O|Q,M) (we should watch out for numeric underflow).

15 HMM – Question III (Hardest) Given an observation sequence O = (O_1 O_2 O_3 … O_T) and a class of models, each of the form M = {A, B, π}, which specific model “best” explains the observations? A solution to Question I enables the efficient computation of P(O|M) (the probability that a specific model M produces the observation O). Question III can be viewed as a learning problem: we want to use the sequence of observations in order to “train” an HMM and learn the optimal underlying model parameters (transition and output probabilities).

16 HMM Recognition (Question I) For a given model M = {A, B, π} and a given state sequence Q_1 Q_2 Q_3 … Q_T, the probability of an observation sequence O_1 O_2 O_3 … O_T is
P(O|Q,M) = b_{Q_1 O_1} b_{Q_2 O_2} b_{Q_3 O_3} … b_{Q_T O_T}
For a given hidden Markov model M = {A, B, π}, the probability of the state sequence Q_1 Q_2 Q_3 … Q_T is (the initial probability of Q_1 is taken to be π_{Q_1})
P(Q|M) = π_{Q_1} a_{Q_1 Q_2} a_{Q_2 Q_3} a_{Q_3 Q_4} … a_{Q_{T-1} Q_T}
So, for a given hidden Markov model M, the probability of an observation sequence O_1 O_2 O_3 … O_T is obtained by summing over all possible state sequences.

17 HMM – Recognition (cont.)
P(O|M) = Σ_Q P(O|Q,M) P(Q|M) = Σ_Q π_{Q_1} b_{Q_1 O_1} a_{Q_1 Q_2} b_{Q_2 O_2} a_{Q_2 Q_3} b_{Q_3 O_3} …
This requires summing over exponentially many paths, but it can be made more efficient.
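The summation above can be spelled out directly for small models. This hedged sketch enumerates all Q^T state sequences to compute P(O|M) – exactly the exponential computation the next slide complains about, and the one the forward algorithm replaces. The model values are assumptions for illustration.

```python
from itertools import product
import numpy as np

def brute_force_likelihood(A, B, pi, obs):
    """P(O|M) = sum over all state paths Q of P(O|Q,M) * P(Q|M)."""
    N, T = len(pi), len(obs)
    total = 0.0
    for Q in product(range(N), repeat=T):          # all Q^T candidate state paths
        p = pi[Q[0]] * B[Q[0], obs[0]]             # pi_{Q1} * b_{Q1,O1}
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], obs[t]] # a_{Q(t-1),Qt} * b_{Qt,Ot}
        total += p
    return total

# Tiny illustrative model; observation symbols are column indices into B:
A  = np.array([[0.9, 0.1], [0.2, 0.8]])
B  = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.5, 0.5])
print(brute_force_likelihood(A, B, pi, obs=[0, 1, 0]))
```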

18 HMM – Recognition (cont.) Why isn’t it efficient? – O(2T·Q^T). For a given state sequence of length T we have about 2T calculations:
P(Q|M) = π_{Q_1} a_{Q_1 Q_2} a_{Q_2 Q_3} a_{Q_3 Q_4} … a_{Q_{T-1} Q_T}
P(O|Q,M) = b_{Q_1 O_1} b_{Q_2 O_2} b_{Q_3 O_3} … b_{Q_T O_T}
There are Q^T possible state sequences. So, if Q=5 and T=100, the algorithm requires 2·100·5^100 ≈ 1.6·10^72 computations. We can use the forward-backward (F-B) algorithm instead.
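A quick sanity check on this count in log space:

```python
import math
# 2 * T * Q**T operations for Q = 5 states and T = 100 time steps:
print(math.log10(2 * 100) + 100 * math.log10(5))   # ~= 72.2, i.e. about 1.6 * 10**72
```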

19 The F-B Algorithm Some definitions:
1. Legal final state – a state at which a path through the model may end.
2. α_t(i) – a “forward-going” probability: the probability of having observed O_1…O_t and being in state i at time t.
3. β_t(i) – a “backward-going” probability: the probability of observing O_{t+1}…O_T given that the system is in state i at time t.
4. a(j|i) = a_ij; b(O|i) = b_{iO}.
5. O_1^t – the observations O_1 O_2 … O_t at times 1,2,…,t (O_1 at t=1, O_2 at t=2, etc.).

20 The F-B Algorithm (cont.) α can be calculated recursively. The slide gives the stopping (initialization) condition, the term for moving from state i to state j, and, since we can enter state j from all other states, the sum over them.
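Since the recursion itself appears on the slide only as a figure, here is the standard textbook form it takes, consistent with the definitions of slide 19 (a reconstruction, not a transcription of the figure):

```latex
\alpha_1(i) = \pi_i \, b_i(O_1),
\qquad
\alpha_{t+1}(j) = \Bigl[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\Bigr]\, b_j(O_{t+1}),
\qquad t = 1,\dots,T-1
```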

21 The F-B Algorithm (cont.) Now we can work sequentially through t = 1, 2, …, T, and at time t=T we get what we wanted: P(O|M), by summing α_T(i) over the legal final states.
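A hedged sketch of the forward pass just described: compute the α column for each time step and sum the final column (here, for simplicity, all states are treated as legal final states). It reuses the illustrative model from the brute-force sketch above and should return the same value in O(T·N^2) time instead of O(T·N^T).

```python
import numpy as np

def forward_likelihood(A, B, pi, obs):
    """P(O|M) via the forward recursion."""
    alpha = pi * B[:, obs[0]]              # alpha_1(i) = pi_i * b_i(O_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]      # alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1})
    return alpha.sum()                     # sum over (legal final) states at t = T

A  = np.array([[0.9, 0.1], [0.2, 0.8]])    # same illustrative model as above
B  = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.5, 0.5])
print(forward_likelihood(A, B, pi, obs=[0, 1, 0]))   # matches the brute-force value
```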

22 The F-B Algorithm (cont.) The full algorithm – Run Demo

23 The F-B Algorithm (cont.) Here the likelihood is measured over every sequence of states of length T; this is known as the “Any Path” method. Alternatively, we can score an HMM by the probability generated along the best possible sequence of states; we refer to this as the “Best Path” method.

24 Most Probable State Sequence (Question II) Idea: if we know the value of Q_i, then the most probable sequence of states at times i+1, …, T does not depend on the observations before time i. Let V_l(i) be the probability of the best state sequence Q_1, …, Q_i such that Q_i = l.

25 Viterbi Algorithm A DP problem on a grid: X – frame index t (time); Q – state index i. Constraints: every path must advance in time by one, and only one, time step per path segment; final grid points on any path must be of the form (T, i_f), where i_f is a legal final state in the model.

26 Viterbi Algorithm (cont.) Cost: node (t,i) – the probability of emitting the observation y(t) in state i, b_{i,y(t)}; transition from (t-1,i) to (t,j) – the probability of changing state from i to j, a_ij. The total cost associated with a path is the product of its costs (type B). Initial transition cost: a_0i = π_i. Goal: the best path is the one of maximum cost.

27 Viterbi Algorithm (cont.) We can use the trick of taking negative logarithms: multiplications of probabilities are expensive and numerically problematic (underflow), whereas sums of numerically stable numbers are simpler. The problem is thereby turned into a minimal-cost path search.
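A hedged sketch of the Viterbi search in log space (maximizing log-probability, equivalently minimizing the negative-log cost described above); backpointers recover the best state sequence. The model values are again illustrative assumptions.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Most probable state sequence via DP over the time-by-state grid, in log space."""
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    T, N = len(obs), len(pi)
    cost = logpi + logB[:, obs[0]]              # log prob of the best path ending in each state at t=1
    back = np.zeros((T, N), dtype=int)          # backpointers
    for t in range(1, T):
        scores = cost[:, None] + logA           # scores[i, j]: best path to i, then transition i -> j
        back[t] = scores.argmax(axis=0)
        cost = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(cost.argmax())]                 # best final state
    for t in range(T - 1, 0, -1):               # trace the backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

A  = np.array([[0.9, 0.1], [0.2, 0.8]])
B  = np.array([[0.7, 0.3], [0.4, 0.6]])
pi = np.array([0.5, 0.5])
print(viterbi(A, B, pi, obs=[0, 1, 1, 0]))      # the most probable state indices
```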

28 Viterbi Algorithm (cont.) Run Demo

29 HMM – EM Training Training uses the Baum-Welch algorithm, which is an EM algorithm: Estimate – approximate the result; Maximize – and, if needed, re-estimate. The estimation step is based on the DP algorithms (F-B & Viterbi).

30 HMM – EM Training (cont.) Initializing: begin with an arbitrary model M. Estimate: evaluate the likelihood P(O|M); along the way, keep track of some tallies and recalculate the matrices A and B, e.g., a_ij = (number of transitions from i to j) / (number of transitions exiting state i). Maximize: if P(O|M') − P(O|M) ≥ ε, re-estimate with M = M'. Use several initial models to find a favorable local maximum of P(O|M).
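The full Baum-Welch update uses expected counts computed from the forward-backward quantities. As a simpler, hedged illustration of the same estimate-maximize loop, here is “Viterbi training” (hard EM), which replaces expected counts by counts along the single best path; it is a simplification, not the exact algorithm of the slide. It reuses the viterbi() sketch above.

```python
import numpy as np

def viterbi_training(A, B, pi, obs, iters=10, eps=1e-6):
    """Hard-EM sketch: decode with Viterbi, then re-estimate A and B from path counts."""
    for _ in range(iters):
        path = viterbi(A, B, pi, obs)                    # E-step: hard state assignment
        A_cnt = np.full_like(A, eps)                     # pseudocounts avoid zero rows
        B_cnt = np.full_like(B, eps)
        for t in range(len(obs)):
            B_cnt[path[t], obs[t]] += 1                  # emissions observed in each state
            if t > 0:
                A_cnt[path[t-1], path[t]] += 1           # transitions i -> j along the path
        # M-step: a_ij = (# transitions from i to j) / (# transitions exiting state i)
        A = A_cnt / A_cnt.sum(axis=1, keepdims=True)
        B = B_cnt / B_cnt.sum(axis=1, keepdims=True)
    return A, B

A0  = np.array([[0.9, 0.1], [0.2, 0.8]])                 # arbitrary initial model M
B0  = np.array([[0.7, 0.3], [0.4, 0.6]])
pi0 = np.array([0.5, 0.5])
A1, B1 = viterbi_training(A0, B0, pi0, obs=[0, 1, 1, 0, 0, 1])
```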

31 HMM – Training (cont.) Why a local maximum?

32 Auxiliary Physiology Model

33 Auxiliary (cont.) Articulation

34 Auxiliary (cont.) Spectrogram; Patterson - Barney Diagram; mapping by the formants

