Hidden Markov Models
[Figure: HMM trellis with states 1, 2, …, K at each position, emitting symbols x1 x2 x3 … xK]
Generating a sequence by the model
Given an HMM, we can generate a sequence of length n as follows:
1. Start at state π1 according to prob a0π1
2. Emit letter x1 according to prob eπ1(x1)
3. Go to state π2 according to prob aπ1π2
4. … until emitting xn
[Figure: trellis of states 1…K at each position, with an example path highlighted (transition a02, emission e2(x1)), emitting x1 x2 x3 … xn]
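To make the generation procedure concrete, here is a small Python sketch; the two-state parameter values below are invented for illustration only and are not part of the slides.

```python
# A minimal sketch of sampling from an HMM (hypothetical two-state example;
# the state names and transition/emission values are illustrative assumptions).
import random

states = ["1", "2"]
a0 = {"1": 0.5, "2": 0.5}                            # initial probs a_0k
a = {"1": {"1": 0.9, "2": 0.1},                      # transition probs a_kl
     "2": {"1": 0.2, "2": 0.8}}
e = {"1": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},  # emission probs e_k(b)
     "2": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}}

def sample(dist):
    """Draw a key from a {outcome: probability} dict."""
    r, acc = random.random(), 0.0
    for k, p in dist.items():
        acc += p
        if r < acc:
            return k
    return k

def generate(n):
    """Generate (path, sequence) of length n: pick pi_1 ~ a_0, emit, transition, repeat."""
    pi, x = [], []
    state = sample(a0)
    for _ in range(n):
        pi.append(state)
        x.append(sample(e[state]))
        state = sample(a[state])
    return pi, x

print(generate(10))
```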
Evaluation
We will develop algorithms that allow us to compute:
P(x): probability of x given the model
P(xi…xj): probability of a substring of x given the model
P(πi = k | x): "posterior" probability that the ith state is k, given x
The last is a more refined measure of which states x may be in
The Forward Algorithm
fk(i) = P(x1…xi, πi = k)   (the forward probability)
Initialization: f0(0) = 1; fk(0) = 0, for all k > 0
Iteration: fk(i) = ek(xi) Σl fl(i – 1) alk
Termination: P(x) = Σk fk(N)
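A minimal sketch of this forward pass in Python; the parameter layout (a0, a, e, states) follows the toy example above and is an assumption, not part of the slides.

```python
# Forward algorithm sketch: f[i][k] = P(x_1..x_i, pi_i = k).
def forward(x, states, a0, a, e):
    N = len(x)
    f = [{k: 0.0 for k in states} for _ in range(N + 1)]
    for k in states:                       # position 1 handled via the start probs a_0k
        f[1][k] = a0[k] * e[k][x[0]]
    for i in range(2, N + 1):
        for k in states:
            f[i][k] = e[k][x[i - 1]] * sum(f[i - 1][l] * a[l][k] for l in states)
    px = sum(f[N][k] for k in states)      # termination: P(x) = sum_k f_k(N)
    return f, px
```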
Motivation for the Backward Algorithm
We want to compute P(πi = k | x), the probability distribution on the ith position, given x
We start by computing
P(πi = k, x) = P(x1…xi, πi = k, xi+1…xN)
             = P(x1…xi, πi = k) P(xi+1…xN | x1…xi, πi = k)
             = P(x1…xi, πi = k) P(xi+1…xN | πi = k)
Then, P(πi = k | x) = P(πi = k, x) / P(x)
The first factor is the forward probability, fk(i); the second is the backward probability, bk(i)
The Backward Algorithm – derivation
Define the backward probability:
bk(i) = P(xi+1…xN | πi = k)   "starting from the ith state = k, generate the rest of x"
      = Σπi+1…πN P(xi+1, xi+2, …, xN, πi+1, …, πN | πi = k)
      = Σl Σπi+2…πN P(xi+1, xi+2, …, xN, πi+1 = l, πi+2, …, πN | πi = k)
      = Σl el(xi+1) akl Σπi+2…πN P(xi+2, …, xN, πi+2, …, πN | πi+1 = l)
      = Σl el(xi+1) akl bl(i+1)
The Backward Algorithm
We can compute bk(i) for all k, i, using dynamic programming
Initialization: bk(N) = 1, for all k
Iteration: bk(i) = Σl el(xi+1) akl bl(i+1)
Termination: P(x) = Σl a0l el(x1) bl(1)
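A matching backward-pass sketch, with the same assumed toy parameterization as before.

```python
# Backward algorithm sketch: b[i][k] = P(x_{i+1}..x_N | pi_i = k).
def backward(x, states, a0, a, e):
    N = len(x)
    b = [{k: 0.0 for k in states} for _ in range(N + 1)]
    for k in states:
        b[N][k] = 1.0                                       # initialization: b_k(N) = 1
    for i in range(N - 1, 0, -1):
        for k in states:
            b[i][k] = sum(e[l][x[i]] * a[k][l] * b[i + 1][l] for l in states)
    px = sum(a0[l] * e[l][x[0]] * b[1][l] for l in states)  # P(x) = sum_l a_0l e_l(x_1) b_l(1)
    return b, px
```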
Computational Complexity
What is the running time and space required for Forward and Backward?
Time: O(K²N)
Space: O(KN)
Useful implementation techniques to avoid underflows:
Viterbi: sums of logs
Forward/Backward: rescaling every few positions by multiplying by a constant
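As an illustration of the rescaling idea (a sketch, not a prescribed implementation), the forward pass below divides each column by its sum and accumulates the logs of those constants, returning log P(x). Here the rescaling is done at every position rather than every few positions.

```python
# Forward pass with per-position rescaling to avoid underflow; returns log P(x).
import math

def forward_scaled(x, states, a0, a, e):
    log_px = 0.0
    prev = None
    for i, xi in enumerate(x):
        if i == 0:
            col = {k: a0[k] * e[k][xi] for k in states}
        else:
            col = {k: e[k][xi] * sum(prev[l] * a[l][k] for l in states) for k in states}
        s = sum(col.values())                  # rescaling constant for this column
        prev = {k: v / s for k, v in col.items()}
        log_px += math.log(s)                  # accumulate logs of the constants
    return log_px
```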
Posterior Decoding
We can now calculate
P(πi = k | x) = P(πi = k, x) / P(x)
              = P(x1, …, xi, πi = k, xi+1, …, xN) / P(x)
              = P(x1, …, xi, πi = k) P(xi+1, …, xN | πi = k) / P(x)
              = fk(i) bk(i) / P(x)
Then, we can ask: what is the most likely state at position i of sequence x?
Define π̂ by Posterior Decoding: π̂i = argmaxk P(πi = k | x)
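Combining the forward and backward sketches above gives a posterior-decoding sketch: fk(i) bk(i) / P(x) is computed at each position and the argmax state is taken.

```python
# Posterior decoding sketch, reusing the forward() and backward() sketches above.
def posterior_decode(x, states, a0, a, e):
    f, px = forward(x, states, a0, a, e)
    b, _ = backward(x, states, a0, a, e)
    path = []
    for i in range(1, len(x) + 1):
        post = {k: f[i][k] * b[i][k] / px for k in states}  # P(pi_i = k | x)
        path.append(max(post, key=post.get))                # hat-pi_i = argmax_k
    return path
```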
Posterior Decoding
For each state k, Posterior Decoding gives us a curve of the likelihood P(πi = k | x) along the positions of x
That is sometimes more informative than the Viterbi path π*
Note: Posterior Decoding may give an invalid sequence of states (of probability 0). Why?
Posterior Decoding
P(πi = k | x) = Σπ P(π | x) 1(πi = k) = Σ{π : πi = k} P(π | x)
where 1(ψ) = 1 if ψ is true, 0 otherwise
[Figure: trellis over x1 x2 x3 … xN, with states 1, …, l, …, k and the posterior P(πi = l | x) marked at position i]
Viterbi, Forward, Backward
VITERBI
Initialization: V0(0) = 1; Vk(0) = 0, for all k > 0
Iteration: Vl(i) = el(xi) maxk Vk(i – 1) akl
Termination: P(x, π*) = maxk Vk(N)
FORWARD
Initialization: f0(0) = 1; fk(0) = 0, for all k > 0
Iteration: fl(i) = el(xi) Σk fk(i – 1) akl
Termination: P(x) = Σk fk(N)
BACKWARD
Initialization: bk(N) = 1, for all k
Iteration: bl(i) = Σk ek(xi+1) alk bk(i+1)
Termination: P(x) = Σk a0k ek(x1) bk(1)
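For concreteness, here is a small Viterbi sketch in probability space (in practice one would use sums of logs, as noted earlier), reusing the toy parameterization assumed in the earlier sketches.

```python
# Viterbi sketch: V_l(i) = e_l(x_i) max_k V_k(i-1) a_kl, with back-pointers for traceback.
def viterbi(x, states, a0, a, e):
    N = len(x)
    V = [{} for _ in range(N + 1)]
    ptr = [{} for _ in range(N + 1)]
    for l in states:
        V[1][l] = a0[l] * e[l][x[0]]
        ptr[1][l] = None
    for i in range(2, N + 1):
        for l in states:
            best = max(states, key=lambda k: V[i - 1][k] * a[k][l])
            V[i][l] = e[l][x[i - 1]] * V[i - 1][best] * a[best][l]
            ptr[i][l] = best
    last = max(states, key=lambda k: V[N][k])
    best_p = V[N][last]                        # P(x, pi*)
    path = [last]
    for i in range(N, 1, -1):                  # trace back the argmax pointers
        path.append(ptr[i][path[-1]])
    return best_p, path[::-1]
```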
Variants of HMMs
Higher-order HMMs
How do we model "memory" larger than one time point?
akl = P(πi+1 = l | πi = k)
ajkl = P(πi+1 = l | πi = k, πi–1 = j)
…
A second order HMM with K states is equivalent to a first order HMM with K² states
[Figure: second-order transitions aHHT, aHTH, aHTT, aTHH, aTHT, aTTH between coin states H and T, and the equivalent first-order model over pair states HH, HT, TH, TT with transitions such as aHT(prev = H), aHT(prev = T), aTH(prev = H), aTH(prev = T)]
Similar Algorithms to 1st Order
P(πi+1 = l | πi = k, πi–1 = j)
Vlk(i) = maxj { Vkj(i – 1) + … }
Time? Space?
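As a sketch of the equivalence stated above, the composite-state construction can be written as follows; the input format a2[j][k][l] for the second-order transition probabilities P(πi+1 = l | πi = k, πi–1 = j) is an assumption for illustration.

```python
# Expand a second-order HMM over `states` into a first-order HMM over pair states (j, k):
# the transition (j, k) -> (k, l) gets probability a_jkl, giving K^2 composite states.
from itertools import product

def expand_second_order(states, a2):
    pair_states = list(product(states, states))
    a = {s: {t: 0.0 for t in pair_states} for s in pair_states}
    for (j, k), (k2, l) in product(pair_states, pair_states):
        if k == k2:                          # pairs must overlap in the shared middle state
            a[(j, k)][(k2, l)] = a2[j][k][l]
    return pair_states, a
```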
Modeling the Duration of States
[Figure: two states X and Y; X self-loops with probability p and leaves to Y with probability 1 – p; Y self-loops with probability q and leaves with probability 1 – q]
Length distribution of region X: E[lX] = 1/(1 – p)
Geometric distribution, with mean 1/(1 – p)
This is a significant disadvantage of HMMs
Several solutions exist for modeling different length distributions
Example: exon lengths in genes
The reason the algorithm has to be generalized is that in a standard HMM the output at each step is one base, leading to state durations of geometric length. If we look at empirical data, such as these plots, we see that intron lengths follow the geometric distribution fairly well, but for exons that would be a pretty bad model. So in our state space the exons are generalized states, choosing a length from a general distribution and outputting the entire exon in one step, while the intron and intergenic states still output one base at a time and follow the geometric distribution.
Solution 1: Chain several states
[Figure: several X states chained in sequence, the last with self-loop probability p, followed by Y with self-loop probability q]
Disadvantage: still very inflexible
lX = C + geometric with mean 1/(1 – p)
Solution 2: Negative binomial distribution
[Figure: states X(1), X(2), …, X(n), each with self-loop probability p, each leaving to the next state (and finally to Y) with probability 1 – p]
Duration in X: m turns, where
during the first m – 1 turns, exactly n – 1 arrows to the next state are followed, and
during the mth turn, an arrow to the next state is followed
P(lX = m) = (m – 1 choose n – 1) (1 – p)^(n – 1 + 1) p^((m – 1) – (n – 1)) = (m – 1 choose n – 1) (1 – p)^n p^(m – n)
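As a quick check of this formula, here is a small simulation sketch (not from the slides) that chains n copies of X with self-loop probability p and compares the empirical duration frequency to the negative binomial formula; the values n = 3, p = 0.9, m = 25 are arbitrary.

```python
# Simulate the chained-copies construction and compare against the closed form
# P(l_X = m) = C(m-1, n-1) (1-p)^n p^(m-n).
import random
from math import comb

def sample_duration(n, p):
    """Number of letters emitted in X(1)..X(n) before leaving to Y."""
    m = 0
    for _ in range(n):               # each copy is left with probability 1 - p per turn
        m += 1
        while random.random() < p:   # self-loop: emit another letter in the same copy
            m += 1
    return m

n, p, m, trials = 3, 0.9, 25, 100_000
emp = sum(sample_duration(n, p) == m for _ in range(trials)) / trials
theory = comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)
print(emp, theory)                   # the two should roughly agree
```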
Example: genes in prokaryotes
EasyGene: a prokaryotic gene-finder (Larsen TS, Krogh A)
Negative binomial with n = 3
Solution 3: Duration modeling
Upon entering a state:
1. Choose duration d, according to a probability distribution
2. Generate d letters according to emission probs
3. Take a transition to the next state according to transition probs
[Figure: generalized state F with duration distribution Pf, d < Df, emitting xi…xi+d–1]
Disadvantage: increase in complexity of Viterbi:
Time: O(D) increase; Space: O(1) increase, where D = maximum duration of a state
Warning: Rabiner's tutorial claims O(D²) and O(D) increases
Viterbi with duration modeling
[Figure: two generalized states F and L with duration distributions Pf, Pl (d < Df, d < Dl), emitting blocks xi…xi+d–1 and xj…xj+d–1, connected by transitions]
Precompute cumulative emission values
Recall the original iteration:
Vl(i) = maxk Vk(i – 1) akl el(xi)
New iteration:
Vl(i) = maxk maxd=1…Dl Vk(i – d) Pl(d) akl Πj=i–d+1…i el(xj)
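Below is a rough sketch of this modified iteration in Python. The inputs Pd (per-state duration distributions) and Dmax (maximum durations) are hypothetical, the begin-state handling is a simplifying assumption, and cumulative emission products are precomputed so the inner product over el(xj) costs O(1) per (i, d).

```python
# Viterbi with duration modeling: V_l(i) = max_k max_d V_k(i-d) P_l(d) a_kl prod_j e_l(x_j).
def viterbi_duration(x, states, a0, a, e, Pd, Dmax):
    N = len(x)
    # cum[l][i] = product of e_l(x_1..x_i); product over j = i-d+1..i is cum[l][i] / cum[l][i-d]
    cum = {l: [1.0] * (N + 1) for l in states}
    for l in states:
        for i in range(1, N + 1):
            cum[l][i] = cum[l][i - 1] * e[l][x[i - 1]]
    V = [{l: 0.0 for l in states} for _ in range(N + 1)]
    for i in range(1, N + 1):
        for l in states:
            best = 0.0
            for d in range(1, min(Dmax[l], i) + 1):
                emit = cum[l][i] / cum[l][i - d]
                # first segment starts from the begin state (simplifying assumption)
                prev = a0[l] if i == d else max(V[i - d][k] * a[k][l] for k in states)
                best = max(best, prev * Pd[l][d] * emit)
            V[i][l] = best
    return max(V[N].values())      # probability of the best duration-annotated parse
```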
Proteins, Pair HMMs, and Alignment
A state model for alignment
[Figure: three states M (+1, +1), I (+1, 0), J (0, +1) with transitions among them]
Alignments correspond 1-to-1 with sequences of states M, I, J
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII
Let’s score the transitions
[Figure: the same three-state model M (+1, +1), I (+1, 0), J (0, +1), with score s(xi, yj) on transitions into M, gap-open penalty -d on transitions from M into I or J, and gap-extend penalty -e on the I and J self-loops]
Alignments correspond 1-to-1 with sequences of states M, I, J
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII
Alignment with affine gaps – state version
Dynamic Programming:
M(i, j): optimal alignment of x1…xi to y1…yj ending in M
I(i, j): optimal alignment of x1…xi to y1…yj ending in I
J(i, j): optimal alignment of x1…xi to y1…yj ending in J
The score is additive, therefore we can apply DP recurrence formulas
Alignment with affine gaps – state version
Initialization:
M(0, 0) = 0; M(i, 0) = M(0, j) = -∞, for i, j > 0
I(i, 0) = d + ie; J(0, j) = d + je
Iteration:
M(i, j) = s(xi, yj) + max { M(i – 1, j – 1), I(i – 1, j – 1), J(i – 1, j – 1) }
I(i, j) = max { d + M(i – 1, j), e + I(i – 1, j) }
J(i, j) = max { d + M(i, j – 1), e + J(i, j – 1) }
Termination:
Optimal alignment given by max { M(m, n), I(m, n), J(m, n) }
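A minimal Python sketch of this three-matrix recurrence follows. The recurrences and initialization are taken literally from the slide, so d and e are treated as (negative) gap-open and gap-extend scores that get added; the match/mismatch values are illustrative assumptions.

```python
# Affine-gap alignment score via the M / I / J three-matrix dynamic program above.
NEG_INF = float("-inf")

def affine_align_score(x, y, match=1, mismatch=-1, d=-3, e=-1):
    def s(a, b):
        return match if a == b else mismatch
    m, n = len(x), len(y)
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # gap in y (consumes x_i)
    J = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # gap in x (consumes y_j)
    M[0][0] = 0
    for i in range(1, m + 1):
        I[i][0] = d + i * e                           # initialization I(i, 0) = d + ie
    for j in range(1, n + 1):
        J[0][j] = d + j * e                           # initialization J(0, j) = d + je
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = s(x[i - 1], y[j - 1]) + max(M[i - 1][j - 1], I[i - 1][j - 1], J[i - 1][j - 1])
            I[i][j] = max(d + M[i - 1][j], e + I[i - 1][j])
            J[i][j] = max(d + M[i][j - 1], e + J[i][j - 1])
    return max(M[m][n], I[m][n], J[m][n])             # termination
```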