Hidden Markov Models Yves Moreau Katholieke Universiteit Leuven

Regular expressions

Alignment:
  ACA---ATG
  TCAACTATC
  ACAC--AGC
  AGA---ATC
  ACCG--ATC

Regular expression: [AT][CG][AC][ACGT]*A[TG][GC]

Problem: the regular expression does not distinguish the exceptional sequence TGCTAGG from the consensus ACACATC.

Hidden Markov Models

[Figure: HMM built from the alignment above, with transition probabilities on the arrows and an emission probability table in each state; together they give a sequence score]

Emission probabilities per state:
  state 1: A .8   C 0    G 0    T .2
  state 2: A 0    C .8   G .2   T 0
  state 3: A .8   C .2   G 0    T 0
  state 4: A 1    C 0    G 0    T 0
  state 5: A 0    C 0    G .2   T .8
  state 6: A 0    C .8   G .2   T 0
  state 7: A .2   C .4   G .2   T

Log odds

Use logarithms for scaling and normalize by a random model. The log-odds score of a sequence S is

log odds(S) = log [ P(S | model) / P(S | random) ]

so each emission contributes log(p / 0.25) relative to a uniform background over A, C, G, T.

[Figure: the HMM above with log-odds emission scores per state, e.g. state 1: A 1.16, T -0.22; state 2: C 1.16, G -0.22; state 3: A 1.16, C -0.22; state 4: A 1.39; state 5: G -0.22, T 1.16; state 6: C 1.16, G -0.22; state 7: A -0.22, C 0.47, G -0.22]

Log odds

Sequence                  Log odds
ACAC--ATC (consensus)        6.7
ACA---ATG                    4.9
TCAACTATC                    3.0
ACAC--AGC                    5.3
AGA---ATC                    4.9
ACCG--ATC                    4.6
TGCT--AGG (exceptional)     -0.97

Markov chain

Example of a Markov chain: a probabilistic model of a DNA sequence x = x_1 x_2 ... x_L.

Transition probabilities: a_st = P(x_{i+1} = t | x_i = s)

[Figure: Markov chain with the four states A, C, G, T and transitions between every pair of nucleotides]

Markov property

Probability of a sequence through Bayes' rule (chain rule):

P(x) = P(x_L | x_{L-1}, ..., x_1) P(x_{L-1} | x_{L-2}, ..., x_1) ... P(x_1)

Markov property: "the future is a function only of the present and not of the past", i.e. P(x_i | x_{i-1}, ..., x_1) = P(x_i | x_{i-1}), so

P(x) = P(x_1) a_{x_1 x_2} a_{x_2 x_3} ... a_{x_{L-1} x_L}
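
Below is a minimal sketch (not from the slides) of this computation for a first-order DNA Markov chain; the transition and initial probabilities are made-up values for illustration only.

```python
import math

# Made-up transition probabilities a_st = P(next = t | current = s); rows sum to 1.
transitions = {
    'A': {'A': 0.30, 'C': 0.20, 'G': 0.30, 'T': 0.20},
    'C': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
    'G': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
    'T': {'A': 0.20, 'C': 0.30, 'G': 0.30, 'T': 0.20},
}
initial = {'A': 0.25, 'C': 0.25, 'G': 0.25, 'T': 0.25}   # P(x_1), assumed uniform

def log_prob(seq):
    """log P(x) = log P(x_1) + sum_i log a_{x_{i-1} x_i}, using the Markov property."""
    lp = math.log(initial[seq[0]])
    for prev, cur in zip(seq, seq[1:]):
        lp += math.log(transitions[prev][cur])
    return lp

print(log_prob("ACACATC"))   # log-probability of the consensus sequence under this toy chain
```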

Beginning and end of a sequence

The computation of the probability is not homogeneous (the first symbol has no predecessor), and the length distribution is not modeled: P(length = L) is left unspecified.

Solution: model the beginning and the end of the sequence with explicit begin and end states. The probability of observing a sequence of a given length then decreases with the length of the sequence.

[Figure: Markov chain over A, C, G, T extended with begin and end states]


Hidden Markov Model

In a hidden Markov model, we observe the symbol sequence x but we want to reconstruct the hidden state sequence (the path π).

Transition probabilities: a_kl = P(π_{i+1} = l | π_i = k), with a_0l out of the begin state and a_k0 into the end state

Emission probabilities: e_k(b) = P(x_i = b | π_i = k)

Joint probability of the sequence x_1, ..., x_L and the path π:

P(x, π) = a_{0 π_1} ∏_{i=1..L} e_{π_i}(x_i) a_{π_i π_{i+1}}   (with π_{L+1} = 0, the end state)

Casino (I) – problem setup

The casino mostly uses a fair die but sometimes switches to a loaded die. We observe the outcomes x of the successive throws, but we want to know when the die was fair and when it was loaded (the path π).

Fair die:   P(1) = ... = P(6) = 1/6
Loaded die: P(1) = ... = P(5) = 1/10, P(6) = 1/2

[Figure: two-state HMM with states Fair and Loaded and switching probabilities on the transitions]
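
As a concrete illustration, here is a sketch of this casino HMM as plain Python data. The emission probabilities follow the slide (the 1/2 for a loaded 6 is implied because the other five outcomes account for the remaining mass); the switching probabilities are not given in this transcript, so the 0.95/0.05 and 0.90/0.10 values below are assumptions for illustration only.

```python
STATES = ("FAIR", "LOADED")

emissions = {
    "FAIR":   {o: 1 / 6 for o in range(1, 7)},                # 1..6 each 1/6
    "LOADED": {**{o: 1 / 10 for o in range(1, 6)}, 6: 1 / 2}, # 1..5: 1/10, 6: 1/2
}

transitions = {   # a_kl = P(pi_{i+1} = l | pi_i = k); values assumed, not from the slides
    "FAIR":   {"FAIR": 0.95, "LOADED": 0.05},
    "LOADED": {"FAIR": 0.10, "LOADED": 0.90},
}

initial = {"FAIR": 0.5, "LOADED": 0.5}   # a_0k, assumed uniform start
```

The later decoding and training sketches take parameters in exactly this dict form.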

Estimation of the sequence and state probabilities

The Viterbi algorithm

We look for the most probable path π* = argmax_π P(x, π). This problem can be tackled by dynamic programming.

Let us define v_k(i) as the probability of the most probable path that ends in state k with the emission of symbol x_i. Then we can compute this probability recursively as

v_l(i+1) = e_l(x_{i+1}) max_k ( v_k(i) a_kl )

The Viterbi algorithm

The Viterbi algorithm grows the best path dynamically:

Initialization (sequence in the begin state): v_0(0) = 1, v_k(0) = 0 for k > 0
Recursion (i = 1, ..., L): v_l(i) = e_l(x_i) max_k ( v_k(i-1) a_kl ), keeping a traceback pointer to the maximizing k
Termination: P(x, π*) = max_k ( v_k(L) a_k0 )

The traceback pointers are then followed backwards to recover the best path (= decoding).
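
A minimal Viterbi sketch for a fully specified HMM, using the dict-based parameters of the casino example above (without an explicit end state). It works in log space to avoid underflow and assumes all listed probabilities are non-zero.

```python
import math

def viterbi(obs, states, initial, transitions, emissions):
    # v[i][k]: log-probability of the best path emitting obs[:i+1] and ending in state k
    v = [{k: math.log(initial[k]) + math.log(emissions[k][obs[0]]) for k in states}]
    ptr = [{}]   # ptr[i][l]: best predecessor of state l at position i
    for x in obs[1:]:
        prev, col, back = v[-1], {}, {}
        for l in states:
            best_k = max(states, key=lambda k: prev[k] + math.log(transitions[k][l]))
            col[l] = prev[best_k] + math.log(transitions[best_k][l]) + math.log(emissions[l][x])
            back[l] = best_k
        v.append(col)
        ptr.append(back)
    last = max(states, key=lambda k: v[-1][k])   # best final state
    path = [last]
    for back in reversed(ptr[1:]):               # follow the traceback pointers
        path.append(back[path[-1]])
    path.reverse()
    return path, v[-1][last]
```

For example, viterbi([1, 6, 6, 6, 2], STATES, initial, transitions, emissions) with the casino parameters sketched earlier returns the decoded fair/loaded path and its log-probability.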

Casino (II) – Viterbi

The forward algorithm

The forward algorithm lets us compute the probability P(x) of a sequence with respect to an HMM. This is important for the computation of posterior probabilities and for the comparison of HMMs.

The sum over all paths (exponentially many) can be computed by dynamic programming. Let us define f_k(i) as the probability of the sequence up to x_i, summed over the paths that end in state k with the emission of symbol x_i:

f_k(i) = P(x_1, ..., x_i, π_i = k)

Then we can compute this probability recursively as

f_l(i+1) = e_l(x_{i+1}) Σ_k f_k(i) a_kl

The forward algorithm

The forward algorithm grows the total probability dynamically from the beginning to the end of the sequence:

Initialization (sequence in the begin state): f_0(0) = 1, f_k(0) = 0 for k > 0
Recursion (i = 1, ..., L): f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
Termination (all states converge to the end state): P(x) = Σ_k f_k(L) a_k0
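
A matching forward-algorithm sketch, in the same dict-based style as the Viterbi code above and again without an explicit end state (P(x) is taken as the sum over all final states). It works directly in probability space for clarity; a real implementation would use the rescaling discussed under numerical stability.

```python
def forward(obs, states, initial, transitions, emissions):
    # f[k]: probability of emitting the prefix seen so far and being in state k
    f = {k: initial[k] * emissions[k][obs[0]] for k in states}
    for x in obs[1:]:
        f = {l: emissions[l][x] * sum(f[k] * transitions[k][l] for k in states)
             for l in states}
    return sum(f.values())   # P(x), summed over all paths
```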

The backward algorithm

The backward algorithm lets us compute the probability of the complete sequence together with the condition that symbol x_i is emitted from state k:

P(x, π_i = k) = P(x_1, ..., x_i, π_i = k) P(x_{i+1}, ..., x_L | π_i = k)

This is important to compute the probability of a given state at symbol x_i. The first factor, P(x_1, ..., x_i, π_i = k), is f_k(i) from the forward algorithm. Let us define b_k(i) as the probability of the rest of the sequence for the paths that pass through state k at symbol x_i:

b_k(i) = P(x_{i+1}, ..., x_L | π_i = k)

The backward algorithm

The backward algorithm grows the probability b_k(i) dynamically backwards (from the end to the beginning of the sequence):

Initialization (start in the end state): b_k(L) = a_k0 for all k
Recursion (i = L-1, ..., 1): b_k(i) = Σ_l a_kl e_l(x_{i+1}) b_l(i+1)

Once both forward and backward probabilities are available, we can compute the posterior probability of the state:

P(π_i = k | x) = f_k(i) b_k(i) / P(x)
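
A backward-algorithm sketch plus the posterior state probabilities, again for the dict-based parameters without an explicit end state (so b_k(L) = 1 here rather than a_k0).

```python
def backward(obs, states, transitions, emissions):
    b = [{k: 1.0 for k in states}]   # b_k(L) = 1 when no end state is modelled
    for x in reversed(obs[1:]):
        b.insert(0, {k: sum(transitions[k][l] * emissions[l][x] * b[0][l]
                            for l in states) for k in states})
    return b                          # b[i][k] = b_k(i)

def posteriors(obs, states, initial, transitions, emissions):
    # forward pass, this time keeping every column f[i][k] = f_k(i)
    f = [{k: initial[k] * emissions[k][obs[0]] for k in states}]
    for x in obs[1:]:
        f.append({l: emissions[l][x] * sum(f[-1][k] * transitions[k][l] for k in states)
                  for l in states})
    b = backward(obs, states, transitions, emissions)
    px = sum(f[-1].values())          # P(x)
    # P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(obs))]
```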

Posterior decoding

Instead of using the most probable path for decoding (Viterbi), we can use the path of the most probable states:

π^_i = argmax_k P(π_i = k | x)

The path π^ can be "illegal" (P(π^ | x) = 0), because two consecutive most-probable states need not be connected by an allowed transition.

This approach can also be used when we are interested in a function g(k) of the state (e.g., labeling): G(i | x) = Σ_k P(π_i = k | x) g(k)

Casino (III) – posterior decoding

[Figure: posterior probability of the state "fair" plotted along the sequence of die throws]

Casino (IV) – posterior decoding

New situation: P(π_{i+1} = FAIR | π_i = FAIR) = 0.99

With this more persistent fair state, Viterbi decoding cannot detect the cheating in 1000 throws, while posterior decoding can.

[Figure: the same fair/loaded HMM with the modified transition probability]

Parameter estimation for HMMs

Choice of the architecture

For parameter estimation, we assume that the architecture of the HMM is known. Choosing that architecture is an essential design decision, e.g.:
Duration modeling
"Silent states" for gaps

Parameter estimation with known paths

HMM with parameters θ (transition and emission probabilities); training set D of N sequences x^1, ..., x^N.

The score of the model is the likelihood of the parameters given the training data:

Score(D, θ) = P(x^1, ..., x^N | θ) = ∏_j P(x^j | θ)

Parameter estimation with known paths

If the state paths are known, the parameters are estimated from counts (how often is a transition used, how often is a symbol emitted by a given state), with 'pseudocounts' added if necessary:

A_kl = number of transitions from k to l in the training set + pseudocount r_kl
E_k(b) = number of emissions of b from k in the training set + pseudocount r_k(b)

a_kl = A_kl / Σ_l' A_kl'        e_k(b) = E_k(b) / Σ_b' E_k(b')
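
A sketch of this count-and-normalize estimation for the dict-based representation used above; the uniform pseudocount of 1 is an illustrative choice, not something fixed by the slides.

```python
def estimate_known_paths(paths, sequences, states, alphabet, pseudocount=1.0):
    # A[k][l]: transition counts, E[k][b]: emission counts, both seeded with pseudocounts
    A = {k: {l: pseudocount for l in states} for k in states}
    E = {k: {b: pseudocount for b in alphabet} for k in states}
    for path, seq in zip(paths, sequences):
        for k, l in zip(path, path[1:]):
            A[k][l] += 1
        for k, b in zip(path, seq):
            E[k][b] += 1
    # normalize the counts into probabilities a_kl and e_k(b)
    a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    e = {k: {b: E[k][b] / sum(E[k].values()) for b in alphabet} for k in states}
    return a, e
```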

Parameter estimation with unknown paths: Viterbi training

Strategy: an iterative method
Suppose the current parameters are known and find the best state paths with Viterbi decoding
Use these best paths to re-estimate the parameters (as in the known-path case)
Iterate until convergence

Viterbi training does not maximize the likelihood of the parameters (it only considers the best paths), but it converges exactly in a finite number of steps: once the best paths stop changing, so do the parameters.

Parameter estimation with unknown paths: Baum-Welch training

Strategy: parallel to Viterbi training, but we use the expected values of the transition and emission counts, computed over all paths with the forward and backward variables (instead of using only the best path).

For the transitions:

A_kl = Σ_j Σ_i f_k^j(i) a_kl e_l(x^j_{i+1}) b_l^j(i+1) / P(x^j)

For the emissions:

E_k(b) = Σ_j Σ_{i : x^j_i = b} f_k^j(i) b_k^j(i) / P(x^j)

Parameter estimation with unknown paths: Baum-Welch training

Initialization: choose arbitrary model parameters

Recursion:
Set all transition and emission count variables to their pseudocounts
For each sequence j = 1, ..., n:
  compute f_k(i) for sequence j with the forward algorithm
  compute b_k(i) for sequence j with the backward algorithm
  add the contributions of sequence j to A and E
Compute the new model parameters a_kl = A_kl / Σ_l' A_kl' and e_k(b) = E_k(b) / Σ_b' E_k(b')
Compute the log-likelihood of the model

Termination: stop when the log-likelihood changes by less than some threshold or when the maximum number of iterations is exceeded
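
A compact single-sequence Baum-Welch sketch following this recipe, written against the dict-based parameters used above and in plain probability space for readability (a real implementation would rescale, see the numerical-stability slide). The pseudocount and iteration count are arbitrary illustrative choices; handling several sequences just means summing their contributions to A and E.

```python
import math

def baum_welch(obs, states, alphabet, initial, a, e, n_iter=20, pseudo=0.01):
    for _ in range(n_iter):
        # forward and backward tables under the current parameters
        f = [{k: initial[k] * e[k][obs[0]] for k in states}]
        for x in obs[1:]:
            f.append({l: e[l][x] * sum(f[-1][k] * a[k][l] for k in states) for l in states})
        b = [{k: 1.0 for k in states}]
        for x in reversed(obs[1:]):
            b.insert(0, {k: sum(a[k][l] * e[l][x] * b[0][l] for l in states) for k in states})
        px = sum(f[-1].values())   # P(x) under the current parameters
        # expected transition and emission counts, seeded with pseudocounts
        A = {k: {l: pseudo for l in states} for k in states}
        E = {k: {c: pseudo for c in alphabet} for k in states}
        for i, x in enumerate(obs):
            for k in states:
                E[k][x] += f[i][k] * b[i][k] / px
                if i + 1 < len(obs):
                    for l in states:
                        A[k][l] += f[i][k] * a[k][l] * e[l][obs[i + 1]] * b[i + 1][l] / px
        # re-estimate the parameters from the expected counts
        a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        e = {k: {c: E[k][c] / sum(E[k].values()) for c in alphabet} for k in states}
    return a, e, math.log(px)   # log-likelihood from the last forward pass
```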

Casino (V) – Baum-Welch training

[Figure: the original fair/loaded model compared with the models re-estimated by Baum-Welch from 300 throws and from a larger number of throws]

Numerical stability

Many expressions contain products of many probabilities, which causes underflow when we compute them directly.

For Viterbi this can be solved by working with logarithms. For the forward and backward algorithms, we can work with an approximation to the logarithm of a sum or with rescaled variables.
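
A small sketch of the log-space approach: sums of products of probabilities are evaluated with the usual log-sum-exp trick so that nothing ever leaves log space.

```python
import math

def log_sum_exp(log_terms):
    """Compute log(sum_i exp(t_i)) without underflow by factoring out the maximum."""
    m = max(log_terms)
    if m == float('-inf'):
        return m
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

# The forward recursion in log space then reads
#   log f_l(i+1) = log e_l(x_{i+1}) + log_sum_exp([log f_k(i) + log a_kl for k in states])
print(log_sum_exp([-1000.0, -1001.0]))   # about -999.69, where exp(-1000) alone underflows to 0
```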

Summary

Hidden Markov Models
Computation of sequence and state probabilities:
  Viterbi computation of the best state path
  the forward algorithm for the computation of the probability of a sequence
  the backward algorithm for the computation of the state probabilities
Parameter estimation for HMMs:
  parameter estimation with known paths
  parameter estimation with unknown paths: Viterbi training and Baum-Welch training