Three classic HMM problems


Three classic HMM problems Decoding: given a model and an output sequence, what is the most likely state sequence through the model that generated the output? A solution to this problem gives us a way to match up an observed sequence and the states in the model. In gene finding, the states correspond to sequence features such as start codons, stop codons, and splice sites

Three classic HMM problems Learning: given a model and a set of observed sequences, how do we set the model’s parameters so that it has a high probability of generating those sequences? This is perhaps the most important, and most difficult, problem. A solution to this problem allows us to determine all the probabilities in an HMM by using an ensemble of training data.

Viterbi algorithm Vi(t) is the probability that the HMM is in state i after generating the sequence y1, y2, …, yt, following the most probable path through the HMM.
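
Written out under the convention used in these slides, where aij is the probability of the transition from state i to state j and bij(y) is the probability of emitting symbol y on that transition, the standard Viterbi recurrence is

V_j(t) = \max_i \; V_i(t-1) \, a_{ij} \, b_{ij}(y_t)

with V_i(0) = 1 for the initial state and 0 for every other state. Each edge weight in the trellises below is one candidate term V_i(t-1) a_{ij} b_{ij}(y_t) of this maximization.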

Our sample HMM Let S1 be initial state, S2 be final state

A trellis for the Viterbi Algorithm Time runs from t = 0 to t = 3; output so far: A C. Initialization: V1(0) = 1.0, V2(0) = 0.0. Step t = 1 (output A): into S1, max[(0.6)(0.8)(1.0), (0.1)(0.1)(0)] = 0.48; into S2, max[(0.4)(0.5)(1.0), (0.9)(0.3)(0)] = 0.20.

A trellis for the Viterbi Algorithm (continued) With V1(1) = 0.48 and V2(1) = 0.20 from the previous step, step t = 2 (output C): into S1, max[(0.6)(0.2)(0.48), (0.1)(0.9)(0.2)] = max(.0576, .018) = .0576; into S2, max[(0.9)(0.7)(0.2), (0.4)(0.5)(0.48)] = max(.126, .096) = .126.
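
A minimal Python sketch of this computation, with the transition and emission probabilities read off the trellis edge labels above (the dictionary encoding of the sample HMM below is an assumption of this sketch, since only the edge products appear on the slides):

# Transition probabilities A[i][j] and per-transition emission
# probabilities B[(i, j)][symbol], as read off the trellis edges.
A = {"S1": {"S1": 0.6, "S2": 0.4},
     "S2": {"S1": 0.1, "S2": 0.9}}
B = {("S1", "S1"): {"A": 0.8, "C": 0.2},
     ("S1", "S2"): {"A": 0.5, "C": 0.5},
     ("S2", "S1"): {"A": 0.1, "C": 0.9},
     ("S2", "S2"): {"A": 0.3, "C": 0.7}}
STATES = ["S1", "S2"]

def viterbi(seq, init="S1"):
    """Return the trellis of Vi(t) values for an output sequence."""
    V = [{s: (1.0 if s == init else 0.0) for s in STATES}]  # t = 0
    for y in seq:
        prev = V[-1]
        V.append({j: max(prev[i] * A[i][j] * B[(i, j)][y] for i in STATES)
                  for j in STATES})
    return V

V = viterbi("AC")
print(V[1])  # {'S1': 0.48, 'S2': 0.2}
print(V[2])  # {'S1': 0.0576, 'S2': 0.126} (to rounding)

A full decoder would also record a back-pointer at each max so that the most likely state path can be read back out at the end.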

Learning in HMMs: the E-M algorithm In order to learn the parameters in an “empty” HMM, we need: the topology of the HMM, and data (the more the better). The learning algorithm is called “Estimate-Maximize” or E-M; it is also called the Forward-Backward algorithm, and also the Baum-Welch algorithm.

An untrained HMM

Some HMM training data CACAACAAAACCCCCCACAA ACAACACACACACACACCAAAC CAACACACAAACCCC CAACCACCACACACACACCCCA CCCAAAACCCCAAAAACCC ACACAAAAAACCCAACACACAACA ACACAACCCCAAAACCACCAAAAA

Step 1: Guess all the probabilities We can start with random probabilities; the learning algorithm will adjust them. If we can make good guesses, the results will generally be better.

Step 2: the Forward algorithm Reminder: each box in the trellis contains a value αi(t). αi(t) is the probability that our HMM has generated the sequence y1, y2, …, yt and has ended up in state i.
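
For reference, under the same transition-emission convention, the forward values are filled in from left to right by the standard recurrence (a reconstruction consistent with the trellis arithmetic in these slides):

\alpha_j(t) = \sum_i \alpha_i(t-1) \, a_{ij} \, b_{ij}(y_t)

with \alpha_i(0) = 1 for the initial state and 0 otherwise; it is the Viterbi recurrence with the max replaced by a sum.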

Reminder: notations A sequence of length T: Y = y1, y2, …, yT. All sequences of length T: the set of possible output sequences of that length. A path of length T+1 that generates Y: the states visited at times t = 0, 1, …, T. All paths: the set of all state sequences of length T+1 through the model.

Step 3: the Backward algorithm Next we need to compute βi(t) using a backward computation. βi(t) is the probability that our HMM will generate the rest of the sequence yt+1, yt+2, …, yT beginning in state i.
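
The backward values are filled in from right to left by the mirror-image recurrence (again the standard form under this convention, and it matches the trellis computations that follow):

\beta_i(t) = \sum_j a_{ij} \, b_{ij}(y_{t+1}) \, \beta_j(t+1)

with \beta_i(T) = 1 for the final state and 0 otherwise.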

A trellis for the Backward Algorithm Time runs from t = 0 to t = 3; output: A C C. Initialization at t = 3: β1(3) = 0.0, β2(3) = 1.0 (S2 is the final state). Step back to t = 2 (next output C): β1(2) = (0.6)(0.2)(0.0) + (0.4)(0.5)(1.0) = 0.2; β2(2) = (0.1)(0.9)(0) + (0.9)(0.7)(1.0) = 0.63.

A trellis for the Backward Algorithm (2) With β1(2) = 0.2 and β2(2) = 0.63 from the previous step, step back to t = 1 (next output C): β1(1) = (0.6)(0.2)(0.2) + (0.4)(0.5)(0.63) = .024 + .126 = .15; β2(1) = (0.9)(0.7)(0.63) + (0.1)(0.9)(0.2) = .397 + .018 = .415.

A trellis for the Backward Algorithm (3) Step back to t = 0 (next output A): β1(0) = (0.6)(0.8)(0.15) + (0.4)(0.5)(0.415) = .072 + .083 = .155; β2(0) = (0.9)(0.3)(0.415) + (0.1)(0.1)(0.15) = .112 + .0015 = .1135 ≈ .114. The completed backward trellis is β1 = .155, .15, 0.2, 0.0 and β2 = .114, .415, 0.63, 1.0 at t = 0, 1, 2, 3.
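
A short Python sketch of both passes, using the same assumed encoding of the sample HMM as in the Viterbi sketch above; the backward values it prints match this trellis, and the forward probability of generating A C C and ending in the final state S2 agrees with β1(0), as it should:

# Same assumed model as in the Viterbi sketch: transitions A[i][j] and
# per-transition emissions B[(i, j)][symbol].
A = {"S1": {"S1": 0.6, "S2": 0.4},
     "S2": {"S1": 0.1, "S2": 0.9}}
B = {("S1", "S1"): {"A": 0.8, "C": 0.2},
     ("S1", "S2"): {"A": 0.5, "C": 0.5},
     ("S2", "S1"): {"A": 0.1, "C": 0.9},
     ("S2", "S2"): {"A": 0.3, "C": 0.7}}
STATES = ["S1", "S2"]

def forward(seq, init="S1"):
    """alpha[t][i]: probability of generating y1..yt and being in state i."""
    alpha = [{s: (1.0 if s == init else 0.0) for s in STATES}]
    for y in seq:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * A[i][j] * B[(i, j)][y] for i in STATES)
                      for j in STATES})
    return alpha

def backward(seq, final="S2"):
    """beta[t][i]: probability of generating y(t+1)..yT from state i and ending in the final state."""
    beta = [{s: (1.0 if s == final else 0.0) for s in STATES}]
    for y in reversed(seq):
        nxt = beta[0]
        beta.insert(0, {i: sum(A[i][j] * B[(i, j)][y] * nxt[j] for j in STATES)
                        for i in STATES})
    return beta

alpha, beta = forward("ACC"), backward("ACC")
print(beta[0])         # approximately {'S1': 0.155, 'S2': 0.114}, as in the trellis
print(alpha[3]["S2"])  # approximately 0.155: P(A C C, ending in S2) = beta[0]['S1']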

Step 4: Re-estimate the probabilities After running the Forward and Backward algorithms once, we can re-estimate all the probabilities in the HMM. SF is the probability that the HMM generated the entire sequence. Nice property of E-M: the value of SF never decreases; it converges to a local maximum. We can read off the α and β values from the Forward and Backward trellises.

Compute new transition probabilities γij(t) is the probability of making the transition i→j at time t, given the observed output. γ is dependent on the data, and it only applies for one time step; otherwise it is just like aij.
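
In terms of the forward and backward values, the standard Baum-Welch expression for this quantity (assuming the transition-emission convention used throughout these slides) is

\gamma_{ij}(t) = \alpha_i(t-1) \, a_{ij} \, b_{ij}(y_t) \, \beta_j(t) \; / \; P(Y)

where P(Y) is the probability of the entire observed sequence (SF above).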

What is gamma? Sum γij(t) over all time steps and we get the expected number of times that the transition i→j was made while generating the sequence Y: C1 = Σt γij(t).

How many times did we leave i? Sum γix(t) over all time steps and over all states x that can follow i, and we get the expected number of times that the transition i→x was made for any state x: C2 = Σt Σx γix(t).

Recompute transition probability The new estimate is aij = C1 / C2. In other words, the probability of going from state i to j is estimated by counting how often we took that transition in our data (C1) and dividing that by how often we went from i to any state (C2).

Recompute output probabilities Originally these were the bij(k) values. We need: the expected number of times that we made the transition i→j and emitted the symbol k, and the expected number of times that we made the transition i→j at all.

New estimate of bij(k)
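
Written out in terms of γ, the ratios of expected counts described on the preceding slides give the usual Baum-Welch re-estimates for a transition-emitting HMM, stated here in their standard form:

\hat{a}_{ij} \;=\; \sum_t \gamma_{ij}(t) \; \Big/ \; \sum_t \sum_x \gamma_{ix}(t) \;=\; C1 / C2

\hat{b}_{ij}(k) \;=\; \sum_{t \,:\, y_t = k} \gamma_{ij}(t) \; \Big/ \; \sum_t \gamma_{ij}(t)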

Step 5: Go to step 2 Step 2 is the Forward algorithm. Repeat the entire process until the probabilities converge; usually this is rapid, 10-15 iterations. The algorithm is called “Estimate-Maximize” because it first estimates probabilities, then maximizes them based on the data. “Forward-Backward” refers to the two computationally intensive steps in the algorithm.
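
Putting the pieces together, here is a compact, self-contained sketch of the whole training loop in Python, run on the sequences from the “Some HMM training data” slide. The model encoding and function names are illustrative, and a production implementation would rescale or work in log space to avoid numerical underflow on long sequences:

import math
import random

STATES = ["S1", "S2"]
SYMBOLS = ["A", "C"]
INIT, FINAL = "S1", "S2"

# Training sequences from the "Some HMM training data" slide.
DATA = ["CACAACAAAACCCCCCACAA", "ACAACACACACACACACCAAAC",
        "CAACACACAAACCCC", "CAACCACCACACACACACCCCA",
        "CCCAAAACCCCAAAAACCC", "ACACAAAAAACCCAACACACAACA",
        "ACACAACCCCAAAACCACCAAAAA"]

def random_model(seed=0):
    """Step 1: guess all the probabilities (random, then normalized)."""
    rng = random.Random(seed)
    A, B = {}, {}
    for i in STATES:
        w = [rng.random() for _ in STATES]
        A[i] = {j: wj / sum(w) for j, wj in zip(STATES, w)}
        for j in STATES:
            e = [rng.random() for _ in SYMBOLS]
            B[(i, j)] = {k: ek / sum(e) for k, ek in zip(SYMBOLS, e)}
    return A, B

def forward(seq, A, B):
    """Step 2: alpha[t][i] = P(y1..yt, in state i at time t)."""
    alpha = [{s: (1.0 if s == INIT else 0.0) for s in STATES}]
    for y in seq:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * A[i][j] * B[(i, j)][y] for i in STATES)
                      for j in STATES})
    return alpha

def backward(seq, A, B):
    """Step 3: beta[t][i] = P(y(t+1)..yT, reaching the final state | in state i at t)."""
    beta = [{s: (1.0 if s == FINAL else 0.0) for s in STATES}]
    for y in reversed(seq):
        nxt = beta[0]
        beta.insert(0, {i: sum(A[i][j] * B[(i, j)][y] * nxt[j] for j in STATES)
                        for i in STATES})
    return beta

def baum_welch(data, iterations=15, seed=0):
    A, B = random_model(seed)
    for it in range(iterations):
        trans = {(i, j): 0.0 for i in STATES for j in STATES}       # C1 counts
        emit = {(i, j, k): 0.0 for i in STATES for j in STATES for k in SYMBOLS}
        loglik = 0.0
        for seq in data:
            alpha, beta = forward(seq, A, B), backward(seq, A, B)
            p_seq = beta[0][INIT]       # SF: probability of the whole sequence
            loglik += math.log(p_seq)
            # Step 4: accumulate gamma_ij(t) over every time step of every sequence.
            for t, y in enumerate(seq, start=1):
                for i in STATES:
                    for j in STATES:
                        g = alpha[t - 1][i] * A[i][j] * B[(i, j)][y] * beta[t][j] / p_seq
                        trans[(i, j)] += g
                        emit[(i, j, y)] += g
        # Re-estimate: a_ij = C1 / C2 and b_ij(k) = (expected i->j emitting k) / C1.
        for i in STATES:
            from_i = sum(trans[(i, x)] for x in STATES)
            for j in STATES:
                if from_i > 0:
                    A[i][j] = trans[(i, j)] / from_i
                if trans[(i, j)] > 0:
                    B[(i, j)] = {k: emit[(i, j, k)] / trans[(i, j)] for k in SYMBOLS}
        print(f"iteration {it}: log P(data) = {loglik:.3f}")  # never decreases
    return A, B

A, B = baum_welch(DATA)

The per-iteration log probability it prints is the E-M quantity that is guaranteed never to decrease.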

Computing requirements The trellis has N nodes per column, where N is the number of states, and S columns, where S is the length of the sequence. Between each pair of columns we create E edges, one for each transition in the HMM. Total trellis size is approximately S(N+E).