Presentation transcript: Hidden Markov Models (S. Salzberg, CMSC 828N)

Slide 1: Three classic HMM problems
2. Decoding: given a model and an output sequence, what is the most likely state sequence through the model that generated the output? A solution to this problem gives us a way to match up an observed sequence and the states in the model. In gene finding, the states correspond to sequence features such as start codons, stop codons, and splice sites.

Slide 2: Three classic HMM problems
3. Learning: given a model and a set of observed sequences, how do we set the model's parameters so that it has a high probability of generating those sequences? This is perhaps the most important, and most difficult, problem. A solution to this problem allows us to determine all the probabilities in an HMM by using an ensemble of training data.

Slide 3: Viterbi algorithm
The Viterbi recurrence, as applied in the trellis computations on the next two slides:

    V_j(t) = max over states i of [ V_i(t-1) * a_ij * b_ij(y_t) ]

where V_i(t) is the probability that the HMM is in state i after generating the sequence y_1, y_2, ..., y_t, following the most probable path in the HMM.

Slide 4: Our sample HMM
[State diagram not transcribed.] Let S_1 be the initial state and S_2 be the final state.

Slide 5: A trellis for the Viterbi Algorithm
States S_1 and S_2; time steps t = 0 to t = 3; output: ACC. Starting values: V_S1(0) = 1.0, V_S2(0) = 0.
t=1 (output A):
  into S_1: max[ (0.6)(0.8)(1.0), (0.1)(0.1)(0) ] = 0.48
  into S_2: max[ (0.4)(0.5)(1.0), (0.9)(0.3)(0) ] = 0.2

Slide 6: A trellis for the Viterbi Algorithm (continued)
t=2 (output C):
  into S_1: max[ (0.6)(0.2)(0.48), (0.1)(0.9)(0.2) ] = max(.0576, .018) = .0576
  into S_2: max[ (0.4)(0.5)(0.48), (0.9)(0.7)(0.2) ] = max(.096, .126) = .126
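The short Python sketch below is not part of the original slides; it implements the recurrence from slide 3, with the transition probabilities a_ij and the per-transition emission probabilities b_ij(symbol) read off the trellis products on slides 5 and 6 (my reconstruction of the sample HMM on slide 4). Running it on the prefix "AC" reproduces the trellis values.

    # Viterbi sketch (reconstruction; probabilities inferred from slides 5-6).
    states = ["S1", "S2"]
    a = {"S1": {"S1": 0.6, "S2": 0.4},                 # transition probabilities
         "S2": {"S1": 0.1, "S2": 0.9}}
    b = {"S1": {"S1": {"A": 0.8, "C": 0.2},            # emission probabilities,
                "S2": {"A": 0.5, "C": 0.5}},           # attached to transitions
         "S2": {"S1": {"A": 0.1, "C": 0.9},
                "S2": {"A": 0.3, "C": 0.7}}}

    def viterbi(seq, init="S1"):
        # V[j] = best-path probability of being in state j after the output so far
        V = {s: (1.0 if s == init else 0.0) for s in states}
        backpointers = []
        for sym in seq:
            newV, ptr = {}, {}
            for j in states:
                best = max(states, key=lambda i: V[i] * a[i][j] * b[i][j][sym])
                newV[j] = V[best] * a[best][j] * b[best][j][sym]
                ptr[j] = best
            V = newV
            backpointers.append(ptr)
        # trace back the most probable state sequence
        path = [max(states, key=lambda j: V[j])]
        for ptr in reversed(backpointers):
            path.append(ptr[path[-1]])
        return V, path[::-1]

    print(viterbi("AC"))   # V matches slide 6 (.0576, .126); best path S1 -> S2 -> S2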

Slide 7: Learning in HMMs: the E-M algorithm
- In order to learn the parameters in an "empty" HMM, we need:
  - the topology of the HMM
  - data (the more the better)
- The learning algorithm is called "Estimate-Maximize" or E-M
  - also called the Forward-Backward algorithm
  - also called the Baum-Welch algorithm

Slide 8: An untrained HMM
[Diagram not transcribed.]

Slide 9: Some HMM training data
  CACAACAAAACCCCCCACAA
  ACAACACACACACACACCAAAC
  CAACACACAAACCCC
  CAACCACCACACACACACCCCA
  CCCAAAACCCCAAAAACCC
  ACACAAAAAACCCAACACACAACA
  ACACAACCCCAAAACCACCAAAAA

Slide 10: Step 1: Guess all the probabilities
- We can start with random probabilities; the learning algorithm will adjust them.
- If we can make good initial guesses, the results will generally be better.

Slide 11: Step 2: the Forward algorithm
- Reminder: each box in the trellis contains a value α_i(t).
- α_i(t) is the probability that our HMM has generated the sequence y_1, y_2, ..., y_t and has ended up in state i.
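A forward-pass sketch (my addition, reusing the states, a, and b defined in the Viterbi sketch above). The recurrence it uses is the Viterbi recurrence with the max replaced by a sum: α_j(t) = Σ_i α_i(t-1) * a_ij * b_ij(y_t).

    def forward(seq, init="S1"):
        # alpha[t][i] = P(y_1..y_t generated and HMM in state i at time t)
        alpha = [{s: (1.0 if s == init else 0.0) for s in states}]
        for sym in seq:
            prev = alpha[-1]
            alpha.append({j: sum(prev[i] * a[i][j] * b[i][j][sym] for i in states)
                          for j in states})
        return alpha

    alpha = forward("ACC")
    print(alpha[1])         # 0.48 and 0.2 at t=1, matching the Viterbi trellis
                            # (only one incoming term is non-zero there)
    print(alpha[3]["S2"])   # ~0.155: prob. of generating ACC and ending in S2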

Slide 12: Reminder: notations
- a sequence of length T: Y = y_1, y_2, ..., y_T
- the set of all sequences of length T
- a path of T+1 states that generates Y
- the set of all paths

Slide 13: Step 3: the Backward algorithm
- Next we need to compute β_i(t) using a backward computation.
- β_i(t) is the probability that our HMM will generate the rest of the sequence y_(t+1), y_(t+2), ..., y_T beginning in state i.

Slide 14: A trellis for the Backward Algorithm
States S_1 and S_2; time steps t = 0 to t = 3; output: ACC. At t=3: β_S1(3) = 0, β_S2(3) = 1.0 (S_2 is the final state).
t=2 (remaining output C):
  β_S1(2) = (0.6)(0.2)(0) + (0.4)(0.5)(1.0) = 0.2
  β_S2(2) = (0.1)(0.9)(0) + (0.9)(0.7)(1.0) = 0.63

Slide 15: A trellis for the Backward Algorithm (2)
t=1 (remaining output CC):
  β_S1(1) = (0.6)(0.2)(0.2) + (0.4)(0.5)(0.63) = .15
  β_S2(1) = (0.1)(0.9)(0.2) + (0.9)(0.7)(0.63) ≈ .415

Slide 16: A trellis for the Backward Algorithm (3)
t=0 (remaining output ACC):
  β_S1(0) = (0.6)(0.8)(0.15) + (0.4)(0.5)(0.415) ≈ .155
  β_S2(0) = (0.1)(0.1)(0.15) + (0.9)(0.3)(0.415) ≈ .1135
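A matching backward-pass sketch (again my addition, reusing states, a, and b from the Viterbi sketch). It fills the trellis right to left, and the printed values can be checked against slides 14-16.

    def backward(seq, final="S2"):
        # beta[t][i] = P(generating y_(t+1)..y_T | HMM in state i at time t)
        T = len(seq)
        beta = [None] * (T + 1)
        beta[T] = {s: (1.0 if s == final else 0.0) for s in states}
        for t in range(T - 1, -1, -1):
            sym = seq[t]   # this is y_(t+1) in the slides' 1-based notation
            beta[t] = {i: sum(a[i][j] * b[i][j][sym] * beta[t + 1][j] for j in states)
                       for i in states}
        return beta

    beta = backward("ACC")
    print(beta[2])   # 0.2 and 0.63, as on slide 14
    print(beta[1])   # 0.15 and ~0.415, as on slide 15
    print(beta[0])   # ~0.155 and ~0.1135, as on slide 16; note beta[0]["S1"]
                     # equals alpha[3]["S2"] from the forward sketch, as it should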

Slide 17: Step 4: Re-estimate the probabilities
- After running the Forward and Backward algorithms once, we can re-estimate all the probabilities in the HMM.
- α_SF is the probability that the HMM generated the entire sequence.
- Nice property of E-M: the value of α_SF never decreases; it converges to a local maximum.
- We can read off the α and β values from the Forward and Backward trellises.

Slide 18: Compute new transition probabilities
- γ_ij(t) is the probability of making the transition i -> j at time t, given the observed output.
- γ depends on the data, and it applies to only one time step; otherwise it plays the same role as a_ij.
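The formula itself did not come through in the transcript; a standard form consistent with the α, β, a, b, and α_SF defined on the slides above (my reconstruction, not verbatim from the slide) is:

    γ_ij(t) = α_i(t-1) * a_ij * b_ij(y_t) * β_j(t) / α_SF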

Slide 19: What is gamma?
- Sum γ_ij(t) over all time steps and we get the expected number of times that the transition i -> j was made while generating the sequence Y: Σ_t γ_ij(t).

Slide 20: How many times did we leave i?
- Sum γ_ix(t) over all time steps and over all states x that can follow i, and we get the expected number of times that a transition i -> x was made, for any state x: Σ_t Σ_x γ_ix(t).

Slide 21: Recompute transition probability
In other words, the probability of going from state i to state j is estimated by counting how often we took that transition for our data (C1), and dividing that by how often we went from i to any following state (C2).
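Written out with the sums from slides 19 and 20 (the slide's own formula was not transcribed), the re-estimate is:

    new a_ij = [ Σ_t γ_ij(t) ] / [ Σ_t Σ_x γ_ix(t) ]   (= C1 / C2)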

Slide 22: Recompute output probabilities
- Originally these were b_ij(k) values.
- We need:
  - the expected number of times that we made the transition i -> j and emitted the symbol k
  - the expected number of times that we made the transition i -> j

Slide 23: New estimate of b_ij(k)
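The formula on this slide was not transcribed; the ratio described on slide 22 (my reconstruction) is:

    new b_ij(k) = [ Σ over t with y_t = k of γ_ij(t) ] / [ Σ_t γ_ij(t) ]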

Slide 24: Step 5: Go to step 2
- Step 2 is the Forward algorithm.
- Repeat the entire process until the probabilities converge; usually this is rapid, requiring relatively few iterations.
- "Estimate-Maximize" because the algorithm first estimates probabilities, then maximizes them based on the data.
- "Forward-Backward" refers to the two computationally intensive steps in the algorithm.
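Putting the pieces together, here is a sketch of one complete re-estimation step over the training data from slide 9 (my addition; it reuses the states, a, b, forward, and backward definitions from the sketches above and does no scaling or smoothing, which is fine for these short sequences). Repeated calls should show the probability of the training data never decreasing, as noted on slide 17.

    def em_step(seqs, init="S1", final="S2"):
        # One Baum-Welch (E-M) iteration: accumulate expected transition and
        # emission counts via gamma, then renormalize a and b in place.
        num_a = {i: {j: 0.0 for j in states} for i in states}
        num_b = {i: {j: {"A": 0.0, "C": 0.0} for j in states} for i in states}
        data_prob = 1.0
        for seq in seqs:
            alpha, beta = forward(seq, init), backward(seq, final)
            total = alpha[len(seq)][final]            # alpha_SF for this sequence
            data_prob *= total
            for t, sym in enumerate(seq, start=1):    # transition that emitted y_t
                for i in states:
                    for j in states:
                        g = alpha[t-1][i] * a[i][j] * b[i][j][sym] * beta[t][j] / total
                        num_a[i][j] += g              # expected i -> j transitions
                        num_b[i][j][sym] += g         # ... that emitted this symbol
        for i in states:
            left_i = sum(num_a[i][j] for j in states)   # expected times we left i
            for j in states:
                for k in ("A", "C"):
                    b[i][j][k] = num_b[i][j][k] / num_a[i][j]
                a[i][j] = num_a[i][j] / left_i
        return data_prob   # P(training data) under the parameters before the update

    training = ["CACAACAAAACCCCCCACAA", "ACAACACACACACACACCAAAC",
                "CAACACACAAACCCC", "CAACCACCACACACACACCCCA",
                "CCCAAAACCCCAAAAACCC", "ACACAAAAAACCCAACACACAACA",
                "ACACAACCCCAAAACCACCAAAAA"]
    for iteration in range(10):
        print(iteration, em_step(training))   # should be non-decreasing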

Slide 25: Computing requirements
- The trellis has N nodes per column, where N is the number of states.
- The trellis has S columns, where S is the length of the sequence.
- Between each pair of columns we create E edges, one for each transition in the HMM.
- Total trellis size is approximately S(N + E).
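As a concrete check (my arithmetic, not from the slides): for the two-state sample model above, N = 2 and E = 4, so for the first length-20 training sequence on slide 9 the trellis has roughly S(N + E) = 20 * 6 = 120 nodes and edges.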