Hidden Markov Models (HMMs) Steven Salzberg CMSC 828H, Univ. of Maryland Fall 2010.

S. Salzberg CMSC 828H 2 What are HMMs used for?  Real-time continuous speech recognition (HMMs are the basis for all the leading products)  Eukaryotic and prokaryotic gene finding (HMMs are the basis of GENSCAN, Genie, VEIL, GlimmerHMM, TwinScan, etc.)  Multiple sequence alignment  Identification of sequence motifs  Prediction of protein structure

S. Salzberg CMSC 828H 3 What is an HMM? Essentially, an HMM is just: a set of states, and a set of transitions between states. Each transition has: a probability of taking the transition (moving from one state to another), a set of possible outputs, and a probability for each of those outputs. Equivalently, the output distributions can be attached to the states rather than the transitions.

S. Salzberg CMSC 828H 4 HMM notation. The set of all states: {s}. Initial states: S_I. Final states: S_F. Probability of making the transition from state i to state j: a_ij. A set of output symbols. Probability of emitting the symbol k while making the transition from state i to j: b_ij(k).

S. Salzberg CMSC 828H 5 HMM Example - Casino Coin. States: Fair and Unfair. Observation symbols: H, T. Two probability tables: state transition probabilities between Fair and Unfair, and symbol emission probabilities for each state. Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH. Hidden state sequence: FFFFFFUUUFFFFFFUUUUUUUFFFFFF. Motivation: given a sequence of H's and T's, can you tell at what times the casino cheated? Slide credit: Fatih Gelgi, Arizona State U.
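To make the pieces of the casino example concrete, here is a minimal Python sketch of the model with its outputs attached to states, as in the slide. The slide names the states and symbols but not the probabilities, so every number below (and the function name) is an illustrative assumption, not the lecture's actual parameters.

import random

# Hypothetical parameters - the slide does not specify these numbers.
states = ["F", "U"]                          # F = Fair, U = Unfair
start  = {"F": 1.0, "U": 0.0}                # assume play starts with the fair coin
trans  = {"F": {"F": 0.9, "U": 0.1},         # state transition probabilities
          "U": {"F": 0.2, "U": 0.8}}
emit   = {"F": {"H": 0.5, "T": 0.5},         # symbol emission probabilities per state
          "U": {"H": 0.8, "T": 0.2}}         # assume the unfair coin favors heads

def sample(length, seed=0):
    """Generate a hidden state sequence and the H/T observation sequence it emits."""
    rng = random.Random(seed)
    state = rng.choices(states, weights=[start[s] for s in states])[0]
    path, obs = [], []
    for _ in range(length):
        path.append(state)
        obs.append(rng.choices(["H", "T"],
                               weights=[emit[state]["H"], emit[state]["T"]])[0])
        state = rng.choices(states, weights=[trans[state][s] for s in states])[0]
    return "".join(path), "".join(obs)

print(sample(28))   # e.g. ('FFFFFFUU...', 'HTHHTT...') - compare with the slide's example

The decoding question on the slide is then: given only the second string, recover something close to the first.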

S. Salzberg CMSC 828H 6 HMM example: DNA. Consider the sequence AAACCC, and assume that you observed this output from the HMM shown on the slide. What sequence of states is most likely?

S. Salzberg CMSC 828H 7 Properties of an HMM. First-order Markov process: s_t depends only on s_{t-1}. However, note that the probability distributions may contain conditional probabilities. Time is discrete. Slide credit: Fatih Gelgi, Arizona State U.

S. Salzberg CMSC 828H 8 Three classic HMM problems. 1. Evaluation: given a model and an output sequence, what is the probability that the model generated that output? To answer this, we consider all possible paths through the model. A solution to this problem gives us a way of scoring the match between an HMM and an observed sequence. Example: we might have a set of HMMs representing protein families.

S. Salzberg CMSC 828H 9 Three classic HMM problems. 2. Decoding: given a model and an output sequence, what is the most likely state sequence through the model that generated the output? A solution to this problem gives us a way to match up an observed sequence and the states in the model. In gene finding, the states correspond to sequence features such as start codons, stop codons, and splice sites.

S. Salzberg CMSC 828H 10 Three classic HMM problems. 3. Learning: given a model and a set of observed sequences, how do we set the model's parameters so that it has a high probability of generating those sequences? This is perhaps the most important, and most difficult, problem. A solution to this problem allows us to determine all the probabilities in an HMM by using an ensemble of training data.

S. Salzberg CMSC 828H 11 An untrained HMM

S. Salzberg CMSC 828H 12 Basic facts about HMMs (1). The sum of the probabilities on all the edges leaving a state is 1: Σ_k a_jk = 1 for any given state j.

S. Salzberg CMSC 828H 13 Basic facts about HMMs (2). The sum of all the output probabilities attached to any edge is 1: Σ_k b_ij(k) = 1 for any transition i→j.

S. Salzberg CMSC 828H 14 Basic facts about HMMs (3). a_ij is a conditional probability; i.e., the probability that the model is in state j at time t+1 given that it was in state i at time t: a_ij = P(state j at time t+1 | state i at time t).

S. Salzberg CMSC 828H 15 Basic facts about HMMs (4). b_ij(k) is a conditional probability; i.e., the probability that the model generated k as output, given that it made the transition i→j at time t: b_ij(k) = P(output = k | transition i→j).

S. Salzberg CMSC 828H 16 Why are these Markovian? The probability of taking a transition depends only on the current state; this is sometimes called the Markov assumption. The probability of generating Y as output depends only on the transition i→j, not on previous outputs; this is sometimes called the output independence assumption. Computationally, it is possible to simulate an nth-order HMM using a 0th-order HMM; this is how some actual gene finders (e.g., VEIL) work, and a sketch of one standard construction follows below.
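The slide does not spell out how that simulation works, but one standard construction is to enlarge the state set so that each state remembers the recent history, leaving every individual transition memoryless. A minimal Python sketch for a 2nd-order chain over DNA symbols, with made-up uniform probabilities purely to exercise the construction:

from itertools import product

alphabet = "ACGT"

def expand_second_order(p2):
    """Turn 2nd-order probabilities p2[(a, b)][c] = P(c | a, b) into a
    1st-order chain whose states are the symbol pairs (a, b)."""
    trans = {}
    for a, b in product(alphabet, repeat=2):
        trans[(a, b)] = {}
        for c in alphabet:
            # from state (a, b) we can only move to states of the form (b, c)
            trans[(a, b)][(b, c)] = p2[(a, b)][c]
    return trans

# Hypothetical uniform 2nd-order model, just to run the code.
p2 = {(a, b): {c: 0.25 for c in alphabet} for a, b in product(alphabet, repeat=2)}
first_order = expand_second_order(p2)
print(first_order[("A", "C")][("C", "G")])   # 0.25

The same state-expansion idea lets HMM states encode context, so memoryless transitions can still capture higher-order dependencies in the output.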

S. Salzberg CMSC 828H 17 Solving the Evaluation problem: the Forward algorithm. To solve the Evaluation problem, we use the HMM and the data to build a trellis. Filling in the trellis will tell us the probability that the HMM generated the data, by accounting for all possible paths that could have generated it.

S. Salzberg CMSC 828H 18 Our sample HMM. Let S_1 be the initial state and S_2 be the final state. (The transition and output probabilities of this HMM appear as the first two factors on the edges of the trellis on the next slides.)

S. Salzberg CMSC 828H 19 A trellis for the Forward Algorithm. States S_1 and S_2; time steps t=0, 1, 2, 3; output: A C C. Each edge is labeled (transition prob.)(output prob.)(value at the source node). At t=0 the initial state S_1 has value 1.0 and S_2 has value 0. Edges into t=1, emitting A: S_1→S_1: (0.6)(0.8)(1.0); S_1→S_2: (0.4)(0.5)(1.0); S_2→S_1: (0.1)(0.1)(0); S_2→S_2: (0.9)(0.3)(0).

S. Salzberg CMSC 828H 20 A trellis for the Forward Algorithm (continued). After the first step, S_1 at t=1 has value (0.6)(0.8)(1.0) + (0.1)(0.1)(0) = 0.48 and S_2 at t=1 has value (0.4)(0.5)(1.0) + (0.9)(0.3)(0) = 0.2. Edges into t=2, emitting C: S_1→S_1: (0.6)(0.2)(0.48); S_1→S_2: (0.4)(0.5)(0.48); S_2→S_1: (0.1)(0.9)(0.2); S_2→S_2: (0.9)(0.7)(0.2). The value at S_2, t=2 is (0.4)(0.5)(0.48) + (0.9)(0.7)(0.2) = 0.096 + 0.126 = 0.222.

S. Salzberg CMSC 828H 21 A trellis for the Forward Algorithm (continued). The value at S_1, t=2 is (0.6)(0.2)(0.48) + (0.1)(0.9)(0.2) = 0.0576 + 0.018 = 0.0756. Edges into t=3, emitting C: S_1→S_1: (0.6)(0.2)(0.0756); S_1→S_2: (0.4)(0.5)(0.0756); S_2→S_1: (0.1)(0.9)(0.222); S_2→S_2: (0.9)(0.7)(0.222). The value at the final state S_2, t=3 is (0.4)(0.5)(0.0756) + (0.9)(0.7)(0.222) = 0.01512 + 0.13986 = 0.15498: the probability that this HMM generated ACC and ended in its final state.
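As a concrete check of the trellis, here is a small Python sketch of the Forward algorithm on this example. The transition and output probabilities are read off the trellis edge labels (the first two factors on each edge); the dictionary and function names are illustrative, not from the lecture.

# Sample HMM, with parameters read off the trellis edges.
states = ["S1", "S2"]
a = {("S1", "S1"): 0.6, ("S1", "S2"): 0.4,       # transition probabilities a_ij
     ("S2", "S1"): 0.1, ("S2", "S2"): 0.9}
b = {("S1", "S1"): {"A": 0.8, "C": 0.2},         # output probabilities b_ij(k)
     ("S1", "S2"): {"A": 0.5, "C": 0.5},
     ("S2", "S1"): {"A": 0.1, "C": 0.9},
     ("S2", "S2"): {"A": 0.3, "C": 0.7}}

def forward(output, initial="S1"):
    """Return, for each state j, P(output so far, state j) after consuming all symbols."""
    alpha = {s: (1.0 if s == initial else 0.0) for s in states}   # values at t = 0
    for symbol in output:
        # each new value sums over all edges coming into state j
        alpha = {j: sum(alpha[i] * a[(i, j)] * b[(i, j)][symbol] for i in states)
                 for j in states}
    return alpha

alpha = forward("ACC")
print(alpha["S1"])   # ~0.029052
print(alpha["S2"])   # ~0.15498 -> probability of ACC ending in the final state S_2

The intermediate values the loop produces (0.48 and 0.2 at t=1; 0.0756 and 0.222 at t=2) match the node values in the trellis above.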

S. Salzberg CMSC 828H 22 Forward algorithm: equations. A sequence of length T: Y = y_1 y_2 … y_T. The set of all sequences of length T: {Y}. A path of length T+1 that generates Y: X = x_0 x_1 … x_T. The set of all such paths: {X}.

S. Salzberg CMSC 828H 23 Forward algorithm: equations. P(Y | M) = Σ_X P(Y, X | M) = Σ_X P(Y | X, M) P(X | M). In other words, the probability of a sequence Y being emitted by an HMM M is the sum, over all paths X, of the probability that we took that path and that it emitted the sequence. Note that the paths are disjoint events - we take exactly one of them - so we can simply add their probabilities.

S. Salzberg CMSC 828H 24 Forward algorithm: transition probabilities. We re-write the first factor - the transition probability - using the Markov assumption, which lets us multiply probabilities just as we do for Markov chains: P(X | M) = Π_{t=1..T} P(x_t | x_{t-1}) = Π_{t=1..T} a_{x_{t-1} x_t}.

S. Salzberg CMSC 828H 25 Forward algorithm: output probabilities. We re-write the second factor - the output probability - using another Markov assumption: the output at any time depends only on the transition being taken at that time: P(Y | X, M) = Π_{t=1..T} P(y_t | x_{t-1}, x_t) = Π_{t=1..T} b_{x_{t-1} x_t}(y_t).

S. Salzberg CMSC 828H 26 Substitute back to get a computable formula: P(Y | M) = Σ_X Π_{t=1..T} a_{x_{t-1} x_t} b_{x_{t-1} x_t}(y_t). This quantity is what the Forward algorithm computes, recursively. Note that the only variables we need to consider at each step t are y_t, x_{t-1}, and x_t.
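To see that this sum-over-paths formula agrees with the trellis, here is a short brute-force sketch that enumerates every path explicitly; it assumes the states, a, and b dictionaries defined in the Forward-algorithm sketch above. It is exponential in the sequence length, which is exactly why the recursive formulation on the next slide matters.

from itertools import product

# Uses the sample-HMM dictionaries (states, a, b) from the earlier Forward sketch.
def brute_force(output, states, a, b, initial="S1", final="S2"):
    """Sum, over every state path from the initial to the final state,
    the product of transition and output probabilities along the path."""
    total = 0.0
    for middle in product(states, repeat=len(output) - 1):
        path = (initial,) + middle + (final,)            # x_0 ... x_T
        p = 1.0
        for t, symbol in enumerate(output):
            i, j = path[t], path[t + 1]
            p *= a[(i, j)] * b[(i, j)][symbol]           # a_{x_{t-1} x_t} * b_{x_{t-1} x_t}(y_t)
        total += p
    return total

print(brute_force("ACC", states, a, b))   # ~0.15498, the same value the trellis produced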

S. Salzberg CMSC 828H 27 Forward algorithm: recursive formulation. α_j(t) = Σ_i α_i(t-1) a_ij b_ij(y_t), where α_i(t) is the probability that the HMM is in state i after generating the sequence y_1, y_2, …, y_t, and α_i(0) is 1 for the initial state and 0 otherwise.

S. Salzberg CMSC 828H 28 Probability of the model. The Forward algorithm computes P(y|M). If we are comparing two or more models, we want the likelihood that each model generated the data: P(M|y). Use Bayes' law: P(M|y) = P(y|M) P(M) / P(y). Since P(y) is constant for a given input, we just need to maximize P(y|M) P(M).
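A minimal sketch of this model-comparison step, assuming we already have the Forward probabilities P(y|M) for each candidate model and a prior P(M) over models; the model names and numbers here are made up for illustration.

# Hypothetical Forward probabilities P(y|M) and priors P(M) for two candidate models.
likelihood = {"model_fair": 1.2e-9, "model_cheat": 4.5e-8}   # from the Forward algorithm
prior      = {"model_fair": 0.7,    "model_cheat": 0.3}

# Bayes' law: P(M|y) is proportional to P(y|M) * P(M), since P(y) is the same for all models.
score = {m: likelihood[m] * prior[m] for m in likelihood}
best = max(score, key=score.get)
print(best, score[best])   # the model most likely to have generated y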