Lecture 8: Hidden Markov Models (HMMs)
Prepared by Michael Gutkin and Shlomi Haba. Originally presented at Yaakov Stein's DSPCSP Seminar, spring 2002. Modified by Benny Chor, using also some slides of Nir Friedman (Hebrew Univ.), for the Computational Genomics Course, Tel-Aviv Univ., Dec. 2002.

Outline
Discrete Markov Models
Hidden Markov Models
Three major questions:
Q1. Computing the probability of a given observation.
A1. Forward-Backward (Baum-Welch) DP algorithm.
Q2. Computing the most probable sequence of states, given an observation.
A2. Viterbi DP algorithm.
Q3. Given an observation, learn the best model.
A3. Expectation Maximization (EM): a heuristic.

Markov Models
A discrete (finite) system: N distinct states. It begins (at time t=1) in some initial state. At each time step (t=1,2,…) the system moves from the current state to the next state (possibly the same as the current one) according to transition probabilities associated with the current state. This kind of system is called a Discrete Markov Model.

Discrete Markov Model
Example: a Discrete Markov Model with 5 states. Each a_ij represents the probability of moving from state i to state j. The a_ij are given in a matrix A = {a_ij}. The probability to start in a given state i is π_i; the vector π represents these start probabilities.
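
As a minimal sketch of such a system (assuming NumPy and an illustrative 3-state transition matrix, since the slide's 5-state matrix lives in the figure), a discrete Markov model is fully specified by A and π and can be sampled step by step:

```python
import numpy as np

# Hypothetical 3-state example; each row of A sums to 1,
# pi is the start-probability vector.
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])
pi = np.array([1/3, 1/3, 1/3])

def sample_chain(A, pi, T, rng=np.random.default_rng(0)):
    """Draw a state sequence of length T from a discrete Markov model."""
    states = [rng.choice(len(pi), p=pi)]                     # initial state ~ pi
    for _ in range(T - 1):
        states.append(rng.choice(len(A), p=A[states[-1]]))   # next state ~ row of A
    return states

print(sample_chain(A, pi, 8))   # a length-8 state sequence
```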

Types of Models
Ergodic model: strongly connected – there is a directed path with positive probability from each state i to each state j (but not necessarily a complete directed graph).

Types of Models (cont.)
Left-to-Right (LR) model: the index of the state is non-decreasing with time.

Discrete Markov Model – Example
States – Rainy: 1, Cloudy: 2, Sunny: 3. (The transition matrix A and the observation O are given in the slide's figure.) Problem – given that the weather on day 1 (t=1) is sunny (3), what is the probability of the observation O?

Discrete Markov Model – Example (cont.)
The answer is the product of the transition probabilities along the observed sequence:
P(O|M) = π_{O_1} · a_{O_1 O_2} · a_{O_2 O_3} · … · a_{O_{T-1} O_T},
where here π puts all of its mass on the sunny state.
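
A hedged worked version of this computation (the matrix values below are the classic Rabiner weather-example numbers, used only as a stand-in for the figure's matrix):

```python
import numpy as np

# Hypothetical transition matrix for states rainy=0, cloudy=1, sunny=2
# (illustrative values; the slide's actual matrix is in the figure).
A = np.array([[0.4, 0.3, 0.3],
              [0.2, 0.6, 0.2],
              [0.1, 0.1, 0.8]])

def sequence_prob(O, A, pi):
    """P(O|M) = pi[O_1] * prod_t A[O_{t-1}, O_t] for a fully observed chain."""
    p = pi[O[0]]
    for prev, cur in zip(O, O[1:]):
        p *= A[prev, cur]
    return p

# Conditioning on day 1 being sunny: all initial mass on state 2.
pi = np.array([0.0, 0.0, 1.0])
O = [2, 2, 2, 0, 0, 2, 1, 2]     # sunny,sunny,sunny,rain,rain,sunny,cloudy,sunny
print(sequence_prob(O, A, pi))   # ~1.536e-4 with these illustrative numbers
```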

Hidden Markov Models (probabilistic finite state automata)
Often we face scenarios where states cannot be directly observed. We need an extension: Hidden Markov Models.
(Figure: a 4-state automaton with transition probabilities a_11, a_12, a_22, a_23, a_33, a_34, a_44 and output probabilities b_ik connecting the states to the observed phenomenon.)
a_ij are state transition probabilities. b_ik are observation (output) probabilities. The output probabilities of each state sum to one: b_11 + b_12 + b_13 + b_14 = 1, b_21 + b_22 + b_23 + b_24 = 1, etc.
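
A minimal container for M = {A, B, π}, with a sampler that makes the "hidden" part concrete: the caller sees only the emitted symbols, never the state path. This is an illustrative sketch, not code from the lecture:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class HMM:
    A:  np.ndarray   # A[i, j] = P(next state j | current state i)
    B:  np.ndarray   # B[i, k] = P(emit symbol k | state i); rows sum to 1
    pi: np.ndarray   # pi[i]   = P(initial state i)

    def generate(self, T, rng=None):
        """Sample T observations; the state path stays hidden from the caller."""
        rng = rng or np.random.default_rng()
        s = rng.choice(len(self.pi), p=self.pi)
        obs = []
        for _ in range(T):
            obs.append(int(rng.choice(self.B.shape[1], p=self.B[s])))
            s = rng.choice(len(self.A), p=self.A[s])
        return obs
```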

Example: Dishonest Casino
(Figure: a casino alternating between a fair die and a loaded die.) Actually, what is hidden in this model? The die currently in use – we observe only the sequence of outcomes.

Biological Example: CpG Islands
In the human genome, CpG dinucleotides are relatively rare. CpG pairs undergo a process called methylation that modifies the C nucleotide. A methylated C can (with relatively high probability) mutate to a T. Promoter regions are CpG rich; these regions are not methylated, and thus mutate less often. They are called CpG islands.

CpG Islands
We construct two Markov chains: one for CpG rich regions, one for CpG poor regions. Using observations from 60K nucleotides, we estimate two models, '+' (rich) and '−' (poor).

HMMs – Question I
Given an observation sequence O = (O_1 O_2 O_3 … O_T) and a model M = {A, B, π}, how do we efficiently compute P(O|M), the probability that the given model M produces the observation O in a run of length T?
This probability can be viewed as a measure of the quality of the model M. Viewed this way, it enables discrimination/selection among alternative models.

HMM – Question II (Harder)
Given an observation sequence O = (O_1 O_2 O_3 … O_T) and a model M = {A, B, π}, how do we efficiently compute the most probable sequence(s) of states Q? That is, the sequence Q = (Q_1 Q_2 Q_3 … Q_T) which maximizes the joint probability P(O,Q|M) = P(O|Q,M)·P(Q|M) that the given model M produces the given observation O while going through the specific sequence of states Q.
Recall that given a model M, a sequence of observations O, and a sequence of states Q, we can efficiently compute P(O|Q,M) (watching out for numeric underflows).

HMM – Question III (Hardest)
Given an observation sequence O = (O_1 O_2 O_3 … O_T) and a class of models, each of the form M = {A, B, π}, which specific model "best" explains the observations?
A solution to Question I enables the efficient computation of P(O|M) (the probability that a specific model M produces the observation O). Question III can be viewed as a learning problem: we want to use the sequence of observations to "train" an HMM and learn the optimal underlying model parameters (transition and output probabilities).

HMM Recognition (Question I)
For a given model M = {A, B, π} and a given state sequence Q_1 Q_2 … Q_T, the probability of an observation sequence O_1 O_2 … O_T is
P(O|Q,M) = b_{Q_1 O_1} · b_{Q_2 O_2} · … · b_{Q_T O_T}.
For a given hidden Markov model M = {A, B, π}, the probability of the state sequence Q_1 Q_2 … Q_T is (the initial probability of Q_1 is taken to be π_{Q_1})
P(Q|M) = π_{Q_1} · a_{Q_1 Q_2} · a_{Q_2 Q_3} · … · a_{Q_{T-1} Q_T}.
So, for a given hidden Markov model M, the probability of an observation sequence O_1 O_2 … O_T is obtained by summing over all possible state sequences.

HMM – Recognition (cont.)
P(O|M) = Σ_Q P(O|Q,M) P(Q|M) = Σ_Q π_{Q_1} b_{Q_1 O_1} a_{Q_1 Q_2} b_{Q_2 O_2} a_{Q_2 Q_3} b_{Q_3 O_3} …
This requires summing over exponentially many paths, but it can be made more efficient.
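
The exponential blow-up is easy to see in a direct implementation. The sketch below (illustrative, NumPy-based) literally sums over all N^T paths, so it is only usable for tiny problems:

```python
import itertools
import numpy as np

def brute_force_likelihood(A, B, pi, O):
    """P(O|M) by summing pi * prod(a) * prod(b) over all N**T state paths."""
    N, T = len(pi), len(O)
    total = 0.0
    for Q in itertools.product(range(N), repeat=T):   # N**T sequences
        p = pi[Q[0]] * B[Q[0], O[0]]
        for t in range(1, T):
            p *= A[Q[t-1], Q[t]] * B[Q[t], O[t]]
        total += p
    return total
```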

HMM – Recognition (cont.)
Why isn't the direct computation efficient? It costs O(2T·Q^T). For a given state sequence of length T we have about 2T multiplications:
P(Q|M) = π_{Q_1} a_{Q_1 Q_2} a_{Q_2 Q_3} … a_{Q_{T-1} Q_T}
P(O|Q,M) = b_{Q_1 O_1} b_{Q_2 O_2} … b_{Q_T O_T}
There are Q^T possible state sequences. So, if Q=5 and T=100, the direct algorithm requires about 2·100·5^100 ≈ 10^72 computations. We can use the forward-backward (F-B) algorithm instead.

The F-B Algorithm
Some definitions:
1. Legal final state – a state at which a path through the model may end.
2. α – the "forward-going" variable: α_t(i) = P(O_1 … O_t, Q_t = i | M).
3. β – the "backward-going" variable: β_t(i) = P(O_{t+1} … O_T | Q_t = i, M).
4. a(j|i) = a_ij; b(O|i) = b_iO.
5. O_1^t – the observations O_1 O_2 … O_t at times 1, 2, …, t (O_1 at t=1, O_2 at t=2, etc.).

The F-B Algorithm (cont.)
α can be calculated recursively. Starting condition:
α_1(i) = π_i · b_i(O_1).
Moving from state i to state j – but since we can enter state j from all other states, we sum over them:
α_{t+1}(j) = [Σ_i α_t(i) · a_ij] · b_j(O_{t+1}).

The F-B Algorithm (cont.)
Now we can work sequentially, t = 1, 2, …, T, and at time t = T we get what we wanted:
P(O|M) = Σ_{i ∈ legal final states} α_T(i).

The F-B Algorithm (cont.)
The full algorithm combines the initialization, the recursion over t = 2, …, T, and the final summation above.
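
A compact sketch of the forward pass in NumPy (assuming, for simplicity, that every state is a legal final state, so the termination step sums over all of them):

```python
import numpy as np

def forward(A, B, pi, O):
    """alpha[t, i] = P(O_1..O_t, Q_t = i | M); returns alpha and P(O|M)."""
    N, T = len(pi), len(O)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                    # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]  # enter state j from all states i
    return alpha, alpha[-1].sum()                 # termination: sum over final states
```

This runs in O(T·N^2) time, versus O(2T·N^T) for the direct summation; its result should agree with the brute-force sketch above on small examples.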

The F-B Algorithm (cont.)
Here the likelihood is measured using every sequence of states of length T; this is known as the "Any Path" method. Alternatively, we can score an HMM by the probability generated using the single best sequence of states; we'll refer to this as the "Best Path" method.

Most Probable State Sequence (Question II)
Idea: if we know the value of Q_i, then the most probable sequence of states at times i+1, …, T does not depend on observations before time i. Let V_l(i) be the probability of the best sequence Q_1, …, Q_i such that Q_i = l.

Viterbi Algorithm
A DP problem over a grid:
X – frame index t (time)
Q – state index i
Constraints:
Every path must advance in time by one, and only one, time step per path segment.
Final grid points on any path must be of the form (T, i_f), where i_f is a legal final state in the model.

Viterbi Algorithm (cont.)
Cost:
Node (t,i) – the probability to emit the observation y(t) in state i, i.e. b_{i,y(t)}.
Transition from (t−1,i) to (t,j) – the probability to change state from i to j, i.e. a_ij.
The total cost associated with a path is given by the product of the costs (type B).
Initial transition cost: a_0i = π_i.
Goal: the best path is the one of maximum cost.

Viterbi Algorithm (cont.)
We can use the trick of taking negative logarithms: multiplications of probabilities are expensive and numerically problematic (underflow), while sums of numerically stable numbers are simpler. The problem is thereby turned into a minimal-cost path search.
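
A sketch of Viterbi along these lines (illustrative; it maximizes sums of logs rather than minimizing negative logs, which is equivalent):

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Most probable state path, computed in log space to avoid underflow."""
    with np.errstate(divide="ignore"):            # log(0) -> -inf is acceptable here
        logA, logB, logpi = np.log(A), np.log(B), np.log(pi)
    N, T = len(pi), len(O)
    V = np.full((T, N), -np.inf)                  # V[t, j] = best log-prob ending in j
    back = np.zeros((T, N), dtype=int)            # backpointers for path recovery
    V[0] = logpi + logB[:, O[0]]
    for t in range(1, T):
        scores = V[t-1][:, None] + logA           # scores[i, j]: come from i into j
        back[t] = scores.argmax(axis=0)
        V[t] = scores.max(axis=0) + logB[:, O[t]]
    path = [int(V[-1].argmax())]
    for t in range(T - 1, 0, -1):                 # backtrack from the best final state
        path.append(int(back[t, path[-1]]))
    return path[::-1], V[-1].max()
```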

HMM – EM Training
Using the Baum-Welch algorithm, which is an EM algorithm:
Estimate – approximate the result.
Maximize – and if needed, re-estimate.
The estimation algorithm is based on the DP algorithms above (F-B and Viterbi).

HMM – EM Training (cont.)
Initialization: begin with an arbitrary model M.
Estimate: evaluate the likelihood P(O|M); along the way, keep track of some tallies (expected counts). Recalculate the matrices A and B, e.g.
a_ij = (expected number of transitions from i to j) / (expected number of transitions exiting state i).
Maximize: if the improvement P(O|M̄) − P(O|M) ≥ ε for the re-estimated model M̄, set M = M̄ and re-estimate.
Use several initial models to find a favorable local maximum of P(O|M).
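
One re-estimation pass can be sketched as follows (illustrative and unscaled: a production version would rescale α and β, or work in log space, to avoid underflow on long sequences):

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One EM re-estimation pass for a single observation sequence O."""
    N, T = len(pi), len(O)
    # E-step: forward and backward passes.
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    likelihood = alpha[-1].sum()                  # P(O|M), for the convergence test
    gamma = alpha * beta / likelihood             # gamma[t, i] = P(Q_t = i | O, M)
    # xi[t, i, j] = P(Q_t = i, Q_{t+1} = j | O, M)
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, O[1:]].T * beta[1:])[:, None, :]) / likelihood
    # M-step: expected counts -> new parameters.
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                   # expected emissions of symbol k
        new_B[:, k] = gamma[np.array(O) == k].sum(axis=0)
    new_B /= gamma.sum(axis=0)[:, None]
    return new_A, new_B, gamma[0], likelihood     # gamma[0] is the new pi
```

Iterating this step until the likelihood improvement drops below ε implements the Estimate/Maximize loop described above.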

HMM – Training (cont.)
Why only a local maximum? The likelihood surface has many peaks, and EM only climbs the one nearest its starting point – hence the use of several initial models.

Auxiliary
(Figures: Physiology; Model.)

Auxiliary (cont.)
(Figure: Articulation.)

Auxiliary (cont.)
(Figures: Spectrogram; Patterson–Barney diagram – mapping by the formants.)