Ch10 HMM Model 10.1 Discrete-Time Markov Process 10.2 Hidden Markov Models 10.3 The Three Basic Problems for HMMs and Their Solutions 10.4 Types of HMMs.


Ch10 HMM Model 10.1 Discrete-Time Markov Process 10.2 Hidden Markov Models 10.3 The Three Basic Problems for HMMs and Their Solutions 10.4 Types of HMMs 10.5 Continuous Observation Densities in HMMs

10.1 Discrete-Time Markov Process (1) At any time the system is in one of a set of N distinct states, indexed by {1, 2, …, N}. At regularly spaced times the system undergoes a change of state (possibly back to the same state) according to a set of probabilities associated with the current state. Time is denoted t = 1, 2, …, and the state at time t is denoted q_t.

Discrete-Time Markov Model (2) The discrete-time, first-order Markov chain is defined by P[q_t = j | q_{t-1} = i] = a_ij, 1 <= i, j <= N, where a_ij >= 0 and Σ_{j=1}^{N} a_ij = 1. A weather model is a typical example. Such a model is called an observable Markov model, because every state corresponds to an observable event.

Discrete-Time Markov Model (3) Given the state-transition matrix, many questions can be answered directly. (1) What is the probability of a particular weather sequence, e.g. O = (sun, sun, sun, rain, rain, sun, cloudy, sun)? Compute P(O | Model). (2) What is the probability that the system stays in state i for exactly d time steps, i.e. O = (i, i, …, i, j) with j ≠ i, where i occupies times 1 through d and j appears at time d+1?
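A minimal sketch of question (1), assuming a hypothetical three-state weather chain (0 = sun, 1 = cloudy, 2 = rain) with illustrative values for the transition matrix A and initial distribution π; the numbers are placeholders, not values from the slides:

    import numpy as np

    # Hypothetical 3-state weather chain: 0 = sun, 1 = cloudy, 2 = rain.
    # A[i, j] = P[q_t = j | q_{t-1} = i]; pi[i] = P[q_1 = i].
    A = np.array([[0.8, 0.1, 0.1],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.3, 0.5]])
    pi = np.array([0.5, 0.3, 0.2])

    def markov_sequence_prob(states, A, pi):
        # P(O | Model) = pi_{q1} * a_{q1 q2} * ... * a_{q_{T-1} q_T}
        p = pi[states[0]]
        for prev, nxt in zip(states[:-1], states[1:]):
            p *= A[prev, nxt]
        return p

    O = [0, 0, 0, 2, 2, 0, 1, 0]   # sun, sun, sun, rain, rain, sun, cloudy, sun
    print(markov_sequence_prob(O, A, pi))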

Discrete-Time Markov Model (4) The probability that the system stays in state i for exactly the first d instants is p_i(d) = (a_ii)^{d-1} (1 - a_ii), a geometric duration distribution, and the expected duration is d̄_i = Σ_d d·p_i(d) = 1/(1 - a_ii).
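A quick numerical check of the duration formulas, assuming an illustrative self-transition probability a_ii = 0.8; the mean of the geometric distribution p_i(d) should come out close to 1/(1 - a_ii) = 5:

    # p_i(d) = a_ii^(d-1) * (1 - a_ii); expected duration should be 1 / (1 - a_ii).
    a_ii = 0.8
    p = lambda d: a_ii ** (d - 1) * (1 - a_ii)
    mean_duration = sum(d * p(d) for d in range(1, 10000))
    print(mean_duration, 1 / (1 - a_ii))   # both approximately 5.0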

Hidden Markov Model (1) Extension: the observation is a probabilistic function of the state. The result is a doubly embedded stochastic process: the underlying stochastic process (the state sequence) is not directly observable (it is hidden) and can be observed only through another set of stochastic processes that produce the sequence of observations. This is where the name "hidden Markov model" comes from.

Hidden Markov Model (2) Coin-toss model: if P(H) = P(T) = 0.5, what is the probability that the next 10 tosses will produce the sequence (HHTHTTHTH)? Or (HHHHHHHHHH)? What is the probability that 5 of the next 10 tosses will be tails? Given an observation sequence, many different models could have produced it, each with a different probability.

Hidden Markov Model (3) The urn-and-ball model: there are several urns, each containing many balls of different colors. Given an observation sequence (a sequence of ball colors), there are many possible interpretations of it. Here the urns are the states, and the ball colors are the observable events.

Hidden Markov Model (4) Elements of an HMM: (1) the state set q = {q_1, q_2, …, q_N}, or {1, 2, …, N} for short, where N is the number of states; (2) the observation symbol set V = {v_1, v_2, …, v_M}, where M is the number of observation symbols; (3) the state-transition probability distribution A = {a_ij}.

Hidden Markov Model (5) a_ij = P[q_{t+1} = j | q_t = i], 1 <= i, j <= N. (4) The observation symbol probability distribution B = {b_j(k)}, where b_j(k) = P[o_t = v_k | q_t = j], 1 <= k <= M. (5) The initial state distribution π = {π_i}, where π_i = P[q_1 = i], 1 <= i <= N. The model is often written compactly as λ = (A, B, π).

Hidden Markov Model (6) Given an HMM, it can be used as a generator to produce an observation sequence O = (o_1, o_2, …, o_T), where T is the number of observations and each o_t is one of the symbols from V (in the discrete case).
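A sketch of using λ = (A, B, π) as a generator of an observation sequence, with a hypothetical N = 2 state, M = 3 symbol model; all parameter values here are illustrative, not taken from the text:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical model lambda = (A, B, pi): N = 2 states, M = 3 observation symbols.
    A  = np.array([[0.7, 0.3],
                   [0.4, 0.6]])        # a_ij = P[q_{t+1} = j | q_t = i]
    B  = np.array([[0.5, 0.4, 0.1],
                   [0.1, 0.3, 0.6]])   # b_j(k) = P[o_t = v_k | q_t = j]
    pi = np.array([0.6, 0.4])          # pi_i = P[q_1 = i]

    def generate(A, B, pi, T):
        # Draw a state sequence q and an observation sequence O of length T.
        q, O = [], []
        state = rng.choice(len(pi), p=pi)
        for _ in range(T):
            q.append(int(state))
            O.append(int(rng.choice(B.shape[1], p=B[state])))
            state = rng.choice(A.shape[0], p=A[state])
        return q, O

    q, O = generate(A, B, pi, T=10)
    print("states:      ", q)
    print("observations:", O)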

10.3 Three Basic Problems of HMMs (1) Problem 1 (Evaluation): given the observation sequence O and the model λ, how do we efficiently calculate P(O|λ)? Problem 2 (Decoding): given the observation sequence O and the model λ, how do we choose an optimal state sequence q = (q_1, q_2, …, q_T)?

Three Basic Problems of HMMs (2) Problem 3 (Training): how do we adjust the model parameters λ = (A, B, π) to maximize P(O|λ)? Solution to Problem 1: in principle, every possible state sequence contributes to P(O|λ). For a state sequence q = (q_1, q_2, …, q_T):

Three Basic Problems of HMMs (3) P(O|q,λ) = b_{q_1}(o_1) b_{q_2}(o_2) … b_{q_T}(o_T), P(q|λ) = π_{q_1} a_{q_1 q_2} a_{q_2 q_3} … a_{q_{T-1} q_T}, P(O,q|λ) = P(O|q,λ) P(q|λ), and P(O|λ) = Σ_q P(O|q,λ) P(q|λ) = Σ_{q_1,…,q_T} π_{q_1} b_{q_1}(o_1) a_{q_1 q_2} b_{q_2}(o_2) … a_{q_{T-1} q_T} b_{q_T}(o_T). This direct computation requires on the order of 2T·N^T calculations, which is infeasible: for N = 5 and T = 100 that is about 2·100·5^100 ≈ 10^72 computations. A more efficient procedure is required to solve Problem 1.
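For a sanity check, the sum over all N^T state sequences can be written out directly; this brute-force version (reusing the hypothetical two-state model from the generator sketch above) is only feasible for very small T and illustrates why a better procedure is needed:

    import itertools
    import numpy as np

    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    pi = np.array([0.6, 0.4])
    O  = [0, 2, 1, 0]                  # a short observation sequence
    N, T = len(pi), len(O)

    prob = 0.0
    for q in itertools.product(range(N), repeat=T):   # all N**T state sequences
        p = pi[q[0]] * B[q[0], O[0]]
        for t in range(1, T):
            p *= A[q[t - 1], q[t]] * B[q[t], O[t]]
        prob += p
    print("P(O | lambda) =", prob)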

Three Basic Problems of HMMs (4) The Forward Procedure. Define α_t(i) = P(o_1, o_2, …, o_t, q_t = i | λ), the probability of the partial observation sequence o_1, o_2, …, o_t (up to time t) and being in state i at time t, given the model λ. The iterative procedure is: (1) Initialization: α_1(i) = π_i b_i(o_1), 1 <= i <= N. (2) Induction: α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1}), t = 1, …, T-1, 1 <= j <= N. (3) Termination: P(O|λ) = Σ_{i=1}^{N} α_T(i). This procedure requires on the order of N^2·T calculations rather than 2T·N^T.
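A sketch of the forward procedure on the same hypothetical discrete model; alpha is stored as a T x N array, and the final value should match the brute-force sum above:

    import numpy as np

    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    pi = np.array([0.6, 0.4])

    def forward(O, A, B, pi):
        # alpha[t, i] corresponds to alpha_{t+1}(i) in the slides (0-based time index).
        T, N = len(O), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, O[0]]                          # initialization
        for t in range(T - 1):
            alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]  # induction
        return alpha, alpha[-1].sum()                       # termination: P(O | lambda)

    alpha, prob = forward([0, 2, 1, 0], A, B, pi)
    print("P(O | lambda) =", prob)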

Three Basic Problems of HMMs (5) The Backward Procedure. Define β_t(i) = P(o_{t+1}, o_{t+2}, …, o_T | q_t = i, λ), the probability of the partial observation sequence from o_{t+1} to the end, given state i at time t and the model λ. The iterative procedure is: (1) Initialization: β_T(i) = 1, 1 <= i <= N. (2) Induction: β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j), t = T-1, T-2, …, 1, 1 <= i <= N. (3) Termination: P(O|λ) = Σ_{i=1}^{N} π_i b_i(o_1) β_1(i). It also requires on the order of N^2·T calculations.
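A matching sketch of the backward procedure; its termination step should give the same P(O|λ) as the forward pass:

    import numpy as np

    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    pi = np.array([0.6, 0.4])

    def backward(O, A, B, pi):
        # beta[t, i] corresponds to beta_{t+1}(i) in the slides (0-based time index).
        T, N = len(O), len(pi)
        beta = np.zeros((T, N))
        beta[-1] = 1.0                                      # initialization
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])    # induction
        return beta, (pi * B[:, O[0]] * beta[0]).sum()      # termination

    beta, prob = backward([0, 2, 1, 0], A, B, pi)
    print("P(O | lambda) =", prob)                          # same value as the forward pass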

Three Basic Problems of HMMs (6) Solution to Problem 2. The first issue is how to define "optimality". The most widely used criterion is to find the single best state sequence (path) that maximizes P(q|O,λ), which is equivalent to maximizing P(q,O|λ). The formal technique, based on dynamic programming, is called the Viterbi algorithm. The Viterbi Algorithm: define δ_t(i) = max_{q_1 q_2 … q_{t-1}} P(q_1 q_2 … q_{t-1}, q_t = i, o_1 o_2 … o_t | λ), the best score along a single path, at time t, which accounts for the first t observations and ends in state i.

Three Basic Problems of HMMs (7) By induction, δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_j(o_{t+1}). The iterative procedure: (1) Initialization: δ_1(i) = π_i b_i(o_1), ψ_1(i) = 0, i = 1, …, N. (2) Recursion: δ_t(j) = [max_{1<=i<=N} δ_{t-1}(i) a_ij] b_j(o_t) and ψ_t(j) = argmax_{1<=i<=N} [δ_{t-1}(i) a_ij], for j = 1, …, N and t = 2, …, T. (3) Termination: P* = max_{1<=i<=N} δ_T(i), q_T* = argmax_{1<=i<=N} δ_T(i). (4) Path backtracking: q_t* = ψ_{t+1}(q_{t+1}*), t = T-1, T-2, …, 1. An alternative Viterbi implementation uses logarithms to avoid underflow.
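A sketch of the Viterbi algorithm on the same hypothetical model, including the backtracking step; a production version would work with log probabilities to avoid underflow, as the slide notes:

    import numpy as np

    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    pi = np.array([0.6, 0.4])

    def viterbi(O, A, B, pi):
        # Returns the best state path q* and its score P* = max_q P(q, O | lambda).
        T, N = len(O), len(pi)
        delta = np.zeros((T, N))
        psi = np.zeros((T, N), dtype=int)
        delta[0] = pi * B[:, O[0]]                         # initialization
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A             # delta_{t-1}(i) * a_ij
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) * B[:, O[t]]     # recursion
        q = [int(delta[-1].argmax())]                      # termination
        for t in range(T - 1, 0, -1):                      # path backtracking
            q.insert(0, int(psi[t][q[0]]))
        return q, delta[-1].max()

    path, score = viterbi([0, 2, 1, 0], A, B, pi)
    print("best path:", path, " P* =", score)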

Three Basic Problems of HMMs (8) Solution to Problem 3. There is no analytic solution; only iterative procedures are available, such as the Baum-Welch method (also known as expectation-maximization, EM). Baum re-estimation procedure: define ξ_t(i,j) = P(q_t = i, q_{t+1} = j | O, λ). Then ξ_t(i,j) = P(q_t = i, q_{t+1} = j, O | λ) / P(O|λ) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O|λ) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / Σ_{i=1}^{N} Σ_{j=1}^{N} α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j).

Three Basic Problems of HMMs (9) Define γ_t(i) = P(q_t = i | O, λ), the probability of being in state i at time t, given O and λ. γ_t(i) = P(q_t = i, O | λ) / P(O|λ) = P(q_t = i, O | λ) / Σ_{i=1}^{N} P(q_t = i, O | λ) = α_t(i) β_t(i) / Σ_{i=1}^{N} α_t(i) β_t(i). Also γ_t(i) = Σ_{j=1}^{N} ξ_t(i,j). Summing γ_t(i) over the time index t gives the expected number of times that state i is visited or, equivalently, the expected number of transitions made from state i. The sum of ξ_t(i,j) over t is the expected number of transitions from state i to state j.

Three Basic Problems of HMMs (10) The re-estimation formulas are then π_i' = γ_1(i), a_ij' = Σ_{t=1}^{T-1} ξ_t(i,j) / Σ_{t=1}^{T-1} γ_t(i), and b_j'(k) = Σ_{t: o_t = v_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j), where the numerator sums only over those t (1 <= t <= T) for which the observation o_t equals v_k. These are the iterative update formulas for the model parameters. The initial parameters λ_0 may be, for example, uniform distributions. Then α_t(i) and β_t(i) (1 <= i <= N, 1 <= t <= T) are calculated for all training samples, ξ_t(i,j) and γ_t(i) are calculated from them, and λ is updated as above; the process is repeated until convergence.
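A sketch of one Baum-Welch re-estimation pass on the hypothetical two-state model, combining the forward/backward variables with the ξ, γ and update formulas above; a real implementation would iterate to convergence and typically use scaled or log quantities:

    import numpy as np

    A  = np.array([[0.7, 0.3], [0.4, 0.6]])
    B  = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
    pi = np.array([0.6, 0.4])
    O  = [0, 2, 1, 0, 1, 2, 2, 0]
    T, N, M = len(O), len(pi), B.shape[1]

    # Forward and backward variables (as in the earlier sketches).
    alpha = np.zeros((T, N)); beta = np.ones((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(T - 1):
        alpha[t + 1] = (alpha[t] @ A) * B[:, O[t + 1]]
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    P = alpha[-1].sum()                                    # P(O | lambda)

    # xi[t, i, j] = P(q_t = i, q_{t+1} = j | O, lambda); gamma[t, i] = P(q_t = i | O, lambda)
    xi = np.array([np.outer(alpha[t], B[:, O[t + 1]] * beta[t + 1]) * A / P
                   for t in range(T - 1)])
    gamma = alpha * beta / P

    # Re-estimation formulas.
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.array([[gamma[np.array(O) == k, j].sum() for k in range(M)]
                      for j in range(N)]) / gamma.sum(axis=0)[:, None]
    print(pi_new, A_new, B_new, sep="\n")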

10.4 Types of HMMs (1) Full connection (ergodic model): A is an N x N square matrix with no zero elements. There are other types as well, for example the left-right HMM, in which the state index is non-decreasing as time goes on. For this model a_ij = 0 for j < i, π_i = 1 only for i = 1 (and 0 otherwise), and the last state is absorbing: a_NN = 1 and a_Ni = 0 for i < N. Further variants allow transitions that skip one or more states.
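An illustrative left-right transition matrix, assuming N = 4 states and skips of at most one state; the values are placeholders chosen only to satisfy the structural constraints described above:

    import numpy as np

    # Left-right HMM: a_ij = 0 for j < i, pi = (1, 0, ..., 0), last state absorbing
    # (a_NN = 1, a_Ni = 0 for i < N); here each state may also skip one state ahead.
    A = np.array([[0.5, 0.3, 0.2, 0.0],
                  [0.0, 0.5, 0.3, 0.2],
                  [0.0, 0.0, 0.6, 0.4],
                  [0.0, 0.0, 0.0, 1.0]])
    pi = np.array([1.0, 0.0, 0.0, 0.0])

    assert np.allclose(A.sum(axis=1), 1.0)      # each row is a probability distribution
    assert np.array_equal(np.triu(A), A)        # upper-triangular: no j < i transitions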

10.5 Continuous Observation Densities in HMMs (1) The previous discussion assumed that the observations are discrete symbols. We must also consider the continuous case, in which b_j(k) becomes a probability density b_j(o). The most general representation of the pdf is a finite mixture of the form b_j(o) = Σ_{k=1}^{M} c_jk N(o; μ_jk, U_jk), where M is the number of mixture components, N(·; μ_jk, U_jk) is a (typically Gaussian) density with mean vector μ_jk and covariance matrix U_jk, and the mixture weights satisfy c_jk > 0 and Σ_{k=1}^{M} c_jk = 1 for j = 1, …, N.
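A small sketch of evaluating a mixture emission density b_j(o) for one state j, with hypothetical weights c_jk, means μ_jk, and covariances U_jk (M = 2 Gaussian components, two-dimensional observations); it uses scipy.stats for the Gaussian pdf:

    import numpy as np
    from scipy.stats import multivariate_normal

    # Hypothetical mixture for one state j: M = 2 Gaussian components.
    c  = np.array([0.6, 0.4])                               # c_jk, positive, summing to 1
    mu = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]       # mu_jk
    U  = [np.eye(2), 2.0 * np.eye(2)]                       # U_jk

    def b_j(o):
        # b_j(o) = sum_k c_jk * N(o; mu_jk, U_jk)
        return sum(ck * multivariate_normal.pdf(o, mean=m, cov=u)
                   for ck, m, u in zip(c, mu, U))

    print(b_j(np.array([1.0, 0.5])))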

Continuous Observation Densities in HMMs (2) The re-estimation formulas are: c_jk' = Σ_{t=1}^{T} γ_t(j,k) / Σ_{t=1}^{T} Σ_{k=1}^{M} γ_t(j,k), μ_jk' = Σ_{t=1}^{T} γ_t(j,k)·o_t / Σ_{t=1}^{T} γ_t(j,k), U_jk' = Σ_{t=1}^{T} γ_t(j,k)·(o_t - μ_jk)(o_t - μ_jk)' / Σ_{t=1}^{T} γ_t(j,k), where γ_t(j,k) = [α_t(j) β_t(j) / Σ_{i=1}^{N} α_t(i) β_t(i)] · [c_jk N(o_t; μ_jk, U_jk) / Σ_{m=1}^{M} c_jm N(o_t; μ_jm, U_jm)].