Pattern Recognition and Machine Learning, Chapter 13: Sequential Data

Pattern Recognition and Machine Learning, Chapter 13: Sequential Data
Affiliation: Kyoto University
Name: Kevin Chien, Dr. Oba Shigeyuki, Dr. Ishii Shin
Date: Dec. 9, 2011

Idea: Origin of Markov Models

Why Markov Models
The i.i.d. assumption is not always appropriate: sequential data have correlations over time, so future data (prediction) depend on some of the recent data. Markov models express this with DAGs in which inference is done by the sum-product algorithm.
State space (Markov) model: latent variables
- Discrete latent variables: Hidden Markov Model
- Gaussian latent variables: Linear Dynamical Systems
Order of a Markov chain: how far back the data dependence reaches
- 1st order: the current observation depends only on the previous observation

State Space Model
The latent variables z_n form a Markov chain, and each z_n generates its observation x_n. As the order of a Markov chain over the observations grows, the number of parameters grows; to organize this we use the state space model, in which the dependence is carried by the latent chain. z_{n-1} and z_{n+1} are now independent given z_n (d-separated).

Terminologies: For understanding Markov Models

Terminologies
Markov property: in a stochastic process, the probability of a transition depends only on the present state, not on the manner in which the current state was reached.
Transition diagram: shows the transitions between the different states of the same variable.

Terminologies (cont.)
Θ notation: f = Θ(g) means f is bounded above and below by g asymptotically [Big_O_notation, Wikipedia, Dec. 2011].
(Review) z_{n+1} and z_{n-1} are d-separated given z_n: once z_n is observed, the path through z_n is blocked, so there is no unblocked path between z_{n+1} and z_{n-1}, and they are conditionally independent.

Markov Models: Formula and motivation

Hidden Markov Models (HMM)
z_n is a discrete multinomial latent variable with K states.
Transition probability matrix A:
- the entries of each row sum to 1;
- p(staying in the present state) is non-zero;
- since each row sums to 1, counting the non-diagonal entries gives K(K-1) free parameters.
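A small illustration (not from the slides): a hypothetical 3-state transition matrix in Python/NumPy with invented values, showing the row-sum constraint and the K(K-1) parameter count.

    import numpy as np

    # Hypothetical 3-state transition matrix A (values are made up).
    # Entry A[j, k] = p(z_n = k | z_{n-1} = j); each row sums to 1.
    A = np.array([
        [0.7, 0.2, 0.1],   # from state 0
        [0.1, 0.8, 0.1],   # from state 1
        [0.2, 0.3, 0.5],   # from state 2
    ])

    assert np.allclose(A.sum(axis=1), 1.0)   # rows are probability distributions

    K = A.shape[0]
    print("free parameters:", K * (K - 1))   # each row has K-1 free entries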

Hidden Markov Models (cont.)
Emission probability: the distribution of an observation given the latent state, with its own parameters governing that distribution.
Homogeneous model: all the latent variables share the same transition parameters A (and the emission distributions share the same parameters).
Sampling data from the model is simply ancestral sampling: follow the transitions of the latent chain and, at each step, note an observation drawn from the emission probability of the current state.
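A minimal sampling sketch, assuming a discrete (table) emission distribution; pi, A, B and the sequence length are made-up example values, not taken from the slides.

    import numpy as np

    rng = np.random.default_rng(0)

    pi = np.array([0.6, 0.3, 0.1])            # hypothetical initial state distribution
    A  = np.array([[0.7, 0.2, 0.1],           # hypothetical transition matrix
                   [0.1, 0.8, 0.1],
                   [0.2, 0.3, 0.5]])
    B  = np.array([[0.9, 0.1],                # hypothetical emission matrix:
                   [0.5, 0.5],                # B[k, v] = p(x = v | z = k)
                   [0.1, 0.9]])

    def sample_hmm(n):
        """Ancestral sampling: walk the latent chain, emit one symbol per step."""
        z = np.empty(n, dtype=int)
        x = np.empty(n, dtype=int)
        z[0] = rng.choice(len(pi), p=pi)
        x[0] = rng.choice(B.shape[1], p=B[z[0]])
        for t in range(1, n):
            z[t] = rng.choice(len(pi), p=A[z[t - 1]])   # transition
            x[t] = rng.choice(B.shape[1], p=B[z[t]])    # emission
        return z, x

    z, x = sample_hmm(10)
    print("states:", z)
    print("observations:", x)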

HMM: Expectation-Maximization for maximum likelihood
Likelihood function: obtained by marginalizing the joint distribution over the latent variables.
EM: start with initial model parameters θ^old; evaluate the posterior of the latent variables under θ^old, defining the quantities used in the M step; maximizing the resulting expected complete-data log likelihood gives the new parameters (see the formulas below).
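The equations on this slide were images and did not survive transcription; the following is the standard form of these quantities in Bishop's Chapter 13 notation, given here for reference rather than copied from the slide.

    % Likelihood function: marginalize the joint over the latent variables
    p(\mathbf{X} \mid \theta) = \sum_{\mathbf{Z}} p(\mathbf{X}, \mathbf{Z} \mid \theta),
    \qquad \theta = \{\boldsymbol{\pi}, \mathbf{A}, \boldsymbol{\phi}\}

    % E step: posterior marginals of the latent variables under the old parameters
    \gamma(\mathbf{z}_n) = p(\mathbf{z}_n \mid \mathbf{X}, \theta^{\mathrm{old}}),
    \qquad
    \xi(\mathbf{z}_{n-1}, \mathbf{z}_n) = p(\mathbf{z}_{n-1}, \mathbf{z}_n \mid \mathbf{X}, \theta^{\mathrm{old}})

    % M step: maximize the expected complete-data log likelihood
    Q(\theta, \theta^{\mathrm{old}}) = \sum_{\mathbf{Z}} p(\mathbf{Z} \mid \mathbf{X}, \theta^{\mathrm{old}}) \ln p(\mathbf{X}, \mathbf{Z} \mid \theta)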

HMM: forward-backward algorithm
Two-stage message passing on the HMM chain (a tree), used to find the marginals p(node) efficiently; here the marginals of interest are the posteriors of the latent states.
Assume p(x_k | z_k), p(z_k | z_{k-1}) and p(z_1) are known.
Notation: x = (x_1, ..., x_n), x_{i:j} = (x_i, x_{i+1}, ..., x_j).
Goal: compute p(z_k | x).
- Forward part: compute p(z_k, x_{1:k}) for every k = 1, ..., n.
- Backward part: compute p(x_{k+1:n} | z_k) for every k = 1, ..., n.

HMM: forward-backward algorithm (cont.)
p(z_k | x) ∝ p(z_k, x) = p(x_{k+1:n} | z_k, x_{1:k}) p(z_k, x_{1:k}).
Since x_{k+1:n} and x_{1:k} are d-separated given z_k, this simplifies to
p(z_k | x) ∝ p(z_k, x) = p(x_{k+1:n} | z_k) p(z_k, x_{1:k}).
With these quantities we can run the EM (Baum-Welch) algorithm to estimate the parameter values, sample from the posterior of z given x, and find the most likely z with the Viterbi algorithm.

HMM forward-backward algorithm: forward part
Compute p(z_k, x_{1:k}):
p(z_k, x_{1:k}) = Σ_{z_{k-1}} p(z_k, z_{k-1}, x_{1:k})
= Σ_{z_{k-1}} p(x_k | z_k, z_{k-1}, x_{1:k-1}) p(z_k | z_{k-1}, x_{1:k-1}) p(z_{k-1}, x_{1:k-1}).
This looks like a recursive function: label p(z_k, x_{1:k}) as α_k(z_k). Since
(z_{k-1}, x_{1:k-1}) and x_k are d-separated given z_k, and
z_k and x_{1:k-1} are d-separated given z_{k-1},
the recursion becomes, for k = 2, ..., n,
α_k(z_k) = Σ_{z_{k-1}} p(x_k | z_k) p(z_k | z_{k-1}) α_{k-1}(z_{k-1})
(emission prob. × transition prob. × recursive part).

HMM forward-backward algorithm: forward part (cont.)
Base case: α_1(z_1) = p(z_1, x_1) = p(z_1) p(x_1 | z_1).
If each z has m states, then the computational complexity is
Θ(m) for each value of z_k at a single k,
Θ(m²) for each k, and
Θ(nm²) in total.
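A minimal sketch of the forward recursion, assuming the discrete-emission setup and the hypothetical pi, A, B from the sampling example above; alpha[k, i] stores α_k(z_k = i) = p(z_k = i, x_{1:k}). For long sequences one would rescale each alpha[k] to avoid numerical underflow.

    import numpy as np

    def forward(x, pi, A, B):
        """alpha[k, i] = p(z_k = i, x_{1:k}); cost is Theta(n m^2)."""
        n, m = len(x), len(pi)
        alpha = np.zeros((n, m))
        alpha[0] = pi * B[:, x[0]]                   # alpha_1(z_1) = p(z_1) p(x_1 | z_1)
        for k in range(1, n):
            # alpha_k(z_k) = p(x_k | z_k) * sum_{z_{k-1}} p(z_k | z_{k-1}) alpha_{k-1}(z_{k-1})
            alpha[k] = B[:, x[k]] * (alpha[k - 1] @ A)
        return alpha

    # p(x) is then sum_{z_n} alpha_n(z_n):
    # alpha = forward(x, pi, A, B); print(alpha[-1].sum())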

HMM forward-backward algorithm: backward part
Compute p(x_{k+1:n} | z_k) for all z_k and all k = 1, ..., n-1:
p(x_{k+1:n} | z_k) = Σ_{z_{k+1}} p(x_{k+1:n}, z_{k+1} | z_k)
= Σ_{z_{k+1}} p(x_{k+2:n} | z_{k+1}, z_k, x_{k+1}) p(x_{k+1} | z_{k+1}, z_k) p(z_{k+1} | z_k).
This again looks like a recursive function: label p(x_{k+1:n} | z_k) as β_k(z_k). Since
(z_k, x_{k+1}) and x_{k+2:n} are d-separated given z_{k+1}, and
z_k and x_{k+1} are d-separated given z_{k+1},
the recursion becomes, for k = 1, ..., n-1,
β_k(z_k) = Σ_{z_{k+1}} β_{k+1}(z_{k+1}) p(x_{k+1} | z_{k+1}) p(z_{k+1} | z_k)
(recursive part × emission prob. × transition prob.).

HMM forward-backward algorithm: backward part (cont.)
Base case: β_n(z_n) = 1 for all z_n.
If each z has m states, the computational complexity is the same as for the forward part: Θ(nm²) in total.
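A matching sketch of the backward recursion and the posterior combination p(z_k | x) ∝ α_k(z_k) β_k(z_k), reusing the forward function and the hypothetical pi, A, B from the earlier sketches.

    import numpy as np

    def backward(x, A, B):
        """beta[k, i] = p(x_{k+1:n} | z_k = i); same Theta(n m^2) cost as forward."""
        n, m = len(x), A.shape[0]
        beta = np.zeros((n, m))
        beta[-1] = 1.0                               # beta_n(z_n) = 1 for all z_n
        for k in range(n - 2, -1, -1):
            # beta_k(z_k) = sum_{z_{k+1}} beta_{k+1}(z_{k+1}) p(x_{k+1} | z_{k+1}) p(z_{k+1} | z_k)
            beta[k] = A @ (B[:, x[k + 1]] * beta[k + 1])
        return beta

    def posterior(x, pi, A, B):
        """gamma[k, i] = p(z_k = i | x), obtained by normalizing alpha * beta."""
        alpha, beta = forward(x, pi, A, B), backward(x, A, B)
        gamma = alpha * beta
        return gamma / gamma.sum(axis=1, keepdims=True)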

HMM: Viterbi algorithm
The max-sum algorithm for the HMM: find the most probable sequence of hidden states for a given observation sequence x_{1:n}. Example: transforming handwriting images into text.
Assume p(x_k | z_k), p(z_k | z_{k-1}) and p(z_1) are known.
Goal: compute z* = argmax_z p(z | x), where x = x_{1:n} and z = z_{1:n}.
Lemma: if f(a) ≥ 0 for all a and g(a, b) ≥ 0 for all a, b, then max_{a,b} f(a) g(a, b) = max_a [ f(a) max_b g(a, b) ].
Since max_z p(z | x) ∝ max_z p(z, x), it suffices to maximize the joint distribution.

HMM: Viterbi algorithm (cont.)
Define μ_k(z_k) = max_{z_{1:k-1}} p(z_{1:k}, x_{1:k})
= max_{z_{1:k-1}} p(x_k | z_k) p(z_k | z_{k-1})  ... the f(a) part
  × p(z_{1:k-1}, x_{1:k-1})  ... the g(a,b) part.
This looks like a recursive function if we can move the max in front of p(z_{1:k-1}, x_{1:k-1}). Apply the lemma with a = z_{k-1} and b = z_{1:k-2}:
μ_k(z_k) = max_{z_{k-1}} [ p(x_k | z_k) p(z_k | z_{k-1}) max_{z_{1:k-2}} p(z_{1:k-1}, x_{1:k-1}) ]
= max_{z_{k-1}} [ p(x_k | z_k) p(z_k | z_{k-1}) μ_{k-1}(z_{k-1}) ],   for k = 2, ..., n.

HMM: Viterbi algorithm (finish up)
Recursion: μ_k(z_k) = max_{z_{k-1}} p(x_k | z_k) p(z_k | z_{k-1}) μ_{k-1}(z_{k-1}), for k = 2, ..., n.
Base case: μ_1(z_1) = p(x_1, z_1) = p(z_1) p(x_1 | z_1).
The same recursion gives max_{z_n} μ_n(z_n) = max_z p(x, z).
This is only the maximum value; to recover the maximizing sequence, compute the recursion bottom-up while remembering which z_{k-1} achieved each maximum (each μ_k(z_k) looks at all paths through μ_{k-1}(z_{k-1})), then backtrack from the end.
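A minimal Viterbi sketch under the same hypothetical pi, A, B, working in log space to avoid underflow (a small departure from the product form on the slides); the back array stores, for each μ_k(z_k), the z_{k-1} that achieved the maximum, so the best sequence is recovered by backtracking.

    import numpy as np

    def viterbi(x, pi, A, B):
        """Return the most probable state sequence z* = argmax_z p(z, x)."""
        n, m = len(x), len(pi)
        log_mu = np.zeros((n, m))
        back = np.zeros((n, m), dtype=int)           # argmax over z_{k-1} for each mu_k(z_k)
        log_mu[0] = np.log(pi) + np.log(B[:, x[0]])  # mu_1(z_1) = p(z_1) p(x_1 | z_1)
        for k in range(1, n):
            # mu_k(z_k) = max_{z_{k-1}} p(x_k | z_k) p(z_k | z_{k-1}) mu_{k-1}(z_{k-1})
            scores = log_mu[k - 1][:, None] + np.log(A)   # scores[j, i]: z_{k-1}=j, z_k=i
            back[k] = scores.argmax(axis=0)
            log_mu[k] = scores.max(axis=0) + np.log(B[:, x[k]])
        # Backtrack from the best final state.
        z = np.zeros(n, dtype=int)
        z[-1] = log_mu[-1].argmax()
        for k in range(n - 1, 0, -1):
            z[k - 1] = back[k, z[k]]
        return z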

Additional Information
Excerpts of equations and diagrams from: Bishop, C. M., Pattern Recognition and Machine Learning, pp. 605-646.
Excerpts of equations from: mathematicalmonk (YouTube), lectures ML 14.6 and 14.7 and related titles, July 2011.