PatReco: Hidden Markov Models Alexandros Potamianos Dept of ECE, Tech. Univ. of Crete Fall 2004-2005.


Markov Models: Definition
- Markov chains are Bayesian networks that model sequences of events (states).
- Sequential events are dependent.
- Two non-sequential events are conditionally independent given the intermediate events (MM-1).

Markov chains
(figure: chains of states q0, q1, q2, q3, q4, … drawn for MM-0, MM-1, MM-2, and MM-3, where each state depends on the previous 0, 1, 2, or 3 states, respectively)

Markov Chains
- MM-0: P(q_1, q_2, …, q_N) = ∏_{n=1..N} P(q_n)
- MM-1: P(q_1, q_2, …, q_N) = ∏_{n=1..N} P(q_n | q_{n-1})
- MM-2: P(q_1, q_2, …, q_N) = ∏_{n=1..N} P(q_n | q_{n-1}, q_{n-2})
- MM-3: P(q_1, q_2, …, q_N) = ∏_{n=1..N} P(q_n | q_{n-1}, q_{n-2}, q_{n-3})
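
As a concrete illustration of the MM-1 factorization, here is a minimal Python sketch that scores a state sequence from a prior and a transition table. The state names and probability values are made up for the example, not taken from the slides.

```python
# Hypothetical MM-1 example: the states and probabilities below are illustrative.
prior = {"rain": 0.3, "sun": 0.7}                       # P(q_1)
trans = {("rain", "rain"): 0.6, ("rain", "sun"): 0.4,   # P(q_n | q_{n-1})
         ("sun", "rain"): 0.2, ("sun", "sun"): 0.8}

def mm1_prob(states):
    """P(q_1, ..., q_N) = P(q_1) * prod_{n=2..N} P(q_n | q_{n-1})."""
    p = prior[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[(prev, cur)]
    return p

print(mm1_prob(["sun", "sun", "rain"]))  # 0.7 * 0.8 * 0.2 = 0.112
```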

Hidden Markov Models
- Hidden Markov chains model sequences of events and corresponding sequences of observations.
- Events form a Markov chain (MM-1).
- Observations are conditionally independent given the sequence of events.
- Each observation is directly connected to a single event (and is conditionally independent of the rest of the events in the network).

Hidden Markov Models (HMM-1)
(figure: state chain q0 → q1 → … → q4 with one observation o_n attached to each state q_n)
P(o_0, o_1, …, o_N, q_0, q_1, …, q_N) = ∏_{n=0..N} P(q_n | q_{n-1}) P(o_n | q_n)
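
The HMM-1 joint probability above can be evaluated directly by walking the state and observation sequences in parallel. A minimal sketch follows; the state labels, observation symbols, and numbers are illustrative assumptions.

```python
# Toy HMM-1: P(o_0..o_N, q_0..q_N) = prod_n P(q_n | q_{n-1}) P(o_n | q_n).
prior = {"A": 0.5, "B": 0.5}                  # P(q_0)
trans = {("A", "A"): 0.7, ("A", "B"): 0.3,    # P(q_n | q_{n-1})
         ("B", "A"): 0.4, ("B", "B"): 0.6}
emit  = {("A", "x"): 0.9, ("A", "y"): 0.1,    # P(o_n | q_n)
         ("B", "x"): 0.2, ("B", "y"): 0.8}

def hmm_joint(states, obs):
    p = prior[states[0]] * emit[(states[0], obs[0])]
    for n in range(1, len(states)):
        p *= trans[(states[n - 1], states[n])] * emit[(states[n], obs[n])]
    return p

print(hmm_joint(["A", "B"], ["x", "y"]))  # 0.5 * 0.9 * 0.3 * 0.8 = 0.108
```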

Parameter Estimation
- The parameters that have to be estimated are:
  - the a-priori probabilities P(q_0)
  - the transition probabilities P(q_n | q_{n-1})
  - the observation probabilities P(o_n | q_n)
- For example, if there are 3 types of events and continuous 1-D observations that follow a Gaussian distribution, there are 18 parameters to estimate:
  - 3 a-priori probabilities
  - a 3x3 matrix of transition probabilities
  - 3 means and 3 variances (observation probabilities)
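
To make the parameter count concrete, a minimal sketch of the parameter layout for the 3-state, 1-D Gaussian example above could look as follows; the placeholder values are assumptions, not estimates.

```python
import numpy as np

n_states = 3
pi = np.full(n_states, 1.0 / 3.0)               # 3 a-priori probabilities P(q_0)
A = np.full((n_states, n_states), 1.0 / 3.0)    # 3x3 transition matrix P(q_n | q_{n-1})
means = np.zeros(n_states)                      # 3 Gaussian means
variances = np.ones(n_states)                   # 3 Gaussian variances

# 3 + 9 + 3 + 3 = 18 parameters, matching the count on the slide.
print(pi.size + A.size + means.size + variances.size)  # 18
```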

Parameter Estimation
- If both the sequence of events and the sequence of observations are fully observable, then ML (maximum likelihood) estimation is used.
- Usually the sequence of events q_0, q_1, …, q_N is not observable, in which case EM is used.
- The EM algorithm for HMMs is the Baum-Welch or forward-backward algorithm.
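
Below is a compact sketch of the E-step at the heart of Baum-Welch (the forward-backward recursions and the resulting state posteriors) for a discrete-observation HMM. The toy parameters and variable names are assumptions, and the M-step that re-estimates the parameters from these posteriors is omitted.

```python
import numpy as np

pi = np.array([0.6, 0.4])                   # P(q_0)
A = np.array([[0.7, 0.3], [0.4, 0.6]])      # A[i, j] = P(q_n = j | q_{n-1} = i)
B = np.array([[0.9, 0.1], [0.2, 0.8]])      # B[i, k] = P(o_n = k | q_n = i)
obs = np.array([0, 1, 1, 0])                # an observed symbol sequence

N, S = len(obs), len(pi)
alpha = np.zeros((N, S))                    # forward probabilities
beta = np.zeros((N, S))                     # backward probabilities

alpha[0] = pi * B[:, obs[0]]
for n in range(1, N):                       # forward pass
    alpha[n] = (alpha[n - 1] @ A) * B[:, obs[n]]

beta[-1] = 1.0
for n in range(N - 2, -1, -1):              # backward pass
    beta[n] = A @ (B[:, obs[n + 1]] * beta[n + 1])

gamma = alpha * beta                        # state posteriors P(q_n = i | O)
gamma /= gamma.sum(axis=1, keepdims=True)
print(gamma)
```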

Inference/Decoding
- The main inference problem for HMMs is known as the decoding problem: given a sequence of observations, find the best sequence of states:
  q = argmax_q P(q|O) = argmax_q P(q,O)
  (the two maximizations are equivalent because P(q|O) = P(q,O)/P(O) and P(O) does not depend on q)
- An efficient decoding algorithm is the Viterbi algorithm.

Viterbi algorithm
max_q P(q,O) = max_q P(o_0, o_1, …, o_N, q_0, q_1, …, q_N)
  = max_q ∏_{n=0..N} P(q_n | q_{n-1}) P(o_n | q_n)
  = max_{q_N} { P(o_N | q_N) max_{q_{N-1}} { P(q_N | q_{N-1}) P(o_{N-1} | q_{N-1}) … max_{q_2} { P(q_3 | q_2) P(o_2 | q_2) max_{q_1} { P(q_2 | q_1) P(o_1 | q_1) max_{q_0} { P(q_1 | q_0) P(o_0 | q_0) P(q_0) } } } … } }
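
The nested maximizations above are computed efficiently by dynamic programming over a trellis of states and time steps. The sketch below keeps, for each state and time step, only the score of the best path reaching it, together with a backpointer for recovering the state sequence; the toy parameters are assumptions, laid out as in the sketch above.

```python
import numpy as np

pi = np.array([0.6, 0.4])                   # P(q_0)
A = np.array([[0.7, 0.3], [0.4, 0.6]])      # P(q_n = j | q_{n-1} = i)
B = np.array([[0.9, 0.1], [0.2, 0.8]])      # P(o_n = k | q_n = i)
obs = [0, 1, 1, 0]

N, S = len(obs), len(pi)
delta = np.zeros((N, S))                    # best path score ending in each state
psi = np.zeros((N, S), dtype=int)           # backpointers

delta[0] = pi * B[:, obs[0]]
for n in range(1, N):
    scores = delta[n - 1][:, None] * A      # scores[i, j]: best path into i, then i -> j
    psi[n] = scores.argmax(axis=0)
    delta[n] = scores.max(axis=0) * B[:, obs[n]]

# Backtrack the most probable state sequence q = argmax_q P(q, O).
path = [int(delta[-1].argmax())]
for n in range(N - 1, 0, -1):
    path.append(int(psi[n][path[-1]]))
path.reverse()
print(path, delta[-1].max())
```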

Viterbi algorithm
(trellis figure: K states plotted against time)
At each node, keep only the best (most probable) path among all the paths passing through that node.

Deep Thoughts
- HMM-0 (an HMM with an MM-0 event chain) is the Bayes classifier!
- MMs and HMMs are poor models, but they are simple and computationally efficient. How do you fix this? (dependent observations?)

Some Applications
- Speech Recognition
- Optical Character Recognition
- Part-of-Speech Tagging
- …

Conclusions
- HMMs and MMs are useful modeling tools for dependent sequences of events (states or classes).
- Efficient algorithms exist for training HMM parameters (Baum-Welch) and for decoding the most probable sequence of states given an observation sequence (Viterbi).
- HMMs have many applications.