Hidden Markov Models (HMM) Rabiner’s Paper


Hidden Markov Models (HMM): Rabiner's Paper
Markoviana Reading Group
Computer Eng. & Science Dept., Arizona State University
Fatih Gelgi – Feb 2005

Stationary and Non-stationary
Stationary process: its statistical properties do not vary with time.
Non-stationary process: its statistical properties vary over time.

HMM Example - Casino Coin

[Diagram: a two-state HMM. States: Fair and Unfair. State-transition probabilities: 0.9, 0.1, 0.2, 0.8. Two symbol-emission probability tables over the observation symbols H and T: 0.5 / 0.5 (Fair) and 0.3 / 0.7 (Unfair).]

Observation sequence: HTHHTTHHHTHTHTHHTHHHHHHTHTHH
State sequence:       FFFFFFUUUFFFFFFUUUUUUUFFFFFF

Motivation: given a sequence of Hs and Ts, can you tell at what times the casino cheated?
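As a concrete starting point for the algorithm sketches on the following slides, the casino model can be written down as NumPy arrays. This is a minimal sketch only: the slide's diagram did not survive extraction, so the pairing of the numbers with particular states and symbols, and the uniform initial distribution, are assumptions.

```python
import numpy as np

# Hypothetical encoding of the casino-coin HMM above (pairings assumed).
states = ["Fair", "Unfair"]            # N = 2 hidden states
symbols = ["H", "T"]                   # M = 2 observation symbols

pi = np.array([0.5, 0.5])              # initial state distribution (assumed uniform)

A = np.array([[0.9, 0.1],              # Fair   -> Fair, Unfair
              [0.2, 0.8]])             # Unfair -> Fair, Unfair

B = np.array([[0.5, 0.5],              # Fair coin:   P(H), P(T)
              [0.3, 0.7]])             # Unfair coin: P(H), P(T) (pairing assumed)

# Observation sequence from the slide, encoded as symbol indices (H=0, T=1).
obs = np.array([symbols.index(c) for c in "HTHHTTHHHTHTHTHHTHHHHHHTHTHH"])
```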

Properties of an HMM
First-order Markov process: q_t depends only on q_{t-1}.
Time is discrete.

Elements of an HMM
N, the number of states
M, the number of symbols
States S_1, S_2, ..., S_N
Observation symbols O_1, O_2, ..., O_M
λ, the probability distributions (A, B, π): transition probabilities a_{ij}, emission probabilities b_j(k), and initial probabilities π_i

HMM Basic Problems
1. Given an observation sequence O = O_1 O_2 O_3 ... O_T and λ, find P(O|λ). (Forward algorithm / backward algorithm)
2. Given O = O_1 O_2 O_3 ... O_T and λ, find the most likely state sequence Q = q_1 q_2 ... q_T. (Viterbi algorithm)
3. Given O = O_1 O_2 O_3 ... O_T and λ, re-estimate λ so that P(O|λ) is higher than it is now. (Baum-Welch re-estimation)

Forward Algorithm Illustration
α_t(i) is the probability of observing the partial sequence O_1 O_2 O_3 ... O_t and being in state S_i at time t.

Forward Algorithm Illustration (cont'd)
α_t(i) is the probability of observing the partial sequence O_1 O_2 O_3 ... O_t and being in state S_i at time t. The trellis below has one column per observation; entry (t, j) is α_t(j), and the total of the final column gives the solution.

  State S_j   O_1            O_2                           O_3 ... O_T
  S_N         π_N b_N(O_1)   [Σ_i α_1(i) a_{iN}] b_N(O_2)  ...
  ...         ...            ...                           ...
  S_2         π_2 b_2(O_1)   [Σ_i α_1(i) a_{i2}] b_2(O_2)  ...
  S_1         π_1 b_1(O_1)   [Σ_i α_1(i) a_{i1}] b_1(O_2)  ...

Forward Algorithm
Definition: α_t(i) = P(O_1 O_2 ... O_t, q_t = S_i | λ), the probability of observing the partial sequence O_1 O_2 ... O_t and being in state S_i at time t.
Initialization: α_1(i) = π_i b_i(O_1), 1 ≤ i ≤ N
Induction: α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_{ij}] b_j(O_{t+1})
Problem 1 answer: P(O|λ) = Σ_{i=1}^{N} α_T(i)
Complexity: O(N²T)
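A minimal NumPy sketch of this recursion, assuming the `pi`, `A`, `B`, `obs` arrays defined for the casino example and ignoring numerical underflow (addressed on the Scaling slides below):

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: alpha[t, i] = P(O_1 ... O_t, q_t = S_i | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                          # initialization
    for t in range(T - 1):                                # induction over time
        alpha[t + 1] = (alpha[t] @ A) * B[:, obs[t + 1]]  # sum over i, then emit
    return alpha, alpha[-1].sum()                         # P(O|lambda) = sum_i alpha_T(i)
```

Usage with the casino arrays: `alpha, prob = forward(pi, A, B, obs)`.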

Backward Algorithm Illustration
β_t(i) is the probability of observing the partial sequence O_{t+1} O_{t+2} O_{t+3} ... O_T given state S_i at time t.

Backward Algorithm
Definition: β_t(i) = P(O_{t+1} O_{t+2} ... O_T | q_t = S_i, λ), the probability of observing the partial sequence O_{t+1} O_{t+2} ... O_T given state S_i at time t.
Initialization: β_T(i) = 1, 1 ≤ i ≤ N
Induction: β_t(i) = Σ_{j=1}^{N} a_{ij} b_j(O_{t+1}) β_{t+1}(j)
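A matching NumPy sketch, again assuming the arrays from the casino example and no scaling:

```python
import numpy as np

def backward(A, B, obs):
    """Backward algorithm: beta[t, i] = P(O_{t+1} ... O_T | q_t = S_i, lambda)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                           # initialization: beta_T(i) = 1
    for t in range(T - 2, -1, -1):                   # induction, backwards in time
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])   # sum_j a_ij b_j(O_{t+1}) beta_{t+1}(j)
    return beta
```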

Q2: Optimality Criterion 1
Maximize the expected number of correct individual states.
Definition: γ_t(i) = P(q_t = S_i | O, λ) = α_t(i) β_t(i) / P(O|λ), the probability of being in state S_i at time t given the observation sequence O and the model λ.
Problem 2 answer: q_t* = argmax_{1≤i≤N} γ_t(i), for 1 ≤ t ≤ T.
Problem: if some a_{ij} = 0, the "optimal" state sequence may not even be a valid state sequence.

Q2: Optimality Criterion 2
Find the single best state sequence (path), i.e., maximize P(Q|O,λ).
Definition: δ_t(i) = max_{q_1,...,q_{t-1}} P(q_1 q_2 ... q_{t-1}, q_t = S_i, O_1 O_2 ... O_t | λ), the highest probability of a state path for the partial observation sequence O_1 O_2 O_3 ... O_t that ends in state S_i.

Viterbi Algorithm
The major difference from the forward algorithm: maximization instead of summation.
Recursion: δ_{t+1}(j) = [max_i δ_t(i) a_{ij}] b_j(O_{t+1}), with back-pointer ψ_{t+1}(j) = argmax_i δ_t(i) a_{ij} recorded for the traceback.

Viterbi Algorithm Illustration
δ_t(i) is the highest probability of a state path for the partial observation sequence O_1 O_2 O_3 ... O_t that ends in state S_i. The maximum of the final column indicates where the traceback starts.

  State S_j   O_1            O_2                             O_3 ... O_T
  S_N         π_N b_N(O_1)   [max_i δ_1(i) a_{iN}] b_N(O_2)  ...
  ...         ...            ...                             ...
  S_2         π_2 b_2(O_1)   [max_i δ_1(i) a_{i2}] b_2(O_2)  ...
  S_1         π_1 b_1(O_1)   [max_i δ_1(i) a_{i1}] b_1(O_2)  ...
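A minimal NumPy sketch of the Viterbi recursion with traceback, assuming the casino arrays and probability-space arithmetic (a log-space variant appears after the Maximum log-likelihood slide):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Viterbi algorithm: most likely state path and its probability."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))           # delta[t, j]: best path probability ending in S_j
    psi = np.zeros((T, N), dtype=int)  # psi[t, j]: argmax back-pointer for traceback
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_{t-1}(i) a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()            # traceback starts at the best final state
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, delta[-1].max()
```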

Relations with DBN
Forward function: α_{t+1}(j) = [Σ_i α_t(i) a_{ij}] b_j(O_{t+1})
Backward function: β_t(i) = Σ_j a_{ij} b_j(O_{t+1}) β_{t+1}(j), with β_T(i) = 1
Viterbi algorithm: δ_{t+1}(j) = [max_i δ_t(i) a_{ij}] b_j(O_{t+1})

Some more definitions
γ_t(i) is the probability of being in state S_i at time t:
  γ_t(i) = α_t(i) β_t(i) / P(O|λ)
ξ_t(i,j) is the probability of being in state S_i at time t and in state S_j at time t+1:
  ξ_t(i,j) = α_t(i) a_{ij} b_j(O_{t+1}) β_{t+1}(j) / P(O|λ)
Note that γ_t(i) = Σ_{j=1}^{N} ξ_t(i,j).
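These two quantities drop straight out of the forward and backward passes. A minimal sketch, assuming the unscaled `forward`/`backward` outputs from the earlier blocks (for long sequences the scaled versions would be needed):

```python
import numpy as np

def gamma_xi(alpha, beta, A, B, obs):
    """Posterior state (gamma) and transition (xi) probabilities."""
    prob = alpha[-1].sum()                     # P(O | lambda)
    gamma = alpha * beta / prob                # gamma[t, i]
    T, N = alpha.shape
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        # xi[t, i, j] = alpha_t(i) a_ij b_j(O_{t+1}) beta_{t+1}(j) / P(O|lambda)
        xi[t] = (alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1]) / prob
    return gamma, xi
```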

Baum-Welch Re-estimation
An instance of the Expectation-Maximization (EM) algorithm.
Expectation: using the current model λ, compute γ_t(i) and ξ_t(i,j). Then Σ_{t=1}^{T-1} γ_t(i) is the expected number of transitions made from S_i, and Σ_{t=1}^{T-1} ξ_t(i,j) is the expected number of transitions from S_i to S_j.

Baum-Welch Re-estimation (cont'd)
Maximization: re-estimate the parameters as ratios of expected counts:
  π̄_i = γ_1(i)
  ā_{ij} = Σ_{t=1}^{T-1} ξ_t(i,j) / Σ_{t=1}^{T-1} γ_t(i)
  b̄_j(k) = Σ_{t : O_t = v_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
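A minimal sketch of one EM iteration for a single sequence, reusing the `forward`, `backward`, and `gamma_xi` functions sketched earlier (so it inherits their no-scaling assumption):

```python
import numpy as np

def baum_welch_step(pi, A, B, obs):
    """One Baum-Welch iteration: returns re-estimated (pi, A, B)."""
    alpha, _ = forward(pi, A, B, obs)      # E-step: forward ...
    beta = backward(A, B, obs)             # ... and backward passes
    gamma, xi = gamma_xi(alpha, beta, A, B, obs)
    # M-step: ratios of expected counts.
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):            # for each symbol v_k
        new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```

Iterating this step until P(O|λ) stops improving gives the local maximum discussed on the next slide.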

Notes on the Re-estimation
If the model does not change, it has reached a local maximum.
Depending on the model, many local maxima can exist.
The re-estimated probabilities will sum to 1 by construction.

Implementation issues
Scaling
Multiple observation sequences
Initial parameter estimation
Missing data
Choice of model size and type

Scaling
The α_t(i) calculation: each α_t(i) is a product of many probabilities, so it heads exponentially to zero as t grows and underflows machine precision.
Recursion to calculate the scaled values: at each t, perform the usual induction step and then normalize with a scaling coefficient,
  c_t = 1 / Σ_{i=1}^{N} α_t(i),  α̂_t(i) = c_t α_t(i)

Scaling (cont'd)
The β_t(i) calculation uses the same scale factors: β̂_t(i) = c_t β_t(i).
Desired condition: Σ_{i=1}^{N} α̂_t(i) = 1 at every t.
* Note that P(O|λ) = Σ_i α̂_T(i) is not true once scaling is applied!

Scaling (cont'd)
P(O|λ) can no longer be read off the trellis directly. Instead, since Π_{t=1}^{T} c_t · P(O|λ) = 1,
  log P(O|λ) = − Σ_{t=1}^{T} log c_t
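A minimal sketch of the scaled forward pass, assuming the same casino arrays; it returns the log-likelihood via the scaling coefficients rather than the (underflowing) raw probability:

```python
import numpy as np

def forward_scaled(pi, A, B, obs):
    """Scaled forward pass: normalized alpha-hat, coefficients c, and log P(O|lambda)."""
    T, N = len(obs), len(pi)
    alpha_hat = np.zeros((T, N))
    c = np.zeros(T)                              # scaling coefficients c_t
    a = pi * B[:, obs[0]]
    c[0] = 1.0 / a.sum()
    alpha_hat[0] = a * c[0]                      # each row now sums to 1
    for t in range(1, T):
        a = (alpha_hat[t - 1] @ A) * B[:, obs[t]]
        c[t] = 1.0 / a.sum()
        alpha_hat[t] = a * c[t]
    log_prob = -np.log(c).sum()                  # log P(O|lambda) = -sum_t log c_t
    return alpha_hat, c, log_prob
```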

Maximum log-likelihood
The Viterbi recursion can be carried out entirely in the log domain, which avoids underflow without any scaling.
Preprocessing: take logs of the model parameters once, e.g. π̃_i = log π_i, ã_{ij} = log a_{ij}, b̃_i(O_t) = log b_i(O_t).
Initialization: φ_1(i) = π̃_i + b̃_i(O_1)
Recursion: φ_t(j) = max_i [φ_{t-1}(i) + ã_{ij}] + b̃_j(O_t)
Termination: log P* = max_i φ_T(i)
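A log-domain variant of the earlier `viterbi` sketch, under the same assumptions (note that any zero probability becomes -inf under the log, which NumPy handles but warns about):

```python
import numpy as np

def viterbi_log(pi, A, B, obs):
    """Viterbi in the log domain: best state path and its log-probability."""
    log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)  # preprocessing
    T, N = len(obs), len(pi)
    phi = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    phi[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = phi[t - 1][:, None] + log_A     # sums of logs replace products
        psi[t] = scores.argmax(axis=0)
        phi[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = np.zeros(T, dtype=int)
    path[-1] = phi[-1].argmax()
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path, phi[-1].max()                   # log P* of the best path
```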

Multiple observation sequences
Problem with re-estimation from a single sequence: there may be too few occurrences of each transition and symbol to estimate the parameters reliably. With K independent sequences, P(O|λ) = Π_{k=1}^{K} P(O^(k)|λ), and the re-estimation formulas sum the expected counts (numerators and denominators) over all K sequences.

Initial estimates of parameters
For π and A, random or uniform initial values are sufficient.
For B (the discrete symbol probabilities), a good initial estimate is needed.

Insufficient training data
Solutions:
Increase the size of the training data
Reduce the size of the model
Interpolate the parameters with those of another model

References
L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, 1989.
S. Russell and P. Norvig, "Probabilistic Reasoning over Time," in Artificial Intelligence: A Modern Approach, Ch. 15, 2002 (draft).
V. Borkar, K. Deshmukh, and S. Sarawagi, "Automatic Segmentation of Text into Structured Records," ACM SIGMOD, 2001.
T. Scheffer, C. Decomain, and S. Wrobel, "Active Hidden Markov Models for Information Extraction," Proceedings of the International Symposium on Intelligent Data Analysis, 2001.
S. Ray and M. Craven, "Representing Sentence Structure in Hidden Markov Models for Information Extraction," Proceedings of the 17th International Joint Conference on Artificial Intelligence, 2001.