CS621: Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 33,34– HMM, Viterbi, 14 th Oct, 18 th Oct, 2010.

Slides:



Advertisements
Similar presentations
CS344 : Introduction to Artificial Intelligence
Advertisements

Hidden Markov Model 主講人:虞台文 大同大學資工所 智慧型多媒體研究室. Contents Introduction – Markov Chain – Hidden Markov Model (HMM) Formal Definition of HMM & Problems Estimate.
Introduction to Hidden Markov Models
Hidden Markov Models Bonnie Dorr Christof Monz CMSC 723: Introduction to Computational Linguistics Lecture 5 October 6, 2004.
 CpG is a pair of nucleotides C and G, appearing successively, in this order, along one DNA strand.  CpG islands are particular short subsequences in.
Page 1 Hidden Markov Models for Automatic Speech Recognition Dr. Mike Johnson Marquette University, EECE Dept.
Statistical NLP: Lecture 11
Ch-9: Markov Models Prepared by Qaiser Abbas ( )
Hidden Markov Models Theory By Johan Walters (SR 2003)
Statistical NLP: Hidden Markov Models Updated 8/12/2005.
1 Hidden Markov Models (HMMs) Probabilistic Automata Ubiquitous in Speech/Speaker Recognition/Verification Suitable for modelling phenomena which are dynamic.
Hidden Markov Models Fundamentals and applications to bioinformatics.
Lecture 15 Hidden Markov Models Dr. Jianjun Hu mleg.cse.sc.edu/edu/csce833 CSCE833 Machine Learning University of South Carolina Department of Computer.
Hidden Markov Models 1 2 K … 1 2 K … 1 2 K … … … … 1 2 K … x1x1 x2x2 x3x3 xKxK 2 1 K 2.
Apaydin slides with a several modifications and additions by Christoph Eick.
Albert Gatt Corpora and Statistical Methods Lecture 8.
INTRODUCTION TO Machine Learning 3rd Edition
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
HMM-BASED PATTERN DETECTION. Outline  Markov Process  Hidden Markov Models Elements Basic Problems Evaluation Optimization Training Implementation 2-D.
FSA and HMM LING 572 Fei Xia 1/5/06.
Hidden Markov Model 11/28/07. Bayes Rule The posterior distribution Select k with the largest posterior distribution. Minimizes the average misclassification.
. Hidden Markov Model Lecture #6 Background Readings: Chapters 3.1, 3.2 in the text book, Biological Sequence Analysis, Durbin et al., 2001.
Learning, Uncertainty, and Information Big Ideas November 8, 2004.
. Hidden Markov Model Lecture #6 Background Readings: Chapters 3.1, 3.2 in the text book, Biological Sequence Analysis, Durbin et al., 2001.
Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.
Forward-backward algorithm LING 572 Fei Xia 02/23/06.
1 Hidden Markov Model Instructor : Saeed Shiry  CHAPTER 13 ETHEM ALPAYDIN © The MIT Press, 2004.
Doug Downey, adapted from Bryan Pardo,Northwestern University
Sequence labeling and beam search LING 572 Fei Xia 2/15/07.
Learning HMM parameters Sushmita Roy BMI/CS 576 Oct 21 st, 2014.
INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Fall 2001 EE669: Natural Language Processing 1 Lecture 9: Hidden Markov Models (HMMs) (Chapter 9 of Manning and Schutze) Dr. Mary P. Harper ECE, Purdue.
Visual Recognition Tutorial1 Markov models Hidden Markov models Forward/Backward algorithm Viterbi algorithm Baum-Welch estimation algorithm Hidden.
Hidden Markov Model Continues …. Finite State Markov Chain A discrete time stochastic process, consisting of a domain D of m states {1,…,m} and 1.An m.
ETHEM ALPAYDIN © The MIT Press, Lecture Slides for.
Combined Lecture CS621: Artificial Intelligence (lecture 25) CS626/449: Speech-NLP-Web/Topics-in- AI (lecture 26) Pushpak Bhattacharyya Computer Science.
Ch10 HMM Model 10.1 Discrete-Time Markov Process 10.2 Hidden Markov Models 10.3 The three Basic Problems for HMMS and the solutions 10.4 Types of HMMS.
CS344 : Introduction to Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 21- Forward Probabilities and Robotic Action Sequences.
THE HIDDEN MARKOV MODEL (HMM)
Fundamentals of Hidden Markov Model Mehmet Yunus Dönmez.
CS460/IT632 Natural Language Processing/Language Technology for the Web Lecture 3 (10/01/06) Prof. Pushpak Bhattacharyya IIT Bombay Statistical Formulation.
S. Salzberg CMSC 828N 1 Three classic HMM problems 2.Decoding: given a model and an output sequence, what is the most likely state sequence through the.
NLP. Introduction to NLP Sequence of random variables that aren’t independent Examples –weather reports –text.
CS621: Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 35–HMM; Forward and Backward Probabilities 19 th Oct, 2010.
Hidden Markov Models & POS Tagging Corpora and Statistical Methods Lecture 9.
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 26– Recap HMM; Probabilistic Parsing cntd) Pushpak Bhattacharyya CSE Dept., IIT.
Hidden Markovian Model. Some Definitions Finite automation is defined by a set of states, and a set of transitions between states that are taken based.
CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 27– SMT Assignment; HMM recap; Probabilistic Parsing cntd) Pushpak Bhattacharyya.
Hidden Markov Models (HMMs) –probabilistic models for learning patterns in sequences (e.g. DNA, speech, weather, cards...) (2 nd order model)
Albert Gatt Corpora and Statistical Methods. Acknowledgement Some of the examples in this lecture are taken from a tutorial on HMMs by Wolgang Maass.
1 Hidden Markov Models Hsin-min Wang References: 1.L. R. Rabiner and B. H. Juang, (1993) Fundamentals of Speech Recognition, Chapter.
CS621: Artificial Intelligence Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay Lecture 19: Hidden Markov Models.
CS : NLP, Speech and Web-Topics-in-AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 38-39: Baum Welch Algorithm; HMM training.
Data-Intensive Computing with MapReduce Jimmy Lin University of Maryland Thursday, March 14, 2013 Session 8: Sequence Labeling This work is licensed under.
CS344: Introduction to Artificial Intelligence (associated lab: CS386) Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 6-7: Hidden Markov Model 18.
Definition of the Hidden Markov Model A Seminar Speech Recognition presentation A Seminar Speech Recognition presentation October 24 th 2002 Pieter Bas.
Visual Recognition Tutorial1 Markov models Hidden Markov models Forward/Backward algorithm Viterbi algorithm Baum-Welch estimation algorithm Hidden.
Hidden Markov Models Wassnaa AL-mawee Western Michigan University Department of Computer Science CS6800 Adv. Theory of Computation Prof. Elise De Doncker.
Hidden Markov Models HMM Hassanin M. Al-Barhamtoshy
Lecture 16, CS5671 Hidden Markov Models (“Carnivals with High Walls”) States (“Stalls”) Emission probabilities (“Odds”) Transitions (“Routes”) Sequences.
MACHINE LEARNING 16. HMM. Introduction Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2  Modeling dependencies.
Combined Lecture CS621: Artificial Intelligence (lecture 19) CS626/449: Speech-NLP-Web/Topics-in-AI (lecture 20) Hidden Markov Models Pushpak Bhattacharyya.
Hidden Markov Model LR Rabiner
CS621: Artificial Intelligence
CS621: Artificial Intelligence
CS621: Artificial Intelligence
Algorithms of POS Tagging
CS : NLP, Speech and Web-Topics-in-AI
Hidden Markov Models By Manish Shrivastava.
CSCI 5582 Artificial Intelligence
Presentation transcript:

CS621: Artificial Intelligence Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 33,34– HMM, Viterbi, 14 th Oct, 18 th Oct, 2010

HMM Definition Set of states : S where |S|=N Output Alphabet : O where |O|=K Transition Probabilities : A = {a ij } Emission Probabilities : B = {b j (o k )} Initial State Probabilities : π

Markov Processes Properties Limited Horizon: Given previous t states, a state i, is independent of preceding 0 to t- k+1 states. P(X t =i|X t-1, X t-2,… X 0 ) = P(X t =i|X t-1, X t-2 … X t-k ) Order k Markov process Time invariance: (shown for k=1) P(X t =i|X t-1 =j) = P(X 1 =i|X 0 =j) …= P(X n =i|X n-1 =j)

Three basic problems (contd.) Problem 1: Likelihood of a sequence Forward Procedure Backward Procedure Problem 2: Best state sequence Viterbi Algorithm Problem 3: Re-estimation Baum-Welch ( Forward-Backward Algorithm )

Probabilistic Inference O: Observation Sequence S: State Sequence Given O find S * where called Probabilistic Inference Infer “Hidden” from “Observed” How is this inference different from logical inference based on propositional or predicate calculus?

Essentials of Hidden Markov Model 1. Markov + Naive Bayes 2. Uses both transition and observation probability 3. Effectively makes Hidden Markov Model a Finite State Machine (FSM) with probability

Probability of Observation Sequence Without any restriction, Search space size= |S| |O|

Continuing with the Urn example Urn 1 # of Red = 30 # of Green = 50 # of Blue = 20 Urn 3 # of Red =60 # of Green =10 # of Blue = 30 Urn 2 # of Red = 10 # of Green = 40 # of Blue = 50 Colored Ball choosing

Example (contd.) U1U1 U2U2 U3U3 U1U U2U U3U Given : Observation : RRGGBRGR What is the corresponding state sequence ? and RGB U1U U2U U3U Transition Probability Observation/output Probability

Diagrammatic representation (1/2) U1U1 U2U2 U3U R, 0.6 G, 0.1 B, 0.3 R, 0.1 B, 0.5 G, 0.4 B, 0.2 R, 0.3 G, 0.5

Diagrammatic representation (2/2) U1U1 U2U2 U3U3 R,0.02 G,0.08 B,0.10 R,0.24 G,0.04 B,0.12 R,0.06 G,0.24 B,0.30 R, 0.08 G, 0.20 B, 0.12 R,0.15 G,0.25 B,0.10 R,0.18 G,0.03 B,0.09 R,0.18 G,0.03 B,0.09 R,0.02 G,0.08 B,0.10 R,0.03 G,0.05 B,0.02

Observations and states O 1 O 2 O 3 O 4 O 5 O 6 O 7 O 8 OBS: RRG G B R G R State: S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S i = U 1 /U 2 /U 3 ; A particular state S: State sequence O: Observation sequence S* = “best” possible state (urn) sequence Goal: Maximize P(S*|O) by choosing “best” S

Goal Maximize P(S|O) where S is the State Sequence and O is the Observation Sequence

Baye’s Theorem P(A) -: Prior P(B|A) -: Likelihood

State Transitions Probability By Markov Assumption (k=1)

Observation Sequence probability Assumption that ball drawn depends only on the Urn chosen

Grouping terms P(S).P(O|S) =[P(O 0 |S 0 ).P(S 1 |S 0 )]. [P(O 1 |S 1 ).P(S 2 |S 1 )]. [P(O 2 |S 2 ).P(S 3 |S 2 )]. [P(O 3 |S 3 ).P(S 4 |S 3 )]. [P(O 4 |S 4 ).P(S 5 |S 4 )]. [P(O 5 |S 5 ).P(S 6 |S 5 )]. [P(O 6 |S 6 ).P(S 7 |S 6 )]. [P(O 7 |S 7 ).P(S 8 |S 7 )]. [P(O 8 |S 8 ).P(S 9 |S 8 )]. We introduce the states S 0 and S 9 as initial and final states respectively. After S 8 the next state is S 9 with probability 1, i.e., P(S 9 |S 8 )=1 O 0 is ε-transition O 0 O 1 O 2 O 3 O 4 O 5 O 6 O 7 O 8 Obs: ε RRG G B R G R State: S 0 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9

Introducing useful notation S0S0 S1S1 S8S8 S7S7 S9S9 S2S2 S3S3 S4S4 S5S5 S6S6 O 0 O 1 O 2 O 3 O 4 O 5 O 6 O 7 O 8 Obs: ε RRG G B R G R State: S 0 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9 ε R R GGBR G R P(O k |S k ).P(S k+1 |S k )=P(S k  S k+1 ) OkOk

Viterbi Algorithm for the Urn problem (first two symbols) S0S0 U1U1 U2U2 U3U U1U1 U2U2 U3U U1U1 U2U2 U3U3 U1U1 U2U2 U3U * *0.036 *: winner sequences ε R

Markov process of order>1 (say 2) Same theory works P(S).P(O|S) =P(O 0 |S 0 ).P(S 1 |S 0 ). [P(O 1 |S 1 ).P(S 2 |S 1 S 0 )]. [P(O 2 |S 2 ).P(S 3 |S 2 S 1 )]. [P(O 3 |S 3 ).P(S 4 |S 3 S 2 )]. [P(O 4 |S 4 ).P(S 5 |S 4 S 3 )]. [P(O 5 |S 5 ).P(S 6 |S 5 S 4 )]. [P(O 6 |S 6 ).P(S 7 |S 6 S 5 )]. [P(O 7 |S 7 ).P(S 8 |S 7 S 6 )]. [P(O 8 |S 8 ).P(S 9 |S 8 S 7 )]. We introduce the states S 0 and S 9 as initial and final states respectively. After S 8 the next state is S 9 with probability 1, i.e., P(S 9 |S 8 S 7 )=1 O 0 is ε-transition O 0 O 1 O 2 O 3 O 4 O 5 O 6 O 7 O 8 Obs: ε RRG G B R G R State: S 0 S 1 S 2 S 3 S 4 S 5 S 6 S 7 S 8 S 9

Adjustments Transition probability table will have tuples on rows and states on columns Output probability table will remain the same In the Viterbi tree, the Markov process will take effect from the 3 rd input symbol (εRR) There will be 27 leaves, out of which only 9 will remain Sequences ending in same tuples will be compared Instead of U1, U2 and U3 U 1 U 1, U 1 U 2, U 1 U 3, U 2 U 1, U 2 U 2,U 2 U 3, U 3 U 1,U 3 U 2,U 3 U 3

Probabilistic FSM (a 1 :0.3) (a 2 :0.4) (a 1 :0.2) (a 2 :0.3) (a 1 :0.1) (a 2 :0.2) (a 1 :0.3) (a 2 :0.2) The question here is: “what is the most likely state sequence given the output sequence seen” S1S1 S2S2

Developing the tree Start S1S2S1S2S1S2S1S2S1S *0.1= *0.2= *0.4= *0.3= *0.2= € a1a1 a2a2 Choose the winning sequence per state per iteration

Tree structure contd… S1S2S1S2S1S *0.1= S S S S a1a1 a2a2 The problem being addressed by this tree is a1-a2-a1-a2 is the output sequence and μ the model or the machine

Path found : (working backward) S1S1 S2S2 S1S1 S2S2 S1S1 a2a2 a1a1 a1a1 a2a2 Problem statement : Find the best possible sequence Start symbolState collectionAlphabet set Transitions T is defined as

Tabular representation of the tree €a1a1 a2a2 a1a1 a2a2 S1S1 1.0(1.0*0.1,0.0*0.2 )=(0.1,0.0) (0.02, 0.09) (0.009, 0.012)(0.0024, ) S2S2 0.0(1.0*0.3,0.0*0.3 )=(0.3,0.0) (0.04,0.0 6) (0.027,0.018)(0.0048, ) Ending state Latest symbol observed Note: Every cell records the winning probability ending in that state Final winner The bold faced values in each cell shows the sequence probability ending in that state. Going backward from final winner sequence which ends in state S2 (indicated By the 2 nd tuple), we recover the sequence.

Algorithm (following James Alan, Natural Language Understanding (2 nd edition), Benjamin Cummins (pub.), 1995 Given: 1. The HMM, which means: a. Start State: S 1 b. Alphabet: A = {a 1, a 2, … a p } c. Set of States: S = {S 1, S 2, … S n } d. Transition probability which is equal to 2. The output string a 1 a 2 …a T To find: The most likely sequence of states C 1 C 2 …C T which produces the given output sequence, i.e., C 1 C 2 …C T =

Algorithm contd… Data Structure: 1. A N*T array called SEQSCORE to maintain the winner sequence always (N=#states, T=length of o/p sequence) 2. Another N*T array called BACKPTR to recover the path. Three distinct steps in the Viterbi implementation 1. Initialization 2. Iteration 3. Sequence Identification

1.Initialization SEQSCORE(1,1)=1.0 BACKPTR(1,1)=0 For(i=2 to N) do SEQSCORE(i,1)=0.0 [ expressing the fact that first state is S 1 ] 2.Iteration For(t=2 to T) do For(i=1 to N) do SEQSCORE(i,t) = Max (j=1,N) BACKPTR(I,t) = index j that gives the MAX above

3.Seq. Identification C(T) = i that maximizes SEQSCORE(i,T) For i from (T-1) to 1 do C(i) = BACKPTR[C(i+1),(i+1)] Optimizations possible: 1.BACKPTR can be 1*T 2.SEQSCORE can be T*2 Homework:- Compare this with A*, Beam Search [Homework] Reason for this comparison: Both of them work for finding and recovering sequence