CS621: Artificial Intelligence Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay Lecture 19: Hidden Markov Models.

Example: Blocks World
STRIPS: a planning system whose rules have a precondition list, a deletion list, and an addition list.
START: on(B, table), on(A, table), on(C, A), hand empty, clear(C), clear(B)
GOAL: on(C, table), on(B, C), on(A, B), hand empty, clear(A)
[Figure: robot hand above the START configuration (C on A, B on the table) and the GOAL configuration (A on B on C).]

Rules
R1: pickup(x)
– Precondition & Deletion List: handempty, on(x,table), clear(x)
– Add List: holding(x)
R2: putdown(x)
– Precondition & Deletion List: holding(x)
– Add List: handempty, on(x,table), clear(x)

Rules
R3: stack(x,y)
– Precondition & Deletion List: holding(x), clear(y)
– Add List: on(x,y), clear(x), handempty
R4: unstack(x,y)
– Precondition & Deletion List: on(x,y), clear(x), handempty
– Add List: holding(x), clear(y)

Plan for the blocks world problem
For the given problem, Start → Goal can be achieved by the following sequence:
1. Unstack(C,A)
2. Putdown(C)
3. Pickup(B)
4. Stack(B,C)
5. Pickup(A)
6. Stack(A,B)
Execution of a plan is achieved through a data structure called a Triangular Table.
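The four rules and the six-step plan can be checked mechanically. The following is a minimal sketch, assuming states are sets of ground facts written as strings; the helper names `apply`, `run`, `START`, `GOAL`, and `PLAN` are illustrative and not part of the lecture. It does not model the Triangular Table.

```python
# Each rule returns (preconditions, deletion list, addition list).
# In these slides the precondition and deletion lists coincide.
def pickup(x):
    pre = {"handempty", f"on({x},table)", f"clear({x})"}
    return pre, pre, {f"holding({x})"}

def putdown(x):
    pre = {f"holding({x})"}
    return pre, pre, {"handempty", f"on({x},table)", f"clear({x})"}

def stack(x, y):
    pre = {f"holding({x})", f"clear({y})"}
    return pre, pre, {f"on({x},{y})", f"clear({x})", "handempty"}

def unstack(x, y):
    pre = {f"on({x},{y})", f"clear({x})", "handempty"}
    return pre, pre, {f"holding({x})", f"clear({y})"}

def apply(state, rule):
    """Apply one ground rule, failing if a precondition is missing."""
    pre, delete, add = rule
    if not pre <= state:
        raise ValueError(f"unsatisfied preconditions: {sorted(pre - state)}")
    return (state - delete) | add

def run(state, plan):
    """Apply the rules of a plan in order and return the final state."""
    for rule in plan:
        state = apply(state, rule)
    return state

START = {"on(B,table)", "on(A,table)", "on(C,A)",
         "handempty", "clear(C)", "clear(B)"}
GOAL = {"on(C,table)", "on(B,C)", "on(A,B)", "handempty", "clear(A)"}
PLAN = [unstack("C", "A"), putdown("C"), pickup("B"),
        stack("B", "C"), pickup("A"), stack("A", "B")]

final_state = run(set(START), PLAN)
print(final_state == GOAL)  # True
```

Trying `apply(START, pickup("A"))` raises an error because clear(A) does not hold at the start, which is why the plan must unstack C first.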

Why Probability? (discussion based on the book “Automated Planning” by Dana Nau)

Motivation
In many situations, actions may have more than one possible outcome:
– Action failures, e.g., gripper drops its load
– Exogenous events, e.g., road closed
We would like to be able to plan in such situations. One approach: Markov Decision Processes.
[Figure: action "Grasp block c" applied to blocks a, b, c, showing the intended outcome and an unintended outcome.]

Stochastic Systems
Stochastic system: a triple $\Sigma = (S, A, P)$
– $S$ = finite set of states
– $A$ = finite set of actions
– $P_a(s' \mid s)$ = probability of going to $s'$ if we execute $a$ in $s$
– $\sum_{s' \in S} P_a(s' \mid s) = 1$
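As an illustration of the triple (S, A, P), here is a small sketch with a hypothetical two-state, one-action system. The states, the action name `move`, and all probabilities are invented for this example and are not taken from the lecture. It checks the normalization constraint and samples one transition.

```python
import random

# Hypothetical system: P[(s, a)] maps each successor s' to P_a(s' | s).
P = {
    ("s1", "move"): {"s2": 0.8, "s1": 0.2},  # the action may fail and stay put
    ("s2", "move"): {"s2": 1.0},
}

def is_stochastic(P, tol=1e-9):
    """Check that the successor probabilities sum to 1 for every (s, a)."""
    return all(abs(sum(dist.values()) - 1.0) <= tol for dist in P.values())

def step(P, s, a, rng):
    """Sample a successor state s' with probability P_a(s' | s)."""
    dist = P[(s, a)]
    successors = list(dist)
    weights = [dist[x] for x in successors]
    return rng.choices(successors, weights=weights)[0]

print(is_stochastic(P))  # True
print(step(P, "s1", "move", random.Random()))  # "s1" or "s2"
```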

Example
– Robot r1 starts at location l1 (state s1 in the diagram)
– Objective is to get r1 to location l4 (state s4 in the diagram)
[Figure: state-transition diagram with the Start and Goal states marked.]

Example
No classical plan (sequence of actions) can be a solution, because we can't guarantee we'll be in a state where the next action is applicable, e.g.,
π = ⟨move(r1,l1,l2), move(r1,l2,l3), move(r1,l3,l4)⟩
[Figure: state-transition diagram with the Start and Goal states marked.]

Another Example
A colored ball choosing example:

Urn     # of Red   # of Green   # of Blue
Urn 1      30          50           20
Urn 2      10          40           50
Urn 3      60          10           30

Probability of transition to another urn after picking a ball:
[Table: rows and columns U1, U2, U3; entries not preserved in this transcript.]

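Example (contd.)
Given: the transition table over U1, U2, U3 and the emission table of R, G, B for each urn [table entries not preserved in this transcript].
Observation: RRGGBRGR
State sequence: ?? Not so easily computable.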

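Example (contd.)
Here:
– S = {U1, U2, U3}
– V = {R, G, B}
For the observation sequence:
– $O = \{o_1 \dots o_n\}$
And the state sequence:
– $Q = \{q_1 \dots q_n\}$
π is the initial state distribution over U1, U2, U3; A is the transition matrix over the urns; B is the emission matrix (urns × R, G, B). [Matrix entries not preserved in this transcript.]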
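The emission matrix B can be recovered from the ball counts stated for the three urns. This is a minimal sketch assuming each ball is drawn uniformly at random and returned to its urn, so the proportions stay fixed; the names `URN_COUNTS` and `emission_matrix` are illustrative.

```python
# Ball counts per urn, as listed in the urn example.
URN_COUNTS = {
    "U1": {"R": 30, "G": 50, "B": 20},
    "U2": {"R": 10, "G": 40, "B": 50},
    "U3": {"R": 60, "G": 10, "B": 30},
}

def emission_matrix(counts):
    """Turn per-urn colour counts into emission probabilities b_j(o_k)."""
    B = {}
    for urn, balls in counts.items():
        total = sum(balls.values())
        B[urn] = {colour: n / total for colour, n in balls.items()}
    return B

B = emission_matrix(URN_COUNTS)
print(B["U1"])  # {'R': 0.3, 'G': 0.5, 'B': 0.2}
```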

Hidden Markov Models

Model Definition
– Set of states: $S$, where $|S| = N$
– Output alphabet: $V$
– Transition probabilities: $A = \{a_{ij}\}$
– Emission probabilities: $B = \{b_j(o_k)\}$
– Initial state probabilities: $\pi$

Markov Processes
Properties:
– Limited horizon: given the previous $n$ states, the state at time $t$ is independent of all earlier states $X_0, \dots, X_{t-n-1}$:
  $P(X_t = i \mid X_{t-1}, X_{t-2}, \dots, X_0) = P(X_t = i \mid X_{t-1}, X_{t-2}, \dots, X_{t-n})$
– Time invariance: $P(X_t = i \mid X_{t-1} = j) = P(X_1 = i \mid X_0 = j) = P(X_n = i \mid X_{n-1} = j)$

Three Basic Problems of HMM
1. Given observation sequence $O = \{o_1 \dots o_T\}$: efficiently estimate $P(O \mid \lambda)$
2. Given observation sequence $O = \{o_1 \dots o_T\}$: get the best $Q = \{q_1 \dots q_T\}$, i.e., maximize $P(Q \mid O, \lambda)$
3. How to adjust $\lambda = (A, B, \pi)$ to best maximize $P(O \mid \lambda)$: re-estimate $\lambda$

Three basic problems (contd.)
Problem 1: Likelihood of a sequence
– Forward Procedure
– Backward Procedure
Problem 2: Best state sequence
– Viterbi Algorithm
Problem 3: Re-estimation
– Baum-Welch (Forward-Backward Algorithm)

Problem 2
Given observation sequence $O = \{o_1 \dots o_T\}$, get the "best" state sequence $Q = \{q_1 \dots q_T\}$, i.e., maximize $P(Q \mid O, \lambda)$.
Solution:
1. Best state individually likely at a position i
2. Best state given all the previously observed states and observations → Viterbi Algorithm

Example
Output observed: aabb
What state sequence is most probable? Since the state sequence cannot be predicted with certainty, the machine is qualified as "hidden".
Note: ∑ P(outlinks) = 1 for all states

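Probabilities for different possible sequences
[Figure: tree of candidate state sequences (1; 1,1; 1,2; 1,1,1; 1,1,2; 1,2,1; 1,2,2; and so on) with the probability of each; values not preserved in this transcript.]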

Viterbi for higher order HMM
If the transition probability is $P(s_i \mid s_{i-1}, s_{i-2})$ (order 2 HMM), then the Markovian assumption takes effect only after two levels (generalizing to order n: after n levels).

Forward and Backward Probability Calculation

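A Simple HMM
[Figure: two-state HMM with states q and r; arcs labeled a: 0.3, b: 0.1, a: 0.2, b: 0.1, b: 0.2, b: 0.5, a: 0.2, a: 0.4. The arc each label belongs to is not preserved in this transcript.]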

Forward or α-probabilities
Let $\alpha_i(t)$ be the probability of producing $w_{1,t-1}$ while ending up in state $s_i$:
$\alpha_i(t) = P(w_{1,t-1}, S_t = s_i), \quad t > 1$

Initial condition on $\alpha_i(t)$
$\alpha_i(1) = 1.0$ if $i = 1$, and $0$ otherwise

Probability of the observation using $\alpha_i(t)$
$P(w_{1,n}) = \sum_{i=1}^{\sigma} P(w_{1,n}, S_{n+1} = s_i) = \sum_{i=1}^{\sigma} \alpha_i(n+1)$
$\sigma$ is the total number of states.

Recursive expression for α
$\alpha_j(t+1) = P(w_{1,t}, S_{t+1} = s_j)$
$= \sum_{i=1}^{\sigma} P(w_{1,t}, S_t = s_i, S_{t+1} = s_j)$
$= \sum_{i=1}^{\sigma} P(w_{1,t-1}, S_t = s_i)\, P(w_t, S_{t+1} = s_j \mid w_{1,t-1}, S_t = s_i)$
$= \sum_{i=1}^{\sigma} P(w_{1,t-1}, S_t = s_i)\, P(w_t, S_{t+1} = s_j \mid S_t = s_i)$
$= \sum_{i=1}^{\sigma} \alpha_i(t)\, P(w_t, S_{t+1} = s_j \mid S_t = s_i)$
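The recursion can be sketched for an arc-emission HMM with states q and r, where q is the start state. The arc table below is an assumption: it is one assignment of the eight labels listed on the "A Simple HMM" slide to arcs such that each state's outgoing probabilities sum to 1, since the figure's layout is not preserved here. The names `ARCS`, `STATES`, and `forward` are illustrative.

```python
# ARCS[(source, symbol)][target] = P(symbol, next state = target | source).
# ASSUMED assignment of the slide's arc labels; the original figure is lost.
ARCS = {
    ("q", "a"): {"q": 0.4, "r": 0.3},
    ("q", "b"): {"q": 0.2, "r": 0.1},
    ("r", "a"): {"q": 0.2, "r": 0.2},
    ("r", "b"): {"q": 0.1, "r": 0.5},
}
STATES = ("q", "r")

def forward(word, start="q"):
    """Return alpha_i(n+1) for every state i after consuming `word`."""
    # Initial condition: alpha_i(1) is 1 for the start state, 0 otherwise.
    alpha = {s: (1.0 if s == start else 0.0) for s in STATES}
    for symbol in word:
        nxt = {s: 0.0 for s in STATES}
        for i in STATES:
            for j, p in ARCS[(i, symbol)].items():
                # alpha_j(t+1) += alpha_i(t) * P(w_t, S_{t+1}=s_j | S_t=s_i)
                nxt[j] += alpha[i] * p
        alpha = nxt
    return alpha

alpha = forward("bbba")
# P(w_{1,n}) is the sum of alpha_i(n+1) over all states.
print(round(sum(alpha.values()), 4))  # 0.0279 under the assumed arcs
```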

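The forward probabilities of "bbba"
[Table: one column per time tick, with the input prefix consumed so far (ε, b, bb, bbb, bbba), the α value of each state, and P(w,t); values not preserved in this transcript.]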

Backward or β-probabilities
Let $\beta_i(t)$ be the probability of seeing $w_{t,n}$, given that the state of the HMM at time $t$ is $s_i$:
$\beta_i(t) = P(w_{t,n} \mid S_t = s_i)$

Probability of the observation using β
$P(w_{1,n}) = \beta_1(1)$

Recursive expression for β
$\beta_i(t-1) = P(w_{t-1,n} \mid S_{t-1} = s_i)$
$= \sum_{j=1}^{\sigma} P(w_{t-1,n}, S_t = s_j \mid S_{t-1} = s_i)$
$= \sum_{j=1}^{\sigma} P(w_{t-1}, S_t = s_j \mid S_{t-1} = s_i)\, P(w_{t,n} \mid w_{t-1}, S_t = s_j, S_{t-1} = s_i)$
$= \sum_{j=1}^{\sigma} P(w_{t-1}, S_t = s_j \mid S_{t-1} = s_i)\, P(w_{t,n} \mid S_t = s_j)$ (consequence of the Markov assumption)
$= \sum_{j=1}^{\sigma} P(w_{t-1}, S_t = s_j \mid S_{t-1} = s_i)\, \beta_j(t)$
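The backward recursion can be sketched the same way, running right to left over the word. As before, the arc table is an assumed assignment of the labels from the "A Simple HMM" slide (the figure itself is not preserved), with q taken as state 1, and the boundary condition $\beta_i(n+1) = 1$ is the usual convention rather than something stated on the slides.

```python
# ASSUMED arc table: ARCS[(source, symbol)][target] = P(symbol, target | source).
ARCS = {
    ("q", "a"): {"q": 0.4, "r": 0.3},
    ("q", "b"): {"q": 0.2, "r": 0.1},
    ("r", "a"): {"q": 0.2, "r": 0.2},
    ("r", "b"): {"q": 0.1, "r": 0.5},
}
STATES = ("q", "r")

def backward(word):
    """Return beta_i(1) for every state i, i.e. P(word | start in state i)."""
    # Boundary condition (assumed convention): beta_i(n+1) = 1.
    beta = {s: 1.0 for s in STATES}
    for symbol in reversed(word):
        # beta_i(t-1) = sum_j P(w_{t-1}, S_t=s_j | S_{t-1}=s_i) * beta_j(t)
        beta = {
            i: sum(p * beta[j] for j, p in ARCS[(i, symbol)].items())
            for i in STATES
        }
    return beta

beta = backward("bbba")
# With q as the start state, P(w_{1,n}) = beta_q(1).
print(round(beta["q"], 4))  # 0.0279 under the assumed arcs
```

Under these assumptions the value agrees with the sum of the forward probabilities for the same word, as the two formulas for $P(w_{1,n})$ require.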

Problem 1 of the three basic problems

Problem 1 (contd)
– Order $2T \cdot N^T$: definitely not efficient!
– Is there a method to tackle this problem? Yes: the Forward or Backward Procedure.

Forward Procedure Forward Step:

Forward Procedure

Backward Procedure

Forward Backward Procedure
– Benefit: order $N^2 T$, as compared to $2T \cdot N^T$ for simple computation
– Only the Forward or the Backward procedure is needed for Problem 1
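The cost difference can be seen by computing P(O|λ) both ways on the urn example. This sketch assumes the common state-emission formulation with parameters π, A, B. The emission probabilities follow from the urn ball counts, while the values of `PI` and `A` are hypothetical placeholders, because the lecture's own tables are not preserved in this transcript.

```python
from itertools import product

STATES = ("U1", "U2", "U3")
PI = {"U1": 0.5, "U2": 0.3, "U3": 0.2}            # hypothetical
A = {                                             # hypothetical
    "U1": {"U1": 0.5, "U2": 0.3, "U3": 0.2},
    "U2": {"U1": 0.2, "U2": 0.5, "U3": 0.3},
    "U3": {"U1": 0.3, "U2": 0.2, "U3": 0.5},
}
B = {                                             # from the urn ball counts
    "U1": {"R": 0.3, "G": 0.5, "B": 0.2},
    "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
    "U3": {"R": 0.6, "G": 0.1, "B": 0.3},
}

def forward_likelihood(obs):
    """P(O | lambda) in O(N^2 T) time via forward variables."""
    alpha = {i: PI[i] * B[i][obs[0]] for i in STATES}
    for o in obs[1:]:
        alpha = {
            j: sum(alpha[i] * A[i][j] for i in STATES) * B[j][o]
            for j in STATES
        }
    return sum(alpha.values())

def brute_force_likelihood(obs):
    """P(O | lambda) by summing over all N^T state sequences."""
    total = 0.0
    for path in product(STATES, repeat=len(obs)):
        p = PI[path[0]] * B[path[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
        total += p
    return total

obs = "RRGGBRGR"
print(abs(forward_likelihood(obs) - brute_force_likelihood(obs)) < 1e-12)  # True
```

For this 8-symbol observation the brute-force sum already visits 3^8 = 6561 state sequences, while the forward pass performs 7 updates of 3 × 3 products.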

Problem 2
Given observation sequence $O = \{o_1 \dots o_T\}$, get the "best" state sequence $Q = \{q_1 \dots q_T\}$, i.e., maximize $P(Q \mid O, \lambda)$.
Solution:
1. Best state individually likely at a position i
2. Best state given all the previously observed states and observations → Viterbi Algorithm

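Viterbi Algorithm
Define
$\delta_t(i) = \max_{q_1, \dots, q_{t-1}} P(q_1 q_2 \dots q_t = i,\ o_1 o_2 \dots o_t \mid \lambda)$
i.e., the probability of the state sequence ending in state $i$ at time $t$ that has the best joint probability so far.
By induction, we have
$\delta_{t+1}(j) = \left[\max_i \delta_t(i)\, a_{ij}\right] b_j(o_{t+1})$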

Viterbi Algorithm
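A compact sketch of the Viterbi recursion on the urn example follows, keeping for each state the best joint probability so far together with a back-pointer. It assumes the state-emission formulation; `PI` and `A` are hypothetical values (the lecture's tables are not preserved in this transcript), and only `B` is derived from the urn ball counts.

```python
STATES = ("U1", "U2", "U3")
PI = {"U1": 0.5, "U2": 0.3, "U3": 0.2}            # hypothetical
A = {                                             # hypothetical
    "U1": {"U1": 0.5, "U2": 0.3, "U3": 0.2},
    "U2": {"U1": 0.2, "U2": 0.5, "U3": 0.3},
    "U3": {"U1": 0.3, "U2": 0.2, "U3": 0.5},
}
B = {                                             # from the urn ball counts
    "U1": {"R": 0.3, "G": 0.5, "B": 0.2},
    "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
    "U3": {"R": 0.6, "G": 0.1, "B": 0.3},
}

def path_probability(path, obs):
    """Joint probability P(Q, O | lambda) of one state path and the observations."""
    p = PI[path[0]] * B[path[0]][obs[0]]
    for t in range(1, len(obs)):
        p *= A[path[t - 1]][path[t]] * B[path[t]][obs[t]]
    return p

def viterbi(obs):
    """Return (best state path, its joint probability with obs)."""
    delta = {i: PI[i] * B[i][obs[0]] for i in STATES}
    backpointers = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for j in STATES:
            # Best predecessor of j: maximizes delta_t(i) * a_ij.
            best = max(STATES, key=lambda i: prev[i] * A[i][j])
            ptr[j] = best
            delta[j] = prev[best] * A[best][j] * B[j][o]
        backpointers.append(ptr)
    # Trace back from the best final state.
    last = max(STATES, key=lambda i: delta[i])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, delta[last]

best_path, best_prob = viterbi("RRGGBRGR")
print(best_path, best_prob)
```

The returned probability equals `path_probability(best_path, obs)`, and no other state path scores higher under the same parameters.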

Problem 3
How to adjust $\lambda = (A, B, \pi)$ to best maximize $P(O \mid \lambda)$: re-estimate $\lambda$
Solution:
– To re-estimate (iteratively update and improve) the HMM parameters A, B, π, use the Baum-Welch algorithm

Baum-Welch Algorithm Define Putting forward and backward variables

Baum-Welch algorithm

Define Then, expected number of transitions from S i And, expected number of transitions from S j to S i

Baum-Welch Algorithm
Baum et al. have proved that the above re-estimation equations lead to a model as good as or better than the previous one.
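One re-estimation step can be sketched as follows, assuming the standard state-emission formulation with forward variables `alpha`, backward variables `beta`, state posteriors `gamma`, and transition posteriors `xi`. The two-state starting model and the integer encoding of R, G, B as 0, 1, 2 are hypothetical choices for illustration, not values from the lecture.

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(o_1..o_t, q_t = i | lambda), with t zero-based."""
    N = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(N)) * B[j][o]
                      for j in range(N)])
    return alpha

def backward(obs, A, B):
    """beta[t][i] = P(o_{t+1}..o_T | q_t = i, lambda), with t zero-based."""
    N = len(A)
    beta = [[1.0] * N]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(N))
                        for i in range(N)])
    return beta

def baum_welch_step(obs, pi, A, B):
    """One re-estimation step; returns (new_pi, new_A, new_B, old likelihood)."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    likelihood = sum(alpha[-1])
    # gamma[t][i]: probability of being in state i at time t given O.
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(N)]
             for t in range(T)]
    # xi[t][i][j]: probability of the transition i -> j at time t given O.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]
    return new_pi, new_A, new_B, likelihood

# Hypothetical two-state starting model; symbols R, G, B encoded as 0, 1, 2.
OBS = [0, 0, 1, 1, 2, 0, 1, 0]  # RRGGBRGR
PI = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.3, 0.2], [0.2, 0.4, 0.4]]

pi1, A1, B1, old_likelihood = baum_welch_step(OBS, PI, A, B)
new_likelihood = sum(forward(OBS, pi1, A1, B1)[-1])
print(new_likelihood >= old_likelihood)  # True
```

In practice the step is repeated until the likelihood stops improving; a longer sequence would also call for scaling or log-space arithmetic, which this sketch omits.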