
Combined Lecture CS621: Artificial Intelligence (lecture 19) CS626/449: Speech-NLP-Web/Topics-in-AI (lecture 20) Hidden Markov Models Pushpak Bhattacharyya Computer Science and Engineering Department IIT Bombay

Example: Blocks World
STRIPS: a planning system whose rules have a precondition & deletion list and an addition list.
[Figure: robot hand with blocks A, B, C in the START and GOAL configurations]
START: on(B, table), on(A, table), on(C, A), handempty, clear(C), clear(B)
GOAL: on(C, table), on(B, C), on(A, B), handempty, clear(A)

Rules
R1: pickup(x)
Precondition & Deletion List: handempty, on(x,table), clear(x)
Add List: holding(x)
R2: putdown(x)
Precondition & Deletion List: holding(x)
Add List: handempty, on(x,table), clear(x)

Rules
R3: stack(x,y)
Precondition & Deletion List: holding(x), clear(y)
Add List: on(x,y), clear(x), handempty
R4: unstack(x,y)
Precondition & Deletion List: on(x,y), clear(x), handempty
Add List: holding(x), clear(y)

Plan for the blocks world problem
For the given problem, Start → Goal can be achieved by the following sequence (a sketch of executing it with the rules above is given below):
1. Unstack(C,A)
2. Putdown(C)
3. Pickup(B)
4. Stack(B,C)
5. Pickup(A)
6. Stack(A,B)
Execution of a plan is achieved through a data structure called a Triangular Table.
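
Below is a minimal sketch of how rules R1-R4 execute this plan. The encoding of states as sets of predicate strings and the rule/apply_step helpers are illustrative choices, not part of the original slides.

```python
# A minimal sketch: apply the plan above to the START state using rules R1-R4.
# State = set of predicate strings; each rule returns (preconditions to delete, additions).

def rule(name, x, y=None):
    if name == "pickup":    # R1
        return ({"handempty", f"on({x},table)", f"clear({x})"}, {f"holding({x})"})
    if name == "putdown":   # R2
        return ({f"holding({x})"}, {"handempty", f"on({x},table)", f"clear({x})"})
    if name == "stack":     # R3
        return ({f"holding({x})", f"clear({y})"}, {f"on({x},{y})", f"clear({x})", "handempty"})
    if name == "unstack":   # R4
        return ({f"on({x},{y})", f"clear({x})", "handempty"}, {f"holding({x})", f"clear({y})"})
    raise ValueError(name)

def apply_step(state, step):
    pre_del, add = rule(*step)
    assert pre_del <= state, f"precondition failed for {step}"
    return (state - pre_del) | add

start = {"on(B,table)", "on(A,table)", "on(C,A)", "handempty", "clear(C)", "clear(B)"}
goal  = {"on(C,table)", "on(B,C)", "on(A,B)", "handempty", "clear(A)"}
plan  = [("unstack", "C", "A"), ("putdown", "C"), ("pickup", "B"),
         ("stack", "B", "C"), ("pickup", "A"), ("stack", "A", "B")]

state = start
for step in plan:
    state = apply_step(state, step)
print(goal <= state)   # True: the plan transforms START into GOAL
```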

Why Probability? (discussion based on the book "Automated Planning" by Dana Nau)

Motivation
In many situations, actions may have more than one possible outcome:
- Action failures, e.g., the gripper drops its load
- Exogenous events, e.g., a road is closed
We would like to be able to plan in such situations. One approach: Markov Decision Processes.
[Figure: "Grasp block c", shown with the intended outcome (c held) and an unintended outcome (c dropped)]

Stochastic Systems
Stochastic system: a triple Σ = (S, A, P)
- S = finite set of states
- A = finite set of actions
- P_a(s' | s) = probability of going to s' if we execute a in s
- Σ_{s' ∈ S} P_a(s' | s) = 1

Example
Robot r1 starts at location l1 (state s1 in the diagram). The objective is to get r1 to location l4 (state s4 in the diagram). [Figure: state diagram with Start and Goal marked]

Example
No classical plan (sequence of actions) can be a solution, because we can't guarantee we'll be in a state where the next action is applicable, e.g., π = move(r1,l1,l2), move(r1,l2,l3), move(r1,l3,l4). A small simulation of this point is sketched below.
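
The following sketch simulates executing the fixed plan π in a stochastic system. The transition probabilities are invented placeholders (the actual numbers appear only in the slide's diagram), but with any such probabilities a fixed action sequence succeeds only if every step happens to have its intended outcome.

```python
# Sketch: executing a fixed plan in a stochastic system.
# The probabilities below are hypothetical placeholders, not taken from the slides.
import random

P = {  # P[(state, action)] -> {next_state: probability}
    ("l1", "move(r1,l1,l2)"): {"l2": 0.8, "l1": 0.2},
    ("l2", "move(r1,l2,l3)"): {"l3": 0.8, "l2": 0.2},
    ("l3", "move(r1,l3,l4)"): {"l4": 0.8, "l3": 0.2},
}

def run(plan, state="l1"):
    for a in plan:
        if (state, a) not in P:            # action not applicable where we ended up
            return state
        outcomes = P[(state, a)]
        state = random.choices(list(outcomes), weights=list(outcomes.values()))[0]
    return state

plan = ["move(r1,l1,l2)", "move(r1,l2,l3)", "move(r1,l3,l4)"]
trials = 10_000
print(sum(run(plan) == "l4" for _ in range(trials)) / trials)   # roughly 0.8**3 ≈ 0.51
```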

Another Example
A colored ball choosing example:
Urn 1: 30 Red, 50 Green, 20 Blue
Urn 2: 10 Red, 40 Green, 50 Blue
Urn 3: 60 Red, 10 Green, 30 Blue
Probability of transition to another urn after picking a ball: given by a transition diagram over U1, U2, U3 (the arc values visible in the figure include 0.1, 0.4, 0.5, 0.6, 0.2 and 0.3).

Example (contd.)
Given the observation (emission) probabilities, which follow from the ball proportions in each urn,
      R    G    B
U1   0.3  0.5  0.2
U2   0.1  0.4  0.5
U3   0.6  0.1  0.3
and the urn-to-urn transition probabilities, and the Observation: RRGGBRGR
State Sequence: ?? Not so easily computable.

Example (contd.)
Here: S = {U1, U2, U3}, V = {R, G, B}
For observation O = {o1 … on} and state sequence Q = {q1 … qn}, π is the initial state distribution.
A = the urn-to-urn transition probability matrix (from the diagram above)
B = the emission probability matrix:
      R    G    B
U1   0.3  0.5  0.2
U2   0.1  0.4  0.5
U3   0.6  0.1  0.3

Hidden Markov Models

Model Definition
- Set of states: S, where |S| = N
- Output alphabet: V
- Transition probabilities: A = {a_ij}
- Emission probabilities: B = {b_j(o_k)}
- Initial state probabilities: π
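
Written out as data, the urn example instantiates this definition as follows. This is only a sketch: B follows directly from the ball counts on the earlier slide, but the flattened slide shows only some of the transition values, so parts of A (and the uniform π) below are assumptions for illustration.

```python
# The urn HMM as (S, V, A, B, pi). B is from the ball counts (counts / 100);
# A is only partly visible in the flattened slide, so some entries are assumed;
# pi is assumed uniform.
S  = ["U1", "U2", "U3"]                       # states
V  = ["R", "G", "B"]                          # output alphabet
pi = {"U1": 1/3, "U2": 1/3, "U3": 1/3}        # initial state probabilities (assumed)

A = {  # A[i][j] = P(next urn = j | current urn = i); rows sum to 1
    "U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
    "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},  # last entry assumed
    "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3},  # partly assumed
}
B = {  # B[j][k] = P(ball colour k | urn j)
    "U1": {"R": 0.3, "G": 0.5, "B": 0.2},
    "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
    "U3": {"R": 0.6, "G": 0.1, "B": 0.3},
}
```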

Markov Processes Properties
Limited Horizon: given the previous n states, a state is independent of all earlier states.
P(X_t = i | X_{t-1}, X_{t-2}, …, X_0) = P(X_t = i | X_{t-1}, X_{t-2}, …, X_{t-n})
Time invariance: P(X_t = i | X_{t-1} = j) = P(X_1 = i | X_0 = j) = P(X_n = i | X_{n-1} = j)

Three Basic Problems of HMM
1. Given an observation sequence O = {o1 … oT}, efficiently estimate P(O | λ).
2. Get the best Q = {q1 … qT}, i.e. maximize P(Q | O, λ).
3. Adjust λ to best maximize P(O | λ), i.e. re-estimate λ.

Three basic problems (contd.)
Problem 1: Likelihood of a sequence: Forward Procedure, Backward Procedure
Problem 2: Best state sequence: Viterbi Algorithm
Problem 3: Re-estimation: Baum-Welch (Forward-Backward) Algorithm

Problem 2
Given an observation sequence O = {o1 … oT}, get the "best" Q = {q1 … qT}, i.e. either
- the state individually most likely at each position i, or
- the best state sequence given all the previously observed states and observations: the Viterbi Algorithm.

Example
Output observed: aabb. What state sequence is most probable? Since the state sequence cannot be predicted with certainty, the machine is given the qualification "hidden". Note: Σ P(outlinks) = 1 for all states.

Probabilities for different possible sequences
1
1,1: 0.4          1,2: 0.15
1,1,1: 0.16       1,1,2: 0.06
1,2,1: 0.0375     1,2,2: 0.0225
1,1,1,1: 0.016    1,1,1,2: 0.056
1,1,2,1: 0.018    1,1,2,2: 0.018
... and so on

Viterbi for higher order HMM
If P(s_i | s_{i-1}, s_{i-2}) (an order-2 HMM), then the Markovian assumption will take effect only after two levels (generalizing for an order-n model: after n levels).

Forward and Backward Probability Calculation

A Simple HMM
[Figure: a two-state HMM with states q and r; each arc is labelled with an output symbol and its probability: a: 0.2, a: 0.3, a: 0.2, b: 0.2, b: 0.1, b: 0.1, b: 0.5]

Forward or α-probabilities
Let α_i(t) be the probability of producing w_{1,t-1} while ending up in state s_i:
α_i(t) = P(w_{1,t-1}, S_t = s_i), t > 1

Initial condition on α_i(t):
α_i(1) = 1.0 if i = 1, and 0 otherwise

Probability of the observation using α_i(t):
P(w_{1,n}) = Σ_{i=1}^{σ} P(w_{1,n}, S_{n+1} = s_i) = Σ_{i=1}^{σ} α_i(n+1)
where σ is the total number of states.

Recursive expression for α
α_j(t+1) = P(w_{1,t}, S_{t+1} = s_j)
= Σ_{i=1}^{σ} P(w_{1,t}, S_t = s_i, S_{t+1} = s_j)
= Σ_{i=1}^{σ} P(w_{1,t-1}, S_t = s_i) P(w_t, S_{t+1} = s_j | w_{1,t-1}, S_t = s_i)
= Σ_{i=1}^{σ} P(w_{1,t-1}, S_t = s_i) P(w_t, S_{t+1} = s_j | S_t = s_i)
= Σ_{i=1}^{σ} α_i(t) P(w_t, S_{t+1} = s_j | S_t = s_i)

The forward probabilities of "bbba"
Time tick:  1      2      3      4       5
Input:      ε      b      bb     bbb     bbba
α_1:        1.0    0.2    0.05   0.017   0.0148
α_2:        0.0    0.1    0.07   0.04    0.0131
P(w):              0.3    0.12   0.057   0.0279
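
A sketch of this α-computation in Python, reproducing the table. The arc probabilities are read off the simple HMM figure; the one arc not visible in the flattened figure (the a-self-loop on the start state, taken as 0.4 here) and the identification of the start state (row α_1, called q below) are inferred from the requirement that outgoing probabilities sum to 1 and from the α values in the table.

```python
# Forward (alpha) computation for the two-state HMM above, reproducing the "bbba" table.
# ARCS[(i, sym, j)] = P(emit sym and move i -> j | in state i).
ARCS = {
    ("q", "a", "q"): 0.4, ("q", "a", "r"): 0.3, ("q", "b", "q"): 0.2, ("q", "b", "r"): 0.1,
    ("r", "a", "q"): 0.2, ("r", "a", "r"): 0.2, ("r", "b", "q"): 0.1, ("r", "b", "r"): 0.5,
}
STATES, START = ("q", "r"), "q"

def forward(word):
    alpha = {s: (1.0 if s == START else 0.0) for s in STATES}    # alpha_i(1)
    for t, sym in enumerate(word, start=2):
        alpha = {j: sum(alpha[i] * ARCS.get((i, sym, j), 0.0) for i in STATES)
                 for j in STATES}
        print(word[:t-1], alpha, "P =", round(sum(alpha.values()), 4))

forward("bbba")   # P(b)=0.3, P(bb)=0.12, P(bbb)=0.057, P(bbba)=0.0279, as in the table
```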

Backward or β-probabilities
Let β_i(t) be the probability of seeing w_{t,n}, given that the state of the HMM at time t is s_i:
β_i(t) = P(w_{t,n} | S_t = s_i)

Probability of the observation using β: P(w_{1,n}) = β_1(1)

Recursive expression for β
β_j(t-1) = P(w_{t-1,n} | S_{t-1} = s_j)
= Σ_{i=1}^{σ} P(w_{t-1,n}, S_t = s_i | S_{t-1} = s_j)
= Σ_{i=1}^{σ} P(w_{t-1}, S_t = s_i | S_{t-1} = s_j) P(w_{t,n} | w_{t-1}, S_t = s_i, S_{t-1} = s_j)
= Σ_{i=1}^{σ} P(w_{t-1}, S_t = s_i | S_{t-1} = s_j) P(w_{t,n} | S_t = s_i)   (consequence of the Markov assumption)
= Σ_{i=1}^{σ} P(w_{t-1}, S_t = s_i | S_{t-1} = s_j) β_i(t)
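
The same model gives a quick check of the β-recursion and of P(w_{1,n}) = β_1(1). This sketch assumes the ARCS, STATES and START definitions from the forward sketch above are in scope.

```python
# Backward (beta) computation for the same two-state HMM; beta_i(n+1) = 1 for all i.
def backward(word):
    beta = {s: 1.0 for s in STATES}
    for sym in reversed(word):
        beta = {i: sum(ARCS.get((i, sym, j), 0.0) * beta[j] for j in STATES)
                for i in STATES}
    return beta

print(backward("bbba")[START])   # ~0.0279, equal to the forward probability P(bbba)
```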

Problem 1 of the three basic problems

Problem 1 (contd.)
The naive computation, summing over all possible state sequences, is of order 2T·N^T. Definitely not efficient! Is there a method to tackle this problem? Yes: the Forward or Backward Procedure (a brute-force version is sketched below for contrast).
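
For contrast, a brute-force sketch that sums over all N^T state sequences (again assuming the ARCS/STATES/START definitions from the forward sketch are in scope); it returns the same probability as the forward procedure, but at exponential cost.

```python
# Naive P(O | lambda): enumerate all N**T state sequences and sum their joint probabilities.
from itertools import product

def brute_force(word):
    total = 0.0
    for seq in product(STATES, repeat=len(word)):        # N**T sequences
        p, state = 1.0, START
        for sym, nxt in zip(word, seq):
            p *= ARCS.get((state, sym, nxt), 0.0)
            state = nxt
        total += p
    return total

print(brute_force("bbba"))   # ~0.0279, same answer as the forward procedure
```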

Forward Procedure
Initialization: α_1(i) = π_i b_i(o_1), 1 ≤ i ≤ N
Forward Step (induction): α_{t+1}(j) = [Σ_{i=1}^{N} α_t(i) a_ij] b_j(o_{t+1})
Termination: P(O | λ) = Σ_{i=1}^{N} α_T(i)

Backward Procedure
Initialization: β_T(i) = 1, 1 ≤ i ≤ N
Backward Step (induction): β_t(i) = Σ_{j=1}^{N} a_ij b_j(o_{t+1}) β_{t+1}(j)
Termination: P(O | λ) = Σ_{i=1}^{N} π_i b_i(o_1) β_1(i)

Forward-Backward Procedure benefit
Order N²T, as compared to 2T·N^T for the naive computation. Only the Forward or the Backward procedure is needed for Problem 1.

Problem 2
Given an observation sequence O = {o1 … oT}, get the "best" Q = {q1 … qT}, i.e. either
- the state individually most likely at each position i, or
- the best state sequence given all the previously observed states and observations: the Viterbi Algorithm.

Viterbi Algorithm
Define δ_t(i) = max over q_1, …, q_{t-1} of P(q_1 … q_{t-1}, q_t = s_i, o_1 … o_t | λ), i.e. the sequence which has the best joint probability so far. By induction, we have
δ_{t+1}(j) = [max_i δ_t(i) a_ij] b_j(o_{t+1})

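A sketch of Viterbi in the (A, B, π) notation, run on the urn example; it answers the "state sequence?" question posed earlier for RRGGBRGR, with the caveat that the A and π values repeated here are partly assumed (only B is fully determined by the slides).

```python
# Viterbi sketch: delta[i] = best joint probability of a path ending in state i;
# psi stores backpointers. A and pi repeat the partly assumed urn values from earlier.
S  = ["U1", "U2", "U3"]
pi = {"U1": 1/3, "U2": 1/3, "U3": 1/3}
A  = {"U1": {"U1": 0.1, "U2": 0.4, "U3": 0.5},
      "U2": {"U1": 0.6, "U2": 0.2, "U3": 0.2},
      "U3": {"U1": 0.3, "U2": 0.4, "U3": 0.3}}
B  = {"U1": {"R": 0.3, "G": 0.5, "B": 0.2},
      "U2": {"R": 0.1, "G": 0.4, "B": 0.5},
      "U3": {"R": 0.6, "G": 0.1, "B": 0.3}}

def viterbi(obs):
    delta = {i: pi[i] * B[i][obs[0]] for i in S}          # delta_1(i) = pi_i * b_i(o_1)
    psi = []
    for o in obs[1:]:
        best = {j: max((delta[i] * A[i][j], i) for i in S) for j in S}
        psi.append({j: best[j][1] for j in S})            # argmax predecessors
        delta = {j: best[j][0] * B[j][o] for j in S}      # delta_{t+1}(j)
    path = [max(delta, key=delta.get)]                    # most probable final state
    for back in reversed(psi):
        path.append(back[path[-1]])                       # follow backpointers
    return list(reversed(path)), max(delta.values())

path, p = viterbi("RRGGBRGR")
print(path, p)   # best state sequence and its joint probability under these numbers
```
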

Problem 3
How to adjust λ to best maximize P(O | λ)?
Solution: re-estimate λ, i.e. iteratively update and improve the HMM parameters A, B, π, using the Baum-Welch algorithm.

Baum-Welch Algorithm
Define ξ_t(i,j) = P(q_t = S_i, q_{t+1} = S_j | O, λ), the probability of being in state S_i at time t and in state S_j at time t+1, given the observations. Putting in the forward and backward variables:
ξ_t(i,j) = α_t(i) a_ij b_j(o_{t+1}) β_{t+1}(j) / P(O | λ)

Define γ_t(i) = Σ_j ξ_t(i,j). Then
Σ_{t=1}^{T-1} γ_t(i) = expected number of transitions from S_i, and
Σ_{t=1}^{T-1} ξ_t(i,j) = expected number of transitions from S_i to S_j.
The re-estimation equations are then
π_i ← γ_1(i)
a_ij ← Σ_{t=1}^{T-1} ξ_t(i,j) / Σ_{t=1}^{T-1} γ_t(i)
b_j(k) ← Σ_{t: o_t = v_k} γ_t(j) / Σ_{t=1}^{T} γ_t(j)
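
One full re-estimation step, as a sketch in the same (A, B, π) notation. It assumes the urn model dictionaries (S, V, pi, A, B) defined in the earlier sketches are in scope, and that every state gets nonzero probability mass (otherwise the divisions below would need guarding).

```python
# One Baum-Welch step: compute alpha and beta, then gamma and xi, then re-estimate pi, A, B.
def forward_backward(obs, S, pi, A, B):
    T = len(obs)
    alpha = [{i: pi[i] * B[i][obs[0]] for i in S}]
    for t in range(1, T):
        alpha.append({j: sum(alpha[t-1][i] * A[i][j] for i in S) * B[j][obs[t]] for j in S})
    beta = [{i: 1.0 for i in S} for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = {i: sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j] for j in S) for i in S}
    return alpha, beta

def baum_welch_step(obs, S, V, pi, A, B):
    T = len(obs)
    alpha, beta = forward_backward(obs, S, pi, A, B)
    p_obs = sum(alpha[T-1][i] for i in S)                               # P(O | lambda)
    gamma = [{i: alpha[t][i] * beta[t][i] / p_obs for i in S} for t in range(T)]
    xi = [{i: {j: alpha[t][i] * A[i][j] * B[j][obs[t+1]] * beta[t+1][j] / p_obs
               for j in S} for i in S} for t in range(T - 1)]
    new_pi = {i: gamma[0][i] for i in S}
    new_A = {i: {j: sum(xi[t][i][j] for t in range(T-1)) /
                    sum(gamma[t][i] for t in range(T-1)) for j in S} for i in S}
    new_B = {j: {k: sum(gamma[t][j] for t in range(T) if obs[t] == k) /
                    sum(gamma[t][j] for t in range(T)) for k in V} for j in S}
    return new_pi, new_A, new_B

new_pi, new_A, new_B = baum_welch_step("RRGGBRGR", S, V, pi, A, B)       # one EM iteration
```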

Baum-Welch Algorithm
Baum et al. have proved that the above re-estimation equations lead to a model as good as or better than the previous one.