Part II. Statistical NLP. Advanced Artificial Intelligence: (Hidden) Markov Models. Wolfram Burgard, Luc De Raedt, Bernhard Nebel, Lars Schmidt-Thieme. Most slides taken (or adapted) from David Meir Blei; figures from Manning and Schuetze and from Rabiner.

Contents
- Markov Models
- Hidden Markov Models
- Three problems - three algorithms:
  - Decoding
  - Viterbi
  - Baum-Welch
- Next chapter: application to part-of-speech tagging (POS tagging)

Largely chapter 9 of Statistical NLP, Manning and Schuetze, or Rabiner, A tutorial on HMMs and selected applications in speech recognition, Proc. IEEE.

Motivations and Applications
- Part-of-speech tagging: the same sentence can receive more than one tag sequence.
    The representative put chairs on the table
    AT  NN  VBD NNS IN  AT  NN
    AT  JJ  NN  VBZ IN  AT  NN
- Some tags: AT: article, NN: singular or mass noun, VBD: verb, past tense, VBZ: verb, 3rd person singular present, NNS: plural noun, IN: preposition, JJ: adjective

Bioinformatics
- Durbin et al., Biological Sequence Analysis, Cambridge University Press.
- Several applications, e.g. proteins:
  - From primary structure: ATCPLELLLD
  - Infer secondary structure: HHHBBBBBC..

Other Applications
- Speech recognition: from acoustic signals, infer the sentence.
- Robotics: from sensory readings, infer the trajectory / location.
- ...

What is a (Visible) Markov Model?
- Graphical model (can be interpreted as a Bayesian net)
- Circles indicate states
- Arrows indicate probabilistic dependencies between states
- A state depends only on the previous state: "The past is independent of the future given the present."
- Recall N-grams from the introduction!

Markov Model Formalization
[Figure: chain of states X_1 -> X_2 -> ... -> X_T]
- A Markov model is given by (S, π) together with a transition matrix A:
  - S = {s_1 ... s_N} are the values for the states
- Limited Horizon (Markov Assumption): P(X_{t+1} = s_k | X_1, ..., X_t) = P(X_{t+1} = s_k | X_t)
- Time Invariant (Stationary): P(X_{t+1} = s_k | X_t) does not depend on t
- Transition matrix A: a_ij = P(X_{t+1} = s_j | X_t = s_i)

Markov Model Formalization
[Figure: chain of states with transitions labelled by A]
- (S, π, A):
  - S = {s_1 ... s_N} are the values for the states
  - π = {π_i} are the initial state probabilities, π_i = P(X_1 = s_i)
  - A = {a_ij} are the state transition probabilities

Markov Models
- Probabilistic finite state automaton
- Figure 9.1 (Manning and Schuetze)

What is the probability of a sequence of states?
- By the Markov assumption, it factorizes:
  P(X_1, ..., X_T) = P(X_1) P(X_2 | X_1) ... P(X_T | X_{T-1}) = π_{X_1} ∏_{t=1}^{T-1} a_{X_t X_{t+1}}
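
To make the product concrete, here is a minimal Python sketch. The two-state weather chain and all of its numbers are hypothetical, invented for illustration; they are not from the slides.

```python
# Probability of a state sequence in a visible Markov model.
import numpy as np

states = ["rain", "sun"]                 # s_1 ... s_N
pi = np.array([0.4, 0.6])                # initial probabilities pi_i
A = np.array([[0.7, 0.3],                # a_ij = P(X_{t+1} = s_j | X_t = s_i)
              [0.2, 0.8]])

def sequence_probability(seq):
    """P(X_1 ... X_T) = pi_{X_1} * prod_t a_{X_t, X_{t+1}}."""
    p = pi[seq[0]]
    for prev, nxt in zip(seq, seq[1:]):
        p *= A[prev, nxt]
    return p

print(sequence_probability([0, 0, 1]))   # P(rain, rain, sun) = 0.4 * 0.7 * 0.3 = 0.084
```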

Example
- See Figure 9.1 (Manning and Schuetze).

What is an HMM?
- Graphical model
- Circles indicate states
- Arrows indicate probabilistic dependencies between states
- HMM = Hidden Markov Model

What is an HMM?
- Green circles are hidden states
- Each depends only on the previous state: "The past is independent of the future given the present."

What is an HMM?
- Purple nodes are the observed states (observations)
- Each depends only on its corresponding hidden state

HMM Formalism
[Figure: hidden chain of states S emitting observations K]
- An HMM is a tuple (S, K, π, A, B):
  - S = {s_1 ... s_N} are the values for the hidden states
  - K = {k_1 ... k_M} are the values for the observations

HMM Formalism
- (S, K, π, A, B):
  - π = {π_i} are the initial state probabilities
  - A = {a_ij} are the state transition probabilities
  - B = {b_ik} are the observation (emission) probabilities, b_ik = P(O_t = k | X_t = s_i)
- Note: sometimes one uses B = {b_ijk}; the output then depends on the previous state / transition as well.

The crazy soft drink machine
- Figure 9.2 (Manning and Schuetze): two hidden states, cola preferring (CP) and iced tea preferring (IP), emitting the outputs cola, iced tea (ice_t), and lemonade (lem).

Probability of {lem, ice_t}?
- Sum over all paths taken through the HMM.
- Starting in CP, with the parameters of Figure 9.2:
  P(lem, ice_t) = 1 × 0.3 × 0.7 × 0.1 + 1 × 0.3 × 0.3 × 0.7 = 0.021 + 0.063 = 0.084
  (emit lem in CP; then either stay in CP or move to IP, and emit ice_t there)
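
The same sum can be brute-forced by enumerating every hidden path; this is exactly the naive summation that the decoding slides below replace with dynamic programming. The parameter values are my reconstruction of Figure 9.2 and should be checked against the book; treat them as assumptions.

```python
# Naive evaluation: sum the joint probability over all N^T hidden paths.
import itertools
import numpy as np

states = ["CP", "IP"]                    # cola preferring / iced tea preferring
pi = np.array([1.0, 0.0])                # the machine always starts in CP
A = np.array([[0.7, 0.3],                # transition probabilities (assumed)
              [0.5, 0.5]])
B = np.array([[0.6, 0.1, 0.3],           # emissions: cola, ice_t, lem (assumed)
              [0.1, 0.7, 0.2]])

def naive_evaluate(obs):
    """P(O) = sum over all hidden paths of P(path, O)."""
    total = 0.0
    for path in itertools.product(range(len(states)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total

print(naive_evaluate([2, 1]))            # O = (lem, ice_t) -> 0.084
```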

Inference in an HMM
- Compute the probability of a given observation sequence.
- Given an observation sequence, compute the most likely hidden state sequence.
- Given an observation sequence and a set of possible models, which model most closely fits the data?

Decoding
[Trellis figure: observations o_1 ... o_T emitted by hidden states x_1 ... x_T]
Given an observation sequence O and a model μ, compute the probability of the observation sequence:
- P(O | μ) = Σ_X P(O, X | μ) = Σ_X P(X | μ) P(O | X, μ)
- P(X | μ) = π_{x_1} a_{x_1 x_2} ... a_{x_{T-1} x_T}
- P(O | X, μ) = b_{x_1 o_1} b_{x_2 o_2} ... b_{x_T o_T}
- Hence P(O | μ) = Σ_{x_1 ... x_T} π_{x_1} b_{x_1 o_1} ∏_{t=1}^{T-1} a_{x_t x_{t+1}} b_{x_{t+1} o_{t+1}}
- Summing naively over all N^T state sequences is intractable; dynamic programming gives an efficient solution.

Dynamic Programming

Forward Procedure oToT o1o1 otot o t-1 o t+1 x1x1 x t+1 xTxT xtxt x t-1 Special structure gives us an efficient solution using dynamic programming. Intuition: Probability of the first t observations is the same for all possible t+1 length state sequences. Define:

Forward Procedure
The α's can be computed by dynamic programming:
- Base case: α_i(1) = π_i b_{i o_1}
- Induction: α_j(t+1) = b_{j o_{t+1}} Σ_{i=1}^{N} α_i(t) a_{ij}
- Termination: P(O | μ) = Σ_{i=1}^{N} α_i(T)
- Complexity: O(N^2 T), instead of O(N^T) for the naive sum.
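
A sketch of the forward recursion in code, reusing the (assumed) soft drink machine parameters from the earlier sketch:

```python
# Forward procedure: alpha[t, i] = P(o_1 ... o_t, X_t = s_i); O(N^2 T) time.
import numpy as np

def forward(pi, A, B, obs):
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                       # base case
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]   # induction step
    return alpha

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
alpha = forward(pi, A, B, [2, 1])                      # O = (lem, ice_t)
print(alpha[-1].sum())                                 # P(O) = 0.084, as before
```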

Dynamic Programming

Backward Procedure
[Trellis figure: observations o_1 ... o_T, hidden states x_1 ... x_T]
- Define: β_i(t) = P(o_{t+1} ... o_T | X_t = s_i, μ), the probability of the rest of the observations given the state at time t.
- Base case: β_i(T) = 1
- Induction: β_i(t) = Σ_{j=1}^{N} a_{ij} b_{j o_{t+1}} β_j(t+1)

Decoding Solution
- Forward procedure: P(O | μ) = Σ_i α_i(T)
- Backward procedure: P(O | μ) = Σ_i π_i b_{i o_1} β_i(1)
- Combination: P(O | μ) = Σ_i α_i(t) β_i(t), for any 1 ≤ t ≤ T
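
A matching sketch of the backward recursion, continuing with the arrays from the forward sketch; all three expressions for P(O | μ) agree:

```python
# Backward procedure: beta[t, i] = P(o_{t+1} ... o_T | X_t = s_i).
def backward(pi, A, B, obs):
    T, N = len(obs), len(pi)
    beta = np.zeros((T, N))
    beta[T - 1] = 1.0                                   # base case
    for t in range(T - 2, -1, -1):                      # induction, backwards
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

obs = [2, 1]
beta = backward(pi, A, B, obs)
print(alpha[-1].sum())                       # forward:     sum_i alpha_i(T)
print((pi * B[:, obs[0]] * beta[0]).sum())   # backward:    sum_i pi_i b_{i,o_1} beta_i(1)
print((alpha[0] * beta[0]).sum())            # combination: holds for any t
```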

Best State Sequence
- Find the state sequence that best explains the observations.
- Two approaches:
  - Individually most likely states
  - Most likely sequence (Viterbi)

Best State Sequence (1)
- Individually most likely states: for each t, pick the state that is most likely given the whole observation sequence:
  γ_i(t) = P(X_t = s_i | O, μ) = α_i(t) β_i(t) / P(O | μ), and X̂_t = argmax_i γ_i(t)
- This maximizes the expected number of correct states, but the sequence as a whole may be improbable or even impossible (it can use zero-probability transitions).
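
Continuing the sketch, the individually most likely states fall out of α and β directly; note the caveat above that the resulting sequence may itself be improbable:

```python
# gamma[t, i] = P(X_t = s_i | O); pick the argmax separately at each time step.
gamma = alpha * beta / alpha[-1].sum()
print(gamma.argmax(axis=1))                  # individually most likely states
```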

Best State Sequence (2)
- Find the single state sequence that best explains the observations.
- Viterbi algorithm

Viterbi Algorithm
[Trellis figure: observations o_1 ... o_T, hidden states x_1 ... x_{t-1}, j]
- Define: δ_j(t) = max_{x_1 ... x_{t-1}} P(x_1 ... x_{t-1}, o_1 ... o_{t-1}, X_t = j, o_t | μ)
- The probability of the state sequence which maximizes the probability of seeing the observations to time t-1, landing in state j, and seeing the observation at time t.

Viterbi Algorithm
Recursive computation:
- Base case: δ_j(1) = π_j b_{j o_1}
- Induction: δ_j(t+1) = max_i δ_i(t) a_{ij} b_{j o_{t+1}}
- Backpointer: ψ_j(t+1) = argmax_i δ_i(t) a_{ij}

Viterbi Algorithm
Compute the most likely state sequence by working backwards:
- X̂_T = argmax_i δ_i(T)
- X̂_t = ψ_{X̂_{t+1}}(t+1)
- P(X̂) = max_i δ_i(T)
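
A sketch of the full recursion with backpointers, again using the assumed soft drink machine parameters:

```python
# Viterbi: delta/psi recursion followed by backtracking.
import numpy as np

def viterbi(pi, A, B, obs):
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))                 # delta[t, j]: best prefix probability
    psi = np.zeros((T, N), dtype=int)        # psi[t, j]: best predecessor of j
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A   # scores[i, j] = delta_i(t-1) * a_ij
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]         # X_T = argmax_i delta_i(T)
    for t in range(T - 1, 0, -1):            # X_t = psi_{X_{t+1}}(t+1)
        path.append(int(psi[t][path[-1]]))
    return path[::-1], delta[-1].max()

pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3], [0.5, 0.5]])
B = np.array([[0.6, 0.1, 0.3], [0.1, 0.7, 0.2]])
print(viterbi(pi, A, B, [2, 1]))             # best path (CP, IP), probability 0.063
```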

HMMs and Bayesian Nets (1)
[Figure: the HMM unrolled as a Bayesian network: x_1 -> x_2 -> ... -> x_T, with an arrow x_t -> o_t for each t]

HMM and Bayesian Nets (2)
[Trellis figure: observations o_1 ... o_T, hidden states x_1 ... x_T]
- The future o_{t+1}, ..., o_T, x_{t+1}, ..., x_T is conditionally independent of the past o_1, ..., o_{t-1}, x_1, ..., x_{t-1} given the present x_t, because of d-separation.
- "The past is independent of the future given the present."

Inference in an HMM
- Compute the probability of a given observation sequence.
- Given an observation sequence, compute the most likely hidden state sequence.
- Given an observation sequence and a set of possible models, which model most closely fits the data?

Dynamic Programming

Parameter Estimation
[Trellis figure with transition parameters A and emission parameters B]
- Given an observation sequence, find the model that is most likely to produce that sequence.
- No analytic method is known.
- Instead, given a model and an observation sequence, iteratively update the model parameters to better fit the observations.

Parameter Estimation
- Probability of traversing the arc i -> j at time t:
  p_t(i, j) = P(X_t = s_i, X_{t+1} = s_j | O, μ) = α_i(t) a_{ij} b_{j o_{t+1}} β_j(t+1) / P(O | μ)
- Probability of being in state i at time t:
  γ_i(t) = P(X_t = s_i | O, μ) = Σ_j p_t(i, j)

Parameter Estimation
Now we can compute the new estimates of the model parameters:
- π̂_i = γ_i(1)
- â_ij = Σ_{t=1}^{T-1} p_t(i, j) / Σ_{t=1}^{T-1} γ_i(t)
- b̂_ik = Σ_{t : o_t = k} γ_i(t) / Σ_{t=1}^{T} γ_i(t)
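
Putting the pieces together, here is a sketch of one reestimation step, built on the forward() and backward() functions above; a full trainer would iterate this until the likelihood stops improving:

```python
# One Baum-Welch (EM) reestimation step.
import numpy as np

def baum_welch_step(pi, A, B, obs):
    T, N = len(obs), len(pi)
    alpha, beta = forward(pi, A, B, obs), backward(pi, A, B, obs)
    prob = alpha[-1].sum()                           # P(O | mu)
    xi = np.zeros((T - 1, N, N))                     # p_t(i, j): arc i -> j at t
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / prob
    gamma = alpha * beta / prob                      # gamma_i(t): in state i at t
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):                      # emissions of symbol k
        new_B[:, k] = gamma[np.array(obs) == k].sum(axis=0) / gamma.sum(axis=0)
    return new_pi, new_A, new_B
```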

Instance of Expectation Maximization
- We have that P(O | μ̂) ≥ P(O | μ): each reestimation step cannot decrease the likelihood of the observations.
- We may get stuck in a local maximum (or even a saddle point).
- Nevertheless, Baum-Welch is usually effective.

Some Variants
- So far, ergodic models: all states are connected. Not always wanted.
- Epsilon or null transitions: not all states/transitions emit output symbols.
- Parameter tying: assuming that certain parameters are shared reduces the number of parameters that have to be estimated.
- Logical HMMs (Kersting, De Raedt, Raiko): working with structured states and observation symbols.
- Working with log probabilities, and addition instead of multiplication of probabilities (typically done; see the sketch below).
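
On the last point, a common trick is to run the forward recursion in log space so that long sequences do not underflow; a minimal sketch using scipy's logsumexp:

```python
# Log-space forward recursion: products become sums, sums become logsumexp.
import numpy as np
from scipy.special import logsumexp

def log_forward(pi, A, B, obs):
    with np.errstate(divide="ignore"):       # log(0) = -inf is intended here
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    log_alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        log_alpha = logsumexp(log_alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return logsumexp(log_alpha)              # log P(O | mu)
```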

The Most Important Thing
[Trellis figure with parameters A and B]
We can use the special structure of this model to do a lot of neat math and solve problems that are otherwise not solvable.