
Hidden Markov Models, with slides from Lise Getoor, Sebastian Thrun, William Cohen, and Yair Weiss

Outline
- Markov Models
- Hidden Markov Models
- The Main Problems in HMM Context
- Implementation Issues
- Applications of HMMs

Weather: A Markov Model. A three-state diagram (Sunny, Rainy, Snowy) whose arrows carry the transition probabilities 80%, 15%, 5%, 60%, 2%, 38%, 20%, 75%, and 5%.

Ingredients of a Markov Model
- States: S = {s_1, ..., s_N}
- State transition probabilities: a_ij = P(S_{t+1} = s_i | S_t = s_j)
- Initial state distribution: pi_i = P(S_1 = s_i)
(Illustrated on the Sunny/Rainy/Snowy diagram with the transition percentages above.)

Ingredients of Our Markov Model
- States: {Sunny, Rainy, Snowy}
- State transition probabilities: the percentages on the arrows of the weather diagram (80%, 15%, 5%, 60%, 2%, 38%, 20%, 75%, 5%)
- Initial state distribution: as given in the slide figure

Probability of a Seq. of States
Given a state sequence q_1, ..., q_T and the model above, what is its probability? It is the initial-state probability times the product of the transition probabilities along the sequence:
P(q_1, ..., q_T) = pi_{q_1} * product over t = 2..T of P(q_t | q_{t-1}).
A small sketch of this computation follows.
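A minimal Python/NumPy sketch of this product follows. The transition matrix and the uniform initial distribution are assumptions for illustration; the exact mapping of the slide's percentages to arrows is not recoverable from the transcript.

```python
# Sketch: probability of a fixed state sequence under a Markov chain,
# P(q_1, ..., q_T) = pi[q_1] * prod_{t=2..T} A[q_{t-1}, q_t].
# NOTE: pi and A below are illustrative placeholders, not the slide's exact numbers.
import numpy as np

states = ["Sunny", "Rainy", "Snowy"]
pi = np.array([1/3, 1/3, 1/3])        # assumed uniform initial distribution
A = np.array([[0.80, 0.15, 0.05],     # A[i, j] = P(next state j | current state i)
              [0.38, 0.60, 0.02],     # each row sums to 1
              [0.75, 0.05, 0.20]])

def state_sequence_prob(seq, pi, A):
    """Initial-state probability times the transitions along the path."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev, cur]
    return p

# e.g. P(Sunny, Sunny, Rainy, Rainy)
print(state_sequence_prob([0, 0, 1, 1], pi, A))
```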

Outline
- Markov Models
- Hidden Markov Models
- The Main Problems in HMM Context
- Implementation Issues
- Applications of HMMs

Hidden Markov Models. The same Sunny/Rainy/Snowy transition diagram, but now the states are NOT OBSERVABLE: each state emits observations with its own probabilities (60%, 10%, 30%; 65%, 5%, 30%; 50%, 0%, 50% in the figure).

Ingredients of an HMM
- States: S = {s_1, ..., s_N}
- State transition probabilities: a_ij = P(S_{t+1} = s_i | S_t = s_j), the probability of moving from state j to state i
- Initial state distribution: pi_i = P(S_1 = s_i)
- Observations: an output alphabet {o_1, ..., o_K}
- Observation probabilities: b_jk = P(O_t = o_k | S_t = s_j), the probability of emitting output k in state j

Ingredients of Our HMM
- States: {Sunny, Rainy, Snowy}
- Observations: the outputs shown leaving each state in the figure
- State transition probabilities, initial state distribution, and observation probabilities: the percentages on the weather diagram above

Three Basic Problems
- Evaluation (aka likelihood): compute P(O | an HMM)
- Decoding (aka inference): given an observed output sequence O,
  - compute the most likely state at each time period, or
  - compute the most likely state sequence q* = argmax_q P(q | O, HMM)
- Training (aka learning): find HMM* = argmax_HMM P(O | HMM)

Probability of an Output Sequence
Given the model and an output sequence O = O_1, ..., O_T, what is its probability? Summing P(O, q) over every possible state sequence q gives the answer, but that sum has an exponential number of terms. The brute-force sketch below makes this concrete; the forward algorithm, next, avoids the blow-up.
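To make the blow-up concrete, here is a hedged brute-force sketch that literally sums over all N^T state sequences; the two-state parameters are invented for illustration and are not the weather model from the slides.

```python
# Sketch: brute-force P(O) for a discrete HMM by enumerating every state sequence.
# This is O(N^T) and only feasible for tiny examples; the forward algorithm is O(N^2 T).
import itertools
import numpy as np

pi = np.array([0.6, 0.4])              # P(S_1 = i)  (assumed values)
A  = np.array([[0.7, 0.3],             # A[i, j] = P(S_{t+1} = j | S_t = i)
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],             # B[i, k] = P(O_t = k | S_t = i)
               [0.2, 0.8]])
O  = [0, 1, 1]                         # an observed symbol sequence

N, T = len(pi), len(O)
total = 0.0
for q in itertools.product(range(N), repeat=T):      # all N**T state sequences
    p = pi[q[0]] * B[q[0], O[0]]
    for t in range(1, T):
        p *= A[q[t-1], q[t]] * B[q[t], O[t]]
    total += p
print(total)                           # P(O); the forward algorithm returns the same value
```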

The Forward Algorithm. A trellis diagram: the states S1, S2, S3 (with outputs O1, O2, O3) unrolled over time, one column per time step, filled in from left to right.

The Forward Algorithm (cont.)
alpha_1(i) = pi_i * b_i(O_1)
alpha_{t+1}(j) = [sum over i of alpha_t(i) * P(S_{t+1} = s_j | S_t = s_i)] * b_j(O_{t+1})
That is: first get to state i, then move from state i to state j, then emit output O_{t+1}. A sketch follows.
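A minimal sketch of the forward pass under the assumptions A[i, j] = P(S_{t+1} = j | S_t = i) and B[i, k] = P(O_t = k | S_t = i); the parameter values are illustrative only.

```python
# Sketch: the forward algorithm, alpha[t, i] = P(O_1..O_t, S_t = i).
import numpy as np

def forward(O, pi, A, B):
    T, N = len(O), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]                      # alpha_1(i) = pi_i * b_i(O_1)
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]    # sum over predecessors, then emit O_t
    return alpha

pi = np.array([0.6, 0.4])                           # assumed example parameters
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
alpha = forward([0, 1, 1], pi, A, B)
print(alpha[-1].sum())                              # P(O) = sum_i alpha_T(i)
```

On the tiny example above this prints the same number as the brute-force enumeration, but in O(N^2 T) time.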

Exercise
What is the probability of observing AB?
a. Starting in initial state s_1.
b. With the initial state chosen at random.
(A two-state model, s_1 and s_2, with emission probabilities for A and B and transition probabilities given in the slide figure, along with the worked products for both cases.)

The Backward Algorithm. The same trellis, filled in from right to left. Termination: P(O) = sum over i of P(q_1 is i) * P(emit O_1 in state i) * beta_1(i).

The Forward-Backward Algorithm. Combining the two passes on the trellis gives P(O) = sum over i of alpha_t(i) * beta_t(i), for any t; from this identity you can derive the formulas for both the forward and the backward algorithm. A small sketch of both passes follows.
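A small sketch of the backward pass and of the identity above, again with invented parameters and the same array conventions as before.

```python
# Sketch: backward pass beta[t, i] = P(O_{t+1}..O_T | S_t = i), and the check
# that sum_i alpha_t(i) * beta_t(i) equals P(O) at every time step t.
import numpy as np

def forward(O, pi, A, B):
    alpha = np.zeros((len(O), len(pi)))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, len(O)):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    return alpha

def backward(O, A, B):
    beta = np.ones((len(O), A.shape[0]))            # beta_T(i) = 1
    for t in range(len(O) - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])    # move to j, emit O_{t+1}, continue
    return beta

pi = np.array([0.6, 0.4])                           # assumed example parameters
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
O  = [0, 1, 1]
alpha, beta = forward(O, pi, A, B), backward(O, A, B)
for t in range(len(O)):
    print((alpha[t] * beta[t]).sum())               # the same P(O) for every t
```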

Finding the best state sequence
We would like to find the most likely path (and not just the most likely state at each time slice). The Viterbi algorithm is an efficient method for finding the MPE (most probable explanation): replace the sum in the forward recursion with a max,
delta_{t+1}(j) = [max over i of delta_t(i) * P(S_{t+1} = s_j | S_t = s_i)] * b_j(O_{t+1}),
keep a back-pointer to the maximizing i at each step, and follow the back-pointers to reconstruct the path. A sketch follows.
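A minimal Viterbi sketch with back-pointers, using the same assumed conventions and illustrative parameters as the earlier snippets.

```python
# Sketch: Viterbi decoding; delta is the max-probability analogue of alpha,
# psi stores the argmax predecessors (back-pointers) for path reconstruction.
import numpy as np

def viterbi(O, pi, A, B):
    T, N = len(O), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]
    for t in range(1, T):
        scores = delta[t-1][:, None] * A            # scores[i, j] = delta_{t-1}(i) * P(j | i)
        psi[t] = scores.argmax(axis=0)              # best predecessor of each state j
        delta[t] = scores.max(axis=0) * B[:, O[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):                   # follow the back-pointers
        path.append(int(psi[t, path[-1]]))
    return list(reversed(path)), float(delta[-1].max())

pi = np.array([0.6, 0.4])                           # assumed example parameters
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
print(viterbi([0, 1, 1], pi, A, B))                 # (most likely state sequence, its probability)
```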

Hidden Markov Models. The weather HMM diagram again (hidden Sunny/Rainy/Snowy states with their emission probabilities), shown once more before turning to learning.

Learning the Model with EM
Problem: find the HMM lambda that makes the data most likely.
- E-Step: compute the expected hidden-state quantities for the given lambda.
- M-Step: compute a new lambda under these expectations (this is now a Markov model).

E-Step
Calculate gamma_t(i) = P(q_t = i | O, lambda), and the pairwise terms xi_t(i, j) = P(q_t = i, q_{t+1} = j | O, lambda), using the forward-backward algorithm, for the fixed current model lambda.

The M Step: generate a new lambda = (pi, a, b)
Re-estimate each parameter from the expected counts computed in the E step: pi_i from the expected probability of being in state i at time 1, a from the expected transition counts, and b from the expected emission counts, each suitably normalized.

Understanding the EM Algorithm
The best way to understand the EM algorithm:
- start with the M step and understand what quantities it needs;
- then look at the E step and see how it computes those quantities with the help of the forward-backward algorithm.

Summary (Learning)
- Given an observation sequence O.
- Guess an initial model lambda.
- Iterate: calculate the expected time spent in state S_i at time t (and in S_j at time t+1) using the forward-backward algorithm, then find a new model lambda by (expected) frequency counts. A compact sketch of one such iteration follows.
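Below is a compact, hedged sketch of one such iteration for a single discrete observation sequence (invented parameters; a real implementation would pool counts over many sequences, iterate to convergence, and use the scaling discussed next).

```python
# Sketch: one Baum-Welch (EM) iteration for a discrete HMM on a single sequence.
# Conventions: A[i, j] = P(S_{t+1} = j | S_t = i), B[i, k] = P(O_t = k | S_t = i).
import numpy as np

def forward(O, pi, A, B):
    alpha = np.zeros((len(O), len(pi)))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, len(O)):
        alpha[t] = (alpha[t-1] @ A) * B[:, O[t]]
    return alpha

def backward(O, A, B):
    beta = np.ones((len(O), A.shape[0]))
    for t in range(len(O) - 2, -1, -1):
        beta[t] = A @ (B[:, O[t+1]] * beta[t+1])
    return beta

def baum_welch_step(O, pi, A, B):
    T, N = len(O), len(pi)
    alpha, beta = forward(O, pi, A, B), backward(O, A, B)
    pO = alpha[-1].sum()                            # P(O | current model)
    # E-step: expected state occupancies and transitions
    gamma = alpha * beta / pO                       # gamma[t, i] = P(S_t = i | O)
    xi = np.zeros((T - 1, N, N))                    # xi[t, i, j] = P(S_t = i, S_{t+1} = j | O)
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :] / pO
    # M-step: re-estimate (pi, A, B) by normalized expected counts
    new_pi = gamma[0]
    new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    new_B = np.stack([gamma[np.array(O) == k].sum(axis=0) for k in range(B.shape[1])], axis=1)
    new_B = new_B / gamma.sum(axis=0)[:, None]
    return new_pi, new_A, new_B, pO

pi = np.array([0.5, 0.5])                           # assumed starting guess
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
O  = [0, 0, 1, 1, 0]
for _ in range(3):
    pi, A, B, pO = baum_welch_step(O, pi, A, B)
    print(pO)                                       # the likelihood never decreases
```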

Implementing HMM Algorithms
- Quantities get very small for long sequences.
- Taking logarithms helps:
  - in the Viterbi algorithm,
  - in computing the alphas and betas,
  - but it is not helpful in computing the gammas.
- The normalization method can help with these problems; see the note by ChengXiang Zhai. A scaled-forward sketch follows.
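A small sketch of one common normalization scheme (my reading of the scaling trick the slide alludes to, not code taken from the note): rescale the alphas at every step so they sum to one and accumulate log P(O) from the scale factors.

```python
# Sketch: scaled forward pass; alpha_hat[t] is normalized at every step, so nothing
# underflows, and the log-likelihood is recovered from the per-step scale factors.
import numpy as np

def forward_scaled(O, pi, A, B):
    T, N = len(O), len(pi)
    alpha_hat = np.zeros((T, N))
    log_prob = 0.0
    for t in range(T):
        v = pi * B[:, O[0]] if t == 0 else (alpha_hat[t - 1] @ A) * B[:, O[t]]
        c = v.sum()                  # scale factor = P(O_t | O_1..O_{t-1})
        alpha_hat[t] = v / c
        log_prob += np.log(c)        # log P(O_1..O_t) accumulates here
    return alpha_hat, log_prob

pi = np.array([0.6, 0.4])            # assumed example parameters
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
print(forward_scaled([0, 1, 1] * 200, pi, A, B)[1])   # log P(O) for a 600-step sequence
```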

Problems with HMMs
- Zero probabilities. Example: training sequence AAABBBAAA, test sequence AAABBBCAAA; the unseen symbol C gets probability zero, so the whole test sequence does.
- Finding the “right” number of states and the right structure.
- Numerical instabilities.

Outline
- Markov Models
- Hidden Markov Models
- The Main Problems in HMM Context
- Implementation Issues
- Applications of HMMs

Three Problems
- What bird is this? (time series classification)
- How will the song continue? (time series prediction)
- Is this bird abnormal? (outlier detection)

Time Series Classification
- Train one HMM lambda_c for each bird (class) c.
- Given a time series O, calculate P(O | lambda_c) for each class and pick the most likely one, as in the sketch below.
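A hedged sketch of that classification rule: one HMM per class, score the new sequence under each, and take the argmax. The "trained" models below are invented placeholders for whatever Baum-Welch training would produce.

```python
# Sketch: time series classification with one HMM per class (e.g., one per bird).
import numpy as np

def log_likelihood(O, pi, A, B):
    """Scaled forward pass; returns log P(O | model)."""
    log_p, alpha = 0.0, pi * B[:, O[0]]
    for t in range(len(O)):
        if t > 0:
            alpha = (alpha @ A) * B[:, O[t]]
        c = alpha.sum()
        log_p += np.log(c)
        alpha = alpha / c
    return log_p

# hypothetical trained models, one per class (names and numbers are illustrative)
models = {
    "bird_A": (np.array([0.5, 0.5]),
               np.array([[0.9, 0.1], [0.2, 0.8]]),
               np.array([[0.7, 0.3], [0.1, 0.9]])),
    "bird_B": (np.array([0.3, 0.7]),
               np.array([[0.6, 0.4], [0.5, 0.5]]),
               np.array([[0.2, 0.8], [0.6, 0.4]])),
}
O = [0, 1, 1, 0, 1]
print(max(models, key=lambda name: log_likelihood(O, *models[name])))   # most likely class
```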

Outlier Detection
- Train an HMM lambda.
- Given a time series O, calculate its probability under the model.
- If it is abnormally low, raise a flag; if it is unusually high, raise a flag as well.

Time Series Prediction
- Train an HMM lambda.
- Given a time series O, calculate the distribution over the final state (via the forward probabilities alpha), then "hallucinate" new states and observations forward in time according to a and b.

Typical HMM in Speech Recognition
- A linear (left-to-right) HMM represents one phoneme.
- The 20-dimensional frequency space is clustered using EM.
- Use Bayes rule + Viterbi for classification.
[Rabiner 86] + everyone else

Typical HMM in Robotics [Blake/Isard 98, Fox/Dellaert et al 99]

IE with Hidden Markov Models
Given a sequence of observations, e.g. "Yesterday Pedro Domingos spoke this example sentence.", and a trained HMM with states such as person name, location name, and background, find the most likely state sequence (Viterbi). Any words said to be generated by the designated “person name” state are extracted as a person name: here, Pedro Domingos.

HMM for Segmentation
Simplest model: one state per entity type.

What is a “symbol”?
- Cohen => “Cohen”, “cohen”, “Xxxxx”, “Xx”, ... ?
- 4601 => “4601”, “9999”, “9+”, “number”, ... ?
- Datamold: choose the best abstraction level using a holdout set (a sketch of such token abstraction follows).
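A hedged sketch of that token-to-symbol abstraction (the level names and patterns are my own illustration of the idea, not Datamold's actual implementation; coarser levels such as "number" are omitted).

```python
# Sketch: map a raw token to a coarser "symbol" at a chosen abstraction level,
# e.g. "Cohen" -> "Xxxxx" -> "Xx" and "4601" -> "9999" -> "9+".
import re

def abstract_token(token, level):
    if level == "word":                           # keep the token as-is
        return token
    if level == "lowercase":                      # "Cohen" -> "cohen"
        return token.lower()
    if level == "shape":                          # "Cohen" -> "Xxxxx", "4601" -> "9999"
        shape = re.sub(r"[A-Z]", "X", token)
        shape = re.sub(r"[a-z]", "x", shape)
        return re.sub(r"[0-9]", "9", shape)
    if level == "short-shape":                    # collapse runs: "Xxxxx" -> "Xx", "9999" -> "9+"
        shape = abstract_token(token, "shape")
        return re.sub(r"(.)\1+", lambda m: m.group(1) + ("+" if m.group(1) == "9" else ""), shape)
    raise ValueError(f"unknown level: {level}")

for tok in ["Cohen", "4601"]:
    print(tok, [abstract_token(tok, lvl) for lvl in ["word", "lowercase", "shape", "short-shape"]])
```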

HMM Example: “Nymble” [Bikel, et al 1998], [BBN “IdentiFinder”]
Task: Named Entity Extraction. Train on ~500k words of news wire text.
States: start-of-sentence, end-of-sentence, Person, Org, Other, and five other name classes.
Transition probabilities: P(s_t | s_t-1, o_t-1), with back-off to P(s_t | s_t-1) and P(s_t).
Observation probabilities: P(o_t | s_t, s_t-1) or P(o_t | s_t, o_t-1), with back-off to P(o_t | s_t) and P(o_t).
Results (F1): Mixed case English 93%, Upper case English 91%, Mixed case Spanish 90%.
Other examples of shrinkage for HMMs in IE: [Freitag and McCallum ‘99].

Passage Selection (e.g., for IR)
Figure: a query is run against a document collection; within a retrieved document, relevant passages are distinguished from background passages. How is a relevant passage different from a background passage in terms of language modeling?

HMMs: Main Lessons
- HMMs: generative probabilistic models of time series (with hidden state).
- Forward-Backward: an algorithm for computing probabilities over hidden states.
- Learning models: EM, which iterates estimation of the hidden states and model fitting.
- Extremely practical: among the best known methods in speech, computer vision, robotics, ...
- Numerous extensions exist (continuous observations and states, factorial HMMs, controllable HMMs = POMDPs, ...).