Pair HMMs and edit distance Ristad & Yianilos

Special meeting Wed 4/14
What: Evolving and Self-Managing Data Integration Systems
Who: AnHai Doan, Univ. of Illinois at Urbana-Champaign
When: Wednesday, April 14, 11am (food at 10:30am)
Where: Sennott Square Building, room 5317

Special meeting 4/28 (last class)
First International Joint Conference on Information Extraction, Information Integration, and Sequential Learning
10:30-11:50 am, Wean Hall 4601
All project proposals have been accepted as paper abstracts, and you're all invited to present for 10 min (including questions)

Pair HMMs – Ristad & Yianilos

HMM review (last week)
– notation
– inference (forward algorithm)
– learning (forward-backward & EM)

Pair HMMs (today)
– notation
– generating edit strings
– distance metrics (stochastic, Viterbi)
– inference (forward)
– learning (forward-backward & EM)

Results from the R&Y paper
– K-NN with trained distance, hidden prototypes
– problem: phoneme strings => words

Advanced Pair HMMs
– adding state (e.g. for affine gap models)
– Smith-Waterman?
– CRF training?

HMM Notation

An HMM generates an observation string x^T = x_1, ..., x_T together with a hidden state sequence s^T = s_1, ..., s_T, using transition probabilities Pr(l'->l) and emission probabilities Pr(l->x).

HMM Example

[Diagram: a two-state HMM with transition probabilities Pr(1->1), Pr(1->2), Pr(2->1), Pr(2->2).]

Emission probabilities:
Pr(1->x): d 0.3, h 0.5, b 0.2
Pr(2->x): a 0.3, e 0.5, o 0.2

Sample output: x^T = heehahaha, s^T = ...
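For concreteness, here is a minimal sketch in Python of drawing an (x, s) pair from a two-state HMM like the one above. The transition values are made-up placeholders, since the slide names the transition probabilities but does not give numbers; the emission tables are the ones shown.

```python
import random

# Assumed transition values (the slide only names Pr(1->1), Pr(1->2), etc.)
transitions = {1: {1: 0.5, 2: 0.5}, 2: {1: 0.5, 2: 0.5}}
# Emission tables from the slide
emissions = {1: {"d": 0.3, "h": 0.5, "b": 0.2},
             2: {"a": 0.3, "e": 0.5, "o": 0.2}}

def sample(length, start_state=1):
    """Sample an observation string x^T and state sequence s^T."""
    state, xs, ss = start_state, [], []
    for _ in range(length):
        syms, probs = zip(*emissions[state].items())
        xs.append(random.choices(syms, weights=probs)[0])
        ss.append(state)
        nxt, tprobs = zip(*transitions[state].items())
        state = random.choices(nxt, weights=tprobs)[0]
    return "".join(xs), ss

x, s = sample(9)  # one particular draw could be x = "heehahaha"
```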

HMM Inference

[Trellis diagram: states l = 1, ..., K (rows) against positions t = 1, ..., T (columns), with the observations x_1, x_2, ..., x_T along the bottom.]

Key point: Pr(s_i = l) depends only on Pr(l'->l) and s_{i-1}, so you can propagate probabilities forward.

HMM Inference – Forward Algorithm

[Trellis diagram: the forward probabilities are filled in column by column for states l = 1, ..., K over the observations x_1, ..., x_T.]
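A minimal sketch of the forward recursion in Python. The parameter layout (start and trans as numpy arrays, emit as a list of per-state dictionaries) is an assumption made for illustration.

```python
import numpy as np

def forward(x, start, trans, emit):
    """alpha[t, l] = Pr(x_1..x_t, s_t = l).
    start[l]     : Pr(s_1 = l)
    trans[lp, l] : Pr(lp -> l)
    emit[l][sym] : Pr(l -> sym)
    """
    K, T = len(start), len(x)
    alpha = np.zeros((T, K))
    alpha[0] = start * np.array([emit[l][x[0]] for l in range(K)])
    for t in range(1, T):
        for l in range(K):
            alpha[t, l] = emit[l][x[t]] * np.dot(alpha[t - 1], trans[:, l])
    return alpha  # Pr(x^T) = alpha[T-1].sum()
```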

HMM Learning - EM

Expectation maximization:
– Find expectations, i.e. Pr(s_i = l) for i = 1, ..., T
  - forward algorithm + epsilon
  - hidden variables are the states s at times t = 1, ..., T
– Maximize probability of parameters given expectations:
  - replace #(l'->l)/#(l') with a weighted version of the counts
  - replace #(l'->x)/#(l') with a weighted version

HMM Inference

[Trellis diagram: states l = 1, ..., K over the observations x_1, ..., x_T.]

Forward algorithm: computes probabilities α(l, t) based on information in the first t letters of the string; it ignores "downstream" information.

HMM Learning - EM

Expectation maximization:
– Find expectations, i.e. Pr(s_i = l) for i = 1, ..., T
  - forward-backward algorithm
  - hidden variables are the states s at times t = 1, ..., T
– Maximize probability of parameters given expectations:
  - replace #(l'->l)/#(l') with a weighted version of the counts
  - replace #(l'->x)/#(l') with a weighted version
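A minimal sketch of one EM (Baum-Welch) iteration for a single training string, reusing the forward() routine from the sketch above. The backward pass and the weighted-count updates follow the standard recursions; the parameter layout is again an assumption for illustration.

```python
import numpy as np

def backward(x, trans, emit):
    """beta[t, l] = Pr(x_{t+1}..x_T | s_t = l)."""
    K, T = trans.shape[0], len(x)
    beta = np.zeros((T, K))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        for l in range(K):
            beta[t, l] = sum(trans[l, m] * emit[m][x[t + 1]] * beta[t + 1, m]
                             for m in range(K))
    return beta

def em_step(x, start, trans, emit):
    """One Baum-Welch update estimated from a single string x."""
    K, T = trans.shape[0], len(x)
    alpha, beta = forward(x, start, trans, emit), backward(x, trans, emit)
    px = alpha[-1].sum()
    gamma = alpha * beta / px                    # gamma[t, l] = Pr(s_t = l | x)
    xi = np.zeros((K, K))                        # expected transition counts
    for t in range(T - 1):
        for l in range(K):
            for m in range(K):
                xi[l, m] += (alpha[t, l] * trans[l, m] *
                             emit[m][x[t + 1]] * beta[t + 1, m]) / px
    # M-step: weighted versions of #(l'->l)/#(l') and #(l'->x)/#(l')
    new_start = gamma[0]
    new_trans = xi / xi.sum(axis=1, keepdims=True)
    new_emit = [{sym: gamma[[t for t in range(T) if x[t] == sym], l].sum()
                      / gamma[:, l].sum()
                 for sym in set(x)} for l in range(K)]
    return new_start, new_trans, new_emit
```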

Pair HMM Notation

A pair HMM emits edit operations e (substitutions, insertions, deletions) with probabilities Pr(e); a sequence of edit operations – an edit string z^T – determines a pair of output strings x, y.

Pair HMM Example

[Diagram: a single-state pair HMM; the state emits edit operations e with probability Pr(e).]

Sample run: z^T = <...>, <...>, <...>, ...
Strings x, y produced by z^T: x = heehee, y = teehe
Notice that x, y is also produced by z^4 + ..., and many other edit strings.
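A small sketch of how an edit string determines a string pair, under an assumed encoding in which each edit operation is a pair (a, b) with a going to x and b to y, and the empty string marking the deleted/inserted side. The particular edit string below is just one of the many that produce the example pair.

```python
EPS = ""  # empty side of an insertion or deletion

def apply_edit_string(z):
    """Given an edit string z = [(a1, b1), (a2, b2), ...], return (x, y)."""
    x = "".join(a for a, b in z)
    y = "".join(b for a, b in z)
    return x, y

# One edit string consistent with the example: substitute h->t, copy the rest,
# and delete the final e.
z = [("h", "t"), ("e", "e"), ("e", "e"), ("h", "h"), ("e", "e"), ("e", EPS)]
print(apply_edit_string(z))   # ('heehee', 'teehe')
```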

Distances based on pair HMMs

Pair HMM Inference

Dynamic programming is possible: fill out the matrix left-to-right, top-down.
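A minimal sketch of that DP for a one-state (memoryless) pair HMM, roughly in the style of Ristad & Yianilos. The parameter layout (separate substitution, deletion, insertion, and stopping probabilities) is an assumption for illustration; in the trained model these would sum to one.

```python
import numpy as np

def pair_forward(x, y, p_sub, p_del, p_ins, p_end):
    """alpha[i, j] = probability of generating the prefixes x[:i] and y[:j].
    p_sub[a][b], p_del[a], p_ins[b] are emission probabilities for the edit
    operations <a,b>, <a,eps>, <eps,b>; p_end is the stopping probability.
    """
    n, m = len(x), len(y)
    alpha = np.zeros((n + 1, m + 1))
    alpha[0, 0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:                              # delete x[i-1]
                alpha[i, j] += p_del[x[i - 1]] * alpha[i - 1, j]
            if j > 0:                              # insert y[j-1]
                alpha[i, j] += p_ins[y[j - 1]] * alpha[i, j - 1]
            if i > 0 and j > 0:                    # substitute / match
                alpha[i, j] += p_sub[x[i - 1]][y[j - 1]] * alpha[i - 1, j - 1]
    return alpha[n, m] * p_end   # joint probability Pr(x, y)
```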

Pair HMM Inference

[DP trellis: rows v = 1, ..., K against columns t = 1, ..., T, over the observation string x_1, ..., x_T.]

One difference from the ordinary HMM: after i emissions of the pair HMM, we do not know the column position, since an emission may or may not advance the position in x.

Pair HMM Inference: Forward-Backward

[DP trellis: rows v = 1, ..., K against columns t = 1, ..., T.]

Multiple states

[Diagram: a pair HMM with several states, each with its own edit-operation distribution Pr(e).]

An extension: multiple states

[Two copies of the DP trellis, one for state l = 1 and one for state l = 2.]

Conceptually, add a "state" dimension to the model. EM methods generalize easily to this setting.
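A sketch of how the forward DP above picks up a state dimension. The parameterization, with per-state edit-operation tables, is an assumption for illustration; a three-state version of this (match, insert, delete states) gives the usual affine-gap model.

```python
import numpy as np

def pair_forward_multistate(x, y, start, trans, emit_sub, emit_del, emit_ins):
    """alpha[l, i, j] = probability of generating x[:i], y[:j] and being in state l.
    start[l]     : probability of starting in state l
    trans[lp, l] : state transition probability
    emit_sub[l][a][b], emit_del[l][a], emit_ins[l][b] : per-state edit emissions
    """
    K, n, m = len(start), len(x), len(y)
    alpha = np.zeros((K, n + 1, m + 1))
    alpha[:, 0, 0] = start
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for l in range(K):
                total = 0.0
                for lp in range(K):
                    if i > 0:                      # delete x[i-1] in state l
                        total += trans[lp, l] * emit_del[l][x[i-1]] * alpha[lp, i-1, j]
                    if j > 0:                      # insert y[j-1] in state l
                        total += trans[lp, l] * emit_ins[l][y[j-1]] * alpha[lp, i, j-1]
                    if i > 0 and j > 0:            # substitute / match in state l
                        total += trans[lp, l] * emit_sub[l][x[i-1]][y[j-1]] * alpha[lp, i-1, j-1]
                alpha[l, i, j] = total
    return alpha[:, n, m].sum()   # Pr(x, y), up to a stopping term
```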

Back to the R&Y paper... They consider "coarse" and "detailed" models, as well as mixtures of both. The coarse model is like a back-off model – merge edit operations into equivalence classes (e.g. based on equivalence classes for characters). They test by learning a distance for K-NN with an additional latent variable.

K-NN with latent prototypes

[Diagram: a test example y (a string of phonemes) is compared, via the learned phonetic distance, to possible prototypes x_1, ..., x_m (known word pronunciations), each linked to a dictionary word w_1, ..., w_K.]

K-NN with latent prototypes

[Same diagram: prototypes x_1, ..., x_m, dictionary words w_1, ..., w_K, learned phonetic distance.]

The method needs (x, y) pairs to train a distance – to handle this, an additional level of EM is used to pick the "latent prototype" to pair with each y.
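At a high level, that extra EM loop might look like the following sketch. This is schematic, not the paper's exact procedure; joint_prob and reestimate are hypothetical stand-ins for scoring a pair under the pair HMM and re-fitting it on weighted pairs.

```python
def latent_prototype_em(ys, prototypes, model, n_iters=10):
    """Alternate between soft prototype assignment and distance re-training."""
    for _ in range(n_iters):
        weighted_pairs = []
        # E-step: responsibility of each prototype x for each observed y,
        # proportional to the current model's joint probability Pr(x, y)
        for y in ys:
            scores = [model.joint_prob(x, y) for x in prototypes]  # hypothetical method
            z = sum(scores)
            weighted_pairs += [(x, y, s / z) for x, s in zip(prototypes, scores)]
        # M-step: re-estimate pair-HMM parameters from the weighted (x, y) pairs
        model = model.reestimate(weighted_pairs)                   # hypothetical method
    return model
```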

Hidden prototype K-NN

Experiments

E1: on-line pronunciation dictionary
E2: subset of E1 with corpus words
E3: dictionary from training corpus
E4: dictionary from training + test corpus (!)
E5: E1 + E3

Experiments
