Learning, Uncertainty, and Information: Learning Parameters

Learning, Uncertainty, and Information: Learning Parameters
Big Ideas, November 10, 2004

Roadmap
- Noisy-channel model: redux
- Hidden Markov Models: the model, decoding the best sequence, training the model (EM)
- N-gram models: modeling sequences
- Shannon, information theory, and perplexity
- Conclusion

Bayes and the Noisy Channel
A generative model over sequences: infer the most likely hidden source sequence given the observed output.
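The slide's equations do not survive in the transcript; the standard noisy-channel decision rule it refers to, with S the hidden source sequence and O the observed sequence, is:

    \hat{S} = \arg\max_{S} P(S \mid O) = \arg\max_{S} \frac{P(O \mid S)\, P(S)}{P(O)} = \arg\max_{S} P(O \mid S)\, P(S)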

Hidden Markov Models (HMMs)
An HMM consists of:
1) A set of states q1, ..., qN
2) A set of transition probabilities, where aij is the probability of the transition qi -> qj
3) Observation (emission) probabilities bi(ot), the probability of observing ot in state i
4) An initial probability distribution over states, πi, the probability of starting in state i
5) A set of accepting (final) states
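As a concrete illustration (not from the slides; the two-state numbers below are invented), the five components can be bundled into a small Python structure:

    import numpy as np
    from dataclasses import dataclass

    @dataclass
    class HMM:
        states: list          # 1) set of states
        A: np.ndarray         # 2) transition probabilities, A[i, j] = a_ij
        B: np.ndarray         # 3) observation probabilities, B[i, k] = b_i(v_k)
        pi: np.ndarray        # 4) initial distribution over states
        finals: set           # 5) accepting (final) states

    hmm = HMM(
        states=["q1", "q2"],
        A=np.array([[0.7, 0.3], [0.4, 0.6]]),
        B=np.array([[0.9, 0.1], [0.2, 0.8]]),
        pi=np.array([0.6, 0.4]),
        finals={0, 1},
    )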

Three Problems for HMMs
- Find the probability of an observation sequence given a model: the forward algorithm
- Find the most likely path through the model given an observed sequence: the Viterbi algorithm (decoding; a minimal sketch follows below)
- Find the most likely model (parameters) given an observed sequence: the Baum-Welch (EM) algorithm
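A minimal decoding sketch (not from the slides), assuming the array representation above: A holds transition probabilities, B emission probabilities, pi the initial distribution, and obs is a sequence of observation indices:

    import numpy as np

    def viterbi(A, B, pi, obs):
        """Return the most likely state sequence for obs (Viterbi decoding)."""
        T, N = len(obs), len(pi)
        delta = np.zeros((T, N))            # best path probability ending in each state
        psi = np.zeros((T, N), dtype=int)   # backpointers
        delta[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            scores = delta[t - 1][:, None] * A      # scores[i, j] = delta[t-1, i] * a_ij
            psi[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) * B[:, obs[t]]
        path = [int(delta[-1].argmax())]
        for t in range(T - 1, 0, -1):               # follow backpointers
            path.append(int(psi[t, path[-1]]))
        return path[::-1]

For the toy model above, viterbi(hmm.A, hmm.B, hmm.pi, [0, 1, 1, 0]) returns a list of state indices, one per observation.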

Learning HMMs
Issue: where do the probabilities come from?
Solution: learn them from data, either by supervised/manual construction or by unsupervised re-estimation.
Training estimates the transition (aij), emission (bj), and initial (πi) probabilities; the state structure is typically assumed to be given.

Manual Construction
Start from manually labeled data: observation sequences aligned to ground-truth state sequences.
- Compute (relative) frequencies of state transitions
- Compute frequencies of observations per state
- Compute frequencies of initial states
Bootstrapping: iterate tag, correct, re-estimate, tag.
Problem: labeled data is expensive, hard or impossible to obtain, and may be inadequate to fully estimate the model (sparseness problems).
A count-and-normalize sketch follows below.
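A minimal sketch of this supervised estimation (not from the slides; the function name and add-k smoothing default are illustrative). The smoothing keeps unseen events, the sparseness problem above, from producing zero or undefined probabilities:

    from collections import Counter

    def estimate_supervised(sequences, states, vocab, k=1.0):
        """Relative-frequency (add-k smoothed) estimates from labeled sequences.

        sequences: list of sequences, each a list of (observation, state) pairs."""
        trans, emit, init = Counter(), Counter(), Counter()
        for seq in sequences:
            init[seq[0][1]] += 1                          # initial-state count
            for obs, state in seq:
                emit[(state, obs)] += 1                   # observation/state count
            for (_, s1), (_, s2) in zip(seq, seq[1:]):
                trans[(s1, s2)] += 1                      # state-transition count
        A = {(i, j): (trans[(i, j)] + k) /
                     (sum(trans[(i, s)] for s in states) + k * len(states))
             for i in states for j in states}
        B = {(i, o): (emit[(i, o)] + k) /
                     (sum(emit[(i, v)] for v in vocab) + k * len(vocab))
             for i in states for o in vocab}
        pi = {i: (init[i] + k) / (len(sequences) + k * len(states)) for i in states}
        return A, B, pi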

Unsupervised Learning
Re-estimation from unlabeled data: the Baum-Welch, a.k.a. forward-backward, algorithm.
- Assume a "representative" collection of data (e.g., recorded speech, gene sequences)
- Assign initial probabilities, or estimate them from a very small labeled sample
- Compute state occupancies given the data, using the forward and backward probabilities
- Update the transition, emission, and initial probabilities
A sketch of one such iteration follows below.
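A minimal numpy sketch of one re-estimation iteration (not from the slides). It assumes a fully emitting model with an initial distribution pi rather than the dedicated start/end states used in the equations on the later slides; names are illustrative:

    import numpy as np

    def forward(A, B, pi, obs):
        """alpha[t, i] = P(o_1..o_t, q_t = i)."""
        T, N = len(obs), len(pi)
        alpha = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        return alpha

    def backward(A, B, obs):
        """beta[t, i] = P(o_{t+1}..o_T | q_t = i)."""
        T, N = len(obs), A.shape[0]
        beta = np.zeros((T, N))
        beta[T - 1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        return beta

    def baum_welch_step(A, B, pi, obs):
        """One EM iteration over a single observation sequence; returns updated parameters."""
        obs = np.asarray(obs)
        T, N = len(obs), len(pi)
        alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
        likelihood = alpha[-1].sum()                       # P(O | model)
        gamma = alpha * beta / likelihood                  # state occupancy, shape (T, N)
        xi = np.zeros((T - 1, N, N))                       # transition occupancy
        for t in range(T - 1):
            xi[t] = alpha[t, :, None] * A * B[:, obs[t + 1]] * beta[t + 1] / likelihood
        new_A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        new_B = np.zeros_like(B)
        for k in range(B.shape[1]):
            new_B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)
        return new_A, new_B, gamma[0], likelihood

Iterating baum_welch_step until the likelihood stops improving gives the EM training loop described on the following slides.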

Updating Probabilities
Intuition: the observations identify likely state sequences; adjust the transition and emission probabilities toward those consistent with what was observed, increasing P(Observations | Model).
Functionally:
- For each state i, what proportion of transitions out of state i go to state j?
- For each state i, what proportion of the time spent in state i is observation o emitted?
- How often is state i the initial state?
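These proportions are expected counts rather than observed ones. In standard notation (the formula is not preserved in the transcript), writing α and β for the forward and backward probabilities defined on the following slides, the probability of occupying state i at time t is

    \gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{P(O \mid \lambda)}

and the three questions above are answered by ratios of sums of γ (and of the transition analogue ξ, given on the next slide) over time.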

Estimating Transitions
Consider updating the transition aij:
- Compute the probability of all paths that use the transition i -> j
- Compute the probability of all paths through state i (with and without i -> j)
The re-estimated aij is the ratio of the first quantity to the second.
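The per-time quantity being summed here is usually written ξ; its equation is not preserved in the transcript, but the standard form, in terms of the forward and backward probabilities defined on the next slides, is

    \xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O \mid \lambda)}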

Forward Probability
Here α is the forward probability, t is the time in the utterance, i and j are states in the HMM, aij is the transition probability, bj(ot) is the probability of observing ot in state j, N is the final state, T is the last time, and 1 is the start state.
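The equations themselves are not reproduced in the transcript. A standard statement of the forward recursion, using the slide's convention of a non-emitting start state 1 and final state N, is:

    \alpha_1(j) = a_{1j}\, b_j(o_1)
    \alpha_t(j) = \Big[\sum_i \alpha_{t-1}(i)\, a_{ij}\Big]\, b_j(o_t), \qquad 1 < t \le T
    P(O \mid \lambda) = \alpha_T(N) = \sum_i \alpha_T(i)\, a_{iN}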

Backward Probability
Here β is the backward probability, t is the time in the sequence, i and j are states in the HMM, aij is the transition probability, bj(ot) is the probability of observing ot in state j, N is the final state, and T is the last time.
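Again the equations are missing from the transcript; the standard backward recursion under the same conventions is:

    \beta_T(i) = a_{iN}
    \beta_t(i) = \sum_j a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j), \qquad 1 \le t < T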

Re-estimating
- Estimate the transition probabilities from i to j
- Estimate the observation probabilities in state j
- Estimate the initial probabilities for state i
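The update equations are not included in the transcript; the standard Baum-Welch re-estimates, in terms of the γ and ξ quantities from the earlier slides, are:

    \hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}
    \hat{b}_j(v_k) = \frac{\sum_{t:\, o_t = v_k} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}
    \hat{\pi}_i = \gamma_1(i)

Each is an expected relative frequency: expected transitions from i to j over expected times in i, expected times in j while emitting v_k over expected times in j, and the expected probability of starting in i.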