CS626-449: NLP, Speech and Web-Topics-in-AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 38-39: Baum-Welch Algorithm; HMM training.

Presentation transcript:

CS626-449: NLP, Speech and Web-Topics-in-AI Pushpak Bhattacharyya CSE Dept., IIT Bombay Lecture 38-39: Baum-Welch Algorithm; HMM training

Training Hidden Markov Models
- Training here is not structure learning: the structure of the HMM is given in advance, and only the probability values are learned (see the sketch below).
- Correspondence with PCFGs: we do not learn the production rules, only the probabilities associated with them.
- The training algorithm for PCFGs is called the Inside-Outside algorithm; the corresponding algorithm for HMMs is the Baum-Welch algorithm.
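As a concrete picture of what "probability values only" means, the sketch below (illustrative names and numbers, not taken from the slides) fixes the structure, i.e. the state set and alphabet, and exposes only the arc probabilities for training:

```python
# Minimal sketch; names and numbers are illustrative.
states = ('q', 'r')      # fixed in advance: part of the HMM's structure
alphabet = ('a', 'b')    # fixed in advance

# Arc probabilities P(src --symbol--> dst) of an arc-emission HMM:
# these numbers are the only thing Baum-Welch adjusts.
arc_probs = {
    ('q', 'a', 'r'): 0.5,
    ('q', 'b', 'q'): 0.5,
    ('r', 'a', 'q'): 0.5,
    ('r', 'b', 'q'): 0.5,
}
```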

Key Intuition
- Given: a training sequence.
- Initialization: starting probability values.
- Compute: Pr(state sequence | training sequence), as written out below; from it the expected count of each transition, and from those counts the new rule probabilities.
- Approach: initialize the probabilities and recompute them repeatedly, an EM-like approach.
(Figure on the slide: a two-state machine with states q and r whose arcs emit the symbols a and b.)
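The slide's formula is an image; written out in the standard way, the posterior weight attached to each candidate state sequence is its joint probability with the training sequence W, normalised over all candidate sequences:

```latex
P(\text{state seq} \mid W) \;=\;
  \frac{P(\text{state seq},\, W)}{\sum_{\text{all state seqs } S'} P(S',\, W)}
```

These posterior weights are what turn raw transition counts into expected counts in the slides that follow.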

Building blocks: probabilities to be used
1. (Figure on the slide: a state sequence S_1, S_2, …, S_n, S_{n+1} whose arcs emit the output symbols W_1, W_2, …, W_{n-1}, W_n.)
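The figure suggests the joint probability of one such state sequence together with the emitted string. Under the Markov and arc-emission assumptions used throughout the lecture it factors arc by arc (a reconstruction; the slide's own formula is an image):

```latex
P(S_1, S_2, \ldots, S_{n+1},\; W_1, W_2, \ldots, W_n)
  \;=\; P(S_1)\,\prod_{t=1}^{n} P(S_{t+1},\, W_t \mid S_t)
```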

Probabilities to be used (contd.)
2. Exercise 1: Prove the following:

Start of the Baum-Welch algorithm
String = aab aaa aab aaa
We consider the sequence of states with respect to the input symbols, i.e. the state sequence aligned against the output (o/p) sequence.
(Figure on the slide: the two-state machine over q and r with arcs emitting a and b, together with the o/p sequence and the corresponding state sequence.)

Calculating probabilities from the table
Table of counts (T = #states, A = #alphabet symbols):

  Src  Dest  O/P  Count
  q    r     a    5
  q    q     b    3
  r    q     a    3
  r    q     b    2

If the transitions are non-deterministic, then multiple state sequences are possible for a given o/p sequence (refer to the figure on the previous slide). Our aim is then to find the expected counts over these state sequences.
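When the state sequence is fully observed, as in the counts above, the table converts directly into probabilities by normalising over the arcs leaving each source state. A small sketch of that computation (illustrative):

```python
from collections import defaultdict

# Counts from the table: (source, destination, output symbol) -> count
counts = {
    ('q', 'r', 'a'): 5,
    ('q', 'q', 'b'): 3,
    ('r', 'q', 'a'): 3,
    ('r', 'q', 'b'): 2,
}

# Total count of arcs leaving each source state.
out_totals = defaultdict(int)
for (src, dst, sym), c in counts.items():
    out_totals[src] += c

# Maximum-likelihood arc probabilities.
probs = {arc: c / out_totals[arc[0]] for arc, c in counts.items()}
print(probs)
# {('q','r','a'): 0.625, ('q','q','b'): 0.375, ('r','q','a'): 0.6, ('r','q','b'): 0.4}
```

With hidden (non-deterministic) state sequences the same normalisation is applied, but to expected counts rather than observed ones.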

Interplay between two equations
(The slide shows two equations that feed each other: the probability of an arc s_i --w_k--> s_j is obtained by normalising its expected count over all arcs leaving s_i, and that expected count is the number of times the transition s_i → s_j emitting w_k occurs in the string, averaged over state sequences weighted by their probability under the current parameters.)
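Written out in the standard arc-emission form (a reconstruction; the slide's own equations are images), the pair is:

```latex
P\!\left(s_i \xrightarrow{w_k} s_j\right)
  \;=\; \frac{C\!\left(s_i \xrightarrow{w_k} s_j\right)}
             {\sum_{w_m}\sum_{s_l} C\!\left(s_i \xrightarrow{w_m} s_l\right)},
\qquad
C\!\left(s_i \xrightarrow{w_k} s_j\right)
  \;=\; \sum_{S} P(S \mid W)\; n_S\!\left(s_i \xrightarrow{w_k} s_j\right)
```

where C is the expected count and n_S is the number of times the arc occurs in the state sequence S. Each equation needs the output of the other: the current probabilities give the expected counts, and the expected counts give the new probabilities, which is exactly why the procedure iterates.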

Learning probabilities
(Figure on the slide: two two-state machines over q and r. The first is the actual, desired, HMM, with arc labels a:0.67, b:1.0, b:0.17, a:0.16; the second is the initial guess, with arc labels a:0.4, b:1.0, b:0.48, a:0.48.)

One run of the Baum-Welch algorithm on the string ababa
(Table on the slide: for each candidate state sequence (qrqrqq, qrqqqq, qqqrqq, qqqqqq) it lists P(path) and the rounded value, then the totals and the new probabilities computed from them, e.g. 0.06 (0.01/(…)).)
* is treated as the starting and ending symbol of the input string.
In this way, through multiple iterations, the probability values converge.
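A runnable sketch of one such iteration, enumerating every candidate state sequence exactly as the slide's table does. The initial arc probabilities below are illustrative (the slide's exact numbers are not all recoverable), and the start/end marker * is simplified to a fixed start state:

```python
from itertools import product
from collections import defaultdict

def baum_welch_iteration(string, states, arc_probs, start_state):
    """One brute-force Baum-Welch iteration for an arc-emission HMM:
    enumerate every state path that could have produced `string`, weight
    each path by its probability under the current parameters, accumulate
    expected arc counts, and re-normalise them into new arc probabilities."""
    expected = defaultdict(float)
    total = 0.0

    # A candidate path has len(string)+1 states, e.g. qrqrqq for ababa.
    for tail in product(states, repeat=len(string)):
        path = (start_state,) + tail
        p = 1.0
        for t, sym in enumerate(string):
            p *= arc_probs.get((path[t], sym, path[t + 1]), 0.0)
        if p == 0.0:
            continue
        total += p                       # running sum of P(path, string)
        for t, sym in enumerate(string):
            expected[(path[t], sym, path[t + 1])] += p

    # Joint weights P(path, string) become posteriors P(path | string).
    for arc in expected:
        expected[arc] /= total

    # Re-estimate: normalise expected counts over the arcs leaving each state.
    out_mass = defaultdict(float)
    for (src, _sym, _dst), c in expected.items():
        out_mass[src] += c
    return {arc: c / out_mass[arc[0]] for arc, c in expected.items()}

# Illustrative initial guess (not the slide's numbers); arcs are (src, symbol, dst).
guess = {
    ('q', 'a', 'q'): 0.48, ('q', 'a', 'r'): 0.04, ('q', 'b', 'q'): 0.48,
    ('r', 'b', 'q'): 1.0,
}
print(baum_welch_iteration('ababa', ('q', 'r'), guess, start_state='q'))
```

Enumerating all paths like this is exponential in the string length; the forward and backward probabilities introduced earlier are what make the same expected counts computable efficiently.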

Applying Naïve Bayes
Hence multiplying the transition probabilities along a path is valid.
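The derivation on the slide is an image; the standard argument it points to (a reconstruction) applies the chain rule and then the Markov/naïve-independence assumption that each step depends only on the current state:

```latex
P(S_1, \ldots, S_{n+1},\, W_1, \ldots, W_n)
  \;=\; P(S_1)\prod_{t=1}^{n} P\!\left(S_{t+1}, W_t \,\middle|\, S_1^{t},\, W_1^{t-1}\right)
  \;\approx\; P(S_1)\prod_{t=1}^{n} P\!\left(S_{t+1}, W_t \mid S_t\right)
```

which is why the probability of a path is computed as a plain product of its arc probabilities.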

Discussions
1. Symmetry breaking: if the initial probability values are perfectly symmetric, re-estimation leaves them unchanged, so the initialization must break the symmetry.
   (Figure on the slide: the desired machine and a symmetric initialization, both over states labelled s; the desired machine carries arc labels b:1.0, b:0.5, a:0.5, a:1.0 and the initialization a:0.5, b:0.5, a:0.25, a:0.5, b:0.5, a:0.25, b:0.25, b:0.5.)
2. Getting stuck in local maxima.
3. Label bias problem: the probabilities out of a state have to sum to 1, so some values can rise only at the cost of a fall in others.

Computational part
Exercise 2: What is the complexity of calculating the above expression?
Hint: first solve Exercise 1, i.e. understand how the probability of a given string can be represented.
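The representation the hint points to is an image on the slide; a standard form it plausibly refers to (an assumption, not the original formula) writes the string probability as a sum, over all state sequences, of the arc-by-arc products used earlier:

```latex
P(W_{1,n}) \;=\; \sum_{S_1,\ldots,S_{n+1}} P(S_1)\prod_{t=1}^{n} P\!\left(S_{t+1}, W_t \mid S_t\right)
```

Evaluated naively, this sum runs over a number of state sequences that is exponential in n, which is the cost the exercise asks you to pin down.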