CS621: Artificial Intelligence


CS621: Artificial Intelligence. Pushpak Bhattacharyya, CSE Dept., IIT Bombay. Lecture 38-39: Baum-Welch Algorithm; HMM Training

Baum-Welch Algorithm
Trains a Hidden Markov Model. This is not structure learning; the structure of the HMM is given in advance. Training involves:
- Learning the probability values ONLY.
- Correspondence with PCFG: we learn not the production rules but the probabilities associated with them.
- The training algorithm for PCFGs is called the Inside-Outside algorithm.
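As an aside not on the slide: in Rabiner's standard notation, the "probability values" being learned are the HMM parameter set

$$\lambda = (A, B, \pi), \qquad A = \{a_{ij}\} \text{ (state transitions)}, \quad B = \{b_j(k)\} \text{ (emissions)}, \quad \pi = \{\pi_i\} \text{ (initial states)},$$

while the state set and alphabet, i.e., the structure, stay fixed.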

Key Intuition
[Figure: a two-state HMM with states q and r emitting symbols a and b]
Given: a training sequence.
Initialization: the probability values.
Compute: Pr(state seq | training seq), get the expected counts of transitions, and recompute the rule probabilities.
Approach: initialize the probabilities and iteratively recompute them; an EM-like approach.
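In symbols, this loop is the standard EM update (a gloss, not shown on the slide): each iteration maximizes the expected complete-data log-likelihood under the posterior over state sequences computed with the current parameters,

$$\theta^{(t+1)} = \arg\max_{\theta} \sum_{S} P\!\left(S \mid W, \theta^{(t)}\right) \log P\!\left(W, S \mid \theta\right).$$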

Building Blocks: Probabilities to Be Used
[Figure: the observation sequence W1 W2 ... Wn-1 Wn aligned against the state sequence S1 S2 ... Sn Sn+1; each transition Sk → Sk+1 emits symbol Wk, so n symbols correspond to n+1 states]
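The building blocks are presumably the forward and backward probabilities. Under the arc-emission convention above, one standard definition (an assumption, since the slide's formulas are not reproduced in the transcript) is

$$F(k,i) = P\!\left(w_{1,k},\; S_{k+1} = s_i\right), \qquad B(k,i) = P\!\left(w_{k,n} \mid S_k = s_i\right),$$

i.e., F(k,i) is the probability of emitting the first k symbols and then sitting in state s_i, and B(k,i) is the probability of emitting the rest of the string from w_k onward given that we start in state s_i.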

Probabilities to Be Used, contd.
Exercise 1: Prove the following:
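The equation to be proved is not reproduced in the transcript. Given the definitions above, the identity such exercises standardly target (an assumption on my part) is that the forward and backward probabilities splice together at any position k:

$$P\!\left(w_{1,n},\; S_k = s_i\right) = F(k-1, i)\, B(k, i), \qquad P\!\left(w_{1,n}\right) = \sum_i F(k-1, i)\, B(k, i).$$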

Start of the Baum-Welch Algorithm
String = aab aaa aab aaa
[Figure: the sequence of states with respect to the input symbols for a two-state HMM over q and r, i.e., the o/p seq aligned against the state seq]

Calculating Probabilities from the Table
Table of counts, with T = #states and A = #alphabet symbols (rows partially recovered from the transcript):

Src   Dest   O/P   Count
q     r      a     5
…     …      b     3
…     …      …     2

If we have non-deterministic transitions, then multiple state sequences are possible for a given output sequence (cf. the previous slide). Our aim is to find the expected counts in that case.
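A minimal sketch in Python of turning such a count table into relative-frequency probability estimates; the counts and state names are illustrative, loosely based on the partially recovered table above:

```python
from collections import defaultdict

# (src, dest, output) -> count; illustrative numbers, not the slide's full table
counts = {("q", "r", "a"): 5, ("q", "r", "b"): 3}

# Total count mass leaving each source state
totals = defaultdict(float)
for (src, _dest, _out), c in counts.items():
    totals[src] += c

# Relative-frequency estimate: count / total outgoing count of the source state
probs = {key: c / totals[key[0]] for key, c in counts.items()}
print(probs)  # {('q', 'r', 'a'): 0.625, ('q', 'r', 'b'): 0.375}
```

With deterministic transitions this is plain counting; the expected counts discussed on the slide replace the integer counts when several state sequences could have produced the string.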

Interplay Between Two Equations
[The slide shows two equations; the surviving fragments are the symbol w_k and the caption "no. of times the transition s_i → s_j occurs in the string"]
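In the standard Baum-Welch formulation (reconstructed here as an assumption, since only fragments of the slide survive), the two interlocking equations are: the expected count of an arc is the posterior-weighted number of times it is used, and the new arc probability is that expected count renormalized over the source state:

$$C\!\left(s_i \xrightarrow{w_k} s_j\right) = \sum_{S} P(S \mid W)\; n\!\left(s_i \xrightarrow{w_k} s_j;\; S, W\right),$$

$$P\!\left(s_i \xrightarrow{w_k} s_j\right) = \frac{C\!\left(s_i \xrightarrow{w_k} s_j\right)}{\sum_{w,\, l} C\!\left(s_i \xrightarrow{w} s_l\right)}.$$

The interplay: the first equation turns the current probabilities into expected counts (E-step); the second turns expected counts back into probabilities (M-step).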

Learning Probabilities
[Figure: two two-state HMMs over q and r. The actual (desired) HMM has arc labels a:0.16, b:1.0, a:0.4, b:0.48; the initial guess has arc labels a:0.48, b:1.0]

One Run of the Baum-Welch Algorithm: string "ababa"
[Table: each candidate state sequence over q and r is listed with its path probability P(path); recovered values include 0.00077, 0.00154, 0.00442, 0.00884, 0.02548, 0.0, 0.05096, 0.07644; rounded totals 0.035, 0.01, 0.06, 0.095; new probabilities are obtained by renormalizing, e.g. 0.01/(0.01 + 0.06 + 0.095), giving 1.0, 0.36, 0.581]
'*' is treated as the starting and ending symbol of the input string. Through multiple such iterations, the probability values converge.
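A runnable sketch of one such E/M iteration by brute-force path enumeration, assuming a toy two-state arc-emission HMM. The arc probabilities below, and fixing the start state to q in place of the slide's '*' boundary symbol, are illustrative assumptions, not the slide's exact setup:

```python
# One brute-force iteration of Baum-Welch on the string "ababa":
# enumerate every state sequence, weight each by its posterior
# probability, accumulate expected arc counts, and renormalize.
from itertools import product
from collections import defaultdict

states = ("q", "r")
string = "ababa"

# P(src --symbol--> dest): each source state's outgoing labels sum to 1.
arc = {
    ("q", "a", "q"): 0.2,  ("q", "a", "r"): 0.3,
    ("q", "b", "q"): 0.3,  ("q", "b", "r"): 0.2,
    ("r", "a", "q"): 0.25, ("r", "a", "r"): 0.25,
    ("r", "b", "q"): 0.25, ("r", "b", "r"): 0.25,
}

def path_prob(path):
    """Joint probability of a state sequence and the string."""
    p = 1.0
    for k, sym in enumerate(string):
        p *= arc[(path[k], sym, path[k + 1])]
    return p

# All state sequences: start state fixed to q, then one state per symbol.
paths = [("q",) + tail for tail in product(states, repeat=len(string))]
total = sum(path_prob(p) for p in paths)  # = P(string)

# E-step: expected arc counts, Pr(state seq | training seq)-weighted.
expected = defaultdict(float)
for p in paths:
    weight = path_prob(p) / total
    for k, sym in enumerate(string):
        expected[(p[k], sym, p[k + 1])] += weight

# M-step: renormalize expected counts per source state.
out_mass = defaultdict(float)
for (src, _sym, _dst), c in expected.items():
    out_mass[src] += c
new_arc = {key: c / out_mass[key[0]] for key, c in expected.items()}
for key in sorted(new_arc):
    print(key, round(new_arc[key], 4))
```

Enumerating all state sequences is exponential in the string length; the forward and backward probabilities introduced earlier exist precisely to compute the same expected counts in polynomial time.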

Applying Naïve Bayes
Hence multiplying the transition probabilities is valid.
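The factorization presumably being invoked (stated here as an assumption, since the slide's equation is not reproduced) is the Markov, naïve-independence decomposition of a path's joint probability, which is what licenses the multiplication:

$$P\!\left(W_{1,n},\; S_{1,n+1}\right) = P(S_1) \prod_{k=1}^{n} P\!\left(S_{k+1},\, w_k \mid S_k\right).$$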

Discussions
- Symmetry breaking: with a symmetric initialization, re-estimation leads to no change in the initial values, so the symmetry is never broken. [Figure: a desired HMM vs. a symmetrically initialized one over states s, with arc labels a:0.5, b:0.25, a:0.25, b:1.0, a:0.5, a:1.0, b:0.5, among others]
- Getting stuck in local maxima.
- Label bias problem: probabilities have to sum to 1, so some values can rise only at the cost of a fall in the values of others.

Computational Part
Exercise 2: What is the complexity of calculating the above expression?
Hint: first solve Exercise 1, i.e., understand how the probability of a given string can be represented in terms of the forward and backward probabilities.