CS621: Artificial Intelligence


CS621: Artificial Intelligence
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 38, 39, 40: POS tag corpora; HMM Training; Baum-Welch (a.k.a. Forward-Backward) Algorithm
26th Oct, 1st Nov, 2010

Corpus
A corpus is a collection of coherent text. Example of a tagged sentence (with ^ and $ marking sentence beginning and end):
^_^ People_N laugh_V aloud_A $_$
Corpora may be spoken (e.g. Switchboard) or written (e.g. Brown, BNC).
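The word_TAG annotation format shown above is easy to consume programmatically. A minimal sketch (the function name is mine, not from the lecture):

```python
# Read the word_TAG annotation format into (word, tag) pairs.
def parse_tagged(sentence):
    """Split tokens like 'People_N' into (word, tag) tuples."""
    pairs = []
    for token in sentence.split():
        # rpartition splits on the LAST underscore, so '^_^' -> ('^', '^')
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs

print(parse_tagged("^_^ People_N laugh_V aloud_A $_$"))
# [('^', '^'), ('People', 'N'), ('laugh', 'V'), ('aloud', 'A'), ('$', '$')]
```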

Some notable text corpora of English
- American National Corpus
- Bank of English
- British National Corpus
- Corpus Juris Secundum
- Corpus of Contemporary American English (COCA): 400+ million words, 1990-present; freely searchable online
- Brown Corpus, forming part of the "Brown Family" of corpora, together with LOB, Frown and F-LOB
- International Corpus of English
- Oxford English Corpus
- Scottish Corpus of Texts & Speech

Penn Treebank Tagset: sample
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular

HMM Training: Baum-Welch (Forward-Backward) Algorithm

Key Intuition
(Example machine: states q and r, alphabet {a, b}.)
Given: a training (observation) sequence.
Initialization: guess the probability values.
Compute: P(state sequence | training sequence), from it the expected count of each transition, and from the counts new rule probabilities.
Approach: initialize the probabilities and recompute them iteratively, an EM-like approach.

Baum-Welch algorithm: counts
Example HMM: states q and r, with arcs labeled a, b (each transition emits a symbol).
Training string: abb aaa bbb aaa
The sequence of states is tracked with respect to the input symbols: the output sequence on one side, the corresponding state sequence on the other.
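When the state sequence accompanying an output string is known, counting transitions is direct (the supervised case). A minimal sketch; the state and output strings below are illustrative, not the lecture's:

```python
from collections import Counter

# Count (src, dest, output) transitions when the state sequence
# for an output string is known.
def count_transitions(states, output, start="q"):
    """states[i] is the state reached while emitting output[i]."""
    counts = Counter()
    prev = start
    for s, w in zip(states, output):
        counts[(prev, s, w)] += 1  # arc prev -w-> s used once
        prev = s
    return counts

c = count_transitions("rrqq", "abba")
print(c[("q", "r", "a")])  # the transition q -a-> r occurred once
```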

Calculating probabilities from the table
Build a table of counts over T = #states and A = #alphabet symbols, with columns Src, Dest, O/P, Count; e.g. the transition q --a--> r observed 5 times (the slide's other rows carry counts 3 and 2).
If transitions are non-deterministic, multiple state sequences are possible for a given output sequence (see the previous slide). The aim is therefore to find the expected count of each transition.
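Once the counts are in hand, the transition probabilities follow by normalizing over each source state. A sketch; the count table below is illustrative (only the q --a--> r count of 5 appears on the slide):

```python
from collections import defaultdict

# Turn a count table {(src, dest, symbol): count} into probabilities
# by normalizing over each source state.
def counts_to_probs(counts):
    totals = defaultdict(float)
    for (src, dest, sym), c in counts.items():
        totals[src] += c
    return {key: c / totals[key[0]] for key, c in counts.items()}

counts = {("q", "r", "a"): 5, ("q", "q", "b"): 3, ("r", "q", "a"): 2}
probs = counts_to_probs(counts)
print(probs[("q", "r", "a")])  # 5 / (5 + 3) = 0.625
```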

Interplay Between Two Equations
(1) P(s_i --w_k--> s_j) = count(s_i --w_k--> s_j) / Σ_{s, w} count(s_i --w--> s)
(2) count(s_i --w_k--> s_j) = Σ_S P(S | W) × n(s_i --w_k--> s_j ; S)
where W = w_0 ... w_n is the training string, S ranges over state sequences, and n(s_i --w_k--> s_j ; S) is the number of times the transition s_i --w_k--> s_j occurs in S.
Equation (1) turns counts into probabilities; equation (2) needs the current probabilities to compute expected counts. Baum-Welch resolves this circularity by iterating between the two until the values converge.
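The circular dependence can be made concrete: with the model fixed, the expected count of a transition is its count in each state sequence, weighted by P(sequence | output). A brute-force sketch for a hypothetical two-state HMM (the parameter values are illustrative, not the lecture's):

```python
from itertools import product

# Expected arc counts given an output string, by enumerating every
# state sequence. trans[(si, sj, w)] = P(si -w-> sj).
trans = {
    ("q", "q", "a"): 0.4,  ("q", "r", "a"): 0.1,
    ("q", "q", "b"): 0.1,  ("q", "r", "b"): 0.4,
    ("r", "q", "a"): 0.25, ("r", "r", "a"): 0.25,
    ("r", "q", "b"): 0.25, ("r", "r", "b"): 0.25,
}

def expected_counts(output, start="q"):
    exp, total = {}, 0.0
    for path in product("qr", repeat=len(output)):
        p, prev, arcs = 1.0, start, []
        for s, w in zip(path, output):
            p *= trans[(prev, s, w)]
            arcs.append((prev, s, w))
            prev = s
        total += p                          # total accumulates P(output)
        for arc in arcs:
            exp[arc] = exp.get(arc, 0.0) + p
    # divide by P(output): path weights become P(path | output)
    return {arc: c / total for arc, c in exp.items()}

ec = expected_counts("ab")
print(round(sum(ec.values()), 6))  # expected counts sum to len("ab") = 2.0
```

Enumerating paths is exponential in the string length; it is shown here only to make equation (2)'s weighting visible.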

Illustration
Two HMMs over states q and r are compared: the actual (desired) HMM and an initial guess (with transition values such as a:0.67 and b:0.17).

One run of Baum-Welch: string ababa
ε is treated as the starting and ending symbol of the input string. All state sequences over {q, r} for the string are listed with their path probabilities P(path); the slide's values include 0.00077, 0.00154, 0.00442, 0.00884, 0.02548, 0.05096 and 0.07644. These are summed per transition and rounded, giving totals such as 0.035, 0.01, 0.06 and 0.095; new probabilities are then obtained by normalization, e.g. 0.01 / (0.01 + 0.06 + 0.095), the slide's re-estimated values including 1.0, 0.36 and 0.581. Through multiple iterations the probability values converge.
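The re-estimation step on this slide (e.g. 0.01 / (0.01 + 0.06 + 0.095)) can be iterated to convergence. A brute-force sketch that enumerates all state paths; the two-state machine, uniform initialization and training string here are illustrative, not the slide's exact numbers:

```python
from itertools import product

# EM-style re-estimation by enumerating all state paths for a training
# string (feasible only for tiny examples). States q, r emit a or b on
# their arcs; trans[(si, sj, w)] = P(si -w-> sj).
STATES = "qr"

def reestimate(trans, output, start="q"):
    exp = {}                                  # expected arc counts
    for path in product(STATES, repeat=len(output)):
        p, prev, arcs = 1.0, start, []
        for s, w in zip(path, output):
            p *= trans[(prev, s, w)]
            arcs.append((prev, s, w))
            prev = s
        for arc in arcs:                      # weight each arc by P(path)
            exp[arc] = exp.get(arc, 0.0) + p
    totals = {}                               # normalize per source state
    for (si, _, _), c in exp.items():
        totals[si] = totals.get(si, 0.0) + c
    return {arc: c / totals[arc[0]] for arc, c in exp.items()}

trans = {(si, sj, w): 0.25 for si in STATES for sj in STATES for w in "ab"}
for _ in range(20):                           # iterate toward convergence
    trans = reestimate(trans, "ababa")
print(trans[("q", "r", "a")])  # learned probability of q -a-> r
```

Note that the normalization by P(output) cancels in the ratio, so the raw path-weighted counts can be normalized directly.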

Computational part
The training string w_0 w_1 w_2 ... w_k ... w_n-1 w_n is generated along a state sequence
S_0 --> S_1 --> S_2 --> ... --> S_i --> S_j --> ... --> S_n-1 --> S_n --> S_n+1
with symbol w_k emitted on the transition S_i --> S_j. The expected transition counts are computed efficiently over this trellis using forward and backward probabilities, rather than by enumerating all state sequences.
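Enumerating state sequences is exponential in the string length; the forward recurrence computes the same total probability in O(n·T²) by sweeping the trellis position by position. A sketch for an HMM with outputs on the arcs (the parameter values are illustrative):

```python
# Forward probabilities over the trellis: alpha[s] = P(prefix emitted
# so far, current state = s). trans[(si, sj, w)] = P(si -w-> sj).
def forward(output, trans, states, start):
    alpha = {s: (1.0 if s == start else 0.0) for s in states}
    for w in output:
        alpha = {
            sj: sum(alpha[si] * trans.get((si, sj, w), 0.0) for si in states)
            for sj in states
        }
    return alpha  # summing over states gives P(output)

trans = {
    ("q", "q", "a"): 0.4,  ("q", "r", "a"): 0.1,
    ("q", "q", "b"): 0.1,  ("q", "r", "b"): 0.4,
    ("r", "q", "a"): 0.25, ("r", "r", "a"): 0.25,
    ("r", "q", "b"): 0.25, ("r", "r", "b"): 0.25,
}
alpha = forward("abb", trans, "qr", "q")
print(sum(alpha.values()))  # = P("abb") under this model
```

The backward probabilities are computed by the mirror-image recurrence from the end of the string; together they give the expected counts that Baum-Welch needs.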

Discussions
1. Symmetry breaking: if the initial probability values are perfectly symmetric, re-estimation leads to no change in them, so the symmetry must be broken in the initial guess. (The slide's diagram contrasts a desired HMM, with values such as a:0.5, b:0.25 and b:1.0, against a symmetric initialization with values such as a:0.25 and b:0.5.)
2. Getting stuck in local maxima: the iteration is only guaranteed to converge to a local maximum of the likelihood.
3. Label bias problem: probabilities out of a state have to sum to 1, so a value can rise only at the cost of a fall in the values of others.