Pushpak Bhattacharyya CSE Dept., IIT Bombay


CS626/449: Natural Language Processing, Speech and the Web/Topics in AI
Lecture 31: POS Tagging (discussion to assist the CMU pronunciation dictionary assignment)
Pushpak Bhattacharyya, CSE Dept., IIT Bombay

Lexicon Example

Sentence to tag: ^_ Some_ People_ Jump_ High_ ._

Lexical item   Dictionary tag                 Example
Some           A (Adjective) {Quantifier}
People         N (Noun)                       lot of people
               V (Verb)                       peopled the city with soldiers
Jump           V (Verb)                       he jumped high
               N (Noun)                       This was a good jump
High           R (Adverb)                     He jumped high
               A (Adjective)                  high mountain
               N (Noun)                       Bombay high; on a high
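The lexicon above already limits which tag sequences are possible for the sentence. The following is a minimal sketch of that idea, assuming the lexicon is stored as a plain dictionary; the names `LEXICON` and `candidate_tag_sequences` are illustrative and do not come from the lecture.

```python
from itertools import product

# Lexicon from the slide: each word maps to the tags the dictionary allows.
LEXICON = {
    "Some": ["A"],
    "People": ["N", "V"],
    "Jump": ["V", "N"],
    "High": ["R", "A", "N"],
}

def candidate_tag_sequences(words, lexicon):
    """Return every tag sequence the lexicon allows, wrapped in ^ and . markers."""
    options = [lexicon[w] for w in words]
    return [("^",) + combo + (".",) for combo in product(*options)]

candidates = candidate_tag_sequences(["Some", "People", "Jump", "High"], LEXICON)
print(len(candidates))  # 1 * 2 * 2 * 3 = 12 candidate sequences
print(candidates[0])    # ('^', 'A', 'N', 'V', 'R', '.')
```

The number of candidates is the product of the per-word tag counts, so it grows quickly with sentence length. This is why a probabilistic model is needed to pick one sequence.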

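Bigram Assumption

Best tag sequence:

\[
T^* = \arg\max_T P(T \mid W) = \arg\max_T P(T)\,P(W \mid T) \quad \text{(by Bayes' theorem; } P(W) \text{ is the same for every } T\text{)}
\]

\[
\begin{aligned}
P(T) &= P(t_0 = \text{\textasciicircum},\ t_1, t_2, \ldots, t_{n+1} = \text{.}) \\
     &= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1 t_0)\,P(t_3 \mid t_2 t_1 t_0) \cdots P(t_n \mid t_{n-1} t_{n-2} \ldots t_0)\,P(t_{n+1} \mid t_n t_{n-1} \ldots t_0) \\
     &= P(t_0)\,P(t_1 \mid t_0)\,P(t_2 \mid t_1) \cdots P(t_n \mid t_{n-1})\,P(t_{n+1} \mid t_n) \\
     &= \prod_{i=0}^{n+1} P(t_i \mid t_{i-1}) \quad \text{(Bigram Assumption)}
\end{aligned}
\]

Here the i = 0 factor P(t_0 | t_{-1}) stands for P(t_0).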
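As a worked instance of the bigram product, here is a small sketch that multiplies the factors P(t_i | t_{i-1}) for one tag sequence. The table values are the relative frequencies that follow from the two-sentence tagged corpus used later in these slides. The names `BIGRAM` and `tag_sequence_probability` are illustrative, and taking P(t_0 = ^) = 1 is an assumption.

```python
from fractions import Fraction

# P(next tag | previous tag), keyed as (previous, next).
# Values are relative frequencies from the small tagged corpus in these slides.
BIGRAM = {
    ("^", "N"): Fraction(1),
    ("N", "N"): Fraction(1, 5),
    ("N", "V"): Fraction(2, 5),
    ("N", "A"): Fraction(1, 5),
    ("N", "."): Fraction(1, 5),
    ("V", "N"): Fraction(1, 2),
    ("V", "A"): Fraction(1, 2),
    ("A", "N"): Fraction(1, 3),
    ("A", "R"): Fraction(1, 3),
    ("A", "."): Fraction(1, 3),
    ("R", "A"): Fraction(1),
}

def tag_sequence_probability(tags, bigram):
    """P(T) under the bigram assumption; P(t0 = ^) is assumed to be 1."""
    prob = Fraction(1)
    for prev, cur in zip(tags, tags[1:]):
        # Unseen transitions get probability 0 (no smoothing in this sketch).
        prob *= bigram.get((prev, cur), Fraction(0))
    return prob

p = tag_sequence_probability(["^", "N", "V", "A", "N", "N", "."], BIGRAM)
print(p)  # 1 * 2/5 * 1/2 * 1/3 * 1/5 * 1/5 = 1/375
```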

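Lexical Probability Assumption

\[
P(W \mid T) = P(w_0 \mid t_0 \ldots t_{n+1})\,P(w_1 \mid w_0, t_0 \ldots t_{n+1})\,P(w_2 \mid w_1 w_0, t_0 \ldots t_{n+1}) \cdots P(w_n \mid w_0 \ldots w_{n-1}, t_0 \ldots t_{n+1})\,P(w_{n+1} \mid w_0 \ldots w_n, t_0 \ldots t_{n+1})
\]

Assumption: a word is determined completely by its tag. This is inspired by speech recognition.

\[
\begin{aligned}
P(W \mid T) &= P(w_0 \mid t_0)\,P(w_1 \mid t_1) \cdots P(w_{n+1} \mid t_{n+1}) \\
            &= \prod_{i=0}^{n+1} P(w_i \mid t_i) = \prod_{i=1}^{n+1} P(w_i \mid t_i) \quad \text{(Lexical Probability Assumption)}
\end{aligned}
\]

The i = 0 factor can be dropped because w_0 and t_0 are both the start marker ^, so P(w_0 | t_0) = 1.

Thus, combining this with the bigram assumption for P(T):

\[
T^* = \arg\max_T P(T)\,P(W \mid T) = \arg\max_T \prod_{i=0}^{n+1} P(t_i \mid t_{i-1})\,P(w_i \mid t_i)
\]

where P(t_0 | t_{-1}) stands for P(t_0).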
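To make the argmax concrete, the sketch below scores every tag sequence the lexicon allows for "^ People Jump High ." and keeps the best one. It uses brute-force enumeration, not an efficient decoder such as Viterbi. All probability values in `TRANSITION` and `EMISSION` are made-up numbers for illustration, and every name in the block is hypothetical.

```python
from itertools import product

# Hypothetical bigram probabilities P(tag_i | tag_{i-1}); illustrative values only.
TRANSITION = {
    ("^", "N"): 0.6, ("^", "V"): 0.1,
    ("N", "V"): 0.5, ("N", "N"): 0.2, ("N", "R"): 0.1, ("N", "A"): 0.1,
    ("V", "N"): 0.3, ("V", "V"): 0.05, ("V", "R"): 0.3, ("V", "A"): 0.2,
    ("R", "."): 0.4, ("A", "."): 0.1, ("N", "."): 0.3,
}

# Hypothetical lexical probabilities P(word | tag); illustrative values only.
EMISSION = {
    ("People", "N"): 0.01, ("People", "V"): 0.001,
    ("Jump", "V"): 0.02, ("Jump", "N"): 0.005,
    ("High", "R"): 0.01, ("High", "A"): 0.02, ("High", "N"): 0.001,
    (".", "."): 1.0,
}

LEXICON = {
    "^": ["^"],
    "People": ["N", "V"],
    "Jump": ["V", "N"],
    "High": ["R", "A", "N"],
    ".": ["."],
}

def score(words, tags):
    """Product of P(t_i | t_{i-1}) * P(w_i | t_i) for i >= 1; the i = 0 factor is 1."""
    p = 1.0
    for i in range(1, len(words)):
        p *= TRANSITION.get((tags[i - 1], tags[i]), 0.0)
        p *= EMISSION.get((words[i], tags[i]), 0.0)
    return p

def best_tag_sequence(words):
    """Enumerate all lexicon-allowed tag sequences and return the highest-scoring one."""
    candidates = product(*(LEXICON[w] for w in words))
    return max(candidates, key=lambda tags: score(words, tags))

WORDS = ["^", "People", "Jump", "High", "."]
best = best_tag_sequence(WORDS)
print(best)  # ('^', 'N', 'V', 'R', '.')
```

With these illustrative numbers the winner is ^ N V R ., which matches the tagging People_N Jump_V High_R. Different probability values could of course select a different sequence.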

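Generative Model

[Figure: tag lattice for the tagging ^_^ People_N Jump_V High_R ._. The diagram labels are "Lexical Probabilities" and "Bigram Probabilities"; additional tag nodes shown are N, A, A.]

This model is called a generative model. The tags are the states, and the words are observed from them. This is similar to an HMM.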

Bigram probabilities

Lexical Probability

Calculation from actual data

Corpus:     ^ Ram got many NLP books. He found them all very interesting.
POS tagged: ^ N V A N N . ^ N V N A R A .
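Lexical probabilities can be estimated from this tagged corpus as relative frequencies, P(w | t) = count(w, t) / count(t). The sketch below assumes the final period of each sentence is split off as its own token and that the second sentence also starts with ^, as the tag line indicates. The names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Corpus and tags from the slide, tokenized so that "." is a separate token.
WORDS = "^ Ram got many NLP books . ^ He found them all very interesting .".split()
TAGS = "^ N V A N N . ^ N V N A R A .".split()
assert len(WORDS) == len(TAGS)

pair_counts = Counter(zip(WORDS, TAGS))  # count(word, tag)
tag_counts = Counter(TAGS)               # count(tag)

def lexical_probability(word, tag):
    """Relative-frequency estimate of P(word | tag); no smoothing."""
    return Fraction(pair_counts[(word, tag)], tag_counts[tag])

print(lexical_probability("books", "N"))  # 1/5 (five noun tokens in total)
print(lexical_probability("got", "V"))    # 1/2
print(lexical_probability("very", "R"))   # 1
```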

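Recording numbers

Bigram counts from the tagged corpus (rows: previous tag; columns: next tag):

      ^    N    V    A    R    .
^     0    2    0    0    0    0
N     0    1    2    1    0    1
V     0    1    0    1    0    0
A     0    1    0    0    1    1
R     0    0    0    1    0    0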

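Probabilities

Bigram probabilities P(column tag | row tag), obtained by dividing each bigram count by its row total:

      ^    N     V     A     R     .
^     0    1     0     0     0     0
N     0    1/5   2/5   1/5   0     1/5
V     0    1/2   0     1/2   0     0
A     0    1/3   0     0     1/3   1/3
R     0    0     0     1     0     0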
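The counting and normalization steps can be sketched as follows. This assumes each sentence is counted separately, so the transition from "." to the next "^" is not recorded. The function and variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Tag sequences of the two corpus sentences, one list per sentence.
TAGGED_SENTENCES = [
    "^ N V A N N .".split(),
    "^ N V N A R A .".split(),
]

bigram_counts = Counter()   # count(previous tag, next tag)
context_counts = Counter()  # how often each tag occurs as the previous tag
for tags in TAGGED_SENTENCES:
    for prev, cur in zip(tags, tags[1:]):
        bigram_counts[(prev, cur)] += 1
        context_counts[prev] += 1

def bigram_probability(prev, cur):
    """Relative-frequency estimate of P(cur | prev); prev must occur as a context."""
    return Fraction(bigram_counts[(prev, cur)], context_counts[prev])

print(bigram_counts[("^", "N")])     # 2
print(bigram_probability("^", "N"))  # 1
print(bigram_probability("N", "N"))  # 1/5
print(bigram_probability("N", "V"))  # 2/5
print(bigram_probability("V", "A"))  # 1/2
print(bigram_probability("A", "R"))  # 1/3
```

Because there is no smoothing, any bigram that does not occur in this tiny corpus gets probability 0.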

Compare with the Pronunciation Dictionary Assignment

Phoneme   Example   Translation
-------   -------   -----------
AE        at        AE T
AH        hut       HH AH T
AO        ought     AO T
AW        cow       K AW
AY        hide      HH AY D
B         be        B IY

In POS tagging, the labels are already given on the words: the "alignment" of words with labels is already given. In the assignment, the most likely alignment has to be discovered first, followed by the best possible mapping.