Natural Language Processing Assignment
Group Members: Soumyajit De, Naveen Bansal, Sanobar Nishat

Outline
* POS tagging
  - Tag-wise accuracy
  - Graph: tag-wise accuracy
  - Precision, recall, F-score
* Improvements in POS tagging
  - Implementation of trigram POS tagging with smoothing
  - Improved precision, recall and F-score
* Comparison between discriminative and generative models
* Next word prediction
  - Model #1
  - Model #2
  - Implementation method and details
  - Scoring ratio
  - Perplexity ratio

Outline (cont.)
* NLTK
* Yago
  - Examples using Yago
* Parsing
* Conclusions
* A* implementation: a comparison with Viterbi

Assignment #1: HMM-Based POS Tagger

Tag Wise Accuracy

Graph – Tag Wise Accuracy

Precision, Recall, F-Score
* Precision: tp / (tp + fp) = 0.92
* Recall: tp / (tp + fn) = 1
* F-score: 2 · precision · recall / (precision + recall) = 0.958
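A minimal sketch of these three measures in Python; the counts tp=92, fp=8, fn=0 are illustrative values chosen to reproduce the numbers above, not the assignment's actual counts:

    def prf(tp, fp, fn):
        """Compute precision, recall and F-score from raw counts."""
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f_score = 2 * precision * recall / (precision + recall)
        return precision, recall, f_score

    # Illustrative counts chosen so that precision = 0.92 and recall = 1.
    p, r, f = prf(tp=92, fp=8, fn=0)
    print(f"precision={p:.2f}  recall={r:.2f}  f-score={f:.3f}")
    # precision=0.92  recall=1.00  f-score=0.958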

Assignment #1 (cont.): Improvements in the HMM-Based POS Tagger

Improvement in the HMM-Based POS Tagger: Trigram Implementation
* Issue: data sparsity
* Solution: implementation of smoothing techniques
* Result: raises overall accuracy to 94%

Smoothing Technique
* Linear interpolation of unigram, bigram and trigram estimates:
  P(t3 | t1, t2) = λ1 · P̂(t3) + λ2 · P̂(t3 | t2) + λ3 · P̂(t3 | t1, t2), with λ1 + λ2 + λ3 = 1
* The λ values are found as discussed in "TnT - A Statistical Part-of-Speech Tagger" (Brants, 2000)
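The λ values can be estimated by the deleted-interpolation procedure described in the TnT paper. A minimal sketch in Python, assuming trigram/bigram/unigram count tables over tags and the total token count are available (all names here are illustrative):

    def deleted_interpolation(tri, bi, uni, n_tokens):
        """Estimate lambda1..lambda3 as in TnT (Brants, 2000).

        tri, bi, uni are dicts mapping tag tuples to counts;
        n_tokens is the total number of tags in the training corpus.
        """
        l1 = l2 = l3 = 0.0
        for (t1, t2, t3), c in tri.items():
            # Counts are reduced by one: the current trigram is "deleted".
            uni_est = (uni[(t3,)] - 1) / (n_tokens - 1) if n_tokens > 1 else 0.0
            bi_est = (bi[(t2, t3)] - 1) / (uni[(t2,)] - 1) if uni[(t2,)] > 1 else 0.0
            tri_est = (c - 1) / (bi[(t1, t2)] - 1) if bi[(t1, t2)] > 1 else 0.0
            # Credit this trigram's count to the most reliable estimator.
            best = max(tri_est, bi_est, uni_est)
            if best == tri_est:
                l3 += c
            elif best == bi_est:
                l2 += c
            else:
                l1 += c
        total = l1 + l2 + l3
        return l1 / total, l2 / total, l3 / total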

POS Tagging Accuracy With Smoothing

Precision, Recall, F-Score (with smoothing)
* Precision: tp / (tp + fp) = 0.9415
* Recall: tp / (tp + fn) = 1
* F-score: 2 · precision · recall / (precision + recall) = 0.97

Tag Wise Accuracy

Tag Wise Accuracy (cont.)

Assignment #1 (cont.): Improvements in the HMM-Based POS Tagger, Handling Unknown Words

Precision Score (accuracy in percentage)

Tag Wise Accuracy

Error Analysis (Tag Wise Accuracy)
VVB: finite base form of lexical verbs (e.g. forget, send, live, return). Count: 9916
* Confused with VVI (infinitive form of lexical verbs, e.g. forget, send, live, return), 1201 times. Reason: VVB tags words whose form is identical to the infinitive without "to" for all persons, e.g. "He has to show" vs. "Show me", so the two tags are easily confused.
* Confused with VVD (past tense form of lexical verbs, e.g. forgot, sent, lived, returned), 145 times. Reason: the base form and past tense form of many verbs are identical, so the emission probability of such words dominates and VVB is wrongly tagged as VVD; the transition probability has too little influence to correct this.
* Confused with NN1 (singular common noun), 303 times. Reason: words whose base form doubles as a common noun are confused, e.g. in "The seasonally adjusted total regarded as…", "total" can be tagged either VVB or NN1.

Error Analysis (cont.)
ZZ0: alphabetical symbols (e.g. A, a, B, b, c, d). Accuracy: 63%. Count: 337
* Confused with AT0 (article, e.g. the, a, an, no), 98 times. Reason: the emission probability of "a" as AT0 is much higher than as ZZ0, so AT0 dominates when tagging "a".
* Confused with CRD (cardinal number, e.g. one, 3, fifty-five, 3609), 16 times. Reason: an artifact of the bigram/trigram transition-probability assumption.

Error Analysis (cont.)
ITJ: interjection. Accuracy: 65%. Count: 177
Note: ITJ appears so rarely that it is not misclassified often in absolute terms, yet its accuracy percentage is still low.
* Confused with AT0 (article, e.g. the, a, an, no), 26 times. Reason: "no" is used both as ITJ and as an article in the corpus, so the higher emission probability of the word as AT0 causes confusion.
* Confused with NN1 (singular common noun), 14 times. Reason: "Bravo" is tagged both NN1 and ITJ in the corpus.

Error Analysis (cont.)
UNC: unclassified items. Accuracy: 23%. Count: 756
* Confused with AT0 (article, e.g. the, a, an, no), 69 times. Reason: the transition probability dominates, so UNC is wrongly tagged.
* Confused with NN1 (singular common noun), 224 times.
* Confused with NP0 (proper noun, e.g. London, Michael, Mars, IBM), 132 times. Reason: an unseen word beginning with a capital letter is tagged NP0, since UNC words mostly do not recur across corpora.

Assignment #2: A Comparison of Discriminative and Generative Models

Discriminative and Generative Model

Comparison Graph

Conclusion: Since these are unigram models, the discriminative and generative models give the same performance, as expected.

Assignment #3: Next Word Prediction

Model #1
When only the previous word is given. Example: He likes -------

Model #2
When the previous tag and the previous word are both known. Example: He_PP0 likes_VB0 --------
Previous Work

Model #2 (cont.): Current Work
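Reading the two model slides together, the prediction rules can be summarized as follows (our formalization of the slide text; w_{i-1} is the previous word, t_{i-1} the previous tag):

    Model 1:  w*_i = argmax_w P(w | w_{i-1})
    Model 2:  w*_i = argmax_w P(w | w_{i-1}, t_{i-1})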

Evaluation Method
1. Scoring Method
* Divide the test corpus into bigrams.
* Match the second word of each test bigram against the word predicted by each model.
* Increment the model's score on a match.
* The final evaluation is the ratio of the two models' scores, i.e. score(model 1) / score(model 2).
* If the ratio > 1, model 1 is performing better, and vice versa (see the sketch below).
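A minimal sketch of this scoring loop, assuming each model exposes a predict(previous_word) function and the test corpus is given as a list of tokens (model1_predict, model2_predict and test_tokens are illustrative names):

    def score(predict, tokens):
        """Count how often the model's next-word prediction matches the corpus."""
        hits = 0
        # Walk the corpus as overlapping bigrams (w[i-1], w[i]).
        for prev, actual in zip(tokens, tokens[1:]):
            if predict(prev) == actual:
                hits += 1
        return hits

    # ratio > 1 means model 1 is performing better, and vice versa.
    ratio = score(model1_predict, test_tokens) / score(model2_predict, test_tokens)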

Implementation Detail
A look-up table is used when predicting the next word:

Previous Word | Next Predicted Word (Model 1) | Next Predicted Word (Model 2)
I             | see                           | he
looks         | goes                          | …
…             | …                             | …

Scoring Ratio

2. Perplexity Comparison
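The slides show only the resulting ratio; for reference, perplexity here is presumably the standard measure, which for a test sequence w_1 … w_N under a bigram model is

    PP(W) = P(w_1 … w_N)^(-1/N) = ( ∏_{i=1..N} 1 / P(w_i | w_{i-1}) )^(1/N)

Lower perplexity indicates a better model, so a perplexity ratio model 1 / model 2 below 1 would favor model 1.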

Perplexity Ratio

Remarks: Model 2 performs worse than model 1 because words are sparse across tags.

Assignment #3 (cont.): Next Word Prediction, Further Experiments

Score (ratio) of word-prediction

Perplexity (ratio) of word-prediction

Remarks: Perplexity decreases under this model, and the overall score increases.

Assignment #4: Yago

Example #1
Query: Amitabh and Sachin

PATH
wikicategory_Living_people -- <type> -- Amitabh_Bachchan -- <givenNameOf> -- Amitabh
wikicategory_Living_people -- <type> -- Sachin_Tendulkar -- <givenNameOf> -- Sachin
ANOTHER-PATH
wikicategory_Padma_Shri_recipients -- <type> -- Amitabh_Bachchan -- <givenNameOf> -- Amitabh
wikicategory_Padma_Shri_recipients -- <type> -- Sachin_Tendulkar -- <givenNameOf> -- Sachin

Example #2
Query: India and Pakistan

PATH
wikicategory_WTO_member_economies -- <type> -- India
wikicategory_WTO_member_economies -- <type> -- Pakistan
ANOTHER-PATH
wikicategory_English-speaking_countries_and_territories -- <type> -- India
wikicategory_English-speaking_countries_and_territories -- <type> -- Pakistan
Operation_Meghdoot -- <participatedIn> -- India
Operation_Meghdoot -- <participatedIn> -- Pakistan

ANOTHER-PATH
Operation_Trident_(Indo-Pakistani_War) -- <participatedIn> -- India
Operation_Trident_(Indo-Pakistani_War) -- <participatedIn> -- Pakistan
Siachen_conflict -- <participatedIn> -- India
Siachen_conflict -- <participatedIn> -- Pakistan
wikicategory_Asian_countries -- <type> -- India
wikicategory_Asian_countries -- <type> -- Pakistan

ANOTHER-PATH
Capture_of_Kishangarh_Fort -- <participatedIn> -- India
Capture_of_Kishangarh_Fort -- <participatedIn> -- Pakistan
wikicategory_South_Asian_countries -- <type> -- India
wikicategory_South_Asian_countries -- <type> -- Pakistan
Operation_Enduring_Freedom -- <participatedIn> -- India
Operation_Enduring_Freedom -- <participatedIn> -- Pakistan
wordnet_region_108630039 -- <type> -- India
wordnet_region_108630039 -- <type> -- Pakistan

Example #3
Query: Tom and Jerry

PATH
wikicategory_Living_people -- <type> -- Tom_Green -- <givenNameOf> -- Tom
wikicategory_Living_people -- <type> -- Jerry_Brown -- <givenNameOf> -- Jerry

Assignment #5: Parser Projection

Example#1

Example#2

Example#3

Example#4

Example#5

Example#6

Example#7

Example#8

Conclusion
1. VBZ always comes at the end of the parse tree in Hindi and Urdu.
2. The structure in Hindi and Urdu always expands to, or is reordered as, NP VB; e.g. S => NP VP (no change) or VP => VBZ NP (interchanged).
3. For exact translation into Hindi and Urdu, merging of English sub-trees is sometimes required.
4. One-word-to-multiple-words mappings are common when translating from English to Hindi/Urdu, e.g. donor => aatiya shuda, or have => rakhta hai.
5. Phrase-to-phrase translation is sometimes required, so chunking is needed, e.g. hand in hand => choli daman ka saath (Urdu) => sath sath hain (Hindi).
6. DT NN or DT NP does not interchange.
7. In example#7, a correct translation does not require merging the two sub-trees MD and VP, e.g. could be => jasakta hai.

NLTK Toolkit
NLTK is a suite of open-source Python modules. Components of NLTK:
* Code: corpus readers, tokenizers, stemmers, taggers, parsers, WordNet, semantic interpretation
* Corpora: >30 annotated data sets
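A minimal usage sketch of the tokenizer and tagger components listed above, as they appear in a current NLTK release (the deck itself predates this tagger model, so this is illustrative):

    import nltk

    # One-time downloads of the tokenizer and tagger models.
    nltk.download("punkt")
    nltk.download("averaged_perceptron_tagger")

    tokens = nltk.word_tokenize("Fruit flies like a banana.")
    print(nltk.pos_tag(tokens))
    # Prints a list of (word, tag) pairs, e.g. [('Fruit', 'NN'), ('flies', 'NNS'), ...]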

Assignment #6: A* Implementation & Comparison with Viterbi

A* Heuristic
[Diagram: tagging lattice from start node ^ through tag columns A, B, C, D to end node $, with the selected route and transition probabilities marked.]
Fixed cost at each level: L = (min cost) × (no. of hops)

Heuristic (h)
F(B) = g(B) + h(B), where

h(B) = min_{i,j} { -log( Pr(C_j | B_i) · Pr(W_C | C_j) ) }
     + min_{i,j} { -log( Pr(t_j | t_i) · Pr(W_{t_j} | t_j) ) } · (n - 2)
     + min_k { -log( Pr($ | D_k) · Pr($ | $) ) }

Here:
* n = number of nodes from B_i to $ (including $)
* W_C = the word emitted from the next node C
* t_i, t_j = any combination of tags in the graph
* W_{t_j} = the word emitted from node t_j
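A compact sketch of how such an A* tagger can be organized; this is our illustration of the idea rather than the group's code, with trans_cost, emit_cost and h standing in for the -log probability tables and the heuristic defined above:

    import heapq
    from math import inf

    def astar_tag(words, tags, trans_cost, emit_cost, h):
        """A* search over the HMM tagging lattice.

        trans_cost(prev, tag) and emit_cost(tag, word) return -log
        probabilities (inf for impossible events); h(i, tag) is an
        admissible heuristic, i.e. it never overestimates the cost of
        finishing from word position i with the given tag.
        """
        n = len(words)
        # Frontier entries: (g + h, g, next word index, tag path so far).
        frontier = [(h(0, "^"), 0.0, 0, ("^",))]
        while frontier:
            f, g, i, path = heapq.heappop(frontier)
            if path[-1] == "$":
                return list(path[1:-1])  # goal reached; strip ^ and $
            if i == n:
                # All words consumed: close the path with the end symbol.
                g2 = g + trans_cost(path[-1], "$")
                heapq.heappush(frontier, (g2, g2, i, path + ("$",)))
                continue
            for t in tags:
                step = trans_cost(path[-1], t) + emit_cost(t, words[i])
                if step < inf:
                    g2 = g + step
                    heapq.heappush(frontier, (g2 + h(i + 1, t), g2, i + 1, path + (t,)))
        return None

With an admissible h this returns the same optimal path as Viterbi, which matches the 1.0 score ratio reported below.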

Result
Viterbi / A* ratio: score(Viterbi) / score(A*) = 1.0
where score(algo) = the number of correct predictions on the test corpus

Conclusion: Since we make a bigram assumption and Viterbi prunes in a way that is guaranteed to find the optimal path in a bigram HMM, it returns the optimal path. For A*, since our heuristic underestimates the remaining cost and satisfies the triangle inequality, it also returns the optimal path in the graph. However, because A* has to backtrack at times, it needs more time and memory than Viterbi to find the solution.