Part-of-Speech Tagging using Neural Networks
Ankur Parikh, LTRC, IIIT Hyderabad
Outline
1. Introduction
2. Background and Motivation
3. Experimental Setup
4. Preprocessing
5. Representation
6. Single-neuro tagger
7. Experiments
8. Multi-neuro tagger
9. Results
10. Discussion
11. Future Work
Introduction
POS tagging: the process of assigning a part-of-speech tag to each word of natural-language text, based on both the word's definition and its context.
Uses: parsing of sentences, machine translation, information retrieval, word sense disambiguation, speech synthesis, etc.
Methods:
1. Statistical approaches
2. Rule-based approaches
Background: Previous Approaches
Much work has been done for Hindi using machine learning methods such as TnT and CRF.
Trade-off: performance versus training time
- Lower precision hurts later stages of the pipeline.
- For a new domain or a new corpus, parameter tuning is a non-trivial task.
Background: Previous Approaches & Motivation
- Context is chosen empirically.
- Corpus-based features need effective handling.
Need of the hour:
- Good performance
- Less training time
- Multiple contexts
- Effective exploitation of corpus-based features
This work: two approaches (word-level tagging) and their comparison with TnT and CRF.
Experimental Setup: Corpus Statistics
Tag set of 25 tags.
Corpus      | Size (in words) | Unseen words (in %)
Training    | 187,095         | -
Development | 23,…            | …
Testing     | 23,…            | …
Experimental Setup: Tools and Resources
Tools:
- CRF++
- TnT
- Morfessor Categories-MAP
Resources:
- Universal Word–Hindi dictionary
- Hindi WordNet
- Morph analyzer
Preprocessing XC tag is removed (Gadde et. Al., 2008). Lexicon - For each unique word w of the training corpus => ENTRY(t1,……,t24) - where tj = c(posj, w) / c(w)
Representation: Encoding & Decoding
Each word w is encoded as an n-element vector INPUT(t1, t2, …, tn), where n is the size of the tag set.
- If w occurs in the training corpus, INPUT(t1, t2, …, tn) comes from the lexicon.
- If w is not in the training corpus, let N(w) be the number of possible POS tags for w; then t_j = 1/N(w) if pos_j is a candidate, and 0 otherwise.
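This encoding rule can be sketched as follows; candidate_tags() is a hypothetical stand-in for whatever supplies the possible tags of an unseen word (e.g. the morph analyzer listed under Resources):

```python
def encode_word(word, lexicon, tagset, candidate_tags):
    """Return the n-element INPUT vector for word (n = |tagset|)."""
    if word in lexicon:                      # seen in training: lexicon entry
        return lexicon[word]
    candidates = set(candidate_tags(word))   # possible POS tags for w
    n_w = len(candidates)                    # N(w)
    return [1.0 / n_w if t in candidates else 0.0 for t in tagset]
```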
Representation: Encoding & Decoding
For each word w, the desired output is encoded as D = (d1, d2, …, dn), where d_j = 1 if pos_j is the desired output, and 0 otherwise.
In testing, for each word w an n-element vector OUTPUT(o1, …, on) is returned; the result is pos_j such that o_j = max(OUTPUT).
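The target encoding and the argmax decoding are direct to implement; a minimal sketch:

```python
def encode_target(gold_tag, tagset):
    """One-hot desired output D = (d1, ..., dn)."""
    return [1.0 if t == gold_tag else 0.0 for t in tagset]

def decode_output(output, tagset):
    """Result = pos_j such that o_j = max(OUTPUT)."""
    j = max(range(len(output)), key=output.__getitem__)
    return tagset[j]
```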
Single-neuro tagger: Structure
Single-neuro tagger: Training & Tagging
- Error back-propagation learning algorithm
- Weights initialized with random values
- Sequential (per-pattern) mode
- Momentum term; eta = 0.4 and alpha = 0.1
- In tagging, the network can return a single best tag or a sorted list of all tags.
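A sketch of this training procedure under stated assumptions: one sigmoid hidden layer, weights drawn uniformly from [-0.5, 0.5] (the slide says only "random values"), and per-pattern updates with eta = 0.4 and momentum alpha = 0.1. The layer sizes are illustrative, e.g. n_in = 3 × 24 = 72 for a three-word context over 24 tags:

```python
import numpy as np

class SingleNeuroTagger:
    """Minimal sketch of a one-hidden-layer tagger trained with
    sequential-mode back-propagation plus a momentum term."""

    def __init__(self, n_in, n_hidden, n_out, eta=0.4, alpha=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.uniform(-0.5, 0.5, (n_in, n_hidden))   # random init
        self.W2 = rng.uniform(-0.5, 0.5, (n_hidden, n_out))
        self.dW1 = np.zeros_like(self.W1)                    # momentum buffers
        self.dW2 = np.zeros_like(self.W2)
        self.eta, self.alpha = eta, alpha

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def forward(self, x):
        self.h = self._sigmoid(x @ self.W1)       # hidden activations
        self.o = self._sigmoid(self.h @ self.W2)  # output activations
        return self.o

    def train_step(self, x, d):
        """One sequential-mode update for input x and desired output d."""
        o = self.forward(x)
        delta_o = (d - o) * o * (1 - o)                        # output error
        delta_h = (delta_o @ self.W2.T) * self.h * (1 - self.h)
        # new step = eta * gradient + alpha * previous step (momentum)
        self.dW2 = self.eta * np.outer(self.h, delta_o) + self.alpha * self.dW2
        self.dW1 = self.eta * np.outer(x, delta_h) + self.alpha * self.dW1
        self.W2 += self.dW2
        self.W1 += self.dW1
```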
Experiments: Development Data
Features                      | Precision
Corpus-based and contextual   | 93.19%
Root of the word              | 93.38%
Length of the word            | 94.04%
Handling of unseen words      | 95.62%
  (Root -> Dictionary -> WordNet -> Morfessor;
   t_j = (c(pos_j, s) + c(pos_j, p)) / (c(s) + c(p)))
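The last stage of this fallback chain can be sketched as below. segment() and the two count tables are hypothetical stand-ins for the Morfessor Categories-MAP output and segment statistics, and reading s and p as the two segments of the word is an assumption, since the slide does not define them:

```python
def morfessor_entry(word, tagset, segment, seg_tag_counts, seg_counts):
    """Final fallback for unseen words:
        t_j = (c(pos_j, s) + c(pos_j, p)) / (c(s) + c(p)).
    seg_counts and the inner tables of seg_tag_counts should behave like
    collections.Counter (return 0 for unseen keys)."""
    s, p = segment(word)                      # assumed: the two word segments
    denom = seg_counts[s] + seg_counts[p]     # c(s) + c(p)
    if denom == 0:
        return [1.0 / len(tagset)] * len(tagset)   # nothing known: uniform
    return [(seg_tag_counts[s][t] + seg_tag_counts[p][t]) / denom
            for t in tagset]
```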
Development of the system
Multi-neuro tagger: Structure
Multi-neuro tagger: Training
Multi-neuro tagger: Learning curves
Multi-neuro tagger: Results
Structure | Context | Development | Test
…         | …       | …           | 91.87%
…         | …_prev  | 95.64%      | 92.05%
…         | …_next  | 95.66%      | 91.95%
…         | …       | …           | 92.15%
…         | …_prev  | 95.56%      | 92.14%
…         | …_next  | 95.54%      | 92.14%
…         | …       | …           | 92.07%
Multi-neuro tagger: Comparison
Precision after voting: 92.19%
Tagger             | Development | Test   | Training time
TnT                | 95.18%      | 91.58% | 1-2 seconds
Multi-neuro tagger | 95.78%      | 92.19% | … (minutes)
CRF                | 96.05%      | 92.92% | 2-2.5 hours
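The slide reports only the precision after voting; summed-score voting over the taggers' OUTPUT vectors, as sketched below, is one plausible scheme (the Future Work slide suggests the exact choice was left open):

```python
import numpy as np

def vote(outputs, tagset):
    """Combine OUTPUT vectors from taggers trained with different contexts
    by summing their per-tag scores and taking the argmax."""
    total = np.sum(np.asarray(outputs), axis=0)  # outputs: k vectors, length n
    return tagset[int(np.argmax(total))]
```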
Conclusion
- Single versus multi-neuro tagger
- Multi-neuro tagger versus TnT and CRF
- Corpus- and dictionary-based features
- More parameters need to be tuned
- A tag 5-gram model would have 24^5 = 7,962,624 n-grams, while the network has only 250,560 weights
- Well suited for Indian languages
Future Work
- Better voting schemes (confidence-point based)
- Finding the right context (probability based)
- Various structures and algorithms:
  - Sequential neural networks
  - Convolutional neural networks
  - Combination with SVMs
Thank You!! Queries???