Distributional Part-of-Speech Tagging. Hinrich Schütze, CSLI, Ventura Hall, Stanford, CA 94305-4115, USA. NLP Applications.


Distributional Part-of-Speech Tagging. Hinrich Schütze, CSLI, Ventura Hall, Stanford, CA 94305-4115, USA. NLP Applications. Presented by Masood Ghayoomi, Oct 15, 2007.

Outline of the Talk
- Introduction
- Brief review of the literature
- Presenting a hypothesis
- Introducing induction experiments
- Results
- Conclusions
- Discussions

Abstract of the Talk
This paper presents an algorithm for tagging words whose part-of-speech properties are unknown. The algorithm categorizes word tokens in context.

Introduction
Why is it needed? The increasing volume of online text makes automatic techniques for analyzing text necessary.

Related Works
Stochastic tagging:
- Bigram or trigram models: require a relatively large tagged training text (Church, 1989; Charniak et al., 1993)
- Hidden Markov Models: require no pretagged text (Jelinek, 1985; Cutting et al., 1991; Kupiec, 1992)
Rule-based tagging:
- Transformation-based tagging as introduced by Brill (1993): requires a hand-tagged text for training

Other Related Works
- Using a connectionist net to predict words, in a way that reflects grammatical categories (Elman, 1990)
- Inferring grammatical category from bigram statistics (Brill et al., 1990)
- Using vector models in which words are clustered according to the similarity of their close neighbors in a corpus (Finch and Chater, 1992; Finch, 1993)
- Presenting a probabilistic model for entropy maximization that relies on the immediate neighbors of words in a corpus (Kneser and Ney, 1993)
- Applying factor analysis to collocations of two target words with their immediate neighbors (Biber, 1993)

Hypothesis for the New Tagging Algorithm
The syntactic behavior of a word is represented with respect to its left and right context:
left neighbor <- WORD -> right neighbor
The left context vector records the word's left neighbors; the right context vector records its right neighbors.
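The idea above can be sketched in code: for each word type, count how often each of the most frequent corpus words appears as its immediate left or right neighbor. This is a simplified sketch, not the paper's exact setup; the feature count and the toy corpus are illustrative.

```python
from collections import Counter

def context_vectors(tokens, n_features=250):
    """Build left and right context vectors for each word type.

    Each vector counts how often one of the n_features most frequent
    words in the corpus occurs as the immediate left / right neighbor
    of the target word (a sketch of the idea, not the paper's setup).
    """
    freq = Counter(tokens)
    features = [w for w, _ in freq.most_common(n_features)]
    idx = {w: i for i, w in enumerate(features)}
    left = {w: [0] * len(features) for w in freq}
    right = {w: [0] * len(features) for w in freq}
    for i, w in enumerate(tokens):
        if i > 0 and tokens[i - 1] in idx:
            left[w][idx[tokens[i - 1]]] += 1
        if i + 1 < len(tokens) and tokens[i + 1] in idx:
            right[w][idx[tokens[i + 1]]] += 1
    return left, right

toks = "the dog saw the cat and the cat saw the dog".split()
lvec, rvec = context_vectors(toks, n_features=5)
```

Words with similar syntactic behavior end up with similar vectors, which is what the clustering in the following experiments exploits.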

Four POS Tag Induction Experiments
- Based on word type only
- Based on word type and context
- Based on word type and context, restricted to "natural" contexts
- Based on word type and context, using generalized left and right context vectors

Word Type Only
- A baseline to evaluate the performance of distributional POS taggers.
- Words from the BNC corpus are clustered into 200 classes based on the similarity of their left and right context vectors; all occurrences of a word are assigned to one class.
- Drawback: problematic for ambiguous words, e.g. "work", "book".
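The clustering step can be sketched as a plain k-means over the context vectors. The paper uses Buckshot clustering with cosine similarity; this minimal k-means is a simplified stand-in, and the toy vectors are illustrative.

```python
import random

def kmeans(vectors, k, iters=10, seed=0):
    """Minimal k-means: group context vectors into k classes.
    A simplified stand-in for the paper's Buckshot clustering."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # assign each vector to the nearest center (squared Euclidean)
        for i, v in enumerate(vectors):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
        # recompute each center as the mean of its members
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

vecs = [[0, 1], [0, 2], [5, 0], [6, 0]]
labels = kmeans(vecs, k=2)
```

Every word type gets exactly one class label here, which is precisely why ambiguous words like "work" are a problem for this variant.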

Word Type and Context
A word's syntactic role depends on:
- the syntactic properties of its neighbors,
- its own potential relationships with those neighbors.
Four vectors are considered for distributional tagging of a word w:
- the right context vector of the preceding word,
- the left context vector of w,
- the right context vector of w,
- the left context vector of the following word.
Drawback: fails for words whose neighbors are punctuation marks, since there are no grammatical dependencies between words and punctuation marks, in contrast to the strong dependencies between neighboring words.
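The four vectors listed above can be sketched as a single token-level feature vector by concatenation; the function name and the zero-vector treatment of sentence boundaries are illustrative assumptions, not the paper's exact construction.

```python
def token_features(tokens, i, left, right):
    """Concatenate the four context vectors describing token i:
    the right vector of the previous word, the left and right
    vectors of the word itself, and the left vector of the next
    word. `left`/`right` map word types to context vectors; a
    zero vector stands in at sequence boundaries (an assumption)."""
    dim = len(next(iter(left.values())))
    zero = [0] * dim
    prev_r = right.get(tokens[i - 1], zero) if i > 0 else zero
    next_l = left.get(tokens[i + 1], zero) if i + 1 < len(tokens) else zero
    w = tokens[i]
    return prev_r + left.get(w, zero) + right.get(w, zero) + next_l

left = {"the": [1, 0], "cat": [0, 2]}
right = {"the": [0, 3], "cat": [1, 1]}
feat = token_features(["the", "cat"], 1, left, right)
# feat == [0, 3, 0, 2, 1, 1, 0, 0]
```

Because the features describe the token in its context, two occurrences of the same ambiguous word can now land in different clusters.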

Word Type and Context, Restricted to "Natural" Contexts
- To address this drawback, only words with informative contexts were considered.
- Words next to punctuation marks and words with rare words as neighbors (fewer than ten occurrences) were excluded.

Word Type and Context, Using Generalized Left and Right Context Vectors
- Generalization: the generalized right context vector records which classes of left context vectors occur to the right of a word, and vice versa.
- In this method the information about left and right context vectors of a word is kept separate in the computation, whereas the previous methods always use both together.
- The method is applied in two steps:
  - a generalized right context vector for a word is formed by considering the 200 classes of left context vectors,
  - a generalized left context vector is formed by using word-based right context vectors.
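A sketch of the generalization step, assuming each word type already carries a class label for its left context vector (e.g. from clustering): the generalized right context vector of a word counts, over its right neighbors, the classes of those neighbors' left context vectors. The toy sentence and class labels are illustrative.

```python
def generalized_right_vectors(tokens, left_class, n_classes):
    """For each word type, count the left-context classes of its
    right neighbors. `left_class` maps a word type to the cluster
    label of its word-based left context vector (an assumed input)."""
    gen = {w: [0] * n_classes for w in set(tokens)}
    for i in range(len(tokens) - 1):
        gen[tokens[i]][left_class[tokens[i + 1]]] += 1
    return gen

toks = ["he", "seemed", "ok", "he", "would", "go"]
# assumed clustering: "seemed" and "would" share a left-context class
left_class = {"he": 0, "seemed": 1, "would": 1, "ok": 2, "go": 2}
g = generalized_right_vectors(toks, left_class, 3)
```

Counting classes rather than individual words is what lets "seemed" and "would" contribute to the same dimension of the vector for "he", as the example on the next slide illustrates.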

Two Examples
- "seemed" and "would" have similar left contexts, and they characterize the right contexts of "he" and "the firefighter". The left contexts are verbs which potentially belong to one syntactic category.
- Transitive verbs and prepositions belong to different syntactic categories, but their right contexts are similar in that both require a noun phrase.

Results
- The Penn Treebank parses of the BNC were used.
- The results of the four experiments are evaluated by forming 16 classes of tags from the Penn Treebank.
Evaluation measures:
- t: a tag
- frequency: the frequency of t in the corpus
- # classes: the number of induced tags i0, i1, ..., il assigned to t
- correct: the number of times an occurrence of t was correctly labeled as belonging to one of i0, i1, ..., il
- incorrect: the number of times a token of a different tag t' was miscategorized as an instance of i0, i1, ..., il
- precision: the number of correct tokens divided by the sum of correct and incorrect tokens
- recall: the number of correct tokens divided by the total number of tokens of t
- F: an aggregate score computed from precision and recall
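The per-tag measures can be computed as sketched below. The slide only calls F an "aggregate score"; the harmonic mean used here is the usual choice and an assumption on my part.

```python
def tag_scores(gold, induced, tag, classes):
    """Precision, recall and F for one gold tag, given the set of
    induced classes assigned to it. `gold` and `induced` are
    parallel per-token label lists."""
    correct = sum(1 for g, c in zip(gold, induced) if g == tag and c in classes)
    incorrect = sum(1 for g, c in zip(gold, induced) if g != tag and c in classes)
    total = sum(1 for g in gold if g == tag)
    p = correct / (correct + incorrect) if correct + incorrect else 0.0
    r = correct / total if total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0  # balanced harmonic mean
    return p, r, f

gold = ["N", "V", "N", "N", "V"]   # illustrative gold tags
induced = [0, 1, 0, 1, 1]          # illustrative induced classes
p, r, f = tag_scores(gold, induced, "N", {0})
```

Note that a gold tag may map to several induced classes (the "# classes" column), which is why `classes` is a set rather than a single label.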

Result: Word Type Only
Table 1: Precision and recall for induction based on word type.

Result: Word Type and Context
Table 2: Precision and recall for induction based on word type and context.

Result: Word Type and Context; Generalized Left and Right Context Vectors
Table 3: Precision and recall for induction based on generalized context vectors.

Result: Word Type and Context; Restricted to "Natural" Contexts
Table 4: Precision and recall for induction for natural contexts.

Conclusions
- Taking context into account improves the performance of distributional tagging, as the F score increases: 0.49 < 0.72 < 0.74 < 0.79.
- Performance with generalized context vectors is better than with word-based context vectors (0.74 vs. 0.72).

Discussions
- Performance on "natural" contexts is better than on the other contexts (0.79), even though the low quality of the distributional information about punctuation marks and rare words is a difficulty for this tag induction.
- The method performs fairly well for typical and frequent contexts: prepositions, determiners, pronouns, conjunctions, the infinitive marker, modals, and the possessive marker.
- Tag induction fails for punctuation, rare words, and "-ing" forms, since present participles and gerunds are difficult to separate: both exhibit verbal and nominal properties.

Thanks for listening!