A new Machine Learning algorithm for Neoposy: coining new Parts of Speech
Eric Atwell, Computer Vision and Language group, School of Computing, University of Leeds

Overview
- Neoposy: What? Why?
- Conflicting criteria defining PoS
- Unsupervised Machine Learning
- Clustering of word-types
- Unification of word-tokens
- Problems with token-unification
- Conclusions: hybrid clustering

Neoposy
- CED: neology/neologism: “a newly coined word…; or the practice of using or introducing neologies”
- Cf: pos-tagger, pos-tagged corpus, bi-pos model, uniposy/polyposy (Elliott 2002)…
- Neoposy: neology meaning “a newly coined classification of words into Parts of Speech; or the practice of introducing or using neoposies”

Why neoposy?
- It’s interesting (well, it is to me…)
- “Traditional” PoS may not fit some languages
- Solutions may shed light on Language Universals, and on analysis of other language-like datasets
- A challenge for unsupervised Machine Learning, different from other classification/clustering tasks

Definition of “part of speech”
- CED: “a class of words sharing important syntactic or semantic features; a group of words in a language that may occur in similar positions or fulfil similar functions in a sentence”
- e.g. “a class of”, “a group of” in the last sentence

BUT the 3 criteria can conflict:
- Semantic feature: noun = thing
- Syntactic feature: noun can inflect, singular v plural
- Position/function: noun fits “a X of”
- A word TYPE may fit more than one category, because individual TOKENS behave differently

A challenge for unsupervised ML
- “A supervised algorithm is one which is given the correct answers for some of the data, using those answers to induce a model which can generalize to new data it hasn’t seen before… An unsupervised algorithm does this purely from the data.” (Jurafsky and Martin 2000)

Clustering word-types
- e.g. Atwell 1983, Atwell and Drakos 1987, Hughes and Atwell 1994, Elliott 2002, Roberts 2002…
- Cluster word-types whose representative tokens in a corpus appeared in similar contexts (e.g. word before and/or after, neighbouring function-words), trying various similarity metrics and clustering algorithms

Features and clustering
- Every instance (word-type) must be characterised by a vector of feature-values (neighbour word-types and co-occurrence frequencies); see the sketch below
- Instances with similar feature-vectors are lumped together: merged
- Feature-vectors are also merged, AND all other feature-vectors where merged words appear in context must be updated
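The slides describe these feature-vectors only abstractly; as a rough illustration, here is a minimal Prolog sketch (the predicate names are mine, hypothetical) that counts, for each word-type, how often each word-type occurs immediately before it in a token list:

% Sketch: build “word before” feature-vectors from a list of tokens.
% context_counts(+Words, -Counts) gives a list of (Word-Prev)-Freq pairs.
context_counts(Words, Counts) :-
    bigrams(Words, Pairs),
    msort(Pairs, Sorted),              % sort but keep duplicates
    count_runs(Sorted, Counts).

% bigrams(+Words, -Pairs): each word paired with the word before it.
bigrams([_], []).
bigrams([Prev, W|Rest], [W-Prev|Pairs]) :-
    bigrams([W|Rest], Pairs).

% count_runs(+SortedPairs, -Counts): collapse runs of equal pairs into counts.
count_runs([], []).
count_runs([P|Ps], [P-N|Counts]) :-
    run_length(P, Ps, 1, N, Rest),
    count_runs(Rest, Counts).

run_length(P, [P|Ps], N0, N, Rest) :- !,
    N1 is N0 + 1,
    run_length(P, Ps, N1, N, Rest).
run_length(_, Rest, N, N, Rest).

% ?- context_counts([the,cat,the,dog], C).
% C = [(cat-the)-1, (dog-the)-1, (the-cat)-1].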

Example (Atwell 1983)
Before merging THE and A:
THE: (ant 1), (cat 14), (dog 11), …
OF: (a 90), (cat 2), (dog 7), … (the 130), …
A: (bat 2), (cat 13), (dog 12), …
=> after merging:
THE: (ant 1), (bat 2), (cat 27), (dog 23), …
OF: (cat 2), (dog 7), … (the 220), …
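The merge step above can be sketched directly (a hypothetical predicate of my own, assuming vectors are lists of Word-Count pairs; counts for shared context words are summed):

:- use_module(library(lists)).        % for select/3

% merge_vectors(+V1, +V2, -Merged): sum two frequency vectors,
% adding the counts of any word that appears in both.
merge_vectors([], V2, V2).
merge_vectors([W-C1|Rest], V2, [W-C|Merged]) :-
    (   select(W-C2, V2, V2Rest)      % W also occurs in V2: add counts
    ->  C is C1 + C2
    ;   C = C1, V2Rest = V2
    ),
    merge_vectors(Rest, V2Rest, Merged).

% ?- merge_vectors([ant-1,cat-14,dog-11], [bat-2,cat-13,dog-12], M).
% M = [ant-1, cat-27, dog-23, bat-2].

The second part of the update, combining OF’s (the 130) and (a 90) into (the 220), would amount to renaming the merged word throughout every other vector and re-merging; that is exactly the extra step, outside standard clustering, that the next slide complains about.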

Problems with word-type clusters
- These clustering algorithms assume a word-type can belong to only one class: OK for function words (articles, prepositions, personal pronouns), but not OK for open-class categories
- OK for high-frequency word-types, but many types are sparse (Zipf)
- Features (context-word types) must be updated after each iteration: not part of standard clustering

Clustering tokens, not types
- “a class of”, “a group of” are TOKENS in similar context, but we shouldn’t generalise to say “class” and “group” are always the same PoS
- Clustering relies on similar frequency-vectors, but for a TOKEN, f=1 ???
- Instead of statistical models, use constraint logic programming

Unification of shared contexts

?- neoposy([the,cat,sat,on,the,mat], Tagged).
Tagged = [[the,T1], [cat,T2], [sat,T3], [on,T4], [the,T5], [mat,T2]]
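Only the query and its answer are shown; one minimal way to realise such a predicate, assuming the only context is the word before (the helper predicates are mine, hypothetical), is to tag every token with a fresh variable and unify the tags of tokens that share a preceding word:

% Sketch of neoposy/2: tokens preceded by the same word get the same tag.
neoposy(Words, Tagged) :-
    pair_tags(Words, Tagged),
    tag_contexts(Tagged, Contexts),
    unify_shared(Contexts).

% Pair each word with an unbound tag variable.
pair_tags([], []).
pair_tags([W|Ws], [[W,_Tag]|Ts]) :-
    pair_tags(Ws, Ts).

% Collect PrevWord-Tag pairs: each token’s tag keyed by the word before it.
tag_contexts([_], []).
tag_contexts([[P,_], [W,T]|Rest], [P-T|Cs]) :-
    tag_contexts([[W,T]|Rest], Cs).

% Unify the tags of any two pairs with the same preceding word.
unify_shared([]).
unify_shared([P-T|Rest]) :-
    unify_matching(P, T, Rest),
    unify_shared(Rest).

unify_matching(_, _, []).
unify_matching(P, T, [P-T|Rest]) :- !,    % same preceding word: tags unify
    unify_matching(P, T, Rest).
unify_matching(P, T, [_|Rest]) :-
    unify_matching(P, T, Rest).

On the query above this yields exactly the answer shown: cat and mat, both preceded by the, share T2, while the two the tokens keep distinct tags because their own preceding contexts differ.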

How many word-classes are learnt?
- N <= number of tokens (??)
- As many classes as “contexts”…
- If context = “word before”, N = number of word-TYPES, e.g. 1M-word corpus: c.50K word-classes…
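To make the bound concrete, a small hypothetical helper: with “word before” contexts, the number of distinguishable classes is at most the number of distinct word-types that occur as a preceding word:

% Sketch: upper bound on learnt classes = distinct preceding word-types.
class_bound(Words, N) :-
    append(Prefix, [_Last], Words),   % every word except the last precedes a token
    sort(Prefix, Types),              % sort/2 removes duplicates
    length(Types, N).

% ?- class_bound([the,cat,sat,on,the,mat], N).
% N = 4.                              % distinct contexts: cat, on, sat, the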

Iterating to use classes in contexts

?- neoposy2([the,cat,sat,on,the,mat,and,went,to,sleep], Tagged).
Tagged = [[the,T1], [cat,T2], [sat,T3], [on,T4], [the,T5], [mat,T2], [and,T6], [went,T7], [to,T8], [sleep,T9]] ;
Tagged = [[the,T1], [cat,T2], [sat,T3], [on,T4], [the,T5], [mat,T2], [and,T3], [went,T7], [to,T8], [sleep,T9]] ;

Cascading word-class unification

Tagged = [[the,T1], [cat,T2], [sat,T3], [on,T4], [the,T5], [mat,T2], [and,T3], [went,T7], [to,T8], [sleep,T9]] ;
Tagged = [[the,T1], [cat,T2], [sat,T3], [on,T4], [the,T5], [mat,T2], [and,T3], [went,T4], [to,T8], [sleep,T9]] ;
Tagged = [[the,T1], [cat,T2], [sat,T3], [on,T4], [the,T5], [mat,T2], [and,T3], [went,T4], [to,T5], [sleep,T9]] ;
Tagged = [[the,T1], [cat,T2], [sat,T3], [on,T4], [the,T5], [mat,T2], [and,T3], [went,T4], [to,T5], [sleep,T2]]
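The slides show only the enumerated answers; one hypothetical way to realise this second pass, building on the neoposy/2 sketch above, is to take the TAG of the preceding word as the context and unify tags nondeterministically, so that alternatives cascade on backtracking:

% Sketch of neoposy2/2: contexts are now the tags of preceding words.
neoposy2(Words, Tagged) :-
    neoposy(Words, Tagged),           % first pass: word-before unification
    prev_tag_contexts(Tagged, Ctxs),
    cascade(Ctxs).

% Collect PrevTag-Tag pairs: each token’s tag keyed by the previous tag.
prev_tag_contexts([_], []).
prev_tag_contexts([[_,PT], [W,T]|Rest], [PT-T|Cs]) :-
    prev_tag_contexts([[W,T]|Rest], Cs).

% Where two tokens’ preceding tags are already identical, either leave
% their tags apart or unify them; each choice is a solution, and each
% unification can enable further ones, hence the cascade above.
cascade([]).
cascade([PT-T|Rest]) :-
    maybe_unify(PT, T, Rest),
    cascade(Rest).

maybe_unify(_, _, []).
maybe_unify(PT, T, [PT2-T2|Rest]) :-
    (   PT == PT2                     % preceding words share a tag
    ->  ( true ; T = T2 )             % choice: keep apart, or unify
    ;   true
    ),
    maybe_unify(PT, T, Rest).

The exact order in which alternatives are enumerated depends on the search strategy; the point is that once cat and mat share T2, sat and and (both preceded by a T2 word) can unify to T3, which in turn lets went join on at T4, and so on down to sleep joining cat and mat at T2.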

Alternative constraints?
- Merging by word-class context may be too powerful
- ? constrain a word-type to N classes
- ? very hard constraint-satisfaction problem: 1M words, 50K types = an average of 20 tokens per type (but I don’t know how to do this…)

Conclusion: hybrid clustering?
- Word-token constraint-based clustering for rare words, hapax legomena
- Word-type statistical clustering for high-frequency words (closed-class function words)
- “Learning hints”: “seed” with a limited PoS-lexicon

Future work
- Combining Corpus Linguistics and Machine Learning in linguistic knowledge discovery
- Applications to other languages (e.g. with minimal lexical resources) and other language-like datasets

Summary
- Neoposy: What? Why?
- Conflicting criteria defining PoS
- Unsupervised Machine Learning
- Clustering of word-types
- Unification of word-tokens
- Problems with token-unification
- Conclusions: hybrid clustering
- Linguistic knowledge discovery beyond English…