Word senses: a computational response
Adam Kilgarriff

Madrid 2010 Kilgarriff: Word senses: a computational response

A word sense is a cluster of corpus lines
- But I’m an NLP person
- Automatic clustering?
- Inspiration: Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999
- You can get semantic information from corpora + statistics
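
If a word sense is a cluster of corpus lines, the naive computational question is whether concordance lines can be clustered automatically. The sketch below is a minimal, assumption-laden illustration (not the method used in this work): each line becomes a bag of context words, and lines are merged by single-link clustering whenever their Jaccard overlap reaches a threshold. The stopword list, the threshold and the example lines are all invented for illustration.

```python
from itertools import combinations

# Hypothetical stopword list, just enough for the toy example below.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "on", "and", "at", "is", "was"}


def context_words(line: str, target: str) -> set[str]:
    """Lower-cased words of a concordance line, minus the target word and stopwords."""
    return {w for w in line.lower().split() if w != target and w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of context words two lines have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_lines(lines: list[str], target: str, threshold: float = 0.1) -> list[list[int]]:
    """Single-link clustering: two lines end up together if a chain of
    pairwise overlaps at or above `threshold` connects them."""
    parent = list(range(len(lines)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    contexts = [context_words(line, target) for line in lines]
    for i, j in combinations(range(len(lines)), 2):
        if jaccard(contexts[i], contexts[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters: dict[int, list[int]] = {}
    for i in range(len(lines)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


lines = [
    "she opened an account at the bank",
    "the bank raised interest on my account",
    "we sat on the river bank fishing",
    "the river burst its bank after rain",
]
print(cluster_lines(lines, "bank"))  # [[0, 1], [2, 3]]
```

On four hand-picked lines this separates the money lines from the river lines, but it has no grammar and depends entirely on surface word overlap, so it should not be expected to hold up on a real, noisy corpus.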

First attempt
- Longman 1994
- Abject failure:
  - No grammar
  - Corpus too small and noisy
  - Naïve clustering
  - Useless programmer

Collocations
- Easy
  - Most words don’t go with most other words
- Then build on what we can do well
  - Metaphor, analogy, homonymy, rules: all much harder
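
Collocations are "easy" because a simple association score separates the few words that genuinely go with a node word from the many that merely co-occur by chance. One widely used score is logDice, which Sketch Engine uses to rank collocates; the sketch below implements its standard formula. The frequency counts in the example are invented.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of node and collocate
    f_x, f_y: corpus frequencies of node and collocate.
    The maximum, 14, is reached when the two words only ever occur together.
    """
    if f_xy == 0:
        return float("-inf")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_collocates(node_freq: int, pair_freqs: dict[str, int], word_freqs: dict[str, int]) -> list[str]:
    """Order candidate collocates of a node word by logDice, best first."""
    scores = {c: log_dice(f, node_freq, word_freqs[c]) for c, f in pair_freqs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Invented counts for the node word "bank".
pair_freqs = {"river": 80, "the": 600, "account": 120}
word_freqs = {"river": 500, "the": 60000, "account": 900}
print(rank_collocates(1000, pair_freqs, word_freqs))  # ['account', 'river', 'the']
```

In these toy counts "the" co-occurs with "bank" most often, yet ranks last, because it also occurs with almost everything else.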

Clustering
- Word sketch: collocates organised by grammar
- Dictionary: collocates (and other things) organised by meaning
- How to re-organise the first into the second?
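
A word sketch organises a word's collocates by grammatical relation. A minimal sketch of that grouping step follows, assuming the corpus has already been parsed into (head, relation, dependent) triples. The relation names and the "_of" suffix for inverse relations are hypothetical, and collocates are ranked here by raw frequency for brevity; a real word sketch would rank by an association score such as logDice.

```python
from collections import Counter, defaultdict


def word_sketch(triples: list[tuple[str, str, str]], target: str, top_n: int = 3) -> dict[str, list[str]]:
    """Group a target word's collocates by grammatical relation, most frequent first.

    Each triple is (head, relation, dependent). When the target is the dependent,
    the relation is recorded under a hypothetical inverse name, e.g. "obj_of".
    """
    by_relation = defaultdict(Counter)
    for head, rel, dep in triples:
        if head == target:
            by_relation[rel][dep] += 1
        elif dep == target:
            by_relation[f"{rel}_of"][head] += 1
    return {rel: [w for w, _ in counts.most_common(top_n)] for rel, counts in by_relation.items()}


triples = [
    ("rob", "obj", "bank"), ("rob", "obj", "bank"), ("own", "obj", "bank"),
    ("bank", "mod", "central"), ("bank", "mod", "central"), ("bank", "mod", "river"),
    ("open", "obj", "account"),
]
print(word_sketch(triples, "bank"))  # {'obj_of': ['rob', 'own'], 'mod': ['central', 'river']}
```

The output is organised by grammar, not by meaning: "central" (money sense) and "river" (river sense) sit side by side under the same relation, which is exactly the re-organisation problem the slide raises.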

Observation:
- Corpus: arbitrary sample
- Dictionary (= lexicon): systematic account
- Children
  - encounter arbitrary samples
  - develop systematic account

Corpus
- Provisional, dispensable
- Used to develop lexicon

Levels of abstraction
- Direct linkage:
  - Fragile
  - Updates (to corpus or dictionary) break links
- Dictionary: abstract
- Corpus: raw
- Intermediate level needed
[Diagram: Corpus and Dictionary]
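
Why direct linkage is fragile can be shown with a small data-structure sketch. If a dictionary sense pointed at, say, a particular line number in one corpus, any corpus update would break the link. The sketch below assumes instead that each sense links to corpus-independent patterns (relation plus collocate) and re-resolves them against whatever corpus is current. The `Sense` class, the `Pattern` key and the example corpora are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical pattern key standing in for an intermediate-level item: (relation, collocate).
Pattern = tuple[str, str]


@dataclass
class Sense:
    sense_id: str
    gloss: str
    # Links point at corpus-independent patterns, never at positions in one corpus.
    patterns: set[Pattern] = field(default_factory=set)


def lines_for_sense(sense: Sense, corpus: dict[str, set[Pattern]]) -> list[str]:
    """Re-resolve a sense's patterns against whichever corpus is current.

    `corpus` maps each corpus line to the patterns found in it.
    """
    return [line for line, found in corpus.items() if sense.patterns & found]


finance = Sense("bank-1", "financial institution", {("obj_of", "rob"), ("mod", "central")})
corpus_v1 = {
    "They robbed the bank.": {("obj_of", "rob")},
    "We sat on the bank.": {("pp_on_of", "sit")},
}
# A later corpus: none of the old lines survive, but the sense still finds its evidence.
corpus_v2 = {
    "The central bank raised rates.": {("mod", "central")},
    "The bank was muddy.": {("mod", "muddy")},
}
print(lines_for_sense(finance, corpus_v1))  # ['They robbed the bank.']
print(lines_for_sense(finance, corpus_v2))  # ['The central bank raised rates.']
```

Swapping the corpus changes which lines are returned but leaves the dictionary entry itself untouched.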

Madrid 2010 Kilgarriff: Word senses: a computational response9  How most automatic word sense disambiguation (WSD) works Analyse dictionary to give set of collocates Match to collocates in a corpus  Dispensable corpus CorpusDictionary ===   === ===   === Collocates

Not just collocates
- Triples: parse the corpus
- Some “unary relations”: I hear him singing
- Domain-based clues
- Collocates, Constructions, Domains = CoCoDo
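
The three kinds of CoCoDo evidence can be pictured as features extracted from one parsed corpus line. The sketch below is one possible reading, under stated assumptions: tokens carry Universal Dependencies-style relation labels and Penn Treebank tags, the only "construction" detected is verb + object + -ing complement (as in "I hear him singing"), and the domain label is supplied by the caller. `Token`, `cocodos` and the feature tuples are hypothetical names.

```python
from typing import NamedTuple


class Token(NamedTuple):
    lemma: str
    tag: str     # Penn Treebank tag (assumed), e.g. "VBG" for an -ing form
    head: int    # index of the governing token, -1 for the root
    deprel: str  # Universal Dependencies-style relation label (assumed)


def cocodos(tokens: list[Token], target: int, domain: str) -> set[tuple[str, ...]]:
    """Collect Collocates, Constructions and a Domain clue for tokens[target]."""
    dependents = [t for t in tokens if t.head == target]
    # Collocates: (relation, lemma) triples with the target as head.
    feats: set[tuple[str, ...]] = {("colloc", t.deprel, t.lemma) for t in dependents}
    # Constructions: a "unary relation" of the target itself, here
    # verb + object + -ing complement, as in "I hear him singing".
    rels = {t.deprel: t for t in dependents}
    if "obj" in rels and "xcomp" in rels and rels["xcomp"].tag == "VBG":
        feats.add(("constr", "V+obj+ing"))
    # Domain: a clue about the text the line comes from (supplied by the caller).
    feats.add(("domain", domain))
    return feats


sentence = [
    Token("I", "PRP", 1, "nsubj"),
    Token("hear", "VBP", -1, "root"),
    Token("he", "PRP", 1, "obj"),
    Token("sing", "VBG", 1, "xcomp"),
]
print(sorted(cocodos(sentence, 1, "music")))
```

Keeping all three kinds of evidence in one feature set means later steps (linking to senses, clustering) can treat them uniformly.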

Madrid 2010 Kilgarriff: Word senses: a computational response11  Automatically extract CoCoDos from corpus  How linked to senses? Automatic (WSD techniques) ‏  Manual “dictionary-free”: ideal for new dictionaries Labour costs  Mixed WSD with manual confirmation/correction CorpusDictionary ===   === ===   === CoCoDo CoCoDo Linking CoCoDo’s to senses

Semi-automatic dictionary drafting (SADD)
- CoCoDo database
- Automatic clustering
- Lexicographer input
- More clustering
- Dictionary with corpus inside
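
For the automatic clustering step, one simple possibility is to give each collocate a profile (counts of the context words in the lines where it occurs with the target) and group collocates with similar profiles into candidate senses. The sketch below uses leader clustering with cosine similarity; this is a swapped-in technique chosen for brevity, and the threshold and profiles are invented.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_collocates(profiles: dict[str, Counter], threshold: float = 0.3) -> list[list[str]]:
    """Leader clustering: each collocate joins its most similar cluster if the
    similarity reaches `threshold`, otherwise it starts a new cluster (a candidate sense)."""
    clusters: list[tuple[Counter, list[str]]] = []
    for colloc, vec in profiles.items():
        best = max(clusters, key=lambda c: cosine(c[0], vec), default=None)
        if best is not None and cosine(best[0], vec) >= threshold:
            best[0].update(vec)  # add the collocate's counts to the cluster profile
            best[1].append(colloc)
        else:
            clusters.append((Counter(vec), [colloc]))
    return [members for _, members in clusters]


# Invented profiles: context-word counts from the lines where each collocate occurs with "bank".
profiles = {
    "account": Counter({"money": 3, "pay": 2}),
    "river": Counter({"water": 3, "fish": 2}),
    "loan": Counter({"money": 2, "interest": 2}),
    "muddy": Counter({"water": 2, "walk": 1}),
}
print(cluster_collocates(profiles))  # [['account', 'loan'], ['river', 'muddy']]
```

Leader clustering depends on input order, which matters less here than it might: the clusters are only proposals for the lexicographer to confirm, reject or edit before the next round of clustering.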

Madrid 2010 Kilgarriff: Word senses: a computational response13  Automatic clustering of collocates Propose senses  Iterate: Lexicographer input  Confirm/reject/edit sense inventory  Assigns collocates / corpus lines to senses WSD  Uses seeds to build full WSD for word  Find more collocates for each sense  XML dictionary entry Load into dictionary-editing tool

Atkins method for bilingual lexicography
- Analyse source language
  - From corpus
  - List all expressions that might possibly have a non-predictable translation
  - Very fine-grained
  - Lots of collocations
  - Target-language-neutral; re-usable
- Translate
- Edit to finalise dictionary

Current projects/initiatives
- Semi-automatic Dictionary Drafting (SADD)
- Tickbox Lexicography (TBL)
  - Slovene project
  - New English-Irish Dictionary
- Putting Collocations in the Dictionary (PCID)