Word senses: a computational response. Adam Kilgarriff, Auckland 2012.

Presentation transcript:

Slide 1. Word senses: a computational response (Adam Kilgarriff, Auckland 2012)

Slide 2. My PhD (in 5 slides)
• What is a word sense?

Slide 3. The lexicographers
• They create them
• Methods
  - Introspection
  - Other dictionaries
  - Corpus
• Atkins, Hanks, Krishnamurthy

Slide 4. What is a word sense (1)
• SFIP: sufficiently frequent, insufficiently predictable
• (a glass of) whisky
• x (a glass of) tequila

Slide 5. What is a word sense (2)
homonymy, analogy, polysemy, rules, collocation

Slide 6. What is a word sense (3)
• A cluster
  - Of instances of use
• Operationalised as: corpus lines
  - Clustered by lexicographers

Slides 7-10. What is a word sense (3) (title only; no slide text in the transcript)

Slide 11. What is a word sense (3)
• A cluster
  - Of instances of use
• Operationalised as: corpus lines
  - Clustered by lexicographers
• Makes sense of
  - Overlapping senses
  - Different dictionaries, different senses
  - Lumping and splitting

Slide 12. I don’t believe in word senses
• Believe in: resurrection, ghost, witch, vampire, god, miracle, fairy
• Philosophy: ontological commitment (same meaning, different register)
• “good entities to build belief systems on”

Slide 13. A word sense is a cluster of corpus lines
• But I’m an NLP person
• Automatic clustering?
• Inspiration: Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999
  - You can get semantic sense from corpora + stats
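To make "a cluster of corpus lines" concrete, here is a minimal sketch that groups corpus lines by the overlap of their context words. Everything in it is an assumption for illustration: the Jaccard measure, the greedy single-pass strategy, the threshold, the ignore list and the toy lines are not taken from the slides, and a method this simple would not be expected to work on a real corpus.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_lines(lines, ignore=frozenset(), threshold=0.15):
    """Greedy single-pass clustering of corpus lines.

    Each line joins the first existing cluster that contains a line
    sharing enough context words with it; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of (line index, content words)
    for index, line in enumerate(lines):
        words = set(line.lower().split()) - ignore
        for cluster in clusters:
            if any(jaccard(words, other) >= threshold for _, other in cluster):
                cluster.append((index, words))
                break
        else:
            clusters.append([(index, words)])
    return [[index for index, _ in cluster] for cluster in clusters]


# Hypothetical corpus lines for the headword "bank".
TOY_LINES = [
    "deposit money in the bank account",
    "the bank raised the rate on the account",
    "sat on the grassy bank of the river",
    "the river burst its bank",
]
# Function words and the headword itself carry no sense information here.
IGNORE = frozenset({"the", "in", "on", "of", "its", "sat", "bank"})

print(cluster_lines(TOY_LINES, IGNORE))
```

On the toy data the two "money" lines and the two "river" lines end up in separate clusters, which is the kind of grouping a lexicographer would otherwise make by hand.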

Slide 14. First attempt
• Longman 1994
• Abject failure
  - No grammar
  - Corpus too small and noisy
  - Naïve clustering
  - Useless programmer

Slide 15. Collocations
• Easy
  - Most words don’t go with most other words
• Then build on what we can do well
• metaphor, analogy, homonymy, rules all much harder
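The point that most words do not go with most other words is what makes collocations statistically detectable: a few pairs co-occur far more often than the rest. The sketch below counts co-occurrences in a small window and ranks them with the logDice score. The slides do not name an association measure, so logDice is an assumption here, as are the window size, the minimum count and the toy corpus.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count word frequencies and unordered co-occurrences within `window` tokens."""
    word_freq = Counter()
    pair_freq = Counter()
    for tokens in sentences:
        word_freq.update(tokens)
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    pair_freq[frozenset((left, right))] += 1
    return word_freq, pair_freq


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def collocates(word, word_freq, pair_freq, min_count=2):
    """Return (collocate, score) pairs for `word`, strongest first."""
    scored = []
    for pair, f_xy in pair_freq.items():
        if word in pair and f_xy >= min_count:
            (other,) = pair - {word}
            scored.append((other, log_dice(f_xy, word_freq[word], word_freq[other])))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical four-sentence corpus.
TOY_CORPUS = [
    "she drank strong tea".split(),
    "he likes strong tea".split(),
    "a strong wind blew".split(),
    "they drank green tea".split(),
]

word_freq, pair_freq = cooccurrence_counts(TOY_CORPUS)
for collocate, score in collocates("tea", word_freq, pair_freq):
    print(f"{collocate}\t{score:.2f}")
```

The `min_count` filter discards pairs seen only once, which on real data removes most of the noise from chance co-occurrence.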

Slide 16. Clustering
• Word sketch
  - Collocates organised by grammar
• Dictionary
  - Collocates (and other things) organised by meaning
• How to re-organise
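"Collocates organised by grammar" can be pictured as a table with one column per grammatical relation. The following minimal sketch builds such a table from (headword, relation, collocate) triples. The triples, the relation names and the function name are hypothetical; this is not the Sketch Engine's data format or API.

```python
from collections import Counter, defaultdict

# Hypothetical parser output: (headword, grammatical relation, collocate).
TRIPLES = [
    ("bank", "object_of", "rob"),
    ("bank", "object_of", "rob"),
    ("bank", "object_of", "burst"),
    ("bank", "modifier", "river"),
    ("bank", "modifier", "central"),
    ("bank", "modifier", "central"),
    ("tea", "modifier", "strong"),
]


def word_sketch(headword, triples):
    """Group a headword's collocates by grammatical relation, most frequent first."""
    by_relation = defaultdict(Counter)
    for head, relation, collocate in triples:
        if head == headword:
            by_relation[relation][collocate] += 1
    return {relation: counts.most_common() for relation, counts in by_relation.items()}


for relation, collocate_counts in word_sketch("bank", TRIPLES).items():
    print(relation, collocate_counts)
```

The re-organisation problem raised on the slide is visible even here: "rob" and "central" belong to one meaning of "bank" and "burst" and "river" to another, but the grammatical grouping puts them in the same columns.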

Slide 17. Observation:
• corpus: arbitrary sample
• dictionary (= lexicon): systematic account
Children
• encounter arbitrary samples
• develop systematic account

Slide 18. Corpus
• provisional, dispensable
• used to develop lexicon

Slide 19. Levels of abstraction
• Direct linkage:
• Fragile
  - Updates (to C or D) break links
• Dictionary: abstract
• Corpus: raw
• Intermediate level needed
[Diagram: Corpus linked directly to Dictionary]

Slide 20.
• How most automatic word sense disambiguation (WSD) works
  - Analyse dictionary to give set of collocates
  - Match to collocates in a corpus
• Dispensable corpus
[Diagram: Corpus linked to Dictionary via Collocates]
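The two steps on this slide (derive a set of collocates for each sense from the dictionary, then match them against the words around a corpus occurrence) can be sketched as a simple overlap count, in the spirit of Lesk-style methods. The two-sense dictionary, the stopword list and the function names are invented for illustration, and a real system would use far richer clues than gloss words.

```python
STOPWORDS = frozenset({
    "a", "an", "the", "of", "or", "and", "to", "in", "that", "for",
    "on", "at", "by", "she", "he", "they", "was",
})

# Hypothetical two-sense entry for "bank".
TOY_DICTIONARY = {
    "bank/1": "an institution that keeps money and lends money to customers",
    "bank/2": "the sloping land along the edge of a river or lake",
}


def clue_words(text):
    """Content words of a gloss or a corpus line."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def disambiguate(context, dictionary):
    """Pick the sense whose gloss shares most words with the context.

    Returns (best sense or None if nothing matched, score per sense).
    """
    context_clues = clue_words(context)
    scores = {
        sense: len(clue_words(gloss) & context_clues)
        for sense, gloss in dictionary.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0 else None), scores


print(disambiguate("she paid the money into the bank", TOY_DICTIONARY))
print(disambiguate("they fished from the river bank", TOY_DICTIONARY))
print(disambiguate("the bank was closed", TOY_DICTIONARY))
```

The third call returns no sense at all, which illustrates why gloss words alone are a thin source of clues and why the corpus is needed to find more of them.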

Slide 21. Not just collocates
• triples
• parse the corpus
• some “unary relations”
  - I hear him singing
• domain-based clues
Collocates, Constructions, Domains = CoCoDo
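One way to picture a CoCoDo profile is as a tally of three kinds of clue for a word: collocate triples, constructions (the "unary relations") and domains. The sketch below assumes each corpus line has already been parsed and labelled by hand. The field names, the construction label "NP + -ing" (one reading of the "I hear him singing" example) and the domain labels are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CorpusLine:
    text: str
    triples: list = field(default_factory=list)        # (head, relation, collocate)
    constructions: list = field(default_factory=list)  # (word, construction label)
    domain: str = ""                                   # domain label of the source text


def cocodo_profile(word, lines):
    """Tally the collocate, construction and domain clues observed for `word`."""
    profile = Counter()
    for line in lines:
        for head, relation, collocate in line.triples:
            if head == word:
                profile[("collocate", relation, collocate)] += 1
        for target, label in line.constructions:
            if target == word:
                profile[("construction", label)] += 1
        if line.domain and word in line.text.lower().split():
            profile[("domain", line.domain)] += 1
    return profile


# Hand-annotated toy lines; a real pipeline would get these from a parser.
LINES = [
    CorpusLine(
        "I hear him singing",
        triples=[("hear", "object", "him")],
        constructions=[("hear", "NP + -ing")],
        domain="music",
    ),
    CorpusLine(
        "We hear the case tomorrow",
        triples=[("hear", "object", "case")],
        domain="law",
    ),
]

for clue, count in cocodo_profile("hear", LINES).items():
    print(clue, count)
```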

Slide 22. Linking CoCoDo’s to senses
• Automatically extract CoCoDos from corpus
• How linked to senses?
  - Automatic (WSD techniques)
  - Manual: “dictionary-free”, ideal for new dictionaries; labour costs
  - Mixed: WSD with manual confirmation/correction
[Diagram: Corpus linked to Dictionary via CoCoDos]

Slide 23. Semi-automatic dictionary drafting (SADD)
• CoCoDo database
• Automatic clustering
• Lexicographer input
• More clustering
• Dictionary with corpus inside

Slide 24.
• Automatic clustering of collocates
  - Propose senses
• Iterate:
  - Lexicographer input: confirm/reject/edit sense inventory; assigns collocates / corpus lines to senses
  - WSD: uses seeds to build full WSD for word; find more collocates for each sense
• XML dictionary entry
  - Load into dictionary-editing tool
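The "uses seeds to build full WSD" and "find more collocates for each sense" steps can be sketched as a small bootstrapping loop, followed by an XML dump of the result. This is an illustration under stated assumptions, not the SADD implementation: the seed assignments stand in for lexicographer input, the rule for adopting a new collocate (seen only with one sense) is a simplification, and the XML element and attribute names are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def bootstrap(lines, seeds, rounds=3):
    """Grow sense-labelled collocates from lexicographer-confirmed seeds.

    lines: list of sets of collocates, one set per corpus line.
    seeds: dict mapping a collocate to a sense label.
    Returns (line index -> sense, collocate -> sense).
    """
    clue_sense = dict(seeds)
    labels = {}
    for _ in range(rounds):
        # Label every line whose known collocates all point to a single sense.
        for index, clues in enumerate(lines):
            senses = {clue_sense[c] for c in clues if c in clue_sense}
            if len(senses) == 1:
                labels[index] = senses.pop()
        # Adopt new collocates that occur only in lines of a single sense.
        seen_with = defaultdict(set)
        for index, sense in labels.items():
            for clue in lines[index]:
                seen_with[clue].add(sense)
        new = {
            clue: next(iter(senses))
            for clue, senses in seen_with.items()
            if len(senses) == 1 and clue not in clue_sense
        }
        if not new:
            break
        clue_sense.update(new)
    return labels, clue_sense


def entry_xml(headword, clue_sense):
    """Serialise the sense inventory as a (hypothetical) XML dictionary entry."""
    entry = ET.Element("entry", headword=headword)
    by_sense = defaultdict(list)
    for clue, sense in clue_sense.items():
        by_sense[sense].append(clue)
    for sense in sorted(by_sense):
        sense_element = ET.SubElement(entry, "sense", label=sense)
        for clue in sorted(by_sense[sense]):
            ET.SubElement(sense_element, "collocate").text = clue
    return ET.tostring(entry, encoding="unicode")


# Toy data: collocates found in five corpus lines for "bank".
LINES = [
    {"money", "deposit"},
    {"river", "muddy"},
    {"deposit", "account"},
    {"muddy", "steep"},
    {"account", "steep"},  # conflicting clues: stays unlabelled
]
SEEDS = {"money": "finance", "river": "land"}

labels, clue_sense = bootstrap(LINES, SEEDS)
print(labels)
print(entry_xml("bank", clue_sense))
```

The last toy line ends up with clues for both senses and is left unlabelled. In the mixed workflow described on slide 22, such lines are the natural candidates for manual confirmation or correction.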

Slide 25. Fits with Atkins method for bilingual lexicography
• Analyse source language
  - From corpus
  - List all expressions that might possibly have a non-predictable translation
  - Very fine grained
  - Lots of collocations
  - target-language-neutral; re-usable
• Translate
• Edit to finalise dictionary