Do we need lexicographers? Prospects for automatic lexicography
Adam Kilgarriff, Lexical Computing Ltd and University of Leeds, UK

Bolzano, May 2012, Adam Kilgarriff

Outline
• Precision and recall
• Between corpus and dictionary
• Shopping list
• Conclusions

Find me all the fat cats
• A request for information

High recall
• Lots of responses
• Maybe not all good

High precision
• Fewer hits
• Higher confidence

Information-seeking
             Recall   Precision
Computers    good     bad
People       bad      good
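The trade-off in the table rests on the two standard measures: precision is the share of returned items that are relevant, recall the share of relevant items that are returned. A minimal Python sketch; the document IDs and relevance judgements are invented toy data standing in for "fat cat" search results, not figures from the talk.

```python
from typing import Set, Tuple

def precision_recall(returned: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for a set of returned items.

    precision = |returned intersect relevant| / |returned|
    recall    = |returned intersect relevant| / |relevant|
    """
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy data: the documents that really mention "fat cats".
relevant = {"d1", "d2", "d3", "d4"}

# A generous, computer-style search: lots of responses, not all good.
high_recall_hits = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
# A cautious, person-style search: fewer hits, higher confidence.
high_precision_hits = {"d1", "d2"}

print(precision_recall(high_recall_hits, relevant))     # (0.5, 1.0)
print(precision_recall(high_precision_hits, relevant))  # (1.0, 0.5)
```

A search tuned for recall pads its answer with doubtful hits; one tuned for precision leaves some good ones out.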

Cyborg: part-human, part-computer
Treat your computer with respect. You and it can do great things together.

Lexicography: finding facts about words
Shopping list
• collocations
• grammatical patterns
• examples
• synonyms
• labels
  – region
  – domain
  – register
• translations
• meanings

Szeged, Jan 2008, Kilgarriff, Global WordNet

What is a word sense? (1)
• SFIP
  – Sufficiently frequent, insufficiently predictable
• ✓ (a glass of) whisky
• ✗ (a glass of) tequila

What is a word sense? (2)
homonymy, analogy, polysemy, rules, collocation

What is a word sense? (3)
• A cluster
  – of instances of use
• Operationalised as: corpus lines
  – clustered by lexicographers

What is a word sense? (3)
• A cluster
  – of instances of use
• Operationalised as: corpus lines
  – clustered by lexicographers
• Makes sense of
  – overlapping senses
  – different dictionaries, different senses
  – lumping and splitting
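Treating a sense as a cluster of corpus lines turns a dictionary's sense inventory into a partition of those lines, and lumping versus splitting into a relation between partitions: a splitter's analysis refines a lumper's when each fine-grained sense falls inside one coarse sense, while dictionaries that carve the lines up differently give cross-cutting, overlapping senses. A minimal sketch, assuming senses are given as sets of corpus-line IDs; the line IDs and the sense labels for whisky are hypothetical.

```python
from typing import Dict, Set

# A sense inventory as a partition: sense label -> IDs of the corpus lines it covers.
Partition = Dict[str, Set[int]]

def refines(fine: Partition, coarse: Partition) -> bool:
    """True if every sense in `fine` lies wholly inside a single sense in `coarse`."""
    return all(
        any(lines <= coarse_lines for coarse_lines in coarse.values())
        for lines in fine.values()
    )

# Hypothetical analyses of the same six corpus lines for "whisky".
lumper: Partition = {"drink": {1, 2, 3, 4, 5, 6}}
splitter: Partition = {"substance": {1, 2, 3}, "glass of": {4, 5, 6}}
other: Partition = {"neat": {1, 4}, "mixed": {2, 3, 5, 6}}

print(refines(splitter, lumper))  # True: splitting only subdivides the lumper's sense
print(refines(lumper, splitter))  # False
print(refines(other, splitter), refines(splitter, other))  # False False: cross-cutting senses
```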

I don’t believe in word senses
• Believe in:
  – resurrection, ghost, witch, vampire, god, miracle, fairy
• Philosophy:
  – Ontological commitment
  – (same meaning, different register)
• "good entities to build belief systems on"

But I’m an NLP person
• Automatic clustering?
• Inspiration:
  – Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999
  – You can get semantic sense from corpora + stats
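The idea that semantic groupings can be obtained from corpora plus statistics can be illustrated with a toy context-clustering procedure: represent each corpus line by the content words around the target, measure cosine similarity between lines, and link lines whose similarity clears a threshold. This is a deliberately naive sketch (bag-of-words contexts, a hypothetical stop list and threshold, single-link grouping), not the method of Hindle, Schütze, Grefenstette or Lin, and the concordance lines are invented.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical stop list: function words would otherwise link unrelated lines.
STOPWORDS = {"the", "a", "an", "and", "into", "from", "on", "its", "was", "she", "they"}

def context_vector(line: str, target: str) -> Counter:
    """Count the content words on a corpus line, leaving out the target word."""
    return Counter(w for w in line.lower().split() if w != target and w not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_lines(lines: List[str], target: str, threshold: float) -> List[List[int]]:
    """Single-link clustering via union-find: two lines end up together if a chain
    of pairwise similarities >= threshold connects them."""
    vectors = [context_vector(line, target) for line in lines]
    parent = list(range(len(lines)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(lines)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Invented concordance lines for "bank".
lines = [
    "she paid the cheque into the bank account",
    "the bank raised its interest rate on the account",
    "they fished from the grassy river bank",
    "the river bank was muddy and grassy",
]
print(cluster_lines(lines, "bank", threshold=0.2))  # [[0, 1], [2, 3]]
print(cluster_lines(lines, "bank", threshold=0.5))  # [[0], [1], [2, 3]]
```

Raising the threshold splits and lowering it lumps, so even on this toy data the number of "senses" found depends on a parameter choice rather than on the word alone.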

First attempt
• Longman 1994
• Abject failure
  – No grammar
  – Corpus too small and noisy
  – Naïve clustering
  – Useless programmer

Second attempt
• SENSEVALs 1998, 2001, 2004…
• Mitigated failure
  – Rarely over two thirds correct
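SENSEVAL results are conventionally reported as precision (correct answers over answers attempted) and recall (correct answers over all test instances), because a system may decline to tag some instances. A simplified sketch of that scoring with one gold sense per instance and exact matching; SENSEVAL scoring also provided for multiple answers and coarser-grained sense groupings, which this ignores, and the instances and sense labels are hypothetical.

```python
from typing import Dict, Optional, Tuple

def senseval_scores(gold: Dict[str, str],
                    system: Dict[str, Optional[str]]) -> Tuple[float, float]:
    """Simplified SENSEVAL-style scoring with one gold sense per instance.

    precision = correct / attempted
    recall    = correct / all gold instances
    An instance is unattempted if the system omits it or answers None.
    """
    attempted = {k: v for k, v in system.items() if v is not None and k in gold}
    correct = sum(1 for k, v in attempted.items() if gold[k] == v)
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical test instances for "bank" and one system's answers.
gold = {"i1": "money", "i2": "river", "i3": "money",
        "i4": "money", "i5": "river", "i6": "money"}
system = {"i1": "money", "i2": "money", "i3": "money",
          "i4": "money", "i5": None}  # abstains on i5, skips i6

print(senseval_scores(gold, system))  # (0.75, 0.5)
```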

Third attempt
• SADD (semi-automatic dictionary drafting) 2008
• With Pavel Rychly
• I thought I knew what I was doing, but
  – Probably a failure

Collocations
• Easy
  – Most words don’t go with most other words
• Then build on what we can do well
• (metaphor, analogy, homonymy, rules: all much harder)
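Because most words do not co-occur with most other words, a word's genuine partners stand out and a simple association score can rank them. As an illustration, here is a sketch of logDice (Rychlý 2008), an association score designed for lexicographic use; the slides do not name a particular measure, and all counts below are invented.

```python
import math
from typing import Dict, List, Tuple

def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)); the theoretical maximum is 14."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

def rank_collocates(node_freq: int,
                    pairs: Dict[str, Tuple[int, int]]) -> List[Tuple[str, float]]:
    """pairs maps collocate -> (co-occurrence count with the node, collocate frequency).
    Pairs never seen together get no score, mirroring the sparsity of co-occurrence."""
    scored = [(w, log_dice(f_xy, node_freq, f_y))
              for w, (f_xy, f_y) in pairs.items() if f_xy > 0]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hypothetical counts for the node word "whisky" in a made-up corpus.
whisky_freq = 2_000
pairs = {
    "malt": (150, 1_000),       # specific partner
    "glass": (120, 20_000),     # common partner, but "glass" is frequent everywhere
    "the": (900, 1_000_000),    # very frequent, not distinctive
    "algebra": (0, 800),        # never seen together: no score
}
for word, score in rank_collocates(whisky_freq, pairs):
    print(f"{word}\t{score:.2f}")  # malt ranks first, then glass, then the
```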

Lexicography: finding facts about words
Shopping list
• collocations
• grammatical patterns
• examples
• synonyms
• labels
  – region
  – domain
  – register
• translations
• meanings
Yes  ?  No

Thank you