Between Corpus and Dictionary. Adam Kilgarriff, Lexical Computing Ltd; Lexicography MasterClass Ltd; Universities of Leeds and Sussex.

Szeged, Jan 2008. Kilgarriff, Global WordNet.

2 What is a word sense?

3 Preliminaries: What is language? What is meaning?

4 What is language?

5 What is language? In our heads.

6 What is language? In our heads. In texts and sound signals.

7 What is language? In our heads. In texts and sound signals. Both.

8 Methodology: Study language in our heads. Introspection; semantic analysis; experiments with human subjects. “Rationalist” (Leibniz, Chomsky). Problems: coverage, arbitrariness.

9 Methodology: Study text. “Empiricist” (Locke, Hume). Physics: forces, matter. Chemistry: chemicals, bonds. Language: text, speech signals.

10 It goes against the grain. What is important about a sentence? Its meaning. Corpus methodology: throw away individual sentence meaning; find patterns.

11 Empiricist linguistics: A new way to find out about language. 15 years of rapid ascent. Computers. Corpora: bigger and bigger data sets available. Language technology tools: lemmatizers, POS-taggers, parsers, machine learning for pattern finding.

12 Rationalists vs empiricists in the age of the web: semantic web vs Google?

13 What are you? Temperament. Complementary/alternatives. Barbu and Poesio, Keller and Lapata: comparisons, evaluations (AK: current research project).

14 What is meaning? Fregean. Gricean.

15 Gottlob Frege (1848–1925): Founder of modern logic. Truth values. The sentence “grass is green” is true if and only if grass is green (Tarski). Meanings of words and phrases are such that: put them together in a sentence; state basic facts; the sentence computes to ‘true’ if the sentence is true, ‘false’ if it is false.

16 Gottlob Frege (1848–1925): Formal semantics. Sparkling analyses for quantifiers, connectives. Montague semantics. Foundations for maths, databases, ontologies …

17 H. P. Grice (1913–1988): An agent means something by an utterance if and only if they intended the utterance to produce some effect in an audience by means of the recognition of this intention. (Dictionary of Philosophy of Mind)

18 Meaning is something you do. Basis of meaning is: the meaning event; the speaker’s intention; the speaker’s expectation of the hearer’s interpretation (messy, hard).

19 Strawson commentary (1970s): For the sake of a label, we might call it the conflict between the theorists of communication-intention and the theorists of formal semantics. […] A struggle on what seems to be such a central issue in philosophy should have something of a Homeric quality; and a Homeric struggle calls for gods and heroes. I can at least, though tentatively, name some living captains and benevolent shades: on the one side, say, Grice, Austin, and the later Wittgenstein; on the other, Chomsky, Frege, and the earlier Wittgenstein.

20 Battle of the two Adams?

21 Relevance to word senses. Fregean: supports reasoning; builds on well-defined word-meanings. Identifying word meanings: can’t help. Fall back on Grice.

22 Fauconnier and Turner: “linguistic expressions prompt for meanings rather than express meanings” (AK chapter, Agirre and Edmonds WSD book).

23 Preliminaries over. What is a word sense?

24 The lexicographers: They create them. Methods: introspection, other dictionaries, corpus. Atkins, Hanks, Krishnamurthy.

25 What is a word sense (1): SFIP: sufficiently frequent, insufficiently predictable. (a glass of) whisky x (a glass of) tequila

26 What is a word sense (2): homonymy analogy polysemy rules collocation

27 What is a word sense (3): A cluster of instances of use. Operationalised as: corpus lines, clustered by lexicographers.

28 What is a word sense (3)

29 What is a word sense (3)

30 What is a word sense (3)

31 What is a word sense (3)

32 What is a word sense (3): A cluster of instances of use. Operationalised as: corpus lines, clustered by lexicographers. Makes sense of: overlapping senses; different dictionaries, different senses; lumping and splitting.

33 I don’t believe in word senses. Believe in: resurrection, ghost, witch, vampire, god, miracle, fairy. Philosophy: ontological commitment (same meaning, different register). “Good entities to build belief systems on.”

34 But I’m an NLP person. Automatic clustering? Inspiration: Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999. You can get semantic sense from corpora + stats.

35 First attempt: Longman, 1994. Abject failure. No grammar. Corpus too small and noisy. Naïve clustering. Useless programmer.

36 Collocations: Easy. Most words don’t go with most other words. Then build on what we can do well (metaphor, analogy, homonymy, rules: all much harder).
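
The point that most words do not go with most other words can be made concrete with an association score over co-occurrence counts. The sketch below uses pointwise mutual information (PMI) on adjacent word pairs. PMI is one common choice and is an assumption here, as the slide does not name a statistic; the toy corpus and all function names are invented for illustration.

```python
import math
from collections import Counter

# Toy corpus (invented for illustration): each sentence is a list of lemmas.
sentences = [
    ["drink", "strong", "tea"],
    ["drink", "strong", "coffee"],
    ["buy", "powerful", "computer"],
    ["drink", "hot", "tea"],
    ["buy", "strong", "tea"],
    ["use", "powerful", "computer"],
]

def adjacent_pairs(sentence):
    """Return the pairs of neighbouring words in a sentence."""
    return zip(sentence, sentence[1:])

word_counts = Counter(w for s in sentences for w in s)
pair_counts = Counter(p for s in sentences for p in adjacent_pairs(s))
n_words = sum(word_counts.values())
n_pairs = sum(pair_counts.values())

def pmi(w1, w2):
    """Pointwise mutual information (log base 2) of an adjacent word pair.

    Returns -inf for pairs that never occur together.
    """
    joint = pair_counts[(w1, w2)] / n_pairs
    if joint == 0:
        return float("-inf")
    p1 = word_counts[w1] / n_words
    p2 = word_counts[w2] / n_words
    return math.log2(joint / (p1 * p2))

# Observed pairs ranked from most to least strongly associated.
ranked = sorted(pair_counts, key=lambda p: pmi(*p), reverse=True)
```

Pairs that never co-occur, such as ("powerful", "tea") in this toy data, score minus infinity, while attested pairs score above zero. Plain PMI is known to over-reward rare pairs, so practical systems usually prefer a frequency-aware variant.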

37 The Sketch Engine. 2003: programmer problem solved. Corpora: more available; build big clean ones from the web. Grammar: POS-taggers/lemmatisers available; shallow regexp grammars if no full parser. Stats: progress (Lin, Curran, Evert …).
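
As a rough illustration of what a shallow regexp grammar over POS-tagged text can look like, the sketch below extracts verb-object pairs with a single regular expression. It is not the Sketch Engine's own grammar formalism: the word/TAG input format, the Penn-style tags, the pattern, and the names `OBJECT_OF` and `object_pairs` are all assumptions made for this example.

```python
import re

# A lemmatised, POS-tagged sentence in word/TAG form (invented example,
# Penn-style tags assumed).
tagged = "the/DT dog/NN chase/VB a/DT black/JJ cat/NN and/CC eat/VB fish/NN"

# Shallow pattern: a verb, an optional determiner, any adjectives, then a noun.
OBJECT_OF = re.compile(
    r"(?P<verb>\S+)/VB\S*\s+"   # verb (any VB* tag)
    r"(?:\S+/DT\s+)?"           # optional determiner
    r"(?:\S+/JJ\S*\s+)*"        # zero or more adjectives
    r"(?P<noun>\S+)/NN\S*"      # head noun (any NN* tag)
)

def object_pairs(tagged_sentence):
    """Return (verb, object) lemma pairs matched by the shallow pattern."""
    return [
        (m.group("verb"), m.group("noun"))
        for m in OBJECT_OF.finditer(tagged_sentence)
    ]

pairs = object_pairs(tagged)
```

A pattern like this misses long-distance dependencies and passives that a full parser would catch. Applied to a large corpus it still yields plenty of grammatical-relation tuples for the statistics to work on.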

38 demo

39 Clustering. Word sketch: collocates organised by grammar. Dictionary: collocates (and other things) organised by meaning. How to re-organise: three phases.

40 Semi-automatic dictionary drafting (SADD). Automatic clustering of collocates: propose senses. Iterate: lexicographer input (confirm/reject/edit sense inventory; assigns collocates / corpus lines to senses); WSD (uses seeds to build full WSD for word; find more collocates for each sense). XML dictionary entry: load into dictionary-editing tool.
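
The iterate step of SADD (seed collocates assigned by the lexicographer, then WSD that finds more collocates for each sense) can be sketched as a simple bootstrapping loop. This is a deliberately naive stand-in, not the method used in the project: the overlap-count scoring, the `min_count` threshold, the sense labels, and the toy data are all hypothetical.

```python
from collections import Counter

# Hypothetical seed collocates a lexicographer might confirm for two senses.
seeds = {
    "bank/finance": {"money", "account", "loan"},
    "bank/river": {"river", "water", "fishing"},
}

# Toy corpus lines (lemmas around the headword); invented data.
corpus_lines = [
    ["open", "account", "bank", "deposit"],
    ["bank", "loan", "interest", "deposit"],
    ["river", "bank", "grassy", "steep"],
    ["fishing", "bank", "steep", "water"],
    ["bank", "holiday", "monday"],
]

def assign(line, sense_collocates):
    """Return the sense with the largest collocate overlap, or None if there
    is no overlap or the best score is tied."""
    scores = {s: len(c & set(line)) for s, c in sense_collocates.items()}
    best = max(scores.values())
    winners = [s for s, v in scores.items() if v == best]
    return winners[0] if best > 0 and len(winners) == 1 else None

def expand(lines, sense_collocates, headword, min_count=2):
    """One bootstrapping step: add words that occur in at least min_count
    lines assigned to a sense as new collocates of that sense."""
    counts = {sense: Counter() for sense in sense_collocates}
    for line in lines:
        sense = assign(line, sense_collocates)
        if sense is not None:
            counts[sense].update(w for w in set(line) if w != headword)
    return {
        sense: colls | {w for w, c in counts[sense].items() if c >= min_count}
        for sense, colls in sense_collocates.items()
    }

labels = [assign(line, seeds) for line in corpus_lines]
expanded = expand(corpus_lines, seeds, headword="bank")
```

Lines that match no seed, such as the "bank holiday" line here, stay unassigned. In the workflow on the slide those lines go back to the lexicographer, who may confirm a new sense or edit the inventory before the next round.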

41 Atkins method for bilingual lexicography. Analyse source language: from corpus; list all expressions that might possibly have a non-predictable translation; very fine grained; lots of collocations; target-language-neutral, re-usable. Translate. Edit to finalise dictionary.

42 New English-Irish Dictionary. Irish: Gaelic language, some native speakers, culturally important for Ireland. Project: to replace dictionary from 1950s; government-funded. Lexicography MasterClass (Atkins, Rundell, Kilgarriff) designed project in 2003.

43 English analysis for NEID. New project, 1st Feb late 2010. Contractor: Lexicography MasterClass. 12 lexicographers. Plan: test SADD; if viable, use it on industrial scale.

44 demo2

45 Thank you. Sketch Engine: Lexicom workshop. Pre-Euralex, July, Barcelona. Pre-CICLING, Mexico, Feb 2009.