Word senses: a computational response
Adam Kilgarriff
Auckland 2012
My PhD (in 5 slides)
    What is a word sense
The lexicographers
    They create them
    Methods
        Introspection
        Other dictionaries
        Corpus
    Atkins, Hanks, Krishnamurthy
What is a word sense (1)
    SFIP: Sufficiently Frequent, Insufficiently Predictable
    (a glass of) whisky x (a glass of) tequila
What is a word sense (2)
    homonymy analogy polysemy rules collocation
What is a word sense (3)
    A cluster
        Of instances of use
        Operationalised as: corpus lines
        Clustered by lexicographers
What is a word sense (3)
    A cluster
        Of instances of use
        Operationalised as: corpus lines
        Clustered by lexicographers
    Makes sense of
        Overlapping senses
        Different dictionaries, different senses
        Lumping and splitting
I don’t believe in word senses
    Believe in: resurrection, ghost, witch, vampire, god, miracle, fairy
    Philosophy: ontological commitment (same meaning, different register)
    “good entities to build belief systems on”
A word sense is a cluster of corpus lines
    But I’m an NLP person
    Automatic clustering?
    Inspiration: Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999
    You can get semantic sense from corpora + stats
First attempt
    Longman 1994
    Abject failure
        No grammar
        Corpus too small and noisy
        Naïve clustering
        Useless programmer
Collocations
    Easy
        Most words don’t go with most other words
    Then build on what we can do well
    Metaphor, analogy, homonymy, rules: all much harder
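The observation that most words don’t go with most other words can be made concrete with an association score. The following is a minimal sketch, assuming a toy list of (headword, collocate) pairs and using logDice as the score. The slides do not name a statistic, so both the data and the choice of measure are illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy data: (headword, collocate) pairs observed in a corpus.
pairs = [
    ("glass", "whisky"), ("glass", "whisky"), ("glass", "whisky"),
    ("glass", "water"), ("glass", "water"),
    ("bottle", "whisky"), ("bottle", "water"),
    ("glass", "window"),
]

def log_dice(pairs):
    """Score each observed pair with logDice = 14 + log2(2*f_xy / (f_x + f_y))."""
    pair_freq = Counter(pairs)
    head_freq = Counter(h for h, _ in pairs)
    coll_freq = Counter(c for _, c in pairs)
    return {
        (h, c): 14 + math.log2(2 * f / (head_freq[h] + coll_freq[c]))
        for (h, c), f in pair_freq.items()
    }

scores = log_dice(pairs)
# Print pairs from strongest to weakest association.
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Only pairs that actually co-occur receive a score; the many pairs that never co-occur are simply absent, which is what makes collocation the easy starting point.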
Clustering
    Word sketch
        Collocates organised by grammar
    Dictionary
        Collocates (and other things) organised by meaning
    How to re-organise?
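To illustrate "collocates organised by grammar", here is a minimal sketch that groups a headword's collocates by grammatical relation. It assumes the corpus has already been parsed into (headword, relation, collocate) triples; the relation names and the data are hypothetical, and this is not the actual word sketch implementation.

```python
from collections import defaultdict, Counter

# Hypothetical parsed triples: (headword, grammatical relation, collocate).
triples = [
    ("bank", "object_of", "rob"),
    ("bank", "object_of", "rob"),
    ("bank", "modifier", "river"),
    ("bank", "modifier", "central"),
    ("bank", "object_of", "burst"),
    ("glass", "modifier", "stained"),
]

def word_sketch(triples, headword):
    """Group a headword's collocates by grammatical relation, most frequent first."""
    by_relation = defaultdict(Counter)
    for head, relation, collocate in triples:
        if head == headword:
            by_relation[relation][collocate] += 1
    return {rel: counts.most_common() for rel, counts in by_relation.items()}

sketch = word_sketch(triples, "bank")
for relation, collocates in sketch.items():
    print(relation, collocates)
```

Note that "rob" and "burst" land in the same grammatical group although they point to different meanings of "bank", which is exactly the re-organisation problem the slide raises.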
Observation
    Corpus: arbitrary sample
    Dictionary (= lexicon): systematic account
    Children
        encounter arbitrary samples
        develop systematic account
Corpus
    Provisional, dispensable
    Used to develop lexicon
Levels of abstraction
    Direct linkage between corpus and dictionary: fragile
        Updates (to corpus or dictionary) break links
    Dictionary: abstract
    Corpus: raw
    Intermediate level needed
    [Diagram: Corpus linked directly to Dictionary]
How most automatic word sense disambiguation (WSD) works
    Analyse dictionary to give set of collocates
    Match to collocates in a corpus
    Dispensable corpus
    [Diagram: Corpus and Dictionary linked via Collocates]
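The two steps above can be sketched as a simple overlap count. This is a minimal illustration, assuming a hand-written sense inventory whose collocate sets stand in for the output of dictionary analysis; the sense labels and word lists are hypothetical, and real WSD systems weight and smooth the evidence rather than counting raw overlap.

```python
# Hypothetical sense inventory: collocates assumed to have been derived
# from dictionary entries for two senses of "bank".
sense_collocates = {
    "bank/finance": {"money", "account", "loan", "rob", "central"},
    "bank/river": {"river", "water", "burst", "muddy", "fish"},
}

def disambiguate(context_words, sense_collocates):
    """Return the sense whose collocates overlap most with the context, or None on no overlap."""
    context = set(context_words)
    best_sense, best_overlap = None, 0
    for sense, collocates in sense_collocates.items():
        overlap = len(context & collocates)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

line = "the river burst its bank after heavy rain".split()
print(disambiguate(line, sense_collocates))  # prints "bank/river"
```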
Not just collocates
    Triples
        Parse the corpus
    Some “unary relations”
        I hear him singing
    Domain-based clues
    Collocates, Constructions, Domains = CoCoDo
Linking CoCoDos to senses
    Automatically extract CoCoDos from corpus
    How linked to senses?
        Automatic (WSD techniques)
        Manual
            “dictionary-free”: ideal for new dictionaries
            Labour costs
        Mixed
            WSD with manual confirmation/correction
    [Diagram: Corpus linked to CoCoDo, CoCoDo linked to Dictionary]
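One way to picture the intermediate level is as a record that points to corpus lines on one side and to a sense on the other, so that neither the corpus nor the dictionary refers to the other directly. The sketch below is an assumption for illustration only: the class, its field names, and the sample data are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CoCoDo:
    """One clue extracted from the corpus (all field names are hypothetical)."""
    kind: str                                   # "collocate", "construction" or "domain"
    value: str                                  # e.g. "rob", "on the N of", "finance"
    line_ids: set = field(default_factory=set)  # corpus lines showing the clue
    sense: Optional[str] = None                 # filled in by WSD or a lexicographer

def lines_for_sense(cocodos, sense):
    """Collect corpus lines for a sense indirectly, via the clues linked to it."""
    ids = set()
    for clue in cocodos:
        if clue.sense == sense:
            ids |= clue.line_ids
    return ids

clues = [
    CoCoDo("collocate", "rob", {1, 4}, sense="bank/finance"),
    CoCoDo("domain", "finance", {4, 7}, sense="bank/finance"),
    CoCoDo("collocate", "river", {2}, sense="bank/river"),
    CoCoDo("construction", "on the N of", {9}),  # not yet linked to a sense
]
print(sorted(lines_for_sense(clues, "bank/finance")))  # prints [1, 4, 7]
```

In this arrangement, re-running extraction over an updated corpus would only change `line_ids`, while the clue-to-sense links stay in place.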
Semi-automatic dictionary drafting (SADD)
    CoCoDo database
    Automatic clustering
    Lexicographer input
    More clustering
    Dictionary with corpus inside
Automatic clustering of collocates
    Propose senses
Iterate:
    Lexicographer input
        Confirm/reject/edit sense inventory
        Assigns collocates / corpus lines to senses
    WSD
        Uses seeds to build full WSD for word
        Find more collocates for each sense
XML dictionary entry
    Load into dictionary-editing tool
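The "uses seeds to build full WSD" and "find more collocates for each sense" steps can be sketched as a small bootstrapping loop. This is a simplified illustration under stated assumptions: corpus lines are reduced to sets of context words, the seeds and sense labels are hypothetical, and the lexicographer's confirm/reject step between rounds is left out. It is not the SADD implementation.

```python
from collections import defaultdict

# Hypothetical corpus lines for "bank", reduced to sets of context words.
lines = [
    {"river", "muddy"},
    {"muddy", "fish"},
    {"loan", "account"},
    {"account", "manager"},
    {"fish", "angler"},
]

# Hypothetical seeds confirmed by a lexicographer: one collocate per sense.
seeds = {"river": {"river"}, "finance": {"loan"}}

def bootstrap(lines, seeds, max_rounds=10):
    """Grow each sense's collocate set from seeds; return (collocates, line labels)."""
    collocates = {sense: set(words) for sense, words in seeds.items()}
    labels = {}
    for _ in range(max_rounds):
        # Label every line that matches the collocates of exactly one sense.
        labels = {}
        for i, line in enumerate(lines):
            matches = [s for s, words in collocates.items() if line & words]
            if len(matches) == 1:
                labels[i] = matches[0]
        # Record which senses each word has been seen with in labelled lines.
        seen_with = defaultdict(set)
        for i, sense in labels.items():
            for word in lines[i]:
                seen_with[word].add(sense)
        # Adopt words seen with one sense only as new collocates of that sense.
        grown = False
        for word, senses in seen_with.items():
            if len(senses) == 1:
                (sense,) = senses
                if word not in collocates[sense]:
                    collocates[sense].add(word)
                    grown = True
        if not grown:
            break
    return collocates, labels

collocates, labels = bootstrap(lines, seeds)
print(collocates)
print(labels)
```

Starting from one seed per sense, the loop ends with every toy line labelled; in the workflow on the slide, a lexicographer would review the proposed collocates before the next round.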
Fits with Atkins method for bilingual lexicography
    Analyse source language
        From corpus
        List all expressions that might possibly have a non-predictable translation
        Very fine grained
        Lots of collocations
        Target-language-neutral; re-usable
    Translate
    Edit to finalise dictionary