Word senses: a computational response
  Adam Kilgarriff
Madrid 2010 Kilgarriff: Word senses: a computational response
A word sense is a cluster of corpus lines
  But I’m an NLP person
  Automatic clustering?
  Inspiration: Hindle 1991, Schütze 1993, Grefenstette 1993, Lin 1999
  You can get semantic sense from corpora + stats
First attempt
  Longman 1994
  Abject failure
    No grammar
    Corpus too small and noisy
    Naïve clustering
    Useless programmer
Collocations
  Easy
    Most words don’t go with most other words
  Then build on what we can do well
  Metaphor, analogy, homonymy, rules: all much harder
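The claim that most words don't go with most other words can be illustrated by scoring word pairs for association. The sketch below uses logDice, one common collocation statistic, purely as an example; the slides do not say which score was used, and the toy corpus and all names here are invented for illustration.

```python
import math
from collections import Counter

def log_dice(pair_freq, freq_x, freq_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * pair_freq / (freq_x + freq_y))

# Toy corpus of two-word phrases (invented for illustration).
sentences = [
    "strong tea".split(),
    "strong tea".split(),
    "strong coffee".split(),
    "powerful computer".split(),
    "powerful engine".split(),
    "strong engine".split(),
]

# Frequency of each word and of each (first word, second word) pair.
word_freq = Counter(w for s in sentences for w in s)
pair_freq = Counter((s[0], s[1]) for s in sentences)

# Only pairs that actually co-occur get a score; most possible pairs never do.
scores = {
    pair: log_dice(f, word_freq[pair[0]], word_freq[pair[1]])
    for pair, f in pair_freq.items()
}

for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Of the twelve possible modifier-noun combinations in this toy vocabulary, only five are attested, and a pair such as ("powerful", "tea") gets no score at all.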
Clustering
  Word sketch: collocates organised by grammar
  Dictionary: collocates (and other things) organised by meaning
  How to re-organise?
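To make "collocates organised by grammar" concrete, here is a minimal sketch that groups a headword's collocates by grammatical relation. It assumes the corpus has already been parsed into (headword, relation, collocate) triples; the relation names, the data, and the function `word_sketch` are hypothetical and are not the Sketch Engine's actual format or API.

```python
from collections import Counter, defaultdict

def word_sketch(triples, headword):
    """Group a headword's collocates by grammatical relation, most frequent first."""
    by_relation = defaultdict(Counter)
    for head, relation, collocate in triples:
        if head == headword:
            by_relation[relation][collocate] += 1
    return {
        relation: [collocate for collocate, _ in counts.most_common()]
        for relation, counts in by_relation.items()
    }

# Toy parsed corpus (invented for illustration).
triples = [
    ("bank", "object_of", "rob"),
    ("bank", "object_of", "rob"),
    ("bank", "object_of", "burst"),
    ("bank", "modifier", "river"),
    ("bank", "modifier", "central"),
    ("bank", "modifier", "central"),
    ("tea", "modifier", "strong"),
]

sketch = word_sketch(triples, "bank")
print(sketch)
```

The output is organised by grammar only: "rob" and "burst" sit side by side under the same relation even though they point to different meanings of "bank", which is exactly the re-organisation problem the slide raises.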
Observation
  Corpus: arbitrary sample
  Dictionary (= lexicon): systematic account
  Children
    encounter arbitrary samples
    develop systematic account
Corpus
  Provisional, dispensable
  Used to develop lexicon
Levels of abstraction
  Direct linkage: fragile
    Updates (to C or D) break links
  Dictionary: abstract
  Corpus: raw
  Intermediate level needed
  [Diagram: Corpus and Dictionary, linked]
How most automatic word sense disambiguation (WSD) works
  Analyse dictionary to give set of collocates
  Match to collocates in a corpus
  Dispensable corpus
  [Diagram: Corpus, Collocates, Dictionary, linked]
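The two steps above (collocates per dictionary sense, then matching against the corpus) can be sketched as a simple overlap count. This is a deliberately naive illustration, not a description of any particular WSD system; the sense labels, the collocate sets, and the function `disambiguate` are assumptions made for the example.

```python
def disambiguate(context_words, sense_collocates):
    """Return the sense whose collocates overlap most with the context, or None."""
    context = set(context_words)
    best_sense, best_overlap = None, 0
    for sense, collocates in sense_collocates.items():
        overlap = len(context & collocates)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Collocates assumed to have been extracted from the dictionary entry for "bank".
sense_collocates = {
    "bank/finance": {"money", "account", "loan", "rob"},
    "bank/river": {"river", "water", "muddy", "burst"},
}

# One corpus line containing the target word.
line = "the river burst its bank after heavy rain".split()
print(disambiguate(line, sense_collocates))
```

The corpus line is linked to a sense only through the collocates, so either the corpus or the dictionary can change without breaking a direct line-to-sense link. A line with no matching collocate simply stays unassigned.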
Not just collocates
  Triples
    Parse the corpus
  Some “unary relations”
    I hear him singing
  Domain-based clues
  Collocates, Constructions, Domains = CoCoDo
Linking CoCoDos to senses
  Automatically extract CoCoDos from corpus
  How linked to senses?
    Automatic (WSD techniques)
    Manual
      “Dictionary-free”: ideal for new dictionaries
      Labour costs
    Mixed
      WSD with manual confirmation/correction
  [Diagram: Corpus, CoCoDo, Dictionary, linked]
Semi-automatic dictionary drafting (SADD)
  CoCoDo database
  Automatic clustering
  Lexicographer input
  More clustering
  Dictionary with corpus inside
Automatic clustering of collocates
  Propose senses
  Iterate:
    Lexicographer input
      Confirm/reject/edit sense inventory
      Assigns collocates / corpus lines to senses
    WSD
      Uses seeds to build full WSD for word
      Find more collocates for each sense
  XML dictionary entry
    Load into dictionary-editing tool
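The "seeds, then more collocates" loop above can be sketched as a simple bootstrapping procedure. The slides do not specify the algorithm, so this is a swapped-in illustration under stated assumptions: corpus lines are lists of content words, the lexicographer's input is a set of seed collocates per sense, and the names `label_lines`, `bootstrap`, `rounds` and `min_count` are hypothetical.

```python
from collections import Counter, defaultdict

def label_lines(lines, sense_collocates):
    """Assign a sense to each line whose words point to exactly one sense."""
    labels = {}
    for i, line in enumerate(lines):
        matching = [s for s, cols in sense_collocates.items() if cols & set(line)]
        if len(matching) == 1:
            labels[i] = matching[0]
    return labels

def bootstrap(lines, seeds, target, rounds=3, min_count=2):
    """Grow each sense's collocate set from lexicographer-supplied seeds."""
    sense_collocates = {sense: set(words) for sense, words in seeds.items()}
    for _ in range(rounds):
        labels = label_lines(lines, sense_collocates)
        # Count, per context word, how often it occurs with each sense.
        counts = defaultdict(Counter)
        for i, sense in labels.items():
            for word in set(lines[i]) - {target}:
                counts[word][sense] += 1
        # Adopt words seen often enough and with one sense only.
        grew = False
        for word, per_sense in counts.items():
            if len(per_sense) == 1:
                sense, n = next(iter(per_sense.items()))
                if n >= min_count and word not in sense_collocates[sense]:
                    sense_collocates[sense].add(word)
                    grew = True
        if not grew:
            break
    return sense_collocates, label_lines(lines, sense_collocates)

# Toy corpus lines for "bank" (invented for illustration).
lines = [
    ["rob", "bank", "cash"],
    ["rob", "bank", "cash"],
    ["bank", "cash", "loan"],
    ["river", "bank", "muddy"],
    ["river", "bank", "muddy"],
    ["muddy", "bank", "flood"],
    ["bank", "holiday"],
]
seeds = {"finance": {"rob"}, "river": {"river"}}  # lexicographer input

collocates, labels = bootstrap(lines, seeds, "bank")
print(collocates)
print(labels)
```

In this toy run the seeds label four lines directly, "cash" and "muddy" are then adopted as new collocates, and two further lines get a sense as a result. The line with "holiday" stays unassigned and would go back to the lexicographer in the next iteration.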
Atkins method for bilingual lexicography
  Analyse source language
    From corpus
    List all expressions that might possibly have a non-predictable translation
    Very fine grained
    Lots of collocations
    Target-language-neutral; re-usable
  Translate
  Edit to finalise dictionary
Current projects/initiatives
  Semi-automatic Dictionary Drafting (SADD)
  Tickbox Lexicography (TBL)
  Slovene project
  New English-Irish Dictionary
  Putting Collocations in the Dictionary (PCID)