Corpora, Language Technology and Maltese. Adam Kilgarriff (Lexical Computing Ltd; Lexicography MasterClass Ltd; University of Sussex)
Malta, Nov 2006 Kilgarriff, Lexical Computing Slide: 2 How do you find out about a language? Native speakers; dictionaries and grammars; corpus.
Slide: 3 Four ages of corpus research
Slide: 4 Age 1: Pre-computer. Oxford English Dictionary: 20 million index cards.
Slide: 5 Age 2: KWIC concordances. From 1980; computerised.
Slide: 6 Age 2: KWIC Concordance
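A KWIC (keyword in context) concordance lines up every occurrence of a word with a fixed amount of text on either side. The following is only a minimal sketch of the idea, not the software used in the talk; the function name `kwic`, the `width` parameter and the sample sentences are invented for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per occurrence of keyword, aligned on the keyword."""
    lines = []
    # Whole-word, case-insensitive match (an illustrative choice).
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-justify the left context so every keyword starts in the same column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Hypothetical sample text, not taken from any corpus mentioned in the slides.
sample = ("The party agreed to the terms. She went to a birthday party "
          "last night. A party of tourists arrived.")

for line in kwic(sample, "party", width=25):
    print(line)
```

Because the left context is padded to a fixed width, the keyword forms a vertical column that a reader can scan quickly, which is what makes this layout workable for manual sorting of senses.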
Slide: 7 Age 2: KWIC concordances. From 1980; computerised. The COBUILD project was the innovator. The coloured-pens method.
Slide: 8 The coloured pens method: 1 political association; 2 social event; 3 group of people; 4 person in an agreement/dispute; 5 to be party to something...
Slide: 9 Age 2: limitations. As corpora get bigger: too much data. 50 lines for a word: read all. 500 lines: could read all, but it takes a long time. 5000 lines: no.
Slide: 10 Age 3: Collocation statistics. Problem: too much data; how to summarise? Solution: a list of words occurring in the neighbourhood of the headword, with frequencies, sorted by salience.
Slide: 11 Collocation listing for right collocates of save (>5 hits)
word        freq   word        freq
forests        6   life          36
$1.2           6   dollars        8
lives         37   costs          7
enormous       6   thousands      6
annually       7   face           9
jobs          20   estimated      6
money         64   your           7
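A listing of this kind can be sketched as follows: count the words that appear within a few tokens to the right of the headword, drop the rare ones, and sort by a salience score. The slides do not say which salience measure was used, so pointwise mutual information (PMI) is assumed here purely for illustration. The function name `right_collocates`, the `window` and `min_freq` parameters and the toy token list are likewise hypothetical.

```python
import math
from collections import Counter

def right_collocates(tokens, headword, window=3, min_freq=2):
    """List (word, freq, salience) for words found to the right of headword."""
    word_freq = Counter(tokens)
    total = len(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok == headword:
            # Count every token in the window to the right of this occurrence.
            for neighbour in tokens[i + 1:i + 1 + window]:
                pair_freq[neighbour] += 1
    rows = []
    for word, f_xy in pair_freq.items():
        if f_xy < min_freq:  # frequency threshold, like ">5 hits" in the listing
            continue
        # Assumed salience measure: PMI = log2(P(x,y) / (P(x) * P(y))),
        # estimated from raw counts.
        pmi = math.log2(f_xy * total / (word_freq[headword] * word_freq[word]))
        rows.append((word, f_xy, pmi))
    rows.sort(key=lambda r: r[2], reverse=True)  # most salient first
    return rows

# Toy data invented for the example.
tokens = ("we save money and they save money but the bank will "
          "save the day and the bank has the money").split()

for word, freq, score in right_collocates(tokens, "save", window=1, min_freq=1):
    print(f"{word:<10}{freq:>4}{score:>8.2f}")
```

Sorting by salience instead of raw frequency is the point: very common words such as "the" co-occur with everything, so a score that discounts overall frequency pushes more informative collocates to the top.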
Slide: 12 Age 4: The word sketch. A corpus-derived one-page summary of a word’s grammatical and collocational behaviour.
Slide: 13 Age 4: The word sketch. Start from a large, well-balanced corpus; parse to find subjects, objects, heads, modifiers etc.; build one list for each grammatical relation; use statistics to sort each list.
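The last two steps (one list per grammatical relation, each sorted by a statistic) can be sketched on already-parsed data. This is a toy illustration, not the actual system: it assumes the parser output is available as (relation, head, collocate) triples, and since the slides do not name the sorting statistic, it uses a Dice-based score (logDice) as a stand-in. The function name `word_sketch`, the `top_n` parameter and the triples are invented.

```python
import math
from collections import Counter, defaultdict

def word_sketch(triples, headword, top_n=5):
    """Group collocates of headword by grammatical relation, sorted by score."""
    triples = list(triples)
    pair_freq = Counter(triples)  # (relation, head, collocate) -> count
    head_freq = Counter((rel, head) for rel, head, _ in triples)
    coll_freq = Counter((rel, coll) for rel, _, coll in triples)
    sketch = defaultdict(list)
    for (rel, head, coll), f_xy in pair_freq.items():
        if head != headword:
            continue
        # Assumed statistic: logDice = 14 + log2(2 * f_xy / (f_x + f_y)).
        dice = 2 * f_xy / (head_freq[(rel, head)] + coll_freq[(rel, coll)])
        sketch[rel].append((coll, f_xy, round(14 + math.log2(dice), 2)))
    for rel in sketch:
        sketch[rel].sort(key=lambda r: r[2], reverse=True)
        del sketch[rel][top_n:]  # keep only the strongest collocates
    return dict(sketch)

# Hypothetical parser output, invented for the example.
triples = [
    ("object", "save", "money"),
    ("object", "save", "money"),
    ("object", "save", "life"),
    ("object", "spend", "money"),
    ("subject", "save", "government"),
]

for relation, rows in word_sketch(triples, "save").items():
    print(relation, rows)
```

Keeping a separate list per relation is what distinguishes this from a plain collocation listing: objects of a verb are not mixed with its subjects or modifiers, so each list can be read as an answer to one grammatical question about the word.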
Slide: 15 Macmillan English Dictionary for Advanced Learners (ed. Rundell, 2002)
Slide: 16 Developer: Pavel Rychly, Brno. Users: OUP, Chambers, CUP; universities, for teaching and research; ELT textbook authors. Demo: self-registration for a free account. Paper: Kilgarriff & Rychly (2004), Proc. Euralex, Lorient, France.