Corpora, Language Technology and Maltese Adam Kilgarriff Lexical Computing Ltd Lexicography MasterClass Ltd University of Sussex
How do you find out about a language? Native speakers Dictionaries and Grammars Corpus Kilgarriff, Lexical Computing
Four ages of corpus research
Age 1: Pre-computer James Murray, Chief Editor Oxford English Dictionary, vol 1 1879: 20 million index cards
Age 2: KWIC Concordances From 1980 Computerised COBUILD project was innovator the coloured-pens method
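A KWIC (keyword in context) concordance shows every occurrence of a word on its own line, with the keyword aligned in a central column. The following is a minimal sketch of the idea, not the COBUILD software: the function name kwic, the width parameter and the sample text are all made up for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one aligned concordance line per occurrence of keyword.

    Illustrative helper (hypothetical name): `width` is the number of
    characters of context shown on each side of the keyword.
    """
    lines = []
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} {m.group(0)} {right:<{width}}")
    return lines

# Invented sample text, for illustration only.
sample = ("The party won the election. We went to a party last night. "
          "A party of tourists arrived. She was party to the agreement.")

for line in kwic(sample, "party", width=25):
    print(line)
```

With the keyword in a fixed column, a reader can scan down the page and mark each line by sense.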
The coloured pens method: 1 political association; 2 social event; 3 group of people; 4 person in an agreement/dispute; 5 to be party to something...
Age 2: limitations as corpora get bigger: too much data 50 lines for a word: read all 500 lines: could read all, takes a long time 5000 lines: no
Pre-computer corpus lexicography, at its most systematic, used an index card for each citation. Each citation was of a word, and tended to be of a rare word; for common words it did not seem worthwhile to take down citations. For an account of the COBUILD project see Looking Up, edited by John Sinclair (Collins, 1987). COBUILD was a collaboration between Birmingham University and Collins, and gave rise in due course to the COBUILD English dictionary, for learners of English.
Age 3: Collocation statistics Problem: too much data - how to summarise? Solution: list of words occurring in neighbourhood of headword, with frequencies Sorted by salience
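The solution above can be sketched as: count the words that occur within a few tokens to the right of the headword, drop rare pairs, and sort by a salience score. The slides do not say which salience statistic was used, so this sketch assumes pointwise mutual information (PMI) as a stand-in. The function name, window size, threshold and toy token list are all illustrative.

```python
import math
from collections import Counter

def right_collocates(tokens, headword, window=3, min_freq=2):
    """List words found up to `window` tokens to the right of `headword`.

    Returns (word, co-occurrence frequency, PMI) rows sorted by PMI.
    PMI is an assumed salience measure, not one named in the source.
    """
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok == headword:
            for w in tokens[i + 1:i + 1 + window]:
                pair_freq[w] += 1
    n = len(tokens)
    rows = []
    for w, f in pair_freq.items():
        if f < min_freq:
            continue  # skip pairs below the hit threshold
        # Rough PMI estimate: log2( P(head, w) / (P(head) * P(w)) )
        pmi = math.log2(f * n / (word_freq[headword] * word_freq[w]))
        rows.append((w, f, pmi))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows

# Toy corpus, invented for illustration.
tokens = ("save money now save money later save the forests and the trees "
          "save the money we save time on the way").split()

for word, freq, pmi in right_collocates(tokens, "save", window=2):
    print(f"{word:10} {freq:3d} {pmi:6.2f}")
```

Sorting by a salience score rather than raw frequency is what pushes very common words such as "the" down the list.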
Collocation listing For right collocates of save (>5 hits) word freq: forests 6; life 36; $1.2; dollars 8; lives 37; costs 7; enormous; thousands; annually; face 9; jobs 20; estimated; money 64; your
Age 4: The word sketch A corpus-derived one-page summary of a word’s grammatical and collocational behaviour
Age 4: The word sketch Large well-balanced corpus Parse to find subjects, objects, heads, modifiers etc One list for each grammatical relation Statistics to sort each list
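The steps above can be sketched as follows, assuming the corpus has already been parsed into (headword, relation, collocate) triples. This is a toy illustration, not the Sketch Engine implementation: the triples are hand-written, the function name is hypothetical, and the Dice-style score is an assumed stand-in for whatever statistic sorts each list.

```python
from collections import Counter, defaultdict

def word_sketch(triples, headword):
    """Group a headword's collocates by grammatical relation.

    `triples` holds (headword, relation, collocate) tuples from a parser.
    Each list is sorted by a Dice-style score (an assumption):
        2 * f(head, rel, coll) / (f(head, rel, *) + f(*, rel, coll))
    """
    triple_freq = Counter(triples)
    head_rel = Counter((h, r) for h, r, _ in triples)
    rel_coll = Counter((r, c) for _, r, c in triples)
    sketch = defaultdict(list)
    for (h, r, c), f in triple_freq.items():
        if h != headword:
            continue
        dice = 2 * f / (head_rel[(h, r)] + rel_coll[(r, c)])
        sketch[r].append((c, f, dice))
    # One list per grammatical relation, most salient collocate first.
    for rows in sketch.values():
        rows.sort(key=lambda row: row[2], reverse=True)
    return dict(sketch)

# Hand-written triples standing in for parser output (illustrative only).
triples = [
    ("save", "object", "money"), ("save", "object", "money"),
    ("save", "object", "life"), ("save", "object", "life"),
    ("spend", "object", "money"), ("spend", "object", "money"),
    ("spend", "object", "money"),
    ("save", "subject", "government"),
    ("save", "modifier", "annually"),
]

for relation, rows in word_sketch(triples, "save").items():
    print(relation, [(c, f, round(s, 2)) for c, f, s in rows])
```

Keeping a separate list for each relation is what distinguishes a word sketch from a flat collocation listing: objects, subjects and modifiers of the headword are reported separately.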
Macmillan English Dictionary For Advanced Learners Ed: Rundell, 2002
Developer: Pavel Rychly, Brno Users: OUP, Chambers, CUP Universities for teaching and research ELT textbook authors Demo: http://www.sketchengine.co.uk/ Self-registration for free account