Corpora, Language Technology and Maltese
Adam Kilgarriff, Lexical Computing Ltd, Lexicography MasterClass Ltd, University of Sussex
How do you find out about a language?
Native speakers
Dictionaries and Grammars
Corpus
Kilgarriff, Lexical Computing
Four ages of corpus research
Age 1: Pre-computer
James Murray, Chief Editor
Oxford English Dictionary, vol 1 1879: 20 million index cards
Age 2: KWIC Concordances
From 1980
Computerised
Age 2: KWIC Concordance
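A keyword-in-context (KWIC) concordance lines up every occurrence of a word with a fixed amount of text on either side. The following is a minimal sketch of the idea, not the software used in any of the projects mentioned here; the function name `kwic`, the `width` parameter, and the sample sentences are illustrative assumptions.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines: left context, keyword, right context."""
    lines = []
    pattern = r"\b%s\b" % re.escape(keyword)
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so that the keyword forms a column.
        lines.append(f"{left:>{width}} {m.group(0)} {right:<{width}}")
    return lines

# Made-up sample text, for illustration only.
sample = ("The party won the election. We went to a party last night. "
          "A party of tourists arrived. He was party to the agreement.")

for line in kwic(sample, "party", width=25):
    print(line)
```

Because the keyword sits in the same column on every line, a reader can scan the contexts quickly, which is what makes reading a few dozen lines for a word feasible.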
Age 2: KWIC Concordances
From 1980
Computerised
COBUILD project was innovator
the coloured-pens method
The coloured pens method
1 political association
2 social event
3 group of people
person in an agreement/dispute
to be party to something...
Age 2: limitations
As corpora get bigger: too much data
50 lines for a word: read all
500 lines: could read all, takes a long time
5000 lines: no

Pre-computer corpus lexicography, at its most systematic, used an index card for each citation. The citation was of a word, and tended to be of a rare word; for common words it did not seem worthwhile to take down citations. For an account of the COBUILD project see Looking Up, edited by John Sinclair (Collins, 1987). COBUILD was a collaboration between Birmingham University and Collins, and gave rise in due course to the COBUILD English dictionary, for learners of English.
Age 3: Collocation statistics
Problem: too much data - how to summarise?
Solution: list of words occurring in neighbourhood of headword, with frequencies
Sorted by salience
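The summarising step can be sketched as follows: count the words that occur in a small window to the right of the headword, then sort them by a salience score rather than by raw frequency. The slides do not say which statistic was used; pointwise mutual information (PMI) is assumed here purely for illustration, and the function name, window size, threshold, and toy corpus are all hypothetical.

```python
import math
from collections import Counter

def right_collocates(tokens, headword, window=3, min_freq=2):
    """List (word, freq, salience) for words found to the right of headword."""
    word_freq = Counter(tokens)
    total = len(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok == headword:
            # Count every word in the window to the right of this hit.
            for w in tokens[i + 1:i + 1 + window]:
                pair_freq[w] += 1
    rows = []
    for w, f in pair_freq.items():
        if f < min_freq:
            continue  # drop rare pairs, like the ">5 hits" cut-off on a listing
        # Assumed salience measure: pointwise mutual information.
        pmi = math.log2(f * total / (word_freq[headword] * word_freq[w]))
        rows.append((w, f, pmi))
    rows.sort(key=lambda r: r[2], reverse=True)
    return rows

# Toy corpus, invented for the example.
tokens = ("they save money and we save money but you save the whales and "
          "I save the forests while the cat sat on the mat near the door").split()

rows = right_collocates(tokens, "save", window=2, min_freq=2)
for word, freq, score in rows:
    print(f"{word:10} {freq:3} {score:6.2f}")
```

In this toy corpus "money" and "the" each follow "save" twice, but "the" is common everywhere, so the salience score ranks "money" first. Sorting by salience is meant to have this effect.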
Collocation listing
For right collocates of save (>5 hits)

word        freq
forests     6
life        36
$1.2
dollars     8
lives       37
costs       7
enormous
thousands
annually
face        9
jobs        20
estimated
money       64
your
Age 4: The word sketch
A corpus-derived one-page summary of a word’s grammatical and collocational behaviour
Age 4: The word sketch
Large well-balanced corpus
Parse to find subjects, objects, heads, modifiers etc
One list for each grammatical relation
Statistics to sort each list
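The last two steps above can be sketched in a few lines, assuming the parsing has already been done and has produced (headword, relation, collocate) triples. The triples below are invented, the function name `word_sketch` is hypothetical, and the sorting statistic is an assumption: logDice is used here as one plausible choice, since the slides do not name the statistic.

```python
import math
from collections import Counter, defaultdict

# Hypothetical parser output: (headword, relation, collocate) triples.
triples = (
    [("save", "object", "money")] * 3
    + [("save", "object", "life")] * 2
    + [("save", "object", "time")]
    + [("spend", "object", "time")] * 4
    + [("spend", "object", "money")]
    + [("save", "subject", "plan")] * 2
    + [("save", "subject", "we")]
    + [("go", "subject", "we")] * 5
)

def word_sketch(triples, headword, top_n=5):
    """Build one sorted collocate list per grammatical relation."""
    triple_freq = Counter(triples)
    head_rel = Counter((h, r) for h, r, c in triples)  # headword in relation
    rel_coll = Counter((r, c) for h, r, c in triples)  # collocate in relation
    sketch = defaultdict(list)
    for (h, r, c), f in triple_freq.items():
        if h != headword:
            continue
        # Assumed statistic: logDice = 14 + log2(2 * f_xy / (f_x + f_y)).
        score = 14 + math.log2(2 * f / (head_rel[(h, r)] + rel_coll[(r, c)]))
        sketch[r].append((c, f, round(score, 2)))
    for r in sketch:
        sketch[r].sort(key=lambda row: row[2], reverse=True)
        sketch[r] = sketch[r][:top_n]
    return dict(sketch)

sketch = word_sketch(triples, "save")
for relation, rows in sketch.items():
    print(relation)
    for collocate, freq, score in rows:
        print(f"  {collocate:8} {freq:3} {score:6.2f}")
```

Each relation gets its own list, so objects of "save" are not mixed with its subjects. This separation is what distinguishes a word sketch from a flat collocation listing.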
Macmillan English Dictionary For Advanced Learners
Ed: Rundell, 2002
Developer: Pavel Rychly, Brno
Users:
OUP, Chambers, CUP
Universities for teaching and research
ELT textbook authors
Demo: Self-registration for free account