Published by Joella Pearl Melton. Modified over 9 years ago.
1
Corpora, Language Technology and Maltese
Adam Kilgarriff
Lexical Computing Ltd, Lexicography MasterClass Ltd, University of Sussex
2
Malta, Nov 2006. Kilgarriff, Lexical Computing.
How do you find out about a language?
Native speakers
Dictionaries and grammars
Corpus
3
Four ages of corpus research
4
Age 1: Pre-computer
Oxford English Dictionary: 20 million index cards
5
Age 2: KWIC concordances
From 1980
Computerised
6
Age 2: KWIC concordance
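A KWIC (keyword in context) concordance lists every occurrence of a word with its left and right context, aligned on the keyword. The following is a minimal Python sketch of that idea, not the software shown in the talk; the function name `kwic`, the `width` parameter, and the sample text are assumptions made for illustration.

```python
def kwic(text, keyword, width=30):
    """Return one concordance line per occurrence of `keyword` in `text`.

    Tokenisation is plain whitespace splitting (an assumption; a real
    concordancer would use a proper tokeniser).
    """
    tokens = text.split()
    lines = []
    for i, tok in enumerate(tokens):
        # Compare case-insensitively, ignoring punctuation attached to the token.
        if tok.strip(".,;:!?\"'").lower() == keyword.lower():
            left = " ".join(tokens[:i])[-width:]       # last `width` chars before the hit
            right = " ".join(tokens[i + 1:])[:width]   # first `width` chars after the hit
            # Right-align the left context so the keyword starts in the same column.
            lines.append(f"{left:>{width}}  {tok}  {right:<{width}}")
    return lines


# Illustrative text only; not taken from any corpus.
sample = "The party won the election. We went to a party last night."
for line in kwic(sample, "party"):
    print(line)
```

Because the left context is padded to a fixed width, the keyword always starts in the same column, which is what makes a long concordance quick to scan by eye.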
7
Age 2: KWIC concordances
From 1980
Computerised
COBUILD project was the innovator
The coloured-pens method
8
The coloured-pens method
1 political association
2 social event
3 group of people
4 person in an agreement/dispute
5 to be party to something...
9
Age 2: limitations
As corpora get bigger: too much data
50 lines for a word: read all
500 lines: could read all, but it takes a long time
5000 lines: no
10
Age 3: Collocation statistics
Problem: too much data. How to summarise?
Solution: list of words occurring in the neighbourhood of the headword, with frequencies
Sorted by salience
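The approach on this slide can be sketched in a few lines of Python: count the words that occur in a window to the right of the headword, then sort them by a salience score. The slide does not say which statistic is used, so pointwise mutual information (PMI) is an assumption here; the function name `right_collocates` and the `window` and `min_freq` parameters are also hypothetical.

```python
import math
from collections import Counter


def right_collocates(tokens, headword, window=3, min_freq=1):
    """List (collocate, frequency, salience) for words right of `headword`.

    Salience is PMI = log2(f(x,y) * N / (f(x) * f(y))), which is an
    assumption; the talk does not name the statistic.
    """
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok == headword:
            # Count every token in the window to the right of the headword.
            for collocate in tokens[i + 1:i + 1 + window]:
                pair_freq[collocate] += 1

    n = len(tokens)
    rows = []
    for collocate, f_xy in pair_freq.items():
        if f_xy < min_freq:
            continue
        pmi = math.log2(f_xy * n / (word_freq[headword] * word_freq[collocate]))
        rows.append((collocate, f_xy, pmi))

    # Most salient first; ties broken by frequency, then alphabetically.
    rows.sort(key=lambda row: (-row[2], -row[1], row[0]))
    return rows


# Illustrative token list only; not taken from any corpus.
tokens = "save money and save lives and save money now and then".split()
for collocate, freq, salience in right_collocates(tokens, "save", window=1):
    print(f"{collocate:<10}{freq:>4}{salience:>8.2f}")
```

Sorting by salience rather than raw frequency pushes down words such as "and" or "the", which are frequent next to every headword and so tell a lexicographer little.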
11
Collocation listing
For right collocates of save (>5 hits)

word        freq    word        freq
forests        6    life          36
$1.2           6    dollars        8
lives         37    costs          7
enormous       6    thousands      6
annually       7    face           9
jobs          20    estimated      6
money         64    your           7
12
Age 4: The word sketch
A corpus-derived one-page summary of a word's grammatical and collocational behaviour
13
Age 4: The word sketch
Large well-balanced corpus
Parse to find subjects, objects, heads, modifiers etc.
One list for each grammatical relation
Statistics to sort each list
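The four steps above can be sketched as follows, assuming the corpus has already been parsed into (relation, head, collocate) triples. This is an illustrative Python sketch, not the Sketch Engine implementation: the function name `word_sketch`, the triple format, the sample data, and the use of PMI computed within each relation as the sorting statistic are all assumptions.

```python
import math
from collections import Counter, defaultdict


def word_sketch(triples, headword, top=5):
    """Build one sorted collocate list per grammatical relation.

    `triples` is an iterable of (relation, head, collocate) tuples, assumed
    to come from a parser. The score is PMI within the relation, which is
    an assumption; the talk only says "statistics to sort each list".
    """
    joint = Counter()      # (relation, head, collocate) counts
    rel_head = Counter()   # (relation, head) counts
    rel_coll = Counter()   # (relation, collocate) counts
    rel_total = Counter()  # relation counts
    for rel, head, coll in triples:
        joint[(rel, head, coll)] += 1
        rel_head[(rel, head)] += 1
        rel_coll[(rel, coll)] += 1
        rel_total[rel] += 1

    sketch = defaultdict(list)
    for (rel, head, coll), f in joint.items():
        if head != headword:
            continue
        pmi = math.log2(
            f * rel_total[rel] / (rel_head[(rel, head)] * rel_coll[(rel, coll)])
        )
        sketch[rel].append((coll, f, pmi))

    # One list per relation, most salient collocates first, cut to `top`.
    return {
        rel: sorted(rows, key=lambda row: (-row[2], -row[1], row[0]))[:top]
        for rel, rows in sketch.items()
    }


# Hypothetical parser output, for illustration only.
triples = (
    [("object", "save", "money")] * 3
    + [("object", "save", "life")] * 2
    + [("object", "spend", "money")] * 2
    + [("subject", "save", "government")]
)
for relation, rows in word_sketch(triples, "save").items():
    print(relation, [(coll, freq) for coll, freq, _ in rows])
```

In the sample data "money" is the more frequent object of "save", but it is also an object of "spend", so "life" gets the higher score; that is the difference between sorting by frequency and sorting by a salience statistic.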
15
Macmillan English Dictionary for Advanced Learners
Ed.: Rundell, 2002
16
Developer: Pavel Rychly, Brno
Users:
OUP, Chambers, CUP
Universities, for teaching and research
ELT textbook authors
Demo: http://www.sketchengine.co.uk/
Self-registration for free account
Paper: Kilgarriff & Rychly (2004), Proc. Euralex, Lorient, France