Published by Cora Broad. Modified over 10 years ago.
1
The Cambridge Learner Corpus, English Profile, the Sketch Engine and the Kelly Project Adam Kilgarriff Lexical Computing Ltd http://www.sketchengine.co.uk
2
The Cambridge Learner Corpus, English Profile, the Sketch Engine, “freely available”, HOO, DANTE and the Kelly Project Adam Kilgarriff Lexical Computing Ltd http://www.sketchengine.co.uk
3
Cambridge Learner Corpus (CLC) Since 1993 – Nearly as old as CECL Leading resource (like ICLE) CUP and Cambridge ESOL – For better dictionaries, ELT courses, tests – Material: all from exams (levels A1-C2) 45m words; 22m error-tagged 200,000 scripts, 138 L1s, 203 nationalities
4
English Profile From 2006 Cambridge Univ, Univ Press, ESOL (+ others) Goal – for each CEFR level, find characteristic lexis and grammar – Main resource: CLC – Talk on Thursday: Theodora Alexopoulou, Helen Yannakoudakis
5
Flyers
6
Sketch Engine Leading corpus tool Word sketches – One-page summaries of a word’s grammatical and collocational behaviour In use at OUP, CUP, Collins, Macmillan, INL … 42 languages – Over 150 corpora – Since May including CHILDES: demo – Since last year including CLC
7
Error-coded corpus Challenge – Intuitive to search for x: anywhere; only where it is part of an error; only where it is part of a correction; where x can be a word, phrase, grammar pattern … Requirement for CLC in Sketch Engine
8
Sample text We will only use those informations to take part of our guest survey
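The three search scopes from the previous slide (anywhere, only in an error, only in a correction) can be sketched on this sample sentence. This is a minimal illustration only: the `Segment` structure, the `search` function and the two annotations are assumptions made for the example, not the CLC error-coding scheme or the Sketch Engine query language.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """A stretch of learner text (hypothetical structure, not the CLC format)."""
    text: str                         # what the learner wrote
    correction: Optional[str] = None  # None when the segment is not an error


# The sample sentence with two illustrative (assumed) error annotations.
SAMPLE = [
    Segment("We will only use those"),
    Segment("informations", correction="information"),
    Segment("to take part"),
    Segment("of", correction="in"),
    Segment("our guest survey"),
]


def search(segments: List[Segment], word: str, scope: str = "anywhere") -> List[Segment]:
    """Return the segments containing `word`, restricted by `scope`.

    scope is "anywhere", "error" (learner's erroneous text only)
    or "correction" (corrected text only).
    """
    hits = []
    for seg in segments:
        is_error = seg.correction is not None
        in_text = word in seg.text.split()
        in_correction = is_error and word in seg.correction.split()
        if scope == "anywhere" and (in_text or in_correction):
            hits.append(seg)
        elif scope == "error" and is_error and in_text:
            hits.append(seg)
        elif scope == "correction" and in_correction:
            hits.append(seg)
    return hits
```

For instance, `search(SAMPLE, "of", "error")` finds the misused preposition, while `search(SAMPLE, "in", "correction")` finds the same position through its corrected form.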
9
Error-coded corpora in SkE demo
10
freely available
11
Free (MED online) Sense 1: not costing anything Sense 4: not limited by rules … anyone can get hold of it??
12
freely available Free (MED online) Sense 1: not costing anything Sense 4: not limited by rules … anyone can get hold of it?? Available – To download onto your computer – To use
13
Case studies (ICLE vs. CLC) – Money: ICLE 225 EUR; CLC No – To everyone: ICLE Yes; CLC Cambridge author/collab – To download? No – To use: Yes
14
Non-geeks Access is important, not download Web is beautiful
15
HOO / HOO+ Helping Our Own HOO: English-NNS NLP researchers – Developer = user: motivation – Shared task/competitive evaluation Organisers define task and prepare ‘gold standard’ Teams participate by running their software over test data Six teams (incl Tübingen), workshop end Sept
16
HOO+ (2012) Probably – English: learner data from CLC – Other languages? – Tasks Essay scoring Determiner, preposition errors ? http://www.clt.mq.edu.au/research/projects/hoo/
17
DANTE Highlights of English lexicography
18
DANTE
21
http://webdante.com Flyers
22
The KELLY Project EU Lifelong Learning Project Word cards – 9 languages: Arabic, Chinese, English, Greek, Italian, Norwegian, Polish, Russian, Swedish – All 36 pairs – Words the learner should know (at A1 … C2) Partners: Stockholm Univ, Gothenburg Univ, Adam Mickiewicz Univ, ILSP Athens, CNR Pisa, Oslo Univ, Leeds Univ, Keewords A/S, Lexical Computing Ltd
23
Interesting question How close to purely corpus-based can a pedagogic list be?
24
Method Take a general corpus Count Review, add, delete using other lists and corpora Translate (72 directed-lg-pairs) Words not in source list which occur in translations: – Review source list http://kelly.sketchengine.co.uk
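The counting and review steps can be sketched roughly as follows. This is a toy illustration under stated assumptions: `frequency_list` and `review_candidates` are hypothetical helper names, the token list is invented, and the real project worked from large general corpora with manual review. The first lines also check where the figure of 72 directed language pairs comes from (9 languages, each translated into the 8 others).

```python
from collections import Counter
from itertools import permutations

LANGUAGES = ["Arabic", "Chinese", "English", "Greek", "Italian",
             "Norwegian", "Polish", "Russian", "Swedish"]

# Every ordered (source, target) pair of distinct languages.
directed_pairs = list(permutations(LANGUAGES, 2))
print(len(directed_pairs))  # 72


def frequency_list(tokens, size):
    """Return the `size` most frequent tokens, most frequent first."""
    return [word for word, _ in Counter(tokens).most_common(size)]


def review_candidates(source_list, incoming_translations):
    """Words that appear as translations into this language but are
    missing from its own list, so the list should be reviewed for them."""
    known = set(source_list)
    return sorted({w for w in incoming_translations if w not in known})
```

With an invented token list such as `"the sun the music the sun library".split()`, `frequency_list(tokens, 2)` keeps the two most frequent items, and `review_candidates` then flags any word that comes back from the other languages' translations but is absent from that list.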
25
Symmetrical pairs: and Cliques: – For x, y, z, … all pairs are symmetrical – 9-language cliques (English members): hospital, library, music, sun, theory
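One way to read these definitions: two words are a symmetrical pair if each is given as a translation of the other, and a clique is a set of words, one per language, in which every pair is symmetrical. The sketch below assumes a hypothetical `TRANSLATIONS` table with invented toy entries for three languages; it is not the Kelly database or its actual procedure.

```python
from itertools import combinations

# Toy translation tables: (source, target) -> {word: set of translations}.
# All entries are invented for illustration.
TRANSLATIONS = {
    ("en", "it"): {"music": {"musica"}},
    ("it", "en"): {"musica": {"music"}},
    ("en", "sv"): {"music": {"musik"}, "theory": {"teori"}},
    ("sv", "en"): {"musik": {"music"}},  # "teori" is not translated back here
    ("it", "sv"): {"musica": {"musik"}},
    ("sv", "it"): {"musik": {"musica"}},
}


def is_symmetrical(a, b):
    """a and b are (language, word) tuples.

    True only if b's word is listed as a translation of a's word
    and a's word is listed as a translation of b's word.
    """
    (lang_a, word_a), (lang_b, word_b) = a, b
    forward = word_b in TRANSLATIONS.get((lang_a, lang_b), {}).get(word_a, set())
    backward = word_a in TRANSLATIONS.get((lang_b, lang_a), {}).get(word_b, set())
    return forward and backward


def is_clique(members):
    """True if every pair of (language, word) members is symmetrical."""
    return all(is_symmetrical(a, b) for a, b in combinations(members, 2))
```

In this toy data, `music`, `musica` and `musik` form a three-language clique, while `theory` and `teori` do not even form a symmetrical pair because the back-translation is missing.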
26
Homage