Published by Kory Henry. Modified over 9 years ago.
1
GDEX: Automatically finding good dictionary examples in a corpus
Adam Kilgarriff, Miloš Husák, Katy McAdam, Michael Rundell, Pavel Rychlý
Lexical Computing Ltd, UK
Masaryk University, Czech Republic
A&C Black Publishers Ltd., UK
Macmillan Education, UK
Lexicography MasterClass Ltd., UK
2
Users appreciate examples
Paper: space constraints
Electronic: no space constraints
  Give lots of examples
  Constraint: cost of selection, editing
3
Project
Macmillan English Dictionary
Licensing arrangement with A&C Black
Already had 1000 collocation boxes
  See collocationality paper, ELX 2006
  Average 8 per box
New electronic version
  All 8000 collocations need examples
  Authentic; from corpus
4
Old method
Lexicographer
  Gets concordance for collocation
  Reads through until they find a good example
  Cut, paste, edit
5
New method
Lexicographer
  Gets sorted concordance
  20 best examples in spreadsheet
  Less reading through
  Tick the first good one, edit
6
What makes a good example?
Readable
  EFL users
Informative
  Typical, for the collocation
  Gives context which helps user understand the target word/phrase
7
Readability
70 years of research
Not just (or mainly) EFL
  Educational theory
  Teaching children to read
  Instruction manuals
  Publishing
8
Readability tests
Flesch Reading Ease test (1948)
  Average sentence length, average word length
  In some word processing software
Many similar measures
Recent work
  Language modelling from training data
Target levels
  US grades
  Common European Framework
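As an illustration of the kind of measure this slide refers to, here is a minimal Python sketch of the Flesch Reading Ease formula, which combines average sentence length (words per sentence) with average word length (syllables per word). The syllable counter is a rough vowel-group heuristic and the tokenisation is a simple regular expression; both are assumptions made for this sketch, not part of GDEX or of any particular word processor.

```python
import re


def count_syllables(word: str) -> int:
    """Rough estimate: one syllable per run of vowels (an assumption, not a dictionary lookup)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    # Split on sentence-final punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


easy = flesch_reading_ease("The cat sat on the mat.")
hard = flesch_reading_ease("Institutional considerations necessitate comprehensive reorganisation.")
```

Short sentences made of short words score high; long, polysyllabic sentences score low, so `easy` comes out well above `hard`.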
9
GDEX
Get concordance for collocation
For each sentence
  Score it
Sort
Show best ones
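The score, sort, show-best loop on this slide can be sketched in a few lines of Python. The function names and the toy scoring function are hypothetical, introduced only for illustration; the cut-off of 20 follows the "20 best examples" mentioned on the "New method" slide.

```python
from typing import Callable, List


def best_examples(
    concordance: List[str],
    score: Callable[[str], float],
    n: int = 20,
) -> List[str]:
    """Score every concordance line, sort best first, and keep the top n."""
    return sorted(concordance, key=score, reverse=True)[:n]


def toy_length_score(sentence: str) -> float:
    """Hypothetical stand-in scorer: prefers sentences of about 18 words."""
    return -abs(len(sentence.split()) - 18)


lines = [
    "Too short.",
    " ".join(["word"] * 18) + ".",
    " ".join(["word"] * 40) + ".",
]
top = best_examples(lines, toy_length_score, n=2)
```

Any scoring function can be plugged in; the sorting step does not need to know which heuristics produced the score.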
10
GDEX heuristics
Sentence length (10-26 words)
Mostly common words: good
  Rare words: bad
Sentences
  Start with capital, end with one of . ! ?
  No [, ],, http, \
Penalise:
  Other punctuation, numbers
  More than 2 or 3 capitals
Typicality: third collocate is a plus
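The heuristics listed on this slide could be turned into per-sentence features roughly as follows. This is a sketch under stated assumptions, not the GDEX implementation: the feature names, the 0/1 scoring, the tokenisation, the set of "other punctuation" characters, the capitals threshold of 3, and the `common_words` and `collocates` inputs are all hypothetical choices made for illustration.

```python
import re


def gdex_features(sentence: str, common_words: set, collocates: set = frozenset()) -> dict:
    """Compute one value per heuristic; higher is better for every feature."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    lowered = [t.lower() for t in tokens]
    n = len(tokens)
    stripped = sentence.rstrip()
    return {
        # Sentence length between 10 and 26 words.
        "length_ok": 1.0 if 10 <= n <= 26 else 0.0,
        # Share of tokens found in a list of common words (list supplied by the caller).
        "common_ratio": sum(t in common_words for t in lowered) / n if n else 0.0,
        # Starts with a capital letter and ends with . ! or ?
        "well_formed": 1.0 if sentence[:1].isupper() and stripped.endswith((".", "!", "?")) else 0.0,
        # No square brackets, backslashes, or "http".
        "no_junk": 0.0 if re.search(r"[\[\]\\]|http", sentence) else 1.0,
        # Assumed set of "other punctuation" to penalise.
        "plain_punctuation": 0.0 if re.search(r"[;:(){}<>*#@/|]", sentence) else 1.0,
        # No digits.
        "no_digits": 0.0 if re.search(r"\d", sentence) else 1.0,
        # At most 3 capitalised tokens (the slide says "more than 2 or 3" is penalised).
        "few_capitals": 1.0 if sum(t[0].isupper() for t in tokens) <= 3 else 0.0,
        # Typicality: another known collocate of the headword also appears.
        "extra_collocate": 1.0 if any(t in collocates for t in lowered) else 0.0,
    }


good = gdex_features(
    "She made a strong case for the new policy at the meeting yesterday.",
    common_words={"the", "a", "for", "at"},
    collocates={"strong"},
)
junk = gdex_features("See [1] at http://example.com", common_words={"the", "a", "for", "at"})
```

Here `collocates` is meant to hold other collocates of the headword, excluding the two words of the target collocation itself, so a hit corresponds to the "third collocate" on the slide.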
11
Weighting
For each sentence
  Score on each heuristic
  Weight scores
  Add together: weighted score
How to set weights?
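The weighting step amounts to a weighted sum over the heuristic scores. A minimal sketch follows; the feature names and weight values are made up for illustration and are not the weights used in the project.

```python
def weighted_score(features: dict, weights: dict) -> float:
    """Multiply each heuristic score by its weight and add the results.

    A heuristic with no entry in `weights` contributes nothing.
    """
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical heuristic scores for one sentence and hypothetical weights.
features = {"length_ok": 1.0, "common_ratio": 0.5, "extra_collocate": 0.0}
weights = {"length_ok": 2.0, "common_ratio": 4.0, "extra_collocate": 1.0}
total = weighted_score(features, weights)
```

With these numbers the total is 2.0 × 1.0 + 4.0 × 0.5 + 1.0 × 0.0 = 4.0.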
12
Machine learning
Two students:
  Manually judged 1000 “good examples”
Weights set to mimic students' choices
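The slides do not say which learning method was used. Purely as an illustration of "setting weights to mimic human choices", the sketch below trains a simple perceptron on heuristic scores paired with good/bad judgements. The perceptron, the two-feature toy data, and all names are assumptions made for this example, not the project's actual procedure.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 (good example) if the weighted score is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0


def train_perceptron(
    examples: List[Sequence[float]],
    labels: List[int],
    epochs: int = 20,
    learning_rate: float = 1.0,
) -> Tuple[List[float], float]:
    """Adjust weights whenever the prediction disagrees with the human label."""
    weights = [0.0] * len(examples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            error = y - predict(weights, bias, x)
            if error:
                weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
                bias += learning_rate * error
    return weights, bias


# Toy data: two hypothetical heuristic scores per sentence (length_ok, no_junk);
# the imagined judge accepts a sentence only when both are satisfied.
examples = [(1.0, 1.0), (1.0, 0.0), (0.0, 1.0), (0.0, 0.0)]
labels = [1, 0, 0, 0]
weights, bias = train_perceptron(examples, labels)
predictions = [predict(weights, bias, x) for x in examples]
```

On this toy data the learned weights reproduce all four judgements. Real judgements would be noisier, and a different learner might suit them better.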
13
Was it successful?
Did it save lexicographer time?
  Definitely (says project manager)
Corpus choice
  Started with BNC but
    Too old
    Not enough examples
    If no good examples in corpus, GDEX can’t help
  Changed to UKWaC
    20 times bigger; from web; contemporary
    Better
    Most web junk filtered out
Usually a good example in top twenty
14
GDEX and TALC
TALC
  Teaching and Language Corpora
  Goal: bring corpora into language teaching
Usual problem
  Concordances are tough for learners to read
Way forward
  GDEX examples
  Half way between dictionary and corpus
15
GDEX: Models for use
More examples for dictionaries
  Speed up, as with MED
  or
  Fully automatic “more examples”
Corpus query tool
  Sort concordances, best first
  Now an option in the Sketch Engine
Automatic collocations dictionary
  http://forbetterenglish.com