Published by Mariah Ross. Modified over 8 years ago.
1
GDEX: Automatically finding good dictionary examples in a corpus
2
Madrid 2010, Kilgarriff: GDEX
Users appreciate examples
Paper: space constraints
Electronic: no space constraints
  Give lots of examples
Constraint: cost of selection, editing
3
Project
Macmillan English Dictionary
  Already had 1000 collocation boxes
  Average 8 per box
New electronic version
  All 8000 collocations need examples
  Authentic; from corpus
4
Old method
Lexicographer
  Gets concordance for collocation
  Reads through until they find a good example
  Cut, paste, edit
5
New method
Lexicographer
  Gets sorted concordance
  20 best examples in spreadsheet
  Less reading through
  Tick the first good one, edit
6
What makes a good example?
Readable
  EFL users
Informative
  Typical, for the collocation
  Gives context which helps user understand the target word/phrase
7
Readability
70 years of research
Not just (or mainly) EFL
  Educational theory
    Teaching children to read
  Instruction manuals
    Early work: US military
  Publishing
    People like newspapers and magazines that they find easy to read
8
Readability tests
Flesch Reading Ease test, 1948
  Average sentence length, average word length
  In some word processing software
Many similar measures
Recent work
  Training data for different reading levels
  Language modelling
Target levels
  US grades
  Now, increasingly: Common European Framework
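As an illustration of the kind of measure this slide refers to, here is a minimal Python sketch of the Flesch Reading Ease score. The published formula measures word length in syllables per word; the vowel-run syllable counter and the regex-based sentence and word splitting below are rough assumptions made for this sketch, not part of the slides.

```python
import re

def count_syllables(word: str) -> int:
    """Crude estimate: one syllable per run of vowels (an assumption for this sketch)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    # Naive segmentation: split sentences on . ! ? and words on letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

easy = flesch_reading_ease("The cat sat on the mat.")
hard = flesch_reading_ease(
    "Multidimensional considerations necessitate comprehensive institutional reorganisation."
)
print(round(easy, 3), easy > hard)
```

Short sentences made of short words score high; long, polysyllabic sentences score low.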
9
GDEX
Get concordance for collocation
For each sentence
  Score it
Sort
Show best ones to lexicographer
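The score, sort, show loop can be sketched in a few lines of Python. The function name `best_examples` and the placeholder scorer `toy_score` are hypothetical names for this illustration; the real GDEX scoring is not reproduced here. The default of 20 follows the "20 best examples" figure from the earlier slide.

```python
import heapq
from typing import Callable, List, Tuple

def best_examples(
    sentences: List[str],
    score: Callable[[str], float],
    n: int = 20,
) -> List[Tuple[float, str]]:
    """Score every concordance line and return the n highest, best first."""
    scored = [(score(s), s) for s in sentences]
    return heapq.nlargest(n, scored, key=lambda pair: pair[0])

def toy_score(sentence: str) -> float:
    """Placeholder scorer (assumption): prefer sentences close to 18 words."""
    return -abs(len(sentence.split()) - 18)

concordance = [
    " ".join(["word"] * 18),
    " ".join(["word"] * 5),
    " ".join(["word"] * 40),
]
for value, sentence in best_examples(concordance, toy_score, n=2):
    print(value, len(sentence.split()))
```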
10
GDEX heuristics
Sentence length (10-26 words)
Mostly common words is good
  Rare words are bad
Sentences
  Start with capital, end with one of . ! ?
  No [, ], http, \
  Not much other punctuation, numbers
  Not too many capitals
Typicality: third collocate is a plus
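Several of these heuristics are simple surface checks, which the following Python sketch tries to make concrete. Everything named here (`check_sentence`, `COMMON_WORDS`, the feature names) is hypothetical; a real system would presumably draw common words from a corpus frequency list, and the third-collocate check is omitted because it needs collocation data.

```python
from typing import Dict, Set, Union

# Tiny stand-in for a frequency-based common-word list (assumption).
COMMON_WORDS: Set[str] = {"the", "and", "at", "old", "man", "along", "river"}

BAD_SUBSTRINGS = ("[", "]", "http", "\\")

def check_sentence(sentence: str, common_words: Set[str]) -> Dict[str, Union[bool, float, int]]:
    """Compute a few surface features loosely modelled on the listed heuristics."""
    tokens = sentence.split()
    words = [t.strip(".,;:!?\"'()").lower() for t in tokens]
    words = [w for w in words if w]
    stripped = sentence.strip()
    return {
        # Sentence length between 10 and 26 words
        "length_ok": 10 <= len(tokens) <= 26,
        # Starts with a capital and ends with . ! or ?
        "whole_sentence": stripped[:1].isupper() and stripped.endswith((".", "!", "?")),
        # None of the penalised character sequences
        "no_bad_chars": not any(bad in sentence for bad in BAD_SUBSTRINGS),
        # Share of words found in the common-word list
        "common_word_ratio": (
            sum(w in common_words for w in words) / len(words) if words else 0.0
        ),
        # Raw counts that a scorer could penalise
        "capital_count": sum(c.isupper() for c in sentence),
        "digit_count": sum(c.isdigit() for c in sentence),
    }

good = check_sentence(
    "The old man walked slowly along the river and looked at the boats.", COMMON_WORDS
)
bad = check_sentence("see http://example.com [1]", COMMON_WORDS)
print(good)
print(bad)
```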
11
Weighting
For each sentence
  Score on each heuristic
  Weight scores
  Add together: weighted score
How to set weights?
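The weighted score described here is a plain linear combination. A minimal Python sketch follows; the feature names and weight values are made up for illustration and are not the weights used in the project.

```python
from typing import Dict

def weighted_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of weight * feature value; features without a weight contribute nothing."""
    return sum(weights.get(name, 0.0) * float(value) for name, value in features.items())

# Hypothetical feature scores for one sentence and hypothetical weights.
features = {"length_ok": 1.0, "common_word_ratio": 0.5, "bad_chars": 0.0}
weights = {"length_ok": 2.0, "common_word_ratio": 4.0, "bad_chars": -5.0}

print(weighted_score(features, weights))
```

Negative weights let a heuristic act as a penalty, for example for sentences containing unwanted characters.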
12
Machine learning
Two students
  Manually judged 1000 “good examples”
Weights set so that the system made the same choices as the students
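The slides do not say which learning method was used. Purely as an illustration of fitting weights to human judgements, the sketch below uses a simple perceptron, a technique swapped in for this example; the feature vectors and labels are toy data, not the students' annotations.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, 1 = judged good, 0 = judged bad)

def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Return 1 if the weighted score is positive, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train_perceptron(
    samples: List[Sample], epochs: int = 20, lr: float = 0.1
) -> Tuple[List[float], float]:
    """Adjust weights whenever the prediction disagrees with the human label."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in samples:
            error = label - predict(weights, bias, x)
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias

# Toy data (assumption): features are [length_ok, common_word_ratio].
toy_samples: List[Sample] = [
    ([1.0, 0.9], 1),
    ([1.0, 0.8], 1),
    ([0.0, 0.3], 0),
    ([0.0, 0.2], 0),
]
learned_weights, learned_bias = train_perceptron(toy_samples)
print(all(predict(learned_weights, learned_bias, x) == y for x, y in toy_samples))
```

On linearly separable judgements like these, the perceptron ends up agreeing with every label; real annotations are noisier, so a method that tolerates disagreement would likely be needed.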
13
Was it successful?
Did it save lexicographer time?
  Definitely (says project manager)
Corpus choice
  Started with BNC but
    Too old
    Not enough examples
    If no good examples in corpus, GDEX can’t help
  Changed to UKWaC
    20 times bigger; from web; contemporary
    Better
    Most web junk filtered out
Usually a good example in top twenty
14
GDEX and TALC
TALC (Teaching and Language Corpora)
  Goal: bring corpora into language teaching
Usual problem
  Concordances are tough for learners to read
Way forward
  GDEX examples
  Half way between dictionary and corpus
15
GDEX: Models for use
More examples for dictionaries
  Speed up, as with MED
  or
  Fully automatic “more examples”
Corpus query tool
  Sort concordances, best first
  Now an option in the Sketch Engine
Automatic collocations dictionary
  http://forbetterenglish.com