GDEX: Automatically finding good dictionary examples in a corpus Adam Kilgarriff, Miloš Husák, Katy McAdam, Michael Rundell, Pavel Rychlý Lexical Computing.

1 GDEX: Automatically finding good dictionary examples in a corpus Adam Kilgarriff, Miloš Husák, Katy McAdam, Michael Rundell, Pavel Rychlý Lexical Computing Ltd, UK Masaryk University, Czech Rep A&C Black Publishers Ltd., UK Macmillan Education, UK Lexicography MasterClass Ltd., UK

2 Users appreciate examples  Paper: space constraints  Electronic: no space constraints Give lots of examples Constraint: Cost of selection, editing

3 Project  Macmillan English dictionary Licensing arrangement with A&C Black  Already had 1000 collocation boxes See collocationality paper, ELX 2006  Average 8 per box  New electronic version All 8000 collocations need examples  Authentic; from corpus

4 Old method  Lexicographer Gets concordance for collocation Reads through until they find a good example Cut, paste, edit

5 New method  Lexicographer Gets sorted concordance  20 best examples in spreadsheet Less reading through Tick the first good one, edit

6 What makes a good example?  Readable EFL users  Informative Typical, for the collocation Gives context which helps user understand the target word/phrase

7 Readability  70 years of research  Not just (or mainly) EFL Educational theory  Teaching children to read Instruction manuals Publishing

8 Readability tests  Flesch Reading Ease test (1948) Ave sentence length, ave word length In some word processing software  Many similar measures  Recent work Language modelling from training data  Target levels US grades Common European Framework
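As an illustration of the Flesch Reading Ease test mentioned on this slide, here is a minimal sketch of the standard formula, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The syllable counter is a crude vowel-group approximation introduced only for this example; the function names are hypothetical and are not part of GDEX.

```python
import re


def count_syllables(word):
    # Crude assumption: one syllable per run of vowels, at least one per word.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    # Split into sentences on . ! ? and into words on letter runs.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    # Higher scores mean easier text.
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


easy = flesch_reading_ease("The cat sat on the mat.")
hard = flesch_reading_ease(
    "Institutional accountability necessitates comprehensive documentation."
)
```

Short sentences of short words score high; long sentences of polysyllabic words score low, which is the intuition the slide summarises as average sentence length and average word length.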

9 GDEX  Get concordance for collocation  For each sentence Score it Sort Show best ones
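The score, sort and show steps on this slide can be sketched as follows. This is only an outline under stated assumptions: `best_examples` and its `score` parameter are hypothetical names, and the default of 20 is taken from the "20 best examples" mentioned on the "New method" slide.

```python
def best_examples(sentences, score, top_n=20):
    """Return the top_n sentences, highest score first.

    `score` is any function mapping a sentence to a number; the real
    GDEX scoring function is not reproduced here.
    """
    ranked = sorted(sentences, key=score, reverse=True)
    return ranked[:top_n]


# Toy usage: pretend that longer sentences are better.
concordance = ["a", "abc", "ab"]
top_two = best_examples(concordance, score=len, top_n=2)
```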

10 GDEX heuristics  Sentence length (10-26 words)  Mostly common words: good  Rare words: bad  Sentences Start with capital, end with one of . ! ?  No [, ],, http, \  Penalise: Other punctuation, numbers More than 2 or 3 capitals  Typicality: third collocate is a plus
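Several of these heuristics can be expressed as simple per-sentence features. The sketch below is an assumption-laden illustration, not the GDEX implementation: the feature names, the whitespace tokenisation, the capital-letter threshold of 3 and the tiny common-word list are all chosen for the example, and the typicality heuristic (third collocate) is left out because it needs corpus data.

```python
def sentence_features(sentence, common_words):
    tokens = sentence.split()
    # Lower-cased words with surrounding punctuation stripped.
    words = [t.strip(".,!?;:\"'()").lower() for t in tokens]
    words = [w for w in words if w]
    return {
        # Sentence length between 10 and 26 words.
        "good_length": 10 <= len(tokens) <= 26,
        # Starts with a capital and ends with one of . ! ?
        "whole_sentence": sentence[:1].isupper()
        and sentence.rstrip().endswith((".", "!", "?")),
        # None of the junk markers listed on the slide.
        "no_junk": not any(m in sentence for m in ("[", "]", "http", "\\")),
        # Share of words found in the (hypothetical) common-word list.
        "common_ratio": (
            sum(w in common_words for w in words) / len(words) if words else 0.0
        ),
        # Assumed threshold: at most 3 capital letters.
        "few_capitals": sum(c.isupper() for c in sentence) <= 3,
        "no_digits": not any(c.isdigit() for c in sentence),
    }


common = {"the", "a", "and", "after", "long"}  # illustrative only
good = sentence_features(
    "The committee reached a broad consensus after a long and difficult debate.",
    common,
)
junk = sentence_features("see [1] at http://x", common)
```

A real common-word list would come from corpus frequency counts rather than a hand-written set.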

11 Weighting  For each sentence Score on each heuristic Weight scores Add together weighted score  How to set weights?
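The weighting step amounts to a weighted sum over the heuristic scores. A minimal sketch, assuming each heuristic yields a number or a boolean; the feature names and weight values are made up for illustration:

```python
def weighted_score(feature_values, weights):
    # Booleans count as 1.0 (True) or 0.0 (False).
    return sum(
        weights[name] * float(value) for name, value in feature_values.items()
    )


example_features = {"good_length": True, "common_ratio": 0.5}
example_weights = {"good_length": 2.0, "common_ratio": 4.0}  # hypothetical
total = weighted_score(example_features, example_weights)
```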

12 Machine learning  Two students: Manually judged 1000 “good examples” Weights  set to mimic students' choices
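The slide does not say which learning method was used. As one possible way to set weights that mimic human judgements, the sketch below trains a simple perceptron on labelled feature vectors. The perceptron, the two toy features and the four training examples are assumptions made for this illustration, not the method or data of the project.

```python
def predict(weights, bias, x):
    # 1 = "good example", 0 = "not good".
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias > 0 else 0


def train_weights(examples, epochs=20, learning_rate=1.0):
    """examples: list of (feature_vector, label) pairs, label 1 or 0."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in examples:
            error = label - predict(weights, bias, x)
            if error:
                # Nudge the weights towards the human judgement.
                weights = [w + learning_rate * error * v for w, v in zip(weights, x)]
                bias += learning_rate * error
    return weights, bias


# Toy judgements over two features: [good_length, has_junk].
judged = [([1, 0], 1), ([1, 1], 0), ([0, 0], 0), ([0, 1], 0)]
learned_weights, learned_bias = train_weights(judged)
```

On this toy data the learned weight for good length comes out positive and the one for junk negative, which is the kind of behaviour one would want the weights to pick up from the judged examples.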

13 Was it successful?  Did it save lexicographer time? Definitely (says project manager)  Corpus choice Started with BNC but  Too old  Not enough examples If no good examples in corpus, GDEX can’t help Changed to UKWaC  20 times bigger; from web; contemporary  Better  Most web junk filtered out  Usually a good example in top twenty

14 GDEX and TALC  TALC Teaching and Language Corpora  Goal: bring corpora into language teaching  Usual problem Concordances are tough for learners to read  Way forward GDEX examples Half way between dictionary and corpus

15 GDEX: Models for use  More examples for dictionaries Speed up, as with MED or Fully automatic “more examples”  Corpus query tool Sort concordances, best first Now an option in the Sketch Engine  Automatic collocations dictionary http://forbetterenglish.com
