GDEX: Automatically finding good dictionary examples in a corpus.

1 GDEX: Automatically finding good dictionary examples in a corpus

2 Users appreciate examples (Madrid 2010; Kilgarriff: GDEX)
- Paper: space constraints
- Electronic: no space constraints
  - Give lots of examples
  - Constraint: cost of selection, editing

3 Project
- Macmillan English Dictionary
- Already had 1000 collocation boxes
- Average 8 per box
- New electronic version
  - All 8000 collocations need examples
    - Authentic; from corpus

4 Old method
- Lexicographer
  - Gets concordance for collocation
  - Reads through until they find a good example
  - Cut, paste, edit

5 New method
- Lexicographer
  - Gets sorted concordance
    - 20 best examples in spreadsheet
  - Less reading through
  - Tick the first good one, edit

6 What makes a good example?
- Readable
  - EFL users
- Informative
  - Typical, for the collocation
  - Gives context which helps user understand the target word/phrase

7 Readability
- 70 years research
- Not just (or mainly) EFL
  - Educational theory
    - Teaching children to read
  - Instruction manuals
    - Early work: US military
  - Publishing
    - People like newspapers and magazines that they find easy to read

8 Readability tests
- Flesch Reading Ease test, 1948
  - Average sentence length, average word length
  - In some word processing software
- Many similar measures
- Recent work
  - Training data for different reading levels
  - Language modelling
- Target levels
  - US grades
  - Now, increasingly: Common European Framework
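
As an illustration of the kind of measure this slide refers to, here is a minimal sketch of the Flesch Reading Ease score. The published formula measures word length in syllables; the vowel-group syllable counter and the regex tokenisation below are rough assumptions made for this sketch, not part of the test itself or of the talk.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (an assumption, not a real syllabifier)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores mean easier text."""
    # Naive splitting on end punctuation and alphabetic runs (assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    avg_sentence_len = len(words) / len(sentences)
    avg_syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * avg_sentence_len - 84.6 * avg_syllables_per_word
```

Short sentences of short words score high; long sentences of polysyllabic words score low.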

9 GDEX
- Get concordance for collocation
- For each sentence
  - Score it
  - Sort
  - Show best ones to lexicographer
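
The score, sort, show loop on this slide can be sketched as below. The function name `best_examples` and the pluggable `score` argument are hypothetical; the default of 20 follows the "20 best examples" mentioned on the New method slide.

```python
from typing import Callable, List


def best_examples(
    concordance: List[str],
    score: Callable[[str], float],
    n: int = 20,
) -> List[str]:
    """Score every concordance line, sort best first, keep the top n."""
    # sorted() is stable, so lines with equal scores keep their corpus order.
    ranked = sorted(concordance, key=score, reverse=True)
    return ranked[:n]
```

Any function that maps a sentence to a number can be passed as `score`, which keeps the ranking step independent of how the heuristics are defined.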

10 GDEX heuristics
- Sentence length (10-26 words)
- Mostly common words is good
- Rare words are bad
- Sentences
  - Start with capital, end with one of . ! ?
- No [, ],, http, \
- Not much other punctuation, numbers
- Not too many capitals
- Typicality: third collocate is a plus
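
A minimal sketch of how heuristics like these might be turned into per-sentence features. The 10-26 word range and the banned strings come from the slide; the thresholds for "not much" punctuation and "not too many" capitals, the `common_words` set, and all names are assumptions for illustration. The typicality heuristic is left out because it needs collocation data.

```python
import re
from typing import Dict, Set

# Strings the slide lists as unwanted ("\\" is a single backslash).
BANNED = ("[", "]", "http", "\\")


def heuristic_features(sentence: str, common_words: Set[str]) -> Dict[str, float]:
    """Return one value in [0, 1] per heuristic; higher is better."""
    stripped = sentence.strip()
    tokens = re.findall(r"[A-Za-z']+", stripped)
    n = len(tokens)
    common = sum(1 for t in tokens if t.lower() in common_words)
    # Digits and punctuation, ignoring the final character (the end mark).
    noise = sum(
        1 for ch in stripped[:-1]
        if ch.isdigit() or (not ch.isalnum() and not ch.isspace())
    )
    # Capitals other than the sentence-initial one.
    capitals = sum(1 for ch in stripped[1:] if ch.isupper())
    return {
        "length_ok": 1.0 if 10 <= n <= 26 else 0.0,
        "common_ratio": common / n if n else 0.0,
        "well_formed": 1.0 if stripped[:1].isupper()
        and stripped.endswith((".", "!", "?")) else 0.0,
        "no_banned": 0.0 if any(b in stripped for b in BANNED) else 1.0,
        "low_noise": 1.0 if noise <= 2 else 0.0,      # threshold is an assumption
        "few_capitals": 1.0 if capitals <= 2 else 0.0,  # threshold is an assumption
    }
```

Returning a dictionary keeps each heuristic separately visible, so the values can be weighted and combined in a later step.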

11 Weighting
- For each sentence
  - Score on each heuristic
  - Weight scores
  - Add together weighted score
- How to set weights?
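
The weighting step amounts to a weighted sum over heuristic scores. A minimal sketch follows, with hypothetical feature names and weights; the actual weights used in GDEX are not given in these slides.

```python
from typing import Dict


def weighted_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Multiply each heuristic score by its weight and add the results."""
    # Heuristics without a weight contribute nothing.
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical values, for illustration only.
example_features = {"length_ok": 1.0, "common_ratio": 0.5}
example_weights = {"length_ok": 2.0, "common_ratio": 4.0}
total = weighted_score(example_features, example_weights)  # 2.0 * 1.0 + 4.0 * 0.5
```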

12 Machine learning
- Two students:
  - Manually judged 1000 “good examples”
  - Weights set so that the system made the same choices as the students
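
The slide does not say which learning method was used. As one possible illustration of setting weights so that the system agrees with human judgments, here is a simple perceptron sketch; the technique, the function names, and the data layout (feature dictionary plus a good/not-good label) are all assumptions, not the method described in the talk.

```python
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], bool]  # (heuristic scores, judged good?)


def predict(features: Dict[str, float], weights: Dict[str, float], bias: float) -> bool:
    """The system's choice: good if the weighted sum is positive."""
    return bias + sum(weights.get(k, 0.0) * v for k, v in features.items()) > 0


def train_weights(
    data: List[Example], epochs: int = 20, lr: float = 0.1
) -> Tuple[Dict[str, float], float]:
    """Perceptron updates: adjust weights only when the system disagrees with the judge."""
    weights: Dict[str, float] = {}
    bias = 0.0
    for _ in range(epochs):
        for features, is_good in data:
            if predict(features, weights, bias) != is_good:
                step = lr if is_good else -lr
                for k, v in features.items():
                    weights[k] = weights.get(k, 0.0) + step * v
                bias += step
    return weights, bias
```

A perceptron only reaches full agreement when the judgments are linearly separable in the chosen features; with real, noisy judgments some disagreement would remain.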

13 Was it successful?
- Did it save lexicographer time?
  - Definitely (says project manager)
- Corpus choice
  - Started with BNC but
    - Too old
    - Not enough examples
  - If no good examples in corpus, GDEX can’t help
  - Changed to UKWaC
    - 20 times bigger; from web; contemporary
    - Better
    - Most web junk filtered out
- Usually a good example in top twenty

14 GDEX and TALC
- TALC (Teaching and Language Corpora)
- Goal: bring corpora into language teaching
- Usual problem
  - Concordances are tough for learners to read
- Way forward
  - GDEX examples
  - Half way between dictionary and corpus

15 GDEX: Models for use
- More examples for dictionaries
  - Speed up, as with MED, or
  - Fully automatic “more examples”
- Corpus query tool
  - Sort concordances, best first
  - Now an option in the Sketch Engine
- Automatic collocations dictionary
  - http://forbetterenglish.com

