GDEX: Automatically finding good dictionary examples in a corpus.

GDEX: Automatically finding good dictionary examples in a corpus Adam Kilgarriff, Miloš Husák, Katy McAdam, Michael Rundell, Pavel Rychlý Lexical Computing.

GDEX: Automatically finding good dictionary examples in a corpus

Slide 2 (Madrid 2010, Kilgarriff: GDEX): Users appreciate examples
• Paper: space constraints
• Electronic: no space constraints
  - Give lots of examples
  - Constraint: cost of selection, editing

Slide 3: Project
• Macmillan English Dictionary
• Already had 1000 collocation boxes
• Average 8 per box
• New electronic version
  - All 8000 collocations need examples
• Authentic; from corpus

Slide 4: Old method
• Lexicographer
  - Gets concordance for collocation
  - Reads through until they find a good example
  - Cut, paste, edit

Slide 5: New method
• Lexicographer
  - Gets sorted concordance
• 20 best examples in spreadsheet
  - Less reading through
  - Tick the first good one, edit

Slide 6: What makes a good example?
• Readable
  - EFL users
• Informative
  - Typical, for the collocation
  - Gives context which helps user understand the target word/phrase

Slide 7: Readability
• 70 years of research
• Not just (or mainly) EFL
  - Educational theory
    - Teaching children to read
  - Instruction manuals
    - Early work: US military
  - Publishing
    - People like newspapers and magazines that they find easy to read

Slide 8: Readability tests
• Flesch Reading Ease test, 1948
  - Average sentence length, average word length
  - In some word processing software
• Many similar measures
• Recent work
  - Training data for different reading levels
  - Language modelling
• Target levels
  - US grades
  - Now, increasingly: Common European Framework
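
As an illustration of the kind of measure listed above, the Flesch Reading Ease score can be computed from three counts. This is a minimal sketch using the standard published constants; the function name is our own, and counting syllables reliably is a separate problem that is deliberately left to the caller.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Flesch Reading Ease: higher scores mean easier text.

    Combines average sentence length (words per sentence) with
    average word length (syllables per word).
    """
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("word and sentence counts must be positive")
    avg_sentence_length = total_words / total_sentences
    avg_syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_syllables_per_word
```

Longer sentences and longer words both lower the score, which matches the two factors named on the slide.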

Slide 9: GDEX
• Get concordance for collocation
• For each sentence
  - Score it
  - Sort
  - Show best ones to lexicographer
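
The score, sort, and show loop above can be sketched in a few lines. This is an illustration only, not the GDEX implementation: `best_examples` and the toy `length_score` are hypothetical names, and the target of 18 words is an assumption (the midpoint of the 10-26 range mentioned in the talk).

```python
from typing import Callable, List


def best_examples(sentences: List[str], score: Callable[[str], float], n: int = 20) -> List[str]:
    """Score every concordance sentence, sort best first, keep the top n."""
    return sorted(sentences, key=score, reverse=True)[:n]


def length_score(sentence: str, target: int = 18) -> float:
    """Toy scorer (assumption): prefer sentences close to `target` words."""
    return -abs(len(sentence.split()) - target)
```

Any scoring function can be plugged in; the default of 20 mirrors the "20 best examples" handed to the lexicographer.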

Slide 10: GDEX heuristics
• Sentence length (10-26 words)
• Mostly common words is good
• Rare words are bad
• Sentences
  - Start with a capital, end with one of . ! ?
• No [, ], http, \
• Not much other punctuation, numbers
• Not too many capitals
• Typicality: third collocate is a plus
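
A few of these heuristics can be expressed as simple per-sentence features. The sketch below is an assumption-laden illustration rather than the GDEX code: the feature names, the tokenisation by regular expression, and the caller-supplied `common_words` set (standing in for a real frequency list) are all hypothetical.

```python
import re
from typing import Dict, Set

BANNED = ("[", "]", "http", "\\")  # strings the slide says a good example avoids


def heuristic_scores(sentence: str, common_words: Set[str]) -> Dict[str, float]:
    """Return one score in [0, 1] per heuristic for a single sentence."""
    stripped = sentence.strip()
    tokens = re.findall(r"[A-Za-z']+", stripped)
    n = len(tokens)
    common = sum(1 for t in tokens if t.lower() in common_words)
    well_formed = bool(stripped) and stripped[0].isupper() and stripped[-1] in ".!?"
    return {
        # 1.0 when the sentence has 10-26 words
        "length_ok": 1.0 if 10 <= n <= 26 else 0.0,
        # share of tokens found in the common-word list
        "common_ratio": common / n if n else 0.0,
        # starts with a capital and ends with . ! or ?
        "well_formed": 1.0 if well_formed else 0.0,
        # 1.0 when none of the banned strings occur
        "no_banned": 0.0 if any(b in stripped for b in BANNED) else 1.0,
    }
```

The remaining heuristics (punctuation and number counts, capitals, a third collocate) would follow the same pattern of one named score per sentence.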

Slide 11: Weighting
• For each sentence
  - Score on each heuristic
  - Weight scores
  - Add together weighted scores
• How to set weights?
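
The weighting step amounts to a weighted sum of the per-heuristic scores. A minimal sketch, assuming scores and weights are held in dictionaries keyed by heuristic name (a representation chosen here for illustration; the weights in the usage example are made up):

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Multiply each heuristic score by its weight and add the results.

    Heuristics with no entry in `weights` contribute nothing.
    """
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


# Usage with made-up numbers: 2.0 * 1.0 + 4.0 * 0.5 = 4.0
example = weighted_score(
    {"length_ok": 1.0, "common_ratio": 0.5},
    {"length_ok": 2.0, "common_ratio": 4.0},
)
```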

Slide 12: Machine learning
• Two students
  - Manually judged 1000 “good examples”
  - Weights set so that the system made the same choices as the students
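
The talk does not say which learning method was used to set the weights. As one possible stand-in, the sketch below fits weights with a plain perceptron: each sentence is a vector of heuristic scores, and each label is 1 if a judge marked it a good example and 0 otherwise. The function names and the data layout are assumptions for illustration.

```python
from typing import List, Sequence


def predict(weights: Sequence[float], features: Sequence[float]) -> int:
    """Return 1 if the weighted sum (plus bias, stored last) is positive."""
    xs = list(features) + [1.0]
    return 1 if sum(w * x for w, x in zip(weights, xs)) > 0 else 0


def fit_weights(features: Sequence[Sequence[float]], labels: Sequence[int],
                epochs: int = 50, lr: float = 1.0) -> List[float]:
    """Perceptron training: nudge weights whenever a prediction disagrees
    with the human judgement. The last returned element is the bias."""
    weights = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(features, labels):
            error = y - predict(weights, x)
            if error != 0:
                xs = list(x) + [1.0]
                for i, xi in enumerate(xs):
                    weights[i] += lr * error * xi
    return weights
```

A perceptron only reproduces the judges' choices exactly when the labelled data are linearly separable; with real judgements some disagreement would remain.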

Slide 13: Was it successful?
• Did it save lexicographer time?
  - Definitely (says project manager)
• Corpus choice
  - Started with BNC, but
    - Too old
    - Not enough examples
  - If no good examples in corpus, GDEX can’t help
  - Changed to UKWaC
    - 20 times bigger; from web; contemporary
    - Better
    - Most web junk filtered out
    - Usually a good example in top twenty

Slide 14: GDEX and TALC
• TALC (Teaching and Language Corpora)
• Goal: bring corpora into language teaching
• Usual problem
  - Concordances are tough for learners to read
• Way forward
  - GDEX examples
  - Halfway between dictionary and corpus

Slide 15: GDEX: Models for use
• More examples for dictionaries
  - Speed up, as with MED, or
  - Fully automatic “more examples”
• Corpus query tool
  - Sort concordances, best first
  - Now an option in the Sketch Engine
• Automatic collocations dictionary