Making useful wordlists for ELT: topical vocabulary from the WWW
Simon Smith & Scott Sommers, Ming Chuan University, Taipei
Adam Kilgarriff, Lexical Computing Ltd, UK
Generous support from the National Science Council, Taiwan
Outline
Importance of learning natural English
Wordlists in English learning
Making relevant wordlists
Using two corpus analysis tools:
  WebBootCat
  Sketch Engine
Conclusions and future plans
The problem: learning non-authentic English
It’s raining cats and dogs!
Long time no see!
In Taiwan, all students learn these expressions.
They may believe the expressions are authentic,
but English speakers rarely use them!
Word and phrase lists
Students must learn vocabulary.
It is best to learn vocabulary through practice:
  reading
  speaking with American people
  interacting in the language
Such practice is difficult for Asian students to get,
so in Taiwan, students must learn vocabulary from lists.
From the Ministry of Education (MOE) 6,000-word high school list
Probably useful for policy makers
May be useful for teachers
Not useful for learners
Would it be better to organize wordlists by topic?
So, should we teach vocabulary by topic?
(Image: Khmer learning game © Northern Illinois University)
From the ELC textbook, Unit 1: Getting started at University
Nouns: attendance, course, facilities, helmet, initiative, major, vendor
Verbs: accomplish, consider, improve, tease
Adjectives: challenging, fortunate, impatient, occasional, protective

It is not easy to make up a good vocabulary list for an abstract topic.
Try these topics:
  Unit 1: Getting started at University
  Unit 2: Family and Hometown
  Unit 3: English and You
Please choose a topic and write down some good keywords.
Better to use a computer to help us!
Getting wordlists from the web
WebBootCat: making corpora from the web
The user chooses some seed words, for example freshman and university.
WebBootCat:
  searches Yahoo for pages containing the seed words
  throws away lists of numbers, HTML markup, price lists…
  puts all running text into a corpus
  tags the corpus with parts of speech (noun, verb, etc.) if required
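To make the query step concrete, here is a minimal sketch of how seed words might be combined into search queries. BootCaT-style tools are commonly described as sending random combinations of seed words to the search engine; the helper name `make_queries`, the tuple size of 3, the number of queries and the extra seeds campus, dormitory and semester are illustrative assumptions, not WebBootCat's actual settings. The call to Yahoo itself is left out.

```python
import itertools
import random

def make_queries(seeds, tuple_size=3, n_queries=10, rng_seed=0):
    """Hypothetical helper: combine seed words into search queries.

    Each query uses tuple_size distinct seeds (fewer if there are not
    enough seeds); up to n_queries distinct queries are returned.
    """
    unique = sorted(set(seeds))
    if not unique:
        return []
    size = min(tuple_size, len(unique))
    combos = list(itertools.combinations(unique, size))
    # Fixed random seed so the same seed words always give the same queries.
    random.Random(rng_seed).shuffle(combos)
    return [" ".join(combo) for combo in combos[:n_queries]]

# Example: the two seeds from the slide plus three extra (assumed) seeds.
seeds = ["freshman", "university", "campus", "dormitory", "semester"]
for query in make_queries(seeds, n_queries=3):
    print(query)
```

Using several seeds per query tends to favour pages that are really about the topic, rather than pages that merely mention one seed word in passing.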
WebBootCat passes the query to Yahoo!
1. The user enters seed words.
2. WebBootCat passes the query to Yahoo!
3. WebBootCat throws away pages that are not running text (e.g. pages full of strings like 12345 56789 $$$$$ £££££ *&%^).
4. WebBootCat puts the remaining text pages in the corpus.
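The filtering step can be approximated with a simple heuristic: strip the HTML, then keep a page only if it is long enough and made mostly of ordinary words. This is a hypothetical sketch using Python's standard `html.parser`; the function names and the thresholds (`min_words=50`, `min_word_ratio=0.7`) are assumptions for illustration, and WebBootCat's real cleaning procedure may differ.

```python
import re
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def html_to_text(html):
    """Strip tags and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

# A "word" is a run of letters, optionally with one apostrophe (it's, it’s).
WORD = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)?")

def looks_like_running_text(text, min_words=50, min_word_ratio=0.7):
    """Hypothetical filter: keep pages that are long and mostly words."""
    tokens = text.split()
    if not tokens:
        return False
    words = [t for t in tokens if WORD.fullmatch(t.strip(".,;:!?\"()"))]
    return len(words) >= min_words and len(words) / len(tokens) >= min_word_ratio

page = "<html><body><p>12345 56789 $$$$$ £££££ *&%^</p></body></html>"
print(looks_like_running_text(html_to_text(page)))  # False
```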
Now, we can use the Sketch Engine software to make a concordance.
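A concordance lists every occurrence of a word with its left and right context (keyword in context, or KWIC). The sketch below is a plain-Python illustration of the idea, not Sketch Engine's implementation; the function name `concordance` and the default context width of 30 characters are assumptions.

```python
import re

def concordance(text, node, width=30):
    """Keyword-in-context (KWIC) lines for each whole-word,
    case-insensitive occurrence of node in text."""
    text = " ".join(text.split())  # collapse newlines and runs of spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the node words line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

sample = ("Every freshman gets a student card. "
          "The freshman orientation starts on Monday.")
for line in concordance(sample, "freshman", width=20):
    print(line)
```

Aligning the node word in a column is what lets learners and teachers scan the typical neighbours of a word at a glance.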
Or, we can make a wordlist using WebBootCat.
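A topical wordlist is usually more useful than a raw frequency list, because words like "the" are frequent everywhere. One way to rank words is to compare their frequency per million (fpm) in the topic corpus with their frequency in a general reference corpus, adding a smoothing constant n: score = (fpm_focus + n) / (fpm_ref + n). This is similar in spirit to the "simple maths" keyword score associated with Sketch Engine, but the code below is our own sketch; n = 1, the minimum frequency of 2 and the tokenizer are assumptions.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case word tokens (letters, optionally with one apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def keywords(focus_text, reference_text, n=1.0, top=20, min_freq=2):
    """Rank words of the focus (topic) corpus by smoothed fpm ratio."""
    focus = Counter(tokenize(focus_text))
    ref = Counter(tokenize(reference_text))
    f_total = sum(focus.values()) or 1
    r_total = sum(ref.values()) or 1
    scores = {}
    for word, count in focus.items():
        if count < min_freq:  # ignore words too rare to judge
            continue
        fpm_focus = count * 1_000_000 / f_total
        fpm_ref = ref[word] * 1_000_000 / r_total
        scores[word] = (fpm_focus + n) / (fpm_ref + n)
    # Highest score first; ties broken alphabetically for a stable list.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]

focus = "the freshman met the freshman adviser at the university. the university campus"
reference = "the cat sat on the mat. the dog ran. the end."
print(keywords(focus, reference))  # ['freshman', 'university', 'the']
```

A larger n pushes the list towards common topical words; a smaller n favours rarer, more specialised ones.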
Now, we can bootstrap a new wordlist.
We use the first wordlist as seed words for the second one.
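The bootstrapping idea can be written as a loop: build a corpus from the current seeds, extract keywords, add the new ones to the wordlist, and use some of them as the next seeds. In this hypothetical sketch, corpus building and keyword extraction are passed in as functions (in practice, a web search followed by cleaning and keyword ranking). The number of rounds and seeds per round are arbitrary assumptions, and the demo uses a tiny fake "web" instead of real search results.

```python
from typing import Callable, Dict, List

def bootstrap(seeds: List[str],
              build_corpus: Callable[[List[str]], str],
              extract_keywords: Callable[[str], List[str]],
              rounds: int = 2,
              seeds_per_round: int = 5) -> List[str]:
    """Grow a wordlist by reusing each round's new keywords as seeds."""
    wordlist: List[str] = []
    current = list(seeds)
    for _ in range(rounds):
        corpus = build_corpus(current)
        # dict.fromkeys removes duplicates while keeping the ranking order.
        new_words = [w for w in dict.fromkeys(extract_keywords(corpus))
                     if w not in wordlist]
        wordlist.extend(new_words)
        next_seeds = [w for w in new_words if w not in current][:seeds_per_round]
        if not next_seeds:  # nothing new to search for: stop early
            break
        current = next_seeds
    return wordlist

# Demo with a tiny fake "web" (hypothetical data, not real search results).
FAKE_WEB: Dict[str, str] = {
    "freshman": "freshman orientation dormitory",
    "university": "university campus tuition",
    "orientation": "orientation schedule",
}

def fake_build_corpus(seed_words: List[str]) -> str:
    return " ".join(FAKE_WEB.get(s, "") for s in seed_words)

print(bootstrap(["freshman", "university"], fake_build_corpus, str.split))
# ['freshman', 'orientation', 'dormitory', 'university', 'campus', 'tuition', 'schedule']
```

Limiting the number of seeds per round keeps each new corpus close to the original topic; with too many seeds the wordlist can drift away from it.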
Now, let’s make a list of multi-word terms.
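A crude way to find candidate multi-word terms is to count adjacent word pairs (bigrams) within sentences and drop pairs that contain function words. Sketch Engine's own term extraction is more sophisticated, using part-of-speech patterns over a tagged corpus, so this is a swapped-in, simplified technique. The stopword list and the `min_freq` threshold below are assumptions.

```python
import re
from collections import Counter

# A small, illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "at", "for", "and",
             "or", "is", "are", "was", "be", "with", "by", "it", "this",
             "that", "as", "from", "your", "about"}

def multiword_terms(text, min_freq=2, top=20):
    """Rank two-word candidate terms by frequency, ignoring pairs that
    contain stopwords or cross a sentence boundary."""
    pairs = Counter()
    for sentence in re.split(r"[.!?;]+", text.lower()):
        tokens = re.findall(r"[a-z]+", sentence)
        for w1, w2 in zip(tokens, tokens[1:]):
            if w1 not in STOPWORDS and w2 not in STOPWORDS:
                pairs[(w1, w2)] += 1
    ranked = sorted((p for p, c in pairs.items() if c >= min_freq),
                    key=lambda p: (-pairs[p], p))
    return [" ".join(p) for p in ranked[:top]]

text = ("Every freshman needs a student card. The student card opens the library. "
        "Ask the student union about your student card. Student union fees are due.")
print(multiword_terms(text))  # ['student card', 'student union']
```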
Advantages of automatic wordlist creation
Automatic wordlists contain relevant, topical vocabulary.
They are created easily and conveniently.
Of course, we can still select words manually from the automatic list!
Disadvantages of manual wordlist creation
It is difficult to find the inspiration to make good wordlists manually.
Manual wordlists may include rare or unnecessary vocabulary.
Future work: automatic cloze exercise generation
Q: It’s a ___ day today!
Choose: (a) toasty (b) tepid (c) lukewarm (d) sunny
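The mechanical part of cloze generation, blanking the target word and shuffling the options, can be sketched as below. Choosing good distractors such as toasty, tepid and lukewarm is the hard part and is not addressed here: the distractors are simply passed in. The function `make_cloze` and its fixed shuffle seed are hypothetical.

```python
import random
import re

def make_cloze(sentence, answer, distractors, rng_seed=0):
    """Blank out the first whole-word occurrence of answer and return
    (stem, labelled options, answer letter)."""
    stem, n = re.subn(r"\b" + re.escape(answer) + r"\b", "___", sentence, count=1)
    if n == 0:
        raise ValueError(f"{answer!r} does not occur in the sentence")
    options = [answer, *distractors]
    random.Random(rng_seed).shuffle(options)  # fixed seed: reproducible order
    letters = "abcdefghijklmnopqrstuvwxyz"
    labelled = [f"({letters[i]}) {opt}" for i, opt in enumerate(options)]
    return stem, labelled, letters[options.index(answer)]

stem, options, key = make_cloze("It’s a sunny day today!", "sunny",
                                ["toasty", "tepid", "lukewarm"])
print("Q:", stem)
print("Choose:", " ".join(options))
print("Answer:", key)
```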
Summary: making wordlists
1. Choose a topic.
2. Get a topic corpus from the web.
3. Extract a topic wordlist from it.
4. Use recursive bootstrapping to extend the wordlist.
5. Include multi-word terms in the wordlist.
Thank you
www.sketchengine.co.uk