Using Corpora to Teach Vocabulary: Helping Students Help Themselves
What are Corpora?
Large, free, computerized databases of natural language
– Corpus of Contemporary American English (COCA)
– MICASE (Michigan Corpus of Academic Spoken English)
– MICUSP (Michigan Corpus of Upper-Level Student Papers)
– British National Corpus
Corpus Linguistics = Methodology
Bennett (2010) distinguishes three kinds of corpus-based materials:
– Corpus-influenced materials: textbooks and materials based on frequency and patterns
– Corpus-cited texts: dictionaries (Collins COBUILD) and grammar books (Real Grammar: A Corpus-Based Approach to English)
– Corpus-designed materials: created by learners or teachers using a corpus
CORPUS LEARNING 101: Pre-made Materials
Vocabulary Based on Corpus Studies
Frequency lists
– West’s General Service List (about 2,000 of the most frequent words)
– Academic Word List (570 word families; about 3,000 words)
LexTutor’s VocabProfiler
– Insert your own texts to assess their vocabulary level
West’s General Service List
1 the, 2 be, 3 of, 4 and, 5 a, 6 to, 7 in, 8 he, 9 have, 10 it, 11 that, 12 for, 13 they, 14 I, 15 with, 17 not, 18 on, 19 she, 20 at, 21 by, 22 this, 23 we, 24 you, 25 do, 26 but, 27 from, 28 or, 29 which, 30 one, 31 would
AWL
abandon, abstract, academy, access, accommodate, accompany, accumulate, accurate, achieve, acknowledge, acquire, adapt
AWL word family: Analyse
– Headword: analyse
– Members: analysers, analyses, analysing, analysis (most common), analyst, analysts, analytic, analytical, analytically, analyze, analyzed, analyzes, analyzing
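Counting by word family rather than by individual word form is what lets a list describe itself as "570 word families." The minimal Python sketch below groups tokens under their headword. It is an illustration, not how the AWL was compiled. `count_families` and `FAMILIES` are hypothetical names, and the family contains only the members shown on this slide.

```python
import re
from collections import Counter

# Members of the AWL family headed by "analyse", as listed on this slide
# (hypothetical data structure; a full list would hold all 570 families).
FAMILIES = {
    "analyse": {"analyse", "analysers", "analyses", "analysing", "analysis",
                "analyst", "analysts", "analytic", "analytical",
                "analytically", "analyze", "analyzed", "analyzes",
                "analyzing"},
}


def count_families(text, families=FAMILIES):
    """Count tokens by word family (headword); words in no family are ignored."""
    # Invert the families so each member form points back to its headword.
    lookup = {member: head for head, members in families.items()
              for member in members}
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(lookup[t] for t in tokens if t in lookup)


print(count_families("The analyst analyzed the data; the analysis took weeks."))
# Counter({'analyse': 3})
```

Three different forms (analyst, analyzed, analysis) are credited to the single family "analyse."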
General English
VocabProfiler
Why?
– Materials development
– Check the vocabulary level of web pages
– Decide which vocabulary to focus on
How?
– Create a .txt document in Word (Save As, then select .txt)
– Copy the text
– Paste the text into the VocabProfile site
– Double-click on proper nouns to exclude them
– Click Submit
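A profiler like VocabProfiler splits a text into words and reports what share of them falls into each frequency band. The Python sketch below shows that idea under stated assumptions and is not LexTutor’s code. The band sets are tiny hypothetical excerpts standing in for the full GSL and AWL lists. Matching is by exact word form, whereas the real tool works with word families.

```python
import re
from collections import Counter

# Hypothetical, heavily truncated band lists for illustration only;
# a real profile would load the complete GSL and AWL lists.
BANDS = {
    "K1": {"the", "be", "of", "and", "a", "to", "in", "he",
           "have", "it", "that", "for", "they", "i", "with"},
    "AWL": {"abandon", "access", "achieve", "acquire", "adapt", "analysis"},
}


def tokenize(text):
    """Lowercase the text and return its word tokens (keeping contractions)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def profile(text, bands=BANDS):
    """Return the percentage of tokens in each band, plus an 'Off-list' share.

    Matching is by exact word form, not by word family.
    """
    tokens = tokenize(text)
    counts = Counter()
    for tok in tokens:
        for name, words in bands.items():
            if tok in words:
                counts[name] += 1
                break  # credit the token to the first band that lists it
        else:
            counts["Off-list"] += 1
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


print(profile("The analysis of the river"))
# {'K1': 60.0, 'AWL': 20.0, 'Off-list': 20.0}
```

A high off-list percentage suggests which words in a reading may need pre-teaching.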
MS Office Shortcuts
– Ctrl + A: select all
– Ctrl + C: copy
– Ctrl + V: paste
– Ctrl + X: cut
– Ctrl + Z: undo
VocabProfiler
USING A CORPUS TO TEACH VOCABULARY: Data-Driven Learning
Knowing a Word (Nation, 2001)
Metalinguistic awareness = dictionary definition +
– spelling
– morphology
– part of speech
– pronunciation
– variant meanings
– collocations
– specific uses
– register
Data-Driven Learning (Johns, 1991)
– Learners become “language detectives” (Johns, 1991)
– Provides authentic examples and encourages “noticing” or “awareness-raising” (Römer, 2008)
Using a Corpus
Pros
– Natural language
– Learners practice analytical skills and verify their choices
– Creates self-sufficient learners
– Rich, varied contexts
– Focus on accuracy
Cons
– Significant teacher training needed
– Few ready-made exercises, and new ones are challenging to design
– Lexical information can be vast and confusing
– Contexts are incomplete
– No focus on fluency
DATA-DRIVEN LEARNING: THE CORPUS OF CONTEMPORARY AMERICAN ENGLISH
COCA
– 450 million words
– 20 million words added yearly
– 90 million spoken words
– Academic and general genres: Spoken, Fiction, Magazines, Newspapers, Academic
Academic Genres
Education, Geography/Social Science, Law/Political Science, Humanities, Philosophy/Religion, Science/Technology, Medicine, Miscellaneous
Training Yourself to Use the COCA
Brief Five-Minute Tour
Class Use
– Sign up for group access at least two days before use
– Note the group limits
– One active request at a time
– Four-hour limit
– Teacher must be a registered user
COCA Search Screen
COCA Corpus Search
Parts of Speech with KWIC (Key Words in Context)
They certainly will not grow as learners without opportunity to analyze their strengths and weaknesses.
Language Development: KWIC search
– Parts of speech are color-coded
– Students code nearby words
– Students code a 100-word sample
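A KWIC display lines up every occurrence of the search word in one column, so students can scan and code the words on either side. COCA produces this view online. The Python sketch below shows the same layout for a teacher’s or student’s own text. It does not reproduce COCA’s part-of-speech color coding. `kwic` and its `width` parameter are hypothetical names chosen for illustration.

```python
import re


def kwic(text, keyword, width=30):
    """Return Key Word in Context lines for each whole-word match of keyword.

    Left context is right-aligned to `width` characters so that every
    occurrence of the keyword starts in the same column.
    """
    flat = " ".join(text.split())  # collapse line breaks and extra spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        lines.append(f"{left:>{width}}[{match.group()}]{right}")
    return lines


sample = "Students analyze data. Teachers analyze texts, and we all analyze errors."
for line in kwic(sample, "analyze", width=20):
    print(line)
```

Because matching is whole-word, searching for "analyz" returns nothing. A wildcard search is the tool for partial forms.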
Language Development: Frequency searches (easiest)
– Reading fluency: Should you memorize dawdle, meander, or drift?
Phrasal Verb Frequencies
Intermediate class
– Explain what phrasal verbs are, with examples (mess around, use up, call on, wrap up)
– Use COCA to find sample sentences
High-beginning writing class
– Check spelling and non-English words in a 30-minute timed writing
– Students look for words that might be misspelled
– Use COCA: if a word’s frequency is below 10, circle it (e.g., speciel)
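The "frequency below 10" rule above can be expressed as a simple filter. The Python sketch below flags every essay word whose count in a reference corpus falls under the threshold. `flag_rare_words` is a hypothetical helper, and `reference_counts` stands in for the looking up that students do in COCA. The toy counts in the usage lines are invented for this example and are not COCA figures.

```python
import re


def flag_rare_words(essay, reference_counts, threshold=10):
    """Return essay words whose reference-corpus frequency is below threshold.

    `reference_counts` maps lowercase word forms to frequencies; words missing
    from it count as 0. Each flagged word is listed once, in order of first use.
    """
    flagged = []
    for word in re.findall(r"[a-z]+", essay.lower()):
        if reference_counts.get(word, 0) < threshold and word not in flagged:
            flagged.append(word)
    return flagged


# Toy counts invented for this example; they are not COCA figures.
toy_counts = {"my": 500, "friend": 300, "is": 900, "very": 700, "special": 120}
print(flag_rare_words("My friend is very speciel.", toy_counts))  # ['speciel']
```

Like the classroom rule, this flags candidates rather than confirming errors. A rare but correct word would also be circled.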
COCA for Morphology
Transport – transportation – transported – transports
Wildcard (*) Searches
Circle the word not related in meaning:
– clar*: clarify, clarinet, clarity, clark
– *note: connote, denote, keynote
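A wildcard search returns every word that fits a pattern, whether or not the words are related in meaning. That is why the exercise above works. The Python sketch below uses the standard library’s `fnmatch`, where `*` also matches any run of characters. `fnmatch` is a stand-in, not COCA’s own query syntax: in `fnmatch`, `?` and `[...]` also have special meanings that COCA may not share. `wildcard_search` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def wildcard_search(pattern, vocabulary):
    """Return the words in vocabulary that match a wildcard pattern, sorted.

    '*' matches any run of characters, including none.
    """
    pattern = pattern.lower()
    return sorted(w for w in vocabulary if fnmatchcase(w.lower(), pattern))


words = ["clarify", "clarinet", "clarity", "clark", "declare",
         "connote", "denote", "keynote"]
print(wildcard_search("clar*", words))  # ['clarify', 'clarinet', 'clarity', 'clark']
print(wildcard_search("*note", words))  # ['connote', 'denote', 'keynote']
```

The same kind of pattern supports the morphology activity. For example, "transport*" gathers transport, transportation, transported, and transports from a word list.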
What are Concordancers?
Computer programs used to analyze text
– LexTutor’s VocabProfiler
– AntConc
Create specialized corpora for ESP (English for Specific Purposes) classes
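A basic step in working with a specialized ESP corpus is building a frequency-ranked word list from a folder of .txt files. Tools such as AntConc offer this as a feature. The Python sketch below is a simplified stand-in, not AntConc’s implementation. It counts lowercase word forms rather than lemmas. `load_corpus`, `word_frequency_list`, and the folder name in the comment are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path


def load_corpus(folder):
    """Yield the text of every .txt file in folder, in filename order."""
    for path in sorted(Path(folder).glob("*.txt")):
        yield path.read_text(encoding="utf-8")


def word_frequency_list(texts, top=20):
    """Return the `top` most frequent lowercase word forms across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(top)


# With a real folder (hypothetical name):
#   word_frequency_list(load_corpus("my_esp_corpus"))
print(word_frequency_list(["The patient took the dose.", "The dose was low."], top=2))
# [('the', 3), ('dose', 2)]
```

In a medical English corpus, content words such as "dose" rising near the top of the list show which field-specific vocabulary is worth teaching.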
Websites of Interest
– ELT Resource Training Wiki (with Amber Warren)
– AWL
– VocabProfiler
– Grimm’s Fairy Tales in .txt
Contact Information
Debra S. Lee
Vanderbilt University English Language Center
Twitter: dleetn
Google+: dleetn