Use of Concordancers A corpus (plural corpora) – a large collection of texts, written or spoken, stored on a computer. A concordancer – a computer programme used to search this database
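To make the idea of a concordancer concrete, here is a minimal sketch of a keyword-in-context (KWIC) search in Python. It is an illustration only, not how any of the tools named in these slides are implemented; the function name `kwic`, the `width` parameter and the two-sentence sample text are assumptions made for the example.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`.

    `width` (a hypothetical parameter) is the number of characters of
    context shown on each side of the keyword.
    """
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines

# A tiny stand-in for a corpus; a real corpus would be far larger.
sample = (
    "The use of chemicals in food has caused great concern among the public. "
    "The public have expressed deep concern about the use of chemicals in food."
)

for line in kwic(sample, "concern"):
    print(line)
```

Aligning the keyword in a central column is what lets a reader scan the words to its left and right for recurring patterns.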
Considerations General English / Academic English / Specialised English (e.g. medical, law, 1K and 2K graded, UWW corpora on Compleat Lexical Tutor) Written / Spoken? Size? Currency? Free of charge?
Corpus Size “I don’t think there can be any corpora, however large, that contain information about all of the areas of English….that I want to explore [but] every corpus that I’ve had a chance to examine, however small, has taught me facts that I couldn’t imagine finding out about in any other way.” (Fillmore, 1992, p. 35)
Use of Corpora Word lists and dictionary entries (different senses of a word / typical examples of usage / frequency information) are compiled by computational linguists using a corpus of the language. E.g. the COBUILD project was the first project to use a computerised corpus for dictionary making: in the 1980s, Collins started to use a computerised corpus (then called the COBUILD corpus) with John Sinclair of the University of Birmingham. The Collins Cobuild Corpus now has 2.5 billion words (part of which is the Bank of English Corpus).
Major Corpora Matching exercise
Major corpus: BNC 100 million words Written (90%) and spoken (10%) samples British English from the 1980s to 1993 General English
Major corpus: Brown corpus 1 million words American English One of the earliest corpora / compiled in the 1960s 500 text samples from 15 text categories Searchable through Compleat Lexical Tutor at d_e.html
Major corpus: Bank of English Part of Collins Cobuild Corpus 450 million words as of 2005 (650 million words as of 2012) 75% written and 25% spoken 70% British, 20% American and 10% others Contemporary English
Major corpus: The Corpus of Contemporary American English (COCA) Contemporary American English containing about 450 million words (from 1990 to 2011) Five genres: spoken, fiction, popular magazines, newspapers, and academic journals
Major corpus: MICASE Michigan Corpus of Academic Spoken English Started in 1997 Contains transcripts and audio files of academic speech
Some user-friendly concordancers Word Neighbors (developed by University of Science and Technology) COCA (needs registration) Create your own concordance using tools provided by CAES, HKU
The use of chemicals in food has started concern in the public … Do we say “start concern”?
The use of chemicals in food has caused great concern among the public. The public have expressed deep concern about the use of chemicals in food.
Tasks - answers The public have expressed concern about … / … are of great concern to the public Improve / increase / promote efficiency Substitute for
COCA Corpus What are the differences between “ardent” and “fervent”? Can they be used interchangeably? What are the differences between “sheer” and “complete”? Can they be used interchangeably?
Create your own concordance using tools provided by CAES, HKU
How can corpora be used in the classroom? Student A – part 2 of the talk Student B – part 3 of the talk
Getting students to use a corpus in the classroom Which 3 nouns come most frequently after “underlying”? Then, compare your results with examples from a dictionary. How to use the phrase “not only … but (also) …”
Answers Word Neighbors: Underlying cause/s Underlying assumptions Underlying principle Cambridge Dictionary Online: Underlying significance Word Neighbors: Not only (verb) but also (verb) Not only (noun) but also (noun) Not only (adjective) but also (adjective) Not only (prep + noun) but also (prep + noun)
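The "which words come most frequently after a given word" task can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the method used by any tool named in these slides: `right_collocates` and `toy_corpus` are hypothetical names, the six-sentence corpus is invented for the example, and the code counts any following word without checking that it is a noun or that it falls in the same sentence.

```python
import re
from collections import Counter

def right_collocates(text, node, top_n=3):
    """Count the words that immediately follow `node` and return the top_n."""
    # Very simple tokenisation: lower-case runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    following = Counter(
        tokens[i + 1] for i in range(len(tokens) - 1) if tokens[i] == node
    )
    return following.most_common(top_n)

# Invented mini-corpus; real searches would run over millions of words.
toy_corpus = (
    "The underlying cause is unclear. We questioned the underlying assumptions. "
    "The underlying cause was stress. An underlying principle guides the design. "
    "Their underlying assumptions differ. The underlying cause remains unknown."
)

print(right_collocates(toy_corpus, "underlying"))
# [('cause', 3), ('assumptions', 2), ('principle', 1)]
```

On a real corpus, the counts would need part-of-speech information to restrict the results to nouns, which is something full concordancers typically provide.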
How can concordancers be used to facilitate vocabulary learning/teaching? Identify low-frequency words (off-list words in Vocab Profiler) that are likely to cause difficulty (these can be pre-taught), and judge whether a text as a whole is likely to be difficult for students Study words in context and increase depth of processing Check grammatical behaviour of words e.g. what prepositions to use after a verb Check collocations and lexical patterns Find out about the frequencies of words / word combinations Find out about usage of a word in different text types (e.g. fiction vs academic / spoken vs written), e.g. by using “Range” on Compleat Lexical Tutor
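The first use above, spotting off-list words so they can be pre-taught, can be sketched as follows. This is only an illustration of the idea and not Vocab Profiler's actual procedure: the `HIGH_FREQUENCY` set is a tiny hypothetical stand-in for a real frequency list, and `off_list_words` and `coverage` are names invented for the example.

```python
import re

# Hypothetical, tiny stand-in for a high-frequency word list.
# A real profile would use full frequency lists with thousands of entries.
HIGH_FREQUENCY = {
    "the", "of", "in", "has", "use", "food",
    "great", "public", "among", "caused",
}

def tokenize(text):
    """Split text into lower-case word tokens (letters only)."""
    return re.findall(r"[a-z]+", text.lower())

def off_list_words(text, known_words):
    """Return the distinct words in `text` that are not in `known_words`."""
    return sorted(set(tokenize(text)) - known_words)

def coverage(text, known_words):
    """Return the proportion of running words that are in `known_words`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in known_words for token in tokens) / len(tokens)

text = "The use of chemicals in food has caused great concern among the public."

print(off_list_words(text, HIGH_FREQUENCY))  # ['chemicals', 'concern']
print(round(coverage(text, HIGH_FREQUENCY), 2))  # 0.85
```

The off-list words are candidates for pre-teaching, and a low coverage figure suggests the text as a whole may be difficult for the students.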
VOCABULARY ASSESSMENT
Vocabulary Assessment Tools What kind of vocabulary knowledge is being tested in each of the tests? Do you see any problems with some of the tests?
Various vocabulary assessment tools (available at Vocabulary Levels Tests (VLTs) To check vocabulary size at different word frequency levels – both receptive and productive 2000, 3000, 5000, word levels; AWL Aim at a score of at least 80% Word Association Test Meaning (different senses of a word), collocations Vocabulary Knowledge Scale (VKS) To check “quality” or “depth” of vocab knowledge Vocab Profiler Lexical richness (type/token ratio) – more different words More frequent words or more low-frequency words being used
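The type/token ratio mentioned under Vocab Profiler is the number of different words (types) divided by the number of running words (tokens). The sketch below shows the calculation on two invented sentences; the function name and the simple tokenisation are assumptions for illustration, and the ratio is known to fall as texts get longer, so it is best compared across texts of similar length.

```python
import re

def type_token_ratio(text):
    """Return distinct words (types) divided by running words (tokens)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Two invented 11-word samples of equal length.
repetitive = "the cat saw the cat and the cat saw the dog"
varied = "a quick brown fox jumps over one lazy sleeping dog today"

print(round(type_token_ratio(repetitive), 2))  # 0.45 (5 types / 11 tokens)
print(type_token_ratio(varied))                # 1.0  (11 types / 11 tokens)
```

A higher ratio means more different words are being used, which is what the slide refers to as lexical richness.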
Vocabulary Knowledge Scale (VKS) “retire” iii. I have seen this word before and I think it means “stop working because of old age” (3 pts) iv. I know this word. It means “stop working because of old age” (3 pts) v. I can use this word in a sentence: He spent more time with his family after retire. (4 pts) He spent more time with his family after he retired. (5 pts) He wants to retire. (? pts)
VKS Problems: Self-reported in nature Level V: ability to produce sentence with target vocab = ability to use the word appropriately?
Preparation for next class Make a plan for your assignment For discussion next week