BYU COCA: CORPUS OF CONTEMPORARY AMERICAN ENGLISH Workshop Purdue University November 2015
Agenda
- Essential background: COCA, other BYU corpora, basics of the interface
- Search functions: information & practice
- Search syntax: information & practice
- Results analysis
- Activities
- (Possibly: Pedagogical uses)
COCA: Overview (1 & 2)
“The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English.” (COCA website)
- Corpus: a database of texts that you can query
- Text types (registers) in COCA: spoken, fiction, popular magazines, newspapers, and academic (page 2)
- Timeframe of COCA collection: 1990-2012
- Let’s look at text types here
COCA and other corpora (3)
- Wikipedia Corpus
- Corpus of Global Web-Based English (GloWbE) (the power to compare across dialects, e.g. US/UK)
- Corpus of Historical American English (COHA) (texts from 1810-2000)
- TIME Magazine Corpus
- British National Corpus (BNC)
It is important to underline: these corpora share the same interface and functions. Once you become fluent in using one corpus, you will be able to use the others.
Question: What exactly might a researcher be looking for when looking up the same words and phrases in:
- Wikipedia and GloWbE
- COCA and BNC
- COHA and COCA
COCA Interface: Welcome Screen Interface consists of 3 active & independent frames
COCA Interface: Results Display
COCA Interface: How to search?
- Display: List, Chart, KWIC, Compare
- Search String (clicking on the word “collocates” turns the function on and off; the same with POS)
- Sections: Registers (Spoken, Fiction, Magazine, Newspaper, Academic)
- Time of publication
- Subregisters: MAG: Sci/Tech; FIC: Juvenile
- Click and scroll time (click on Collocates, POS List, Section Scroll)
Corpus: What to search for? Cheat Sheet
- words: mysterious
- phrases: nooks and crannies or faint + noun
- lemmas: all forms of words, like sing or tall
- wildcards: un*ly or r?n*
- complex searches: such as un-X-ed adjectives or verb + any word + a form of ground
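To make the wildcard row concrete, here is a minimal Python sketch of how patterns like un*ly and r?n* behave. It assumes that * stands for any number of characters and ? for exactly one character; the function names and the word list are invented for illustration and are not part of the COCA interface.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a corpus-style wildcard pattern into a compiled regex.

    Assumption: '*' = zero or more word characters, '?' = exactly one.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def match_words(pattern, words):
    """Return the words that match the whole pattern, in input order."""
    rx = wildcard_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]

# Invented word list, for illustration only.
words = ["unlikely", "unhappily", "ugly", "run", "rent", "ran", "rain", "running", "ring"]

print(match_words("un*ly", words))  # ['unlikely', 'unhappily']
print(match_words("r?n*", words))   # ['run', 'rent', 'ran', 'running', 'ring']
```

Note that rain is not matched by r?n*, because ? covers only one character between r and n.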
COCA Interface: What are tags?
- Phrase example: faint + noun is written as faint [nn*]
- Tags can be easily checked in the POS list
- Add a space between the word and the tag
- Tag system: CLAWS7
- Tags are assigned by an automated tagger (some are wrong, but the error margin is small)
- Let’s check the tags for: singular nouns; wh- adverbs (when, where, why, how)
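As a rough illustration of what a query such as faint [nn*] does, the Python sketch below matches a word slot followed by a tag slot against a short list of (word, tag) pairs. The tagged tokens are hand-made for this example, the helper names are hypothetical, and the sketch simplifies by treating every bracketed slot as a tag pattern; it is not how the COCA engine is implemented.

```python
from fnmatch import fnmatchcase

def slot_matches(slot, word, tag):
    """Check one query slot against one (word, tag) pair.

    Simplifying assumption: a bracketed slot such as [nn*] is a tag
    pattern; anything else is a word pattern.
    """
    if slot.startswith("[") and slot.endswith("]"):
        return fnmatchcase(tag.lower(), slot[1:-1].lower())
    return fnmatchcase(word.lower(), slot.lower())

def find_phrases(query, tagged):
    """Return every run of tokens that matches the space-separated query."""
    slots = query.split()
    hits = []
    for i in range(len(tagged) - len(slots) + 1):
        window = tagged[i:i + len(slots)]
        if all(slot_matches(s, w, t) for s, (w, t) in zip(slots, window)):
            hits.append(" ".join(w for w, _ in window))
    return hits

# Hand-tagged toy data (illustrative tags, not real tagger output).
tagged = [
    ("a", "AT1"), ("faint", "JJ"), ("smile", "NN1"), ("and", "CC"),
    ("faint", "JJ"), ("hopes", "NN2"), ("grew", "VVD"),
    ("faint", "JJ"), ("again", "RR"),
]

print(find_phrases("faint [nn*]", tagged))  # ['faint smile', 'faint hopes']
```

The space between faint and [nn*] is what separates the two slots, which is why the query needs a space between the word and the tag.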
Activities time!
Activity 3
- FREQ: number of tokens
- Per million: frequency per million words, i.e. the proportion of tokens in the corpus
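The per-million figure can be worked out by hand. The short Python sketch below shows the arithmetic; the counts and section sizes are made-up numbers chosen for easy calculation, not real COCA figures.

```python
def per_million(freq, corpus_size):
    """Normalize a raw token count to a rate per one million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return freq * 1_000_000 / corpus_size

# Hypothetical counts: the same word in two sections of different size.
spoken = per_million(450, 90_000_000)     # 5.0 per million
academic = per_million(300, 100_000_000)  # 3.0 per million

print(spoken, academic)
```

Normalizing this way is what makes sections of different sizes comparable: a raw FREQ of 450 versus 300 says little until each is divided by the size of its section.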
Activity 4.
Activity 5. Collocates: delimiting function
Search: any (*) noun collocates of the word laugh (used as a noun), within 5 positions before or after the word laugh.
Example: “Crystal threw back her head and laughed, a throaty little laugh of sheer exuberance with a sort of purr in it. In a moment he”
LEFT (positions 5 4 3 2 1): and laughed a throaty little
Node: laugh
RIGHT: of sheer exuberance with
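The 5-left / 5-right window can be sketched in a few lines of Python using the example sentence from this slide. The function name and the simple tokenizer are assumptions made for illustration; the sketch collects all words in the window and does not filter for nouns, which would require POS tags.

```python
import re

def collocates(tokens, node, left=5, right=5):
    """Return (left_context, right_context) for each occurrence of node."""
    windows = []
    for i, tok in enumerate(tokens):
        if tok == node:
            windows.append((tokens[max(0, i - left):i],
                            tokens[i + 1:i + 1 + right]))
    return windows

text = ("Crystal threw back her head and laughed, a throaty little laugh "
        "of sheer exuberance with a sort of purr in it.")

# Naive tokenizer: lowercase the text and keep runs of letters/apostrophes.
tokens = re.findall(r"[a-z']+", text.lower())

for left_ctx, right_ctx in collocates(tokens, "laugh"):
    print("LEFT: ", left_ctx)   # ['and', 'laughed', 'a', 'throaty', 'little']
    print("RIGHT:", right_ctx)  # ['of', 'sheer', 'exuberance', 'with', 'a']
```

Because the node is matched as an exact token, laughed falls inside the left window as a collocate rather than being counted as a second node.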
Activity 6. KWIC: looking at research prepositions.
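For anyone who has not seen a KWIC (keyword in context) display before, the Python sketch below builds one for the word research. The sentence is invented for the example and the function name is hypothetical; the real COCA display also color-codes parts of speech and can sort by the neighboring words.

```python
def kwic(tokens, node, width=2):
    """Return (left, keyword, right) triples for each occurrence of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented text, for illustration only.
tokens = "she did research on bees and their research into sleep was funded".split()

for left, key, right in kwic(tokens, "research"):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>12}  {key}  {right}")
```

With the keyword aligned in one column, the word immediately to its right (here on and into) is easy to scan, which is the point of using KWIC to look at prepositions.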
Pedagogical applications of corpora: Word and Phrase analysis http://www.wordandphrase.info
THANK YOU!