BYU COCA: CORPUS OF CONTEMPORARY AMERICAN ENGLISH


1 BYU COCA: CORPUS OF CONTEMPORARY AMERICAN ENGLISH
Workshop Purdue University November 2015

2 Agenda
Essential background: COCA, other BYU corpora, basics of the interface
Search functions: information & practice
Search syntax: information & practice
Results analysis
Activities
(Possibly: Pedagogical uses)

3 COCA: Overview (1 & 2)
“The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English.” (COCA website)
Corpus: a database of texts that you can query
Text types (registers) in COCA: spoken, fiction, popular magazines, newspapers, and academic (page 2)
Timeframe of COCA collection:
Let’s look at text types here

4 COCA and other corpora (3)
Wikipedia Corpus
Global Web-based English (GloWbE) (the power to compare across dialects, e.g. US/UK)
Corpus of Historical American English (COHA) ( texts from )
Time Magazine
British National Corpus (BNC)
It is important to underline:
- These corpora share the same interface and functions. Once you become fluent in using one corpus, you will be able to use the others.
Question: What exactly might a researcher be looking for when looking up the same words and phrases in:
Wikipedia and GloWbE
COCA and BNC
COHA and COCA

5 COCA Interface: Welcome Screen
Interface consists of 3 active & independent frames

6 COCA Interface: Results Display

7 COCA Interface: How to search?
Display: List, Chart, KWIC, Compare
Search String (clicking on the word “collocates” turns the function on and off; the same with POS)
Sections:
Registers (Spoken, Fiction, Magazine, Newspaper, Academic)
Time of publication
Subregisters: MAG: Sci/Tech; FIC: Juvenile
Click and scroll time (click on Collocates, POS List, Section Scroll)

8 Corpus: What to search for?
Cheat Sheet
words: mysterious
phrases: nooks and crannies or faint + noun
lemmas: all forms of words, like sing or tall
wildcards: un*ly or r?n*
complex searches: such as un-X-ed adjectives or verb + any word + a form of ground
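The wildcard row of the cheat sheet can be tried offline on a small word list. The following is only a rough sketch, not the COCA search engine: it assumes that * stands for any number of letters (including none) and ? for exactly one letter, and the word list and the helper names wildcard_to_regex and match_words are made up for illustration.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a compiled regular expression.

    Assumed semantics (an approximation of the corpus interface):
    '*' = any number of word characters, including none
    '?' = exactly one word character
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def match_words(pattern: str, words: list[str]) -> list[str]:
    """Return the words that match the whole wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [w for w in words if rx.fullmatch(w)]


# Hypothetical mini word list, not taken from COCA.
words = ["unlikely", "unhappily", "only", "run", "running", "rain", "ran", "ring"]

print(match_words("un*ly", words))  # words starting with "un" and ending in "ly"
print(match_words("r?n*", words))   # "r", one letter, "n", then anything
```

Note that rain is not matched by r?n*, because ? covers exactly one letter and the third letter of rain is not n.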

9 COCA Interface: What are tags?
phrases: faint + noun is entered as faint [nn*]
Tags can be easily checked in the POS list
Add a space between the word and the tag
Tag system: CLAWS7
Tags are assigned by an automated tagger (some tags are wrong, but the error margin is small)
Let’s check the tags for:
singular nouns
wh- adverbs (when, where, how)

10 Activities time!

11 Activity 3
FREQ: number of tokens (raw frequency)
Per million: frequency normalized per million words, which makes sections of different sizes comparable
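The relation between the two columns can be checked by hand. This is a minimal sketch with invented counts and section sizes (they are not real COCA figures), and per_million is a hypothetical helper name.

```python
def per_million(freq: int, section_size: int) -> float:
    """Normalize a raw token count to a rate per million words."""
    if section_size <= 0:
        raise ValueError("section_size must be positive")
    # Multiply first so that round numbers divide cleanly.
    return freq * 1_000_000 / section_size


# Invented numbers for illustration only.
spoken = per_million(freq=1_200, section_size=100_000_000)
academic = per_million(freq=300, section_size=20_000_000)

print(spoken)    # rate per million words in the larger section
print(academic)  # rate per million words in the smaller section
```

With these made-up numbers, the smaller section has the lower raw frequency but the higher rate per million, which is why the normalized column is the one to compare across registers.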

12 Activity 4.

13 Activity 5. Collocates delimiting function.
Search for any (*) noun collocates of the word laugh (used as a noun) within 5 words before or after laugh.
“Crystal threw back her head and laughed, a throaty little laugh of sheer exuberance with a sort of purr in it. In a moment he”
LEFT: and (5) laughed (4) a (3) throaty (2) little (1)
NODE: laugh
RIGHT: of sheer exuberance with
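The 5-word window around the node can be reproduced on the example sentence. This is a simplified sketch: it only cuts out the window, it does not filter the collocates by part of speech as the corpus search does, and the tokenizer and the function name collocate_window are assumptions made for illustration.

```python
import re


def collocate_window(text: str, node: str, span: int = 5):
    """Return (left, right) word windows for each occurrence of node.

    Tokenization is a crude assumption: lowercase runs of letters and
    apostrophes. The corpus itself tokenizes and tags more carefully.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]    # up to `span` words before
            right = tokens[i + 1:i + 1 + span]   # up to `span` words after
            hits.append((left, right))
    return hits


sentence = (
    "Crystal threw back her head and laughed, a throaty little laugh "
    "of sheer exuberance with a sort of purr in it."
)

hits = collocate_window(sentence, "laugh")
left, right = hits[0]
print(left)   # the five words to the left of the node
print(right)  # the five words to the right of the node
```

Because the match is on the exact form laugh, the verb form laughed is treated as a left-hand neighbour and not as a second node.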

14 Activity 6. KWIC: looking at research prepositions.
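A KWIC (keyword in context) display lines up every hit with a few words on either side, which makes the word right after the node easy to scan. The sketch below assumes the activity is about the prepositions that follow research; the sample text is invented, and kwic is a hypothetical helper, not part of the corpus interface.

```python
def kwic(text: str, node: str, width: int = 2):
    """Return (left context, node, right context) for each hit of node."""
    tokens = text.split()
    rows = []
    for i, tok in enumerate(tokens):
        # Compare without surrounding punctuation and ignoring case.
        if tok.lower().strip(".,;:!?\"'") == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, tok, right))
    return rows


# Invented sample text, not corpus data.
sample = (
    "She did research on dialects. Later research into syntax followed, "
    "and research in phonetics came last."
)

rows = kwic(sample, "research", width=2)
for left, node, right in rows:
    # Right-align the left context so the node forms a column.
    print(f"{left:>20}  {node}  {right}")

# First word after the node in each hit.
following = [right.split()[0] for _, _, right in rows]
print(following)
```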

15 Pedagogical applications of corpora:
Words and Phrase Analysis

16 THANK YOU!

