Published by August Darcy Stewart
1
BYU COCA: CORPUS OF CONTEMPORARY AMERICAN ENGLISH
Workshop Purdue University November 2015
2
Agenda
- Essential background: COCA, other BYU corpora, basics of the interface
- Search functions: information & practice
- Search syntax: information & practice
- Results analysis
- Activities
- (Possibly: Pedagogical uses)
3
COCA: Overview (1 & 2)
“The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English.” (COCA website)
- Corpus: a database of texts that you can query
- Text types (registers) in COCA: spoken, fiction, popular magazines, newspapers, and academic (page 2)
- Timeframe of COCA collection:
Let’s look at text types here
4
COCA and other corpora (3)
- Wikipedia Corpus
- Global Web-based English (GloWbE) (the power to compare across dialects, e.g. US/UK)
- Corpus of Historical American English (COHA) ( texts from )
- Time Magazine
- British National Corpus (BNC)
Important: these corpora share the same interface and functions. Once you become fluent in using one corpus, you will be able to use the others.
Question: What exactly might a researcher be looking for when looking up the same words and phrases in:
- Wikipedia and GloWbE
- COCA and BNC
- COHA and COCA
5
COCA Interface: Welcome Screen
Interface consists of 3 active & independent frames
6
COCA Interface: Results Display
7
COCA Interface: How to search?
- Display: List, Chart, KWIC, Compare
- Search String (clicking on the word “collocates” turns the function on and off; the same goes for POS)
- Sections: Registers (Spoken, Fiction, Magazine, Newspaper, Academic); Time of publication; Subregisters (e.g. MAG: Sci/Tech; FIC: Juvenile)
- Click and scroll time (click on Collocates, POS List, Section Scroll)
8
Corpus: What to search for?
Cheat Sheet
- words: mysterious
- phrases: nooks and crannies, or faint + noun
- lemmas: all forms of a word, like sing or tall
- wildcards: un*ly or r?n*
- complex searches: such as un-X-ed adjectives, or verb + any word + a form of ground
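To get a feel for what the wildcard entries on the cheat sheet match, here is a minimal Python sketch. It assumes that `*` stands for any number of characters (including none) and `?` for exactly one character. The function names and the word list are made up for illustration, and the code is not how the COCA interface itself is implemented.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a simple wildcard pattern into a compiled regex.

    Assumed semantics: '*' = zero or more word characters,
    '?' = exactly one word character.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_matches(pattern: str, word: str) -> bool:
    """Return True if the whole word fits the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(word) is not None


# Hypothetical word list, not taken from the corpus.
words = ["unlikely", "unhappily", "until", "run", "rent", "rain", "running"]
print([w for w in words if wildcard_matches("un*ly", w)])
print([w for w in words if wildcard_matches("r?n*", w)])
```

Note that `r?n*` requires the letter n in third position, so a word like rain does not fit even though it starts with r and contains an n.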
9
COCA Interface: What are tags?
- phrases: faint + noun is written as faint [nn*]
- Tags can be easily checked in the POS list
- Add a space between the word and the tag
- Tag system: CLAWS7
- Tags are assigned by an automated tagger (a small share of them are wrong)
- Let’s check the tags for: singular nouns; wh- adverbs (when, where, how, why)
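The idea behind a query like faint [nn*] can be sketched in Python as follows. This is only an illustration under stated assumptions: the tokens and their tags are hand-assigned for the example (using CLAWS7-style labels such as JJ for adjectives and NN1 for singular common nouns), and `token_matches` and `find_matches` are hypothetical helper names, not part of any corpus tool.

```python
def token_matches(query_part: str, word: str, tag: str) -> bool:
    """Check one query element against one (word, tag) pair.

    '[nn*]' is read as a tag pattern (a trailing '*' means "starts with");
    anything else is read as a literal word.
    """
    if query_part.startswith("[") and query_part.endswith("]"):
        tag_pattern = query_part[1:-1].lower()
        if tag_pattern.endswith("*"):
            return tag.lower().startswith(tag_pattern[:-1])
        return tag.lower() == tag_pattern
    return word.lower() == query_part.lower()


def find_matches(query: str, tagged_tokens):
    """Slide the space-separated query over the tagged token sequence."""
    parts = query.split()
    hits = []
    for i in range(len(tagged_tokens) - len(parts) + 1):
        window = tagged_tokens[i:i + len(parts)]
        if all(token_matches(p, w, t) for p, (w, t) in zip(parts, window)):
            hits.append(" ".join(w for w, _ in window))
    return hits


# Hand-tagged toy data (an assumption for illustration only).
tagged = [
    ("a", "AT1"), ("faint", "JJ"), ("smile", "NN1"),
    ("and", "CC"), ("faint", "JJ"), ("praise", "NN1"),
    ("felt", "VVD"), ("faint", "JJ"), ("today", "RT"),
]
print(find_matches("faint [nn*]", tagged))  # ['faint smile', 'faint praise']
```

The query is split on spaces, which is why the word and the tag must be separated by a space.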
10
Activities time!
11
Activity 3
FREQ: the number of tokens (occurrences)
Per million: the frequency normalized per million words; it shows the proportion of tokens in the corpus
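The per-million figure is a simple normalization: raw frequency divided by the size of the corpus (or section), multiplied by one million. A minimal Python sketch follows. The counts and section sizes are invented for illustration and are not real COCA figures.

```python
def per_million(freq: int, corpus_size: int) -> float:
    """Normalize a raw token count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return freq * 1_000_000 / corpus_size


# Hypothetical numbers: the same raw count in two sections of different size.
section_a = per_million(450, 90_000_000)
section_b = per_million(450, 110_000_000)
print(section_a, section_b)
```

The same raw count gives a lower per-million value in the larger section, which is why normalized frequencies are the ones to compare across registers.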
12
Activity 4.
13
Activity 5. Collocates delimiting function.
Search: any (*) noun collocates of the word laugh (used as a noun), within 5 positions before or after laugh.
Example: “Crystal threw back her head and laughed, a throaty little laugh of sheer exuberance with a sort of purr in it. In a moment he”
LEFT: and (5) laughed (4) a (3) throaty (2) little (1)
NODE: laugh
RIGHT: of sheer exuberance with
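The 5-left / 5-right window around the node word can be sketched in Python as below. This is a simplified illustration: the tokenizer is a rough regular expression, the function names are hypothetical, and the sketch collects every word in the window rather than filtering for nouns as the search on the slide does.

```python
import re


def tokenize(text: str):
    """Lowercase the text and split it into rough word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def collocate_window(tokens, node: str, span: int = 5):
    """Return (left, right) context lists for each occurrence of the node."""
    results = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            results.append((left, right))
    return results


sentence = ("Crystal threw back her head and laughed, a throaty little laugh "
            "of sheer exuberance with a sort of purr in it.")
for left, right in collocate_window(tokenize(sentence), "laugh"):
    print("LEFT: ", left)
    print("RIGHT:", right)
```

Because the node is matched exactly, the verb form laughed is not treated as a hit; it only shows up as a left-hand neighbour of laugh.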
14
Activity 6. KWIC: looking at research prepositions.
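A KWIC (Key Word in Context) display lines up every hit of the search word with a few words on either side, which makes it easy to see which prepositions follow research. The Python sketch below shows the general idea only. The sample text is invented, `kwic` is a hypothetical helper, and sentence boundaries are ignored for simplicity.

```python
import re


def kwic(text: str, keyword: str, width: int = 3):
    """Return (left context, keyword, right context) for each hit."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Invented sample text, not taken from the corpus.
sample = ("She did research on bees. His research into sleep continues. "
          "Recent research in linguistics uses corpora.")
for left, key, right in kwic(sample, "research", width=2):
    print(f"{left:>20} | {key} | {right}")
```

Reading down the column to the right of the keyword shows the prepositions on, into, and in.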
15
Pedagogical applications of corpora:
Words and Phrase Analysis
16
THANK YOU!