Computer Corpora and What They Can Tell Us about How People Use Language. Introduction to Information Science (情報科学入門), 26 July 2012.



“Corpus”?
  Latin “corpus” = body; Latin “corpora” = bodies.
  English “corpus” = collection of texts; English “corpora” = collections of texts.
  Japanese “コーパス” = a compiled collection of documents and similar materials.

What is a computer corpus? A corpus is a collection of texts stored on a computer. Books, magazines, letters, Internet pages, e-mails, or parts of these. Or transcriptions of speeches, phone calls, or radio programs. Often stored as a single file in simple text format.

How big is a computer corpus? It can be very big or very small. Large corpora such as the British National Corpus (about 100 million words) and the Corpus of Contemporary American English (425 million words) contain a hundred million words or more. A small corpus might have only a few hundred words.

Benefits of computer corpora In what way do you think computer corpora might be useful? Any ideas?

What are computer corpora for? We can use corpora to study language. What are the most common words? What words are used together? What words of a particular type are used together (e.g., under + NOUN)? If we compare two corpora (e.g. e-mail and textbooks), is a word more common in one? How do people use words in sentences?
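
One of the questions above, "What words are used together?", can be tried on a very small scale by counting adjacent word pairs. This is only a sketch: the sample text is invented for illustration, and the function name bigram_counts and the simple tokenizer are assumptions, not part of any corpus tool.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count every pair of neighboring words in the text."""
    # Lowercase and keep only runs of letters and apostrophes as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip pairs each word with the word that follows it.
    return Counter(zip(tokens, tokens[1:]))

# Invented sample text; a real corpus would be read from a file.
sample = ("Good luck with the exam. Good luck and good night. "
          "Under the table, under the bed.")
counts = bigram_counts(sample)
print(counts.most_common(3))
```

With a real corpus the same count would show which pairs, such as "good luck", turn up far more often than chance would suggest.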

Computer corpora and dictionaries All major English dictionaries are now based on computer corpora. How common is a word? How many different meanings does it have? What are some examples of its use? Is it used in a good or bad sense? What grammatical patterns is it used with? What other words is it used with?

Word frequencies What do you think are the most common words in English? Make a list of about five words.

The most common English words (http://www.world-english.org): the, of, to, and, a, in, is, it, you, that
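
A frequency list like this one can be produced from any text with a few lines of code. The sketch below assumes a very simple definition of "word" (runs of letters and apostrophes, lowercased) and uses an invented sample text; real frequency lists are built from far larger corpora with more careful tokenization.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def most_common_words(text, n=5):
    """Return the n most frequent words together with their counts."""
    return Counter(tokenize(text)).most_common(n)

# Invented sample text; a real corpus would be read from a file.
sample = "The cat sat on the mat. The dog saw the cat and the cat ran."
print(most_common_words(sample, 2))  # [('the', 5), ('cat', 3)]
```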

In various situations

Concordances One of the most common ways to study computer corpora is to use a concordance. A concordance finds all the instances of a word or phrase in a corpus. It presents a list of the instances, often with the search word in the middle of the screen.
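
The idea of a concordance can be sketched in a few lines. This is a minimal illustration, not how COCA works internally: the function name kwic, the three-word context width, and the sample sentences are all assumptions made for the example.

```python
import re

def kwic(text, keyword, width=3):
    """Return one (left context, keyword, right context) tuple per hit."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            # Take up to `width` words on each side of the hit.
            # Sentence boundaries are ignored in this simple version.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

# Invented sample sentences.
sample = ("I will never forget that day. Don't forget to call me. "
          "She couldn't forget what he said.")
for left, kw, right in kwic(sample, "forget"):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>20}  {kw}  {right}")
```

Lining the keyword up in one column is what makes patterns before and after it easy to see.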

Example of a concordance list

What does this tell us? In the words before “forget”, there are:
  many examples of negative words: not, won’t, don’t, couldn’t, shouldn’t, never, nobody
  many contractions: won’t, don’t, you’ll, couldn’t, shouldn’t, you’d
  several examples of “to”

What does this tell us? In the words after “forget”, there are:
  several examples of “to”
  several examples of “-ing”
  several examples of “what” and “that”
  several examples of “the”
  several examples of “he”, “she”, “you”, “it”, and “we”
Notice also that “forget” usually comes in the middle of a sentence, not at the beginning or end.

Open a concordance on your PC Go to http://corpus.byu.edu/coca/. This site allows you to access the Corpus of Contemporary American English (COCA). As of this lecture, it is the largest free corpus in the world: 425 million words in 5 types of text (Spoken, Fiction, Magazine, Newspaper, Academic).

Display At the top left, you will see under Display:
  List: Shows a list of words in the right column.
  Chart: Shows two charts in the right column, one by type of text (spoken, fiction, magazine, etc.) and one by time (1990-1994, 1995-1999, etc.).
  KWIC (Key Words in Context): Shows nouns, verbs, etc. around the search string.
  Compare: Shows results for two words.

Search String Under Search String, you will see:
  Word: Type a word (e.g. head).
  Collocates: Type a word used near head. The two boxes next to Collocates set the maximum number of words before head and the maximum number of words after head.
  POS (Part Of Speech): Select a part of speech (e.g. noun, verb) used near head.
  Random: Chooses a random search string.
  Search: Click this to begin your search.
  Reset: Clears the left column.

Sections
  Show: Check this box to show charts by type of text (Spoken, Magazine, etc.) and by time.
  1: Choose a type of text for the search string: Ignore (= all types), Spoken, Fiction, Magazine, Newspaper, or Academic.
  2: If you are comparing two search strings, choose the type of text for the second string.

Search syntax
  To find two words together: To find “good luck”, type “good luck” in Word(s).
  To find the neighboring word: To find what word comes after “dog”, type “dog *”. To find what word comes before “dog”, type “* dog”.
  To find two words with 0 to 4 words between them: Type “dog” in Word(s) and “bark” in Collocates, with the number of words after set to 5 → “dog bark”, “dog will bark”, “dog will often bark”, “dog will not always bark”, “dog will in no situation bark”.
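
The collocate search (one word followed by another within a limited number of words) can be imitated with a sliding window. This is a rough sketch under stated assumptions: the function name collocate_hits is made up, the default window of 5 words is chosen to match the example phrases on the slide, and the sample sentences are invented.

```python
import re

def collocate_hits(text, node, collocate, max_after=5):
    """Return each stretch of text where `collocate` occurs within
    `max_after` words after `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # The window holds the next `max_after` words after the node word.
        window = tokens[i + 1:i + 1 + max_after]
        if collocate in window:
            end = i + 1 + window.index(collocate)
            hits.append(" ".join(tokens[i:end + 1]))
    return hits

# Invented sample sentences; the third "bark" is too far from "dog" to count.
sample = ("The dog will bark. A dog will not always bark. "
          "The dog slept all day and did not once bark.")
print(collocate_hits(sample, "dog", "bark"))
```

A window of 5 words after the search word allows between 0 and 4 words in the gap, which is why both "dog will bark" and "dog will not always bark" are found.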

Search syntax (2)
  To find different forms of a word: Word(s): [blow] away → “blow away”, “blows away”, “blew away”, “blowing away”, “blown away”
  To find all the words that begin the same way: Word(s): comp* → “compare”, “compute”, “computer”, “compiler”, “comply”, etc.
  To find any of a set of words: Word(s): cut|cuts|cutting → “cut”, “cuts”, “cutting”.
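
The wildcard (*) and alternative (|) searches behave much like regular expressions. The sketch below translates a one-word query of that kind into a Python regular expression. It is an approximation for illustration only: the function names are invented, the sample text is made up, and the [blow] style of search is left out because it needs a dictionary of word forms.

```python
import re

def query_to_regex(query):
    """Translate a one-word query using * and | into a compiled regex."""
    alternatives = []
    for alt in query.split("|"):
        # Escape the literal text, then let * stand for any run of letters.
        alternatives.append(re.escape(alt).replace(r"\*", r"[a-z']*"))
    # \b keeps matches aligned to whole words.
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)

def find_words(text, query):
    """Return every word in the text that matches the query, lowercased."""
    return [m.group(0).lower() for m in query_to_regex(query).finditer(text)]

# Invented sample text.
sample = ("Computers compare and compute. She cuts paper while cutting costs; "
          "the cut was clean.")
print(find_words(sample, "comp*"))             # ['computers', 'compare', 'compute']
print(find_words(sample, "cut|cuts|cutting"))  # ['cuts', 'cutting', 'cut']
```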

Try the COCA concordance In the top right corner, log in with my e-mail address and our group password. In the Word(s) box, type “playing”. Click on “Search”. In the top right column, click on “PLAYING”. What topics are most of the examples about?

COCA concordance for “playing”

Findings for “playing”
  Acting: playing Santa Claus, playing the mother
  Sports: playing basketball, left the playing field
  Other games: playing chess, playing the video game
  Music: quiet music is playing, playing the guitar
  Play for fun: playing with him on the school’s grounds

Word frequency At the top right, under TOT, you see “58676”. The corpus contains 58676 examples of playing. Under Display, select CHART. Click the Search button. The right column shows the frequency of playing in different types of text. In which type is it most common? Why? You can also see the frequency for 5-year periods. In which period was it most common?
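
Raw counts are hard to compare when sections of a corpus differ in size, so frequencies are usually expressed per million words. The sketch below applies that to the two figures given in these slides (58,676 hits and a 425-million-word corpus); the function name per_million is an assumption, and the corpus size is taken from the earlier slide rather than from the search result itself.

```python
def per_million(count, corpus_size):
    """Return how many times a word occurs per million words of text."""
    return count / corpus_size * 1_000_000

# Figures from the slides: 58,676 hits for "playing" in 425 million words.
rate = per_million(58676, 425_000_000)
print(f"{rate:.2f} per million words")  # 138.06 per million words
```

The same calculation, done separately for each type of text or each 5-year period, is what makes the bars in a frequency chart comparable.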

Try a two-word search Click the Reset button. In the Word(s) box, type “* friend of *”. Click on “Search”. Notice the words before and after “friend of”. What did you find?

Collocations for “friend+of”

Findings for “friend of”
  Before: “a”, “good”, “close”, “old”
  After: “mine”, “the”, “his”, “hers”, “ours”, “theirs”
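
A search like “* friend of *” amounts to counting the words directly before and after a phrase. Here is a small sketch of that count. The function name neighbours and the sample sentences are invented for illustration, and the result says nothing about the real COCA figures.

```python
import re
from collections import Counter

def neighbours(text, phrase):
    """Count the words directly before and directly after each hit of phrase."""
    tokens = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    before, after = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            if i > 0:
                before[tokens[i - 1]] += 1
            if i + n < len(tokens):
                after[tokens[i + n]] += 1
    return before, after

# Invented sample sentences.
sample = ("He is a good friend of mine. She is an old friend of the family. "
          "A close friend of his called. He was a good friend of hers.")
before, after = neighbours(sample, "friend of")
print(before.most_common())
print(after.most_common())
```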

Two words with an optional gap Click the Reset button. In the Word(s) box, type “a”. Click on Collocates. In the Collocates box, type “teacher” and set the number of words after to 5. Click “Search”. In the top right column, click on “TEACHER”. Notice the words between “a” and “teacher”. What did you find?

Concordance for “a . . . teacher”

Findings for “a . . . teacher” “a technology teacher” “a high school teacher” “a new head teacher” “a 29 year old teacher” “a German-language teacher” “a preschool teacher” “a highly qualified teacher” “a French teacher”, etc.

Word + Part of Speech (POS) You can also search for a word with a POS, e.g. made me + VERB. Click on the POS button in the left column and choose one of the following:
  noun.ALL: all common nouns
  verb.ALL: all verbs
  adj.ALL: all adjectives
  adv.ALL: all adverbs
  neg.ALL: all instances of “not”, “n’t”
  art.ALL: all articles (“a”, “an”, “the”)
  det.ALL: all determiners (“this”, “these”, etc.)
  pron.ALL: all pronouns
  poss.ALL: all possessive pronouns (“my”, “your”, etc.)
  prep.ALL: all prepositions
  conj.ALL: all conjunctions
  noun.ALL+: all common and proper nouns
  noun.SG: singular nouns
  noun.PL: plural nouns
  noun.CMN: common nouns
  noun.+PROP: proper nouns
  verb.BASE: base form of verb (“know”, “think”, etc.)
  verb.INF: infinitive form of verb (“be”, “have”, etc.)
  verb.MODAL: modal verbs (“may”, “might”, etc.)
  verb.3SG: 3rd person singular verbs (“has”, “goes”, etc.)
  verb.ED: past tense verbs (“went”, “played”, etc.)
  verb.ING: “ing” form of verb (“going”, “playing”, etc.)
  PUNC: all punctuation marks (. , ; : ! ? - etc.)

Search for a word + POS In the left column, click “Reset”. In the Word(s) box, type “made me”. Click “POS”. In the POS list, choose verb.ALL. Click “Search”. Notice the words after “me”. What did you find?

Concordance for “made me VERB”

Findings for “made me” All the words after “me” were bare infinitives. The most common verb was “want” (328). There were many “thinking” verbs, e.g., “realize”, “see”, “believe”, “think”, “understand”. There were also some “action” verbs, e.g., “do”, “look”, “take”, “get”.
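
A word + POS search depends on every word in the corpus carrying a part-of-speech tag. The sketch below shows the idea on a handful of hand-tagged sentences. Everything in it is illustrative: the sentences are invented, the tag names (VERB, PRON, DET, and so on) are a simplified assumption rather than the tag set COCA uses, and the function name words_after is made up.

```python
from collections import Counter

# Hand-tagged toy sentences as (word, tag) pairs; tags are illustrative only.
tagged = [
    [("it", "PRON"), ("made", "VERB"), ("me", "PRON"), ("want", "VERB"), ("more", "ADV")],
    [("that", "PRON"), ("made", "VERB"), ("me", "PRON"), ("think", "VERB")],
    [("she", "PRON"), ("made", "VERB"), ("me", "PRON"), ("a", "DET"), ("cake", "NOUN")],
    [("he", "PRON"), ("made", "VERB"), ("me", "PRON"), ("want", "VERB"), ("it", "PRON")],
]

def words_after(tagged_sentences, phrase, pos):
    """Count words tagged `pos` that directly follow the word sequence `phrase`."""
    target = phrase.lower().split()
    n = len(target)
    counts = Counter()
    for sent in tagged_sentences:
        words = [w.lower() for w, _ in sent]
        # Stop early enough that a following word always exists.
        for i in range(len(sent) - n):
            if words[i:i + n] == target and sent[i + n][1] == pos:
                counts[words[i + n]] += 1
    return counts

counts = words_after(tagged, "made me", "VERB")
print(counts.most_common())  # [('want', 2), ('think', 1)]
```

The sentence “she made me a cake” is skipped because the word after “me” is tagged as a determiner, not a verb.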

Inflected forms Click “Reset” In the Word(s) box, type “I wish I [be]”. Click “Search”. Notice the word after “I wish I”. What did you find?

Concordance for “I wish I [be]”

Findings for “I wish I” “I wish I was” (224 cases) “I wish I were” (205 cases) Traditional grammar treats “I wish I were” as the correct form, yet “I wish I was” is slightly more frequent in the corpus. Native English speakers do not always use English “correctly”.
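
Counting competing forms like these is easy to try on any text. The sketch below counts only “was” and “were” after “I wish I”, which is narrower than the [be] search (that search matches every form of “be”). The function name count_wish_forms and the sample text are invented, so the counts it prints have nothing to do with the COCA figures.

```python
import re
from collections import Counter

def count_wish_forms(text):
    """Count how often "I wish I" is followed by "was" and by "were"."""
    pattern = re.compile(r"\bI wish I (was|were)\b", re.IGNORECASE)
    return Counter(m.group(1).lower() for m in pattern.finditer(text))

# Invented sample text.
sample = "I wish I was taller. I wish I were there. Sometimes I wish I was home."
forms = count_wish_forms(sample)
print(forms["was"], forms["were"])  # 2 1
```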

Pre-lecture quiz What answers did you get?
  happy ______
  What a ______
  I haven’t a ______
  as good as ______
  ______ the winter
  I’ve ______ arrived
  Don’t be a ______
  a ______ breakfast
  He didn’t take any ______
  She ______ her head

Answers to the pre-lecture quiz
  happy [to, with, and, about, birthday]
  What a [great, lot, good, wonderful, difference]
  I haven’t a [clue, thing, single, choice]
  as good as [the, it, they, any, a, I, you]
  [in, during, for, of, through] the winter
  I’ve [just, now, finally, always, already] arrived
  Don’t be a [fool, stranger, hero, jerk, baby]
  a [big, good, hearty, late, quick] breakfast
  He didn’t take any [questions, of, shit, precautions]
  She [shook, shakes, turned, tilted] her head

Summary We can learn a lot about language from computer corpora. In particular, concordances can show us how people really use language in practice. Concordances are useful for students of English:
  to check how vocabulary is used
  to check grammatical constructions

Some other online concordances
  Michigan Corpus of Academic Spoken English (MICASE): http://quod.lib.umich.edu/m/micase/
  Web Concordancer (English): http://vlc.polyu.edu.hk/concordance/WWWConcappE.htm
  Corpus Concordance English: http://www.lextutor.ca/concordancers/concord_e.html

Post-lecture quiz Please complete the quiz paper I gave you today. Submit it to the instructors’ room (講師室) by 5:30 p.m. on Wednesday. If you don’t submit it, you will not get any points for attending this lecture. That’s it, folks!