Corpus search: What are the most common words in English?

Presentation transcript:

Corpus search
What are the most common words in English?
What are the most common verbs?
What is the most common pronoun?
What is the most common proper noun?
What are the most common noun+noun collocations?

Whom in British and American
Fill in the tables using COCA and BNC.
            Freq. whom    Freq. prep. + whom    Freq. whom - (prep. + whom)
British
American

Whom in British and American
COCA has 450M words and the BNC has 100M.
Calculate per-million frequencies.
            Freq. whom    Freq. prep. + whom    Freq. whom - (prep. + whom)
British
American
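The per-million calculation can be sketched as follows. This is a minimal illustration: the corpus sizes are the ones stated on the slide, while the function name `per_million` and the hit count are placeholders invented for the example, not real query results.

```python
def per_million(raw_count, corpus_size):
    """Normalize a raw hit count to a frequency per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * 1_000_000 / corpus_size

# Corpus sizes as given on the slide (in words).
COCA_SIZE = 450_000_000
BNC_SIZE = 100_000_000

# Placeholder hit count, NOT a real result for "whom".
example_hits = 9_000

print(per_million(example_hits, COCA_SIZE))  # 20.0
print(per_million(example_hits, BNC_SIZE))   # 90.0
```

The same raw count yields very different normalized figures, which is why raw frequencies from corpora of different sizes should not be compared directly.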

Simple past vs. present perfect
Use the BNC and COCA to fill in this chart with frequencies per million.
Past participle pattern: [vh*][v?n*] = auxiliary have + past participle (PP)
Simple past pattern: [v?d*] = past tense verb
            US                        UK
            Simple    Pres. Perf.     Simple    Pres. Perf.
just
already
yet
ever

Browse corpora at BYU
Corpus.byu.edu

Corpus Applications

What is corpus linguistics good for?
Making a concordance
  List of all words in a text and where they are found
  Scriptures
  Works of Shakespeare
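A concordance in the sense given on the slide (every word plus the places where it occurs) can be sketched in a few lines. The function names `tokenize`, `build_concordance`, and `kwic` are made up for this example, and the regular-expression tokenizer is a deliberately crude assumption.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_concordance(tokens):
    """Map each word to the list of token positions where it occurs."""
    index = defaultdict(list)
    for pos, tok in enumerate(tokens):
        index[tok].append(pos)
    return dict(index)

def kwic(tokens, index, word, width=3):
    """Show each occurrence of `word` with `width` words of context per side."""
    lines = []
    for pos in index.get(word, []):
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        lines.append(f"{left} [{word}] {right}".strip())
    return lines

tokens = tokenize("To be, or not to be, that is the question.")
index = build_concordance(tokens)
print(index["be"])  # [1, 5]
for line in kwic(tokens, index, "be"):
    print(line)
```

The keyword-in-context view is how most concordancers display the index to the user.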

What is corpus linguistics good for?
Finding word frequencies
  Psycholinguistic experiments
  Language instruction
    Put most common words in L2 vocabulary
    toxicomano
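A word frequency list is straightforward to produce. The sketch below assumes a simple regular-expression tokenizer, and the sample sentence and the function name `word_frequencies` are invented for illustration.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Return the top_n most frequent word tokens with their counts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)

sample = "the cat sat on the mat and the dog sat too"
print(word_frequencies(sample, top_n=2))  # [('the', 3), ('sat', 2)]
```

Run over a large corpus, a ranked list like this is one way to decide which words belong in an L2 vocabulary syllabus first.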

What is corpus linguistics good for?
Lexicography
  What words to include in a dictionary?
  What do words mean?
  How are meanings changing?
  How are spellings changing?
    Blowtorch
    Blow-torch
    Blow torch
  Identifying regionalisms

What is corpus linguistics good for?
Computer systems development
  Text to speech
  Text messaging
    If you have typed gla-, frequency data says glass is highly probable and fills it in for you
  Speech synthesis
  Natural language processing
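The gla- example amounts to picking the most frequent word that starts with the typed prefix. Here is a minimal sketch of that idea; the counts are made-up numbers rather than corpus figures, and `complete` is a hypothetical helper, not how any particular phone keyboard works.

```python
def complete(prefix, frequencies):
    """Return the most frequent known word starting with `prefix`, or None."""
    candidates = {w: c for w, c in frequencies.items() if w.startswith(prefix)}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

# Made-up counts for illustration only.
toy_counts = {"glass": 120, "glad": 80, "glance": 45, "glacier": 5}

print(complete("gla", toy_counts))   # glass
print(complete("glac", toy_counts))  # glacier
```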

What is corpus linguistics good for?
Testing linguistic theories
  Generativists relied on personal introspection
    So what if Dayton is less frequent than New York in a corpus?
    I’m a native speaker and know what sounds right and wrong

What is corpus linguistics good for?
Problems with introspections
  They’re subjective
  They can’t be verified
  Your introspections probably go along with your theory

What is corpus linguistics good for?
Corpus data . . .
  Are objective
  Can be verified
  Can be shared
  Can be used to test theories
  Can be used to get ideas for theories

Limitations of corpora
They can’t contain every sentence
Some data aren’t interesting
  Frequency of Dayton versus New York
They have mistakes

Lexical
Word lists
  General Service List
    2,000 most frequent words in English
  Academic Word List (Coxhead)
    570 words in English academic writing
  Academic Vocabulary List (Davies & Gardner)
    3,000 words
    High frequency in ACAD, low frequency in other registers
    Measure of dispersion (Juilland’s D)
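Juilland's D measures how evenly a word is spread across the parts of a corpus. The sketch below uses the commonly cited formulation D = 1 - V / sqrt(n - 1), where V is the coefficient of variation of the word's frequency across n parts. It assumes equally sized parts and a population standard deviation; the frequencies shown are invented for illustration.

```python
import math

def juilland_d(part_freqs):
    """Juilland's D for one word, given its frequency in each corpus part.

    Assumes the parts are equally sized; otherwise pass normalized rates.
    """
    n = len(part_freqs)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    mean = sum(part_freqs) / n
    if mean == 0:
        raise ValueError("word does not occur in any part")
    # Population standard deviation of the per-part frequencies.
    sd = math.sqrt(sum((f - mean) ** 2 for f in part_freqs) / n)
    v = sd / mean  # coefficient of variation
    return 1 - v / math.sqrt(n - 1)

print(juilland_d([10, 10, 10, 10]))  # evenly spread: 1.0
print(juilland_d([40, 0, 0, 0]))     # all in one part: close to 0
```

A value near 1 suggests a word is used throughout the corpus, while a value near 0 suggests it is concentrated in a few texts, which is why dispersion is checked alongside raw frequency when building a word list.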

Phraseology
Formulaic sequences (lexical bundles)
  Corpus-driven
  Frequency
  Function
  Fixedness
at the * of
  What do you think most often fills the *?
  Check in COCA
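Counting what fills the open slot of a frame such as "at the * of" can be sketched as below. The function name `frame_fillers` and the sample sentence are invented for the example, so the result says nothing about what COCA actually returns.

```python
import re
from collections import Counter

def frame_fillers(text, frame):
    """Count the words that fill the single * slot of a fixed frame."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pattern = frame.lower().split()
    slot = pattern.index("*")
    size = len(pattern)
    fillers = Counter()
    for i in range(len(tokens) - size + 1):
        window = tokens[i:i + size]
        # Every non-slot position must match the frame exactly.
        if all(p == "*" or p == t for p, t in zip(pattern, window)):
            fillers[window[slot]] += 1
    return fillers

sample = "At the end of the day we met at the top of the hill, at the end of the road."
print(frame_fillers(sample, "at the * of").most_common())
```

The fewer distinct fillers a frame attracts relative to its total frequency, the more fixed the sequence is.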

Grammar
Descriptive reference grammars
  Provide descriptions of how language is actually used rather than prescriptions about how language should be used
  Longman Grammar of Spoken and Written English

Lexicogrammar
Certain words are more likely to occur in some grammatical structures than others
  E.g., some verbs (deem, base, subject) are much more common in the passive than in the active voice
    The material was deemed faulty.
    Her choice was based solely on…
    The matter may be subjected to…
Collostructional analysis is a means of measuring the strength of the relationship between a word and a grammatical structure
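One common way to quantify the word-construction relationship is a Fisher exact test on a 2x2 table (the verb vs. all other verbs, in the construction vs. elsewhere). The sketch below is one possible implementation under that assumption; the function name is hypothetical and the counts are made up, not taken from any corpus.

```python
from math import comb, log10

def attraction_p_value(a, b, c, d):
    """One-tailed Fisher exact p-value for attraction.

    a: verb in the construction      b: verb elsewhere
    c: other verbs in construction   d: other verbs elsewhere
    """
    verb_total = a + b
    construction_total = a + c
    n = a + b + c + d
    upper = min(verb_total, construction_total)
    # Probability of seeing a or more co-occurrences by chance (hypergeometric).
    numerator = sum(
        comb(verb_total, k) * comb(n - verb_total, construction_total - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n, construction_total)

# Made-up counts for a verb and the passive, for illustration only.
p = attraction_p_value(a=30, b=10, c=200, d=760)
print(p, -log10(p))
```

A smaller p-value (or a larger negative log) indicates a stronger attraction between the word and the structure than chance co-occurrence would predict.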

Register variation
Does ‘general English’ exist?

Frequent phrases in conversation
Phraseological feature                    Examples
Personal pronoun + lexical verb phrase    I don’t know what, I don’t want to, I was going to
Yes-no question fragments                 do you want to, are you going to
Wh-question fragments                     what are you doing, what do you mean, what do you think, what do you want

Frequent phrases in academic writing
Phraseological feature                                       Examples
Noun phrase with of-phrase fragment                          the end of the, one of the most
Prepositional phrases with embedded of-phrase fragments      in the case of
Other prepositional phrase fragments                         on the other hand

Register variation: complexity
Which is more complex, speech or writing?
Define the type(s) of complexity we find in each.

Multi-Dimensional analysis
1. Identify a comprehensive set of relevant linguistic features
2. Identify and quantify those features in a corpus of texts
3. Use factor analysis to identify dimensions based on co-occurrence among linguistic features
4. Interpret dimensions functionally
5. Calculate scores for each text on each dimension
6. Compare mean scores of registers/varieties
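The last two steps (scoring texts on a dimension and comparing register means) can be sketched without the factor analysis itself. The example assumes the positive and negative features of a dimension are already known, standardizes each feature across texts, and sums the z-scores. The feature names, rates, and register labels are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of feature rates across all texts."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s if s else 0.0 for v in values]

def dimension_scores(texts, positive, negative):
    """Score per text: sum of positive-feature z-scores minus negative ones."""
    z = {f: z_scores([t[f] for t in texts]) for f in positive + negative}
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(len(texts))
    ]

def mean_by_register(scores, registers):
    """Average the dimension scores of the texts in each register."""
    grouped = defaultdict(list)
    for score, register in zip(scores, registers):
        grouped[register].append(score)
    return {r: mean(v) for r, v in grouped.items()}

# Made-up rates per 1,000 words for two texts.
texts = [
    {"pronouns": 10, "nouns": 20},
    {"pronouns": 20, "nouns": 10},
]
registers = ["academic", "conversation"]

scores = dimension_scores(texts, positive=["pronouns"], negative=["nouns"])
print(mean_by_register(scores, registers))
```

In a full analysis the feature groupings would come from the factor loadings in step 3 rather than being supplied by hand.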

Involved vs. Informational

Non-technical Synthesis vs. Specialized Information Density
Positive features:
  Verbs: verb HAVE (.36)
  Adverbs: general adverbs (.59), amplifiers (.43), certainty adverbs (.37), emphatics (.36)
  Coordination: adverbial conjuncts (.51), phrasal coordinating conjunctions (.39)
  Nominal Modifiers: that-relative clauses (.36)
  Lexical Features: COCA Core Vocabulary (1-500) (.61)
Negative features:
  Nouns: pre-nominal modifiers (-.73), nouns (-.73), technical concrete nouns (-.31)
  Verbs: agentless passive voice (-.42)

Study 1: Dimension 1 Results

Dialect variation activity
In what country is this expression permitted?
  Allow to Verb
  Permit to Verb
Where is the word banjaxed used? Meaning?
UK vs. US use of
  Different from/to
  Which do they use in Australia?
  Needn’t vs. don't need
  Haven't a Noun vs. don't have a Noun

Diachronic change
whom
[be] [v?n*]
[get] [v?n*]
end up [v?g*]
need n’t
Others?

Data-driven Learning
Language learners actually use corpora in the classroom
Research is mixed
  It seems to be more useful/effective for advanced learners

Corpus-informed materials

Political discourse
Www.speechwars.com