Corpus search What are the most common words in English

1 Corpus search
What are the most common words in English? What are the most common verbs? What is the most common pronoun? What is the most common proper noun? What are the most common noun+noun collocations?

2 Whom in British and American
Fill in the table using COCA and the BNC:
| | Freq. whom | Freq. prep. + whom | Freq. whom - (prep. + whom) |
| British (BNC) | | | |
| American (COCA) | | | |

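3 Whom in British and American
COCA has 450M words and the BNC 100M, so calculate per-million frequencies:
| | Freq. whom (per million) | Freq. prep. + whom (per million) | Freq. whom - (prep. + whom) (per million) |
| British (BNC) | | | |
| American (COCA) | | | |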
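
A per-million (normalized) frequency is the raw count divided by the corpus size in words, multiplied by 1,000,000; this is what makes counts from a 450M-word and a 100M-word corpus comparable. A minimal Python sketch, using the corpus sizes given on the slide; the raw counts in the usage lines are hypothetical, not real COCA or BNC figures.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalize a raw frequency to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be a positive number of words")
    # Multiply first: integer arithmetic stays exact until the final division.
    return raw_count * 1_000_000 / corpus_size


def whom_without_preposition(whom_total: int, prep_whom: int) -> int:
    """Third column of the table: uses of whom NOT preceded by a preposition."""
    return whom_total - prep_whom


# Corpus sizes as stated on the slide (in words).
COCA_SIZE = 450_000_000
BNC_SIZE = 100_000_000

if __name__ == "__main__":
    # Hypothetical raw counts, for illustration only.
    print(per_million(9_000, COCA_SIZE))  # 20.0
    print(per_million(2_000, BNC_SIZE))   # 20.0
```

Note that the same raw count means five times more in the BNC than in COCA, because the BNC is smaller.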

4 Simple past vs. present perfect
Use the BNC and COCA to fill in this chart with frequencies per million.
Present perfect: [vh*][v?n*] = auxiliary have + past participle
Simple past: [v?d*] = past-tense verb
| | US simple past | US pres. perf. | UK simple past | UK pres. perf. |
| just | | | | |
| already | | | | |
| yet | | | | |
| ever | | | | |
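
The bracketed queries match part-of-speech tags with wildcards: * stands for any sequence of characters and ? for exactly one. The sketch below imitates that behaviour over a tiny hand-tagged sample so you can see what each slot matches. Assumptions: tokens are (word, tag) pairs with lowercase CLAWS7-style tags (vh0 = base form of HAVE, vvn = past participle, vvd = past tense), query slots are separated by spaces, and bare words match literally; the real corpus interface may differ in details. One thing it illustrates: adjacent slots such as [vh*] [v?n*] miss perfects with an intervening adverb ("have just eaten"), so one possible way to fill the adverb rows is to put the adverb in the query as its own slot.

```python
from fnmatch import fnmatchcase

# A token is a (word, tag) pair. The lowercase CLAWS7-style tags are an
# assumption of this sketch, not necessarily what the corpus site stores.
Token = tuple[str, str]


def slot_matches(slot: str, token: Token) -> bool:
    """A [bracketed] slot matches the tag with * and ? wildcards; a bare slot matches the word."""
    word, tag = token
    if slot.startswith("[") and slot.endswith("]"):
        return fnmatchcase(tag, slot[1:-1])
    return word.lower() == slot.lower()


def count_query(tokens: list[Token], query: str) -> int:
    """Count positions where consecutive tokens match every space-separated slot."""
    slots = query.split()
    hits = 0
    for i in range(len(tokens) - len(slots) + 1):
        window = tokens[i:i + len(slots)]
        if all(slot_matches(s, t) for s, t in zip(slots, window)):
            hits += 1
    return hits


tokens = [("I", "ppis1"), ("have", "vh0"), ("just", "rr"), ("eaten", "vvn"), (".", "y"),
          ("She", "pphs1"), ("just", "rr"), ("ate", "vvd"), (".", "y")]

print(count_query(tokens, "[vh*] just [v?n*]"))  # 1: "have just eaten"
print(count_query(tokens, "just [v?d*]"))        # 1: "just ate"
print(count_query(tokens, "[vh*] [v?n*]"))       # 0: the adverb breaks adjacency
```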

6 Browse corpora at BYU Corpus.byu.edu

7 Corpus Applications

8 What is corpus linguistics good for?
Making a concordance: a list of all the words in a text and where they are found, e.g., for the Scriptures or the works of Shakespeare.
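
A concordance in this sense is an index from each word to its positions in the text; a key-word-in-context (KWIC) display then prints each hit with a few words on either side. A minimal Python sketch of both, assuming a simple regex tokenizer (a real concordancer would handle punctuation, lemmas, and line/verse references more carefully); the function names are illustrative.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[\w']+", text)


def build_concordance(tokens: list[str]) -> dict[str, list[int]]:
    """Map each lowercased word to the token positions where it occurs."""
    index: dict[str, list[int]] = defaultdict(list)
    for position, token in enumerate(tokens):
        index[token.lower()].append(position)
    return dict(index)


def kwic(tokens: list[str], keyword: str, width: int = 2) -> list[str]:
    """Key-word-in-context lines: `width` words either side of each hit."""
    lines = []
    for i in build_concordance(tokens).get(keyword.lower(), []):
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        lines.append(" ".join(part for part in (left, f"[{tokens[i]}]", right) if part))
    return lines


tokens = tokenize("To be, or not to be, that is the question")
print(build_concordance(tokens)["be"])  # [1, 5]
for line in kwic(tokens, "be"):
    print(line)
# To [be] or not
# not to [be] that is
```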

9 What is corpus linguistics good for?
Finding word frequencies, for psycholinguistic experiments and for language instruction: put the most common words in L2 vocabulary (cf. toxicomano).
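
At its simplest, a frequency list is a count of every word form, sorted from most to least frequent. A minimal Python sketch, assuming lowercased word forms and a regex tokenizer; published frequency lists are usually built from lemmatized, part-of-speech-tagged corpora, which this sketch does not attempt.

```python
import re
from collections import Counter


def frequency_list(text: str, n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent lowercased word forms with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


sample = "the cat saw the dog and the dog ran"
print(frequency_list(sample, 2))  # [('the', 3), ('dog', 2)]
```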

10 What is corpus linguistics good for?
Lexicography: Which words should a dictionary include? What do words mean? How are meanings changing? How are spellings changing (blowtorch, blow-torch, blow torch)? Identifying regionalisms.
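
Tracking a spelling change comes down to counting the competing variants (ideally per time period and per million words, so periods of different size can be compared). A minimal Python sketch for the blowtorch example, assuming the three variants differ only in the separator between blow and torch; the sample sentence is made up.

```python
import re
from collections import Counter

# Matches blowtorch, blow-torch, and blow torch (plus plural -es), in any case.
VARIANT = re.compile(r"\bblow(-| )?torch(?:es)?\b", re.IGNORECASE)
LABELS = {None: "blowtorch", "-": "blow-torch", " ": "blow torch"}


def variant_counts(text: str) -> Counter:
    """Count each spelling variant, identified by the separator between 'blow' and 'torch'."""
    return Counter(LABELS[m.group(1)] for m in VARIANT.finditer(text))


sample = "A blowtorch, two blow-torches and a Blow torch; another blowtorch."
print(dict(variant_counts(sample)))
# {'blowtorch': 2, 'blow-torch': 1, 'blow torch': 1}
```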

11 What is corpus linguistics good for?
Computer systems development: text-to-speech and speech synthesis; text messaging (if you have typed gla-, frequency data say glass is highly probable, so the phone fills it in for you); natural language processing.
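
The gla- example can be sketched as the simplest possible frequency-based completion: find every known word starting with the typed prefix and suggest the most frequent first. This is a toy illustration, not how any particular phone keyboard works (real systems also use context); the frequency table is hypothetical.

```python
def complete(prefix: str, freqs: dict[str, int], k: int = 3) -> list[str]:
    """Suggest up to k words starting with prefix, most frequent first (ties alphabetical)."""
    candidates = [w for w in freqs if w.startswith(prefix)]
    candidates.sort(key=lambda w: (-freqs[w], w))
    return candidates[:k]


# Hypothetical frequency counts, for illustration only.
freqs = {"glass": 50, "global": 40, "glad": 30, "glance": 10, "glue": 5}
print(complete("gla", freqs))  # ['glass', 'glad', 'glance']
```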

12 What is corpus linguistics good for?
Testing linguistic theories: generativists relied on personal introspection. "So what if Dayton is less frequent than New York in a corpus? I’m a native speaker and know what sounds right and wrong."

13 What is corpus linguistics good for?
Problems with introspection: introspective judgments are subjective, they can’t be verified, and your introspections probably go along with your theory.

14 What is corpus linguistics good for?
Corpus data are objective, can be verified, can be shared, can be used to test theories, and can be used to get ideas for theories.

15 Limitations of corpora
They can’t contain every sentence. Some data aren’t interesting (e.g., the frequency of Dayton versus New York). They contain mistakes.

16 Lexical word lists
General Service List: the 2,000 most frequent words in English
Academic Word List (Coxhead): 570 word families in English academic writing
Academic Vocabulary List (Davies & Gardner): 3,000 words with high frequency in ACAD (the academic section of COCA) and low frequency in other registers, using a measure of dispersion (Juilland’s D)
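
Juilland's D measures how evenly a word is spread across a corpus divided into n equal-sized parts: D = 1 - V / sqrt(n - 1), where V is the coefficient of variation (standard deviation divided by mean) of the word's frequencies in those parts. D near 1 means the word is used everywhere; D near 0 means it is concentrated in a few parts. A minimal Python sketch, assuming equal-sized parts and the population standard deviation; the part frequencies are hypothetical, and this does not reproduce the thresholds used to build the Academic Vocabulary List.

```python
import math
from statistics import mean, pstdev


def juilland_d(part_freqs: list[float]) -> float:
    """Juilland's D for a word's frequencies in n equal-sized corpus parts.

    D = 1 - V / sqrt(n - 1), with V = population standard deviation / mean.
    1.0 means a perfectly even spread; 0.0 means all hits fall in one part.
    """
    n = len(part_freqs)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    m = mean(part_freqs)
    if m == 0:
        raise ValueError("word does not occur in the corpus")
    v = pstdev(part_freqs) / m
    # Clamp tiny floating-point overshoot so the result stays within [0, 1].
    return min(1.0, max(0.0, 1 - v / math.sqrt(n - 1)))


print(juilland_d([10, 10, 10, 10]))         # 1.0
print(round(juilland_d([40, 0, 0, 0]), 6))  # 0.0
```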

18 Phraseology
Formulaic sequences (lexical bundles) are identified corpus-driven. Key aspects: frequency, function, fixedness.
In the frame at the * of, what do you think most often fills the *? Check in COCA.
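
Checking the at the * of frame means sliding a four-word window along the text and counting whatever occupies the open slot. A minimal Python sketch, assuming a regex tokenizer and a frame with exactly one * slot; the sample text is made up, so its counts say nothing about which filler actually wins in COCA.

```python
import re
from collections import Counter


def frame_fillers(text: str, frame: list[str]) -> Counter:
    """Count the words filling the single '*' slot in a fixed word frame."""
    tokens = re.findall(r"[a-z']+", text.lower())
    slot = frame.index("*")
    counts: Counter = Counter()
    for i in range(len(tokens) - len(frame) + 1):
        window = tokens[i:i + len(frame)]
        if all(f == "*" or f == t for f, t in zip(frame, window)):
            counts[window[slot]] += 1
    return counts


text = "We met at the end of the day. She sat at the top of the hill, at the end of it."
print(frame_fillers(text, ["at", "the", "*", "of"]).most_common())
# [('end', 2), ('top', 1)]
```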

19 Grammar
Descriptive reference grammars give descriptions of how language is actually used rather than prescriptions about how it should be used, e.g., the Longman Grammar of Spoken and Written English.

20 Lexicogrammar
Certain words are more likely to occur in some grammatical structures than in others. E.g., some verbs (deem, base, subject) are much more common in the passive than in the active voice: The material was deemed faulty. Her choice was based solely on… The matter may be subjected to…
Collostructional analysis is a means of measuring the strength of the relationship between a word and a grammatical structure.
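
Collexeme analysis, one kind of collostructional analysis, has commonly been computed from a 2x2 table (the word vs. all other words, inside vs. outside the construction) with the Fisher exact test. The sketch below computes the one-tailed p-value for attraction exactly from the hypergeometric distribution, using only the standard library. The counts in the usage line are hypothetical, and published studies often report the result as a log-transformed p-value rather than the raw value.

```python
from fractions import Fraction
from math import comb


def attraction_p(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher exact p-value that a word is attracted to a construction.

    2x2 table:          word   other words
        construction      a         b
        elsewhere         c         d
    Returns P(X >= a) under the hypergeometric distribution.
    """
    total = a + b + c + d
    word_total = a + c      # all occurrences of the word
    cx_total = a + b        # all occurrences of the construction
    denom = comb(total, cx_total)
    p = Fraction(0)
    for x in range(a, min(word_total, cx_total) + 1):
        p += Fraction(comb(word_total, x) * comb(total - word_total, cx_total - x), denom)
    return float(p)


# Hypothetical counts: the word occurs 3 times, always in the construction.
print(attraction_p(3, 0, 0, 3))  # 0.05
```

Fractions keep the sum exact; for very large corpus counts a library implementation of the test would be faster.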

21 Register variation Does ‘general English’ exist?

22 Frequent phrases in conversation
| Phraseological feature | Examples |
| Personal pronoun + lexical verb phrase | I don’t know what, I don’t want to, I was going to |
| Yes-no question fragments | do you want to, are you going to |
| Wh-question fragments | what are you doing, what do you mean, what do you think, what do you want |

23 Frequent phrases in academic writing
| Phraseological feature | Examples |
| Noun phrase with of-phrase fragment | the end of the, one of the most |
| Prepositional phrase with embedded of-phrase fragment | in the case of |
| Other prepositional phrase fragments | on the other hand |
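
Lists like these come from a corpus-driven procedure: extract every sequence of n words, then keep those that pass a frequency cut-off and occur in a minimum number of different texts (so one author's habit does not count as a bundle). A minimal Python sketch; the parameters n, min_freq, and min_texts are hypothetical defaults, not the thresholds used in the research behind these tables, which also normalize frequencies per million words.

```python
import re
from collections import Counter


def lexical_bundles(texts: list[str], n: int = 4, min_freq: int = 2,
                    min_texts: int = 2) -> dict[str, int]:
    """Return n-word sequences meeting a frequency and a range (dispersion) cut-off."""
    freq: Counter = Counter()    # total occurrences of each n-gram
    spread: Counter = Counter()  # number of texts each n-gram appears in
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        freq.update(grams)
        spread.update(set(grams))
    return {" ".join(g): count for g, count in freq.items()
            if count >= min_freq and spread[g] >= min_texts}


texts = ["It was one of the most important results.",
         "This is one of the most common bundles."]
print(lexical_bundles(texts))  # {'one of the most': 2}
```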

24 Register variation—complexity
Which is more complex—speech or writing? Define the type(s) of complexity we find in each.

25 Multi-Dimensional analysis
(1) Identify a comprehensive set of relevant linguistic features. (2) Identify and quantify those features in a corpus of texts. (3) Use factor analysis to identify dimensions based on co-occurrence among the linguistic features. (4) Interpret the dimensions functionally. (5) Calculate scores for each text on each dimension. (6) Compare mean scores of registers/varieties.
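
Step (5) can be sketched once factor analysis has already said which features load positively and negatively on a dimension: standardize each feature's rate across all texts (z-scores), then add the z-scores of positive features and subtract those of negative features. A minimal Python sketch of that scoring step only (not the factor analysis); it assumes rates already normalized per 1,000 words and uses the population standard deviation. The feature names echo the dimension on slide 27, but the rates are hypothetical.

```python
from statistics import mean, pstdev


def dimension_scores(rates: list[dict[str, float]], positive: list[str],
                     negative: list[str]) -> list[float]:
    """Score each text on one dimension from per-text feature rates.

    rates: one dict per text mapping feature name -> normalized rate
    (e.g., per 1,000 words). Each feature is standardized (z-score) over all
    texts; the score is the sum of positive-feature z-scores minus the sum of
    negative-feature z-scores.
    """
    stats = {}
    for feature in positive + negative:
        values = [r[feature] for r in rates]
        stats[feature] = (mean(values), pstdev(values))
    scores = []
    for r in rates:
        # A feature with no variation contributes 0 instead of dividing by zero.
        z = {f: ((r[f] - m) / sd if sd else 0.0) for f, (m, sd) in stats.items()}
        scores.append(sum(z[f] for f in positive) - sum(z[f] for f in negative))
    return scores


# Hypothetical rates per 1,000 words for two texts.
rates = [{"amplifiers": 10, "nouns": 20}, {"amplifiers": 20, "nouns": 10}]
print(dimension_scores(rates, positive=["amplifiers"], negative=["nouns"]))
# [-2.0, 2.0]
```

Step (6) then averages these scores over the texts of each register or variety.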

26 Involved vs. Informational

27 Non-technical Synthesis vs. Specialized Information Density
Positive features:
Verbs: verb HAVE (.36)
Adverbs: general adverbs (.59), amplifiers (.43), certainty adverbs (.37), emphatics (.36)
Coordination: adverbial conjuncts (.51), phrasal coordinating conjunctions (.39)
Nominal modifiers: that-relative clauses (.36)
Lexical features: COCA Core Vocabulary (1-500) (.61)
Negative features:
Nouns: pre-nominal modifiers (-.73), nouns (-.73), technical concrete nouns (-.31)
Verbs: agentless passive voice (-.42)

28 Study 1—Dimension 1 Results

29 Register variation Does ‘general English’ exist?

30 Dialect variation activity
In what country is this expression permitted: allow to Verb vs. permit to Verb? Where is the word banjaxed used, and what does it mean? UK vs. US use of different from/to: which do they use in Australia? Needn’t vs. don’t need. Haven’t a Noun vs. don’t have a Noun.

31 Diachronic change
Queries to try: whom; [be] [v?n*] (be + past participle); [get] [v?n*] (get + past participle); end up [v?g*] (end up + -ing verb); need n’t. Others?

32 Data-driven learning
Language learners themselves use corpora in the classroom. Research findings are mixed, but it seems to be more useful/effective for advanced learners.

33 Corpus-informed materials

34 Political discourse

